Add some Sail internals documentation

author: Alasdair Armstrong 2019-05-29 19:21:36 +0100
committer: Alasdair Armstrong 2019-05-29 19:22:21 +0100
commit: baf2f0b11b05722a54bf3c504661c27dd8133f4b (patch)
tree: fa65bcd5ac94b0fa5ab7575333f8ba49f5f7e643
parent: 51d01d8bacc8d117210984c48c54b17e35236848 (diff)
2 files changed, 454 insertions, 0 deletions
diff --git a/doc/internals.md b/doc/internals.md
new file mode 100644
index 00000000..90faf734
--- /dev/null
+++ b/doc/internals.md
@@ -0,0 +1,454 @@
+# Sail Internals
+
+## Abstract Syntax Tree
+
+The Sail abstract syntax tree (AST) is defined by an ott grammar in
+[sail.ott](https://github.com/rems-project/sail/blob/sail2/language/sail.ott). This
+generates a OCaml (and lem) version of the ast in `src/ast.ml` and
+`src/ast.lem`. Technically the OCaml version of the AST is generated
+by [Lem](https://github.com/rems-project/lem), which allows the Sail
+OCaml source to seamlessly interoperate with parts written in
+Lem. Although we do not make much use of this, it in principle allows
+us to implement parts of Sail in Lem, which would enable them to be
+verified in Isabelle or HOL.
+
+The Sail AST looks something like:
+
+```OCaml
+and 'a exp =
+ | E_aux of ( 'a exp_aux) * 'a annot
+
+and 'a exp_aux =  (* expression *)
+ | E_block of ( 'a exp) list (* sequential block *)
+ | E_id of id (* identifier *)
+ | E_lit of lit (* literal constant *)
+ | E_cast of typ * ( 'a exp) (* cast *)
+ | E_app of id * ( 'a exp) list (* function application *)
+ | E_tuple of ( 'a exp) list (* tuple *)
+ | E_if of ( 'a exp) * ( 'a exp) * ( 'a exp) (* conditional *)
+ ...
+```
+
+Each constructor is prefixed by a unique code (in this case `E` for
+expressions), and is additionally wrapped by an *aux* constructor
+which attaches an annotation to each node in the AST, consisting of an
+arbitrary type that parameterises the AST, and a location identifying
+the position of the AST node in the source code:
+
+```
+type 'a annot = l * 'a
+```
+
+There are various types of locations:
+
+```
+type l =
+  | Unknown
+  | Unique of int * l
+  | Generated of l
+  | Range of Lexing.position * Lexing.position
+  | Documented of string * l
+```
+
+`Range` defines a range of positions given by the parser, the
+`Documented` constructor attaches doc-comments to
+locations. `Generated` is used to signify that code is generated based
+on code from some other location. `Unique` is used internally to tag
+locations with unique integers so they can be referred to
+later. `Unknown` is used for Sail that has no obvious corresponding
+location, although this should be avoided as much possible as it leads
+to poor error messages. Ast nodes programatically generated initially
+have `Unknown` locations, but `Ast_util.locate` can be used to
+recursively attach `Generated` locations for error purposes.
+
+A convention in the Sail source is that a single variable `l` is
+always a location, usually the closest location.
+
+### AST utilities
+
+There are various functions for manipulating the AST defined in
+[ast_util](https://github.com/rems-project/sail/blob/sail2/src/ast_util.mli). These
+include constructor functions like `mk_exp` for generating untyped
+expressions and various functions for destructuring AST nodes. It also
+defines various useful Sets and Maps, e.g. sets of identifiers, type
+variables etc. Note that type variables in Sail are often internally
+referred to as *kids*, I think this is because type variables are
+defined as having a specific *kind*, i.e. `'n : Int` is a type
+variable with kind `Int`. (Although `kinded_id` is technically a
+separate type which is a type variablem * kind pair).
+
+### Parse AST
+
+There is a separate AST
+[parse_ast.ml](https://github.com/rems-project/sail/blob/sail2/src/parse_ast.ml)
+which the parser generates. It is very similar to the main AST except
+it contains some additional syntactic sugar. The parse ast is
+desugared by the
+[initial_check](https://github.com/rems-project/sail/blob/sail2/src/initial_check.mli)
+file, which performs some basic checks in addition from mapping the
+parse AST to the full AST.
+
+## Overall Structure
+
+The main entry point for Sail is the file
+[sail.ml](https://github.com/rems-project/sail/blob/sail2/src/sail.ml). Each
+backend option e.g. `-c`, `-lem`, `-ocaml` etc, is referred to as a
+*target*, and the `set_target` function is used to set the
+`opt_target` variable, for example
+
+```OCaml
+  ( "-c",
+    Arg.Tuple [set_target "c"; Arg.Set Initial_check.opt_undefined_gen],
+	" output a C translated version of the input");
+```
+
+defines the `-c` option. Each Sail option is configured via via `opt_`
+variables defined at the top of each relevant file. In this case we
+tell Sail that when we generate C, we also want to generated
+`undefined_X` functions for each type `X`. In general such `opt_`
+variables should be set when we start Sail and remain immutable
+thereafter.
+
+The first function called by `main` is `Sail.load_files`. This
+function parses all the files passed to Sail, and then concatenates
+their ASTs. The preprocessor is then run, which evaluates `$directive`
+statements in Sail, such as
+
+```
+$include <prelude.sail>
+```
+
+Unlike the C preprocessor the Sail pre-processor operates over actual
+Sail ASTs rather than strings. This can recursively include other
+files into the AST, as well as add/remove parts of the AST with
+`$ifdef` etc. Directives that are used are preserved in the AST, so
+they also function as a useful way to pass auxilliary information to
+the various Sail backends.
+
+The initial check mentioned above is then run to desugar the AST, and
+then the type-checker is run which produces a fully type-checked
+AST. Type annotations are attached to every node (for which an
+annotation makes sense) using the aux constructors. The type-checker
+is discussed in more details later.
+
+After type-checking the Sail scattered definitions are de-scattered
+into single functions.
+
+All the above is shared by each target and performed by the
+`load_files` function.
+
+The next step is to apply various target-specific rewrites to the AST
+before passing it to the backend for each target.
+
+The file
+[rewrites.ml](https://github.com/rems-project/sail/blob/sail2/src/rewrites.ml) defines a list of rewrites:
+
+```OCaml
+let all_rewrites = [
+	("pat_string_append", Basic_rewriter rewrite_defs_pat_string_append);
+    ("mapping_builtins", Basic_rewriter rewrite_defs_mapping_patterns);
+    ("mono_rewrites", Basic_rewriter mono_rewrites);
+    ("toplevel_nexps", Basic_rewriter rewrite_toplevel_nexps);
+	("monomorphise", Basic_rewriter monomorphise);
+	...
+```
+
+and each target specifies a list of rewrites to apply like so:
+
+```OCaml
+let rewrites_interpreter = [
+    ("no_effect_check", []);
+    ("realise_mappings", []);
+    ("toplevel_string_append", []);
+    ("pat_string_append", []);
+    ("mapping_builtins", []);
+    ("undefined", [Bool_arg false]);
+    ("vector_concat_assignments", []);
+    ("tuple_assignments", []);
+    ("simple_assignments", [])
+  ]
+```
+
+Once these rewrites have occurred the `Sail.target` function is called
+which invokes the backend for each target, e.g. for OCaml:
+
+```OCaml
+  | Some "ocaml" ->
+     let ocaml_generator_info =
+       match !opt_ocaml_generators with
+       | [] -> None
+       | _ -> Some (Ocaml_backend.orig_types_for_ocaml_generator ast, !opt_ocaml_generators)
+     in
+     let out = match !opt_file_out with None -> "out" | Some s -> s in
+	 Ocaml_backend.ocaml_compile out ast ocaml_generator_info
+```
+
+There is also a `Sail.prover_regstate` function that allows the
+register state to be Set up in a prover-agnostic way for each of the
+theorem-prover targets.
+
+## Type Checker
+
+The Sail type-checker is contained within
+[src/type_check.ml](https://github.com/rems-project/sail/blob/sail2/src/type_check.ml). This
+file is long, but is structured as follows:
+
+1. The type checking environment is defined. The functions that
+   operate on the typing environment are contained with a separate
+   `Env` module. The purpose of this module is to hide the internal
+   representation of the typing environment and ensure that the main
+   type-checker code can only interact with it using the functions
+   defined in this module, which can be set up to guarantee any
+   invariant we need. Code outside the type-checker itself can only
+   interact with an even more restricted subset of these functions
+   exported via the mli interface file.
+
+2. Helper definitions for subtyping and constraint solving are set
+   up. The code that actually talks to an external SMT solver is
+   contained within a separate file
+   [src/constraint.ml](https://github.com/rems-project/sail/blob/sail2/src/constraint.ml),
+   wheras the code here sets up the interface between the typing
+   environment and the solver.
+
+3. Next comes the definitions for unification and instantiation of
+   types. There is some additional code (3.5) to handle subtyping with
+   existential types, which can use unification to instantiate
+   existentially quantified type variables.
+
+4. Sail allows some type-level constructs to appear in term-level
+   variables, but these are then eliminated during type-checking in a
+   process called *sizeof*-rewriting, after the (somewhat-awkwardly)
+   named *sizeof* keyword.
+
+5. Finally all the typing rules are given. Sail has a bi-directional
+   type-checking approach. So there are checking rules like
+   `check_exp`, `check_pat`, etc., as well as inference rules like
+   `infer_exp`, `infer_pat`, etc.
+
+6. Effects added by the previous typing rules are now propagated
+   upwards through the AST
+
+7. Finally we have rules for checking and processing toplevel
+   definitions of functions and datatypes.
+
+The interface by which the rest of Sail can interact with the
+type-checker and type annotations is strictly controlled by it's
+[mli interface](https://github.com/rems-project/sail/blob/sail2/src/type_check.mli)
+file. As much of the type-checking internals is kept as abstract as
+possible.
+
+## Rewriter
+
+The rewriting framework used by the various rewrites mentioned
+previously is defined in
+[src/rewriter.ml](https://github.com/rems-project/sail/blob/sail2/src/rewriter.ml). It
+contains various large structs with functions for every single AST
+node, and allows data to be threaded through each re-write in various
+ways. Most of the re-writes are defined in
+[src/rewrites.ml](https://github.com/rems-project/sail/blob/sail2/src/rewrites.ml),
+although the re-writer is used for other rewrite like passes such as
+e.g. constant folding in
+[src/constant_fold.ml](https://github.com/rems-project/sail/blob/sail2/src/constant_fold.ml)
+which combines the rewriter with the Sail interpreter.
+
+The rewriter also acts as the interface to the Sail monomorphisation
+code, in
+[src/monomorphise.ml](https://github.com/rems-project/sail/blob/sail2/src/monomorphise.ml).
+
+## Jib
+
+The C and SMT backends for Sail use a custom intermediate
+representation (IR) called *Jib* (it's a type of Sail). Like the full
+AST this is defined as an ott grammar in
+[language/jib.ott](https://github.com/rems-project/sail/blob/sail2/language/jib.ott). The
+sail "-ir" option allows Sail to be converted into this IR without any further processing.
+
+The Jib related files are contained within a subdirectory
+`src/jib/`. First we convert Sail to A-normal form (ANF) in
+[src/jib/anf.ml](https://github.com/rems-project/sail/blob/sail2/src/jib/anf.ml),
+then
+[src/jib/jib_compile.ml](https://github.com/rems-project/sail/blob/sail2/src/jib/jib_compile.ml)
+converts this into Jib.
+
+The Jib representation has the advantage of being much simpler than
+the full Sail AST, and it removes a lot of the typing complexity, as
+types are replaced with simpler ones (`ctyp`). Note that many
+Jib-related types are prefixed by `c` as it was originally only used
+when generating C.
+
+The key optimization we do when generating Jib is we analyze the Sail
+types using SMT to try to fit arbitrary-precision types into smaller
+fixed-precision machine-word types that exist within Jib. To aid in
+this we have a
+[specialization pass](https://github.com/rems-project/sail/blob/sail2/src/specialize.ml)
+that removes polymorphism by creating specialised copies of functions
+based on how their type-quantifiers are instantiated. This can be used
+in addition to the Sail monomorphisation above.
+
+Once we have generated Jib, the code generator from Jib into C is
+fairly straightforward.
+
+## Jib to SMT translation
+
+Starting with some Sail such as:
+```
+default order dec
+
+$include <prelude.sail>
+
+register r : bits(32)
+
+$property
+function property(xs: bits(32)) -> bool = {
+  ys : bits(32) = 0x1234_5678;
+  if (r[0] == bitzero) then {
+    ys = 0xffff_ffff
+  } else {
+    ys = 0x0000_0000
+  };
+  (ys == sail_zeros(32) | ys == sail_ones(32))
+}
+```
+
+We first compile to Jib, then inline all functions and flatten the
+resulting code into a list of instructions as below. The Sail->Jib
+step can be parameterised in a few ways so it differs slightly to when
+we compile Sail to C. First, a specialisation pass specialises
+integer-polymorphic functions and builtins, which is reflected in the
+name mangling scheme so, e.g.
+```
+zz7mzJzK0zCz0z7nzJzK32#bitvector_access
+```
+is a specialised version of bitvector access for 'n => 32 & 'm => 0.
+This lets us generate optimal SMT operations for monomorphic code, as
+the SMTLIB operations like ZeroExtend and Extract are only defined for
+natural number constants and bitvectors of known lengths. We also have
+to treat zero-length bitvectors differently to C, as SMT does not
+allow zero-length bitvectors, and unlike when we compile to C, we can
+have fixed-precision bitvectors of greater that 64-bits in the
+generated Jib.
+
+```
+var ys#u12_l#9 : fbits(32, dec)
+ys#u12_l#9 : fbits(32, dec) = UINT64_C(0x12345678)
+var gs#2#u12_l#15 : bool
+var gs#1#u12_l#17 : bit
+gs#1#u12_l#17 : bit = zz7mzJzK0zCz0z7nzJzK32#bitvector_access(R, 0l)
+gs#2#u12_l#15 : bool = eq_bit(gs#1#u12_l#17, UINT64_C(0))
+kill gs#1#u12_l#17 : bit
+var gs#6#u12_l#16 : unit
+jump gs#2#u12_l#15 then_13
+ys#u12_l#9 : fbits(32, dec) = UINT64_C(0x00000000)
+gs#6#u12_l#16 : unit = UNIT
+goto endif_14
+then_13:
+ys#u12_l#9 : fbits(32, dec) = UINT64_C(0xFFFFFFFF)
+gs#6#u12_l#16 : unit = UNIT
+endif_14:
+kill gs#2#u12_l#15 : bool
+var gs#5#u12_l#10 : bool
+var gs#3#u12_l#14 : fbits(32, dec)
+gs#3#u12_l#14 : fbits(32, dec) = zz7nzJzK32#sail_zeros(32l)
+gs#5#u12_l#10 : bool = zz7nzJzK32#eq_bits(ys#u12_l#9, gs#3#u12_l#14)
+kill gs#3#u12_l#14 : fbits(32, dec)
+var gs#7#u12_l#11 : bool
+jump gs#5#u12_l#10 then_11
+var gs#4#u12_l#12 : fbits(32, dec)
+var gs#0#u9_l#13 : fbits(32, dec)
+gs#0#u9_l#13 : fbits(32, dec) = zz7nzJzK32#sail_zeros(32l)
+gs#4#u12_l#12 : fbits(32, dec) = zz7nzJzK32#not_vec(gs#0#u9_l#13)
+kill gs#0#u9_l#13 : fbits(32, dec)
+goto end_inline_10
+end_inline_10:
+gs#7#u12_l#11 : bool = zz7nzJzK32#eq_bits(ys#u12_l#9, gs#4#u12_l#12)
+kill gs#4#u12_l#12 : fbits(32, dec)
+goto endif_12
+then_11:
+gs#7#u12_l#11 : bool = true
+endif_12:
+return : bool = gs#7#u12_l#11
+kill gs#5#u12_l#10 : bool
+kill ys#u12_l#9 : fbits(32, dec)
+end
+undefined bool
+```
+
+The above Jib is then turned into a control-flow-graph in SSA
+form.
+
+![SSA graph](https://github.com/rems-project/sail/blob/sail2/doc/sail-smt.png)
+
+The conditionals that affect control-flow are put into separate nodes
+(in yellow), so we can easily compute the global path conditional for
+each block (stored in a function pi(cond0, ... , condn) by using the
+yellow conditional nodes between each node and the start node.
+
+From this form the conversion to SMT is as follows:
+
+A variable declaration such as
+```
+var x : fbits(32, dec)
+```
+would become
+```
+(declare-const x (BitVec 32))
+```
+
+A call to a builtin
+```
+x : T = f(y0, ... , yn)
+```
+would become
+```
+(define-const x T' exp)
+```
+where exp encodes the builtin f using SMT bitvector operations
+
+Phi functions are mapped to muxers as follows - for a function
+`phi(x0,...,xn)` we turn that into an if-then-else statement which
+selects `x0` to `xn` based on the global path conditionals of the
+parent blocks corresponding to each argument. Each phi function in a
+block always has the same number of arguments as the number of parent
+nodes, and the arguments are in the same order as the node index for
+each parent.
+
+The above scheme generates a lot of declare-const and define-const
+statements that may not be needed so we do some simple dead-code
+elimination and constant propagation, which results in the following
+SMT:
+```
+(set-logic QF_AUFBVDT)
+(declare-const zR/0 (_ BitVec 32))
+(define-const zysz3u12_lz30/5 (_ BitVec 32) (ite (not (= ((_ extract 0 0) zR/0) #b0)) #x00000000 #xFFFFFFFF))
+(assert (and (not (ite (not (= zysz3u12_lz30/5 #x00000000)) (= zysz3u12_lz30/5 (bvnot #x00000000)) true))))
+(check-sat)
+```
+
+For monomorphic-bitvector manipulating code we can generate very
+compact SMT. Both specialisation and monomorphisation can be used to
+help monomorphise bitvectors. For variable-length bitvectors we can
+represent them as length-bitvector pairs, up to a certain bounded max
+length (default 256). This is less efficient but unavoidable in
+certain places. Integers are currently mapped to either 128 bit
+bitvectors (or any configurable max length) or 64 bit bitvectors.
+
+The one slightly tricky thing to support is register references, e.g.
+```
+(*r) : T = 0xFFFF_FFFF
+```
+where `r` is a register reference. For this we look for all registers
+with type T (as a Jib type, where type-equality is trivial), and
+convert the above to
+```
+if r = ref R1 then
+  R1 = 0xFFFF_FFFF
+else if r = ref R2 then
+  R2 = 0xFFFF_FFFF
+else ...
+```
+if we did some smarter constant folding (e.g. propagating the GPRs
+letbinding in some of our specs) this could potentially generate SMT
+that is just as good as a manual if-then-else implementation of the
+register read/write functions. This step is done before converting to
+SSA so each register in the if-then-else cascade gets the correct
+indices.
diff --git a/doc/sail-smt.png b/doc/sail-smt.png
new file mode 100644
index 00000000..146a8c9f
--- /dev/null
+++ b/doc/sail-smt.png
author	Alasdair Armstrong	2019-05-29 19:21:36 +0100
committer	Alasdair Armstrong	2019-05-29 19:22:21 +0100
commit	baf2f0b11b05722a54bf3c504661c27dd8133f4b (patch)
tree	fa65bcd5ac94b0fa5ab7575333f8ba49f5f7e643
parent	51d01d8bacc8d117210984c48c54b17e35236848 (diff)