doc/introduction.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142

\section{Introduction}
\label{sec:intro}

Sail is a language for expressing the instruction-set
architecture (ISA) semantics of processors.

Vendor architecture specification documents typically describe the
sequential behaviour of their ISA with a combination of prose, tables,
and pseudocode for each instruction.

They vary in how precise that pseudocode is: in some it is just
suggestive, while in others it is close to a complete description of
the envelope of architecturally allowed behaviour for sequential code.

For x86~\cite{Intel61}, the Intel pseudocode is just suggestive, with
embedded prose, while the AMD descriptions~\cite{AMD_3_21} are prose
alone.  For IBM Power~\cite{Power2.06}, there is reasonably detailed
pseudocode for many instructions in the manual, but it has never been
machine-parsed.  For ARM~\cite{armarmv8}, there is detailed
pseudocode, which has recently become machine-processed~\cite{Reid16}.
For MIPS~\cite{MIPS64-II,MIPS64-III} there is also reasonably detailed
pseudocode.

The behaviour of concurrent code is often described less well.  In a
line of research from 2007--2018 we have developed mathematically
rigorous models for the allowed architectural envelope of concurrent
code, for x86, IBM Power, \riscv, and ARM, that have been reasonably
well validated by experimental testing and by discussion with the
vendors and
others~\cite{x86popl,tphols09,cacm,CAV2010,Alglave:2011:LRT:1987389.1987395,pldi105,pldi2012,micro2015,FGP16,mixed17}.
In the course of this, we have identified a number of subtle issues
relating to the interface between the intra-instruction semantics and
the inter-instruction concurrency
semantics~\cite{pldi105,pldi2012,micro2015,FGP16,mixed17}. For
example, the concurrency models, rather than treating each instruction
execution as an atomic unit, require exposed register and memory
events, knowledge of the potential register and memory footprints of
instructions, and knowledge of changes to those during execution.  Our
early work in this area had hand-coded instruction semantics for quite
small fragments of the instruction sets, just enough for concurrency
litmus tests and expressed in various ad hoc ways. As our models have
matured, we have switched to modelling the intra-instruction semantics
more completely and in a style closer to the vendor-documentation
pseudocode, and Sail was developed for this.


Sail is intended:
\begin{itemize}
\item To support precise definition of real-world ISA semantics;
\item To be accessible to engineers familiar with existing vendor
  pseudocodes, with a similar style to the pseudocodes used by ARM and IBM Power
  (modulo minor syntactic differences);
\item To expose the structure needed to combine the sequential ISA
  semantics with the relaxed-memory concurrency models we have
  developed;
\item To provide an expressive type system that can statically check
  the bitvector length and indexing computation that arises in these
  specifications, to detect errors and to support code generation,
  with type inference to minimise the required type annotations;
\item To support execution, for architecturally complete emulation
  automatically based on the definition;
\item To support automatic generation of theorem-prover definitions, for
  mechanised reasoning about ISA specifications; and
\item To be as minimal as possible given the above, to ease the tasks
  of code generation and theorem-prover definition generation.
\end{itemize}

A Sail specification will typically define an abstract syntax type
(AST) of machine instructions, a decode function that takes binary
values to AST values, and an execute function that describes how each
of those behaves at runtime, together with whatever auxiliary
functions and types are needed.
%
Given such a specification, the Sail implementation can typecheck it
and generate:
\begin{itemize}
\item An internal representation of the fully type-annotated
  definition (a deep embedding of the definition) in a form that can
  be executed by the Sail interpreter.  These are both expressed in
  Lem~\cite{Lem-icfp2014,Lemcode}, a language of type, function, and
  relation definitions that can be compiled into OCaml and various
  theorem provers. The Sail interpreter can also be used to analyse
  instruction definitions (or partially executed instructions) to
  determine their potential register and memory footprints.
\item A shallow embedding of the definition, also in Lem, that can be
  executed or converted to theorem-prover code more directly.
Currently this is aimed at Isabelle/HOL or HOL4, though the Sail
dependent types should support generation of idiomatic Coq definitions
(directly rather than via Lem).
\item A compiled version of the specification
  directly into OCaml.
\item A efficient executable version of the specification, compiled
  into C code.
\end{itemize}
Sail does not currently support description of the assembly syntax of
instructions, or the mapping between that and instruction AST or
binary descriptions, although this is something we plan to add.

\medskip

Sail has been used to develop models of parts of several architectures:
\begin{center}
\begin{tabular}{|l|l|} \hline
ARMv8 (hand) & hand-written \\ \hline
ARMv8 (ASL)  & generated from ARM's v8.3 public ASL spec \\ \hline
IBM Power    & extracted/patched from IBM Framemaker XML \\ \hline
MIPS         & hand-written \\ \hline
CHERI        & hand-written \\ \hline
\riscv       & hand-written \\ \hline
\end{tabular}
\end{center}
The ARMv8 (hand) and IBM Power models cover all user-mode instructions
except vector, float, and load-multiple instructions, without
exceptions; for ARMv8 this is for the A64 fragment.

The ARMv8 (hand) model is hand-written based on the ARMv8-A
specification document~\cite{armarmv8,FGP16}, principally by \anonymise{Flur}.

The Power model is based on an automatic extraction of pseudocode and
decoding data from an XML export of the Framemaker document source of
the IBM Power manual~\cite{Power2.06,micro2015}, with manual patching
as necessary, principally by \anonymise{Kerneis and Gray}.

The ARMv8 (ASL) model is based on an automatic translation of ARM's
machine-readable public v8.3 ASL specification\footnote{ARM v8-A
  Architecture Specification:
  \url{https://github.com/meriac/archex}}. It includes everything in
ARM's specification.

The MIPS model is hand-written based on the MIPS64 manual version
2.5~\cite{MIPS64-II,MIPS64-III},
but covering only the features in the BERI hardware
reference~\cite{UCAM-CL-TR-868},
which in turn drew on MIPS4000 and MIPS32~\cite{MIPS4000,MIPS32-I}.
%
The CHERI model is based on that and the CHERI ISA reference manual
version~5~\cite{UCAM-CL-TR-891}. These two are both principally by
\anonymise{Norton-Wright}; they cover all basic user and kernel mode MIPS features
sufficient to boot FreeBSD, including a TLB, exceptions and a basic
UART for console interaction. ISA extensions such as floating point
are not covered. The CHERI model supports either 256-bit capabilities
or 128-bit compressed capabilities.