\chapter{Syntax extensions and interpretation scopes}
\label{Addoc-syntax}

In this chapter, we introduce advanced commands to modify the way
{\Coq} parses and prints objects, i.e. the translations between the
concrete and internal representations of terms and commands. The main
commands are {\tt Notation} and {\tt Infix}, which are described in
section \ref{Notation}. It also happens that the same symbolic
notation is expected in different contexts. To achieve this form of
overloading, {\Coq} offers a notion of interpretation scope. This is
described in section \ref{scopes}.

\Rem The commands {\tt Grammar}, {\tt Syntax} and {\tt Distfix}, which
were present for a while in {\Coq}, are no longer available from {\Coq}
version 8.0. The underlying AST structure is also no longer available.
The functionalities of the command {\tt Syntactic Definition} are
still available, see section \ref{Abbreviations}.

\section{Notations}
\label{Notation}
\comindex{Notation}

\subsection{Basic notations}

A {\em notation} is a symbolic abbreviation denoting some term
or term pattern.

A typical notation is the use of the infix symbol \verb=/\= to denote
the logical conjunction (\texttt{and}). Such a notation is declared
by

\begin{coq_example*}
Notation "A /\ B" := (and A B).
\end{coq_example*}

The expression \texttt{(and A B)} is the abbreviated term and the
string \verb="A /\ B"= (called a {\em notation}) tells how it is
symbolically written.

A notation is always surrounded by double quotes (except when the
abbreviation is a single identifier, see \ref{Abbreviations}). The
notation is composed of {\em tokens} separated by spaces. Identifiers
in the string (such as \texttt{A} and \texttt{B}) are the {\em
parameters} of the notation. They must occur at least once each in the
denoted term. The other elements of the string (such as \verb=/\=) are
the {\em symbols}.

An identifier can be used as a symbol but it must be surrounded by
single quotes to avoid confusion with a parameter.
Similarly, every symbol of at least 3 characters and starting with a
single quote must be quoted (it then starts with two single quotes).
Here is an example.

\begin{coq_example*}
Notation "'IF' c1 'then' c2 'else' c3" := (IF_then_else c1 c2 c3).
\end{coq_example*}

%TODO quote the identifier when not in front, not a keyword, as in "x 'U' y" ?

\subsection{Precedences and associativity}
\index{Precedences}
\index{Associativity}

Mixing different symbolic notations in the same text may cause serious
parsing ambiguity. To deal with the ambiguity of notations, {\Coq}
uses precedence levels ranging from 0 to 100 (plus one extra level
numbered 200) and associativity rules.

Consider for example the new notation

\begin{coq_example*}
Notation "A \/ B" := (or A B).
\end{coq_example*}

Clearly, an expression such as {\tt forall A:Prop, True \verb=/\= A
\verb=\/= A \verb=\/= False} is ambiguous. To tell the {\Coq} parser
how to interpret the expression, a priority between the symbols
\verb=/\= and \verb=\/= has to be given. Assume for instance that we
want conjunction to bind more tightly than disjunction. This is
expressed by assigning a precedence level to each notation, knowing
that a lower level binds more tightly than a higher level. Hence the
level for disjunction must be higher than the level for conjunction.

Since connectives are the least tight articulation points of a text, it
is reasonable to choose levels not far from the highest level, which
is 100, for example 85 for disjunction and 80 for
conjunction\footnote{which are the levels actually chosen in the
current implementation of {\Coq}}.

Similarly, an associativity is needed to decide whether {\tt True
\verb=/\= False \verb=/\= False} defaults to {\tt True \verb=/\=
(False \verb=/\= False)} (right associativity) or to {\tt (True
\verb=/\= False) \verb=/\= False} (left associativity).
We may even consider that the expression is not well-formed and that
parentheses are mandatory (this is a ``no associativity'')\footnote{
{\Coq} accepts notations declared as non-associative, but the parser on
which {\Coq} is built, namely {\camlpppp}, currently does not implement
no-associativity and replaces it with left associativity; hence it
is the same for {\Coq}: no-associativity is in fact left associativity}.
We do not know of a special convention for the associativity of
disjunction and conjunction, so let us apply for instance a right
associativity (which is the choice of {\Coq}).

Precedence levels and associativity rules of notations have to be
given between parentheses in a list of modifiers that the
\texttt{Notation} command understands. Here is how the previous
examples refine.

\begin{coq_example*}
Notation "A /\ B" := (and A B) (at level 80, right associativity).
Notation "A \/ B" := (or A B) (at level 85, right associativity).
\end{coq_example*}

By default, a notation is considered non-associative, but the
precedence level is mandatory (except for special cases whose level is
canonical). The level is either a number or the mention {\tt next
level}, which stands for the level immediately below that of the
notation. The list of levels already assigned is given in
Figure~\ref{init-notations}.

\subsection{Complex notations}

Notations can be made from arbitrarily complex symbols. One can for
instance define prefix notations.

\begin{coq_example*}
Notation "~ x" := (not x) (at level 75, right associativity).
\end{coq_example*}

One can also define notations for incomplete terms, with the hole
expected to be inferred at typing time.

\begin{coq_example*}
Notation "x = y" := (@eq _ x y) (at level 70, no associativity).
\end{coq_example*}

One can define {\em closed} notations, both sides of which are symbols.
In this case, the default precedence level for inner subexpressions
is 200.

\begin{coq_example*}
Notation "{ A } + { B }" := (sumbool A B) (at level 0).
\end{coq_example*}

One can also define notations for binders.
\begin{coq_example*}
Notation "{ x : A | P }" := (sig A (fun x => P)) (at level 0).
\end{coq_example*}

\subsection{Simple factorisation rules}

{\Coq}'s extensible parsing is performed by Camlp4, which is essentially
an LL1 parser. Hence, some care has to be taken not to hide already
existing rules by new rules. Some simple left factorisation work has
to be done. Here is an example.

\begin{coq_example*}
Notation "x < y" := (lt x y) (at level 70).
Notation "x < y < z" := (x < y /\ y < z) (at level 70).
\end{coq_example*}

In order to factorise the left part of the rules, the subexpression
referred to by {\tt y} has to be at the same level in both rules.
However, the default behaviour puts {\tt y} at the next level below 70
in the first rule (no associativity is the default), and at level
200 in the second rule (level 200 is the default for inner
expressions). To fix this, we need to force the parsing level of
{\tt y}, as follows.

\begin{coq_example*}
Notation "x < y" := (lt x y) (at level 70).
Notation "x < y < z" := (x < y /\ y < z) (at level 70, y at next level).
\end{coq_example*}

For the sake of factorisation with {\Coq} predefined rules, simple
rules have to be observed for notations starting with a symbol:
e.g. rules starting with ``\{'' or ``('' should be put at level 0. The
list of {\Coq} predefined notations can be found in chapter
\ref{Theories}.

\subsection{Displaying symbolic notations}

The command \texttt{Notation} has an effect both on the {\Coq} parser and
on the {\Coq} printer. For example:

\begin{coq_example}
Check (and True True).
\end{coq_example}

However, printing, especially pretty-printing, requires
more care than parsing. We may want specific indentations,
line breaks, alignment if on several lines, etc.

The default printing of notations is very rudimentary.
For printing a notation, a {\em formatting box} is opened in such a way
that if the notation and its arguments cannot fit on a single line, a
line break is inserted before the symbols of the notation and the
arguments on the next lines are aligned with the argument on the first
line.

A first, simple control that a user can have on the printing of a
notation is the insertion of spaces at some places of the
notation. This is performed by adding extra spaces between the symbols
and parameters: each extra space (other than the single space needed
to separate the components) is interpreted as a space to be inserted
by the printer. Here is an example showing how to add spaces around
the bar of the notation.

\begin{coq_example}
Notation "{{ x : A  |  P }}" := (sig (fun x : A => P))
  (at level 0, x at level 99).
Check (sig (fun x : nat => x=x)).
\end{coq_example}

The second, more powerful control on printing is the {\tt format}
modifier. Here is an example.

\begin{small}
\begin{coq_example}
Notation "'If' c1 'then' c2 'else' c3" := (IF_then_else c1 c2 c3)
(at level 200, right associativity, format
"'[v   ' 'If'  c1 '/' '[' 'then'  c2  ']' '/' '[' 'else'  c3 ']' ']'").
\end{coq_example}
\end{small}

A {\em format} is an extension of the string denoting the notation with
the possible following elements delimited by single quotes:

\begin{itemize}
\item extra spaces are translated into simple spaces
\item tokens of the form \verb='/  '= are translated into breaking
  points; in case a line break occurs, an indentation of the number of
  spaces after the ``\verb=/='' is applied (2 spaces in the given
  example)
\item tokens of the form \verb='//'= force writing on a new line
\item well-bracketed pairs of tokens of the form \verb='[    '= and
  \verb=']'= are translated into printing boxes; in case a line break
  occurs, an extra indentation of the number of spaces given after the
  ``\verb=[='' is applied (4 spaces in the example)
\item well-bracketed pairs of tokens of the form \verb='[hv   '= and
  \verb=']'= are translated into horizontal-orelse-vertical printing
  boxes; if the content of the box does not fit on a single line, then
  every breaking point forces a newline and an extra indentation of
  the number of spaces given after the ``\verb=[='' is applied at the
  beginning of each newline (3 spaces in the example)
\item well-bracketed pairs of tokens of the form \verb='[v '= and
  \verb=']'= are translated into vertical printing boxes; every
  breaking point forces a newline, even if the line is large enough to
  display the whole content of the box, and an extra indentation of
  the number of spaces given after the ``\verb=[='' is applied at the
  beginning of each newline
\end{itemize}

\Rem Notations do not survive the end of sections. No typing of the
denoted expression is performed at definition time. Type-checking is
done only at the time of use of the notation.

With this format, the previous example is printed as follows.
%\footnote{The ``@'' is here to shunt
%the notation "'IF' A 'then' B 'else' C" which is defined in {\Coq}
%initial state}:

\begin{coq_example}
Check
 (IF_then_else (IF_then_else True False True)
   (IF_then_else True False True)
   (IF_then_else True False True)).
\end{coq_example}

\Rem
Sometimes, a notation is expected only for the parser.
%(e.g. because
%the underlying parser of {\Coq}, namely {\camlpppp}, is LL1 and some extra
%rules are needed to circumvent the absence of factorisation).
To do so, the option {\em only parsing} is allowed in the list of
modifiers of \texttt{Notation}.

\subsection{The \texttt{Infix} command}

The \texttt{Infix} command is a shorthand for declaring notations of
infix symbols. Its syntax is

\medskip
\noindent\texttt{Infix} "{\symbolentry}" \texttt{:=} {\qualid} {\tt (} \nelist{\em modifier}{,} {\tt )}.
\medskip

and it is equivalent to

\medskip
\noindent\texttt{Notation "x {\symbolentry} y" := {\qualid} x y ( \nelist{\em modifier}{,} )}.
\medskip

where {\tt x} and {\tt y} are fresh names distinct from {\qualid}. Here
is an example.

\begin{coq_example*}
Infix "/\" := and (at level 80, right associativity).
\end{coq_example*}

\subsection{Reserving notations}

A given notation may be used in different contexts. {\Coq} expects all
uses of the notation to be defined at the same precedence and with the
same associativity. To avoid giving the precedence and associativity
every time, it is possible to declare a parsing rule in advance
without giving its interpretation. Here is an example from the initial
state of {\Coq}.

\begin{coq_example}
Reserved Notation "x = y" (at level 70, no associativity).
\end{coq_example}

Reserving a notation is also useful for simultaneously defining an
inductive type or a recursive constant and a notation for it.

\Rem The notations mentioned in Figure~\ref{init-notations} are
reserved. Hence their precedence and associativity cannot be changed.
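
\medskip
The simultaneous definition of an inductive type and of its notation
can be sketched as follows. This is an illustration only: the symbol
\verb+<==+ and the names {\tt le2}, {\tt le2\_n} and {\tt le2\_S} are
hypothetical, and the sketch assumes that the {\tt where} clause of
{\tt Inductive} is available in the version of {\Coq} at hand.

\begin{coq_example*}
(* Declare the parsing rule only; no interpretation is attached yet. *)
Reserved Notation "x <== y" (at level 70, no associativity).
(* The reserved symbol can be used in the types of the constructors. *)
Inductive le2 : nat -> nat -> Prop :=
  | le2_n : forall n:nat, n <== n
  | le2_S : forall n m:nat, n <== m -> n <== S m
where "x <== y" := (le2 x y).
\end{coq_example*}

Since the level and the associativity are fixed by the reservation,
they are not repeated at the place where the interpretation is given.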
\subsection{Simultaneous definition of terms and notations}

\subsection{Displaying information about notations}

% Set/Unset Printing Notation

\subsection{Locating notations}
\comindex{Locate}
\label{LocateSymbol}

To know to which notations a given symbol belongs, use the command
\bigskip

{\tt Locate {\symbolentry}}

\bigskip
where {\symbolentry} is any (composite) symbol surrounded by quotes. To
locate a particular notation, use a string where the variables of the
notation are replaced by ``\_''.

\Example
\begin{coq_example}
Locate "exists".
Locate "'exists' _ , _".
\end{coq_example}

\SeeAlso Section \ref{Locate}.

\section{Interpretation scopes}
\label{scopes}
% Introduction

An {\em interpretation scope} is a set of notations for terms with
their interpretation. Interpretation scopes provide a weak,
purely syntactical form of notation overloading: the same notation, for
instance the infix symbol \verb=+=, can be used to denote distinct
definitions of an additive operator. Depending on which interpretation
scope is currently open, the interpretation is different.

\subsection{Interpretation rules for notations}

At any time, the interpretation of a notation for a term is done within
a {\em stack} of interpretation scopes and lonely notations. In case a
notation has several interpretations, the actual interpretation is the
one defined by (or in) the most recently declared (or open) lonely
notation (or interpretation scope) which defines this notation.
Typically, if a given notation is defined in some scope {\scope} but
also has an interpretation not assigned to a scope, then, if {\scope}
is open before the lonely interpretation is declared, the lonely
interpretation is used. This is the case even if the interpretation of
the notation in {\scope} is given after the lonely interpretation: in
other words, only the order of lonely interpretations and of opening of
scopes matters, and not the order in which interpretations are declared
within a scope.
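
\medskip
The stacking rule can be illustrated by the following sketch, in which
the symbol \verb=<+>= and both of its interpretations are hypothetical;
it only assumes that {\tt nat\_scope} exists, as in the initial state
of {\Coq}.

\begin{coq_example*}
(* First interpretation, attached to the scope nat_scope. *)
Notation "x <+> y" := (plus x y)
  (at level 50, left associativity) : nat_scope.
(* The scope is opened before the lonely notation is declared. *)
Open Scope nat_scope.
(* Second interpretation, not attached to any scope (a lonely notation). *)
Notation "x <+> y" := (mult x y) (at level 50, left associativity).
(* The lonely notation is now the most recent entry of the stack. *)
Check (2 <+> 3).
\end{coq_example*}

Following the rule above, \verb=2 <+> 3= is expected to be read here as
{\tt (mult 2 3)}, since the lonely notation was declared after
{\tt nat\_scope} was opened.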
The initial state of {\Coq} declares three interpretation scopes and
no lonely notations. These scopes, in opening order, are {\tt
core\_scope}, {\tt type\_scope} and {\tt nat\_scope}.

\subsection{Notations in scope}

\subsection{Activation of interpretation scopes}
\label{scopechange}
\index{\%}

% Open (Local) Scope
% Close (Local) Scope

\subsection{Interpretation of numerals}

\subsection{Interpretation scopes of arguments}

\subsection{The type interpretation scope}

The scope {\tt type\_scope} has a special status. It is a primitive
interpretation scope which is temporarily activated each time a
subterm of an expression is expected to be a type. This includes goals
and statements, types of binders, domain and codomain of implication,
codomain of products, and more generally any type argument of a
declared or defined constant.

\subsection{Interpretation scopes used in the standard library of {\Coq}}

We give an overview of the scopes used in the standard library of
{\Coq}. For a complete list of notations in each scope, use the
commands {\tt Print Scopes} or {\tt Print Scope {\scope}}.

\subsubsection{\tt type\_scope}

This includes infix {\tt *} for product types and infix {\tt +} for
sum types. It is delimited by key {\tt type}.

\subsubsection{\tt nat\_scope}

This includes the standard arithmetical operators and relations on
type {\tt nat}. Positive numerals in this scope are mapped to their
canonical representative built from {\tt O} and {\tt S}. The scope is
delimited by key {\tt nat}.

\subsubsection{\tt N\_scope}

This includes the standard arithmetical operators and relations on
type {\tt N} (binary natural numbers). It is delimited by key {\tt N}.

\subsubsection{\tt Z\_scope}

This includes the standard arithmetical operators and relations on
type {\tt Z} (binary integer numbers). It is delimited by key {\tt Z}.

\subsubsection{\tt positive\_scope}

This includes the standard arithmetical operators and relations on
type {\tt positive} (binary strictly positive numbers).
It is delimited by key {\tt positive}.

\subsubsection{\tt bool\_scope}

This includes notations for the boolean operators.

\subsubsection{\tt list\_scope}

This includes notations for the list operators.

\subsubsection{\tt core\_scope}

This includes the notation for pairs. It is delimited by key {\tt core}.

\subsection{Displaying information about scopes}

\subsubsection{\tt Print Visibility}

This displays the current stack of notations in scopes and lonely
notations that is used to interpret a notation. The top of the stack
is displayed last. Notations in scopes whose interpretation is hidden
by the same notation in a more recently opened scope are not
displayed. Hence each notation is displayed only once.

\variant

{\tt Print Visibility {\scope}}\\

This displays the current stack of notations in scopes and lonely
notations assuming that {\scope} is pushed on top of the stack. This
is useful to know how a subterm locally occurring in the scope of
{\scope} is interpreted.

\subsubsection{\tt Print Scope {\scope}}

This displays all the notations defined in interpretation scope
{\scope}. It also displays the delimiting key, if any, and the class to
which the scope is bound, if any.

\subsubsection{\tt Print Scopes}

This displays all the notations, delimiting keys and corresponding
classes of all the existing interpretation scopes.
It also displays the lonely notations.

\section{Abbreviations}
\index{Abbreviations}
\label{Abbreviations}

An {\em abbreviation} is a name denoting a (presumably) more complex
expression. An abbreviation is a special form of notation with no
parameter and only one symbol, which is an identifier. This identifier
is given with no quotes around it. Example:

\begin{coq_example*}
Notation List := (list nat).
\end{coq_example*}

An abbreviation expects neither precedence nor associativity, since it
can always be put at the lowest level of atomic expressions, and
associativity is irrelevant.
Abbreviations are used as much as possible by the {\Coq} printers
unless the modifier \verb=(only parsing)= is given.

Abbreviations are bound to an absolute name, as an ordinary
definition is, and can be referred to by partially qualified names
too.

Abbreviations are syntactic in the sense that they are bound to
expressions which are not typed at the time of the definition of the
abbreviation but at the time it is used. In particular, abbreviations
can be bound to terms with holes (i.e. with ``\_'').

\Example
\begin{coq_eval}
Set Strict Implicit.
Reset Initial.
\end{coq_eval}
\begin{coq_example}
Definition explicit_id (A:Set) (a:A) := a.
Notation id := (explicit_id _).
Check (id 0).
\end{coq_example}

Abbreviations do not survive the end of sections. No typing of the
denoted expression is performed at definition time. Type-checking is
done only at the time of use of the abbreviation.

\Rem \index{Syntactic Definition}
% For compatibility
Abbreviations are similar to the {\em syntactic
definitions} available in versions of {\Coq} prior to version 8.0,
except that abbreviations are used for printing (unless the modifier
\verb=(only parsing)= is given) while syntactic definitions were not.

\section{Summary}

\paragraph{Persistence of notations}

Notations do not survive the end of sections. They survive modules
unless the command {\tt Notation Local} is used instead of {\tt
Notation}.

\paragraph{Syntax of notations}

The different syntactic variants of the command \texttt{Notation} are
given in Figure \ref{Grammar-Notation}.
\begin{figure}
\begin{tabular}{|lcl|}
\hline
{\sentence} & ::= &
  \texttt{Notation} \zeroone{\tt Local} {\str} \texttt{:=} {\term}
  \zeroone{\modifiers} \zeroone{:{\scope}} \verb=.=\\
 & $|$ &
  \texttt{Infix} \zeroone{\tt Local} {\str} \texttt{:=} {\qualid}
  \zeroone{\modifiers} \zeroone{:{\scope}} \verb=.=\\
 & $|$ &
  \texttt{Notation} \zeroone{\tt Local} {\ident} \texttt{:=} {\term}
  \zeroone{\tt (only parsing)} \verb=.=\\
 & $|$ &
  \texttt{Reserved Notation} \zeroone{\tt Local} {\str}
  \zeroone{\modifiers} \verb=.=\\
\\
{\modifiers}
 & ::= & \nelist{\ident}{,} {\tt at level} {\naturalnumber} \\
 & $|$ & \nelist{\ident}{,} {\tt at next level} \\
 & $|$ & {\tt at level} {\naturalnumber} \\
 & $|$ & {\tt left associativity} \\
 & $|$ & {\tt right associativity} \\
 & $|$ & {\tt no associativity} \\
 & $|$ & {\ident} {\tt ident} \\
 & $|$ & {\ident} {\tt global} \\
 & $|$ & {\ident} {\tt bigint} \\
 & $|$ & {\tt only parsing} \\
 & $|$ & {\tt format} {\str} \\
\hline
\end{tabular}
\caption{Syntax of the variants of {\tt Notation}}
\label{Grammar-Notation}
\end{figure}

\Rem No typing of the denoted expression is performed at definition
time. Type-checking is done only at the time of use of the notation.

\Rem Many examples of {\tt Notation} may be found in the files
composing the initial state of {\Coq} (see directory {\tt
\$COQLIB/theories/Init}).

% $Id$

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "Reference-Manual"
%%% End: