\chapter{Syntax extensions and interpretation scopes}
\label{Addoc-syntax}

In this chapter, we introduce advanced commands to modify the way
{\Coq} parses and prints objects, i.e. the translations between the
concrete and internal representations of terms and commands. The main
commands are {\tt Notation} and {\tt Infix} which are described in
section \ref{Notation}.  It also happens that the same symbolic
notation is expected in different contexts. To achieve this form of
overloading, {\Coq} offers a notion of interpretation scope. This is
described in section \ref{scopes}.

\Rem The commands {\tt Grammar}, {\tt Syntax} and {\tt Distfix} which
were present for a while in {\Coq} are no longer available from {\Coq}
version 8.0. The underlying AST structure is also no longer available.
The functionalities of the command {\tt Syntactic Definition} are
still available, see section \ref{Abbreviations}.

\section{Notations}
\label{Notation}
\comindex{Notation}

\subsection{Basic notations}

A {\em notation} is a symbolic abbreviation denoting some term
or term pattern.

A typical notation is the use of the infix symbol \verb=/\= to denote
the logical conjunction (\texttt{and}). Such a notation is declared
by

\begin{coq_example*}
Notation "A /\ B" := (and A B).
\end{coq_example*}

The expression \texttt{(and A B)} is the abbreviated term and the
string \verb="A /\ B"= (called a {\em notation}) tells how it is 
symbolically written.

A notation is always surrounded by double quotes (except when the
abbreviation is a single ident, see \ref{Abbreviations}). The
notation is composed of {\em tokens} separated by spaces.  Identifiers
in the string (such as \texttt{A} and \texttt{B}) are the {\em
parameters} of the notation. They must occur at least once each in the
denoted term. The other elements of the string (such as \verb=/\=) are
the {\em symbols}.

An identifier can be used as a symbol but it must be surrounded by
single quotes to avoid confusion with a parameter. Similarly,
every symbol of at least 3 characters and starting with a single quote
must be quoted (it then starts with two single quotes). Here is an example.

\begin{coq_example*}
Notation "'IF' c1 'then' c2 'else' c3" := (IF_then_else c1 c2 c3).
\end{coq_example*}

%TODO quote the identifier when not in front, not a keyword, as in "x 'U' y" ?

\subsection{Precedences and associativity}
\index{Precedences}
\index{Associativity}

Mixing different symbolic notations in the same text may cause serious
parsing ambiguity. To deal with the ambiguity of notations, {\Coq}
uses precedence levels ranging from 0 to 100 (plus one extra level
numbered 200) and associativity rules.

Consider for example the new notation

\begin{coq_example*}
Notation "A \/ B" := (or A B).
\end{coq_example*}

Clearly, an expression such as {\tt forall A:Prop, True \verb=/\= A \verb=\/=
A \verb=\/= False} is ambiguous. To tell the {\Coq} parser how to
interpret the expression, a priority between the symbols \verb=/\= and
\verb=\/= has to be given. Assume for instance that we want conjunction
to bind more tightly than disjunction. This is expressed by assigning a
precedence level to each notation, knowing that a lower level binds
more tightly than a higher level.  Hence the level for disjunction must be
higher than the level for conjunction.

Since connectives are the least tight articulation points of a text, it
is reasonable to choose levels not far from the highest level, which
is 100, for example 85 for disjunction and 80 for
conjunction\footnote{which are the levels actually chosen in the
current implementation of {\Coq}}.

Similarly, an associativity is needed to decide whether {\tt True \verb=/\=
False \verb=/\= False} defaults to {\tt True \verb=/\= (False
\verb=/\= False)} (right associativity) or to {\tt (True
\verb=/\= False) \verb=/\= False} (left associativity). We may
even consider that the expression is not well-formed and that
parentheses are mandatory (this is ``no associativity'')\footnote{
{\Coq} accepts notations declared as non-associative, but the parser on
which {\Coq} is built, namely {\camlpppp}, currently does not implement
no-associativity and replaces it with left associativity; hence the
same holds for {\Coq}: no-associativity is in fact left associativity}.
We know of no particular convention for the associativity of
disjunction and conjunction, so let us choose, for instance, right
associativity (which is the choice of {\Coq}).

Precedence levels and associativity rules of notations have to be
given between parentheses in a list of modifiers that the
\texttt{Notation} command understands. Here is how the previous
examples are refined.

\begin{coq_example*}
Notation "A /\ B" := (and A B) (at level 80, right associativity).
Notation "A \/ B" := (or A B)  (at level 85, right associativity).
\end{coq_example*}

By default, a notation is considered non-associative, but the
precedence level is mandatory (except for special cases whose level is
canonical). The level is either a number or the mention {\tt next
level}, which denotes the level immediately below that of the
notation. The list of levels already assigned
is given in Figure~\ref{init-notations}.
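
As a small sketch of how these levels and associativities resolve the
ambiguity, the following checks rely only on the two declarations above
(conjunction at level 80, disjunction at level 85, both right
associative). The comments state the reading that these declarations
are expected to select.

\begin{coq_example*}
(* /\ binds more tightly than \/ *)
Check (True /\ False \/ True).   (* read as (True /\ False) \/ True *)
Check (True \/ False /\ True).   (* read as True \/ (False /\ True) *)
(* right associativity of /\ *)
Check (True /\ False /\ True).   (* read as True /\ (False /\ True) *)
\end{coq_example*}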

\subsection{Complex notations}

Notations can be made from arbitrarily complex symbols. One can for
instance define prefix notations.

\begin{coq_example*}
Notation "~ x" := (not x) (at level 75, right associativity).
\end{coq_example*}

One can also define notations for incomplete terms, with the hole
expected to be inferred at typing time.

\begin{coq_example*}
Notation "x = y" := (@eq _ x y) (at level 70, no associativity).
\end{coq_example*}

One can define {\em closed} notations, both sides of which are symbols. In
this case, the default precedence level for inner subexpressions is 200.

\begin{coq_example*}
Notation "{ A } + { B }" := (sumbool A B) (at level 0).
\end{coq_example*}

One can also define notations for binders.

\begin{coq_example*}
Notation "{ x : A  |  P }" := (sig A (fun x => P)) (at level 0).
\end{coq_example*}

\subsection{Simple factorisation rules}

{\Coq} extensible parsing is performed by Camlp4, which is essentially an
LL1 parser. Hence, some care has to be taken not to hide already
existing rules by new rules. Some simple left factorisation work has
to be done. Here is an example.

\begin{coq_example*}
Notation "x < y"     := (lt x y) (at level 70).
Notation "x < y < z" := (x < y /\ y < z) (at level 70).
\end{coq_example*}

In order to factorise the left part of the rules, the subexpression
referred to by {\tt y} has to be at the same level in both rules. However,
the default behaviour puts {\tt y} at the next level below 70
in the first rule (no associativity is the default), and at level
200 in the second rule (level 200 is the default for inner expressions).
To fix this, we need to force the parsing level of {\tt y},
as follows.

\begin{coq_example*}
Notation "x < y"     := (lt x y) (at level 70).
Notation "x < y < z" := (x < y /\ y < z) (at level 70, y at next level).
\end{coq_example*}
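
With {\tt y} parsed at the same level in both rules, the two notations
can coexist. A minimal check, assuming that the two declarations above
are in force and that numerals are read in {\tt nat\_scope}:

\begin{coq_example*}
Check (0 < 1).        (* uses the first rule *)
Check (0 < 1 < 2).    (* uses the second rule: 0 < 1 /\ 1 < 2 *)
\end{coq_example*}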

For the sake of factorisation with {\Coq} predefined rules, simple
rules have to be observed for notations starting with a symbol:
e.g. rules starting with ``\{'' or ``('' should be put at level 0. The
list of {\Coq} predefined notations can be found in chapter \ref{Theories}.

\subsection{Displaying symbolic notations}

The command \texttt{Notation} has an effect both on the {\Coq} parser and
on the {\Coq} printer. For example:

\begin{coq_example}
Check (and True True).
\end{coq_example}

However, printing, especially pretty-printing, requires
more care than parsing. We may want specific indentations,
line breaks, alignment if on several lines, etc. 

The default printing of notations is very rudimentary. For printing a
notation, a {\em formatting box} is opened in such a way that if the
notation and its arguments cannot fit on a single line, a line break
is inserted before the symbols of the notation and the arguments on
the next lines are aligned with the argument on the first line.

A first, simple control that a user can have on the printing of a
notation is the insertion of spaces at some places of the
notation. This is performed by adding extra spaces between the symbols
and parameters: each extra space (other than the single space needed
to separate the components) is interpreted as a space to be inserted
by the printer. Here is an example showing how to add spaces around
the bar of the notation.

\begin{coq_example}
Notation "{{ x : A  |  P }}" := (sig (fun x : A => P))
  (at level 0, x at level 99).
Check (sig (fun x : nat => x=x)).
\end{coq_example}

The second, more powerful control on printing is by using the {\tt
format} modifier. Here is an example

\begin{small}
\begin{coq_example}
Notation "'If' c1 'then' c2 'else' c3" := (IF_then_else c1 c2 c3)
 (at level 200, right associativity, format
  "'[v   ' 'If'  c1 '/' '[' 'then'  c2  ']' '/' '[' 'else'  c3 ']' ']'").
\end{coq_example}
\end{small}

A {\em format} is an extension of the string denoting the notation with
the following possible elements delimited by single quotes:

\begin{itemize}
\item extra spaces are translated into simple spaces
\item tokens of the form \verb='/  '= are translated into breaking points;
  in case a line break occurs, an indentation of the number of spaces
  after the ``\verb=/='' is applied (2 spaces in the given example)
\item tokens of the form \verb='//'= force writing on a new line
\item well-bracketed pairs of tokens of the form \verb='[    '= and \verb=']'=
  are translated into printing boxes; in case a line break occurs,
  an extra indentation of the number of spaces given after the ``\verb=[=''
  is applied (4 spaces in the example)
\item well-bracketed pairs of tokens of the form \verb='[hv   '= and \verb=']'=
  are translated into horizontal-orelse-vertical printing boxes; 
  if the content of the box does not fit on a single line, then every breaking
  point forces a newline and an extra  indentation of the number of spaces
  given after the ``\verb=[='' is applied at the beginning of each newline
  (3 spaces in the example)
\item well-bracketed pairs of tokens of the form \verb='[v '= and
  \verb=']'= are translated into vertical printing boxes; every
  breaking point forces a newline, even if the line is large enough to
  display the whole content of the box, and an extra indentation of the
  number of spaces given after the ``\verb=[='' is applied at the beginning
  of each newline
\end{itemize}

Thus, for the previous example, we get
%\footnote{The ``@'' is here to shunt
%the notation "'IF' A 'then' B 'else' C" which is defined in {\Coq}
%initial state}:

\begin{coq_example}
Check
 (IF_then_else (IF_then_else True False True)
   (IF_then_else True False True)
   (IF_then_else True False True)).
\end{coq_example}

Notations do not survive the end of sections. No typing of the denoted
expression is performed at definition time. Type-checking is done only
at the time of use of the notation.

\Rem
Sometimes, a notation is expected only for the parser.
%(e.g. because
%the underlying parser of {\Coq}, namely {\camlpppp}, is LL1 and some extra
%rules are needed to circumvent the absence of factorisation).
To do so, the option {\em only parsing} is allowed in the list of modifiers of
\texttt{Notation}.
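
For instance, one may sketch an alternative input syntax for
conjunction that the printer never uses. The symbol {\tt AND} is
hypothetical and chosen only for illustration; it is not part of
{\Coq}.

\begin{coq_example*}
(* Hypothetical parsing-only syntax; terms are still printed with /\ *)
Notation "A 'AND' B" := (and A B)
  (at level 80, right associativity, only parsing).
Check (True AND False).
\end{coq_example*}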

\subsection{The \texttt{Infix} command}

The \texttt{Infix} command is a shorthand for declaring notations of
infix symbols. Its syntax is

\medskip

\noindent\texttt{Infix} "{\symbolentry}" \texttt{:=} {\qualid} {\tt (} \nelist{\em modifier}{,} {\tt )}.

\medskip

and it is equivalent to

\medskip

\noindent\texttt{Notation "x {\symbolentry} y" := ({\qualid} x y)  ( \nelist{\em modifier}{,} )}.

\medskip

where {\tt x} and {\tt y} are fresh names distinct from {\qualid}. Here is an example.

\begin{coq_example*}
Infix "/\" := and (at level 80, right associativity).
\end{coq_example*}
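
Under the same equivalence, the disjunction notation of the previous
subsections could be sketched with \texttt{Infix} as follows. The
\texttt{:=} follows the grammar given in the summary of this chapter.

\begin{coq_example*}
(* Intended to have the same effect as the Notation command
   for "x \/ y" with interpretation (or x y) *)
Infix "\/" := or (at level 85, right associativity).
\end{coq_example*}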

\subsection{Reserving notations}

A given notation may be used in different contexts. {\Coq} expects all
uses of the notation to be defined at the same precedence and with the
same associativity. To avoid giving the precedence and associativity
every time, it is possible to declare a parsing rule in advance
without giving its interpretation. Here is an example from the initial
state of {\Coq}.

\begin{coq_example}
Reserved Notation "x = y" (at level 70, no associativity).
\end{coq_example}

Reserving a notation is also useful for simultaneously defining an
inductive type or a recursive constant and a notation for it.

\Rem The notations mentioned on Figure~\ref{init-notations} are
reserved. Hence their precedence and associativity cannot be changed.
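
As a sketch of the intended use, a notation can be reserved once and
given an interpretation later without repeating its modifiers. The
symbol \verb|===| is hypothetical and serves only as an illustration.

\begin{coq_example*}
(* "===" is a hypothetical symbol, used here for illustration only *)
Reserved Notation "x === y" (at level 70, no associativity).
(* The level and associativity come from the reservation above *)
Notation "x === y" := (eq x y).
Check (0 === 0).
\end{coq_example*}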

\subsection{Simultaneous definition of terms and notations}

\subsection{Displaying information about notations}

% Set/Unset Printing Notation

\subsection{Locating notations}
\comindex{Locate}
\label{LocateSymbol}

To know which notations a given symbol belongs to, use the command

\bigskip
{\tt Locate {\symbolentry}}
\bigskip

where symbol is any (composite) symbol surrounded by quotes. To locate
a particular notation, use a string where the variables of the
notation are replaced by ``\_''.

\Example

\begin{coq_example}
Locate "exists".
Locate "'exists' _ , _".
\end{coq_example}

\SeeAlso Section \ref{Locate}.

\section{Interpretation scopes}
\label{scopes}
% Introduction

An {\em interpretation scope} is a set of notations for terms with
their interpretation. Interpretation scopes provide a weak,
purely syntactical form of notation overloading: the same notation, for
instance the infix symbol \verb=+=, can be used to denote distinct
definitions of an additive operator. Depending on which interpretation
scope is currently open, the interpretation is different.

\subsection{Interpretation rules for notations}

At any time, the interpretation of a notation for a term is done within
a {\em stack} of interpretation scopes and lonely notations. If a
notation has several interpretations, the one actually used is the
one given by the most recently declared lonely notation, or the most
recently opened interpretation scope, that defines this notation.
Typically, if a given notation is defined in some scope {\scope} and
also has an interpretation not assigned to a scope, and if {\scope}
is opened before the lonely interpretation is declared, then the lonely
interpretation is used. This holds even if the
interpretation of the notation in {\scope} is given after the lonely
interpretation: in other words, only the order of lonely
interpretations and of scope openings matters, not the order in which
interpretations are declared within a scope.

The initial state of {\Coq} declares three interpretation scopes and
no lonely notations. These scopes, in opening order, are {\tt
core\_scope}, {\tt type\_scope} and {\tt nat\_scope}.

\subsection{Notations in scope}

\subsection{Activation of interpretation scopes}
\label{scopechange}
\index{\%}

% Open (Local) Scope
% Close (Local) Scope

\subsection{Interpretation of numerals}

\subsection{Interpretation scopes of arguments}


\subsection{The type interpretation scope}

The scope {\tt type\_scope} has a special status. It is a primitive
interpretation scope which is temporarily activated each time a
subterm of an expression is expected to be a type. This includes goals
and statements, types of binders, domain and codomain of implication,
codomain of products, and more generally any type argument of a
declared or defined constant.
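
The following sketch illustrates this temporary activation, assuming
the initial state in which {\tt nat\_scope} is open. The same symbol
{\tt *} is read differently depending on whether a type is expected.

\begin{coq_example*}
(* nat * nat is the type of a binder: * is read in type_scope as prod *)
Check (fun p : nat * nat => fst p).
(* In term position, with nat_scope open, * is multiplication on nat *)
Check (2 * 3).
\end{coq_example*}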

\subsection{Interpretation scopes used in the standard library of {\Coq}}

We give an overview of the scopes used in the standard library of
{\Coq}. For a complete list of notations in each scope, use the
commands {\tt Print Scopes} or {\tt Print Scope {\scope}}.

\subsubsection{\tt type\_scope}

This includes infix {\tt *} for product types and infix {\tt +} for
sum types. It is delimited by key {\tt type}.

\subsubsection{\tt nat\_scope}

This includes the standard arithmetical operators and relations on
type {\tt nat}. Positive numerals in this scope are mapped to their
canonical representative built from {\tt O} and {\tt S}. The scope is
delimited by key {\tt nat}.

\subsubsection{\tt N\_scope}

This includes the standard arithmetical operators and relations on
type {\tt N} (binary natural numbers). It is delimited by key {\tt N}.

\subsubsection{\tt Z\_scope}

This includes the standard arithmetical operators and relations on
type {\tt Z} (binary integer numbers). It is delimited by key {\tt Z}.

\subsubsection{\tt positive\_scope}

This includes the standard arithmetical operators and relations on
type {\tt positive} (binary strictly positive numbers). It is
delimited by key {\tt positive}.

\subsubsection{\tt bool\_scope}

This includes notations for the boolean operators.

\subsubsection{\tt list\_scope}

This includes notations for the list operators.

\subsubsection{\tt core\_scope}

This includes the notation for pairs. It is delimited by key {\tt core}.
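
The delimiting keys listed in this overview can be used to interpret a
subterm locally in a given scope, by writing the key after a
\texttt{\%} sign following the subterm. A short sketch, assuming that
the library {\tt ZArith} is available:

\begin{coq_example*}
Require Import ZArith.
Check (1 + 2)%nat.      (* addition on nat *)
Check (1 + 2)%Z.        (* addition on Z *)
Check (nat * nat)%type. (* product type *)
\end{coq_example*}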

\subsection{Displaying information about scopes}

\subsubsection{\tt Print Visibility}

This displays the current stack of notations in scopes and lonely
notations that is used to interpret a notation. The top of the stack
is displayed last. Notations in scopes whose interpretation is hidden
by the same notation in a more recently open scope are not
displayed. Hence each notation is displayed only once.

\variant

{\tt Print Visibility {\scope}}\\

This displays the current stack of notations in scopes and lonely
notations assuming that {\scope} is pushed on top of the stack.  This
is useful to know how a subterm locally occurring in the scope of
{\scope} is interpreted.

\subsubsection{\tt Print Scope {\scope}}

This displays all the notations defined in interpretation scope
{\scope}.  It also displays the delimiting key if any and the class to
which the scope is bound, if any.

\subsubsection{\tt Print Scopes}

This displays all the notations, delimiting keys and corresponding
class of all the existing interpretation scopes.
It also displays the lonely notations.
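
As a quick illustration, the commands of this subsection can be tried
on one of the scopes of the initial state. The choice of
{\tt nat\_scope} is arbitrary.

\begin{coq_example*}
Print Visibility.
Print Visibility nat_scope.
Print Scope nat_scope.
Print Scopes.
\end{coq_example*}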

\section{Abbreviations}
\index{Abbreviations}
\label{Abbreviations}

An {\em abbreviation} is a name denoting a (presumably) more complex
expression. An abbreviation is a special form of notation with no
parameter and only one symbol which is an identifier. This identifier
is given with no quotes around. Example:

\begin{coq_example*}
Notation List := (list nat).
\end{coq_example*}

An abbreviation expects neither precedence nor associativity, since it can
always be put at the lowest level of atomic expressions, and
associativity is irrelevant. Abbreviations are used as much as
possible by the {\Coq} printers unless the modifier
\verb=(only parsing)= is given.
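
As a sketch of the \verb=(only parsing)= case, the following
abbreviation can be used in input but is not meant to be used by the
printer. The name {\tt NatPair} is hypothetical.

\begin{coq_example*}
(* NatPair is a hypothetical name, used for input only *)
Notation NatPair := (prod nat nat) (only parsing).
Check (fun p : NatPair => fst p).
\end{coq_example*}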

Abbreviations are bound to an absolute name, as for an ordinary
definition, and can be referred to by partially qualified names too.

Abbreviations are syntactic in the sense that they are bound to
expressions which are not typed at the time of the definition of the
abbreviation but at the time it is used. In particular, abbreviations can
be bound to terms with holes (i.e. with ``\_'').

\Example

\begin{coq_eval}
Set Strict Implicit.
Reset Initial.
\end{coq_eval}
\begin{coq_example}
Definition explicit_id (A:Set) (a:A) := a.
Notation id := (explicit_id _).
Check (id 0).
\end{coq_example}

Abbreviations do not survive the end of sections. No typing of the denoted
expression is performed at definition time. Type-checking is done only
at the time of use of the abbreviation.

\Rem \index{Syntactic Definition} % For compatibility
Abbreviations are similar to the {\em syntactic
definitions} available in versions of {\Coq} prior to version 8.0,
except that abbreviations are used for printing (unless the modifier
\verb=(only parsing)= is given) while syntactic definitions were not.

\section{Summary}

\paragraph{Persistence of notations}

Notations do not survive the end of sections. They survive modules
unless the command {\tt Notation Local} is used instead of {\tt
Notation}.
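
The first point can be sketched as follows. The section name and the
symbol \verb|<=>| are hypothetical and chosen only for illustration.

\begin{coq_example*}
Section NotationDemo.
  (* "<=>" is a hypothetical symbol, declared inside the section *)
  Notation "A <=> B" := (iff A B) (at level 95, no associativity).
  Check (True <=> True).
End NotationDemo.
(* Here the notation is gone: Check (True <=> True) would be rejected *)
\end{coq_example*}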

\paragraph{Syntax of notations}

The different syntactic variants of the command \texttt{Notation} are
given in Figure~\ref{Grammar-Notation}.

\begin{figure}
\begin{tabular}{|lcl|}
\hline
{\sentence} & ::= &
   \texttt{Notation} \zeroone{\tt Local} {\str} \texttt{:=} {\term} 
   \zeroone{\modifiers} \zeroone{:{\scope}} \verb=.=\\
  & $|$ & 
   \texttt{Infix} \zeroone{\tt Local} {\str} \texttt{:=} {\qualid} 
   \zeroone{\modifiers} \zeroone{:{\scope}} \verb=.=\\
  & $|$ & 
   \texttt{Notation} \zeroone{\tt Local} {\ident} \texttt{:=} {\term} 
   \zeroone{\tt (only parsing)} \verb=.=\\
  & $|$ & 
   \texttt{Reserved Notation} \zeroone{\tt Local} {\str}
   \zeroone{\modifiers} \verb=.=\\
\\
{\modifiers}
  & ::= & \nelist{\ident}{,} {\tt at level} {\naturalnumber} \\
  & $|$ & \nelist{\ident}{,} {\tt at next level} \\
  & $|$ & {\tt at level} {\naturalnumber} \\
  & $|$ & {\tt left associativity} \\
  & $|$ & {\tt right associativity} \\
  & $|$ & {\tt no associativity} \\
  & $|$ & {\ident} {\tt ident} \\
  & $|$ & {\ident} {\tt global} \\
  & $|$ & {\ident} {\tt bigint} \\
  & $|$ & {\tt only parsing} \\
  & $|$ & {\tt format} {\str} \\
\hline
\end{tabular}
\caption{Syntax of the variants of {\tt Notation}}
\label{Grammar-Notation}
\end{figure}

\Rem No typing of the denoted expression is performed at definition
time. Type-checking is done only at the time of use of the notation.

\Rem Many examples of {\tt Notation} may be found in the files
composing the initial state of {\Coq} (see directory {\tt
\$COQLIB/theories/Init}).


% $Id$ 

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "Reference-Manual"
%%% End: