# Summary The Ltac tactic language is probably one of the ingredients of the success of Coq, yet it is at the same time its Achilles' heel. Indeed, Ltac: - has nothing like intended semantics - is very non-uniform due to organic growth - lacks expressivity and requires programming-by-hacking - is slow - is error-prone and fragile - has an intricate implementation This has a lot of terrible consequences, most notably the fact that it is never clear whether some observed behaviour is a bug or a proper one. Following the need of users that start developing huge projects relying critically on Ltac, we believe that we should offer a proper modern language that features at least the following: - at least informal, predictible semantics - a typing system - standard programming facilities (i.e. datatypes) This document describes the implementation of such a language. The implementation of Ltac as of Coq 8.7 will be referred to as Ltac1. # General design There are various alternatives to Ltac1, such that Mtac or Rtac for instance. While those alternatives can be quite distinct from Ltac1, we designed Ltac2 to be closest as reasonably possible to Ltac1, while fixing the aforementioned defects. In particular, Ltac2 is: - a member of the ML family of languages, i.e. * a call-by-value functional language * with effects * together with Hindley-Milner type system - a language featuring meta-programming facilities for the manipulation of Coq-side terms - a language featuring notation facilities to help writing palatable scripts We describe more in details each point in the remainder of this document. # ML component ## Overview Ltac2 is a member of the ML family of languages, in the sense that it is an effectful call-by-value functional language, with static typing à la Hindley-Milner. It is commonly accepted that ML constitutes a sweet spot in PL design, as it is relatively expressive while not being either too lax (contrarily to dynamic typing) nor too strict (contrarily to say, dependent types). The main goal of Ltac2 is to serve as a meta-language for Coq. As such, it naturally fits in the ML lineage, just as the historical ML was designed as the tactic language for the LCF prover. It can also be seen as a general-purpose language, by simply forgetting about the Coq-specific features. Sticking to a standard ML type system can be considered somewhat weak for a meta-language designed to manipulate Coq terms. In particular, there is no way to statically guarantee that a Coq term resulting from an Ltac2 computation will be well-typed. This is actually a design choice, motivated by retro-compatibility with Ltac1. Instead, well-typedness is deferred to dynamic checks, allowing many primitive functions to fail whenever they are provided with an ill-typed term. The language is naturally effectful as it manipulates the global state of the proof engine. This allows to think of proof-modifying primitives as effects in a straightforward way. Semantically, proof manipulation lives in a monad, which allows to ensure that Ltac2 satisfies the same equations as a generic ML with unspecified effects would do, e.g. function reduction is substitution by a value. ## Type Syntax At the level of terms, we simply elaborate on Ltac1 syntax, which is quite close to e.g. the one of OCaml. Types follow the simply-typed syntax of OCaml. ``` TYPE := | "(" TYPE₀ "," ... "," TYPEₙ ")" TYPECONST | "(" TYPE₀ "*" ... "*" TYPEₙ ")" | TYPE₁ "->" TYPE₂ | TYPEVAR TYPECONST := ( MODPATH "." )* LIDENT TYPEVAR := "'" LIDENT TYPEPARAMS := "(" TYPEVAR₀ "," ... "," TYPEVARₙ ")" ``` The set of base types can be extended thanks to the usual ML type declarations such as algebraic datatypes and records. Built-in types include: - `int`, machine integers (size not specified, in practice inherited from OCaml) - `string`, mutable strings - `'a array`, mutable arrays - `exn`, exceptions - `constr`, kernel-side terms - `pattern`, term patterns - `ident`, well-formed identifiers ## Type declarations One can define new types by the following commands. ``` VERNAC ::= | "Ltac2" "Type" TYPEPARAMS LIDENT | "Ltac2" "Type" RECFLAG TYPEPARAMS LIDENT ":=" TYPEDEF RECFLAG := ( "rec" ) ``` The first command defines an abstract type. It has no use for the end user and is dedicated to types representing data coming from the OCaml world. The second command defines a type with a manifest. There are four possible kinds of such definitions: alias, variant, record and open variant types. ``` TYPEDEF := | TYPE | "[" CONSTRUCTORDEF₀ "|" ... "|" CONSTRUCTORDEFₙ "]" | "{" FIELDDEF₀ ";" ... ";" FIELDDEFₙ "}" | "[" "..." "]" CONSTRUCTORDEF := | IDENT ( "(" TYPE₀ "," ... "," TYPE₀ ")" ) FIELDDEF := | MUTFLAG IDENT ":" TYPE MUTFLAG := ( "mutable" ) ``` Aliases are just a name for a given type expression and are transparently unfoldable to it. They cannot be recursive. Variants are sum types defined by constructors and eliminated by pattern-matching. They can be recursive, but the `RECFLAG` must be explicitly set. Pattern-maching must be exhaustive. Records are product types with named fields and eliminated by projection. Likewise they can be recursive if the `RECFLAG` is set. Open variants are a special kind of variant types whose constructors are not statically defined, but can instead by extended dynamically. A typical example is the standard `exn` type. Pattern-matching must always include a catch-all clause. They can be extended by the following command. ``` VERNAC ::= | "Ltac2" "Type" TYPEPARAMS QUALID ":=" "[" CONSTRUCTORDEF "]" ``` ## Term Syntax The syntax of the functional fragment is very close to the one of Ltac1, except that it adds a true pattern-matching feature, as well as a few standard constructions from ML. ``` VAR := LIDENT QUALID := ( MODPATH "." )* LIDENT CONSTRUCTOR := UIDENT TERM := | QUALID | CONSTRUCTOR TERM₀ ... TERMₙ | TERM TERM₀ ... TERMₙ | "fun" VAR "=>" TERM | "let" VAR ":=" TERM "in" TERM | "let" "rec" VAR ":=" TERM "in" TERM | "match" TERM "with" BRANCH* "end" | INT | STRING | "[|" TERM₀ ";" ... ";" TERMₙ "|]" | "(" TERM₀ "," ... "," TERMₙ ")" | "{" FIELD+ "}" | TERM "." "(" QUALID ")" | TERM₁ "." "(" QUALID ")" ":=" TERM₂ | "["; TERM₀ ";" ... ";" TERMₙ "]" | TERM₁ "::" TERM₂ | ... BRANCH := | PATTERN "=>" TERM PATTERN := | VAR | "_" | "(" PATTERN₀ "," ... "," PATTERNₙ ")" | CONSTRUCTOR PATTERN₀ ... PATTERNₙ | "[" "]" | PATTERN₁ "::" PATTERN₂ FIELD := | QUALID ":=" TERM ``` In practice, there is some additional syntactic sugar that allows e.g. to bind a variable and match on it at the same time, in the usual ML style. There is a dedicated syntax for list and array litterals. Limitations: for now, deep pattern matching is not implemented yet. ## Ltac Definitions One can define a new global Ltac2 value using the following syntax. ``` VERNAC ::= | "Ltac2" RECFLAG LIDENT ":=" TERM ``` For semantic reasons, the body of the Ltac2 definition must be a syntactical value, i.e. a function, a constant or a pure constructor recursively applied to values. If the `RECFLAG` is set, the tactic is expanded into a recursive binding. ## Reduction We use the usual ML call-by-value reduction, with an otherwise unspecified evaluation order. This is a design choice making it compatible with OCaml, if ever we implement native compilation. The expected equations are as follows. ``` (fun x => t) V ≡ t{x := V} (βv) let x := V in t ≡ t{x := V} (let) match C V₀ ... Vₙ with ... | C x₀ ... xₙ => t | ... end ≡ t {xᵢ := Vᵢ} (ι) (t any term, V values, C constructor) ``` Note that call-by-value reduction is already a departure from Ltac1 which uses heuristics to decide when evaluating an expression. For instance, the following expressions do not evaluate the same way in Ltac1. ``` foo (idtac; let x := 0 in bar) foo (let x := 0 in bar) ``` Instead of relying on the `idtac` hack, we would now require an explicit thunk not to compute the argument, and `foo` would have e.g. type `(unit -> unit) -> unit`. ``` foo (fun () -> let x := 0 in bar) ``` ## Typing Typing is strict and follows Hindley-Milner system. We will not implement the current hackish subtyping semantics, and one will have to resort to conversion functions. See notations though to make things more palatable. In this setting, all usual argument-free tactics have type `unit -> unit`, but one can return as well a value of type `t` thanks to terms of type `unit -> t`, or take additional arguments. ## Effects Regarding effects, nothing involved here, except that instead of using the standard IO monad as the ambient effectful world, Ltac2 is going to use the tactic monad. Note that the order of evaluation of application is *not* specified and is implementation-dependent, as in OCaml. We recall that the `Proofview.tactic` monad is essentially a IO monad together with backtracking state representing the proof state. Intuitively a thunk of type `unit -> 'a` can do the following: - It can perform non-backtracking IO like printing and setting mutable variables - It can fail in a non-recoverable way - It can use first-class backtrack. The proper way to figure that is that we morally have the following isomorphism: `(unit -> 'a) ~ (unit -> exn + ('a * (exn -> 'a)))` i.e. thunks can produce a lazy list of results where each tail is waiting for a continuation exception. - It can access a backtracking proof state, made out amongst other things of the current evar assignation and the list of goals under focus. We describe more thoroughly the various effects existing in Ltac2 hereafter. ### Standard IO The Ltac2 language features non-backtracking IO, notably mutable data and printing operations. Mutable fields of records can be modified using the set syntax. Likewise, built-in types like `string` and `array` feature imperative assignment. See modules `String` and `Array` respectively. A few printing primitives are provided in the `Message` module, allowing to display information to the user. ### Fatal errors The Ltac2 language provides non-backtracking exceptions through the following primitive in module `Control`. ``` val throw : exn -> 'a ``` Contrarily to backtracking exceptions from the next section, this kind of error is never caught by backtracking primitives, that is, throwing an exception destroys the stack. This is materialized by the following equation, where `E` is an evaluation context. ``` E[throw e] ≡ throw e (e value) ``` There is currently no way to catch such an exception and it is a design choice. There might be at some future point a way to catch it in a brutal way, destroying all backtrack and return values. ### Backtrack In Ltac2, we have the following backtracking primitives, defined in the `Control` module. ``` Ltac2 Type 'a result := [ Val ('a) | Err (exn) ]. val zero : exn -> 'a val plus : (unit -> 'a) -> (exn -> 'a) -> 'a val case : (unit -> 'a) -> ('a * (exn -> 'a)) result ``` If one sees thunks as lazy lists, then `zero` is the empty list and `plus` is list concatenation, while `case` is pattern-matching. The backtracking is first-class, i.e. one can write `plus "x" (fun () => "y") : string` producing a backtracking string. These operations are expected to satisfy a few equations, most notably that they form a monoid compatible with sequentialization. ``` plus t zero ≡ t () plus (fun () -> zero e) f ≡ f e plus (plus t f) g ≡ plus t (fun e -> plus (f e) g) case (fun () -> zero e) ≡ Err e case (fun () -> plus (fun () -> t) f) ≡ Val t f let x := zero e in u ≡ zero e let x := plus t f in u ≡ plus (fun () -> let x := t in u) (fun e -> let x := f e in u) (t, u, f, g, e values) ``` ### Goals A goal is given by the data of its conclusion and hypotheses, i.e. it can be represented as `[Γ ⊢ A]`. The tactic monad naturally operates over the whole proofview, which may represent several goals, including none. Thus, there is no such thing as *the current goal*. Goals are naturally ordered, though. It is natural to do the same in Ltac2, but we must provide a way to get access to a given goal. This is the role of the `enter` primitive, that applies a tactic to each currently focussed goal in turn. ``` val enter : (unit -> unit) -> unit ``` It is guaranteed that when evaluating `enter f`, `f` is called with exactly one goal under focus. Note that `f` may be called several times, or never, depending on the number of goals under focus before the call to `enter`. Accessing the goal data is then implicit in the Ltac2 primitives, and may panic if the invariants are not respected. The two essential functions for observing goals are given below. ``` val hyp : ident -> constr val goal : unit -> constr ``` The two above functions panic if there is not exactly one goal under focus. In addition, `hyp` may also fail if there is no hypothesis with the corresponding name. # Meta-programming ## Overview One of the horrendous implementation issues of Ltac is the fact that it is never clear whether an object refers to the object world or the meta-world. This is an incredible source of slowness, as the interpretation must be aware of bound variables and must use heuristics to decide whether a variable is a proper one or referring to something in the Ltac context. Likewise, in Ltac1, constr parsing is implicit, so that `foo 0` is not `foo` applied to the Ltac integer expression `0` (Ltac does have a non-first-class notion of integers), but rather the Coq term `Datatypes.O`. We should stop doing that! We need to mark when quoting and unquoting, although we need to do that in a short and elegant way so as not to be too cumbersome to the user. ## Generic Syntax for Quotations In general, quotations can be introduced in term by the following syntax, where `QUOTENTRY` is some parsing entry. ``` TERM ::= | QUOTNAME ":" "(" QUOTENTRY ")" QUOTNAME := IDENT ``` The current implementation recognizes the following built-in quotations: - "ident", which parses identifiers (type `Init.ident`). - "constr", which parses Coq terms and produces an-evar free term at runtime (type `Init.constr`). - "open_constr", which parses Coq terms and produces a term potentially with holes at runtime (type `Init.constr` as well). - "pattern", which parses Coq patterns and produces a pattern used for term matching (type `Init.pattern`). The following syntactic sugar is provided for two common cases. - `@id` is the same as ident:(id) - `'t` is the same as open_constr:(t) ## Term Antiquotations ### Syntax One can also insert Ltac2 code into Coq term, similarly to what was possible in Ltac1. ``` COQCONSTR ::= | "ltac2" ":" "(" TERM ")" ``` Antiquoted terms are expected to have type `unit`, as they are only evaluated for their side-effects. ### Semantics Interpretation of a quoted Coq term is done in two phases, internalization and evaluation. - Internalization is part of the static semantics, i.e. it is done at Ltac2 typing time. - Evaluation is part of the dynamic semantics, i.e. it is done when a term gets effectively computed by Ltac2. Remark that typing of Coq terms is a *dynamic* process occuring at Ltac2 evaluation time, and not at Ltac2 typing time. #### Static semantics During internalization, Coq variables are resolved and antiquotations are type-checked as Ltac2 terms, effectively producing a `glob_constr` in Coq implementation terminology. Note that although it went through the type-checking of **Ltac2**, the resulting term has not been fully computed and is potentially ill-typed as a runtime **Coq** term. ``` Ltac2 Definition myconstr () := constr:(nat -> 0). // Valid with type `unit -> constr`, but will fail at runtime. ``` Term antiquotations are type-checked in the enclosing Ltac2 typing context of the corresponding term expression. For instance, the following with type-check. ``` let x := '0 in constr:(1 + ltac2:(exact x)) // type constr ``` Beware that the typing environment of typing of antiquotations is **not** expanded by the Coq binders from the term. Namely, it means that the following Ltac2 expression will **not** type-check. ``` constr:(fun x : nat => ltac2:(exact x)) // Error: Unbound variable 'x' ``` There is a simple reason for that, which is that the following expression would not make sense in general. ``` constr:(fun x : nat => ltac2:(clear @x; exact x)) ``` Indeed, a hypothesis can suddenly disappear from the runtime context if some other tactic pulls the rug from under you. Rather, the tactic writer has to resort to the **dynamic** goal environment, and must write instead explicitly that she is accessing a hypothesis, typically as follows. ``` constr:(fun x : nat => ltac2:(exact (hyp @x))) ``` This pattern is so common that we provide dedicated Ltac2 and Coq term notations for it. - `&x` as an Ltac2 expression expands to `hyp @x`. - `&x` as an Coq constr expression expands to `ltac2:(refine (fun () => hyp @x))`. #### Dynamic semantics During evaluation, a quoted term is fully evaluated to a kernel term, and is in particular type-checked in the current environment. Evaluation of a quoted term goes as follows. - The quoted term is first evaluated by the pretyper. - Antiquotations are then evaluated in a context where there is exactly one goal under focus, with the hypotheses coming from the current environment extended with the bound variables of the term, and the resulting term is fed into the quoted term. Relative orders of evaluation of antiquotations and quoted term is not specified. For instance, in the following example, `tac` will be evaluated in a context with exactly one goal under focus, whose last hypothesis is `H : nat`. The whole expression will thus evaluate to the term `fun H : nat => nat`. ``` let tac () := hyp @H in constr:(fun H : nat => ltac2:(tac ())) ``` Many standard tactics perform type-checking of their argument before going further. It is your duty to ensure that terms are well-typed when calling such tactics. Failure to do so will result in non-recoverable exceptions. ## Patterns Terms can be used in pattern position just as any Ltac constructor. The accepted syntax is a subset of the constr syntax in Ltac term position. It does not allow antiquotations. Patterns quotations are typically used with the matching functions provided in the `Pattern` module. # Notations Notations are the crux of the usability of Ltac1. We should be able to recover a feeling similar to the old implementation by using and abusing notations. ## Scopes A scope is a name given to a grammar entry used to produce some Ltac2 expression at parsing time. Scopes are described using a form of S-expression. ``` SCOPE := | STRING | INT | LIDENT ( "(" SCOPE₀ "," ... "," SCOPEₙ ")" ) ``` A few scopes contain antiquotation features. For sake of uniformity, all antiquotations are introduced by the syntax `"$" VAR`. The following scopes are built-in. - constr: + parses `c = COQCONSTR` and produces `constr:(c)` - ident: + parses `id = IDENT` and produces `ident:(id)` + parses `"$" (x = IDENT)` and produces the variable `x` - list0(*scope*): + if *scope* parses `ENTRY`, parses ̀`(x₀, ..., xₙ = ENTRY*)` and produces `[x₀; ...; xₙ]`. - list0(*scope*, sep = STRING): + if *scope* parses `ENTRY`, parses `(x₀ = ENTRY, "sep", ..., "sep", xₙ = ENTRY)` and produces `[x₀; ...; xₙ]`. - list1: same as list0 (with or without separator) but parses `ENTRY+` instead of `ENTRY*`. - opt(*scope*) + if *scope* parses `ENTRY`, parses `ENTRY?` and produces either `None` or `Some x` where `x` is the parsed expression. - self: + parses a Ltac2 expression at the current level and return it as is. - next: + parses a Ltac2 expression at the next level and return it as is. - tactic(n = INT): + parses a Ltac2 expression at the provided level *n* and return it as is. - thunk(*scope*): + parses the same as *scope*, and if *e* is the parsed expression, returns `fun () => e`. - STRING: + parses the corresponding string as a CAMLP5 IDENT and returns `()`. - keyword(s = STRING): + parses the string *s* as a keyword and returns `()`. - terminal(s = STRING): + parses the string *s* as a keyword, if it is already a keyword, otherwise as an IDENT. Returns `()`. - seq(*scope₁*, ..., *scopeₙ*): + parses *scope₁*, ..., *scopeₙ* in this order, and produces a n-tuple made out of the parsed values in the same order. It is forbidden for the various subscopes to refer to the global entry using self of next. For now there is no way to declare new scopes from Ltac2 side, but this is planned. ## Notations The Ltac2 parser can be extended by syntactic notations. ``` VERNAC ::= | "Ltac2" "Notation" TOKEN+ LEVEL? ":=" TERM LEVEL := ":" INT TOKEN := | VAR "(" SCOPE ")" | STRING ``` A Ltac2 notation adds a parsing rule to the Ltac2 grammar, which is expanded to the provided body where every token from the notation is let-bound to the corresponding generated expression. For instance, assume we perform: ``` Ltac2 Notation "foo" c(thunk(constr)) ids(list0(ident)) := Bar.f c ids. ``` Then the following expression ``` let y := @X in foo (nat -> nat) x $y ``` will expand at parsing time to ``` let y := @X in let c := fun () => constr:(nat -> nat) with ids := [@x; y] in Bar.f c ids ``` Beware that the order of evaluation of multiple let-bindings is not specified, so that you may have to resort to thunking to ensure that side-effects are performed at the right time. ## Abbreviations There exists a special kind of notations, called abbreviations, that is designed so that it does not add any parsing rules. It is similar in spirit to Coq abbreviations, insofar as its main purpose is to give an absolute name to a piece of pure syntax, which can be transparently referred by this name as if it were a proper definition. Abbreviations are introduced by the following syntax. ``` VERNAC ::= | "Ltac2" "Notation" IDENT ":=" TERM ``` The abbreviation can then be manipulated just as a normal Ltac2 definition, except that it is expanded at internalization time into the given expression. Furthermore, in order to make this kind of construction useful in practice in an effectful language such as Ltac2, any syntactic argument to an abbreviation is thunked on-the-fly during its expansion. For instance, suppose that we define the following. ``` Ltac2 Notation foo := fun x => x (). ``` Then we have the following expansion at internalization time. ``` foo 0 ↦ (fun x => x ()) (fun _ => 0) ``` Note that abbreviations are not typechecked at all, and may result in typing errors after expansion. # TODO - Implement deep pattern-matching. - Implement compatibility layer with Ltac1 - Craft an expressive set of primitive functions - Implement native compilation to OCaml