| Age | Commit message (Collapse) | Author |
|
MAKEPROD is just MAKEBLOCK2(0), but one word shorter. Since this opcode is
never encountered in the fast path, this optimization is not worth the
extra complexity.
|
|
The generated bytecode almost never needs to modify the field of an OCaml
object. The only exception is the laziness mechanism of coinductive types.
But it modifies field 2, and thus uses the generic opcode SETFIELD anyway.
|
|
|
|
|
|
operations.
Floating-point values are boxed, which means that any operation causes an
allocation. While short-lived, they nonetheless cause the minor heap to
fill, which in turn triggers the garbage collector.
To reduce the number of allocations, I initially went with a shadow stack
mechanism for storing floating-point values. But assuming the CoqInterval
library is representative, this was way too complicated in practice, as
most stack-located values ended up being passed to nextdown and nextup
before being stored in memory.
So, this commit implements a different mechanism. Variants of nextdown and
nextup are added, which reuse the allocation of their input argument.
Obviously, this is only correct if there is no other reference to this
argument. To ensure this property, the commit only uses these opcode
during a peephole optimization. If two primitive operations follow one
another, then the second one can reuse the allocation of the first one,
since it never had time to even reach the stack.
For CoqInterval, this divides the number of allocations due to
floating-point operations by about two.
The following snippet is made 4% faster by this commit (and 13% faster if
we consider only the floating-point operations).
From Coq Require Import Int63 BinPos PrimFloat.
Definition foo n :=
let eps := sub (next_up one) one in
Pos.iter (fun x => next_down (add x eps)) two n.
Time Eval vm_compute in foo 100000000.
|
|
This does not make the code any slower, since
Is_coq_array(accu) && Is_uint63(sp[0])
and
!Is_accu(accu) && !Is_accu(sp[0])
take the exact same number of tests to pass in the concrete case.
In the accumulator case, it takes one more test to fail, but we do not
care about the performances then.
|
|
|
|
|
|
The second field of a closure can no longer be the value of the first free
variable (or another closure of a mutually recursive block) but must be an
offset to the first free variable.
This commit makes the bytecode compiler and interpreter agnostic to the
actual representation of closures. To do so, the execution environment
(variable coq_env) no longer points to the currently executed closure but
to the last one. This has the following consequences:
- OFFSETCLOSURE(n) now always loads the n-th closure of a recursive block
(counted from last to first);
- ENVACC(n) now always loads the value of the n-th free variable.
These two changes make the bytecode compiler simpler, since it no
longer has to track the relative position of closures and free variables.
The last change makes the interpreter a bit slower, since it has to adjust
coq_env when executing GRABREC. Hopefully, cache locality will make the
overhead negligible.
|
|
|
|
Persistent arrays expose a functional interface but are implemented
using an imperative data structure. The OCaml implementation is based on
Jean-Christophe Filliâtre's.
Co-authored-by: Benjamin Grégoire <Benjamin.Gregoire@inria.fr>
Co-authored-by: Gaëtan Gilbert <gaetan.gilbert@skyskimmer.net>
|
|
Add headers to a few files which were missing them.
|
|
* Add a related test-suite in compare.v (generated by a bash script)
Co-authored-by: Pierre Roux <pierre.roux@onera.fr>
|
|
|
|
|
|
* This commit add float instructions to the VM, their encoding in bytecode
and the interpretation of primitive float values after the reduction.
* The flag '-std=c99' could be added to the C compiler flags to ensure
that float computation strictly follows the norm (ie. i387 80-bits
format is not used as an optimization).
Actually, we use '-fexcess-precision=standard' instead of '-std=c99'
because the latter would disable GNU asm used in the VM.
|
|
|
|
Files kernel/copcodes.ml, kernel/byterun/coq_instruct.h, and
kernel/byterun/coq_jumptbl.h are generated by a simple OCaml program
rather than a pipeline of sed and awk text processing.
|