| Age | Commit message (Collapse) | Author |
|
This makes it possible to skip the check when scanning the stack for the
garbage collector.
|
|
Signed primitive integers defined on top of the existing unsigned ones
with two's complement.
The module Sint63 includes the theory of signed primitive integers that
differs from the unsigned case.
Additions to the kernel:
les (signed <=), lts (signed <), compares (signed compare),
divs (signed division), rems (signed remainder),
asr (arithmetic shift right)
(The s suffix is not used when importing the Sint63 module.)
The printing and parsing of primitive ints was updated and the
int63_syntax_plugin was removed (we use Number Notation instead).
A primitive int is parsed / printed as unsigned or signed depending on
the scope. In the default (Set Printing All) case, it is printed in
hexadecimal.
|
|
caml_something_to_do.
Reviewed-by: gasche
Ack-by: ppedrot
Reviewed-by: xavierleroy
|
|
MAKEPROD is just MAKEBLOCK2(0), but one word shorter. Since this opcode is
never encountered in the fast path, this optimization is not worth the
extra complexity.
|
|
The generated bytecode almost never needs to modify the field of an OCaml
object. The only exception is the laziness mechanism of coinductive types.
But it modifies field 2, and thus uses the generic opcode SETFIELD anyway.
|
|
|
|
|
|
When using OCaml >= 4.10, function caml_process_pending_actions_exn is
called whenever the bytecode interpreter tries to apply a term. This
function exits immediately when caml_something_to_do is not set. But since
term application happens every few opcodes, even exiting immediately still
accounts between 5% and 10% of the whole interpreter. So, this commit
makes sure the function is not called unless caml_something_to_do is
already set (i.e., when the user presses Ctrl+C). This means that this
conditional branch is perfectly predicted by the processor.
On the following benchmark, this commit makes the VM 13% faster.
Time Eval vm_compute in Pos.iter (fun x => x) tt 100000000.
Note that, before OCaml 4.10, the VM code was checking the value of
caml_signals_are_pending before calling caml_process_pending_signals. So,
this commit actually fixes a regression.
|
|
Accumulators can grow arbitrarily large, even when well-typed. So, this
commit makes sure they are allocated on the major heap when they are too
large. If so, fields need to be filled with caml_initialize, in case they
point to the minor heap.
|
|
Unfortunately, compilers are currently unable to optimize the nextafter
function, even in the degenerate case where the second argument is
explicitly infinite. So, this commit implements this case by hand.
On the following testcase, this gives a 15% speedup.
From Coq Require Import Int63 BinPos PrimFloat.
Definition foo n :=
let eps := sub (next_up one) one in
Pos.iter (fun x => next_down (add x eps)) two n.
Time Eval vm_compute in foo 100000000.
And when looking at the cost of just the allocation-free version of
next_down, the speedup is 1500%. Said otherwise, the latency of next_down
is now on par with the measurement noise due to cache misses and the like.
|
|
operations.
Floating-point values are boxed, which means that any operation causes an
allocation. While short-lived, they nonetheless cause the minor heap to
fill, which in turn triggers the garbage collector.
To reduce the number of allocations, I initially went with a shadow stack
mechanism for storing floating-point values. But assuming the CoqInterval
library is representative, this was way too complicated in practice, as
most stack-located values ended up being passed to nextdown and nextup
before being stored in memory.
So, this commit implements a different mechanism. Variants of nextdown and
nextup are added, which reuse the allocation of their input argument.
Obviously, this is only correct if there is no other reference to this
argument. To ensure this property, the commit only uses these opcode
during a peephole optimization. If two primitive operations follow one
another, then the second one can reuse the allocation of the first one,
since it never had time to even reach the stack.
For CoqInterval, this divides the number of allocations due to
floating-point operations by about two.
The following snippet is made 4% faster by this commit (and 13% faster if
we consider only the floating-point operations).
From Coq Require Import Int63 BinPos PrimFloat.
Definition foo n :=
let eps := sub (next_up one) one in
Pos.iter (fun x => next_down (add x eps)) two n.
Time Eval vm_compute in foo 100000000.
|
|
|
|
Since the code is compiled in -fPIC mode, the compiler cannot inline the
functions, due to the ABI mandating the ability to interpose visible
symbols. Hiding the symbols of coq_float64.h would work, except that they
float64.ml needs to reference them. (See #13124 for more details.)
This commit improves performances by 7% on the following code.
From Coq Require Import Int63 BinPos PrimFloat.
Definition foo n :=
let two := of_int63 2 in
Pos.iter (fun x => sub (mul x two) two) two n.
Time Eval vm_compute in foo 100000000.
If we consider only the floating-point operations (by ignoring the cost of
the loop), the speedup is actually 30%.
|
|
This does not make the code any slower, since
Is_coq_array(accu) && Is_uint63(sp[0])
and
!Is_accu(accu) && !Is_accu(sp[0])
take the exact same number of tests to pass in the concrete case.
In the accumulator case, it takes one more test to fail, but we do not
care about the performances then.
|
|
|
|
|
|
|
|
1. There is no point in marking plain integers as GC roots.
2. There is no need to restore the stack pointer, as the stack is not
allocated on the OCaml heap (contrarily to coq_env).
|
|
The first field of accumulators points out of the OCaml heap, so using
closures instead of tag-0 objects is better for the GC.
Accumulators are distinguished from closures because their code pointer
necessarily is the "accumulate" address, which points to an ACCUMULATE
instruction.
|
|
That way, accumulators again can be used directly as execution
environments by the bytecode interpreter. This fixes the issue of the
first argument of accumulators being dropped when strongly normalizing.
|
|
The second field of a closure can no longer be the value of the first free
variable (or another closure of a mutually recursive block) but must be an
offset to the first free variable.
This commit makes the bytecode compiler and interpreter agnostic to the
actual representation of closures. To do so, the execution environment
(variable coq_env) no longer points to the currently executed closure but
to the last one. This has the following consequences:
- OFFSETCLOSURE(n) now always loads the n-th closure of a recursive block
(counted from last to first);
- ENVACC(n) now always loads the value of the n-th free variable.
These two changes make the bytecode compiler simpler, since it no
longer has to track the relative position of closures and free variables.
The last change makes the interpreter a bit slower, since it has to adjust
coq_env when executing GRABREC. Hopefully, cache locality will make the
overhead negligible.
|
|
No need to store the case_info, all the data is reconstructible from the
context. Furthermore, this reconstruction is performed in a context where
we already access the environment, so performance is not at stake.
Hopefully this will also reduce the number of globally allocated VM values,
since the switch representation now only depends on the shape of the inductive
type.
|
|
Persistent arrays expose a functional interface but are implemented
using an imperative data structure. The OCaml implementation is based on
Jean-Christophe Filliâtre's.
Co-authored-by: Benjamin Grégoire <Benjamin.Gregoire@inria.fr>
Co-authored-by: Gaëtan Gilbert <gaetan.gilbert@skyskimmer.net>
|
|
This commit also prefixes young_ptr and young_limit along the way, so as
to not rely on OCaml's compatibility layer. This is a gratuitous change,
since this code is only meant to be compiled with OCaml < 4.10 anyway.
|
|
We cannot use caml_alloc_small because the macros Setup_for_gc and
Restore_after_gc are still relevant (and critical). This means defining
the CAML_INTERNALS macro, but it is a legit use and actually documented
in the OCaml manual.
This will help with forward compatibility with OCaml compilers, e.g.,
issue #10603. Unfortunately, it also means that we can no longer use #9914
to prevent memory corruption.
The old macro is still used for OCaml versions prior to 4.10, as the
upstream macro might process Ctrl+C when it is called.
|
|
|
|
|
|
Issue #10603
|
|
|
|
This is a follow up of #11010 ; I've realized that for example in #11123
a large part of the patch is detabification as indeed the files are
mixed in tabs/space style so developers are forced to do the cleanup
each time they work on them.
Command used:
```
for i in `find . -name '*.c' -or -name '*.h'; do expand -i "$i" | sponge "$i"; sed -e's/[[:space:]]*$//' -i.bak "$i"; done
```
Checked empty diff with `git diff --ignore-all-space`
|
|
* Fix the implementations and add tests
* Change shift from int63 to Z (was always used as a Z)
* Update FloatLemmas.v accordingly
Co-authored-by: Erik Martin-Dorel <erik.martin-dorel@irit.fr>
|
|
* Add a related test-suite in compare.v (generated by a bash script)
Co-authored-by: Pierre Roux <pierre.roux@onera.fr>
|
|
|
|
Flag -fexcess-precision=standard is not enough on x86_32
where -msse2 -mfpmath=sse is required (-msse is not enough)
to avoid double rounding issues in the VM.
Most floating-point operation are now implemented in C because OCaml
is suffering double rounding issues on x86_32 with 80 bits extended
precision registers used for floating-point values, causing double
rounding making floating-point arithmetic incorrect with respect to
its specification.
Add a runtime test for double roundings.
|
|
|
|
|
|
Replace `option comparison` with `float_comparison` (:= `FEq | FLt |
FGt | FNotComparable`) as suggested by Guillaume Melquiond to avoid
boxing and an extra match when using primitive float comparison.
|
|
* This commit add float instructions to the VM, their encoding in bytecode
and the interpretation of primitive float values after the reduction.
* The flag '-std=c99' could be added to the C compiler flags to ensure
that float computation strictly follows the norm (ie. i387 80-bits
format is not used as an optimization).
Actually, we use '-fexcess-precision=standard' instead of '-std=c99'
because the latter would disable GNU asm used in the VM.
|
|
Reviewed-by: maximedenes
|
|
|
|
There are three implementations of this primitive:
* one in OCaml on 63 bits integer in kernel/uint63_amd64.ml
* one in OCaml on Int64 in kernel/uint63_x86.ml
* one in C on unsigned 64 bit integers in kernel/byterun/coq_uint63_native.h
Its specification is the axiom `diveucl_21_spec` in
theories/Numbers/Cyclic/Int63/Int63.v
* comment the implementations with loop invariants to enable an easy
pen&paper proof of correctness (note to reviewers: the one in
uint63_amd64.ml might be the easiest to read)
* make sure the three implementations are equivalent
* fix the specification in Int63.v
(only the lowest part of the result is actually returned)
* make a little optimisation in div21 enabled by the proof of correctness
(cmp is computed at the end of the first loop rather than at the beginning,
potentially saving one loop iteration while remaining correct)
* update the proofs in Int63.v and Cyclic63.v to take into account the
new specifiation of div21
* add a test
|
|
Backport
https://github.com/ocaml/ocaml/commit/71b94fa3e8d73c40e298409fa5fd6501383d38a6
and
https://github.com/ocaml/ocaml/commit/d3e86fdfcc8f40a99380303f16f9b782233e047e
from OCaml VM
|
|
Backport
https://github.com/ocaml/ocaml/commit/c6ce97fe26e149d43ee2cf71ca821a4592ce1785
from OCaml VM
|
|
Backport
https://github.com/ocaml/ocaml/commit/eb1922c6ab88e832e39ba3972fab619081061928
from OCaml VM
|
|
Backport
https://github.com/ocaml/ocaml/commit/055d5c0379e42b4f561cb1fc5159659d8e9a7b6f
from OCaml VM
|
|
Backport
https://github.com/ocaml/ocaml/commit/bc333918980b97a2c81031ec33e72a417f854376
from OCaml VM
|
|
Protect accu and coq_env against GC calls in the VM when calling
primitive integer functions on 32 bits architecture.
|
|
This work makes it possible to take advantage of a compact
representation for integers in the entire system, as opposed to only
in some reduction machines. It is useful for heavily computational
applications, where even constructing terms is not possible without such
a representation.
Concretely, it replaces part of the retroknowledge machinery with
a primitive construction for integers in terms, and introduces a kind of
FFI which maps constants to operators (on integers). Properties of these
operators are expressed as explicit axioms, whereas they were hidden in
the retroknowledge-based approach.
This has been presented at the Coq workshop and some Coq Working Groups,
and has been used by various groups for STM trace checking,
computational analysis, etc.
Contributions by Guillaume Bertholon and Pierre Roux <Pierre.Roux@onera.fr>
Co-authored-by: Benjamin Grégoire <Benjamin.Gregoire@inria.fr>
Co-authored-by: Vincent Laporte <Vincent.Laporte@fondation-inria.fr>
|
|
|
|
This prevents the existence of a few naked pointers to C heap from the OCaml
heap. VM bytecode is represented as any block of size at least 1 whose first
field points to a C-allocated string.
This representation is compatible with the Coq VM representation of
(potentially recursive) closures, which are already specifically tailored
in the OCaml GC to be able to contain out-of-heap data.
|