Ack-by: ppedrot
Reviewed-by: proux01
|
Reviewed-by: ppedrot
|
Reviewed-by: ppedrot
|
information constructors
Reviewed-by: SkySkimmer
|
Unfortunately, compilers are currently unable to optimize the nextafter
function, even in the degenerate case where the second argument is
explicitly infinite. So, this commit implements this case by hand.
On the following test case, this gives a 15% speedup:
    From Coq Require Import Int63 BinPos PrimFloat.
    Definition foo n :=
      let eps := sub (next_up one) one in
      Pos.iter (fun x => next_down (add x eps)) two n.
    Time Eval vm_compute in foo 100000000.
When looking only at the cost of the allocation-free version of
next_down, the speedup is 1500%. In other words, the latency of next_down
is now on par with the measurement noise due to cache misses and the like.
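As an illustration of what handling the infinite case by hand can look like, here is a minimal C sketch, assuming IEEE-754 binary64 doubles whose bit pattern can be reinterpreted as a 64-bit integer. The names `next_up_sketch` and `next_down_sketch` are hypothetical and not the functions from the Coq sources; they are meant to agree with `nextafter(x, INFINITY)` and `nextafter(x, -INFINITY)` respectively.

```c
#include <math.h>    /* isnan, INFINITY */
#include <stdint.h>  /* uint64_t */
#include <string.h>  /* memcpy */

/* Reinterpret a binary64 double as its raw 64-bit pattern and back. */
static uint64_t to_bits(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Hypothetical hand-written equivalent of nextafter(x, INFINITY). */
double next_up_sketch(double x) {
    if (isnan(x) || x == INFINITY) return x;  /* NaN and +inf map to themselves */
    if (x == 0.0) return from_bits(1);        /* +0 and -0 step to the smallest subnormal */
    uint64_t u = to_bits(x);
    /* Consecutive bit patterns of the same sign are consecutive values:
       incrementing moves a positive number up (away from zero), and
       decrementing moves a negative number up (towards zero). */
    return from_bits(x > 0.0 ? u + 1 : u - 1);
}

/* Hypothetical hand-written equivalent of nextafter(x, -INFINITY),
   using the identity next_down(x) == -next_up(-x). */
double next_down_sketch(double x) { return -next_up_sketch(-x); }
```

The special cases are only NaN, +infinity and signed zeros; every other input is a single integer increment or decrement, which is why such a specialized version can be much cheaper than a general nextafter call.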
|
operations.
Floating-point values are boxed, which means that any operation causes an
allocation. While short-lived, these allocations nonetheless fill the minor
heap, which in turn triggers the garbage collector.
To reduce the number of allocations, I initially went with a shadow stack
mechanism for storing floating-point values. But assuming the CoqInterval
library is representative, this was way too complicated in practice, as
most stack-located values ended up being passed to nextdown and nextup
before being stored in memory.
So, this commit implements a different mechanism. Variants of nextdown and
nextup are added, which reuse the allocation of their input argument.
Obviously, this is only correct if there is no other reference to this
argument. To ensure this property, the commit only uses these opcodes
during a peephole optimization: if two primitive operations follow one
another, then the second one can reuse the allocation of the first one's
result, since that result never had time to even reach the stack.
For CoqInterval, this divides the number of allocations due to
floating-point operations by about two.
The following snippet is made 4% faster by this commit (and 13% faster if
we consider only the floating-point operations).
    From Coq Require Import Int63 BinPos PrimFloat.
    Definition foo n :=
      let eps := sub (next_up one) one in
      Pos.iter (fun x => next_down (add x eps)) two n.
    Time Eval vm_compute in foo 100000000.
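The peephole idea can be sketched in C under heavy simplifying assumptions: boxed floats are modelled as `malloc`ed structs (never freed, since there is no GC here), the VM is a toy accumulator machine, and the opcode names such as `OP_NEXT_DOWN_REUSE` are hypothetical rather than the actual Coq opcodes. The point is only the correctness condition: a `NEXT_DOWN` may overwrite its operand's box when the immediately preceding instruction created that box, but not when the operand was loaded from a variable that may be shared.

```c
#include <math.h>    /* isnan, INFINITY */
#include <stdint.h>  /* uint64_t */
#include <stdlib.h>  /* malloc, abort */
#include <string.h>  /* memcpy */

/* Toy stand-in for a boxed OCaml float: every fresh result needs a box. */
typedef struct { double val; } boxed_float;

static int allocations = 0;  /* number of boxes created so far */

/* Allocate a fresh box. There is no GC in this sketch, so boxes leak. */
static boxed_float *box(double v) {
    boxed_float *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->val = v;
    allocations++;
    return b;
}

/* Largest double strictly below x (bit-pattern trick, binary64 assumed). */
static double next_down(double x) {
    if (isnan(x) || x == -INFINITY) return x;
    if (x == 0.0) return -5e-324;  /* smallest negative subnormal */
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    u = (x > 0.0) ? u - 1 : u + 1;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Hypothetical opcodes of a toy accumulator machine. */
typedef enum { OP_LOAD, OP_ADD, OP_NEXT_DOWN, OP_NEXT_DOWN_REUSE } opcode;

/* Does this instruction leave a freshly allocated, unshared box in accu? */
static int fresh_result(opcode op) {
    return op == OP_ADD || op == OP_NEXT_DOWN || op == OP_NEXT_DOWN_REUSE;
}

/* Peephole pass: a NEXT_DOWN right after an instruction that produced a
   fresh box may overwrite that box, since nothing else can reference it. */
static void peephole(opcode *code, int n) {
    for (int i = 1; i < n; i++)
        if (code[i] == OP_NEXT_DOWN && fresh_result(code[i - 1]))
            code[i] = OP_NEXT_DOWN_REUSE;
}

/* Execute code on accumulator accu; OP_LOAD reloads the shared input x. */
static boxed_float *run(const opcode *code, int n, boxed_float *x, double eps) {
    boxed_float *accu = x;
    for (int i = 0; i < n; i++) {
        switch (code[i]) {
        case OP_LOAD:            accu = x; break;                          /* shared box */
        case OP_ADD:             accu = box(accu->val + eps); break;       /* fresh box */
        case OP_NEXT_DOWN:       accu = box(next_down(accu->val)); break;  /* fresh box */
        case OP_NEXT_DOWN_REUSE: accu->val = next_down(accu->val); break;  /* in place */
        }
    }
    return accu;
}
```

On the body `next_down (add x eps)` from the snippet above, encoded as `LOAD; ADD; NEXT_DOWN`, this toy pass goes from two boxes to one, while a `NEXT_DOWN` applied directly to a loaded variable keeps allocating because that box may be shared.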
|
Since the code is compiled in -fPIC mode, the compiler cannot inline the
functions, because the ABI mandates the ability to interpose visible
symbols. Hiding the symbols of coq_float64.h would work, except that
float64.ml needs to reference them. (See #13124 for more details.)
This commit improves performance by 7% on the following code:
    From Coq Require Import Int63 BinPos PrimFloat.
    Definition foo n :=
      let two := of_int63 2 in
      Pos.iter (fun x => sub (mul x two) two) two n.
    Time Eval vm_compute in foo 100000000.
If we consider only the floating-point operations (by ignoring the cost of
the loop), the speedup is actually 30%.
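A common workaround, shown here as a sketch rather than as what this commit necessarily does (the function names are hypothetical), is to keep the computation in a `static inline` function. A static function cannot be interposed by another shared object, so the compiler is free to inline it into the interpreter even under -fPIC, while a separate non-static wrapper keeps an exported symbol for external callers such as the OCaml side.

```c
/* Internal version: static linkage means no other shared object can
   interpose it, so the compiler may inline it even under -fPIC. */
static inline double fmul_internal(double x, double y) {
    return x * y;
}

/* Exported wrapper: external code calls this symbol, which remains
   interposable, but the interpreter never goes through it. */
double coq_fmul_exported(double x, double y) {
    return fmul_internal(x, y);
}

/* Stand-in for one step of the interpreter loop in the benchmark above
   (sub (mul x two) two): it calls the internal version directly. */
double interp_step(double x, double two) {
    return fmul_internal(x, two) - two;
}
```

GCC also provides `-fno-semantic-interposition` and `__attribute__((visibility("hidden")))` to relax the interposition constraint globally or per symbol; the wrapper approach keeps an exported symbol available while still letting internal calls be inlined.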
|
This does not make the code any slower, since
    Is_coq_array(accu) && Is_uint63(sp[0])
and
    !Is_accu(accu) && !Is_accu(sp[0])
take exactly the same number of tests to pass in the concrete case.
In the accumulator case, it takes one more test to fail, but we do not
care about performance then.
|
Otherwise, these constructs would be followed by a spurious Kreturn
opcode, when in tail position.
|
1. There is no point in marking plain integers as GC roots.
2. There is no need to restore the stack pointer, as the stack is not
   allocated on the OCaml heap (unlike coq_env).
|
Fix #13354
This change is very specific to arrays, but it should not be a significant
obstacle to generalizing the feature to, e.g., axioms later if we want to.
|
BTW it was incorrect (array needs an instance)
|
not only on subidentifiers of an identifier
Reviewed-by: Zimmi48
|
There was no longer any difference between those after the preceding
cleanup patches.
|
It was a hidden invariant of the code.
|
Polymorphic side-effects generated in monomorphic mode would be counted towards
trusted subcomponents. This would allow ill-typed terms to pass as
legitimate by mimicking the shape of the inlining of monomorphic side-effects in
such a proof.
|
Instead, we store that data in the native code that was generated, and adapt
the compilation scheme accordingly. Fewer indirections and less imperative
tinkering make the code safer.
The global symbol table was originally introduced in #10359 as a way not to
depend on the Global module in the generated code. By storing all the
native-related information in the cmxs file itself, this PR also makes other
changes easier, such as #13287.
|
are in custom output path
Reviewed-by: maximedenes
Reviewed-by: herbelin
|
No need to zip the stack if the machine has made no progress.
|
Reviewed-by: SkySkimmer
|
Reviewed-by: ppedrot
|
A float is by no means a neutral value: when put inside a Zprimitive
context, it can trigger computation.
|
A partially applied primitive was considered CClosure.Norm, i.e. neutral. But
this is not true, because substituting this term as the head of an application
may trigger further reduction. In this respect, primitive functions behave like
fixpoints.
|
Not sure if we can get a bug from this omission.
|
We introduce a module type so as not to have to redeclare CanOrd, UserOrd
and SyntacticOrd all over the place.
|
This is similar to Constant and MutInd, but for some reason this was never
done. Such a patch makes the whole API more regular. We also deprecate the
legacy aliases.
|
This makes it possible to quickly spot the parts of the code that rely on
the canonical ordering. When possible, we directly introduce the
quotient-aware versions.
|
For now it does not do anything, but eventually it should replace the
reliance on canonical names for dual kerpairs such as constants and
inductive types.
|
Reviewed-by: maximedenes