diff options
Diffstat (limited to 'dev/doc/profiling.txt')
| -rw-r--r-- | dev/doc/profiling.txt | 128 |
1 files changed, 128 insertions, 0 deletions
diff --git a/dev/doc/profiling.txt b/dev/doc/profiling.txt new file mode 100644 index 0000000000..8455d13377 --- /dev/null +++ b/dev/doc/profiling.txt @@ -0,0 +1,128 @@ +# How to profile Coq? + +I (Pierre-Marie Pédrot) mainly use two OCaml branches to profile Coq, whether I +want to profile time or memory consumption. AFAIK, this only works for Linux. + +## Time + +In Coq source folder: + +opam switch 4.05.0+trunk+fp +./configure -local -debug +make +perf record -g bin/coqc file.v +perf report -g fractal,callee --no-children + +To profile only part of a file, first load it using + +bin/coqtop -l file.v + +and plug into the process + +perf record -g -p PID + +### Per-component [flame graphs](https://github.com/brendangregg/FlameGraph) + +I (Andres Erbsen) have found it useful to look at library-wide flame graphs of +coq time consumption. As the Ltac interpreter stack is reflected in the OCaml +stack, calls to the same primitive can appear on top of multiple essentially +equivalent stacks. To make the profiles more readable, one could either try to +edit the stack trace to merge "equivalent" frames, or simply look at the +aggregate profile on a component-by-component basis. Here is how to do the +second for the standard library ([example output](https://cdn.rawgit.com/andres-erbsen/b29b29cb6480dfc6a662062e4fcd0ae3/raw/304fc3fea9630c8e453929aa7920ca8a2a570d0b/stdlib_categorized_outermost.svg)). + +~~~~~ +#!/bin/bash +make -f Makefile.dune clean +make -f Makefile.dune states +perf record -F99 `# ~1GB of data` --call-graph=dwarf -- make -f Makefile.dune world +perf script --time '0%-100%' | + stackcollapse-perf.pl | + grep Coqtop__compile | + sed -rf <(cat <<'EOF' + s/;caml/;/g + s/_[0-9]*;/;/g + s/Logic_monad__fun;//g + s/_apply[0-9];//g + s/;System/@&@/ + s/;Hashcons/@&@/ + s/;Grammar/@&@/ + s/;Declaremods/@&@/ + s/;Tactics/@&@/ + s/;Pretyping/@&@/ + s/;Typeops/@&@/ + s/;Reduction/@&@/ + s/;Unification/@&@/ + s/;Evarutil/@&@/ + s/;Evd/@&@/ + s/;EConstr/@&@/ + s/;Constr/@&@/ + s/;Univ/@&@/ + s/;Ugraph/@&@/ + s/;UState/@&@/ + s/;Micromega/@&@/ + s/;Omega/@&@/ + s/;Auto/@&@/ + s/;Ltac_plugin__Tacinterp/@&@/ + s/;Ltac_plugin__Rewrite/@&@/ + s/[^@]*@;([^@]*)@/\1;\1/ + s/@//g + :a; s/;([^;]+);\1;/;\1;/g;ta +EOF + ) | + flamegraph.pl +~~~~~ + +## Memory + +You first need a few commits atop trunk for this to work. + +git remote add ppedrot https://github.com/ppedrot/coq.git +git fetch ppedrot +git checkout ppedrot/allocation-profiling +git rebase master + +Then: + +opam switch 4.00.1+alloc-profiling +./configure -local -debug +make + +Note that linking the coqtop binary takes quite an amount of time with this +branch, so do not worry too much. There are more recent branches of +alloc-profiling on mshinwell's repo which can be found at: + +https://github.com/mshinwell/opam-repo-dev + +### For memory dump: + +CAMLRUNPARAM=T,mj bin/coqc file.v + +In another terminal: + +pkill -SIGUSR1 $COQTOPPID +... +pkill -SIGUSR1 $COQTOPPID +dev/decode-major-heap.sh heap.$COQTOPPID.$N bin/coqtop + +where $COQTOPPID is coqtop pid and $N the index of the call to pkill. + +First column is the memory taken by the objects (in words), second one is the +number of objects and third is the place where the objects where allocated. + +### For complete memory graph: + +CAMLRUNPARAM=T,gr bin/coqc file.v + +In another terminal: + +pkill -SIGUSR1 $COQTOPPID +... +pkill -SIGUSR1 $COQTOPPID +ocaml dev/decodegraph.ml edge.$COQTOPPID.$N bin/coqtop > memory.dot +dot -Tpdf -o memory.pdf memory.dot + +where $COQTOPPID is coqtop pid and $N the index of the call to pkill. + +The pdf produced by the last command gives a compact graphical representation of +the various objects allocated. |
