aboutsummaryrefslogtreecommitdiff
path: root/dev/doc/profiling.txt
diff options
context:
space:
mode:
Diffstat (limited to 'dev/doc/profiling.txt')
-rw-r--r--dev/doc/profiling.txt128
1 files changed, 128 insertions, 0 deletions
diff --git a/dev/doc/profiling.txt b/dev/doc/profiling.txt
new file mode 100644
index 0000000000..8455d13377
--- /dev/null
+++ b/dev/doc/profiling.txt
@@ -0,0 +1,128 @@
+# How to profile Coq?
+
+I (Pierre-Marie Pédrot) mainly use two OCaml branches to profile Coq, whether I
+want to profile time or memory consumption. AFAIK, this only works for Linux.
+
+## Time
+
+In Coq source folder:
+
+opam switch 4.05.0+trunk+fp
+./configure -local -debug
+make
+perf record -g bin/coqc file.v
+perf report -g fractal,callee --no-children
+
+To profile only part of a file, first load it using
+
+bin/coqtop -l file.v
+
+and plug into the process
+
+perf record -g -p PID
+
+### Per-component [flame graphs](https://github.com/brendangregg/FlameGraph)
+
+I (Andres Erbsen) have found it useful to look at library-wide flame graphs of
+coq time consumption. As the Ltac interpreter stack is reflected in the OCaml
+stack, calls to the same primitive can appear on top of multiple essentially
+equivalent stacks. To make the profiles more readable, one could either try to
+edit the stack trace to merge "equivalent" frames, or simply look at the
+aggregate profile on a component-by-component basis. Here is how to do the
+second for the standard library ([example output](https://cdn.rawgit.com/andres-erbsen/b29b29cb6480dfc6a662062e4fcd0ae3/raw/304fc3fea9630c8e453929aa7920ca8a2a570d0b/stdlib_categorized_outermost.svg)).
+
+~~~~~
+#!/bin/bash
+make -f Makefile.dune clean
+make -f Makefile.dune states
+perf record -F99 `# ~1GB of data` --call-graph=dwarf -- make -f Makefile.dune world
+perf script --time '0%-100%' |
+ stackcollapse-perf.pl |
+ grep Coqtop__compile |
+ sed -rf <(cat <<'EOF'
+ s/;caml/;/g
+ s/_[0-9]*;/;/g
+ s/Logic_monad__fun;//g
+ s/_apply[0-9];//g
+ s/;System/@&@/
+ s/;Hashcons/@&@/
+ s/;Grammar/@&@/
+ s/;Declaremods/@&@/
+ s/;Tactics/@&@/
+ s/;Pretyping/@&@/
+ s/;Typeops/@&@/
+ s/;Reduction/@&@/
+ s/;Unification/@&@/
+ s/;Evarutil/@&@/
+ s/;Evd/@&@/
+ s/;EConstr/@&@/
+ s/;Constr/@&@/
+ s/;Univ/@&@/
+ s/;Ugraph/@&@/
+ s/;UState/@&@/
+ s/;Micromega/@&@/
+ s/;Omega/@&@/
+ s/;Auto/@&@/
+ s/;Ltac_plugin__Tacinterp/@&@/
+ s/;Ltac_plugin__Rewrite/@&@/
+ s/[^@]*@;([^@]*)@/\1;\1/
+ s/@//g
+ :a; s/;([^;]+);\1;/;\1;/g;ta
+EOF
+ ) |
+ flamegraph.pl
+~~~~~
+
+## Memory
+
+You first need a few commits atop trunk for this to work.
+
+git remote add ppedrot https://github.com/ppedrot/coq.git
+git fetch ppedrot
+git checkout ppedrot/allocation-profiling
+git rebase master
+
+Then:
+
+opam switch 4.00.1+alloc-profiling
+./configure -local -debug
+make
+
+Note that linking the coqtop binary takes quite an amount of time with this
+branch, so do not worry too much. There are more recent branches of
+alloc-profiling on mshinwell's repo which can be found at:
+
+https://github.com/mshinwell/opam-repo-dev
+
+### For memory dump:
+
+CAMLRUNPARAM=T,mj bin/coqc file.v
+
+In another terminal:
+
+pkill -SIGUSR1 $COQTOPPID
+...
+pkill -SIGUSR1 $COQTOPPID
+dev/decode-major-heap.sh heap.$COQTOPPID.$N bin/coqtop
+
+where $COQTOPPID is coqtop pid and $N the index of the call to pkill.
+
+First column is the memory taken by the objects (in words), second one is the
+number of objects and third is the place where the objects where allocated.
+
+### For complete memory graph:
+
+CAMLRUNPARAM=T,gr bin/coqc file.v
+
+In another terminal:
+
+pkill -SIGUSR1 $COQTOPPID
+...
+pkill -SIGUSR1 $COQTOPPID
+ocaml dev/decodegraph.ml edge.$COQTOPPID.$N bin/coqtop > memory.dot
+dot -Tpdf -o memory.pdf memory.dot
+
+where $COQTOPPID is coqtop pid and $N the index of the call to pkill.
+
+The pdf produced by the last command gives a compact graphical representation of
+the various objects allocated.