% Source: commit 5c74a15 (parent 8be983a) by tmcgilchrist, Nov 6, 2024,
% "fixup! Initial sections for profiling with perf and native debugging",
% file manual/src/cmds/profile-perf.etex
\chapter{Profiling (perf)} \label{c:profiler-perf}
%HEVEA\cutname{profiler-perf.html}

This chapter describes how to use \texttt{perf}, the Linux Performance Events tools, to profile OCaml programs.

Linux Performance Events (\texttt{perf(1)}) is a set of tools for performance observability, including CPU performance counter profiling and static and dynamic tracing. The main features covered here are \texttt{perf-record(1)} for recording events and \texttt{perf-report(1)} for printing and visualising recorded events. Perf has many other features for profiling and visualising performance; see \texttt{perf(1)}, the \href{https://perfwiki.github.io/main/}{Perf wiki} or \href{https://www.brendangregg.com/perf.html}{Brendan Gregg's Blog} for more general documentation.

\section{s:ocamlperf-compiling}{Compiling for profiling}

How to compile an executable for time profiling depends on which version of OCaml is being used. For OCaml versions 4.14 and earlier, either frame pointers or DWARF can be used. For OCaml 5.0 onwards, it is recommended to compile with frame pointers enabled.

To use frame pointers, you must configure the compiler with \texttt{--enable-frame-pointers}. You can also install an opam switch with frame pointers enabled as follows:

\begin{verbatim}
opam switch create <YOUR-SWITCH-NAME-HERE> ocaml-option-fp
\end{verbatim}

Frame pointer support for OCaml is available for the x86_64 architecture on Linux from OCaml 4.12 onwards, and on macOS from OCaml 5.3. Support for the arm64 architecture is available on Linux and macOS from OCaml 5.4. The other Tier-1 architectures (POWER, RISC-V and s390x) are currently unsupported.

OCaml 5 requires frame pointers because of the non-contiguous stacks used to implement effects, described in \href{https://dl.acm.org/doi/10.1145/3453483.3454039}{Retrofitting effect handlers onto OCaml} (Section 5.5). These non-contiguous stacks do not work with the stack-copying approach perf uses for DWARF-based unwinding, so traces produced for OCaml 5 without frame pointers enabled will be truncated and inaccurate. Traces will also be truncated if your Linux distribution does not enable frame pointers for the libraries that OCaml links against.

\section{s:ocamlperf-profiling}{Profiling an execution}

The basic \texttt{perf} command is:
\begin{verbatim}
perf record -F 99 --call-graph fp <YOUR_EXECUTABLE>
\end{verbatim}

The \texttt{-F 99} option tells perf to sample at 99 Hz, which avoids generating excessive data for longer runs and is unlikely to be in lockstep with other periodic activities. The \texttt{--call-graph fp} option instructs perf to use frame pointers to reconstruct the call graph, and the last argument is the OCaml executable you want to profile. This creates a \texttt{perf.data} file in the current directory; alternatively, use \texttt{--output} to choose a more descriptive filename.
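
As a rough worked example, assume a single-threaded program that is running on the CPU for the whole of a 60-second run. The number of samples $N$ is then approximately the sampling frequency $F$ times the profiled CPU time $T$:
\[
N \approx F \times T = 99\,\mathrm{Hz} \times 60\,\mathrm{s} = 5940,
\]
that is, a few thousand call stacks: enough for a statistical picture of where time is spent without producing an excessive amount of data. A program with several busy threads produces roughly this many samples per thread.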

With frame pointers, the kernel walks the chain of saved frame pointers at every sample and \texttt{perf record} stores the resulting return addresses in a \texttt{perf.data} file; with DWARF, \texttt{perf record} instead copies a segment of the call stack at every sample. In both cases the samples are processed after recording has finished, using \texttt{perf report} to reconstruct the profiled program's call stack at every sample.

\texttt{perf record --call-graph} has multiple options available for recording call graphs:
\begin{itemize}
\item \texttt{fp}: frame pointers, which is the default
\item \texttt{dwarf}: DWARF's Call Frame Information (CFI)
\item \texttt{lbr}: hardware Last Branch Record (LBR), available on certain Intel CPUs
\end{itemize}
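
To make the frame-pointer option more concrete, the following OCaml sketch models frame-pointer unwinding on a toy memory. It is not perf's implementation: the word-indexed array, its layout (slot \texttt{fp} holds the caller's saved frame pointer and slot \texttt{fp + 1} the return address, loosely following x86_64, with 0 ending the chain) and the function \texttt{walk_frame_pointers} are all hypothetical.

\begin{verbatim}
(* Toy model of frame-pointer unwinding (hypothetical, not perf's code).
   [mem] is memory as an array of words.  For a frame whose frame pointer
   is [fp], mem.(fp) holds the caller's saved frame pointer and
   mem.(fp + 1) holds the return address into the caller.  A saved frame
   pointer of 0 marks the outermost frame. *)
let walk_frame_pointers ?(max_depth = 64) (mem : int array) (fp : int) :
    int list =
  let rec go fp depth acc =
    (* Stop at the end of the chain, or after [max_depth] frames (an
       arbitrary bound) so that a corrupted chain cannot loop forever. *)
    if fp = 0 || depth >= max_depth then List.rev acc
    else
      let saved_fp = mem.(fp) and return_addr = mem.(fp + 1) in
      go saved_fp (depth + 1) (return_addr :: acc)
  in
  go fp 0 []

(* Three frames at word offsets 10 (innermost), 20 and 30 (outermost). *)
let example_mem =
  let m = Array.make 32 0 in
  m.(10) <- 20; m.(11) <- 1001;
  m.(20) <- 30; m.(21) <- 1002;
  m.(30) <- 0;  m.(31) <- 1003;
  m

let () =
  walk_frame_pointers example_mem 10
  |> List.iter (fun addr -> Printf.printf "return address %d\n" addr)
\end{verbatim}

Because the walk only follows pointers, the model does not require the frames to be contiguous in memory, and each step costs a couple of memory reads rather than a copy of the stack.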

Perf uses the symbols present in an OCaml executable, so it is useful to understand OCaml's name mangling scheme to map these names back to OCaml source locations. Before OCaml 5.1, \texttt{ocamlopt} mangles names using the format \texttt{camlModule__identifier_stamp}; from 5.1 onwards, the double underscore separator is replaced with a dot, giving \texttt{camlModule.identifier_stamp}. Both forms are supported by \texttt{perf}.

For example, consider the following program:
\begin{caml_example*}{verbatim}
let _ = main ()
\end{caml_example*}

This produces the symbol names \texttt{camlFib.main_274} for the \texttt{main} function and \texttt{camlFib.fib_271} for the \texttt{fib} function in the \texttt{Compute} module.
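
The mapping from symbols back to source names can be sketched in OCaml. The function \texttt{demangle} below is a hypothetical helper, not part of OCaml or perf: it splits a symbol such as \texttt{camlFib.fib_271} into its module, identifier and stamp, accepting both the pre-5.1 and the 5.1-onwards separators, and it does not attempt to handle every corner case of the real scheme. When the module part itself contains double underscores (as in \texttt{Stdlib__List}), everything up to the last separator is kept as the module name.

\begin{verbatim}
(* Hypothetical helper, not part of OCaml or perf: split an OCaml
   native-code symbol into (module, identifier, stamp).  Simplified:
   it does not handle every corner case of the real mangling scheme. *)

(* Index of the last occurrence of [sep] in [s], if any. *)
let rindex_sub (s : string) (sep : string) : int option =
  let lsep = String.length sep in
  let rec go i =
    if i < 0 then None
    else if String.sub s i lsep = sep then Some i
    else go (i - 1)
  in
  go (String.length s - lsep)

let is_digits s = s <> "" && String.for_all (fun c -> c >= '0' && c <= '9') s

(* "fib_271" -> Some ("fib", 271) *)
let split_stamp (s : string) : (string * int) option =
  match String.rindex_opt s '_' with
  | Some i when i > 0 ->
      let digits = String.sub s (i + 1) (String.length s - i - 1) in
      if is_digits digits then Some (String.sub s 0 i, int_of_string digits)
      else None
  | _ -> None

let demangle (sym : string) : (string * string * int) option =
  let prefix = "caml" in
  let lp = String.length prefix in
  if String.length sym <= lp || String.sub sym 0 lp <> prefix then None
  else
    let rest = String.sub sym lp (String.length sym - lp) in
    (* OCaml 5.1 onwards separates module and identifier with '.';
       earlier versions use a double underscore. *)
    let parts =
      match String.rindex_opt rest '.' with
      | Some i -> Some (i, 1)
      | None -> Option.map (fun i -> (i, 2)) (rindex_sub rest "__")
    in
    match parts with
    | None -> None
    | Some (i, sep_len) ->
        let modname = String.sub rest 0 i in
        let start = i + sep_len in
        let ident_stamp = String.sub rest start (String.length rest - start) in
        (match split_stamp ident_stamp with
         | Some (ident, stamp) when modname <> "" -> Some (modname, ident, stamp)
         | _ -> None)

let () =
  List.iter
    (fun sym ->
      match demangle sym with
      | Some (m, id, n) ->
          Printf.printf "%s -> module %s, identifier %s, stamp %d\n" sym m id n
      | None -> Printf.printf "%s -> no module/identifier/stamp found\n" sym)
    [ "camlFib.fib_271"; "camlFib__main_274"; "caml_alloc_shr" ]
\end{verbatim}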

\section{s:ocamlperf-printing}{Printing profiling information}

The \texttt{perf report} command summarises the captured \texttt{perf.data} file. The basic \texttt{perf report} command is:

\begin{verbatim}
perf report -f --no-children -i perf.data
\end{verbatim}

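Here \texttt{-f} skips perf's ownership check on the data file, and \texttt{--no-children} attributes each sample only to the function that was executing rather than also accumulating it into that function's callers. This displays the accumulated call graphs in an interactive terminal interface where you can navigate the data, selecting functions and threads for more detail. Alternatively, \texttt{--stdio} writes a similar text-based report to standard output.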

Profile data can also be turned into Flame Graphs, a visualisation of hierarchical data created for the stack traces of profiled software, so that the most frequent code paths can be identified quickly and accurately. Use the scripts \texttt{stackcollapse-perf.pl} and \texttt{flamegraph.pl} found at Brendan Gregg's \href{https://www.brendangregg.com/flamegraphs.html}{Flame Graphs} web page as follows:

\begin{verbatim}
git clone https://github.com/brendangregg/FlameGraph
flamegraph.pl out.folded > out.svg ## Create the FlameGraph svg
\end{verbatim}
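
The folding step performed by \texttt{stackcollapse-perf.pl} can be illustrated with a short OCaml sketch. It assumes the samples have already been extracted into lists of function names ordered from the outermost caller to the function that was running (the function \texttt{fold_stacks} and the sample data are hypothetical), and it produces the folded format consumed by \texttt{flamegraph.pl}: one line per distinct stack, frames joined by semicolons, followed by a space and the number of samples.

\begin{verbatim}
(* Hypothetical sketch of stack folding, not the actual
   stackcollapse-perf.pl.  Each sample is a call stack listed from the
   outermost caller to the function running when it was taken. *)
let fold_stacks (samples : string list list) : string list =
  let counts : (string, int) Hashtbl.t = Hashtbl.create 16 in
  List.iter
    (fun stack ->
      let key = String.concat ";" stack in
      let n = Option.value (Hashtbl.find_opt counts key) ~default:0 in
      Hashtbl.replace counts key (n + 1))
    samples;
  (* Emit "frame1;frame2;... count" lines in a deterministic order. *)
  Hashtbl.fold (fun key n acc -> Printf.sprintf "%s %d" key n :: acc) counts []
  |> List.sort compare

let () =
  fold_stacks
    [ [ "camlFib.main_274"; "camlFib.fib_271" ];
      [ "camlFib.main_274"; "camlFib.fib_271"; "camlFib.fib_271" ];
      [ "camlFib.main_274"; "camlFib.fib_271" ] ]
  |> List.iter print_endline
\end{verbatim}

Identical stacks collapse into a single line whose count becomes the width of the corresponding box in the flame graph.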

Some perf tools (e.g. \texttt{perf-report(1)} and \texttt{perf-annotate(1)}) use DWARF debugging information to associate symbols with source code locations. If you need these features, recompile with \texttt{-g} to include debugging information in the executable.

The captured profile data can also be processed in a number of other ways with \texttt{perf script}, with online tools such as \href{https://www.speedscope.app}{speedscope.app} and \href{https://profiler.firefox.com/}{profiler.firefox.com}, or with any other tool that accepts data in the Linux perf format.

\section{s:ocamlperf-conclusion}{Conclusion}

For time profiling of native code, it is recommended to use standard tools such as perf, eBPF, DTrace or Instruments (on macOS). Compiling with frame pointers enabled is often required for these tools to work most effectively. Profiling with gprof is no longer supported.

Enabling frame pointers may introduce a small performance penalty on certain architectures (a cost of up to 10\% has been measured on x86_64). Users of this feature are encouraged to run their own benchmarks to determine the impact.
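
One way to carry out such a comparison is to build the same program twice, once in a switch with frame pointers and once without, and time each build. The OCaml sketch below is a hypothetical harness rather than a recommended methodology: \texttt{median_time} runs a workload several times and takes the median of the \texttt{Sys.time} differences (processor time), and \texttt{overhead_percent} turns two medians into a relative cost. The \texttt{fib} workload is a placeholder for your own code.

\begin{verbatim}
(* Hypothetical benchmarking harness, not an official methodology.
   Compile it in each switch (with and without frame pointers), run
   both executables and compare the printed medians. *)

(* Placeholder workload: replace it with the code you care about. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

(* Median of a non-empty list of floats. *)
let median (xs : float list) : float =
  let a = Array.of_list xs in
  Array.sort compare a;
  let n = Array.length a in
  if n mod 2 = 1 then a.(n / 2)
  else (a.((n / 2) - 1) +. a.(n / 2)) /. 2.0

(* Run [f] [runs] times; return the median processor time in seconds,
   as measured by Sys.time. *)
let median_time ?(runs = 11) (f : unit -> unit) : float =
  let once () =
    let t0 = Sys.time () in
    f ();
    Sys.time () -. t0
  in
  median (List.init runs (fun _ -> once ()))

(* Relative cost of [candidate] over [baseline], as a percentage. *)
let overhead_percent ~baseline ~candidate =
  (candidate -. baseline) /. baseline *. 100.0

let () =
  let t = median_time (fun () -> ignore (Sys.opaque_identity (fib 27))) in
  Printf.printf "median time: %.4f s\n" t
\end{verbatim}

For instance, passing a baseline of 2.0 and a candidate of 2.2 to \texttt{overhead_percent} returns about 10.0, that is a 10\% cost. The median is used rather than the mean so that an occasional slow run, caused for example by other activity on the machine, has less influence on the result.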
