diff --git a/Changes b/Changes index b210c3563c01..1104874262fa 100644 --- a/Changes +++ b/Changes @@ -94,6 +94,12 @@ Working version review by Florian Angeletti, Anil Madhavapeddy, Gabriel Scherer, and Miod Vallat) +- #?????: Document support for Linux perf and frame pointers. + (Tim McGilchrist, review by ???) + +- #?????: Document support for native debugging with GDB and LLDB. + (Tim McGilchrist, review by ???) + ### Compiler user-interface and warnings: - #13428: support dump=[source | parsetree | lambda | ... | cmm | ...] diff --git a/manual/README.md b/manual/README.md index b6fc76ad9661..fc2aa8edc236 100644 --- a/manual/README.md +++ b/manual/README.md @@ -8,14 +8,14 @@ Prerequisites - A LaTeX installation. -- The HeVeA LaTeX-to-HTML converter (available in OPAM): +- The HeVeA LaTeX-to-HTML converter (available in opam): Note that you must make sure `hevea.sty` is installed into TeX properly. Your package manager may not do this for you. Run `kpsewhich hevea.sty` to check. -Building the manual +Building the Manual -------- 0. Build the OCaml compiler (including the native one) from sources. @@ -38,16 +38,16 @@ In the manual: - The PDF manual is in directory `texstuff` as file `manual.pdf`. -Source files +Source Files ------------ The manual is written in an extended dialect of LaTeX and is split across many source files. During the build process, these source files are converted into classical LaTeX files using the tools available in the `manual/tools` directory. These files are then converted to the different output -formats using either LaTeX or hevea. +formats using either LaTeX or HeVeA. 
Each part of the manual corresponds to a specific directory, and each distinct -chapters (or sometimes sections) are mapped to a distinct `.etex` file: +chapter (or sometimes section) is mapped to a distinct `.etex` file: - Part I, Introduction to OCaml: `tutorials` - The core language: `coreexamples.etex` @@ -57,7 +57,7 @@ chapters (or sometimes sections) are mapped to a distinct `.etex` file: - Advanced examples with classes and modules: `advexamples.etex` - Part II, The OCaml language: `refman` - This part is divided in two very distinct chapters; the + This part is divided into two distinct chapters: the `OCaml language` chapter and the `Language extensions` chapter. - The OCaml language: `refman.etex` @@ -85,8 +85,10 @@ chapters (or sometimes sections) are mapped to a distinct `.etex` file: - Optimisation with Flambda: `flambda.etex` - Fuzzing with afl-fuzz: `afl-fuzz.etex` - Runtime tracing with Runtime_events: `runtime_tracing.etex` + - The “Tail Modulo Constructor” program transformation: `tail-mod-cons.etex` + - Runtime detection of data races with ThreadSanitizer: `tsan.etex` -Note that ocamlc,ocamlopt and the toplevel options overlap a lot. +Note that `ocamlc`, `ocamlopt`, and the toplevel options overlap a lot. Consequently, these options are described together in the file `unified-options.etex` and then included from `comp.etex`, `native.etex`, and `top.etex`. If you need to update this list of options, the top comment @@ -98,20 +100,22 @@ of `unified-options.etex` contains the relevant information. 
- The core library: `core.etex` - The standard library: `stdlib-blurb.etex` - The compiler front-end: `compilerlibs.etex` - - The unix library: Unix system calls: `libunix.etex` - - The str library: regular expressions and string processing: `libstr.etex` + - The Unix library: Unix system calls: `libunix.etex` + - The `str` library: regular expressions and string processing: `libstr.etex` - The threads library: `libthreads.etex` - The runtime_events library: `libruntime_events.etex` - The dynlink library: dynamic loading and linking of object files: `libdynlink.etex` + - Recently removed or moved libraries (Graphics, Bigarray, Num, LablTk): + `old.etex` -Latex extensions +LaTeX Extensions ---------------- ### Sections (and subsections, and subsubsections) In order to provide stable links to all part of the manual, the standard -`\section`, `\subsection` and `\subsubsection` macros are replaced by +`\section`, `\subsection`, and `\subsubsection` macros are replaced by variants that take the section label as their first argument. For instance, in the manual, you have to write ```latex @@ -121,20 +125,20 @@ rather than ```latex \section{Basics\label{s:basics}} ``` -This restriction ensures that hevea picks the section label when generating the -header ids. +This restriction ensures that HeVeA picks the section label when generating the +header IDs. A similar macro, `\lparagraph`, is provided for paragraphs. -### Caml environments +### Caml Environments The tool `tools/ocamltex` is used to generate the LaTeX code for the examples in the introduction and language extension parts of the manual. It implements two pseudo-environments: `caml_example` and `caml_eval`. -The pseudo-environment `caml_example` evaluates its contents using an ocaml +The pseudo-environment `caml_example` evaluates its contents using an OCaml interpreter and then translates both the input code and the interpreter output -to LaTeX code, e.g. 
+to LaTeX code, e.g., ```latex \begin{caml_example}{toplevel} let f x = x;; @@ -148,12 +152,12 @@ let f x = x ``` The {verbatim} or {toplevel} argument of the environment corresponds to the mode -of the example. Three modes are available -- toplevel, verbatim and signature. +of the example. Three modes are available -- toplevel, verbatim, and signature. The `toplevel` mode mimics the appearance and behavior of the toplevel. In particular, toplevel examples must end with a double semi-colon `;;`, otherwise an error would be raised. The `verbatim` does not require a final `;;` and is intended to be a lighter mode for code examples. If you want to declare a -signature instead of ocaml code, you must use the `{signature}` argument to the +signature instead of OCaml code, you must use the `{signature}` argument to the `caml_example` environment. ```latex @@ -219,20 +223,20 @@ let pi = 4. *. atan 1.;; let f x = x +. pi;; \end{caml_example} ``` -Beware that the detection code for these pseudo-environments is quite brittle +Beware that the detection code for these pseudo-environments is quite brittle, and the environments must start and end at the beginning of the line. ### Quoting The tool `tools/texquote2` provides support for verbatim-like quotes using -`\"` delimiters. More precisely, outside of caml environments and verbatim +`\"` delimiters. More precisely, outside of Caml environments and verbatim environments, `texquote2` translates double quotes `"text"` to `\machine{escaped_text}`. -### BNF grammar notation +### BNF Grammar Notation The tool `tools/transf` provides support for BNF grammar notations and special -quotes for non-terminal. When transf is used, the environment `syntax` can +quotes for non-terminal. 
When `transf` is used, the environment `syntax` can be used to describe grammars using BNF notation: ```latex \begin{syntax} @@ -257,11 +261,11 @@ Moreover, outside of the syntax environment, `@`-quotes can be used to introduce fragment of grammar: `@'(' module-expr ')'@`. As a consequence, when this extension is used `@` characters must be escaped as `\@`. This extension is used mainly in the language reference part of the manual. -and a more complete description of the notation used is available in the +A more complete description of the notation used is available in the first subsection of `refman/refman.etex`. -Consistency tests +Consistency Tests ----------------- -The `tests` folder contains consistency tests that checks that the manual +The `tests` folder contains consistency tests that check that the manual and the rest of the compiler sources stay synced. diff --git a/manual/src/allfiles.etex b/manual/src/allfiles.etex index 5fd6d872d015..930f8c7d1b94 100644 --- a/manual/src/allfiles.etex +++ b/manual/src/allfiles.etex @@ -69,7 +69,9 @@ and as a \input{ocamldep.tex} \input{ocamldoc.tex} \input{debugger.tex} +\input{native-debugger.tex} \input{profil.tex} +\input{profile-perf.tex} \input{intf-c.tex} \input{flambda.tex} \input{afl-fuzz.tex} diff --git a/manual/src/cmds/Makefile b/manual/src/cmds/Makefile index e169d9d21609..c44df882d40c 100644 --- a/manual/src/cmds/Makefile +++ b/manual/src/cmds/Makefile @@ -10,8 +10,8 @@ TEXQUOTE = $(OCAMLRUN) $(TOOLS)/texquote2 TRANSF = $(OCAMLRUN) $(TOOLS)/transf FILES = comp.tex top.tex runtime.tex native.tex lexyacc.tex intf-c.tex \ - ocamldep.tex profil.tex debugger.tex ocamldoc.tex \ - warnings-help.tex flambda.tex tail-mod-cons.tex \ + ocamldep.tex profil.tex profile-perf.tex debugger.tex native-debugger.tex \ + ocamldoc.tex warnings-help.tex flambda.tex tail-mod-cons.tex \ afl-fuzz.tex runtime-tracing.tex unified-options.tex tsan.tex etex-files: $(FILES) diff --git a/manual/src/cmds/native-debugger.etex 
b/manual/src/cmds/native-debugger.etex new file mode 100644 index 000000000000..8a884e60ed85 --- /dev/null +++ b/manual/src/cmds/native-debugger.etex @@ -0,0 +1,289 @@ +\chapter{Native debugger} \label{c:native-debugger} +%HEVEA\cutname{native-debugger.html} + +\section{s:native-debugger-preliminaries}{Preliminaries} + +This chapter describes the support for debugging executables built with \texttt{ocamlopt}, the native-code compiler, using GDB or LLDB on Linux, macOS, and FreeBSD. We call this \texttt{native debugging}, in contrast to the bytecode debugging supported via \texttt{ocamldebug} (see chapter~\ref{c:debugger}). + +\subsection{ss:native-debugger-dwarf}{DWARF} + +The OCaml compiler uses the \href{http://dwarfstd.org/}{DWARF} debugging information file format to describe the debug information it generates. DWARF is used by many compilers and debuggers to support source-level debugging, and it is used with Linux ELF, macOS Mach-O, and FreeBSD ELF binaries. + +Within the DWARF standard, the compiler specifically uses Call Frame Information (CFI) to describe the call stack for OCaml code, for the sections of the runtime written in C, and across the Foreign Function Interface (FFI), provided the foreign code has been compiled to include CFI. + +OCaml defines its own calling convention that details how arguments are passed to functions, how values are returned, and how registers are used. The calling convention is architecture specific and differs from the C calling convention. See \texttt{proc.ml} for specific details for your chosen architecture. + +The OCaml compiler generates line information that maps instructions back to their source program location (e.g., the instruction at address \texttt{0x00} originated from \texttt{myprogram.ml} line 42). This allows native debuggers to display the OCaml source code for the program being debugged and enables stepping through OCaml source code. 
+ +Debuggers often need to view or modify the state of any function on the call stack; CFI allows the native debugger to find and inspect this state. CFI is preserved across the OCaml / C boundary, so the OCaml runtime can be debugged. + +\subsection{ss:native-debugger-name-mangling}{Name Mangling} + +Name mangling describes how the OCaml compiler generates symbol names for OCaml language constructs. The format of these symbols is important for debuggers and for performance and observability tools, which use it to uniquely identify the source function for a symbol without referencing the original source code. Without source mappings, you often need to use mangled names to set breakpoints, and mangled names also appear in the information the native debugger displays. Understanding OCaml's name mangling is therefore essential when debugging OCaml programs. OCaml 5.1.1 uses a name mangling scheme of the form \texttt{camlModule.identifier_NNN}, where \texttt{NNN} is a randomly generated number. Before 5.1.1 the scheme uses two underscores as the separator, e.g., \texttt{camlModule__identifier_NNN}. + +\paragraph{Note:} For the Windows MSVC port (restored in OCaml 5.3), the scheme uses \texttt{\$} as the separator, e.g., \texttt{camlModule\$identifier_NNN}. + +\subsection{ss:native-debugger-frame-pointers}{Frame Pointers} + +The OCaml native compiler supports generating frame pointers, which native debuggers can use to walk the stack of function calls in a program. The frame pointer (also known as the base pointer) is a register (e.g., \texttt{\%rbp} on x86_64 or \texttt{x29} on ARM64) that points to the base of the current stack frame. The stack frame (also known as the activation frame or the activation record) refers to the portion of the stack allocated to a single function call. By saving the frame pointer along with the return address, the call stack for OCaml can be maintained. 
It is possible to debug OCaml programs using frame pointers only, without CFI enabled; however, the experience is closer to debugging assembly, so using DWARF with CFI is recommended. + +\section{s:native-debugger-compilation}{Compiling for Debugging} + +Before debugging OCaml programs, the native compiler \texttt{ocamlopt} must be installed with CFI emission enabled. CFI emission is controlled by the \texttt{--enable-cfi} flag and is enabled by default. This allows debugging the assembly code generated by \texttt{ocamlopt}. To perform source-level debugging, compile code with the \texttt{-g} flag, which records debugging information for exception backtraces and generates mappings between assembly and source locations in OCaml. (Note that only GDB and LLDB on Linux reliably support source locations.) + +\section{s:native-debugger-gdb}{Using GDB} +Here we walk through debugging a simple OCaml program using GDB on Linux, showing the commands to use and the expected outputs. + +Consider the following program: +\begin{caml_example*}{verbatim} +(* fib.ml *) +let rec fib n = + if n = 0 then 0 + else if n = 1 then 1 + else fib (n-1) + fib (n-2) + +let main () = + let r = fib 20 in + Printf.printf "fib(20) = %d" r + +let _ = main () +\end{caml_example*} + +Compile this program with \texttt{ocamlopt} like so: + +\begin{verbatim} +$ ocamlopt -g -o fib.exe fib.ml +$ ./fib.exe 20 +fib(20) = 6765 +\end{verbatim} + +When run, this program prints the 20th Fibonacci number. The use of recursion is an excuse to inspect the call stack. To do so, start up a GDB session for this program: + +\begin{verbatim} +$ gdb ./fib.exe +... +\end{verbatim} + +Breakpoints can be set using either the mangled names produced by the compiler or a combination of file name and line number. For example: + +\begin{verbatim} +(gdb) break camlFib.fib_ # press tab +(gdb) break camlFib.fib_270 # 270 happens to be the random number generated for NNN +Breakpoint 1 at 0x3cd50: file fib.ml, line 2. 
+ +(gdb) break fib.ml:7 # breakpoint for main function +Breakpoint 2 at 0x3cdc0: file fib.ml, line 7. +\end{verbatim} + +Now we can run the program and print a backtrace (note this session uses Ubuntu 24.04 LTS on x86_64). + +\begin{verbatim} +(gdb) run +Starting program: /home/tsmc/fib.exe +[Thread debugging using libthread_db enabled] +Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". + +Breakpoint 2, camlFib.main_272 () at fib.ml:7 +7 let main () = +(gdb) continue +Continuing. + +Breakpoint 1, camlFib.fib_270 () at fib.ml:2 +2 let rec fib n = +(gdb) backtrace +#0 camlFib.fib_270 () at fib.ml:2 +#1 0x0000555555590de1 in camlFib.main_272 () at fib.ml:8 +#2 0x0000555555590e86 in camlFib.entry () at fib.ml:11 +#3 0x000055555558eaa7 in caml_program () +#4 +#5 0x00005555555de126 in caml_startup_common (pooling=, argv=0x7fffffffe3f8) + at runtime/startup_nat.c:132 +#6 caml_startup_common (argv=0x7fffffffe3f8, pooling=) at runtime/startup_nat.c:88 +#7 0x00005555555de19f in caml_startup_exn (argv=) at runtime/startup_nat.c:139 +#8 caml_startup (argv=) at runtime/startup_nat.c:144 +#9 caml_main (argv=) at runtime/startup_nat.c:151 +#10 0x000055555558e892 in main (argc=, argv=) at runtime/main.c:37 +\end{verbatim} + +There is basic support for printing OCaml values using \href{https://github.com/ocaml/ocaml/blob/5.4.0/tools/gdb.py}{tools/gdb.py} and the built-in Python scripting in GDB. Download that file and load it into GDB like so: + +\begin{verbatim} +(gdb) source ~/ocaml/tools/gdb.py +OCaml support module loaded. Values of type 'value' will now +print as OCaml values, there is a $Array() convenience function, +and an 'ocaml' command is available for heap exploration +(see 'help ocaml' for more information). + +(gdb) p (value)$rax +$1 = caml:14 + +\end{verbatim} + +We can also print other kinds of OCaml values. 
+To illustrate this, consider the following program: +\begin{caml_example*}{verbatim} +(* test_blocks.ml *) +type t = {s : string; i : int} + +let main a b = + print_endline "Hello, world!"; + print_endline a; + print_endline b.s + +let _ = main "foo" {s = "bar"; i = 42} +\end{caml_example*} + +Compile this program with \texttt{ocamlopt} and load it into GDB: + +\begin{verbatim} +$ ocamlopt -g -o test_blocks.exe test_blocks.ml +$ gdb ./test_blocks.exe +(gdb) source ~/ocaml/tools/gdb.py +... +(gdb) break camlTest_blocks.main_273 +Breakpoint 1 at 0x16db0: file test_blocks.ml, line 4. +(gdb) run +... +Breakpoint 1, camlTest_blocks.main_273 () at test_blocks.ml:4 +4 let main a b = +(gdb) p (value)$rax # Print out the first argument to main +$1 = caml(-):'foo'<3> +(gdb) p (value)$rbx # Then print the second argument +$2 = caml(-):('bar', 42) = {caml(-):'bar'<3>, caml:42} +(gdb) p *(value*)$rbx@2 # Examine both fields of the record +$3 = {caml(-):'bar'<3>, caml:42} +\end{verbatim} + +Note the use of x86_64 register names: \texttt{\$rax} and \texttt{\$rbx}. Values are printed in their OCaml representation; the \texttt{(m)}, \texttt{(u)}, \texttt{(g)}, or \texttt{(-)} marker is the GC color. + +\subsection{ss:native-debugger-gdb-commands}{GDB Commands} +Summary of interesting OCaml-specific GDB commands: +\begin{options} +\item["break "\var{locspec}] +Set a breakpoint at all of the code locations matching \var{locspec}, e.g., using the mangled OCaml names or specifying a line number in the source file as \texttt{filename:linenum}. + +\item["backtrace"] +Print the backtrace of the entire stack. This will include OCaml source references identifying which stack frame maps to a source location, e.g., \texttt{fib.ml:4}. + +\item["disassemble "\var{addresses}] +Display a range of \var{addresses} as machine instructions. Typically used with the mangled OCaml names to display the assembly for a function. + +\item["info "\var{frame}] +This command prints a verbose description of the selected stack \var{frame}. 
+ +\item["list "\var{linenum}] +Print lines centered around line number \var{linenum} in the current source file. This works for both OCaml source code and the parts of the OCaml runtime written in C. + +\end{options} + +See the \href{https://sourceware.org/gdb/current/onlinedocs/gdb.html/}{Debugging with GDB} documentation for more details. In general, only the features described above are expected to work in GDB; for anything else, users will need to fall back to assembly debugging. GDB is expected to work on all supported Linux architectures. + +\section{s:native-debugger-lldb}{Using LLDB} + +Here we walk through debugging the earlier fib example using LLDB on Linux. Start up an LLDB session using the \texttt{fib.exe} built earlier: + +\begin{verbatim} +$ lldb ./fib.exe +Current executable set to 'fib.exe' (aarch64). +(lldb) +\end{verbatim} + +Breakpoints can be set using the OCaml mangled names or using a combination of file name and line number. For example: + +\begin{verbatim} +(lldb) breakpoint set -n camlFib.main # press tab for autocomplete +(lldb) breakpoint set -n camlFib.main_272 +Breakpoint 3: where = fib.exe`camlFib.main_272 + 40, address = 0x00000000000510b0 +(lldb) breakpoint set -f fib.ml -l 7 # breakpoint for line 7 in fib.ml +Breakpoint 2: where = fib.exe`camlFib.main_272, address = 0x0000000000051088 +(lldb) +\end{verbatim} + +Now we can run the program. +\begin{verbatim} +(lldb) run +Process 11391 launched: '/home/tsmc/fib.exe' (aarch64) +Process 11391 stopped +* thread #1, name = 'fib.exe', stop reason = breakpoint 2.1 + frame #0: 0x0000aaaaaaaf1088 fib.exe`camlFib.main_272 at fib.ml:7 + 4 else if n = 1 then 1 + 5 else fib (n-1) + fib (n-2) + 6 +-> 7 let main () = + 8 let r = fib 20 in + 9 Printf.printf "fib(20) = %d" r + 10 +warning: This version of LLDB has no plugin for the language "assembler." Inspection of frame variables will be limited. 
+(lldb) bt # Print the backtrace +* thread #1, name = 'fib.exe', stop reason = breakpoint 2.1 + * frame #0: 0x0000aaaaaaaf1088 fib.exe`camlFib.main_272 at fib.ml:7 + frame #1: 0x0000aaaaaaaf117c fib.exe`camlFib.entry at fib.ml:11 + frame #2: 0x0000aaaaaaaee644 fib.exe`caml_program + 476 + frame #3: 0x0000aaaaaab45b08 fib.exe`caml_start_program + 132 + frame #4: 0x0000aaaaaab45600 fib.exe`caml_main [inlined] caml_startup(argv=) at startup_nat.c:145:7 + frame #5: 0x0000aaaaaab455fc fib.exe`caml_main(argv=) at startup_nat.c:151:3 + frame #6: 0x0000aaaaaaaee2d0 fib.exe`main(argc=, argv=) at main.c:37:3 + frame #7: 0x0000fffff7d784c4 libc.so.6`__libc_start_call_main(main=(fib.exe`main at main.c:31:1), argc=1, argv=0x0000fffffffffb78) at libc_start_call_main.h:58:16 + frame #8: 0x0000fffff7d78598 libc.so.6`__libc_start_main_impl(main=0x0000aaaaaaba0dc8, argc=16, argv=0x000000000000000f, init=, fini=, rtld_fini=, stack_end=) at libc-start.c:360:3 + frame #9: 0x0000aaaaaaaee370 fib.exe`_start + 48 +(lldb) +\end{verbatim} + +There is basic support for printing OCaml values using \href{https://github.com/ocaml/ocaml/blob/5.3.0/tools/lldb.py}{tools/lldb.py} and the built-in Python scripting in LLDB. Download that file and load it into LLDB like so: + +\begin{verbatim} +(lldb) command script import ~/ocaml/tools/lldb.py +OCaml support module loaded. Values of type 'value' will now +print as OCaml values, and an 'ocaml' command is available for +heap exploration (see 'help ocaml' for more information). +(lldb) p (value)$x0 +(value) 41 caml:20 +(lldb) +\end{verbatim} + +Note: the session above uses an ARM64 Linux machine, so the first argument is passed in the first register, \texttt{x0}. + +We can also print other kinds of OCaml values. Reusing \texttt{test_blocks.exe} from earlier, start up a new LLDB session: + +\begin{verbatim} +$ lldb ./test_blocks.exe +... +(lldb) command script import ~/ocaml/tools/lldb.py +... 
+(lldb) br s -n camlTest_blocks.main_273 +Breakpoint 1: where = test_blocks.exe`camlTest_blocks.main_273 + 40, address = 0x0000000000019ab0 +(lldb) run +... +(lldb) p (value)$x0 +(value) 187649984891864 caml(-):'Hello, world!'<13> +(lldb) p (value)$x1 +(value) 187649984891808 caml(-):('bar', 42) +\end{verbatim} + +\subsection{ss:native-debugger-lldb-commands}{LLDB Commands} + +Summary of interesting OCaml-specific LLDB commands: + +\begin{options} +\item["breakpoint set -n "\var{symbol}] +Set a breakpoint at the code location matching \var{symbol}, e.g., using the mangled OCaml name. + +\item["breakpoint set -f "\var{filename}" -l "\var{linenum}] +Set a breakpoint at \var{linenum} in \var{filename}, e.g., \texttt{fib.ml:7}. + +\item["breakpoint set -a "\var{address}] +Set a breakpoint on a memory \var{address}. + +\item["backtrace"] +Print the backtrace of the entire stack. This will include OCaml source references identifying which stack frame maps to a source location. + +\item["disassemble"] +Disassemble specified instructions in the current target. Useful options include \texttt{-n} plus a mangled OCaml name to disassemble a specific function and \texttt{-a} plus an address to disassemble the function containing this address. + +\item["frame info"] +List information about the current stack frame in the current thread. + +\item["source"] +Commands for examining source code described by debug information for the current target process. + +\end{options} diff --git a/manual/src/cmds/profil.etex b/manual/src/cmds/profil.etex index 7826fab3fa2e..6bd24548303a 100644 --- a/manual/src/cmds/profil.etex +++ b/manual/src/cmds/profil.etex @@ -140,7 +140,5 @@ Display a short usage summary and exit. Profiling with "ocamlprof" only records execution counts, not the actual time spent within each function. There is currently no way to perform -time profiling on bytecode programs generated by "ocamlc". 
For time -profiling of native code, users are recommended to use standard tools -such as perf (on Linux), Instruments (on macOS) and DTrace. Profiling -with "gprof" is no longer supported. +time profiling on bytecode programs generated by "ocamlc". For time profiling +of native code, see chapter~\ref{c:profiler-perf}. \ No newline at end of file diff --git a/manual/src/cmds/profile-perf.etex b/manual/src/cmds/profile-perf.etex new file mode 100644 index 000000000000..87fffda0a217 --- /dev/null +++ b/manual/src/cmds/profile-perf.etex @@ -0,0 +1,172 @@ +\chapter{Profiling (perf)} \label{c:profiler-perf} +%HEVEA\cutname{profiler-perf.html} + +This chapter describes how to use \texttt{perf} to profile OCaml programs. + +Linux Performance Events (\texttt{perf(1)}) is a suite of tools for performance observability. The main features covered here are \texttt{perf-record(1)} for recording events and \texttt{perf-report(1)} for printing and visualising recorded events. \texttt{perf} has many additional profiling and visualising features. For more comprehensive documentation, see \texttt{perf(1)}, the \href{https://perfwiki.github.io/main/}{\texttt{perf} wiki}, or \href{https://www.brendangregg.com/perf.html}{Brendan Gregg's Blog}. + +\section{s:ocamlperf-call-graph}{Background} + +CPU profiling is typically performed by sampling the CPU call graph at frequent intervals to gather statistics on the code paths that are consuming CPU resources. To profile OCaml code, \texttt{perf} needs to understand the call graph of OCaml. \texttt{perf} supports multiple options for recording call graphs: +\begin{itemize} +\item Frame pointers, which are the default. +\item DWARF's Call Frame Information (CFI). +\item Hardware Last Branch Record (LBR). +\end{itemize} + +Of these options, frame pointers are recommended for profiling OCaml code for the following reasons: +\begin{itemize} +\item Unwinding is faster and uses less CPU. +\item Trace files produced are smaller. 
+\item Frame pointers provide more accurate call graphs, particularly when used with a Linux distribution that supports them. +\item Frame pointers work better with OCaml 5's non-contiguous stacks. +\end{itemize} + +Frame pointer based call graphs use a convention where the head of the linked list of stack frames can be found in a register called the frame pointer (e.g., \$rbp on x86_64), and the pointer to the previous stack frame and the return address are saved at a known offset from the frame pointer. This linked list of stack frames is then used to walk the stack of called functions. OCaml 5 features non-contiguous stacks as part of the implementation of effects; see \href{https://dl.acm.org/doi/10.1145/3453483.3454039}{Retrofitting effect handlers onto OCaml} (Section 5.5). + +DWARF based call graphs use the DWARF CFI information to perform unwinding. However, this produces larger trace files that are more costly to capture and are often truncated because \texttt{perf} has not copied enough of the call stack. It also requires including CFI debugging information in your program, resulting in larger binaries. + +Hardware Last Branch Record (LBR) uses a processor-provided method to record call graphs. This has the dual limitations of restricted availability (only on certain Intel CPUs) and a limited stack depth. The stack depth is 16 on Haswell and 32 since Skylake. + +\section{s:ocamlperf-compiling}{Compiling for Profiling} + +The process for compiling an executable for CPU profiling depends on the OCaml version. For OCaml versions 4.14 and earlier, either frame pointers or DWARF can be used, while for OCaml 5.0 and later, enabling frame pointers is recommended. + +To enable frame pointers, configure the compiler with \texttt{--enable-frame-pointers}. 
Alternatively, you can install an opam switch with frame pointers enabled, as follows: + +\begin{verbatim} + opam switch create ocaml-option-fp +\end{verbatim} + +Frame pointer support for OCaml is available on the x86_64 architecture on Linux starting with OCaml 4.12 and on macOS from OCaml 5.3. The ARM64 architecture is supported on Linux and macOS from OCaml 5.4, while other Tier-1 architectures (POWER, RISC-V, and s390x) are currently unsupported. + +\section{s:ocamlperf-profiling}{Profiling an Execution} + +The basic \texttt{perf} command for profiling is: +\begin{verbatim} + perf record -F 99 --call-graph fp +\end{verbatim} + +The \texttt{-F 99} option sets \texttt{perf} to sample at 99 Hz, reducing excessive data generation during longer runs and minimising overlap with other periodic activities. The \texttt{--call-graph fp} option instructs \texttt{perf} to use frame pointers to get the call graph. These options are followed by the OCaml executable you want to profile. This command creates a \texttt{perf.data} file in the current directory. Alternatively, use \texttt{--output} to choose a more descriptive filename. + +The \texttt{perf record} command works by copying a segment of the call stack at each sample and recording this data into a \texttt{perf.data} file. These samples can then be processed after recording using \texttt{perf report} to reconstruct the profiled program’s call stack at every sample. + +\texttt{perf} uses the symbols in an OCaml executable, so it helps to understand OCaml's name mangling scheme to map names to OCaml source locations. Before OCaml 5.1, \texttt{ocamlopt} mangled names used the \texttt{camlModule__identifier_stamp} format; from 5.1 onwards, the separator is a dot, as in \texttt{camlModule.identifier_stamp}. Both formats are supported by \texttt{perf}. 
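To make the two symbol formats concrete, here is a minimal OCaml sketch that splits a mangled symbol into its module, identifier, and stamp. The helper `split_mangled` is hypothetical: it is not part of the compiler, the runtime, or \texttt{perf}, and it assumes exactly the two formats described above (it does not try to handle module names that themselves contain a double underscore).

```ocaml
(* Hypothetical helper for illustration only; assumes symbols of the form
   camlModule.identifier_stamp or camlModule__identifier_stamp. *)

(* Index of the first "__" in [s], if any. *)
let find_double_underscore (s : string) : int option =
  let n = String.length s in
  let rec go i =
    if i + 1 >= n then None
    else if s.[i] = '_' && s.[i + 1] = '_' then Some i
    else go (i + 1)
  in
  go 0

(* Split [s] at index [i], dropping [skip] separator characters. *)
let split_at (s : string) (i : int) (skip : int) : string * string =
  (String.sub s 0 i, String.sub s (i + skip) (String.length s - i - skip))

(* Return (module, identifier, stamp), or None if [sym] does not match. *)
let split_mangled (sym : string) : (string * string * int) option =
  let len = String.length sym in
  if len < 4 || String.sub sym 0 4 <> "caml" then None
  else
    let body = String.sub sym 4 (len - 4) in
    (* The stamp is the number after the last underscore. *)
    match String.rindex_opt body '_' with
    | None -> None
    | Some i ->
      let tail = String.sub body (i + 1) (String.length body - i - 1) in
      match int_of_string_opt tail with
      | None -> None
      | Some stamp ->
        let qualified = String.sub body 0 i in
        (* Prefer the dot separator, then fall back to "__". *)
        match String.index_opt qualified '.' with
        | Some j ->
          let (m, id) = split_at qualified j 1 in
          Some (m, id, stamp)
        | None ->
          match find_double_underscore qualified with
          | Some j ->
            let (m, id) = split_at qualified j 2 in
            Some (m, id, stamp)
          | None -> None
```

Under these assumptions, both `camlFib.main_272` and `camlFib__main_272` split into the module `Fib`, the identifier `main`, and the stamp `272`, while a symbol without a stamp, such as `camlFib.entry`, is rejected.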
+ +Consider the following program: + +\begin{caml_example*}{verbatim} +module Compute = struct + let rec fib n = + if n = 0 then 0 + else if n = 1 then 1 + else fib (n-1) + fib (n-2) +end + +let main () = + let r = Compute.fib 20 in + Printf.printf "fib(20) = %d" r + +let _ = main () +\end{caml_example*} + +This program produces the names \texttt{camlFib.main_274} for the \texttt{main} function and \texttt{camlFib.fib_271} for the \texttt{fib} function in the \texttt{Compute} module. + +\section{s:ocamlperf-printing}{Printing Profiling Information} + +The \texttt{perf report} command summarises data in the \texttt{perf.data} file. +The basic \texttt{perf report} command is: + +\begin{verbatim} + perf report -f --no-children -i perf.data +\end{verbatim} + +This command provides an interactive interface where you can navigate through the accumulated call graphs and select functions and threads for detailed information. Alternatively, \texttt{--stdio} outputs similar data as a text-based report written to stdout. Note that if stack traces appear broken, it may be due to software not having frame pointer support. + +Consider the following program, which calculates the Tak function. +\begin{caml_example*}{verbatim} +let (x,y,z) = + try + let x = int_of_string Sys.argv.(1) in + let y = int_of_string Sys.argv.(2) in + let z = int_of_string Sys.argv.(3) in + (x,y,z) + with _ -> (18,12,6) + +let rec tak x y z = + if x > y then + tak (tak (x-1) y z) (tak (y-1) z x) (tak (z-1) x y) + else z + +let main () = + let r = tak x y z in + Printf.printf "tak %d %d %d = %d\n" x y z r + +let _ = main () +\end{caml_example*} + +The \texttt{perf} report for this might resemble the following. +\begin{verbatim} +Samples: 809 of event 'cycles', Event count (approx.): 24701952617 + Overhead Command Shared Object Symbol +- 100.00% tak-fp.exe tak-fp.exe [.] 
Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + Tak$tak_402 + + Tak$tak_402 + 0.00% tak-fp.exe [kernel.kallsyms] [k] 0xffffb9a5ff79d854 + 0.00% perf-exec [kernel.kallsyms] [k] 0xffffb9a5ff719c34 +\end{verbatim} + +Profiling data can also be visualised as Flame Graphs, which highlight the most frequent code paths in stack traces. The original scripts \texttt{stackcollapse-perf.pl} and \texttt{flamegraph.pl} can be found at Brendan Gregg's \href{https://www.brendangregg.com/flamegraphs.html}{Flame Graphs} web page and work as follows: + +\begin{verbatim} +git clone https://github.com/brendangregg/FlameGraph +cd FlameGraph + +# Collect the results into perf.data +perf record -F 99 --call-graph fp +perf script -i perf.data | ./stackcollapse-perf.pl > out.folded +./flamegraph.pl out.folded > flamegraph.svg ## Create the FlameGraph svg +\end{verbatim} + +Alternatively, \href{https://github.com/jonhoo/inferno}{inferno} is a Rust port of the Flame Graphs tools that works in a similar way and processes large perf files faster. + +\begin{verbatim} +cargo install inferno + +# Collect the results into perf.data +perf script -i perf.data | inferno-collapse-perf > stacks.folded +cat stacks.folded | inferno-flamegraph > flamegraph.svg +\end{verbatim} + +Some \texttt{perf} tools (e.g., \texttt{perf-report(1)} and \texttt{perf-annotate(1)}) use DWARF debugging symbols to associate symbols with source code locations. If you need these features, compile the program with \texttt{-g} to include debugging information in the executable. 
+ +Captured profile data can also be processed using \texttt{perf script} in various ways or with online tools like \href{https://www.speedscope.app}{speedscope.app} and \href{https://profiler.firefox.com/}{profiler.firefox.com}, or any other tool that supports \texttt{perf}-formatted data. + +\section{s:ocamlperf-conclusion}{Conclusion} + +For CPU profiling of native code, standard tools such as \texttt{perf}, eBPF, DTrace, or Instruments (on macOS) are recommended. Compiling with frame pointers enabled is often necessary for these tools to work effectively. Profiling with \texttt{gprof} is no longer supported. + +Enabling frame pointers can impact performance on certain architectures (up to 10\% performance cost on x86_64 has been measured). Users of this feature are encouraged to benchmark their own applications to assess this impact. + +\section{s:ocamlperf-glossary}{Glossary} + +The following terminology is used in this chapter of the manual. + +\begin{itemize} +\item[{\bf Call graph}] The chain of function calls that have led to the current function (also referred to as a call stack). +\item[{\bf Unwinding}] The process of restoring the program's state to how it was before some function(s) were called, which can give a profiler or debugger much more information (also called stack unwinding). +\item[{\bf Stack frame}] The portion of the stack allocated to a single function call (also called an activation frame, activation record, or simply frame). +\end{itemize}