fixup! Initial sections for profiling with perf and native debugging
tmcgilchrist committed Nov 6, 2024
1 parent 834792b commit 897acdd
Showing 1 changed file with 23 additions and 22 deletions.
45 changes: 23 additions & 22 deletions manual/src/cmds/native-debugger.etex
@@ -3,34 +3,36 @@

\section{s:native-debugger-preliminaries}{Preliminaries}

This chapter describes the support for debugging executables built with the native-code compiler \texttt{ocamlopt} using GDB or LLDB on Linux, macOS and BSD. We will call this \texttt{native debugging}, compared to bytecode debugging supported via \texttt{ocamldebug} (see chapter~\ref{c:debugger}).
This chapter describes the support for debugging executables built with \texttt{ocamlopt}, the native-code compiler, using GDB or LLDB on Linux, macOS, and FreeBSD. We will call this \texttt{native debugging}, compared to bytecode debugging supported via \texttt{ocamldebug} (see chapter~\ref{c:debugger}).

\subsection{ss:native-debugger-dwarf}{DWARF}

The OCaml compiler uses the \href{http://dwarfstd.org/}{DWARF} debugging information file format to describe the debug information it generates. DWARF is a debugging information file format used by many compilers and debuggers to support source level debugging, and it is used by Linux ELF, macOS Mach-O and FreBSD ELF.
The OCaml compiler uses the \href{http://dwarfstd.org/}{DWARF} debugging information file format to describe the debug information it generates. DWARF is a debugging information file format used by many compilers and debuggers to support source level debugging, and it is used by Linux ELF, macOS Mach-O and FreeBSD ELF.

Within the DWARF standard the compiler specifically uses Call Frame Information (hereafter abbreviated as CFI) to describe a call stack for OCaml code, sections of the runtime written in C. e.g. Garbage Collector and across the Foreign Function Interface (FFI) if the language provides CFI information. (If the language has been compiled to include CFI information).
Within the DWARF standard the compiler specifically uses Call Frame Information (hereafter abbreviated as CFI) to describe a call stack for OCaml code, for sections of the runtime written in C, and across the Foreign Function Interface (FFI), provided the foreign code has been compiled to include CFI information.

OCaml defines its own calling convention (which is architecture specific and differs from the C calling convention) for how arguments are passed to functions, how values are returned and how registers are used. See [proc.ml] for specific details for your architecture.
OCaml defines its own calling convention that details how arguments are passed to functions, how values are returned and how registers are used. The calling convention is architecture specific and differs from the C calling convention. See [proc.ml] for specific details for your chosen architecture.

The compiler also generates line information that maps instructions back to their location in the source program (e.g. the instruction at address x originated from myprogram.ml line 42). This allows native debuggers to display the OCaml source code for the program being debugged and enables stepping through OCaml source code.
The OCaml compiler generates line information that maps instructions back to their location in the source program (e.g. the instruction at address \texttt{0x00} originated from \texttt{myprogram.ml} line 42). This allows native debuggers to display the OCaml source code for the program being debugged and enables stepping through OCaml source code.

Debuggers often need to view or modify the state of any function that is on the call stack; CFI allows the native debugger to find and inspect this state. CFI information is preserved across the OCaml / C boundary so the OCaml runtime can be debugged.

\subsection{ss:native-debugger-name-mangling}{Name Mangling}

Name mangling is the process for describing how the OCaml compiler generates symbol names for OCaml language constructs. The format of these symbols is important for debuggers, performance and observability tools, to uniquely identify the source function for a symbol and to do so without resource to the original source code. In the absensce of source mappings, you often need to use mangled names to set breakpoints or they will appear in information the native debugger will display. As such knowing how OCaml performs name mangling is important when debugging OCaml programs. OCaml 5.1.1 uses the a name mangling scheme of \texttt{caml<MODULE_NAME>.<FUNCTION_NAME>_<NNN>} where \texttt{NNN} is a randomly generated number. Before 5.1.1 the scheme is uses two underscores as the separator e.g. \texttt{caml<MODULE_NAME>__<FUNCTION_NAME>_<NNN>}.
Name mangling is the process by which the OCaml compiler generates symbol names for OCaml language constructs. The format of these symbols is important for debuggers and for performance and observability tools, which use it to uniquely identify the source function for a symbol without reference to the original source code. In the absence of source mappings, you often need to use mangled names to set breakpoints, and they will appear in the information the native debugger displays. As such, knowing how OCaml performs name mangling is important when debugging OCaml programs. OCaml 5.1.1 uses a name mangling scheme of \texttt{caml<MODULE_NAME>.<FUNCTION_NAME>_<NNN>} where \texttt{NNN} is a randomly generated number. Before 5.1.1 the scheme uses two underscores as the separator, e.g. \texttt{caml<MODULE_NAME>__<FUNCTION_NAME>_<NNN>}.

\paragraph{Note:} For the Windows MSVC port (restored in OCaml 5.3) the scheme uses \texttt{\$} as the separator, e.g. \texttt{caml<MODULE_NAME>\$<FUNCTION_NAME>_<NNN>}.
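
The three separators described above can be illustrated with a minimal sketch. The \texttt{scheme} type and the \texttt{mangle} function are hypothetical names introduced only for this illustration; they are not part of the compiler, and the real compiler may apply further escaping that is not modelled here.

```ocaml
(* Hypothetical illustration only: not the compiler's implementation.
   Builds a symbol name from the components named in the text above. *)
type scheme =
  | Dot               (* OCaml 5.1.1 and later *)
  | Double_underscore (* before OCaml 5.1.1 *)
  | Dollar            (* Windows MSVC port *)

(* Separator placed between the module name and the function name. *)
let separator = function
  | Dot -> "."
  | Double_underscore -> "__"
  | Dollar -> "$"

(* [stamp] stands for the NNN suffix chosen by the compiler. *)
let mangle scheme ~module_name ~function_name ~stamp =
  Printf.sprintf "caml%s%s%s_%d"
    module_name (separator scheme) function_name stamp

let () =
  (* Prints camlFib.fib_270, the symbol used in the GDB session of this chapter. *)
  print_endline (mangle Dot ~module_name:"Fib" ~function_name:"fib" ~stamp:270);
  (* Prints camlFib__fib_270, the pre-5.1.1 form of the same symbol. *)
  print_endline
    (mangle Double_underscore ~module_name:"Fib" ~function_name:"fib" ~stamp:270)
```

Because the \texttt{NNN} suffix is not known in advance, tab completion on the prefix (for example \texttt{camlFib.fib\_}) is usually the practical way to find the full symbol in a debugger.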

\subsection{ss:native-debugger-frame-pointers}{Frame Pointers}

The OCaml native compiler also supports maintaining Frame Pointers, which can be used by a debugger to walk the stack of function calls in a program. The Frame Pointer (also known as the base pointer) is a register (e.g. %rbp on x86_64 or x29 on ARM64) that points to the base of the current stack frame. The stack frame (also known as the activation frame or the activation record) refers to the portion of the stack allocated to a single function call. By saving the frame pointer along with the return address, between stack frames the call stack for OCaml can be maintained. It should be possible to use just frame pointers to debug OCaml programs, similar to debugging plain assembly code.
The OCaml native compiler supports generating frame pointers, which can be used by a native debugger to walk the stack of function calls in a program. The frame pointer (also known as the base pointer) is a register (e.g. \texttt{\%rbp} on x86_64 or \texttt{x29} on ARM64) that points to the base of the current stack frame. The stack frame (also known as the activation frame or the activation record) refers to the portion of the stack allocated to a single function call. By saving the frame pointer along with the return address, the call stack for OCaml can be maintained. It is possible to debug OCaml programs using frame pointers only, without CFI enabled; however, the experience is closer to debugging assembly, and using DWARF with CFI is recommended.

\section{s:native-debugger-compilation}{Compiling for debugging}

Before debugging OCaml programs, the native compiler \texttt{ocamlopt} must be installed with CFI emission enabled. CFI emission is controlled by the \texttt{--enable-cfi} flag and is enabled by default. This is sufficient to allow debugging the assembly code generated by \texttt{ocamlopt}. To perform source level debugging, code needs to be compiled with the \texttt{-g} flag, which records debugging information for exception backtraces and generates mappings between assembly and source locations in OCaml. (Note that only GDB and LLDB on Linux reliably support source locations.)

\section{s:native-debugger-gdb}{Using GDB}
Here we will walk through debugging a simple OCaml program using GDB on Linux, showing the commands to use and the expected outputs.
Here we walk through debugging a simple OCaml program using GDB on Linux, showing the commands to use and the expected outputs.

Consider the following program:
\begin{caml_example*}{verbatim}
@@ -50,31 +52,30 @@ let _ = main ()
Compile this program with ocamlopt like so:

\begin{verbatim}
$ ocamlopt --version
5.2.0
$ ocamlopt -g -o fib.exe fib.ml
$ ./fib.exe 20
fib(20) = 6765
\end{verbatim}

Then when run this program prints the 20th Fibonnaci number, using recursion allows an opportunity to inspect the call stack. Startup a GDB session for this program:
When run, this program prints the 20th Fibonacci number. The use of recursion is an excuse to inspect the call stack. To do so, start up a GDB session for this program:

\begin{verbatim}
$ gdb ./fib.exe
...
\end{verbatim}

Break points can be set either using the mangled names produced by the compiler or using a combination of file name and line number. For example:
Breakpoints can be set using either the mangled names produced by the compiler or a combination of file name and line number. For example:

\begin{verbatim}
(gdb) break camlFib.fib_ # press tab
(gdb) break camlFib.fib_270 # 270 happens to be the random number generated this time
(gdb) break camlFib.fib_270 # 270 happens to be the random number generated for NNN
Breakpoint 1 at 0x3cd50: file fib.ml, line 2.

(gdb) break fib.ml:7 # breakpoint for main function
Breakpoint 2 at 0x3cdc0: file fib.ml, line 7.
\end{verbatim}

Now we can run the program.
Now we can run the program and print a backtrace (note this session uses Ubuntu 24.04 LTS on x86_64).

\begin{verbatim}
(gdb) run
@@ -104,7 +105,7 @@ Breakpoint 1, camlFib.fib_270 () at fib.ml:2
#10 0x000055555558e892 in main (argc=<optimised out>, argv=<optimised out>) at runtime/main.c:37
\end{verbatim}

There is basic support for printing OCaml values using \href{https://github.com/ocaml/ocaml/blob/5.3.0/tools/gdb.py}{tools/gdb.py} and the built in Python scripting in GDB. Download that file and load it into GDB like so:
There is basic support for printing OCaml values using \href{https://github.com/ocaml/ocaml/blob/5.4.0/tools/gdb.py}{tools/gdb.py} and the built-in Python scripting in GDB. Download that file and load it into GDB like so:

\begin{verbatim}
(gdb) source ~/ocaml/tools/gdb.py
@@ -153,16 +154,16 @@ $2 = caml(-):('bar', 42) = {caml(-):'bar'<3>, caml:42}
$3 = {caml(-):'bar'<3>, caml:42}
\end{verbatim}

Note the use of x86_64 register names. We can print values as their OCaml representations (note The (m) or (u) (or (g) or (-)) is the GC color).
Note the use of x86_64 register names: \texttt{\$rax} and \texttt{\$rbx}. We can print values as their OCaml representations (note that the (m), (u), (g) or (-) marker is the GC color).

\subsection{ss:native-debugger-gdb-commands}{GDB Commands}
Summary of interesting OCaml specific GDB commands:
\begin{options}
\item["break "\var{locspec}]
Set a breakpoint at all of the code locations matching \var{locspec}. e.g. Using the mangled OCaml names or specifying the linenum in the source file as \texttt{filename:linenum}.
Set a breakpoint at all of the code locations matching \var{locspec}, e.g. using the mangled OCaml names or specifying the linenum in the source file as \texttt{filename:linenum}.

\item["backtrace"]
Print the backtrace of the entire stack, this will include OCaml source references identifying which stack frame maps to a source location. e.g. fib.ml:4
Print the backtrace of the entire stack. This will include OCaml source references identifying which stack frame maps to a source location, e.g. \texttt{fib.ml:4}.

\item["disassemble "\var{addresses}]
Display a range of \var{addresses} as machine instructions. Typically used with the mangled OCaml names to display the assembly for a function.
@@ -240,7 +241,7 @@ heap exploration (see 'help ocaml' for more information).
(lldb)
\end{verbatim}

Note we are using an ARM64 Linux machine so our first argument is in the first register x0
Note above we are using an ARM64 Linux machine so our first argument is passed in the first register \texttt{x0}.

We can also print out all kinds of OCaml values. Reusing \texttt{test_blocks.exe}, start up a new LLDB session:

@@ -265,16 +266,16 @@ Summary of interesting OCaml specific LLDB commands:

\begin{options}
\item["breakpoint set -n "\var{symbol}]
Set a breakpoint at code location matching \var{symbol}. e.g. Using the mangled OCaml name.
Set a breakpoint at the code location matching \var{symbol}, e.g. using the mangled OCaml name.

\item["breakpoint set -f "\var{filename}" -l"\var{linenum}]
Set a breakpoint at \var{linenum} in \var{filename}. e.g fib.ml:7
Set a breakpoint at \var{linenum} in \var{filename}, e.g. \texttt{fib.ml:7}.

\item["breakpoint set -a "\var{address}]
Set a breakpoint on a memory \var{address}.

\item["backtrace"]
Print the backtrace of the entire stack, will include OCaml source references identifying which stack frame maps to a source location. e.g. fib.ml:4
Print the backtrace of the entire stack. This will include OCaml source references identifying which stack frame maps to a source location.

\item["disassemble"]
Disassemble specified instructions in the current target. Useful options include \texttt{-n} plus a mangled OCaml name to disassemble a specific function and \texttt{-a} plus an address to disassemble the function containing this address.