Skip to content

euspectre/racehound

 
 

Repository files navigation

# RaceHound
--------------
RaceHound can be used to detect data races in the Linux kernel on x86.

Kernel 3.10 or newer is required, 4.1 or newer is recommended.
Kprobes, kallsyms and debugfs support should be enabled in the kernel.

The main source code repository of RaceHound, releases and issue tracker
are available at https://github.com/euspectre/racehound.

# How it works
---------------

The ideas implemented here are similar to how DataCollider tool operates
on MS Windows.

The kernel part of RaceHound (racehound.ko module) monitors the instructions
in the kernel code in runtime. It operates as follows:

1. Place software breakpoints (Kprobes, actually) at the locations specified
by the user.

2. When a software breakpoint hits, check if the corresponding instruction
is about to access memory. Determine the address and the size of that memory
area.

3. Save the contents of that area (or at least a part of it).

4. Place a hardware breakpoint on that memory area to detect writes (if the
instruction reads from it) or both reads and writes (if the instruction
writes to it).

5. Make a delay. The length of the delay is configurable.
If some code tried to access that memory area during the delay, the hardware
breakpoint might have detected it.

6. Disarm the hardware breakpoint.

7. Just in case, check if the contents of that memory area have changed
during the delay. Might help detect races that were not caught above.

8. Let the execution continue as usual.

The found races are reported to the system log and to racehound/events file
in debugfs.

For the present, RaceHound can only monitor the instructions executing in
process context. So, no IRQ, SoftIRQ, NMI, etc. It can still detect races
where one of the conflicting instruction works in the process context
(a software breakpoint is placed there) while another one executes in, say,
IRQ or SoftIRQ and is caught by a hardware breakpoint.

# Build prerequisites
--------------

* cmake 2.8.10 or newer.

* Everything that is needed to build kernel modules (kernel development
files, etc.).

* GCC C compiler and C++ compiler with C++11 support (GCC 4.9 or newer is
preferable).

* (only if you want to build "ma_lines" plugin) GCC development files needed
to build plugins.

* elfutils, libelf, libdw and their development files (for example, install
elfutils-devel and elfutils-libelf-devel if you use Fedora, RHEL or the like).

# Build
--------------

  $ mkdir racehound.build
  $ cd racehound.build
  $ cmake path_to_racehound_sources
  $ make

To install RaceHound, you can now execute "make install" as root.

Note that RaceHound is installed for the current kernel only. If you update
or otherwise change the kernel, please rebuild RaceHound and install it
again.

Alternatively, you can skip "make install" step and use RaceHound directly
from its build directory.

# Self-tests (optional)
--------------

To build the tests provided with RaceHound, run "make build_tests" in the
directory where RaceHound was built. Then you can run "ctest" as root there
to run all the tests.

The tests check the basic functionality of RaceHound.

# Usage
--------------

It is assumed here that debugfs is available and mounted to /sys/kernel/debug.

## Scenario 1: monitoring a set of instructions

1. The user can tell the kernel part of RaceHound to monitor the particular
instructions in the kernel code. The locations of these instructions should
be written to /sys/kernel/debug/racehound/breakpoints in the following
format:

  [<module_name>:]{init|core}+0xoffset[,delay=<value>]

If <module_name> is not specified, the kernel or a built-in module is
assumed.

"init" and "core" are the areas containing the code of the kernel or a
module in memory. See core_layout/module_core, etc., in struct module. Dealing
with the ELF sections in the kernel has its difficulties, same for the
addresses and sizes of the functions, so RaceHound uses "init" and "core"
areas instead.

"delay" can be used to set the length of the delay for the given monitored
instruction when a software breakpoint hits. If it is not specified, "delay"
parameter of "racehound" kernel module will be used. The value is in milliseconds.

lines2insns tool from this project can help prepare the location strings in
the correct format.

Suppose you would like to monitor the memory accesses corresponding to the
lines 82 and 84 in something.c source file of test_module.ko. The module
should be built with debug info. Then you can do the following.

  $ echo "something.c:82" | lines2insns test_module.ko
  test_module:core+0x22f

  $ echo "something.c:84" | lines2insns test_module.ko
  test_module:core+0x251

If you are interested, say, only in writes at something.c:84, you can
specify this as well:

  $ echo "something.c:84:write" | lines2insns test_module.ko
  test_module:core+0x251

If the debug info for test_module.ko is in a separate file, say,
test_module.ko.debug, please specify both:

  $ echo "something.c:84:write" | lines2insns test_module.ko test_module.ko.debug
  test_module:core+0x251

If you know the ELF sections and the offsets there for the instructions of
interest, lines2insns can convert that to the proper format too:
  $ echo "test_module:.text+0x251" | lines2insns --section-to-area test_module.ko
  test_common_target:core+0x251

  $ echo "test_module:.exit.text+0x1" | lines2insns --section-to-area test_module.ko
  test_common_target:core+0x335

  $ echo "test_module:.init.text+0xdd" | lines2insns --section-to-area test_module.ko
  test_common_target:init+0xdd

See lines2insns --help for more details.

So, suppose you would like to monitor the instructions at the following
locations in test_module:
  test_module:core+0x22f
  test_module:core+0x251
...as well as one location in the kernel proper or in a built-in module:
  core+0x77654

2. Load racehound.ko if it is not loaded yet.

  insmod /usr/local/lib/modules/$(uname -r)/misc/racehound.ko [delay=...]

You can optionally specify how long to delay execution of the instructions
to check for the conflicting memory accesses in the process context and
in atomic context. "delay" parameter of "racehound" module can be used
for that. The value is in milliseconds. The default is 5000/HZ ms (5 jiffies).

3. Instruct RaceHound to monitor the given instructions (as root).

  echo "test_module:core+0x22f" > /sys/kernel/debug/racehound/breakpoints
  echo "test_module:core+0x251" > /sys/kernel/debug/racehound/breakpoints
  echo "core+0x77654,delay=50" > /sys/kernel/debug/racehound/breakpoints

Note that the execution of the instruction at "core+0x77654" will be delayed
by 50 milliseconds ("delay=50") no matter which global settings for delays
RaceHound has.

Reading /sys/kernel/debug/racehound/breakpoints will show the list of the
instructions to be monitored.

Note that it is no longer required to load RaceHound before the analyzed
module(s).

As soon as RaceHound receives the list of instructions to monitor, it will
check if the corresponding components of the kernel are loaded. If they are,
the monitoring will start immediately. If a module is not loaded yet,
RaceHound will wait for it to load and will process it after that.

Different kernel components can be monitored simultaneously.

To stop monitoring an instruction, you can write the same string as before
to /sys/kernel/debug/racehound/breakpoints but with '-' prepended. Example:

  echo "-core+0x77654" > /sys/kernel/debug/racehound/breakpoints

4. Make the analyzed kernel code work.

5. If RaceHound detects a race, it will output something like this to the
system log (see dmesg):

  [rh] Detected a data race on the the memory block at ffffffffc04c1454, size 4:
  [rh] ----- Thread #1, comm: sh -----
  [rh] IP: hello_plus+0x43/0x60 [test_simple]
    hello_device_write+0x5b/0x70 [test_simple]
    full_proxy_write+0x54/0x90
    __vfs_write+0x37/0x160
    vfs_write+0xb5/0x1a0
    SyS_write+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x1a/0xa9

  [rh] ----- Thread #2, comm: sh -----
  [rh] IP: right before hello_minus+0x18/0x20 [test_simple]
    hello_minus+0x18/0x20 [test_simple]
    hello_device_write+0x3d/0x70 [test_simple]
    full_proxy_write+0x54/0x90
    __vfs_write+0x37/0x160
    vfs_write+0xb5/0x1a0
    SyS_write+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x1a/0xa9

If the race is detected only because the memory area has been modified
during the delay, RaceHound, obviously, has no idea which code has done that
modification. It will output information only about the thread it knows in
that case.

If the same race (same conflicting instructions with the same call traces)
happens 2 times or more, RaceHound will only output its first occurrence.

6. If "test_module" is built with debug info, you can use addr2line or a
similar tool to find the source lines for the conflicting access
(my_func+0x18/0x20 in the example above).

7. Unload "racehound" module when it is no longer needed. If the analyzed
kernel components are still loaded then, RaceHound will automatically
"detach" from them first.

Notes and tips

* If the kernel is built with CONFIG_PREEMPT=y, it may help make the system
more responsive when using RaceHound. Especially, - if you use the delays of
100 ms or longer.

* There are only 4 hardware breakpoints available on an x86 CPU, so monitoring
too many instructions that execute often may be pointless. The exact values
of "too many" and "often" may vary, of course.

* If the "hot code paths" are constantly monitored, the performance overhead
may become significant as well.

* If you suspect two instructions to race against each other, it is usually
better to monitor only one of them. The reports about the races might be
less useful otherwise: the accesses from RaceHound itself may be listed
there.

* If you try to write a location to monitor to
/sys/kernel/debug/racehound/breakpoints and it fails, check the system log
(dmesg), it may provide more info.

On x86-64, adding a location to monitor may also fail if the kernel does not
have the following fix:
"kprobes/x86: Return correct length in __copy_instruction()"
(commit c80e5c0c23ce2282476fdc64c4b5e3d3a40723fd in the mainline, included
into kernel 4.1).
------------------

## Scenario 2: sweeping through the code

The idea is as follows. Suppose we have a list of the instructions from the
given portion of the code, say, part of a module or of the kernel. The list
may be quite long and monitoring all these instructions at the same time can
be a bad idea (performance overhead, etc.).

Note that "ma_lines" plugin for GCC 4.9+ can be used to get the list of the
locations in the source code where memory accesses may happen (except the
code written in assembly). lines2insns will translate it to the list of
locations in the binary code. See ma_lines/Readme.txt.

The kernel part of RaceHound provides info about which software breakpoints
have been hit. It is available via /sys/kernel/debug/racehound/events file.
Poll/select can be used for that file to wait till the events are available
there. The events can then be read, one per line.

The format for the "BP hit" events is the same as used for
racehound/breakpoints file, see above.

An example that demonstrates the usage of this API is also provided:
https://github.com/euspectre/racehound/blob/master/examples/events.py

A user-space application may use the information about the events to
automatically start and stop monitoring the instructions from the list
according to some policy. This should allow to keep overhead at the
acceptable level.

An example of such application is provided here:
https://github.com/euspectre/racehound/blob/master/examples/check_races.py

Note that both events.py and check_races.py need Python 3.4 or newer.

About

Data race detector for the Linux kernel

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C 57.9%
  • CMake 17.1%
  • C++ 13.5%
  • Shell 6.2%
  • Awk 4.8%
  • Assembly 0.5%