cdefmt
("c de format", short for "c deferred formatting") is a highly efficient logging framework that targets resource-constrained devices, like microcontrollers.
Inspired by https://github.com/knurling-rs/defmt/
The idea is simple, we want to spend as little time as possible on creating logs, and have them take up as little space as possible.
cdefmt
achieves that through a multi phased approach:
- Encode as much information as possible at compile time
Done via the macros incdefmt/include/cdefmt.h
and some modifications to the compilation flags/linkerscript.
All of the log strings, parameter format and size information are encoded at compile time into the binary, these can be stripped from the shipped binary. - Have minimal information encoded into each log - only an ID and the raw values of the arguments.
- Defer the formatting logic to post processing
Performed by the rust library cdefmt-decoder, using information from the originally compiled binary to parse the ID and arguments into a full string.
For more technical details refer to Technical Details
Before using the logging framework, there are some setup steps necessary to integrate the encoding logic into your compiled binary.
Once that's done, you can use the cdefmt-decoder
library to decode your logs.
A working example is provided under:
- stdout generates the logs and prints them to
stdout
Ifstdout
is aFIFO
(aka shell pipe|
), the output format changes to a more compact, easy to parse, but not human understandable format. - stdin takes the logs from it's
stdin
and, using the original binary file, parses them into proper logs.
It expects the input to be formatted in the same way asstdout
outputs inFIFO
mode.
The easiest way to run the example would be to build the project using cmake:
cmake -S . -B build
cmake --build build/
Then run example-stdout
and pipe it's stdout into the example-stdin
, while providing example-stdin
a path to the originally compiled binary of example-stdout
:
build/examples/stdout/example-stdout | build/stdin --elf build/examples/stdout/example-stdout
To further complicate matters, it's possible to strip example-stdout
and run the stripped binary, logs will still work:
# Remove all symbols and debugging information from example-stdout
strip build/examples/stdout/example-stdout -o build/examples/stdout/example-stdout-stripped --strip-all
# Remove `.cdefmt` section from the stripped binary
objcopy --remove-section .cdefmt build/examples/stdout/example-stdout-stripped
# Run the example (note that the parser still needs the original un-stripped binary)
build/examples/stdout/example-stdout-stripped | build/stdin --elf build/examples/stdout/example-stdout
This means that it's possible to ship stripped binaries, while using the originals to parse the logs.
The stripping process removes the following from the binary:
- log strings
- debugging information
- symbol table
This has 2 advantages:
- reduced size
- obfuscation
- cdefmt encodes the log strings along with some metadata into a special section in the elf binary, we need to modify the project's linker script to generate that section:
Add the cdefmt metadata section to the end of the linker script (or right before/DISCARD/
if it exists):/* CDEFMT: log metadata section */ .cdefmt 0 (INFO) : { KEEP(*(.cdefmt.init .cdefmt.init.*)) . = . + 8; KEEP(*(.cdefmt .cdefmt.*)) }
- cdefmt uses the build-id to uniqely identify an elf file, and validate the compatibility of the parsed logs with the supplied elf:
Update (or add if it doesn't exist) the.note.gnu.build-id
section:.note.gnu.build-id : { PROVIDE(__cdefmt_build_id = .); *(.note.gnu.build-id) }
- Compile with the following flags:
-g
- cdefmt uses debugging information to parse log arguments.-Wl,--build-id
- link the build-id into the binary.
- Add the header
cdefmt.h
to your project. - Include the header wherever you want to use it.
- Implement cdefmt_log to forward the logs to your logging backend.
- Call CDEFMT_GENERATE_INIT outside of a function scope.
- Call cdefmt_init after your logging backend is initialized and
cdefmt_log
can be safely called. - Enjoy.
See example project for reference.
We'll follow the process of generating and parsing/formatting a single log.
Starting off with a simple log taking two arguments:
CDEFMT_INFO("Example log {} {:x}", argument_a, arg_b);
This macro is expanded by the preprocessor into
1 do {
2 const static __attribute__((section(".cdefmt"))) char cdefmt_log_string2[] = "{""\"version\":""1"",""\"counter\":""2"",""\"level\":""2"",""\"file\":\"""/root/git/cdefmt/examples/stdout/main.c""\",""\"line\":""52"",""\"message\":\"""Example log {} {:x}""\",""\"names\": [""\"""argument_a""\""",""\"""arg_b""\"""]""}";
3 struct __attribute__((packed)) cdefmt_log_args_t2 {
4 const char* log_id;
5 __typeof__(argument_a) arg0;
6 __typeof__(arg_b) arg1;
7 };
8 struct cdefmt_log_args_t2 cdefmt_log_args2 = {
9 .log_id = cdefmt_log_string2,
10 };
11 do {
12 memcpy((&cdefmt_log_args2.arg0), &(argument_a), sizeof(cdefmt_log_args2.arg0));
13 } while (0);
14 do {
15 memcpy((&cdefmt_log_args2.arg1), &(arg_b), sizeof(cdefmt_log_args2.arg1));
16 } while (0);
17 cdefmt_log(&cdefmt_log_args2, sizeof(cdefmt_log_args2), 2);
18 } while (0);
Let's go over each line and explain what's happening there:
- Standard
do { ... } while (0)
macro mechanism:- Creates scope for local variables
- Requires user to add a semicolon after the macro call
- Protects against
if (...) macro_call(); else ...
issues
- We have a few things going on here, let's break them down:
const static char cdefmt_log_string2[]
:
We're creating a const static string that will hold the actual log string along with some metadata.
Making itstatic
allows the compiler to allocate a permanent address for this string, this will come into play later 😉.
Notice how the string's name ends with
2
, the same value as in thecounter
field in the json.__attribute__((section(".cdefmt")))
:
Tell the linker to place the string into the special.cdefmt
section in the elf.
This allows us to separate all the log strings into a separate section, which can later be stripped from the elf, removing any trace of the strings from the binary.- Now comes the string itself, it's generated using some macro magic and depends on a known c feature - any consecutive string literals are merged into a single string literal by the compiler.
The resulting string is actually a simple json, let me format it for readability:{ "version": 1, "counter": 2, "level": 2, "file": "/root/git/cdefmt/examples/stdout/main.c", "line": 52, "message": "Example log {} {:x}", "names": ["argument_a", "arg_b"] }
field explanation version json schema version counter unique counter identifying this log, the uniqueness is per compilation unit ( .c
file), so logs fromlevel the log level (verbose, debug, etc...) file the file where this log came from line the code line the log came from message the user provided format string names the argument names, in the same order as the user provided them
- We're creating a new type which will uniquely identify this log and it's arguments.
The struct ispacked
to remove any padding, and make it as small as possible.
Notice how the struct's name ends with
2
, the same value as in thecounter
field in the json. - The first value in the struct is the log id.
- The second value in the struct is a copy of the first argument.
- The third value in the struct is a copy of the second argument.
- Closing the type definition.
- Instantiate a variable of the newly defined type.
Notice how
2
appears here again in all the type/variable names. - Assign the address of the log string into the
log_id
field.
We'll be using this address as the log's unique identifier, the parser will be able to extract the log string from the original elf using this value. - Closing the initializer.
- See 1.
- Copy the value of the first argument into the log structure.
- See 1.
- See 1.
- Copy the value of the second argument into the log structure.
- See 1.
- Send the log into the user implemented logging backend
argument explanation &cdefmt_log_args2 pointer to the log structure sizeof(cdefmt_log_args2) size of the log structure 2 log level (can be used for runtime filtering by the user) From the user's point of view, he's receiving a binary blob, it's the user's responsibility to store this blob, and then use the parsing library to parse this blob back into the log string. - See 1.
⚠️ Notice how we don't access the log string at any time during runtime, only it's address.
Let's look at how the necessary metadata is stored in the binary.
As we've seen in the previous section, the log metadata is stored in a special .cdefmt
section, this section is manually defined in the linker script:
1. /* CDEFMT: log metadata section */
2. .cdefmt 0 (INFO) : {
3. KEEP(*(.cdefmt.init .cdefmt.init.*))
4. . = . + 8;
5. KEEP(*(.cdefmt .cdefmt.*))
6. }
- Comment
- Define a new section:
parameter explanation .cdefmt
Section name 0
Place the section at address 0.
All the logs we insert into this section will start from address 0(INFO)
Set the section type as not allocatable, so that no memory is allocated for the section when the program is run. - Put the
.cdefmt.init
inputs into this section, this comes first as we want the init log to be uniquely placed at address 0. - Advance the location by 8, this ensures that even if the init log wasn't compiled, none of the other logs will ever have id 0.
- Put the
.cdefmt
inputs into this section, this is where all the log strings/metadata will be stored.
To gain a better understanding I recommend reading through the binutils ld docs.
When the project is compiled and linked, each log string is assigned a unique address by the linker, this address can be used as a unique ID to identify the log.
The compilation process embeds these addresses into the places that reference the log strings, because, being marked INFO
and static
, they don't depend on any runtime linkage, and are hardcoded by the linker, this means we can completely remove the .cdefmt
section from the shipped binary, without affecting any of the functionality.
Now we know where the metadata strings are stored, and how we can extract them using the log ID, however we still don't know how to parse the log's arguments, as they're in a packed struct and can have any size, type or structure.
The next section will discuss exactly that.
As we've seen previously, the log's arguments are copied into a packed struct, we need to know it's structure to be able to parse the binary blob back into usable data.
This is where debugging information comes in handy.
One of the requirements of this framework is compiling the code with -g
, this instructs the compiler/linker to generate debugging information and embed it into the elf.
ELF uses the DWARF debugging format to store the debugging information, we can use that to our advantage.
Short reminder from the Generation section, for each log we're doing 3 things that are useful here:
- Defining a new type:
struct cdefmt_log_args_t<counter>
. - Inserting
counter
into the log's metadata string. - Inserting the source file path into the log's metadata string.
The compiler will generate debugging information that describes this new type, and will embed it into the binary.
Using the counter value, and the source file path, we can find the specific compilation unit that defines the structure of the log args struct, there we can find the debugging information to figure out how to correctly parse the log's arguments from a binary blob.
Since the debugging information is only needed at parsing time, it can be stripped from the shipped binary.
All that's left is to parse the binary blobs generated by the logger:
- Load the original, unstripped binary - there we find the
.cdefmt
section and the debugging information. - Extract the log id from the binary blob.
Since we have the elf, we know the size of a pointer in the target, so we know the size of the log id. - Find the log metadata in the
.cdefmt
section, and parse the json. - Use the counter and source file to find the debugging information describing the log's arguments.
- Parse the remaining fields of the log args structure.
- Format the log string with the parsed arguments.
- Profit :D
- MIT license (LICENSE or http://opensource.org/licenses/MIT)