Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect offsets in PCC-relative address calculations when compiling C for CHERIoT #106

Open
alees24 opened this issue Jan 21, 2025 · 6 comments
Assignees

Comments

@alees24
Copy link

alees24 commented Jan 21, 2025

Toolchain issue with auipcc instructions in compiled C.

Commit: 6d22636
(also observed with earlier versions)

The offsets used in the calculation of PCC-relative targets are observed to be incorrect. The offsets appear not to have been adjusted for the fact that the AUIPCC instruction employs an 11-bit shifted constant rather than the RV-standard 12-bit offset. This has been observed to occur when the ensuing instruction is 'clc' (as in the pseudo-op 'cllc'), cincaddrimm or cjalr, as in this demonstration of a simple function call.

In every case that's been studied, the correct target address may be obtained by adding the literal constant from the 'auipcc <>' into the calculated address a second time, i.e. calculating as if that constant were still 12-bit shifted as per the RV standard.
Note: the file cheriot_toolchain.cmake shows the compilation switches being used; since this is our first use of C and CHERI on a large codebase, please do check the options used.

Update: the significant variable in this specific failure is the disabling of linker relaxations, but with more extensive source trees, the same faults occurs even with linker relaxations enabled.

This simple reproduction consists of a simple C/C++ file (test_case.c and test_case.cc; the code is the same) and a contrived assembler file - test_space.S - that ensures the target address is sufficiently far from the call site that auipcc must have a non-zero offset.

Build instructions:

  cmake -B build
  cmake --build build

Output:

  • test_c - ELF file compiled from C source
  • test_cc - ELF file compiled from C++ source

Relevant disassembly from 'test_c':

00200080 <start>:
; start():
  200080: 17 13 00 00  	auipcc	ct1, 1
  200084: 67 00 83 81  	cjr	-2024(ct1)
...
00200898 <start2>:
; start2():
  200898: 6f 00 50 00  	j	0x20109c <start2+0x804>
  20089c: 73 10 00 c0  	unimp
  2008a0: 73 10 00 c0  	unimp

The above disassembly shows the error in test_c but you may also simulate it using the included Verilator build of Sunburst chip:

  ./Vtop_chip_verilator -E build/test_c | more

Output from simulation of test_c (C compilation)

Simulation of Sunburst Chip
===========================

Tracing can be toggled by sending SIGUSR1 to this process:
$ kill -USR1 88228

USBDPI: Monitor output file created at /home/adrianl/toolchain_bug/usb0.log. Works well with tail:
$ tail -f /home/adrianl/toolchain_bug/usb0.log

UART: Created /dev/pts/1 for uart0. Connect to it with any terminal program, e.g.
$ screen /dev/pts/1
UART: Additionally writing all UART output to 'uart0.log'.

Simulation running, end by pressing CTRL-c.
TOP.top_chip_verilator.u_top_chip_system.u_core_ibex.u_ibex_top_tracing.u_ibex_tracer.unnamedblk1: Writing execution trace to trace_core_00000000.log
                  25: Illegal instruction (hart 0) at PC 0x00200098: 0xc0001073
                  33: Illegal instruction (hart 0) at PC 0x00200000: 0x0001145b
                  41: Illegal instruction (hart 0) at PC 0x00200000: 0x0001145b
                  49: Illegal instruction (hart 0) at PC 0x00200000: 0x0001145b

The trace log output is:

Time	Cycle	PC	Insn	Decoded instruction	Register and memory contents
             18	         5	00200080	00001317	CH.auipcc	c6, 0x1	  x6=0x00200880+0x15e3e0000
             22	         7	00200084	81830067	CH.cjalr	c0,-2024(c6)	  x6:0x00200880+0x15e3e0000  x0=0x00000000+0x000000000
             30	        11	00200098	c0001073	-->csrrw	x0,cycle,x0	  x0:0x00000000  x0=0x00000000
             38	        15	00200000	    0000	-->c.unimp	
             46	        19	00200000	    0000	-->c.unimp	
             54	        23	00200000	    0000	-->c.unimp

Compilation with linker relaxations enabled

The corresponding output from the build, where the auipcc/cjalr is replaced with a simple cjal instruction, is fine:

./Vtop_chip_verilator -E build/test_cc | more

Output from simulation of test_c/cc

Simulation of Sunburst Chip
===========================

Tracing can be toggled by sending SIGUSR1 to this process:
$ kill -USR1 88213

USBDPI: Monitor output file created at /home/adrianl/toolchain_bug/usb0.log. Works well with tail:
$ tail -f /home/adrianl/toolchain_bug/usb0.log

UART: Created /dev/pts/1 for uart0. Connect to it with any terminal program, e.g.
$ screen /dev/pts/1
UART: Additionally writing all UART output to 'uart0.log'.

Simulation running, end by pressing CTRL-c.

Time	Cycle	PC	Insn	Decoded instruction	Register and memory contents
             18	         5	00200080	0190006f	CH.cjal	c0,200898	  x0=0x00000000+0x000000000
             26	         9	00200898	0050006f	CH.cjal	c0,20109c	  x0=0x00000000+0x000000000
             34	        13	0020109c	10500073	wfi

Source files for reproducing this issue: source.zip
Simulator binary (x64 Linux): simulator.zip

@resistor resistor self-assigned this Jan 23, 2025
@marnovandermaas
Copy link

With the -S flag, this is the output:

start:                                  # @start
.Lfunc_begin0:
    .file   0 "toolchain_bug/build" "toolchain_bug/test_case.c" md5 0x4719a33fed9f03105d48b7f613596fc3
    .cfi_sections .debug_frame
    .cfi_startproc
# %bb.0:
    .file   1 "toolchain_bug" "test_case.c" md5 0x4719a33fed9f03105d48b7f613596fc3
    .loc    1 7 3 prologue_end              # test_case.c:7:3
    ctail   start2
.Ltmp0:
.Lfunc_end0:

@davidchisnall
Copy link

ctail   start2

That's interesting. I thought we expanded that in the back end, rather than emitting it.

@luismarques
Copy link

As far as I can tell, the CHERIoT 11-bit immediate for AUIPCC wasn't implemented in LLD (except when applying a CHERIOT_COMPARTMENT_HI relocation and the instruction is rewritten to AUIPCC). So every time an AUIPCC + ... addressing sequence is used it will compute the wrong result.

I assume the reason this was overlooked was because the toolchain users and developers were always using relaxations and all of those AUIPCC + ... instructions were rewritten to CJAL, etc.

ctail   start2

That's interesting. I thought we expanded that in the back end, rather than emitting it.

It's expanded in the MC layer but not for textual assembly.

@davidchisnall
Copy link

Yes, that sounds about right. I don't think I've ever seen an auipcc with a non-zero offset that wasn't coupled with a relocation. I'm not really sure why your code is not being emitted with a relaxation.

LLD doesn't need to know about the shift except when applying relocations. We shouldn't have any relocations except CHERIOT_COMPARTMENT_HI on auipcc in LLD, so if the compiler is emitting the wrong relocation, that's a different bug.

@luismarques
Copy link

luismarques commented Jan 24, 2025

Yes, that sounds about right. I don't think I've ever seen an auipcc with a non-zero offset that wasn't coupled with a relocation. I'm not really sure why your code is not being emitted with a relaxation.

The reproducer is being compiled with -mno-relax for the C version, which explains the lack of the relaxation. I don't have the context for why that's the case. @alees24? Of course, you'd normally expect the linker to still do the correct thing even without relaxations, or at least emit a warning if that's not supported for this special toolchain.

LLD doesn't need to know about the shift except when applying relocations. We shouldn't have any relocations except CHERIOT_COMPARTMENT_HI on auipcc in LLD, so if the compiler is emitting the wrong relocation, that's a different bug.

It has a R_RISCV_CHERI_CCALL relocation. That's how RISCVMCCodeEmitter::expandFunctionCall expands the ctail.

@alees24
Copy link
Author

alees24 commented Jan 24, 2025

The reproducer is being compiled with -mno-relax for the C version, which explains the lack of the relaxation. I don't have the context for why that's the case. @alees24?

I switched off the relaxations only because the linker was failing to achieve convergence and I needed to get past that. The problem was seen before that, though, albeit not as such an obvious fault. With relaxations permitted I still see a failure at the point of trying to follow an implicit compiler-generated library call to __library_import_libcalls_memcpy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants