Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Known-small memcpy-s are still libcalls? #86

Closed
nwf opened this issue Dec 21, 2024 · 4 comments
Closed

Known-small memcpy-s are still libcalls? #86

nwf opened this issue Dec 21, 2024 · 4 comments

Comments

@nwf
Copy link
Member

nwf commented Dec 21, 2024

Continuing from #85 and looking at the NATIVE_LITTLE_ENDIAN option afforded by BLAKE2s's portable implementation (https://github.com/BLAKE2/libb2/blob/643decfbf8ae600c3387686754d74c84144950d1/src/blake2-impl.h#L32-L36), load32 now uses a memcpy to read four bytes.

And we dutifully turn that into an actual library call, rather than crushing it down into a clw instruction:

Disassembly of section .text.load32:

00000000 <load32>:
; {
       0: 7d 71         cincoffset      csp, csp, -16
       2: 06 e4         csc     cra, 8(csp)
       4: db 05 a5 fe   cmove   ca1, ca0
       8: 48 00         cincoffset      ca0, csp, 4
       a: 5b 25 45 00   csetbounds      ca0, ca0, 4
;   memcpy( &w, src, sizeof( w ) );
       e: 11 46         li      a2, 4

00000010 <.LBB1_1>:
      10: 97 03 00 00   auipcc  ct2, 0
                        00000010:  R_RISCV_CHERIOT_COMPARTMENT_HI       __library_import_libcalls_memcpy
                        00000010:  R_RISCV_RELAX        *ABS*
      14: 83 b3 03 00   clc     ct2, 0(ct2)
                        00000014:  R_RISCV_CHERIOT_COMPARTMENT_LO_I     .LBB1_1
                        00000014:  R_RISCV_RELAX        *ABS*
      18: 82 93         cjalr   ct2
;   return w;
      1a: 12 45         clw     a0, 4(csp)
      1c: a2 60         clc     cra, 8(csp)
      1e: 41 61         cincoffset      csp, csp, 16
      20: 82 80         cret

This, too, then fails to get inlined at -Oz and everything's still a bit sad.

@resistor
Copy link
Collaborator

@nwf Can you share more specifically what command line you're using that's producing that code? I'm not able to reproduce it, and am seeing codegen equivalent to #85 when using NATIVE_LITTLE_ENDIAN

@nwf
Copy link
Member Author

nwf commented Dec 23, 2024

The relevant bits of xmake output are, I think,

[ 43%]: cache compiling.release blake2s-ref.c
/cheri/out/cheriot/sdk/bin/clang -c -std=c2x -Qunused-arguments -target riscv32-unknown-unknown -mcpu=cheriot -mabi=cheriot -mxcheri-rvc -mrelax -fshort-wchar -nostdinc -Oz -g -ffunction-sections -fdata-sections -fomit-frame-pointer -fno-builtin -fno-exceptions -fno-asynchronous-unwind-tables -fno-c++-static-destructors -fno-rtti -Werror -I/cheri/source/cheriot/cheriot-rtos/sdk/include/c++-config -I/cheri/source/cheriot/cheriot-rtos/sdk/include/libc++ -I/cheri/source/cheriot/cheriot-rtos/sdk/include -I/cheri/source/cheriot/cheriot-rtos/sdk/boards/../include/platform/generic-riscv -DCHERIOT_NO_AMBIENT_MALLOC -DBLAKE2_API=__cheri_libcall -DNATIVE_LITTLE_ENDIAN -DTEMPORAL_SAFETY -DSOFTWARE_REVOKER -DSAIL -DRISCV_HTIF -DCPU_TIMER_HZ=2000 -DTICK_RATE_HZ=10 -DSIMULATION -DCONFIG_MSHWM -DDEVICE_EXISTS_clint -DDEVICE_EXISTS_shadow -DDEVICE_EXISTS_uart -DREVOKABLE_MEMORY_START=0x80000000 "-DCHERIOT_INTERRUPT_NAMES=FakeInterrupt=4, " -o ../../../../../build/cheriot/cheriot-rtos/b3sum/build.sail/.objs/libb2s/cheriot/cheriot/release/blake2s-ref.c.o blake2s-ref.c

and then

[ 72%]: linking library libb2s.library
/cheri/out/cheriot/sdk/bin/ld.lld --script=/cheri/source/cheriot/cheriot-rtos/sdk/library.ldscript --compartment --gc-sections --relax -o ../../../../../build/cheriot/cheriot-rtos/b3sum/build.sail/cheriot/cheriot/release/libb2s.library ../../../../../build/cheriot/cheriot-rtos/b3sum/build.sail/.objs/libb2s/cheriot/cheriot/release/blake2s-ref.c.o

Relative to upstream (BLAKE2/libb2@643decf), I have the following local changes to get things compiling:

1287f4a33af5% git diff
diff --git a/src/blake2-impl.h b/src/blake2-impl.h
index 115e66e..8e2c3af 100644
--- a/src/blake2-impl.h
+++ b/src/blake2-impl.h
@@ -21,7 +21,7 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <string.h>
-#include "config.h"
+// #include "config.h"
 
 #define BLAKE2_IMPL_CAT(x,y) x ## y
 #define BLAKE2_IMPL_EVAL(x,y)  BLAKE2_IMPL_CAT(x,y)
@@ -154,7 +154,7 @@ static inline void secure_zero_memory(void *v, size_t n)
   explicit_memset(v, 0, n);
 #else
   memset(v, 0, n);
-  __asm__ __volatile__("" :: "r"(v) : "memory");
+  __asm__ __volatile__("" :: "C"(v) : "memory");
 #endif
 #endif
 }
diff --git a/src/blake2.h b/src/blake2.h
index 5ca17f6..00b44e4 100644
--- a/src/blake2.h
+++ b/src/blake2.h
@@ -31,6 +31,7 @@
   #define BLAKE2_DLL_PRIVATE
 #endif
 
+#if !defined(BLAKE2_API)
 #if defined(BLAKE2_DLL)
   #if defined(BLAKE2_DLL_EXPORTS) // defined if we are building the DLL
     #define BLAKE2_API BLAKE2_DLL_EXPORT
@@ -42,6 +43,7 @@
   #define BLAKE2_API
   #define BLAKE2_PRIVATE
 #endif
+#endif
 
 #if defined(__cplusplus)
 extern "C" {

@resistor
Copy link
Collaborator

The first reason this happens is because of the compiler flag -fno-builtin, which prevents LLVM from replacing the call to memcpy with a call to __builtin_memcpy.

Once that flag is dropped, a second reason comes into play: we're declaring memcpy with the cheri_libcall calling convention, which also blocks conversion to __builtin_memcpy.

@resistor
Copy link
Collaborator

With #90 this works properly when -fno-builtin is dropped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants