ISA Specification bug: Bounds check during taken branches and jumps is either redundant or wrong #18

vmurali · 2023-11-09T19:03:13Z

It looks like the bounds check uses "minimum-inst-bytes" when checking if a taken address is in-bounds. This means that if the jump/branch target is a 32-bit instruction, where the last 16 bits are out-of-bounds, we will get an out-of-bounds error after fetching the target instruction, which kind of defeats the purpose w.r.t. triaging why the branch or the jump failed; we might as well not do out-of-bounds checks during jumps/taken-branches and wait for the instruction fetch to perform the bounds check.

A potential scenario where this exception (as it is spec-ed currently) becomes useless is as follows:
We have a compressed trampoline initially, allowing us to jump correctly to the trampoline and out of it with just a 2-byte bounds. Now the trampoline is removed and the code is inlined (perhaps some JIT-optimization). When we are debugging/trap handling an out-of-bounds exception because of this, we will still want access to the old PC, so that we can fix the PC bounds there. The way the spec is, we will lose the previous PC information in this scenario, which is not ideal.

Potential fixes:
a) check for 4-bytes in-bounds on taken-branch/jump. This has the problem if the target is a single instruction compressed trampoline with precisely set bounds - we will cause a spurious exception while executing the previous branch/jump.
b) store the previous PC and taken-ness in some micro-architectural state, and if the new instruction's fetch fails check for boundedness, set the EPC with the previous PC if the taken-ness is set. If taken-ness is not set, set EPC to current PC.
c) store the previous PC and taken-ness in some architectural state, and every boundedness failure during instruction fetch stores the previous PC and current PC. This could be easier for writing the trap handlers - the return address for the trap handler is still the current PC, but the fix of the bounds will be done on the previous PC as opposed to (b) where the trap handler has to compute the PC to return to.

### Tasks

The text was updated successfully, but these errors were encountered:

rmn30 · 2023-11-10T13:29:46Z

Thanks Murali, this is an interesting one. I think the current check will precisely identify the instruction responsible for an out of bounds PC in most cases (e.g. a computed jump gone wrong, possibly due to an attack). I expect the case of jumping to the last instruction in PCC (whether 2 or 4-bytes) would be quite rare, but the example you raise is an interesting one. I think @davidchisnall or @nwf-msr are better qualified to comment there.

In terms of the potential solutions:

a) At first glance this looks bad because it means you can't jump to a compressed instruction at the end of PCC but this could be solved by padding with a c.nop (or just not using the compressed jump instruction there). I guess it depends on how often this arises whether the extra padding matters, and if there are any existing sw alignment requirements etc.
b) sounds interesting but means extra architectural and hardware complexity to solve a problem that I'm not sure will arise in practice and the sw still has to do quite a bit of work?
c) sounds simpler from a hw / architectural point of view and gives software the flexibility to do what it wants at the cost of a little even if this comes with some sw complexity.

Other options:

d) retain status quo and accept current limitation of the check, even if it is a bit inconsistent
e) drop the check entirely and just rely on instruction fetch check

I'd be interested to hear from @kliuMsft re. the relative hw cost of each of these, especially since the current check is potentially a little costly.

davidchisnall · 2023-11-10T13:59:00Z

At first glance this looks bad because it means you can't jump to a compressed instruction at the end of PCC but this could be solved by padding with a c.nop

I think the pain involved here depends a lot on the use case. I think the ABI requires functions to be aligned to 4-byte boundaries and so you almost certainly need padding after the end of anything where PCC ends on a 2-byte boundary, and adding a c.nop and strongly aligning the end is easier for statically linked code. For JIT'd code, it depends a bit on what your memory management looks like.

I think the trampoline example is very unlikely to exist on embedded code: on-device generation of fine-grained code like this is typically limited to the kinds of VMs that require 8+ MiBs of RAM to run, so I don't object to the current version (if you really want to track trampolines like this, allocate 4 bytes).

If moving to a 4-byte check in the cj[al][r] instruction would simplify the implementation (@kliuMsft ?) then I don't think it will affect and sensible code and shouldn't hurt our ABI in any way.

vmurali · 2023-11-10T15:25:29Z

b) sounds interesting but means extra architectural and hardware complexity to solve a problem that I'm not sure will arise in practice and the sw still has to do quite a bit of work?

I dont think this will be a lot of hardware complexity - at least not in the critical path. The check will happen during instruction fetch, and you just need to add a mux when writing the EPC in the case of a taken previous instruction and out-of-bounds exception.

c) sounds simpler from a hw / architectural point of view and gives software the flexibility to do what it wants at the cost of a little even if this comes with some sw complexity.

Right, this would give the software freedom to do whatever it wants - it can completely ignore the previous PC (making it really simple). Not sure why s/w gets more complex.

d) retain status quo and accept current limitation of the check, even if it is a bit inconsistent

Given how early you are in the spec process, I don't think it's a good idea.

e) drop the check entirely and just rely on instruction fetch check

This is also bad when I am writing an automatic jitter. I will need to fix both the caller code and callee code. First I will fix the callee code, and then on encountering the out-of-bounds exception, I will fix the caller code.

vmurali · 2024-01-25T06:06:18Z

I think there's one more issue - representability checks. Previously, the bounds check at jumps/branches for the target+2 ensured the cap is representable. But if one removes the check during jump/branch execution, then one still needs to do the representability check during jump/branch execution. Alternatively, the bounds check during fetch can be done using the previous pc bounds and current pc address for taken branches/jumps.

This makes the overall solution less than ideal. But I still contend that it's an improvement.

rmn30 · 2024-02-29T13:12:32Z

I think #37 addresses this. Feel free to comment further if you disagree.

rmn30 mentioned this issue Jan 9, 2024

Allowing delayed check for jump/branch instructions #23

Closed

rmn30 closed this as completed Feb 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ISA Specification bug: Bounds check during taken branches and jumps is either redundant or wrong #18

ISA Specification bug: Bounds check during taken branches and jumps is either redundant or wrong #18

vmurali commented Nov 9, 2023 •

edited by rmn30

Loading

rmn30 commented Nov 10, 2023

davidchisnall commented Nov 10, 2023

vmurali commented Nov 10, 2023 •

edited

Loading

vmurali commented Jan 25, 2024

rmn30 commented Feb 29, 2024

ISA Specification bug: Bounds check during taken branches and jumps is either redundant or wrong #18

ISA Specification bug: Bounds check during taken branches and jumps is either redundant or wrong #18

Comments

vmurali commented Nov 9, 2023 • edited by rmn30 Loading

rmn30 commented Nov 10, 2023

davidchisnall commented Nov 10, 2023

vmurali commented Nov 10, 2023 • edited Loading

vmurali commented Jan 25, 2024

rmn30 commented Feb 29, 2024

vmurali commented Nov 9, 2023 •

edited by rmn30

Loading

vmurali commented Nov 10, 2023 •

edited

Loading