Fetch FIFO clearing twice on CJALR permit execute error #28

Closed
marnovandermaas opened this issue Mar 18, 2024 · 6 comments

Comments

@marnovandermaas
Contributor

The behavior I am seeing is that the first 4 instruction bytes of the trap handler are dropped when a CJALR error happens.

You can see from this waveform that fifo_clear in the prefetch buffer gets asserted twice in succession. This causes the instr_rdata_i corresponding to the trap handler's PC (rdata 12FD42C1) to be dropped.
[waveform screenshot]

I think this happens because the CJALR clears an entry in the FIFO through branch_req_spec_o. However, taking the interrupt also causes an entry to be cleared. This extra clear puts the fetch FIFO out of sync. The solution is to not clear the FIFO in the CHERI execute stage when an instruction fault happens, i.e. to use the behavior of branch_req_o (the non-speculative version), for example.
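As a rough sketch of that idea (not the actual patch; only branch_req_spec_o is taken from the design, while cheri_branch_req and cheri_instr_fault are assumed placeholder names):

  // Sketch only: gate the speculative FIFO clear on the CJALR not faulting.
  logic cheri_branch_req;   // CJALR wants to redirect fetch (assumed name)
  logic cheri_instr_fault;  // CJALR faults, e.g. permit-execute violation (assumed name)

  // Only request the early (speculative) clear when the CJALR will commit;
  // a faulting CJALR is redirected by the trap logic instead, so the FIFO
  // is cleared exactly once.
  assign branch_req_spec_o = cheri_branch_req && !cheri_instr_fault;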

This is what I would consider to be the correct behavior, after applying the patch from this PR:
[waveform screenshot]

Here is the dump that causes the error:


error.elf:	file format elf32-littleriscv

Disassembly of section .rom_loader:

00100000 <_start>:
		...

00100080 <start>:
  100080: 5b 05 d0 03  	cspecialr	ca0, mtdc
  100084: 97 05 00 00  	auipcc	ca1, 0
  100088: 5b 06 e0 03  	cspecialr	ca2, mscratchc
  10008c: b7 02 10 00  	lui	t0, 256
  100090: 93 82 82 0b  	addi	t0, t0, 184
  100094: db 82 55 20  	csetaddr	ct0, ca1, t0
  100098: 5b 80 c2 03  	cspecialw	mtcc, ct0
  10009c: 85 42        	addi	t0, zero, 1
  10009e: 73 90 42 bc  	csrw	3012, t0
  1000a2: 21 44        	addi	s0, zero, 8

001000a4 <permitexecute>:
  1000a4: 02 95        	cjalr	ca0
  1000a6: 6f 00 40 02  	j	0x1000ca <fail>
  1000aa: 13 00 00 00  	nop
  1000ae: 01 00        	nop

001000b0 <compartment_key>:
		...

001000b8 <trap>:
  1000b8: c1 42        	addi	t0, zero, 16

001000ba <delayloop>:
  1000ba: fd 12        	addi	t0, t0, -1
  1000bc: e3 9f 02 fe  	bnez	t0, 0x1000ba <delayloop>
  1000c0: 73 50 30 34  	csrwi	mtval, 0
  1000c4: 73 50 20 34  	csrwi	mcause, 0

001000c8 <pass>:
  1000c8: 01 a0        	j	0x1000c8 <pass>

001000ca <fail>:
  1000ca: 01 a0        	j	0x1000ca <fail>
		...

0010014c <end>:
  10014c: 01 00        	nop
  10014e: 01 00        	nop
  100150: 01 00        	nop
  100152: 01 00        	nop
  100154: 01 00        	nop
  100156: 00 00        	unimp	

In case it's useful, here are the wave files that I used to generate the images above:
wavefiles.zip

@kliuMsft
Contributor

Looking at your failure case waveform, I noticed something interesting at the fetch interface. Basically there is a fetch request at 0x0000_0000 due to the speculative fetch. The fetch is granted, however the rvalid for it never comes (you can see there is an rvalid for the fetch at address 0x001_00a8 and then an rvalid for 0x001_00b8, but nothing in between). That's probably why the fetch_fifo went out of sync. Can you check the FPGA memory design?

[waveform screenshot]
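A protocol check along these lines (a hypothetical SVA sketch; clk_i, rst_ni and the instr_* handshake names follow the ibex-style fetch interface, and the real design's outstanding-request tracking may differ) would flag a grant that never gets a response:

  // Sketch only: every granted instruction fetch must eventually return an
  // instr_rvalid_i, possibly with instr_err_i set.
  property granted_fetch_gets_rvalid;
    @(posedge clk_i) disable iff (!rst_ni)
      (instr_req_o && instr_gnt_i) |-> s_eventually instr_rvalid_i;
  endproperty

  assert property (granted_fetch_gets_rvalid)
    else $error("fetch granted but no rvalid ever returned");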

@marnovandermaas
Contributor Author

Yes, it seems that currently the memory system grants the access to address zero but never returns anything. However, if I don't grant the access, the ibex will just hang and retry fetching from that address. Either way, I think it's better not to do this speculative fetch for a capability that you know will fail.

@kliuMsft
Contributor

I guess we are still not on the same page. There are 2 separate issues:

  1. The Sonata memory system behavior is clearly a bug. It's ok to return an instr_err together with instr_rvalid, but granting access without ever issuing an instr_rvalid is NOT acceptable. This condition causes the CPU to go into an unrecoverable failure state (same consequence if this happens on the data interface). See the sketch after this list.
  2. Whether we want to issue a speculative fetch at all. To me it is similar to branch prediction, and the fetch stage should be able to handle it. The intention of introducing the speculative fetch was to improve timing on the instr interface (to remove bounds checking from the critical path). While this is now mitigated by the ISA spec change, I'd like to defer the change until we can fully evaluate its impact (synthesis/timing analysis and regression simulation).
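For point 1, a minimal sketch of "grant, then still respond" (hypothetical module and address decode, not the Sonata code): the unmapped fetch is granted, and one cycle later the memory returns instr_rvalid together with instr_err, so the core can take a precise fetch fault instead of waiting forever.

  module fetch_err_responder (
    input  logic        clk_i,
    input  logic        rst_ni,
    input  logic        instr_req_i,
    input  logic [31:0] instr_addr_i,
    output logic        instr_gnt_o,
    output logic        instr_rvalid_o,
    output logic        instr_err_o
  );
    // Assumed address decode; the real Sonata memory map differs.
    function automatic logic addr_is_mapped(input logic [31:0] addr);
      return addr >= 32'h0010_0000;
    endfunction

    logic unmapped_q;

    assign instr_gnt_o = instr_req_i;  // always grant

    always_ff @(posedge clk_i or negedge rst_ni) begin
      if (!rst_ni) begin
        instr_rvalid_o <= 1'b0;
        unmapped_q     <= 1'b0;
      end else begin
        instr_rvalid_o <= instr_req_i;
        unmapped_q     <= instr_req_i && !addr_is_mapped(instr_addr_i);
      end
    end

    // The error accompanies the response; it never replaces it.
    assign instr_err_o = instr_rvalid_o && unmapped_q;
  endmodule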

@marnovandermaas
Contributor Author

Agreed with point number 1. I have opened an issue in the Sonata repository to track this while I investigate: lowRISC/sonata-system#28

In terms of the speculative fetch, I am curious about the ISA change that you mention. Is this published anywhere?

@kliuMsft
Contributor

That's in the cheriot-sail PR CHERIoT-Platform/cheriot-sail#37

@marnovandermaas
Contributor Author

Thanks, that's very useful. I'm going to close this issue, as the discussion of speculative branching should be done separately.
