Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simulator report error #2236

Open
ha0lyu opened this issue Dec 24, 2024 · 9 comments
Open

simulator report error #2236

ha0lyu opened this issue Dec 24, 2024 · 9 comments
Labels

Comments

@ha0lyu
Copy link

ha0lyu commented Dec 24, 2024

Observed Behavior

Test ibex in verilator with cosim, simulator report error illegal instruction.

001003e8 <main>:
  1003e8:	1141                	addi	sp,sp,-16
  1003ea:	c606                	sw	ra,12(sp)
  1003ec:	0000                	unimp                # <---  This is the illegal instruction!
  1003ee:	0000                	unimp
  1003f0:	37b9                	jal	10033e <func>
  1003f2:	00000517          	auipc	a0,0x0
  1003f6:	05650513          	addi	a0,a0,86 # 100448 <main+0x60>
  1003fa:	3959                	jal	100090 <puts>
  1003fc:	40b2                	lw	ra,12(sp)
  1003fe:	4501                	li	a0,0
  100400:	0141                	addi	sp,sp,16
  100402:	8082                	ret
DASM(0x00010413)
mv      s0, sp

Image

MTVAL in ibex_simple_system.log file is 0x00000000

Expected Behavior

The terminal should print right mtval value.

Steps to reproduce the issue

Run build/lowrisc_ibex_ibex_simple_system_cosim_0/sim-verilator/Vibex_simple_system --meminit=ram,ibex/examples/sw/simple_system/input/test.elf

#include "simple_system_common.h"
int main(void){
    // puts("Starting test");
    __asm__ volatile(".word 0x000000");
    puts("Test complete");
    return 0;
}

My Environment

Build verilator to run ibex by co-sim.

EDA tool and version:
None

Operating system:
Ubuntu Linux 18.04

Version of the Ibex source code:
8f4c75c

@ha0lyu ha0lyu added the Type:Bug Bugs label Dec 24, 2024
@rswarbrick
Copy link
Contributor

I'm just doing an extremely basic triaging job here. Is the point that if we send a bogus instruction to Ibex then the message that gets reported to the console gives the wrong instruction contents?

(That definitely looks wrong, but I want to make sure I've understood the issue properly)

@rswarbrick
Copy link
Contributor

That said, it might be worth looking at the waves for this run in a simulator. The apparently-incorrect message in the screenshot comes from this code:

  // make sure we are called later so that we do not generate messages for
  // glitches
  always_ff @(negedge clk_i) begin
    // print warning in case of decoding errors
    if ((ctrl_fsm_cs == DECODE) && instr_valid_i && !instr_fetch_err_i && illegal_insn_d) begin
      $display("%t: Illegal instruction (hart %0x) at PC 0x%h: 0x%h", $time, u_ibex_core.hart_id_i,
               pc_id_i, id_stage_i.instr_rdata_i);
    end
  end

(in ibex_controller.sv). That looks a lot like it's genuinely reporting the instruction data that it's seeing.

Can you do a bit of debugging to figure out why this isn't the zero that you see elsewhere?

@ha0lyu
Copy link
Author

ha0lyu commented Dec 29, 2024

Hi @rswarbrick,
Thanks for your reply.

Is the point that if we send a bogus instruction to Ibex then the message that gets reported to the console gives the wrong instruction contents?

Yes, such as 0xe472f323, it will be reported correctly:

TOP.ibex_simple_system.u_top.u_ibex_tracer.unnamedblk2.unnamedblk3: Writing execution trace to trace_core_00000000.log
                 469: Illegal instruction (hart 0) at PC 0x00100416: 0xe472f323
Terminating simulation by software request.

I am not sure whether every illegal instruction will be reported correctly, but 0x00000000 is wrong.

Can you do a bit of debugging to figure out why this isn't the zero that you see elsewhere?

I am a new learner of ibex, I have a lot of things about ibex to learn. Could you mind give me some clue to debug this problem?

@ha0lyu
Copy link
Author

ha0lyu commented Jan 2, 2025

I am not sure whether this is related to the ibex configure, so tested it again, but built simulator by:

fusesoc --cores-root=. run --target=sim --setup --build lowrisc:ibex:ibex_simple_system_cosim  $(util/ibex_config.py opentitan fusesoc_opts)

This will use opentitan configure to run, not:

fusesoc --cores-root=. run --target=sim --setup --build lowrisc:ibex:ibex_simple_system_cosim --RV32E=0 --RV32M=ibex_pkg::RV32MFast

This time, co-sim reported error again:

TOP.ibex_simple_system.u_top.u_ibex_tracer.unnamedblk2.unnamedblk3: Writing execution trace to trace_core_00000000.log
                 161: Illegal instruction (hart 0) at PC 0x00100362: 0x00010413
                 165: Illegal instruction (hart 0) at PC 0x00100362: 0x00010413
Terminating simulation by software request.
- ../src/lowrisc_ibex_sim_shared_0/./rtl/sim/simulator_ctrl.sv:93: Verilog $finish
Received $finish() from Verilog, shutting down simulation

@ha0lyu
Copy link
Author

ha0lyu commented Jan 6, 2025

Hi @rswarbrick,
I have figured out this bug. The compressed decoder bug, the instr_o should be instr_i when decode illegal instruction.
I think this PR will fix this bug, I have tested it:

Image

@rswarbrick
Copy link
Contributor

Hmm. Are you sure this is an RTL bug? I think that maybe we should actually be tweaking the $display message so that it is more explicit about what it's printing, maybe with a note that says this is from a compressed instruction.

Looking at ibex_core.sv, I think the value we care about probably appears as rvfi_instr_id, so maybe something like this would work? (untested!)

  always_ff @(negedge clk_i) begin
    // Print warning in case of decoding errors. To help debugging, this reports the instruction
    // value using rvfi_insn_id from further up in the hierarchy.
    if ((ctrl_fsm_cs == DECODE) && instr_valid_i && !instr_fetch_err_i && illegal_insn_d) begin
      $display("%t: Illegal instruction (hart %0x) at PC 0x%h: 0x%h", $time, u_ibex_core.hart_id_i,
               pc_id_i, rvfi_insn_id);
    end
  end

@ha0lyu
Copy link
Author

ha0lyu commented Jan 8, 2025

Yes, I think this is an RTL bug. When decode compressed instructions at ibex_compressed_decoder.sv:

 // C0
      2'b00: begin
        unique case (instr_i[15:13])
          3'b000: begin
            // c.addi4spn -> addi rd', x2, imm
L48:            instr_o = {2'b0, instr_i[10:7], instr_i[12:11], instr_i[5],
                       instr_i[6], 2'b00, 5'h02, 3'b000, 2'b01, instr_i[4:2], {OPCODE_OP_IMM}};
L50:            if (instr_i[12:5] == 8'b0)  illegal_instr_o = 1'b1;
          end

L48 has changed the instr_o, but when L50 if condition is true, this value remains changed. I think this why $display illegal instruction 0x00000000 as 0x00010413:

instr_i = 0x00000000
instr_o = {2'b0, instr_i[10:7], instr_i[12:11], instr_i[5], instr_i[6], 2'b00, 5'h02, 3'b000, 2'b01, instr_i[4:2], {OPCODE_OP_IMM}};
instr_o ={00, 0000, 00, 0, 0, 00, 00010, 000, 01, 000, 0010011} = 0000_0000_0000_0001_0000_0100_0001_0011 = 0x00010413

I analyzed the value of id_stage_i.instr_rdata_i, its data flow.Here is a figure to show how:
Image

@ha0lyu
Copy link
Author

ha0lyu commented Jan 9, 2025

Hi @rswarbrick,

Looking at ibex_core.sv, I think the value we care about probably appears as rvfi_instr_id, so maybe something like this would work? (untested!)

I tested the code you mentioned and made some changes to be able to compile it with Fusesoc.

always_ff @(negedge clk_i) begin
    // Print warning in case of decoding errors. To help debugging, this reports the instruction
    // value using rvfi_insn_id from further up in the hierarchy.
    if (id_stage_i.instr_valid_i && !id_stage_i.instr_fetch_err_i && id_stage_i.illegal_insn_o) begin
      $display("ibex_core.sv:%t: Illegal instruction (hart %0x) at PC 0x%h: 0x%h", $time, u_ibex_core.hart_id_i,
               id_stage_i.pc_id_i, id_stage_i.instr_rdata_i);
    end
  end

This piece of code is placed on line 707~714 of the ibex_core.sv file. Build ibex by:

fusesoc --cores-root=. run --target=sim --setup --build \
        lowrisc:ibex:ibex_simple_system $(util/ibex_config.py opentitan fusesoc_opts)

Here is the output:

TOP.ibex_simple_system.u_top.u_ibex_tracer.unnamedblk2.unnamedblk3: Writing execution trace to trace_core_00000000.log
ibex_core.sv:                 151: Illegal instruction (hart 0) at PC 0x001000b0: 0x00010413
                 151: Illegal instruction (hart 0) at PC 0x001000b0: 0x00010413
ibex_core.sv:                 153: Illegal instruction (hart 0) at PC 0x001000b0: 0x00010413
ibex_core.sv:                 155: Illegal instruction (hart 0) at PC 0x001000b0: 0x00010413
                 155: Illegal instruction (hart 0) at PC 0x001000b0: 0x00010413
ibex_core.sv:                 157: Illegal instruction (hart 0) at PC 0x001000b0: 0x00010413
Simulation timeout of 3000 cycles reached, shutting down simulation.

Display ibex_core.sv shows 0x00010413 also. If I patched the ibex_compressed_decoder.sv file:

TOP.ibex_simple_system.u_top.u_ibex_tracer.unnamedblk2.unnamedblk3: Writing execution trace to trace_core_00000000.log
ibex_core.sv:                 151: Illegal instruction (hart 0) at PC 0x001000b0: 0x00000000
                 151: Illegal instruction (hart 0) at PC 0x001000b0: 0x00000000
ibex_core.sv:                 153: Illegal instruction (hart 0) at PC 0x001000b0: 0x00000000
ibex_core.sv:                 155: Illegal instruction (hart 0) at PC 0x001000b0: 0x00000000
                 155: Illegal instruction (hart 0) at PC 0x001000b0: 0x00000000
ibex_core.sv:                 157: Illegal instruction (hart 0) at PC 0x001000b0: 0x00000000
Simulation timeout of 3000 cycles reached, shutting down simulation.

I am not good at SystemVerilog, if I made mistakes at ibex_core.sv file, please correct me. Thanks.

@rswarbrick
Copy link
Contributor

I strongly suspect that the RTL is working correctly here. If illegal_instr_o is true then we don't have any opinion on what instr_o should contain. What I was suggesting was a change is to modify the message that gets displayed (and basically avoid the mangled instr_o signal that you show).

I agree that your change will avoid the problem, but I don't think we need to make that change to the hardware, because the existing behaviour of the hardware itself seems sensible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants