Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debugging Virtex DSP48E1 bug #445

Open
4 of 5 tasks
gussmith23 opened this issue Jun 26, 2024 · 7 comments
Open
4 of 5 tasks

Debugging Virtex DSP48E1 bug #445

gussmith23 opened this issue Jun 26, 2024 · 7 comments

Comments

@gussmith23
Copy link
Owner

gussmith23 commented Jun 26, 2024

Tracking issue for finding the bug in the DSP48E1 model.

Some things to try in debugging:

  • Look through the diff for anything suspicious in how we modified the Verilog. Attempt to minimize the diff.
  • Triple-check that we encoded all of the DRC warnings as constraints in the arch description.
  • Binary search for the bug: that is, cut the datapath short and "return early" from the DSP48E1 module, and figure out at which point the bug manifests. E.g. if we just return the A input always, does it manifest? If we return the result of the preadder, does it manifest? The multiplier? The ALU? This will allow us to rule out parts of the design.

Other subtasks to finish this up:

@gussmith23
Copy link
Owner Author

First, Vishal found most of the bugs it seems -- all that remains are bugs in the combinational case. That is, if we synthesize combinational designs, the result from Lakeroad does not function as expected. Interestingly, the differences can be subtle, too. For example, it took a number of iterations for random testing with Verilator to find a case that breaks when testing a 9bit mul-sub case.

@gussmith23
Copy link
Owner Author

2024-06-26

Trying to binary search the bug.

Process:

  • Modify the racket-import version of DSP48E1, and modify the Verilator version in the same way.
  • Reimport the racket-import verison
  • Change the test so that it's attempting to synthesize whatever you reduced the racket import version to. E.g. if the racket import version just returns A, then make the test module return just A.

Returning just the input A works.

Returning multiplication of A * B works.

A*B+C works. It seems like it's subtraction (A*B-C) that's not working. It's extra suspicious that it is an off-by-one error and that it doesn't always appear.

I thought I found the bug, but still getting bugs in eval.
Weirder thing is that I'm pretty sure more stuff is breaking in the eval now, compared to when Vishal was running it. He told me only combinational stuff was breaking, but now, 2 stage stuff is breaking too.

At least one of the tests that's failing in the eval is passing when I make it an integration test in LR. Unsure why that would be, but it certainly feels like it could be a mismatched Lakeroad version or something.

Furthermore, the tests are passing when I run them on leviathan, within the lakeroad submodule of the lakeroad eval repo. weird.

Yeah, every test i'm adding to Lakeroad that's failing in the eval seems to be passing in Lakeroad itself. Something's weird.

List of tests not passing:

robustness_experiments:mult_0_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_signed_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_2_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_2_stage_unsigned_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_signed_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_unsigned_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_signed_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_unsigned_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mult_0_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:muladd_0_stage_signed_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_0_stage_unsigned_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_1_stage_signed_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_1_stage_unsigned_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_0_stage_unsigned_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_0_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_2_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_1_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_1_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_2_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_2_stage_signed_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:mulsub_2_stage_unsigned_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmuladd_0_stage_signed_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmuladd_0_stage_unsigned_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_signed_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_unsigned_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_unsigned_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_signed_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_signed_12_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_1_stage_unsigned_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_unsigned_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_signed_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_0_stage_unsigned_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_1_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:addmulsub_2_stage_unsigned_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submuladd_0_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submuladd_0_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_signed_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_unsigned_8_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_signed_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_9_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_signed_10_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_unsigned_11_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_signed_12_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_13_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_signed_14_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_signed_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_15_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_signed_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_0_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_16_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_unsigned_17_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_1_stage_signed_18_bit:lakeroad-virtex:verilator <stdout>:
robustness_experiments:submulsub_2_stage_unsigned_18_bit:lakeroad-virtex:verilator <stdout>:

Gonna run an eval run in Docker and see if it replicates there.

@gussmith23
Copy link
Owner Author

2024-06-27

Seems like it's still running into errors in Docker. So I think there's two options at this point

  • there really is a bug still, and i'm just having trouble replicating it outside of the eval (which may be a bug in my tests), or
  • there's a bug in the eval code calling Lakeroad.

The second one is easier to check, so i'll do it first, but at this point it really does feel more likely that it's the first.

Oh interesting. Before, I tried disabling all solvers but Bitwuzla, so that I could get the eval to match what was happening in the integratio ntests (because all of my integration tests were using Bitwuzla). But when I ran the eval with just Bitwuzla...well, what happened? I forget, but I think what happened was that I couldn't replicate the issues? Oh, no, according to this thread with Vishal, it was still failing when I just used Bitwuzla. Anyway, what just happened was that I made an integration test that uses STP instead of Bitwuzla, and it fails. So I finally have the case I was looking for, which is good. Now I can go back to the delta debugging I was doing earlier.

I just went through the diffs. Here's the thoughts i had.

  • It'd be good to check again that there are no latches being inferred during the import process.
  • There's one suspect line in the simulation model:
    always @(alu_o)
      qp_o_mux <= #1 alu_o;
    We just use alu_o. I think in the past when they've written something like this, they've used it to break combinational loops...maybe. I'm not sure on that, honestly.

Easy check of the second point -- just remove the delay in the Verilator model and see what happens.

interestingly, it's still an off by one error.

I'm almost certain the bug is in the carry logic. There's some weird stuff happing there. Specifically I think I just realized that qcarryin_o_mux0 isn't driven in the modified-for-racket version, which is definitely not right (it's driven in the original version). I might have deleted that code---wouldn't be surprised.

qcarryin_o_mux is also multiply driven. That has to cause issues. I'll look at the above missing driver issue first, but I'd say that the multiple drivers issue is also worth looking at!

Ok, there was definitely a bug with qcarryin_o_mux0 not being set - just a typo bug. That's resolved.

Still bugs left though. Running the eval and things are still failing. E.g. muladd_0_stage_signed_9_bit.

Again, still an off by one error. So I suspect it's more bugs in the carry.

There are still latches being inferred, too.

Latch inferred for signal `\DSP48E1.\qalumode_o_mux' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:920$434': $auto$proc_dlatch.cc:427:proc_dlatch$1106
Latch inferred for signal `\DSP48E1.\qopmode_o_mux' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:904$430': $auto$proc_dlatch.cc:427:proc_dlatch$1133
Latch inferred for signal `\DSP48E1.\cci_drc_msg' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:888$405': $auto$proc_dlatch.cc:427:proc_dlatch$1160
Latch inferred for signal `\DSP48E1.\cis_drc_msg' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:888$405': $auto$proc_dlatch.cc:427:proc_dlatch$1189
Latch inferred for signal `\DSP48E1.\qcarryinsel_o_mux' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:879$402': $auto$proc_dlatch.cc:427:proc_dlatch$1212
Latch inferred for signal `\DSP48E1.\the_mask' from process `\DSP48E1.$proc$/home/gus/lakeroad-evaluation/lakeroad/lakeroad-private/DSP48E1/DSP48E1_modified_for_racket_import.v:1385$702': $auto$proc_dlatch.cc:427:proc_dlatch$1069

Killed the latches.

I should really include somewhere that yosys is a very helpful debugging tool when doing this process.

Still running into bug.

Interestingly, the multiple drivers warning also disappeared? That doesn't feel right. I don't feel like I fixed that issue.

@gussmith23
Copy link
Owner Author

2024-06-28

Realizing I did solve the multiple drivers problem when I fixed the typo bug.

Got another bug to replicate in integration tests:
integration_tests/lakeroad/xilinx-virtex/virtex_mult_0_stage_signed_12_bit.sv

Interestingly, fails with STP but not Bitwuzla. So theyre just finding different solutions, I assume.

@gussmith23
Copy link
Owner Author

2024-06-29

My debugging strategy now is to put $displays in the Verilator model to see where in the pipeline the value diverges from what i expect. Frankly I probably should be using a waveform viewer for this.

Unsurprisingly, qcarryin_o_mux is getting set to 1 at some point. That's definitely the problem.

Ok, I think the problem is coming from the fact taht we changed

              3'b111 : qcarryin_o_mux_tmp <= qp_o_mux[47];

to

              3'b111 : qcarryin_o_mux_tmp <= qp_o_reg1[47];

If I recall, we did this because of a combinational cycle that was caused by using qp_o_mux. We thought through it though, and i think this change should have been legal based on some constraints that should be in the design.

Found the bug -- the constraints in the arch description for carryinsel were wrong. They'd been copy-pasted and not updated.

@gussmith23
Copy link
Owner Author

2024-07-01

Apparently not done with debugging yet. One of these tests is failing in CI.

At this point, might be most efficient to just look through the constraints for any more obvious bugs.

No obvious bugs.

Back on the hunt for bugs in the code itself. Specifically just adding $display statements into the simulation model.

For this singular example, it's correct up to qmult_o_mux. So bug is presumably still after that. I wonder if it's still carryin related.

z mux is C, x is the result of the mul, y is zero. so those all check out just fine.
alu_o also seems to have the right answer.

It seems to be an issue with the P register. It seems like the racket import version thinks that the ALU output will make it to the output, but it doesn't. The correct output is presumably at alu_o, but doesn't make it through to PREG.

I wonder if vishal just didn't find all of the constraints on PREG? They're encoded in a very, very annoying way in the simulation file. I wonder if they're more compact in the pdf.

He got all of the ones I found when I just did a pass.

@gussmith23
Copy link
Owner Author

2024-07-02

IT seems like the PREG is working fine---I think there might be a register in the middle somewhere that's enabled when it shouldn't be.

C is making it through to the ALU instantly. But the result of the multiply does not -- it seems to be only making it to the ALU

Wait nevermind -- it seems like an answer is coming out of the multiplier before the clock ticks, but it's not correct. then when the clock ticks, the multiplier (qmult_o_mux) signal gets the right answer (minus the post-add). But then because

omg, it was because i had different --pipeline-depths set for synthesis and simulation. This would have been fixed by #442. Wasted a whole day on this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant