Floating Point Exception #22
Comments
Hi Zaid, This probably means that there is a division by zero somewhere. Not really sure where that would be. Would it be possible for you to add some print statements in the Thanks |
I did some debugging and it seems the problem is on line 51 in include/nvm_queue.h which has the following text: |
Thank you for your effort. I suspect that this may be because it is not able to read controller registers properly. I assume you are using the kernel module version (and not the SmartIO version)? In |
These are the values for each of the members of
|
Thank you again. This is a bit strange: it appears that it managed to read the time-out value from the registers, but the maximum number of entries has become 0. I may be doing something wrong in Out of curiosity, what kind of NVMe controller/disk is this? |
This is a Corsair NX500 SSD. |
So the value returned from CAP.MQES reg value in |
The reason I add 1 is that MQES is, according to the spec,
If it returns 65535, I suspect a different type of issue, namely that all MMIO reads are returning 0xFFs (this is a common symptom when reading does not work). If you attempt to read any other values directly from the memory mapped pointer, does it all return 0xFFs? Could you for example try this: |
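The all-0xFF symptom described above can be checked with a small helper. This is only a sketch and not the snippet originally posted in this comment: the function name `mmio_reads_all_ones` and its parameters are hypothetical, and it assumes the caller already holds a pointer to the memory-mapped controller registers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libnvm): returns true when every one of
 * the first n_words 32-bit reads from the register window comes back as
 * 0xFFFFFFFF, which is the usual sign that MMIO reads are not reaching the
 * device at all.
 */
static bool mmio_reads_all_ones(const volatile uint32_t *regs, size_t n_words)
{
    if (n_words == 0) {
        return false; /* nothing read, so nothing can be concluded */
    }

    for (size_t i = 0; i < n_words; ++i) {
        if (regs[i] != 0xFFFFFFFFu) {
            return false; /* at least one read returned real data */
        }
    }

    return true;
}
```

If a check like this returns false while the MQES field alone reads 65535, the reads themselves are probably fine and the field value is genuine.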
This is the value printed due to |
That's a good thing! The controller probably supports 65536 entries, and the bug is that I'm using a 16-bit variable to read out the value and add one. I'm trying to find a datasheet or documentation in order to verify that it supports that many queue entries, but I suspect that this is the case. I'll add a quick fix for it and add your name in the commit message (if you want), or you can send me a patch or alternatively a pull request if you want. In the meantime, you can probably comment out the |
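To make the wrap-around concrete, here is a minimal sketch of the arithmetic, assuming MQES is the zero-based field in bits 15:0 of the 64-bit CAP register. The function names are hypothetical and are not taken from the library's source; they only contrast a 16-bit result with a wider one.

```c
#include <stdint.h>

/* Extract the zero-based MQES field from bits 15:0 of CAP. */
static uint16_t cap_mqes(uint64_t cap)
{
    return (uint16_t)(cap & 0xFFFFu);
}

/*
 * Buggy variant: the sum is stored back into 16 bits, so an MQES of
 * 0xFFFF wraps around and yields 0 entries.
 */
static uint16_t max_entries_u16(uint64_t cap)
{
    return (uint16_t)(cap_mqes(cap) + 1);
}

/*
 * Wider variant: the sum is kept in 32 bits, so an MQES of 0xFFFF
 * yields 65536 entries.
 */
static uint32_t max_entries_u32(uint64_t cap)
{
    return (uint32_t)cap_mqes(cap) + 1u;
}
```

With an MQES of 0xFFFF, the 16-bit variant returns 0 and the 32-bit variant returns 65536, which matches the zero entry count seen earlier in the thread.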
Thank you for your time and effort, by the way. |
Wait but the specification states that this register is only 16 bits. |
Yes, the register is only 16 bits, but I believe the value is allowed to be 65536. I'll look around and check the spec errata to see if it is mentioned, and I'll also have a look at what the Linux kernel NVMe driver does. Does the identify example program work for you if you don't add one? |
Yes, that example works, but I think what people typically do is always use the MQES+1 value. So I think you will need to change the type of the max_entries field in all of your code. I believe this is leading to another issue when you unmap memory, as I get the following error at the end of the latency benchmark: I am trying to debug it and see what is happening. |
The spec says in the introduction chapter:
Since it says 2^16 - 1 I/O queues, perhaps 64K in this case means 2^16? Regardless, the unmapping is a known bug (see #18); I haven't had time to rewrite the kernel module yet. The problem is how I'm currently storing address mappings: I'm storing everything in one global list when I should instead create a list per open descriptor. It happens with the nvm-latency-benchmark because I check against the PID, and for some reason CUDA programs fork into child processes, so the "wrong" PID is attempting to clean up the stored mapping. It will clean up automatically once you unload the module, so do that every now and then if you run the nvm-latency-bench program multiple times. The warnings and errors look bad, but they are expected. |
The permanent fix would be to change the type of |
I am trying to run the example nvm-identify but I get the following output:
Resetting controller and setting up admin queues...
Floating point exception
The dmesg output is this:
[May24 16:40] traps: nvm-identify[3179] trap divide error ip:7f6d2f98a434 sp:7ffd9a74e3b0 error:0 in libnvm.so[7f6d2f985000+9000]
I am not sure what is going on. Any help would be appreciated.
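The `trap divide error` line in dmesg points to an integer division or modulo by zero; on x86 Linux that trap is delivered as SIGFPE and printed as "Floating point exception" even though no floating point is involved. The following is only a sketch of how a queue index computation could guard against a zero divisor. The name `next_slot` and its parameters are hypothetical and do not come from the library.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: advance a ring-buffer index modulo the queue size.
 * Returns false and leaves *next untouched when max_entries is 0, instead
 * of executing a modulo that would raise a divide error.
 */
static bool next_slot(uint32_t current, uint32_t max_entries, uint32_t *next)
{
    if (max_entries == 0) {
        return false; /* a modulo by zero would trap here */
    }

    *next = (current + 1u) % max_entries;
    return true;
}
```

A caller that sees `false` can report an invalid queue size rather than crashing inside the library.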