[viostor] io hung can not recover #1204

Open
lixianming19951001 opened this issue Nov 29, 2024 · 8 comments

@lixianming19951001

Describe the bug
Windows I/O hangs and cannot recover.

To Reproduce
I downloaded the code from the master branch and reset it to commit 379f291 for compilation.
My VM originally had driver version 24000.
I replaced it with the viostor build I compiled myself.

  1. Start a VM.
    I have a local file such as windows-2022-legacy-datacenter.raw, and I use an SPDK AIO bdev to start the virtual machine over the vhost-user protocol.
  2. Use fio to stress-test the C: drive.
    fio --name=fio-randwrite --filename=C:\test --direct=1 --iodepth=32 --rw=randwrite -thread -time_based -refill_buffers -norandommap -randrepeat=0 -group_reporting --blocksize=64k -numjobs=32 -size=30G -runtime=60000 --ioengine=windowsaio --cpus_allowed_policy=split
  3. Stop SPDK, sleep for 2 minutes, then start SPDK again.
  4. Windows I/O hangs and cannot recover.
    [screenshot: hung]
  5. The screen finally goes black.
    [screenshot: dark]

Host:

  • Distro: [e.g. Fedora, Ubuntu, Proxmox]
  • Kernel version: 5.4.56
  • QEMU version: 5.2.0
  • QEMU command line: I'm sorry, I can't provide that
  • libvirt version: 4.7.0
  • libvirt XML file: I'm sorry, I can't provide that

VM:

  • Windows version: Windows Server 2022 Datacenter
  • Which driver has a problem: viostor
  • Driver version or commit hash that was used to build the driver: 379f2910b85bb6b32a0991904d2402420957bdb6

Additional context
Versions before 379f291 had no issues; fio I/O recovered automatically once SPDK was restarted.

@vrozenfe
Collaborator

vrozenfe commented Dec 1, 2024

@lixianming19951001

Thank you for reporting the issue.
If possible, can I ask you to dump the VM memory to a dump file with "dump-guest-memory -w"
command (you will need to install vmcoreinfo device as well) and share it with me?

Thanks,
Vadim.

@lixianming19951001
Author

@vrozenfe

I tried this (added the vmcoreinfo device to the XML, reproduced the issue, then executed virsh qemu-monitor-command test_lxm_3 --hmp dump-guest-memory -w test.dmp) and got a 4 GB file. I'm not sure if it meets your requirements. If it does, how can I share it with you?

Best regards,
Li.

@vrozenfe
Collaborator

vrozenfe commented Dec 2, 2024

@lixianming19951001
Thanks. Can you zip it first and put it somewhere on your Google Drive to share with me?
I will also need your viostor.pdb file.

Best,
Vadim.

@lixianming19951001
Author

@vrozenfe
Our company has strict permission controls. I have tried my best to send the files from [email protected] to [email protected], but I am not sure if you can access them.

Best regards,
Li.

@menli820

menli820 commented Dec 17, 2024

Hi all,
Just an update: we recently ran a full blk function test and also detected a regression, https://issues.redhat.com/browse/RHEL-70446, which was likely introduced by the same commit.

Also I am working on reproducing the same scenario described in this upstream issue.

Thanks
Menghuan

@vrozenfe
Collaborator

vrozenfe commented Jan 7, 2025

@menli820
Thank you Menghuan.
Let's check internally what to do with this regression.

Best,
Vadim.

@vrozenfe
Collaborator

vrozenfe commented Jan 7, 2025

@menli820
Btw,
Can you please check how it works for vioscsi?

Thanks,
Vadim.

vrozenfe referenced this issue Jan 7, 2025
Backports fixes and improvements from vioscsi PRs #1150 and #1162

virtqueue struct vq was also removed in favour of adaptExt->vq[QueueNumber],
which results in a minor performance increase.

Signed-off-by: benyamin-codez <[email protected]>
@menli820

menli820 commented Jan 8, 2025

> @menli820 Btw, Can you please check how it works for vioscsi?
>
> Thanks, Vadim.

@vrozenfe vioscsi does not have the regression issue: https://issues.redhat.com/browse/RHEL-70446.
