I have sunk a few days of my time into this and was unable to come up with a solution. I'm documenting it here in case anybody else wants to pick this up where I left off.
The problem is that on the Debian buildds, on the Debian GitLab CI, and on tests.reproducible-builds.org, the bmaptool test suite deadlocks in roughly 50% of its runs. I am unable to reproduce the problem on my own laptop, so debugging it was not trivial. I hacked in a periodic call to pstree -l -a to see what the process tree looks like when the deadlock happens:
All of these tools are invoked by _generate_compressed_files() in tests/test_api_base.py, so my guess was that calling them via subprocess.Popen() followed by Popen.wait() somehow causes the kind of deadlock described in the documentation of Popen.wait(). So I replaced those subprocess calls with subprocess.check_output() and slurped the whole standard output into a Python string (it's only ~2 MB of data). This did not change the behaviour at all.
Searching online for deadlocks involving the subprocess module revealed that Python 2.7 had a problem with deadlocks when the subprocess module was used from Python threads. The fix was supposedly to add close_fds=True to the Popen call. I tried that too, but it resulted in no change in behaviour.
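For readers unfamiliar with the Popen.wait() caveat, here is a minimal sketch of the two patterns. It does not reproduce the actual code in tests/test_api_base.py; the child command is a hypothetical stand-in for a compressor that writes about 2 MB to stdout, and the function names are made up for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for a compression tool: writes ~2 MB to stdout.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('x' * (2 * 1024 * 1024))",
]


def run_risky(cmd):
    """Pattern the Popen.wait() documentation warns about.

    With stdout=PIPE, the child blocks once the OS pipe buffer is full,
    while the parent blocks in wait() without ever reading the pipe.
    Defined for illustration only; calling it with CHILD may hang.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    proc.wait()
    return proc.stdout.read()


def run_safe(cmd):
    """check_output() reads stdout while waiting, so the pipe never fills up."""
    return subprocess.check_output(cmd)


if __name__ == "__main__":
    data = run_safe(CHILD)
    print(len(data))
```

Since switching to check_output() made no difference on the build machines, a full pipe is probably not the cause here, but the sketch shows what that change rules out.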
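A minimal sketch of what passing close_fds=True looks like, with a hypothetical helper name and a placeholder child command. One thing worth noting: as far as I know, close_fds has defaulted to True on POSIX since Python 3.2, so on Python 3 passing it explicitly would not be expected to change anything, which is consistent with the observation above.

```python
import subprocess
import sys


def run_with_closed_fds(cmd):
    """Run cmd so the child inherits no file descriptors besides 0, 1 and 2.

    close_fds=True is spelled out here for clarity; on Python 3 it is
    already the default on POSIX systems.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, close_fds=True)
    out, _ = proc.communicate()  # drains stdout, then waits for exit
    return proc.returncode, out


if __name__ == "__main__":
    # Placeholder child process, not one of the tools used by the test suite.
    code, output = run_with_closed_fds([sys.executable, "-c", "print('ok')"])
    print(code, output)
```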
Then I stumbled across a slightly different deadlock:
The invocation of "df -PT" is from a completely different part of the code. It comes from get_file_system_type() in bmaptool/BmapHelpers.py which just runs:
But this should be completely harmless, no?
So maybe the problem is not in tests/test_api_base.py but is a more fundamental one. I looked into other parts of the codebase, and _open_compressed_file() in bmaptool/TransRead.py does an awful lot of threading in combination with opening decompressors via the subprocess module.
I have no solution to the problem yet but I do have a workaround:
--- a/tests/helpers.py
+++ b/tests/helpers.py
@@ -284,29 +284,19 @@ def calculate_chksum(file_path):
     return hash_obj.hexdigest()
 
 
+import subprocess
+
+
 def copy_and_verify_image(image, dest, bmap, image_chksum, image_size):
     """
     Copy image 'image' using bmap file 'bmap' to the destination file 'dest'
     and verify the resulting image checksum.
     """
 
-    f_image = TransRead.TransRead(image)
-    f_dest = open(dest, "w+b")
-    if bmap:
-        f_bmap = open(bmap, "r")
-    else:
-        f_bmap = None
-
-    writer = BmapCopy.BmapCopy(f_image, f_dest, f_bmap, image_size)
-    # Randomly decide whether we want the progress bar or not
-    if bool(random.getrandbits(1)) and sys.stdout.isatty():
-        writer.set_progress_indicator(sys.stdout, None)
-    writer.copy(bool(random.getrandbits(1)), bool(random.getrandbits(1)))
+    subprocess.check_call(
+        ["python3", "-m", "bmaptool", "copy"]
+        + (["--bmap", bmap] if bmap else ["--nobmap"])
+        + [image, dest]
+    )
 
-    # Compare the original file and the copy are identical
     assert calculate_chksum(dest) == image_chksum
-
-    if f_bmap:
-        f_bmap.close()
-    f_dest.close()
-    f_image.close()
Instead of calling BmapCopy.BmapCopy().copy(), spawn a new process running the
full bmaptool utility with the given options. This is slower than the existing
code because bmaptool gets spawned many times, but it effectively tests the
same thing, except that BmapCopy is no longer exercised as a library. I do not
believe that testing the library use is useful for Debian, because we do not
ship bmaptool as a library but as a program that is called from the terminal.
When run like that, the deadlock never occurs.
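The argument list built in the patch can be pulled out into a small helper, which makes it easy to see which command line each test case ends up running. This is only a sketch: the helper name and the file names in the usage example are hypothetical, while the bmaptool arguments are the ones used in the patch above.

```python
def build_copy_command(image, dest, bmap=None):
    """Build the argv for a 'bmaptool copy' run, mirroring the patch.

    Pass --bmap <file> when a bmap file is given, --nobmap otherwise.
    """
    cmd = ["python3", "-m", "bmaptool", "copy"]
    cmd += ["--bmap", bmap] if bmap else ["--nobmap"]
    cmd += [image, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_copy_command("image.raw", "copy.raw", "image.bmap"))
    print(build_copy_command("image.raw", "copy.raw"))
```

The resulting list can be handed to subprocess.check_call() as in the patch, so a non-zero exit status of bmaptool fails the test.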
Maybe somebody else feels motivated to look into this. It would probably help to be able to reproduce this issue locally instead of only on a remote machine without ssh access. :)
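For anyone picking this up on a machine without shell access, the periodic pstree dump mentioned earlier could be wired in roughly as follows. This is a sketch, not the hack I actually used: the function names are made up, and passing the PID of the test process to pstree is an assumption to keep the output small. Note that the dump itself spawns a subprocess from a thread, which is the very pattern under suspicion, so it may influence what is being observed.

```python
import os
import subprocess
import sys
import threading


def pstree_command(pid=None):
    """Return the argv for 'pstree -l -a' limited to one process subtree."""
    return ["pstree", "-l", "-a", str(os.getpid() if pid is None else pid)]


def start_periodic_dump(cmd, interval, out=sys.stderr):
    """Run cmd every 'interval' seconds in a daemon thread, writing to 'out'.

    Returns a threading.Event; call .set() on it to stop the dumps.
    """
    stop = threading.Event()

    def loop():
        # Event.wait() returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            try:
                result = subprocess.run(
                    cmd,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    text=True,
                    timeout=30,
                )
                out.write(result.stdout)
            except (OSError, subprocess.SubprocessError) as exc:
                out.write(f"dump failed: {exc}\n")
            out.flush()

    threading.Thread(target=loop, daemon=True).start()
    return stop


if __name__ == "__main__":
    stopper = start_periodic_dump(pstree_command(), interval=60)
    # ... run the test suite here ...
    stopper.set()
```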
I also documented this in the associated Debian bug: https://bugs.debian.org/1081336