[BUG] Calling `switch_mode_and_capture_file` repeatedly eventually causes OSError: [Errno 12] Cannot allocate memory
#1130
Comments
I wonder if this is the same problem as #1125. You could try the workaround mentioned there (to use the |
@davidplowman thanks, I'll look at trying that soon, although I was curious if commenting out the line in
Since you mentioned how
If my goal is to switch from the video config to the still config, capture a jpeg, and then switch back to the video config, is
For example, if the pipeline of
But if it is something like Raw Image Data -> JPEG Encoder -> JPEG file, then how long it takes to capture the frame data is dependent on the slow JPEG encoder instead of a faster operation like just storing the image data in memory?
The reason I'm asking is that there is currently a delay in the chain: the printer sends a signal to capture a timelapse frame, the HA server receives it and sends an MQTT packet, and finally the pi receives the packet and takes time to switch modes and capture the frame. If I could optimise the last part, the time it takes to capture the image data (not the jpeg), that would reduce the total time the printer has to stay parked. The printer can't receive feedback from the pi, so it just waits in the park area for a fixed amount of time while the signal propagates and the pi takes an image.
On anything other than a Pi 5, the If you're going to do a mode switch for the capture, then Note that stopping the camera and switching to a different mode is relatively time-consuming, normally I would expect a few hundred milliseconds. The other option would be to run permanently in the capture mode - if you can run at 10 or 15fps then that's a much lower latency. The catch is that you might not have enough memory. Running with 2 buffers might be enough for occasional captures, you'd have to try it - otherwise you'd need 3. If you don't need a preview then you could use 24-bit RGB instead of 32-bit to save some space. You might even be able to use YUV420 (even less memory), though you'd probably need a software conversion to RGB for saving as JPEG (though OpenCV has a routine for that). |
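To put rough numbers on the memory trade-off described above, the arithmetic can be sketched as follows. This is only an estimate: it uses the (3552, 2592) stills resolution from the issue description, assumes 4, 3 and 1.5 bytes per pixel for 32-bit RGB, 24-bit RGB and YUV420 respectively, and ignores any stride padding or alignment the driver adds, so real allocations will be somewhat larger. The function name `buffer_mebibytes` is illustrative only.

```python
# Approximate bytes per pixel for each format family (no stride padding).
BYTES_PER_PIXEL = {
    "32-bit RGB": 4.0,
    "24-bit RGB": 3.0,
    "YUV420": 1.5,  # full-resolution Y plane plus quarter-resolution U and V planes
}


def buffer_mebibytes(width, height, bytes_per_pixel, buffer_count):
    """Return the approximate total size of `buffer_count` frames in MiB."""
    return width * height * bytes_per_pixel * buffer_count / (1024 * 1024)


width, height = 3552, 2592  # stills resolution quoted in the issue description
for name, bpp in BYTES_PER_PIXEL.items():
    for count in (2, 3):
        size = buffer_mebibytes(width, height, bpp, count)
        print(f"{name}: {count} buffers need roughly {size:.1f} MiB")
```

On a board with 512MB of RAM, the difference between two and three full-resolution buffers, or between 32-bit and 24-bit formats, is a noticeable fraction of what is available.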
@davidplowman ok thanks, I'll leave the Regarding switching speed, a while ago I did time how long it took to run:
And it usually came out around 1.5 secs (with some variance) before using
So were you implying that the time to run
Also, doing it this way (dual configs) would allow me to use some features such as HQ noise reduction for the still images, while still using the normal fast denoising for the video mode, whereas if you only used a single config you would probably have to use fast denoising in order to get a decent framerate for the video, at the expense of the denoising quality for the stills?
Regarding memory, currently I don't explicitly set the format of the image in either config, but I assumed that because I don't configure or call for a preview at any point, it would automatically select either RGB24 or YUV? If not, then I suppose setting RGB24 for both would be best, or would setting YUV for the video mode be better? Although maybe for MJPEG, using YUV makes no quality difference, since the MJPEG stream isn't the best quality to begin with (some blockiness) when using the H/W encoder?
And regarding the actual image format for RGB24, does it make any difference for the still jpeg config or the video MJPEG config whether I select BGR888 or RGB888?
It might be worth timing some of this stuff for yourself.
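One way to collect such timings is a small context manager wrapped around each step. This is a minimal sketch, not the code originally posted in the thread: `timed`, `timings` and `fake_switch_and_capture` are illustrative names, and the sleep stands in for the real camera call being measured.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> list of elapsed times in seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes, using a monotonic clock."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


def fake_switch_and_capture():
    # Hypothetical stand-in for the real mode switch and capture.
    time.sleep(0.01)


for _ in range(3):
    with timed("switch_and_capture"):
        fake_switch_and_capture()

for label, samples in timings.items():
    mean = sum(samples) / len(samples)
    print(f"{label}: max {max(samples):.3f}s, mean {mean:.3f}s over {len(samples)} runs")
```

Tracking the maximum as well as the mean matters here, since the concern in this thread is rare slow outliers rather than typical latency.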
That's pretty much what
That's true. On Pis other than Pi 5s, doing the higher quality (and slower) stills denoise may reduce the framerate. Note that you can change the denoise setting while the camera is running, but the catch will be knowing on which frame the change has actually taken effect.
Preview/video modes select 32-bit ARGB formats by default, but only because they're usually easier to display. If you're not displaying them, 24-bit RGB would be more efficient. YUV420 is even better, and the H.264 video encoder will accept it directly. Unfortunately Python mostly doesn't have much support for YUV420, so you'd have to convert it to 24-bit RGB for saving as a JPEG. OpenCV has a function
MJPEG should accept RGB and YUV420 image formats but, as you say, quality is worse for a similar bitrate compared to H.264.
I don't think the choice of BGR or RGB makes any difference. My rule of thumb is that reds and blues invariably come out the wrong way round and you end up swapping colours until it's right. Always swap colours by requesting the other format, rather than doing software conversions. |
@davidplowman thanks for your suggestions, and apologies for my late reply. Now regarding your reply:
For BGR vs RGB, I tried both and, like you said, it doesn't make any difference for jpeg images, so I decided to use BGR888, since the default format if I don't specify any is the 32 bit version of that.
Now regarding the timings, I copied your code and replaced time1 =
The time to run
Using some code in the 3D printer and a pi pico to translate the PWM signal, I sent a digital on/off signal to the GPIO of the raspberry pi, which started the
So you can see that by far the biggest contributor to the total function runtime is the
It is critical that the camera/
If it helps, I wouldn't mind putting a copy of the frame into memory and then saving that into a jpeg in a background thread. Of course, this copy of the frame would need some kind of protection so that it doesn't get overwritten by the next iteration while it is still being saved to a jpeg.
Obviously, if I use the copy-the-frame-into-memory idea and the jpeg saving takes 9 secs several times in a row, the pi could run out of memory. That shouldn't happen, though, as the average time to encode the jpeg/save the request appears to be 1 sec or less; it's just the outliers that sometimes take up to 9 secs which I want to deal with.
Also, I could get rid of the switching configs part and just run the camera with the high quality stills config (but lose out on achieving 30fps for the mjpeg stream). Given that the worst case total time is about 10 seconds due to the req.save issue, I don't think saving about 0.3 secs from not switching configs would help here.
Separately, I also noticed something interesting in the log: right at the end of the test, when the pi was zipping up all of the images (to make it quicker to copy off the pi later), I got this error which I've never seen before:
Is it possible this could be causing the req.save issue, or is it just that I need to reseat the FPC cable? I should also note that I noticed when running htop, that the swap amount used on the pi zero 2 was quite high (about 190MB/200MB), but the actual RAM usage was still below 250MB/417MB in most cases so this seemed like a false alarm? |
Ok, so I did some further reading and it seems like this might be a solution to get what I described above:
It's based on the capture_dng_and_jpeg_helpers.py example. This new version of Unfortunately when I tested it, the time to run just this line took a maximum of 8.2 secs (and 3.7 secs on another iteration) during just the first few tests:
Which is strange because with the previous code, the limiting factor was saving the request data, not getting the request/buffer data itself... Also, I just did some further testing using the normal GPIO test, instead of me manually triggering Please let me know if there is a faster or more error-free solution, as per my previous message. |
Hi, thanks for the update. A couple of comments...
Firstly, when you say "colours are different", do you mean they're actually different colours, or just that at the pixel level they show different levels of colour detail? The latter might be expected when using YUV420 (though to be honest, it's normally marginal to the point of being unnoticeable, except perhaps when there's nasty aliasing going on). The former is more surprising. I wonder if you should try setting the configuration's
As regards the timing questions, I'm starting to get a bit lost in all the text. I don't know if it's possible for you to distill what you're doing into a simple ~20 line test case. Then I could run it easily and check what's happening.
But broadly I think the approach sounds right. I'd have a main thread that starts and stops the encoder, and another thread that does the big JPEG encodes. I'd either copy the image buffers and forward them, or I'd forward the requests directly, not sure. I'd be inclined to use a single thread there, so that you can wait for it to go idle before sending it more stuff.
Edited to add: we're just about to post an updated release which will fix the original memory problem. But I think in this kind of case, the
@davidplowman thanks, I'll investigate the colour discrepancy later, and will probably keep using Persistent Allocator as you suggested. Sorry for the overly verbose previous message. In the above code, The core code of take_frame that I have an issue with is this:
The issue I have (that I was talking about in my 1st reply yesterday) was that the time to run First, assume that a gpio signal pulse goes high every 4 seconds for 20ms. Therefore, my question is, how can I rewrite this code so that the saving of the jpeg is handled in a separate thread? In other words take_frame will return after stopping encoder, switching modes, getting the image data, switching back, and starting the encoder. That list of things in theory should be fairly fast (less than 4 sec). The separate thread will then save the jpeg image to disk in the background, so that if it ends up taking 9.1 secs to save, it will not block the next call of |
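The general shape of what is being asked for, with the single worker thread suggested earlier in the thread, can be sketched with the standard library alone. This is an illustration under stated assumptions, not picamera2 code: `SaveWorker` and `encode_and_save` are hypothetical names, the frame is represented as plain bytes, and the "encode" step is just a file write standing in for the slow JPEG save.

```python
import os
import queue
import tempfile
import threading


def encode_and_save(frame_bytes, path):
    # Hypothetical stand-in for the slow JPEG encode and file write.
    with open(path, "wb") as f:
        f.write(frame_bytes)


class SaveWorker:
    """Single background thread that saves submitted frames in order."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, frame_bytes, path):
        # bytes() takes a private copy, so the caller may reuse its buffer
        # immediately without corrupting the frame that is waiting to be saved.
        self._jobs.put((bytes(frame_bytes), path))

    def wait_idle(self):
        # Blocks until every submitted job has been processed.
        self._jobs.join()

    def _run(self):
        while True:
            frame_bytes, path = self._jobs.get()
            try:
                encode_and_save(frame_bytes, path)
            except Exception as exc:  # keep the worker alive after a failed save
                print(f"save failed for {path}: {exc}")
            finally:
                self._jobs.task_done()


worker = SaveWorker()
with tempfile.TemporaryDirectory() as tmp:
    buffer = bytearray(b"frame-1")
    worker.submit(buffer, os.path.join(tmp, "frame1.bin"))  # returns at once
    buffer[:] = b"frame-2"  # safe: the worker holds its own copy
    worker.wait_idle()
```

With this shape, the capture function only pays for the copy and the queue insertion, and a slow save delays later saves rather than the next capture. The cost is that queued copies accumulate in memory if saves stay slow for several captures in a row.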
So I did some further testing with many different configurations, such as:
In all of these cases, given that I have ruled out disk IO speeds being an issue (such as in case 3), the common problem seems to be the commands for getting data in the first place, such as capturing a buffer, capturing a file/image, or saving a request. In other words, it seems specifically the process of getting the camera data sometimes runs extremely slowly (max times vary from 8-30 secs depending on the test run or the config). Given that these commands are the culprits, I'm not sure how I can go about fixing this? Also, I thought I should ask if it's best to move this conversation to a new issue, given that the problems that I'm asking about are now quite off-topic compared to my original post about the memory allocation problem? |
I took your code snippet above and turned it into a standalone example (below). I didn't have a Pi Zero 2W to hand, so I took a Pi 3B and set
I found that mostly things were very well behaved. About 0.7s per JPEG, 0.01s to write the data, and the "full time" was easily under 1s. But occasionally I found it going wrong when everything was taking much longer. I could only reproduce this in the first few runs after booting up. Also, interestingly, without reducing the memory to 512MB, it thought it was OK. Could you confirm what behaviour you see with the script above? I understood (this is such a long thread now!) that you're using a Pi Zero 2W. Can you confirm that you're using the latest OS. Are you using 32 or 64 bit? |
@davidplowman for my OS, I'm running Raspberry Pi OS Bookworm Lite 64bit. OS Details (results of I took your code and added a few lines just to keep track of the max full time and to give me an alert if the full time went over 1 sec:
However it seems I didn't need them to tell me something was wrong because after a few runs of the program I got a full blown error and traceback which I've never seen before:
As you can see, the full runtime for the run with the error was 2.5 secs. Does this imply that I need to switch to the 32 bit lite OS? If it's something that I can fix without having to change hardware then I'd be more than happy to try, such as switching OS or updating something. |
Thanks for the update. That's all very interesting. I chose 32-bit because I think it's generally regarded as slightly easier on the memory, which might explain why you are perhaps seeing problems with greater frequency than I am. But even then, I have occasional problems on the 32-bit OS, most notably shortly after booting. The code I posted is vulnerable to crashing when a long delay happens (as you saw, and which I sometimes see also). The solution, or at least the sticking plaster, would be signalling back to the main thread once it's safe to proceed. (I don't know if that kind of a workaround - where you occasionally miss a capture - is any good to you or not.) However, I do want to poke around a bit more and see if there are any clues as to what's happening. After the system boots, there are usually a few processes that might come along and "do something", so perhaps they're getting in the way. Another thing I'm curious about is whether increasing the amount of swap helps - it's easy enough to try. Using lots of swap on a micro SD card is usually not a great idea, but if it's transient and happens rarely, then maybe that's not so bad. |
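The "signal back to the main thread" workaround mentioned above could look something like the sketch below. It is a generic illustration rather than the code posted in this thread: `BusyGate` is a hypothetical name, and it assumes `try_start` is only ever called from one thread (the main one), so the check-then-clear sequence does not race.

```python
import threading


class BusyGate:
    """Lets the main thread skip a capture while the previous job is running."""

    def __init__(self):
        self._idle = threading.Event()
        self._idle.set()  # nothing is in progress at start

    def try_start(self, work, *args):
        """Run work(*args) in a new thread; return False if one is still busy."""
        if not self._idle.is_set():
            return False  # previous job unfinished, so this capture is missed
        self._idle.clear()
        threading.Thread(target=self._run, args=(work, args), daemon=True).start()
        return True

    def _run(self, work, args):
        try:
            work(*args)
        finally:
            self._idle.set()  # signal back that it is safe to proceed

    def wait_idle(self, timeout=None):
        """Block until the current job finishes; return False on timeout."""
        return self._idle.wait(timeout)


gate = BusyGate()
release = threading.Event()
print(gate.try_start(release.wait))  # first job is accepted and starts waiting
print(gate.try_start(release.wait))  # second is refused while the first runs
release.set()
gate.wait_idle(timeout=2)
```

The trade-off is the one described above: a refused call means a missed capture, in exchange for never touching camera buffers while the previous job is still using them.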
Well, I've poked round a bit more but am not very much the wiser. I can't really see how it can be struggling with memory because it seems to stay under 250MB at all times, yet it does seem sensitive to memory pressure. I found this variation which does reduce the memory footprint slightly, and I think it does work a bit better.
It takes a little longer to get going because of the cv2 import ( |
@davidplowman thanks for the code and suggestions! I tried the suggestion from your 1st message and increased the swap to 1024MB, but unfortunately that didn't help the original program, which still had a long max full time of 3.3 secs.
The newest code you posted did help a lot: using your test program I observed a max full time of 1.86-1.99 secs after about 1000 iterations. However, when I integrated this new code into my original program (as well as starting up another small program that I had shut down to check whether it was causing the issues) and ran it with 1000 iterations in a for loop just like in your program, I unfortunately got the same traceback error from earlier:
Do you know how I can fix this error? I know you mentioned before that this crash can happen if the jpeg encode thread takes too long, but I don't quite understand why this would happen, or why it can't find a buffer? My rough guess about this is because it uses MappedArray, and that is directly accessing the buffer memory in order to get the raw frame data (instead of first making a copy of the buffer), then once a new request happens in the main thread, the program loses access to the old memory that MappedArray was accessing and it can't complete the encode, or maybe it can't start a new MappedArray for the next encode that is now in the queue? That is just a complete guess though, so I would be grateful if you could let me know how I can fix this error, and if it is possible to do so while not missing any frames to be encoded? One workaround I tried, was to make the encoding single threaded instead of multi threaded again, so now the main function blocks until For this use case, 3.66 secs blocking time is just about acceptable (the limit is approx 4 secs), but given that the average is about 0.8-1.2 secs, I would like to try and reduce this max value if possible, just in case it ends up going higher than 4 secs in the future because my testing was not thorough enough. |
I'll look again next week, when I may be able to find some more knowledgeable Linux-y types. In the meantime, if you did have a spare SD card lying around, I'd encourage you to give the 32-bit OS a go. It is thought to be better, but it would be interesting to know, and I don't currently have any other ideas to try, certainly nothing as relatively straightforward! |
@davidplowman I did some more testing over the weekend, and found out the following things:
I should note that the testing for this was done using my main program rather than your isolated test program, because I needed it to take pictures when the GPIO pin went high.
Given that I am still having issues even with the "optimal" code using Mapped Array and multiple threads for encoding, and using the optimal OS (32bit lite), do I need to give up on trying to use this program with a pi zero 2 due to its lack of RAM?
Now for the details to go with the above summary (you can skip these if needed):
DMA error fix & using multiple render threads
After implementing the new method where I spawn a new render thread each time we take a picture, I was able to get this new version to work fine with no errors even with 0 delay/sleep between
Debugging continued slowdowns
For reference, a GPIO pulse was happening at an interval of about 3.8 seconds. You can see that in the top section, everything is working fine - both full_time_max and jpeg_time_max are low. However, the next GPIO signal received happens a whole 10 seconds after the previous one (23:51:06 vs 23:50:56), and there is a sudden spike of jpeg_time in the render thread to 5.3 secs. My guess about this is because even though the jpeg encode is happening in another thread, because python doesn't have real multi threading, is it possible that the jpeg thread could suddenly be assigned CPU time by the GIL for several seconds at a time when the main thread needs it? Or could it be something else in the linux system interfering with the python program or hogging the CPU for a few seconds? Another example is this:
You can see that strangely despite python itself reporting the full time to run the function as 0.5 secs, the first INFO message from picam2 doesn't appear in the logs until 7 secs after the GPIO message in the journald logs, which is contradictory. So this suggests maybe journald itself is lagging and not logging the messages at the actual time they occurred, so maybe I need to switch from print statements to python logging so that each message is output with an accurate timestamp? In the systemd definition file, I run the program with My final note is that last night I would get multiple failures in a row even though I was restarting the python program/systemd service each time, whereas today everything went fine on my first test. So I'm not sure if this indicates that there was a process running last night that is not running right now, such as journald doing a lot of logging cleanup or something in the background. |
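Switching from print statements to the standard `logging` module, as considered above, would make each message carry the time at which Python emitted it, independent of when journald records it. A minimal sketch follows; the logger name `timelapse`, the helper `make_logger` and the message texts are illustrative and not taken from the original program.

```python
import logging
import sys


def make_logger(name="timelapse", stream=None):
    """Return a logger that stamps every record with millisecond precision."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter(
        fmt="%(asctime)s.%(msecs)03d %(levelname)s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    ))
    logger.handlers.clear()   # avoid duplicate lines if called more than once
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger


log = make_logger()
log.info("GPIO pulse received")
log.info("capture finished in %.2fs", 0.5)
```

Comparing these embedded timestamps with the ones journald attaches would show whether the 7 second gap comes from the program itself or from delayed log delivery.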
Hi again. A few things...
Describe the bug
I have a python program called bambu_rpi_webcam.py that I created in order to run an MJPEG stream 24/7 in video mode and then occasionally switch to still mode to take a full resolution still for a timelapse when a signal is received from home assistant via MQTT.
The MJPEG server functionality is copy and pasted from the example code mjpeg_server_2.py.
To do the alternating functionality of streaming MJPEG and taking high res still images, I have two configs setup:
A lower resolution (1480, 1080) 30fps video config:
A high resolution (3552, 2592) stills config:
In order to quickly switch from the MJPEG mode to the stills mode, I use the following function, which gets called by a callback function when the MQTT signal is received:
So you can see the core of this function is to:
`switch_mode_and_capture_file` to quickly switch to the stills config, take the image and then switch back
This function works completely fine the first few times it is called. However, after approx. 5 hours of running the program, and about 267 times of the MQTT handler calling this function, the program inevitably silently locks up with the error:
OSError: [Errno 12] Cannot allocate memory
If you look below you can see the full traceback for this error, and it is clear that it is caused when `switch_mode_and_capture_file` is called.
In order to try and fix this error I tried following advice from #1102, to update my OS using:
I also updated picamera2 and others using:
However neither of these things helped get rid of the problem.
In #1102, I also saw a mention of a potential solution of commenting out this line in `/lib/udev/rules.d/60-dma-heap.rules`:
SUBSYSTEM=="dma_heap", KERNEL=="linux,cma", SYMLINK+="dma_heap/vidbuf_cached", OPTIONS+="link_priority=-50"
And in #1125, I saw a mention of using a "persistent allocator" to solve the problem.
Since these were 2 different solutions and different bugs, I wasn't sure if they were the correct solution for my exact issue.
In addition, I also saw a comment #1102 (comment) that said that #1102 was to do with the Pi 5, and so a new bug report should be opened for devices that aren't the Pi 5
And so for these 2 reasons I have opened this new issue.
To Reproduce
I have found it hard to reproduce the behaviour on demand by using a single test script. I tried using this test python script, but even with 10,000 iterations I could not force the bug to show up:
test_stability.py
Whereas if I let the original bambu_rpi_webcam.py program run for 5 hours and only 267 iterations then it will cause the bug.
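Since the failure only appears after hours of running, one diagnostic option is to log a few memory counters on every iteration and see which one trends downwards. The sketch below reads `/proc/meminfo`. It assumes, without confirmation from this report, that the failing allocation may come from the CMA heap (the udev rule quoted above refers to `linux,cma`), so it includes `CmaFree` alongside `MemAvailable` and `SwapFree`. The helper names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_snapshot(path="/proc/meminfo"):
    """Return the counters of interest; a field is None if the kernel omits it."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {key: info.get(key) for key in ("MemAvailable", "SwapFree", "CmaFree")}


sample = "MemTotal: 426756 kB\nMemAvailable: 181232 kB\nCmaFree: 40960 kB\n"
print(parse_meminfo(sample))  # the sample values here are made up
```

Calling `memory_snapshot()` once per capture and writing the result to the log would show whether any of these counters shrinks steadily before the `OSError` appears.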
Expected behaviour
I was expecting `switch_mode_and_capture_file` to be able to switch back and forth between video and still modes an infinite number of times without crashing due to a memory leak/bug.
Clearly the problem is not a general lack of memory, otherwise it would not work the first time, but this issue only occurs after several iterations of using `switch_mode_and_capture_file`.
Console Output, Screenshots
The traceback when the error occurs is as follows:
Below is the full journalctl log of the program up until the point I noticed it was no longer working because of the memory error:
journalctl.log
Hardware :
Raspberry Pi Zero 2 W
Raspberry Pi Camera Module V3 Wide
Additional context
OS: Raspberry Pi OS Bookworm
OS version (result of `uname -a`): Linux p1s-cam 6.6.51+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux
picamera2 version (result of `dpkg -s python3-picamera2 | grep Version`): 0.3.22-2
As per #1102:
Result of `ls -l /dev/dma_heap`:
Contents of `/lib/udev/rules.d/60-dma-heap.rules`: