meminfo logspam after some uptime #170
vmlog.txt |
Hi, thanks for the report, that is interesting! To note, this log line is printed only when a packet arrives, the free memory is less than 50%, the fw triggers the GC, and the free memory after GC is still under 60%. I'm away from my laptop until 2023, but later I'll try to push the firewall to reproduce that (I managed to run with 16MB, where the free memory is constantly under 20% so the GC is always triggered, but I haven't noticed any spin like that; the log printing stops right after the network activity). |
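(For reference, a minimal sketch of that check, with invented names rather than the firewall's actual code: a packet arrival only forces a GC when free memory is already below 50%, and only logs when it is still below 60% afterwards.)

```ocaml
(* Hypothetical sketch of the threshold logic described above, not the
   firewall's actual code. [free_percent] is assumed to return the current
   free-memory percentage. *)
let check_memory_on_packet ~free_percent =
  if free_percent () >= 50 then `Ok                 (* plenty of room, stay quiet *)
  else begin
    Gc.full_major ();                               (* low on memory: force a full GC *)
    let free = free_percent () in
    if free < 60 then
      Printf.printf "Memory usage: %d%% free after GC\n%!" free;  (* the "meminfo" log line *)
    if free < 50 then `Memory_critical else `Ok
  end
```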
note it's not a "hard" spin, there are minute-sized breaks between the logmsgs. the part that confuses me most is the large/sudden jump in free mem after GC. |
this may or may not be just an artifact of how the nat table works. test setup: upper-vm <-> mirage-fw <-> lower-vm. the upper has
i changed the mirage-fw build to revert 06b9a88 to get logging outside the 50-60% free band. the tests are mostly performed in the lower-vm, generating traffic toward upper-vm. @hannesm helped me with some details about how the NAT is implemented; the key points for me are:
my "stressers" for nat look like this:
(the last uses tcp by default, there is no --tcp) none of these will ever create a reply, so the traffic is purely lower->upper. for some reason i don't fully understand, that doesn't trigger the meminfo logmsg.
which reliably triggers 3-5 meminfo logprints. the mirage-fw has 42MB memory, which during startup looks like so:
if i trigger the memprint (via ping -f) after that, i get ~77.5% free. then going step-by-step with the stressers, checking free after each, i get:
the numbers at each stage do not seem to change if i run the stressers more than once. this confirms to me that what i am seeing is indeed memory usage based on nat-state table/cache/hash growth. if i run the stressers in reverse order, it goes to 68.7% right after tcp, and doesn't change at all with the udp or icmp stresser. so, at this point i would in general be tempted to file this under "yeah, that's just how it is", and file a PR to change the GC/reporting levels from 50%/60% to something more quiet. BUT i still dont see how i got to 57.2% free on the production instance. 68.7% (which is the max i can reach with stressing) vs 57.2% is well beyond rounding error, so something about this i still don't understand. random guess? some kind of memory arena fragmentation, like what if the cache decides to grow while some of the mem is taken up by packets? |
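(A toy model of that hypothesis, purely illustrative rather than mirage-nat's actual code: each distinct flow inserts an entry keyed by its 5-tuple, and entries only go away when they expire, so unanswered one-way traffic from many source ports keeps the table, and its memory, growing until the timeouts kick in.)

```ocaml
(* Purely illustrative model of NAT-state growth; mirage-nat's real data
   structures differ. Entries are keyed by the flow 5-tuple and only
   disappear once they expire, so unanswered one-way traffic from many
   source ports keeps adding entries until the timeouts kick in. *)
type flow = { proto : int; src : string * int; dst : string * int }

let table : (flow, float) Hashtbl.t = Hashtbl.create 64   (* value = expiry time *)

let note_packet ~now ~timeout flow =
  (* drop expired entries, then (re)insert this flow with a fresh expiry *)
  Hashtbl.filter_map_inplace
    (fun _flow expiry -> if expiry < now then None else Some expiry) table;
  Hashtbl.replace table flow (now +. timeout)

let entries () = Hashtbl.length table   (* grows with every new 5-tuple seen *)
```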
found a way to soak up another 17% of memory: fragmentation. test setup is the same as in the last post.
this sends 1000-byte packets with a 100-byte fragment size, well within the max-16-frags limit.
this sends 1000-byte packets with a 20-byte fragment size, which should be well above the max-16-frag limit, but isn't?!
this sends 1000-byte packets with a 16-byte fragment size, which finally triggers the too-many-frags codepath: running the same 1000/16 frag test again, plenty more max-frags logscrolling. if i repeat the 1000/16 frag test, no more changes: 51.3% free and 0.8% packet loss. if i vary the fragmentation now, it can actually free up some memory!
lots of max-frag log scrolling, 52.7% free, 0.4% loss. a frag test with 8000/512 leads to meminfo scrolling (!) and very few max-frag log entries, 53.6% free and 0.3% loss. a frag test with 1000/16 (the initial bad one) goes back to 51.3% and 0.4% loss. so there is some upper bound on the impact of this. more poking around with fragmentation parameters gives varying, inconsistent results. the overall vibe i am getting is that this is not actual leakage, but a fragmented memory arena. otoh the packet loss indicates it is ... unhappy.
so the packet loss may be caused by the overall ram situation directly (no buffers?) or by triggering the GC+log codepath too often (blockage/distraction). some ideas for tests someone who (unlike me) actually knows ocaml might try:
|
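(A hypothetical sketch of the per-datagram bookkeeping behind the "max 16 fragments" behaviour above, with invented names rather than the real mirage reassembly code: every incomplete datagram buffers its fragments until it completes, times out, or hits the 16-piece cap, which is the memory the fragment stressers tie up.)

```ocaml
(* Hypothetical per-datagram fragment cap, illustrating why a flood of tiny
   fragments ties up memory: every incomplete datagram buffers its pieces
   until completion, timeout, or the 16-fragment limit. Not the real
   reassembly code. *)
let max_fragments = 16

type pending = { mutable pieces : Cstruct.t list }   (* buffered fragments, newest first *)

let pending_table : (int, pending) Hashtbl.t = Hashtbl.create 16
(* keyed here by the IP identification field only, for brevity *)

let add_fragment ~ip_id (frag : Cstruct.t) =
  let p =
    match Hashtbl.find_opt pending_table ip_id with
    | Some p -> p
    | None -> let p = { pieces = [] } in Hashtbl.add pending_table ip_id p; p
  in
  if List.length p.pieces >= max_fragments then begin
    Hashtbl.remove pending_table ip_id;      (* too many pieces: drop the whole datagram *)
    `Too_many_fragments
  end else begin
    p.pieces <- frag :: p.pieces;            (* keep buffering until reassembly completes *)
    `Buffered
  end
```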
so, one possible outcome from this issue to me is still "we have a better understanding of how the memory is managed, adjust limits and logging, add a note to the documentation, and move on". so with some help from @hannesm i crafted xaki23@9b00613. key changes:
built that, deployed it to my test setup, and ... but ... this doesn't happen when i run the tcp-stresser (and/or frag stresser) first. so, uh, wat? also, even though this might remove the "pre-stress to make GC wake up" workaround, and without knowing if that can even be sanely done with ocaml ... |
i added a scheduled meminfo print every second (without forced GC) to get a better idea of what's going on. some logs for this: memlog1.txt memlog2.txt memlog3.txt and, for comparison ... doesn't explain going straight from 80% free to OOM though. |
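(A minimal sketch of such a once-per-second report, assuming a Mirage-style sleep_ns timer and a hypothetical report_memory helper; the point is simply that it logs on a schedule and never forces a GC.)

```ocaml
(* Sketch of a once-per-second memory report that never forces a GC.
   [sleep_ns] stands in for the unikernel's timer (e.g. a Mirage TIME
   implementation); [report_memory] is a hypothetical logging helper. *)
let rec memory_reporter ~sleep_ns ~report_memory () =
  report_memory ();                                (* print free/total, no Gc call here *)
  Lwt.bind (sleep_ns 1_000_000_000L)               (* wait one second *)
    (fun () -> memory_reporter ~sleep_ns ~report_memory ())

(* started alongside packet handling, e.g.:
   Lwt.async (memory_reporter ~sleep_ns:Time.sleep_ns ~report_memory) *)
```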
Wow, this is very interesting, your work and analysis are amazing, thanks a lot for that! Free memory is calculated from total memory minus used memory (the used memory calculation is done with every ...). With your memlog 1 to 3, I can see the situation:
My current assumption about the memory usage is: since I removed the 10-packet pending limit some months ago (abb5080), we can now, under stress, have many pending packets. Each packet is stored into a Cstruct (qubes-mirage-firewall/client_net.ml, line 116 in 609f529).
The promising part of your experiment is that if it is possible to allocate memory for the NAT table and the fragment list, the OCaml runtime only has to deal with packets in Cstructs, which seems to work very well even without the ... |
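(To make the "total minus used" calculation above concrete, a small sketch; the record shape and field names are assumptions, not the real OS bindings.)

```ocaml
(* Sketch of the free-memory percentage behind the log lines: free memory
   is total memory minus what is currently in use. Field names here are
   assumptions, not the real OS bindings. *)
type stat = { total_words : int; heap_words : int; stack_words : int }

let free_percent (s : stat) =
  let used_words = s.heap_words + s.stack_words in
  100 * (s.total_words - used_words) / s.total_words
```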
(may be unrelated but posting it here because at the moment it is just "observations after long mirage-firewall uptimes" stuff) just had my main mirage-fw instance play possum ... ~38d of uptime, went rather unresponsive. was not passing packets, was still logging the (expected) meminfo logspam. nothing else useful in the log. unsure if there were any other relevant tests i could have done before restart. |
Hi @xaki23, thanks for this update. I can't be sure that this is related, but it might be. Would you mind trying another ...? With this and a 32MB fw, I have something like: 10MB is taken at startup (with only 1 client connected) before ... Logs for 32MB (don't know why only 27 are reported):
With 256 MB, it doesn't seem to drop below 231/251 MB (10 MB at startup and another 10 MB during rushes). I'll continue to try with that unikernel and I'd be interested in feedback over a longer period (feel free to increase the delay between log lines). |
Update: I managed to get OOMed:
Free memory being around 50% before suggests that in fact the unikernel hits the fragmentation issue discussed above. I tried the following Gc settings:

Gc.set { (Gc.get ()) with
  Gc.allocation_policy = 0 ;        (* next-fit allocation, will fragment => compact with major *)
  Gc.max_overhead = 0 ;             (* do a compaction at the end of each major collection *)
  Gc.space_overhead = 80 ;          (* see https://v2.ocaml.org/api/Gc.html *)
  Gc.major_heap_increment = 32766 ; (* increase the heap size (asked to Solo5) by 256k (= 32k words of 8 B) *)
} ;
|
It seems to be more complicated, I'm still facing OOM :( Don't bother with this code, I'll try on my side first. |
the first line is startup/init.
then 10 days of occasional dipping to right below the 60% mark (which triggers the logging).
then something seems to have taken 3%+ of memory at once, and is not giving it back (so far).
there is nothing relevant-looking in the log right before it.
most of the "right below 60%" events and the jump to 57% are around 15-17 minutes "after the hour", which is the usual time a sync-backup-offsite job is running, which starts/stops a VM.
is there something like vm-state that is allocated in larger chunks when hitting some limit or so?
(which would explain the sudden big jump)
i will keep it running for now, perhaps at some point it will move away from the 57% area...