meminfo logspam after some uptime #170

Open
xaki23 opened this issue Dec 23, 2022 · 12 comments

@xaki23
Contributor

xaki23 commented Dec 23, 2022

[2022-12-11 21:00:22] 2022-12-11 20:00:22 -00:00: INF [memory_pressure] Writing meminfo: free 30MiB / 37MiB (80.11 %)

[2022-12-12 04:16:04] 2022-12-12 03:16:03 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.97 %)
[2022-12-12 17:15:51] 2022-12-12 16:15:51 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (59.72 %)
[2022-12-12 18:16:07] 2022-12-12 17:16:07 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (60.00 %)
[2022-12-13 04:17:06] 2022-12-13 03:17:05 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.94 %)
[2022-12-14 06:16:33] 2022-12-14 05:16:31 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (59.84 %)
[2022-12-14 08:15:56] 2022-12-14 07:15:53 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.90 %)
[2022-12-15 17:16:18] 2022-12-15 16:16:14 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (59.77 %)
[2022-12-15 19:16:32] 2022-12-15 18:16:29 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.90 %)
[2022-12-15 20:16:10] 2022-12-15 19:16:07 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.99 %)
[2022-12-16 10:16:20] 2022-12-16 09:16:16 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.94 %)
[2022-12-17 02:17:06] 2022-12-17 01:17:01 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.93 %)
[2022-12-18 04:17:05] 2022-12-18 03:16:59 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.95 %)
[2022-12-19 08:16:25] 2022-12-19 07:16:18 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.99 %)
[2022-12-21 19:15:59] 2022-12-21 18:15:49 -00:00: INF [memory_pressure] Writing meminfo: free 23MiB / 37MiB (59.93 %)
[2022-12-21 21:17:06] 2022-12-21 20:16:58 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (59.71 %)

[2022-12-22 15:16:49] 2022-12-22 14:16:40 -00:00: INF [memory_pressure] Writing meminfo: free 21MiB / 37MiB (57.05 %)
[2022-12-22 15:16:50] 2022-12-22 14:16:40 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.16 %)
[2022-12-22 15:16:50] 2022-12-22 14:16:41 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.19 %)
[2022-12-22 15:16:51] 2022-12-22 14:16:42 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.19 %)
[2022-12-22 15:16:52] 2022-12-22 14:16:42 -00:00: INF [memory_pressure] Writing meminfo: free 21MiB / 37MiB (57.08 %)
[2022-12-22 15:16:52] 2022-12-22 14:16:43 -00:00: INF [memory_pressure] Writing meminfo: free 21MiB / 37MiB (57.11 %)
[2022-12-22 15:16:53] 2022-12-22 14:16:43 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.16 %)
[2022-12-22 15:16:54] 2022-12-22 14:16:44 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.18 %)
[2022-12-22 15:16:54] 2022-12-22 14:16:45 -00:00: INF [memory_pressure] Writing meminfo: free 21MiB / 37MiB (57.10 %)
[2022-12-22 15:16:55] 2022-12-22 14:16:45 -00:00: INF [memory_pressure] Writing meminfo: free 21MiB / 37MiB (57.03 %)
[2022-12-22 15:16:55] 2022-12-22 14:16:46 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.18 %)
[2022-12-22 15:16:56] 2022-12-22 14:16:47 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.22 %)
[2022-12-22 15:16:58] 2022-12-22 14:16:48 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.22 %)
[2022-12-22 15:16:59] 2022-12-22 14:16:50 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-22 15:17:00] 2022-12-22 14:16:50 -00:00: INF [memory_pressure] Writing meminfo: free 21MiB / 37MiB (56.98 %)
[2022-12-22 15:17:00] 2022-12-22 14:16:50 -00:00: INF [memory_pressure] Writing meminfo: free 21MiB / 37MiB (57.08 %)
[2022-12-22 15:17:00] 2022-12-22 14:16:51 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.18 %)
[2022-12-22 15:17:01] 2022-12-22 14:16:52 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.22 %)
[thousands of lines all around 57% omitted]
[2022-12-23 16:58:07] 2022-12-23 15:57:56 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-23 16:58:07] 2022-12-23 15:57:56 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.22 %)
[2022-12-23 16:58:14] 2022-12-23 15:58:04 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-23 16:58:56] 2022-12-23 15:58:45 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-23 16:59:08] 2022-12-23 15:58:57 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-23 16:59:08] 2022-12-23 15:58:57 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-23 16:59:15] 2022-12-23 15:59:04 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-23 16:59:15] 2022-12-23 15:59:04 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-23 16:59:15] 2022-12-23 15:59:04 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.21 %)
[2022-12-23 16:59:15] 2022-12-23 15:59:04 -00:00: INF [memory_pressure] Writing meminfo: free 22MiB / 37MiB (57.23 %)
[and still ongoing]

the first line is startup/init.
then 10 days of occasional dipping to right below the 60% mark (which triggers the logging).
then something seems to have taken at least 3%+ of memory at once, and is not giving it back (so far).

there is nothing relevant-looking in the log right before it.
most of the "right below 60%" events, and the jump to 57%, happen around 15-17 minutes after the hour, which is when a sync-backup-offsite job usually runs and starts/stops a VM.

is there something like per-VM state that gets allocated in larger chunks when some limit is hit?
(which would explain the sudden big jump)

i will keep it running for now, perhaps at some point it will move away from the 57% area...

@xaki23
Contributor Author

xaki23 commented Dec 23, 2022

vmlog.txt
extracted the "add client" and "has gone" loglines, and added a running count of currently active VMs at the front.
so the trigger was not hitting some magic number of concurrently active VMs.
it was ping-ponging between 17 and 18 active at the time, but e.g. on 2022-12-12 it was 18-19.

@palainp
Member

palainp commented Dec 23, 2022

Hi, thanks for the report, that is interesting!

To note: this log line is printed only when a packet arrives, the free memory is less than 50%, the fw then triggers the GC, and the free memory after GC is still under 60%.
Unfortunately I have no clue why it suddenly appeared, and the endless repetition makes me think that the firewall keeps deciding that a packet needs to be forwarded.
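
For reference, the check described above boils down to roughly the following (a minimal sketch of the described behaviour, not the actual memory_pressure.ml; the Xen_os.Memory usage and all names are assumptions):

  (* sketch only: on an incoming packet, if free memory is below 50%, force a
     GC; if it is still below 60% afterwards, print the meminfo line *)
  let fraction_free () =
    let s = Xen_os.Memory.quick_stat () in
    float_of_int s.Xen_os.Memory.free_words /. float_of_int s.Xen_os.Memory.heap_words

  let on_packet () =
    if fraction_free () < 0.5 then begin
      Gc.full_major ();
      let free = fraction_free () in
      if free < 0.6 then
        Logs.info (fun f -> f "Writing meminfo: free %.2f %%" (100. *. free))
    end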

I'm away from my laptop until 2023, but later I'll try to push the firewall to reproduce that (I managed to run with 16MB, where the free memory is constantly under 20% so the GC is always triggered, but I haven't noticed any spin like that; the log printing stops right after the network activity).

@xaki23
Contributor Author

xaki23 commented Dec 23, 2022

note it's not a "hard" spin; there are minute-sized breaks between the log messages.
the FW also continues to work just fine.
and, it being my main FW, it is quite fair to assume it has constant (but variable) traffic.
the "just below 60%" cases in the first 10 days were during times of high outgoing traffic (== backup sync).

the part that confuses me most is the large/sudden jump in free mem after GC.
a slow-but-constant leak would leave a very different pattern.
here, it suddenly dropped by at least 3%, probably a good chunk more.

@xaki23
Contributor Author

xaki23 commented Dec 27, 2022

this may or may not be just an artifact of how the nat table works.

test setup: upper-vm <-> mirage-fw <-> lower-vm

the upper has

  • no upstream interface // netvm,
  • a default route pointing at dev lo,
  • no INPUT firewall.

i changed the mirage-fw build to revert 06b9a88 to get logging outside the 50-60% free band.

the tests are mostly performed in the lower-vm, generating traffic toward upper-vm.

@hannesm helped me with some details about how the NAT is implemented; the key details for me are (see the rough sketch after the list):

  • default state table size 5000
  • 7/8 for tcp
  • rest udp
  • 100 icmp
  • (no, the ordering with rest-before-icmp doesn't make sense to me)
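
the split works out roughly like this (just arithmetic from the numbers above; whether the firewall passes exactly these values to mirage-nat is an assumption on my part):

  (* sketch of the default split described above, not the actual firewall code *)
  let table_size = 5000
  let tcp_size = 7 * table_size / 8        (* 4375 entries for tcp *)
  let udp_size = table_size - tcp_size     (* 625 entries for udp ("rest") *)
  let icmp_size = 100                      (* fixed icmp budget *)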

my "stressers" for nat look like this:

sudo hping3 --faster --rand-dest -I eth0 -c 100000 --icmp 3.x.x.x 
sudo hping3 --faster --rand-dest -I eth0 -c 100000 --udp 2.x.x.x 
sudo hping3 --faster --rand-dest -I eth0 -c 100000 1.x.x.x 

(the last uses tcp by default, there is no --tcp)

none of these will ever create a reply, so the traffic is purely lower->upper.

for some reason i don't fully understand, that doesn't trigger the meminfo log message.
so my way to make it log meminfo is:

sudo ping -f <ip of upper-vm> -c 10000 

which reliably triggers 3-5 meminfo logprints.

the mirage-fw has 42MB memory, which during startup looks like this:

[2022-12-27 15:13:41] Solo5: Memory map: 42 MB addressable: 
[2022-12-27 15:13:41] 2022-12-27 14:13:41 -00:00: INF [memory_pressure] Writing meminfo: free 30MiB / 37MiB (80.10 %) 

if i trigger the memprint (via ping -f) after that, i get ~77.5% free.

then going step-by-step with the stressers, checking free after each, i get:

  • after icmp 76.3%
  • after udp 74.8%
  • after tcp 68.7%

the numbers at each stage do not seem to change if i run the stressers more than once.

this to me confirms that what i am seeing is indeed memory usage from nat-state table/cache/hash growth.
the proportions (1.2%, 1.5%, 6.1%) do not follow the expected 2:12:87 pattern, but that's probably some kind of os/ocaml/hashtable-level quantization/chunking.
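
as a plausibility check for the chunking idea (an illustration only, using the stdlib Hashtbl; whether the nat table's backing structure behaves the same way is an assumption):

  (* a stdlib Hashtbl grows its bucket array in doubling steps, so memory
     jumps in chunks instead of tracking the entry count linearly *)
  let () =
    let t = Hashtbl.create 16 in
    for i = 1 to 5000 do
      Hashtbl.add t i ();
      if i mod 1000 = 0 then
        Printf.printf "%d bindings -> %d buckets\n" i (Hashtbl.stats t).Hashtbl.num_buckets
    done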

if i run the stressers in reverse, it goes to 68.7% right after tcp, and doesn't change at all with the udp or icmp stressers.
this again would confirm there is some quantization/chunking involved.

so, at this point i would in general be tempted to file this under "yeah, that's just how it is", and file a PR to change the GC/reporting levels from 50%/60% to something quieter.

BUT i still don't see how i got to 57.2% free on the production instance.
(which after 5 additional days is still hovering at 57.2% +/- 0.3%, and working well)

68.7% (which is the max i can reach with stressing) vs 57.2% is well beyond rounding error.

so something about this i still dont understand.

random guess? some kind of memory arena fragmentation, like what if the cache decides to grow while some of the mem is taken up by packets?

@xaki23
Contributor Author

xaki23 commented Dec 28, 2022

found a way to soak up another 17% of memory: fragmentation.
in particular hitting the "too many fragments" codepath.
and once it is in that state, i am seeing packet loss too!

test setup is the same as in the last post.
also reusing the "tcp stresser" and "ping -f to get meminfo" command lines from that post.

  • freshly started fw: 77.5% free mem
  • one round of tcp stresser: 68.8% free
  • the ping -f is showing no packet loss at this point
sudo hping3 --faster --rand-dest -I eth0 -c 100000 -f -d 1000 -m 100 1.x.x.x

this sends 1000-byte packets with a 100-byte fragment size, well within the max-16-frags limit.
after this, free mem still 68.8% and no packet loss.
== no change.

sudo hping3 --faster --rand-dest -I eth0 -c 100000 -f -d 1000 -m 20 1.x.x.x

this sends 1000-byte packets with a 20-byte fragment size, which should be above the max-16-frags limit, but isn't?!
after this, free mem still 68.8% and no packet loss.
== no change.

sudo hping3 --faster --rand-dest -I eth0 -c 100000 -f -d 1000 -m 16 1.x.x.x

this sends 1000-byte packets with a 16-byte fragment size, which finally triggers the too-many-frags codepath:
lots of "dropping from cache, maximum number of fragments exceeded" in log.
after this, free mem 61.1% and no packet loss.

running the same 1000/16 frag test again, plenty more max-frags log scrolling.
free mem now 51.3% and the ping -f test starts showing 0.8% packet loss.
it is also spewing a lot more meminfo lines, which makes sense considering that after GC we are now barely above 50%.
== it is hitting the forced GC more often.

if i repeat the 1000/16 frag test, no more changes: 51.3% free and 0.8% packet loss.

however, if i vary the fragmentation now, it can actually free up some memory!

sudo hping3 --faster --rand-dest -I eth0 -c 100000 -f -d 2000 -m 32 1.x.x.x

lots of max-frag log scrolling, 52.7% free, 0.4% loss.
after repeats still 52.7% free and 0.4% loss.
frag test with 4000/32 still 52.7% free and 0.4% loss.
frag test with 4000/128 now 53.8% free and 0.3% loss.

frag test with 8000/512 leads to meminfo scrolling (!), and very few max-frag log entries, 53.6% free and 0.3% loss.

frag test with 1000/16 (the initial bad one), back to 51.3% and 0.4% loss.

so there is some upper bound on the impact of this.
(though that could be because of the test design too)

more poking around with fragmentation parameters gives varying inconsistent results.
at some point 8000/1024 resulted in mem free 50% and 1.1% loss, but unable to repro that.

the overall vibe i am getting is that this is not actual leakage, but a fragmented memory arena.
perhaps composing the max-frag log message to be sent actually causes the "leakage"?

otoh the packet loss indicates it is ... unhappy.
to check whether the loss is just due to the overall low-memory situation,
i resized the fw vm to 50MB ram (+8MB).

  • after start 83.7% free
  • unable to trigger meminfo with ping -f right after start (== doesnt push it below 50%)
  • after tcp stress 74.4% free (== with nat cache fully extended)
  • after 1000/100 frags still 74.4%
  • after 1000/16 frags 68%
  • after another 1000/16 frags 60%
  • stable at that point, with no packet loss.

so the packet loss may be caused by the overall ram situation directly (no buffers?) or by triggering the GC+log codepath too often (blockage/distraction).

some ideas for tests someone who (unlike me) actually knows ocaml might try (a rough sketch of the first idea is after the list):

  • replace the meminfo print in the GC codepath with a timer-scheduled one
  • disable the max-frag log message, to see if the log message itself is causing the impact
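
for the first idea, roughly something like this (a sketch only; Mirage_time.S and Duration are standard mirage pieces, the Xen_os.Memory usage is an assumption):

  (* sketch: print meminfo from a periodic lwt loop instead of the
     packet/GC path; interval and names are illustrative only *)
  module Meminfo_report (Time : Mirage_time.S) = struct
    let rec loop () =
      let open Lwt.Infix in
      Time.sleep_ns (Duration.of_sec 60) >>= fun () ->
      let s = Xen_os.Memory.quick_stat () in
      Logs.info (fun f ->
          f "meminfo: %d of %d words free"
            s.Xen_os.Memory.free_words s.Xen_os.Memory.heap_words);
      loop ()
  end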

@xaki23
Contributor Author

xaki23 commented Dec 28, 2022

so, one possible outcome from this issue to me is still "we have a better understanding of how the memory is managed, adjust limits and logging, add a note to the documentation, and move on".
as in, there isn't an actual problem, we are just seeing a high-uptime mirage-fw for the first time.

so with some help from @hannesm i crafted xaki23@9b00613

key changes:

  • reintroduce a variant of scheduled meminfo based on changes @talex5 made some months ago
  • change forced-gc limit to 10% free, print meminfo when this is hit

built that, deployed it to my test setup, and ...
... it died from OOM on the very first ping -f.
without the forced-gc ever going off.

but ... this doesn't happen when i run the tcp-stresser (and/or frag stresser) first.
as in, if i start with a clean 80%-mem-free firewall, a ping -f (in particular with larger packets, like -s 1400) drives it straight into an OOM wall.
but if i start with 69% or 51% mem free, this doesn't happen.
during this, the <10% free condition never seems to be met, so the GC seems to be doing its thing on its own.

so, uh, wat?
unless i pre-stress the memory, the GC is too naive/lazy to do its thing?

also, even though this might remove the "pre-stress to make the GC wake up" workaround, and without knowing if it can even be sanely done with ocaml ...
... perhaps we should consider pre-allocating the maximum needed space for both the nat cache and fragmentation. since both have fixed upper limits afaict, that would make the situation a lot clearer.

@xaki23
Contributor Author

xaki23 commented Dec 28, 2022

i added a scheduled meminfo print every second (without forced GC) to get a better idea of what's going on.

some logs for this:

memlog1.txt
clean arena, the logged arp call marks the start of the ping -f

memlog2.txt
clean arena, OOM right after the arp // start of ping -f

memlog3.txt
clean arena, this one took a whole lot longer, was yoyo-ing between 40-60%, then went back up to almost 80%, then ran straight into the wall again.

and, for comparison ...
memlog4.txt
clean start, running tcp-stresser (takes it to 68.x %), then ping -f ...
happily keeps yoyo-ing between 40-60%.
log is truncated, did that for a while longer, with different packet sizes too, no OOM or relevant change in pattern.
sometimes with larger packets it dips down to something like 29%, but clearly the GC seems to have some internal 40% threshold, which explains why the forced-gc at <10% in status() is never firing.

doesn't explain going straight from 80% free to OOM though.

@palainp
Member

palainp commented Dec 30, 2022

Wow, this is very interesting, your work and analysis are amazing, thanks a lot for that!

Free memory is calculated as total memory minus used memory (the used-memory accounting is done on every malloc/free call made by the ocaml runtime, mirage/ocaml-solo5@cf50612). So if memory is very fragmented (and it can be with xen; I observed a lot of small memory requests from 32B to 8kB, unsure what they relate to, maybe entropy gathering?), we can have 50%+ free memory (less than 50% of the total memory in use, which may still be a larger percentage of the current heap size) and yet be unable to expand the heap. A previous estimation used the "top of the heap" address, which allowed it to work up to 90% of the total memory, but now, due to fragmentation, we can fail earlier. The change from 10% to 50% was a fix for #143.

The Fatal error: out of memory comes from the runtime when sbrk cannot increase the memory given to ocaml (https://github.com/mirage/ocaml-solo5/blob/714de76319d1c3137302a2fd4373a5a1284a4a91/nolibc/sysdeps_solo5.c#L115). According to the GC doc (https://v2.ocaml.org/api/Gc.html), the default heap increment is 15%.
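
For illustration, the increment semantics from the Gc doc: values up to 1000 are a percentage of the current heap size, larger values are an absolute number of words.

  (* with the default of 15 (i.e. 15%), growing an already large heap asks
     solo5/sbrk for a correspondingly large step in one go *)
  let () =
    Printf.printf "major_heap_increment = %d\n" (Gc.get ()).Gc.major_heap_increment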

With your memlogs 1 to 3, I see the following situation:

  • around 7MiB (a bit less than 20%) for the kernel + startup allocations (this seems coherent with my unstripped kernel size; as a side note, the latest Dockerfile strips the kernel down to around 4MiB). It seems fair to assume that this memory is at the bottom of the heap.
  • less than 1% of other memory, which can be anywhere in the heap, depending on where the runtime wants it to be located.
  • the runtime wants more memory and solo5 fails to increase the heap size.

My current assumption about the memory usage is that, since I removed the 10-packets pending limit some months ago (abb5080), we can now, under stress, have many pending packets. Each packet is stored in a Cstruct (the frame buffer in Netback.listen backend ~header_size:Ethernet.Packet.sizeof_ethernet (fun frame -> ...), which comes from https://github.com/mirage/mirage-net-xen/blob/939dfcbfc31f6c396532c08d8857f2672d90f2f3/lib/backend.ml#L235). I'm not sure how the GC deals with Cstructs, but it could store them in the high part of the heap and at some point need to increase the heap size (this is pure speculation, I haven't dived into that part of ocaml yet :) but according to memlog4 this is probably not the case).

The promising part of your experiment is that if it is possible to pre-allocate memory for the nat table and the fragment list, the ocaml runtime only has to deal with packets as Cstructs, which seems to work very well even without the memory_pressure.ml code :)

@xaki23
Contributor Author

xaki23 commented Mar 17, 2023

(may be unrelated but posting it here because at the moment it is just "observations after long mirage-firewall uptimes" stuff)

just had my main mirage-fw instance play possum ... ~38d of uptime, went rather unresponsive. was not passing packets, was still logging the (expected) meminfo logspam. nothing else useful in the log.
sys-net was okay, anything directly connected to sys-net (aka dmz) was okay, hard restart of sys-mfwt recovered net for downstream VMs.

unsure if there were any other relevant tests i could have done before restart.

@palainp
Member

palainp commented Mar 17, 2023

Hi @xaki23, thanks for this update. I can't be sure that this is related, but it might be.

Would you mind trying another memory_pressure approach (https://github.com/palainp/qubes-mirage-firewall.git, branch copy-packet)? The idea is to check the absolute size of the free memory (it should be more than 6MB) and no longer a ratio (which may make no sense at all, as the total memory can be very different from one vm to another). I also removed the spamming log line and replaced it with a status printout every 10 seconds (this delay should be increased in production; removing io logs also helps maximise the available bandwidth).
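
For the record, the threshold logic is roughly the following (a simplified sketch of the idea, not the exact code in the copy-packet branch; the Xen_os.Memory arithmetic is an assumption):

  (* keep at least 6MB free in absolute terms, independent of the total memory *)
  let min_free_bytes = 6 * 1024 * 1024

  let free_bytes () =
    let s = Xen_os.Memory.quick_stat () in
    s.Xen_os.Memory.free_words * (Sys.word_size / 8)

  let status () =
    if free_bytes () > min_free_bytes then `Ok
    else begin
      Gc.full_major ();
      if free_bytes () > min_free_bytes then `Ok else `Memory_critical
    end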

With this and a 32MB fw, I see something like: 10MB is taken at startup (with only 1 client connected) before memory_pressure.init; when I try to overload the network bandwidth, it drops to around the 6MB limit and then cruises between 6 and 12MB. I assume that the 5MB difference from the startup value might be related to the NAT tables.

Logs for 32MB (don't know why only 27 are reported):

[2023-03-17 17:12:51] 2023-03-17 16:12:51 -00:00: INF [memory_pressure] Memory usage: free 17MiB / 27MiB (61.56 %)
[2023-03-17 17:13:01] 2023-03-17 16:13:01 -00:00: INF [memory_pressure] Memory usage: free 17MiB / 27MiB (61.50 %)
[2023-03-17 17:13:11] 2023-03-17 16:13:11 -00:00: INF [memory_pressure] Memory usage: free 15MiB / 27MiB (53.00 %)
[2023-03-17 17:13:21] 2023-03-17 16:13:21 -00:00: INF [memory_pressure] Memory usage: free 10MiB / 27MiB (37.51 %)
[2023-03-17 17:13:31] 2023-03-17 16:13:31 -00:00: INF [memory_pressure] Memory usage: free 9.1MiB / 27MiB (34.18 %)
[2023-03-17 17:13:41] 2023-03-17 16:13:41 -00:00: INF [memory_pressure] Memory usage: free 8.4MiB / 27MiB (31.52 %)
[2023-03-17 17:13:51] 2023-03-17 16:13:51 -00:00: INF [memory_pressure] Memory usage: free 9.1MiB / 27MiB (34.05 %)
[2023-03-17 17:14:01] 2023-03-17 16:14:01 -00:00: INF [memory_pressure] Memory usage: free 6.5MiB / 27MiB (24.34 %)
[2023-03-17 17:14:11] 2023-03-17 16:14:11 -00:00: INF [memory_pressure] Memory usage: free 12MiB / 27MiB (42.23 %)
[2023-03-17 17:14:21] 2023-03-17 16:14:21 -00:00: INF [memory_pressure] Memory usage: free 8.4MiB / 27MiB (31.58 %)

With 256 MB, it doesn't seem to drop below 231/251 MB (10 MB at startup and another 10 MB during rushes). I'll continue testing with that unikernel and I'd be interested in feedback over a longer period (feel free to increase the delay between log lines).

@palainp
Member

palainp commented Mar 21, 2023

Update: I managed to be OOMed:

[2023-03-21 08:24:46] 2023-03-21 07:24:46 -00:00: INF [memory_pressure] Memory usage: free 13MiB / 27MiB (46.43 %)
[2023-03-21 08:24:56] 2023-03-21 07:24:56 -00:00: INF [memory_pressure] Memory usage: free 13MiB / 27MiB (45.77 %)
[2023-03-21 08:25:01] Io_page: memalign(4096, 4096) failed, even after GC.
[2023-03-21 08:25:01] Fatal error: exception Out of memory
[2023-03-21 08:25:01] Raised by primitive operation at Io_page.get in file "duniverse/io-page/lib/io_page.ml", line 44, characters 4-22
[2023-03-21 08:25:01] Called from Io_page.pages.inner in file "duniverse/io-page/lib/io_page.ml", line 59, characters 25-32
[2023-03-21 08:25:01] Called from Netchannel__Frontend.Make.refill_requests.(fun) in file "duniverse/mirage-net-xen/lib/frontend.ml", line 152, characters 18-35
[2023-03-21 08:25:01] Called from Netchannel__Frontend.Make.listen.loop.(fun) in file "duniverse/mirage-net-xen/lib/frontend.ml", line 223, characters 6-26
[2023-03-21 08:25:01] Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "duniverse/lwt/src/core/lwt.ml", line 1849, characters 23-26
[2023-03-21 08:25:01] Re-raised at Lwt.Miscellaneous.poll in file "duniverse/lwt/src/core/lwt.ml", line 3077, characters 20-29
[2023-03-21 08:25:01] Called from Xen_os__Main.run.aux in file "duniverse/mirage-xen/lib/main.ml", line 37, characters 10-20
[2023-03-21 08:25:01] Called from Dune__exe__Main.run in file "main.ml" (inlined), line 3, characters 12-29
[2023-03-21 08:25:01] Called from Dune__exe__Main in file "main.ml", line 115, characters 5-10
[2023-03-21 08:25:01] Solo5: solo5_exit(2) called

Free memory being around 50% beforehand suggests that the unikernel in fact hits the sbrk limit of solo5 (Solo5 won't permit the heap to grow closer than 1MB to the total memory size, to leave some space for the stack: https://github.com/mirage/ocaml-solo5/blob/714de76319d1c3137302a2fd4373a5a1284a4a91/nolibc/sysdeps_solo5.c#L115).
So I pushed another commit which changes that:

  Gc.set {(Gc.get ()) with
    Gc.allocation_policy = 0 ; (* next-fit allocation, will fragment => compact with major *)
    Gc.max_overhead = 0 ; (* do a compaction at end of each major collection *)
    Gc.space_overhead = 80 ; (* see https://v2.ocaml.org/api/Gc.html *)
    Gc.major_heap_increment = 32766 ; (* incr heap size (asked to Solo5) by 256k (=32k words of 8B) *)
  } ;

major_heap_increment makes the runtime ask Solo5 for heap growth in smaller, fixed steps (the default is 15% of the current size). max_overhead triggers a compaction after each major collection to limit the growth of the heap, and since we manually trigger the GC, I try to switch to the simpler next-fit allocation policy (with space_overhead set as described in the documentation).

@palainp
Member

palainp commented Mar 21, 2023

It seems to be more complicated: I'm still facing OOMs :( Don't bother with this code, I'll try on my side first.
