Memory Management
At boot time, in arch_setup_free_memory(), OSv discovers how much physical memory is available by reading the e820 entries and then linearly maps the identified memory ranges by calling memory::free_initial_memory_range(). Once the memory is discovered, all corresponding memory ranges are ultimately registered in memory::free_page_ranges of type page_range_allocator, which effectively tracks all used/free physical memory and implements the lowest-level memory allocation logic. The key fields of page_range_allocator are _free_huge and _free. The former is an intrusive multiset of page ranges of size >= 256 MB; the latter is an array of 16 intrusive lists, each storing page ranges of the corresponding logarithmic size. At this level, memory is tracked, allocated and freed in 4K chunks (pages) aligned at 0x...000 addresses, which means that an individual page range is a contiguous area of physical memory N pages long.
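Below is a rough, self-contained sketch of how such a page_range_allocator-like structure could be organized. It is only an approximation for illustration: the real allocator in core/mempool.cc embeds boost intrusive hooks in the free memory itself and its allocation paths are more elaborate, so the container types, helper names and the carve logic here are assumptions.

```cpp
#include <cstdint>
#include <cstddef>
#include <iterator>
#include <list>
#include <set>

constexpr size_t page_size = 4096;
constexpr unsigned max_order = 16;                 // 2^16 pages * 4K = 256 MB

struct page_range {
    uintptr_t start;                               // page-aligned physical address
    size_t    size;                                // length in bytes, multiple of 4K
    size_t pages() const { return size / page_size; }
    bool operator<(const page_range& other) const { return size < other.size; }
};

class page_range_allocator_sketch {
public:
    // File a free range into the right bucket: huge ranges (>= 2^16 pages,
    // i.e. >= 256 MB) go to the multiset, smaller ones to one of 16 lists
    // indexed by floor(log2(pages)).
    void free(page_range r) {
        unsigned order = ilog2(r.pages());
        if (order >= max_order) {
            _free_huge.insert(r);
        } else {
            _free[order].push_back(r);
        }
    }
    // Allocate 'bytes' (a non-zero multiple of page_size) by carving it off
    // the first range that is large enough, searching small buckets first.
    bool alloc(size_t bytes, page_range& out) {
        size_t need = bytes / page_size;
        for (unsigned order = ilog2(need); order < max_order; order++) {
            for (auto it = _free[order].begin(); it != _free[order].end(); ++it) {
                if (it->pages() >= need) {
                    page_range r = *it;
                    _free[order].erase(it);
                    out = carve(r, bytes);
                    return true;
                }
            }
        }
        if (!_free_huge.empty()) {                 // fall back to the largest huge range
            auto it = std::prev(_free_huge.end());
            page_range r = *it;
            _free_huge.erase(it);
            out = carve(r, bytes);
            return true;
        }
        return false;
    }
private:
    static unsigned ilog2(size_t n) {
        unsigned i = 0;
        while (n >>= 1) i++;
        return i;
    }
    // Take 'bytes' from the front of r and re-file the remainder, if any.
    page_range carve(page_range r, size_t bytes) {
        page_range taken{r.start, bytes};
        if (r.size > bytes) {
            free(page_range{r.start + bytes, r.size - bytes});
        }
        return taken;
    }
    std::multiset<page_range> _free_huge;          // page ranges >= 256 MB
    std::list<page_range>     _free[max_order];    // bucketed by log2 of page count
};
```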
So, for example, given 100MB from the host on QEMU, OSv would find 3 memory ranges: a small one of ~640KB in lower memory, a medium 1MB one located in the 2nd MB, and the largest one starting wherever loader.elf ends (at roughly the 9.5MB offset) and ending at 100MB. With OSv running with 100MB of RAM and gdb paused right after arch_setup_free_memory(), free_page_ranges looks like this:
(gdb) osv heap
0xffff800000001000 0x000000000009e000 // Lower RAM < 640KB
0xffff800000100000 0x0000000000100000 // 2nd MB - ends right below kernel
0xffff800000950000 0x0000000005a8e000 // Starts right above the kernel
For more details on how memory is managed and set up at the lowest level, please read Managing Memory Pages.
From this point on, OSv is ready to handle the "malloc/free" family and memory::alloc_page()/free_page() calls by drawing/releasing memory from/to free_page_ranges in the form of page_range objects (see the methods page_range_allocator::alloc(), alloc_aligned() and free()) and mapping it to virtual address ranges. However, until much later when SMP is enabled (i.e. when multiple vCPUs are fully activated), allocations are handled at a different granularity than after SMP is on. In addition, in the first (pre-SMP) phase allocations draw pages directly from the free_page_ranges object, whereas after SMP is enabled they draw memory from L1/L2 pools. There are as many L1 pools as vCPUs (a per-cpu construct) and a single global L2 pool. The L1 pools draw pages from the L2 pool, which in turn draws page ranges from free_page_ranges. Both L1 and L2 pools operate at page-size granularity and implement a low/high watermark algorithm (for example, the L1 pools keep at least 128 pages of memory available).
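The following is a simplified, single-threaded sketch of that L1/L2 pool arrangement. Apart from the 128-page L1 minimum mentioned above, the watermark and batch values, the helper names and the use of std::vector are illustrative assumptions; the actual per-CPU pools in core/mempool.cc are lock-free on the fast path and are refilled/drained by dedicated threads.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Stand-ins for free_page_ranges so the sketch is self-contained: here pages
// simply come from the host heap instead of real page ranges.
static void* take_page_from_free_page_ranges() { return std::aligned_alloc(4096, 4096); }
static void  give_page_to_free_page_ranges(void* p) { std::free(p); }

class l2_pool {                                    // the single global pool
public:
    void* get_page() {
        if (_pages.empty()) {
            refill();
        }
        void* p = _pages.back();
        _pages.pop_back();
        return p;
    }
    void put_page(void* p) {
        _pages.push_back(p);
        if (_pages.size() > 4096) {                // illustrative L2 high watermark
            give_page_to_free_page_ranges(_pages.back());
            _pages.pop_back();
        }
    }
private:
    void refill() {
        for (int i = 0; i < 512; i++) {            // batch size is illustrative
            _pages.push_back(take_page_from_free_page_ranges());
        }
    }
    std::vector<void*> _pages;
};

class l1_pool {                                    // one of these per vCPU
public:
    explicit l1_pool(l2_pool& l2) : _l2(l2) {}
    void* alloc_page() {
        if (_pages.size() < low_watermark) {       // running low: refill from L2
            while (_pages.size() < target) {
                _pages.push_back(_l2.get_page());
            }
        }
        void* p = _pages.back();
        _pages.pop_back();
        return p;
    }
    void free_page(void* p) {
        _pages.push_back(p);
        if (_pages.size() > high_watermark) {      // too full: drain excess back to L2
            while (_pages.size() > target) {
                _l2.put_page(_pages.back());
                _pages.pop_back();
            }
        }
    }
private:
    static constexpr size_t low_watermark  = 128;  // "at least 128 pages", as above
    static constexpr size_t high_watermark = 512;  // illustrative
    static constexpr size_t target         = 256;  // illustrative
    l2_pool& _l2;
    std::vector<void*> _pages;
};
```

In this scheme a page allocation on the fast path is just a pop from the per-CPU store; only when an L1 pool crosses a watermark does it talk to the shared L2 pool, and only the L2 pool ever touches free_page_ranges.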
It is also worth noting that most malloc functions (except malloc_large()) end up calling std_malloc() (see https://github.com/cloudius-systems/osv/blob/186779b2e477815bbcea8ccff6ba26a7e21cea09/core/mempool.cc#L1544-L1565), which allocates virtual memory in different ways depending on whether we are in pre- or post-SMP-enabled mode and on the size of the memory request. The size ranges are:
- x <= 1024 (page size/4)
- 1024 < x <= 4096
- x > 4096
If we are in SMP-enabled mode and the requested size is less than or equal to 1024 bytes, the allocation is delegated to malloc pools (see https://github.com/cloudius-systems/osv/blob/186779b2e477815bbcea8ccff6ba26a7e21cea09/core/mempool.cc#L177-L353). Malloc pools are set up per CPU, and each is dedicated to a specific size range (2^(k-1) < x <= 2^k, where k is less than or equal to 10). The way std_malloc() handles <= 4K allocations directly affects how well the underlying physical memory is utilized. For example, any request above 1024 bytes will use a whole page and in the worst case waste 3K of physical memory. Similarly, a malloc pool allocation in the worst case may waste close to half of its 2^k segment size.
The malloc_large()/free_large() calls draw memory directly from free_page_ranges in both the pre- and post-SMP-enabled phases.
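To make the size-based dispatch concrete, here is a condensed sketch of the SMP-enabled path of std_malloc() as described above. The helper functions and their trivial stand-in bodies are assumptions for illustration only; the real code additionally handles the pre-SMP case, alignment and allocation tracking.

```cpp
#include <cstddef>
#include <cstdlib>

constexpr size_t page_size = 4096;

// Stand-ins so the sketch compiles; in OSv these would be the per-CPU malloc
// pool, the page allocator and the malloc_large() path respectively.
static void* pool_alloc(size_t pow2_size) { return std::malloc(pow2_size); }
static void* whole_page_alloc()           { return std::aligned_alloc(page_size, page_size); }
static void* large_alloc(size_t size)     { return std::malloc(size); }

// Smallest power of two >= n (the 2^k segment size a pool request falls into).
static size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

void* std_malloc_sketch(size_t size) {
    if (size <= page_size / 4) {
        // x <= 1024: served by a per-CPU malloc pool whose segment size is the
        // smallest power of two >= size, so close to half a segment can be wasted.
        return pool_alloc(round_up_pow2(size));
    } else if (size <= page_size) {
        // 1024 < x <= 4096: takes a whole page; worst case ~3K stays unused.
        return whole_page_alloc();
    } else {
        // x > 4096: goes down the malloc_large() path, straight to page ranges.
        return large_alloc(size);
    }
}
```

For instance, a 1025-byte request ends up occupying a full 4K page (about 3K wasted), while a 513-byte request lands in the 1024-byte pool (up to 511 bytes wasted).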
- Page tables
- Code in mmu.cc
------ 0x 0000 0000 0000 0000
| |
------ 0x 0000 0040 0020 0000 elf_start --\
| | |- Kernel (Core ELF) - < 8MB
------ 0x 0000 0040 00a0 0000 elf_start + elf_size
| |
|----- 0x 0000 1000 0000 0000 program_base - 16 T --\
| | |- s_program - 8G
|----| 0x 0000 1002 0000 0000 --\ --X
| | |- Program |
|----| 0x 0000 1004 0000 0000 --/ |
| | |- ELF Namespaces(max: 32) - 256G
| | ...................... |
| | |
|----| 0x 0000 1042 0000 0000 --/
| |
|----| 0x 0000 2000 0000 0000 VMAs start --\
| | | - VMAs (mmap)
|----- 0x 0000 8000 0000 0000 VMAs end --/
| |
------ 0x ffff 8000 0000 0000 phys_mem --\
| | |- Main Area - 16T
------ 0x ffff 9000 0000 0000 --X
| | |- Page Area - 16T
------ 0x ffff a000 0000 0000 --X
| | |- Mempool Area - 16T
------ 0x ffff b000 0000 0000 --X
| | |- Debug Area - 80T
------ 0x ffff ffff ffff ffff --/
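The "Main Area" starting at 0x ffff 8000 0000 0000 (phys_mem) in the layout above is where OSv maps all physical memory linearly, which is why the page ranges in the earlier gdb output appear as 0xffff8000... virtual addresses. A minimal sketch of that offset arithmetic, assuming hypothetical helper names rather than the exact ones in core/mmu.cc, would be:

```cpp
#include <cstdint>

constexpr uintptr_t phys_mem = 0xffff800000000000ull;   // start of the Main Area

// With a 1:1 (linear) mapping at phys_mem, translating between a physical
// address and its kernel virtual address in this area is a fixed offset.
inline void* phys_to_virt_sketch(uintptr_t pa) {
    return reinterpret_cast<void*>(pa + phys_mem);
}

inline uintptr_t virt_to_phys_sketch(void* va) {
    return reinterpret_cast<uintptr_t>(va) - phys_mem;
}
```

For example, the first free range reported by gdb earlier, physical page 0x1000, shows up at virtual address 0xffff800000001000.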