
Low-level Layer

At boot time, in arch_setup_free_memory(), OSv discovers how much physical memory is available by reading the e820 entries and then linearly maps the identified memory ranges by calling memory::free_initial_memory_range(). Once the memory is discovered, all corresponding memory ranges are ultimately registered in memory::free_page_ranges, an instance of page_range_allocator that effectively tracks all used/free physical memory and implements the lowest-level memory allocation logic. The key fields of page_range_allocator are _free_huge and _free. The first is an intrusive multiset of page ranges of size >= 256 MB; the latter is an array of 16 intrusive lists, where each list stores page ranges of the corresponding logarithmic size. At this level, memory is tracked/allocated/freed in 4K chunks (pages) aligned at 0x...000 addresses, so an individual page range is a contiguous area of physical memory N pages long.
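
To make the bucket logic concrete, here is a minimal, hypothetical sketch (not OSv's actual code - the real page_range_allocator in core/mempool.cc uses intrusive containers embedded in the free memory itself) of how a free range would be classified into one of the 16 _free lists or into _free_huge:

#include <cstddef>

// Simplified model of the classification described above. Constants
// follow the text: 16 logarithmic buckets, with ranges of 2^16 pages
// (256MB) and above going to the _free_huge multiset.
constexpr size_t page_size = 4096;
constexpr unsigned num_buckets = 16;

// Returns floor(log2(pages)); an order >= num_buckets means the range
// belongs in _free_huge rather than in the _free array.
unsigned order_of(size_t bytes) {
    size_t pages = bytes / page_size;   // page ranges are whole 4K pages
    unsigned order = 0;
    while ((size_t(1) << (order + 1)) <= pages) {
        ++order;
    }
    return order;
}

For instance, order_of(256 * 1024 * 1024) returns 16, so a 256MB range would land in _free_huge, while any smaller range falls into one of the 16 lists.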

For example, given 100MB from the host on QEMU, OSv would find 3 memory ranges: a small one of ~640KB in low memory, a medium one of 1MB located in the 2nd MB, and the largest one starting wherever loader.elf ends (at roughly a 9.5MB offset) and ending at 100MB. With OSv running with 100MB of RAM and gdb paused right after arch_setup_free_memory(), free_page_ranges looks like this:

(gdb) osv heap
0xffff800000001000 0x000000000009e000 // Lower RAM < 640KB 
0xffff800000100000 0x0000000000100000 // 2nd MB - ends right below kernel
0xffff800000950000 0x0000000005a8e000 // Starts right above the kernel

For more details on how memory is managed and set up at the lowest level, please read Managing Memory Pages.

High-level Layer

From this point on, OSv is ready to handle the "malloc/free" family and memory::alloc_page()/free_page() calls by drawing/releasing memory from/to free_page_ranges in the form of page_range objects (see the methods page_range_allocator::alloc(), alloc_aligned() and free()) and mapping them to virtual address ranges. However, until much later when SMP is enabled (multiple vCPUs are fully activated), allocations are handled at a different granularity than after SMP is on. In addition, in the first phase (pre-SMP), allocations draw pages directly from the free_page_ranges object, whereas after SMP is enabled they draw memory from the L1/L2 pools. There are as many L1 pools as vCPUs (a per-CPU construct) and a single global L2 pool - global_l2. The L1 pools draw pages from the global L2 pool, which in turn draws page ranges from free_page_ranges. Both L1 and L2 pools operate at page-size granularity and implement a low/high watermark algorithm (for example, L1 pools keep at least 128 pages of memory available).

TODO: Describe L1 and L2 in more detail
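
As a rough illustration of the watermark scheme, here is a self-contained sketch; the helper names and the high watermark value are invented for illustration (only the 128-page low watermark comes from the text above), and the real logic lives in core/mempool.cc:

#include <cstddef>
#include <cstdlib>
#include <deque>

// Hypothetical model of a per-CPU L1 pool. refill_from_l2()/drain_to_l2()
// stand in for the batched exchanges with the global L2 pool (global_l2);
// they are stubbed with malloc/free so the model runs standalone.
struct l1_pool_model {
    static constexpr size_t low  = 128;  // "keep at least 128 pages" (from the text)
    static constexpr size_t high = 512;  // illustrative only
    std::deque<void*> pages;             // cached free 4K pages

    void* alloc_page() {
        if (pages.size() < low) refill_from_l2();
        void* p = pages.back();
        pages.pop_back();
        return p;
    }
    void free_page(void* p) {
        pages.push_back(p);
        if (pages.size() > high) drain_to_l2();
    }
private:
    void refill_from_l2() {              // batch-fetch up to the midpoint
        while (pages.size() < (low + high) / 2) pages.push_back(std::malloc(4096));
    }
    void drain_to_l2() {                 // batch-return down to the midpoint
        while (pages.size() > (low + high) / 2) { std::free(pages.back()); pages.pop_back(); }
    }
};

Keeping a cushion of pages per CPU means the common-case alloc_page()/free_page() never needs to touch the shared free_page_ranges structure, which is the point of the two-level pool design.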

It is also worth noting that most malloc functions (except for malloc_large) end up calling std_malloc(), which allocates virtual memory in different ways depending on whether we are in pre- or post-SMP mode and on the size of the memory request. The size ranges are:

  • x <= 1024 (page size/4)
  • 1024 < x <= 4096
  • x > 4096

If we are in SMP-enabled mode and the requested size is less than or equal to 1024 bytes, the allocation is delegated to malloc pools. Malloc pools are set up per CPU, each dedicated to a specific size range (2^(k-1) < x <= 2^k, where k is less than or equal to 10). The way std_malloc() handles allocations <= 4K directly impacts how well the underlying physical memory is utilized. For example, any request above 1024 bytes will use a whole page and in the worst case waste almost 3K of physical memory. Similarly, malloc pool allocations in the worst case may waste just under half of the 2^k segment size.
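
A hedged sketch of this routing - the three thresholds come from the list above, while the enum and function names are ours, not OSv's API:

#include <cstddef>
#include <cstdio>

// Illustrative routing only; mirrors the three size ranges listed above.
enum class route { pool, whole_page, large };

route classify(size_t size) {
    if (size <= 1024) return route::pool;        // per-CPU malloc pool
    if (size <= 4096) return route::whole_page;  // whole 4K page, up to ~3K wasted
    return route::large;                         // malloc_large -> free_page_ranges
}

// Picks the malloc pool bucket k such that 2^(k-1) < size <= 2^k (k <= 10).
unsigned pool_index(size_t size) {
    unsigned k = 0;
    while ((size_t(1) << k) < size) ++k;
    return k;
}

int main() {
    std::printf("route(100)=%d pool_index(100)=%u\n",
                static_cast<int>(classify(100)), pool_index(100));
    return 0;
}

For example, a 100-byte request goes to the k=7 pool (64 < 100 <= 128), wasting 28 bytes of its 128-byte segment.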

TODO: Describe how exactly pre-SMP and post-SMP memory allocation differs.

The malloc_large()/free_large() calls draw memory directly from free_page_ranges in both the pre- and post-SMP phases.

Mapping

  • Page tables
  • Code in mmu.cc
 ------  0x 0000 0000 0000 0000
 |    |
 ------  0x 0000 0040 0020 0000  elf_start      --\
 |    |                                           |- Kernel (Core ELF) - < 8MB
 ------  0x 0000 0040 00a0 0000  elf_start + elf_size
 |    |
 ------  0x 0000 1000 0000 0000  program_base (16T) --\
 |    |                                           |- s_program - 8G
 |----|  0x 0000 1002 0000 0000  --\             --X
 |    |                             |- Program     |
 |----|  0x 0000 1004 0000 0000  --/               |
 |    |                                            |- ELF Namespaces (max: 32) - 256G
 |    |  ......................                    |
 |    |                                            |
 |----|  0x 0000 1042 0000 0000                  --/
 |    |
 |----|  0x 0000 2000 0000 0000  VMAs start     --\
 |    |                                           |- VMAs (mmap)
 ------  0x 0000 8000 0000 0000  VMAs end       --/
 |    |
 ------  0x ffff 8000 0000 0000  phys_mem       --\
 |    |                                           |- Main Area - 16T
 ------  0x ffff 9000 0000 0000                 --X
 |    |                                           |- Page Area - 16T
 ------  0x ffff a000 0000 0000                 --X
 |    |                                           |- Mempool Area - 16T
 ------  0x ffff b000 0000 0000                 --X
 |    |                                           |- Debug Area - 80T
 ------  0x ffff ffff ffff ffff                 --/
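
Since the Main Area above is a linear mapping of physical memory starting at phys_mem, translating between a physical address and its kernel virtual alias is a single offset. A small sketch (the helper names are ours, not OSv's):

#include <cstdint>

// The Main Area maps physical memory linearly at phys_mem, so the
// translation in either direction is one addition/subtraction.
constexpr uint64_t phys_mem = 0xffff800000000000ULL;

inline uint64_t phys_to_virt_model(uint64_t pa) { return pa + phys_mem; }
inline uint64_t virt_to_phys_model(uint64_t va) { return va - phys_mem; }

// Example: physical page 0x1000 appears at 0xffff800000001000 -- the
// first range in the `osv heap` output shown earlier on this page.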