Support emulated NUMA for BCM2711 and BCM2712 #6273

Merged 8 commits on Oct 22, 2024

popcornmix
Collaborator

From: Tvrtko Ursulin [email protected]

This series adds a very simple NUMA emulation implementation and enables
selecting it on arm64 platforms.

Obvious question is why? Short answer - it can bring a significant performance
uplift on Raspberry Pi 5.

Longer answer is that splitting the physical RAM into chunks, and utilising an
allocation policy such as interleaving, can enable the BCM2712 memory controller
to better utilise parallelism in physical memory chip organisation.

In more concrete numbers, testing with Geekbench 6 shows that splitting into four
emulated NUMA nodes can uplift the single core score of the benchmark by around
6%, and the multi-core by around 18%.

Code is quite simple and new functionality can be enabled using the new
NUMA_EMULATION Kconfig option and then at runtime using the existing (shared
with other platforms) numa=fake= kernel boot argument.
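
As a rough illustration of the idea described above, the following toy model splits a physical range into equally sized emulated nodes and hands out pages round-robin, which is the essence of an interleave policy. It is not the kernel's implementation: the function names, the 4 KiB page size, and the contiguous 2 GiB range are assumptions made for the sketch, and real memory maps contain holes and reserved regions that this ignores.

```python
# Toy model of equal-split NUMA emulation plus interleaved allocation.
# All names here are hypothetical; this is not kernel code.
PAGE_SIZE = 4096  # assumed page size for the sketch


def split_into_nodes(base, size, nodes):
    """Split [base, base + size) into `nodes` equally sized inclusive ranges."""
    chunk = size // nodes
    return [(base + i * chunk, base + (i + 1) * chunk - 1) for i in range(nodes)]


def interleave(ranges, pages):
    """Pick a (node, address) pair for each page, cycling through the nodes."""
    next_free = [start for start, _ in ranges]
    placements = []
    for i in range(pages):
        node = i % len(ranges)
        placements.append((node, next_free[node]))
        next_free[node] += PAGE_SIZE
    return placements


if __name__ == "__main__":
    ranges = split_into_nodes(0, 0x80000000, 4)
    for node, addr in interleave(ranges, 8):
        print(f"node {node}: {addr:#010x}")
```

Consecutive allocations land in different chunks of physical memory, which is the spreading effect the cover letter credits for the uplift.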

@popcornmix
Collaborator Author

This PR also allows NUMA to be configured system-wide (without directly using numactl) and enables it by default through bootargs.

@pelwell I'm sure the bootargs change is better in a lower level file.

@pelwell
Contributor

pelwell commented Jul 19, 2024

I'm sure the bootargs change is better in a lower level file.

It isn't possible to put some command line settings in a common file and add to/modify it elsewhere, but if there are some devices that share a common command line then we can put it in a common file and override it in its entirety where necessary.

As it happens, of the BCM2711 devices, only CM4S has a different command line, and BCM2712 devices share the same one.

Is this something you are thinking to enable by default on BCM2711 and BCM2712?

@popcornmix
Collaborator Author

Is this something you are thinking to enable by default on BCM2711 and BCM2712?

Yes - enabled by default on BCM2711 and BCM2712 (ideally on all distributions) is the goal.

@pelwell
Contributor

pelwell commented Jul 23, 2024

Bumped with a minor DT refactor.

@DTEAM-1

DTEAM-1 commented Sep 24, 2024

Quick question. Can I use this script on a pi4 (64-Bit)?

@popcornmix
Collaborator Author

Quick question. Can I use this script on a pi4 (64-Bit)?

There are benefits from NUMA on a Pi4, but not as great as those available on Pi5.
I would only recommend this if you know what you are doing (i.e. can restore/revert back if it doesn't work for you).

It is likely in the coming weeks that we will merge enough (both bootloader and kernel) so NUMA is easy to switch on to experiment with. Following some wider testing it will likely be enabled by default.

@popcornmix popcornmix marked this pull request as ready for review October 18, 2024 18:02
@popcornmix
Collaborator Author

popcornmix commented Oct 18, 2024

I've added a patch to disable numa if the settings are incompatible with cma settings (as otherwise cma heap creation fails and that leads to a hung boot with no output).

If numa is disabled due to cma, you'll see a message like:

[    0.000000] cma: CMA linux,cma [2000000-15ffffff] straddles range [0-fffffff]

and only a single numa region:

[    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
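
To make the "straddles" condition concrete, here is a small sketch of one plausible reading of it: a CMA range straddles a node range when the two overlap but the CMA range is not wholly inside the node. The kernel's actual check is not shown in this thread, so the function name and the exact condition are assumptions; the addresses come from the log line quoted above.

```python
# Hypothetical helper; one possible interpretation of "straddles".
def straddles(cma, node):
    """Return True if the inclusive range `cma` overlaps `node` without fitting inside it."""
    cma_start, cma_end = cma
    node_start, node_end = node
    overlaps = cma_start <= node_end and cma_end >= node_start
    contained = node_start <= cma_start and cma_end <= node_end
    return overlaps and not contained


if __name__ == "__main__":
    cma = (0x2000000, 0x15ffffff)   # from the "cma: CMA linux,cma" message
    node = (0x0, 0xfffffff)         # the emulated node range it was checked against
    print(straddles(cma, node))
```

With the values from the message, the CMA region starts inside the first node but ends beyond it, so under this reading the configuration is rejected and the kernel falls back to a single node.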

If numa is successful, say with 8 nodes, you would expect:

[    0.000000] NUMA: NODE_DATA [mem 0x0fffd2c0-0x0fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1fffd2c0-0x1fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x2fffd2c0-0x2fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x3f7fd2c0-0x3f7fffff]
[    0.000000] NUMA: NODE_DATA [mem 0x4fffd2c0-0x4fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x5fffd2c0-0x5fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x6fffd2c0-0x6fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x7fe892c0-0x7fe8bfff]
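
A quick way to check which case you are in is to count the NODE_DATA lines in the kernel log. The sketch below assumes only the line format quoted above; the function name is hypothetical, and the log text is expected to be supplied by the caller (for example from the output of `dmesg`).

```python
import re

# Matches lines such as: NUMA: NODE_DATA [mem 0x0fffd2c0-0x0fffffff]
NODE_DATA_RE = re.compile(r"NUMA: NODE_DATA \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]")


def count_numa_nodes(log_text):
    """Count NODE_DATA lines in kernel log text (format assumed from the excerpt above)."""
    return len(NODE_DATA_RE.findall(log_text))


if __name__ == "__main__":
    import sys
    # Usage (assumed): dmesg | python3 count_nodes.py
    print(count_numa_nodes(sys.stdin.read()))
```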

The default cmdline adjustment is now numa_policy=interleave for 2711 and 2712, and additionally system_heap.max_order=0 iommu_dma_numa_policy=interleave for just 2712 (due to its better iommu support).

The key setting numa=fake=<n> is not set here, so we will boot with a single numa region and behaviour should be pretty much unchanged from before this PR.

But users can set it themselves, or if they have a new enough bootloader, it will set it optimally, based on whether the sdram is single or dual rank, and what the bank low setting is.

For the automatic setting, you currently need to set rpi-eeprom-config parameter SDRAM_BANKLOW to 3 (recommended for Pi4) or 1 (recommended for Pi5). e.g.

SDRAM_BANKLOW=1

You should then see a numa=fake=<n> setting in /proc/cmdline, and hopefully SDRAM-bandwidth-limited processing tasks will run faster.
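
For anyone scripting this check, a minimal sketch follows. It only looks for a plain numa=fake=<n> argument with a decimal node count, which is the form discussed in this thread; the function names are hypothetical.

```python
import re


def fake_numa_nodes(cmdline):
    """Return n from a numa=fake=<n> argument, or None if it is absent."""
    for arg in cmdline.split():
        match = re.fullmatch(r"numa=fake=(\d+)", arg)
        if match:
            return int(match.group(1))
    return None


def read_fake_numa_nodes(path="/proc/cmdline"):
    """Read the running kernel's command line and extract the node count."""
    with open(path) as f:
        return fake_numa_nodes(f.read())
```

Calling `read_fake_numa_nodes()` on the Pi returns the node count chosen by the user or the bootloader, or `None` when the system booted with a single NUMA region.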

mairacanal and others added 8 commits October 21, 2024 18:24
Add some common code for splitting the memory into N emulated NUMA memory
nodes.

Individual architecture can then enable selecting this option and use the
existing numa=fake=<N> kernel argument to enable it.

Memory is always split into equally sized chunks.

Signed-off-by: Maíra Canal <[email protected]>
Co-developed-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Allow selecting NUMA emulation on arm64.

Signed-off-by: Maíra Canal <[email protected]>
Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Add numa_policy kernel argument to allow overriding the kernel's default
NUMA policy at boot time.

The accepted syntax is identical to that of the tmpfs mpol argument.

Some examples:

 numa_policy=interleave
 numa_policy=interleave=skip-interleave
 numa_policy=bind:0-3,5,7,9-15
 numa_policy=bind=static:1-2

Signed-off-by: Tvrtko Ursulin <[email protected]>
system_heap.max_order=<uint>

Signed-off-by: Tvrtko Ursulin <[email protected]>
... Make sure CMA zones do not straddle the emulated NUMA nodes ...

Signed-off-by: Tvrtko Ursulin <[email protected]>
Allow NUMA and NUMA_EMULATION on 64-bit Pi kernels

Signed-off-by: Dom Cobley <[email protected]>
Most 2711 devices and all 2712 devices share common bootargs (command
lines). Make the common values shared defaults, overriding them where
necessary.

Signed-off-by: Phil Elwell <[email protected]>
The default cmdline adjustment is now numa_policy=interleave for 2711 and 2712,
and additionally system_heap.max_order=0 iommu_dma_numa_policy=interleave for
just 2712 (due to its better iommu support).

The key setting numa=fake=<n> is not set here, so we will boot with a single
numa region and behaviour should be pretty much unchanged from before this PR.

Signed-off-by: Dom Cobley <[email protected]>
@popcornmix
Collaborator Author

@pelwell I think this is ready. Any objections?

@pelwell pelwell merged commit dd44b93 into raspberrypi:rpi-6.6.y Oct 22, 2024
11 of 12 checks passed
@popcornmix popcornmix deleted the rpi-6.6.y-numa branch October 22, 2024 15:11
popcornmix added a commit to raspberrypi/firmware that referenced this pull request Oct 22, 2024
See: raspberrypi/linux#6273

kernel: DRM Writeback connector priority changes
See: raspberrypi/linux#6345

kernel: Input: matrix-keypad - don't map irqs when atomic
See: raspberrypi/linux#6427
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request Oct 22, 2024
See: raspberrypi/linux#6273

kernel: DRM Writeback connector priority changes
See: raspberrypi/linux#6345

kernel: Input: matrix-keypad - don't map irqs when atomic
See: raspberrypi/linux#6427