Automatic cpu pinning pins programs to only 4 cores on Intel 13900 #498

Closed
wereii opened this issue Sep 23, 2024 · 19 comments · Fixed by #515

Comments

@wereii

wereii commented Sep 23, 2024

As the title says, on my machine with an Intel 13900KF processor, automatic core pinning (pin_cores=yes, or commented out, i.e. the default) picks CPUs 8-11, which is just 2 cores with 4 threads in total.

# lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ       MHZ
  0    0      0    0 0:0:0:0          yes 5500.0000 800.0000 1100.0551
  1    0      0    0 0:0:0:0          yes 5500.0000 800.0000  800.0000
  2    0      0    1 4:4:1:0          yes 5500.0000 800.0000 1020.2340
  3    0      0    1 4:4:1:0          yes 5500.0000 800.0000 1023.4670
  4    0      0    2 8:8:2:0          yes 5500.0000 800.0000 1100.0000
  5    0      0    2 8:8:2:0          yes 5500.0000 800.0000  800.0000
  6    0      0    3 12:12:3:0        yes 5500.0000 800.0000 1100.0031
  7    0      0    3 12:12:3:0        yes 5500.0000 800.0000  800.0000
  8    0      0    4 16:16:4:0        yes 5800.0000 800.0000  976.1010
  9    0      0    4 16:16:4:0        yes 5800.0000 800.0000  799.0430
 10    0      0    5 20:20:5:0        yes 5800.0000 800.0000 1404.8610
 11    0      0    5 20:20:5:0        yes 5800.0000 800.0000 1502.6730
 12    0      0    6 24:24:6:0        yes 5500.0000 800.0000  801.1090
 13    0      0    6 24:24:6:0        yes 5500.0000 800.0000  800.0000
 14    0      0    7 28:28:7:0        yes 5500.0000 800.0000 1040.9830
 15    0      0    7 28:28:7:0        yes 5500.0000 800.0000 1099.2321
 16    0      0    8 32:32:8:0        yes 4300.0000 800.0000  800.0090
 17    0      0    9 33:33:8:0        yes 4300.0000 800.0000 1991.7040
 18    0      0   10 34:34:8:0        yes 4300.0000 800.0000  860.5860
 19    0      0   11 35:35:8:0        yes 4300.0000 800.0000  800.0000
 20    0      0   12 36:36:9:0        yes 4300.0000 800.0000  800.0350
 21    0      0   13 37:37:9:0        yes 4300.0000 800.0000  800.0000
 22    0      0   14 38:38:9:0        yes 4300.0000 800.0000  800.0000
 23    0      0   15 39:39:9:0        yes 4300.0000 800.0000  800.0000
 24    0      0   16 40:40:10:0       yes 4300.0000 800.0000  800.0000
 25    0      0   17 41:41:10:0       yes 4300.0000 800.0000  800.0000
 26    0      0   18 42:42:10:0       yes 4300.0000 800.0000  800.0000
 27    0      0   19 43:43:10:0       yes 4300.0000 800.0000  800.0000
 28    0      0   20 44:44:11:0       yes 4300.0000 800.0000  800.0000
 29    0      0   21 45:45:11:0       yes 4300.0000 800.0000  800.0000
 30    0      0   22 46:46:11:0       yes 4300.0000 800.0000  800.0000
 31    0      0   23 47:47:11:0       yes 4300.0000 800.0000  800.0000

A quick look through the code and the issues tells me this is currently more or less by design, at least going by the "pick the cores with the highest max frequency" approach. However, because of the rather atypical max frequency spread even among the P-cores on this processor, it pins games to CPUs 8-11 (cores 4 and 5).
In my case, I noticed very low FPS in Helldivers 2: ~70 FPS with auto-pinning instead of ~140 with no pinning.

I can already see that there is a check that refuses to pin to fewer than 4 cores. I guess it's not really feasible to make the automatic pinning algorithm universal across all the possible P/E-core layouts, but one idea would be to also log a warning if auto-pinning selects fewer than X% (let's say 10%) of the total core count?
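
To make the warning idea concrete, here is a minimal sketch of such a check. The function name and the stderr message are hypothetical and not part of gamemode; the 10% threshold is just the value floated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not part of gamemode: warns (and returns true) when
 * the auto-selected set covers less than min_percent of all cores. */
static bool warn_if_pin_set_too_small(unsigned pinned, unsigned total,
                                      unsigned min_percent)
{
	assert(total > 0 && min_percent <= 100);

	/* Cross-multiply so that no floating point or rounding is involved. */
	if ((unsigned long)pinned * 100 >= (unsigned long)total * min_percent)
		return false;

	fprintf(stderr, "WARNING: auto-pinning selected only %u of %u cores\n",
	        pinned, total);
	return true;
}
```

On this machine the call would be warn_if_pin_set_too_small(2, 24, 10), which fires because 2 of 24 cores is below 10%.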

@HenrikHolst
Contributor

HenrikHolst commented Oct 30, 2024

The main question is why only some cores report a max of 5.8 GHz while the other P cores report a max of 5.5 GHz. Are only some cores boostable on the 13900?

edit: OK, so the current code uses 5% as the safety margin for boost, and that is too small here. 10% works fine, but of course the question is whether that is enough for future CPUs or whether we should do this some other way.

Anyway, the quick fix here is to change line 128 in daemon/gamemode-cpu.c from
unsigned long long cutoff = (freq * 5) / 100;
to
unsigned long long cutoff = (freq * 10) / 100;
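
To show why the margin matters for this CPU, here is a simplified model of the selection. It is a sketch and not the actual walk_sysfs() code: the function name is made up, and it assumes the rule is "keep every CPU whose max frequency is within margin_percent of the highest max frequency".

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, not the real gamemode logic: counts the CPUs whose max
 * frequency lies within margin_percent of the highest max frequency. */
static size_t count_cpus_within_margin(const unsigned long long *max_mhz,
                                       size_t n, unsigned margin_percent)
{
	assert(max_mhz != NULL && n > 0 && margin_percent <= 100);

	/* First pass: find the highest max frequency. */
	unsigned long long top = 0;
	for (size_t i = 0; i < n; i++) {
		if (max_mhz[i] > top)
			top = max_mhz[i];
	}

	/* Same integer arithmetic as the cutoff line quoted above. */
	unsigned long long cutoff = (top * margin_percent) / 100;

	/* Second pass: count everything at or above (top - cutoff). */
	size_t kept = 0;
	for (size_t i = 0; i < n; i++) {
		if (max_mhz[i] >= top - cutoff)
			kept++;
	}
	return kept;
}
```

With the lscpu values from this issue, 5% gives a cutoff of 290 MHz, so only the four 5800 MHz CPUs pass (5500 < 5510). With 10% the cutoff is 580 MHz, so all sixteen P-core threads pass while the 4300 MHz E-cores are still excluded.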

@wereii
Author

wereii commented Oct 30, 2024

Thanks for looking into this.

I can't verify whether the max frequencies listed for this CPU are correct, since not even Intel seems to publish these details anywhere, though I think some posts on the Arch forum mentioned that there is variation even between the P-cores. So I am assuming my lscpu output is correct.

@HenrikHolst
Contributor

Oh, I have no doubt that it is correct. I'm only puzzled why Intel didn't add info about which cores are P and which are E, instead of leaving us to use the frequency to try to determine which is which.

@wereii
Author

wereii commented Oct 31, 2024

For what it's worth, in this case all the P-cores are also the only ones with multiple threads, but there indeed does not seem to be a direct indicator for the distinction, at least not in the kernel.

Here is also how inxi shows it:

# inxi -C
CPU:
  Info: 24-core (8-mt/16-st) model: 13th Gen Intel Core i9-13900KF bits: 64
    type: MST AMCP cache: L2: 32 MiB
  Speed (MHz): avg: 1100 min/max: 800/5500:5800:4300
# ...

@HenrikHolst
Contributor

Yeah, the P cores have SMT on the 12900, 13900 and 14900, but the new 245 and 285 have dropped SMT, so there the P cores have 1 thread just like the E cores.

@HenrikHolst
Contributor

Otherwise, an even quicker fix is to simply change pin_cores in gamemode.ini from "yes" to "0-15".

@afayaz

afayaz commented Jan 26, 2025

I found this on Stack Overflow:

https://stackoverflow.com/a/79238548

On Alder Lake (and most likely other hybrid Intel architectures), instead of /sys/devices/cpu there are two directories: /sys/devices/cpu_atom/ and /sys/devices/cpu_core/, the first (cpu_atom) for the E-cores and the second (cpu_core) for the P-cores.

Inside each directory there is a file named cpus that contains the CPU number range.

I don't have access to an Alder Lake or newer CPU so can't verify this right now.
It would also be great to find some official documentation on this.
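
In case it helps, here is a rough sketch of how a program could probe for that file and fall back to something else when it is absent. The helper names are my own and this is not gamemode code; only the sysfs path comes from the answer above.

```c
#include <stdio.h>
#include <string.h>

/* Reads the first line of path into buf, without the trailing newline.
 * Returns 1 on success, 0 if the file is missing, unreadable or empty. */
static int read_first_line(const char *path, char *buf, size_t buflen)
{
	if (buflen == 0)
		return 0;

	FILE *f = fopen(path, "r");
	if (!f)
		return 0;

	char *ok = fgets(buf, (int)buflen, f);
	fclose(f);
	if (!ok)
		return 0;

	buf[strcspn(buf, "\n")] = '\0';
	return buf[0] != '\0';
}

/* Returns 1 and fills buf with the P-core CPU list when the kernel exposes
 * the hybrid sysfs entry, 0 otherwise (non-hybrid CPU or older kernel). */
static int read_pcore_list(char *buf, size_t buflen)
{
	return read_first_line("/sys/devices/cpu_core/cpus", buf, buflen);
}
```

A return value of 0 from read_pcore_list() would be the signal to keep using whatever detection was in place before.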

@wereii
Author

wereii commented Jan 26, 2025

I can confirm that:

> cat /sys/devices/cpu_core/cpus
0-15
> cat /sys/devices/cpu_atom/cpus
16-31
  • Intel 13900kf, 6.13.0-2-cachyos
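
For anyone who wants to consume those values, here is a small sketch of parsing a kernel-style CPU list such as "0-15" or "0-3,8-11". It is a standalone illustration with a made-up function name and a plain byte array as the mask; it is not gamemode's own parse_cpulist().

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Marks every CPU named in list (e.g. "0-15" or "0-3,8-11") in
 * mask[0..mask_len). The caller zeroes mask first. Returns the number of
 * CPUs newly marked, or -1 on malformed or out-of-range input. */
static int parse_cpu_list(const char *list, unsigned char *mask, size_t mask_len)
{
	int count = 0;
	const char *p = list;

	while (*p != '\0' && *p != '\n') {
		char *end;

		if (!isdigit((unsigned char)*p))
			return -1;
		unsigned long from = strtoul(p, &end, 10);
		unsigned long to = from;
		p = end;

		/* An optional "-N" turns the single CPU into a range. */
		if (*p == '-') {
			p++;
			if (!isdigit((unsigned char)*p))
				return -1;
			to = strtoul(p, &end, 10);
			p = end;
		}

		if (to < from || to >= mask_len)
			return -1;

		for (unsigned long cpu = from; cpu <= to; cpu++) {
			if (!mask[cpu]) {
				mask[cpu] = 1;
				count++;
			}
		}

		/* Entries are comma separated; anything else is an error. */
		if (*p == ',')
			p++;
		else if (*p != '\0' && *p != '\n')
			return -1;
	}
	return count;
}
```

Fed the cpu_core value above with a 32-entry mask, this marks CPUs 0 through 15 and returns 16.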

@HenrikHolst
Contributor

Sounds extremely promising, thanks for the info!

@HenrikHolst
Contributor

Made this quick patch; could anyone with Alder Lake check whether it works for you? I also added a small debug log for when the sys path is used, to make sure the patch takes effect when that path is present:

diff --git a/daemon/gamemode-cpu.c b/daemon/gamemode-cpu.c
index f27b254..cfd5416 100644
--- a/daemon/gamemode-cpu.c
+++ b/daemon/gamemode-cpu.c
@@ -73,6 +73,32 @@ static int read_small_file(char *path, char **buf, size_t *buflen)
        return 1;
 }
 
+static int check_pe_cores(char *cpulist, char **buf, size_t *buflen, GameModeCPUInfo *info)
+{
+       long from, to;
+       char *list = cpulist;
+       while ((list = parse_cpulist(list, &from, &to))) {
+               for (long cpu = from; cpu < to + 1; cpu++) {
+                       CPU_SET_S((size_t)cpu, CPU_ALLOC_SIZE(info->num_cpu), info->online);
+               }
+       }
+
+       if (!read_small_file("/sys/devices/cpu_core/cpus", buf, buflen))
+               return 0;
+
+       LOG_MSG("found Alder Lake+ info in the kernel, using that to determine the P cores\n");
+
+       list = *buf;
+
+       while ((list = parse_cpulist(list, &from, &to))) {
+               for (long cpu = from; cpu < to + 1; cpu++) {
+                       CPU_SET_S((size_t)cpu, CPU_ALLOC_SIZE(info->num_cpu), info->to_keep);
+               }
+       }
+
+       return 1;
+}
+
 static int walk_sysfs(char *cpulist, char **buf, size_t *buflen, GameModeCPUInfo *info)
 {
        char path[PATH_MAX];
@@ -125,7 +151,7 @@ static int walk_sysfs(char *cpulist, char **buf, size_t *buflen, GameModeCPUInfo
                        if (ret > 0 && ret < PATH_MAX) {
                                if (read_small_file(path, buf, buflen)) {
                                        unsigned long long freq = strtoull(*buf, NULL, 10);
-                                       unsigned long long cutoff = (freq * 5) / 100;
+                                       unsigned long long cutoff = (freq * 10) / 100;
 
                                        if (freq > max_freq) {
                                                if (max_freq < freq - cutoff)
@@ -278,8 +304,9 @@ int game_mode_initialise_cpu(GameModeConfig *config, GameModeCPUInfo **info)
        } else if (park_or_pin == IS_CPU_PIN && pin_cores[0] != '\0') {
                if (!walk_string(buf, pin_cores, new_info))
                        goto error_exit;
-       } else if (!walk_sysfs(buf, &buf2, &buf2len, new_info)) {
-               goto error_exit;
+       } else if (!check_pe_cores(buf, &buf2, &buf2len, new_info)) {
+               if (!walk_sysfs(buf, &buf2, &buf2len, new_info))
+                       goto error_exit;
        }
 
        if (park_or_pin == IS_CPU_PARK &&

@wereii
Author

wereii commented Jan 26, 2025

I am getting errors applying that patch on top of current master or 1.8.2:

patch -p1 < ../../quick.patch
patching file daemon/gamemode-cpu.c
Hunk #1 succeeded at 177 with fuzz 1 (offset 104 lines).
Hunk #2 FAILED at 151.
Hunk #3 FAILED at 304.
2 out of 3 hunks FAILED -- saving rejects to file daemon/gamemode-cpu.c.rej

(Am I applying it wrong?)

@HenrikHolst
Contributor

Strange. I simply did a "git pull", made the changes and then did a "git diff"; a "git diff" right after the pull was empty, so I have no uncommitted changes lingering. Let me try applying the patch myself to a new clone to see what happens.

@HenrikHolst
Contributor

henrik@Sineya:~$ git clone https://github.com/FeralInteractive/gamemode.git
henrik@Sineya:~$ cd gamemode/
henrik@Sineya:~/gamemode$ patch -p1 < ../gamemode-cpu.c.patch 
patching file daemon/gamemode-cpu.c

So that worked out of the box. I suspect that you perhaps have local changes?!

@wereii
Author

wereii commented Jan 26, 2025

Tried again with a fresh clone of the gamemode repo in a different location; it still fails. GitHub's code block might be doing something weird to it? I have to select, copy and paste it into a patch file, so it might be better to share it as a file. (A quick option could be termbin, e.g. command | nc termbin.com 9999. PS: I keep that one as the alias pasta :).)
E: or, well, just drop the >file as an attachment haha

@HenrikHolst
Contributor

new try with termbin: https://termbin.com/291v

@wereii
Author

wereii commented Jan 26, 2025

That worked! compiling and testing in a bit

@wereii
Author

wereii commented Jan 26, 2025

Works for me:)

> gamemoded
v1.8.2
Loading config file [/usr/share/gamemode/gamemode.ini]
Loading config file [/etc/gamemode.ini]
found Alder Lake+ info in the kernel, using that to determine the P cores
Successfully initialised bus with name [com.feralinteractive.GameMode]...
Adding game: 452143 [/usr/bin/glxgears]
Entering Game Mode...
governor was initially set to [powersave]
Requesting update of governor policy to performance
ERROR: Failed to call Inhibit on org.freedesktop.ScreenSaver: No route to host
	org.freedesktop.DBus.Error.ServiceUnknown
	The name is not activatable
Setting ioprio value...
ERROR: Skipping ioprio on client [452143,452143]: ioprio was (0) but we expected (4)
Pinning process...
> taskset -pc `pidof glxgears`
pid 452143's current affinity list: 0-15

Ignore the ScreenSaver errors, my locker setup sadly does not listen on dbus.
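
If someone wants to do the same check from code rather than with taskset, here is a sketch that reads the Cpus_allowed_list field from /proc/<pid>/status. The helper names are made up for illustration and none of this is gamemode code.

```c
#include <stdio.h>
#include <string.h>

/* Scans "Key:\tvalue" lines (the /proc/<pid>/status layout) for key and
 * copies its value into out. Returns 1 if the key was found, 0 otherwise. */
static int find_status_field(FILE *f, const char *key, char *out, size_t outlen)
{
	char line[512];
	size_t keylen = strlen(key);

	if (outlen == 0)
		return 0;

	while (fgets(line, sizeof(line), f)) {
		/* Require the colon so "Cpus_allowed" does not match
		 * "Cpus_allowed_list" or the other way round. */
		if (strncmp(line, key, keylen) != 0 || line[keylen] != ':')
			continue;

		const char *val = line + keylen + 1;
		val += strspn(val, " \t");

		size_t n = strcspn(val, "\n");
		if (n >= outlen)
			n = outlen - 1;
		memcpy(out, val, n);
		out[n] = '\0';
		return 1;
	}
	return 0;
}

/* Looks up the allowed-CPU list of a running process, e.g. "0-15". */
static int read_allowed_cpu_list(int pid, char *out, size_t outlen)
{
	char path[64];
	snprintf(path, sizeof(path), "/proc/%d/status", pid);

	FILE *f = fopen(path, "r");
	if (!f)
		return 0;

	int found = find_status_field(f, "Cpus_allowed_list", out, outlen);
	fclose(f);
	return found;
}
```

For the glxgears process in the log above, read_allowed_cpu_list(452143, ...) would be expected to yield the same list that taskset printed.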

@HenrikHolst
Contributor

Great to hear. I will think some more about cleaning up the logging and create an MR ASAP.

HenrikHolst added a commit to HenrikHolst/gamemode that referenced this issue Jan 26, 2025
@wereii found a proper kernel way to check for the presence of P- and E-cores so we don't have to check the frequency differences.

Kept the frequency check in case the kernel way does not exist on the user's kernel, but raised the fail-safe from 5% to 10% to close FeralInteractive#498

@afayaz

afayaz commented Jan 27, 2025

Thanks for that!

For reference, the manpage for perf-stat was the closest thing to official documentation that I could find:

https://www.man7.org/linux/man-pages/man1/perf-stat.1.html#INTEL_HYBRID_SUPPORT
