Hyperthreading breaks cluster-hosts file syntax, causes 'error parsing hostfile' #46

Open
geerlingguy opened this issue Nov 2, 2024 · 0 comments

Over in #45, @CraftComputing ran into some errors running this playbook on an Intel Granite Rapids Xeon 6980P system.

That system has 2x Intel Xeon 6980P 128-Core / 256-Thread CPUs, for a total of 256 cores and 512 threads when Hyperthreading is enabled.

The current code that generates the cluster-hosts file uses ansible_processor_vcpus to indicate the number of slots available on the machine, which in this case would output 512:

{{ hostvars[host].ansible_default_ipv4.address }}:{{ hostvars[host].ansible_processor_vcpus }}

There are two things I could do to resolve the issue:

  1. Drop the slots part out of the cluster-hosts file entirely, and rely on mpirun choosing the correct number of cores.
  2. Switch to using ansible_processor_cores or maybe ansible_processor_nproc?

The latter option could also be complicated on multi-CPU systems, because I might need to do some math to get the total number of cores right (for example, multiplying the per-socket core count by the number of sockets)...
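As a rough sketch of how the two options might look in the hostfile template, assuming the line sits inside the same loop over `host` as the current one: the first variant just drops the slot count, and the second multiplies `ansible_processor_count` (sockets) by `ansible_processor_cores` (cores per socket). This assumes both facts are populated correctly for the host, which I have not verified on every platform.

```jinja
{# Option 1: no slot count, so mpirun decides how many processes to place. #}
{{ hostvars[host].ansible_default_ipv4.address }}

{# Option 2: physical cores = sockets x cores per socket.
   Assumes both facts are reported accurately for this host. #}
{{ hostvars[host].ansible_default_ipv4.address }}:{{ hostvars[host].ansible_processor_count * hostvars[host].ansible_processor_cores }}
```

On the dual 6980P system, option 2 should come out as 2 x 128 = 256, provided the facts report 2 sockets and 128 cores per socket.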

In the end, the simplest thing may be to document that "if you have Hyperthreading enabled, override variable XYZ to specify the total number of cores on your system"... something like that. And then use a variable that defaults to ansible_processor_vcpus but can be overridden.
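A minimal sketch of the overridable default, using `hostfile_slots` as a purely hypothetical variable name (nothing by that name exists in the playbook today) and assuming it would be set per host in the inventory:

```jinja
{# hostfile_slots is a hypothetical, user-supplied override.
   When it is unset, this falls back to the current behavior (ansible_processor_vcpus). #}
{{ hostvars[host].ansible_default_ipv4.address }}:{{ hostvars[host].hostfile_slots | default(hostvars[host].ansible_processor_vcpus) }}
```

With something like this, a Hyperthreading system could set `hostfile_slots` to 256 for that host, and everything else would keep working as it does now.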
