
get_cpus_per_task() assigns 1 to cpus_per_task when not specified #27

therealjpetereit opened this issue Jan 7, 2025 · 9 comments


@therealjpetereit

Hello and merry 2025 🎉🙌

I am using the excellent Snakemake workflow automation on our SLURM HPC cluster in Western Australia.

As with most HPC clusters, there are specific rules on which job submissions are allowed and which are not.

For us, there is a rule that if I request a GPU node, I cannot set cpus_per_task=x.

The problem I am encountering is that when I omit cpus_per_task=x from my config.yaml or snakefile resources, the slurm executor
plugin will automatically set cpus_per_task=1 as defined by the get_cpus_per_task() function.
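
To illustrate, the behaviour I am describing boils down to something like this (paraphrased from what I observe, not the plugin's actual code):

# paraphrase of the observed behaviour, not the plugin's real source
def get_cpus_per_task(job):
    # falls back to the rule's threads, and ultimately to 1,
    # so sbatch always receives an explicit --cpus-per-task
    cpus = job.resources.get("cpus_per_task") or job.threads or 1
    return max(1, cpus)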

Is there a way to let SLURM handle this instead of the plugin setting it to 1?

Let's say, if I set cpus_per_task='default', the plugin would simply omit setting a cpus_per_task value.

Cheers
Jakob

@therealjpetereit therealjpetereit changed the title get_cpus_per_task() assings 1 to cpus_per_task when not specified get_cpus_per_task() assigns 1 to cpus_per_task when not specified Jan 7, 2025
@cmeesters
Member

Thank you for bringing this issue to my attention. Interesting. Can you use --cpus-per-gpu instead?

I will polish the PR on GRES (see snakemake/snakemake-executor-plugin-slurm#173 ) in the coming days and wonder whether it would help to code "if a GPU is selected, set --cpus-per-gpu" instead. Of course, the idea is to provide a solution that works everywhere. It appears to be a rather futile task. My fellow admins show quite some imagination when deviating from standards, and SchedMD (the company behind SLURM) occasionally has funny ideas, too.

@therealjpetereit
Author

Hi,

"Can you use --cpus-per-gpu instead?"

That depends on how it's implemented. The way they use it in the cluster config is that cpus_per_task is red-flagged if the gpu partition is used. The system has set cpus_per_gpu=8 in our case, but that number is arbitrary and could be anything in the future.

So my recommendation would be to make cpus_per_task in the Snakemake scheduler accept a 'none' argument, so that the line where cpus_per_task is added to the sbatch command is omitted.
I saw that in one of the helper scripts.

I'll track that down and add it as an extra comment.

"the idea is to provide a solution that works everywhere"

here's hoping 😂😅

Cheers
J

@therealjpetereit
Author

Ok,

so I think if you look at lib/python3.12/site-packages/snakemake_executor_plugin_slurm/__init__.py

it has, in line 178:
call += f" --cpus-per-task={get_cpus_per_task(job)}"

That line could be guarded by an if statement that skips it when cpus_per_task='none'.
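
Just to sketch what I mean (untested, and the sentinel value is only a suggestion):

# sketch: skip the flag when the user explicitly opts out via cpus_per_task='none'
cpus_per_task = job.resources.get("cpus_per_task")
if cpus_per_task != "none":
    call += f" --cpus-per-task={get_cpus_per_task(job)}"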

Maybe you have a better idea, this is just what came to my mind.

Thanks for the prompt responses 🙌

@cmeesters
Member

My idea would be to still have a default; 1 CPU is fine for most GPU tasks. So, get_cpus_per_task(job) should be:

  1. renamed
  2. return --cpus-per-gpu=<value> if a GPU is selected
  3. else return --cpus-per-task=<value>

where <value> is equal to the threads entry of a resources section or the threads directive of a rule, and will take cpus_per_task or cpus_per_gpu as a fallback.
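
In rough pseudocode, it could look like this (the helper and resource names are only placeholders until the GRES PR settles them):

# sketch only; helper name and the gpu resource key are placeholders
def get_cpu_request_string(job):
    # threads of the rule, falling back to an explicit resource, else 1
    value = job.threads or job.resources.get("cpus_per_task", 1)
    if job.resources.get("gpu"):  # however the GPU/GRES request ends up being named
        return f" --cpus-per-gpu={value}"
    return f" --cpus-per-task={value}"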

Does this make sense to you? And would it work on your cluster?

The idea is to keep the workflow itself generic. If 8 is the upper value for now, that is a non-issue. You might be using a multi-GPU + multi-CPU application (in the future), where this proposed layout might come in handy. For now, it will not hurt, or will it?

BTW, to me there is no difference between --cpus-per-task and --cpus-per-gpu, because a GPU task is a process like any other with regard to scheduling (SLURM is rather ignorant of the offloading to GPUs). The difference might be imposed because otherwise admins would need to implement a plugin to deny CPU-only jobs in GPU partitions. But that is only my take.

@therealjpetereit
Author

therealjpetereit commented Jan 7, 2025

Hi,

I just tried whether --cpus-per-gpu works, but sadly it doesn't. I assume it still sets cpus_per_task in the background.

sbatch --export=ALL -A 'XXX-gpu' -p gpu -t 600 --mem 1000 --ntasks=1 --cpus-per-task=5 --gpus=1 test.sh
sbatch: error: lua: cli_filter: cannot explicitly request CPU resources for GPU allocation; each allocated GPU allocates 8 cores
sbatch: error: cli_filter plugin terminated with error

sbatch --export=ALL -A 'XXX-gpu' -p gpu -t 600 --mem 1000 --ntasks=1 --cpus-per-gpu=5 --gpus=1 test.sh
sbatch: error: lua: cli_filter: cannot explicitly request CPU resources for GPU allocation; each allocated GPU allocates 8 cores
sbatch: error: cli_filter plugin terminated with error

Your logic is sound; sadly, the coding of the cli_filter doesn't allow it in my specific case.
Our cluster basically takes the --gpus=x and then sets the value for cpus itself, which collides with setting any cpus explicitly.

I did tell the HPC staff that this is an issue, but alas I haven't heard back from them.

@cmeesters
Member

"cannot explicitly request CPU resources for GPU allocation; each allocated GPU allocates 8 cores"

Ah, this is what you meant. My bad. It still doesn't make sense, though. Even if you get assigned n CPUs, you should not be able to oversubscribe, but why forbid explicitly underusing a resource that otherwise does not get used anyway? (Don't tell me that a significant fraction of GPU applications will use exactly 8 CPUs! ;-)

It appears your admin team wrote a filter script. Now, I could change point 2 in my list to simply return no string if only one CPU is selected or none at all. Either way, I do not want users to have to write None explicitly.
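
Roughly, and again just a sketch rather than the final code:

# sketch: emit no CPU flag at all for GPU jobs with at most one CPU requested
def get_cpu_request_string(job):
    value = job.threads or job.resources.get("cpus_per_task", 1)
    if job.resources.get("gpu"):  # placeholder for the eventual GPU resource name
        return "" if value <= 1 else f" --cpus-per-gpu={value}"
    return f" --cpus-per-task={value}"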

@therealjpetereit
Author

Hi,

Yes, you are correct; it's a bit of a silly filter.

That solution should work for me; maybe also return no string if cpus are set to zero?
Anyway, I am happy with any solution =)

Cheers
J

@therealjpetereit
Author

Hi again,

I just came across the same situation for memory. Once you have a solution for cpus_per_task, would you mind applying the same to --mem?

Cheers
J

@cmeesters
Member

I'm nearing completion. Hopefully, a fix to all our liking will be released next week.

Yet, I do not understand your last comment. We have:

if job.resources.get("mem_mb_per_cpu"):
      call += f" --mem-per-cpu {job.resources.mem_mb_per_cpu}"
elif job.resources.get("mem_mb"):
      call += f" --mem {job.resources.mem_mb}"
else:
      self.logger.warning(
          "No job memory information ('mem_mb' or 'mem_mb_per_cpu') is given "
          "- submitting without. This might or might not work on your cluster."
      )

(Hopefully, I got the indentation right - see lines 235 et seqq.; it ought to be correct there.)

Anyway, there is no default setting to be omitted. These parameters are just set if required by a user.

Cheers
Christian
