Failure to parse workflow config when setting slurm_extra based on attempt #208

blaiseli · 2025-01-27T09:36:10Z

Software Versions

$ snakemake --version
8.27.1
$ pip list | grep "snakemake-executor-plugin-slurm"
snakemake-executor-plugin-slurm           0.15.0
snakemake-executor-plugin-slurm-jobstep   0.2.1
$ sinfo --version
slurm 23.02.6

Describe the bug

When I try to set resources using an <v1> if attempt == 1 else <v2> expression in the workflow config.yaml, this works with integers, but not with strings.

Example:

set-resources:
    my_rule:
        mem_mb: 16384 if attempt == 1 else (32768 if attempt == 2 else (49152 if attempt == 3 else 65536))
        runtime: 119 if attempt == 1 else 1439
        cpus_per_task: 32 if attempt == 1 else 12
        # This fails
        slurm_extra: "'--qos=fast'" if attempt == 1 else "'--qos=normal'"
        # This works:
        # slurm_extra: "'--qos=fast'"

Logs

When trying to set up slurm_extra as above, I get the following kind of error:

slurm_script: error: Couldn't parse config file: while parsing a block mapping
  in "/pasteur/appa/homes/bli/src/nanopore_assembly/src/workflow/profile/config.yaml", line 82, column 9
expected <block end>, but found '<scalar>'
  in "/pasteur/appa/homes/bli/src/nanopore_assembly/src/workflow/profile/config.yaml", line 86, column 37


blaiseli · 2025-01-27T09:50:59Z

The following syntax seems to be parsable:

slurm_extra: '( "--qos=fast" if attempt == 1 else "--qos=normal" )'
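A likely explanation, offered as a reading of the error rather than a confirmed diagnosis: in YAML, a value that starts with a double quote is a quoted scalar that ends at the matching quote, so the trailing `if attempt == 1 else ...` is unexpected text (column 37 in the log is where `if` begins). The integer expressions are unquoted, so they parse as one plain string. Wrapping the whole expression in single quotes likewise gives one string, which can then be evaluated per attempt. The sketch below illustrates that second step with plain `eval`; `evaluate_for_attempt` is a hypothetical helper, not Snakemake's actual code.

```python
# The expression from the profile, as the single string the YAML parser
# returns once the whole value is wrapped in single quotes.
SLURM_EXTRA_EXPR = '( "--qos=fast" if attempt == 1 else "--qos=normal" )'


def evaluate_for_attempt(expr: str, attempt: int) -> str:
    """Evaluate a resource expression with `attempt` in scope (illustrative only)."""
    # Empty __builtins__ so the expression can only see `attempt`.
    return eval(expr, {"__builtins__": {}}, {"attempt": attempt})


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        # Prints --qos=fast for attempt 1 and --qos=normal afterwards.
        print(attempt, evaluate_for_attempt(SLURM_EXTRA_EXPR, attempt))
```

Note that the evaluated value here has no inner single quotes, unlike the `"'--qos=fast'"` form used in the original profile.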

cmeesters · 2025-01-28T12:43:36Z

Ah, thank you for this feedback. I will update the docs accordingly - and perhaps open a PR to allow direct qos settings.

May I ask: this qos selection seems like something you ought to place in different partitions rather than choosing qos features. Can you offer a documentation link? I am trying to make the plugin as generic as possible. It's hard. My fellow admins show quite some imagination when it comes to (mis)configuring SLURM.

blaiseli · 2025-01-29T17:33:44Z

> May I ask: this qos selection seems like something you ought to place in different partitions rather than choosing qos features. Can you offer a documentation link?

I'm not sure I understand your question.

The documentation of our institution's cluster has restricted access, and I'm not even sure where to find what... Here are some explanations:

On our institution's cluster, different partitions have access to different sets of QOS. A given QOS allows a certain maximum runtime. So if I notice that a job tends to time out, I ask for a longer runtime on later attempts, and I need to adjust the QOS accordingly using slurm_extra.

On top of that, a given user has access to a certain set of partitions. I write snakemake workflows that may / will have to be run by other users, with a more restricted set of partitions than me.

Concretely, I run some of my tests using the "common" partition, because I know that those users will have access to it. And the "common" partition has access to "ultrafast", "fast", and "normal" QOSes, on which jobs can request up to 5 minutes, 2 hours, and 1 day respectively.

One of my rules tends to run in about 2h, sometimes less, sometimes more, so I make a first attempt with 119 minutes on qos fast, then 1439 minutes on qos normal (I reserve 1 minute in order to be able to group the rule with another short-running downstream one). At the same time, I also adjust the memory, because the job sometimes gets OOM-killed (possibly depending on the size of the input data), and I try to reduce the time spent in the queue by reducing the number of requested processors. Hence the weird workflow profile...
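The escalation scheme described above can be sketched as a small Python function. This is only an illustration under stated assumptions: `QOS_LIMITS_MIN`, `pick_qos` and `resources_for` are hypothetical names, the limits are the ones quoted for the "common" partition, and the values mirror the profile from the first comment.

```python
# QOS runtime limits in minutes, as described for the "common" partition.
QOS_LIMITS_MIN = {"ultrafast": 5, "fast": 120, "normal": 1440}


def pick_qos(runtime_min: int) -> str:
    """Return the shortest QOS whose limit covers the requested runtime."""
    for name, limit in sorted(QOS_LIMITS_MIN.items(), key=lambda kv: kv[1]):
        if runtime_min <= limit:
            return name
    raise ValueError(f"no QOS allows {runtime_min} minutes")


def resources_for(attempt: int) -> dict:
    """Mirror the per-attempt values from the profile (attempt starts at 1)."""
    runtime = 119 if attempt == 1 else 1439
    return {
        # 16384, 32768, 49152, then capped at 65536 from the fourth attempt on.
        "mem_mb": min(16384 * attempt, 65536),
        "runtime": runtime,
        # Fewer CPUs on retries, to shorten the wait in the queue.
        "cpus_per_task": 32 if attempt == 1 else 12,
        # Inner single quotes kept, as in the original slurm_extra value.
        "slurm_extra": f"'--qos={pick_qos(runtime)}'",
    }


if __name__ == "__main__":
    for attempt in (1, 2, 3, 4):
        print(attempt, resources_for(attempt))
```

Deriving the QOS from the requested runtime keeps the two settings consistent when only the runtime is changed.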
