
use resources: mem_gb=3 specs in all Java tools to request -Xmx{snakemake.resources.mem_gb}G #127

Open
dlaehnemann opened this issue Jun 25, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@dlaehnemann
Contributor

Is your feature request related to a problem? Please describe.
Specifying the Java heap size (-Xmx) via params strings makes snakemake unaware of these memory requirements.

Describe the solution you'd like
Specifying it via resources instead lets snakemake take the available memory into account in its scheduling.
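A minimal sketch of the proposed pattern, assuming a Python wrapper script in which Snakemake exposes the rule's resources as snakemake.resources with attribute access, and in which a missing entry raises AttributeError. The helper name xmx_flag and the 3 GB fallback are illustrative only, not part of the wrappers repo:

```python
from types import SimpleNamespace


def xmx_flag(resources, default_gb=3):
    """Build a JVM heap flag such as "-Xmx3G" from a rule's resources.

    `resources` is any object with attribute access (inside a wrapper this
    would be snakemake.resources). If the rule declares no mem_gb, fall back
    to the wrapper's own default, which may differ from tool to tool.
    """
    mem_gb = getattr(resources, "mem_gb", default_gb)
    return f"-Xmx{mem_gb}G"


# Stand-ins for snakemake.resources, for illustration only.
print(xmx_flag(SimpleNamespace(mem_gb=8)))  # -Xmx8G
print(xmx_flag(SimpleNamespace()))          # -Xmx3G
```

A wrapper would then splice the returned string into its shell command instead of a params-supplied -Xmx string, so the amount snakemake schedules with and the amount the JVM receives come from the same mem_gb value.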

@dlaehnemann dlaehnemann added the enhancement New feature or request label Jun 25, 2020
@christopher-schroeder
Contributor

I will try to do that. Is mem_gb=3 just a rough guess? For example, I think SnpEff requires more than 3 GB, but that might also depend on the VCF, doesn't it?

@christopher-schroeder christopher-schroeder self-assigned this Jun 26, 2020
@christopher-schroeder
Contributor

christopher-schroeder commented Jun 26, 2020

Another question: I see the benefits of dealing only with "gb", but it would also be possible to allow other units, like 512m. What if the user wants to set the amount of RAM used to something smaller than 1 GB? I don't think that 0.5gb is allowed (but I haven't tested this). So, without thinking much about it, I would rather use "mem" instead of "mem_gb" and let users choose the unit themselves.

P.S.: Never mind, snakemake would not be able to handle this. Stupid idea.

@christopher-schroeder
Contributor

christopher-schroeder commented Jun 26, 2020

I have a suggestion. The metric unit prefixes T, G, M, k, h and da are independent of the unit itself. So it would be great if snakemake supported them, so that one could write

snakemake -j 1h -mem 512m

which would then be translated internally to 100 and 512 * 1024 * 1024, respectively (probably no one would use "hecto" (h), but anyway). This would also be useful for other resources. With the internal translation to the base unit (bytes, in the case of mem), we would avoid any problems with small units.
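To make the proposal concrete, here is a minimal sketch of the prefix translation as a standalone Python function; snakemake does not provide this, and the name parse_quantity is hypothetical. Assumptions: suffixes are matched case-insensitively so that the JVM-style "512m" reads as mega (in strict SI, lowercase m would mean milli), decimal factors are the default, and binary=True switches k/M/G/T to powers of 1024 for memory:

```python
import re

# Decimal factors for the metric prefixes listed in the proposal (keys lowercased).
SI_FACTORS = {"t": 10**12, "g": 10**9, "m": 10**6, "k": 10**3, "h": 10**2, "da": 10}
# Binary factors as commonly used for memory sizes.
BINARY_FACTORS = {"t": 1024**4, "g": 1024**3, "m": 1024**2, "k": 1024}

# An integer, optionally followed by a prefix; "da" is tried before the one-letter prefixes.
_QUANTITY = re.compile(r"(\d+)(da|[tgmkh])?")


def parse_quantity(text, binary=False):
    """Translate e.g. "1h" or "512m" into a plain integer in the base unit."""
    match = _QUANTITY.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"cannot parse quantity: {text!r}")
    number, prefix = match.groups()
    if prefix is None:
        return int(number)  # no prefix: already in the base unit
    factors = BINARY_FACTORS if binary else SI_FACTORS
    if prefix not in factors:
        raise ValueError(f"prefix {prefix!r} is not supported with binary=True")
    return int(number) * factors[prefix]


print(parse_quantity("1h"))                # 100
print(parse_quantity("512m", binary=True)) # 536870912
```

With binary=True, "512m" comes out as 512 * 1024 * 1024 bytes, which is how the JVM reads -Xmx512m; h and da are rejected in that mode because they have no binary counterpart.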

@dlaehnemann
Contributor Author

The 3 was just to have a number in there instead of some placeholder, so a different default for different tools makes sense. I think the most important point is getting a unified approach to treating memory resources in all Java tools whose bioconda packages allow specifying -Xmx.

Generally, I would think that on most machines and in most settings, specifying integer values of gigabytes should be good enough. Memory is usually sufficient and only a limiting factor for some tools, so generous defaults would usually make sense, to avoid job failures due to limited JVM memory. But I can see the case of a tool that runs on a single core but needs a lot of memory, so that when you run lots of those jobs in parallel, memory does become limiting. The easiest way out would be to specify float values, e.g. --resources mem_gb=0.5, and a quick search of the snakemake code didn't turn up any requirement for a resource to be an int. So this might just work; I guess a quick test of whether snakemake complains would be a start?
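If fractional values like mem_gb=0.5 do pass through snakemake, a wrapper still has to turn them into something -Xmx accepts. A minimal sketch, assuming (as suspected earlier in the thread, not verified here) that the JVM only accepts an integer before the size suffix, and treating one snakemake "GB" as 1024 MB; the function name xmx_flag_from_gb is hypothetical:

```python
import math


def xmx_flag_from_gb(mem_gb):
    """Translate a possibly fractional mem_gb into an integer -Xmx value in megabytes.

    0.5 becomes "-Xmx512m" rather than the (presumably invalid) "-Xmx0.5G".
    """
    if mem_gb <= 0:
        raise ValueError("mem_gb must be positive")
    # Round down so the requested heap never exceeds what was reserved via mem_gb.
    mem_mb = math.floor(mem_gb * 1024)
    if mem_mb < 1:
        raise ValueError(f"mem_gb={mem_gb} is below 1 MB")
    return f"-Xmx{mem_mb}m"


print(xmx_flag_from_gb(0.5))  # -Xmx512m
print(xmx_flag_from_gb(3))    # -Xmx3072m
```

Always emitting megabytes keeps integer mem_gb values working unchanged while also covering the small single-core, many-parallel-jobs case.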

Also, I do see the appeal of a clean solution that allows metric units for memory and introduces a separate memory resource command-line parameter, but I'm not sure whether that would be over-engineering the problem for now. Maybe @johanneskoester can comment on the idea, and if we all agree to pursue it, this discussion would first have to head over to the snakemake repo... :D
