Gridss #15

Open. 16 commits to be merged into base: master.

Conversation

christopher-schroeder
Contributor

@christopher-schroeder christopher-schroeder commented Mar 25, 2020

This implements my gridss calling pipeline, which is supposed to replace delly as the structural variant caller. Gridss only calls breakends, and varlociraptor does not support breakends yet, so I suggest that with this PR we keep both callers side by side at first. I would also like to implement purple and linx for CNV and fusion gene calling; both require gridss. Once varlociraptor has been modified to handle breakends, we should remove delly in a second step.

PS: Don't change the file structure! gridss requires a very specific file structure for the assemble step!!!

@johanneskoester
Contributor

Thanks a lot! Wow, this looks like quite some amount of reverse engineering work. For my understanding: what are the main advantages of breaking gridss into pieces, compared to just running gridss.sh?

@christopher-schroeder
Contributor Author

gridss.sh is already broken into these pieces, but it executes them one after another in bash. Now that each step is a rule, they can be run in parallel. Also, not every step requires the same amount of resources (primarily cores, but we could also measure memory).
The number of threads for each rule is the number of cores I observed that step using on one of our servers. And of course you know the benefits of using snakemake, like recalculating intermediate files and automatically updating downstream results.
Additionally, I noticed that we don't need every metric that is produced, so we can adapt the code to the pipeline's requirements (I haven't done that yet).
Furthermore, I am thinking about replacing samtools with sambamba and introducing pipes to avoid intermediate temp files, for increased performance.
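
To make the parallelism argument concrete, here is a toy sketch in Python. It is not Snakemake's scheduler and not code from this PR; it only compares running steps strictly one after another with a greedy schedule that starts every step whose dependencies are done and whose threads fit into the free cores. The step names, thread counts, runtimes, and the assumption that preprocessing runs once per sample are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    threads: int      # cores the step occupies while running (assumed value)
    minutes: int      # runtime of the step (assumed value)
    deps: tuple = ()  # names of steps that must finish first


def sequential_minutes(steps):
    """Total runtime when the steps run one after another, as in a bash script."""
    return sum(s.minutes for s in steps)


def parallel_minutes(steps, cores):
    """Runtime of a greedy schedule: start every ready step that fits into the free cores."""
    remaining = {s.name: s for s in steps}
    running = []  # (finish_time, step) pairs
    done = set()
    now, free = 0, cores
    while remaining or running:
        for s in list(remaining.values()):
            if free >= s.threads and all(d in done for d in s.deps):
                running.append((now + s.minutes, s))
                free -= s.threads
                del remaining[s.name]
        if not running:
            raise ValueError("unsatisfiable dependencies, or a step needs more cores than available")
        # Jump to the next point in time at which a step finishes.
        now = min(t for t, _ in running)
        for t, s in running:
            if t == now:
                done.add(s.name)
                free += s.threads
        running = [(t, s) for t, s in running if t > now]
    return now


# Hypothetical workflow: two samples are preprocessed independently, then assembled and called jointly.
STEPS = [
    Step("preprocess_tumor", threads=4, minutes=30),
    Step("preprocess_normal", threads=4, minutes=30),
    Step("assemble", threads=8, minutes=60, deps=("preprocess_tumor", "preprocess_normal")),
    Step("call", threads=8, minutes=20, deps=("assemble",)),
]

if __name__ == "__main__":
    print(sequential_minutes(STEPS))         # 140
    print(parallel_minutes(STEPS, cores=8))  # 110
```

With these invented numbers, the two preprocessing steps share the 8 cores and the workflow finishes in 110 instead of 140 minutes. How much is actually gained depends on the real per-step thread counts and on which steps are independent of each other.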

@christopher-schroeder
Contributor Author

I would really appreciate having this PR merged.

@johanneskoester
Contributor

Thanks, makes sense to me. I'd like to hold off on merging until varlociraptor supports gridss output. I hope this will happen within the next two weeks or so. For readability and also testing, I wonder whether we should move this into wrappers first. The commands are really ugly.

@johanneskoester
Contributor

Thinking about it, this will also enable us to work around the weird path restrictions.


2 participants