Hi-C workflow #139
@@ -209,6 +209,10 @@ task bwa_mem {
description: "Read group information for BWA to insert into the header. BWA format: '@RG\tID:foo\tSM:bar'",
group: "common",
}
skip_mate_rescue: "Skip mate rescue"
skip_pairing: "Skip pairing; mate rescue performed unless `skip_mate_rescue` also in use"
split_smallest: "For split alignment, take the alignment with the smallest coordinate as primary"
What's the alternative?
Here is the help text.
-S skip mate rescue
-P skip pairing; mate rescue performed unless -S also in use
-5 for split alignment, take the alignment with the smallest coordinate as primary
`-P` is the only option with an entry in the manual:
-P | In the paired-end mode, perform SW to rescue missing hits only but do not try to find hits that fit a proper pair.
Nothing overly helpful
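For reference, here is a minimal sketch of how the three new inputs could map onto the `bwa mem` flags quoted above. The helper name `bwa_mem_flags` is hypothetical and not part of this PR; only the `-S`, `-P`, and `-5` flags come from the help text.

```python
def bwa_mem_flags(skip_mate_rescue=False, skip_pairing=False, split_smallest=False):
    """Translate the three booleans into bwa mem flags (hypothetical helper)."""
    flags = []
    if skip_mate_rescue:
        flags.append("-S")  # skip mate rescue
    if skip_pairing:
        flags.append("-P")  # skip pairing
    if split_smallest:
        flags.append("-5")  # smallest-coordinate split alignment becomes primary
    return flags


# Assemble the start of a command line with all three options enabled.
print(" ".join(["bwa", "mem"] + bwa_mem_flags(True, True, True)))
```

With all three inputs enabled this prints `bwa mem -S -P -5`; with the defaults no extra flags are added.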
@a-frantz - I've added some additional text, but it relies on Google searches and Q&A threads on online forums. I don't see anything definitive from the `bwa` authors.
The text for `skip_[mate_rescue,pairing]` looks good now! But I still have my original Q: What's the alternative to "take the alignment with the smallest coordinate as primary"?
I assume it's random, since these have the same score (I think).
Another partial review. Still chugging through this 😅
This PR is a doozy. I skimmed some parts. Need to do another closer review of some parts, but this is enough feedback for now
I think this version dir should be 2.31.1-0
If this is named after `bedtools`, why does it have `hic` scripts in it?
Originally, it was just plain `bedtools`, but with the decision to remove the embedded scripts, those needed to get built into some image, and this one depends on `bedtools`. It can go somewhere else, but then we'll need to install `bedtools` into another container.
2.5.4-0
Could we maybe consolidate some of these new Docker images? It looks like we can maybe merge `bedtools`, `hilow`, and `juicertools`?
I thought we were trying to avoid these types of monolithic images? We can do that, but is that the direction we want to go?
Hmm. There's a balance to be found, but I'm not sure what it is yet... Responding to #139 (comment) in this thread to consolidate conversation.
Maybe we can merge `hilow` and `juicertools` (is my understanding correct that they are usually used together?), building that image with `bedtools` and moving the `hic` script which depends on `bedtools` into there? That's a bit "monolithic", but keeps the `bedtools` image "clean" and hopefully isn't too sprawling.
Thoughts on that?
args = get_args()

f=open(args.filter_pairs)
blackID=defaultdict(int)
Looks like this was just a copy+paste, but can we use proper casing (snake_case) in this file? And `blackID` should probably be renamed to `exclude_list` or something similar.
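As a rough illustration of the rename, here is a sketch under stated assumptions: the format of the `filter_pairs` file is not shown in this diff, so the code below assumes the read ID is the first whitespace-delimited field on each line. It also swaps the original `defaultdict(int)` for a plain `set`, which only fits if the values are used for membership checks. `load_exclude_ids` is a hypothetical name.

```python
import io


def load_exclude_ids(handle):
    """Collect read IDs to exclude.

    Assumption: the ID is the first whitespace-delimited field on each line.
    """
    exclude_ids = set()
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            exclude_ids.add(fields[0])
    return exclude_ids


# In the script this would read from the real file, e.g.
#     with open(args.filter_pairs) as handle:
#         exclude_ids = load_exclude_ids(handle)
# An in-memory file stands in for it here.
example = io.StringIO("read1\tchr1\t100\nread2\tchr2\t200\n\n")
print(sorted(load_exclude_ids(example)))
```

Using a context manager in the real script also closes the file handle, which the bare `f=open(...)` does not.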
Should we incorporate some Python tooling? A linter of some sort, formatter, etc? Don't really want to ask you to rewrite all these py scripts which appear to be mostly copy+paste, but also they aren't really conformant to any Python standards...
}
}

task fastq_to_sam {
- task fastq_to_sam {
+ task fastq_to_ubam {
???
}

Float fastq_size = size(read_one_fastq_gz, "GiB") + size(read_two_fastq_gz, "GiB")
Int disk_size_gb = ceil(fastq_size * 2) + 10 + modify_disk_size_gb
Is this calculation solid? I'm not really sure what to expect for the size of a uBAM.
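To make the arithmetic concrete, the same formula can be reproduced in Python. This only mirrors the WDL expression above; it says nothing about whether a 2x multiplier is the right estimate for uBAM size, and the input sizes are made-up example values.

```python
import math


def disk_size_gb(read_one_gib, read_two_gib, modify_disk_size_gb=0):
    """Mirror of the WDL expression: ceil(fastq_size * 2) + 10 + modify_disk_size_gb."""
    fastq_size = read_one_gib + read_two_gib
    return math.ceil(fastq_size * 2) + 10 + modify_disk_size_gb


# Hypothetical inputs: 12.3 GiB and 12.9 GiB of gzipped FASTQ.
print(disk_size_gb(12.3, 12.9))
```

For those example inputs the combined size is 25.2 GiB, doubled and rounded up to 51, plus the 10 GB buffer, giving 61 GB.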
@@ -1327,7 +1327,90 @@ task faidx {
cpu: 1
memory: "4 GB"
disks: "~{disk_size_gb} GB"
- container: "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
+ container: "quay.io/biocontainers/samtools:1.19.2--h50ea8bc_0"
Does this need to be updated for the rest of the file? I thought you added a lint or a CI or something to catch these cases...
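A check along these lines could look like the sketch below. It is not the repository's actual lint or CI step (that is not shown here); `find_mixed_tags` is a hypothetical helper that assumes `container: "image:tag"` lines with no colon in the image name, so a registry with an explicit port would need a different pattern.

```python
import re
from collections import defaultdict

# Assumption: container lines look like `container: "image:tag"`.
CONTAINER_RE = re.compile(r'container:\s*"([^":]+):([^"]+)"')


def find_mixed_tags(wdl_text):
    """Return images that appear with more than one tag in the given WDL text."""
    tags_by_image = defaultdict(set)
    for image, tag in CONTAINER_RE.findall(wdl_text):
        tags_by_image[image].add(tag)
    return {
        image: sorted(tags)
        for image, tags in tags_by_image.items()
        if len(tags) > 1
    }


sample = '''
container: "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"
container: "quay.io/biocontainers/samtools:1.19.2--h50ea8bc_0"
'''
print(find_mixed_tags(sample))
```

Running it over a single file's text would flag `samtools` here because both the `1.17` and `1.19.2` tags are present.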
external_help: "https://www.htslib.org/doc/samtools-sort.html",
}
prefix: "Prefix for the output file. The extension `.bam` will be added."
uncompressed: "Output uncompressed BAM?"
I think until streaming comes to WDL, we should probably avoid allowing uncompressed output
call bowtie2.build { input:
reference = reference_download.downloaded_file,
prefix = basename(reference_fa_name, ".fa.gz"),
ncpu = 10,
This probably shouldn't be hardcoded here.
Implements a workflow to generate a bowtie2-aligned BAM and a `.hic` file for analysis. This utilizes the commonly used HiC-Pro workflow.