Hi-C workflow #139

Open
wants to merge 124 commits into
base: main
Changes from 102 commits
Commits
c8b17b2
feat: hi-c initial commit
adthrasher Mar 22, 2024
55f09d8
clean up check errors
adthrasher Mar 22, 2024
0a025a0
Add Hi-C workflows. Split ReadGroup struct into separate file.
adthrasher Mar 28, 2024
d374a51
Fix check issues
adthrasher Mar 28, 2024
a5cd476
Fixes to Hi-C workflow
adthrasher Apr 4, 2024
6bd01bf
Expose restriction sites option
adthrasher Apr 4, 2024
b531cc5
Doc update for hic-core
adthrasher Apr 5, 2024
1357d55
Rename fastq workflow
adthrasher Apr 5, 2024
0279f9e
First pass at input validation
adthrasher Apr 5, 2024
c27bdf3
syntax fixes
adthrasher Apr 5, 2024
e4dba06
Add Dockerfiles for pairix and juicer
adthrasher Apr 5, 2024
f8a8cf2
Add tests for juicer and pairix
adthrasher Apr 5, 2024
32be249
Update to custom docker images
adthrasher Apr 5, 2024
9567e04
String? -> String with defaults
adthrasher Apr 8, 2024
e029ca0
Add download link for restrction site files
adthrasher Apr 8, 2024
182a31a
Fix tests
adthrasher Apr 8, 2024
7dd86e1
chore: merge main
adthrasher Aug 27, 2024
22e7523
Update read_group.wdl
adthrasher Aug 27, 2024
8c76606
chore: rename simple Hi-C workflow
adthrasher Aug 27, 2024
eee143e
feat: Add HiLOW inspired Hi-C workflow, really its just HiC-Pro
adthrasher Aug 27, 2024
06c2240
chore: add bowtie2 Dockerfile
adthrasher Aug 27, 2024
ee0cff8
chore: fix lint issues
adthrasher Aug 27, 2024
118e794
chore: fix lint warnings
adthrasher Aug 27, 2024
39fe94b
chore: remove duplicate faidx task
adthrasher Aug 27, 2024
961ce2e
chore: update function name
adthrasher Aug 27, 2024
0e450dd
chore: update function name
adthrasher Aug 27, 2024
0b39a96
chore: cleanup
adthrasher Aug 27, 2024
b150efe
chore: fix check warnings
adthrasher Aug 27, 2024
8c95537
chore: fix lint issues
adthrasher Aug 27, 2024
eaa5540
feat: add HiC-Pro workflow
adthrasher Aug 27, 2024
2560177
refactor: separate HiC-Pro workflow
adthrasher Aug 27, 2024
134cf76
chore: mv qc_hic to hicpro
adthrasher Aug 28, 2024
3b8f797
chore: specify prefix
adthrasher Aug 28, 2024
3564657
chore: add bedtools dockerfile
adthrasher Aug 28, 2024
fb61aa3
refactor: add bowtie function format as struct
adthrasher Aug 28, 2024
f143eea
chore: move bowtie2filter into the bowtie2 task
adthrasher Aug 28, 2024
d65883b
chore: lint fixes
adthrasher Aug 28, 2024
54489e8
chore: revent trailing commas that mininwdl disallows
adthrasher Aug 28, 2024
f6ae399
chore: fix lint warning
adthrasher Aug 28, 2024
5c7ab92
chore: sort before markdup and index
adthrasher Aug 29, 2024
495863f
chore: update container
adthrasher Aug 29, 2024
98f052a
feat: add hic reference generation workflow
adthrasher Aug 29, 2024
f2a556b
chore: add outputs and documentation
adthrasher Aug 29, 2024
332cbc0
chore: add fixmate
adthrasher Aug 30, 2024
d2de023
fix: correct samtools ordering
adthrasher Aug 30, 2024
fb31903
fix: ligation -> restriction
adthrasher Aug 30, 2024
5e59bf9
chore: remove dead code
adthrasher Aug 30, 2024
34287e3
fix: fix read group specification
adthrasher Sep 3, 2024
ae579ee
fix: retain RGs in final merged BAM
adthrasher Sep 4, 2024
3477fc8
feat: add bam entrypoint. rename fastq entrypoint.
adthrasher Sep 5, 2024
b5e48e2
chore: clean up lint warnings
adthrasher Sep 6, 2024
b9fe3d0
chore: clean up lint warnings
adthrasher Sep 6, 2024
b1cc7d9
chore: fix fastq meta entry
adthrasher Sep 9, 2024
3875ba1
chore: update meta for params
adthrasher Sep 9, 2024
0be8e40
refactor: hic-simple -> to-ubam workflows
adthrasher Sep 10, 2024
23c71c5
Update tools/picard.wdl
adthrasher Sep 10, 2024
3276579
chore: remove embedded bash script
adthrasher Sep 10, 2024
76c1b27
Update tools/picard.wdl
adthrasher Sep 10, 2024
7d8a751
Update tools/picard.wdl
adthrasher Sep 10, 2024
149520f
Update workflows/hic/hicpro-core.wdl
adthrasher Sep 10, 2024
5e8349d
Update workflows/hic/hicpro-core.wdl
adthrasher Sep 10, 2024
8ca023b
Update tools/samtools.wdl
adthrasher Sep 10, 2024
b68e666
Update workflows/hic/hicpro-core.wdl
adthrasher Sep 10, 2024
551a155
Update workflows/hic/hicpro-core.wdl
adthrasher Sep 10, 2024
984127c
Update workflows/hic/hicpro-core.wdl
adthrasher Sep 10, 2024
f1cc66f
chore: apply PR feedback
adthrasher Sep 10, 2024
c131c11
chore: apply PR feedback, mininwdl check and shellcheck suggestions
adthrasher Sep 11, 2024
bb99cf2
chore: add disk requirements
adthrasher Sep 11, 2024
f206b88
chore: remove stray line continuation
adthrasher Sep 11, 2024
95f31eb
chore: additional disk resource
adthrasher Sep 11, 2024
84fdf72
chore: apply PR feedback
adthrasher Sep 11, 2024
c98c4bd
chore: reduce memory for RevertSam and FastqToSam
adthrasher Sep 11, 2024
e78e7e3
chore: apply PR feedback
adthrasher Sep 13, 2024
66a5645
fix: correct argument
adthrasher Sep 13, 2024
47623fe
chore: clarify sort type
adthrasher Sep 19, 2024
2402098
chore: use interpolation instead of concatenation
adthrasher Sep 19, 2024
9824304
refactor: split Hi-C post into a separate workflow
adthrasher Oct 28, 2024
4b1a97c
chore: fix lint warnings
adthrasher Oct 28, 2024
3e06810
chore: fix lint warnings
adthrasher Oct 28, 2024
0aa4f97
chore: apply PR feedback
adthrasher Oct 28, 2024
ce77e28
refactor: make split parameter configurable
adthrasher Oct 28, 2024
275c5d8
chore: apply PR feedback
adthrasher Oct 28, 2024
e837f8a
chore: fix miniwdl check
adthrasher Oct 28, 2024
f35f38b
Merge branch 'main' into hic_workflow
adthrasher Jan 3, 2025
cf471e6
chore: fix merge issues
adthrasher Jan 3, 2025
380fe6c
chore: resolve sprocket issues
adthrasher Jan 3, 2025
160a3aa
chore: resolve sprocket lints
adthrasher Jan 3, 2025
86cbd2f
chore: revert problematic trailing comma
adthrasher Jan 3, 2025
40d1763
chore: fix sprocket lints
adthrasher Jan 3, 2025
01a1ece
chore: revert trailing commas that break miniwdl
adthrasher Jan 3, 2025
334a34e
chore: revert trailing commas that break miniwdl
adthrasher Jan 3, 2025
0287b2b
chore: revert trailing commas that break miniwdl
adthrasher Jan 3, 2025
9f8331a
chore: revert trailing commas that break miniwdl
adthrasher Jan 3, 2025
09399c9
chore: satisfy sprocket lints
adthrasher Jan 3, 2025
7bf0db8
chore: satisfy sprocket lints
adthrasher Jan 3, 2025
3adc65c
chore: update bowtie outputs
adthrasher Jan 3, 2025
8e918db
chore: update bowtie outputs
adthrasher Jan 3, 2025
143e9bf
chore: update reporting description
adthrasher Jan 3, 2025
a06c495
chore: apply PR feedback
adthrasher Jan 3, 2025
b96b348
chore: threads -> ncpu
adthrasher Jan 3, 2025
5168c62
chore: remove memory mapped IO
adthrasher Jan 3, 2025
17ae8a9
chore: document omitted bowtie2 arguments
adthrasher Jan 3, 2025
ad3364f
chore: migrate embedded script
adthrasher Jan 3, 2025
42fe071
chore: apply PR feedback
adthrasher Jan 3, 2025
f553ae4
chore: migrate embedded script
adthrasher Jan 3, 2025
64186e5
chore: apply PR feedback
adthrasher Jan 6, 2025
3139e69
chore: apply PR feedback
adthrasher Jan 6, 2025
a262420
chore: apply PR feedback
adthrasher Jan 6, 2025
de835c3
chore: address sprocket lints
adthrasher Jan 6, 2025
058fbac
chore: address sprocket lints
adthrasher Jan 6, 2025
49e1c29
chore: address sprocket lints
adthrasher Jan 6, 2025
02ae409
chore: address sprocket lints
adthrasher Jan 6, 2025
27f9e2a
test: add bowtie2 tests
adthrasher Jan 6, 2025
de9b0be
test: add picard tests
adthrasher Jan 6, 2025
df85b2e
test: add samtools and picard tests
adthrasher Jan 6, 2025
8ac2922
fix: add check for Nonetype
adthrasher Jan 8, 2025
6923f27
test: add HiLOW tests
adthrasher Jan 8, 2025
cafe359
chore: apply PR feedback
adthrasher Jan 9, 2025
827dd75
chore: move to git lfs
adthrasher Jan 9, 2025
1038702
chore: remove unused hichip task
adthrasher Jan 9, 2025
c5eb1fa
chore: point containers to branch
adthrasher Jan 9, 2025
20cff40
chore: apply PR feedback
adthrasher Jan 9, 2025
723e77c
chore: apply PR feedback
adthrasher Jan 9, 2025
9414122
Merge branch 'main' into hic_workflow
adthrasher Jan 24, 2025
6 changes: 6 additions & 0 deletions docker/bedtools/2.31.1/Dockerfile
Member

I think this version dir should be 2.31.1-0

Member

If this is named after bedtools, why does it have hic scripts in it?

Member Author

Originally, it was just plain bedtools, but with the decision to remove the embedded scripts, those needed to get built into some image, and this one depends on bedtools. It can go somewhere else, but then we'll need to install bedtools into another container.

@@ -0,0 +1,6 @@
FROM quay.io/biocontainers/bedtools:2.31.1--hf5e1c6e_2 AS bedtools
FROM python:3.9.19

COPY --from=bedtools /usr/local/bin/ /usr/local/bin/

ENTRYPOINT [ "bash" ]
8 changes: 8 additions & 0 deletions docker/bowtie2/2.5.4/Dockerfile
Member

2.5.4-0

@@ -0,0 +1,8 @@
FROM quay.io/biocontainers/samtools:1.17--h00cdaf9_0 AS samtools
FROM quay.io/biocontainers/bowtie2:2.5.4--he20e202_2

COPY --from=samtools /usr/local/bin/ /usr/local/bin/
COPY --from=samtools /usr/local/lib/ /usr/local/lib/
COPY --from=samtools /usr/local/libexec/ /usr/local/libexec/

ENTRYPOINT [ "bowtie2" ]
7 changes: 7 additions & 0 deletions docker/juicertools/1.6.2-0/Dockerfile
@@ -0,0 +1,7 @@
FROM nservant/hicpro:3.0.0 AS hicpro
FROM aidenlab/juicer:1.0.13

COPY --from=hicpro /HiC-Pro_3.0.0/bin/utils/hicpro2juicebox.sh /HiC-Pro_3.0.0/bin/utils/hicpro2juicebox.sh
RUN chmod a+rwx /opt/juicer-1.6.2/CPU/common/juicer_tools.1.7.6_jcuda.0.8.jar

ENTRYPOINT [ "bash" ]
3 changes: 3 additions & 0 deletions tests/input/test.bwa_aln_pe.bsorted.pairs.gz
Git LFS file not shown
446 changes: 446 additions & 0 deletions tools/bowtie2.wdl

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions tools/bwa.wdl
@@ -209,6 +209,10 @@ task bwa_mem {
description: "Read group information for BWA to insert into the header. BWA format: '@RG\tID:foo\tSM:bar'",
group: "common",
}
skip_mate_rescue: "Skip mate rescue"
skip_pairing: "Skip pairing; mate rescue performed unless `skip_mate_rescue` also in use"
split_smallest: "For split alignment, take the alignment with the smallest coordinate as primary"
Member

What's the alternative?

Member Author

Here is the help text.

       -S            skip mate rescue
       -P            skip pairing; mate rescue performed unless -S also in use
       -5            for split alignment, take the alignment with the smallest coordinate as primary

-P is the only option with an entry in the manual:

-P | In the paired-end mode, perform SW to rescue missing hits only but do not try to find hits that fit a proper pair.

Nothing overly helpful

Member Author

@a-frantz - I've added some additional text, but it relies on Google searches and Q&A threads on online forums. I don't see anything definitive from the bwa authors.

Member

The text for skip_[mate_rescue,pairing] looks good now! But I still have my original Q: What's the alternative to "take the alignment with the smallest coordinate as primary"?

Member Author

I assume it's random, since these have the same score (I think).

short_secondary: "Mark shorter split hits as secondary"
Member

Similar Q. What's the alternative here?

Member Author

Mark shorter split hits as secondary (for Picard compatibility).

It is apparently for compatibility with Picard. Other than that, the manual has no details. Google returns a biostars thread with the following:

without -M, a split read is flagged as 2048 (supplementary alignment) see http://picard.sourceforge.net/explain-flags.html. This flag is a recent addition to the SAM spec.
with option -M it is flagged as a duplicate flag=256 (not primary alignment): will be ignored by most 'old' tools.

Member Author

Though that answer conflicts with the description. They claim it is a duplicate, but the manual labels it secondary.
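
For reference, the flag values quoted in that thread can be checked against the FLAG bits defined in the SAM specification: 256 (0x100) is the "secondary alignment" bit, 2048 (0x800) is the "supplementary alignment" bit, and the duplicate bit is 1024 (0x400). That is consistent with the manual's "secondary" wording rather than "duplicate". A minimal Python sketch of the check follows; `SAM_FLAG_BITS` and `describe_flag` are hypothetical names used only for illustration and are not part of this PR.

```python
# Subset of the FLAG bits defined in the SAM specification.
# The dictionary and function names are illustrative assumptions, not PR code.
SAM_FLAG_BITS = {
    0x100: "secondary alignment",       # 256
    0x400: "PCR or optical duplicate",  # 1024
    0x800: "supplementary alignment",   # 2048
}


def describe_flag(flag):
    """Return the names of the bits listed in SAM_FLAG_BITS that are set in `flag`."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The two values quoted in the thread, plus the actual duplicate bit.
    for value in (256, 2048, 1024):
        print(value, describe_flag(value))
```

So a split hit written with `-M` (flag 256) would be reported as secondary, and without `-M` (flag 2048) as supplementary; neither value sets the duplicate bit.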

use_all_cores: {
description: "Use all cores? Recommended for cloud environments.",
group: "common",
@@ -230,6 +234,10 @@ task bwa_mem {
""
)
String read_group = ""
Boolean skip_mate_rescue = false
Boolean skip_pairing = false
Boolean split_smallest = false
Boolean short_secondary = false
Boolean use_all_cores = false
Int ncpu = 4
Int modify_disk_size_gb = 0
@@ -269,6 +277,10 @@ task bwa_mem {
bwa_db/"$PREFIX" \
~{basename(read_one_fastq_gz)} \
~{basename(read_two_file)} \
~{if skip_mate_rescue then "-S" else ""} \
~{if skip_pairing then "-P" else ""} \
~{if split_smallest then "-5" else ""} \
~{if short_secondary then "-M" else ""} \
| samtools view --no-PG --threads "$samtools_cores" -hb - \
> ~{output_bam}
