Feature/restructure outputs #6

Merged
merged 20 commits into from
May 30, 2024



@KateSakharova KateSakharova commented May 17, 2024

Updates:
I added FastQC for the decontaminated reads. The way I renamed the files in the decontamination step is very ugly...

MultiQC report example. Have a look!
multiqc_report.html.zip

Current structure:

├── assembly
│   ├── decontamination
│   │   └── SRR6180434.txt
│   └── spades
│       └── 3.15.5
│           ├── params.txt
│           ├── SRR6180434.assembly_graph.fastg.gz
│           ├── SRR6180434.assembly_graph_with_scaffolds.gfa.gz
│           ├── SRR6180434.contigs.fa.gz
│           ├── SRR6180434.scaffolds.fa.gz
│           ├── coverage
│           │   └── SRR6180434.txt.gz
│           └── qc
│               ├── multiqc
│               │   ├── multiqc_data
│               │   ├── multiqc_plots
│               │   └── multiqc_report.html
│               └── quast
│                   ├── SRR6180434
│                   └── versions.yml
├── pipeline_info
│   ├── execution_report_2024-05-17_16-17-51.html
│   ├── execution_timeline_2024-05-17_16-17-51.html
│   ├── execution_trace_2024-05-17_16-17-51.txt
│   ├── params_2024-05-17_16-18-58.json
│   ├── pipeline_dag_2024-05-17_16-17-51.html
│   └── software_versions.yml
└── qc
    ├── fastp
    │   ├── SRR6180434.fastp.html
    │   ├── SRR6180434.fastp.json
    │   └── SRR6180434.fastp.log
    └── fastqc
        ├── decontaminated_SRR6180434_1_fastqc.html
        ├── decontaminated_SRR6180434_1_fastqc.zip
        ├── decontaminated_SRR6180434_2_fastqc.html
        ├── decontaminated_SRR6180434_2_fastqc.zip
        ├── SRR6180434_1_fastqc.html
        ├── SRR6180434_1_fastqc.zip
        ├── SRR6180434_2_fastqc.html
        ├── SRR6180434_2_fastqc.zip
        ├── versions.yml_fastqc.zip
        └── versions.yml

Problems:
I can output the assembly into assembler/version, but I can't easily add coverage/qc/decontamination to that folder. For the assembly process we do not refer to params.assembler, because we have this lovely bit:

    READS_QC.out.qc_reads.branch { meta, reads ->
        xspades: ["metaspades", "spades"].contains(params.assembler)
                && meta.single_end == false
                || isMetatranscriptomic
        megahit: params.assembler == "megahit" || meta.single_end == true
    }.set { qc_reads }

If I use $params.assembler / coverage (where the requested assembler is spades) but the pipeline switched the assembler to megahit during the run, then coverage would end up in spades while the assembly results would be in megahit (because that is the process that was actually launched).
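
To make the mismatch concrete, here is a minimal plain-Groovy sketch of the branch conditions quoted above. `resolveBranch` is a hypothetical helper, not something in the pipeline, and it assumes the usual `branch` behaviour where the first matching condition wins. It only returns the branch label (`xspades` or `megahit`); how that label maps to a folder name (spades vs metaspades) is still the open question in the TODO.

```groovy
// Hypothetical helper (not part of the pipeline): returns the label of the
// branch that the conditions quoted above would select for one sample.
String resolveBranch(String requested, boolean singleEnd, boolean isMetatranscriptomic) {
    // Same precedence as the original expression: (A && B) || C
    if ((["metaspades", "spades"].contains(requested) && !singleEnd) || isMetatranscriptomic) {
        return "xspades"
    }
    if (requested == "megahit" || singleEnd) {
        return "megahit"
    }
    return null // no branch matched
}

// spades was requested, but single-end reads are routed to megahit
assert resolveBranch("spades", true, false) == "megahit"

// Building the folder from the resolved label instead of params.assembler
// keeps coverage next to the assembly that actually ran.
def coverageDir = "${resolveBranch('spades', true, false)}/coverage"
assert coverageDir == "megahit/coverage"
```

If something like this were stored in `meta` right after the branch decision, the publish paths for coverage/qc/decontamination could read it from there rather than from `params.assembler`.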

TODO:

  • spades vs metaspades? how to change folder name?

@KateSakharova KateSakharova marked this pull request as ready for review May 20, 2024 10:39
@KateSakharova KateSakharova force-pushed the feature/restructure_outputs branch from a8e13be to e0db239 Compare May 20, 2024 10:42
@KateSakharova KateSakharova self-assigned this May 20, 2024

@mberacochea mberacochea left a comment

Thanks Kate, I left you some comments.
One thing that is missing is the study / run level in the folder names:


SRPXXX / SRPXXXX
└── SRR1240
    └── SRR12403548
        ├── assembly
        │   ├── decontamination
        │   │   └── SRR6180434.txt
        │   └── spades
        │       └── 3.15.5
        │           ├── params.txt
        │           ├── SRR6180434.assembly_graph.fastg.gz
        │           ├── SRR6180434.assembly_graph_with_scaffolds.gfa.gz
        │           ├── SRR6180434.contigs.fa.gz
        │           ├── SRR6180434.scaffolds.fa.gz
        │           ├── coverage
        │           │   └── SRR6180434.txt.gz
        │           └── qc
        │               ├── multiqc
        │               │   ├── multiqc_data
        │               │   ├── multiqc_plots
        │               │   └── multiqc_report.html
        │               └── quast
        │                   ├── SRR6180434
        │                   └── versions.yml
        ├── pipeline_info
        │   ├── execution_report_2024-05-17_16-17-51.html
        │   ├── execution_timeline_2024-05-17_16-17-51.html
        │   ├── execution_trace_2024-05-17_16-17-51.txt
        │   ├── params_2024-05-17_16-18-58.json
        │   ├── pipeline_dag_2024-05-17_16-17-51.html
        │   └── software_versions.yml
        └── qc
            ├── fastp
            │   ├── SRR6180434.fastp.html
            │   ├── SRR6180434.fastp.json
            │   └── SRR6180434.fastp.log
            └── fastqc
                ├── decontaminated_SRR6180434_1_fastqc.html
                ├── decontaminated_SRR6180434_1_fastqc.zip
                ├── decontaminated_SRR6180434_2_fastqc.html
                ├── decontaminated_SRR6180434_2_fastqc.zip
                ├── SRR6180434_1_fastqc.html
                ├── SRR6180434_1_fastqc.zip
                ├── SRR6180434_2_fastqc.html
                ├── SRR6180434_2_fastqc.zip
                ├── versions.yml_fastqc.zip
                └── versions.yml
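
A rough plain-Groovy sketch of how the study / run prefix / run part of the path could be built. `runOutputDir` is a hypothetical helper, and the 7-character prefix is only inferred from the example above (SRR1240 for SRR12403548), so treat both as assumptions.

```groovy
// Hypothetical helper: builds <study>/<run prefix>/<run>.
// The default prefix length of 7 is an assumption taken from the example
// tree, where SRR12403548 sits under SRR1240.
String runOutputDir(String study, String run, int prefixLength = 7) {
    return [study, run.take(prefixLength), run].join("/")
}

// SRPXXX is the placeholder study accession used in the tree above
assert runOutputDir("SRPXXX", "SRR12403548") == "SRPXXX/SRR1240/SRR12403548"
```

The assembly, pipeline_info and qc folders would then be published underneath that directory.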

The attached MultiQC report looks very good, but it has an extra software section that includes only FastQC:
[image]

modules/nf-core/megahit/main.nf (outdated, resolved)
subworkflows/local/assembly_qc.nf (outdated, resolved)
workflows/miassembler.nf (outdated, resolved)
}
qc_reads_extended.branch { meta, reads ->
    megahit: params.assembler == "megahit"
        || meta.single_end == true
I think we can remove the `|| meta.single_end` bits; that has already been decided by this point.

conf/modules.config (outdated, resolved)
@@ -21,7 +21,7 @@ process SAMTOOLS_BAM2FQ {

     script:
     def args = task.ext.args ?: ''
-    def prefix = task.ext.prefix ?: "${meta.id}"
+    def prefix = task.ext.prefix ? "decontaminated_${meta.id}": "${meta.id}"
To keep things consistent, I would set this through task.ext in the modules config, just like BWA_MEM2 (https://github.com/EBI-Metagenomics/miassembler/pull/6/files#diff-bf809c928cb10b54d251dec8a140a2d5505150f97175d8d7b51dda9cc57971feR88)
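
Something along these lines is what I have in mind. It is a sketch of the usual `ext.prefix` pattern, assuming the selector matches the SAMTOOLS_BAM2FQ process name; the exact block in conf/modules.config may end up looking different.

```groovy
// conf/modules.config (sketch, selector name assumed)
process {
    withName: 'SAMTOOLS_BAM2FQ' {
        // A closure, so that meta is resolved per task at run time
        ext.prefix = { "decontaminated_${meta.id}" }
    }
}
```

With that in place the module can keep the stock `def prefix = task.ext.prefix ?: "${meta.id}"` line, and the renaming lives in one place in the config.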

@mberacochea

We also need to remove any mention of this workflow being part of nf-core, such as in the MultiQC methods text:

Data was processed using ebi-metagenomics/miassembler v1.0dev of the nf-core collection of workflows ....

@KateSakharova KateSakharova force-pushed the feature/restructure_outputs branch from 508db9a to 5156e43 Compare May 29, 2024 13:24

@mberacochea mberacochea left a comment

:shipit: I think this is ready to be merged now

@KateSakharova KateSakharova merged commit 36bc2f0 into main May 30, 2024
1 check passed
@KateSakharova KateSakharova deleted the feature/restructure_outputs branch May 30, 2024 12:35