Feature/restructure outputs #6

Merged
merged 20 commits into from
May 30, 2024
11 changes: 7 additions & 4 deletions .github/CONTRIBUTING.md
@@ -23,8 +23,11 @@ If you're not used to this workflow with git, you can start with some [docs from

## Tests

You can optionally test your changes by running the pipeline locally. Then it is recommended to use the `debug` profile to
receive warnings about process selectors and other debug info. Example: `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`.
You can optionally test your changes by running the pipeline locally. To receive warnings about process selectors and other debug information, use the `debug` profile. Run all the tests with the following command:

```bash
nf-test test --profile debug,test,docker --verbose
```

When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.
Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.
@@ -40,7 +43,7 @@ If any failures or warnings are encountered, please follow the listed URL for mo

### Pipeline tests

Each `nf-core` pipeline should be set up with a minimal set of test-data.
Each of the Microbiome Informatics pipelines should be set up with a minimal set of test-data.
`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
If there are any failures then the automated tests fail.
These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.
@@ -82,7 +85,7 @@ Once there, use `nf-core schema build` to add to `nextflow_schema.json`.

Sensible defaults for process resource requirements (CPUs / memory / time) should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which defines the default process as a single-core process, followed by multi-core configurations with increasingly large memory requirements under standardised labels.

The process resources can be passed on to the tool dynamically within the process with the `${task.cpu}` and `${task.memory}` variables in the `script:` block.
The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.

### Naming schemes

36 changes: 36 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,36 @@
name: nf-test CI
on:
  push:
    branches:
      - dev
  pull_request:
  release:
    types: [published]

env:
  NXF_ANSI_LOG: false
  NFTEST_VER: "0.8.4"

jobs:
  test:
    name: Run pipeline with test data
    runs-on: ubuntu-latest

    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v4

      - uses: actions/setup-java@99b8673ff64fbf99d8d325f52d9a5bdedb8483e9 # v4
        with:
          distribution: "temurin"
          java-version: "17"

      - name: Setup Nextflow
        uses: nf-core/setup-nextflow@v2

      - name: Install nf-test
        uses: nf-core/setup-nf-test@v1

      - name: Run pipeline with test data
        run: |
          nf-test test
5 changes: 4 additions & 1 deletion .gitignore
@@ -9,5 +9,8 @@ testing*
results/

*.pyc
.pytest_cache/

assets/fetch_tool_credentials.json
assets/fetch_tool_credentials.json
.nf-test.log
.nf-test/
30 changes: 23 additions & 7 deletions .nf-core.yml
@@ -1,32 +1,48 @@
repository_type: pipeline
template:
prefix: ebi-metagenomics
skip:
- ci
- github_badges
lint:
files_exist:
- CODE_OF_CONDUCT.md
- assets/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_dark.png
- docs/output.md
- docs/usage.md
- .github/ISSUE_TEMPLATE/config.yml
- .github/workflows/awstest.yml
- .github/workflows/awsfulltest.yml
- .github/workflows/branch.yml
- .github/workflows/ci.yml
- .github/workflows/linting_comment.yml
- .github/workflows/linting.yml
- conf/test_full.config
- lib/Utils.groovy
- lib/WorkflowMain.groovy
- lib/NfcoreTemplate.groovy
- lib/WorkflowMiassembler.groovy
- lib/nfcore_external_java_deps.jar
files_unchanged:
- CODE_OF_CONDUCT.md
- assets/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_dark.png
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/CONTRIBUTING.md
- LICENSE
- docs/README.md
- .gitignore
multiqc_config:
- report_comment
nextflow_config:
nextflow_config: False
- params.input
- params.validationSchemaIgnoreParams
- params.custom_config_version
- params.custom_config_base
- manifest.name
- manifest.homePage
readme:
- nextflow_badge
repository_type: pipeline
template:
prefix: ebi-metagenomics
skip:
- ci
- github_badges
56 changes: 52 additions & 4 deletions README.md
@@ -10,6 +10,9 @@

This pipeline is still in early development. It's mostly a direct port of the mi-automation assembly generation pipeline. Some of the bespoke scripts used to remove contaminated contigs or to calculate the coverage of the assembly were replaced with tools provided by the community ([SeqKit](https://doi.org/10.1371/journal.pone.0163962) and [quast](https://doi.org/10.1093/bioinformatics/btu153) respectively).

> [!NOTE]
> This pipeline uses the nf-core template with some tweaks, but it's not part of nf-core.

## Usage

> [!WARNING]
@@ -23,12 +26,21 @@ nextflow run ebi-metagenomics/miassembler --help
Input/output options
--study_accession [string] The ENA Study secondary accession
--reads_accession [string] The ENA Run primary accession
--assembler [string] The short reads assembler (accepted: spades, metaspades, megahit) [default: metaspades for PE, megahit for SE]
--private_study [boolean] To use if the ENA study is private [default: false]
--assembler [string] The short reads assembler (accepted: spades, metaspades, megahit) [default: metaspades]
--reference_genome [string] The genome to be used to clean the assembly, the genome will be taken from the Microbiome Informatics internal
directory (accepted: chicken.fna, salmon.fna, cod.fna, pig.fna, cow.fna, mouse.fna, honeybee.fna,
rainbow_trout.fna, ...) [default: human+phiX]
--reference_genomes_folder [string] The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal directory
[default: /nfs/production/rdf/metagenomics/pipelines/prod/assembly-pipeline/blast_dbs/]
rainbow_trout.fna, rat.fna, ...)
--blast_reference_genomes_folder [string] The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal
directory.
--bwamem2_reference_genomes_folder [string] The folder with the reference genome bwa-mem2 indexes, defaults to the Microbiome Informatics internal
directory.
--remove_human_phix [boolean] Remove human and phiX reads pre assembly, and contigs matching those genomes. [default: true]
--human_phix_blast_index_name [string] Combined Human and phiX BLAST db. [default: human_phix]
--human_phix_bwamem2_index_name [string] Combined Human and phiX bwa-mem2 index. [default: human_phix]
--min_contig_length [integer] Minimum contig length filter. [default: 500]
--assembly_memory [integer] Default memory allocated for the assembly process. [default: 100]
--spades_only_assembler [boolean] Run SPAdes/metaSPAdes without the error correction step. [default: true]
--outdir [string] The output directory where the results will be saved. You have to use absolute paths to storage on Cloud
infrastructure.
--email [string] Email address for completion summary.
@@ -50,7 +62,43 @@ nextflow run ebi-metagenomics/miassembler \
--reads_accession SRR1631361
```

## Outputs

The outputs of the pipeline are organized as follows:

```
results/SRP1154
└── SRP115494
    └── SRR6180
        └── SRR6180434
            ├── assembly
            │   └── metaspades
            │       └── 3.15.5
            │           ├── coverage
            │           ├── decontamination
            │           └── qc
            │               ├── multiqc
            │               └── quast
            └── qc
                ├── fastp
                └── fastqc

```

The nested structure based on ENA Study and Reads accessions was created to suit the Microbiome Informatics team’s needs. The benefit of this structure is that results from different runs of the same study won’t overwrite any results.
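
As a rough illustration of the layout above, the directory for a given run can be derived from the two accessions. This is only a sketch: the function name `results_dir` is hypothetical, and the rule that each prefix level is the first 7 characters of the accession is an assumption inferred from the example tree (`SRP115494` under `SRP1154`, `SRR6180434` under `SRR6180`), not something taken from the pipeline code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the nested results directory for a study/run pair.
# Assumption: each prefix directory is the first 7 characters of the accession,
# as suggested by the example tree (SRP115494 -> SRP1154, SRR6180434 -> SRR6180).
results_dir() {
    local outdir="$1" study="$2" run="$3"
    printf '%s/%s/%s/%s/%s\n' "$outdir" "${study:0:7}" "$study" "${run:0:7}" "$run"
}

# Example: locate the results for run SRR6180434 of study SRP115494.
results_dir results SRP115494 SRR6180434
```

Because the run accession is part of the path, two runs of the same study end up in sibling directories, which is how the layout avoids overwriting results.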

## Tests

There is a very small test data set ready to use:

```bash
nextflow run main.nf -resume -profile test,docker
```

### End to end tests

Two end-to-end tests can be launched (with megahit and metaspades) with the following command:

```bash
pytest tests/workflows/ --verbose
```
2 changes: 1 addition & 1 deletion assets/email_template.html
@@ -12,7 +12,7 @@

<img src="cid:nfcorepipelinelogo">

<h1>ebi-metagenomics/miassembler v${version}</h1>
<h1>ebi-metagenomics/miassembler ${version}</h1>
<h2>Run Name: $runName</h2>

<% if (!success){
14 changes: 4 additions & 10 deletions assets/methods_description_template.yml
@@ -3,27 +3,21 @@ description: "Suggested text and references to use when describing pipeline usag
section_name: "ebi-metagenomics/miassembler Methods Description"
section_href: "https://github.com/ebi-metagenomics/miassembler"
plot_type: "html"
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You inject any metadata in the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
<p>Data was processed using ebi-metagenomics/miassembler v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
<p>Data is processed using MGnify ebi-metagenomics/miassembler v${workflow.manifest.version} ${doi_text}. Supported assemblers are MEGAHIT, SPAdes and metaSPAdes (default). Single-end reads are assembled only with MEGAHIT, and metatranscriptomic data only with SPAdes. The pipeline uses a set of custom functions and modules from the nf-core collection (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
<p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
<pre><code>${workflow.commandLine}</code></pre>
<p>${tool_citations}</p>
<h4>References</h4>
<ul>
<li>Richardson LJ, Allen B, Baldi G, Beracochea M, Bileschi M, Burdett T, Burgin J, Caballero-Pérez J, Cochrane G, Colwell L, Curtis T, Escobar-Zepeda A, Gurbich T, Kale V, Korobeynikov A, Raj S, Rogers AB, Sakharova E, Sanchez S, Wilkinson D and Finn RD. (2023) MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research. doi: <a href="https://academic.oup.com/nar/article/51/D1/D753/6880769">10.1093/nar/gkac1080</a></li>
<li>Li, D., Liu, C-M., Luo, R., Sadakane, K., and Lam, T-W. (2015). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. doi: <a href="https://doi.org/10.1093/bioinformatics/btv033">10.1093/bioinformatics/btv033</a></li>
<li>Prjibelski A., Antipov D., Meleshko D., Lapidus A., Korobeynikov A. (2020). Using SPAdes De Novo Assembler. Current Protocols. doi: <a href="https://doi.org/10.1002/cpbi.102">10.1002/cpbi.102</a></li>
<li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: <a href="https://doi.org/10.1038/nbt.3820">10.1038/nbt.3820</a></li>
<li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: <a href="https://doi.org/10.1038/s41587-020-0439-x">10.1038/s41587-020-0439-x</a></li>
<li>Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: <a href="https://doi.org/10.1038/s41592-018-0046-7">10.1038/s41592-018-0046-7</a></li>
<li>da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: <a href="https://doi.org/10.1093/bioinformatics/btx192">10.1093/bioinformatics/btx192</a></li>
${tool_bibliography}
</ul>
<div class="alert alert-info">
<h5>Notes:</h5>
<ul>
${nodoi_text}
<li>The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!</li>
<li>You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.</li>
</ul>
</div>
Binary file added assets/mgnify_logo.png
7 changes: 4 additions & 3 deletions assets/multiqc_config.yml
@@ -1,16 +1,17 @@
report_comment: >
This report has been generated by the <a href="https://github.com/ebi-metagenomics/miassembler/tree/dev" target="_blank">ebi-metagenomics/miassembler</a>
This report has been generated by the <a href="https://github.com/ebi-metagenomics/miassembler/" target="_blank">ebi-metagenomics/miassembler</a>
analysis pipeline.

report_section_order:
"ebi-metagenomics-miassembler-methods-description":
order: -1000
software_versions:
order: -1001
"ebi-metagenomics-miassembler-summary":
order: -1002

export_plots: true

skip_versions_section: true

top_modules:
- fastqc
- quast
3 changes: 0 additions & 3 deletions assets/samplesheet.csv

This file was deleted.
