Resolve merge conflicts
Ge94 committed Dec 17, 2024
2 parents ef5001f + 71685fd commit 39aca5c
Showing 54 changed files with 2,379 additions and 727 deletions.
31 changes: 4 additions & 27 deletions .github/CONTRIBUTING.md
@@ -36,8 +36,8 @@ There are typically two types of tests that run:

### Lint tests

`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
This pipeline follows some of the `nf-core` [guidelines](https://nf-co.re/developers/guidelines).
To enforce these, the `nf-core` team has developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
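
A minimal local run of those checks could look like the sketch below (it assumes `pip` is available and that you run it from the pipeline root):

```bash
# Install nf-core/tools, then lint the current directory
pip install nf-core
nf-core lint .
```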

If any failures or warnings are encountered, please follow the listed URL for more documentation.

@@ -52,9 +52,9 @@ These tests are run both with the latest available version of `Nextflow` and als

:warning: Only in the unlikely and regrettable event of a release happening with a bug.

- On your own fork, make a new branch `patch` based on `upstream/master`.
- On your own fork, make a new branch `patch` based on `upstream/main`.
- Fix the bug, and bump version (X.Y.Z+1).
- A PR should be made on `master` from `patch` to directly address this particular bug.
- A PR should be made on `main` from `patch` to directly address this particular bug (see the command sketch below).
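
A minimal command sketch of that workflow (assuming your fork is the `origin` remote and the original repository is configured as `upstream`):

```bash
# Create the patch branch from the upstream default branch
git fetch upstream
git checkout -b patch upstream/main

# ...fix the bug and bump the version (X.Y.Z+1)...

git push origin patch  # then open a PR from `patch` against `main`
```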

## Pipeline contribution conventions

@@ -93,26 +93,3 @@ Please use the following naming schemes, to make it easy to understand what is g

- initial process channel: `ch_output_from_<process>`
- intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>`

### Nextflow version bumping

If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`

### Images and figures

For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).

## GitHub Codespaces

This repo includes a devcontainer configuration which will create a GitHub Codespaces for Nextflow development! This is an online developer environment that runs in your browser, complete with VSCode and a terminal.

To get started:

- Open the repo in [Codespaces](https://github.com/ebi-metagenomics/miassembler/codespaces)
- Tools installed
- nf-core
- Nextflow

Devcontainer specs:

- [DevContainer config](.devcontainer/devcontainer.json)
3 changes: 0 additions & 3 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -19,7 +19,4 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/ebi-metageno
- [ ] Make sure your code lints (`nf-core lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
80 changes: 80 additions & 0 deletions .github/workflows/linting.yml
@@ -0,0 +1,80 @@
name: nf-core linting
on:
  push:
    branches:
      - dev
  pull_request:
  release:
    types: [published]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4

      - name: Set up Python 3.12
        uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
        with:
          python-version: "3.12"

      - name: Install pre-commit
        run: pip install pre-commit

      - name: Run pre-commit
        run: pre-commit run --all-files

  nf-core:
    runs-on: ubuntu-latest
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4

      - name: Install Nextflow
        uses: nf-core/setup-nextflow@v2

      - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
        with:
          python-version: "3.12"
          architecture: "x64"

      - name: read .nf-core.yml
        uses: pietrobolcato/[email protected]
        id: read_yml
        with:
          config: ${{ github.workspace }}/.nf-core.yml

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }}
      - name: Run nf-core pipelines lint
        if: ${{ github.base_ref != 'main' }}
        env:
          GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
        run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md

      - name: Run nf-core pipelines lint --release
        if: ${{ github.base_ref == 'main' }}
        env:
          GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
        run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md

      - name: Save PR number
        if: ${{ always() }}
        run: echo ${{ github.event.pull_request.number }} > PR_number.txt

      - name: Upload linting log file artifact
        if: ${{ always() }}
        uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4
        with:
          name: linting-logs
          path: |
            lint_log.txt
            lint_results.md
            PR_number.txt
21 changes: 11 additions & 10 deletions .github/workflows/ci.yml → .github/workflows/nf_tests.yml
@@ -1,11 +1,9 @@
name: nf-test CI
on:
  push:
    branches:
      - dev
  pull_request:
  release:
    types: [published]
  workflow_dispatch:

env:
  NXF_ANSI_LOG: false
@@ -15,22 +13,25 @@ jobs:
    name: Run pipeline with test data
    runs-on: ubuntu-latest

    strategy:
      matrix:
        # Nextflow versions: check pipeline minimum and current latest
        NXF_VER: ["24.04.0"]

    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v4

      - uses: actions/setup-java@99b8673ff64fbf99d8d325f52d9a5bdedb8483e9 # v4
        with:
          distribution: "temurin"
          java-version: "17"

      - name: Setup Nextflow
        uses: nf-core/setup-nextflow@v2
        uses: nf-core/[email protected]
        with:
          version: "${{ matrix.NXF_VER }}"

      - name: Install nf-test
        uses: nf-core/setup-nf-test@v1
        with:
          version: 0.9.0
          install-pdiff: true
          version: 0.9.2

      - name: Run pipeline with test data
        run: |
7 changes: 6 additions & 1 deletion .nf-core.yml
@@ -20,6 +20,7 @@ lint:
    - .github/workflows/ci.yml
    - .github/workflows/linting_comment.yml
    - .github/workflows/linting.yml
    - .github/workflows/ci.yml
    - conf/test_full.config
    - lib/Utils.groovy
    - lib/WorkflowMain.groovy
@@ -32,18 +33,22 @@ lint:
    - docs/images/nf-core-miassembler_logo_light.png
    - docs/images/nf-core-miassembler_logo_dark.png
    - .github/ISSUE_TEMPLATE/bug_report.yml
    - .github/PULL_REQUEST_TEMPLATE.md
    - .github/CONTRIBUTING.md
    - .github/workflows/linting.yml
    - LICENSE
    - docs/README.md
    - .gitignore
  multiqc_config:
    - report_comment
  nextflow_config: False
  nextflow_config:
    - params.input
    - params.validationSchemaIgnoreParams
    - params.custom_config_version
    - params.custom_config_base
    - manifest.name
    - manifest.homePage
    - custom_config
  readme:
    - nextflow_badge
nf_core_version: 3.0.2
77 changes: 66 additions & 11 deletions README.md
@@ -15,9 +15,6 @@ This pipeline is still in early development. It's mostly a direct port of the mi
## Usage

> [!WARNING]
> It only runs in EBI Codon cluster using Slurm ATM.
Pipeline help:

```bash
@@ -28,14 +25,14 @@ Typical pipeline command:
Input/output options
--study_accession [string] The ENA Study secondary accession
--reads_accession [string] The ENA Run primary accession
--private_study [boolean] To use if the ENA study is private
--private_study [boolean] To use if the ENA study is private, *this feature only works on EBI infrastructure at the moment*
--samplesheet [string] Path to comma-separated file containing information about the raw reads with the prefix to be used.
--assembler [string] The short reads assembler (accepted: spades, metaspades, megahit)
--single_end [boolean] Force the single_end value for the study / reads
--library_strategy [string] Force the library_strategy value for the study / reads (accepted: metagenomic, metatranscriptomic,
genomic, transcriptomic, other)
--library_layout [string] Force the library_layout value for the study / reads (accepted: single, paired)
--platform [string] Force the sequencing_platform value for the study / reads
--platform [string] Force the sequencing_platform value for the study / reads
--spades_version [string] null [default: 3.15.5]
--megahit_version [string] null [default: 1.2.9]
--flye_version [string] null [default: 2.9]
@@ -45,7 +42,7 @@ Input/output options
--blast_reference_genomes_folder [string] The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal
directory.
--bwamem2_reference_genomes_folder [string] The folder with the reference genome bwa-mem2 indexes, defaults to the Microbiome Informatics internal

--reference_genomes_folder [string] The folder with reference genomes, defaults to the Microbiome Informatics internal
directory.
--remove_human_phix [boolean] Remove human and phiX reads pre assembly, and contigs matching those genomes. [default: true]
@@ -64,7 +61,6 @@ Generic options
--multiqc_methods_description [string] Custom MultiQC yaml file containing HTML including a methods description.
```
Example:
```bash
@@ -78,14 +74,17 @@ nextflow run ebi-metagenomics/miassembler \
```
### Required DBs:
- `--reference_genome`: reference genome in FASTA format
- `--blast_reference_genomes_folder`: mandatory; the **human_phiX** BLAST database is provided on the [FTP](https://ftp.ebi.ac.uk/pub/databases/metagenomics/pipelines/references/)
- `--bwamem2_reference_genomes_folder`: mandatory; the **human_phiX** bwa-mem2 index is provided on the [FTP](https://ftp.ebi.ac.uk/pub/databases/metagenomics/pipelines/references/)

BLAST and bwa-mem2 reference databases can be generated for any reference genome with which to polish the input sequences.
#### BWA-MEM2
As explained in [bwa-mem2's README](https://github.com/bwa-mem2/bwa-mem2?tab=readme-ov-file#getting-started):
```
# Use precompiled binaries (recommended)
curl -L https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.2.1/bwa-mem2-2.2.1_x64-linux.tar.bz2 \
@@ -98,6 +97,7 @@ bwa-mem2-2.2.1_x64-linux/bwa-mem2 index ref.fa
This will generate multiple index files in a folder. The folder containing them is the one to use as `bwamem2_reference_genomes_folder`.
#### BLAST
```
makeblastdb -in <ref.fa> -dbtype nucl -out <my_db_file>
```
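
As a hypothetical end-to-end example (the file and folder names here are placeholders, not pipeline requirements), you could build the database in its own folder and then point the pipeline at that folder:

```bash
# Build a nucleotide BLAST database inside a dedicated folder
mkdir -p blast_dbs/my_reference
makeblastdb -in my_reference.fa -dbtype nucl -out blast_dbs/my_reference/my_reference

# ...then run the pipeline with:
#   --blast_reference_genomes_folder blast_dbs/my_reference
```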
@@ -147,6 +147,18 @@ PRJ1,ERR1,/path/to/reads/ERR1_1.fq.gz,/path/to/reads/ERR1_2.fq.gz,paired,metagen
PRJ2,ERR2,/path/to/reads/ERR2.fq.gz,,single,genomic,megahit,32
```
### ENA Private Data
The pipeline includes a module to download private data from ENA using the EMBL-EBI FIRE (File Replication) system. This system is restricted for use within the EMBL-EBI network and will not work unless connected to that network.
If you have private data to assemble, you must provide the full path to the files on a system that Nextflow can access.
#### Microbiome Informatics Team
To process private data, launch the pipeline with the `--private_study` flag and provide a samplesheet that includes the private FTP (transfer services) paths. The `download_from_fire` module is then used to download the files.
This module uses [Nextflow secrets](https://www.nextflow.io/docs/latest/secrets.html#how-it-works). Specifically, it requires the `FIRE_ACCESS_KEY` and `FIRE_SECRET_KEY` secrets to authenticate and download the files.
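
Those secrets can be registered with the Nextflow CLI before launching the pipeline, for example (a sketch; the values are placeholders):

```bash
nextflow secrets set FIRE_ACCESS_KEY "<your-access-key>"
nextflow secrets set FIRE_SECRET_KEY "<your-secret-key>"
```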
## Outputs
The outputs of the pipeline are organized as follows:
@@ -197,6 +209,49 @@ results
The nested structure based on ENA Study and Reads accessions was created to suit the Microbiome Informatics team’s needs. The benefit of this structure is that results from different runs of the same study won’t overwrite any results.
### Coverage
The pipeline reports coverage values for the assembly using two mechanisms: `jgi_summarize_bam_contig_depths` and a custom calculation of whole-assembly coverage and coverage depth.
#### jgi_summarize_bam_contig_depths
This tool summarizes the depth of coverage for each contig from BAM files containing the mapped reads. It quantifies the extent to which contigs in an assembly are covered by these reads. The output is a tabular file, with rows representing contigs and columns displaying the summarized coverage values from the BAM files. This summary is useful for binning contigs or estimating abundance in various metagenomic datasets.
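
For reference, a depth summary of this kind is typically generated with a command along these lines (a sketch; the exact invocation used by the pipeline may differ):

```bash
jgi_summarize_bam_contig_depths \
    --outputDepth SRR6180434_coverage_depth_summary.tsv \
    SRR6180434_sorted.bam
```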
This file is generated per assembly and stored in the following location (e.g., for study `SRP115494` and run `SRR6180434`): `SRP1154/SRP115494/multiqc/SRR5949/SRR5949318/assembly/metaspades/3.15.5/coverage/SRR6180434_coverage_depth_summary.tsv.gz`
##### Example output of `jgi_summarize_bam_contig_depths`
| contigName | contigLen | totalAvgDepth | SRR6180434_sorted.bam | SRR6180434_sorted.bam-var |
| -------------------------------- | --------- | ------------- | --------------------- | ------------------------- |
| NODE_1_length_539_cov_105.072314 | 539 | 273.694 | 273.694 | 74284.7 |
###### Explanation of the Columns:
1. **contigName**: The name or identifier of the contig (e.g., `NODE_1_length_539_cov_105.072314`). This is usually derived from the assembly process and may include information such as the contig length and coverage.
2. **contigLen**: The length of the contig in base pairs (e.g., `539`).
3. **totalAvgDepth**: The average depth of coverage across the entire contig from all BAM files (e.g., `273.694`). This represents the total sequencing coverage averaged across the length of the contig. This value will be the same as the sample avg. depth in assemblies of a single sample.
4. **SRR6180434_sorted.bam**: The average depth of coverage for the specific sample represented by this BAM file (e.g., `273.694`). This shows how well the contig is covered by reads.
5. **SRR6180434_sorted.bam-var**: The variance in the depth of coverage for the same BAM file (e.g., `74284.7`). This gives a measure of how uniform or uneven the read coverage is across the contig.
#### Coverage JSON
The pipeline calculates two key metrics: coverage and coverage depth for the entire assembly. The coverage is determined by dividing the number of assembled base pairs by the total number of base pairs before filtering. Coverage depth is calculated by dividing the number of assembled base pairs by the total length of the assembly, provided the assembly length is greater than zero. These metrics provide insights into how well the reads cover the assembly and the average depth of coverage across the assembled contigs. The script that calculates this number is [calculate_assembly_coverage.py](bin/calculate_assembly_coverage.py).
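
A simplified sketch of that calculation (illustration only; the authoritative logic lives in `bin/calculate_assembly_coverage.py`):

```python
def assembly_coverage_metrics(assembled_bp: int, total_bp_before_filtering: int, assembly_length: int) -> dict:
    """Coverage and coverage depth as described above; not the pipeline's actual code."""
    coverage = assembled_bp / total_bp_before_filtering if total_bp_before_filtering else 0.0
    coverage_depth = assembled_bp / assembly_length if assembly_length > 0 else 0.0
    return {"coverage": coverage, "coverage_depth": coverage_depth}
```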
The pipeline creates a JSON file with the following content:
```json
{
"coverage": 0.04760503915318373,
"coverage_depth": 273.694
}
```
The file is stored at the following location (e.g., for study `SRP115494` and run `SRR6180434`): `SRP1154/SRP115494/multiqc/SRR5949/SRR5949318/assembly/metaspades/3.15.5/coverage/SRR6180434_coverage.json`
### Top Level Reports
#### MultiQC
@@ -219,10 +274,10 @@ SRR6180434,short_reads_filter_ratio_threshold_exceeded
##### Runs exclusion messages
| Exclusion Message | Description |
| --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `short_reads_filter_ratio_threshold_exceeded` | The maximum fraction of reads that are allowed to be filtered out. If exceeded, it flags excessive filtering. The default value is 0.9, meaning that if more than 90% of the reads are filtered out, the threshold is considered exceeded, and the run is not assembled. |
| `short_reads_low_reads_count_threshold` | The minimum number of reads required after filtering. If below, it flags a low read count, and the run is not assembled. |
| Exclusion Message | Description |
| --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `short_reads_filter_ratio_threshold_exceeded` | The maximum fraction of reads that are allowed to be filtered out. If exceeded, it flags excessive filtering. The default value is 0.1, meaning that if less than 10% of the reads are retained after filtering, the threshold is considered exceeded, and the run is not assembled. |
| `short_reads_low_reads_count_threshold` | The minimum number of reads required after filtering. If below, it flags a low read count, and the run is not assembled. |
#### Assembled Runs
6 changes: 3 additions & 3 deletions assets/multiqc_config.yml
@@ -3,12 +3,12 @@ report_comment: >
  analysis pipeline.
report_section_order:
  "software_versions":
    order: -1000
  "ebi-metagenomics-miassembler-methods-description":
    order: -1001
  "ebi-metagenomics-miassembler-summary":
  "software_versions":
    order: -1002
  "ebi-metagenomics-miassembler-summary":
    order: -1003

export_plots: true
