Adding pipeline nf-tests to nf-core/mag #550

Draft
wants to merge 62 commits into
base: dev
18a1645
Merge pull request #499 from nf-core/dev
jfy133 Sep 26, 2023
ba72349
Merge pull request #524 from nf-core/dev
jfy133 Oct 10, 2023
e728900
Merge pull request #541 from nf-core/dev
jfy133 Nov 17, 2023
65034b5
Added ability to auto-create samplesheet for phageannotator
CarsonJM Nov 20, 2023
43491d8
Created tests for samplesheet creation
CarsonJM Nov 20, 2023
e39111e
Added nf-core-pipeline parameter to schema
CarsonJM Nov 20, 2023
69c684e
Changed input to samplesheet creation
CarsonJM Nov 20, 2023
bd06532
Conditional running of samplesheet creation
CarsonJM Nov 20, 2023
a564425
Updated docs and added pipeline to output file name
CarsonJM Nov 29, 2023
11bc721
Started structuring mag for nf-tests
CarsonJM Dec 8, 2023
f97ef32
Moving config file params to nf-test files
CarsonJM Dec 12, 2023
53e58e1
virus identification test complete
CarsonJM Dec 13, 2023
4dc19db
Added test_adapterremoval tests
CarsonJM Dec 13, 2023
848f4ef
Merge branch 'pipeline-nf-tests' of https://github.com/CarsonJM/mag i…
CarsonJM Dec 13, 2023
d5ec5e4
Merge branch 'nf-core:master' into pipeline-nf-tests
CarsonJM Dec 13, 2023
8ecb7ed
Updated CI to run nf-tests
CarsonJM Dec 13, 2023
490260b
Fixed test_nothing nf-test
CarsonJM Dec 13, 2023
647a197
Modified names and used collectFile()
CarsonJM Dec 14, 2023
ff096dc
Restructured workflows dir and added more nf-tests
CarsonJM Dec 14, 2023
155a152
Fixed linting and CI
CarsonJM Dec 14, 2023
857fb2d
Fixed CI part 2
CarsonJM Dec 14, 2023
40b4ebf
Added ancient_dna test and reproducibility for binning
CarsonJM Dec 15, 2023
2188e59
Added and fixed nf-tests
CarsonJM Jan 4, 2024
edb4a80
Updated CI tags
CarsonJM Jan 4, 2024
065c3a6
Fixed snapshots and linting
CarsonJM Jan 8, 2024
6110f5f
Fixed resource requests
CarsonJM Jan 8, 2024
2143e0a
Fixed linting
CarsonJM Jan 8, 2024
438ace3
Updated snapshot
CarsonJM Jan 8, 2024
6028f5a
Remove unused local modules
CarsonJM Jan 12, 2024
4ac4033
Merge branch 'dev' into pipeline-nf-tests
CarsonJM Jan 17, 2024
d52798d
Updated nf-core modules and pinned prettier version
CarsonJM Jan 17, 2024
a3c984a
Added test_data config to CI
CarsonJM Jan 17, 2024
319142c
Fixed tags for nf-core modules
CarsonJM Jan 17, 2024
8d39df0
Merge branch 'nf-core:dev' into dev
CarsonJM Jan 17, 2024
0126a63
Updated snapshots to reflect new nf-core modules
CarsonJM Jan 17, 2024
c5a0b86
Fixed CPU./mem requests
CarsonJM Jan 17, 2024
bdaa7e6
Update genomad hash
CarsonJM Jan 17, 2024
fafb1b2
Removed unnecessary reproducibility options and updated schema
CarsonJM Jan 17, 2024
972ec92
Fixed linting
CarsonJM Jan 17, 2024
f4c7633
Fixed CI yet again
CarsonJM Jan 17, 2024
2cadf7b
[automated] Fix linting with Prettier
nf-core-bot Jan 17, 2024
332b7c7
[automated] Fix linting with Prettier
nf-core-bot Jan 17, 2024
c15f253
Merge branch 'dev' of https://github.com/CarsonJM/mag into pipeline-n…
CarsonJM Feb 2, 2024
7509201
Merge branch 'nf-core:dev' into dev
CarsonJM Feb 2, 2024
3317f8f
Merge branch 'dev' of https://github.com/CarsonJM/mag into pipeline-n…
CarsonJM Feb 2, 2024
96c48cc
Updated cat/cat and bin_summary
CarsonJM Feb 2, 2024
bd07e3a
Started updating nf-test snapshots
CarsonJM Feb 5, 2024
4a9e9e3
Merge branch 'dev' of https://github.com/nf-core/mag into pipeline-nf…
CarsonJM Feb 8, 2024
a9c8c08
Updated several workflow nf-tests after pipeline version update
CarsonJM Feb 8, 2024
61d86aa
Added test.config to CI for mem req
CarsonJM Feb 9, 2024
03e3631
Updated virus identification test
CarsonJM Feb 12, 2024
6f1bf11
Merge branch 'pipeline-nf-tests' of https://github.com/CarsonJM/mag i…
CarsonJM Feb 12, 2024
95e770f
Merge branch 'dev' of https://github.com/nf-core/mag into pipeline-nf…
CarsonJM Feb 12, 2024
4f3068b
Updated test config temporarily to test CI
CarsonJM Feb 12, 2024
1f4d36f
Update virus_identification snap
CarsonJM Feb 12, 2024
9ade6a3
Include test_data for nf-tests
CarsonJM Feb 12, 2024
1bc4619
Added nf-test config file
CarsonJM Feb 12, 2024
5561b5f
Fixed path to nf-test config
CarsonJM Feb 12, 2024
1bdbc49
Added back config files and restructured nf-test config
CarsonJM Feb 14, 2024
6080045
Added config file to each nf-test file
CarsonJM Feb 14, 2024
e625dee
Renamed test configs and included profiles in nf-tests
CarsonJM Feb 14, 2024
41c682a
Removed local proxy code
CarsonJM Feb 14, 2024
160 changes: 86 additions & 74 deletions .github/workflows/ci.yml
@@ -1,112 +1,124 @@
name: nf-core CI
# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors
name: nf-core CI
on:
push:
branches:
- dev
pull_request:
release:
types: [published]
merge_group:
types:
- checks_requested
branches:
- master
- dev

env:
NXF_ANSI_LOG: false
NFT_VER: "0.8.3"
NFT_WORKDIR: "~"
NFT_DIFF: "pdiff"
NFT_DIFF_ARGS: "--line-numbers --expand-tabs=2"

concurrency:
group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}"
cancel-in-progress: true

jobs:
test:
name: Run pipeline with test data
# Only run on push if this is the nf-core dev branch (merged PRs)
if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/mag') }}"
changes:
name: Check for changes
runs-on: ubuntu-latest
strategy:
matrix:
NXF_VER:
- "23.04.0"
- "latest-everything"
outputs:
# Expose matched filters as job 'tags' output variable
tags: ${{ steps.filter.outputs.changes }}
steps:
- name: Free some space
run: |
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"

- name: Check out pipeline code
uses: actions/checkout@v4

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
- uses: actions/checkout@v3
- name: Combine all tags.yml files
id: get_username
run: find . -name "tags.yml" -not -path "./.github/*" -exec cat {} + > .github/tags.yml
- name: debug
run: cat .github/tags.yml
- uses: dorny/paths-filter@v2
id: filter
with:
version: "${{ matrix.NXF_VER }}"
filters: ".github/tags.yml"

- name: Run pipeline with test data
define_nxf_versions:
name: Choose nextflow versions to test against depending on target branch
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.nxf_versions.outputs.matrix }}
steps:
- id: nxf_versions
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
if [[ "${{ github.event_name }}" == "pull_request" && "${{ github.base_ref }}" == "dev" && "${{ matrix.NXF_VER }}" != "latest-everything" ]]; then
echo matrix='["latest-everything"]' | tee -a $GITHUB_OUTPUT
else
echo matrix='["latest-everything", "23.04.0"]' | tee -a $GITHUB_OUTPUT
fi

profiles:
name: Run workflow profile
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/mag') }}
test:
name: ${{ matrix.tags }} ${{ matrix.profile }} NF ${{ matrix.NXF_VER }}
needs: [changes, define_nxf_versions]
if: needs.changes.outputs.tags != '[]'
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
# Run remaining test profiles with minimum nextflow version
NXF_VER: ${{ fromJson(needs.define_nxf_versions.outputs.matrix) }}
tags: ["${{ fromJson(needs.changes.outputs.tags) }}"]
profile:
[
test_host_rm,
test_hybrid,
test_hybrid_host_rm,
test_busco_auto,
test_ancient_dna,
test_adapterremoval,
test_binrefinement,
test_virus_identification,
]
steps:
- name: Free some space
run: |
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- "docker"

steps:
- name: Check out pipeline code
uses: actions/checkout@v2
uses: actions/checkout@v4

- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

- name: Run pipeline with ${{ matrix.profile }} test profile
run: |
nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profile }},docker --outdir ./results
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

checkm:
name: Run single test to checkm due to database download
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/mag') }}
runs-on: ubuntu-latest
- uses: actions/setup-python@v4
with:
python-version: "3.11"
architecture: "x64"

steps:
- name: Free some space
- name: Install pdiff to see diff between nf-test snapshots
run: |
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
python -m pip install --upgrade pip
pip install pdiff

- name: Check out pipeline code
uses: actions/checkout@v2
- name: Cache nf-test installation
id: cache-software
uses: actions/cache@v3
with:
path: |
/usr/local/bin/nf-test
/home/runner/.nf-test/nf-test.jar
key: ${{ runner.os }}-${{ env.NFT_VER }}-nftest

- name: Install Nextflow
- name: Install nf-test
if: steps.cache-software.outputs.cache-hit != 'true'
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
wget -qO- https://code.askimed.com/install/nf-test | bash
sudo mv nf-test /usr/local/bin/

- name: Download and prepare CheckM database
- name: Run nf-test
run: |
mkdir -p databases/checkm
wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz -P databases/checkm
tar xzvf databases/checkm/checkm_data_2015_01_16.tar.gz -C databases/checkm/
nf-test test --verbose --tag ${{ matrix.tags }} --profile +"${{ matrix.profile }}" --junitxml=test.xml --tap=test.tap

- uses: pcolby/tap-summary@v1
with:
path: >-
test.tap

- name: Run pipeline with ${{ matrix.profile }} test profile
- name: Output log on failure
if: failure()
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results --binqc_tool checkm --checkm_db databases/checkm
sudo apt install bat > /dev/null
batcat --decorations=always --color=always ${{ github.workspace }}/.nf-test/tests/*/meta/nextflow.log

- name: Publish Test Report
uses: mikepenz/action-junit-report@v3
if: always() # always run even if the previous step fails
with:
report_paths: test.xml
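
The `define_nxf_versions` job above picks which Nextflow versions the `test` matrix runs against. Its shell conditional can be sketched in Python as follows. This is only an illustration, not part of the PR: the function name `choose_nxf_versions` is hypothetical, and the default for `matrix_nxf_ver` assumes that `${{ matrix.NXF_VER }}` expands to an empty string in a job that defines no matrix.

```python
from typing import List


def choose_nxf_versions(event_name: str, base_ref: str, matrix_nxf_ver: str = "") -> List[str]:
    """Mirror the conditional in the define_nxf_versions step (hypothetical helper).

    matrix_nxf_ver defaults to "" on the assumption that the matrix context
    is empty in that job.
    """
    if (
        event_name == "pull_request"
        and base_ref == "dev"
        and matrix_nxf_ver != "latest-everything"
    ):
        # Pull requests against dev: test only the latest Nextflow release.
        return ["latest-everything"]
    # Everything else (pushes, releases, PRs to other branches): also test
    # the minimum supported version.
    return ["latest-everything", "23.04.0"]


if __name__ == "__main__":
    print(choose_nxf_versions("pull_request", "dev"))
    print(choose_nxf_versions("pull_request", "master"))
```

Under these assumptions, a pull request targeting `dev` runs one Nextflow version per tag, while other events run two.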
3 changes: 3 additions & 0 deletions .gitignore
@@ -6,3 +6,6 @@ results/
testing/
testing*
*.pyc
.nf-tests/
.nf-test.log
.nf-test/
1 change: 1 addition & 0 deletions .nf-core.yml
@@ -7,3 +7,4 @@ lint:
- config_defaults:
- params.phix_reference
- params.lambda_reference
actions_ci: false
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#543](https://github.com/nf-core/mag/pull/543) - Automatic samplesheet generation for nf-core/phageannotator (@CarsonJM)

### `Changed`

- [#581](https://github.com/nf-core/mag/pull/581) - Added explicit licence text to headers of all custom scripts (reported by @FriederikeHanssen and @maxibor, fix by @jfy133)
1 change: 1 addition & 0 deletions README.md
@@ -38,6 +38,7 @@ The pipeline then:
- Performs ancient DNA validation and repair with [pyDamage](https://github.com/maxibor/pydamage) and [freebayes](https://github.com/freebayes/freebayes)
- optionally refines bins with [DAS Tool](https://github.com/cmks/DAS_Tool)
- assigns taxonomy to bins using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk) and/or [CAT](https://github.com/dutilh/CAT) and optionally identifies viruses in assemblies using [geNomad](https://github.com/apcamargo/genomad), or Eukaryotes with [Tiara](https://github.com/ibe-uw/tiara)
- generates a samplesheet that can be used as input for other nf-core pipelines. Currently, [phageannotator](https://github.com/nf-core/phageannotator) is supported.

Furthermore, the pipeline creates various reports in the results directory specified, including a [MultiQC](https://multiqc.info/) report summarizing some of the findings and software versions.

2 changes: 2 additions & 0 deletions bin/combine_tables.py
@@ -188,6 +188,8 @@ def main(args=None):
how="outer",
)

# sort results for reproducibility
results.sort_values(by="bin", inplace=True, ignore_index=True)
results.to_csv(args.out, sep="\t")
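
Sorting before writing matters here because nf-test snapshots compare output files, so row order has to be independent of the order in which inputs arrive. A minimal standard-library sketch of the same idea is shown below. It is a stand-in for the pandas `sort_values` call, not the script's actual code; the helper name `write_sorted_tsv` and the sample rows are hypothetical.

```python
import csv
import io


def write_sorted_tsv(rows, key="bin"):
    """Return TSV text with rows ordered by `key`, whatever the input order.

    Hypothetical helper: a stdlib stand-in for sorting a DataFrame before
    calling to_csv.
    """
    if not rows:
        return ""
    ordered = sorted(rows, key=lambda row: row[key])
    # Fix the column order too, so the header is also deterministic.
    fieldnames = [key] + sorted(name for name in rows[0] if name != key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"bin": "bin.2", "completeness": "91.0"}, {"bin": "bin.1", "completeness": "75.5"}]
    # Both input orders produce byte-identical text, which is what a snapshot needs.
    print(write_sorted_tsv(sample) == write_sorted_tsv(list(reversed(sample))))
```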


2 changes: 2 additions & 0 deletions bin/summary_busco.py
@@ -219,6 +219,8 @@ def main(args=None):
else:
df_final = df_specific.append(df_failed)

# sort output file for reproducibility
df_final.sort_values(by="GenomeBin", inplace=True)
df_final.to_csv(args.out, sep="\t", index=False)


16 changes: 15 additions & 1 deletion conf/base.config
@@ -122,6 +122,20 @@ process {
memory = { check_max (128.GB * task.attempt, 'memory' ) }
time = { check_max (12.h * task.attempt, 'time' ) }
}
//bowtie2 returns exit code 250 when running out of memory
withName: BOWTIE2_HOST_REMOVAL_ALIGN {
cpus = { check_bowtie2_cpus (8, task.attempt ) }
memory = { check_max (40.GB * task.attempt, 'memory' ) }
time = { check_max (16.h * task.attempt, 'time' ) }
errorStrategy = { task.exitStatus in [143,137,104,134,139,250] ? 'retry' : 'finish' }
}
//bowtie2 returns exit code 250 when running out of memory
withName: BOWTIE2_PHIX_REMOVAL_ALIGN {
cpus = { check_bowtie2_cpus (8, task.attempt ) }
memory = { check_max (40.GB * task.attempt, 'memory' ) }
time = { check_max (16.h * task.attempt, 'time' ) }
errorStrategy = { task.exitStatus in [143,137,104,134,139,250] ? 'retry' : 'finish' }
}
//MEGAHIT returns exit code 250 when running out of memory
withName: MEGAHIT {
cpus = { check_megahit_cpus (8, task.attempt ) }
@@ -147,7 +161,7 @@ process {
}
//returns exit code 247 when running out of memory
withName: BOWTIE2_ASSEMBLY_ALIGN {
cpus = { check_max (2 * task.attempt, 'cpus' ) }
cpus = { check_bowtie2_cpus (8, task.attempt ) }
memory = { check_max (8.GB * task.attempt, 'memory' ) }
time = { check_max (8.h * task.attempt, 'time' ) }
errorStrategy = { task.exitStatus in [143,137,104,134,139,247] ? 'retry' : 'finish' }
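
The `errorStrategy` closures in this file retry a task only when its exit status is in a short list of signal-style codes plus a tool-specific out-of-memory code (250 for the host and PhiX removal alignments, 247 for the assembly alignment), while memory and time scale with `task.attempt`. The sketch below illustrates that logic in Python. The function names are hypothetical, and `scaled_memory_gb` is only an assumed approximation of what `check_max` does with a memory value (multiply by the attempt, cap at a maximum).

```python
# Exit codes shared by every errorStrategy closure shown in the diff.
COMMON_RETRY_CODES = frozenset({143, 137, 104, 134, 139})


def error_strategy(exit_status, tool_oom_code):
    """Return 'retry' for the listed exit codes, otherwise 'finish'."""
    retry_codes = COMMON_RETRY_CODES | {tool_oom_code}
    return "retry" if exit_status in retry_codes else "finish"


def scaled_memory_gb(base_gb, attempt, max_gb):
    """Assumed behaviour of check_max(base * task.attempt, 'memory'): scale, then cap."""
    return min(base_gb * attempt, max_gb)


if __name__ == "__main__":
    print(error_strategy(250, tool_oom_code=250))
    print(error_strategy(1, tool_oom_code=250))
    print(scaled_memory_gb(40, attempt=2, max_gb=128))
```

Each retry therefore asks for more memory than the last, up to the configured ceiling, and any exit code outside the list lets running tasks finish without resubmitting the failed one.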
4 changes: 3 additions & 1 deletion conf/modules.config
@@ -115,7 +115,9 @@ process {
}

withName: BOWTIE2_HOST_REMOVAL_ALIGN {
ext.args = params.host_removal_verysensitive ? "--very-sensitive" : "--sensitive"
ext.args = [
params.host_removal_verysensitive ? "--very-sensitive" : "--sensitive"
].join(' ').trim()
ext.args2 = params.host_removal_save_ids ? "--host_removal_save_ids" : ''
ext.prefix = { "${meta.id}_run${meta.run}_host_removed" }
publishDir = [
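
The change to `ext.args` above replaces a single string with a list that is joined and trimmed, which makes it straightforward to append further conditional flags later. The same pattern in Python is sketched below. The function `build_args` and its `extra_flags` parameter are hypothetical and only illustrate the join-then-trim idiom.

```python
def build_args(very_sensitive, extra_flags=()):
    """Build an argument string from a list of optional parts.

    Mirrors the Groovy [ ... ].join(' ').trim() idiom; extra_flags is a
    hypothetical extension point, not something the config defines.
    """
    parts = ["--very-sensitive" if very_sensitive else "--sensitive", *extra_flags]
    # strip() drops the leading or trailing space left by an empty part.
    return " ".join(parts).strip()


if __name__ == "__main__":
    print(build_args(True))
    print(build_args(False, [""]))
```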
9 changes: 6 additions & 3 deletions conf/test.config
@@ -19,16 +19,19 @@ params {
max_memory = '6.GB'
max_time = '6.h'

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.multirun.csv'
centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
skip_krona = true
min_length_unbinned_contigs = 1
megahit_fix_cpu_1 = true
spades_fix_cpus = 1
bowtie2_fix_cpu_1 = true
maxbin2_fix_cpu_1 = true
binning_map_mode = 'own'
min_length_unbinned_contigs = 1000000
max_unbinned_contigs = 2
busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz"
busco_clean = true
skip_gtdbtk = true
gtdbtk_min_completeness = 0
skip_concoct = true
}
23 changes: 13 additions & 10 deletions conf/test_adapterremoval.config
@@ -21,16 +21,19 @@ params {

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.euk.csv'
centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
metaeuk_db = "https://github.com/nf-core/test-datasets/raw/modules/data/proteomics/database/yeast_UPS.fasta"
clip_tool = 'adapterremoval'
keep_phix = true
centrifuge_db = null
kraken2_db = null
skip_krona = true
min_length_unbinned_contigs = 1
max_unbinned_contigs = 2
busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz"
skip_megahit = true
skip_spades = true
skip_spadeshybrid = true
skip_quast = true
skip_prodigal = true
skip_binning = true
skip_binqc = true
skip_gtdbtk = true
gtdbtk_min_completeness = 0
clip_tool = 'adapterremoval'
skip_concoct = true
bin_domain_classification = true
skip_prokka = true
skip_metaeuk = true
}
45 changes: 26 additions & 19 deletions conf/test_ancient_dna.config
@@ -20,23 +20,30 @@ params {
max_time = '6.h'

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv'
centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
skip_krona = true
min_length_unbinned_contigs = 1
max_unbinned_contigs = 2
busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz"
skip_gtdbtk = true
gtdbtk_min_completeness = 0
ancient_dna = true
binning_map_mode = 'own'
skip_spades = false
skip_spadeshybrid = true
bcftools_view_high_variant_quality = 0
bcftools_view_medium_variant_quality = 0
bcftools_view_minimal_allelesupport = 3
refine_bins_dastool = true
refine_bins_dastool_threshold = 0
skip_concoct = true
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv'
skip_clipping = true
keep_phix = true
kraken2_db = null
centrifuge_db = null
skip_krona = true
megahit_fix_cpu_1 = true
spades_fix_cpus = 1
skip_spadeshybrid = true
ancient_dna = true
skip_quast = true
skip_prodigal = true
bowtie2_fix_cpu_1 = true
binning_map_mode = 'own'
maxbin2_fix_cpu_1 = true
bcftools_view_high_variant_quality = 0
bcftools_view_medium_variant_quality = 0
bcftools_view_minimal_allelesupport = 3
refine_bins_dastool = true
refine_bins_dastool_threshold = 0
min_length_unbinned_contigs = 1
max_unbinned_contigs = 2
skip_binqc = true
skip_gtdbtk = true
skip_prokka = true
skip_metaeuk = true
}