Merge pull request #56 from genepi/code_refactoring
Code refactoring
seppinho authored Aug 7, 2023
2 parents bae68d0 + a886f08 commit b8f4c6c
Showing 47 changed files with 500 additions and 1,110 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
```diff
@@ -1,8 +1,8 @@
-# ecSeq/DNAseq
+# genepi/umi-pipeline-nf
 ---
 # Releases
 
 ---
 # Prereleases
 ## v0.1.0 -
-* Initialised repo
+* Initialised repo
```
43 changes: 30 additions & 13 deletions README.md
@@ -1,18 +1,34 @@
[<img width="200" align="right" src="docs/images/ecseq.jpg">](https://www.ecseq.com)
[![Nextflow](https://img.shields.io/badge/nextflow-20.07.1-brightgreen.svg)](https://www.nextflow.io/)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/)
[![Docker](https://img.shields.io/docker/automated/ecseq/dnaseq.svg)](https://hub.docker.com/r/ecseq/dnaseq)

umi-pipeline-nf Pipeline
Umi-pipeline-nf
======================

**umi-pipeline-nf** is based on a [snakemake pipeline](https://github.com/nanoporetech/pipeline-umi-amplicon) provided by [Oxford Nanopore Technologies (ONT)](https://nanoporetech.com/). To increase efficiency and usability the pipeline was transferred to [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation simple and results highly reproducible.
**Umi-pipeline-nf** creates highly accurate single-molecule consensus sequences for unique molecular identifier (UMI)-tagged amplicon data.
The pipeline can be run on the whole fastq_pass folder of your nanopore run and, by default, outputs the aligned consensus sequences of each UMI cluster in a bam file. The optional variant calling step creates a vcf file for all variants found in the consensus sequences.
umi-pipeline-nf is based on the snakemake [ONT UMI analysis pipeline](https://github.com/nanoporetech/pipeline-umi-amplicon) (workflow originally developed by [Karst et al., Nat Methods 18:165–169, 2021](https://www.nature.com/articles/s41592-020-01041-y)). We transferred the pipeline to [Nextflow](https://www.nextflow.io) and included [additional functionalities](#main-adaptations).

## Overview
`umi-pipeline-nf` creates highly accurate single-molecule consensus sequences based on amplicon data tagged by unique molecular identifiers (UMIs). The pipeline can be run for the whole fastq_pass folder of your nanopore run and per default, the output are the aligned consensus sequences in bam file format.
Additional flags can be set to perform a variant calling ( [freebayes](https://github.com/freebayes/freebayes), [lofreq](http://csb5.github.io/lofreq/) or [mutserve](https://mitoverse.readthedocs.io/mutserve/mutserve/) )
## Workflow

> See the [output documentation](docs/output.md) for more details of the results.
1. Input reads are aligned against a reference genome.
2. The flanking UMI sequences of all reads are extracted.
3. The extracted UMIs are used to cluster the reads.
4. Per cluster, highly accurate consensus sequences are created.
5. The consensus sequences are aligned against the reference sequence.
6. An optional variant calling step can be performed.

> See the [output documentation](docs/output.md) for a detailed overview of the pipeline and its output files.
## Main Adaptations

* It comes with docker containers making **installation simple, portable** and **results highly reproducible**.
* The pipeline is **optimized for parallelization**.
* Read filtering strategy per UMI cluster was adapted to **preserve the highest quality reads**.
* **Three commonly used variant callers** ([freebayes](https://github.com/freebayes/freebayes), [lofreq](http://csb5.github.io/lofreq/) and [mutserve](https://mitoverse.readthedocs.io/mutserve/mutserve/)) are supported by the pipeline.
* The raw reads can be optionally **subsampled**.
* The raw reads can be **filtered by read length and quality**.

> See the [usage documentation](docs/usage.md) for all of the available parameters of the pipeline.
## Quick Start

@@ -21,19 +37,20 @@ Additional flags can be set to perform a variant calling ( [freebayes](https://g
2. Download the pipeline and test it on a minimal dataset with a single command

```bash
nextflow run AmstlerStephan/umi-pipeline-nf -profile test,docker
nextflow run genepi/umi-pipeline-nf -profile test,docker
```

3. Start running your own analysis!
3.1 Download and adapt the config/custom.config with paths to your data (relative and absolute paths possible)

```bash
nextflow run AmstlerStephan/umi-pipeline-nf -r main -c <custom.config> -profile docker
nextflow run genepi/umi-pipeline-nf -r main -c <custom.config> -profile docker
```

> See the [usage documentation](docs/usage.md) for all of the available options when running the pipeline.

### Credits

These scripts were originally written for use by [GENEPI](https://genepi.i-med.ac.at/), by ([@StephanAmstler](https://github.com/AmstlerStephan)).
The pipeline was written by ([@StephanAmstler](https://github.com/AmstlerStephan)).
Nextflow template pipeline: [EcSeq](https://github.com/ecSeq).
Original snakemake-based pipeline: [nanoporetech/pipeline-umi-amplicon](https://github.com/nanoporetech/pipeline-umi-amplicon).
Original workflow: [SorenKarst/longread_umi](https://github.com/SorenKarst/longread_umi).
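Step 3 of the README's Workflow section clusters reads by their extracted UMIs. As a rough illustration only, the Python sketch below groups reads whose UMIs lie within a small edit distance of a cluster "seed". It is not the pipeline's implementation: the helper names `levenshtein` and `greedy_cluster`, the greedy seed strategy, and the distance threshold are all hypothetical choices made for this example.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def greedy_cluster(umis: Dict[str, str], max_dist: int) -> List[List[str]]:
    """Hypothetical helper: assign each read to the first cluster whose
    seed UMI is within max_dist edits, otherwise start a new cluster.

    `umis` maps read name -> extracted UMI; returns lists of read names.
    """
    seeds: List[str] = []
    clusters: List[List[str]] = []
    for read, umi in umis.items():
        for idx, seed in enumerate(seeds):
            if levenshtein(umi, seed) <= max_dist:
                clusters[idx].append(read)
                break
        else:
            # No seed close enough: this UMI becomes a new seed.
            seeds.append(umi)
            clusters.append([read])
    return clusters


if __name__ == "__main__":
    reads = {
        "read1": "ACGTACGTAC",
        "read2": "ACGTACGTAA",  # one substitution away from read1
        "read3": "TTTTGGGGCC",
        "read4": "ACGTACGTAC",
    }
    print(greedy_cluster(reads, max_dist=2))  # [['read1', 'read2', 'read4'], ['read3']]
```

Because seeds are fixed at their first occurrence, the result of this single pass depends on read order, which is one reason to treat it purely as an illustration of the idea rather than a substitute for a dedicated clustering tool.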
149 changes: 0 additions & 149 deletions bin/bam_to_phred.py

This file was deleted.

30 changes: 26 additions & 4 deletions bin/extract_umis.py
```diff
@@ -52,13 +52,26 @@ def parse_args(argv):
         help="Length of adapter",
     )
     parser.add_argument(
-        "-t", "--threads", dest="THREADS", type=int, default=1, help="Number of threads."
+        "-t",
+        "--threads",
+        dest="THREADS",
+        type=int,
+        default=1,
+        help="Number of threads."
     )
     parser.add_argument(
-        "--tsv", dest="TSV", action="store_true", help="write TSV output file"
+        "--tsv",
+        dest="TSV",
+        action="store_true",
+        help="write TSV output file"
     )
     parser.add_argument(
-        "-o", "--output", dest="OUT", type=str, required=False, help="Output directory"
+        "-o",
+        "--output",
+        dest="OUT",
+        type=str,
+        required=False,
+        help="Output directory"
     )
     parser.add_argument(
         "--output_format",
@@ -82,7 +95,10 @@ def parse_args(argv):
         help="Reverse UMI sequence",
     )
     parser.add_argument(
-        "INPUT_FA", type=str, default="/dev/stdin", help="Filtered Reads"
+        "INPUT_FA",
+        type=str,
+        default="/dev/stdin",
+        help="Filtered Reads"
     )
 
     args = parser.parse_args(argv)
@@ -109,8 +125,10 @@ def extract_umi(query_seq, query_qual, pattern, max_edit_dist, format):
     edit_dist = result["editDistance"]
     locs = result["locations"][0]
     umi = query_seq[locs[0]:locs[1]+1]
+
     if format == "fastq":
         umi_qual = query_qual[locs[0]:locs[1]+1]
+
     return edit_dist, umi, umi_qual
 
 
@@ -123,15 +141,18 @@ def extract_adapters(entry, max_adapter_length, format):
     if len(entry.sequence) > max_adapter_length:
         read_5p_seq = entry.sequence[:max_adapter_length]
         read_3p_seq = entry.sequence[-max_adapter_length:]
+
         if format == "fastq":
             read_5p_qual = entry.quality[:max_adapter_length]
             read_3p_qual = entry.quality[-max_adapter_length:]
+
     return read_5p_seq, read_3p_seq, read_5p_qual, read_3p_qual
 
 
 def get_read_name(entry):
     return entry.name.split(";")[0]
 
+
 def get_read_strand(entry):
     strand = entry.name.split("strand=")
     if len(strand) > 1:
@@ -140,6 +161,7 @@ def get_read_strand(entry):
     else:
         return "+"
 
+
 def combine_umis_fasta(seq_5p, seq_3p, strand):
     if strand == "+":
         return seq_5p + seq_3p
```
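The bin/extract_umis.py hunks only reflow the `parser.add_argument(...)` calls onto one keyword per line, which does not change how argparse behaves. The sketch below rebuilds just the options visible in the diff so the refactored form can be exercised directly; the function name `build_parser` is hypothetical and the script's other options are omitted. One detail worth checking in the real script: argparse applies a positional argument's `default` only when `nargs="?"` (or `"*"`) is set, so as written in the diff `INPUT_FA` is still required and `/dev/stdin` is never used. The sketch adds `nargs="?"` so the default takes effect; whether that matches the authors' intent is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the extract_umis.py options shown in the diff.
    parser = argparse.ArgumentParser(description="UMI extraction (sketch)")
    parser.add_argument(
        "-t",
        "--threads",
        dest="THREADS",
        type=int,
        default=1,
        help="Number of threads."
    )
    parser.add_argument(
        "--tsv",
        dest="TSV",
        action="store_true",
        help="write TSV output file"
    )
    parser.add_argument(
        "-o",
        "--output",
        dest="OUT",
        type=str,
        required=False,
        help="Output directory"
    )
    parser.add_argument(
        "INPUT_FA",
        type=str,
        nargs="?",  # added in this sketch: without it the default below is ignored
        default="/dev/stdin",
        help="Filtered Reads"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-t", "4", "--tsv", "-o", "out", "reads.fastq"])
    print(args.THREADS, args.TSV, args.OUT, args.INPUT_FA)  # 4 True out reads.fastq
```

Calling `build_parser().parse_args([])` in this sketch yields `INPUT_FA == "/dev/stdin"`; removing the `nargs="?"` line instead makes argparse exit with a "required" error when no input file is given.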
2 changes: 0 additions & 2 deletions bin/setup.py
```diff
@@ -15,7 +15,6 @@
     description='Toolset to work with ONT amplicon sequencing using UMIs',
     zip_safe=False,
     install_requires=[
-        'tqdm',
         'pysam',
         'numpy',
         'pandas',
@@ -32,7 +31,6 @@
             'umi_extract = umi_amplicon_tools.extract_umis:main',
             'umi_reformat_consensus = umi_amplicon_tools.reformat_consensus:main',
             'umi_parse_clusters = umi_amplicon_tools.parse_clusters:main',
-            'umi_bam_to_phred = umi_amplicon_tools.bam_to_phred:main',
             'umi_stats = umi_amplicon_tools.umi_stats:main'
         ]
     },
```
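Each console script entry in bin/setup.py uses the `command = module:function` form, so dropping the `umi_bam_to_phred` line means that command is no longer installed now that bam_to_phred.py is deleted. The sketch below uses the standard library's `importlib.metadata.EntryPoint` (Python 3.9+ for the `.module` and `.attr` properties) to show how such strings decompose. The helper `parse_console_scripts` is hypothetical, and the list holds only the entries visible in the diff, not necessarily the complete set.

```python
from importlib.metadata import EntryPoint

# Console script entries visible in the setup.py diff after this commit.
CONSOLE_SCRIPTS = [
    "umi_extract = umi_amplicon_tools.extract_umis:main",
    "umi_reformat_consensus = umi_amplicon_tools.reformat_consensus:main",
    "umi_parse_clusters = umi_amplicon_tools.parse_clusters:main",
    "umi_stats = umi_amplicon_tools.umi_stats:main",
]


def parse_console_scripts(specs):
    """Hypothetical helper: turn 'command = module:function' strings into EntryPoints."""
    entry_points = []
    for spec in specs:
        name, value = (part.strip() for part in spec.split("=", 1))
        entry_points.append(EntryPoint(name=name, value=value, group="console_scripts"))
    return entry_points


if __name__ == "__main__":
    for ep in parse_console_scripts(CONSOLE_SCRIPTS):
        # e.g. "umi_extract: import umi_amplicon_tools.extract_umis, call main()"
        print(f"{ep.name}: import {ep.module}, call {ep.attr}()")
```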