Galaxy for virologist training Exercise 4: Nanopore mapping 101

Title	Galaxy
Training dataset:	Nanopore MinION sequencing of a monkeypox virus (MPXV) sample from the Spain 2022 outbreak. Data is publicly available at SRA under run ID ERR10297654. Paper
Questions:	How does mapping Nanopore reads differ from mapping Illumina reads?
Objectives:	Understand the concept of mapping long reads to a reference genome. Learn how to interpret mapping quality control metrics.
Estimated time:	40 min

1. Description

Nanopore technology is a third-generation sequencing technique that produces longer reads, but with lower per-base quality. Each sequencing technology has its own formats, quality profile, and known biases, so the analysis differs between them. In this tutorial, we are going to see an example of how to map long reads from a Nanopore sequencing run to a reference genome.
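To make "lower per-base quality" concrete: FASTQ files store one Phred quality score Q per base, and the estimated error probability for that base is 10^(-Q/10). The following is a minimal Python sketch, assuming the standard Phred+33 ASCII encoding. The function names and the example quality string are hypothetical and are not taken from ERR10297654.

```python
def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality_string]


def phred_to_error_prob(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def mean_error_prob(quality_string):
    """Average per-base error probability of one read."""
    scores = decode_quality(quality_string)
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


# Hypothetical quality string, for illustration only.
example_quality = "+5?I"
print(decode_quality(example_quality))   # Phred scores 10, 20, 30, 40
print(mean_error_prob(example_quality))  # dominated by the Q10 base
```

Note that the mean error probability is dominated by the lowest-quality bases, which is why averaging Phred scores directly tends to overestimate read quality.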

2. Upload data to Galaxy

Training dataset

[SRA ID: ERR10297654](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=ERR10297654&display=metadata)

Create new history

Click the + icon at the top of the history panel and create a new history with the name nanopore assembly 101 tutorial as explained here

Upload data

Look for SRA in the tool search bar and select Faster Download and Extract Reads in FASTQ format from NCBI SRA
Accession = ERR10297654
Execute

Load reference file from NCBI

Search NCBI using the search toolbox and select NCBI Accession Download Download sequences from GenBank/RefSeq by accession through the NCBI ENTREZ API
Select source for IDs > Direct entry
ID List = NC_063383.1
Execute
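
Outside Galaxy, the same reference can be requested from the NCBI Entrez E-utilities `efetch` endpoint. The sketch below only builds the request URL and does not contact NCBI. It assumes the `nuccore` database and FASTA output; the function name `build_efetch_url` is hypothetical, and the Galaxy tool may call the API differently.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession):
    """Build an efetch URL that returns one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # nucleotide database (assumed for this accession)
        "id": accession,      # GenBank/RefSeq accession
        "rettype": "fasta",   # return the sequence in FASTA format
        "retmode": "text",    # plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


url = build_efetch_url("NC_063383.1")
print(url)
```

The resulting URL can be opened in a browser or passed to `urllib.request.urlopen` to save the FASTA file locally.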

Unhide data

The SRA and NCBI API tools download data as hidden datasets, so we are going to unhide them as follows:

Click on the strikethrough eye (Show hidden)
Click the strikethrough eye for the ERR10297654 and NC_063383.1 datasets.
Then select the location icon (show active)

Mapping with Minimap2

Search minimap2 using the search toolbox and select Map with minimap2 A fast pairwise aligner for genomic and spliced nucleotide sequences
Will you select a reference genome from your history or use a built-in index?: Use a genome from history and build index
- Select NC_063383.1
Select fastq dataset: ERR10297654
Select a profile of preset options > Oxford Nanopore Read to reference mapping (map-ont)
Click execute and wait.
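
For orientation, the Galaxy form above corresponds roughly to a `minimap2` command line using the `map-ont` preset with SAM output. The sketch below only assembles and prints that command; it does not run it. The file names and the function name `build_minimap2_command` are hypothetical, and the exact command the Galaxy wrapper runs (including conversion to sorted BAM) is not shown here.

```python
import shlex


def build_minimap2_command(reference_fasta, reads_fastq, preset="map-ont"):
    """Return the argument list for mapping long reads to a reference.

    -a           write alignments in SAM format
    -x PRESET    apply a preset; map-ont targets Oxford Nanopore reads
    """
    return ["minimap2", "-a", "-x", preset, reference_fasta, reads_fastq]


# Hypothetical local file names for the datasets used in this tutorial.
cmd = build_minimap2_command("NC_063383.1.fasta", "ERR10297654.fastq")
print(shlex.join(cmd))
```

Keeping the command as a list makes it safe to pass to `subprocess.run` on a machine where minimap2 is installed, with standard output redirected to a SAM file.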

Mapping stats with samtools

Search flagstat using the search toolbox and select Samtools flagstat tabulate descriptive stats for BAM dataset
BAM File to report statistics of > Select Minimap2 bam output
Click execute and wait.
Click on the 👁️ icon to view the BAM stats.

What is the mapping rate?

5.30%

How many reads do we have in our dataset?

21868
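
The mapping rate is the number of mapped records divided by the total number of records, as reported by flagstat. Below is a minimal Python sketch that recomputes it from flagstat's default text output. The function names are hypothetical, and the example report uses made-up counts chosen for illustration; it is not the output for ERR10297654.

```python
import re

# Each flagstat line looks like: "<QC-passed> + <QC-failed> <description>"
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Extract the QC-passed 'total' and 'mapped' counts from flagstat text."""
    counts = {}
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        passed, label = int(match.group(1)), match.group(3)
        if label.startswith("in total"):
            counts["total"] = passed
        elif label.startswith("mapped ("):
            # Skips lines such as "primary mapped" or "with mate mapped ...".
            counts["mapped"] = passed
    return counts


def mapping_rate(counts):
    """Mapping rate as a percentage of all records."""
    return 100.0 * counts["mapped"] / counts["total"]


# Hypothetical flagstat report with made-up counts.
example_report = """
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
53 + 0 mapped (5.30% : N/A)
"""

counts = parse_flagstat(example_report)
print(f"{mapping_rate(counts):.2f}%")  # prints 5.30%
```

Be aware that the "in total" line counts alignment records, so secondary and supplementary alignments, when present, make it larger than the number of reads.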

This training history is available at: https://usegalaxy.eu/u/s.varona/h/nanopore-assembly-101-tutorial

Note: Nanopore reads are known to have a higher error rate than short reads. This is why assembly post-processing is strongly recommended, usually with a combined sequencing approach that uses both Nanopore and Illumina reads.
