Skip to content

Mapping-Assisted Targeted-Assembly for Metagenomics

License

Notifications You must be signed in to change notification settings

DreadBonney/matam

 
 

Repository files navigation

MATAM

Mapping-Assisted Targeted-Assembly for Metagenomics

Getting Started

The recommended way of getting MATAM is through conda (see below). For SSU rRNA assembly, run:

  1. Getting and indexing default SSU rRNA reference database

index_default_ssu_rrna_db.py -d $DBDIR --max_memory 10000

  1. Running MATAM

    • SSU rRNA recovery only

      matam_assembly.py -d $DBDIR/SILVA_128_SSURef_NR95 -i reads.fastq --cpu 4 --max_memory 10000 -v

    • SSU rRNA recovery and taxonomic assignment

      matam_assembly.py -d $DBDIR/SILVA_128_SSURef_NR95 -i reads.fastq --cpu 4 --max_memory 10000 -v --perform_taxonomic_assignment

      The taxonomic assignment is done with RDP classifier and the training model used by default is "16srrna"

Compiling MATAM from source code

  1. Cloning MATAM repository

git clone https://github.com/bonsai-team/matam.git && cd matam

  1. Compiling MATAM and dependencies

./build.py

  1. (Optional) Getting and indexing default SSU rRNA reference database

./index_default_ssu_rrna_db.py --max_memory 10000

  1. Assembling

$MATAMDIR/bin/matam_assembly.py -i reads.fastq --cpu 4 --max_memory 10000 -v

Hardware requirements

We recommand running MATAM with at least 10Go of free RAM. You can try running MATAM with less RAM if --max_memory is set to a lower value (eg. --max_memory 4000 for 4Go).

Some steps of MATAM are highly paralelized. You can get a significant speed increase during these steps by setting the --cpu option to a higher value

Dependencies

Quick install

To install all of the needed depencies except samtools, you can run the following command-line in Debian-like distributions :

sudo apt-get update && sudo apt-get install curl git gcc g++ python3 default-jdk automake make cmake ant libsparsehash-dev zlib1g-dev bzip2

Since the samtools package in current Ubuntu-like distributions is usualy a deprecated version (v0.1.19), you probably have to get a more recent version. We recommand getting samtools through bioconda (https://bioconda.github.io/)

Full dependencies list

  • gcc v4.9.0 or superior, (full C++11 support, <regex> included, and partial C++14 support)
  • C++ libraries: rt, pthread, zlib
  • Samtools v1.x or superior
  • Python 3
  • automake, make, cmake
  • Apache Ant
  • Java SE 7 JDK. OpenJDK is ok (openjdk-7-jdk paquet on debian)
  • bzip2
  • google sparse hash library (libsparsehash-dev paquet on debian)

MATAM in Docker

https://hub.docker.com/r/bonsaiteam/matam/

To run MATAM using docker, just run:

docker run bonsaiteam/matam matam_assembly.py

MATAM with conda

A conda package is available here: https://anaconda.org/bonsai-team/matam

Before you begin, you should have installed Miniconda or Anaconda. See https://conda.io/docs/installation.html for more details.
Then you will need to add the followings channels:

conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda
conda config --add channels bonsai-team
conda config --add channels salford_systems

Finally, matam can be installed with: conda install matam

Indexing a custom reference database

To run MATAM on a custom reference database, run:

matam_db_preprocessing.py -i ref_db.fasta -d my_ref_db --cpu 4 --max_memory 10000 -v

Running example datasets

The following example datasets are provided:

16 bacterial species simulated dataset

  • Running de-novo assembly

    $MATAMDIR/bin/matam_assembly.py -i $MATAMDIR/examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq --cpu 4 --max_memory 10000 -v

  • Running assembly in validation mode (For developpers. Exonerate must be available in $PATH)

    $MATAMDIR/bin/matam_assembly.py -i $MATAMDIR/examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq --true_references $MATAMDIR/examples/16sp_simulated_dataset/16sp.fasta --true_ref_taxo $MATAMDIR/examples/16sp_simulated_dataset/16sp.taxo.tab --cpu 4 --max_memory 10000 --debug

Release versioning

MATAM releases will be following the Semantic Versioning 2.0.0 rules described here: http://semver.org/spec/v2.0.0.html

About

Mapping-Assisted Targeted-Assembly for Metagenomics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 78.9%
  • C++ 20.5%
  • Other 0.6%