Taxonomic classification, content summarization and gene identification: all-in-1 metagenomic analysis toolkit
LMAT's main goal is to efficiently assign taxonomic labels to the reads with reference representation down to the species level while maintaining accuracy in the presence of novel organisms. Scalable performance is demonstrated on real and simulated data to show accurate classification even with novel genomes on samples that include viruses, prokaryotes, fungi and protists.
LMAT has three related subcomponents (taxonomic profiling, content summarization and gene annotation) that can be run separately.
The quick installation procedure uses CMake to download, build and install all the required packages. Requirements:
- CMake 3
- C/C++ compiler with OpenMP support (like gcc, clang, icc, xlc)
- Recommended: Python, for some tools
- Optional: MPI, for use in building a Reference Database
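Before starting the build, it can help to confirm that the prerequisites are visible on the PATH. The following is a minimal sketch, not part of LMAT: check_tool is a hypothetical helper, and the tool names (cmake, cc, c++, python, mpicc) are assumptions that may differ on your system. It only checks that a command exists; it does not verify the CMake version or OpenMP support.

```shell
#!/bin/sh
# Report whether a build prerequisite is available on PATH.
# check_tool is a hypothetical helper, not part of LMAT.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tool names are assumptions; adjust them to your system
# (for example, CMake 3 is packaged as "cmake3" on some distributions).
for tool in cmake cc c++ python mpicc; do
    check_tool "$tool" || true   # keep going so every tool is reported
done
```

A "missing" line for python or mpicc is not necessarily a problem, since those are listed as recommended and optional respectively.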
redoall is a convenient wrapper that will direct the installation through CMake for typical compilers (GNU gcc, clang/LLVM, Intel C/C++ compilers and IBM XL compilers for Power 8 and 9):
usage: redoall [profile] [compiler]
The 1st optional parameter chooses the build profile of CMake:
- D for Debug
- R for Release (this is the current default)
- I for release with debug info (RelWithDebInfo)
- M for release with minimum size (MinSizeRel)
- clean for just cleaning
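The mapping between profile letters and CMake build types can be sketched as a small shell function. This is an illustration only: profile_to_build_type is a hypothetical name and the code is not taken from redoall, whose actual implementation may differ. The clean option is deliberately left out because it is not a build type.

```shell
#!/bin/sh
# Hypothetical helper: translate a redoall profile letter into the
# CMake build type named in the list above. Not taken from redoall itself.
profile_to_build_type() {
    case "${1:-R}" in          # R (Release) is the documented default
        D) echo "Debug" ;;
        R) echo "Release" ;;
        I) echo "RelWithDebInfo" ;;
        M) echo "MinSizeRel" ;;
        *) echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# Example: look up the build type for the "I" profile.
profile_to_build_type I
```

The resulting names are the standard values accepted by CMake's CMAKE_BUILD_TYPE variable.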
The 2nd optional parameter selects the compiler family:
- gnu for using GCC
- intel for using Intel compilers
- clang for using clang compilers
- ibmpwr9 for compiling on Power 9 with IBM compilers
- ibmpwr8 for compiling on Power 8 with IBM compilers
Example of a default build (Release profile):
git clone https://github.com/LivGen/LMAT.git
cd LMAT
./redoall
Example of a Debug build with the Intel compilers:
git clone https://github.com/LivGen/LMAT.git
cd LMAT
./redoall D intel
- Scientific details about LMAT are explained in the article "Scalable metagenomic taxonomy classification using a reference genome database".
- Further information about LMAT can be found in the article "Using populations of human and microbial genomes for organism detection in metagenomes".
- Please refer to documentation in the 'doc' subdirectory for technical information on LMAT.
- LMAT web site at LLNL.
- This is an example of an LMAT run.
If you are analyzing more than one sample with LMAT you can easily visualize and compare them using Recentrifuge: Robust comparative analysis and contamination removal for metagenomic data.
With a score-oriented approach, Recentrifuge is especially useful in low biomass metagenomic studies or when a more reliable detection of minority organisms is needed, such as in clinical, environmental and forensic analysis. Further details are in the bioRxiv pre-print.
For usage and documentation, please see running Recentrifuge for LMAT in the Recentrifuge wiki.
LMAT uses PERM, a C library for persistent heap management used with a dynamic-memory allocator, also developed at LLNL. For PERM (and therefore LMAT) to work properly, some kernel tuning is advisable:
- Turn off periodic flush to file and dirty ratio flush:
echo 0 > /proc/sys/vm/dirty_writeback_centisecs
echo 100 > /proc/sys/vm/dirty_background_ratio
echo 100 > /proc/sys/vm/dirty_ratio
- Turn off address space randomization:
echo 0 > /proc/sys/kernel/randomize_va_space
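Writing to these files requires root privileges, and values written under /proc/sys do not persist across reboots. Before changing anything, it may be useful to record the current values so they can be restored later. The following is a minimal sketch under that assumption: report_settings and the PROC_SYS variable are hypothetical names introduced here, and PROC_SYS is overridable only so that the sketch can be tried against a scratch directory.

```shell
#!/bin/sh
# Print "<name> = <value>" for each kernel parameter touched by the tuning above.
# report_settings and PROC_SYS are hypothetical names, not part of LMAT or PERM.
PROC_SYS="${PROC_SYS:-/proc/sys}"

report_settings() {
    for key in vm/dirty_writeback_centisecs vm/dirty_background_ratio \
               vm/dirty_ratio kernel/randomize_va_space; do
        if [ -r "$PROC_SYS/$key" ]; then
            printf '%s = %s\n' "$key" "$(cat "$PROC_SYS/$key")"
        else
            printf '%s = (unreadable)\n' "$key"
        fi
    done
}

report_settings
```

Saving this output before applying the tuning gives a simple record of the values to write back afterwards.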
==============================================
Livermore Metagenomics Analysis Toolkit
==============================================