Transrate De novo Transcriptome Evaluation
Transrate is software for evaluating the quality of de novo transcriptome assemblies built from paired-end reads.
It does this using three kinds of metrics:
Contig metrics - Calculates simple statistics of the contigs, such as N50.
Read mapping metrics - Maps the paired-end reads back to the contigs and penalizes assembly errors such as chimerism.
Comparative metrics - Aligns the contigs to a reference from a closely related species.
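To make the N50 statistic mentioned under Contig metrics concrete, here is a minimal sketch of how it can be computed from a FASTA file with standard tools. It is an illustration only, not Transrate's own implementation; the function name `n50` is made up for this example, and the definition used is the common one (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length).

```shell
# n50 FASTA_FILE: print the N50 of the sequences in a FASTA file.
# Hypothetical helper for illustration; not part of Transrate.
n50() {
    awk '
        # Header line: emit the length of the previous contig, then reset.
        /^>/ { if (len > 0) print len; len = 0; next }
        # Sequence line: strip whitespace and add to the current contig length.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        # End of file: emit the last contig.
        END { if (len > 0) print len }
    ' "$1" |
    sort -rn |
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                # First contig at which the running sum covers half the assembly.
                if (running * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

Calling `n50 path/Trinity.fasta` prints a single number. For contigs of length 10, 8, 6, 4 and 2 (30 bases in total) the result is 8, because 10 + 8 is the first running total that reaches 15.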
You don't need to use all of these in an evaluation. Comparative metrics cannot be used without a good reference, and even with one they may not be ideal, because they penalize potentially biologically correct novelty in the assembly. The Read mapping metrics, which require only the reads and the assembly, are recommended instead.
Transrate will generate an assembly score, from 0 to 1, with 1 being the maximum score and 0 the lowest.
As a point of reference for scoring, the Transrate authors analysed 155 published transcriptomes: the scores ranged from 0.001 to 0.52, with around 50% of the assemblies scoring above 0.3.
You can read more here: http://biorxiv.org/content/biorxiv/early/2015/06/27/021626.full.pdf
The total assembly score is based on the score of each individual contig in the assembly, and beyond generating this score, Transrate also creates two output files containing "bad contigs" and "good contigs" filtered based on the contig scores.
NOTE: It is important that all paired-end reads are actually paired (the input must not contain orphan reads).
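A quick sanity check for the pairing requirement is to compare the number of records in the left and right files. The sketch below assumes uncompressed FASTQ with exactly four lines per record; the function names are made up for this example. Equal counts are necessary but not sufficient: the check does not verify that read names match or that the reads are in the same order.

```shell
# count_fastq_records FILE: print the number of records, assuming 4 lines each.
# Hypothetical helper for illustration; not part of Transrate.
count_fastq_records() {
    awk 'END { print int(NR / 4) }' "$1"
}

# check_paired LEFT RIGHT: succeed only if both files hold the same number of records.
check_paired() {
    left_n=$(count_fastq_records "$1")
    right_n=$(count_fastq_records "$2")
    if [ "$left_n" -eq "$right_n" ]; then
        echo "OK: $left_n read pairs"
    else
        echo "MISMATCH: $left_n left reads vs $right_n right reads" >&2
        return 1
    fi
}
```

For example, `check_paired /path/forward_reads_sample1.fastq /path/reverse_reads_sample1.fastq` returns a non-zero exit status if the counts differ, which suggests that orphan reads are present in one of the files.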
Using Transrate is simple.
Example: evaluating an assembly constructed from two sets of paired-end reads, using 10 CPU threads:
transrate --threads 10 --assembly path/Trinity.fasta --left /path/forward_reads_sample1.fastq,/path/forward_reads_sample2.fastq --right /path/reverse_reads_sample1.fastq,/path/reverse_reads_sample2.fastq --output /path/results_directory
NOTE: If you do not specify reads, the Read mapping metrics will not be used. If you do not specify a reference, the Comparative metrics will not be used. Contig metrics will always be provided.
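With many samples, typing the comma-separated file lists by hand is error-prone. The following is a small sketch of a helper that joins file names with commas; `join_by_comma` is a made-up name for this example and is not part of Transrate.

```shell
# join_by_comma FILE...: print the arguments joined with commas (no spaces).
# Hypothetical helper for illustration; not part of Transrate.
join_by_comma() {
    # In a subshell, "$*" joins the arguments with the first character of IFS.
    (IFS=,; printf '%s\n' "$*")
}

# Possible use (all paths are placeholders):
#   transrate --assembly path/Trinity.fasta \
#     --left  "$(join_by_comma /path/forward_reads_*.fastq)" \
#     --right "$(join_by_comma /path/reverse_reads_*.fastq)"
```

Shell globs expand in sorted order, so the left and right lists line up only if the forward and reverse files are named consistently. It is worth echoing both lists and checking them before starting a run.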
Transrate command-line options:
--assembly
Assembly file(s) in FASTA format, comma-separated
--left
Left reads file(s) in FASTQ format, comma-separated
--right
Right reads file(s) in FASTQ format, comma-separated
--reference
Reference proteome or transcriptome file in FASTA format
--threads
Number of threads to use (default: 8)
--merge-assemblies
Merge best contigs from multiple assemblies into file
--output
Directory where results are output (will be created) (default: transrate_results)
--loglevel
Log level. One of [error, info, warn, debug] (default: info)
--install-deps
Install any missing dependencies. One of [ref]
--examples
Show some example commands with explanations