Taxonomy miscongruence
To compute the taxonomy miscongruence of a tree, the NCBI taxonomy must be installed into an SQL Server database:
$TT/ncbitax/load.sh <Server> <Database> NCBI
Here $TT is the installation directory defined in the section Installation.
This command will install the NCBI taxonomy on the server <Server>, database <Database>, schema NCBI.
A user must have DDL permissions in the database, including create schema.
Check the integrity of the installed taxonomy:
$TT/ncbitax/integrity.sh <Server> <Database> NCBI
The tax_ids printed by this script have inconsistent taxonomic lineages.
This SQL command will print the encoded lineage for Aquabacterium pictum:
EXEC NCBI.tax2phen 2315236, 0;
Let the file genome.tab contain two columns: NCBI assembly id and NCBI taxid:
$ head -3 genome.tab
3378 441960
6778 5762
7048 423536
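Each line of genome.tab becomes one file in phen/, so a malformed line is easy to miss. A minimal bash sketch for checking the file first; check_genome_tab is a hypothetical helper, not part of the package, and it assumes whitespace-separated columns with an integer taxid in the second column:

```shell
# Hypothetical helper (not part of the package): report lines of a
# genome.tab-style file that do not have exactly two whitespace-separated
# columns with an integer NCBI taxid in the second column.
check_genome_tab() {
  awk 'NF != 2 || $2 !~ /^[0-9]+$/ {
         print FILENAME ":" NR ": " $0   # show the offending line
         bad = 1
       }
       END { exit bad }' "$1"           # exit status 1 if any line is bad
}
```

Usage: check_genome_tab genome.tab && echo OK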
Create a directory phen/ with encoded taxonomic lineages for each assembly:
mkdir phen
$TT/trav genome.tab -threads 10 "sqsh-ms -S <Server> -D <Database> -w 1024 \
-C %QNCBI.tax2phen %2, 0%Q | grep -v '(return status = 0)' > phen/%1"
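The trav command above can be read as the following sequential loop. This is a sketch under assumptions: %1 and %2 are taken to stand for the first and second columns of genome.tab and %Q for a double quote, and build_phen is a hypothetical name. Unlike trav with -threads 10, it processes the assemblies one at a time.

```shell
# Hypothetical sequential equivalent of the trav command, assuming
# %1 = assembly id (column 1), %2 = taxid (column 2), %Q = double quote.
# Usage: build_phen GENOME_TAB SERVER DATABASE
build_phen() {
  local genome_tab=$1 server=$2 database=$3
  mkdir -p phen
  while read -r asm taxid; do
    # </dev/null keeps sqsh-ms from consuming the rest of genome.tab on stdin
    sqsh-ms -S "$server" -D "$database" -w 1024 \
      -C "NCBI.tax2phen $taxid, 0" </dev/null \
      | grep -v '(return status = 0)' > "phen/$asm"
  done < "$genome_tab"
}
```

Usage, from the directory that should contain phen/: build_phen genome.tab <Server> <Database>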
This directory is used in the sections below.
The name "phen" stands for "phenotypes": Boolean or nominal attributes of the tree objects.
These attributes are not used in tree building.
Suppose a tree file tree, created by makeDistTree, contains assemblies in phen/.
Solve the maximum parsimony problem for each taxonomic class present in the assemblies of the tree:
$TT/phylogeny/tree_quality_phen.sh tree "" phen 0 1 ""
The screen output contains a line which looks like:
# Non-monophyletic disagreements: 19380 (1.57) V !
The number 19380 is the taxonomy miscongruence.
The symbol "V" means that lower values are better.
The trailing symbol "!" makes this line easy to find with grep.
The value 1.57 in parentheses is the taxonomy miscongruence divided by the number of assemblies in the tree.
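To pick this line out in a script, a filter like the following bash sketch may help. The function name miscongruence is hypothetical, and it assumes the field layout of the example line above (the count in field 4, the parenthesized ratio in field 5):

```shell
# Hypothetical helper: read the screen output of tree_quality_phen.sh on
# stdin and print the taxonomy miscongruence and the per-assembly value.
miscongruence() {
  grep 'Non-monophyletic disagreements:.*!$' \
    | awk '{ per = $5; gsub(/[()]/, "", per); print $4, per }'
}
```

Usage (2>&1 in case the line is written to standard error):
$TT/phylogeny/tree_quality_phen.sh tree "" phen 0 1 "" 2>&1 | miscongruence
For the example line above it prints 19380 1.57. Because 1.57 is rounded to two decimals, it gives the number of assemblies only approximately: 19380 / 1.57 ≈ 12344.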
Suppose two trees of assemblies are stored in the files tree1 and tree2, created by makeDistTree or converted to the makeDistTree format, and their assemblies are in phen/.
Then this command creates two intersection trees and computes the taxonomy miscongruence for each of them:
$TT/phylogeny/tree2_quality_phen.sh tree1 tree2 phen 0 ""
Suppose again that a tree file tree, created by makeDistTree, contains assemblies in phen/.
To list the taxonomic names gained at the nodes of the tree, solve the maximum parsimony problem for each taxonomic class present in the assemblies of the tree:
$TT/phylogeny/tree2names.sh tree phen 0
As a result, a file gain_nodes is created, where each line has the format
<asm1>:<asm2> <taxonomic name>
which means that <taxonomic name> is gained at the least common ancestor of the assemblies <asm1> and <asm2>.
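A quick summary of gain_nodes is the number of nodes at which each name is gained; under the parsimony solution, a name gained at more than one node suggests that the taxon is not monophyletic in the tree. A bash sketch, with the hypothetical helper name gains_per_name and the assumption that the first field is separated from the name by whitespace:

```shell
# Hypothetical helper: count the nodes at which each taxonomic name is
# gained, reading a gain_nodes file; most frequently gained names first.
gains_per_name() {
  # Drop the leading <asm1>:<asm2> field and the whitespace after it;
  # the rest of the line is the name, which may contain spaces.
  sed 's/^[^[:space:]]*[[:space:]]*//' "$1" \
    | sort | uniq -c | sort -k1,1nr
}
```

Usage: gains_per_name gain_nodes | head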
Directory phen/ in an Incremental distance tree directory
If an incremental distance tree directory has a subdirectory phen/ that is a link to a directory with encoded taxonomic lineages for each reservoir object, then the taxonomy miscongruence of the initial tree can be computed by
$TT/phylogeny/distTree_inc_tree1_quality.sh
This script is also invoked by distTree_inc.sh and distTree_inc_delete.sh.