seq_consensus is a simple Python 3 library for calculating consensus sequences. Ambiguous (IUPAC) nucleotide codes in the input are handled as well, and NumPy is used under the hood. Currently, only DNA and RNA sequences are supported.
The package additionally offers a small utility, cons_tool, which allows calculating consensus sequences on the command line.
The method is identical to the approach used by Geneious and very similar to the ConsensusSequence function from the DECIPHER R package (whose options differ slightly).
The complete user guide and the API documentation contain further details. Below are some small examples for demonstration:
from seq_consensus import consensus
seqs = [
'ATTGC',
'AT-CC',
'RT-C-'
]
consensus(seqs, threshold=0.6)
This returns:
'AT-CC'
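For readers curious how a threshold consensus of this kind can work, here is a minimal pure-Python sketch. It is not the seq_consensus implementation (which uses NumPy) and rests on several assumptions: the input sequences are already aligned and of equal length, an ambiguity code such as R splits its weight evenly over the bases it represents, and in each column the most frequent symbols are accumulated until their combined share reaches the threshold, after which the IUPAC code for that set of bases is emitted. The names threshold_consensus and column_counts, and the use of '?' for columns where gaps mix with bases, are hypothetical choices made only for this sketch.

```python
from collections import Counter

# IUPAC nucleotide codes -> bases they stand for (U is read as T, so RNA works too).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Reverse lookup: sorted set of bases -> IUPAC letter (first code wins, so "T" not "U").
CODE_FOR = {}
for code, bases in IUPAC.items():
    CODE_FOR.setdefault("".join(sorted(bases)), code)


def column_counts(column):
    """Weighted symbol counts for one alignment column.

    Assumption: an ambiguity code spreads one unit of weight evenly over
    the bases it represents (R -> 0.5 A + 0.5 G). Gaps are counted as '-'.
    Symbols outside the IUPAC table raise KeyError.
    """
    counts = Counter()
    for symbol in column:
        symbol = symbol.upper()
        if symbol == "-":
            counts["-"] += 1
            continue
        bases = IUPAC[symbol]
        for base in bases:
            counts[base] += 1 / len(bases)
    return counts


def threshold_consensus(seqs, threshold=0.6):
    """Hypothetical threshold consensus for aligned, equal-length sequences."""
    result = []
    for column in zip(*seqs):
        counts = column_counts(column)
        total = sum(counts.values())
        chosen, share = set(), 0.0
        # Take the most frequent symbols first (ties broken alphabetically)
        # until their combined share reaches the threshold.
        for symbol, weight in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            chosen.add(symbol)
            share += weight / total
            if share >= threshold:
                break
        if chosen == {"-"}:
            result.append("-")
        elif "-" in chosen:
            result.append("?")  # gap mixed with bases: placeholder used by this sketch
        else:
            result.append(CODE_FOR["".join(sorted(chosen))])
    return "".join(result)


print(threshold_consensus(["ATTGC", "AT-CC", "RT-C-"], threshold=0.6))  # AT-CC
```

Applied to the three example sequences with threshold=0.6, this sketch also yields 'AT-CC': in the first column, A, A and R give A a weight of 2.5 out of 3, and in the third column the two gaps (2/3) outweigh the single T. When no single symbol reaches the threshold, for example one A and one G, the sketch combines them into the ambiguity code R.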
The script cons_tool allows using the same functionality from the command line.
An especially useful feature is the ability to group sequences
by an arbitrary regular expression pattern matched in the sequence headers:
cons_tool -k 'p:\w+' input.fasta
Example output (given that taxonomic annotations are present in the headers):
>p:Evosea consensus (n=124)
TACKATTTA--RTATTGAC-?TWA?-GKTACTAAAGCATGGGKA-T?AAA?AGGATTAGAGACCCTYGTA
>p:Chordata consensus (n=7065)
TWAYTTTA?--WAW-YWAY-YTGAA-YCCACGAAAGCTAAGAMA-CAAACTGGGATTAGATACCCCACTA
>p:Mollusca consensus (n=843)
TWAWTWTAW--WAW?WWAY-TTGAA-KYYAYGAAAKCTWRGRWA-YAAACTAGGATTAGATACCCTAYTA
>p:Chordata consensus (n=8509)
TWAYTTTA?--WAW-YMAC-TTGAA-CCCACGAAAGCTARGAMA-CAAACTGGGATTAGATACCCCACTA
>p:Platyhelminthes_ consensus (n=130)
TWAWTWTAA--WDW?TKWY-YTGAA-KYYACGAAAGYTAKGWTA-YAAACTGGGATTAGATACCCCATTA
>p:Ascomycota consensus (n=280)
TTAWTWTAA--WAA?TDAC-TTGAR-K??ACGAAAGCTWRGRWA-CAAACTAGGATTAGATACCCYABTA
>p:Streptophyta consensus (n=269)
TWAWTWTAW--WAW?TRAY-TTGAR-KY?ACGAAAGCTTRGRKA-CAAACTAGGATTAGATACCCTAKTA
(...)
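The header grouping can be sketched in plain Python as follows. This is a hypothetical illustration, not cons_tool's code: read_fasta and group_by_header are names made up here, and using the whole regex match as the group key is an assumption inferred from the example output above (keys such as p:Chordata). Headers without a match are simply skipped in this sketch; how cons_tool itself treats them is not assumed here. The FASTA headers in the demo are invented example data.

```python
import re
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines (header without '>')."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_header(records, pattern):
    """Group sequences by the first regex match in their header.

    The whole match (e.g. 'p:Chordata') becomes the group key; records whose
    header does not match are skipped in this sketch.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for header, seq in records:
        match = regex.search(header)
        if match:
            groups[match.group(0)].append(seq)
    return dict(groups)


# Hypothetical example headers carrying taxonomic annotations.
example = """>seq1 k:Metazoa;p:Chordata
ATTGC
>seq2 k:Metazoa;p:Chordata
AT-CC
>seq3 k:Fungi;p:Ascomycota
RT-C-
""".splitlines()

groups = group_by_header(read_fasta(example), r"p:\w+")
for key, seqs in groups.items():
    print(f"{key}: n={len(seqs)}")
# p:Chordata: n=2
# p:Ascomycota: n=1
```

Each resulting list of sequences could then be passed to consensus(seqs, threshold=...) to obtain one consensus per group. The n=... values in the cons_tool output appear to correspond to such group sizes.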