This repository contains a Python script that calculates the dN/dS ratio for multiple sequence alignments of coding DNA sequences. The dN/dS ratio is an important measure in molecular evolution: it compares the rate of non-synonymous substitutions per non-synonymous site (dN) with the rate of synonymous substitutions per synonymous site (dS). A dN/dS ratio > 1 is usually interpreted as evidence of positive selection, a ratio < 1 as purifying (negative) selection, and a ratio ≈ 1 as neutral evolution.
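As a minimal sketch of how the ratio is assembled, assuming a Nei-Gojobori-style approach: the proportions of differing sites, pN = Nd/N and pS = Sd/S, are each converted to a distance with the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) and then divided. The README does not state which correction, if any, `dNdS_calculator.py` applies, so `jukes_cantor` and `dnds_from_counts` are hypothetical helpers and the counts below are toy values.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) for a proportion p of differing sites."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def dnds_from_counts(n_sites, s_sites, n_diffs, s_diffs, correct=True):
    """Hypothetical helper: dN/dS from site counts (N, S) and difference counts (Nd, Sd)."""
    p_n = n_diffs / n_sites
    p_s = s_diffs / s_sites
    # Optionally correct both proportions for multiple hits before dividing.
    d_n, d_s = (jukes_cantor(p_n), jukes_cantor(p_s)) if correct else (p_n, p_s)
    if d_s == 0 or math.isinf(d_s):
        return math.nan  # undefined without synonymous change, or when dS is saturated
    return d_n / d_s


# Toy counts: 2 non-synonymous differences over 200 N sites, 10 synonymous over 100 S sites.
ratio = dnds_from_counts(200.0, 100.0, 2, 10)
print(f"dN/dS = {ratio:.3f}")
```

With these toy counts the ratio comes out well below 1, which would be read as purifying selection.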
The script can be run in different modes to:
- Calculate site counts for non-synonymous and synonymous sites in codons.
- Calculate substitution counts for non-synonymous and synonymous substitutions between codon pairs.
- Calculate dN/dS ratios per sequence in the alignment compared to the consensus sequence.
- Calculate dN/dS ratios per site in the alignment.
Each mode is run as a subcommand with its own options:
Calculate the non-synonymous and synonymous site counts for each codon:
```bash
./dNdS_calculator.py site_counts -c CODON_TABLE -o OUTFILE
```
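The README does not describe how `site_counts` divides a codon's three positions into non-synonymous and synonymous sites. One common scheme (the Nei-Gojobori method) credits each position with the fraction of its three possible point mutations that keep the amino acid unchanged. The sketch below is a hypothetical illustration of that scheme: it hard-codes the standard genetic code (NCBI table 1) instead of reading `CODON_TABLE`, and it counts mutations to stop codons as non-synonymous, which is only one of several conventions.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_sites(codon, code=STANDARD_CODE):
    """Return (non-synonymous sites, synonymous sites) for one sense codon.

    Each position contributes (number of synonymous alternatives) / 3 synonymous
    sites; changes to a stop codon are counted as non-synonymous here.
    """
    if code[codon] == "*":
        raise ValueError(f"{codon} is a stop codon")
    synonymous_changes = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # skip the base already present
            mutant = codon[:pos] + base + codon[pos + 1:]
            if code[mutant] == code[codon]:
                synonymous_changes += 1
    s = synonymous_changes / 3
    return 3 - s, s


for codon in ("ATG", "TTT", "GGG"):
    print(codon, codon_sites(codon))
```

For example, `ATG` (Met) has no synonymous neighbors and gets (3.0, 0.0), while `GGG` (Gly) is fourfold degenerate at the third position and gets (2.0, 1.0).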
Calculate the non-synonymous and synonymous substitution counts for each pair of codons:
```bash
./dNdS_calculator.py sub_counts -c CODON_TABLE -o OUTFILE
```
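For codon pairs that differ at more than one position, the split into non-synonymous and synonymous substitutions depends on the order in which the changes occurred. A minimal sketch of the usual Nei-Gojobori treatment, assuming the standard genetic code and equal weights for all pathways: enumerate every order of single-base steps, discard pathways that pass through a stop codon, and average the rest. `codon_pair_differences` is a hypothetical name; the script's own handling is not documented here.

```python
from itertools import permutations, product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_pair_differences(a, b, code=STANDARD_CODE):
    """Return (non-synonymous, synonymous) differences between sense codons a and b,
    averaged over all shortest mutational pathways that avoid stop codons."""
    if code[a] == "*" or code[b] == "*":
        raise ValueError("stop codons are not supported")
    positions = [i for i in range(3) if a[i] != b[i]]
    total_n = total_s = 0
    valid_paths = 0
    for order in permutations(positions):
        current, n_steps, s_steps = a, 0, 0
        for pos in order:
            step = current[:pos] + b[pos] + current[pos + 1:]
            if code[step] == "*":
                break  # pathway passes through a stop codon: discard it
            if code[step] == code[current]:
                s_steps += 1
            else:
                n_steps += 1
            current = step
        else:  # reached b without hitting a stop codon
            total_n += n_steps
            total_s += s_steps
            valid_paths += 1
    if valid_paths == 0:
        raise ValueError(f"no stop-free pathway between {a} and {b}")
    return total_n / valid_paths, total_s / valid_paths


print(codon_pair_differences("TTT", "CTC"))
print(codon_pair_differences("TGG", "TAC"))
```

For `TTT` (Phe) to `CTC` (Leu), both pathways contain one synonymous and one non-synonymous step, giving (1.0, 1.0); for `TGG` to `TAC`, the pathway through `TAG` is discarded, leaving (2.0, 0.0).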
Calculate dN/dS ratios per sequence in the alignment compared to the consensus sequence:
```bash
./dNdS_calculator.py per_sequence -i INFILE -p PREFIX -s SITE_COUNTS -u SUB_COUNTS
```
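The per-sequence mode can be pictured as follows. This sketch assumes the consensus is the majority codon in each codon column (ties going to the codon seen first), that the `SITE_COUNTS` and `SUB_COUNTS` files have already been loaded into dictionaries keyed by codon and by codon pair (their real file format is not described in this README), and that the ratio is the uncorrected pN/pS; a multiple-hit correction such as Jukes-Cantor could be applied before dividing. All function names and the toy tables are hypothetical.

```python
from collections import Counter
import math


def split_codons(seq):
    """Split a sequence into consecutive codons, ignoring a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def consensus_codons(seqs):
    """Majority codon in each codon column; ties go to the codon seen first."""
    columns = zip(*(split_codons(s) for s in seqs))
    return [Counter(col).most_common(1)[0][0] for col in columns]


def diffs(sub_table, a, b):
    """(Nd, Sd) for a codon pair, trying both key orders; identical codons give zero."""
    if a == b:
        return 0.0, 0.0
    return sub_table[(a, b)] if (a, b) in sub_table else sub_table[(b, a)]


def per_sequence_dnds(seqs, site_table, sub_table):
    """Hypothetical per-sequence pN/pS of each sequence against the consensus."""
    cons = consensus_codons(seqs)
    ratios = []
    for seq in seqs:
        n_sites = s_sites = n_diffs = s_diffs = 0.0
        for codon, ref in zip(split_codons(seq), cons):
            # Sites are averaged over the two codons being compared.
            n_sites += (site_table[codon][0] + site_table[ref][0]) / 2
            s_sites += (site_table[codon][1] + site_table[ref][1]) / 2
            nd, sd = diffs(sub_table, codon, ref)
            n_diffs += nd
            s_diffs += sd
        p_n, p_s = n_diffs / n_sites, s_diffs / s_sites
        ratios.append(p_n / p_s if p_s > 0 else math.nan)  # undefined when pS = 0
    return ratios


# Toy tables (standard genetic code values, only for the codons used here).
SITES = {"TTT": (8 / 3, 1 / 3), "CTT": (2.0, 1.0), "CTC": (2.0, 1.0)}
SUBS = {("CTT", "TTT"): (1.0, 0.0), ("CTC", "CTT"): (0.0, 1.0)}
alignment = ["TTTCTT", "TTTCTT", "CTTCTC"]
print(per_sequence_dnds(alignment, SITES, SUBS))
```

The first two sequences match the consensus exactly, so their ratio is undefined (`nan`); the third carries one non-synonymous and one synonymous difference and scores 5/13 ≈ 0.385.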
Calculate dN/dS ratios per site in the alignment:
```bash
./dNdS_calculator.py per_site -i INFILE -p PREFIX -s SITE_COUNTS -u SUB_COUNTS
```
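Per-site values can be sketched the same way, pooling counts down each codon column instead of along each sequence. This assumes a majority-codon consensus per column (ties going to the codon seen first), site and substitution tables already loaded into dictionaries, and the uncorrected pN/pS; the function name and toy data are hypothetical, and a column with no synonymous differences is reported as `nan`.

```python
from collections import Counter
import math


def per_site_dnds(seqs, site_table, sub_table):
    """Hypothetical per-codon-site pN/pS: each column is compared with its consensus codon."""
    codon_rows = [[s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)] for s in seqs]
    ratios = []
    for column in zip(*codon_rows):
        ref = Counter(column).most_common(1)[0][0]  # majority codon; ties to first seen
        n_sites = s_sites = n_diffs = s_diffs = 0.0
        for codon in column:
            # Sites are averaged over the codon and the consensus codon.
            n_sites += (site_table[codon][0] + site_table[ref][0]) / 2
            s_sites += (site_table[codon][1] + site_table[ref][1]) / 2
            if codon != ref:
                key = (codon, ref) if (codon, ref) in sub_table else (ref, codon)
                nd, sd = sub_table[key]
                n_diffs += nd
                s_diffs += sd
        p_n, p_s = n_diffs / n_sites, s_diffs / s_sites
        ratios.append(p_n / p_s if p_s > 0 else math.nan)  # undefined when pS = 0
    return ratios


# Toy tables (standard genetic code values, only for the codons used here).
SITES = {"TTT": (8 / 3, 1 / 3), "TTC": (8 / 3, 1 / 3), "CTT": (2.0, 1.0), "CTC": (2.0, 1.0)}
SUBS = {
    ("TTC", "TTT"): (0.0, 1.0),
    ("CTT", "TTT"): (1.0, 0.0),
    ("CTC", "CTT"): (0.0, 1.0),
}
alignment = ["TTTCTT", "TTCCTT", "CTTCTC"]
print(per_site_dnds(alignment, SITES, SUBS))
```

In this toy alignment the first column mixes one synonymous and one non-synonymous difference (4/23 ≈ 0.174), and the second contains only a synonymous change (0.0).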
Potential future additions to this repository include:
- Adding support for different genetic codes, such as mitochondrial or alternative nuclear codes.
- Extending the script to handle gaps and ambiguous characters in the alignment.
- Implementing sliding window analysis for calculating dN/dS ratios over a specified window size.
- Providing visualization options for the generated dN/dS data, such as heatmaps or line plots.
- Allowing for parallel processing to speed up calculations on large datasets.