GitHub - panyuwen/TMRCA: time to the most recent common ancestor

TMRCA (by Yuwen Pan)

Calculates within-population TMRCA and cross-population divergence time from a phased VCF. Missing genotypes (i.e., ".|.") are accepted.

If you use TMRCA in your research, please cite the following paper:

Lineage-specific positive selection on ACE2 contributes to the genetic susceptibility of COVID-19 (National Science Review, 2022)

Examples:

./TMRCA -h

./TMRCA --gzvcf input.vcf.gz --region region.txt --ape [email protected]

Inputs:

required

--gzvcf: phased, gzip-compressed VCF file (.vcf.gz) with genotypes in GT format, e.g., "1|0". Missing genotypes (".|.") are accepted. Male chrX genotypes should be coded as homozygous.
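As a rough illustration of the genotype format described above, the following sketch splits one sample field into its two haplotype alleles. The function name `split_phased_gt` is hypothetical and is not part of TMRCA; it only shows what a phased GT string with optional missing alleles looks like, and the program itself may parse genotypes differently.

```python
def split_phased_gt(sample_field):
    """Split a phased VCF sample field such as "1|0" into two alleles.

    Only the GT subfield (before the first ":") is inspected.
    Missing alleles (".") are returned as None.
    Unphased genotypes (written with "/") raise ValueError.
    """
    gt = sample_field.split(":")[0]
    if "|" not in gt:
        raise ValueError("genotype is not phased: %r" % gt)
    return tuple(None if allele == "." else int(allele)
                 for allele in gt.split("|"))


print(split_phased_gt("1|0"))
print(split_phased_gt(".|."))
```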

--region: BED file with one region per line. 3 columns: chr start end. No header line; tab or space delimited; additional columns are ignored. Chromosome IDs must be coded the same way as in the VCF file (e.g., "1" is different from "chr1").
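The region file layout described above can be read with a few lines of Python. This is a minimal sketch under the stated format (whitespace-delimited, no header, extra columns ignored); `parse_regions` is a hypothetical helper, not a function from TMRCA.

```python
def parse_regions(lines):
    """Parse region lines of the form: chr start end [extra columns...].

    Fields may be separated by tabs or spaces. Blank lines are skipped
    and any columns after the third are ignored.
    """
    regions = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        regions.append((chrom, start, end))
    return regions


# Example lines (made-up coordinates): one tab-delimited with an extra
# column, one space-delimited.
example_lines = ["1\t1000\t2000\tgeneA", "1 5000 9000"]
regions = parse_regions(example_lines)
print(regions)
```

Note that the chromosome ID is kept as a string, so "1" and "chr1" stay distinct, as required when matching against the VCF.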

--ape: differences between the human and chimpanzee (or any other outgroup) genomes. Each line gives a genomic position where the human and chimpanzee alleles differ. 2 columns: chr pos. No header line; tab or space delimited; additional columns are ignored. The filename pattern must contain a '@' where the chromosome ID goes (e.g., --ape [email protected]). Put all the files in the same folder; the program finds them by replacing "@" with the chromosome IDs in your input files.
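The "@" substitution described above amounts to a simple string replacement per chromosome. The sketch below shows the idea; the pattern `outgroup_diff.chr@.txt` and the helper `expand_ape_pattern` are made up for illustration and do not come from the repository.

```python
def expand_ape_pattern(pattern, chromosomes):
    """Map each chromosome ID to a filename by replacing '@' in the pattern."""
    if "@" not in pattern:
        raise ValueError("pattern must contain '@': %r" % pattern)
    return {chrom: pattern.replace("@", chrom) for chrom in chromosomes}


# Hypothetical pattern and chromosome IDs, as they would appear in the
# VCF and region files.
files = expand_ape_pattern("outgroup_diff.chr@.txt", ["1", "2", "X"])
for chrom in ["1", "2", "X"]:
    print(chrom, files[chrom])
```

Because the substituted value is the chromosome ID exactly as coded in the input files, a VCF that uses "chr1" would be matched to a file whose name contains "chr1" at the "@" position.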

optional

--samples: sample information file. Each line gives a sample ID with the haplotype to use (or list both haplotypes) and its population ID. 2 columns: sampleID_1/2 popID. No header line. For example, "sampleX_2 popX" means the 2nd haplotype of sampleX, which belongs to popX. Default: "all", i.e., all samples in the input VCF are used and treated as one population.
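A samples file in the layout described above could be generated as follows. This sketch assumes that using both haplotypes of a sample means writing one line per haplotype ("_1" and "_2"); that reading is an assumption, and `sample_lines` is a hypothetical helper rather than part of TMRCA.

```python
def sample_lines(sample_pops, haplotypes=(1, 2)):
    """Build 'sampleID_hap popID' lines for a --samples file.

    sample_pops is a list of (sample_id, pop_id) tuples.
    Assumption: each haplotype to be used gets its own line.
    """
    lines = []
    for sample_id, pop_id in sample_pops:
        for hap in haplotypes:
            lines.append("%s_%d %s" % (sample_id, hap, pop_id))
    return lines


# Made-up sample and population IDs.
lines = sample_lines([("sampleX", "popX"), ("sampleY", "popY")])
print("\n".join(lines))
```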

--Tind: T / F. Whether to estimate pairwise TMRCA between individuals/haplotypes. Default: F. Setting it to "T" may take additional time.

--pairs: pairs of populations between which the time should be estimated, one pair per line. 2 columns: popN popM. Population names must appear in the --samples file. No header line; tab or space delimited; additional columns are ignored. Default: all possible pairs of populations are considered.
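To write a --pairs file for only some comparisons, it can help to first list every possible pair and then filter. The sketch below enumerates unordered pairs with `itertools.combinations`. Treating "all possible pairs" as unordered pairs is an assumption here, and `population_pairs` is a hypothetical helper.

```python
from itertools import combinations


def population_pairs(pop_ids):
    """Return every unordered pair of distinct population IDs, sorted."""
    unique_pops = sorted(set(pop_ids))
    return list(combinations(unique_pops, 2))


# Made-up population IDs; duplicates are collapsed before pairing.
pairs = population_pairs(["popB", "popA", "popC", "popA"])
for pop_n, pop_m in pairs:
    print(pop_n, pop_m)
```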

--divT: divergence time between human and chimpanzee (or any other outgroup), in years. Default: 13e6

--out: output file prefix. Default: out

Notes:

This program was compiled on CentOS 7; older systems may not be supported.

If you have any problem using the compiled program, you can run one of the two Python scripts instead: python2 TMRCA.py2.py [--options], or python3 TMRCA.py3.py [--options].

By: Yuwen Pan, 2022
Contact: [email protected]
