We developed a pipeline for benchmarking variants in the genomes of the Chinese Quartet, enabling performance evaluation of different sequencing technologies, variant calling algorithms, and pipelines.
The following software/packages are required in the same environment:
- python
- snakemake
- pysam
- numpy
- pandas
You can use conda to install all of these packages, for example:
conda install package-name
You also need to install the following software:
- bedtools
- bcftools
- tabix
- bgzip
- hap.py (for small variant benchmarking)
- truvari (for structural variant benchmarking)
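Before running anything, it can help to confirm that the requirements listed above are visible in the active environment. The following is a minimal sketch, not part of the pipeline itself: the helper name `find_missing` is hypothetical, and it assumes each tool is installed under the executable name given in the list (for example, that hap.py is on `PATH` as `hap.py`).

```python
import importlib.util
import shutil

# Names taken from the requirement lists in this README.
PYTHON_MODULES = ("snakemake", "pysam", "numpy", "pandas")
# Assumption: each tool is on PATH under exactly this executable name.
EXECUTABLES = ("bedtools", "bcftools", "tabix", "bgzip", "hap.py", "truvari")


def find_missing(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return (missing_modules, missing_tools) for the current environment."""
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when no executable of that name is on PATH.
    missing_tools = [t for t in executables if shutil.which(t) is None]
    return missing_modules, missing_tools


if __name__ == "__main__":
    modules, tools = find_missing()
    print("Missing Python packages:", ", ".join(modules) or "none")
    print("Missing command-line tools:", ", ".join(tools) or "none")
```

Anything reported as missing can then be installed before the pipeline is configured.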
Download the latest variants and benchmark regions of the Chinese Quartet according to the instructions.
- Configure your own config.yaml according to the template.
- List your own VCF files in a TSV (tab-separated values) file according to the template.
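A quick check of the TSV file can catch mistyped VCF paths before a long run starts. The sketch below is only an illustration: the function name `missing_vcfs` and the column name `path` are hypothetical, so substitute whichever column the template actually uses for the VCF location. It relies on the standard library only.

```python
import csv
import os


def missing_vcfs(tsv_path, path_column="path"):
    """Return the VCF paths listed in the TSV that do not exist on disk.

    `path_column` is a hypothetical column name; use the one from the template.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or path_column not in reader.fieldnames:
            raise ValueError(f"column {path_column!r} not found in {tsv_path}")
        missing = []
        for row in reader:
            # Short rows yield None for absent fields; treat them as empty paths.
            vcf = row.get(path_column) or ""
            if not os.path.isfile(vcf):
                missing.append(vcf)
        return missing
```

An empty list means every listed file was found; an empty string in the result points to a row with no path at all.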
Run the pipeline with snakemake
snakemake -s ./Snakefile -j 40 -k --ri # on a local computer
snakemake -s ./Snakefile -j 10 -k --ri --cluster 'qsub -l nodes=1:ppn=12 -l walltime=99:00:00' >sublog 2>&1 & # on a cluster
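If the pipeline is launched from a script, the two invocations above can be assembled programmatically. This is a rough sketch rather than a supported entry point: `build_snakemake_cmd` is a hypothetical helper, the optional `-n` (dry run) flag is an addition not shown in the commands above, and `--cluster` is assumed to be accepted by the installed Snakemake version.

```python
import shlex


def build_snakemake_cmd(snakefile, jobs, cluster=None, dry_run=False):
    """Build the snakemake argument list used in the commands above."""
    # -k keeps independent jobs going after a failure; --ri reruns incomplete jobs.
    cmd = ["snakemake", "-s", snakefile, "-j", str(jobs), "-k", "--ri"]
    if cluster:
        # Submission command passed through to the scheduler, e.g. a qsub line.
        cmd += ["--cluster", cluster]
    if dry_run:
        # -n lists the planned jobs without executing them.
        cmd.append("-n")
    return cmd


if __name__ == "__main__":
    qsub = "qsub -l nodes=1:ppn=12 -l walltime=99:00:00"
    # The path is a placeholder; point it at the workflow file in this repository.
    cmd = build_snakemake_cmd("path/to/workflow", jobs=10, cluster=qsub, dry_run=True)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; doing a dry run first shows which jobs would be submitted before any cluster time is spent.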
Under construction!
Jia P, Dong L, Yang X, Wang B, Wang T, Lin J, Wang S, Zhao X, Xu T, Che Y, et al. Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet. bioRxiv 2022:2022.09.08.504083.
- Kai Ye ([email protected])
- Peng Jia ([email protected])