DV-TrioTrain v0.8

DeepVariant-TrioTrain is an automated pipeline for extending DeepVariant (DV), a deep-learning-based germline variant caller. See the original DeepVariant GitHub page to learn more.

Background

The existing DeepVariant models were trained only on human data. Previous work built species-specific DeepVariant models for mosquito genomes and the endangered Kākāpō parrot. We built TrioTrain (DV-TT) to create custom DeepVariant models for cattle, bison, and yak genomes. Our custom models incorporate allele frequency data from over 5,500 published bovine samples, making DV-TT the first tool to extend the existing Allele Frequency model to non-human, mammalian genomes. Our work illustrates the limitations of applying models built exclusively with human-genome datasets to other species. Our findings suggest that comparative genomics approaches in deep learning model development offer performance benefits over species-specific models.

How does TrioTrain work?

DV-TT is a SLURM-based, automated pipeline that produces new DV model(s) for germline variant-calling in any diploid organism, focusing on species without NIST-GIAB reference materials.

Currently, TrioTrain supports initializing training using an existing DV model. An index of compatible models can be found here.

Specifically, TrioTrain builds upon the existing DV model for short-read (Illumina) whole-genome sequence (WGS) data and, optionally, adds population-level allele frequency data from published samples. During model development, DV-TrioTrain iteratively feeds the model labeled examples from parent-offspring duos. Intuitively, a model trained on both parents should better predict inherited variants in the offspring; therefore, two training rounds are performed for each trio, one per parent. After re-training, any model built with DV-TrioTrain can be used as an alternative checkpoint with DeepVariant's one-step, single-sample variant caller.
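The round structure described above can be sketched in a few lines of Python. This is only an illustration of the ordering, not TrioTrain's actual code: the `Trio` class, the `training_rounds` function, the sample IDs, and the checkpoint labels are all hypothetical, and the sketch assumes that each round warm-starts from the checkpoint produced by the previous round and that the order of the two parents is arbitrary.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Trio:
    """Hypothetical container for one trio; sample IDs are placeholders."""
    offspring: str
    parent1: str
    parent2: str


def training_rounds(
    trios: List[Trio], starting_checkpoint: str
) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (round_number, parent, offspring, warm_start_checkpoint).

    Each trio contributes two rounds, one per parent-offspring duo.
    """
    checkpoint = starting_checkpoint
    round_number = 0
    for trio in trios:
        for parent in (trio.parent1, trio.parent2):
            round_number += 1
            yield round_number, parent, trio.offspring, checkpoint
            # Assumption: the model produced by this round seeds the next one.
            checkpoint = f"round{round_number}-checkpoint"


# Two hypothetical trios give four training rounds.
example_trios = [
    Trio(offspring="child1", parent1="sire1", parent2="dam1"),
    Trio(offspring="child2", parent1="sire2", parent2="dam2"),
]
for number, parent, offspring, start in training_rounds(example_trios, "existing-dv-model"):
    print(f"round {number}: train on {parent} ({offspring}'s parent), starting from {start}")
```

Only the first round starts from an existing DV model; every later round in this sketch continues from the preceding round's output.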

Assuming the necessary training data for your favorite species already exist, TrioTrain automates customizing the DeepVariant model. Additional details about the required data can be found here.

Why TrioTrain?

The unique re-training approach enables the model to incorporate inheritance expectations; however, models built by DV-TrioTrain do not require trio-binned data for variant calling.
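Because a re-trained model is just a checkpoint for DeepVariant's single-sample caller, calling variants needs only one sample's alignments. The Python sketch below assembles a `run_deepvariant` argument list that points `--customized_model` at such a checkpoint. The helper name and every file path are hypothetical placeholders, the sketch only builds and prints the command rather than running it, and how `run_deepvariant` is actually invoked (for example, inside a container) depends on your installation.

```python
import shlex
from typing import List


def build_single_sample_command(
    checkpoint: str,
    reference: str,
    reads: str,
    output_vcf: str,
    num_shards: int = 1,
) -> List[str]:
    """Assemble run_deepvariant arguments for one sample and a custom checkpoint."""
    return [
        "run_deepvariant",
        "--model_type=WGS",  # the short-read WGS model is the one TrioTrain builds upon
        f"--customized_model={checkpoint}",
        f"--ref={reference}",
        f"--reads={reads}",
        f"--output_vcf={output_vcf}",
        f"--num_shards={num_shards}",
    ]


# All paths below are placeholders, not files shipped with TrioTrain.
command = build_single_sample_command(
    checkpoint="/path/to/custom_model.ckpt",
    reference="/path/to/reference.fa",
    reads="/path/to/sample.bam",
    output_vcf="/path/to/sample.vcf.gz",
    num_shards=4,
)
print(shlex.join(command))
```

Note that no parental alignments appear anywhere in the command; the inheritance information is already encoded in the checkpoint's weights.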

Although the DV-TT pipeline assumes that re-training data come from trio-binned samples, the resulting models are trained to prioritize features of inherited variants, so they produce fewer Mendelian Inheritance Errors (MIE) when calling individual samples. This contrasts with the DeepTrio joint-caller.
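For readers unfamiliar with the metric, a Mendelian Inheritance Error is a site where the offspring's genotype cannot be formed by taking one allele from each parent. The Python sketch below shows that check for autosomal, diploid genotypes. It is a simplified illustration, not the method TrioTrain uses to compute MIE: the function names are hypothetical, genotypes are assumed to be complete (no missing calls), and genuine de novo mutations are counted as errors.

```python
from itertools import product
from typing import Iterable, Tuple

# Allele indices as in a VCF GT field: 0 = reference, 1 or higher = alternate.
Genotype = Tuple[int, int]


def is_mendelian_consistent(
    child: Genotype, parent1: Genotype, parent2: Genotype
) -> bool:
    """Return True if the child could inherit one allele from each parent."""
    possible = {tuple(sorted(pair)) for pair in product(parent1, parent2)}
    return tuple(sorted(child)) in possible


def count_mie(sites: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> int:
    """Count sites, given as (child, parent1, parent2), that violate inheritance."""
    return sum(
        not is_mendelian_consistent(child, parent1, parent2)
        for child, parent1, parent2 in sites
    )


example_sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: one allele from each parent
    ((0, 1), (0, 0), (0, 0)),  # error: neither parent carries allele 1
    ((1, 1), (0, 1), (0, 0)),  # error: parent2 cannot contribute allele 1
]
print(count_mie(example_sites))  # prints 2
```

Checking a trio this way requires all three genotypes, but it is an evaluation step only; the model itself still calls each sample independently.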

Get Started

Detailed user guides for installation, configuration, and a tutorial walk-through using the Human GIAB samples are available here.

How to cite

Citation to go here

Please also cite:

A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology 36, 983–987 (2018).
Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T. Afshar, Sam S. Gross, Lizzie Dorfman, Cory Y. McLean, and Mark A. DePristo.
doi: https://doi.org/10.1038/nbt.4235

Improving variant calling using population data and deep learning. BMC Bioinformatics 24, 197 (2023).
Nae-Chyun Chen, Alexey Kolesnikov, Sidharth Goel, Taedong Yun, Pi-Chuan Chang, and Andrew Carroll.
doi: https://doi.org/10.1186/s12859-023-05294-0

Feedback and technical support

For questions, suggestions, or technical assistance, feel free to open an issue or reach out to Jenna Kalleberg at [email protected]

Contributing to TrioTrain

Please open a pull request if you wish to contribute to TrioTrain.

License

GPL-3.0 license

Acknowledgments

Many thanks to the developers and contributors of the many open-source packages used by TrioTrain.
