feat(datasets): adding new variant index model (#641)
* feat(variant annotation): new variant annotation schema + logic to extract from VEP
* fix: typehints in function
* refactor(variant annotation): migrating methods to the new schema
* chore: pre-commit auto fixes [...]
* refactor(variant index): sorting out new variant index dataset
* chore: pre-commit auto fixes [...]
* feature(vep): adding predictors to vep transcript object
* fix(schema): fixing schema missing fields
* fix(schema): fixing schema missing fields
* fix(schema): fixing schema missing fields
* fix(schema): fixing schema missing fields
* chore: pre-commit auto fixes [...]
* fix(annotation): array union under condition
* fix: merging dbxref objects
* feat(variants): updating variants to make more robust
* feat: migrating methods to new variant index
* adjusting variant index methods
* some updates
* rename v2g to variant to gene
* chore: pre-commit auto fixes [...]
* adding test
* chore: pre-commit auto fixes [...]
* fix(precommit): json file needed to rename to jsonl
* fix(precommit): removing steps depending on old data model
* fix(coftest): fixing variant index mock generation
* fix: typo in package import
* fix: sorting out conftest
* refactor(gwas ingest): Updating GnomAD handling
* refactor(gnomad): variant annotation removed, changed to variant index, steps updated
* refactor: shuffling around gnomad logic
* fix: references in tests
* refactor: sorting out gnomad variant dag
* refactor: cleaning configs and tests
* docs(vep): adding datasource description
* test(vep): adding more test to the vep parser
* test(vep): tests are now running
* fix: removing version suffix from pyproject and airflow config
* fix: reverting DAGs - removing temporary modifications I added for testing
* fix: addressing reviewer comments
* refactor: fiddling with variant index annotation logic
* chore: addressing comments
* fix: variant cross-ref snake case
* fix: correcting join strategy

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent b3e89bb, commit f79c789
Showing 42 changed files with 2,239 additions and 730 deletions.
@@ -1,6 +1,6 @@
 defaults:
 - variant_index

-variant_annotation_path: ${datasets.variant_annotation}
-credible_set_path: ${datasets.credible_set}
+vep_output_json_path: ${datasets.vep_output_path}
+gnomad_variant_annotations_path: ${datasets.gnomad_variants}
 variant_index_path: ${datasets.variant_index}
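The step config in this hunk points at dataset locations through `${datasets.*}` references, which look like Hydra/OmegaConf-style interpolations. As a rough illustration of what such a reference resolves to, here is a minimal standard-library sketch. The `datasets` values are hypothetical placeholders (the real paths are not part of this diff), and `resolve` is an illustrative helper, not a function from the project or from Hydra.

```python
import re

# Hypothetical dataset locations; the real values live in the project's
# datasets config, which this hunk does not show.
datasets = {
    "vep_output_path": "/tmp/example/vep_output",
    "gnomad_variants": "/tmp/example/gnomad_variants",
    "variant_index": "/tmp/example/variant_index",
}

# The three keys the step config contains after this commit.
step_config = {
    "vep_output_json_path": "${datasets.vep_output_path}",
    "gnomad_variant_annotations_path": "${datasets.gnomad_variants}",
    "variant_index_path": "${datasets.variant_index}",
}

# Matches references of the form ${datasets.<key>}.
_REFERENCE = re.compile(r"\$\{datasets\.([A-Za-z0-9_]+)\}")


def resolve(value: str, datasets: dict[str, str]) -> str:
    """Replace every ${datasets.<key>} reference in `value` with its path."""

    def _lookup(match: re.Match) -> str:
        key = match.group(1)
        if key not in datasets:
            raise KeyError(f"unknown dataset reference: {key}")
        return datasets[key]

    return _REFERENCE.sub(_lookup, value)


resolved = {name: resolve(value, datasets) for name, value in step_config.items()}
```

A reference to a key that no longer exists, such as the removed `${datasets.variant_annotation}`, raises `KeyError` in this sketch, which is the kind of failure a stale config would be expected to surface.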
@@ -0,0 +1,10 @@
+---
+title: Ensembl annotations
+---
+
+<div align="center">
+<img width="100" height="100" src="../../../../assets/imgs/ensembl_logo.png">
+<h1>Ensembl</h1>
+</div>
+
+[Ensembl](https://www.ensembl.org/index.html) provides a diverse set of genetic data Gentropy takes advantage of including gene set, and variant annotations.
docs/python_api/datasources/ensembl/variant_effect_predictor_parser.md (5 changes: 5 additions & 0 deletions)
@@ -0,0 +1,5 @@
+---
+title: Variant effector parser
+---
+
+::: gentropy.datasource.ensembl.vep_parser.VariantEffectPredictorParser
@@ -1,5 +1,5 @@
 ---
-title: ld_index
+title: GnomAD Linkage data ingestion
 ---

-::: gentropy.ld_index.LDIndexStep
+::: gentropy.gnomad_ingestion.LDIndexStep
@@ -1,5 +1,5 @@
 ---
-title: variant_annotation
+title: GnomAD variant data ingestion
 ---

-::: gentropy.variant_annotation.VariantAnnotationStep
+::: gentropy.gnomad_ingestion.GnomadVariantIndexStep
@@ -2,4 +2,4 @@
 title: variant_to_gene
 ---

-::: gentropy.v2g.V2GStep
+::: gentropy.variant_to_gene.V2GStep
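The three documentation hunks above each repoint a `:::` directive at a renamed module or class. For anyone carrying their own docs pages that still reference the old targets, the renames can be collected into a lookup table. This is only a sketch: `RENAMED_TARGETS` and `update_directive` are hypothetical names introduced here, and the mapping contains only the three renames visible in this diff.

```python
# Old -> new documentation targets, copied from the three hunks above.
RENAMED_TARGETS = {
    "gentropy.ld_index.LDIndexStep": "gentropy.gnomad_ingestion.LDIndexStep",
    "gentropy.variant_annotation.VariantAnnotationStep": (
        "gentropy.gnomad_ingestion.GnomadVariantIndexStep"
    ),
    "gentropy.v2g.V2GStep": "gentropy.variant_to_gene.V2GStep",
}

DIRECTIVE_PREFIX = "::: "


def update_directive(line: str) -> str:
    """Rewrite a ':::' directive line if it points at a renamed target.

    Lines that are not directives, or that point at targets not listed in
    RENAMED_TARGETS, are returned unchanged.
    """
    if not line.startswith(DIRECTIVE_PREFIX):
        return line
    target = line[len(DIRECTIVE_PREFIX):].strip()
    return DIRECTIVE_PREFIX + RENAMED_TARGETS.get(target, target)
```

For example, `update_directive("::: gentropy.v2g.V2GStep")` yields the directive with `gentropy.variant_to_gene.V2GStep`, while a front-matter line such as `title: variant_to_gene` passes through untouched.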
@@ -0,0 +1,43 @@
+{
+"transcript_ablation": "SO_0001893",
+"splice_acceptor_variant": "SO_0001574",
+"splice_donor_variant": "SO_0001575",
+"stop_gained": "SO_0001587",
+"frameshift_variant": "SO_0001589",
+"stop_lost": "SO_0001578",
+"start_lost": "SO_0002012",
+"transcript_amplification": "SO_0001889",
+"feature_elongation": "SO_0001907",
+"feature_truncation": "SO_0001906",
+"inframe_insertion": "SO_0001821",
+"inframe_deletion": "SO_0001822",
+"missense_variant": "SO_0001583",
+"protein_altering_variant": "SO_0001818",
+"splice_donor_5th_base_variant": "SO_0001787",
+"splice_region_variant": "SO_0001630",
+"splice_donor_region_variant": "SO_0002170",
+"splice_polypyrimidine_tract_variant": "SO_0002169",
+"incomplete_terminal_codon_variant": "SO_0001626",
+"start_retained_variant": "SO_0002019",
+"stop_retained_variant": "SO_0001567",
+"synonymous_variant": "SO_0001819",
+"coding_sequence_variant": "SO_0001580",
+"mature_miRNA_variant": "SO_0001620",
+"5_prime_UTR_variant": "SO_0001623",
+"3_prime_UTR_variant": "SO_0001624",
+"non_coding_transcript_exon_variant": "SO_0001792",
+"intron_variant": "SO_0001627",
+"NMD_transcript_variant": "SO_0001621",
+"non_coding_transcript_variant": "SO_0001619",
+"coding_transcript_variant": "SO_0001968",
+"upstream_gene_variant": "SO_0001631",
+"downstream_gene_variant": "SO_0001632",
+"TFBS_ablation": "SO_0001895",
+"TFBS_amplification": "SO_0001892",
+"TF_binding_site_variant": "SO_0001782",
+"regulatory_region_ablation": "SO_0001894",
+"regulatory_region_amplification": "SO_0001891",
+"regulatory_region_variant": "SO_0001566",
+"intergenic_variant": "SO_0001628",
+"sequence_variant": "SO_0001060"
+}
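The JSON file added above maps variant consequence terms to Sequence Ontology identifiers. How the project consumes it is not shown in this part of the diff, so the following is only a sketch of one plausible use: a forward lookup from consequence term to identifier and a reverse lookup built from the same data. The helper name `to_so_id` is hypothetical, and only four of the 41 entries are reproduced to keep the example short.

```python
import json

# A few entries copied verbatim from the mapping above; the full file has 41.
MAPPING_JSON = """
{
  "stop_gained": "SO_0001587",
  "missense_variant": "SO_0001583",
  "intron_variant": "SO_0001627",
  "intergenic_variant": "SO_0001628"
}
"""

consequence_to_so = json.loads(MAPPING_JSON)


def to_so_id(consequence_term: str):
    """Return the SO identifier for a consequence term, or None if unmapped."""
    return consequence_to_so.get(consequence_term)


# Reverse lookup: SO identifier -> consequence term. This relies on the
# identifiers being unique, which holds for the entries in the file above.
so_to_consequence = {so_id: term for term, so_id in consequence_to_so.items()}
```

Returning `None` for an unmapped term, rather than raising, is a design choice of this sketch; a pipeline that treats unknown consequences as errors would want the opposite behaviour.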