chore: trigger release process #619

Merged Jun 14, 2024
25 commits
6dfee41
fix(spark helpers): sorter function updated to avoid crash (#613)
DSuveges May 20, 2024
e355970
feat: fine-mapping simulations class (#618)
addramir May 22, 2024
c2bfa18
feat(config): gnomAD steps configuration extraction and versioning (#…
project-defiant May 28, 2024
2ba7adb
fix(docs): update roadmap.md (#622)
buniello May 28, 2024
f0960b1
fix(ld): correct syntax for the static method (#624)
tskir May 29, 2024
8ae94ea
fix(susie_finemapper): correct syntax for saving the logs df (#625)
tskir May 29, 2024
48cf2a8
fix(susie_finemapper): fix in the fine-mapper in case of sum stat imp…
addramir May 29, 2024
b60a19f
fix(SummaryStatistics): fix in sanity_filter (#623)
addramir May 30, 2024
fa6d500
build(deps): use pandas[gcp, parquet] (#626)
tskir May 30, 2024
b22951b
fix(docs): fixed typo in l2g_prediciton schema page (#629)
project-defiant May 31, 2024
daa8331
feat(config): extract gwas_significance parameter to step configurati…
project-defiant Jun 4, 2024
95f26d0
feat(data_release): preparation for 24.06 data release (#633)
project-defiant Jun 6, 2024
027b685
build(deps): bump typing-extensions from 4.11.0 to 4.12.1 (#632)
dependabot[bot] Jun 6, 2024
947d1e6
build(deps-dev): bump pep8-naming from 0.13.3 to 0.14.1 (#616)
dependabot[bot] Jun 6, 2024
fd3154a
feat(qtl): ingest credible sets from single cell derived QTLs (#630)
ireneisdoomed Jun 6, 2024
689340c
feat(spark-helpers): enforce schema of returned objects (#617)
DSuveges Jun 6, 2024
95244fd
feat: adding locus-breaker clumping method (#634)
DSuveges Jun 7, 2024
820f921
build(deps): bump wandb from 0.16.2 to 0.17.0 (#606)
dependabot[bot] Jun 7, 2024
d1577ee
build(deps-dev): bump ipython from 8.24.0 to 8.25.0 (#636)
dependabot[bot] Jun 10, 2024
e52fd11
build(deps): bump scikit-learn from 1.4.0 to 1.5.0 (#638)
dependabot[bot] Jun 11, 2024
ca43fff
feat: enable interface with gcp secrets manager (#635)
ireneisdoomed Jun 11, 2024
45d991c
feat: credible set quality filtering (#640)
Daniel-Considine Jun 11, 2024
976ee30
build(deps-dev): bump dbldatagen from 0.3.5 to 0.4.0 (#637)
dependabot[bot] Jun 13, 2024
3c8ce58
feat(config): 24.06 data release fixes (#639)
project-defiant Jun 13, 2024
7625a79
fix(L2GPrediction): schema validation (#642)
project-defiant Jun 14, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -11,3 +11,4 @@ src/airflow/logs/*
!src/airflow/logs/.gitkeep
site/
.env
.coverage*
2 changes: 1 addition & 1 deletion Makefile
@@ -22,7 +22,7 @@ setup-dev: ## Setup development environment

check: ## Lint and format code
@echo "Linting API..."
-@poetry run ruff src/gentropy .
+@poetry run ruff check src/gentropy .
@echo "Linting docstrings..."
@poetry run pydoclint --config=pyproject.toml src
@poetry run pydoclint --config=pyproject.toml --skip-checking-short-docstrings=true tests
19 changes: 16 additions & 3 deletions config/datasets/ot_gcp.yaml
@@ -1,5 +1,5 @@
# Release specific configuration:
-release_version: "24.03"
+release_version: "24.06"
dev_version: XX.XX
release_folder: gs://genetics_etl_python_playground/releases/${datasets.release_version}

@@ -8,6 +8,7 @@ static_assets: gs://genetics_etl_python_playground/static_assets
outputs: gs://genetics_etl_python_playground/output/python_etl/parquet/${datasets.dev_version}

## Datasets:
# GWAS
gwas_catalog_dataset: gs://gwas_catalog_data
# Ingestion input files:
gwas_catalog_associations: ${datasets.gwas_catalog_dataset}/curated_inputs/gwas_catalog_associations_ontology_annotated.tsv
@@ -29,7 +30,18 @@ gwas_catalog_study_index: ${datasets.gwas_catalog_dataset}/study_index
gwas_catalog_study_locus_folder: ${datasets.gwas_catalog_dataset}/study_locus_datasets
gwas_catalog_credible_set_folder: ${datasets.gwas_catalog_dataset}/credible_set_datasets

# Input datasets
# GnomAD
gnomad_public_bucket: gs://gcp-public-data--gnomad/release/
# LD generation
# Templates require the {POP} placeholder, which is expanded to match the per-population paths
ld_matrix_template: ${datasets.gnomad_public_bucket}/2.1.1/ld/gnomad.genomes.r2.1.1.{POP}.common.adj.ld.bm
ld_index_raw_template: ${datasets.gnomad_public_bucket}/2.1.1/ld/gnomad.genomes.r2.1.1.{POP}.common.ld.variant_indices.ht
liftover_ht_path: ${datasets.gnomad_public_bucket}/2.1.1/liftover_grch38/ht/genomes/gnomad.genomes.r2.1.1.sites.liftover_grch38.ht
# variant_annotation
gnomad_genomes_path: ${datasets.gnomad_public_bucket}4.0/ht/genomes/gnomad.genomes.v4.0.sites.ht/

# Others
chain_38_37: gs://hail-common/references/grch38_to_grch37.over.chain.gz
chain_37_38: ${datasets.static_assets}/grch37_to_grch38.over.chain
vep_consequences: ${datasets.static_assets}/vep_consequences.tsv
anderson: ${datasets.static_assets}/andersson2014/enhancer_tss_associations.bed
@@ -49,7 +61,7 @@ summary_statistics: ${datasets.outputs}/summary_statistics
study_locus_overlap: ${datasets.outputs}/study_locus_overlap
susie_finemapping: ${datasets.outputs}/finngen_susie_finemapping

-ld_index: ${datasets.outputs}/ld_index
+ld_index: ${datasets.static_assets}/ld_index
catalog_study_index: ${datasets.study_index}/catalog
catalog_study_locus: ${datasets.study_locus}/catalog_study_locus

@@ -60,6 +72,7 @@ from_sumstats_pics: ${datasets.credible_set}/from_sumstats
l2g_gold_standard_curation: ${datasets.release_folder}/locus_to_gene_gold_standard.json
l2g_model: ${datasets.release_folder}/locus_to_gene_model
l2g_predictions: ${datasets.release_folder}/locus_to_gene_predictions
l2g_feature_matrix: ${datasets.release_folder}/locus_to_gene_feature_matrix
colocalisation: ${datasets.release_folder}/colocalisation
study_index: ${datasets.release_folder}/study_index
variant_index: ${datasets.release_folder}/variant_index
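
Note on the `${datasets.…}` references above: they are interpolations that are resolved against the `datasets` group when the configuration is loaded (the syntax matches OmegaConf/Hydra conventions, which is an assumption here since the diff does not name the library). A minimal stdlib sketch of that resolution follows. `resolve` is a hypothetical helper written for illustration, not a gentropy or OmegaConf function, and it does not guard against circular references.

```python
import re

# Subset of the values from config/datasets/ot_gcp.yaml after this change.
datasets = {
    "release_version": "24.06",
    "release_folder": "gs://genetics_etl_python_playground/releases/${datasets.release_version}",
    "l2g_feature_matrix": "${datasets.release_folder}/locus_to_gene_feature_matrix",
}

# Matches references of the form ${datasets.<key>}.
_REF = re.compile(r"\$\{datasets\.([A-Za-z0-9_]+)\}")


def resolve(key: str, config: dict[str, str]) -> str:
    """Hypothetical helper: expand ${datasets.<key>} references recursively."""
    return _REF.sub(lambda m: resolve(m.group(1), config), config[key])


resolved_feature_matrix = resolve("l2g_feature_matrix", datasets)
print(resolved_feature_matrix)
```

Bumping `release_version` therefore moves every path derived from `release_folder`, including the new `l2g_feature_matrix` entry, without touching the individual keys.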
2 changes: 1 addition & 1 deletion config/step/ot_ld_based_clumping.yaml
@@ -1,7 +1,7 @@
defaults:
- ld_based_clumping

-ld_index_path: ${datasets.ld_index}
+ld_index_path: ${datasets.ld_index}/2.1.1
study_locus_input_path: ???
study_index_path: ???
clumped_study_locus_output_path: ???
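
Note on the `???` values above: in OmegaConf-style configs this marks a mandatory value that must be supplied before the step runs. The sketch below, which assumes the step config has already been loaded into a plain dictionary, lists the keys that still need a value. `missing_keys` is a hypothetical helper, not part of gentropy.

```python
# Marker used for mandatory values that have not been set yet.
MISSING = "???"

# Values copied from config/step/ot_ld_based_clumping.yaml after this change.
step_config = {
    "ld_index_path": "${datasets.ld_index}/2.1.1",
    "study_locus_input_path": MISSING,
    "study_index_path": MISSING,
    "clumped_study_locus_output_path": MISSING,
}


def missing_keys(config: dict[str, str]) -> list[str]:
    """Hypothetical helper: return the keys whose value is still the marker."""
    return [key for key, value in config.items() if value == MISSING]


unset = missing_keys(step_config)
print(unset)
```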
16 changes: 16 additions & 0 deletions config/step/ot_ld_index.yaml
@@ -2,3 +2,19 @@ defaults:
- ld_index

ld_index_out: ${datasets.ld_index}
ld_matrix_template: ${datasets.ld_matrix_template}
ld_index_raw_template: ${datasets.ld_index_raw_template}
grch37_to_grch38_chain_path: ${datasets.chain_37_38}
liftover_ht_path: ${datasets.liftover_ht_path}
ld_populations:
- afr # African-American
- amr # American Admixed/Latino
- asj # Ashkenazi Jewish
- eas # East Asian
- est # Estonian
- fin # Finnish
- nfe # Non-Finnish European
- nwe # Northwestern European
- seu # Southeastern European
# The gnomAD version will be inferred from ld_matrix_template and appended to ld_index_out.
use_version_from_input: true
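
Note on the `{POP}` templates: each template is expanded once per entry in `ld_populations` to produce the population-specific paths. A minimal sketch of that expansion follows, assuming plain string substitution. `expand_template` is a hypothetical helper, and the template value is written out by hand with a single slash after `release`.

```python
# Assumed resolved value of ld_matrix_template (hand-written for this sketch).
ld_matrix_template = (
    "gs://gcp-public-data--gnomad/release/2.1.1/ld/"
    "gnomad.genomes.r2.1.1.{POP}.common.adj.ld.bm"
)

# Populations listed in config/step/ot_ld_index.yaml.
ld_populations = ["afr", "amr", "asj", "eas", "est", "fin", "nfe", "nwe", "seu"]


def expand_template(template: str, populations: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each population to its concrete path."""
    return {pop: template.replace("{POP}", pop) for pop in populations}


ld_matrix_paths = expand_template(ld_matrix_template, ld_populations)
print(ld_matrix_paths["nfe"])
```

`str.replace` is used instead of `str.format` so that any other braces in a path would be left untouched.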
1 change: 1 addition & 0 deletions config/step/ot_locus_to_gene_predict.yaml
@@ -4,6 +4,7 @@ defaults:
run_mode: predict
model_path: ${datasets.l2g_model}
predictions_path: ${datasets.l2g_predictions}
feature_matrix_path: ${datasets.l2g_feature_matrix}
credible_set_path: ${datasets.credible_set}
variant_gene_path: ${datasets.v2g}
colocalisation_path: ${datasets.colocalisation}
15 changes: 15 additions & 0 deletions config/step/ot_variant_annotation.yaml
@@ -2,3 +2,18 @@ defaults:
- variant_annotation

variant_annotation_path: ${datasets.variant_annotation}
gnomad_genomes_path: ${datasets.gnomad_genomes_path}
chain_38_37: ${datasets.chain_38_37}
gnomad_variant_populations:
- afr # African-American
- amr # American Admixed/Latino
- ami # Amish ancestry
- asj # Ashkenazi Jewish
- eas # East Asian
- fin # Finnish
- nfe # Non-Finnish European
- mid # Middle Eastern
- sas # South Asian
- remaining # Other
# The gnomAD version will be inferred from gnomad_genomes_path and appended to variant_annotation_path.
use_version_from_input: true
1 change: 1 addition & 0 deletions config/step/ot_window_based_clumping.yaml
@@ -4,3 +4,4 @@ defaults:
summary_statistics_input_path: ???
study_locus_output_path: ???
inclusion_list_path: ???
gwas_significance: 1e-8
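
Note on `gwas_significance`: the value is read here as the p-value threshold an association must meet to be treated as significant before window-based clumping. The sketch below shows such a filter on plain Python records. The record layout, the variant identifiers and the non-strict comparison are all assumptions made for illustration, not gentropy's implementation.

```python
# Threshold taken from config/step/ot_window_based_clumping.yaml.
gwas_significance = 1e-8

# Hypothetical associations; identifiers and p-values are made up.
associations = [
    {"variantId": "1_1000_A_G", "pValue": 3e-12},
    {"variantId": "1_2000_C_T", "pValue": 5e-7},
    {"variantId": "2_3000_G_A", "pValue": 1e-8},
]


def significant(rows: list[dict], threshold: float) -> list[dict]:
    """Keep the rows whose p-value is at or below the threshold (assumed non-strict)."""
    return [row for row in rows if row["pValue"] <= threshold]


kept_ids = [row["variantId"] for row in significant(associations, gwas_significance)]
print(kept_ids)
```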
1 change: 1 addition & 0 deletions docs/python_api/_python_api.md
@@ -10,3 +10,4 @@ The overall architecture of the package distinguishes between:
- [**Datasets**](datasets/_datasets.md): data model
- [**Methods**](methods/_methods.md): statistical analysis tools
- [**Steps**](steps/_steps.md): pipeline steps
- [**Common**](common/_common.md): Common classes
8 changes: 8 additions & 0 deletions docs/python_api/common/_common.md
@@ -0,0 +1,8 @@
---
title: Common
---

Common utilities used in the gentropy package.

- [**Version Engine**](version_engine.md): class to extract the version from datasource input paths
- [**Types**](types.md): Literal types used in gentropy
8 changes: 8 additions & 0 deletions docs/python_api/common/types.md
@@ -0,0 +1,8 @@
---
title: Literal Types
---

:::gentropy.common.types
:::gentropy.common.types.LD_Population
:::gentropy.common.types.VariantPopulation
:::gentropy.common.types.DataSourceType
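
Note on the literal types: a `typing.Literal` alias restricts a string argument to a fixed set of values for static type checkers, and `typing.get_args` exposes the same set at runtime. The sketch below reconstructs an `LD_Population`-like alias from the `ld_populations` list in `ot_ld_index.yaml`. The exact members of the real type are an assumption, and `validate_population` is a hypothetical helper.

```python
from typing import Literal, get_args

# Assumed members, copied from ld_populations in config/step/ot_ld_index.yaml.
LD_Population = Literal["afr", "amr", "asj", "eas", "est", "fin", "nfe", "nwe", "seu"]


def validate_population(value: str) -> str:
    """Hypothetical helper: check a config value against the literal's members."""
    allowed = get_args(LD_Population)
    if value not in allowed:
        raise ValueError(f"Unknown LD population {value!r}; expected one of {allowed}")
    return value


print(validate_population("nfe"))
```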
12 changes: 12 additions & 0 deletions docs/python_api/common/version_engine.md
@@ -0,0 +1,12 @@
---
title: VersionEngine
---

**VersionEngine**:

The version engine allows registering a datasource-specific version seeker class, which retrieves the version of a datasource used as input to gentropy steps. It is currently implemented only for the gnomAD datasource.

The class can then be used to automate the versioning of output directories.

:::gentropy.common.version_engine.VersionEngine
:::gentropy.common.version_engine.GnomADVersionSeeker
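
Note on the idea behind the version engine: a registry maps each datasource name to a function that finds the version in an input path, and the version found is appended to the output path. The sketch below illustrates that pattern only. The names `register`, `seek_gnomad_version` and `amend_version`, the regular expression and the output bucket are assumptions, not the signatures of `VersionEngine` or `GnomADVersionSeeker`.

```python
import re
from typing import Callable

Seeker = Callable[[str], str]

# Hypothetical registry: datasource name -> function extracting a version from a path.
_SEEKERS: dict[str, Seeker] = {}


def register(datasource: str) -> Callable[[Seeker], Seeker]:
    """Register a version seeker for the given datasource."""
    def decorator(func: Seeker) -> Seeker:
        _SEEKERS[datasource] = func
        return func
    return decorator


@register("gnomad")
def seek_gnomad_version(path: str) -> str:
    """Assume the version is the dotted number right after the release folder."""
    match = re.search(r"/release/+(\d+(?:\.\d+)+)/", path)
    if match is None:
        raise ValueError(f"No gnomAD version found in {path!r}")
    return match.group(1)


def amend_version(datasource: str, input_path: str, output_path: str) -> str:
    """Append the version found in input_path to output_path."""
    version = _SEEKERS[datasource](input_path)
    return f"{output_path.rstrip('/')}/{version}"


ld_index_out = amend_version(
    "gnomad",
    "gs://gcp-public-data--gnomad/release/2.1.1/ld/gnomad.genomes.r2.1.1.{POP}.common.adj.ld.bm",
    "gs://example-bucket/static_assets/ld_index",  # hypothetical output location
)
print(ld_index_out)
```

Under these assumptions the LD index lands in a `2.1.1` subfolder, which is consistent with `ld_index_path: ${datasets.ld_index}/2.1.1` in `ot_ld_based_clumping.yaml`.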
2 changes: 1 addition & 1 deletion docs/python_api/datasets/l2g_prediction.md
@@ -6,4 +6,4 @@ title: L2G Prediction

## Schema

---8<-- "assets/schemas/l2g_prediction.md"
+--8<-- "assets/schemas/l2g_predictions.md"
5 changes: 5 additions & 0 deletions docs/python_api/methods/clumping.md
@@ -10,6 +10,7 @@ We have implemented two clumping methods:

1. **Distance-based clumping:** Uses genomic window to clump the significant SNPs into one hit.
2. **LD-based clumping:** Uses genomic window and LD to clump the significant SNPs into one hit.
3. **Locus-breaker clumping:** Applies a distance cutoff between baseline significant SNPs. Returns the start and end position of the locus as well.

The algorithmic logic is similar to classic clumping approaches from PLINK (Reference: [PLINK Clump Documentation](https://zzz.bwh.harvard.edu/plink/clump.shtml)). See details below:

@@ -20,3 +21,7 @@ The algorithmic logic is similar to classic clumping approaches from PLINK (Refe
# LD-based clumping:

::: gentropy.method.clump.LDclumping

# Locus-breaker clumping

::: gentropy.method.locus_breaker_clumping.locus_breaker
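
Note on locus-breaker clumping: the description above (a distance cutoff between baseline-significant SNPs, with the locus start and end returned) can be sketched for a single chromosome as follows. This is a simplified illustration under stated assumptions, not the gentropy implementation. The parameter names, the `Locus` container and the example values are hypothetical, and the real method may apply further filters.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    start: int
    end: int
    lead_position: int
    lead_pvalue: float


def _summarise(group: list[tuple[int, float]]) -> Locus:
    """Collapse a run of SNPs into one locus led by its smallest p-value."""
    lead_position, lead_pvalue = min(group, key=lambda snp: snp[1])
    return Locus(group[0][0], group[-1][0], lead_position, lead_pvalue)


def locus_breaker(
    snps: list[tuple[int, float]], baseline_pvalue: float, distance_cutoff: int
) -> list[Locus]:
    """Split (position, p-value) pairs into loci wherever the gap exceeds the cutoff."""
    significant = sorted(snp for snp in snps if snp[1] <= baseline_pvalue)
    loci: list[Locus] = []
    current: list[tuple[int, float]] = []
    for position, pvalue in significant:
        # A gap larger than the cutoff closes the current locus.
        if current and position - current[-1][0] > distance_cutoff:
            loci.append(_summarise(current))
            current = []
        current.append((position, pvalue))
    if current:
        loci.append(_summarise(current))
    return loci


example_snps = [(100, 1e-9), (150, 1e-12), (400, 1e-3), (1_000, 1e-10), (1_050, 1e-8)]
loci = locus_breaker(example_snps, baseline_pvalue=1e-5, distance_cutoff=500)
print(loci)
```

In this example the SNP at position 400 fails the baseline threshold, so the 850 bp gap between positions 150 and 1,000 splits the signal into two loci.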
2 changes: 1 addition & 1 deletion docs/roadmap.md
@@ -14,7 +14,7 @@ The Open Targets core team is working on refactoring Open Targets Genetics, aimi
- Faster/robust addition of new datasets and datatypes
- Reduce computational and financial cost

-See [here](https://github.com/opentargets/issues/issues?q=is%3Aissue+is%3Aopen+label%3AGenetics_ETL_refactoring) for a list of open issues for this project.
+See [here](https://github.com/opentargets/issues/issues?q=is%3Aissue+is%3Aopen+label%3Agentropy) for a list of open issues for this project.

Schematic diagram representing the drafted process:
