fix(l2g_predictions): annotate based on list of features + filter out missing annotation #925

Merged
merged 5 commits into from
Dec 5, 2024

Conversation

@ireneisdoomed (Contributor) commented Nov 22, 2024

✨ Context

The feature matrix contains every feature we have developed for L2G.
However, that doesn't mean we want to use all of them during training. Right now, the one feature present in the matrix but not used for training is isProteinCoding.

🛠 What does this PR implement

  • add_locus_to_gene_features didn't take into account the list of features used for training. Now it does.
  • We were filtering out features with nulls instead of features equal to 0. Now each prediction only includes features that have an annotation.
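
A minimal sketch of the two changes in plain Python, for illustration only. It assumes a feature-matrix row can be modelled as a dict of feature name to value and that a feature without annotation is stored as 0. The helper `select_annotated_features` and the feature names other than isProteinCoding are hypothetical; this is not the code from the PR.

```python
def select_annotated_features(row, features_list):
    """Keep only the model's features that carry an annotation (non-null, non-zero)."""
    return {
        name: row[name]
        for name in features_list
        if name in row and row[name] is not None and row[name] != 0
    }


# Hypothetical feature-matrix row: isProteinCoding is in the matrix
# but not in the list of features used for training.
row = {"distanceFeature": 0.8, "colocFeature": 0.0, "isProteinCoding": 1.0}
features_list = ["distanceFeature", "colocFeature"]

annotated = select_annotated_features(row, features_list)
print(annotated)
```

Filtering on nulls alone would have kept colocFeature in this example, because its value is 0 rather than null.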

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. poetry run pre-commit run --all-files)?

@github-actions bot added the bug, size-S, Dataset, and Step labels Nov 22, 2024
@project-defiant (Contributor) left a comment

Out of curiosity, how many features that we do not care about were added to the predictions in the last run?

  ) -> L2GPrediction:
-     """Add features to the L2G predictions.
+     """Add features used to extract the L2G predictions.

The method extracts the features named in features_list from the L2GFeatureMatrix. It then re-annotates the locusToGeneFeatures column with a map built from the extracted features if the column exists, or creates the column if it is missing from the schema.
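
The behaviour described in this comment can be sketched in plain Python, with lists of dicts standing in for the datasets. The join keys `studyLocusId` and `geneId`, the `score` field, the function `annotate_predictions`, and `featureA` are assumptions made for this illustration, not names confirmed by the PR.

```python
def annotate_predictions(predictions, feature_matrix, features_list):
    """Return new prediction records with locusToGeneFeatures set from features_list."""
    annotated = []
    for pred in predictions:
        # Assumed join key between predictions and the feature matrix.
        key = (pred["studyLocusId"], pred["geneId"])
        row = feature_matrix.get(key, {})
        feature_map = {name: row[name] for name in features_list if name in row}
        # Overwrites locusToGeneFeatures if it exists, creates it otherwise.
        annotated.append({**pred, "locusToGeneFeatures": feature_map})
    return annotated


predictions = [
    # This record already carries a stale annotation that should be replaced.
    {"studyLocusId": "sl1", "geneId": "g1", "score": 0.9,
     "locusToGeneFeatures": {"stale": 1.0}},
    # This record has no annotation yet.
    {"studyLocusId": "sl2", "geneId": "g2", "score": 0.4},
]
feature_matrix = {
    ("sl1", "g1"): {"featureA": 0.7, "isProteinCoding": 1.0},
    ("sl2", "g2"): {"featureA": 0.2, "isProteinCoding": 1.0},
}

result = annotate_predictions(predictions, feature_matrix, ["featureA"])
for record in result:
    print(record["geneId"], record["locusToGeneFeatures"])
```

Building new records instead of mutating the inputs keeps the original predictions unchanged, which makes the overwrite-or-create behaviour easy to check.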


Args:
feature_matrix (L2GFeatureMatrix): Feature matrix dataset
features_list (list[str]): List of features used in the model

List of features to extract from feature matrix.

@ireneisdoomed (Contributor, Author)

@project-defiant One: isProteinCoding, which is true for everyone right now.

Thanks for the review!

@ireneisdoomed merged commit 43f047a into dev Dec 5, 2024
5 checks passed
@ireneisdoomed deleted the il-prediction-features branch December 5, 2024 15:36