Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functional refactor #81

Open
wants to merge 41 commits into
base: main
Choose a base branch
from
Open

Functional refactor #81

wants to merge 41 commits into from

Conversation

LukasMahieu
Copy link
Collaborator

@LukasMahieu LukasMahieu commented Dec 17, 2024

Functional Refactor of crested.tl.Crested(...) class

In this large refactor I'm moving everything from the Crested class that does not use both a model and the AnnDatamodule out to a new _tools.py module where everything will be functional.
All the old functions remain in the Crested class for backward compatibility (for now) but will now raise a deprecation warning.

I'm giving up a bit of clarity for ease of use by combining functions that do the same on different inputs into one single function.

Equivalent new functions

  • tl.Crested.get_embeddings(...) ---> tl.extract_layer_embeddings(...)
  • tl.Crested.predict(...) ---> tl.predict(...)
  • tl.Crested.predict_regions(...) ---> tl.predict(...)
  • tl.Crested.predict_sequence(...) ---> tl.predict(...)
  • tl.Crested.score_gene_locus(...) ---> tl.score_gene_locus(...)
  • tl.Crested.calculate_contribution_scores(...) ---> tl.contribution_scores(...)
  • tl.Crested.calculate_contribution_scores_regions(...) ---> tl.contribution_scores(...)
  • tl.Crested.calculate_contribution_scores_sequence(...) ---> tl.contribution_scores(...)
  • tl.Crested.calculate_contribution_scores_enhancer_design(...) ---> tl.contribution_scores(...)
  • tl.Crested.tfmodisco_calculate_and_save_contribution_scores_sequences ---> tl.contribution_scores_specific(...)
  • tl.Crested.tfmodisco_calculate_and_save_contribution_scores ---> tl.contribution_scores(...)
  • tl.Crested.enhancer_design_motif_implementation ---> tl.enhancer_design_motif_insertion
  • tl.Crested.enhancer_design_in_silico_evolution ---> tl.enhancer_design_in_silico_evolution

New functions

Some utility functions were hidden inside the Crested class but required to be made explicit.

  • utils.calculate_nucleotide_distribution (advised for enhancer design)
  • utils.derive_intermediate_sequences (required for inspecting intermediate results from enhancer design)

New behaviour

  • All functions that accept a model can now also accept lists of models, in which case the results will be averaged across models.
  • All functions use a similar api, namely they expect some 'input' that can be converted to a one hot encoding (sequences, region names, anndatas with region names), but now the conversion happens behind the scenes so the user doesn't have to worry about this and we don't have a separate function per input format.

Tests

Added a bunch more unit tests for the tooling module so it behaves as expected, as well as unit tests that ensure the refactored functions mimick the behaviour of the olds functions.

Notebooks

  • Rewrote the tutorials to use the new functional API (WIP).
  • Expanded on the enhancer design section

To do

  • update other notebooks
  • add deprecation warnings to crested class

LukasMahieu and others added 30 commits November 26, 2024 17:16
@LukasMahieu
Copy link
Collaborator Author

I'm adding a bunch of people for review. This is way too much code change to actually test in depth for one person, but I just want to make sure we're on the same page for the functional refactor's new API before actually merging. When we agree on the format, I'll merge and ask all lab members to install the pre-release version and test from the main branch before pushing a new release.

You can look at the tutorial notebook and compare with the current release to get a good idea of what's different, or look at the refactor unit tests to see the equivalent api's.

Let me know if you have comments or suggestions.
@erceksi @nkempynck @casblaauw @SeppeDeWinter

@LukasMahieu
Copy link
Collaborator Author

Another idea I had: should we add a register_model similar to the register_genome that we have?
It's a bit dangerous in the sense that a user might forget which model they are working with if they have multiple, but it would get rid of many model=... arguments since every function expects it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant