Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

explainability: new module #44

Open
wants to merge 67 commits into
base: main
Choose a base branch
from
Open

Conversation

JochenSiegWork
Copy link
Collaborator

- Add proof of concept for Explainer class and explanation
  data structures to express explanations for feature
  vectors and molecules.
- Add Christian W. Feldmanns visualization code for shap
  weighted heatmaps of the molecular structure.

@JochenSiegWork JochenSiegWork marked this pull request as draft July 5, 2024 09:40
@JochenSiegWork JochenSiegWork self-assigned this Jul 5, 2024
@JochenSiegWork JochenSiegWork force-pushed the explainability_module branch 2 times, most recently from 9d236b9 to d6d880e Compare November 21, 2024 12:24
@JochenSiegWork JochenSiegWork marked this pull request as ready for review December 3, 2024 11:20
Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • move to experimental
  • only ignorer errors where required.

Overall this is a lot to look through and my brain started to nope out...
I think we can leave it as is and improve later. I still think the text below the explanations are confusing.

molpipeline/explainability/explainer.py Outdated Show resolved Hide resolved
molpipeline/explainability/visualization/utils.py Outdated Show resolved Hide resolved
tests/test_explainability/test_shap_explainers.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry still haven't finished everything. Just submitting a WIP

return feature_matrix


def _convert_to_array(value: Any) -> npt.NDArray[np.float64]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use numpy.atleast_1d instead.

return atom_weights


ShapExplanation: TypeAlias = list[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly TypeAlias is deprecated https://docs.python.org/3/library/typing.html#typing.TypeAlias

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

ShapExplanation = Sequence[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its obvious some varaibles aren't required to be explicitly typed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the name reads to me like this was an implemented class. Maybe ExplanationList or something like this?

Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notebook: After cell [16]:
We hypothesis -> We hypothesi(z/s)e?

raise ValueError("Value is not a scalar or numpy array.")


def _get_prediction_function(pipeline: Pipeline | BaseEstimator) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Callable[[npt.Arraylike], npt.Arraylike]

return atom_weights


ShapExplanation: TypeAlias = list[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

ShapExplanation = Sequence[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]

return atom_weights


ShapExplanation: TypeAlias = list[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its obvious some varaibles aren't required to be explicitly typed.


# determine type of returned explanation
featurization_element = self.featurization_subpipeline.steps[-1][1] # type: ignore[union-attr]
self.return_element_type_: type[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type hints for class/instance variables should be typed outside of functions. It would be best to do this above the init function.

return atom_weights


ShapExplanation: TypeAlias = list[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the name reads to me like this was an implemented class. Maybe ExplanationList or something like this?

import shap
from scipy.sparse import issparse, spmatrix
from sklearn.base import BaseEstimator
from typing_extensions import override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better use typing.override

self.values = np.zeros((self.x_res, self.y_res))

@property
def dx(self) -> float:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since dx and dy are repeatedly called, would it make sense to just return self._dx or self._dy?
We make x_lim, y_lim x_res, and y_res properties and when the setter is called, we appt self._dx and self._dy.

y_lim: Sequence[float],
x_res: int,
y_res: int,
):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow to pass functions during init

if isinstance(explainer, SHAPTreeExplainer) and isinstance(
estimator, GradientBoostingClassifier
):
# there is currently a bug in SHAP's TreeExplainer for GradientBoostingClassifier
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe log a warning or info so me might remember this :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants