explainability: new module #44

JochenSiegWork · 2024-07-05T09:40:03Z

- Add proof of concept for Explainer class and explanation
  data structures to express explanations for feature
  vectors and molecules.
- Add Christian W. Feldmann's visualization code for SHAP-weighted
  heatmaps of the molecular structure.

c-w-feldmann

Move to experimental.
Only ignore errors where required.

Overall this is a lot to look through and my brain started to nope out...
I think we can leave it as is and improve later. I still think the text below the explanations is confusing.

molpipeline/explainability/explainer.py

molpipeline/explainability/visualization/utils.py

tests/test_explainability/test_shap_explainers.py

c-w-feldmann

Sorry, I still haven't finished everything. Just submitting a WIP.

c-w-feldmann · 2024-12-09T09:47:18Z

molpipeline/experimental/explainability/explainer.py

+    return feature_matrix
+
+
+def _convert_to_array(value: Any) -> npt.NDArray[np.float64]:


I think we could use numpy.atleast_1d instead.
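A minimal sketch of this suggestion: `numpy.atleast_1d` promotes scalars to one-element arrays and leaves arrays with one or more dimensions as they are. The helper name `to_array` is hypothetical and is not the PR's `_convert_to_array`. Unlike the original, which raises a `ValueError` for unsupported input, this version does not validate the input type.

```python
import numpy as np
import numpy.typing as npt


def to_array(value: npt.ArrayLike) -> npt.NDArray[np.float64]:
    """Hypothetical stand-in: turn a scalar or array into an array with ndim >= 1."""
    # np.asarray handles scalars, lists and arrays; atleast_1d lifts 0-d results to 1-d.
    return np.atleast_1d(np.asarray(value, dtype=np.float64))
```

Whether dropping the explicit type check is acceptable depends on what callers pass in, so the `ValueError` branch may still be worth keeping.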

c-w-feldmann · 2024-12-09T09:56:07Z

molpipeline/experimental/explainability/explainer.py

+    return atom_weights
+
+
+ShapExplanation: TypeAlias = list[


If I read this correctly, TypeAlias is deprecated: https://docs.python.org/3/library/typing.html#typing.TypeAlias

Maybe:

ShapExplanation = Sequence[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]

If it's obvious, some variables don't need to be explicitly typed.

Also, the name reads to me as if this were an implemented class. Maybe ExplanationList or something like this?

c-w-feldmann

Notebook: After cell [16]:
We hypothesis -> We hypothesi(z/s)e?

c-w-feldmann · 2025-01-07T10:47:53Z

molpipeline/experimental/explainability/explainer.py

+    raise ValueError("Value is not a scalar or numpy array.")
+
+
+def _get_prediction_function(pipeline: Pipeline | BaseEstimator) -> Any:


-> Callable[[npt.ArrayLike], npt.ArrayLike]
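A sketch of what the narrower return type could look like. The body is an assumption (prefer `predict_proba`, fall back to `predict`) and may not match the PR's actual logic; `PredictionFunction` and `MeanRegressor` are hypothetical names used only for illustration.

```python
from collections.abc import Callable
from typing import Any

import numpy as np
import numpy.typing as npt

# Hypothetical alias for the suggested return annotation.
PredictionFunction = Callable[[npt.ArrayLike], npt.ArrayLike]


def get_prediction_function(model: Any) -> PredictionFunction:
    """Return predict_proba when available, otherwise predict (assumed behaviour)."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba
    if hasattr(model, "predict"):
        return model.predict
    raise ValueError("Model has neither predict_proba nor predict.")


class MeanRegressor:
    """Tiny made-up model used only to exercise the sketch."""

    def predict(self, X: npt.ArrayLike) -> npt.ArrayLike:
        # Row-wise mean as a dummy prediction.
        return np.asarray(X, dtype=np.float64).mean(axis=1)
```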

c-w-feldmann · 2025-01-09T12:47:38Z

molpipeline/experimental/explainability/explainer.py

+    return atom_weights
+
+
+ShapExplanation: TypeAlias = list[


Maybe:

ShapExplanation = Sequence[SHAPFeatureExplanation | SHAPFeatureAndAtomExplanation]

c-w-feldmann · 2025-01-09T12:48:39Z

molpipeline/experimental/explainability/explainer.py

+    return atom_weights
+
+
+ShapExplanation: TypeAlias = list[


If it's obvious, some variables don't need to be explicitly typed.

c-w-feldmann · 2025-01-09T12:51:49Z

molpipeline/experimental/explainability/explainer.py

+
+        # determine type of returned explanation
+        featurization_element = self.featurization_subpipeline.steps[-1][1]  # type: ignore[union-attr]
+        self.return_element_type_: type[


Type hints for class/instance variables should be declared outside of functions. It would be best to do this in the class body, above the `__init__` method.

c-w-feldmann · 2025-01-09T12:56:21Z

molpipeline/experimental/explainability/explainer.py

+    return atom_weights
+
+
+ShapExplanation: TypeAlias = list[


Also, the name reads to me as if this were an implemented class. Maybe ExplanationList or something like this?

c-w-feldmann · 2025-01-09T13:04:43Z

molpipeline/experimental/explainability/explainer.py

+import shap
+from scipy.sparse import issparse, spmatrix
+from sklearn.base import BaseEstimator
+from typing_extensions import override


Better to use typing.override.
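One caveat: `typing.override` only exists from Python 3.12 on, so the switch depends on the minimum Python version the project supports. The sketch below guards the import; the no-op fallback is an assumption for illustration (the PR itself imports from `typing_extensions`), and `Base`/`Child` are made-up classes.

```python
import sys
from typing import Any, Callable, TypeVar

if sys.version_info >= (3, 12):
    from typing import override
else:
    _F = TypeVar("_F", bound=Callable[..., Any])

    def override(method: _F) -> _F:
        """No-op stand-in for older interpreters (illustration only)."""
        return method


class Base:
    def explain(self) -> str:
        return "base"


class Child(Base):
    @override  # a type checker flags this if Base has no matching method
    def explain(self) -> str:
        return "child"
```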

c-w-feldmann · 2025-01-09T13:12:56Z

molpipeline/experimental/explainability/visualization/heatmaps.py

+        self.values = np.zeros((self.x_res, self.y_res))
+
+    @property
+    def dx(self) -> float:


Since dx and dy are called repeatedly, would it make sense to just return self._dx or self._dy?
We could make x_lim, y_lim, x_res, and y_res properties and, whenever one of their setters is called, update self._dx and self._dy.
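A sketch of that idea for the x direction only. `GridSketch` is a hypothetical class, and the formula for the cell width (axis range divided by resolution) is an assumption about what `dx` computes in the PR.

```python
from collections.abc import Sequence


class GridSketch:
    """Hypothetical grid that caches the cell width instead of recomputing it."""

    def __init__(self, x_lim: Sequence[float], x_res: int) -> None:
        self._x_lim = tuple(x_lim)
        self._x_res = x_res
        self._dx = 0.0
        self._update_dx()

    def _update_dx(self) -> None:
        # Assumed definition: axis range divided by the number of cells.
        self._dx = (max(self._x_lim) - min(self._x_lim)) / self._x_res

    @property
    def x_lim(self) -> tuple[float, ...]:
        return self._x_lim

    @x_lim.setter
    def x_lim(self, value: Sequence[float]) -> None:
        self._x_lim = tuple(value)
        self._update_dx()

    @property
    def x_res(self) -> int:
        return self._x_res

    @x_res.setter
    def x_res(self, value: int) -> None:
        self._x_res = value
        self._update_dx()

    @property
    def dx(self) -> float:
        # Cheap read; the value is refreshed by the setters above.
        return self._dx
```

The same pattern would apply to `y_lim`, `y_res` and `dy`. The trade-off is more setter code in exchange for cheaper reads.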

c-w-feldmann · 2025-01-09T13:14:19Z

molpipeline/experimental/explainability/visualization/heatmaps.py

+        y_lim: Sequence[float],
+        x_res: int,
+        y_res: int,
+    ):


Allow passing functions during init.

c-w-feldmann · 2025-01-09T13:28:34Z

tests/test_experimental/test_explainability/test_shap_explainers.py

+            if isinstance(explainer, SHAPTreeExplainer) and isinstance(
+                estimator, GradientBoostingClassifier
+            ):
+                # there is currently a bug in SHAP's TreeExplainer for GradientBoostingClassifier


Maybe log a warning or info so we might remember this :D
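Something along these lines could work, using the standard `logging` module. `should_skip` is a hypothetical helper that takes class names as strings so the sketch stays self-contained; the real test checks instances with `isinstance`.

```python
import logging

logger = logging.getLogger(__name__)


def should_skip(explainer_name: str, estimator_name: str) -> bool:
    """Return True for the known-bad combination and leave a trace in the log."""
    if (
        explainer_name == "SHAPTreeExplainer"
        and estimator_name == "GradientBoostingClassifier"
    ):
        logger.warning(
            "Skipping %s with %s because of a known issue in SHAP's TreeExplainer.",
            explainer_name,
            estimator_name,
        )
        return True
    return False
```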

JochenSiegWork marked this pull request as draft July 5, 2024 09:40

JochenSiegWork self-assigned this Jul 5, 2024

JochenSiegWork force-pushed the explainability_module branch from bf9714d to 3869edb Compare August 13, 2024 08:02

JochenSiegWork force-pushed the explainability_module branch from cdbece2 to 21a13bf Compare October 9, 2024 07:01

JochenSiegWork force-pushed the explainability_module branch 2 times, most recently from 9d236b9 to d6d880e Compare November 21, 2024 12:24

JochenSiegWork added 24 commits November 25, 2024 13:49

explainability: new module

8e24c7a

- Add proof of concept for Explainer class and explanation data structures to express explanations for feature vectors and molecules.
- Add Christian W. Feldmanns visualization code for shap weighted heatmaps of the molecular structure.

explainability: changes for numpy2

8b39bf7

explanability: add new visualization

6573676

explainability: linting vis code

3bd54d1

explainability: fix linting

36f2517

explainability: vis linting

3fdeee5

explainability: more linting

bd78f18

linting

2f6c521

linting again

83df2dc

explaonability: add matplotlib as dependency for visualization

787e79f

explainability: improve speed

368b9af

explainability: speed up unittests

59b550f

explainability: suppress already checked mypy warning

3cbf568

mypy

d69d2b1

mypy + rename interface atom_weights

c8ac95f

explainability: add more visualization

206af69

explainability: refactored shap heatmap visualization

76395c4

linting

9db8ca0

linting

e1176e4

linting

2e89391

linting

e9e0102

linting

45008c9

linting

4f5c186

explainability: handle fill values better

d883236

JochenSiegWork added 21 commits November 25, 2024 14:11

linting

44cfe84

linting

2f7bc43

linting

8a95d29

linting

5565805

linting

29daa6d

linting

fe50d74

linitng

994fc3e

linting

e0c3d3c

linting

633def2

linting

fbb855c

linting

725cdb8

linting

f12432a

linting

5603cf0

add feature_names to explainability code

1981d6a

linting

d6bc7c2

add xai notebook and adaptations

398a5dc

linting

0a9acc4

improve xai notebook

0c14d58

finished notebook

ccb43e3

black

671b4d0

example data

964213b

JochenSiegWork marked this pull request as ready for review December 3, 2024 11:20

JochenSiegWork requested a review from c-w-feldmann December 3, 2024 13:38

c-w-feldmann requested changes Dec 4, 2024

JochenSiegWork added 4 commits December 4, 2024 16:57

Christian comments 1

52387f5

linting

8d3ae06

pydocstyle

55036aa

rename xai notebook and update import

a970d3d

c-w-feldmann reviewed Dec 9, 2024

c-w-feldmann requested changes Jan 9, 2025
