SHAPR for custom defined models #420

Closed
mehwish2021 opened this issue Nov 18, 2024 · 14 comments

@mehwish2021

Hello All,
I am using a DeepCC model and want to apply shapr to get the Shapley values.
I defined the functions as described in https://cran.r-project.org/web/packages/shapr/vignettes/understanding_shapr.html, but it still gives me an error saying that shapr is not defined for this model.

I have defined get_model_specs() and predict_model() as described in the vignette, but I am confused about how and where those functions are called.
Please help; I have spent a lot of time on this and cannot resolve the issue.

@martinju
Member

Hi! Please use the github version of the shapr package.

@mehwish2021
Author

Thanks for your prompt response, Martinju.
Just to clarify: do you mean I have to add these functions to the code of the shapr package itself? If so, can you tell me which file I need to add them to?
I would really appreciate your response.

@martinju
Member

No, the package on CRAN is outdated (and will soon be replaced by the main branch on GitHub). It does things a bit differently. I suggest you install the GitHub version of shapr instead, modify your code accordingly, and use that to explain your custom model. The procedure is explained here:
https://norskregnesentral.github.io/shapr/articles/understanding_shapr.html#explain-custom-models
(which is slightly different from the procedure with the version on CRAN)

If you still have issues after that, let me know.
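
To make the custom-model procedure concrete, here is a minimal sketch in base R. It assumes, based on the vignette linked above, that `predict_model` takes the model and new data and returns a numeric vector, and that `get_model_specs` returns a list with `labels`, `classes` and `factor_levels`. The class name `my_custom_model` and all helper names are hypothetical, and the `explain()` call is left commented out because it needs the GitHub version of shapr.

```r
# Hypothetical custom model: a thin wrapper around lm() with its own class.
fit_my_custom_model <- function(formula, data) {
  structure(list(fit = lm(formula, data = data)), class = "my_custom_model")
}

# predict_model(): takes the model and new data, returns a numeric vector.
my_predict_model <- function(x, newdata) {
  as.numeric(predict(x$fit, newdata = as.data.frame(newdata)))
}

# get_model_specs(): describes the features the model expects.
my_get_model_specs <- function(x) {
  trm <- terms(x$fit)
  labs <- attr(trm, "term.labels")
  classes <- attr(trm, "dataClasses")[labs]
  # No factor features in this toy example, so every entry stays NULL.
  factor_levels <- setNames(vector("list", length(labs)), labs)
  list(labels = labs, classes = classes, factor_levels = factor_levels)
}

features <- c("wt", "hp", "qsec")
x_train <- mtcars[1:25, features]
x_explain <- mtcars[26:32, features]
model <- fit_my_custom_model(mpg ~ wt + hp + qsec, data = mtcars[1:25, ])
phi0 <- mean(mtcars$mpg[1:25])

# Check the two functions on their own before handing them to shapr.
preds <- my_predict_model(model, x_explain)
specs <- my_get_model_specs(model)

# With the GitHub version of shapr installed, the functions are passed
# directly as arguments (assumed usage, not run here):
# remotes::install_github("NorskRegnesentral/shapr")
# explanation <- shapr::explain(
#   model = model, x_explain = x_explain, x_train = x_train,
#   approach = "empirical", phi0 = phi0,
#   predict_model = my_predict_model, get_model_specs = my_get_model_specs
# )
```

The point of the sketch is that nothing is added to the package source: the two functions live in your own script and are handed to `explain()` as arguments.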

@vmombo

vmombo commented Nov 21, 2024

Hello All, I have a Keras model, is it possible to make it work with it?

@martinju
Member

> Hello All, I have a Keras model, is it possible to make it work with it?

Yes. Please see the main vignette on the pkgdown site for instructions.

@vmombo

vmombo commented Nov 22, 2024

Thanks. I tried it, and it works well on a model I built with the Boston dataset. However, with one of my real models I get this:

```r
explanation <- explain(
  model = finalModel,
  x_explain = sdt.test,
  x_train = dt.test[1:100, ],
  approach = "empirical", # Choose explanation approach
  phi0 = phi0, # Specify baseline value
  predict_model = predict_model.EnsembleModel,
  get_model_specs = get_model_specs,
  # max_n_coalitions = 2^10, # Reasonable limit
  verbose = "progress"
)
```

```
Success with message:
max_n_coalitions is NULL or larger than or 2^n_features = 3.86856262276681e+25,
and is therefore set to 2^n_features = 3.86856262276681e+25.

Error in .makeMessage(..., domain = domain) :
argument is missing, with no default
```

I have tried to understand this error by looking at the source code, but I still don't understand it. Any suggestions about where I should investigate?
PS: I am using the version from the main branch of this GitHub repo.

@martinju
Member

It seems you have an enormous number of features? (83 or 84 features?)

I have not seen this error message before, but in any case, you should first try to reduce the number of features to see if it works then.
Also, set max_n_coalitions to 500 or something to reduce runtime when testing.
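
A quick way to read the feature count off the message quoted earlier is to take the base-2 logarithm of the reported `2^n_features`. The sketch below uses only base R; the large value is copied from that message, and 500 is the test setting suggested above.

```r
# Value of 2^n_features copied from the message reported earlier in the thread.
n_coalitions_full <- 3.86856262276681e+25

# The exponent gives the number of features.
n_features <- round(log2(n_coalitions_full))
print(n_features)

# Fraction of all coalitions that is evaluated when capping at 500 for a test run.
max_n_coalitions <- 500
fraction_evaluated <- max_n_coalitions / n_coalitions_full
print(fraction_evaluated)
```

The exponent comes out as 85, so enumerating every coalition is out of reach and a cap such as `max_n_coalitions = 500` samples only a tiny fraction of them.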

@vmombo

vmombo commented Nov 22, 2024

Yes, excellent. But I dug into the code a little to get something like a traceback of my error.

I think that, apart from the large number of features, my error comes from a warning in the code. The explain function calls setup, and in the function check_computability my input triggers this condition:

```r
if (isFALSE(is_groupwise) && n_features > 30) {
  warning(
    "Due to computation time, we strongly recommend enabling iterative estimation with iterative = TRUE",
    " when n_features > 30.\n",
  )
}
```

However, a warning written this way produces an error because of the trailing "," at the end. You can reproduce my error just by calling this line:

```r
warning( "Due to computation time, we strongly recommend enabling iterative estimation with iterative = TRUE", " when n_features > 30.\n", )
```
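
The effect of the trailing comma can be checked in isolation with base R. The sketch below uses shortened placeholder messages rather than the package's text, and wraps both variants in `tryCatch()` so that the outcome is recorded instead of stopping the script.

```r
# Call with a trailing comma: the empty last argument makes warning() fail
# with an error instead of emitting the warning.
with_trailing_comma <- tryCatch(
  warning("first part", " second part.\n", ),
  warning = function(w) "warning",
  error = function(e) "error"
)

# Same call without the trailing comma: a warning is emitted as intended.
without_trailing_comma <- tryCatch(
  warning("first part", " second part.\n"),
  warning = function(w) "warning",
  error = function(e) "error"
)

print(c(with_trailing_comma, without_trailing_comma))
```

Removing the trailing comma is therefore enough to turn the error back into the intended warning.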

@martinju
Member

Oh, thanks a lot for catching that bug! I will fix it ASAP

@martinju
Member

@vmombo I just merged the fix to main

@vmombo

vmombo commented Nov 22, 2024

Thank you very much. I have just seen your comment.

@vmombo

vmombo commented Nov 25, 2024

Hi @martinju ,

I have 85 variables to explain for some predictions. I tried several times but got crashes with the following error:

```
R(27812,0x2033e3840) malloc: *** error for object 0x600001bdc080: pointer being freed was not allocated
R(27812,0x2033e3840) malloc: *** set a breakpoint in malloc_error_break to debug
```

I'm not sure if this is due to my computer's capacity, but I suspect it might be (so many coalitions). I found a workaround, which is to group the variables as you did in the paper.

However and most importantly, I would like to know the following:

  • In the grouped approach, what exactly is being explained in the predictions?
  • Is it the mean value of the group?

Thanks in advance, for your help

@martinju
Member

Hi again!
85 variables is a lot, so resorting to the grouped approach is recommended.
That said, you should be able to avoid the memory issues by reducing the number of coalitions evaluated in each batch. It will, however, take a lot of time to compute Shapley values with decent accuracy.

The grouped approach explains how each group of features contributes to the prediction, instead of the individual features (i.e. what happens as you add/remove the entire group instead of adding/removing individual features). It is not about mean values or anything like that. A basic and practical introduction to the approach is given here: https://martinjullum.com/publication/jullum-2021-efficient/jullum-2021-efficient.pdf
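
As a rough illustration of why grouping helps, the base R sketch below builds a grouping and compares the number of coalitions with and without it. The feature names, the five groups of 17, and the assumption that `explain()` accepts such a named list of character vectors through a `group` argument are illustrative only; check the package documentation for the exact interface.

```r
# Hypothetical feature names standing in for the 85 real variables.
feature_names <- paste0("x", 1:85)

# Assign features to 5 illustrative groups of 17 features each; in practice
# the grouping should reflect which variables belong together.
group_id <- rep(paste0("group_", 1:5), each = 17)
groups <- split(feature_names, group_id)

# Coalitions to enumerate: feature-wise versus group-wise.
n_coalitions_features <- 2^length(feature_names)
n_coalitions_groups <- 2^length(groups)
print(c(features = n_coalitions_features, groups = n_coalitions_groups))

# Assumed usage with the GitHub version of shapr (not run here):
# explanation <- shapr::explain(..., group = groups)
```

With five groups there are only 32 coalitions to evaluate, and each resulting Shapley value describes the contribution of a whole group being added or removed together.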

@vmombo

vmombo commented Nov 27, 2024

Thank you for the clarification and the resource!

I'll take a closer look at the reference—appreciate you sharing it!
