The difference between the max_rounds and smoothing_rounds parameters #596

Natsu817 opened this issue Jan 26, 2025 · 3 comments

@Natsu817

Hi team,
Could you explain the difference between the max_rounds and smoothing_rounds parameters in EBM?
I’d like to make the shape function smoother—should I focus on adjusting one of these parameters, and if so, which one?

@paulbkoch
Collaborator

Hi @Natsu817 -- Perhaps a concrete example would be easiest. Let's say you had:

number of features: 7
interactions=0
max_rounds=3000
smoothing_rounds=1000

In this case you'll get 7 * 3000 = 21,000 total boosting steps, each of which boosts a single feature. The first 1000 * 7 = 7,000 of those steps will be smoothed, and the remaining 14,000 steps will choose their cuts greedily.

During the smoothing rounds, the features will be selected cyclically in a round robin fashion, and the per-feature cuts will be randomly selected. Often jagged edges in the graphs form early in the boosting process if a particular cut point is greedily chosen multiple times. Randomly choosing the cuts initially allows for the general shape of the graphs to form before later sharpening them through greedy cut selection.

Surprisingly, you can often obtain fairly competitive results purely using randomly selected cuts. If you wanted maximal smoothness, setting smoothing_rounds==max_rounds should do the trick.

You might also want to try using reg_alpha or reg_lambda.
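
For instance, here is a minimal sketch (not from this thread) of a maximally smooth configuration following the advice above. X and y are placeholder training data, and reg_lambda availability may depend on your interpret version:

```python
from interpret.glassbox import ExplainableBoostingClassifier

# A sketch only: an EBM configured for maximal smoothness.
ebm = ExplainableBoostingClassifier(
    interactions=0,          # main effects only, as in the example above
    max_rounds=3000,         # upper bound on boosting rounds per feature
    smoothing_rounds=3000,   # == max_rounds, so every round uses random cuts
    reg_lambda=0.1,          # small L2 penalty; worth tuning per dataset
)
ebm.fit(X, y)
```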

We have documentation specifically for obtaining smooth results:
https://interpret.ml/docs/python/examples/interpretable-regression-synthetic.html

@Natsu817
Author

Natsu817 commented Jan 29, 2025

@paulbkoch

Thanks for your response!
Your detailed explanation was really clear and easy to understand. I also checked the link you shared and will use these references as I move forward with my analysis.

I have two more questions I'd like to ask:

Question 1:
In issue #529, greedy_ratio was mentioned.
I understand that there are greedy rounds and cyclic rounds, but how do these relate to smoothing_rounds? Are they separate concepts?
Does smoothing_rounds control whether cut points are selected randomly or greedily in one or both of these rounds?

Question 2:
I'm interested in the theoretical impact of reg_alpha and reg_lambda on EBMs. Do you have any reference materials on this?

I also tested different reg_lambda values on an existing model and noticed that in some cases, the shape became smoother, and the AUC improved.

@paulbkoch
Collaborator

paulbkoch commented Jan 29, 2025

Answer 1:

Let's make this simpler first and consider the scenario where smoothing_rounds=0. In that case the EBM algorithm intermixes greedy rounds and cyclic rounds. Cyclic rounds do exactly what they sound like: they visit each feature in order and boost the features in that order. The gain calculation is not used when choosing which feature to boost on, but gain is used when picking the cut that we make within that feature. For some datasets cyclic rounds by themselves are great and you end up with a perfectly good EBM. There are some datasets, however, where some features would benefit from more boosting than others. Typically these are datasets where there is a lot of complexity in some of the graphs, or where you have things like categoricals. In these cases the cyclic boosting algorithm will tend to overcook some features and undercook others. This is where greedy rounds come in handy, because during the greedy rounds we pick which feature to boost based on gain. Combined, this allows for a more flexible algorithm. smoothing_rounds is a separate, initial phase that runs before this greedy/cyclic mix: during those rounds, as described above, features are visited cyclically and the cuts are chosen randomly rather than by gain.
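
To make the relationship concrete, here's a hedged sketch of how these knobs appear together on the estimator (parameter names as in recent interpret releases; availability and defaults may vary by version, and the values are illustrative, not recommendations):

```python
from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier(
    smoothing_rounds=1000,  # initial phase: cyclic feature order, random cuts
    greedy_ratio=1.5,       # afterwards: proportion of greedy vs. cyclic boosting
    max_rounds=3000,        # overall cap on boosting rounds
)
```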

Answer 2:

I added reg_alpha (L1 regularization) and reg_lambda (L2 regularization) because these are also available in our friendly competitors: XGBoost, LightGBM, and CatBoost. The gradient calculations are identical to theirs, so I think anything you find on how these work in other GBDTs will also apply to EBMs. They are also used in linear models, so altogether you should have a wealth of resources to explore. Happy to answer any specific questions you have regarding these for EBMs, though.
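
For reference (not stated explicitly in this thread): the standard second-order update used by XGBoost-style boosters for a leaf or bin value, with L1 strength $\alpha$ (reg_alpha) and L2 strength $\lambda$ (reg_lambda), is

$$w^* = -\frac{\operatorname{sign}(G)\,\max(|G| - \alpha,\ 0)}{H + \lambda}$$

where $G$ and $H$ are the sums of gradients and hessians falling into that leaf. Assuming EBM follows the same form (as the identical gradient calculations suggest), $\lambda$ shrinks every update toward zero while $\alpha$ zeroes out updates whose gradient sum falls below the threshold.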

Most of the time I'd expect a small reg_alpha or reg_lambda to help AUC, but the best value needs to be tuned per dataset, which is why these default to 0.0 (and the default isn't terrible).
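
Since EBMs follow the scikit-learn estimator API, that per-dataset tuning can be done with ordinary cross-validation. A hypothetical sketch, with X and y as placeholder training data and an illustrative grid:

```python
from sklearn.model_selection import GridSearchCV
from interpret.glassbox import ExplainableBoostingClassifier

# Hypothetical grid; the useful range depends on the dataset.
search = GridSearchCV(
    ExplainableBoostingClassifier(interactions=0),
    param_grid={"reg_lambda": [0.0, 0.01, 0.1, 1.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```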
