The difference between the max_rounds and smoothing_rounds parameters #596

Natsu817 opened this issue Jan 26, 2025 · 3 comments

@Natsu817

Hi team,
Could you explain the difference between the max_rounds and smoothing_rounds parameters in EBM?
I’d like to make the shape function smoother—should I focus on adjusting one of these parameters, and if so, which one?

@paulbkoch
Collaborator

Hi @Natsu817 -- Perhaps a concrete example would be easiest. Let's say you had:

number of features: 7
interactions=0
max_rounds=3000
smoothing_rounds=1000

In this case you'll get 7 * 3000 = 21,000 total boosting steps, each of which boosts a single feature. The first 1000 * 7 = 7,000 of those steps will be smoothed, and the remaining 14,000 steps will choose their cuts greedily.

During the smoothing rounds, the features will be selected cyclically in a round robin fashion, and the per-feature cuts will be randomly selected. Often jagged edges in the graphs form early in the boosting process if a particular cut point is greedily chosen multiple times. Randomly choosing the cuts initially allows for the general shape of the graphs to form before later sharpening them through greedy cut selection.

Surprisingly, you can often obtain fairly competitive results purely using randomly selected cuts. If you wanted maximal smoothness, setting smoothing_rounds==max_rounds should do the trick.

You might also want to try using reg_alpha or reg_lambda.
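
For instance, here is a minimal sketch (not from this thread) of a maximally smooth configuration following the advice above. X and y are placeholder training data, and reg_lambda availability may depend on your interpret version:

```python
from interpret.glassbox import ExplainableBoostingClassifier

# A sketch only: an EBM configured for maximal smoothness.
ebm = ExplainableBoostingClassifier(
    interactions=0,          # main effects only, as in the example above
    max_rounds=3000,         # upper bound on boosting rounds per feature
    smoothing_rounds=3000,   # == max_rounds, so every round uses random cuts
    reg_lambda=0.1,          # small L2 penalty; worth tuning per dataset
)
ebm.fit(X, y)
```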

We have documentation specifically for obtaining smooth results:
https://interpret.ml/docs/python/examples/interpretable-regression-synthetic.html

@Natsu817
Author

Natsu817 commented Jan 29, 2025

@paulbkoch

Thanks for your response!
Your detailed explanation was really clear and easy to understand. I also checked the link you shared and will use these references as I move forward with my analysis.

I have two more questions I'd like to ask:

Question 1:
In issue #529, greedy_ratio was mentioned.
I understand that there are greedy rounds and cyclic rounds, but how do these relate to smoothing_rounds? Are they separate concepts?
Does smoothing_rounds control whether cut points are selected randomly or greedily in one or both of these rounds?

Question 2:
I'm interested in the theoretical impact of reg_alpha and reg_lambda on EBMs. Do you have any reference materials on this?

I also tested different reg_lambda values on an existing model and noticed that in some cases, the shape became smoother, and the AUC improved.

@paulbkoch
Collaborator

paulbkoch commented Jan 29, 2025

Answer 1:

Let's make this simpler first and consider the scenario where smoothing_rounds=0. In that case the EBM algorithm intermixes greedy rounds and cyclic rounds. Cyclic rounds do exactly what they sound like: they visit each feature in order and boost the features in that order. The gain calculation is not used when choosing which feature to boost on, but gain is used when picking the cut that we make within that feature. For some datasets cyclic rounds by themselves are great and you end up with a perfectly good EBM. There are some datasets, however, where some features would benefit from more boosting than others. Typically these are datasets where there is a lot of complexity in some of the graphs, or where you have things like categoricals. In these cases the cyclic boosting algorithm will tend to overcook some features and undercook others. This is where greedy rounds come in handy, because during the greedy rounds we pick which feature to boost based on gain. Combined, this allows for a more flexible algorithm. smoothing_rounds is a separate, initial phase that runs before this greedy/cyclic mix: during those rounds, as described above, features are visited cyclically and the cuts are chosen randomly rather than by gain.
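
To make the relationship concrete, here's a hedged sketch of how these knobs appear together on the estimator (parameter names as in recent interpret releases; availability and defaults may vary by version, and the values are illustrative, not recommendations):

```python
from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier(
    smoothing_rounds=1000,  # initial phase: cyclic feature order, random cuts
    greedy_ratio=1.5,       # afterwards: proportion of greedy vs. cyclic boosting
    max_rounds=3000,        # overall cap on boosting rounds
)
```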

Answer 2:

I added reg_alpha (L1 regularization) and reg_lambda (L2 regularization) because these are also available in our friendly competitors: XGBoost, LightGBM, and CatBoost. The gradient calculations are identical to theirs, so I think anything you find on how these work in other GBDTs will also apply to EBMs. They are also used in linear models, so altogether you should have a wealth of resources to explore. Happy to answer any specific questions you have regarding these for EBMs, though.
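
For reference (not stated explicitly in this thread): the standard second-order update used by XGBoost-style boosters for a leaf or bin value, with L1 strength $\alpha$ (reg_alpha) and L2 strength $\lambda$ (reg_lambda), is

$$w^* = -\frac{\operatorname{sign}(G)\,\max(|G| - \alpha,\ 0)}{H + \lambda}$$

where $G$ and $H$ are the sums of gradients and hessians falling into that leaf. Assuming EBM follows the same form (as the identical gradient calculations suggest), $\lambda$ shrinks every update toward zero while $\alpha$ zeroes out updates whose gradient sum falls below the threshold.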

Most of the time I'd expect a small reg_alpha or reg_lambda to help AUC, but the best value needs to be tuned per dataset, which is why these default to 0.0 (and the default isn't terrible).
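
Since EBMs follow the scikit-learn estimator API, that per-dataset tuning can be done with ordinary cross-validation. A hypothetical sketch, with X and y as placeholder training data and an illustrative grid:

```python
from sklearn.model_selection import GridSearchCV
from interpret.glassbox import ExplainableBoostingClassifier

# Hypothetical grid; the useful range depends on the dataset.
search = GridSearchCV(
    ExplainableBoostingClassifier(interactions=0),
    param_grid={"reg_lambda": [0.0, 0.01, 0.1, 1.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```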
