The difference between the max_rounds and smoothing_rounds parameters #596
Hi team,

Could you explain the difference between the max_rounds and smoothing_rounds parameters in EBM?

I'd like to make the shape functions smoother. Should I focus on adjusting one of these parameters, and if so, which one?
Hi @Natsu817 -- Perhaps a concrete example would be easiest. Let's say you had:

- number of features: 7
- max_rounds: 3000
- smoothing_rounds: 1000

In this case, you'll get 7 * 3000 = 21,000 total boosting steps where a single feature is boosted on. The first 7 * 1000 = 7,000 boosting steps will be smoothed, and the remaining 14,000 steps after that will greedily choose the cuts. During the smoothing rounds, the features are selected cyclically in a round-robin fashion, and the per-feature cuts are selected randomly. Jagged edges often form in the graphs early in the boosting process when a particular cut point is greedily chosen multiple times. Choosing the cuts randomly at first allows the general shape of the graphs to form before they are later sharpened through greedy cut selection. Surprisingly, you can often obtain fairly competitive results using purely randomly selected cuts.

If you want maximal smoothness, setting smoothing_rounds == max_rounds should do the trick. You might also want to try using reg_alpha or reg_lambda. We have documentation specifically on obtaining smooth results.
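For concreteness, here is a minimal sketch of that configuration. The synthetic dataset is an assumption for illustration only; the constructor parameters are the ones discussed above:

```python
# Minimal sketch of the example above: 7 features, max_rounds=3000, smoothing_rounds=1000.
# The dataset is a synthetic stand-in, not from the thread.
from sklearn.datasets import make_classification
from interpret.glassbox import ExplainableBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=7, random_state=0)

ebm = ExplainableBoostingClassifier(
    max_rounds=3000,        # 7 features * 3000 rounds = 21,000 single-feature boosting steps
    smoothing_rounds=1000,  # the first 7 * 1000 = 7,000 steps use randomly selected cuts
)
ebm.fit(X, y)

# For maximal smoothness, make every round a smoothing round:
ebm_smooth = ExplainableBoostingClassifier(max_rounds=3000, smoothing_rounds=3000)
ebm_smooth.fit(X, y)
```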
Thanks for your response! I have two more questions I'd like to ask:

Question 1: […]

Question 2: I also tested different […]
Answer 1: Let's make this simpler first and consider the scenario where smoothing_rounds=0. In that case the EBM algorithm intermixes greedy_rounds and cyclic_rounds. cyclic_rounds do exactly as they sound: they visit each feature in order and boost those features in that order. The gain calculation is not used when choosing which feature to boost on; however, gain is used when picking the cut that we make within that feature. For some datasets cyclic_rounds by themselves are great, and you end up with a perfectly good EBM. There are some datasets, however, where some of the features would benefit from more boosting than others. Typically these are datasets where there is a lot of complexity in some of the graphs, or where you have things like categoricals. In these cases the cyclic boosting algorithm will tend to overcook some features and undercook others. This is where greedy rounds come in handy, because during the greedy rounds we pick which feature to boost based on gain. Combined, this allows for a more flexible algorithm.

Answer 2: I added reg_alpha (L1 regularization) and reg_lambda (L2 regularization) because these are also available in our friendly competitors: XGBoost, LightGBM, and CatBoost. The gradient calculations are identical to theirs, so I think anything you find on how these work in other GBDTs will also apply to EBMs. These are also used in linear models, so all together you should have a wealth of resources to explore. Happy to answer any specific questions you have regarding these for EBMs, though. Most of the time I'd expect a small reg_alpha or reg_lambda to help AUC, but the default of 0.0 isn't terrible, and the best value to use needs to be tuned per dataset, which is why these are 0.0 by default.
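Since the best reg_alpha and reg_lambda values are tuned per dataset, one way to explore them is a small cross-validated sweep scored by AUC. A hedged sketch follows; the grid values and the synthetic dataset are illustrative assumptions, not recommendations:

```python
# Sketch: a small cross-validated grid over reg_alpha / reg_lambda, scored by AUC.
# Grid values and dataset are illustrative assumptions, not tuned recommendations.
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from interpret.glassbox import ExplainableBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=7, random_state=0)

best = (0.0, 0.0, -1.0)  # (reg_alpha, reg_lambda, mean AUC)
for alpha, lam in product([0.0, 0.01, 0.1], [0.0, 0.01, 0.1]):
    ebm = ExplainableBoostingClassifier(reg_alpha=alpha, reg_lambda=lam)
    auc = cross_val_score(ebm, X, y, cv=3, scoring="roc_auc").mean()
    if auc > best[2]:
        best = (alpha, lam, auc)

print(f"best reg_alpha={best[0]}, reg_lambda={best[1]}, AUC={best[2]:.4f}")
```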