Thank you very much for opening this issue.
The problem with the rbv2 scenarios and memory (more precisely, rbv2_super and rbv2_xgboost) is the following:
During raw data collection, the overall memory usage of each process was tracked on the cluster via /usr/bin/time (i.e., maximum resident set size). However, for xgboost models (especially when using the dart booster) this measurement was apparently quite unreliable and often returned values close to zero, or even exactly zero, instead of meaningful measurements. Additionally, when fitting the surrogate on the raw data, scalers are applied to the targets, which in this case (xgboost and memory) can overflow, so the surrogate can then predict values slightly below zero (you will notice that those negative memory estimates are usually very close to zero, i.e., around -1e-5).
Overall, this is of course far from ideal, and there is currently no good workaround.
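As a consumer-side stopgap (not something YAHPO Gym provides), tiny negative memory predictions can be clamped to zero, since values around -1e-5 are scaler-overflow artifacts rather than real measurements. A minimal sketch; the function name and tolerance are my own:

```python
def sanitize_memory(pred: float, tol: float = 1e-4) -> float:
    """Clamp small negative memory predictions to zero.

    Negative values very close to zero (around -1e-5) are artifacts of the
    target scaler overflowing, not real measurements. Anything far below
    zero is treated as an error rather than silently corrected.
    """
    if pred < 0.0:
        if pred > -tol:
            return 0.0
        raise ValueError(f"implausible memory prediction: {pred}")
    return pred
```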
We are aware of the memory issues with the rbv2_* scenarios (see the FAQ entry "memory estimation for rbv2_*": https://slds-lmu.github.io/yahpo_gym/frequently_asked.html) and will address them in v2 of YAHPO Gym (#65, #67), which will, however, still take some time.
A potential workaround for now is to restrict the configuration space so that the xgboost + dart combination is not allowed. Ideally, though, users should not rely on the memory objectives of rbv2_super and rbv2_xgboost for now, or should at least take them with a grain of salt - sorry.
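One way to apply that restriction without touching the benchmark itself is to reject and resample configurations containing the problematic combination. A sketch, assuming configurations are plain dicts; the keys `learner` and `booster` are illustrative and should be replaced by the actual hyperparameter names of your scenario:

```python
def is_safe(config: dict) -> bool:
    """Reject configurations that pair the xgboost learner with the dart
    booster, whose memory surrogate is unreliable."""
    return not (config.get("learner") == "xgboost"
                and config.get("booster") == "dart")


def sample_safe(sample_fn, max_tries: int = 100) -> dict:
    """Draw configurations from sample_fn until one passes the filter."""
    for _ in range(max_tries):
        config = sample_fn()
        if is_safe(config):
            return config
    raise RuntimeError("no safe configuration found within max_tries")
```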
I'll keep this issue open for visibility.
Edit: On a side note, the code above sets the repl fidelity (replication of CV folds) as the main fidelity parameter instead of trainsize - I am not sure whether you actually want this. Also, if you do not specify both fidelities in the line `while benchmark.objective_function(config, fidelity)['info']['objectives']['memory'] > 0.:`, HPOBench will use the default value for any fidelity parameter not provided (which, again, might not always be meaningful). Note that this double fidelity space (repl and trainsize) is specific to the rbv2_* scenarios.
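One way to avoid silently inheriting the library's defaults is to always pass a fully specified fidelity dict, filling any unspecified parameter with its maximum. A sketch; the maxima below are placeholders, not the actual bounds - look them up in the fidelity space of your scenario:

```python
# Hypothetical maxima for illustration only; query the benchmark's
# fidelity space for the real upper bounds of your scenario.
FIDELITY_MAX = {"trainsize": 1.0, "repl": 10}


def complete_fidelity(partial: dict) -> dict:
    """Fill every unspecified fidelity parameter with its maximum value
    instead of letting the benchmark fall back to its own default."""
    fidelity = dict(FIDELITY_MAX)
    fidelity.update(partial)
    return fidelity
```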
@sumny Thank you very much for pointing out the issue with the choice of fidelities. How would you suggest specifying the fidelity space for optimisers that expect a single fidelity parameter when the benchmark has multiple fidelity parameters, as the rbv2_* benchmarks do? Regarding ignored fidelity parameters, is it best to explicitly set them to their maximum value?
Randomly sampling 13 times is enough to discover a configuration with negative memory:
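The original snippet is not reproduced here. For illustration only, the shape of such a search, with a mock objective standing in for the real surrogate (a real reproduction would sample from the benchmark's config space and call `benchmark.objective_function` instead):

```python
import random

def mock_memory_objective(config: dict) -> float:
    """Stand-in for the real surrogate: a small fraction of calls return a
    value slightly below zero, mimicking the scaler-overflow artifact."""
    return random.uniform(-1e-5, 9e-5)

random.seed(0)
found = None
for draws in range(1, 1001):
    # Placeholder sampling; a real run would draw full configurations.
    config = {"booster": random.choice(["gbtree", "dart", "gblinear"])}
    if mock_memory_objective(config) < 0.0:
        found = (draws, config)
        break
print(found)
```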