Performance comparison
Performance comparison among different surrogate models, acquisition functions, and explore/exploit factors. See the summary at the end of this page.
The tuning is done by playing the `test_engine` against the `base_engine`. The `test_engine` takes the param suggested by the optimizer, while the `base_engine` uses the best param found so far. At the start, the best param is the default param, so that is what the `base_engine` uses. After the match (500 games at depth 6 in this case; see the sample tuning command lines), the result is sent to the optimizer. If the `test_engine` wins (scores greater than 50%), the best param is updated. In the next trial, the `base_engine` uses the current best param while the `test_engine` takes the new param suggested by the optimizer.
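The trial loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of tuner.py: `tune`, `suggest`, `play_match`, `toy_match` and `candidates` are hypothetical names, a fixed list of params stands in for the optimizer, and a made-up scoring rule stands in for the engine match, so no engine is run.

```python
def tune(default_param, suggest, play_match, trials, games_per_trial):
    """Sketch of the test_engine vs base_engine trial loop (hypothetical helper names)."""
    best_param = dict(default_param)  # the base_engine starts with the default param
    history = []  # (param, result) pairs; a real tuner would report these to the optimizer
    for trial in range(trials):
        test_param = suggest(trial)  # the test_engine takes the optimizer's suggestion
        result = play_match(test_param, best_param, games_per_trial)  # test_engine score in [0, 1]
        history.append((test_param, result))
        if result > 0.5:  # the test_engine won, so its param becomes the new best
            best_param = dict(test_param)
    return best_param, history


# Toy stand-ins: a fixed list replaces the optimizer, and a made-up rule
# (a FutMargin closer to 300 "plays better") replaces the real engine match.
candidates = [
    {"FutMargin": 260, "RazorMargin": 527},
    {"FutMargin": 200, "RazorMargin": 527},
    {"FutMargin": 290, "RazorMargin": 527},
]

def toy_match(test_param, base_param, games):
    # `games` mirrors games per trial; the toy rule does not actually use it.
    return 0.5 + (abs(base_param["FutMargin"] - 300) - abs(test_param["FutMargin"] - 300)) / 1000

best, history = tune({"FutMargin": 227, "RazorMargin": 527},
                     lambda trial: candidates[trial], toy_match,
                     trials=3, games_per_trial=500)
print(best)  # {'FutMargin': 290, 'RazorMargin': 527}
```

The base param only moves when the test param scores above 0.5, so in this toy run the second candidate (a loss) is ignored and the third one replaces the first.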
engine: stockfish
study match depth control: --depth 6 --base-time-sec 30
games per trial: --games-per-trial 500
trials: --trials 100
pruner: --threshold-pruner result=0.45
input param: OrderedDict([('FutMargin', {'default': 227, 'min': 50, 'max': 350, 'step': 4}), ('RazorMargin', {'default': 527, 'min': 250, 'max': 650, 'step': 4})])
init param: {'FutMargin': 227, 'RazorMargin': 527}
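Assuming the tuner samples each parameter as an integer from min to max in steps of `step` (in the style of Optuna's `suggest_int(name, low, high, step=...)`), the size of the search space can be counted as below. Every study best param reported on this page lies on this grid, which is consistent with that assumption. `input_param` and `grid` are illustrative names.

```python
# Parameter ranges copied from the input param above.
input_param = {
    "FutMargin": {"default": 227, "min": 50, "max": 350, "step": 4},
    "RazorMargin": {"default": 527, "min": 250, "max": 650, "step": 4},
}

def grid(spec):
    """Integer values min, min + step, ... up to and including max when reachable."""
    return list(range(spec["min"], spec["max"] + 1, spec["step"]))

sizes = {name: len(grid(spec)) for name, spec in input_param.items()}
total = 1
for n in sizes.values():
    total *= n

print(sizes, total)  # {'FutMargin': 76, 'RazorMargin': 101} 7676

# The defaults sit one above a grid point (226 and 526 are on the grid).
print({name: spec["default"] in grid(spec) for name, spec in input_param.items()})
# {'FutMargin': False, 'RazorMargin': False}
```

With 100 trials per study, each study evaluates at most 100 of the 7,676 grid points (about 1.3%).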
study 1 sampler: --sampler name=skopt acquisition_function=EI xi=10000
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_1 --pgn-output study_1.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI xi=10000
study best param: {'FutMargin': 306, 'RazorMargin': 558}
study best value: 0.5095625000000003
study best trial number: 94
Every dot is a 500-game match of stockfish at depth 6. The objective value is the match result between the param suggested by the optimizer and the default param, from the point of view of the optimizer's param.
The dark dots (darker means a higher trial number) are more spread out than in study 2, because study 1 favors exploration (high xi) while study 2 favors exploitation (low xi).
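To make the role of xi and kappa more concrete, here is a sketch of the three skopt acquisition functions in their usual Gaussian form, written for minimization as skopt does: EI and PI subtract xi from the improvement over the best observed value `y_best`, and LCB is `mu - kappa * sigma`. The two candidate points and their (mu, sigma) values are made up; in a real run they come from the surrogate model. Note that xi is in the units of the objective, while kappa multiplies sigma.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# mu, sigma: surrogate mean and standard deviation at a candidate point (sigma > 0).
# y_best: lowest objective value observed so far (minimization).
def expected_improvement(mu, sigma, y_best, xi=0.01):
    improvement = y_best - mu - xi
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)  # exploit term + explore term

def probability_of_improvement(mu, sigma, y_best, xi=0.01):
    return norm_cdf((y_best - mu - xi) / sigma)

def lower_confidence_bound(mu, sigma, kappa=1.96):
    return mu - kappa * sigma  # lower is better

# Two made-up candidates: A is predicted slightly better and is well known,
# B is predicted no better but is very uncertain.
points = {"A": (-0.1, 0.01), "B": (0.0, 0.2)}
y_best = 0.0

for xi in (0.0, 0.5):
    ei_pick = max(points, key=lambda k: expected_improvement(*points[k], y_best, xi))
    pi_pick = max(points, key=lambda k: probability_of_improvement(*points[k], y_best, xi))
    print(f"xi={xi}: EI picks {ei_pick}, PI picks {pi_pick}")
# xi=0.0: EI picks A, PI picks A
# xi=0.5: EI picks B, PI picks B

for kappa in (0.0001, 10.0):
    lcb_pick = min(points, key=lambda k: lower_confidence_bound(*points[k], kappa))
    print(f"kappa={kappa}: LCB picks {lcb_pick}")
# kappa=0.0001: LCB picks A
# kappa=10.0: LCB picks B
```

With a small xi or kappa the confident, slightly better point A is picked (exploitation); with a larger value the uncertain point B wins (exploration).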
study 2 sampler: --sampler name=skopt acquisition_function=EI xi=0.0001
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_2 --pgn-output study_2.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI xi=0.0001
study best param: {'FutMargin': 290, 'RazorMargin': 650}
study best value: 0.5086562500000001
study best trial number: 97
study 3 sampler: --sampler name=skopt acquisition_function=PI xi=10000
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_3 --pgn-output study_3.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI xi=10000
study best param: {'FutMargin': 326, 'RazorMargin': 334}
study best value: 0.5129687500000003
study best trial number: 98
study 4 sampler: --sampler name=skopt acquisition_function=PI xi=0.0001
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_4 --pgn-output study_4.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI xi=0.0001
study best param: {'FutMargin': 350, 'RazorMargin': 250}
study best value: 0.5089062499999999
study best trial number: 95
study 5 sampler: --sampler name=skopt acquisition_function=LCB kappa=10000
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_5 --pgn-output study_5.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB kappa=10000
study best param: {'FutMargin': 282, 'RazorMargin': 650}
study best value: 0.506265625
study best trial number: 96
study 6 sampler: --sampler name=skopt acquisition_function=LCB kappa=0.0001
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_6 --pgn-output study_6.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB kappa=0.0001
study best param: {'FutMargin': 178, 'RazorMargin': 594}
study best value: 0.5071718750000004
study best trial number: 97
study 7 sampler: --sampler name=skopt acquisition_function=PI
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_7 --pgn-output study_7.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI
study best param: {'FutMargin': 350, 'RazorMargin': 258}
study best value: 0.5101249999999999
study best trial number: 89
study 8 sampler: --sampler name=skopt acquisition_function=LCB
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_8 --pgn-output study_8.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB
study best param: {'FutMargin': 226, 'RazorMargin': 650}
study best value: 0.5086718750000001
study best trial number: 92
study 9 sampler: --sampler name=skopt acquisition_function=EI
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_9 --pgn-output study_9.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI
study best param: {'FutMargin': 298, 'RazorMargin': 250}
study best value: 0.5093906250000005
study best trial number: 99
study 10 sampler: --sampler name=skopt
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_10 --pgn-output study_10.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt
study best param: {'FutMargin': 306, 'RazorMargin': 250}
study best value: 0.5119843750000003
study best trial number: 95
study 11 sampler: --sampler name=tpe
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_11 --pgn-output study_11.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=tpe
study best param: {'FutMargin': 110, 'RazorMargin': 338}
study best value: 0.5089843750000003
study best trial number: 93
study 12 sampler: --sampler name=cmaes
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_12 --pgn-output study_12.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=cmaes
study best param: {'FutMargin': 190, 'RazorMargin': 494}
study best value: 0.5138437500000004
study best trial number: 98
study 13 sampler: --sampler name=skopt acquisition_function=PI base_estimator=GBRT
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_13 --pgn-output study_13.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=GBRT
study best param: {'FutMargin': 326, 'RazorMargin': 442}
study best value: 0.5124062500000004
study best trial number: 99
study 14 sampler: --sampler name=skopt acquisition_function=PI base_estimator=ET
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_14 --pgn-output study_14.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=ET
study best param: {'FutMargin': 110, 'RazorMargin': 594}
study best value: 0.5097031250000001
study best trial number: 94
study 15 sampler: --sampler name=skopt acquisition_function=PI base_estimator=RF
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_15 --pgn-output study_15.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=RF
study best param: {'FutMargin': 166, 'RazorMargin': 254}
study best value: 0.5099531250000003
study best trial number: 98
engine: stockfish
hash: 64
threads: 1
opponent: default param
depth control: 6
games: 10k
study 1 best param after 100 trials vs default -> Elo of study 1 best param
study 2 best param after 100 trials vs default -> Elo of study 2 best param
...
Score of sf_study_1 vs sf_default: 4037 - 4719 - 1244 [0.466] 10000
... sf_study_1 playing White: 1865 - 2482 - 653 [0.438] 5000
... sf_study_1 playing Black: 2172 - 2237 - 591 [0.493] 5000
... White vs Black: 4102 - 4654 - 1244 [0.472] 10000
Elo difference: -23.7 +/- 6.4, LOS: 0.0 %, DrawRatio: 12.4 %
Finished match
Score of sf_study_2 vs sf_default: 4470 - 4219 - 1311 [0.513] 10000
... sf_study_2 playing White: 2372 - 2025 - 603 [0.535] 5000
... sf_study_2 playing Black: 2098 - 2194 - 708 [0.490] 5000
... White vs Black: 4566 - 4123 - 1311 [0.522] 10000
Elo difference: 8.7 +/- 6.3, LOS: 99.6 %, DrawRatio: 13.1 %
Finished match
Score of sf_study_3 vs sf_default: 4368 - 4126 - 1506 [0.512] 10000
... sf_study_3 playing White: 2175 - 1845 - 980 [0.533] 5000
... sf_study_3 playing Black: 2193 - 2281 - 526 [0.491] 5000
... White vs Black: 4456 - 4038 - 1506 [0.521] 10000
Elo difference: 8.4 +/- 6.3, LOS: 99.6 %, DrawRatio: 15.1 %
Finished match
Score of sf_study_4 vs sf_default: 4479 - 4165 - 1356 [0.516] 10000
... sf_study_4 playing White: 2459 - 1956 - 585 [0.550] 5000
... sf_study_4 playing Black: 2020 - 2209 - 771 [0.481] 5000
... White vs Black: 4668 - 3976 - 1356 [0.535] 10000
Elo difference: 10.9 +/- 6.3, LOS: 100.0 %, DrawRatio: 13.6 %
Finished match
Score of sf_study_5 vs sf_default: 4216 - 4270 - 1514 [0.497] 10000
... sf_study_5 playing White: 2232 - 2042 - 726 [0.519] 5000
... sf_study_5 playing Black: 1984 - 2228 - 788 [0.476] 5000
... White vs Black: 4460 - 4026 - 1514 [0.522] 10000
Elo difference: -1.9 +/- 6.3, LOS: 27.9 %, DrawRatio: 15.1 %
Finished match
Score of sf_study_6 vs sf_default: 4082 - 4587 - 1331 [0.475] 10000
... sf_study_6 playing White: 2116 - 2090 - 794 [0.503] 5000
... sf_study_6 playing Black: 1966 - 2497 - 537 [0.447] 5000
... White vs Black: 4613 - 4056 - 1331 [0.528] 10000
Elo difference: -17.6 +/- 6.3, LOS: 0.0 %, DrawRatio: 13.3 %
Finished match
Score of sf_study_7 vs sf_default: 4558 - 3950 - 1492 [0.530] 10000
... sf_study_7 playing White: 2443 - 1825 - 732 [0.562] 5000
... sf_study_7 playing Black: 2115 - 2125 - 760 [0.499] 5000
... White vs Black: 4568 - 3940 - 1492 [0.531] 10000
Elo difference: 21.2 +/- 6.3, LOS: 100.0 %, DrawRatio: 14.9 %
Finished match
Score of sf_study_8 vs sf_default: 4516 - 4177 - 1307 [0.517] 10000
... sf_study_8 playing White: 2177 - 2172 - 651 [0.500] 5000
... sf_study_8 playing Black: 2339 - 2005 - 656 [0.533] 5000
... White vs Black: 4182 - 4511 - 1307 [0.484] 10000
Elo difference: 11.8 +/- 6.3, LOS: 100.0 %, DrawRatio: 13.1 %
Finished match
Score of sf_study_9 vs sf_default: 4411 - 4125 - 1464 [0.514] 10000
... sf_study_9 playing White: 2366 - 1892 - 742 [0.547] 5000
... sf_study_9 playing Black: 2045 - 2233 - 722 [0.481] 5000
... White vs Black: 4599 - 3937 - 1464 [0.533] 10000
Elo difference: 9.9 +/- 6.3, LOS: 99.9 %, DrawRatio: 14.6 %
Finished match
Score of sf_study_10 vs sf_default: 4007 - 4385 - 1608 [0.481] 10000
... sf_study_10 playing White: 2048 - 2130 - 822 [0.492] 5000
... sf_study_10 playing Black: 1959 - 2255 - 786 [0.470] 5000
... White vs Black: 4303 - 4089 - 1608 [0.511] 10000
Elo difference: -13.1 +/- 6.2, LOS: 0.0 %, DrawRatio: 16.1 %
Finished match
Score of sf_study_11 vs sf_default: 3998 - 4206 - 1796 [0.490] 10000
... sf_study_11 playing White: 2006 - 2134 - 860 [0.487] 5000
... sf_study_11 playing Black: 1992 - 2072 - 936 [0.492] 5000
... White vs Black: 4078 - 4126 - 1796 [0.498] 10000
Elo difference: -7.2 +/- 6.2, LOS: 1.1 %, DrawRatio: 18.0 %
Finished match
Score of sf_study_12 vs sf_default: 4393 - 4155 - 1452 [0.512] 10000
... sf_study_12 playing White: 2440 - 2037 - 523 [0.540] 5000
... sf_study_12 playing Black: 1953 - 2118 - 929 [0.483] 5000
... White vs Black: 4558 - 3990 - 1452 [0.528] 10000
Elo difference: 8.3 +/- 6.3, LOS: 99.5 %, DrawRatio: 14.5 %
Finished match
Score of sf_study_13 vs sf_default: 4202 - 4324 - 1474 [0.494] 10000
... sf_study_13 playing White: 2131 - 2068 - 801 [0.506] 5000
... sf_study_13 playing Black: 2071 - 2256 - 673 [0.481] 5000
... White vs Black: 4387 - 4139 - 1474 [0.512] 10000
Elo difference: -4.2 +/- 6.3, LOS: 9.3 %, DrawRatio: 14.7 %
Finished match
Score of sf_study_14 vs sf_default: 4245 - 4322 - 1433 [0.496] 10000
... sf_study_14 playing White: 2059 - 2139 - 802 [0.492] 5000
... sf_study_14 playing Black: 2186 - 2183 - 631 [0.500] 5000
... White vs Black: 4242 - 4325 - 1433 [0.496] 10000
Elo difference: -2.7 +/- 6.3, LOS: 20.3 %, DrawRatio: 14.3 %
Finished match
Score of sf_study_15 vs sf_default: 4421 - 4380 - 1199 [0.502] 10000
... sf_study_15 playing White: 2425 - 2002 - 573 [0.542] 5000
... sf_study_15 playing Black: 1996 - 2378 - 626 [0.462] 5000
... White vs Black: 4803 - 3998 - 1199 [0.540] 10000
Elo difference: 1.4 +/- 6.4, LOS: 66.9 %, DrawRatio: 12.0 %
Finished match
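The Elo difference, error margin, LOS and draw ratio in these match reports (which appear to be cutechess-cli output, with counts in win - loss - draw order) can be reproduced from the game counts. The sketch below uses the logistic Elo model and the formulas commonly used for such reports; treat it as an approximation for checking the numbers, not as the exact implementation. `elo_from_score` and `match_stats` are hypothetical helper names.

```python
import math

def elo_from_score(p):
    """Logistic Elo difference for an expected score p (0 < p < 1)."""
    return -400 * math.log10(1 / p - 1)

def match_stats(wins, losses, draws):
    """Elo difference, 95% error margin, LOS and draw ratio from a W - L - D record."""
    n = wins + losses + draws
    score = (wins + draws / 2) / n

    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0), then its standard error.
    var = (wins * (1 - score) ** 2 + draws * (0.5 - score) ** 2 + losses * score ** 2) / n
    stderr = math.sqrt(var / n)
    z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal
    margin = (elo_from_score(score + z95 * stderr) - elo_from_score(score - z95 * stderr)) / 2

    # Likelihood of superiority: draws are ignored, only wins vs losses count.
    los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))
    return elo_from_score(score), margin, los, draws / n

# "Score of sf_study_2 vs sf_default: 4470 - 4219 - 1311"
elo, margin, los, draw_ratio = match_stats(4470, 4219, 1311)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, LOS: {100 * los:.1f} %, "
      f"DrawRatio: {100 * draw_ratio:.1f} %")
```

For sf_study_2 this gives about +8.7 Elo, a margin close to the reported 6.3, and an LOS of about 99.6 %, in line with the log.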
All studies are tuned with depth-6 engine-vs-engine matches of 500 games per trial to get the objective value. Each study consists of 100 trials.
The best param of each study is then matched against the default param, whose Elo is set to 0 as the reference.
Elo1 is from a 10,000-game match at depth 6.
Elo2 is from a 1,000-game match at a time control of 1s+50ms, where the average depth is around 13.
study | sampler | surrogate model | acquisition function | explore/exploit setting | Elo1 | Elo2 |
---|---|---|---|---|---|---|
1 | skopt | GP | EI | xi=10000, explore | -23.7 | 2.1 +/- 15.8 |
2 | skopt | GP | EI | xi=0.0001, exploit | +8.7 | - |
3 | skopt | GP | PI | xi=10000, explore | +8.4 | - |
4 | skopt | GP | PI | xi=0.0001, exploit | +10.9 | - |
5 | skopt | GP | LCB | kappa=10000, explore | -1.9 | - |
6 | skopt | GP | LCB | kappa=0.0001, exploit | -17.6 | - |
7 | skopt | GP | PI | xi=0.01, default | +21.2 | - |
8 | skopt | GP | LCB | kappa=1.96, default | +11.8 | - |
9 | skopt | GP | EI | xi=0.01, default | +9.9 | - |
10 | skopt | GP | gp_hedge | - | -13.1 | - |
11 | optuna | TPE | EI | - | -7.2 | - |
12 | optuna | CmaEs | EI | - | +8.3 | - |
13 | skopt | GBRT | PI | xi=0.01, loss=quantile | -4.2 | - |
14 | skopt | ET | PI | xi=0.01, default | -2.7 | - |
15 | skopt | RF | PI | xi=0.01, default | +1.4 | - |
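Study 10 uses skopt's default acquisition function, gp_hedge. According to the skopt documentation, gp_hedge lets EI, LCB and PI each propose a candidate point, picks one of them with probability softmax(eta * g_i) over gains g_i that start at zero, and after refitting the surrogate updates each gain as g_i -= mu(X_i). Below is a minimal sketch of that selection rule; `Hedge`, `softmax`, the candidate labels and the surrogate means are hypothetical, and this is not skopt's own code.

```python
import math
import random

def softmax(values, eta=1.0):
    """Softmax of eta * values, shifted by the max for numerical stability."""
    m = max(values)
    exps = [math.exp(eta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

class Hedge:
    """Sketch of gp_hedge's portfolio rule over EI, LCB and PI (hypothetical class)."""

    def __init__(self, names=("EI", "LCB", "PI"), eta=1.0, seed=None):
        self.names = list(names)
        self.gains = [0.0] * len(self.names)  # all gains start at zero
        self.eta = eta
        self.rng = random.Random(seed)

    def choose(self, candidates):
        """candidates[i] is the point proposed by acquisition function i."""
        probs = softmax(self.gains, self.eta)
        i = self.rng.choices(range(len(candidates)), weights=probs)[0]
        return i, candidates[i]

    def update(self, predicted_means):
        """After refitting the surrogate, g_i -= mu(X_i) (skopt minimizes)."""
        self.gains = [g - mu for g, mu in zip(self.gains, predicted_means)]


hedge = Hedge(seed=1)
print(softmax(hedge.gains, hedge.eta))  # equal weights before any update
hedge.update([-0.52, -0.50, -0.49])     # made-up surrogate means at the EI, LCB, PI proposals
print(softmax(hedge.gains, hedge.eta))  # EI's proposal now has the largest weight
index, point = hedge.choose(["X_EI", "X_LCB", "X_PI"])
print(hedge.names[index], point)
```

Because the choice is random, two gp_hedge runs with identical settings can follow different acquisition functions from trial to trial.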