<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<meta name="Author" content="Nikolaus Hansen"/>
<meta name="keywords" content="source code,
evolution strategy, evolutionary algorithm, evolutionary strategy,
covariance matrix adaptation, CMA, CMA-ES, CMAES,
black-box optimization, robust optimization, robust search,
global optimization, noisy optimization"/>
<meta name="description" content=""/>
<title>The CMA Evolution Strategy</title>
<meta name="filter" content="translated by html2xhtml - http://www.it.uc3m.es/jaf/conversor.html"/>
</head>
<body text="#000000" bgcolor="#ffffff" link="#0000ef" vlink="#51188e" alink="#ff0000">
<FONT size=-1><A HREF="index.html">UP</A></FONT>
<h2>The CMA Evolution Strategy</h2>
The CMA-ES (<b>C</b>ovariance
<b>M</b>atrix
<b>A</b>daptation
<b>E</b>volution
<b>S</b>trategy) is an evolutionary algorithm for difficult
<b>non-linear non-convex black-box optimisation</b> problems in continuous
domain. The CMA-ES is considered state-of-the-art in evolutionary computation and
has been adopted as one of the <b>standard tool</b>s for continuous optimisation in many (probably hundreds of) research labs and industrial environments around the world.
The CMA-ES is typically applied to unconstrained or
bound-constrained optimization problems with search space
dimension between three and a hundred. The method should be
applied if derivative-based methods, e.g. quasi-Newton (BFGS) or
conjugate gradient, (supposedly) fail due to a rugged search
landscape (e.g. discontinuities, sharp bends or ridges, noise,
local optima, outliers). If second-order derivative-based methods
are successful, they are usually faster than the CMA-ES: on purely
convex-quadratic functions, f(x)=x<SUP>T</SUP>Hx, BFGS (Matlab's
function fminunc) is typically faster by a factor of about ten (in
terms of the number of objective function evaluations needed to reach
a target function value, assuming that gradients are not
available). On the simplest quadratic function
f(x)=||x||<SUP>2</SUP>=x<SUP>T</SUP>x, BFGS is faster by a factor
of about 30.
<p>Similar to quasi-Newton methods (but not inspired by them), the CMA-ES is a <b>second
order</b> approach estimating a positive definite matrix within an
iterative procedure (more precisely: a covariance matrix that is,
<I>on convex-quadratic functions</I>, closely related to the inverse
Hessian). This makes the method feasible on non-separable and/or
badly conditioned problems. In contrast to quasi-Newton methods,
the CMA-ES does not use or approximate gradients and does not even
presume or require their existence. This makes the method feasible
on <b>non-smooth</b> and even non-continuous problems, as well as on
multimodal and/or noisy problems. It turns out to be a
particularly reliable and highly competitive evolutionary
algorithm for local optimization (<a
href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenaost2001">Hansen & Ostermeier
2001</a>) and, surprising at first sight, also for global
optimization (<a
href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenukernppsn2004">Hansen & Kern
2004</a>, <a href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/cec2005.html"> CEC 2005</a>, <a
href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#gecco09-bipop">Hansen 2009</a> in <a href="https://numbbo.github.io/ppdata-archive/bbob/2009">BBOB-2009</a>).
</p>
<p><A NAME=invariances>The CMA-ES has several <b>invariance
properties</b></A>. Two of them, inherited from the plain
evolution strategy, are (i) invariance to order preserving
(i.e. strictly monotonic) transformations of the objective
function value (that is, e.g., ||x||<SUP>2</SUP> and
3||x||<SUP>0.2</SUP>-100 are <I>equivalent</I> objective functions
with identical performance of CMA-ES), and (ii) invariance to
angle preserving (rigid) transformations of the search space
(including rotation, reflection, and translation), if the initial
search point is transformed accordingly. Invariances are highly
desirable: they imply uniform behavior on classes of functions and
therefore imply the generalization of empirical results.
</p>
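<p>As a minimal illustration of the first invariance property, the following short NumPy sketch (purely
illustrative) checks that a strictly monotonic transformation of the objective values leaves the ranking
of candidate solutions, and hence the behaviour of the CMA-ES, unchanged:</p>
<pre>
# Ranking, the only information the CMA-ES extracts from f-values, is invariant
# under strictly monotonic (order preserving) transformations of f.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 5))             # 10 candidate solutions in 5-D
f = np.sum(X**2, axis=1)                     # ||x||^2
g = 3 * np.sum(X**2, axis=1)**0.1 - 100      # 3 ||x||^0.2 - 100
assert (np.argsort(f) == np.argsort(g)).all()   # identical rankings, identical behaviour
</pre>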
<p><A NAME=parameters>The CMA-ES does not require tedious
<b>parameter tuning</b> for its application</A>. In fact, the choice
of strategy-internal parameters is not left to the user
(arguably with the exception of population size &lambda;).
Finding good (default) strategy parameters is considered part
of the algorithm design, and not part of its
application—the aim is to have a well-performing algorithm
<I>as is</I>. The default population size λ is
comparatively small to allow for fast convergence. Restarts with
increasing population size (<A
HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#augeruhansen2005a">Auger & Hansen
2005</A>) improve the global search performance. For the
<b>application</b> of the CMA-ES, an initial solution, an initial
standard deviation (step-size; the variables should be scaled such
that the same standard deviation can reasonably be applied to
all of them, see also <A HREF="cmaes_sourcecode_page.html#practical">here</A>)
and, possibly, the termination criteria (e.g. a
function tolerance) need to be set by the user.
The most common applications are model calibration (e.g. curve fitting)
and shape optimisation.
</p>
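<p>A minimal usage sketch with the Python implementation (module cma, see the
<A HREF="cmaes_sourcecode_page.html">source code page</A>); the objective function, initial solution,
initial step-size and tolerance below are illustrative placeholders:</p>
<pre>
# Minimal sketch with the Python module cma (pycma): the user provides an initial
# solution, an initial standard deviation (step-size) and, optionally, termination
# criteria such as a function tolerance.
import cma

def rosenbrock(x):   # example objective: non-convex and non-separable
    return sum(100.0 * (x[i + 1] - x[i]**2)**2 + (1.0 - x[i])**2
               for i in range(len(x) - 1))

es = cma.CMAEvolutionStrategy(8 * [0.5],         # initial solution in 8-D
                              0.3,               # initial step-size sigma0
                              {'tolfun': 1e-11}) # termination criterion
while not es.stop():
    X = es.ask()                               # sample lambda candidate solutions
    es.tell(X, [rosenbrock(x) for x in X])     # pass f-values and update all state variables
print(es.result.xbest, es.result.fbest)
</pre>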
<H3> Materials and Source Code </H3>
<ul>
<li>
Four slides on randomized search and the CMA-ES
<a href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/searchandcmaslides.pdf">
(2004, pdf)</a>.
</li>
<li>
A <A HREF="http://arxiv.org/abs/1604.00772">written <FONT color="#ff0000">tutorial</FONT></A>. </li>
<LI>
<A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/gecco2013-CMA-ES-tutorial.pdf">Tutorial slides (pdf 1.8MB)</A>
presented at <A HREF="http://www.sigevo.org/gecco-2013">GECCO 2013</A>. </LI>
<LI>
<A HREF="http://en.wikipedia.org/wiki/CMA-ES">CMA-ES at wikipedia</A>
</LI>
<LI>Comparison studies
<UL>
<LI>
<A HREF=https://numbbo.github.io/ppdata-archive/bbob/2009>BBOB 2009 Comparison with a wide variety of other algorithms on 24 benchmark functions</A>
</LI>
<li>
<A HREF="http://cma-es.github.io/cec2005">CEC 2005 Comparison with other Evolutionary Algorithms on a
Benchmark Function Set</A>
</li>
<li>
<A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#kernea2004">Kern et al (2004)</A> compare
the CMA-ES with two other Evolution Strategies, (1+1)-ES and CSA-ES, and with two Estimation of
Distribution Algorithms, IDEA and MBOA, on 14 separable and non-separable benchmark
functions (<A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/kernnatcomp.pdf">pdf 827KB</A>).
</li>
</UL>
</LI>
<li>
<A HREF="cmaes_sourcecode_page.html"><FONT color="#ff0000">Source code page</FONT></A> (C, C++, Java, Matlab, Octave, Python, Scilab)</li>
<LI> A list of <a href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/cmaapplications.pdf">references to
<FONT color="#ff0000">applications</FONT> (pdf)</a> of the CMA-ES (not quite exhaustive and <i>entirely</i> outdated).
</LI>
</ul>
<A name="chronobiography">
<h3>Commented Bibliography (chronologically)</h3>
</A>
The following is a commented bibliography, mainly of the crucial advances in the single-objective and non-elitist "standard" CMA-ES.
Some papers on elitist or multi-objective algorithm variants are included as well.
<ol>
<LI><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenea1995">Hansen et
al (1995). On the adaptation of arbitrary normal mutation distributions
in evolution strategies: The generating set adaptation.</A> In L. Eshelman
(Ed.), <I>Proceedings of the Sixth International Conference on Genetic
Algorithms</I>, Pittsburgh, pp. 57-64. Morgan Kaufmann.</font>
<p>The generating set adaptation (GSA) is the first evolution strategy
to our knowledge that learns a problem scaling <i>and</i> is invariant
to coordinate system rotations. The GSA implements the principal idea
of CMA, to increase the likelihood of selected steps, without using
the formalism of a covariance matrix. Global step-size control is
done by mutative self-adaptation. The paper also provides an algorithm
that learns individual step-sizes and one direction (AII). AII has a
linear number of strategy parameters and hence learns a less rich model
more quickly.
</p>
</LI>
<LI><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenaost1996"> Hansen
and Ostermeier (1996). Adapting arbitrary normal mutation
distributions in evolution strategies: The covariance matrix
adaptation.</A> In <I>Proceedings of the 1996 IEEE International
Conference on Evolutionary Computation</I>, pp. 312-317.</font>
<p>The first CMA paper, where the covariance matrix adaptation is
introduced into the (1,&lambda;)-ES (&mu;=1). The paper emphasizes
the evolution path and the differences to the generating set
adaptation (<a href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenea1995">Hansen et al
1995</a>). Using the formalism of a covariance matrix has three
important consequences. First, path length control (cumulative
step-size adaptation, CSA) can be used to control the global
step-size. This will become particularly relevant when μ is chosen
to be greater than 1. Second, the eigensystem, determined by the
covariance matrix, can give insights into the underlying problem
structure. De facto, it turns out to be particularly useful to monitor
the time evolution of eigenvalues of the covariance matrix. Third, the
computational expense to <i>generate</i> a new individual reduces from
<i>O(n<sup>3</sup>)</i> to <i>O(n<sup>2</sup>)</i>. Because the eigendecomposition can be postponed for <i>&Omega;(n)</i> iterations, the overall internal expenses become <i>O(n<sup>2</sup>)</i> (a schematic sketch of this sampling step is given below, after the bibliography).
</p>
</li>
<li><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenaost1997"> Hansen
and Ostermeier (1997). Convergence properties of evolution strategies
with the derandomized covariance matrix adaptation: The
<nobr>(μ/μ<sub>I</sub>, λ)-</nobr>ES.</A> In
<I>EUFIT'97, 5th Europ.Congr.on Intelligent Techniques and Soft
Computing, Proceedings</I>, pp. 650-654.</font>
<p>Introduction of the "multi-membered" (μ/μ,λ)-CMA-ES
with intermediate recombination of all μ parental
solutions. Investigation of the effect of μ for small λ on unimodal
test functions, some including distorted selection. The (3/3,12)-CMA-ES is
recommended as the best compromise for being fast and robust. </p>
</li>
<li><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenaost2001"> Hansen
and Ostermeier (2001). Completely Derandomized Self-Adaptation in
Evolution Strategies.</A> <I>Evolutionary Computation</I>, 9(2),
pp. 159-195 (2001).</font>
<p>A comprehensive article, where weighted recombination is
introduced in the (μ/μ,λ)-CMA-ES. Objectives and rationales
behind the strategy are discussed as well as (default) strategy parameter
choices in detail. Comparisons with the so-called <em>correlated mutations</em>
and scale-up investigations, up to 320-D, on a considerable number of unimodal test
functions are shown. On the multimodal Rastrigin function the myth
that (local) adaptation to the topography of the function jeopardizes
global search properties is disproved. Local adaptation
allows for larger step-sizes and consequently is likely to improve global
search performance. </p> </li>
<li><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenea2003">
Hansen et al (2003). Reducing
the Time Complexity of the Derandomized Evolution Strategy with
Covariance Matrix Adaptation (CMA-ES).</A> <i>Evolutionary
Computation</i>, 11(1), pp. 1-18 (2003).</font>
<p>
Extension with the so-called rank-μ update of the covariance matrix
which scales up the efficiency of the CMA to large population
sizes. The rank-μ update reduces the time complexity, i.e. the
number <EM>of generations</EM> to adapt the complete matrix roughly
from 10<i>n</i><sup><font size="-1">2</font></sup> to 20<i>n</i>. Conversely, the
learning speed, with respect to the number of
objective function evaluations, depends much less on the population
size if the rank-μ update is used. Reasonable population sizes
range between 5 and 5<i>n</i>. For larger population sizes a
deficiency of the step-size adaptation is discovered (that has been solved in <A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenukernppsn2004">
Hansen and Kern (2004)</A>). Simulations on a
number of unimodal test functions and scale-up investigations are
presented (a schematic sketch of the rank-&mu; update is given below, after the bibliography). </p> </li>
<li><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenukernppsn2004">
Hansen and Kern (2004). Evaluating the CMA Evolution Strategy on
Multimodal Test Functions.</A> In <i> Eighth International Conference
on Parallel Problem Solving from Nature PPSN VIII, Proceedings</i>,
pp. 282-291.</font>
<p>
In this paper 1) the <i>weighted</i> rank-μ update of the
covariance matrix is introduced. 2) the step-size adaptation problem
discovered in <a href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenea2003">Hansen et al
(2003)</a> is addressed by a considerably improved setting of the
step-size control parameters: for large population sizes the backward time
horizon of the evolution path is diminished by increasing the
parameter <i>c</i><sub><font size="-1">σ</font></sub> and the step-size damping <i>d</i><sub><font size="-1">σ</font></sub> is increased. 3) the
influence of the population size on the performance is investigated on
a number of multimodal functions, revealing that an increased
population size can dramatically improve the global search
performance. Comparisons to other optimization algorithms reveal that
on additively decomposable (and therefore separable) functions the
CMA-ES can be vastly outperformed, e.g., by <em>differential evolution</em>,
while it reveals superior performance on non-separable functions.
</p>
</li>
<li><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#augeruhansen2005a"> Auger
and Hansen (2005). A Restart CMA Evolution Strategy With Increasing
Population Size.</A> In
<I>IEEE Congress on Evolutionary Computation, CEC
2005, Proceedings</I>, pp. 1769-1776. </font>
<p>
The remaining strategy parameter <em>population size</em>, λ,
that is most important for the global search performance, is removed
by implementing a restart mechanism. Before each restart the
population size is increased by a factor of two. Otherwise restarts
are conducted independently. Tests on a benchmark function set of 25
functions provided for the <a href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/cec2005.html">session on
real-parameter optimization</a> reveal highly competitive performance
on local <em>and</em> global optimization problems, compared to the
other submitted algorithms (a minimal restart loop along these lines is sketched below, after the bibliography). </p></li>
<li><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#hansenReview2006">
N. Hansen (2006). The CMA Evolution Strategy: A Comparing Review.</A>
In J.A. Lozano, P. Larrañaga, I. Inza and E. Bengoetxea (Eds.).
Towards a new evolutionary computation. Advances in estimation of
distribution algorithms. Springer, pp. 75-102.</font>
<p>
The CMA is approached and analyzed from the viewpoint of
<em>estimating a search distribution</em>. Important differences to
the so-called <em>estimation of distribution algorithms</em> (EDAs)
are revealed. 1) in CMA the distribution of the selected <i>steps</i>
is estimated, while in EDAs the selected <i>points</i> are
modeled. This leads invariably to larger variances in CMA. In particular, a
"pure EDA" converges prematurely on a linear function whereas a "pure CMA" (without step-size control) does not. 2) in EDAs the estimation is usually done from scratch,
while in CMA the distribution is updated (at least for population size
λ < 4<I>n</I><SUP>2</SUP>). Therefore in CMA the estimation can be done
reliably <I>independent of the population size</I> and, for a small population size, in fewer function evaluations. 3) in CMA an
additional global step-size adaptation is conducted, based on a different
adaptation principle (cumulative step-size adaptation, or path length
control) that often leads to a considerably larger population spread and prevents premature convergence even more effectively.
Major concepts and design principles, namely change rates, invariance,
and stationarity are discussed.
</p></li>
<li><font size=-1><A
HREF="http://dx.doi.org/10.1109/CEC.2006.1688662"> Jastrebski
and Arnold (2006). Improving evolution strategies through active
covariance matrix adaptation.</A> In <i>2006 IEEE World Congress on
Computational Intelligence, Proceedings</i>,
pp. 9719-9726</font>
<p>
A <i>negative</i> rank-μ update of the covariance matrix is introduced using the μ worst of the λ individuals.
All vectors, for both the original and the negative rank-&mu; update,
have the same weight. For population sizes up to &lambda;=4<i>n</i> a
notable speed-up compared to the original equally weighted
rank-μ update can be observed. On the discus (or tablet) function
the speed-up exceeds a factor of two for large dimensions and small
population sizes. While positive definiteness of the covariance matrix
cannot be guaranteed anymore, empirically the covariance matrix
always remains positive definite. For larger population sizes,
the learning parameter &beta; presumably needs to be modified.
</p></li>
<li><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#igeletal2006"> Igel et al
(2006). A Computational Efficient Covariance Matrix Update and a
(1+1)-CMA for Evolution Strategies.</a> In <I>Genetic and Evolutionary
Computation Conference, GECCO 2006, Proceedings</I>,
pp.453-460.</font>
<p>
This paper introduces 1) the CMA into the (1+1)-ES using an improved
variant of a success rule based step-size adaptation in place of the
path length control, and 2) an incremental <I>O(n<SUP>2</SUP>)</I>
Cholesky update of the covariance matrix replacing the original
<I>O(n<SUP>2</SUP>)</I> covariance matrix update together with the
iterative <I>O(n<SUP>3</SUP>)</I> eigendecomposition of the covariance
matrix (the latter is usually postponed until after <I>n/10</I>
generations). The resulting algorithm is an elegant CMA variant and
accomplishes what is expected. Nevertheless, its practical relevance
is limited. The (1+1)-ES is slightly faster than its ","-counterpart,
but more prone to converge to local optima. Applying the Cholesky
update prohibits utilizing the evolution path for the covariance
matrix update. This is a significant disadvantage as the evolution
path considerably improves the performance on many functions and was
addressed in <A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#suttorp2009">Suttorp et al. (2009)</A>.
</p></li>
<li><font size=-1><A HREF="https://doi.org/10.1162/evco.2007.15.1.1">Igel et al (2007). Covariance Matrix Adaptation for Multi-objective Optimization.</A> <I>Evolutionary Computation</I> 15 (1): 1-28.</font>
<p>
MO-CMA-ES is a multi-objective algorithm maintaining several (1+1)-CMA-ES algorithms in parallel to approximate the Pareto set. The algorithm is based on non-dominated sorting. <A HREF="https://doi.org/10.1007/978-3-540-70928-2_16">Igel et al (2007)</A> introduce steady-state selection into this algorithm. <A HREF="https://dl.acm.org/doi/10.1145/1830483.1830573">Voß et al (2010)</A> improve the step-size adaptation.
</p></li>
<li><font size=-1><A HREF="https://link.springer.com/content/pdf/10.1007%2F978-3-540-87700-4_30.pdf">Ros and Hansen (2008). A Simple Modification in CMA-ES Achieving Linear Time and Space Complexity.</A> In <i><A HREF="https://link.springer.com/book/10.1007/978-3-540-87700-4"> Tenth International Conference
on Parallel Problem Solving from Nature PPSN X, Proceedings</A></i>,
pp. 296-305.</font>
<p>
A very simple change in the covariance matrix update restricts the changes to the diagonal elements of the matrix, also known as separable or diagonal CMA-ES. As the most important consequence, the learning rate for the covariance matrix update can be increased from roughly <I>O(1/n<SUP>2</SUP>)</I> to <I>O(1/n)</I>. Also, internal time and memory complexity become linear in the dimension. However, the main feature of CMA-ES, to learn dependencies between variables, is abandoned. Yet, surprisingly, the sep-CMA-ES is competitive on the Rosenbrock function in dimensions larger than a hundred.
</p>
</li>
<li><font size=-1><A HREF="https://hal.inria.fr/inria-00382093/document">Hansen (2009). Benchmarking a BI-Population CMA-ES on the BBOB-2009
Function Testbed</A>. <I>Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers</I>, pp. 2389-2396.</font>
<p>
This paper makes two somewhat significant algorithmic contributions. (i) The cumulation parameter setting <I>c<SUB>c</SUB></I> is improved for large population sizes such that it approaches 1/2 instead of 4 / <I>n</I> when &mu; / <I>n</I> &rarr; &infin; and <I>n</I> &#8811; 1. This improves the performance with very large population sizes roughly by a factor of two (not shown). (ii) The BIPOP restart scheme with two modes is invented: the original IPOP mode (Auger and Hansen 2005) and a mode with small population size. The resulting algorithm is a simple portfolio between the IPOP-CMA-ES and a CMA-ES with independent restarts where the initial conditions for mean and step-size vary. The latter mode allows solving some weakly structured multi-modal functions that IPOP-CMA-ES fails to solve. The benchmarking data provide a performance baseline; generally, however, algorithm portfolios may often not be the best baseline to choose.
</p></li>
<li><font size=-1><A HREF="https://www.ini.rub.de/upload/file/1511450464_867252f68023ec06d925/glasmachers2010b.pdf">Glasmachers et al (2010). Exponential natural evolution strategies</A>. In <I>Proceedings Companion of the 12th Annual Conference Companion on Genetic and Evolutionary Computation Conference</I>, pp. 393-400.</font>
<p>This seminal paper introduces the exponential Natural Evolution Strategy (xNES) and shows its close connection to CMA-ES. The additive update of the covariance matrix <I>C</I> with a weighted empirical covariance matrix &radic;<I>C</I> <I>M</I> &radic;<I>C</I> and learning rate 2<I>&eta;</I> can be expressed as a multiplicative update of <I>C</I> or, in first order, of &radic;<I>C</I> with <I>I</I> + <I>&eta;M</I> if the weights sum to zero, or with <I>I</I> + <I>&eta;</I>(<I>M</I> - <I>I</I>) if the weights sum to one. These multiplicative updates are, in first order, equivalent to the exponential multiplicative update by exp(<I>&eta;M</I>) or exp(<I>&eta;</I>(<I>M</I> - <I>I</I>)). The exponential update guarantees that <I>C</I> remains positive definite. In xNES, all recombination weights are offset such that they sum to zero.
</p></li>
<li><font size=-1><A HREF="http://www.cmap.polytechnique.fr/~nikolaus.hansen/ws1p32-hansen.pdf">Hansen and Ros (2010).
Benchmarking a weighted negative covariance matrix update
on the BBOB-2010 noiseless testbed</A>. In <I>Proceedings Companion of the 12th Annual Conference Companion on Genetic and Evolutionary Computation Conference</I>, pp. 1673-1680.</font>
<p>The idea of using different recombination weights (Hansen and Ostermeier 2001) is combined with the idea of active CMA-ES using negative weights for the covariance matrix update (Jastrebski and Arnold 2006). This combination is straightforward and has become the default implementation of CMA-ES.
</p></li>
<li><font size=-1><A HREF="https://link.springer.com/content/pdf/10.1007%2F978-3-642-15844-5_16.pdf">Akimoto et al (2010). Bidirectional relation between CMA evolution strategies and natural evolution strategies. In International Conference on Parallel Problem Solving from Nature.</A>
In <i><A HREF="https://link.springer.com/book/10.1007/978-3-642-15844-5"> 11th International Conference on Parallel Problem Solving from Nature PPSN XI, Proceedings</A></i>, pp. 154-163.</font>
<p>This paper fully establishes the link between a natural gradient descent in the parametrized space of Gaussian distributions and the update equations in the CMA-ES algorithm. Remarkably, the update of the covariance matrix can be (and has been) expressed without explicit computation of the Fisher matrix, which would often be prohibitively expensive. This link provides a deep theoretical foundation to CMA-ES.
</p></li>
<!-- Anne Auger, Dimo Brockhoff, Nikolaus Hansen
Mirrored Sampling in Evolution Strategies With Weighted Recombination
Genetic and Evolutionary Computation Conference (GECCO 2011), SIGEVO, Jul 2011, Dublin, Ireland. pp.861-868, ⟨10.1145/2001576.2001694⟩
inria-00612522
DOI : 10.1145/2001576.2001694 -->
<li><font size=-1><A HREF="https://link.springer.com/content/pdf/10.1007%2F978-3-642-32937-1_30.pdf">Loshchilov et al (2012). Alternative Restart Strategies for CMA-ES.</A> In <i><A HREF="https://link.springer.com/book/10.1007/978-3-642-32937-1"> 12th International Conference on Parallel Problem Solving from Nature PPSN XII, Proceedings</A></i>, pp. 296-305.</font>
<p>
The initial step-size is an important parameter for the performance on multi-modal functions. Surprisingly, a relatively small initial step-size can significantly outperform the usual default choice on some multi-modal functions, even when a large population is needed. This gives rise to considering more intricate variation schemes for the initial step-size that also use smaller step-sizes, not only when the population size is small.
</p></li>
<li><font size=-1><A HREF="http://hal.inria.fr/hal-00997294/en">Hansen et al (2014). How to assess step-size adaptation mechanisms in randomised search.</A> In <i><A HREF="http://www.springer.com/computer/ai/book/978-3-319-10761-5"> 13th International Conference on Parallel Problem Solving from Nature PPSN XIII, Proceedings</A></i>, pp. 60-69.</font>
<p>
While this paper is not strictly about CMA-ES, it introduces a polished and fully parallelizable version of two point adaptation (TPA) which has since become a secondary standard and default for step-size adaptation in CMA-ES. Two candidate solutions are sampled on the line of the last mean shift, symmetrically about the current mean. Their ranking difference is used to update the step-size. When combined with CMA-ES, the norm in the denominator of line 10 in Algorithm 3 is the Mahalanobis norm from the current covariance matrix, see also <A href="http://www.cmap.polytechnique.fr/~nikolaus.hansen/publications.html#akimoto2016projection">Akimoto &amp; Hansen (2016)</A>, Eq.(6).
</p></li>
<!-- Dirk Arnold, Nikolaus Hansen
A (1+1)-CMA-ES for Constrained Optimisation
GECCO, ACM, Jul 2012, Philadelphia, United States. pp.297-304, ⟨10.1145/2330163.2330207⟩
hal-00696268
DOI : 10.1145/2330163.2330207
-->
<LI>2014- to be continued. </LI>
</ol>
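<p>As announced above, a schematic NumPy sketch of the sampling step discussed for Hansen and Ostermeier (1996):
with an eigendecomposition C = BD<SUP>2</SUP>B<SUP>T</SUP> at hand, generating a candidate solution costs
<i>O(n<sup>2</sup>)</i> operations (names and values are illustrative):</p>
<pre>
# Schematic sampling step: x ~ m + sigma * N(0, C) with C = B diag(D^2) B^T.
# Generating one candidate costs O(n^2); the O(n^3) eigendecomposition itself
# can be postponed over many iterations (see the bibliography above).
import numpy as np

def sample(mean, sigma, B, D, rng):
    z = rng.standard_normal(len(mean))    # isotropic standard normal vector
    return mean + sigma * (B @ (D * z))   # scale by D, rotate by B: O(n^2)

n = 5
rng = np.random.default_rng(0)
B, D = np.eye(n), np.ones(n)              # identity covariance for the example
x = sample(np.zeros(n), 0.5, B, D, rng)
</pre>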
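<p>A schematic NumPy sketch of the (weighted) rank-&mu; covariance matrix update of Hansen et al (2003) and
Hansen and Kern (2004), referenced above; the learning rate and the weights are illustrative placeholders,
not the exact default settings:</p>
<pre>
# Schematic weighted rank-mu update:
#   C_new = (1 - c_mu) * C + c_mu * sum_i w_i * y_i y_i^T,
# where y_i = (x_i - m_old) / sigma are the mu best selected steps.
import numpy as np

def rank_mu_update(C, selected_steps, weights, c_mu):
    # selected_steps has shape (mu, n) and holds y_i = (x_i - m_old) / sigma
    Z = sum(w * np.outer(y, y) for w, y in zip(weights, selected_steps))
    return (1.0 - c_mu) * C + c_mu * Z

n, mu = 5, 3
weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))  # decreasing positive weights
weights /= weights.sum()                                   # normalized to sum to one
C = rank_mu_update(np.eye(n), np.random.randn(mu, n), weights, c_mu=0.1)
</pre>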
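<p>A minimal sketch of a restart loop with increasing population size in the spirit of Auger and Hansen (2005),
referenced above, again using the Python module cma; the helper name and the number of restarts are
illustrative choices:</p>
<pre>
# Minimal IPOP-style restart loop (sketch): before each independent restart the
# population size is doubled; the best solution over all runs is returned.
import cma

def restart_with_increasing_popsize(objective, x0, sigma0, n_restarts=8):
    best, popsize = None, None                  # None: default popsize in the first run
    for _ in range(n_restarts + 1):
        opts = {'popsize': popsize} if popsize else {}
        es = cma.CMAEvolutionStrategy(x0, sigma0, opts)
        es.optimize(objective)                  # run until a termination criterion is met
        if best is None or best.fbest > es.result.fbest:
            best = es.result
        popsize = 2 * es.popsize                # double lambda before the next restart
    return best
</pre>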
</body>
</html>