At a glance... |
Syllabus |
Models |
Code |
Lecturer
MaxWalkSat is a non-parametric stochastic method for sampling the landscape of the local region. Historically speaking, MaxWalkSat was a very impactful algorithm. But, at least here, the real purpose of discussing MaxWalkSat is to introduce the idea of landscapes. It will be argued that more important than the algorithms is the shape of the space they search. Since this shape can change, it is not possible to prove the adequacy of these meta-heuristics unless you first characterize the space they are trying to explore.
So before we talk about MaxWalkSat, we had better talk sone about landscapes.
The concept of "search" is surprisingly general. It can be applied to many applications. For example here are some recent applications of search-based studies, just by me and my students, in the last few years:
- In 2014: learning better pilot strategies;
- in 2013: feature planning;
- In 2012: test case generation;
- In 2010: spacecraft re-entry optimization;
- In 2009: software process planning;
- In 2007: software tool selection;
- In 2002: optimizing risk mitigation plans
for software requirements engineering (updated and heavily optimized here).
- Pardon my hubris, but I have to say that this paper is just a little bit famous: according to Mark Harman, it was one of the earliest papers to apply Pareto optimality in search-based software engineering (SBSE) for requirements engineering.
This a wide range of application areas- so much so that it really begs the question "why is search so useful"?
The real story is that underneath surface features of all these problems is a common mathematical structure called, you guessed it, the landscape.
To know the landscape is to know how to optimize, how to avoid getting stuck on being over-adapted (hence overspecialized) on some local peak, when as Stewart Brand so aptly puts it...
- "Whole mountain ranges of opportunity could be glimpsed in the distance, but getting to them involved venturing 'downhill' into regions of lower fitness".
Studying such landscapes made Brand distrust claims for "optimality" since what you call "optimum" may actually be just a stepping zone to a better place.
Brand's favorite landscape comes from a 1932 genetics paper that discusses how different breeding strategies respond well (or badly) to environmental pressures. In the following, the x-y axis might be "length of hair" and "body weight" and the z-axis might "probability of winning a fight".
Says Brand:
-
"The first two illustrate how low selection pressure or a high rate of mutation (which comes with small populations) can broaden the range of a species whereas intense selection pressure or a low mutation rate can severely limit a species to the very peak of local fitness. The third diagram shows what happens when the landscape itself shifts, and the population has to evolve to shift with it."
-
"The bottom row explores how small populations respond to inbreeding by wandering ineffectively. The best mode of exploration Wright deemed the final diagram, showing how a species can divide into an array of races that interact with one another. That jostling crowd explores well, and it can respond to opportunity."
So to understand search, understand the landscape. If you know the landscape, you can see where it can trap and where it can help us out. One such trap is the saddle which, in the above diagram is the flat space between the mountain (called a pole) and the hole next to it.
Note that if walk along the saddle, you might think you are in a stable space of solutions. But be warned, one slip to the left or right and the landscape changes dramatically. In the above space you might fall into a hole or a pole.
Another trap is the local minima that seems like a good idea but if you get sucked into it, you may never find the much better place:
Another bad landscape is one that is completely flat. Try as you like, you walk along way around this one before finding anything better or worse:
The opposite of flat is bumpy:
Bumpy landscapes are common so Harman comments that understanding the neighborhood of our solutions is an open and pressing issue in search-based software engineering (SBSE):
- "In some software engineering applications, solution robustness may be as im-portant as solution functionality. For example, it may be better to locate an area of the search space that is rich in fit solutions, rather than identifying an even better solution that is surrounded by a set of far less fit solutions."
- "Hitherto, research on SBSE has tended to focus on the production of the fittest possible results. However, many application areas require solutions in a search space that may be subject to change. This makes robustness a natural second order property to which the research community could and should turn its attention."
For an example of a solution that would alarm Harman, here is a pendulum with a red ball on the end. Would you care to live on this ball? I think not.
Bumpy landscapes mean that, sometimes, to achieve better goals you may have to first give up some of your current achievements. In the history of A.I. this is also called Sussmann's anomaly- that sometimes the way to "better" is via "worse". There are many ways to "jump over" these anomalies. Sussman (and his supervisor, Marvin Minsky) believed that intelligence requires an exolicit list of exceptions or tricks and that any plan for making things better had better be auditted by a "debugging" system. That is a knowledge-full approach that requires some analyst to supply descriptions of "when to jump around". Alternate knowledge-less approaches are:
- Stochastic jumps using, say, simulated annealing;
- The retries mechanism discussed below;
- Momentum constants added to the mutators of stocahstic search. Such momentum constants resist sudden stops to the search: if a local maximum is reached, the momentum constant would push the inference a little further just to see if anything better lies beyond the current position. For example, in the following landscape, momentum would nudge the ball over that little lip into a better (and lower) world.
Which is best: knowledge-full or knowledge-less? Well, that depends on the nature of the problem, the intrinsic value of the knowledge, etc etc. But the general engineering trade-off is that knowledge-less approaches are faster to build and maintain, while the knowledge-full approaches perform comparatively better. FYI- I used to work on knowledge-full approaches but I have found my life to be easier since I switched to knowledge-less.
Since landscapes are so important, they have been extensively studied. In the case of systems of numeric equations, it is possible to infer the landscape from the maths. For example, here are eight important kinds of landscapes:
The Jacobian matrix (the matrix of all first-order partial derivatives) of these three-dimensional systems have 3 eigenvalues:
- one of which must be real
- and the other two can be either both real or complex-conjugate.
Depending on the types and signs of the eigenvalues, there are a few interesting cases :
- Node when all eigenvalues are real and have the same sign; The node is stable (unstable) when the eigenvalues are negative (positive);
- Saddle: when all eigenvalues are real and at least one of them is positive and at least one is negative; Saddles are always unstable;
- Focus-Node: when it has one real eigenvalue and a pair of complex-conjugate eigenvalues, and all eigenvalues have real parts of the same sign; The equilibrium is stable (unstable) when the sign is negative (positive);
- Saddle-Focus when it has one real eigenvalue with the sign opposite to the sign of the real part of a pair of complex-conjugate eigenvalues; This type of equilibrium is always unstable.
This is all well and good- if you are studying a 3d continuous numeric system. However, there are many other kinds of systems. For example the FORTRAN code used in spacecraft re-entry optimization may be 100,000s lines of code with a multitide of discrete (e.g. boolean) values as well as conditionals that divide the internal space into many discontinuous regions (one for each if-else branch). For such more complex systems, to learn the landscape, the best you can do is:
- Poke around a little;
- When done, go back and do it all again to see if you end up in a different place.
Enter MaxWalkSat.
The above was a very long introduction to a very simple idea: when randomly searching a space by mutating all dimensions, sometimes fixate on improving things on just one dimension. This is called local search.
In terms of landscapes, it is like:
- Jump all around the hills
- Sometimes, sitting still while rolling marbles left and right
- Then taking one step along the direction where the marbles roll the furthest.
- Go to 1.
The code is remarkably simple- even simpler than simulated annealing. There is no notion of "cooling" or jumping to non-optimal local solutions. Instead, at some probability p just jump around. Then, at probability (1-p), fixate on one variable and try all its values in (say) increments of (max-min)/10 (and use the value that most improves the score function).
Note that there is an added idea: the retry. Sometimes, there is no point repainting the house if the foundations are crumbling. So, occasionally, just go back to the beginning to reset and retry (I love retries- its such a lazy way to solve problems: give up, forget everything, and start all over). In terms of landscapes, the idea is that good solutions are few and far between. If you are not making good progress locally, you should not crawl around the local space- better to leap to somewhere else and start over.
Here's the code for local search, adapted from Kautz and Selman et al.'s MaxWalkSat system:
FOR i = 1 to max-tries DO
solution = random assignment
FOR j =1 to max-changes DO
IF score(solution) > threshold
THEN RETURN solution
FI
c = random part of solution
IF p < random()
THEN change a random setting in c
ELSE change setting in c that maximizes score(solution)
FI
RETURN failure, best solution found
This is a very successful algorithm. Here's some performance of WALKSAT (a simpler version of MAXWALKSAT) against a complete search
Here's another example: two AI algorithms trying to solve the "latin's square" problem: i.e. pick an initial pattern, then try to fill in the rest of the square with no two colors on the same row or column.
- The deterministic method is the kind of exhaustive theorem proving method used in the 1960s and 1970s that convinced people that "AI can never work".
- The stochastic method is a local search which looks for a color to flip that reduces the number of conflicts on each row or column.
This stochastic local search kills the latin squares problem (and, incidently, many other problems).
So the important idea here is not simulated annealing or MaxWalkSat or whatever other algorithm you'd care to name. The real thing to understand is landscapes and how we can walk around it, faster. Happy trails campers!
Copyright © 2015 Tim Menzies.
This is free and unencumbered software released into the public domain.
For more details, see the license.