Skip to content

The single pole balancing experiment

Iaroslav Omelianenko edited this page Nov 15, 2021 · 7 revisions

The pole-balancing or inverted pendulum problem has long been established as a standard benchmark for artificial learning systems. It is one of the best early examples of a reinforcement learning task under conditions of incomplete knowledge.

The pole-balancing problem can be visualized as follows:

The single pole-balancing

As you can see from the scheme above, the physical system has the following constraints:

  1. The pole must remain upright within ±r the pole failure angle
  2. The cart must remain within ±h of origin
  3. The controller must always exert a non-zero force Fx

Where r is a pole failure angle (±12 ̊ from 0 ̊) and h is a track limit (±2.4 meters from the track centre).

The simulation of the cart balancing terminated when either the following conditions were met: the pole inclines over the failure angle, or the cart position is outside of the track.

The experiment's goal is to devise a controller that can keep the pole balanced for a defined number of simulation steps. The controller always applies to the cart a force at full magnitude in either direction (bang-bang control). The successful controller should simulate a single-pole balancing for at least 500'000 time steps (10'000 simulated seconds).

To run this experiment with a population size equal to 150 organisms, execute the following commands:

cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_150.neat -genome ./data/pole1startgenes -experiment cart_pole

The above command will execute 100 trials of a single pole-balancing experiment within 100 generations with a population of 150 organisms. The executor will save the results of the experiment in the output directory.

The console output will look similar to the following:

Solved 100 trials from 100, success rate: 1.000000
Random seed: 1620479404
Average
        Trial duration:         221.247774ms
        Epoch duration:         39.216614ms
        Generations/trial:      10.9

Champion found in 83 trial run
        Winner Nodes:           7
        Winner Genes:           10
        Winner Evals:           634

        Diversity:              47
        Complexity:             16
        Age:                    3
        Fitness:                1.000000

Average among winners
        Winner Nodes:           7.1
        Winner Genes:           11.5
        Winner Evals:           1557.2
        Generations/trial:      10.9

        Diversity:              59.940000
        Complexity:             18.600000
        Age:                    3.570000
        Fitness:                1.000000

Averages for all organisms evaluated during experiment
        Diversity:              37.594982
        Complexity:             18.064586
        Age:                    2.396373
        Fitness:                0.171238

Efficiency score:               11.135725

>>> Start genome file:  ./data/pole1startgenes
>>> Configuration file: ./data/pole1_150.neat

As you can see from the statistics above, the NEAT algorithm is extremely effective in finding a successful solver for pole balancing. It was able to find a solver in all trials compared to the XOR problem, where only about 90 percent of the trials were solved.

It is interesting to find how the population size affects the performance of the algorithm. To find this, we are going to run this experiment using a population of 1'000 organisms. Use the following command to run mentioned evaluation:

cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_1000.neat -genome ./data/pole1startgenes -experiment cart_pole

The commands above produce the following console outputs:

Solved 100 trials from 100, success rate: 1.000000
Random seed: 1620480184
Average
        Trial duration:         296.815693ms
        Epoch duration:         142.432241ms
        Generations/trial:      2.5

Champion found in 23 trial run
        Winner Nodes:           7
        Winner Genes:           10
        Winner Evals:           2543

        Diversity:              40
        Complexity:             16
        Age:                    2
        Fitness:                1.000000

Average among winners
        Winner Nodes:           7.0
        Winner Genes:           10.3
        Winner Evals:           1936.6
        Generations/trial:      2.5

        Diversity:              64.660000
        Complexity:             17.170000
        Age:                    1.450000
        Fitness:                1.000000

Averages for all organisms evaluated during experiment
        Diversity:              26.504369
        Complexity:             17.313629
        Age:                    1.120346
        Fitness:                0.160989

Efficiency score:               11.480676

>>> Start genome file:  ./data/pole1startgenes
>>> Configuration file: ./data/pole1_1000.neat

By comparing the number of generations per trial for both experiments (10.9 and 2.5), you can see that increasing population size effectively increases the algorithm's performance. Furthermore, with 1'000 organisms in the population of solvers, the solution is often found in the NULL, i.e., within the initial random population.

In both single pole-balancing configurations described above, the optimal genome of the winner organism consists of seven nodes and ten genes.

Winner's Network Graph

The seven network nodes have the following meaning:

  • #1 is a bias
  • #2-5 are sensors receiving input system state in following order:
    • cart's X position,
    • cart's acceleration among X,
    • pole's angle,
    • pole's angular velocity.
  • #6 and #7 are output nodes signaling the action that should be applied to the system to balance a single pole at each simulation step, i.e., the force direction. The applied force direction depends on the relative strength of activations of both output neurons. If activation of the first output neuron (6th node) greater than activation of the second neuron (7th node), then the positive force direction is applied. Otherwise, the negative force direction is applied.

The ten genes are precisely the number of links required to connect five input sensor nodes with two output neuron nodes (5x2).

Next, we will evaluate the algorithm against a more advanced variant of the pole balancing problem: Double-Pole Balancing.