Practical Evolutionary Algorithms

A practical book on Evolutionary Algorithms that teaches you the concepts and how they’re implemented in practice.


Population Initialisation

Preamble

In [1]:
# used to create block diagrams
%reload_ext xdiag_magic
%xdiag_output_format svg
    
import numpy as np                   # for multi-dimensional containers
import pandas as pd                  # for DataFrames
import plotly.graph_objects as go    # for data visualisation
import plotly.express as px

Introduction

Before the main optimisation process (the "generational loop") can begin, we need to complete the initialisation stage of the algorithm. Typically, this involves generating the initial population of solutions by randomly sampling the search-space. We can see in the figure below that the initialisation stage is the first real stage of the algorithm, and it is only executed once. There are many schemes for generating the initial population, and some simply load in a population saved from an earlier run of an algorithm.

In [2]:
%%blockdiag
{
    orientation = portrait
    Initialisation -> Evaluation -> "Terminate?" -> Selection -> Variation -> Evaluation
    Initialisation [color = '#ffffcc']
}
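One of the schemes mentioned above is to simply reuse a population saved from an earlier run. Below is a minimal sketch of what this might look like using the pandas and numpy imports from the preamble; the helper name initial_population() and the file previous_population.csv are assumptions purely for illustration.

def initial_population(pop_size, D, seed_file=None):
    # reuse a population saved by an earlier run, if one is provided
    # (assumes it was saved with population.to_csv(seed_file, index=False))
    if seed_file is not None:
        return pd.read_csv(seed_file)
    # otherwise, fall back to randomly sampling the search-space in [0, 1)
    return pd.DataFrame(np.random.rand(pop_size, D))

For example, initial_population(20, 10) would sample a fresh population, while initial_population(20, 10, 'previous_population.csv') would resume from a saved one.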

Randomly sampling the search-space

When generating an initial population, it's often desirable to have a diverse representation of the search space. This gives the search a wide spread of problem variable values to exploit early on, without having to rely solely on exploration operators to introduce that diversity.

We previously defined a solution $x$ as consisting of many problem variables.

$$ x=\langle x_{1},x_{2},\ldots,x_{\mathrm{D}} \rangle \tag{1} $$

We also defined a multi-objective function $f(x)$ as consisting of many objectives.

$$ f(x) =(f_{1}(x),f_{2}(x),\ldots,f_{M}(x))\tag{2} $$
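As a concrete illustration of Equations 1 and 2, the snippet below represents a single solution with $\mathrm{D}=3$ problem variables and evaluates it with a made-up objective function returning $\mathrm{M}=2$ objective values; both the variable values and the objectives are assumptions for illustration only.

x = np.array([0.2, 0.5, 0.9])     # a solution with D = 3 problem variables

def f(x):
    # a made-up bi-objective function (M = 2)
    f_1 = np.sum(x ** 2)          # e.g. minimise the sum of squares
    f_2 = np.sum((x - 1) ** 2)    # e.g. minimise the squared distance from 1
    return np.array([f_1, f_2])

f(x)                              # returns an array of M = 2 objective values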

However, before we generate our initial population, we need to take a closer look at how a general multi-objective optimisation problem is described.

$$ \left.\begin{array}{lll} \text{optimise} & f_{m}(x), & m=1,2,\ldots,\mathrm{M};\\ \text{subject to} & g_{j}(x)\geq 0, & j=1,2,\ldots,\mathrm{J};\\ & h_{k}(x)=0, & k=1,2,\ldots,\mathrm{K};\\ & x_{d}^{(L)}\leq x_{d}\leq x_{d}^{(U)}, & d=1,2,\ldots,\mathrm{D}; \end{array}\right\}\tag{3} $$

We may already be familiar with some parts of Equation 3, but there are some we haven't covered yet. There are $\mathrm{M}$ objective functions, each of which can be either minimised or maximised. The constraint functions $g_j(x)$ and $h_k(x)$ impose inequality and equality constraints that must be satisfied by a solution $x$ for it to be considered feasible. The other condition affecting the feasibility of a solution $x$ is whether each problem variable falls within (inclusive of) its lower $x_{d}^{(L)}$ and upper $x_{d}^{(U)}$ boundaries in the decision space.
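To make Equation 3 concrete, here is a minimal sketch of a feasibility check for a single solution. The particular constraint functions used for $g(x)$ and $h(x)$ are made up purely for illustration, and the equality constraint is checked to a small tolerance because we are working with floating-point values.

def is_feasible(x, lower, upper, tol=1e-9):
    x = np.asarray(x)
    g = np.sum(x) - 1                                    # example inequality constraint, g(x) >= 0
    h = x[0] - x[1]                                      # example equality constraint, h(x) = 0
    within_bounds = np.all((x >= lower) & (x <= upper))  # inclusive boundary check
    return (g >= 0) and (abs(h) <= tol) and within_bounds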

The lower $x_{d}^{(L)}$ and upper $x_{d}^{(U)}$ boundaries may not be the same for each problem variable. For example, we can define the following upper and lower boundaries for a problem with 10 problem variables.

In [3]:
D_lower = [-2, -2, -2, 0, -5, 0.5, 1, 1, 0, 1]    # lower boundary for each of the 10 problem variables
D_upper = [ 1,  2,  3, 1, .5, 2.5, 5, 5, 8, 2]    # upper boundary for each of the 10 problem variables

In Python, we often use np.random.rand() to generate uniformly distributed random numbers in the half-open interval $[0, 1)$. If we want to generate a population of 20 solutions, each with 10 problem variables ($\mathrm{D} = 10$), we could try something like the following.

In [4]:
D = 10
population = pd.DataFrame(np.random.rand(20,D))
population
Out[4]:
0 1 2 3 4 5 6 7 8 9
0 0.329671 0.374100 0.824420 0.922642 0.318527 0.188295 0.327129 0.479323 0.210610 0.851801
1 0.645927 0.083338 0.110879 0.004719 0.459459 0.171020 0.352289 0.192019 0.506796 0.651664
2 0.999210 0.910585 0.674060 0.053042 0.535111 0.801007 0.070166 0.128370 0.424526 0.331537
3 0.015599 0.427451 0.201586 0.088630 0.727220 0.389936 0.809707 0.105465 0.211743 0.472576
4 0.466289 0.409967 0.153361 0.467884 0.870126 0.434619 0.494461 0.849840 0.387758 0.704783
5 0.044350 0.428790 0.115291 0.051819 0.594828 0.886502 0.858761 0.860539 0.200967 0.781117
6 0.299581 0.045350 0.981161 0.173037 0.766032 0.553089 0.696421 0.312112 0.091100 0.514797
7 0.554139 0.198144 0.028411 0.280367 0.745371 0.087668 0.744502 0.699962 0.769607 0.821034
8 0.232491 0.941960 0.506784 0.332358 0.999900 0.195531 0.964598 0.705568 0.978146 0.092838
9 0.161326 0.984075 0.330721 0.312350 0.194344 0.036229 0.105671 0.586225 0.693641 0.786574
10 0.303700 0.959589 0.769842 0.231915 0.546678 0.102105 0.990598 0.328266 0.116188 0.713510
11 0.200493 0.307815 0.336176 0.453963 0.063704 0.995317 0.286012 0.324126 0.320682 0.493043
12 0.475509 0.816603 0.391715 0.315581 0.277499 0.070199 0.698860 0.315524 0.491214 0.452783
13 0.887191 0.207538 0.765098 0.302126 0.864817 0.692735 0.041277 0.147419 0.010411 0.508252
14 0.789530 0.593923 0.056915 0.194743 0.453338 0.443845 0.255385 0.978932 0.371420 0.893424
15 0.584534 0.713603 0.498059 0.839776 0.591688 0.483747 0.363979 0.535339 0.427405 0.354482
16 0.688269 0.092315 0.157689 0.663994 0.501117 0.845432 0.709506 0.688688 0.971338 0.496053
17 0.229410 0.243525 0.376468 0.759605 0.130511 0.531057 0.641985 0.437506 0.100129 0.043900
18 0.624602 0.530905 0.594844 0.985875 0.656480 0.664620 0.237749 0.474532 0.514094 0.099991
19 0.669218 0.089019 0.553244 0.229331 0.893242 0.242295 0.304969 0.057211 0.525875 0.154247

This works fine if all of our problem variables are meant to lie between 0 and 1 ($x_d \in [0,1)$). However, in this case we have 10 different upper and lower boundaries, so we can use np.random.uniform() instead, which accepts per-variable low and high values.

In [5]:
population = pd.DataFrame(np.random.uniform(low=D_lower, high=D_upper, size=(20,D)))
population
Out[5]:
0 1 2 3 4 5 6 7 8 9
0 -0.243405 0.653442 1.176629 0.421281 -0.839248 0.685737 4.527334 4.929203 6.061257 1.441931
1 -0.831149 -0.084428 -0.819016 0.356604 -0.774314 2.260451 4.844954 1.239304 4.164855 1.027875
2 -1.016566 1.884099 0.715404 0.303204 -0.569247 1.291614 4.544202 3.981528 4.420089 1.071693
3 0.967065 0.762243 1.145826 0.439030 -2.861063 1.656407 2.664406 4.584127 2.936088 1.170738
4 0.243326 1.204101 1.604941 0.617665 -3.343967 2.016591 3.602125 4.262386 6.500192 1.274097
5 -0.017601 -0.376401 1.974580 0.108455 -3.915353 0.677477 4.746552 2.535112 0.373486 1.835593
6 -1.240112 -1.447560 1.910043 0.784465 -1.425895 1.341335 1.975107 2.036065 5.440526 1.873996
7 -1.822004 -1.558009 -0.064267 0.608635 -2.137119 1.606009 2.639081 4.744328 1.736861 1.412605
8 0.961073 -1.899880 1.265955 0.791345 -0.228110 2.215742 4.639379 4.214345 0.859496 1.432187
9 0.258511 -1.974326 -0.328683 0.255757 -3.327445 1.709358 3.083017 2.100346 2.304902 1.370313
10 -0.012020 1.207488 2.620471 0.075654 -4.241606 1.837806 1.802931 2.288309 5.458069 1.634762
11 -0.213199 1.853227 -0.361364 0.696477 -0.907890 0.712989 1.115687 3.850481 5.998671 1.633682
12 0.946941 1.775229 0.404866 0.046627 0.031602 0.812499 1.844088 3.830245 2.505024 1.236807
13 -0.422588 1.727576 -1.799397 0.841338 -1.209451 1.908033 3.804657 3.128292 1.036201 1.057680
14 -0.505849 -1.895723 1.187553 0.669382 -4.634824 2.205534 2.306581 4.896221 2.086699 1.338742
15 -1.136096 1.377803 0.196270 0.264709 -0.483707 0.807833 1.495210 4.371743 1.441752 1.926959
16 0.288149 -1.067553 2.730704 0.196218 -4.731168 1.159789 2.663498 3.960847 7.095345 1.117828
17 -0.628291 0.176261 0.035413 0.986121 -1.435457 2.347143 3.830653 2.046763 4.395512 1.351336
18 -0.289966 -1.032931 -1.074309 0.959382 -4.163810 0.931842 1.183848 1.166898 4.931351 1.009114
19 -0.408237 -0.435879 0.225300 0.670240 -1.353233 1.370362 2.244957 3.627118 4.735529 1.791956
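Equivalently, we could scale and shift the unit-interval samples from np.random.rand() ourselves. The sketch below should produce a population with the same distribution as the np.random.uniform() call above.

lower, upper = np.array(D_lower), np.array(D_upper)

# scale each column from [0, 1) to [lower, upper) for its problem variable
population_scaled = pd.DataFrame(lower + np.random.rand(20, D) * (upper - lower))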

Let's double-check to make sure our solutions fall within the problem variable boundaries.

In [6]:
population.min() >= D_lower
Out[6]:
0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
9    True
dtype: bool
In [7]:
population.max() <= D_upper
Out[7]:
0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
9    True
dtype: bool
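If we would rather perform a single element-wise check over the whole population instead of comparing per-variable minima and maxima, something like the following sketch (relying on numpy broadcasting) should also work.

# True only if every problem variable of every solution lies within its boundaries
np.all((population.values >= D_lower) & (population.values <= D_upper))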

Great! Now all that's left is to visualise our population in the decision space. We'll use a parallel coordinate plot.

In [8]:
# parallel coordinate plot of the population in the decision space
fig = go.Figure(layout=dict(xaxis=dict(title='problem variables', range=[1, 10]),
                            yaxis=dict(title='value')))

# plot each solution as a line across its problem variables
for index, row in population.iterrows():
    fig.add_scatter(x=population.columns.values + 1, y=row, name=f'solution {index+1}')

fig.show()

To compare one variable to another, we may also want to use a scatterplot matrix.

In [9]:
# scatterplot matrix comparing every pair of problem variables
fig = px.scatter_matrix(population, title=' ')
# hide the redundant diagonal (a variable plotted against itself)
fig.update_traces(diagonal_visible=False)
fig.show()

Conclusion

In this section, we took a closer look at how multi-objective optimisation problems are described so that we knew how to complete the initialisation stage of an evolutionary algorithm. We generated a population of solutions within upper and lower boundaries, checked that the problem variables fell within those boundaries, and then visualised the population using a parallel coordinate plot and a scatterplot matrix. In a simple evolutionary algorithm, this population is now ready to enter the generational loop.

Support this work

You can access this notebook and more by getting the e-book on Practical Evolutionary Algorithms.