
Evolutionary Principles applied to Problem Solving



Posted by David Rutten on September 25, 2010 at 9:00am

This blog post is a rough approximation of the lecture I gave at the AAG10 conference in Vienna on September 21st 2010. Naturally it will be quite a different experience as the medium is quite different, but it is my hope the basic premise of the lecture remains intact. This post deals with Evolutionary Solvers in general, but I use Rhino, Grasshopper and Galapagos to demonstrate the topics.

September 24th 2010
David Rutten

Evolutionary Principles applied to Problem Solving

using



There is nothing particularly new about Evolutionary Solvers or Genetic Algorithms. The first references to this field of computation stem from the early 60's when Lawrence J. Fogel published the landmark paper "On the Organization of Intellect" which sparked the first endeavours into evolutionary computing. The early 70's witnessed further forays with seminal work produced by -among others- Ingo Rechenberg and John Henry Holland. Evolutionary Computation didn't gain popularity beyond the programmer world until Richard Dawkins' book "The Blind Watchmaker" in 1986, which came with a small program that generated a seemingly endless stream of body-plans called "Bio-morphs" based on human selection. Since the 80's the advent of the personal computer has made it possible for individuals without government funding to apply evolutionary principles to personal projects and they have since made it into the common parlance.

The term "Evolutionary Computing" may very well be widely known at this point in time, but evolutionary solvers are still very much a programmer's tool. 'By programmers for programmers' if you will. The applications out there that apply evolutionary logic are either aimed at solving specific problems, or they are generic libraries that allow other programmers to piggyback along. It is my hope that Galapagos will provide a generic platform for the application of Evolutionary Algorithms to be used on a wide variety of problems by non-programmers.

Pros and Cons

Before we dive into the subject matter too deeply, though, I feel it is important to highlight some of the (dis)advantages of this particular type of solver, just so you know what to expect. Since we are not living in the best of all possible worlds there is often no such thing as the perfect solution. Every approach has drawbacks and limitations. In the case of Evolutionary Algorithms these are luckily well known and easily understood drawbacks, even though they are not trivial. Indeed, they may well be prohibitive for many a particular problem.

Firstly, Evolutionary Algorithms are slow. Dead slow. It is not unheard of that a single process may run for days or even weeks. Especially complicated set-ups that require a long time in order to solve a single iteration will quickly get out of hand. A light/shadow or acoustic computation for example may easily take a minute per iteration. If we assume we'll need at least 50 generations of 50 individuals each (which is almost certainly an underestimate unless the problem has a very obvious solution) we're already looking at a two-day runtime.
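The runtime estimate is plain arithmetic. A quick sketch using only the figures quoted above (one minute per iteration, 50 generations, 50 individuals); the function and variable names are made up for illustration, and it assumes exactly one fitness evaluation per individual:

```python
# Back-of-the-envelope runtime estimate using the figures quoted in the text.
SECONDS_PER_ITERATION = 60   # "a minute per iteration"
GENERATIONS = 50
INDIVIDUALS_PER_GENERATION = 50


def estimated_runtime_hours(seconds_per_iteration, generations, individuals):
    """Total solver time in hours, assuming one fitness evaluation per individual."""
    evaluations = generations * individuals
    return evaluations * seconds_per_iteration / 3600.0


hours = estimated_runtime_hours(
    SECONDS_PER_ITERATION, GENERATIONS, INDIVIDUALS_PER_GENERATION
)
print(f"{hours:.1f} hours, or about {hours / 24:.1f} days")
```

That is 2,500 evaluations, a little under 42 hours, which is where the "two-day" figure comes from.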

Secondly, Evolutionary Algorithms do not guarantee a solution. Unless a predefined 'good-enough' value is specified, the process will tend to run on indefinitely, never reaching The Answer, or, having reached it, not recognizing it for what it is.

All is not bleak and dismal however. Evolutionary Algorithms have strong benefits as well, some of them rather unique amongst the plethora of computational methods. They are remarkably flexible for example, able to tackle a wide variety of problems. There are classes of problems which are by definition beyond the reach of even the best solver implementation and other classes that are very difficult to solve, but these are typically rare in the province of the human meso-world. By and large the problems we encounter on a daily basis fall into the 'evolutionary solvable' category.

Evolutionary Algorithms are also quite forgiving. They will happily chew on problems that have been under- or over-constrained or otherwise poorly formulated. Furthermore, because the run-time process is progressive, intermediate answers can be harvested at practically any time. Unlike many dedicated algorithms, Evolutionary Solvers spew forth a never ending stream of answers, where newer answers are generally of a higher quality than older answers. So even a prematurely aborted run will yield something which could be called a result. It might not be a very good result, but it will be a result of sorts.

Finally, Evolutionary Solvers allow -in principle- for a high degree of interaction with the user. This too is a fairly unique feature, especially given the broad range of possible applications. The run-time process is highly transparent and browsable, and there exists a lot of opportunity for a dialogue between algorithm and human. The solver can be coached across barriers with the aid of human intelligence, or it can be goaded into exploring sub-optimal branches and superficially dead-ends.

The Process

In this section I shall briefly outline the process of an Evolutionary Solver run. It is a highly simplified version of the remainder of the blog post, and I'll skip over many interesting and even important details. I'll show the process as a series of image frames, where each frame shows the state of the 'population' at a given moment in time. Before I can start however, I need to explain what the image below means.



What you see here is the Fitness Landscape of a particular model. The model contains two variables, meaning two values which are allowed to change. In Evolutionary Computing we refer to variables as genes. As we change Gene A, the state of the model changes and it either becomes better or worse (depending on what we're looking for). So as Gene A changes, the fitness of the entire model goes up or down. But for every value of A, we can also vary Gene B, resulting in better or worse combinations of A and B. Every combination of A and B results in a particular fitness, and this fitness is expressed as the height of the Fitness Landscape. It is the job of the solver to find the highest peak in this landscape.

Of course a lot of problems are defined by not just two but many genes, in which case we can no longer speak of a 'landscape' in the strict sense. A model with 12 genes would be a 12-dimensional fitness volume deformed in 13 dimensions instead of a two-dimensional fitness plane deformed in 3 dimensions. As this is impossible to visualize I shall only use one and two-dimensional models, but note that when we speak of a "landscape", it might mean something terribly more complex than the above image shows.

As the solver starts it has no idea about the actual shape of the fitness landscape. Indeed, if we knew the shape we wouldn't need to bother with all this messy evolutionary stuff in the first place. So the initial step of the solver is to populate the landscape (or "model-space") with a random collection of individuals (or "genomes"). A genome is nothing more than a specific value for each and every gene. In the above case, a genome could for example be {A=0.2 B=0.5}. The solver will then evaluate the fitness for each and every one of these random genomes, giving us the following distribution:



Once we know how fit every genome is (i.e., the elevation of the red dots), we can make a hierarchy from fittest to lamest. We are looking for high-ground in the landscape and it is a reasonable assumption that the higher genomes are closer to potential high-ground than the low ones. Therefore we can kill off the worst performing ones and focus on the remainder:



It is not good enough to simply pick the best performing genome from the initial population and call it quits. Since all the genomes in Generation 0 were picked at random, it is actually quite unlikely that any of them will have hit the jack-pot. What we need to do is breed the best performing genomes in Generation 0 to create Generation 1. When we breed two genomes, their offspring will end up somewhere in the intermediate model-space, thus exploring fresh ground:



We now have a new population, which is no longer completely random and which is already starting to cluster around the three fitness 'peaks'. All we have to do is repeat the above steps (kill off the worst performing genomes, breed the best-performing genomes) until we have reached the highest peak.
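The steps outlined in this section can be sketched in a few lines of Python. This is only a minimal illustration, not how Galapagos is implemented: the single-peak `fitness` function, the function names, the `keep_fraction` parameter and the way `breed` places a child between its parents are all assumptions made for the example.

```python
import random


def fitness(genome):
    """Hypothetical two-gene landscape with a single peak at A=0.7, B=0.3."""
    a, b = genome
    return -((a - 0.7) ** 2 + (b - 0.3) ** 2)


def random_population(size, rng):
    """Generation 0: genomes scattered at random over the model-space."""
    return [(rng.random(), rng.random()) for _ in range(size)]


def breed(mum, dad, rng):
    """Place the child somewhere between its parents, gene by gene."""
    return tuple(m + rng.random() * (d - m) for m, d in zip(mum, dad))


def evolve(fitness, population, generations, rng, keep_fraction=0.5):
    """Repeat 'kill off the worst, breed the best'; return the best genome seen.

    Assumes the population holds at least two genomes.
    """
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, int(len(ranked) * keep_fraction))]
        children = []
        for _ in range(len(ranked)):
            mum, dad = rng.sample(survivors, 2)
            children.append(breed(mum, dad, rng))
        population = children
        best = max(population + [best], key=fitness)
    return best


rng = random.Random(1)
generation_0 = random_population(50, rng)
best = evolve(fitness, generation_0, generations=20, rng=rng)
```

Note that the sketch has no mutation and no 'good-enough' stopping value; it simply runs for a fixed number of generations and remembers the best genome it has come across.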



In order to perform this process, an Evolutionary Solver requires five interlocking parts, which I'll discuss in something resembling detail. We could call this the anatomy of the Solver.

Fitness Function
Selection Mechanism
Coupling Algorithm
Coalescence Algorithm
Mutation Factory

Fitness Functions



In biological evolution, the quality known as "Fitness" is actually something of a stumbling block. Usually it is very difficult to say exactly what it means to be fit. It certainly has little or nothing to do with being the strongest, or the fastest, or the most vicious. The reason there are no flying dogs isn't that evolution hasn't gotten around to making any yet, it is that the dog lifestyle is supremely incompatible with flying and the sacrifices required to equip a dog with flight would certainly detract more from the overall fitness than flight would add to it. Fitness is the result of a million conflicting forces. Evolutionary Fitness is the ultimate compromise.

A fit individual is on average able to produce more offspring than an unfit one, so we could say that fitness equals the number of genetic children. A better measure yet would be to count the number of grand-children. And
a better measure yet would be to count the allele frequency in the gene-pool of the genes that made up the individual in question. But these are all rather ad-hoc definitions that cannot be measured on the spot.

At least in Evolutionary Computation, fitness is a very easy concept. Fitness is whatever we want it to be. We are trying to solve a specific problem, and therefore we know what it means to be fit. If for example we are seeking to position a shape so that it may be milled with minimum material waste, there is a very strict fitness function that leaves no room for argument.

Let's have a look at the fitness landscape again and let's imagine it represents a model that seeks to encase an object in a minimum volume bounding-box. A minimum bounding-box is the smallest orthogonal box that completely contains
any given shape. In the image below, the green shape is encased by two bounding boxes. B has a smaller area than A and is therefore fitter.



When we need to mill or 3D-print a shape, it is often a good idea to rotate it until it requires the least amount of material to be used during manufacturing. For a real minimum bounding-box we need at least three rotation axes, but since that will not allow me to display the real fitness landscape, we will restrict ourselves to rotation around the world X and Y axes. So, Gene A will represent the rotation around the X axis and Gene B will represent rotation around the Y axis. There is no need to allow for rotation higher than 360 degrees, so both genes have a limited working domain. (In fact, since we are talking about orthogonal boxes, even a 0-90 degree domain would suffice). Behold rotation around a single axis:



When we pick two rotational angles at random, we end up somewhere on the fitness landscape. If we allow for 4 decimal places in the rotation angles it means we can actually generate almost 810,000,000,000 (or 810 billion) unique rotations (900,000 possible values per angle over a 0-90 degree domain, squared). It is therefore exceptionally unlikely that we manage to pick a random rotation that yields the best possible answer. But let's say we don't even manage to get close. Let's say we manage to pick a random genome that is at the bad end of the fitness scale, i.e. at the bottom of the fitness landscape. What can we say about the blood-line of this genome? When we track the descendants of a particular genome there is always a large amount of randomness involved due to the workings of the Solver, but there is a strong general tendency that can be distinguished. Just like water will always flow downhill along the steepest slope, so genetic descendants will generally climb uphill along the steepest slope:



Every individual tries to maximize its own fitness, as high fitness is rewarded by the solver. And the steepest uphill climb is the fastest way towards high fitness. So if the black sphere represents the location of the ancestral genome, the orange track represents the pathway of its most successful offspring. We can repeat this exercise for a large number of sample points which will tell us something about how the Solver and the Fitness Landscape interact:



Since every genome is pulled uphill, every peak in the fitness landscape has a basin of attraction around it. This basin represents all the points in model-space that will converge upon that specific peak. It is important to notice that the area of the basin is in no way representative of the quality of the peak. Indeed, a very poor solution may have a large basin of attraction while a good peak might have a small catchment area. Problems like this are typically very difficult to solve, as the solution tends to get stuck in local optima. But we'll have a look at problematic fitness functions later on.
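To make the idea of a basin of attraction concrete, here is a small sketch on a one-dimensional landscape stored as a list of heights. It uses a deterministic steepest-ascent walk as a stand-in for the general uphill tendency of a blood-line, which is a simplification; the landscape values and function names are invented for the example.

```python
from collections import Counter


def climb(heights, start):
    """Follow the steepest uphill neighbour until no neighbour is higher."""
    position = start
    while True:
        neighbours = [
            i for i in (position - 1, position + 1) if 0 <= i < len(heights)
        ]
        best = max(neighbours, key=lambda i: heights[i])
        if heights[best] <= heights[position]:
            return position
        position = best


def basin_sizes(heights):
    """Map each peak index to the number of start points that converge on it."""
    return Counter(climb(heights, start) for start in range(len(heights)))


# Hypothetical landscape: a broad, low hill (height 3 at index 4)
# next to a narrow, tall spike (height 9 at index 10).
landscape = [0, 1, 2, 2.5, 3, 2.5, 2, 1, 0.5, 4, 9, 4]
basins = basin_sizes(landscape)
print(basins)
```

In this made-up landscape the low hill captures eight of the twelve start points and the tall spike only four, so a random sample is twice as likely to end up climbing the poorer peak.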

First, let's have a closer look at the actual fitness landscape for our minimum bounding-box model. I'm afraid it's not quite as simple as the image we've been using so far. I was actually quite surprised how organic and un-box-like the actual fitness landscape for this problem is. Remember, the x-axis rotation is mapped along the Gene A direction and the y-axis rotation along the Gene B direction. So every point on the AB plane represents a unique rotation composed of two angles. The elevation of this point is a direct mapping of the volume of the bounding-box at those two rotation angles:



The first thing to notice is that the landscape is periodic. I.e., it repeats itself every 90 degrees in both directions. Also, this landscape is in fact inverted as we're looking for a minimum volume, not a maximum one. Thus, the orange peaks in fact represent the worst solutions to this problem. Note that there are 16 of these peaks in the entire range and that they are rounded. When we look at the bottom of this fitness landscape, we get a rather different view:



It would appear that the lowest points in this landscape (the minimum bounding-boxes) are both fewer in number and of a different kind. We only get 8 optimal solutions and they are all very sharp, indicating a somewhat more fragile
state.
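The elevation of this landscape can be sketched as a fitness function over the two rotation genes. The version below is an assumption-laden illustration: it treats the shape as a list of 3D points, applies the X rotation before the Y rotation (the post does not say in which order the rotations are composed), and uses a 1 x 2 x 3 box as a stand-in shape. All names are hypothetical.

```python
import math


def rotate_xy(point, angle_x_deg, angle_y_deg):
    """Rotate a 3D point about the world X axis, then about the world Y axis."""
    x, y, z = point
    ax, ay = math.radians(angle_x_deg), math.radians(angle_y_deg)
    # Rotation about X leaves x unchanged.
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    # Rotation about Y leaves y unchanged.
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    return (x, y, z)


def bounding_box_volume(points, angle_x_deg, angle_y_deg):
    """Volume of the world-aligned box around the rotated points (lower is fitter)."""
    rotated = [rotate_xy(p, angle_x_deg, angle_y_deg) for p in points]
    extents = [max(c) - min(c) for c in zip(*rotated)]
    return extents[0] * extents[1] * extents[2]


# Hypothetical shape: the eight corners of a 1 x 2 x 3 box.
corners = [(x, y, z) for x in (0, 1) for y in (0, 2) for z in (0, 3)]

print(bounding_box_volume(corners, 0, 0))    # box aligned with the world axes
print(bounding_box_volume(corners, 30, 0))   # tilted, so the bounding box grows
```

Because the solver here is looking for a minimum, a genome {A, B} would be scored by the negative of this volume (or the solver would be told to minimize).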

Still, on the whole we have nothing to complain about. All the solutions are of equal quality and there are no local optima at all. We can generalize this landscape to a 2-dimensional graph:



No matter where you end up as an ancestral genome, your blood-line will always find its way to a minimum bounding box. There's nowhere for it to get 'stuck'. So it's really just a question about who gets there first. If we look at
a slightly more complex fitness graph, it becomes apparent that this need not be the case:



This fitness landscape has two kinds of solutions. The high quality sharp ones near the bottom of the graph and the low quality flat ones near the top. The basin of attraction is given for both solutions (yellow for high quality,
pink for low quality) and you can see that about half of the model space is attracted to the low quality solutions.

An even worse example (flipped upright again this time, so high values indicate good solutions) would be the following fitness landscape:



The basins for these peaks are very small indeed and therefore easy to miss by a random sampling of the landscape. As soon as a lucky genome finds the peak on the left, its offspring will rapidly populate the low peak causing the rest of the population to go extinct. It is now even less likely that the better peak on the right will be found. The smaller the basins for a solution, the harder it is to solve a problem with an evolutionary algorithm.

Another example of a cumbersome problem to solve would be a discontinuous fitness landscape:



Even though there are strictly speaking no local optima, there is also no 'improvement' on the plateaus. A genome which finds itself in the middle of one of these horizontal patches doesn't know where to go. If it takes a step to
the left, nothing changes. If it takes a step to the right, nothing changes. There's no 'pressure' in this fitness landscape, so all the genomes will wander about aimlessly, until one of them has the good fortune to suddenly step onto a higher plateau. At
this point it will quickly dominate the gene-pool and the wandering starts again until the next plateau is accidentally found.

Even worse than this though is a landscape that has a high degree of noise or chaos. A landscape may be continuous and yet feature so much detail that it becomes impossible to make any intelligible pronouncements regarding the fitness of a local patch:



In a landscape like this, mommy and daddy may both be very similar and both be very fit, but when they mate the offspring might end up in one of the fissures. A landscape like this defies navigation.

Selection Mechanisms

Biological Evolution proceeds by Natural Selection, the ruthless force identified by Darwin as the arbiter of progress. Put simply, Natural Selection affects the direction of the gene-pool over time by regulating who gets to mate. In extreme cases mating is prevented because a specific genome is so unfit that the bearer cannot survive until reproductive age. Another rather extreme case would be sterility. However, there's a myriad ways in which Natural Selection can make it difficult or impossible for certain individuals to pass on their genetic footprint.

However, Natural Selection isn't the only game in town. For a long time now humans have been using Artificial Selection in order to breed specific characteristics into a (sub)species. When we try to solve problems using an Evolutionary
Solver, we always use some form of artificial selection. There's no such thing as sex or gender in the computer. The process of selection is also much simpler than in nature, as there is basically only one question that needs to be answered: Who gets
to mate?

Allow me to enumerate the mechanisms for parent selection that are available in Galapagos. This is only a small subset of the selection algorithms that are possible, but they seem to cover the basics rather well.

First off, we have Isotropic Selection, which is the simplest kind of algorithm you can imagine. In fact, it is the absence of a selection algorithm. In Isotropic Selection everyone gets to mate:



No matter where you find yourself on this fitness graph, your chances of ending up in a mating couple are constant. You might think that this is a particularly pointless selection strategy as it does nothing to further the evolution
of the gene-pool. But it is not without precedent in nature. Take for example wind-pollination or coral spawning. If you're a sexually functioning member of such a species, you get to play ball come mating season. Another example would be females in a walrus
colony. Every female in a colony gets to breed with the dominant male, no matter how fit or unfit she is. Isotropic Selection is certainly not without function either. For one, it dampens the speed with which a population runs uphill. It therefore acts as
a safe-guard against a premature colonization of a local -and possibly inferior- optimum.

Another mechanism available in Galapagos is Exclusive Selection, where only the top N% of the population get to mate:



If you're lucky enough to be in the top N%, you'll likely have multiple offspring. A good analogy in nature for Exclusive Selection would be Walrus males. There's only a few harems to go around and far too many males to assign them
all (a harem of one female after all is not really a harem). The flunkies get to sit on the side-line without a single chance to father a walrus baby, doing whatever it is walruses do when they can't get any action.

Another common pattern in nature is Biased Selection, where the chance of mating increases as the fitness increases. This is something we typically see with species that form stable couples. Everyone is basically capable of finding a mate, but the really attractive individuals manage to get a lot of hanky-panky on the side, thus increasing their chances of becoming genetic founders for future generations. Biased Selection can be amplified by using power functions, which have the effect of flattening or exaggerating the curve.
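The three selection mechanisms can be compared by looking at the mating weight each one assigns to every individual. The sketch below is an interpretation, not Galapagos code: the function names are invented, and the rank-based formula used for Biased Selection is just one simple way to make the mating chance rise with fitness while leaving everyone some chance.

```python
import random


def isotropic_weights(fitnesses):
    """Everyone has the same chance of mating, whatever their fitness."""
    return [1.0] * len(fitnesses)


def exclusive_weights(fitnesses, top_fraction):
    """Only the fittest top_fraction of the population may mate."""
    keep = max(1, round(len(fitnesses) * top_fraction))
    cutoff = sorted(fitnesses, reverse=True)[keep - 1]
    return [1.0 if f >= cutoff else 0.0 for f in fitnesses]


def biased_weights(fitnesses, power=1.0):
    """Mating chance rises with fitness rank.

    power < 1 flattens the curve, power > 1 exaggerates it.
    """
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])  # worst first
    weights = [0.0] * n
    for rank, index in enumerate(order, start=1):
        weights[index] = (rank / n) ** power
    return weights


def select_parent(population, weights, rng):
    """Draw one parent with probability proportional to its weight."""
    return rng.choices(population, weights=weights, k=1)[0]


scores = [0.2, 0.9, 0.5, 0.7]
print(isotropic_weights(scores))
print(exclusive_weights(scores, top_fraction=0.5))
print(biased_weights(scores, power=2.0))
```

With these four made-up scores, Exclusive Selection at 50% leaves only the second and fourth individual in the running, while Biased Selection with a power of 2 still gives the weakest individual a small chance.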



Coupling Algorithms

Coupling is the process of finding mates. Once a genome has been elected to mate by the active Selection Algorithm, it has to pick a mate from the population to complete the act. There are of course many ways in which mate selection could occur, but Galapagos at the moment only allows one: selection by genomic distance. In order to explain this in detail, I should first tell you how a Genome Map works. This



is a Genome Map. It displays all the genomes (individuals) in a certain population as dots on a grid. The distance between two genomes on the grid is roughly analogous with the distance between the genomes in gene-space. I say roughly
because it is in fact impossible to draw a map with exact distances. A single genome is defined by a number of genes. We assume that all the genomes in a species have the same number of genes (this is not technically a limitation of Evolutionary Algorithms,
even though it is currently a limitation of Galapagos). Therefore the distance between two genomes is an N-Dimensional value, where N equals the number of genes. It is not possible to accurately display an N-Dimensional point cloud on a 2-Dimensional screen
so the Genome Map is only a coarse approximation. It also follows that the axes of this graph have no meaning whatsoever, the only information a Genome Map conveys is which genomes are more or less similar (close together) and which genomes are more or less
different (far apart).

Imagine you are an individual that has been selected for mating (yay). The population is well distributed and you are somewhere near the average (I'm sure you are a wildly original and delightful person in real life, but for the time
being try to imagine you are in fact sort of average):



That red dot is you. Who looks attractive?

You could of course limit your search of potential partners to your immediate neighbourhood. This means that you mate with individuals who are very much like you and it means your offspring will also be very much like you.



When this is taken to extremes we call it incestuous mating behaviour and it can become detrimental pretty quickly. Biological incest has a nasty habit of expressing unhealthy but recessive genes, but in the digital world of Evolutionary
Solvers the biggest risk of incest is a rapid decline in population diversity. Low diversity decreases the chances of finding alternate solution basins and thus it risks getting stuck in local optima.

The other extreme is to exclude everyone near you. You'll often hear it said that opposites attract, but that's true only up to a point. At some point the genomes at the other end of the scale become so different as to be
incompatible.



This is called zoophilic mating and it can be equally detrimental. This is especially true when a population is not a single group of genomes, but in fact contains multiple sub-species, each of which is climbing their own little fitness
peak.



You definitely do not want to mate with a member in a different sub-species, as the offspring would likely land somewhere in the middle. And since these two species are climbing different peaks, "in the middle" actually puts you in
a fitness valley.

It would seem that the best option is to balance in-breeding and out-breeding. To select individuals that are not too close and not too far. In Galapagos you can specify an in-breeding factor (between -100% and +100%, total out-breeding
vs. total in-breeding respectively) that allows you to guide this relative offset:



Note that mate selection at present completely ignores mate fitness. This is something that needs looking into for future releases, but even without any advanced selection algorithms the solver still works.
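Mate selection by genomic distance might be sketched as follows. How Galapagos actually maps the in-breeding factor onto a choice of mate is not described here, so the ranking scheme below is purely an assumption: genomes are sorted by Euclidean distance and the factor (expressed as -1.0 to +1.0 rather than -100% to +100%) picks a position in that ranking. Mate fitness is ignored, as in the text.

```python
import math


def genomic_distance(a, b):
    """Euclidean distance between two genomes with the same number of genes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_mate(me, others, inbreeding):
    """Pick a mate by relative distance.

    inbreeding = +1.0 picks the nearest genome (total in-breeding),
    inbreeding = -1.0 picks the farthest (total out-breeding),
    and values in between pick a correspondingly ranked genome.
    """
    ranked = sorted(others, key=lambda g: genomic_distance(me, g))
    position = (1.0 - inbreeding) / 2.0  # 0.0 = nearest, 1.0 = farthest
    return ranked[round(position * (len(ranked) - 1))]


me = (0.5, 0.5)
others = [(0.5, 0.6), (0.9, 0.9), (0.0, 0.0), (0.6, 0.7)]
print(pick_mate(me, others, inbreeding=1.0))   # the closest neighbour
print(pick_mate(me, others, inbreeding=-1.0))  # the most distant genome
```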

Coalescence Algorithms

Once a mate has been selected, offspring needs to be generated. On the genetic level this is anything but fun and games. The biological process of gene recombination is horrendously complicated and itself subject to evolution (meiotic
drive for example). The digital variant is much more basic. This is partially because genes in evolutionary algorithms are not very similar to biological genes. Ironically, biological genes are far more digital than programmatic genes. As Mendel discovered
in the 1860's, genes are not continuously variable qualities. Instead they behave like on-off switches. Genes in evolutionary solvers like Galapagos behave like floating point numbers, that can assume all the values between two numerical extremes.

When we mate two genomes, we need to decide what values to assign to the genes of the offspring. Again, Galapagos provides several mechanisms for achieving this.



Imagine we have two genomes of four genes each. There is no gender and no sex-based characteristics in the solver so the combination of M and D is potentially a completely symmetrical process. A mechanism that is somewhat synonymous
with biological recombination is Crossover Coalescence.



In Crossover mating, junior inherits a random number of genes from mommy and the remainder from daddy. In this mechanism gene value is maintained.

Blend Coalescence will compute new values for genes based on both parents, basically averaging the values:



It is also possible to add a blending preference based on relative fitness. If mum is fitter than dad for example, her gene values will be more prominent in the offspring:
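The Crossover and Blend mechanisms described in this section can be sketched as two small functions. These are illustrative guesses rather than the Galapagos implementation: in particular, which genes come from which parent in `crossover`, and the assumption in `blend` that fitness values are non-negative weights, are choices made for the example.

```python
import random


def crossover(mum, dad, rng):
    """Child takes a random number of genes from mum and the remainder from dad.

    Gene values are copied unchanged, so no new values are created.
    """
    n = len(mum)
    from_mum = set(rng.sample(range(n), rng.randint(0, n)))
    return [mum[i] if i in from_mum else dad[i] for i in range(n)]


def blend(mum, dad, mum_fitness=1.0, dad_fitness=1.0):
    """Child genes are weighted averages of the parents' genes.

    Equal fitness gives a plain average; a fitter parent pulls the child
    towards its own values. Assumes non-negative fitness values.
    """
    total = mum_fitness + dad_fitness
    weight = 0.5 if total == 0 else mum_fitness / total
    return [weight * m + (1.0 - weight) * d for m, d in zip(mum, dad)]


mum = [0.0, 1.0, 0.0, 1.0]
dad = [1.0, 0.0, 1.0, 0.0]
print(crossover(mum, dad, random.Random(0)))
print(blend(mum, dad))                                  # plain average
print(blend(mum, dad, mum_fitness=3, dad_fitness=1))    # mum is fitter
```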



Mutation Factories

All the mechanisms we have discussed so far (Selection, Coupling and Coalescence) are designed to improve the quality of solutions on a generation by generation basis. However all of them have a tendency to reduce the bio-diversity in a population. The only mechanism which can introduce diversity is mutation. Several types of mutation are available in the Galapagos core, though the nature of the implementation in Grasshopper at the moment restricts the possible mutation to only Point mutations.

Before we get to mutations though, I'd like to talk briefly about Genome Graphs. A popular way to display multi-dimensional points on a two-dimensional medium is to draw them as a series of lines that connect different values on a
set of vertical bars. Each bar represents a single dimension. This way we can quite easily display not just points with any number of dimensions, but even points with a different number of dimensions in the same graph:



Here for example we have a genome consisting of 5 genes. This genome is thus a point in the 5-dimensional space that delineates this particular species. When G0 is drawn at ⅓, it means that the value is one-third between the minimum
and maximum allowed limits. The benefit of this graph is that it becomes quite easy to spot sub-species in a population, as well as lone individuals. When we apply mutations to a genome, we should see a change in the graph, as every unique genome has a unique
graph.



The above modification shows a Point Mutation, where a single gene value is changed. This is currently the only mutation type that is possible in Galapagos. We could also swap two adjacent gene values, in which case we get
an Inversion Mutation:



Inversion mutations are only useful when subsequent genes have a very specific relationship. It tends to drastically modify a genome and thus in most cases also drastically modify fitness. This is almost always a detrimental operation.

Two examples of mutations that cannot be used on a species which requires a fixed number of genes are Addition and Deletion mutations.
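The four mutation types mentioned in this section can be sketched on a genome stored as a list of values between a minimum and maximum limit. The function names and the default limits of 0.0 and 1.0 are assumptions for illustration only.

```python
import random


def point_mutation(genome, rng, low=0.0, high=1.0):
    """Replace one randomly chosen gene with a new value inside its limits."""
    mutant = list(genome)
    mutant[rng.randrange(len(mutant))] = rng.uniform(low, high)
    return mutant


def inversion_mutation(genome, rng):
    """Swap the values of two adjacent genes."""
    mutant = list(genome)
    i = rng.randrange(len(mutant) - 1)
    mutant[i], mutant[i + 1] = mutant[i + 1], mutant[i]
    return mutant


def addition_mutation(genome, rng, low=0.0, high=1.0):
    """Insert a new gene, which changes the genome length."""
    mutant = list(genome)
    mutant.insert(rng.randrange(len(mutant) + 1), rng.uniform(low, high))
    return mutant


def deletion_mutation(genome, rng):
    """Remove one gene, which changes the genome length."""
    mutant = list(genome)
    del mutant[rng.randrange(len(mutant))]
    return mutant


rng = random.Random(0)
genome = [0.1, 0.3, 0.5, 0.7, 0.9]
print(point_mutation(genome, rng))
print(inversion_mutation(genome, rng))
```

The last two functions return genomes of a different length, which is exactly why they cannot be used on a species that requires a fixed number of genes.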





Conclusion

Galapagos is still a very young product and hasn't really had time to position itself firmly in any work-flow, provided that it could. It seems to be capable of solving relatively small problems quite quickly, but it certainly needs
a lot of work to make it more robust and usable. It is likely that the most effective applications for a solver of this type and capability are small or partial problems. To try and evolve anything complicated will almost certainly result in frustration.