This paper provides profound insights for GA design, in specific the automatic generation of evolvable genotype-phenotype mappings using ANN autoencoders. It is clearly elucidated, with a firm grasp of the foundational literature in both biological evolution and evolutionary computation. The results are promising and establish a rich line of inquiry for researchers to pursue.

— Reviewer #1

Thanks, Reviewer #1. My co-authors (Charles Ofria and Wolfgang Banzhaf) and I are really excited about Learning an Evolvable Genotype-Phenotype Mapping, too! (Almost as excited as we are about traveling to Kyoto this summer to present it at the 2018 GECCO conference.) I hope this blog post will get you on board, too, or at least prompt some good chin-scratching (the academic equivalent of exuberance! :smile:).

We will leave out most details about what we actually did in the paper (you can find those in the paper!), and instead focus on describing and discussing the paper’s main idea: a new digital evolution technique we call AutoMap. We’ll start by reflecting on the problem this technique tries to solve, introduce the deep learning methods the technique employs (called “autoencoders”), then cover how the technique works, briefly review the experiments performed in the paper, and prognosticate on where the research goes from here. This blog post will focus on building intuition for the what and why of AutoMap, especially for folks outside the field. We’ll use lots of graphics, skip the math, emphasize the biological analogy of digital evolution, and throw in fun bits along the way.

If you’re just looking for the publication itself, you can find the PDF here and the paper’s supporting materials here.

🔗 What is a Genotype-Phenotype Map?

We should start, actually, by defining genotype and phenotype. In biology, phenotype refers to an organism’s observable characteristics (morphological, behavioral, physiological, chemical, etc.). Importantly, the phenotype governs an organism’s ability to survive and reproduce — in evolutionary terms, its fitness. In biology, genotype refers to the heritable information that shapes an organism’s phenotype. This is typically equated with an organism’s DNA content. Importantly, the genotype is heritable — genetic information is passed from generation to generation, subject to recombination (the mixing and matching of two sets of parental genomes) in sexual organisms and to mutation (random changes to the genetic code).

The elephant’s DNA (genotype), which it inherited from its parents, influences physical characteristics of its body (phenotype).

With these preliminaries out of the way, let’s get a handle on the idea of the genotype-phenotype map. Fundamentally, the genotype-phenotype map describes the relationship between an organism’s genotype and its phenotype. It’s akin to the concept of development in biology, the process through which an organism’s genotype and environment interact to determine its phenotype. Although environmental influences on the phenotype are certainly a topic of interest in evolution research, we’ll abstract them away for this discussion.

The genotype-phenotype map is a set of rules that defines a phenotype for every possible genotype. This is akin to the mathematical concept of a function: you put something (a genotype) in and you get something (a certain phenotype) out.

Later on, we’ll use the terms “genotype space” and “phenotype space.” These describe, for a certain system, the set of all possible genotypes and the set of all possible phenotypes, respectively. In biology, the genotype space would be the set of all possible DNA sequences and the phenotype space would be the set of all possible physical forms of organisms, including those that can’t survive or can’t reproduce. In digital evolution research, a diverse abundance of genotype and phenotype spaces are possible — and are studied. Let’s take a brief detour to build some background on digital evolution research and look at the genotype-phenotype map in the context of this research.

🔗 Digital Evolution and the Genotype-Phenotype Map

For those unfamiliar, digital evolution broadly refers to the use of algorithms (computer programs) inspired by biological evolution to evolve solutions to problems. We’ll go into more detail, but first let’s start by digging into a particular piece of digital evolution research. The video below from Cheney et al. shows how virtual soft-bodied robots were evolved for locomotion on a flat surface. It’s fun… give it a watch!

Companion video to [Cheney et al., 2013].

In broad strokes, here’s how Cheney et al. evolved these virtual robots:

  1. From a population of soft-bodied robots, evaluate how fast each robot walks.
  2. Select the fastest walkers to serve as parents for the next generation.
  3. Generate a new population of robots from the selected parents.
    • This new population is different from the parental generation population due to mutation and recombination.
    • Some walkers may be faster than walkers in the parental generation, some may be slower.
  4. Repeat!

The initial population is randomly generated. Then, generation after generation, better and better walkers are discovered through this evolutionary process. What we see in the video are phenotypes: the bodies of these digital walkers that are getting better and better at walking.
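To make this loop concrete, here is a minimal sketch of a mutation-only evolutionary algorithm in Python. Everything here (the toy genome, the stand-in fitness function, the parameter choices) is illustrative and hypothetical; Cheney et al. evaluated fitness by physically simulating each robot’s gait.

```python
import random

GENOME_LENGTH = 20   # hypothetical genome: a list of numbers
POP_SIZE = 50
GENERATIONS = 100

def random_genotype():
    return [random.uniform(-1, 1) for _ in range(GENOME_LENGTH)]

def evaluate(genotype):
    # Stand-in fitness; Cheney et al. instead simulated each soft-bodied
    # robot and measured how far it walked.
    return sum(genotype)

def mutate(genotype):
    # Copy the parent and randomly perturb one position.
    child = list(genotype)
    idx = random.randrange(len(child))
    child[idx] += random.gauss(0, 0.1)
    return child

population = [random_genotype() for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    # 1. Evaluate how "fast" each individual is.
    ranked = sorted(population, key=evaluate, reverse=True)
    # 2. Select the best individuals to serve as parents.
    parents = ranked[:POP_SIZE // 5]
    # 3. Generate a new, mutated population from the selected parents.
    population = [mutate(random.choice(parents)) for _ in range(POP_SIZE)]
    # 4. Repeat!
```

In a real setup like Cheney et al.’s, the genotype would first be translated into a phenotype (a robot body) by the genotype-phenotype map, and fitness would be measured on the phenotype; that translation step is exactly the piece this post cares about.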

So, what’s going on with the genotypes? In the video above, a technique called HyperNEAT was used to define the genotype-phenotype map. We don’t need to go into detail here, but you can think of the genotypes that were used as math equations. To generate a phenotype for a particular genotype (math equation), the positions (e.g., x,y,z coordinates) of each voxel (each little cube of tissue that comprises the phenotype) are plugged into the math equation. Then, the outputs of the math equation determine what tissue type is placed at each location. Using the HyperNEAT genotype-phenotype map, genotypes tend to encode phenotypes composed of several contiguous swaths of a single tissue type (e.g., red contractile tissue). This is because the math equations tend to spit out similar outputs for nearby voxels. You can see this pattern in the example phenotype below.

Indirect-encoded phenotype in action. Figure 1 from [Cheney et al., 2013]

In other experiments, a different genotype-phenotype map was used to encode soft-bodied robot phenotypes: the direct genotype-phenotype map. With this map, the genotype is simply a list of tissue types, one for each voxel. The phenotype is generated trivially: just assemble the soft-bodied robot as directed — each voxel’s tissue type is directly defined in the genotype. For this reason, this genotype-phenotype map is known as a direct encoding. A direct encoding is a genotype-phenotype map for which each and every piece of information describing the phenotype is specified discretely in the genotype. We call HyperNEAT an indirect encoding because each component of the phenotype doesn’t directly correspond to a component of the genotype: changing part of a math equation genotype can change the tissue types of several phenotypic voxels at once.

Using the direct genotype-phenotype map, genotypes tend to encode phenotypes composed of a hodgepodge amalgam of mixed tissue types. This is because the probability of large contiguous swaths being composed of the same tissue type is small. You can see this pattern in the example phenotype below.

Representative direct-encoded phenotype. Figure 7 from [Cheney et al., 2013]
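To make the contrast between these two kinds of maps concrete, here is a minimal sketch of both in Python. The “math equation” below is a loose, hypothetical stand-in for HyperNEAT (which evolves far more expressive networks), and the tissue names and grid size are made up for illustration.

```python
import math
import random

TISSUES = ["empty", "bone", "muscle_a", "muscle_b"]  # made-up tissue labels
GRID = 8                                             # an 8x8x8 block of voxels

def direct_map(genotype):
    # Direct encoding: the genotype is literally one tissue label per voxel.
    assert len(genotype) == GRID ** 3
    return list(genotype)

def indirect_map(genotype):
    # Indirect encoding (loosely in the spirit of HyperNEAT): the genotype
    # parameterizes a smooth function of voxel coordinates, so nearby voxels
    # receive similar values and large same-tissue patches emerge.
    a, b, c = genotype  # three evolvable "equation" coefficients
    phenotype = []
    for x in range(GRID):
        for y in range(GRID):
            for z in range(GRID):
                value = math.sin(a * x) + math.cos(b * y) + math.sin(c * z)
                # Bin the smoothly varying value into one of the tissue types.
                if value < -1.0:
                    tissue = "empty"
                elif value < 0.0:
                    tissue = "bone"
                elif value < 1.0:
                    tissue = "muscle_a"
                else:
                    tissue = "muscle_b"
                phenotype.append(tissue)
    return phenotype

# A random direct genotype tends to yield a hodgepodge of tissues...
hodgepodge = direct_map([random.choice(TISSUES) for _ in range(GRID ** 3)])
# ...while a random indirect genotype yields large contiguous patches.
patchy = indirect_map([random.uniform(0.1, 0.5) for _ in range(3)])
```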

As you can probably see at this point, many, many genotype-phenotype maps could be defined and employed to evolve soft-bodied robots. Digital evolution researchers don’t just study soft-bodied robots. Digital evolution researchers are interested in “evolving” solutions for a wide variety of different problems. If you can imagine a computational problem, someone has probably at least thought about “evolving” solutions to it. This work spans a wide variety of phenotype spaces. For example — a well known one! — evolutionary techniques were successfully employed to generate antenna configurations for NASA spacecraft to meet challenging design requirements.

Evolved spacecraft antenna design [Hornby et al., 2006].

For each domain, in order to use evolutionary algorithms, a genotype-phenotype map must be defined. Unlike evolutionary biologists, digital evolution researchers are forced to think practically about how to define genotype-phenotype maps… and to think about it often. In each problem domain, some genotype-phenotype maps will outperform others: evolution will tend to discover higher-quality phenotypes (e.g., faster soft-bodied walkers or better antennas) when certain genotype-phenotype maps are employed.

Recall how, for the soft-bodied robot example we just covered, the indirect genotype-phenotype map tended to yield phenotypes with contiguous segments of the same tissue type while the direct genotype-phenotype map didn’t. It turns out that soft-bodied robots with contiguous segments of the same tissue type tend to be better walkers because contiguous same-tissue voxels can act in concert to power locomotion. The hodgepodge amalgams yielded by the direct genotype-phenotype map just jiggled around, mostly in place. When using the indirect genotype-phenotype map, much faster soft-bodied walker phenotypes could be evolved. Although the fast soft-bodied phenotypes could certainly be encoded using the direct genotype-phenotype map, they simply weren’t discovered by evolution — there weren’t accessible evolutionary paths from a randomly-initialized direct genotype to the fast walkers.

This particular “good” HyperNEAT genotype-phenotype map, though, might not be a good (or even applicable) option in other problem domains. Here, we’re starting to get at the problem that AutoMap, the technique introduced in our paper, aims to tackle: how can “good” genotype-phenotype maps be automatically generated for a wide variety of problem domains? First, we should talk about what exactly makes a genotype-phenotype map “good.” The answer is closely related to the concept of evolvability, which we will cover next.

🔗 What is Evolvability?

We can conceive of evolvability concretely by imagining a gallery of offspring from a particular parental organism, like the example depicted below.

Wild-type and mutant strains of Arabidopsis thaliana from [Griffiths, 2015].

This gallery reflects the phenotypic outcomes of genetic perturbation of the parental organism. As we look through the gallery of offspring, we can ask a number of questions to assess the degree to which variation introduced by mutation is deleterious. At what frequency are lethal outcomes observed? At what frequency are mildly deleterious outcomes observed? In a similar vein, we can assess the amount of phenotypic diversity observed among the offspring in the gallery. How often are the parental and offspring phenotypes indistinguishable? How often are radical phenotypic changes observed? We can also ascertain the frequency at which phenotypic variation that is both viable and novel is observed.

These twin capacities (producing phenotypic variation and keeping some of that variation viable) are essential to evolutionary search. Without any heritable variation, evolution would have no raw material to select from and would stagnate. Without any viable variation, evolution would select against all novelty and again stagnate. Hence, evolutionary innovation depends on the production of heritable, novel phenotypic variation, some of which must not be severely deleterious.

Although the term is used somewhat inconsistently in the literature, most definitions of evolvability tie into:

  1. the amount of novel, heritable phenotypic variation among offspring, and
  2. the degree to which heritable phenotypic variation among offspring is viable.

Let’s examine a pair of biological examples of non-arbitrary phenotypic outcomes under mutation to build our intuition for evolvability.

Two example biological systems: flies and dogs.

A developmental constraint against certain non-viable phenotypic variation in Drosophila melanogaster was discovered through artificial selection experiments [Coyne, 1987] [Tuinstra et al., 1990]. In these experiments, researchers were able to successfully select for bilaterally symmetric phenotypic traits, such as uniform reduction of eye size, but were unable to successfully select for bilaterally asymmetric phenotypic traits, such as different-sized eyes. Tuinstra et al. hypothesize that the very nature of the developmental process constrains the phenotypic variation that can be observed in offspring, in this case curtailing offspring that lack bilateral symmetry. Specifically, they hypothesize that a lack of bilateral symmetry-breaking information during the embryological development of Drosophila explains the negative result of artificial selection for bilaterally asymmetric phenotypic traits. In this way, the distribution of phenotypic diversity in offspring is biased away from (likely not useful) asymmetric variation.

In addition to qualities that constrain against non-viable mutational outcomes, biological organisms can possess qualities that facilitate increased heritable variation for a phenotypic trait. Somatotropin, also known as growth hormone, has widespread anabolic effects on tissues throughout the body. Mutations that affect somatotropin regulation, structure, or the response by cells all provide avenues for substantial heritable variation in body size [Devesa et al., 2016]. Dog breeds exhibit a range of body weights spanning nearly an order of magnitude. Indeed, among certain groups of dogs, much of this variation can be explained by just six genes, several of which are associated with pathways somatotropin participates in [Rimbault et al., 2013]. The presence of such hormonal signaling pathways can be viewed as making a broad range of heritable phenotypic variation more readily realizable via mutation.

To recap, evolvability describes the phenotypic outcomes under genetic perturbation: to what extent are novel, viable phenotypic outcomes observed?
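For readers who like to think in code, here is one hypothetical way to make the “gallery of offspring” idea operational: sample a batch of mutants from a parent and tally how often the offspring phenotype is viable and how often it is novel. The function signature and the simple equality test for novelty are illustrative choices, not a standard metric from the literature.

```python
def assess_evolvability(parent_genotype, mutate, to_phenotype, is_viable,
                        n_offspring=1000):
    """Sample a 'gallery' of mutant offspring and summarize two
    evolvability-related quantities: how often offspring are viable,
    and how often they are phenotypically novel."""
    parent_phenotype = to_phenotype(parent_genotype)
    viable = 0
    novel = 0
    for _ in range(n_offspring):
        child_phenotype = to_phenotype(mutate(parent_genotype))
        if is_viable(child_phenotype):
            viable += 1
        if child_phenotype != parent_phenotype:
            novel += 1
    return {
        "fraction_viable": viable / n_offspring,
        "fraction_novel": novel / n_offspring,
    }
```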

🔗 What is the Connection Between Genotype-Phenotype Maps and Evolvability?

Let’s think back to the soft-bodied robots.

What phenotypic outcomes of mutation might we expect to observe under the indirect genotype-phenotype map? Because the genotype is a math equation, small changes to the genotype can result in substantial changes to the phenotype (i.e., a change in the tissue type of many voxels). Thus, we can expect genetic perturbation to sometimes yield substantial phenotypic novelty. Further, because the math equation will still tend to spit out similar outputs for nearby voxels, the resulting phenotype will still likely have sizable contiguous same-tissue segments, so we can largely expect it to have at least a fighting chance at viability (i.e., being able to walk).

Now, what phenotypic outcomes of mutation might we expect to observe under the direct genotype-phenotype map? Each genetic change can only affect a single phenotypic voxel, so we won’t observe much phenotypic novelty under mutation. Further, if you managed to evolve a viable phenotype by accumulating large swaths of contiguous tissues, under the direct encoding most genetic perturbations would break up those contiguous tissues. Thus, we would expect a high proportion of phenotypic outcomes of genetic perturbation to be deleterious.

It looks like the indirect genotype-phenotype map is more evolvable: we expect to observe greater phenotypic novelty and viability under genetic perturbation. But why? Fundamentally, the indirect encoding takes advantage of the regularity in the phenotype space that soft-bodied robots with large contiguous tissue segments tend to be better walkers. By defining the phenotype using a math equation, the genotype-phenotype map largely filters out nonviable phenotypes — genotypes get translated to phenotypes with large contiguous tissue segments and not hodgepodge phenotypes. Under the indirect genotype-phenotype map, the genotype space can be thought of as more compact than the phenotype space. Genotypes only correspond to a fraction of the possible phenotypes: the phenotypes that are potentially interesting for locomotion (i.e., phenotypes with contiguous tissue segments). It makes sense, then, that because the genotype space is more compact, small changes in the genotype space can yield large changes in the phenotype space (i.e., novelty).

We’ll return to play with these ideas more later on. In the meantime, let’s dive into the technical core of the AutoMap approach: autoencoders.

🔗 What are Autoencoders?

Autoencoders are little machines that learn to take an input and then give that same input back to you.

Wait a second! That doesn’t sound very interesting.

Have you ever seen the trick where a magician saws her assistant in half then puts her back together? The magician starts out with a one-piece assistant and ends up with a one-piece assistant. The interesting part is the bit in the middle! Autoencoders can learn representations that decompose an input into its fundamental features — a more compact representation — and then reconstruct the input from those fundamental features. This works something like a police sketch. First, someone sees a criminal and describes the face’s physical features with words. This description is the compact representation. Then, the police artist reconstructs the criminal’s face from the description.

Schematic of hypothetical police composite process. Mug shot and composite reconstruction were taken from the Crime Scene Training Blog.

Why does this work? It works because the witness has seen lots of faces and knows what the important bits to describe are. It works because the police artist understands the witness’ words and has also seen lots of faces — from experience, she knows that the mouth goes under the nose, the nose goes between the eyes, etc. and doesn’t need the witness to tell her absolutely everything about the face in order to draw it.

Now, what if our magician started out with a beat-up assistant — maybe with some bruises or broken bones — then, through the process of sawing her assistant in half and putting her back together, ended up with an assistant in one piece and in the pink of health? Sound interesting yet?

Well, autoencoders can also be used to reconstruct a corrupted input. This works something like a police sketch, too. Suppose that the criminal was wearing pantyhose that partially obscured his face. The witness can still describe the suspect’s face and the police artist can still draw it. Under the right conditions, the missing part of the face can be reconstructed reasonably well.

Schematic of hypothetical police composite process with suspect in disguise (incomplete input). Mug shot and composite reconstruction were taken from the Crime Scene Training Blog.

Why does this work? It works because the witness can still see and describe part of the face. It works because the police artist understands the witness’ words and has also seen lots of faces — from experience, she can make a pretty good guess by cluing off the fact, for example, that faces have left-right symmetry or maybe that the criminal probably had a cheekbone and ear on the part of the face that was obscured. Again, because she’s seen lots of faces the police artist doesn’t need the witness to tell her absolutely everything about the face in order to draw it.

We’ll talk more about each of these two autoencoder superpowers separately. First, we’ll cover bottlenecked autoencoders, which specifically focus on learning a compact representation for inputs. Then, we’ll cover denoising autoencoders, which specifically focus on learning to fix corrupted inputs. It’s certainly possible to have an autoencoder that both bottlenecks and denoises, but it’s instructive to consider these two types of autoencoders separately.

But first, let’s build an intuitive notion of exactly what autoencoders are and how they learn. Autoencoders fall under the umbrella of deep learning, a fancy (trendy) shorthand that just means multilayer artificial neural networks. Multilayer artificial neural networks themselves are just cleverly devised math equations. They’re called artificial neural networks because their structure and function somewhat resemble how information is processed by networks of neurons in the brain. The moniker “deep learning” refers to the fact that these neural networks are really big: lots of layers and tens of thousands of parameters or more.

Even though there’s lots of interesting mathematics and biological inspiration behind them, at the end of the day multilayer artificial neural networks are still just math equations. You feed a vector of numbers into one side of these big math equations, they percolate through, and numbers come out the other end. The input and output numbers can represent whatever you want. For example, you could feed in pictures of animals and have the deep neural network output whether it thinks there’s a cat in the image. You could feed in black and white images and have the deep neural network output its best guess about how to colorize the image. You could feed in recordings of speech and have the deep neural network output whether it thinks a keyword, like “Alexa” or “Hey Google,” was said.

Here’s the clever trick that makes artificial neural networks tick: if you know what should be coming out the other end for certain inputs, you can easily figure out how to adjust the parameters (numerical constants in the math equation) so that the output produced the next time you feed in that input will be less wrong (i.e., will more closely resemble the output you want). This is how artificial neural networks learn. In some ways, you can think about this like studying with flashcards. You take a card out of the training deck, feed the input side of the flashcard (e.g., the animal image) into one side of the math equation, and check the output side of the flashcard (e.g., is it a cat?) against what came out of the other side of the math equation. If the output was wrong, you adjust the parameters of the math equation to make its output less wrong next time. The cool superpower of deep learning is that once these math equations have studied enough, they can often start to guess correctly on flashcards they haven’t seen. For example, a smart speaker might have never heard you say its keyword before but can usually detect it nevertheless.

Making flashcards for autoencoders is easy. Remember that an autoencoder’s goal is to take an input and then give that same input back to you. This means that both sides of the flashcard are the same thing. If we’re trying to train an autoencoder that works like our police sketch example, we just need lots of pictures of faces. Then, we just make our flashcards by putting the faces on both sides. We train the autoencoder so that each output face looks more and more like its corresponding input face. If we really want to make sure that our autoencoder gets good at reconstructing faces from partial observations (like when the crook’s wearing a burglar mask), we can take a sharpie and black out parts of the faces on the input side of the card. If we were training a smart speaker to recognize a keyword, we would have to manually label the output sides of its flashcards as to whether the keyword was present. Because we don’t have to label an autoencoder’s flashcards manually, it’s considered a form of unsupervised (or, more specifically, self-supervised) learning. The point is, we don’t need any manually-written labels; we just need lots of examples of whatever it is we want our autoencoder to be able to encode/decode and/or denoise.
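Here is a minimal NumPy sketch of flashcard-making, assuming the training examples arrive as an array of pixel values scaled to [0, 1]. The optional masking corresponds to the “sharpie” trick for training an autoencoder to reconstruct partially obscured inputs, and the hypothetical `faces` array stands in for whatever training examples you have on hand.

```python
import numpy as np

def make_flashcards(images, mask_fraction=0.0, rng=None):
    """Build self-supervised (input, target) pairs: the target is always the
    original image; optionally 'sharpie out' part of the input side."""
    rng = rng or np.random.default_rng()
    targets = images.astype("float32")
    inputs = targets.copy()
    if mask_fraction > 0:
        # Black out a random subset of pixels on the input side only.
        mask = rng.random(inputs.shape) < mask_fraction
        inputs[mask] = 0.0
    return inputs, targets

# Plain reconstruction flashcards (hypothetical `faces` array)...
# x_in, x_out = make_flashcards(faces)
# ...or "burglar mask" flashcards for learning to fill in missing parts:
# x_in, x_out = make_flashcards(faces, mask_fraction=0.25)
```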

If this brief monograph left you curious (or confused) about deep learning, you can look here for a more detailed introduction to artificial neural networks.

🔗 Bottlenecked Autoencoder

Here’s a nice graphic showing a bottlenecked autoencoder in action on the MNIST handwritten digit dataset.

Schematic of bottlenecked autoencoder in action on MNIST handwritten digit dataset. Graphic from [Dertat, 2017].

As you can see, a handwritten 4 goes in the left side and through the encoder (a multilayer artificial neural network) to produce the more compact encoded representation. The encoder has to figure out a way to represent the 4 without just using all the raw pixels. Then, the code is read by the decoder (another multilayer artificial neural network) to generate the reconstruction seen on the right. The reconstruction looks very much like the original input, but if you look closely it’s not an exact replica. If we were still training, we could use some derivative math to figure out how to adjust the parameters of the encoder and decoder to try to get output that resembles the original image even better.

As you can see in the example below, the autoencoder works well across many different handwritten digits.

Example input-output pairs (top/bottom) for a MNIST bottleneck autoencoder. Graphic from [Dertat, 2017].

This works because the autoencoder can “trust” that only images of handwriting will come in. If the autoencoder was trained with arbitrary static, a compact representation of arbitrary static wouldn’t be possible: there would be no patterns, symmetries, regularities, etc. that allow the encoder to compress the input image. (I like to think of the fact that arbitrary static couldn’t be compressed as a result of the fact that the set of all arbitrary static is equivalent to the space of all possible images. Thinking this way, handwritten-digit images can be compressed because they only occupy a [very small] portion of the space of all possible images, so it takes less information to specify which of the possible handwritten-digit images you mean.)
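For the hands-on reader, here is a minimal sketch of a bottlenecked autoencoder on MNIST using Keras (assuming TensorFlow is installed). The layer sizes, code size, and training settings are arbitrary illustrative choices, not the exact configuration from [Dertat, 2017] or from our paper.

```python
import tensorflow as tf

# Load MNIST and flatten each 28x28 image into a 784-dimensional vector.
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

CODE_SIZE = 32  # the bottleneck: far fewer numbers than 784 raw pixels

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(CODE_SIZE, activation="relu"),
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(CODE_SIZE,)),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
autoencoder = tf.keras.Sequential([encoder, decoder])

# Train with identical input and target: reconstruct the image you were fed.
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,
                validation_data=(x_test, x_test))

reconstructions = autoencoder.predict(x_test[:10])
```

The encoder and decoder are deliberately kept as separate models: we will want to use them individually later, both for the latent space interpolations below and, eventually, as a genotype-phenotype map.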

Remember, the interesting part of the bottlenecked autoencoder is the code in the middle. We refer to the set of all possible codes as the latent space. The latent space is a fantastic, magical place. We can get a really cool visualization of what the decoder portion of the autoencoder is doing by exploring the latent space. We’ll use a technique called interpolation to do this.

Schematic of interpolation in latent space. Graphic from [Despois, 2017].

To perform a latent space interpolation, we start with two handwritten digit images: a point of departure and a destination. We call the space the original images come from the pixel space. We use the encoder portion of the autoencoder to jump into the latent space (i.e., running each image through the encoder and looking at the resulting code) at the point of departure and the destination. Then, we traverse a direct path between the point of departure and the destination in the latent space. A direct path just means that we slowly interpolate between our point of departure in the latent space and our destination point in the latent space. All the while, we use the decoder portion of the autoencoder to take snapshots of our progress, passing our current position in the latent space through the decoder to yield an image.

The latent space interpolation we just described contrasts with the more rudimentary pixel space interpolation.

Schematic of interpolation in pixel space. Graphic from [Despois, 2017].

To perform a pixel space interpolation, we pick out two handwritten digit images: a point of departure and a destination, just as before. Then, we just traverse a direct path between the point of departure and the destination in the pixel space, slowly turning up the intensity of pixels that need to increase to match the destination and slowly turning down the intensity of pixels that need to decrease to match the destination.
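Continuing the Keras sketch from above (and reusing its `encoder`, `decoder`, and flattened `x_test` images), both kinds of interpolation take only a few lines:

```python
import numpy as np

def interpolate(start, end, steps=10):
    """Evenly spaced points on the straight line from start to end."""
    return [start + (end - start) * t for t in np.linspace(0.0, 1.0, steps)]

# Pixel space interpolation: blend the two raw images directly.
pixel_frames = interpolate(x_test[0], x_test[1])

# Latent space interpolation: encode both endpoints, walk a straight line
# between their codes, and decode each intermediate code back into an image.
code_start = encoder.predict(x_test[0:1])[0]
code_end = encoder.predict(x_test[1:2])[0]
latent_frames = [decoder.predict(code[np.newaxis, :])[0]
                 for code in interpolate(code_start, code_end)]
```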

Here are some interpolations between MNIST digits. Some interpolations are in the pixel space and some are in the latent (encoded) space. While you look through, compare and contrast between these interpolations. I put a bunch of examples in here because they look really cool!!

Frame-by-frame MNIST interpolation in latent space: 2 -> 3. Graphic from [Despois, 2017].

Frame-by-frame MNIST interpolation in pixel space: 2 -> 3. Graphic from [Despois, 2017].

Frame-by-frame MNIST interpolation in pixel space: 5 -> 9. Graphic from [Despois, 2017].

Frame-by-frame MNIST interpolation in latent space: 5 -> 9. Graphic from [Despois, 2017].

Animated MNIST interpolation 2 -> 1, pixel space (left) and latent space (right). Graphic from [Despois, 2017].

Animated MNIST interpolation 1 -> 3, pixel space (left) and latent space (right). Graphic from [Despois, 2017].

Animated MNIST interpolation 5 -> 2, pixel space (left) and latent space (right). Graphic from [Despois, 2017].

Animated MNIST interpolation 5 -> 6, pixel space (left) and latent space (right). Graphic from [Despois, 2017].

Based on these interpolations, we can make a few observations about the difference between latent space and pixel space. While traversing the latent space, almost every intermediate step looked like it could have conceivably been handwritten (even if it did not clearly correspond to a specific digit). It is reasonable to suppose that many, if not most, points in the regions of latent space we’re exploring correspond to reasonably handwriting-looking images. While traversing the pixel space, many intermediate steps were obviously not handwritten — in the middle, it was often clear that two ghost images were superimposed.

Now, it’s important to recognize that paths between digits where all the intermediate states look reasonably handwritten do exist in pixel space. (If you think about it, that’s exactly what our projections from interpolations in latent space traced out.) However, note that these paths are rare: we would have to trace a very specific, long, and circuitous route through pixel space to follow them; we couldn’t just beat a direct path between points like we did in latent space.

🔗 Denoising Autoencoder

Here’s another nice graphic, this one showing a denoising autoencoder in action on the MNIST handwritten digit dataset.

Schematic of denoising autoencoder in action on MNIST handwriting image. Graphic from [Dertat, 2017].

We start out with an image of a handwritten 4 from the MNIST dataset on the left. Then, some static is added to corrupt the image. Importantly, to our eye the 4 is still recognizable… this is a good indication that enough of the original information survives for reconstruction to be possible. If we were to feed completely garbled static into the denoising autoencoder, there would be absolutely nothing to go off of to reconstruct the original image. The autoencoder might still make a best guess, reconstructing something that looks like a handwritten digit, but it would almost certainly not resemble the actual uncorrupted 4. Anyhow, the somewhat corrupted input travels through the denoising autoencoder (i.e., the encoder and decoder) and a (very good) best guess as to the original uncorrupted input comes out the other side.

As we can see, the trained denoising autoencoder is capable of reconstructing all different kinds of handwritten digits.

Example of denoising autoencoder at work on noisy MNIST handwriting. Graphic from [Dertat, 2017].

Again, this works because the autoencoder was trained on example handwritten digits. It has learned the subset of images that qualify as a valid handwritten digit. Images that fall outside of that subset are simply mapped to a nearby image that qualifies as a valid handwritten digit.
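Here is the corresponding training sketch for a denoising autoencoder, again reusing the flattened MNIST arrays loaded in the earlier Keras example. The noise level and architecture are illustrative choices; note that a denoising autoencoder needn't have a tight bottleneck.

```python
import numpy as np
import tensorflow as tf

# A simple dense autoencoder; no tight bottleneck is required for denoising.
denoiser = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
denoiser.compile(optimizer="adam", loss="binary_crossentropy")

# Corrupt the input side with static, but keep the target side clean.
rng = np.random.default_rng(seed=1)
x_train_noisy = np.clip(x_train + rng.normal(0.0, 0.3, x_train.shape), 0.0, 1.0)

denoiser.fit(x_train_noisy, x_train, epochs=10, batch_size=256)

# At use time: feed in a corrupted image, get back a best guess at the original.
denoised = denoiser.predict(x_train_noisy[:10])
```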

🔗 How To Use Autoencoders as a Genotype-Phenotype Map?

We use the two types of autoencoders (bottlenecked and denoising) as genotype-phenotype maps slightly differently, so we’ll cover them separately here. First, though, we should work through some background on fitness landscapes, which we will subsequently be using to understand the properties of genotype-phenotype maps constructed with both kinds of autoencoders.

🔗 Fitness Landscapes

This video gives a great introduction to the concept of fitness landscapes.

Video by Bjørn Østman and Randy Olson about fitness landscapes.

Let’s recap the main idea. When visualizing a 3D fitness landscape, the phenotype is determined by two values: x and y. (In some landscapes, the x-y coordinates represent the genotype. Here, we’ll just look at landscapes where the x-y coordinates are phenotypic features.) Each individual x-y coordinate therefore corresponds to a particular phenotype. The fitness landscape is drawn as a surface that spans all possible x-y coordinates; the height of the surface represents the fitness of the phenotype at each x-y value. Because the progeny of high-fitness individuals from the previous generation are over-represented in the successive generation (i.e., survival of the fittest), a population (a cloud of individual phenotype points on the fitness surface) tends to “climb” the surface. You can see this very nicely in an animated excerpt from Bjørn Østman and Randy Olson’s video.

Animated excerpt from video by Bjørn Østman and Randy Olson about fitness landscapes.

🔗 Bottlenecked Autoencoder

Let’s consider the following hypothetical fitness landscape.

A hypothetical phenotype-space fitness landscape.

If we evolve a single population in this landscape using the direct genotype-phenotype map, after several generations it will “climb” upwards and end up somewhere on the landscape’s high-fitness ridge.

End state of a single population evolved using the direct genotype-phenotype mapping.

If we evolve several more populations in this landscape from different starting conditions, they will converge to regions spread across the high-fitness ridge. If we repeat with enough independent evolutionary runs, we start to get decent coverage of the high-fitness ridge.

End state of multiple populations evolved using the direct genotype-phenotype mapping.

Now, it’s time to train our bottlenecked autoencoder. We use the set of high-fitness phenotypes attained through all of our independent evolutionary runs as training data. Once we are finished training the autoencoder, we just snap off the decoder portion of the autoencoder to use as our genotype-phenotype map.

Schematic of a bottlenecked autoencoder and how it might serve as a genotype-phenotype map.

To generate a phenotype from a genotype, we just stick that genotype into the decoder and read the output. Hence, the genotype space is now equivalent to the latent space we discussed in the last section.

If the size of the bottleneck in our hypothetical example was just one scalar value, it is likely that the encoder of our autoencoder would learn to encode the phenotypes in terms of position along the ridge. Remember: an autoencoder learns a code that represents just the regions of the input space (in this case, phenotype space) in which it encountered training data.

Using a bottlenecked autoencoder to obtain a lower-dimensional genotype.

With this genotype-phenotype map, if we were to walk through all possible genotypes — from small scalar values to large scalar values — the corresponding phenotypes would likely trace the high-fitness ridge in the phenotype space. Note that we’ve accomplished a reduction in dimensionality: the phenotype space is still two-dimensional (x and y coordinates define a phenotype) but the genotype space — the space we’re paging through in our evolutionary search — is one-dimensional. You’re probably starting to see it at this point, but we’ll circle back shortly to explicitly discuss the evolvability implications of the bottlenecked autoencoder genotype-phenotype mapping.
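Once a decoder has been trained on high-fitness phenotypes, using it as a genotype-phenotype map is almost anticlimactic. Here is a hypothetical sketch, in which `decoder` and `CODE_SIZE` stand in for a decoder trained as described above and the mutation operator is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng()

def random_genotype(code_size=CODE_SIZE):
    # A genotype is just a point in the decoder's latent space.
    return rng.normal(size=code_size).astype("float32")

def mutate(genotype, sigma=0.1):
    # Mutation perturbs the latent code directly.
    return genotype + rng.normal(0.0, sigma, size=genotype.shape).astype("float32")

def genotype_to_phenotype(genotype):
    # The learned decoder *is* the genotype-phenotype map.
    return decoder.predict(genotype[np.newaxis, :])[0]

parent = random_genotype()
child = mutate(parent)
parent_phenotype = genotype_to_phenotype(parent)
child_phenotype = genotype_to_phenotype(child)
```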

🔗 Denoising Autoencoder

Let’s consider another hypothetical fitness landscape.

A hypothetical phenotype-space fitness landscape.

If we evolve a single population in this landscape using the direct genotype-phenotype map, after several generations it will “climb” upwards and end up somewhere on one of the landscape’s high-fitness hilltops.

End state of a single population evolved using the direct genotype-phenotype mapping.

If we evolve several more populations in this landscape from different starting conditions, they will converge to different high-fitness hilltops. If we repeat with enough independent evolutionary runs, we start to reveal a pattern in the distribution of high-fitness hilltops.

End state of multiple populations evolved using the direct genotype-phenotype mapping.

Now, it’s time to train our denoising autoencoder. We use the set of high-fitness phenotypes attained through all of our independent evolutionary runs as training data. Once we are finished training the autoencoder, we can use the whole autoencoder (i.e., the encoder and the decoder) as the genotype-phenotype map.

Schematic of a denoising autoencoder and how it might serve as a genotype-phenotype map.

To generate a phenotype from a genotype, we just stick that genotype into the autoencoder and read the output. With this approach, the dimensionality of the genotype space remains equivalent to the dimensionality of the phenotype space. However, most genotypes would no longer map to their direct counterpart in phenotype space. Instead, they would be “snapped” to the nearest fitness peak where training data (i.e., converged populations evolved using the direct genotype-phenotype map) was observed.

Using a denoising autoencoder to “snap” to local fitness peaks.

Again, a denoising autoencoder learns to recognize the regions of the input space (in this case, phenotype space) in which it encountered training data. If it encounters input that doesn’t fall into one of these regions, it will make its best guess as to the nearest valid region that the input could have come from.
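Here is the corresponding hypothetical sketch for the denoising variant, with `denoiser` standing in for an autoencoder trained on high-fitness phenotypes. The genotype lives in the same space as the phenotype, and the whole autoencoder does the “snapping”:

```python
import numpy as np

rng = np.random.default_rng()
PHENOTYPE_DIM = 784  # genotype has the same dimensionality as the phenotype

def genotype_to_phenotype(genotype):
    # With the denoising variant, the whole trained autoencoder is the map:
    # an arbitrary genotype gets "snapped" toward the nearest region the
    # autoencoder saw during training.
    return denoiser.predict(genotype[np.newaxis, :])[0]

def mutate(genotype, sigma=0.05):
    # Mutation acts directly on the genotype; many small mutations may be
    # "silent" because the autoencoder keeps snapping to the same nearby peak.
    return np.clip(genotype + rng.normal(0.0, sigma, size=genotype.shape), 0.0, 1.0)

genotype = rng.random(PHENOTYPE_DIM).astype("float32")
phenotype = genotype_to_phenotype(genotype)
```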

🔗 Autoencoder Genotype-Phenotype Maps and Evolvability

Now that we’ve seen how autoencoders can be used as genotype-phenotype maps, let’s see why we might want to use autoencoders as genotype-phenotype maps. Specifically, let’s see what autoencoders can do for us in terms of evolvability. Recall the two properties we’re after with respect to evolvability:

  1. the amount of heritable phenotypic novelty introduced by genetic perturbation, and

  2. the viability of phenotypic outcomes under genetic perturbation.

🔗 Bottlenecked Autoencoder

In the following animation and frame-by-frame summary, a hypothetical evolutionary trajectory under the bottlenecked autoencoder genotype-phenotype map is traced out. We start with a particular genotype and then repeatedly mutate it. We can watch the phenotype change in the phenotype space in step with the corresponding change to the genotype in the genotype space.

Frame-by-frame hypothetical evolutionary trajectory with bottlenecked genotype-phenotype map.

Animated hypothetical evolutionary trajectory with bottlenecked genotype-phenotype map.

With the bottlenecked genotype-phenotype map, even sizable jumps in genotype space tend to yield phenotypes that are both novel and viable (they land somewhere along the high-fitness ridge). Hence, we attain good evolvability.

🔗 Denoising Autoencoder

The following animation and frame-by-frame summary also trace out a hypothetical evolutionary trajectory, this time with the denoising autoencoder genotype-phenotype map. We start with a particular genotype and then repeatedly mutate it. We can watch the genotype (hollow black circle) and the phenotype (the filled-in black circle) change. The genotype and phenotype are plotted together in this visualization because the genotype and phenotype spaces have the same dimensionality under the denoising autoencoder genotype-phenotype map.

Frame-by-frame hypothetical evolutionary trajectory with denoising genotype-phenotype map.

Animated hypothetical evolutionary trajectory with denoising genotype-phenotype map.

The pattern here looks something like the biological notion of canalization: many mutations are silent until they build up to have a large enough effect to shift the phenotype significantly. Canalization boosts evolvability. Many mutations are silent and therefore can accumulate via genetic drift because they are not selected away. Once enough mutations accumulate to be expressed, significant phenotypic novelty is observed. The tendency of the denoising autoencoder to snap genotypes to high-fitness regions of the phenotype space helps to promote the viability of phenotypic novelty when it is expressed. Importantly, this genotype-phenotype map allows escape from local fitness peaks in the phenotype space to explore other peaks, so evolution is less likely to get stuck and stagnate.

🔗 Thought Experiment: Evolving Police Composites

In order to drive home the implications of autoencoder genotype-phenotype maps for evolvability, let’s talk through a little thought experiment. Think back to the problem of police face reconstruction we discussed before. Suppose we’re trying to evolve a face that, as judged by the witness of a crime, maximally resembles the perpetrator. (Yes, this is a real thing people do [Frowd et al., 2004].) To accomplish this, we start out with a set of random genotypes that map to different phenotypes (images). The witness selects the images that most closely resemble the suspect’s face. Then, we mutate and recombine the best matches to make a new batch of images for the witness to consider. As we iterate through this process, hopefully we generate images that more and more closely resemble the suspect’s face.

Consider trying to evolve a facial composite using the direct genotype-phenotype map. Under this map, the intensity of each pixel of the image is directly encoded in the genotype. First of all, the randomly generated images wouldn’t look very much like faces at all — they’d look more like static. Supposing that we were actually able to eventually get to an image that vaguely resembles a face, then what? Is there a path of pixel-by-pixel changes leading to the suspect’s face along which every step more closely resembles the perpetrator? I’d argue we’d be likely to sooner or later get stuck at a dead end: an image that doesn’t resemble the perpetrator’s face, but where any pixel-by-pixel change makes it look even less like the perpetrator’s face (or less like a face at all).

Evolving the composite using the direct genotype-phenotype map probably won’t work well.

Would either of the following two ideas be helpful?

  1. Instead of having the genotype directly represent the image at the pixel level, encode genotypes analogously to a verbal description, then use a police artist who can draw a suspect from verbal descriptions to generate phenotypes.
  2. Use a direct encoding as before, but before running images by the witness, have a police artist look at the (likely noisy) image and re-draw it as the most similar image that actually looks like a face.

This is analogous to what the bottlenecked and denoising autoencoders, respectively, accomplish. I would argue that either of these ideas would be vastly superior to the direct genotype-phenotype map. As we’ll see shortly, in some ways, autoencoders can (and do) quite literally accomplish this task with faces.

(For those who are curious, software to evolve police composites uses an indirect genotype-phenotype map based on eigenfaces [Frowd et al., 2004].)

🔗 What’s Actually in the Paper?

We used two toy problems to experiment with autoencoder genotype-phenotype maps:

  1. the n-legged table problem, where the phenotypic fitness landscape is extremely rugged because only level tables (i.e., tables with identical leg lengths) have high fitness (see the toy sketch just after this list), and
  2. the Scrabble string problem, where the phenotypic fitness landscape is extremely rugged because only strings with characters that spell valid English words have high fitness.
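Here is a toy illustration of why the first landscape is so rugged under a direct encoding (an illustrative sketch, not the exact fitness function from the paper): almost any mutation to a single leg of a level table makes it wobble and tanks its fitness.

```python
def table_fitness(leg_lengths, tolerance=0.01):
    """Illustrative n-legged-table-style fitness: taller tables are better,
    but only if every leg is (nearly) the same length."""
    wobble = max(leg_lengths) - min(leg_lengths)
    if wobble > tolerance:
        return 0.0                                 # wobbly tables are worthless
    return sum(leg_lengths) / len(leg_lengths)     # otherwise, taller is better

print(table_fitness([1.0, 1.0, 1.0, 1.0]))   # level table: fitness 1.0
print(table_fitness([1.0, 1.2, 0.9, 1.0]))   # wobbly table: fitness 0.0
```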

We showed that bottlenecked and denoising autoencoder genotype-phenotype maps improve evolvability and demonstrated that, in some circumstances, better evolutionary outcomes can be attained using autoencoder genotype-phenotype mappings.

For more details, go get the PDF here.

🔗 What Next?

Our paper Learning an Evolvable Genotype-Phenotype Mapping is really just a proof-of-concept demonstration. We’re really excited to explore our AutoMap technique (including potential pitfalls) further, demonstrate it on challenging problems, and devise extensions.

Shameless plug: keep up with me :innocent: and developments with this research :rocket: as they happen on twitter @MorenoMatthewA :bird:.

🔗 Deep Learning: Not Just for Toy Problems

You might have already heard, but deep learning is pretty hot right now. It’s a powerful technique that performs well on hard problems. Remember the latent space interpolations that we looked at with the MNIST data set? Well, here are some latent space interpolations with faces instead of handwriting.

Autoencoder latent space interpolation with faces! Graphic from [White, 2016]

Animated GAN latent space interpolation with faces. Graphic from [Karras et al., 2017]

Cool, right? :stars: :stars: :stars: (Disclaimer: the animated latent space interpolation comes from a different, but related, deep learning technique called Generative Adversarial Networks).

My point here is that deep learning is a truly powerful technique. There’s a lot of gas in the can to do some really cool things with AutoMap. Plus, deep learning and autoencoders are an active area of research. As advances are made in the theory, software, and hardware behind autoencoders, AutoMap will continue to get more powerful.

One more challenging application we’re considering testing AutoMap on is the soft-bodied robot problem covered earlier. Can we learn a genotype-phenotype map that’s even more evolvable than HyperNEAT? Can we use a more evolvable genotype-phenotype map to evolve soft-bodied robots that are even faster walkers?

🔗 Fun Tangent: @smilevector

When I was poking around for cool graphics, I stumbled upon my new all-time favorite twitter account, @smilevector. It’s a bot run by Tom White that uses autoencoders to animate smiles onto still pictures.

Let the latent space put a smile (vector) on your face!

According to the account’s bio,

I take images of people and then add or remove smiles. Reposting images from my Following list. Bot created by @dribnet

Remember how we were talking about how the latent space was a magical, wonderful place? It turns out that you can find a vector in the latent space that represents the idea of a “smile.” To add a smile to an image, you just use the encoder portion of an autoencoder to map your image to the latent space, add the smile vector to that code in the latent space, and then use the decoder to translate the resulting code back to an image.
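In code, the trick takes only a few lines. Here is a hypothetical sketch, assuming a trained face `encoder` and `decoder` and a precomputed `smile_vector`:

```python
import numpy as np

def add_smile(image, encoder, decoder, smile_vector, strength=1.0):
    """Encode an image, nudge its latent code along the 'smile' direction,
    then decode back to pixel space."""
    code = encoder.predict(image[np.newaxis, ...])[0]
    smiling_code = code + strength * smile_vector
    return decoder.predict(smiling_code[np.newaxis, ...])[0]

# Following [White, 2016], the smile vector itself can be estimated as the
# difference between the mean latent code of smiling training images and the
# mean latent code of non-smiling ones:
# smile_vector = codes_smiling.mean(axis=0) - codes_not_smiling.mean(axis=0)
```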

:scream_cat:

🔗 Theoretical Implications

This work was, in part, inspired by recent efforts to understand evolvability in terms of learning theory [Kouvaris et al., 2017]. We hope that this work helps to strengthen an explicit connection between applied learning theory (i.e., machine learning) and evolvability.

It should be noted that the AutoMap autoencoder mechanism, taken literally, obviously does not make a completely clean analogy to biological evolution. Notably, in biological evolution the mechanisms that conceivably promote evolvability aren’t directly exposed to information about other divergent, independent lineages (although these mechanisms are conceivably exposed to information about a population’s evolutionary history).

🔗 Potential Drawbacks

Further work with AutoMap, of course, will also investigate potential pitfalls of the technique and, hopefully, test schemes to circumvent them. Reviewer #2 had some very insightful questions about potential pitfalls of AutoMap that echoed concerns my coauthors and I had been wondering about ourselves. Let’s talk through some of the points Reviewer #2 brought up… plus some of my own!

The approach requires that we already have evolved a set of good solutions that spans the entire fitness landscape, as that is what’s used as input. If we have these solutions available already, then why would we go on to create a new GP mapping? A better GP mapping appears particularly useful in cases where the existing mapping would not allow GAs to converge quickly to good solutions, but then we would not have the necessary input for the method available.

For some problem domains where the direct genotype-phenotype mapping performs very poorly, potential decent solutions to use as training data might be attained using a manually designed indirect encoding. We would then hope to find even better solutions by training an AutoMap genotype-phenotype map and performing further evolutionary search.

Where this is not possible, we might be able to “bootstrap” our way to a decent genotype-phenotype map through repeated cycles of evolution (to generate training data) and learning (to train the autoencoder genotype-phenotype map). It will be interesting to test the viability of this workaround.

For some problem domains (e.g., trying to evolve faces), a collection of known high-fitness phenotypes (e.g., face images) might exist a priori. So if we’re really lucky, we sometimes might not even have to use evolutionary methods to generate training data.

There appears to be a danger with this approach that huge parts of the fitness landscape will become inaccessible since the GP mapping only generates phenotypes with very specific characteristics (see the legged table example – all generated phenotypes consist only of balanced tables). This could be very dangerous in cases where the good solutions cannot be expected to all share certain simple features. How does one deal with that?

To a certain extent, this is an inherent problem with any indirect genotype-phenotype map [Clune et al., 2008]. Clune et al. proposed the HybrID technique as a potential solution to the restrictiveness of indirect genotype-phenotype mappings [Clune et al., 2009]. In HybrID, evolution starts off with an indirect genotype-phenotype map and then, following convergence, proceeds to a fine-tuning phase using a direct genotype-phenotype map. HybrID could certainly be used in this manner with AutoMap as the genotype-phenotype map. HybrID could also be used to generate new training data (e.g., high-fitness phenotypes with irregular optimizations) in order to try to teach the autoencoder genotype-phenotype map to map to phenotypes that might otherwise have been neglected.

In high-dimensional fitness landscapes, the number of input “good” solutions required to span the whole landscape could be prohibitively large.

Yes.

Generating training data through replicate evolutionary runs is arbitrarily parallelizable, so the question becomes one of available compute resources. The potential solution is likely along the lines of something spelled out with refrigerator poetry magnets in the kitchenette of the office suite my lab shares with ICER: supercomputing with parallel love clusters compute scalable science. (Also, for those interested, data standards are speed, help my slegehammer crashed, and youclickcoolcomputervirus.com.)

Thanks again, Reviewer #2, for your careful read-through and thoughtful comments. Now, here are a few of my own concerns with AutoMap.

Autoencoders work really well, but don’t work perfectly. Take a look at some recent results from denoising autoencoders in action on face images.

Denoiser in action reconstructing missing parts of faces. Left to right: original image, corrupted image, reconstructed image. Note the fuzziness of the reconstructed parts. Graphic from [Allen and Li, 2018].

Denoiser in action reconstructing missing parts of faces. Left to right: original image, corrupted image, reconstructed image. Note the artifacts induced on the image segments that weren’t missing. Graphic from [Allen and Li, 2018].

As Allen and Li point out, there seems to be a trade-off here between avoiding fuzziness in the reconstructed segment of the image and avoiding inducing artifacts elsewhere in the image. Perhaps there’s a methodological fix for this particular problem. However, it goes to show that autoencoders do have limits to their performance.

Will autoencoders work well enough in order to serve as genotype-phenotype maps for all evolutionary applications? Probably not. Will they work well enough for some interesting evolutionary applications? Hopefully.

How should we decide how big to make the bottleneck on bottlenecked autoencoders used as genotype-phenotype maps for different problem domains? The only answers I can think of here are good heuristic guessing and trial and error.

Finally, a somewhat more profound conundrum. Manually designing genotype-phenotype maps is heuristic, hard, and time-consuming. However, deep learning can often also be heuristic, hard, and time-consuming. Does AutoMap really make things any easier?

🔗 Let’s Chat

I would love to hear your thoughts on AutoMap, especially thoughts about potential applications!!

I started a twitter thread (right below) so we can chat :phone: :phone: :phone:

Pop on there and drop me a line :fishing_pole_and_fish: or make a comment :raising_hand_woman:

🔗 Acknowledgements

Thanks, of course, to my coauthors Charles Ofria and Wolfgang Banzhaf, one of whom also deserves a shout-out for serving as my PhD advisor.

I’d also like to acknowledge Professors America Chambers and Adam Smith for advising my initial theoretical and experimental work on evolvability, which led me to this methodological research.

This writeup benefited from feedback from members of my lab group. Thank you!

As a graduate student, I receive institutional support from the Department of Computer Science and Engineering and the NSF BEACON Center for the Study of Evolution in Action at Michigan State University.

🔗 References

Allen and Li. “Generative Adversarial Denoising Autoencoder for Face Completion.” CS 7476 Advanced Computer Vision Project @ Georgia Tech College of Computing (2018).

Cheney, Nick, et al. “Unshackling evolution: evolving soft robots with multiple materials and a powerful generative encoding.” Proceedings of the 15th annual conference on Genetic and evolutionary computation. ACM, 2013.

Clune, Jeff, Charles Ofria, and Robert T. Pennock. “How a generative encoding fares as problem-regularity decreases.” International Conference on Parallel Problem Solving from Nature. Springer, Berlin, Heidelberg, 2008.

Clune, Jeff, et al. “HybrID: A hybridization of indirect and direct encodings for evolutionary computation.” European Conference on Artificial Life. Springer, Berlin, Heidelberg, 2009.

Coyne, J. A. “Lack of response to selection for directional asymmetry in Drosophila melanogaster.” Journal of Heredity, 78:119 (1987).

Dertat, Arden. “Applied Deep Learning - Part 3: Autoencoders.” Towards Data Science @ Medium (2017).

Despois, Julien. “Latent space visualization — Deep Learning bits #2.” Hacker Noon @ Medium (2017).

Devesa, J., et al. “Multiple Effects of Growth Hormone in the Body: Is it Really the Hormone for Growth?” Clinical Medicine Insights: Endocrinology and Diabetes, 9:47–71 (2016).

Frowd, Charlie D., et al. “EvoFIT: A holistic, evolutionary facial imaging technique for creating composites.” ACM Transactions on applied perception (TAP) 1.1 (2004): 19-39.

Griffiths, A. et al. “Introduction to genetic analysis.” New York, NY: W.H. Freeman & Company (2015).

Hornby, Gregory, et al. “Automated antenna design with evolutionary algorithms.” Space 2006. 2006. 7242.

Karras, Tero, et al. “Progressive growing of GANs for improved quality, stability, and variation.” arXiv preprint arXiv:1710.10196 (2017).

Kouvaris, Kostas, et al. “How evolution learns to generalise: Using the principles of learning theory to understand the evolution of developmental organisation.” PLoS computational biology 13.4 (2017): e1005358.

Rimbault, M. et al. “Derived variants at six genes explain nearly half of size reduction in dog breeds.” Genome research, 23(12):1985–95 (2013).

Tuinstra, E., et al. “Lack of response to family selection for directional asymmetry in Drosophila melanogaster: left and right are not distinguished during development.” Proc. R. Soc. Lond. B, 241(1301):146–152 (1990).

White, Tom. “Sampling generative networks: Notes on a few effective techniques.” arXiv preprint arXiv:1609.04468 (2016).