Evolving Efficient Locomotive Strategies in Embodied Agents


© 2002 by Gene Ruebsamen




Introduction


The mammalian brain consists of a complex network of cortical microcircuits that allow us to learn from and interact with our environment. Its plasticity and learning capacity make it an alluring target to model in the development of robotic control systems.

 

    Figure 1.  Neuron Connections in the Brain


In our attempts to better understand how the brain works, we have developed many mathematical models that attempt to describe the functionality of these neurons as dynamical systems. One of the earliest models, the "perceptron," was developed by Rosenblatt (1958) from McCulloch & Pitts' work. Hodgkin & Huxley's (1952) research into squid neurons, and the dynamical equations that resulted from it, forms the basis for modern spiking neural networks. Initial computational models were based upon linear activation functions; these first models were unable to solve nonlinearly separable problems such as XOR. Further advances in the models themselves, based upon the rate hypothesis of neural encoding, led to the development of more advanced networks that make use of nonlinear activation functions, such as the Multi-Layer Perceptron (MLP). More recently, investigations have produced evidence that the precise millisecond timing of neuronal pulses can carry information (Maass, 1997). This research has led to the development of more advanced computational models that are often labeled Pulsed or Spiking Neural Networks.
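
To make the XOR point concrete, the sketch below (illustrative only, not from the thesis) hand-picks weights for a two-layer network with a nonlinear threshold activation. The hidden layer computes OR and AND of the inputs, and the output combines them into "OR but not AND", i.e. XOR, a mapping no single linear unit can realize because XOR's positive examples are not linearly separable:

    import numpy as np

    def step(x):
        # Heaviside threshold: the nonlinearity a single linear unit lacks
        return (x > 0).astype(float)

    def xor_mlp(x):
        # Hidden unit 0 computes OR(x1, x2); hidden unit 1 computes AND(x1, x2)
        W1 = np.array([[1.0, 1.0],
                       [1.0, 1.0]])
        b1 = np.array([-0.5, -1.5])
        h = step(W1 @ x + b1)
        # Output fires for "OR but not AND", i.e. XOR
        W2 = np.array([1.0, -2.0])
        return step(W2 @ h - 0.5)

    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            print(int(a), int(b), int(xor_mlp(np.array([a, b]))))  # XOR truth table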

 

In my research, I evolve intelligent locomotive strategies in a population of embodied virtual agents. The virtual environment utilizes a physics engine to model real-world dynamics. Previous work in this field includes the virtual creatures evolved by Karl Sims (1994). This brief paper provides an overview of the results obtained during the research phase of my thesis; for further detailed information, please refer to (Ruebsamen, 2002).



The Environment


The virtual environment employs constants that mirror the real-world environment. A physics engine governs the interactions among objects within the environment, and all rigid bodies are constrained by it. Russell Smith's excellent open source Open Dynamics Engine (ODE) was used in these simulations.


The environment currently consists of a virtual world with an infinite horizontal plane that acts as the ground. Any object within this environment is acted upon by a Newtonian model of physics: gravity exerts a continuous acceleration of approximately -9.8 m/s^2, and friction, inertia, and other physical forces are simulated so that the environment closely models the physical properties of our own world.
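
As a rough illustration of such a setup (a minimal sketch using ODE's Python bindings, PyODE, rather than the actual simulation code from the thesis), the following creates a world with gravity, an infinite ground plane, and one rigid body stepped forward in time:

    import ode

    world = ode.World()
    world.setGravity((0.0, -9.8, 0.0))   # continuous downward acceleration

    space = ode.Space()
    floor = ode.GeomPlane(space, (0, 1, 0), 0)  # infinite plane, normal +y

    # One rigid body, e.g. a sphere-shaped segment of an agent
    body = ode.Body(world)
    mass = ode.Mass()
    mass.setSphere(500.0, 0.2)           # density 500 kg/m^3, radius 0.2 m
    body.setMass(mass)
    body.setPosition((0.0, 2.0, 0.0))    # start 2 m above the ground

    dt = 0.01                            # 100 physics steps per simulated second
    for _ in range(100):
        # Contact joints (collision handling) are omitted for brevity,
        # so the body simply falls freely under gravity here.
        world.step(dt)
    print(body.getPosition())            # the body has accelerated downward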



Embodied Agents (Virtual Creatures)


    Figure 2.  An Embodied Agent (aka Virtual Creature)


The morphology of the embodied agent is completely predetermined and not subject to the forces of evolution. The 'brain' of the embodied agent is a complex recurrent neural network; several context layers provide a form of short-term "memory" to allow for time dependency. Recurrence in the network allows the agent to learn from both temporal and spatial patterns in the environment and is a prerequisite for locomotive behaviors to evolve. Table 1 shows that the creatures' genetic complexity ranges from a fairly simple creature such as the Crawler, with a chromosome length of only 266 genes, all the way to the complex Runner morphology, with a chromosome length of almost 1000 genes.


    

    Table 1.  Genetic Makeup of Embodied Agents
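
Purely as an illustration of why chromosome length tracks network size (the layer sizes below are hypothetical, not the values from Table 1), a direct encoding assigns one gene per connection weight and bias:

    import numpy as np

    def chromosome_length(n_in, n_hidden, n_out):
        # One gene per weight and bias in a single-hidden-layer recurrent net
        return (n_in * n_hidden        # input -> hidden weights
                + n_hidden * n_hidden  # context (previous hidden) -> hidden
                + n_hidden * n_out     # hidden -> output weights
                + n_hidden + n_out)    # biases

    def random_chromosome(n_in, n_hidden, n_out):
        rng = np.random.default_rng()
        n = chromosome_length(n_in, n_hidden, n_out)
        return rng.uniform(-1.0, 1.0, n)   # small random initial weights

    print(chromosome_length(4, 10, 4))     # 194 genes for a 4-10-4 network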


Each embodied agent is endowed with a recurrent neural network (the brain) that maintains several time-dependent context layers; the network is described in more detail in the next section. Inputs are fed into the agent's network from sensors located within the agent's body. These sensors can include touch, direction, smell, or any variety of sensors programmed to obtain information from the creature's environment. The outputs of the network are fed to effectors that directly power the agent's appendages to produce physical movement. This input/output cycle repeats itself hundreds of times per second. Since my goal is to produce intelligent, effective locomotive strategies, the weights of the recurrent artificial neural network (ANN) are evolved to decode the incoming sensory data and produce outputs that properly manipulate the motors, joints, and appendages of a particular morphology of embodied agent, generating efficient motions that are then rewarded.
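
Schematically, the cycle looks like the sketch below (a hypothetical rendering of the loop with stand-in classes; the real sensors and effectors are wired into the physics engine):

    import numpy as np

    class StubAgent:
        # Hypothetical stand-in for the embodied agent's body interface
        def read_sensors(self):
            return np.zeros(6)          # e.g. touch contacts, heading, smell
        def apply_effectors(self, out):
            pass                        # would set joint motor torques in ODE

    class StubNetwork:
        # Placeholder for the recurrent network described in the next section
        def forward(self, x):
            return np.tanh(x)

    def control_loop(agent, network, steps):
        # Sense -> think -> act, repeated hundreds of times per second
        for _ in range(steps):
            inputs = agent.read_sensors()       # sensory data from the body
            outputs = network.forward(inputs)   # decode sensors into commands
            agent.apply_effectors(outputs)      # physically move the appendages
            # (the physics world would be stepped here)

    control_loop(StubAgent(), StubNetwork(), steps=200)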


    Figure 3.  Typical Sensor Layout



Evolution & Learning


A Genetic Algorithm (GA) is employed to train the recurrent neural network that each embodied agent is endowed with. The network used is an Elman network (Elman, 1991), altered so that the network itself can self-modify the weights in its memory layer. Through experimentation, I determined that self-modifying networks performed significantly better than their non-self-modifying counterparts. This type of network differs from conventional two-layer networks in that the first layer has a recurrent connection with n context layers (i.e., memory). The delay in this connection allows values from previous time steps to be stored and used in the current time step. Because this type of network can store historical information for future reference, it is able to learn temporal patterns as well as spatial patterns. Agents with strong survival skills are rewarded by being able to reproduce, and thus their progeny tend to populate the next generation, while individuals with poor fitness tend to become extinct. This fierce Darwinian competition allows only the fittest locomotive strategies to survive into later generations.
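
A minimal sketch of the underlying Elman dynamics follows (one context layer and the standard update rule; the self-modifying weight mechanism from the thesis is deliberately omitted, and the layer sizes are illustrative):

    import numpy as np

    class ElmanNetwork:
        # Minimal Elman-style net: hidden activations are copied into a
        # context layer and fed back on the next time step, giving the
        # agent a short-term memory of its previous states.
        def __init__(self, n_in, n_hidden, n_out, seed=0):
            rng = np.random.default_rng(seed)
            self.W_in = rng.uniform(-1, 1, (n_hidden, n_in))
            self.W_ctx = rng.uniform(-1, 1, (n_hidden, n_hidden))  # recurrent
            self.W_out = rng.uniform(-1, 1, (n_out, n_hidden))
            self.b_h = np.zeros(n_hidden)
            self.b_o = np.zeros(n_out)
            self.context = np.zeros(n_hidden)       # the "memory" layer

        def forward(self, x):
            h = np.tanh(self.W_in @ x + self.W_ctx @ self.context + self.b_h)
            self.context = h.copy()                 # saved for the next step
            return np.tanh(self.W_out @ h + self.b_o)

    net = ElmanNetwork(n_in=6, n_hidden=10, n_out=4)
    out = net.forward(np.zeros(6))   # past inputs now influence future outputs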


    Figure 4.  Population of creatures evolving new walking behavior


Each agent has a fixed amount of time to perform its task. Upon expiration of this time period, the population of agents is evaluated and each agent is assigned a fitness score. An agent's chance of reproduction is proportional to the fitness of its learned locomotive strategy; this method of selection is known as roulette wheel selection. Modified crossover and mutation operators are further utilized to introduce diversity and allow for exploration of the fitness landscape.
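
Roulette wheel selection and the variation operators can be sketched as follows (plain one-point crossover and Gaussian mutation are shown here; the modified operators actually used in the thesis are not reproduced):

    import numpy as np

    rng = np.random.default_rng()

    def roulette_select(population, fitnesses):
        # Reproduction probability proportional to fitness
        # (fitness scores are assumed non-negative here)
        f = np.asarray(fitnesses, dtype=float)
        i = rng.choice(len(population), p=f / f.sum())
        return population[i]

    def crossover(a, b):
        # Plain one-point crossover on real-valued chromosomes
        point = rng.integers(1, len(a))
        return np.concatenate([a[:point], b[point:]])

    def mutate(chrom, rate=0.02, scale=0.1):
        # Perturb a small fraction of genes with Gaussian noise
        mask = rng.random(len(chrom)) < rate
        return chrom + mask * rng.normal(0.0, scale, len(chrom))

    def next_generation(population, fitnesses):
        return [mutate(crossover(roulette_select(population, fitnesses),
                                 roulette_select(population, fitnesses)))
                for _ in range(len(population))]

    pop = [rng.uniform(-1, 1, 266) for _ in range(50)]    # 266-gene Crawler genomes
    pop = next_generation(pop, fitnesses=rng.random(50))  # placeholder fitnesses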


It is important to note that only the recurrent neural network is evolved with the GA.  The physical morphology of the creature is not evolved during the simulation.



Results


Videos of Evolved Virtual Creatures:

    Crawler Morphology (1.8Mb)
    Long Arm Morphology (2.7Mb)
    Hopper Morphology (1.0Mb)
    Runner Morphology (3.1Mb)

The video of the Crawler morphology shows the results after the population has been evolved for 300 generations. By 300 generations, a smooth motor response has already evolved. Furthermore, it is quite interesting that the Crawler has evolved a method of locomotion that involves the alternating movement of its appendages.


The second video shows the evolved behaviors of the Long Arm morphology after 300 generations of evolution. Again, there is a smooth motor response; however, this time the virtual creatures have evolved a method of locomotion that involves the combined, simultaneous pulling effort of both of their arm-like appendages. This evolved behavior appears to be more efficient at maintaining a fixed course than alternating arm movements.


The third video demonstrates the evolved behaviors of the Hopper morphology after 300 generations of evolution. Given the Hopper's morphology, the only conceivable means of locomotion would probably have to involve some type of springing motion. What is really surprising about the results is that these creatures learned how to perform somersaults.


The most genetically complex morphology is demonstrated in the last video, again after 300 generations of evolution. These creatures utilize complex ball-and-socket joints to attach each appendage to the torso. Notice how their running behavior is almost life-like and fluid in form.


    Figure 5.  Unexpected Behaviors can Evolve


Finally, Figure 5 shows a population of Hopper creatures evolving unique locomotive strategies, such as the surprising mid-air somersaults. It also demonstrates how embodied agents evolve and compete as a population.


The most exciting feature of the framework is the possibility of modeling real-world machines (e.g., vehicles, robots, toys) within the virtual environment and evolving intelligent behaviors for the virtual models. The evolved intelligence may then be transferred back into the real-world machine, which may then demonstrate the same intelligent behaviors that were evolved in the virtual model. Such a framework can eliminate some of the complexity involved in designing advanced control systems for robotic devices.



The Future


Future researchers may consider implementing the following changes:



Bibliography