Learning to walk with evolutionary algorithms applied to a bio-mechanical model

Using real muscles to walk on a human-like model.

The code for this post can be found in this GitHub repository.

One of the 2017 NIPS challenges is «Learning to Run»: as the name suggests, the task is to design and develop a learning algorithm capable of controlling a bio-mechanical model of the human body and making it walk. The actuators, unlike in most robotic settings, are leg muscles, 9 for each leg. The authors of the challenge modified the OpenSim environment to adapt it to a reinforcement learning setting by adding a reward signal.

Something went terribly wrong (or terribly right).

Many participants designed end-to-end deep reinforcement learning algorithms, which in recent years have been shown to work remarkably well on continuous control tasks. However, they often need substantial computational power and time to learn a successful policy, and they work best when parallelized across multiple machines.

I decided to have a little fun with it and implement and extend a fairly lightweight method that I had recently developed for robotic manipulators: evolutionary algorithms applied to neurocontrollers. Their advantage is that they are derivative-free and highly parallelizable, and they can achieve results comparable to deep RL algorithms, as shown by OpenAI. The drawback is that they are almost completely stochastic, so it is hard to predict what and how they will learn. But I was happy to give them a shot.

First steps, baby steps.

…it is important not to exploit only the best performing model, since it could get stuck in a local minimum from which it is very hard to escape.

Evolutionary algorithms are used for numerical optimization problems: they optimize parameters with respect to a fitness function. What, then, are the parameters and the fitness function here? As is empirically evident, legs move in a periodic way, so the muscle activations can follow a periodic, sinusoidal time law; but such a law is difficult to hand-engineer. How can periodic functions be shaped? This is where Fourier series come in. With a Fourier series one can approximate virtually any periodic function using a weighted sum of sines and cosines, although in theory it needs infinitely many terms. I therefore designed a truncated Fourier series to shape the muscle activations, using only the first four terms of a cosine series. This gives 8 parameters per muscle: 4 weights that directly multiply the cosines at different frequencies and 4 phases that shift the signals. The result is a periodic function for each muscle. Since each leg has 9 muscles, I used 9 of these periodic functions, assuming that the other leg uses the same functions shifted in phase by 180°. The genetic algorithm then modifies these parameters (very few compared to any algorithm involving a neural network) to optimize the fitness function, which is simply the total reward, i.e. the distance the model walked before falling.
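To make the parameterization concrete, here is a minimal sketch of the truncated cosine series for one muscle. The base gait frequency, the clipping to [0, 1] (OpenSim muscle excitations are bounded), and all names are my illustrative assumptions, not taken from the original code:

```python
import numpy as np

N_TERMS = 4       # first four harmonics of the cosine series
BASE_FREQ = 1.0   # Hz, assumed cadence of the gait cycle

def muscle_activation(t, weights, phases, phase_offset=0.0):
    """Periodic activation of one muscle at time t.

    weights, phases: arrays of length N_TERMS (8 parameters per muscle).
    phase_offset: 0 for one leg, pi for the other (the 180 degree shift).
    """
    k = np.arange(1, N_TERMS + 1)  # harmonic indices 1..4
    signal = np.sum(weights * np.cos(2 * np.pi * BASE_FREQ * k * t
                                     + phases + phase_offset))
    # Muscle excitations must lie in [0, 1], so clip the truncated series.
    return float(np.clip(signal, 0.0, 1.0))

# One leg has 9 muscles -> 9 * 8 = 72 parameters, under 100 in total.
rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=(9, N_TERMS))
phases = rng.uniform(0.0, 2.0 * np.pi, size=(9, N_TERMS))

left = [muscle_activation(0.3, weights[m], phases[m]) for m in range(9)]
right = [muscle_activation(0.3, weights[m], phases[m], np.pi) for m in range(9)]
```

The same weights and phases drive both legs; only the half-period phase offset distinguishes them, which is what keeps the parameter count so low.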

Walking pattern after around a day of training.

The parameters are modified randomly by sampling from a white Gaussian; if a particular sampled direction improves the performance, the parameters are updated again in that direction, exploiting it until the performance no longer improves. I ran three models in parallel, because they were quite heavy in both computation and memory for a single laptop. The best performing weights are stored in a central parameters file, and after a certain number of episodes the learning restarts from the three best performing parameter sets. This allows a better exploration of different possible behaviours: it is important not to exploit only the best performing model, since it could get stuck in a local minimum from which it is very hard to escape, while a worse model can surpass it in the long run by adjusting its behaviour. Indeed, restarting from the best models after a series of episodes was a key factor in learning an appreciable walking pattern. It is quite remarkable that the model learned a movement behaviour that is human-like without having any prior knowledge of it.

The model achieves a successful walk for a few steps after relatively short training on a very slow simulation and an old laptop with only an Intel Core 2 Duo CPU. Deep RL models obviously achieve far better performance, but they need much longer training and powerful hardware. My goal was not really to compete with those, but to show how a walking pattern can be obtained with under 100 parameters, using a genetic algorithm that trains quickly even on a single old laptop.
