Approach

When you think about it, controlling a character in a physically simulated world isn't very different from controlling a legged robot in the real world. Unfortunately, legged robotic locomotion is tough. That's why many robots walk with very slow, careful movements, never lifting their feet high off the ground or straightening their knees completely. Legged robots are not only chaotic systems composed of many interconnected pieces – they're also under-actuated. This means we cannot fully control their trajectory through space, so sophisticated planning and coordination are generally needed.

Fortunately, game characters have an advantage over robots when it comes to solving these problems. Robots need to solve them in the real world using real hardware. Simulated characters don't have to worry about things like inaccurate sensors, underpowered actuators, maintenance, or damage. Most importantly, simulations can run faster than real time, which lets us effectively apply deep reinforcement learning (DRL).

DRL essentially works by training a control policy through trial-and-error experience. The policy is a neural network optimized to output decisions that maximize performance on a task, learning from those experiences. Reward values associated with the decisions let the policy learn which decisions were good and which were bad in the long run, and encourage it to repeat the good ones. This has led to breakthroughs on many difficult AI problems; most famously, DeepMind's AlphaGo used DRL to become the first Go AI capable of winning against professional players. It has also been driving a revolution in robotics.
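To make this trial-and-error loop concrete, here is a minimal, self-contained sketch of a policy-gradient (REINFORCE-style) update on a toy task. This is not DReCon's training code; the environment, linear policy, and hyperparameters are invented purely for illustration.

```python
# Minimal REINFORCE-style sketch (toy task, not DReCon's setup):
# a linear policy nudges a 1D point and is rewarded for staying near zero.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)   # policy parameters: action mean = theta @ [pos, 1]
sigma = 0.3           # fixed exploration noise
alpha = 0.02          # learning rate

def rollout(theta, steps=20):
    """Collect one trial-and-error episode: states, noisy decisions, rewards."""
    pos = rng.uniform(-1.0, 1.0)
    states, actions, rewards = [], [], []
    for _ in range(steps):
        s = np.array([pos, 1.0])
        a = theta @ s + sigma * rng.standard_normal()   # exploratory decision
        pos = float(np.clip(pos + a, -2.0, 2.0))
        states.append(s)
        actions.append(a)
        rewards.append(-abs(pos))                        # reward: stay near zero
    return states, actions, rewards

for _ in range(500):
    states, actions, rewards = rollout(theta)
    returns = np.cumsum(rewards[::-1])[::-1]             # "good in the long run"
    returns = returns - returns.mean()                   # simple baseline
    grad = np.zeros_like(theta)
    for s, a, G in zip(states, actions, returns):
        # Policy gradient: make decisions that led to high return more likely.
        grad += G * (a - theta @ s) / sigma**2 * s
    theta = theta + alpha * grad / len(states)
```

The same loop structure scales up to character control: roll out the policy in simulation, score each decision by its long-run reward, and nudge the network toward the decisions that scored well.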

DReCon trains characters to track randomly generated trajectories like this using DRL. These are gathered thousands of times a second.

Complex robotic tasks such as locomotion or object manipulation have usually relied on expensive optimization methods to achieve any level of success, which made application to real-time systems difficult. An unsatisfactory trade-off between runtime cost and task performance was almost unavoidable, but I would argue DRL has the capability to eliminate this trade-off in some applications. With DRL, expensive operations are performed as precomputation, yet they result in a controller with high performance and low runtime cost. Doing this precomputation requires extensive trial and error – really time consuming on real robots, much faster with simulated ones.

The user defines the short-term trajectory / heading they want for the character – an artificial "worst case" user is used during training

DReCon trains simulated characters using DRL to use their joint actuators (similar to muscles) to follow a user-controlled animated character. The controller automatically learns to correct any physical problems with the motion of the animated character, since the goal is to track the motion as closely as physically possible. This necessitates learning how to maintain balance and walk around while preserving the overall animation style. We get a high degree of responsiveness by training with an artificial user who varies gamepad input aggressively and randomly. The system has no choice but to learn a strategy that adapts to such a "worst case" user.
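As an illustration of that kind of artificial "worst case" user, here is a hypothetical input generator that aggressively randomizes gamepad commands. The hold times, speeds, and sampling ranges are assumptions for illustration, not values taken from DReCon.

```python
# Hypothetical "worst case" artificial user: holds a random gamepad command
# briefly, then snaps to a completely new one. All constants are assumed.
import random, math

def random_gamepad_stream(dt=1/60):
    """Yield (move_x, move_y, facing_angle) targets every frame, switching to
    a fresh random command after a short random hold time."""
    while True:
        hold = random.uniform(0.2, 2.0)             # hold a command briefly...
        angle = random.uniform(0.0, 2.0 * math.pi)  # ...then snap to a new direction
        speed = random.choice([0.0, 0.5, 1.0])      # idle, walk, or run stick magnitude
        facing = random.uniform(0.0, 2.0 * math.pi) # desired facing, decoupled from path
        for _ in range(int(hold / dt)):
            yield (speed * math.cos(angle), speed * math.sin(angle), facing)

# Usage: feed this stream to the animation system in place of a human player.
stream = random_gamepad_stream()
first_second = [next(stream) for _ in range(60)]
```

Feeding a stream like this to the animation system during training exposes the policy to abrupt direction and speed changes that a human player might never produce, which is what drives the responsiveness.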

Motion Matching continually generates animation through a search process

To generate animation we use Motion Matching, a data-driven technique that enables very realistic character animation. "For Honor" was the first game to use this method, and nowadays it's seeing more widespread adoption. Motion Matching replaces the traditional method of directly choosing animations using a finite state machine. Developers instead choose constraints they want respected. Then, a large dataset of motion capture is searched, and the pieces of animation that violate the constraints the least are continuously combined to generate the resulting character motion. This can be implemented as a nearest neighbor search in a high-dimensional space, where each dimension represents one of the constraints. Our implementation constrains these features: the character's path in the next few frames, facing direction along the path, animation style, and continuity of foot placements.

A visual example of Motion Matching searching animations from "For Honor"

Through this search process, Motion Matching generates very realistic-looking animation because large amounts of data are intelligently patched together specifically with the intent of achieving high-level motion objectives and preserving continuity. This allows complex but subtle locomotion behaviours from motion capture to be reproduced. Because of this, using DRL to train a policy which tracks the output of Motion Matching has an advantage over previous work such as DeepMimic, which tracked fixed animations and used rewards to achieve secondary goals. Planning movement (for example steering) is handled by Motion Matching, and as a result motion objectives can only be achieved by learning how humans achieve these objectives in the motion capture data – rather than by maximizing arbitrary goal-based rewards that will invariably compete with tracking.
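To show what the nearest neighbor search behind Motion Matching can look like, here is a simplified sketch. The feature layout, dimensions, and weights are illustrative assumptions rather than the exact features or implementation used in our system.

```python
# Simplified sketch of Motion Matching as a weighted nearest-neighbour search.
# The feature database would normally be built from motion capture (future path
# positions, facing directions, style tags, current foot/pose continuity).
import numpy as np

def motion_matching_step(db, query, weights):
    """Return the index of the database frame whose features best satisfy the
    current constraints (weighted squared-distance nearest neighbour)."""
    diff = (db - query) * weights               # per-dimension weighting of constraints
    cost = np.einsum("nd,nd->n", diff, diff)    # squared distance per frame
    return int(np.argmin(cost))                 # frame to blend toward next

# Toy usage with random data standing in for a mocap feature database.
rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 24))   # 10k frames, 24 features (assumed layout)
query = rng.standard_normal(24)          # built from gamepad input + current pose
weights = np.ones(24)                    # relative importance of each constraint
best_frame = motion_matching_step(db, query, weights)
```

In practice the query is rebuilt every few frames from the gamepad input and the character's current pose, and the winning frame is blended with the currently playing animation to preserve continuity.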

A simulated character can be controlled precisely (red) to follow a path (black)

Nonetheless, learning this way significantly increases the number of system states the controller must adapt to and makes training more complicated. To solve this, we assume the joint positions from the animation are reasonably good inputs to a simple open-loop control system, and only train the policy to make small corrections to them to maximize tracking performance. We also assume corrections should be temporally coherent, and employ a filtering scheme that forces the corrections to be smoothed out over time. In this way the learning is constrained towards more favorable control strategies which lie in the neighborhood of the open-loop controller, and cannot exploit noise in the simulation.
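As a rough sketch of that correction-and-filtering idea: the clipping range and the exponential filter below are assumptions for illustration, not the exact scheme used in DReCon.

```python
# Sketch: the policy only outputs small offsets on top of the animation's joint
# targets, and those offsets are low-pass filtered so they vary smoothly over time.
# The clip range and exponential smoothing are illustrative assumptions.
import numpy as np

class CorrectiveController:
    def __init__(self, num_joints, max_offset=0.1, smoothing=0.2):
        self.filtered = np.zeros(num_joints)   # smoothed correction state
        self.max_offset = max_offset           # keep corrections "small"
        self.smoothing = smoothing             # 0 = frozen, 1 = unfiltered

    def joint_targets(self, anim_pose, policy_offset):
        """Combine the animation's joint angles with a filtered policy correction."""
        offset = np.clip(policy_offset, -self.max_offset, self.max_offset)
        # Exponential moving average: forces corrections to be temporally coherent.
        self.filtered += self.smoothing * (offset - self.filtered)
        return anim_pose + self.filtered       # targets for the open-loop controller

# Usage: each simulation step, pass the current animation pose and the policy's
# raw output through the controller before driving the joints.
ctrl = CorrectiveController(num_joints=30)
targets = ctrl.joint_targets(anim_pose=np.zeros(30), policy_offset=np.full(30, 0.05))
```

Because the policy can only nudge the animation's joint targets by a small, smoothly varying amount, the learned controller stays close to the open-loop behaviour and cannot rely on high-frequency corrections that would only work by exploiting simulation noise.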