We know that compute-rich AI can beat humans at brainy games like Go, and can even react quickly enough to prevail at StarCraft. But Jenga? This is not a mathematics or pattern recognition problem. It’s a party game, an activity that stands, and falls, in that special arena where intuition and the highly nuanced “human touch” make all the difference, right?

Well, not anymore. MIT researchers just taught a robot to play the block tower game.

There’s a lot more to a friendly game of Jenga than meets the eye. Strategies are informed by a complex set of tactile and visual stimuli — by touching a block and observing the tower, we not only see but also feel our actions and their consequences. The MIT Jenga robot thus marks an important step in AI’s transition to the physical world.

So far, researchers have the Jenga robot playing only on its own, so sadly it has yet to devise any new or unusual techniques to foil opponents. It has, however, managed to learn the physics of the tower.

It is extremely difficult to combine tactile and visual stimuli into a real-time interactive physical system driven by data streams, and tactile reasoning remains relatively underdeveloped in robotic manipulation. Additionally, tactile stimuli can only be obtained via invasive interaction, so the act of gathering the data also changes the state being measured.

Researchers equipped the robot with a soft-pronged gripper, a force-sensing wrist cuff, and an external camera to enable it to “see and feel” the blocks and the tower. Instead of training a model on data from thousands of block-extraction experiments, researchers allowed the bot to learn the best real-world block extraction techniques by exploring physical interactions via trial and error.

Researchers made progress by grouping the results of the Jenga robot’s attempts into clusters representing different types of block behavior, which is similar to how humans learn when playing the game. One cluster of data, for example, could include attempts where a block is wedged too tightly to be moved, while another could represent scenarios where blocks are easier to move.
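The clustering idea can be illustrated with a minimal sketch. It is not the researchers’ method: it swaps in plain k-means, and the measurements (peak push force and block displacement), the `kmeans` and `nearest` helpers, and all of the numbers are hypothetical, made up purely for illustration.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

# Hypothetical push attempts: (peak force in newtons, block displacement in mm).
attempts = [
    (0.4, 9.5), (0.5, 10.2), (0.6, 8.8),   # block slides easily
    (3.8, 0.4), (4.1, 0.2), (4.4, 0.6),    # block is wedged
]

# Seed one centroid from each apparent group so the run is deterministic.
centroids, labels = kmeans(attempts, centroids=[attempts[0], attempts[3]])

# A new, equally hypothetical measurement is matched to the closest cluster.
new_attempt = (4.0, 0.3)
cluster = nearest(new_attempt, centroids)
```

In this toy setup, a fresh measurement taken during a push is assigned to whichever cluster it sits closest to, which is one simple way to picture how grouped past attempts could inform what the robot expects from the block it is touching now.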

By combining the data clusters with real-time tactile and visual measurements, the robot was able to generate and refine a concise model to predict block behaviors. Its performance came close to that of human players. “We saw how many blocks a human was able to extract before the tower fell, and the difference was not that much,” noted researcher Miquel Oller.

“There are many tasks that we do with our hands where the feeling of doing it ‘the right way’ comes in the language of forces and tactile cues,” says co-author Alberto Rodriguez. “For tasks like these, a similar approach to ours could figure it out.”

We probably won’t see a robotic Jenga champion anytime soon; the MIT team aims to apply the research to other domains such as separating recyclables from landfill trash, assembling consumer products, etc.

The paper “See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion” was published in the journal Science Robotics.