Manipulation By Feel

Guiding our fingers while typing, enabling us to nimbly strike a matchstick, and inserting a key in a keyhole all rely on our sense of touch. It has been shown that the sense of touch is very important for dexterous manipulation in humans. Similarly, for many robotic manipulation tasks, vision alone may not be sufficient – often, it may be difficult to resolve subtle details such as the exact position of an edge, shear forces or surface textures at points of contact, and robotic arms and fingers can block the line of sight between a camera and its quarry. Augmenting robots with this crucial sense, however, remains a challenging task.

Our goal is to provide a framework for learning how to perform tactile servoing, which means precisely relocating an object based on tactile information. To provide our robot with tactile feedback, we utilize a custom-built tactile sensor, based on similar principles as the GelSight sensor developed at MIT. The sensor is composed of a deformable, elastomer-based gel, backlit by three colored LEDs, and provides high-resolution RGB images of contact at the gel surface. Compared to other sensors, this tactile sensor sensor naturally provides geometric information in the form of rich visual information from which attributes such as force can be inferred. Previous work using similar sensors has leveraged the this kind of tactile sensor on tasks such as learning how to grasp, improving success rates when grasping a variety of objects.

Below is the real time raw sensor output as a marker cap is rolled along the gel surface:

Hardware Setup & Task Definition

For our experiments, we use a modified 3-axis CNC router with a tactile sensor mounted face-down on the end effector of the machine. The robot moves by changing the X, Y, and Z position of the sensor relative to its working stage, driving each axis with a separate stepper motor. Because of the precise control of these motors, our setup can achieve a resolution of roughly 0.04mm, helpful for careful movements in fine manipulation tasks.



The robot setup, prepared for the die rolling task is described below. The tactile sensor is mounted on the end effector at the top left of the image, facing downwards.

We demonstrate our method through three representative manipulation tasks:

Ball repositioning task: The robot moves a small metal ball bearing to a target location on the sensor surface. This task can be difficult because coarse control will often apply too much force on the ball bearing, causing it to slip and shoot away from the sensor with any movement. Analog stick deflection task: When playing video games, we often rely solely on our sense of touch to manipulate an analog stick on a game controller. This task is of particular interest because deflecting the analog stick often requires an intentional break and return of contact, creating a partial observability situation. Die rolling task: In this task, the robot rolls a 20-sided die from one face to another. In this task the risk of the object slipping out under the sensor is even greater, thus making the task the hardest of the three. An advantage of this task is that it additionally provides an intuitive success metric – when the robot has finished manipulation, is the correct number showing face up?



From left to right: The ball repositioning, analog stick, and die rolling tasks.

Each of these control tasks are specified in terms of goal images directly in tactile space; that is, the robot aims to manipulate the objects so that they produce a particular imprint upon the gel surface. These goal tactile images can be more informative and natural to specify than, say, a 3D-pose specification for an object or desired force vector.

Deep Tactile Model-Predictive Control

How can we utilize our high-dimensional sensory information to accomplish these control tasks? All three manipulation tasks can be solved using the same model-based reinforcement learning algorithm, which we call tactile model-predictive control (tactile MPC), built on top of visual foresight. It is important to note that we can use the same set of hyperparameters for each task, eliminating manual hyperparameter tuning.



A summary of deep tactile model predictive control.

The tactile MPC algorithm works by training an action-conditioned visual dynamics or video-prediction model on autonomously collected data. This model learns from raw sensory data, such as image pixels, and is able to directly make predictions of future images taking as input future hypothetical actions taken by the robot and starting tactile images we call context frames. No other information, such as the absolute position of the end effector, is specified.



Video-prediction model architecture.

In tactile MPC, as shown in the figure above, at test time, a large number of action sequences, 200 in this case, are sampled and the resulting hypothetical trajectories are predicted by the model. The trajectory which is predicted to most closely reach the goal is selected, and the first action in this sequence is taken in the real world by the robot. To allow for recovery in case of small errors in the model, trajectories the planning procedure is repeated at every step.

This control scheme has previously been applied and found success at enabling robots to lift and rearrange objects, even generalizing to previously unseen objects. If you’re interested in reading more about this, details are available in the paper.

To train the video-prediction model, we need to collect diverse data that will allow the robot to generalize to tactile states that it has not seen before. While we could sit at the keyboard and tell the robot how to move for every step of each trajectory, it would be much nicer if we could give the robot a general idea of how to collect the data, and allow it to do its thing while we catch up on homework or sleep. With a few simple reset mechanisms ensuring that things on the stage do not get out of hand over the course of data collection, we are able to collect data in a fully self-supervised manner, by collecting trajectories based on randomized action sequences. During these trajectories, the robot records tactile images from the sensor as well as the randomized actions it takes at each step. Each task required about 36 hours, in wall clock time, of data collection to train the respective predictive model, with no human supervision necessary.



Randomized data collection for the analog stick task (video sped up).

For each of the three tasks, we present representative examples of plans and rollouts:



Ball rolling task - The robot rolls the ball along the target trajectory.



Analog stick task - To reach the target goal image, the robot breaks and re-establishes contact with the object.



Die task - The robot rolls the die from the starting face labeled 20 (as seen in the prediction frames with red borders, which indicate context frames fed into the video-prediction model) to the one labeled 2.

As can be seen in these example rollouts, using the same framework and model settings, tactile MPC is able to perform a variety of manipulation tasks.

What’s Next?

We have shown a touch-based control method, tactile MPC, based on learning forward predictive models for high resolution tactile sensors, which is able to reposition objects based on user provided goals. The use of this combination of algorithms and sensors for control seems promising, and more difficult tasks may be within reach with the use of combined vision and touch sensing. However, our control horizon remains relatively short, in the tens of timesteps, which may not be sufficient for more complex manipulation tasks that we would hope to achieve in the future. In addition substantial improvements are needed on methods for specifying goals to enable more complex tasks such as general purpose object positioning or assembly.

This blog post is based on the following paper which will be presented at International Conference on Robotics and Automation 2019:

Manipulation by Feel: Touch-Based Control with Deep Predictive Models

Stephen Tian*, Frederik Ebert*, Dinesh Jayaraman, Mayur Mudigonda, Chelsea Finn, Roberto Calandra, Sergey Levine

Paper link, video link

We would like to thank Sergey Levine, Roberto Calandra, Mayur Mudigonda, and Chelsea Finn for their valuable feedback when preparing this blog post.