ClearGrasp was trained to identify transparent objects with the correct depth. The researchers say the process can not only detect transparent objects, but will also help with grasp planning by recovering “detailed non-planar geometries, which are critical for manipulation algorithms.” There are two steps:

Input colour image to deep convolutional network and perform surface normals estimation, occlusion boundaries detection, and transparent object segmentation; Replace the mask with initially input depth image, and feed the modified depth information along with surface and boundary information into global optimization.

The surface normals guide the reconstruction of depth into the shape, while the occlusion boundaries maintain the distinctions. Because researchers could not find a suitable existing dataset for training and testing the ClearGrasp algorithm, they made another key contribution to the research field by building a large-scale synthetic training dataset with more than 50,000 RGB-D images. They also created a real-world test benchmark with 286 RGB-D images of transparent objects and their ground truth geometries.

Researchers compared the ClearGrasp approach with DenseDepth, a SOTA monocular depth estimation method that uses a deep neural network to directly predict the depth value from colour images, with ClearGrasp significantly outperforming DenseDepth on ability to generalize to previously unseen object shapes.

Parallel-jaw grasping process

Putting their algorithm into action, the researchers incorporated ClearGrasp into a robotic picking system. The arm was tasked with identifying and grasping various transparent objects on a table within the robot’s workspace, using RGB-D camera images and 500 trial-and-error grasping attempts on a state-of-the-art grasping algorithm.



ClearGrasp showed its potential in practical pick-and-place applications, with researchers reporting a significant improvement in grasping success rate: from 64 percent to 86 percent for suction and from 12 percent to 72 percent for parallel-jaw grasping.

ClearGrasp failure cases, where errors in output depth (highlighted in red) are due to errors in occlusion boundary prediction (highlighted in black).

The researchers acknowledge ClearGrasp has its limitations and there is room for improvement for example in dealing with diverse lighting conditions and cluttered environments, but conclude the method successfully combines synthetic training data and multiple sensor modalities to infer accurate 3D geometry of transparent objects for manipulation.



The paper ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation is on arXiv. More information on the results can be found on the project page. The code, data, and benchmark have been released on GitHub.