Mask R-CNN (Region-based Convolutional Neural Network) has been the state-of-the-art model for object instance segmentation since it was proposed by Facebook research scientist Kaiming He and colleagues in 2017 and won Best Paper at ICCV the same year. Mask R-CNN uses a relatively simple method to achieve strong results in object detection, instance segmentation, and keypoint detection.

Mask Scoring R-CNN (MS R-CNN) is a new model proposed by a team from HUST (Huazhong University of Science & Technology) and Horizon Robotics Inc. that modifies Mask R-CNN to improve how instance segmentation masks are scored. The paper, Mask Scoring R-CNN, has been accepted by CVPR 2019 and reports new SOTA results, consistently outperforming Mask R-CNN on the COCO benchmark for instance segmentation.

“Scoring” is the core of the proposed method. The authors observe that previous methods, including Mask R-CNN, use the confidence of instance classification as the mask quality score, even though classification confidence and actual mask quality (measured as the IoU, Intersection-over-Union, between the predicted mask and its ground truth) are usually not well correlated.
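To make the mismatch concrete, the mask quality referred to here can be computed as a pixel-level IoU between a predicted binary mask and its ground truth. The snippet below is a minimal sketch, not the authors' code; the function name `mask_iou` and the toy masks are assumptions made for illustration.

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Pixel-level IoU between two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty; returning 0.0 is a convention chosen here.
        return 0.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection) / float(union)

# Toy 4x4 example (hypothetical data): the prediction covers the whole
# ground-truth square but also spills over into two extra pixels.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True      # 4 ground-truth pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 0:3] = True    # 6 predicted pixels

iou = mask_iou(pred, gt)  # intersection = 4, union = 6
```

A classifier could still assign this instance a confidence close to 1.0, because the object category is obvious, while the mask itself only reaches an IoU of about 0.67.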

The new method uses a network to learn the quality of each predicted instance mask via regression (the predicted quality is called the MaskIoU score) and then penalizes the instance mask score when the classification score is high but the actual mask quality is low.
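One simple way to apply such a penalty is to multiply the classification confidence by the predicted MaskIoU, so that a confident class label cannot hide a poor mask. The sketch below assumes that multiplicative form; the function `rescore_mask`, the dictionary keys, and the numbers are hypothetical and only illustrate the effect on ranking.

```python
def rescore_mask(cls_score, predicted_mask_iou):
    """Combine class confidence and predicted mask quality into one score.

    Assumes both inputs lie in [0, 1] and that the final mask score is
    their product (an assumption of this sketch).
    """
    if not (0.0 <= cls_score <= 1.0 and 0.0 <= predicted_mask_iou <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    return cls_score * predicted_mask_iou

# Two hypothetical detections of the same class.
detections = [
    {"name": "confident_but_sloppy", "cls_score": 0.98, "mask_iou": 0.55},
    {"name": "less_confident_but_tight", "cls_score": 0.90, "mask_iou": 0.92},
]

for det in detections:
    det["mask_score"] = rescore_mask(det["cls_score"], det["mask_iou"])

# Rank by the rescored value instead of the raw classification score.
ranked = sorted(detections, key=lambda d: d["mask_score"], reverse=True)
```

Ranked by classification score alone, the first detection would come out on top; after rescoring, the detection with the tighter mask ranks higher.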

The researchers evaluated the method on the COCO dataset. The metrics were AP (average precision averaged over IoU thresholds), AP at fixed IoU thresholds (AP@0.5 and AP@0.75), and AP at different object scales (APs, APm, and APl for small, medium, and large objects).
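For readers unfamiliar with the COCO conventions behind these metrics, the sketch below spells them out: the headline AP averages over ten IoU thresholds from 0.50 to 0.95, and the scale-specific metrics group objects by pixel area. The helper names are hypothetical, and the boundary handling is simplified compared with the official evaluation code.

```python
# The ten IoU thresholds used for the headline COCO AP: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def coco_size_bucket(area_px):
    """Assign an object to the small/medium/large bucket by pixel area."""
    if area_px < 32 ** 2:
        return "small"   # counted in APs
    if area_px < 96 ** 2:
        return "medium"  # counted in APm
    return "large"       # counted in APl

def mean_ap(ap_per_threshold):
    """Average per-threshold AP values into a single headline AP."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

AP@0.5 and AP@0.75 are simply the per-threshold values at 0.50 and 0.75, reported on their own rather than averaged.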

The results show that, across all backbone networks tested, MS R-CNN consistently outperforms Mask R-CNN by more than one percentage point of AP.

The Mask Scoring R-CNN research paper is on arXiv and the code has been open-sourced on GitHub.