Microsoft Research Asia and the University of Science and Technology of China have jointly released a new human pose estimation model that has set records on three COCO benchmarks. The neural network, “HRNet”, features a distinctive parallel structure that maintains high-resolution representations throughout the entire process.

The HRNet (High-Resolution Network) model has outperformed all existing methods on the Keypoint Detection, Multi-Person Pose Estimation and Pose Estimation tasks on the COCO dataset. The research paper has been accepted by CVPR 2019.

The research team designed a parallel structure to enable the model to connect multi-resolution subnetworks in a novel and effective way.

Most existing methods connect multi-resolution subnetworks in series, going either from high to low resolution or from low to high resolution.

HRNet’s network starts with a high-resolution subnetwork. Unlike existing networks, it does not rely on a single, low-to-high upsampling process to aggregate low-level and high-level representations, but instead conducts repeated multi-scale fusions throughout the process.
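The parallel layout can be pictured with a small Python sketch. It assumes, following the paper's description, that each new stage keeps every existing branch and adds one branch at half the resolution of the lowest one. The function name `branch_resolutions` and the example feature-map size are hypothetical and chosen only for illustration.

```python
def branch_resolutions(high_res, num_stages):
    """List the (height, width) of every parallel branch at each stage.

    Assumption: stage s runs s branches in parallel, and branch k has
    1 / 2**k of the high-resolution branch's height and width.
    """
    height, width = high_res
    stages = []
    for stage in range(1, num_stages + 1):
        stages.append([(height // 2**k, width // 2**k) for k in range(stage)])
    return stages


# Hypothetical example: a 64x48 high-resolution feature map and four stages.
for index, branches in enumerate(branch_resolutions((64, 48), 4), start=1):
    print(f"stage {index}: {branches}")
```

Note that the first entry of every stage is the same high-resolution branch, which is the point of the design: the high-resolution representation is never discarded and rebuilt, only enriched.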

The research team introduced “exchange units” that shuttle information across the parallel subnetworks, so that each one receives information from all the others. Repeating this exchange throughout the network is what produces rich high-resolution representations.
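A minimal sketch of one such exchange, in plain Python, may make the idea concrete. It is a simplification under stated assumptions, not the released implementation: feature maps are single-channel nested lists, 2x2 average pooling stands in for the learned strided convolutions, upsampling is plain nearest-neighbor repetition, and the helper names (`downsample`, `upsample`, `resize`, `exchange_unit`) are hypothetical.

```python
def downsample(grid):
    """Halve the resolution by averaging each 2x2 block.

    Assumption: this stands in for a learned strided convolution.
    """
    return [
        [(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]


def upsample(grid):
    """Double the resolution by nearest-neighbor repetition."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def resize(grid, src_level, dst_level):
    """Bring a map from branch src_level to branch dst_level.

    Branch k is assumed to hold 1 / 2**k of the full resolution.
    """
    while src_level < dst_level:
        grid = downsample(grid)
        src_level += 1
    while src_level > dst_level:
        grid = upsample(grid)
        src_level -= 1
    return grid


def exchange_unit(branches):
    """Fuse all branches: each output is the sum of every input, resized to match."""
    fused = []
    for dst in range(len(branches)):
        resized = [resize(b, src, dst) for src, b in enumerate(branches)]
        rows, cols = len(resized[0]), len(resized[0][0])
        fused.append([[sum(m[r][c] for m in resized) for c in range(cols)]
                      for r in range(rows)])
    return fused


# Two branches: a 4x4 high-resolution map of 1.0s and a 2x2 map of 2.0s.
high = [[1.0] * 4 for _ in range(4)]
low = [[2.0] * 2 for _ in range(2)]
fused = exchange_unit([high, low])
print(fused[0][0])  # first row of the fused high-resolution map: [3.0, 3.0, 3.0, 3.0]
print(fused[1][0])  # first row of the fused low-resolution map: [3.0, 3.0]
```

Every output branch keeps its own resolution while receiving a contribution from every other branch, so applying the unit repeatedly lets the high-resolution branch accumulate context from the coarser ones without ever being downsampled itself.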

The researchers compared HRNet with existing methods on Keypoint Detection using the COCO val2017 validation set. Both HRNet-W48 (the larger model) and HRNet-W32 (the smaller model), pretrained on the ImageNet classification task, broke the COCO record. On the COCO test-dev set, both HRNet-W48 and HRNet-W32 also surpassed existing methods on the pose estimation and multi-person pose estimation tasks. On other datasets, HRNet outperformed all rivals on the MPII validation set, PoseTrack, and the ImageNet validation set.

HRNet has been open-sourced. In addition to pose estimation, the new method could also be applied in semantic segmentation, face alignment, object detection, image translation and other areas.

The paper Deep High-Resolution Representation Learning for Human Pose Estimation is on arXiv.