A collaborative research group from Google, Stanford, and Johns Hopkins has proposed “Auto-DeepLab,” a new method that uses hierarchical Neural Architecture Search (NAS) for semantic image segmentation. The project team includes top AI researchers such as Stanford Vision Lab Director Fei-Fei Li and Alan Yuille, Director of the UCLA Center for Cognition, Vision, and Learning.

Semantic image segmentation is an important computer vision task that assigns a semantic label to every pixel in an image. Neural Architecture Search is a key AutoML technique that has already been used successfully for image classification, and the team explored ways to extend NAS to dense image prediction problems. Existing methods usually focus on searching the cell structure while hand-designing the outer network structure. The researchers instead proposed searching the network-level structure in addition to the cell-level structure, since many more architectural variations for dense image prediction can be found at the network level.
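To see why searching the network level matters, consider how quickly the number of candidate outer structures grows. The sketch below is illustrative only (the resolution set, layer count, and transition rule are simplifications of the paper's trellis, and the function names are hypothetical): at each layer the spatial downsample ratio may stay the same, halve, or double, and counting the resulting resolution paths already yields tens of thousands of network-level architectures before any cell-level choices are made.

```python
# Illustrative sketch of a network-level search space in the spirit of
# Auto-DeepLab's trellis. Names and details are assumptions, not the
# paper's code. Cell-level choices (which ops fill each cell) would
# multiply this count further.

# Allowed downsample ratios relative to the input resolution.
RESOLUTIONS = [4, 8, 16, 32]

def valid_transitions(ratio):
    """From a given downsample ratio, the next layer may keep the
    resolution, halve it (double the ratio), or double it (halve the
    ratio), staying within the allowed set."""
    candidates = {ratio, ratio * 2, ratio // 2}
    return sorted(r for r in candidates if r in RESOLUTIONS)

def count_network_paths(layers):
    """Count distinct resolution paths through a `layers`-deep trellis,
    starting at downsample ratio 4."""
    paths = {4: 1}
    for _ in range(layers - 1):
        nxt = {r: 0 for r in RESOLUTIONS}
        for r, n in paths.items():
            for r2 in valid_transitions(r):
                nxt[r2] += n
        paths = {r: n for r, n in nxt.items() if n}
    return sum(paths.values())

# A 12-layer trellis alone admits tens of thousands of resolution paths,
# which is why a hand-designed outer structure explores so little of the space.
print(count_network_paths(12))  # → 28657
```

Each path corresponds to one hand-designable encoder layout (e.g. a steadily downsampling backbone is a single path through this trellis), which illustrates how much of the space a fixed outer design leaves unexplored.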

The researchers also developed “a differentiable formulation that allows efficient gradient-based architecture search over two-level hierarchical search space,” which requires only three days on a single P100 GPU, making it roughly 1000x faster than DPC, the previous state-of-the-art model (see Table 1).
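The key idea behind such a differentiable formulation, in gradient-based NAS methods of this family, is a continuous relaxation: rather than committing to one operation per edge, the search optimizes a softmax-weighted mixture of candidate operations, so architecture parameters can be updated by ordinary gradient descent. The toy sketch below shows the mechanism only; the scalar "operations", loss, and finite-difference gradients are stand-ins and not the paper's actual setup.

```python
import math

# Toy sketch of the continuous relaxation used by gradient-based NAS,
# which Auto-DeepLab extends to its two-level search space. Everything
# here is a simplified stand-in, not the paper's implementation.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Three candidate "operations" on a scalar feature, standing in for
# choices like 3x3 conv, 5x5 conv, and skip-connect.
OPS = [lambda x: 0.5 * x, lambda x: 2.0 * x, lambda x: x]

def mixed_op(x, logits):
    """Instead of picking one op, output the softmax-weighted sum of all
    candidates, making the architecture choice differentiable in the logits."""
    w = softmax(logits)
    return sum(wi * op(x) for wi, op in zip(w, OPS))

def loss(logits, x=1.0, target=2.0):
    return (mixed_op(x, logits) - target) ** 2

def grad_step(logits, lr=0.5, eps=1e-5):
    """One finite-difference gradient step on the architecture parameters."""
    grads = []
    for i in range(len(logits)):
        bumped = list(logits)
        bumped[i] += eps
        grads.append((loss(bumped) - loss(logits)) / eps)
    return [a - lr * g for a, g in zip(logits, grads)]

logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = grad_step(logits)

# After the search, the op with the largest logit is selected as the
# discrete architecture; here the "2.0 * x" op best matches the target.
best = max(range(3), key=lambda i: logits[i])
print(best)  # → 1
```

In the real method the same trick is applied at both levels of the hierarchy, with separate continuous weights for the cell-level operations and the network-level resolution transitions, which is what makes the whole two-level search trainable end to end with gradients.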

Evaluations were performed on three datasets (Cityscapes, PASCAL VOC 2012, and ADE20K) to compare the work with state-of-the-art architectures.

As the detailed evaluation results in Table 4 through Table 7 show, the new Auto-DeepLab method clearly outperforms previous SOTA architectures when there is no pretraining. It also performs comparably to top ImageNet-pretrained models, and even outperforms some of them.

The paper Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation is on arXiv.