Even as fashion image analysis gains traction among image recognition researchers, understanding fashion images remains challenging for real-world applications because of large deformations, occlusions, and discrepancies in clothing across domains and between consumer and commercial images.

DeepFashion is a large-scale clothes database introduced in 2016 by a research team from the Chinese University of Hong Kong (CUHK). The dataset contains over 800K diverse fashion images annotated with 50 categories, 1,000 descriptive attributes, bounding boxes and clothing landmarks.

DeepFashion was a solid foundation, but it left a number of areas for improvement. It was limited to a single clothing item per image, offered only sparse landmarks (4 to 8), and had no per-pixel masks. CUHK researchers recently teamed up with Chinese AI giant SenseTime to develop a greatly improved iteration, DeepFashion2, a large-scale benchmark with comprehensive tasks and annotations for fashion image understanding.

DeepFashion2 contains 491K images of 13 popular clothing categories. A full spectrum of tasks is defined, including clothes detection and recognition, landmark and pose estimation, segmentation, and verification and retrieval. All these tasks are supported by rich annotations.

The dataset also includes a total of 801K clothing items across these images. Each item is labeled with scale, occlusion, zooming, viewpoint, bounding box, dense landmarks, and per-pixel mask. These items are grouped into 43.8K clothing identities, where a clothing identity represents a class of apparel with nearly identical cuts, patterns, and designs. Images of the same clothing identity come from both buyers and sellers, and an item from a buyer and an item from a seller form a pair.
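To make the pairing idea concrete, here is a minimal Python sketch of how buyer and seller items that share a clothing identity could be matched into pairs. The `ClothingItem` class, its field names, and the `"buyer"`/`"seller"` labels are illustrative assumptions, not the dataset's actual annotation format.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ClothingItem:
    # All field names here are hypothetical, chosen only for illustration.
    image_id: str
    identity: int   # clothing-identity label shared by near-identical garments
    source: str     # "buyer" or "seller"


def build_pairs(items):
    """Return every (buyer item, seller item) pair sharing a clothing identity."""
    by_identity = defaultdict(lambda: {"buyer": [], "seller": []})
    for item in items:
        by_identity[item.identity][item.source].append(item)

    pairs = []
    for group in by_identity.values():
        # Cross every buyer item with every seller item of the same identity.
        pairs.extend(product(group["buyer"], group["seller"]))
    return pairs


items = [
    ClothingItem("img_001", identity=1, source="buyer"),
    ClothingItem("img_002", identity=1, source="seller"),
    ClothingItem("img_003", identity=1, source="seller"),
    ClothingItem("img_004", identity=2, source="buyer"),  # no seller match
]
pairs = build_pairs(items)
```

In this toy example, identity 1 yields two pairs, while identity 2 yields none because it has no seller-side image.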

Researchers say the work makes three main contributions:

First, DeepFashion2 is far more richly annotated than other clothes datasets, with at least 3.5× the annotations of DeepFashion, 6.7× those of ModaNet, and 8× those of FashionAI. Second, a full spectrum of tasks is carefully defined on the proposed dataset. Third, the researchers extensively evaluated Mask R-CNN on DeepFashion2 and proposed a novel Match R-CNN, which aggregates the features learned from clothes categories, poses, and masks to solve clothing image retrieval in an end-to-end manner.

The research team believes the rich data and labels of DeepFashion2 will accelerate the development of future algorithms to understand fashion images. The paper DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images is on arXiv.