Researchers from the University of Cambridge have recently presented a method for facial landmark localization not only on humans but on sheep as well. By introducing a new feature extraction scheme, they achieved very competitive performance compared with other methods. The new scheme, called the triplet-interpolated feature, is used at each iteration of the cascaded pose regression (CPR) framework. CPR progressively refines a loosely specified initial guess, with each refinement carried out by a different regressor that performs simple image measurements dependent on the output of the previous regressors. Given an estimated shape, this model can extract features from similar semantic locations even under large head pose variations and when the facial landmarks are sparsely distributed. The proposed scheme was evaluated on human and sheep facial landmark localization. Face alignment, i.e. locating semantic facial landmarks such as eyes, nose, mouth and chin, is essential for tasks like face recognition, tracking, animation and 3D modeling. Two types of source information are usually used: facial appearance and shape information.
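The cascade described above can be sketched in a few lines. This is a minimal illustration of the CPR loop only, not the paper's implementation: the stage regressors and the shape-indexed feature extractor are placeholders for whatever trained models are plugged in.

```python
import numpy as np

def cascaded_pose_regression(image, init_shape, regressors, extract_features):
    """Minimal CPR sketch: each stage refines the current shape estimate.

    `regressors` is a list of trained stage functions mapping shape-indexed
    features to an additive shape update; `extract_features` performs simple
    image measurements relative to the current shape. Both are hypothetical
    stand-ins for the trained components.
    """
    shape = init_shape.copy()                   # (n_landmarks, 2) initial guess
    for regressor in regressors:
        feats = extract_features(image, shape)  # features indexed by current shape
        shape = shape + regressor(feats)        # each stage nudges the estimate
    return shape
```

Each stage sees features re-indexed by the previous stage's output, which is what makes the measurements depend on the output of the previous regressors.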

Animal and human faces convey varied information, such as emotions, expressions, identity and health conditions. Beyond humans, this matters for animal welfare as well. Special emphasis here was given to sheep, which have a narrower range of facial expressions than many other animals. However, specific postures have been linked to emotional experiences, much as tail wagging is usually associated with happiness in dogs; backward ear posture in sheep and other animals, for example, is associated with unpleasant situations such as fear. This is important for detecting sheep in pain: animal welfare researchers can recognize pain-related patterns such as orbital tightening, abnormal ear position, and changes in nostril and philtrum shape. Automatically identifying those features therefore requires localizing the corresponding landmarks, similar to human face alignment.

Several obstacles arise when state-of-the-art algorithms are applied to human and animal facial landmark localization: in practice only a small number of facial landmarks is available (unlike benchmark datasets, where a large number is annotated), and head pose varies widely.

The researchers propose a new feature extraction scheme for cascaded pose regression, called the triplet-interpolated feature (TIF), which uses three anchor landmarks to calculate the shape-indexed feature and is therefore more robust to large head pose variations and shape deformations. They also used a training-sample augmentation scheme: minority training samples were augmented with more random initializations, and majority samples with fewer. In the cascaded shape regression framework, data augmentation is usually carried out at training time. TIF works like this: from every group of three randomly selected landmarks, one is chosen as the primary point. The two vectors from the primary point to the other two landmarks span the plane through linear combination, and by setting the combination's coefficients, a sampling position within the spanned area is selected.
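The interpolation step can be written down directly. The sketch below assumes the sampling point is the linear combination described above, with the primary landmark as origin; the specific triplet indices and coefficients are illustrative assumptions, not values from the paper.

```python
import numpy as np

def tif_point(landmarks, triplet, alpha, beta):
    """Triplet-interpolated sampling location (sketch of the idea).

    `triplet` gives indices (primary, a, b) into `landmarks`. The point is
        primary + alpha * (a - primary) + beta * (b - primary),
    so it moves consistently with the estimated shape as the head pose or
    shape deforms. `alpha` and `beta` are the linear-combination coefficients.
    """
    p, a, b = (landmarks[i] for i in triplet)
    return p + alpha * (a - p) + beta * (b - p)
```

Because the point is defined relative to three landmarks of the current shape estimate, image measurements taken there stay anchored to similar semantic locations across large pose variations.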

For the human experiments, the datasets used were AFW, HELEN, LFPW and the newly annotated iBug set. The combined data was partitioned into 3148 training images and 689 test images. A comparison with two other methods was made. The first is Robust Cascaded Pose Regression (RCPR), which reduces exposure to outliers by detecting occlusions explicitly and using robust shape-indexed features. The second is Explicit Shape Regression (ESR), which generates a face shape by repeatedly refining an initial guess via a series of cascaded regression functions.

The sheep face images in the collected dataset exhibited wide diversity in, for example, breed, facial colour, lighting conditions, background, occlusion, head pose and ear posture. Around 90% of the sheep images were localized with a mean error of less than 10% of the face size. Although the model performed well on this dataset, the researchers localized only the landmarks of interest; identifying sheep in pain still poses many open problems, and it is a new and unusual challenge for computer vision as well as for animal behavior and welfare studies.
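The "mean error less than 10% of the face size" figure corresponds to a normalized error metric of the kind commonly used in face alignment. The sketch below assumes normalization by a face-size scalar (e.g. bounding-box size); the paper's exact normalization convention is not reproduced here.

```python
import numpy as np

def normalized_mean_error(pred, gt, face_size):
    """Mean per-landmark Euclidean error divided by face size.

    `pred` and `gt` are (n_landmarks, 2) arrays; `face_size` is an assumed
    scalar normalizer such as the face bounding-box size.
    """
    dists = np.linalg.norm(pred - gt, axis=1)   # error per landmark, in pixels
    return dists.mean() / face_size

def success_rate(errors, threshold=0.10):
    """Fraction of images whose normalized error falls below the threshold."""
    errors = np.asarray(errors)
    return (errors < threshold).mean()
```

With this metric, the reported result reads as `success_rate(errors, 0.10)` being around 0.9 over the sheep test images.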
