Step 3: Adding Camera/ImagePicker functionality to your app

Next, you’ll need to add either camera or image picker functionality to your app so that you have an image to work with. If you’re not sure how to do this, you can check out the following blog post on adding camera functionality to your app with CameraX:

And the following library for the image picker:

You can then create a basic layout for your app containing an ImageView and a TextView to display the picked image and the output results respectively.

Step 4: Initialize the pose estimation predictor

Once we have the UI of our app in place, the next step is to initialize the pose estimation predictor. We can do it as follows:

Note: Be sure to replace “YOUR_API_KEY” with the actual API key you receive from your Fritz project dashboard.

We used the accurate version of the model—you can select either a fast or a small model as well, depending on your use case. You can find more options to customize your pose model here:

Step 5: Creating a FritzVisionImage from the obtained bitmap

The predictor can only perform inference on an object of type FritzVisionImage, so our next step is to convert the Bitmap obtained from our image picker or camera into a FritzVisionImage:

We chose our Bitmap to be mutable since we’ll later be drawing the pose and keypoints (dots) as shown in the screenshot above.

Step 6: Performing inference on the resulting image

Finally, we’ll be running our image through the predictor we created in step 4, which will give us the detected poses in the image. The code for doing this is pretty straightforward, and it looks like this:

Earlier in this post, I talked about how this API gives us 17 keypoints, their names, and their probability scores. Our job now is to extract them from every pose that’s detected.

Calling the keypoints() method on the pose obtained in the forEach loop above gives us an array of KeyPoint objects, and each KeyPoint object has the following properties:

// a unique id of the keypoint
private int id;
// position of the keypoint
private PointF position;
// score of the point (ranges from 0 to 1)
private float score;
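To make these properties a bit more concrete, here is a minimal sketch of how you might keep only the keypoints the model is reasonably confident about. It doesn't use the Fritz SDK at all: SketchKeyPoint is a hypothetical stand-in that mirrors the three fields listed above (with the position split into plain x and y floats so it runs outside Android), and the 0.5 threshold is just an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the SDK's KeyPoint class, so this sketch
// compiles and runs without Android or the Fritz SDK.
final class SketchKeyPoint {
    final int id;      // unique id of the keypoint
    final float x;     // x position in the image
    final float y;     // y position in the image
    final float score; // confidence score, from 0 to 1

    SketchKeyPoint(int id, float x, float y, float score) {
        this.id = id;
        this.x = x;
        this.y = y;
        this.score = score;
    }
}

final class KeyPointFilter {
    // Returns the keypoints whose score is at least minScore, in their original order.
    static List<SketchKeyPoint> confident(List<SketchKeyPoint> keypoints, float minScore) {
        List<SketchKeyPoint> kept = new ArrayList<>();
        for (SketchKeyPoint kp : keypoints) {
            if (kp.score >= minScore) {
                kept.add(kp);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SketchKeyPoint> pose = Arrays.asList(
                new SketchKeyPoint(0, 120f, 80f, 0.95f),
                new SketchKeyPoint(1, 110f, 70f, 0.40f),
                new SketchKeyPoint(2, 130f, 70f, 0.88f));

        // Print only the keypoints that clear the (assumed) 0.5 threshold.
        for (SketchKeyPoint kp : confident(pose, 0.5f)) {
            System.out.println(kp.id + " @ (" + kp.x + ", " + kp.y + ") score=" + kp.score);
        }
    }
}
```

In a real app you would read the same three values from each KeyPoint the SDK hands back, and tune the threshold to how noisy your images are.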

To recap, these keypoints are:

nose (available at the first position of the array)

left eye (available at the second position of the array)

right eye (available at the third position of the array)

left ear (available at the fourth position of the array)

and so on …
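If you'd rather not remember array positions, one option is a small lookup from position to body part. The sketch below is only an illustration: the BodyPart enum and fromIndex are hypothetical names, and only the first four entries come from the list above. The remaining thirteen follow the 17-keypoint ordering commonly used by PoseNet-style models, which is an assumption you should check against the SDK before relying on it.

```java
// Assumed ordering: the common 17-keypoint PoseNet-style layout.
// Only NOSE, LEFT_EYE, RIGHT_EYE and LEFT_EAR are confirmed by this post;
// verify the rest against the SDK you are using.
enum BodyPart {
    NOSE, LEFT_EYE, RIGHT_EYE, LEFT_EAR, RIGHT_EAR,
    LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_ELBOW, RIGHT_ELBOW,
    LEFT_WRIST, RIGHT_WRIST, LEFT_HIP, RIGHT_HIP,
    LEFT_KNEE, RIGHT_KNEE, LEFT_ANKLE, RIGHT_ANKLE;

    // Maps a zero-based array position to its body part.
    static BodyPart fromIndex(int index) {
        BodyPart[] parts = values();
        if (index < 0 || index >= parts.length) {
            throw new IllegalArgumentException("No keypoint at index " + index);
        }
        return parts[index];
    }
}

final class BodyPartDemo {
    public static void main(String[] args) {
        // Print the four positions described in the list above.
        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + BodyPart.fromIndex(i));
        }
    }
}
```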

To get just the keypoints we want (since we’re working with a cropped face, we only need the eyes, ears, and shoulders), we can modify our loop as follows:
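As a rough, SDK-free illustration of that selection step, the sketch below picks keypoints out of a pose by array position. DetectedPoint and UpperBodySelector are hypothetical stand-ins rather than Fritz classes, and the choice of positions 1 through 6 for the eyes, ears, and shoulders assumes the usual PoseNet-style ordering (the post itself only confirms positions for the nose, eyes, and left ear).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one detected keypoint (not the Fritz SDK class).
final class DetectedPoint {
    final float x;
    final float y;
    final float score;

    DetectedPoint(float x, float y, float score) {
        this.x = x;
        this.y = y;
        this.score = score;
    }
}

final class UpperBodySelector {
    // Array positions of the eyes (1, 2), ears (3, 4) and shoulders (5, 6).
    // Assumption: PoseNet-style ordering; check this against the SDK.
    static final Set<Integer> WANTED = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));

    // Returns the keypoints at the wanted positions, in array order.
    static List<DetectedPoint> select(DetectedPoint[] keypoints) {
        List<DetectedPoint> selected = new ArrayList<>();
        for (int i = 0; i < keypoints.length; i++) {
            if (WANTED.contains(i)) {
                selected.add(keypoints[i]);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Build a fake 17-keypoint pose so the sketch has something to select from.
        DetectedPoint[] pose = new DetectedPoint[17];
        for (int i = 0; i < pose.length; i++) {
            pose[i] = new DetectedPoint(i * 10f, i * 5f, 0.9f);
        }
        System.out.println(select(pose).size() + " keypoints selected");
    }
}
```

Selecting by array position keeps the sketch tied to the ordering described earlier; in the real loop you would run the same check on each pose's keypoints before drawing them.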

This is what the outcome looks like: