TensorFlow is everywhere. It has expanded from servers to mobile devices, and is now embedded in everyday objects like cameras. At a recent 360 camera developer gathering in San Jose, I was surprised that the top three projects created in a mini-hackathon all used TensorFlow running inside a 360 camera for object recognition and classification.

Based on discussions with the hackathon participants, I've summarized the top reasons they are running TensorFlow inside a camera. Although the event focused on the RICOH THETA camera, roughly the same requirements apply to any 360 camera, including consumer 360 cameras from GoPro, Insta360, and Samsung.

1: Terabytes of Data Need to Be Processed

A common way to process image data is to take a video of a scene, upload the video to the cloud, and then process it there as individual frames. For example, if you upload spherical video to Google Maps with the Google Street View app, a single video file upload can put hundreds of Street View images onto Google Maps.

To illustrate a common workflow with 360 video data, I’ll show a screenshot of Google Maps. The blue circles are individual pictures that Google servers took out of a 5.4K video file that I uploaded using Google’s Street View mobile app.

After the video is processed into images, they can be viewed on Google Maps or Street View. The spheres are linked together. Cloud processing works fine. It takes about an hour for the images to populate Google Maps. Auto-linking of the images inside of maps may take a few days.

Most people are okay with the time it takes to process the images and video in the cloud. The main problem is that one hour of video is 792 GB. This is for a special low-framerate mode running at 5 frames per second (fps), one-sixth the frame count of standard 30 fps video.
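As a sanity check on those numbers, here is a quick back-of-the-envelope calculation. The frame format is my assumption: I'm treating "5.4K" as a 5376x2688 equirectangular frame stored as uncompressed RGB, which the quoted figure happens to match closely.

```python
# Rough sanity check on the 792 GB/hour figure quoted above.
# Assumption: "5.4K" is a 5376x2688 equirectangular frame at
# 3 bytes per pixel (uncompressed RGB).
fps = 5
frames_per_hour = fps * 3600                      # 18,000 frames

total_bytes = 792e9                               # 792 GB per hour
bytes_per_frame = total_bytes / frames_per_hour   # 44 MB per frame

raw_frame_bytes = 5376 * 2688 * 3                 # ~43.4 MB raw RGB

print(f"{bytes_per_frame / 1e6:.1f} MB/frame vs {raw_frame_bytes / 1e6:.1f} MB raw")
```

In other words, even at only 5 fps each frame weighs in at roughly the size of a raw 5.4K image, which is why the hourly total climbs into the hundreds of gigabytes so quickly.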

The problem of large video data packages is going to get worse as 360 cameras in the next few years will have higher resolution and larger file sizes. Video compression standards like HEVC or AV1 aren’t going to be enough to get the video to cloud-based TensorFlow servers fast enough.

2: Processing Power of Consumer Devices Is Good Enough for Many Things

The camera we used last weekend was the size of an energy bar that people eat for snacks. It contained a Qualcomm Snapdragon 625 SoC with eight Cortex-A53 cores and an Adreno 506 GPU. That is getting close to the power of a mobile phone, which is itself astounding. The camera can process a live video stream internally with TensorFlow.

While this same idea applies to any type of IoT device with sensors, cameras are an especially good match for TensorFlow applications as the image, video, and sound data matches well with the mathematical tensor constructs that TensorFlow excels at.

In the graphic below, I am running TensorFlow inside the camera and projecting a virtual screen to my laptop so that I can grab these screenshots of the live video stream. The camera itself has no screen.

3: Wi-Fi and Mobile Networks Are Still Slow

Although the camera we used supports 5 GHz Wi-Fi, that is still too slow for the volume of visual data we need to transmit to the cloud for processing. While edge devices in a data center may have the luxury of 100 Gigabit Ethernet, our IoT devices in the field are battling Wi-Fi channel interference and changing environments, such as an industrial robot being installed with big metal sides that partially block the Wi-Fi signal. Realistically, cellular modems, even with multiple modems bonded together, aren't going to solve this problem.
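To put a number on the gap, here is the sustained uplink you would need to stream that video to the cloud in real time. The arithmetic is mine, reusing the 792 GB/hour figure from section 1:

```python
# Sustained uplink needed to stream 792 GB of video per hour in real time.
bytes_per_hour = 792e9
bits_per_second = bytes_per_hour * 8 / 3600
print(f"{bits_per_second / 1e9:.2f} Gbit/s")   # 1.76 Gbit/s
```

Real-world 5 GHz Wi-Fi typically sustains a few hundred Mbit/s at best, and cellular links far less, so even a big jump in compression efficiency still leaves the uplink as the bottleneck.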

To make the problem more real, I’ll provide a brief description of each of the prototype TensorFlow projects that developers presented at the conference.

- A smart traffic system that feeds video into TensorFlow to detect and classify cars, people, and animals at intersections and then optimize the traffic signals.
- A crowd-sourced smart airplane tracker that identifies airplanes in the sky, for plane-spotting hobbyists or for analyzing noise pollution along flight paths.
- A personal home-safety monitor that detects whether an elderly person has fallen down in the kitchen.

In each of the examples, the device is connected to the Internet with Wi-Fi. There is a large volume of visual data that needs to be processed. TensorFlow is looking for a certain condition to send a trigger to a machine or an alert to a human. While it would be great to be able to push all the visual data up to the cloud for processing and learning, it’s not feasible in these projects.

4: TensorFlow Can Identify Potential Gold in Big Data Garbage

By detecting objects in a video stream and then taking a still 360 image of the scene, TensorFlow can reduce a 10-gigabyte data package down to 10 megabytes.

Let's go back to the TF Detect demo that I'm running on my camera. The picture below is of a video stream. When a suspected person is detected, TensorFlow triggers a still image to be taken. The still image is about 3.6 MB. Assuming the object you want to analyze is absent from the scene most of the time, you capture a picture only when it appears. That data can then be fed into a more complex learning model in the cloud to improve the learning of your IoT device.
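The trigger logic can be sketched in a few lines. This is a hypothetical illustration, not the camera's actual API: `detect_person` stands in for the TF Detect model, frames are plain dicts carrying a precomputed score, and the 0.6 confidence threshold is an assumed value.

```python
# Sketch of the detect-then-capture idea above. Assumptions: a stand-in
# detector, dict "frames", and an arbitrary 0.6 confidence cutoff.

STILL_SIZE_MB = 3.6         # size of one 360 still, from the demo
CONFIDENCE_THRESHOLD = 0.6  # assumed detection cutoff

def detect_person(frame):
    # Stand-in for the TensorFlow object-detection model; in the real
    # camera this would run inference on a live video frame.
    return frame["person_score"]

def filter_stream(frames):
    """Return indices of frames that should trigger a still capture."""
    return [i for i, f in enumerate(frames)
            if detect_person(f) >= CONFIDENCE_THRESHOLD]

# Example: a person appears in only 2 of 1,000 frames, so instead of
# shipping the whole stream we upload two stills (~7.2 MB total).
frames = [{"person_score": 0.05} for _ in range(1000)]
frames[100]["person_score"] = 0.9
frames[500]["person_score"] = 0.8
triggered = filter_stream(frames)
print(triggered, len(triggered) * STILL_SIZE_MB, "MB uploaded")
```

The interesting design choice is that inference runs on every frame locally, but only the rare positive frames ever leave the device, which is exactly the gold-in-garbage reduction described above.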

Try it Yourself

Inspired by the projects at the conference, I played around with TensorFlow running as an embedded application inside my camera. The GitHub repository below is simply a fork of the original TensorFlow repository. My friend Makoto Shohara ran an awesome workshop in Tokyo and modified the original TensorFlow Android examples to work as an embedded application. He also put together a nice set of steps for porting TensorFlow to embedded Android devices. I extended the documentation and examples here, in a fork of his personal GitHub repository.

Next Steps

The current learning model is not optimized for equirectangular (360 degree) images or video. Although the detection is usable, we really need to improve it to get full 360 object detection. Community member Fabien Benetou pointed out the research paper Object Detection in Equirectangular Panorama. This is clearly a new and hot field. TensorFlow is already excellent at standard 2D media; using it with 360 media, and potentially integrating it with more complex augmented reality applications, is the next frontier. If you know of an equirectangular object dataset or a cool TensorFlow project for 360 media, let us know.