For better or worse, AI can now figure out what you’re doing even without “seeing” you. The MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) has unveiled a neural network model that can detect human actions through walls or in extremely dark places.

Although automating action recognition from visual data has been a computer vision research focus for some time, previous camera-based approaches — much like human eyes — could only sense visible light and were largely defeated by occlusions. The MIT CSAIL researchers overcame those challenges by using radio signals in the WiFi frequency range, which can penetrate occlusions.

Their “RF-Action” AI model is an end-to-end deep neural network that recognizes human actions from wireless signals. The model uses radio frequency (RF) signals as input, generates 3D human “skeletons” as an intermediate representation, and can track and recognize actions and interactions of multiple people. The skeleton step enables the model to learn not only from RF-based datasets, but also from existing vision-based datasets.
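The three-stage pipeline described above — RF signals in, 3D skeletons as an intermediate representation, action labels out — can be sketched roughly as follows. Everything here (function names, tensor shapes, joint count, action list, and the toy stand-in "networks") is an illustrative assumption, not the authors' actual code:

```python
import numpy as np

# Illustrative sketch of the RF-Action pipeline: RF heatmaps -> 3D skeletons
# -> per-person action labels. All shapes and names are assumptions.

NUM_KEYPOINTS = 14  # assumed number of skeleton joints
ACTIONS = ["shake_hands", "phone_call", "throw", "fall", "walk"]  # toy label set

def rf_to_skeletons(rf_frames: np.ndarray, num_people: int) -> np.ndarray:
    """Stage 1 (stand-in): map a window of RF heatmaps to 3D skeletons.

    A real model would run a neural network over the RF frames; here we
    emit deterministic dummy keypoints so the pipeline is runnable.
    Returns shape (num_people, frames, NUM_KEYPOINTS, 3).
    """
    frames = rf_frames.shape[0]
    rng = np.random.default_rng(0)
    return rng.standard_normal((num_people, frames, NUM_KEYPOINTS, 3))

def skeletons_to_actions(skeletons: np.ndarray) -> list:
    """Stage 2 (stand-in): classify each person's skeleton sequence.

    Because this stage consumes skeletons rather than raw RF or pixels,
    it could in principle be trained on existing vision-based skeleton
    datasets as well as RF-derived ones.
    """
    rng = np.random.default_rng(1)
    weights = rng.standard_normal((3, len(ACTIONS)))  # toy classifier weights
    labels = []
    for person in skeletons:                 # (frames, joints, 3)
        features = person.mean(axis=(0, 1))  # crude 3-dim summary per person
        labels.append(ACTIONS[int(np.argmax(features @ weights))])
    return labels

# End-to-end toy run: 30 RF "frames" of a 64x64 heatmap, two people in scene.
rf_window = np.zeros((30, 64, 64))
skeletons = rf_to_skeletons(rf_window, num_people=2)
actions = skeletons_to_actions(skeletons)
print(skeletons.shape, actions)
```

The point of the sketch is the decoupling: the skeleton in the middle is the shared interface that lets the action classifier learn from both RF-based and camera-based training data.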

Researchers say RF-Action is the first model to use radio signals for skeleton-based action recognition. “There are lots of potential applications regarding human behavior understanding and smart homes. For example, monitoring the elderly’s abnormal behaviors such as falling down at home, monitoring whether patients take their medicine appropriately, or remote control of smart home devices by actions,” says the paper’s co-first author Tianhong Li.

Although the new research is likely to further inflame those who fear AI-powered surveillance and security technologies, Li believes the model could actually assuage such anxieties. Unlike video-based systems, the RF-Action input is only radio signals, which do not include personal identification details such as facial data, appearance or clothing.

The above images demonstrate RF-Action’s performance in two scenarios. On the left, two people are shaking hands; one is visible through a doorway while the other is occluded by a wall. On the right, one person is making a phone call while another person, barely discernible in the darkness, is about to throw an object at her. The RF-Action model recognizes both interactions correctly.

RF-Action model architecture

Due to the lack of existing action detection datasets with RF signals and corresponding skeletons, the researchers had to build their own. They had 30 volunteers interact in 10 different environments — offices, lounges, hallways, corridors, lecture rooms, etc. — and employed a radio device to collect RF signals and a camera system with 10 different viewpoints to collect video frames. The final 25-hour dataset also includes two through-wall scenarios — one for training and one for testing.

In experiments, the RF-Action model outperformed both the state-of-the-art (SOTA) HCN model for skeleton-based action recognition and the SOTA Aryokee model for RF-based action recognition, in visible as well as through-wall scenarios. Given the accuracy achieved, this new model may soon make its way into modern smart home environments.

There is, however, still work to be done. The model has so far only been tested on a single wall and within a detection range of 1 to 11 meters (3 to 36 ft). “The RF signal will have much more attenuation with multiple walls, so it may be hard to have enough signal to noise ratio with too many walls,” says Li.

The paper Making the Invisible Visible: Action Recognition Through Walls and Occlusions is on arXiv.