Bailenson has written opinion pieces for The Washington Post, PBS NewsHour, National Geographic, Slate, The San Francisco Chronicle, and The Chronicle of Higher Education, and has produced three VR documentary experiences which were official selections at the Tribeca Film Festival in 2016 and 2017. His new book, Experience on Demand: What Virtual Reality Is, How It Works, and What It Can Do, is out now.

Bailenson studies the psychology of Virtual Reality (VR), in particular how virtual experiences lead to changes in perceptions of self and others. His lab builds and studies systems that allow people to meet in virtual space, and explores the changes in the nature of social interaction. His most recent research focuses on how VR can transform education, environmental conservation, empathy, and health.

Jeremy Bailenson is founding director of Stanford University’s Virtual Human Interaction Lab, Thomas More Storke Professor in the Department of Communication, Professor (by courtesy) of Education, Professor (by courtesy) in the Program in Symbolic Systems, a Senior Fellow at the Woods Institute for the Environment, and a Faculty Leader at Stanford’s Center for Longevity. He earned a B.A. cum laude from the University of Michigan in 1994 and a Ph.D. in cognitive psychology from Northwestern University in 1999. He spent four years at the University of California, Santa Barbara as a Post-Doctoral Fellow and then an Assistant Research Professor.

Jeremy Bailenson: If I could succeed in any endeavor as an academic it would be perfecting what I call the virtual handshake. And I don’t mean an actual handshake, I mean that metaphorically. Why do we go to business meetings to be with other people? Because there’s a social connection, this intimacy that when you’re in the same room it feels like you’re there with them and you can do eye contact and you can do subtle posture changes and you can have multiway conversations with sidelong glances, and it feels real. We call that social presence.

VR is not there yet. But if you think about cars: 40,000 people died in the United States last year driving and 1.3 million people worldwide died in car accidents. Think about the productivity lost by sitting in a box for an hour each way to and from work. Think about the fossil fuel that we’re burning while we commute back and forth to work. Think about the road rage. Think about the germs that you get on public transportation. I’m not claiming that we should not see people; I love social connection. What I’m saying is that there’s a subset of travel that if you think about it, why do we drive all the way to work so we can sit at a desk and pound on a computer? Maybe we only need to go two days to work. And for those meetings that are not essential we need to put those in VR.

We cannot support a planet of 11 billion people—which we’ll be at quite soon—with everybody driving and flying everywhere using fossil fuels. It’s just not going to happen. So why don’t we have networked meetings yet? And the answer is because there’s this secret sauce, this social presence that we have face-to-face that we don’t get with videoconference yet. And VR isn’t there yet. So what we need to do is to be able to track more body movements.

The bottleneck is actually not bandwidth because avatar-based communication is cheaper from a bandwidth standpoint than video. The reason is, if you’re doing an avatar-based communication all the 3D models for the avatars are stored locally on each machine. What travels over the network is the tracking data. So locally a camera detects that I smiled and then it sends over the network a packet that says smile at 22 percent. And then the other computer draws that smile. So you’re not sending visual information over the network. What you’re sending is very cheap information which is semantic information about movement.

The bottleneck is we can’t track movements that accurately. So if you think of the commercial systems right now they track what we call 18 degrees of freedom. Your head and both hands. You can do rotation which has three and X, Y and Z which is obviously three. And so you’ve got 18 points, two hands and a head. In order to have a conversation flow we need to have subtle cheek movements and the twitch of my elbow. Everything I do communicates meaning whether I’m doing it intentionally or not.

And the theory that drives this understanding of how humans interact verbally and nonverbally is called interactional synchrony, and psychologists have been studying this for decades, since the 1960s. And the idea is that conversation is an intricate dance, and when we’re in a room with people everything is so tightly choreographed. When you nod your head I change my intonation. And when she moves her elbow my knee bobs. And there’s all of these pairwise movements and that’s what makes a conversation feel special face to face. We have to track all the movements of the people in the room in a way that’s sufficient to get that synchrony across.
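The tracking-data argument above (avatar models stored locally, only movement values sent over the network) can be made concrete with a rough Python sketch. Everything in it is an illustrative assumption rather than any real VR system's protocol: the packet layout, the 32-bit float encoding, the expression ID, and the function names are all hypothetical. It only shows why three tracked points with six values each add up to 18 degrees of freedom, and how small such packets are.

```python
import struct

# Hypothetical layout: head and both hands, as described in the interview.
TRACKED_POINTS = ("head", "left_hand", "right_hand")
DOF_PER_POINT = 6  # x, y, z position plus three rotation angles


def degrees_of_freedom(points=TRACKED_POINTS):
    """Total tracked values: 3 points * 6 values each = 18."""
    return len(points) * DOF_PER_POINT


def encode_pose_packet(poses):
    """Pack each point's six values as little-endian 32-bit floats.

    `poses` maps a point name to (x, y, z, rot_x, rot_y, rot_z).
    The wire format is an assumption made for this sketch.
    """
    values = []
    for name in TRACKED_POINTS:
        pose = poses[name]
        if len(pose) != DOF_PER_POINT:
            raise ValueError(f"{name} needs {DOF_PER_POINT} values")
        values.extend(pose)
    return struct.pack(f"<{len(values)}f", *values)


def encode_expression_packet(expression_id, intensity):
    """A 'smile at 22 percent' style message: one ID byte, one float."""
    return struct.pack("<Bf", expression_id, intensity)


def raw_frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed video frame.

    Real video is compressed, so this is only an upper bound used
    for an order-of-magnitude comparison.
    """
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    rest = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    pose = encode_pose_packet({name: rest for name in TRACKED_POINTS})
    SMILE = 1  # hypothetical expression ID
    smile = encode_expression_packet(SMILE, 0.22)
    print("degrees of freedom:", degrees_of_freedom())
    print("pose packet bytes:", len(pose))
    print("expression packet bytes:", len(smile))
    print("raw 640x480 frame bytes:", raw_frame_bytes(640, 480))
```

Under these assumptions a full pose update is a few dozen bytes, which matches the point that bandwidth is not the constraint. Tracking the cheeks, elbows, and knees mentioned above would add more points to `TRACKED_POINTS` while keeping packets small; the hard part, as the interview says, is measuring those movements accurately in the first place.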