A computer keyboard, at its most reliable, is barely noticeable to the person using it. It is designed to be felt, heard and mostly ignored. Once you’re comfortable with the thing, you usually aren’t focused on the keys you’re pressing, or the gentle clack-clacking sound those keys make. Instead, you’re immersed in whatever is happening on the screen.

Technology has always been a go-between in this way, a means by which humans relate to their environment. In the beginning, when people couldn’t use their bodies to best complete a job, they built a tool—an axe, a hammer, a pulley—to do it for them. For millennia, machines handled increasingly complicated physical work, but humans still carried most of the burden of information processing. Within the last century, that changed dramatically.

With computers that could handle tasks like code breaking and complex calculations, the balance shifted. Machines that could handle data required interfaces more sophisticated than simple levers to pull or wheels to turn. Yet, the same interfaces that would enable a new level of machine-human coordination on informational tasks would also remain limited by their own design.

“It created an information-processing problem,” says Gerwin Schalk, the deputy director of the National Center for Adaptive Neurotechnologies. “So in our interaction with the environment, all of the sudden, it wasn’t just humans and some tool, there was something in between—and that something was a computer.”

The problem, as Schalk sees it, is that humans and computers are both able to do far more than the interface between them allows.

“Computers are very fast, they’re extremely sophisticated, and they can process data of gargantuan complexity in just a fraction of a second,” he told me. “Humans are very good at other things. They can look at a scene and immediately know what’s going on. They can establish complex relationships. The issue that is now emerging is an issue of communication. So the underlying problem and question is, how do humans, who are extremely powerful and complex, interact with their increasingly complex and capable environments? Robots are extremely complex. Computers are extremely complex. Our cellphones are extremely complex.”

At this point in technological history, interfaces are built so computers can do as much as possible within the limitations of a human’s sensory motor systems. Given what many people use computers for, this arrangement works out well—great, even. Most of the time, people are reading, writing text and looking at or clicking on pictures and video. “For that, keyboards and mice—and trackpads, and to a lesser extent, voice control, which I think is still not so ubiquitous due to its relative unreliability—are still cheap, robust and well-suited to the task,” says Kaijen Hsiao, a roboticist and the chief technology officer of Mayfield Robotics, located just south of San Francisco. For others, though, traditional interfaces aren’t enough. “If I’m trying to explain to a computer some complex plan, intent, or perception that I have in my brain, we cannot do that,” Schalk says.

Put simply, it’s a communication issue even more challenging than human-to-human communication—which is itself complex and multifaceted. There’s always some degree of translation that happens in communicating with another person. But the extra steps required for communicating with a machine verge on prohibitively clunky. “And when you’re trying to explain that same thing to a computer, or to a robot, you have to take this vivid imagery [from your head], and you have to translate this into syntactic and semantic speech, thereby already losing a lot of the vividness and context,” Schalk says. “And then, you’re taking the speech and you’re actually translating this into finger movements, typing those sentences on a computer keyboard. It’s completely ridiculous if you think about it.”

On a practical level, for most people, this ridiculousness isn’t apparent. You have to write an email, you use your keyboard to type it. Simple. “But if you just, on a very high level, think about how pathetic our interaction with the environment has become, compared with where it used to be, well, that’s a problem, and in fact that problem can be quantified,” Schalk says. “Any form of human communication doesn’t really [travel at] more than 50 beats per second—that’s either perceived speech or typing. So that’s basically the maximum rate at which a human can transmit information to an external piece of technology. And 50 beats per second is not just inadequate. It is completely, grossly pathetic. When you think about how many gigabytes per second a computer can process internally and what the brain can process internally, it’s a mismatch of many, many, many orders of magnitude.”

This mismatch becomes even more pronounced as machines get more sophisticated. So much so, several roboticists told me, that a failure to improve existing interfaces will ultimately stall advances in fields like machine learning and artificial intelligence.

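Schalk’s “orders of magnitude” claim can be sanity-checked with a back-of-envelope calculation. The 50 bits-per-second figure is his; the 1 GB/s stand-in for a computer’s internal throughput is an illustrative assumption, not a number from the article:

```python
import math

# Back-of-envelope check of the mismatch Schalk describes.
# Assumed figures: ~50 bits/s for human output (his number), and
# ~1 GB/s (8e9 bits/s) as an illustrative internal computer throughput.
human_bps = 50
computer_bps = 8e9

ratio = computer_bps / human_bps
orders_of_magnitude = math.log10(ratio)
print(f"gap: {ratio:.1e}x, roughly {orders_of_magnitude:.0f} orders of magnitude")
```

Even with a conservative figure for the machine side, the gap comes out to roughly eight orders of magnitude, which squares with Schalk’s “many, many, many.”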
“As technologies like speech recognition, natural language processing, facial recognition, etcetera, get better, it makes sense that our communication with machines should go beyond screens and involve some of the more subtle forms of communication we use when interacting with other people,” says Kate Darling, who specializes in robot-human interaction at the Massachusetts Institute of Technology. “If we want a machine to be able to mimic states of human emotion, then having it express those through tone, movement and other cues will be a fuller representation of its abilities.”

Such cues will have to be part of a larger fluid interaction to work best. That might mean, for instance, making sure to build subtle forms of communication for robots designed to work with a pilot in a cockpit or a surgeon in an operating room—settings where humans need to be able to predict what a robot is about to do, but still stay focused on what they’re doing themselves.

“There are all these ways people are working alongside a robot, and they need to understand when a robot’s about to move,” says Missy Cummings, the head of Duke University’s Robotics Lab. “[With other humans,] we use our peripheral vision and we see slight motion, so we infer, but robots don’t have those same fluid motions. So we’re trying to figure out how to use a combination of lights and sounds, for example, to figure out how to communicate more nuanced interactions.”
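The lights-and-sounds approach Cummings describes amounts to a lookup from robot state to announced cue. A minimal sketch; the state names and cues here are hypothetical, purely for illustration:

```python
# Hypothetical mapping from a robot's motion state to the light and
# sound cues a nearby human would perceive. The states and cues are
# illustrative assumptions, not a real robot API.
INTENT_CUES = {
    "idle":          {"light": "steady green",   "sound": None},
    "about_to_move": {"light": "blinking amber", "sound": "soft chirp"},
    "moving":        {"light": "steady amber",   "sound": None},
    "error":         {"light": "blinking red",   "sound": "alarm tone"},
}

def announce(state: str) -> str:
    """Return a human-readable description of the cues for a state."""
    cues = INTENT_CUES[state]
    parts = [f"light: {cues['light']}"]
    if cues["sound"]:
        parts.append(f"sound: {cues['sound']}")
    return ", ".join(parts)

print(announce("about_to_move"))  # light: blinking amber, sound: soft chirp
```

The design question Cummings raises is exactly what belongs in that table: which states a human nearby actually needs announced, and through which channel.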

In some settings, like when a person is driving and needs to pay attention to the road, voice communication is still the best interface.

“Of course, the problem with that is voice-recognition systems are still not good enough,” Cummings says. “I’m not sure voice recognition systems ever will get to the place where they’re going to recognize context. And context is the art of conversation.”

There is already a huge effort underway to improve voice-based interfaces, and it’s rooted in the idea that digital assistants like Siri and devices like the Amazon Echo will take on increasingly prominent roles in people’s lives. At the same time, we’re likely to see improvements to other mediating interfaces.

This already happened to some extent with the touch screen, an interface that was long dismissed as worthless on popular gadgets because the technology really wasn’t very good. “Touch screen buttons?” one commenter wrote on the website Engadget when the iPhone was unveiled in 2007. “BAD idea. This thing will never work.” (Schalk calls the iPhone a “profound advance” in human-machine interaction, but also just a “mitigating strategy.”) So far, though, other interfaces—voice control, handwriting digitizers, motion control, and so on—haven’t really taken off.

Many technologists argue that the rise of augmented reality and virtual reality will produce the next big interface. But several engineers and scholars told me that such a leap will require technological advancement that just isn’t there yet. For one thing, even the most sophisticated mixed-reality platforms—Microsoft’s HoloLens comes up a lot—aren’t precise enough in terms of their mapping of the real world in real time, as a user moves through it. Which means these sorts of systems are handy for projecting webpages or other virtual elements onto the walls of the room you’re in, but they’re nowhere near able to do something revolutionary enough to fundamentally change the way people think about communicating with machines.

One of the key questions for developers of these systems is to figure out to what extent—and at what times—the nonvirtual world matters to people. In other words, how much of the physical world around you needs to be visible, if any of it? For a conference call, for instance, augmented reality is far preferable to virtual reality, says Blair MacIntyre, a professor in the School of Interactive Computing at the Georgia Institute of Technology. “You totally wouldn’t want just [a] VR version of that because maybe I need to look at my notes, or type something on my computer, or just pick up my coffee cup without knocking it over.” This is an issue MacIntyre likes to call “the beer problem,” as in the need to pause for a sip of your beer while you’re playing video games. “In VR, that becomes hard,” he says, “whereas in AR, that becomes a little easier.” Eventually, he says, augmented reality really will be able to track smaller objects and augment smaller pieces of the world, which will make its applications and interface more sophisticated. Displays will become clearer. Checking the status of a flight at the airport, for instance, could mean using mixed reality to look up in one’s field of vision—rather than searching for information by smartphone or finding the physical display board in the airport.

“But I think we still need the keyboards and touch screens for the input that requires them, honestly,” he says. “Haptic feedback is superimportant. I’ve done the touch-typing on the HoloLens, with the virtual keyboard in midair. They don’t work well, right? Because by the time you see the visual or hear the auditory feedback of your finger hitting it, you end up consciously having to control the typing of your hand.”

Eventually, he says, the motions that have become normalized to smartphone users may translate to the augmented reality sphere. This is the sort of interface that many people associate with the film “Minority Report,” in which a series of hand gestures can be used to orchestrate complex computing tasks.

“I have this vision of the future of HoloLens,” MacIntyre says, “where maybe I still have my phone or a small tablet I can use for very precise interaction, and then—when the visual tracking gets good enough—I’m doing pinch-zoom or drag-to-rotate in the air if it can really precisely track my fingers. But it has to be better than it is now. And I think that stuff will get better.”
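Once a tracker can report two fingertip positions, the pinch-zoom and drag-to-rotate gestures MacIntyre mentions reduce to simple geometry. A minimal sketch, assuming the tracker reports (x, y) coordinates each frame:

```python
import math

# Geometry behind two-finger pinch-zoom and rotate gestures,
# assuming a hand tracker that reports two fingertip positions
# as (x, y) pairs each frame.
def pinch_state(p1, p2):
    """Distance and angle between two fingertips; comparing these
    across frames yields a zoom factor and a rotation delta."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

# Frame 1: fingertips 100 units apart; frame 2: 150 units apart.
d0, a0 = pinch_state((0, 0), (100, 0))
d1, a1 = pinch_state((0, 0), (150, 0))
zoom = d1 / d0        # 1.5 -> zoom in by 50%
rotation = a1 - a0    # 0.0 -> no rotation this frame
print(zoom, rotation)
```

The arithmetic is trivial; the hard part, as MacIntyre says, is the tracking itself—getting fingertip positions precise and stable enough that these ratios don’t jitter.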

Better interfaces don’t just have to work, technically, though. They also have to delight users. This was arguably one of the iPhone’s greatest triumphs: the device—sleek, original and frankly gorgeous—made people want to interact with it. That made the iPhone feel intuitive, although an engaging interface is arguably more important than an intuitive one. There’s an entire research community devoted to gesture-based interfaces, for instance, even though using gestures this way isn’t really intuitive. This isn’t necessarily a good thing, Cummings, the Duke roboticist, told me. Humans are accustomed to gesturing as a way of emphasizing something, but with the exception of people who speak sign language, “How much do we actually do by gestures?” she says. “And then, it actually increases your mental workload because you have to remember what all the different signals mean. We get led by bright, shiny objects down some rabbit holes.”