Brains don’t talk much, as a rule. But they’re not quiet, either: They fizz with message-carrying molecules among an uncountably complicated thicket of neurons. Still, despite the seeming ubiquity of functional magnetic resonance imaging in stories about “the part of the brain that controls X,” scientists don’t really know what’s going on in there. Functional MRI images actually blur spatially over relatively huge chunks of think-meat, and over several seconds of time. Very low resolution. Electroencephalograms take a faster snapshot, but of the entire brain at once. So neural interfaces like the ones Chang uses (deployed in the past to allow physically paralyzed people to control computers) offer an opportunity for more detailed “electrocorticography,” reading the activity of the brain more directly.

But how to translate an inner monologue to out-loud speech? Chang’s group does it in two steps. First they use a machine-learning algorithm to sync up their recordings of the motor cortex as a person’s mouth moves with the acoustics of the words that movement produces. Then they use this to train a virtual mouth, essentially a simulation of mouth parts that they can control with output from the BCI. In practice, Chang’s team recorded five participants talking while electrocorticographically recording their brains, then used those brain recordings to teach a computer to make sounds with the simulated mouth. The mouth produced speech, which listeners recruited on Amazon’s Mechanical Turk were mostly able to transcribe, roughly.

“This is currently a superhot topic, and a lot of very good groups are working on it,” says Christian Herff, a computer scientist at Maastricht University. His team similarly recorded motor cortex activity, but in people with their brains opened up on an operating table, awake and talking while waiting for surgery to remove tumors. Herff’s team went directly from the recordings to a machine-learning-trained audio output, bypassing the virtual mouth. It worked pretty well too. Machine learning has gotten better, electrocorticography has improved, and computer scientists, linguists, and neurosurgeons are all collaborating on the science, leading to a minor boom in the field, Herff says.

Other approaches are chasing the same goal of turning brain activity directly into speech. In a paper earlier this year, a team at Columbia University showed it could generate speech using recordings from the auditory cortex—the part that processes sound—instead of the motor cortex. Right now, people who can’t physically make speech often have to use letter-by-letter technologies to spell out words, a much slower process than actual talking. These researchers would like to give those people a better option. “What approach will ultimately prove better for decoding imagined speech remains to be seen, but it is likely that a hybrid of the two may be best,” says Nima Mesgarani, the Columbia engineer who led that team.

[#video: https://www.youtube.com/embed/kbX9FLJ6WKw]

The work is still preliminary, years away from widespread clinical or commercial use. The data set isn’t big enough to train a reliable model, for one thing. But the challenges run even deeper. “Right now this technique is limited to cases where we have direct access to the cortex. If we wanted to do this for the mass market, of course, opening the skull is not an option,” says Tanja Schultz, a computer scientist at the University of Bremen and an early innovator in the field (and Herff’s PhD advisor). Also, Schultz says, “the electrode montage on different patients is usually based on their medical requirements, so the positioning of the electrodes is never the same across patients … The second problem is that brains are not the same. In general, the motor cortex layout is similar across subjects, but it’s not identical.” That makes it hard to generalize the models that turn those signals into speech.