A machine learning method has uncovered a hidden clue in people’s language that predicts the later emergence of psychosis: the frequent use of words associated with sound. The findings, by scientists from Emory University and Harvard University, were published in the journal npj Schizophrenia.

Hidden details

The researchers developed a new machine-learning methodology to more precisely quantify the semantic richness of people’s conversational language (a known indicator for psychosis). Their results indicated that automated analysis of the two language variables (more frequent use of words associated with sound and speaking with low semantic density, or vagueness) can predict if an at-risk person will later develop psychosis with an impressive 93 percent accuracy.

Trained clinicians had not previously noticed that people at risk for psychosis use more words associated with sound than the general population, even though abnormal auditory perception is a pre-clinical symptom.

Machine learning can spot patterns in people’s use of language that even doctors trained to diagnose and treat those at risk of psychosis may not notice. “Trying to hear these subtleties in conversations with people is like trying to see microscopic germs with your eyes,” says first study author Neguine Rezaii, a fellow in the Department of Neurology at Harvard Medical School. An automated technique, by contrast, can pick out the subtle patterns hiding in people’s language. “It’s like a microscope for warning signs of psychosis,” she adds. Rezaii began working on the study while she was a resident in the Department of Psychiatry and Behavioral Sciences at Emory University School of Medicine.

Behind the data

Researchers first used machine learning to establish “norms” for conversational language. They fed a computer software program the online conversations of 30,000 users of Reddit, a popular social media platform where people have informal discussions about a wide array of subjects. The software program, known as Word2Vec, uses an algorithm to convert individual words to vectors, assigning each one a location in a semantic space based on its meaning. Words with similar meanings are positioned closer together than those with different meanings.
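To make the idea of a semantic space concrete, here is a minimal sketch in Python. It does not run Word2Vec itself and is not the researchers' code: the three-dimensional vectors and the words chosen are hypothetical, picked by hand purely for illustration, whereas real Word2Vec vectors are learned from text and usually have hundreds of dimensions. The sketch only shows how closeness between word vectors is commonly measured, using cosine similarity.

```python
import math

# Hypothetical, hand-picked 3-dimensional vectors (illustration only).
# Real Word2Vec embeddings are learned from large text collections.
toy_vectors = {
    "whisper": [0.9, 0.1, 0.0],
    "murmur": [0.8, 0.2, 0.1],
    "table": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Return the other word whose vector points in the most similar direction."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(vectors[word], vectors[other]),
    )


if __name__ == "__main__":
    for other in ("murmur", "table"):
        score = cosine_similarity(toy_vectors["whisper"], toy_vectors[other])
        print(f"whisper vs {other}: {score:.3f}")
    print("closest to whisper:", most_similar("whisper", toy_vectors))
```

With these toy values, the two sound-related words sit much closer to each other than either does to the unrelated word, which is the property of the semantic space that the article describes.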

They also developed a computer program to perform “vector unpacking,” or analysis of the semantic density of word usage. Previous work has measured semantic coherence between sentences. Vector unpacking enabled the researchers to quantify how much information was packed into each sentence. After generating a baseline of “normal” data, the researchers applied the same techniques to diagnostic interviews of 40 participants that had been conducted by trained clinicians, as part of the multi-site North American Prodrome Longitudinal Study (NAPLS), funded by the National Institutes of Health.

The automated analyses of the participant samples were then compared to the normal baseline sample and the longitudinal data on whether the participants converted to psychosis.

“This research is interesting not just for its potential to reveal more about mental illness, but for understanding how the mind works,” concludes senior author Phillip Wolff, a professor of psychology at Emory.