With the coronavirus growing more deadly in China, artificial intelligence researchers are applying machine-learning techniques to social media, web, and other data to look for subtle signs that the disease may be spreading elsewhere.

The new virus emerged in Wuhan, China, in December, triggering a global health emergency. It remains uncertain how deadly or contagious the virus is, and how widely it might have already spread. Infections and deaths continue to rise. More than 31,000 people have now contracted the disease in China, and 630 people have died, according to figures released by authorities there Friday.

John Brownstein, chief innovation officer at Harvard Medical School and an expert on mining social media information for health trends, is part of an international team using machine learning to comb through social media posts, news reports, data from official public health channels, and information supplied by doctors for warning signs the virus is taking hold in countries outside of China.

The program looks for social media posts that mention specific symptoms, like respiratory problems and fever, from a geographic area where doctors have reported potential cases. Natural language processing parses the text of each post to distinguish, for example, between someone discussing the news and someone complaining about how they feel. A company called BlueDot used a similar approach, minus the social media sources, to spot the coronavirus in late December, before Chinese authorities acknowledged the emergency.
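As a rough illustration of that kind of filtering, the sketch below flags posts that come from a watched area, mention a symptom, and read like a personal complaint rather than news chatter. It is not the team's system: the symptom terms, the first-person and news cues, the post fields (`area`, `text`), and the function names are all hypothetical, and a real pipeline would presumably rely on trained language models rather than keyword rules.

```python
import re

# Hypothetical word lists, chosen for illustration only.
SYMPTOM_TERMS = ("fever", "cough", "short of breath", "trouble breathing")
NEWS_CUES = ("reports", "officials", "according to", "outbreak", "confirmed")
FIRST_PERSON = re.compile(r"\b(i|my|me)\b", re.IGNORECASE)


def mentions_symptom(text):
    """True if the post contains any of the watched symptom terms."""
    lowered = text.lower()
    return any(term in lowered for term in SYMPTOM_TERMS)


def looks_personal(text):
    """Crude stand-in for an NLP classifier: first-person wording, no news cues."""
    lowered = text.lower()
    has_first_person = FIRST_PERSON.search(text) is not None
    has_news_cue = any(cue in lowered for cue in NEWS_CUES)
    return has_first_person and not has_news_cue


def flag_posts(posts, watched_areas):
    """Keep posts from watched areas that read like personal symptom reports."""
    return [
        post for post in posts
        if post["area"] in watched_areas
        and mentions_symptom(post["text"])
        and looks_personal(post["text"])
    ]


# Made-up sample posts, not real data.
sample_posts = [
    {"area": "Seattle", "text": "I've had a fever and a cough for three days"},
    {"area": "Seattle", "text": "Officials confirmed a new outbreak; symptoms include fever"},
    {"area": "Austin", "text": "My fever won't go away"},
]
flagged = flag_posts(sample_posts, {"Seattle"})
```

With these made-up posts, only the first is flagged: the second reads like news, and the third comes from an area that is not being watched. Keyword rules like these would miss slang and misspellings, which is part of why the researchers turn to machine learning.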

“We are moving to surveillance efforts in the US,” Brownstein says. It is critical to determine where the virus may surface if the authorities are to allocate resources and block its spread effectively. “We’re trying to understand what’s happening in the population at large,” he says.

The rate of new infections has slowed slightly in recent days, from 3,900 new cases on Wednesday to 3,700 cases on Thursday to 3,200 cases on Friday, according to the World Health Organization. Yet it isn’t clear if the spread is really slowing or if new infections are simply becoming more difficult to track.

So far, other countries have reported far fewer cases of coronavirus. But there is still widespread concern about the virus spreading. The US has imposed a travel ban on China even though experts question the effectiveness and ethics of such a move. Researchers at Johns Hopkins University have created a visualization of the virus’s progress around the world based on official numbers and confirmed cases.

Health experts did not have access to such quantities of social, web, and mobile data when seeking to track previous outbreaks such as severe acute respiratory syndrome (SARS). But finding signs of the new virus in a vast soup of speculation, rumor, and posts about ordinary cold and flu symptoms is a formidable challenge. “The models have to be retrained to think about the terms people will use and the slightly different symptom set,” Brownstein says.

Even so, the approach has proven capable of spotting a coronavirus needle in a haystack of big data. Brownstein says colleagues tracking Chinese social media and news sources were alerted to a cluster of reports about a flu-like outbreak on December 30. This was shared with the WHO, but it took time to confirm the seriousness of the situation.

Beyond identifying new cases, Brownstein says the technique could help experts learn how the virus behaves. It may be possible to determine the age, gender, and location of those most at risk more quickly than official medical sources allow.

Alessandro Vespignani, a professor at Northeastern University who specializes in modeling contagion in large populations, says it will be particularly challenging to identify new instances of the coronavirus from social media posts, even using the most advanced AI tools, because its characteristics still aren’t entirely clear. “It’s something new. We don’t have historical data,” Vespignani says. “There are very few cases in the US, and most of the activity is driven by the media, by people’s curiosity.”