Purdue team uses Twitter to track flu patterns

When you post to Twitter, your message doesn’t disappear once newer posts bury it. Researchers are analyzing that data.

“When I mention data analytics, a lot of people ... think I’m exploring people’s privacy,” said Wei-Liang Kao, a graduate student at Purdue University. “But I wanted to show that big data isn’t just about privacy exploitation. It’s helping people.”

Kao and a team of computer information technology researchers from Purdue and Perscio, an Indianapolis-based startup, created a computer model that uses Google searches and tweets, as well as transportation, weather and population data, to predict flu trends up to five days ahead.

The team recently won $20,000 in IBM’s Big Data for Social Good project.

“Understanding how flu moves is what they were working on,” said John Springer, the team’s faculty adviser. “That’s a real world problem.”

Dirty data

Collecting so much data is a daunting task, but making sense of it all is an even bigger challenge. Social media, while giving researchers the ability to cast a wide net, are considered “dirty data” because of incomplete sentences, sarcasm and abbreviated text.

“Cleaning the data is also a big challenge,” said Yanan Tao, a graduate student.
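As a rough illustration of what cleaning social media text can involve, here is a minimal Python sketch. It is not the Purdue team’s code; the function name `clean_tweet` and the specific rules (dropping links, user mentions and punctuation) are assumptions chosen for the example, and it does nothing about harder problems such as sarcasm.

```python
import re

def clean_tweet(text):
    """Normalize a raw tweet (hypothetical rules, for illustration only)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = text.replace("#", " ")               # keep the hashtag word, drop the symbol
    text = re.sub(r"[^a-z0-9' ]+", " ", text)   # drop punctuation, emoji and other symbols
    return " ".join(text.split())               # collapse repeated whitespace

print(clean_tweet("Ugh, got the #FLU again!! http://t.co/abc @bob"))
```

A pass like this only standardizes the text; deciding whether a post really describes someone being sick is a separate step.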

The model uses the Google search and Twitter data to assess how many people are talking about the flu and where. Factoring in parameters such as population density, the presence of major airports and other transportation trends helps predict where and how fast the flu might spread.
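The first step described above, counting flu mentions by location and weighing them against population, might look something like the following Python sketch. It is a simplified stand-in, not the team’s model, which also draws on transportation and weather data. The keyword list, the function name `mentions_per_100k` and the sample posts and population figures are all made up for the example.

```python
from collections import Counter

FLU_TERMS = ("flu", "influenza")  # hypothetical keyword list

def mentions_per_100k(posts, population):
    """Count flu-related posts per state, scaled per 100,000 residents.

    posts: iterable of (state, text) pairs
    population: dict mapping state -> number of residents
    """
    counts = Counter()
    for state, text in posts:
        words = text.lower().split()
        if any(term in words for term in FLU_TERMS):
            counts[state] += 1
    return {state: 100_000 * counts[state] / population[state]
            for state in population}

# Made-up sample data, for illustration only
sample_posts = [
    ("KS", "I have the flu"),
    ("MA", "flu season is here"),
    ("MA", "influenza shot today"),
    ("MA", "nice weather"),
]
sample_population = {"KS": 100_000, "MA": 100_000}
print(mentions_per_100k(sample_posts, sample_population))
```

Scaling by population keeps a crowded state from looking sicker than a sparse one simply because more people are posting there.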

“I think we got a pretty accurate model,” said Anuja Rayarikar, a graduate student.

Watch the flu spread

To visualize where flu is spreading, the data are translated to a map of the United States, painting a state-by-state, county-by-county picture of flu patterns in real time, as well as five days in the past or future.

The visualization resembles a heat map of the entire country, said Yuan Hsin, a graduate student.

“After the model is built up, we need to visualize that,” she said. “It’s very inspiring.”

The map made one thing clear: If you want to avoid the flu, “go to western Kansas,” Springer said, and avoid New England.

“There’s a swath of the United States where there’s no major airports. ... The population density is low,” he said. “If you look at the map (where flu activity is high), you’ve got the West Coast, you’ve got the East Coast. The Northeast, especially.”

Future applications

The computer model could have applications in homeland security, Springer said, and is especially relevant given recent flu outbreaks in the United States. It also could be extended to monitor other diseases, making it a useful tool for tracking Ebola or other future outbreaks.

There’s even a possibility your phone could alert you when flu activity is near. That’s one notification you might not mind checking.

“Nowadays, cellphones keep track of where you are,” Kao said. “So if you travel to a new place you might get a flu alert: ‘There’s flu in this area, so watch out.’ ”