Anasthasia Shilov

To analyze a dataset of more than 11 million cells from a study of dengue fever, Yale researchers developed a new neural network that identifies and represents patterns in large datasets.

The network, SAUCIE (short for sparse autoencoder for unsupervised clustering, imputation, and embedding), analyzes and visualizes large datasets at the same time. Although the program could be used for other applications, the researchers demonstrated its effectiveness on data from dengue patients.

“The neural network was designed to process high volumes of high-dimensional, single-cell data and then automatically perform a bunch of tests on it including denoising, batch correction, clustering and visualization,” said senior author Smita Krishnaswamy, a School of Medicine professor of genetics and computer science.

As measurement techniques become more sophisticated, researchers can use neural networks like SAUCIE to draw conclusions from high-volume, high-dimensional datasets that would overwhelm older programs. Many newer tools address only a single task, but SAUCIE performs several. Beyond analyzing data in multiple ways, it also runs faster and can handle larger datasets than other computational methods.

“As the neural network processes data, usually we have no ability to look into it and understand what it is doing … What we did was design a neural network and penalize it in certain ways to make sure that it is interpretable for us,” said first author Matt Amodio GRD ’23, a graduate student in computer science. “We did that by not only telling it to process the data but also telling it how to process the data, and then we could go look inside and understand it.”

The team then used SAUCIE to analyze the dengue patients’ data, which were collected by a team in India that works with School of Medicine professor Ruth Montgomery. Her immunology research examines why different individuals have different outcomes when exposed to the same conditions, such as dengue patients and healthy people who live in the same household. The challenge of distinguishing these outcomes across a dataset of 11,228,838 cells inspired the development of SAUCIE, and the work yielded useful biological findings as well as computational advances.

“[SAUCIE] identified some very interesting clusters which were combinations of markers and cell functions which distinguished our patient groups,” Montgomery said. “These were things which we could recreate with traditional methods but which would have taken inordinately much longer to find. And in some cases, SAUCIE did identify some clusters of cells which the human-based approach, manual gating, would have missed.”

With SAUCIE’s capabilities, researchers can explore new questions more quickly and effectively. For Krishnaswamy, the lab’s work with SAUCIE showed that autoencoders can handle large datasets, find patterns and yield interpretable results.

“[SAUCIE] is very much within the goals of our lab: unsupervised exploration of high-dimensional biomedical data. There’s a lot of biomedical data out there, whether it’s electronic health record data or single-cell data,” Krishnaswamy said. “They have a lot of dimensions, a lot of observations, and people often don’t know what the structure of that data is.”

Montgomery also described new research possibilities, including analysis of responses to the influenza vaccine, outcomes for pregnant women in Brazil exposed to the Zika virus, and cases of West Nile virus infection in eastern Texas. She added that the collaboration between the two Yale research groups will serve as a foundation for future joint research.

“One of the things that people need to begin to think about is the interaction between computational work and wet bench work in biological systems,” Montgomery said. “It’s really a shared effort, and we really need collaborative teams to push our research forward.”

The first description of dengue fever in the United States was written by Benjamin Rush in Philadelphia in 1780.

Katie Taylor | katie.taylor@yale.edu