Artificially intelligent (AI) systems vary widely in architecture, but they all share one component: datasets. The trouble is that accuracy often depends on large sample sizes (a state-of-the-art diagnostic system by Google’s DeepMind subsidiary required 15,000 scans from 7,500 patients), and some kinds of data are harder to come by than others.

Researchers from Nvidia, the Mayo Clinic, and the MGH and BWH Center for Clinical Data Science believe they’ve come up with a solution to the problem: a neural network that itself generates training data, specifically synthetic three-dimensional magnetic resonance images (MRIs) of brains with cancerous tumors. The work is described in a paper (“Medical Image Synthesis for Data Augmentation and Anonymization using Generative Adversarial Networks”) being presented today at the Medical Image Computing & Computer Assisted Intervention conference in Granada, Spain.

“We show that for the first time we can generate brain images that can be used to train neural networks,” Hu Chang, a senior research scientist at Nvidia and a lead author on the paper, told VentureBeat in a phone interview.

The AI system, which was developed using Facebook’s PyTorch deep learning framework and trained on an Nvidia DGX platform, leverages a generative adversarial network (GAN) to create convincing MRIs of abnormal brains. A GAN is a two-part neural network consisting of a generator that produces samples and a discriminator that attempts to distinguish the generated samples from real-world ones.
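To make the generator-versus-discriminator tug of war concrete, here is a minimal sketch of the textbook GAN losses in plain Python. It is an illustration under stated assumptions, not the loss used in the paper: the function names and the example scores are hypothetical, and the generator loss shown is the common non-saturating variant.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for a single probability."""
    eps = 1e-12  # guard against log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(scores_real, scores_fake):
    """The discriminator wants real samples scored 1 and generated ones 0."""
    real_term = sum(bce(s, 1.0) for s in scores_real) / len(scores_real)
    fake_term = sum(bce(s, 0.0) for s in scores_fake) / len(scores_fake)
    return real_term + fake_term

def generator_loss(scores_fake):
    """The generator wants its samples scored as real (non-saturating form)."""
    return sum(bce(s, 1.0) for s in scores_fake) / len(scores_fake)

# Hypothetical discriminator outputs: probability that a sample is real.
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
fooled = discriminator_loss([0.9, 0.8], [0.7, 0.6])
print(confident < fooled)  # the discriminator is worse off when fooled
print(generator_loss([0.7, 0.6]) < generator_loss([0.1, 0.2]))
```

Training alternates between lowering the discriminator loss and lowering the generator loss, so each network improves by exploiting the other's weaknesses.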

The team used two publicly available datasets, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS), to train the GAN, and set aside 20 percent of BRATS’ 264 studies for performance testing. Memory and compute constraints forced the team to downsample the scans from a resolution of 256 x 256 x 108 to 128 x 128 x 54, but they used the original images for comparison.
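The arithmetic behind that setup can be sketched in a few lines of Python. This is a rough illustration only: the article does not say how the hold-out was drawn or how the scans were downsampled, so the shuffled split, the rounding, and the keep-every-second-voxel scheme below are all assumptions, as are the function names.

```python
import random

def split_studies(study_ids, test_fraction=0.2, seed=0):
    """Shuffle study IDs and hold out a fraction for testing (assumed scheme)."""
    ids = list(study_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]

def downsampled_shape(shape, factor=2):
    """Shape after shrinking every axis by the same integer factor."""
    return tuple(dim // factor for dim in shape)

def downsample(volume, factor=2):
    """Keep every `factor`-th voxel along each axis of a nested-list volume."""
    return [[row[::factor] for row in plane[::factor]]
            for plane in volume[::factor]]

train, test = split_studies(range(264))
print(len(train), len(test))               # 211 53
print(downsampled_shape((256, 256, 108)))  # (128, 128, 54)
```

Halving each axis cuts the voxel count, and roughly the memory per scan, by a factor of eight.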

The generator first learned, from ADNI images, to produce synthetic brain scans (complete with white matter, grey matter, and cerebrospinal fluid). Next, when set loose on the BRATS dataset, it generated full segmentations with tumors.

The GAN annotated the scans, a task that can take a team of human experts hours. And because it treated the brain and tumor anatomy as two distinct labels, it allowed researchers to alter the tumor’s size and location or to “transplant” it to scans of a healthy brain.
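Keeping brain and tumor as separate labels is what makes that kind of editing cheap. The toy sketch below stamps a tumor mask onto a healthy label map at a chosen offset. It covers only the label-editing step on a small 2D grid; the label values, the function name, and the rule that tumors may only overwrite brain tissue are assumptions made for illustration, not details from the paper.

```python
BACKGROUND, BRAIN, TUMOR = 0, 1, 2  # hypothetical label values

def transplant_tumor(brain_labels, tumor_mask, row_offset=0, col_offset=0):
    """Return a copy of brain_labels with tumor_mask stamped at an offset.

    Tumor voxels are written only where the target voxel is brain tissue.
    """
    merged = [row[:] for row in brain_labels]
    for r, mask_row in enumerate(tumor_mask):
        for c, is_tumor in enumerate(mask_row):
            rr, cc = r + row_offset, c + col_offset
            inside = 0 <= rr < len(merged) and 0 <= cc < len(merged[0])
            if is_tumor and inside and merged[rr][cc] == BRAIN:
                merged[rr][cc] = TUMOR
    return merged

healthy = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
tumor = [
    [1, 1],
    [0, 1],
]
moved = transplant_tumor(healthy, tumor, row_offset=1, col_offset=2)
for row in moved:
    print(row)
```

Changing the offsets relocates the tumor, and swapping in a larger or smaller mask changes its size, all without touching the underlying brain labels.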

“Conditional GANs are perfectly suited for this,” Chang said. “[It can] remove patients’ privacy concerns [because] the generated images are anonymous.”

So how’d it do? When the team trained a machine learning model using a combination of real brain scans and synthetic brain scans produced by the GAN, it achieved 80 percent accuracy — 14 percent better than a model trained on actual data alone.
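One caveat on the numbers: “14 percent better” can be read two ways, and the article does not say which is meant. The short calculation below works out the baseline accuracy each reading would imply. Both results are derived from the two figures quoted above and are not reported values; the function name is hypothetical.

```python
def implied_baselines(combined_pct, gain_pct):
    """Baseline accuracy implied by two readings of 'gain_pct percent better'."""
    # Reading 1: the gain is in percentage points.
    points_reading = combined_pct - gain_pct
    # Reading 2: the gain is relative to the baseline.
    relative_reading = combined_pct / (1 + gain_pct / 100)
    return points_reading, relative_reading

points, relative = implied_baselines(80, 14)
print(points, round(relative, 1))  # 66 70.2
```

Either way, the model trained on real scans alone would have landed somewhere around 66 to 70 percent.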

“Many radiologists we’ve shown the system have expressed excitement,” Chang said. “They want to use it to generate more examples of rare diseases.”

Future research will investigate the use of higher-resolution training images and larger datasets across diverse patient populations, Chang said. And improved versions of the model might shrink the boundaries around tumors so that they don’t look “superimposed.”

It’s not the first time Nvidia researchers have employed GANs in transforming brain scans. This summer, they demonstrated a system that could convert CT scans into 2D MRIs and another system that could align two or more MRI images in the same scene with superior speed and accuracy.