The electric cars, manned spacecraft and must-have devices of tomorrow will all be built with discoveries made today in materials science. But finding the alloys, nanomaterials and polymers that will enable these future technologies requires scaling up how researchers store, share, analyze and sift through the surge of materials data from academia, national facilities and industry.

A new $700,000 grant from the National Science Foundation will fund a consortium of Midwestern universities to address these challenges and stimulate innovative materials science research. The Midwest Big Data Spoke for Integrative Materials Design (IMaD) connects experimental and simulation results from several research groups to broaden access to data and computational tools, encourage discovery and collaboration, and extract valuable new insights from materials science data.

Founding participants include Northwestern University, the University of Chicago, Argonne National Laboratory, the University of Illinois, the University of Michigan and the University of Wisconsin. The consortium also will work with industrial partners in the aeronautics and automotive sectors to develop new data-driven commercial applications for materials science datasets.

“The IMaD spoke will build bridges between materials science data sources so that we’ll be able to link far more data than anyone has had access to before,” said co-principal investigator Ian Foster, the Arthur Holly Compton Distinguished Service Professor of Computer Science at UChicago, Argonne senior scientist and Computation Institute senior fellow. “We’ll then work with various groups to apply machine learning and simulation methods to advance the goal of computationally based design of materials.”

The effort supports the mission of the Materials Genome Initiative, launched by the White House in 2011 to accelerate the pace of discovery, deployment and manufacture of advanced materials to improve clean energy, national security and human welfare. It builds upon the National Institute of Standards and Technology-funded Center of Excellence for Hierarchical Materials Design (CHiMAD) formed in 2013, and the Materials Data Facility, created in 2015 to enable the discovery, reuse and publication of materials science data for scientists and researchers.

“What we are facing in the materials community in the coming years is the challenge of sharing, searching and curating large materials data,” said Peter Voorhees, the Frank C. Engelhart Professor of Materials Science and Engineering at Northwestern University and co-director of CHiMAD. “With data coming from our partners, IMaD will become an important resource for materials data that will leverage the strength of the materials engineering community in the Midwest.”

Building bridges

Initially, IMaD will connect data from its founding partners with the Materials Data Facility (MDF), establishing a deep and comprehensive resource for materials science data. New tools created by Globus, a project of the Computation Institute at UChicago and Argonne, will make it easier for researchers to automatically upload new data from ongoing experiments, archive existing data to MDF and find data they need from other sites.

Several existing databases built and maintained by IMaD partners will link up with MDF, creating a “one-stop shop” for finding materials science data. CHiMAD will contribute databases on the properties and structures of polymer nanocomposites and polymer blends. The University of Michigan will contribute its PRISMS Center Materials Commons, with data on microstructural evolution and the mechanical behavior of structural metals. Laboratories at the University of Illinois and the University of Wisconsin will provide datasets on alloy corrosion, solute diffusion and other important material properties.

In addition to eliminating data silos and creating a multi-institutional resource, IMaD and MDF will help materials science researchers struggling with “big data” problems created by new techniques and technologies, such as resonant soft X-ray scattering and 4-D X-ray tomography, which can produce terabytes of results.

Building bridges between detached data resources is only the beginning. IMaD participants will also develop new computational tools that intelligently search through data and use it to predict and simulate the properties of new, untested materials. Machine-learning tools will take experimental data from known materials and predict new compositions with desirable properties, such as resistance to high temperatures or corrosive environments.

“We’re putting together a unique set of data capabilities that will allow meta-analyses and machine-learning studies that were really not possible before,” said Ben Blaiszik, research scientist at the Computation Institute. “Scientists will be able to bridge disparate datasets and get better results than they could from any one of the datasets by themselves.”

The consortium also will include several outreach efforts, including webinars, tutorials, and work with industry and technology partners interested in applying materials science data to new commercial products. In particular, researchers will engage companies from the aeronautics and automotive sectors on increasing usage of materials datasets, automating data workflows and training their workforces on computational techniques. Another partnership, with Citrine Technologies, will link databases with the company’s text-mining methods for generating structured data from previously published research.

“We think these powerful combined efforts will bolster the Midwest’s leadership in materials science and engineering,” Foster said. “IMaD can also serve as a model system for other materials science communities and data-heavy fields such as genomics and the digital humanities.”