Data management at the core of USGS Earth science research

The U.S. Geological Survey consists of about 8,500 scientists and staff in more than 200 locations. It creates maps and datasets in biology, hydrology, geography and geology for Earth science research in the United States, and its work also supports larger multidisciplinary areas such as climate change, environmental health, energy and minerals, and natural hazard response, according to Kevin Gallagher, associate director for Core Science Systems at USGS.

The need for data management that enables scientists across disciplines, sectors and countries to discover, access, use and share relevant data was the impetus for the Community for Data Integration. Formed in 2009, CDI aimed to build a data management community that would advance Earth science through enhanced use of data, tools and techniques. Besides providing a way for scientists and data managers to share ideas and learn new skills, CDI aimed to improve USGS’s capabilities in data and information acquisition, management, use and delivery. The resulting tools, services and techniques would benefit community members, their parent organizations, other partners and customers, and the Earth science community at large.

Members of the CDI join voluntarily -- usually, Gallagher explained, because these scientists and data managers have a passion for Earth science research and technology and recognize the opportunity to leverage each other’s efforts, tools, techniques and experiences. “The purpose is to further the data integration goals of the USGS, to quite frankly learn and expand their expertise in computing, modeling, analytics, data management and data integration,” he said.

While the majority of CDI’s 450 members are USGS staff, nearly 100 represent other federal agencies, universities and nongovernmental organizations, including partners like the Earth Science Information Partnership, National Science Foundation projects, the National Ecological Observatory Network, NASA and Esri.

The community is made up of working groups formed around common interests and challenges, such as citizen science, data management and the semantic web. Members meet monthly by webinar and every two years at an in-person conference. Between meetings, they communicate online, where they can hold discussions, exchange information and vote on project proposals.

The group votes on community-based small-scale research and development projects for which Gallagher and his team provide seed money. “I provide half a million dollars a year as incentive for a community to work together and propose projects that I co-fund jointly with them,” he said.

Community members can submit statements of interest for project ideas. Once those are voted on, the top proposals are sent back to the respective working group principal investigators to develop into full-blown requests for proposals that identify the partners, outcomes and schedule. The finalized RFPs are sent to Gallagher and Tim Quinn, chief of the Office of Enterprise Information, for the final selection process. “We’ll try to fund as many of those projects that we can,” Gallagher said.

The projects must meet specific criteria and focus on data integration for interdisciplinary research, innovative approaches to data management and cutting-edge technology. Proposals must tackle known and pressing challenges and be able to contribute to a more integrated data environment for USGS by incorporating a tool, methodology or infrastructure that can be reused and scaled.

“I’m funding projects that are for use by the community, so all that criteria that’s in there needs to be in the statement of interests and accumulated on the website, where the community participation is,” Gallagher said. He aims to fund 20 to 30 projects a year.

This proposal process has also proven cost-effective. According to the USGS, the agency has invested $1.76 million in direct funds in small-scale R&D projects from fiscal year 2012 to the present, matched by more than $1.9 million in in-kind efforts. The result is a number of publicly available products and publications supporting USGS and scientific research more broadly.

So far, collaboration efforts have created the USGS Science Data Catalog, a registry of USGS datasets available online for visitors to search and access. The USGS Data Management website, created by the Data Management Working Group, offers training materials covering the entire data management life cycle, including data planning, curation, preservation and long-term disposition. The working group also created the USGS Science Data Lifecycle Model, which forms the foundation for the agency’s data management best practices.

The community also created the National Water Information System Mapper, an open-source tool for mapping and accessing geospatial water data, and the Geo Data Portal, which provides access to output from a variety of climate modeling activities and to downscaled climate projections.

Moving forward, Gallagher aims to integrate the CDI with the data collection, higher-level analysis and innovation efforts of USGS’s John Wesley Powell Center for Analysis and Synthesis and the USGS Innovation Center for Earth Sciences. “We’re always continuously looking for opportunities to expand the influence,” Gallagher said.

Editor's note: This story was changed Jan. 6. It originally referred to Tim Woods rather than Tim Quinn as the person who works with Gallagher on project funding. Also, the name of the Earth Science Information Partnership was corrected.