For researchers and developers in the artificial intelligence industry, the demand for high-quality AI training data keeps increasing. Yet one of the biggest challenges ML developers face is finding quality data to train their algorithms: even as demand grows, companies and private organizations are guarding their data more closely.

Even open data from social media platforms like Twitter and Facebook is difficult to collect, and custom-built API integrations are often required just to scrape it. Google launched Google Dataset Search late last year with the goal of making open data easier to access for researchers, developers, journalists, and general enthusiasts. When the company behind the world's largest search engine builds a search engine specifically for datasets, AI developers and machine learning researchers are bound to take notice.

What is Google Dataset Search?

As the name implies, Google Dataset Search is a search engine specifically for finding datasets. Much as Google Scholar does for academic literature, it aims to improve worldwide access to open data. The search engine is free to use and is available in multiple languages, with more language options to be added in the future.


The Interface

Although Google Dataset Search is still in beta, its UX is already well developed, giving a succinct overview of each indexed dataset. When the details are available, Google Dataset Search displays the following information:

- Area/geolocale covered
- File formats
- Author(s)
- License information
- Creation date
- Providing company or facility
- Dataset description
- Time period covered
- Date of most recent update
- Variables measured

How do you make your dataset available on Google Dataset Search?

Google Dataset Search crawls and indexes datasets from websites and online repositories, just as Google's main web search engine crawls and indexes ordinary pages. If you want your dataset to be crawled and indexed properly, you must describe it with schema.org Dataset markup or one of the other structured-data formats described in Google's dataset developer documentation.
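As a rough illustration, the Python sketch below builds a schema.org Dataset description in JSON-LD and wraps it in the `<script type="application/ld+json">` element that would sit on the dataset's landing page. Every value (dataset name, URLs, people, dates) is a hypothetical placeholder, and the helper functions `missing_required` and `to_script_tag` are illustrative names, not part of any Google tool. The comments pair each schema.org property with the Dataset Search field from the list above that it roughly corresponds to; that mapping is approximate. Treating `name` and `description` as the required properties is an assumption based on Google's dataset guidelines, so check the current documentation before relying on it.

```python
import json

# Placeholder metadata for a HYPOTHETICAL dataset; replace every value with your own.
# Comments show the Dataset Search field each property roughly maps to (approximate).
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example City Air Quality Readings",
    "description": "Hourly particulate matter readings from ten "
                   "hypothetical monitoring stations.",            # dataset description
    "url": "https://example.org/datasets/air-quality",
    "creator": {"@type": "Person", "name": "Jane Doe"},            # author(s)
    "provider": {"@type": "Organization", "name": "Example Lab"},  # providing company or facility
    "license": "https://creativecommons.org/licenses/by/4.0/",     # license information
    "dateCreated": "2018-01-15",                                   # creation date
    "dateModified": "2018-06-30",                                  # date of most recent update
    "temporalCoverage": "2017-01-01/2017-12-31",                   # time period covered
    "spatialCoverage": "Example City",                             # area/geolocale covered
    "variableMeasured": ["PM2.5", "PM10"],                         # variables measured
    "distribution": [{                                             # file formats
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/datasets/air-quality.csv",
    }],
}

# Properties this sketch treats as mandatory (an assumption; verify against Google's docs).
REQUIRED = ("name", "description")


def missing_required(markup: dict) -> list[str]:
    """Return the required properties that are absent or empty in the markup."""
    return [key for key in REQUIRED if not markup.get(key)]


def to_script_tag(markup: dict) -> str:
    """Serialize the markup as a JSON-LD <script> element for an HTML page."""
    body = json.dumps(markup, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    problems = missing_required(dataset)
    if problems:
        raise SystemExit(f"missing required properties: {problems}")
    print(to_script_tag(dataset))
```

Generating the markup from a single dictionary keeps the landing page and any other copies of the metadata in sync; the printed `<script>` element would be pasted into, or templated into, the page's HTML.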

To learn more about Google Dataset Search, check out the FAQ thread on the community help page.

As AI technology keeps advancing, finding quality data sources will remain a challenge. If you're still having trouble finding the training data you need, get in touch with Lionbridge AI to learn how our crowd of multilingual experts can help you meet your project's needs.
