A few data sets are accessible from our data science apprenticeship web page.

Source code and data for our Big Data keyword correlation API (see also section in separate chapter, in our book)

Great statistical analysis: forecasting meteorite hits (see also section in separate chapter, in our book)

Fast clustering algorithms for massive datasets (see also section in separate chapter, in our book)

53.5 billion clicks dataset available for benchmarking and testing

Over 5,000,000 financial, economic and social datasets

New pattern to predict stock prices, multiplying returns by a factor of 5 (stock market data, S&P 500; see also section in separate chapter, in our book)

3.5 billion web pages: This hyperlink graph was extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and the 128 billion hyperlinks between them.

Another large data set - 250 million data points: This is the full-resolution GDELT event dataset, covering January 1, 1979 through March 31, 2013, and containing all data fields for each event record.