@Taryn and the kind folks at Stack Overflow publish their data export periodically with your questions, answers, comments, user info, and more. It’s available as an XML data dump, which I then take and import into SQL Server for teaching performance tuning.

You can download the 38GB torrent (magnet), which gives you a series of 7Zip files that you can extract to produce a ~304GB SQL Server 2008 database. You can then attach it to any 2008-2017 SQL Server. (If you want a smaller 10GB version, check out the mini 10GB version circa 2010.

The data goes up to 2018-06-03 and includes:

Badges – 27M rows, 1GB data.

Comments – 66M rows, 22GB

PostHistory -106M rows, 174GB (new)

Posts – 41M rows, 106GB

Users – 9M rows, 1GB.

Votes – 151M rows, 5GB.

And a few smaller supporting tables: PostHistoryTypes, PostLinks, PostTypes, and Tags.

As always, it’s open source, licensed under Creative Commons Attribution-Share Alike 3.0, as is the source data dump.

The new PostHistory table makes this one way bigger.

The Posts table holds questions and answers. The PostHistory table tracks events that have happened to a post over time, like edits. One of my favorite features of the Stack sites is the ability for people to edit each others’ questions and answers. You can jump right in and improve someone’s question by adding more details, or edit their answers, too.

To join them together:

SELECT pht.Type, ph.* FROM dbo.PostHistory ph INNER JOIN dbo.PostHistoryTypes pht ON ph.PostHistoryTypeId = pht.Id WHERE ph.PostId = 11227809 ORDER BY ph.CreationDate; 1 2 3 4 5 SELECT pht . Type , ph . * FROM dbo . PostHistory ph INNER JOIN dbo . PostHistoryTypes pht ON ph . PostHistoryTypeId = pht . Id WHERE ph . PostId = 11227809 ORDER BY ph . CreationDate ;

Results for a very active question:

I haven’t included this table in the past because it’s so large, and I wanted people to be able to do Stack query demos on smaller machines. However, we’ve solved that problem by distributing a separate 10GB StackOverflow2010 database (1GB 7zip) showing data from 2008-2010.

To learn more about using the database: