Community wiki to collect datasets available to download/seed via BitTorrent

DNS Census, a snapshot of DNS registration data taken in 2013 (~15 GB compressed, 157 GB uncompressed).

The DNS Census 2013 is an attempt to provide a public dataset of registered domains and DNS records. It was inspired by the Internet Census 2012 which showed that releasing data anonymously via BitTorrent is a good thing to do. The dataset contains about 2.5 billion DNS records gathered in the years 2012-2013. All data is compressed using xz/LZMA2. DNS records are written into CSV files. There is one file for each DNS record type (A/AAAA/CNAME/DNAME/MX/NS/SOA/TXT). The records are sorted lexicographically by hostname and by time.
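
Since each per-record-type file is an xz-compressed CSV sorted by hostname, it can be streamed without unpacking the full 157 GB. A minimal Python sketch, assuming a file named `a.csv.xz` and a hostname/timestamp/value column order (check the dataset's README for the real schema):

```python
import csv
import lzma

# Stream one record-type file (A records here) straight from the xz archive.
# File name and column order are assumptions; adjust to the published schema.
with lzma.open("a.csv.xz", mode="rt", newline="") as f:
    for hostname, timestamp, address in csv.reader(f):
        # Because rows are sorted by hostname, matches for one domain
        # appear as a contiguous run.
        if hostname.endswith(".example.org"):
            print(hostname, timestamp, address)
```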

NYC taxi trip data - 2013 Trip Data (11.0GB) and 2013 Fare Data (7.7GB)

Each fare record contains medallion, hack_license, vendor_id, pickup date/time, payment type, fare, tip amount (look at all those zeros!), tolls, and total. The trip data (the good stuff!) has about 14 million rows per file; each row contains medallion, hack license, vendor id, rate code, store and forward flag, pickup date/time, dropoff date/time, passenger count, trip time in seconds, trip distance, and latitude/longitude coordinates for the pickup and dropoff locations.
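
The trip and fare files share identifier columns (medallion, hack license, vendor id, pickup date/time), so the two can be joined. A minimal pandas sketch, assuming file names like `trip_data_1.csv` / `trip_fare_1.csv` and exact column names matching the lists above (some releases have stray spaces in the fare headers, hence the `strip()`):

```python
import pandas as pd

# Join one month of trip records with the matching fare records.
# File names and join keys are assumptions based on the columns listed above.
trips = pd.read_csv("trip_data_1.csv")
fares = pd.read_csv("trip_fare_1.csv")

# Normalise header whitespace before joining (an assumed quirk of some releases).
trips.columns = trips.columns.str.strip()
fares.columns = fares.columns.str.strip()

merged = trips.merge(
    fares, on=["medallion", "hack_license", "vendor_id", "pickup_datetime"]
)
print(merged[["trip_distance", "fare_amount", "tip_amount"]].describe())
```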

Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110m scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.

The Combined Online Information System (COINS) is the database for UK Government expenditure. The data is used to produce expenditure data in the Budget report; Supply Estimates; Public Expenditure Statistical Analyses (PESA); Whole of Government Accounts (WGA); the monthly Public Sector Finance Releases. It is also used by the ONS for other National Statistics releases.

This service is designed to facilitate the storage of all the data used in research, including datasets as well as publications. There are many advantages to using BitTorrent technology to disseminate this work.

911datasets - 3,254 GB (~3 TB) in 256,673 files (as of 31 Dec 2014)

Crowdsourcing 9/11 information distribution. The idea behind 911datasets.org is to make raw information about 9/11 available to a broad audience, and to provide a way to ask useful questions about the data. Most of the material was obtained through the Freedom of Information Act (FOIA) process.

Link to all torrents

Wikileaks data storage - wlstorage.net

You can either download everything or select files by "project", as folders or single files: torrents (1,827 in total at the time of posting).

More details here

2012 Internet Census (568 GB torrent)

While playing around with the Nmap Scripting Engine (NSE) we discovered an amazing number of open embedded devices on the Internet. Many of them are based on Linux and allow login to standard BusyBox with empty or default credentials. We used these devices to build a distributed port scanner to scan all IPv4 addresses. These scans include service probes for the most common ports, ICMP ping, reverse DNS and SYN scans. We analyzed some of the data to get an estimation of the IP address usage.

Full data download

Hilbert Browser tool

Image gallery

(Taken from this answer)

Sci-Hub is a paywall-bypassing website that uses "shared" user credentials to provide scientific papers as PDF or HTML. The website itself doesn't store any papers. Answer from this Open Data Stack Exchange question.

Collection of more than 1 million books.

Collection of more than 50 million scientific papers.

All data gathered during our research is released into the public domain for further study.

This is a collection of Geocities data downloaded by a group of people who call themselves ARCHIVE TEAM, who scraped the Yahoo! Geocities site over a six-month period in 2009, before Yahoo! shut down geocities.com on October 26th, 2009. The collection is compressed as a UNIX filesystem tree, in both 7zip archives and tape archives (gtar).
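
The tape archives can be inspected with standard tooling before committing to a full extraction. A minimal Python sketch, using a placeholder archive name, that lists the first few members of one tar file:

```python
import tarfile

# Peek at one of the Geocities tape archives without extracting it.
# The archive name is a placeholder for one of the .tar files in the torrent.
with tarfile.open("geocities-part.tar") as tar:
    for i, member in enumerate(tar):  # members are yielded lazily
        print(member.name, member.size)
        if i >= 19:
            break
```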

Wikimedia data dump (over 23 TB total, but smaller torrents available)

This is an unofficial listing of Wikimedia data dump torrents, dumps of Wikimedia site content distributed using BitTorrent... This includes both dumps already being distributed at dumps.wikimedia.org and dumps created and distributed solely by others. BitTorrent is not officially used to distribute Wikimedia dumps; this article lists user-created torrents. Please protect your computer and verify the md5sum for any file downloaded from these unofficial mirrors.

In particular, Wikidata
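
Checking a downloaded dump against the published md5sum, as the note above advises, is straightforward. A minimal sketch with a placeholder file name; compare the printed digest against the checksum file published alongside the official dump:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute an MD5 digest in 1 MiB chunks so large dumps don't need to fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder file name; use whichever dump file you fetched from a torrent.
print(md5_of("enwiki-latest-pages-articles.xml.bz2"))
```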

The GHTorrent project

GHTorrent monitors the Github public event time line. For each event, it retrieves its contents and their dependencies, exhaustively. It then stores the raw JSON responses in a MongoDB database, while also extracting their structure into a MySQL database. Currently (Jan 2015), MongoDB stores around 4TB of JSON data (compressed), while MySQL holds more than 1.5 billion rows of extracted metadata. A large part of the activity of 2012, 2013, 2014 and 2015 has been retrieved, while we are also going backwards to retrieve the full recorded history of important projects.

Since 2015, the dumps have been released daily in MySQL format.

Downloads: http://ghtorrent.org/downloads.html
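
Once one of the MySQL dumps has been restored locally, the extracted metadata can be queried directly. A minimal sketch using pymysql; the connection details and the `projects` table/column names are assumptions based on the GHTorrent relational schema, so adjust them to your restore:

```python
import pymysql

# Connect to a locally restored GHTorrent MySQL dump (credentials are placeholders).
conn = pymysql.connect(host="localhost", user="ghtorrent",
                       password="ghtorrent", database="ghtorrent")
try:
    with conn.cursor() as cur:
        # Count projects per primary language (table and column names assumed).
        cur.execute("""
            SELECT language, COUNT(*) AS n
            FROM projects
            GROUP BY language
            ORDER BY n DESC
            LIMIT 10
        """)
        for language, n in cur.fetchall():
            print(language, n)
finally:
    conn.close()
```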