I took the Reddit comment archive and converted all of the JSON into a single SQLite database using this program that I wrote: https://gist.github.com/ers35/3b615a75fa0ed5e6d5cc

I ran a few tests to make sure the number of database rows matches the number of JSON records. "SELECT MAX(rowid) FROM comment" and "SELECT COUNT(id) FROM comment" both return 1659361605. This gives me some confidence in the integrity of the dataset, but I cannot be 100% sure.

The database is 163G compressed and 553G uncompressed on disk, a nice saving over the 908 GB of JSON. The saving comes from storing the field names only once instead of in every record, and from storing the timestamps in binary instead of ASCII.
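For anyone who wants to try the same row-count check on a small sample, here is a minimal Python sketch. It is not the program from the gist: the table layout, the four columns, and the helper names load_comments and row_counts are assumptions made up for illustration, and the sample records are invented. Only the two SELECT statements come from the post above.

```python
import json
import sqlite3


def load_comments(conn, json_lines):
    """Insert one row per JSON record and return how many records were read.

    The schema here is an assumption for illustration, not the gist's schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comment ("
        "id TEXT, author TEXT, created_utc INTEGER, body TEXT)"
    )
    records_read = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        rec = json.loads(line)
        conn.execute(
            "INSERT INTO comment (id, author, created_utc, body) "
            "VALUES (?, ?, ?, ?)",
            # int() stores the timestamp as an integer, whether the JSON
            # holds it as a number or as a string of digits
            (rec["id"], rec["author"], int(rec["created_utc"]), rec["body"]),
        )
        records_read += 1
    conn.commit()
    return records_read


def row_counts(conn):
    """Run the two queries from the post and return (max_rowid, count)."""
    max_rowid = conn.execute("SELECT MAX(rowid) FROM comment").fetchone()[0]
    count = conn.execute("SELECT COUNT(id) FROM comment").fetchone()[0]
    return max_rowid, count


# Invented sample records, one JSON object per line.
sample = [
    '{"id": "c1", "author": "alice", "created_utc": "1430438400", "body": "first"}',
    '{"id": "c2", "author": "bob", "created_utc": 1430438401, "body": "second"}',
    '{"id": "c3", "author": "carol", "created_utc": "1430438402", "body": "third"}',
]

conn = sqlite3.connect(":memory:")
records_read = load_comments(conn, sample)
max_rowid, count = row_counts(conn)
print(records_read, max_rowid, count)
```

If the three numbers agree, no record was dropped or inserted twice between reading the JSON and writing the rows. The check says nothing about whether the field values themselves survived the conversion intact, which is why it only gives partial confidence.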