To anyone growing up in this era, it may seem strange that modern data started with a punch card. Measuring 7.375 inches wide by 3.25 inches high and about 0.007 inches thick, a punch card was a piece of paper or cardstock with holes punched in specific locations, each location corresponding to a specific meaning. Punch cards were introduced for the 1890 US census by Herman Hollerith (whose company later became part of IBM) as a way to modernize how the census was conducted. Instead of relying on people to tally up, for example, how many Americans worked in agriculture, a machine could count the cards that had a hole in the one location that appeared only on the cards of citizens who worked in that field.
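The counting idea can be sketched in a few lines of Python. This is a toy model, not Hollerith's actual card layout: each card is treated as a set of punched (column, row) positions, and the position standing for "works in agriculture" is a made-up example.

```python
# Hypothetical punch position meaning "occupation: agriculture".
# The (column, row) pair is invented for illustration; it is NOT
# the real 1890 census card layout.
AGRICULTURE = (12, 3)

# Each card is modeled as the set of positions where a hole was punched.
cards = [
    {(1, 0), (12, 3)},  # a farmer
    {(1, 1), (14, 2)},  # someone in another occupation
    {(1, 0), (12, 3)},  # another farmer
]


def tabulate(cards, position):
    """Count the cards with a hole at `position`, like a tabulator's counter."""
    return sum(1 for card in cards if position in card)


print(tabulate(cards, AGRICULTURE))
```

The machine never needs to "read" the whole card; it only checks whether one position is punched, which is why the approach was so much faster than hand tallying.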

American inventor Dr. Herman Hollerith (1860-1929) devised a mechanical system for recording data using punched card technology for the 1890 US census. Photo by Shutterstock

The problems with this are obvious: it's manual, limited, and fragile. Encoding data and programs as a series of holes in paper can only scale so far, but punch cards are worth remembering for two reasons. First, they are a great visual to keep in mind for what data is. Second, they were revolutionary for their day, because the existence of data, any data, allowed for faster and more accurate computation. It's sort of like the first time you were allowed to use a calculator on a test: for certain problems, even the most basic computation makes a world of difference.

The punch card remained the primary form of data storage for over half a century before magnetic storage took over. Magnetic storage came in several forms, including large reels of tape, but the most notable example was the floppy disk. The 8-inch floppy disks of the 1970s held 256,256 bytes of data, about 2,000 punch cards' worth (and yes, they were marketed in part as holding as much data as a box of 2,000 punch cards). This was a more scalable and stable form of data, but still insufficient for the amount of data we generate today.

With optical discs (like the CDs that are still sold in some electronics stores or show up as mobiles in children's craft classes), we add another layer of density. The larger advancement from a computational perspective, though, is the magnetic hard drive, which stores data on spinning magnetized platters and has grown from a few megabytes to terabytes of capacity. We've gone through decades of innovation very quickly, but to put it in scale, one terabyte (a reasonable amount of storage for a modern hard drive) is equivalent to roughly 4,000,000 boxes of punch cards. To date, we've generated about 2.7 zettabytes of data as a society. If we put that volume of data onto a historical format, say the conveniently named 8-inch floppy disk, and laid the disks end to end, the line would stretch from Earth to the Sun more than 14,000 times.
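The scale comparisons in this paragraph are easy to recompute. The short Python sketch below assumes decimal units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes), treats one 256,256-byte floppy as one box of punch cards, as the text does, and uses a mean Earth-Sun distance of about 1.496 × 10^8 km; the function names are our own.

```python
# Assumptions (ours, for illustration): decimal units, the 256,256-byte
# 8-inch floppy from the text standing in for one box of punch cards,
# and a mean Earth-Sun distance of about 1.496e8 km.
FLOPPY_BYTES = 256_256
FLOPPY_LENGTH_M = 8 * 0.0254  # 8 inches expressed in meters
EARTH_SUN_KM = 1.496e8
TERABYTE = 10**12
ZETTABYTE = 10**21


def boxes_per_terabyte():
    """Number of floppy-sized boxes of punch cards that fit in one terabyte."""
    return TERABYTE / FLOPPY_BYTES


def earth_sun_trips(total_bytes):
    """How many Earth-Sun distances a line of 8-inch floppies would span."""
    disks = total_bytes / FLOPPY_BYTES
    length_km = disks * FLOPPY_LENGTH_M / 1000
    return length_km / EARTH_SUN_KM


print(f"{boxes_per_terabyte():,.0f} boxes per terabyte")
print(f"{earth_sun_trips(2.7 * ZETTABYTE):,.0f} Earth-Sun distances for 2.7 ZB")
print(f"{earth_sun_trips(1 * ZETTABYTE):,.0f} Earth-Sun distances for 1 ZB")
```

With these inputs, one terabyte comes to roughly 3.9 million boxes, and 2.7 zettabytes of 8-inch disks laid end to end span roughly 14,300 Earth-Sun distances. Plugging in a single zettabyte gives about 5,300, so the result scales directly with the data volume assumed.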

That said, no single hard drive holds all of this data. The most recent big innovation has been the cloud. From a storage perspective, the cloud is data distributed across many different hard drives. A single modern hard drive can't hold all the data that even a small tech company generates. So companies like Amazon and Dropbox have built networks of hard drives, along with software that lets the drives coordinate and keep track of which data is stored where. This allows for massive scaling, because it's usually easy to add another drive to the system.
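One common way a storage system can keep track of "what data is where" while still letting you add drives cheaply is consistent hashing. The Python sketch below is a minimal illustration of that general technique, not a description of how Amazon or Dropbox actually build their systems; the drive names, the `HashRing` class, and the choice of 100 virtual points per drive are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring: each key belongs to the next drive point clockwise."""

    def __init__(self, drives, replicas=100):
        self.replicas = replicas  # virtual points per drive, to spread load evenly
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> drive name
        for drive in drives:
            self.add_drive(drive)

    def add_drive(self, drive):
        """Place `replicas` points for a new drive; only keys near them change owner."""
        for i in range(self.replicas):
            point = _hash(f"{drive}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = drive

    def drive_for(self, key):
        """Return the drive responsible for `key`, wrapping around the ring."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"file-{n}" for n in range(10_000)]
ring = HashRing(["drive-a", "drive-b", "drive-c"])
before = {k: ring.drive_for(k) for k in keys}

ring.add_drive("drive-d")  # scale out by adding one more drive
after = {k: ring.drive_for(k) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of files moved to the new drive")
```

Going from three drives to four should move only about a quarter of the files, and every file that moves lands on the new drive. A naive scheme such as `hash(key) % number_of_drives` would instead reshuffle most files whenever a drive is added, which is the kind of cost that makes "just add another drive" hard.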