In late 2017, we set out to replace and upgrade our existing reporting and analytics infrastructure with something that would be a better fit for our workloads. Keeping costs and required maintenance to a minimum would be a nice plus, making for an easy sell. After a bit of research, it was obvious Amazon Redshift had the potential to tick all the right boxes. While steadily porting the most problematic workloads away from our existing infrastructure, I started writing an investigative article on the fundamental concepts of Amazon Redshift. I learned a lot studying each individual building block, which allowed me to make some small but impactful changes to our own setup along the way.

The outcome is a 10,000-word document (1 hour reading time) covering 7 topics:

Storage

Distribution

Importing data

Table maintenance

Exporting data

Query processing

Workload management

The text is available in three formats:

The project is open source and available on GitHub.

Thanks to everyone who proofread earlier iterations and provided me with indispensable feedback.

I hope this work can teach you as much as it taught me. I’m looking forward to your feedback.