A Definitive Compilation of Apache Airflow Resources

AKA a new data engineer’s reading list for getting started on a production-level Apache Airflow architecture.

Official Resources

Introduction/Overview

Airflow in the Industry

Airflow Distributed Deployment

‘How Apache Airflow Distributes Jobs on Celery workers’ by Hugo Lime (Data Scientist, Sicara AI and Big Data). Sicara Engineering. Apr, 2019.

‘A Guide On How To Build An Airflow Server/Cluster’ by Tianlon Song (Sr. Software Engineer, Machine Learning & Big Data at Zillow). Oct, 2016.

Testing

Additional Reading

‘We’re All Using Airflow Wrong and How to Fix It’ by Jessica Laughlin. Bluecore Engineering. Aug, 2018.

These are the resources I’ve used to understand Airflow and the data architecture frameworks associated with it. I will update this post as I continue my work and find more helpful resources. Please feel free to suggest additions to this list.

For an ultra-exhaustive compilation of Airflow resources, check out the ‘Awesome Apache Airflow GitHub Repo’ by Jakob Homan (Data Software Engineer, Lyft; Airflow Committer and PMC Member).