Industry demand for Data Engineers is constantly on the rise, and with it, more and more software engineers and recent graduates are trying to enter the field. The biggest hurdle for newcomers lies in understanding the Data Engineering landscape and getting hands-on experience with relevant frameworks.

We at Insight offer a 7-week tuition-free Fellowship to transition into Data Engineering and have worked with hundreds of Fellows who had to overcome this exact hurdle. We asked them which resources were particularly helpful in making the transition, and the results are in. See below for the top 10 blog posts and resources!

Data Engineering Landscape

Before jumping into a project or choosing frameworks to work with, it is important to take a step back and look at the big picture. What is Data Engineering, and what is the role of a Data Engineer? Which are the most important concepts and frameworks one should understand? Here is a collection of great articles that shed some light on these questions and have helped our Fellows in the past:

1. Getting Started with Data Engineering

This blog post by Richard Taylor starts with a discussion of what big data and data engineering really mean before delving into an overview of the current landscape. The author strikes a great balance between staying concise and going deep enough to cover topics such as the CAP theorem and resource managers.

2. Want to Become a Data Engineer?

This article by Pranav Dar walks through the different technical skills a data engineer should acquire and lists useful resources to get started. It starts with the basics, such as introductory articles on Python, but also includes links to courses covering the Hadoop ecosystem, including Spark and Hive.

3. Distributed Architecture Concepts I Learned While Building a Large Payments System

You can think of this blog post by Gergely Orosz as a notebook the author developed when transitioning into Data Engineering themselves. Rather than jumping into the details of specific frameworks, it focuses on basic recurring concepts that are helpful for any tech stack.

4. Data Engineering Cookbook

This cookbook by Andreas Kretz is not yet complete but has already gathered a huge following. This is not surprising, given that it already contains a lot of high-quality content, ranging from a definition of Data Engineering (the author likes to refer to it as plumbing for Data Science) to agile development methodologies to in-depth discussions of Hadoop and Docker. It is definitely worth bookmarking this cookbook, as it is rapidly evolving into one of the most comprehensive resources for Data Engineers.