CONTENTS

INTRODUCTION
PREFACE: EDUCATION
CHAPTER 1: DATA EXPLORATION
CHAPTER 2: DATA PIPELINES
CHAPTER 3: MONITORING TOOLS
CHAPTER 4: PRODUCTIVITY TOOLS
CHAPTER 5: AUTOMATION & INNOVATION
CHAPTER 6: COMMUNICATION
CHAPTER 7: SKILLS
CHAPTER 8: MOTIVATION
CONCLUSION

INTRODUCTION

I have been very fortunate in working with data. The week after I finished the last exams of my Computer Information Systems degree, which I received with honors, I landed a job where I could build on the software development skills I had learned in college by developing user interfaces that let any user create any report with the software our house provided to its clients. To this day, data-related tasks still take up the bulk of my time, and I think I have gained enough wisdom to walk you through the basics a data engineer goes through along the journey. Let us start, then.

PREFACE: EDUCATION

Education is important, but remember: each track teaches you one side of the coin, leaving the other side entirely up to you. What others tell you to learn, or what you realize you need to learn during your career, is even more important than what you learned before you graduated.

As in every draft, you have to prove your potential during your college days, boot camps, and interviews. But that is just the first step of the journey. The rest you have to learn within the time-frame of your career.

Taking courses related to your data career is very important. Unlike my college years, when there were not many options for specializing in specific data skills, today there are plenty of colleges and universities that offer degrees in data engineering and data analytics. Although in my time there were several courses at my college that I could complete online or hybrid (partially online and partially offline), we did not have access to the variety of certificates and data courses you can now complete on websites just by registering with an e-mail address. Now that the term "data" has become more important than ever, you have far more opportunities to take advantage of the facilities currently available for boosting your knowledge of anything you want to do related to data. Some things in the traditional education system have not changed much, though, and you still have two generic approaches to acquiring a data career.

Practical Degrees

The first approach is to get a computer information systems degree from a university, as I did. The study of information systems has been around since the inception of the first computers, which took up the space of a large hallway. Although this major used to be a study of how we process data into information, it now takes a more modern, practical approach, giving you different side business projects to solve while you learn the ins and outs of a specific programming language. For instance, I had an interesting project where I had to pull various weather information from a website and let the user search through it via several options in a Java application. Or take one of my last classes, where I had to create a C# application that let users search a dashboard of available hotels by various criteria and produce a report from it built with Crystal Reports. Don't worry if you missed doing that as a student; you can always catch up with side projects that interest you in your free time.

Besides teaching you a lot about marketing, management, operations, economics, accounting, product development, statistics, and business law and ethics, this major gives you a head start in your career and enough common sense to find creative ways to apply your technical skills to the bottom line of the organization you work for, something a computer science curriculum does not provide. Personally, the first projects I did on the job did not feel like much of a challenge, because they were similar to the ones I faced in college, and I rarely lost sight of the big picture of how our software was used by our clients, because I already knew the general facets of how a business operates, regardless of shape and form.
However, if you did not take such a degree path, you can still gain the same knowledge these days by reading tech articles and magazines. Before the 2008 recession, people played it safe with investments that still returned high interest. Now that those past investments have proven unsustainable, people are more open to investing in creative ideas with the potential to make our economies more effective and efficient, and our existing services more convenient. What you would learn in a business course can now be learned by reading the success stories of startups and of companies trying to reinvent themselves.

Theoretical Degrees

The other approach to a career in data is a computer science degree. Although a computer science degree will not immediately signal that you are data literate after graduation, it does show that you have mastered the craft of building solutions that are easy to maintain and can scale. In my time, my business major and my first job both revolved around Microsoft Windows as an operating system and compiled languages like C# and Java for graphical user interfaces. Although these tools deliver tangible results, in terms of productivity Linux and interpreted languages outperform the work you can output in the same amount of time. Fret not if you did not graduate as a computer scientist: I was able to learn Linux and interpreted languages later in my career, but it took me at least a year to start using them to their fullest potential. A computer science degree also teaches you the fundamental concepts of how programming languages operate. You will need to know those concepts well, along with a lot of math, to solve the homework in your classes. Because the material of a computer science degree is so challenging, from what I learned from colleagues who studied in this major, you will not get an easy grade compared to other majors. On the other hand, many of my colleagues in the information systems department did not get good grades either, because they were taking their courses while holding a full-time or part-time job. In a computer science degree it is very hard to juggle both at once unless you can manage your time effectively while still being psychologically happy with your life.
Although a computer science degree does not give you the basic business intuition you need inside an organization, it gives you a head start on the interview questions you will have to answer, as they are most likely computer science related, especially at big multinational corporations. Furthermore, the foundations you learn in a computer science degree will help you tremendously in how you approach solving business problems. I think a computer science degree changes your perspective and your mindset on how you interact with the tools you use in your daily work. If you did not venture down this path, there are many books that can prepare you for a technical interview, as well as teach you the basic foundations of your trade so you can master every nook and cranny of your field. To me, that is the difference between a driver who drifts their car with the gearbox in automatic mode and the driver who overtakes them shifting in manual mode. The former is merely a professional, while the latter is a real expert in the field.

Although practical degrees can put you ahead of the curve, theoretical degrees provide foundations that come in handy when problems are more complicated or too big to manage. Do get your hands dirty, but also learn a tip or two from those who have mastered their craft.

On Advice and Self-Assessment

Whichever track you choose in your education, you will have to learn the other side of the coin on your own. This is critically important. To this day I see many people completely divided on this topic, directly or indirectly. That is not the right way to build a holistic view of everything a data engineer is involved in. You cannot expect, without doing enough side projects in a domain you know nothing about, to successfully deliver even the most basic requirements of a project under a tight deadline. You cannot expect a drag-and-drop third-party program to be your salvation on every occasion. And no matter what others tell you, it is an embarrassment to deliver your specialties without caring deeply about how a general business operates. It shows that your career is motivated solely by the technical specialties you learn, instead of an entrepreneurial mindset where you are honestly passionate about a specific business domain or topic.

For instance, one colleague recommended a book that takes a more in-depth look at the fundamentals of relational databases. Although I had learned how normalization works and the various ways to join tables in my college years, and had grasped those concepts more concretely through my years of work experience, the book I was advised to read was a game changer. It helped me look at databases from a completely different perspective and start paying attention to things I had not considered before. Furthermore, I was lucky enough to meet and work with a team of developers who did all their programming in interpreted languages in a Linux environment, instead of doing the bulk of their work through drag and drop, plug and play. Although they never gave me direct advice, I noticed from job descriptions and blogs that Linux and interpreted languages were becoming an important medium in most workplaces. Reading the situation, I took the initiative to handle most of my tasks with an interpreted language and every command line tool a Linux environment offers.

When you get specific advice, do not ignore it; it may help you tremendously later in your career. If your work gets rejected with no feedback, or the feedback is so vague that it is not constructive, that communication is more of a transaction than a relationship that deeply cares about the progress of your career. Learn to tell those two types of feedback apart, whether offline or on online Q&A portals, and focus on conversations that can create long-lasting relationships rather than transactions. Some people cannot offer more, either because their life is already fully booked or because something about the exchange puts them off and they do not have the mood or energy to argue about it. Respect each individual's situation, and there will always be kindred spirits with the time to provide honest, constructive feedback. When that opportunity is not available, read the environment, be a self-starter, and learn the skills that are guaranteed to be prevalent in the future.

CHAPTER 1: DATA EXPLORATION

Your job is to sail the winds in the way that spares your ship the most damage, not the way that reaches your destination as fast as possible.

Remember, it is better for your crew to wait a little longer to arrive at the treasure than to get your boat shipwrecked.

When I first started my career, I was fortunate enough to have ample time to explore how best to get the data I needed for a deliverable without compromises: no using resources inefficiently, no source code that would be harder to maintain later. Many of you may not have had the opportunity to start in such an environment. You may instead have an impatient businessman whose non-verbal expressions show his annoyance, living by the mantra "if it works with duct tape, you don't need to fix it." The end result is that the inefficient queries you wrote last year worked great until, quite recently, they stalled the entire system. They mistakenly assumed you knew how to write efficient queries, and now, given the urgency, you have no choice but to let a senior on your team optimize the inefficient queries you wrote.

Regardless of your situation, it is time to stop letting your crew set the direction of your ship, and to gain the proper expertise in reading the waves that approach your ship by reading the weather patterns. Mastering data exploration, rather than getting the insights as fast as possible, should be the first priority for a captain sailing a data lake with a container of their choice. Otherwise, you won't be any different from the rest of the crew. Your fellow pirates may dig up gold for you so you have enough supplies for the rest of the journey, but if they don't know how to get past the tides properly, they won't last long, and that is the last thing a captain wants for his crew. The basics of data exploration a data engineer should know are:

SQL Keywords: Knowing the order of execution of the basic keywords of a query and how to use them properly (SELECT, FROM, WHERE, GROUP BY, HAVING).

Aggregates, Data Types & Manipulation: The ability to understand the data type of each column, manipulate strings, use conditional statements, and aggregate data by picking the max, min, sum, average, median, and so on.

Joins: The ability to pick the most appropriate joins across several tables, including left join, right join, inner join, full outer join, union, and union all.

Null Values: The ability to understand what NULL represents in a column and the corresponding Boolean logic when evaluating against null values (comparisons with NULL yield neither true nor false).

Cardinality: Understanding how the records of a table are uniquely identified, and the ability to recognize whether the join of two tables is a one-to-zero, one-to-one, or one-to-many relationship.

Window Functions: The ability to use window functions that involve partitioning, sorting, and ranking records (row_number, rank, dense_rank), getting values from previous or subsequent records (lag, lead), and computing cumulative sums (rows between unbounded preceding and current row).

Advantages of Each Database: Understanding the advantages and disadvantages of relational and columnar databases, and how to use indexes and partition columns accordingly.
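Most of the concepts above can be tried out in minutes without any infrastructure. The sketch below uses Python's built-in sqlite3 module; the table names and data are made up for illustration, and the window-function query assumes SQLite 3.25 or newer:

```python
import sqlite3

# In-memory database with two made-up tables for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Eve');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
""")

# GROUP BY / HAVING: the logical execution order is FROM -> WHERE ->
# GROUP BY -> HAVING -> SELECT, even though we write SELECT first.
big_spenders = conn.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 10
""").fetchall()
print(big_spenders)  # [(1, 30.0)]

# LEFT JOIN keeps customers with no orders; their aggregated amount
# comes back NULL, which arrives in Python as None.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)  # [('Ann', 30.0), ('Bob', 5.0), ('Eve', None)]

# Window function: cumulative sum of each customer's orders
# (requires SQLite >= 3.25 for window-function support).
running = conn.execute("""
    SELECT customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM orders
    ORDER BY customer_id, id
""").fetchall()
print(running)  # [(1, 10.0, 10.0), (1, 20.0, 30.0), (2, 5.0, 5.0)]
```

Notice how Eve, who has no orders, survives the left join with a None total, and how the running sum resets for each partition; these are exactly the null-value and window-function behaviors listed above.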

Although all of the above are easy to learn, they are very hard to master. Courses with exercises and assignments can teach you the basics quickly, but more challenging problems on the same concepts can leave you lost, or reveal that you have not grasped the topic completely. As a matter of fact, you may not face any difficult challenges in the data exploration you do in your side projects; you will only encounter them in greater undertakings that involve exploring big volumes of data.

CHAPTER 2: DATA PIPELINES

Great boats are not bought from others. Great boats are built from scratch, retaining a great bond with their owner from the first days they are raised.

Creating a data pipeline requires you to learn different types of data operations, each of which takes time to master. Setting up proper data models, writing reusable general scripts, and taking advantage of what each database offers all help make data pipelines more efficient.

Running a ship from a great owner is great, but building your own shows you can take complete charge of whole journeys across terrain never explored before. Not all journeys fit the same ship the masters handed you the keys to: some paths are so narrow that only small, thin ships can pass through, while others demand something larger than what you currently have at your disposal, due to the extreme winds and thunderstorms you have to cross. It is fun to find ways to steer your ship effectively to different destinations, but it is even more fun to take charge of what type of ship you must build ahead of your next journeys. This expands your responsibilities as a data engineer: having a long-term strategy for the things you create becomes more important than ever before. Creating data pipelines includes the following:

Config: The ability to set up and configure the appropriate packages on your compute instances.

Data Operations: The ability to perform various data operations, such as downloads, imports, and exports across databases, data lakes, local instances, and FTP servers.

Database Model: The ability to optimize existing pipelines by using the right data models, such as storing your historical data more efficiently (SCD Type II) and effectively (by using partitions), as well as using a star or snowflake schema to store slowly changing fields and attributes in dimensions and fast-changing fields and transactions in facts.

Replication: The ability to update tables incrementally via upserts, replicate metadata changes from one database to another, and have the server retry attempts that failed due to session timeouts and temporary internal server errors.

Scripts: The ability to write general scripts that can be reused for all the data operations you need to run within your project.

Knowledge of Various Database Types: The ability to take full advantage of the different features each type of database or instance provides.
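Two of these ideas, incremental updates via upserts and retrying transient failures, can be sketched with Python's built-in sqlite3 module. The table, column names, and retry policy below are illustrative assumptions; the upsert syntax requires SQLite 3.24 or newer, and other databases express the same idea with MERGE or similar statements:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)")

def upsert_users(conn, rows):
    """Incremental load: insert new ids, update existing ones
    (SQLite >= 3.24 upsert syntax)."""
    conn.executemany("""
        INSERT INTO users (id, email, updated_at) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            email = excluded.email,
            updated_at = excluded.updated_at
    """, rows)

def with_retries(operation, attempts=3, delay=1.0):
    """Retry a flaky data operation (e.g. a timed-out session) with a
    simple linear backoff; re-raise once the attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except sqlite3.OperationalError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)

# The first load inserts; the second updates the overlapping id in place.
with_retries(lambda: upsert_users(conn, [(1, "a@x.com", "2024-01-01")]))
with_retries(lambda: upsert_users(conn, [(1, "a@y.com", "2024-02-01"),
                                         (2, "b@x.com", "2024-02-01")]))
print(conn.execute("SELECT id, email FROM users ORDER BY id").fetchall())
# [(1, 'a@y.com'), (2, 'b@x.com')]
```

The same pattern, a pure data operation wrapped by a generic retry helper, scales to any server-side replication job where the failure is temporary rather than a bug in the query.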

Although setting up a local relational database is as easy as typing three command lines into your operating system, other proprietary databases require you to connect to their API or use their proprietary executables and libraries to take full advantage of their features. Open source tools are even harder to set up and maintain, since not everything is tailored to your needs out of the box. Beyond setting up instances, if you have never performed a given data operation before, it will take some time before you master writing the perfect script for doing it effectively. Every data operation in your project will have some element that is unique, or that you never encountered in previous projects. Regardless, no matter what project you are working on, there will always be a way to identify data operations that closely resemble others, where the only difference is the parameters you pass. In those cases, writing helper functions or general scripts that accept different types of parameters abstracts away most of the code you would otherwise have repeated several times.

The data exploration you mastered previously will help you a lot with the rest of the upkeep: setting the right data models and making sure you can replicate them on other instances, even when your server has an outage. Although building a ship is challenging, keeping the ship out of danger is also a formidable task. How do you know you did not hit an iceberg in the middle of the night, and how do you evaluate the impact of such a collision to remedy it before the whole ship starts to sink? What do you do when your ship comes across something unexpected, like a big whale swimming by?
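The idea of abstracting repeated operations behind one parameterized helper might look like the sketch below. The helper name, the sales table, and the file layout are assumptions for illustration, and the SQL is assembled from trusted project configuration, not user input:

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

def export_table(conn, table, out_dir, columns="*", where="1=1", suffix=""):
    """Generic export helper: the same code serves every table in the
    project, with only the parameters changing per data operation."""
    out_path = Path(out_dir) / f"{table}{suffix}.csv"
    cursor = conn.execute(f"SELECT {columns} FROM {table} WHERE {where}")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([d[0] for d in cursor.description])  # header row
        writer.writerows(cursor)                             # data rows
    return out_path

# One helper, many operations: only the parameters differ per call.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER, region TEXT, amount REAL);
INSERT INTO sales VALUES (1, 'EU', 9.5), (2, 'US', 4.0);
""")
out_dir = tempfile.mkdtemp()
full = export_table(conn, "sales", out_dir)
eu = export_table(conn, "sales", out_dir, where="region = 'EU'", suffix="_eu")
print(full.read_text().splitlines()[0])  # id,region,amount
```

Each new export in the project then becomes a one-line call instead of another copy of the open-query-write loop.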

CHAPTER 3: MONITORING TOOLS

Turn the flashlights on, the speakers on, keep a toolbox and a map on board, and have someone oversee the surrounding area from the top. It is better to be ready for any event that might surprise us, so we do not miss our spot!