ETL made easy

Loading Big Data is usually a challenging ETL task, but there are still scenarios in which you can load gigabytes or terabytes of data from Oracle into BigQuery (Google Cloud's analytics data warehouse) relatively easily and very effectively.

Data preparation and transformation take up nearly 80% of a Data Scientist's working time.

In my case, the diagram below shows the cloud architecture that we created for our customer, using Machine Learning libraries like pandas, numpy, and others for Data Science. Our customer's database table was imported into BigQuery for further data preparation and transformation using Google Cloud Dataprep.

I would like to mention the excellent work of Trifacta, the company that developed Dataprep together with Google.

This tool is really useful and time-saving; let's just say that it made our Data Scientists smile.

Data pipeline

Steps:

1. Export Oracle database table to CSV.

You can also use a third-party ETL tool to connect your database directly to BigQuery, as described in this link, which eliminates some of the steps described below.
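If you do the export in step 1 by hand, it could be sketched in Python roughly as follows. This is only a minimal illustration, not the exact method we used: the function name export_table_to_csv, the table name, and the batch size are hypothetical. It assumes a standard Python DB-API connection, such as one opened with the python-oracledb driver. Rows are fetched in batches so that a large table never has to fit in memory.

```python
import csv


def export_table_to_csv(connection, table_name, csv_path, batch_size=10_000):
    """Stream all rows of table_name into csv_path and return the row count.

    `connection` is any DB-API 2.0 connection (assumption: for Oracle this
    would come from a driver such as python-oracledb).
    """
    cursor = connection.cursor()
    try:
        # The table name is interpolated into the SQL text, so it must come
        # from trusted input, never from end users.
        cursor.execute(f"SELECT * FROM {table_name}")

        # The first field of each description entry is the column name.
        columns = [col[0] for col in cursor.description]

        rows_written = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(columns)  # header row
            while True:
                batch = cursor.fetchmany(batch_size)
                if not batch:
                    break
                writer.writerows(batch)
                rows_written += len(batch)
        return rows_written
    finally:
        cursor.close()
```

With Oracle, the connection would typically be created with something like oracledb.connect(user=..., password=..., dsn=...), and the call would then be export_table_to_csv(conn, "CUSTOMERS", "customers.csv"), where CUSTOMERS is a placeholder table name. For very large tables you may prefer to split the output into several files, but that depends on your setup.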