Full disclosure: As this article is authored by an ETL-centric company with its strong suit in manipulating big data outside of databases, what follows will not seem objective to many. Nevertheless it is still meant to present food for thought, and opens the floor to discussion.

Since their beginnings, data warehouse architects (DWA) have been tasked with creating and populating a data warehouse with disparately sourced and formatted data. Due to dramatic growth in data volumes, these same DWAs are challenged to implement their data integration and staging operations more efficiently. The question of whether data transformation will occur inside or outside the target database has become a critical one because of the performance, convenience, and financial consequences involved.

In ETL (extract, transform, load) operations, data are extracted from different sources, transformed separately, and loaded to a DW database and possibly other targets. In ELT, the extracts are fed into the single staging database that also handles the transformations.

ETL remains prevalent because the marketplace flourishes with proven players like Informatica, IBM, Oracle — and IRI with Voracity, which combines FACT (Fast Extract), CoSort or Hadoop transforms, and bulk loading in the same Eclipse GUI — to extract and transform data. This approach prevents burdening databases designed for storage and retrieval (query optimization) with the overhead of large-scale data transformation.

However, with the development of new database technology and hardware appliances like Oracle Exadata that can handle transformations ‘in a box’, ELT may be a practical solution under certain circumstances. And there are specific benefits to isolating the staging (load) and semantic (transform) layers.