Big Data Killed the Data Modeling Star

Big data offers BI professionals new ways of making information work for the business.

By Paul Sonderegger, Chief Strategist, Endeca Technologies, Inc.

MTV's first video, "Video Killed the Radio Star," captured TV's disruption of the music industry. Big data is disrupting the BI industry in a similar way -- changing what BI teams do and how they do it.

It's not immediately obvious why this should be. Shouldn't big data be like a DBA right-to-work act? If managing data is what BI teams do today, a greater supply of information should mean their skills are in greater demand. That's true up to a point, but big data shoots past that point, inverting the relationship.

The volume of big data is such a change in degree that it's a change in kind. It's like running a zoo where every morning the number of animals you have grows by orders of magnitude. Yesterday you had three lions. Today you have 300.

Big volume isn't even the big story. The big story is the variety of data and its velocity of change. This is like running a zoo where the number of animals shoots up every morning as does the number of kinds of animals. Yesterday you had 300 lions. Today you have 30,000 animals, including lions, hummingbirds, giant squid, and more.

The biggest bottleneck in making this data menagerie productive is labor. In a big data world, data modeling, integration, and performance tuning are governors of data use because they rely on relatively slow manual processes done by relatively expensive specialists. In an ironic twist, the substitution of computing capital for labor that transformed other business processes (such as inventory management, manufacturing, and accounting) will do the same to information management itself.

Take the relatively simple case of a data mart with fast-growing volume. As the volume of data grows, query performance tuning becomes both more important and more difficult. Performance tuning requires trade-offs. For example, pre-aggregating the data improves query response but cuts the user off from detailed data that may be valuable for certain investigations. As data volume grows, more aggregation may be required, eliminating levels of detail that used to be available. When the users rebel, the BI team has to haggle over remediation and strike a new balance. This time-consuming approach is simply unaffordable in a big data world.
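The pre-aggregation trade-off can be sketched in a few lines. This is a minimal illustration with made-up sales rows (the column names and values are hypothetical, not from any particular data mart): rolling detail rows up to daily totals makes trend queries cheap, but the rolled-up columns can no longer answer detailed questions.

```python
from collections import defaultdict

# Hypothetical detail rows: (day, store, product, amount).
detail = [
    ("2011-06-01", "A", "widget", 10.0),
    ("2011-06-01", "A", "gadget", 5.0),
    ("2011-06-01", "B", "widget", 7.5),
    ("2011-06-02", "A", "widget", 3.0),
]

# Pre-aggregate to one row per day: trend queries now scan far fewer rows.
daily_totals = defaultdict(float)
for day, store, product, amount in detail:
    daily_totals[day] += amount

# Fast: total sales for June 1 is a single lookup instead of a scan.
print(daily_totals["2011-06-01"])  # 22.5

# Lost: "which store sold the most widgets on June 1?" cannot be answered
# from daily_totals -- the store and product columns were rolled away.
```

Every level of roll-up the BI team adds for speed is a class of user questions it quietly removes, which is exactly what users eventually rebel against.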

Removing this bottleneck is what data warehouse appliances are all about, including those from Netezza (now IBM) and Vertica (now HP), plus SAP's HANA and Oracle's Exalytics appliances. Dramatic increases in processing horsepower from in-memory architectures, as well as faster look-ups thanks to the improved compression and organization of columnar stores, make performance tuning through model-tweaking a thing of the past.
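The columnar idea behind several of these appliances can be shown with a toy comparison (a sketch in plain Python, not any vendor's actual storage format): a row store must touch whole rows even for a one-column aggregate, while a column store holds each column as its own array, so the same aggregate scans only the bytes it needs -- and runs of repeated values in a column compress well.

```python
# Row store: every row is a record; a one-column aggregate still reads
# the region and price fields it does not need.
rows = [
    {"region": "east", "units": 10, "price": 2.0},
    {"region": "west", "units": 4, "price": 3.5},
    {"region": "east", "units": 6, "price": 2.0},
]
row_total = sum(r["units"] for r in rows)

# Column store: the same table as one array per column. The aggregate
# scans only the "units" array, and repetitive columns like "region"
# are good candidates for compression (e.g. run-length encoding).
columns = {
    "region": ["east", "west", "east"],
    "units": [10, 4, 6],
    "price": [2.0, 3.5, 2.0],
}
col_total = sum(columns["units"])

assert row_total == col_total  # same answer, far less data touched
```

With in-memory hardware and this kind of layout doing the work, the speed that once had to be hand-built through aggregates and model tweaks comes from the engine itself.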