Users have always wished for less dependence on vendors. However, that comes at a price. As they navigate the twists and turns of today's big data ecosystem, they take on responsibilities that were once the vendors', at least to some degree.

The new style of data engineering calls for a heaping helping of DevOps -- an extension of Agile methods that requires developers to take more responsibility for how their applications perform in production. At the same time, engineers are required to learn new software at a breathtaking pace.

Of late, users have had to connect the dots represented by a steady parade of open source tools. This might be called the curse of interesting times.

Teams have also had to switch components in and out at a fairly swift clip. The most vivid example: many early adopters built MapReduce-based Hadoop applications, only to redo them using the Spark processing engine.

Streaming in flux

Other examples of component hopping abound, with a variety of young open source offerings to sort through for Hadoop SQL querying tools, machine learning and other capabilities. Telling examples have emerged from the open source data streaming space, which is evolving along with a new class of real-time systems that go beyond batch processing.

In streaming, the tools are in more than a little flux. Early contender Apache Storm took a back seat to Apache Spark, which, in turn, found Apache Flink nibbling at its heels -- and all of this occurred in just a few quick years.

This is the very nature of modern data engineering, according to no less a personage than Hadoop co-creator Doug Cutting, chief architect at Cloudera. Today, people have to be ready to experiment with software components, he said.

In fact, it is not hard to find shops that have worked with several streaming architectures, involving a lot of on-the-job learning. As Spark moves to add record-at-a-time-style streaming via the recently announced Drizzle add-on, more learning will be in the offing.

This anticipated Spark update comes as teams still work to find out how streaming itself works in the context of a larger big data ecosystem. It was clear in technical sessions at last month's Spark Summit East conference that a whole lot of tuning was happening.

What that means is data engineers are figuring out ways to monitor streaming apps to stay ahead of processing failures. They are finding out how the components work in different combinations. Such application hardening is a big part of moving from proof of concept to production. End users are now a part of this quest, just as much as vendors.
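The kind of monitoring described above can be sketched in plain Python. This is a minimal illustration only -- the class, function and threshold names below are assumptions for the sketch, not any particular framework's API. It captures two classic early-warning signs for micro-batch streaming jobs: processing that runs longer than the batch interval, and a growing backlog of batches waiting to start.

```python
class BatchStats:
    """Metrics reported for one processed micro-batch (illustrative)."""
    def __init__(self, batch_time, processing_delay_s, scheduling_delay_s):
        self.batch_time = batch_time                  # when the batch was due
        self.processing_delay_s = processing_delay_s  # time spent processing it
        self.scheduling_delay_s = scheduling_delay_s  # time it waited to start

def check_health(stats, batch_interval_s=5.0, max_backlog_s=30.0):
    """Flag warning signs before the job actually fails.

    Returns a list of alert strings; an empty list means the batch
    looks healthy.
    """
    alerts = []
    # If each batch takes longer to process than the interval between
    # batches, the job falls steadily behind its input stream.
    if stats.processing_delay_s > batch_interval_s:
        alerts.append("processing slower than batch interval")
    # If batches sit in the queue too long before starting, a backlog
    # is accumulating even though individual batches may finish fine.
    if stats.scheduling_delay_s > max_backlog_s:
        alerts.append("backlog exceeds %.0fs" % max_backlog_s)
    return alerts
```

In use, a healthy batch (2 seconds of work against a 5-second interval) yields no alerts, while a batch that took 12 seconds and waited 45 seconds yields both. Real deployments wire checks like these into whatever metrics hooks their engine exposes, but the underlying arithmetic is this simple.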