by Jeff Magnusson, Charles Smith, John Lee, and Nathan Bates

We’re pleased to announce Lipstick (our Pig workflow visualization tool) as the latest addition to the suite of Netflix Open Source Software.

At Netflix, Apache Pig is used heavily amongst developers when productionizing complex data transformations and workflows against our big data. Pig provides good facilities for code reuse in the form of Python and Java UDFs and Pig macros. It also exposes a simple grammar that allows our users to easily express workflows on big datasets without getting “lost in the weeds” worrying about complicated MapReduce logic.

While Pig’s high level of abstraction is one of its most attractive features, scripts can quickly reach a level of complexity at which the flow of execution, and its relation to the MapReduce jobs being executed, become difficult to conceptualize. This tends to prolong and complicate the effort required to develop, maintain, debug, and monitor the execution of scripts in our environment. To address these concerns, we have developed Lipstick, a tool that enables developers to visualize and monitor the execution of their data flows at a logical level.

Lipstick was initially developed as a stand-alone tool that produced a graphical depiction of a Pig workflow. While that was useful, we quickly realized that combining the workflow with information about the job as it ran gave the developer insight that previously required a lot of sifting through logs (or a Pig expert) to piece together. Now, as an implementation of Pig's PigProgressNotificationListener interface, Lipstick piggybacks on top of all Pig scripts executed in our environment, notifying a Lipstick server of job executions and periodically reporting progress as the script executes.
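
To make the listener idea concrete, here is a minimal Java sketch of the pattern. It is not Lipstick's code and does not use Pig's actual listener interface: `ProgressListener`, `EventSink`, and `ReportingListener` are hypothetical names invented for illustration, and the "server" is just an in-memory list standing in for whatever a real implementation would send over the network.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {

    /** Hypothetical stand-in for a progress-listener callback interface. */
    interface ProgressListener {
        void jobStarted(String scriptId, String jobId);
        void progressUpdated(String scriptId, int percent);
        void jobFinished(String scriptId, String jobId);
    }

    /** Hypothetical destination for events; a real one might call a remote server. */
    interface EventSink {
        void send(String event);
    }

    /** Forwards every callback it receives to the sink as a short text event. */
    static class ReportingListener implements ProgressListener {
        private final EventSink sink;

        ReportingListener(EventSink sink) {
            this.sink = sink;
        }

        @Override
        public void jobStarted(String scriptId, String jobId) {
            sink.send("started " + scriptId + " " + jobId);
        }

        @Override
        public void progressUpdated(String scriptId, int percent) {
            sink.send("progress " + scriptId + " " + percent);
        }

        @Override
        public void jobFinished(String scriptId, String jobId) {
            sink.send("finished " + scriptId + " " + jobId);
        }
    }

    /** Simulates a script run by invoking the callbacks in order. */
    static List<String> demo() {
        List<String> received = new ArrayList<>();
        ProgressListener listener = new ReportingListener(received::add);
        listener.jobStarted("script-1", "job_001");
        listener.progressUpdated("script-1", 50);
        listener.jobFinished("script-1", "job_001");
        return received;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The point of the design is that the script itself needs no changes: the execution engine calls the listener as jobs start, progress, and finish, and the listener decides what to report and where.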