At the StrataHadoop conference in Barcelona last week, Rod Smith, Vice President of the IBM Emerging Internet Technologies organization, presented an internal product that his team has been developing during consulting work with clients. The product integrates data sources and data analysis. Using an IPython-style web-based notebook interface, users can search for data sets, extract data, and build visualizations that are embedded in the document in real time.

Smith says that business people only know what they want when they see it, so it's crucial to have a platform where they can quickly develop and prototype ideas together with data scientists. The tool, which has only been used internally so far, also has a more technical view where one can interface directly with Python and Spark. Finally, the data analysis results can easily be exposed via a REST interface, for example to build JavaScript-based visualizations.
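To illustrate what exposing analysis results over a REST interface can look like, here is a minimal sketch using only the Python standard library. It is not based on the IBM tool, whose API was not shown; the `RESULTS` data, the `/results/<name>` URL scheme, and the function names are all assumptions made for the example.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical analysis output; in a notebook this would come from a Python or Spark job.
RESULTS = {
    "sales_by_region": [
        {"region": "north", "total": 120},
        {"region": "south", "total": 95},
    ]
}


def app(environ, start_response):
    """WSGI app: GET /results/<name> returns the named result as JSON."""
    path = environ.get("PATH_INFO", "")
    prefix = "/results/"
    name = path[len(prefix):] if path.startswith(prefix) else None
    if name in RESULTS:
        status, payload = "200 OK", RESULTS[name]
    else:
        status, payload = "404 Not Found", {"error": "unknown result"}
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def serve(host="localhost", port=8000):
    """Serve the results until interrupted (assumed host and port)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

After calling `serve()`, a JavaScript front end could fetch `http://localhost:8000/results/sales_by_region` and feed the returned JSON to a charting library.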

The IPython notebook is a feature of IPython that presents an interface similar to a text console in a web browser. It has since been extended to integrate plotting capabilities, turning it into a kind of operating-system-independent console with graphics output. Zeppelin is an open source project with a similar notebook interface that also covers other languages such as Scala or SQL.

This approach emerged as a theme during the conference. Shawn Scully from GraphLab presented their new data analysis product, GraphLab Create, which follows a very similar approach. In a live demo, Scully put together a recommender in a web notebook and then deployed the learned model from there. Their goal is to provide a simple tool that allows data scientists to quickly create what they call prediction apps. Scully said that instead of prototyping a system in one programming language and then reimplementing the pipeline in order to deploy it, data scientists can deploy the pipeline directly without having to change toolsets.
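The idea of deploying the very object that was prototyped, rather than reimplementing it, can be sketched in a few lines of plain Python. This is not GraphLab Create's API; the `CooccurrenceRecommender` class, the toy basket data, and the use of `pickle` as the hand-off between notebook and production are all assumptions chosen for illustration.

```python
import pickle
from collections import Counter
from itertools import permutations


class CooccurrenceRecommender:
    """Toy recommender: suggests items that most often appear together."""

    def fit(self, baskets):
        # counts[a][b] = number of baskets that contain both a and b
        self.counts = {}
        for basket in baskets:
            for a, b in permutations(set(basket), 2):
                self.counts.setdefault(a, Counter())[b] += 1
        return self

    def recommend(self, item, k=3):
        # Unknown items simply yield no recommendations.
        neighbours = self.counts.get(item, Counter())
        return [other for other, _ in neighbours.most_common(k)]


# "Notebook" step: train on hypothetical purchase data.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "jam"],
    ["bread", "butter"],
]
model = CooccurrenceRecommender().fit(baskets)

# "Deployment" step: serialize the trained model and load it unchanged elsewhere.
blob = pickle.dumps(model)
deployed = pickle.loads(blob)
top_item = deployed.recommend("bread", k=1)
```

Here the deployed model is byte-for-byte the prototype, so there is no second implementation to keep in sync; a production system would additionally need versioning and a serving layer around it.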

Prediction APIs were also the focus of the PAPIs.io conference, which took place in the two days before the StrataHadoop conference. According to Louis Dorard, who organized the conference, bridging the gap between analyzing a data set and putting the results into production is still a big challenge, and he sees a lot of potential for innovation in that area.

One of the first prediction APIs was Google's Prediction API, first released in 2010. Lately, offerings like Azure ML from Microsoft or Databricks Cloud provide a similar approach: a unified web-based interface to quickly prototype and deploy data analysis solutions.

Asked whether these different products will eventually be merged into one unified solution, Rod Smith said that different audiences still require different kinds of solutions. Tools like Databricks Cloud, Azure ML, or GraphLab are geared more towards data scientists, who are also proficient in programming, while his work focuses more on presenting a clean interface where data scientists and business people can interact well. In short, "notebooks will be the new spreadsheets."