If you have ever had to build a data processing and visualisation system, you know how much work it is.

And I’m not even talking about writing the actual processing algorithms and creating visualisations. Unless such systems are the core of your business and you’re creating them on a regular basis, you’ll be spending most of your time scaffolding out your backend as well as your frontend and making sure they interact correctly.

And we all know that doing front-ends in 2015 is quite a challenge — just look at all the tools, frameworks and libs you may want to use for it!

The backend is not too different — there are plenty of languages, frameworks and approaches to pick from.

And then you have to worry about storing results in a database, caching, and all those other minor things that I have forgotten right now.

Oh, and what if you need real-time processing? And don’t forget the scaling — what if you need to process large amounts of data? Can your system scale?

As you can see, there are a lot of things to do here. And that’s something that people who deal with data processing have been doing for quite some time. I’ve been doing it too — I have written a bunch of code scaffolders for most of those tasks. The problem is that they never quite work 100%: you always need to tweak something here and there (especially when you need scalable systems). And that still takes a lot of time away from writing the actual data processing code.

The Solution — Exynize platform

To solve all of those issues, my colleagues at AKSW and I came up with the idea of a platform that takes care of all that boilerplate. Delegating all the boring parts to the platform allows developers to focus on the most important bit — data processing and visualisation.

Thus Exynize (short for “Extract, Synchronize, Analyze”) was born.

After a few very early prototypes that worked but did not simplify the workflow enough, we came to the current version of the platform.

A platform that allows:

constructing pipelines right in your browser with very little effort,

writing processing components as if you were dealing with a single data item,

re-using existing processing modules in new pipelines,

creating real-time processing and visualisation without thinking about doing real-time at all,

spending time on doing actual work, not fiddling with scaffolding.

Sound interesting? Then read on — I’m going to show you two simple demos and explain how the system works under the hood.

If you fancy video presentations rather than text, here’s my screencast with those demos. If you prefer reading, just scroll down a bit — everything in the video is covered in the text below as well.

We’ll start by looking at the demo cases first, and after that we’ll look under the hood of the platform.

But before that I want to mention that the Exynize platform is currently built with JavaScript. The backend is based on Node.js, Express.js and RethinkDB, while the front-end uses React.js and Twitter Bootstrap. All of that with a sweet taste of Babel.js (so don’t be surprised to see ES6 code). Once again — we’ll go into more detail on that after going through the use cases.

Use Case 1: Twitter product comparison

For the first use case, let’s compare how people talk about three new smartphones (let’s say iPhone 6s, Nexus 6p and Galaxy S6) on Twitter. Here’s what we need to do:

1. Take a Twitter feed filtered by the phone models (and by English language, for simplicity)

2. Calculate sentiment for the text of each tweet

3. Display the resulting sentiments and the last 10 tweets in a three-column layout

Let’s start by writing a simple Twitter source component. Here’s how it’ll look:

Twitter source component
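A rough sketch of such a component (the credential and filter parameter names here are illustrative, only “obs” is fixed by the platform; it assumes the usual streaming API of the “twit” package) might look something like this:

```js
// Sketch of a Twitter source component (illustrative, not the original code).
// All parameters except "obs" become input fields in the Exynize UI.
import Twit from 'twit';

export default (consumerKey, consumerSecret, accessToken, accessTokenSecret, track, obs) => {
    // create a twit client using the credentials entered in the UI
    const client = new Twit({
        consumer_key: consumerKey,
        consumer_secret: consumerSecret,
        access_token: accessToken,
        access_token_secret: accessTokenSecret,
    });
    // open a filtered real-time stream for the given keywords, English only
    const stream = client.stream('statuses/filter', {track, language: 'en'});
    // push every incoming tweet into the pipeline via the observer
    stream.on('tweet', tweet => obs.onNext(tweet));
    // propagate stream errors to the pipeline
    stream.on('error', err => obs.onError(err));
};
```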

Hopefully, the comments are enough to make sense of the code — it’s pretty straightforward.

There are three Exynize-specific things here:

1. As you might’ve noticed, I imported the NPM package “twit”. You are indeed allowed to use npm packages, but for the moment they are limited to a whitelisted (for security reasons) subset. That might change in the future once I figure out a better way to sandbox components.

2. The main function is exposed using ES6 “export default”. This is a rule for all components.

3. All the non-default arguments (so, all aside from “obs”) will be turned into input fields in the UI. We’ll go into more detail on that in the architecture part. The last parameter of the function — “obs” — is an Rx.Observer that is used to dispatch new data. It is part of the awesome RxJS library that is used to assemble the pipelines; we’ll also look at it in more detail in the architecture part.

If you are not familiar with Observables, for the moment just think of them as promises that can resolve more than once.
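As a quick illustration (plain RxJS, nothing Exynize-specific), an Observable can emit several values over time and then complete:

```js
import Rx from 'rx';

// unlike a promise, an Observable can deliver many values over time
const ticks = Rx.Observable.interval(500).take(3);
ticks.subscribe(
    i => console.log('value:', i),   // fires three times: 0, 1, 2
    err => console.error(err),       // error handler
    () => console.log('completed')   // completion handler, fires once
);
```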

As you can see, it’s pretty easy to create a reusable Twitter source in Exynize.

Now let’s create a sentiment analysis processor. We’re not going to do anything crazy here and just use a basic AFINN-based sentiment analysis.

Here’s how the source code will look:

Sentiment analysis processor
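As a rough sketch (assuming the AFINN-based “sentiment” package from npm is on the whitelist, and attaching the score under a made-up “sentiment” field), the processor could look something like this:

```js
// Sketch of a sentiment analysis processor (illustrative, not the original code).
import Rx from 'rx';
import sentiment from 'sentiment';

// a processor receives a single tweet and returns an Rx.Observable with the result
export default (tweet) => {
    // run the AFINN-based analysis on the tweet text
    const result = sentiment(tweet.text);
    // attach the sentiment result to the tweet and emit it as a single-value Observable
    return Rx.Observable.just(Object.assign({}, tweet, {sentiment: result}));
};
```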

It’s also pretty trivial. The only thing to note here is that we return an Rx.Observable at the end. This is because all the processors are applied to the source using the “flatMap” method, which expects an Rx.Observable as the result.
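Conceptually, the wiring looks something like the toy snippet below (an illustration of the flatMap idea only, not the platform’s actual internals; the stand-in source and processor are made up):

```js
import Rx from 'rx';

// toy stand-ins for a source and a processor, just to show the shape of the wiring
const source = (obs) => {
    ['hello', 'world'].forEach(text => obs.onNext({text}));
    obs.onCompleted();
};
const processor = (item) =>
    Rx.Observable.just(Object.assign({}, item, {length: item.text.length}));

// subscribe to the source and pipe every emitted item through the processor
const pipeline = Rx.Observable.create(obs => source(obs)).flatMap(item => processor(item));
pipeline.subscribe(result => console.log(result));
```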

And for the last (and probably most complex) bit in this pipeline, we’re going to create a rendering component that’ll display the result for us as a nice web page. Note that even without it we can already save the pipeline and send JSON-type GET requests to it to get the results as JSON (e.g. to use it in another tool).

The code for the renderer will look like this:

Rendering component that displays tweets for three different phone models in three columns
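A rough sketch of such a renderer (the helper functions, the column grouping logic and the Bootstrap grid classes here are my own, and it assumes each item carries the sentiment score produced by the previous step) might look like this:

```js
import React from 'react';

// hypothetical helpers (not from the original code): pick the last 10 tweets
// that mention a given phone, and average their sentiment scores
const tweetsFor = (data, phone) =>
    data.filter(t => t.text.toLowerCase().indexOf(phone) !== -1).slice(-10);
const avgSentiment = (tweets) =>
    tweets.reduce((sum, t) => sum + t.sentiment.score, 0) / (tweets.length || 1);

// the factory function returns a React component; Exynize passes all
// accumulated pipeline results to it via the "data" prop
export default () => React.createClass({
    render() {
        const data = this.props.data || [];
        const phones = ['iphone', 'nexus', 'galaxy'];
        return (
            <div className="row">
                {phones.map(phone => {
                    const tweets = tweetsFor(data, phone);
                    return (
                        <div className="col-md-4" key={phone}>
                            <h3>{phone} (avg. sentiment {avgSentiment(tweets).toFixed(2)})</h3>
                            {tweets.map((t, i) => <p key={i}>{t.text}</p>)}
                        </div>
                    );
                })}
            </div>
        );
    },
});
```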

The idea’s pretty simple — we create a function that returns a new React component. Once the pipeline is called from a browser, the Exynize platform will serve that component with a wrapper that does all the real-time data fetching. All you have to care about is the “data” property that’ll be passed to your component.

Here’s how the result will look in the browser (see the video for more detailed walkthrough):