Abstract

Nowadays, building applications involves many technologies. There are technologies to render user interfaces, to retrieve and store data, to serve many users, to distribute computing, etc. Increasingly, certain requirements imply the usage of neural networks. So what is the reality of building enterprise applications with the available state-of-the-art neural network technology?

Expectations

I am not a scientist. I am curious, like to understand how things work, have analytical aptitude, and love math, but… this is not what I am being paid for. During working hours, my task is to develop real-world applications in a timely and cost-efficient manner. And thankfully there is plenty of available technology, aka tools, aka frameworks, that allows me to do exactly that. Without an understanding of magnetism, I am still able to store information. Without understanding query optimization principles, I am still able to write efficient data queries. And without knowing how to fill the memory of a graphics card, I am still able to render a user interface on the monitor. There is even this funny quotation from the definition of technology on Wikipedia:

“(Technology)… can be embedded in machines which can be operated without detailed knowledge of their workings.”

I expected that using Neural Network Technology would be no different. That I could, merely by obeying the constraints of the design of a framework, applying patterns, avoiding anti-patterns, and gluing it together with all the other relevant technologies, develop real-world applications without detailed knowledge of every technology I use.

Reality

The reality is very different. Someone willing to employ neural network technologies at the moment (as of January 2017) is forced to do scientific work or at least have an in-depth understanding of neural network methods, despite a number of publicly available technologies created by the brightest and most resourceful minds of our age.

Sure, there is a plethora of reasons for that: the technology is not mature enough, some fundamental issues are still unsolved, there is too little pressure from the industry, etc. However, some of the reasons are focus-related. I will address those that became obvious to me while working on a real-world application. In particular:

Kicking-off with the Technology

Tools to move from experiment to the real application

Development Language

Design of Deep Learning Framework

Coming to Terms With Deep Learning

Background

Our project started in 2014 with the development of a recommendation engine for finding solutions in a text corpus based on documented customer contacts with an after-sales support organization. After successfully implementing a number of use cases based on statistical approaches with token frequency, collaborative filtering, and a number of data quality improvements involving, among others, advanced natural language processing techniques, we rolled out a number of productive systems.

In 2016 we turned our attention to Neural Networks and Deep Learning. Having had great success with Java (actually, Scala), Spark, and the available Java-based machine learning frameworks, our first choice was deeplearning4j Version dl4j-0.4-rc3.9 (aka dl4j).

It was spring 2016, and we got annoyed with dl4j. In retrospect, I see the main drivers of our annoyance were less about the framework itself and more about our expectations. What we expected was yet another robust enterprise framework. OK, the “0.4” and “rc” in the version number should have given us a hint about the maturity of the framework, but we were ignorant. At that time, getting dl4j to work for us was complicated: we did not manage to make it run on the GPU backend, and even to make the CPU backend work, we had to compile the framework ourselves, which felt like too much additional work that kept us from fulfilling our main task: implementing a neural network that would learn things for our use case. After two months of trial and error and parallel experiments with a well-known Python-based framework, we decided to switch to that Python framework and suspend work on the dl4j-based network implementation. Oh, the configuration of the Python framework was as complicated as that of dl4j; we just got luckier with it, that’s all.

By the end of November 2016, seven months later, we still hadn't managed to build a network configuration that would converge with data from our domain. After the initial success with building toy models (MNIST, seq2seq and some other models) we had decided that the Python framework was the most promising, but boy did we err. There were plenty of assumptions about what we could have gotten wrong, but there was no visible direction that would enable us to succeed.

At that time, a colleague of mine, Wolfgang Buchner, mentioned that he had recently seen that dl4j had undergone a major revamp. We immediately attempted to build an experimental model with dl4j Version 0.7.2, and within two weeks, we actually succeeded. Within the next two weeks, our model converged to a satisfactory level with our actual data. Four weeks total.

Of course no one was very optimistic at the beginning, thus the result surprised us. Reflecting on this surprise, I attempted to analyze the main factors that, in my opinion, helped us succeed, and I came to the conclusion that there were four.

Kicking-Off With the Technology

There are times when it’s OK to skip the documentation and move straight to the code. I personally don’t often need to read documentation to understand a framework providing the MVC pattern or an ORM framework, because these are well-established patterns that are provided by well-established frameworks.

In the case of neural networks, I do have to read the documentation, if there is any at all, to kick off a project. Sure, there are plenty of papers on arXiv and great lectures on YouTube from renowned MIT professors on KLD, Entropy, Regressions, Backprop, and whatnot. But theoretical explanations of a principle, and the capability to write the code that implements that principle, are two very different animals.

Dl4j has two strengths when it comes to helping someone at the start:

Documentation. Framework documentation is not my first choice for understanding principles, but it is definitely my first choice if I want to be able to start writing code really fast. The reason is that it focuses on making things work instead of explaining algorithm working principles in depth, and it focuses on end-to-end use cases, including pre-processing data and giving advice on the “dark art” of hyper-parameter tuning. This I have never seen in the documentation of other deep learning frameworks.

Community. I have been hanging around in every deep learning community I could find. Dl4j has the most vibrant, active, patient and open community I have experienced. Yes, most of the answers still come from Skymind people, but there is always someone on the dl4j Gitter channel who has a couple of good hints up their sleeve.

In general, I have the feeling that the intention of the dl4j community is to build real applications. Based on my experience with other deep learning communities I feel that their intention is to discuss topics of their PhD thesis, or prove this or that theorem.

Tools to Move From Experiments to Real Applications

Dl4j is an ecosystem. And as an ecosystem, it provides a number of tools to pre-process data or read it from different formats, to integrate the framework with other (e.g. consumer) technologies and to semi-automatically tune the hyperparameters of my model.

There is one single tool provided by dl4j above all others that has had a massive impact on the success of our project so far. It is the so-called dl4j User Interface, or UI. It is a web page automatically created by the framework (with minimal configuration, literally five lines of code) that shows graphs of some parameters during network training:

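    import org.deeplearning4j.ui.api.UIServer
    import org.deeplearning4j.ui.stats.StatsListener
    import org.deeplearning4j.ui.storage.InMemoryStatsStorage

    val uiServer = UIServer.getInstance()          // obtain the UI server instance
    val statsStorage = new InMemoryStatsStorage()  // keep the training statistics in memory
    uiServer.attach(statsStorage)                  // let the UI display those statistics
    listeners +:= new StatsListener(statsStorage)  // listeners: our existing var holding the training listeners
    model.setListeners(listeners: _*)              // model: the network being trained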

By itself, that would only help if you can “read” this analysis data (an ability that, by the way, does not come by default). So dl4j goes a step further and provides extensive and very concrete documentation on how to interpret and analyze the readings, even providing very particular advice on tuning my network configuration. That really made a difference for our project. I am posting the picture of the UI below, but seriously, just navigate to the visualization documentation page of dl4j, and you can read about it in much more detail.

Development Language

To my astonishment, most of the deep learning frameworks are implemented in dynamically typed languages. Dynamic typing is a good thing in many cases, but I believe it is the worst possible choice when developing deep learning software.

If you have already worked with some deep learning framework(s), haven’t you wondered why each and every framework provides a number of classes that download, pre-process, and feed the data into a neural network? I haven’t seen such a thing in any other class of frameworks, but I have a guess as to why it is so: namely, because the data has to be quantified and formatted in a rather complex structure, and the transformation of data into a form readable (and learnable) by the network is damn difficult.

And then we have this dynamic language, that is so implicit that I literally NEVER know what the hell method A or method B is returning. And when I look at method A, I see it is calling at some point the method A’ which in turn calls A’’ and so on and so forth until I reach the bottom of the stack where the data array is instantiated. By the time I reach the bottom, I have already forgotten what I wanted to accomplish and am trying to figure out the implementation of some utility method of the framework.

In a domain that is so data-centric, where the data structure is so important and a model’s ability to learn is so dependent on the correctness of the data, how for heaven’s sake can someone select a dynamic language as the development language?

Fun fact: when you create matrix(10, 255, 64) for training a Recurrent Neural Network in a well-known framework, you get 10 sequences of 255 elements of size 64; in dl4j, instead, you get 10 sequences of 64 elements of size 255. How can it not be important to know in advance which data structure a given method will return?
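To make the difference concrete, here is a minimal sketch in plain Scala that needs no deep learning library on the classpath. The names `RnnBatch`, `fromFeaturesFirst`, `fromTimeFirst`, and `ShapeDemo` are hypothetical and exist only for this illustration; the two layouts are the ones described in the fun fact above.

```scala
// Hypothetical illustration; none of these names belong to dl4j or any other framework.
// A batch of sequences described by named dimensions instead of positional ones.
final case class RnnBatch(sequences: Int, elementSize: Int, sequenceLength: Int)

object RnnBatch {
  // Read a raw shape as [sequences, elementSize, sequenceLength],
  // the layout described above for dl4j.
  def fromFeaturesFirst(shape: Seq[Int]): RnnBatch = shape match {
    case Seq(n, size, length) => RnnBatch(n, size, length)
    case other =>
      throw new IllegalArgumentException(s"expected 3 dimensions, got ${other.size}")
  }

  // Read a raw shape as [sequences, sequenceLength, elementSize],
  // the layout described above for the other framework.
  def fromTimeFirst(shape: Seq[Int]): RnnBatch = shape match {
    case Seq(n, length, size) => RnnBatch(n, size, length)
    case other =>
      throw new IllegalArgumentException(s"expected 3 dimensions, got ${other.size}")
  }
}

object ShapeDemo {
  // The same raw shape means two different things depending on the convention.
  val raw: Seq[Int] = Seq(10, 255, 64)
  val asDl4j: RnnBatch = RnnBatch.fromFeaturesFirst(raw) // 10 sequences, 64 elements of size 255
  val asOther: RnnBatch = RnnBatch.fromTimeFirst(raw)    // 10 sequences, 255 elements of size 64
}
```

Once the dimensions have names, mixing up the two conventions shows up as a wrong field value that can be checked, instead of a silently transposed array.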

Dl4j is developed in Java. And although Java itself is not the most innovative language out there, it offers two things extremely important to me and my teammates: type safety and its youngest “cousin” Scala, one of the languages best adapted for machine learning out there.

The Design of the Deep Learning Framework

What is available out-of-the-box versus what has to be built by ourselves is an important issue. My observation is that many frameworks are built with only a limited number of use cases in mind, and all the deep learning frameworks I have encountered have mainly research in mind.

One major design advantage of dl4j as of version 0.7.2 is its ability to switch backends without recompiling the code. The classpath is scanned for the backend libraries and the available backend is loaded automatically. The obvious advantage is being able to run the code on a CPU while testing locally and to run the same code on the GPU when deploying on a GPU rig. Another advantage is being able to do backend-specific stuff. Consider this code:

    import org.nd4j.linalg.factory.Nd4jBackend
    import org.nd4j.linalg.jcublas.JCublasBackend

    /**
     * We need the Nd4JBackend just for determining if we are using
     * CUDA or CPU, Dl4j uses its own instance.
     */
    private object Nd4jBackendHolder {
      val backend: Nd4jBackend = Nd4jBackend.load()
    }

    /**
     * Trait supporting Nd4jBackend dependent configurations.
     * E.g.:
     *   BackendType match {
     *     case CUDA =>
     *       import org.nd4j.jita.conf.CudaEnvironment
     *       CudaEnvironment.getInstance().getConfiguration
     *         .setMaximumGridSize(512)
     *         .setMaximumBlockSize(512)
     *     case CPU =>
     *       // do some CPU specific stuff
     *   }
     */
    trait Nd4jBackendUtils {
      import Nd4jBackendHolder._

      trait BackendType
      case object CPU extends BackendType
      case object CUDA extends BackendType

      val BackendType: BackendType = backend match {
        case _: JCublasBackend => CUDA
        case _ => CPU
      }
    }

With this simple trait you are able to, for example, configure your model differently based on the available backend. We set the batch size based on the available BackendType because a GPU is able to process larger batches more efficiently:

    object TrainingJob extends Nd4jBackendUtils {
      val dataIter = new AsyncMultiDataSetIterator(
        new SegmentedThoughtCorpusIterator(inputFiles, ... train = true),
        batchSize = BackendType match {
          case CUDA => 256
          case _ => 10
        }
      )
      ...
    }
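The same idea can be tried without any nd4j backend on the classpath. The following is a minimal sketch, not our project code: `Backend`, `Cpu`, `Cuda`, `TrainingSettings`, and `settingsFor` are hypothetical names, and only the batch sizes 256 and 10 are taken from the snippet above. Declaring the trait `sealed` lets the Scala compiler warn when a `match` forgets a backend.

```scala
// Hypothetical stand-in for the backend detection shown above; no nd4j required.
sealed trait Backend
case object Cpu extends Backend
case object Cuda extends Backend

// Settings that depend on the backend. Only the batch sizes come from the text above.
final case class TrainingSettings(batchSize: Int)

object TrainingSettings {
  // Because Backend is sealed, the compiler warns if a case is missing here.
  def settingsFor(backend: Backend): TrainingSettings = backend match {
    case Cuda => TrainingSettings(batchSize = 256) // larger batches for the GPU
    case Cpu  => TrainingSettings(batchSize = 10)  // small batches for local CPU runs
  }
}
```

Calling `TrainingSettings.settingsFor(Cuda)` yields a batch size of 256 and `TrainingSettings.settingsFor(Cpu)` a batch size of 10, so every backend-dependent decision lives in one place that the compiler can check.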

The well-known Python framework broke our back when, in an attempt to improve the convergence of our network, we tried to implement a number of custom layers (e.g. Stochastic Depth). Because of the design of that framework, there is literally no possibility to debug a layer. The usage of backends (the ones that do the heavy lifting) was so unintuitive that we were literally guessing what the design should be. And since everything we wrote compiled without problems and failed only at runtime, this attempt turned into a nightmare.

Up until now, we haven’t written our own layers in dl4j. That work is commencing now, and in a short time I will be able to evaluate this aspect more objectively.

Outlook

Currently I see a lot of discussion, experiments and effort being invested into networks to analyze image information. I believe that there is huge potential in the manufacturing industry, and until now I have heard very little about efforts to build solutions for manufacturing.

I think manufacturers or producers of manufacturing equipment will require systems of choreographed and aligned networks. Often subsystems or aggregates have to work in an intelligent and semi-autonomous mode while contributing to the coordination or analysis of the whole manufacturing process.

So the ability to train networks for specific tasks and then join them into larger networks will be important. Also, the ability to train a network and then replicate it with slight modifications (such as those required because the next production line has a slightly different configuration) will be extremely important for building products using deep learning efficiently. Last but not least, even with state-of-the-art technology, the hyperparameter tuning of a neural network is a painstaking and elaborate process which, in my opinion, could be one of the main hindrances to bringing deep learning applications to market in a timely manner.

With respect to dl4j, I strongly feel that this framework will overtake the current top dogs of deep learning simply by providing the industry with tools to build actual products/applications using deep learning. This feeling is motivated by the current condition of the framework and the focus of its developers. For instance:

The dl4j team is working on a solution for hyper-parameter tuning, called Arbiter;

The community is very active, just check out the liveliness of the dl4j Gitter channel;

GitHub statistics look very healthy to me; and, last but not least,

From the involvement of Skymind employees both in supporting the community and in evolving the dl4j code base, it seems that dl4j is very central to the business model of this company. And, in my experience, when a commercial enterprise is backing an open source project, it gives a huge boost to that project.

The work on the described project will continue throughout 2017 and, likely, 2018. Our current plan is to stick with dl4j and use it in production. I would love to hear about your experience with deep learning and currently available deep learning frameworks, so comment on!

The described project is being developed for Samhammer AG and continues as of 2017.

Special thanks to Wolfgang Buchner and other guys for your excellent criticism and corrections.