When we launched Staffjoy last year, we were one of the first commercial users of the Julia programming language. With the publication of our Mobius algorithm last month, I am sad to share that Staffjoy no longer uses Julia. In this post, I will share what worked well with Julia, what broke, and the opportunities for the future of the language.

How We Started Using Julia

My friend Zoli first introduced me to the Julia programming language, which he described as an open-source Matlab alternative. Intrigued, I researched its ecosystem, where I found the JuMP optimization library from MIT. In my college operations research classes, we used Excel for simple problems, but had to suffer through GAMS for more complex applications. Knowledge from Excel and GAMS was not transferable to other systems because they were not complete programming languages, just formulation tools. After graduation, I stopped tinkering in optimization because none of its analysis tools were approachable.

Most optimization systems rely on a variety of specialized “solvers”, which are low-level libraries that implement algorithms such as simplex or branch and bound. Some are free, such as CBC, and others are expensive, such as Gurobi. Most solvers have just a C language interface, but some have specialized libraries for higher-level languages. For an analyst, writing high-level problems in the low-level C language does not make sense, so tools like Excel, GAMS, and JuMP make it easier to interface with these low-level libraries.

When I was introduced to Julia, I was excited by the ability to easily formulate mathematical problems in a complete, modern programming language. JuMP makes it easy to express an optimization problem in Julia data structures, then swap the solver used to compute that problem without any modification. This makes benchmarking easy, and it allows users to change the solver depending on the environment. For instance, we used an open-source solver in our development environments, then a paid solver (which was faster) in production environments.
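The solver-swapping idea can be sketched in a few lines of Python. This is an illustration only, not JuMP's API: the problem is described once as plain data, and interchangeable backends compute the answer. The function names, the toy knapsack problem, and the `SOLVER` environment variable are all hypothetical and chosen for the example.

```python
import os
from itertools import combinations

# The problem is plain data: (value, weight) items and a weight capacity.
PROBLEM = {"items": [(60, 1), (100, 2), (120, 3)], "capacity": 5}


def solve_brute_force(problem):
    """Slow reference backend: try every subset of items."""
    items, capacity = problem["items"], problem["capacity"]
    best = 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(weight for _, weight in subset) <= capacity:
                best = max(best, sum(value for value, _ in subset))
    return best


def solve_dynamic(problem):
    """Faster backend: the classic 0/1 knapsack dynamic program."""
    items, capacity = problem["items"], problem["capacity"]
    best = [0] * (capacity + 1)
    for value, weight in items:
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical registry: the formulation never changes, only the backend.
SOLVERS = {"brute_force": solve_brute_force, "dynamic": solve_dynamic}


def solve(problem, solver_name=None):
    """Pick a backend explicitly, or from the (assumed) SOLVER env variable."""
    name = solver_name or os.environ.get("SOLVER", "brute_force")
    return SOLVERS[name](problem)
```

Calling `solve(PROBLEM, "brute_force")` and `solve(PROBLEM, "dynamic")` gives the same optimum, which is what makes benchmarking one backend against another straightforward.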

The Beginning of Staffjoy

It was this discovery of Julia that prompted me to revisit my college research on scheduling and call the project “Staffjoy”. When we first completed an original version of the Staffjoy algorithm, Andrew and I thought that it was ready for customer data. With help from some friends, we secured inputs from a real workforce, fed them into our algorithm, and waited. It took three days on a thirty-two core server to create one schedule for a workweek. The raw computing cost for that single schedule was over $100, a price that would not scale.

Andrew and I returned to the whiteboard to reformulate our algorithms. After prototyping for another month, we developed the dynamic programming techniques that allowed Staffjoy to solve such large scheduling problems in a short amount of time. The ease of development in Julia and JuMP empowered us to prototype and test these algorithms without unnecessary hindrance.

Creating a Production Service

As we grew Staffjoy and developed it into a web application, we used more of Julia’s features and libraries to build an automated, cloud-based scheduling service.

Converting our scheduling logic into a production service required many steps:

Unit testing of logic

Parallelization of parts of our code

Backtesting of historical data

Continuous integration

A Docker deployment system

A tasking integration between our API and the scheduler

Syslog

Monitoring

As we developed code in Julia, we applied many standard software practices. The lack of developer tools and instability of the language made it difficult to maintain the codebase.

Testing

Our issues with Julia began with testing. The testing tools in Julia were immature compared to other programming languages. We had issues structuring the project correctly, and we ended up having a master “test” function that called all other functions. It was difficult to determine whether a test was running, and debug information was lacking.

Stability

Stability of Julia in production was problematic. We packaged the Julia programming language into Docker containers for deployment. If we had not done this, I believe that we would have had issues maintaining correct library versions and programming language versions in production. We had to compile Julia from source during the build process, and we ended up using shell scripts to manage dependencies. In production, I ended up getting paged too often about intermittent failures. What seemed to cause the most issues was that one thread would crash, but the overall program would not exit, so the container would never be replaced.
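The usual remedy for this failure mode is to fail fast: when any worker dies, take the whole process down so the container orchestrator can replace it. The sketch below shows the pattern in Python, not what Staffjoy actually ran; the `fail_fast` helper and its `exit_fn` parameter are hypothetical names introduced for the example.

```python
import os
import threading
import traceback


def fail_fast(target, exit_fn=os._exit):
    """Wrap a thread target so an unhandled exception ends the whole process.

    exit_fn defaults to os._exit, which terminates immediately from any
    thread. It is a parameter only so the behavior can be tested.
    """
    def wrapper(*args, **kwargs):
        try:
            target(*args, **kwargs)
        except Exception:
            # Log the failure first so the crash is visible in the logs.
            traceback.print_exc()
            # A non-zero exit status signals the supervisor to restart us.
            exit_fn(1)
    return wrapper


def start_worker(target, *args, exit_fn=os._exit):
    """Start a worker thread whose crash brings down the process."""
    thread = threading.Thread(target=fail_fast(target, exit_fn), args=args)
    thread.start()
    return thread
```

With this wrapper, a crashed worker produces a non-zero exit code instead of a half-alive process, so a container restart policy has something to react to.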

Networking and Service-Oriented Architecture

Poor networking libraries hurt our ability to run a service-oriented architecture. The Requests.jl library (inspired by its famous Python namesake) made REST interactions fairly straightforward. However, instabilities and random failures in TLS handling caused a variety of production issues at scale.

In order to debug issues and monitor errors, we streamed logs to a syslog server. Formatting and managing Julia logs was such a difficult process that I ended up piping stdout through a variety of shell scripts to accomplish basic tasks like injecting the current environment into log lines.
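For comparison, here is a minimal sketch of the same task, stamping every log line with the current environment, using Python's standard `logging` module. The logger name, the format string, and the `build_logger` helper are assumptions made for this example, not a description of our production setup.

```python
import io
import logging


class EnvironmentFilter(logging.Filter):
    """Attach the deployment environment to every log record."""

    def __init__(self, environment):
        super().__init__()
        self.environment = environment

    def filter(self, record):
        record.environment = self.environment
        return True  # Never drop the record, only annotate it.


def build_logger(name, environment, stream):
    """Return a logger that prefixes each line with the environment."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # Avoid duplicate lines from the root logger.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(environment)s %(name)s %(levelname)s %(message)s")
    )
    handler.addFilter(EnvironmentFilter(environment))
    logger.addHandler(handler)
    return logger


# Usage: write to an in-memory buffer here; a real service would log to
# stdout or swap in logging.handlers.SysLogHandler.
buffer = io.StringIO()
log = build_logger("scheduler", "prod", buffer)
log.info("schedule complete")
```

Because the environment is attached inside the process, no shell pipeline is needed to rewrite the output afterwards.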

The reason we ultimately stopped using Julia was the variety of deprecations that occurred during the 0.4 release. Updating thousands of lines of code, such as the Dict syntax that we used extensively for testing, proved to be so prohibitively difficult that we never upgraded. As dependencies were frozen and the need for new algorithm changes emerged, we ended up completely rewriting most of the code in Python.

In the end, what stopped us from using Julia was that it was a scientific tool for research, and it had not yet reached the stability needed to run a production service.

Future of the Language

As we used Julia, I was amazed by the support shown by members of the MIT Operations Research Department. They have truly created one of the most vibrant communities of operations research practitioners on the internet. In addition, they maintained JuMP as the best example of a Julia library by constantly responding to issues and feature requests, while at the same time updating the library to comply with the changing specifications of the Julia language.

To make Julia feasible for commercial applications in the future, the language needs more well-maintained scientific libraries like JuMP. At the core, most programming languages are quite similar, but the developer tools and communities shape the culture and ultimately the roadmap of the language.

I have heard that alignment among the language’s maintainers has been troublesome, such as issues fulfilling features funded by grants. For Julia to become feasible for use outside of research, it becomes increasingly important that there is a unified vision and leadership for the future of the language.

Conclusion

Once Julia reaches a 1.0 release, I hope that it becomes easier to create and maintain a stable codebase in the language. I still prototype in the Julia language and appreciate how flexible it is for testing new ideas. However, to become more than a prototyping tool, Julia still needs work on its developer tools, runtime library, core packages, and stability. The Julia language helped to create Staffjoy and turn it into a business, and for that I am grateful. However, in order to provide stable services that we could scale for our customers, Staffjoy had to abandon Julia.