Last month I attended Kaggle Days SF, a community-organized Kaggle conference held concurrently with Google Cloud’s flagship conference, Google Cloud Next. The conference’s main track featured a variety of talks from prominent Kagglers and Googlers: Bojan Tunguz talked about the present and future of AutoML; Jeremy Walthers gave a technical dissection of choosing the number of folds in your k-fold cross-validation.

In this article I discuss my personal favorite talk of the conference: “The Secrets of Productive Developer Tool” by François Chollet, the creator of Keras. His thesis is simple: the developer experience of the tools that you use is a critical factor in your success as a machine learning practitioner.

Good tools are how you build great products, win competitions, get papers published. […] This seems obvious but you’d be shocked…very few developers of data science tools or software tools care about UX. If you keep this principle in mind, it’s like a superpower.

Chollet cites three principles as the key to a great developer experience:

Deliberately design end-to-end workflows focused on what users care about.

Reduce cognitive load for your users.

Provide helpful feedback for your users.

API design is a topic of great interest to me, so I thought it’d be worthwhile to dissect these principles in written form here. You can also watch the talk in its original form on YouTube:

Deliberately design end-to-end workflows focused on what users care about

Good software APIs are designed as holistic workflows, not as sets of atomic features. To illustrate this principle, Chollet cites the following example:

```python
# 👎
cooked_burger = cook_burger(
    burger,
    grill_model='GR12',
    time_on_grill=120,
    grill_temperature=150
)

# 👍
cooked_burger = cook_burger(
    burger,
    level='medium rare'
)
```

These two code samples do the same thing: they cook an allegorical burger for the user. However, the way they go about doing so is radically different.

The first function is focused on procedure. It is parameterized with the exact variables — grill_model, time_on_grill, and grill_temperature — that procedurally drive the process of cooking a burger. This API makes the implementor’s life easy because it makes determining what to do easy: find a GR12, heat it up to 150 degrees, and cook the burger for 120 seconds.

Chollet refers to this style derisively as checkbox-driven design. The user experience is akin to going to a restaurant and asking for a burger, only to be given a form to fill out specifying how. A typical diner doesn’t care, or want to care, about your grill model or cooking temperature or whatever; the only preference they want to express is the level to which they want their burger patty cooked. If this API were a restaurant it wouldn’t be in business for very long!

His proposed alternative has just that single user variable: level. This is more work for the implementor, who now has to think about and implement a mapping from the vague level to specific cook times and temperatures. But it also hews very closely to what the user cares about, thus ultimately providing a much nicer user experience. Chollet refers to this as user-centric design.
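The talk doesn’t show how that mapping would be implemented, but a sketch might look like the following (the settings table and the `_cook` helper are hypothetical, introduced here purely for illustration):

```python
# Hypothetical sketch: the implementor absorbs the procedural details
# behind a single user-facing `level` parameter.
GRILL_SETTINGS = {
    # level: (time_on_grill in seconds, grill_temperature in Celsius)
    'rare':        (90,  140),
    'medium rare': (120, 150),
    'medium':      (150, 160),
    'well done':   (210, 175),
}

def cook_burger(burger, level='medium'):
    if level not in GRILL_SETTINGS:
        raise ValueError(
            f"Unknown level {level!r}; expected one of {sorted(GRILL_SETTINGS)}"
        )
    time_on_grill, grill_temperature = GRILL_SETTINGS[level]
    # The procedural API is still used internally -- the user
    # just no longer has to know about it.
    return _cook(burger, time_on_grill, grill_temperature)

def _cook(burger, time_on_grill, grill_temperature):
    # Stand-in for the real grilling procedure.
    return f"{burger} cooked for {time_on_grill}s at {grill_temperature}C"
```

The procedural knowledge hasn’t disappeared; it has moved from the user’s head into the library, which is exactly where it belongs.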

Checkbox-driven APIs have variables that are solution-driven: graph, session, scope, buffer, and param_group, to name a few. User-centric APIs provide variables that are problem-oriented and domain-oriented: layer, model, optimizer, weights, initializer, and so on.

Chollet cites scikit-learn, the venerable Python machine learning library, as his canonical example of API design done right. This is actually a pretty widely held opinion, so it’s worth going on a brief tangent discussing what about scikit-learn is so compelling.

At its core, every model in scikit-learn, from the most complicated to the least, is implemented as an object, and that object has its basic settings applied at instantiation time. Once you have created that object you call fit on it to fit the model to an X and a y, then call predict on some unknown data X_pred to generate an output y_pred. Here’s a code snippet for a simple linear regression model:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression(fit_intercept=False)
model.fit(X, y)
y_pred = model.predict(X_pred)
```

Every other model in scikit-learn, no matter how complicated, follows this same API.

Notice how the code you write corresponds exactly with what you’re trying to get out of writing it: a model that has been fit on some data, which you can now use to predict things. And that simplicity paves the way for elegance: once you understand this basic abstraction, you are well on your way to understanding everything else in the library. Data transformers fit, then transform. Higher-order pipelines are fit jobs on lists of transformer objects bottoming out in a model object. Model selection means creating objects that run fit on combinations of data, transforms, and models. So on and so forth.
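To see how small this contract really is, here is a toy estimator of my own devising (not from scikit-learn itself) that follows the same convention; any code written against the fit/predict interface will happily accept it:

```python
class MeanRegressor:
    """Toy estimator following the scikit-learn convention:
    configure at instantiation, then fit, then predict."""

    def __init__(self, clip_negative=False):
        # Basic settings are applied at instantiation time.
        self.clip_negative = clip_negative

    def fit(self, X, y):
        # "Learn" the mean of the training targets.
        self.mean_ = sum(y) / len(y)
        return self  # returning self enables model = Est().fit(X, y)

    def predict(self, X_pred):
        pred = max(self.mean_, 0) if self.clip_negative else self.mean_
        return [pred for _ in X_pred]

model = MeanRegressor().fit([[1], [2], [3]], [10, 20, 30])
print(model.predict([[4], [5]]))  # every prediction is the training mean: [20.0, 20.0]
```

Three methods, one of them the constructor, and you have participated in the entire ecosystem’s contract.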

Plus it’s really simple. So simple that you would be forgiven for believing that things were always this easy. They were not. Another presenter, Bing Xu (of XGBoost fame), had a great slide showcasing “before sklearn”:

```matlab
function g = sigmoid(z)
  g = 1./(1+exp(-z));
end

function [J grad h th] = cost(theta, xtrain, ytrain, alpha, iter)
  th=theta
  m=size(xtrain,1);
  for j=1:iter
    h=sigmoid(xtrain*th);
    J=-(1/m)*sum(ytrain.*log(h)+(1-ytrain).*log(1-h));
    th=th+(alpha/length(xtrain))*xtrain'*(ytrain-h)
  end
  grad=zeros(size(theta,1),1);
  for i=1:size(grad)
    grad(i)=(1/m)*sum((h-ytrain)'*xtrain(:,i));
  end
end
```

And “after sklearn”:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(random_state=0,
                         multi_class='multinomial').fit(X, y)
```

Bing referred to the former — an opaque snippet of Octave taken from Andrew Ng’s original machine learning course — as an artifact of the “Ancient Era”. The scikit-learn way of doing things, he explained, ushered in a new “Renaissance Era” of ML.

Reduce cognitive load for your users

“Good software makes hard things simple.” Chollet’s example of software that fails at this goal is the UNIX utility tar. Uncompressing a tar file — far and away the most common thing you’d do with this utility — requires the following mysterious incantation:

```shell
tar -xvzf filename.tar.gz
```

No one can do this without looking up the command. The tar API is so comically bad, it has its own XKCD.

A key to a well-designed API is minimizing cognitive load: the number of concepts that a user has to hold in mind to perform a task with your software. The key to minimizing cognitive load is smart automation. Reduce the amount of code users need to write to get “good default” behavior. It’s especially important not to let your power users overwhelm you with feature requests that make simple things harder; after all, the more common the workflow, the easier your library should make it to execute.

Chollet provides two specific examples of optimizing for cognitive load. Here’s the first one:

```python
numpy.sum(inputs, axis=None, keepdims=False)
xxx.sum(inputs, dim, keepdim=False)
```

The first code sample is from the venerable numpy library, the flagship numerical computation library in Python. The second is from a Google project named Jax, which Chollet describes as “basically numpy, but faster”.

Jax introduces extremely few new API concepts and, as this code snippet demonstrates, borrows heavily from numpy. But it does so haphazardly. The Jax version of sum has the same function name and the same number of arguments, but subtly different parameters. The axis argument has been replaced with dim; this is a bad change because it substitutes one fairly arbitrary name for a concept with another, less well-known name for the same concept. Even more mysteriously, the similarly arbitrary keepdims argument has been replaced with… keepdim.

Even a single-character difference like this one creates a massive roadblock for users. Given that numpy has set the standard for numerical computation in Python, and given that Jax is, on a conceptual level, very close to numpy, Jax’s choice to arbitrarily modify names for things and break conventions is self-defeating. It forces users to remember which arbitrary name (dim or axis? keepdims or keepdim?) corresponds with which API — a serious and seemingly pointless source of cognitive load.
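For reference, here is how the numpy arguments actually behave (a quick sanity check of my own, not from the talk):

```python
import numpy as np

x = np.ones((2, 3))

# axis=None (the default) sums over everything.
print(np.sum(x))                               # 6.0

# axis=0 collapses the first dimension.
print(np.sum(x, axis=0))                       # [2. 2. 2.]

# keepdims=True preserves the collapsed axis with length 1.
print(np.sum(x, axis=0, keepdims=True).shape)  # (1, 3)
```

These are exactly the names and behaviors users have spent years internalizing, which is why renaming them carries such a high cost.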

The message Chollet is communicating here: when designing an API, pay attention to prior art, and prefer existing standards and concepts over new ones.

Chollet’s own keras is a good example of this. The keras API is fundamentally borrowed from scikit-learn, so any user experienced with scikit-learn already knows much of what they need to know to be effective with keras. The many neural-network-specific features introduced in keras, like model layer construction, model compilation, and training generators, blend mostly seamlessly on top of existing concepts:

```python
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # flatten feature maps before the dense classifier
model.add(Dense(10, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=512, epochs=1,
          validation_data=(X_test, y_test))
model.predict(X_test)
```

His other point is that while libraries like keras and scikit-learn may be doing very complex things under the hood, the user doesn’t need to be aware of most of those choices. A good API exposes the user to “good defaults”: code that does the simplest, most common, most expected thing without needing further prompting from the user.

He cites LSTM layers in keras:

```python
from keras.layers import LSTM

model.add(LSTM(32))
```

LSTM stands for long short-term memory, and it’s a particularly complex piece of compute graph engineering (if you haven’t heard of LSTMs before, colah’s blog has a crash course: “Understanding LSTM Networks”). There are many choices you could make when creating an LSTM layer; the documentation lists 23 separate function arguments. keras gives you the option of making no choices whatsoever — just provide the single required argument (the number of nodes in the layer) and trust keras to choose simple, reasonable defaults for you.
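The pattern generalizes: expose the one argument users must care about, and push everything else into keyword arguments with sensible defaults. A minimal sketch (the `make_layer` function and its parameters are hypothetical, not keras code):

```python
def make_layer(units, activation='tanh', use_bias=True, dropout=0.0):
    """Hypothetical layer factory: one required argument,
    everything else defaults to the common case."""
    if units <= 0:
        raise ValueError(f"units must be positive, got {units}")
    return {
        'units': units,
        'activation': activation,
        'use_bias': use_bias,
        'dropout': dropout,
    }

# The beginner writes only what they care about...
simple = make_layer(32)

# ...while the power user can still override any detail.
tuned = make_layer(32, activation='relu', dropout=0.2)
```

The beginner and the power user call the same function; only the power user pays the cognitive cost of the extra knobs.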

Although simple defaults seem like an obvious Good Thing, not all software has them. To pick on one project in particular: Apache Airflow, a popular Python-based graph workflow engine used by thousands of companies, doesn’t work out of the box; in one of my recent projects, getting Airflow to a working state required a half hour of Googling error messages and fixing settings absent from the configuration files the project ships with. Frustrating, and definitely not what you want to put in the face of users just getting their feet wet with a new technology.

If the cognitive load of a workflow is sufficiently low, a user who has gone through it once or twice should be able to repeat it from memory, without looking up a tutorial or documentation.

Provide helpful feedback to your users

Good software is something you should be able to approach almost without docs. You just look at the way things are named, and try them in practice, and make some mistakes and the software is just going to tell you, hey, you made a mistake — here’s what you should be doing instead. That way you can iterate on your work almost without looking up any docs.

Software is rife with unhelpful error messages; so much so that getting an actually helpful error message is a notable event.

A good error message should answer three questions:

What happened, in what context?

What did the software expect?

How can the user fix it?
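In code, an error message that answers all three questions might look like this sketch (a hypothetical validation helper, not from any particular library):

```python
def set_cook_level(level):
    """Hypothetical setter demonstrating a three-part error message."""
    allowed = ('rare', 'medium rare', 'medium', 'well done')
    if level not in allowed:
        raise ValueError(
            # 1. What happened, in what context?
            f"Invalid cook level {level!r} passed to set_cook_level(). "
            # 2. What did the software expect?
            f"Expected one of: {', '.join(allowed)}. "
            # 3. How can the user fix it?
            "Pass one of the listed strings, e.g. set_cook_level('medium rare')."
        )
    return level
```

Compare this with a bare `KeyError: 'blue'` bubbling up from three stack frames down: same failure, wildly different user experience.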

Of course, throwing good error messages requires catching users when they fail, which in turn requires knowing when and why users fail. For implementors, this is where StackOverflow is at its best: see what kinds of questions users are asking, do your best to answer them, and take the further step of incorporating your answers into helpful error messages that will better guide the future users who inevitably run into the same pitfalls.

Another point that Chollet makes is the importance of documentation. Good documentation is the red-headed stepchild of software; we all know how important it is, yet we’re perpetually getting to it “tomorrow”. Documentation is time consuming to write, difficult to perfect, and a burden on further code changes once written.

But it’s also one of the most important ingredients in user success. And good docs extend beyond your actual, well, documentation; they also include user guides, blog posts, tutorials, helpful StackOverflow answers, useful GitHub issue responses, and code recipes for getting stuff done. keras itself owes part of its popularity to its blog, which includes articles like “How convolutional neural networks see the world”, “Building Autoencoders in Keras”, and “Building powerful image classification models using very little data”, which are the next best thing to classics in the field.

Write docs!

Conclusion

Chollet argues that Keras has been as successful as it has because it prioritizes the developer experience. Keras is easy to use, which lets you iterate on models more quickly; which means you build more of them; which means you find the ones that work more quickly.

Chollet cites three principles as a foundation for effective user-centric design:

Deliberately design end-to-end workflows focused on what users care about.

Reduce cognitive load for your users.

Provide helpful feedback for your users.

Hopefully you now understand what these three principles are and how they can help you implement powerful APIs that “just work”. To watch the talk in its original video form, check out the YouTube recording.

Finally, some food for thought. From Kenneth Reitz, the creator of the requests library: