History Of The Stack Exchange API, Mistakes

In an earlier post, I wrote about some of the philosophy and “cool bits” in the 1.0 release of the Stack Exchange API. That’s all well and good, but of course I’m going to tout the good parts of our API; I wrote a lot of it, after all. More interesting are the things that have turned out to be mistakes, since we learn more from failure than from success.

Returning Total By Default

Practically every method in the API returns a count of the elements the query would return if not constrained by paging.

For instance, all questions on Stack Overflow:

{ "total": 1936398, "page": 1, "pagesize": 30, "questions": [...] }

Total is useful for rendering paging controls and for count(*)-style queries (how many of my comments have been up-voted, and so on), so it’s not that the total field itself was a mistake. But returning it by default definitely was.

The trick is that while total can be useful, it’s not always useful. Quite frequently queries take the form of “give me the most recent N questions/answers/users who X”, or “give me the top N questions/answers owned by U ordered by S”. Neither of these common queries cares about total, but both pay the cost of fetching it each time.
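For the paging-control case, the arithmetic a consumer does with total can be sketched as follows. This is only an illustration: `page_count` is a hypothetical helper, not part of the API, and the wrapper fields are copied from the sample response above.

```python
def page_count(total: int, pagesize: int) -> int:
    """Number of pages needed to show `total` items, `pagesize` per page."""
    if pagesize <= 0:
        raise ValueError("pagesize must be positive")
    # Integer ceiling division, so a partial last page still counts.
    return (total + pagesize - 1) // pagesize


# Wrapper fields taken from the sample response above; the data is elided.
response = {"total": 1936398, "page": 1, "pagesize": 30, "questions": []}
pages = page_count(response["total"], response["pagesize"])
```

A “most recent N” query never runs this calculation, which is why paying for total on every request is wasteful.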

“Implicit” Types

Each method in the Stack Exchange API returns a homogeneous set of results, wrapped in a metadata object. You get collections of questions, answers, comments, users, badges, and so on back.

The mistake is that although the form of the response is conceptually consistent, the key under which the actual data is returned is based on the type. Examples help illustrate this.

/1.0/questions returns:

{ "total": 1947127, ... "questions": [...] }

/1.0/users returns:

{ "total": 507795, ... "users": [...] }

This makes it something of a pain to write wrappers around our API in statically typed languages. A much better design would have been a consistent `items` field with an additional `type` field.

How /1.0/questions should have looked:

{ "total": 1947127, "type": "question", ... "items": [...] }

This mistake became apparent as more API wrappers were written. Stacky, for example, has a number of otherwise pointless classes (the “Responses” classes) just to deal with this.
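A rough sketch of why the per-type key is awkward for wrapper authors, and how a consistent envelope would simplify things. The function names and the lookup table are hypothetical, made up for illustration; only the `questions` and `users` keys and the proposed `items` and `type` fields come from the examples above.

```python
from typing import Any, Dict, List, Tuple

# Assumption: a hand-maintained table mapping each route to the key that
# holds its data. Only the two routes shown above are listed here.
DATA_KEY_BY_ROUTE = {
    "/1.0/questions": "questions",
    "/1.0/users": "users",
}


def unwrap_v1(route: str, response: Dict[str, Any]) -> List[Any]:
    """1.0 style: the caller must know which key this route uses."""
    return response[DATA_KEY_BY_ROUTE[route]]


def unwrap_proposed(response: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Proposed style: one code path works for every route."""
    return response["type"], response["items"]
```

With the 1.0 shape, every new route means another table entry (or, in a statically typed wrapper, another response class). With the proposed shape, a single generic response type would do.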

Inconsistent HTML “Safety”

This one only affects web apps using our API, but it can be a real doozy when it does. Essentially, not all text returned from our API is safe to embed directly into HTML.

This is complicated a bit by many of our fields having legitimate HTML in them, making it so consumers can’t just html encode everything. Question bodies, for example, almost always have a great deal of HTML in them.

This led to the situation where question bodies are safe to embed directly, but question titles are not; user “about me” sections are safe, but display names are not; and so on. Ideally, everything would be safe to embed directly except in certain rare circumstances.

This mistake is a consequence of how we store the underlying data. It just so happens that we encode question titles and user display names “just in time”, while question bodies and user about mes are stored pre-rendered.
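In practice this means a web app has to treat the two kinds of field differently. A minimal sketch, assuming the consumer already has a question’s title and body as strings; `render_question` and the surrounding markup are hypothetical, and `html.escape` is from the Python standard library.

```python
import html


def render_question(title: str, body: str) -> str:
    """Build an HTML fragment for one question."""
    # Titles are not pre-encoded, so encode them before embedding.
    safe_title = html.escape(title)
    # Bodies arrive pre-rendered as HTML; encoding them again would show
    # the markup as literal text, so they are embedded as-is.
    return f"<h1>{safe_title}</h1>\n<div>{body}</div>"
```

Getting this backwards in either direction is a bug: encoding a body mangles it, and failing to encode a title opens the page to script injection.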

A Focus On Registered Users

There are two distinct mistakes here. First, we have no way of returning non-existent users. This question, for instance, has no owner. In the API, we return no user object even though we clearly know at least the display name of the user. This comes from 1.0 assuming that every user will have an id, which is a flawed assumption.

Second, the /1.0/users route only returns registered users. Unregistered users can be found via their ids, or via some other resource (their questions, comments, etc.). This is basically a bug that no one noticed until it was too late, and got frozen into 1.0.

I suppose the lesson to take from these two mistakes is that your beta audience (in our case, registered users) and popular queries (which for us are all around questions and answers) have a very large impact on how much “polish” each piece of an API gets. This is a corollary to Linus’ Law worth being aware of: the eyeballs are not uniformly distributed.

Wasteful Request Quotas

Our request quota system is a lift from Twitter’s API for the most part, since we figured it was better to borrow from an existing widely used API than risk inventing a worse system.

To quickly summarize, we issue every IP using the API a quota (that can be raised by using an app key) and return the remaining and total quotas in the X-RateLimit-Current and X-RateLimit-Max headers. These quotas reset 24 hours after they are initially set.

This turns out to be pretty wasteful in terms of bandwidth as, unlike Twitter, our quotas are quite generous (10,000 requests a day) and not dynamic. As with the total field, many applications don’t really care about the quota (until they exceed it, which is rare) but they pay to fetch it on every request.

Quotas are also the only bit of metadata we place in response headers, making them very easy for developers to miss (since no one reads documentation, they just start poking at APIs). They also aren’t compressed due to the nature of headers, which goes against our “always compress responses” design decision.
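For consumers who do want the quota, reading it looks roughly like the sketch below. It assumes the response headers are available as a plain string-to-string mapping; `read_quota` is a hypothetical helper, and only the two header names come from the description above.

```python
from typing import Mapping, Optional, Tuple


def read_quota(headers: Mapping[str, str]) -> Optional[Tuple[int, int]]:
    """Return (remaining, maximum), or None if the headers are absent."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        remaining = int(lowered["x-ratelimit-current"])
        maximum = int(lowered["x-ratelimit-max"])
    except (KeyError, ValueError):
        # Missing or malformed headers: report "unknown" rather than guess.
        return None
    return remaining, maximum
```

Note that nothing in the response body hints that these values exist, which is exactly how developers end up missing them.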

The Good News

Is that all of these, along with some other less interesting mistakes, are slated to be fixed in 2.0. We couldn’t address them in 1.1, as we were committed to not breaking backwards compatibility in a point-release (there were also serious time constraints).