In Getting Data | The REST Dialogues, Duncan Cragg conducts an interview with an imaginary eBay architect. While I don't play one on TV, I am a real eBay architect and would love to participate in this dialogue. As the first two parts are complete, I felt I should post my follow-ups here and hopefully invite Mr. Cragg to conduct the remaining 7 parts with me. I must make the standard disclaimers though. I am not speaking for eBay. None of my comments reflect on current or future products. This is purely a technical discourse on the merits of REST vs. SOAP styles of interaction, whether eBay ever chooses to offer such an interaction or not.

Duncan Cragg: So - let's get straight to my argument: I claim that your SOAP APIs, as instances of the SOA style, won't scale or interoperate as well as they would if they were implemented in the REST style. Which, in the form of the Web, has largely proven scalability and interoperability.

Dan Pritchett: The scaling argument is an interesting position. Most of the data that would be returned by eBay interfaces will involve structure that is best captured in XML. From a scaling perspective, XML is XML. Parsing is definitely more expensive than generation and there is little argument that REST can reduce the parse load placed on our resources but this is only a portion of the overall processing load.

Interoperability would depend largely on the relative similarity between the eBay entities and other Web 2.0 entities. To the extent there is overlap, I would concur that a standardized format improves interoperability. I would also assert that the most interesting entities at eBay are unique to eBay.

DC: That's true now, but if the Web 2.0 vision comes together, you may care: your API traffic could increase dramatically. It would be better to be the one prepared for the scale of the API-Web!

Can you really argue in your company that you don't need to be scalable? What if your port 80 traffic needs to be routed to your APIs for some reason?

DP: While my imaginary co-worker may state that scalability is not a concern for eBay, I would never make such a claim. Scalability is always an architectural consideration. Rather than expecting that port 80 traffic would be routed to the API though, I would expect that future traffic growth might come from applications that leverage the API.

DC: As for interoperability, you could be excluded from Web 2.0 industry-boosting consortia, or excluded from perhaps hugely popular Web 2.0 applications in the future...

Interoperability raises the level of the market as a whole. Market players shouldn't differentiate on what's common to them, they should differentiate on the level above.

It also depends on the value you place on having happy customers who don't have to do the same thing multiple ways or multiple times.

DP: This comes back to defining what operations and entities are common between eBay and other market players. There are probably subsets of entities that are common (e.g. messages or users) but even in that context, there are structured components that require extensions from a common base entity to prove useful. This becomes the substance of the conversation and I also believe the largest challenge that the semantic web currently faces.

DC: OK, let's look at your SOAP API. There are 72 function calls in there that begin with 'Get'. Each one specifies a particular piece of data that you can fetch.

DP: Go on

DC: Sure, but you don't need a new function call for everything you can get from your system: you can just use HTTP GET!

DP: Sure, I just need to parameterize the GET operation to differentiate what data you're requesting. But aren't we largely debating syntax and mechanism, not semantics?

DC: It's not just any 'data going in': the URI can be passed around for anyone to re-use. This URI is more interoperable because so much deployed software understands it. No-one understands 'GetSearchResults()'!

DP: Okay, fair enough. URI oriented requests can be more easily saved and shared.
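To make the point about saving and sharing concrete, here is a minimal sketch of a search expressed as a GET URI rather than a named function call. The host `api.example.com`, the `/search` path, and the parameter names `q`, `category`, and `page` are all assumptions made for illustration; none of them describe a real eBay interface.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base URL, chosen only for illustration.
BASE = "https://api.example.com/search"

def search_uri(query, category=None, page=1):
    """Build a GET URI carrying the inputs a GetSearchResults-style call would take."""
    params = {"q": query, "page": page}
    if category is not None:
        params["category"] = category
    # Sort by key so the same search always produces the same URI.
    return BASE + "?" + urlencode(sorted(params.items()))

def parse_search_uri(uri):
    """Recover the search parameters from a URI someone saved or shared."""
    query = parse_qs(urlsplit(uri).query)
    return {key: values[0] for key, values in query.items()}

uri = search_uri("vintage camera", category="photo")
```

Anyone holding `uri` can bookmark it, mail it, or feed it to another tool, and `parse_search_uri` shows that nothing about the request is lost in the process.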

DC: Another example of how the URI can glue things together is that the data returned from your GETs can have more URIs in them, ready to go! You won't get data from your Web Service with 'GetItem()' in it.

DP: Another fair point.
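A small sketch of what "more URIs in the data" could look like in practice. The XML document, its element names, and the `href` attribute are invented for this example and do not reflect any actual eBay response format.

```python
import xml.etree.ElementTree as ET

# Hypothetical item representation; every name and URI here is an assumption.
ITEM_XML = """
<item href="https://api.example.com/items/1234">
  <title>Vintage camera</title>
  <seller href="https://api.example.com/users/42"/>
  <bids href="https://api.example.com/items/1234/bids"/>
</item>
"""

def linked_uris(xml_text):
    """Return every URI embedded in the representation, keyed by element name."""
    root = ET.fromstring(xml_text.strip())
    return {el.tag: el.get("href") for el in root.iter() if el.get("href")}

links = linked_uris(ITEM_XML)
```

A client that received this document can follow `links["seller"]` or `links["bids"]` with another GET, without knowing which function name would have produced that data.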

DC: REST also talks about the formats of the data behind a URI. In a GET, the response data is given a Content-Type, and there's an expectation that clients will understand the types of data being returned: interoperability comes from broad standardisation of return data.

DP: But now we're back to the point I've made earlier. Standardization assumes common entities. We certainly have entities that can be declared common at a high level (e.g. users, products, messages). These entities become somewhat less common as you dive into the details. We also have several entities that are not common (e.g. items, bids).
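Both sides of this exchange can be seen in a short sketch of a client choosing a handler from the Content-Type header. `application/atom+xml` and `text/calendar` are registered media types; `application/vnd.example.item+xml` is a made-up vendor type standing in for an entity with no shared standard, and the handler descriptions are placeholders.

```python
def media_type(content_type):
    """Strip parameters such as charset and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()

# Types a generic client might already understand.
STANDARD_HANDLERS = {
    "application/atom+xml": "generic feed reader",
    "text/calendar": "generic calendar client",
}

def handler_for(content_type):
    """Pick a handler; unknown types fall back to vendor-specific code."""
    return STANDARD_HANDLERS.get(media_type(content_type), "vendor-specific code")

feed = handler_for("application/atom+xml; charset=utf-8")
item = handler_for("application/vnd.example.item+xml")
```

Standard types are handled by software that already exists, which is the interoperability gain; the unique entity still needs code written against the publisher's own schema.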

DC: The explicit statement of Content-Type reflects a culture of agreement forced by the sharability of URIs: your URIs are more sharable when more clients understand the data they dereference to.

On the other hand, the culture of SOA is to declare custom WSDL and custom XML schemas.

Like I said, one day you may care about interoperability, and having an architecture that puts a high value on content type and schema standardisation, as REST does, puts you one step ahead.

DP: So the suggestion is that all vendors are going to agree on a common set of entities and their detailed schemas? I suppose that might happen for a subset of the entities but I think even that will prove challenging. I was at Sun in 1992 when we proposed an industry standard format for calendar appointments. Fourteen years later there are still competing standards and trying to give a Mac iCal appointment to an Outlook user is harder than it should be.

If there are REST standards around the entities that we publish, then it would make sense for us to consider them. To the extent that our entities are unique, then isn't the format we publish the standard by definition?

DC: You can also gain scalability by partitioning on those URIs.

DP: We partition along many dimensions, URIs just being one of them.

DC: Yes, but URI partitioning cuts right through the system in a very simple way: your partitioning is an application-specific optimisation which has to be hand-coded behind the SOAP interface.

DP: Our partitioning doesn't follow the model you've imagined but I can't really share all of our partitioning magic with you.
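For readers unfamiliar with the idea, the kind of URI partitioning being suggested in this exchange might look like the sketch below: hash the path and use the result to pick a backend pool. This is only an illustration of the generic technique; the pool names are hypothetical, and it is explicitly not a description of how eBay partitions.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical backend pools.
BACKENDS = ["pool-a", "pool-b", "pool-c"]

def backend_for(uri, backends=BACKENDS):
    """Map a URI to a backend using a stable hash of its path."""
    path = urlsplit(uri).path
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket key.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

first = backend_for("https://api.example.com/items/1234")
again = backend_for("https://api.example.com/items/1234?view=full")
```

Because only the path is hashed, every request for the same resource lands on the same pool regardless of query parameters. A simple modulo scheme like this reshuffles most keys when the number of pools changes, which is one reason real systems tend to need more than this.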

DC: Another benefit of using HTTP over using SOAP is that you get caching built in to the architecture, which you can start using as soon as you ask for it in the headers. This boosts scalability.

DP: Caching dynamically generated content is considerably more difficult than you think. There are portions of our results that can be cached but rarely the entire result set from a single request. We already do caching where it can be done and still provide correct results to the interface. Bear in mind that you are talking about a system with more than 5,000 state changes per second.
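The tension here can be sketched with standard HTTP `Cache-Control` directives. The split into "volatile" and "stable" responses and the 300-second default are assumptions for illustration only; deciding which responses are safe to cache is exactly the application-specific judgment under discussion.

```python
def cache_control(volatile, max_age_seconds=300):
    """Choose response headers: volatile data must be revalidated on every use."""
    if volatile:
        return {"Cache-Control": "no-cache"}
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}

def is_fresh(headers, age_seconds):
    """True if a cache may reuse the stored response without revalidating it."""
    directives = [d.strip() for d in headers.get("Cache-Control", "").split(",")]
    if "no-cache" in directives or "no-store" in directives:
        return False
    for directive in directives:
        if directive.startswith("max-age="):
            return age_seconds < int(directive.split("=", 1)[1])
    return False

stable = cache_control(volatile=False, max_age_seconds=60)
live = cache_control(volatile=True)
```

With these headers an intermediary could reuse `stable` for up to a minute, while `live` (say, a current bid) is never served without checking back with the origin. The headers are generic, but the choice of `volatile` for each response still depends on the business rules.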

DC: Which is where you're potentially inefficient.

DP: Correctness must always override efficiency, especially where money is concerned.

DC: Again - it's application-specific.

So - even in the simple cases of fetching data, REST has given you much greater scalability and interoperability than your SOAP interface - as well as a simpler, more generic approach.

DP: In many cases our caches have to be application specific. The correctness of the data can only be ensured by understanding the logic used to generate it. We've studied caching opportunities extensively and apply caching where it can be done safely, with no risk of producing inconsistent or incorrect results. REST isn't going to change the business rules or our customers' expectation of accuracy.

DC: And we're only one-ninth of the way through our conversation!

DP: Great, this has been fun!
