Today at CommunityOne in New York, we’re announcing a bunch of Cloud-related stuff. Some of it has my fingerprints on it. This is my personal take on the interesting parts.

[Disclaimer]: Like it says on the front page, I work for Sun and sometimes even speak for it officially, but not in this blog. These are my own views as a project insider, and the perceptions of what it is and why it matters are mine; the company’s may differ.

Back Story · Just before Christmas, the group I’m in morphed into the Cloud Business Unit. My boss called up and said “That’s not for you, right? Want to move over to GlassFish/Web-tier land?” I said “Hell no, I don’t really grok Cloud but then neither does anyone else. Put me in, coach.”

So, starting right after New Year’s, I’ve been cloudworking with a bunch of people in various Sun shops and the folks from our recent Q-layer acquisition. After a few years in loosely-defined strategy and intelligence and evangelism work, it’s been a real thrill to buckle down and grind away on stuff with a view to shipping it.

The Announcement · We’re going to be rolling out a Sun Cloud offering later this year. It’s going to have a storage service that’ll offer WebDAV and something S3-flavored. Also, there’ll be a compute service, based partly on the Q-layer technology.

And it’s got an API.

The API · This is a unified view of the storage and compute and networking services. It’s built around the notion of a “Virtual Data Center” (VDC), which contains networks and clusters of servers and public IP addresses and storage services. The idea is to give you the administrative and operational handles to build something really big and ambitious. The VDC notion is really slick and I think something like it is going to be required in any serious cloud API going forward.

At the bottom level, the interface is based on HTTP and tries hard to be RESTful. All the resources—servers, networks, virtual data centers—are represented in JSON. [Don’t you mean XML? -Ed.] [Nope, JSON is just right for this one. -Tim]

We even tried to do the “Hypertext as engine of application state” thing. To use the API, you need one URI to get started; it identifies the whole cloud. Dereference that, you get some JSON that has URIs for your VDCs; dereference those, and you get more JSON that represents your clusters and servers and networks and so on. This has the nice side-effect that the API doesn’t constrain the design of the URI space at all. [Who cares? -Ed.] [Stay tuned. -Tim]
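To make the link-following idea concrete, here’s a minimal Python sketch. It is not the Sun Cloud client library: the field names (`vdcs`, `servers`, `name`, `status`) and every URI are hypothetical, and the network is replaced by an in-memory dict of URI to JSON body so the thing runs standalone. The point is that the client starts from one URI and only ever follows URIs it finds in representations, which is why the server is free to lay out its URI space however it likes (note that the two servers live under unrelated paths, even on different hosts).

```python
import json

# Hypothetical stand-in for the web: maps each URI to the JSON body a GET
# would return. Field names and URIs are illustrative assumptions, not the spec.
FAKE_WEB = {
    "https://cloud.example.com/": json.dumps({
        "name": "example cloud",
        "vdcs": ["https://cloud.example.com/x7/vdc-1"],
    }),
    "https://cloud.example.com/x7/vdc-1": json.dumps({
        "name": "my-vdc",
        "servers": [
            "https://cloud.example.com/q/3f9a",
            "https://other-host.example.net/servers/42",
        ],
    }),
    "https://cloud.example.com/q/3f9a": json.dumps(
        {"name": "web-1", "status": "running"}),
    "https://other-host.example.net/servers/42": json.dumps(
        {"name": "db-1", "status": "stopped"}),
}


def get_json(uri):
    """Dereference a URI and parse its JSON body (a real client would do an HTTP GET)."""
    return json.loads(FAKE_WEB[uri])


def list_servers(root_uri):
    """Walk from the single entry-point URI down to every server.

    Only links found inside representations are followed; the client never
    constructs a URI itself, so it makes no assumptions about URI layout.
    """
    servers = []
    cloud = get_json(root_uri)
    for vdc_uri in cloud["vdcs"]:
        vdc = get_json(vdc_uri)
        for server_uri in vdc["servers"]:
            servers.append(get_json(server_uri))
    return servers


for server in list_servers("https://cloud.example.com/"):
    print(server["name"], server["status"])
```

Swap `get_json` for a real HTTP GET and nothing else in the walk has to change.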

This interface does more than just Resource CRUD; there are operations like “Start server” and “Snapshot storage”. The kind of thing that classic REST design patterns don’t really give you a cookbook for. Here’s an example of how it works: the representation of a server includes a bunch of “controller” URIs; a POST to one of these constitutes a request to do something, say, reboot the server.
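Here’s roughly what the controller idea looks like from the client side, again as a hedged Python sketch rather than the real library. The `controllers` object, its keys, and the URIs are assumptions made up for illustration; the request is built with the standard library’s `urllib.request.Request` but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical server representation; the "controllers" object and its keys
# are assumptions for illustration, not the spec's actual field names.
server = json.loads("""
{
  "name": "web-1",
  "status": "running",
  "controllers": {
    "start":  "https://cloud.example.com/q/3f9a/ops/start",
    "stop":   "https://cloud.example.com/q/3f9a/ops/stop",
    "reboot": "https://cloud.example.com/q/3f9a/ops/reboot"
  }
}
""")


def build_action_request(representation, action):
    """Return (but do not send) a POST to the named controller URI.

    The client learns which operations exist, and where to send them,
    from the representation itself rather than from a URI template.
    """
    controllers = representation.get("controllers", {})
    if action not in controllers:
        raise ValueError(f"server does not offer a {action!r} controller")
    return urllib.request.Request(
        controllers[action],
        data=b"",  # empty body: the POST itself is the request to act
        method="POST",
    )


req = build_action_request(server, "reboot")
print(req.get_method(), req.full_url)
# A real client would now call urllib.request.urlopen(req).
```

If the server stops offering an operation, it simply drops the controller URI from the representation, and the client finds out without any out-of-band versioning.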

On top of the low-level REST there’s a set of libraries for those who’d rather not deal with HTTP messaging; available off the top in Java, Ruby, and Python. (Hmm, the other day I saw somebody check something into a directory called php, but that’s not a commitment.)

On top of that there’s a command-line interface suitable for shell-scripting, except that it emits JSON instead of classic-Unix lines-of-text. I wonder how that will work out?

Finally, there’s a Web GUI so you can build your VDC by dragging and dropping things around. It’s nice demo-ware and I can see people using that for getting a quick ad-hoc server deployment on the air on short notice. But my bet is that for heavy lifting, you’re going to want to script your deployments, not drag-and-drop them.

Zero Barrier to Exit · Maybe the single most interesting thing about this API is that the spec is published under a Creative Commons “Attribution” license, which means that pretty well anybody can do pretty well anything with it. I’m convinced that if Cloud technology is going to take off, there’ll have to be a competitive ecosystem, so that when you bet on a service provider and the relationship doesn’t work out, there’s a way to take your business to another provider with relatively little operational pain. Put another way: no lock-in.

I got all excited about this back in January at that Cloud Interop session. Anant Jhingran, an IBM VIP, spoke up and said “Customers don’t want interoperability, they want integration.”

“Bzzzzzzzzzt! Wrong!” I thought. But then I realized he was only half wrong; anyone going down this road needs both integration and interoperability.

So that’s what we’re trying to do here. We’ve done a lot of work to keep the interfaces generic rather than Sun-specific, and I think we won’t be the only provider of cloud-computing services through this API.

A Work In Progress · Not only is the API CC-licensed and free for use by anybody, it’s not finished yet. We’ve got a lot of back-end infrastructure already built, but there’s still time to refine and improve the API before we’re in ship/lockdown mode. The work’s being done in public over at a Kenai.com project called The Sun Cloud APIs. The spec-ware is on a set of wiki pages starting here. If you want an introduction, the place to start is “Hello Cloud” — An illustrative walk-through of the Sun Cloud API.

If you want to be part of the design process, get yourself a Kenai login and join the project. That gets you a ticket to the forums (which have an Atom feed, thank goodness). Down the road, there’s no rule saying committers have to be Sun people; this should be a meritocracy.

How about taking this to a standards organization? I suppose I’d be OK with that once there are a few implementors, and proof that it works. We’re confident that we can build infrastructure behind every interface that’s in there now, which is good; if someone else could do so independently, that’d be better. If we were going to do that, my feeling is that the right level to standardize would be the REST/HTTP interface; let implementors compete to offer slick high-level programming-language APIs.

Why REST? · It’s a sensible question. The chief virtue of RESTful interfaces is massive scaling. But gimme a break, these are data-center management operations; a typical transaction frequency would be a single-digit number per week, with the single digit often being “0”, and it wouldn’t be surprising if a big multi-cluster staged-boot operation had a latency of minutes. The data-center controls are unlikely to be a bottleneck.

Why, then? Simply because we wanted a bits-on-the-wire interface. APIs, in the general case, suck, and are really hard to make portable. Bits-on-the-wire are ultimately flexible and interoperable. If you’re going to do bits-on-the-wire, why not use HTTP? And if you’re going to use HTTP, use it right. That’s all.

However, I think we will be forgiven, in this case, for not really sweating the ETags and caching part of the spec yet.

My Fingerprints · I’ve been working on the specification at the REST level. Most of the heavy lifting was done by Craig McClanahan with guidance from Lew Tucker. I played my accustomed role as designated minimalist: the API has become noticeably smaller since I got involved. I suspect Craig is still feeling a bit traumatized by my enthusiastic wielding of the spec machete.

Also, I’ve been implementing a glue-code bridge between the REST API and the Q-layer back-end. It’s in Ruby, and so far I’m talking straight to Rack; the “router” is just a big case statement over URI-matching regexps.
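For flavor, here’s what a case-statement-over-regexps router looks like. This is a Python sketch, not the actual Ruby/Rack code: WSGI stands in for Rack (both boil down to a callable that takes the request environment and produces a status, headers, and body). The URI patterns and responses are hypothetical, and the back-end calls are stubbed out.

```python
import json
import re

# Hypothetical URI patterns; the real bridge's URI space isn't described here.
VDC_RE = re.compile(r"^/vdc/(?P<vdc>[^/]+)$")
SERVER_RE = re.compile(r"^/vdc/(?P<vdc>[^/]+)/server/(?P<server>[^/]+)$")
REBOOT_RE = re.compile(r"^/vdc/(?P<vdc>[^/]+)/server/(?P<server>[^/]+)/reboot$")


def respond(start_response, status, body):
    """Serialize body as JSON and hand status and headers to the WSGI server."""
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def app(environ, start_response):
    """WSGI app whose whole 'router' is one if/elif chain over regexps."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")

    if method == "GET" and (m := VDC_RE.match(path)):
        # A real bridge would ask the back end about this VDC here.
        return respond(start_response, "200 OK", {"vdc": m["vdc"]})
    elif method == "GET" and (m := SERVER_RE.match(path)):
        return respond(start_response, "200 OK",
                       {"vdc": m["vdc"], "server": m["server"]})
    elif method == "POST" and (m := REBOOT_RE.match(path)):
        # Accepted, not done: the reboot would happen asynchronously.
        return respond(start_response, "202 Accepted", {"rebooting": m["server"]})
    else:
        # Everything unmatched falls through to 404 (a fuller version
        # would return 405 for a known path with the wrong method).
        return respond(start_response, "404 Not Found", {"error": path})


def call(method, path):
    """Tiny test harness: invoke the app without a real HTTP server."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path},
                        start_response))
    return captured["status"], json.loads(body)


print(call("POST", "/vdc/alpha/server/web-1/reboot"))
```

It’s crude, but for a handful of resource types a flat chain like this is easy to read and easy to change while the back end is still moving.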

I’m not sure, at this point, whether it’s a proof-of-concept or ends up shipping. The Q-layer interface is a moving target; we just completed the acquisition around January 1 and they’re making a bunch of changes to morph the product into what we need for the Sun Cloud.

Open source? Maybe, if it turns out to work. The subject hasn’t even come up.

The Business End · How do you make money in clouds? I’m not convinced that there are big fat margins in operating global-scale infrastructure, competing with Amazon AWS and its peers. I am 100% convinced that if there were a general-purpose platform for running behind the firewall to automate scaling and deployment and take IT out of many loops, a whole lot of enterprises would love that kind of elasticity in their infrastructure.

Machine virtualization is a big deal, obviously. Lightweight, lock-in-free data-center virtualization might be bigger, I think.