Today I launched a “field trial” of my sort-of-a-startup (http://welshare.com). Here are some technical details.

The stack:

Java – the language and platform I’m most familiar with. The JVM is widely considered to be very performant, so that’s welcome.

Spring – all the internal wiring of the application is done by the Spring framework. Again, I’m more or less a ‘Spring expert’. Very interesting portfolio projects are coming out now (spring-mobile, spring-data, etc.), so again, that’s welcome.

spring-mvc – this is the web framework. It makes a RESTful style of programming easy.

jQuery – obviously, some client-side richness is needed, and jQuery is the industry standard. I didn’t have any prior experience with jQuery, so the client-side code needs some polishing.

MySQL – well, no NoSQL for now. I’ll discuss this decision in a moment.

Hibernate – the most common choice for object-relational mapping. Why object-relational mapping in the first place? I’m used to it, and it makes development easy. It also offers some very good features, either out of the box or through its sister projects, like caching, search (Hibernate Search), sharding (Hibernate Shards), etc.

Lucene – search is a must, and Lucene is, again, an industry standard. Why plain Lucene, and not Solr? More on that in a minute.

Ehcache – caching is a must. Both Hibernate and Spring have good caching options, and both support Ehcache. Ehcache is a standard caching solution for Java, similar to what memcached is for LAMP. (Well, memcached has Java clients too, and it differs a little from Ehcache, in terms of replication at least, but that’s out of the scope of this article.)
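The idea behind those caching options can be shown with a minimal cache-aside sketch in plain Java. This is an illustration, not Ehcache’s API: a `ConcurrentHashMap` stands in for an Ehcache cache region, and `UserProfileService` and `loadFromDatabase` are hypothetical names. With Spring or Hibernate the same behaviour is configured declaratively, and Ehcache adds expiry, size limits and eviction policies on top.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service; the map is only a stand-in for an Ehcache cache region.
class UserProfileService {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    // Counts how often the (simulated) database is actually queried.
    final AtomicInteger databaseHits = new AtomicInteger();

    String getDisplayName(long userId) {
        // Cache-aside: return the cached value, or load it once and remember it.
        return cache.computeIfAbsent(userId, this::loadFromDatabase);
    }

    void evict(long userId) {
        // Must be called when the underlying data changes, otherwise readers see stale values.
        cache.remove(userId);
    }

    private String loadFromDatabase(long userId) {
        databaseHits.incrementAndGet();
        return "user-" + userId; // placeholder for a real DAO call
    }
}
```

Calling `getDisplayName(1)` twice queries the simulated database only once; after `evict(1)` the next call loads it again.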

Put together, the stack above forms a typical web-application architecture: a presentation layer (the browser/jQuery sending and getting data from spring-mvc controllers), a service layer that is invoked by the controllers and contains all the logic, and a DAO layer that contains all the database access.
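A minimal sketch of that layering, in plain Java with hypothetical names (`FollowingController`, `FollowingService`, `FollowingDao`) and an in-memory DAO standing in for the Hibernate/MySQL one. In the real application Spring would inject the dependencies and the controller would be a spring-mvc controller; here they are wired by hand so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// DAO layer: the only place that knows how the data is stored.
interface FollowingDao {
    void saveFollowing(long followerId, long followedId);
    List<Long> findFollowed(long followerId);
}

// In-memory stand-in for a Hibernate-backed DAO.
class InMemoryFollowingDao implements FollowingDao {
    private final Map<Long, List<Long>> followings = new HashMap<>();

    public void saveFollowing(long followerId, long followedId) {
        followings.computeIfAbsent(followerId, k -> new ArrayList<>()).add(followedId);
    }

    public List<Long> findFollowed(long followerId) {
        return followings.getOrDefault(followerId, new ArrayList<>());
    }
}

// Service layer: business rules only; it depends on the DAO interface, not on storage.
class FollowingService {
    private final FollowingDao dao;

    FollowingService(FollowingDao dao) { this.dao = dao; }

    void follow(long followerId, long followedId) {
        if (followerId == followedId) {
            throw new IllegalArgumentException("users cannot follow themselves");
        }
        // Avoid duplicate followings.
        if (!dao.findFollowed(followerId).contains(followedId)) {
            dao.saveFollowing(followerId, followedId);
        }
    }

    List<Long> following(long userId) { return dao.findFollowed(userId); }
}

// Presentation layer: translates requests into service calls.
class FollowingController {
    private final FollowingService service;

    FollowingController(FollowingService service) { this.service = service; }

    String follow(long currentUserId, long targetUserId) {
        service.follow(currentUserId, targetUserId);
        return "ok";
    }
}
```

Each layer only talks to the one directly below it, which is what keeps the later storage changes local to the DAOs.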

And yes, it is trivial and boring. If I want this application to become “big”, surely I need something more... scalable... and fashionable, like NoSQL, messaging, or a distributed file system. Well, no.

I’ve already expressed my views about NoSQL for startups. In short: it was way easier and faster to write the thing with MySQL than to cope with NoSQL. I tried Cassandra (and spent some more time researching why I should try it), but it wasn’t a good match.

Messaging. Twitter uses message queues to balance the load and to hold messages until they can be handled. I started the project with JMS (ActiveMQ) and coded to that paradigm for a short while. Then I realized that... I don’t need this complication yet. So I dropped it.

I mentioned that search is a must. The typical choice for a search engine is Solr, which is backed by Lucene. But I chose Hibernate Search with Lucene. Why? Because Hibernate Search keeps the Lucene index in sync with the database automatically. I don’t have to write any code to communicate with Solr, let alone deploy and manage Solr.

I guess the picture is clear by now. I have created a minimum viable product in terms of technology (and a not-so-minimal one in terms of features). If I had to use and deploy NoSQL, Solr, a message queue, and whatnot, I wouldn’t have been able to launch.

Now, I agree, this sounds like “that guy has made some crappy piece of software that will never be capable of growing”. But the simple architecture, combined with clean code, is quite extensible. Here is what I mean:

When I choose (or have) to switch to a NoSQL store for something, the only code I have to change is a few DAOs; everything else stays the same. I’ve been strictly keeping the layer boundaries, so the service layer has absolutely no knowledge of which storage mechanism is used. In fact, I plan to move the URL shortening to a key-value store really soon, and the user relationships (followings and friendships) to a graph database in the near future. With the separated layers and the help of the spring-data project, I see this as “not a big deal”. Certainly it will be time-consuming, if only because of the learning curve.
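To make the “only a few DAOs change” point concrete, here is a sketch with hypothetical names: a `ShortUrlDao` interface with a relational-style implementation and a key-value one. Both are backed by in-memory maps so the example runs without MySQL or a real key-value store, and `KeyValueStore` is an assumed minimal client API, not any particular product’s. The `UrlShorteningService` is identical in both cases; only the wiring changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The service layer only ever sees this interface.
interface ShortUrlDao {
    void save(String key, String longUrl);
    String find(String key);
}

// Stand-in for today's Hibernate/MySQL DAO: the map plays the role of a table.
class RelationalShortUrlDao implements ShortUrlDao {
    private final Map<String, String> table = new HashMap<>();
    public void save(String key, String longUrl) { table.put(key, longUrl); }
    public String find(String key) { return table.get(key); }
}

// Hypothetical minimal client API of a key-value store.
interface KeyValueStore {
    void put(String key, String value);
    String get(String key);
}

class InMemoryKeyValueStore implements KeyValueStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
}

// Future DAO: adapts the same interface to the key-value store, namespacing the keys.
class KeyValueShortUrlDao implements ShortUrlDao {
    private final KeyValueStore store;
    KeyValueShortUrlDao(KeyValueStore store) { this.store = store; }
    public void save(String key, String longUrl) { store.put("url:" + key, longUrl); }
    public String find(String key) { return store.get("url:" + key); }
}

// Identical no matter which DAO is injected.
class UrlShorteningService {
    private final ShortUrlDao dao;
    private long counter = 0; // not thread-safe; good enough for a sketch

    UrlShorteningService(ShortUrlDao dao) { this.dao = dao; }

    String shorten(String longUrl) {
        String key = Long.toString(++counter, 36); // base-36 counter as the short code
        dao.save(key, longUrl);
        return key;
    }

    String expand(String key) { return dao.find(key); }
}
```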

A message queue can be incorporated into the architecture really easily: I’ll just have to plug it in between the service and the DAO layers. Not much hassle, especially given the limited scope of the MQ. Of course, it will take time to test and measure, but it is not a complex task.
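A sketch of what plugging a queue in between the service and the DAO layers could look like. A `java.util.concurrent.BlockingQueue` and a single worker thread stand in for JMS/ActiveMQ here, and `MessageDao` and `QueuedMessageDao` are hypothetical names. The queue is wrapped in a decorator that implements the same DAO interface, so the service layer does not change.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// The DAO interface the service layer already uses.
interface MessageDao {
    void save(String message);
}

// Stand-in for the real database-backed DAO.
class InMemoryMessageDao implements MessageDao {
    final List<String> saved = new CopyOnWriteArrayList<>();
    public void save(String message) { saved.add(message); }
}

// Decorator with the same interface: callers enqueue, a worker thread persists.
class QueuedMessageDao implements MessageDao {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;
    private final Thread worker;

    QueuedMessageDao(MessageDao target) {
        worker = new Thread(() -> {
            // Keep consuming until shutdown is requested and the queue is drained.
            while (!stopped || !queue.isEmpty()) {
                try {
                    String message = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (message != null) {
                        target.save(message);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.start();
    }

    // Returns immediately; the actual write happens later on the worker thread.
    public void save(String message) {
        queue.add(message);
    }

    // Lets the worker persist everything queued so far, then waits for it to exit.
    void shutdown() throws InterruptedException {
        stopped = true;
        worker.join();
    }
}
```

The trade-off is that writes become asynchronous, so a read issued right after a save may not see it yet. A persistent JMS queue would also keep messages across restarts, which this in-memory version does not.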

Solr – if Lucene with Hibernate Search turns out to have performance or scalability problems, all that will be required is to write a post-processor (running after the DAOs finish the database work) that communicates with Solr. Currently this post-processing is provided by Hibernate Search, and it does not talk to Solr. Not a small task, but one requiring minimal changes to the existing code.
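One way to sketch such a post-processor, with hypothetical names throughout, is as a DAO decorator. It lets the wrapped DAO do the database work, then hands the saved entity to a `SearchIndexer`. A real indexer would send the document to Solr (for instance through the SolrJ client). Here a naive in-memory word index stands in, so the example runs without a Solr server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity.
class Post {
    final long id;
    final String text;
    Post(long id, String text) { this.id = id; this.text = text; }
}

interface PostDao {
    void save(Post post);
}

// Hypothetical abstraction over the search engine (Solr in the scenario above).
interface SearchIndexer {
    void index(Post post);
}

// Stand-in for the existing database DAO.
class InMemoryPostDao implements PostDao {
    final Map<Long, Post> rows = new HashMap<>();
    public void save(Post post) { rows.put(post.id, post); }
}

// Stand-in for a Solr-backed indexer: a naive word -> post ids map.
class InMemorySearchIndexer implements SearchIndexer {
    final Map<String, List<Long>> index = new HashMap<>();
    public void index(Post post) {
        for (String word : post.text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(post.id);
            }
        }
    }
}

// The post-processor: same interface, runs after the wrapped DAO has done its work.
class IndexingPostDao implements PostDao {
    private final PostDao target;
    private final SearchIndexer indexer;

    IndexingPostDao(PostDao target, SearchIndexer indexer) {
        this.target = target;
        this.indexer = indexer;
    }

    public void save(Post post) {
        target.save(post);   // database write first
        indexer.index(post); // then update the search index
    }
}
```

In a transactional setup the indexing should ideally happen only after the transaction commits, so a rolled-back write never reaches the index.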

You get the picture: I’ve postponed some time-consuming tasks, but I have taken measures to make them really easy to do later. It’s all about avoiding over-architecture and over-design. Those have always brought trouble, so I tried hard not to overdesign things.

So what will happen if I get millions of users overnight? I won’t 🙂 But I have written the code so that it is prepared to sustain growth, should it come.

(With one exception, which is a really horrible mistake: I don’t have enough tests. A few unit tests and a few Selenium tests just aren’t enough. I hope that doesn’t come back to bite me.)

Finally, a few operational details. I’m currently using Amazon EC2, but without being coupled to any other Amazon service. I’m storing files on S3, but with a single configuration change I can switch to the file system (which I do for development).
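A sketch of that one-setting switch, with hypothetical names: a `FileStorage` interface, a local file-system implementation using `java.nio.file`, and a factory that picks the implementation from an assumed `storage.type` property. The S3 implementation is deliberately left out (it would wrap the AWS SDK), so the factory throws for it here rather than pretending to talk to S3.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// The rest of the application only depends on this interface.
interface FileStorage {
    void store(String key, byte[] content) throws IOException;
    byte[] load(String key) throws IOException;
}

// Used in development: files are kept under a local root directory.
class LocalFileStorage implements FileStorage {
    private final Path root;

    LocalFileStorage(Path root) { this.root = root; }

    public void store(String key, byte[] content) throws IOException {
        Path target = root.resolve(key);
        Files.createDirectories(target.getParent()); // e.g. creates "avatars/" on demand
        Files.write(target, content);
    }

    public byte[] load(String key) throws IOException {
        return Files.readAllBytes(root.resolve(key));
    }
}

class FileStorageFactory {
    // Chooses the implementation from configuration (hypothetical property names).
    static FileStorage fromConfig(Properties config) {
        String type = config.getProperty("storage.type", "filesystem");
        switch (type) {
            case "filesystem":
                return new LocalFileStorage(Paths.get(config.getProperty("storage.root", "uploads")));
            case "s3":
                // Would return an S3-backed implementation built on the AWS SDK (omitted here).
                throw new UnsupportedOperationException("S3 storage is not part of this sketch");
            default:
                throw new IllegalArgumentException("Unknown storage.type: " + type);
        }
    }
}
```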

Builds are done via Maven and Hudson. I don’t really use Hudson for deploying, at least for now. Deploying without downtime is a wonderful feature of Tomcat 7 (parallel deployment): you can have two versions of the same application running simultaneously, distinguished by a ##version suffix in the WAR file name (e.g. ROOT##002.war). New sessions go to the newest version, and the older version stays until all of its active sessions expire.

Monitoring is done via JMX and Amazon’s CPU monitoring.
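On the JMX side, an application-specific metric can be published as a standard MBean on the platform MBean server. JConsole or VisualVM then show it next to the JVM’s built-in metrics. This is a sketch assuming a hypothetical `ActiveUsers` counter registered under the made-up object name `com.example.welshare:type=ActiveUsers`. The standard MBean convention requires a public interface named after the implementation class plus `MBean`, and each getter becomes a read-only attribute.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class Monitoring {
    // Standard MBean convention: public interface named <ImplementationClass>MBean.
    public interface ActiveUsersMBean {
        int getActiveUsers(); // exposed as the read-only attribute "ActiveUsers"
    }

    // Hypothetical counter that the login/logout code would update.
    public static class ActiveUsers implements ActiveUsersMBean {
        private final AtomicInteger count = new AtomicInteger();

        public void userLoggedIn() { count.incrementAndGet(); }

        public void userLoggedOut() { count.decrementAndGet(); }

        @Override
        public int getActiveUsers() { return count.get(); }
    }

    // Registers the bean on the platform MBean server so JMX clients can read it.
    static ObjectName register(ActiveUsers bean) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.welshare:type=ActiveUsers");
        server.registerMBean(bean, name);
        return name;
    }
}
```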

In conclusion, I hope I made the right decisions. Time and server monitoring will tell.