Friday, June 5, 2009 at 3:08AM

Mather Corgan, president of HotPads, gave a great talk on how HotPads uses AWS to run their real estate search engine. I loved the presentation for a few reasons:

It gives real costs for their servers, how many servers they have, what they are used for, and exactly how they use S3, EBS, CloudFront and other AWS services. This is great information for anybody trying to architect a system and wondering where to run it.

HotPads is a "real" application. It's a small company and at 4.5 million page-views/month it's large but not super large. It has custom server side components like indexing engines, image processing, and background database update engines for syncing new real estate data. And it also stores a lot of images and has low latency requirements.



This is a really good example of where many companies are, or would like to be, with their applications.



Their total costs are about $11K/month, which is about what they were paying at their previous provider. I found this a little surprising, as I thought the cloud would be more expensive, but they only pay for what they need instead of having to over-provision for transient uses like testing. And some servers aren't necessary anymore: EBS handles backups, so the extra database slave servers that existed only to run backups are no longer required.



There are lots more lessons like this, which I've summarized below.



Site: http://hotpads.com - a map-based real estate search engine, listing homes for sale, apartments, condos, and rental houses.

Stats

800,000 visits/month

4.5 million page-views/month

3.5 million real-estate listings updated daily

Platform

Java

MySQL

AWS

Costs

EC2 - $7400/month - about 20 instances of various sizes run at any one time. Most work is in the background processing of images, not web serving.

* $150: 2 Small HAProxy Load Balancers - 2 for failover, these have the elastic IPs, round robin DNS point at the elastic IPs.

* $1,200: 3-5 Large Tomcat Web Servers - an array of 3 run at night and 5 during the day.

* $1,500: 5 Large Tomcat Job Servers

* $900: 1 X-Large 1 Large Index Server - used to power property search and have several GB of RAM for the JVM

* $1,200: 1 X-Large 2 Large MySQL masters

* $1,200: 1 X-Large 2 Large MySQL slaves

* $300: 1 Large Messaging Server ActiveMQ - will be replaced with SQS

* $300: 1 Large Map tile creation servers Tilecache

* $600: Development/testing/migration servers

S3 - $1500/month - a few hundred million objects, mostly map files and real-estate listing photos. 4TB of database backups stored as EBS diffs ($600/month).

Elastic Block Storage - $500/month

CloudFront - $460/month - used to serve static files, map tiles, and listing photos throughout the world.

Elastic IP Addresses - $8/month

RightScale - $500/month - used for management and deployment.

Lessons Learned

The major reason for choosing EC2 was the cloud API, which allows adding servers at any time. With their previous hosting service they had to prepay for a month at a time, so they would order the minimum necessary to get by that month. That doesn't leave room for development servers, test servers, preview servers for customers, or live database server upgrades (which require 2x the servers).

Overall cost is about the same as with the previous hosting provider, but the speed of development and ease of management are night and day different. They get more servers and a lot more flexibility.

HotPads is a small company and doesn't think the added trouble of colocation is worth it for them yet.

Advantage of Amazon over something like Google App Engine is that Amazon allows you to innovate by building your own services on your own machines.

S3 is better for larger objects: for small files that are not viewed often, the cost of puts outweighs everything else. It is not a good cache for short-lived objects, because the put costs start to dominate.

* A 67 KB object (a 600 px image) is the break-even point, where the cost of putting an image into S3 about equals the cost of storing it there.

* For a 6.7 KB object (15 px thumbnail) the put cost (the small fee for putting an object into S3) is 10x the storage or transfer cost.
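The break-even size can be reproduced with a little arithmetic. The sketch below assumes the 2009-era S3 list prices of $0.01 per 1,000 PUT requests and $0.15 per GB-month of storage. Neither price is quoted for storage in the talk notes, so treat both constants as assumptions; the function names are made up for illustration.

```python
# Assumed prices (USD); adjust to whatever your bill actually shows.
PUT_PRICE_PER_1K = 0.01            # per 1,000 PUT requests
STORAGE_PRICE_PER_GB_MONTH = 0.15  # per GB stored for one month
BYTES_PER_GB = 1_000_000_000


def put_cost() -> float:
    """Cost of a single PUT request."""
    return PUT_PRICE_PER_1K / 1000


def monthly_storage_cost(size_bytes: float) -> float:
    """Cost of keeping one object of this size in S3 for a month."""
    return size_bytes / BYTES_PER_GB * STORAGE_PRICE_PER_GB_MONTH


def break_even_bytes() -> float:
    """Object size at which one PUT costs the same as one month of storage."""
    return put_cost() / (STORAGE_PRICE_PER_GB_MONTH / BYTES_PER_GB)


def put_to_storage_ratio(size_bytes: float) -> float:
    """How many months of storage one PUT is worth for this object size."""
    return put_cost() / monthly_storage_cost(size_bytes)


print(f"break-even size: {break_even_bytes() / 1000:.1f} KB")
print(f"6.7 KB object: PUT is {put_to_storage_ratio(6_700):.1f}x a month of storage")
```

Under these assumed prices the break-even lands just under 67 KB and the 6.7 KB thumbnail comes out at roughly 10x, which is consistent with the two bullets above.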

Costs have to be figured into the algorithms you use.

* In April, 330 GB of images downloaded at $.15/GB cost $49. 55mm GETs at $1/mm cost $55. 42mm PUTs at $.01/1k cost $420!

* $100 for downloads and GETs of map tiles.

* So S3 is very cheap for larger files, but watch out for lots of short-lived small files.
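The April image figures can be reconstructed to see how lopsided the bill is. This is a sketch with a hypothetical helper, `s3_request_bill`. The transfer and GET prices are the ones quoted above, and the PUT price is taken as $0.01 per 1,000 requests, which is the rate that reproduces the $420 figure.

```python
def s3_request_bill(download_gb, gets, puts,
                    transfer_per_gb=0.15,
                    get_per_million=1.00,
                    put_per_thousand=0.01):
    """Return a dict of cost components (USD) for one month of S3 traffic."""
    return {
        "transfer": download_gb * transfer_per_gb,
        "gets": gets / 1_000_000 * get_per_million,
        "puts": puts / 1_000 * put_per_thousand,
    }


# April image traffic as reported in the talk.
april = s3_request_bill(download_gb=330, gets=55_000_000, puts=42_000_000)
total = sum(april.values())

for component, cost in april.items():
    print(f"{component:<9} ${cost:8.2f}  {cost / total:6.1%}")
print(f"{'total':<9} ${total:8.2f}")
```

PUTs account for about 80% of this part of the bill even though there were fewer PUTs than GETs, because a PUT is priced ten times higher per request than a GET.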

CloudFront is 10 times faster than S3 but is more expensive for infrequently viewed files.

* Makes frequently viewed listings faster.

* For infrequently viewed listings the CloudFront has to go to S3 to get the file the first time which means you have to pay twice for a file that will be viewed only once.
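The "pay twice" effect can be sketched with a toy cost model. Everything here is an assumption for illustration: the model uses a single edge cache that never expires the file, and the per-view costs are unit placeholders rather than real AWS prices.

```python
def cost_direct_from_s3(views, s3_cost_per_view):
    """Every view is served straight from S3."""
    return views * s3_cost_per_view


def cost_via_cloudfront(views, s3_cost_per_view, cf_cost_per_view,
                        origin_fetches=1):
    """CloudFront pulls the file from S3 once per origin fetch,
    then every view is billed at the CloudFront rate."""
    if views == 0:
        return 0.0
    return origin_fetches * s3_cost_per_view + views * cf_cost_per_view


# Placeholder unit costs: assume S3 and CloudFront cost the same per view.
S3_UNIT = 1.0
CF_UNIT = 1.0

for views in (1, 10, 100):
    direct = cost_direct_from_s3(views, S3_UNIT)
    cdn = cost_via_cloudfront(views, S3_UNIT, CF_UNIT)
    print(f"{views:>3} views: CloudFront costs {cdn / direct:.2f}x direct S3")
```

With equal unit costs, a file viewed once costs twice as much through CloudFront, while at 100 views the overhead of the origin fetch shrinks to 1%. In practice more edge locations and cache expiry mean more origin fetches, which the `origin_fetches` parameter can approximate.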

EBS

* Used on database servers because it's faster than local storage (especially for random writes), data blocks are stored redundantly, and it supports easy backups and versioning via cloning.

* Only 10% cost overhead.

* Allowed them to get rid of a second set of slaves. Backups were so CPU intensive that they needed dedicated slaves to run them; EBS allows snapshots of running drives, so the extra slaves are unnecessary.

* Databases are I/O bound and the CPU is vastly underutilized so there's extra capacity when you need it.
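The talk notes don't say how HotPads triggers its snapshots, so the following is only a sketch of what snapshotting running database volumes might look like. It assumes a client object with a boto3-style `create_snapshot(VolumeId=..., Description=...)` method that returns a dict containing `SnapshotId`. The `snapshot_volumes` helper, the volume IDs, and the `FakeEC2Client` stand-in are all hypothetical.

```python
from datetime import datetime, timezone


def snapshot_volumes(ec2_client, volume_ids, label="db-backup"):
    """Request one EBS snapshot per volume and return the snapshot IDs.

    ec2_client is assumed to expose a boto3-style create_snapshot call.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"{label} {volume_id} {stamp}",
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids


class FakeEC2Client:
    """Stand-in so the sketch runs without AWS credentials."""

    def __init__(self):
        self.calls = []

    def create_snapshot(self, VolumeId, Description):
        self.calls.append((VolumeId, Description))
        return {"SnapshotId": f"snap-{len(self.calls):04d}"}


# Hypothetical volume IDs, for illustration only.
fake = FakeEC2Client()
print(snapshot_volumes(fake, ["vol-aaaa", "vol-bbbb"]))
```

Because the snapshot is taken by the storage layer rather than by a dump process on the database host, the backup no longer competes for the database server's CPU, which is the point made in the bullets above.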

SimpleDB - not using it; it's pretty proprietary. It may be of value because you only pay for what you use, given how underutilized your own database servers can be.

Reserved Instances

* 1 year for the cost of 6 months, plus a guarantee of getting an instance (they were denied one time).

* The con is being tied to an instance type. They want the flexibility to choose instance types as their software changes and to take advantage of new instance types as they are released.
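The "1 year for the cost of 6 months" rule of thumb implies a simple break-even: a reservation only pays off if the instance type is actually used for more than six months of the year. A minimal sketch, measuring everything in months of on-demand cost (the function name is made up for illustration):

```python
# From the talk: a one-year reservation costs about six months of on-demand use.
RESERVED_COST_IN_ON_DEMAND_MONTHS = 6.0


def reserved_savings(months_used: float) -> float:
    """Savings from reserving, in months of on-demand cost.

    Negative means the reservation cost more than paying on demand.
    """
    return months_used - RESERVED_COST_IN_ON_DEMAND_MONTHS


for months in (4, 6, 12):
    print(f"{months:>2} months used: {reserved_savings(months):+.1f} on-demand months")
```

This is why being locked to an instance type matters: if the software changes and the reserved type is abandoned after four months, the reservation ends up costing two months more than on-demand would have.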