Posted Jan 14, 2011

Apache Cassandra 0.7 Advances Open Source NoSQL

By Sean Michael Kerner

Since its inception, the open source Apache Cassandra NoSQL database has focused on being highly scalable.

With this week's Cassandra 0.7 release, that scalability improves with secondary indexes and large row support for up to two billion columns per row.

Apache Cassandra 0.7 follows the 0.6 release, which came out in April of 2010. The Cassandra database was originally developed at Facebook and has since been adopted by multiple web properties, including Twitter, Digg and Reddit. Commercial support is available from startup Riptano.

"0.7 is a really strong release to start 2011 off with," Jonathan Ellis, vice president of Apache Cassandra, told InternetNews.com. "We continue to see interest in Cassandra from a variety of market verticals, everything from e-commerce to government."

One of the key new features in the 0.7 release is secondary index support. Ellis explained that secondary indexes make development easier, as developers no longer have to maintain index ColumnFamilies manually in their application code.

Additionally, Ellis noted that secondary indexes are guaranteed to stay in sync with the source object data. Secondary indexes in Cassandra are also non-blocking.

"Cassandra can create secondary indexes in the background on existing data without blocking queries or updates," Ellis said.
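To give a sense of the bookkeeping that built-in secondary indexes take off developers' hands, here is a minimal sketch in Python of an application maintaining its own index by hand. It is an in-memory illustration only and does not use any Cassandra API: the `ManualIndexStore` class, its method names and the `state` column are hypothetical, with plain dictionaries standing in for a data ColumnFamily and a separate index ColumnFamily.

```python
from collections import defaultdict


class ManualIndexStore:
    """In-memory stand-in for a data ColumnFamily plus a hand-built index.

    Hypothetical illustration only; this is not the Cassandra API.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # row key -> {column name: value}
        self.index = defaultdict(set)   # indexed value -> set of row keys

    def insert(self, key, columns):
        """Write columns to a row and keep the index in step with it."""
        old = self.rows.get(key, {})
        merged = {**old, **columns}
        old_value = old.get(self.indexed_column)
        new_value = merged.get(self.indexed_column)
        if old_value != new_value:
            # The application has to remove the stale index entry itself.
            if old_value is not None:
                self.index[old_value].discard(key)
            if new_value is not None:
                self.index[new_value].add(key)
        self.rows[key] = merged

    def get_by_value(self, value):
        """Return the row keys whose indexed column equals value."""
        return sorted(self.index.get(value, ()))


store = ManualIndexStore("state")
store.insert("alice", {"state": "TX", "age": 30})
store.insert("bob", {"state": "TX"})
store.insert("alice", {"state": "UT"})  # update: index entry must move too
print(store.get_by_value("TX"))
print(store.get_by_value("UT"))
```

Every write path in the application has to repeat this update-both-places logic, and a missed case leaves the index out of sync with the data, which is the class of problem Ellis describes the 0.7 feature as removing.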

The other big new feature is large row support for up to two billion columns per row. In previous Cassandra releases, a single column value could not be larger than 2 GB.

Cassandra 0.7 also enables database administrators to perform online schema changes from a client API, without restarting the database cluster. Ellis noted that a command-line interface is provided with the 0.7 release to manage the changes.

Migrating from Cassandra 0.6 to the new 0.7 release is fairly easy, according to Ellis. He added that 0.7 is fully compatible with 0.6 data files and that there is documentation to help users with the process.

Moving forward, Cassandra will benefit from some new contributions that have come from Digg and Twitter. Digg publicly announced its move to Cassandra in March of 2010. For Digg, the move was a migration from MySQL to the NoSQL approach taken by Cassandra.

"The next major version will have distributed counters, contributed by Digg and Twitter engineers, entity groups, intra-node encryption support, and even more performance improvements," Ellis said.

Sean Michael Kerner is a senior editor at InternetNews.com, the news service of Internet.com, the network for technology professionals.