This series of posts presents an introduction to Apache Cassandra. It discusses Cassandra's key features, its core concepts, how it works under the hood, how it differs from other data stores, data modelling best practices with examples, and some tips and tricks.

Cassandra is a popular open source NoSQL database. It is used successfully in a variety of scenarios such as analytics, time series analysis, monitoring, retail, and e-commerce. One common theme across these uses is an environment with high write volumes.

Key Cassandra Features

Here are some of the key features of Cassandra.

Distributed : Cassandra is built to run on a cluster of nodes to provide high availability, fault tolerance and scalability.

Multi Master or Master Less: Many data stores, e.g. MongoDB, are based on a master-slave architecture.

All writes go to the master node, and reads can be served by the slaves. Cassandra, on the other hand, works in a master-less or multi-master mode.

Writes are distributed among nodes using a hash function (more on this later) and reads are channeled onto specific nodes.

High Write Availability : When a master node goes down, MongoDB stops taking new writes until the rest of the nodes elect a new master. In Cassandra, on the other hand, if one node goes down, writes are redirected to other nodes and the system continues to operate.

Linear Scaling : Due to its multi-master architecture, Cassandra scales linearly: doubling the number of nodes in a cluster lets it handle about twice the writes.

Design Time Schema : Cassandra requires the schema and data types to be defined at design time. That’s not how Cassandra started, but it has evolved, and now you must define the schema first.

Hot Writes in RAM : Cassandra stores incoming writes in RAM to provide speedy performance (more on this later).

AP system : Cassandra is considered a highly available and partition-tolerant (AP) system in terms of the CAP theorem.

Column family Store: Cassandra is neither a row-based store nor a column-oriented store; it’s a column family store, which is a different concept (more on this later).
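
To make the master-less idea from the list above a little more concrete, here is a minimal Python sketch of how a hash function can spread writes across nodes and keep accepting them when one node is down. It is a toy model, not Cassandra's implementation: the names `Ring`, `token`, and `owner` are hypothetical, MD5 is used only for illustration (Cassandra's default partitioner is based on Murmur3), and a real cluster replicates each write to several nodes rather than simply skipping to the next live one.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy master-less cluster: every node can own keys and accept writes."""

    def __init__(self, nodes):
        # Each node gets one position on the ring, derived from its name.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]
        self.down = set()  # names of nodes currently considered unavailable

    def owner(self, key: str) -> str:
        """Return the first live node at or after the key's token, wrapping around."""
        start = bisect.bisect_left(self._tokens, token(key))
        for step in range(len(self._ring)):
            _, name = self._ring[(start + step) % len(self._ring)]
            if name not in self.down:
                return name
        raise RuntimeError("no live nodes in the cluster")


ring = Ring(["node-a", "node-b", "node-c"])
first = ring.owner("user:42")
ring.down.add(first)              # simulate that node failing
fallback = ring.owner("user:42")
print(first, "->", fallback)      # the write still lands on a live node
```

Because no node is special, losing one only changes where a key lands; there is no election to wait for, which is the property the "High Write Availability" point describes.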

Next: Apache Cassandra, Part 2: Cassandra vs MongoDB