Search startup Kosmix has released KFS, a C++ implementation of the Google File System, as open source. It parallels the existing Hadoop/HDFS project, which is written in Java. The Kosmix team has deep engineering talent and a strong track record, having recently built a web-scale crawler and search engine from scratch. Google has a set of tools that the rest of the industry needs in order to compete... it's cool that folks are stepping up to the task and leveraging the open source model to try to provide some balance.

KFS arrives with an impressive set of features for an alpha release:

Incremental scalability - New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes.

Availability - Replication is used to provide availability in the face of chunkserver failures.

Re-balancing - Periodically, the meta-server may rebalance the chunks amongst chunkservers. This is done to help with balancing disk space utilization amongst nodes.

Data integrity - To handle disk corruptions to data blocks, data blocks are checksummed. Checksum verification is done on each read; whenever there is a checksum mismatch, re-replication is used to recover the corrupted chunk.

Client side fail-over - During reads, if the client library determines that the chunkserver it is communicating with is unreachable, the client library will fail-over to another chunkserver and continue the read. This fail-over is transparent to the application.

Language support - KFS client library can be accessed from C++, Java, and Python.

FUSE support on Linux - By mounting KFS via FUSE, this support allows existing Linux utilities (such as ls) to interface with KFS.

Leases - KFS client library uses caching to improve performance. Leases are used to support cache consistency.
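To make the data integrity and client side fail-over items a bit more concrete, here is a minimal sketch of how a read path along those lines could behave. It is not KFS code and does not use the KFS client API: `FakeChunkserver`, `read_with_failover`, and both exception types are hypothetical names invented for illustration, and CRC32 is an assumed checksum (the feature list does not say which algorithm KFS uses). The reader tries each replica in turn, skips unreachable servers, and notes replicas whose checksum no longer matches so they could be re-replicated.

```python
import zlib


class ChunkserverUnreachable(Exception):
    """Hypothetical error: the replica's server did not respond."""


class ChecksumMismatch(Exception):
    """Hypothetical error: stored checksum does not match the data read."""


class FakeChunkserver:
    """In-memory stand-in for a chunkserver holding one replica (illustration only)."""

    def __init__(self, name, blocks, reachable=True):
        self.name = name
        self.reachable = reachable
        # Keep each block alongside a checksum computed at write time.
        # CRC32 is an assumption; any block checksum would do for the sketch.
        self.blocks = {bid: (data, zlib.crc32(data)) for bid, data in blocks.items()}

    def corrupt(self, block_id, bad_data):
        """Simulate disk corruption: bytes change, the stored checksum does not."""
        _, checksum = self.blocks[block_id]
        self.blocks[block_id] = (bad_data, checksum)

    def read_block(self, block_id):
        if not self.reachable:
            raise ChunkserverUnreachable(self.name)
        data, stored = self.blocks[block_id]
        # Verify the checksum on every read.
        if zlib.crc32(data) != stored:
            raise ChecksumMismatch(f"{self.name}: block {block_id}")
        return data


def read_with_failover(replicas, block_id):
    """Try replicas in order; return (data, names of replicas needing repair)."""
    needs_repair = []
    for server in replicas:
        try:
            return server.read_block(block_id), needs_repair
        except ChunkserverUnreachable:
            continue  # fail over to the next replica
        except ChecksumMismatch:
            needs_repair.append(server.name)  # candidate for re-replication
    raise IOError(f"no healthy replica for block {block_id}")


blocks = {0: b"hello kfs"}
down = FakeChunkserver("cs-a", blocks, reachable=False)
corrupted = FakeChunkserver("cs-b", blocks)
corrupted.corrupt(0, b"hellX kfs")
healthy = FakeChunkserver("cs-c", blocks)

data, needs_repair = read_with_failover([down, corrupted, healthy], 0)
print(data, needs_repair)
```

The caller only sees the bytes from the first healthy replica, which is the sense in which the fail-over is transparent to the application. In a real system the repair list would presumably be reported to the meta-server rather than returned to the caller; that part is left out here.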

Every startup that scales beyond a single machine needs platform technology to build its application and run its cluster. If enough folks adopt the code and contribute, the hope is that it could become something like the gcc/Linux/Perl of the cluster storage layer.