Sector/Sphere supports distributed data storage, distribution, and processing over large clusters of commodity computers, either within a single data center or across multiple data centers. Sector is a high-performance, scalable, and secure distributed file system. Sphere is a high-performance parallel data processing engine that processes Sector data files directly on the storage nodes through a simple programming interface.

Why Sector/Sphere?

High Performance. Sector and Sphere are highly optimized for data-intensive applications. Sphere supports massively parallel in-storage data processing, enabled by Sector's application-aware data placement mechanism. In our benchmarks, Sphere consistently runs 2 to 4 times faster than Hadoop MapReduce (see benchmark).
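The idea of processing data where it is stored can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not Sphere's interface: the node names, the `layout` mapping from node to segments, and the functions `process_in_storage` and `count_errors` are hypothetical, and each storage node is stood in for by one thread-pool worker. A user-defined function (UDF) runs on each segment on the node that holds it, and only the small per-segment results come back.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A segment is a chunk of a file's records; here simply a list of text lines.
Segment = List[str]


def process_in_storage(
    layout: Dict[str, List[Segment]],
    udf: Callable[[Segment], int],
) -> Dict[str, List[int]]:
    """Apply `udf` to every segment on the (simulated) node that stores it.

    `layout` maps a hypothetical node name to the segments held on that node.
    One worker per node stands in for a processing engine on that node; only
    the small UDF results are returned, never the segment data itself.
    """

    def run_on_node(node: str) -> List[int]:
        return [udf(seg) for seg in layout[node]]

    nodes = list(layout)
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        return dict(zip(nodes, pool.map(run_on_node, nodes)))


def count_errors(seg: Segment) -> int:
    """Example UDF: count lines that start with 'error'."""
    return sum(line.startswith("error") for line in seg)


if __name__ == "__main__":
    layout = {
        "node-a": [["error disk", "ok"], ["ok"]],
        "node-b": [["error net", "error disk"]],
    }
    per_node = process_in_storage(layout, count_errors)
    print(per_node)  # {'node-a': [1, 0], 'node-b': [2]}
    print(sum(sum(r) for r in per_node.values()))  # 3
```

Because only the integer counts leave each node, the amount of data moved is independent of segment size, which is the property that makes in-storage processing attractive for data-intensive workloads.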

WAN Support. Sector is one of the few file systems that can effectively span multiple data centers connected by wide area networks. Sector uses UDT (UDP-based Data Transfer Protocol) for high-speed data transfer, and its data placement strategy allows Sector to work effectively as a content distribution network over the WAN.

Software-Level Fault Tolerance. Sector does not require hardware RAID for reliability; instead, it automatically replicates data for high reliability and availability. Both Sector slaves and masters can be added or removed at run time, and Sector supports multiple active masters for high performance and availability.
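Automatic re-replication after a slave is removed can be sketched as follows. This is an illustrative simplification, not Sector's actual algorithm: the `rereplicate` function, the file-to-nodes mapping, and the least-loaded placement policy are all assumptions. Copies on departed nodes are dropped, and each file is topped back up to the target replication factor on live nodes that do not already hold it.

```python
from typing import Dict, List, Set


def rereplicate(
    replicas: Dict[str, Set[str]],
    live_nodes: List[str],
    factor: int,
) -> Dict[str, Set[str]]:
    """Return a new placement restoring each file to `factor` replicas.

    `replicas` maps a file name to the set of nodes holding a copy. Copies on
    nodes missing from `live_nodes` are dropped; new copies go to the
    least-loaded live nodes (ties broken by name) that lack the file.
    """
    live = set(live_nodes)
    placement = {f: set(nodes) & live for f, nodes in replicas.items()}

    # Current number of copies stored on each live node.
    load = {n: 0 for n in live_nodes}
    for nodes in placement.values():
        for n in nodes:
            load[n] += 1

    for f in sorted(placement):
        nodes = placement[f]
        if not nodes:
            continue  # every copy was lost; there is no source to copy from
        missing = factor - len(nodes)
        if missing <= 0:
            continue
        candidates = sorted(
            (n for n in live_nodes if n not in nodes),
            key=lambda n: (load[n], n),
        )
        for n in candidates[:missing]:
            nodes.add(n)  # a real system would copy the data from a survivor
            load[n] += 1
    return placement


if __name__ == "__main__":
    before = {"a.dat": {"n1", "n2"}, "b.dat": {"n2", "n3"}}
    # Slave n2 is removed at run time and n4 joins.
    after = rereplicate(before, ["n1", "n3", "n4"], factor=2)
    print(sorted((f, sorted(ns)) for f, ns in after.items()))
    # [('a.dat', ['n1', 'n4']), ('b.dat', ['n1', 'n3'])]
```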

Rule-based Data Management. For each file, users can control its replication factor, its replication distance, and, when necessary, the locations where its replicas are placed. These rules can be changed at run time.
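A per-file rule of this kind might be modeled as below. Everything here is hypothetical and not Sector's configuration format: the `ReplicationRule` fields, the slash-separated topology paths such as `/dc1/rack1/node1`, the `topo_distance` measure (levels up to the nearest common ancestor), and the interpretation of replication distance as an upper bound on how far apart replicas may be. Within that bound, the greedy `choose_replicas` spreads copies as far apart as it can, and `locations` restricts placement to the given path prefixes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def topo_distance(a: str, b: str) -> int:
    """Hypothetical distance: levels between two topology paths and their
    nearest common ancestor, e.g. /dc1/rack1/n1 vs /dc1/rack2/n5 -> 2."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return max(len(pa), len(pb)) - common


def in_location(node: str, prefix: str) -> bool:
    """True if `node` lies under the topology path `prefix`."""
    prefix = prefix.rstrip("/")
    return node == prefix or node.startswith(prefix + "/")


@dataclass
class ReplicationRule:
    factor: int = 3                      # number of copies
    max_distance: Optional[int] = None   # assumed upper bound; None = no limit
    locations: List[str] = field(default_factory=list)  # empty = anywhere


def choose_replicas(rule: ReplicationRule, nodes: List[str]) -> List[str]:
    """Greedily pick up to `rule.factor` nodes, each as far as possible from
    those already chosen while respecting `max_distance` and `locations`."""
    allowed = [
        n for n in nodes
        if not rule.locations or any(in_location(n, p) for p in rule.locations)
    ]
    chosen: List[str] = []
    while len(chosen) < rule.factor:
        best, best_score = None, -1
        for n in allowed:
            if n in chosen:
                continue
            dists = [topo_distance(n, c) for c in chosen]
            if rule.max_distance is not None and any(d > rule.max_distance for d in dists):
                continue
            score = min(dists) if dists else 0
            if score > best_score:
                best, best_score = n, score
        if best is None:
            break  # the rule cannot be fully satisfied with these nodes
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    nodes = ["/dc1/r1/a", "/dc1/r1/b", "/dc1/r2/c", "/dc2/r1/d"]
    print(choose_replicas(ReplicationRule(factor=2), nodes))
    # ['/dc1/r1/a', '/dc2/r1/d']
    print(choose_replicas(ReplicationRule(factor=2, max_distance=2), nodes))
    # ['/dc1/r1/a', '/dc1/r2/c']
    print(choose_replicas(ReplicationRule(factor=2, locations=["/dc2"]), nodes))
    # ['/dc2/r1/d']
```

Spreading replicas across racks or data centers guards against correlated failures, while a distance bound or a location restriction keeps copies close to where they are used when WAN bandwidth or data residency matters more.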

Compatible with Legacy Systems. Many existing applications or job schedulers can continue to work with Sector files with little modification.