STXXL: Standard Template Library for Extra Large Data Sets.





The core of STXXL is an implementation of the C++ Standard Template Library (STL) for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While its closeness to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

The key features of STXXL are:

Transparent support of parallel disks. The library provides implementations of basic parallel disk algorithms. STXXL is the only external memory algorithm library supporting parallel disks.

The library is able to handle problems of very large size (tested up to dozens of terabytes).

Improved utilization of computer resources. STXXL implementations of external memory algorithms and data structures benefit from overlapping of I/O and computation.

Small constant factors in I/O volume. A unique library feature called "pipelining" can save more than half the number of I/Os by streaming data between algorithmic components instead of temporarily storing them on disk. A development branch supports asynchronous execution of the algorithmic components, enabling high-level task parallelism.

Shorter development times due to well-known STL-compatible interfaces for external memory algorithms and data structures.

STL algorithms can be directly applied to STXXL containers; moreover, the I/O complexity of the algorithms remains optimal in most cases.

For internal computation, parallel algorithms from the MCSTL or the libstdc++ parallel mode are optionally utilized, making the algorithms inherently benefit from multi-core parallelism.

STXXL is free, open source, and available under the Boost Software License 1.0.
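
The pipelining feature listed above can be illustrated with a small toy model. The sketch below does not use the STXXL API: all names (IoCounter, DiskSource, SquareNode, kBlockItems, and the two sum_of_squares functions) are hypothetical, the "disk" is an in-memory std::vector, and I/O cost is only simulated by counting one transfer per block touched. Under those assumptions it compares a pipeline that streams elements between components with one that stores the intermediate result on disk and reads it back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Items per simulated disk block (an assumption made for this sketch).
const std::size_t kBlockItems = 4;

// Counts simulated block transfers; stands in for real disk I/O.
struct IoCounter {
    std::size_t reads = 0;
    std::size_t writes = 0;
    std::size_t total() const { return reads + writes; }
};

// Stream over a "disk-resident" array; charges one read per block touched.
class DiskSource {
public:
    DiskSource(const std::vector<std::uint64_t>& data, IoCounter& io)
        : data_(data), io_(io), pos_(0) {
        if (!empty()) ++io_.reads;  // fetch the first block
    }
    bool empty() const { return pos_ >= data_.size(); }
    std::uint64_t operator*() const {
        assert(!empty());
        return data_[pos_];
    }
    DiskSource& operator++() {
        assert(!empty());
        ++pos_;
        if (!empty() && pos_ % kBlockItems == 0) ++io_.reads;  // next block
        return *this;
    }
private:
    const std::vector<std::uint64_t>& data_;
    IoCounter& io_;
    std::size_t pos_;
};

// Stream adaptor: squares each upstream element on the fly, without I/O.
template <class In>
class SquareNode {
public:
    explicit SquareNode(In& in) : in_(in) {}
    bool empty() const { return in_.empty(); }
    std::uint64_t operator*() const {
        std::uint64_t v = *in_;
        return v * v;
    }
    SquareNode& operator++() { ++in_; return *this; }
private:
    In& in_;
};

// Consumer: drains a stream and returns the sum of its elements.
template <class In>
std::uint64_t sum_stream(In& in) {
    std::uint64_t sum = 0;
    for (; !in.empty(); ++in) sum += *in;
    return sum;
}

// Writes a stream to a new "disk" array, charging one write per block.
template <class In>
std::vector<std::uint64_t> materialize(In& in, IoCounter& io) {
    std::vector<std::uint64_t> out;
    for (; !in.empty(); ++in) {
        if (out.size() % kBlockItems == 0) ++io.writes;  // start a new block
        out.push_back(*in);
    }
    return out;
}

// Pipelined: source -> square -> sum, with no intermediate storage.
std::uint64_t sum_of_squares_pipelined(const std::vector<std::uint64_t>& data,
                                       IoCounter& io) {
    DiskSource src(data, io);
    SquareNode<DiskSource> squared(src);
    return sum_stream(squared);
}

// Materialized: the squared values are written out, then read back.
std::uint64_t sum_of_squares_materialized(const std::vector<std::uint64_t>& data,
                                          IoCounter& io) {
    DiskSource src(data, io);
    SquareNode<DiskSource> squared(src);
    std::vector<std::uint64_t> tmp = materialize(squared, io);
    DiskSource again(tmp, io);
    return sum_stream(again);
}
```

In this model, an input of 16 items with 4 items per block costs 4 block reads when pipelined and 12 block transfers when materialized (4 reads, 4 writes, 4 re-reads). The actual savings in STXXL depend on the algorithm and data and are not predicted by this toy count.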

Current maintainers: Andreas Beckmann, Timo Bingmann

Past contributors: Roman Dementiev (author), Peter Sanders, Johannes Singler, Raoul Steffen, Markus Westphal

Downloads and Documentation

Support Channels

Questions concerning use and development can be searched for and asked on Stack Overflow, and longer user-contributed solutions may also be shared via the GitHub Wiki.

Bugs and issues should be reported via GitHub's issue tracker.

Discussions about future development and details of STXXL should be posted to the SourceForge forums.

Platforms supported

Operating Systems          Compilers                        Extras
Linux (kernel >= 2.4.18)   g++ (3.4-4.9)                    libstdc++ parallel mode (optional, included with g++ 4.3+)
                           icpc (2011, 2013, 2015)          Boost (optional)
                           clang++ (3.1-3.5)
Mac OS X                   clang++ (3.5)
FreeBSD                    g++
Windows                    Visual C++ 2010, 2012 and 2013   --
                                                            Boost (required only for VS 2010)

Versions

Version 1.4.1 (October 29, 2014)
- Integrated support for kernel-based asynchronous I/O on Linux (new file type "linuxaio"), which exploits Native Command Queuing (NCQ) if available.
- Merged the stxxl::unordered_map branch, which provides a hash map backed by external memory.
- Replaced struct default_completion_handler with a NULL pointer, thus avoiding superfluous new/delete work for each I/O request.
- Added stxxl::external_shared_ptr, a proxy class that allows use of shared_ptr classes inside stxxl containers.
- Fixed bugs and warnings on 32-bit systems (yes, they still exist).
- Used atomic_counted_object in class file for request reference counting.
- Added support for MinGW-w64 (64-bit) systems with working SJLJ thread implementations.

Version 1.4.0 (December 12, 2013)
- reorganized source hierarchy into include/ lib/ tests/ examples/ doc/ tools/
- CMake build system for cross-platform compilation
- greatly improved documentation with tutorials and examples
- efficient external matrix operations
- new containers stxxl::sequence and stxxl::sorter
- improved .stxxl disk configuration files and additional options
- combined stxxl_tool of disk benchmarks
- simple examples and skew3 as real-world stream application
- support for Visual Studio 2012 and 2013 _without_ Boost
- important bug fixes in stxxl::queue and stxxl::priority_queue

Version 1.3.1 (March 10, 2011)
The lower layers contain memory management, disk virtualization, prefetching, and so on. The higher layer includes (pipelined) sorting with SMP and multi-core processor support, (pipelined) scanning, and containers (vectors, stacks, priority queues, maps (B+Tree), queues, deques). In total, this amounts to about 35,000 lines of code.

See the current Changelog for detailed lists of changes.

Branches

Special features are maintained as GitHub forks until they are merged into master. Until then, their interfaces may change without further notice.

Asynchronous Pipelining/Streaming: parallel_pipelining_integration

This contains an unfinished integration attempt of the async nodes and parallel sorting/pipelining described in the IPDPS 2009 paper. If someone has a stake or interest in this branch, please contact me (Timo) about further work on it. -- 2014-10-23

Publications, Ongoing and Completed Projects using STXXL

If you use STXXL and wish to appear in the following list, please provide a description line via email to one of the maintainers.