Fast on the heals of the 0.9.5 maintenance release, the Apache Storm community is pleased to announce that Apache Storm 0.10.0-beta has been released and is now available on the downloads page.

Aside from many stability and performance improvements, this release includes a number of important new features, some of which are highlighted below.

Secure, Multi-Tenant Deployment

Much like the early days of Hadoop, Apache Storm originally evolved in an environment where security was not a high-priority concern. Rather, it was assumed that Apache Storm would be deployed to environments suitably cordoned off from security threats. While a large number of users were comfortable setting up their own security measures for Apache Storm (usually at the Firewall/OS level), this proved a hindrance to broader adoption among larger enterprises where security policies prohibited deployment without specific safeguards.

Yahoo! hosts one of the largest Apache Storm deployments in the world, and their engineering team recognized the need for security early on, so it implemented many of the features necessary to secure its own Apache Storm deployment. Yahoo!, Hortonworks, Symantec, and the broader Apache Storm community have worked together to bring those security innovations into the main Apache Storm code base.

We are pleased to announce that work is now complete. Some of the highlights of Apache Storm's new security features include:

Kerberos Authentication with Automatic Credential Push and Renewal

Pluggable Authorization and ACLs

Multi-Tenant Scheduling with Per-User isolation and configurable resource limits.

User Impersonation

SSL Support for Storm UI, Log Viewer, and DRPC (Distributed Remote Procedure Call)

Secure integration with other Hadoop Projects (such as ZooKeeper, HDFS, HBase, etc.)

User isolation (Storm topologies run as the user who submitted them)

For more details and instructions for securing Apache Storm, please see the security documentation.

A Foundation for Rolling Upgrades and Continuity of Operations

In the past, upgrading an Apache Storm cluster could be an arduous process that involved un-deploying existing topologies, removing state from local disk and ZooKeeper, installing the upgrade, and finally redeploying topologies. From an operations perspective, this process was disruptive to say the very least.

The underlying cause of this headache was rooted in the data format Apache Storm processes used to store both local and distributed state. Between versions, these data structures would change in incompatible ways.

Beginning with version 0.10.0, this limitation has been eliminated. In the future, upgrading from Apache Storm 0.10.0 to a newer version can be accomplished seamlessly, with zero down time. In fact, for users who use Apache Ambari for cluster provisioning and management, the process can be completely automated.

Easier Deployment and Declarative Topology Wiring with Flux

Apache Storm 0.10.0 now includes Flux, which is a framework and set of utilities that make defining and deploying Apache Storm topologies less painful and developer-intensive. A common pain point mentioned by Apache Storm users is the fact that the wiring for a Topology graph is often tied up in Java code, and that any changes require recompilation and repackaging of the topology jar file. Flux aims to alleviate that pain by allowing you to package all your Apache Storm components in a single jar, and use an external text file to define the layout and configuration of your topologies.

Some of Flux' features include:

Easily configure and deploy Storm topologies (Both Storm core and Micro-batch API) without embedding configuration in your topology code

Support for existing topology code

Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL

YAML DSL support for most Storm components (storm-kafka, storm-hdfs, storm-hbase, etc.)

Convenient support for multi-lang components

External property substitution/filtering for easily switching between configurations/environments (similar to Maven-style ${variable.name} substitution)

You can read more about Flux on the Flux documentation page.

Partial Key Groupings

In addition to the standard Stream Groupings Apache Storm has always supported, version 0.10.0 introduces a new grouping named "Partial Key Grouping". With the Partial Stream Grouping, the tuple stream is partitioned by the fields specified in the grouping, like the Fields Grouping, but are load balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed.

Documentation for the Partial Key Grouping and other stream groupings supported by Apache Storm can be found here. This research paper provides additional details regarding how it works and its advantages.

Improved Logging Framework

Debugging distributed applications can be difficult, and usually focuses on one main source of information: application log files. But in a very low latency system like Apache Storm where every millisecond counts, logging can be a double-edged sword: If you log too little information you may miss the information you need to solve a problem; log too much and you risk degrading the overall performance of your application as resources are consumed by the logging framework.

In version 0.10.0 Apache Storm's logging framework now uses Apache Log4j 2 which, like Apache Storm's internal messaging subsystem, uses the extremely performant LMAX Disruptor messaging library. Log4j 2 boast an 18x higher throughput and orders of magnitude lower latency than Apache Storm's previous logging framework. More efficient resource utilization at the logging level means more resources are available where they matter most: executing your business logic.

A few of the important features these changes bring include:

Rolling log files with size, duration, and date-based triggers that are composable

Dynamic log configuration updates without dropping log messages

Remote log monitoring and (re)configuration via JMX

A Syslog/RFC-5424-compliant appender.

Integration with log aggregators such as syslog-ng

Streaming ingest with Apache Hive

Introduced in version 0.13, Apache Hive includes a Streaming Data Ingest API that allows data to be written continuously into Hive. The incoming data can be continuously committed in small batches of records into existing Hive partition or table. Once the data is committed its immediately visible to all hive queries.

Apache Storm 0.10.0 introduces both a Apache Storm Core API bolt implementation that allows users to stream data from Apache Storm directly into hive. Apache Storm's Hive integration also includes a State implementation for Apache Storm's Micro-batching/Transactional API (Trident) that allows you to write to Hive from a micro-batch/transactional topology and supports exactly-once semantics for data persistence.

For more information on Apache Storm's Hive integration, see the storm-hive documentation.

Microsoft Azure Event Hubs Integration

With Microsoft Azure's support for running Apache Storm on HDInsight, Apache Storm is now a first class citizen of the Azure cloud computing platform. To better support Apache Storm integration with Azure services, Microsoft engineers have contributed several components that allow Apache Storm to integrate directly with Microsoft Azure Event Hubs.

Apache Storm's Event Hubs integration includes both spout and bolt implementations for reading from, and writing to Event Hubs. The Event Hub integration also includes a Micro-batching/Transactional (Trident) spout implementation that supports fully fault-tolerant and reliable processing, as well as support for exactly-once message processing semantics.

Redis Support

Apache Storm 0.10.0 also introduces support for the Redis data structure server. Apache Storm's Redis support includes bolt implementations for both writing to and querying Redis from a Apache Storm topology, and is easily extended for custom use cases. For Apache Storm's micro-batching/transactional API, the Redis support includes both Trident State and MapState implementations for fault-tolerant state management with Redis.

Further information can be found in the storm-redis documentation.

JDBC/RDBMS Integration

Many stream processing data flows require accessing data from or writing data to a relational data store. Apache Storm 0.10.0 introduces highly flexible and customizable support for integrating with virtually any JDBC-compliant database.

The Storm-JDBC package includes core Apache Storm bolt and Trident state implementations that allow an Apache Storm topology to either insert Apache Storm tuple data into a database table or execute select queries against a database to enrich streaming data in an Apache Storm topology.

Further details and instructions can be found in the Storm-JDBC documentation.

Reduced Dependency Conflicts

In previous Apache Storm releases, it was not uncommon for users' topology dependencies to conflict with the libraries used by Apache Storm. In Apache Storm 0.9.3 several dependency packages that were common sources of conflicts have been package-relocated (shaded) to avoid this situation. In 0.10.0 this list has been expanded.

Developers are free to use the Storm-packaged versions, or supply their own version.

The full list of Apache Storm's package relocations can be found here.

Future Work

While the 0.10.0 release is an important milestone in the evolution of Apache Storm, the Apache Storm community is actively working on new improvements, both near and long term, and continuously exploring the realm of the possible.

Twitter recently announced the Heron project, which claims to provide substantial performance improvements while maintaining 100% API compatibility with Apache Storm. The corresponding research paper provides additional details regarding the architectural improvements. The fact that Twitter chose to maintain API compatibility with Apache Storm is a testament to the power and flexibility of that API. Twitter has also expressed a desire to share their experiences and work with the Apache Storm community.

A number of concepts expressed in the Heron paper were already in the implementation stage by the Apache Storm community even before it was published, and we look forward to working with Twitter to bring those and other improvements to Apache Storm.

Thanks

Special thanks are due to all those who have contributed to Apache Storm -- whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Your efforts are very much valued and appreciated.

Full Change Log