Tephra is used by the Apache Phoenix as well to add cross-row and cross-table transaction support with full ACID semantics.

Apache Tephra provides globally consistent transactions on top of distributed data stores such as Apache HBase . While HBase provides strong consistency with row- or region-level ACID operations, it sacrifices cross-region and cross-table consistency in favor of scalability. This trade-off requires application developers to handle the complexity of ensuring consistency when their modifications span region boundaries. By providing support for global transactions that span regions, tables, or multiple RPCs, Tephra simplifies application development on top of HBase, without a significant impact on performance or scalability for many workloads.

How It Works

Tephra leverages HBase’s native data versioning to provide multi-versioned concurrency control (MVCC) for transactional reads and writes. With MVCC capability, each transaction sees its own consistent “snapshot” of data, providing snapshot isolation of concurrent transactions.

Tephra consists of three main components:

Transaction Server - maintains global view of transaction state, assigns new transaction IDs and performs conflict detection;

- maintains global view of transaction state, assigns new transaction IDs and performs conflict detection; Transaction Client - coordinates start, commit, and rollback of transactions; and

- coordinates start, commit, and rollback of transactions; and TransactionProcessor Coprocessor - applies filtering to the data read (based on a given transaction’s state) and cleans up any data from old (no longer visible) transactions.

Transaction Server A central transaction manager generates a globally unique, time-based transaction ID for each transaction that is started, and maintains the state of all in-progress and recently committed transactions for conflict detection. While multiple transaction server instances can be run concurrently for automatic failover, only one server instance is actively serving requests at a time. This is coordinated by performing leader election amongst the running instances through Apache ZooKeeper. The active transaction server instance will also register itself using a service discovery interface in ZooKeeper, allowing clients to discover the currently active server instance without additional configuration.

Transaction Client A client makes a call to the active transaction server in order to start a new transaction. This returns a new transaction instance to the client, with a unique transaction ID (used to identify writes for the transaction), as well as a list of transaction IDs to exclude for reads (from in-progress or invalidated transactions). When performing writes, the client overrides the timestamp for all modified HBase cells with the transaction ID. When reading data from HBase, the client skips cells associated with any of the excluded transaction IDs. The read exclusions are applied through a server-side filter injected by the TransactionProcessor coprocessor.