





OLTP and operational analytics for Apache Hadoop



Download latest Apache Phoenix binary and source release artifacts

Browse through Apache Phoenix JIRAs



Sync and build Apache Phoenix from source code

News: The next major Phoenix release, 5.0.0, has been released and is available for download here

Overview Apache Phoenix enables OLTP and operational analytics in Hadoop for low latency applications by combining the best of both worlds: the power of standard SQL and JDBC APIs with full ACID transaction capabilities, and the flexibility of late-bound, schema-on-read capabilities from the NoSQL world by leveraging HBase as its backing store. Apache Phoenix is fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce.

Who is using Apache Phoenix? Read more here...



Mission Become the trusted data platform for OLTP and operational analytics for Hadoop through well-defined, industry standard APIs.

Quick Start Tired of reading already and just want to get started? Take a look at our FAQs, listen to the Apache Phoenix talk from Hadoop Summit 2015, review the overview presentation, and jump over to our quick start guide here.

SQL Support Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. To see a complete list of what is supported, go to our language reference. All standard SQL query constructs are supported, including SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, etc. It also supports a full set of DML commands as well as table creation and versioned incremental alterations through our DDL commands. Here’s a list of what is currently not supported: Relational operators: Intersect, Minus.

Miscellaneous built-in functions. These are easy to add; read this blog for step-by-step instructions.

Connection Use JDBC to get a connection to an HBase cluster like this:

    Connection conn = DriverManager.getConnection("jdbc:phoenix:server1,server2:3333", props);

where props are optional properties, which may include Phoenix and HBase configuration properties, and the connection string is composed of:

    jdbc:phoenix [ :<zookeeper quorum> [ :<port number> [ :<root node> [ :<principal> [ :<keytab file> ] ] ] ] ]

For any omitted parts, the relevant property value (hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort, and zookeeper.znode.parent) will be used from the hbase-site.xml configuration file. The optional principal and keytab file may be used to connect to a Kerberos secured cluster. If only the principal is specified, then it defines the user name, with each distinct user having their own dedicated HBase connection (HConnection). This provides a means of having multiple, different connections, each with different configuration properties, on the same JVM.

For example, the following connection string might be used for longer running queries, where longRunningProps specifies Phoenix and HBase configuration properties with longer timeouts:

    Connection conn = DriverManager.getConnection("jdbc:phoenix:my_server:longRunning", longRunningProps);

while the following connection string might be used for shorter running queries:

    Connection conn = DriverManager.getConnection("jdbc:phoenix:my_server:shortRunning", shortRunningProps);

Please read the relevant FAQ entry for example URLs.
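The connection string layout above can be illustrated with a small helper. This is only a sketch: PhoenixUrls and its methods are hypothetical names, not part of Phoenix. It joins whichever parts are supplied in the documented order and skips omitted ones, which reproduces the two URL shapes shown above. It does not check whether Phoenix accepts every combination of skipped parts. The open method assumes the Phoenix JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

/** Hypothetical helper (not part of Phoenix) that assembles a connection URL. */
final class PhoenixUrls {
    private PhoenixUrls() {}

    /**
     * Joins the supplied parts in the documented order:
     * quorum, port, root node, principal, keytab file.
     * A null part is skipped; no further validation is attempted.
     */
    static String build(String quorum, Integer port, String rootNode,
                        String principal, String keytab) {
        StringBuilder url = new StringBuilder("jdbc:phoenix");
        Object[] parts = {quorum, port, rootNode, principal, keytab};
        for (Object part : parts) {
            if (part != null) {
                url.append(':').append(part);
            }
        }
        return url.toString();
    }

    /** Opens a connection; requires the Phoenix JDBC driver at run time. */
    static Connection open(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }

    public static void main(String[] args) {
        // prints jdbc:phoenix:server1,server2:3333
        System.out.println(build("server1,server2", 3333, null, null, null));
        // prints jdbc:phoenix:my_server:longRunning
        System.out.println(build("my_server", null, null, "longRunning", null));
    }
}
```

Keeping one Properties object per principal (for example, one with longer timeouts and one with shorter ones) and passing it to open alongside the matching URL mirrors the long-running and short-running pattern described above.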

Transactions To enable full ACID transactions, a beta feature available in the 4.7.0 release, set the phoenix.transactions.enabled property to true. In this case, you’ll also need to run the transaction manager that’s included in the distribution. Once enabled, a table may optionally be declared as transactional (see here for directions). Commits over transactional tables will have an all-or-none behavior: either all data will be committed (including any updates to secondary indexes) or none of it will (and an exception will be thrown). Both cross-table and cross-row transactions are supported. In addition, transactional tables will see their own uncommitted data when querying. An optimistic concurrency model is used to detect row-level conflicts with first-commit-wins semantics. The later commit would produce an exception indicating that a conflict was detected. A transaction is started implicitly when a transactional table is referenced in a statement, at which point you will not see updates from other connections until either a commit or rollback occurs.

Non-transactional tables have no guarantees above and beyond the HBase guarantee of row-level atomicity (see here). In addition, non-transactional tables will not see their updates until after a commit has occurred.

The DML commands of Apache Phoenix, UPSERT VALUES, UPSERT SELECT and DELETE, batch pending changes to HBase tables on the client side. The changes are sent to the server when the transaction is committed and discarded when the transaction is rolled back. If auto commit is turned on for a connection, then Phoenix will, whenever possible, execute the entire DML command through a coprocessor on the server side, so performance will improve.

Most commonly, an application will let HBase manage timestamps. However, under some circumstances, an application needs to control the timestamps itself. In this case, the CurrentSCN property may be specified at connection time to control timestamps for any DDL, DML, or query. This capability may be used to run snapshot queries against prior row values, since Phoenix uses the value of this connection property as the max timestamp of scans.

Timestamps may not be controlled for transactional tables. Instead, the transaction manager assigns timestamps, which become the HBase cell timestamps after a commit. Timestamps still correspond to wall clock time; however, they are multiplied by 1,000,000 to ensure enough granularity for uniqueness across the cluster.
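The timestamp handling described above can be sketched with plain JDBC properties. SnapshotProps and its method names are hypothetical; only the CurrentSCN property name and the 1,000,000 multiplier come from the text. The sketch also assumes that timestamps are epoch milliseconds, which is what HBase uses when it assigns cell timestamps itself.

```java
import java.util.Properties;

/** Hypothetical helper (not part of Phoenix) for timestamp-related settings. */
final class SnapshotProps {
    /** Connection property named in the text above. */
    static final String CURRENT_SCN = "CurrentSCN";
    /** Scale factor the text gives for transactional table timestamps. */
    static final long TX_TIMESTAMP_MULTIPLIER = 1_000_000L;

    private SnapshotProps() {}

    /** Builds connection properties that cap scans at the given timestamp. */
    static Properties forSnapshot(long maxTimestampMillis) {
        Properties props = new Properties();
        props.setProperty(CURRENT_SCN, Long.toString(maxTimestampMillis));
        return props;
    }

    /**
     * Scales a wall clock time (assumed to be epoch milliseconds) the way the
     * text describes for transactional tables. Throws ArithmeticException on
     * overflow instead of silently wrapping.
     */
    static long toTransactionalTimestamp(long wallClockMillis) {
        return Math.multiplyExact(wallClockMillis, TX_TIMESTAMP_MULTIPLIER);
    }

    public static void main(String[] args) {
        long oneHourAgo = System.currentTimeMillis() - 3_600_000L;
        // Pass these properties to DriverManager.getConnection to query
        // row values as of one hour ago on a non-transactional table.
        Properties props = forSnapshot(oneHourAgo);
        System.out.println(props.getProperty(CURRENT_SCN));
        System.out.println(toTransactionalTimestamp(oneHourAgo));
    }
}
```

The properties returned by forSnapshot would be supplied as the second argument to DriverManager.getConnection. Since timestamps may not be controlled for transactional tables, the scaling method is only useful for interpreting cell timestamps that the transaction manager has already assigned.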