Zeppelin + Hive

Apache Zeppelin is a web-based notebook platform for interactive data analytics, with built-in data visualization and notebook sharing.

Hive can be integrated through the JDBC interpreter. The Hive configuration and paths below come from a Cloudera setup, but the same steps should apply to other Hadoop distributions.

Interpreter Configuration

Go to the Interpreter screen and click +Create to create a new interpreter.

Interpreter Name: hive

Interpreter group: jdbc

Set the following properties:

default.driver org.apache.hive.jdbc.HiveDriver

default.url jdbc:hive2://hive_host:10000/
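Before saving the interpreter, it can help to confirm that HiveServer2 is reachable at that URL from the Zeppelin host. A minimal sketch, assuming the beeline client is installed there; hive_host and port 10000 are the placeholders from above, and the query is only an example:

```shell
# Placeholder URL from the interpreter settings; replace hive_host with your HiveServer2 host.
HIVE_URL="jdbc:hive2://hive_host:10000/"

# Run a trivial query through beeline if it is installed on this machine.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$HIVE_URL" -e "SHOW DATABASES;"
else
  echo "beeline not found; skipping connectivity check for $HIVE_URL"
fi
```

If this fails, the same failure is likely to show up in Zeppelin, so it is worth resolving network or authentication problems here first.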

Artifacts

org.apache.hive:hive-jdbc:0.14.0

org.apache.hadoop:hadoop-common:2.6.0

Alternatively, copy the jars from a cluster node:

cp /opt/cloudera/parcels/CDH/lib/hadoop/client/*.jar /zeppelin/path/interpreter/jdbc/

cp /opt/cloudera/parcels/CDH/jars/hive-jdbc-* /zeppelin/path/interpreter/jdbc/
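The two copy commands can be wrapped in a small script that fails early when a source or target directory is missing. This is only a sketch: copy_hive_jars is a hypothetical helper name, and both arguments are the placeholder paths used above.

```shell
# Hypothetical helper: copy Hadoop client and Hive JDBC jars into the JDBC interpreter directory.
# $1 = CDH parcel root (e.g. /opt/cloudera/parcels/CDH), $2 = Zeppelin JDBC interpreter directory
copy_hive_jars() {
  cdh_root="$1"
  jdbc_dir="$2"

  # Refuse to continue if any of the directories does not exist.
  for dir in "$cdh_root/lib/hadoop/client" "$cdh_root/jars" "$jdbc_dir"; do
    if [ ! -d "$dir" ]; then
      echo "missing directory: $dir" >&2
      return 1
    fi
  done

  cp "$cdh_root"/lib/hadoop/client/*.jar "$jdbc_dir"/ &&
  cp "$cdh_root"/jars/hive-jdbc-* "$jdbc_dir"/
}

# Example call with the paths from this article:
# copy_hive_jars /opt/cloudera/parcels/CDH /zeppelin/path/interpreter/jdbc
```

The interpreter typically needs a restart from the Interpreter screen before it picks up newly copied jars.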

Configure HADOOP_HOME and HADOOP_CONF_DIR in zeppelin-env.sh
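For a parcel-based Cloudera install, the entries in zeppelin-env.sh might look like the following. Both paths are assumptions based on the parcel layout used above and the usual client configuration directory; verify them on your nodes before use.

```shell
# zeppelin-env.sh (both paths are assumptions; adjust to your cluster)
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
```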

Kerberos Configuration

If the cluster is kerberized, add the following properties to the hive interpreter:

default.url jdbc:hive2://hive_host:10000/;principal=hive/_HOST@DOMAIN

zeppelin.jdbc.auth.type KERBEROS

zeppelin.jdbc.keytab.location /path/to/keytab

zeppelin.jdbc.principal principal_name
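A quick way to rule out keytab problems before testing from Zeppelin is to inspect the keytab and obtain a ticket from the command line. A sketch, assuming the MIT Kerberos client tools (klist, kinit) are installed on the Zeppelin host; check_keytab is a hypothetical helper, and the path and principal are the placeholders from above:

```shell
# Hypothetical helper: verify that a keytab is readable and can obtain a ticket for a principal.
# $1 = keytab path, $2 = principal name
check_keytab() {
  keytab="$1"
  principal="$2"

  if [ ! -r "$keytab" ]; then
    echo "keytab not readable: $keytab" >&2
    return 1
  fi

  # List the principals stored in the keytab, then request a ticket with it.
  klist -kt "$keytab" &&
  kinit -kt "$keytab" "$principal"
}

# Example: check_keytab /path/to/keytab principal_name
```

Run it as the OS user that runs Zeppelin, since that is the account that must be able to read the keytab.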

User Impersonation

User impersonation is required so that each user's queries are authorized against Sentry rules. To enable it, add the user that is allowed to impersonate other users in hadoop.proxyuser.hive.groups. On a Cloudera setup, the Hue user can be used.

Proxy user configuration in Cloudera Manager

Add the following properties to the hive interpreter:

zeppelin.jdbc.auth.kerberos.proxy.enable true

default.proxy.user.property hive.server2.proxy.user
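With these properties set, the interpreter passes the logged-in Zeppelin user to HiveServer2 through the named connection property. The effect should be comparable to adding hive.server2.proxy.user to the JDBC URL by hand, which can be tried from beeline. A sketch: proxy_url is a hypothetical helper and alice is a made-up user name.

```shell
# Hypothetical helper: append hive.server2.proxy.user to a HiveServer2 JDBC URL.
# $1 = base JDBC URL, $2 = user to impersonate
proxy_url() {
  printf '%s;hive.server2.proxy.user=%s\n' "$1" "$2"
}

BASE_URL="jdbc:hive2://hive_host:10000/;principal=hive/_HOST@DOMAIN"
proxy_url "$BASE_URL" alice

# To try it manually (requires a Kerberos ticket for a user that is allowed to impersonate):
# beeline -u "$(proxy_url "$BASE_URL" alice)" -e "SHOW TABLES;"
```

If the manual connection is rejected, the proxy user configuration on the cluster side is the first thing to check.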

Interpolation

In recent Zeppelin versions, ZeppelinContext (z) variables can be referenced in a SQL query, for example:

select * from patents_list where priority_country = '{country_code}'

To enable it, add the following property: