TL;DR at the end.

Recently I found myself in a situation where I "needed" Hadoop native libraries. Well, when I say "needed", I mean I was just fed up with constant warnings like this one:



WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

So I thought I would build my own Hadoop native libraries. How hard can it be, right? Honest answer? Less than an hour without a tutorial, fifteen minutes with one, and most of that is compilation time. In my search, I found that a lot of tutorials and guides were either outdated or didn't cover everything needed for a full compilation and installation. That is why I wrote my own, which I tested on two independent Macs, so it should be "tested enough".

Why do it

There was no real-world issue I was hoping to solve. I just had a few minutes on my hands and used them to learn something new. That said, I did read that the native libraries can bring speed improvements in some cases, which is good if you are developing or testing something locally, because local machines tend to be slow and any improvement is more than welcome. I also saw two articles a while back whose authors had issues with the builtin Java classes, but the chances of you running into the same issues are really small.

Dependencies

First of all, we need to install the build dependencies: wget, gcc, autoconf, automake, libtool, cmake, snappy, gzip, bzip2, zlib, OpenSSL and protobuf 2.5.0.

(Please note that I am skipping Maven, Java and other tools I assume you already have, as well as the Hadoop installation itself. If I am wrong, tell me and let's update the article. There is a beautiful article by Zhang Hao about installing Hadoop on a Mac.)

To install most of these, I will be using Homebrew. It's a good tool with a one-line installation, and it takes very little time to become productive with it. As the Homebrew website covers everything you need, I am skipping its installation here.

If you are not using Homebrew for the first time, update and upgrade your tools. If you have been using it for a while and would like to keep some formulae at their current version, pin them first with brew pin <formula>.



```shell
# Update
brew update
brew upgrade
# Then the installation
brew install wget gcc autoconf automake libtool cmake snappy gzip bzip2 zlib openssl
```
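Before moving on, it can help to confirm that the tools actually ended up on your PATH. Below is a minimal sketch using a hypothetical helper, check_commands (not part of the original guide). It only checks commands, so library-only formulae (snappy, zlib, openssl) and tools macOS already provides a version of (gcc, gzip, bzip2, Apple's own libtool) are left out of the example call; Homebrew installs its libtool as glibtool.

```shell
# Hypothetical helper (not from the original guide): report which of the given
# commands cannot be found on PATH. Prints "all found" and returns 0, or prints
# the missing names and returns 1.
check_commands() {
  missing=""
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      missing="$missing $cmd"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all found"
}

# Example: commands installed by the formulae above that macOS does not ship
# itself (Homebrew's libtool is installed as glibtool).
# check_commands wget autoconf automake cmake glibtool
```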

As you may have noticed, one of the dependencies is missing from the command above. Yes! It is protobuf, specifically version 2.5.0, which is deprecated and can't easily be installed with Homebrew. So let's build our own. It's cleaner that way and much more fun than it sounds. First we need to download it from GitHub and unpack it somewhere. You can delete it right afterwards, so no special folder structure is needed.



```shell
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar -xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
```

Then comes building it and making sure everything went smoothly. It takes some time, and I advise running it step by step so you can see what is happening. A few warnings here and there are normal, so you can ignore those.



```shell
./configure
make
make check
# May need sudo if /usr/local is not writable by your user
make install
# And just to check if everything is ok.
# This should print: libprotoc 2.5.0
protoc --version
```
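Because the Hadoop 2.x build expects exactly protoc 2.5.0 (newer Hadoop lines may differ), a scripted check can save you a failed Maven run later. This is a small sketch with a hypothetical helper, protoc_version_ok, assuming 2.5.0 is the target version as in this guide.

```shell
# Hypothetical helper: succeed only if the output of `protoc --version`
# (second argument) reports exactly the expected version (first argument).
protoc_version_ok() {
  expected="$1"
  actual="$2"
  [ "$actual" = "libprotoc $expected" ]
}

# Usage (assumes protoc is on PATH after `make install`):
# if protoc_version_ok 2.5.0 "$(protoc --version)"; then
#   echo "protoc 2.5.0 found"
# else
#   echo "unexpected protoc: $(protoc --version)" >&2
# fi
```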

OpenSSL setup

Next, we link the OpenSSL headers by hand, because Homebrew refuses to link OpenSSL (it is keg-only) and the compiler needs them. This is known behaviour, and the fix is to create a symbolic link with ln:



```shell
cd /usr/local/include
ln -s ../opt/openssl/include/openssl .
```
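To confirm the link points where it should, here is a small sketch with a hypothetical helper, check_openssl_link. It takes the include directory as an argument (so it can be tried on a scratch directory first) and treats the presence of ssl.h behind the link as the sign that the OpenSSL headers are really there, which is an assumption about what counts as "linked correctly".

```shell
# Hypothetical helper: check that "$1/openssl" is a symlink and that it
# resolves to a directory containing ssl.h (used here as a marker header).
check_openssl_link() {
  link="$1/openssl"
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link"
    return 1
  fi
  if [ ! -f "$link/ssl.h" ]; then
    echo "broken or wrong target: $link -> $(readlink "$link")"
    return 1
  fi
  echo "ok: $link -> $(readlink "$link")"
}

# Usage: check_openssl_link /usr/local/include
```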

This solves an error that looks something like this:



```text
[exec] -- Configuring incomplete, errors occurred!
[exec] See also /Users/user/github/hadoop/hadoop-tools/hadoop-pipes/target/native/CMakeFiles/CMakeOutput.log.
[exec] CMake Error at /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
[exec]   Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
[exec]   system variable OPENSSL_ROOT_DIR (missing: OPENSSL_INCLUDE_DIR)
[exec] Call Stack (most recent call first):
[exec]   /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
[exec]   /usr/local/Cellar/cmake/3.14.3/share/cmake/Modules/FindOpenSSL.cmake:413 (find_package_handle_standard_args)
[exec]   CMakeLists.txt:20 (find_package)
[exec]
[exec]
```

Building native libraries

And finally, building the libraries! Again, this creates a folder that you can delete at the end. This is probably the first place where you need to change something: replace <VERSION> with the version of Hadoop you are using.



```shell
git clone https://github.com/apache/hadoop.git
cd hadoop
# Change the version as needed
git checkout branch-<VERSION>
# And just package (the profiles are a comma-separated list, no space).
mvn package -Pdist,native -DskipTests -Dtar
# After the build, copy your newly created libraries.
cp -R hadoop-dist/target/hadoop-<VERSION>/lib "$HADOOP_HOME"
```
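After copying, it is worth confirming that the native Hadoop library actually landed in $HADOOP_HOME/lib/native. This is a minimal sketch with a hypothetical helper, check_native_dir; it assumes the file is libhadoop.dylib on macOS (as in the checknative output further down) and would be libhadoop.so on Linux.

```shell
# Hypothetical helper: look for the native Hadoop library under "$1/lib/native".
# Prints the path found and returns 0, otherwise prints a message and returns 1.
check_native_dir() {
  dir="$1/lib/native"
  for f in "$dir/libhadoop.dylib" "$dir/libhadoop.so"; do
    if [ -e "$f" ]; then
      echo "found: $f"
      return 0
    fi
  done
  echo "no libhadoop in $dir"
  return 1
}

# Usage: check_native_dir "$HADOOP_HOME"
```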

Setting up environment variables

Now the critical part: making your shell see the libraries. Whatever shell you are using, put this into your shell profile (.bashrc, .zshrc, etc.):



```shell
# Assumes HADOOP_HOME is already set. No spaces around "=" in shell assignments.
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native"
export JAVA_LIBRARY_PATH="$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native"
```
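One small caveat with the export lines above: every time the profile is sourced again, the same directory is appended once more, and if a variable starts out empty it gets a leading colon, i.e. an empty entry that some loaders treat as the current directory. A hedged sketch of a hypothetical append_path helper that avoids both:

```shell
# Hypothetical helper (not from the original profile): print the colon-separated
# list "$1" with directory "$2" appended, unless it is already present.
# An empty list yields just the directory, with no leading ":".
append_path() {
  current="$1"
  dir="$2"
  case ":$current:" in
    *":$dir:"*) printf '%s\n' "$current" ;;
    *)
      if [ -z "$current" ]; then
        printf '%s\n' "$dir"
      else
        printf '%s\n' "$current:$dir"
      fi
      ;;
  esac
}

# Usage in a profile, assuming HADOOP_HOME is already set:
# export LD_LIBRARY_PATH="$(append_path "$LD_LIBRARY_PATH" "$HADOOP_HOME/lib/native")"
# export JAVA_LIBRARY_PATH="$(append_path "$JAVA_LIBRARY_PATH" "$HADOOP_HOME/lib/native")"
```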

This points Hadoop and the JVM to the right directory, and everything should fall into place. After reloading your profile (or opening a new shell), the last thing we need is to check that everything is OK (well, almost everything: bzip2 is acting up and I still have not found a way to fix it; when I do, I will update this).



```shell
hadoop checknative -a
```

The output should be something like this (bzip2 is the one library that fails to load, which is presumably why the command exits with status 1):

```text
19/05/17 19:00:14 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
19/05/17 19:00:14 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/local/Cellar/hadoop/2.7.5/lib/native/libhadoop.dylib
zlib:    true /usr/lib/libz.1.dylib
snappy:  true /usr/local/lib/libsnappy.1.dylib
lz4:     true revision:99
bzip2:   false
openssl: true /usr/lib/libcrypto.35.dylib
19/05/17 19:00:14 INFO util.ExitUtil: Exiting with status 1
```
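If you want to see at a glance which libraries are still missing (bzip2, in my case), the "name: false" lines can be filtered out of the checknative output. A minimal sketch with a hypothetical helper, failed_native_libs, assuming the output keeps the "name: true|false [path]" layout shown above:

```shell
# Hypothetical helper: read `hadoop checknative -a` output on stdin and print
# the name of every library reported as "false", one per line.
failed_native_libs() {
  awk '$2 == "false" { sub(/:$/, "", $1); print $1 }'
}

# Usage: hadoop checknative -a 2>&1 | failed_native_libs
```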

Afterword

Hopefully everything is running smoothly and you no longer get those warnings. If this helped even one person, I am glad, because if there is no added value for the reader, it is just me talking to a wall. On the other hand, if you find any issues in the code or the article, please tell me and I will fix whatever I can.

This is just a step-by-step shell script extracted from the text above.
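Such a script could look something like the following sketch, assembled from the commands in this article; it is a reconstruction, not necessarily the author's original. It assumes Homebrew is installed, HADOOP_HOME is set, and the Hadoop version is passed as the first argument. The run wrapper, the DRY_RUN switch and the file name build-native.sh are hypothetical additions, so you can print the steps before executing anything.

```shell
#!/bin/sh
# Sketch of the whole procedure in one script (reconstructed, hypothetical).
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@" || exit 1   # stop at the first failing step
  fi
}

build_hadoop_native() {
  version="${1:?usage: build_hadoop_native VERSION [WORKDIR]}"
  workdir="${2:-$(mktemp -d)}"   # scratch space, safe to delete afterwards
  : "${HADOOP_HOME:?HADOOP_HOME must be set}"

  # 1. Dependencies
  run brew update
  run brew upgrade
  run brew install wget gcc autoconf automake libtool cmake snappy gzip bzip2 zlib openssl

  # 2. protobuf 2.5.0 from source
  run cd "$workdir"
  run wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
  run tar -xzf protobuf-2.5.0.tar.gz
  run cd protobuf-2.5.0
  run ./configure
  run make
  run make check
  run make install
  run protoc --version           # should print libprotoc 2.5.0

  # 3. OpenSSL headers (same link as above; skipped if it already exists)
  if [ ! -e /usr/local/include/openssl ]; then
    run ln -s ../opt/openssl/include/openssl /usr/local/include/openssl
  fi

  # 4. Native libraries
  run cd "$workdir"
  run git clone https://github.com/apache/hadoop.git
  run cd hadoop
  run git checkout "branch-$version"
  run mvn package -Pdist,native -DskipTests -Dtar
  run cp -R "hadoop-dist/target/hadoop-$version/lib" "$HADOOP_HOME"
}

# Runs only when a version is given, e.g.:
#   DRY_RUN=1 sh build-native.sh <VERSION>
#   sh build-native.sh <VERSION>
if [ $# -ge 1 ]; then
  build_hadoop_native "$@"
fi
```

A dry run first is a cheap way to check that <VERSION> and HADOOP_HOME expand to what you expect before a long Maven build.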