
1. Objective

Through this Apache Flink installation tutorial, we will learn how to set up a multi-node Apache Flink cluster. We will cover the prerequisites for the cluster, Apache Flink installation and configuration on the master and slave nodes, and how to start and stop the cluster. Along with this, we will also see how to start programming in Apache Flink and run Flink applications after the cluster setup on CentOS/RedHat.

So, let’s start Apache Flink Cluster Setup Tutorial.

2. Introduction to Apache Flink Cluster setup on CentOS

Before we start setting up the cluster, let us quickly revise our Flink concepts.

We have already seen what Apache Flink is, its features, and its real-time use cases as a key Big Data platform. Now let us learn how to install Apache Flink on CentOS: the prerequisites for an Apache Flink cluster, and the various commands and setup steps required for a complete Flink installation.

a. Platform for Apache Flink Installation on CentOS

OS: Linux is supported as a development and production platform. Here we will use CentOS or RedHat for Flink installation.

Flink: Apache Flink 1.x (flink-1.1.3-bin-hadoop26-scala_2.10.tgz)

3. Install Flink on Master

i. Prerequisites for Apache Flink Cluster

a. Add Entries in hosts file

You need to edit the hosts file ($sudo nano /etc/hosts) and add entries for the master and slaves as below:



MASTER-IP master
SLAVE01-IP slave01
SLAVE02-IP slave02
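If you are repeating this on several machines, the entries can be staged with a short script. This is a sketch, not part of the official setup: the IP addresses below are placeholders for your actual node addresses, and the result is written to a staging file (hosts.new) so you can review it before installing it with sudo.

```shell
#!/bin/sh
# Placeholder addresses -- substitute the real IPs of your nodes.
MASTER_IP=192.168.1.10
SLAVE01_IP=192.168.1.11
SLAVE02_IP=192.168.1.12

# Build a staging copy of /etc/hosts rather than editing it in place.
cp /etc/hosts hosts.new 2>/dev/null || touch hosts.new
{
  printf '%s master\n'  "$MASTER_IP"
  printf '%s slave01\n' "$SLAVE01_IP"
  printf '%s slave02\n' "$SLAVE02_IP"
} >> hosts.new

# After reviewing hosts.new, install it:
# sudo cp hosts.new /etc/hosts
```

Run the same script (or make the same edits by hand) on every node of the cluster.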

(NOTE: In place of MASTER-IP, SLAVE01-IP, and SLAVE02-IP, put the corresponding IP addresses.)

b. Install Java 8 (Recommended Oracle Java)

Perform the steps below for Java 8 installation on CentOS:

Download Archive File

Download the latest version of Java (the 32-bit build is shown here):



$ cd /opt/
$ wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u25-b17/jdk-8u25-linux-i586.tar.gz"
$ tar xzf jdk-8u25-linux-i586.tar.gz

Install JAVA

After extracting the tar file, we need to register the new version of Java using alternatives. Use the following commands to do it:

$ cd /opt/jdk1.8.0_25/
$ alternatives --install /usr/bin/java java /opt/jdk1.8.0_25/bin/java 2
$ alternatives --config java

There are 3 programs which provide 'java':

/opt/jdk1.8.0/bin/java
/opt/jdk1.7.0_55/bin/java
/opt/jdk1.8.0_25/bin/java

Select the entry for /opt/jdk1.8.0_25/bin/java. Once Java 8 installation on the server is done, set up the javac and jar commands using the following:

$ alternatives --install /usr/bin/jar jar /opt/jdk1.8.0_25/bin/jar 2
$ alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_25/bin/javac 2
$ alternatives --install /usr/bin/javaws javaws /opt/jdk1.8.0_25/bin/javaws 2
$ alternatives --set jar /opt/jdk1.8.0_25/bin/jar
$ alternatives --set javac /opt/jdk1.8.0_25/bin/javac

Check JAVA Version

Use the following command to check the Java version:



$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

Setup Environment Variables

Follow the steps below to set the Java environment variables:

Setup JAVA_HOME Variable

$ export JAVA_HOME=/opt/jdk1.8.0_25

Setup JRE_HOME Variable

$ export JRE_HOME=/opt/jdk1.8.0_25/jre

Setup PATH Variable

$ export PATH=$PATH:/opt/jdk1.8.0_25/bin:/opt/jdk1.8.0_25/jre/bin
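Exported variables vanish when the terminal is closed. As a small sketch (assuming the JDK was extracted to /opt/jdk1.8.0_25 as above), the three exports can be written to a snippet file and appended to ~/.bash_profile so they survive a re-login:

```shell
#!/bin/sh
# Write the Java environment variables to a reusable snippet.
# The JDK path assumes the extraction location used earlier.
cat > java-env.sh <<'EOF'
export JAVA_HOME=/opt/jdk1.8.0_25
export JRE_HOME=/opt/jdk1.8.0_25/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
EOF

# Load it into the current shell; uncomment to make it permanent:
. ./java-env.sh
# cat java-env.sh >> ~/.bash_profile
```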

c. Configure SSH

Below are the steps for SSH configuration:

Install Open SSH Server-Client:

$sudo yum -y install openssh-server openssh-client

Start the SSH Services



$sudo chkconfig sshd on
$sudo service sshd start

Generate Key Pairs:

$ssh-keygen -t rsa -P ""

Configure password-less SSH:

Copy the content of .ssh/id_rsa.pub (of master) to .ssh/authorized_keys (of all the slaves as well as master)
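Mechanically, this copy step appends the master's public key to each node's ~/.ssh/authorized_keys and tightens the permissions (sshd is strict about them). The sketch below demonstrates the same steps against a local staging directory with a placeholder key; on a real slave the target directory is ~/.ssh.

```shell
#!/bin/sh
# Demonstrate the append + permissions against a staging directory.
mkdir -p staging/.ssh
chmod 700 staging/.ssh

# Placeholder public key -- on the master this is ~/.ssh/id_rsa.pub.
echo "ssh-rsa AAAAB3...placeholder... master-key" > id_rsa.pub

cat id_rsa.pub >> staging/.ssh/authorized_keys
chmod 600 staging/.ssh/authorized_keys
```

On real nodes, the same effect is usually achieved in one shot with ssh-copy-id slave01, where that tool is available.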

Check by SSH to all the Slaves:



$ssh slave01
$ssh slave02

ii. Install Flink on RedHat/CentOS

Now we are all ready with the prerequisites to install Flink. Let us start Flink installation on RedHat/CentOS.

a. Download Flink

You need to download the Flink setup below for installation:

http://www-eu.apache.org/dist/flink/flink-1.1.3/flink-1.1.3-bin-hadoop26-scala_2.10.tgz

b. Untar Tar ball

$tar xzf flink-1.1.3-bin-hadoop26-scala_2.10.tgz

(Note: All the required jars, scripts, configuration files, etc. are available in FLINK_HOME directory (flink-1.1.3))

c. Setup Configuration

Now set the required Flink configuration as below:

Edit .bashrc

Edit .bashrc file located in user’s home directory and add following environment variables:



export FLINK_HOME=$HOME/flink-1.1.3
export PATH=$PATH:$FLINK_HOME/bin

(Set FLINK_HOME to the directory where you extracted Flink.)

(Note: After above step, restart the Terminal/Putty so that all the environment variables will come into effect)

Edit flink-conf.yaml:

Edit configuration file flink-conf.yaml (located in FLINK_HOME/conf) and specify master node (Job Manager):



$nano flink-conf.yaml

jobmanager.rpc.address: master

Edit Slaves:

Edit configuration file slaves (located in FLINK_HOME/conf) and add following entries:



$nano slaves

slave01
slave02
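The two configuration edits above can also be scripted. This sketch writes both files relative to a conf directory under the current working directory (run it from FLINK_HOME so it targets FLINK_HOME/conf), and only appends jobmanager.rpc.address if the key is not already present:

```shell
#!/bin/sh
# Write the slaves file and point TaskManagers at the master.
# Run from FLINK_HOME so conf/ is FLINK_HOME/conf.
mkdir -p conf

cat > conf/slaves <<'EOF'
slave01
slave02
EOF

# Add the JobManager address only if the key is not already set.
grep -q '^jobmanager.rpc.address' conf/flink-conf.yaml 2>/dev/null ||
  echo 'jobmanager.rpc.address: master' >> conf/flink-conf.yaml
```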

iii. Install Flink On Slaves

Flink is now set up on the master; next, install Flink on all the slaves.

Below are the steps required to be performed for installing Apache Flink on Slave nodes:

a. Setup Pre-requisites on all the slaves

Run following steps on all the slaves:

Add entries in the hosts file (as described in the prerequisites above)

Install Java 8 (as described in the prerequisites above)

b. Copy configured setups from master to all the slaves

Create tar-ball of configured setup:

$ tar czf flink.tar.gz flink-1.1.3

(NOTE: Run this command on Master)

Copy the configured tar-ball on all the slaves

$ scp flink.tar.gz slave01:~

(NOTE: Run this command on Master)

$ scp flink.tar.gz slave02:~

(NOTE: Run this command on Master)
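The tar-and-copy steps above can be collapsed into one loop on the master. In this sketch the scp lines are commented out, since they only work once the slaves are reachable; the hostnames assume the hosts-file entries from the prerequisites, and a stand-in directory is created so the script runs anywhere.

```shell
#!/bin/sh
# Package the configured Flink directory and ship it to each slave.
mkdir -p flink-1.1.3/conf        # stand-in for the real FLINK_HOME
tar czf flink.tar.gz flink-1.1.3

for host in slave01 slave02; do
    echo "copying flink.tar.gz to $host"
    # scp flink.tar.gz "$host":~
done

# Quick sanity check on the archive contents:
tar tzf flink.tar.gz
```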

c. Un-tar configured flink setup on all the slaves

$tar xzf flink.tar.gz

(NOTE: Run this command on all the slaves)

Flink is now set up on all the slaves. Let us now start the cluster.

iv. Start the Apache Flink Cluster

Once Flink setup on Master and slave is completed, we need to start the Flink services as below:

a. Start the Services

$bin/start-cluster.sh

(Note: Run this command on Master)

b. Check whether services have been started

Use the commands as shown below to check the status of the services:

Check daemons on Master



$jps
JobManager

Check daemons on Slaves



$jps
TaskManager
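The jps check can be wrapped in a tiny helper that prints a clear status line; pass JobManager on the master and TaskManager on a slave. This is an illustrative script, not part of the Flink distribution.

```shell
#!/bin/sh
# Report whether the expected Flink daemon shows up in jps output.
check_daemon() {
    if jps 2>/dev/null | grep -q "$1"; then
        echo "$1 is running"
    else
        echo "$1 is NOT running"
    fi
}

check_daemon "${1:-JobManager}"   # e.g. pass TaskManager on a slave
```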

v. Play with Apache Flink

As the Flink setup on master and slave is completed and all services are running fine, let us start Flink applications:

a. Flink Web UI

http://<Master-IP>:8081

The UI will show the information about job manager, task managers, jobs, etc.

b. Run Flink Application

$ bin/flink run <Jar-Path> -input <Input-Path> -output <Output-Path>

Note: If you are using the local file system for input, the input file must be available on all the nodes of the cluster. To use HDFS, use a path of the form hdfs://master:9000/<Path>
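Putting it together, the bundled WordCount example can serve as a first job; the examples/batch/WordCount.jar path matches the layout of the 1.1.x distribution. The sketch below prepares a tiny input file and only invokes flink if the binary is found, so FLINK_HOME must point at your extracted directory.

```shell
#!/bin/sh
# Prepare a small input and run the bundled WordCount example.
printf 'hello flink\nhello cluster\n' > /tmp/wc-input.txt

FLINK_HOME=${FLINK_HOME:-$HOME/flink-1.1.3}
if [ -x "$FLINK_HOME/bin/flink" ]; then
    "$FLINK_HOME/bin/flink" run "$FLINK_HOME/examples/batch/WordCount.jar" \
        --input /tmp/wc-input.txt --output /tmp/wc-output.txt
else
    echo "flink not found at $FLINK_HOME -- set FLINK_HOME first"
fi
```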

vi. Stop the Flink Cluster

Once you are done with Flink practicals, let us learn how to stop the Flink cluster.

Use below commands for the same:

a. Stop the Apache Flink Services

$bin/stop-cluster.sh

(Note: Run this command on Master)

Now that we have learnt how to install Flink on a multi-node cluster in CentOS/RedHat, let us explore some real-life Flink use cases and commands to play with Apache Flink.


So, this was all in Apache Flink Cluster Setup Tutorial. Hope you like our explanation.

4. Conclusion – Apache Flink Cluster

Hence, in this Apache Flink cluster setup tutorial, we discussed Flink installation on CentOS, including installing Flink on the master and the slaves. Still, if you have any confusion, ask in the comment tab.
