This article will show you how to set up and configure a High-Availability (HA) cluster on Linux/Unix-based systems. A cluster is simply a group of computers (called nodes or members) working together to execute a task. There are basically four types of clusters: Storage Clusters, High-Availability Clusters, Load-Balancing Clusters, and High-Performance Computing Clusters. In production, HA (High-Availability) and LB (Load-Balancing) clusters are the most commonly deployed types, as they offer uninterrupted availability of services and data (for example, web services) to the end-user community. HA cluster configurations are sometimes grouped into two subsets: Active-Active and Active-Passive.

Active-Active: Typically it needs a minimum of two nodes, and both nodes actively run the same service/application. This configuration is mainly used to build a Load-Balancing (LB) cluster that distributes workloads across the nodes.

Active-Passive: It also needs a minimum of two nodes to provide a fully redundant system. Here, the service/application runs on only one node at a time. This configuration is mainly used to build a High-Availability (HA) cluster, where one node is active and the other is a standby (passive).

In our setup, we will focus only on High-Availability (Active-Passive), also known as a failover cluster. One of the biggest advantages of an HA cluster is that the nodes track each other and migrate the service/application to the next node in case of a node failure. The faulty node is not visible to outside clients, although there will be a small service disruption during the migration period. An HA cluster also maintains the data integrity of the service.

The High-Availability cluster in RedHat / CentOS 7 is completely different from the previous versions. From RedHat version 7 onwards, Pacemaker became the default Cluster Resource Manager (RM), and Corosync is responsible for exchanging and updating cluster information with the other cluster nodes regularly. Both Pacemaker and Corosync are very powerful open-source technologies that completely replace CMAN and RGManager from the previous versions of RedHat clusters.

This step-by-step guide will show you how to configure a High-Availability (HA) / failover cluster with common iSCSI shared storage on RHEL/CentOS 7.6. You can use the same guide for all versions of RHEL/CentOS/Fedora with a few minimal changes.

Prerequisites:

Operating System : CentOS Linux 7

Shared Storage : iSCSI SAN

Floating IP address : For Cluster nodes

Packages : pcs, fence-agents-all and targetcli

My Lab Setup :

For the lab setup, I am using three CentOS machines: two for the cluster nodes and one for the iSCSI target server.

Node-1: Operating System:- CentOS Linux 7 (Core)

hostname:- node1.lteck.local

IP Address:- 192.168.3.100

Node-2: Operating System:- CentOS Linux 7 (Core)

hostname:- node2.lteck.local

IP Address:- 192.168.3.101

iSCSI-Server: Operating System:- CentOS Linux 7 (Core)

hostname:- storage.lteck.local

IP Address:- 192.168.3.102

Block device :- /dev/sdb

Other Info: Cluster Name :- linuxteck_cluster

Virtual IP:- 192.168.3.105

Step 1: Setup Storage Server (iSCSI)

Use the following command to check the available block devices on the storage server.

# lsblk

Output:
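The listing below is illustrative; device names and sizes depend on your VM configuration:

NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   20G  0 disk
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   19G  0 part
  ├─centos-root 253:0    0   17G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
sdb               8:16   0    1G  0 disk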

From the above listing, you can see all the block devices (/dev/sda and /dev/sdb) in a tree format. In our demo, I will be using the 1GB disk “/dev/sdb” as shared storage for the cluster nodes.

Note: Shared storage is one of the essential resources for a High-Availability cluster, as it must provide the same application data across all the nodes in the cluster and must be accessible either consecutively or simultaneously by the application running in the cluster. In production, SAN storage is widely used. In our lab, we will use iSCSI shared storage for our HA cluster.

Add the following entry to the /etc/hosts file in the format “IP-Address Domain-name [Domain-aliases]”. It helps to resolve hostnames, which means local IP addresses can easily be bound to a hostname, web address, or URL.

# vi /etc/hosts

192.168.3.102 storage.lteck.local storage

Note: The fields are separated by at least one space or tab. The 1st field is the numeric IP address, the 2nd field specifies the locally-known hostname attached to that IP address, and the 3rd field holds aliases or alternate names for the given hostname. To learn more about DNS, click here: How to set up Domain Name Service (DNS) on Linux

First, let’s update to the latest current version and then install the target utility package.

# yum update -y
# yum install -y targetcli

Now run the following command to enter the interactive shell of the iSCSI server.

# targetcli

targetcli shell version 2.1.fb49
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type ‘help’.

(a) Create a backstore block device:

/> /backstores/block create ltecklun1 /dev/sdb

(b) Create an IQN for the iSCSI target:

/> /iscsi create iqn.2020-01.local.server-iscsi:server

(c) Create ACLs:

/> /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1/acls create iqn.2020-01.local.client-iscsi:client1

(d) Create LUNs under the iSCSI target:

/> /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1/luns create /backstores/block/ltecklun1

(e) Enable CHAP authentication (remember these credentials; the initiators must use the same userid and password later):

/> cd /iscsi/iqn.2020-01.local.server-iscsi:server/tpg1
/iscsi/iqn.20…i:server/tpg1> set attribute authentication=1
/iscsi/iqn.20…i:server/tpg1> cd acls/iqn.2020-01.local.client-iscsi:client1
/iscsi/iqn.20…iscsi:client1> set auth userid=linuxteck
/iscsi/iqn.20…iscsi:client1> set auth password=<YourCHAPPassword>
/iscsi/iqn.20…iscsi:client1> cd /
/> ls
/> saveconfig
/> exit

(f) Add a firewall rule to permit iSCSI port 3260, OR disable the firewall:

# firewall-cmd --permanent --add-port=3260/tcp
# firewall-cmd --reload
# firewall-cmd --list-all

OR

# systemctl disable firewalld.service
# systemctl stop firewalld.service

(g) Disable SELinux.

(h) Finally, enable and start the iSCSI target:

# systemctl enable target.service
# systemctl restart target.service
# systemctl status target.service

Note: That’s it for the iSCSI configuration part. Click here to see the detailed configuration setup of an iSCSI server and client on CentOS / RHEL 7.6.
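Optionally, before moving on, you can sanity-check the target from the storage server. targetcli can be invoked non-interactively to print the object tree, and ss confirms that the target portal is listening on the default port 3260:

# targetcli ls
# ss -tnl | grep 3260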

Step 2: Setup High-Availability (HA) Cluster

Add the following host entries on all the nodes and the shared storage server in the cluster. They will help the systems communicate with each other using hostnames.

Node:1

# vi /etc/hosts

192.168.3.100 node1.lteck.local node1
192.168.3.101 node2.lteck.local node2

192.168.3.102 storage.lteck.local storage

Node:2

# vi /etc/hosts

192.168.3.100 node1.lteck.local node1

192.168.3.101 node2.lteck.local node2

192.168.3.102 storage.lteck.local storage

(a) Import the LUNs on all the nodes across the cluster (Node1 and Node2)

(i) Before importing the LUN from the shared storage, update to the latest current version of CentOS 7.x on both nodes (Node1 and Node2):

# yum update -y

(ii) Install the iSCSI initiator package on both nodes (Node1 and Node2):

# yum install -y iscsi-initiator-utils

(iii) Use the following command to add the initiator name on both nodes (Node1 and Node2). Use the initiator name already created on the target server; in our case it is “iqn.2020-01.local.client-iscsi:client1”.

# vi /etc/iscsi/initiatorname.iscsi

InitiatorName=iqn.2020-01.local.client-iscsi:client1

(iv) Save the file, then restart and enable the iscsid service on both nodes:

# systemctl restart iscsid.service
# systemctl enable iscsid.service
# systemctl status iscsid.service

(v) Next, configure CHAP authentication on both nodes (Node1 and Node2), using the same credentials you set on the target server:

# vi /etc/iscsi/iscsid.conf

node.session.auth.authmethod = CHAP
node.session.auth.username = linuxteck
node.session.auth.password = <YourCHAPPassword>

Save the file.

(vi) Now it’s time to discover the iSCSI shared storage (LUNs) on both nodes (Node1 and Node2):

# iscsiadm --mode discoverydb --type sendtargets --portal 192.168.3.102 --discover

Output:

192.168.3.102:3260,1 iqn.2020-01.local.server-iscsi:server

Note: The LUN has been successfully discovered on both nodes.

(vii) Use the following command to log in to the target server:

# iscsiadm -m node --login

Output:

Logging in to [iface: default, target: iqn.2020-01.local.server-iscsi:server, portal: 192.168.3.102,3260] (multiple)

Login to [iface: default, target: iqn.2020-01.local.server-iscsi:server, portal: 192.168.3.102,3260] successful.

(viii) Use the following command to verify the newly added disk on both nodes:

# lsblk

Note: The new disk drive “sdb” with a 1GB volume size is now visible on both of the nodes (Node1 and Node2).

(ix) Use the following command to create a filesystem on the newly added block device (/dev/sdb) from any one of your nodes, either Node1 or Node2. In our demo, I will run it on Node1.

# mkfs.xfs /dev/sdb

Note: Before moving on to install the cluster packages, we need to ensure that our shared storage is accessible on all the nodes with the same data. For testing purposes, use the following steps: mount the newly added disk temporarily on the /mnt directory, create 3 files named “1, 2, 3”, use the ‘ls’ command to verify the files are placed in the /mnt directory, and finally unmount /mnt from Node1.

# mount /dev/sdb /mnt
# cd /mnt
[root@node1 mnt]# touch 1 2 3
[root@node1 mnt]# ls

1 2 3

[root@node1 mnt]# cd
[root@node1 ~]# umount /mnt/

Now, move on to Node2 and run the following commands to see that the files created on Node1 are available on Node2.

[root@node2 ~]# mount /dev/sdb /mnt/
[root@node2 ~]# cd /mnt/
[root@node2 mnt]# ls

1 2 3

[root@node2 mnt]# cd

[root@node2 ~]# umount /mnt/

Note: It is confirmed that our shared storage is working on all the available nodes in the cluster; in our case, it is working perfectly on both Node1 and Node2. We have successfully mapped the LUN “/dev/sdb” on both nodes. That’s it. Now let’s move forward to the cluster setup.
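Optionally, you can also confirm that each node holds an active iSCSI session to the target. The output of this standard iscsiadm query should look similar to the line below (the exact format can vary between versions):

# iscsiadm -m session

tcp: [1] 192.168.3.102:3260,1 iqn.2020-01.local.server-iscsi:server (non-flash)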

(b) Install and configure Cluster Setup

(i) Use the following command to install the cluster packages (Pacemaker) on both nodes (Node1 and Node2):

# yum install pcs fence-agents-all -y

Note: Once you have successfully installed the packages on both nodes, configure the firewall service to permit the High-Availability application to have a direct connection between the nodes (Node1 and Node2). If you do not wish to apply any firewall rules, simply disable the firewall.

# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
# firewall-cmd --list-all

(ii) Now, start the cluster service and enable it for every reboot on both nodes (Node1 and Node2):

# systemctl start pcsd
# systemctl enable pcsd
# systemctl status pcsd

(iii) Cluster configuration: use the following command to set the password for the “hacluster” user on both nodes (Node1 and Node2):

# echo <EnterYourPassword> | passwd --stdin hacluster

Note: The real purpose of the “hacluster” user is to communicate between the nodes in the cluster. This user is created during the installation of the cluster software itself. To enable proper communication, we need to set a password for this account; it is recommended to use the same password on all nodes.

(iv) Use the following command to authorize the nodes. Execute it on only one of the nodes in the cluster; in our case, I prefer to run it on Node1.

# pcs cluster auth node1.lteck.local node2.lteck.local

Output:

Username: hacluster
Password:

node2.lteck.local: Authorized

node1.lteck.local: Authorized

Note: The above command authenticates pcs to the pcsd daemon across the nodes in the cluster. Authentication needs to be done only once. The token (authorization) key file will be saved in one of these paths: ~/.pcs/tokens or /var/lib/pcsd/tokens.

(v) Start and configure the cluster nodes. Execute the following command on only one of your nodes; in our case, Node1.

# pcs cluster setup --start --enable --name linuxteck_cluster node1.lteck.local node2.lteck.local

Note: The above command sets up a new cluster. The cluster is defined with a name and consists of all the nodes that are part of it. In our case, we have defined the cluster name as “linuxteck_cluster” and added node1 and node2 as part of it. The combination of ‘--start’ and ‘--enable’ starts the cluster service on both nodes (Node1 and Node2) and enables it at boot. If the command completes without errors, the cluster has been created and the service started on both nodes.
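Behind the scenes, this step generates /etc/corosync/corosync.conf on both nodes. On a two-node RHEL/CentOS 7 cluster the generated file should look roughly like the abbreviated sketch below (your copy may contain additional logging and security options):

# /etc/corosync/corosync.conf (abbreviated sketch)
totem {
    version: 2
    cluster_name: linuxteck_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1.lteck.local
        nodeid: 1
    }
    node {
        ring0_addr: node2.lteck.local
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}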

(vi) Enable the cluster service for every reboot:

# pcs cluster enable --all

Output:

node1.lteck.local: Cluster Enabled
node2.lteck.local: Cluster Enabled

Note: With the above command we have enabled the cluster on both nodes. Next, before adding resources on top of the cluster, we need to check the status of the cluster.

(vii) Use the following command to get the simple cluster status:

# pcs cluster status

Output:

Cluster Status:

Stack: corosync

Current DC: node1.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum

Last updated: Wed Mar 11 19:46:41 2020

Last change: Wed Mar 11 18:58:35 2020 by hacluster via crmd on node1.lteck.local

2 nodes configured

0 resources configured

PCSD Status:

node1.lteck.local: Online

node2.lteck.local: Online

Note: The above command lists only the status of the cluster itself. The following command gives detailed information about the cluster, covering the nodes, the status of pcs, and the resources.

# pcs status

Output:

Cluster name: linuxteck_cluster

WARNINGS:

No stonith devices and stonith-enabled is not false

Stack: corosync

Current DC: node1.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum

Last updated: Wed Mar 11 19:47:06 2020

Last change: Wed Mar 11 18:58:35 2020 by hacluster via crmd on node1.lteck.local

2 nodes configured

0 resources configured

Online: [ node1.lteck.local node2.lteck.local ]

No resources

Daemon Status:

corosync: active/enabled

pacemaker: active/enabled

pcsd: active/enabled

Note: From the above output, we can see that the cluster setup is working perfectly on both nodes, but no resources are configured yet. Next, let’s add a few resources to complete the cluster setup. Before moving forward, let us verify the cluster configuration.

# crm_verify -L -V

WARNING: The above command will report errors such as “unpack_resources”. This means the tool has found problems with the fencing setup, because STONITH is enabled by default but has not been configured. In our demo setup, we will disable this feature. The option “stonith-enabled=false” is not recommended for a production cluster setup.

(viii) Setup Fencing

Fencing is also called STONITH (“Shoot The Other Node In The Head”). It is one of the important tools in the cluster and is used to safeguard against data corruption on the shared storage. Fencing plays a vital role when the nodes cannot talk to each other: it cuts off shared-storage access from the faulty node. There are two types of fencing: resource-level fencing and node-level fencing. In this demo, I am not going to configure fencing (STONITH), as our machines are running in a VMware lab environment; those who are implementing this in a production environment, please click here to see the entire setup of fencing. A production node-level fence resource might look like the sketch after this paragraph.
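Purely for illustration, on physical servers with IPMI-capable management boards a node-level fence device might be created along these lines (a hypothetical sketch: the management IP, credentials, and even the choice of fence agent depend entirely on your hardware):

# pcs stonith create fence_node1 fence_ipmilan ipaddr=192.168.3.201 login=admin passwd=secret lanplus=1 pcmk_host_list="node1.lteck.local"

(The device name “fence_node1”, the IP 192.168.3.201, and the admin/secret credentials above are placeholders, not values from this lab.)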

Use the following commands to disable STONITH, set the no-quorum policy to ignore, and then list the cluster properties to ensure both are applied:

# pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
# pcs property list

Output:

Cluster Properties:

cluster-infrastructure: corosync

cluster-name: linuxteck_cluster

dc-version: 1.1.20-5.el7_7.2-3c4c782f70

have-watchdog: false

no-quorum-policy: ignore

stonith-enabled: false

Note: The output of the cluster properties shows that both STONITH and the quorum policy are disabled.

(ix) Resources / Cluster Services

In clustered services, a resource can be either a physical hardware unit such as a disk drive, or a logical unit such as an IP address, a filesystem, or an application. In a cluster, a resource can run on only a single node at a time. In our demo, we will be using the following resources:

Httpd Service

IP Address

Filesystem

First, let us install and configure the Apache server on both nodes (Node1 and Node2). Follow the steps:

# yum install -y httpd

Add the below entries at the end of the Apache configuration file (‘/etc/httpd/conf/httpd.conf’):

# vi /etc/httpd/conf/httpd.conf

<Location /server-status>

SetHandler server-status

Order deny,allow

Deny from all

Allow from 127.0.0.1

</Location>

Save the file. This block restricts the Apache server-status page to localhost; the cluster’s Apache resource agent will later poll it (via the statusurl option) to monitor the service.

Note: For storing the Apache files (HTML/CSS) we need to use our centralized storage unit (i.e., the iSCSI server). This setup has to be done on only one node; in our case, Node1.

# mount /dev/sdb /var/www/
# mkdir /var/www/html
# echo "Red Hat High Availability Cluster on LinuxTeck" > /var/www/html/index.html
# umount /var/www

Note: That’s it for the Apache configuration. Use the following commands to add a firewall rule for the Apache service on both nodes (Node1 and Node2), OR simply disable the firewall. Click here to see the detailed configuration setup of Apache (LAMP) on CentOS / RHEL 7.6.

# firewall-cmd --permanent --add-port=80/tcp
# firewall-cmd --permanent --add-port=443/tcp
# firewall-cmd --reload
# firewall-cmd --list-all

OR

# systemctl disable firewalld.service
# systemctl stop firewalld.service

Disable SELinux, or click here to configure SELinux for Apache.

(x) Create Resources

In this section, we will add three cluster resources: a filesystem resource named “APACHE_FS”, a floating IP address resource named “APACHE_VIP”, and a webserver resource named “APACHE_SERV”. Use the following commands to add the three resources to the same group.

(i) Add the first resource: the filesystem on the shared storage (iSCSI server).

# pcs resource create APACHE_FS Filesystem device="/dev/sdb" directory="/var/www" fstype="xfs" --group apache

Output:

Assumed agent name ‘ocf:heartbeat:Filesystem’ (deduced from ‘Filesystem’)

(ii) Add the second resource: the floating IP address.

# pcs resource create APACHE_VIP IPaddr2 ip=192.168.3.105 cidr_netmask=24 --group apache

Output:

Assumed agent name ‘ocf:heartbeat:IPaddr2’ (deduced from ‘IPaddr2’)

(iii) Add the third resource: APACHE_SERV.

# pcs resource create APACHE_SERV apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apache

Output:

Assumed agent name ‘ocf:heartbeat:apache’ (deduced from ‘apache’)
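Within a resource group, resources start in the order they were added and stop in the reverse order, which is exactly what we want here (filesystem first, then the IP, then Apache). Before starting the cluster, you can review the group and its members with the standard pcs query commands (output omitted here):

# pcs resource show --full
# pcs resource group list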

Note: After the resources and the resource group are created, start the cluster.

# pcs cluster start --all

Output:

node1.lteck.local: Starting Cluster (corosync)…
node2.lteck.local: Starting Cluster (corosync)…

node2.lteck.local: Starting Cluster (pacemaker)…

node1.lteck.local: Starting Cluster (pacemaker)…

Note: The above output clearly shows that both the Corosync and Pacemaker services have started on both nodes (Node1 and Node2) in the cluster. The status can be checked with the following command:

# pcs status

Output:

Cluster name: linuxteck_cluster

Stack: corosync

Current DC: node1.lteck.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum

Last updated: Thu Mar 12 19:09:13 2020

Last change: Thu Mar 12 19:09:00 2020 by root via cibadmin on node1.lteck.local

2 nodes configured

3 resources configured

Online: [ node1.lteck.local node2.lteck.local ]

Full list of resources:

Resource Group: apache

APACHE_FS (ocf::heartbeat:Filesystem): Started node1.lteck.local

APACHE_VIP (ocf::heartbeat:IPaddr2): Started node1.lteck.local

APACHE_SERV (ocf::heartbeat:apache): Started node1.lteck.local

Daemon Status:

corosync: active/enabled

pacemaker: active/enabled

pcsd: active/enabled

Note: The above output clearly shows that the cluster is up and all three resources are running on the same node (Node1). Now open the Apache virtual IP address (http://192.168.3.105) in a browser to see the sample web page we created earlier.
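As a final check, you can simulate a failover (a minimal sketch using standard pcs commands; the node name matches our lab setup). Putting Node1 into standby should move the whole apache resource group to Node2 while the web page stays reachable on the virtual IP:

# curl http://192.168.3.105
# pcs node standby node1.lteck.local
# pcs status
# curl http://192.168.3.105
# pcs node unstandby node1.lteck.local

After the standby command, pcs status should show the apache group started on node2.lteck.local; the final command brings Node1 back as an eligible host. (On older pcs versions, the equivalent commands are ‘pcs cluster standby’ and ‘pcs cluster unstandby’.)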