Access the HAProxy Node

From the BusyBox, inside an SSH session, you can access the instances by name; in our case, we want to access hapx-node01.

debian@busybox:~$ ssh hapx-node01

Configure Pacemaker

Before proceeding with the Pacemaker configuration, it is worth making some observations.

1 — Let’s check the IP configuration, using ip addr:

debian@hapx-node01:~$ ip addr show enp0s3.41



3: enp0s3.41@enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000

link/ether 08:00:27:a4:ce:07 brd ff:ff:ff:ff:ff:ff

inet6 fe80::a00:27ff:fea4:ce07/64 scope link

valid_lft forever preferred_lft forever

As you can see, we still don’t have our cluster’s IP (192.168.4.20) configured on any of the network interfaces.

2 — Let’s check the Pacemaker configuration, using crm status:

debian@hapx-node01:~$ sudo crm status



Stack: corosync

Current DC: hapx-node02 (version 1.1.16-94ff4df) - partition with quorum

Last updated: Sun Feb 2 19:53:25 2020

Last change: Sun Feb 2 19:51:43 2020 by hacluster via crmd on hapx-node02



2 nodes configured

0 resources configured



Online: [ hapx-node01 hapx-node02 ]



No resources

Here we notice that we have only two active and configured nodes (hapx-node01 and hapx-node02), but none of the resources that will make up our cluster (virtual-ip-resource and haproxy-resource).

3 — Let’s configure resources on Pacemaker, using crm configure:

Here we define our Virtual IP as 192.168.4.20. This will be the IP address of our K8S cluster (Control Plane EndPoint).

At this point, we will configure the features of our HAProxy Cluster using the crmsh tool. crmsh is a cluster management shell for the Pacemaker High Availability stack.

The following step can be run on any node, because at this point Corosync keeps the cluster configuration in sync across nodes.

Note: each line below represents a command that should be entered separately in the command line.

debian@hapx-node01:~$ cat <<EOF | sudo crm configure

property stonith-enabled=no

property no-quorum-policy=ignore

property default-resource-stickiness=100

primitive virtual-ip-resource ocf:heartbeat:IPaddr2 params ip="192.168.4.20" broadcast=192.168.4.31 nic=enp0s3.41 cidr_netmask=27 meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart

primitive haproxy-resource ocf:heartbeat:haproxy op monitor interval=20 timeout=60 on-fail=restart

colocation loc inf: virtual-ip-resource haproxy-resource

order ord inf: virtual-ip-resource haproxy-resource

commit

bye

EOF
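
After committing, you can verify that the configuration was stored, and that Corosync replicated it, by dumping the CIB from either node. This is just a quick sanity check; crm configure show is a standard crmsh subcommand:

debian@hapx-node01:~$ sudo crm configure show

The same properties, primitives, and constraints entered above should be listed in the output (e.g. the virtual-ip-resource primitive and the loc and ord constraints).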

4 — Let’s check our IP configuration one more time, using ip addr:

debian@hapx-node01:~$ ip addr show enp0s3.41

3: enp0s3.41@enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000

link/ether 08:00:27:a4:ce:07 brd ff:ff:ff:ff:ff:ff

inet 192.168.4.20/27 brd 192.168.4.31 scope global enp0s3.41

valid_lft forever preferred_lft forever

inet6 fe80::a00:27ff:fea4:ce07/64 scope link

valid_lft forever preferred_lft forever

Voilà! Now our cluster’s IP is properly configured and managed on the enp0s3.41 interface.

5 — Let’s get some more information from our cluster, using crm status:

debian@hapx-node01:~$ sudo crm status

Stack: corosync

Current DC: hapx-node01 (version 1.1.16-94ff4df) - partition with quorum

Last updated: Sun Feb 2 19:19:16 2020

Last change: Sun Feb 2 19:04:37 2020 by root via cibadmin on hapx-node01

2 nodes configured

2 resources configured

Online: [ hapx-node01 hapx-node02 ]

Full list of resources:

virtual-ip-resource (ocf::heartbeat:IPaddr2): Started hapx-node01

haproxy-resource (ocf::heartbeat:haproxy): Started hapx-node01

Here we can see that both nodes and resources are active and configured.

Looking closer, we can see that the hapx-node01 node is the one that has these two resources (virtual-ip-resource and haproxy-resource) allocated. That makes perfect sense, as we configured these resources to always be allocated on the same node.

Pacemaker parameters explained (TL;DR):

property stonith-enabled=no

STONITH protects your data from corruption, and your application from becoming unavailable, due to unintended simultaneous access by several nodes. For example, just because a node is unresponsive does not mean it has stopped accessing your data. The only way to be 100% sure your data is safe is to ensure the node is truly offline before allowing another node to access the data.

STONITH also plays a role in the event that a service cannot be stopped. In this case, the cluster uses STONITH to force the node to go offline, making it safe to start the service elsewhere.

STONITH is an acronym for "Shoot The Other Node In The Head", and it is the most popular fencing mechanism for protecting data.

To ensure the integrity of your data, STONITH is activated by default.

In our case, as we do not access shared data such as databases or files, it does not make sense to keep STONITH active. For this reason, we set stonith-enabled=no.

property no-quorum-policy=ignore

The no-quorum-policy parameter determines how the cluster behaves when there aren't enough nodes to form a quorum. To avoid a split-brain scenario, the cluster will only respond if quorum is achieved. To illustrate, imagine a cluster with five nodes where, due to a network failure, two separate groups are created: one group with three nodes and another with two. In this scenario, only the group with three nodes can achieve a majority of votes, so only that group can make use of cluster resources. This configuration is very important, because there would be a risk of resource corruption if the group with only two nodes were also able to use them. The default value for the no-quorum-policy parameter is stop.

We only have two nodes in our example. Thus, if one of them went offline for any reason, our whole cluster would be taken down due to lack of quorum (>50%). To avoid this situation, we set the policy to ignore, and nothing else needs to be done. In a production scenario, though, it would be a good idea to have at least 3 nodes to ensure higher availability.

property default-resource-stickiness=100

The default-resource-stickiness parameter determines where the cluster resources will be allocated. The default behavior is to move resources back to the nodes where they were originally allocated. This means that, after a failure, the resource is allocated to another node in the cluster and, when the original node is back in a healthy state, the resource is moved back to it. This is not ideal, because users are exposed to a disruption twice: once on failure and again on failback. To avoid this situation, you can set a weight (between -1,000,000 and 1,000,000) for the default-resource-stickiness parameter: 0 means the resource will be moved back to its original node; a positive value means the resource should stay where it is.

In our case, we arbitrarily set it to 100.

primitive virtual-ip-resource ocf:heartbeat:IPaddr2 params ip="192.168.4.20" broadcast=192.168.4.31 nic=enp0s3.41 cidr_netmask=27 meta migration-threshold=2 op monitor interval=20 timeout=60 on-fail=restart

primitive - Represents a resource that should exist as a single instance in the whole cluster. An IP, for example, can be configured as a primitive resource and there should be only one instance of this resource in the cluster at any given time.

virtual-ip-resource - A unique name we give to our resource.

ocf:heartbeat:IPaddr2 - The OCF cluster resource agent.

meta migration-threshold - When a resource is created, you can configure it to be moved to a different node after a given number of failures. This parameter serves that purpose. After the limit is reached, the current node won't be able to own the resource until one of the following happens:

○ An administrator resets the resource’s failcount value.

○ The resource’s failure-timeout value is reached.

The default value for migration-threshold is INFINITY. Internally, this is defined as a very high but finite value. Setting it to 0 disables the threshold behavior for the given resource.
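
The failcount mentioned above can be inspected and reset from crmsh. The commands below are a sketch using the resource and node names from this guide:

debian@hapx-node01:~$ sudo crm resource failcount virtual-ip-resource show hapx-node01

debian@hapx-node01:~$ sudo crm resource cleanup virtual-ip-resource

The first command shows how many times virtual-ip-resource has failed on hapx-node01; the second resets the failcount and clears the failure history, allowing the node to own the resource again once it is healthy.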

params - Parameters for the resource agent:

ip - The IPv4 address to be configured in dotted quad notation, for example "192.168.1.1". (required, string, no default)

nic - The base network interface on which the IP address will be brought online. If left empty, the script will try and determine this from the routing table. Do NOT specify an alias interface in the form eth0:1 or anything here; rather, specify the base interface only. Prerequisite: There must be at least one static IP address, which is not managed by the cluster, assigned to the network interface. If you can not assign any static IP address on the interface, modify this kernel parameter: sysctl -w net.ipv4.conf.all.promote_secondaries=1 (or per device). (optional, string, default eth0)

cidr_netmask - The netmask for the interface in CIDR format (e.g., 24 and not 255.255.255.0). If unspecified, the script will also try to determine this from the routing table. (optional, string, no default)

broadcast - Broadcast address associated with the IP. If left empty, the script will determine this from the netmask. (optional, string, no default)
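
As a side note, the broadcast=192.168.4.31 value we passed to IPaddr2 follows directly from the /27 prefix. A small shell sketch of the arithmetic (plain integer math, nothing cluster-specific):

```shell
#!/bin/sh
# For 192.168.4.20/27: a /27 leaves 32-27 = 5 host bits,
# so subnets come in blocks of 2^5 = 32 addresses.
prefix=27
host=20                               # last octet of 192.168.4.20
block=$(( 1 << (32 - prefix) ))       # 32 addresses per /27 subnet
network=$(( host / block * block ))   # 0  -> network   192.168.4.0
broadcast=$(( network + block - 1 ))  # 31 -> broadcast 192.168.4.31
echo "network=192.168.4.$network broadcast=192.168.4.$broadcast"
```

This prints network=192.168.4.0 broadcast=192.168.4.31, matching the values used in our primitive definition.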

op - Configure monitoring operation:

monitor - The action to perform. Common values: monitor, start, stop

interval - If set to a nonzero value, a recurring operation is created that repeats at this frequency, in seconds. A nonzero value makes sense only when the action name is set to monitor. A recurring monitor action will be executed immediately after a resource start completes, and subsequent monitor actions are scheduled starting at the time the previous monitor action completed. For example, if a monitor action with interval=20s is executed at 01:00:00, the next monitor action does not occur at 01:00:20, but at 20 seconds after the first monitor action completes.

If set to zero, which is the default value, this parameter allows you to provide values to be used for operations created by the cluster. For example, if the interval is set to zero, the name of the operation is set to start, and the timeout value is set to 40, then Pacemaker will use a timeout of 40 seconds when starting this resource. A monitor operation with a zero interval allows you to set the timeout/on-fail/enabled values for the probes that Pacemaker does at startup to get the current status of all resources when the defaults are not desirable.

timeout - If the operation does not complete in the amount of time set by this parameter, it's aborted and considered as failed. The default value is the cluster's operation timeout default (if one has been configured), or 20 seconds otherwise. If you find that your system includes a resource that requires more time than the system allows to perform an operation (such as start, stop, or monitor), investigate the cause and, if the lengthy execution time is expected, you can increase this value.

The timeout value is not a delay of any kind, nor does the cluster wait the entire timeout period if the operation returns before the timeout period has completed.

on-fail - The action to take if this action ever fails.

Allowed values:

○ ignore - Pretend the resource did not fail.

○ block - Do not perform any further operations on the resource.

○ stop - Stop the resource and do not start it elsewhere.

○ restart - Stop the resource and start it again (possibly on a different node).

○ fence - STONITH the node on which the resource failed.

○ standby - Move all resources away from the node on which the resource failed.

primitive haproxy-resource ocf:heartbeat:haproxy op monitor interval=20 timeout=60 on-fail=restart

The explanation is the same as above.

colocation loc inf: virtual-ip-resource haproxy-resource

colocation constraints allow you to tell the cluster how resources depend on each other. They have an important side effect: they affect the order in which resources are assigned to a node.

Think about it for a moment: the cluster can't colocate A with B unless it knows where B is located. For this reason, when creating colocation constraints, it's really important to consider whether A needs to be colocated with B, or B with A.

In our case, since the haproxy-resource should be colocated with the virtual-ip-resource, the haproxy-resource will be allocated on the same node where the virtual-ip-resource is.

order ord inf: virtual-ip-resource haproxy-resource

The order constraints tell the cluster the order in which resources should be allocated. In this case, we are informing that the virtual-ip-resource should always be allocated before the haproxy-resource.

Ordering constraints affect only the order in which resources are created. They do not cause the resources to be colocated on the same node.
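
Both constraints can be reviewed (or removed, while experimenting) from crmsh. A sketch, using the ids we chose above:

debian@hapx-node01:~$ sudo crm configure show loc ord

If you ever need to drop them, for example to re-test placement, sudo crm configure delete loc ord removes both by id.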

Understanding the user-data file (TL;DR)

The cloud-init HAProxy configuration file can be found here. It sets up a load balancer for the Kube Master nodes.

Below you can find the same file commented for easier understanding:

View HAProxy stats page

Now that everything is set up, you can access the HAProxy stats through the Virtual IP we just configured.

Open your browser at http://192.168.4.20:32700

User: admin

Password: admin
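
You can also probe the stats endpoint from the BusyBox, or from any machine that reaches the Virtual IP, assuming curl is available:

debian@busybox:~$ curl -s -u admin:admin -o /dev/null -w '%{http_code}\n' http://192.168.4.20:32700

A 200 response code means HAProxy is answering on the Virtual IP with the admin/admin credentials configured via cloud-init.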

It will show the HAProxy stats page. Notice that all Control Plane EndPoints are DOWN:

kube-mast01:6443

kube-mast02:6443

kube-mast03:6443

This will be fixed once we set up our Kubernetes Master nodes.

Test High Availability