In the recent posts of this series we talked about many different aspects of clustering for the JBoss AS 7 and its quality assured version EAP 6, such as:

Until now, there is one important thing we have not covered yet: clustering of the messaging subsystem. The EAP 6 as well as the AS 7 uses HornetQ as default messaging provider. In this post we want to give an overview about the clustering abilities of HornetQ and explain how to use the various clustering features in combination with the EAP 6 or respectively the JBoss AS 7. We implemented a simple JMS client application to demonstrate the HornetQ clustering abilities.

Overview

As already explained in a previous post about cluster topologies, clustering can mean many different things depending on what you want to achieve. One important thing to achieve is bare computing power i. e. here a higher message throughput. You want to be able to receive and deliver more messages during the same time frame. Another thing is high availability and failover.

Higher message throughput

A higher message throughput is meant with the wording clustering in HornetQ. In a HornetQ cluster all newly received messages can distributed across the whole cluster. A server that receives a message distributes this message by default round robin to itself or to those servers which are connected by a cluster connection. The illustration below shows a possible topology of a symmetric cluster with server-side message load balancing.

Load balancing can be configured to be a little bit more intelligent, so that messages will only be redirected to servers with corresponding consumers. But that does not completely solve the problem: What if the consumers disconnect while the message is on its way to the server? That is the case message redistribution was invented for. If enabled, the message will then, after a configurable timeout, be redirected to another server with consumers for that message.

Another approach which avoids network overhead is client-side load balancing during the creation of a new connection via the JMS connection factory.

High Availability

A second thing to achieve could be high availability. High availability does not mean that we do not loose any messages. HornetQ guarantees the delivery of durable JMS messages. The data, such as messages or destinations are stored in a file-based journal. What we want to achieve is that producers can still send messages and the consumers will still receive messages if one HornetQ server crashes.

High availability in HornetQ is realized through a live-backup structure, as shown in the illustration below. There is always one live server that processes the current message load. A live server can be backed by a second server. This backup server will not process any messages before the live server crashes and failover occured.

A backup server can also be backed with as many servers as you want. Those servers are waiting to become a backup server. If the live server crashes, the backup will become live and one of the awaiting-to-become-backup servers will become the passive backup of the new live server. When the previous live server is restarted it will be awaiting to become backup or become backup if there is none. To negotiate the roles with the other servers, every node of the ha-cluster will need a cluster-connection to all the other nodes. So you will need to enable clustering even if you are not using the clustering capabilities in order to achieve a higher throughput.

Only the live server processes messages. When the live server dies, we would expect, that the backup does not only receive new messages, but also delivers the messages, which are already received by the previous live server. The backup knows about the messages by a shared journal. In the upcoming version 2.3 of HornetQ the journal can be shared via replication over the network. However, for now a shared file-system-directory is required for failover. For performance reasons it is recommended to use a block-based protocol like Fibre Channel or HyperSCSI to share the journal directory, instead of a file-based protocol like NFS or SMB/CIFS. According to several threads as this, NFS is not supported or tested and might lead to problems. There are plenty SAN solutions provided by different big companies that will do a good job here. For the demo in this post we will simply put everything on the same machine.

We get a highly available messaging cluster topology with the live-backup structure. But how is a failover of the producers or consumers handled between the live and backup? The participants of the cluster, including live and backup server, send information about their connection details by using ip multicasts. If ip multicasts can not be used it is also possible to use a static configuration of initial connections. After an initial connection, the client will also be informed about the topology. If the current connection is stale, the client establishes an new connection to another node.

Setting up our HornetQ cluster-environment

Now that we know the basics, we want to show how the two clustering mechanisms explained above can be configured. We use the EAP 6 version and setup a cluster in standalone mode. You could also use the AS 7 and built the JBoss AS 7.1.2 tag by yourself.

Setting up multiple HornetQ servers

Let us start with two fresh installations of the application server. As already mentioned, we will use the standalone mode with the standalone-full-ha.xml configuration. The configuration is already prepared for clustering, including the HornetQ configuration for server-side load-balancing between multiple HornetQ servers.

The most important configuration is that the messaging system is configured to be clustered, as shown in the following listing.

<subsystem xmlns="urn:jboss:domain:messaging:1.3"> <hornetq-server> <clustered>true</clustered> <cluster-user>clusteruser</cluster-user> <cluster-password>cluster-secret</cluster-password> <!-- ... --> </hornetq-server> </subsystem>

As you already guessed all cluster-nodes need to be configured with the same cluster-user and cluster-password. The cluster user is by default HORNETQ.CLUSTER.ADMIN.USER.

The next configuration is how the nodes of the cluster can connect to each other. In the deafult configuration, the servers use ip multicasts to discover other nodes. Each server propagates information about server connectors to other servers. The way, how a server broadcasts the connector is configured via the broadcast-group, as shown in the following listing.

<subsystem xmlns="urn:jboss:domain:messaging:1.3"> <hornetq-server> <broadcast-groups> <broadcast-group name="bg-group1"> <socket-binding>messaging-group</socket-binding> <broadcast-period>5000</broadcast-period> <connector-ref>netty</connector-ref> </broadcast-group> </broadcast-groups> <!-- ... --> </hornetq-server> </subsystem>

The server receives the connector information of other nodes over the discovery group. This discovery group will be referenced by the cluster-connection to establish connections to other servers of the group. The following listing contains the default configuration:

<subsystem xmlns="urn:jboss:domain:messaging:1.3"> <hornetq-server> <discovery-groups> <discovery-group name="dg-group1"> <socket-binding>messaging-group</socket-binding> <refresh-timeout>10000</refresh-timeout> </discovery-group> </discovery-groups> <!-- ... --> </hornetq-server> </subsystem>

Now, with those connections, the load can be server-side distributed between the cluster nodes. There are many other configuration parameters for cluster-connections, such as reconnect-attempts or max-hops in a chain-topology. For our cluster, the defaults are sufficient and we can now start our server nodes.

Configuration for high availability

Let us setup a backup server for each cluster node, before we try our cluster out. We start again with two fresh installations of the application server. Both live and backup will need to have clustering enabled as explained in the previous subsection.

High availability in HornetQ requires a shared journal. To keep it easy we will not be using a SAN for the journal (this will not make a difference in the configuration). Instead the live server and its backup will be running on the same machine and share a directory.

The first configuration makes the journal persistent and tells the server that it will share the journal directory with other servers as shown in the listing below:

<subsystem xmlns="urn:jboss:domain:messaging:1.3"> <hornetq-server> <!-- ... --> <!-- first of all we want to use a journal on disk (this is important) --> <persistence-enabled>true</persistence-enabled> <!-- the journal will be shared by multiple servers --> <shared-store>true</shared-store> <!-- a directories which can be accessed by both, live and backup --> <journal-directory path="path/to/journal" relative-to="user.home"/> <bindings-directory path="path/to/bindings" relative-to="user.home"/> <large-messages-directory path="path/to/large-message" relative-to="user.home"/> <paging-directory path="path/to/paging" relative-to="user.home"/> <journal-file-size>102400</journal-file-size><!-- you may tune this --> <journal-min-files>2</journal-min-files><!-- you may tune this --> <!-- When we shut down the live, we want failover to kick in. --> <failover-on-shutdown>true</failover-on-shutdown> <!-- ... --> </hornetq-server> </subsystem>

Note that the configuration is the same for live and backup. The configuration contains the details about the journal, such as the location. The last element enables failover on server shutdown. Normally there would not be failover if you manually shut the live down.

As a next step, it is necessary to decide which server should be the backup server. The configuration for a backup is the same as for an “awaiting-to-become-backup”, as shown in the listing below. Note: A server that has been marked as the backup needs the live server for successful start-up.

<subsystem xmlns="urn:jboss:domain:messaging:1.3"> <hornetq-server> <backup>true</backup><!-- true for backup, false for live --> <!-- ... --> </hornetq-server> </subsystem>

In order that the client can failover to the backup, without another lookup of the ConnectionFactory, it is necessary that the initial ConnectionFactory knows the other cluster nodes. Therefore, the ConnectionFactory needs to reference the discovery group and it must be marked with ha, as shown in the listing below:

<subsystem xmlns="urn:jboss:domain:messaging:1.3"> <hornetq-server> <!-- ... --> <jms-connection-factories> <connection-factory name="RemoteConnectionFactory"> <discovery-group-ref discovery-group-name="dg-group1"/><!-- This only works for clients that can be reached by multicasts from the servers --> <entries> <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/> </entries> <ha>true</ha><!-- important for automatic client failover --> <reconnect-attempts>-1</reconnect-attempts> </connection-factory> </jms-connection-factories> <!-- ... --> </hornetq-server> </subsystem>

By default the connection does not try to reconnect automatically. This can be configured by the reconnect-attempts element. In the example above, the value -1 is used for infinite retries. Maybe not the best configuration for production.

Try out the cluster with the demo application

We provide a little demo application in the jms-cluster-example directory of our github project jbosscc-as7-examples . The server-side application features a message-driven-bean that subscribes a queue. The payload of received messages is written to the console of the application server that processes the message. The client application contains a message producer for the queue.

The repository also contains the configuration for a cluster with two lives and their two backups. The configuration has been tested with the EAP 6 but should also work well with the JBoss AS 7.1.2. Simply setup four fresh instances and copy the contents of the standalone/configuration/ folders into your JBoss installation directories. Start the instances on the same machine with the following commands:

live1 : ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jl1 backup1: ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jbu1 live2 : ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jl2 backup2: ./standalone.sh -c standalone-full-ha.xml -Djboss.node.name=jbu2

Note: They will bind to 127.0.0.1 other IP adresses may not work with the provided configurations.

The demo application can be compiled via mvn package . After the servers have been started and the message driven bean has been deployed on the live servers, you can use the client application to produce messages. Simply start it with mvn exec:exec .

The client connects to the first live server live1 and starts to produce messages. You can observe load-balancing by console-output of the message driven bean. The messages will distributed between the live servers. When you kill the live servers or just one of them you can observe failover, which is accompanied with some harmless warnings. If you then start the lives or just one live again, you can see its message driven bean not only consuming new messages but also messages the backup received when the server was down. A remote consumer would receive messages even if the live is down. However, in our example the message driven bean is deployed on the live server and it will also die, if the jvm process of the live server is killed.

Summary

The clustering mechanisms described in this post are very powerful but can also become very complex. They are stable but you can easily make your system unreliable if you configure it wrong or do things like forgetting to close sessions. So it needs to be handled with care.

If it is finished early enough, HornetQ 2.3 will be included into EAP 6.1. That would mean you could share the journal without a SAN. Before that we have to use a SAN for high availability. But you could also use a “higher-message-throughput” cluster without any live-backup pairs but with an ha-enabled connection-factory. Clients will always be able to send and receive messages but some single messages might could get stuck in the journal of a crashed server. These messages would not be delivered right away.

The next and last post of this blogpost series will be about JGroups and Cloud issues. Thank you for your feedback on the last posts and if you have got any questions or comments to this post feel free to comment or write an e-mail to

heinz.wilming (at) akquinet.de

immanuel.sims (at) akquinet.de