11 Replies Latest reply on Apr 2, 2014 7:30 AM by Jesus Gabriel y Galan

    Client code to switch servers on failure

    Jesus Gabriel y Galan Newbie

      Hi,

       

      I would like to do the following, and I'm not sure if it's possible or how: I would like to have 2 standalone servers, not in a master/backup pair or a cluster (I mean, I don't want to redistribute messages), to which clients can connect with the Core API. When one of the servers fails, clients should connect to the other and continue operating normally.

       

      My first try was to not configure anything special in the servers, so a standalone normal configuration. In the client I do:

       

        

      Map<String, Object> props = new HashMap<String, Object>();
      props.put("host", "localhost");
      props.put("port", "5445");
      TransportConfiguration host1 = new TransportConfiguration(NettyConnectorFactory.class.getName(), props);

      Map<String, Object> props2 = new HashMap<String, Object>();
      props2.put("host", "localhost");
      props2.put("port", "6445");
      TransportConfiguration host2 = new TransportConfiguration(NettyConnectorFactory.class.getName(), props2);

      ServerLocator serverLocator = HornetQClient.createServerLocatorWithHA(host1, host2);

      ClientSessionFactory sf = null;
      ClientSession session = null;
      ClientProducer producer = null;

      try {
          sf = serverLocator.createSessionFactory();
          System.out.println("Creating session factory: " + sf);
          session = sf.createSession();
          System.out.println("Creating session: " + session);
          producer = session.createProducer("testaddress");
          for (int i = 0; i < 100; i++) {
              ClientMessage message = session.createMessage(Message.TEXT_TYPE, true);
              message.putStringProperty("type", "testLog");
              message.putStringProperty("notification", "logtest");
              message.getBodyBuffer().writeNullableSimpleString(new SimpleString("test from java"));
              producer.send(message);
              System.out.println("Message " + i + " sent.");
              Thread.sleep(5000);
          }
      } finally {
          if (producer != null) producer.close();
          if (session != null) session.close();
          if (sf != null) sf.close();
          serverLocator.close();
      }

        • 1. Re: Client code to switch servers on failure
          Jesus Gabriel y Galan Newbie

          It seems that my question was cut:

           

          With that code I start sending messages, then stop one server, and the client fails with a connection error. I would have expected createServerLocatorWithHA to fail over to the other host when the first one failed. Is there anything special I need to do to achieve this? As I said above, I want both servers to be online (so no master/backup), and I don't need any redistribution of messages between servers (so no need for clustering).

           

          Thanks,

           

          Jesus.

          • 2. Re: Client code to switch servers on failure
            Andy Taylor Master

            for HA you need a cluster with live/backup pairs. take a look at the HA chapter in the HornetQ docs

            • 3. Re: Client code to switch servers on failure
              Justin Bertram Master

              If you don't want any hardware to be idle then look at using a "colocated" topology where you run a live and backup instance on the same physical server.

              • 4. Re: Client code to switch servers on failure
                Jesus Gabriel y Galan Newbie

                Thanks. Based on your answers I am trying to implement manual, application-level failover based on a SessionFailureListener. Is there any specific recommendation for that? I am planning on attaching a SessionFailureListener to the session. Then, on a connectionFailed event, close all resources up to the SessionFactory and start over, creating the SessionFactory, Session and Producer against the other host.
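                A minimal sketch of the host-switching piece of that plan, kept independent of the HornetQ API so it stands on its own (the class and method names here are hypothetical, not part of HornetQ): the connectionFailed callback of the SessionFailureListener would call next() and then rebuild the ClientSessionFactory, ClientSession and ClientProducer against the returned host.

```java
import java.util.List;

// Hypothetical helper, not part of the HornetQ API: keeps a list of
// candidate hosts and advances to the next one each time a connection
// to the current host fails.
class HostRotation {
    private final List<String> hosts;
    private int current = 0;

    HostRotation(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.hosts = hosts;
    }

    /** Host the client should currently be connected to. */
    String current() {
        return hosts.get(current);
    }

    /** Advance to the next candidate after a failure (wraps around). */
    String next() {
        current = (current + 1) % hosts.size();
        return hosts.get(current);
    }
}
```

On a connectionFailed event the listener would call next(), then recreate the locator/factory/session/producer against that host, exactly as described above.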

                 

                Thanks,

                 

                Jesus.

                • 5. Re: Client code to switch servers on failure
                  Justin Bertram Master

                  My general recommendation would be to save yourself the trouble and just use a colocated topology.  This is especially simple in Wildfly because it ships with an example configuration for this.

                  • 6. Re: Client code to switch servers on failure
                    Jesus Gabriel y Galan Newbie

                    Sorry, yesterday I answered in IRC, but then I had to leave. My comment was that, if I understand correctly, a colocated topology involves having the master and the backup in the same server. I'm not sure this is very advantageous for my use case, because I want to protect against servers failing, restarting, etc. On the other hand, it's not important to fail over the messages that are already queued and persisted; I can wait until the server is fixed or restarted. The important thing is that the producers keep producing. I would prefer to avoid the overhead of syncing the master with the backup (network traffic, as I can't afford a SAN shared disk). A simple master/backup on two machines would suffice, but I'd rather use the machines as two master instances that can take the clients of the other in case of failure.

                     

                    Is there a better way to achieve this than manually failing over in the client in case of a connection failure?

                     

                    Thanks,

                     

                    Jesus.

                    • 7. Re: Client code to switch servers on failure
                      Andy Taylor Master

                      > Sorry, yesterday I answered in IRC, but then I had to leave. My comment was that, if I understand correctly, a colocated topology involves having the master and the backup in the same server. I'm not sure this is very advantageous for my use case, because I want to protect against servers failing, restarting, etc. On the other hand, it's not important to fail over the messages that are already queued and persisted; I can wait until the server is fixed or restarted. The important thing is that the producers keep producing. I would prefer to avoid the overhead of syncing the master with the backup (network traffic, as I can't afford a SAN shared disk). A simple master/backup on two machines would suffice, but I'd rather use the machines as two master instances that can take the clients of the other in case of failure.

                      You either have to have network traffic by using replication, or use a shared disk.

                       

                      Regarding servers crashing: the point is that the backup for each live server is colocated on another VM.

                      • 8. Re: Client code to switch servers on failure
                        Justin Bertram Master

                        > My comment was that, if I understand correctly, a colocated topology involves having the master and the backup in the same server. I'm not sure this is very advantageous for my use case, because I want to protect against servers failing, restarting, etc.

                        In a colocated topology you do indeed have a live and a backup on the same physical server, but the backup which is colocated with the live is not backing up that particular live instance.  It is backing up the live instance on the other physical server.

                         

                        > The important thing is that the producers keep producing.

                        We don't currently support live-to-live connection failover.  Therefore, the only way to get the functionality you're looking for is with a live/backup pair.

                         

                        > A simple master/backup in two machines would suffice, but I'd rather use the machines as two master instances that can take the clients of the other in case of failure.

                        If you have 2 physical servers and you want both to be live but you still want the ability to fail-over then you need to use a colocated topology as I've already mentioned.  In this case you'd have 4 instances of HornetQ running, 2 on each physical server, 1 live and 1 backup.  Each backup would be servicing the live server on the other physical server.

                         

                        > Is there a better way to achieve this than manually failing over in the client in case of a connection failure?

                        Yes.  Use a colocated topology.

                        • 9. Re: Client code to switch servers on failure
                          Jesus Gabriel y Galan Newbie

                          > Therefore, the only way to get the functionality you're looking for is with a live/backup pair.

                           

                          Hi, sorry, I couldn't continue with this topic for some days. Now I'm back to it, and I'm trying to set up the simple master/backup pair, as you said. With the code above, though, I launch the client and it starts sending messages. Then I kill the master; in the backup logs I see that the failover is successful, but the client fails with a connection error. Am I doing something wrong? Here are the relevant configuration sections of the servers:

                           

                          Master:

                                  <shared-store>false</shared-store>
                                  <backup>false</backup>

                           

                             <broadcast-groups>
                                <broadcast-group name="bg-group1">
                                   <group-address>231.7.7.7</group-address>
                                   <group-port>9876</group-port>
                                   <broadcast-period>5000</broadcast-period>
                                   <connector-ref>netty</connector-ref>
                                </broadcast-group>
                             </broadcast-groups>

                           

                             <discovery-groups>
                                <discovery-group name="dg-group1">
                                   <group-address>231.7.7.7</group-address>
                                   <group-port>9876</group-port>
                                   <refresh-timeout>10000</refresh-timeout>
                                </discovery-group>
                             </discovery-groups>
                            
                             <cluster-connections>
                                <cluster-connection name="my-cluster">
                                   <address>jms</address> 
                                   <connector-ref>netty</connector-ref>
                                   <discovery-group-ref discovery-group-name="dg-group1"/>
                                </cluster-connection>
                             </cluster-connections>

                           

                          Backup:

                                  <shared-store>false</shared-store>
                                  <backup>true</backup>

                           

                             <broadcast-groups>
                                <broadcast-group name="bg-group1">
                                   <group-address>231.7.7.7</group-address>
                                   <group-port>9876</group-port>
                                   <broadcast-period>5000</broadcast-period>
                                   <connector-ref>netty</connector-ref>
                                </broadcast-group>
                             </broadcast-groups>

                           

                             <discovery-groups>
                                <discovery-group name="dg-group1">
                                   <group-address>231.7.7.7</group-address>
                                   <group-port>9876</group-port>
                                   <refresh-timeout>10000</refresh-timeout>
                                </discovery-group>
                             </discovery-groups>

                           

                             <cluster-connections>
                                <cluster-connection name="my-cluster">
                                   <address>jms</address>
                                   <connector-ref>netty</connector-ref>
                                   <discovery-group-ref discovery-group-name="dg-group1"/>
                                </cluster-connection>
                             </cluster-connections>

                           

                           

                          When I launch the servers, the master synchronizes with the backup, which prints:

                           

                          10:46:46,656 INFO  [org.hornetq.core.server] HQ221109: HornetQ Backup Server version 2.5.0.SNAPSHOT (Wild Hornet, 124) [null] started, waiting live to fail before it gets active
                          10:46:53,245 INFO  [org.hornetq.core.server] HQ221024: Backup server HornetQServerImpl::serverUUID=68593b6d-7144-11e3-9268-fd18f3a87c58 is synchronized with live-server.
                          10:46:55,585 INFO  [org.hornetq.core.server] HQ221031: backup announced

                           

                           

                          I kill the master (ctrl+c) and I see this in logs:

                          Master:

                          10:48:27,256 INFO  [org.hornetq.integration.bootstrap] HQ101001: Stopping HornetQ Server
                          10:48:27,270 WARN  [org.hornetq.core.server] HQ222015: LIVE IS STOPPING?!? message=STOP_CALLED enabled=true
                          10:48:27,270 WARN  [org.hornetq.core.server] HQ222015: LIVE IS STOPPING?!? message=STOP_CALLED true
                          10:48:27,307 WARN  [org.hornetq.core.server] HQ222113: On ManagementService stop, there are 2 unexpected registered MBeans: [core.acceptor.netty-throughput, core.acceptor.netty]
                          10:48:27,330 WARN  [org.hornetq.core.server] HQ222015: LIVE IS STOPPING?!? message=FAIL_OVER enabled=true
                          10:48:27,331 WARN  [org.hornetq.core.server] HQ222015: LIVE IS STOPPING?!? message=FAIL_OVER true
                          10:48:27,536 INFO  [org.hornetq.core.server] HQ221002: HornetQ Server version 2.5.0.SNAPSHOT (Wild Hornet, 124) [68593b6d-7144-11e3-9268-fd18f3a87c58] stopped

                           

                          Backup:

                           

                          10:48:27,293 WARN  [org.hornetq.core.client] HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
                          10:48:27,294 WARN  [org.hornetq.core.client] HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
                          10:48:27,370 WARN  [org.hornetq.core.client] HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
                          10:48:27,387 INFO  [org.hornetq.core.server] HQ221037: HornetQServerImpl::serverUUID=68593b6d-7144-11e3-9268-fd18f3a87c58 to become 'live'
                          10:48:27,403 WARN  [org.hornetq.core.client] HQ212004: Failed to connect to server.
                          10:48:29,999 INFO  [org.hornetq.core.server] HQ221003: trying to deploy queue test
                          10:48:30,202 INFO  [org.hornetq.core.server] HQ221003: trying to deploy queue jms.queue.DLQ
                          10:48:30,221 INFO  [org.hornetq.core.server] HQ221003: trying to deploy queue jms.queue.ExpiryQueue
                          10:48:30,294 INFO  [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 4.0.13.Final localhost:6445
                          10:48:30,297 INFO  [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 4.0.13.Final localhost:6455
                          10:48:30,300 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=68593b6d-7144-11e3-9268-fd18f3a87c58

                           

                          So it seems the failover worked, but the client doesn't reconnect to the failover server. The client, though, got the configuration of both servers when connecting:

                           

                          Map<String, Object> props = new HashMap<String, Object>();
                          props.put("host", "localhost");
                          props.put("port", "5445");
                          TransportConfiguration host1 = new TransportConfiguration(NettyConnectorFactory.class.getName(), props);
                          ServerLocator serverLocator = HornetQClient.createServerLocatorWithHA(host1);

                          ClientSessionFactory sf = null;
                          ClientSession session = null;
                          ClientProducer producer = null;

                          try {
                              sf = serverLocator.createSessionFactory();
                              session = sf.createSession();
                              producer = session.createProducer("testaddress");
                              for (int i = 0; i < 100; i++) {
                                  ClientMessage message = session.createMessage(Message.TEXT_TYPE, true);
                                  message.getBodyBuffer().writeNullableSimpleString(new SimpleString("test from java"));
                                  producer.send(message);
                                  Thread.sleep(5000);
                              }
                          } finally {
                              if (producer != null) producer.close();
                              if (session != null) session.close();
                              if (sf != null) sf.close();
                              serverLocator.close();
                          }

                           

                          The session factory shows both servers:

                           

                          Session factory created: ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=73ac8d30-ba43-11e3-be49-c1cad955a847, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=localhost], discoveryGroupConfiguration=null], connectorConfig=TransportConfiguration(name=73ac8d30-ba43-11e3-be49-c1cad955a847, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=localhost, backupConfig=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=6445&host=localhost]

                           

                          but:

                           

                          Apr 02, 2014 10:48:27 AM org.hornetq.core.protocol.core.impl.RemotingConnectionImpl fail
                          WARN: HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
                          [WARNING]
                          Caused by: HornetQObjectClosedException[errorType=OBJECT_CLOSED message=HQ119018: Producer is closed]
                              at org.hornetq.core.client.impl.ClientProducerImpl.checkClosed(ClientProducerImpl.java:346)
                              at org.hornetq.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:126)

                           

                           

                          I have checked the failover examples; the client code doesn't need anything special and the failover is transparent, but that's not what I'm experiencing. Can you help me understand how to make this work?

                           

                          Thanks,

                           

                          Jesus.

                          • 10. Re: Client code to switch servers on failure
                            Andy Taylor Master

                            serverLocator.setReconnectAttempts(-1) should do the trick, or any other non-zero number.

                             

                            there are a few other configurations you may want to play around with.
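                            Those settings include retry-interval, retry-interval-multiplier and max-retry-interval on the ServerLocator, which control how long the client waits between reconnect attempts. A plain-Java sketch of that schedule (an illustration of the documented semantics, not HornetQ code): each attempt waits retryInterval multiplied by multiplier raised to the attempt number, capped at maxRetryInterval, while reconnectAttempts bounds how many attempts are made (-1 meaning retry forever).

```java
// Sketch of the reconnect back-off schedule, not HornetQ code: the
// delay before attempt n is retryInterval * multiplier^n milliseconds,
// capped at maxRetryInterval.
class ReconnectSchedule {
    static long delayMillis(long retryInterval,
                            double multiplier,
                            long maxRetryInterval,
                            int attempt) {
        double delay = retryInterval * Math.pow(multiplier, attempt);
        return Math.min((long) delay, maxRetryInterval);
    }
}
```

For example, with a 2000 ms retry interval, a multiplier of 2.0 and a 30000 ms cap, the delays grow 2000, 4000, 8000, ... until they hit the cap.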

                            • 11. Re: Client code to switch servers on failure
                              Jesus Gabriel y Galan Newbie

                              2 minutes after posting the question I found the answer:

                               

                              I was missing the reconnect-attempts setting. After adding this to the client code:

                               

                              serverLocator.setReconnectAttempts(10);

                               

                              everything works as expected. I'll have to tune the value, good to know that -1 is unlimited retries.

                               

                              Thanks for all your answers and support,

                               

                              Jesus.