40 Replies. Latest reply on Dec 5, 2011 10:25 AM by sv_srinivaas
      • 30. Re: Client to access cluster (possibly hitting backup server)
        clebert.suconic

I couldn't really understand what's going on, so it's a bit hard to tell you how to fix it.

         

        Can you simplify your scenario? Start with the version you are using.

        • 31. Re: Client to access cluster (possibly hitting backup server)
          sv_srinivaas

          Clebert,

           

In my case the issue is that the MDB is not failing over to the backup server. I'm using HornetQ 2.2.5.Final on JBoss 5.1.0; the OS is Windows XP. I have configured 3 servers: one live JMS node, one backup, and a third node where the MDB is deployed. The MDB consumes messages from the live node, and when the live node goes down it should consume from the backup, but it does not.

           

Everything works fine as long as the live node is up and running. Once I kill the live node, the backup server starts as expected, but the MDB fails to switch over to the backup node, and I see the exception below thrown on the MDB server.

           

          "Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]"

           

I have already sent the config files for all 3 nodes. Please let me know how to configure the ra.xml for the MDB to connect to the remote live and backup nodes.

           

          Thanks

          Srinivaas

          • 32. Re: Client to access cluster (possibly hitting backup server)
            clebert.suconic

There's a setting you have to make on the MDB's connection factory: set ha=true.
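As a minimal sketch, that setting would go into the HornetQ resource adapter's ra.xml as a config property (the property name HA is assumed from the 2.2.x adapter; check the jms-ra.rar shipped with your distribution):

```xml
<!-- sketch: enabling HA on the HornetQ resource adapter (ra.xml) -->
<config-property>
  <description>Whether the connection factory supports HA</description>
  <config-property-name>HA</config-property-name>
  <config-property-type>java.lang.Boolean</config-property-type>
  <config-property-value>true</config-property-value>
</config-property>
```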

             

I have asked Andy Taylor to provide some specific configs (in case you can't find them). He's in the UK though, so it's already late for him.

             

I couldn't understand your configs (especially your ha.xml). Could you attach the files instead of pasting the text? (Switch to the Advanced Editor on the webpage.)

            • 33. Re: Client to access cluster (possibly hitting backup server)
              sv_srinivaas

Clebert, thanks for your time. I've already sent all the XML files as attachments on Nov 24th as a zip file. I've also set ha=true in the ra.xml for the MDB connection factory. Please let me know if you need any other details regarding this.

              • 34. Re: Client to access cluster (possibly hitting backup server)
                ataylor

Can you confirm that the backup is announcing itself correctly? I.e., do you see "backup announced" in the server logs?

                • 35. Re: Client to access cluster (possibly hitting backup server)
                  ataylor

I can also see that you have both discovery and static connectors configured in the ra.xml file; you should use one or the other.
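As a sketch (property names assumed from the 2.2.x HornetQ resource adapter; addresses and ports are placeholders), the two styles look like this, and only one of them should be present in ra.xml:

```xml
<!-- Option A: UDP discovery (sketch) -->
<config-property>
  <config-property-name>DiscoveryAddress</config-property-name>
  <config-property-type>java.lang.String</config-property-type>
  <config-property-value>231.7.7.7</config-property-value>
</config-property>
<config-property>
  <config-property-name>DiscoveryPort</config-property-name>
  <config-property-type>java.lang.Integer</config-property-type>
  <config-property-value>9876</config-property-value>
</config-property>

<!-- Option B: static connector (sketch); remove the discovery properties above -->
<config-property>
  <config-property-name>ConnectorClassName</config-property-name>
  <config-property-type>java.lang.String</config-property-type>
  <config-property-value>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</config-property-value>
</config-property>
<config-property>
  <config-property-name>ConnectionParameters</config-property-name>
  <config-property-type>java.lang.String</config-property-type>
  <config-property-value>host=live-host;port=5445</config-property-value>
</config-property>
```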

                  • 36. Re: Client to access cluster (possibly hitting backup server)
                    clebert.suconic

I just realized you are using z:/ as your journal folder? What is that? NFS? Shared storage?

                     

                    So, an important bit of information we didn't have until now is that you are using Windows.

                     

                     

An important factor is that the shared storage must support distributed locking from Java; otherwise the system will not distribute locks, which will cause you issues.

                     

In general we have been suggesting Linux / GFS2 (which is what we have tested so far). However, I believe there are systems on Solaris where this would work fine.

                     

                     

I'm not sure Windows will guarantee you proper locking over NFS, which could lead to both servers starting and creating the issue you're seeing. Check whether the backup holds itself back until the live server is killed.
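For reference, a shared-store backup in HornetQ 2.2 is configured roughly along these lines in hornetq-configuration.xml (the directory paths are placeholders; the key point is that live and backup must point at the same directories, on storage that supports proper file locking):

```xml
<!-- backup node, hornetq-configuration.xml (sketch) -->
<backup>true</backup>
<shared-store>true</shared-store>
<!-- live and backup must point at the SAME directories on the shared storage -->
<journal-directory>/mnt/shared/journal</journal-directory>
<bindings-directory>/mnt/shared/bindings</bindings-directory>
<large-messages-directory>/mnt/shared/large-messages</large-messages-directory>
<paging-directory>/mnt/shared/paging</paging-directory>
```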

                    • 37. Re: Client to access cluster (possibly hitting backup server)
                      underscore_dot

Hi, Clebert.

                       

My case is very similar to the one described by Srinivaas, though I'm not using MDBs; I'm just running standalone.

                      I'm running HornetQ 2.2.5 Final on CentOS with AIO/Ext3.

                       

                      I'm testing HA, so I have a live and a backup node.

If I shut the live node down while a producer is sending messages to a destination, the producer gets the following exception:

                      "Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]".

And even though the backup node announces failover with "backup announced", the consumer no longer receives messages.

                       

                      My configs are inside this zip file: http://community.jboss.org/servlet/JiveServlet/downloadBody/17363-102-1-126129/ha-failover-failback-test.zip

Both producer and consumer directly/locally instantiate a connection factory as a Spring bean (not through JNDI - check my previous post).

                       

                      Many thanks for your help.

                      • 38. Re: Client to access cluster (possibly hitting backup server)
                        ataylor

Use a non-Spring client to check whether your config works; usually you'll find that the error is with Spring rather than HornetQ.

                         

One thing (and I'm no Spring expert): if you aren't using JNDI to look up the connection factory, then how do you set all the attributes needed, HA=true etc.?

                        • 39. Re: Client to access cluster (possibly hitting backup server)
                          sv_srinivaas

                          Clebert/ Andy,

                           

Thanks for your time. It took me a while to go through the documentation and do some testing. At last I was able to get failover to work with MDBs, but it only works the first time and not after that. I have checked this for the last couple of days, and this is what happens:

                           

                           

                          1. Cleared all the tmp folders including the shared folder and lock files

                          2. Started live, backup and MDB (backup was able to announce itself) for the first time.

3. Sent messages using MDBRemoteFailoverStaticClientExample.java; the MDB consumed messages from the live node and also sent reply messages to it.

4. On stopping the live node, the backup node started, and the MDBs failed over to it and were able to consume and also send reply messages as expected.

                           

                           

Following this, I thought of repeating this test a few times to check for consistency, and I did the tasks below.

                           

                           

1. Stopped all the nodes (live, backup and MDB), but I had issues stopping the MDB node and hence had to kill it.

                          2. Deleted all the files in the shared store including the lock file

3. Restarted the live node and got an exception:

164 ERROR [STDERR] java.lang.IllegalArgumentException: Error instantiating connector factory "org.hornetq.integration.transports.netty.NettyConnectorFactory"

                           

4. Started the backup node and got an "unable to announce backup" exception.

                           

                           

After this stage, however many times I try, I get the same error on the live and backup nodes, and ultimately the MDB fails to fail over. This behavior has been consistent for the last couple of days: everything works the first time, and after that it doesn't.

                           

                           

                          Answers to some of your questions.

"I just realized you are using z:/ as your journal folder? What is that? NFS? Shared Storage?"

It is only NFS.

                           

Also, if I use only static connectors I don't see the failover happen; it only happens with discovery. That said, I have not tried static connectors on a first run, when things work as expected.

                           

                           

Another point to note: previously I had the MDB node (in the default-with-hornetq configuration) and my backup node (in all-with-hornetq) on the same physical box (when failover of the MDBs did not happen). Now I've deployed the MDB on a separate physical box, after which the MDBs fail over to the backup, but again only the first time and not always.

                           

                           

Attached are the configuration files for all the nodes and the log files for the live and backup nodes (from when they failed to start). Thanks.

                          • 40. Re: Client to access cluster (possibly hitting backup server)
                            sv_srinivaas

Hi,

                             

Now I'm able to make the MDB fail over to the backup node (at least 8 out of 10 times). I did not change any of the XMLs; instead, I just stopped deleting the temp and lock files after every run, and now it works fine most of the time.

                             

I just have one question regarding this. Now the MDB fails over to the backup node only if I kill the live node in between sending messages, while a connection is kept open, as in the MDBRemoteFailoverStaticClientExample.java shipped with the distro.

                             

                            As per this code

                             

                            1. Program sends a message

2. Waits for user input (this is when I kill the live node and the MDBs fail over as expected), then I press a key

3. Sends another message, this time to the backup node, and the MDB picks it up from the backup node

                             

So far it works fine, but if I don't kill the live node in between, as below:

                             

                            1. Program sends a message

2. Waits for user input; I just press a key (and DO NOT kill the live node now)

3. Sends another message to the same live node, and the sender program terminates

                            4. KILL LIVE NODE

5. The backup starts now, but the MDBs don't always fail over (about 1 out of 5 times they fail over to the backup, but not otherwise)

6. Run the sender again; messages are sent to the backup node, but no MDBs are available to process them.

                             

Is it not possible to make the MDBs fail over to the backup node irrespective of whether an open connection to the live node exists during failover?

                            Thanks!
