What do you mean by "Consumer"?
Is it a Camel JMS consumer or a JMS client?
Can you explain what you mean by "5 bos.com broker gets exceptions that rdu2.com is down (as expected)"?
It looks like the exception is thrown by the broker and the Camel router doesn't know anything about it.
"Consumer is reconnected to rdu2.com but NO messages received. "
Do you mean that the rdu2.com broker doesn't receive any messages?
For both the consumer and the producer I use "external-mq-fabric-client" from Fuse by Examples, but connecting with failover() instead of discovery().
I'm attaching karaf.log from the BOS broker instance (Camel and the broker run in the same container). I'm also attaching some Camel output from the karaf CLI.
The broker receives the messages but can't route them to the consumer. "Exchanges Total" and "Exchanges Failed" are both increasing.
In this example I was initially connected to broker RDU2 and received a few messages. I then shut down rdu2; the client reconnected to broker bos but received no messages. The producer was sending messages for the whole duration of the test.
I made the logic a bit simpler. Now I have two connection factories: the first is for the DMZ fabric and the second is for both RDU2 and BOS (again with failover()). No Camel load balancing.
Here is the simple test case, which works perfectly:
All rdu2 brokers are started.
1. Client connects to rdu2 host 1 (current master).
2. Publisher pushes messages to the dmz fabric; the client on the rdu2 broker receives them.
3. Stop publisher.
4. Stop rdu2 host 1; the client detects this and reconnects to rdu2 host 2.
5. Start publisher; the client receives messages again.
This means basic failover works: Camel routes correctly to the next elected master.
However, if I run the very same test without stopping the publisher, I get the JMS exceptions attached above:
2014-11-25 18:14:52,240 | ERROR | rg.jboss.issues] | DefaultErrorHandler | rg.apache.camel.util.CamelLogger 215 | 136 - org.apache.camel.camel-core - 2.12.0.redhat-611412 | Failed delivery for (MessageId: ID:zaska-54402-1416939267471-1:1:1:1:20 on ExchangeId: ID-fuse-fabric-02-rdu2-41792-1416939222035-0-8). Exhausted after delivery attempt: 1 caught: org.springframework.jms.IllegalStateException: javax.jms.JMSException: Stopped.; nested exception is javax.jms.IllegalStateException: javax.jms.JMSException: Stopped.
Does this "Stopped" mean that the broker is down or that the Camel route is down? It looks like it's somehow stuck and doesn't want to reconnect.
Do I need to add some additional logic for error handling and a Dead Letter Channel/queue?
I got it working by putting all brokers in fabric 2 on one connection factory with failover() and adding initialReconnectDelay=2000&timeout=5000.
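For anyone landing here later, this is roughly what the resulting broker URL looks like. It is only a sketch: the host names and port below are hypothetical placeholders (the real ones are not given in this thread), and the helper class is made up for illustration. The only parts taken from my setup are the failover() wrapper and the two options. As far as I understand the failover transport, initialReconnectDelay is the wait in milliseconds before the first reconnect attempt, and timeout bounds how long a send blocks while the transport is reconnecting.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative helper (not part of ActiveMQ or Camel): it only assembles the URI string.
final class FailoverUriSketch {

    // Joins broker URIs into failover:(a,b,...) and appends the options as a query string.
    static String failoverUri(List<String> brokers, Map<String, String> options) {
        String hosts = String.join(",", brokers);
        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "failover:(" + hosts + ")" + (query.isEmpty() ? "" : "?" + query);
    }

    public static void main(String[] args) {
        // Hypothetical host names and port; substitute the brokers of your own fabric.
        List<String> brokers = Arrays.asList(
                "tcp://rdu2-host1:61616",
                "tcp://rdu2-host2:61616",
                "tcp://bos-host1:61616");

        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("initialReconnectDelay", "2000"); // value from this thread
        options.put("timeout", "5000");               // value from this thread

        System.out.println(failoverUri(brokers, options));
    }
}
```

The printed string is what goes into the brokerURL property of the single connection factory that the Camel JMS/ActiveMQ component uses.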