10 Replies Latest reply on Jul 7, 2009 11:25 AM by gaohoward

Clustered FailOver doesn't seem to work correctly with Bridge

artp Jul 1, 2009 7:30 PM

I have a cluster, which I'll call Cluster A, with a Bridge that moves messages from a Cluster A queue to a queue on another cluster (Cluster B). Cluster B contains two nodes (B0, B1). Initially, when I start all the nodes, all is well: messages are moved from Cluster A to one node (e.g. B0) in Cluster B.

When I take down B0, I get failover messages on B1 saying that failover is happening and then that it has completed. Naturally, I'd expect further messages from Cluster A to be processed by B1. Sometimes FailOver to B1 works, but most of the time it doesn't. When it doesn't, I see the MessageCount and DeliveryCount grow as I send more messages from Cluster A. The Bridge appears to be doing its job, since I can see the MessageCount in the JMX console. FailOver appears to be causing something to go wrong once messages get to B1. From that point on, messages are stuck on B1.

To get the messages out of the stuck state, I have to bring nodes up and down (e.g. bring B0 up, then take down B1). Eventually FailOver kicks in and one node takes over; once that happens, the messages are processed.

Is there something I'm missing?

Environment:
JBoss 5.1, JBM 1.4.4 built from source. I modified JMSRemoteConnection to compile against Remoting 2.5.1 so that we could have the Bridge fixes, etc.

1. Re: Clustered FailOver doesn't seem to work correctly with Bri

gaohoward Jul 2, 2009 10:15 AM (in response to artp)

Hi,

Can you please post your bridge service configuration here?

Thanks
Howard

2. Re: Clustered FailOver doesn't seem to work correctly with Bri

artp Jul 2, 2009 1:15 PM (in response to artp)

I forgot to mention that we have the bridge running as an HASingleton on Cluster A. The JMSProvider has two nodes (i.e. Cluster B nodes B0 and B1) configured in the provider URL.

<mbean code="org.jboss.jms.server.bridge.BridgeService"
 name="jboss.messaging:service=Bridge,name=OTSPackageBridge"
 xmbean-dd="xmdesc/Bridge-xmbean.xml">

 <!-- The JMS provider loader that is used to lookup the source destination -->
 <depends optional-attribute-name="SourceProviderLoader">
 jboss.messaging:service=JMSProviderLoader,name=HAJNDIJMSProvider</depends>

 <!-- The JMS provider loader that is used to lookup the target destination -->
 <depends optional-attribute-name="TargetProviderLoader">
 jboss.messaging:service=JMSProviderLoader,name=OTS-HAJNDIJMSProvider</depends>

 <!-- The JNDI lookup for the source destination -->
 <attribute name="SourceDestinationLookup">com.company.wop.ots.event.queue</attribute>

 <!-- The JNDI lookup for the target destination -->
 <attribute name="TargetDestinationLookup">com.company.wop.ots.event.queue</attribute>
 <attribute name="QualityOfServiceMode">0</attribute>

 <!-- The maximum number of messages to consume from the source
 before sending to the target -->
 <attribute name="MaxBatchSize">5</attribute>

 <!-- The maximum time to wait (in ms) before sending a batch to the target
 even if MaxBatchSize is not exceeded.
 -1 means wait forever -->
 <attribute name="MaxBatchTime">5000</attribute>
 <!-- The number of ms to wait between connection retries in the event connections
 to source or target fail -->
 <attribute name="FailureRetryInterval">5000</attribute>

 <!-- The maximum number of connection retries to make in case of failure,
 before giving up -1 means try forever-->
 <attribute name="MaxRetries">-1</attribute>

 <!-- If true then the message id of the message before bridging will be added
 as a header to the message so it is available to the receiver. Can then be
 sent as correlation id to correlate in a distributed request-response -->
 <attribute name="AddMessageIDInHeader">false</attribute>

</mbean>

Here is a snippet of our hajndi-jms-ds.xml:

 <!-- The JMS provider loader -->
 <mbean code="org.jboss.jms.jndi.JMSProviderLoader"
 name="jboss.messaging:service=JMSProviderLoader,name=HAJNDIJMSProvider">
 <attribute name="ProviderName">DefaultJMSProvider</attribute>
 <attribute name="ProviderAdapterClass">
 org.jboss.jms.jndi.JNDIProviderAdapter
 </attribute>
 <!-- The combined connection factory -->
 <attribute name="FactoryRef">XAConnectionFactory</attribute>
 <!-- The queue connection factory -->
 <attribute name="QueueFactoryRef">XAConnectionFactory</attribute>
 <!-- The topic factory -->
 <attribute name="TopicFactoryRef">XAConnectionFactory</attribute>
 <!-- Access JMS via HAJNDI -->
 <attribute name="Properties">
 java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
 java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
 java.naming.provider.url=${jboss.bind.address:localhost}:1100
 jnp.disableDiscovery=false
 jnp.partitionName=${jboss.partition.name:DefaultPartition}
 jnp.discoveryGroup=${jboss.partition.udpGroup:230.0.0.4}
 jnp.discoveryPort=1102
 jnp.discoveryTTL=16
 jnp.discoveryTimeout=5000
 jnp.maxRetries=1
 </attribute>
 </mbean>

Note: We changed the provider's connection factory to ConnectionFactory, since ClusteredConnectionFactory didn't work.

<mbean code="org.jboss.jms.jndi.JMSProviderLoader"
 name="jboss.messaging:service=JMSProviderLoader,name=OTS-HAJNDIJMSProvider">
 <attribute name="ProviderName">OTSJMSProvider</attribute>
 <attribute name="ProviderAdapterClass">
 org.jboss.jms.jndi.JNDIProviderAdapter
 </attribute>
 <!-- The combined connection factory -->
 <attribute name="FactoryRef">ConnectionFactory</attribute>
 <!-- The queue connection factory -->
 <attribute name="QueueFactoryRef">ConnectionFactory</attribute>
 <!-- The topic factory -->
 <attribute name="TopicFactoryRef">ConnectionFactory</attribute>
 <!-- Access JMS via HAJNDI -->
 <attribute name="Properties">
 java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
 java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
 java.naming.provider.url=dev-ots01.test.company.com:1100,dev-ots02.test.company.com:1100
 jnp.disableDiscovery=false
 jnp.partitionName=OTS-dev
 jnp.discoveryGroup=228.1.2.21
 jnp.discoveryPort=1102
 jnp.discoveryTTL=16
 jnp.discoveryTimeout=5000
 jnp.maxRetries=1
 </attribute>
 </mbean>

3. Re: Clustered FailOver doesn't seem to work correctly with Bri

artp Jul 2, 2009 2:50 PM (in response to artp)
We're using PostgreSQL to persist messages. In the file postgresql-persistence-service.xml, I configured the PostOffice with:

<attribute name="Clustered">true</attribute>
<attribute name="FailoverOnNodeLeave">true</attribute>

Is this correct?
4. Re: Clustered FailOver doesn't seem to work correctly with Bri

clebert.suconic Jul 2, 2009 3:52 PM (in response to artp)

artp wrote: "JBoss 5.1, JBM 1.4.4 built from source. I modified JMSRemoteConnection to compile against Remoting 2.5.1 so that we could have the Bridge fixes, etc."

I believe some of the changes Howard made on trunk will also require some changes on Remoting 2.5.X, but I don't think the required changes have been made yet.

Howard: do you have more details about this?

artp: as you are running from source, did you run the complete testsuite? You may have bugs in your code that we haven't tested yet (the pending changes on Remoting may be an issue for you).
5. Re: Clustered FailOver doesn't seem to work correctly with Bri

clebert.suconic Jul 2, 2009 3:53 PM (in response to artp)

"I believe some of the changes Howard made on trunk"

Typo, old habit. I meant Branch_1_4.
6. Re: Clustered FailOver doesn't seem to work correctly with Bri

timfox Jul 6, 2009 2:40 PM (in response to artp)

If you want a client in JBM 1.x to fail over automatically from one node to another, you need to set supportsFailover to true in the descriptor where the connection factory is deployed.
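
A minimal sketch of what such a descriptor entry might look like, modeled on the connection-factories-service.xml shipped with JBM 1.4. The MBean name FailoverConnectionFactory and its JNDI bindings are hypothetical, and the bisocket Connector dependency is assumed to match the default deployment; both would need adjusting to the actual installation.

```xml
<!-- Hypothetical connection factory with failover enabled; names and bindings are illustrative -->
<mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
       name="jboss.messaging.connectionfactory:service=FailoverConnectionFactory"
       xmbean-dd="xmdesc/ConnectionFactory-xmbean.xml">
   <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
   <depends optional-attribute-name="Connector">jboss.messaging:service=Connector,transport=bisocket</depends>
   <depends>jboss.messaging:service=PostOffice</depends>

   <attribute name="JNDIBindings">
      <bindings>
         <binding>/FailoverConnectionFactory</binding>
         <binding>java:/FailoverConnectionFactory</binding>
      </bindings>
   </attribute>

   <!-- Let clients created from this factory fail over to another node -->
   <attribute name="SupportsFailover">true</attribute>
   <!-- Load balancing left off, so only the failover behaviour changes -->
   <attribute name="SupportsLoadBalancing">false</attribute>
</mbean>
```

The factory has to be deployed on the cluster the client connects to (Cluster B for the bridge target in this thread), since that is where the JNDI lookup happens.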
7. Re: Clustered FailOver doesn't seem to work correctly with Bri

artp Jul 6, 2009 2:50 PM (in response to artp)

Ideally, I should be using ClusteredConnectionFactory, but I haven't gotten that to work. Messages get stuck on Cluster B when using ClusteredConnectionFactory. It seems like there are issues with ClusteredConnectionFactory. I'll try creating a custom ConnectionFactory with only SupportsFailover set to true to see if it helps.
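
If such a custom factory were bound in Cluster B's JNDI as FailoverConnectionFactory (a hypothetical name, not one from this thread), the bridge's target provider loader could be pointed at it by changing only the three factory references. A sketch, reusing the OTS-HAJNDIJMSProvider definition posted earlier in this thread with everything else unchanged:

```xml
<mbean code="org.jboss.jms.jndi.JMSProviderLoader"
       name="jboss.messaging:service=JMSProviderLoader,name=OTS-HAJNDIJMSProvider">
   <attribute name="ProviderName">OTSJMSProvider</attribute>
   <attribute name="ProviderAdapterClass">
      org.jboss.jms.jndi.JNDIProviderAdapter
   </attribute>
   <!-- FailoverConnectionFactory is a hypothetical JNDI name; use whatever the custom factory is bound to -->
   <attribute name="FactoryRef">FailoverConnectionFactory</attribute>
   <attribute name="QueueFactoryRef">FailoverConnectionFactory</attribute>
   <attribute name="TopicFactoryRef">FailoverConnectionFactory</attribute>
   <!-- HAJNDI properties copied as posted -->
   <attribute name="Properties">
      java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
      java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
      java.naming.provider.url=dev-ots01.test.company.com:1100,dev-ots02.test.company.com:1100
      jnp.disableDiscovery=false
      jnp.partitionName=OTS-dev
      jnp.discoveryGroup=228.1.2.21
      jnp.discoveryPort=1102
      jnp.discoveryTTL=16
      jnp.discoveryTimeout=5000
      jnp.maxRetries=1
   </attribute>
</mbean>
```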
8. Re: Clustered FailOver doesn't seem to work correctly with Bri

timfox Jul 6, 2009 3:58 PM (in response to artp)

It depends on what you want to do. When you originally said failover, I assumed you really meant reconnection, in which case you don't need a clustered connection factory.
9. Re: Clustered FailOver doesn't seem to work correctly with Bri

artp Jul 6, 2009 5:44 PM (in response to artp)

I tried setting SupportsFailover to true on the ConnectionFactory. Then I tested failover by taking down node B0, and failover worked: messages went to B1. After a period of time I brought up B0 and took down B1. Messages never got delivered to B0 and piled up in the queue, with the MessageCount and DeliveryCount growing. I brought up B1, and still nothing happened until I took down B0; then failover occurred and B1 processed the backlog of messages. Next I took down B1, and once again I ended up with messages stuck.

It seems to me that FailOver works the first time I take down node B0, but after that it doesn't work properly. Is there anything else I can try?
10. Re: Clustered FailOver doesn't seem to work correctly with Bri

gaohoward Jul 7, 2009 11:25 AM (in response to artp)

Hi,

Just for your info, using Remoting 2.5.1 cannot solve the stuck-message issue. You probably need to wait for the Remoting 2.5.2 release, and then preferably for a new JBM release for AS 5.

Thanks.