-
1. Re: How to configure a cluster with fault tolerance
martinmurphy Mar 14, 2008 10:30 AM (in response to jssacristan)
Hi Jorge,
What version of Fuse ESB are you using? Have you had a look at the examples\cluster demo? You can use JCAFlow to provide clustered transactional persistence.
- Martin
-
2. Re: How to configure a cluster with fault tolerance
jssacristan Mar 14, 2008 3:52 PM (in response to martinmurphy)
Thank you Martin, I'm using Fuse ESB 3.3.0.8.
The cluster demo works OK, but I think it is not the topology I need.
I'm using JMSFlow. What is the difference? It should work with both JMS and JCA... Sorry, I'm still a newbie in this technology.
Jorge
-
3. Re: How to configure a cluster with fault tolerance
martinmurphy Mar 21, 2008 9:25 AM (in response to jssacristan)
Sorry it took a while to get back. Basically, JCAFlow supports transactions, so a message won't be removed from the queue until the flow has completed. With JMSFlow there is still a danger that the message could be lost if the broker dies while the message is in a component being processed.
-
4. Re: How to configure a cluster with fault tolerance
bsnyder Mar 21, 2008 8:12 PM (in response to jssacristan)
> The goal I want to obtain is to shut down one node of the cluster while it is processing the request (insert the wait) and have the other continue with the execution of that request. Is it possible?
To achieve fault tolerance and high availability (which is essentially what you are describing) you will need to configure failover and message persistence on the ActiveMQ broker (as I see you've already done) since ActiveMQ is used by the NMR to communicate with the JBI components.
You say that you want to shut down one node in the cluster while processing is in-flight, allowing another node to continue with the processing. For this to happen, you will also need to:
1) Have more than one ServiceMix instance running
2) Network the ServiceMix instances via the ActiveMQ configuration
3) Deploy the same JSR-181 service to more than one ServiceMix instance
This way there will be more than one instance of the service running so that if one deployment of the service becomes unreachable, the NMR can route the message to another deployment of the service. Other items that will need to be done include:
1) Set persistent=true on the container element in the conf/servicemix.xml file. This has the effect of telling the ActiveMQ broker to persist messages as they flow through the NMR so that messages are not lost in the event of a failure.
2) Set the MessageExchange.JTA_TRANSACTION_PROPERTY_NAME property on the message exchange. This can only be achieved via Java code and the best place for it is at the earliest point where the message exchange is created, i.e., in a marshaler on the servicemix-http component. This property affects the quality of service and therefore the flow that the NMR chooses to handle the message exchange.
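As a rough illustration of the client side of the failover mentioned above: ActiveMQ clients can list several brokers in a failover: transport URI so that they reconnect to another broker when the current one dies. The sketch below only builds such a URI string. The class and method names are hypothetical, and the second broker address (vmplatina2) is an assumption based on the host names used elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of ActiveMQ): builds a failover: transport URI.
final class FailoverUri {

    // Joins the broker URIs into failover:(uri1,uri2,...)?randomize=false.
    // randomize=false asks the client to try the brokers in the listed order,
    // so the first entry is treated as the preferred (master) broker.
    static String build(List<String> brokerUris) {
        if (brokerUris == null || brokerUris.isEmpty()) {
            throw new IllegalArgumentException("at least one broker URI is required");
        }
        return "failover:(" + String.join(",", brokerUris) + ")?randomize=false";
    }

    public static void main(String[] args) {
        // Assumed addresses: the master IP from this thread and an assumed slave host.
        String uri = build(Arrays.asList(
                "tcp://192.168.205.141:61616",
                "tcp://vmplatina2:61616"));
        System.out.println(uri);
    }
}
```

The resulting string would typically be passed as the broker URL when creating an org.apache.activemq.ActiveMQConnectionFactory; whether it fits your setup depends on how the ServiceMix flows are configured.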
The best thing to do is start creating it all and come here with your questions and we'll help you as much as we can.
Bruce
-
5. Re: How to configure a cluster with fault tolerance
jssacristan Mar 24, 2008 1:15 PM (in response to bsnyder)
Thank you Bruce and Martin.
Well, in fact Master/Slave doesn't work. I run two instances of FUSE Message Broker 5.0.0.9, one as master and one as slave (see below for configurations). Sometimes I get this error in the slave:
ERROR Service - Async error occurred: java.lang.IllegalStateException: Cannot remove session that had not been registered: ID:vmplatina1-36131-1206320334896-2:1:-1
And whenever I shut down the master, I get this error in the slave:
ERROR MasterConnector - Network connection between vm://broker2#0 and tcp:///192.168.205.141:61616 shutdown: null
java.io.EOFException
WARN BrokerService - Master Failed - starting all connectors
ERROR BrokerService - Failed to startAllConnectors
INFO TransportConnector - Connector vm://broker2 Stopped
On the other hand, Bruce said:
> This way there will be more than one instance of the service running so that if one deployment of the service becomes unreachable, the NMR can route the message to another deployment of the service.
But if I shut down both ServiceMix and ActiveMQ, how can the other node return the response to a client that is listening to the service that is down?
-
ActiveMQ master:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:amq="http://activemq.org/config/1.0"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
                           http://activemq.org/config/1.0 http://activemq.apache.org/schema/activemq-core.xsd
                           http://activemq.apache.org/camel/schema/spring http://activemq.apache.org/camel/schema/spring/camel-spring.xsd">

  <!-- Allows us to use system properties as variables in this configuration file -->
  <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"/>

  <broker xmlns="http://activemq.org/config/1.0" brokerName="broker1" dataDirectory="${activemq.base}/data" persistent="true">

    <!-- Destination specific policies using destination names or wildcards -->
    <destinationPolicy>
      <policyMap>
        <policyEntries>
          <policyEntry topic="FOO.>" producerFlowControl="false" memoryLimit="1mb">
            <dispatchPolicy>
              <strictOrderDispatchPolicy/>
            </dispatchPolicy>
            <subscriptionRecoveryPolicy>
              <lastImageSubscriptionRecoveryPolicy/>
            </subscriptionRecoveryPolicy>
          </policyEntry>
        </policyEntries>
      </policyMap>
    </destinationPolicy>

    <!-- The transport connectors ActiveMQ will listen to -->
    <transportConnectors>
      <transportConnector name="openwire" uri="tcp://localhost:61616" discoveryUri="multicast://default"/>
      <transportConnector name="ssl" uri="ssl://localhost:61617"/>
      <transportConnector name="stomp" uri="stomp://localhost:61613"/>
      <transportConnector name="xmpp" uri="xmpp://localhost:61222"/>
    </transportConnectors>

    <!-- The store and forward broker networks ActiveMQ will listen to -->
    <networkConnectors>
    </networkConnectors>

  </broker>
</beans>
-
ActiveMQ slave:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:amq="http://activemq.org/config/1.0"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
                           http://activemq.org/config/1.0 http://activemq.apache.org/schema/activemq-core.xsd
                           http://activemq.apache.org/camel/schema/spring http://activemq.apache.org/camel/schema/spring/camel-spring.xsd">

  <!-- Allows us to use system properties as variables in this configuration file -->
  <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"/>

  <!-- Slave broker: masterConnectorURI points at the master's openwire connector -->
  <broker xmlns="http://activemq.org/config/1.0" brokerName="broker2" dataDirectory="${activemq.base}/data"
          masterConnectorURI="tcp://192.168.205.141:61616" shutdownOnMasterFailure="false" persistent="true">

    <!-- Destination specific policies using destination names or wildcards -->
    <destinationPolicy>
      <policyMap>
        <policyEntries>
          <policyEntry topic="FOO.>" producerFlowControl="false" memoryLimit="1mb">
            <dispatchPolicy>
              <strictOrderDispatchPolicy/>
            </dispatchPolicy>
            <subscriptionRecoveryPolicy>
              <lastImageSubscriptionRecoveryPolicy/>
            </subscriptionRecoveryPolicy>
          </policyEntry>
        </policyEntries>
      </policyMap>
    </destinationPolicy>

    <!-- The transport connectors ActiveMQ will listen to -->
    <transportConnectors>
      <transportConnector name="openwire" uri="tcp://localhost:61616" discoveryUri="multicast://default"/>
      <transportConnector name="ssl" uri="ssl://localhost:61617"/>
      <transportConnector name="stomp" uri="stomp://localhost:61613"/>
      <transportConnector name="xmpp" uri="xmpp://localhost:61222"/>
    </transportConnectors>

    <!-- The store and forward broker networks ActiveMQ will listen to -->
    <networkConnectors>
    </networkConnectors>

  </broker>
</beans>
-
6. Re: How to configure a cluster with fault tolerance
bsnyder Mar 24, 2008 1:27 PM (in response to jssacristan)
> Well, in fact Master/Slave doesn't work. I run two instances of FUSE Message Broker 5.0.0.9, one as master and one as slave (see below for configurations). Sometimes I get this error in the slave:
> ERROR Service - Async error occurred: java.lang.IllegalStateException: Cannot remove session that had not been registered: ID:vmplatina1-36131-1206320334896-2:1:-1
The error above appears to be a known issue, currently tracked as AMQ-1464.
> And whenever I shut down the master, I get this error in the slave:
> ERROR MasterConnector - Network connection between vm://broker2#0 and tcp:///192.168.205.141:61616 shutdown: null
> java.io.EOFException
> WARN BrokerService - Master Failed - starting all connectors
> ERROR BrokerService - Failed to startAllConnectors
> INFO TransportConnector - Connector vm://broker2 Stopped
I'm not sure exactly why the slave is unable to start its connectors. Are the master and slave brokers possibly out of sync? Take a look at the following steps for manually synchronizing a master and slave:
http://activemq.apache.org/masterslave.html#MasterSlave-RecoveryingaMasterSlavetopology
Bruce
-
7. Re: How to configure a cluster with fault tolerance
jssacristan Mar 27, 2008 10:36 AM (in response to bsnyder)
I followed the steps to resync the brokers. This is what I do:
vmplatina1:/opt/iona/fuse-message-broker-5.0.0.9# scp -r root@vmplatina2:/opt/iona/fuse-message-broker-5.0.0.9/data .
It copies the data directory from the slave to the master.
I've also tried deleting both data directories, but it doesn't work.
If I delete the data directories, I get this error in the slave (but no errors in the master) when I shut down the master:
ERROR MasterConnector - Network connection between vm://broker2#0 and tcp:///192.168.205.141:61616 shutdown: null
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
... ...
ERROR BrokerService - Failed to startAllConnectors
INFO TransportConnector - Connector vm://broker2 Stopped
If I resync, I get this when I shut down the master:
//MASTER//
INFO BrokerService - ActiveMQ Message Broker (broker1, ID:vmplatina1-45019-1206400757923-0:0) is shutting down
WARN ActiveMQConnection - Async exception with no exception listener: java.io.EOFException
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
... ...
ERROR efaultMessageListenerContainer - Setup of JMS message listener invoker failed - trying to recover
javax.jms.IllegalStateException: The Consumer is closed
at org.apache.activemq.ActiveMQMessageConsumer.checkClosed(ActiveMQMessageConsumer.java:681)
... ...
INFO TransportConnector - Connector openwire Stopped
INFO TransportConnector - Connector ssl Stopped
INFO TransportConnector - Connector stomp Stopped
INFO TransportConnector - Connector xmpp Stopped
INFO BrokerService - ActiveMQ JMS Message Broker (broker1, ID:vmplatina1-45019-1206400757923-0:0) stopped
//SLAVE//
ERROR Service - Async error occurred: java.lang.IllegalStateException: Cannot remove session that had not been registered: ID:vmplatina1-45019-1206400757923-2:1:-1
java.lang.IllegalStateException: Cannot remove session that had not been registered: ID:vmplatina1-45019-1206400757923-2:1:-1
at org.apache.activemq.broker.TransportConnection.processRemoveSession(TransportConnection.java:576)
... ...
ERROR Service - Async error occurred: java.lang.IllegalStateException: Cannot remove session that had not been registered: ID:vmplatina1-45019-1206400757923-2:1:1
java.lang.IllegalStateException: Cannot remove session that had not been registered: ID:vmplatina1-45019-1206400757923-2:1:1
at org.apache.activemq.broker.TransportConnection.processRemoveSession(TransportConnection.java:576)
WARN MasterConnector - The Master has shutdown
WARN BrokerService - Master Failed - starting all connectors
ERROR BrokerService - Failed to startAllConnectors
INFO TransportConnector - Connector vm://broker2 Stopped
-
8. Re: How to configure a cluster with fault tolerance
jssacristan Mar 25, 2008 8:26 AM (in response to bsnyder)
I tried with the ActiveMQ-5.1-SNAPSHOT from yesterday (Mar 24) and I didn't get the first error I mentioned (Cannot remove session that had not been registered).
I tried to send requests to ServiceMix, but sometimes I get this error:
ERROR MasterBroker - Slave Failed
javax.jms.JMSException: Slave broker out of sync with master: Acknowledgment (MessageAck {commandId = 234, responseRequired = true, ackType = 2, consumerId = ID:vmplatina1-33180-1206410706631-0:0:20:1, firstMessageId = null, lastMessageId = ID:vmplatina1-33180-1206410706631-0:0:28:1:2, destination = queue://org.apache.servicemix.jms.{http://ejemplos.ws.lawebsemantica.com}SaludoService:Saludo, transactionId = null, messageCount = 1}) was not in the dispatch list: []
at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(PrefetchSubscription.java:344)
... ...
Moreover, the slave broker still fails when I shut down the master.
-
9. Re: How to configure a cluster with fault tolerance
jssacristan Mar 27, 2008 8:21 AM (in response to jssacristan)
Please, does anybody have an idea?
I don't know if I am synchronizing the brokers correctly. I use this:
vmplatina1:/opt/iona/fuse-message-broker-5.0.0.9# scp -r root@vmplatina2:/opt/iona/fuse-message-broker-5.0.0.9/data .
It copies the data directory from the slave to the master. I've also tried deleting both data directories, but it doesn't work.
-
10. Re: How to configure a cluster with fault tolerance
davestanley Mar 27, 2008 12:07 PM (in response to jssacristan)
Hi Jorge,
It sounds like you need a hot standby configuration (to give you HA).
You can do this in a few different ways, but start with the simplest case and set up both processes running on a single node. You would have esb1_host1 (master) and esb2_host1 (slave). If esb1 goes down, esb2 will come up and listen on the same hostname and port, so the failover will be transparent to the client.
In order to achieve this, you will need to have both ESB instances pointing to the same data directory. You will need to change the JMX ports in your servicemix.properties file for each instance, but other than that they should have the exact same config and can use the exact same install. Enable the amq:persistenceAdapter in activemq.xml as follows:
<amq:persistenceAdapter>
  <amq:journaledJDBC journalLogFiles="5" dataDirectory="./data/amq"></amq:journaledJDBC>
</amq:persistenceAdapter>
The first process started will be the master. When you start the second instance, the underlying ActiveMQ broker will try to lock its persistent store. As the master already has the lock, the slave will go into standby mode and wait for the lock to be released. In standby mode, it will not listen on any ports until it can acquire the lock, so effectively it's just standing by, waiting on the master.
If you control-C the master, you should see the slave take over transparently and come up listening on its configured ports (which will be exactly the same as the master's).
Having both processes run on a single node is the simplest scenario. If you want the standby on a separate node, you can do that, but make sure they both use the same DB (using NFS or whatever).
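To make the standby behaviour described above more concrete, here is a minimal sketch of the general idea: whichever process obtains an exclusive lock first acts as master, and the other polls until the lock is released. This is not the broker's own code. It swaps in a plain java.nio file lock purely for illustration (the lock ActiveMQ actually takes depends on the persistence adapter), and the class, method, and file names are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of lock-based master/standby election between processes.
// Assumes each process calls these methods for a given lock file only once at a time.
final class StandbyLock {

    // Tries once to take an exclusive lock on the file.
    // Returns the lock if this process is now the master,
    // or null if another process currently holds it.
    static FileLock tryBecomeMaster(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock();
        if (lock == null) {
            channel.close(); // lock is held elsewhere; stay in standby
        }
        return lock;
    }

    // Standby loop: polls until the lock becomes available, then returns it.
    static FileLock waitUntilMaster(Path lockFile, long pollMillis)
            throws IOException, InterruptedException {
        FileLock lock = tryBecomeMaster(lockFile);
        while (lock == null) {
            Thread.sleep(pollMillis);
            lock = tryBecomeMaster(lockFile);
        }
        return lock;
    }

    public static void main(String[] args) throws Exception {
        // Assumed lock file location, for demonstration only.
        Path lockFile = Paths.get(System.getProperty("java.io.tmpdir"), "esb-demo.lock");
        FileLock lock = waitUntilMaster(lockFile, 1000L);
        System.out.println("Acquired lock, acting as master");
        // A real broker would open its ports here and hold the lock until it exits.
        lock.channel().close(); // closing the channel releases the lock
    }
}
```

Starting two copies of this program against the same file should show the second one waiting until the first exits. File locks over NFS can behave differently from local ones, so treat the shared-directory case with some care.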
Up to this point, we have just discussed HA and failover. It's also possible to set up a cluster of ESB containers where you have more than one live listening process. In order to do this, you need to establish NetworkConnectors between the live ESB instances so they are aware of each other. So theoretically an HA/clustered configuration can look something like this:
esb1_host1(master) and esb2_host1(slave)
esb1_host2(master) and esb2_host2(slave)
Where esb1_host1 and esb1_host2 are servicing requests but esb2_host1 & esb2_host2 are just waiting to take over.
The second part of your question seems to be about achieving reliability for the HTTP BC. As you are using HTTP, when the master goes down you are going to lose your connection to the server, so you need to be able to handle this error condition gracefully in your Axis client, i.e. expect failures. The best you can do is to retry the request.
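A minimal sketch of such a client-side retry, assuming the request can be wrapped in a Callable and that repeating it is safe (a request that is not idempotent could be processed twice). The class and method names are hypothetical and not part of Axis or ServiceMix.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical retry wrapper for a client call that may fail while the standby takes over.
final class RetryingCaller {

    // Invokes the request up to maxAttempts times, sleeping delayMillis between
    // attempts. Only IOException (e.g. a dropped connection) triggers a retry;
    // any other exception propagates immediately.
    static <T> T callWithRetry(Callable<T> request, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // give the standby time to come up
                }
            }
        }
        throw lastFailure; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        // Simulated request: fails twice with a connection error, then succeeds.
        final int[] calls = {0};
        String response = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("connection lost");
            }
            return "response from standby";
        }, 5, 100L);
        System.out.println(response + " after " + calls[0] + " attempts");
    }
}
```

In a real client the Callable would wrap the actual web service invocation, and the delay would need to be long enough for the slave to acquire the lock and start listening.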
If it's possible to use JMS rather than HTTP, this will give you a more robust solution, as the JMS consumer can persist the message so that the slave can recover the message when it comes back up.
Another alternative may be to use the CXF BC instead of the HTTP BC/JSR181, with CXF on the client side and WS-RM enabled. This might be able to give you the reliability you're looking for by handling the retries under the covers for you.
Hope this helps,
/Dave
-
11. Re: How to configure a cluster with fault tolerance
jssacristan Mar 27, 2008 2:17 PM (in response to davestanley)
Thank you very much Dave! Your answer is very useful and interesting for me; now I know how to move forward. As soon as I can, I'll try everything you have suggested and post the results in the forum.
Regards,
Jorge