Cluster Recoverability Issues...
lhankins Sep 21, 2007 3:33 PM
Hi Guys,
We're seeing some cluster recoverability issues. We're using JBoss 4.0.5 in a clustered configuration (just a tw
For the most part, everything works great.
A while back we had a Quartz job that caused an OutOfMemoryError on one node of the cluster, after which the whole cluster fell apart.
To try to reproduce this situation, I've created an admin-only URL that lets me trigger one of the following two things on a single node of the cluster:
1) call System.exit
2) start a Quartz job that purposefully runs the node out of memory.
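For illustration, the two failure modes could be sketched roughly like this. This is not our actual code: the class name FaultInjector, the "exit"/"oom" parameter values, and the 1 MB chunk size are all made up for the example, and the real out-of-memory case runs inside a Quartz job rather than being called directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper behind the admin-only URL; all names here are illustrative.
final class FaultInjector {

    enum Mode { EXIT, OUT_OF_MEMORY }

    // Maps a request parameter (assumed to be "exit" or "oom") to a fault mode.
    static Mode parse(String param) {
        String p = (param == null) ? "" : param.trim().toLowerCase(Locale.ROOT);
        switch (p) {
            case "exit": return Mode.EXIT;
            case "oom":  return Mode.OUT_OF_MEMORY;
            default: throw new IllegalArgumentException("unknown fault mode: " + param);
        }
    }

    // Triggers the chosen failure on the local node only.
    static void trigger(Mode mode) {
        switch (mode) {
            case EXIT:
                System.exit(1);   // scenario 1: the JVM terminates immediately
                break;
            case OUT_OF_MEMORY:
                exhaustHeap();    // scenario 2: allocate until the heap is full
                break;
        }
    }

    // Keeps every allocated chunk reachable so the garbage collector cannot
    // reclaim it; eventually an allocation throws OutOfMemoryError.
    static void exhaustHeap() {
        List<byte[]> hog = new ArrayList<byte[]>();
        while (true) {
            hog.add(new byte[1 << 20]);
        }
    }
}
```

Note that the two scenarios are not equivalent: System.exit takes the whole JVM down, whereas an OutOfMemoryError thrown in one thread can leave the JVM running in a degraded state.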
I just ran a small test with scenario #1, and I can reproduce the problem. Basically, I hit the URL on node1, causing the JVM to exit. After that, node2 is still up, but the application is hamstrung (we get exceptions on any operation that touches JMS).
I've watched the logs on node2 when I cause node1 to die, and I do see the JMS queues, etc. migrate from node1 (now dead) to node2. Here are the migration-related log messages (immediately after node1 has died):
2007-09-21 11:58:20,703 [JMSContainerInvoker(ReportServicePreExecutionMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal 10000ms for ReportServicePreExecutionMdb
2007-09-21 11:58:20,703 [JMSContainerInvoker(ReportServicePostExecutionMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal 10000ms for ReportServicePostExecutionMdb
2007-09-21 11:58:20,703 [JMSContainerInvoker(ReportServiceDownloadMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal 10000ms for ReportServiceDownloadMdb
2007-09-21 11:58:20,703 [JMSContainerInvoker(DeliveryServiceMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal 10000ms for DeliveryServiceMdb
2007-09-21 11:58:20,703 [JMSContainerInvoker(ApplicationEventsMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal 10000ms for ApplicationEventsMdb
2007-09-21 11:58:22,140 [MessageDispatcher up processing thread] [] [] INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.focus-rcl-cluster] New cluster view for partition focus-rcl-cluster (id: 2, delta: -1) : [10.10.11.14:1199]
2007-09-21 11:58:22,156 [AsynchViewChangeHandler Thread] [] [] INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.focus-rcl-cluster] I am (10.10.11.14:1199) received membershipChanged event:
2007-09-21 11:58:22,156 [AsynchViewChangeHandler Thread] [] [] INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.focus-rcl-cluster] Dead members: 1 ([10.10.11.13:1199])
2007-09-21 11:58:22,156 [AsynchViewChangeHandler Thread] [] [] INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.focus-rcl-cluster] New Members : 0 ([])
2007-09-21 11:58:22,156 [AsynchViewChangeHandler Thread] [] [] INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.focus-rcl-cluster] All Members : 1 ([10.10.11.14:1199])
2007-09-21 11:58:22,656 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.web.tomcat.tc5.TomcatDeployer] deploy, ctxPath=/jbossmq-httpil, warUrl=.../deploy-hasingleton/jms/jbossmq-httpil.sar/jbossmq-httpil.war/
2007-09-21 11:58:23,672 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.il.uil2.UILServerILService] JBossMQ UIL service available at : /0.0.0.0:8193
2007-09-21 11:58:23,703 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.server.jmx.Queue.DLQ] Bound to JNDI name: queue/DLQ
2007-09-21 11:58:23,719 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.server.jmx.Queue.rcl/reportServicePreExecuteQueue] Bound to JNDI name: queue/rcl/reportServicePreExecuteQueue
2007-09-21 11:58:23,719 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.server.jmx.Queue.rcl/reportServiceExecuteQueue] Bound to JNDI name: queue/rcl/reportServiceExecuteQueue
2007-09-21 11:58:23,719 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.server.jmx.Queue.rcl/reportServiceDownloadQueue] Bound to JNDI name: queue/rcl/reportServiceDownloadQueue
2007-09-21 11:58:23,719 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.server.jmx.Queue.rcl/reportServicePostExecuteQueue] Bound to JNDI name: queue/rcl/reportServicePostExecuteQueue
2007-09-21 11:58:23,734 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.server.jmx.Queue.rcl/deliveryServiceQueue] Bound to JNDI name: queue/rcl/deliveryServiceQueue
2007-09-21 11:58:23,750 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.server.jmx.Topic.rcl/events/reportEventsTopic] Bound to JNDI name: topic/rcl/events/reportEventsTopic
2007-09-21 11:58:23,750 [AsynchKeyChangeHandler Thread] [] [] INFO [org.jboss.mq.server.jmx.Topic.rcl/events/applicationEventsTopic] Bound to JNDI name: topic/rcl/events/applicationEventsTopic
2007-09-21 11:58:25,297 [MessageDispatcher up processing thread] [] [] INFO [org.jboss.cache.TreeCache] viewAccepted(): [magnum:3542|2] [magnum:3542]
2007-09-21 11:58:30,703 [JMSContainerInvoker(ReportServiceExecutionMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS provider for ReportServiceExecutionMdb
2007-09-21 11:58:30,734 [JMSContainerInvoker(DeliveryServiceMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS provider for DeliveryServiceMdb
2007-09-21 11:58:30,734 [JMSContainerInvoker(ReportServiceDownloadMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS provider for ReportServiceDownloadMdb
2007-09-21 11:58:30,734 [JMSContainerInvoker(ReportServicePreExecutionMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS provider for ReportServicePreExecutionMdb
2007-09-21 11:58:30,734 [JMSContainerInvoker(ApplicationEventsMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS provider for ApplicationEventsMdb
2007-09-21 11:58:30,750 [JMSContainerInvoker(ReportServicePostExecutionMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS provider for ReportServicePostExecutionMdb
2007-09-21 11:58:30,828 [JMSContainerInvoker(ReportServiceExecutionMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for ReportServiceExecutionMdb
2007-09-21 11:58:30,844 [JMSContainerInvoker(ReportServicePostExecutionMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for ReportServicePostExecutionMdb
2007-09-21 11:58:30,859 [JMSContainerInvoker(ApplicationEventsMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for ApplicationEventsMdb
2007-09-21 11:58:30,859 [JMSContainerInvoker(DeliveryServiceMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for DeliveryServiceMdb
2007-09-21 11:58:30,859 [JMSContainerInvoker(ReportServicePreExecutionMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for ReportServicePreExecutionMdb
2007-09-21 11:58:30,859 [JMSContainerInvoker(ReportServiceDownloadMdb) Reconnect] [] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for ReportServiceDownloadMdb
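As a sanity check on the log above, every MDB that logs "Trying to reconnect to JMS provider" also logs "Reconnected to JMS provider" within a fraction of a second, so the MDB containers at least believe they are back. A minimal sketch of that check, assuming one log entry per line (the class and method names are made up for the example; only the two message strings come from the log):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports MDBs that started reconnecting but never finished.
final class ReconnectCheck {

    // Message texts as they appear in the JMSContainerInvoker log lines.
    private static final Pattern TRYING =
            Pattern.compile("Trying to reconnect to JMS provider for (\\S+)");
    private static final Pattern RECONNECTED =
            Pattern.compile("Reconnected to JMS provider for (\\S+)");

    // Returns the MDB names with a "Trying to reconnect" entry that is not
    // followed by a matching "Reconnected" entry, in first-seen order.
    static Set<String> stillDisconnected(List<String> logLines) {
        Set<String> pending = new LinkedHashSet<String>();
        for (String line : logLines) {
            Matcher trying = TRYING.matcher(line);
            if (trying.find()) {
                pending.add(trying.group(1));
                continue;
            }
            Matcher done = RECONNECTED.matcher(line);
            if (done.find()) {
                pending.remove(done.group(1));
            }
        }
        return pending;
    }
}
```

An empty result only says the MDB listeners reconnected; it says nothing about other JMS clients in the application that may still hold connections to the dead node.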
The exceptions show up after we kill node1 and then try to perform any operation on node2 that touches JMS.
Any ideas...?