4 Replies Latest reply on Sep 23, 2014 4:12 AM by vladimir_v

broker failover not triggered on java.lang.OutOfMemoryError

vladimir_v Sep 18, 2014 4:37 AM

Hi folks,

I have JBoss Fuse 6.1 running in Fabric with three root containers. Each of them has a child container running A-MQ (one Master, two Slaves), and the brokers use NFSv4 for back-end storage.

Today I saw that the current A-MQ master was out of memory and I'm wondering why failover was not triggered. In the logs of the Master I have:

2014-09-16 14:53:59,724 | WARN | 987588645-167914 | nio | tty.io.nio.SelectChannelEndPoint 697 | 95 - org.eclipse.jetty.aggregate.jetty-all-server - 8.1.14.v20131031 | handle failed

2014-09-16 14:55:38,839 | WARN | 987588645-167921 | nio | tty.io.nio.SelectChannelEndPoint 697 | 95 - org.eclipse.jetty.aggregate.jetty-all-server - 8.1.14.v20131031 | handle failed

java.lang.OutOfMemoryError: Java heap space

2014-09-16 14:54:28,065 | ERROR | pool-17-thread-1 | GitDataStore | abric8.git.internal.GitDataStore 1117 | 84 - io.fabric8.fabric-git - 1.0.0.redhat-387 | Failed to pull from the remote git repo /services/jboss-fuse/fabric-01-brq/fabric8-karaf-1.0.0.redhat-379/instances/brq-amq01/data/git/local/fabric. Reason: java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: Java heap space

2014-09-16 14:58:46,844 | WARN | qtp987588645-101 | nio | tty.io.nio.SelectChannelEndPoint 697 | 95 - org.eclipse.jetty.aggregate.jetty-all-server - 8.1.14.v20131031 | handle failed

java.lang.OutOfMemoryError: Java heap space

2014-09-16 14:58:41,707 | WARN | TCP Accept-44445 | tcp | sun.rmi.runtime.Log$LoggerLog 236 | - - | RMI TCP Accept-44445: accept loop for ServerSocket[addr=/0.0.0.0,localport=44445] throws

java.lang.OutOfMemoryError: Java heap space

2014-09-16 14:58:12,520 | WARN | tyMonitor Worker | FailoverTransport | sport.failover.FailoverTransport 267 | 107 - org.apache.activemq.activemq-osgi - 5.9.0.redhat-610387 | Transport (ssl://fuse-fabric-02-stg.jboss.org/10.34.40.177:61616) failed, attempting to automatically reconnect

2014-09-16 14:57:37,410 | WARN | per@0.0.0.0:8182 | AbstractConnector | erver.AbstractConnector$Acceptor 955 | 95 - org.eclipse.jetty.aggregate.jetty-all-server - 8.1.14.v20131031 |

java.lang.OutOfMemoryError: Java heap space

2014-09-16 14:57:21,957 | WARN | 645-99 Selector0 | nio | io.nio.SelectorManager$SelectSet 518 | 95 - org.eclipse.jetty.aggregate.jetty-all-server - 8.1.14.v20131031 |

java.lang.OutOfMemoryError: Java heap space

From the CLI, "fabric:container-list" showed the container as "false" and "cluster-list" showed no bindings. The JVM process itself was still running.

On the Slave side, the logs showed errors that the lock on KahaDB could not be obtained. That is expected, because the Master JVM was still holding the lock.

My question is: how can I avoid this? If the Master is in a failed/false state, I want the JVM to be killed and the NFS lock released so that a Slave can take over.

Thanks

1. Re: broker failover not triggered on java.lang.OutOfMemoryError

bibryam Sep 18, 2014 7:34 AM (in response to vladimir_v)

You can try setting useShutdownHook="false" and systemExitOnShutdown="true" on the broker element. That should stop the process whenever a broker becomes a slave.
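
For reference, here is a minimal sketch of where those two attributes would go, assuming a standard activemq.xml-style broker definition. The brokerName value is a placeholder, and in a Fabric setup the broker XML may be managed through a profile rather than a local file, so treat the location as an assumption.

```xml
<!-- Sketch only: brokerName is a placeholder; keep your existing child elements. -->
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="example-broker"
        useShutdownHook="false"
        systemExitOnShutdown="true">
    <!-- existing persistenceAdapter, transportConnectors, etc. stay here unchanged -->
</broker>
```

Note that these attributes only act when the broker actually shuts down. They do not by themselves detect a JVM that is still alive but out of heap.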
2. Re: broker failover not triggered on java.lang.OutOfMemoryError

vladimir_v Sep 22, 2014 3:42 AM (in response to bibryam)

It didn't help; I had an OOM again this morning.
Any other ideas?
3. Re: broker failover not triggered on java.lang.OutOfMemoryError

bibryam Sep 22, 2014 5:10 AM (in response to vladimir_v)

In my opinion, you should try to fix the OOM issue rather than the master/slave one. If your broker throws OOM errors, then even if the slave takes over, chances are that the new master will fail with the same error, since both brokers have the same configuration.
To prevent the OOM error, you have to configure the broker's producer flow control: http://activemq.apache.org/producer-flow-control.html#ProducerFlowControl-Systemusage
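
As a rough illustration of what the System usage section of that page describes, the fragment below caps the memory, store and temp space the broker may use and enables producer flow control on all queues. The broker name and every limit value are arbitrary placeholders, not recommendations. The memory limit is assumed to be set well below the JVM heap size (-Xmx).

```xml
<!-- Sketch only: brokerName and all limits are placeholder values. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="example-broker">
    <destinationPolicy>
        <policyMap>
            <policyEntries>
                <!-- ">" is the wildcard matching every queue -->
                <policyEntry queue=">" producerFlowControl="true" memoryLimit="1mb"/>
            </policyEntries>
        </policyMap>
    </destinationPolicy>
    <systemUsage>
        <systemUsage>
            <memoryUsage>
                <memoryUsage limit="64 mb"/>
            </memoryUsage>
            <storeUsage>
                <storeUsage limit="100 gb"/>
            </storeUsage>
            <tempUsage>
                <tempUsage limit="10 gb"/>
            </tempUsage>
        </systemUsage>
    </systemUsage>
</broker>
```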

HTH
4. Re: broker failover not triggered on java.lang.OutOfMemoryError

vladimir_v Sep 23, 2014 4:12 AM (in response to bibryam)

Thanks for the link.
I'm not sure it will help in my case; the brokers are almost idle all the time.

I checked the OS logs and it looks like I have issues with the NFSv4 storage (KahaDB is there). I'll take thread dumps when it happens again and submit a bug.