
    If HornetQ cluster receives messages from more than one node, then Out of Memory exception occurs.

    meabhi007

      Hi,

       

      I have a 2-node HornetQ cluster configured through WildFly 8.2.0 standalone deployments.

      It works fine if one node is broadcasting messages and the other node is just listening.

      As soon as another source (any utility, or an active node from another clustered environment) also starts broadcasting messages on the same multicast socket (the one configured for the above-mentioned cluster), HornetQ throws an OutOfMemoryError with the following stack trace:

       

      2015-12-23 09:19:17.692 GMT+0000 <@> ERROR <@> [:hornetq-discovery-group-thread-dg-group1] <@> <@> <@> <@> <@> <@> <@> Exception in thread "hornetq-discovery-group-thread-dg-group1" <@>

      2015-12-23 09:19:17.708 GMT+0000 <@> ERROR <@> [:hornetq-discovery-group-thread-dg-group1] <@> <@> <@> <@> <@> <@> <@> java.lang.OutOfMemoryError: Java heap space <@>

      2015-12-23 09:19:17.708 GMT+0000 <@> ERROR <@> [:hornetq-discovery-group-thread-dg-group1] <@> <@> <@> <@> <@> <@> <@> at org.hornetq.core.buffers.impl.ChannelBufferWrapper.readSimpleStringInternal(ChannelBufferWrapper.java:86) <@>

      2015-12-23 09:19:17.708 GMT+0000 <@> ERROR <@> [:hornetq-discovery-group-thread-dg-group1] <@> <@> <@> <@> <@> <@> <@> at org.hornetq.core.buffers.impl.ChannelBufferWrapper.readStringInternal(ChannelBufferWrapper.java:115) <@>

      2015-12-23 09:19:17.708 GMT+0000 <@> ERROR <@> [:hornetq-discovery-group-thread-dg-group1] <@> <@> <@> <@> <@> <@> <@> at org.hornetq.core.buffers.impl.ChannelBufferWrapper.readString(ChannelBufferWrapper.java:93) <@>

      2015-12-23 09:19:17.708 GMT+0000 <@> ERROR <@> [:hornetq-discovery-group-thread-dg-group1] <@> <@> <@> <@> <@> <@> <@> at org.hornetq.core.cluster.DiscoveryGroup$DiscoveryRunnable.run(DiscoveryGroup.java:300) <@>

      2015-12-23 09:19:17.708 GMT+0000 <@> ERROR <@> [:hornetq-discovery-group-thread-dg-group1] <@> <@> <@> <@> <@> <@> <@> at java.lang.Thread.run(Thread.java:745) <@>

       

       

      Actual memory usage captured in the heap dump is hardly 60-70 MB, which suggests something else is causing this exception.

      Has anyone encountered this kind of issue?


       

      Message was edited by: Abhishek Abhishek. MulticastTest.rar is a utility to generate multicast messages on a specific multicast socket.

        • 1. Re: If HornetQ cluster receives messages from more than one node, then Out of Memory exception occurs.
          jbertram

          If a HornetQ node is configured with a broadcast-group then it will broadcast discovery information every so often (depending on the configuration), and any node configured with a discovery-group will receive those broadcasts (assuming they're sending/listening on the same multicast address/port) and respond as necessary.  Note that actual "core" or JMS messages are not part of this.
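
          At the socket level a discovery-group is essentially just a multicast listener, so any datagram sent to that address/port is handed to the discovery thread.  A minimal sketch of that receive loop (illustrative only, not the actual implementation in org.hornetq.core.cluster.DiscoveryGroup; the multicast address and port here are assumptions):

          import java.net.DatagramPacket;
          import java.net.InetAddress;
          import java.net.MulticastSocket;

          public class DiscoveryListenerSketch {
              public static void main(String[] args) throws Exception {
                  InetAddress group = InetAddress.getByName("231.7.7.7"); // assumed multicast address
                  MulticastSocket socket = new MulticastSocket(9876);     // assumed multicast port
                  socket.joinGroup(group);
                  byte[] buf = new byte[8192];
                  while (true) {
                      DatagramPacket packet = new DatagramPacket(buf, buf.length);
                      socket.receive(packet);
                      // Every datagram on this address/port lands here, whether or not
                      // it came from a HornetQ broadcast-group.
                      System.out.println("received " + packet.getLength() + " bytes from " + packet.getAddress());
                  }
              }
          }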

           

          I've never seen an OOME from the discovery-group thread.  Are you using -XX:+HeapDumpOnOutOfMemoryError to get the heap dump when the OOME occurs?

           

          Any additional details you can provide about the use-case (including server and client configuration) would be helpful.

          • 2. Re: If HornetQ cluster receives messages from more than one node, then Out of Memory exception occurs.
            meabhi007

            Hi Justin,

            Yes, HornetQ is configured with a broadcast-group, and I am using -XX:+HeapDumpOnOutOfMemoryError to get the heap dump.

             

            Following is the messaging subsystem configuration on the messaging server:

             

            =============

            <subsystem xmlns="urn:jboss:domain:messaging:2.0">

                        <hornetq-server>

                            <persistence-enabled>true</persistence-enabled>

                            <security-enabled>false</security-enabled>

                            <journal-type>NIO</journal-type>

                            <journal-file-size>102400</journal-file-size>

                            <journal-min-files>2</journal-min-files>

                            <shared-store>false</shared-store>

                            <backup>false</backup>

                            <allow-failback>true</allow-failback>

                            <failover-on-shutdown>true</failover-on-shutdown>

                            <check-for-live-server>true</check-for-live-server>

                            <backup-group-name>${messaging.backup.group.a:backup-group-1}</backup-group-name>

                            <connectors>

                                <netty-connector name="netty" socket-binding="messaging"/>

                 <netty-connector name="netty-throughput" socket-binding="messaging-throughput">

                       <param key="batch-delay" value="50"/>

                                </netty-connector>

                                <http-connector name="http-connector" socket-binding="http">

                                    <param key="http-upgrade-endpoint" value="http-acceptor"/>

                                </http-connector>

                                <http-connector name="http-connector-throughput" socket-binding="http">

                                    <param key="http-upgrade-endpoint" value="http-acceptor-throughput"/>

                                    <param key="batch-delay" value="50"/>

                                </http-connector>

                                <in-vm-connector name="in-vm" server-id="0"/>

                            </connectors>

                            <acceptors>

                                <netty-acceptor name="netty" socket-binding="messaging"/>

                 <netty-acceptor name="netty-throughput" socket-binding="messaging-throughput">

                       <param key="batch-delay" value="50"/>

                       <param key="direct-deliver" value="false"/>

                                </netty-acceptor>

                                <http-acceptor name="http-acceptor" http-listener="default"/>

                                <http-acceptor name="http-acceptor-throughput" http-listener="default">

                                    <param key="batch-delay" value="50"/>

                                    <param key="direct-deliver" value="false"/>

                                </http-acceptor>

                                <in-vm-acceptor name="in-vm" server-id="0"/>

                            </acceptors>

                            <broadcast-groups>

                                <broadcast-group name="bg-group1">

                                    <socket-binding>messaging-group</socket-binding>

                                    <connector-ref>netty</connector-ref>

                                </broadcast-group>

                            </broadcast-groups>

                            <discovery-groups>

                                <discovery-group name="dg-group1">

                                    <socket-binding>messaging-group</socket-binding>

                                </discovery-group>

                            </discovery-groups>

                            <cluster-connections>

                                <cluster-connection name="my-cluster">

                                    <address>jms</address>

                                    <connector-ref>netty</connector-ref>

                                    <discovery-group-ref discovery-group-name="dg-group1"/>

                                </cluster-connection>

                            </cluster-connections>

                            <security-settings>

                                <security-setting match="#">

                                    <permission type="send" roles="guest"/>

                                    <permission type="consume" roles="guest"/>

                                    <permission type="createNonDurableQueue" roles="guest"/>

                                    <permission type="deleteNonDurableQueue" roles="guest"/>

                                </security-setting>

                            </security-settings>

                            <address-settings>

                                <!--default for catch all-->

                                <address-setting match="#">

                                    <dead-letter-address>jms.queue.DLQ</dead-letter-address>

                                    <expiry-address>jms.queue.ExpiryQueue</expiry-address>

                                    <redelivery-delay>0</redelivery-delay>

                                    <max-size-bytes>10485760</max-size-bytes>

                                    <address-full-policy>PAGE</address-full-policy>

                                    <page-size-bytes>2097152</page-size-bytes>

                                    <message-counter-history-day-limit>10</message-counter-history-day-limit>

                                </address-setting>

                            </address-settings>

                            <jms-connection-factories>

                                <connection-factory name="InVmConnectionFactory">

                                    <connectors>

                                        <connector-ref connector-name="in-vm"/>

                                    </connectors>

                                    <entries>

                                        <entry name="java:/ConnectionFactory"/>

                                    </entries>

                                    <connection-ttl>600000</connection-ttl>

                                    <client-failure-check-period>120000</client-failure-check-period>

                                </connection-factory>

                                <connection-factory name="RemoteConnectionFactory">

                                    <connectors>

                                        <connector-ref connector-name="netty"/>

                                    </connectors>

                                    <entries>

                                        <entry name="RemoteConnectionFactory"/>

                                        <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>

                                    </entries>

                                    <ha>true</ha>

                                    <block-on-acknowledge>true</block-on-acknowledge>

                                    <reconnect-attempts>-1</reconnect-attempts>

                                    <connection-ttl>600000</connection-ttl>

                                    <client-failure-check-period>120000</client-failure-check-period>

                                </connection-factory>

                                <connection-factory name="testConnectionFactory">

                                    <connectors>

                                        <connector-ref connector-name="netty" backup-connector-name="netty"/>

                                    </connectors>

                                    <entries>

                                        <entry name="testConnectionFactory"/>

                                        <entry name="java:jboss/exported/testConnectionFactory"/>

                                    </entries>

                                    <ha>true</ha>

                                    <block-on-acknowledge>true</block-on-acknowledge>

                                    <reconnect-attempts>-1</reconnect-attempts>

                                    <connection-ttl>600000</connection-ttl>

                                    <client-failure-check-period>120000</client-failure-check-period>

                                </connection-factory>

                                <pooled-connection-factory name="hornetq-ra">

                                    <transaction mode="xa"/>

                                    <consumer-window-size>0</consumer-window-size>

                                    <connectors>

                                        <connector-ref connector-name="in-vm"/>

                                    </connectors>

                                    <entries>

                                        <entry name="java:/JmsXA"/>

                                    </entries>

                                    <reconnect-attempts>-1</reconnect-attempts>

                                    <connection-ttl>600000</connection-ttl>

                                    <client-failure-check-period>120000</client-failure-check-period>

                                </pooled-connection-factory>

                            </jms-connection-factories>

                            <jms-destinations>

                                <jms-queue name="report">

                                    <entry name="report"/>

                                    <entry name="java:jboss/exported/report"/>

                                </jms-queue>

                                <jms-topic name="indexdata">

                                    <entry name="indexdata"/>

                                    <entry name="java:jboss/exported/indexdata"/>

                                </jms-topic>

                            </jms-destinations>

                        </hornetq-server>

                    </subsystem>

            ==========

             

            I am using a Java utility to generate the additional multicast messages on the same multicast socket; generating even 10-20 additional messages on that socket causes a heap dump to be generated on the messaging server.
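
            For reference, the utility boils down to something like the following sketch (this is not the attached MulticastTest source itself; the multicast address, port, and payload are placeholders that just have to match the cluster's messaging-group socket binding):

            =============

            import java.net.DatagramPacket;
            import java.net.DatagramSocket;
            import java.net.InetAddress;

            public class MulticastSenderSketch {
                public static void main(String[] args) throws Exception {
                    InetAddress group = InetAddress.getByName("231.7.7.7"); // assumed multicast address
                    int port = 9876;                                        // assumed multicast port
                    byte[] payload = "not a hornetq broadcast".getBytes("UTF-8");
                    DatagramSocket socket = new DatagramSocket();
                    // 10-20 datagrams were enough to trigger the heap dump in our tests.
                    for (int i = 0; i < 20; i++) {
                        socket.send(new DatagramPacket(payload, payload.length, group, port));
                        Thread.sleep(100);
                    }
                    socket.close();
                }
            }

            ==========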

            • 3. Re: If HornetQ cluster receives messages from more than one node, then Out of Memory exception occurs.
              jbertram

              Why are you generating additional multicast messages on the same multicast address/port where HornetQ is listening for broadcast information?

              • 4. Re: If HornetQ cluster receives messages from more than one node, then Out of Memory exception occurs.
                meabhi007

                Our QA team reported the OOM issue on the messaging server; while troubleshooting, we observed that it can happen when more than one JMS server node (from different clusters) is broadcasting on the same multicast socket.

                To reproduce it quickly, we generated messages from a utility and managed to reproduce the issue.

                 

                Now we are looking for answers to the following questions:

                1. Is there a configuration in WildFly/HornetQ so that the discovery group can filter messages coming from an additional source? This would make sure a cluster node isn't affected even when another cluster's broadcaster is using the same multicast socket.

                2. Throwing an OOM exception in this case seems inappropriate and confusing; rather, the HornetQ code should handle it gracefully and log a statement that it is receiving messages from sources that are not part of this cluster.

                • 5. Re: If HornetQ cluster receives messages from more than one node, then Out of Memory exception occurs.
                  jbertram

                  Clusters should be isolated from each other by using a unique multicast address/port (i.e., a distinct messaging-group socket-binding per cluster).  If you have multiple different clusters using the same multicast address/port, then I consider that a misconfiguration.  There is no way to "filter" unwanted broadcasts.

                   

                  The OOME thrown in this instance isn't coming from HornetQ itself.  It's coming from the JVM in response to a call in HornetQ which requires more memory.
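
                  Looking at the stack trace, my guess at the mechanism (an assumption for illustration, not a quote of the HornetQ source) is that readSimpleStringInternal reads a length prefix from the datagram and then allocates an array of that size.  Four arbitrary bytes interpreted as an int can demand gigabytes in a single allocation, which fails on any typical heap even though live usage stays tiny:

                  import java.nio.ByteBuffer;

                  public class BogusLengthSketch {
                      // Sketch of a length-prefixed string read, the pattern the stack trace points at.
                      static String readPrefixedString(ByteBuffer buf) {
                          int len = buf.getInt();      // foreign bytes interpreted as a length
                          byte[] data = new byte[len]; // one huge allocation => java.lang.OutOfMemoryError
                          buf.get(data);
                          return new String(data);
                      }

                      public static void main(String[] args) throws Exception {
                          // "not " read as a big-endian int is 0x6E6F7420, roughly 1.85 billion,
                          // so the allocation above fails long before the heap is actually full.
                          ByteBuffer bogus = ByteBuffer.wrap("not a hornetq broadcast".getBytes("UTF-8"));
                          readPrefixedString(bogus);
                      }
                  }

                  That would also explain why your heap dump shows only 60-70 MB of live data: the failure is one oversized allocation request, not a genuinely full heap.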

                   

                  There is no information in the message indicating which cluster the message is for, so there is no way for the receiver to know it isn't intended for it.  That is why each cluster should use a unique multicast address/port.

                   

                  Can you provide me with a test-case I can use to reproduce the OOME?

                  • 6. Re: If HornetQ cluster receives messages from more than one node, then Out of Memory exception occurs.
                    meabhi007

                    Hi Justin,

                     

                    Thanks for your updates.

                    I guess my question has already been answered by your last response.

                    To reproduce the OOM issue, you can use the attached MulticastTest.rar utility to generate multicast messages on the socket where a cluster is active.