3 Replies Latest reply on Sep 3, 2015 1:15 PM by jbertram

    Lost HornetQ messages

    venom27

      We have a big issue with HornetQ under WildFly 8.2.0.Final.

      JMS messages are sometimes lost, without any errors or notifications.

      We have a cluster of 21 WildFly nodes. Every node has only one deployed app. Most applications are 'single-noded'; several are deployed on two nodes for scalability. The config is the same for all nodes.

      It is almost entirely the stock standalone-full-ha.xml. The HornetQ config is below:

              <subsystem xmlns="urn:jboss:domain:messaging:2.0">
                  <hornetq-server>
                      <security-enabled>false</security-enabled>
                      <cluster-user>jmscluster</cluster-user>
                      <cluster-password>${jboss.messaging.cluster.password:R2xld2VrRGltZWw3}</cluster-password>
                      <journal-file-size>102400</journal-file-size>

                      <connectors>
                          <http-connector name="http-connector" socket-binding="http">
                              <param key="http-upgrade-endpoint" value="http-acceptor"/>
                          </http-connector>
                          <http-connector name="http-connector-throughput" socket-binding="http">
                              <param key="http-upgrade-endpoint" value="http-acceptor-throughput"/>
                              <param key="batch-delay" value="50"/>
                          </http-connector>
                          <in-vm-connector name="in-vm" server-id="0"/>
                      </connectors>

                      <acceptors>
                          <http-acceptor http-listener="default" name="http-acceptor"/>
                          <http-acceptor http-listener="default" name="http-acceptor-throughput">
                              <param key="batch-delay" value="50"/>
                              <param key="direct-deliver" value="false"/>
                          </http-acceptor>
                          <in-vm-acceptor name="in-vm" server-id="0"/>
                      </acceptors>

                      <broadcast-groups>
                          <broadcast-group name="bg-group1">
                              <socket-binding>messaging-group</socket-binding>
                              <connector-ref>http-connector</connector-ref>
                          </broadcast-group>
                      </broadcast-groups>

                      <discovery-groups>
                          <discovery-group name="dg-group1">
                              <socket-binding>messaging-group</socket-binding>
                          </discovery-group>
                      </discovery-groups>

                      <cluster-connections>
                          <cluster-connection name="my-cluster">
                              <address>jms</address>
                              <connector-ref>http-connector</connector-ref>
                              <discovery-group-ref discovery-group-name="dg-group1"/>
                          </cluster-connection>
                      </cluster-connections>

                      <security-settings>
                          <security-setting match="#">
                              <permission type="send" roles="guest"/>
                              <permission type="consume" roles="guest"/>
                              <permission type="createNonDurableQueue" roles="guest"/>
                              <permission type="deleteNonDurableQueue" roles="guest"/>
                          </security-setting>
                      </security-settings>

                      <address-settings>
                          <address-setting match="#">
                              <dead-letter-address>jms.queue.DLQ</dead-letter-address>
                              <expiry-address>jms.queue.ExpiryQueue</expiry-address>
                              <max-size-bytes>10485760</max-size-bytes>
                              <page-size-bytes>2097152</page-size-bytes>
                              <message-counter-history-day-limit>10</message-counter-history-day-limit>
                              <redistribution-delay>1000</redistribution-delay>
                          </address-setting>
                      </address-settings>

                      <jms-connection-factories>
                          <connection-factory name="InVmConnectionFactory">
                              <connectors>
                                  <connector-ref connector-name="in-vm"/>
                              </connectors>
                              <entries>
                                  <entry name="java:/ConnectionFactory"/>
                              </entries>
                          </connection-factory>
                          <connection-factory name="RemoteConnectionFactory">
                              <connectors>
                                  <connector-ref connector-name="http-connector"/>
                              </connectors>
                              <entries>
                                  <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>
                              </entries>
                              <ha>true</ha>
                              <block-on-acknowledge>true</block-on-acknowledge>
                              <reconnect-attempts>-1</reconnect-attempts>
                          </connection-factory>
                          <pooled-connection-factory name="hornetq-ra">
                              <transaction mode="xa"/>
                              <connectors>
                                  <connector-ref connector-name="in-vm"/>
                              </connectors>
                              <entries>
                                  <entry name="java:/JmsXA"/>
                                  <entry name="java:jboss/DefaultJMSConnectionFactory"/>
                              </entries>
                          </pooled-connection-factory>
                      </jms-connection-factories>

                      <jms-destinations>
                              ...
                      </jms-destinations>
                  </hornetq-server>
              </subsystem>

       

      Each server starts like this:

      java -D[Standalone] -Xms128m -Xmx1024m -XX:MaxPermSize=512m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -server -XX:MaxDirectMemorySize=256M -XX:+UseThreadPriorities -XX:+AggressiveOpts -XX:+UseBiasedLocking -XX:+UseFastAccessorMethods -XX:+UseCompressedOops -XX:+OptimizeStringConcat -XX:+UseStringCache -XX:+UseCodeCacheFlushing -XX:+UseLargePages -XX:+UseG1GC -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -Dorg.jboss.boot.log.file=/opt/wildfly/standalone/log/server.log -Dlogging.configuration=file:/opt/wildfly/standalone/configuration/logging.properties -jar /opt/wildfly/jboss-modules.jar -mp /opt/wildfly/modules org.jboss.as.standalone -Djboss.home.dir=/opt/wildfly -Djboss.server.base.dir=/opt/wildfly/standalone -c standalone-full-ha.xml -Djboss.bind.address=0.0.0.0 -Djboss.bind.address.unsecure=0.0.0.0 -Djboss.bind.address.management=0.0.0.0 -Djboss.messaging.group.address=231.7.7.9 -Djboss.node.name=some-srv2 -Djboss.default.multicast.address=230.0.44.195

      where jboss.messaging.group.address has the same value for all nodes, and jboss.default.multicast.address is unique per node. Servers running the same application share the same subnet (230.0.44.195 and 230.0.44.35, for example).
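
      For reference, in the stock standalone-full-ha.xml those two properties feed the multicast socket bindings roughly like this (the values shown are the stock defaults; check your copy):

          <socket-binding name="messaging-group" port="0" multicast-address="${jboss.messaging.group.address:231.7.7.4}" multicast-port="${jboss.messaging.group.port:9876}"/>
          <socket-binding name="jgroups-udp" port="55200" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="45688"/>

      so messaging-group drives HornetQ broadcast/discovery (and has to match across nodes that should find each other), while the default multicast address drives the JGroups channels.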

      On the Java side we use the hornetq-ra connection factory, with the usual Java EE MDB classes for receiving and JMSContext for sending.

      In addition, we sometimes get errors like this on server startup:

      ERROR [Thread-5 (HornetQ-server-HornetQServerImpl::serverUUID=95088f81-5139-11e5-b318-6fe8ac3312f9-171344418)] [] [core.client] HQ214016: Failed to create netty connection: java.nio.channels.UnresolvedAddressException

      On every cluster start the issue appears on different nodes. We tried to figure out a proper server start sequence as a workaround; nothing helps. With 5 nodes the cluster usually works fine. Any help will be appreciated.

      Update

      We 'detect' lost messages via the logs. We have a common MessageSender which, in short, looks like this:

      public void sendEvent(Event event) {
          try (JMSContext jmsContext = connectionFactory.createContext()) {
              jmsContext.createProducer().send(destination, event);
              logger.info("Event with type [" + event.getType() + "] successfully sent to destination: " + destination);
          }
      }
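
      (Worth noting, since hornetq-ra is XA-pooled and JTA transactions are used, see the 'BTW' note below: a send made inside a JTA transaction is only handed to HornetQ when that transaction commits, so the "successfully sent" log line fires before the commit, and a later rollback would look exactly like a lost message. A sketch of logging the actual transaction outcome instead, assuming the send runs inside a JTA transaction; the TransactionSynchronizationRegistry injection is standard Java EE (javax.transaction), while the other names match the code above:)

      @Resource
      private TransactionSynchronizationRegistry txRegistry;

      public void sendEvent(final Event event) {
          try (JMSContext jmsContext = connectionFactory.createContext()) {
              jmsContext.createProducer().send(destination, event);
              // The XA send becomes visible only when the JTA transaction commits,
              // so report the outcome after completion rather than right after send().
              txRegistry.registerInterposedSynchronization(new Synchronization() {
                  @Override
                  public void beforeCompletion() {
                  }

                  @Override
                  public void afterCompletion(int status) {
                      logger.info("Event with type [" + event.getType() + "] completed with tx status " + status
                              + " (" + Status.STATUS_COMMITTED + " = committed) for destination: " + destination);
                  }
              });
          }
      }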

       

      And the MDB on the other side receives the message like this:

       

      @MessageDriven(name = "AppMDB", activationConfig = {
          @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
          @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "java:jms/queue/someInQueue"),
          @ActivationConfigProperty(propertyName = "minSession", propertyValue = "1"),
          @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "4")
      })
      public class AppMDB implements MessageListener {

          @Override
          @EventEntryPoint
          public void onMessage(Message message) {
              logger.info("App received incoming event: " + message);
              ...
          }
      }

       

      I see the successful send log entry on one side and no 'received' entry on the other. That is how we detect the 'lost' ones.

       

      BTW,

      • The cluster is under Amazon AWS
      • JTA transactions are used
        • 1. Re: Lost HornetQ messages
          jbertram

          Couple of things:

          1. You mention lost messages but then you don't explain the circumstances in which you observe the lost messages.  More detail on that would be helpful if you want some feedback on that specific point.
          2. You say that most apps are bound to a single node yet you are clustering all your nodes together.  Can you explain the rationale behind that?
          3. Can you include the stack-trace for the UnresolvedAddressException you're observing?
          4. I see that you're binding your servers so that they listen on 0.0.0.0, but I don't see any evidence that you're changing your http-connector to use a specific IP address rather than 0.0.0.0. Remember, 0.0.0.0 is useless for a client trying to connect to a remote server (it is the wildcard address, not something a remote client can actually route to). I think it's likely that your nodes aren't even actually clustering properly due to this issue.
          • 2. Re: Lost HornetQ messages
            venom27

            Hi, thanks for help!

            1. Updated original post with some code samples

            2. Well, we have several apps which differ in business logic (SOA). All of them use HornetQ as the MOM to send/receive messages to/from each other, and all of them live in one cluster. Depending on load we can add or remove nodes at any time (horizontal scalability). Hope that answers your question.

            3. Sure

            ERROR [org.hornetq.core.client] (Thread-17 (HornetQ-server-HornetQServerImpl::serverUUID=8614850f-51ff-11e5-9833-2f12f8df8fc9-99220722)) HQ214016: Failed to create netty connection: java.nio.channels.UnresolvedAddressException
                    at sun.nio.ch.Net.checkAddress(Net.java:107) [rt.jar:1.7.0_80]
                    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:649) [rt.jar:1.7.0_80]
                    at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:176) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:169) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelPipeline$HeadHandler.connect(DefaultChannelPipeline.java:1008) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelHandlerContext.invokeConnect(DefaultChannelHandlerContext.java:495) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:480) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.CombinedChannelDuplexHandler.connect(CombinedChannelDuplexHandler.java:168) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelHandlerContext.invokeConnect(DefaultChannelHandlerContext.java:495) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:480) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelHandlerContext.invokeConnect(DefaultChannelHandlerContext.java:495) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:480) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:465) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:847) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:199) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:165) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:354) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty-all-4.0.15.Final.jar:4.0.15.Final]
                    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_80]

            4. We thought about this point, and we actually have some open questions here. As I said before, we are on Amazon AWS, where multicast isn't supported by default. To work around this restriction we added one more interface (call it edge0) to all VMs; it supports multicast and broadcast and effectively acts as a switch connecting all the WildFly nodes. So now each VM has two interfaces: eth0, the default one, which can be reached from outside, and edge0, an internal one that cannot be exposed outside. When we bind to the eth0 IP there is no multicast and everything falls down; when we bind to the edge0 IP, WildFly's web UI and the load balancers fail. So we left it at 0.0.0.0.
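
            One thing we considered, just as a sketch (the interface names and the edge0 address here are illustrative), was splitting the bindings in standalone-full-ha.xml so the web UI and balancers stay on eth0/0.0.0.0 while only the multicast binding uses edge0:

                <interfaces>
                    <interface name="public">
                        <!-- eth0 / wildcard: web UI and load balancers keep working -->
                        <inet-address value="${jboss.bind.address:0.0.0.0}"/>
                    </interface>
                    <interface name="internal">
                        <!-- edge0: the multicast-capable internal NIC; address illustrative -->
                        <inet-address value="${jboss.bind.address.internal:10.0.1.5}"/>
                    </interface>
                </interfaces>
                ...
                <socket-binding name="messaging-group" interface="internal" port="0" multicast-address="${jboss.messaging.group.address:231.7.7.9}" multicast-port="${jboss.messaging.group.port:9876}"/>

            but we haven't verified that this also fixes what the connectors advertise.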

            • 3. Re: Lost HornetQ messages
              jbertram

              On the actual message loss I'd need a reproducible test-case to investigate further.  I don't see anything which might indicate a problem.
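
              One low-tech way to narrow it down is to browse the queue on each node and check whether the 'lost' messages are actually parked on a node that has no consumer (a redistribution problem looks exactly like message loss from the application's point of view). A sketch of a standalone client; the node address and the queue lookup name are illustrative, and the queue would need to be exported under java:jboss/exported for a remote lookup to work:

                  import java.util.Enumeration;
                  import java.util.Properties;
                  import javax.jms.ConnectionFactory;
                  import javax.jms.JMSContext;
                  import javax.jms.Queue;
                  import javax.jms.QueueBrowser;
                  import javax.naming.Context;
                  import javax.naming.InitialContext;

                  public class QueueDepthCheck {
                      public static void main(String[] args) throws Exception {
                          Properties env = new Properties();
                          env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jboss.naming.remote.client.InitialContextFactory");
                          env.put(Context.PROVIDER_URL, "http-remoting://10.0.1.5:8080"); // node address illustrative
                          InitialContext ctx = new InitialContext(env);
                          ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/RemoteConnectionFactory");
                          Queue queue = (Queue) ctx.lookup("jms/queue/someInQueue"); // lookup name illustrative
                          try (JMSContext jms = cf.createContext()) {
                              // Browsing is non-destructive: it counts messages without consuming them.
                              QueueBrowser browser = jms.createBrowser(queue);
                              int count = 0;
                              for (Enumeration<?> e = browser.getEnumeration(); e.hasMoreElements(); e.nextElement()) {
                                  count++;
                              }
                              System.out.println("Messages parked on this node: " + count);
                          }
                      }
                  }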

               

              The UnresolvedAddressException looks like a configuration and/or environmental issue.

               

              I certainly understand that using 0.0.0.0 is necessary in some use cases, but in any case it's not a viable configuration for your connector(s) (and you should even see a message in the log saying so). I suggest you configure the connectors with a real IP address, whatever that may be, for example:
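
              For instance (the address is illustrative), starting each node with its real private IP instead of the wildcard makes the host the connectors advertise resolvable from the other nodes:

                  java ... -Djboss.bind.address=10.0.1.5 -Djboss.bind.address.management=10.0.1.5 ...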

               

              I also suggest you use a clustering mechanism that's suitable for AWS, like JGroups' S3_PING, which can be used by HornetQ; see the sketch below.
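
              For reference, a sketch of what that could look like; the stack layout is abbreviated, the S3_PING property names follow JGroups 3.x, and the bucket and credentials are placeholders:

                  <!-- jgroups subsystem: a TCP-based stack with S3_PING discovery instead of multicast -->
                  <stack name="s3">
                      <transport type="TCP" socket-binding="jgroups-tcp"/>
                      <protocol type="S3_PING">
                          <property name="location">my-discovery-bucket</property>
                          <property name="access_key">...</property>
                          <property name="secret_access_key">...</property>
                      </protocol>
                      ...
                  </stack>

                  <!-- messaging subsystem: point the broadcast/discovery groups at the JGroups stack instead of the UDP socket binding -->
                  <broadcast-group name="bg-group1">
                      <jgroups-stack>s3</jgroups-stack>
                      <jgroups-channel>hq-cluster</jgroups-channel>
                      <connector-ref>http-connector</connector-ref>
                  </broadcast-group>
                  <discovery-group name="dg-group1">
                      <jgroups-stack>s3</jgroups-stack>
                      <jgroups-channel>hq-cluster</jgroups-channel>
                  </discovery-group>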