14 Replies Latest reply on Jul 22, 2016 2:51 AM by mnovak

    WF8.1/HornetQ2.4.7 Randomly blocking of random channels

    frankthetank

      Hello all!

       

      I am having a strange issue with random blocking of random topic queues. It is always the same topic as it is my messaging backbone.

      We started noticing this problem when we would get random OutOfMemory crashes and after investigations noticed that it was a topic queue filling up until the memory was exhausted.

      We do run on a very limited hardware (Quadcore ARMv7, 2gbRam) and the system will run fine during my performance tests (load average: 4.98, 4.85, 4.44, stable 68% Mem) but then it will randomly have issues:

       

      2016-07-14 09:10:06,776 WARN  [org.hornetq.core.client] (Thread-1683) HQ212054: Destination address=jms.topicMyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

       

      It can run for days doing the same tests (every 5 minutes we do something that 'floods' the system) and it works fine and then out of the blue....

      There are no known outside influencers and the system has the same set-up all the time. Only slight differences in monitoring & diagnostic tools (we use the jbosscli to gain info).

       

      I also really occasionally get this warning:

      HQ222172: Queue jms.queue.QueueName was busy for more than 10,000 milliseconds. There are possibly consumers hanging on a network operation

       

      The Topic itself is set to not use persistence, DLQ or paging. I did not remove the entries for journal though. Did not think it would hurt if configured but not used.

      Connector and acceptors are set to in-vm.

       

      The first time it was a listener that used a Wildcard in the selector, but after simply adding in multiple listeners for it, the issue remains.

      So far it has also been different Topic/Queues that were filling up. The subsequent handlers all do (more or less) the same basic things and write to the same Infinispan Cache.

      The listeners are a mix of annotated and programmatically created (those wildcard replacements) versions. I have seen the issue on both.

       

      On the occasion I was able to check on a system that was starting to fill up, I saw that there was a consumer, it just seemingly did not do anything.

       

      Though I do see two differences:

      a) a topics queue will fill up until it consumes all the memory and crashes. With the tools I was able to see the queue and the amount of messages it had until it busted.

        Clearly visible with large amount of log warnings.

      b) a topics queue will fill up, but then seemingly empty out again.

        Here I only see a entries for only a short time.

       

      I have two monitors running:

      one checks the delivering count of the topic

      another iterates through all queues of the topic and checks the message count. If it is > 0, it prints out more details

      both use jbosscli remotely which, while remote is faster then local because ARM, it is still kinda slow.

       

      Any ideas what to look for or how to improve my monitoring & diagnostics?

      Anything else needed for help?

       

      I saw in this Thread the mention of ActiveMQ. Should I switch over?

       

      Note: I do have an issue with Oracle's ARMv7 HF JVM segfaulting for unknown reasons. Could be related but I am not betting on it.

        • 1. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
          mnovak

          Hi Frank,

           

          I guess you have configured BLOCK address-full-policy to avoid paging. By default HornetQ takes 10MB of memory per queue/topic before it starts to block producers. You say you have 2GB of memory available. So if you have more than 200 queues or topics then it's possible that server fails on OutOfMemoryError if all the queues get full.

           

          Another problem might be that you have JMS client with transacted session and you send 1000s of normal messages in one transaction. Before committing the session they're all held memory. In this way server could be exhausted on memory as well.

           

          This brings me to another idea that you have many producers with large sessions as described above which together blow up the server memory.

           

          Cheers,

          Mirek

          • 2. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
            frankthetank

            Thanks for the reply, but first a quick update:

             

            So over the weekend I had the issue again. Actually I am working on it right now.

            Just one queue filling up even though all queues get the same amount of messages and doing the same amount of processing and all others (51 for this Topic) are working fine.

             

            dump via jboss-cli:

            6dc9d19f-34b9-45e8-a784-22c4ddccdfea : 11513

              consumer-count :     "result" => 1

              delivering-count :     "result" => 1

              filter :     "result" => "message_type = 'event_property_change' AND ( resource_url LIKE 'services/%/blah/something')"

              dead-letter-address :     "result" => undefined,

              durable :     "result" => false

              expiry-address :     "result" => undefined,

              id :     "result" => 119L

              messages-added :     "result" => 61469L

              paused :     "result" => false

              queue-address :     "result" => "jms.topic.MyChannel"

              scheduled-count :     "result" => 0L

              temporary :     "result" => true

             

            I enabled logging for the MDB via jboss-cli and see no logs being produced which indicates to me that the MDB stopped handling of messages.

            I cross-checked with the amount of subscribers on the Topic an the number is correct.

             

            The MDB has the @MessageDriven annotation and also has

            @TransactionManagement(value = TransactionManagementType.CONTAINER)

            @TransactionAttribute(value = TransactionAttributeType.NOT_SUPPORTED)

             

            @Mirek:

            Is this what you meant with transactions? I also have transactions disabled at the producer side, which is a Singleton/Startup that creates the connection & session during creation.

             

            I also have the MDB set to "useLocalTx", propertyValue = "true". Dunno if that can conflict with the above entries.

             

            I also left everything at BLOCK and such as I want to find the culprit and reason for the failures.

            Actually I think the best resolution would be to add the TTL so that the older messages get dumped, seeing that they will be replaced soon anyway.

            Mind you, until it is actually full, I do not get the "blahblah is blocked" message! At least I am not getting it now.

             

            Or if I totally missed what you are asking, pls elaborate so I can provided the needed information.

            Thanks!

             

            I have also thought about the GC being a culprit. Since the current instance is flooding again, I will restart and add in GC logging.

            Could also be our infamous NIC driver. Had some issues before. Hope this is not a Heisenbug.

             

            Update:

            The deployment claims delivery is active for this MDB:

            "BD.CBMEL" => {

                            "component-class-name" => "BD.CBMEL",

                            "declared-roles" => [],

                            "delivery-active" => true,

                            "execution-time" => 0L,

                            "invocations" => 0L,

                            "methods" => {},

                            "peak-concurrent-invocations" => 0L,

                            "pool-available-count" => 30,

                            "pool-create-count" => 1,

                            "pool-current-size" => 1,

                            "pool-max-size" => 30,

                            "pool-name" => "mdb-strict-max-pool",

                            "pool-remove-count" => 0,

                            "run-as-role" => undefined,

                            "security-domain" => "other",

                            "timers" => [],

                            "wait-time" => 0L,

                            "service" => undefined

            }

             

             

             

            A small dump of this weekends results:

            No lines were removed and only the topic name changed to protect the innocent. Sadly we cannot see who really is being filled or exceeding max delivery attempts (but in the latter both are among those queues for this topic)

             

            2016-07-14 13:45:45,923 WARN  [org.hornetq.core.client] (Thread-1765) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-14 17:05:54,093 WARN  [org.hornetq.core.client] (Thread-1759) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-15 15:48:07,943 ERROR [org.hornetq.ra] (Thread-21288 (HornetQ-client-global-threads-17694520)) HQ154004: Failed to deliver message: HornetQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 43]

            2016-07-15 15:48:47,475 WARN  [org.hornetq.core.server] (Thread-3276 (HornetQ-remoting-threads-HornetQServerImpl::serverUUID=1c5caa8d-4913-11e6-8fb8-fdbc4bf27b50-11387154-12030799)) HQ222150: Message has exceeded max delivery attempts. No Dead Letter Address configured for queue 935d5575-936e-4b5b-9cca-5ff2059f0184 so dropping it

            2016-07-15 16:18:06,844 ERROR [org.hornetq.ra] (Thread-21472 (HornetQ-client-global-threads-17694520)) HQ154004: Failed to deliver message: HornetQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 43]

            2016-07-15 16:18:21,456 WARN  [org.hornetq.core.server] (Thread-3331 (HornetQ-remoting-threads-HornetQServerImpl::serverUUID=1c5caa8d-4913-11e6-8fb8-fdbc4bf27b50-11387154-12030799)) HQ222150: Message has exceeded max delivery attempts. No Dead Letter Address configured for queue 93d2f890-45f4-4769-9daa-48d84607214e so dropping it

            2016-07-15 17:56:05,926 WARN  [com.barco.bcm.propertystore.notification.NotificationEmitterImpl] (Thread-22171 (HornetQ-client-global-threads-17694520)) Execption @ sendNotification : javax.jms.JMSException: HQ119014: Timed out waiting for response when sending packet 30

            2016-07-16 04:17:48,212 WARN  [org.hornetq.core.client] (Thread-1809) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-16 19:28:48,267 WARN  [org.hornetq.core.client] (Thread-1788) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-17 06:58:10,900 ERROR [org.hornetq.ra] (Thread-36848 (HornetQ-client-global-threads-17694520)) HQ154004: Failed to deliver message: HornetQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 43]

            2016-07-17 06:58:24,115 WARN  [org.hornetq.core.server] (Thread-5864 (HornetQ-remoting-threads-HornetQServerImpl::serverUUID=1c5caa8d-4913-11e6-8fb8-fdbc4bf27b50-11387154-12030799)) HQ222150: Message has exceeded max delivery attempts. No Dead Letter Address configured for queue 93d2f890-45f4-4769-9daa-48d84607214e so dropping it

            2016-07-17 08:15:49,870 WARN  [org.hornetq.core.client] (Thread-1759) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-17 10:06:08,381 ERROR [org.hornetq.ra] (Thread-38074 (HornetQ-client-global-threads-17694520)) HQ154004: Failed to deliver message: HornetQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 43]

            2016-07-17 10:06:38,399 WARN  [org.hornetq.ra] (Thread-38074 (HornetQ-client-global-threads-17694520)) HQ152002: Unable to roll local transaction back

            2016-07-17 14:27:46,153 WARN  [org.hornetq.core.client] (Thread-1906) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-17 14:27:56,171 WARN  [org.hornetq.core.client] (Thread-1906) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-17 14:28:06,185 WARN  [org.hornetq.core.client] (Thread-1906) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-17 14:28:16,193 WARN  [org.hornetq.core.client] (Thread-1906) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-17 14:28:26,205 WARN  [org.hornetq.core.client] (Thread-1906) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-17 14:28:36,213 WARN  [org.hornetq.core.client] (Thread-1906) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-17 14:28:46,226 WARN  [org.hornetq.core.client] (Thread-1906) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            2016-07-18 04:38:48,311 WARN  [org.hornetq.core.client] (Thread-2004) HQ212054: Destination address=jms.topic.MyChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

            • 3. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
              mnovak

              Hi Frank,

               

              I guess the best way to proceed is make heap dump when OutOfMemoryError happens. Could you add "-XX:+HeapDumpOnOutOfMemoryError" to JAVA_OPTS when starting the server (it can be added to $JBOSS_HOME/bin/standalone.conf)?

               

              Then could you create a few heap dumps during normal operation. By comparing several heap dumps (for example in visualvm) we can find which objects are growing and find what's going wrong. Heap dump can be created by:

              <JAVA_HOME>/bin/jmap -dump:format=b,file=c:\temp\heap.hprof <PID>

               

              Could you share your standalone...xml file?

               

              What is MDB doing? It's reading message from topic? Does it call some EJB?

               

              Thanks,

              Mirek

              • 4. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                frankthetank

                While I am assembling the data you requested, a quick answer to "What the MDB is doing?"

                We have a central property storage where the devices we monitor store their properties on a per-device basis. We use an infiniscan cache with each device getting its own region. The property gets prepended with "service/UID/" when placed into the storage.

                If a property changed, we send out a message containing the updated property.

                On the other side of the topic we have individual listeners/handlers for various, what we call group specific, data. One each, nothing duplicated/concurrently listening for the same.

                What most, as in the example above, do is to see if all the devices have the same values (via the storage) and dump the result as a property into the storage as a 'group' data (again in it's own region).

                Again, creates message.

                 

                All the listeners that have misbehaved select via the wildcard "service/%/property/url", where the wildcard simply stands for 'from any device'. They also only have 1 thread each.

                 

                Currently I am running with 48 simulated devices with 150 properties each. Though the special listeners only listen for aprox. 40ish properties. To create traffic I toggle between disconnecting and re-establishing the monitoring connection to the devices every 5 minutes. This is more then worst case but it helps finding issues.

                 

                Here is a dropbox to the required files:

                dump.hprof pwd = Topic url excluding # and beyond

                standalone.xml

                standalone.conf just in case its needed (the GC stuff was added after the last run)

                • 5. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                  mnovak

                  Thanks Frank for the update!

                   

                  You mentioned that you're using message selectors. I'm not sure where they're used. If there are 40 properties which you're interested then is it that there are 40 MDBs where each has message selector for one property?

                  • 6. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                    frankthetank

                    Mirek, yes correct.

                    This way basically gave me the best value on my limited hardware to get the desired throughput.

                    Earlier versions used more wildcarded selectors but that caused congestion and using multiple threads could cause race conditions.

                     

                    Example would be:

                    "message_type = 'event_property_change' AND ( resource_url LIKE 'services/%/something/blah')"

                    • 7. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                      frankthetank

                      Sorry, used the wrong password for the dump. Re-uploading now.

                      • 8. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                        mnovak

                        Thanks Frank! Another question

                         

                        Topic is working in a way that if you send a message to topic then all subscribers will get this message. So if there are 40 MDBs on topic then there are 40 subscriptions. You can imagine subscription as normal queue with just one consumer (=subscriber). When message is send to topic then all the subscriptions/queues get reference to this message. When all MDBs receives/processes this message then the message itself is deleted because there are 0 references to it.

                         

                        If MDB is receiving only some messages which matches the message selector then there must stay some messages in the subscription. Those are messages which do not match message selector of the MDB and are never consumed. Do you clean up those messages?

                         

                        You can check number of messages for each subscription in CLI:

                        /subsystem=messaging/hornetq-server=default/runtime-queue=...<some string from subscription name and connection id of MDB>:count-messages()

                         

                        Not sure if use of queue instead of topic would solve this problem. Anyway this is something what does not explain OutOfMemoryError. Dump will help.

                        • 9. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                          frankthetank

                          Thanks Mirek. I really appreciate you help!

                           

                          @Topics: really? I thought they worked in such a way that if you send a message it checks to see who wants the message, gives it to them and if no one else wants it, it gets dropped.

                          I.e. the publish/subscribe pattern.

                          At least the behaviour so far would support it as, while I see it rise and fall with the delivering-count, I see no 'left over' messages.

                           

                          But would that not only make sense if any kind of congestion was on the Topic and not in one of the Queues selecting from the topic? All of the 39 others work fine with no congestion visible ...

                          Then again, my monitor was only tracking delivering-count, not message-count. Let me retest and get back on that.

                           

                          Anyway, looks like I have to read up on my HornetQ documentation.

                           

                          Do you know to get from the MDB to the queue-id it uses? Totally clueless how to get that. Currently I just take the filter to figure out who is responsible. Sadly that changes every time.

                          • 10. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                            mnovak

                            Well, now i'm confused as well but I hope that i'm right with the topic:-)

                             

                            I checked the queue-id thing. If activation config property cliendID and subscriptionName is specified for MDB like:

                            @MessageDriven(name = "mdb",

                              activationConfig = {

                            @ActivationConfigProperty(propertyName = "subscriptionName", propertyValue = "mySubscription"),

                            @ActivationConfigProperty(propertyName = "clientID", propertyValue = "myClientId"),

                            ...})

                             

                            then the name of the runtime-queue in CLI is:

                            /subsystem=messaging-activemq/server=default/runtime-queue=myClientId.mySubscription

                             

                            If cliendID and subscriptionName is not set then random value is generated.

                            • 11. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                              frankthetank

                              Hey Mirek,

                               

                              ok, thanks! I had used it before and never saw it show up anywhere... now I know. Cool!

                              Will change asap.

                               

                              Edit:

                              Do you know if similar is possible for programmatically created listeners?

                               

                              Quick side update:

                              Running on completely isolated network (actually our default) and with GC running I did see this:

                               

                              Tue Jul 19 03:19:35 UTC 2016 56954.358: [Full GC (Ergonomics) [PSYoungGen: 7309K->0K(248576K)] [ParOldGen: 781365K->383841K(786432K)] 788675K->383841K(1035008K), [Metaspace: 80416K->80416K(85296K)], 2.9085510 secs] [Times: user=8.73 sys=0.03, real=2.92 secs]

                              and at the same time:

                              2016-07-19 03:19:46,283 WARN  [org.hornetq.core.client] (Thread-295) HQ212054: Destination address=jms.topic.PropertyChangeChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

                              2016-07-19 03:20:06,165 WARN  [com.barco.bcm.server.serviceconnection.notification.PropertyChangeEmitterImpl] (Thread-322) Execption @ sendNotification : javax.jms.JMSException: HQ119014: Timed out waiting for response when sending packet 49

                               

                              again

                              Tue Jul 19 07:59:35 UTC 2016 73755.358: [Full GC (Ergonomics) [PSYoungGen: 4909K->0K(248064K)] [ParOldGen: 780893K->387167K(786432K)] 785803K->387167K(1034496K), [Metaspace: 80422K->80422K(85296K)], 2.6073058 secs] [Times: user=7.53 sys=0.00, real=2.61 secs]

                              and at the same time:

                              2016-07-19 07:59:46,882 WARN  [org.hornetq.core.client] (Thread-271) HQ212054: Destination address=jms.topic.PropertyChangeChannel is blocked. If the system is configured to block make sure you consume messages on this configuration.

                              2016-07-19 08:00:06,835 WARN  [com.barco.bcm.server.serviceconnection.notification.PropertyChangeEmitterImpl] (Thread-319) Execption @ sendNotification : javax.jms.JMSException: HQ119014: Timed out waiting for response when sending packet 105

                               

                              Though it does not happen all the time. Could be that if a message tries to get sent at exactly that time, it causes the short congestion warning.

                              Also it affects other network related things as well, at least our connections to the monitored devices indicated problems.

                              • 12. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                                mnovak

                                This is weird. The warn message: "Destination address=jms.topic.PropertyChangeChannel is blocked...." should appear only when topic is full and BLOCK policy activates. It means that value max-size-bytes (10 Mb) was reached for the topic.

                                 

                                Could you check the message counts in the subcriptions in CLI in runtime-queue? I want to know who was right with the topic. :-)

                                 

                                I have a note to the design of the application. Do you think that you could use one queue instead of topic? The idea is that each MDB receives just its own messages which matches the MDB's message selector. There must not stay messages in the queue. So the prerequisite is that every message must match message selector of just one MDB.

                                • 13. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                                  frankthetank

                                  As requested I created a script to dump the delivering count of all subscriptions for the given topic -> file

                                  During this time the message-count and delivering-count went up and down in harmony and nothing was left unclaimed.

                                   

                                  @CLI:

                                  I figured out that I can actually use --commands and chain them up for better performance. At least one pain improved.

                                   

                                  @GC:

                                  So during a "Full GC (Ergonomics)" it seems that this has an effect on threads.

                                  I can get a 'is blocked' on my topic and/or it influences other threads (in this case websocket connection to device) I have.

                                  During the two that happened last night one caused the 'is blocked', the other the websocket, but not both.

                                   

                                  What if is the GC that (indirectly) kills/hangs the thread and in case of a thread used by a topic, causes the build up?

                                   

                                  Now don't ask me what GC we are using... haven't quite been able to figure that one out.

                                  java -XX:+PrintFlagsFinal -version -> file Cannot see what GC is being used...

                                  • 14. Re: WF8.1/HornetQ2.4.7 Randomly blocking of random channels
                                    mnovak

                                    Hi Frank,

                                     

                                    this is becoming tough case. I believe you might hit some dead lock or live deadlock case. Without thread dump and proper reproducer so developers could take a look and review your code and use case then it will be hard to solve it. I would suggest you to migrate to newer Wildfly version, for example to Wildfly 10 and see if the problem persist there.

                                     

                                    There will be much better developer response for Wildfly 10 issues.