13 Replies Latest reply on Dec 11, 2013 5:46 PM by synclpz

    Problem with HornetQ HA colocated cluster with (live+backup node pairs)

    bbuterbrott

      I have been working with a HornetQ HA cluster, trying to create a working solution, but I have failed at it. I would appreciate it if you could tell me where I am wrong.

       

      I have two JBoss EAP 6.1 instances (with the HornetQ module updated to version 2.3.12) running together in a cluster with the HornetQ configs attached to this message. I want a HornetQ colocated cluster with a live+backup pair on each JBoss, so that messages won't be lost and will be delivered exactly once.

       

      I made a test case to see if this was possible. I deployed the producer war on the first node (hornetq-producer.zip project) and consumer wars on both nodes (hornetq-mdb.zip project).

       

      My producer sends 5 000 messages to testQueue on an HTTP GET request. After it has sent all the messages, it responds with an OK string.

      The consumer MDBs have a 300 ms delay per message, so that the rate of consumption is lower than the rate of production.
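
      The consumer MDB is essentially just the following. This is a simplified sketch of what hornetq-mdb.zip does; the destination name queue/testQueue is only a placeholder.

      import javax.ejb.ActivationConfigProperty;
      import javax.ejb.MessageDriven;
      import javax.jms.JMSException;
      import javax.jms.Message;
      import javax.jms.MessageListener;
      import javax.jms.TextMessage;

      @MessageDriven(activationConfig = {
          @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
          @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/testQueue")
      })
      public class TestQueueConsumer implements MessageListener {

          @Override
          public void onMessage(Message message) {
              try {
                  // Log the body so the "message received" entries can be counted later
                  System.out.println("message received: " + ((TextMessage) message).getText());
                  // Artificial 300 ms delay so consumption is slower than production
                  Thread.sleep(300);
              } catch (JMSException | InterruptedException e) {
                  throw new RuntimeException(e);
              }
          }
      }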

       

      So the test flow is the following:

      1. Start the producer and wait for it to respond with OK, which means all the messages have been sent.
      2. After that, kill -9 the JBoss on node1.
      3. Wait for all messages to be processed.
      4. Start node1 again.
      5. Wait to see if any new messages arrive.
      6. Count the "message received" log entries. The count should equal 5 000.

       

      As an example, one run gave this result:

       

      After the 3rd step:

       

      node1: 891 messages

      node2: 4 367 messages

      total: 5 258 messages

       

      After the 5th step no messages came.

       

      So the question is: is it possible to achieve what I am trying to get from HornetQ? If so, is the configuration of the HornetQ servers right? Or am I missing something important?

        • 1. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
          ataylor

          Yes, you should receive only 5000 msgs; the caveat here is for XA transactions that are awaiting recovery by the transaction manager. I'm a bit confused as to why you get > 5000. Are you sure you are clearing out any old journals between runs, just in case? Also make sure the servers aren't sharing a journal by accident.

           

          I would add some sort of integer id and timestamp to your messages in the producer so you can see which messages, if any, are being delivered twice.
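
          Something along these lines in the producer would do; this is just a sketch, and the property names testId and sentAt are made up:

          import java.util.concurrent.atomic.AtomicLong;

          import javax.jms.JMSException;
          import javax.jms.MessageProducer;
          import javax.jms.Session;
          import javax.jms.TextMessage;

          public class TaggingSender {

              private static final AtomicLong SEQUENCE = new AtomicLong();

              // Stamp every message with a unique id and a send timestamp so that
              // duplicates (and when they were produced) show up in the consumer logs.
              public static void send(Session session, MessageProducer producer, String body) throws JMSException {
                  TextMessage message = session.createTextMessage(body);
                  message.setLongProperty("testId", SEQUENCE.incrementAndGet());
                  message.setLongProperty("sentAt", System.currentTimeMillis());
                  producer.send(message);
              }
          }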

          • 2. Re: Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
            bbuterbrott

            Thanks for the fast reply!

            Yes, I am removing the tmp, log and data dirs in the standalone server dir completely after each test run, and only after JBoss has stopped.

            The two JBoss instances are not sharing a journal, because the path to the journal is relative to the JBoss home directory. But the live and backup servers on each JBoss do seem to share it. Should I explicitly specify the journal path for each HornetQ server instance on a JBoss, or should this not be a problem?

            Regarding the duplicates: my producer uses an AtomicLong to produce a unique id for each message. I can see in the logs that multiple messages are delivered twice after one node is killed. I have attached a log file from node2 with sample output; for example, messages test_4987 and test_4985 are delivered twice.

            • 3. Re: Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
              ataylor

              The two JBoss instances are not sharing a journal, because the path to the journal is relative to the JBoss home directory. But the live and backup servers on each JBoss do seem to share it. Should I explicitly specify the journal path for each HornetQ server instance on a JBoss, or should this not be a problem?

              Yes, that's your problem: each HornetQ instance must have its own journal. The backup server basically replicates its live server into its own journal until it fails over and reads the journal in; if another server is also writing to that journal, then both will read in the same messages, hence the duplicates.

              • 4. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                bbuterbrott

                Thanks for pointing that out! After I explicitly set the HornetQ paths there were no more duplicates, but some messages got lost when I killed the node with the producer. That was because I was creating the session in non-transacted mode. When I changed it to transacted mode everything worked as expected and no messages were lost anymore.
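
                For reference, the producer-side change was essentially the following (a simplified sketch; in practice the commit can also be done in smaller batches):

                import javax.jms.Connection;
                import javax.jms.MessageProducer;
                import javax.jms.Queue;
                import javax.jms.Session;

                public class TransactedSender {

                    // With a transacted session nothing becomes visible to consumers until
                    // commit() returns, so killing the producer node mid-send cannot lose
                    // or half-deliver the batch.
                    public static void sendAll(Connection connection, Queue queue, int count) throws Exception {
                        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
                        try {
                            MessageProducer producer = session.createProducer(queue);
                            for (int i = 0; i < count; i++) {
                                producer.send(session.createTextMessage("test_" + i));
                            }
                            session.commit();
                        } finally {
                            session.close();
                        }
                    }
                }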

                • 5. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                  bbuterbrott

                  Performance in transacted mode is not high enough for our needs, so now I am facing the problem of lost messages in non-transacted mode.

                  I stated earlier that messages were lost only when I killed the node with the producer, but after some more tests I have found that this is not true: messages get lost even if I kill only a consumer node. What is stranger is that it does not happen every time (which is why I previously thought this case was fine).

                  The messages that get lost are the odd-numbered ones, i.e. the ones load-balanced to the node that was killed. As I mentioned before, sometimes I see these (lost) messages being delivered to the surviving node after it has received its even-numbered messages, but sometimes I don't. And I don't see any further deliveries even after I bring the killed node back up.

                  So the question is: is what I am trying to achieve only possible in transacted mode, or is it again some misconfigured server property or something else?

                  Also, I use java:/JmsXA as the connection factory, and as far as I understand it supports transactions; so if I use it and create a session in non-transacted mode, will it be wrapped in a transaction or something?

                  I ask because I see strange messages in the log concerning transactions, like:

                  HQ142015: Uncommitted transaction with id 8,487 found and discarded

                  But I create the JMS session like this:

                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

                  Is the log referring to some internal HornetQ transaction, so I should just ignore these entries?

                  I would be grateful for any info on this.

                  Thanks for your time in advance!

                  • 6. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                    ataylor

                    I'm not really sure what you are seeing, but here are a few things to bear in mind.

                     

                    Once messages are persisted by the server they will never go missing; the bridges that balance messages between nodes have no relation to the transaction mode of the client.

                     

                    When you create a session with auto-acknowledge, this only means that the consumer will ack the message after the call to receive has taken place.

                     

                    Also, the java:/JmsXA connection factory is pooled, and when used in JEE components it will automatically enlist the JMS resource in an XA transaction (NB: this should not be used for normal clients).
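
                    In other words, inside a JEE component you inject it and let the container drive the transaction, roughly like this (just a sketch; the bean and queue names are placeholders):

                    import javax.annotation.Resource;
                    import javax.ejb.Stateless;
                    import javax.jms.Connection;
                    import javax.jms.ConnectionFactory;
                    import javax.jms.Queue;
                    import javax.jms.Session;

                    @Stateless
                    public class ForwardingBean {

                        // Pooled, XA-enlisting factory: meant for JEE components only,
                        // where the container manages the surrounding JTA transaction.
                        @Resource(mappedName = "java:/JmsXA")
                        private ConnectionFactory connectionFactory;

                        @Resource(mappedName = "queue/testQueue")
                        private Queue queue;

                        public void forward(String body) throws Exception {
                            Connection connection = connectionFactory.createConnection();
                            try {
                                // Because the factory is pooled/XA, the session is enlisted in the
                                // caller's JTA transaction regardless of the arguments passed here.
                                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                                session.createProducer(queue).send(session.createTextMessage(body));
                            } finally {
                                connection.close();
                            }
                        }
                    }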

                    • 7. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                      bbuterbrott

                      Andy Taylor wrote:

                       

                      Also, the java:/JmsXA connection factory is pooled, and when used in JEE components it will automatically enlist the JMS resource in an XA transaction (NB: this should not be used for normal clients).

                      Thanks for pointing that out! That solved my problem.

                      • 8. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                        bbuterbrott

                        I have one more question.

                        I am now trying to get some flag indicating that message processing was interrupted by the kill, so that when I receive the message on another node I know it was already delivered once but its processing failed.

                        As far as I understand, what I am looking for is the JMSRedelivered flag on the javax.jms.Message object.
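
                        In the MDB I check it roughly like this (simplified; process() stands in for the real work):

                        @Override
                        public void onMessage(Message message) {
                            try {
                                // True when the server knows this message has been delivered before
                                boolean redelivered = message.getJMSRedelivered();
                                System.out.println("message received, redelivered=" + redelivered);
                                process(message);
                            } catch (JMSException e) {
                                throw new RuntimeException(e);
                            }
                        }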

                        The problem is that this flag is only set if an exception is thrown in the MDB's onMessage() method. If onMessage() is cut short by the kill, then when I receive the message on the other node the flag is false.

                        I need some flag which will be true in both cases, or at least some flag to differentiate these two cases.

                        Should JMSRedelivered be true in the second case too, and this is some misconfiguration? Or maybe I should look for some other flag?

                        Thanks in advance!

                        • 9. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                          ataylor

                          If you want this sort of guarantee then you need to use XA. Remember it's a trade-off: guarantee vs. performance. Once the message is received by the consumer/MDB it is forgotten about when non-transacted.
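
                          With XA, i.e. a normal MDB using container-managed transactions, the message receipt and the work done in onMessage() are part of one transaction, so if the node is killed mid-processing the whole transaction rolls back and the message comes back instead of the work being left half-done with no trace. Roughly (these are just the MDB defaults spelled out; the destination name is a placeholder):

                          import javax.ejb.ActivationConfigProperty;
                          import javax.ejb.MessageDriven;
                          import javax.ejb.TransactionAttribute;
                          import javax.ejb.TransactionAttributeType;
                          import javax.ejb.TransactionManagement;
                          import javax.ejb.TransactionManagementType;
                          import javax.jms.Message;
                          import javax.jms.MessageListener;

                          @MessageDriven(activationConfig = {
                              @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
                              @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/testQueue")
                          })
                          // Container-managed + REQUIRED are the defaults for an MDB: the message
                          // receipt and the work in onMessage() then share one XA transaction.
                          @TransactionManagement(TransactionManagementType.CONTAINER)
                          @TransactionAttribute(TransactionAttributeType.REQUIRED)
                          public class TransactionalConsumer implements MessageListener {

                              @Override
                              public void onMessage(Message message) {
                                  // If the node is killed here the transaction never commits, so the
                                  // message is redelivered (possibly on the backup) rather than lost.
                              }
                          }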

                          • 10. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                            bbuterbrott

                            Thank you for clarifying this! Your help is really appreciated!

                            • 11. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                              synclpz

                              Hi Andy.

                               

                              you said:

                              Once the message is received by the consumer/MDB it is forgotten about when non-transacted.

                               

                              Does that mean "received and successfully processed by the onMessage() method"?

                               

                              We see that a message that entered onMessage() in the MDB is "redelivered" in two cases:

                               

                              1. If the method threw an exception

                              2. If the method was interrupted by killing the cluster node

                               

                              In the first case the redelivery flag is raised, but in the latter it is not. Thinking logically, the same rule should apply whether onMessage() threw an exception or was killed while inside onMessage(), because in both cases the message is "redelivered". Am I missing something?

                              • 12. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                                ataylor

                                We acknowledge the message after onMessage() has been called, so if an exception is thrown we notify the session that the message needs to be redelivered. If you kill the cluster node then obviously there is no way for the server to know whether the message was delivered or not, since it has been killed.

                                • 13. Re: Problem with HornetQ HA colocated cluster with (live+backup node pairs)
                                  synclpz

                                  Anyway, in spite of

                                   

                                  If you kill the cluster node then obviously there is no way for the server to know whether the message was delivered or not, since it has been killed.

                                  the message is found to be non-delivered by the backup node and is delivered again if there is any consumer left in the cluster. So the conclusion is that the backup node does not know whether there was any delivery attempt before it became live.

                                   

                                  Just for clarification, am I right?