12 Replies Latest reply on Mar 28, 2010 6:26 AM by BJ Chippindale

    Topic cannot be/is not unsubscribed

    BJ Chippindale Master

      OK... for those not familiar with the setup.

       

      I have 2 JBoss 4.2.3.GA nodes running clustered.  They are dual-homed.  eth0 is the external face and eth1 is the common subnet on each of them.  In other words, eth0 is 192.168.1.6 on node 1 and 192.168.2.7 on node 2, with a 255.255.255.0 netmask.

       

      eth1 on node 1 is 192.168.3.6  and on node 2 it is 192.168.3.7 with the same netmask.

      In other words, eth1-eth1 is the subnet on which clustering can occur.  
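The addressing above can be checked with a little arithmetic: two hosts share a subnet when their addresses agree after being ANDed with the netmask. A minimal sketch using the addresses and netmask quoted in this post (the class and method names are made up for illustration):

```java
/** Illustrative helper: checks whether two IPv4 addresses share a subnet. */
class SubnetCheck {
    /** Packs a dotted-quad IPv4 literal such as "192.168.3.6" into an int. */
    static int toInt(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted quad: " + dottedQuad);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True when both addresses have the same network part under the mask. */
    static boolean sameSubnet(String a, String b, String mask) {
        int m = toInt(mask);
        return (toInt(a) & m) == (toInt(b) & m);
    }

    public static void main(String[] args) {
        String mask = "255.255.255.0";
        // eth0 pair: 192.168.1.x vs 192.168.2.x, different networks
        System.out.println("eth0 same subnet: " + sameSubnet("192.168.1.6", "192.168.2.7", mask));
        // eth1 pair: both 192.168.3.x, same network
        System.out.println("eth1 same subnet: " + sameSubnet("192.168.3.6", "192.168.3.7", mask));
    }
}
```

With these values the eth0 pair comes out on different subnets and the eth1 pair on the same one, so eth1 is the only interface on which the two nodes can reach each other directly.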

       

      Through judicious tweaking of the command line two JBoss instances recognize each other and APPEAR to be a cluster.

       

      They are built on all-with-hornetq profiles.

       

      The hornetq config is straightforward.  There is a local_bind_address in the broadcast groups and jboss is invoked setting the hornetq.remoting.netty.host to the appropriate value.

       

      All good most of the time... but we are testing and so we tried shutting down an application that is attached to a durable topic... on just one node.

       

      Shutdown appears to work OK.  No Exceptions after listing the deployed apps getting the URL and then calling undeploy.   Node1 now has messages piling up.   (not really, we do this quietly, but it doesn't help).

       

      This is the only app attached to the topic.  Only other copy of the app is on the other side of the cluster.  Wanted to see that messages went (have set to forward when nothing is attached).  

       

      So nothing WAS transferred.    This is not good.

       

      The Topic still has a subscriber.

       

      Let's restart the application. 

       

      Can't connect to the topic as something is already subscribed.

       

      Looks a lot like the attached snippet.

       

      OK...

       

      Try to unsubscribe all.  This gathers rather more vehement exceptions related to having the reader still attached.  Recall that it was undeployed now.

       

      So so so. 

       

      Try the same drill with node 2 down.  That is, ONLY node 1 is available to shut down and turn on.

       

      No exceptions at all. 

       

      But this is not a cluster as I know it.  

       

      Restarting JBoss puts it all right again.  Nothing else appears to work.  Once it is broken in this fashion  turning off node 2 does not allow the stuck subscription to be reset.  JBoss has to be restarted.

       

      Which is all probably something simple to someone.   Right now I am Mystifried... as it is 0130, and nothing seems to make it happy.

       

      respectfully

      BJ

        • 1. Re: Topic cannot be/is not unsubscribed
          Tim Fox Master

          I have to be honest, I can't really follow your email.

           

          How about describing in separate sections

           

          1) Your topology. What nodes you have, what consumers, topics, queues you have on each node

           

          2) List what actions you performed. E.g. I create a consumer on node A. I shut down the app on node A. I shut down the server on node A

           

          3) What you expected to see. E.g. I expected to see that messages from node A were available on node B.

           

          4) What you actually saw. E.g. I didn't see any of the messages

           

          I'd also *highly* recommend reading the clustering and HA chapters. In particular the chapters on "server side load balancing" and "message redistribution".

          • 2. Re: Topic cannot be/is not unsubscribed
            BJ Chippindale Master

            Actually, I think it is related to this:

             

            https://community.jboss.org/message/442542#442542

             

            Topology:  Two Nodes.    Both running RHEL5.4,  JBoss-4.2.3.GA,  Profile based on "All-With-Hornetq",  Dual-Homed.

             

            eth0 nics on the two nodes are not connected to each other.  Not on the same subnet.  Nicely firewalled in fact.

             

            eth1 nics on the two nodes are connected on the same subnet.

             

            +++++++++++++++++++

             

            You'll have to excuse my being a bit punchy.   It is 0200 hours here. 

             

            Test 1:

            We shut down an application (using the JMX interface) that is attached to a durable topic on node 1.   We expected to see a message go to the app still running on node 2.

             

            * It did not.

             

            Shutdown appears to work OK.  No Exceptions after calling undeploy. 

            However, node1 has messages piling up. 

             

            Test2:

            We attempt to restart the application on node 1. 

            This failed.  The error that appears in the included snip of server log indicates the cause - there is already a subscriber to the topic.

             

            I don't understand this on several counts but let that pass for a moment.  I suspect my not understanding is related to the difference between core and JMS.   Not right now though.

             

             

            Test 3: Repeating the shutdown and startup without messages,   the topic remains subscribed.  The application (the only subscriber), is undeployed. JMX shows it as inactive.  The subscription remains locked on the topic.  It had a client ID,  there are no messages and it never lets go.

             

             

            Test 4: We try the same shutdown and startup with node 2 JBoss completely down.   In this case the restart of the application works perfectly, the topic subscription is relinquished and resubscribed properly and everything works as expected. 

             

            ++++++++++++++++++++++++++

             

            I suspect that this is VERY specific to the configuration we are using.  I am currently looking for the remoting upgrade to sp11 (if that wasn't put in already).   Also looking at going to standalone...   it isn't I think, a fault of Hornetq, but a problem with the JBoss interaction in this configuration.

             

            One thing...   the notion that I haven't read the manual is at this point -  a bit insulting.  You guys are doing a good job and you have an impressive tool... but don't try to teach your grandpa to suck an egg. 

             

            I believe there is an error on page 158 as  "Use_Duplicate_Detection" is not a permitted option for the cluster configuration.   Looks like it is simply a default.   Either that or I fat-fingered the option... I don't see it offered in the error message but that could be omission of another sort.

             

            respectfully

            BJ

            • 3. Re: Topic cannot be/is not unsubscribed
              Tim Fox Master

              That thread is about the JBoss Messaging project, not HornetQ. They are almost 100% different codebases.

               

              Also, HornetQ does not use JBoss Remoting.

               

              "We shut down an application (using the JMX interface) that is attached to a durable topic". Not sure what you mean by "durable  topic". Topics are neither durable or not. Topic subscriptions can be durable. Perhaps you mean that.

               

              What do you mean by "application"? Do you mean a JMS client containing a topic subscriber? How do you mean "using the JMX interface"? Does your JMS client have its own JMX interface?

               

              You mention "messages are piling up", but give no description of where these messages came from, what clients sent them and at what node they are attached.

               

              I'm trying to help you here, but I'm really not parsing your description. I suggest you get some sleep, and try again tomorrow with a clearer head

              • 4. Re: Topic cannot be/is not unsubscribed
                BJ Chippindale Master

                Yes -  Durable subscription to topic.   I know it is about jboss.  Doesn't make the error less relevant to the problem I get... just not so relevant to your knowledgebase.  Trying to build some VMs locally to reproduce it better.

                 

                The application is an mbean, deployed inside jboss.

                 

                Application is undeployed using JBoss 4.2.3 standard undeploy method exposed in MainDeployer.

                 

                Other clients generate the messages if tickled in the right spot... or can be simple soap.  Messages go in.  System works normally if we don't muck with it.   There is no problem sending and retrieving messages in normal conditions.

                 

                "Also, HornetQ does not use JBoss Remoting"

                 

                OK... that's something.     Don't focus on messages.  Messages don't pile up if we don't send any and we STILL get problem trying to get unsubscribed from topic.    I could understand it if we yanked the app out of the deploy directory without even stopping it....  

                 

                ...hmmm... could it want to be stopped before we undeploy it?  Could it not be inclusive of "stop"?     That's a JBoss question, not for you :-). 

                 

                Weird part is the sensitivity to the other node in the cluster.  Take THAT away and there is no exception.   All just works.

                 

                respectfully

                BJ

                • 5. Re: Topic cannot be/is not unsubscribed
                  BJ Chippindale Master

                  No need for this to even BE related to Hornetq code base.   Not directly at least.

                   

                  If application has not relinquished subscription on topic for whatever reason,  Hornetq sees no reason to forward messages to other server.   Correct behaviour IMHO.  

                   

                  This may be strictly JBoss problem.  You innocent bystanders.  Believe it, question is asked on JBoss side too.

                   

                  Standalone would not help in this instance.  App has to be killed sufficiently to have subscription stopped... and JBoss has to be smart enough to drop damned subscription when app is gone.   May not NOTICE if subscription is held against Hornetq. 

                   

                  BJ

                  • 6. Re: Topic cannot be/is not unsubscribed
                    BJ Chippindale Master

                    Reading spec and code and wondering if it is not doing what is right.

                     

                    The "durable" part of a durable subscription means that the mail is held for the client while the client is "away". 

                     

                    Two questions -  first is whether I am reading the spec correctly... second is how to reconnect to the existing durable subscription when the client restarts.   Going to have a new session... how to get a handle on the one that used to be there.

                     

                    Or do I misunderstand something else.

                     

                     

                    respectfully

                    BJ
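On the question of reconnecting: in JMS 1.1 a durable subscription is identified by the connection's client ID plus the subscription name, so a restarted client reattaches by calling `Connection.setClientID` with the same ID and `Session.createDurableSubscriber(topic, name)` with the same name. Only one consumer may be active on it at a time, and `Session.unsubscribe(name)` is refused while a consumer is active. The sketch below is a toy in-memory model of those rules, not the JMS or HornetQ API; every class and method name in it is invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of durable-subscription rules; NOT the JMS or HornetQ API. */
class DurableTopicModel {
    /** One durable subscription: a backlog plus a "consumer attached" flag. */
    private static final class Subscription {
        final Deque<String> backlog = new ArrayDeque<>();
        boolean active;
    }

    // Keyed by clientId + "/" + name, mirroring how JMS identifies a durable subscription.
    private final Map<String, Subscription> subscriptions = new HashMap<>();

    private static String key(String clientId, String name) {
        return clientId + "/" + name;
    }

    /** Creates the subscription on first use, otherwise reattaches to the existing one. */
    void attach(String clientId, String name) {
        Subscription s = subscriptions.computeIfAbsent(key(clientId, name), k -> new Subscription());
        if (s.active) {
            throw new IllegalStateException("subscription already has an active consumer");
        }
        s.active = true;
    }

    /** Closes the consumer; the subscription and its backlog remain. */
    void detach(String clientId, String name) {
        Subscription s = subscriptions.get(key(clientId, name));
        if (s != null) {
            s.active = false;
        }
    }

    /** Every durable subscription gets a copy, attached consumer or not. */
    void publish(String message) {
        for (Subscription s : subscriptions.values()) {
            s.backlog.add(message);
        }
    }

    /** Drains whatever is waiting; requires an attached consumer. */
    List<String> receiveAll(String clientId, String name) {
        Subscription s = subscriptions.get(key(clientId, name));
        if (s == null || !s.active) {
            throw new IllegalStateException("no active consumer");
        }
        List<String> out = new ArrayList<>(s.backlog);
        s.backlog.clear();
        return out;
    }

    /** Deletes the subscription; refused while a consumer is still attached. */
    void unsubscribe(String clientId, String name) {
        Subscription s = subscriptions.get(key(clientId, name));
        if (s == null) {
            throw new IllegalStateException("unknown subscription");
        }
        if (s.active) {
            throw new IllegalStateException("cannot unsubscribe while a consumer is active");
        }
        subscriptions.remove(key(clientId, name));
    }
}
```

In this model, an undeploy that never reaches `detach` leaves the subscription marked active, so both a later `attach` and an `unsubscribe` fail. That resembles the symptoms reported in this thread, though whether it is what actually happened here is not established.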

                    • 7. Re: Topic cannot be/is not unsubscribed
                      BJ Chippindale Master

                      The subscription is owned by the other member of the cluster?

                       

                      So so so... perhaps I am missing something my team did, or I have semi-successfully clustered.   Getting exceptions every 2 minutes when the client attempts to restart isn't cool and why it doesn't do that when we restart the whole node is a matter of no small curiosity.

                       

                      BJ

                      • 8. Re: Topic cannot be/is not unsubscribed
                        Tim Fox Master

                        Like I mentioned before, we would like to help you, but we are lacking a coherent description of your problem, so we don't have a good picture of what you have done, what you expect to see, and what you actually see.

                         

                        Without that, it's not really possible for us to comment on whether what you're seeing is correct or not.

                        • 9. Re: Topic cannot be/is not unsubscribed
                          Tim Fox Master

                          Could you also create some kind of test program that we can run to replicate the issue, as described here:

                           

                          http://community.jboss.org/wiki/Howtoreportabugissue

                          • 10. Re: Topic cannot be/is not unsubscribed
                            BJ Chippindale Master

                            Issue devolved to a lack of discovery.   The cluster was NOT being formed and so there was no place for any messages to go.

                             

                            There were two reasons.  First was that the Admins had not generally allowed multicast; they'd allowed it only for JBoss, which defaults to the 230.0.0.8 range.   I can use 230.0.0.9, but not 230.7.7.7 and not 231.anything.   The only tool I have to test with is a simple app that I have to recompile elsewhere to test an address.  Takes a wee while to do.
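A probe that takes the group address and the local NIC address on the command line avoids recompiling for each address tried, and also pins the test to one interface of a dual-homed box. A minimal sketch using only `java.net`; the class name, argument order and 10-second timeout are assumptions for illustration and do not come from the tool mentioned above:

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

/** Hypothetical probe. Usage: java MulticastProbe <group> <port> <local-ip> [send] */
class MulticastProbe {
    /** Parses a numeric IPv4 literal and rejects anything that is not a multicast address. */
    static InetAddress parseGroup(String literal) {
        String[] parts = literal.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted quad: " + literal);
        }
        byte[] raw = new byte[4];
        for (int i = 0; i < 4; i++) {
            int octet = Integer.parseInt(parts[i]);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + parts[i]);
            }
            raw[i] = (byte) octet;
        }
        InetAddress group;
        try {
            group = InetAddress.getByAddress(raw);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
        if (!group.isMulticastAddress()) {
            throw new IllegalArgumentException("not a multicast address: " + literal);
        }
        return group;
    }

    public static void main(String[] args) throws IOException {
        InetAddress group = parseGroup(args[0]);
        int port = Integer.parseInt(args[1]);
        // Select the NIC by the local address bound to it (e.g. the eth1 address).
        NetworkInterface nic = NetworkInterface.getByInetAddress(InetAddress.getByName(args[2]));
        if (nic == null) {
            throw new IllegalArgumentException("no interface has address " + args[2]);
        }
        boolean send = args.length > 3 && args[3].equals("send");

        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.setNetworkInterface(nic);                            // outgoing packets use this NIC
            socket.joinGroup(new InetSocketAddress(group, port), nic);  // join the group on this NIC
            if (send) {
                byte[] payload = "probe".getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(payload, payload.length, group, port));
                System.out.println("sent probe to " + group.getHostAddress() + ":" + port);
            } else {
                socket.setSoTimeout(10_000); // give up after 10 seconds
                byte[] buf = new byte[256];
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(packet);
                    System.out.println("received from " + packet.getAddress().getHostAddress());
                } catch (SocketTimeoutException e) {
                    System.out.println("nothing received within 10 s");
                }
            }
        }
    }
}
```

For example, start a listener on node 2 with `java MulticastProbe 230.0.0.9 9876 192.168.3.7`, then send from node 1 with `java MulticastProbe 230.0.0.9 9876 192.168.3.6 send` (the port 9876 is an arbitrary choice). If the listener times out, the group address is probably being filtered or the traffic is leaving on the wrong interface.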

                             

                            I didn't figure this out until this morning so there isn't much else to say.

                             

                            What I want here is to be able to

                             

                            1. Take a durable subscription to a topic.
                            2. Drop the application that has held the subscription.
                            3. Restart the application.

                             

                            That works when the other member of the cluster is down.

                             

                            Not when the other member of the cluster is half-up.

                             

                            This is all part of the larger issue of integration with JBoss 4.2.3 and the difficulty with getting discovery. 

                             

                            I suspect that I have had a PARTIAL clustering in which some information about the cluster was available, but the actual hornetq cluster connection was not.  The breakage is not therefore likely to be anything to do with Hornetq.  Except I can't tell discovery how to choose the second nic out of 2... maybe.     Which breaks discovery on this sort of dual-homed system, and is a separate issue.

                             

                             

                            respectfully

                            BJ

                            • 11. Re: Topic cannot be/is not unsubscribed
                              Tim Fox Master

                              BJ - can you add a JIRA for the possible UDP problem with multiple NICs? - then someone will investigate.

                               

                              If you want to see if the cluster has been formed or not, you can do this by going to the JMX console and invoking the method getNodes on the cluster connection control. This will tell you which other nodes it is connected to. For more info, this has been discussed on other threads.

                               

                              On your other points, you need to get the cluster to form correctly for clustering to work, so these are probably just knock-on effects of not having UDP allowed.