20 Replies. Latest reply on Aug 5, 2005 1:46 AM by starksm64

    Failover problem

    vignesh76

      Hi,

      We have two JBoss instances in a cluster configuration hosting clustered SLSBs. Web applications deployed in a third JBoss instance access these clustered SLSBs. I was testing failover by stopping the currently active clustered instance and, each time, exercising the web app to make sure failover to the other clustered instance had succeeded. Initially failover works fine, but at some point it breaks and the web app starts throwing a "Service Unavailable" exception even though one of the nodes is available to accept requests. To recover, the JBoss instance running the web application has to be restarted. Can anyone clarify whether this is a JBoss issue and suggest a solution? Thanks in advance.

        • 1. Re: Failover problem

          If you keep having this problem, you can open a Jira issue under the JBossAS/clustering component. Please attach your test case as well.

          Thanks,

          -Ben

          • 2. Re: Failover problem
            darranl

            Does your web application cache the interfaces that are being used or do you perform the lookups each time?

            As part of your testing, did you make sure at least one of the nodes was up at all times, so that each restarted node could rejoin the original cluster, or were all of the nodes shut down at some point?

            Finally, at the web application's next access, was at least one of the nodes that had been up during the previous request still available?

            • 3. Re: Failover problem
              vignesh76

              Thanks Ben and Darran,

              I forgot to mention the JBoss version: it is 4.0.1.

              1) I am not sure whether they are cached, as my role is more on the system implementation side, but I shall try to find out. Could you tell me how this caching affects the failover process? Also, what if the web application is not in use, i.e. not in the middle of any EJB calls, when the failover is needed? I tested a scenario where the web application was completely idle (logged out); after stopping the active instance, I tried to log in to the application again, and that is when this error occurred.

              2) Yes, I ensured that one of the nodes was always up, and that the clustering information maintained by this node had been updated after I shut down the active instance and before I used the web application again to make EJB calls.

              3) Yes, one of the nodes was up and ready with updated clustering information after I shut down the active node and before I made the next request from the web application to test failover.

              I would like to emphasize that it initially worked. Suppose I have two clustered nodes, A and B; below are the test cases and the results I observed. I tested with one browser instance accessing the web app running on a third instance, C, and for each test I used the same two use cases in the web app to check failover.

              1) A and B running (B is active node) --> OK

              2) A running, shut down B (A becomes active node) --> OK

              3) Restarted B, shut down A (B should become active node now) --> Service Unavailable

              4) Restarted instance C (B becomes active node now) --> OK

              5) Restarted A, B running (B is active, from step 4) --> OK

              6) A running, shut down B (A should become active) --> Service Unavailable

              I hope this brings more clarity to this issue and leads to a quick solution. Otherwise I guess I would need to create a Jira issue as Ben has suggested.

              • 4. Re: Failover problem
                vignesh76

                Created an issue for this case in JIRA.

                • 5. Re: Failover problem
                  vignesh76

                  The status of the Jira issue can be tracked at the URL below:

                  http://jira.jboss.com/jira/browse/JBAS-2074

                  • 6. Re: Failover problem

                     

                    "ben.wang@jboss.com" wrote:
                    If you keep having this problem, you can open a Jira issue under the JBossAS/clustering component. Please attach your test case as well.

                    Thanks,

                    -Ben


                    Ben, this is very bad advice.

                    Before anyone reports a bug they should:
                    1) Test the latest version
                    2) Identify where the problem exists, NOT just "I have a problem"
                    3) Make sure it is a bug and not some configuration issue with JBoss/network/OS/other

                    Just saying "I have a problem getting it to work in some old version" is not a bug report.

                    It is a help request that belongs in the forums until the underlying issue is identified
                    as a bug in the code.
                    http://wiki.jboss.org/wiki/Wiki.jsp?page=JBossHelp

                    • 7. Re: Failover problem

                      The alternative would be that we just ditch the forums
                      and let people use JIRA to ask questions.

                      Something we are not going to do.

                      • 8. Re: Failover problem
                        vignesh76

                        Hi,

                        Is it solved by using the latest JBoss version? Please clarify. I would appreciate it if someone could give me pointers in the right direction towards getting this resolved.

                        Thanks.


                        • 9. Re: Failover problem

                          Adrian, yes, it is a bit hasty. But in this case I have read the question, and I am giving Vignesh the benefit of the doubt since he has helped answer some forum posts.

                          I seldom ask a user to create a Jira issue. (As a matter of fact, I don't encourage people to create a Jira unless it is necessary :-)

                          -Ben

                          • 10. Re: Failover problem

                             

                            "vignesh76" wrote:
                            Hi,

                            Is it solved by using the latest JBoss version?


                            Suck it and see. :-)


                            Please clarify. I would appreciate it if someone could give me pointers in the right direction towards getting this resolved.


                            Read the "twenty questions" from the link I posted
                            on the kind of questions you need to ask yourself
                            to resolve your problem.
                            Or post here to at least provide enough information that we can help you.

                            You need to identify what the problem is, not post "IT DOES NOT WORK".

                            e.g. In this case,
                            * Is the cluster rejoining? Show the cluster view changes in comparison to the client requests.
                            * Did the client receive the up-to-date view of the cluster? i.e. When did the client know about the cluster view changes?
                            * Are the dns/ip addresses that the client is told to contact correct? i.e. is the client even contacting the correct machines?
                            * etc.

                            We are not going to go through a long thread of request/response trying to get
                            your configuration, logging, example code, etc.

                            OFF TOPIC:
                            AFAIK (my information may be out of date),
                            there isn't a single JDK vendor that will support you on Fedora.
                            They will support you on Red Hat EL.

                            • 11. Re: Failover problem
                              vignesh76

                              Appreciate your help here. All I requested was a little help, and politely at that; I did not expect sarcastic comments in response. If the attitude is not to help, people should simply ignore a post rather than respond like this. Also, users cannot be expected to think at the same level as experts or developers and raise the same kind of questions an expert would; that would defeat the purpose of having this forum. If not for the JIRA issue, I wouldn't have had to hear all this. Anyway...

                              I believe my posts were quite reasonable, not "dumb", and contained sufficient information to answer some, if not all, of the questions raised.

                              * Is the cluster rejoining? Show the cluster view changes in comparison to the client requests.

                              --> I shall provide more information, such as console output, as evidence that the cluster view was updated when a node left and rejoined the cluster.

                              * Did the client receive the up-to-date view of the cluster? i.e. When did the client know about the cluster view changes?

                              --> I am not fully clear on what you are asking here. The client JBoss instance was running the whole time while I stopped the active instance, and I then checked failover from the client by making a new request. There were no active requests to the clustered nodes while I was stopping a node or starting it again to rejoin the cluster.

                              * Are the dns/ip addresses that the client is told to contact correct? i.e. is the client even contacting the correct machines?

                              --> If the IP addresses had been wrong, the initial failover, which I clearly stated was working, would not have worked at all.

                              I shall test the scenarios again with the latest version of JBoss. All I asked for was clarification on whether this is a known issue in version 4.0.1, since the recent posts have been diverted towards setting ethics while ignoring the user issues this forum is meant for.

                              • 12. Re: Failover problem

                                 

                                "vignesh76" wrote:
                                Appreciate your help here. All I requested was a little help, and politely at that; I did not expect sarcastic comments in response.


                                If you don't want sarcastic comments, I'll avoid responding to your posts in future :-)

                                • 13. Re: Failover problem
                                  vignesh76

                                  Yeah, how would you react if someone mocked you just because he was in a superior technical position? Being superior doesn't give someone the authority to be arrogant and mock polite strangers.

                                  I shall continue to wait for an appropriate solution from the rest of the JBoss community, if not from the Director of the JBoss support team. Thanks for whatever assistance and leads you have provided so far.

                                  • 14. Re: Failover problem
                                    darranl

                                     

                                    "vignesh76" wrote:

                                    1) I am not sure whether they are cached, as my role is more on the system implementation side, but I shall try to find out. Could you tell me how this caching affects the failover process? Also, what if the web application is not in use, i.e. not in the middle of any EJB calls, when the failover is needed? I tested a scenario where the web application was completely idle (logged out); after stopping the active instance, I tried to log in to the application again, and that is when this error occurred.


                                    The reason I asked you about the caching of the interfaces is that JBoss uses smart proxies to handle failover to the different nodes in the cluster.

                                    Each time an invocation is passed to the server, if the membership of the cluster has changed, the response contains the new list of members, so the smart proxy has an up-to-date list of members for the next invocation.
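To make the caching point concrete, here is a minimal, self-contained Java model of that mechanism. Everything in it (`Node`, `Cluster`, `SmartProxy`, `SmartProxyDemo`, the view counter) is hypothetical and only illustrates the behaviour described above; it is not JBoss's proxy code, and it simply picks the first available target instead of using a configurable load-balance policy. The demo replays steps 1 to 3 of the test reported earlier in the thread with a single cached proxy: because no invocation happens while B rejoins, the proxy never learns about B, and once A goes down it has no live target left.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a clustered server node (not a JBoss class).
class Node {
    final String name;
    boolean up;
    Node(String name) { this.name = name; }
}

// Hypothetical model of the membership view as the servers see it.
class Cluster {
    final List<Node> members = new ArrayList<>();
    int viewId = 0; // incremented on every membership change

    void join(Node n)  { n.up = true;  if (!members.contains(n)) members.add(n); viewId++; }
    void leave(Node n) { n.up = false; members.remove(n); viewId++; }
}

// Hypothetical client-side smart proxy: it only learns about membership
// changes from the responses to its own invocations.
class SmartProxy {
    private final Cluster cluster;
    private List<Node> targets;   // the proxy's private copy of the member list
    private int knownViewId;

    SmartProxy(Cluster cluster) { // stands in for the initial lookup
        this.cluster = cluster;
        this.targets = new ArrayList<>(cluster.members);
        this.knownViewId = cluster.viewId;
    }

    String invoke(String op) {
        for (Node target : targets) {
            if (!target.up) continue; // fail over to the next known target
            if (cluster.viewId != knownViewId) {
                // The response carries the new member list because the view changed.
                targets = new ArrayList<>(cluster.members);
                knownViewId = cluster.viewId;
            }
            return op + "@" + target.name;
        }
        // Every target this proxy knows about is down, even if other nodes are up.
        throw new IllegalStateException("Service Unavailable");
    }

    List<String> knownTargets() {
        List<String> names = new ArrayList<>();
        for (Node n : targets) names.add(n.name);
        return names;
    }
}

class SmartProxyDemo {
    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Node a = new Node("A"), b = new Node("B");
        cluster.join(a);
        cluster.join(b);
        SmartProxy cached = new SmartProxy(cluster); // obtained once and reused

        System.out.println(cached.invoke("login"));  // step 1: both nodes up
        cluster.leave(b);
        System.out.println(cached.invoke("login"));  // step 2: response refreshes the view to [A]
        cluster.join(b);                             // no invocation while B rejoins...
        cluster.leave(a);                            // ...and A leaves
        try {
            cached.invoke("login");                  // step 3: only A is known, and A is down
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // A freshly created proxy sees the current view, like restarting instance C.
        System.out.println(new SmartProxy(cluster).invoke("login"));
    }
}
```

Under these assumptions, running `SmartProxyDemo` prints `login@A` twice, then `Service Unavailable`, then `login@B`.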

                                    "vignesh76" wrote:

                                    1) A and B running (B is active node) --> OK

                                    2) A running, shut down B (A becomes active node) --> OK

                                    3) Restarted B, shut down A (B should become active node now) --> Service Unavailable


                                    Looking at your example, the list of nodes in the cluster changed completely between invocations 2 and 3. It would help a lot to know where you obtain the references to the remote interfaces, where they are reused, and whether they are ever recreated.
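One way to see the difference between reusing and recreating references is the hypothetical wrapper below (`RefreshingReference` and `RefreshingReferenceDemo` are not JBoss APIs). It assumes that obtaining a new reference, for example through a fresh JNDI lookup passed in as the `lookup` function, yields a proxy carrying the current cluster membership, which is consistent with the observation that restarting instance C cleared the error. It is a sketch under those assumptions, not a recommended fix.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper (not a JBoss API): caches a client-side reference and,
// if a call through it fails, discards it, obtains a fresh one and retries once.
final class RefreshingReference<T> {
    private final Supplier<T> lookup; // assumption: e.g. wraps a fresh JNDI lookup
    private T cached;
    private int lookups;

    RefreshingReference(Supplier<T> lookup) { this.lookup = lookup; }

    private T current() {
        if (cached == null) {
            cached = lookup.get();
            lookups++;
        }
        return cached;
    }

    <R> R call(Function<T, R> op) {
        try {
            return op.apply(current());
        } catch (RuntimeException e) {
            cached = null;              // drop the possibly stale reference
            return op.apply(current()); // one retry with a freshly obtained reference
        }
    }

    int lookups() { return lookups; }
}

class RefreshingReferenceDemo {
    public static void main(String[] args) {
        int[] counter = {0};
        // Hypothetical lookup: each call hands out a new numbered reference;
        // reference #1 plays the stale proxy and always fails.
        RefreshingReference<Integer> ref = new RefreshingReference<>(() -> ++counter[0]);
        String result = ref.call(id -> {
            if (id == 1) throw new IllegalStateException("Service Unavailable");
            return "login via reference #" + id;
        });
        System.out.println(result + " after " + ref.lookups() + " lookups");
    }
}
```

The demo prints `login via reference #2 after 2 lookups`. For brevity the sketch retries on any `RuntimeException`, whereas EJB 2.x remote interface methods report failures through the checked `java.rmi.RemoteException`; a blind retry is also only safe for idempotent operations.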

