15 replies. Latest reply on Jul 31, 2002 5:01 AM by slaboure

    HA-JNDI (Fault tolerance)

    fred_soulier

      Hi

      Config
      ------
      W2K
      JBoss3.0.0_Tomcat4.0.3
      Sun JDK 1.3.1_04

      Problem:
      --------
      I run a 2-node cluster of JBoss servers. Both use static IP addresses on the internal LAN: 192.168.20.210 and 192.168.20.211.
      HA-JNDI is configured on port 55555.

      When I run my client, which looks up 3 different EJBs in a loop, the logging output generated by these beans appears on only one node. The other node sits there and does nothing.
      If I stop the responding node, I expect the client to be redirected automatically to the remaining node; instead I get java.lang.IllegalStateException: container is not started, you cannot invoke ejb methods on it.

      It seems the responding node is always the one that has been started first.

      If I run with only one node at a time, that node responds, the HA-JNDI lookup succeeds, and so on.

      The problem is really:
      If the node that was responding to the lookup requests fails, why do I get this exception, and why doesn't the other node take over?

      I am attaching my RMITest.java client and my cluster-service.xml.

      I have tried both UDP and TCP for the JavaGroups transport, with the same result.

      Note: there is some SSL-related configuration (commented out) in cluster-service.xml because I've been trying that as well.

      Thanks for any help.
      Fred

        • 1. Re: HA-JNDI (Fault tolerance)
          slaboure

          First the easy answer: could you please try the latest CVS HEAD if possible? Some bug fixes are not in the official release.

          If you experience the same problem with HEAD, I will check.

          Cheers,


          sacha

          • 2. Re: HA-JNDI (Fault tolerance)
            fred_soulier

            Hi Sacha,

            By CVS HEAD you mean the module "jboss-all" that builds a JBoss3.1.0alpha?

            Fred

            • 3. Re: HA-JNDI (Fault tolerance)
              fred_soulier

              OK. I updated "jboss-all" to CVS HEAD and built JBoss3.1.0alpha, and I have been fighting with it for the last few hours... it either doesn't start, hangs, or doesn't shut down properly...

              So for the time being I will install a fresh JBoss3.0.0 and see whether the problem is still there...

              That said, I really, really need to find a solution to this problem.

              Fred

              • 4. Re: HA-JNDI (Fault tolerance)
                fred_soulier

                I tried a fresh JBoss3.0.0_Tomcat4.0.3 install on both nodes. I only changed cluster-service.xml, deployed my EAR file, and ran my client... Same thing: only one node processes the requests... kill this node... no switching to the other node... the client throws an exception and stops...

                Fred

                • 5. Re: HA-JNDI (Fault tolerance)
                  belaban

                  Fred,

                  it could be a problem with JavaGroups. Are you sure the two nodes 'see' each other? You can check this by starting two Draw instances and seeing whether they form a group (both should show '2 instances' in their title):

                  java org.javagroups.demos.Draw

                  If this works, then it is not a JavaGroups problem. If they don't find each other, try the following:

                  1. Get the latest JavaGroups from javagroups.sf.net. If you know how, get it from CVS, build javagroups-all.jar, and drop the JAR into the correct location in JBoss.

                  2. If this still doesn't work, modify the JavaGroups properties (described in the Cluster documentation) by adding a bind_addr property to the UDP spec, e.g.:

                  "UDP(bind_addr=192.168.20.210;...):"

                  This tells the instance to bind to the correct interface on a multi-homed system. On the other node, use that node's own address (192.168.20.211).
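A minimal Java sketch of the per-node change Bela describes, assuming every other UDP property stays identical on both nodes so that only bind_addr differs. The helper name udpWithBindAddr and the mcast_addr/mcast_port values are illustrative assumptions; copy the real properties from your own cluster-service.xml. The helper produces only the UDP element, i.e. the part before the first ':' in the protocol stack string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UdpSpec {
    // Returns the UDP element of a JavaGroups protocol stack string with an explicit
    // bind_addr first; any other UDP properties are appended unchanged.
    static String udpWithBindAddr(String bindAddr, String otherProps) {
        StringBuilder sb = new StringBuilder("UDP(bind_addr=").append(bindAddr);
        if (otherProps != null && !otherProps.isEmpty()) {
            sb.append(';').append(otherProps);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Hypothetical values: use the ones from your own cluster-service.xml.
        String otherProps = "mcast_addr=228.1.2.3;mcast_port=45566";
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("node A", "192.168.20.210");
        nodes.put("node B", "192.168.20.211");
        for (Map.Entry<String, String> e : nodes.entrySet()) {
            // Same stack on every node, except for the interface it binds to.
            System.out.println(e.getKey() + ": " + udpWithBindAddr(e.getValue(), otherProps));
        }
    }
}
```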

                  Hope this helps,
                  Bela

                  • 6. Re: HA-JNDI (Fault tolerance)
                    fred_soulier

                    Well, they do see each other: the ReplicantManager displays messages when a node dies (deadMembers: 1) or when a node is started (There are new members. Spawning MergeMembers thread.)
                    Also, when a node is started while another node of the same partition is already running, I can see this in the log: [CLUSTER] Number of cluster members: 2

                    I have just tried JBoss3.0.1RC1_Tomcat4.0.4 and it's exactly the same problem...

                    Fred

                    • 7. Re: HA-JNDI (Fault tolerance)
                      fred_soulier

                      I got JavaGroups from CVS and rebuilt it, then replaced javagroups-2.0.jar in JBoss /server/all/lib with javagroups-all.jar.
                      When I restarted JBoss, UNICAST.setProperties() complained about min_wait_time=2000, which I had in my cluster-service.xml. I removed min_wait_time=2000 and restarted... good, no errors.

                      So now I need to re-run my test, and I will also try the Draw example from JavaGroups just to be on the safe side.
                      Stay tuned :)

                      • 8. Re: HA-JNDI (Fault tolerance)
                        fred_soulier

                        OK. I just tried the Draw demo from JavaGroups (rebuilt from CVS, v2.0.2), and it works fine.

                        1st Scenario
                        ------------
                        2 instances of Draw running on the same box can see each other. Drawings on one appear on the other, etc...

                        2nd scenario
                        ------------
                        2 instances running on 2 different boxes (by the way, these are the same two boxes I use for my 2-node cluster),
                        and again they can see each other. Drawings on one appear on the other, etc...


                        So JavaGroups (v2.0.2 built from CVS) runs fine on these boxes.

                        /Fred

                        • 9. Re: HA-JNDI (Fault tolerance)
                          fred_soulier

                          Finally, ran my client again with following config:

                          JBoss3.0.0_Tomcat4.0.3
                          Javagroups in JBoss3.0.0 replaced by Javagroups 2.0.2 from CVS

                          Same problem. Only one node serves the requests, and if it dies there is no failover to the second node...

                          Can someone look at this problem? I'm ready to try whatever hacks/fixes may work, but I need directions.
                          Thanks.

                          /Fred

                          • 10. Re: HA-JNDI (Fault tolerance)
                            fred_soulier

                            OK. It seems that the name of the EJB hasn't been bound through the HA-JNDI...
                            Logging CONSOLE output in debug mode, I get:

                            18:39:09,634 DEBUG [HAJNDI] lookupLocally

                            Looking at the source of org.jboss.ha.jndi.HAJNDI, in the lookup(Name name) method, I most likely get this message because super.lookup(name) (in NamingServer) failed...
                            It then calls lookupLocally(name), which does find the name.
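A simplified model of the lookup order described above, assuming two plain maps stand in for the replicated HA-JNDI tree and the node-local JNDI tree. This is a hypothetical illustration of the fallback, not the HAJNDI source; the class and method names other than lookupLocally are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;

public class FallbackLookup {
    // Hypothetical stand-ins for the two trees: the cluster-wide HA-JNDI tree and
    // the node-local JNDI tree (where, per the documentation, EJB homes end up).
    private final Map<String, Object> replicated = new HashMap<>();
    private final Map<String, Object> local = new HashMap<>();

    void bindReplicated(String name, Object value) {
        replicated.put(name, value);
    }

    void bindLocal(String name, Object value) {
        local.put(name, value);
    }

    // Look in the replicated tree first; on a miss, fall back to the local tree.
    Object lookup(String name) throws NamingException {
        Object obj = replicated.get(name);
        return (obj != null) ? obj : lookupLocally(name);
    }

    private Object lookupLocally(String name) throws NamingException {
        Object obj = local.get(name);
        if (obj == null) {
            throw new NameNotFoundException(name);
        }
        return obj;
    }

    public static void main(String[] args) throws NamingException {
        FallbackLookup ctx = new FallbackLookup();
        ctx.bindLocal("ejb/GUIDGenerator", "local home proxy");
        // Not in the replicated tree, so the local binding is returned.
        System.out.println(ctx.lookup("ejb/GUIDGenerator"));
    }
}
```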

                            I've attached the client I use and the cluster-service.xml

                            In jboss.xml I have for my ejb/GUIDGenerator:

                            <!-- =================== -->
                            <!-- GUID Generator EJB -->
                            <!-- =================== -->

                            <session>
                               <ejb-name>GUIDGeneratorEJB</ejb-name>
                               <jndi-name>ejb/GUIDGenerator</jndi-name>
                               <clustered>True</clustered>
                               <cluster-config>
                                  <partition-name>CLUSTER</partition-name>
                                  <home-load-balance-policy>org.jboss.ha.framework.interfaces.RoundRobin</home-load-balance-policy>
                                  <bean-load-balance-policy>org.jboss.ha.framework.interfaces.RoundRobin</bean-load-balance-policy>
                               </cluster-config>
                            </session>



                            If I use PROVIDER_URL="192.168.20.210:1099", the name is found.
                            If I use PROVIDER_URL="", the name is also found, because HAJNDI looked it up locally.

                            Why is my EJB name not bound through HA-JNDI?
                            Am I missing something in a config file?

                            /Fred

                            • 11. Re: HA-JNDI (Fault tolerance)
                              slaboure

                              Hello Fred,

                              I made a few bug fixes this weekend. Could you please get a fresh version from HEAD (jboss-all from HEAD) and try it? But don't forget to set your test client code to use HA-JNDI and not simply JNDI!! (Use the correct port number *on the client side*!)

                              Cheers,


                              Sacha

                              • 12. Re: HA-JNDI (Fault tolerance)
                                fred_soulier

                                Hi Sacha
                                Thanks.
                                Please see the attached files.

                                Basically, it yielded some positive results (a joy to my eyes) and maybe some not-so-good ones.

                                /Fred

                                • 13. Re: HA-JNDI (Fault tolerance)
                                  slaboure

                                  > Scenario #2
                                  > -----------
                                  > The 3 nodes are running.
                                  >
                                  > From my client trying to lookup my EJB with:
                                  > PROVIDER_URL = ""
                                  > fails.

                                  don't set PROVIDER_URL = "", but simply PROVIDER_URL = null (i.e. don't set it!)

                                  If you still get exceptions, then a stack trace would be appreciated.
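A minimal sketch of the two client-side options discussed in this thread, assuming a JBoss 3.x client with the jnp naming classes on the classpath: an explicit comma-separated list of HA-JNDI host:port pairs, or no PROVIDER_URL at all (which Fred confirms works later in the thread). The class and method names are hypothetical; the property keys are the standard javax.naming.Context constants.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;

public class HaJndiEnv {
    // Builds a JNDI environment for a JBoss 3.x client. With an empty host list,
    // PROVIDER_URL is left unset rather than set to "".
    static Hashtable<String, String> build(List<String> hosts, int port) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
        if (!hosts.isEmpty()) {
            StringBuilder url = new StringBuilder();
            for (String host : hosts) {
                if (url.length() > 0) {
                    url.append(',');
                }
                url.append(host).append(':').append(port);
            }
            env.put(Context.PROVIDER_URL, url.toString());
        }
        return env;
    }

    public static void main(String[] args) {
        // Port 1100 is the HA-JNDI port used in this thread; 1099 is plain JNDI.
        Hashtable<String, String> env = build(
                Arrays.asList("192.168.20.210", "192.168.20.104", "192.168.20.118"), 1100);
        System.out.println(env.get(Context.PROVIDER_URL));
        // In a real client: Context ctx = new javax.naming.InitialContext(env);
    }
}
```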


                                  > Scenario #3
                                  > -----------
                                  > The 3 nodes are running.
                                  >
                                  > From my client trying to lookup my EJB with:
                                  > PROVIDER_URL = "192.168.20.210:1100,192.168.20.104:1100,192.168.20.118:1100"
                                  > returns the correct lookup but ...
                                  > The client looks up the same EJB 10 times.

                                  That's logical: you need to make subsequent calls *with the same object* to get round-robin behaviour. If you always get a *new* object (stub), it is logical that it doesn't work.
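To illustrate Sacha's point, a deliberately simplified model (not the JBoss RoundRobin class) in which the load-balancing cursor lives inside the stub. Repeated calls on one stub rotate through the nodes, while a fresh stub per call starts from the same position every time. Whether the real policy starts at the first node or at a random one is not covered here; the model simply assumes the first.

```java
import java.util.Arrays;
import java.util.List;

public class RoundRobinStub {
    // Simplified model: the cursor is per-stub state, so only repeated calls on
    // the SAME stub rotate through the targets.
    private final List<String> targets;
    private int cursor = -1;

    RoundRobinStub(List<String> targets) {
        this.targets = targets;
    }

    String chooseTarget() {
        cursor = (cursor + 1) % targets.size();
        return targets.get(cursor);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3");

        RoundRobinStub stub = new RoundRobinStub(nodes);
        for (int i = 0; i < 4; i++) {
            System.out.print(stub.chooseTarget() + " "); // node1 node2 node3 node1
        }
        System.out.println();

        for (int i = 0; i < 4; i++) {
            // A new stub for every call always starts at the same position: node1 each time.
            System.out.print(new RoundRobinStub(nodes).chooseTarget() + " ");
        }
        System.out.println();
    }
}
```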

                                  > So according to the source code, the binding is not found in the HA-JNDI and it
                                  > looks for it in the local JNDI tree which is not what I expected...
                                  > Why is the name not bound through the HA-JNDI?

                                  See the documentation. This is normal.

                                  > Scenario #4 (fail-over)
                                  > -----------------------
                                  > The 3 nodes are running.
                                  >
                                  > The client looks up the same EJB in an infinite loop.
                                  > PROVIDER_URL = "192.168.20.210:1100,192.168.20.104:1100,192.168.20.118:1100"
                                  > (PROVIDER_URL="" does not work, as mentioned earlier)
                                  >
                                  > The client runs happily and requests are dispatched to all nodes.
                                  > I switch off the W2K node #3
                                  > ...
                                  > in the local JNDI tree).
                                  > - once application is deployed and local JNDI tree is setup, the exceptions stop.

                                  A stack trace would be appreciated. And please try it on HEAD (I don't want to hunt old bugs).

                                  Cheers,


                                  Sacha

                                  • 14. Re: HA-JNDI (Fault tolerance)
                                    fred_soulier

                                    Hi Sacha,

                                    Thanks for your reply.

                                    Today's CVS (HEAD) does not build:
                                    ...
                                    generate-parsers:
                                    [mkdir] Created dir: /home/fsoulier/development/JBoss_Head/jboss-all/server/output/parsers/org/jboss/ejb/plugins/cmp/ejbql

                                    BUILD FAILED
                                    file:/home/fsoulier/development/JBoss_Head/jboss-all/server/build.xml:382: Failed to launch JJTree





                                    >>don't set PROVIDER_URL = "", but simply PROVIDER_URL = null (i.e. don't set it!)
                                    >>If you still get exceptions, then a stack trace would be appreciated.
                                    Yep, not setting PROVIDER_URL works.


                                    >>logical, you need to make subsequent calls *with the same object* to have round robin behaviour. You always get a *new* object (stub) => it is logical that it doesn't work.
                                    OK, I changed my client to call the same business method, getGUID(), 6 times on the same stub.
                                    The results were:

                                    Node #1: Linux (192.168.20.104)
                                    Node #2: W2K (192.168.20.210)
                                    Node #3: W2K (192.168.20.211)


                                    Batch   Node #1   Node #2   Node #3
                                    -----   -------   -------   -------
                                    1st     1         3         2
                                    2nd     3         1         2
                                    3rd     1         3         2
                                    4th     1         3         2
                                    5th     3         1         2

                                    So yes, some load balancing is happening. (Note: maybe I should repeat the test with more
                                    nodes and more calls on the same stub?)
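Summing the five batches gives the overall distribution: 30 calls, of which Node #1 served 9, Node #2 served 11 and Node #3 served 10, close to the ideal 10 per node. A small sketch of that arithmetic, with the counts copied from the batch results:

```java
public class BatchTotals {
    // Adds up the calls served per node across all batches.
    static int[] totals(int[][] batches) {
        int[] totals = new int[batches[0].length];
        for (int[] batch : batches) {
            for (int n = 0; n < batch.length; n++) {
                totals[n] += batch[n];
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Rows = batches 1..5, columns = Node #1, Node #2, Node #3 (6 calls per batch).
        int[][] batches = {
            {1, 3, 2},
            {3, 1, 2},
            {1, 3, 2},
            {1, 3, 2},
            {3, 1, 2},
        };
        int[] totals = totals(batches);
        int calls = 0;
        for (int t : totals) {
            calls += t;
        }
        for (int n = 0; n < totals.length; n++) {
            System.out.printf("Node #%d: %d of %d calls (ideal %d)%n",
                    n + 1, totals[n], calls, calls / totals.length);
        }
    }
}
```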


                                    >>see the documentation. normal.
                                    Yes, a fine manual indeed :)
                                    Page 17: "So, a EJB home lookup through HA-JNDI, will always be delegated to the local
                                    JNDI instance."


                                    >>stacktrack appreciated. And try it on HEAD please (I don't want to hunt old bugs)
                                    I changed the client to call the getGUID() method 10000 times on the same stub to test
                                    the stub's failover capability.
                                    All 3 nodes were running and serving responses. I then decided to be mean and
                                    shut down 2 of them, and got this exception:
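A simplified model of client-side failover, assuming (this is not the JBoss proxy code) that the stub fails over only on transport-level errors and hands any exception returned by the server straight back to the caller. In the stack trace that follows, the IllegalStateException arrives through StreamRemoteCall.exceptionReceivedFromServer, i.e. as a server reply rather than a dropped connection; under this model such an exception would not trigger failover. That reading is a hypothesis, not a confirmed diagnosis. FailoverProxy and NodeUnreachableException are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FailoverProxy {
    // Hypothetical marker for a transport-level failure (e.g. connection refused).
    static class NodeUnreachableException extends RuntimeException {
        NodeUnreachableException(String node) {
            super(node + " is unreachable");
        }
    }

    private final List<String> targets;             // live cluster members known to this stub
    private final Function<String, String> invoker; // performs one call against one node
    private int cursor = -1;

    FailoverProxy(List<String> targets, Function<String, String> invoker) {
        this.targets = new ArrayList<>(targets);
        this.invoker = invoker;
    }

    // Round-robin over the live targets. A transport failure removes the target and
    // retries on the next one; any other exception (for example an
    // IllegalStateException sent back by a container that is stopping) propagates.
    String invoke() {
        while (!targets.isEmpty()) {
            cursor = (cursor + 1) % targets.size();
            String node = targets.get(cursor);
            try {
                return invoker.apply(node);
            } catch (NodeUnreachableException e) {
                targets.remove(cursor);
                cursor--; // the next increment lands on the element that shifted into this slot
            }
        }
        throw new RuntimeException("no live targets left");
    }

    public static void main(String[] args) {
        List<String> down = new ArrayList<>();
        FailoverProxy proxy = new FailoverProxy(Arrays.asList("node1", "node2", "node3"), node -> {
            if (down.contains(node)) {
                throw new NodeUnreachableException(node);
            }
            return "GUID from " + node;
        });
        System.out.println(proxy.invoke()); // GUID from node1
        down.add("node2");
        down.add("node3");
        System.out.println(proxy.invoke()); // node2 and node3 are dropped; GUID from node1
    }
}
```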

                                    ...
                                    [java] (#0_6792_<1>) Got_GUID_Generator_Reference: ejb/GUIDGenerator:Stateless / Got_GUID: 5BA35C2B4058146800284ED632C81693
                                    [java] (#0_6793_<1>) Got_GUID_Generator_Reference: ejb/GUIDGenerator:Stateless / Got_GUID: 5B9DC4EC4058142D000297FE0618AD26
                                    [java] (#0_6794_<1>) Got_GUID_Generator_Reference: ejb/GUIDGenerator:Stateless / Got_GUID: 5BA35C344058146800284ED63E7FA0C0
                                    [java] (#0_6795_<1>) Got_GUID_Generator_Reference: ejb/GUIDGenerator:Stateless / Got_GUID: 5B9DC4F64058142D000297FE5A5DB81F
                                    [java] java.lang.IllegalStateException: container is not started, you cannot invoke ejb methods on it
                                    [java] at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:240)
                                    [java] at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:215)
                                    [java] at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:117)
                                    [java] at org.jboss.invocation.jrmp.server.JRMPInvoker_Stub.invoke(Unknown Source)
                                    [java] at org.jboss.invocation.jrmp.interfaces.JRMPInvokerProxyHA.invoke(JRMPInvokerProxyHA.java:164)
                                    [java] at org.jboss.invocation.InvokerInterceptor.invoke(InvokerInterceptor.java:92)
                                    [java] at org.jboss.proxy.TransactionInterceptor.invoke(TransactionInterceptor.java:51)
                                    [java] at org.jboss.proxy.SecurityInterceptor.invoke(SecurityInterceptor.java:48)
                                    [java] at org.jboss.proxy.ejb.StatelessSessionInterceptor.invoke(StatelessSessionInterceptor.java:109)
                                    [java] at org.jboss.proxy.ClientContainer.invoke(ClientContainer.java:82)
                                    [java] at $Proxy2.getGUID(Unknown Source)
                                    [java] at com.lastminute.ebasket.RMISSLTest.main(RMISSLTest.java:90)

                                    /Fred
