
    Strange clustering behaviour when running testsuite offline

      If I am offline (no network connection) and I start jboss-head manually
      in the all configuration, there are no problems booting the server:

      [ejort@warjort bin]$ ./run.sh -c all
      ...
      16:09:46,858 INFO [AjpProtocol] Starting Coyote AJP/1.3 on ajp-127.0.0.1-8009
      16:09:46,867 INFO [ServerImpl] JBoss (Microcontainer) [5.0.0.Beta3 (build: SVNTag=JBoss_5_0_0_Beta3 date=200711121409)] Started in 1m:41s:358ms
      


      If, however, I start the server from the testsuite, I get all sorts of errors
      (in particular from the ejb3 testsuite):
      [ejort@warjort ejb3]$ ./build.sh -f build-test.xml
      [ejort@warjort ejb3]$ ./build.sh -f build-test.xml ejb-tests
      


      e.g. from server/all/log/server.log
      2007-11-12 16:13:50,174 ERROR [org.jgroups.protocols.UDP] failed sending message to null (59 bytes)
      java.lang.Exception: dest=/229.11.11.11:45699 (62 bytes)
       at org.jgroups.protocols.UDP._send(UDP.java:337)
       at org.jgroups.protocols.UDP.sendToAllMembers(UDP.java:287)
       at org.jgroups.protocols.TP.doSend(TP.java:1186)
       at org.jgroups.protocols.TP.send(TP.java:1176)
       at org.jgroups.protocols.TP.down(TP.java:943)
       at org.jgroups.protocols.PING.sendMcastDiscoveryRequest(PING.java:218)
       at org.jgroups.protocols.PING.sendGetMembersRequest(PING.java:212)
       at org.jgroups.protocols.PingSender.run(PingSender.java:76)
       at java.lang.Thread.run(Thread.java:595)
      Caused by: java.io.IOException: Network is unreachable
       at java.net.PlainDatagramSocketImpl.send(Native Method)
       at java.net.DatagramSocket.send(DatagramSocket.java:612)
       at org.jgroups.protocols.UDP._send(UDP.java:328)
       ... 8 more
      


      What's the difference? :-)

      NOTE: I do see some errors at shutdown for the manual boot, which suggests
      it is not properly testing whether some things are working?

      e.g.
      16:12:17,980 ERROR [ExceptionUtil] org.jboss.jms.server.connectionfactory.ConnectionFactory@1f9cb2c startService
      java.lang.IllegalStateException: Cannot find replicant to remove: CF_jboss.messaging.connectionfactory:service=ConnectionFactory
       at org.jboss.jms.server.connectionfactory.ConnectionFactoryJNDIMapper.unregisterConnectionFactory(ConnectionFactoryJNDIMapper.java:274)
       at org.jboss.jms.server.connectionfactory.ConnectionFactory.stopService(ConnectionFactory.java:239)
       at org.jboss.system.ServiceMBeanSupport.jbossInternalStop(ServiceMBeanSupport.java:328)
       at org.jboss.system.ServiceMBeanSupport.stop(ServiceMBeanSupport.java:206)
      


      and

      16:12:19,360 INFO [JGCacheInvalidationBridge] Problem while shuting down invalidation cache bridge
      java.lang.IllegalStateException: Cache not in STARTED state!
       at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:3929)
       at org.jboss.cache.CacheImpl.get(CacheImpl.java:1441)
       at org.jboss.cache.CacheImpl.get(CacheImpl.java:1415)
       at org.jboss.ha.framework.server.DistributedStateImpl.get(DistributedStateImpl.java:296)
       at org.jboss.ha.framework.server.DistributedStateImpl.remove(DistributedStateImpl.java:278)
       at org.jboss.ha.framework.server.AOPContainerProxy$8.remove(AOPContainerProxy$8.java)
       at org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge.stopService(JGCacheInvalidationBridge.java:326)
       at org.jboss.system.ServiceMBeanSupport.jbossInternalStop(ServiceMBeanSupport.java:328)
       at org.jboss.system.ServiceMBeanSupport.jbossInternalLifecycle(ServiceMBeanSupport.java:247)
       at org.jboss.cache.invalidation.bridges.AOPContainerProxy$14.jbossInternalLifecycle(AOPContainerProxy$14.java)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:157)
       at org.jboss.mx.server.Invocation.dispatch(Invocation.java:96)
       at org.jboss.mx.server.Invocation.invoke(Invocation.java:88)
       at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
       at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:668)
       at org.jboss.system.microcontainer.ServiceProxy.invoke(ServiceProxy.java:167)
       at $Proxy4.stop(Unknown Source)
       at org.jboss.system.microcontainer.StartStopLifecycleAction.uninstallAction(StartStopLifecycleAction.java:56)
      


        • 1. Re: Strange clustering behaviour when running testsuite offl
          brian.stansberry

           

          What's the difference? :-)


          Perhaps the setting for node0 in ejb3/local.properties? (Or testsuite/local.properties if running the main testsuite.)

          16:12:17,980 ERROR [ExceptionUtil] org.jboss.jms.server.connectionfactory.ConnectionFactory@1f9cb2c startService
          java.lang.IllegalStateException: Cannot find replicant to remove: CF_jboss.messaging.connectionfactory:service=ConnectionFactory
           at org.jboss.jms.server.connectionfactory.ConnectionFactoryJNDIMapper.unregisterConnectionFactory(ConnectionFactoryJNDIMapper.java:274)
          


          Not sure about this one.

          16:12:19,360 INFO [JGCacheInvalidationBridge] Problem while shuting down invalidation cache bridge
          java.lang.IllegalStateException: Cache not in STARTED state!


          There's something screwy with the service dependencies here; I haven't had a chance to work out exactly what the issue is, or how I think it should work, and make a coherent post about it. The JGCacheInvalidationBridge is being stopped after a JBC service it depends on is already stopped. I'm guessing this all relates to the use of @JMX on pojo services -- the thing actually being stopped is the JMX proxy, while the inter-service dependencies are expressed in terms of the underlying pojo beans.
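
          To make the ordering problem concrete, here's a tiny self-contained sketch
          (plain Java, none of the real JBoss classes -- all the names are made up) of
          what I think is going on: a bridge whose stop() still needs the cache, but the
          cache has already been stopped, because the lifecycle is driven through the JMX
          proxies while the dependencies point at the pojo beans.

          public class StopOrderSketch
          {
             static class Cache
             {
                private boolean started;
                void start() { started = true; }
                void stop()  { started = false; }
                Object get(String key)
                {
                   if (!started)
                      throw new IllegalStateException("Cache not in STARTED state!");
                   return null; // lookup elided
                }
             }

             static class InvalidationBridge
             {
                private final Cache cache;
                InvalidationBridge(Cache cache) { this.cache = cache; }
                void stop()
                {
                   // cleanup that still needs the cache, the way
                   // JGCacheInvalidationBridge.stopService() does
                   cache.get("replicants");
                }
             }

             public static void main(String[] args)
             {
                Cache cache = new Cache();
                InvalidationBridge bridge = new InvalidationBridge(cache);
                cache.start();

                // dependents should be stopped before their dependencies;
                // here it happens the other way round, as in the shutdown log above
                cache.stop();
                bridge.stop(); // IllegalStateException: Cache not in STARTED state!
             }
          }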

          • 2. Re: Strange clustering behaviour when running testsuite offl

            Even with the network connected, it's taking a lot longer to boot the server
            from the testsuite:

            2007-11-12 16:52:53,200 INFO [org.jboss.bootstrap.microcontainer.ServerImpl] JBoss (Microcontainer) [5.0.0.Beta3 (build: SVNTag=JBoss_5_0_0_Beta3 date=200711121409)]
            Started in 5m:29s:740ms
            


            And it has those flush failure messages:
            2007-11-12 16:48:42,394 DEBUG [org.jboss.ha.framework.server.JChannelFactory] Passing unique node id 127.0.0.1:1099 to the channel as additional data
            2007-11-12 16:48:42,753 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 127.0.0.1:32822
            -------------------------------------------------------
            2007-11-12 16:49:06,932 WARN [org.jboss.ha.framework.server.JChannelFactory] Flush failed at 127.0.0.1:32822 DefaultPartition-JMS-CTRL
            


            So it looks like it is not going over loopback even though the server is started
            with "-b 127.0.0.1"?
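
            One quick way to sanity-check that (plain JDK only, no JBoss or JGroups
            classes, so treat it as a rough diagnostic rather than proof) is to print
            what the bind address resolves to and which interface it maps to:

            import java.net.InetAddress;
            import java.net.NetworkInterface;

            public class BindAddressCheck
            {
               public static void main(String[] args) throws Exception
               {
                  InetAddress addr = InetAddress.getByName(args.length > 0 ? args[0] : "127.0.0.1");
                  System.out.println("address           : " + addr);
                  System.out.println("isLoopbackAddress : " + addr.isLoopbackAddress());

                  // which interface the address belongs to, if any
                  NetworkInterface nif = NetworkInterface.getByInetAddress(addr);
                  System.out.println("interface         : " + (nif == null ? "none" : nif.getName()));
               }
            }

            Running it once normally and once with exactly the JVM arguments the testsuite
            passes to the forked server should show whether the two cases really resolve
            the bind address the same way.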

            • 3. Re: Strange clustering behaviour when running testsuite offl

               

              "bstansberry@jboss.com" wrote:
              What's the difference? :-)


              Perhaps the setting for node0 in ejb3/local.properties? (Or testsuite/local.properties if running the main testsuite.)


              node0 is localhost, see jboss-head/ejb3/local.properties

              #IMPORTANT:- Please do not check this file into CVS with your local changes
              #This file is used to pass config info to targets like clustered-tests
              #Please uncomment or add your properties to this file.
              
              #
              # Both node0 and node1 properties are needed to run clustering tests.
              # e.g., clustered-tests. Note that you will need to have two separate ips
              # (even at the same machine). Actually what we needed are just node0 and node1
              # ips and the rest are optional.
              #
              node0=127.0.0.1
              #node0=${env.MYTESTIP_1}
              #node0.http.url=http://192.168.1.103:8080
              #node0.jndi.url=jnp://127.0.0.1:1099
              
              node1=${env.MYTESTIP_2}
              #node1.http.url=http://192.168.1.113:8080
              #node1.jndiurl=jnp://192.168.1.113:1099
              
              #Timeout for jboss to start
              jboss.startup.timeout=120
              


              • 4. Re: Strange clustering behaviour when running testsuite offl
                galder.zamarreno

                Do you get the same issue when running the main testsuite, or is it limited to starting the server from the ejb3 project?

                • 5. Re: Strange clustering behaviour when running testsuite offl
                  brian.stansberry

                  I'll poke around a bit and see if something pops out. I can't see why you'd get different behavior with "./run.sh -c all" and starting the testsuite; with the change you made last week to Main(), in both cases JGroups binds to 127.0.0.1.

                  • 6. Re: Strange clustering behaviour when running testsuite offl

                    I've found the problem. It is the fact that it is using the IPv6 stack.

                    So I guess either the JDK or JGroups isn't handling 127.0.0.1 as loopback for IPv6?
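
                    For what it's worth, the failure reproduces without JGroups at all.
                    This little probe (class name made up; the group and port are the ones
                    from the stack trace in the first post) just sends one datagram to the
                    discovery group. Offline on this box it dies with the same "Network is
                    unreachable" unless it is run with -Djava.net.preferIPv4Stack=true:

                    import java.net.DatagramPacket;
                    import java.net.DatagramSocket;
                    import java.net.InetAddress;
                    import java.net.InetSocketAddress;

                    public class McastProbe
                    {
                       public static void main(String[] args) throws Exception
                       {
                          // the PING discovery group/port from the server.log above
                          InetAddress group = InetAddress.getByName("229.11.11.11");
                          byte[] payload = "ping".getBytes();
                          DatagramSocket sock = new DatagramSocket();
                          sock.send(new DatagramPacket(payload, payload.length,
                                                       new InetSocketAddress(group, 45699)));
                          System.out.println("sent ok from " + sock.getLocalSocketAddress());
                          sock.close();
                       }
                    }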

                    The following patch solves the problem:

                    [ejort@warjort testsuite]$ svn diff
                    Index: imports/server-config.xml
                    ===================================================================
                    --- imports/server-config.xml (revision 66950)
                    +++ imports/server-config.xml (working copy)
                    @@ -87,6 +87,7 @@
                     <jvmarg value="-XX:MaxPermSize=512m" />
                     <jvmarg value="-Xmx512m" />
                     -->
                    + <sysproperty key="java.net.preferIPv4Stack" value="true" />
                     <sysproperty key="java.endorsed.dirs" value="${jboss.dist}/lib/endorsed" />
                     <sysproperty key="jgroups.udp.ip_ttl" value="${jbosstest.udp.ip_ttl}" />
                     </server>
                    
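                    For anyone else chasing this: a quick throwaway way to confirm what a
                    forked server VM actually got on its command line, and what the property
                    resolves to inside that VM (this is just a scratch class, not part of the
                    testsuite):

                    import java.lang.management.ManagementFactory;

                    public class ShowVmArgs
                    {
                       public static void main(String[] args)
                       {
                          // a <sysproperty> on a forked VM normally shows up as a -D argument
                          System.out.println("vm args: " +
                             ManagementFactory.getRuntimeMXBean().getInputArguments());
                          System.out.println("java.net.preferIPv4Stack = " +
                             System.getProperty("java.net.preferIPv4Stack"));
                       }
                    }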


                    • 7. Re: Strange clustering behaviour when running testsuite offl
                      brian.stansberry

                      I owe you at least several beers. And I'm coming to Neuchatel next month so hopefully I can pay up. :-)

                      I'll apply the same to all the server:config entries; it's likely the underlying cause of the long-discussed "hudson noip test run doesn't complete" problem.

                      http://wiki.jboss.org/wiki/Wiki.jsp?page=IPv6 discusses the known issues with IPv6 and JGroups. That page is a bit dated; looks like you've found a new twist on the problem.

                      What JDK and OS are you running?

                      • 8. Re: Strange clustering behaviour when running testsuite offl

                         

                        "bstansberry@jboss.com" wrote:

                        What JDK and OS are you running?


                        [ejort@warjort build]$ uname -a
                        Linux warjort 2.6.22.9-61.fc6 #1 SMP Thu Sep 27 17:45:57 EDT 2007 i686 i686 i386 GNU/Linux
                        [ejort@warjort build]$ java -version
                        java version "1.5.0_09"
                        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b03)
                        Java HotSpot(TM) Server VM (build 1.5.0_09-b03, mixed mode)
                        


                        • 9. Re: Strange clustering behaviour when running testsuite offl
                          galder.zamarreno

                          I wonder whether it makes sense to have some AS and testsuite start options defined in a central place, so as to avoid silly things like this, i.e. -Djava.net.preferIPv4Stack?

                          • 10. Re: Strange clustering behaviour when running testsuite offl
                            dimitris

                            Maybe we should enable java.net.preferIPv4Stack by default in server Main().
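
                            Something like this is probably all it would take (just a sketch,
                            not the actual Main.java):

                            public class Main
                            {
                               public static void main(String[] args) throws Exception
                               {
                                  // only a default: an explicit -Djava.net.preferIPv4Stack=false
                                  // still wins; it also needs to run before any java.net classes
                                  // are initialized, since the stack preference is read only once
                                  if (System.getProperty("java.net.preferIPv4Stack") == null)
                                     System.setProperty("java.net.preferIPv4Stack", "true");

                                  // ... existing bootstrap continues here ...
                               }
                            }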

                            • 11. Re: Strange clustering behaviour when running testsuite offl
                              galder.zamarreno

                              I'm not so sure about putting it there. I was thinking more of having a single source for all the common, shared options used both by the standard bin AS startup and by testsuite-started AS instances.

                              If Sun eventually gets around to fixing the IPv6 issues in the JVM and java.net.preferIPv4Stack were hardcoded in Main, we'd have to change the code.