12 Replies Latest reply on Nov 29, 2006 11:29 AM by belaban

    Error starting cache with TCP and TCPPING

    urisherman

      Hi,
      I tried running a simple program using JBossCache (both standalone on Windows and in WLS 8.1 on Linux, so the environment probably isn't the issue) after changing the configuration to use unicast via TCP and TCPPING rather than UDP multicast.

      I took the new configuration from an example on the JGroups site:


      <TCP start_port="7050" />
      <TCPPING initial_hosts="10.106.124.239[7050],10.106.124.240[7050]" port_range="5" timeout="3000"
      num_initial_members="2" up_thread="true" down_thread="true"/>

      <VERIFY_SUSPECT timeout="1500"
      up_thread="false" down_thread="false"/>

      <pbcast.STABLE desired_avg_gossip="20000"
      up_thread="false" down_thread="false"/>

      <pbcast.NAKACK gc_lag="100" retransmit_timeout="3000"
      up_thread="true" down_thread="true"/>

      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
      shun="false" print_local_addr="false" up_thread="true" down_thread="true"/>


      I got this error:

      Exception in thread "Main Thread" org.jgroups.ChannelException: unable to setup the protocol stack
      at org.jgroups.JChannel.<init>(JChannel.java:261)
      at org.jgroups.JChannel.<init>(JChannel.java:234)
      at org.jboss.cache.TreeCache._createService(TreeCache.java:1373)
      at org.jboss.cache.TreeCache.createService(TreeCache.java:1300)
      at test.Main.main(Main.java:22)
      Caused by: java.lang.Exception: Configurator.sanityCheck(): event GET_DIGEST_STABLE is required by STABLE, but not provided by any of the layers below
      at org.jgroups.stack.Configurator.sanityCheck(Configurator.java:345)
      at org.jgroups.stack.Configurator.createProtocols(Configurator.java:280)
      at org.jgroups.stack.Configurator.setupProtocolStack(Configurator.java:56)
      at org.jgroups.stack.ProtocolStack.setup(ProtocolStack.java:177)
      at org.jgroups.JChannel.<init>(JChannel.java:258)
      ... 4 more

      Any help would be appreciated; I couldn't find any info about this "GET_DIGEST_STABLE" event.

        • 1. Re: Error starting cache with TCP and TCPPING
          belaban

          This stack config is totally incorrect; you're missing 50% of it. Take tcp.xml from any of the JGroups source distributions.
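
As a rough illustration of what the sanity check in the stack trace above is complaining about: the message says STABLE requires GET_DIGEST_STABLE from a layer below it. The sketch below is a simplified, hypothetical model of such a bottom-up check, not the real JGroups Configurator. It assumes that protocols are listed bottom to top and that NAKACK is the layer providing GET_DIGEST_STABLE (the later configs in this thread do list pbcast.NAKACK before pbcast.STABLE, but the thread does not confirm this). All class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a bottom-up protocol stack check.
// This is NOT the JGroups Configurator; every name here is illustrative.
class StackSanityCheck {

    /** One layer: the events it needs from below and the events it offers to layers above. */
    static final class Layer {
        final String name;
        final Set<String> requiredFromBelow;
        final Set<String> providedUp;

        Layer(String name, List<String> requiredFromBelow, List<String> providedUp) {
            this.name = name;
            this.requiredFromBelow = new HashSet<>(requiredFromBelow);
            this.providedUp = new HashSet<>(providedUp);
        }
    }

    private static final List<String> NONE = Collections.emptyList();
    private static final List<String> DIGEST = Arrays.asList("GET_DIGEST_STABLE");

    // Assumption: NAKACK provides GET_DIGEST_STABLE, STABLE requires it.
    static final Layer NAKACK = new Layer("NAKACK", NONE, DIGEST);
    static final Layer STABLE = new Layer("STABLE", DIGEST, NONE);

    static Layer plain(String name) {
        return new Layer(name, NONE, NONE);
    }

    /** Layer order of the first config in the thread, bottom to top. */
    static List<Layer> originalOrder() {
        return Arrays.asList(plain("TCP"), plain("TCPPING"), plain("VERIFY_SUSPECT"),
                STABLE, NAKACK, plain("GMS"));
    }

    /** The same layers with NAKACK moved below STABLE. */
    static List<Layer> reordered() {
        return Arrays.asList(plain("TCP"), plain("TCPPING"), plain("VERIFY_SUSPECT"),
                NAKACK, STABLE, plain("GMS"));
    }

    /** Walks the stack bottom-up and returns one message per unmet requirement. */
    static List<String> check(List<Layer> bottomToTop) {
        List<String> problems = new ArrayList<>();
        Set<String> providedSoFar = new HashSet<>();
        for (Layer layer : bottomToTop) {
            for (String event : layer.requiredFromBelow) {
                if (!providedSoFar.contains(event)) {
                    problems.add("event " + event + " is required by " + layer.name
                            + ", but not provided by any of the layers below");
                }
            }
            // Only after checking this layer do its own events become available to layers above.
            providedSoFar.addAll(layer.providedUp);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("original:  " + check(originalOrder()));
        System.out.println("reordered: " + check(reordered()));
    }
}
```

The sketch only models this one dependency. It does not make the first config complete, which is why starting from the tcp.xml shipped with the matching JGroups version remains the safer route.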

          • 2. Re: Error starting cache with TCP and TCPPING
            urisherman

            Hi,
            I got tcp.xml from the JGroups distribution. This is my configuration now:
            <TCP start_port="7050"
            loopback="true"
            recv_buf_size="20000000"
            send_buf_size="640000"
            discard_incompatible_packets="true"
            max_bundle_size="64000"
            max_bundle_timeout="30"
            use_incoming_packet_handler="true"
            use_outgoing_packet_handler="false"
            down_thread="false" up_thread="false"
            enable_bundling="true"
            use_send_queues="false"
            sock_conn_timeout="300"
            skip_suspected_members="true"/>
            <TCPPING timeout="3000"
            down_thread="false" up_thread="false"
            initial_hosts="10.106.124.239[7050],10.106.124.240[7050]"
            port_range="1"
            num_initial_members="3"/>
            <MERGE2 max_interval="100000"
            down_thread="false" up_thread="false" min_interval="20000"/>
            <FD_SOCK down_thread="false" up_thread="false"/>
            <FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
            <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
            <pbcast.NAKACK max_xmit_size="60000"
            use_mcast_xmit="false" gc_lag="0"
            retransmit_timeout="300,600,1200,2400,4800"
            down_thread="false" up_thread="false"
            discard_delivered_msgs="true"/>
            <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
            down_thread="false" up_thread="false"
            max_bytes="400000"/>
            <pbcast.GMS print_local_addr="true" join_timeout="3000"
            down_thread="false" up_thread="false"
            join_retry_timeout="2000" shun="true"
            view_bundling="true"/>
            <FC max_credits="2000000" down_thread="false" up_thread="false"
            min_threshold="0.10"/>
            <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
            <pbcast.STREAMING_STATE_TRANSFER down_thread="false" up_thread="false"
            use_flush="true" use_reading_thread="true"/>
            <!-- pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/ -->
            <pbcast.FLUSH down_thread="false" up_thread="false"/>



            But it still doesn't work, and I'm kind of lost here.
            I got this error:
            org.jgroups.ChannelException: unable to setup the protocol stack
            at org.jgroups.JChannel.<init>(Lorg.jgroups.conf.ProtocolStackConfigurator;)V(JChannel.java:261)
            at org.jgroups.JChannel.<init>(Ljava.lang.String;)V(JChannel.java:234)
            at org.jboss.cache.TreeCache._createService()V(TreeCache.java:1373)
            at org.jboss.cache.TreeCache.createService()V(TreeCache.java:1300)
            at org.jboss.cache.example.j2eeservices.JBossCacheManager.main([Ljava.lang.String;)V(JBossCacheManager.java:74)
            at jrockit.reflect.NativeMethodInvoker.invoke0(Ljava.lang.Object;ILjava.lang.Object;[Ljava.lang.Object;)Ljava.lang.Object;(Unknown Source)
            at jrockit.reflect.NativeMethodInvoker.invoke(Ljava.lang.Object;[Ljava.lang.Object;)Ljava.lang.Object;(Unknown Source)
            at java.lang.reflect.Method.invoke(Ljava.lang.Object;[Ljava.lang.Object;I)Ljava.lang.Object;(Unknown Source)
            at weblogic.t3.srvr.StartupClassService.invokeMain(Ljava.lang.String;Ljava.lang.Class;Ljava.lang.String;)V(StartupClassService.java:229)
            at weblogic.t3.srvr.StartupClassService.invokeClass(Ljava.lang.String;Ljava.lang.String;Ljava.lang.String;)V(StartupClassService.java:160)
            at weblogic.t3.srvr.StartupClassService.access$000(Lweblogic.t3.srvr.StartupClassService;Ljava.lang.String;Ljava.lang.String;Ljava.lang.String;)V(StartupClassService.java:36)
            at weblogic.t3.srvr.StartupClassService$1.run()Ljava.lang.Object;(StartupClassService.java:121)
            at weblogic.security.acl.internal.AuthenticatedSubject.doAs(Lweblogic.security.subject.AbstractSubject;Ljava.security.PrivilegedAction;)Ljava.lang.Object;(AuthenticatedSubject.java:321)
            at weblogic.security.service.SecurityManager.runAs(Lweblogic.security.acl.internal.AuthenticatedSubject;Lweblogic.security.acl.internal.AuthenticatedSubject;Ljava.security.PrivilegedAction;)Ljava.lang.Object;(SecurityManager.java:118)
            at weblogic.t3.srvr.StartupClassService.invokeStartupClass(Lweblogic.management.configuration.StartupClassMBean;)V(StartupClassService.java:116)
            at weblogic.t3.srvr.StartupClassService.initialize()V(StartupClassService.java:60)
            at weblogic.t3.srvr.SubsystemManager.initialize()V(SubsystemManager.java:118)
            at weblogic.t3.srvr.T3Srvr.initializeHere()V(T3Srvr.java:895)
            at weblogic.t3.srvr.T3Srvr.initialize()V(T3Srvr.java:670)
            at weblogic.t3.srvr.T3Srvr.run([Ljava.lang.String;)I(T3Srvr.java:344)
            at weblogic.Server.main([Ljava.lang.String;)V(Server.java:32)
            Caused by: java.lang.Exception: ProtocolStack.setup(): couldn't create protocol stack
            at org.jgroups.stack.ProtocolStack.setup()V(ProtocolStack.java:179)
            at org.jgroups.JChannel.<init>(Lorg.jgroups.conf.ProtocolStackConfigurator;)V(JChannel.java:258)
            ... 20 more

            • 3. Re: Error starting cache with TCP and TCPPING
              belaban

              If you have the full stack trace, JGroups should spit out the reason why you cannot create the stack.
              I also suggest you test this config with standalone JGroups, to see whether this works: http://wiki.jboss.org/wiki/Wiki.jsp?page=TestingJBoss

              • 4. Re: Error starting cache with TCP and TCPPING
                urisherman

                 

                belaban wrote: "If you have the full stack trace, JGroups should spit out the reason why you cannot create the stack."


                That's what you'd think, but no..... all I got was that error message I posted.

                Anyway, I realized the distro's version I downloaded might be incompatible with the one jbosscache uses, so I downloaded an earlier JGroups distro and now it's starting up properly.

                This is my configuration now -
                <config>
                <TCP start_port="7050" loopback="true"
                send_buf_size="100000" recv_buf_size="200000" />
                <TCPPING timeout="3000" initial_hosts="10.106.124.240[7050],10.106.124.239[7050]"
                port_range="3" num_initial_members="2" />
                <FD timeout="2000" max_tries="4" />
                <VERIFY_SUSPECT timeout="1500" down_thread="false"
                up_thread="false" />
                <pbcast.NAKACK gc_lag="100"
                retransmit_timeout="600,1200,2400,4800" />
                <pbcast.STABLE stability_delay="1000"
                desired_avg_gossip="20000" down_thread="false" max_bytes="0"
                up_thread="false" />
                <VIEW_SYNC avg_send_interval="60000" down_thread="false"
                up_thread="false" />
                <pbcast.GMS print_local_addr="true" join_timeout="5000"
                join_retry_timeout="2000" shun="true" />
                <pbcast.STATE_TRANSFER down_thread="true"
                up_thread="true"/>
                </config>

                But now I have a new problem.
                The first machine starts up just fine. I then add a node to the tree and start the second machine, which fails to retrieve the initial state:

                org.jboss.cache.CacheException: Initial state transfer failed: Channel.getState() returned false
                at org.jboss.cache.TreeCache.fetchStateOnStartup()V(TreeCache.java:3191)
                at org.jboss.cache.TreeCache.startService()V(TreeCache.java:1429)
                at org.jboss.cache.example.j2eeservices.JBossCacheManager.main([Ljava.lang.String;)V(JBossCacheManager.java:75)
                at jrockit.reflect.NativeMethodInvoker.invoke0(Ljava.lang.Object;ILjava.lang.Object;[Ljava.lang.Object;)Ljava.lang.Object;(Unknown Source)
                at jrockit.reflect.NativeMethodInvoker.invoke(Ljava.lang.Object;[Ljava.lang.Object;)Ljava.lang.Object;(Unknown Source)
                at java.lang.reflect.Method.invoke(Ljava.lang.Object;[Ljava.lang.Object;I)Ljava.lang.Object;(Unknown Source)
                at weblogic.t3.srvr.StartupClassService.invokeMain(Ljava.lang.String;Ljava.lang.Class;Ljava.lang.String;)V(StartupClassService.java:229)
                at weblogic.t3.srvr.StartupClassService.invokeClass(Ljava.lang.String;Ljava.lang.String;Ljava.lang.String;)V(StartupClassService.java:160)
                at weblogic.t3.srvr.StartupClassService.access$000(Lweblogic.t3.srvr.StartupClassService;Ljava.lang.String;Ljava.lang.String;Ljava.lang.String;)V(StartupClassService.java:36)
                at weblogic.t3.srvr.StartupClassService$1.run()Ljava.lang.Object;(StartupClassService.java:121)
                at weblogic.security.acl.internal.AuthenticatedSubject.doAs(Lweblogic.security.subject.AbstractSubject;Ljava.security.PrivilegedAction;)Ljava.lang.Object;(AuthenticatedSubject.java:321)
                at weblogic.security.service.SecurityManager.runAs(Lweblogic.security.acl.internal.AuthenticatedSubject;Lweblogic.security.acl.internal.AuthenticatedSubject;Ljava.security.PrivilegedAction;)Ljava.lang.Object;(SecurityManager.java:118)
                at weblogic.t3.srvr.StartupClassService.invokeStartupClass(Lweblogic.management.configuration.StartupClassMBean;)V(StartupClassService.java:116)
                at weblogic.t3.srvr.StartupClassService.initialize()V(StartupClassService.java:60)
                at weblogic.t3.srvr.SubsystemManager.initialize()V(SubsystemManager.java:118)
                at weblogic.t3.srvr.T3Srvr.initializeHere()V(T3Srvr.java:895)
                at weblogic.t3.srvr.T3Srvr.initialize()V(T3Srvr.java:670)
                at weblogic.t3.srvr.T3Srvr.run([Ljava.lang.String;)I(T3Srvr.java:344)
                at weblogic.Server.main([Ljava.lang.String;)V(Server.java:32)


                I tried the solution suggested in the troubleshooting section, on the theory that the state might be taking longer to retrieve than the configured "InitialStateRetrievalTimeout". That wasn't it: I gave it 50000 ms and it still fails in under 3 seconds, so it obviously has nothing to do with the timeout.
                Any ideas?

                • 5. Re: Error starting cache with TCP and TCPPING
                  brian.stansberry

                  Is there anything in the logs on the other server that gives any indication as to what went wrong?

                  • 6. Re: Error starting cache with TCP and TCPPING
                    urisherman

                    There sure were; I didn't notice them because I had a logging problem. Anyway, I was missing a couple of jars (they're not mentioned on the http://wiki.jboss.org/wiki/Wiki.jsp?page=JBossCacheAndWebLogic page).
                    Anyway, I got it sorted out and it seems to be working now.
                    I still get some errors in the log:

                    Main Thread org.jboss.cache.transaction.DummyTransactionManager - binding of DummyTransactionManager failed
                    javax.naming.OperationNotSupportedException: bind not allowed in a ReadOnlyContext; remaining name '/TransactionManager'
                    at weblogic.jndi.factories.java.ReadOnlyContextWrapper.newOperationNotSupportedException(Ljava.lang.String;Ljavax.naming.Name;)Ljavax.naming.OperationNotSupportedException;(ReadOnlyContextWrapper.java:145)
                    at weblogic.jndi.factories.java.ReadOnlyContextWrapper.newOperationNotSupportedException(Ljava.lang.String;Ljava.lang.String;)Ljavax.naming.OperationNotSupportedException;(ReadOnlyContextWrapper.java:161)
                    at weblogic.jndi.factories.java.ReadOnlyContextWrapper.bind(Ljava.lang.String;Ljava.lang.Object;)V(ReadOnlyContextWrapper.java:57)
                    at weblogic.jndi.internal.AbstractURLContext.bind(Ljava.lang.String;Ljava.lang.Object;)V(AbstractURLContext.java:45)
                    at javax.naming.InitialContext.bind(Ljava.lang.String;Ljava.lang.Object;)V(InitialContext.java:355)
                    at org.jboss.cache.transaction.DummyTransactionManager.getInstance()Lorg.jboss.cache.transaction.DummyTransactionManager;(DummyTransactionManager.java:33)
                    at org.jboss.cache.DummyTransactionManagerLookup.getTransactionManager()Ljavax.transaction.TransactionManager;(DummyTransactionManagerLookup.java:17)
                    at org.jboss.cache.TreeCache._createService()V(TreeCache.java:1314)
                    at org.jboss.cache.TreeCache.createService()V(TreeCache.java:1300)
                    .
                    .
                    .



                    And

                    DownHandler (TCP) org.jgroups.protocols.TCP - failed to join /224.0.0.75:7500 on eth1: java.net.SocketException: Invalid argument
                    DownHandler (TCP) org.jgroups.protocols.TCP - failed to join /224.0.0.75:7500 on lo: java.net.SocketException: Invalid argument



                    Should I worry about them?

                    • 7. Re: Error starting cache with TCP and TCPPING
                      urisherman

                      I also get an error when trying to remove a cache entry (using the example given at http://wiki.jboss.org/wiki/Wiki.jsp?page=JBossCacheAndWebLogic).

                      This is the code:

                      Node n = cache.get( node );
                      
                      // for some reason, calling n.remove( name ) only removes the object
                      // in a local copy of the cache, and does not replicate this change.
                      // the only way I've found to replicate the change is to remove the
                      // entire node, make changes to it, and add it again.
                      n.remove( name );
                      
                      // this map contains all the orig data, minus the object just removed above.
                      Map data = n.getData();
                      
                      // remove the entire node from cache
                      cache.remove( node );
                      
                      // and add it again with the updated data.
                      cache.put( node, data ); // <---- It fails here
                      
                      logger.debug("Removed item from node");




                      This is the exception:

                      java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: registerNatives
                      at org.jboss.cache.TreeCache.invokeMethod(Lorg.jgroups.blocks.MethodCall;)Ljava.lang.Object;(TreeCache.java:5526)
                      at org.jboss.cache.TreeCache.put(Lorg.jboss.cache.Fqn;Ljava.util.Map;)V(TreeCache.java:3601)
                      at org.jboss.cache.TreeCache.put(Ljava.lang.String;Ljava.util.Map;)V(TreeCache.java:3585)
                      at org.jboss.cache.example.servlet.ManipulateCache.doGet(Ljavax.servlet.http.HttpServletRequest;Ljavax.servlet.http.HttpServletResponse;)V(ManipulateCache.java:94)
                      at javax.servlet.http.HttpServlet.service(Ljavax.servlet.http.HttpServletRequest;Ljavax.servlet.http.HttpServletResponse;)V(HttpServlet.java:740)
                      at javax.servlet.http.HttpServlet.service(Ljavax.servlet.ServletRequest;Ljavax.servlet.ServletResponse;)V(HttpServlet.java:853)
                      at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run()Ljava.lang.Object;(ServletStubImpl.java:996)


                      • 8. Re: Error starting cache with TCP and TCPPING
                        manik

                         


                        urisherman wrote: "Anyway, it was a couple of jars I was missing (they're not mentioned at the http://wiki.jboss.org/wiki/Wiki.jsp?page=JBossCacheAndWebLogic page).
                        Anyway, I got it sorted out and it seems to be working now."


                        Care to mention what these are, or even perhaps update the wiki page? :-)

                        • 9. Re: Error starting cache with TCP and TCPPING
                          manik

                           



                          urisherman wrote:
                          Main Thread org.jboss.cache.transaction.DummyTransactionManager - binding of DummyTransactionManager failed
                          javax.naming.OperationNotSupportedException: bind not allowed in a ReadOnlyContext; remaining name '/TransactionManager'
                          at



                          This is to do with trying to bind the dummy TM in WL's JNDI tree. Why are you using the dummy TM anyway? Why not use WL's TM?

                          • 10. Re: Error starting cache with TCP and TCPPING
                            manik

                             


                            urisherman wrote:
                            DownHandler (TCP) org.jgroups.protocols.TCP - failed to join /224.0.0.75:7500 on eth1: java.net.SocketException: Invalid argument
                            DownHandler (TCP) org.jgroups.protocols.TCP - failed to join /224.0.0.75:7500 on lo: java.net.SocketException: Invalid argument



                            TCP trying to bind to a multicast address? Why is this, is this something you have configured? Also, 224.0.0.75 seems to be a reserved address (see IANA website) - perhaps this is something WL uses?


                            • 11. Re: Error starting cache with TCP and TCPPING
                              urisherman

                              Why would it try to bind to a mcast address? I didn't define any -

                              <config>
                              <TCP start_port="7050" loopback="true"
                              send_buf_size="100000" recv_buf_size="200000" />
                              <TCPPING timeout="3000" initial_hosts="10.106.124.240[7050],10.106.124.239[7050]"
                              port_range="3" num_initial_members="2" />
                              <FD timeout="2000" max_tries="4" />
                              <VERIFY_SUSPECT timeout="1500" down_thread="false"
                              up_thread="false" />
                              <pbcast.NAKACK gc_lag="100"
                              retransmit_timeout="600,1200,2400,4800" />
                              <pbcast.STABLE stability_delay="1000"
                              desired_avg_gossip="20000" down_thread="false" max_bytes="0"
                              up_thread="false" />
                              <VIEW_SYNC avg_send_interval="60000" down_thread="false"
                              up_thread="false" />
                              <pbcast.GMS print_local_addr="true" join_timeout="5000"
                              join_retry_timeout="2000" shun="true" />
                              <pbcast.STATE_TRANSFER down_thread="true"
                              up_thread="true"/>
                              </config>


                              • 12. Re: Error starting cache with TCP and TCPPING
                                belaban

                                That's JGroups itself: set enable_diagnostics="false" on TCP and this will go away. 224.0.0.75 is a multicast address used for probing JGroups instances in a network. This is described in detail on the JGroups wiki.
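
To double-check the point about the address in the log: 224.0.0.75 falls in the IPv4 multicast range, whereas the initial_hosts entries are ordinary private unicast addresses. Here is a minimal sketch using only java.net.InetAddress from the JDK. The names AddressKind and describe are made up for illustration and have nothing to do with JGroups itself.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative helper, not part of JGroups or JBossCache.
class AddressKind {

    /** Classifies a literal IP address; literals are parsed without a DNS lookup. */
    static String describe(String literal) {
        InetAddress addr;
        try {
            addr = InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + literal, e);
        }
        if (addr.isMulticastAddress()) {
            return "multicast";
        }
        if (addr.isSiteLocalAddress()) {
            return "site-local unicast";
        }
        return "unicast";
    }

    public static void main(String[] args) {
        // Address from the "failed to join" log lines.
        System.out.println("224.0.0.75     -> " + describe("224.0.0.75"));
        // Addresses from the TCPPING initial_hosts attribute.
        System.out.println("10.106.124.239 -> " + describe("10.106.124.239"));
        System.out.println("10.106.124.240 -> " + describe("10.106.124.240"));
    }
}
```

This only confirms what kind of address appears in the log. The log lines come from the diagnostics feature described above, not from anything in the TCP/TCPPING host list.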