3 Replies Latest reply on May 17, 2007 1:15 AM by jagadeeshvn

    getCacheFromCoordinator received null cache

    jagadeeshvn

      Hi All,

      I am trying to set up a Tomcat cluster with 5 servers, and my application uses JBoss POJO Cache. Some of my servers (let's call them web5, web8 and web10) had problems finding each other in the cluster, and we found that multicast packets were not reaching those servers. The servers are all multi-homed, so we decided to use GossipRouter: we started it on one of the nodes and applied the configuration described in the JGroups manual (http://www.jgroups.org/javagroupsnew/docs/manual/html/user-advanced.html).

      Now all the servers can see each other, but session replication is still not working on web5, web8 and web10. When I start the server, I get the following console output:


      -------------------------------------------------------
      GMS: address is 10.5.108.78:36970
      -------------------------------------------------------
      INFO : [2007 05 10, 08-37:09(880)] : org.jboss.cache.TreeCache.viewAccepted(TreeCache.java:5342)- viewAccepted(): [10.5.108.80:33011|1] [10.5.108.80:33011, 10.5.108.78:36970]
      INFO : [2007 05 10, 08-37:09(889)] : org.jboss.cache.TreeCache.startService(TreeCache.java:1426)- TreeCache local address is 10.5.108.78:36970
      ERROR: [2007 05 10, 08-37:12(882)] : org.jgroups.protocols.FD_SOCK.getCacheFromCoordinator(FD_SOCK.java:684)- received null cache; retrying
      org.jboss.cache.CacheException: Initial state transfer failed: Channel.getState() returned false
      at org.jboss.cache.TreeCache.fetchStateOnStartup(TreeCache.java:3191)
      at org.jboss.cache.TreeCache.startService(TreeCache.java:1429)
      at org.jboss.cache.aop.PojoCache.startService(PojoCache.java:94)
      at com.xminds.SessionTracker.createCache(SessionTracker.java:42)
      at com.xminds.SessionTracker.StartCache(SessionTracker.java:27)
      at com.xminds.servlets.BaseServlet.<init>(BaseServlet.java:20)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
      at java.lang.Class.newInstance0(Class.java:350)
      at java.lang.Class.newInstance(Class.java:303)
      at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1055)
      at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:932)
      at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:3951)
      at org.apache.catalina.core.StandardContext.start(StandardContext.java:4225)
      at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:759)
      at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:739)
      at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:524)
      at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:809)
      at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:698)
      at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:472)
      at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1122)
      at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:310)
      at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
      at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1021)
      at org.apache.catalina.core.StandardHost.start(StandardHost.java:718)
      at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1013)
      at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:442)
      at org.apache.catalina.core.StandardService.start(StandardService.java:450)
      at org.apache.catalina.core.StandardServer.start(StandardServer.java:709)
      at org.apache.catalina.startup.Catalina.start(Catalina.java:551)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:585)
      at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:294)
      at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:432)
      May 10, 2007 8:37:15 AM org.apache.catalina.cluster.session.DeltaManager start
      INFO: Register manager /SessionTest to cluster element Host with name localhost
      May 10, 2007 8:37:15 AM org.apache.catalina.cluster.session.DeltaManager start
      INFO: Starting clustering manager at /SessionTest
      May 10, 2007 8:37:15 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster logSendMessage
      INFO: SEND May 10, 2007:8:37:15 AM 1 10.5.108.80:4,010 GET-ALL-/SessionTest
      May 10, 2007 8:37:15 AM org.apache.catalina.cluster.session.DeltaManager getAllClusterSessions
      WARNING: Manager [/SessionTest], requesting session state from org.apache.catalina.cluster.mcast.McastMember[tcp://10.5.108.80:4010,TreeCache-Cluster,10.5.108.80,4010, alive=11440]. This operation will timeout if no session state has been received within 60 seconds.
      ERROR: [2007 05 10, 08-37:16(390)] : org.jgroups.protocols.FD_SOCK.getCacheFromCoordinator(FD_SOCK.java:684)- received null cache; retrying
      ERROR: [2007 05 10, 08-37:19(899)] : org.jgroups.protocols.FD_SOCK.getCacheFromCoordinator(FD_SOCK.java:684)- received null cache; retrying
      INFO : [2007 05 10, 08-37:20(426)] : org.jboss.cache.TreeCache._setState(TreeCache.java:2622)- received the state (size=1024 bytes)
      May 10, 2007 8:38:15 AM org.apache.catalina.cluster.session.DeltaManager waitForSendAllSessions
      SEVERE: Manager [/SessionTest]: No session state send at 5/10/07 8:37 AM received, timing out after 60,025 ms.
      May 10, 2007 8:38:15 AM org.apache.coyote.http11.Http11BaseProtocol start
      INFO: Starting Coyote HTTP/1.1 on http-8080
      May 10, 2007 8:38:15 AM org.apache.coyote.http11.Http11BaseProtocol start
      INFO: Starting Coyote HTTP/1.1 on http-8443
      May 10, 2007 8:38:15 AM org.apache.jk.common.ChannelSocket init
      INFO: JK: ajp13 listening on /0.0.0.0:8009
      May 10, 2007 8:38:15 AM org.apache.jk.server.JkMain start
      INFO: Jk running ID=0 time=0/18 config=null
      May 10, 2007 8:38:15 AM org.apache.catalina.storeconfig.StoreLoader load
      INFO: Find registry server-registry.xml at classpath resource
      May 10, 2007 8:38:15 AM org.apache.catalina.startup.Catalina start
      INFO: Server startup in 69672 ms
      May 10, 2007 8:38:20 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster logSendMessage
      INFO: SEND May 10, 2007:8:38:20 AM 0 - 445B819C79A10F527B0A419D2D276B85.node3-1178804300523
      May 10, 2007 8:38:20 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster logSendMessage
      INFO: SEND May 10, 2007:8:38:20 AM 2 - 445B819C79A10F527B0A419D2D276B85.node3-1178804300580
      INFO : [2007 05 10, 08-38:30(036)] : com.xminds.servlets.AddPersonServlet.doService(AddPersonServlet.java:31)- Receiving add person request from : 61.17.42.35
      INFO : [2007 05 10, 08-38:30(154)] : com.xminds.servlets.AddPersonServlet.doService(AddPersonServlet.java:78)- Adding person : 123, 123 [123<1111> : 123 ] to cache.
      Adding person : 123, 123 [123<1111> : 123 ] to cache against key : 123
      ERROR: [2007 05 10, 08-38:30(156)] : org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)- Servlet.service() for servlet addperson threw exception
      java.lang.NullPointerException
      at com.xminds.SessionTracker.put(SessionTracker.java:64)
      at com.xminds.servlets.AddPersonServlet.doService(AddPersonServlet.java:80)
      at com.xminds.servlets.AddPersonServlet.doPost(AddPersonServlet.java:27)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
      at org.apache.catalina.cluster.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:209)
      at org.apache.catalina.cluster.tcp.ReplicationValve.invoke(ReplicationValve.java:346)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
      at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
      at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
      at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
      at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
      at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
      at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
      at java.lang.Thread.run(Thread.java:595)
      May 10, 2007 8:38:35 AM org.apache.catalina.cluster.deploy.WarWatcher check
      INFO: check cluster wars at /cluster/apache-tomcat-5.5.20/war-listen


      The NullPointerException is thrown because the cache never started, owing to the initial state transfer failure above.
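Until the underlying state transfer issue is fixed, the servlet could at least fail with a clear message instead of an NPE. A minimal sketch of such a guard, assuming a null cache field on startup failure (SessionTracker's real API is not shown in this thread, so the names here are hypothetical):

```java
// Hypothetical sketch of a null-guard in SessionTracker.put();
// the real SessionTracker API is not shown in the thread.
public class SessionTrackerSketch {
    private Object cache; // stays null when PojoCache startup failed

    public void put(String key, Object person) {
        if (cache == null) {
            // Fail fast with a descriptive error instead of an NPE deep inside put()
            throw new IllegalStateException(
                "PojoCache did not start (initial state transfer failed); cannot cache " + key);
        }
        // Real code would delegate to the underlying cache here.
    }
}
```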


      Please find my service XML file below.

      (The XML markup was stripped when posting; the complete file is reposted in the first reply.)


      Any idea why this is happening with these 3 servers? The application works on web6 and web9 without any issues, and session replication is also working fine there.

      Any help will be greatly appreciated.

      Thanks
      Jugs

        • 1. Re: getCacheFromCoordinator received null cache
          jagadeeshvn

          Sorry, I couldn't attach the XML before.


          <?xml version="1.0" encoding="UTF-8" ?>
          
          <server>
           <mbean code="org.jboss.cache.aop.PojoCache"
           name="jboss.cache:service=PojoCache">
           <depends>jboss:service=TransactionManager</depends>
          
           <!-- Configure the TransactionManager -->
           <attribute name="TransactionManagerLookupClass">
           org.jboss.cache.DummyTransactionManagerLookup
           </attribute>
          
           <!-- Isolation level : SERIALIZABLE
           REPEATABLE_READ (default)
           READ_COMMITTED
           READ_UNCOMMITTED
           NONE
           -->
           <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
          
           <!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC -->
           <attribute name="CacheMode">REPL_SYNC</attribute>
          
           <!-- Just used for async repl: use a replication queue -->
           <attribute name="UseReplQueue">false</attribute>
          
           <!-- Replication interval for replication queue (in ms) -->
           <attribute name="ReplQueueInterval">0</attribute>
          
           <!-- Max number of elements which trigger replication -->
           <attribute name="ReplQueueMaxElements">0</attribute>
          
           <!-- Name of cluster. Needs to be the same for all clusters, in order
           to find each other
           -->
           <attribute name="ClusterName">Sample-Cache</attribute>
          
           <!-- JGroups protocol stack properties. Can also be a URL,
           e.g. file:/home/bela/default.xml
           <attribute name="ClusterProperties"></attribute>
           -->
          
           <!--bind_addr="75.126.68.196" -->
           <attribute name="ClusterConfig">
          
           <config>
           <!-- UDP: if you have a multihomed machine,
           set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
           -->
           <!-- UDP: On Windows machines, because of the media sense feature
           being broken with multicast (even after disabling media sense)
           set the loopback attribute to true
           -->
           <UDP mcast_addr="228.1.2.3" mcast_port="48866" bind_addr="10.5.108.80"
           ip_ttl="64" ip_mcast="true" mcast_send_buf_size="150000"
           mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
           ucast_recv_buf_size="80000" loopback="false" />
           <PING up_thread="false" down_thread="false" gossip_host="75.126.68.195" gossip_port="5555" gossip_refresh="15000" timeout="2000" num_initial_members="3"/>
           <MERGE2 min_interval="10000" max_interval="20000" />
           <FD_SOCK />
           <VERIFY_SUSPECT timeout="1500" up_thread="false"
           down_thread="false" />
           <pbcast.NAKACK gc_lag="50"
           retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192"
           up_thread="false" down_thread="false" />
           <UNICAST timeout="600,1200,2400" window_size="100"
           min_threshold="10" down_thread="false" />
           <pbcast.STABLE desired_avg_gossip="20000"
           up_thread="false" down_thread="false" />
           <FRAG frag_size="8192" down_thread="false"
           up_thread="false" />
           <pbcast.GMS join_timeout="5000"
           join_retry_timeout="2000" shun="true" print_local_addr="true" />
           <pbcast.STATE_TRANSFER up_thread="true"
           down_thread="true" />
           </config>
           </attribute>
          
           <!-- Whether or not to fetch state on joining a cluster -->
           <attribute name="FetchStateOnStartup">true</attribute>
          
           <!-- The max amount of time (in milliseconds) we wait until the
           initial state (ie. the contents of the cache) are retrieved from
           existing members in a clustered environment
          
           -->
           <attribute name="InitialStateRetrievalTimeout">5000</attribute>
          
           <!-- Number of milliseconds to wait until all responses for a
           synchronous call have been received.
           -->
           <attribute name="SyncReplTimeout">15000</attribute>
          
           <!-- Max number of milliseconds to wait for a lock acquisition -->
           <attribute name="LockAcquisitionTimeout">10000</attribute>
          
           <!-- Name of the eviction policy class. -->
           <attribute name="EvictionPolicyClass" />
           </mbean>
          </server>


          • 2. Re: getCacheFromCoordinator received null cache
            manik

            Is this intermittent? Could be that your InitialStateRetrievalTimeout is too short...
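            For reference, that is the InitialStateRetrievalTimeout attribute on the cache MBean, which the posted config sets to 5000 ms. A sketch of raising it (30000 here is just an illustrative value, not a recommendation):

            ```xml
            <!-- Allow up to 30 s for the initial state transfer instead of 5 s -->
            <attribute name="InitialStateRetrievalTimeout">30000</attribute>
            ```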

            • 3. Re: getCacheFromCoordinator received null cache
              jagadeeshvn

              Thanks for your reply.

              In fact it is not intermittent; it happens every time. However, I solved the problem by using TCP instead of multicast.
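              For anyone hitting the same issue, a TCP-based JGroups 2.x stack replaces the UDP/PING pair with TCP/TCPPING, where every member is listed explicitly. A sketch under the addresses seen in this thread (the host list and port range are placeholders; each node would list all cluster members and bind to its own NIC):

              ```xml
              <config>
               <TCP bind_addr="10.5.108.78" start_port="7800" loopback="true" />
               <!-- initial_hosts must list the members of the cluster -->
               <TCPPING initial_hosts="10.5.108.78[7800],10.5.108.80[7800]"
                port_range="3" timeout="3500" num_initial_members="3"
                up_thread="true" down_thread="true" />
               <MERGE2 min_interval="10000" max_interval="20000" />
               <FD_SOCK />
               <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
               <pbcast.NAKACK gc_lag="100" retransmit_timeout="600,1200,2400,4800"
                up_thread="true" down_thread="true" />
               <pbcast.STABLE desired_avg_gossip="20000"
                up_thread="false" down_thread="false" />
               <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                shun="true" print_local_addr="true" />
               <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />
              </config>
              ```

              This avoids multicast entirely, which is why it sidesteps the problems the multi-homed nodes had.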

              Thanks
              Jugs