3 Replies Latest reply on May 17, 2007 1:15 AM by jagadeeshvn

    getCacheFromCoordinator received null cache

    jagadeeshvn

      Hi All,

      I am trying to set up a Tomcat cluster with 5 servers, and my application uses JBoss POJO Cache. Some of my servers (let's call them web5, web8 and web10) had problems finding each other in the cluster, and we found that multicast packets were not reaching those servers. The servers are all multi-homed, so we decided to use GossipRouter: we started it on one of the nodes and applied the configuration described in the JGroups manual (http://www.jgroups.org/javagroupsnew/docs/manual/html/user-advanced.html).

      Now all the servers can see each other, but session replication is still not working on web5, web8 and web10. When I start the server, I get the following console output:


      -------------------------------------------------------
      GMS: address is 10.5.108.78:36970
      -------------------------------------------------------
      INFO : [2007 05 10, 08-37:09(880)] : org.jboss.cache.TreeCache.viewAccepted(TreeCache.java:5342)- viewAccepted(): [10.5.108.80:33011|1] [10.5.108.80:33011, 10.5.108.78:36970]
      INFO : [2007 05 10, 08-37:09(889)] : org.jboss.cache.TreeCache.startService(TreeCache.java:1426)- TreeCache local address is 10.5.108.78:36970
      ERROR: [2007 05 10, 08-37:12(882)] : org.jgroups.protocols.FD_SOCK.getCacheFromCoordinator(FD_SOCK.java:684)- received null cache; retrying
      org.jboss.cache.CacheException: Initial state transfer failed: Channel.getState() returned false
      at org.jboss.cache.TreeCache.fetchStateOnStartup(TreeCache.java:3191)
      at org.jboss.cache.TreeCache.startService(TreeCache.java:1429)
      at org.jboss.cache.aop.PojoCache.startService(PojoCache.java:94)
      at com.xminds.SessionTracker.createCache(SessionTracker.java:42)
      at com.xminds.SessionTracker.StartCache(SessionTracker.java:27)
      at com.xminds.servlets.BaseServlet.<init>(BaseServlet.java:20)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
      at java.lang.Class.newInstance0(Class.java:350)
      at java.lang.Class.newInstance(Class.java:303)
      at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1055)
      at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:932)
      at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:3951)
      at org.apache.catalina.core.StandardContext.start(StandardContext.java:4225)
      at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:759)
      at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:739)
      at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:524)
      at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:809)
      at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:698)
      at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:472)
      at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1122)
      at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:310)
      at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
      at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1021)
      at org.apache.catalina.core.StandardHost.start(StandardHost.java:718)
      at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1013)
      at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:442)
      at org.apache.catalina.core.StandardService.start(StandardService.java:450)
      at org.apache.catalina.core.StandardServer.start(StandardServer.java:709)
      at org.apache.catalina.startup.Catalina.start(Catalina.java:551)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:585)
      at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:294)
      at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:432)
      May 10, 2007 8:37:15 AM org.apache.catalina.cluster.session.DeltaManager start
      INFO: Register manager /SessionTest to cluster element Host with name localhost
      May 10, 2007 8:37:15 AM org.apache.catalina.cluster.session.DeltaManager start
      INFO: Starting clustering manager at /SessionTest
      May 10, 2007 8:37:15 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster logSendMessage
      INFO: SEND May 10, 2007:8:37:15 AM 1 10.5.108.80:4,010 GET-ALL-/SessionTest
      May 10, 2007 8:37:15 AM org.apache.catalina.cluster.session.DeltaManager getAllClusterSessions
      WARNING: Manager [/SessionTest], requesting session state from org.apache.catalina.cluster.mcast.McastMember[tcp://10.5.108.80:4010,TreeCache-Cluster,10.5.108.80,4010, alive=11440]. This operation will timeout if no session state has been received within 60 seconds.
      ERROR: [2007 05 10, 08-37:16(390)] : org.jgroups.protocols.FD_SOCK.getCacheFromCoordinator(FD_SOCK.java:684)- received null cache; retrying
      ERROR: [2007 05 10, 08-37:19(899)] : org.jgroups.protocols.FD_SOCK.getCacheFromCoordinator(FD_SOCK.java:684)- received null cache; retrying
      INFO : [2007 05 10, 08-37:20(426)] : org.jboss.cache.TreeCache._setState(TreeCache.java:2622)- received the state (size=1024 bytes)
      May 10, 2007 8:38:15 AM org.apache.catalina.cluster.session.DeltaManager waitForSendAllSessions
      SEVERE: Manager [/SessionTest]: No session state send at 5/10/07 8:37 AM received, timing out after 60,025 ms.
      May 10, 2007 8:38:15 AM org.apache.coyote.http11.Http11BaseProtocol start
      INFO: Starting Coyote HTTP/1.1 on http-8080
      May 10, 2007 8:38:15 AM org.apache.coyote.http11.Http11BaseProtocol start
      INFO: Starting Coyote HTTP/1.1 on http-8443
      May 10, 2007 8:38:15 AM org.apache.jk.common.ChannelSocket init
      INFO: JK: ajp13 listening on /0.0.0.0:8009
      May 10, 2007 8:38:15 AM org.apache.jk.server.JkMain start
      INFO: Jk running ID=0 time=0/18 config=null
      May 10, 2007 8:38:15 AM org.apache.catalina.storeconfig.StoreLoader load
      INFO: Find registry server-registry.xml at classpath resource
      May 10, 2007 8:38:15 AM org.apache.catalina.startup.Catalina start
      INFO: Server startup in 69672 ms
      May 10, 2007 8:38:20 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster logSendMessage
      INFO: SEND May 10, 2007:8:38:20 AM 0 - 445B819C79A10F527B0A419D2D276B85.node3-1178804300523
      May 10, 2007 8:38:20 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster logSendMessage
      INFO: SEND May 10, 2007:8:38:20 AM 2 - 445B819C79A10F527B0A419D2D276B85.node3-1178804300580
      INFO : [2007 05 10, 08-38:30(036)] : com.xminds.servlets.AddPersonServlet.doService(AddPersonServlet.java:31)- Receiving add person request from : 61.17.42.35
      INFO : [2007 05 10, 08-38:30(154)] : com.xminds.servlets.AddPersonServlet.doService(AddPersonServlet.java:78)- Adding person : 123, 123 [123<1111> : 123 ] to cache.
      Adding person : 123, 123 [123<1111> : 123 ] to cache against key : 123
      ERROR: [2007 05 10, 08-38:30(156)] : org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:260)- Servlet.service() for servlet addperson threw exception
      java.lang.NullPointerException
      at com.xminds.SessionTracker.put(SessionTracker.java:64)
      at com.xminds.servlets.AddPersonServlet.doService(AddPersonServlet.java:80)
      at com.xminds.servlets.AddPersonServlet.doPost(AddPersonServlet.java:27)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
      at org.apache.catalina.cluster.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:209)
      at org.apache.catalina.cluster.tcp.ReplicationValve.invoke(ReplicationValve.java:346)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
      at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
      at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
      at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
      at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
      at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
      at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
      at java.lang.Thread.run(Thread.java:595)
      May 10, 2007 8:38:35 AM org.apache.catalina.cluster.deploy.WarWatcher check
      INFO: check cluster wars at /cluster/apache-tomcat-5.5.20/war-listen


      The NullPointerException is thrown because the cache never started, owing to the initial state transfer failure above.
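Until the underlying state transfer issue is fixed, the servlet could at least fail with a clear message instead of an NPE. A minimal sketch of such a guard, assuming a null cache field on startup failure (SessionTracker's real API is not shown in this thread, so the names here are hypothetical):

```java
// Hypothetical sketch of a null-guard in SessionTracker.put();
// the real SessionTracker API is not shown in the thread.
public class SessionTrackerSketch {
    private Object cache; // stays null when PojoCache startup failed

    public void put(String key, Object person) {
        if (cache == null) {
            // Fail fast with a descriptive error instead of an NPE deep inside put()
            throw new IllegalStateException(
                "PojoCache did not start (initial state transfer failed); cannot cache " + key);
        }
        // Real code would delegate to the underlying cache here.
    }
}
```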


      Please find my service XML file below.

      (The XML markup was stripped when posting; the complete file is reposted in the first reply.)


      Any idea why this is happening with these 3 servers? The application works on web6 and web9 without any issues, and session replication is also working fine there.

      Any help will be greatly appreciated.

      Thanks
      Jugs

        • 1. Re: getCacheFromCoordinator received null cache
          jagadeeshvn

          Sorry, I couldn't attach the XML before.


          <?xml version="1.0" encoding="UTF-8" ?>
          
          <server>
           <mbean code="org.jboss.cache.aop.PojoCache"
           name="jboss.cache:service=PojoCache">
           <depends>jboss:service=TransactionManager</depends>
          
           <!-- Configure the TransactionManager -->
           <attribute name="TransactionManagerLookupClass">
           org.jboss.cache.DummyTransactionManagerLookup
           </attribute>
          
           <!-- Isolation level : SERIALIZABLE
           REPEATABLE_READ (default)
           READ_COMMITTED
           READ_UNCOMMITTED
           NONE
           -->
           <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
          
           <!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC -->
           <attribute name="CacheMode">REPL_SYNC</attribute>
          
           <!-- Just used for async repl: use a replication queue -->
           <attribute name="UseReplQueue">false</attribute>
          
           <!-- Replication interval for replication queue (in ms) -->
           <attribute name="ReplQueueInterval">0</attribute>
          
           <!-- Max number of elements which trigger replication -->
           <attribute name="ReplQueueMaxElements">0</attribute>
          
           <!-- Name of cluster. Needs to be the same for all clusters, in order
           to find each other
           -->
           <attribute name="ClusterName">Sample-Cache</attribute>
          
           <!-- JGroups protocol stack properties. Can also be a URL,
           e.g. file:/home/bela/default.xml
           <attribute name="ClusterProperties"></attribute>
           -->
          
           <!--bind_addr="75.126.68.196" -->
           <attribute name="ClusterConfig">
          
           <config>
           <!-- UDP: if you have a multihomed machine,
           set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
           -->
           <!-- UDP: On Windows machines, because of the media sense feature
           being broken with multicast (even after disabling media sense)
           set the loopback attribute to true
           -->
           <UDP mcast_addr="228.1.2.3" mcast_port="48866" bind_addr="10.5.108.80"
           ip_ttl="64" ip_mcast="true" mcast_send_buf_size="150000"
           mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
           ucast_recv_buf_size="80000" loopback="false" />
           <PING up_thread="false" down_thread="false" gossip_host="75.126.68.195" gossip_port="5555" gossip_refresh="15000" timeout="2000" num_initial_members="3"/>
           <MERGE2 min_interval="10000" max_interval="20000" />
           <FD_SOCK />
           <VERIFY_SUSPECT timeout="1500" up_thread="false"
           down_thread="false" />
           <pbcast.NAKACK gc_lag="50"
           retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192"
           up_thread="false" down_thread="false" />
           <UNICAST timeout="600,1200,2400" window_size="100"
           min_threshold="10" down_thread="false" />
           <pbcast.STABLE desired_avg_gossip="20000"
           up_thread="false" down_thread="false" />
           <FRAG frag_size="8192" down_thread="false"
           up_thread="false" />
           <pbcast.GMS join_timeout="5000"
           join_retry_timeout="2000" shun="true" print_local_addr="true" />
           <pbcast.STATE_TRANSFER up_thread="true"
           down_thread="true" />
           </config>
           </attribute>
          
           <!-- Whether or not to fetch state on joining a cluster -->
           <attribute name="FetchStateOnStartup">true</attribute>
          
           <!-- The max amount of time (in milliseconds) we wait until the
           initial state (ie. the contents of the cache) are retrieved from
           existing members in a clustered environment
          
           -->
           <attribute name="InitialStateRetrievalTimeout">5000</attribute>
          
           <!-- Number of milliseconds to wait until all responses for a
           synchronous call have been received.
           -->
           <attribute name="SyncReplTimeout">15000</attribute>
          
           <!-- Max number of milliseconds to wait for a lock acquisition -->
           <attribute name="LockAcquisitionTimeout">10000</attribute>
          
           <!-- Name of the eviction policy class. -->
           <attribute name="EvictionPolicyClass" />
           </mbean>
          </server>


          • 2. Re: getCacheFromCoordinator received null cache
            manik

            Is this intermittent? Could be that your InitialStateRetrievalTimeout is too short...
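            For reference, that is the InitialStateRetrievalTimeout attribute on the cache MBean, which the posted config sets to 5000 ms. A sketch of raising it (30000 here is just an illustrative value, not a recommendation):

            ```xml
            <!-- Allow up to 30 s for the initial state transfer instead of 5 s -->
            <attribute name="InitialStateRetrievalTimeout">30000</attribute>
            ```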

            • 3. Re: getCacheFromCoordinator received null cache
              jagadeeshvn

              Thanks for your reply.

              In fact it is not intermittent; it happens every time. However, I solved the problem by using TCP instead of multicast.
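              For anyone hitting the same issue, a TCP-based JGroups 2.x stack replaces the UDP/PING pair with TCP/TCPPING, where every member is listed explicitly. A sketch under the addresses seen in this thread (the host list and port range are placeholders; each node would list all cluster members and bind to its own NIC):

              ```xml
              <config>
               <TCP bind_addr="10.5.108.78" start_port="7800" loopback="true" />
               <!-- initial_hosts must list the members of the cluster -->
               <TCPPING initial_hosts="10.5.108.78[7800],10.5.108.80[7800]"
                port_range="3" timeout="3500" num_initial_members="3"
                up_thread="true" down_thread="true" />
               <MERGE2 min_interval="10000" max_interval="20000" />
               <FD_SOCK />
               <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
               <pbcast.NAKACK gc_lag="100" retransmit_timeout="600,1200,2400,4800"
                up_thread="true" down_thread="true" />
               <pbcast.STABLE desired_avg_gossip="20000"
                up_thread="false" down_thread="false" />
               <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                shun="true" print_local_addr="true" />
               <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />
              </config>
              ```

              This avoids multicast entirely, which is why it sidesteps the problems the multi-homed nodes had.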

              Thanks
              Jugs