2 Replies Latest reply on Oct 19, 2006 8:58 AM by rnmixon

    After shunned, transport and corr are null

    rnmixon

      We are starting to see a problem where one cache member is shunned. It used to happen every six weeks or so, but is now happening at least once a week.

      We are running Hibernate 2.8 and the jcache version that comes with it. The jboss-cache.jar manifest says its version is 1.1.1.

      Our topology consists of two Tomcat servers running SuSE Linux Enterprise Server 9. Both servers have 4GB RAM and run dual Opterons in 64-bit mode, though one server is slightly faster than the other. The servers are directly connected to each other over a pair of dedicated Ethernet ports. Connections from Tomcat to the web server take place over a separate Ethernet port.

      Once the error occurs, the Tomcat instance whose log shows the errors below goes to 100% CPU usage and must be restarted.

      I have seen posts that recommend using the FD_SOCK protocol instead of FD in some cases, but I am not sure whether this applies to us. Also, would upgrading jboss-cache.jar, jboss-common.jar and jboss-system.jar to the ones from JBoss Cache 1.2.4SP2 help?

      Thanks in advance.

      - Richard

      Here are the relevant log messages:

      2006-10-17 12:43:51,022 INFO [TP-Processor14] ActionFilter:653 - Handling request
      
      2006-10-17 12:44:59,214 WARN [UpHandler (FD)] FD:220 - I was suspected, but will not remove myself from membership (waiting for EXIT message)
      
      2006-10-17 12:44:59,249 WARN [UpHandler (GMS)] GMS:324 - checkSelfInclusion() failed, kingfishS11:5692 is not a member of view [gofishS11:15833|2] [gofishS11:15833]; discarding view
      
      2006-10-17 12:44:59,250 WARN [UpHandler (GMS)] GMS:333 - I (kingfishS11:5692) am being shunned, will leave and rejoin group (prev_members are [gofishS11:15833 kingfishS11:5692 ])
      
      2006-10-17 12:45:42,908 INFO [TP-Processor2] ActionFilter:653 - Handling request
      
      2006-10-17 12:45:43,008 ERROR [TP-Processor2] GroupRequest:178 - both corr and transport are null, cannot send group request
      
      2006-10-17 12:45:43,017 ERROR [TP-Processor2] ActionFilter:370 - Exception getting UserTransaction object to handle request net.sf.hibernate.cache.CacheException: org.jboss.util.NestedRuntimeException: rsp=sender=gofishS11:15833, retval=null, received=false, suspected=false; - nested throwable: (org.jboss.cache.lock.TimeoutException: rsp=sender=gofishS11:15833, retval=null, received=false, suspected=false)
      net.sf.hibernate.cache.CacheException: org.jboss.util.NestedRuntimeException: rsp=sender=gofishS11:15833, retval=null, received=false, suspected=false; - nested throwable: (org.jboss.cache.lock.TimeoutException: rsp=sender=gofishS11:15833, retval=null, received=false, suspected=false)
       at net.sf.hibernate.cache.TreeCache.remove(TreeCache.java:44)
       at net.sf.hibernate.cache.TransactionalCache.remove(TransactionalCache.java:79)
       at net.sf.hibernate.impl.SessionImpl.refresh(SessionImpl.java:2195)
       at net.sf.hibernate.impl.SessionImpl.refresh(SessionImpl.java:2160)
       at com.ltoj.persistence.ServiceLocator.createUserTransaction(ServiceLocator.java:222)
       at com.ltoj.webapp.filter.ActionFilter.doFilter(ActionFilter.java:356)
       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)