2 Replies Latest reply on Mar 27, 2009 11:02 AM by brian.stansberry

    invalidation messages lost/ignored?

      I have a 2-node cluster running on the same host using MVCC and INVALIDATION_ASYNC (JGroups udp) with Hibernate 3.3.1. Under load, an Entity update on ClusterNodeA sometimes does not cause invalidation of the same Entity on ClusterNodeB:
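
      (For reference, a minimal sketch of the cache setup described above, expressed through the JBoss Cache 3.x programmatic Configuration API; the cluster name and the programmatic construction are assumptions for illustration, the real setup presumably comes from the region factory's configuration:)

          import org.jboss.cache.Cache;
          import org.jboss.cache.DefaultCacheFactory;
          import org.jboss.cache.config.Configuration;
          import org.jboss.cache.config.Configuration.CacheMode;
          import org.jboss.cache.config.Configuration.NodeLockingScheme;

          // illustrative sketch only; the real configuration presumably comes from the jbc2 config file
          public class SecondLevelCacheSetupSketch {

              public static Cache<Object, Object> buildCache() {
                  Configuration cfg = new Configuration();
                  cfg.setCacheMode(CacheMode.INVALIDATION_ASYNC);   // async invalidation, as described above
                  cfg.setNodeLockingScheme(NodeLockingScheme.MVCC); // MVCC node locking
                  cfg.setClusterName("hibernate-2lc");              // hypothetical cluster name
                  return new DefaultCacheFactory<Object, Object>().createCache(cfg, true); // build and start
              }
          }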

      State in B's Cache (and in DB) after B updates UserSession:

      /com.doppelganger.domain.UserSession#1116 {item={hibernateVersion=40, boot=false, loginTime=2009-03-26 09:05:48.0, icecastMount=null, _subclass=com.doppelganger.domain.UserSession, macAddressMD5=null, serverSession=null, enclosingSpace=null, userStates=0, creationDate=2009-03-26 09:05:48.0, deviceIdMD5=null, _version=40, expirationTime=2009-03-26 10:06:55.0, token=108fb802-d5ca-40eb-80bb-c347efa95b64, logoutTime=null, _lazyPropertiesUnfetched=false, user=172}}
      


      State in A's Cache at the same time:
      /com.doppelganger.domain.UserSession#1116 {item={hibernateVersion=39, boot=false, loginTime=2009-03-26 09:05:48.0, icecastMount=null, _subclass=com.doppelganger.domain.UserSession, macAddressMD5=null, serverSession=272, enclosingSpace=205, userStates=0, creationDate=2009-03-26 09:05:48.0, deviceIdMD5=null, _version=39, expirationTime=2009-03-26 10:06:55.0, token=108fb802-d5ca-40eb-80bb-c347efa95b64, logoutTime=null, _lazyPropertiesUnfetched=true, user=172}}
      


      The correct state is hibernateVersion=40; serverSession=NULL (not the case in A).

      Subsequent attempts to update this UserSession on A lead to StaleObjectStateException because hibernateVersion is < 40.

      Question:
      It seems like the invalidation message from B either (a) never arrives at A or (b) is ignored by A for some reason. Any hints on which loggers to turn up and which messages to look for to determine whether it's (a) or (b)?

      Possible workaround/solution?
      Let's say lost invalidation messages are a possibility and the app just has to be able to deal with it. Our app has a TX retry mechanism that retries on StaleObjectStateException (and other concurrent access-related exceptions). It seems like it would be more reliable to removeNode the Entity causing the exception before executing a retry. However, this removal should be done locally only (and not be multicast to other nodes).
      I see org.jboss.cache.config.Option.cacheModeLocal and org.hibernate.cache.jbc2.util.CacheHelper.remove(Cache cache, Fqn region, Object key, Option option). Unfortunately, there is no way to pass an Option to org.hibernate.cache.jbc2.entity.TransactionalAccess.remove(key). So it seems like the thing to do is to call CacheHelper.remove(stuff, option with cacheModeLocal=true) before the TX retry (it would execute outside of the tx). What do folks think?
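
      (A minimal sketch of that local-only removal, built on the CacheHelper.remove(Cache, Fqn, Object, Option) overload mentioned above; the wrapper class and method names are just illustrative:)

          import org.hibernate.cache.CacheKey;
          import org.hibernate.cache.jbc2.util.CacheHelper;
          import org.jboss.cache.Cache;
          import org.jboss.cache.Fqn;
          import org.jboss.cache.config.Option;

          // hypothetical helper for illustration, not existing API beyond the CacheHelper call
          public class LocalOnlyCacheRemoval {

              /** Remove the entity's node from the given cache region on this node only (no multicast to other nodes). */
              public static void removeLocally(Cache cache, Fqn regionFqn, CacheKey key) {
                  Option option = new Option();
                  option.setCacheModeLocal(true); // suppress replication/invalidation of this call
                  CacheHelper.remove(cache, regionFqn, key, option);
              }
          }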



        • 1. Re: invalidation messages lost/ignored?

           The possible solution I described above works for me. The code is below; I would still appreciate peer review and feedback, particularly on:
           - whether option.setCacheModeLocal(true) is enough to ensure the remove call is not propagated to other cluster nodes
           - why the cache was getting into this (indefinite) state, and whether that is to be expected or a sign of a bug/greater problem

          package com.doppelganger.framework.hibernate;
          
          import com.doppelganger.framework.cache.DGEntityTransactionalAccess;
          import javax.transaction.Transaction;
          import org.hibernate.HibernateException;
          import org.hibernate.StaleObjectStateException;
          import org.hibernate.cache.CacheKey;
          import org.hibernate.event.EventSource;
          import org.hibernate.event.FlushEvent;
          import org.hibernate.event.def.DefaultFlushEventListener;
          import org.hibernate.impl.SessionFactoryImpl;
          import org.hibernate.persister.entity.EntityPersister;
          import org.slf4j.Logger;
          import org.slf4j.LoggerFactory;
          import org.springframework.beans.factory.annotation.Autowired;
          import org.springframework.stereotype.Component;
          import org.springframework.transaction.jta.JtaTransactionManager;
          
          /**
           *
           * @author nikita
           */
          @Component("cacheUpdatingFlushEventListener")
          public class CacheUpdatingFlushEventListener extends DefaultFlushEventListener {
          
               final static private Logger LOG = LoggerFactory.getLogger(CacheUpdatingFlushEventListener.class);

               @Autowired
               private JtaTransactionManager jtaTransactionManager;

               @Override
               public void onFlush(FlushEvent event) throws HibernateException {
                   try {
                       super.onFlush(event);
                   } catch (StaleObjectStateException ex) {

                       // if possible, remove the offending entity from the 2nd level cache:
                       // remove the offending ENTITY cache node (only locally - no cluster-wide propagation)

                       Transaction tx = null;
                       try {
                           SessionFactoryImpl sfImpl = (SessionFactoryImpl) event.getSession().getSessionFactory();
                           EntityPersister persister = sfImpl.getEntityPersister(ex.getEntityName());
                           if (persister.hasCache()) {
                               EventSource session = event.getSession();
                               CacheKey ck = new CacheKey(
                                       ex.getIdentifier(),
                                       persister.getIdentifierType(),
                                       persister.getRootEntityName(),
                                       session.getEntityMode(),
                                       session.getFactory());

                               if (persister.getCacheAccessStrategy() instanceof DGEntityTransactionalAccess) {

                                   LOG.debug("removing Entity cache node with key=" + ck + " after concurrency exception: " + ex);

                                   // suspend the JTA transaction so the cache removal executes outside of it
                                   tx = jtaTransactionManager.getTransactionManager().suspend();

                                   DGEntityTransactionalAccess entityCacheAccessStrategy = (DGEntityTransactionalAccess) persister.getCacheAccessStrategy();
                                   entityCacheAccessStrategy.remove(ck, true);
                               }
                           }
                       } catch (Throwable t) {
                           LOG.error("error removing Entity cache node after detecting exception: " + ex + ". Proceeding to rethrow detected exception", t);
                       } finally {
                           if (tx != null) {
                               try {
                                   jtaTransactionManager.getTransactionManager().resume(tx);
                               } catch (Throwable t) {
                                   LOG.error("failed to resume a tx - INVESTIGATE!", t);
                               }
                           }
                           // rethrow the original exception
                           throw ex;
                       }
                   }
               }
          }
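
           (A hypothetical wiring note, not part of the poster's code: for the listener above to take effect it has to replace Hibernate's default flush listener. The poster presumably does this through Spring; built by hand it would look roughly like this:)

           import org.hibernate.SessionFactory;
           import org.hibernate.cfg.Configuration;

           // hypothetical wiring sketch, not the poster's actual Spring setup
           public class ListenerWiringSketch {

               public static SessionFactory build(CacheUpdatingFlushEventListener listener) {
                   Configuration cfg = new Configuration().configure();
                   // register the custom listener in place of the default "flush" event listener
                   cfg.setListener("flush", listener);
                   return cfg.buildSessionFactory();
               }
           }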
          


          package com.doppelganger.framework.cache;
          
          import org.hibernate.cache.CacheException;
          import org.hibernate.cache.CacheKey;
          import org.hibernate.cache.jbc2.entity.EntityRegionImpl;
          import org.hibernate.cache.jbc2.entity.TransactionalAccess;
          import org.jboss.cache.config.Option;
          
          /**
           *
           * @author nikita
           */
          public class DGEntityTransactionalAccess extends TransactionalAccess {
          
               final private DGTransactionalAccessDelegate accessDelegate;

               public DGEntityTransactionalAccess(EntityRegionImpl region, DGTransactionalAccessDelegate delegate) {
                   super(region, delegate);
                   if (delegate == null) {
                       throw new IllegalStateException("accessDelegate cannot be null and must wrap the same 'region' as passed to this constructor");
                   }
                   this.accessDelegate = delegate;
               }

               /**
                * Same as TransactionalAccess.remove(key), except it adds a switch to NOT send replication/invalidation messages.
                * @param key
                * @param localOnly if true, the removal is not propagated to other cluster nodes
                * @throws org.hibernate.cache.CacheException
                */
               public void remove(CacheKey key, boolean localOnly) throws CacheException {
                   if (localOnly) {
                       Option option = new Option();
                       option.setCacheModeLocal(true); // keep this call local to the current node
                       getAccessDelegate().remove(key, option);
                   } else {
                       remove(key);
                   }
               }

               /**
                * @return the accessDelegate
                */
               public DGTransactionalAccessDelegate getAccessDelegate() {
                   return accessDelegate;
               }
          }
          


          package com.doppelganger.framework.cache;
          
          import org.hibernate.cache.CacheException;
          import org.hibernate.cache.CacheKey;
          import org.hibernate.cache.jbc2.BasicRegionAdapter;
          import org.hibernate.cache.jbc2.access.TransactionalAccessDelegate;
          import org.hibernate.cache.jbc2.util.CacheHelper;
          import org.jboss.cache.config.Option;
          
          /**
           *
           * @author nikita
           */
          public class DGTransactionalAccessDelegate extends TransactionalAccessDelegate {
          
               DGTransactionalAccessDelegate(BasicRegionAdapter adapter) {
                   super(adapter);
               }

               /**
                * Same as remove(key), but with an Option (e.g. to make the removal local-only).
                * @param key
                * @param option
                */
               void remove(CacheKey key, Option option) throws CacheException {
                   region.ensureRegionRootExists();
                   CacheHelper.remove(cache, regionFqn, key, option);
                   //cache.removeNode(new Fqn(region, key));
                   //cache.evict(new Fqn(region, key));
               }
          }
          


          package com.doppelganger.framework.cache;
          
          import org.hibernate.cache.CacheDataDescription;
          import org.hibernate.cache.CacheException;
          import org.hibernate.cache.access.AccessType;
          import org.hibernate.cache.access.EntityRegionAccessStrategy;
          import org.hibernate.cache.jbc2.entity.EntityRegionImpl;
          import org.jboss.cache.Cache;
          
          /**
           *
           * @author nikita
           */
          public class DGEntityRegionImpl extends EntityRegionImpl {
          
               public DGEntityRegionImpl(Cache jbcCache, String regionName, String regionPrefix, CacheDataDescription metadata) {
                   super(jbcCache, regionName, regionPrefix, metadata);
               }

               /**
                * Supply our own Entity TransactionalAccess impl; assume we don't need OptimisticTransactionalAccess (since we're using MVCC).
                * @param accessType
                * @return
                * @throws org.hibernate.cache.CacheException
                */
               @Override
               public EntityRegionAccessStrategy buildAccessStrategy(AccessType accessType) throws CacheException {
                   if (AccessType.READ_ONLY.equals(accessType)) {
                       return new DGEntityReadOnlyAccess(this, new DGTransactionalAccessDelegate(this));
                   }
                   if (AccessType.TRANSACTIONAL.equals(accessType)) {
                       return new DGEntityTransactionalAccess(this, new DGTransactionalAccessDelegate(this));
                   }

                   throw new CacheException("unsupported access type [" + accessType.getName() + "]");
               }
          }
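
           (Another hypothetical wiring sketch, not part of the poster's code: the custom region above would presumably be returned from a custom RegionFactory. This assumes JBossCacheRegionFactory exposes its CacheInstanceManager via a getCacheInstanceManager() accessor; adjust to the actual accessor if it differs:)

           package com.doppelganger.framework.cache;

           import java.util.Properties;

           import org.hibernate.cache.CacheDataDescription;
           import org.hibernate.cache.CacheException;
           import org.hibernate.cache.EntityRegion;
           import org.hibernate.cache.jbc2.JBossCacheRegionFactory;
           import org.hibernate.cfg.Environment;
           import org.hibernate.util.PropertiesHelper;

           // hypothetical factory name; only the buildEntityRegion override matters here
           public class DGCacheRegionFactory extends JBossCacheRegionFactory {

               @Override
               public EntityRegion buildEntityRegion(String regionName, Properties properties, CacheDataDescription metadata)
                       throws CacheException {
                   String regionPrefix = PropertiesHelper.getString(Environment.CACHE_REGION_PREFIX, properties, null);
                   // hand back the custom region so its buildAccessStrategy() yields DGEntityTransactionalAccess
                   return new DGEntityRegionImpl(
                           getCacheInstanceManager().getEntityCacheInstance(), // assumed accessor on the base factory
                           regionName, regionPrefix, metadata);
               }
           }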
          
          


          • 2. Re: invalidation messages lost/ignored?
            brian.stansberry

            Option.setCacheModeLocal(true) is definitely enough to ensure non-propagation of the remove call.

             Getting into an inconsistent state is for sure a sign of a bug/greater problem. The greater problem might just be "don't use async", but let's dig deeper.

             Can you make this happen with INVALIDATION_SYNC? ASYNC by its nature doesn't guarantee cache consistency, although ending up with an earlier version on a node and no JBC exception in the logs doesn't sound right. With INVALIDATION, the only thing that puts data in the cache is the local caller, so having version 39 in the cache means either

             1) a local caller put it in the cache after the invalidation message for 40 had already arrived, or
             2) the invalidation message for 40 never arrived.

            The latter is unlikely.

             Do you have any logging telling you how version 39 ended up in A? Was it via a putForExternalRead, or via a put after a tx running on A updated the entity to version 39? This smells more like a putForExternalRead problem to me.
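
             (One way to get that logging, assuming log4j is the backing logger; the exact categories worth raising are a guess, but org.hibernate.cache.jbc2 is the integration layer and org.jboss.cache.interceptors covers JBoss Cache's invalidation handling:)

             import org.apache.log4j.Level;
             import org.apache.log4j.Logger;

             // illustrative helper; the logger categories are educated guesses
             public class CacheTraceLogging {

                 public static void enable() {
                     // Hibernate/JBoss Cache integration layer (entity puts and putForExternalRead calls originate here)
                     Logger.getLogger("org.hibernate.cache.jbc2").setLevel(Level.TRACE);
                     // JBoss Cache interceptor chain, including handling of incoming invalidation commands
                     Logger.getLogger("org.jboss.cache.interceptors").setLevel(Level.TRACE);
                 }
             }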