4 Replies Latest reply on Aug 6, 2009 9:51 AM by manik

    Transactions Created On Reads Holding Up Writers ???

    snacker

      We have a cache which is read by hundreds of threads per second.
      The threads are all reading one node.

      The first thread that finds the node missing loads the data (which takes 5-10 seconds); the others wait (with a timeout) for it to be populated.
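A minimal Java sketch of the load-once pattern described above, for anyone trying to reproduce it. The first caller for a key runs the slow loader and every other caller waits on the same future with a timeout. `SingleLoader`, its `get` method, and the in-memory map standing in for the cache node are hypothetical names invented for this illustration. The poster's actual code is not shown in the thread and may differ.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical helper: one loader per key, all other callers wait with a timeout. */
class SingleLoader<K, V> {
    // Completed futures stay in the map, so later callers get the value immediately.
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V get(K key, Function<K, V> loader, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Another thread is loading (or has loaded) this key: wait, but not forever.
            return existing.get(timeout, unit);
        }
        try {
            V value = loader.apply(key);   // the slow load runs on the calling thread
            mine.complete(value);          // wakes every waiter
            return value;
        } catch (RuntimeException e) {
            inFlight.remove(key, mine);    // allow a later caller to retry
            mine.completeExceptionally(e); // waiters see an ExecutionException
            throw e;
        }
    }
}
```

In this sketch the waiters' timeout is enforced entirely in application code, separately from any timeout the cache applies to the loading thread's write.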

      If we dump the stack traces, one of the threads is waiting here:

      sun.misc.Unsafe.park(Native Method)
      java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:841)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1160)
      org.jboss.cache.util.concurrent.locks.OwnableReentrantLock.tryLock(OwnableReentrantLock.java:100)
      org.jboss.cache.util.concurrent.locks.AbstractSharedLockContainer.acquireLock(AbstractSharedLockContainer.java:94)
      org.jboss.cache.lock.MVCCLockManager.lockAndRecord(MVCCLockManager.java:132)
      org.jboss.cache.mvcc.MVCCNodeHelper.acquireLock(MVCCNodeHelper.java:155)
      org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:235)
      org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:184)
      org.jboss.cache.interceptors.MVCCLockingInterceptor.handlePutKeyValueCommand(MVCCLockingInterceptor.java:101)


      All of the other threads are waiting in our code for that thread to complete.

      If we set the timeout in our code (the wait for the writer thread above to complete) to less than the JBoss Cache timeout, the writer thread completes ONLY after our waiting threads have timed out.

      If we set our internal timeout greater than the JBoss Cache timeout, the writer thread throws timeout errors.

      We can reproduce this at any time using just 2 threads.

      We have tried different settings, but can't get this to work correctly.

      The only thing we've found that doesn't lock the cache up like this is using the DummyTransactionManager.

      However, 1) that shouldn't be used in production, and 2) the cache doesn't replicate with that transaction manager.

      So is JBoss Cache tying in to the current EJB transaction when a simple read is done?

      Here are the cache settings (not using the "Dummy" txn manager):
      <?xml version="1.0" encoding="UTF-8"?>
       <jbosscache xmlns="urn:jboss:jbosscache-core:config:3.1">
       <locking isolationLevel="READ_COMMITTED" lockAcquisitionTimeout="15000"/>
      
       <transaction
       transactionManagerLookupClass="org.jboss.cache.transaction.GenericTransactionManagerLookup"
       />
      
       <clustering mode="replication" clusterName="SystemCache-Cluster">
       <!--jmxStatistics exposeManagementStatistics="true"/-->
       <sync replTimeout="20000"/>
       <jgroupsConfig>
       <TCP
       bind_addr="devA"
       loopback="true"
       start_port="7855"
       enable_bundling="false"
       />
       <TCPPING
       down_thread="true"
       initial_hosts="devB[7855]"
       num_initial_members="2"
       port_range="1"
       timeout="3500"
       />
       <MERGE2 max_interval="10000" min_interval="5000"/>
       <FD_SOCK/>
       <FD max_tries="5" shun="false" timeout="2500" />
       <VERIFY_SUSPECT timeout="1500" />
       <pbcast.NAKACK
       use_mcast_xmit="false"
       gc_lag="0"
       retransmit_timeout="300,600,1200,2400,4800"
       discard_delivered_msgs="false"
       />
       <pbcast.STABLE
       desired_avg_gossip="50000"
       max_bytes="2100000"
       stability_delay="1000"
       />
       <pbcast.GMS
       join_retry_timeout="2000"
       join_timeout="5000"
       print_local_addr="true"
       shun="false"
       view_bundling="true"
       />
       <pbcast.STREAMING_STATE_TRANSFER/>
       </jgroupsConfig>
       </clustering>
       <eviction wakeUpInterval="600000">
       <default algorithmClass="org.jboss.cache.eviction.LRUAlgorithm">
       <property name="maxNodes" value="10000"/>
       <property name="maxAge" value="-1"/>
       <property name="timeToLive" value="-1"/>
      </default>
      


      Some other notes:
      1) the cache doesn't auto-deploy as in 4.0.1, so we have to load it manually in the constructor giving it the filename. (a jboss rep at JavaOne '09 seemed puzzled by this)
      2) the jmxStatistics node caused a null pointer exception when the configuration was parsed
      3) if we comment out the transaction element or use "" for the value of transactionManagerLookupClass it throws a "class not found 'null'" error.

      Any ideas?


        • 1. Re: Transactions Created On Reads Holding Up Writers ???
          snacker

          BTW we are using "JBossCache 'Cascabel' 3.1.0.GA"

          • 2. Re: Transactions Created On Reads Holding Up Writers ???
            snacker

            If we change the EJB to @TransactionAttribute(value=TransactionAttributeType.NEVER), then we can use the GenericTransactionManagerLookup and it prevents the lockups.

            So, why is it locking on a read?

            Also, since this cache is supposed to persist until explicitly removed, we've commented out the eviction node.

            Otherwise we would get warnings like:

            "WARN [RegionImpl] putNodeEvent(): eviction node event queue size is at 98% threshold value of capacity: 200000 Region: / You will need to reduce the wakeUpIntervalSeconds parameter."
            and the cache would also wait until it woke up.

            Lowering the interval to 5 seconds increased the cpu load too much.

            I'm puzzled as to why the reader threads were hanging here too, though, apparently waiting for the eviction listener (?) to wake up.

            The threads were waiting in
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0xedd51380> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
            at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
            at org.jboss.cache.RegionImpl.registerEvictionEvent(RegionImpl.java:249)
            at org.jboss.cache.RegionImpl.registerEvictionEvent(RegionImpl.java:234)
            at org.jboss.cache.interceptors.EvictionInterceptor.registerEvictionEventToRegionManager(EvictionInterceptor.java:252)
            at org.jboss.cache.interceptors.EvictionInterceptor.visitGetKeyValueCommand(EvictionInterceptor.java:215)
            at org.jboss.cache.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:97)
            


            It would lock up after only ~50 requests had been made on that node.


            • 3. Re: Transactions Created On Reads Holding Up Writers ???
              snacker

              According to the "MVCCLockManager" api:
              (http://www.jboss.org/file-access/default/members/jbosscache/freezone/docs/3.1.0.CR1/apidocs/org/jboss/cache/lock/MVCCLockManager.html)

              JBoss Cache's MVCC design doesn't use read locks at all.


              Really?

              Then why are the writers waiting for the reader transactions to complete (or request writes themselves) before they will update the cache, as shown in the stacks from the earlier posts???

              We are getting many timeouts in our production environment due to this problem.

              I don't think this has anything to do with jgroups, so SwarmCache (or memcached) might be something we're going to take a look at soon.

              Does JBoss have paid support for JBoss Cache? We really need some answers to the problems we are having.

              The only thing we need is for the caches to NOT join the EJB transactions (at least not for reads).
              We would like to manage any cache transactions manually if possible.
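One standard JTA mechanism for running a piece of work outside the caller's transaction is to suspend it, do the work, and resume it afterwards. The sketch below shows only that suspend/resume shape. `OutsideTx`, `Tx`, and `TxManager` are hypothetical stand-ins so the snippet compiles without a JTA jar; in a real container the corresponding types would be `javax.transaction.Transaction` and `javax.transaction.TransactionManager`, whose `suspend()` and `resume(Transaction)` methods also declare checked exceptions. Whether this removes the lockups reported in this thread is untested and is an assumption.

```java
import java.util.function.Supplier;

/** Sketch only: runs work with the caller's transaction temporarily detached. */
class OutsideTx {
    /** Stand-in for javax.transaction.Transaction (hypothetical, for illustration). */
    interface Tx {}

    /** Stand-in for the suspend/resume part of javax.transaction.TransactionManager. */
    interface TxManager {
        Tx suspend();        // detaches the current transaction; null if none is active
        void resume(Tx tx);  // re-attaches a previously suspended transaction
    }

    static <T> T call(TxManager tm, Supplier<T> work) {
        Tx suspended = tm.suspend();
        try {
            // e.g. a cache read, executed with no transaction bound to this thread
            return work.get();
        } finally {
            if (suspended != null) {
                tm.resume(suspended);  // always restore the caller's transaction
            }
        }
    }
}
```

The `finally` block matters here: if the work throws, the caller's transaction is still restored before the exception propagates.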


              • 4. Re: Transactions Created On Reads Holding Up Writers ???
                manik

                Hi, sorry for the late response.

                1) Red Hat does offer paid support for JBoss Cache as a part of the JBoss EAP subscription.

                2) Interesting that using the dummy TM solves the problem - it is still a real TM capable of holding real locks. It's just that other components in the app server won't be using it as well. Perhaps there is some contention there.

                3) jmxStats: could you please create a JIRA for this, and preferably attach a simple unit test?

                4) Eviction: the general solution here is to either reduce the wakeup interval (JBC 3 uses millis to measure this so you can go under 1s if needed) or increase the eviction queue size. This may well be why your readers are hanging.
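The stack trace in reply 2 shows the readers parked in `LinkedBlockingQueue.put`, called from `RegionImpl.registerEvictionEvent`. The stand-alone Java sketch below is not JBoss Cache code; it only illustrates why a producer stalls on a bounded queue that nothing is draining, which is the situation a long wake-up interval or a small queue can create. `BoundedQueueDemo` and `fillUntilBlocked` are hypothetical names, and `offer` with a timeout is used in place of `put` so the demo can return instead of hanging.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Illustration only: a bounded event queue with no consumer running. */
class BoundedQueueDemo {
    /** Returns how many events were accepted before the producer would have blocked. */
    static int fillUntilBlocked(int capacity, int attempts) throws InterruptedException {
        LinkedBlockingQueue<String> events = new LinkedBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // put() would park indefinitely once the queue is full; offer() with a
            // timeout observes the same condition and lets the loop stop.
            if (!events.offer("read-event-" + i, 10, TimeUnit.MILLISECONDS)) {
                break;
            }
            accepted++;
        }
        return accepted;
    }
}
```

With a capacity of 3 and no consumer, the fourth event is the first one that cannot be queued. Draining more often or enlarging the queue both postpone that point, which matches the two options given in the reply above.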

                Cheers
                Manik