6 Replies Latest reply on Jul 11, 2012 6:08 PM by Dan Berindei

    using a distributed cache for container ejb on AS7

    inspector Newbie

      Hi everybody,

       

      I'm working on scalable HA clusters and have a domain-mode cluster in mind that uses Infinispan's distribution mode for scalability. I started by switching the cache containers from replicated caches to distributed caches:

       

                    <subsystem xmlns="urn:jboss:domain:infinispan:1.3">
                      <cache-container name="cluster" aliases="ha-partition" default-cache="default">
                          <transport lock-timeout="60000"/>
                          <!-- changed this one to be distributed -->
                          <distributed-cache name="default" mode="SYNC" batching="true">
                              <locking isolation="REPEATABLE_READ"/>
                          </distributed-cache>
                      </cache-container>
                      <!-- using dist instead of repl as default-cache -->
                      <cache-container name="web" aliases="standard-session-cache" default-cache="dist" module="org.jboss.as.clustering.web.infinispan">
                          <transport lock-timeout="60000"/>
                          <replicated-cache name="repl" mode="ASYNC" batching="true">
                              <file-store/>
                          </replicated-cache>
                          <replicated-cache name="sso" mode="SYNC" batching="true"/>
                          <distributed-cache name="dist" mode="ASYNC" batching="true" l1-lifespan="0">
                              <file-store/>
                          </distributed-cache>
                      </cache-container>
                      <!-- using dist instead of repl as default-cache -->
                      <cache-container name="ejb" aliases="sfsb sfsb-cache" default-cache="dist" module="org.jboss.as.clustering.ejb3.infinispan">
                          <transport lock-timeout="60000"/>
                          <replicated-cache name="repl" mode="ASYNC" batching="true">
                              <eviction strategy="LRU" max-entries="10000"/>
                              <file-store/>
                          </replicated-cache>
                          <!--
                            ~  Clustered cache used internally by the EJB subsystem for managing the client-mapping(s) of
                            ~  the socket binding referenced by the EJB remoting connector
                            -->
                          <replicated-cache name="remote-connector-client-mappings" mode="SYNC" batching="true"/>
                          <distributed-cache name="dist" mode="ASYNC" batching="true" l1-lifespan="0">
                              <eviction strategy="LRU" max-entries="10000"/>
                              <file-store/>
                          </distributed-cache>
                      </cache-container>
                      <cache-container name="hibernate" default-cache="local-query" module="org.jboss.as.jpa.hibernate:4">
                          <!-- no changes here -->
                      </cache-container>
                  </subsystem>
      

      The server starts up fine and everything seems OK, but when I try to access a stateful session bean the server hangs. It only hangs when the ejb container uses a distributed cache. If I use a replicated cache for ejb but keep distributed caches for cluster and web, everything works fine. The problem occurs regardless of the number of cluster nodes.

       

      By hanging I mean that the server never completes the request. It starts shutting down on CTRL+C but never finishes; the only way to stop it is kill -9. Attached to this post is a file containing the server's stdout with a thread dump (kill -3) appended, taken at the point where I requested an SFSB and nothing happened.
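      For readers who cannot send kill -3 to the process, a similar snapshot can be taken from inside the JVM. The following is only a minimal sketch using the standard java.lang.management API; the class name RunnableThreadReport is made up for illustration and is not part of AS7 or Infinispan. It lists every RUNNABLE thread with the frame it is currently executing, which is the information that matters when a thread is spinning.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of AS7 or Infinispan.
class RunnableThreadReport {

    // Returns one line per RUNNABLE thread: its name and the frame it is currently executing.
    static List<String> runnableThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // false, false: skip locked-monitor and synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.RUNNABLE) {
                StackTraceElement[] stack = info.getStackTrace();
                String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
                lines.add(info.getThreadName() + " at " + top);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        runnableThreads().forEach(System.out::println);
    }
}
```

      Taking the report a few times in a row and seeing the same thread in the same method each time is a hint that it is looping rather than blocked.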

       

      I tried this with the 7.1.2.Final tag and with a newer trunk revision (2e2f1a7dfcebebb282987d2d8e2df4407de036ac). I have not tried it with EAP 6 yet. The application I used for testing was the cluster-example from https://github.com/akquinet/jbosscc-as7-examples.

       

      What am I doing wrong? Am I maybe trying something completely stupid?

        • 1. Re: using a distributed cache for container ejb on AS7
          inspector Newbie

          No ideas? Need some more explanation?

          • 2. Re: using a distributed cache for container ejb on AS7
            jaikiran pai Master

            What is the CPU usage when this "hang" occurs? I suspect this is an issue with thread safety and HashMap usage:

             

            [Server:server-one] "ajp-/192.168.122.165:8009-2" daemon prio=10 tid=0x08708800 nid=0x2583 runnable [0xb0882000]
            [Server:server-one]    java.lang.Thread.State: RUNNABLE
            [Server:server-one]     at java.util.HashMap.containsKey(HashMap.java:352)
            [Server:server-one]     at java.util.HashSet.contains(HashSet.java:201)
            [Server:server-one]     at org.infinispan.affinity.KeyAffinityServiceImpl.isNodeInConsistentHash(KeyAffinityServiceImpl.java:351)
            [Server:server-one]     at org.infinispan.affinity.KeyAffinityServiceImpl.getKeyForAddress(KeyAffinityServiceImpl.java:145)
            [Server:server-one]     at org.jboss.as.clustering.ejb3.cache.backing.infinispan.InfinispanBackingCacheEntryStore.createIdentifier(InfinispanBackingCacheEntryStore.java:123)
            [Server:server-one]     at org.jboss.as.clustering.ejb3.cache.backing.infinispan.InfinispanBackingCacheEntryStore.createIdentifier(InfinispanBackingCacheEntryStore.java:71)
            [Server:server-one]     at org.jboss.as.ejb3.cache.impl.backing.SerializationGroupMemberContainer.createIdentifier(SerializationGroupMemberContainer.java:83)
            [Server:server-one]     at org.jboss.as.ejb3.cache.impl.backing.SerializationGroupMemberContainer.createIdentifier(SerializationGroupMemberContainer.java:51)
            [Server:server-one]     at org.jboss.as.ejb3.cache.impl.backing.PassivatingBackingCacheImpl.createIdentifier(PassivatingBackingCacheImpl.java:96)
            [Server:server-one]     at org.jboss.as.ejb3.cache.impl.backing.PassivatingBackingCacheImpl.createIdentifier(PassivatingBackingCacheImpl.java:60)
            [Server:server-one]     at org.jboss.as.ejb3.cache.spi.impl.AbstractCache.createIdentifier(AbstractCache.java:48)
            [Server:server-one]     at org.jboss.as.ejb3.cache.spi.impl.AbstractCache.createIdentifier(AbstractCache.java:39)
            [Server:server-one]     at org.jboss.as.ejb3.component.stateful.StatefulSessionComponentInstance.<init>(StatefulSessionComponentInstance.java:70)
            [Server:server-one]     at org.jboss.as.ejb3.component.stateful.StatefulSessionComponent.instantiateComponentInstance(StatefulSessionComponent.java:259)
            [Server:server-one]     at org.jboss.as.ee.component.BasicComponent.constructComponentInstance(BasicComponent.java:150)
            [Server:server-one]     at org.jboss.as.ejb3.component.stateful.StatefulSessionComponent.constructComponentInstance(StatefulSessionComponent.java:145)
            [Server:server-one]     at org.jboss.as.ejb3.component.stateful.StatefulSessionComponent.constructComponentInstance(StatefulSessionComponent.java:76)
            [Server:server-one]     at org.jboss.as.ee.component.BasicComponent.createInstance(BasicComponent.java:85)
            [Server:server-one]     at org.jboss.as.ejb3.component.stateful.StatefulSessionComponent.createInstance(StatefulSessionComponent.java:135)
            [Server:server-one]     at org.jboss.as.ejb3.component.stateful.StatefulSessionComponent.createInstance(StatefulSessionComponent.java:76)
            [Server:server-one]     at org.jboss.as.ejb3.cache.TransactionAwareObjectFactory.createInstance(TransactionAwareObjectFactory.java:53)
            [Server:server-one]     at org.jboss.as.ejb3.cache.impl.backing.PassivatingBackingCacheImpl.create(PassivatingBackingCacheImpl.java:121)
            [Server:server-one]     at org.jboss.as.ejb3.cache.impl.GroupAwareCache.create(GroupAwareCache.java:67)
            [Server:server-one]     at org.jboss.as.ejb3.cache.impl.GroupAwareCache.create(GroupAwareCache.java:41)
            [Server:server-one]     at org.jboss.as.ejb3.component.stateful.StatefulSessionComponent.createSession(StatefulSessionComponent.java:241)
            [Server:server-one]     at org.jboss.as.weld.ejb.StatefulSessionObjectReferenceImpl.<init>(StatefulSessionObjectReferenceImpl.java:72)
            [Server:server-one]     at org.jboss.as.weld.services.bootstrap.WeldEjbServices.resolveEjb(WeldEjbServices.java:60)
            [Server:server-one]     at org.jboss.weld.bean.SessionBean.createReference(SessionBean.java:412)
            [Server:server-one]     at org.jboss.weld.bean.proxy.EnterpriseBeanProxyMethodHandler.<init>(EnterpriseBeanProxyMethodHandler.java:69)
            [Server:server-one]     at org.jboss.weld.bean.SessionBean.create(SessionBean.java:297)
            

             

            HashMap isn't thread safe, and there are known cases where it runs into infinite loops (== high CPU) under multi-threaded access. This looks to me like a bug somewhere in Infinispan. Although KeyAffinityServiceImpl is marked as ThreadSafe, it ends up using a construct which isn't thread safe.

            • 3. Re: using a distributed cache for container ejb on AS7
              Heinz Wilming Newbie

              Hi,

               

              thanks for your reply. The CPU usage of the server instance process is 100%. I tried the same configuration with EAP 6 too and saw the same behaviour.

               

              Attached are the thread dump of the server instance and a screenshot of the CPU utilization. Should I open a JIRA ticket?

               

              Regards, Heinz

               

              Bildschirmfoto 2012-07-10 um 14.09.35.png

              • 5. Re: using a distributed cache for container ejb on AS7
                jaikiran pai Master

                Heinz Wilming wrote:

                 

                Hi,

                 

                thanks for your reply. The CPU usage of the server instance process is 100%.

                Okay, so this is definitely a bug in KeyAffinityServiceImpl, which ends up using a non-thread-safe HashMap, something known to cause issues like this. Thanks for creating that JIRA.

                • 6. Re: using a distributed cache for container ejb on AS7
                  Dan Berindei Expert

                  I don't think this is caused by unsafe usage of HashMap for two reasons:

                   

                  1. The HashSet that appears in the stack trace is never modified (after it has been initialized). The concurrency issues in HashMap appear only with concurrent modifications.

                  2. The profiler screenshot shows KeyAffinityServiceImpl.getKeyForAddress with a very significant self time, at 11%. If there were a hang inside the HashMap, I would have expected the HashMap methods (or maybe KeyAffinityServiceImpl.isNodeInConsistentHash, assuming everything was inlined) to take > 90% of the time.

                   

                  Based on this, I'm pretty sure the problem is that the key generator is not actually generating any keys for that node and KeyAffinityServiceImpl.getKeyForAddress is stuck in the while loop.
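                  The failure mode described above can be illustrated with a small, self-contained sketch. This is not Infinispan's code: ownerOf, keyForNode and the node names are hypothetical stand-ins for the consistent hash and the key generator. It shows how a loop that keeps generating keys until one maps to the target node cannot terminate when that node owns no keys, and how an attempt limit turns the spin into a detectable failure.

```java
import java.util.List;
import java.util.Optional;

// Simplified illustration only; all names are hypothetical, not Infinispan API.
class KeyAffinitySketch {

    // Stand-in for a consistent hash: maps a key to exactly one of the owning nodes.
    static String ownerOf(String key, List<String> owners) {
        return owners.get(Math.floorMod(key.hashCode(), owners.size()));
    }

    // Generates candidate keys until one is owned by targetNode, giving up after maxAttempts.
    // With an unbounded loop instead, a targetNode that is not among the owners
    // would keep the calling thread busy forever.
    static Optional<String> keyForNode(String targetNode, List<String> owners, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = "key-" + i;
            if (ownerOf(candidate, owners).equals(targetNode)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> owners = List.of("node-a", "node-b");
        // node-a owns part of the key space, so a key is found quickly.
        System.out.println(keyForNode("node-a", owners, 1000).isPresent());
        // node-c owns nothing, so no number of attempts can succeed.
        System.out.println(keyForNode("node-c", owners, 1000).isPresent());
    }
}
```

                  Whether the real key generator fails to produce keys for the local node for this reason or for a different one would have to be confirmed against the Infinispan source; the sketch only shows why such a loop burns CPU without ever returning.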