2 Replies Latest reply on Jun 10, 2016 3:30 PM by skull

    Session Repliation on Wildfly 9.0.2 results in IllegalStateException with DummyTransaction and Timeout Exception aquiring lock

    tkummer

      We a running a pair of Wildfly 9.02 servers, crm-appsr1 and crm-appsrv2 in cluster with about 100 to 200 requests per second and would like to turn session replication on.

       

      But after turning session replication on with <distributable/> in two of our webapps we start getting a slew of Infinispan exceptions at leas one of which results in the users getting an error page. The exceptions are of two types.

       

      The first, RemoteException/TimoutException occurs about 180 times per hour or so and starts like this:

       

      2016-02-25 04:18:43,703 ERROR                         [request] UT005023: Exception handling request to /test-domain.do: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from crm-appsrv2, see cause for remote stack trace

      at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)

      at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:71)

       

      ...

       

      Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key E2FnWN3FO_UBl5gMEJSQjOM0vbh5TSKti26_4hjP and requestor GlobalTransaction:<crm-appsrv1>:1310:remote. Lock is held by GlobalTransaction:<crm-appsrv1>:1303:remote, while request came from crm-appsrv1

              at org.infinispan.util.concurrent.locks.LockManagerImpl.lock(LockManagerImpl.java:198)

      The whole stack trace is at http://pastebin.com/Wt11YfNH.

       

      The other exception happens about 20 times per hour and results in an error page to the user. The first few lines are:

       

      2016-02-25 04:06:22,297 ERROR xp_eXB-g  2724717  sach [InvocationContextInterceptor] ISPN000136: Execution error: java.lang.IllegalStateException: Transaction DummyTransaction{xid=DummyXid{id=545989}, status=3} is not in a valid state to be invoking cache operations on.

              at org.infinispan.interceptors.TxInterceptor.enlist(TxInterceptor.java:389)

              at org.infinispan.interceptors.TxInterceptor.enlistIfNeeded(TxInterceptor.java:351)

      The whole stack trace is at http://pastebin.com/sbSVZKSh

       

      The configuration for Jgroups, Infinispan and Undertow is:

       

          <server xmlns="urn:jboss:domain:3.0">
              ...
              <profile>
                  ...
                  <subsystem xmlns="urn:jboss:domain:jgroups:3.0">
                      <channels default="ee">
                          <channel name="ee"/>
                      </channels>
                      <stacks default="tcp">
                          <stack name="tcp">
                              <transport type="TCP" socket-binding="jgroups-tcp">
                                  <property name="log_discard_msgs">true</property>
                              </transport>
                              <protocol type="TCPPING">
                                  <property name="initial_hosts">
                                      ${jgroups.tcpping.initial_hosts:crm-appsrv1.public.one.com[7600]}
                                  </property>
                                  <property name="port_range">0</property>
                                  <property name="timeout">5000</property>
                              </protocol>
                              <protocol type="MERGE3"/>
                              <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
                              <protocol type="FD"/>
                              <protocol type="VERIFY_SUSPECT"/>
                              <protocol type="pbcast.NAKACK2"/>
                              <protocol type="UNICAST3"/>
                              <protocol type="pbcast.STABLE"/>
                              <protocol type="pbcast.GMS"/>
                              <protocol type="MFC"/>
                              <protocol type="FRAG2"/>
                              <protocol type="RSVP"/>
                          </stack>
                          <stack name="tcphq">
                              <transport type="TCP" socket-binding="jgroups-tcp-hq">
                                  <property name="log_discard_msgs">true</property>
                              </transport>
                              <protocol type="TCPPING">
                                  <property name="initial_hosts">
                                      ${jgroups.tcpping.initial_hosts_hq:crm-appsrv1.public.one.com[7660]}
                                  </property>
                                  <property name="port_range">0</property>
                                  <property name="timeout">5000</property>
                              </protocol>
                              <protocol type="MERGE3"/>
                              <protocol type="FD_SOCK" socket-binding="jgroups-tcp-hq-fd"/>
                              <protocol type="FD"/>
                              <protocol type="VERIFY_SUSPECT"/>
                              <protocol type="pbcast.NAKACK2"/>
                              <protocol type="UNICAST3"/>
                              <protocol type="pbcast.STABLE"/>
                              <protocol type="pbcast.GMS"/>
                              <protocol type="MFC"/>
                              <protocol type="FRAG2"/>
                              <protocol type="RSVP"/>
                          </stack>
                      </stacks>
                  </subsystem>
                  ...
                  <subsystem xmlns="urn:jboss:domain:infinispan:3.0">
                      <cache-container name="server" default-cache="default" module="org.wildfly.clustering.server" aliases="singleton cluster">
                          <transport lock-timeout="60000"/>
                          <replicated-cache name="default" mode="SYNC">
                              <transaction mode="BATCH"/>
                          </replicated-cache>
                      </cache-container>
                      <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
                          <transport lock-timeout="60000"/>
                          <!--
                          <replicated-cache name="repl" mode="ASYNC">
                              <transaction mode="BATCH"/>
                              <transaction locking="OPTIMISTIC"/>
                              <locking isolation="READ_COMMITTED"/>
                              <file-store/>
                          </replicated-cache>
                          -->
                          <!--
                          Causes org.infinispan.util.concurrent.TimeoutException (see 0e3c5b439881e361eb75309aa64bd631c5878dec and https://developer.jboss.org/thread/254200)
                          -->
                          <distributed-cache name="dist" mode="ASYNC" owners="2" l1-lifespan="0">
                              <locking isolation="REPEATABLE_READ"/>
                              <transaction mode="BATCH"/>
                              <file-store/>
                          </distributed-cache>
                      </cache-container>
                      <cache-container name="ejb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan" aliases="sfsb">
                          <transport lock-timeout="60000"/>
                          <distributed-cache name="dist" mode="ASYNC" owners="2" l1-lifespan="0">
                              <locking isolation="REPEATABLE_READ"/>
                              <transaction mode="BATCH"/>
                              <file-store/>
                          </distributed-cache>
                      </cache-container>
                      <cache-container name="hibernate" default-cache="local-query" module="org.hibernate.infinispan">
                          <transport lock-timeout="60000"/>
                          <local-cache name="local-query">
                              <eviction strategy="LRU" max-entries="10000"/>
                              <expiration max-idle="100000"/>
                          </local-cache>
                          <invalidation-cache name="entity" mode="SYNC">
                              <transaction mode="NON_XA"/>
                              <eviction strategy="LRU" max-entries="10000"/>
                              <expiration max-idle="100000"/>
                          </invalidation-cache>
                          <replicated-cache name="timestamps" mode="ASYNC"/>
                      </cache-container>
                  </subsystem>
                  ...
                  <subsystem xmlns="urn:jboss:domain:undertow:2.0">
                      <buffer-cache name="default"/>
                      <server name="default-server">
                          <ajp-listener name="ajp" socket-binding="ajp" max-post-size="41943040" max-parameters="20000"/>
                          <http-listener name="default" socket-binding="http" redirect-socket="https" max-post-size="41943040" max-parameters="20000"/>
                          <host name="default-host" alias="localhost" default-web-module="wildfly-default-war.war">
                              <location name="/" handler="welcome-content"/>
                              <access-log pattern="%t %a ip=%{i,X-Real-IP} %u [%I] %{c,JSESSIONID} %U %{i,X-Requested-URL} %s %b %D &quot;%{i,User-Agent}&quot;" rotate="true" />
                              <filter-ref name="server-header"/>
                              <filter-ref name="x-powered-by-header"/>
                              <single-sign-on path="/" domain="one.com"/>
                          </host>
                      </server>
                      <servlet-container name="default">               
                          <jsp-config/>
                          <websockets/>
                      </servlet-container>
                      <handlers>
                          <file name="welcome-content" path="${jboss.home.dir}/welcome-content"/>
                      </handlers>
                      <filters>
                          <response-header name="server-header" header-name="Server" header-value="WildFly/9"/>
                          <response-header name="x-powered-by-header" header-name="X-Powered-By" header-value="Undertow/1"/>
                      </filters>
                  </subsystem>
                  ...
              </profile>
              ...
          </server>
      

       

      According to the answer to this entry (https://developer.jboss.org/message/883301#883301) the InvalidStateExceptions with DummyTransactions are a bug:

      The second exception is a bug.  The RetryInvoker needs to be sensitive to the status of any current transaction.  If the status is rollback-only, it cannot retry (and throws this exception).

      The answer refers to an issue which marks this as fixed in 8.2.0.Final, 9.0.0.Alpha1. But we are running 9.0.2,

      so perhaps the problems are perhaps not the same.

       

      Unfortunately the problem doesn't happen on our staging servers which have the exact same set up. Perhaps it only happens under load.

       

      I've experimented a bit with the Infinispan settings for the web cache-container, but restarting the productions servers to test changed settings is inconvenient for the users.

       

      Any pointers would be appreciated.