5 Replies Latest reply on Apr 18, 2018 1:36 AM by schernolyas

    Unable to acquire lock after 10 seconds

    schernolyas

      Hi!

      I'm getting this exception in my production environment:

       

      2018-04-17 12:44:21,046 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (async-thread--p6-t8) ISPN000136: Error executing command PutKeyValueCommand, writing keys [WrappedByteArray{bytes=[B0x4A176865616C7468..[25], hashCode=549662273}]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key WrappedByteArray{bytes=[B0x4A176865616C7468..[25], hashCode=549662273} and requestor CommandInvocation:server1:2819847. Lock is held by CommandInvocation:server1:2819852

              at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288)

              at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:218)

              at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:436)

              at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.lambda$toInvocationStage$3(InfinispanLock.java:412)

              at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)

              at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)

              at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)

              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

              at java.lang.Thread.run(Thread.java:748)

      The exception occurred on one machine and affected the Hot Rod client application. CPU utilization spiked to 100% at the moment of the exception.

      What could be the reasons for this exception?
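
      For context, the 10 seconds in the message looks like the cache-level lock acquisition timeout (none of my caches define a <locking> element, so the default should apply). If raising it is ever a reasonable workaround, I assume it would go roughly here; this is a sketch only, based on the urn:infinispan:server:core:9.2 schema, not something I have tested:

       <distributed-cache name="CHANNEL_FACTORS" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
         <!-- hypothetical: raise the per-key lock acquisition timeout from the 10 s default to 30 s -->
         <locking acquire-timeout="30000"/>
         <!-- ... rest of the cache definition unchanged ... -->
       </distributed-cache>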

       

      My cluster has 2 hosts and an external store based on RocksDB.

      Any questions are welcome.

      Many thanks for any suggestions!

        • 1. Re: Unable to acquire lock after 10 seconds
          gustavonalle

          Could you post the configuration of the cache?

           

          Also, have you checked the Garbage Collector activity? If your CPU is high because of it, you need to tune the JVM correctly.

          • 2. Re: Unable to acquire lock after 10 seconds
            schernolyas

             My configuration is:

            ________________

            <subsystem xmlns="urn:infinispan:server:core:9.2">
              <cache-container name="clustered" default-cache="default" statistics="true">
              <transport lock-timeout="60000"/>
              <global-state/>
              <async-operations-thread-pool min-threads="25" max-threads="25" queue-length="1000" keepalive-time="60000"/>
              <expiration-thread-pool max-threads="1" keepalive-time="60000"/>
              <listener-thread-pool min-threads="1" max-threads="3" queue-length="100000" keepalive-time="60000"/>
              <persistence-thread-pool min-threads="1" max-threads="4" queue-length="0" keepalive-time="60000"/>
              <remote-command-thread-pool min-threads="25" max-threads="25" queue-length="100000" keepalive-time="60000"/>
              <replication-queue-thread-pool  max-threads="10" keepalive-time="60000"/>
              <state-transfer-thread-pool min-threads="5" max-threads="60" queue-length="0" keepalive-time="60000"/>
              <transport-thread-pool min-threads="25" max-threads="25" queue-length="100000" keepalive-time="60000"/>

               <!-- caches for index-->
               <replicated-cache name="LuceneIndexesLocking_news"/>
              <replicated-cache name="LuceneIndexesMetadata_news"/>
              <replicated-cache name="LuceneIndexesLocking_devices"/>
              <replicated-cache name="LuceneIndexesMetadata_devices"/>


              <distributed-cache name="CHANNEL_FACTORS" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
               <!--need cassandra store -->
               <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/channel_factors/data">
              <expiration path="/data/rocksdb/channel_factors/expired"/>
              </rocksdb-store>
              </distributed-cache>
              <distributed-cache name="CHANNELS_FEATURES" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/channels_features/data">
              <expiration path="/data/rocksdb/channels_features/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="CHANNELS_POPULARITY" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/channels_popularity/data">
              <expiration path="/data/rocksdb/channels_popularity/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="DEVICE_POPULARITY" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/device_popularity/data">
              <expiration path="/data/rocksdb/channels_popularity/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="CATEGORY_POPULARITY" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/category_popularity/data">
              <expiration path="/data/rocksdb/category_popularity/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="DEVICES" owners="2" segments="256" l1-lifespan="1000" mode="ASYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="50000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/devices/data">
              <expiration path="/data/rocksdb/devices/expired"/>
              </rocksdb-store>
              <indexing index="LOCAL">
              <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
              <property name="default.directory_provider">infinispan</property>
              <property name="default.worker.execution">async</property>
              <property name="default.index_flush_interval">500</property>
              <property name="default.indexwriter.merge_factor">30</property>
              <property name="default.indexwriter.merge_max_size">1024</property>
              <property name="default.indexwriter.ram_buffer_size">256</property>
              <property name="default.locking_cachename">LuceneIndexesLocking_devices</property>
              <property name="default.data_cachename">LuceneIndexesData_devices</property>
              <property name="default.metadata_cachename">LuceneIndexesMetadata_devices</property>
              </indexing>
              </distributed-cache>
              <distributed-cache name="LuceneIndexesData_devices">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <rocksdb-store preload="true" fetch-state="true" passivation="true" name="devices_index_data" path="/data/rocksdb/devices_index_data/data">
              <expiration path="/data/rocksdb/devices_index_data/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="HIT_RATES" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>

              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/hit_rates/data">
              <expiration path="/data/rocksdb/hit_rates/expired"/>
              </rocksdb-store>
              </distributed-cache>


              <distributed-cache name="METADATA" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/metatadata/data">
              <expiration path="/data/rocksdb/metatadata/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="MODELS" owners="2" segments="2" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="100" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/models/data">
              <expiration path="/data/rocksdb/models/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="NEWS" owners="2" segments="256" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="20000" strategy="REMOVE"/>
              </memory>
              <indexing index="LOCAL">
              <property name="default.directory_provider">infinispan</property>
              <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager

               </property>
              <property name="default.worker.execution">async</property>
              <property name="default.index_flush_interval">500</property>
              <property name="default.indexwriter.merge_factor">30</property>
              <property name="default.indexwriter.merge_max_size">1024</property>
              <property name="default.indexwriter.ram_buffer_size">256</property>
              <property name="default.locking_cachename">LuceneIndexesLocking_news</property>
              <property name="default.data_cachename">LuceneIndexesData_news</property>
              <property name="default.metadata_cachename">LuceneIndexesMetadata_news</property>
              </indexing>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/news/data">
              <expiration path="/data/rocksdb/news/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="LuceneIndexesData_news">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <rocksdb-store preload="true" fetch-state="true" passivation="true" name="news_index_data" path="/data/rocksdb/news_index_data/data">
              <expiration path="/data/rocksdb/news_index_data/expired"/>
              </rocksdb-store>

              </distributed-cache>

              <distributed-cache name="NEWS_FACTORS" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="50000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/news_factors/data">
              <expiration path="/data/rocksdb/news_factors/expired"/>
              </rocksdb-store>
              </distributed-cache>
              <distributed-cache name="NEWS_FEATURES" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/news_features/data">
              <expiration path="/data/rocksdb/news_features/expired"/>
              </rocksdb-store>
              </distributed-cache>
              <distributed-cache name="NEWS_POPULARITY" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/news_popularity/data">
              <expiration path="/data/rocksdb/news_popularity/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="SEGMENTS" owners="1" segments="2048" l1-lifespan="10000" mode="ASYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="4096"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/segments/data">
              <expiration path="/data/rocksdb/segments/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="SPP_VEON" owners="1" segments="1024" l1-lifespan="1000" mode="ASYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/spp_veon/data">
              <expiration path="/data/rocksdb/spp_veon/expired"/>
              </rocksdb-store>
               <!--need cassandra store -->
               </distributed-cache>

              <distributed-cache name="USER_FACTORS" owners="1" segments="512" l1-lifespan="1000" mode="ASYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/user_factors/data">
              <expiration path="/data/rocksdb/user_factors/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="USER_THRESHOLDS" owners="2" segments="100" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="50000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/user_thresholds/data">
              <expiration path="/data/rocksdb/user_thresholds/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="WORD_FEATURES" owners="2" segments="10" l1-lifespan="1000" mode="SYNC">
              <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
              <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
              <memory>
              <object size="1000" strategy="REMOVE"/>
              </memory>
              <rocksdb-store preload="true" passivation="true" name="channel_factors" path="/data/rocksdb/word_features/data">
              <expiration path="/data/rocksdb/word_features/expired"/>
              </rocksdb-store>
              </distributed-cache>

              <distributed-cache name="default">
              <memory>
              <object size="10000" />
              </memory>
              </distributed-cache>
              <counters>
              <strong-counter name="DEVICES_LOCK" initial-value="0" storage="PERSISTENT">
              <lower-bound value="0"/>
              <upper-bound value="100"/>
              </strong-counter>
              <strong-counter name="MODELS_LOCK" initial-value="0" storage="PERSISTENT">
              <lower-bound value="0"/>
              <upper-bound value="100"/>
              </strong-counter>
              </counters>
              </cache-container>
            </subsystem>

            _____

            • 3. Re: Unable to acquire lock after 10 seconds
              schernolyas

              Usually, the Infinispan client application has CPU utilization below 1%.

              The host machine has 12 cores.

              • 4. Re: Unable to acquire lock after 10 seconds
                gustavonalle

                Hi, you need to check the JVM's garbage collection activity on the server to see if it is causing the high CPU.

                 

                You can do that by passing GC-related options to the JVM, such as -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log

                or simply by logging into the machine where Infinispan is running and using the jstat command, for example: jstat -gcutil PID 1000, where PID is the process id of the server.

                • 5. Re: Unable to acquire lock after 10 seconds
                  schernolyas

                  Hi!

                   

                   I use "<persistence-thread-pool min-threads="1" max-threads="4" queue-length="0" keepalive-time="60000"/>". That may be too few threads to serve 25 caches.
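
                   If that pool is the bottleneck, I would try raising it, something along these lines (my own guess, not a verified fix):

                    <persistence-thread-pool min-threads="4" max-threads="25" queue-length="0" keepalive-time="60000"/>

                   i.e. closer to one persistence thread per actively used cache instead of four threads shared by all of them.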