Problems executing the mass indexer
prashant.thakur Apr 29, 2015 11:30 AM
We are facing problems starting nodes when indexing is enabled. The setup has 4 nodes (each with 200G allocated, 16 cores, and a 10Gbps link); the data size is only about 70G. The index-related caches are in repl_sync mode, and since the index writers use a single lock across the cluster, it appears that lock acquisition fails while data is being replicated.
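For reference, this is roughly how we kick off the mass indexer from a single node (a simplified sketch; the configuration file name and class name are placeholders):

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.query.Search;
import org.infinispan.query.SearchManager;

public class Reindex {
    public static void main(String[] args) throws Exception {
        // Start the cache manager from our XML configuration (file name is a placeholder)
        DefaultCacheManager manager = new DefaultCacheManager("infinispan.xml");
        Cache<?, ?> cache = manager.getCache("SUBSCRIBER");
        SearchManager searchManager = Search.getSearchManager(cache);
        // Blocks until the whole cache has been re-indexed
        searchManager.getMassIndexer().start();
        manager.stop();
    }
}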
We are using Infinispan 7.0.2.Final. The indexing parameters in the configuration file are as follows:
<distributed-cache name="SUBSCRIBER" mode="SYNC" owners="2" segments="60" capacity="1" l1-lifespan="0" remote-timeout="240000">
    <transaction transaction-manager-lookup="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
                 mode="NON_XA"
                 locking="OPTIMISTIC"/>
    <indexing index="LOCAL" auto-config="true">
        <property name="default.indexmanager">near-real-time</property>
        <property name="default.indexwriter.merge_factor">30</property>
        <property name="default.indexwriter.merge_max_size">1024</property>
        <property name="default.indexwriter.ram_buffer_size">220</property>
        <property name="default.locking_strategy">native</property>
        <property name="default.sharding_strategy.nbr_of_shards">6</property>
        <property name="default.max_queue_length">1000000</property>
        <property name="default.worker.execution">async</property>
        <property name="default.worker.thread_pool.size">32</property>
        <property name="default.chunk_size">128000</property>
    </indexing>
</distributed-cache>
The JGroups configuration is attached.
We are getting the following errors while indexing:
29/Apr 16:57:23,100 WARN [TCP] [Timer-2,pocg8-19-node2-26137] (TCPConnectionMap.java:624) - Discarding message because TCP send_queue is full and hasn't been releasing for 2000 ms
ERROR [2015-04-29 20:42:58,906] [org.infinispan.interceptors.InvocationContextInterceptor] [ForkJoinPool.commonPool-worker-9] (InvocationContextInterceptor.java:124) - ISPN000136: Execution error
org.infinispan.remoting.RemoteException: ISPN000217: Received exception from prashantt-ux-35660, see cause for remote stack trace
at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:44)
at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:69)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:560)
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:290)
at org.infinispan.interceptors.distribution.TxDistributionInterceptor.prepareOnAffectedNodes(TxDistributionInterceptor.java:221)
at org.infinispan.interceptors.distribution.TxDistributionInterceptor.visitPrepareCommand(TxDistributionInterceptor.java:205)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.EntryWrappingInterceptor.visitPrepareCommand(EntryWrappingInterceptor.java:109)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.invokeNextAndCommitIf1Pc(AbstractTxLockingInterceptor.java:78)
at org.infinispan.interceptors.locking.OptimisticLockingInterceptor.visitPrepareCommand(OptimisticLockingInterceptor.java:87)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.NotificationInterceptor.visitPrepareCommand(NotificationInterceptor.java:36)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:131)
at org.infinispan.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:118)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
.......................
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key org.infinispan.registry.ScopedKey@a57d6b61 and requestor GlobalTransaction:<ACB-25204>:9:remote. Lock is held by GlobalTransaction:<prashantt-ux-35660>:1025:local, while request came from ACB-25204
Complete logs are attached as tc.out. That run includes some modifications meant to increase the timeout by defining the Lucene caches explicitly (roughly as sketched below), but as the logs show, the 10s timeout doesn't change.
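For completeness, the explicit Lucene cache definitions we added look roughly like this (a simplified programmatic equivalent of our XML changes; the cache names are the Infinispan directory defaults, and the 60s value is illustrative):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class LuceneCacheDefinitions {
    public static void main(String[] args) throws Exception {
        DefaultCacheManager manager = new DefaultCacheManager("infinispan.xml");
        // Replicated, synchronous caches for the index, with the lock acquisition
        // timeout raised from the 10s default to 60s (illustrative value)
        Configuration indexCacheCfg = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.REPL_SYNC)
                .locking().lockAcquisitionTimeout(60000)
                .build();
        for (String name : new String[] { "LuceneIndexesMetadata",
                "LuceneIndexesData", "LuceneIndexesLocking" }) {
            manager.defineConfiguration(name, indexCacheCfg);
        }
    }
}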
Our questions are:
1) We start the index writer from only one node, so why does the process try to acquire the lock before indexing has completed?
2) How can we change the 10s interval? Which parameter controls this 10s timeout?
3) Will changing the timeout resolve the locking issue?
4) Is the send queue, defined as 640m, too small for our messages, or does this problem also trace back to processes waiting for locks?
Attachments:
- jgroups_tcp.xml.zip (1.2 KB)
- tc.out.zip (60.4 KB)