Problem with a PUT on a large row in 4.2.0.FINAL - works with stateTransfer
cbo_ Dec 30, 2010 10:34 AMNeed Help! Using 4.2.0.FINAL and doing some performance testing and just found a functional problem. This is replicated scenario using ReplicationQueue between 2 nodes with one none only doing the updates.
The special focus is on a single row that has an ArrayList with many, many entries. It may raise your eyebrows to know the depth of this row, but I don't want to focus so much on that as I do want to point one thing out. This entry does get to the partner app on a stateTransfer.
To clarify.....I will call the app that actively does the updates to the cache A and the one that is passive B. When the scenario is such that A comes up and does updates to its caches ahead of starting B (which is our normal case), the stateTransfer works correctly and brings all rows to B correctly. When the scenario includes a restart of A....so A comes up, then B, and some situation occurs that requires A to restart (and B stays up), it is up to the put on A to get across to B (this is because we first do a clear on the cache on A when it comes up). In this case all rows in get across from A to B except for this wide/deep row. I have put detailed trace levels on during a test where this occurred and it is evident that the put is done on A, but no evidence it is received by B at all. Here is a snippet of the logs at that time:
This is the A side:
2010-12-23 10:55:00,194 124082 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Scheduled-ReplicationQueueThread-0:) dests=null, command=MultipleRpcCommand{commands=[PutKeyValueCommand{key=5, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@6e64f6d5, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], cacheName='GenericCache'}, mode=ASYNCHRONOUS, timeout=15000 2010-12-23 10:55:00,194 124082 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Scheduled-ReplicationQueueThread-0:) dests=null, command=MultipleRpcCommand{commands=[PutKeyValueCommand{key=5, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@6e64f6d5, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], cacheName='GenericCache'}, mode=ASYNCHRONOUS, timeout=15000 2010-12-23 10:55:00,195 124083 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (AsyncSerializationThread-0:) Replication task sending MultipleRpcCommand{commands=[PutKeyValueCommand{key=5, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@6e64f6d5, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], cacheName='GenericCache'} to addresses null 2010-12-23 10:55:00,205 124093 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=3) 2010-12-23 10:55:00,205 124093 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) Flushing 3 elements 2010-12-23 10:55:00,205 124093 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Scheduled-ReplicationQueueThread-0:) dests=null, command=MultipleRpcCommand{commands=[PutKeyValueCommand{key=6, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@71130ea6, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}, PutKeyValueCommand{key=1, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@1a26ee49, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}, PutKeyValueCommand{key=2, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@71e60808, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], cacheName='GenericCache'}, mode=ASYNCHRONOUS, timeout=15000 2010-12-23 10:55:00,216 124104 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,228 124116 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,240 124128 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,251 124139 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,262 124150 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,272 124160 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,288 124176 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,304 124192 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,315 124203 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,326 124214 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,338 124226 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,351 124239 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,353 124241 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (AsyncSerializationThread-0:) Replication task sending MultipleRpcCommand{commands=[PutKeyValueCommand{key=6, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@71130ea6, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}, PutKeyValueCommand{key=1, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@1a26ee49, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}, PutKeyValueCommand{key=2, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@71e60808, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], cacheName='GenericCache'} to addresses null 2010-12-23 10:55:00,362 124250 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,366 124254 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (AsyncSerializationThread-0:) real_dests=[test16_devenv1-61733] 2010-12-23 10:55:00,367 124255 TRACE [org.jgroups.protocols.pbcast.NAKACK] (AsyncSerializationThread-0:) sending test16_devenv2-53120#4 2010-12-23 10:55:00,367 124255 TRACE [org.jgroups.protocols.TCP] (AsyncSerializationThread-0:) sending msg to null, src=test16_devenv2-53120, headers are RequestCorrelator: id=201, type=REQ, id=3, rsp_expected=false, NAKACK: [MSG, seqno=4], TCP: [channel_name=infinispan-cluster-OHServerHybrid-test16] 2010-12-23 10:55:00,367 124255 TRACE [org.jgroups.protocols.TCP] (AsyncSerializationThread-0:) looping back message [dst: <null>, src: test16_devenv2-53120 (3 headers), size=494 bytes] 2010-12-23 10:55:00,367 124255 TRACE [org.jgroups.protocols.TCP] (AsyncSerializationThread-0:) dest=170.137.237.83:11600 (598 bytes) 2010-12-23 10:55:00,367 124255 TRACE [org.jgroups.protocols.TCP] (AsyncSerializationThread-0:) dest=170.137.237.81:11600 (598 bytes) 2010-12-23 10:55:00,368 124256 TRACE [org.jgroups.protocols.TCP] (Incoming-1,infinispan-cluster-OHServerHybrid-test16,test16_devenv2-53120:) received [dst: <null>, src: test16_devenv2-53120 (3 headers), size=494 bytes], headers are RequestCorrelator: id=201, type=REQ, id=3, rsp_expected=false, NAKACK: [MSG, seqno=4], TCP: [channel_name=infinispan-cluster-OHServerHybrid-test16]
The items in red (key=5) represent the wide row. You can see other items (key=6, key=2, key=1) that are in blue. Items in blue get to the B side. The item in red does not.
Here is the B side (receiving side):
2010-12-23 10:55:00,359 17676491 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,359 17676491 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,369 17676501 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,369 17676501 TRACE [org.jgroups.protocols.TCP] (Incoming-2,infinispan-cluster-OHServerHybrid-test16,test16_devenv1-61733:) received [dst: <null>, src: test16_devenv2-53120 (3 headers), size=494 bytes], headers are RequestCorrelator: id=201, type=REQ, id=3, rsp_expected=false, NAKACK: [MSG, seqno=4], TCP: [channel_name=infinispan-cluster-OHServerHybrid-test16] 2010-12-23 10:55:00,369 17676501 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0) 2010-12-23 10:55:00,369 17676501 TRACE [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,infinispan-cluster-OHServerHybrid-test16,test16_devenv1-61733:) test16_devenv1-61733: received test16_devenv2-53120#4 2010-12-23 10:55:00,369 17676501 TRACE [org.jgroups.protocols.FC] (Incoming-2,infinispan-cluster-OHServerHybrid-test16,test16_devenv1-61733:) sender test16_devenv2-53120 minus 494 credits, 1999479 remaining 2010-12-23 10:55:00,370 17676502 TRACE [org.jgroups.blocks.RequestCorrelator] (Incoming-2,infinispan-cluster-OHServerHybrid-test16,test16_devenv1-61733:) calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 3 2010-12-23 10:55:00,372 17676504 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (Incoming-2,infinispan-cluster-OHServerHybrid-test16,test16_devenv1-61733:) Attempting to execute command: MultipleRpcCommand{commands=[PutKeyValueCommand{key=6, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@7139d335, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}, PutKeyValueCommand{key=1, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@b5976a6, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}, PutKeyValueCommand{key=2, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@1e7c921c, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], cacheName='GenericCache'} [sender=test16_devenv2-53120] 2010-12-23 10:55:00,372 17676504 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (Incoming-2,infinispan-cluster-OHServerHybrid-test16,test16_devenv1-61733:) Enough waiting; replayIgnored = false, sr STATE_PREEXISTED 2010-12-23 10:55:00,372 17676504 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (Incoming-2,infinispan-cluster-OHServerHybrid-test16,test16_devenv1-61733:) Calling perform() on MultipleRpcCommand{commands=[PutKeyValueCommand{key=6, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@7139d335, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}, PutKeyValueCommand{key=1, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@b5976a6, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}, PutKeyValueCommand{key=2, value=com.cboe.server.distributedCache.DistributeCacheValueBaseImpl@1e7c921c, putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1}], cacheName='GenericCache'} 2010-12-23 10:55:00,379 17676511 TRACE [org.infinispan.remoting.ReplicationQueue] (Scheduled-ReplicationQueueThread-0:) flush(): flushing repl queue (num elements=0)
The items just above in blue again seem to support that those keys (6, 2, and 1) are received, but the key=5 never is. I am not sure if the NAKACK is significant which I highlighted in green. Otherwise there is no evidence of key=5 received at all by this app.
We are in performance testing and just uncovered this functional issue causing a significant jeopardy.