- 
        1. Re: StateTransferTimeout appears to be ignoredbrian.stansberry Feb 4, 2008 10:16 PM (in response to dcolwell)In the future, please post the full log message, it will save time. :-) 
 The code logging what you posted is not the web session replication service. It's the jboss:service=DefaultPartition mbean from cluster-service.xml.
 Please enable DEBUG logging for category org.jboss.ha. You should get more information on what's happening during the delay.
 The state transfer timeout param controls how long the service will wait for a state tranfer response from the other node. After receiving the response it still needs to integrate the received state. That can take time, although several minutes certainly seems excessive. The debug logging should help show what the problem is.
- 
        2. Re: StateTransferTimeout appears to be ignoreddcolwell Feb 8, 2008 12:17 PM (in response to dcolwell)The following is my log4j trace of org.jboss.ha category. Unfortunately it does not appear to contain useful information during the 3+ minute pause after the line "Fetching State". Is there other information in this log that could give a clue or would you have other thoughts on how I should investigate? 
 Thank you,
 Dave
 2008-02-08 11:04:59,787 [main] DEBUG [ClusterPartition] Setting JGProps from xml to: UDP(discard_incompatible_packets=true;down_thread=false;enable_bundling=false;ip_ttl=2;loopback=false;max_
 bundle_size=64000;max_bundle_timeout=30;mcast_addr=231.1.2.2;mcast_port=47102;mcast_recv_buf_size=25000000;mcast_send_buf_size=640000;tos=8;ucast_recv_buf_size=20000000;ucast_send_buf_size=64
 0000;up_thread=false;use_incoming_packet_handler=true;use_outgoing_packet_handler=false):PING(down_thread=false;num_initial_members=3;timeout=2000;up_thread=false):MERGE2(down_thread=false;ma
 x_interval=100000;min_interval=20000;up_thread=false):FD_SOCK(down_thread=false;up_thread=false):FD(down_thread=false;max_tries=5;shun=true;timeout=10000;up_thread=false):VERIFY_SUSPECT(down_
 thread=false;timeout=1500;up_thread=false):pbcast.NAKACK(discard_delivered_msgs=true;down_thread=false;gc_lag=0;max_xmit_size=60000;retransmit_timeout=300,600,1200,2400,4800;up_thread=false;u
 se_mcast_xmit=false):UNICAST(down_thread=false;timeout=300,600,1200,2400,3600;up_thread=false):pbcast.STABLE(desired_avg_gossip=50000;down_thread=false;max_bytes=400000;stability_delay=1000;u
 p_thread=false):pbcast.GMS(down_thread=false;join_retry_timeout=2000;join_timeout=3000;print_local_addr=true;shun=true;up_thread=false;view_bundling=true):FRAG2(down_thread=false;frag_size=60
 000;up_thread=false):pbcast.STATE_TRANSFER(down_thread=false;up_thread=false;use_flush=false)
 2008-02-08 11:05:00,014 [main] DEBUG [ClusterPartition] Creating jboss:service=MSPTEST_emtrack
 2008-02-08 11:05:00,014 [main] DEBUG [ClusterPartition] Creating JGroups JChannel
 2008-02-08 11:05:00,279 [main] DEBUG [ClusterPartition] Creating HAPartition
 2008-02-08 11:05:00,308 [main] DEBUG [ClusterPartition] Initing HAPartition: org.jboss.ha.framework.server.HAPartitionImpl@18837f1
 2008-02-08 11:05:00,308 [main] INFO [MSPTEST_emtrack] Initializing
 2008-02-08 11:05:00,309 [main] DEBUG [MSPTEST_emtrack] setMembershipListener
 2008-02-08 11:05:00,309 [main] DEBUG [MSPTEST_emtrack] setMessageListener
 2008-02-08 11:05:00,309 [main] DEBUG [MSPTEST_emtrack] create replicant manager
 2008-02-08 11:05:00,337 [main] DEBUG [MSPTEST_emtrack] init replicant manager
 2008-02-08 11:05:00,337 [main] DEBUG [MSPTEST_emtrack] registerRPCHandler
 2008-02-08 11:05:00,337 [main] DEBUG [MSPTEST_emtrack] subscribeToStateTransferEvents
 2008-02-08 11:05:00,337 [main] DEBUG [MSPTEST_emtrack] registerMembershipListener
 2008-02-08 11:05:00,342 [main] DEBUG [MSPTEST_emtrack] bind replicant manager
 2008-02-08 11:05:00,342 [main] DEBUG [MSPTEST_emtrack] create distributed state
 2008-02-08 11:05:00,354 [main] DEBUG [MSPTEST_emtrack] init distributed state service
 2008-02-08 11:05:00,359 [main] DEBUG [MSPTEST_emtrack] bind distributed state service
 2008-02-08 11:05:00,366 [main] DEBUG [MSPTEST_emtrack] done initing.
 2008-02-08 11:05:00,366 [main] DEBUG [ClusterPartition] HAPartition initialized
 2008-02-08 11:05:00,366 [main] DEBUG [ClusterPartition] Created jboss:service=MSPTEST_emtrack
 2008-02-08 11:05:00,367 [main] DEBUG [HASessionStateService] Creating jboss:service=HASessionState
 2008-02-08 11:05:00,385 [main] DEBUG [HASessionStateService] Created jboss:service=HASessionState
 2008-02-08 11:05:00,386 [main] DEBUG [HANamingService] Creating jboss:service=HAJNDI
 2008-02-08 11:05:00,386 [main] DEBUG [HANamingService] Initializing HAJNDI server on partition: MSPTEST_emtrack
 2008-02-08 11:05:00,397 [main] DEBUG [HANamingService] initialize HAJNDI
 2008-02-08 11:05:00,397 [main] DEBUG [HAJNDI] subscribeToStateTransferEvents
 2008-02-08 11:05:00,397 [main] DEBUG [TreeHead] registerRPCHandler
 2008-02-08 11:05:00,398 [main] DEBUG [HANamingService] Created jboss:service=HAJNDI
 2008-02-08 11:05:00,401 [main] DEBUG [ClusterPartition] Starting jboss:service=MSPTEST_emtrack
 2008-02-08 11:05:00,404 [main] DEBUG [ClusterPartition] Starting ClusterPartition: MSPTEST_emtrack
 2008-02-08 11:05:02,490 [main] DEBUG [MSPTEST_emtrack] ViewAccepted: initial members set
 2008-02-08 11:05:02,494 [main] DEBUG [ClusterPartition] Starting channel
 2008-02-08 11:05:02,495 [main] DEBUG [MSPTEST_emtrack] get nodeName
 2008-02-08 11:05:02,495 [main] DEBUG [MSPTEST_emtrack] Get current members
 2008-02-08 11:05:02,496 [main] INFO [MSPTEST_emtrack] Number of cluster members: 2
 2008-02-08 11:05:02,496 [main] INFO [MSPTEST_emtrack] Other members: 1
 2008-02-08 11:05:02,496 [main] INFO [MSPTEST_emtrack] Fetching state (will wait for 30000 milliseconds):
 2008-02-08 11:05:02,502 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState called
 2008-02-08 11:08:11,545 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState for DistributedReplicantManager
 2008-02-08 11:08:11,549 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState for HASessionStateTransfer
 2008-02-08 11:08:11,549 [DRM Async Publisher#0] DEBUG [MSPTEST_emtrack] DRM: Sleeping before re-publishing for 50ms just in case
 2008-02-08 11:08:11,550 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [/HASessionState/Default] Receiving state of HASessionState
 2008-02-08 11:08:11,551 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState for HAJNDI
 2008-02-08 11:08:11,551 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [TreeHead] registerRPCHandler
 2008-02-08 11:08:11,551 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState for DistributedState
 2008-02-08 11:08:11,552 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] received a state of 1466 bytes; expanded memory by 4915968 bytes (used memory before: 40976664, used memory after: 45892632)
 2008-02-08 11:08:11,552 [main] INFO [MSPTEST_emtrack] state was retrieved successfully (in 189056 milliseconds)
- 
        3. Re: StateTransferTimeout appears to be ignoredbrian.stansberry Feb 8, 2008 12:31 PM (in response to dcolwell)Here's the code that's being executed during the logging gap: log.debug("setState called"); if (obj == null) { log.debug("state is null"); return; } long used_mem_before, used_mem_after; int state_size=obj != null? obj.length : 0; Runtime rt=Runtime.getRuntime(); used_mem_before=rt.totalMemory() - rt.freeMemory(); HashMap state = (HashMap)objectFromByteBuffer(obj); java.util.Iterator keys = state.keySet().iterator(); while (keys.hasNext()) { String key = (String)keys.next(); log.debug("setState for " + key);
 Here's the objectFromByteBuffer() code:public static Object objectFromByteBuffer (byte[] buffer) throws Exception { if(buffer == null) return null; ByteArrayInputStream bais = new ByteArrayInputStream(buffer); MarshalledValueInputStream mvis = new MarshalledValueInputStream(bais); return mvis.readObject(); }
 This is all in the org.jboss.ha.framework.server.HAPartitionImpl class, found in the cluster module of the AS source.
 It's possible there is some issue with the MarshalledValueInputStream.readObject() call although I doubt it. In this case it's basically just an ordinary java.io.ObjectInputStream.readObject() call. And as you can see from the final log message, the byte[] it's deserializing is only 1466 bytes; hardly anything. If that were having issues I expect I'd hear more about it from others.
 Perhaps something the memory calculation business? That at least is somewhat environment-specific. Just a wild guess.
 
    