3 Replies Latest reply on Feb 8, 2008 12:31 PM by brian.stansberry

    StateTransferTimeout appears to be ignored

    dcolwell

      I have deployed JBoss 4.2.1GA across a two node cluster. It's configured for asynchronous, non-transactional replication of user session data using UDP. Application state is exclusively servlet based with no EJBs. After starting up the first node I fire up the second and have observed it pause from 1 to 10 minutes after logging:

      Fetching state (will wait for 30000 milliseconds)

      The next line in my server.log file appears successful, just long:

      state was retrieved successfully (in 189067 milliseconds)

      The 30000 millis is configured by setting a StateTransferTimeout attribute, but different values appear to be equally ignored. Does anyone have thoughts on what may be causing this? The amount of user data potentially active on the first node is rather small (< 100k) if any.

      Thanks much!

        • 1. Re: StateTransferTimeout appears to be ignored
          brian.stansberry

          In the future, please post the full log message, it will save time. :-)

          The code logging what you posted is not the web session replication service. It's the jboss:service=DefaultPartition mbean from cluster-service.xml.

          Please enable DEBUG logging for category org.jboss.ha. You should get more information on what's happening during the delay.

          The state transfer timeout param controls how long the service will wait for a state tranfer response from the other node. After receiving the response it still needs to integrate the received state. That can take time, although several minutes certainly seems excessive. The debug logging should help show what the problem is.

          • 2. Re: StateTransferTimeout appears to be ignored
            dcolwell

            The following is my log4j trace of org.jboss.ha category. Unfortunately it does not appear to contain useful information during the 3+ minute pause after the line "Fetching State". Is there other information in this log that could give a clue or would you have other thoughts on how I should investigate?

            Thank you,
            Dave


            2008-02-08 11:04:59,787 [main] DEBUG [ClusterPartition] Setting JGProps from xml to: UDP(discard_incompatible_packets=true;down_thread=false;enable_bundling=false;ip_ttl=2;loopback=false;max_
            bundle_size=64000;max_bundle_timeout=30;mcast_addr=231.1.2.2;mcast_port=47102;mcast_recv_buf_size=25000000;mcast_send_buf_size=640000;tos=8;ucast_recv_buf_size=20000000;ucast_send_buf_size=64
            0000;up_thread=false;use_incoming_packet_handler=true;use_outgoing_packet_handler=false):PING(down_thread=false;num_initial_members=3;timeout=2000;up_thread=false):MERGE2(down_thread=false;ma
            x_interval=100000;min_interval=20000;up_thread=false):FD_SOCK(down_thread=false;up_thread=false):FD(down_thread=false;max_tries=5;shun=true;timeout=10000;up_thread=false):VERIFY_SUSPECT(down_
            thread=false;timeout=1500;up_thread=false):pbcast.NAKACK(discard_delivered_msgs=true;down_thread=false;gc_lag=0;max_xmit_size=60000;retransmit_timeout=300,600,1200,2400,4800;up_thread=false;u
            se_mcast_xmit=false):UNICAST(down_thread=false;timeout=300,600,1200,2400,3600;up_thread=false):pbcast.STABLE(desired_avg_gossip=50000;down_thread=false;max_bytes=400000;stability_delay=1000;u
            p_thread=false):pbcast.GMS(down_thread=false;join_retry_timeout=2000;join_timeout=3000;print_local_addr=true;shun=true;up_thread=false;view_bundling=true):FRAG2(down_thread=false;frag_size=60
            000;up_thread=false):pbcast.STATE_TRANSFER(down_thread=false;up_thread=false;use_flush=false)
            2008-02-08 11:05:00,014 [main] DEBUG [ClusterPartition] Creating jboss:service=MSPTEST_emtrack
            2008-02-08 11:05:00,014 [main] DEBUG [ClusterPartition] Creating JGroups JChannel
            2008-02-08 11:05:00,279 [main] DEBUG [ClusterPartition] Creating HAPartition
            2008-02-08 11:05:00,308 [main] DEBUG [ClusterPartition] Initing HAPartition: org.jboss.ha.framework.server.HAPartitionImpl@18837f1
            2008-02-08 11:05:00,308 [main] INFO [MSPTEST_emtrack] Initializing
            2008-02-08 11:05:00,309 [main] DEBUG [MSPTEST_emtrack] setMembershipListener
            2008-02-08 11:05:00,309 [main] DEBUG [MSPTEST_emtrack] setMessageListener
            2008-02-08 11:05:00,309 [main] DEBUG [MSPTEST_emtrack] create replicant manager
            2008-02-08 11:05:00,337 [main] DEBUG [MSPTEST_emtrack] init replicant manager
            2008-02-08 11:05:00,337 [main] DEBUG [MSPTEST_emtrack] registerRPCHandler
            2008-02-08 11:05:00,337 [main] DEBUG [MSPTEST_emtrack] subscribeToStateTransferEvents
            2008-02-08 11:05:00,337 [main] DEBUG [MSPTEST_emtrack] registerMembershipListener
            2008-02-08 11:05:00,342 [main] DEBUG [MSPTEST_emtrack] bind replicant manager
            2008-02-08 11:05:00,342 [main] DEBUG [MSPTEST_emtrack] create distributed state
            2008-02-08 11:05:00,354 [main] DEBUG [MSPTEST_emtrack] init distributed state service
            2008-02-08 11:05:00,359 [main] DEBUG [MSPTEST_emtrack] bind distributed state service
            2008-02-08 11:05:00,366 [main] DEBUG [MSPTEST_emtrack] done initing.
            2008-02-08 11:05:00,366 [main] DEBUG [ClusterPartition] HAPartition initialized
            2008-02-08 11:05:00,366 [main] DEBUG [ClusterPartition] Created jboss:service=MSPTEST_emtrack
            2008-02-08 11:05:00,367 [main] DEBUG [HASessionStateService] Creating jboss:service=HASessionState
            2008-02-08 11:05:00,385 [main] DEBUG [HASessionStateService] Created jboss:service=HASessionState
            2008-02-08 11:05:00,386 [main] DEBUG [HANamingService] Creating jboss:service=HAJNDI
            2008-02-08 11:05:00,386 [main] DEBUG [HANamingService] Initializing HAJNDI server on partition: MSPTEST_emtrack
            2008-02-08 11:05:00,397 [main] DEBUG [HANamingService] initialize HAJNDI
            2008-02-08 11:05:00,397 [main] DEBUG [HAJNDI] subscribeToStateTransferEvents
            2008-02-08 11:05:00,397 [main] DEBUG [TreeHead] registerRPCHandler
            2008-02-08 11:05:00,398 [main] DEBUG [HANamingService] Created jboss:service=HAJNDI
            2008-02-08 11:05:00,401 [main] DEBUG [ClusterPartition] Starting jboss:service=MSPTEST_emtrack
            2008-02-08 11:05:00,404 [main] DEBUG [ClusterPartition] Starting ClusterPartition: MSPTEST_emtrack
            2008-02-08 11:05:02,490 [main] DEBUG [MSPTEST_emtrack] ViewAccepted: initial members set
            2008-02-08 11:05:02,494 [main] DEBUG [ClusterPartition] Starting channel
            2008-02-08 11:05:02,495 [main] DEBUG [MSPTEST_emtrack] get nodeName
            2008-02-08 11:05:02,495 [main] DEBUG [MSPTEST_emtrack] Get current members
            2008-02-08 11:05:02,496 [main] INFO [MSPTEST_emtrack] Number of cluster members: 2
            2008-02-08 11:05:02,496 [main] INFO [MSPTEST_emtrack] Other members: 1
            2008-02-08 11:05:02,496 [main] INFO [MSPTEST_emtrack] Fetching state (will wait for 30000 milliseconds):
            2008-02-08 11:05:02,502 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState called
            2008-02-08 11:08:11,545 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState for DistributedReplicantManager
            2008-02-08 11:08:11,549 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState for HASessionStateTransfer
            2008-02-08 11:08:11,549 [DRM Async Publisher#0] DEBUG [MSPTEST_emtrack] DRM: Sleeping before re-publishing for 50ms just in case
            2008-02-08 11:08:11,550 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [/HASessionState/Default] Receiving state of HASessionState
            2008-02-08 11:08:11,551 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState for HAJNDI
            2008-02-08 11:08:11,551 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [TreeHead] registerRPCHandler
            2008-02-08 11:08:11,551 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] setState for DistributedState
            2008-02-08 11:08:11,552 [IncomingPacketHandler (channel=MSPTEST_emtrack)] DEBUG [MSPTEST_emtrack] received a state of 1466 bytes; expanded memory by 4915968 bytes (used memory before: 40976664, used memory after: 45892632)
            2008-02-08 11:08:11,552 [main] INFO [MSPTEST_emtrack] state was retrieved successfully (in 189056 milliseconds)

            • 3. Re: StateTransferTimeout appears to be ignored
              brian.stansberry

              Here's the code that's being executed during the logging gap:

               log.debug("setState called");
               if (obj == null)
               {
               log.debug("state is null");
               return;
               }
              
               long used_mem_before, used_mem_after;
               int state_size=obj != null? obj.length : 0;
               Runtime rt=Runtime.getRuntime();
               used_mem_before=rt.totalMemory() - rt.freeMemory();
              
               HashMap state = (HashMap)objectFromByteBuffer(obj);
               java.util.Iterator keys = state.keySet().iterator();
               while (keys.hasNext())
               {
               String key = (String)keys.next();
               log.debug("setState for " + key);
              


              Here's the objectFromByteBuffer() code:

               public static Object objectFromByteBuffer (byte[] buffer) throws Exception
               {
               if(buffer == null)
               return null;
              
               ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
               MarshalledValueInputStream mvis = new MarshalledValueInputStream(bais);
               return mvis.readObject();
               }
              


              This is all in the org.jboss.ha.framework.server.HAPartitionImpl class, found in the cluster module of the AS source.

              It's possible there is some issue with the MarshalledValueInputStream.readObject() call although I doubt it. In this case it's basically just an ordinary java.io.ObjectInputStream.readObject() call. And as you can see from the final log message, the byte[] it's deserializing is only 1466 bytes; hardly anything. If that were having issues I expect I'd hear more about it from others.

              Perhaps something the memory calculation business? That at least is somewhat environment-specific. Just a wild guess.