0 Replies Latest reply on Oct 26, 2006 9:39 AM by Michael Newcomb

    slow JBoss startup in a 4 machine cluster

    Michael Newcomb Newbie

      When running JBoss in clustered mode there is a relatively long pause when the TreeCache starts up. After it prints the GMS information for the cache, it sits there for 15+ seconds (for the first machine) then finishes starting up.

      Then, when for each other cluster node I start, it pauses at the same place but for increasing times.

      Member 1: 17 seconds
      Member 2: 14 seconds
      Member 3: 64 seconds
      Member 4: 74 seconds

      It is the standard JBoss 4.0.4 installation, but I know the cache uses JGroups and it's pausing inside of the cache initialization.

      The CPU/network are all not being hit, so, I'm assuming that it could be something with the JGroups configuration?

      Thanks for any help you can provide,
      Michael


      Below are the cache portions of the JBoss logs for each member. I've placed '**********' around the slow parts of each log. For brevity, I only pasted the whole cache section of member # 1 but I pasted the slow portions of each other member.

      Member 1:

      09:33:47,253 INFO [TreeCache] setting cluster properties from xml to: UD(down_thread=false;enable_bundling=true;ip_ttl=2;loopback=true;max_bundle_size=64000;max_bundle_timeout=30;mcast_addr=230.1.2.7;mcast_port=45577;mcast_recv_buf_size=25000000;mcast_send_buf_size=640000;ucast_recv_buf_size=20000000;ucast_send_buf_size=640000;up_thread=false;use_incoming_packet_handler=true;use_outgoing_packet_handler=true):PING(down_thread=false;num_initial_members=3;timeout=2000;up_thread=false):MERGE(down_thread=false;max_interval=100000;min_interval=20000;up_thread=false):FD(down_thread=false;max_tries=5;shun=true;timeout=2500;up_thread=false):VERIFY_SUSPECT(down_thread=false;timeout=1500;up_thread=false):pbcast.NAKACK(discard_delivered_msgs=true;down_thread=false;gc_lag=50;max_xmit_size=60000;retransmit_timeout=100,200,300,600,1200,2400,4800;up_thread=false;use_mcast_xmit=false):UNICAST(down_thread=false;timeout=300,600,1200,2400,3600;up_thread=false):pbcast.STABLE(desired_avg_gossip=50000;down_thread=false;max_bytes=2100000;stability_delay=1000;up_thread=false):pbcast.GMS(down_thread=false;join_retry_timeout=2000;join_timeout=3000;print_local_addr=true;shun=true;up_thread=false):FC(down_thread=false;max_credits=10000000;min_threshold=0.20;up_threa
      d=false):FRAG2(down_thread=false;frag_size=60000;up_thread=false):pbcast.STATE_TRANSFER(down_thread=false;up_thread=false)
      09:33:47,534 INFO [WebService] Using RMI server codebase: http://tcc-srvr-04:8083/
      09:33:47,658 INFO [TreeCache] setting cluster properties from xml to: UDP(ip_mcast=true;ip_ttl=64;loopback=true;mcast_addr=228.1.2.3;mcast_port=45551;mcast_recv_buf_size=80000;mcast_send_buf_size=150000;ucast_recv_buf_size=80000;ucast_send_buf_size=150000):PING(down_thread=false;num_initial_members=3;timeout=2000;up_thread=false):MERGE2(max_interval=20000;min_interval=10000):FD(down_thread=true;shun=true;up_thread=true):VERIFY_SUSPECT(down_thread=false;timeout=1500;up_thread=false):pbcast.NAKACK(down_thread=false;gc_lag=50;max_xmit_size=8192;retransmit_timeout=600,1200,2400,4800;up_thread=false):UNICAST(down_thread=false;min_threshold=10;timeout=600,1200,2400;window_size=100):pbcast.STABLE(
      desired_avg_gossip=20000;down_thread=false;up_thread=false):FRAG(down_thread=false;frag_size=8192;up_thread=false):pbcast.GMS(join_retry_timeout=2000;join_timeout=5000;print_local_addr=true;shun=true):pbcast.STATE_TRANSFER(down_thread=false;up_thread=false)
      09:33:47,674 INFO [TreeCache] setEvictionPolicyConfig(): [config: null]
      09:33:47,674 WARN [TreeCache] No transaction manager lookup class has been defined. Transactions cannot be used
      09:33:47,690 INFO [TreeCache] interceptor chain is:
      class org.jboss.cache.interceptors.CallInterceptor
      class org.jboss.cache.interceptors.PessimisticLockInterceptor
      class org.jboss.cache.interceptors.CacheLoaderInterceptor
      class org.jboss.cache.interceptors.UnlockInterceptor
      class org.jboss.cache.interceptors.ReplicationInterceptor
      class org.jboss.cache.interceptors.CacheStoreInterceptor
      09:33:47,705 INFO [TreeCache] cache mode is REPL_SYNC
      09:33:47,877 ERROR [UNICAST] window_size is deprecated and will be ignored
      09:33:47,877 ERROR [UNICAST] min_threshold is deprecated and will be ignored
      09:33:47,939 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 192.168.200.74:1540
      -------------------------------------------------------
      09:33:49,950 INFO [TreeCache] viewAccepted(): [192.168.200.74:1540|0] [192.168.200.74:1540]
      09:33:49,950 INFO [TreeCache] my local address is 192.168.200.74:1540
      09:33:49,950 INFO [TreeCache] new cache is null (may be first member in cluster)
      09:33:49,950 INFO [TreeCache] state could not be retrieved (must be first member in group)
      09:33:49,950 INFO [LRUPolicy] Starting eviction policy using the provider: org.jboss.ejb3.cache.tree.StatefulEvictionPolicy
      09:33:49,950 INFO [LRUPolicy] Starting a eviction timer with wake up interval of (secs) 1
      09:33:49,950 INFO [TreeCache] Cache is started!!
      09:33:50,028 INFO [TreeCache] setting cluster properties from xml to: UDP(ip_mcast=true;ip_ttl=2;loopback=true;mcast_addr=228.1.2.3;mcast_p
      ort=43333;mcast_recv_buf_size=80000;mcast_send_buf_size=150000;ucast_recv_buf_size=80000;ucast_send_buf_size=150000):PING(down_thread=false;num_initial_members=3;timeout=2000;up_thread=false):MERGE2(max_interval=20000;min_interval=10000):FD(down_thread=true;shun=true;up_thread=true):VERIFY_SUSPECT(down_thread=false;timeout=1500;up_thread=false):pbcast.NAKACK(down_thread=false;gc_lag=50;max_xmit_size=8192;retransmit_t
      imeout=600,1200,2400,4800;up_thread=false):UNICAST(down_thread=false;min_threshold=10;timeout=600,1200,2400;window_size=100):pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):FRAG(down_thread=false;frag_size=8192;up_thread=false):pbcast.GMS(join_retry_timeout=2000;join_timeout=5000;print_local_addr=true;shun=true):pbcast.STATE_TRANSFER(down_thread=false;up_thread=false)
      09:33:50,028 INFO [TreeCache] setEvictionPolicyConfig(): [config: null]
      09:33:50,698 INFO [MailService] Mail Service bound to java:/Mail
      09:33:50,760 ERROR [UNICAST] window_size is deprecated and will be ignored
      09:33:50,760 ERROR [UNICAST] min_threshold is deprecated and will be ignored
      09:33:50,776 INFO [DefaultPartition] Initializing
      09:33:50,838 INFO [TreeCache] interceptor chain is:
      class org.jboss.cache.interceptors.CallInterceptor
      class org.jboss.cache.interceptors.PessimisticLockInterceptor
      class org.jboss.cache.interceptors.UnlockInterceptor
      class org.jboss.cache.interceptors.ReplicationInterceptor
      09:33:50,838 INFO [TreeCache] cache mode is REPL_ASYNC
      09:33:50,854 INFO [TreeCache] interceptor chain is:
      class org.jboss.cache.interceptors.CallInterceptor
      class org.jboss.cache.interceptors.PessimisticLockInterceptor
      class org.jboss.cache.interceptors.UnlockInterceptor
      class org.jboss.cache.interceptors.ReplicationInterceptor
      09:33:50,854 INFO [TreeCache] cache mode is REPL_SYNC
      09:33:50,869 ERROR [UNICAST] window_size is deprecated and will be ignored
      09:33:50,869 ERROR [UNICAST] min_threshold is deprecated and will be ignored
      09:33:50,916 INFO [NamingService] JNDI bootstrap JNP=/0.0.0.0:1099, RMI=/0.0.0.0:1098, backlog=50, no client SocketFactory, Server SocketFa
      ctory=class org.jboss.net.sockets.DefaultSocketFactory
      09:33:50,932 INFO [SubscriptionManager] Bound event dispatcher to java:/EventDispatcher

      ***********
      09:33:50,932 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 192.168.200.74:1545 (additional data: 19 bytes)
      -------------------------------------------------------
      09:34:07,920 INFO [DefaultPartition] Number of cluster members: 1
      ***********
      +17 seconds in between logs!

      09:34:07,920 INFO [DefaultPartition] Other members: 0
      09:34:07,920 INFO [DefaultPartition] New cluster view for partition DefaultPartition (id: 0, delta: 0) : [192.168.100.74:1099]
      09:34:07,920 INFO [DefaultPartition] Fetching state (will wait for 30000 milliseconds):
      09:34:07,920 INFO [DefaultPartition] I am (192.168.100.74:1099) received membershipChanged event:
      09:34:07,920 INFO [DefaultPartition] Dead members: 0 ([])
      09:34:07,920 INFO [DefaultPartition] New Members : 0 ([])
      09:34:07,920 INFO [DefaultPartition] All Members : 1 ([192.168.100.74:1099])
      09:34:07,936 INFO [HANamingService] Started ha-jndi bootstrap jnpPort=1100, backlog=50, bindAddress=/0.0.0.0
      09:34:07,951 INFO [DetachedHANamingService$AutomaticDiscovery] Listening on /0.0.0.0:1102, group=230.0.0.4, HA-JNDI address=192.168.100.74:
      1100
      09:34:08,388 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 192.168.200.74:1547
      -------------------------------------------------------
      09:34:10,398 INFO [TreeCache] viewAccepted(): [192.168.200.74:1547|0] [192.168.200.74:1547]
      09:34:10,398 INFO [TreeCache] my local address is 192.168.200.74:1547
      09:34:10,398 INFO [TreeCache] new cache is null (may be first member in cluster)
      09:34:10,398 INFO [TreeCache] state could not be retrieved (must be first member in group)
      09:34:10,398 INFO [TreeCache] Cache is started!!
      09:34:10,414 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 192.168.200.74:1549
      -------------------------------------------------------
      09:34:12,425 INFO [TreeCache] viewAccepted(): [192.168.200.74:1549|0] [192.168.200.74:1549]
      09:34:12,425 INFO [TreeCache] my local address is 192.168.200.74:1549
      09:34:12,425 INFO [TreeCache] new cache is null (may be first member in cluster)
      09:34:12,425 INFO [TreeCache] state could not be retrieved (must be first member in group)
      09:34:12,425 INFO [LRUPolicy] Starting eviction policy using the provider: org.jboss.cache.eviction.LRUPolicy
      09:34:12,425 INFO [LRUPolicy] Starting a eviction timer with wake up interval of (secs) 5
      09:34:12,425 INFO [TreeCache] Cache is started!!



      Member 2:

      09:39:48,828 INFO [TreeCache] cache mode is REPL_SYNC
      09:39:48,828 ERROR [UNICAST] window_size is deprecated and will be ignored
      09:39:48,828 ERROR [UNICAST] min_threshold is deprecated and will be ignored
      09:39:48,875 INFO [NamingService] JNDI bootstrap JNP=/0.0.0.0:1099, RMI=/0.0.0.0:1098, backlog=50, no client SocketFactory, Server SocketFactory=class org.jboss.net.sockets.DefaultSocketFactory
      09:39:48,890 INFO [SubscriptionManager] Bound event dispatcher to java:/EventDispatcher

      ***********
      09:39:48,906 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 192.168.200.77:2781 (additional data: 19 bytes)
      -------------------------------------------------------
      09:40:02,937 INFO [DefaultPartition] Number of cluster members: 2
      ***********
      +14 seconds in between logs!

      09:40:02,937 INFO [DefaultPartition] Other members: 1



      Member 3:

      09:42:04,323 INFO [TreeCache] cache mode is REPL_SYNC
      09:42:04,339 ERROR [UNICAST] window_size is deprecated and will be ignored
      09:42:04,339 ERROR [UNICAST] min_threshold is deprecated and will be ignored
      09:42:04,386 INFO [NamingService] JNDI bootstrap JNP=/0.0.0.0:1099, RMI=/0.0.0.0:1098, backlog=50, no client SocketFactory, Server SocketFactory=class org.jboss.net.sockets.DefaultSocketFactory
      09:42:04,402 INFO [SubscriptionManager] Bound event dispatcher to java:/EventDispatcher

      ***********
      09:42:04,418 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 192.168.200.76:2871 (additional data: 19 bytes)
      -------------------------------------------------------
      09:43:08,487 INFO [DefaultPartition] Number of cluster members: 3
      ***********
      +64 seconds in between logs!

      09:43:08,487 INFO [DefaultPartition] Other members: 2



      Member 4:

      09:44:06,472 INFO [TreeCache] cache mode is REPL_SYNC
      09:44:06,472 ERROR [UNICAST] window_size is deprecated and will be ignored
      09:44:06,472 ERROR [UNICAST] min_threshold is deprecated and will be ignored
      09:44:06,517 INFO [NamingService] JNDI bootstrap JNP=/0.0.0.0:1099, RMI=/0.0.0.0:1098, backlog=50, no client SocketFactory, Server SocketFactory=class org.jboss.net.sockets.DefaultSocketFactory
      09:44:06,533 INFO [SubscriptionManager] Bound event dispatcher to java:/EventDispatcher

      ***********
      09:44:06,548 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 192.168.200.75:3123 (additional data: 19 bytes)
      -------------------------------------------------------
      09:45:20,729 INFO [DefaultPartition] Number of cluster members: 4
      ***********
      +74 seconds in between logs!

      09:45:20,729 INFO [DefaultPartition] Other members: 3