5 Replies Latest reply on Feb 7, 2006 8:45 AM by hariv

    Configuration

    hariv

  We are using JGroups in our application, primarily to propagate cached data across the different nodes in a subnet. It is a very high-volume site with about 8 nodes in the cluster. The following is the configuration I am using.

      UDP(ucast_send_buf_size=800000;ucast_recv_buf_size=150000):
      PING(timeout=2000;num_initial_members=3;up_thread=true;down_thread=true;):
      MERGE2(min_interval=1000000;max_interval=2000000):
      FD(shun=true;up_thread=true;down_thread=true;timeout=54000000;max_tries=5):
      pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800;max_xmit_size=8192;up_thread=true;down_thread=true):
      pbcast.STABLE(desired_avg_gossip=2000000;up_thread=true;down_thread=true):
      UNICAST:FRAG(frag_size=8192;down_thread=true;up_thread=true):
      pbcast.GMS:VIEW_ENFORCER:QUEUE

  Everything is working fine in our dev environment. I would appreciate it if somebody could validate the configuration string I am using before we move this to production.


      Thanks

      Hari

        • 1. Re: Configuration
          belaban

          The timeout value in FD means that crashed members will only be detected after a very long time (54000000 ms is 15 hours per try, and max_tries=5). Also, VIEW_ENFORCER and QUEUE are non-standard for this stack.
          I suggest you start from default.xml and modify it for your purposes.
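
          For illustration only, here is a hedged rewrite of the posted stack along those lines: FD gets a short timeout (the values timeout=2500 and max_tries=5 are typical examples, not tuned for this deployment), and the non-standard VIEW_ENFORCER and QUEUE protocols are dropped. The other parameters are left exactly as posted.

          ```
          UDP(ucast_send_buf_size=800000;ucast_recv_buf_size=150000):
          PING(timeout=2000;num_initial_members=3;up_thread=true;down_thread=true):
          MERGE2(min_interval=1000000;max_interval=2000000):
          FD(shun=true;timeout=2500;max_tries=5;up_thread=true;down_thread=true):
          pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800;max_xmit_size=8192;up_thread=true;down_thread=true):
          pbcast.STABLE(desired_avg_gossip=2000000;up_thread=true;down_thread=true):
          UNICAST:FRAG(frag_size=8192;down_thread=true;up_thread=true):
          pbcast.GMS
          ```

          With FD at 2500 ms per try and 5 tries, a crashed member would be suspected after roughly 12.5 seconds instead of days. Diffing against the default.xml that ships with JGroups, as suggested above, is still the safer route.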

          • 2. Re: Configuration
            hariv

            We rolled out a JGroups-based cache propagation solution in production. The solution worked in our QA environment, but unfortunately it is not working in production. I am using the following configuration string:

            UDP(ucast_send_buf_size=800000;ucast_recv_buf_size=150000):
            PING(timeout=2000;num_initial_members=3;up_thread=true;down_thread=true;):
            MERGE2(min_interval=1000000;max_interval=2000000):
            FD(shun=true;up_thread=true;down_thread=true;timeout=54000000;max_tries=5):
            pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800;max_xmit_size=8192;up_thread=true;down_thread=true):
            pbcast.STABLE(desired_avg_gossip=2000000;up_thread=true;down_thread=true):
            UNICAST:FRAG(frag_size=8192;down_thread=true;up_thread=true):
            pbcast.GMS:VIEW_ENFORCER:QUEUE


            When I turn on debug logging for JGroups, I see the following messages in the log file.


            xx.xx.105.65, xx.xx.105.67, xx.xx.105.69, and xx.xx.105.71
            are the machines.


            [org.jgroups.protocols.pbcast.STABLE] received digest xx.xx.105.65:33163#5 (5), xx.xx.105.67:33020#0 (0), xx.xx.105.69:33018#2 (2), xx.xx.105.71:32975#3 (3) from xx.xx.105.69:33018

            2006-02-01 14:00:34,868 DEBUG [org.jgroups.protocols.UDP] received (mcast) 51 bytes from /xx.xx.105.65:33166 (size=51 bytes)
            2006-02-01 14:00:34,868 DEBUG [org.jgroups.protocols.UDP] message is [dst: 228.8.8.8:7600, src: xx.xx.105.65:33165 (2 headers), size = 0 bytes], headers are {UDP=[UDP:channel_name=dictionary], PING=[PING: type=GET_MBRS_REQ, arg=null]}
            2006-02-01 14:00:34,868 WARN [org.jgroups.protocols.UDP] discarded message from different group (dictionary). Sender was xx.xx.105.65:33165
            2006-02-01 14:00:34,868 DEBUG [org.jgroups.protocols.UDP] received (mcast) 51 bytes from /xx.xx.105.65:33166 (size=51 bytes)
            2006-02-01 14:00:34,868 DEBUG [org.jgroups.protocols.UDP] message is [dst: 228.8.8.8:7600, src: xx.xx.105.65:33165 (2 headers), size = 0 bytes], headers are {UDP=[UDP:channel_name=dictionary], PING=[PING: type=GET_MBRS_REQ, arg=null]}
            2006-02-01 14:00:34,869 WARN [org.jgroups.protocols.UDP] discarded message from different group (dictionary). Sender was xx.xx.105.65:33165
            2006-02-01 14:00:34,869 DEBUG [org.jgroups.protocols.PING] received GET_MBRS_REQ from xx.xx.105.65:33165, sending response [PING: type=GET_MBRS_RSP, arg=[own_addr=xx.xx.105.67:33023, coord_addr=xx.xx.105.65:33165, is_server=true]]
            2006-02-01 14:00:34,869 DEBUG [org.jgroups.protocols.UDP] sending msg to xx.xx.105.65:33165 (src=xx.xx.105.67:33023), headers are {PING=[PING: type=GET_MBRS_RSP, arg=[own_addr=xx.xx.105.67:33023, coord_addr=xx.xx.105.65:33165, is_server=true]], UDP=[UDP:channel_name=dictionary]}

            My understanding is that the above log messages mean multicasting is enabled.

            But when I publish text from one of the nodes, the other nodes don't receive the published text. I would appreciate your help with this.
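
            One way to rule out the network layer is to test raw IP multicast between the hosts with the diagnostic classes that ship with JGroups, using the same multicast address and port that appear in the logs above (228.8.8.8:7600). The jar name below is an assumption; use whatever JGroups jar is on your classpath. Lines typed into the sender should appear on every receiver; if they don't, multicast routing between the production hosts is the problem, not the stack configuration.

            ```
            # on each receiving host:
            java -cp jgroups-all.jar org.jgroups.tests.McastReceiverTest -mcast_addr 228.8.8.8 -port 7600

            # on the sending host, then type some lines:
            java -cp jgroups-all.jar org.jgroups.tests.McastSenderTest -mcast_addr 228.8.8.8 -port 7600
            ```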



            • 3. Re: Configuration
              hariv

              Bela,

              I get the following message in the log file:
              DEBUG [org.jgroups.protocols.MERGE2] didn't find multiple coordinators in [[own_addr=xx.xx.105.71:32973, coord_addr=xx.xx.105.65:33161, is_server=true], [own_addr=xx.xx.105.69:33016, coord_addr=xx.xx.105.65:33161, is_server=true], [own_addr=xx.xx.105.67:33017, coord_addr=xx.xx.105.65:33161, is_server=true], [own_addr=xx.xx.105.65:33161, coord_addr=xx.xx.105.65:33161, is_server=true]], no need for merge

              The following is the configuration

              UDP(ucast_send_buf_size=800000;ucast_recv_buf_size=150000):
              PING(timeout=2000;num_initial_members=3;up_thread=true;down_thread=true;):
              MERGE2(min_interval=1000000;max_interval=2000000):
              FD(shun=true;up_thread=true;down_thread=true;timeout=54000000;max_tries=5):
              pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800;max_xmit_size=8192;up_thread=true;down_thread=true):
              pbcast.STABLE(desired_avg_gossip=2000000;up_thread=true;down_thread=true):
              UNICAST:FRAG(frag_size=8192;down_thread=true;up_thread=true):
              pbcast.GMS:VIEW_ENFORCER:QUEUE

              • 4. Re: Configuration
                belaban

                What's the problem then? It looks like there was no partition, so there is no need for a merge.

                • 5. Re: Configuration
                  hariv

                  The issue is that whenever the application invokes castMessage on an instance of RpcDispatcher, the handler doesn't get invoked on the other nodes. This solution worked in our QA environment but is not working in our production environment.
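
                  For context, a hypothetical sketch of the publish path being described, assuming the JGroups 2.x API (castMessage is inherited from MessageDispatcher); the variable names, payload, and timeout are illustrative only, and this will not compile without the JGroups jar:

                  ```java
                  // props = the stack string below; "dictionary" is the group
                  // name from the logs and must match on every node
                  JChannel channel = new JChannel(props);
                  channel.connect("dictionary");

                  // requestHandler's handle() is what should run on the receivers
                  RpcDispatcher disp =
                      new RpcDispatcher(channel, null, null, requestHandler);

                  // dests == null multicasts to the whole group; GET_ALL blocks
                  // until all current members respond or the timeout expires
                  Message msg = new Message(null, null, "published text");
                  RspList rsps = disp.castMessage(null, msg, GroupRequest.GET_ALL, 5000);
                  ```

                  If your code looks roughly like this, the next thing to check is whether every node really connects to the same group name and multicast address.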

                  The following is the configuration:
                  UDP(ucast_send_buf_size=800000;ucast_recv_buf_size=150000):
                  PING(timeout=2000;num_initial_members=3;up_thread=true;down_thread=true;):
                  MERGE2(min_interval=1000000;max_interval=2000000):
                  FD(shun=true;up_thread=true;down_thread=true;timeout=54000000;max_tries=5):
                  pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800;max_xmit_size=8192;up_thread=true;down_thread=true):
                  pbcast.STABLE(desired_avg_gossip=2000000;up_thread=true;down_thread=true):
                  UNICAST:FRAG(frag_size=8192;down_thread=true;up_thread=true):
                  pbcast.GMS:VIEW_ENFORCER:QUEUE


                  From the log messages below I can see that every node sees the other nodes.

                  When I turn on debug logging for JGroups, I see the following messages in the log file.


                  [org.jgroups.protocols.pbcast.STABLE] received digest xx.xx.105.65:33163#5 (5), xx.xx.105.67:33020#0 (0), xx.xx.105.69:33018#2 (2), xx.xx.105.71:32975#3 (3) from xx.xx.105.69:33018

                  2006-02-01 14:00:34,868 DEBUG [org.jgroups.protocols.UDP] received (mcast) 51 bytes from /xx.xx.105.65:33166 (size=51 bytes)
                  2006-02-01 14:00:34,868 DEBUG [org.jgroups.protocols.UDP] message is [dst: 228.8.8.8:7600, src: xx.xx.105.65:33165 (2 headers), size = 0 bytes], headers are {UDP=[UDP:channel_name=dictionary], PING=[PING: type=GET_MBRS_REQ, arg=null]}
                  2006-02-01 14:00:34,868 WARN [org.jgroups.protocols.UDP] discarded message from different group (dictionary). Sender was xx.xx.105.65:33165
                  2006-02-01 14:00:34,868 DEBUG [org.jgroups.protocols.UDP] received (mcast) 51 bytes from /xx.xx.105.65:33166 (size=51 bytes)
                  2006-02-01 14:00:34,868 DEBUG [org.jgroups.protocols.UDP] message is [dst: 228.8.8.8:7600, src: xx.xx.105.65:33165 (2 headers), size = 0 bytes], headers are {UDP=[UDP:channel_name=dictionary], PING=[PING: type=GET_MBRS_REQ, arg=null]}
                  2006-02-01 14:00:34,869 WARN [org.jgroups.protocols.UDP] discarded message from different group (dictionary). Sender was xx.xx.105.65:33165
                  2006-02-01 14:00:34,869 DEBUG [org.jgroups.protocols.PING] received GET_MBRS_REQ from xx.xx.105.65:33165, sending response [PING: type=GET_MBRS_RSP, arg=[own_addr=xx.xx.105.67:33023, coord_addr=xx.xx.105.65:33165, is_server=true]]
                  2006-02-01 14:00:34,869 DEBUG [org.jgroups.protocols.UDP] sending msg to xx.xx.105.65:33165 (src=xx.xx.105.67:33023), headers are {PING=[PING: type=GET_MBRS_RSP, arg=[own_addr=xx.xx.105.67:33023, coord_addr=xx.xx.105.65:33165, is_server=true]], UDP=[UDP:channel_name=dictionary]}