15 Replies  Latest reply on Apr 15, 2008 3:21 AM by belaban

    Shared Transport in JGroups

    brian.stansberry

      There have been some threads on jbosscache-dev about the JGroups shared transport that some of you have been copied on. FYI, I've put some of the key bits in a forum post:

      http://www.jboss.com/index.html?module=bb&op=viewtopic&t=132159

      Question for you folks: what JGroups release do you plan to ship with JBM 1.4.1? Still 2.4.x, or are you moving to 2.6.2? I know 2.6.2 needs to work for AS 5, but I didn't know what the standalone JBM plan was.

        • 1. Re: Shared Transport in JGroups
          timfox

          We need to run against the same version that AS 5 uses (2.6.2).

          Clebert is currently testing against that, I'll let him comment on his findings so far.

          • 2. Re: Shared Transport in JGroups
            clebert.suconic

            I have been testing JGroups 2.6.2 and it is working fine without any problems.


            However, we need to update the configs in the JBoss 5 multiplexer before the JBoss 5 release. As far as I can see, it currently doesn't have UNICAST.


            I'm a little confused about what will replace the multiplexer. Whatever it is, we need the stack we currently use to be available in JBoss AS 5.

            • 3. Re: Shared Transport in JGroups
              brian.stansberry

              [OT] Wow, Firefox crashed about 3/4 of the way through this long response. Restarted it, and when it restored the tab it resurrected my in-progress response. :-)

              "clebert.suconic@jboss.com" wrote:
              I have been testing JGroups 2.6.2 and it is working fine without any problems.


              Great. :-)

              I'm a little confused about what will replace the multiplexer.


              Basically, JGroups maintains a static Map&lt;String, TP&gt; (where TP is the base class for all transport protocols, e.g. UDP and TCP). If you add a singleton_name="xxx" attribute to your UDP or TCP protocol config, JGroups sees that and checks the map for "xxx" before instantiating a new instance of the protocol; if an existing one is registered, it reuses it.

              So, say the JBC cache used for session replication needs a channel, and it's configured to use the "udp" stack. It does this:

              Channel ch = channelFactory.createMultiplexerChannel("udp", .......);
              ch.connect("Tomcat-DefaultPartition");

              If JBM also wanted to use the "udp" config, your code could do the same:

              Channel ch = channelFactory.createMultiplexerChannel("udp", .......);
              ch.connect("JBM-CTRL");

              In the AS, each service will have its own channel. The only thing shared would be the UDP protocol at the bottom of the protocol stack. That's because the config of the UDP protocol in the "udp" stack includes singleton_name="udp".
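
              For illustration, the relevant bit of the "udp" stack definition looks something like this (attribute values trimmed; the singleton_name attribute is what triggers the sharing):

              <stack name="udp" description="Default UDP stack">
               <config>
                <UDP
                 singleton_name="udp"
                 mcast_addr="${jgroups.udp.mcast_addr:228.11.11.11}"
                 mcast_port="${jgroups.udp.mcast_port:45688}"
                 ...
                />
                <!-- rest of the protocol stack -->
               </config>
              </stack>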

              Let's make it more interesting. Say JBM doesn't want to use the "udp" stack; you want a customized stack, say without STATE_TRANSFER. So you'd add a "jbm-ctrl" stack to the -stacks.xml file, and instead of the above you'd do this:

              Channel ch = channelFactory.createMultiplexerChannel("jbm-ctrl", .......);
              ch.connect("JBM-CTRL");

              Now you have two different channels with separate UDP protocols. OK, but maybe not ideal: admins now have to work harder to isolate AS clusters from each other, because there are multiple multicast sockets being created.

              But if you guys and I agreed that the same configuration *of just the UDP protocol* was fine for both the "udp" and "jbm-xxx" stacks, then we could specify the same UDP config in all of them and change the value of the singleton name, for example to singleton_name="shared-udp".

              Now JBC and JBM once again share a UDP protocol.
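
              In stack-file terms (a sketch, not the actual file), both stacks would then carry an identically configured UDP protocol with the same singleton_name:

              <stack name="udp" ...>
               <config>
                <UDP singleton_name="shared-udp" mcast_addr="..." mcast_port="..." ... />
                <!-- AS protocols above the transport -->
               </config>
              </stack>

              <stack name="jbm-ctrl" ...>
               <config>
                <UDP singleton_name="shared-udp" mcast_addr="..." mcast_port="..." ... />
                <!-- JBM protocols above the transport -->
               </config>
              </stack>

              With the same singleton_name, JGroups creates the UDP protocol once, and both channels send and receive through that single instance (one multicast socket).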

              Whatever it is, we need the stack we currently use to be available in JBoss AS 5.


              Please send me whatever you want to use and I'll make sure it's integrated.

              Let's continue this dialogue re: shared transport to see if we can agree on a common config for a shared UDP protocol (or not).

              • 4. Re: Shared Transport in JGroups
                brian.stansberry

                A reason I asked whether you guys planned to use 2.6.2 as your default in your standalone version is the weirdness described in my second post at http://www.jboss.com/index.html?module=bb&op=viewtopic&t=132159.

                The weirdness where the AS impl of ChannelFactory will return a shared transport JChannel from createMultiplexerChannel is an AS-only behavior. The standard impl of ChannelFactory that ships with JGroups will return a MuxChannel.

                • 5. Re: Shared Transport in JGroups
                  clebert.suconic

                  We plan to use JGroups 2.6.2 on Branch_Stable.

                  JBossMessaging currently requires an application server...
                  The only standalone usage at the moment is our testsuite.


                  This is the UDP stack we have successfully tested:

                  <config>
                   <UDP
                   mcast_addr="${jboss.messaging.controlchanneludpaddress,jboss.partition.udpGroup:228.7.7.7}"
                   mcast_port="${jboss.messaging.controlchanneludpport:45568}"
                   tos="8"
                   ucast_recv_buf_size="20000000"
                   ucast_send_buf_size="640000"
                   mcast_recv_buf_size="25000000"
                   mcast_send_buf_size="640000"
                   loopback="false"
                   discard_incompatible_packets="true"
                   max_bundle_size="64000"
                   max_bundle_timeout="30"
                   use_incoming_packet_handler="true"
                   ip_ttl="${jboss.messaging.ipttl:8}"
                   enable_bundling="false"
                   enable_diagnostics="true"
                   thread_naming_pattern="cl"
                  
                   use_concurrent_stack="true"
                  
                   thread_pool.enabled="true"
                   thread_pool.min_threads="1"
                   thread_pool.max_threads="200"
                   thread_pool.keep_alive_time="5000"
                   thread_pool.queue_enabled="true"
                   thread_pool.queue_max_size="1000"
                   thread_pool.rejection_policy="Run"
                  
                   oob_thread_pool.enabled="true"
                   oob_thread_pool.min_threads="1"
                   oob_thread_pool.max_threads="8"
                   oob_thread_pool.keep_alive_time="5000"
                   oob_thread_pool.queue_enabled="false"
                   oob_thread_pool.queue_max_size="100"
                   oob_thread_pool.rejection_policy="Run"/>
                   <PING timeout="2000"
                   num_initial_members="3"/>
                   <MERGE2 max_interval="100000"
                   min_interval="20000"/>
                   <FD_SOCK />
                   <FD timeout="10000" max_tries="5" shun="true"/>
                   <VERIFY_SUSPECT timeout="1500" />
                   <BARRIER />
                   <pbcast.NAKACK use_stats_for_retransmission="false"
                   exponential_backoff="150"
                   use_mcast_xmit="true" gc_lag="0"
                   retransmit_timeout="50,300,600,1200"
                   discard_delivered_msgs="true"/>
                   <UNICAST timeout="300,600,1200,2400,3600"/>
                   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="400000"/>
                   <VIEW_SYNC avg_send_interval="10000"/>
                   <pbcast.GMS print_local_addr="true" join_timeout="3000"
                   shun="false"
                   view_bundling="true"/>
                   <FC max_credits="500000"
                   min_threshold="0.20"/>
                   <FRAG2 frag_size="60000" />
                   <pbcast.STATE_TRANSFER/>
                   <pbcast.FLUSH timeout="20000"/>
                   </config>
                  


                  And this is our TCP stack:

                  <config>
                   <TCP start_port="7900"
                   loopback="true"
                   recv_buf_size="20000000"
                   send_buf_size="640000"
                   discard_incompatible_packets="true"
                   max_bundle_size="64000"
                   max_bundle_timeout="30"
                   use_incoming_packet_handler="true"
                   enable_bundling="false"
                   use_send_queues="false"
                   sock_conn_timeout="300"
                   skip_suspected_members="true"
                   use_concurrent_stack="true"
                   thread_pool.enabled="true"
                   thread_pool.min_threads="1"
                   thread_pool.max_threads="200"
                   thread_pool.keep_alive_time="5000"
                   thread_pool.queue_enabled="true"
                   thread_pool.queue_max_size="500"
                   thread_pool.rejection_policy="run"
                   oob_thread_pool.enabled="true"
                   oob_thread_pool.min_threads="1"
                   oob_thread_pool.max_threads="100"
                   oob_thread_pool.keep_alive_time="5000"
                   oob_thread_pool.queue_enabled="false"
                   oob_thread_pool.queue_max_size="100"
                   oob_thread_pool.rejection_policy="run"/>
                   <MPING timeout="5000"
                   mcast_addr="${jboss.messaging.datachanneludpaddress,jboss.partition.udpGroup:228.6.6.6}"
                   mcast_port="${jboss.messaging.datachanneludpport:45567}"
                   ip_ttl="${jboss.messaging.ipttl:8}"
                   num_initial_members="5"
                   num_ping_requests="3"/>
                   <MERGE2 max_interval="100000" min_interval="20000"/>
                   <FD_SOCK/>
                   <VERIFY_SUSPECT timeout="1500"/>
                   <BARRIER/>
                   <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                   retransmit_timeout="300,600,1200,2400,4800"
                   discard_delivered_msgs="true"/>
                   <UNICAST timeout="300,600,1200,2400,3600"/>
                   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="400000"/>
                   <VIEW_SYNC avg_send_interval="10000"/>
                  
                   <pbcast.GMS print_local_addr="true" join_timeout="3000"
                   shun="false" view_bundling="true"/>
                   </config>
                  



                  We really need UNICAST on the TCP stack, plus pbcast.STATE_TRANSFER and pbcast.FLUSH. As far as I remember, these are not in the MUX channel configs in JBoss 5.

                  • 6. Re: Shared Transport in JGroups
                    brian.stansberry

                    Thanks. These have been added in revision 71310 of AS trunk. The JIRA issue is http://jira.jboss.com/jira/browse/JBAS-5340 . As we discussed, they are named "jbm-control" and "jbm-data". Feel free to edit them going forward. FYI, I'll probably rename this file to jboss-jgroups-stacks.xml in the next day or so.

                    Let's keep using this thread to discuss the transport protocol configs and see if we can agree on common configs for the UDP and TCP protocols that can be shared between the "udp" stack and "jbm-control" and/or between the "tcp" stack and "jbm-data". Tracking JIRA: http://jira.jboss.com/jira/browse/JBAS-5341

                    Also, to control the multicast TTL you guys are using ${jboss.messaging.ipttl:8}. The other stacks all use ${jgroups.udp.ip_ttl:2}. Is using your own property with a value of 8 important to you? It would be nice to use a single property; in most cases I'd think an admin would want the same value for all usages.

                    • 7. Re: Shared Transport in JGroups
                      brian.stansberry

                      FYI, I ended up calling the -stacks.xml file "jgroups-channelfactory-stacks.xml".

                      • 8. Re: Shared Transport in JGroups
                        clebert.suconic

                        I have removed all the JGroups configs from XXX-persistence.xml files.

                        From now on, they will always refer to the ChannelFactory.

                        With this model, we will be able to put into production the same config files we test in our testsuite (as we already do for EAP 4.3).

                        If anyone has any objection I can revert the change.

                        (Speak now or forever hold your peace :-) )

                        • 9. Re: Shared Transport in JGroups
                          brian.stansberry

                          As mentioned above, I'd like to see if we can come to an agreement on a couple of common transport protocol configs, so that in AS 5 we can use the same singleton_name values for the stacks JBM uses and the stacks other AS services use. The benefit is that we simplify our users' lives quite a bit by opening fewer sockets. Managing multiple multicast sockets is a real problem for users when it comes to maintaining cluster isolation; the fewer, the better.

                          If users want to optimize away from the defaults, that's easy to do; just change the singleton_name to something different.

                          So, I'd like to see if as a default we can use a common UDP protocol config for the "jbm-control" stack and the general "udp" stack. All other AS services by default use the "udp" stack.

                          Might as well discuss whether we want a common TCP protocol between the "jbm-data" stack and the general "tcp" stack. That's not a particular priority for me; just something to think about.

                          I've compared the UDP protocol configs between the "jbm-control" stack and the "udp" stack. Very similar. Here are the differences:

                          mcast_addr:

                          "udp" = ${jgroups.udp.mcast_addr:228.11.11.11}
                          "jbm-control" = ${jboss.messaging.controlchanneludpaddress,jboss.partition.udpGroup:228.7.7.7}

                          Basically, JBM has added its own system property, defaulting to the regular AS one if JBM's isn't set. How important is using that property? Could it go in a commented-out UDP config in the jbm-control stack with a different singleton name, e.g.:

                          <stack name="jbm-control"
                           description="Stack optimized for the JBoss Messaging Control Channel">
                           <config>
                           <!-- Shared transport protocol used with other AS services -->
                           <UDP
                           singleton_name="udp"
                           mcast_addr="${jgroups.udp.mcast_addr:228.11.11.11}"
                           mcast_port="${jgroups.udp.mcast_port:45688}"
                           ...
                           />
                           <!-- Uncomment and comment out above if you don't want
                           to share a transport protocol with other AS services -->
                           <!--
                           <UDP
                           singleton_name="jbm-control"
                          mcast_addr="${jboss.messaging.controlchanneludpaddress,jboss.partition.udpGroup:228.7.7.7}"
                           mcast_port="${jboss.messaging.controlchanneludpport:45568}"
                           ...
                           />
                           -->


                          mcast_port:

                          "udp" = ${jgroups.udp.mcast_port:45688}
                          "jbm-control" = ${jboss.messaging.controlchanneludpport:45568}

                          See mcast_addr discussion above. No matter what, I just noticed that "udp" and "jbm-control" are using the same port; that needs to change if we don't use the same singleton_name. I'll change the "udp" one in a minute.

                          loopback:

                          "udp" = true
                          "jbm-control" = false.

                          We found that FLUSH behaves badly if loopback=false and the interface the channel is using doesn't properly support multicast. So we changed the AS to loopback=true. Either way, the channel doesn't work correctly, but with loopback=true nodes just don't see each other, clusters don't form, and people need to debug the problem using the techniques discussed for years on our wiki/JGroups docs. With loopback=false, you get weird cryptic errors from FLUSH. See http://lists.jboss.org/pipermail/jboss-development/2008-March/011595.html .

                          ip_ttl:

                          "udp" = ${jgroups.udp.ip_ttl:2}
                          "jbm-controll" = ${jboss.messaging.ipttl:8}

                          Different system property, different value. I can't see any reason why we should use a different system property by default. I prefer the "2" value (limit mcast propagation) but if there is a reason for "8" I'd happily switch to it if it lets us have a shared config. :)

                          enable_bundling:

                          "udp" = true
                          "jbm-control" = false

                          Need Bela's input here. I imagine JBM is concerned about latency, which is why they chose "false". I need to perf test http session replication with bundling on and off and see the difference. If it's not huge and JBM really needs 'false', I'm personally comfortable with 'false' as a default.

                          thread_pool.max_threads:

                          "udp" = 25
                          "jbm-control" = 200

                          The "udp" value is too low. I'd be happy to use the JBM value.

                          (BTW, both "udp" and "jbm-control" have thread_pool.min_threads="1" and thread_pool.keep_alive_time="5000").

                          thread_pool.queue_enabled and thread_pool.queue_max_size:

                          "udp" = false and 100
                          "jbm-control" = true and 1000

                          Need Bela's input here. The "udp" stack values came long ago from a JGroups stacks.xml. I'd imagine the JBM values would be more performant.


                          Looking at that list, I don't see any show stoppers. Comparing the TCP protocol config in "tcp" and "jbm-data" shows basically the same set of differences.

                          Comments?

                          • 10. Re: Shared Transport in JGroups
                            brian.stansberry

                             

                            "bstansberry@jboss.com" wrote:
                            No matter what, I just noticed that "udp" and "jbm-control" are using the same port; that needs to change if we don't use the same singleton_name. I'll change the "udp" one in a minute.


                            I'm getting blinder and blinder. 45688 != 45568

                            • 11. Re: Shared Transport in JGroups
                              clebert.suconic

                              Brian...

                               We just need a compatible channel. I requested a separate channel because I believed our needs were different, but if our needs can be met under another channel name, we just need to test and validate that (e.g. by changing our testsuite to run against it).

                               If we can have such an equivalent channel in a standard way, that's fine.

                              • 12. Re: Shared Transport in JGroups
                                brian.stansberry

                                OK, good. That knocks off the trivial stuff. I'd still like to get Bela's input on the thread_pool.queue_enabled and thread_pool.queue_max_size, where I think the values you guys had are better.

                                That leaves enable_bundling, where I want to understand your usage so I can pick the best default.

                                How data intensive and latency sensitive is the jbm-control channel? And is it used for request/response type messages (i.e. thread on node A sends a message, blocks waiting for a response from the other nodes)? Or just for async (thread on A sends and returns; doesn't block for response)?

                                 The enable_bundling value impacts that. When it is true, there can be a delay of up to 30 ms before a message is sent. That adds latency to when the message arrives, which may or may not matter to JBM. The delay doesn't cause the thread that told JGroups to send the message to block -- unless that thread is blocking waiting for a response. If it is, it might wait 30 ms for the message to be sent, plus another 30 ms for the response to be sent.

                                 The offset to that increased latency is higher throughput under heavy load with enable_bundling=true. The "udp" stack will be used for session replication, which *might* involve heavy load, so we've optimized for that. IMHO that was a marginal decision though, so I want to understand JBM's usage.
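
                                 For reference, these are the transport attributes involved (the max_bundle_* values are from the configs posted earlier in this thread; enable_bundling is the one in question):

                                 <!-- enable_bundling: queue outgoing messages and send them in batches.
                                      max_bundle_size: flush the queue once this many bytes have accumulated.
                                      max_bundle_timeout: ...or after this many ms, whichever comes first. -->
                                 <UDP
                                  enable_bundling="true"
                                  max_bundle_size="64000"
                                  max_bundle_timeout="30"
                                  ...
                                 />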

                                • 13. Re: Shared Transport in JGroups
                                  clebert.suconic

                                   

                                  I'd still like to get Bela's input on the thread_pool.queue_enabled and thread_pool.queue_max_size, where I think the values you guys had are better.



                                   During our latest testsuite runs for the JGroups 2.6 upgrade, I did many runs; at some point I was getting failures, so I increased those pools. At the same time I increased those pools I also changed some other stuff... so I believe those numbers are better, but I don't have scientific proof for them. You or Bela might suggest better numbers if you like.

                                   Regarding how our code uses JGroups (or "behaviour", as Tim Fox would prefer :-p): I will look at the code and give you some examples tomorrow.

                                  • 14. Re: Shared Transport in JGroups
                                    belaban

                                     

                                    "bstansberry@jboss.com" wrote:


                                    enable_bundling:

                                    "udp" = true
                                    "jbm-control" = false

                                    Need Bela's input here. I imagine JBM is concerned about latency, which is why they chose "false". I need to perf test http session replication with bundling on and off and see the difference. If it's not huge and JBM really needs 'false', I'm personally comfortable with 'false' as a default.



                                     Enabling bundling is a huge speedup; this is similar to Nagling in TCP. However, if you invoke RPCs, then you should disable it. See http://wiki.jboss.org/wiki/ProblemAndClusterRPCs for details.
                                     Note that you can enable bundling for multicasts but disable it for unicasts, so RPCs are bundled but individual responses are sent right away. The option is enable_unicast_bundling.
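
                                     For example, the transport config would then have something like:

                                     <!-- bundle multicasts (e.g. RPC requests), but send unicast responses immediately -->
                                     <UDP
                                      enable_bundling="true"
                                      enable_unicast_bundling="false"
                                      ...
                                     />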



                                    thread_pool.queue_enabled and thread_pool.queue_max_size:

                                    "udp" = false and 100
                                    "jbm-control" = true and 1000

                                    Need Bela's input here. The "udp" stack values came long ago from a JGroups stacks.xml. I'd imagine the JBM values would be more performant.



                                     In JGroups, I have the queue enabled in the default pool and disabled in the OOB pool: OOB messages should always get delivered and not be placed into a queue.
                                     Note that if you need back pressure from a receiver to a sender, a queue is good: when it's full and all (max) threads are busy, the receiver won't be able to send any credits back to the sender, so the sender blocks until the receiver sends credits.

                                    I would recommend a queue with a size that depends on the aggregated message rate of all clusters sharing the transport.

                                     For TCP, I would recommend a rejection_policy of "discard", as TCP can block on the write and we've seen hangs with "run" instead of "discard". I would still use a queue here, though.
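
                                     Concretely, the TCP transport's pools would then look something like this (the queue size is a placeholder to be tuned to the aggregate message rate of the clusters sharing the transport):

                                     <TCP
                                      ...
                                      thread_pool.queue_enabled="true"
                                      thread_pool.queue_max_size="1000"
                                      thread_pool.rejection_policy="discard"
                                      oob_thread_pool.queue_enabled="false"
                                      oob_thread_pool.rejection_policy="discard"
                                      ...
                                     />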

