10 Replies Latest reply on Oct 19, 2009 4:32 PM by brian.stansberry

    JGroups timing call anomalies

    axelerator

      Hey,

      we have JBoss 4.2.3.GA with JGroups 2.6.10.merge in development. We have a component that is deployed on all cluster nodes and can be called both externally and from within the server. If that component is called on a slave node, it redirects the call to the master (callMethodOnCoordinator). If it is called on the master node, it simply continues.

      What we see is that if we call that component from the outside on the master, the response time is consistently around 90ms. If it is called on the slave node, where the call is relayed to the master, the response time is anywhere from 200ms to 10000ms.

      I don't really understand why there is such a huge difference (apart from 10 seconds being ridiculous in itself).

      Here is the cluster service config we use:

      <?xml version="1.0" encoding="UTF-8"?>
      <server>
       <mbean code="org.jboss.ha.framework.server.ClusterPartition"
       name="jboss:service=${jboss.partition.name:DefaultPartition}">
      
       <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
       <attribute name="NodeAddress">${jboss.bind.address}</attribute>
       <attribute name="DeadlockDetection">False</attribute>
       <attribute name="StateTransferTimeout">30000</attribute>
       <attribute name="MethodCallTimeout">300000</attribute>
       <attribute name="PartitionConfig">
       <Config>
       <UDP
       mcast_addr="${jboss.partition.udpGroup:228.1.2.3}"
       mcast_port="${jboss.hapartition.mcast_port:45566}"
       tos="8"
       ucast_recv_buf_size="20000000"
       ucast_send_buf_size="640000"
       mcast_recv_buf_size="25000000"
       mcast_send_buf_size="640000"
       loopback="false"
       discard_incompatible_packets="true"
       use_incoming_packet_handler="true"
       max_bundle_size="60000"
       max_bundle_timeout="30"
       ip_ttl="${jgroups.udp.ip_ttl:8}"
       enable_bundling="false"
       receive_on_all_interfaces="true"
       send_on_all_interfaces="true"
       use_concurrent_stack="true"
       thread_pool.enabled="true"
       thread_pool.min_threads="80"
       thread_pool.max_threads="100"
       thread_ss.ha.framework.interfaces.RoundRobin</attribute>
       </mbean>
      
       <mbean code="org.jboss.invocation.unified.server.UnifiedInvokerHA"
       name="jboss:service=invoker,type=unifiedha">
       <depends>jboss:service=TransactionManager</depends>
       <depends optional-attribute-name="Connector"
       proxy-type="attribute">jboss.remoting:service=Connector,transport=socket</depends>
       <depends>jboss:service=${jboss.partition.name:DefaultPartition}</depends>
       </mbean>
      
       <mbean code="org.jboss.invocation.jrmp.server.JRMPInvokerHA"
       name="jboss:service=invoker,type=jrmpha">
       <attribute name="ServerAddress">${jboss.bind.address}</attribute>
       <attribute name="RMIObjectPort">4447</attribute>
       <depends>jboss:service=Naming</depends>
       </mbean>
      
       <mbean code="org.jboss.invocation.pooled.server.PooledInvokerHA"
       name="jboss:service=invoker,type=pooledha">
       <attribute name="NumAcceptThreads">1</attribute>
       <attribute name="MaxPoolSize">300</attribute>
       <attribute name="ClientMaxPoolSize">300</attribute>
       <attribute name="SocketTimeout">60000</attribute>
       <attribute name="ServerBindAddress">${jboss.bind.address}</attribute>
       <attribute name="ServerBindPort">4448</attribute>
       <attribute name="ClientConnectAddress">${jboss.bind.address}</attribute>
       <attribute name="ClientConnectPort">0</attribute>
       <attribute name="EnableTcpNoDelay">false</attribute>
       <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager</depends>
       <depends>jboss:service=Naming</depends>
       </mbean>
      
       <mbean code="org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge"
       name="jboss.cache:service=InvalidationBridge,type=JavaGroups">
       <depends optional-attribute-name="ClusterPartition"
       proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
       <depends>jboss.cache:service=InvalidationManager</depends>
       <attribute name="InvalidationManager">jboss.cache:service=InvalidationManager</attribute>
       <attribute name="BridgeName">DefaultJGBridge</attribute>
       </mbean>
      </server>
      

      Any idea how this can happen? Something with the slave-master communication seems to go wrong.

      We tested the following configurations:
      Master only -> everything fine
      Master+Slave, master queried -> everything fine
      Master+Slave, slave queried -> delays and timeouts
      Master+Slave, both queried -> delays and timeouts

      So I conclude that JGroups must be involved here ...

      Any help is greatly appreciated !!!


      Best regards and cheers,

      Axelerator


        • 1. Re: JGroups timing call anomalies
          brian.stansberry

          Something got messed up in your paste of cluster-service.xml.

          Also, I thought you'd said on IRC that you were using a TCP protocol stack?

          • 2. Re: JGroups timing call anomalies
            axelerator

            Hey,

            Yeah, sorry for the mess-up. The correct version is below. You're right, on IRC I told you I have these issues with TCP. However, the environment I copied this config from uses UDP. After searching the web for possible causes, I found an article stating that UDP does not scale that well in gigabit environments, so I switched over to TCP as a first step.
            From there on, I tuned the thread_pool settings but still found no real clue.

            <?xml version="1.0" encoding="UTF-8"?>
            <server>
             <mbean code="org.jboss.ha.framework.server.ClusterPartition"
             name="jboss:service=${jboss.partition.name:DefaultPartition}">
             <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
             <attribute name="NodeAddress">${jboss.bind.address}</attribute>
             <attribute name="DeadlockDetection">False</attribute>
             <attribute name="StateTransferTimeout">30000</attribute>
             <attribute name="MethodCallTimeout">300000</attribute>
             <attribute name="PartitionConfig">
             <Config>
             <UDP
             mcast_addr="${jboss.partition.udpGroup:228.1.2.3}"
             mcast_port="${jboss.hapartition.mcast_port:45566}"
             tos="8"
             ucast_recv_buf_size="20000000"
             ucast_send_buf_size="640000"
             mcast_recv_buf_size="25000000"
             mcast_send_buf_size="640000"
             loopback="false"
             discard_incompatible_packets="true"
             use_incoming_packet_handler="true"
             max_bundle_size="60000"
             max_bundle_timeout="30"
             ip_ttl="${jgroups.udp.ip_ttl:8}"
             enable_bundling="false"
             receive_on_all_interfaces="true"
             send_on_all_interfaces="true"
             use_concurrent_stack="true"
             thread_pool.enabled="true"
             thread_pool.min_threads="80"
             thread_pool.max_threads="100"
             thread_pool.keep_alive_time="5000"
             thread_pool.queue_enabled="true"
             thread_pool.queue_max_size="10"
             thread_pool.rejection_policy="run"
             oob_thread_pool.enabled="true"
             oob_thread_pool.min_threads="1"
             oob_thread_pool.max_threads="32"
             oob_thread_pool.keep_alive_time="5000"
             oob_thread_pool.queue_enabled="false"
             oob_thread_pool.queue_max_size="100"
             oob_thread_pool.rejection_policy="run"/>
             <PING timeout="5000" num_initial_members="2"/>
             <MERGE2 max_interval="100000" min_interval="20000"/>
             <FD_SOCK/>
             <FD timeout="10000" max_tries="30" shun="true"/>
             <VERIFY_SUSPECT timeout="10000" num_msgs="3"/>
             <pbcast.NAKACK max_xmit_size="60000"
             use_mcast_xmit="false" gc_lag="0"
             retransmit_timeout="300,600,1200,2400,4800"
             discard_delivered_msgs="true"/>
             <UNICAST timeout="300,600,1200,2400,3600"/>
             <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
             max_bytes="400000"/>
             <pbcast.GMS print_local_addr="true" join_timeout="3000"
             join_retry_timeout="2000" shun="true"
             view_bundling="true"
             view_ack_collection_timeout="5000"/>
             <FRAG2 frag_size="60000"/>
             <pbcast.STATE_TRANSFER/>
             <pbcast.FLUSH timeout="0"/>
             </Config>
             </attribute>
             <depends>jboss:service=Naming</depends>
             </mbean>
            
             <mbean code="org.jboss.ha.hasessionstate.server.HASessionStateService"
             name="jboss:service=HASessionState">
             <depends>jboss:service=Naming</depends>
             <depends optional-attribute-name="ClusterPartition"
             proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
             <attribute name="JndiName">/HASessionState/Default</attribute>
             <attribute name="BeanCleaningDelay">0</attribute>
             </mbean>
            
             <mbean code="org.jboss.ha.jndi.HANamingService"
             name="jboss:service=HAJNDI">
             <depends optional-attribute-name="ClusterPartition"
             proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
             <attribute name="BindAddress">${jboss.bind.address}</attribute>
             <attribute name="Port">1100</attribute>
             <attribute name="RmiPort">1101</attribute>
             <attribute name="Backlog">50</attribute>
             <depends optional-attribute-name="LookupPool"
             proxy-type="attribute">jboss.system:service=ThreadPool</depends>
            
             <attribute name="DiscoveryDisabled">false</attribute>
             <attribute name="AutoDiscoveryBindAddress">${jboss.bind.address}</attribute>
             <attribute name="AutoDiscoveryAddress">${jboss.partition.udpGroup:230.0.0.4}</attribute>
             <attribute name="AutoDiscoveryGroup">1102</attribute>
             <attribute name="AutoDiscoveryTTL">16</attribute>
             <attribute name="LoadBalancePolicy">org.jboss.ha.framework.interfaces.RoundRobin</attribute>
            
             </mbean>
            
             <mbean code="org.jboss.invocation.unified.server.UnifiedInvokerHA"
             name="jboss:service=invoker,type=unifiedha">
             <depends>jboss:service=TransactionManager</depends>
             <depends optional-attribute-name="Connector"
             proxy-type="attribute">jboss.remoting:service=Connector,transport=socket</depends>
             <depends>jboss:service=${jboss.partition.name:DefaultPartition}</depends>
             </mbean>
            
             <mbean code="org.jboss.invocation.jrmp.server.JRMPInvokerHA"
             name="jboss:service=invoker,type=jrmpha">
             <attribute name="ServerAddress">${jboss.bind.address}</attribute>
             <attribute name="RMIObjectPort">4447</attribute>
             <depends>jboss:service=Naming</depends>
             </mbean>
            
             <mbean code="org.jboss.invocation.pooled.server.PooledInvokerHA"
             name="jboss:service=invoker,type=pooledha">
             <attribute name="NumAcceptThreads">1</attribute>
             <attribute name="MaxPoolSize">300</attribute>
             <attribute name="ClientMaxPoolSize">300</attribute>
             <attribute name="SocketTimeout">60000</attribute>
             <attribute name="ServerBindAddress">${jboss.bind.address}</attribute>
             <attribute name="ServerBindPort">4448</attribute>
             <attribute name="ClientConnectAddress">${jboss.bind.address}</attribute>
             <attribute name="ClientConnectPort">0</attribute>
             <attribute name="EnableTcpNoDelay">false</attribute>
             <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager</depends>
             <depends>jboss:service=Naming</depends>
             </mbean>
            
            
             <mbean code="org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge"
             name="jboss.cache:service=InvalidationBridge,type=JavaGroups">
             <depends optional-attribute-name="ClusterPartition"
             proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
             <depends>jboss.cache:service=InvalidationManager</depends>
             <attribute name="InvalidationManager">jboss.cache:service=InvalidationManager</attribute>
             <attribute name="BridgeName">DefaultJGBridge</attribute>
             </mbean>
            
            </server>
            


            • 3. Re: JGroups timing call anomalies
              brian.stansberry

              I don't recommend UDP.send_on_all_interfaces=true. UDP.receive_on_all_interfaces=true is a bit suspect too, although not as much as send_on_all_interfaces. It's much better IMO to pick a common network and bind all nodes to an interface on that network using -b xxx or -Djgroups.bind_addr=xxx. Reserve receive/send_on_all_interfaces for situations where you really have no choice.
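Before choosing the single address to pass via -b or -Djgroups.bind_addr, it can help to list what the host actually offers. This is a minimal sketch using only java.net; the class name is hypothetical, and it cannot know which network the other cluster nodes share, so picking the right address remains a manual decision.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Lists candidate bind addresses: non-loopback IPv4 addresses on interfaces that are up. */
public class BindAddressCandidates {

    static List<InetAddress> candidates() throws SocketException {
        List<InetAddress> result = new ArrayList<>();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs == null) {
            return result; // older JDKs may return null when no interfaces are found
        }
        for (NetworkInterface nif : Collections.list(nifs)) {
            if (!nif.isUp() || nif.isLoopback()) {
                continue; // skip interfaces that are down or loopback-only
            }
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                    result.add(addr);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        // One candidate per line; the one on the network shared by all
        // cluster nodes is what you would pass to -b or -Djgroups.bind_addr.
        for (InetAddress addr : candidates()) {
            System.out.println(addr.getHostAddress());
        }
    }
}
```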

              • 4. Re: JGroups timing call anomalies
                belaban

                Also note that 2.6.10.merge is experimental and unsupported! I recommend 2.6.13.GA instead, which has been heavily tested.

                • 5. Re: JGroups timing call anomalies
                  axelerator

                  Hey guys,

                  Thanks for pointing this out. We enabled send_on_all_interfaces because we're running in a Solaris Container and weren't sure which interface would ultimately be used for sending/receiving data.

                  Also, thanks for the note about the 2.6.10.merge release. Is 2.6.13.GA compatible with AS 4.2.3.GA?

                  Best regards,

                  Axelerator

                  • 6. Re: JGroups timing call anomalies
                    brian.stansberry

                    Re: 2.6.13 and AS 4.2.3, I personally haven't tried any 2.6 series release with AS 4.x. So I'll defer to the rest of the community for detailed input.

                    The 2.6 series is API compatible with the 2.4 series used in AS 4.2.x, and the fundamental semantics of Channel behavior are the same, so it should work fine. There are some pretty major configuration differences, but you've already seen those with your 2.6.10.merge efforts.

                    I wouldn't expect problems moving from 2.6.10.merge to 2.6.13.

                    • 7. Re: JGroups timing call anomalies
                      axelerator

                      Hi Brian,

                      Thanks, that sounds promising. I also tested the setup, and the AS started without any hitches and behaved pretty much normally.
                      What I'm still a bit curious about is the performance and timing of synchronous RPC calls (callMethodOnCoordinator).
                      Since this is the only method we use here, I'm a bit worried that it's not the "preferred" way to interact. Can you confirm that?

                      I ask because, as far as I know, callMethodOnCoordinator uses JGroups to determine the master server. Since we only see these delays here (especially when many calls request the same object), and we see a lot of threads in collectResponse, I'm quite curious.

                      Also, does the JIRA ticket about using an AtomicBoolean instead of a ReentrantLock come into play when there are many identical calls?

                      Best regards,

                      Axelerator

                      • 8. Re: JGroups timing call anomalies
                        brian.stansberry

                         

                        "axelerator" wrote:
                        Hi Brian,

                        Thanks, that sounds promising. I also tested the setup, and the AS started without any hitches and behaved pretty much normally.
                        What I'm still a bit curious about is the performance and timing of synchronous RPC calls (callMethodOnCoordinator).
                        Since this is the only method we use here, I'm a bit worried that it's not the "preferred" way to interact. Can you confirm that?


                        I wouldn't say it's not preferred. Historically a lot of JGroups use cases have been point-to-multipoint (e.g. callMethodOnCluster if done via HAPartition), but there is quite a bit of point-to-point usage as well (e.g. JBoss Cache buddy replication). You use what meets the requirements of your application.


                        I ask because, as far as I know, callMethodOnCoordinator uses JGroups to determine the master server. Since we only see these delays here (especially when many calls request the same object), and we see a lot of threads in collectResponse, I'm quite curious.


                        Figuring out which node is the coordinator is quite lightweight.

                        Do you have large numbers of concurrent calls to the same service from each node? Those are going to be executed serially, since JGroups will only allow one RPC at a time from a given sender to execute in the application.


                        Also, does the JIRA ticket about using an AtomicBoolean instead of a ReentrantLock come into play when there are many identical calls?


                        Please provide a link to the JIRA you are talking about.

                        • 9. Re: JGroups timing call anomalies
                          axelerator

                          Hi Brian,

                          Okay, that wasn't the most intelligent post ever. So, let's clear things up.
                          I have a service that is active on all nodes in the cluster. However, if a method of this service is invoked on a node that is not the master, the call is relayed to the master node (callMethodOnCoordinator).

                          The master-node service is not called heavily, but it is called concurrently, 1-2 times per second (to do some mutually exclusive work).

                          Therefore, if I understand you correctly, the approach I use isn't the smartest way to do this, since I want multi-threaded access to this service; the service itself handles semaphore access internally.
                          Does this mean that, with JGroups/JBoss Clustering, I can only call the same service on each node sequentially, not from multiple threads in parallel? If so, what would you suggest for handling this? I ask because I expected these remote calls to be handled in parallel.

                          The JIRA issue I was referring to is https://jira.jboss.org/jira/browse/JGRP-829 .

                          • 10. Re: JGroups timing call anomalies
                            brian.stansberry

                            JGroups is only going to allow one message at a time from a particular sender to proceed up the stack past UNICAST. This is needed to ensure messages are delivered in order. So, only one RPC at a time.

                            There's a JGroups JIRA to add an API whereby the application can define the scope to which message ordering is applied; e.g. your app could say all messages associated with "LockA" must be ordered, but a message associated with "LockB" wouldn't have to wait for "LockA" messages. But that won't be in JGroups 2.6.
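The difference between ordering per sender (the JGroups 2.6 behavior described here) and ordering per scope (the proposed API) can be modeled with one single-threaded queue per ordering key. This is a hypothetical toy, not JGroups code: `OrderedDispatcher`, `deliver` and the key functions are illustrative names, and a `LockA` task blocked on a latch stands in for a slow RPC.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BiFunction;

/** Runs tasks one at a time, in submission order, per ordering key. */
public class OrderedDispatcher implements AutoCloseable {
    // Same key -> same single-threaded queue (serial, FIFO).
    // Different keys -> different queues, so their tasks may overlap.
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();
    private final BiFunction<String, String, String> keyOf;

    OrderedDispatcher(BiFunction<String, String, String> keyOf) {
        this.keyOf = keyOf;
    }

    Future<?> deliver(String sender, String scope, Runnable rpc) {
        String key = keyOf.apply(sender, scope);
        return queues.computeIfAbsent(key, k -> Executors.newSingleThreadExecutor()).submit(rpc);
    }

    @Override
    public void close() {
        queues.values().forEach(ExecutorService::shutdownNow);
    }

    /** True if a LockB RPC completes while a LockA RPC from the same sender is still blocked. */
    static boolean lockBOvertakesLockA(OrderedDispatcher d) throws Exception {
        CountDownLatch releaseA = new CountDownLatch(1);
        d.deliver("slave", "LockA", () -> {
            try {
                releaseA.await(); // simulate a long-running LockA RPC
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Future<?> b = d.deliver("slave", "LockB", () -> { });
        try {
            b.get(2, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } finally {
            releaseA.countDown(); // let LockA finish either way
        }
    }

    public static void main(String[] args) throws Exception {
        try (OrderedDispatcher bySender = new OrderedDispatcher((sender, scope) -> sender);
             OrderedDispatcher byScope = new OrderedDispatcher((sender, scope) -> sender + "/" + scope)) {
            System.out.println("ordered per sender: " + lockBOvertakesLockA(bySender)); // false
            System.out.println("ordered per scope:  " + lockBOvertakesLockA(byScope));  // true
        }
    }
}
```

With the per-sender key, the `LockB` call waits behind the blocked `LockA` call; with a per-scope key it completes immediately. The trade-off is that the application, rather than the transport, then decides which messages really need mutual ordering.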

                            How long do these RPCs take once the RPC is actually invoked on the coordinator? That's what's going to drive your throughput, since the RPCs execute in series. Ah, you said 90ms, although AIUI that was for a remote client to get a response, so the actual server-side execution time is probably much less.

                            Worst case, let's assume it's 90ms. Your max throughput would be ~11 per second. If you are only executing 1 or 2 per second, serializing the execution shouldn't add more than an extra 90ms delay to any one RPC, and often wouldn't delay it at all.
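The arithmetic above can be written out as a few helper functions. This is a back-of-the-envelope sketch that assumes a fixed execution time per RPC and one serial executor per sender; the 90ms figure is the worst-case assumption from the thread, not a measurement, and the class and method names are hypothetical.

```java
/** Back-of-the-envelope numbers for RPCs that execute strictly one at a time. */
public class SerialRpcBudget {

    /** Upper bound on RPCs per second when each takes execMillis and they run serially. */
    static double maxThroughputPerSecond(double execMillis) {
        return 1000.0 / execMillis;
    }

    /** Fraction of each second the serial executor is busy at a given call rate. */
    static double utilization(double callsPerSecond, double execMillis) {
        return callsPerSecond * execMillis / 1000.0;
    }

    /**
     * Worst-case extra wait for a new RPC: every RPC ahead of it (including one
     * that has only just started) takes a full execution time.
     */
    static double worstCaseExtraDelayMillis(int rpcsAhead, double execMillis) {
        return rpcsAhead * execMillis;
    }

    public static void main(String[] args) {
        double exec = 90.0; // worst-case assumption taken from the thread
        System.out.printf("max throughput: %.1f RPCs/s%n", maxThroughputPerSecond(exec));
        System.out.printf("busy at 2 calls/s: %.0f%%%n", 100 * utilization(2.0, exec));
        System.out.printf("extra delay with 1 RPC ahead: %.0f ms%n", worstCaseExtraDelayMillis(1, exec));
    }
}
```

At 1-2 calls per second the serial executor is busy only about 9-18% of the time, so a new RPC rarely finds another one ahead of it, and when it does, the wait is bounded by one execution time.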

                            The key thing here is the execution time of the RPC. With JBoss AS HTTP session replication, the same basic serialization of RPCs occurs at the JGroups level. Yet we handle hundreds of concurrent requests per node and thousands of requests per second. That's because the execution time of the RPC itself is very low.