11 Replies Latest reply on Jun 5, 2007 2:22 AM by redchili

    Farm Deployment with UDP not working

    redchili

      Hi,

      I have a clustered JBoss environment consisting of several nodes. All nodes are absolutely identical in their configuration and form a cluster "DefaultPartition". Joining and leaving the cluster works fine.
      I've also established the farming deploy service but this farming service only works when I use the TCP configuration in cluster-service.xml.

      My problem looks similar to this one:
      http://www.jboss.com/index.html?module=bb&op=viewtopic&t=90561&postdays=0&postorder=asc&start=0

      However as I'm using 4.2.0CR1 the buffer sizes are already increased and don't seem to be the problem.

      With the UDP protocol, I deploy an application on one node. That node starts the application just fine and then "hangs" while pushing it to the cluster:

      14:43:00,210 INFO [ClusterFileTransfer] Start push of file kusssdemo.ear to cluster.


      It hangs there forever and nothing happens on the other nodes.
      The UDP configuration is as follows:
      <Config>
       <UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}"
       mcast_port="${jboss.hapartition.mcast_port:45566}"
       tos="8"
       ucast_recv_buf_size="2000000"
       ucast_send_buf_size="640000"
       mcast_recv_buf_size="2500000"
       mcast_send_buf_size="640000"
       loopback="false"
       discard_incompatible_packets="true"
       max_bundle_size="64000"
       max_bundle_timeout="30"
       use_incoming_packet_handler="true"
       use_outgoing_packet_handler="false"
       ip_ttl="${jgroups.udp.ip_ttl:2}"
       down_thread="false" up_thread="false"
       enable_bundling="false"/>
       <PING timeout="2000"
       down_thread="false" up_thread="false" num_initial_members="3"/>
       <MERGE2 max_interval="100000"
       down_thread="false" up_thread="false" min_interval="20000"/>
       <FD_SOCK down_thread="false" up_thread="false"/>
       <FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
       <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
       <pbcast.NAKACK max_xmit_size="60000"
       use_mcast_xmit="false" gc_lag="0"
       retransmit_timeout="300,600,1200,2400,4800"
       down_thread="false" up_thread="false"
       discard_delivered_msgs="true"/>
       <UNICAST timeout="300,600,1200,2400,3600"
       down_thread="false" up_thread="false"/>
       <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
       down_thread="false" up_thread="false"
       max_bytes="400000"/>
       <pbcast.GMS print_local_addr="true" join_timeout="3000"
       down_thread="false" up_thread="false"
       join_retry_timeout="2000" shun="true"
       view_bundling="true"/>
       <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
       <pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/>
       </Config>
      


      The working TCP config looks like:
      <Config>
       <TCP bind_addr="${jboss.bind.address}" start_port="7800" loopback="true"
       tcp_nodelay="true"
       recv_buf_size="20000000"
       send_buf_size="640000"
       discard_incompatible_packets="true"
       max_bundle_size="64000"
       max_bundle_timeout="30"
       use_incoming_packet_handler="true"
       use_outgoing_packet_handler="false"
       down_thread="false" up_thread="false"
       enable_bundling="false"
       use_send_queues="false"
       sock_conn_timeout="300"
       skip_suspected_members="true"/>
       <TCPPING initial_hosts="192.168.1.106[7800],192.168.1.105[7800]" port_range="3"
       timeout="3000"
       down_thread="false" up_thread="false"
       num_initial_members="2"/>
       <MERGE2 max_interval="100000"
       down_thread="false" up_thread="false" min_interval="20000"/>
       <FD_SOCK down_thread="false" up_thread="false"/>
       <FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
       <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
       <pbcast.NAKACK max_xmit_size="60000"
       use_mcast_xmit="false" gc_lag="0"
       retransmit_timeout="300,600,1200,2400,4800"
       down_thread="false" up_thread="false"
       discard_delivered_msgs="true"/>
       <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
       down_thread="false" up_thread="false"
       max_bytes="400000"/>
       <pbcast.GMS print_local_addr="true" join_timeout="3000"
       down_thread="false" up_thread="false"
       join_retry_timeout="2000" shun="true"
       view_bundling="true"/>
       <pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/>
       </Config>
      


      In addition to this question: what is the initial_hosts parameter in the TCP config for? Do I have to specify all potential cluster nodes in this list? Or can I use something like {jboss.bind.address}[7800] there?

      Thanks,
      Reinhard

        • 1. Re: Farm Deployment with UDP not working
          redchili

          I forgot: the application EAR file is about 10 MB, and there are no firewalls whatsoever between those nodes.

          • 2. Re: Farm Deployment with UDP not working
            brian.stansberry

            Try increasing ucast_recv_buf_size and mcast_recv_buf_size to more than 10MB.

            • 3. Re: Farm Deployment with UDP not working
              redchili

              I've set the buffer sizes to 20 MB now, which didn't change the behavior. The cluster node on which the application is deployed still shows these as the last lines in the log:

              13:56:53,880 INFO [EARDeployer] Started J2EE application: file:/opt/jboss/jboss-4.2.0.CR1/server/node/farm/kusssdemo.ear
              13:56:53,883 INFO [ClusterFileTransfer] Start push of file kusssdemo.ear to cluster.


              However, is it really necessary to adapt the size of the buffer to the size of the deployed application archive?
              So if I have a 30 MB EAR file, does the buffer have to be 30 MB? This seems a bit weird to me.

              • 4. Re: Farm Deployment with UDP not working
                redchili

                I've tried it the "other way around": I deployed the application on one node and, instead of waiting for a push to the other nodes, started another node, which should then pull the application. This gives me an error on the pulling node:

                14:18:14,071 INFO [FarmMemberService] **** pullNewDeployments ****
                14:18:14,072 INFO [ClusterFileTransfer] Start pull of file kusssdemo.ear from cluster.
                14:19:14,084 ERROR [FarmMemberService] org.jboss.ha.framework.server.ClusterFileTransfer$ClusterFileTransferException: Did not receive response from remote machine trying to open file 'farm/kusssdemo.ear'. Check remote machine error log.
                


                Unfortunately, there is no corresponding entry on the other node (the remote machine). It seems the two machines don't like each other.
                What about the other parameters in the UDP config section? Can I tweak something there?

                • 5. Re: Farm Deployment with UDP not working
                  brian.stansberry

                  If your servers are having problems communicating, start with http://wiki.jboss.org/wiki/Wiki.jsp?page=TestingJBoss.

                  If those all work OK, next step is to turn on TRACE level logging for category org.jboss.ha and see if it reveals anything.
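
                  As a rough sketch, TRACE logging for that category is normally switched on in the server's log4j configuration. The file location and the exact syntax below are assumptions based on a default 4.2 install (the "node" configuration name is taken from the path in the log above), so check them against your own file:

                  ```xml
                  <!-- Assumed location: server/node/conf/jboss-log4j.xml -->
                  <!-- Assumption: the bundled log4j understands TRACE natively; older
                       configs add class="org.jboss.logging.XLevel" to the priority element. -->
                  <category name="org.jboss.ha">
                     <priority value="TRACE"/>
                  </category>
                  ```

                  Appenders can carry their own Threshold, so the TRACE lines may only show up in server.log and not on the console.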

                  The suggestion to increase the buffer size was wrongheaded; based on faulty memory. The key thing is the buffers need to be bigger than the 512KB chunks that the FarmService breaks files into for transmission. The buffers in the original config were already large enough.
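
                  To make the sizes concrete, here is a trimmed excerpt of the UDP stack from the first post with the relevant numbers annotated. It is only a sketch: the 524288-byte figure assumes that "512KB" means 512 * 1024 bytes, and it ignores serialization and protocol header overhead.

                  ```xml
                  <!-- Excerpt of the UDP stack from the first post; all other attributes
                       and protocols are omitted here. -->
                  <!-- Assumption: one farm chunk is 512 * 1024 = 524288 bytes. -->
                  <Config>
                     <!-- Both receive buffers (2000000 and 2500000 bytes) are larger
                          than one 524288-byte chunk. -->
                     <UDP ucast_recv_buf_size="2000000"
                          mcast_recv_buf_size="2500000"/>
                     <!-- FRAG2 splits large messages into fragments of at most 60000
                          bytes, so one chunk travels as roughly
                          ceil(524288 / 60000) = 9 fragments. -->
                     <FRAG2 frag_size="60000"/>
                  </Config>
                  ```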

                  • 6. Re: Farm Deployment with UDP not working
                    redchili

                    My servers don't seem to have problems communicating with each other. They form a partition, and I can join and un-join nodes.
                    To be completely sure, I ran the tests you recommended. I also ran the JGroups multicast tests. All completed successfully.

                    This is the DEBUG log from my cluster services (on the machine that should serve the deployed application)

                    2007-03-27 09:20:01,866 DEBUG [org.jboss.ha.framework.server.FarmMemberService] farmDeployments request, parentDUMap.size=1
                    2007-03-27 09:20:01,903 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@78ee7d2a
                    2007-03-27 09:20:02,023 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@4c85ba73
                    2007-03-27 09:20:02,287 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:02,288 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:02,289 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:02,290 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:02,291 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:02,292 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:02,293 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:02,294 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:02,294 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    2007-03-27 09:20:02,331 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@58f8a759
                    2007-03-27 09:20:02,891 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:02,892 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:02,893 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:02,894 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:02,895 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:02,896 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:02,897 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:02,898 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:02,899 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    2007-03-27 09:20:02,931 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@2cb2f1b1
                    2007-03-27 09:20:03,283 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:03,284 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32806 (own address=192.168.1.105:32790)
                    2007-03-27 09:20:03,319 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@3f0ecf98
                    2007-03-27 09:20:03,356 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32806
                    2007-03-27 09:20:03,503 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@655f247f
                    2007-03-27 09:20:04,095 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:04,096 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:04,097 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:04,098 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:04,099 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:04,100 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:04,101 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:04,102 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:04,103 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    2007-03-27 09:20:04,139 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@5d8d15f0
                    2007-03-27 09:20:05,383 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:05,384 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32802 (own address=192.168.1.105:32786)
                    2007-03-27 09:20:05,419 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@599b6f8b
                    2007-03-27 09:20:05,456 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32802
                    2007-03-27 09:20:05,639 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@6bb83ca2
                    2007-03-27 09:20:06,499 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:06,500 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:06,501 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:06,502 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:06,503 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:06,504 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:06,505 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:06,505 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:06,507 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    2007-03-27 09:20:06,543 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@11afda9
                    2007-03-27 09:20:07,219 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:07,220 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32809 (own address=192.168.1.105:32793)
                    2007-03-27 09:20:07,255 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@130362d0
                    2007-03-27 09:20:07,292 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32809
                    2007-03-27 09:20:07,427 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@155054f0
                    2007-03-27 09:20:09,543 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:09,544 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32812 (own address=192.168.1.105:32796)
                    2007-03-27 09:20:09,546 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32812
                    2007-03-27 09:20:10,103 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:10,104 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:10,105 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:10,106 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:10,107 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:10,108 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:10,108 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:10,109 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:10,110 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    2007-03-27 09:20:10,144 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@4deb9df0
                    2007-03-27 09:20:11,243 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@27ce1f87
                    2007-03-27 09:20:12,248 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@4d77ec7b
                    2007-03-27 09:20:13,212 DEBUG [org.jgroups.protocols.MERGE2] initial_mbrs=[[own_addr=192.168.1.106:32809, coord_addr=192.168.1.105:32793, is_server=true], [own_addr=192.168.1.105:32793, coord_addr=192.168.1.105:32793, is_server=true]]
                    2007-03-27 09:20:13,284 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:13,284 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32806 (own address=192.168.1.105:32790)
                    2007-03-27 09:20:13,319 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@7c46a6f8
                    2007-03-27 09:20:13,357 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32806
                    2007-03-27 09:20:13,508 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@27c2386
                    2007-03-27 09:20:13,708 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:13,708 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:13,710 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:13,711 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:13,711 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:13,712 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:13,713 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:13,714 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:13,715 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    2007-03-27 09:20:13,751 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@6667be00
                    2007-03-27 09:20:15,388 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:15,388 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32802 (own address=192.168.1.105:32786)
                    2007-03-27 09:20:15,424 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@4a2e3a59
                    2007-03-27 09:20:15,461 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32802
                    2007-03-27 09:20:15,644 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@20f8cf1b
                    2007-03-27 09:20:17,224 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:17,225 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32809 (own address=192.168.1.105:32793)
                    2007-03-27 09:20:17,260 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@2ee50686
                    2007-03-27 09:20:17,297 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32809
                    2007-03-27 09:20:17,312 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:17,312 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:17,313 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:17,314 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:17,315 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:17,315 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:17,316 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:17,317 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:17,318 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    2007-03-27 09:20:17,352 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@7c217540
                    2007-03-27 09:20:17,432 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@3e3d101
                    2007-03-27 09:20:19,548 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:19,548 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32812 (own address=192.168.1.105:32796)
                    2007-03-27 09:20:19,550 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32812
                    2007-03-27 09:20:20,916 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:20,917 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:20,919 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:20,920 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:20,921 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:20,922 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:20,922 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:20,923 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:20,924 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    2007-03-27 09:20:20,960 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@52aa1162
                    2007-03-27 09:20:23,144 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@4ff681ad
                    2007-03-27 09:20:23,288 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:23,289 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.106:32806 (own address=192.168.1.105:32790)
                    2007-03-27 09:20:23,324 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@24fcfde
                    2007-03-27 09:20:23,361 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.106:32806
                    2007-03-27 09:20:23,512 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@45d7f901
                    2007-03-27 09:20:24,116 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.pbcast.STABLE$StableTask@79bcfbeb
                    2007-03-27 09:20:24,148 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@661cd479
                    2007-03-27 09:20:24,308 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.pbcast.STABLE$StabilitySendTask@6ec1884e
                    2007-03-27 09:20:24,520 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
                    2007-03-27 09:20:24,521 DEBUG [org.jgroups.util.TimeScheduler] Running task 8-8
                    2007-03-27 09:20:24,522 DEBUG [org.jgroups.util.TimeScheduler] Running task 10-10
                    2007-03-27 09:20:24,523 DEBUG [org.jgroups.util.TimeScheduler] Running task 11-11
                    2007-03-27 09:20:24,524 DEBUG [org.jgroups.util.TimeScheduler] Running task 9-9
                    2007-03-27 09:20:24,524 DEBUG [org.jgroups.util.TimeScheduler] Running task 12-12
                    2007-03-27 09:20:24,525 DEBUG [org.jgroups.util.TimeScheduler] Running task 14-14
                    2007-03-27 09:20:24,526 DEBUG [org.jgroups.util.TimeScheduler] Running task 13-13
                    2007-03-27 09:20:24,526 DEBUG [org.jgroups.util.TimeScheduler] Running task 15-15
                    


                    And the log on the requesting machine:
                    2007-03-27 09:20:01,917 DEBUG [org.jboss.ha.framework.server.FarmMemberService] Found 1 farmDeployments responses
                    2007-03-27 09:20:01,917 INFO [org.jboss.ha.framework.server.FarmMemberService] **** pullNewDeployments ****
                    2007-03-27 09:20:01,918 INFO [org.jboss.ha.framework.server.ClusterFileTransfer] Start pull of file kusssdemo.ear from cluster.
                    2007-03-27 09:20:01,946 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@56b3951d
                    2007-03-27 09:20:03,354 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@2802cf63
                    2007-03-27 09:20:03,430 DEBUG [org.jgroups.util.TimeScheduler] Running task true
                    2007-03-27 09:20:03,431 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.105:32790 (own address=192.168.1.106:32806)
                    2007-03-27 09:20:03,467 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@507d811a
                    2007-03-27 09:20:03,505 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.105:32790


                    From this time on, the "Running task" entries (whatever those are) go on and on on the sending machine. The receiving machine has already cancelled the transfer (exactly after 1 minute):
                    2007-03-27 09:21:01,931 ERROR [org.jboss.ha.framework.server.FarmMemberService] org.jboss.ha.framework.server.ClusterFileTransfer$ClusterFileTransferException: Did not receive response from remote machine trying to open file 'farm/kusssdemo.ear'. Check remote machine error log.


                    • 7. Re: Farm Deployment with UDP not working
                      redchili

                      And the same with TRACE logging on:

                      2007-03-27 09:58:39,421 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] callMethodOnCoordinatorNode(false), objName=FarmMemberService, methodName=farmDeployments
                      2007-03-27 09:58:39,475 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] dests=[192.168.1.105:32790], method_call=FarmMemberService.farmDeployments(), mode=2, timeout=60000
                      2007-03-27 09:58:39,476 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] real_dests=[192.168.1.105:32790]
                      2007-03-27 09:58:39,507 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer@7d8e9adf
                      2007-03-27 09:58:39,565 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] responses: [sender=192.168.1.105:32790, retval={farm/kusssdemo.ear=Mon Mar 26 17:14:03 CEST 2007}, received=true, suspected=false]
                      
                      2007-03-27 09:58:39,567 DEBUG [org.jboss.ha.framework.server.FarmMemberService] Found 1 farmDeployments responses
                      2007-03-27 09:58:39,568 INFO [org.jboss.ha.framework.server.FarmMemberService] **** pullNewDeployments ****
                      2007-03-27 09:58:39,568 INFO [org.jboss.ha.framework.server.ClusterFileTransfer] Start pull of file kusssdemo.ear from cluster.
                      2007-03-27 09:58:39,568 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] callMethodOnCoordinatorNode(false), objName=org.jboss.ha.framework.server.ClusterFileTransferService, methodName=remotePullOpenFile
                      2007-03-27 09:58:39,568 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] dests=[192.168.1.105:32790], method_call=org.jboss.ha.framework.server.ClusterFileTransferService.remotePullOpenFile(farm/kusssdemo.ear, 192.168.1.106:1099, 192.168.1.106:1099, farm), mode=2, timeout=60000
                      ...
                      ...
                      2007-03-27 09:59:39,802 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] responses: [sender=192.168.1.105:32790, retval=null, received=false, suspected=false]
                      
                      2007-03-27 09:59:39,803 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Ignoring non-received response: sender=192.168.1.105:32790, retval=null, received=false, suspected=false
                      2007-03-27 09:59:39,805 ERROR [org.jboss.ha.framework.server.FarmMemberService] org.jboss.ha.framework.server.ClusterFileTransfer$ClusterFileTransferException: Did not receive response from remote machine trying to open file 'farm/kusssdemo.ear'. Check remote machine error log.
                      


                      The timeout of 60000 ms matches exactly the point at which the "pulling" machine aborts the transfer. It would be quite unusual for the transfer of a 10 MB file over a LAN to take more than a minute, but should I increase this timeout? If yes, where?

                      • 8. Re: Farm Deployment with UDP not working
                        redchili

                        For completeness, this is the trace on the sending machine when I use TCP instead of UDP:

                        2007-03-27 11:14:14,095 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Partition DefaultPartition received msg
                        2007-03-27 11:14:14,096 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] pre methodName: FarmMemberService.farmDeployments
                        2007-03-27 11:14:14,096 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] handlerName: FarmMemberService methodName: farmDeployments
                        2007-03-27 11:14:14,096 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Handle: FarmMemberService.farmDeployments
                        2007-03-27 11:14:14,096 DEBUG [org.jboss.ha.framework.server.FarmMemberService] farmDeployments request, parentDUMap.size=1
                        2007-03-27 11:14:14,097 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] rpc call return value: {farm/kusssdemo.ear=Mon Mar 26 17:14:03 CEST 2007}
                        2007-03-27 11:14:14,123 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Partition DefaultPartition received msg
                        2007-03-27 11:14:14,129 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] pre methodName: org.jboss.ha.framework.server.ClusterFileTransferService.remotePullOpenFile
                        2007-03-27 11:14:14,129 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] handlerName: org.jboss.ha.framework.server.ClusterFileTransferService methodName: remotePullOpenFile
                        2007-03-27 11:14:14,129 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Handle: org.jboss.ha.framework.server.ClusterFileTransferService.remotePullOpenFile
                        2007-03-27 11:14:14,139 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] rpc call return value: org.jboss.ha.framework.server.ClusterFileTransfer$FileContentChunk@2ef8e535
                        


                        The file is requested, opened, and subsequently transferred and successfully deployed on the other machine, and the clustered application works fine. Is it possible that I'm hitting a bug with the UDP configuration here?

                        • 9. Re: Farm Deployment with UDP not working
                          redchili

                          After days of trying to fix this problem, I have made no progress at all. Should I maybe submit this as a bug? Everything except the farming deployment works fine in the cluster, and the same configuration worked with 4.0.5.

                          • 10. Re: Farm Deployment with UDP not working
                            brian.stansberry

                            I need to see a TRACE log from the sending machine. You posted the log from the requesting machine, which shows only half of the problem. Please leave JGroups at DEBUG as well.

                            If you want, go ahead and zip up the logs and e-mail them to me.

                            The requester fails after 60 seconds because that's the configured RPC timeout. But, the RPC is only asking for the first 512KB chunk of the file, so the total size of the file isn't an issue.

                            • 11. Re: Farm Deployment with UDP not working
                              redchili

                              Sorry for the late answer...
                              Meanwhile I switched to 4.2.0GA and tried again.
                              What I've seen now is that the application is actually deployed to the other nodes, but it takes really long when using UDP as the protocol for JGroups. The deployment of a 33 MB EAR takes between 8 and 15 minutes. But it's done correctly.

                              If that's normal, I guess I can live with it, but if there are any options to speed this up, I'd appreciate it if you could tell me how.