7 Replies Latest reply on Jan 21, 2009 2:12 PM by brian.stansberry

    discarded message from non-member

    bmelloni

      As far as I can tell, I set up a cluster following all the instructions. But as soon as I start the 2nd server, even without any of my apps deployed, I get the error "discarded message from non-member" several times on the first server. Needless to say, things get worse when I try to deploy a simple webapp through the farm folder. I then see lots of errors, and although I see the file slowly (horrendously slowly) arriving at the other server, the cluster is unusable. The documentation and troubleshooting FAQs haven't helped.

      Does anyone have any clues about the possible causes?

      Configuration for both boxes:
      - JDK 1.6
      - JBoss EAP 4.3
      - Windows XP
      - Firewall disabled for test
      - Ran all the JGroups standalone troubleshooting tests successfully (the problem occurs only when using full JBoss)
      - Using the 'all' configuration
      - Reduced the network to a Linksys router and 2 boxes

      Symptoms:

      - JGroups seems to start correctly and recognize both boxes:
      15:50:45,995 INFO [DefaultPartition] I am (127.0.0.1:1099) received membershipChanged event:
      15:50:45,995 INFO [DefaultPartition] Dead members: 0 ([])
      15:50:45,995 INFO [DefaultPartition] New Members : 0 ([])
      15:50:45,995 INFO [DefaultPartition] All Members : 2 ([127.0.0.1:1099, 127.0.0.1:1099])
      15:51:07,386 INFO [TreeCache] viewAccepted(): [192.168.11.103:2591|1] [192.168.11.103:2591, 192.168.11.102:1136]
      15:51:09,730 INFO [TreeCache] viewAccepted(): [192.168.11.103:2595|1] [192.168.11.103:2595, 192.168.11.102:1141]

      - Then the first server complains about 9 times, like this:
      15:51:24,808 WARN [NAKACK] 192.168.11.103:2600] discarded message from non-member 192.168.11.102:1147, my view is [192.168.11.103:2600|0] [192.168.11.103:2600]

      My guess is that I missed configuring something. Because the JGroups tests passed, I am reasonably confident that multicast works OK.

      I searched this forum (no results), then googled and found lots of similar posts, but no answers. Any help would be greatly appreciated.

        • 1. Re: discarded message from non-member
          brian.stansberry

          First, if you are a support customer (you're using EAP), please open a case via the Customer Support Portal. There's no SLA via the forums.

          Otherwise,

          1) Are you actually using UDP multicast? The ports shown in your logs seem more like what would be used by a TCP-based JGroups config. (Could be multicast though; depends on your config).

          2) I need to understand what channels are using 192.168.11.102:1147 and 192.168.11.103:2600. Please find the logging that looks like this on the two nodes and post the area around it:

          -------------------------------------------------------
          GMS: address is 192.168.11.102:1147
          -------------------------------------------------------

          or

          -------------------------------------------------------
          GMS: address is 192.168.11.103:2600
          -------------------------------------------------------

          3) The following will likely cause problems, although AFAIR not the NAKACK issue you are reporting:

          15:50:45,995 INFO [DefaultPartition] All Members : 2 ([127.0.0.1:1099, 127.0.0.1:1099])

          That tells me you have JBoss bound to 127.0.0.1 on both nodes. That would occur either by starting JBoss with -b 127.0.0.1 on both nodes, or by not setting -b and leaving the 127.0.0.1 default. The AS clustering code uses the bind address and JNDI port to form a unique cluster-wide id for each node. Works fine, except when you bind JBoss to 127.0.0.1 or 0.0.0.0 on more than one machine. If you *want* to use 127.0.0.1 or 0.0.0.0 as the -b value on more than one node, you should edit the server/all/deploy/cluster-service.xml's ClusterPartition mbean and either change

          ${jboss.bind.address}

          to something unique per server, like

          192.168.11.102

          or, explicitly configure a String "NodeName" attribute with a unique value per node:

          node1

          Bottom line, you don't want duplicates in the "[DefaultPartition] All Members" logging.
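
          For reference, the two alternatives above might look roughly like this inside the ClusterPartition mbean. This is only a sketch: the "NodeAddress" attribute name and the mbean class name are recalled from the stock AS 4.x cluster-service.xml rather than taken from this thread, so check them against your own file before editing. Use one option or the other, not both.

          ```xml
          <mbean code="org.jboss.ha.framework.server.ClusterPartition"
                 name="jboss:service=${jboss.partition.name:DefaultPartition}">

             <!-- Option 1: replace ${jboss.bind.address} with a value unique to this server
                  (attribute name assumed from the stock file) -->
             <attribute name="NodeAddress">192.168.11.102</attribute>

             <!-- Option 2: leave NodeAddress as shipped and instead give each node
                  an explicit, unique name; uncomment the line below to use it -->
             <!-- <attribute name="NodeName">node1</attribute> -->

             <!-- remaining attributes of the mbean stay unchanged -->
          </mbean>
          ```

          Either way, after a restart the "[DefaultPartition] All Members" line should list a distinct entry for each node.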


          • 2. Re: discarded message from non-member
            bmelloni

            Yes, we are a support customer. I am a new employee for the company and I just requested from my boss the info needed to open a ticket.

            Thank you for helping until I am able to open the formal ticket.

            Your suggestion (3) to start with -b took care of the discarded message. But I still get some errors. After starting .103 first and .102 second, the following is still happening:

            A) I see these errors on .102 at roughly 2-minute intervals:
            09:02:22,093 WARN [ConnectionTable] peer closed connection, trying to re-send msg
            09:02:22,093 ERROR [ConnectionTable] 2nd attempt to send data failed too
            B) Deployment after placing a WAR in the farm folder is horrendously slow (as if it were failing a lot, timing out, and recovering). I see the WAR file being placed in the all/tmp folder, but the byte count goes up at a crawl. In both servers' logs I see quite a few debug statements for TORecoveryModule and XARecoveryModule. Once the push finally finished (after 30-60 min!), the application worked on both servers.

            Here are the details you requested in your previous email:

            1) I am using the default clustering configuration, since the instructions say you should get default clustering by just starting in the 'all' configuration. If that is UDP multicast, then yes.

            I believe the only changes I made to the defaults are:
            a) What is indicated in the post-installation instructions (i.e., enable the admin accounts so that I can get to the web pages).
            b) Start with '-c all' to get default clustering.
            c) Since I noticed that with the defaults I couldn't access the server by IP, after I captured the logs I posted, I changed the start to include '-b '.

            2)
            =====================
            Log snippet from .103:
            =====================
            08:51:58,433 INFO [ServerInfo] Java version: 1.6.0_11,Sun Microsystems Inc.
            08:51:58,433 INFO [ServerInfo] Java VM: Java HotSpot(TM) Server VM 11.0-b16,Sun Microsystems Inc.
            08:51:58,433 INFO [ServerInfo] OS-System: Windows XP 5.1,x86
            08:51:58,824 INFO [Server] Core system initialized
            08:52:02,621 INFO [WebService] Using RMI server codebase: http://192.168.11.103:8083/
            08:52:02,621 INFO [Log4jService$URLWatchTimerTask] Configuring from URL: resource:jboss-log4j.xml
            08:52:03,058 INFO [TransactionManagerService] JBossTS Transaction Service (JTA version) - JBoss Inc.
            08:52:03,058 INFO [TransactionManagerService] Setting up property manager MBean and JMX layer
            08:52:03,168 INFO [TransactionManagerService] Starting recovery manager
            08:52:03,215 INFO [TransactionManagerService] Recovery manager started
            08:52:03,215 INFO [TransactionManagerService] Binding TransactionManager JNDI Reference
            08:52:07,996 INFO [EJB3Deployer] Starting java:comp multiplexer
            08:52:09,840 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 192.168.11.103:1733
            -------------------------------------------------------
            08:52:11,871 INFO [TreeCache] viewAccepted(): [192.168.11.103:1733|0] [192.168.11.103:1733]
            08:52:11,918 INFO [TreeCache] TreeCache local address is 192.168.11.103:1733
            08:52:11,918 INFO [TreeCache] State could not be retrieved (we are the first member in group)
            08:52:11,918 INFO [TreeCache] parseConfig(): PojoCacheConfig is empty
            08:52:12,074 INFO [STDOUT] no object for null
            08:52:12,074 INFO [STDOUT] no object for null
            08:52:12,121 INFO [STDOUT] no object for null
            08:52:12,137 INFO [STDOUT] no object for {urn:jboss:bean-deployer}supplyType
            08:52:12,137 INFO [STDOUT] no object for {urn:jboss:bean-deployer}dependsType
            08:52:16,480 INFO [NativeServerConfig] JBoss Web Services - Native
            08:52:16,496 INFO [NativeServerConfig] jbossws-native-2.0.1.SP2 (build=200710210837)
            08:52:18,090 INFO [SnmpAgentService] SNMP agent going active
            08:52:18,433 INFO [DefaultPartition] Initializing
            08:52:18,465 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 192.168.11.103:1738
            -------------------------------------------------------
            08:52:20,480 INFO [DefaultPartition] Number of cluster members: 1
            08:52:20,480 INFO [DefaultPartition] Other members: 0
            08:52:20,480 INFO [DefaultPartition] Fetching state (will wait for 30000 milliseconds):
            08:52:20,480 INFO [DefaultPartition] State could not be retrieved (we are the first member in group)
            08:52:20,543 INFO [HANamingService] Started ha-jndi bootstrap jnpPort=1100, backlog=50, bindAddress=/192.168.11.103
            08:52:20,558 INFO [DetachedHANamingService$AutomaticDiscovery] Listening on /192.168.11.103:1102, group=230.0.0.4, HA-JNDI address=192.168.11.103:1100
            08:52:20,933 INFO [TreeCache] No transaction manager lookup class has been defined. Transactions cannot be used
            08:52:21,027 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 192.168.11.103:1742
            -------------------------------------------------------
            08:52:23,043 INFO [TreeCache] viewAccepted(): [192.168.11.103:1742|0] [192.168.11.103:1742]
            08:52:23,043 INFO [TreeCache] TreeCache local address is 192.168.11.103:1742
            08:52:23,324 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 192.168.11.103:1746
            -------------------------------------------------------
            08:52:25,324 INFO [TreeCache] viewAccepted(): [192.168.11.103:1746|0] [192.168.11.103:1746]
            08:52:25,324 INFO [TreeCache] TreeCache local address is 192.168.11.103:1746
            ================================

            Snippet from .102:
            ================================
            09:01:43,031 INFO [ServerInfo] Java version: 1.6.0_10,Sun Microsystems Inc.
            09:01:43,031 INFO [ServerInfo] Java VM: Java HotSpot(TM) Server VM 11.0-b15,Sun Microsystems Inc.
            09:01:43,031 INFO [ServerInfo] OS-System: Windows XP 5.1,x86
            09:01:43,531 INFO [Server] Core system initialized
            09:01:45,359 INFO [WebService] Using RMI server codebase: http://192.168.11.102:8083/
            09:01:45,359 INFO [Log4jService$URLWatchTimerTask] Configuring from URL: resource:jboss-log4j.xml
            09:01:45,750 INFO [TransactionManagerService] JBossTS Transaction Service (JTA version) - JBoss Inc.
            09:01:45,750 INFO [TransactionManagerService] Setting up property manager MBean and JMX layer
            09:01:45,921 INFO [TransactionManagerService] Starting recovery manager
            09:01:45,968 INFO [TransactionManagerService] Recovery manager started
            09:01:45,968 INFO [TransactionManagerService] Binding TransactionManager JNDI Reference
            09:01:47,781 INFO [EJB3Deployer] Starting java:comp multiplexer
            09:01:49,296 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 192.168.11.102:1577
            -------------------------------------------------------
            09:01:51,515 INFO [TreeCache] viewAccepted(): [192.168.11.103:1733|1] [192.168.11.103:1733, 192.168.11.102:1577]
            09:01:51,578 INFO [TreeCache] TreeCache local address is 192.168.11.102:1577
            09:01:51,640 INFO [TreeCache] received the state (size=1024 bytes)
            09:01:51,656 INFO [TreeCache] state was retrieved successfully (in 78 milliseconds)
            09:01:51,656 INFO [TreeCache] parseConfig(): PojoCacheConfig is empty
            09:01:51,703 INFO [STDOUT] no object for null
            09:01:51,703 INFO [STDOUT] no object for null
            09:01:51,718 INFO [STDOUT] no object for null
            09:01:51,750 INFO [STDOUT] no object for {urn:jboss:bean-deployer}supplyType
            09:01:51,765 INFO [STDOUT] no object for {urn:jboss:bean-deployer}dependsType
            09:01:53,000 INFO [NativeServerConfig] JBoss Web Services - Native
            09:01:53,000 INFO [NativeServerConfig] jbossws-native-2.0.1.SP2 (build=200710210837)
            09:01:53,453 INFO [SnmpAgentService] SNMP agent going active
            09:01:53,687 INFO [DefaultPartition] Initializing
            09:01:53,718 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 192.168.11.102:1583
            -------------------------------------------------------
            09:02:00,562 INFO [DefaultPartition] Number of cluster members: 2
            09:02:00,562 INFO [DefaultPartition] Other members: 1
            09:02:00,562 INFO [DefaultPartition] Fetching state (will wait for 30000 milliseconds):
            09:02:00,750 INFO [DefaultPartition] state was retrieved successfully (in 188 milliseconds)
            09:02:00,953 INFO [HANamingService] Started ha-jndi bootstrap jnpPort=1100, backlog=50, bindAddress=/192.168.11.102
            09:02:00,953 INFO [DetachedHANamingService$AutomaticDiscovery] Listening on /192.168.11.102:1102, group=230.0.0.4, HA-JNDI address=192.168.11.102:1100
            09:02:01,218 INFO [TreeCache] No transaction manager lookup class has been defined. Transactions cannot be used
            09:02:01,312 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 192.168.11.102:1589
            -------------------------------------------------------
            09:02:03,578 INFO [TreeCache] viewAccepted(): [192.168.11.103:1742|1] [192.168.11.103:1742, 192.168.11.102:1589]
            09:02:03,640 INFO [TreeCache] TreeCache local address is 192.168.11.102:1589
            09:02:03,734 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is 192.168.11.102:1594
            -------------------------------------------------------
            09:02:06,031 INFO [TreeCache] viewAccepted(): [192.168.11.103:1746|1] [192.168.11.103:1746, 192.168.11.102:1594]
            09:02:06,093 INFO [TreeCache] TreeCache local address is 192.168.11.102:1594

            • 3. Re: discarded message from non-member
              brian.stansberry

              Make sure when you open a support ticket that you reference this thread so the support team can see the background.

              Please post the contents of your deploy/cluster-service.xml file.

              Your farming issue for sure sounds like a communication issue; i.e. lost messages, lots of retries. Not bad enough that the cluster falls apart, but bad enough that RPCs around the cluster take forever.

              • 4. Re: discarded message from non-member
                bmelloni

                Here is cluster-service.xml for both servers.

                It should be 'untouched' from the original install (although I remember having to change 'somewhere' - maybe in this file or another file - a value from 0 to 1 to avoid the nodes fighting each other for the same identity).

                .103 (the first server I start):
                ==================

                <?xml version="1.0" encoding="UTF-8"?>

                <!-- ===================================================================== -->
                <!-- -->
                <!-- Sample Clustering Service Configuration -->
                <!-- -->
                <!-- ===================================================================== -->



                <!-- ==================================================================== -->
                <!-- Cluster Partition: defines cluster -->
                <!-- ==================================================================== -->



                <!-- Name of the partition being built -->
                ${jboss.partition.name:DefaultPartition}

                <!-- The address used to determine the node name -->
                ${jboss.bind.address}

                <!-- Determine if deadlock detection is enabled -->
                False

                <!-- Max time (in ms) to wait for state transfer to complete. Increase for large states -->
                30000

                <!-- The JGroups protocol configuration -->

                <!--
                The default UDP stack:
                - If you have a multihomed machine, set the UDP protocol's bind_addr attribute to the
                appropriate NIC IP address, e.g bind_addr="192.168.0.2".
                - On Windows machines, because of the media sense feature being broken with multicast
                (even after disabling media sense) set the UDP protocol's loopback attribute to true
                -->

                <UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}"
                mcast_port="${jboss.hapartition.mcast_port:45566}"
                tos="8"
                ucast_recv_buf_size="20000000"
                ucast_send_buf_size="640000"
                mcast_recv_buf_size="25000000"
                mcast_send_buf_size="640000"
                loopback="false"
                discard_incompatible_packets="true"
                enable_bundling="false"
                max_bundle_size="64000"
                max_bundle_timeout="30"
                use_incoming_packet_handler="true"
                use_outgoing_packet_handler="false"
                ip_ttl="${jgroups.udp.ip_ttl:2}"
                down_thread="false" up_thread="false"/>
                <PING timeout="2000"
                down_thread="false" up_thread="false" num_initial_members="3"/>
                <MERGE2 max_interval="100000"
                down_thread="false" up_thread="false" min_interval="20000"/>
                <FD_SOCK down_thread="false" up_thread="false"/>
                <FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
                <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
                <pbcast.NAKACK max_xmit_size="60000"
                use_mcast_xmit="false" gc_lag="0"
                retransmit_timeout="300,600,1200,2400,4800"
                down_thread="false" up_thread="false"
                discard_delivered_msgs="true"/>
                <UNICAST timeout="300,600,1200,2400,3600"
                down_thread="false" up_thread="false"/>
                <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                down_thread="false" up_thread="false"
                max_bytes="400000"/>
                <pbcast.GMS print_local_addr="true" join_timeout="3000"
                down_thread="false" up_thread="false"
                join_retry_timeout="2000" shun="true"
                view_bundling="true"/>
                <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
                <pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/>


                <!-- Alternate TCP stack: customize it for your environment, change bind_addr and initial_hosts -->
                <!--

                <TCP bind_addr="thishost" start_port="7800" loopback="true"
                tcp_nodelay="true"
                recv_buf_size="20000000"
                send_buf_size="640000"
                discard_incompatible_packets="true"
                enable_bundling="false"
                max_bundle_size="64000"
                max_bundle_timeout="30"
                use_incoming_packet_handler="true"
                use_outgoing_packet_handler="false"
                down_thread="false" up_thread="false"
                use_send_queues="false"
                sock_conn_timeout="300"
                skip_suspected_members="true"/>
                <TCPPING initial_hosts="thishost[7800],otherhost[7800]" port_range="3"
                timeout="3000"
                down_thread="false" up_thread="false"
                num_initial_members="3"/>
                <MERGE2 max_interval="100000"
                down_thread="false" up_thread="false" min_interval="20000"/>
                <FD_SOCK down_thread="false" up_thread="false"/>
                <FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
                <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
                <pbcast.NAKACK max_xmit_size="60000"
                use_mcast_xmit="false" gc_lag="0"
                retransmit_timeout="300,600,1200,2400,4800"
                down_thread="false" up_thread="false"
                discard_delivered_msgs="true"/>
                <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                down_thread="false" up_thread="false"
                max_bytes="400000"/>
                <pbcast.GMS print_local_addr="true" join_timeout="3000"
                down_thread="false" up_thread="false"
                join_retry_timeout="2000" shun="true"
                view_bundling="true"/>
                <pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/>

                -->

                jboss:service=Naming


                <!-- ==================================================================== -->
                <!-- HA Session State Service for SFSB -->
                <!-- ==================================================================== -->


                jboss:service=Naming
                <!-- We now inject the partition into the HAJNDI service instead
                of requiring that the partition name be passed -->
                <depends optional-attribute-name="ClusterPartition"
                proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}
                <!-- JNDI name under which the service is bound -->
                /HASessionState/Default
                <!-- Max delay before cleaning unreclaimed state.
                Defaults to 30*60*1000 => 30 minutes -->
                0


                <!-- ==================================================================== -->
                <!-- HA JNDI -->
                <!-- ==================================================================== -->


                <!-- We now inject the partition into the HAJNDI service instead
                of requiring that the partition name be passed -->
                <depends optional-attribute-name="ClusterPartition"
                proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}
                <!-- Bind address of bootstrap and HA-JNDI RMI endpoints -->
                ${jboss.bind.address}
                <!-- Port on which the HA-JNDI stub is made available -->
                1100
                <!-- RmiPort to be used by the HA-JNDI service once bound. 0 => auto. -->
                1101
                <!-- Accept backlog of the bootstrap socket -->
                50
                <!-- The thread pool service used to control the bootstrap and
                auto discovery lookups -->
                <depends optional-attribute-name="LookupPool"
                proxy-type="attribute">jboss.system:service=ThreadPool

                <!-- A flag to disable the auto discovery via multicast -->
                false
                <!-- Set the auto-discovery bootstrap multicast bind address. If not
                specified and a BindAddress is specified, the BindAddress will be used. -->
                ${jboss.bind.address}
                <!-- Multicast Address and group port used for auto-discovery -->
                ${jboss.partition.udpGroup:230.0.0.4}
                1102
                <!-- The TTL (time-to-live) for autodiscovery IP multicast packets -->
                16
                <!-- The load balancing policy for HA-JNDI -->
                org.jboss.ha.framework.interfaces.RoundRobin

                <!-- Client socket factory to be used for client-server
                RMI invocations during JNDI queries
                custom
                -->
                <!-- Server socket factory to be used for client-server
                RMI invocations during JNDI queries
                custom
                -->


                <!-- ==================================================================== -->
                <!-- HA Invokers -->
                <!-- ==================================================================== -->


                jboss:service=TransactionManager
                <depends optional-attribute-name="Connector"
                proxy-type="attribute">jboss.remoting:service=Connector,transport=socket
                jboss:service=${jboss.partition.name:DefaultPartition}



                ${jboss.bind.address}
                4447
                <!--
                custom
                custom
                -->
                jboss:service=Naming


                <!-- the JRMPInvokerHA creates a thread per request. This implementation uses a pool of threads -->

                1
                300
                300
                60000
                ${jboss.bind.address}
                4448
                ${jboss.bind.address}
                0
                false
                <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager
                jboss:service=Naming


                <!-- ==================================================================== -->

                <!-- ==================================================================== -->
                <!-- Distributed cache invalidation -->
                <!-- ==================================================================== -->


                <!-- We now inject the partition into the HAJNDI service instead
                of requiring that the partition name be passed -->
                <depends optional-attribute-name="ClusterPartition"
                proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}
                jboss.cache:service=InvalidationManager
                jboss.cache:service=InvalidationManager
                DefaultJGBridge




                .102 (the second server):
                ===============

                <?xml version="1.0" encoding="UTF-8"?>

                <!-- ===================================================================== -->
                <!-- -->
                <!-- Sample Clustering Service Configuration -->
                <!-- -->
                <!-- ===================================================================== -->



                <!-- ==================================================================== -->
                <!-- Cluster Partition: defines cluster -->
                <!-- ==================================================================== -->



                <!-- Name of the partition being built -->
                ${jboss.partition.name:DefaultPartition}

                <!-- The address used to determine the node name -->
                ${jboss.bind.address}

                <!-- Determine if deadlock detection is enabled -->
                False

                <!-- Max time (in ms) to wait for state transfer to complete. Increase for large states -->
                30000

                <!-- The JGroups protocol configuration -->

                <!--
                The default UDP stack:
                - If you have a multihomed machine, set the UDP protocol's bind_addr attribute to the
                appropriate NIC IP address, e.g bind_addr="192.168.0.2".
                - On Windows machines, because of the media sense feature being broken with multicast
                (even after disabling media sense) set the UDP protocol's loopback attribute to true
                -->

                <UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}"
                mcast_port="${jboss.hapartition.mcast_port:45566}"
                tos="8"
                ucast_recv_buf_size="20000000"
                ucast_send_buf_size="640000"
                mcast_recv_buf_size="25000000"
                mcast_send_buf_size="640000"
                loopback="false"
                discard_incompatible_packets="true"
                enable_bundling="false"
                max_bundle_size="64000"
                max_bundle_timeout="30"
                use_incoming_packet_handler="true"
                use_outgoing_packet_handler="false"
                ip_ttl="${jgroups.udp.ip_ttl:2}"
                down_thread="false" up_thread="false"/>
                <PING timeout="2000"
                down_thread="false" up_thread="false" num_initial_members="3"/>
                <MERGE2 max_interval="100000"
                down_thread="false" up_thread="false" min_interval="20000"/>
                <FD_SOCK down_thread="false" up_thread="false"/>
                <FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
                <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
                <pbcast.NAKACK max_xmit_size="60000"
                use_mcast_xmit="false" gc_lag="0"
                retransmit_timeout="300,600,1200,2400,4800"
                down_thread="false" up_thread="false"
                discard_delivered_msgs="true"/>
                <UNICAST timeout="300,600,1200,2400,3600"
                down_thread="false" up_thread="false"/>
                <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                down_thread="false" up_thread="false"
                max_bytes="400000"/>
                <pbcast.GMS print_local_addr="true" join_timeout="3000"
                down_thread="false" up_thread="false"
                join_retry_timeout="2000" shun="true"
                view_bundling="true"/>
                <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
                <pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/>


                <!-- Alternate TCP stack: customize it for your environment, change bind_addr and initial_hosts -->
                <!--

                <TCP bind_addr="thishost" start_port="7800" loopback="true"
                tcp_nodelay="true"
                recv_buf_size="20000000"
                send_buf_size="640000"
                discard_incompatible_packets="true"
                enable_bundling="false"
                max_bundle_size="64000"
                max_bundle_timeout="30"
                use_incoming_packet_handler="true"
                use_outgoing_packet_handler="false"
                down_thread="false" up_thread="false"
                use_send_queues="false"
                sock_conn_timeout="300"
                skip_suspected_members="true"/>
                <TCPPING initial_hosts="thishost[7800],otherhost[7800]" port_range="3"
                timeout="3000"
                down_thread="false" up_thread="false"
                num_initial_members="3"/>
                <MERGE2 max_interval="100000"
                down_thread="false" up_thread="false" min_interval="20000"/>
                <FD_SOCK down_thread="false" up_thread="false"/>
                <FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
                <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
                <pbcast.NAKACK max_xmit_size="60000"
                use_mcast_xmit="false" gc_lag="0"
                retransmit_timeout="300,600,1200,2400,4800"
                down_thread="false" up_thread="false"
                discard_delivered_msgs="true"/>
                <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                down_thread="false" up_thread="false"
                max_bytes="400000"/>
                <pbcast.GMS print_local_addr="true" join_timeout="3000"
                down_thread="false" up_thread="false"
                join_retry_timeout="2000" shun="true"
                view_bundling="true"/>
                <pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/>

                -->

                jboss:service=Naming


                <!-- ==================================================================== -->
                <!-- HA Session State Service for SFSB -->
                <!-- ==================================================================== -->


                jboss:service=Naming
                <!-- We now inject the partition into the HAJNDI service instead
                of requiring that the partition name be passed -->
                <depends optional-attribute-name="ClusterPartition"
                proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
                <!-- JNDI name under which the service is bound -->
                /HASessionState/Default
                <!-- Max delay before cleaning unreclaimed state.
                Defaults to 30*60*1000 => 30 minutes -->
                0


                <!-- ==================================================================== -->
                <!-- HA JNDI -->
                <!-- ==================================================================== -->


                <!-- We now inject the partition into the HAJNDI service instead
                of requiring that the partition name be passed -->
                <depends optional-attribute-name="ClusterPartition"
                proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
                <!-- Bind address of bootstrap and HA-JNDI RMI endpoints -->
                ${jboss.bind.address}
                <!-- Port on which the HA-JNDI stub is made available -->
                1100
                <!-- RmiPort to be used by the HA-JNDI service once bound. 0 => auto. -->
                1101
                <!-- Accept backlog of the bootstrap socket -->
                50
                <!-- The thread pool service used to control the bootstrap and
                auto discovery lookups -->
                <depends optional-attribute-name="LookupPool"
                proxy-type="attribute">jboss.system:service=ThreadPool</depends>

                <!-- A flag to disable the auto discovery via multicast -->
                false
                <!-- Set the auto-discovery bootstrap multicast bind address. If not
                specified and a BindAddress is specified, the BindAddress will be used. -->
                ${jboss.bind.address}
                <!-- Multicast Address and group port used for auto-discovery -->
                ${jboss.partition.udpGroup:230.0.0.4}
                1102
                <!-- The TTL (time-to-live) for autodiscovery IP multicast packets -->
                16
                <!-- The load balancing policy for HA-JNDI -->
                org.jboss.ha.framework.interfaces.RoundRobin

                <!-- Client socket factory to be used for client-server
                RMI invocations during JNDI queries
                custom
                -->
                <!-- Server socket factory to be used for client-server
                RMI invocations during JNDI queries
                custom
                -->


                <!-- ==================================================================== -->
                <!-- HA Invokers -->
                <!-- ==================================================================== -->


                jboss:service=TransactionManager
                <depends optional-attribute-name="Connector"
                proxy-type="attribute">jboss.remoting:service=Connector,transport=socket</depends>
                jboss:service=${jboss.partition.name:DefaultPartition}



                ${jboss.bind.address}
                4447
                <!--
                custom
                custom
                -->
                jboss:service=Naming


                <!-- the JRMPInvokerHA creates a thread per request. This implementation uses a pool of threads -->

                1
                300
                300
                60000
                ${jboss.bind.address}
                4448
                ${jboss.bind.address}
                0
                false
                <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager</depends>
                jboss:service=Naming


                <!-- ==================================================================== -->

                <!-- ==================================================================== -->
                <!-- Distributed cache invalidation -->
                <!-- ==================================================================== -->


                <!-- We now inject the partition into the HAJNDI service instead
                of requiring that the partition name be passed -->
                <depends optional-attribute-name="ClusterPartition"
                proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
                jboss.cache:service=InvalidationManager
                jboss.cache:service=InvalidationManager
                DefaultJGBridge



                • 5. Re: discarded message from non-member
                  brian.stansberry

                  Your "ConnectionTable" logging:

                  09:02:22,093 WARN [ConnectionTable] peer closed connection, trying to re-send msg
                  09:02:22,093 ERROR [ConnectionTable] 2nd attempt to send data failed too


                  is coming from the JBoss Messaging Data Channel. That channel uses TCP unicast for sending messages, unlike the other channels that use UDP multicast.

                  Farming doesn't use that channel; it uses a different one, the UDP multicast-based one from cluster-service.xml.

                  So, two separate channels using different underlying protocols are experiencing problems, which sounds to me like a network or host configuration problem. Hard to say what; if resolving the firewall issues you raised in a separate thread makes it go away, there's your answer.

                  See also http://www.jboss.org/community/docs/DOC-12375
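
One way to keep the two symptoms apart while debugging is to count WARN and ERROR lines per logger category in each server's log. The following is only a minimal sketch: it assumes the log line layout quoted in this thread (`HH:mm:ss,SSS LEVEL [Category] message`) and Java 8 or later, and the class name `ChannelLogTally` is made up for illustration. It does not decide which channel a category belongs to; per the reply above, `ConnectionTable` lines come from the TCP-based JBoss Messaging Data Channel, while a category such as `NAKACK` can appear in either stack.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChannelLogTally {
    // Assumed layout: "09:02:22,093 WARN [ConnectionTable] peer closed connection, ..."
    private static final Pattern LINE = Pattern.compile(
        "^\\s*\\d{2}:\\d{2}:\\d{2},\\d{3}\\s+(WARN|ERROR)\\s+\\[([^\\]]+)\\]");

    /** Counts WARN and ERROR lines per logger category, in first-seen order. */
    public static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                // group(2) is the text between the first pair of square brackets
                counts.merge(m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines copied from this thread; INFO lines are ignored.
        List<String> sample = Arrays.asList(
            "09:02:22,093 WARN [ConnectionTable] peer closed connection, trying to re-send msg",
            "09:02:22,093 ERROR [ConnectionTable] 2nd attempt to send data failed too",
            "15:50:45,995 INFO [DefaultPartition] Dead members: 0 ([])");
        System.out.println(tally(sample));
    }
}
```

Feeding it the full server.log from each box (for example via `Files.readAllLines`) shows quickly whether the TCP-side `ConnectionTable` errors and the multicast-side warnings appear together or independently.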

                  • 6. Re: discarded message from non-member
                    bmelloni

                    This Connection issue does not go away with the firewall turned off. The two posts are independent problems.

                    Let's table this problem. I should get my commercial license credentials today or tomorrow and will use phone support to start over from scratch and reinstall both servers in the cluster according to their instructions instead of what the documentation seems to say.

                    A suggestion:
                    - The server config guide is a good reference, but it is worthless as a cluster installation guide unless you are already a jBoss configuration expert.
                    - There is a need for a simple, step by step guide for installing a basic cluster.
                    - I might even write it myself and contribute it back after talking to support. Nobody else should suffer through this installation nightmare.

                    Thanks for trying to help.

                    • 7. Re: discarded message from non-member
                      brian.stansberry

                       

                      "bmelloni" wrote:
                      This Connection issue does not go away with the firewall turned off. The two posts are independent problems.

                      Let's table this problem. I should get my commercial license credentials today or tomorrow and will use phone support to start over from scratch and reinstall both servers in the cluster according to their instructions instead of what the documentation seems to say.


                      OK. The support team is much better equipped to handle issues that are specific to a particular environment.


                      A suggestion:
                      - The server config guide is a good reference, but it is worthless as a cluster installation guide unless you are already a jBoss configuration expert.
                      - There is a need for a simple, step by step guide for installing a basic cluster.
                      - I might even write it myself and contribute it back after talking to support. Nobody else should suffer through this installation nightmare.


                      Thanks for the input. I've heard similar things before, and basically agree. I'd certainly welcome any contributions, particularly on AS 4.x. I'm rewriting the Clustering Guide for AS 5 and have added some of what you are talking about. A draft of that can be found attached to http://www.jboss.org/community/docs/DOC-12928 and comments are welcome. (Note: it's the attached document at the bottom of the page, not the links at the top. I won't bore you with the details as to why.)