1 Reply Latest reply on May 21, 2010 3:21 AM by praveen.kumar

    JBoss Cluster with TCP - Discarded messages

    patrickheinzelmann
      I'm trying to create a JBoss cluster over TCP, using two Windows PCs running JBoss 5.1 GA. I have to use TCP because the hosting provider doesn't support UDP.
      The servers form a cluster with two members, but one node (192.168.0.5) discards the messages of the other node (192.168.0.7).


      The following lines are in the server.log files.

      Logfile of 192.168.0.5:
      2010-02-27 21:37:14,421 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-4,192.168.0.5:7650) sender 192.168.0.7:7650 not found in xmit_table
      2010-02-27 21:37:14,421 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-4,192.168.0.5:7650) range is null
      2010-02-27 21:37:14,437 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-4,192.168.0.5:7650) Received new cluster view: MergeView::[192.168.0.5:7650|5] [192.168.0.5:7650, 192.168.0.7:7650], subgroups=[[192.168.0.5:7650|0] [192.168.0.5:7650], [192.168.0.7:7650|4] [192.168.0.7:7650]]
      2010-02-27 21:37:15,843 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] (main) deploy, ctxPath=/
      2010-02-27 21:37:16,000 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] (main) deploy, ctxPath=/jmx-console
      2010-02-27 21:37:16,375 INFO  [org.apache.coyote.http11.Http11Protocol] (main) Starting Coyote HTTP/1.1 on http-0.0.0.0-8080
      2010-02-27 21:37:16,421 INFO  [org.apache.coyote.ajp.AjpProtocol] (main) Starting Coyote AJP/1.3 on ajp-0.0.0.0-8009
      2010-02-27 21:37:16,437 INFO  [org.jboss.bootstrap.microcontainer.ServerImpl] (main) JBoss (Microcontainer) [5.1.0.GA (build: SVNTag=JBoss_5_1_0_GA date=200905221053)] Started in 1m:52s:266ms
      2010-02-27 21:37:21,296 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5,192.168.0.5:3434) 192.168.0.5:3434] discarded message from non-member 192.168.0.7:1491, my view is [192.168.0.5:3434|0] [192.168.0.5:3434]
      2010-02-27 21:37:21,828 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5,192.168.0.5:3434) 192.168.0.5:3434] discarded message from non-member 192.168.0.7:1491, my view is [192.168.0.5:3434|0] [192.168.0.5:3434]
      2010-02-27 21:37:23,218 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-4,192.168.0.5:7650) 192.168.0.5:7650] discarded message from non-member 192.168.0.7:7650, my view is [192.168.0.5:7650|0] [192.168.0.5:7650]
      2010-02-27 21:37:23,859 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-4,192.168.0.5:7650) 192.168.0.5:7650] discarded message from non-member 192.168.0.7:7650, my view is [192.168.0.5:7650|0] [192.168.0.5:7650]
      2010-02-27 21:38:09,437 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5,192.168.0.5:3434) 192.168.0.5:3434] discarded message from non-member 192.168.0.7:1491, my view is [192.168.0.5:3434|0] [192.168.0.5:3434]
      2010-02-27 21:38:09,781 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5,192.168.0.5:3434) 192.168.0.5:3434] discarded message from non-member 192.168.0.7:1491, my view is [192.168.0.5:3434|0] [192.168.0.5:3434]
      2010-02-27 21:38:21,937 WARN  [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,192.168.0.5:7650) 192.168.0.5:7650] discarded message from non-member 192.168.0.7:7650, my view is [192.168.0.5:7650|0] [192.168.0.5:7650]
      2010-02-27 21:38:21,937 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.5:7650) sender 192.168.0.7:7650 not found in xmit_table
      2010-02-27 21:38:21,937 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.5:7650) range is null
      2010-02-27 21:38:21,953 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.TESTPART] (Incoming-2,192.168.0.5:7650) New cluster view for partition TESTPART (id: 5, delta: 1) : [192.168.0.5:1099, 192.168.0.7:1099]
      2010-02-27 21:38:21,937 WARN  [org.jgroups.protocols.pbcast.NAKACK] (Incoming-4,192.168.0.5:7650) 192.168.0.5:7650] discarded message from non-member 192.168.0.7:7650, my view is [192.168.0.5:7650|0] [192.168.0.5:7650]


      Logfile of 192.168.0.7:
      2010-02-27 21:36:25,147 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.7:7650) sender 192.168.0.5:7650 not found in xmit_table
      2010-02-27 21:36:25,147 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.7:7650) range is null
      2010-02-27 21:36:25,147 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-2,192.168.0.7:7650) Received new cluster view: MergeView::[192.168.0.5:7650|5] [192.168.0.5:7650, 192.168.0.7:7650], subgroups=[[192.168.0.5:7650|0] [192.168.0.5:7650], [192.168.0.7:7650|4] [192.168.0.7:7650]]
      2010-02-27 21:36:28,147 WARN  [org.jgroups.protocols.FD_SOCK] (OOB-96,192.168.0.7:7650) I was suspected by 192.168.0.5:7650; ignoring the SUSPECT message
      2010-02-27 21:37:26,585 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-98,192.168.0.7:7650) 192.168.0.7:7650] discarded message from non-member 192.168.0.5:7650, my view is [192.168.0.7:7650|4] [192.168.0.7:7650]
      2010-02-27 21:37:27,288 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-99,192.168.0.7:7650) 192.168.0.7:7650] discarded message from non-member 192.168.0.5:7650, my view is [192.168.0.7:7650|4] [192.168.0.7:7650]
      2010-02-27 21:37:32,569 WARN  [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,192.168.0.7:7650) 192.168.0.7:7650] discarded message from non-member 192.168.0.5:7650, my view is [192.168.0.7:7650|4] [192.168.0.7:7650]
      2010-02-27 21:37:32,585 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.7:7650) sender 192.168.0.5:7650 not found in xmit_table
      2010-02-27 21:37:32,585 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.7:7650) range is null



      For the cluster configuration, I'm using the standard "all" profile. To start the cluster, I'm using the following commands.

      start command of 192.168.0.5:
      run.bat -c all -b 0.0.0.0 -Djboss.partition.name="TESTPART" -Djboss.default.jgroups.stack="tcp-sync" -Djgroups.bind_address="192.168.0.5"

      start command of 192.168.0.7:
      run.bat -c all -b 0.0.0.0 -Djboss.partition.name="TESTPART" -Djboss.default.jgroups.stack="tcp-sync" -Djgroups.bind_address="192.168.0.7"
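With a static TCP setup like this, each node must be able to open a plain TCP connection to the other's bind address on the JGroups port 7650 (a Windows firewall blocking that port would produce exactly this kind of one-sided discard). A minimal sketch of such a connectivity probe — the class and method names are made up for illustration, and the demo connects to a local listener so it runs standalone; on the real machines you would probe the peer instead:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PortProbe {

    // Attempts a plain TCP connect to host:port within timeoutMs,
    // roughly what TCPPING does during discovery.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Self-contained demo against a local listener; on the real machines
        // you would probe the peer, e.g. canConnect("192.168.0.7", 7650, 3000).
        try (ServerSocket listener = new ServerSocket(0)) {
            System.out.println("reachable=" + canConnect("127.0.0.1", listener.getLocalPort(), 3000));
        }
    }
}
```

Running this probe from each machine against the other's address and port 7650 quickly rules the firewall in or out as the cause.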


      I made small changes to the tcp-sync configuration in jgroups-channelfactory-stacks.xml: I switched the discovery protocol from MPING to TCPPING and set initial_hosts.

      Part of the jgroups-channelfactory-stacks.xml on 192.168.0.5:


          <stack name="tcp-sync"
                 description="TCP based stack, without flow control and without
                              message bundling. TCP stacks are usually used when IP
                              multicasting cannot be used in a network (e.g.routers
                              discard multicast). This configuration should be used
                              instead of 'tcp' above when (1) synchronous calls are
                              used and (2) the message volume (rate and size) is not
                              that large.">
              <config>
                  <TCP
                       singleton_name="tcp_sync"
                       start_port="${jboss.jgroups.tcp_sync.tcp_port:7650}"
                       tcp_nodelay="true"
                       loopback="false"
                       recv_buf_size="20000000"
                       send_buf_size="640000"
                       discard_incompatible_packets="true"
                       max_bundle_size="64000"
                       max_bundle_timeout="30"
                       use_incoming_packet_handler="true"
                       enable_bundling="false"
                       use_send_queues="false"
                       sock_conn_timeout="300"
                       skip_suspected_members="true"
                       enable_diagnostics="${jboss.jgroups.enable_diagnostics:true}"
                       diagnostics_addr="${jboss.jgroups.diagnostics_addr:224.0.0.75}"
                       diagnostics_port="${jboss.jgroups.diagnostics_port:7500}"
                     
                       use_concurrent_stack="true"
                     
                       thread_pool.enabled="true"
                         thread_pool.min_threads="8"
                         thread_pool.max_threads="200"
                         thread_pool.keep_alive_time="5000"
                         thread_pool.queue_enabled="true"
                         thread_pool.queue_max_size="1000"
                         thread_pool.rejection_policy="discard"
            
                         oob_thread_pool.enabled="true"
                         oob_thread_pool.min_threads="1"
                         oob_thread_pool.max_threads="8"
                         oob_thread_pool.keep_alive_time="5000"
                         oob_thread_pool.queue_enabled="false"
                         oob_thread_pool.queue_max_size="100"
                         oob_thread_pool.rejection_policy="run"/>
                  <!-- Alternative 1: multicast-based automatic discovery. --> 
                  <!--<MPING timeout="3000"
                         num_initial_members="3"
                         mcast_addr="${jboss.partition.udpGroup:231.11.11.11}"
                         mcast_port="${jboss.jgroups.tcp_sync.mping_mcast_port:45701}"
                         ip_ttl="${jgroups.udp.ip_ttl:2}"/>-->         
                  <!-- Alternative 2: non multicast-based replacement for MPING. Requires a static configuration
                       of all possible cluster members.-->
                  <TCPPING timeout="3000"
                           initial_hosts="${jgroups.tcpping.initial_hosts:192.168.0.5[7650],192.168.0.7[7650]}"
                           port_range="1"
                           num_initial_members="2"/>

                  <MERGE2 max_interval="100000" min_interval="20000"/>
                  <FD_SOCK/>
                  <FD timeout="6000" max_tries="5" shun="true"/>
                  <VERIFY_SUSPECT timeout="1500"/>
                  <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                                 retransmit_timeout="300,600,1200,2400,4800"
                                 discard_delivered_msgs="true"/>
                  <UNICAST timeout="300,600,1200,2400,3600"/>
                  <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                                 max_bytes="400000"/>
                  <pbcast.GMS print_local_addr="true" join_timeout="3000"
                              shun="true"
                              view_bundling="true"
                              view_ack_collection_timeout="5000"/>
                  <!-- pbcast.STREAMING_STATE_TRANSFER/ -->
                  <pbcast.STATE_TRANSFER/>
                  <pbcast.FLUSH timeout="0"/>
              </config>
          </stack>


      Part of the jgroups-channelfactory-stacks.xml on 192.168.0.7:


          <stack name="tcp-sync"
                 description="TCP based stack, without flow control and without
                              message bundling. TCP stacks are usually used when IP
                              multicasting cannot be used in a network (e.g.routers
                              discard multicast). This configuration should be used
                              instead of 'tcp' above when (1) synchronous calls are
                              used and (2) the message volume (rate and size) is not
                              that large.">
              <config>
                  <TCP
                       singleton_name="tcp_sync"
                       start_port="${jboss.jgroups.tcp_sync.tcp_port:7650}"
                       tcp_nodelay="true"
                       loopback="false"
                       recv_buf_size="20000000"
                       send_buf_size="640000"
                       discard_incompatible_packets="true"
                       max_bundle_size="64000"
                       max_bundle_timeout="30"
                       use_incoming_packet_handler="true"
                       enable_bundling="false"
                       use_send_queues="false"
                       sock_conn_timeout="300"
                       skip_suspected_members="true"
                       enable_diagnostics="${jboss.jgroups.enable_diagnostics:true}"
                       diagnostics_addr="${jboss.jgroups.diagnostics_addr:224.0.0.75}"
                       diagnostics_port="${jboss.jgroups.diagnostics_port:7500}"
                     
                       use_concurrent_stack="true"
                     
                       thread_pool.enabled="true"
                         thread_pool.min_threads="8"
                         thread_pool.max_threads="200"
                         thread_pool.keep_alive_time="5000"
                         thread_pool.queue_enabled="true"
                         thread_pool.queue_max_size="1000"
                         thread_pool.rejection_policy="discard"
            
                         oob_thread_pool.enabled="true"
                         oob_thread_pool.min_threads="1"
                         oob_thread_pool.max_threads="8"
                         oob_thread_pool.keep_alive_time="5000"
                         oob_thread_pool.queue_enabled="false"
                         oob_thread_pool.queue_max_size="100"
                         oob_thread_pool.rejection_policy="run"/>
                  <!-- Alternative 1: multicast-based automatic discovery. --> 
                  <!--<MPING timeout="3000"
                         num_initial_members="3"
                         mcast_addr="${jboss.partition.udpGroup:231.11.11.11}"
                         mcast_port="${jboss.jgroups.tcp_sync.mping_mcast_port:45701}"
                         ip_ttl="${jgroups.udp.ip_ttl:2}"/>-->         
                  <!-- Alternative 2: non multicast-based replacement for MPING. Requires a static configuration
                       of all possible cluster members.-->
                  <TCPPING timeout="3000"
                           initial_hosts="${jgroups.tcpping.initial_hosts:192.168.0.7[7650],192.168.0.5[7650]}"
                           port_range="1"
                           num_initial_members="2"/>



                  <MERGE2 max_interval="100000" min_interval="20000"/>
                  <FD_SOCK/>
                  <FD timeout="6000" max_tries="5" shun="true"/>
                  <VERIFY_SUSPECT timeout="1500"/>
                  <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                                 retransmit_timeout="300,600,1200,2400,4800"
                                 discard_delivered_msgs="true"/>
                  <UNICAST timeout="300,600,1200,2400,3600"/>
                  <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                                 max_bytes="400000"/>
                  <pbcast.GMS print_local_addr="true" join_timeout="3000"
                              shun="true"
                              view_bundling="true"
                              view_ack_collection_timeout="5000"/>
                  <!-- pbcast.STREAMING_STATE_TRANSFER/ -->
                  <pbcast.STATE_TRANSFER/>
                  <pbcast.FLUSH timeout="0"/>
              </config>
          </stack>
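The two files above differ only in the ordering of initial_hosts, and since that attribute is written as ${jgroups.tcpping.initial_hosts:...}, the value is resolved from a system property before falling back to the default. Both nodes could therefore keep identical stack files and receive the host list on the command line instead — a sketch for 192.168.0.5, simply combining the options already used above:

```
run.bat -c all -b 0.0.0.0 -Djboss.partition.name="TESTPART" -Djboss.default.jgroups.stack="tcp-sync" -Djgroups.bind_address="192.168.0.5" -Djgroups.tcpping.initial_hosts="192.168.0.5[7650],192.168.0.7[7650]"
```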


      Are there any additional steps needed to create a cluster? What am I doing wrong?