4 Replies Latest reply on Apr 15, 2011 12:55 PM by wdfink

Clustering: Master-Slave Identification Problem

nprasanna Apr 12, 2011 8:15 AM

Hi,

I have JBoss EAP 5.1 on two Linux virtual machines and wanted to test clustering.

Without modifying any configuration I started the servers with the following commands:

First Machine:

sh run.sh -c Production -g MyCluster -u 239.255.1.7 -b 192.168.41.132 -Djboss.messaging.ServerPeerID=1

When the first one was up and running I started the second one:

sh run.sh -c Production -g MyCluster -u 239.255.1.7 -b 192.168.41.134 -Djboss.messaging.ServerPeerID=2

The Problem:

When the second server is also up and running, in server.log of the 1st server, I get the following repeated warning messages:

2011-04-13 09:10:32,087 WARN [org.jgroups.protocols.pbcast.NAKACK] (OOB-19,192.168.41.132:55200) 192.168.41.132:55200] discarded message from non-member 192.168.41.134:55200, my view is [192.168.41.132:55200|0] [192.168.41.132:55200]

According to its log, the second server has apparently started its own cluster: it identifies itself as the first and only member. I think this means both servers are behaving as masters and are not recognizing each other.
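A note on reading that warning: the text after "my view is" appears to be the JGroups view, printed as [coordinator|view id] followed by the member list. A single member means the node regards itself as alone in the cluster. Below is a small sketch for pulling the member list out of such a line. The class name ViewLine and the regular expression are assumptions based only on the log line quoted above; this is not a JGroups API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ViewLine {
    // Assumed layout: "my view is [<coordinator>|<id>] [<member>, <member>, ...]"
    private static final Pattern VIEW =
        Pattern.compile("my view is \\[([^|\\]]+)\\|(\\d+)\\] \\[([^\\]]*)\\]");

    // Returns the members listed in the view, or an empty list if the line does not match.
    static List<String> members(String logLine) {
        List<String> out = new ArrayList<>();
        Matcher m = VIEW.matcher(logLine);
        if (m.find()) {
            for (String s : m.group(3).split(",")) {
                String member = s.trim();
                if (!member.isEmpty()) {
                    out.add(member);
                }
            }
        }
        return out;
    }

    // True when the node that logged the line sees exactly one member.
    static boolean isSingleton(String logLine) {
        return members(logLine).size() == 1;
    }

    public static void main(String[] args) {
        String line = "discarded message from non-member 192.168.41.134:55200, "
            + "my view is [192.168.41.132:55200|0] [192.168.41.132:55200]";
        System.out.println(members(line) + " singleton=" + isSingleton(line));
    }
}
```

For the warning in this post the only member is 192.168.41.132:55200, so the first server receives traffic from .134 but never admitted it to its view.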

The Weird Part:

If I start the second Server first and then the first server, the situation is surprisingly not reversed!! I still get the repeated warnings mentioned above only in the first server's log.

The various multicast addresses that I tested with were:

224.0.0.0 through 224.0.0.255 and also 239.255.x.y
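For reference on these ranges: 224.0.0.0 through 224.0.0.255 is the block reserved for link-local control protocols, which routers do not forward, while 239.x.y.z is the administratively scoped range meant for private use. Below is a minimal sketch that reports how Java classifies a candidate address. The class name McastScope is made up for illustration; the code uses only the standard java.net.InetAddress scope checks.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class McastScope {
    // Classifies an IP literal by multicast scope using the JDK's own checks.
    static String classify(String literal) throws UnknownHostException {
        InetAddress a = InetAddress.getByName(literal); // no DNS lookup for IP literals
        if (!a.isMulticastAddress()) return "not multicast";
        if (a.isMCLinkLocal()) return "link-local";
        if (a.isMCSiteLocal()) return "site-local";
        if (a.isMCOrgLocal()) return "organization-local";
        if (a.isMCGlobal()) return "global";
        return "other";
    }

    public static void main(String[] args) throws UnknownHostException {
        // Addresses taken from this thread
        for (String s : new String[] {"224.0.0.5", "224.10.10.10", "239.255.1.7"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With these inputs, 224.0.0.5 is reported as link-local, 224.10.10.10 as global and 239.255.1.7 as site-local.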

Kindly help and clarify.

1. Clustering: Master-Slave Identification Problem

nprasanna Apr 13, 2011 12:26 AM (in response to nprasanna)

Hi,

This is an urgent problem, so kindly shed some light on it.

Thanks.
2. Clustering: Master-Slave Identification Problem

wdfink Apr 13, 2011 2:58 AM (in response to nprasanna)

Looks like a multicast problem.
I've answered different threads here (search might help), see http://community.jboss.org/thread/165144

But you should read the wiki and test the multicast functionality, see
http://community.jboss.org/wiki/TestingJBoss
3. Clustering: Master-Slave Identification Problem

nprasanna Apr 15, 2011 9:27 AM (in response to wdfink)

Hi Wolf-Dieter Fink,
    Thanks for the reply. I tried the JGroups test that you suggested (http://community.jboss.org/wiki/TestingJBoss). I ran it on the two Linux VMs, but the result was the same as described in my initial post:

Machine A was started first. Once the JGroups test command had established a cluster successfully, I started Machine B. Machine A saw Machine B, but not as a cluster member. It logged repeated warnings similar to the one in my first post:

org.jgroups.protocols.pbcast.NAKACK handleMessage WARNING: Machine A's ip:32770] discarded message from non-member Machine B's ip:32770, my view is [Machine A's ip:32770|0] [Machine A's ip:32770]

Machine A's IP: 192.168.41.132
Machine B's IP: 192.168.41.134

NOW, if I start Machine B first and then Machine A, the output is not as expected. The same warning messages appear in Machine A, not in Machine B. Machine B just says it has started a cluster of its own.

Another chat-style test is described on this page: http://www.jgroups.org/manual/html/ch02.html
I ran this test too.
Machine A was the receiver. The command was:
/usr/lib/jvm/java/bin/java -cp lib/concurrent.jar:server/testprofile/lib/jgroups.jar:common/lib/commons-logging.jar org.jgroups.tests.McastReceiverTest -mcast_addr 224.10.10.10 -port 5555 -bind_addr 192.168.41.132

Machine B was the sender. The command was:
/usr/lib/jvm/java/bin/java -cp lib/concurrent.jar:server/testprofile/lib/jgroups.jar:common/lib/commons-logging.jar org.jgroups.tests.McastSenderTest -mcast_addr 224.10.10.10 -port 5555 -bind_addr 192.168.41.134

Whatever I sent from B was received by A. But when I made A the sender, B received nothing at all. So multicast works from B to A but not from A to B, which is the same problem as before.

I tried the tests with -Djava.net.preferIPv4Stack=true and -ttl=32 (in the sender) as well. Still no luck.
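To check both directions in one run without swapping sender and receiver roles, a symmetric probe can be started on each machine. The following is only a sketch under assumptions: the class name McastProbe and the "ip#sequence" payload format are invented for this example, and it uses the plain java.net.MulticastSocket API rather than the JGroups test classes.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class McastProbe {
    // Payload format invented for this sketch: "<sender-ip>#<sequence>"
    static String payload(String senderIp, int seq) {
        return senderIp + "#" + seq;
    }

    // Extracts the sender part of a payload built by payload()
    static String senderOf(String payload) {
        int i = payload.lastIndexOf('#');
        return i < 0 ? payload : payload.substring(0, i);
    }

    // Usage: java McastProbe <mcast_addr> <port> <bind_addr>
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName(args[0]);
        int port = Integer.parseInt(args[1]);
        InetAddress bind = InetAddress.getByName(args[2]);
        NetworkInterface nif = NetworkInterface.getByInetAddress(bind);
        if (nif == null) {
            throw new IllegalArgumentException("no interface has address " + args[2]);
        }
        String self = bind.getHostAddress();

        try (MulticastSocket sock = new MulticastSocket(port)) {
            sock.setNetworkInterface(nif);   // send through the chosen interface
            sock.setTimeToLive(2);
            sock.joinGroup(new InetSocketAddress(group, port), nif);

            byte[] buf = new byte[256];
            for (int seq = 0; seq < 30; seq++) {
                byte[] out = payload(self, seq).getBytes(StandardCharsets.UTF_8);
                sock.send(new DatagramPacket(out, out.length, group, port));

                // Listen for about one second, then send the next probe
                long deadline = System.currentTimeMillis() + 1000;
                long left;
                while ((left = deadline - System.currentTimeMillis()) > 0) {
                    sock.setSoTimeout((int) left);
                    DatagramPacket in = new DatagramPacket(buf, buf.length);
                    try {
                        sock.receive(in);
                    } catch (SocketTimeoutException e) {
                        break;
                    }
                    String msg = new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
                    if (!senderOf(msg).equals(self)) {   // skip our own looped-back probes
                        System.out.println("received " + msg);
                    }
                }
            }
        }
    }
}
```

Run it on both machines with the same group and port and each machine's own bind address. If only one side prints "received" lines, the multicast path is one-way, which points at the network or a host firewall rather than at JBoss.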

I checked the wiki (http://community.jboss.org/wiki/JGroups) and the FAQ (http://community.jboss.org/docs/DOC-9730) pages, but to no avail.

I guess the problem is in the underlying network setup, so I'm including the ifconfig output of both machines here in the hope that it helps with the diagnosis.

Machine A:
eth0      Link encap:Ethernet HWaddr 00:0C:29:64:91:1E
            inet addr:192.168.41.132 Bcast:192.168.41.255 Mask:255.255.255.0
            inet6 addr: fe80::20c:29ff:fe64:911e/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
            RX packets:2079 errors:0 dropped:0 overruns:0 frame:0
            TX packets:1784 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:291105 (284.2 KiB) TX bytes:232352 (226.9 KiB)
            Interrupt:67 Base address:0x2024

lo        Link encap:Local Loopback
           inet addr:127.0.0.1 Mask:255.0.0.0
           inet6 addr: ::1/128 Scope:Host
           UP LOOPBACK RUNNING MTU:16436 Metric:1
           RX packets:260 errors:0 dropped:0 overruns:0 frame:0
           TX packets:260 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:24688 (24.1 KiB) TX bytes:24688 (24.1 KiB)

Machine B:
eth0      Link encap:Ethernet HWaddr 00:0C:29:0F:98:33
             inet addr:192.168.41.134 Bcast:192.168.41.255 Mask:255.255.255.0
             inet6 addr: fe80::20c:29ff:fe0f:9833/64 Scope:Link
             UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
             RX packets:2058 errors:0 dropped:0 overruns:0 frame:0
             TX packets:699 errors:0 dropped:0 overruns:0 carrier:0
             collisions:0 txqueuelen:1000
             RX bytes:280247 (273.6 KiB) TX bytes:95707 (93.4 KiB)
             Interrupt:67 Base address:0x2024

lo        Link encap:Local Loopback
           inet addr:127.0.0.1 Mask:255.0.0.0
           inet6 addr: ::1/128 Scope:Host
           UP LOOPBACK RUNNING MTU:16436 Metric:1
           RX packets:45 errors:0 dropped:0 overruns:0 frame:0
           TX packets:45 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:2648 (2.5 KiB) TX bytes:2648 (2.5 KiB)

Thanks in Advance.
4. Clustering: Master-Slave Identification Problem

wdfink Apr 15, 2011 12:55 PM (in response to nprasanna)

If you start the JGroups test twice on one system, does it work?

I suppose you are right and the network is the problem, but I'm not that familiar with multicast configuration.
What you might try is using -b 0.0.0.0, or changing the mcast_addr to a different one, e.g. from 224.* to 239.*, I think.