1 Reply Latest reply on Jul 28, 2014 7:07 AM by rvansa

    Infinispan/JGroups difference between discovery of nodes and transport of messages

    mavo1986

      Hello Community,

       

      The reason why I'm asking this is a little strange and hard to explain ^^

       

      I'm trying to produce a split-brain scenario to test how Infinispan handles it and what I have to do to avoid data loss / inconsistency. This is for my bachelor thesis.

       

      Version: Infinispan 6.0.2 Final

       

      I'm using a UDP-based transport. What I have done is to set up a "proxy" which copies the DatagramPackets from port 45880 (where a single node (A) is running) to 45881 (where the remaining nodes (B and C) are). This works for me at the moment. But my problem is that once I have started nodes B and C, they can find each other and can use submitEverywhere(), and all of this is fine and as needed. Now, when I start my "proxy" and then node A, A will join the cluster with the message "WARNING: JGRP000010: message from 192.178.168.68:45881 has a different version (9.20.45) than ours (3.4.3); discarding message", and the other nodes will receive a new cluster view with all three nodes included, as expected.

      So far, so good. But now, when I shut down the "proxy", the connections between all of the nodes are still fine and up. They can distribute tasks and submitEverywhere() will reach all nodes... but why? Where is my problem, or did I misunderstand something?

      Is the mcast_port="${jgroups.udp.mcast_port:45880}" from jgroups.xml (file attached) only used for discovering new nodes? And over which port are the "normal" messages sent through the cluster?

       

      Hopefully someone can help me :-)

      Regards,

      Markus

       

      You can find my config files attached to this...

        • 1. Re: Infinispan/JGroups difference between discovery of nodes and transport of messages
          rvansa

          Markus Vogt wrote:

           

          Hello Community,

           

          The reason why I'm asking this is a little strange and hard to explain ^^

           

          I'm trying to produce a split-brain scenario to test how Infinispan handles it and what I have to do to avoid data loss / inconsistency. This is for my bachelor thesis.

           

           

          Beware that split-brain handling is NOT implemented in Infinispan 6.0 - it's on the roadmap for Infinispan 7.0.

           

           

          Version: Infinispan 6.0.2 Final

           

          I'm using a UDP-based transport. What I have done is to set up a "proxy" which copies the DatagramPackets from port 45880 (where a single node (A) is running) to 45881 (where the remaining nodes (B and C) are). This works for me at the moment. But my problem is that once I have started nodes B and C, they can find each other and can use submitEverywhere(), and all of this is fine and as needed. Now, when I start my "proxy" and then node A, A will join the cluster with the message "WARNING: JGRP000010: message from 192.178.168.68:45881 has a different version (9.20.45) than ours (3.4.3); discarding message", and the other nodes will receive a new cluster view with all three nodes included, as expected.

           

          Each message contains the JGroups version; the fact that it writes out 9.20.45 means that you have not copied the message correctly - the first two bytes of the message contain some rubbish.
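
          For illustration, a correct one-way forwarder could look roughly like this (a minimal sketch; the class name and multicast address are placeholders, only the two ports come from your setup):

              import java.net.DatagramPacket;
              import java.net.DatagramSocket;
              import java.net.InetAddress;
              import java.net.MulticastSocket;

              public class UdpForwarder {
                  public static void main(String[] args) throws Exception {
                      // Placeholder mcast_addr - use the one from your jgroups.xml.
                      InetAddress group = InetAddress.getByName("228.6.7.8");
                      try (MulticastSocket in = new MulticastSocket(45880);
                           DatagramSocket out = new DatagramSocket()) {
                          in.joinGroup(group);
                          byte[] buf = new byte[65535];
                          while (true) {
                              DatagramPacket packet = new DatagramPacket(buf, buf.length);
                              in.receive(packet);
                              // Forward exactly the received slice (offset + length).
                              // Copying from the wrong offset, or prepending anything,
                              // corrupts the first bytes, where JGroups stores its version.
                              out.send(new DatagramPacket(packet.getData(), packet.getOffset(),
                                      packet.getLength(), group, 45881));
                          }
                      }
                  }
              }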

          If you want to simulate a disjoint connection between nodes, there's an even easier way: you can dynamically insert the DISCARD protocol with up=1 and down=1 right above the UDP protocol - that way you'll block all communication between nodes without any such proxies.
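
          Just as a sketch of what that could look like with the Infinispan 6 / JGroups 3.4 APIs (the class and method names here are mine, and I'm assuming you can reach the channel through JGroupsTransport):

              import org.infinispan.manager.EmbeddedCacheManager;
              import org.infinispan.remoting.transport.jgroups.JGroupsTransport;
              import org.jgroups.Channel;
              import org.jgroups.protocols.DISCARD;
              import org.jgroups.protocols.UDP;
              import org.jgroups.stack.ProtocolStack;

              public class SplitBrainSimulator {

                  // Cut this node off from the cluster: drop all traffic
                  // directly above the transport protocol.
                  public static void isolate(EmbeddedCacheManager cacheManager) throws Exception {
                      JGroupsTransport transport = (JGroupsTransport) cacheManager.getTransport();
                      Channel channel = transport.getChannel();
                      DISCARD discard = new DISCARD();
                      discard.setUpDiscardRate(1.0);   // up=1: drop everything received
                      discard.setDownDiscardRate(1.0); // down=1: drop everything sent
                      channel.getProtocolStack().insertProtocol(discard, ProtocolStack.ABOVE, UDP.class);
                  }

                  // Heal the partition again by removing the protocol.
                  public static void heal(EmbeddedCacheManager cacheManager) throws Exception {
                      JGroupsTransport transport = (JGroupsTransport) cacheManager.getTransport();
                      transport.getChannel().getProtocolStack().removeProtocol("DISCARD");
                  }
              }

          After you remove DISCARD again, the merge protocol in your stack (MERGE2/MERGE3) should eventually merge the partitions back into a single view.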

           

          So far, so good. But now, when I shut down the "proxy", the connections between all of the nodes are still fine and up. They can distribute tasks and submitEverywhere() will reach all nodes... but why? Where is my problem, or did I misunderstand something?

          Is the mcast_port="${jgroups.udp.mcast_port:45880}" from jgroups.xml (file attached) only used for discovering new nodes? And over which port are the "normal" messages sent through the cluster?

           

          mcast_port/mcast_addr is used for multicast communication. However, a lot of messages are sent over unicast, which uses bind_addr/bind_port - you would need to proxy these as well. As I've said, DISCARD is a simpler solution. If you use FD_SOCK in the protocol stack, also don't forget to call FD_SOCK.stopServerSocket(). And the last issue could be with the RELAY/RELAY2 protocols, which create a new channel - but you probably won't have to use RELAY*.
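
          For completeness, stopping the FD_SOCK server socket could look like this (again just a sketch against JGroups 3.4; the class and method names are mine):

              import org.jgroups.Channel;
              import org.jgroups.protocols.FD_SOCK;

              public class FdSockStopper {
                  // FD_SOCK runs its failure detection over a separate TCP server
                  // socket, which a DISCARD above UDP does not touch; closing it
                  // lets the other nodes suspect this node as they would after
                  // a real crash.
                  public static void stopFdSock(Channel channel) {
                      FD_SOCK fdSock = (FD_SOCK) channel.getProtocolStack().findProtocol(FD_SOCK.class);
                      if (fdSock != null)
                          fdSock.stopServerSocket(false); // non-graceful close, like a crash
                  }
              }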

           

          One last thing: there's already a tool for testing (not only) node crashes and split brain, RadarGun. Check it out, at least for examples of how to simulate crashes (see the class InfinispanKillableLifecycle).