1. Re: Network Loss in a 1 to n Node Cluster
brian.stansberry Aug 31, 2006 1:59 PM (in response to jbirkenmaier)
First problem: TreeCacheListener.viewChange(View new_view) passes you a JGroups View object whenever there is a cluster topology change.
Second problem: No, we have nothing like that.
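To work out which node left from inside a viewChange callback, one approach is to keep the previous member list and diff it against the new one. The sketch below is only an illustration: ViewDiff, departed and joined are hypothetical names, not part of JBoss Cache or JGroups, and plain strings stand in for the addresses you would read from the View object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of JBoss Cache or JGroups. */
public class ViewDiff {

    /** Members that were in the previous view but are missing from the current one. */
    public static List<String> departed(List<String> previous, List<String> current) {
        List<String> gone = new ArrayList<>(previous);
        gone.removeAll(current);
        return gone;
    }

    /** Members that are in the current view but were not in the previous one. */
    public static List<String> joined(List<String> previous, List<String> current) {
        List<String> added = new ArrayList<>(current);
        added.removeAll(previous);
        return added;
    }

    public static void main(String[] args) {
        // Placeholder addresses; a real listener would take these from the view it receives.
        List<String> before = Arrays.asList("node1:1099", "node2:1099");
        List<String> after = Arrays.asList("node1:1099");

        System.out.println("Departed: " + departed(before, after)); // prints "Departed: [node2:1099]"
        System.out.println("Joined: " + joined(before, after));     // prints "Joined: []"
    }
}
```

A listener would store the member list after each callback so the next view change can be compared against it.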
2. Re: Network Loss in a 1 to n Node Cluster
jbirkenmaier Aug 31, 2006 3:17 PM (in response to jbirkenmaier)
Thanks for the quick reply. Regarding the first problem: I see the view change on node 1 when I Ctrl-C JBoss on node 2. However, I don't see the view change when I unplug the network cable.
Here is what's logged when I unplug the cable:
10:41:27,090 INFO [dragoneyes] (UpHandler (STATE_TRANSFER)) Suspected member: 192.168.69.253:33013
10:41:27,092 INFO [dragoneyes] (UpHandler (STATE_TRANSFER)) New cluster view for partition dragoneyes (id: 2, delta: -1) : [192.168.69.122:1099]
10:41:27,092 INFO [dragoneyes] (AsynchViewChangeHandler Thread) I am (192.168.69.122:1099) received membershipChanged event:
10:41:27,092 INFO [dragoneyes] (AsynchViewChangeHandler Thread) Dead members: 1 ([192.168.69.253:1099])
10:41:27,092 INFO [dragoneyes] (AsynchViewChangeHandler Thread) New Members : 0 ([])
10:41:27,093 INFO [dragoneyes] (AsynchViewChangeHandler Thread) All Members : 1 ([192.168.69.122:1099])
Here is what's logged when the cable is plugged back in:
10:42:12,912 INFO [dragoneyes] (UpHandler (STATE_TRANSFER)) New cluster view for partition dragoneyes (id: 3, delta: 1) : [192.168.69.122:1099, 192.168.69.253:1099]
10:42:12,914 INFO [dragoneyes] (AsynchViewChangeHandler Thread) Merging partitions...
10:42:12,914 INFO [dragoneyes] (AsynchViewChangeHandler Thread) Dead members: 0
10:42:12,914 INFO [dragoneyes] (AsynchViewChangeHandler Thread) Originating groups: [[192.168.69.122:32809|2] [192.168.69.122:32809], [192.168.69.253:33013|2] [192.168.69.253:33013]]
JBoss/JGroups sees the loss and restoration of the network. Is there no way to hook into UpHandler or AsynchViewChangeHandler to catch the notification? The viewChange method just isn't doing it.
3. Re: Network Loss in a 1 to n Node Cluster
brian.stansberry Aug 31, 2006 4:16 PM (in response to jbirkenmaier)
OK, your problem is that the JGroups channel your cache instance is using isn't detecting the unplugging of the cable. I bet if you wait about a minute, it will.
The logging you posted is actually for a completely different channel. Technically it's a different cluster, even though on the surface it seems like there is only one "cluster".
There is a semi-complicated mechanism for registering for the view change events you posted, but that's really not the right thing to do. The right thing is to ensure your JBoss Cache channel detects the cable unplug.
In your cache config file, find the ClusterConfig element and replace FD with:
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
That's the config we're starting to use everywhere now. See http://wiki.jboss.org/wiki/Wiki.jsp?page=FDVersusFD_SOCK for more details.
4. Re: Network Loss in a 1 to n Node Cluster
jbirkenmaier Sep 1, 2006 9:49 AM (in response to jbirkenmaier)
I read the Wiki page and changed the cluster-service.xml file. After 2 hours, the tree cache was notified about the loss of a cluster node and it in turn notified my application. However, 2 hours is much too long. I need to know a lot sooner. Is there a way to change the timeout value from 2 hours to, say, 30 seconds? Right now, the entry looks like what you suggested:
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
The documentation says that setting this timeout value should override the system default of 2 hours, but it still waited the 2 hours when it should have waited 50 seconds (?).
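For reference, the 50-second estimate is just the two FD attributes multiplied together. The sketch below assumes FD suspects a member after roughly timeout × max_tries milliseconds; that is an approximation, not an exact JGroups guarantee, and FdTiming is a hypothetical helper. The 6000 ms value is only an example of a setting that would land near the 30-second target.

```java
/** Hypothetical helper for estimating FD detection time; not part of JGroups. */
public class FdTiming {

    /**
     * Rough detection time in seconds, assuming a member is suspected
     * after about timeout (ms) multiplied by max_tries.
     */
    public static long approxDetectionSeconds(long timeoutMs, int maxTries) {
        return (timeoutMs * maxTries) / 1000;
    }

    public static void main(String[] args) {
        // Values from the config above: timeout="10000", max_tries="5".
        System.out.println(approxDetectionSeconds(10000, 5)); // prints 50

        // Example values (an assumption, not from the thread) aiming at about 30 seconds.
        System.out.println(approxDetectionSeconds(6000, 5)); // prints 30
    }
}
```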
5. Re: Network Loss in a 1 to n Node Cluster
brian.stansberry Sep 1, 2006 10:20 AM (in response to jbirkenmaier)
The cluster-service.xml file does not affect your tree cache in any way. Completely unrelated. But changing that one as well was good :-)
The tree cache you're using for the shared data must have a config file as well. (Unless you're configuring everything programmatically, in which case I'd say use a config file.) You need to apply the FD/FD_SOCK change to that file.
6. Re: Network Loss in a 1 to n Node Cluster
jbirkenmaier Sep 1, 2006 12:29 PM (in response to jbirkenmaier)
OK, once I changed the proper tree cache deployment file, it works. Thanks for your help!!