7 Replies Latest reply on Jan 25, 2007 8:04 AM by Bela Ban

    handleJoin(node:port) failed, retrying

    Torsten Römer Newbie

      I have set up a cluster of four nodes with a custom partition name, following the documentation. I also have a test server running in the same subnet, but with the default partition name, DefaultPartition.

      All four production nodes and the test server were running fine until node1 of the four production nodes was taken down. It would not come up again and kept logging the warning "handleJoin(node1:port) failed, retrying". I then took down the other three production nodes as well, but after that none of the nodes would come up, each failing with the same warning.

      Finally I took down the test server as well, and then I could start all four production nodes normally.

      Now my question is: how can I avoid this? Should each cluster partition run on its own network? If so, what is the point of the partition name?

      I read in a post describing a similar problem that using TCP instead of UDP solved it. Should I do that as well? If so, what should the "initial_hosts" be: "thishost + othernodes" on each node?
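
      To make the question concrete, this is roughly the TCP variant I have in mind. It is only a sketch: the host names node1 to node4 and port 7800 are placeholders, the timeout and member-count values are guesses, and the rest of the protocol stack is omitted. What I am unsure about is whether this same list, including the local host, belongs on every node.

```xml
<!-- Sketch only: placeholder hosts and ports, remaining protocols omitted -->
<Config>
    <!-- TCP transport instead of UDP multicast -->
    <TCP start_port="7800"/>
    <!-- Static discovery: each entry is host[port] -->
    <TCPPING initial_hosts="node1[7800],node2[7800],node3[7800],node4[7800]"
             port_range="1"
             timeout="3000"
             num_initial_members="4"/>
    <!-- ... other protocols as in the existing UDP stack ... -->
</Config>
```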

      Thanks in advance