As far as I know, the join retry/timeout problem in JGroups was fixed in 2.8.0 (by introducing logical addresses and removing shunning?).
It looks like the cause lies in the coordinator to which the new member sends its join request. The rejoin attempts are retried after the coordinator logs the lines below, and this does not happen every time. The following lines were printed after a view change:
2011-02-11 17:23:37,974 WARN [org.jgroups.protocols.pbcast.GMS] 192.168.12.3:7800:1331-25091: failed to collect all ACKs (expected=1) for view [192.168.12.3:7800:1331-25091|4] [192.168.12.3:7800:1331-25091] after 2000ms, missing ACKs from [192.168.12.3:7800:1331-25091]
2011-02-11 17:23:39,977 WARN [org.jgroups.protocols.pbcast.FLUSH] 192.168.12.3:7800:1331-25091: waiting for UNBLOCK timed out after 2000 ms
Why would a node fail to get an ACK from itself? Anyway, I'll try setting -Djgroups.bind.address.
Sorry again; it was caused by a bug in the host application code. A view change listener registered on the coordinator node blocked while waiting for a thread it had started (to perform some application operations for the view change) to complete.
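For anyone hitting the same "missing ACKs from [itself]" warning: my reading is that the coordinator delivers the new view to the application's listener (e.g. MembershipListener.viewAccepted(View)) and its own ACK only counts once that callback returns, so a callback that blocks stalls the coordinator's self-ACK. The sketch below is a plain-Java simulation of that assumption, not JGroups code: ViewAckDemo, ViewListener, deliverViewAndAwaitAck, blockingListener and handOffListener are all hypothetical names, and the view is simplified to a String. It contrasts a listener that joins its worker thread (the bug above) with one that hands the work to an executor and returns immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ViewAckDemo {

    // Hypothetical stand-in for a view change callback such as viewAccepted(View).
    public interface ViewListener {
        void viewAccepted(String view);
    }

    // Simulates the assumed ordering: the view is handed to the application
    // callback on a "protocol" thread, and the ACK is produced only after the
    // callback returns. Returns true if the ACK arrived within timeoutMs.
    public static boolean deliverViewAndAwaitAck(ViewListener listener, String view, long timeoutMs) {
        CountDownLatch ack = new CountDownLatch(1);
        Thread protocolThread = new Thread(() -> {
            listener.viewAccepted(view); // application callback runs first
            ack.countDown();             // ACK only after the callback returns
        });
        protocolThread.setDaemon(true);
        protocolThread.start();
        try {
            return ack.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Anti-pattern: the callback starts a worker and waits for it to finish,
    // blocking the thread that delivered the view.
    public static ViewListener blockingListener(long workMs) {
        return view -> {
            Thread worker = new Thread(() -> sleepQuietly(workMs));
            worker.start();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Safer pattern: submit the work to an executor and return at once.
    public static ViewListener handOffListener(ExecutorService executor, long workMs) {
        return view -> executor.submit(() -> sleepQuietly(workMs));
    }

    // Placeholder for the application's real view change work.
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        boolean blockingAck = deliverViewAndAwaitAck(blockingListener(1500), "view-4", 300);
        boolean handOffAck = deliverViewAndAwaitAck(handOffListener(executor, 1500), "view-4", 300);
        System.out.println("blocking listener ACKed in time: " + blockingAck);
        System.out.println("hand-off listener ACKed in time: " + handOffAck);
        executor.shutdown();
    }
}
```

In this simulation the blocking listener misses the 300 ms ACK window while the hand-off listener meets it. The design point is simply to keep view callbacks short and non-blocking and to move any waiting onto a thread the group communication stack does not depend on.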