1 Reply Latest reply on Nov 9, 2005 6:57 AM by manik

    New member servers loop through socket address errors

    fahrv

      My situation is this: I was given an install howto and an existing cluster of four JBoss 4.0.2 servers, and was told to add four new servers to that cluster.

      However, when the new servers come up, they just keep looping through the same errors from JGroups and the TreeCache. In this log snippet, taken from server35p, server35p is the new server I'm trying to bring online, and server07p is the existing master.

      2005-11-08 13:19:03,386 ERROR [org.jgroups.protocols.FD_SOCK] socket address for server07p:43430 could not be fetched, retrying
      2005-11-08 13:19:11,694 ERROR [org.jgroups.protocols.FD_SOCK] socket address for server07p:43430 could not be fetched, retrying
      2005-11-08 13:19:20,002 ERROR [org.jgroups.protocols.FD_SOCK] socket address for server07p:43430 could not be fetched, retrying
      2005-11-08 13:19:31,978 WARN [org.jgroups.protocols.pbcast.GMS] checkSelfInclusion() failed, server35p:33178 is not a member of view [server07p:43430|48] [server07p:43430, server21p:38164, server22p:37383]; discarding view
      2005-11-08 13:19:31,978 WARN [org.jgroups.protocols.pbcast.GMS] I (server35p:33178) am being shunned, will leave and rejoin group (prev_members are [server07p:43430 server21p:38164 server22p:37383 server35p:33178 ])
      2005-11-08 13:19:32,793 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is server35p:33181
      -------------------------------------------------------
      2005-11-08 13:19:32,797 INFO [org.jboss.cache.TreeCache] viewAccepted(): new members: [server07p:43430, server21p:38164, server22p:37383, server35p:33181]
      2005-11-08 13:19:35,798 ERROR [org.jgroups.protocols.FD_SOCK] received null cache; retrying
      2005-11-08 13:19:39,302 ERROR [org.jgroups.protocols.FD_SOCK] received null cache; retrying
      2005-11-08 13:19:42,806 ERROR [org.jgroups.protocols.FD_SOCK] received null cache; retrying
      2005-11-08 13:19:43,310 INFO [org.jboss.cache.TreeCache] received the state (size=192 bytes)
      2005-11-08 13:19:43,310 INFO [org.jboss.cache.TreeCache] transient state: 140 bytes
      2005-11-08 13:19:43,310 INFO [org.jboss.cache.TreeCache] setting transient state
      2005-11-08 13:19:43,311 DEBUG [org.jboss.cache.lock.IdentityLock] Cache instance is null. Use default lock strategy
      2005-11-08 13:19:43,311 INFO [org.jboss.cache.TreeCache] locking the old tree
      2005-11-08 13:19:43,311 INFO [org.jboss.cache.TreeCache] locking the old tree was successful
      2005-11-08 13:19:43,311 INFO [org.jboss.cache.TreeCache] setting the transient state was successful
      2005-11-08 13:19:43,311 INFO [org.jboss.cache.TreeCache] forcing release of all locks in old tree
      2005-11-08 13:19:49,615 ERROR [org.jgroups.protocols.FD_SOCK] socket address for server07p:43430 could not be fetched, retrying
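
      To make the loop easier to see, here is a rough sketch that pulls the interesting events out of a log excerpt like the one above: the address announced by GMS, the address that was shunned, and the number of FD_SOCK errors. The function name and the regular expressions are my own and are based only on the line formats shown in this snippet, so treat them as assumptions rather than anything JBoss or JGroups provides.

```python
import re

# Patterns guessed from the log lines in this post (assumptions, not a JGroups API).
GMS_ADDR = re.compile(r"GMS: address is (\S+):(\d+)")
SHUNNED = re.compile(r"I \((\S+):(\d+)\) am being shunned")
FD_SOCK_ERROR = "ERROR [org.jgroups.protocols.FD_SOCK]"


def summarize(log_text):
    """Return (announced addresses, shunned addresses, FD_SOCK error count)."""
    announced = [f"{host}:{port}" for host, port in GMS_ADDR.findall(log_text)]
    shunned = [f"{host}:{port}" for host, port in SHUNNED.findall(log_text)]
    fd_errors = sum(1 for line in log_text.splitlines() if FD_SOCK_ERROR in line)
    return announced, shunned, fd_errors
```

      Run against the excerpt above, this should show server35p being shunned under one port and announcing itself again under a different one, which matches the changing port numbers in the two logs.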
      


      Meanwhile on server07p I see:
      2005-11-08 13:24:22,965 DEBUG [org.jboss.webservice.handler.HandlerChainBaseImpl] Enter: handleResponse
      2005-11-08 13:24:22,965 DEBUG [org.jboss.webservice.handler.HandlerChainBaseImpl] Exit: handleResponse with status: true
      2005-11-08 13:24:28,810 INFO [org.jboss.cache.TreeCache] viewAccepted(): new members: [server07p:43430, server21p:38164, server22p:37383]
      2005-11-08 13:24:29,627 INFO [org.jboss.cache.TreeCache] viewAccepted(): new members: [server07p:43430, server21p:38164, server22p:37383, server35p:33190]
      


      Except for the bind_addr, the /server/all/deploy/cluster-service.xml file is the same across all systems. It's just the default file, although I'll paste it if asked.

      Testing the connection by telnetting to server07p:43430 from server35p doesn't work. Should it?
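
      For anyone who wants to repeat the connection test without telnet, this is a minimal sketch of the same check as a script. It only tells you whether something accepts a TCP connection on that port. I am assuming here that the port is a TCP listener, which I have not confirmed; if the cluster transport is UDP, a failed TCP connect may not mean anything. The helper name can_connect is my own.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call, using the host and port from the log above:
# can_connect("server07p", 43430)
```

      Running it from server35p against server07p, and then in the other direction, would at least show whether the two hosts can reach each other over TCP on a given port.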

      If you guys can even point me toward the right documentation, I'd be thrilled. I've never worked with JBoss before this week, so I apologize in advance for my ignorance on this subject.