1 Reply Latest reply on Jun 17, 2015 8:18 AM by ramkama87

    JBOSS 5.1.0 GA - Cluster won't form, depending on boot order

    chrishowell

      Hello,

       

      An application that I work with runs inside a JBOSS instance, and I was recently asked to add a second instance and cluster them.  After configuring the second instance, when I start it, it refuses to recognize the existing instance and no cluster forms; however, if I start the original instance while the second instance is already running, the cluster does form.  I suspected that this was the result of a firewall rule, so after going through the port list and adding every exception I could think of, I disabled the firewalls on both servers.  The problem persists.  Next, I used a network monitor to look for unexpected packet traffic, but I'm not seeing anything that wasn't already showing up in the logs.  Both instances are on Windows Server 2008.

       

      The following is an excerpt of the logs when the newly added cluster server is started second:

      INFO   | jvm 1    | 2014/09/03 10:38:20 | 10:38:20,297 INFO  [TitaniumJBossPartition_DEV] Initializing partition TitaniumJBossPartition_DEV

      INFO   | jvm 1    | 2014/09/03 10:38:20 | 10:38:20,344 INFO  [STDOUT]

      INFO   | jvm 1    | 2014/09/03 10:38:20 | ---------------------------------------------------------

      INFO   | jvm 1    | 2014/09/03 10:38:20 | GMS: address is 172.22.35.78:54530 (cluster=TitaniumJBossPartition_DEV)

      INFO   | jvm 1    | 2014/09/03 10:38:20 | ---------------------------------------------------------

      INFO   | jvm 1    | 2014/09/03 10:38:20 | 10:38:20,391 INFO  [PlatformMBeanServerRegistration] JBossCache MBeans were successfully registered to the platform mbean server.

      INFO   | jvm 1    | 2014/09/03 10:38:20 | 10:38:20,453 INFO  [STDOUT]

      INFO   | jvm 1    | 2014/09/03 10:38:20 | ---------------------------------------------------------

      INFO   | jvm 1    | 2014/09/03 10:38:20 | GMS: address is 172.22.35.78:54530 (cluster=TitaniumJBossPartition_DEV-HAPartitionCache)

      INFO   | jvm 1    | 2014/09/03 10:38:20 | ---------------------------------------------------------

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,388 INFO  [TitaniumJBossPartition_DEV] Number of cluster members: 1

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,388 INFO  [TitaniumJBossPartition_DEV] Other members: 0

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,466 INFO  [RPCManagerImpl] Received new cluster view: [172.22.35.78:54530|0] [172.22.35.78:54530]

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,466 INFO  [RPCManagerImpl] Cache local address is 172.22.35.78:54530

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,466 INFO  [RPCManagerImpl] state was retrieved successfully (in 2.01 seconds)

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,481 INFO  [ComponentRegistry] JBoss Cache version: JBossCache 'Cascabel' 3.1.0.GA

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,481 INFO  [TitaniumJBossPartition_DEV] Fetching serviceState (will wait for 30000 milliseconds):

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,481 INFO  [TitaniumJBossPartition_DEV] State could not be retrieved (we are the first member in group)

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,575 INFO  [HANamingService] Started HAJNDI bootstrap; jnpPort=1100, backlog=50, bindAddress=/172.22.35.78

      INFO   | jvm 1    | 2014/09/03 10:38:22 | 10:38:22,575 INFO  [DetachedHANamingService$AutomaticDiscovery] Listening on /172.22.35.78:1102, group=239.255.100.199, HA-JNDI address=172.22.35.78:1100

      ...

      INFO   | jvm 1    | 2014/09/03 10:39:22 | 10:39:22,138 WARN  [NAKACK] 172.22.35.78:54530] discarded message from non-member 172.22.35.87:63947, my view is [172.22.35.78:54530|0] [172.22.35.78:54530]

      INFO   | jvm 1    | 2014/09/03 10:39:22 | 10:39:22,793 WARN  [NAKACK] 172.22.35.78:54530] discarded message from non-member 172.22.35.87:63947, my view is [172.22.35.78:54530|0] [172.22.35.78:54530]

      INFO   | jvm 1    | 2014/09/03 10:39:22 | 10:39:22,856 WARN  [NAKACK] 172.22.35.78:54530] discarded message from non-member 172.22.35.87:63947, my view is [172.22.35.78:54530|0] [172.22.35.78:54530]

      INFO   | jvm 1    | 2014/09/03 10:39:23 | 10:39:23,230 WARN  [NAKACK] 172.22.35.78:54530] discarded message from non-member 172.22.35.87:63947, my view is [172.22.35.78:54530|0] [172.22.35.78:54530]
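      To make the warnings above easier to read, here is a minimal sketch that pulls the local address, the rejected sender, and the current view out of each NAKACK line. The regular expression is fitted only to the line format in this excerpt, and the names `parse_nakack_warning` and `unknown_senders` are made up for illustration; other JGroups versions may log a different layout.

```python
import re

# One warning copied from the excerpt above, used as sample input.
SAMPLE = (
    "INFO   | jvm 1    | 2014/09/03 10:39:22 | 10:39:22,138 WARN  [NAKACK] "
    "172.22.35.78:54530] discarded message from non-member 172.22.35.87:63947, "
    "my view is [172.22.35.78:54530|0] [172.22.35.78:54530]"
)

# Pattern fitted to the format seen in this log only (an assumption, not a
# documented JGroups log grammar).
NAKACK_RE = re.compile(
    r"\[NAKACK\]\s+(?P<local>\S+?)\]\s+discarded message from non-member\s+"
    r"(?P<sender>[\d.]+:\d+),\s+my view is\s+"
    r"\[(?P<coord>[\d.]+:\d+)\|(?P<view_id>\d+)\]\s+"
    r"\[(?P<members>[^\]]*)\]"
)


def parse_nakack_warning(line):
    """Return the fields of one 'discarded message' warning, or None."""
    match = NAKACK_RE.search(line)
    if match is None:
        return None
    members = [m.strip() for m in match.group("members").split(",") if m.strip()]
    return {
        "local": match.group("local"),
        "sender": match.group("sender"),
        "view_id": int(match.group("view_id")),
        "members": members,
    }


def unknown_senders(lines):
    """Collect senders that were rejected because they are not in the view."""
    senders = set()
    for line in lines:
        parsed = parse_nakack_warning(line)
        if parsed and parsed["sender"] not in parsed["members"]:
            senders.add(parsed["sender"])
    return senders
```

      Run against the excerpt, this reports 172.22.35.87:63947 as the only rejected sender, while the view stays at id 0 with 172.22.35.78:54530 as its sole member. In other words, packets from the other server do reach this node, but the two nodes never merge into one view.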

       

      The application parameters are identical on both servers, except for jboss.messaging.ServerPeerID and the bind IP.

      wrapper.java.additional.11=-Djboss.messaging.ServerPeerID=2    (this is 1 on the other server)

      wrapper.app.parameter.1=org.jboss.Main

      wrapper.app.parameter.2=-c

      wrapper.app.parameter.3=TitaniumORE

      wrapper.app.parameter.4=-b

      wrapper.app.parameter.5=172.22.35.78

      wrapper.app.parameter.6=-g

      wrapper.app.parameter.7=TitaniumJBossPartition_DEV

      wrapper.app.parameter.8=-u

      wrapper.app.parameter.9=239.255.100.199
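      Since discovery depends on the multicast group passed with -u, one way to check the multicast path in each direction, independently of JBoss, is a small probe like the sketch below. It is only a sketch under assumptions: the port 45999 is an arbitrary test port chosen here (not a port JBoss is known to use in this setup), the function names are made up, and the interface address must be the bind IP of the host where the function runs.

```python
import socket
import struct

GROUP = "239.255.100.199"  # multicast address from the -u parameter above
PORT = 45999               # hypothetical test port, not taken from the JBoss config


def build_mreq(group, iface_ip):
    """Pack the ip_mreq structure used to join `group` on one local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def listen(iface_ip, group=GROUP, port=PORT, timeout=30.0):
    """Join the group on iface_ip and wait for one datagram.

    Returns (data, sender_address), or None if nothing arrives in time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(
            socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group, iface_ip)
        )
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None
    finally:
        sock.close()


def send(iface_ip, payload=b"probe", group=GROUP, port=PORT, ttl=2):
    """Send one datagram to the group through the interface iface_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.setsockopt(
            socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(iface_ip)
        )
        return sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

      For example, calling listen("172.22.35.78") on this server and send("172.22.35.87") on the other, then swapping the roles, would show whether multicast datagrams get through in both directions or only in one.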

       

      Both servers are using wrapper to run as a service.  There is also a spot in the wrapper config that specifies the peer ID, and this is also set to 1 on one server, and 2 on the other.

       

      Does anyone know what might cause this behavior?  There's nothing I can think of that would cause this to work successfully in one boot order, but not the other.

       

      Any help would be greatly appreciated.

       

      Thank you,