0 Replies Latest reply on Dec 14, 2015 10:10 AM by thoemic

    Cluster synchronisation problem after nodes are restarted

    thoemic

      Hi,


      I have a cluster of 8 nodes running WildFly 8.1.0.Final, all in standalone mode. After I restart a couple of nodes, some of them fail to communicate with each other.

      The problem can be reproduced as follows:

      1. Start all the nodes. Cluster synchronisation works.
      2. Kill node 3 and node 5.
      3. Start node 3.
      4. Start node 5.
      5. Send a message to the topic on node 5.
        --> Node 3 does not receive the message, but all the other nodes do.
      6. Send a message to the topic on node 3.
        --> All the nodes receive the message.
      7. Send a message to the topic on any of the other nodes.
        --> All the nodes receive the message.


      The log of node 3 contains the following entry, which looks suspicious to me:

      org.hornetq.core.server] (Thread-15 (HornetQ-client-global-threads--577881222)) HQ222139: MessageFlowRecordImpl [nodeID=8872394f-a1ce-11e5-8297-23a69032a478, connector=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5446&host=172-17-0-125&use-nio=false, queueName=sf.my-cluster.8872394f-a1ce-11e5-8297-23a69032a478, queue=QueueImpl[name=sf.my-cluster.8872394f-a1ce-11e5-8297-23a69032a478, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=4d8329e7-a26b-11e5-b9e9-690d2e81ad4e]]@2c2e499a, isClosed=false, firstReset=true]::Remote queue binding jms.topic.BsiCrmClusterSyncTopic7c6ce5fd-a26b-11e5-b28e-e329775e0980 has already been bound in the post office. Most likely cause for this is you have a loop in your cluster due to cluster max-hops being too large or you have multiple cluster connections to the same nodes using overlapping addresses


      This log entry can only be found in the log of node 3.


      Does anyone know how to solve this issue? My standalone.xml is attached.


      Best,

      Thomas