4 Replies Latest reply on Aug 12, 2016 3:59 PM by meabhi007

    Same nodeUUID in hornetQ static-connection clustered nodes (wildfly 8.2.0)

    meabhi007

      Hi,

       

      I have a two-node static cluster configuration: one node is configured as the active (live) server and the other as its backup.

      I enabled debug logging for HornetQ, and it looks like both nodes share the same nodeUUID.

      Can you confirm whether this is expected behavior?

       

      I'm attaching the log files from both nodes and the configuration file from the backup node.

      The active node's config is the same, except that <backup> is set to false.

       

      Note: I cleaned the data files, but the behavior is the same.

        • 1. Re: Same nodeUUID in hornetQ static-connection clustered nodes (wildfly 8.2.0)
          jbertram

          Yes, it is expected that a live and a backup will share the same node UUID.

          • 2. Re: Same nodeUUID in hornetQ static-connection clustered nodes (wildfly 8.2.0)
            meabhi007

            Hi Justin,

             

            Thanks for the confirmation.

             

            I have two HornetQ servers: node-1 (active) and node-2 (backup).

            When node-1 ran into a network issue, node-2 became active.

            Node-1's network was then restored.

            Since this produced a split-brain situation, I manually triggered a graceful shutdown of node-1.

             

            Based on the HornetQ code, the following flow is executed when node-1 goes down.

             

            When node-1 shuts down, it notifies node-2 that it is going down.

            While handling this message, node-2 deletes the node-1 entry from its topology.
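
            To make the effect concrete, here is a minimal, self-contained Java sketch of the pattern described above (hypothetical names, not the actual HornetQ classes): a topology keyed by node UUID in which the live and the backup collapse onto a single entry, so handling the live's DisconnectMessage also removes the backup's own entry.

            import java.util.HashMap;
            import java.util.Map;

            // Illustration only: a toy "topology" keyed by node UUID, mimicking the flow
            // described above. The class and member names here are hypothetical.
            public class SharedUuidTopologySketch
            {
                static final Map<String, String> topology = new HashMap<>();

                public static void main(String[] args)
                {
                    // The live and the backup share one UUID (confirmed earlier in this thread),
                    // so they occupy a single entry in the topology map.
                    String sharedUuid = "736f8d16-5f58-11e6-b85d-9951dd74bc8f";
                    topology.put(sharedUuid, "TopologyMember[name=backup-group-1]");

                    // Node-1 shuts down gracefully and sends a DisconnectMessage carrying the
                    // shared UUID; node-2 handles it by removing that UUID from its topology.
                    handleDisconnect(sharedUuid);

                    // The backup's own (local) entry is now gone as well.
                    System.out.println("local member still present: " + topology.containsKey(sharedUuid)); // prints false
                }

                // Stand-in for the "remove the disconnecting node from the topology" step.
                static void handleDisconnect(String nodeUuid)
                {
                    topology.remove(nodeUuid);
                }
            }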

             

             

            Node-1 sends the following packet to node-2 (log from node-1):

            2016-08-11 14:54:19,483 TRACE [org.hornetq.core.client] (ServerService Thread Pool -- 151) Sending packet nonblocking PACKET(DisconnectMessage)[type=11, channelID=0, packetObject=DisconnectMessage, nodeID=736f8d16-5f58-11e6-b85d-9951dd74bc8f] on channeID=0

             

            Node-2 handles this packet and deletes node-1 from its topology (log from node-2):

            2016-08-11 14:54:19,564 TRACE [org.hornetq.core.client] (Thread-6 (HornetQ-client-netty-threads-1795638920)) handling packet PACKET(DisconnectMessage)[type=11, channelID=0, packetObject=DisconnectMessage, nodeID=736f8d16-5f58-11e6-b85d-9951dd74bc8f]

            2016-08-11 14:54:19,564 TRACE [org.hornetq.core.client] (Thread-6 (HornetQ-client-netty-threads-1795638920)) Disconnect being called on client:PACKET(DisconnectMessage)[type=11, channelID=0, packetObject=DisconnectMessage, nodeID=736f8d16-5f58-11e6-b85d-9951dd74bc8f] server locator = ServerLocatorImpl (identity=(main-ClusterConnection::HornetQServerImpl::serverUUID=736f8d16-5f58-11e6-b85d-9951dd74bc8f)) [initialConnectors=[TransportConfiguration(name=netty-connector-cluster-node-1002, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=node1-domain-na&local-address=10-10-24-77, TransportConfiguration(name=netty-connector-cluster-node-2000, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=node2-domain-na&local-address=10-10-24-77], discoveryGroupConfiguration=null] notifying node 736f8d16-5f58-11e6-b85d-9951dd74bc8f as down: java.lang.Exception: trace

              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler.handlePacket(ClientSessionFactoryImpl.java:1588) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:641) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:556) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:532) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1712) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.channelRead(HornetQChannelHandler.java:73) [hornetq-core-client-2.4.5.Final.jar:]

              at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:132) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_45]

             

             

            2016-08-11 14:54:19,564 TRACE [org.hornetq.core.client] (Thread-6 (HornetQ-client-netty-threads-1795638920)) nodeDown ServerLocatorImpl (identity=(main-ClusterConnection::HornetQServerImpl::serverUUID=736f8d16-5f58-11e6-b85d-9951dd74bc8f)) [initialConnectors=[TransportConfiguration(name=netty-connector-cluster-node-1002, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=node1-domain-na&local-address=10-10-24-77, TransportConfiguration(name=netty-connector-cluster-node-2000, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=node2-domain-na&local-address=10-10-24-77], discoveryGroupConfiguration=null] nodeID=736f8d16-5f58-11e6-b85d-9951dd74bc8f as being down: java.lang.Exception: trace

              at org.hornetq.core.client.impl.ServerLocatorImpl.notifyNodeDown(ServerLocatorImpl.java:1547) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler.handlePacket(ClientSessionFactoryImpl.java:1596) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:641) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:556) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:532) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1712) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.channelRead(HornetQChannelHandler.java:73) [hornetq-core-client-2.4.5.Final.jar:]

              at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:132) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_45]

             

             

            2016-08-11 14:54:19,564 TRACE [org.hornetq.core.client] (Thread-6 (HornetQ-client-netty-threads-1795638920)) removeMember Topology@3a5ec825[owner=ClusterConnectionImpl@33512835[nodeUUID=736f8d16-5f58-11e6-b85d-9951dd74bc8f, connector=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=10-10-24-77, address=jms, server=HornetQServerImpl::serverUUID=736f8d16-5f58-11e6-b85d-9951dd74bc8f]] removing nodeID=736f8d16-5f58-11e6-b85d-9951dd74bc8f, result=TopologyMember[name = backup-group-1, connector=Pair[a=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=10-10-24-77, b=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=10-10-24-77]], size = 0: java.lang.Exception: trace

              at org.hornetq.core.client.impl.Topology.removeMember(Topology.java:350) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.client.impl.ServerLocatorImpl.notifyNodeDown(ServerLocatorImpl.java:1550) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler.handlePacket(ClientSessionFactoryImpl.java:1596) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:641) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:556) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:532) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1712) [hornetq-core-client-2.4.5.Final.jar:]

              at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.channelRead(HornetQChannelHandler.java:73) [hornetq-core-client-2.4.5.Final.jar:]

              at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:132) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty-all-4.0.15.Final.jar:4.0.15.Final]

              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_45]



            Since both nodes share the same node ID, when node-2 removes that ID from its topology it is effectively deleting its own (local) entry.

            Now, when clients such as MDBs try to connect to node-2, the "onConnection" method of "org.hornetq.core.server.cluster.impl.ClusterConnectionImpl" tries to find the localMember.

            Here, since the local member has already been deleted (as explained above), onConnection logs this message:


            Local Member is not set at on ClusterConnection ClusterConnectionImpl@33512835[nodeUUID=736f8d16-5f58-11e6-b85d-9951dd74bc8f, connector=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=10-10-24-77, address=jms, server=HornetQServerImpl::serverUUID=736f8d16-5f58-11e6-b85d-9951dd74bc8f]

             

            As a result, messaging node-2 is not able to serve clients, even though it is up.
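
            For reference, the guard described above can be sketched with a minimal, runnable Java fragment (hypothetical names, not the real ClusterConnectionImpl code): once the local member has been removed from the topology, the connection handler can only log the warning and return, so the new client connection is never announced to the cluster.

            // Illustration only: hypothetical names mirroring the behaviour described above,
            // not the actual org.hornetq.core.server.cluster.impl.ClusterConnectionImpl code.
            public class LocalMemberGuardSketch
            {
                // The local member was removed from the topology in the earlier step.
                static Object localMember = null;

                public static void main(String[] args)
                {
                    onConnection("mdb-client-1");
                }

                static void onConnection(String clientId)
                {
                    if (localMember == null)
                    {
                        // Corresponds to the "Local Member is not set at on ClusterConnection ..."
                        // message: with no local member, the server cannot announce itself, so the
                        // client is never wired into the cluster topology it expects.
                        System.out.println("WARN: local member is not set; skipping cluster notification for " + clientId);
                        return;
                    }
                    // Normal path: announce the local member to the newly connected client.
                    System.out.println("Notifying " + clientId + " about " + localMember);
                }
            }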


            Could you please help me understand this issue and let me know if we are doing anything wrong here?


            • 3. Re: Same nodeUUID in hornetQ static-connection clustered nodes (wildfly 8.2.0)
              jbertram

              At this point, if you get into a split-brain situation your only real recourse is to shut down both brokers and manually resolve any potential data integrity issues.  You shouldn't expect the broker to work properly in any way in a split-brain situation, as it wasn't designed for this.

              • 4. Re: Same nodeUUID in hornetQ static-connection clustered nodes (wildfly 8.2.0)
                meabhi007

                Thanks Justin.

                I'll try the shared-store approach to see if it can help avoid split-brain issues.