5 Replies Latest reply on Dec 16, 2011 12:19 PM by rhusar

    Too many sending are-you-alive msg

    cadmus

      Hi folks! I have an issue.

      Application runs on JBoss AS 5.1.0.

      There are two physical servers, each running 4 nodes.

      When the server is up, the logs show that too many "are-you-alive" messages are being sent.

      As I understand it, this is wrong.

      I have attached an archive of the logs from all cluster nodes.

       

      Log fragment from node 1:

       

      12.12 17:19:17,113 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:17,114 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:17,113 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52044 (own address=10.44.0.177:48938)

      12.12 17:19:17,125 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:7901 (own address=10.44.0.177:7900)

      12.12 17:19:27,116 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:27,117 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52044 (own address=10.44.0.177:48938)

      12.12 17:19:27,116 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:27,128 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:7901 (own address=10.44.0.177:7900)

      12.12 17:19:38,078 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:7901 (own address=10.44.0.177:7900)

      12.12 17:19:38,079 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:38,079 DEBUG org.jgroups.protocols.FD$Monitor heartbeat missing from 10.44.0.177:7901 (number=0)

      12.12 17:19:38,079 DEBUG org.jgroups.protocols.FD$Monitor heartbeat missing from 10.44.0.177:52819 (number=0)

      12.12 17:19:38,079 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52044 (own address=10.44.0.177:48938)

      12.12 17:19:38,080 DEBUG org.jgroups.protocols.FD$Monitor heartbeat missing from 10.44.0.177:52044 (number=0)

      12.12 17:19:38,080 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:38,080 DEBUG org.jgroups.protocols.FD$Monitor heartbeat missing from 10.44.0.177:52819 (number=0)

      12.12 17:19:48,082 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:48,082 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52044 (own address=10.44.0.177:48938)

      12.12 17:19:48,082 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:48,083 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:7901 (own address=10.44.0.177:7900)

      12.12 17:19:58,084 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:58,084 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52819 (own address=10.44.0.177:48938)

      12.12 17:19:58,084 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:52044 (own address=10.44.0.177:48938)

      12.12 17:19:58,086 DEBUG org.jgroups.protocols.FD$Monitor sending are-you-alive msg to 10.44.0.177:7901 (own address=10.44.0.177:7900)

       

      Thanks!

        • 1. Re: Too many sending are-you-alive msg
          rhusar

          LOL, with debug logging turned on, expect a lot of everything :-)

           

          The "FD" you are seeing is failure detection. It works by sending and receiving are-you-alive messages.

           

          Nothing to worry about here.

           

          HTH

          Rado

          • 2. Re: Too many sending are-you-alive msg
            cadmus

            Hi Radoslav!

            I asked for a reason. The official guide says:

            "Regular traffic from a node counts as if it is a heartbeat response. So, the are-you-alive messages are only sent when there is no regular traffic to the node for some time."

             

            So I concluded that this might be incorrect behavior.

             

             


            • 3. Re: Too many sending are-you-alive msg
              rhusar

              Hi Maxim,

               

              okay, I like your approach -- don't trust anything ;-) In that case it will be best if you test it yourself.

               

              What the docs say makes perfect sense. If you are already receiving other messages, there is no reason to send additional heartbeat messages, because you know that the node is okay. To test this, deploy an app that communicates with all other nodes on every request, for example a distributable web app that modifies the session each time it is accessed and replicates to all members. Then access it at short intervals and you should see few or no are-you-alive messages.
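
The behaviour quoted from the guide can be pictured with a small toy model. This is only a sketch of the idea that any traffic from a peer counts as a sign of life; it is not the JGroups FD implementation, and the class name `IdleHeartbeatModel`, its methods, and the 10-second timeout are assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the behaviour described in the quoted guide; NOT JGroups code. */
public class IdleHeartbeatModel {
    private final long timeoutMs;                              // hypothetical idle threshold
    private final Map<String, Long> lastHeard = new HashMap<>();

    public IdleHeartbeatModel(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Any message from a peer (regular traffic or heartbeat reply) shows it is alive. */
    public void onMessageFrom(String peer, long nowMs) {
        lastHeard.put(peer, nowMs);
    }

    /** True if nothing has been heard from the peer for at least timeoutMs. */
    public boolean shouldSendAreYouAlive(String peer, long nowMs) {
        Long last = lastHeard.get(peer);
        return last == null || nowMs - last >= timeoutMs;
    }

    public static void main(String[] args) {
        IdleHeartbeatModel model = new IdleHeartbeatModel(10_000);
        model.onMessageFrom("nodeB", 0);                       // e.g. a session replication message
        System.out.println(model.shouldSendAreYouAlive("nodeB", 4_000));   // recent traffic
        System.out.println(model.shouldSendAreYouAlive("nodeB", 12_000));  // peer has been silent
    }
}
```

In this model, a request that only reads the session produces no call to `onMessageFrom`, so the peer eventually looks idle and an are-you-alive message becomes necessary.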

               

              Rado

              • 4. Re: Too many sending are-you-alive msg
                cadmus

                I do not quite understand what you mean.

                If you mean that the messages are sent because the application is not actively used,

                that is not the case: the application is actively in use.

                 



                • 5. Re: Too many sending are-you-alive msg
                  rhusar

                  "Actively in use" does not necessarily mean that messages have been exchanged between the servers. If you are just reading the session, there is nothing to replicate. Thus it's necessary to send are-you-alive messages.

                   

                  Also note that the intervals need to be quite short: if something goes wrong, the cluster needs to react to it ASAP, while still tolerating some network instability.
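
The trade-off between reacting quickly and tolerating hiccups can be estimated with simple arithmetic. The sketch below assumes that a peer is suspected after roughly `timeout * max_tries` milliseconds of silence, which is only an approximation of how the FD `timeout` and `max_tries` settings interact. The 10-second value is inferred from the spacing of the log lines above, and the try counts are made-up example values, not taken from this cluster's configuration.

```java
/** Rough estimate of failure detection delay; an approximation, not JGroups code. */
public class FdDetectionTime {

    /** Approximate time until an unresponsive peer is suspected, in milliseconds. */
    static long approxDetectionMs(long timeoutMs, int maxTries) {
        if (timeoutMs <= 0 || maxTries <= 0) {
            throw new IllegalArgumentException("timeoutMs and maxTries must be positive");
        }
        return timeoutMs * maxTries;
    }

    public static void main(String[] args) {
        // Assumed values: 10 s between are-you-alive messages, as the log spacing suggests.
        System.out.println(approxDetectionMs(10_000, 5) + " ms with 5 tries");
        // Fewer tries react faster but tolerate fewer lost heartbeats.
        System.out.println(approxDetectionMs(10_000, 2) + " ms with 2 tries");
    }
}
```

Shorter intervals or fewer tries reduce the detection delay, at the cost of more false suspicions on an unstable network.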

                   

                  HTH,

                  Rado