3 Replies Latest reply on Nov 29, 2012 5:27 PM by Michael Nielson

    Exception acquiring ownership of X when load balancer moves session. JBoss 7.1.1

    Michael Nielson Newbie

      My application is behind a hardware load balancer instead of mod_cluster. The load balancer is configured to pin user sessions to the same JBoss node as long as possible, but there are some cases where a user's session is moved between nodes. When this happens our logs show "Exception acquiring ownership of X". There is also a long delay for the user; their request is blocked here:


      sun.misc.Unsafe.park(Native Method)


      A request will be blocked here for 30 seconds before continuing.


      This bug looked interesting to me: https://issues.jboss.org/browse/AS7-4260 Occurences of "Exception acquiring ownership of X (via SharedLocalYieldingClusterLockManager)"


      The fix, however, seems to be retrying the acquire. Wouldn't that make the pause even worse for my users?
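
      To illustrate the concern, here is a minimal sketch in plain java.util.concurrent. It is not the JBoss or Infinispan code; the helper names and the attempt count of 2 are assumptions made purely for illustration. If each acquire attempt can block for the full timeout, retrying multiplies the worst-case wait:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockRetrySketch {

    // Hypothetical helper, not JBoss code: tries to take the lock up to
    // `attempts` times, waiting at most `timeoutMillis` on each attempt.
    static boolean acquireWithRetry(ReentrantLock lock, long timeoutMillis, int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                if (lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false; // every attempt timed out
    }

    // Upper bound on how long a caller can block if every attempt times out.
    static long worstCaseWaitMillis(long timeoutMillis, int attempts) {
        return timeoutMillis * attempts;
    }

    public static void main(String[] args) {
        // 30 s is the pause reported in this post; 2 attempts is an assumed value.
        System.out.println(worstCaseWaitMillis(30_000, 2) + " ms worst case");
    }
}
```

      Under those assumptions a single retry turns a 30 second pause into a worst case of 60 seconds, unless the second attempt succeeds quickly because whatever held the lock has released it by then.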


      My users' sessions last for a long time, and it's quite possible for a single node to become overloaded. Moving some of my users to another node helps manage load, but a 30 second pause for each moved user is very painful.


      What can I do to cut down these pauses?





      Message was edited by: Michael Nielson, attached logs.

        • 1. Re: Exception acquiring ownership of X when load balancer moves session. JBoss 7.1.1
          Radoslav Husar Master

          Hi Michael.


          Yes, that is the same issue that you linked. Upgrading to 7.1.3 will mitigate the issue, as you saw in the change set. The fix assumes that the TimeoutException happens because an expiration thread was trying to lock the same key, causing the acquire to time out. Can you confirm that the session that got the exception was not being expired?


          PS: is "Caused by: java.io.EOFException: Read past end of file" part of this question?



          • 2. Re: Exception acquiring ownership of X when load balancer moves session. JBoss 7.1.1
            Michael Nielson Newbie

            Thanks for your reply!

            PS: is "Caused by: java.io.EOFException: Read past end of file" part of this question?

            "Caused by: java.io.EOFException: Read past end of file" seems to go hand in hand with the "Exception acquiring ownership of X" exceptions. The snippet of logs above is a segment of more than 66 MB of logs. In that log I have 2016 instances of "Exception acquiring ownership of" and 1168 instances of "Caused by: java.io.EOFException: Read past end of file".


            While this log was being captured my load balancer was not doing a reliable job of pinning sessions; sessions were moving more often than normal.
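
            For anyone who wants to produce the same kind of occurrence counts from their own logs, here is a minimal sketch. The class name is hypothetical and the log path is passed as an argument; it assumes Java 8 or later and a log file readable as UTF-8.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class LogCounter {

    // Counts the lines that contain the given text at least once.
    static long countLinesContaining(Stream<String> lines, String needle) {
        return lines.filter(line -> line.contains(needle)).count();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the server log (supplied by the caller).
        String[] needles = {
            "Exception acquiring ownership of",
            "Caused by: java.io.EOFException: Read past end of file"
        };
        for (String needle : needles) {
            // Files.lines streams the file, so a large log is not loaded into memory at once.
            try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
                System.out.println(needle + " -> " + countLinesContaining(lines, needle));
            }
        }
    }
}
```

            This counts matching lines rather than individual matches, which gives the same number as long as each exception message appears on its own line.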


            An example: User A comes in on Node 1 and makes a handful of requests within a minute or two, all of which are handled by Node 1. User A stops making requests for a minute or two and then requests a few more pages; those requests are directed to Node 2.


            The second round of requests from User A starts out extremely slow. The first page request of this second round takes more than 10 seconds, almost all of that time spent waiting inside Request.doGetSession. After the first request to Node 2 the speed goes back to normal. I looked very closely for concurrency issues where both Node 1 and Node 2 were trying to acquire session ownership at the same time, but I could not find any (sessions were moving, but there was no contention or concurrency).


            The actual payload of these sessions is very small, in most cases just a few KB. (We actually have another Infinispan cache defined with similarly sized payloads that gets read from on every request. This cache does not use key affinity but performed very well during these load balancer problems.)

            Can you confirm that the session that got the exception was not being expired?

            We use a standard session timeout of 30 minutes. I can see many instances of this slow behavior within a ten-minute window, so I am fairly certain that most of these sessions were not being expired.


            The majority of the problem was solved by fixing the load balancer, but there are legitimate situations where load needs to be moved from one node to another, and I really hate putting my users through a 10-30 second pause just to move their requests. It looks like the upgrade to 7.1.3 would retry these timeouts, causing an even longer wait for my users. I would actually prefer to just lose the session data, as I don't put any critical data in the session.


            A wait time of 10 seconds seems far too long?