2 Replies Latest reply on Dec 2, 2009 9:58 AM by Jeff Mesnil

    potential issues with producer credits during failover?

    Jeff Mesnil Master

      I saw an NPE again in ServerSessionImpl.getCreditManagerHolder(ServerMessage) because the message's paging store was null.

      After some debugging, I found that this happens when the message is a management message. As management messages are *never* routed (they're handled by the management service instead), their paging store is expected to be null.

      Normally this is not an issue, as a CreditManagerHolder has already been created for the management address by a previous call to ServerSessionImpl.getCreditManagerHolder(SimpleString) (itself called by ServerSessionImpl.handleRequestProducerCredits()).

      But it seems there are cases during failover where the producer did not request credits before sending a management message (it happens when the BridgeImpl's producer sends the "sendInfoQueueToQueue" message).

      2 questions:

      * Why are there two methods, ServerSessionImpl.getCreditManagerHolder(ServerMessage) and ServerSessionImpl.getCreditManagerHolder(SimpleString)? The first one uses the message's paging store; the second uses the paging store for the address from the post office.

      * If failover occurs, what happens to the client producer's credits? Does the producer reset its credits? (I'm still looking at it...)
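
The failure mode described above can be sketched roughly as follows. This is only an illustration and not the HornetQ code: CreditHolderLookup, PagingStore, Message and Holder are hypothetical stand-ins, and the fallback from the message-based lookup to the address-based lookup is an assumption about one way to avoid the NPE when the message carries no paging store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the server session; names are illustrative only.
final class CreditHolderLookup {
    static final class PagingStore {
        final String address;
        PagingStore(String address) { this.address = address; }
    }

    static final class Message {
        final String address;
        final PagingStore pagingStore; // null for management messages (never routed)
        Message(String address, PagingStore pagingStore) {
            this.address = address;
            this.pagingStore = pagingStore;
        }
    }

    static final class Holder {
        final PagingStore store;
        Holder(PagingStore store) { this.store = store; }
    }

    private final Map<String, Holder> holders = new HashMap<>();
    // Stand-in for "the paging store for the address from the post office".
    private final Map<String, PagingStore> storesByAddress = new HashMap<>();

    // Address-based lookup: creates the holder on demand.
    Holder holderFor(String address) {
        return holders.computeIfAbsent(address,
            a -> new Holder(storesByAddress.computeIfAbsent(a, PagingStore::new)));
    }

    // Message-based lookup: falls back to the address when the store is null
    // (assumed fix; without the fallback a null store leads to an NPE later).
    Holder holderFor(Message message) {
        if (message.pagingStore == null) {
            return holderFor(message.address);
        }
        return holders.computeIfAbsent(message.address,
            a -> new Holder(message.pagingStore));
    }

    public static void main(String[] args) {
        CreditHolderLookup session = new CreditHolderLookup();
        // No credits were requested for this address before the send.
        Message management = new Message("management.address", null);
        Holder holder = session.holderFor(management);
        System.out.println(holder.store.address);
    }
}
```

With the fallback in place, a management message sent before any credit request still finds a holder, and a later address-based lookup returns the same one.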

        • 1. Re: potential issues with producer credits during failover?
          Tim Fox Master

           

          "jmesnil" wrote:
          I saw an NPE again in ServerSessionImpl.getCreditManagerHolder(ServerMessage) because the message's paging store was null.

          After some debugging, I found that this happens when the message is a management message. As management messages are *never* routed (they're handled by the management service instead), their paging store is expected to be null.


          OK, so just add a line to set this manually.



          * If failover occurs, what happens to the client producer's credits? Does the producer reset its credits? (I'm still looking at it...)


          ClientProducerCreditsImpl::reset

          • 2. Re: potential issues with producer credits during failover?
            Jeff Mesnil Master

             

            "timfox" wrote:
            "jmesnil" wrote:
            I saw an NPE again in ServerSessionImpl.getCreditManagerHolder(ServerMessage) because the message's paging store was null.

            After some debugging, I found that this happens when the message is a management message. As management messages are *never* routed (they're handled by the management service instead), their paging store is expected to be null.


            OK, so just add a line to set this manually.



            Sure. But doesn't this show that on an activated backup node, a producer has sent a message without having requested credits first?

            This happens after failover, when one of the live servers' bridges reconnects to the activated backup server.
            From BridgeImpl.setupNotificationConsumer(), it sends a management message "sendInfoQueueToQueue" to the activated backup server.

            The backup server handles the management message in ServerSessionImpl.handleSend(). When it tries to release the credits in ServerSession.releaseOutStanding(), an NPE is thrown because there is no CreditManagerHolder for the message address (and the management message's PagingStore is null).

            The key thing here is that there is no CreditManagerHolder for this address.
            AIUI, the holder should have been created when a client producer sent a credit request. If there is no holder in that case, this implies that, after failover, the client producer sent the management message before its credits were reset.

            I'm wondering if there isn't a race condition here. The client producer credits are reset outside the failoverLock. Could it be that the reset takes place after the producer has sent the management message from BridgeImpl.setupNotificationConsumer()?
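
To make the suspected ordering problem concrete, here is a minimal sketch. It is not the HornetQ client or server code: CreditResetOrdering and ServerStub are hypothetical names, and the CountDownLatch is only one assumed way of forcing the send to wait until the credits have been reset (the real code might do this by resetting under the failoverLock instead).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of "send must not overtake the credit reset".
final class CreditResetOrdering {
    // Stand-in for the server session: remembers which addresses have a holder.
    static final class ServerStub {
        private final Set<String> holders = ConcurrentHashMap.newKeySet();

        // A credit request creates the holder for the address.
        void requestCredits(String address) { holders.add(address); }

        // Returns whether a holder existed when the message arrived.
        boolean send(String address) { return holders.contains(address); }
    }

    // The sender thread blocks until the credit reset has been issued.
    static boolean sendAfterReset(ServerStub server, String address) {
        CountDownLatch creditsReset = new CountDownLatch(1);
        boolean[] holderPresent = new boolean[1];
        Thread sender = new Thread(() -> {
            try {
                creditsReset.await();              // wait for the reset
                holderPresent[0] = server.send(address);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();
        server.requestCredits(address);            // reset re-requests credits
        creditsReset.countDown();                  // now the send may proceed
        try {
            sender.join();                         // also publishes holderPresent
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holderPresent[0];
    }

    public static void main(String[] args) {
        // Unordered case: the send arrives before any credit request.
        ServerStub unordered = new ServerStub();
        System.out.println("send before reset, holder present: " + unordered.send("mgmt"));

        // Ordered case: the send waits for the reset.
        ServerStub ordered = new ServerStub();
        System.out.println("send after reset, holder present: " + sendAfterReset(ordered, "mgmt"));
    }
}
```

The first case corresponds to the situation where the server has no holder when the management message arrives; the second shows that forcing the send to wait for the reset removes that window.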

            I'm adding logs to show what is called first on the client and what is executed first on the server.