5 Replies Latest reply on Mar 15, 2007 2:26 AM by Ron Sigal

    Bisocket server invoker NPE

    Tim Fox Master

      We are seeing the following exception intermittently in JBM:

      
      12:20:32,765 WARN @Timer-1 [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask@ce796: detected failure on control connection Thread[control: Socket[addr=/10.11.14.146,port=2685,localport=3576],5,]: requesting new control connection
      12:20:32,843 ERROR @Thread-18 [BisocketServerInvoker] unable to get secondary locator
      12:20:32,843 ERROR @Thread-18 [BisocketServerInvoker] Unable to recreate control connection: InvokerLocator [null://10.11.14.146:2685/null]
      java.io.IOException: unable to get secondary locator: null
       at org.jboss.remoting.transport.bisocket.BisocketServerInvoker.createControlConnection(BisocketServerInvoker.java:165)
       at org.jboss.remoting.transport.bisocket.BisocketServerInvoker$1.run(BisocketServerInvoker.java:729)
      


      Looking at the code, it seems to be swallowing the exception:

      }
       catch (Throwable t)
       {
       log.error("unable to get secondary locator");
       throw new IOException("unable to get secondary locator: " + t.getMessage());
       }
      


      I amended the code, so as not to swallow the exception which revealed the following:

      12:38:44,656 WARN @Timer-1 [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask@45378f: detected failure on control connection Thread[control: Socket[addr=/10.11.14.146,port=3213,localport=3679],5,]: requesting new control connection
      12:38:44,859 ERROR @Thread-15 [BisocketServerInvoker] unable to get secondary locator
      java.lang.NullPointerException
       at org.jboss.remoting.transport.bisocket.BisocketServerInvoker.createControlConnection(BisocketServerInvoker.java:160)
       at org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask$1.run(BisocketServerInvoker.java:777)
      12:38:44,859 ERROR @Thread-15 [BisocketServerInvoker] Unable to recreate control connection: InvokerLocator [null://10.11.14.146:3213/null]
      java.io.IOException: unable to get secondary locator: null
       at org.jboss.remoting.transport.bisocket.BisocketServerInvoker.createControlConnection(BisocketServerInvoker.java:166)
       at org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask$1.run(BisocketServerInvoker.java:777)
      1
      


      I.e. the client invoker with the id can't be found.

        • 2. Re: Bisocket server invoker NPE
          Ron Sigal Master

          I looked at some old logs, and it looks like the problem is associated with tests that deliberately kill the server, which is why the client side is trying to recreate the control connection. The NullPointerException seems to occur because the client invoker has closed before the ControlMonitorTimerTask has stopped running. I'm not sure how this is happening, but it could be that the TimerTask was in the middle of running when the client invoker closed. In any case, I've modified the code to avoid the NPE and print a warning message.

          I ran the JBM functional tests, and they all pass except for occasional failures in FailoverValveTest at line 287

           assertNotNull(o);
          


          I reran FailoverValveTest with the remoting jar from the SVN repository and the same thing happened, so I don't think I broke anything new.

          • 3. Re: Bisocket server invoker NPE
            Ron Sigal Master

            I should add that if the server has died but the client doesn't know it yet, then the "unable to get secondary locator" error message will still appear.

            • 4. Re: Bisocket server invoker NPE
              Tim Fox Master

               

              "ron.sigal@jboss.com" wrote:
              I looked at some old logs, and it looks like the problem is associated with tests that deliberately kill the server


              Yes, I noticed this during crash and failover tests.

              The NullPointerException seems to occur because the client invoker has closed before the ControlMonitorTimerTask has stopped running. I'm not sure how this is happening, but it could be that the TimerTask was in the middle of running when the client invoker closed.


              When using a timer, you can ensure the task has finished running by doing the following:

              (from java.util.Timer javadoc - cancel method)

              Note that calling this method from within the run method of a timer task that was invoked by this timer absolutely guarantees that the ongoing task execution is the last task execution that will ever be performed by this timer.


              So you just need to add a sychronized stop() method to the TimerTask which calls cancel() and make sure you call this from your code - take a look at org.jboss.messaging.core.plugin.postoffice.cluster.StatsSender which does something very similar.


              In any case, I've modified the code to avoid the NPE and print a warning message.


              Is a warning message really appropriate? A warning would imply something wrong or strange - but this is to be expected.


              I ran the JBM functional tests, and they all pass except for occasional failures in FailoverValveTest at line 287

               assertNotNull(o);
              


              I reran FailoverValveTest with the remoting jar from the SVN repository and the same thing happened, so I don't think I broke anything new.


              FailoverValve is not currently used so I wouldn't worry too much about this. We need to address this in the test suite.

              • 5. Re: Bisocket server invoker NPE
                Ron Sigal Master

                 


                So you just need to add a sychronized stop() method to the TimerTask which calls cancel() and make sure you call this from your code - take a look at org.jboss.messaging.core.plugin.postoffice.cluster.StatsSender which does something very similar.


                Coincidentally, I added just this sort of thing when I was working on the memory leaks over the weekend, but, now that you brought it up again, I realize that it's not a good idea, in this particular case. ControlMonitorTimerTask.shutdown() is called indirectly from Client.removeListener(), and if ControlMonitorTimerTask.run() were synchronized, shutdown could get hung up waiting for run() to do network i/o on failed servers, undermining the "quick removeListener()" facility. I think the right strategy is just to accept the possibility of a race and handle it gracefully. So I've put just enough synchronization in run() to avoid NPE's.


                Is a warning message really appropriate? A warning would imply something wrong or strange - but this is to be expected.


                Fair enough. Now that races are anticipated, I've downgraded the "missing client invoker" messages to DEBUG.


                FailoverValve is not currently used so I wouldn't worry too much about this. We need to address this in the test suite.


                Now you tell me :-) Good, I feel better now.