6 Replies Latest reply on Mar 31, 2008 1:25 PM by brian.stansberry

    JBCACHE-1170

    brian.stansberry

      Discussion of ongoing problems with http://jira.jboss.org/jira/browse/JBCACHE-1170 .

      Possible cause, from org.jgroups.blocks.RequestCorrelator.handleRequest() through which all RPCs come:

       try {
       retval=request_handler.handle(req);
       }
       catch(Throwable t) {
       if(log.isErrorEnabled()) log.error("error invoking method", t);
       retval=t;
       }
      
       if(!hdr.rsp_expected) // asynchronous call, we don't need to send a response; terminate call here
       return;
      
       if(transport == null) {
       if(log.isErrorEnabled()) log.error("failure sending response; no transport available");
       return;
       }
      
       // changed (bela Feb 20 2004): catch exception and return exception
       try { // retval could be an exception, or a real value
       rsp_buf=marshaller != null? marshaller.objectToByteBuffer(retval) : Util.objectToByteBuffer(retval);
      


      The first try block calls into the JBC code and ends up calling into the code that sets the CacheMarshaller200.regionForRemoteCall ThreadLocal.

      Last try block again calls into the JBC response marshalling code, which clears the ThreadLocal.

      Problem is the 2 return calls in between, particularly the 1st which will cause the method to return for every ASYNC RPC w/o clearing the ThreadLocal.

      This is a private method, so there's no clean way to override it to ensure the TL is cleared.

      Perhaps InactiveRegionAwareRpcDispatcher can clear the TL if the return value is null. That's a fragile mechanism to try to guesstimate what returns are possibly async. Mostly just a cleanup attempt.

      A separate question is why the partial state transfer call isn't setting the TL to the correct value. Ah, right, it's not RPC based so the state transfer request doesn't go through the marshaller.

        • 1. Re: JBCACHE-1170
          manik

          Hmm, thinking about this a bit, and referring to your comments on the JIRA issue raised:

          "bstansberry@jboss.com" wrote:

          2007-08-23 17:02:21,162 DEBUG [org.jboss.cache.marshall.VersionAwareMarshaller] Wrote version 20
          2007-08-23 17:02:21,162 DEBUG [org.jboss.cache.marshall.CacheMarshaller200] Region based call. Using region /JSESSION/localhost/http-field
          2007-08-23 17:02:21,162 DEBUG [org.jboss.cache.marshall.CacheMarshaller200] Marshalling object true
          2007-08-23 17:02:21,162 DEBUG [org.jboss.cache.marshall.CacheMarshaller200] Writing region /JSESSION/localhost/http-field to stream
          2007-08-23 17:02:21,162 DEBUG [org.jboss.cache.statetransfer.StateTransferManager] locking the /JSESSION/localhost/http-scoped subtree to return the in-memory (transient) state

          The second line shows the problem -- "/JSESSION/localhost/http-field" is written to the stream; it should be "/JSESSION/localhost/http-scoped", or perhaps nothing at all.


          I'm guessing this is being written on the sender's side?

          So, this means:

          1) The ThreadLocal value being leaked on the receiver is irrelevant, as this will always be overwritten when the next communication comes in.

          2) The ThreadLocal on the sender is irrelevant as well, since this only becomes a problem if the sender attempts to reuse a thread and the next call is a return value. (if not, the ThreadLocal will just get overwritten again).

          So, the only real problem is if the ThreadLocal leaks (and this seems to happen with async replication as per the JGroups code above), and a sending thread is then reused as a receiving thread. Now AFAIK, JGroups maintains separate thread pools for sending and receiving, so the above should never happen.

          Does the above accurately describe the issue you're seeing, though?



          • 2. Re: JBCACHE-1170
            manik

            Ah, I see it - a stale TL on a sender thread, and the sender thread is then reused to transfer state and since it isn't an RPC call, the stale TL isn't overwritten.

            • 3. Re: JBCACHE-1170
              brian.stansberry

              Exactly. :-)

              • 4. Re: JBCACHE-1170
                brian.stansberry

                Sorry, had my head way deep in EJB3 details when I posted here and on the JIRA. Reading it again I certainly wasn't clear about what I thought was going on. :(

                • 5. Re: JBCACHE-1170
                  manik

                  Here's a kludgey solution that will work for you:

                  The only time the TL is used is when, on the remote end, the InactiveRegionAwareRpcDispatcher (IRARD) calls the CacheMarshaller to unmarshall an object. If this object is a method call, it's region is stored in a TL so we know which region to use for the return value.

                  The return value is marshalled by the RequestCorrelator, by passing in the result of a MethodCall invocation.

                  Since whenever the IRARD calls into the CacheMarshaller, it is always guaranteed to be a method call, we can change IRARD to call a particular method - say, regionalisedMethodCallFromByteBuffer() instead of the more generic objectFromByteBuffer(), which will return an instance of RegionalisedMethodCall.

                  class RegionalisedMethodCall
                  {
                   MethodCall call;
                   Region region;
                  }
                  


                  This way we can get rid of the TL altogether.

                  The IRARD then extracts the MethodCall, invokes it, and then returns an instance of RegionalisedReturnValue to the RequestCorrelator.

                  class RegionalisedReturnValue
                  {
                   Object returnValue;
                   Region region;
                  }
                  


                  The RequestCorrelator maybe (if not ASYNC) then calls CacheMarshaller.objectToByteBuffer() passing in the RegionalisedReturnValue.

                  And here's the kludgey bit. The CacheMarshaller first inspects the object passed in, and if it is a RegionalisedReturnValue, it only marshalls the return value in it, with the region passed in.

                  This would prevent any TL leaks if this is not called and won't require any changes in the way JGroups interacts with the marshaller.

                  • 6. Re: JBCACHE-1170
                    brian.stansberry

                    That's pretty much what the AS's HAPartition does. There it's a bit less klugey because there's no general purpose CacheMarshaller equivalent that might get passed something other than an equivalent to your RegionalisedReturnValue. As we were chatting, ensuring that the RPC/state transfer/CacheLoader uses of marshalling invoke different marshaller methods can reduce the klugeyness.