2 Replies Latest reply on Jul 26, 2011 4:16 AM by markaddy

    storeAsBinary behaviour

    markaddy

      Hi,

       

      I have a question about how this setting behaves in the following test case. The configuration is embedded, using distribution with numOwners=2 and two nodes (Node 1 and Node 2, shown as NodeA and NodeB in the output below).

       

      If I perform a cache.put(key, object) on Node 1, the following is stored in the cache:

       

      NodeA org.infinispan.marshall.MarshalledValue {instance = object, raw = byte[]}

      NodeB org.infinispan.marshall.MarshalledValue {instance = null, raw = byte[]}

       

      If I then perform a cache.get(key) on Node 2 and examine the cache I see the following:

       

      NodeA org.infinispan.marshall.MarshalledValue {instance = object, raw = byte[]}

      NodeB org.infinispan.marshall.MarshalledValue {instance = object, raw = byte[]}

       

      In our real-life use case the cache is large and will hold several million entries. For our data set the average hydrated object in the cache is ~10K, but serialized it shrinks to approximately 1K. Requests for cached data are evenly spread across nodes and there is no stickiness, so with storeAsBinary configured entries end up holding both forms, which gives us an approximate 10% overhead in memory requirements, offset against reduced (de)serialization costs. For us, however, the cost of deserializing is insignificant compared with the ability to cache as much data as possible; the latency of obtaining the data to cache is the biggest concern. If we run in Hot Rod mode we see massively reduced memory requirements, as the Hot Rod server stores all entries as byte[].

       

      Is it possible / a valid request to force serialized storage in Embedded mode?

       

      Thanks

       

      Mark

       

      C2B2 Consulting (http://www.c2b2.co.uk)

        • 1. Re: storeAsBinary behaviour
          galder.zamarreno

          Mark, not sure I understand what you're asking exactly. Is it that you want Infinispan to *only* store the serialized (byte[]) form in memory and no reference to the original instance?

           

          To understand the current design of storeAsBinary, you need to understand where it comes from. The original aim was to provide lazy deserialization, which allows data on the receiving side to be deserialized lazily, when it is requested, on the assumption that the right classloader will be in place at that point. Often, when data is replicated to another node, the thread receiving the data might not have access to the right classloader, so a normal embedded configuration would break with ClassNotFoundException or similar.

           

          Taking this into account, what is now called 'storeAsBinary' is not a true reflection of what happens underneath. In a lazy deserialization case, you normally work with the instances, and the binary form is there only to allow deserialization to happen lazily when data is requested. So the data is in either one form or the other, which is why you see the MarshalledValue with two fields. Btw, those log messages look a bit outdated. I'd suggest moving to the latest Infinispan version (currently 5.0.0.CR8), which provides more info.
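
          To make the two-field behaviour concrete, here is a minimal sketch of a lazily deserialized value holder. It is not Infinispan's MarshalledValue: the class name LazyValueSketch and all of its methods are hypothetical, plain Java serialization stands in for the real marshaller, and the actual internals may differ. It only mirrors what the output in the question shows: the receiving node starts with instance = null and keeps both forms after the first read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical illustration only; not Infinispan's MarshalledValue.
final class LazyValueSketch {
    private Object instance;   // hydrated form, null until first requested
    private final byte[] raw;  // serialized form, always retained in this sketch

    private LazyValueSketch(Object instance, byte[] raw) {
        this.instance = instance;
        this.raw = raw;
    }

    // Node that performed the put: it has the instance and serializes it for replication.
    static LazyValueSketch onOriginNode(Object instance) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(instance);
        }
        return new LazyValueSketch(instance, bytes.toByteArray());
    }

    // Node that received the replicated bytes: nothing is deserialized yet.
    static LazyValueSketch onReceivingNode(byte[] raw) {
        return new LazyValueSketch(null, raw);
    }

    // Deserializes on first access, then keeps the instance alongside the bytes.
    synchronized Object get() throws IOException, ClassNotFoundException {
        if (instance == null) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
                instance = in.readObject();
            }
        }
        return instance;
    }

    synchronized boolean isHydrated() {
        return instance != null;
    }

    byte[] raw() {
        return raw;
    }

    public static void main(String[] args) throws Exception {
        LazyValueSketch nodeA = LazyValueSketch.onOriginNode("some value");
        LazyValueSketch nodeB = LazyValueSketch.onReceivingNode(nodeA.raw());
        System.out.println("NodeB hydrated before get: " + nodeB.isHydrated());
        nodeB.get();
        System.out.println("NodeB hydrated after get: " + nodeB.isHydrated());
    }
}
```

          Because the holder never drops the instance once it has been read, an entry that is requested on every owner ends up carrying both forms everywhere, which is the memory overhead described in the question.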

           

          Even though we don't provide a pure byte[] form of storing data in embedded mode, it shouldn't be difficult for you to work that out. All you need to do is mimic what the Java HotRod client does with the data it receives, which is converted into a byte[] using the GenericJBossMarshaller. Once you have got hold of the Marshaller (interface), all you have to do is call objectToByteBuffer() to convert an object into a byte[] and objectFromByteBuffer() to do the reverse. See RemoteCacheImpl.bytes2obj() and RemoteCacheImpl.obj2bytes() to see this in action.
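
          As a rough sketch of that approach, the wrapper below serializes values before they are stored and deserializes them on the way out, so only byte[] is ever held. To stay self-contained it makes two assumptions: a ConcurrentHashMap stands in for the embedded cache, and plain Java serialization stands in for the Marshaller's objectToByteBuffer() and objectFromByteBuffer(). The class name BinaryOnlyCache and its methods are hypothetical, not Infinispan API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper; with Infinispan the map would be a Cache<K, byte[]>.
final class BinaryOnlyCache<K, V> {
    private final Map<K, byte[]> store = new ConcurrentHashMap<>();

    // Plays the role of Marshaller.objectToByteBuffer() in this sketch.
    static byte[] toBytes(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Plays the role of Marshaller.objectFromByteBuffer() in this sketch.
    static Object fromBytes(byte[] raw) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    // Only the serialized form is stored; no reference to the instance is kept.
    void put(K key, V value) throws IOException {
        store.put(key, toBytes(value));
    }

    // Every read deserializes a fresh copy, or returns null if the key is absent.
    @SuppressWarnings("unchecked")
    V get(K key) throws IOException, ClassNotFoundException {
        byte[] raw = store.get(key);
        return raw == null ? null : (V) fromBytes(raw);
    }

    public static void main(String[] args) throws Exception {
        BinaryOnlyCache<String, String> cache = new BinaryOnlyCache<>();
        cache.put("key", "value");
        System.out.println(cache.get("key"));
    }
}
```

          The trade-off is that every get pays a deserialization cost and returns a new copy, in exchange for holding only the compact serialized form in memory.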

          • 2. Re: storeAsBinary behaviour
            markaddy

            Hi Galder,

            Yes, only store byte[]. We have already done as you suggest: we grab the Marshaller, serialize entries before placing them in the cache, and reverse the process when retrieving them. It works well, allowing us to reduce memory usage in embedded mode without any significant overhead. I thought this might be a useful configuration option.

            Thanks

            Mark