5 Replies Latest reply on Dec 11, 2011 5:05 PM by sannegrinovero

Infinispan calling CacheStore.load(Object)

mrlemao Dec 10, 2011 4:58 PM

How can a CacheStore tell Infinispan once that an entry doesn't exist in the store?

It seems that if I return null in CacheStore.load(Object), the same method will be called over and over. I tried to return an InternalCacheEntry with a null value, but then the same load() is called the next time I try to get the key from the cache.

This scenario assumes that all changes to the store will happen through the cache.

1. Re: Infinispan calling CacheStore.load(Object)

sannegrinovero Dec 10, 2011 5:18 PM (in response to mrlemao)

Hi,
that's not currently possible by design as the cache might be shared, or the Infinispan node might have been restarted and so have no information on what's in the cachestore.

A simple solution would be to have your application store a specific value which means "null" for your purposes, so this canary value would be cached in memory, passivated if needed, or replaced correctly if another node writes to it.
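
A minimal sketch of the canary-value idea, under some loud assumptions: a plain java.util.concurrent.ConcurrentMap stands in for the cache (Infinispan's Cache extends that interface), the `backend` function stands in for the path where a cache miss ends up in CacheStore.load(), and the names `NullMarkerCache` and `MISSING` are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical wrapper: stores a marker for keys known to be absent,
// so the backing lookup runs only once per missing key.
final class NullMarkerCache<K> {
    // Assumed marker; in a real cluster it would have to be a serializable value.
    static final Object MISSING = new Object();

    private final ConcurrentMap<K, Object> cache;
    private final Function<K, Object> backend; // stand-in for the store lookup

    NullMarkerCache(ConcurrentMap<K, Object> cache, Function<K, Object> backend) {
        this.cache = cache;
        this.backend = backend;
    }

    Object get(K key) {
        Object cached = cache.get(key);
        if (cached == null) {
            Object loaded = backend.apply(key);
            Object toStore = (loaded == null) ? MISSING : loaded;
            // Another thread may have stored something first; keep its entry.
            Object raced = cache.putIfAbsent(key, toStore);
            cached = (raced != null) ? raced : toStore;
        }
        // Translate the marker back to null before the caller sees it.
        return cached == MISSING ? null : cached;
    }
}

final class NullMarkerCacheDemo {
    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();
        Function<String, Object> backend = key -> {
            lookups.incrementAndGet();
            return null; // the key does not exist in the store
        };
        NullMarkerCache<String> cache = new NullMarkerCache<>(map, backend);
        cache.get("k1");
        cache.get("k1");
        System.out.println("backend lookups: " + lookups.get()); // prints 1
    }
}
```

Because the marker is an ordinary cache value, it is evicted, passivated, or overwritten like any other entry, which is the point of the suggestion.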
2. Re: Infinispan calling CacheStore.load(Object)

mrlemao Dec 10, 2011 10:24 PM (in response to sannegrinovero)

I guess if I have NULL values managed by the app, I would also have to make sure I don't use methods like putIfAbsent() (not sure if there could be other infinispan internal behavior changes that could be affected by an explicit NULL value).

I am not sure I follow the shared cache issue you mention above: are you referring to the case where the underlying datastore can change outside the cache or a second node using a different loader? Could you expand a bit on that?

The way I see it (from a distance, since I have limited exposure to Infinispan at this point), is that if Infinispan is configured with a shared loader and has already tried to load a key K1, it should not try again unless the app forces it with a flag or the app puts() a new value.

regards
3. Re: Infinispan calling CacheStore.load(Object)

sannegrinovero Dec 11, 2011 11:47 AM (in response to mrlemao)

Lemao Sen wrote:

I guess if I have NULL values managed by the app, I would also have to make sure I don't use methods like putIfAbsent() (not sure if there could be other infinispan internal behavior changes that could be affected by an explicit NULL value).
Yes, that's correct, but it's not very hard: use replace(..) instead of putIfAbsent(..). To be nice and clean, you could wrap the Cache with a custom Cache implementation which adds the necessary bits. It might actually be nice to contribute such a utility to the core, assuming the other developers agree with the approach, as it might not be the best one (better to discuss such a contribution on the developer mailing list first).
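
As a rough sketch of what such a wrapper's putIfAbsent could look like: it treats the marker as "no value" and swaps it out with the three-argument replace(key, oldValue, newValue) from ConcurrentMap. The class name `MarkerAwarePut` and the marker string are assumptions for illustration, and a plain ConcurrentHashMap stands in for the cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MarkerAwarePut {
    static final String MISSING = "__MISSING__"; // assumed marker value

    // Behaves like putIfAbsent, but treats the marker as "no value".
    // Returns the previous real value, or null if the new value was stored.
    static String putIfAbsent(ConcurrentMap<String, String> cache, String key, String value) {
        while (true) {
            String existing = cache.putIfAbsent(key, value);
            if (existing == null) {
                return null; // key was truly absent; value stored
            }
            if (!MISSING.equals(existing)) {
                return existing; // a real value is present; leave it alone
            }
            // Only the marker is present: swap it atomically for the value.
            if (cache.replace(key, MISSING, value)) {
                return null;
            }
            // Lost a race with another writer; re-check the current state.
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("k1", MISSING);
        System.out.println(putIfAbsent(cache, "k1", "v1")); // null (marker replaced)
        System.out.println(putIfAbsent(cache, "k1", "v2")); // v1 (real value kept)
        System.out.println(cache.get("k1"));                // v1
    }
}
```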

I am not sure I follow the shared cache issue you mention above: are you referring to the case where the underlying datastore can change outside the cache or a second node using a different loader? Could you expand a bit on that?
Let's say your node A does a get(key1), and is returned null from the cacheloader; now a different node B writes on the same key1 and actually creates a value. What would you expect to happen on node A if it was doing the same get(key1) again?
If you have a "null cache" or something like that preventing node A from looking again, your application won't receive the value; this might work in some configurations, as with DIST you would ask other nodes before looking into the store, but it might fail to return the value in configurations such as REPL with passivation (for example).
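
The node A / node B scenario can be reproduced with a toy model. This is not Infinispan code: `ToyNode`, its private set of "known missing" keys, and the `invalidate` call are all hypothetical, and a shared map stands in for the shared store.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model: each node privately remembers which keys it found missing
// in a store that is shared by all nodes.
final class ToyNode {
    private final Map<String, String> sharedStore;
    private final Set<String> knownMissing = new HashSet<>();

    ToyNode(Map<String, String> sharedStore) {
        this.sharedStore = sharedStore;
    }

    String get(String key) {
        if (knownMissing.contains(key)) {
            return null; // trusts the local "null cache" and skips the store
        }
        String value = sharedStore.get(key);
        if (value == null) {
            knownMissing.add(key);
        }
        return value;
    }

    void put(String key, String value) {
        knownMissing.remove(key);
        sharedStore.put(key, value);
    }

    // Would have to be triggered by a message from the writing node.
    void invalidate(String key) {
        knownMissing.remove(key);
    }
}

final class StaleNullDemo {
    public static void main(String[] args) {
        Map<String, String> store = new ConcurrentHashMap<>();
        ToyNode a = new ToyNode(store);
        ToyNode b = new ToyNode(store);

        System.out.println(a.get("key1")); // null, and A remembers it
        b.put("key1", "value");            // B writes through to the store
        System.out.println(a.get("key1")); // still null: A's view is stale
        a.invalidate("key1");              // only an invalidation fixes it
        System.out.println(a.get("key1")); // value
    }
}
```

Without some message from B reaching A, A has no reason to look at the store again, which is the failure case described above.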

The way I see it (from a distance, since I have limited exposure to Infinispan at this point), is that if Infinispan is configured with a shared loader and has already tried to load a key K1, it should not try again unless the app forces it with a flag or the app puts() a new value.
I'm not against the idea, just making clear it's not how it works today and it might be a bit tricky to implement. The main argument I have against this is that you would require it to store somewhere all keys for which the value is known to be null; not only does this take some memory, but you would also need additional eviction/cleanup policies to avoid keeping around keys which are of no interest anymore. I guess the L1 cache could be a good fit for some logic around this, as it's both intentionally more limited in size and can optionally receive invalidation commands from other nodes on the keys it contains.

How would you implement it?
And does this affect you significantly?
4. Re: Infinispan calling CacheStore.load(Object)

mrlemao Dec 11, 2011 4:46 PM (in response to sannegrinovero)

No problem if this is how it works: just trying to understand what is as designed and the motivations behind it.
Sanne Grinovero wrote:
I am not sure I follow the shared cache issue you mention above: are you referring to the case where the underlying datastore can change outside the cache or a second node using a different loader? Could you expand a bit on that?
Let's say your node A does a get(key1), and is returned null from the cacheloader; now a different node B writes on the same key1 and actually creates a value. What would you expect to happen on node A if it was doing the same get(key1) again?
If you have a "null cache" or something like that preventing node A from looking again, your application won't receive the value; this might work in some configurations, as with DIST you would ask other nodes before looking into the store, but it might fail to return the value in configurations such as REPL with passivation (for example).
I guess it would depend on the cache mode. If I were using invalidation mode, I would expect Infinispan to invalidate the NULL entry in A's cache so that A's next get(key1) would fetch it from the store this time. If I were using replication mode, I would expect the new key1 value to propagate to A's cache, overriding the NULL value with the new value (an A.get(key1) should return the new value of key1). For distribution mode, I would expect the new key1 value to be propagated to all nodes that hold a replica of key1, and the L1 caches of the other nodes to invalidate key1.
However, I am sure I am missing a bunch of other use cases here.

For instance, this is what I am seeing when I trace one put() into the cache (replication mode, single node, with a cache store):

load(_OTClwCQ9EeGd6pdINye46g)
ENTRY CREATED (origin=true, pre=true)
ENTRY CREATED (origin=true, pre=false)
ENTRY modified (origin=true, pre=true)
ENTRY modified (origin=true, pre=false)
load(_OTClwCQ9EeGd6pdINye46g)
load(_OTClwCQ9EeGd6pdINye46g)
store(_OTClwCQ9EeGd6pdINye46g)

There are 3 loads before the entry is finally stored. If the key doesn't exist in the back end, this means that I am hitting the back end 3 times before I finally store it.

The way I see it (from a distance, since I have limited exposure to Infinispan at this point), is that if Infinispan is configured with a shared loader and has already tried to load a key K1, it should not try again unless the app forces it with a flag or the app puts() a new value.
I'm not against the idea, just making clear it's not how it works today and it might be a bit tricky to implement. The main argument I have against this is that you would require it to store somewhere all keys for which the value is known to be null; not only does this take some memory, but you would also need additional eviction/cleanup policies to avoid keeping around keys which are of no interest anymore. I guess the L1 cache could be a good fit for some logic around this, as it's both intentionally more limited in size and can optionally receive invalidation commands from other nodes on the keys it contains.

How would you implement it?
And does this affect you significantly?

I am not sure how to solve this specifically without first understanding quite a bit more of Infinispan's design and implementation (not an easy task, I suppose). At first glance it doesn't seem a problem to hit the cache loader once in a while (e.g. eviction could simply discard NULL markers), so I don't see the need to keep track of nulls everywhere and consistently (the NULL marker is just an internal hint: it could be stored as the value of key1 and transparently converted to null by Infinispan before it reaches the app as the result of a get(key1)).
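
To illustrate the "eviction could simply discard NULL markers" point, here is a small sketch of a size-bounded set of known-missing keys. It is only an illustration in plain Java: `BoundedMissingKeys` is a hypothetical name, and the oldest marker is dropped first once the limit is exceeded.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical bounded set of keys believed to be absent from the store.
// Forgetting a marker is harmless: it only costs one extra store lookup.
final class BoundedMissingKeys<K> {
    private final Set<K> keys;

    BoundedMissingKeys(final int maxEntries) {
        // Insertion-ordered map that drops its eldest entry when too large.
        Map<K, Boolean> bounded = new LinkedHashMap<K, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
        this.keys = Collections.synchronizedSet(Collections.newSetFromMap(bounded));
    }

    void markMissing(K key) {
        keys.add(key);
    }

    boolean isKnownMissing(K key) {
        return keys.contains(key);
    }

    // Called when a value is written for the key.
    void clear(K key) {
        keys.remove(key);
    }
}

final class BoundedMissingKeysDemo {
    public static void main(String[] args) {
        BoundedMissingKeys<String> missing = new BoundedMissingKeys<>(2);
        missing.markMissing("a");
        missing.markMissing("b");
        missing.markMissing("c"); // exceeds the limit, so "a" is discarded
        System.out.println(missing.isKnownMissing("a")); // false
        System.out.println(missing.isKnownMissing("c")); // true
    }
}
```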

In any case, I would at least expect that a put() would remember that it already hit cachestore.load() once, so that it doesn't happen 3 times as in the above example.
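
The "remember it within one operation" idea can be sketched as a per-operation memo. This is a toy illustration only and says nothing about how Infinispan is structured internally: `OperationLoadMemo` is a hypothetical class, and the `storeLoad` function stands in for CacheStore.load().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical context created for a single cache operation (e.g. one put()).
// It records every store lookup, including those that found nothing.
final class OperationLoadMemo {
    private final Map<String, Optional<String>> seen = new HashMap<>();
    private final Function<String, String> storeLoad;

    OperationLoadMemo(Function<String, String> storeLoad) {
        this.storeLoad = storeLoad;
    }

    String load(String key) {
        Optional<String> result = seen.get(key);
        if (result == null) {
            // First lookup in this operation: ask the store and remember
            // the answer, using an empty Optional for "not found".
            result = Optional.ofNullable(storeLoad.apply(key));
            seen.put(key, result);
        }
        return result.orElse(null);
    }
}

final class OperationLoadMemoDemo {
    public static void main(String[] args) {
        AtomicInteger storeLookups = new AtomicInteger();
        Function<String, String> storeLoad = key -> {
            storeLookups.incrementAndGet();
            return null; // key is not in the store
        };
        OperationLoadMemo operation = new OperationLoadMemo(storeLoad);
        operation.load("key1");
        operation.load("key1");
        operation.load("key1");
        System.out.println("store lookups: " + storeLookups.get()); // prints 1
    }
}
```

The memo lives only as long as the operation, so it avoids the staleness concerns of a longer-lived "null cache".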

And, in my case, the distributed cache is the data store across the cluster and the storage is just where it is persisted so everything goes through the cache (this is the opposite of using the cache as a 2nd level cache to a database - here the data store is the database and the cache is mainly speeding up that process).

And finally, thanks for the additional info on this. Appreciated.
5. Re: Infinispan calling CacheStore.load(Object)

sannegrinovero Dec 11, 2011 5:05 PM (in response to mrlemao)

Lemao Sen wrote:

And finally, thanks for the additional info on this. Appreciated.
Thanks to you! You're raising some very valid arguments.

What you're saying now is that in the scope of a single operation a cachestore load happens 3 times? That is certainly wrong; the CacheLoader should be hit once at most (ignoring the final store). Please open an issue about this! Very well spotted. I'm sorry, from your original post I assumed the loads were happening across multiple application invocations.