New version of CachedSetImpl | JBoss.org Content Archive (Read Only)

15. Re: New version of CachedSetImpl

brian.stansberry Feb 3, 2006 4:37 PM (in response to jpyorre)

"ben.wang@jboss.com" wrote:

4. In the case that the Key is non-primitive, we shall require it to be Serializable. I think this is in line with regular POJO Cache, where we require the POJO to be aspectizable (e.g., with XML or annotation declaration); otherwise, it needs to be Serializable.
-Ben

In the case of a Set, the user can't provide the key, so we're really saying that any values put in sets have to be Serializable, plus we're going to replicate that value as part of an Fqn anytime one of its internal fields changes.

So, I've got class School, one of whose fields is Set students. Student is itself a potentially large object graph; e.g. it has an Address with a zip field. Now I change the zip field. The zip is going to be in a node whose Fqn includes the Set element node. Therefore, even though I've only changed the zip, the entire Student object is going to be serialized and replicated as part of the Fqn.
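To make the cost concrete, here is a rough sketch that compares the serialized size of a whole Student with that of the one field that changed. It uses plain java.io serialization, which is only an assumption about how an Fqn element would be marshalled; the class shapes and the FqnCost helper are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical domain classes matching the School/Student/Address example.
class Address implements Serializable {
    String street = "1 Main Street";
    String city = "Springfield";
    String zip;
}

class Student implements Serializable {
    String name;
    Address address = new Address();
}

class FqnCost {
    // Size in bytes of the object under standard Java serialization.
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

If the Student is an element of the Fqn, every change under that node carries roughly serializedSize(student) bytes along with it, even when the payload is only the new zip string.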

16. Re: New version of CachedSetImpl

smarlow Feb 3, 2006 11:09 PM (in response to jpyorre)

Brian,

You're right, of course. I'm getting fast performance locally, but I'm guilty of double buffering the data in the same network RPC call (and then some).

I could omit the data field and only pass it in the key, but that would be too coarse-grained (the entire value would be passed in the Fqn, as you mentioned).

One idea that I had before was to speed up the set search by doing a hashed search. I would need to have a consistent hashcode across the cluster for the same search value. Once I find the matching bucket, I would compare the entries to see if they match the value.

I have another idea as well that I'm going to try next. I'll write about it later.

Thanks,
Scott

17. Re: New version of CachedSetImpl

smarlow Feb 3, 2006 11:43 PM (in response to jpyorre)

I don't think the hashed search will work either.

18. Re: New version of CachedSetImpl

brian.stansberry Feb 4, 2006 2:23 AM (in response to jpyorre)

I'm half-asleep and this is half-baked, but I want to write it down before I sleep and forget about it :) Luckily I have the power to delete posts, so if I wake up, read this and am embarrassed, well.... poof!

Use integers as keys. In CachedSetImpl add a field

private Map<Object, Integer> keyLookup = new HashMap<Object, Integer>();

In add(), after you call putObject(), call getObject() to get back the POJO. (This is probably only necessary if the POJO is a Collection, otherwise you already have the POJO.) Anyway, put the POJO and its key in the keyLookup map.

Tricky thing is the keyLookup map can't be trusted, as it won't be updated if the set is changed due to replication. It's just an optimization.

When you need to find a value (e.g. in contains()), you first check the map. If you find a key, go look up the POJO and confirm equals(). If equals() == true, done. If equals() == false, remove the now invalid entry from the map.

Unfortunately, if you don't find a key or it's invalid, you have to scan through the child nodes to confirm you really don't have it :( If you find you do have it, add it to the map. So all this only helps in the case where you do have something. Which is not the normal case :(
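A minimal sketch of the lookup-hint idea described above. A plain map stands in for the cache's child nodes so the example runs on its own; the class name LookupHintSet, the nodes field, and the way integer keys are handed out are all assumptions, not the actual CachedSetImpl code. Null values are not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class LookupHintSet {
    // Stand-in for the replicated child nodes (integer key -> POJO).
    // Replication may change this behind our back.
    final Map<Integer, Object> nodes = new TreeMap<Integer, Object>();

    // Local, non-replicated hint map. It can be stale, so it is never trusted alone.
    private final Map<Object, Integer> keyLookup = new HashMap<Object, Integer>();

    private int nextKey = 0;

    boolean add(Object value) {
        if (contains(value)) {
            return false;
        }
        Integer key = nextKey++;
        nodes.put(key, value);
        keyLookup.put(value, key);
        return true;
    }

    boolean contains(Object value) {
        Integer key = keyLookup.get(value);
        if (key != null) {
            if (value.equals(nodes.get(key))) {
                return true;              // hint confirmed against the real data
            }
            keyLookup.remove(value);      // hint was stale, drop it
        }
        // No usable hint: scan every child node to be sure.
        for (Map.Entry<Integer, Object> entry : nodes.entrySet()) {
            if (value.equals(entry.getValue())) {
                keyLookup.put(value, entry.getKey());
                return true;
            }
        }
        return false;
    }
}
```

The fast path only pays off for values that are present and already hinted; a miss still costs a full scan, which is the weakness noted above.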

If there were a way the CachedSetImpl could know if members have been added or removed due to replication, then the map could be kept up-to-date and the scan avoided. Or if there were a fast way to check whether the map is up-to-date....

Anyway, off to bed..

19. Re: New version of CachedSetImpl

brian.stansberry Feb 4, 2006 11:27 AM (in response to jpyorre)

If there were a way the CachedSetImpl could know if members have been added or removed due to replication, then the map could be kept up-to-date and the scan avoided.

For JBCACHE-354 (http://jira.jboss.com/jira/browse/JBCACHE-354) to be completed, I think there will have to be some kind of notification of when collection contents have changed.

20. Re: New version of CachedSetImpl

smarlow Feb 4, 2006 8:20 PM (in response to jpyorre)

It looks like there is already a map maintained that we could try to use. Although it maps Key to Object, I could probably use it to speed up the iterator. Our current iterator gets the set of keys and does a lookup on every call to next().

I think that the lookup is hashed and should be fast, however, it might be faster to have our iterator get the set of all values in the map instead.

I'll try this and see what kind of values I get out of it.
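The two iterator strategies can be compared with a small sketch. CountingStore is a hypothetical stand-in for the underlying map that simply counts how many calls reach it; nothing here is the real cache API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class CountingStore {
    private final Map<Integer, Object> data = new LinkedHashMap<>();
    int calls = 0; // number of calls made against the store

    void put(Integer key, Object value) { data.put(key, value); }
    Set<Integer> keys() { calls++; return new LinkedHashSet<>(data.keySet()); }
    Object get(Integer key) { calls++; return data.get(key); }
    Collection<Object> values() { calls++; return new ArrayList<>(data.values()); }
}

class SetIterators {
    // Fetch the key set once, then do one lookup per next().
    static Iterator<Object> perKey(final CountingStore store) {
        final Iterator<Integer> keys = store.keys().iterator();
        return new Iterator<Object>() {
            public boolean hasNext() { return keys.hasNext(); }
            public Object next() { return store.get(keys.next()); }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Fetch all values in a single call and iterate over that snapshot.
    // remove() on this iterator only affects the snapshot, not the store.
    static Iterator<Object> bulk(CountingStore store) {
        return store.values().iterator();
    }
}
```

For n elements the per-key iterator makes n + 1 calls against the store, while the bulk iterator makes one, at the price of iterating over a snapshot.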

Regarding your idea last night, I like it. I'm thinking that we might consider solving this at the JBoss (Tree) Cache level where we already deal with keeping things up to date across the cluster.

21. Re: New version of CachedSetImpl

smarlow Feb 4, 2006 9:11 PM (in response to jpyorre)

I think that the lookup is hashed and should be fast, however, it might be faster to have our iterator get the set of all values in the map instead.

I'll try this and see what kind of values I get out of it.

This looks promising, adding 2000 entries only took about 5.192 seconds.

I have to work on getting the Iterator.remove operations to work.

22. Re: New version of CachedSetImpl

smarlow Feb 10, 2006 11:32 AM (in response to jpyorre)

Should we add "export" functionality to JBoss Cache 1.4? I'm thinking that we could have a way to:

1. Return set of keys at a specified fqn

2. Return set of values at a specified fqn

3. Return map of keys/values at a specified fqn

I might be able to use this functionality to improve the performance of CachedSetImpl in a general way (e.g. iteration should be faster if I get all values in one call rather than getting them one at a time).
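As a sketch of what the three proposed operations might look like, assuming an Fqn can be represented as a plain string for the example. The NodeExport interface, its method names, and the in-memory implementation are hypothetical and not part of any JBoss Cache release.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical "export" operations, one per item in the list above.
interface NodeExport {
    Set<Object> keysAt(String fqn);
    Collection<Object> valuesAt(String fqn);
    Map<Object, Object> entriesAt(String fqn);
}

// In-memory stand-in so the sketch can run without a cache.
class InMemoryNodeExport implements NodeExport {
    private final Map<String, Map<Object, Object>> nodes = new HashMap<>();

    void put(String fqn, Object key, Object value) {
        Map<Object, Object> data = nodes.get(fqn);
        if (data == null) {
            data = new LinkedHashMap<>();
            nodes.put(fqn, data);
        }
        data.put(key, value);
    }

    // Returns a read-only snapshot; an unknown fqn yields an empty map.
    public Map<Object, Object> entriesAt(String fqn) {
        Map<Object, Object> data = nodes.get(fqn);
        if (data == null) {
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(new LinkedHashMap<>(data));
    }

    public Set<Object> keysAt(String fqn) { return entriesAt(fqn).keySet(); }

    public Collection<Object> valuesAt(String fqn) { return entriesAt(fqn).values(); }
}
```

With something like valuesAt() available, a set iterator could be built from one call per node instead of one call per element.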

I'll create a Jira for this if there is agreement.

23. Re: New version of CachedSetImpl

jpyorre Feb 27, 2006 8:06 AM (in response to jpyorre)

I've been away from work for a while (on honeymoon) and a lot seems to have happened since I last visited the forum...

"bstansberry@jboss.com" wrote:
No, AFAIK we're not thinking about moving away from a TreeCache-based approach, or at least if we are it's way back in the back of someone's mind. But can we get the benefit of your experiences anyway? :) We're definitely interested.

I was asking about this graph-based POJO cache just because I see so many benefits in a graph-based cache compared to tree-based cache (here are just a few):

1. It always causes problems when forcing a less restricted structure into a more restricted structure, as is the case when forcing a graph structure into a tree structure (a tree is a special case of a graph). I'm sure that you are more than aware of this problem... The other way (forcing a tree into a graph) there are no such problems.

2. Implementing two things at the same time (caching and tree structure) creates very complex and bug-inducing code because you have to handle all kinds of special cases of tree structure and caching simultaneously. If the tree cache were built on top of a graph cache (instead of the other way around), the code would be more modular and at least an order of magnitude shorter (this is just an educated guess).

3. Persistence of a tree structure is relatively slow, as you have to maintain the tree structure in the database, i.e. a tree node is dependent on all its parent nodes. A graph has no such dependencies (references to a certain node are direct, using a unique identifier for each node instead of a recursive path of nodes as an identifier) and thus is much faster. However, if using a graph-based cache as the basis of a tree cache, the performance of persistence would still be equal to the current implementation.

4. A graph-based cache would suit Entity Bean replication quite perfectly.

"ScottMarlowNovell" wrote:
I compared the performance of this new implementation against what I checked in on December 5th 2005 and don't see much difference:
...

I'm not surprised, as your implementation (of December 5th) seems very similar to mine. I wasn't aware that you had already made improvements (these changes were not yet in the 1.2 release and I haven't accessed JBossCache version control), my bad.

24. Re: New version of CachedSetImpl

ben.wang Mar 1, 2006 11:51 PM (in response to jpyorre)

"jpyorre" wrote:
I've been away from work for a while (on honeymoon) and a lot seems to have happened since I last visited the forum...

Congratulations!! Thanks for the time to come back to visit. :-)

"jpyorre" wrote:
2. Implementing two things at the same time (caching and tree structure) creates very complex and bug-inducing code because you have to handle all kinds of special cases of tree structure and caching simultaneously. If the tree cache were built on top of a graph cache (instead of the other way around), the code would be more modular and at least an order of magnitude shorter (this is just an educated guess).

True. But this, IMO, has more impact on PojoCache (or TreeCacheAop). For TreeCache itself, the usage is always flat, per se. That is, users usually store data in the cache in a logical fashion without much regard to the graph relationship. Case in point: you can store something like a year, month, week, day structure in the tree cache and it will fit nicely.

"jpyorre" wrote:
3. Persistence of a tree structure is relatively slow, as you have to maintain the tree structure in the database, i.e. a tree node is dependent on all its parent nodes. A graph has no such dependencies (references to a certain node are direct, using a unique identifier for each node instead of a recursive path of nodes as an identifier) and thus is much faster. However, if using a graph-based cache as the basis of a tree cache, the performance of persistence would still be equal to the current implementation.

Why do you need to maintain the tree structure in the persistent store? Not that our solution is fully optimized now, but we currently just store (fqn, node) (so to speak). No child-parent relationship is needed.
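A minimal sketch of what storing (fqn, node) flat could look like, with a map standing in for a table keyed by the full Fqn string. FlatNodeStore and its methods are made up for illustration and are not the actual cache loader code.

```java
import java.util.HashMap;
import java.util.Map;

class FlatNodeStore {
    // One "row" per node: the full Fqn string is the key, the node's data is the value.
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    void store(String fqn, Map<String, Object> nodeData) {
        rows.put(fqn, new HashMap<>(nodeData));
    }

    // Direct lookup by Fqn; no parent rows are read or required.
    Map<String, Object> load(String fqn) {
        Map<String, Object> data = rows.get(fqn);
        if (data == null) {
            return null;
        }
        return new HashMap<>(data);
    }
}
```

In this layout a node such as /2006/02/03 can be stored and loaded without any row existing for /2006 or /2006/02.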

"jpyorre" wrote:
4. A graph-based cache would suit Entity Bean replication quite perfectly.

This is not quite true (I assume you are talking about the entity cache). Currently, we use JBossCache behind Hibernate. So you see, Hibernate has taken care of the object relationships for us. TreeCache is again just a plain replicated cache system.

I agree that PojoCache is well suited for this. But the reality is that RDBMS is everywhere now. :-)