Welcome to the JBoss/Caches forum !
This forum is used to discuss the design/implementation of the new caching subsystem for JBoss.
What is the caching subsystem?
A replicated cache whose contents will be the same across all nodes of a caching cluster. When an entry is added to the cache of one node, it will be replicated to all nodes. Reads are always local.
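To make these semantics concrete, here is a minimal single-process sketch (all names are hypothetical, not the actual implementation) of "a write is replicated to every node, a read is purely local":

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each node holds a full copy of the cache.
// A put() is applied on every node; a get() only reads the local
// copy, so reads never touch the network.
public class ReplicatedCacheSketch {
    static class Node {
        final Map<String, String> local = new HashMap<>();

        String get(String key) {          // read: purely local
            return local.get(key);
        }
    }

    final List<Node> cluster = new ArrayList<>();

    void put(String key, String value) {  // write: replicate to all nodes
        for (Node n : cluster) {
            n.local.put(key, value);
        }
    }

    public static void main(String[] args) {
        ReplicatedCacheSketch cache = new ReplicatedCacheSketch();
        cache.cluster.add(new Node());
        cache.cluster.add(new Node());

        cache.put("bean:42", "state-A");  // added via one entry point...
        // ...and readable locally on every node:
        System.out.println(cache.cluster.get(0).get("bean:42"));
        System.out.println(cache.cluster.get(1).get("bean:42"));
    }
}
```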
Where will it be used ?
Primary target is a cache for entity beans. The idea is to have a write-through cache for entity beans, with reads being purely local (from the cache), thus improving performance by eliminating access to the DB for read-only methods.
But there are many places where it can be used, e.g. clustered JMS.
What is the structure ?
Not yet decided. Possibly a map-like structure, but it could also be a tree.
What are the semantics ?
- Asynchronous replication: put the update on the JavaGroups channel (see www.javagroups.com for info) and return immediately.
- Synchronous replication: propagate the update to all nodes and wait for all replies (excluding crashed members). This will ensure that all nodes have applied the update before we update another node.
- Serialized synchronous replication: same as above, but with total serialization, i.e. if the same data is modified simultaneously on separate nodes of the cluster, the updates are serialized so that every node receives the updates from all nodes in the same order. We'll use locks to do this. Issues: the first version will use timeouts; later versions should use deadlock detection.
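The difference between the first two modes can be sketched with a toy single-process simulation (hypothetical names; the real building block goes through a JavaGroups channel, of course). Async queues the update and returns at once, so another node may briefly read stale data; sync applies the update everywhere before returning:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical simulation of the two basic replication modes.
// ASYNC: the update is queued for the channel and put() returns at once.
// SYNC:  put() applies the update on every node before returning.
public class ReplicationModes {
    enum Mode { ASYNC, SYNC }

    final Map<String, String>[] nodes;                  // one map per node
    final Queue<String[]> channel = new ArrayDeque<>(); // pending async updates

    @SuppressWarnings("unchecked")
    ReplicationModes(int n) {
        nodes = new Map[n];
        for (int i = 0; i < n; i++) nodes[i] = new HashMap<>();
    }

    void put(Mode mode, String key, String value) {
        if (mode == Mode.SYNC) {
            for (Map<String, String> node : nodes) node.put(key, value);
        } else {
            channel.add(new String[] { key, value });   // deliver later
        }
    }

    void deliverPending() {                             // the channel catching up
        String[] u;
        while ((u = channel.poll()) != null)
            for (Map<String, String> node : nodes) node.put(u[0], u[1]);
    }

    public static void main(String[] args) {
        ReplicationModes r = new ReplicationModes(2);

        r.put(Mode.ASYNC, "k", "v1");
        System.out.println(r.nodes[1].get("k"));  // not delivered yet
        r.deliverPending();
        System.out.println(r.nodes[1].get("k"));

        r.put(Mode.SYNC, "k", "v2");
        System.out.println(r.nodes[1].get("k"));  // visible on return
    }
}
```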
How do I go about implementing this ?
First stage: design and implementation of the building block described above (in JavaGroups).
Second stage: provide an implementation of the JCache (JSR 107) interface in JBoss. This implementation would use the above JavaGroups building block, and will probably be an MBean.
Third stage: implement an entity bean cache using the above MBean.
I'll post the JavaGroups building block once I have a usable interface so people can comment.
Can you tell me where you are with this cache?
Do you have any documentation of your work?
I have a first draft of the building block in JavaGroups 2.0.5 (org.javagroups.blocks.TransactionalHashtable/ReplicationManager). Async/sync without locks works; I'm working on the locking now. It should take me around 2 weeks until locking is done (that is, without deadlock detection, just simple timeouts). The current version is in the JavaGroups CVS (www.sf.net/projects/javagroups).
In the meantime I will create an org.jboss.cache package and create a draft version of a replicated cache MBean, so people can start playing around. This will later probably be replaced by a real JCache implementation.
Once I have created the initial MBean, I'll cross-announce it here and on jboss-dev, so we can get going on implementing a JCache-like replicated cache.
This sounds interesting.
Are you implementing pessimistic locking only, or do you have an optimistic locking policy in mind too?
The ReplicationManager can use both types of locking: you essentially have a send() call which receives the data to be updated (possibly with locking information, if it cannot be inferred from the data), a commit() and a rollback() call.
Pessimistic locking will always attempt to acquire locks before proceeding and - if a lock cannot be acquired - the entire transaction will be rolled back. Optimistic locking will just go ahead, apply the updates to a local copy and, on commit(), check whether there's a conflict.
What I'm saying is that ReplicationManager leaves the locking, updating and committing (or aborting) to an implementation of ReplicationReceiver.
TransactionalHashtable will be such an implementation, and it will use pessimistic locking. A first implementation will use timeouts for lock acquisition to avoid deadlocks; but I plan to replace this with a distributed deadlock detection algorithm later.
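A rough sketch of the split described above - the manager delegates locking, updating and committing/aborting to a receiver - might look like the following. All signatures here are hypothetical; the real org.javagroups.blocks classes almost certainly differ:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the ReplicationManager side calls receive(),
// then commit() or rollback(); the Receiver decides how to lock.
public class ReplicationSketch {

    interface Receiver {
        // Stage an update; throw to signal a lock conflict.
        void receive(String txId, String key, String value) throws Exception;
        void commit(String txId);
        void rollback(String txId);
    }

    // Pessimistic receiver: try to lock up front, fail on conflict.
    static class PessimisticReceiver implements Receiver {
        final Map<String, String> data = new HashMap<>();
        final Map<String, String> locks = new HashMap<>();     // key -> txId
        final Map<String, String[]> staged = new HashMap<>();  // txId -> {key, value}

        public void receive(String txId, String key, String value) throws Exception {
            String owner = locks.get(key);
            if (owner != null && !owner.equals(txId))
                throw new Exception("lock held by " + owner); // a timeout, in practice
            locks.put(key, txId);
            staged.put(txId, new String[] { key, value });
        }

        public void commit(String txId) {
            String[] u = staged.remove(txId);
            if (u != null) { data.put(u[0], u[1]); locks.remove(u[0]); }
        }

        public void rollback(String txId) {
            String[] u = staged.remove(txId);
            if (u != null) locks.remove(u[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        PessimisticReceiver r = new PessimisticReceiver();
        r.receive("tx1", "k", "v");
        try {
            r.receive("tx2", "k", "w");   // conflicts with tx1's lock
        } catch (Exception e) {
            System.out.println("tx2 blocked");
        }
        r.commit("tx1");
        System.out.println(r.data.get("k"));
    }
}
```

An optimistic receiver would instead accept all updates against a local copy and only check for conflicts in commit().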
Both of the above classes are part of JavaGroups (2.0.5), which is in the jboss-latest CVS. I'm going to post a description of them on this forum shortly.
Good luck on your implementation!
I'm not trying to troll here, but is there any evidence that a distributed cache is a benefit, especially as an entity bean cache? I'm thinking that I use homogeneous deployment of my EJB tier, and co-location of my web tier and EJB tier, just to avoid unnecessary serialization. Is there any business benefit to placing a distributed caching implementation in the JBoss container? At some point you end up developing a database server, but we already have that.
It depends on what you use the cache for. JCache consists of both a local and a replicated cache. If we look at the local cache side first, the EntityBean cache is not currently a 'service' in the sense that others can use it, or that we can add hooks into it. It is currently some piece of code that resides in the CMP code.
If we can offer a caching service (I'm not even talking about replication yet), then some code duplication can probably be avoided.
Having a local cache obviously helps a lot for accessing read-only entity beans.
Now for the replicated side (disclaimer: I'm not a CMP guy): we're looking at performance improvements especially for a clustered environment. There are 2 scenarios I can think of:
- All nodes access the same shared database
- Each node has its own local database.
For the first case we currently use commit option B or C to guarantee exclusive access to the bean from any node. That means: start a tx, fetch the bean into memory, apply methods, commit the tx, and discard the transient object again.
If we could somehow keep that entry in the cache, and work only on the cache, without always having to go to the DB, that would be great.
Obviously, for a modification, you have to write to the DB and update (or invalidate) the bean in all other nodes (write-through cache), but for reads you only access the bean in the cache, which should improve performance a lot.
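The write-through idea for the shared-DB case can be sketched like this (a toy simulation with hypothetical names; here a write invalidates the entry on the peers rather than updating it):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical write-through sketch: a write goes to the shared DB and
// invalidates the entry on every peer; a read is served from the local
// cache and only falls back to the DB on a miss.
public class WriteThroughSketch {
    static final Map<String, String> database = new HashMap<>(); // shared DB

    static class Node {
        final Map<String, String> cache = new HashMap<>();
        Node[] peers = new Node[0];

        void write(String key, String value) {
            database.put(key, value);                 // write through to the DB
            cache.put(key, value);
            for (Node p : peers) p.cache.remove(key); // invalidate peers
        }

        String read(String key) {
            String v = cache.get(key);
            if (v == null) {                          // miss: load from DB
                v = database.get(key);
                cache.put(key, v);
            }
            return v;
        }
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node();
        a.peers = new Node[] { b };
        b.peers = new Node[] { a };

        a.write("bean:7", "v1");
        System.out.println(b.read("bean:7")); // miss, loaded from the DB
        System.out.println(b.read("bean:7")); // now purely local
    }
}
```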
For the case where each node has its own local DB: here we currently recommend database replication. But such solutions are typically expensive, plus they don't solve our problem. What commercial DB replication typically does is put the update on a queue and replicate lazily. This can lead to inconsistencies.
So what I have in mind is: on write access, pin down the bean in all nodes, update it, and commit to the local DB. Read access would then just hit the cached bean on the local node.
Again, these ideas have to be fleshed out, and I'm welcoming suggestions. I will at some point also need to dig down into the CMP code (unless Dain does this for me :-)) and investigate where to put the hooks for the cache.
But the above discussion is highly CMP specific; the cache will first of all just be a service that can be used by everyone, which (I think) is going to be useful in itself.
My 5 cents,
oravecz: "I'm not trying to troll here, but is there any evidence that a distributed cache is a benefit, especially as an entity bean cache."
Caching is always the cheapest way to get more performance. We have JBoss customers whose apps are search engines running our Coherence cache product today because they couldn't pull data fast enough from a database.
Coherence: Easily share live data across a cluster!