Jeez, Jive is getting on my nerves.
This is the third time I've posted this answer, but I haven't seen it appear anywhere yet. Sorry for the repost.
Since I don't know that much about caching, shut me up if I'm talking nonsense.
To me a cache is a big map with objects in it, i.e. a collection. That is what Prevayler is: no SQL or anything like that.
So what does Prevayler add? Well, if I have a collection and the power goes PANG!, I can restart the system and the store is restored (... no more writes lost in cache state).
Since, to my knowledge, Prevayler builds on the plain collection APIs, the Jakarta JXPath implementation can be applied on top of the system (though it doesn't have to be). What does this add?
Say I have a map called globalcache containing a submap for every WAR deployed in the system/network, each of which in turn contains submaps with caches for, say, SFSBs and CMP EJBs. Then I could retrieve cached value objects using JXPath like this:
Object cachedObject = context.getValue("globalcache/my_web_app/sfsb/cache_specific_wrapper_object[cacheObjectPKfield='id']/cached_object");
The context object is set to point to my global collection object. The cache_specific_wrapper_object could hold the time of caching, timeout rules, etc. The lookup is done using introspection, so it could use some speeding up, but you have to admit: this is spiffy.
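JXPath itself is an external library, so the sketch below does not use it. As a rough, hypothetical stand-in, it hand-rolls a plain slash-path lookup over nested HashMaps to show the shape described above: a global map keyed by WAR, then by component type, then by primary key, with a wrapper (CacheEntry here) that records the caching time and a timeout. CacheEntry, GlobalCache and the key layout are my own assumptions; there are no predicates and no introspection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache_specific_wrapper_object: the cached value plus when it was cached and how long it may live.
final class CacheEntry {
    final Object value;
    final long cachedAtMillis;
    final long timeToLiveMillis;

    CacheEntry(Object value, long cachedAtMillis, long timeToLiveMillis) {
        this.value = value;
        this.cachedAtMillis = cachedAtMillis;
        this.timeToLiveMillis = timeToLiveMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis - cachedAtMillis > timeToLiveMillis;
    }
}

// Global cache as nested maps: war name -> component type -> primary key -> CacheEntry.
final class GlobalCache {
    private final Map<String, Object> root = new HashMap<>();

    // Stores the entry at the last path step, creating intermediate maps as needed.
    @SuppressWarnings("unchecked")
    void put(String path, CacheEntry entry) {
        String[] steps = path.split("/");
        Map<String, Object> node = root;
        for (int i = 0; i < steps.length - 1; i++) {
            node = (Map<String, Object>) node.computeIfAbsent(steps[i], k -> new HashMap<String, Object>());
        }
        node.put(steps[steps.length - 1], entry);
    }

    // Walks the slash-separated path; returns null if any step is missing or the entry has expired.
    @SuppressWarnings("unchecked")
    Object get(String path, long nowMillis) {
        Object node = root;
        for (String step : path.split("/")) {
            if (!(node instanceof Map)) return null;
            node = ((Map<String, Object>) node).get(step);
        }
        if (!(node instanceof CacheEntry)) return null;
        CacheEntry entry = (CacheEntry) node;
        return entry.isExpired(nowMillis) ? null : entry.value;
    }

    public static void main(String[] args) {
        GlobalCache cache = new GlobalCache();
        cache.put("my_web_app/sfsb/42", new CacheEntry("cart-42", 0, 1_000));
        System.out.println(cache.get("my_web_app/sfsb/42", 500));   // cart-42
        System.out.println(cache.get("my_web_app/sfsb/42", 5_000)); // null (expired)
    }
}
```

Timestamps are passed in explicitly so expiry is easy to test; a real cache would read the clock itself.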
The point is that all the commands that modify the global collection object are submitted as command objects (the Command pattern) and spooled to disk, combined with a scheduled dump of the in-memory collection to clear this command spool.
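A minimal sketch of the prevalence idea, assuming a map of strings as the whole "system": every write is a serializable command that is logged before it is applied, a snapshot replaces the log, and recovery replays the log over the last snapshot. This is not Prevayler's actual API; MapCommand, Put and Prevalent are hypothetical names, and the "disk" is just byte arrays kept in memory so the example runs anywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A write command; it must be deterministic so replaying it rebuilds exactly the same state.
interface MapCommand extends Serializable {
    void executeOn(Map<String, String> system);
}

final class Put implements MapCommand {
    private static final long serialVersionUID = 1L;
    private final String key;
    private final String value;

    Put(String key, String value) { this.key = key; this.value = value; }

    @Override
    public void executeOn(Map<String, String> system) { system.put(key, value); }
}

// Hypothetical prevalent system: in-memory map + command log + snapshot.
final class Prevalent {
    private HashMap<String, String> system = new HashMap<>();
    private final List<byte[]> log = new ArrayList<>();                 // stand-in for the command spool
    private byte[] snapshot = serialize(new HashMap<String, String>()); // stand-in for the snapshot file

    // Log first, then apply: a command that reached the log survives a crash.
    synchronized void execute(MapCommand command) {
        log.add(serialize(command));
        command.executeOn(system);
    }

    // Scheduled dump: write the whole map, after which the spooled commands are no longer needed.
    synchronized void takeSnapshot() {
        snapshot = serialize(system);
        log.clear();
    }

    // Restart: load the last snapshot, then replay the logged commands in order.
    @SuppressWarnings("unchecked")
    synchronized void recover() {
        system = (HashMap<String, String>) deserialize(snapshot);
        for (byte[] entry : log) ((MapCommand) deserialize(entry)).executeOn(system);
    }

    synchronized String get(String key) { return system.get(key); }

    // Only the in-memory state is lost; the log and snapshot play the role of the disk.
    synchronized void simulateCrash() { system = new HashMap<>(); }

    private static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Prevalent p = new Prevalent();
        p.execute(new Put("a", "1"));
        p.takeSnapshot();             // "a" now lives in the snapshot, the log is empty
        p.execute(new Put("b", "2")); // "b" only lives in the log
        p.simulateCrash();            // power goes PANG!
        p.recover();
        System.out.println(p.get("a") + " " + p.get("b")); // 1 2
    }
}
```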
These command objects could also contain an XPath expression like the one above to select the objects to act on:
1. //SFSB -> gimme all SFSB caches (// matches at any depth, i.e. it's short for /*/SFSB, /*/*/SFSB, etc.)
2. //my_deployed_war -> shut this one down
3. / -> delete everything
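To make the selector idea concrete, here is a hypothetical sketch where a write command carries its own selector. It only mimics the //name form (find every map stored under that key at any depth) by recursing over nested maps; it is not XPath and not JXPath, which would evaluate full expressions instead. Descendants and ClearCommand are made-up names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an XPath "//name" step: every map stored under key `name`, at any depth.
final class Descendants {
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> select(Map<String, Object> node, String name) {
        List<Map<String, Object>> found = new ArrayList<>();
        for (Map.Entry<String, Object> e : node.entrySet()) {
            if (!(e.getValue() instanceof Map)) continue;
            Map<String, Object> child = (Map<String, Object>) e.getValue();
            if (e.getKey().equals(name)) found.add(child);
            found.addAll(select(child, name)); // keep descending, as // does
        }
        return found;
    }
}

// A write command that carries its own selector, so one command can clean up many caches.
final class ClearCommand {
    private final String selector; // e.g. "sfsb" for //sfsb

    ClearCommand(String selector) { this.selector = selector; }

    // Empties every selected cache and returns how many entries were dropped.
    int applyTo(Map<String, Object> root) {
        int removed = 0;
        for (Map<String, Object> cache : Descendants.select(root, selector)) {
            removed += cache.size();
            cache.clear();
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> root = new HashMap<>();
        for (String war : List.of("shop", "admin")) {
            Map<String, Object> sfsb = new HashMap<>();
            sfsb.put("1", "cart");
            sfsb.put("2", "wizard");
            Map<String, Object> app = new HashMap<>();
            app.put("sfsb", sfsb);
            root.put(war, app);
        }
        System.out.println(new ClearCommand("sfsb").applyTo(root)); // 4
    }
}
```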
Furthermore, here is an interesting quote from the Prevayler site about obtaining consistent cache snapshots:
How can you expect to produce a consistent snapshot of a system that is constantly being modified?
This is the fundamental problem with Ambitious Transparent Persistence projects. With prevalence, the problem is solved simply by using the command log.
The command log enables the system to have a replica of the business logic on another virtual machine. All commands applied to the "hot" system are also read by the replica and applied in the exact same order. At backup time, the replica stops reading the commands and its snapshot is safely taken. Then, the replica resumes reading the command queue and gets back in sync with the "hot" system.
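The replica scheme in the quote can be sketched in a few lines, assuming a single-threaded setting where the shared command log is a plain list and commands are lambdas. Replica is a hypothetical name. The point it shows: because commands are applied in log order, a copy taken while the replica is not reading corresponds exactly to the first N commands.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical replica: reads a shared, append-only command log and applies each
// command in exactly the order the "hot" system did.
final class Replica {
    private final List<Consumer<Map<String, Integer>>> log;
    private final Map<String, Integer> state = new HashMap<>();
    private int position = 0; // number of log entries applied so far

    Replica(List<Consumer<Map<String, Integer>>> log) { this.log = log; }

    // Resume reading: apply every command not seen yet.
    void catchUp() {
        while (position < log.size()) log.get(position++).accept(state);
    }

    // While catchUp() is not being called the state is frozen at `position`,
    // so this copy is a consistent snapshot of the first `position` commands.
    Map<String, Integer> snapshot() { return new HashMap<>(state); }

    int position() { return position; }

    public static void main(String[] args) {
        List<Consumer<Map<String, Integer>>> log = new ArrayList<>();
        Map<String, Integer> hot = new HashMap<>();
        Consumer<Map<String, Integer>> hit = m -> m.merge("hits", 1, Integer::sum);
        for (int i = 0; i < 2; i++) { log.add(hit); hit.accept(hot); } // hot system logs, then applies

        Replica replica = new Replica(log);
        replica.catchUp();
        Map<String, Integer> backup = replica.snapshot(); // taken at position 2
        log.add(hit); hit.accept(hot);                    // hot system keeps going
        replica.catchUp();                                // replica back in sync
        System.out.println(backup + " " + replica.snapshot()); // {hits=2} {hits=3}
    }
}
```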
This command log could, for instance, be distributed using JavaGroups. There would be (MBean) command routers that decide the topology of the distributed caches (which cache replicates which?), and the (MBean) caches themselves would feed on these routers. Reads could happen concurrently on each cache; only the writing commands would have to pass through a router.
The best thing is that since all commands are fed into the system through a single object queue and applied one at a time, the transactional characteristics are very favorable: no concurrency problems.
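Here is a hedged sketch of the "one queue, no concurrency problems" argument, using a single-threaded ExecutorService as the object queue (SerialCache is a hypothetical name). Writes are serialized, so even a non-atomic read-modify-write command cannot lose updates; reads go straight to a ConcurrentHashMap and so can happen concurrently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Writes are queued and applied one at a time by a single thread; reads go straight to the map.
final class SerialCache implements AutoCloseable {
    private final Map<String, Integer> data = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Enqueue a write command; it runs after every command submitted before it.
    Future<?> submit(Consumer<Map<String, Integer>> command) {
        return writer.submit(() -> command.accept(data));
    }

    Integer read(String key) { return data.get(key); }

    // Stop accepting commands and wait for the queue to drain.
    @Override
    public void close() {
        writer.shutdown();
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SerialCache cache = new SerialCache();
        // Non-atomic read-modify-write from many threads: safe only because the commands run serially.
        IntStream.range(0, 4000).parallel()
                .forEach(i -> cache.submit(m -> m.put("counter", m.getOrDefault("counter", 0) + 1)));
        cache.close();
        System.out.println(cache.read("counter")); // 4000
    }
}
```

The trade-off is that all writes share one thread, so write throughput is bounded by how fast that thread can apply commands.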
To avoid confusion: I'm neither a cache expert nor a Prevayler user/expert. But if you're interested in using this technique, or another one that appeals to you, let me know (firstname.lastname@example.org).
Since I've learned some XML and don't like RDBMSs that much, this setup really appeals to me.
The JXPath expressions can also be cached (LOL) for speed:
When JXPath is asked to evaluate an expression for the first time, it compiles it and caches its compiled representation. This mechanism reduces the overhead caused by compilation. In some cases, though, JXPath's own caching may not be sufficient: JXPath caches have a limited size and they are cleared once in a while.
Here's how you can precompile an XPath expression:
// compile() is a static factory on JXPathContext: compile once, evaluate many times
CompiledExpression expr = JXPathContext.compile(xpath);
Object value = expr.getValue(context);
The following requirements can be addressed with compiled expressions:
1. There is a relatively small number of XPaths your application works with, and it needs to evaluate those XPaths multiple times.
2. Some XPaths need to be precompiled at initialization time.
3. The syntax of some XPaths needs to be checked before they are used for the first time.
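The same compile-once idea can be kept in an application-level cache that, unlike JXPath's internal one, is never cleared. A minimal sketch, assuming the compiler is passed in as a function: with JXPath on the classpath that would be JXPathContext::compile, but the demo uses java.util.regex.Pattern::compile so it runs with the JDK alone. ExpressionCache is a hypothetical name; it covers the three requirements above (reuse, precompiling at init, up-front syntax checks).

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Pattern;

// Application-level cache of compiled expressions that is never cleared behind your back.
final class ExpressionCache<C> {
    private final Map<String, C> compiled = new ConcurrentHashMap<>();
    private final Function<String, C> compiler;

    ExpressionCache(Function<String, C> compiler) { this.compiler = compiler; }

    // Compile a known set of expressions up front; a syntax error is thrown here, before first use.
    void precompile(Collection<String> expressions) {
        for (String expression : expressions) get(expression);
    }

    // Each distinct expression string is compiled once and then reused.
    C get(String expression) {
        return compiled.computeIfAbsent(expression, compiler);
    }

    int size() { return compiled.size(); }

    public static void main(String[] args) {
        // With JXPath this would be new ExpressionCache<>(JXPathContext::compile).
        ExpressionCache<Pattern> cache = new ExpressionCache<>(Pattern::compile);
        cache.precompile(List.of("[a-z]+", "\\d+"));
        System.out.println(cache.get("\\d+").matcher("42").matches()); // true
        System.out.println(cache.size());                               // 2
    }
}
```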
> So what does prevayler add? Well if I have a
> collection, and the power goes PANG!, I can restart
> the system and the store is restored (... no more
> writes lost in cache state).
A cache doesn't have to persist all its data.
Prevayler could be used only for the objects that
have to be spooled to disk instead of being discarded.
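A rough sketch of that split, assuming an LRU cache where each entry is flagged as persistent or not: on eviction, persistent entries are spooled to a secondary store and the rest are simply discarded. The secondary store is an ordinary Map standing in for Prevayler or a disk file; SpillingCache is a hypothetical name. It relies on LinkedHashMap's access-order mode and its removeEldestEntry hook.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// LRU cache: when full, persistent entries are spooled to a secondary store, transient ones are dropped.
final class SpillingCache<K, V> {
    private final Map<K, V> overflow;                 // stand-in for a Prevayler- or disk-backed store
    private final Set<K> persistentKeys = new HashSet<>();
    private final LinkedHashMap<K, V> memory;

    SpillingCache(int capacity, Map<K, V> overflow) {
        this.overflow = overflow;
        // accessOrder = true makes iteration order least-recently-used first.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= capacity) return false;
                if (persistentKeys.remove(eldest.getKey())) {
                    overflow.put(eldest.getKey(), eldest.getValue()); // spool instead of discard
                }
                return true;
            }
        };
    }

    void put(K key, V value, boolean persistent) {
        if (persistent) persistentKeys.add(key); else persistentKeys.remove(key);
        overflow.remove(key); // the in-memory copy is now the current one
        memory.put(key, value);
    }

    // Memory first, then the secondary store; a hit there is promoted back into memory.
    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = overflow.remove(key);
            if (value != null) put(key, value, true);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>();
        SpillingCache<String, String> cache = new SpillingCache<>(2, disk);
        cache.put("session", "s", true);
        cache.put("temp", "t", false);
        cache.put("x", "1", false);                // evicts "session" (least recently used) -> spooled
        cache.put("y", "2", false);                // evicts "temp" -> discarded
        System.out.println(disk.keySet());         // [session]
        System.out.println(cache.get("temp"));     // null
        System.out.println(cache.get("session"));  // s (read back from the secondary store)
    }
}
```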
> The point is that all the commands that modify the
> global collection object are submitted using a
> command pattern, which are spooled to disk, combined
> with a scheduled dump of the in memory collection th
> clear this command spool.
On every execution of a Prevayler Command (insert, update), one file is written to disk and the thread
calls a sync, waiting for the operating system to flush
the filesystem buffers. So I don't think this is the
preferred behaviour for a cache.
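To illustrate the cost being discussed, here is a minimal append-only journal, assuming one line of text per command. With syncEachWrite set, every append calls FileDescriptor.sync() and blocks until the OS has flushed the data to the device; a cache could instead sync once per batch via flushBatch() and accept losing the last few writes on a crash. Journal and its methods are hypothetical, not Prevayler code.

```java
import java.io.Closeable;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Append-only command journal, one line of text per command.
final class Journal implements Closeable {
    private final FileOutputStream out;
    private final boolean syncEachWrite;

    Journal(Path file, boolean syncEachWrite) throws IOException {
        this.out = new FileOutputStream(file.toFile(), true); // append mode, no Java-side buffering
        this.syncEachWrite = syncEachWrite;
    }

    void append(String command) throws IOException {
        out.write((command + "\n").getBytes(StandardCharsets.UTF_8));
        if (syncEachWrite) {
            out.getFD().sync(); // blocks until the OS has flushed its buffers to the device
        }
    }

    // Alternative for a cache: sync once per batch and accept losing the tail on a crash.
    void flushBatch() throws IOException {
        out.getFD().sync();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    // Recovery reads the commands back in the order they were written.
    static List<String> replay(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("journal", ".log");
        try (Journal journal = new Journal(file, true)) {
            journal.append("put a 1");
            journal.append("put b 2");
        }
        System.out.println(replay(file)); // [put a 1, put b 2]
        Files.delete(file);
    }
}
```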
Good to hear you know the ins and outs. As I said, I'm no expert on either caching or Prevayler.
The sync on files is bad. But in a distributed environment, using disk files is bad anyway.
As I proposed, it would be better to have an MBean call some sort of command router, which then hands the commands to JavaGroups' reliable multicast for delivery to the distributed caches. Whether and how to persist the command log is then a separate question (use JMS or log4j for logging).
The fact remains that Prevayler, even with this file sync, is very, very fast. My initial idea was that this sort of cache acts as a kind of in-memory page placed in front of the database; that's why I compared it to write-ahead logging before.
I think that once a cache has to work in a distributed environment, which the JBoss cache eventually will, the biggest overhead will be the syncing over TCP/IP and/or the logging. The good part of Prevayler is that its command-pattern-driven interface makes it ideal for such a distributed environment.
I guess that persisting some of the cache's entries would still be okay, as long as this behavior is configurable. One application could configure a cache backed by disk, which is in turn backed by a database; another app may want its cache entries to be purely transient, etc.
I like the idea of being able to configure individual caches with default properties.
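Per-cache configuration with shared defaults could look roughly like this. The option names (Backing, maxEntries, timeToLiveMillis) and the default values are assumptions for illustration; each cache starts from DEFAULTS and overrides only what it needs, so one app can ask for disk-then-database backing while another stays purely transient.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-cache settings; immutable, with "with" methods to override single fields.
final class CacheConfig {
    enum Backing { TRANSIENT, DISK, DISK_THEN_DATABASE }

    static final CacheConfig DEFAULTS = new CacheConfig(Backing.TRANSIENT, 1_000, 60_000);

    final Backing backing;
    final int maxEntries;
    final long timeToLiveMillis;

    CacheConfig(Backing backing, int maxEntries, long timeToLiveMillis) {
        this.backing = backing;
        this.maxEntries = maxEntries;
        this.timeToLiveMillis = timeToLiveMillis;
    }

    CacheConfig withBacking(Backing b) { return new CacheConfig(b, maxEntries, timeToLiveMillis); }

    CacheConfig withMaxEntries(int n) { return new CacheConfig(backing, n, timeToLiveMillis); }
}

// Caches without an explicit configuration fall back to the defaults.
final class CacheRegistry {
    private final Map<String, CacheConfig> configs = new HashMap<>();

    void configure(String cacheName, CacheConfig config) { configs.put(cacheName, config); }

    CacheConfig configFor(String cacheName) {
        return configs.getOrDefault(cacheName, CacheConfig.DEFAULTS);
    }

    public static void main(String[] args) {
        CacheRegistry registry = new CacheRegistry();
        registry.configure("orders", CacheConfig.DEFAULTS.withBacking(CacheConfig.Backing.DISK_THEN_DATABASE));
        System.out.println(registry.configFor("orders").backing);   // DISK_THEN_DATABASE
        System.out.println(registry.configFor("sessions").backing); // TRANSIENT
    }
}
```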
Again, sorry for the reposts. Jive takes a while before it actually displays posted messages. Must be that CACHE thing it uses.
I agree with the KISS principle, but I also like to analyse a problem thoroughly before beginning to implement it (I guess this is not very XP).
Could you explain how the transactional mechanism works when updating the caches? I've never gotten around to figuring out how transactions work, but I would like to know.
Is it like:
1. Send command to all caches
2. receive acknowledge
3. If all acknowledgements are in
4. send OK
5. wait for timeout
6. send NOT OK
If you could point me to a model, or a set of interfaces that a cache should implement, then I could see if I can kludge something together. Even if it never gets used, it'll be good for my understanding of the topic.
Another advantage of using the command pattern would be cache clean-ups using JXPath commands: a single command could be sent that cleans up hundreds of instances.
Anyway, the KISS principle has left the building, and that usually happens when things aren't feasible anymore...
But let me know,
> 1. Send command to all caches
> 2. receive acknowledge
> 3. If all acknowledgements are in
> 4. send OK
> 5. wait for timeout
> 6. send NOT OK
1. Send UPDATE (including transaction) to all members
2. Each member tries to acquire a lock (inferred from the update). If successful, the lock is added to the transaction. If the lock is already held by this transaction, we return immediately. If we fail to obtain a lock within lock_timeout, we throw a LockingException. The update is then stored in a separate queue for the given tx.
3. When all replies have been received, and no exception or timeout occurred, we send a COMMIT (tx) to all members, otherwise a ROLLBACK (tx).
4. When a member receives a ROLLBACK (tx), all entries in the queue for that tx are discarded and all locks held are released.
5. When a member receives a COMMIT (tx), all updates in the queue are applied to the hashtable, and then all locks held are released
Note that we don't use stable storage for temporary updates, as this is about transient caches, not DBs. Otherwise this is pretty much a standard two-phase commit.
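The five steps above can be sketched in-process as follows. Assumptions: members are plain objects rather than JavaGroups peers, the coordinator sends UPDATE to them one after another instead of multicasting, a failed lock wait returns false instead of throwing a LockingException, and each update touches a single key. Locks are owned by a transaction id rather than a thread, because a lock has to survive from UPDATE until COMMIT or ROLLBACK. Member and TwoPhaseCommit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One cache member: per-key locks owned by a tx id, plus a per-tx queue of pending updates.
final class Member {
    final Map<String, String> data = new HashMap<>();
    private final Map<String, Long> lockOwner = new HashMap<>();       // key -> tx holding the lock
    private final Map<Long, List<String[]>> pending = new HashMap<>(); // tx -> queued {key, value}
    private final long lockTimeoutMillis;

    Member(long lockTimeoutMillis) { this.lockTimeoutMillis = lockTimeoutMillis; }

    // Phase 1 (UPDATE): lock the key for tx and queue the update; false means lock_timeout expired.
    synchronized boolean prepare(long tx, String key, String value) {
        long deadline = System.currentTimeMillis() + lockTimeoutMillis;
        Long owner;
        while ((owner = lockOwner.get(key)) != null && owner != tx) { // already ours: no waiting
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return false;
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        lockOwner.put(key, tx);
        pending.computeIfAbsent(tx, t -> new ArrayList<>()).add(new String[] {key, value});
        return true;
    }

    // Phase 2: COMMIT applies the queued updates; both COMMIT and ROLLBACK release the tx's locks.
    synchronized void commit(long tx) {
        for (String[] u : pending.getOrDefault(tx, Collections.emptyList())) data.put(u[0], u[1]);
        release(tx);
    }

    synchronized void rollback(long tx) { release(tx); }

    private void release(long tx) {
        pending.remove(tx);
        lockOwner.values().removeIf(owner -> owner == tx);
        notifyAll(); // wake members waiting in prepare()
    }
}

final class TwoPhaseCommit {
    // Coordinator: UPDATE to every member, then COMMIT to all if everyone locked, else ROLLBACK to all.
    static boolean update(List<Member> members, long tx, String key, String value) {
        boolean allPrepared = true;
        for (Member m : members) {
            if (!m.prepare(tx, key, value)) { allPrepared = false; break; }
        }
        for (Member m : members) {
            if (allPrepared) m.commit(tx); else m.rollback(tx);
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        List<Member> group = List.of(new Member(50), new Member(50));
        System.out.println(update(group, 1, "a", "x")); // true
        group.get(1).prepare(99, "a", "held");          // another tx grabs the lock on member 2
        System.out.println(update(group, 2, "a", "y")); // false: member 2 times out
        System.out.println(group.get(0).data.get("a")); // x (rolled back everywhere)
    }
}
```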
> If you could point me to a model, or a set of
> interfaces that a cache should implement then I could
> look if can cludge something. If it's not for using
> it, it'll be good for my understanding of the topic.
Yes, I have a building block in JavaGroups (in CVS), org.javagroups.blocks.ReplicationManager, which does the sending and receiving of requests, but locking etc. still has to be handled by you.
A second building block is org.javagroups.blocks.TransactionalHashtable. It uses the above block, but doesn't yet implement locking.
The first building block may or may not find its way into JBoss (e.g. as a wrapped MBean).
An extended version of the second block will find its way into JBoss (org.jboss.cache). This will be the basis for the *replicated* cache implementation. In other words, we will provide a JCache interface, and the second building block will be an implementation of it for a replicated cache (you can also have purely local caches).
So have a look at those. I will add them to JBoss early next week.
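The split between a cache interface and its local and replicated implementations could look roughly like this. SimpleCache is a hypothetical minimal interface, not the JCache or OCS4J API, and the peer list stands in for a group transport such as JavaGroups. There is deliberately no locking or two-phase commit here: a write is just applied locally and then on every peer.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical minimal cache interface (not the JCache or OCS4J API).
interface SimpleCache<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Purely local cache: nothing leaves this VM.
class LocalCache<K, V> implements SimpleCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();

    public V get(K key) { return map.get(key); }

    public void put(K key, V value) { applyLocally(key, value); }

    // Used by peers to deliver a replicated write without re-broadcasting it.
    final void applyLocally(K key, V value) { map.put(key, value); }
}

// Replicated cache: reads stay local, each write is applied here and then on every peer.
final class ReplicatedCache<K, V> extends LocalCache<K, V> {
    private final List<LocalCache<K, V>> peers = new CopyOnWriteArrayList<>();

    void join(LocalCache<K, V> peer) { peers.add(peer); }

    @Override
    public void put(K key, V value) {
        applyLocally(key, value);
        for (LocalCache<K, V> peer : peers) peer.applyLocally(key, value);
    }

    public static void main(String[] args) {
        ReplicatedCache<String, String> a = new ReplicatedCache<>();
        ReplicatedCache<String, String> b = new ReplicatedCache<>();
        a.join(b);
        b.join(a);
        a.put("k", "v");
        System.out.println(b.get("k")); // v
    }
}
```

Delivering peer writes through applyLocally rather than put keeps two mutually joined caches from bouncing the same write back and forth.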
OK, I'll look at the code you've checked in.
I think that, for experimenting with this material, incorporating the transactions is one step beyond me.
I've looked at the transaction manager code; it's not much, but it's dense.
I've read OCS4J 2.0; is this the spec that has been leading the way so far? I'll look at this one for experimenting.
Speaking of serializing, locking, etc.: since a cache is a storage system, I guess it could be useful to add an MVCC (multiversion concurrency control) mechanism to the cache, as used by Oracle, PostgreSQL, etc. Here's a link on that:
This reduces the number of locks in a storage system.
It sounds very spacey, but the implementation basically comes down to tracking which tx ID committed the last change and which tx ID is trying to write a new version of the object. It seems it would amount to adding two fields to the Attribute class: the ID of the tx that created this particular version of the object, and the ID of the tx that invalidated it.
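A hedged sketch of that two-field idea, loosely modelled on PostgreSQL's per-row creating/deleting transaction ids: each version records createdBy and invalidatedBy, and a reader picks the newest version whose creator is in its snapshot (the set of transaction ids committed when it started) and whose invalidator is not. MvccAttribute is hypothetical and leaves out rollback, garbage collection of old versions, and a transaction seeing its own uncommitted writes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical multiversion attribute: each version records the tx that created it and the tx that replaced it.
final class MvccAttribute {
    static final class Version {
        final Object value;
        final long createdBy;    // tx id that wrote this version
        long invalidatedBy = 0;  // tx id that replaced it; 0 = not replaced
        Version(Object value, long createdBy) { this.value = value; this.createdBy = createdBy; }
    }

    private final List<Version> versions = new ArrayList<>();

    MvccAttribute(Object initial, long creatingTx) { versions.add(new Version(initial, creatingTx)); }

    // A writer appends a new version and stamps the old one; readers never wait for its tx to finish.
    // Writing on top of another tx's uncommitted version is rejected (first updater wins).
    synchronized void write(long tx, Object value, Set<Long> committed) {
        Version head = versions.get(versions.size() - 1);
        if (head.createdBy != tx && !committed.contains(head.createdBy))
            throw new IllegalStateException("conflict with uncommitted tx " + head.createdBy);
        head.invalidatedBy = tx;
        versions.add(new Version(value, tx));
    }

    // Newest version created inside the snapshot and not invalidated inside it.
    synchronized Object read(Set<Long> snapshot) {
        for (int i = versions.size() - 1; i >= 0; i--) {
            Version v = versions.get(i);
            boolean visible = snapshot.contains(v.createdBy)
                    && (v.invalidatedBy == 0 || !snapshot.contains(v.invalidatedBy));
            if (visible) return v.value;
        }
        return null;
    }

    public static void main(String[] args) {
        Set<Long> committed = new HashSet<>(Set.of(1L));
        MvccAttribute price = new MvccAttribute(10, 1);
        Set<Long> oldSnapshot = new HashSet<>(committed);
        price.write(2, 12, committed);                             // tx 2 writes, not yet committed
        System.out.println(price.read(oldSnapshot));               // 10
        committed.add(2L);
        System.out.println(price.read(new HashSet<>(committed)));  // 12
        System.out.println(price.read(oldSnapshot));               // 10 (old snapshot is stable)
    }
}
```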
That'll be it for selling bullshit. I'd better start contributing if I can.