Context of MessageStore design task
timfox Dec 1, 2005 4:47 AM

I just wanted to summarise this, since I know this is a complex discussion that could probably do with being put in context.
I think this may be useful, especially for people getting up to speed with this subject.
I have extended this somewhat to include a quick summary of the lazy loading queues idea, which is a related subject. Even though the task at hand (for now anyway) is to design and provide a default implementation of the message store, not to implement lazy loading queue functionality, I think this provides background on how the message store fits into the whole picture.
(Adrian/others please correct any inaccuracies I have made)
In order to cope with very large queues/subs (currently all message refs for queues are stored in memory at once), Adrian came up with the idea of lazy loading queues:
With lazy loading queues/subs, we have "special references" in each queue / sub. When one of these special marker references is read from the front of the queue, it causes the next n message references to be loaded into the queue, followed by another marker. The messages corresponding to those references are themselves loaded into the message store. (In some cases they may be in the store already, since more than one queue/sub can maintain a reference to the same message.)
(The loading could be done in a different thread to smooth out any big pauses for the client??)
The value of n for a particular queue can be configured by the user to suit the operational characteristics of the queue. (Size of messages/expected throughput etc.)
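The marker idea above might be sketched roughly like this. All names here are my invention (LazyQueue, RefSource, MARKER), not an actual JBoss API; the chunk size plays the role of n:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the lazy-loading marker idea: the in-memory queue
// holds at most one chunk of real references plus a marker; popping the
// marker loads the next chunk from the store, followed by another marker.
public class LazyQueue {
    interface RefSource {
        // Loads the next chunk of up to n message reference ids (hypothetical).
        List<Long> loadNext(int n);
    }

    private static final long MARKER = -1L;   // the "special reference"
    private final Deque<Long> refs = new ArrayDeque<>();
    private final RefSource source;
    private final int chunkSize;              // the tunable n

    LazyQueue(RefSource source, int chunkSize) {
        this.source = source;
        this.chunkSize = chunkSize;
        refs.addLast(MARKER);                 // queue starts with just a marker
    }

    // Returns the next real message reference id, loading a chunk when a
    // marker is hit, or null when the store has nothing left.
    Long pop() {
        Long head = refs.pollFirst();
        if (head == null) return null;
        if (head != MARKER) return head;
        List<Long> chunk = source.loadNext(chunkSize);
        if (chunk.isEmpty()) return null;     // store exhausted
        refs.addAll(chunk);
        refs.addLast(MARKER);                 // followed by another marker
        return refs.pollFirst();
    }
}
```

The loadNext call is where the off-thread loading suggested below could be hooked in.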
If messages are sent to a queue and there are already more than n messages in that queue, they go straight into the persistent store (even in the case of a non-persistent message!) and no reference is added to the in-memory queue.
So if the value of n is well tuned for a particular queue, and the messages in that queue have a similar size, then we should be able to avoid running out of memory in the message store in most cases. :)
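The send-side overflow rule could look something like this (again a sketch with invented names, not real code from the codebase):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the overflow rule: once a queue already holds n
// in-memory references, new messages bypass the queue and go straight to
// the persistent store, even if they are non-persistent.
public class OverflowingQueue {
    interface Store {
        void persist(long messageId);   // hypothetical persistent-store call
    }

    private final Store store;
    private final int n;                // per-queue tunable threshold
    private final List<Long> inMemoryRefs = new ArrayList<>();

    OverflowingQueue(Store store, int n) {
        this.store = store;
        this.n = n;
    }

    // Returns true if the reference was kept in memory, false if the
    // message overflowed straight to the store.
    boolean send(long messageId) {
        if (inMemoryRefs.size() >= n) {
            store.persist(messageId);   // no in-memory reference is added
            return false;
        }
        inMemoryRefs.add(messageId);
        return true;
    }
}
```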
However, if the message store *does* start running out of memory, which messages should it evict first?
It's likely a default implementation would want to evict the messages that have been in the store for the shortest time, since they are towards the back of the queues (MRU). (This needs to be pluggable.)
We should also be able to support eviction of messages in batches (i.e. batch updates in the db, for performance). And we should know whether a message is already in the db without making a db hit at passivation time. (E.g. persistent messages will already be there; non-persistent messages may not be.) (Again, pluggable.)
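One way the pluggability could be expressed is a small eviction-policy interface, with the MRU behaviour as the default implementation. All names here (EvictionPolicy, StoredMessage, etc.) are illustrative, not an existing JBoss interface:

```java
import java.util.Comparator;
import java.util.List;

// Illustrative pluggable eviction policy: given candidate messages, pick a
// batch to push out of memory. The alreadyInDb flag lets persistent messages
// be evicted without a further db write.
public class Eviction {
    record StoredMessage(long id, long storedAtMillis, boolean alreadyInDb) {}

    interface EvictionPolicy {
        // Selects up to batchSize messages to evict from the store.
        List<StoredMessage> selectVictims(List<StoredMessage> candidates, int batchSize);
    }

    // Default MRU policy: evict the messages stored most recently, since
    // they sit towards the back of the queues.
    static final EvictionPolicy MRU = (candidates, batchSize) ->
        candidates.stream()
                  .sorted(Comparator.comparingLong(StoredMessage::storedAtMillis).reversed())
                  .limit(batchSize)
                  .toList();
}
```

A different deployment could drop in, say, a largest-first policy by supplying another EvictionPolicy without touching the store itself.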
When memory usage recovers, how do we load messages back into the store? (Again, pluggable.)
Perhaps we don't bother and just wait until the message is actually accessed; this is simple and should perhaps be our default implementation.
Other implementations could get more complex. We could batch load a subset of the same messages we evicted back into the store in order to utilise free memory well.
This means we would need to keep track of the ids of the messages we evicted. When loading them back in we may not be able to load them all back in since free memory may not have recovered to the same level it was at before.
In order to know how many to load, we would need to store some kind of cumulative message size in the db (SELECT * FROM message_refs WHERE cumulative_size < xyz). This all gets complicated, and my feeling is we should probably just stick to a simple default impl for now.
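The cumulative-size idea amounts to walking the evicted messages in order and reloading only as many as fit into the recovered memory. A pure in-memory sketch (the real thing would push this into SQL via the cumulative size column, as above; all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batched-reload idea: accumulate message sizes in order and
// stop once the free-memory budget is exhausted, so we never load back more
// than memory has actually recovered.
public class BatchReload {
    record Evicted(long id, long sizeBytes) {}

    // Returns the ids of evicted messages to load back in, in order,
    // stopping as soon as the cumulative size would exceed the budget.
    static List<Long> selectForReload(List<Evicted> evicted, long freeMemoryBytes) {
        List<Long> toLoad = new ArrayList<>();
        long cumulative = 0;
        for (Evicted e : evicted) {
            cumulative += e.sizeBytes();
            if (cumulative > freeMemoryBytes) break;  // budget spent
            toLoad.add(e.id());
        }
        return toLoad;
    }
}
```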
As Adrian has pointed out, it's key that the message store is designed such that these policies are all pluggable (of course we need to provide a default implementation too).
Alex is coming up with a design that supports eviction, memory management, etc. pluggably.
It's possible that TreeCache could be used to provide part of a default implementation. (To be investigated.)