DB performance for a large number of messages
garu Jun 15, 2007 3:39 AM
Hi all,
I'd like to discuss the db organization, because I'm concerned that with the current db structure, messaging won't be able to handle the volume of data I'm going to pump through it.
More generally, I think messaging in its current incarnation is not well suited to handling large volumes of data.
In a few words, I need messaging to act as a sort of multiplexer that takes messages from a single source and distributes them to different subscribers. Of these subscribers, one will always be active and the others (at least two, but there could be more) will be active only when needed. This means I have to guarantee that messages on topics are persisted until the subscribers read them or a certain amount of time has passed.
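The retention rule described above (keep a message until every subscriber has read it, or until a time limit passes) can be sketched roughly as follows. This is only an in-memory illustration, not how messaging implements durable subscriptions; the `Topic` class, the subscriber names and the one-hour limit are all hypothetical.

```python
class Topic:
    """Hypothetical in-memory model of the retention rule, not real messaging code."""

    def __init__(self, subscribers, ttl_seconds):
        self.subscribers = set(subscribers)
        self.ttl = ttl_seconds
        self.messages = {}  # msg id -> (payload, published_at, subscribers still pending)
        self.next_id = 0

    def publish(self, payload, now):
        # Every known subscriber starts out as "pending" for the new message.
        self.next_id += 1
        self.messages[self.next_id] = (payload, now, set(self.subscribers))
        return self.next_id

    def expire(self, now):
        # Drop messages older than the time limit, read or not.
        for msg_id in list(self.messages):
            if now - self.messages[msg_id][1] >= self.ttl:
                del self.messages[msg_id]

    def read(self, subscriber, now):
        # Deliver everything still pending for this subscriber, in publish order.
        self.expire(now)
        delivered = []
        for msg_id in list(self.messages):
            payload, _, pending = self.messages[msg_id]
            if subscriber in pending:
                delivered.append(payload)
                pending.discard(subscriber)
                if not pending:  # last subscriber has read it: safe to delete
                    del self.messages[msg_id]
        return delivered


# One always-active subscriber and two occasional ones (names are made up).
topic = Topic(["live", "audit", "backup"], ttl_seconds=3600)
topic.publish("m1", now=0)
first = topic.read("live", now=1)
still_stored = len(topic.messages)   # the occasional subscribers have not read m1 yet
topic.read("audit", now=10)
topic.read("backup", now=20)
stored_after_all_read = len(topic.messages)

topic.publish("m2", now=100)
late = topic.read("audit", now=3700)  # m2 is past the limit by now
```

The point of the sketch is that occasional subscribers make messages pile up between their visits, which is exactly where the row counts come from.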
No problem so far, but I have to handle tens of topics and hundreds of millions of messages a day. It's not a problem of data size, since the average message size is less than 200 bytes, but purely of the number of messages.
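As a rough back-of-envelope illustration of these figures: the 100 million messages a day used below is only an assumed lower bound for "hundreds of millions", and 200 bytes is the upper bound on the average size given above. The function name is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_load(messages_per_day, avg_bytes=200):
    """Return (average inserts per second, payload volume in GB per day)."""
    rate = messages_per_day / SECONDS_PER_DAY
    payload_gb = messages_per_day * avg_bytes / 1e9
    return rate, payload_gb


# Assumed lower bound: 100 million messages a day at 200 bytes each.
rate, payload_gb = daily_load(100_000_000)
print(f"{rate:.0f} inserts/s on average, {payload_gb:.0f} GB of payload per day")
```

Even at this assumed lower bound the payload stays modest, while the sustained insert rate and the number of rows accumulating in the tables are what grow.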
Currently, if I didn't miss something, all messages for all queues/topics are handled with only two tables, jbm_msg and jbm_msg_ref, and this is the limit.
I don't know how experienced you are with DBMSs, but I know from experience that no matter what dbms engine you have under the covers, when a table begins to fill with tens or hundreds of millions of rows, performance goes down the drain.
We currently have a legacy system (non-JMS, just C programs) that, to avoid performance bottlenecks, divides the message flows among different tables (partitioned tables), so that each table stays low in row count and insert time stays low. But to obtain the same subdivision I'd need a messaging instance for each flow.
If I tried to propose such an architecture, I'd be killed on the spot!
Obviously I'm not thinking of a single instance to handle the whole data flow, which would be unsafe to say the least, but on the other hand I cannot envisage having an instance for each flow.
What I'm thinking about and proposing is a mechanism by which, when you deploy a queue/topic, you can ask for it to be allocated on a different set of tables than the default one. This means that by choosing the queue/topic I send to, I implicitly choose the tables where messages are written, allowing performance tuning for large numbers of messages.
I'd like to know your opinion about that.
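A minimal sketch of what this proposal could look like, using an in-memory SQLite database purely for illustration. Only the default table names jbm_msg and jbm_msg_ref come from the existing structure; the `TableSetRouter` class, the `table_set` deploy option, the column layout and the destination names are all hypothetical, and nothing like this exists in messaging today.

```python
import sqlite3

DEFAULT_TABLE_SET = "jbm"  # default tables: jbm_msg and jbm_msg_ref


class TableSetRouter:
    """Hypothetical: maps each destination to the table set chosen at deploy time."""

    def __init__(self, conn):
        self.conn = conn
        self.table_sets = {}   # destination name -> table set prefix
        self.known = set()     # table set prefixes already created
        self._create_tables(DEFAULT_TABLE_SET)

    def _create_tables(self, prefix):
        # The prefix ends up in SQL text, so accept plain identifiers only.
        if not prefix.replace("_", "").isalnum():
            raise ValueError(f"invalid table set name: {prefix!r}")
        # Simplified columns for the example, not the real schema.
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {prefix}_msg "
            "(id INTEGER PRIMARY KEY, payload BLOB)")
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {prefix}_msg_ref "
            "(msg_id INTEGER, destination TEXT)")
        self.known.add(prefix)

    def deploy(self, destination, table_set=DEFAULT_TABLE_SET):
        # Deploying a queue/topic optionally names a non-default table set.
        self._create_tables(table_set)
        self.table_sets[destination] = table_set

    def send(self, destination, payload):
        # Choosing the destination implicitly chooses the tables written to.
        prefix = self.table_sets[destination]
        cur = self.conn.execute(
            f"INSERT INTO {prefix}_msg (payload) VALUES (?)", (payload,))
        self.conn.execute(
            f"INSERT INTO {prefix}_msg_ref (msg_id, destination) VALUES (?, ?)",
            (cur.lastrowid, destination))
        return prefix

    def count(self, table_set):
        if table_set not in self.known:
            raise KeyError(table_set)
        return self.conn.execute(
            f"SELECT COUNT(*) FROM {table_set}_msg").fetchone()[0]


conn = sqlite3.connect(":memory:")
router = TableSetRouter(conn)
router.deploy("topic/alerts")                        # stays on the default tables
router.deploy("topic/ticks", table_set="jbm_ticks")  # high-volume flow gets its own set
router.send("topic/alerts", b"a1")
router.send("topic/ticks", b"t1")
router.send("topic/ticks", b"t2")
```

With this kind of mapping, the high-volume flow fills jbm_ticks_msg while the default jbm_msg stays small, which is the same subdivision the legacy system gets from its separate tables, without needing one messaging instance per flow.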
Thanks, Gabriele