3 Replies Latest reply on Jan 22, 2008 3:46 PM by manik

    Problem after loading Huge data

    sanatmastan

      Hi,

      I am new to JBoss Cache, so I went through some examples and tried to apply them to my requirements. My aim is to store and retrieve the frequency of a particular key (a String), so I have written wrapper methods to add, remove, and get the frequency for a key. The tricky part is taking advantage of the tree structure of JBoss Cache to implement a trie data structure: the input key is split into characters and inserted into the tree as nodes, and the leaf node holds the frequency. If the same key is inserted twice, the frequency at the leaf increases to 2, and so on. This structure saves some space by sharing prefix nodes between different leaf nodes. I did not implement any replication or persistence mechanism.
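
      Roughly, the kind of wrapper described would look like the minimal sketch below. This is only an illustration against the 2.0.x API: the class name FrequencyTrie, the "frequency" attribute key, and the one-character-per-level path layout are assumptions rather than the actual code, and keys containing the Fqn separator '/' would need escaping.

        import org.jboss.cache.Cache;
        import org.jboss.cache.CacheFactory;
        import org.jboss.cache.DefaultCacheFactory;
        import org.jboss.cache.Fqn;

        public class FrequencyTrie {
            private final Cache<String, Integer> cache;

            public FrequencyTrie() {
                // Local-only cache: no replication or persistence configured.
                CacheFactory<String, Integer> factory = new DefaultCacheFactory<String, Integer>();
                cache = factory.createCache();
            }

            // Build a path whose elements are the individual characters of the key,
            // e.g. "cat" -> /c/a/t, so keys that share a prefix share parent nodes.
            private Fqn fqnFor(String key) {
                StringBuilder path = new StringBuilder();
                for (char c : key.toCharArray()) {
                    path.append('/').append(c);
                }
                return Fqn.fromString(path.toString());
            }

            // Increment the frequency stored on the leaf node for this key.
            public void add(String key) {
                Fqn fqn = fqnFor(key);
                Integer count = cache.get(fqn, "frequency");
                cache.put(fqn, "frequency", count == null ? 1 : count + 1);
            }

            // Returns null if the key has never been added.
            public Integer getFrequency(String key) {
                return cache.get(fqnFor(key), "frequency");
            }

            // Remove the leaf node (and its frequency) for this key.
            public void remove(String key) {
                cache.removeNode(fqnFor(key));
            }
        }

      With this layout, getFrequency("cat") simply reads the "frequency" attribute back from the node at /c/a/t.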

      The problem is this: when I insert 3 million entries (about 50 MB), I can get the frequencies back as expected. When I insert 10 million entries (about 105 MB), they all get inserted without any error (of course after increasing the VM heap size), but when I then try to verify the existence of a key, it is reported as not existing. Can anyone comment on this behaviour?

      I have a few questions.

      1) Will the JBoss Cache framework suit my requirement, where the tree has a small depth (20 to 30 levels) and a very large breadth (thousands of children per node)?

      2) Are there any limitations on JBoss Cache memory size?

      I would also like to add that the data volume of our application will be around 70 GB, which we plan to cluster across different VMs on a single node.

      Thanks in Advance
      Sanat

        • 1. Re: Problem after loading Huge data
          manik

          In terms of limitations, there are no known memory or size limitations within JBoss Cache, as long as your JVM has adequate heap size.

          In terms of performance, each parent node holds its children in a Map. If a parent node has a very large number of children, that Map implementation may become a bottleneck. As of the current stable release (2.0.0.GA) we use java.util.concurrent.ConcurrentHashMaps here. I would say it is worthwhile profiling to see whether this does become a bottleneck for you.

          In terms of depth, 20 - 30 levels is probably close to the upper end of what I have seen in production use, as most folk tend to prefer broader trees to deeper ones. The limiting factor here is the cost of retrieving a node from the tree (since it walks the tree structure): deeper trees mean more walking. In 2.1.0 we will be optimising this by significantly reducing tree walking (JBCACHE-811). Again though, this is something I'd recommend profiling first to see how much of an impact it has with your access patterns.

          Finally, regarding your issue of verifying existence, do you have any eviction configured? It could be that an eviction thread is removing nodes from the tree.
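
          One quick way to narrow this down is to check whether the node for a suspect key is still in the tree at all, or whether the node is there but its attribute is gone. The sketch below is only illustrative: it assumes the 2.0.x API plus the hypothetical one-character-per-level path and "frequency" attribute key from the sketch earlier in this thread.

            import org.jboss.cache.Cache;
            import org.jboss.cache.Fqn;
            import org.jboss.cache.Node;

            public class ExistenceCheck {
                // Reports whether the node for a key survived the bulk load, and whether
                // it still carries its frequency attribute.
                public static String describe(Cache<String, Integer> cache, String key) {
                    StringBuilder path = new StringBuilder();
                    for (char c : key.toCharArray()) {
                        path.append('/').append(c);
                    }
                    Node<String, Integer> node = cache.getNode(Fqn.fromString(path.toString()));
                    if (node == null) {
                        return "node missing entirely";
                    }
                    Integer freq = node.get("frequency");
                    return freq == null ? "node present, frequency attribute missing" : "frequency = " + freq;
                }
            }

          If the node itself has disappeared, that points at something removing nodes (such as an eviction thread); if the node is present but the attribute is missing, the write path is the more likely suspect.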


          • 2. Re: Problem after loading Huge data
            sanatmastan

            Thanks for the reply, Manik.

            We are very new to JBoss Cache and are doing a feasibility study to see whether this solution will fit our requirements. In the implementation of the scenario above I did not configure any eviction; which eviction policy should I configure?

            • 3. Re: Problem after loading Huge data
              manik

              Please read through the eviction section of the user guide. This will explain how each eviction policy works, and you can decide which is most appropriate for you.