3 Replies Latest reply on Jul 8, 2015 3:31 AM by hchiorean

Should I use a large set of multiple properties on a single node or use child nodes?

ma6rl Jun 29, 2015 1:40 PM

I have around 1000 string values (each value would be 100-200 bytes) that I need to store on a node. Over time I expect to create a large number of these nodes (potentially millions).

Currently my options are to:

a) Store them as a single string property that supports multiple values

b) Store them as child nodes.

I do not need to be able to query the values; I just need to retrieve the entire set.

Which option would give me better performance? Is there a recommended maximum number of values that can be stored on a property, and are there any size constraints I should take into consideration for either a single property or a node?

Many thanks in advance.

1. Re: Should I use a large set of multiple properties on a single node or use child nodes?

hchiorean Jun 30, 2015 2:49 AM (in response to ma6rl)

If you store them as child nodes, you would presumably use the "string value" as the name of the child node. If that is the case, whenever you want to load such a node or get its path, ModeShape will load all the parent's child references into memory. So in the "child node" approach, from a memory perspective, you'd end up with all the string values plus whatever other structures are created for child references.

I think using a multi-valued string property should probably perform better. Also, using this approach you have the option (via the configuration) to decide whether you want to store these strings in the main ISPN cache or in the binary store (via the minimum string size setting; see "Binary values" in the ModeShape 4 documentation). ModeShape doesn't impose any limit on the size of the strings, but if you're storing data in a DB, you may want to make sure the data column type supports your large strings. Writing/adding values should also be faster for multi-valued string properties than for child nodes because of the amount of JCR validation & extra processing logic that takes place when child nodes are added.
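As a rough check on the column-size point, the raw payload can be estimated from the figures in the original question (1000 values of 100-200 bytes each). This is only a back-of-the-envelope sketch: the class and method names are hypothetical, one million nodes is an assumed stand-in for "potentially millions", and it counts raw string bytes only, ignoring character encoding, per-value overhead and anything the cache or database adds.

```java
public class PayloadEstimate {

    // Raw bytes held by one node: number of values times bytes per value.
    static long bytesPerNode(long valuesPerNode, long bytesPerValue) {
        return valuesPerNode * bytesPerValue;
    }

    // Raw bytes across all nodes of this shape.
    static long totalBytes(long nodes, long valuesPerNode, long bytesPerValue) {
        return nodes * bytesPerNode(valuesPerNode, bytesPerValue);
    }

    public static void main(String[] args) {
        long values = 1000;        // from the question
        long minBytes = 100;       // from the question
        long maxBytes = 200;       // from the question
        long nodes = 1_000_000;    // assumption: "potentially millions"

        System.out.println("per node (low):  " + bytesPerNode(values, minBytes) + " bytes");
        System.out.println("per node (high): " + bytesPerNode(values, maxBytes) + " bytes");
        System.out.println("total (low):     " + totalBytes(nodes, values, minBytes) + " bytes");
        System.out.println("total (high):    " + totalBytes(nodes, values, maxBytes) + " bytes");
    }
}
```

Under these assumptions each node carries roughly 100,000 to 200,000 bytes of string data, and a million such nodes come to roughly 100 to 200 GB before any overhead.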

The downside of the multi-valued approach (as opposed to the child nodes approach) is probably API usability: for multi-valued properties you'd have to always get the property and iterate through its values as opposed to navigating directly to such a value.
2. Re: Should I use a large set of multiple properties on a single node or use child nodes?

ma6rl Jul 7, 2015 12:27 PM (in response to hchiorean)

Thanks for the information above. After getting additional requirements, it turns out that I need to store more than a simple string, so I am using child nodes to represent the data. I do have a follow-up question regarding:

Horia Chiorean wrote:

ModeShape will load all the parent's child references into memory. So in the "child node" approach, from a memory perspective, you'd end up with all the string values plus whatever other structures are created for child references.

Are you saying all of the child nodes are loaded into memory when you load the parent or just their references (pointers to the child nodes)?

If most of the time I only need the information on the parent node, and only occasionally need all of the child nodes, is it better to design the node hierarchy as

Parent
    |
Children
     |
Child 1, Child 2, Child 3 ....

or

Parent
    |
Child 1, Child 2, Child 3 ....

I'm assuming in the first scenario only the reference to Children is loaded when I load Parent, whereas the second scenario will load references to Child 1 through N when I load the parent.
3. Re: Should I use a large set of multiple properties on a single node or use child nodes?

hchiorean Jul 8, 2015 3:31 AM (in response to ma6rl)

ma6rl wrote:

Are you saying all of the child nodes are loaded into memory when you load the parent or just their references (pointers to the child nodes)?

Only the child references, meaning essentially a list of [childName, childKey] pairs. This is not a problem if you don't have lots of children, but if you're planning on storing upwards of 500k children under the same parent, you could start to see performance degrade. This subject has been explained & debated "to death" in places like "Large numbers of child nodes" (ModeShape 4 documentation), "Improving performance with large numbers of child nodes" (ModeShape), and [MODE-2109] "Support nodes with a very large number of unordered and uniquely-named children" (JBoss Issue Tracker).

ma6rl wrote:

I'm assuming in the first scenario only the reference to Children is loaded when I load Parent, whereas the second scenario will load references to Child 1 through N when I load the parent.

That's correct, but again, only the child references will be loaded, not the actual child nodes.
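
To make the difference between the two hierarchies concrete, here is a minimal plain-Java model of what the parent holds in each layout. It is not ModeShape code: `ChildRef` and both methods are hypothetical names, and the keys are made-up placeholders. It only illustrates the point above that loading a node brings in its list of [childName, childKey] pairs and not the child nodes themselves.

```java
import java.util.ArrayList;
import java.util.List;

public class ChildRefSketch {

    // Hypothetical stand-in for one child reference: a [childName, childKey] pair.
    static final class ChildRef {
        final String name;
        final String key;

        ChildRef(String name, String key) {
            this.name = name;
            this.key = key;
        }
    }

    // Flat layout: Parent -> Child 1..N. The parent itself holds N references.
    static List<ChildRef> flatParentRefs(int n) {
        List<ChildRef> refs = new ArrayList<>(n);
        for (int i = 1; i <= n; i++) {
            refs.add(new ChildRef("Child " + i, "key-" + i));
        }
        return refs;
    }

    // Nested layout: Parent -> Children -> Child 1..N.
    // The parent holds a single reference; the N references live on "Children".
    static List<ChildRef> nestedParentRefs() {
        List<ChildRef> refs = new ArrayList<>(1);
        refs.add(new ChildRef("Children", "key-children"));
        return refs;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("flat layout, refs on Parent:   " + flatParentRefs(n).size());
        System.out.println("nested layout, refs on Parent: " + nestedParentRefs().size());
    }
}
```

In the nested layout the N references are still loaded, but only when the intermediate Children node is visited, which suits the case where the parent is read often and the children only occasionally.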