1 Reply Latest reply on Jan 8, 2013 8:51 AM by rhauch

Node.getNodes() and node type

bwallis42 Jan 7, 2013 6:50 PM

I have a model where a node contains three types of child nodes.

{code}
[inf:document] > mix:versionable, nt:hierarchyNode orderable
        + * (inf:history) COPY
        + * (inf:documentState) COPY
        + * (inf:documentData) COPY
        - inf:docType (STRING) mandatory COPY
        - inf:staging (DATE) COPY
        - inf:location (STRING) COPY
{code}

If I call Node.getNodes() it returns all the child nodes, a mixture of all three types. There doesn't seem to be a way to filter the result on node type. There are a couple of variations that filter based on the names of the nodes, but in my case that is no help since I cannot come up with a suitable pattern. I could change the naming in my model so that filtering on name would work, but that just seems wrong: an implementation restriction would be imposing itself on the model, and it would muck with the REST URLs and WebDAV paths as well.

So where this is important I could use a query instead:

{code}
SELECT da.* FROM [inf:documentData] AS da where ISCHILDNODE(da, "/path/to/document")
{code}

but I am concerned about any potential performance problems this might have.

Are there any?

1. Re: Node.getNodes() and node type

rhauch Jan 8, 2013 8:51 AM (in response to bwallis42)

I'd try to keep it simple. But when in doubt, measure the performance in your own application under representative load.

In case it helps, here is a bit of insight into how the filtering works. The getNodes methods that return filtered child nodes work differently based upon the filter. For example, calling one of them with a plain qualified name (i.e., just one glob with no wildcards) will find the matching node(s) very quickly via a simple hash-based lookup in a map. (This is a great way to find child nodes that all have the same name; e.g., same-name siblings.) However, if the filter contains at least one glob with a wildcard, the resulting iterator actually just wraps an iterator that accesses all of the child nodes, skipping those that don't match the filter.
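To make the difference between the two paths concrete, here is a small conceptual model in plain Java. It is only a sketch of the idea described above, not ModeShape's actual code: the class `ChildLookupSketch`, its fields, and the `examined` counter are all made up for illustration, and only the `*` wildcard is handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Conceptual model only; names here are hypothetical, not ModeShape classes.
class ChildLookupSketch {
    // Children in their original order, each stored as {name, id}.
    private final List<String[]> children = new ArrayList<>();
    // Child name -> ids of children with that name (same-name siblings share a key).
    private final Map<String, List<String>> idsByName = new HashMap<>();
    // Number of children the last wildcard scan had to look at.
    int examined = 0;

    void addChild(String name, String id) {
        children.add(new String[] {name, id});
        idsByName.computeIfAbsent(name, k -> new ArrayList<>()).add(id);
    }

    List<String> find(String glob) {
        examined = 0;
        if (!glob.contains("*")) {
            // Exact name: one hash lookup, independent of the number of children.
            List<String> hit = idsByName.get(glob);
            return hit == null ? new ArrayList<>() : new ArrayList<>(hit);
        }
        // Wildcard: visit every child and skip the ones whose name does not match.
        Pattern pattern = toRegex(glob);
        List<String> result = new ArrayList<>();
        for (String[] child : children) {
            examined++;
            if (pattern.matcher(child[0]).matches()) {
                result.add(child[1]);
            }
        }
        return result;
    }

    // Turns a glob with '*' wildcards into a regex, quoting everything else.
    private static Pattern toRegex(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        ChildLookupSketch doc = new ChildLookupSketch();
        doc.addChild("history", "h1");
        doc.addChild("data", "d1");
        doc.addChild("data", "d2");
        System.out.println(doc.find("data") + " examined=" + doc.examined);
        System.out.println(doc.find("da*") + " examined=" + doc.examined);
    }
}
```

The exact-name call never touches the full child list, while the wildcard call visits every child regardless of how many match.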

But because you want to check by node type, the child node must be accessed/materialized to obtain the "jcr:primaryType" property. If the node has been recently cached, this will be very fast. If not, then it is very difficult to say in advance whether querying will be faster than accessing all the nodes and iterating yourself -- it simply depends on too many things, such as how many children there are, which cache store (if any) is used, the odds of the children already being cached, the size of the indexes, how the indexes are persisted, whether the querying (reading the indexes) will compete with updates (writing to the indexes), etc.
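For reference, iterating and filtering by type yourself could look roughly like the sketch below. To keep it self-contained, `ChildNode` is a hypothetical stand-in for `javax.jcr.Node` that exposes only `getName()` and `isNodeType(String)`; the real methods also declare `RepositoryException`, and the real `isNodeType` also returns true for supertypes and mixins, which this stand-in does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for javax.jcr.Node, reduced to the two calls used here.
interface ChildNode {
    String getName();
    boolean isNodeType(String nodeTypeName);
}

class TypeFilterSketch {
    // Returns the children of the requested type, preserving child order.
    static List<ChildNode> childrenOfType(Iterable<? extends ChildNode> children,
                                          String nodeTypeName) {
        List<ChildNode> matches = new ArrayList<>();
        for (ChildNode child : children) {
            // Against a real repository, this type check is the step that
            // requires the child's data to be available.
            if (child.isNodeType(nodeTypeName)) {
                matches.add(child);
            }
        }
        return matches;
    }

    // Tiny in-memory child, used only to exercise the sketch.
    static ChildNode child(String name, String primaryType) {
        return new ChildNode() {
            @Override
            public String getName() {
                return name;
            }

            @Override
            public boolean isNodeType(String nodeTypeName) {
                return primaryType.equals(nodeTypeName);
            }
        };
    }

    public static void main(String[] args) {
        List<ChildNode> children = Arrays.asList(
                child("a", "inf:history"),
                child("b", "inf:documentData"),
                child("c", "inf:documentState"));
        for (ChildNode match : childrenOfType(children, "inf:documentData")) {
            System.out.println(match.getName());
        }
    }
}
```

With the real API the loop would run over the `NodeIterator` returned by `Node.getNodes()`, calling `nextNode()` and `isNodeType("inf:documentData")` on each child.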

If you have a relatively small number of child nodes (say, roughly several dozen or fewer), I would probably use iteration. It's the simplest approach, and I tend to favor navigating children over querying children. Otherwise, the only way to really know which approach is best is to test it. Just be aware that comparing the performance of querying and iteration will be very difficult unless your tests exhibit the same load as your application. So if it is really important, the best way to compare is to use both techniques within your application, randomly choose between the variations, and measure (in a way that is similar to A/B testing for UIs).
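The random choice between variations could be wired in with something as small as the following. This is a minimal sketch under my own assumptions: `VariantTimer` is a made-up helper, the two suppliers stand for "iterate and filter" and "run the query", and a real setup would also want warm-up handling and percentiles rather than just a mean.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical helper: picks one of two variants at random and times it.
class VariantTimer {
    private final Random random;
    private final long[] totalNanos = new long[2];
    private final long[] calls = new long[2];

    VariantTimer(Random random) {
        this.random = random;
    }

    // Runs variant A (index 0) or B (index 1), recording the elapsed time.
    <T> T run(Supplier<T> variantA, Supplier<T> variantB) {
        int pick = random.nextInt(2);
        long start = System.nanoTime();
        try {
            return (pick == 0 ? variantA : variantB).get();
        } finally {
            totalNanos[pick] += System.nanoTime() - start;
            calls[pick]++;
        }
    }

    long calls(int variant) {
        return calls[variant];
    }

    // Mean latency in nanoseconds, or NaN if the variant never ran.
    double meanNanos(int variant) {
        return calls[variant] == 0
                ? Double.NaN
                : (double) totalNanos[variant] / calls[variant];
    }

    public static void main(String[] args) {
        VariantTimer timer = new VariantTimer(new Random());
        for (int i = 0; i < 1000; i++) {
            // Placeholders for the iteration-based and query-based code paths.
            timer.<String>run(() -> "iterate", () -> "query");
        }
        System.out.println("A: " + timer.calls(0) + " calls, mean ns " + timer.meanNanos(0));
        System.out.println("B: " + timer.calls(1) + " calls, mean ns " + timer.meanNanos(1));
    }
}
```

Because both variants run inside the same application under the same load, the two means are at least comparable with each other, which is the point of measuring this way.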

If it is really important to eke out the performance, it may be easier/better to consider using child "container" nodes for the different types of child nodes. Obviously you'd have to weigh this against the other ways your application needs to access the children.

BTW, just because you obtain a javax.jcr.Node instance doesn't mean the node data will be materialized from storage. ModeShape can create a javax.jcr.Node object that simply has the name and identifier (and parent identifier) of the node, and this is often the case when iterating over the children. Only when your application accesses something other than the name, identifier, parent, or path will the node's information be searched for in the workspace cache and, if none is found (meaning the node hasn't been recently accessed), obtained from the Infinispan cache used to persist content.
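As a rough illustration of that lazy behavior, here is a conceptual model in plain Java. It is not how ModeShape is implemented: `LazyChildSketch`, its `loader` function, and the `loads` counter are hypothetical names, and the loader simply stands in for "look in the workspace cache, then in the persistent store".

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

// Conceptual model only; the class and loader are hypothetical, not ModeShape types.
class LazyChildSketch {
    private final String id;
    private final String name;
    // Stand-in for the cache/storage lookup, keyed by node identifier.
    private final Function<String, Map<String, String>> loader;
    // Stays null until something other than the name or id is requested.
    private Map<String, String> properties;
    // How many times the node data had to be loaded.
    int loads = 0;

    LazyChildSketch(String id, String name,
                    Function<String, Map<String, String>> loader) {
        this.id = id;
        this.name = name;
        this.loader = loader;
    }

    // Answered from the lightweight reference, no load needed.
    String getName() {
        return name;
    }

    // Answered from the lightweight reference, no load needed.
    String getIdentifier() {
        return id;
    }

    // First property access triggers the load; later calls reuse the result.
    String getProperty(String propertyName) {
        if (properties == null) {
            properties = loader.apply(id);
            loads++;
        }
        return properties.get(propertyName);
    }

    public static void main(String[] args) {
        LazyChildSketch node = new LazyChildSketch("id-1", "data",
                key -> Collections.singletonMap("jcr:primaryType", "inf:documentData"));
        System.out.println(node.getName() + " loads=" + node.loads);
        System.out.println(node.getProperty("jcr:primaryType") + " loads=" + node.loads);
    }
}
```

In this model, listing children by name costs nothing extra, whereas checking "jcr:primaryType" forces a load for each child that has not been touched yet.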