7 Replies Latest reply on Aug 17, 2016 3:05 AM by hchiorean

How to quickly get the referenced paths and mixinTypes from nodes with many reference properties

barmintor Aug 10, 2016 1:07 PM

We have a ModeShape application that operates on the jcr:path and jcr:mixinTypes properties of referenced nodes, using a query along the lines of:

SELECT [jcr:path] AS path, [jcr:mixinTypes] AS type
FROM [nt:base] AS ref
WHERE [jcr:uuid] IN (
  SELECT [some:uniqueProperty] FROM [nt:base] AS child
  WHERE ISDESCENDANTNODE(child, '/path/to/contextNode')
  AND child.[some:uniqueProperty] IS NOT NULL
)

Where jcr:path, jcr:uuid, and some:uniqueProperty are all indexed. However, as the number of references becomes large, the retrieval becomes very slow. The slowdown appears to be related to node retrieval for the index. Two questions:

1) Is there a way to defer node retrieval in favor of the indexed values from a query, if only the values from the SELECT clause are needed?

2) Is it true that [jcr:mixinTypes] (beyond the first value) cannot be indexed with the local indexProvider?

1. Re: How to quickly get the referenced paths and mixinTypes from nodes with many reference properties

hchiorean Aug 15, 2016 3:37 AM (in response to barmintor)

Where jcr:path, jcr:uuid, and some:uniqueProperty are all indexed.

From your query, the jcr:path indexes are never used. Remember that indexes cannot handle join criteria (i.e. ISDESCENDANTNODE).

However, as the number of references becomes large, the retrieval becomes very slow. The slowdown appears to be related to node retrieval for the index.

Can you elaborate a bit on what you mean by this?

1) Is there a way to defer node retrieval in favor of the indexed values from a query, if only the values from the SELECT clause are needed?

I'm not sure what you mean by this. Indexes work by storing the node keys together with the particular values for each type of index. When using that index, ModeShape will always get the node keys from the index and always load those nodes before resolving the rest of the query.

2) Is it true that [jcr:mixinTypes] (beyond the first value) cannot be indexed with the local indexProvider?

The local index provider does support multi-valued properties.
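To make the multi-valued case concrete, here is a minimal sketch that assumes nothing about ModeShape's internals: the hypothetical `MultiValueIndexModel` class stores one index entry per value, so a node whose jcr:mixinTypes holds several mixins can be found by any of them, not only the first.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of an index over a multi-valued property such as
// jcr:mixinTypes. It is NOT ModeShape's local index provider code.
final class MultiValueIndexModel {
    // property value -> keys of the nodes that have that value
    private final Map<String, Set<String>> keysByValue = new HashMap<>();

    /** Adds one entry per value, so every value (not just the first) is searchable. */
    void add(String nodeKey, List<String> values) {
        for (String value : values) {
            keysByValue.computeIfAbsent(value, v -> new TreeSet<>()).add(nodeKey);
        }
    }

    /** Returns the keys of nodes whose property contains the given value. */
    Set<String> lookup(String value) {
        return keysByValue.getOrDefault(value, Collections.emptySet());
    }
}
```

For example, after adding `node-1` with `mix:referenceable` and `mix:versionable`, a lookup of `mix:versionable` returns `node-1` even though it is the second value.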
2. Re: How to quickly get the referenced paths and mixinTypes from nodes with many reference properties

barmintor Aug 16, 2016 7:29 AM (in response to hchiorean)

From your query, the jcr:path indexes are never used. Remember that indexes cannot handle join criteria (i.e. ISDESCENDANTNODE).

Rather than using jcr:path in the query, my hope was that by indexing jcr:path, the Row results would be able to use the indexed value rather than retrieving the matched nodes.

Can you elaborate a bit on what you mean by this?

Sure! If there are several hundred, or even several thousand, values for a REFERENCE property, and your application needs their jcr:paths, the matching nodes appear to be retrieved individually in the row iterator. This is very slow, particularly (as our users report) when using a JDBC backend. We are trying to sort out whether there is a way to use values from the index instead.
3. Re: How to quickly get the referenced paths and mixinTypes from nodes with many reference properties

hchiorean Aug 16, 2016 7:47 AM (in response to barmintor)

Indexes are only used to filter out (reduce) the number of potential results matching the query criteria, as opposed to loading all the nodes of the repository into memory and checking the criteria against each one; there is no correlation between the indexes and the selected columns.
So if you SELECT the [jcr:path] of a node, after determining the correct node ids (candidates) based on the query criteria and indexes, ModeShape will load each of those nodes into memory as the result set is being iterated.
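The two phases described above can be modeled in plain Java. This is a minimal sketch under stated assumptions: `LazyRowModel`, `CountingStore`, and the `uuid-`/`key-` naming are hypothetical, and the store only counts accesses instead of talking to a real backend. It shows that resolving candidate keys from an index loads no nodes, while iterating N rows costs N individual loads, even when only [jcr:path] is selected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical model of the flow described above; these are not ModeShape classes.
final class LazyRowModel {
    /** Simulated persistent store that counts every node it loads. */
    static final class CountingStore {
        private final Map<String, String> pathsByKey; // node key -> jcr:path
        int loads = 0;

        CountingStore(Map<String, String> pathsByKey) { this.pathsByKey = pathsByKey; }

        String loadPath(String nodeKey) {
            loads++; // one store access per node
            return pathsByKey.get(nodeKey);
        }
    }

    /** Phase 1: resolve candidate node keys from the index; no node is loaded. */
    static List<String> candidateKeys(Map<String, String> uuidIndex, List<String> referencedUuids) {
        List<String> keys = new ArrayList<>();
        for (String uuid : referencedUuids) {
            String key = uuidIndex.get(uuid);
            if (key != null) keys.add(key);
        }
        return keys;
    }

    /** Phase 2: each row loads its node only when next() is called. */
    static Iterator<String> pathRows(List<String> keys, CountingStore store) {
        Iterator<String> it = keys.iterator();
        return new Iterator<String>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() {
                if (!it.hasNext()) throw new NoSuchElementException();
                return store.loadPath(it.next());
            }
        };
    }

    /** Returns {loads after phase 1, loads after iterating every row} for n references. */
    static int[] loadCounts(int n) {
        Map<String, String> uuidIndex = new HashMap<>();
        Map<String, String> paths = new HashMap<>();
        List<String> uuids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            uuidIndex.put("uuid-" + i, "key-" + i);
            paths.put("key-" + i, "/refs/node" + i);
            uuids.add("uuid-" + i);
        }
        CountingStore store = new CountingStore(paths);
        List<String> keys = candidateKeys(uuidIndex, uuids);
        int afterIndex = store.loads;
        Iterator<String> rows = pathRows(keys, store);
        while (rows.hasNext()) rows.next();
        return new int[] {afterIndex, store.loads};
    }
}
```

`loadCounts(1000)` returns `{0, 1000}`. With a JDBC-backed store, each of those loads could be a separate round trip, which would be consistent with the slowdown reported above.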
4. Re: How to quickly get the referenced paths and mixinTypes from nodes with many reference properties

barmintor Aug 16, 2016 8:37 AM (in response to hchiorean)

And I take it that the javax.jcr.query implementations are sufficiently encapsulated that there's nothing to be done to fetch those nodes in batches (particularly from the JDBC backend)?
5. Re: How to quickly get the referenced paths and mixinTypes from nodes with many reference properties

hchiorean Aug 16, 2016 9:11 AM (in response to barmintor)

ModeShape's querying system can only retrieve nodes individually (i.e. one at a time) from the underlying persistent store.
6. Re: How to quickly get the referenced paths and mixinTypes from nodes with many reference properties

barmintor Aug 16, 2016 10:41 AM (in response to hchiorean)

I am wondering if there is a seam on the javax.jcr.query.QueryResult interface to interact in a more performance-conscious way by overriding QueryResult#getNodes. Our project will need to figure something out.
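One way to picture the kind of seam being asked about is a wrapper that fetches node keys in batches. This is a sketch, not a ModeShape or JCR API: `BatchedRowsModel`, `BatchLoader`, and its `loadAll` bulk lookup are hypothetical, and a real backend would have to provide something equivalent. The iterator fetches a batch only when the previous one is exhausted, so 1,000 rows with a batch size of 100 take 10 round trips instead of 1,000.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical batching wrapper; nothing here is part of javax.jcr or ModeShape.
final class BatchedRowsModel {
    /** Hypothetical bulk lookup: one call (one round trip) returns all requested nodes. */
    interface BatchLoader {
        Map<String, String> loadAll(List<String> nodeKeys);
    }

    /** Yields node paths in key order, fetching the next batch only when the buffer runs out. */
    static Iterator<String> batchedPaths(List<String> keys, int batchSize, BatchLoader loader) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        return new Iterator<String>() {
            private int pos = 0;         // position of the next key to return
            private int bufferEnd = 0;   // exclusive end of the keys currently buffered
            private Map<String, String> buffer = Collections.emptyMap();

            @Override public boolean hasNext() { return pos < keys.size(); }

            @Override public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (pos == bufferEnd) {
                    bufferEnd = Math.min(pos + batchSize, keys.size());
                    buffer = loader.loadAll(keys.subList(pos, bufferEnd));
                }
                return buffer.get(keys.get(pos++));
            }
        };
    }

    /** Counts loadAll calls when only the first rowsConsumed of n rows are read. */
    static int roundTrips(int n, int batchSize, int rowsConsumed) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) keys.add("key-" + i);
        int[] trips = {0};
        BatchLoader loader = batch -> {
            trips[0]++;
            Map<String, String> nodes = new HashMap<>();
            for (String key : batch) nodes.put(key, "/refs/" + key);
            return nodes;
        };
        Iterator<String> rows = batchedPaths(keys, batchSize, loader);
        for (int i = 0; i < rowsConsumed && rows.hasNext(); i++) rows.next();
        return trips[0];
    }
}
```

The trade-off shows up when a caller reads only a few rows: `roundTrips(1000, 100, 5)` makes one round trip but still pulls a full batch of 100 nodes to serve 5 rows, which is the kind of concern raised later in the thread about limit/offset/skip.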
7. Re: How to quickly get the referenced paths and mixinTypes from nodes with many reference properties

hchiorean Aug 17, 2016 3:05 AM (in response to barmintor)

You can see the ModeShape implementation starting at the QueryResult interface (modeshape/QueryResult.java on the master branch of ModeShape/modeshape on GitHub) and its implementation. The current design returns query results in lazy batches (semantically), where each row of each batch is only loaded from the workspace cache (see below) during result set iteration. AFAIK, the only performance-related methods in the JCR API are the limit/offset/skip methods, which let you selectively load only certain results.
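In JCR 2.0 these correspond to `Query.setLimit(long)` and `Query.setOffset(long)`, plus `skip(long)` on the returned iterators. A minimal sketch of why they help when rows are loaded lazily, assuming (as the reply describes) that skipped rows are never materialized: the hypothetical `PagedRowsModel` steps over `offset` keys without loading them and then loads at most `limit` nodes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of offset/limit over lazily loaded rows (not ModeShape code).
final class PagedRowsModel {
    /** Skips offset keys without loading them, then loads at most limit nodes. */
    static List<String> loadPage(List<String> keys, int offset, int limit,
                                 Function<String, String> loadNode) {
        List<String> page = new ArrayList<>();
        Iterator<String> it = keys.iterator();
        for (int i = 0; i < offset && it.hasNext(); i++) {
            it.next(); // skipped row: its node is never loaded
        }
        while (page.size() < limit && it.hasNext()) {
            page.add(loadNode.apply(it.next()));
        }
        return page;
    }

    /** Number of node loads needed to read one page out of total candidate rows. */
    static int loadsForPage(int total, int offset, int limit) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < total; i++) keys.add("key-" + i);
        int[] loads = {0};
        loadPage(keys, offset, limit, key -> {
            loads[0]++;
            return "/refs/" + key;
        });
        return loads[0];
    }
}
```

Reading a page of 50 rows at offset 100 out of 1,000 candidates costs 50 loads rather than 1,000; a page that runs past the end (offset 980) costs only the 20 remaining loads.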

If you want to look deeper at the implementation, start with QuerySources.java (modeshape/QuerySources.java on the master branch of ModeShape/modeshape on GitHub), which is where query results come from when indexes are being used. The other thing to keep in mind is that it's equally important (performance-wise) that nodes are loaded via the workspace cache, not directly from the DB. Certain nodes may very well already be in the workspace cache (because they have been read previously), in which case a DB trip is avoided. I'm not convinced batch loading from the DB is the best approach (performance-wise), especially because of the limit/offset/skip methods I mentioned above. It may be the best solution for your use case, but it may not be optimal for other cases.
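The point about the workspace cache can be illustrated with a read-through cache. This is a minimal sketch assuming a simple bounded LRU policy built on `LinkedHashMap`; `WorkspaceCacheModel` and its eviction rule are hypothetical and say nothing about how ModeShape's workspace cache is actually sized or evicted. Nodes that are already cached cost no DB trip, so a second pass over the same nodes is free when they all fit.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical read-through cache in front of a "DB" map; not ModeShape's workspace cache.
final class WorkspaceCacheModel {
    private final Map<String, String> db;    // stands in for the persistent store
    private final Map<String, String> cache; // bounded, least-recently-used eviction
    int dbTrips = 0;

    WorkspaceCacheModel(Map<String, String> db, int capacity) {
        this.db = db;
        // accessOrder = true gives LRU ordering; the eldest entry is evicted once over capacity
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the node, going to the DB only on a cache miss. */
    String load(String key) {
        String node = cache.get(key);
        if (node == null) {
            dbTrips++;
            node = db.get(key);
            if (node != null) cache.put(key, node);
        }
        return node;
    }

    /** DB trips for reading the same n nodes twice through a cache of the given size. */
    static int tripsForTwoPasses(int n, int capacity) {
        Map<String, String> db = new HashMap<>();
        for (int i = 0; i < n; i++) db.put("key-" + i, "/refs/node" + i);
        WorkspaceCacheModel ws = new WorkspaceCacheModel(db, capacity);
        for (int pass = 0; pass < 2; pass++) {
            for (int i = 0; i < n; i++) ws.load("key-" + i);
        }
        return ws.dbTrips;
    }
}
```

`tripsForTwoPasses(100, 100)` returns 100, while `tripsForTwoPasses(100, 50)` returns 200: a sequential scan larger than an LRU cache misses on every read. Cache behavior, and not only the number of round trips per load, shapes how fast the result set can be iterated.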
