8 Replies Latest reply on Feb 16, 2012 10:00 AM by sannegrinovero

Query vs. ClusteredQuery documentation

matlach Feb 15, 2012 2:23 PM

I was wondering whether there is any documentation other than https://docs.jboss.org/author/display/ISPN/Querying+Infinispan that covers Infinispan's query module.

In particular, I was wondering what the best use case for a clustered query is.

Should we rely on:

A. local indexes and perform clustered query (clustering mode="distributed" + indexing indexLocalOnly="true") or

B. replicated/distributed indexes and perform local query (clustering mode="distributed" + indexing indexLocalOnly="true" + hibernate.search.default.directory_provider="infinispan") ?

Could someone explain the pros and cons of these two setups? When is each of them best suited?

Thanks a lot for helping me make this clearer in my head!

1. Re: Query vs. ClusteredQuery documentation

galder.zamarreno Feb 16, 2012 6:45 AM (in response to matlach)

Clustered, or distributed, queries are pretty much experimental. They were first implemented in the first Infinispan 5.1.0 alpha, but I haven't seen much feedback on them.

Replicated/distributed indexes and local queries have been much more road-tested, so I'd go for them.
2. Re: Query vs. ClusteredQuery documentation

sannegrinovero Feb 16, 2012 7:12 AM (in response to galder.zamarreno)

Exactly as Galder said, distributed queries are experimental and still research in progress. If you want to try them out and share your thoughts you're very welcome, but if you need to deliver something solid in a short time I would avoid them for now.

About the pros/cons:
a distributed query will always be less efficient than a query on a single node if it's possible to replicate the indexes.

So it depends if you can replicate the index: replication is less effective on larger clusters, as every single write might generate big chunks of data which need to be sent to all nodes in the cluster. It also depends on your read/write ratio: even if you have many nodes in the cluster but your index is basically read only, replication will be effective.

With larger clusters having many write operations you will likely prefer a clustered query, but resolving a single query will then involve:
- multiple network RPCs to perform it
- as a consequence, a higher latency before your query result is returned
- more CPU work

Still, this would be much better than using a non-clustered query when the index is physically distributed across the nodes, as a non-clustered query can't take advantage of the knowledge of how the data is organized.

Use B with replication if possible.
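
To make the cost of a clustered query more concrete, here is a minimal sketch of the scatter-gather pattern described above. It is a toy model only: `NodeIndex`, `Hit` and `clusteredQuery` are hypothetical names made up for illustration, not Infinispan or Hibernate Search APIs, and the "RPC" is just a method call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: none of these types are Infinispan or Hibernate Search classes.
class ScatterGatherSketch {

    // A scored search hit, as one node's local index might return it.
    static final class Hit {
        final String key;
        final double score;

        Hit(String key, double score) {
            this.key = key;
            this.score = score;
        }

        double score() {
            return score;
        }
    }

    // Hypothetical stand-in for the index held by a single node.
    interface NodeIndex {
        List<Hit> search(String term, int limit);
    }

    // Clustered-query style: one (simulated) RPC per node, then a merge step
    // on the calling node that keeps only the best `limit` hits overall.
    static List<Hit> clusteredQuery(List<NodeIndex> nodes, String term, int limit) {
        List<Hit> merged = new ArrayList<>();
        for (NodeIndex node : nodes) {
            merged.addAll(node.search(term, limit));
        }
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
    }
}
```

With a replicated index the same query would be a single local `search` call; here every node is contacted and the caller does the extra merge work, which is where the additional RPCs, latency and CPU come from.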
3. Re: Query vs. ClusteredQuery documentation

matlach Feb 16, 2012 8:03 AM (in response to sannegrinovero)

Thank you both, Galder and Sanne, for the detailed answers; this is much appreciated.

I do have another question, though, since my application is write intensive: almost every user action will trigger an update.
Let's consider the following scenario:

// put a value with two fields in the cache, one indexed and one not indexed
Value value = new Value();
value.setNonIndexedFieldValue(123);
value.setIndexedFieldValue(456);
cache.put("key", value);

// later...
// get back value from cache and update the not indexed field
Value value = cache.get("key");
value.setNonIndexedFieldValue(789);
cache.put("key", value);

In this scenario, will the second put operation trigger an update to the index even though the indexed field hasn't changed?

I have used both approaches in my project, clustered and non-clustered queries, with success (though with clustered queries I had to add some try/catch blocks). I was really just wondering what the ideal use case for each of them is, performance-wise.
4. Re: Query vs. ClusteredQuery documentation

sannegrinovero Feb 16, 2012 8:35 AM (in response to matlach)

Your second scenario would trigger an index update when using Infinispan Query; it would not trigger one when using Hibernate Search. The difference is that the session in Hibernate is able to track changes on managed objects, whereas in Infinispan we only receive a put with an object and have no way to check for a change.

Thinking about it, there might be some situations in which we can detect it: for example, if the put is going to carry a valid return value, then we could compare the values in string form. Nice, please open an improvement request on JIRA!

Regarding issues with clustered queries, I've seen you opened ISPN-1568, thanks for that. If you have more issues please track them as well.
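
The change-detection idea mentioned above could look roughly like the following sketch. It is a toy model under stated assumptions: `ChangeDetectingCacheSketch` and `Value` are hypothetical classes, not Infinispan code, the "index" is only a counter, and the comparison is done on the indexed field directly rather than in string form.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, not Infinispan or Hibernate Search APIs.
class ChangeDetectingCacheSketch {

    // Mirrors the Value type from the thread: one indexed field, one not.
    // Immutable here, so the previous and new state can never alias each other.
    static final class Value {
        final int indexedField;
        final int nonIndexedField;

        Value(int indexedField, int nonIndexedField) {
            this.indexedField = indexedField;
            this.nonIndexedField = nonIndexedField;
        }
    }

    private final Map<String, Value> store = new HashMap<>();

    // Counts simulated index writes.
    int indexUpdates = 0;

    // put() has the previous value at hand, so the indexed part can be
    // compared and the index write skipped when nothing relevant changed.
    Value put(String key, Value value) {
        Value previous = store.put(key, value);
        boolean indexedPartChanged =
                previous == null || previous.indexedField != value.indexedField;
        if (indexedPartChanged) {
            indexUpdates++;
        }
        return previous;
    }
}
```

In this model, putting `new Value(456, 123)` and then `new Value(456, 789)` under the same key leaves `indexUpdates` at 1. Note the sketch relies on the previous value being available and distinct from the new one; if the same mutable instance is modified in place and put back, there is nothing left to compare against.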
5. Re: Query vs. ClusteredQuery documentation

matlach Feb 16, 2012 8:51 AM (in response to sannegrinovero)

I've created the issue : https://issues.jboss.org/browse/ISPN-1865

I guess, though, that it wouldn't work when using an "unsafe operation" or "cache.getAdvancedCache().withFlags(Flag.SKIP_REMOTE_LOOKUP).put(...)"

Thanks a lot for your help.
6. Re: Query vs. ClusteredQuery documentation

sannegrinovero Feb 16, 2012 8:55 AM (in response to matlach)

thanks. Exactly, in that case I would have no choice and would need to have it indexed again. We might add a Flag in case your application knows something that we can't infer from the state.
7. Re: Query vs. ClusteredQuery documentation

matlach Feb 16, 2012 9:07 AM (in response to sannegrinovero)

So by doing this manually, would I achieve the same result?

// later...
// get back value from cache and update the not indexed field
Value value = cache.get("key");
value.setNonIndexedFieldValue(789);
//cache.put("key", value);
cache.getAdvancedCache().withFlags(Flag.SKIP_INDEXING).put("key", value);
8. Re: Query vs. ClusteredQuery documentation

sannegrinovero Feb 16, 2012 10:00 AM (in response to matlach)

yes
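
For readers wondering what skipping indexing on a single put implies, here is a small sketch of the behaviour. It is only a toy model: `SkipIndexingSketch` and its `Hint` enum are hypothetical and merely named after the `Flag.SKIP_INDEXING` used above; the real flag handling inside Infinispan is not shown here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, not Infinispan APIs.
class SkipIndexingSketch {

    // Hypothetical per-call hint, named after Flag.SKIP_INDEXING from the thread.
    enum Hint { SKIP_INDEXING }

    // key -> { indexedField, nonIndexedField }
    private final Map<String, int[]> store = new HashMap<>();

    // key -> indexed field value as last written to the simulated index
    private final Map<String, Integer> index = new HashMap<>();

    // Counts simulated index writes.
    int indexWrites = 0;

    // The data is always stored; the index is only touched when the caller
    // did not pass SKIP_INDEXING.
    void put(String key, int indexedField, int nonIndexedField, Hint... hints) {
        store.put(key, new int[] { indexedField, nonIndexedField });
        boolean skip = Arrays.asList(hints).contains(Hint.SKIP_INDEXING);
        if (!skip) {
            index.put(key, indexedField);
            indexWrites++;
        }
    }

    int nonIndexedField(String key) {
        return store.get(key)[1];
    }

    Integer indexedValueInIndex(String key) {
        return index.get(key);
    }
}
```

The caller takes responsibility here: if an indexed field is changed while the hint is set, the simulated index silently keeps the old value, so the hint is only safe when the application knows that nothing indexed has changed.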