11 Replies Latest reply on Jul 9, 2015 1:58 PM by shawkins

    Resultset caching not working as expected with cache store

    pranavk

      Hi,

       

      I took a copy of the Infinispan config file supplied in the teiid-runtime jar and modified it to add a file cache store, using the configuration shown below. I then set the config file on the EmbeddedConfiguration; it was read and the caches were initialized accordingly. As per the file, the cache should hold no more than 2 entries in memory and the rest should spill to the specified directory. I can see this behavior when I look at the MBeans in JConsole. But the files created in the specified directory look like metadata files, all smaller than 1 KB (even when querying large data sets of 10 million rows). It seems as if the entries are still being held in memory.

       

      I am using Teiid Embedded. Could you please provide any input on this?

      I'm not sure if this is related, particularly given the version of Infinispan being used - https://issues.jboss.org/browse/ISPN-3298

       

       

      <namedCache name="resultset-repl">
         <jmxStatistics enabled="true"/>
         <transaction transactionMode="TRANSACTIONAL"/>
         <eviction maxEntries="2" strategy="LIRS"/>
         <expiration lifespan="7200000"/>
         <clustering mode="local"/>
         <loaders passivation="true" shared="false" preload="false">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" purgeOnStartup="false" ignoreModifications="false">
               <properties>
                  <property name="location" value="${java.io.tmpdir}"/>
               </properties>
            </loader>
         </loaders>
      </namedCache>
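
      For reference, this is roughly how I am passing the config file to the embedded engine (a trimmed-down sketch of my setup; translator registration and VDB deployment are omitted, and the setter name is written from memory):

      import org.teiid.runtime.EmbeddedConfiguration;
      import org.teiid.runtime.EmbeddedServer;

      public class EmbeddedCacheStoreSetup {
          public static void main(String[] args) throws Exception {
              EmbeddedConfiguration config = new EmbeddedConfiguration();
              // point the engine at the modified copy of the Infinispan config from teiid-runtime
              config.setInfinispanConfigFile("modified-infinispan-config.xml");

              EmbeddedServer server = new EmbeddedServer();
              server.start(config);
              // ... deploy the VDB here and run the /* cache(...) */ query
          }
      }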

        • 1. Re: Resultset caching not working as expected with cache store
          pranavk

          The following is the format of the content of all the spilled files that are being created:

           

            þ *       8org.teiid.dqp.internal.process.SessionAwareCache$CacheIDr¨.ìÖ{°H        ansiIdentifiers originalSessionId originalUserName    
          parameters     java.util.List   sessionId sql userName vdbInfo   org.teiid.vdb.runtime.VDBKey›c¶    „Zåï        hashCode# name version#     > JuqIvUVFmZN3> admin@teiid-security  >A/* cache(ttl:72000000) */ Select * From "bv_allgtd"."dbo.GTD10MN"  ;ÿ ç
          ¥> vds_all1        ,org.teiid.dqp.internal.process.CachedResults²=‰ BÎ!      

          accessInfo8   )org.teiid.dqp.internal.process.AccessInfoÛÍ’ êX        creationTime$   

          externalNames;ü sensitiveToMetadataChanges   hasLobs rowLimit# uuid    ;ÿ   NO|ÅÑ       java.util.Arrays$ArrayListÙ¤<¾Íˆ Ò        a
            oB  >    bv_allgtd> dbo.GTD10MN 5 >$326b0947-0da8-400a-af7f-02e5f982cf6f˜±£üä)
          • 2. Re: Resultset caching not working as expected with cache store
            shawkins

            > It seems as if the entries are still being held up in the memory.

             

            For legacy and technical reasons Teiid does not use infinispan for the storage of the entries - only the keys.  The Teiid BufferManager storage will contain the entries.

            • 3. Re: Resultset caching not working as expected with cache store
              pranavk

              Thanks, a couple of questions from that:

              1) So is storing keys the only use of Infinispan in the case of Teiid Embedded, or is there any other integration area?

               

              2) Is there any way I can use Infinispan to store the entries as well, rather than just the keys?

               

              3) Does resultset caching use the same synchronization mechanism as internal materialized views? Is this sync mechanism dependent in any way on Infinispan's own syncing?

              I'm asking this to understand the architectural differences between the internal materialization and resultset caching implementations.

               

              Thanks again,

              Pranav

              • 4. Re: Resultset caching not working as expected with cache store
                shawkins

                > 1) So is storing keys the only use of Infinispan in the case of Teiid Embedded, or is there any other integration area?

                 

                It's the caching layer behind prepared statement caching as well.

                 

                > 2) Is there any way I can use Infinispan to store the entries as well, rather than just the keys?

                 

                No

                 

                > 3) Does resultset caching use the same synchronization mechanism as internal materialized views? Is this sync mechanism dependent in any way on Infinispan's own syncing?

                 

                What do you mean by synchronization?  When a result set cache entry needs to be created, the key is placed in the cache and will get replicated according to the Infinispan config.  From there, if a remote member actually uses the cache entry, the data will be loaded on demand from one of the other nodes.
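
                Conceptually the flow looks something like the sketch below - purely illustrative, not the actual Teiid classes - where the key is what gets replicated eagerly and the rows are pulled lazily from whichever node produced them:

                import java.util.HashMap;
                import java.util.List;
                import java.util.Map;
                import java.util.function.Function;
                import java.util.function.Supplier;

                class ResultSetCacheSketch {
                    // replicated via Infinispan: cache key -> marker that some node holds the rows
                    private final Map<String, String> replicatedKeys = new HashMap<>();
                    // local BufferManager-style storage: cache key -> rows held on this node
                    private final Map<String, List<Object[]>> localRows = new HashMap<>();

                    List<Object[]> lookup(String key,
                                          Function<String, List<Object[]>> remoteFetch, // pull rows from the owning node
                                          Supplier<List<Object[]>> execute) {           // run the query locally
                        if (!replicatedKeys.containsKey(key)) {
                            List<Object[]> rows = execute.get(); // miss: execute, keep the rows locally, publish the key
                            localRows.put(key, rows);
                            replicatedKeys.put(key, "local");
                            return rows;
                        }
                        // key is known cluster-wide: pull the rows on demand only if this node doesn't have them yet
                        return localRows.computeIfAbsent(key, remoteFetch);
                    }
                }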

                 

                > I'm asking this to understand the architectural differences between the internal materialization and resultset caching implementations.

                 

                They are completely different.  Materialization is based upon a table structure.  Resultset caching is just a linear store of rows.

                • 5. Re: Resultset caching not working as expected with cache store
                  pranavk

                  Thanks for the clarification.

                  For (3), by synchronization I meant the way the entire snapshot (data) of the view is replicated across the nodes of the cluster in the case of internal materialized views; does the same happen with resultset caching? Or is there always only a single copy of the data, on the node where the query ran first?

                  • 6. Re: Resultset caching not working as expected with cache store
                    rareddy

                    It gets replicated on demand; that means when the same request appears on a different node, that node gets a copy of the results from the remote node, saves it locally, and returns it to the caller.

                    • 7. Re: Resultset caching not working as expected with cache store
                      pranavk

                      Thanks,

                      I set up a cluster to run resultset caching. First I ran a query on the first node, as a result of which the cache was formed and the results were served. When I then ran the query on the second node, I could see that the cached data was transferred first and then the results were served. While this was happening I noticed the following kinds of messages being printed on my console (which sound negative, as if the replication process had failed).

                      I was still getting the same number of records when I ran the query on the second node, though. Am I missing some configuration parameters, or specifying them incorrectly? I feel that a user looking at such alarming messages won't be able to tell whether something is actually going wrong when he is, in fact, receiving all the records.

                       

                      [TCP] null: no physical address for Node-A-64774, dropping message

                      [TCP] null: no physical address for Node-A-64774, dropping message

                      [TCP] null: no physical address for Node-A-64774, dropping message

                      [TCP] null: no physical address for machine1-2258, dropping message

                      [NAKACK] [JGRP00011] Node-A-52223: dropped message 21 from non-member Node-A-64774 (view=MergeView::[Node-A-52223|5] [Node-A-52223], subgroups=[Node-A-52223|4] [Node-A-52223]) (received 5 identical messages from Node-A-64774 in the last 70,298 ms)

                      [NAKACK] [JGRP00011] machine2-900: dropped message 21 from non-member machine1-2258 (view=[machine2-900|2] [machine2-900]) (received 5 identical messages from machine1-2258 in the last 70,298 ms)

                      [NAKACK] [JGRP00011] Node-A-52223: dropped message 24 from non-member Node-A-64774 (view=MergeView::[Node-A-52223|7] [Node-A-52223], subgroups=[Node-A-52223|6] [Node-A-52223])

                      [NAKACK] [JGRP00011] machine2-900: dropped message 24 from non-member machine1-2258 (view=[machine2-900|2] [machine2-900]) (received 4 identical messages from machine1-2258 in the last 98,390 ms)

                      [FD] I was suspected by machine1-2258; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

                      [GMS] machine2-900: not member of view [machine1-2258|38]; discarding it

                      [GMS] Node-A-52223: not member of view [Node-A-64774|38]; discarding it

                      [GMS] Node-A-52223: failed to collect all ACKs (expected=1) for view [Node-A-52223|39] after 2000ms, missing ACKs from [Node-A-52223]

                      [GMS] machine2-52236: failed to collect all ACKs (expected=1) for view [machine2-52236|2] after 2000ms, missing ACKs from [machine2-52236]

                      [GMS] machine2-900: failed to collect all ACKs (expected=1) for view [machine2-900|39] after 2000ms, missing ACKs from [machine2-900]

                      [GMS] machine2-900: failed to collect all ACKs (expected=1) for view [machine2-900|40] after 2000ms, missing ACKs from [machine2-900]

                      [GMS] Node-A-52223: failed to collect all ACKs (expected=1) for view [Node-A-52223|40] after 2000ms, missing ACKs from [Node-A-52223]

                      [GMS] machine2-52236: failed to collect all ACKs (expected=1) for view [machine2-52236|3] after 2000ms, missing ACKs from [machine2-52236]

                       

                      ------------------------------------------------------

                      Also, I noticed the following replication errors in a few of the runs (querying the second node after the cache had been formed on the first). In one of them, when I ran the query again after the error below occurred, it would always hang; I was getting neither the results from the source nor the partially replicated cache.

                      So in scenarios where replication fails, what would be the expected behavior: fetch a fresh resultset (the records may have been modified in the meantime) or sync from another node again?

                       

                      org.teiid.core.TeiidRuntimeException: TEIID30056 java.io.IOException: java.util.concurrent.TimeoutException

                        at org.teiid.common.buffer.impl.BufferManagerImpl.setState(BufferManagerImpl.java:1216) ~[teiid-engine-8.11.0.Final.jar:8.11.0.Final]

                        at org.teiid.common.buffer.impl.BufferManagerImpl.setState(BufferManagerImpl.java:83) ~[teiid-engine-8.11.0.Final.jar:8.11.0.Final]

                        at org.teiid.replication.jgroups.JGroupsObjectReplicator$StreamingRunner.run(JGroupsObjectReplicator.java:234) ~[teiid-runtime-8.11.0.Final.jar:8.11.0.Final]

                        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_40]

                        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40]

                        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_40]

                        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_40]

                        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]

                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]

                        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

                      Caused by: java.io.IOException: java.util.concurrent.TimeoutException

                        at org.teiid.replication.jgroups.JGroupsInputStream.read(JGroupsInputStream.java:57) ~[teiid-runtime-8.11.0.Final.jar:8.11.0.Final]

                        at java.io.InputStream.read(InputStream.java:170) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2313) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2537) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2618) ~[na:1.8.0_40]

                        at java.io.DataInputStream.readInt(DataInputStream.java:390) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2823) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream.readInt(ObjectInputStream.java:972) ~[na:1.8.0_40]

                        at org.teiid.client.BatchSerializer$IntColumnSerializer.readObject(BatchSerializer.java:567) ~[sdf-jdbc-SNAPSHOT.jar:SNAPSHOT]

                        at org.teiid.client.BatchSerializer$ColumnSerializer.readColumn(BatchSerializer.java:543) ~[sdf-jdbc-SNAPSHOT.jar:SNAPSHOT]

                        at org.teiid.client.BatchSerializer.readBatch(BatchSerializer.java:924) ~[sdf-jdbc-SNAPSHOT.jar:SNAPSHOT]

                        at org.teiid.common.buffer.impl.BufferManagerImpl.setTupleBufferState(BufferManagerImpl.java:1241) ~[teiid-engine-8.11.0.Final.jar:8.11.0.Final]

                        at org.teiid.common.buffer.impl.BufferManagerImpl.setState(BufferManagerImpl.java:1214) ~[teiid-engine-8.11.0.Final.jar:8.11.0.Final]

                        ... 9 common frames omitted

                      Caused by: java.util.concurrent.TimeoutException: null

                        ... 22 common frames omitted

                       

                      Pranav

                      • 8. Re: Resultset caching not working as expected with cache store
                        shawkins

                        > I was still getting the same number of records when I ran the query on the second node, though. Am I missing some configuration parameters, or specifying them incorrectly? I feel that a user looking at such alarming messages won't be able to tell whether something is actually going wrong when he is, in fact, receiving all the records.

                         

                        The jgroups project would be more definitive about tuning your configuration and logging levels.  It would definitely be helpful if they had severity levels associated with those messages.  Are you just using the default logging configuration?

                         

                        > So in scenarios where replication fails, what would be the expected behavior: fetch a fresh resultset (the records may have been modified in the meantime) or sync from another node again?

                         

                        We currently have a default timeout of 15 seconds for the total transfer and we'll attempt the transfer 3 times.  Are you dealing with a data size and/or network such that the timeout is too small?  So it may appear to hang for up to 45 seconds, but beyond that the expected behavior is that it should just re-execute locally.
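
                        In pseudo-code the behavior amounts to roughly the following (an illustrative sketch only, not the actual implementation; the constants are the ones mentioned above):

                        import java.util.List;
                        import java.util.concurrent.Callable;
                        import java.util.concurrent.ExecutorService;
                        import java.util.concurrent.Executors;
                        import java.util.concurrent.Future;
                        import java.util.concurrent.TimeUnit;
                        import java.util.concurrent.TimeoutException;

                        class StateTransferSketch {
                            static List<Object[]> fetchOrReExecute(Callable<List<Object[]>> remoteTransfer,
                                                                   Callable<List<Object[]>> localExecution) throws Exception {
                                ExecutorService pool = Executors.newSingleThreadExecutor();
                                try {
                                    for (int attempt = 0; attempt < 3; attempt++) {      // up to 3 transfer attempts
                                        Future<List<Object[]>> result = pool.submit(remoteTransfer);
                                        try {
                                            return result.get(15, TimeUnit.SECONDS);     // 15 second timeout per transfer
                                        } catch (TimeoutException e) {
                                            result.cancel(true);                         // abandon this attempt and retry
                                        }
                                    }
                                    return localExecution.call();                        // all attempts timed out: re-execute locally
                                } finally {
                                    pool.shutdownNow();
                                }
                            }
                        }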

                        • 9. Re: Resultset caching not working as expected with cache store
                          pranavk

                          > Are you just using the default logging configuration?

                          Yes I'm currently just using the default, no changes made yet.


                          > Are you dealing with a data size and/or network such that the timeout is too small?

                          Yes, I am trying to cache large data sets (25-95 million records). However, I don't believe 45 seconds is too short a time, but is there still a way to configure that?


                          And is this behavior of attempting the transfer three times the same in the clustered scenario for internal materialized views too?


                          • 10. Re: Resultset caching not working as expected with cache store
                            shawkins

                            > Yes, I am trying to cache large data sets (25-95 million records). However, I don't believe 45 seconds is too short a time, but is there still a way to configure that?

                             

                            Currently no.  The 15 second timeout is cumulative per transfer, which may need to be larger in your case.  Can you log an issue for that?

                             

                            > And is this behavior of attempting the transfer three times the same in the clustered scenario for internal materialized views too?

                             

                            Mat views sync via a broadcast.  And bringing up an initial member should just attempt a single pull, but I'd have to double-check, as that is coordinated by the JGroups state logic.

                            • 11. Re: Resultset caching not working as expected with cache store
                              shawkins

                              > The 15 second timeout is cumulative per transfer, which may need to be larger in your case.  Can you log an issue for that?

                               

                              To add a little more here: a single transfer is 100 KB.  If that is taking longer than 15 seconds, then it does seem like something could be wrong with the setup.
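
                              As a rough back-of-the-envelope check: 100 KB every 15 seconds works out to under 7 KB/s, far below what even a modest LAN sustains, so repeatedly hitting that timeout usually points at the network or cluster configuration rather than at raw data volume.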