11 Replies Latest reply on Jul 9, 2015 1:58 PM by shawkins

    Resultset caching not working as expected with cache store

    pranavk

      Hi,

       

      I took a copy of the Infinispan config file supplied in the teiid-runtime jar and modified it to add a file cache store, using the configuration shown below. I then set the config file on the EmbeddedConfiguration; it was read and the caches were initialized accordingly. As per the file, the cache should hold no more than 2 entries in memory and the rest should spill to the specified directory. I can see this behavior when I look at the MBeans in JConsole. But the files created in the specified directory look like metadata files, all smaller than 1 KB (even when querying large data sets of 10 million rows). It seems as if the entries are still being held in memory.

       

      I am using Teiid Embedded. Could you please provide any input on this?

      I'm not sure if this is related, particularly given the version of Infinispan being used - https://issues.jboss.org/browse/ISPN-3298

       

       

      <namedCache name="resultset-repl">
         <jmxStatistics enabled="true"/>
         <transaction transactionMode="TRANSACTIONAL"/>
         <eviction maxEntries="2" strategy="LIRS"/>
         <expiration lifespan="7200000"/>
         <clustering mode="local"/>
         <loaders passivation="true" shared="false" preload="false">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" purgeOnStartup="false" ignoreModifications="false">
               <properties>
                  <property name="location" value="${java.io.tmpdir}"/>
               </properties>
            </loader>
         </loaders>
      </namedCache>
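
      For reference, this is roughly how I am passing the config file to the embedded engine (a trimmed-down sketch of my setup; translator registration and VDB deployment are omitted, and the setter name is written from memory):

      import org.teiid.runtime.EmbeddedConfiguration;
      import org.teiid.runtime.EmbeddedServer;

      public class EmbeddedCacheStoreSetup {
          public static void main(String[] args) throws Exception {
              EmbeddedConfiguration config = new EmbeddedConfiguration();
              // point the engine at the modified copy of the Infinispan config from teiid-runtime
              config.setInfinispanConfigFile("modified-infinispan-config.xml");

              EmbeddedServer server = new EmbeddedServer();
              server.start(config);
              // ... deploy the VDB here and run the /* cache(...) */ query
          }
      }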

        • 1. Re: Resultset caching not working as expected with cache store
          pranavk

          The following is the format of the content of all the spilled files that are being created:

           

            þ *       8org.teiid.dqp.internal.process.SessionAwareCache$CacheIDr¨.ìÖ{°H        ansiIdentifiers originalSessionId originalUserName    
          parameters     java.util.List   sessionId sql userName vdbInfo   org.teiid.vdb.runtime.VDBKey›c¶    „Zåï        hashCode# name version#     > JuqIvUVFmZN3> admin@teiid-security  >A/* cache(ttl:72000000) */ Select * From "bv_allgtd"."dbo.GTD10MN"  ;ÿ ç
          ¥> vds_all1        ,org.teiid.dqp.internal.process.CachedResults²=‰ BÎ!      

          accessInfo8   )org.teiid.dqp.internal.process.AccessInfoÛÍ’ êX        creationTime$   

          externalNames;ü sensitiveToMetadataChanges   hasLobs rowLimit# uuid    ;ÿ   NO|ÅÑ       java.util.Arrays$ArrayListÙ¤<¾Íˆ Ò        a
            oB  >    bv_allgtd> dbo.GTD10MN 5 >$326b0947-0da8-400a-af7f-02e5f982cf6f˜±£üä)
          • 2. Re: Resultset caching not working as expected with cache store
            shawkins

            > It seems as if the entries are still being held up in the memory.

             

            For legacy and technical reasons Teiid does not use infinispan for the storage of the entries - only the keys.  The Teiid BufferManager storage will contain the entries.

            • 3. Re: Resultset caching not working as expected with cache store
              pranavk

              Thanks, a couple of questions from that:

              1) So is storing keys the only use of Infinispan in the case of Teiid Embedded, or is there any other integration area?

               

              2) Is there any way I can use Infinispan to store the entries as well, rather than just the keys?

               

              3) Does resultset caching use the same synchronization mechanism as internal materialized views? Is this sync mechanism dependent in any way on Infinispan's own syncing?

              I'm asking this to understand the architectural differences between the internal materialization and resultset caching implementations.

               

              Thanks again,

              Pranav

              • 4. Re: Resultset caching not working as expected with cache store
                shawkins

                > 1) So is storing keys the only use of Infinispan in the case of Teiid Embedded, or is there any other integration area?

                 

                It's the caching layer behind prepared statement caching as well.

                 

                > 2) Is there any way I can use Infinispan to store the entries as well, rather than just the keys?

                 

                No

                 

                > 3) Does resultset caching use the same synchronization mechanism as internal materialized views? Is this sync mechanism dependent in any way on Infinispan's own syncing?

                 

                What do you mean by synchronization?  When a result set cache entry needs to be created, the key is placed in the cache and will get replicated according to the Infinispan config.  From there, if a remote member actually uses the cache entry, the data will be loaded on demand from one of the other nodes.
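
                Conceptually the flow looks something like the sketch below - purely illustrative, not the actual Teiid classes - where the key is what gets replicated eagerly and the rows are pulled lazily from whichever node produced them:

                import java.util.HashMap;
                import java.util.List;
                import java.util.Map;
                import java.util.function.Function;
                import java.util.function.Supplier;

                class ResultSetCacheSketch {
                    // replicated via Infinispan: cache key -> marker that some node holds the rows
                    private final Map<String, String> replicatedKeys = new HashMap<>();
                    // local BufferManager-style storage: cache key -> rows held on this node
                    private final Map<String, List<Object[]>> localRows = new HashMap<>();

                    List<Object[]> lookup(String key,
                                          Function<String, List<Object[]>> remoteFetch, // pull rows from the owning node
                                          Supplier<List<Object[]>> execute) {           // run the query locally
                        if (!replicatedKeys.containsKey(key)) {
                            List<Object[]> rows = execute.get(); // miss: execute, keep the rows locally, publish the key
                            localRows.put(key, rows);
                            replicatedKeys.put(key, "local");
                            return rows;
                        }
                        // key is known cluster-wide: pull the rows on demand only if this node doesn't have them yet
                        return localRows.computeIfAbsent(key, remoteFetch);
                    }
                }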

                 

                > I'm asking this to understand the architectural differences between the internal materialization and resultset caching implementations.

                 

                They are completely different.  Materialization is based upon a table structure.  Resultset caching is just a linear store of rows.

                • 5. Re: Resultset caching not working as expected with cache store
                  pranavk

                  Thanks for the clarification.

                  For (3), by synchronization I meant the way the entire snapshot (data) of the view is replicated across the nodes of the cluster in the case of internal materialized views; does the same happen with resultset caching? Or is there always only a single copy of the data, on the node where the query ran first?

                  • 6. Re: Resultset caching not working as expected with cache store
                    rareddy

                    It gets replicated on demand; that means when the same request appears on a different node, that node gets a copy of the results from the remote node, saves it locally, and returns it to the caller.

                    • 7. Re: Resultset caching not working as expected with cache store
                      pranavk

                      Thanks,

                      I set up a cluster to run resultset caching. First I ran a query on the first node, as a result of which the cache was formed and the results were served. When I then ran the query on the second node, I could see that the cached data was transferred first and then the results were served. While this was happening I noticed the following kinds of messages being printed on my console (which sound negative, as if the replication process had failed).

                      I was still getting the same number of records when I ran the query on the second node, though. Am I missing some configuration parameters, or specifying them incorrectly? I feel that a user looking at such alarming messages won't be able to tell whether something is actually going wrong when he is, in fact, receiving all the records.

                       

                      [TCP] null: no physical address for Node-A-64774, dropping message

                      [TCP] null: no physical address for Node-A-64774, dropping message

                      [TCP] null: no physical address for Node-A-64774, dropping message

                      [TCP] null: no physical address for machine1-2258, dropping message

                      [NAKACK] [JGRP00011] Node-A-52223: dropped message 21 from non-member Node-A-64774 (view=MergeView::[Node-A-52223|5] [Node-A-52223], subgroups=[Node-A-52223|4] [Node-A-52223]) (received 5 identical messages from Node-A-64774 in the last 70,298 ms)

                      [NAKACK] [JGRP00011] machine2-900: dropped message 21 from non-member machine1-2258 (view=[machine2-900|2] [machine2-900]) (received 5 identical messages from machine1-2258 in the last 70,298 ms)

                      [NAKACK] [JGRP00011] Node-A-52223: dropped message 24 from non-member Node-A-64774 (view=MergeView::[Node-A-52223|7] [Node-A-52223], subgroups=[Node-A-52223|6] [Node-A-52223])

                      [NAKACK] [JGRP00011] machine2-900: dropped message 24 from non-member machine1-2258 (view=[machine2-900|2] [machine2-900]) (received 4 identical messages from machine1-2258 in the last 98,390 ms)

                      [FD] I was suspected by machine1-2258; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

                      [GMS] machine2-900: not member of view [machine1-2258|38]; discarding it

                      [GMS] Node-A-52223: not member of view [Node-A-64774|38]; discarding it

                      [GMS] Node-A-52223: failed to collect all ACKs (expected=1) for view [Node-A-52223|39] after 2000ms, missing ACKs from [Node-A-52223]

                      [GMS] machine2-52236: failed to collect all ACKs (expected=1) for view [machine2-52236|2] after 2000ms, missing ACKs from [machine2-52236]

                      [GMS] machine2-900: failed to collect all ACKs (expected=1) for view [machine2-900|39] after 2000ms, missing ACKs from [machine2-900]

                      [GMS] machine2-900: failed to collect all ACKs (expected=1) for view [machine2-900|40] after 2000ms, missing ACKs from [machine2-900]

                      [GMS] Node-A-52223: failed to collect all ACKs (expected=1) for view [Node-A-52223|40] after 2000ms, missing ACKs from [Node-A-52223]

                      [GMS] machine2-52236: failed to collect all ACKs (expected=1) for view [machine2-52236|3] after 2000ms, missing ACKs from [machine2-52236]

                       

                      ------------------------------------------------------

                      Also, I noticed the following replication errors in a few of the runs (querying the second node after the cache had been formed on the first). In one of them, when I ran the query again after the error below occurred, it would always hang; I was getting neither the results from the source nor the partially replicated cache.

                      So in scenarios where replication fails, what would be the expected behavior: fetch a fresh resultset (the records may have been modified in the meantime) or sync from another node again?

                       

                      org.teiid.core.TeiidRuntimeException: TEIID30056 java.io.IOException: java.util.concurrent.TimeoutException

                        at org.teiid.common.buffer.impl.BufferManagerImpl.setState(BufferManagerImpl.java:1216) ~[teiid-engine-8.11.0.Final.jar:8.11.0.Final]

                        at org.teiid.common.buffer.impl.BufferManagerImpl.setState(BufferManagerImpl.java:83) ~[teiid-engine-8.11.0.Final.jar:8.11.0.Final]

                        at org.teiid.replication.jgroups.JGroupsObjectReplicator$StreamingRunner.run(JGroupsObjectReplicator.java:234) ~[teiid-runtime-8.11.0.Final.jar:8.11.0.Final]

                        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_40]

                        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_40]

                        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_40]

                        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_40]

                        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]

                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]

                        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

                      Caused by: java.io.IOException: java.util.concurrent.TimeoutException

                        at org.teiid.replication.jgroups.JGroupsInputStream.read(JGroupsInputStream.java:57) ~[teiid-runtime-8.11.0.Final.jar:8.11.0.Final]

                        at java.io.InputStream.read(InputStream.java:170) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2313) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2537) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2618) ~[na:1.8.0_40]

                        at java.io.DataInputStream.readInt(DataInputStream.java:390) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2823) ~[na:1.8.0_40]

                        at java.io.ObjectInputStream.readInt(ObjectInputStream.java:972) ~[na:1.8.0_40]

                        at org.teiid.client.BatchSerializer$IntColumnSerializer.readObject(BatchSerializer.java:567) ~[sdf-jdbc-SNAPSHOT.jar:SNAPSHOT]

                        at org.teiid.client.BatchSerializer$ColumnSerializer.readColumn(BatchSerializer.java:543) ~[sdf-jdbc-SNAPSHOT.jar:SNAPSHOT]

                        at org.teiid.client.BatchSerializer.readBatch(BatchSerializer.java:924) ~[sdf-jdbc-SNAPSHOT.jar:SNAPSHOT]

                        at org.teiid.common.buffer.impl.BufferManagerImpl.setTupleBufferState(BufferManagerImpl.java:1241) ~[teiid-engine-8.11.0.Final.jar:8.11.0.Final]

                        at org.teiid.common.buffer.impl.BufferManagerImpl.setState(BufferManagerImpl.java:1214) ~[teiid-engine-8.11.0.Final.jar:8.11.0.Final]

                        ... 9 common frames omitted

                      Caused by: java.util.concurrent.TimeoutException: null

                        ... 22 common frames omitted

                       

                      Pranav

                      • 8. Re: Resultset caching not working as expected with cache store
                        shawkins

                        > I was still getting the same number of records when I ran the query on the second node, though. Am I missing some configuration parameters, or specifying them incorrectly? I feel that a user looking at such alarming messages won't be able to tell whether something is actually going wrong when he is, in fact, receiving all the records.

                         

                        The jgroups project would be more definitive about tuning your configuration and logging levels.  It would definitely be helpful if they had severity levels associated with those messages.  Are you just using the default logging configuration?

                         

                        > So in scenarios where replication fails, what would be the expected behavior: fetch a fresh resultset (the records may have been modified in the meantime) or sync from another node again?

                         

                        We currently have a default timeout of 15 seconds for the total transfer and we'll attempt the transfer 3 times.  Are you dealing with a data size and/or network such that the timeout is too small?  So it may appear to hang for up to 45 seconds, but beyond that the expected behavior is that it should just re-execute locally.
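
                        In pseudo-code the behavior amounts to roughly the following (an illustrative sketch only, not the actual implementation; the constants are the ones mentioned above):

                        import java.util.List;
                        import java.util.concurrent.Callable;
                        import java.util.concurrent.ExecutorService;
                        import java.util.concurrent.Executors;
                        import java.util.concurrent.Future;
                        import java.util.concurrent.TimeUnit;
                        import java.util.concurrent.TimeoutException;

                        class StateTransferSketch {
                            static List<Object[]> fetchOrReExecute(Callable<List<Object[]>> remoteTransfer,
                                                                   Callable<List<Object[]>> localExecution) throws Exception {
                                ExecutorService pool = Executors.newSingleThreadExecutor();
                                try {
                                    for (int attempt = 0; attempt < 3; attempt++) {      // up to 3 transfer attempts
                                        Future<List<Object[]>> result = pool.submit(remoteTransfer);
                                        try {
                                            return result.get(15, TimeUnit.SECONDS);     // 15 second timeout per transfer
                                        } catch (TimeoutException e) {
                                            result.cancel(true);                         // abandon this attempt and retry
                                        }
                                    }
                                    return localExecution.call();                        // all attempts timed out: re-execute locally
                                } finally {
                                    pool.shutdownNow();
                                }
                            }
                        }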

                        • 9. Re: Resultset caching not working as expected with cache store
                          pranavk

                          > Are you just using the default logging configuration?

                          Yes I'm currently just using the default, no changes made yet.


                          > Are you dealing with a data size and/or network such that the timeout is too small?

                          Yes, I am trying to cache large data sets (25-95 million records). However, I don't believe 45 seconds is too short a time, but is there still a way to configure that?


                          And is this behavior of attempting the transfer three times the same in the clustered scenario for internal materialized views too?


                          • 10. Re: Resultset caching not working as expected with cache store
                            shawkins

                            > Yes, I am trying to cache large data sets (25-95 million records). However, I don't believe 45 seconds is too short a time, but is there still a way to configure that?

                             

                            Currently no.  The 15 second timeout is cumulative per transfer, which may need to be larger in your case.  Can you log an issue for that?

                             

                            > And is this behavior of attempting the transfer three times the same in the clustered scenario for internal materialized views too?

                             

                            Mat views sync via a broadcast.  And bringing up an initial member should just attempt a single pull, but I'd have to double-check, as that is coordinated by the JGroups state logic.

                            • 11. Re: Resultset caching not working as expected with cache store
                              shawkins

                              > The 15 second timeout is cumulative per transfer, which may need to be larger in your case.  Can you log an issue for that?

                               

                              To add a little more here: a single transfer is 100 KB.  If that is taking longer than 15 seconds, then it does seem like something could be wrong with the setup.
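
                              As a rough back-of-the-envelope check: 100 KB every 15 seconds works out to under 7 KB/s, far below what even a modest LAN sustains, so repeatedly hitting that timeout usually points at the network or cluster configuration rather than at raw data volume.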