3 Replies Latest reply on Feb 29, 2016 8:47 AM by bpiepers

    iNodes and DataBlocks

    bpiepers

      I'm currently debugging a situation where Teiid breaks up a particular "WITH" query into at least two separate queries that yield millions of records. I see differences in behavior between clients that connect and am trying to investigate what the problem is. In order to understand the situation better is there someone here that can answer the following questions? (jdv 6.0 with teiid 8.4.6)

       

      1.

       

      when the buffer manager is working on the queries I see a lot of statements that seem to deal with storing intermediate results somewhere:

          

      14:35:59,459 DEBUG [org.teiid.BUFFER_MGR] (BufferManager Cleaner) Allocating inode 11002 to 90 30600

      14:35:59,459 DEBUG [org.teiid.BUFFER_MGR] (FileStore Worker0) Allocating storage data block 939524119 of size 16384 to 24007

      14:35:59,459 DEBUG [org.teiid.BUFFER_MGR] (Worker0_QueryProcessorQueue10826) q46uoHEnQCqA.0 Blocking on source request(s).

      14:35:59,459 DEBUG [org.teiid.BUFFER_MGR] (FileStore Worker0) Assigning storage data block 939524119 of size 16384

      14:35:59,459 DEBUG [org.teiid.BUFFER_MGR] (BufferManager Cleaner) 91 30599 writing batch to storage, total writes:  29934

      14:35:59,459 DEBUG [org.teiid.BUFFER_MGR] (FileStore Worker0) freeing inode 4579 for 90 24007

       

      What do these statements mean and what is JDV doing at that point?

       

      2.

       

      if you decide to cancel the query on the client side because it takes too long we see that JDV does not stop logging these buffer statements. We see statements like this:

       

      14:39:32,075 DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue11969) 90 reading batch 23661 from storage, total reads: 17003

      14:39:32,075 DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue11969) Getting object at block 134217741 1 90 23661

      14:39:32,075 DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue11969) Allocating inode 1936 to 90 23661

      14:39:32,075 DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue11969) adding object 88 17830

      14:39:34,133 DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue11969) 90 reading batch 23665 from storage, total reads: 17004

      14:39:34,133 DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue11969) Getting object at block 268435469 1 90 23665

      14:39:34,133 DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue11969) Starting memory buffer cleaner

      14:39:34,133 DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue11969) Allocating inode 2352 to 90 23665

      14:39:34,133 DEBUG [org.teiid.BUFFER_MGR] (FileStore Worker1) Assigning storage data block 13 of size 16384

      14:39:34,133 DEBUG [org.teiid.BUFFER_MGR] (FileStore Worker1) freeing inode 1520 for 90 23657

      14:39:34,133 DEBUG [org.teiid.BUFFER_MGR] (FileStore Worker1) Assigning storage data block 134217741 of size 16384

      14:39:34,133 DEBUG [org.teiid.BUFFER_MGR] (FileStore Worker1) freeing inode 1936 for 90 23661

       

      So very similar to the above. Again: what is JDV doing here and why does it continue to loop over these "batches". I have seen our test and acceptance environment being busy throughout the entire night. Although I understand that buffering is necessary in some situations I fail to understand why it is taking so long and why it doesn't stop looping while no client is connected.

       

      3.

       

      the specific query that we execute does not need to be split up into several queries. I know that it is impossible to instruct JDV to push down the query as is entirely to the source but am keen to know if there are other aspects that may influence the query translator/optimzer to decide when to split up queries. If using a specific translator (so not the simple-jdbc translator) and assuming that the drivers are correct, do factors like adding foreign keys between tables in the VDB models (even if they are not specified on the underlying database) or primary keys also play a role? Can they make the query engine more efficient, rather than just importing the tables/views as they are defined in the underlying database?

        • 1. Re: iNodes and DataBlocks
          shawkins

          > In order to understand the situation better is there someone here that can answer the following questions?

           

          It's probably best to focus on the last question first.  You want to first get the plan to be optimal, then spend more time on the details of processing.  It will makes things somewhat difficult in that you are on an older version.  Issues such as [TEIID-3106] BufferManager Cleaner consuming 90% of total CPU time - JBoss Issue Tracker I think could still apply to 8.4.x.  Is it possible to retry your scenario on a later version, just to make sure that some of the behavior isn't addressed by a later change.

           

          > the specific query that we execute does not need to be split up into several queries.


          Do you have the query plan debug log showing exactly what is inhibiting the pushdown?  Could you provide the full query plan debug log?


          >  If using a specific translator (so not the simple-jdbc translator) and assuming that the drivers are correct, do factors like adding foreign keys between tables in the VDB models (even if they are not specified on the underlying database) or primary keys also play a role?


          They can, but only when there additional restrictions based upon the specific translator, such as salesforce will only allow key based joins for pushdown.  What source are you dealing with?


          > Can they make the query engine more efficient, rather than just importing the tables/views as they are defined in the underlying database?

           

          Beyond specific pushdown considerations, keys are used mostly in costing calculations which guide the selection of join algorithms and several sorting strategies.

           

          > What do these statements mean and what is JDV doing at that point?

           

          The buffer manager cleaner is purging memory entries from the buffer manager.  If they haven't already been persisted, then a file entry is being created for them.

           

          > Although I understand that buffering is necessary in some situations I fail to understand why it is taking so long and why it doesn't stop looping while no client is connected.

           

          Again it could be TEIID-3106 and there is also a faster response to cancellation was addressed with [TEIID-3781] Queries are not killed even if the client sends a cancel request - JBoss Issue Tracker

          • 2. Re: iNodes and DataBlocks
            bpiepers

            Thank you for your quick answer.

             

            I will try to test it with the latest version of JDV, should be possible.

             

            > Do you have the query plan debug log showing exactly what is inhibiting the pushdown?  Could you provide the full query plan debug log?


            Yes, I will post that tomorrow.


            > What source are you dealing with?


            The source is a Netezza appliance so we are using the corresponding translator and driver for that.


            Thanks again for the useful information.





            • 3. Re: iNodes and DataBlocks
              bpiepers

              We have decided to move to the latest version of JDV. Because we are using some custom modules etc. this is not straightforward so in case we have additional information I will come back here.