1 2 Previous Next 17 Replies Latest reply on May 31, 2017 1:38 AM by hungrykoala

    VDB metadata problem during DQP request processing

    hungrykoala

      Hello Teiid folks,

       

      Our application has been intermittently showing some very weird behavior, and I am starting this discussion with the hope of getting some hints, further to what we have already managed to analyze.

       

      Unfortunately, we are still not able to reproduce the problem, and it only appeared on one of our environments, which makes us think that it might be related to some race condition.

       

      We have a standalone application, which upon start-up, creates a certain VDB based on certain data sources, and then starts an Embedded Teiid server, that deploys the VDB and exposes a JDBC endpoint for external clients. Perhaps a pretty much standard setup. We use JBoss Teiid 8.10.0.final

       

      The issue we saw happening twice, was that our application's JDBC endpoint became fully unresponsive (even attempts to connect to it failed) because the metadata for the VDB was null:

       

      {code}

      2017-02-14 11:18:12,250 ERROR org.teiid.PROCESSOR: TEIID30019 Unexpected exception for request <request_id>
      org.teiid.core.TeiidComponentException: TEIID30489 Unable to load metadata for VDB <vdb_name>.
      at org.teiid.dqp.internal.process.Request.initMetadata(Request.java:191)
      at org.teiid.dqp.internal.process.Request.processRequest(Request.java:436)
      at org.teiid.dqp.internal.process.RequestWorkItem.processNew(RequestWorkItem.java:627)
      at org.teiid.dqp.internal.process.RequestWorkItem.process(RequestWorkItem.java:326)
      at org.teiid.dqp.internal.process.AbstractWorkItem.run(AbstractWorkItem.java:51)
      at org.teiid.dqp.internal.process.RequestWorkItem.run(RequestWorkItem.java:265)
      at org.teiid.dqp.internal.process.DQPWorkContext.runInContext(DQPWorkContext.java:276)
      at org.teiid.dqp.internal.process.ThreadReuseExecutor$RunnableWrapper.run(ThreadReuseExecutor.java:119)
      at org.teiid.dqp.internal.process.ThreadReuseExecutor$3.run(ThreadReuseExecutor.java:210)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      {code}

       

      This stack-trace got logged on each incoming JDBC request, so all DQP threads could not process the incoming requests, as VDB was corrupted. However, since ThreadReuseExecutor was used, these exceptions were caught by the thread-management layer, logged, and then the threads were ready to be reused. In other words, the exception never made it as far as to exit the thread. On the contrary, threads were alive, but requests that they served just never made it past this initMetadata() step.

       

      Restarting our application fixed the issue.

       

      VDB with null metadata is probably a bad/corrupted state. It should rather mean that this VDB should had never gotten successfully deployed to Teiid in the first place. Indeed, taking a closer look into our logs revealed that few minutes before this happened, there was a problem with the VDB deployment:

       

      {code}

      ....

      at org.teiid.query.metadata.NativeMetadataRepository.getMetadata(NativeMetadataRepository.java:83)

      at org.teiid.query.metadata.NativeMetadataRepository.loadMetadata(NativeMetadataRepository.java:60)

      at org.teiid.query.metadata.ChainingMetadataRepository.loadMetadata(ChainingMetadataRepository.java:55)

      at org.teiid.runtime.EmbeddedServer.loadMetadata(EmbeddedServer.java:762)

      at org.teiid.runtime.AbstractVDBDeployer.loadMetadata(AbstractVDBDeployer.java:168)

      at org.teiid.runtime.EmbeddedServer.deployVDB(EmbeddedServer.java:712)

      ...

      {code}

       

      Our application handles this case: upon hitting a failed VDB deployment, our application exits its main thread. However, we are not sure about DQP's threads: if they are daemon threads, they won't be waited from the JVM to exit. So the question here is: when using Embedded Teiid what is the daemon flag of the threads it starts/controls? Could a JVM exit be delayed due to lingering Teiid threads?

       

       

      The DQP thread management layer leverages ThreadReuseExecutor for managing the request worker threads, and taking a look at some of the JavaDoc there, I get a feeling that certain race conditions are possible in this class.

       

       

      So our main analysis points for the issue's root cause so far are:

      • lingering Teiid threads: the DQP worker pool remained alive, although our application's main thread exited due to the failed VDB deployment. If this was the root cause, how come we cannot reproduce the problem locally? - every time we have a failed VDB deployment locally, the whole JVM exits successfully, and no traces from Teiid are being left whatsoever;
      • some weird race condition in DQP thread management layer (maybe the DQPWorkContext ThreadLocals are not correctly cleaned up between reusing threads?)

       

      What do you think of this situation? Ever hit a similar problem?

       

      All my best,

      Krum.

        • 1. Re: VDB metadata problem during DQP request processing
          shawkins

          > So the question here is: when using Embedded Teiid what is the daemon flag of the threads it starts/controls? Could a JVM exit be delayed due to lingering Teiid threads?

           

          Threads that Teiid creates/managers are marked as daemon - see NamedThreadFactory used for both engine and socket threads.

           

          > Indeed, taking a closer look into our logs revealed that few minutes before this happened, there was a problem with the VDB deployment

           

          Do you have a full stack for this?  Perhaps it would help to address the root issue as well.

           

          Also the thread doing the deployment is not under the control of Teiid, is that your main thread?

           

          > The DQP thread management layer leverages ThreadReuseExecutor for managing the request worker threads, and taking a look at some of the JavaDoc there, I get a feeling that certain race conditions are possible in this class.

           

          The race condition mentioned there will cause more threads than needed to be created - not that any of the threads will live indefinitely.  The last known issue of any kind with this class was addressed in 8.9, so there shouldn't be anything expected that you are hitting.

           

          > lingering Teiid threads

           

          Do have a thread dump that demonstrates this?

           

          > some weird race condition in DQP thread management layer

           

          I don't quite follow how this would prevent your main thread from exiting, can you elaborate?

           

          • 2. Re: VDB metadata problem during DQP request processing
            hungrykoala

            Many thanks for your quick response and answers!

             

            > Do you have a full stack for this?  Perhaps it would help to address the root issue as well.

            > Also the thread doing the deployment is not under the control of Teiid, is that your main thread?

             

            Yes, our main thread is doing the VDB deployment as part of our application's start up.

             

            Full strack trace could not be disclosed I'm afraid, but it's really just hitting IllegalStateException from our VDB deployment code, that gets called from org.teiid.query.metadata.NativeMetadataRepository.getMetadata(NativeMetadataRepository.java:83) - last Teiid line of code shown above. The business logic of our application checks for the availability of the source of its virtual table, discover that it is actually missing, and hence throws an IllegalStateException. The main thread catches it, simply logs it, and re-throws it. Since there are no further exception-processing layers atop of this one, this effectively means that the main thread exits because of this exception.

             

            > Do have a thread dump that demonstrates this?

             

            I am working on getting a thread dump. The problem is that we are doing a post-mortem analysis, so this is difficult-to-impossible.

             

            > I don't quite follow how this would prevent your main thread from exiting, can you elaborate?

             

            I agree that the race condition idea as an explanation for the observed issue, is a short-sighted one.

            • 3. Re: VDB metadata problem during DQP request processing
              hungrykoala

              > I don't quite follow how this would prevent your main thread from exiting, can you elaborate?

               

              I agree that the race condition idea as an explanation for the observed issue, is a short-sighted one.

               

              Apologies, here is actually what I meant in this hypothesis: in it I assume that the VDB deployment was actually successful, hence no exceptions were thrown by the main thread, and the application started just fine. Later on, when JDBC requests were coming to Teiid from external clients, the worker threads failed to process them, because they saw incompletely initialized VDB metadata (I believe it eventually comes from a ThreadLocal from the class DQPWorkContext -> it contains the Session, which on its turn contains the VDB and its attachments (which includes the metadata)). Certain issues of that kind are definitely possible in Java, in case the construction/initialization and publication of a given object is not done in a safe way in a multi-threaded environment - other threads could possibly pick up a partially constructed value, or even a null one.

               

              Do you believe that could possibly be happening, or is it rather too exotic an explanation?

              • 4. Re: VDB metadata problem during DQP request processing
                hungrykoala

                I was able to reproduce the problem. Here is what happens: our application code first creates and starts the embedded Teiid server (at that very moment its worker thread pool is initialized, created, and ready to serve incoming requests), and only then it deploys the main application VDB. The deployment takes few seconds, as it is relatively heavy.

                 

                The exception observed happens within the short time window between the moment when the server is fully created and started, and the moment when the application VDB is deployed. Within that time window, the VDB metadata, is unsurprisingly, null, causing the exception message pasted above.

                 

                One could argue that it should rather be the opposite order: deployment should be done first, and then the server started - or at least the part which instantiates and initializes the worker thread pool need not be enabled during deployment. However, taking a look at the code of the deployVDB method, it first checks if the server is started and fails if that is not the case. So this is probably why our application had to execute start() and deploy() in the very same order.

                 

                Should I report this as a bug/enhancement request?

                • 5. Re: VDB metadata problem during DQP request processing
                  shawkins

                  To make sure you are saying that there is thread responsible for starting Teiid and another thread attempting to do a deployment?

                   

                  Yes, please capture an issue, then we can determine what needs to be done.

                  • 6. Re: VDB metadata problem during DQP request processing
                    hungrykoala

                    No, it is actually the same thread - the main application thread - that executes the following code:

                     

                    void startApplication() {

                        createEmbeddedServer(); // 1

                        deployVDB();                    // 2

                    }

                     

                    createEmbeddedServer() method does the following:

                     

                    {

                    EmbeddedConfiguration ec = createEmbeddedConfiguration();

                    EmbeddedServer teiidServer = createEmbeddedTeiidServer();

                    teiidServer.start(ec);

                    }

                     

                    So, assuming that line 2 takes some time to complete the deployment, the time window between the fully completed line 1 and fully completed line 2 can exhibit the problem described: as the VDB deployment is in progress, its metadata is still null, hence the error observed. This is because after line 1 is executed, Teiid has already started its worker thread pool.

                     

                    With respect to the impact this can have, it is probably low, as the time window duration is supposedly short - after it requests get processed successfully, as VDB metadata is already created.

                    • 7. Re: VDB metadata problem during DQP request processing
                      hungrykoala

                      I have tried to log into JBoss' JIRA ( JBoss Issue Tracker ), but to no success. Hitting the 'Log in' button doesn't do anything (doesn't show any form for entering credentials), hitting 'Sign Up' button redirects me to this forum space where I am already logged. So not sure how to report the problem.

                      • 8. Re: VDB metadata problem during DQP request processing
                        shawkins

                        Login should redirect you to: https://sso.jboss.org/login

                         

                        But if you are already logged in on the forum, you should be logged in for

                        JIRA as well.

                         

                        Can you try directly logging into the sso link above?

                         

                        On Tue, May 16, 2017 at 12:00 PM, hungrykoala <do-not-reply@jboss.com>

                        • 9. Re: VDB metadata problem during DQP request processing
                          hungrykoala

                          Sorry for the delay, the anti-spam protection of the forum did not allow me to post new messages in the forum for a while. It seems it is now working again.

                           

                          So I tried logging through the SSO link, which initially showed me as logged in:

                          LogIn.png

                           

                          but after clicking on the 'JIRA issue tracker' link I got sent to a page where I was not logged in:

                           

                          LogOut.png

                           

                          I will try with a different browser maybe, this one was with FF.

                          • 10. Re: VDB metadata problem during DQP request processing
                            shawkins

                            That is quite odd.  If you still have issues even with another browser, try sending a message with your login problem description to help@jboss.org

                            • 11. Re: VDB metadata problem during DQP request processing
                              hungrykoala

                              Just did that, as IE did not work too.

                               

                              (I also used the same email to report & resolve the "you cannot post messages" problem I had earlier today)

                              • 12. Re: VDB metadata problem during DQP request processing
                                hungrykoala
                                • 13. Re: VDB metadata problem during DQP request processing
                                  shawkins

                                  This behavior looks specific to older versions of Teiid.  Can you try your usage with a later release?

                                  • 14. Re: VDB metadata problem during DQP request processing
                                    hungrykoala

                                    Unfortunately, as per the migration guide Java 1.8 is required for Teiid 9.1+, while our project still has some fixed dependency to Java 1.6 that cannot be updated easily.

                                     

                                    Anyways, I read your comment from the JIRA, thanks for the quick reply.

                                     

                                    Out of curiosity, what is Teiid's behaviour when different clients of the same VDB send SQL requests to it, at different points in time? Assuming that the VDB has undergone multiple redeployments along the way, is Teiid going to remember the VDB version that each client was initially served, and continue to serve these initial at-connect-time versions?

                                    1 2 Previous Next