9 Replies Latest reply on Jan 20, 2015 10:13 AM by shawkins

    Teiid works fine for the first 20 queries and then blocks

    bcgl

      Hello,

       

      I am using Teiid 8.8.0.Final with two RDBMS: MySQL and PostgreSQL.

      The SQL queries (generated by the Ontop query-rewriter) are rather complex and usually involve joins over the tables of the two DBs.

       

      I am using a query mixer to run variants of 10 template queries. High-level queries are run sequentially.

       

      It worked fine for the first two iterations (so around 20 high-level queries), but after some time it blocks, as captured in the log file (see 16:21:25,504).

      After ten minutes I stopped the server; some out-of-memory errors then occurred and the running sub-queries were finally closed.

       

      Do you have any clue about why Teiid blocks and how to prevent it?

       

      Please find attached the log file where I set org.teiid.PROCESSOR to DEBUG.

       

      Best regards,

      Benjamin

        • 1. Re: Teiid works fine for the first 20 queries and then blocks
          rareddy

          The default server comes with only -Xmx1303m of heap, which is too small. Teiid data fetching can use disk buffering; however, if other parts of the VM require more memory, you will see issues like this. Try increasing the heap size in the "<jboss-as>/bin/standalone.conf" file.
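For illustration, a heap increase in standalone.conf might look like the sketch below. The 4g value is an assumption chosen for the example, not a recommendation, and the real file usually carries further options on the same line that should be kept.

```shell
# <jboss-as>/bin/standalone.conf (excerpt, sketch only)
# Raise -Xmx on the existing JAVA_OPTS line and keep any other options already set there.
# 4g is an illustrative value; size it to the memory actually available on the host.
JAVA_OPTS="-Xms1303m -Xmx4g"
```

The server has to be restarted for the new setting to take effect.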

          • 2. Re: Teiid works fine for the first 20 queries and then blocks
            shawkins

            From the log there are a host of related errors:

             

            16:32:50,195 WARNING [org.jboss.netty.channel.socket.nio.AbstractNioSelector] (New I/O worker #5)  Unexpected exception in the selector loop.: java.lang.OutOfMemoryError: GC overhead limit exceeded

             

            As Ramesh says, the GC overhead limit could be hit because the VM heap is too small.  If possible, capture a heap dump on out of memory and see what is taking the most heap space.  You can also adjust other Teiid settings, such as reducing the max active plans, to prevent too much concurrent processing by Teiid.
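One common way to capture such a heap dump on a HotSpot JVM is to add the standard dump-on-OOM flags to the server's JAVA_OPTS. The sketch below assumes the line is appended to standalone.conf after the existing JAVA_OPTS assignment; the dump path is a hypothetical location.

```shell
# Sketch: append to <jboss-as>/bin/standalone.conf, after the existing JAVA_OPTS assignment.
# -XX:+HeapDumpOnOutOfMemoryError writes an .hprof file when an OutOfMemoryError is thrown.
# /tmp/teiid-heap.hprof is a hypothetical path; pick a disk with enough free space.
JAVA_OPTS="${JAVA_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/teiid-heap.hprof"
```

The resulting .hprof file can then be opened in a heap analyzer to see which objects dominate the heap.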

            • 3. Re: Teiid works fine for the first 20 queries and then blocks
              bcgl

              Thank you for your answers!

               

              Surprisingly, the problem does not seem to come from the VM size (I had already set the heap size to 6 GB) but from the limit on active plans, which was… too low. With 5 or 10 active plans, it blocks almost immediately, but with 40 active plans, it works nicely. However, I do not understand yet why.

               

              In terms of memory, it uses ~ 2GB.

               

              Thank you again for your help.

              • 5. Re: Teiid works fine for the first 20 queries and then blocks
                shawkins

                > With 5 or 10 active plans, it blocks almost immediately, but with 40 active plans, it works nicely. However, I do not understand yet why.


                I don't think we have enough information yet to know exactly what you are seeing.  If what you are observing as blocking is an out of memory / GC overhead error, then that would need to be isolated more.  If what you are seeing is just from the logs of the engine reporting that processing is being queued/blocking, but is otherwise completing normally - then that is expected.

                • 6. Re: Re: Teiid works fine for the first 20 queries and then blocks
                  bcgl

                  Thanks for the links.

                   

                  I think I am in the second situation because, when I reduce the number of active plans, it blocks quickly (after 30 s with a maximum of 10 active plans, see the attached log file), well before the allocated memory becomes large. Nevertheless, after looking again at your documentation, I still do not understand why such a blocking should be expected.

                  • 7. Re: Re: Teiid works fine for the first 20 queries and then blocks
                    shawkins

                    > I still do not understand why such a blocking should be expected.

                     

                    I'm still not sure if we're on the same page yet on the term blocking.  The term is somewhat overloaded.  Can you clarify what you are seeing?  The first log showed a critical GC error, and this log only shows the connector level at debug and nothing that seems too meaningful.

                    • 8. Re: Re: Teiid works fine for the first 20 queries and then blocks
                      bcgl

                      In this second experiment, I can see "normal" log output for 30 s, but after 14:18:43,318 no more log output is printed and my client system keeps waiting for an answer to its SQL query. I stopped the experiment after 10 min. This is what I meant by the term "blocking".

                       

                      With more active plans allowed, Teiid is able to process every query in less than 10s.

                       

                      In this experiment, there is no concurrency, just one single client sending one SQL query at a time.

                       

                      I do not know well yet the behavior of Teiid, but at first glance, I would have expected some slowing down or rejection of the SQL query but not the system to block suddenly.

                      • 9. Re: Re: Teiid works fine for the first 20 queries and then blocks
                        shawkins

                        > In this experiment, there is no concurrency, just one single client sending one SQL query at a time.

                         

                        We would need to see a log at full trace/debug and ideally a thread dump from when it seems to hang. 

                         

                        > I do not know well yet the behavior of Teiid, but at first glance, I would have expected some slowing down or rejection of the SQL query but not the system to block suddenly.

                         

                        No, neither a GC overhead error nor a query failing to complete is expected behavior.