8 Replies Latest reply on Jan 28, 2014 8:53 AM by rareddy

    Adding performance instrumentation

    markaddleman

      I've got some extra time on my hands lately and I thought I'd contribute some performance instrumentation.

       

      Background

      Lately, our team has been hit by a few different performance problems that aren't directly attributed to Teiid but our lack of visibility into how the engine and translators process queries makes getting to the root of the problem more difficult than it needs to be.  Fundamentally, we needed to surface a few key pieces of information and correlate those with user activity:

      1. The query plan
      2. The translators involved in the plan
      3. The work that the translators are doing:
        1. CPU & latency of ExecutionFactory.execute() and ExecutionFactory.getConnection()
        2. CPU & latency of Execution.execution(), cancel() and close()
        3. Cumulative CPU & latency of ResultSetExecution.next()
        4. Total number of rows returned from ResultSetExecution.next()

       

      Implementation Idea

      Basically, I see two parts to this.  More convenient plan logging and engine performance logging.

       

      To improve plan logging, I see adding a new log option that, when enabled, will log each unique, normalized user query and its plan.  I was thinking of using a bloom filter to maintain the set of normalized user queries to keep the memory requirements low.  If the query doesn't already exist in the set, log the plan and the user query.  If I can find the right point in the code where the normalized query is already represented as a string, the CPU hit should be pretty small.

       

      Logging the engine performance is a bit more involved.  Inside the engine, instrument DataTierManager.registerRequest() to log the user, user query and request id.  In order to capture translator performance, it looks like the leverage point is ConnectorWorkItem, mostly in the execute() and handleBatch() methods.  From these two, log the request id, the part identifier, the CPU and latency consumed by the translator.

       

      Obviously, the missing piece is the work that Teiid does to fulfill the user query.  I haven't found a good leverage point for that but I'll keep looking (hints gladly accepted!).  Fortunately, Teiid itself hasn't been the source of performance problems so I consider it a lower priority.

       

      What do people think?  Am I on the right track or do you have suggestions or alternatives?

        • 1. Re: Adding performance instrumentation
          rareddy

          I am sure you looked at COMMAND_LOG context right? It already gives context of the query at user level and translator level with respective times. For motivated users, it also provides the CommandContext object that you can grab inside a Log4J logging appender to log more relevant information to the user.

           

          If you are looking anything more than that, we can certainly accept your observations.

           

          Ramesh..

          • 2. Re: Adding performance instrumentation
            markaddleman

            Embarrassingly, no, I haven't looked at the COMMAND_LOG.  I'll do that before going forward.  Thanks, Ramesh

            • 3. Re: Adding performance instrumentation
              markaddleman

              Thanks for the pointer, Ramesh.  COMMAND_LOG has everything I wanted except for the CPU time accumulated by the execution.  I took a stab at adding that and threading it through the CommandLogMessage.  Attached is the patch.  Please ignore the changes to the pom file, the changes were just to make Eclipse happy.  I'd like to see this added to Teiid and I'm willing to do the work but I'm unsure what the next steps are.

              • 4. Re: Adding performance instrumentation
                rareddy

                Mark,

                 

                There are start and end times for each operation, if you can collate the start and end messages of the user and data then you can get the CPU times for each.

                 

                As per taking the patches, I am not sure if you already signed the CLA but you need to sign Contributor License Agreements then fork the teiid github teiid/teiid · GitHub repo into your own account. Then make the necessary changes in your fork, make sure all tests run correctly etc, then commit changes to your fork. Then with a given JIRA issue number in the commit message, send us pull request from github. We will then verify and accept the changes. BTW, we commiters also follow the same procedure as there is no direct commits allowed.

                 

                Thanks.

                 

                Ramesh..

                • 5. Re: Adding performance instrumentation
                  markaddleman

                  > There are start and end times for each operation, if you can collate the start and end messages of the user and data then you can get the CPU times for each

                   

                  This provides wall clock time but not CPU.  Many of our translators perform IO operations and, among other reasons, we want to be able to disentangle latency due to CPU versus IO operations.  Additionally, we run Teiid in a mainframe environment where CPU cycles are precious.  Calling out CPU specifically allows us to better hone our tuning efforts.

                   

                  re CLA, github and JIRA-

                  Thanks.  I'll follow this procedure.

                  • 6. Re: Adding performance instrumentation
                    rareddy

                    Ah!, yes that would be a good addition.

                    • 7. Re: Adding performance instrumentation
                      markaddleman

                      I've finished the translator side - it was very straightforward.  I could use some guidance on the user query side.  I want to capture the CPU time that Teiid performs to realize the result set - essentially any work that wasn't pushed down to the translators.  Can you point me to a place in the code that manages this process?  Is the unpushed query engine work also multithreaded?

                      • 8. Re: Adding performance instrumentation
                        rareddy

                        RequestWorkItem does all the thread work, so that is only common place.