3 Replies Latest reply on Nov 5, 2015 3:32 PM by shawkins

    BufferManagerImpl getting into inconsistent state and causing eventual OOM with Teiid

    nickwross

      Hi


      We have been experiencing some sporadic OOM issues with Teiid and have hit a brick wall with our investigations. The BufferManagerImpl seems to get into an inconsistent state and stops evicting data, eventually leading to OOM.


      Monitoring via a debugger and some additional logging shows that memoryEntries and the eviction queues normally behave as expected (growing, then shrinking back down over time). However, during occasional periods of heavy load the BufferManagerImpl gets into an inconsistent state. Once in this state, evictions stop and BufferManagerImpl#memoryEntries is never cleared, even once all querying has completed.

       

      Some observations once the inconsistent state is reached:

      • This state is reached before the heap is put under pressure
      • The inconsistent state is never resolved, even when querying has ceased.
      • BufferManagerImpl#memoryEntries does not clear out, refs are strongly reachable in heap dump
      • The BufferManagerImpl#Cleaner timer is still running
      • Both the BufferManagerImpl#evictionQueue and BufferManagerImpl#initialEvictionQueue collections are empty
      • BufferManagerImpl#evictionQueue.getSize() and BufferManagerImpl#initialEvictionQueue.getSize() report negative sizes (LrfuEvictionQueue)

       

      Are there any known issues in this area?
      Any suggestions for further troubleshooting?

      This is of course hard to repeat so raising a Jira with a reproducible use case may not be feasible.

      I can provide further info if required, heap dump screenshots etc...

       

      Config: v8.10.0, 8 GB heap, no disk cache, 10 query plans, default memory settings.


      Thanks,

      -Nick

        • 1. Re: BufferManagerImpl getting into inconsistent state and causing eventual OOM with Teiid
          shawkins

          > Are there any known issues in this area?


          Nothing that matches this description.  There were some configuration and other changes in later releases, such that it would be helpful to try this scenario on 8.12.1 or 8.11.5 just to get a more recent baseline.


          > Any suggestions for further troubleshooting?


          Several things seem very wrong. The negative queue size is particularly odd. If you don't see the behavior in a debugger, then something like Byteman or just creating a test patch may be necessary to understand things further. Trapping when memory entries exist that are not in an eviction queue, or when a negative size occurs, could help. If it's possible to provide at least the query plans / sample client, then we can work at reproducing this as well.
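
          As a rough illustration of the two conditions worth trapping, here is a minimal sketch of a consistency check. It is not Teiid code: the class EvictionInvariantCheck, the method findViolations and its parameters are all hypothetical, and it assumes a test patch or Byteman rule can hand it a snapshot of the ids in memoryEntries, the ids in each eviction queue, and the values returned by getSize().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Teiid: checks a snapshot for the two suspect conditions. */
public class EvictionInvariantCheck {

    /**
     * Returns one message per violation; an empty list means the snapshot is consistent.
     * All parameters are assumed to be copied from the buffer manager by the caller.
     */
    public static <K> List<String> findViolations(
            Collection<K> memoryEntryIds,
            Collection<K> evictionQueueIds,
            Collection<K> initialEvictionQueueIds,
            int reportedQueueSize,
            int reportedInitialQueueSize) {
        List<String> violations = new ArrayList<>();

        // Condition 1: a queue reports a negative size.
        if (reportedQueueSize < 0) {
            violations.add("evictionQueue reports negative size: " + reportedQueueSize);
        }
        if (reportedInitialQueueSize < 0) {
            violations.add("initialEvictionQueue reports negative size: " + reportedInitialQueueSize);
        }

        // Condition 2: a memory entry is not present in either eviction queue.
        Set<K> queued = new HashSet<>(evictionQueueIds);
        queued.addAll(initialEvictionQueueIds);
        for (K id : memoryEntryIds) {
            if (!queued.contains(id)) {
                violations.add("memory entry not in any eviction queue: " + id);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Simulated snapshot of the reported state: entries remain, queues are empty, sizes are negative.
        List<String> violations = findViolations(
                Arrays.asList(1L, 2L),
                Collections.<Long>emptyList(),
                Collections.<Long>emptyList(),
                -3,
                -1);
        for (String v : violations) {
            System.out.println(v);
        }
    }
}
```

          A snapshot taken while queries are running may show entries that are briefly between queues, so a check like this is probably most reliable once the system is idle, or when it logs the first violation together with a stack trace rather than failing outright.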

          • 2. Re: BufferManagerImpl getting into inconsistent state and causing eventual OOM with Teiid
            nickwross

            Thanks for the quick reply. We plan to test with later versions and will follow-up when we have some results.


            One question to help with our debugging: I think you suggested this, but can you confirm that there should never be entries in the memoryEntries collection that are not also present in at least one of the eviction queues?

             

            Thanks,

            -Nick

            • 3. Re: BufferManagerImpl getting into inconsistent state and causing eventual OOM with Teiid
              shawkins

              > One question to help with our debugging, I think you suggested this but can you just confirm that there should never be entries in the memoryEntries collection that are not also present in at least one of the eviction queues?

               

              Yes, that is correct.