5 Replies Latest reply on Oct 12, 2017 1:28 PM by shawkins

    Connection Reset.. sometimes..

    woodsr

      I am executing an ETL job through a JDBC teiid driver sitting on a linux box that is hitting an Oracle 11 DB. I can successfully run specific numbers of 'jobs' with no errors and the jobs function exactly as designed.  We run 1 update statement, works fine.  We run 1,000, works fine.  We run 10,000, it gives this error in our ETL tool after 3964 records update in the database. 

      SQLException org.teiid.net.socket.SingleInstanceCommunicationException: Connection reset

      I then tried several other number of records and got other odd results.  I tried 100,000 records and it has the same behavior and fails this time at 4048 records. 

      What we can’t seem to figure out though is why that number is consistent.  What we found was that specific query of 10,000 records gave that specific record result.  A slightly different set of 10,000 records would give a slightly different failure point. All of the failure points where +/- 100 records of 4000. 

      The other sidewinder is that sometimes it works.  I can run the same 10,000 record job 5 times and two times it works, three times it fails on the exact number of records, which in this example is 3964. 

      After lots of google research and bugging different people I decided to start posting this problem on a few sites to see if anyone could recommend some troubleshooting ideas as we are officially out of them.

       

      What we have tried and proved to not impact.

      Nothing is unique about the data or the query that is created.  We can take those queries and run them directly in oracle and they work fine without exception. 

      The ETL tool has been ruled out as it works for 1,000’s of other jobs without having this issue. 

      Oracle and linux settings have been attempted such as changing the SQLNET.EXPIREYTIME to various settings and using urandom vs. random in the java settings and those seem to change the frequency of the job passing or failing, instead of every other time it started failing ever 3 or 4 times, but the record count at which it would fail would not change.  If it failed, it failed at the exact same record count.

      Environment:

      Teiid Driver 8.4 (also tried 9.1, same result)

      Oracle 11.2.0.4.0

      Linux 6.9 64bit

        • 1. Re: Connection Reset.. sometimes..
          shawkins

          Since it seems somewhat consistent what information is in the server logs related to this request?  And what version of the Teiid server are you using?

          • 2. Re: Connection Reset.. sometimes..
            woodsr

            That is the interesting thing about this.  First let me explain how our job is running, we are sending individual update statements through the JDBC connection so we can see each transaction if it hits the server before it hits oracle.   What is super weird is that in this case we only see the 3964 transactions on the runs it fails and there is no logging.  From the Teiid server its basically like the job just finishes.  Each transaction is a separate connection so its not like there is a job complete log, it just stops running the transactions pass or fail.   So the 'failure' is only shown in our ETL tool.

             

            We also did some trace logging in the oracle side and did not really get much from that, which was one of our original theories as the function that was sending the request to reset the connection, maybe because it thought it was idle (per our dba), which we were expecting considering that we changed settings on the oracle server did change the frequency of the failure, however nothing.

             

            The other theory we had was the dev/random vs. dev/urandom settings in java.security file but the test results never pained out, again only changed the frequency at which the error occurs in our job, never solving the problem just masking it.

            • 3. Re: Connection Reset.. sometimes..
              shawkins

              So each transaction from the ETL tool perspective is a unit of work that obtains a Teiid connection, does something that results in an update, and closes the connection (or is it a pooled connection)?  And after ~4000 the ETL tool reports a failure, but there is no failure reported in the server logs.  Is that correct?

               

              The next thing to do would be to increase the logging level for org.teiid.TRANSPORT and org.teiid.COMMAND_LOG to a debug level - see Logging Configuration - Latest WildFly Documentation - Project Documentation Editor

              • 4. Re: Connection Reset.. sometimes..
                woodsr

                Yes that is correct.  Another piece of information is we have our Teiid server duplicated with a load balancing tool that determines which machine to hit for our jobs called F5.   We have attempted to rule that out by turning off one of the load balance boxes so its only hitting one and it has not seemed to have an impact in our testing results, but there is still a hostname swap that happens between the ETL JDBC connection string and the Teiid server.

                 

                Thanks for being so responsive, really appreciate it.

                • 5. Re: Connection Reset.. sometimes..
                  shawkins

                  Having the debug level log will help determine if there was any action on the server side as to why the connection were terminated.

                   

                  If you by-pass the load balancer altogether do you still hit the issue?