4 Replies Latest reply: Nov 18, 2011 6:54 AM by anuj bhatia

    Retry and Timeout settings

    anuj bhatia Newbie

      Hi,

       

      I have a web service request that takes around 10 minutes to complete. This web service is being invoked from an INVOKE activity in a BPEL process running in RiftSaw.

       

      I find that RiftSaw re-invokes the web service every 5 minutes, even though I've set mex.timeout=900000 and added the faultOnFailure extension element:

       

           <ext:failureHandling xmlns:ext="http://ode.apache.org/activityRecovery">

                  <ext:faultOnFailure>true</ext:faultOnFailure>

            </ext:failureHandling>

       

      After around 5 minutes I see a connection reset exception in the JBoss logs from org.apache.cxf.transport.http.HTTPConduit. But shouldn't the process fail at that point instead of retrying the request?

       

      Thanks

      Anuj

        • 1. Re: Retry and Timeout settings
          Marek Baluch Newbie

          Hi Anuj,

           

          I recommend you use an asynchronous scenario for this kind of web service invocation. It's a far better solution than changing the timeout for such long-running services.

           

          Just to be sure: you changed mex.timeout by adding a name.endpoint properties file into the jar right next to the process definition, correct? Thanks

           

          Best regards

          Marek.

          • 2. Re: Retry and Timeout settings
            anuj bhatia Newbie

            Hi Marek,

             

            I agree an asynchronous approach would be better for this scenario and I plan to implement that in the long run. I was wondering if there was a quick fix or whether there was something wrong in my settings.

             

            Also, I've set the mex.timeout property in a file called service-config.endpoint that is placed in the jar right next to the process definition. I'm pretty sure it's taking effect.

             

            On further investigation I found a probable cause of the problem is that the JBoss transaction timeout is set to a value smaller than the web service invoke timeout. This is causing some unexpected behavior in RiftSaw. Here's what I observed:

             

            1. Assume that the web service returns a response after 7 minutes, the transaction timeout is set to 5 minutes, and mex.timeout is set to, say, 15 minutes.

             

            2. The BPEL process invokes the web service.

             

            3. After 5 minutes there's a transaction rollback warning in the JBoss logs: [com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check - atomic action a282a58:a20:4ec5f30d:3db aborting with 1 threads active!

             

            4. After 7 minutes (when the invoked web service returns) there's an error from RiftSaw (I assume because the corresponding transaction has been aborted):

             

            org.hibernate.LazyInitializationException: could not initialize proxy - no Session

                at org.hibernate.proxy.AbstractLazyInitializer.initialize(AbstractLazyInitializer.java:86)

                at org.hibernate.proxy.AbstractLazyInitializer.getImplementation(AbstractLazyInitializer.java:140)

                at org.hibernate.proxy.pojo.javassist.JavassistLazyInitializer.invoke(JavassistLazyInitializer.java:190)

                at org.apache.ode.dao.jpa.bpel.ProcessInstanceDAOImpl_$$_javassist_22.getInstanceId(ProcessInstanceDAOImpl_$$_javassist_22.java)

                at org.apache.ode.bpel.engine.PartnerRoleMessageExchangeImpl.continueAsync(PartnerRoleMessageExchangeImpl.java:136)

                at org.apache.ode.bpel.engine.PartnerRoleMessageExchangeImpl.reply(PartnerRoleMessageExchangeImpl.java:88)

                at org.jboss.soa.bpel.runtime.ws.WebServiceClient$TwoWayCallable$1.call(WebServiceClient.java:298)

                at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:294)

             

            5. After 15 minutes (at end of mex.timeout) the process is marked as failed, with the following messages in the JBoss logs:

             

            [org.apache.ode.bpel.runtime.INVOKE] (ODEServer-3) Failure during invoke: No response received for invoke (mexId=hqejbhcnphr6rgm1w5uw09), forcing it into a failed state.

             

            6. At this point I intermittently find that the web service is re-invoked. I haven't found the exact scenario in which this happens, but I think it's probably because the ode_job table is somehow left in an inconsistent state (see next point).

             

            7. At the end of this process it's not possible to shut down the JBoss server normally using the shutdown.sh command. The last message logged is:

             

            [org.jboss.soa.bpel.runtime.engine.service.BPELEngineService] (JBoss Shutdown Hook) Stopping JBoss BPEL Engine

             

            and it keeps waiting for the BPEL Engine to stop (I think there's some lock that's not released correctly). So I have to terminate the JBoss process using kill -9. I think this sometimes leaves the ode_job table inconsistent, and in the next test run I see the web service being re-invoked even though it shouldn't be, because I've set faultOnFailure to true.
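
The sequence in steps 1 to 5 can be summarised as a small model. The sketch below is only an illustration of the observations in this post, not of RiftSaw or ODE internals: the class InvokeTimeline, its method, and the event wording are all hypothetical, and the intermittent re-invocation from step 6 is deliberately not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the observed ordering of events; not engine code.
final class InvokeTimeline {

    // All arguments are in minutes, measured from the start of the invoke.
    static List<String> events(int txTimeoutMin, int responseMin, int mexTimeoutMin) {
        List<String> out = new ArrayList<>();
        if (responseMin <= txTimeoutMin && responseMin <= mexTimeoutMin) {
            // The response beats both timeouts, so nothing unusual happens.
            out.add(responseMin + " min: response handled normally");
            return out;
        }
        if (mexTimeoutMin <= txTimeoutMin) {
            // mex.timeout expires no later than the transaction timeout.
            out.add(mexTimeoutMin + " min: invoke times out and is marked failed");
            return out;
        }
        // Remaining case: the transaction timeout is the shortest of the three.
        out.add(txTimeoutMin + " min: transaction aborted with the invoke still active");
        if (responseMin < mexTimeoutMin) {
            out.add(responseMin + " min: response arrives but cannot be applied (no session)");
        }
        out.add(mexTimeoutMin + " min: no response recorded, invoke forced into failed state");
        return out;
    }

    public static void main(String[] args) {
        // Values from step 1: 5 min transaction timeout, 7 min response, 15 min mex.timeout.
        for (String event : events(5, 7, 15)) {
            System.out.println(event);
        }
    }
}
```

With the values from step 1 the model lists the transaction abort first, then the lost response, then the forced failure, which is the order seen in the logs above.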

             

             

            I think there should be some check in RiftSaw to detect that mex.timeout is set to a value greater than the JBoss transaction timeout and report it as an error. Also, there definitely seems to be a bug where some locks are not released properly, which prevents a clean JBoss shutdown.
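
A minimal sketch of the kind of check suggested here, in plain Java. It does not use any RiftSaw or ODE API: the class TimeoutCheck and its methods are hypothetical, the endpoint file is treated as an ordinary Java properties file, and the transaction timeout (in seconds) is assumed to be supplied by the caller rather than read from the JBoss configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; not part of RiftSaw, ODE, or JBoss.
final class TimeoutCheck {

    // Reads mex.timeout (milliseconds) from the text of an endpoint properties file.
    // Returns defaultMillis when the key is absent, for example because it is misspelled.
    static long readMexTimeoutMillis(String endpointFileContents, long defaultMillis) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(endpointFileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty("mex.timeout");
        return raw == null ? defaultMillis : Long.parseLong(raw.trim());
    }

    // True when the invoke is allowed to outlive the surrounding transaction.
    static boolean exceedsTransactionTimeout(long mexTimeoutMillis, int txTimeoutSeconds) {
        return mexTimeoutMillis > txTimeoutSeconds * 1000L;
    }

    public static void main(String[] args) {
        // 900000 ms is the value from the original post; 300 s is the 5 minute
        // transaction timeout from step 1. The fallback of 0 is a placeholder.
        long mexTimeout = readMexTimeoutMillis("mex.timeout=900000\n", 0L);
        if (exceedsTransactionTimeout(mexTimeout, 300)) {
            System.out.println("mex.timeout (" + mexTimeout
                    + " ms) is longer than the transaction timeout (300 s)");
        }
    }
}
```

A check like this would also catch a misspelled property name, because the value read from the file would silently fall back to the default.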

             

            I'm testing with JBoss 5.1.0 and RiftSaw 2.3.0.Final and JBoss WS is using CXF 3.4.0.

             

            Do you think it's worth logging a Jira for this or am I missing something?

             

            Thanks

            Anuj

            • 3. Re: Retry and Timeout settings
              Gary Brown Master

              Hi Anuj

               

              Yes, please raise a Jira outlining this scenario and, if possible, attach a simple test case that demonstrates the problem.

               

              Thanks.

               

              Regards

              Gary