0 Replies Latest reply on Feb 16, 2012 11:29 AM by d_edery

    Old JBPM (ver 3.2.7) non-logical exception "org.jbpm.JbpmException: transition 'no-work-done' does not exist on Node(preScan)"

    d_edery

      Hi

      We have been using JBPM 3.2.7 in our large-scale application for a long time now.

      Recently, due to race conditions with the DB commit, we decided to move from persistent mode (in which all the process definitions, process instances, etc. were stored in the DB) to a non-persistent mode that works as follows:

      1. We read all the .xml files containing process definitions.
      2. For each .xml file we create a process definition and store it in a static map.
      3. When the user wants to activate a flow (usually the same flow on multiple targets), we create a new ProcessInstance per target, all based on the same instance of ProcessDefinition (this sharing might explain the phenomenon, but we couldn't see how from the code).
      4. Each thread in a dedicated thread pool takes one ProcessInstance and activates it (via .signal).
      5. Each action calls .signal (with or without a transition name) at the end of its business logic, thus creating a synchronized workflow.
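
      The flow in steps 1-4 can be sketched roughly as follows. This is only an illustration: Definition and Instance are hypothetical stand-ins for ProcessDefinition and ProcessInstance, the pool size of 4 is arbitrary, and the real parsing and graph execution are replaced by placeholders. The point it shows is that every instance holds a reference to the same definition object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class NonPersistentFlowSketch {

    /** Hypothetical stand-in for a parsed process definition; one object per .xml file. */
    static final class Definition {
        final String name;
        Definition(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a process instance; one object per target. */
    static final class Instance {
        final Definition definition; // shared reference, not a copy
        final String target;
        volatile boolean signalled;
        Instance(Definition definition, String target) {
            this.definition = definition;
            this.target = target;
        }
        void signal() { signalled = true; } // placeholder: the real call walks the graph
    }

    // Steps 1-2: filled once at server load, only read afterwards.
    static final Map<String, Definition> DEFINITIONS = new ConcurrentHashMap<>();

    static void load(List<String> definitionNames) {
        for (String name : definitionNames) {
            DEFINITIONS.put(name, new Definition(name)); // placeholder for parsing the .xml
        }
    }

    // Steps 3-4: one instance per target, all pointing at the same definition,
    // each one signalled by a thread from a dedicated pool.
    static List<Instance> start(String definitionName, List<String> targets) {
        Definition shared = DEFINITIONS.get(definitionName);
        if (shared == null) {
            throw new IllegalArgumentException("unknown definition: " + definitionName);
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Instance> instances = new ArrayList<>();
        try {
            for (String target : targets) {
                Instance instance = new Instance(shared, target);
                instances.add(instance);
                pool.execute(instance::signal);
            }
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("instances did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow();
        }
        return instances;
    }
}
```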

       

      Recently we encountered a problem (it occurred only once) that caused some of the targets (ProcessInstances) to fail.

      The exception was (full stack-trace attached to the discussion, highlighted lines describe the flow):

       

      org.jbpm.JbpmException: transition 'no-work-done' does not exist on Node(preScan)
          at org.jbpm.graph.exe.Token.signal(Token.java:170)
          at com.yyyy.nms.massconfig.swdownload.PreScanAction.run(PreScanAction.java:42)
          at com.yyyy.nms.massconfig.common.BaseAction.execute(BaseAction.java:71)
          at org.jbpm.graph.def.Action.execute(Action.java:137)
          at org.jbpm.graph.def.GraphElement.executeAction(GraphElement.java:280)
          at org.jbpm.graph.def.Node.execute(Node.java:395)
          at org.jbpm.graph.def.Node.enter(Node.java:375)
          at org.jbpm.graph.def.Transition.take(Transition.java:151)
          at org.jbpm.graph.def.Node.leave(Node.java:453)
          at org.jbpm.graph.exe.Token.signal(Token.java:214)
          at org.jbpm.graph.exe.Token.signal(Token.java:143)
          at com.yyyy.nms.massconfig.common.FirstWorkflowAction.run(FirstWorkflowAction.java:94)
          at com.yyyy.nms.massconfig.common.BaseAction.execute(BaseAction.java:71)
          at org.jbpm.graph.def.Action.execute(Action.java:137)
          at org.jbpm.graph.def.GraphElement.executeAction(GraphElement.java:280)
          at org.jbpm.graph.def.Node.execute(Node.java:395)
          at org.jbpm.graph.def.Node.enter(Node.java:375)
          at org.jbpm.graph.def.Transition.take(Transition.java:151)
          at org.jbpm.graph.def.Node.leave(Node.java:453)
          at org.jbpm.graph.node.StartState.leave(StartState.java:78)
          at org.jbpm.graph.exe.Token.signal(Token.java:214)
          at org.jbpm.graph.exe.Token.signal(Token.java:143)
          at org.jbpm.graph.exe.ProcessInstance.signal(ProcessInstance.java:287)
          at com.yyyy.nms.jbpmworkflow.JbpmWorkflowInvoker.startNonPersistentWorkflow(JbpmWorkflowInvoker.java:271)
          at com.yyyy.nms.jbpmworkflow.JbpmWorkflowInvoker.startWorkflow(JbpmWorkflowInvoker.java:350)
          at com.yyyy.nms.jbpmworkflow.StartWorkflowMDB.signalWorkflow(StartWorkflowMDB.java:101)
          at com.yyyy.nms.jbpmworkflow.StartWorkflowMDB.handleMessage(StartWorkflowMDB.java:85)

       

      The .xml file from which the related ProcessDefinition was read looks like this (relevant part only):

       

      <exception-handler>
           <action class="com.yyyy.nms.massconfig.common.ExceptionAction" />
      </exception-handler>
      
      <start-state>
           <transition to="check-suspend-or-abort"></transition>
      </start-state>
      
      <node name="check-suspend-or-abort">
           <action class="com.yyyy.nms.massconfig.common.FirstWorkflowAction" />
           <transition to="preScan"/>
      
           <transition to="error" name="error"/>
           <transition to="suspended" name="suspended"/>
           <transition to="aborted" name="aborted"/>
      </node>
      
      
      <node name="preScan" description="Pre-upgrade Scan" shouldField="preScan">
            <action class="com.yyyy.nms.massconfig.swdownload.PreScanAction" />
      
           <event type="node-enter">
                <action class="com.yyyy.nms.massconfig.common.StepStartAction" /> 
           </event>
      
           <transition to="acquireFtpForUpload"> 
                <action class="com.yyyy.nms.massconfig.common.StepDoneAction" />
           </transition>
      
           <transition to="acquireFtpForUpload" name="no-work-done"> 
                <action class="com.yyyy.nms.massconfig.common.StepSkippedAction" /> 
           </transition>
      
           <transition to="error" name="error"/>
           <transition to="suspended" name="suspended"/>
           <transition to="aborted" name="aborted"/>
      </node>
      

       

      As you can see, the "preScan" node definitely has a "no-work-done" leaving transition.

       

      The scenario that produced this exception contained 40 process instances (= 40 different targets) based on the same ProcessDefinition instance. Of these, 7 targets failed with this exception; the rest (which went through the same nodes and transitions) completed successfully, some before and some after the 7 failures.

       

      The only clue to a possible problem is that the JBPM exception for all 7 targets was thrown at exactly the same time (the same millisecond, even). In other words, all 7 tried to perform .signal("no-work-done") in PreScanAction at exactly the same time.

      We recreated the same scenario in our development environment (using a CountDownLatch just before the .signal("no-work-done") call) with 10 targets that wait together and then all call .signal("no-work-done") in the same millisecond. The failure was not reproduced.
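
      For reference, a latch-based reproduction attempt of this kind can be sketched as below. It is a minimal harness built only on java.util.concurrent, not the code we actually ran: the class and method names are hypothetical, and the task passed in stands for whatever ends in the .signal("no-work-done") call. It uses one pool thread per task so that every worker can reach the gate before it opens.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchHarness {

    /** Runs the task on the given number of threads, released together; returns how many threw. */
    static int runTogether(int threads, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads); // every worker has arrived
        CountDownLatch start = new CountDownLatch(1);       // single gate opened below
        CountDownLatch done = new CountDownLatch(threads);  // every worker has finished
        AtomicInteger failures = new AtomicInteger();
        try {
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    try {
                        ready.countDown();
                        start.await();
                        task.run(); // in the real test: the action that signals the token
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        failures.incrementAndGet();
                    } catch (RuntimeException e) {
                        failures.incrementAndGet(); // count it, keep the other workers going
                    } finally {
                        done.countDown();
                    }
                });
            }
            ready.await();     // wait until all workers are parked at the gate
            start.countDown(); // release them at once
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            return failures.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        int failed = runTogether(10, calls::incrementAndGet);
        System.out.println(calls.get() + " calls, " + failed + " failures");
    }
}
```

      One caveat, stated as an assumption rather than a finding: if the failure depends on some state of the shared definition that is set up on first use, a harness like this would only have a chance of hitting it when each attempt starts from a freshly parsed ProcessDefinition.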

       

      Our question is: have you encountered such a scenario? If so, was it fixed in newer versions? Can we work around it? (I thought about creating a new ProcessDefinition per ProcessInstance. This would cost a lot in terms of memory consumption and CPU load, and since I couldn't reproduce the problem, I'm not sure it would solve it.)
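
      The "new ProcessDefinition per ProcessInstance" idea could look roughly like the sketch below: keep only the XML text in the shared map and parse a fresh definition on every request. The class is hypothetical and generic over the definition type D; in jBPM 3 the parser function would presumably be one of the static parseXml... methods of ProcessDefinition. Whether this actually avoids the exception is untested.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Caches process XML by name and builds a separate definition object per request. */
final class PerInstanceDefinitions<D> {

    private final Map<String, String> xmlByName = new ConcurrentHashMap<>();
    private final Function<String, D> parser; // assumption: turns XML text into a definition

    PerInstanceDefinitions(Function<String, D> parser) {
        this.parser = parser;
    }

    /** Called once per .xml file at server load. */
    void register(String name, String xml) {
        xmlByName.put(name, xml);
    }

    /** Parses the cached XML again, so no two callers share a definition object. */
    D newDefinition(String name) {
        String xml = xmlByName.get(name);
        if (xml == null) {
            throw new IllegalArgumentException("unknown definition: " + name);
        }
        return parser.apply(xml);
    }
}
```

      The cost is one parse per instance, which is the memory and CPU overhead mentioned above.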

       

      One more detail: we create the ProcessDefinitions map during server load, and we don't change the definitions after the map is created.

      As you can see in the attached screenshot (from LogMX), this is what it looks like: 7 different threads throw the same exception at the same time.

      transitionProblem.GIF

      Thank you for your time.

      Sincerely

      David Edery.