16 Replies. Latest reply on Nov 26, 2008 10:24 AM by kukeltje

    handling bulk amounts of processinstances; transaction probl

    macd

      Can anybody give some advice?

      I am trying to figure out if and how jBPM can handle large numbers of process instances at once.
      The scenario is this: a collection of domain objects (possibly 100,000) has to be processed.
      All elements may need the same update, but it may also be possible to deal with each element individually.

      The first approach I tried was to create 100,000 process instances individually. This takes a lot of time: almost 4 hours, which is not acceptable.

      So I tried some different approaches:
      1. use batches: create and signal several instances before closing a jbpmContext
      2. let a process create 10,000 subprocesses (foreachfork)
      3. use foreachfork for each element of the collection
      4. try the last two with an asynchronous process step
      This last method ensured that each fork was scheduled as a job.

      One problem occurred every time: when scaling the collection up from 100 to 10,000 instances, forks or subprocesses, I receive an error message concerning a transaction:
      - transaction not active, cannot open connection
      - transaction closed, cannot commit

      I searched the forums and found that this is probably caused by a transaction timeout, which fortunately can be changed.
      http://wiki.jboss.org/wiki/Wiki.jsp?page=TransactionTimeout

      I don't think this really fixes my problem, because I can't keep increasing the timeout forever. Besides, it puzzles me that even the job executor gets these exceptions.

      So my guess is that I should change the config. I know there are some options left, but I don't know which are wise. Can anybody give me some advice?

      Thanks, Marc

      my current config

      JBOSS 4.2.2 GA
      JBPM 3.2.2
      Oracle 9

      <jbpm-context>
       <service name="persistence">
       <factory>
       <bean class="org.jbpm.persistence.db.DbPersistenceServiceFactory">
       <field name="isTransactionEnabled"><false /></field>
       <field name="isCurrentSessionEnabled"><true /></field>
       </bean>
       </factory>
       </service>
       <service name="tx" factory="org.jbpm.tx.TxServiceFactory" />
       <service name="message" factory="org.jbpm.msg.db.DbMessageServiceFactory" />
       <service name="scheduler" factory="org.jbpm.scheduler.db.DbSchedulerServiceFactory" />
       <!--service name="logging" factory="org.jbpm.logging.db.DbLoggingServiceFactory" /-->
       <service name="authentication" factory="org.jbpm.security.authentication.DefaultAuthenticationServiceFactory" />
      
      ...
      
       <bean name="jbpm.job.executor" class="org.jbpm.job.executor.JobExecutor">
       <field name="jbpmConfiguration"><ref bean="jbpmConfiguration" /></field>
       <field name="name"><string value="JbpmJobExector" /></field>
       <field name="nbrOfThreads"><int value="50" /></field>
       <field name="idleInterval"><int value="5000" /></field>
       <field name="maxIdleInterval"><int value="3600000" /></field> <!-- 1 hour -->
       <field name="historyMaxSize"><int value="20" /></field>
       <field name="maxLockTime"><int value="600000" /></field> <!-- 10 minutes -->
       <field name="lockMonitorInterval"><int value="60000" /></field> <!-- 1 minute -->
       <field name="lockBufferTime"><int value="5000" /></field> <!-- 5 seconds -->
       </bean>
      
      
       </jbpm-context>
      
      <hibernate-configuration>
       <session-factory>
       <property name="hibernate.session_factory_name">JbpmHibernateSessionFactory</property>
       <property name="hibernate.connection.autocommit">false</property>
       <property name="hibernate.jndi.class">org.jnp.interfaces.NamingContextFactory</property>
       <property name="hibernate.dialect">org.hibernate.dialect.Oracle9iDialect</property>
       <property name="hibernate.show_sql">false</property>
       <property name="hibernate.format_sql">false</property>
       <property name="hibernate.use_sql_comments">false</property>
       <property name="hibernate.cache.provider_class">org.hibernate.cache.HashtableCacheProvider</property> <!-- org.hibernate.cache.EhCacheProvider -->
       <property name="hibernate.connection.datasource">java:/JbpmDS</property>
       <property name="hibernate.transaction.factory_class">org.hibernate.transaction.JTATransactionFactory</property>
       <property name="hibernate.transaction.manager_lookup_class">org.hibernate.transaction.JBossTransactionManagerLookup</property>
       <property name="jta.UserTransaction">java:comp/UserTransaction</property>
      ...
      ...
      </hibernate-configuration>
      
      
      




        • 1. Re: handling bulk amounts of processinstances; transaction p
          kukeltje

          You describe your problem, but not what you want to achieve, other than '100,000 domain objects need processing'.

          This is so vague that it is hard to even start thinking about a solution. Maybe 4 hours is not so bad if you can finish the 'process' (whatever it may be) at almost the same time. There are so many options: not using persistence, using business rules instead of workflow/BPM, or just doing it in plain Java. Remember, BPMs are not 42.

          • 2. Re: handling bulk amounts of processinstances; transaction p
            macd

            I need persistable workflows: the domain objects need to be dealt with by several actors, in several steps.
            At this point I haven't created any relevant process steps; I've just persisted the instances.
            But before implementing any business logic I want to figure out if and how I can deal with large numbers.

            • 3. Re: handling bulk amounts of processinstances; transaction p
              kukeltje

              Macd, so you just create 100,000 instances and persist them?
              Even then, more relevant info is needed.
              What database are you using? Any optimisations in it (indexes etc.)? Do you use one transaction for all creations? Do you need to?

              I'd certainly not use a for-each fork mechanism.

              • 4. Re: handling bulk amounts of processinstances; transaction p
                macd

                I'm using an Oracle9 database, set up with the jbpm-3.2.2 database scripts (including indexes).
                I tried several scenarios:

                1) Create, signal and save every instance in its own jbpmContext (transaction).
                This actually works, but takes the full 4 hours.

                2) Create instances in batches (400 at a time) within a single jbpmContext
                and signal instances in batches (100 at a time) within a single jbpmContext.
                This worked for 100, 500 and 1000 instances, but when trying 5000 it broke while signalling:

                2008-01-28 09:31:58,156 WARN [logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.BasicAction_58] - Abort of action id -3f57ce66:556:479d8cb7:86a invoked while multiple threads active within it.
                2008-01-28 09:31:58,156 WARN [logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check - atomic action -3f57ce66:556:479d8cb7:86a aborting with 1 threads active!
                2008-01-28 09:31:58,187 WARN [util.JDBCExceptionReporter] SQL Error: 0, SQLState: null
                2008-01-28 09:31:58,187 ERROR [util.JDBCExceptionReporter] Transaction is not active: tx=TransactionImple < ac, BasicAction: -3f57ce66:556:479d8cb7:86a status: ActionStatus.ABORTING >; - nested throwable: (javax.resource.ResourceException: Transaction is not active: tx=TransactionImple < ac, BasicAction: -3f57ce66:556:479d8cb7:86a status: ActionStatus.ABORTING >)
                2008-01-28 09:31:58,203 ERROR [STDERR] org.hibernate.exception.GenericJDBCException: Cannot open connection


                3) Using an asynchronous forEachFork I can successfully save the main process, but then the job executor takes over.
                a) When the first node after the forEachFork is not asynchronous,
                it appears all childTokens are created in a single transaction.
                b) When the first node after the forEachFork is asynchronous,
                every childToken gets created in its own transaction.

                ad. a)
                This works for 100 to 9000 instances, but with 10,000 instances the test breaks:
                2008-02-01 10:01:46,171 WARN [logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.BasicAction_58] - Abort of action id -3f57ce66:6d1:47a2cd67:1f0d invoked while multiple threads active within it.
                2008-02-01 10:01:46,171 WARN [logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check - atomic action -3f57ce66:6d1:47a2cd67:1f0d aborting with 1 threads active!
                2008-02-01 10:02:13,562 WARN [logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator_2] TwoPhaseCoordinator.beforeCompletion - failed for null
                org.hibernate.SessionException: Session is closed!
                 at org.hibernate.impl.AbstractSessionImpl.errorIfClosed(AbstractSessionImpl.java:49)
                 at org.hibernate.impl.SessionImpl.getJDBCContext(SessionImpl.java:1854)


                ad. b)
                Basically it works, but it takes a lot of time (seconds per job) and it doesn't run flawlessly:
                running 10,000 instances (childTokens) as a single job,
                some jobs get a transaction exception:
                2008-02-05 13:23:19,437 WARN [com.arjuna.ats.arjuna.logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.BasicAction_58] - Abort of action id -3f57fef5:4af:47a851a6:651 invoked while multiple threads active within it.
                2008-02-05 13:23:19,453 WARN [com.arjuna.ats.arjuna.logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check - atomic action -3f57fef5:4af:47a851a6:651 aborting with 1 threads active!
                2008-02-05 13:23:19,500 INFO [org.hibernate.jdbc.ConnectionManager] forcing batcher resource cleanup on transaction completion; forgot to close ScrollableResults/Iterator?
                2008-02-05 13:23:20,281 WARN [org.hibernate.util.JDBCExceptionReporter] SQL Error: 0, SQLState: null
                2008-02-05 13:23:20,281 ERROR [org.hibernate.util.JDBCExceptionReporter] Transaction is not active: tx=TransactionImple < ac, BasicAction: -3f57fef5:4af:47a851a6:651 status: ActionStatus.ABORTED >; - nested throwable: (javax.resource.ResourceException: Transaction is not active: tx=TransactionImple < ac, BasicAction: -3f57fef5:4af:47a851a6:651 status: ActionStatus.ABORTED >)
                2008-02-05 13:23:20,390 INFO [org.hibernate.event.def.DefaultLoadEventListener] Error performing load command
                org.hibernate.exception.GenericJDBCException: Cannot open connection


                I couldn't run this test to the end, because after executing about 1870 jobs I got a java.lang.OutOfMemoryError.



                It seems strange to me that, whatever the scenario, it is always the transaction that gets an error.
                I found http://wiki.jboss.org/wiki/Wiki.jsp?page=TxMultipleThreads
                which addresses the logged causes, but it didn't get me any further:
                I tried doubling the timeout in the 3rd scenario, but it wasn't successful.

                I can imagine that increasing the timeout may be useful when using one big transaction to create all instances.
                But it shouldn't apply to batches or jobs, which are reasonably small and should run in independent transactions.
                So why do they crash?
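
A minimal sketch of the batching idea from scenario 2, written in plain Java without any jBPM calls. The names `BatchRunner`, `partition` and `runInBatches` are hypothetical and only illustrate the shape: the work list is cut into fixed-size chunks and each chunk is handed to one unit of work, which is where a single jbpmContext (and thus one transaction) would be opened and closed. It does not explain or fix the transaction errors above; it only keeps each transaction bounded and lets a failed batch be counted without stopping the rest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: processes a work list in fixed-size batches. */
final class BatchRunner {

    private BatchRunner() {}

    /** Splits items into consecutive sublists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Calls unitOfWork once per batch and returns how many batches failed.
     * A failing batch does not prevent the remaining batches from running.
     */
    static <T> int runInBatches(List<T> items, int batchSize, Consumer<List<T>> unitOfWork) {
        int failed = 0;
        for (List<T> batch : partition(items, batchSize)) {
            try {
                // Assumption: the caller opens one jbpmContext/transaction here,
                // handles every element of the batch, and closes it again.
                unitOfWork.accept(batch);
            } catch (RuntimeException e) {
                failed++;
            }
        }
        return failed;
    }
}
```

With 1000 ids and a batch size of 400, `partition` yields three batches of 400, 400 and 200 elements, so `runInBatches` invokes the unit of work three times.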


                • 5. Re: handling bulk amounts of processinstances; transaction p
                  bhagatkota

                  Marc

                  Did you ever find a solution for this problem? We are having a similar issue.

                  B

                  • 6. Re: handling bulk amounts of processinstances; transaction p
                    kukeltje

                    Similar or identical?

                    • 7. Re: handling bulk amounts of processinstances; transaction p
                      jaydub

                      I have an app which must execute about 200,000 process instances per day inside an EE container. I was unable to use JBPM persistence as it resulted in too many DB deadlocks, especially when trying to delete completed process instances. Run a dump of all the SQL that is executed when you create/signal/manipulate vars/delete a process instance and you will quickly see why this did not work for me. It really was not designed to be used how I had hoped to use it.

                      I replaced the JBPM persistence model with my own simple, flat model, persisting the process instance as an EJB3 @Lob in a single table. This has reduced my DB load to almost nothing and works well. This solution may not meet your requirements, as it does not really handle concurrent updates to a single process instance, but then again, I did not have that requirement.

                      • 8. Re: handling bulk amounts of processinstances; transaction p
                        kukeltje

                        But do you still do logging in the DB? Turning that off makes a big difference... I'm curious, by the way, how you did this. Could you elaborate a little?

                        • 9. Re: handling bulk amounts of processinstances; transaction p
                          bhagatkota

                          The problem we are trying to solve is practically identical. We want to create around 10,000 process instances at once. The user can then update the processes individually or in bulk. Creating and saving one process instance at a time takes a lot of time.

                          • 10. Re: handling bulk amounts of processinstances; transaction p
                            jaydub

                            I do not have logging turned on (neither now, nor when I tried to get it to work with the standard jBPM persistence model).

                            I have a simple EJB3 entity which contains the following fields:

                            @Id
                            int thisFieldIsAnIdFromMyApp;

                            other fields ...

                            @Lob
                            ProcessInstance processInstance;

                            The entity manager handles serializing/deserializing the process instance into a BLOB in my DB.

                            I simply load this entity, get the process instance, do what I need to do with it, then merge() or remove() the entity via the entity manager, as needed.

                            Yes, it is a bit ham-fisted, but it is very fast. As noted in a different thread, the only problem I have is that I must create process instances from XML, not from the jBPM DB process definitions (due to Hibernate lazy loading).
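
A rough sketch of the flat "one row, one BLOB" idea described above, using only standard Java serialization. This is not the poster's code: `DemoInstance` and `BlobStore` are hypothetical names, the `HashMap` is an in-memory stand-in for the single table, and no EJB3 or jBPM classes are involved. It only shows the load, modify, save-or-remove cycle on a serialized object keyed by an application id.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a process instance; any Serializable object works. */
final class DemoInstance implements Serializable {
    private static final long serialVersionUID = 1L;
    String currentNode;

    DemoInstance(String currentNode) {
        this.currentNode = currentNode;
    }
}

/** In-memory stand-in for the single table: application id -> serialized BLOB. */
final class BlobStore {
    private final Map<Integer, byte[]> rows = new HashMap<>();

    /** Serializes the instance and overwrites the row for this id. */
    void save(int id, Serializable instance) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(instance);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        rows.put(id, bytes.toByteArray());
    }

    /** Deserializes the row for this id, or returns null if there is none. */
    Object load(int id) {
        byte[] blob = rows.get(id);
        if (blob == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Deletes the row, e.g. once the process instance has ended. */
    void remove(int id) {
        rows.remove(id);
    }
}
```

Because the whole object graph is written back on every save, two concurrent writers to the same id would overwrite each other, which matches the caveat about concurrent updates given earlier in the thread.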

                            • 11. Re: handling bulk amounts of processinstances; transaction p
                              kukeltje

                              So you wrote your own persistence service? Nice, maybe something to write about on the wiki...

                              But if I'm correct, you do not have things like task lists, timers, etc.

                              • 12. Re: handling bulk amounts of processinstances; transaction p
                                bhagatkota

                                But how did you persist multiple instances? If you persisted the process instances in your application tables, how do you manage the tasks associated with the process instances?

                                • 13. Re: handling bulk amounts of processinstances; transaction p
                                  jaydub

                                  I guess you could call it a persistence service of sorts, although the persistence is done outside of the JBPM realm.

                                  Yes, I do lose a bunch of functionality such as timers etc. Those types of things would need to be built into my own application if I needed them. The standard jBPM persistence model worked great for me as a proof of concept... I just had problems when I tried to ramp up the throughput, and that is what forced me to use this model.

                                  • 14. Re: handling bulk amounts of processinstances; transaction p

                                    jaydub,

                                    Can you say a little more about how/where you plugged in your own persistence model?

                                    Do you have multiple tokens, i.e., do you do any forking into nodes that are either explicitly "async='true'", or that asynchronously block, waiting for an external event to signal?

                                    Thanks,
                                    -Ed Staub
