3 Replies Latest reply on Nov 14, 2014 1:54 AM by swiderski.maciej

    Handling thousands of subprocesses

    qbast

      Hello

       

      I am trying to use jBPM (6.1CR1 with MS SQL Server and JBoss 7.1) to orchestrate TV EPG ingestion. It is modelled as a main process which acquires an XML file, parses the items into a list and then starts a subprocess for each item (using a 'multiple instance' subprocess with a 'reusable' one inside). The subprocess then does the real work for each item: inserting data into the db, uploading thumbnails to another server, retrieving metadata from external services and, in some cases, creating a human task to complete the metadata if something is missing (the whole thing consists of 14 nodes and one embedded reusable subprocess). The problem is that if I try to parse a file that is too big (over ~1000 - 2000 items), jBPM hits a transaction timeout while trying to create all the subprocesses and rolls back.

      So, taking my questions from the top:

      - is jBPM even the right tool for the job? Or should I look into something like an ESB?

      - does my approach of one 'outer' process that spawns a thousand or more subprocesses make sense?

      - should I just try to tweak the process definition a bit to create subprocesses in smaller batches, or make more nodes async?

      - or is it just a configuration issue on the JBoss or MS SQL Server side?

        • 1. Re: Handling thousands of subprocesses
          swiderski.maciej

          Most likely you can solve this by increasing the transaction timeout on the application server, but that usually means you might be locking part of the db for quite some time. So rethinking the process definition design might be the better option. You could consider a more hierarchical definition where each reusable subprocess gets hundreds of items to process and can then decide whether to process them itself or distribute them to other processes. Then you can take advantage of async work so that the processing is done in parallel.

           

          HTH

          • 2. Re: Handling thousands of subprocesses
            qbast

            Increasing the transaction timeout (and disabling JTA) indeed helped, but as you say, locking the db for 15 minutes is not a good idea.

             

            I am not sure I understand your idea for the redesign correctly. By 'more hierarchical' do you mean something like:
            - a topmost process that parses a file, gets the item list, organizes the items into batches (for example of 100) and creates multiple intermediate processes
            - an intermediate process that takes a batch of items and just fires a low-level subprocess for each of them
            - a per-item low-level subprocess

             

            And make the starting nodes in the intermediate and low-level subprocesses async?
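
The batching step of the topmost process described above could be sketched in plain Java roughly as follows. This is only an illustration: `ItemBatcher` and `partition` are hypothetical names, not part of the jBPM API, and the sketch assumes the parsed items are already held in a `java.util.List` that can be split before the multi-instance node iterates over the batches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a jBPM class): splits parsed items into fixed-size batches. */
final class ItemBatcher {

    private ItemBatcher() {
    }

    /** Returns consecutive batches of at most batchSize items, in the original order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the slice so each batch is an independent list rather than a view.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

With a batch size of 100, a file of 2000 items yields 20 batches, so the topmost process would create 20 intermediate instances instead of 2000 per-item ones in a single transaction.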

            • 3. Re: Handling thousands of subprocesses
              swiderski.maciej

              yup, that's exactly what I had in mind.

               

              Make the processes a bit more intelligent and let them decide what to do. For example, if a batch to be processed is bigger than 100 items, split it into an additional set of processes, let's say 10, but if there are only 20 items then simply process them.
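
The split-or-process decision could look roughly like the sketch below. It is plain Java, not jBPM API: `BatchPlanner`, `plan`, `DIRECT_LIMIT` and `FAN_OUT` are hypothetical names, and the thresholds of 100 and 10 are simply the example figures from this reply. A result with a single entry means "process these items here"; several entries mean "hand each one to a child process", which would then apply the same rule again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a jBPM class): decides whether a batch is processed or split. */
final class BatchPlanner {

    /** Batches up to this size are processed directly (example value from the thread). */
    static final int DIRECT_LIMIT = 100;

    /** Maximum number of child batches created by one split (example value from the thread). */
    static final int FAN_OUT = 10;

    private BatchPlanner() {
    }

    /** Returns one batch to process directly, or at most FAN_OUT smaller batches to delegate. */
    static <T> List<List<T>> plan(List<T> batch) {
        List<List<T>> result = new ArrayList<>();
        if (batch.size() <= DIRECT_LIMIT) {
            // Small enough: no further splitting, process the items as they are.
            result.add(new ArrayList<>(batch));
            return result;
        }
        // Ceiling division so that no more than FAN_OUT child batches are produced.
        int chunk = (batch.size() + FAN_OUT - 1) / FAN_OUT;
        for (int from = 0; from < batch.size(); from += chunk) {
            int to = Math.min(from + chunk, batch.size());
            result.add(new ArrayList<>(batch.subList(from, to)));
        }
        return result;
    }
}
```

For 2000 items this gives 10 child batches of 200, each of which would split once more into 10 batches of 20 that are processed directly; a batch of 20 is returned unchanged.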

               

              HTH