It can run as many as your db can store.
thank you for your answer.
a more detailed question:
what if I have 1.000.000 timers (jobs) with 10.000 jobs due each day? Can the JobExecutor handle that? Is there a notable performance impact when it queries the jobs table?
10.000 jobs due each day: that is 10.000/(3.600*8), i.e. about 1 job every 3 seconds on average (assuming an 8-hour day). I do not see why that should not be possible.
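For the curious, the back-of-the-envelope arithmetic above can be sketched as follows (the 8-hour working day is the assumption from the post, not something jBPM imposes):

```java
// Rough job-rate estimate for the JobExecutor question above:
// 10.000 jobs spread over an 8-hour day.
public class JobRate {
    // average number of jobs per second
    static double jobsPerSecond(int jobs, int hours) {
        return jobs / (3600.0 * hours);
    }

    // average seconds between two consecutive jobs
    static double secondsPerJob(int jobs, int hours) {
        return 1.0 / jobsPerSecond(jobs, hours);
    }

    public static void main(String[] args) {
        System.out.printf("%.2f jobs/s, one job every %.1f s%n",
                jobsPerSecond(10_000, 8), secondsPerJob(10_000, 8));
    }
}
```

So on average one job fires roughly every 2.9 seconds, which a single JobExecutor thread should handle easily; the interesting case is when many jobs become due at the same moment.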
The performance impact is related to the number of jobs that become due in that period. If all 10.000 jobs become due at once, yes, that has an impact; otherwise I think it will be limited.
We are currently also playing around with huge data sizes.
The main performance bottleneck in jBPM is the database logging service and the size of the logs table. Executing our test process with about 150 nodes in 10 subprocesses, the engine takes about 15 seconds without logging and about 1 minute with logging activated.
The logs table is also the biggest one. We have somewhat under 1.000.000 processes in our db and the logs table already has more than 4.000.000 entries. Retrieving the logs takes a while then.
thank you for this information.
leads to another question related to log retrieval:
is there a tool out there that displays the process history (i.e. the path the token has taken) for a process instance? I can't find such a thing in jbpm-console (3.2.3).
I know it would not be hard to code that, but it would be nice to use an existing solution.
We had the same issue. We are currently building our own interface for that.
When it comes to logging and history it gets even more tricky - especially when something goes wrong:
1) The process history is really not interesting when the process completed without problems.
2) Things get interesting if an error happened - BUT:
If an error happens within jBPM and a rollback is executed, you do not have the process history log entries. So you just don't really know what the process did exactly.
We are currently working quite hard on these issues. There are a bunch of situations which have to be considered in this area. I am currently working on a solution where the logs are captured using log4j, written directly to a persistent store, and can be analyzed later when you need the logs for a certain process.
The new console will display more about the logging and also has some general reporting available; I suggest taking a look at it. It is in 3.3.1 already.
Regarding the non-writing of errors... that is indeed the case, but since the cause of these rollbacks is (or should be) mostly technical, e.g. a remote system not being available, the log4j logging should contain the info. It is a valid issue though, where there might be alternative solutions. Any hint on how you are trying to solve this?
Do you refer to the new GWT Console? I tested it in 3.3.0 and it was not really finished by then...
I would like to solve two issues: 1) the log service is just too slow - it is the main performance bottleneck of the engine - 2) it is unsafe because it does not log when something goes wrong.
To have really good logs after a crash you must write your stuff (nearly) immediately to a persistent store. Otherwise, when you pull the plug out of your server, you have no idea what you have already done.
Additionally, when you want to know that your database has a problem, you might not be able to log this to the database.
Really good logs must be written nearly immediately to an extremely highly available store (the local file system).
So I try to write the logs and the process history using log4j and its MDC to the file system. From there you can transfer it to the database, or you can directly configure a JDBC appender within log4j to write it to the db. Then you would only need to look at the files in case the db was unavailable.
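Conceptually, log4j's MDC is a per-thread key/value map whose entries a pattern layout can stamp onto every log line. A minimal stdlib sketch of that idea (the class and method names here are illustrative stand-ins, not log4j's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for log4j's MDC: a per-thread context map whose entries
// (e.g. the jBPM process instance id) get stamped onto every log
// line, so a file-based history can later be grouped per process.
public class ProcessLogContext {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static void clear() { CTX.get().clear(); }

    // Roughly what a layout like %X{processInstanceId} would produce.
    static String format(String message) {
        String pid = CTX.get().getOrDefault("processInstanceId", "-");
        return "[pid=" + pid + "] " + message;
    }

    public static void main(String[] args) {
        put("processInstanceId", "4711");
        System.out.println(format("node 'review order' entered"));
        clear();
        System.out.println(format("outside any process"));
    }
}
```

With the real log4j you would call `MDC.put("processInstanceId", ...)` when execution enters a process and `MDC.remove(...)` when it leaves, and let a file appender do the actual writing.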
The main problem is that this does not work without patching jBPM. The LoggingInstance is hardcoded in the ProcessInstance, and the logging service only gets called at the end of execution. This is where I am stuck at the moment. I would need a way to configure the LoggingInstance just like the services.
no, it is still not finished, but improved a lot in 3.3.1
1) is being taken care of in 4.0... logging is changing and will be configurable as well. The thing is that 'governance' sometimes requires a lot of info... so it is a tradeoff...
2) as said before... if there is a technical error by which a transaction is rolled back, nothing has happened in the process. Yes, there is an error, but that should be dealt with on a different level... not the process logging....
In one of the production systems I developed (non-jBPM based, not even workflow) we made the logging async in a transaction to an in-memory JMS queue, which would write the data to the db afterwards. There is still the issue of what to do if the system crashes before all logging has been written to the db... again, it is a tradeoff... Writing it to the local filesystem is *not* high availability: most crashes we had were due to disk crashes. But you can configure the JMS queue to use local file persistence (we tried that and it works, but switched back to keeping it in-memory).
So async logging (JMS) with local file persistence as storage before the log is actually written to the database is imo a good solution. Regarding your main problem... patches are always welcome...
does the gwt-console run under tomcat 5.5?
most crashes we had were due to disk crashes
That's an interesting point - I did not think about it that way...
So logging async (JMS)
How much logging did you write asynchronously? I heard such suggestions earlier, but personally I always thought that's an amazing overhead for writing log statements...
Sure, on one hand it is overhead, but on the other hand you can scale better.
- Your real front-end worker threads return quicker (the delay of writing a small log entry to the JMS queue is much lower than writing it to the db directly)
- So the number of threads used simultaneously will be lower (and thus less memory/threads are used), which leaves room to process more requests
- You can give the MDBs that move the logs from JMS to the database their own thread pool
- You can limit the number of connections the logging component uses towards the database, e.g. by giving it its own connection pool (which lowers the burden on the db).
Once you have it in place and use it (of course, make it a reusable component, DRY), it is easy to do. But don't forget to use the date/time from when you put the entry into JMS if you write a date to the db; don't use the date/time from the moment it is actually written to the db... believe me, otherwise you can get confused when reading the logs.
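The points above can be sketched with a plain in-process queue standing in for JMS and a list standing in for the database table (so the snippet stays self-contained; in a real setup the consumer would be an MDB with its own pools). Note where the timestamp is taken:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the async-logging pattern: producers enqueue cheaply,
// a consumer drains entries into the "database" later. Key detail:
// the timestamp is taken at enqueue time, not at write time.
public class AsyncLogger {
    static final class LogEntry {
        final long enqueuedAt;   // take the time NOW, at enqueue
        final String message;
        LogEntry(String message) {
            this.enqueuedAt = System.currentTimeMillis();
            this.message = message;
        }
    }

    private final BlockingQueue<LogEntry> queue = new LinkedBlockingQueue<>();
    final List<LogEntry> store = new ArrayList<>();  // stands in for the db

    void log(String message) {          // cheap call on the worker thread
        queue.offer(new LogEntry(message));
    }

    void drain() {                      // what the MDB side would do
        LogEntry e;
        while ((e = queue.poll()) != null) {
            store.add(e);               // persist using e.enqueuedAt, not "now"
        }
    }
}
```

The worker thread only pays for `queue.offer(...)`; the slow db write happens on the consumer side, which can have its own thread and connection pool.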
What we did in a project is to exchange the LoggingService and to catch exceptions in the environment. As soon as an exception is thrown (causing the correct rollback), we send an event in its own transaction containing the log data to a special service handling this (via ESB in this case, meaning JMS as well).
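A much-simplified sketch of that idea: a decorating log service that, besides the normal transactional audit log, forwards a copy of each entry to an out-of-band channel committed independently, so the history survives a rollback. The interfaces below are illustrative stand-ins, not jBPM's real LoggingService API:

```java
import java.util.ArrayList;
import java.util.List;

// Decorator pattern: normal logging stays tied to the process
// transaction, while a side channel (ESB/JMS in the post above,
// a plain list here) keeps a copy that a rollback cannot erase.
public class RollbackSafeLogging {
    interface LogService { void log(String entry); }

    static final class TxLogService implements LogService {
        final List<String> table = new ArrayList<>();
        public void log(String entry) { table.add(entry); }
        void rollback() { table.clear(); }   // db rollback wipes the audit log
    }

    static final class ForwardingLogService implements LogService {
        final TxLogService tx;
        final List<String> sideChannel = new ArrayList<>(); // own "transaction"
        ForwardingLogService(TxLogService tx) { this.tx = tx; }
        public void log(String entry) {
            tx.log(entry);            // business audit log, rolled back on error
            sideChannel.add(entry);   // technical trace, survives the rollback
        }
    }
}
```

The point is the separation: the business audit log may legitimately disappear with the rollback, while the technical trace for debugging is committed elsewhere.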
By the way: in my eyes it is still correct of jBPM to roll back the logs as well! That code has never happened, so no process audit logs should be there.
The current logs in jBPM are business audit logs, not meant for technical error solving. This is indeed a bit missing now, but will get better in jBPM 4...
In my opinion it is also correct to roll back process logs, but what if, in one of the rolled-back nodes, jBPM sent an email to an operator or a customer or, in general, triggered some operation in external systems?
We can't say that in this case the code has never happened; in other words, the code has never happened for jBPM, but it did happen for some external systems (or, even worse, people).
There could be situations where synchronization between jBPM and the external world would be lost, so I think that what Camunda did is a must, especially because using an ESB gives you not only asynchronicity but protocol/transport failover as well.
Camunda, could you explain in more detail how you designed that special service that collects log data when an internal jBPM error occurs?
Did you store those logs in the same table as the other logs or in a separate location?