19 Replies. Latest reply on Apr 14, 2008 3:18 AM by tom.baeyens
      • 15. Re: log infrastructure proposal

         

        "tom.baeyens@jboss.com" wrote:
        Miguel,

        AFAICT now, we both want a BI (star schema) database for the history queries. On top of that, you want another schema or table (called Journal) to run queries on runtime executions that are not finished yet.


        We have currently implemented the journal as another database schema rather than a single table. The journal data is the data representing the workflow execution, and it is queryable from an API.

        "tom.baeyens@jboss.com" wrote:

        First I would like to ask you to clearly sketch your requirements: which databases or schemas you see, what kind of data they contain, and what kind of queries will be issued against those dbs.


        We see the same object model for both journal and history data. The same or different database schemas could be used for both. Moreover, we see journal and history as pluggable, so one default implementation could be a database (different dbs), but other implementations are also possible and can even be chained.

        As an example, we could have a default journal implementation storing running data in a database (this can be queried), plus another implementation that just logs to a single table or just prints to the command line, plus a BAM-like implementation sending JMS messages that will be handled afterwards by a BAM vendor (e.g. SeeWhy).
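
A minimal sketch of the chained journal idea described above. All names here (`ExecutionEvent`, `Journal`, `InMemoryJournal`, `ConsoleJournal`, `ChainedJournal`) are hypothetical and are not taken from Bonita or the PVM; the in-memory journal is only a stand-in for a database-backed one, and the JMS variant is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event type; the real execution data model is not shown in this thread. */
final class ExecutionEvent {
    final String instanceId;
    final String description;

    ExecutionEvent(String instanceId, String description) {
        this.instanceId = instanceId;
        this.description = description;
    }
}

/** Hypothetical plug-in point: each journal implementation receives every event. */
interface Journal {
    void write(ExecutionEvent event);
}

/** Stand-in for the database-backed journal: keeps events so they can be queried. */
final class InMemoryJournal implements Journal {
    private final List<ExecutionEvent> events = new ArrayList<>();

    @Override
    public void write(ExecutionEvent event) {
        events.add(event);
    }

    /** Returns the events written so far for one process instance. */
    List<ExecutionEvent> eventsFor(String instanceId) {
        List<ExecutionEvent> result = new ArrayList<>();
        for (ExecutionEvent e : events) {
            if (e.instanceId.equals(instanceId)) {
                result.add(e);
            }
        }
        return result;
    }
}

/** Journal that just prints to the command line. */
final class ConsoleJournal implements Journal {
    @Override
    public void write(ExecutionEvent event) {
        System.out.println(event.instanceId + ": " + event.description);
    }
}

/** Chains several journals: each event is forwarded to every delegate, in order. */
final class ChainedJournal implements Journal {
    private final List<Journal> delegates;

    ChainedJournal(List<Journal> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void write(ExecutionEvent event) {
        for (Journal delegate : delegates) {
            delegate.write(event);
        }
    }
}

class ChainedJournalDemo {
    public static void main(String[] args) {
        InMemoryJournal db = new InMemoryJournal();
        List<Journal> journals = new ArrayList<>();
        journals.add(db);
        journals.add(new ConsoleJournal());
        Journal journal = new ChainedJournal(journals);

        journal.write(new ExecutionEvent("order-1", "activity started"));
        System.out.println("queryable events: " + db.eventsFor("order-1").size());
    }
}
```

The engine would only see the `Journal` interface, so which implementations are chained becomes a configuration choice rather than an engine change.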

        In terms of queries we have a long list of operations that are currently used in Bonita, some examples could be:

        - getInstanceVariables
        - getTaskPerfomer (taskId)
        - getActivityStartDate
        - ...

        In fact any operation related to execution data...

        "tom.baeyens@jboss.com" wrote:

        I don't quite get/follow the motivation yet to have a separation between runtime history data (journal, unfinished executions) and the real history data (finished executions). You say that you think the performance of the queries on the runtime data will get too slow. I don't think that the extra complexity of separating the journal from the history is worth it for that.


        We see the separation between BI and BAM (including get operations) as a natural way to work with execution vs history data.

        As explained in my last post, we don't think that storing in the log table and then moving the data to the BI database for each operation is a good mechanism (and this is the only option with the initial log proposal to ensure that runtime data is queryable).

        "tom.baeyens@jboss.com" wrote:

        On top of that, I'm not sure if you see the full potential of the configuration options to archive the logs into the history db. I think asynchronous log archiving right after the runtime transaction should be the default. That means that the history tables will be always up to date. And the work to archive needs to be done anyway. So postponing it till end of execution doesn't really save any work to be done. So it doesn't result in higher throughput.


        We see your point, but IMO storing the data directly in the journal database is more efficient than storing the data in the log table and then moving it to the history database as you proposed.

        "tom.baeyens@jboss.com" wrote:

        Treat all this as a mixture of concerns and alternative pieces of the puzzle. Can you make it a bit more clear what exactly you would like to see realized on the PVM?


        You got my answers before :-)

        "tom.baeyens@jboss.com" wrote:


        As for the naming, I propose the following terms:
        * Runtime DB (state of active executions, optimized for just state management. contains only active executions)
        * Log table (flat list of events that are recorded during execution)
        * History DB (execution information, optimised for querying. contains active and finished executions)

        The act of processing the logs to the history db is called archiving

        This also indicates the part for which I don't yet understand the full details and use cases: the Journal.


        Runtime DB (JBoss proposal) would be a particular implementation of Journal (Bull proposal). Other implementations would be log, XML, JMS notifications...

        On the Bull side we see the log as a particular implementation of Journal, so we only see two concepts: Journal and History.

        Archiving would be the operation that moves data from Journal to History.

        I think our approach is more generic... and can easily fit your requirements (e.g. you could have a particular implementation of journal in which you store the running data in a single Log table).

        regards,
        Miguel Valdes


        • 16. Re: log infrastructure proposal
          kukeltje

          Miguel,

          The company I work for currently has a similar issue. It is not BPM related, but it is identical to the runtime (journal) vs history question. What we see is that on the runtime DB for logging, we also need to run a lot of 'support' queries to answer customer-related questions, kind of like BAM. These queries can have many different select parts and some joins, and to be fairly responsive (no full table scans) we need a lot of different indexes.

          The downside to this is that it introduces latency when inserting new records into this runtime DB. Part of this latency is caused by Oracle 'issues', but with MySQL we've seen comparable things, albeit with a little less delay. We therefore introduced an intermediate database: records are put in a JMS queue and inserted asynchronously into this 'log' db (called the support db in our situation). The engine uses the runtime db, the support people use the log db, and (product) management uses the history db. This seems to be working now for all departments.
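
A rough sketch of the decoupling described above, under loud assumptions: an in-process `BlockingQueue` stands in for the JMS queue, and `SupportDb` is a hypothetical in-memory stand-in for the heavily indexed support db. The point it illustrates is only that the engine thread enqueues and returns, while a separate worker pays the insert cost.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for the support ("log") database with its many indexes. */
final class SupportDb {
    private final List<String> rows = new CopyOnWriteArrayList<>();

    void insert(String row) {
        rows.add(row);
    }

    int size() {
        return rows.size();
    }
}

/** The engine only enqueues; a worker thread performs the slow inserts. */
final class AsyncLogWriter {
    // Sentinel compared by identity, so no published row can be mistaken for it.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    AsyncLogWriter(SupportDb db) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String row = queue.take();
                    if (row == STOP) {
                        return;
                    }
                    db.insert(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    /** Called on the engine's thread; returns without waiting for the insert. */
    void publish(String row) {
        queue.add(row);
    }

    /** Lets the worker drain everything already published, then stops it. */
    void close() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class AsyncLogWriterDemo {
    public static void main(String[] args) {
        SupportDb db = new SupportDb();
        AsyncLogWriter writer = new AsyncLogWriter(db);
        writer.publish("instance 42: task assigned");
        writer.publish("instance 42: task completed");
        writer.close();
        System.out.println("rows in support db: " + db.size());
    }
}
```

Unlike a real JMS setup, this in-memory queue loses pending records if the process dies, so it only shows the latency trade-off, not the durability side.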

          So I'd suggest keeping performance in mind when using just two databases.

          • 17. Re: log infrastructure proposal

            Hi Ronald,

            I definitely agree. For intensive BAM or intensive querying on runtime data, the best is to have two different databases (in addition to the BI one).

            The key thing is to provide a generic, extensible and configurable mechanism that can fit each customer's requirements (two databases vs one database).

            regards,
            Miguel Valdes

            • 18. Re: log infrastructure proposal
              tom.baeyens

              on friday, jbpm and bull had a meeting about the logs. just so that everyone knows, this was the outcome.

              by default, the logs will be processed inside of the runtime transaction and merged into a history database.

              so the flat log table will disappear.

              pluggability of LogSessions will remain.

              but the default will translate the logs into the history (queryable) database. optionally, there will be a move of finished process instances to another history database (archive) that holds all finished history details (in the same schema).

              bull has convinced me that this solution is simpler and not necessarily slower than the design with the log table in the middle. so i like it better.

              • 19. Re: log infrastructure proposal
                tom.baeyens

                an extra idea is to use the same ids in the history db as in the runtime db. this way, mapping the ProcessLogs to the history record will become easier. so when the process log is processed in the runtime transaction, it can do a lookup by id in the history database.
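
A small sketch of the shared-id lookup, purely as an illustration. `ProcessLog` and LogSession are names from this thread, but their real shape is not shown here, so every field, method and class below (`HistoryRecord`, `HistoryDb`, `HistoryLogSession`) is an assumption. Transactions are not modelled: the maps only stand in for the history and archive databases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical log entry; executionId is assumed to be the id used in the runtime db. */
final class ProcessLog {
    final long executionId;
    final String activity;
    final boolean endsExecution;

    ProcessLog(long executionId, String activity, boolean endsExecution) {
        this.executionId = executionId;
        this.activity = activity;
        this.endsExecution = endsExecution;
    }
}

/** Hypothetical history record that reuses the runtime id as its own id. */
final class HistoryRecord {
    final long id;
    final List<String> activities = new ArrayList<>();
    boolean finished;

    HistoryRecord(long id) {
        this.id = id;
    }
}

/** Stand-in for a history (or archive) database keyed by the shared id. */
final class HistoryDb {
    private final Map<Long, HistoryRecord> records = new HashMap<>();

    HistoryRecord findById(long id) {
        return records.get(id);
    }

    void save(HistoryRecord record) {
        records.put(record.id, record);
    }

    HistoryRecord remove(long id) {
        return records.remove(id);
    }
}

/** Merges each log into the history record that has the same id as the execution. */
final class HistoryLogSession {
    private final HistoryDb history;
    private final HistoryDb archive; // may be null when no archive is configured

    HistoryLogSession(HistoryDb history, HistoryDb archive) {
        this.history = history;
        this.archive = archive;
    }

    void add(ProcessLog log) {
        // Same id in both databases, so no mapping table is needed for the lookup.
        HistoryRecord record = history.findById(log.executionId);
        if (record == null) {
            record = new HistoryRecord(log.executionId);
            history.save(record);
        }
        record.activities.add(log.activity);

        if (log.endsExecution) {
            record.finished = true;
            if (archive != null) {
                // Optional move of the finished instance to the archive.
                archive.save(history.remove(record.id));
            }
        }
    }
}

class HistoryLogSessionDemo {
    public static void main(String[] args) {
        HistoryDb history = new HistoryDb();
        HistoryDb archive = new HistoryDb();
        HistoryLogSession session = new HistoryLogSession(history, archive);

        session.add(new ProcessLog(7L, "review", false));
        session.add(new ProcessLog(7L, "approve", true));
        System.out.println("archived activities: " + archive.findById(7L).activities);
    }
}
```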
