8 Replies Latest reply on Feb 14, 2009 7:44 AM by kukeltje

    blobs, clobs, text and bytes

    tom.baeyens

      I'm adding binary process attachments to jBPM 4. I see things we need to improve relative to jBPM 3, and I see new possibilities, but I don't yet see the final solution clearly. So please help me fill in the gaps:

      1) chopping or blobs as the default strategy ?

      * jBPM 3's capability to chop a blob into fixed-size chops and store a list of those chops turned out to be very portable

      * I thought that Oracle was one of the most difficult DBs with respect to plain BLOBs. But currently in jBPM 4 this seems to work OK.

      So the first question is: should we keep the chops as our default solution? Or can we rely on our own QA now and give the BLOBs another try?
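
      For readers unfamiliar with the chopping strategy, here is a minimal sketch of the idea: split a byte array into blocks of a fixed maximum size, keep them in order, and glue them back together on read. The class name `BlobChopper`, its method names and the block size are hypothetical and chosen for illustration only; this is not the jBPM 3 code.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not the jBPM implementation): chops bytes into ordered blocks. */
final class BlobChopper {

    /** Splits data into blocks of at most blockSize bytes, preserving order. */
    static List<byte[]> chop(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += blockSize) {
            int end = Math.min(offset + blockSize, data.length);
            blocks.add(Arrays.copyOfRange(data, offset, end));
        }
        return blocks;
    }

    /** Concatenates the blocks, in list order, back into one byte array. */
    static byte[] glue(List<byte[]> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] block : blocks) {
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] attachment = new byte[2500];          // stand-in for an attachment
        List<byte[]> blocks = chop(attachment, 1024); // block size is an arbitrary example
        System.out.println("blocks: " + blocks.size());
        System.out.println("round trip ok: " + Arrays.equals(attachment, glue(blocks)));
    }
}
```

      Each block would then be stored as an ordinary binary column value in its own row, which is where the extra table and the extra reads come from.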

      I already built a Lob (Large OBject) Hibernate entity that can store a java.lang.String or a byte array in various ways (configurable). That way the handling of blobs and clobs becomes configurable in one central location instead of being spread out over the whole jBPM domain model.
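
      A rough sketch of what such a central holder could look like, in plain Java without any Hibernate mapping. Everything here (the class shape, the `Kind` enum, the method names, the choice of UTF-8) is an assumption made for illustration; the actual jBPM 4 entity may look quite different, and the configurable storage strategy is deliberately left out.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a single Lob type that holds either text or raw bytes. */
final class Lob {

    enum Kind { TEXT, BYTES }

    private final Kind kind;
    private final byte[] content; // text is kept as UTF-8 bytes in this sketch

    private Lob(Kind kind, byte[] content) {
        this.kind = kind;
        this.content = content;
    }

    static Lob ofText(String text) {
        return new Lob(Kind.TEXT, text.getBytes(StandardCharsets.UTF_8));
    }

    static Lob ofBytes(byte[] bytes) {
        return new Lob(Kind.BYTES, bytes.clone()); // defensive copy
    }

    Kind getKind() {
        return kind;
    }

    /** Returns the text; fails if this Lob was created from raw bytes. */
    String getText() {
        if (kind != Kind.TEXT) {
            throw new IllegalStateException("Lob holds bytes, not text");
        }
        return new String(content, StandardCharsets.UTF_8);
    }

    byte[] getBytes() {
        return content.clone();
    }

    public static void main(String[] args) {
        Lob description = Lob.ofText("a long process description");
        Lob attachment = Lob.ofBytes(new byte[] {1, 2, 3});
        System.out.println(description.getKind() + ": " + description.getText());
        System.out.println(attachment.getKind() + ": " + attachment.getBytes().length + " bytes");
    }
}
```

      The rest of the domain model would only reference this one type, so a switch between plain BLOBs and chops would touch a single class and mapping.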


      2) Another aspect is the large text columns.

      Here and there in the process definition data model and in the runtime data model, there are text columns. Typically a string column is fine. But for some DBs this is limited to 4K (was it ORA or DB2?). So we ended up truncating those strings.

      Every string property that is potentially longer than 255 could be a reference to such a Clob entity.
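
      One possible reading of this, sketched in plain Java: keep short values inline and hand anything longer to the large-text store instead of truncating it. The class `TextProperty`, the inline/overflow split and the use of 255 as the threshold are assumptions for illustration, not a description of the jBPM mappings.

```java
/** Hypothetical value holder: short text stays inline, long text goes to a Clob-like field. */
final class TextProperty {

    static final int INLINE_LIMIT = 255;

    private final String inline;   // would map to a plain VARCHAR(255) column
    private final String overflow; // would map to a referenced Clob entity

    TextProperty(String text) {
        // Note: length() counts UTF-16 code units; a real column limit may be in bytes.
        if (text.length() <= INLINE_LIMIT) {
            this.inline = text;
            this.overflow = null;
        } else {
            this.inline = null;
            this.overflow = text;
        }
    }

    /** True when the value did not fit inline and needs the large-text store. */
    boolean usesClob() {
        return overflow != null;
    }

    /** Returns the full text, never truncated. */
    String getText() {
        return overflow != null ? overflow : inline;
    }

    public static void main(String[] args) {
        TextProperty shortText = new TextProperty("short description");
        TextProperty longText = new TextProperty("x".repeat(1000));
        System.out.println("short uses clob: " + shortText.usesClob());
        System.out.println("long uses clob: " + longText.usesClob());
    }
}
```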

      3) The third aspect is process definition vs runtime blobs. Process attachments need a blob, and storage of serialized objects also needs a blob. Process attachments need to be cached in Hibernate's second level cache, while process variables cannot be cached.

      So this doesn't really match the configurable Blob and Clob approach that I was building out, as Hibernate cannot make a distinction between the process definition Blobs and the variable Blobs.



      All input appreciated: extra requirements, desired improvements, silver bullet solutions, ...

        • 1. Re: blobs, clobs, text and bytes
          jbarrez

          1) IMO, chopping it up is a workaround. It's more logical to have one BLOB for the attachment. If it works on all the QA supported databases, there's no reason to go back to the chopping.

          2) DB2 has this limit (although it can be configured to be higher).
          I don't think you can create a 'generic' DB DDL script. E.g. on MySQL it is possible to use a TEXT column instead of VARCHAR, but I don't know about other DBs.
          What people did in the past was simply to change the DDL script according to the database.

          3) Why can't process variables be cached?

          • 2. Re: blobs, clobs, text and bytes
            tom.baeyens

            on 2): as long as we can postpone db-specific handling, that will make development much easier and our progress much faster. db-specific things mean that we would need a set of mappings for each db, and that we would have to maintain all those copies and keep them synchronized.

            so i'd like to keep the same mappings for all the dbs. and if we need to apply a small modification for one particular db, it should be possible to apply this change automatically from the single original source mapping files.
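
            A very small sketch of how such an automatic per-db modification could work: keep one source mapping and apply a table of text substitutions for the target database. The class `MappingPatcher`, the override table, the column name and the concrete types are all hypothetical; only the `sql-type` attribute of `<column>` comes from Hibernate's hbm.xml format, and the MySQL TEXT substitution simply mirrors the example given earlier in this thread.

```java
import java.util.Map;

/** Hypothetical sketch: derive a db-specific mapping from the single source mapping. */
final class MappingPatcher {

    // Illustrative overrides only: database name -> (text in source mapping -> replacement).
    private static final Map<String, Map<String, String>> OVERRIDES = Map.of(
        "mysql", Map.of("sql-type=\"varchar(4000)\"", "sql-type=\"text\"")
    );

    /** Returns the source mapping with the overrides for the given database applied. */
    static String patch(String sourceMapping, String database) {
        String result = sourceMapping;
        Map<String, String> replacements = OVERRIDES.getOrDefault(database, Map.of());
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "<column name=\"DESCR_\" sql-type=\"varchar(4000)\"/>";
        System.out.println(patch(source, "mysql"));  // patched for MySQL
        System.out.println(patch(source, "hsqldb")); // no overrides, unchanged
    }
}
```

            Databases without an entry get the source mapping unchanged, so the single source file stays the only thing that is edited by hand.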

            on 3): because that results in incorrect behaviour in a cluster. hibernate's 2nd level cache is only used for process definition caching, as that can be assumed to be static information that will not change after deployment.

            • 3. Re: blobs, clobs, text and bytes
              camunda

              As far as I remember, a lot of effort was put into the chopping, and there was a good reason why it is there, wasn't there?

              In that case maybe it makes sense to just port the logic/code?

              But if it works without it as well, I am with Joram: skip all unnecessary complexity...

              • 4. Re: blobs, clobs, text and bytes
                tom.baeyens

                the chopping code is ported to 4, but i'm not sure what is best to take as the default in this situation.

                the reason was that chopping was better for database portability. i always thought that oracle was a difficult one. but that seems to work ok.

                maybe since we have our own qa now, we can just use blob as the default and save ourselves a table and gain some performance.

                • 5. Re: blobs, clobs, text and bytes
                  camunda

                  Okay, then +1 for the blob as default

                  • 6. Re: blobs, clobs, text and bytes
                    kukeltje

                     

                    "i always thought that oracle was a difficult one"

                    Correct... it was indeed. Currently it isn't anymore


                    +1 for the blob

                    • 7. Re: blobs, clobs, text and bytes
                      aguizar

                      I agree we should use BLOBs/CLOBs as the default and use the chopping only as needed. I don't even think we will find a *current* database/driver combination that does not support LOBs properly. The four databases in continuous integration with jBPM 3 (HSQLDB, MySQL, PostgreSQL and Sybase) have no problem with CLOBs.

                      • 8. Re: blobs, clobs, text and bytes
                        kukeltje

                        Koen wrote on the mailing list:

                        +1

                        I agree on reducing complexity if there is no apparent regression.


                        But... we should not forget migration...