10 Replies Latest reply on Mar 16, 2004 8:39 PM by mikea-xoba

    entity structure for stored messages

    mikea-xoba

      i was wondering if we may benefit from some entity sub-structure to stored messages. let me explain:

      right now, org.jboss.mail.mailbox.entity.message.LocalMessage is the only entity bean interface we have, and there are basically two methods that allow access to the data of any given message: getBody() and getHeaders(), both of which return strings that need to be parsed. headers are ultimately parsed into their respective lines and bodies are parsed into their respective BodyParts, when appropriate. drilling down into the headers part, there are many standard headers that are repeated in every email, like date, subject, etc etc, and further, some headers contain references to other entities we will eventually want to persist in the database, like email addresses on 'from' or 'to' header lines.

      if we have some entity structure to reflect the actual regular structure in email messages themselves, we'll be able to do things like search for all email messages from a particular email address, without having to load in all the bodies and headers from all the stored emails in the database; we'd rather just execute an ejb-finder method. there are lots of other examples of neat things that can be done easily and efficiently, too, if we create some structure in the Entity layer.

      so, for instance, rather than have LocalMessage have a CMP field 'headers', it may be useful to have a CMR field 'headers', in a many-to-many relationship with a 'header' entity bean.

      the 'header' entity bean would then have a CMR field in a many-to-one relationship with a 'header-type' bean. the 'header' entity bean would also have a CMP field for the string value of the header, such as the actual subject of a message.

      the 'header-type' bean would have a CMP field with the name of the header-type, such as 'Subject' etc.

      the LocalMessage object would also have a CMR field called 'from' in a one-to-many relationship with an 'email-address' bean, and another CMR field called 'to' which would be in a many-to-many relationship with the 'email-address' bean, etc.
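
      to make the shape concrete, here's a rough sketch in plain java rather than real ejb 2.x cmp/cmr code. every class name below (HeaderType, Header, EmailAddress, StructuredMessage, MessageIndex) is made up for illustration and nothing like it exists in the codebase yet; MessageIndex is only an in-memory stand-in for what an ejb-finder on the 'from' relationship might give us.

      ```java
      import java.util.ArrayList;
      import java.util.Collections;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Hypothetical plain-Java stand-ins for the proposed entity beans.
      // These are illustrative names, not classes from org.jboss.mail.

      /** 'header-type' bean: one instance per distinct header name, e.g. "Subject". */
      final class HeaderType {
          final String name;
          HeaderType(String name) { this.name = name; }
      }

      /** 'header' bean: a string value plus a many-to-one link to its type. */
      final class Header {
          final HeaderType type;
          final String value;
          Header(HeaderType type, String value) { this.type = type; this.value = value; }
      }

      /** 'email-address' bean. */
      final class EmailAddress {
          final String address;
          EmailAddress(String address) { this.address = address; }
      }

      /** Message with relationship fields instead of one big header string. */
      final class StructuredMessage {
          final long id;
          final EmailAddress from;
          final List<EmailAddress> to = new ArrayList<>();
          final List<Header> headers = new ArrayList<>();
          final String body;
          StructuredMessage(long id, EmailAddress from, String body) {
              this.id = id;
              this.from = from;
              this.body = body;
          }
      }

      /** In-memory stand-in for a finder such as a hypothetical findByFrom(address). */
      final class MessageIndex {
          private final Map<String, List<Long>> idsByFrom = new HashMap<>();

          void add(StructuredMessage m) {
              idsByFrom.computeIfAbsent(m.from.address, k -> new ArrayList<>()).add(m.id);
          }

          /** Returns matching message ids without reading any body or raw header text. */
          List<Long> findIdsByFrom(String address) {
              return idsByFrom.getOrDefault(address, Collections.emptyList());
          }
      }
      ```

      the point of the sketch is only that a lookup by sender touches the address relationship and returns keys; the bodies stay where they are until someone actually asks for them.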

      basically what i'm proposing is something as close as possible to 'third normal database form' for our entities, given that they're ultimately stored in a relational database. if the persistent store is in third normal form, whether in the database layer or entity layer, it's very easy to query and manipulate since all the dependencies are tightly controlled and redundancies essentially eliminated.

      if we want to eventually support things like contact lists, calendaring, etc, this kind of structure may become especially useful. in that case, we may even want to deepen the structure and add beans like 'person' that represent a person, with a CMR field in a one-to-many relationship with the 'email-address' bean, for instance.

      this short note just scratches the surface of this topic, but i wanted to get some early feedback on it.

      mike

        • 1. Re: entity structure for stored messages
          acoliver

          agreed --- no need to paint bikesheds, and there are more pressing M1 issues as well.

          certainly with all persistence existing behind the 'org.jboss.mail.mailbox.Folder' interface, we can refactor anytime later without significant penalty; and if a future persistence architecture allows more flexibility (i.e., fine-grained entities), we could subclass 'Folder' interface to reflect that.



          • 2. Re: entity structure for stored messages

            +1

            If you are going to be using these headers individually, you might as well store them separately in the DB.

            Unless someone is going to write parse/unparse routines for the entire header as a string.

            Unless there's a really good reason to keep them all together, I think splitting them out would be a good idea.

            Steve

            • 3. Re: entity structure for stored messages
              acoliver

              Whatever you think makes sense, consider clustered caching issues as well in your design.

              • 4. Re: entity structure for stored messages
                acoliver

                I don't know that header-type makes sense because theoretically I could adhoc add X-GrandMasKitchen: as a header at whim...

                • 5. Re: entity structure for stored messages
                  mikea-xoba

                  the way i was imagining it, 'X-GrandMasKitchen' would automatically get created as a 'header entity bean' when first encountered.

                  but in any case, X-GrandMasKitchen has to get stored anyway --- either as ASCII in a raw message or otherwise, and so storing it in an entity bean just keeps the storage more structured --- precisely why folks like using relational databases (and by extension, j2ee/ejb).

                  over time, the list of stored header entities would grow, but probably much more slowly than the number of stored emails since most headers are pretty standard, even the custom ones. if there are 100 popular types of mail clients and servers out there, each using 10 custom headers, that would be 1000 unique headers and not too big a deal for a normal relational database with indexing etc etc.
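
                  a minimal sketch of that 'create on first encounter' idea, in plain java instead of an entity bean. HeaderTypeRegistry and idFor are names i'm inventing here, and the map is only a stand-in for a header-type table with a unique index on the name; keying on lower case is my assumption, based on rfc 822 treating field names as case-insensitive.

                  ```java
                  import java.util.HashMap;
                  import java.util.Locale;
                  import java.util.Map;

                  /** Hypothetical registry standing in for a 'header-type' table; not part of JBoss Mail. */
                  final class HeaderTypeRegistry {
                      // Assumption: names are matched case-insensitively, so key on lower case.
                      private final Map<String, Integer> idsByName = new HashMap<>();

                      /** Returns the id for a header name, creating an entry the first time it is seen. */
                      int idFor(String headerName) {
                          String key = headerName.trim().toLowerCase(Locale.ROOT);
                          Integer existing = idsByName.get(key);
                          if (existing != null) {
                              return existing;
                          }
                          int id = idsByName.size() + 1;   // next unused id
                          idsByName.put(key, id);
                          return id;
                      }

                      /** Number of distinct header names seen so far. */
                      int size() {
                          return idsByName.size();
                      }
                  }
                  ```

                  so 'X-GrandMasKitchen' costs one new row the first time it shows up and nothing after that, which is why the table should grow with the number of distinct header names rather than the number of messages.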

                  while this is undoubtedly correct database design, i'm still not so sure it's right for jboss yet, since finder methods would be used to navigate the entity store; and my concern with finder methods is that i don't think their results are cached in jboss yet (if they ever will be), like primary keys are. so each finder method results in at least one trip to the database to retrieve a set of primary keys. fortunately, however, the primary keys and their entities are cached, so further trips to the database are not needed (if things are optimized right and one has a lot of RAM at one's disposal). also fortunately, databases are pretty sophisticated about caching themselves. so really, it fundamentally boils down to the inefficiency of the serialization/unserialization we call 'SQL' if jboss and database are each doing their jobs right with regard to caching.

                  any further thoughts or in-the-field experiences on this entity-structure issue certainly would be useful to hear about.

                  mike

                  • 6. Re: entity structure for stored messages
                    mikea-xoba

                    p.s.,

                    what are the issues to be thinking about re: cluster caching?

                    i believe if a jboss instance in a cluster or partition modifies any fields of an entity bean, it has to send out a cache invalidation message to all other parties? so i guess if the design involves modifying beans rather than creating them, caching will be taxed to one degree or another?

                    i haven't delved into the details of jboss' caching infrastructure yet, so i really don't know.

                    mike

                    • 7. Re: entity structure for stored messages
                      okettune

                      "mikea-xoba" wrote:

                      any further thoughts or in-the-field experiences on this entity-structure issue certainly would be useful to hear about.


                      Judging by our experiences with the current CMP engine, I'd say you're asking for trouble when using fine-grained CMR with entities common to most transactions (in this case header-type). All locking options we've covered so far are sub-par:

                      QueuedPessimistic locking will cause a bottleneck by synchronizing read-only methods.

                      Optimistic locking will not help, as it requires the InstancePerTransaction container and causes loading from the DB every time.

                      The scantily documented SimpleReadWriteEJBLock in the codebase is not thread-safe; we've had lots of weird thread issues on dual-Xeon machines (working on a fix).

                      But then again, the upcoming backport of the cache project to the 3.2 branch might help here, and rumour has it that Hibernate is pretty good. ;)

                      • 8. Re: entity structure for stored messages
                        mikea-xoba

                        agreed --- no need to paint bikesheds, and there are more pressing M1 issues as well.

                        certainly with all persistence existing behind the 'org.jboss.mail.mailbox.Folder' interface, we can refactor anytime later without significant penalty; and if a future persistence architecture allows more flexibility (i.e., fine-grained entities), we could subclass 'Folder' interface to reflect that.



                        • 9. Re: entity structure for stored messages
                          kabirkhan

                          I haven't thought this through properly, just wanted to mention it so please don't shoot me down :-)

                          Could it maybe be possible to both keep the messages as they are at present (for speed of loading) AND split the headers as mentioned below (for search capabilities etc.)?

                          • 10. Re: entity structure for stored messages
                            mikea-xoba

                            i believe so --- and sounds like a good idea too.

                            i think it's usual for folks to begin by designing their databases or set of entity ejb's all the way to third normal form and then back off a little where it makes sense, for performance or other reasons. 'backoff' in our case would be keeping the bit-perfect original message data as you suggested.

                            it also seems useful to keep the raw messages for 'archival' reasons, in case one absolutely needs a bit-perfect copy of the original email. for instance, there will undoubtedly be email instances that don't even conform to rfc 822 (extreme spam) and thus won't properly fit into a neat little entity bean structure. but we'll still probably want to handle them somehow in the jboss mail server.
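
                            roughly what i have in mind, as a plain-java sketch and not a proposal for the actual bean layout. ArchivedMessage and its methods are hypothetical names, and the header parsing is deliberately naive (simple "Name: value" lines only); the only thing it's meant to show is that the raw bytes are always kept, and the structured view is filled in only when the header block parses.

                            ```java
                            import java.nio.charset.StandardCharsets;
                            import java.util.LinkedHashMap;
                            import java.util.Locale;
                            import java.util.Map;

                            /** Hypothetical hybrid record: raw message kept untouched, headers indexed when parseable. */
                            final class ArchivedMessage {
                                private final byte[] raw;                    // bit-perfect original
                                private final Map<String, String> headers;   // empty if the header block did not parse

                                ArchivedMessage(byte[] raw) {
                                    this.raw = raw.clone();
                                    // ISO-8859-1 maps every byte to one char, so decoding never fails.
                                    this.headers = parseHeaders(new String(raw, StandardCharsets.ISO_8859_1));
                                }

                                /** Returns a copy of the original bytes, exactly as received. */
                                byte[] raw() {
                                    return raw.clone();
                                }

                                /** Returns the header value, or null if absent or if the message was not parseable. */
                                String header(String name) {
                                    return headers.get(name.toLowerCase(Locale.ROOT));
                                }

                                /** True when a structured header view is available. */
                                boolean structured() {
                                    return !headers.isEmpty();
                                }

                                // Naive parser: does not handle folded continuation lines or repeated headers.
                                private static Map<String, String> parseHeaders(String text) {
                                    Map<String, String> out = new LinkedHashMap<>();
                                    for (String line : text.split("\r?\n")) {
                                        if (line.isEmpty()) {
                                            break;                            // blank line ends the header block
                                        }
                                        int colon = line.indexOf(':');
                                        if (colon <= 0) {
                                            return new LinkedHashMap<>();     // malformed: fall back to raw only
                                        }
                                        String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                                        out.put(key, line.substring(colon + 1).trim());
                                    }
                                    return out;
                                }
                            }
                            ```

                            a message that doesn't parse still gets stored and can be handed back byte for byte; it just doesn't show up in header searches.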

                            mike