2 Replies Latest reply on Aug 30, 2005 5:46 PM by joe.cheng

    thoughts on IMAP and mailboxes

    sunfire

      Now that the IMAP skeleton has been committed, here are a few things that I have in mind.
      To push the current IMAP4 stack to a point where T'bird can look at the real folders and mails in a mailbox, a few things need to be done. The first commands that need to be implemented are LSUB, SELECT, LIST and FETCH (the last being the most complex). I also never tested whether the STARTTLS command works; it is just a copy of the POP3 STLS code, so it may or may not work with IMAP. The CAPABILITY command also needs some adjustments once STARTTLS is confirmed to work, since it may return different login capabilities before and after STARTTLS is issued, due to security considerations in the login process.
      In order to implement the commands the mailbox interface needs to be overhauled. Here are a few thoughts that came to my mind when stubbing out the IMAP stack.

      1. names and structure
      I may be mistaken, since I haven't looked into the code much, but I think the default mailbox currently returned when you ask for a user's mailbox is the root folder (INBOX?). I think the structure should be changed to allow more named folders at the root level besides INBOX, and to separate the general "container" from the actual "mailboxes" for clarity, something like this: You first retrieve a user's "container". A user has only one "container". A "container" contains only the named root "mailboxes" (e.g. "INBOX", "Trash", etc.) but no "mails". A "mailbox" may contain other named "mailboxes" and/or "mails".
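
      The structure described above could look roughly like the following. This is only a sketch: the type names (MailboxParent, UserContainer, Mailbox) are hypothetical and do not exist in the current code base, mails are stood in for by plain Strings, and creating INBOX automatically is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// All names below are hypothetical; they only illustrate the proposed layout.

/** Anything that can hold named child mailboxes. */
abstract class MailboxParent {
    private final Map<String, Mailbox> children = new LinkedHashMap<>();

    /** Creates a named child mailbox; names must be unique per parent. */
    Mailbox createMailbox(String name) {
        if (children.containsKey(name)) {
            throw new IllegalArgumentException("mailbox exists: " + name);
        }
        Mailbox child = new Mailbox(name);
        children.put(name, child);
        return child;
    }

    /** Returns the named child, or null if there is none. */
    Mailbox getMailbox(String name) {
        return children.get(name);
    }

    /** Child names in creation order. */
    List<String> getMailboxNames() {
        return new ArrayList<>(children.keySet());
    }
}

/** One per user: holds only the named root mailboxes, never mails. */
final class UserContainer extends MailboxParent {
    private final String user;

    UserContainer(String user) {
        this.user = user;
        createMailbox("INBOX"); // assumption: every container starts with an INBOX
    }

    String getUser() {
        return user;
    }
}

/** May hold other named mailboxes and/or mails. */
final class Mailbox extends MailboxParent {
    private final String name;
    private final List<String> mails = new ArrayList<>();

    Mailbox(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void addMail(String mail) {
        mails.add(mail);
    }

    List<String> getMails() {
        return Collections.unmodifiableList(mails);
    }
}
```

      With this split, POP3 would simply ask the container for "INBOX", while IMAP could walk the whole tree for LIST and LSUB.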

      2. mailbox attributes
      Mailboxes need the ability to carry flexible attributes. For IMAP the attributes are quite simple and can be saved as a String, so something like

      public String getAttribute(String key);
      public void setAttribute(String key, String attribute);

      would do the trick for IMAP. The key would be an identifier that is unique per protocol, to allow a different set of attributes per protocol per mailbox.
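
      One possible way to keep keys unique per protocol is a simple prefix convention. The sketch below assumes a hypothetical MailboxAttributes holder and a "<protocol>.<name>" key format; neither is part of the existing code. The value shown is one of the mailbox name attributes defined by RFC 3501.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical attribute holder for one mailbox; keys are namespaced by protocol. */
class MailboxAttributes {
    private final Map<String, String> attributes = new HashMap<>();

    /** Returns the stored value, or null if the key was never set. */
    public String getAttribute(String key) {
        return attributes.get(key);
    }

    /** Stores a value; a null value removes the key. */
    public void setAttribute(String key, String attribute) {
        if (attribute == null) {
            attributes.remove(key);
        } else {
            attributes.put(key, attribute);
        }
    }

    /** Assumed key convention: "<protocol>.<name>", e.g. "imap.nameAttributes". */
    static String key(String protocol, String name) {
        return protocol + "." + name;
    }
}
```

      For example, setAttribute(MailboxAttributes.key("imap", "nameAttributes"), "\\Noselect") would mark a mailbox as not selectable for IMAP without affecting what any other protocol stores under its own prefix.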

      3. mail numbering/flagging
      Message numbering and flags need some thought, but before I start to elaborate on that, please read RFC 3501 sections 2.3.1 to 2.3.3 to understand what is required/desired for IMAP.

      4. mailbox transactions
      IMAP clients may use different mechanisms to check whether a mailbox has changed, in order to resynchronize with the server. Since any command a client issues may return status updates as untagged data, the IMAP instance has to check whether a serial (or whatever identifier) in the mailbox has changed since its last check, and it also needs an efficient way to report all the changes to the client. So besides optimistic locking to see whether the mailbox was changed, we need to decide how to implement a nice way to see what kind of changes/transactions took place since the last check. The first thing that pops into my mind is keeping a copy of the mailbox in the instance and tracing the changes made to the real thing against it, but I don't like the idea of keeping copies. I was thinking about some kind of transaction log kept in each mailbox that can be queried to return a list of pre-defined actions (adding, removing, flagging, etc. of mails) since a given serial/timestamp/whatever. To keep it from growing without bound, entries could be purged after the maximum amount of time a client is allowed to stay connected without issuing a command, since by then every connected client will have retrieved those actions anyway (the idle timeout for IMAP MUST be at least 30 minutes).
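
      A minimal sketch of such a transaction log follows. Everything in it is an assumption for illustration: the class name MailboxChangeLog, the three actions, the use of a monotonically increasing serial, and identifying mails by a numeric id. Time is passed in explicitly so the purge rule is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-mailbox change log; all names here are illustrative. */
class MailboxChangeLog {
    enum Action { ADDED, REMOVED, FLAGGED }

    /** One recorded change to a mail, identified by an assumed numeric id. */
    static final class Change {
        final long serial;
        final long timestampMillis;
        final Action action;
        final long mailId;

        Change(long serial, long timestampMillis, Action action, long mailId) {
            this.serial = serial;
            this.timestampMillis = timestampMillis;
            this.action = action;
            this.mailId = mailId;
        }
    }

    private final List<Change> changes = new ArrayList<>();
    private long lastSerial = 0;

    /** Appends a change and returns the serial assigned to it. */
    synchronized long record(Action action, long mailId, long nowMillis) {
        long serial = ++lastSerial;
        changes.add(new Change(serial, nowMillis, action, mailId));
        return serial;
    }

    /** Returns all changes with a serial greater than the given one, oldest first. */
    synchronized List<Change> changesSince(long serial) {
        List<Change> result = new ArrayList<>();
        for (Change c : changes) {
            if (c.serial > serial) {
                result.add(c);
            }
        }
        return result;
    }

    /** Drops entries older than maxAgeMillis and returns how many were removed. */
    synchronized int purgeOlderThan(long maxAgeMillis, long nowMillis) {
        int before = changes.size();
        changes.removeIf(c -> nowMillis - c.timestampMillis > maxAgeMillis);
        return before - changes.size();
    }

    /** The serial of the most recent change, or 0 if nothing was recorded yet. */
    synchronized long currentSerial() {
        return lastSerial;
    }
}
```

      Each IMAP session would remember the serial it last reported and call changesSince(...) after every command, so the "did anything change" check is a single comparison against currentSerial(). Purging with the idle timeout as maxAgeMillis is only safe if the server really does disconnect sessions that have been idle for that long.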

      5. efficient FETCHing
      The FETCH command has quite a few options a client may use to request very specific parts of a mail from a server. For complete transfer of large mails, RFC 2683 recommends that client developers transfer the message in parts by fetching successive chunks, like this:
      C: 022 FETCH 3 BODY[1]<0.20000>
      S: * 3 FETCH (FLAGS(\Seen) BODY[1]<0> {20000}
      S: ...data...)
      S: 022 OK done
      C: 023 FETCH 3 BODY[1]<20001.20000>
      S: * 3 FETCH (BODY[1]<20001> {20000}
      S: ...data...)
      S: 023 OK done
      C: 024 FETCH 3 BODY[1]<40001.20000>

      I haven't yet tested which fetch options T'bird uses to download a first view of a message. But for the example above, two things should be implemented without creating too much overhead, since every FETCH is a command by itself and may or may not be part of a series of subsequent FETCHes: splitting a message up as required by RFC 822 and 2822, and then sending just the requested octets of the specified MIME part back to the client. If this process is not implemented efficiently, a few users downloading a few big mails may produce a lot of heat on a system.
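
      To illustrate the "just the requested octets" part, here is a sketch that reads only the requested range from a stored message file instead of loading the whole mail. It assumes that the start position and length of each MIME part inside the file were recorded when the mail was stored; the class name PartialFetchReader and its parameters are hypothetical and are not taken from org.jboss.mail.store. Ranges that run past the end of the part are truncated, and a start beyond the end yields an empty result, which is what RFC 3501 asks for.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical helper that serves BODY[section]<offset.count> from a stored file. */
class PartialFetchReader {

    /**
     * @param file       the stored message
     * @param partStart  position of the MIME part within the file (assumed to be indexed)
     * @param partLength length of the MIME part in octets (assumed to be indexed)
     * @param offset     first requested octet, relative to the start of the part
     * @param count      maximum number of octets requested
     */
    static byte[] read(Path file, long partStart, long partLength, long offset, long count) {
        if (offset < 0 || count <= 0) {
            throw new IllegalArgumentException("bad partial range");
        }
        if (offset >= partLength) {
            return new byte[0]; // start beyond the end of the part: empty result
        }
        int wanted = (int) Math.min(count, partLength - offset); // truncate at end of part
        ByteBuffer buf = ByteBuffer.allocate(wanted);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long start = partStart + offset;
            while (buf.hasRemaining()) {
                // Positional read: touches only the requested range of the file.
                int n = channel.read(buf, start + buf.position());
                if (n < 0) {
                    break; // file is shorter than the index claims
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }
}
```

      The cost of each FETCH then depends on the chunk size rather than on the size of the mail, which matters when every chunk arrives as an independent command.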

      6. a few general questions
      - are mailbox locks going to be inherited throughout submailboxes or is a lock just good for the selected mailbox?
      - is INBOX going to be a magic mailbox that e.g. can't be deleted, since it should also be the default mailbox to fetch from a user's container for POP3 and the place where new mails are stored by SMTP? IMAP has a few commands where INBOX is treated differently than any other, regular mailbox.
      - are we there, yet?! ;-)

      Cheers, Thorsten

        • 1. Re: thoughts on IMAP and mailboxes

          Firstly, many thanks for your input in getting IMAP kick-started. I look forward to seeing more contributions.

          We definitely need to look at changing the mailbox implementation to support IMAP, but we are a little distance from that currently. At the moment I am working on some refactoring that I would like to get in place before tackling the mailboxes. This includes the Mail object itself, which is far too tightly coupled to the SMTP protocol. I have also been thinking about normalising the schema with regard to the mails. Currently each folder contains a copy of the email (headers and a reference to the mail body). It would be possible to have a single message table, with the folder holding references to the mails. That would make things more efficient in a number of areas, but is a bit more complex (handling deletions is tricky).

          I would also like to have a namespace within JBMail such that it is possible to access most entities using a path-like structure (similar to the Java Content Repository).

          5. efficient FETCHing


          We have most of the infrastructure in place to do this efficiently (check out org.jboss.mail.store.*) and I will try to handle the splitting of the body into parts when I refactor mail creation.


          - are mailbox locks going to be inherited throughout submailboxes or is a lock just good for the selected mailbox?


          Depends on what is required by the spec. If there is a requirement for both then we could make it a configuration option.

          - is INBOX going to be a magic mailbox that e.g. can't be deleted, since it should also be the default mailbox to fetch from a user's container for POP3 and the place where new mails are stored by SMTP? IMAP has a few commands where INBOX is treated differently than any other, regular mailbox.


          I don't think that it needs to be magic as such. Each 'Container' could have a list of its predefined folders. Folders could have some extra attributes to handle specific behaviour, e.g. Folder.isPermanent().

          - are we there, yet?! ;-)


          It's a journey, not a destination :-).

          To move this forward, one possibility is that you could start by writing some of the interfaces for the mailbox. This would help us define the structure that we need.

          Mike.

          • 2. Re: thoughts on IMAP and mailboxes
            joe.cheng

             

            "mikezzz" wrote:

            5. efficient FETCHing


            We have most of the infrastructure in place to do this efficiently (check out org.jboss.mail.store.*) and I will try to handle the splitting of the body into parts when I refactor mail creation.


            Keep in mind that the FETCH command may specify particular MIME parts, not necessarily the whole body. (I'm not sure if Tbird does this though.)

            This may be obvious, but IIUC, the offset/length-style partial FETCHes will probably either happen consecutively or not at all. So rather than optimizing for any arbitrary offset FETCH at any given time, you can instead restrict yourself to the case where a 0..2000 fetch MAY be immediately followed by a 2000..4000 fetch, etc. (i.e. just cache the single most recently used body if it wasn't fully retrieved)
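
            A rough sketch of that "cache only the most recent body" idea is below. The class name LastBodyCache, the String key (something like mailbox/message/section) and the Supplier used to load a body are all assumptions made for illustration, and the body is held in memory for simplicity. The cache keeps at most one body and drops it as soon as a fetch reaches its end.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical per-session cache holding at most one partially fetched body. */
class LastBodyCache {
    private String cachedKey;  // assumed key format, e.g. "INBOX/3/1"
    private byte[] cachedBody;
    private int loads;         // how often the backing store was hit (for illustration)

    /** Returns up to count octets starting at offset, loading the body only on a cache miss. */
    synchronized byte[] fetch(String key, Supplier<byte[]> loader, int offset, int count) {
        if (offset < 0 || count <= 0) {
            throw new IllegalArgumentException("bad partial range");
        }
        if (!key.equals(cachedKey)) {
            cachedBody = loader.get(); // replaces whatever body was cached before
            cachedKey = key;
            loads++;
        }
        byte[] body = cachedBody;
        int from = Math.min(offset, body.length);
        int to = (int) Math.min((long) body.length, (long) offset + count);
        if (to == body.length) {
            // The end of the body was reached, so a follow-up fetch is unlikely.
            cachedKey = null;
            cachedBody = null;
        }
        return Arrays.copyOfRange(body, from, to);
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

            A run of consecutive chunk fetches for the same body then hits the store once, while a client that never uses partial fetches pays nothing extra, because a full fetch clears the cache immediately.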

            Really glad to see you guys pushing forward with IMAP. I've pretty much given up hope for JAMES.