15 Replies Latest reply on Jul 6, 2006 6:42 PM by gohip

    POP --> isDeleted

    esriram

      After a mail is fetched by the POP client, it is first marked as "deleted" and then physically deleted: MailService.markDeleted is called first, then MailService.deleteMarked.
      Both calls are made in the same TCP session of the POP client. (Correct me if I am wrong.)
      Question: Can I maintain the list of deleted messages in memory instead of updating the DB? If so, is there any reason why the isDeleted column should be persisted?

      (i.e.)
      I will do a deletedHashtable.put() when markDeleted is called.
      I will do a deletedHashtable.remove() when unMarkDeleted is called.
      I will physically remove the mails from DB/filesystem when deleteMarked is called.

        • 1. Re: POP --> isDeleted
          acoliver

          I can see the advantage of not doing it on the same session, but I see no advantage to doing this hashtable-fu and a pretty big disadvantage. As an alternative, I suggest attaching this to the Reaper code (requires more thought) but leaving it DB side.

          • 2. Re: POP --> isDeleted
            esriram

             

            I can see the advantage of not doing it on the same session, but I see no advantage to doing this hashtable-fu and a pretty big disadvantage


            Thanks a lot for your response. My aim was to avoid the "update Table set isDeleted=true" queries.

            A typical POP session will be like this (please correct me if I am wrong):
            1. LIST --> 1 SQL
            2. RETR --> SQL for scrolling to the num'th row in the database.
            3. DELE --> 1 SQL for update --> It is done in a separate transaction --> Suspend/Resume etc.
            4. RETR 2
            5. DELE 2
            6. QUIT --> Delete from table where isDeleted=true
            In a session that downloads, say, 10 mails, too many queries are executed. I don't see any need for (temporarily) updating the isDeleted column.

            My suggestion:
            1. The temporary update of the isDeleted column should be an in-memory operation. No queries should be executed.
            2. When the LIST command is executed, the list of mails should be stored in memory. Each message in memory should have the following:
               1. MessageID
               2. Size
               3. isDeleted
               4. The temp ID which RETR and DELE will use (the index of the for loop).
               5. It should not have the header or the body in memory.
            3. RETR should do this:
               1. Iterate the in-memory list and get the messageID of the num'th mail.
               2. Get the header and the body from the DB by using the messageID (PK) instead of using a scrollable result set.
            4. DELE should just update the in-memory list.
            5. QUIT --> Get the messageIDs of the deleted mails from the list and delete them from the DB in a single query.

            Think this will perform better. Let me know what you feel.
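
For illustration only, the suggestion above could be sketched roughly as follows. This is not code from the project: the class and method names (`PopSessionSketch`, `Entry`, `messageIdFor`) are made up, and the message ID is assumed to be a `long`. The lookup goes from the per-session message number to the stored ID, so the session number itself is never used as a key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-session view of a mailbox; names are illustrative only.
class PopSessionSketch {
    // One entry per listed message; no headers or body are held in memory.
    static final class Entry {
        final long messageId;   // stable key in the store (assumed to be a long)
        final long size;        // size as reported by LIST
        boolean deleted;        // set by DELE, cleared by RSET

        Entry(long messageId, long size) {
            this.messageId = messageId;
            this.size = size;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Called while building the list from the single LIST query.
    void add(long messageId, long size) {
        entries.add(new Entry(messageId, size));
    }

    // POP3 message numbers start at 1, so subtract 1 for the list index.
    private Entry entry(int num) {
        if (num < 1 || num > entries.size()) {
            throw new IllegalArgumentException("no such message: " + num);
        }
        return entries.get(num - 1);
    }

    // RETR: resolve the session number to the store key for a keyed fetch.
    long messageIdFor(int num) {
        Entry e = entry(num);
        if (e.deleted) {
            throw new IllegalArgumentException("message marked deleted: " + num);
        }
        return e.messageId;
    }

    // DELE: in-memory only, no query.
    void markDeleted(int num) {
        entry(num).deleted = true;
    }

    // RSET: clear all marks, again without touching the store.
    void resetMarks() {
        for (Entry e : entries) {
            e.deleted = false;
        }
    }

    // QUIT: the IDs to remove from the store in one statement.
    List<Long> deletedIds() {
        List<Long> ids = new ArrayList<>();
        for (Entry e : entries) {
            if (e.deleted) {
                ids.add(e.messageId);
            }
        }
        return ids;
    }
}
```

Since the marks live only in the session object, a dropped connection simply discards them, which matches POP3 behaviour where deletions take effect only on QUIT.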


            • 3. Re: POP --> isDeleted
              acoliver

              We cannot do the messageID(PK) because sequence # != uid. sequence # is only per session. Other than that it looks okay. Please submit a patch.

              • 4. Re: POP --> isDeleted

                An interesting optimisation would be to load a page of headers into memory rather than one at a time on demand. If pageSize == 1, then it would behave the same as outlined above. It would allow systems with more memory to take advantage of it (basically allowing an administrator to tune the time/space trade-off).

                A database will much prefer 10 queries of 10 rows to 100 queries of 1 row.

                Just a thought.

                Mike.
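
A rough sketch of the paging idea, under stated assumptions: `HeaderSource` is a hypothetical stand-in for whatever query the store would run, headers are represented as plain strings, and indexes are zero-based. Only one page is kept in memory at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the store; a real one would run one query per call.
interface HeaderSource {
    List<String> fetch(int offset, int limit);
}

// Loads headers a page at a time; pageSize == 1 degenerates to on-demand loading.
class PagedHeaderCache {
    private final HeaderSource source;
    private final int pageSize;
    private int pageStart = -1;                 // index of the first cached header
    private List<String> page = new ArrayList<>();
    int queries = 0;                            // counts trips to the source

    PagedHeaderCache(HeaderSource source, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        this.source = source;
        this.pageSize = pageSize;
    }

    // index is zero-based; an out-of-range index fails in page.get below.
    String get(int index) {
        boolean cached = pageStart >= 0
                && index >= pageStart
                && index < pageStart + page.size();
        if (!cached) {
            // Align the page so that neighbouring indexes share one fetch.
            pageStart = (index / pageSize) * pageSize;
            page = source.fetch(pageStart, pageSize);
            queries++;
        }
        return page.get(index - pageStart);
    }
}
```

Reading 25 headers in order costs 3 fetches with a page size of 10 and 25 fetches with a page size of 1, which is the trade-off an administrator would be tuning.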

                • 5. Re: POP --> isDeleted
                  esriram

                   

                  An interesting optimisation would be to load a page of headers into memory rather than one at a time on demand.


                  Actually I don't see any need to load the headers separately into memory. The headers and the body can be stored together (as 1 file, or as 1 column in the DB). When RETR is done, the file (headers and body together) should be read (from the file/DB) and written to the POP response. Please correct me if I am wrong.
                  The current implementation stores the headers and the body separately. Don't know why.
                  This way there will be no queries to fetch the headers.

                  We cannot do the messageID(PK) because sequence # != uid. sequence # is only per session. Other than that it looks okay.

                  Implementation Note:
                  The cached list of messages will not contain the headers and the body. Headers and the body are needed only during RETR. They will be lazy loaded.

                  ONLY the following are needed throughout the POP session. All other attributes should be lazy loaded
                  1. messageID
                  2. sequenceID#
                  3. isDeleted

                  When DELE 2 is done, fetch the message (from the cached list) whose sequenceID# is 2 and call setIsDeleted(true) on that instance of messageData.

                  The code in MailboxServiceImpl will look like this

                  public List getMailListForFolder(Folder folder)
                  {
                      // Query the DB and create a list of MessageData (this list will NOT contain the headers)
                      // Set the sequenceID# for the messages
                      // Cache the list: Cache.put(mailboxID, list)
                      return list;
                  }

                  public void markDeleted(Mailbox box, int num)
                  {
                      // Get the list from the cache: Cache.get(box.getId())
                      // Assumes the list index equals the sequenceID#; POP3 numbers start at 1
                      MessageData messageData = (MessageData) list.get(num);
                      messageData.setIsDeleted(true);
                      // No queries are executed; this is an in-memory operation only
                  }

                  public void deleteMarked(Mailbox box)
                  {
                      // Get the list from the cache
                      // Collect the messages for which isDeleted is true
                      // Execute 1 SQL statement which deletes all of those mails
                      // Clean the cache: Cache.remove(box.getId())
                  }
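
The "1 SQL" step in deleteMarked could look something like the sketch below. It is only an illustration: `BatchDeleteSketch` is a made-up name, and the table and key column are passed in as parameters because the real schema is not shown in this thread. The JDBC calls used (`prepareStatement`, `setLong`, `executeUpdate`) are the standard ones.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch only: table and column names are supplied by the caller.
class BatchDeleteSketch {
    // Builds e.g. "DELETE FROM t WHERE id IN (?, ?, ?)" for three IDs.
    static String deleteSql(String table, String keyColumn, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("nothing to delete");
        }
        StringBuilder sql = new StringBuilder("DELETE FROM ").append(table)
                .append(" WHERE ").append(keyColumn).append(" IN (");
        for (int i = 0; i < count; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    // Runs the delete as one statement; returns the number of rows removed.
    static int deleteAll(Connection con, String table, String keyColumn, List<Long> ids)
            throws SQLException {
        if (ids.isEmpty()) {
            return 0;   // nothing was marked, so skip the query entirely
        }
        String sql = deleteSql(table, keyColumn, ids.size());
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (int i = 0; i < ids.size(); i++) {
                ps.setLong(i + 1, ids.get(i));   // JDBC parameters are 1-based
            }
            return ps.executeUpdate();
        }
    }
}
```

Some databases cap the number of items in an IN list, so a session that deletes a very large number of mails may need to split the IDs into chunks.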

                  Please let me know if I am not clear anywhere.
                  Please submit a patch.

                  I am doing the cache implementation for my file-based implementation of MailboxServiceImpl. I don't think you can use it directly, but I thought posting the idea would be a nice thing to do.

                  • 6. Re: POP --> isDeleted
                    acoliver

                    describe your idea via code supplied via "cvs diff -u" :-)

                    • 7. Re: POP --> isDeleted
                      pilhuhn

                      Headers and bodies should not be stored together. Imagine clients issuing the TOP command to fetch a header. When headers and bodies are stored together, this would also fetch all attachments etc. Not a good idea.

                      • 8. Re: POP --> isDeleted
                        esriram

                        Given below is the extract from the POP3 RFC. It says that the header should be followed by the message body when a TOP command is given. This means that the headers and the body CAN be stored together.

                        TOP msg n

                        Arguments:
                        a message-number (required) which may NOT refer to to a
                        message marked as deleted, and a non-negative number
                        of lines (required)

                        Restrictions:
                        may only be given in the TRANSACTION state

                        Discussion:
                        If the POP3 server issues a positive response, then the
                        response given is multi-line. After the initial +OK, the
                        POP3 server sends the headers of the message, the blank
                        line separating the headers from the body, and then the
                        number of lines of the indicated message's body, being
                        careful to byte-stuff the termination character (as with
                        all multi-line responses).

                        Note that if the number of lines requested by the POP3
                        client is greater than than the number of lines in the
                        body, then the POP3 server sends the entire message.

                        Possible Responses:
                        +OK top of message follows
                        -ERR no such message

                        Examples:
                        C: TOP 1 10
                        S: +OK

                        S: <the POP3 server sends the headers of the
                        message, a blank line, and the first 10 lines
                        of the body of the message>
                        S: .
                        ...
                        C: TOP 100 3
                        S: -ERR no such message
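
To make the quoted behaviour concrete, here is a small sketch of how the lines of a TOP response could be assembled. It assumes the message is available as one CRLF-delimited string, and `TopSketch` is a made-up name rather than anything from the code base. The terminating "." line is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the TOP response lines for a message held as one CRLF-delimited string.
class TopSketch {
    static List<String> top(String rawMessage, int bodyLines) {
        if (bodyLines < 0) {
            throw new IllegalArgumentException("line count must be non-negative");
        }
        // Drop one trailing CRLF so it is not counted as an extra empty line.
        String text = rawMessage.endsWith("\r\n")
                ? rawMessage.substring(0, rawMessage.length() - 2)
                : rawMessage;
        String[] lines = text.split("\r\n", -1);
        List<String> out = new ArrayList<>();
        int i = 0;
        // Headers run up to the first empty line.
        while (i < lines.length && !lines[i].isEmpty()) {
            out.add(stuff(lines[i++]));
        }
        // The blank line separating the headers from the body.
        if (i < lines.length) {
            out.add("");
            i++;
        }
        // Then at most bodyLines lines of the body.
        for (int sent = 0; sent < bodyLines && i < lines.length; sent++, i++) {
            out.add(stuff(lines[i]));
        }
        return out;
    }

    // Byte-stuffing: a line starting with "." gets one more "." prepended.
    private static String stuff(String line) {
        return line.startsWith(".") ? "." + line : line;
    }
}
```

With a line count of 0 the result is just the headers and the blank line. Because this version takes the whole message as a string, it says nothing about how much the store has to read to produce it.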


                        • 9. Re: POP --> isDeleted
                          pilhuhn

                          The client is allowed to not request any line of the body. I think this is quite common when only displaying a header list. E.g. when connected from a mobile device over GSM or such.

                          • 10. Re: POP --> isDeleted

                             

                            The current implementation stores the headers and the body separately. Don't know why.


                            Basically the store implementation is designed for scalability with respect to space (i.e. memory) rather than out-and-out performance. We store the body separately in order to achieve partial writes. If we stored them together in the same column, we would only be able to achieve this on Oracle. MySQL doesn't support efficient partial reading/writing of blobs and PostgreSQL has a separate API for large objects.

                            Mike

                            • 11. Re: POP --> isDeleted
                              acoliver

                              not exclusively. I also want the DB to efficiently search the headers w/o search engine stuff.

                              • 12. Re: POP --> isDeleted
                                gohip

                                So as not to reinvent the wheel, does anyone have the SQL code to fully delete emails from the database?

                                I.e., when a user or mailbox is deleted, all their associated email should be deleted. I was wondering if anyone already has the SQL code that could accomplish this. I will keep searching the forums though. I would look in the code, but the last time I spoke with Andrew, he said mails were never fully expunged and that we needed a routine to do this periodically.

                                • 13. Re: POP --> isDeleted
                                  gohip

                                  Actually, I found the email blob cleanup code, which looks as if it deletes orphaned emails that don't exist in messagedata. I imagine I can just delete a mailbox's messagedata entries and then run the SQL code for the "cleanup", so that the cleanup actually has something to clean up.

                                  Just trying to delete a user's data, alias, and mailbox, all in one "swoop".

                                  • 14. Re: POP --> isDeleted
                                    acoliver

                                    Guys, when such descriptions get so verbose I think one thing...PATCHES PLEASE!!! If it takes me a lot of time to understand it and it isn't IMAP efficiency, API or admin (my primary concerns ATM) then it will possibly get lost AND IT SHOULDN'T, and I KNOW that the other JBMS guys (as opposed to JBCS on the whole) are the same way! This sounds like a simpler fix with less lines of code than the actual thread!
