5 Replies Latest reply on Jan 6, 2004 6:09 PM by jason.greene

    Message Selector Performance in Persistent P2P Queue DB Map

    jason.greene

      I was looking at using JMS as a replacement technology for provisioning. One of the requirements is the ability to requeue messages at a lower priority, because events can often be retried. There is also the possibility that the external system the data is being replicated to could be down, resulting in a large amount of database queuing. At first I thought message selectors were a perfect solution for this, because I could add a custom priority property and then configure MDB pools for each priority level.

      I did notice though, unless I missed something obvious, that there is no way to map a property to an actual column in the database, and all properties must be serialized within the messageblob. This presents a performance problem and an ordering problem. The performance problem is that without the ability to index a property, all records in the queue must be scanned even if only one record matches the selector. The ordering problem is that if the one message that matches a selector is the last record in database order, then all records before it must be processed first.

      The only way I can see to do this is to not use selectors, and instead create multiple queues using the priority as a component of the queue name (QueueP1, QueueP2, QueueP3). This is not a very elegant solution, though, because it dramatically increases the number of queues to manage.
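To make the multiple-queue idea concrete, here is a minimal in-memory sketch. It does not use the JMS API or any JBoss class; `PriorityQueueRouter` and its methods are hypothetical names, and it assumes QueueP1 is the highest priority, so a retry moves to the next higher number.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a set of JMS queues named by priority level.
class PriorityQueueRouter {
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();

    PriorityQueueRouter(int levels) {
        for (int p = 1; p <= levels; p++) {
            queues.put(queueNameFor(p), new ArrayDeque<>());
        }
    }

    // The priority is a component of the queue name: 1 -> "QueueP1".
    static String queueNameFor(int priority) {
        return "QueueP" + priority;
    }

    void send(int priority, String payload) {
        Deque<String> queue = queues.get(queueNameFor(priority));
        if (queue == null) {
            throw new IllegalArgumentException("no queue for priority " + priority);
        }
        queue.addLast(payload);
    }

    // Requeue a failed event one level lower; it stays on the last queue once there.
    void requeueLower(int currentPriority, String payload) {
        int next = Math.min(currentPriority + 1, queues.size());
        send(next, payload);
    }

    // Each consumer reads exactly one queue, so no selector scan is needed.
    String receive(int priority) {
        Deque<String> queue = queues.get(queueNameFor(priority));
        return queue == null ? null : queue.pollFirst();
    }

    public static void main(String[] args) {
        PriorityQueueRouter router = new PriorityQueueRouter(3);
        router.send(1, "create-account");
        router.requeueLower(1, "create-account (retry)");
        System.out.println(router.receive(1));
        System.out.println(router.receive(2));
    }
}
```

The cost is the one noted above: every priority level adds another queue (and consumer pool) to configure and monitor.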

      Has the ability to map properties to columns with the JDBC2 Persistence Manager ever been considered?

      Thanks,
      Jason

        • 1. Re: Message Selector Performance in Persistent P2P Queue DB
          genman

          In JBoss 3.2/3.0, every message that exists on disk has a so-called memory reference as well. This reference holds a subset of the fields, such as priority, message ID, expiration time, etc. This means that if you have 1,000,000 messages in some queue, JBoss holds on to 1,000,000 references in memory.

          Thus, there is little point (with this model) to index anything on the DB, since the data isn't accessed in any special order.

          Note also that in JBoss, messages with higher priority are always ordered before those of lower priority, regardless of their age. If you do use a message selector, JBoss will do a full scan of the queue, but it's quite fast (even for 1,000,000 records) since it is done in memory. (There is probably room for further optimizations, as I expect there are corner cases that do badly.)
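The ordering rule described above can be sketched with a comparator over the in-memory references. This is an illustration only, not JBoss code: `Ref` is a hypothetical stand-in for the memory reference, and it assumes a larger priority number means higher priority (as in JMS) and that `messageId` grows with arrival order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ReferenceOrdering {
    // Hypothetical in-memory reference holding only a subset of the fields.
    static final class Ref {
        final long messageId;
        final int priority;

        Ref(long messageId, int priority) {
            this.messageId = messageId;
            this.priority = priority;
        }
    }

    // Higher priority first; among equal priorities, older (smaller id) first.
    static final Comparator<Ref> DELIVERY_ORDER =
        Comparator.comparingInt((Ref r) -> r.priority).reversed()
                  .thenComparingLong(r -> r.messageId);

    static List<Long> deliveryOrder(List<Ref> refs) {
        List<Ref> sorted = new ArrayList<>(refs);
        sorted.sort(DELIVERY_ORDER);
        List<Long> ids = new ArrayList<>();
        for (Ref r : sorted) {
            ids.add(r.messageId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Ref> refs = Arrays.asList(new Ref(1, 4), new Ref(2, 9), new Ref(3, 4));
        System.out.println(deliveryOrder(refs)); // prints [2, 1, 3]
    }
}
```

Message 2 arrives after message 1 but is delivered first because its priority is higher, which is the "regardless of their age" behaviour.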

          Anyway, there is little to gain by breaking up properties into their own columns, unless you need to use this data from a separate SQL query.

          • 2. Re: Message Selector Performance in Persistent P2P Queue DB
            jason.greene

             

            "genman" wrote:
            If you do use a message selector, JBoss will do a full scan of the queue, but it's quite fast (even for 1,000,000 records) since it is done in memory. (There is probably room for further optimizations, as I expect there are corner cases that do badly.)


            I took a look at the source and it appears that this is not the case. While the fields you mentioned are cached, they are not used for the actual selector. When a P2P queue receive operation is invoked, the queue is scanned linearly and a selector is run against the message headers.

            MessageReference m = (MessageReference) i.next();
            if (s.test(m.getHeaders()))
                selection.add(m.getMessage());
            


            This getHeaders() call not only loads the full headers, but also loads the entire message.

            It appears the problem is acknowledged in the source:

            /**
             * We could optimize caching by keeping the headers but not the body.
             * The server will uses the headers more often than the body and the
             * headers take up much message memory than the body
             *
             * For now just return the message.
             */
            public SpyMessage.Header getHeaders() throws JMSException
            {
                return getMessage().header;
            }
            


            This means that unless the message cache is large enough to hold the entire table, a selector will always perform a massive scan and deserialization of the queue.
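To illustrate the cost, here is a small simulation of that scan. It is a sketch, not the JBoss implementation: `Store`, `Message`, and `MessageReference` are simplified stand-ins, the `retry` header is made up, and the store is assumed to have no message cache at all (the worst case), so every `getMessage()` counts as one full load.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FullLoadScan {
    static final class Message {
        final Map<String, Object> header;
        final byte[] body;

        Message(Map<String, Object> header, byte[] body) {
            this.header = header;
            this.body = body;
        }
    }

    // Stand-in for the persistent table; counts full message loads.
    static final class Store {
        private final Map<Long, Message> rows = new HashMap<>();
        int loads = 0;

        void save(long id, Message m) { rows.put(id, m); }

        Message load(long id) {
            loads++;
            return rows.get(id);
        }
    }

    static final class MessageReference {
        private final long id;
        private final Store store;

        MessageReference(long id, Store store) {
            this.id = id;
            this.store = store;
        }

        Message getMessage() { return store.load(id); }

        // Same shape as the quoted source: headers come from the full message.
        Map<String, Object> getHeaders() { return getMessage().header; }
    }

    static List<Message> receive(List<MessageReference> queue,
                                 Predicate<Map<String, Object>> selector) {
        List<Message> selection = new ArrayList<>();
        for (MessageReference m : queue) {
            if (selector.test(m.getHeaders())) {
                selection.add(m.getMessage());
            }
        }
        return selection;
    }

    // Builds a queue where only the last message matches, then counts loads.
    static int loadsForSingleMatch(int queueSize) {
        Store store = new Store();
        List<MessageReference> queue = new ArrayList<>();
        for (long id = 0; id < queueSize; id++) {
            Map<String, Object> header = new HashMap<>();
            header.put("retry", id == queueSize - 1);
            store.save(id, new Message(header, new byte[16]));
            queue.add(new MessageReference(id, store));
        }
        receive(queue, h -> Boolean.TRUE.equals(h.get("retry")));
        return store.loads;
    }

    public static void main(String[] args) {
        System.out.println(loadsForSingleMatch(1000)); // prints 1001
    }
}
```

With 1,000 queued messages and a single match, the scan performs 1,000 loads to test the headers plus one more to fetch the selected message.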

            "genman" wrote:

            Anyway, there is little to gain by breaking up properties to their own columns, unless you need to use this data from a separate SQL query.


            It appears that a message header cache would be a simpler solution to implement, and will most likely bring a comparable performance increase. However, with this solution, header data and body data may need to be stored separately to save deserialization costs. Also, there is still an upper limit which is tied to the available memory in the system.

            Separate queues seem to be the only way to go at this time.

            Jason

            • 3. Re: Message Selector Performance in Persistent P2P Queue DB
              jason.greene

              Sorry I forgot to enable BBCode in the last post....

              Jason

              • 4. Re: Message Selector Performance in Persistent P2P Queue DB
                genman


                You're right, it does have to fetch the entire message from the DB to check the headers.

                Optimally, it should be able to page out the message body and the message headers to the DB as separate columns. You would then create a separate data structure that gets persisted to a different database column, and you would have to modify the persistence manager to fetch those parts separately.
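A rough sketch of the separate-column idea, using plain Java serialization. The `Row` class, its two blob fields, and the `provisioningPriority` header are hypothetical; the actual persistence manager schema is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class SplitColumns {
    // Hypothetical table row: headers and body live in separate blob columns.
    static final class Row {
        final byte[] headerBlob;
        final byte[] bodyBlob;

        Row(byte[] headerBlob, byte[] bodyBlob) {
            this.headerBlob = headerBlob;
            this.bodyBlob = bodyBlob;
        }
    }

    static Row toRow(HashMap<String, Serializable> headers, byte[] body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(headers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Row(bytes.toByteArray(), body.clone());
    }

    // Deserializes the header column only; the body blob is never touched.
    @SuppressWarnings("unchecked")
    static Map<String, Serializable> readHeaders(Row row) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(row.headerBlob))) {
            return (Map<String, Serializable>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("corrupt header blob", e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Serializable> headers = new HashMap<>();
        headers.put("provisioningPriority", Integer.valueOf(2));
        Row row = toRow(headers, new byte[1024]);
        System.out.println(readHeaders(row));
    }
}
```

A selector scan over rows like this would pay only for the (usually small) header blob per message, and would fetch the body column just for the messages that match.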

                A less optimal approach would be to always store the message headers as part of the message reference. This would be a lot less coding and a lot less work, I think. The downside is that keeping the message headers might take up quite a bit of memory, especially if the properties were quite large. It might be a good idea to have a configuration setting for this feature.
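The headers-in-the-reference option, with a switch to turn it off, might look roughly like this. Again this is a simplified simulation, not JBoss code: the class names, the `keepHeaders` flag, and the `retry` header are assumptions, and the store has no message cache so each load is counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class HeaderCachingReference {
    static final class Message {
        final Map<String, Object> header;
        final byte[] body;

        Message(Map<String, Object> header, byte[] body) {
            this.header = header;
            this.body = body;
        }
    }

    // Stand-in for the persistent table; counts full message loads.
    static final class Store {
        private final Map<Long, Message> rows = new HashMap<>();
        int loads = 0;

        void save(long id, Message m) { rows.put(id, m); }

        Message load(long id) {
            loads++;
            return rows.get(id);
        }
    }

    static final class MessageReference {
        private final long id;
        private final Store store;
        // Copy of the headers kept in memory; null when the feature is off.
        private final Map<String, Object> cachedHeaders;

        MessageReference(long id, Store store, Map<String, Object> headers,
                         boolean keepHeaders) {
            this.id = id;
            this.store = store;
            this.cachedHeaders = keepHeaders ? new HashMap<String, Object>(headers) : null;
        }

        Message getMessage() { return store.load(id); }

        Map<String, Object> getHeaders() {
            return cachedHeaders != null ? cachedHeaders : getMessage().header;
        }
    }

    static List<Message> receive(List<MessageReference> queue,
                                 Predicate<Map<String, Object>> selector) {
        List<Message> selection = new ArrayList<>();
        for (MessageReference m : queue) {
            if (selector.test(m.getHeaders())) {
                selection.add(m.getMessage());
            }
        }
        return selection;
    }

    // Only the last message matches; returns how many full loads the scan needed.
    static int loadsForSingleMatch(int queueSize, boolean keepHeaders) {
        Store store = new Store();
        List<MessageReference> queue = new ArrayList<>();
        for (long id = 0; id < queueSize; id++) {
            Map<String, Object> header = new HashMap<>();
            header.put("retry", id == queueSize - 1);
            store.save(id, new Message(header, new byte[16]));
            queue.add(new MessageReference(id, store, header, keepHeaders));
        }
        receive(queue, h -> Boolean.TRUE.equals(h.get("retry")));
        return store.loads;
    }

    public static void main(String[] args) {
        System.out.println(loadsForSingleMatch(1000, true));  // prints 1
        System.out.println(loadsForSingleMatch(1000, false)); // prints 1001
    }
}
```

With the flag on, the scan touches the store only for the one matching message; the trade-off is the extra header copy held in memory per reference, which is why a configuration setting seems sensible.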

                If you're willing to contribute code, I'm sure it would be most welcome.

                • 5. Re: Message Selector Performance in Persistent P2P Queue DB
                  jason.greene

                  Sure, I'll work on a prototype and contact the lead author of this component. It shouldn't take too long to develop.

                  Jason