6 Replies Latest reply on Jun 25, 2012 4:43 PM by clebert.suconic

    HORNETQ-950 - external big data storage design thread

    clebert.suconic

      This thread is about https://issues.jboss.org/browse/HORNETQ-950

       

       

      I had some talks with a few folks about this, and from what I understood, they would like to have the following features:

       

      (Note: since it's not yet defined which big-data product we would use, MongoDB or anything similar, I will just call it big-data-store.)

       

      - Zero node-to-node communication. Store the data, and that's it: any node can read from there. There is no need for the nodes to talk to each other once the data is placed on the big-data-store. We will still need some notification between the servers telling them to read data from the store once it has been placed there.

       

      - Backward play, or whatever you want to call this feature. Once you create a queue, messages previously sent to the address should also be available on that queue.

      The implication is that we need to define:

      - when to delete a message from the data store?

      - when to associate a message with a new queue? Big data sometimes means **big**, meaning you could go back over a long period of time. How far back should we go, and how would we define which messages belong to the queue?

        • 1. Re: HORNETQ-950 - external big data storage design thread
          clebert.suconic

          Also: if you have a single queue distributed among different nodes, what will happen when a message is consumed on one of them?

           

          I understood from a talk I had with this user that they want a message to be acked on any node, but I'm not sure how that could possibly work.

          • 2. Re: HORNETQ-950 - external big data storage design thread
            connie.yang

            A message can be deleted when

            • its consumers have received it and sent back an ACK, or
            • the queue or address that this message belongs to has no consumer or subscription, and the message has reached the end of its retention period.  A message retention policy can be defined when the queue or topic is created.
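The two deletion rules above can be written down as a single predicate. This is only a minimal sketch under stated assumptions: `StoredMessage`, `can_delete`, and all of their fields and parameters are hypothetical names made up for illustration, and none of this is HornetQ API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredMessage:
    """Hypothetical record kept in the big-data-store for one message."""
    sent_at: float                                    # time the message was stored, in seconds
    expected_acks: set = field(default_factory=set)   # consumers the message was delivered to
    received_acks: set = field(default_factory=set)   # consumers that have sent back an ACK


def can_delete(msg, has_consumers, retention_seconds, now):
    """Return True when either of the two rules allows deletion."""
    # Rule 1: every consumer that received the message has ACKed it.
    if msg.expected_acks and msg.expected_acks <= msg.received_acks:
        return True
    # Rule 2: no consumer or subscription exists, and the retention period has passed.
    if not has_consumers and now - msg.sent_at >= retention_seconds:
        return True
    return False
```

Note that under this reading a message with active consumers is never removed by retention alone; whether that is the intended behaviour is one of the open questions in this thread.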

             

            In addition to storing the messages in a persistence store, we should also have a metadata store which keeps track of the queue, topic and subscription information.  This way, any node in the system can access the information.

            • 3. Re: HORNETQ-950 - external big data storage design thread
              jbertram

              This is sounding more and more like the model we had with JBoss Messaging that we explicitly moved away from with HornetQ. 

               

              HORNETQ-950 says, "HornetQ currently keeps its messages either in memory or on its local storage. This limits other HornetQ processes running in the same cluster behind a vip to access these messages without using HornetQ core bridge. Also, we don't have a good way to scale horizontally via the core bridge."  I don't understand why using a "core" bridge limits horizontal scalability.  What's the use-case here?  We moved away from the centralized data-store to improve horizontal scalability.

               

              "A robust solution is to store these messages to an external persistence store (e.g. MongoDB, Cassandra or other NoSQL DBs), so that each HornetQ instance in a cluster can access these message and hence achieving the distributed messaging architecture."  How is this robust?  It adds a single point of failure.

              • 4. Re: HORNETQ-950 - external big data storage design thread
                clebert.suconic

                Connie Yang wrote:

                 

                A message can be deleted when

                • its consumers have received it and sent back an ACK, or
                • the queue or address that this message belongs to has no consumer or subscription, and the message has reached the end of its retention period.  A message retention policy can be defined when the queue or topic is created.

                 

                In addition to storing the messages in a persistence store, we should also have a metadata store which keeps track of the queue, topic and subscription information.  This way, any node in the system can access the information.

                That contradicts one of the requirements you gave me: being able to replay a queue, i.e. create a queue at a point in the past and receive messages that were sent long ago.

                 

                 

                Think of the Twitter case: I create a search and start to listen on it.

                 

                 

                If we did exactly like Twitter, we would have a different approach to messaging, where you would need an explicit operation to actually delete messages (i.e. like the user deleting a message), and an ACK would only mean: do not send me this message again.
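To make the difference concrete, here is a rough sketch of such a model, where an ACK only hides a message from the consumer that sent it and removal is a separate, explicit operation. `ReplayableLog` and its methods are hypothetical names invented for this illustration; nothing here is existing HornetQ behaviour.

```python
class ReplayableLog:
    """Toy message log: ack() never removes data, only delete() does."""

    def __init__(self):
        self._messages = {}   # message id -> body
        self._next_id = 0
        self._acked = {}      # consumer name -> set of message ids it has acked

    def send(self, body):
        msg_id = self._next_id
        self._messages[msg_id] = body
        self._next_id += 1
        return msg_id

    def ack(self, consumer, msg_id):
        # ACK means "do not send me this message again", nothing more.
        self._acked.setdefault(consumer, set()).add(msg_id)

    def delete(self, msg_id):
        # Explicit removal, like a user deleting a post.
        self._messages.pop(msg_id, None)

    def pending(self, consumer):
        # Everything still stored that this consumer has not acked, oldest first.
        acked = self._acked.get(consumer, set())
        return [(i, b) for i, b in sorted(self._messages.items()) if i not in acked]
```

With this shape, a consumer that shows up later still sees messages that other consumers have already acked, which is what makes replay possible, but it also means storage only shrinks when something calls `delete`.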

                • 5. Re: HORNETQ-950 - external big data storage design thread
                  connie.yang

                  #Justin, the use case here is to be able to reliably send messages to our consumers from any of the messaging nodes that sit behind a load balancer.  The selected delivery method can be a response to a pull request via HTTP or a push request.  In the push model, messages are "pushed" to the consumers via a callback endpoint (or websocket in the future).  Each node runs HornetQ in its embedded mode and can be scaled horizontally.

                   

                  While the core bridge implementation works well in a small deployment, it can become an operational maintenance concern, due to bridge configuration/mapping, if the environment grows.  Also, with messages that belong to a queue spread across the nodes, we can no longer guarantee data consistency when our consumer asks for it.  By moving the data to its own tier, each node will have a fair chance of accessing the data when it needs to.

                   

                  #Clebert, maybe I'm missing something, but I don't quite understand your comment.

                   

                  WRT replaying a queue, the use case I can think of is to be able to send messages to our consumers, including the messages that were received before the consumers subscribed to the queue or topic.

                   

                  I'm not sure what the details of the Twitter use case are, but one of our qualities of service focuses on reliable messaging, and the model we're striving for is the ability to keep the messages until our client or consumer wants us to delete them or a visibility timeout has passed.  This goes back to the point that we want to provide different messaging qualities of service for different needs.  For our best-effort real-time messaging, we rely on an HTTP 200 ACK as the indication for us to remove the messages.

                  • 6. Re: HORNETQ-950 - external big data storage design thread
                    clebert.suconic

                    @Connie, what you call subscribing to a topic is broken into multiple operations in HornetQ:

                     

                     

                    I - Create a queue on an address

                    II - Attach a consumer to the queue

                    III - Receive messages from the consumer

                     

                     

                    That translates well into any JMS (or non-JMS) concept.

                     

                     

                    In HornetQ, we start to route messages at the moment the queue is created (whether or not a consumer is attached to it).
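The three steps and the routing rule can be illustrated with a toy model. This is a simplified sketch, not the HornetQ API: `Address`, `Consumer`, and their methods are hypothetical names used only to show that routing depends on the queue existing, not on a consumer being attached.

```python
from collections import deque


class Address:
    """Toy address: routes each sent message to every queue bound at send time."""

    def __init__(self):
        self.queues = {}

    def create_queue(self, name):
        # Step I: routing to this queue starts now, consumer or not.
        self.queues[name] = deque()
        return self.queues[name]

    def send(self, body):
        # Messages sent while no queue exists are simply not routed anywhere.
        for queue in self.queues.values():
            queue.append(body)


class Consumer:
    """Steps II and III: attach to a queue, then receive from it."""

    def __init__(self, queue):
        self.queue = queue

    def receive(self):
        # Returns None when the queue is empty.
        return self.queue.popleft() if self.queue else None
```

In this model a message sent before `create_queue` never reaches the queue, while a message sent after it is held there until a consumer attaches and receives it.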

                     

                     

                    In your case, as far as I remember, you could create a subscription (or create a queue) and start receiving messages from an earlier point, similar to the Twitter use case I described to you.
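One possible way to picture that requirement is a queue whose backlog is seeded from messages the address has retained since a chosen point in time. The sketch below assumes messages are kept with a send timestamp and arrive in time order; `RetainedAddress` and `create_queue_from` are hypothetical names, and this is not something HornetQ does today.

```python
import bisect


class RetainedAddress:
    """Toy address that retains every message together with its send timestamp."""

    def __init__(self):
        self._timestamps = []   # stays sorted because sends arrive in time order
        self._bodies = []

    def send(self, body, timestamp):
        # Assumption: timestamps are non-decreasing across calls.
        self._timestamps.append(timestamp)
        self._bodies.append(body)

    def create_queue_from(self, start_timestamp):
        """Return the backlog a queue created 'at' start_timestamp would begin with."""
        first = bisect.bisect_left(self._timestamps, start_timestamp)
        return list(self._bodies[first:])
```

How far back `start_timestamp` is allowed to go is exactly the retention question raised at the start of the thread.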