8 replies. Latest reply on Jan 27, 2003 1:52 AM by Craig Day

    Don't be speed obsessed

    Craig Day Newbie

      Everyone seems to be saying: concentrate on performance and scalability first and forget about fail-over for now. This seems strange, as I bet hardly any of you have a real requirement for a high-speed, high-throughput messaging system with thousands of clients. If I could find a JMS solution that gave me clustered/federated message routers with guaranteed fail-over and a maximum message rate of 50 msgs/sec, I'd be very happy. Don't go out and build a JMS solution that goes like a bat out of hell but then behaves erratically and incorrectly when a router node goes down. Making it fast is the easy problem; solve the hard problem, reliability and correctness under duress.

      c

        • 1. Re: Don't be speed obsessed
          Joel Vogt Master

          Well, there is always a trade-off, but I tend to agree. I think the problem with the current JMS is reliability. You can get it working well, but only with a lot of extra code, and there seems to be a steep learning curve.

          • 2. Re: Don't be speed obsessed
            Bela Ban Master


            The reliability part can be solved by using JavaGroups, which is already used successfully by the clustering subsystem.


            JG is an ideal candidate for the topics side of JMS; for queues I suggest TCP (JavaGroups is geared towards point-to-multipoint).
            Bela
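Bela's split rests on the two JMS delivery models: a topic fans each message out to every subscriber (point-to-multipoint), while a queue hands each message to exactly one receiver (point-to-point). The minimal in-memory sketch below illustrates only those semantics. `ToyTopic` and `ToyQueue` are hypothetical names, there is no networking, and this is not the JavaGroups or JBossMQ API; round-robin receiver selection is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model: shows delivery semantics only, no networking.
class ToyTopic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }

    // Point-to-multipoint: every subscriber receives its own copy.
    void publish(String msg) {
        for (Consumer<String> s : subscribers) s.accept(msg);
    }
}

class ToyQueue {
    private final List<Consumer<String>> receivers = new ArrayList<>();
    private int next = 0;

    void addReceiver(Consumer<String> receiver) { receivers.add(receiver); }

    // Point-to-point: each message goes to exactly one receiver
    // (round-robin is an assumption of this sketch).
    void send(String msg) {
        if (receivers.isEmpty()) throw new IllegalStateException("no receivers");
        receivers.get(next).accept(msg);
        next = (next + 1) % receivers.size();
    }
}
```

The point-to-multipoint case is the one a group-communication toolkit handles naturally, which is why the queue side may be better served by plain point-to-point connections.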

            • 3. Re: Don't be speed obsessed
              Abid Farooqui Newbie

              Hi Craig,
              Although I agree with you in general, I would say that thinking users don't need speed in a MOM-based product is quite one-sided. Hiram and I, for instance, worked on a MOM-based product last year where component distribution was achieved via JMS/MOM, and the transfer rate of large messages (up to 2 MB) that were also durable or persistent directly affected the performance of the product.
              If the JBoss JMS implementation can handle not 1000 but 250 clients, and can transfer not thousands but only 50 msgs/sec (persistent, guaranteeing delivery) as you suggest, but where each message is several hundred KB or even a MB, that would be OK. Does it do that today?
              In real business products, messages can easily reach several hundred KB. Consider an XML-based, EDI-like set of documents being transferred between trading partners.
              Reliability is, of course, very, very important in a MOM implementation. That goes without saying. To achieve reliability, however, you don't need to START with a federated failover solution. That is too complex to start with, unless one is simple-minded and does not anticipate all the inherent problems that come with an asynchronous and synchronous MOM implementation.

              In a MOM implementation, the first objective should be to achieve
              1) fast
              2) reliable
              3) guaranteed
              message delivery in a non-failover scenario, assuming that by failover you mean a machine or a JVM going down.
              If the JVM goes down, persistent or durable messages will still be delivered, but only later, when the service comes back up. There is a delay, but not a failure to deliver; that is what asynchronous messaging is for. However, if the message was not persistent/durable, then the user is essentially saying that this message can be lost on failure, so that should be an expected result. No MOM should lose persistent or even non-persistent messages under run-of-the-mill conditions. If it does, the MOM is simply not designed well, and that has no bearing on failover.
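The persistent/non-persistent distinction above can be illustrated with a toy broker that journals persistent messages to disk before accepting them and keeps non-persistent ones in memory only. This is a minimal sketch under stated assumptions, not how JBossMQ stores messages: `ToyBroker` is hypothetical, message bodies are assumed to be single lines, and handing a message out of `receive()` is treated as its acknowledgement.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy broker: persistent messages are appended to a journal file
// and synced before send() returns; non-persistent messages exist only in this
// JVM's memory. Assumes single-line message bodies.
class ToyBroker {
    private static final class Entry {
        final String body;
        final boolean persistent;
        Entry(String body, boolean persistent) { this.body = body; this.persistent = persistent; }
    }

    private final Path journal;
    private final Deque<Entry> pending = new ArrayDeque<>();

    ToyBroker(Path journal) {
        this.journal = journal;
        try {
            // Recovery on "restart": whatever is still journalled survived the last JVM.
            if (Files.exists(journal)) {
                for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
                    pending.add(new Entry(line, true));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void send(String body, boolean persistent) {
        if (persistent) {
            try {
                Files.write(journal, List.of(body), StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                        StandardOpenOption.SYNC);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        pending.add(new Entry(body, persistent));
    }

    // Returns the next message or null; a delivered persistent message is
    // removed from the journal.
    String receive() {
        Entry next = pending.poll();
        if (next == null) return null;
        if (next.persistent) rewriteJournal();
        return next.body;
    }

    private void rewriteJournal() {
        List<String> remaining = new ArrayList<>();
        for (Entry entry : pending) if (entry.persistent) remaining.add(entry.body);
        try {
            Files.write(journal, remaining, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Constructing a second `ToyBroker` on the same journal plays the role of the restarted JVM: the persistent message is delivered late, and the non-persistent one is gone, which is the expected result described above.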

              The second phase should focus on federated design and failover. Failing JVMs have nothing to do with guaranteed delivery when durable/persistent messages are used.

              Abid Farooqui

              • 4. Re: Don't be speed obsessed
                Craig Day Newbie

                I guess in the area I work in (financial markets), the problem we always eventually come up against is the single point of failure. The scenario is typically as follows:

                I need to send messages to a market quickly, reliably, guaranteed and in order, so the first thing you do is make the message queues persistent. With most MOMs I've played with, the persistent queues live on a single node in the cluster, and that node is solely responsible for managing the durable subscriptions to that queue. The problem no MOM currently solves is that when that node goes down, message delivery for that queue is delayed until the node comes back up. That's fine IF THE NODE COMES BACK UP.

                If the machine has died, I have to accept a messy recovery scenario where I don't know the state of the queue and its messages. I will probably lose messages or deliver duplicates, or at the very least have to write a whole lot of messy client code to flip to secondary queues or similar.

                I need a MOM that manages the durable subscriptions across a cluster of message server nodes, so that at any time any node in the cluster can reliably know the state of the queues and durable subscriptions and receive/deliver on behalf of the clients. You need 2-phase commit of every messaging operation across all nodes in the cluster.

                c
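The two-phase commit Craig asks for can be sketched for a single enqueue: every node first stages the operation and votes, and the coordinator commits everywhere only if all votes are yes; otherwise it aborts the nodes that had already prepared. `ToyNode` and `TwoPhaseCoordinator` are hypothetical names, and this shows only the shape of the protocol. A real implementation would also have to log the prepared state and the coordinator's decision durably and handle timeouts and coordinator failure, none of which appears here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant: one message-server node's copy of a queue.
class ToyNode {
    final String name;
    final List<String> queue = new ArrayList<>();
    private String staged;
    boolean healthy = true;   // set to false to simulate a node that cannot prepare

    ToyNode(String name) { this.name = name; }

    // Phase 1: stage the operation and vote yes/no.
    boolean prepare(String msg) {
        if (!healthy) return false;
        staged = msg;
        return true;
    }

    // Phase 2 (commit): make the staged operation visible.
    void commit() {
        if (staged != null) {
            queue.add(staged);
            staged = null;
        }
    }

    // Phase 2 (abort): discard the staged operation.
    void abort() { staged = null; }
}

class TwoPhaseCoordinator {
    // Returns true only if every node voted yes and the enqueue was committed on all of them.
    static boolean enqueue(List<ToyNode> nodes, String msg) {
        List<ToyNode> prepared = new ArrayList<>();
        for (ToyNode node : nodes) {
            if (node.prepare(msg)) {
                prepared.add(node);
            } else {
                prepared.forEach(ToyNode::abort);   // one "no" vote aborts everyone
                return false;
            }
        }
        nodes.forEach(ToyNode::commit);
        return true;
    }
}
```

Because a message is either on every node's queue or on none, any surviving node can continue delivery after a failure, which is the property described above.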

                • 5. Re: Don't be speed obsessed
                  Bela Ban Master

                  > You need 2-phase commit of every messaging operation across
                  > all nodes in the cluster.

                  We will provide something similar in the Cache/JBoss project. All nodes participate in a locking protocol: if one node fails to acquire the lock, the transaction is aborted.

                  This may be interesting for you guys too.

                  Bela
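The locking protocol Bela describes can be sketched as all-or-nothing lock acquisition: try to lock the resource on every node, and if any node's lock is unavailable, release the locks already taken and abort. This is an illustration under assumptions, not the JBoss Cache API: `LockingNode` and `AllOrNothingLocker` are hypothetical, each node's lock is modelled as a local binary `Semaphore`, and locks are tried without waiting, so a conflicting transaction causes an abort rather than a deadlock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical cluster node guarding one value with a binary, non-reentrant lock.
class LockingNode {
    final Semaphore lock = new Semaphore(1);
    int value;
}

class AllOrNothingLocker {
    // Try to lock every node; if any lock is unavailable, abort. The finally
    // block releases every lock this call acquired, on both commit and abort.
    static boolean update(List<LockingNode> nodes, int newValue) {
        List<LockingNode> held = new ArrayList<>();
        try {
            for (LockingNode node : nodes) {
                if (!node.lock.tryAcquire()) {
                    return false;               // one node refused: abort the transaction
                }
                held.add(node);
            }
            for (LockingNode node : nodes) node.value = newValue;
            return true;
        } finally {
            for (LockingNode node : held) node.lock.release();
        }
    }
}
```

Trying locks without blocking keeps the example simple; the cost is that contended updates abort and must be retried by the caller.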

                  • 6. Re: Don't be speed obsessed
                    Paul Smith Newbie

                    IMHO, I agree with craigday; I would reorder the priorities that farooqui posted to be:

                    1)reliable
                    2)guaranteed
                    3)fast

                    There's no point in being fast if the other two are not met.

                    I don't think many (if any) users of a JMS product are really looking for load-balanced behaviour (very nice to have, and it helps tremendously with the speed issue), but they are incredibly interested in fail-over situations.

                    Perhaps an interim "Clustered Fail-Over JMS" solution would REQUIRE a JDBC persistence store that the cluster uses (forcing users of the product to provide their own clustering of the JDBC datastore), with the cluster automatically assigning one node to be the JMS node.
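This interim design needs a rule for which node acts as the JMS node. One simple convention, assumed here purely for illustration, is to pick the oldest live member of the cluster's membership view; because all queue state sits in the shared JDBC store, the next-oldest member can take over when that node leaves. `MembershipView` is a hypothetical class, not a JavaGroups API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical membership view: members kept in join order (oldest first).
class MembershipView {
    private final List<String> members = new ArrayList<>();

    void join(String node) {
        if (!members.contains(node)) members.add(node);
    }

    void leave(String node) { members.remove(node); }

    // Assumed convention: the oldest live member hosts the JMS service. Queue
    // state lives in the shared JDBC store, so a newly chosen node can reload it.
    Optional<String> jmsNode() {
        return members.isEmpty() ? Optional.empty() : Optional.of(members.get(0));
    }
}
```

A real deployment would also need to stop two nodes from both acting as the JMS node after a network partition, for example by also holding a lock in the shared database; the sketch ignores that.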

                    • 7. Re: Don't be speed obsessed
                      John Fawcett Newbie

                      In addition to the big three mentioned (reliable, guaranteed, fast), I think accessibility (meaning connectivity to a diversity of clients) is also critical. I too work in the financial industry, and the common problem we have is conducting asynchronous communication among nodes that speak different languages or, more importantly in the case of JBossMQ, aren't Java clients.

                      Another problem we often run up against is distributed nodes, e.g. a client with several offices that are "network-distant" from each other, where we still need to convey messages between clients at the various locations.
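For network-distant offices, a common pattern is store-and-forward: each site keeps an outbox and forwards messages over the WAN link, in order, whenever the link is available. The sketch below is a hypothetical bridge, not a JBossMQ feature. `remoteLink` stands in for whatever transport actually crosses the WAN and is assumed to return `false` when a send fails; the outbox is held in memory here, so a real bridge would also need to persist it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical store-and-forward bridge between two offices.
class StoreAndForwardBridge {
    private final Deque<String> outbox = new ArrayDeque<>();
    private final Predicate<String> remoteLink;   // returns false if the WAN send failed

    StoreAndForwardBridge(Predicate<String> remoteLink) { this.remoteLink = remoteLink; }

    void send(String msg) {
        outbox.addLast(msg);
        flush();
    }

    // Forward in FIFO order; stop at the first failure so ordering is preserved,
    // and leave the rest queued for the next attempt.
    void flush() {
        while (!outbox.isEmpty()) {
            if (!remoteLink.test(outbox.peekFirst())) return;
            outbox.removeFirst();
        }
    }

    int backlog() { return outbox.size(); }
}
```

Local clients can keep sending while the link is down; the backlog drains in the original order once `flush()` succeeds again.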

                      • 8. Re: Don't be speed obsessed
                        Craig Day Newbie

                        I guess what that boils down to is having a properly documented wire protocol, or at least an implementation in something like C/C++ that is more compile-portable. Interestingly, though, doing this makes it hard to do the interceptor/dynamic proxy thing on the client, which I gather might be the approach in the rewrite. I like the idea of getting security/transactions almost for free with interceptors, but that ain't going to work in C. I don't know how far the rewrite has got, but I've been teasing some people at work about writing a JMS provider myself. Can someone give some feedback on where the rewrite is at and what it's looking like?

                        cheers
                        craig
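A "properly documented wire protocol" mostly means a byte layout that a C or C++ client can implement without a JVM. As a hypothetical example (not JBossMQ's actual protocol), a frame could be one type byte followed by two length-prefixed UTF-8 fields, each length a 4-byte big-endian int. `Frame`, `WireCodec` and the `SEND` code are invented for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical frame: [type:1 byte][destLen:4][dest bytes][bodyLen:4][body bytes]
final class Frame {
    final byte type;
    final String destination;
    final String body;

    Frame(byte type, String destination, String body) {
        this.type = type;
        this.destination = destination;
        this.body = body;
    }
}

final class WireCodec {
    static final byte SEND = 1;   // hypothetical frame type code

    static byte[] encode(Frame frame) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(frame.type);
            writeField(out, frame.destination);
            writeField(out, frame.body);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // not expected for in-memory streams
        }
    }

    static Frame decode(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            byte type = in.readByte();
            String destination = readField(in);
            String body = readField(in);
            return new Frame(type, destination, body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // e.g. a truncated frame
        }
    }

    private static void writeField(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);   // big-endian, so a C client can decode it with ntohl()
        out.write(bytes);
    }

    private static String readField(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

`DataOutputStream.writeUTF` is deliberately avoided: it uses a 2-byte length and Java's modified UTF-8, which a non-Java client would have to special-case.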