9 Replies Latest reply on Oct 8, 2011 6:15 AM by ataylor

    HornetQ behavior on network outage

    mayankmit2002

      Hello,

       

       

      We are using HornetQ (2.2.5) in JBoss AS 6.1.0.

       

      Our test consists of 2 server nodes running JBoss AS in clustered mode and HornetQ in active-backup mode.

      We are testing failover in the case of a network outage.

       

      Server Side code

      -------------------------

      Simple clustered SLSB.

      It exposes a method that triggers some messages.

      In the finally block we flush those messages, which are fired on a JMS Topic.

       

      Client side code

      -----------------------

      The client is a simple JMS client listening to those messages. Here we are using a provider URL (port 1100) consisting of the IPs of both servers.

       

      I tried two cases and am facing some issues.

       

      Case1::   Shared store

      =================

       

      Following are my configurations.

       

      Shared Store: true. Both nodes are sharing a samba location on my 3rd machine.

      Clustered: false

       

      Node2 is my backup node.

       

      Now if I pull out the cable from Node1, my JMS client hangs for 1 minute and then drops the connection, but then gets connected again.

      This behavior does not seem to be aligned with what is specified in the HornetQ documentation.

       

       

      Case2::  Local store

      ===============

       

      Following are my configurations.

       

      Shared Store: false. Both nodes are using their

      Clustered: false

       

      Node2 is my backup node.

       

      Now if I pull out the cable from Node1, HornetQ on Node2 starts throwing exceptions.

        • 1. Re: HornetQ behavior on network outage
          clebert.suconic

          Case 1: How are you creating the clients? What's the configuration? If clients are inactive, they will need some time to realize the server is gone, based on pings and pongs.
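
To illustrate why an idle client needs time to notice a dead server, here is a minimal sketch of ping-based failure detection. It is a simplified model and not HornetQ's actual implementation; the class `PingFailureDetector`, its method names, and the 30-second check period are assumptions made for this example.

```java
/** Simplified model of ping-based failure detection (hypothetical, not HornetQ code). */
final class PingFailureDetector {
    private final long checkPeriodMillis;
    private long lastDataReceivedMillis;

    PingFailureDetector(long checkPeriodMillis, long nowMillis) {
        this.checkPeriodMillis = checkPeriodMillis;
        this.lastDataReceivedMillis = nowMillis;
    }

    /** Record that a message or pong arrived from the peer. */
    void onDataReceived(long nowMillis) {
        lastDataReceivedMillis = nowMillis;
    }

    /** True once the peer has been silent for at least one full check period. */
    boolean isPeerConsideredDead(long nowMillis) {
        return nowMillis - lastDataReceivedMillis >= checkPeriodMillis;
    }

    /**
     * Time of the first periodic check (checks run at multiples of the check
     * period) at which a peer that went silent after lastDataMillis is reported dead.
     */
    static long detectionTimeMillis(long lastDataMillis, long checkPeriodMillis) {
        long deadline = lastDataMillis + checkPeriodMillis;
        long ticks = (deadline + checkPeriodMillis - 1) / checkPeriodMillis; // round up
        return ticks * checkPeriodMillis;
    }

    public static void main(String[] args) {
        // Last data arrived at t = 10 s, then the cable is pulled; checks run every 30 s.
        System.out.println(detectionTimeMillis(10_000, 30_000) + " ms");
    }
}
```

In this model, the check at 30 s still sees only 20 s of silence, so the failure is first reported at the 60 s check. An idle client cannot react any sooner than its check period allows.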

           

          Case 2: I'm not sure what exception you are seeing. But you need a backup node for failover.

          • 2. Re: HornetQ behavior on network outage
            mayankmit2002

            Hello Clebert,

             

            Case 1

            =====

            I have attached the configuration files from both nodes and the client code too.

             

            Case 2

            =====

            I've attached the files in the "Case2" archive.

            Here I've provided the server stack trace and the logs from the server and from the clients. Please use the client-side code from Case 1.

             

            The only change in the configuration here is :

            On Node2, the value of the <backup> tag is set to false in hornetq-configuration.xml.

             

            I still don't understand why the server is unable to look up JmsXA on Node2 once Node1 gets disconnected.

            • 3. Re: HornetQ behavior on network outage
              ataylor

              I'm finding it difficult to understand what you are saying here; it all sounds a bit confused. Can you please explain clearly what you are trying to achieve and what you are actually seeing?

               

              A couple of points, though.

              You shouldn't be creating message listeners inside MDBs, as it's not allowed.

              Clustered should be set to true if you want failover to occur.

              One node should be a backup (backup=true).

              Also, when you pull out the network cable the client will hang, as TCP will try to reconnect you.

              • 4. Re: HornetQ behavior on network outage
                ataylor

                Also, your client looks like it's opening a new connection every time, which is an anti-pattern; this wouldn't play well with failover.
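
As a rough illustration of the difference, the sketch below contrasts opening a connection per message with reusing a single connection. `FakeConnection` and `CountingFactory` are hypothetical stand-ins invented for this example; they are not the JMS or HornetQ API, and a real sender would hold on to a JMS connection and session instead.

```java
import java.util.Arrays;
import java.util.List;

/** Contrasts a connection per message with one reused connection (hypothetical types, not the JMS API). */
final class ConnectionReuseSketch {

    /** Stand-in for an expensive messaging connection. */
    static final class FakeConnection implements AutoCloseable {
        int sent;
        void send(String message) { sent++; }
        @Override public void close() { /* nothing to release in this sketch */ }
    }

    /** Stand-in for a connection factory that counts how many connections it opened. */
    static final class CountingFactory {
        private int opened;
        FakeConnection createConnection() { opened++; return new FakeConnection(); }
        int openedCount() { return opened; }
    }

    /** Anti-pattern: opens and closes a connection for every message. */
    static void sendEachOnNewConnection(CountingFactory factory, List<String> messages) {
        for (String message : messages) {
            try (FakeConnection connection = factory.createConnection()) {
                connection.send(message);
            }
        }
    }

    /** Preferred: opens one connection and reuses it for all messages. */
    static void sendAllOnOneConnection(CountingFactory factory, List<String> messages) {
        try (FakeConnection connection = factory.createConnection()) {
            for (String message : messages) {
                connection.send(message);
            }
        }
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("a", "b", "c");
        CountingFactory perMessage = new CountingFactory();
        sendEachOnNewConnection(perMessage, messages);
        CountingFactory reused = new CountingFactory();
        sendAllOnOneConnection(reused, messages);
        System.out.println(perMessage.openedCount() + " vs " + reused.openedCount());
    }
}
```

For three messages the first variant opens three connections and the second opens one. A long-lived connection is also the thing a failover mechanism can reattach to a backup, which a throwaway connection per message does not give it much chance to do.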

                • 5. Re: HornetQ behavior on network outage
                  mayankmit2002

                  In simple words, we want to achieve HA with HornetQ without shared storage/SAN.

                   

                  The system we have to work with consists of only 2 machines (no third machine, except client machine(s)) with JBoss 6 running on them (with HornetQ).

                   

                  What we expect from the system:

                  1. The failover should be seamless.

                  2. There should be no message loss during failover.

                  3. Messages fired during failover should be received by the client with a maximum delay of 5 sec.

                   

                  Now, the system consists of:

                  1. A SLSB ( a Facade)

                  2. Another SLSB (SLSB1) is injected into the facade.

                  3. If a client calls any method of the facade, it should fire an event within the scope of JTA on a JMS Queue, which will be listened to by an MDB (upon transaction completion). If the transaction gets rolled back, those events will be automatically discarded.

                  4. A message received by the MDB should invoke the particular callbacks registered with it and could fire some event on a JMS Topic.

                  5. Upon invocation of the callback, the messages received by the MDB are posted on the JMS Topic.

                  6. JMS client(s) (on the client machine) listen to the JMS Topic.

                  7. Once an event reaches the client, it will be delegated to the respective handler.
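
The behavior in step 3 (events become visible only on commit and are dropped on rollback) can be modeled with a small buffer. This is only a sketch under assumptions: `TransactionalEventBuffer` is a hypothetical class invented here, and in the real system this job is done by JTA and the transacted JMS session, not by application code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers events until commit; a hypothetical model of transactional sends, not JTA itself. */
final class TransactionalEventBuffer {
    private final List<String> pending = new ArrayList<>();
    private final Consumer<String> publisher;

    TransactionalEventBuffer(Consumer<String> publisher) {
        this.publisher = publisher;
    }

    /** Queue an event; nothing is visible to consumers yet. */
    void fire(String event) {
        pending.add(event);
    }

    /** Deliver all queued events and leave the buffer empty for the next transaction. */
    void commit() {
        List<String> toPublish = new ArrayList<>(pending);
        pending.clear();
        toPublish.forEach(publisher);
    }

    /** Drop all queued events without delivering them. */
    void rollback() {
        pending.clear();
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        TransactionalEventBuffer tx = new TransactionalEventBuffer(delivered::add);
        tx.fire("lost");
        tx.rollback();   // "lost" is never delivered
        tx.fire("kept");
        tx.commit();     // "kept" is delivered now
        System.out.println(delivered);
    }
}
```

Only the event fired in the committed transaction reaches the publisher, which is the guarantee the MDB in step 3 relies on.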

                   

                  This is all we want to achieve using HornetQ and JBoss 6.

                   

                  To achieve this I tried the active-backup configuration (with a shared location) of HornetQ, but it takes almost 30 seconds to complete its failover.

                  Later we found that the shared location could be a single point of failure.

                  So we had to set this configuration aside and then thought of configuring a clustered environment, but as per the documentation, a SAN is recommended. As that is too costly an option, we had to drop this too.

                   

                  So we tried an option (Case 3) which is somewhere between Case 1 and Case 2. With this, we have both nodes in active mode with no shared storage. This suits our requirement of two systems.

                   

                  With Case 3, we are experiencing the reported issue(s), i.e. the issues in Case2.zip.

                  • 6. Re: HornetQ behavior on network outage
                    mayankmit2002

                    Andy,

                    Here is the code snippet of my client. Can you tell me where it is an anti-pattern?

                     

                     

                    import java.util.Properties;

                    import javax.jms.ConnectionFactory;
                    import javax.jms.ExceptionListener;
                    import javax.jms.JMSException;
                    import javax.jms.Message;
                    import javax.jms.MessageListener;
                    import javax.jms.ObjectMessage;
                    import javax.jms.Session;
                    import javax.jms.Topic;
                    import javax.jms.TopicConnection;
                    import javax.jms.TopicSession;
                    import javax.jms.TopicSubscriber;
                    import javax.naming.Context;
                    import javax.naming.InitialContext;
                    import javax.naming.NamingException;

                    import org.jnp.interfaces.NamingContext;

                    public class TestMessageListener implements MessageListener, ExceptionListener {

                        // Written by JMS callback threads and read by the polling loop, hence volatile.
                        private volatile boolean isJMSConnected;
                        private ConnectionFactory mConnectionFactory;
                        private TopicConnection mTopicConnection;

                        public TestMessageListener() {
                            // Poll every 5 seconds and (re)connect whenever the connection is marked as down.
                            while (true) {
                                if (!isJMSConnected) {
                                    lookup();
                                }
                                try {
                                    Thread.sleep(5000L);
                                } catch (InterruptedException anException) {
                                    // do nothing
                                }
                            }
                        }

                        public static void main(String[] args) {
                            new TestMessageListener();
                        }

                        @Override
                        public void onMessage(Message aMessage) {
                            try {
                                ObjectMessage objectMessage = (ObjectMessage) aMessage;
                                String string = (String) objectMessage.getObject();
                                System.out.println("Message received from server::: " + string);
                            } catch (JMSException exception) {
                                System.err.println("Exception occurred......");
                                isJMSConnected = false;
                                exception.printStackTrace();
                            }
                        }

                        // Looks up the connection factory and topic via JNDI, then subscribes this listener.
                        private synchronized void lookup() {
                            try {
                                final Properties props = new Properties();
                                props.put(Context.INITIAL_CONTEXT_FACTORY,
                                        "org.jnp.interfaces.NamingContextFactory");
                                props.put(Context.URL_PKG_PREFIXES,
                                        "org.jboss.naming:org.jnp.interfaces");

                                props.put("jnp.sotimeout", "1000");
                                props.put("sun.rmi.transport.tcp.readTimeout", "1000");
                                props.put("socketTimeout", "1000");

                                props.put("jnp.timeout", "1000");
                                props.put("timeout", "1000");
                                props.put(Context.PROVIDER_URL, "Node1:1100,Node2:1100");
                                props.put(NamingContext.JNP_DISABLE_DISCOVERY, "true");
                                Context context = new InitialContext(props);

                                System.out.println("Trying to lookup:: Messaging.....  ServerMessageListenerSimulator");
                                mConnectionFactory = (ConnectionFactory) context.lookup("ConnectionFactory");
                                Topic topic = (Topic) context.lookup("topic/TestTopic");
                                mTopicConnection = (TopicConnection) mConnectionFactory.createConnection();

                                TopicSession topicSession = (TopicSession) mTopicConnection
                                        .createSession(false, Session.AUTO_ACKNOWLEDGE);

                                TopicSubscriber topicSubscriber = (TopicSubscriber) topicSession
                                        .createConsumer(topic);
                                topicSubscriber.setMessageListener(this);
                                mTopicConnection.setExceptionListener(this);

                                mTopicConnection.start();

                                isJMSConnected = true;
                                System.out.println("Message listener simulator is ready to receive events from server.");
                                // System.in.read();

                            } catch (NamingException exception) {
                                System.err.println("Unable to start Messaging "
                                        + this.getClass().getSimpleName());
                            } catch (JMSException exception) {
                                closeConnectionQuietly();
                                System.err.println("Unable to start Messaging "
                                        + this.getClass().getSimpleName());
                                exception.printStackTrace();
                            }
                        }

                        @Override
                        public void onException(JMSException anArg0) {
                            closeConnectionQuietly();
                            // Mark the connection as lost so the polling loop in the constructor reconnects.
                            isJMSConnected = false;
                            System.err.println("Exception occurred......");
                            anArg0.printStackTrace();
                        }

                        // Closes the current connection, if any, and only logs a failure to close.
                        private void closeConnectionQuietly() {
                            if (mTopicConnection != null) {
                                try {
                                    mTopicConnection.close();
                                } catch (JMSException exception) {
                                    System.err.println("exception occurred while closing connection"
                                            + this.getClass().getSimpleName());
                                }
                            }
                        }
                    }

                    • 7. Re: HornetQ behavior on network outage
                      ataylor

                      In simple words, we want to achieve HA with HornetQ without shared storage/SAN.

                      HornetQ doesn't support HA without shared storage; that will be available in the next version when we re-enable replication.

                       

                      As for your anti-pattern, the EventEmitter class is opening a new connection for every message sent; you should reuse connections and sessions. Also, you should not use a message listener in a JEE environment.

                      • 8. Re: HornetQ behavior on network outage
                        mayankmit2002

                        Andy Taylor wrote:

                         

                        In simple words, we want to achieve HA with HornetQ without shared storage/SAN.

                        HornetQ doesn't support HA without shared storage; that will be available in the next version when we re-enable replication.

                        So, for which version is this feature planned, or when will it be available?

                         

                         

                        Andy Taylor wrote:

                         

                         

                        As for your anti-pattern, the EventEmitter class is opening a new connection for every message sent; you should reuse connections and sessions. Also, you should not use a message listener in a JEE environment.

                        Thanks for pointing out what I'm doing wrong in my server-side code.

                        But I would like to say that none of the message listeners is implemented in a JEE environment; we are using MDBs everywhere.

                        In EventEmitter we are just firing events, not listening for them.

                         

                        The only place where we are using a MessageListener is at the client end, which is a non-JEE environment.

                        But my question still stands: why am I unable to look up JmsXA after failover? In other words, after getting disconnected from Node1, why is Node2 unable to deploy/start JmsXA? This is what I understand from the exception stating "No route to host". The most interesting part is that if I now start Node1 again, the exception goes away and JmsXA is available on Node2.

                        This behavior is totally beyond my understanding.

                        Apart from this behavior, there is no other issue with two non-active-backup, non-clustered HornetQ instances running within a JBoss 6 cluster.

                        • 9. Re: HornetQ behavior on network outage
                          ataylor

                          I'm not really sure I understand what you are doing with regard to your listener class; you haven't told me the topology etc. However, the JmsXA connection factory is the app server's managed connection factory and is only available in-VM, not remotely.