
    HAJNDI/Clustering Bug in 4.0.x?

      Possible Bug in JBoss 4.0.0 and JBoss 4.0.1

      Tested with:
      JBoss 4.0.0 and JBoss 4.0.1
      Redhat ES 3.0 Update 3
      JDK 1.4.2_06

      Please be patient as you read through all of this. I've tried to provide as much information as possible so that we can work towards a solution quickly.

      We have a system which is distributed over 3 separate clusters. During various processes, we require messages to be sent from one cluster to another. We are using HAJNDI to send and receive the JMS messages. However, any HAJNDI lookup against a remote cluster fails. No matter what provider URL we supply when creating the InitialContext for the lookup, the lookup is always performed against the local cluster! We get a NameNotFoundException on the sending node, and if we hot-deploy the MDB that was waiting on the destination cluster, the previously sent message is immediately received on the sending node. There is no activity on the destination cluster. This configuration worked perfectly in JBoss 3.2.5.

      If we add our own clusters on top of a shared DefaultPartition, the messages are of course sent and received successfully. However, we need completely separate clusters (DefaultPartitions as well as our own partitions), because we have MDBs that we only want deployed on certain clusters we have created, and we don't want to share the same singleton DestinationManager that runs on the DefaultPartition across all of our nodes. As a result, we've set up 3 different DefaultPartitions, each using a different UDP multicast IP configured in cluster-service.xml.

      Is this working as intended? Is HAJNDI designed to ONLY work within the same cluster? (it was working in earlier versions of JBoss) If you would like more detail, please read through the various config, code and log file excerpts below. If you need more clarification or more info, please ask.

      We set up a test environment with 2 different DefaultPartitions, Server1 and Server2, distinguished from each other by multicast IPs. Neither node is multihomed. (If the DefaultPartitions are the same, this example works.) The only thing we change is the multicast IP for the DefaultPartition on Server2. When the two separate clusters are running, each has its own DestinationManager service, as expected. The sender is an MBean running on Server1 with a method that accepts a String parameter. This parameter is the provider URL (192.168.129.49:1100 in this example; Server2's IP) used to create the InitialContext for the lookup of the destination Queue on Server2. The destination is an MDB bound to the JNDI name "queue/TestingMDB". When we invoke the method in our MBean (Server1), it tries to connect to the local HAJNDI port, not Server2's HAJNDI port. We end up seeing the following error on Server1:

      ERROR [testing.QueueTester] NamingException in QueueTester.sendMessage(): queue/TestingMDB
      Nothing is logged on Server2 when this fails. If we change the provider URL to 192.168.129.49:1099, everything works as planned (except that no high availability is involved on Server2's cluster).

      From cluster-service.xml on the sender node:
      <mbean code="org.jboss.ha.framework.server.ClusterPartition"
             name="jboss:service=DefaultPartition">
        ...
        <UDP mcast_addr="228.1.2.3" mcast_port="45566" ... />


      From cluster-service.xml on the destination node:
      <mbean code="org.jboss.ha.framework.server.ClusterPartition"
             name="jboss:service=DefaultPartition">
        ...
        <UDP mcast_addr="228.1.2.4" mcast_port="45566" ... />


      MBean test code that is sending the JMS message:
      (providerUrl has been verified at runtime to contain the correct IP address and port of the remote destination, i.e. 192.168.129.49:1100)

      import java.util.Properties;
      import javax.jms.*;      // Queue, QueueConnection, QueueConnectionFactory, JMSException
      import javax.naming.*;   // Context, InitialContext, NamingException

      // Instance fields populated by connect() and used later by the send code
      private QueueConnection queueConnection;
      private Queue queue;

      public void connect(String providerUrl) throws NamingException, JMSException
      {
          Properties props = new Properties();
          props.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
          props.put("java.naming.factory.url.pkgs", "org.jnp.interfaces");
          props.put(Context.PROVIDER_URL, providerUrl);
          Context context = new InitialContext(props);

          // Look up the managed connection factory for the queue
          QueueConnectionFactory factory =
              (QueueConnectionFactory) context.lookup("UIL2XAConnectionFactory");

          // Create a connection to the JMS provider
          queueConnection = factory.createQueueConnection();

          // Look up the destination we want to send to
          queue = (Queue) context.lookup("queue/TestingMDB");
      }

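      For completeness, the sending half that the error above comes out of looks roughly like this. It is a sketch rather than the exact class; it assumes the queueConnection and queue fields set up in connect() above:

      public void sendMessage(String text) throws JMSException
      {
          QueueSession session = null;
          QueueSender sender = null;
          try
          {
              // Non-transacted, auto-acknowledged session on the connection from connect()
              session = queueConnection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
              sender = session.createSender(queue);
              sender.send(session.createTextMessage(text));
          }
          finally
          {
              if (sender != null) sender.close();
              if (session != null) session.close();
          }
      }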

      Test MDB on the destination node, verified to be bound to the JNDI name "queue/TestingMDB":

      public void onMessage(Message _incMessage)
      {
          Logger log = Logger.getLogger(this.getClass());
          try
          {
              TextMessage message = (TextMessage) _incMessage;
              // Logged at ERROR level only so it stands out in server.log during the test
              log.error("JMS TEXT MESSAGE RECEIVED!: " + message.getText());
          }
          catch (Exception e)
          {
              log.error("Exception caught in TestingMDBBean.onMessage(): " + e.getMessage());
          }
      }

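      In case it helps rule out a packaging problem on our side, the binding of the MDB to "queue/TestingMDB" is declared roughly like this in its jboss.xml (a sketch; "TestingMDB" as the ejb-name is just what we use here and has to match the entry in ejb-jar.xml):

      <jboss>
        <enterprise-beans>
          <message-driven>
            <ejb-name>TestingMDB</ejb-name>
            <!-- JNDI name of the JMS destination the bean listens on -->
            <destination-jndi-name>queue/TestingMDB</destination-jndi-name>
          </message-driven>
        </enterprise-beans>
      </jboss>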

      Both JBoss instances have had run.conf modified to include the following JAVA_OPTS:

      JAVA_OPTS="-server -Xms128m -Xmx384m -Djava.awt.headless=true -Djboss.bind.address=[**IPofServer**]"
      JAVA_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n $JAVA_OPTS"

      Thank you for reading all of this and I hope we can work this out, whatever the problem may be.


        • 1. Re: HAJNDI/Clustering Bug in 4.0.x?

          When separating partitions in JBoss 4.0.x, one needs to have different Multicast IP's AND different partition names. Otherwise the partitions will not separate.

          Thank you to Adrian and the jboss support staff.

          • 2. Re: HAJNDI/Clustering Bug in 4.0.x?
            belaban

            No, OR:

            one needs to have different Multicast IP's **OR** different partition names.

            To separate clusters, you can either
            - change *all* partition names OR
            - change the mcast IP address OR
            - change the mcast port OR
            - all of the above
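
            To make that concrete, here is a sketch of what the Server2 side of the excerpts above could look like once the clusters are fully separated. The attribute and element names are recalled from the stock 4.0.x cluster-service.xml, so treat it as illustrative; "Server2Partition" and the second mcast_port are made-up example values. Note that "*all* partition names" means every clustered MBean in cluster-service.xml that carries a PartitionName attribute (HAJNDI, HASessionState, etc.) must be changed to the same new value; the MBean ObjectName itself can stay as it is:

            <mbean code="org.jboss.ha.framework.server.ClusterPartition"
                   name="jboss:service=DefaultPartition">
              <!-- Give the second cluster its own client-visible identity ... -->
              <attribute name="PartitionName">Server2Partition</attribute>
              <attribute name="PartitionConfig">
                <Config>
                  <!-- ... and/or its own multicast address and/or port -->
                  <UDP mcast_addr="228.1.2.4" mcast_port="45567" ... />
                  ...
                </Config>
              </attribute>
            </mbean>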

            • 3. Re: HAJNDI/Clustering Bug in 4.0.x?

              Here's the thing.

              We tried changing only the ports first. The problem persisted.

              Then we tried changing only the Multicast IP's. The problem persisted.

              Then we tried changing only the Partition Names. The problem persisted.

              Then we tried calling JBoss support. Adrian told us to change the Partition Names in order for the changes to take effect. We tried changing the Multicast IP's AND the Partition Names, and the problem was solved.

              I'm really not trying to argue. People make mistakes, and we could have missed a port, an IP, or a name. I find this unlikely as we were testing on 2 boxes, but not impossible. I guess all I'm saying is "we tried that".

              Again, I'm thankful for JBoss support in helping us resolve this issue.

              • 4. Re: HAJNDI/Clustering Bug in 4.0.x?

                Just to clarify, the confusion comes from the different views of the cluster.

                JGroups maintains the group membership. You can create different groups
                by changing the partition name AND/OR the multicast address/port.

                A J2EE client, however, is unaware of JGroups. It just sees the partition name.

                So if you just change the multicast address/port and expect one cluster to
                act as a client of the other it is going to get very confused if they are both
                called "DefaultPartition".
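
                On the client side there is also a knob for this: the HA-JNDI InitialContext can be told which partition it should be talking to via the standard HA-JNDI client properties. A minimal sketch, assuming those properties (jnp.partitionName, jnp.disableDiscovery); the class and method names here are made up for illustration:

                import java.util.Properties;
                import javax.naming.Context;
                import javax.naming.InitialContext;
                import javax.naming.NamingException;

                public class PartitionAwareLookup
                {
                    // Hypothetical helper: pins an HA-JNDI lookup to one named partition.
                    public static Object lookup(String providerUrl, String partitionName, String jndiName)
                        throws NamingException
                    {
                        Properties props = new Properties();
                        props.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
                        props.put(Context.URL_PKG_PREFIXES, "org.jnp.interfaces");
                        props.put(Context.PROVIDER_URL, providerUrl);   // e.g. "192.168.129.49:1100"
                        // Restrict HA-JNDI auto-discovery to the named partition, so a failed
                        // bootstrap cannot silently fall back to whichever cluster answers the
                        // local discovery multicast ...
                        props.put("jnp.partitionName", partitionName);  // e.g. "Server2Partition"
                        // ... or switch auto-discovery off entirely and rely on the URL alone:
                        // props.put("jnp.disableDiscovery", "true");
                        return new InitialContext(props).lookup(jndiName);
                    }
                }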

                • 5. Re: HAJNDI/Clustering Bug in 4.0.x?

                  Thank you for clarifying. I can see now that, as worded, my first reply isn't complete or correct.

                  Cheers!