2 Replies Latest reply on Mar 25, 2010 12:01 AM by bjchip

    dual-homed 4.2.3  cannot get discovery on 2nd nic

    bjchip

      Situation is this:

       

      Two dual-homed servers, with eth0 named in the hosts file and eth1 not named.

       

      Call the two servers laurel and hardy.

       

      eth0 is assigned to unconnected subnets, i.e. in hosts on laurel:

       

      192.168.1.32  laurel

       

      and on hardy

       

      192.168.2.31  hardy

       

      So they can't be asked to cluster on eth0, and they won't do discovery there either.

       

      When queried with ifconfig, both admit to having an eth1 assigned.

       

      eth1 is all good: 192.168.7.1 on laurel and 192.168.7.2 on hardy.

       

      Testing with an independent and very simple program verifies that these two addresses can see each other, can be used by this user, and can exchange multicast successfully.
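
      (The check was essentially a tiny sender/receiver pair along these lines. This is a from-memory sketch rather than the exact program; the class name and arguments are just illustrative. Run it in receive mode on one box and send mode on the other, binding each to its eth1 address.)

      // McastCheck.java -- hypothetical reconstruction of the multicast smoke test
      import java.net.DatagramPacket;
      import java.net.InetAddress;
      import java.net.InetSocketAddress;
      import java.net.MulticastSocket;
      import java.net.NetworkInterface;

      public class McastCheck {
          // args: send|receive  <mcast-addr>  <port>  <local-iface-addr>
          public static void main(String[] args) throws Exception {
              String mode = args[0];
              InetAddress group = InetAddress.getByName(args[1]);
              int port = Integer.parseInt(args[2]);
              InetAddress local = InetAddress.getByName(args[3]);
              NetworkInterface nic = NetworkInterface.getByInetAddress(local);

              MulticastSocket sock = new MulticastSocket(port);
              sock.setNetworkInterface(nic);                          // force outgoing traffic onto eth1
              sock.joinGroup(new InetSocketAddress(group, port), nic);

              if (mode.equals("send")) {
                  byte[] payload = ("hello from " + local.getHostAddress()).getBytes("UTF-8");
                  sock.send(new DatagramPacket(payload, payload.length, group, port));
                  System.out.println("sent");
              } else {
                  byte[] buf = new byte[1500];
                  DatagramPacket p = new DatagramPacket(buf, buf.length);
                  sock.receive(p);                                    // blocks until a datagram arrives
                  System.out.println("got: " + new String(p.getData(), 0, p.getLength(), "UTF-8"));
              }
              sock.close();
          }
      }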

       

      On the command line, using -b 0.0.0.0 gets the servers up, but they don't see each other: they bind to the eth0 address and see nothing.

       

      Using -Dignore.bind.address=true and setting bind_addr=192.168.7.1 in cluster-service.xml on laurel (and 192.168.7.2 on hardy) doesn't work either.
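
      (To be clear, by "setting bind_addr" I mean adding it to the UDP element of the partition's JGroups stack in deploy/cluster-service.xml, roughly as below on laurel, with the rest of the shipped protocol stack untouched. The mcast_addr/mcast_port shown are just the stock defaults as I remember them.)

      <Config>
         <UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}"
              mcast_port="45566"
              bind_addr="192.168.7.1"/>
         <!-- remaining protocols as shipped -->
      </Config>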

       

      Nor does the command line run.sh  -b 0.0.0.0 -Dbind.address=192.168.7.1

       

      In no case does discovery occur.

       

      Configuring over TCP, however, works.

       

      I am at wits' end here.  This setup "just works" in single-homed setups all over.  It works in dual-homed setups with explicit TCP clustering or discovery configured.  It works (apparently) in dual-homed setups elsewhere, but the ordering of the NICs is not known in any of those cases.

       

      Moreover, the embedded HornetQ managed to discover its counterpart on the other side with only a little help (setting the local.bind.address).

       

      The JBoss cluster gives no indication of being able to find anything on the other side.  RMI binds to the named (first) NIC.  Everything else binds where it should (it seems).

       

       

      respectfully

      BJ

        • 1. Re: dual-homed 4.2.3  cannot get discovery on 2nd nic
          brian.stansberry

          Does this work?

           

          -b 0.0.0.0 -Djgroups.bind_addr=192.168.7.1

           

           If not, check the server.log files; JGroups also logs to STDOUT information about how it's binding sockets, so you can see what interface it's trying to use.
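
           (For example, something along these lines against the log of whichever server configuration you're running; the path here assumes the "all" config:)

           grep -iE "jgroups|bind" $JBOSS_HOME/server/all/log/server.log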

           

          If it's using the correct interface, then try the diagnostic tests linked to in the "Further troubleshooting" section of http://community.jboss.org/wiki/TestingJBoss . Make sure you test the same multicast address and ports you've configured JBoss AS to use; I've seen machines configured to route part of the class D address space over eth0 and the rest over eth1.
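
           (From memory, the tests linked there are the JGroups multicast sender/receiver classes; the invocation is roughly the following, using the jgroups.jar that ships with the AS. 228.1.2.3:45566 is just the stock default; substitute whatever your partition is actually configured with.)

           On hardy:   java -cp jgroups.jar org.jgroups.tests.McastReceiverTest -mcast_addr 228.1.2.3 -port 45566 -bind_addr 192.168.7.2
           On laurel:  java -cp jgroups.jar org.jgroups.tests.McastSenderTest -mcast_addr 228.1.2.3 -port 45566 -bind_addr 192.168.7.1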

          • 2. Re: dual-homed 4.2.3  cannot get discovery on 2nd nic
            bjchip

             OK... I didn't try your way exactly, but I did use something like it.  An observation (from the past week or so of beating my head against this wall and reading everything I can find) finally clicked: in most of the places where people appear to succeed, they set things via the command-line options.

             

             Soooo I created a command-line feeder which looks very like this, experimenting with an unmodified "all" config so that I could bounce the machine faster and cleaner (our fully installed system is a lot heavier).  The names have been changed because I am a paranoid sort of guy.

             

            -startMyCluster.sh-

             

            run.sh  -b 0.0.0.0 -g MyCluster -c myCluster -u 230.0.0.8  -Dbind.address=192.168.7.1 -Dhornetq.remoting.netty.host=192.168.7.1

             

             - and with a small hint to HornetQ in its own config file, it all appears to work properly.  I say "appears" because I only managed to get to this point yesterday, and we're still checking whether it actually works.
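
             For anyone else who lands here, my reading of what those flags end up setting (based on poking through run.sh in 4.2.x, so treat this as my interpretation rather than documentation):

             -b 0.0.0.0                                  -> jboss.bind.address (bind services on all interfaces)
             -g MyCluster                                -> jboss.partition.name
             -c myCluster                                -> which server configuration to start
             -u 230.0.0.8                                -> jboss.partition.udpGroup (the cluster multicast address)
             -Dbind.address=192.168.7.1                  -> a plain system property; something in our configs still references the older bind.address name
             -Dhornetq.remoting.netty.host=192.168.7.1   -> the host the embedded HornetQ Netty transport binds to

             The "small hint" to HornetQ was pinning its broadcast and discovery groups to the eth1 address in hornetq-configuration.xml, something along these lines (element names per the HornetQ docs; the group address and port here are placeholders, not our real values):

             <broadcast-group name="bg-group1">
                <local-bind-address>192.168.7.1</local-bind-address>
                <group-address>231.7.7.7</group-address>
                <group-port>9876</group-port>
                <broadcast-period>5000</broadcast-period>
                <!-- connector-ref etc. as in the stock config -->
             </broadcast-group>

             (and the matching local-bind-address on the discovery-group.)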

             

             The annoying thing about the environment we got is that it wasn't built this way on purpose, but it is about as difficult as any can be for this purpose.

             

             Now, everything I can observe leads me to believe that setting this up through the system properties ALMOST works... the logs all say it's all good... but it isn't, because each server always comes back as the only one in the cluster.  Something is not reading one of the properties unless it is set on the command line.  What, where, and how, I am no longer interested in.  I found an answer.  I'll live with the funky command-line kluge :-)

             

             I'd vastly prefer that the app developers got their stuff working on 5.1.GA and let me work with the more coherent system... but not quite yet.  I am supposed to get that for our next installation; that is then and this is now.

             

             I suspect, from the changes in place when things finally worked, that there was something about the default multicast address.  I finally used a simple client/server to verify multicast was working.  The other possibility is that the admin I was working with had left multicast disabled and simply re-enabled it when I mentioned I was looking at it.  One can never be sure unless they give you root.  :-)

             

            respectfully

            BJ