10 Replies Latest reply on Jun 16, 2003 8:47 PM by Kevin

    Problems clustering RH9 and Win2K.

    Kevin Newbie

      Hi all,

      I have followed the docs, as well as a short tutorial I found. I set up the Linux RH9 box and found that the route was missing, so I added the route information. I verified that the route info is the same on both Windows and Linux.

      When I run the startup run.bat -c all, it does load the /all folder, and I see it starting the clustering stuff on both.

      I am using purely defaults at this point. I am not deploying any app. I simply want to see that both servers join the same cluster. I verified with our IT manager that multicast is functional (not blocked) within our LAN.

      For some reason, neither one joins the other's cluster, no matter which way I start it.

      The Windows server shows the following (brief section):

      16:07:50,111 INFO [ClusterPartition] Connecting to channel
      16:07:50,121 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is localhost:2616
      -------------------------------------------------------
      16:07:52,174 INFO [ClusterPartition] Starting channel
      16:07:52,174 INFO [DefaultPartition] Number of cluster members: 1
      16:07:52,174 INFO [DefaultPartition] Other members: 0
      16:07:52,184 INFO [ClusterPartition] Started ClusterPartition: DefaultPartition

      16:07:52,184 INFO [ClusterPartition] Started
      16:07:52,184 INFO [HASessionStateService] Starting
      16:07:52,184 INFO [HASessionStateService] Started
      16:07:52,184 INFO [HANamingService] Starting
      16:07:52,234 INFO [HANamingService] Listening on 0.0.0.0/0.0.0.0:1100
      16:07:52,244 INFO [HANamingService$AutomaticDiscovery] Listening on /0.0.0.0:1102, group=230.0.0.4

      Everything is identical on the linux setup except the GMS shows:

      GMS: address is localhost:1028

      Again, I have left everything as is, defaults. From what I have read, this should work.

      I am looking into the cluster-service.xml file. Is there anything I should be changing here? I am reading that on Windows I want to set loopback to false. I also read that the bind_addr on Linux should be set to the IP of the network interface that other machines talk to. As far as I know, localhost should work in this case?

      I'd appreciate any thoughts on why they can't communicate together.

      Thanks.

        • 1. Re: Problems clustering RH9 and Win2K.
          Kevin Newbie

          Incidentally, I downloaded the javagroups stuff and ran the tests. I ran both the listener and the sender on the Linux box. It worked. I also ran the sender on the Windows box, and it worked. Oddly, when I send a string from the Windows box, both the sender and the listener on the Linux box show output. They do show the correct Windows box IP. But when I send on the Linux box, only the Linux box hears it. The Windows box sender doesn't show any output. What confuses me is why the Linux sender test shows that it receives a packet from the Windows box, and if that is correct functionality, why does the Windows box sender not show anything? I am also running the sender/receiver on both boxes at the same time. Both receivers do get info from both boxes, but still the sender on Linux also shows output. Any ideas (if it's that big of a deal)?

          Anyway, so multi-cast is working, all default values are in place. Still, the jboss servers don't join the same cluster. Any chance the GMS thing has something to do with it, since they are different numbers? They seem to be random numbers though.

          Again, thanks for any help.
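
For readers who want to repeat this kind of check without the javagroups test classes, here is a minimal sketch of a multicast probe in plain Java. It is not the javagroups sender/receiver test itself. The class name McastProbe is made up for illustration, the group 228.1.2.3 is the address mentioned later in this thread, and the port 45566 is an assumption that should be replaced with whatever port your configuration actually uses.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of JBoss or javagroups.
public class McastProbe {

    // Builds a UDP packet addressed to a multicast group; rejects non-multicast addresses.
    public static DatagramPacket buildProbe(String text, String group, int port) {
        try {
            InetAddress addr = InetAddress.getByName(group);
            if (!addr.isMulticastAddress()) {
                throw new IllegalArgumentException(group + " is not a multicast address");
            }
            byte[] data = text.getBytes(StandardCharsets.UTF_8);
            return new DatagramPacket(data, data.length, addr, port);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String group = "228.1.2.3"; // group address mentioned in this thread
        int port = 45566;           // ASSUMPTION: replace with your configured port
        try (MulticastSocket socket = new MulticastSocket(port)) {
            // joinGroup(InetAddress) is deprecated on recent JDKs but still available.
            socket.joinGroup(InetAddress.getByName(group));
            if (args.length > 0) {
                // If a message is given on the command line, send it once.
                socket.send(buildProbe(args[0], group, port));
            }
            byte[] buf = new byte[1024];
            while (true) {
                // Print every packet seen on the group, with the sender's IP.
                DatagramPacket in = new DatagramPacket(buf, buf.length);
                socket.receive(in);
                String msg = new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
                System.out.println(in.getAddress().getHostAddress() + ": " + msg);
            }
        }
    }
}
```

Run it with no arguments on one box to listen, and with a message argument on the other box to send. Because the sending process has also joined the group, it will normally print its own message as well.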

          • 2. Re: Problems clustering RH9 and Win2K.
            Kevin Newbie

            Crap, left out an important detail. JBoss 3.2.1 is what I am using on both machines.

            • 3. Re: Problems clustering RH9 and Win2K.
              Kevin Newbie

              So I have one more thought on this. In Linux we add the 224.0.0.0 route with a 240.0.0.0 mask. Windows also shows this, and as I said, I can multicast. However, I see the IP is 228.1.2.3, and the listening is done on 230.0.0.1. Is there any possibility I need to also add these in Linux as routes?
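
The route question above comes down to simple mask arithmetic: an address is covered by a route when (address AND mask) equals (network AND mask). A small sketch of that check follows. The class and method names are hypothetical, and the addresses are the ones quoted in this thread.

```java
// Hypothetical helper, for illustration only.
public class MulticastRouteCheck {

    // Packs a dotted IPv4 address such as "224.0.0.0" into a 32-bit int.
    public static int toInt(String dotted) {
        int value = 0;
        for (String part : dotted.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    // True when the address falls inside network/mask.
    public static boolean coveredBy(String address, String network, String mask) {
        int m = toInt(mask);
        return (toInt(address) & m) == (toInt(network) & m);
    }

    public static void main(String[] args) {
        String[] groups = {"228.1.2.3", "230.0.0.1", "230.0.0.4"};
        for (String group : groups) {
            System.out.println(group + " covered by 224.0.0.0/240.0.0.0: "
                    + coveredBy(group, "224.0.0.0", "240.0.0.0"));
        }
    }
}
```

Since 228 and 230 both have the top four bits 1110, the single 224.0.0.0 route with mask 240.0.0.0 already covers all of the group addresses mentioned here, so separate routes for them should not be needed.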

              • 4. Re: Problems clustering RH9 and Win2K.
                Darran Lofthouse Master

                I am also running JBoss 3.2.1 with one node on Redhat and one node on Windows 2000.

                To stop my RedHat node thinking that it was localhost I needed to add the hostname and external ip address to /etc/hosts.

                I have also seen a post in the FAQ forum discussing this recently.
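
One way to see what a node "thinks" it is called is to ask the JVM directly. The sketch below assumes, without confirmation from this thread, that the clustering layer falls back to the JVM's idea of the local host when no bind address is configured. The class name LocalHostCheck is hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of JBoss.
public class LocalHostCheck {

    // Describes an address and flags loopback addresses, which other machines cannot reach.
    public static String describe(InetAddress addr) {
        String kind = addr.isLoopbackAddress()
                ? "loopback (not reachable from other machines)"
                : "non-loopback";
        return addr.getHostName() + " / " + addr.getHostAddress() + " -> " + kind;
    }

    public static void main(String[] args) throws UnknownHostException {
        // getLocalHost() resolves the machine's hostname through the system resolver,
        // which normally consults /etc/hosts on Linux.
        System.out.println(describe(InetAddress.getLocalHost()));
    }
}
```

If this reports a loopback address on the Linux box, the hostname is resolving to 127.0.0.1, which would be consistent with the "localhost" shown in the GMS banner.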

                • 5. Re: Problems clustering RH9 and Win2K.
                  Kevin Newbie

                  So, you also were not able to get clustering to work until you made this change to /etc/hosts? If so, can you paste what you added to that file? I am not quite the expert in Linux (yet..). But you are able to cluster now, right?

                  Thanks.

                  • 6. Re: Problems clustering RH9 and Win2K.
                    Kevin Newbie

                    Alrighty, I tried adding the name of the computer (how do you find out the name anyway?), and that still doesn't work. The GMS still shows localhost:103? no matter how I play around with the /etc/hosts file. In there I have:

                    127.0.0.1 localhost.localdomain localhost
                    10.10.10.138 jbosstest.domain.com localhost

                    • 7. Re: Problems clustering RH9 and Win2K.
                      Dennis Cartier Newbie

                      Hi Buckman1,

                      If your hostname is jbosstest, then your hosts file should read:

                      127.0.0.1 localhost.localdomain localhost
                      10.10.10.138 jbosstest.domain.com jbosstest


                      You can confirm your hostname by typing 'hostname' in a shell. It should come back 'jbosstest'.

                      Dennis
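
To make the difference between the two hosts files concrete, here is a simplified sketch of a hosts-file lookup. It assumes the first line that names the host wins, which is typical of default resolver settings but is an assumption here. The class HostsLookup is hypothetical, and real resolvers do more than this.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of an /etc/hosts lookup.
public class HostsLookup {

    // Returns the address on the first line that lists `host` as a name or alias, else null.
    public static String resolve(List<String> hostsLines, String host) {
        for (String line : hostsLines) {
            int hash = line.indexOf('#');
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (content.isEmpty()) {
                continue; // blank line or comment
            }
            String[] fields = content.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(host)) {
                    return fields[0]; // first field is the IP address
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
                "127.0.0.1 localhost.localdomain localhost",
                "10.10.10.138 jbosstest.domain.com localhost");
        List<String> after = Arrays.asList(
                "127.0.0.1 localhost.localdomain localhost",
                "10.10.10.138 jbosstest.domain.com jbosstest");
        System.out.println("before: jbosstest -> " + resolve(before, "jbosstest"));
        System.out.println("after:  jbosstest -> " + resolve(after, "jbosstest"));
    }
}
```

With the earlier file, the short name jbosstest is not listed at all and localhost still maps to 127.0.0.1. With the corrected file, jbosstest maps to 10.10.10.138. Either way, the file only helps if the machine's hostname is actually jbosstest.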

                      • 8. Re: Problems clustering RH9 and Win2K.
                        Kevin Newbie

                        Actually, I did not set a computer name when installing RH9, so I don't know what it is. I take it that hostname command will give me the name of the computer? I just manually inserted the jbosstest value, hoping that JBoss would resolve the 10.10.10.138 to jbosstest. After inserting that line (my hosts looks exactly like what you typed), when I start jboss it still shows localhost:103x when starting.

                        Odd that multicast works from my Win2K machine to the Linux machine, and vice versa, as well as on each machine (running both sender/receiver on the same machine). But the JBoss servers are not finding each other. I am hoping it is only a matter of the hostname issue.

                        • 9. Re: Problems clustering RH9 and Win2K.
                          Kevin Newbie

                          Any other thoughts on why this may not work? Seems like it should based on everything I have read.

                          • 10. Re: Problems clustering RH9 and Win2K.
                            Kevin Newbie

                            Ok, I got it to partly work now. On the Red Hat Linux 9 server, I had to go into /etc/sysconfig/network and change HOSTNAME=localhost.localdomain to jbosstest.mydomain.com

                            In /etc/hosts file I also added our static ip. It looks like:

                            10.10.10.138 jbosstest.mydomain.com jbosstest
                            127.0.0.1 jbosstest.mydomain.com jbosstest

                            So, sure enough, this time, when I start JBoss on the RH server, I finally see the GMS address come up as jbosstest:1038. Oddly enough, I see a lot of exceptions thrown as well, and have no idea why. In previous runs with just localhost, I never had these even while running it in a cluster.

                            Anyway, when I start the win2K box, I now see 2 nodes in cluster, etc. So at least they seem to be talking.

                            Now, is there a reason when the name of my linux computer was set to localhost.localdomain that it could not work? Should I just change it to localhost.mydomain.com?

                            Thanks.