I have followed the docs, as well as a short tutorial I found. I set up Red Hat Linux 9 (RH9), found that the route was not in there, and added the route information. I verified that the route info is the same on both Windows and Linux.
When I start with run.bat -c all, it does load the /all folder, and I see the clustering services starting on both machines.
I am using purely defaults at this point. I am not deploying any app. I simply want to see that both servers join the same cluster. I verified with our IT manager that multicast is functional (not blocked) within our LAN.
For some reason, neither one is joining the other's cluster, no matter which way I start it.
The Windows server shows the following (brief section):
16:07:50,111 INFO [ClusterPartition] Connecting to channel
16:07:50,121 INFO [STDOUT]
GMS: address is localhost:2616
16:07:52,174 INFO [ClusterPartition] Starting channel
16:07:52,174 INFO [DefaultPartition] Number of cluster members: 1
16:07:52,174 INFO [DefaultPartition] Other members: 0
16:07:52,184 INFO [ClusterPartition] Started ClusterPartition: DefaultPartition
16:07:52,184 INFO [ClusterPartition] Started
16:07:52,184 INFO [HASessionStateService] Starting
16:07:52,184 INFO [HASessionStateService] Started
16:07:52,184 INFO [HANamingService] Starting
16:07:52,234 INFO [HANamingService] Listening on 0.0.0.0/0.0.0.0:1100
16:07:52,244 INFO [HANamingService$AutomaticDiscovery] Listening on /0.0.0.0:11
Everything is identical on the Linux setup, except that the GMS line shows:
GMS: address is localhost:1028
Again, I have left everything as is, defaults. From what I have read, this should work.
I am looking into the cluster-service.xml file. Is there anything I should be changing here? I am reading that on Windows I want to set loopback to false. I also read that the bind_addr on Linux should be set to the IP of the network that other machines talk on. As far as I know, localhost should work in this case?
I'd appreciate any thoughts on why they can't communicate together.
Incidentally, I downloaded the JavaGroups distribution and ran the tests. I ran both the listener and the sender on the Linux box, and it worked. I also ran the sender on the Windows box, and it worked. Oddly, when I send a string from the Windows box, both the sender and the listener on the Linux box show output. They do show the correct Windows box IP. But when I send from the Linux box, only the Linux box hears it. The Windows box sender doesn't show any output. What confuses me is why the Linux sender test shows that it receives a packet from the Windows box, and if that is correct functionality, why does the Windows box sender not show anything? I am also running the sender/receiver on both boxes at the same time. Both receivers do get info from both boxes, but still the sender on Linux also shows output. Any ideas (if it's that big of a deal)?
Anyway, so multicast is working and all default values are in place. Still, the JBoss servers don't join the same cluster. Any chance the GMS thing has something to do with it, since they are different numbers? They seem to be random numbers though.
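For anyone without the JavaGroups test classes at hand, a sender/receiver check like the one described above can be roughly sketched with the plain `java.net` API. This is only a minimal sketch in current Java syntax, not the JavaGroups test code: the class name `McastProbe` is made up for illustration, and the group address and port are whatever you pass on the command line (take them from your own cluster-service.xml rather than from this example).

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of JavaGroups or JBoss.
final class McastProbe {

    // Build the datagram a sender would put on the wire.
    static DatagramPacket buildPacket(String text, InetAddress group, int port) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        return new DatagramPacket(payload, payload.length, group, port);
    }

    // Decode a datagram's payload back into text.
    static String readPacket(DatagramPacket packet) {
        return new String(packet.getData(), packet.getOffset(),
                packet.getLength(), StandardCharsets.UTF_8);
    }

    // Send one message to the multicast group.
    static void send(InetAddress group, int port, String text) throws Exception {
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setTimeToLive(32); // allow the packet to cross a few hops
            socket.send(buildPacket(text, group, port));
        }
    }

    // Join the group and print every message, with the sender's IP, until killed.
    static void receive(InetAddress group, int port) throws Exception {
        try (MulticastSocket socket = new MulticastSocket(port)) {
            // null lets the OS pick the interface used for the group membership
            socket.joinGroup(new InetSocketAddress(group, port), null);
            byte[] buffer = new byte[1024];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                socket.receive(packet);
                System.out.println(packet.getAddress().getHostAddress()
                        + ": " + readPacket(packet));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: McastProbe send|receive <group> <port> [text]");
            return;
        }
        InetAddress group = InetAddress.getByName(args[1]);
        int port = Integer.parseInt(args[2]);
        if (args[0].equals("send")) {
            send(group, port, args.length > 3 ? args[3] : "hello");
        } else {
            receive(group, port);
        }
    }
}
```

Run it in `receive` mode on both boxes, then `send` from each box in turn. Each receiver prints the IP the packet came from, which makes it easy to tell whether a box is hearing the other one or only itself.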
Again, thanks for any help.
Crap, left out an important detail. JBoss 3.2.1 is what I am using on both machines.
So I have one more thought on this. In Linux we add 126.96.36.199 route and 240.0.0.0 mask. Windows also shows this, and as I said, I can multicast. However, I see the IP is 188.8.131.52, and the listening is done on 184.108.40.206. Is there any possibility I need to also add these in Linux as routes?
I am also running JBoss 3.2.1 with one node on Red Hat and one node on Windows 2000.
To stop my Red Hat node thinking that it was localhost, I needed to add the hostname and external IP address to /etc/hosts.
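On whether extra routes are needed: a route only has to cover the group address being used, and that is plain mask arithmetic. The sketch below assumes the route in question is the usual IPv4 multicast block, network 224.0.0.0 with mask 240.0.0.0; that network value is an assumption on my part. The address 228.1.2.3 is used as a sample group only (I believe it is the JBoss default mcast_addr, but check your own cluster-service.xml). `McastRoute` is a made-up name.

```java
// Hypothetical helper for illustration: does a route (network + mask) cover an address?
final class McastRoute {

    // Pack a dotted-quad IPv4 literal such as "224.0.0.0" into a 32-bit int.
    static int toInt(String dottedQuad) {
        String[] parts = dottedQuad.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 literal: " + dottedQuad);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + dottedQuad);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // An address is covered when its masked bits equal the network's masked bits.
    static boolean covers(String network, String mask, String address) {
        int m = toInt(mask);
        return (toInt(address) & m) == (toInt(network) & m);
    }

    public static void main(String[] args) {
        String network = "224.0.0.0"; // assumed multicast network
        String mask = "240.0.0.0";    // mask quoted in the post above
        for (String address : new String[] {"228.1.2.3", "10.10.10.138"}) {
            System.out.println(address + " covered: " + covers(network, mask, address));
        }
    }
}
```

Under that assumption, any group address from 224.0.0.0 through 239.255.255.255 falls under the one route, so a separate route per group address should not be necessary.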
I have also seen a post in the FAQ forum discussing this recently.
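A quick way to see what the JVM itself resolves for the local machine is a few lines of Java. My understanding, which is an assumption and not something confirmed in this thread, is that JavaGroups falls back to the JVM's idea of the local host when no bind_addr is set, which would explain a `GMS: address is localhost:...` line. The class name `LocalHostCheck` is made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration; not part of JBoss or JavaGroups.
final class LocalHostCheck {

    // Loopback (127.x.x.x) and wildcard (0.0.0.0) addresses mean nothing to
    // another machine, so a node advertising one cannot be reached by its peers.
    static boolean usableByPeers(InetAddress address) {
        return !address.isLoopbackAddress() && !address.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("local host resolves to "
                    + local.getHostName() + "/" + local.getHostAddress());
            System.out.println(usableByPeers(local)
                    ? "address looks usable by other machines"
                    : "address is loopback or wildcard; check the hostname and /etc/hosts");
        } catch (UnknownHostException e) {
            System.out.println("hostname does not resolve at all: " + e.getMessage());
        }
    }
}
```

If this prints a 127.x address on the Linux box, the hostname is resolving to loopback, which matches the symptom described above.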
So, you also were not able to get clustering to work until you made this change to /etc/hosts? If so, can you paste what you added to that file? I am not quite the expert in Linux (yet..). But you are able to cluster now, right?
Alrighty, I tried adding the name of the computer (how do you find out the name anyway?), and that still doesn't work. The GMS still shows localhost:103x no matter how I play around with the /etc/hosts file. In there I have:
127.0.0.1 localhost.localdomain localhost
10.10.10.138 jbosstest.domain.com localhost
If your hostname is jbosstest, then your hosts file should read:
127.0.0.1 localhost.localdomain localhost
10.10.10.138 jbosstest.domain.com jbosstest
You can confirm your hostname by typing 'hostname' in a shell. It should come back 'jbosstest'.
Actually, I did not set a computer name when installing RH9, so I don't know what it is. I take it that the hostname command will give me the name of the computer? I just manually inserted the jbosstest value, hoping that JBoss would resolve the 10.10.10.138 to jbosstest. After inserting that line (my hosts file looks exactly like what you typed), when I start JBoss it still shows localhost:103x when starting.
Odd that multicast works from my Win2K machine to the Linux machine, and vice versa, as well as on each machine (running both sender/receiver on the same machine). But the JBoss servers are not finding each other. I am hoping it is only a matter of the hostname issue.
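To make the difference between the two hosts files concrete, here is a small sketch that lists which addresses a hosts file gives for a name. It is a simplified reading of the file format (first field is the address, the rest are names, `#` starts a comment) and is not how the system resolver is actually implemented. `HostsFileCheck` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration: a simplified hosts-file lookup.
final class HostsFileCheck {

    // Return the address of every line that lists the given name, in file order.
    static List<String> addressesFor(String hostsText, String name) {
        List<String> result = new ArrayList<>();
        for (String line : hostsText.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop trailing comment
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // blank line or address with no names
            }
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(name)) {
                    result.add(fields[0]);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String original = "127.0.0.1 localhost.localdomain localhost\n"
                + "10.10.10.138 jbosstest.domain.com localhost\n";
        String suggested = "127.0.0.1 localhost.localdomain localhost\n"
                + "10.10.10.138 jbosstest.domain.com jbosstest\n";
        System.out.println("original,  jbosstest: " + addressesFor(original, "jbosstest"));
        System.out.println("original,  localhost: " + addressesFor(original, "localhost"));
        System.out.println("suggested, jbosstest: " + addressesFor(suggested, "jbosstest"));
    }
}
```

With the original file, the short name jbosstest maps to nothing and localhost is listed against two addresses. With the suggested file, jbosstest maps only to 10.10.10.138. This only helps if the machine's hostname really is jbosstest, which is what the 'hostname' command confirms.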
Any other thoughts on why this may not work? Seems like it should based on everything I have read.
OK, I got it to partly work now. On the Red Hat Linux 9 server, I had to go into /etc/sysconfig/network and change HOSTNAME=localhost.localdomain to HOSTNAME=jbosstest.mydomain.com
In the /etc/hosts file I also added our static IP. It looks like:
10.10.10.138 jbosstest.mydomain.com jbosstest
127.0.0.1 jbosstest.mydomain.com jbosstest
So, sure enough, this time when I start JBoss on the RH server, I finally see GMS: jbosstest:1038 come up. Oddly enough, I see a lot of exceptions thrown as well, and have no idea why. In previous runs with just localhost, I never had these, even while running it in a cluster.
Anyway, when I start the Win2K box, I now see 2 nodes in the cluster, etc. So at least they seem to be talking.
Now, is there a reason it could not work while the name of my Linux computer was set to localhost.localdomain? Should I just change it to localhost.mydomain.com?