2 Replies Latest reply on Aug 2, 2005 6:31 PM by rcostanzo

JBoss 3.2.6 Clustering with JGroups Configuration Question

rcostanzo Aug 2, 2005 5:08 PM

My cluster of 5 JBoss 3.2.6 machines on Red Hat Linux has issues with members dropping in and out of the cluster under heavy load. I am looking for the most reliable way to keep my cluster in a steady state. Does anybody know the costs and benefits of switching from FD to FD_SOCK in the JGroups configuration? I am looking for examples of when each would be appropriate.
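As a rough illustration of the trade-off: FD suspects a member after it misses a run of heartbeats, so a member that is alive but stalled (a long GC pause or a saturated CPU under heavy load) can be wrongly suspected whenever the stall lasts longer than roughly `timeout` × `max_tries`. FD_SOCK instead suspects a member when the TCP connection to it closes, so it does not flag a member that is merely slow, but it also cannot notice a process that hangs while its socket stays open, which is why the two are commonly combined. The toy model below borrows the names of FD's `timeout` and `max_tries` attributes, but the class and its miss-counting logic are a simplified assumption, not JGroups code.

```java
import java.util.List;

/**
 * Toy model of heartbeat-style failure detection in the spirit of JGroups FD.
 * Assumption: a heartbeat answered later than timeoutMs counts as a miss, and
 * the member is suspected after maxTries consecutive misses. Not JGroups code.
 */
public class FdTimeoutSketch {

    /**
     * Returns the index of the heartbeat at which the member would be suspected,
     * or -1 if it never is. responseMs.get(i) is how long the member took to
     * answer heartbeat i (a long GC pause shows up as a run of large values).
     */
    static int suspectedAt(List<Long> responseMs, long timeoutMs, int maxTries) {
        int consecutiveMisses = 0;
        for (int i = 0; i < responseMs.size(); i++) {
            if (responseMs.get(i) > timeoutMs) {
                consecutiveMisses++;
                if (consecutiveMisses >= maxTries) {
                    return i; // the monitor gives up and suspects the member here
                }
            } else {
                consecutiveMisses = 0; // any timely answer resets the count
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A live member that answers promptly, stalls for six heartbeats, then recovers.
        List<Long> responses = List.of(50L, 40L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 60L);
        // Short timeout: the stall outlasts max_tries, so a live member is suspected.
        System.out.println("timeout=3000, max_tries=5 -> suspected at " + suspectedAt(responses, 3000, 5));
        // Longer timeout: the same stall is tolerated.
        System.out.println("timeout=10000, max_tries=5 -> suspected at " + suspectedAt(responses, 10000, 5));
    }
}
```

With the same stall, the short timeout gets a live member suspected at heartbeat 6, while the longer one tolerates it; the price of the longer timeout is that a member that really has died also takes longer to be removed.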

Also, I notice that only one of the members in the cluster starts rapidly degrading under heavy load (with an exceptionally high load average), and once that instance is bounced, another instance starts to degrade, like a virus. But the traffic is being evenly distributed to all boxes. Does anybody have any ideas as to the cause of this? My gut is that it has to do with the cluster coordinator being overloaded with work. Is there a good resource out there to find out what the cluster coordinator does and how to optimize it?
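For what it's worth, the JGroups documentation describes the coordinator as the oldest member of the group, i.e. the first member of the current view; it is the member that handles joins, leaves and suspicions and installs new views. The sketch below (hypothetical node names, not JGroups code) shows why that fits the pattern described: restarting the current coordinator makes it the newest member, so the role passes to the next-oldest node. It does not establish that coordinator duties are heavy enough to cause the load; it only shows how the role moves.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of coordinator selection: the view is an ordered member list
 * (oldest first) and the coordinator is its first entry. Node names are
 * hypothetical; this is not JGroups code.
 */
public class CoordinatorSketch {

    static String coordinator(List<String> view) {
        return view.isEmpty() ? null : view.get(0);
    }

    /** Bouncing a node: it leaves the view, then rejoins as the newest member. */
    static List<String> bounce(List<String> view, String member) {
        List<String> next = new ArrayList<>(view);
        next.remove(member); // List.remove(Object): drop the member wherever it is
        next.add(member);    // rejoin at the end, i.e. youngest
        return next;
    }

    public static void main(String[] args) {
        List<String> view = List.of("node1", "node2", "node3", "node4", "node5");
        System.out.println("coordinator: " + coordinator(view));
        // Restart the current coordinator; node2 is now the oldest member.
        List<String> afterBounce = bounce(view, coordinator(view));
        System.out.println("view after bounce: " + afterBounce);
        System.out.println("coordinator: " + coordinator(afterBounce));
    }
}
```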

Thanks.

-Rob

1. Re: JBoss 3.2.6 Clustering with JGroups Configuration Question

vignesh76 Aug 2, 2005 6:03 PM (in response to rcostanzo)

Hi,

Are you using Apache+mod_jk and EJB clustering? If the load is evenly distributed, I am curious how one particular node can carry more load than the others. Did you check how many sessions are on each node? I believe there is no coordinator of any sort at the EJB level; if HA-JNDI is configured, the client making the RMI call decides which node is used.
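A minimal sketch of that client-side selection, assuming a simple round-robin policy over a list of cluster nodes. The class and node names are hypothetical; this is not JBoss's clustered proxy code, only the general idea that the caller, rather than a coordinator, picks the target for each call.

```java
import java.util.List;

/**
 * Hypothetical client-side round-robin target selection, illustrating how a
 * clustered proxy could spread calls across nodes without any coordinator.
 */
public class RoundRobinSketch {
    private final List<String> targets;
    private int cursor = 0;

    RoundRobinSketch(List<String> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("no targets");
        }
        this.targets = List.copyOf(targets);
    }

    /** Returns the next node in rotation; synchronized so concurrent callers rotate safely. */
    synchronized String chooseTarget() {
        String target = targets.get(cursor);
        cursor = (cursor + 1) % targets.size();
        return target;
    }

    public static void main(String[] args) {
        RoundRobinSketch policy = new RoundRobinSketch(List.of("node1", "node2", "node3"));
        for (int i = 0; i < 4; i++) {
            System.out.println("call " + i + " -> " + policy.chooseTarget());
        }
    }
}
```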

What do you mean by bounced? Is JBoss crashing with an exception or something? If so, please provide details.
2. Re: JBoss 3.2.6 Clustering with JGroups Configuration Question

rcostanzo Aug 2, 2005 6:31 PM (in response to rcostanzo)

I am using Apache+mod_jk. The cluster is configured to use EJB cache invalidation, and there is no session replication going on. The clients don't end up using RMI, since mod_jk delegates to Tomcat via AJP13 and Tomcat then makes a local EJB call, so we're not using HA-JNDI. The reason for using a cluster is so that our servers can fully cache (commit option A) and then rely on the cache invalidation events to keep the caches in sync.
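The caching scheme described above can be sketched as follows, under stated assumptions: each node keeps a full local cache (the commit option A behaviour), reads go to the database only on a miss, and every write sends an invalidation to all peers. An in-memory peer list stands in for the JGroups channel; `Node`, `read`, `write` and `invalidate` are hypothetical names, not JBoss APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of cluster-wide cache invalidation. The peer list stands
 * in for the group channel; none of these names are JBoss APIs.
 */
public class InvalidationSketch {

    static class Node {
        final String name;
        final Map<String, String> database;                 // shared backing store
        final Map<String, String> cache = new HashMap<>();  // this node's full local cache
        final List<Node> peers = new ArrayList<>();

        Node(String name, Map<String, String> database) {
            this.name = name;
            this.database = database;
        }

        /** Read-through: hit the database only on a cache miss. */
        String read(String key) {
            return cache.computeIfAbsent(key, database::get);
        }

        /** Write locally, then tell every peer to drop its now-stale copy. */
        void write(String key, String value) {
            database.put(key, value);
            cache.put(key, value);
            for (Node peer : peers) {
                peer.invalidate(key);
            }
        }

        void invalidate(String key) {
            cache.remove(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("account:42", "balance=100");
        Node n1 = new Node("node1", db);
        Node n2 = new Node("node2", db);
        n1.peers.add(n2);
        n2.peers.add(n1);

        System.out.println(n2.read("account:42")); // loaded and cached on node2
        n1.write("account:42", "balance=80");      // node2's copy is evicted
        System.out.println(n2.read("account:42")); // reloaded from the database
    }
}
```

Without the invalidation call, node2 would keep returning the stale `balance=100`. In this model every write also causes work on every other member, so invalidation traffic grows with both the write rate and the cluster size.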

The session count is even across all of our boxes. However, only one box begins to get bogged down and eventually stops responding to requests, so I then have to restart (bounce) that instance. When it comes back it works fine, but another instance starts to get bogged down (i.e. high load average). Since the session load is even, there must be some internal overhead that causes the instance to become overwhelmed at high loads.

I think it has to do with clustering, and more specifically JGroups, because the high load average moves from one instance to another after a bounce. And I do think JGroups uses the concept of a coordinator, since I just had an issue restarting one of my servers and it gave an error message saying it couldn't determine who the coordinator was (apparently my cluster ended up in a bad state with two instances each thinking it was the coordinator, so I had to stop one of them to fix that issue).
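The "two coordinators" state is what a split group looks like: if members stop seeing each other (for example because heartbeats are missed under load), each side installs its own view, and the first member of each view acts as coordinator. The toy model below (hypothetical node names, not JGroups code) partitions a five-node view in two. The JGroups stacks used by JBoss typically also include a merge protocol (MERGE2) intended to rejoin such partitions once the members can communicate again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Toy model of a network partition: each side keeps the original member order,
 * so each side's oldest member becomes that side's coordinator. Not JGroups code.
 */
public class SplitBrainSketch {

    static String coordinator(List<String> view) {
        return view.get(0);
    }

    /** Splits an ordered view into the members that can still see each other. */
    static List<List<String>> partition(List<String> view, Set<String> sideA) {
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        for (String member : view) {
            if (sideA.contains(member)) {
                a.add(member);
            } else {
                b.add(member);
            }
        }
        return List.of(a, b);
    }

    public static void main(String[] args) {
        List<String> view = List.of("node1", "node2", "node3", "node4", "node5");
        for (List<String> side : partition(view, Set.of("node1", "node2"))) {
            System.out.println(side + " -> coordinator " + coordinator(side));
        }
    }
}
```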

Let me know if any other clarifications would be useful.

Again, if anybody knows when to use FD_SOCK over FD for JGroups and how JGroups uses the concept of a coordinator, or any other ideas as to what's going on, any info would be greatly appreciated.
