Are you using Apache+mod_jk and EJB clustering? If the load is evenly distributed, how can one particular node end up with more load than the others? Did you check how many sessions are on each node? I believe there is no coordinator of any sort at the EJB level; if HA-JNDI is configured, the client making the RMI call decides which node to use.
What do you mean by bounced? Is JBoss crashing with an exception or something? If so, please provide details.
I am using Apache+mod_jk. The cluster is configured to use EJB cache invalidation, and there is no session replication going on. The clients don't end up using RMI, since mod_jk delegates to Tomcat via AJP13 and Tomcat then makes a local EJB call. So we're not using HA-JNDI. The reason for using a cluster is so that our servers can fully cache entities (commit option A) and then rely on the cache invalidation events to keep the caches in sync.
The session count is even across all of our boxes. However, only one box gets bogged down and eventually stops responding to requests, so I have to restart (bounce) that instance. When it comes back it works fine, but then another instance starts to get bogged down (i.e. a high load average). Since the session load is even, there must be some internal overhead that overwhelms one instance at high load.
I think it has to do with clustering, and more specifically JGroups, because the high load average moves from one instance to another after a bounce. I also think JGroups does use the concept of a coordinator: I just had an issue restarting one of my servers, and it gave an error message saying it couldn't determine who the coordinator was. Apparently my cluster had ended up in a bad state with two instances each thinking it was the coordinator, so I had to stop one of them to fix that.
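To make clear what I mean by the coordinator moving, here is a rough sketch of how I picture it. It assumes the coordinator is simply the first member of the current view, which is my understanding of JGroups but not something I have verified against our version. This is plain Java, not the JGroups API, and the node names and helper methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoordinatorSketch {

    // Assumption: the coordinator is the first member of the current view.
    static String coordinatorOf(List<String> view) {
        if (view.isEmpty()) {
            throw new IllegalArgumentException("empty view");
        }
        return view.get(0);
    }

    // Hypothetical helper: the view that remains after one member leaves.
    static List<String> viewAfterLeave(List<String> view, String leaving) {
        List<String> next = new ArrayList<>(view);
        next.remove(leaving);
        return next;
    }

    // Hypothetical helper: a restarted member rejoins at the end of the view.
    static List<String> viewAfterJoin(List<String> view, String joining) {
        List<String> next = new ArrayList<>(view);
        next.add(joining);
        return next;
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("nodeA", "nodeB", "nodeC");
        System.out.println("coordinator: " + coordinatorOf(view));

        // Bounce nodeA: it leaves, then rejoins as the newest member.
        List<String> afterLeave = viewAfterLeave(view, "nodeA");
        List<String> afterRejoin = viewAfterJoin(afterLeave, "nodeA");
        System.out.println("coordinator after bounce: " + coordinatorOf(afterRejoin));
    }
}
```

If that picture is right, bouncing the coordinator hands the role to the next member in the view, which would match the load moving to another box after a restart. I have not yet confirmed that the overloaded box is actually the coordinator each time.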
Let me know if any other clarifications would be useful.
Again, if anybody knows when to use FD_SOCK rather than FD for JGroups, how JGroups uses the concept of a coordinator, or has any other ideas as to what's going on, any info would be greatly appreciated.