JBoss AS Clustering FAQ (pre-7.x)

Version 41

    My clustering does not work in the first place?

     

      If this is your first time running JBoss clustering, make sure that multicast is working on your machines. To troubleshoot it, go to the JGroups Wiki Troubleshooting Section.

     

    Now that the JGroups layer works, what's next for web clustering troubleshooting?

     

      Please go to the Tomcat Clustering page for further information on how to set up an example.

     

    How do I isolate clusters on the same network?

     

      See TwoClustersSameNetwork

     

    How do I change my cluster's partition name?

     

      See Changing the Partition Name of a Cluster

     

    What is the difference between cluster-service.xml and tc5-cluster-service.xml?

     

      Since releases 3.2.6 and 4.0, there is a new service file called tc5-cluster-service.xml that is used for the new HTTP session replication implementation based on JBossCache. The old cluster-service.xml is still used for the rest of the HA services, such as SFSB.

     

    Problem: Assume we have two JBoss nodes in a cluster. If we start node A and node B and then deploy an ear file to farm, it is propagated to both servers as expected. If we have the ear file in node A's farm but not in node B's farm, and then start node B fully before starting node A, JBoss fails to deploy the ear to node B.

     

      Solution:  Go to Succeed in deploying applications to the cluster farm and read the solution there.

     

    "Failed to setup clustering", what does it mean?

     

      [JBossCacheService][main] jboss.cache:service=TomcatClusteringCache service to Tomcat clustering not found
      [JBossCacheManager][main] JBossCacheService to Tomcat clustering not found
      [TomcatDeployer][main] Failed to setup clustering, clustering disabled
    

     

    There are two possibilities here:

     

      1) Since 3.2.6, HTTP session replication is based on JBossCache, and more precisely on the TomcatClusteringCache service. In 3.2.6, 3.2.7, 4.0.1 and 4.0.1SP1, the dependency on this service is missing in Tomcat's jboss-service.xml. To fix it, please edit /all/deploy/jbossweb-tomcat5x.sar/META-INF/jboss-service.xml and enable the depends element.

     

    The resulting code should look as follows:

     

          <!--
             Configuration for HTTP Session Clustering using JBossCache
          -->
          <depends>jboss.cache:service=TomcatClusteringCache</depends>
    

    instead of

          <!--
             Configuration for HTTP Session Clustering using JBossCache
          -->
          <!--
          <depends optional-attribute-name="CacheName">jboss.cache:service=TreeCache</depends>
          -->
    

     

    2) This error can also appear if you are trying to remove the clustering capabilities from the all configuration, as explained in JBossASTuningSliming. Even though you might have modified jbossweb-tomcat5x.sar/META-INF/jboss-service.xml to remove the dependency on jboss.cache:service=TomcatClusteringCache, if the web application is deployed as <distributable></distributable>, AS will still try to start jboss.cache:service=TomcatClusteringCache for HTTP session replication. In this case the ERROR message should not be treated as a failure: if AS cannot start the service, it logs the ERROR and continues working as normal.

     

    HA-JNDI lookup to an EJB deployed in a clustered JBoss instance does not work from a web application deployed in "default" server in an unclustered JBoss instance?

     

    Accessing a clustered EJB deployed in a clustered JBoss instance from a web application deployed in the "default" server of an unclustered JBoss instance results in the following exception:

    javax.naming.NamingException: Could not dereference object [Root exception is javax.naming.CommunicationException: 
    Failed to retrieve stub from server 204.246.8.199:1100 [Root exception is java.io.StreamCorruptedException: 
    unexpected block data]]
    

    To fix this problem, copy jbossha.jar from server/all/lib/ into server/default/lib. The absence of this jar file causes the error. This was tested with two JBoss instances, with the unclustered JBoss instance containing only the "default" server.

     

    Can I use farming with an exploded archive?

     

    No, this feature is not available.

     

    Why are calls between clustered session beans not load balanced even though load balancing policy is Round Robin?

     

    Let's say that we have two session beans, A and B. A is not a clustered bean and is deployed in server 1 and B is a clustered bean deployed in servers 1 and 2. Using Round-Robin as load balancing policy, you might expect calls from bean A to bean B to be load balanced between servers 1 and 2. Actually, this is not what will happen. Repeated calls from bean A to bean B will always remain in server 1.

     

    In JBoss, the default policy is to invoke locally for clustered EJBs. This avoids the need for distributed transactions and serialisation, which reduces the complexity of the call and increases performance.
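
    The effect of this local-first policy can be illustrated with a toy model. The sketch below is not JBoss code: the class, the route method and the node names are hypothetical, and it only assumes that a caller co-located with a target always picks that target, while any other caller round-robins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only, not JBoss code. All names here are hypothetical.
class LocalPreferenceSketch {

    /** Picks a target for each call, optionally preferring the caller's own node. */
    static List<String> route(String callerNode, List<String> targets,
                              boolean preferLocal, int calls) {
        List<String> chosen = new ArrayList<>();
        int next = 0; // round-robin cursor
        for (int i = 0; i < calls; i++) {
            if (preferLocal && targets.contains(callerNode)) {
                chosen.add(callerNode); // local target wins, no load balancing
            } else {
                chosen.add(targets.get(next % targets.size()));
                next++;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> beanB = Arrays.asList("server1", "server2");
        // Bean A on server1, co-located with one copy of bean B.
        System.out.println(route("server1", beanB, true, 4)); // [server1, server1, server1, server1]
        // Bean A moved to server3, where bean B is not deployed.
        System.out.println(route("server3", beanB, true, 4)); // [server1, server2, server1, server2]
    }
}
```

    In this model, calls only spread across servers 1 and 2 once the caller no longer shares a node with a copy of bean B, or once the local preference is switched off.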

     

    There are several ways to avoid this situation. The first is to remove session bean A from server 1 and deploy it in a new server, let's say server 3. Calls from bean A to bean B will then be load balanced between servers 1 and 2. Another alternative is to write your own interceptor and modify the EJB stack (only available from JBoss 4.0.2). For example:

     

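    import org.jboss.invocation.Invocation;
    import org.jboss.invocation.InvokerInterceptor;

    public class MyInvokerInterceptor extends InvokerInterceptor
    {
      // Always report that there is no local target, so calls are not kept on the local node.
      public boolean hasLocalTarget(Invocation invocation)
      {
        return false;
      }
    }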

     

    How can I check which members compose the cluster?

     

    org.jboss.ha.framework.server.ClusterPartitionMBean contains a method called getCurrentView() which returns the list of members forming the cluster. It returns a Vector of Strings representing the host:port values of the nodes. For example:

    [10.11.14.104:1099, 10.11.14.105:1099]

     

    This MBean is exposed by the jboss:service=DefaultPartition service (defined in cluster-service.xml) which is the service managing the cluster membership.
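
    Reading the view over JMX might look like the sketch below. It assumes that getCurrentView() is exposed as the JMX attribute CurrentView, as the standard getter convention implies. So that it runs outside JBoss, it registers a hypothetical stand-in MBean (StandInPartition, not part of JBoss) under the same object name in the local platform MBeanServer; against a real server you would pass a connection to the JBoss MBeanServer instead.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.Vector;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class ClusterViewSketch {

    /** Stand-in MBean so the sketch runs outside JBoss; not the real partition MBean. */
    public interface StandInPartitionMBean {
        Vector<String> getCurrentView();
    }

    public static class StandInPartition implements StandInPartitionMBean {
        @Override
        public Vector<String> getCurrentView() {
            return new Vector<>(Arrays.asList("10.11.14.104:1099", "10.11.14.105:1099"));
        }
    }

    /** Reads the CurrentView attribute, i.e. the result of getCurrentView(). */
    static Vector<?> currentView(MBeanServerConnection server, ObjectName partition)
            throws Exception {
        return (Vector<?>) server.getAttribute(partition, "CurrentView");
    }

    /** Registers the stand-in locally (if needed) and queries it. */
    static Vector<?> demo() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName partition = new ObjectName("jboss:service=DefaultPartition");
            if (!server.isRegistered(partition)) {
                server.registerMBean(new StandInPartition(), partition);
            }
            return currentView(server, partition);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [10.11.14.104:1099, 10.11.14.105:1099]
    }
}
```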

     

    Is there any way to receive topology change notification?

     

    See ClusteringTopologyChangeNotifications wiki.

     

    I made my app clustered and now I'm getting serialization exceptions. Why?

    If your app uses clustered web sessions or clustered SFSBs, the clustering function needs to replicate application state around the cluster.  Replication involves serializing your application state; you need to make sure that all such state is either Serializable or marked as transient.  See DebuggingSerializationErrors for more.
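
    A quick way to check a piece of state before it reaches the cluster is to serialize it yourself. The sketch below uses only the JDK; Cart and BrokenCart are hypothetical session attributes, and isReplicable is an illustrative helper, not a JBoss API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SessionStateCheck {

    /** Hypothetical session attribute whose non-serializable field is transient. */
    static class Cart implements Serializable {
        private static final long serialVersionUID = 1L;
        String customerId = "c-42";
        // java.lang.Object is not Serializable, so this field is excluded.
        transient Object handle = new Object();
    }

    /** Same shape, but the non-serializable field is not marked transient. */
    static class BrokenCart implements Serializable {
        private static final long serialVersionUID = 1L;
        Object handle = new Object();
    }

    /** Returns true if the object graph can be written with Java serialization. */
    static boolean isReplicable(Object state) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(state);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isReplicable(new Cart()));       // true
        System.out.println(isReplicable(new BrokenCart())); // false
    }
}
```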

     

    How do I reduce the network traffic in a clustered environment?

     

      If you use the default config and replace FD with FD_SOCK, there will be no traffic while no data (e.g. replication messages) is being sent between the nodes.

      Note however, that this is not recommended. We actually recommend using both FD and FD_SOCK together. FD does not generate a large amount of traffic -- one point-to-point packet from each node to its neighbor every FD.timeout ms. Depending on what AS version you are using, the default FD.timeout setting is 2500 or 10000; a packet per node every few seconds is not a lot of traffic.
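
      As a rough back-of-the-envelope check, the FD heartbeat rate can be estimated from the description above. This sketch counts only the one packet per node per FD.timeout mentioned here; the four-node cluster size is an assumption chosen for illustration.

```java
class FdTrafficEstimate {

    /** Heartbeat packets per second across the cluster: one per node per timeout. */
    static double packetsPerSecond(int nodes, long fdTimeoutMs) {
        return nodes * 1000.0 / fdTimeoutMs;
    }

    public static void main(String[] args) {
        // Assumed cluster of 4 nodes with the two default timeouts mentioned above.
        System.out.println(packetsPerSecond(4, 2500));  // 1.6
        System.out.println(packetsPerSecond(4, 10000)); // 0.4
    }
}
```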

     

    I have deployed SLSBs (Stateless Session Beans) in a cluster with the Round Robin policy, but I only see failover behavior: one node services all requests until it dies. Why?

     

      Make sure you don't create the remote interface before each call. Instead create it once, and use it for all subsequent calls.

     

      Note that this shouldn't be an issue in any JBoss release since 3.2.3. Since that release, when you create an SLSB the proxy randomly picks its initial target.
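
      The difference between the two usage patterns can be shown with a toy proxy. This is not the JBoss proxy implementation: RoundRobinProxy is hypothetical, and it assumes that every newly created proxy starts at the first target, which is the behavior the advice above guards against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProxyReuseSketch {

    /** Toy round-robin proxy; assumes a new proxy always starts at the first target. */
    static class RoundRobinProxy {
        private final List<String> targets;
        private int cursor = 0;

        RoundRobinProxy(List<String> targets) {
            this.targets = targets;
        }

        String invoke() {
            String target = targets.get(cursor);
            cursor = (cursor + 1) % targets.size();
            return target;
        }
    }

    /** Creates a fresh proxy before every call, so the cursor never advances. */
    static List<String> newProxyPerCall(List<String> nodes, int calls) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            hits.add(new RoundRobinProxy(nodes).invoke());
        }
        return hits;
    }

    /** Creates the proxy once and reuses it for all calls. */
    static List<String> sharedProxy(List<String> nodes, int calls) {
        RoundRobinProxy proxy = new RoundRobinProxy(nodes);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            hits.add(proxy.invoke());
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("nodeA", "nodeB");
        System.out.println(newProxyPerCall(nodes, 4)); // [nodeA, nodeA, nodeA, nodeA]
        System.out.println(sharedProxy(nodes, 4));     // [nodeA, nodeB, nodeA, nodeB]
    }
}
```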

     

    "Possible concurrency problem: Replicated version id X matches in-memory version for session ...", what does it mean?

     

      The message is basically saying that a replicated session is overriding an existing session on that node. Quite often the version id is 1, but regardless of the version id the problem is the same. Here's a scenario:

     

      Let's say a first request (r1) lands on node x and a second request (r2) lands on node y. If r2 gets to y before r1's session is replicated to y, a new session will be created on y, which will be replicated to x, overriding what r1 had created on x. Node x will then log the message you're seeing, because the replicated session from r2 is overriding the session created by r1 on x. This only happens if a new session was created on y. There is also a remote possibility that y logs this message, for example if r1 created a new session on x and replicated it to y after r2 created a new session on y. This is very unlikely but could still happen. It is even possible for both nodes to report it.

     

      Whether these messages are something you should be concerned about depends on whether r1 or r2 stores anything meaningful in the session. Basically, whichever server logged the message lost its copy of the session. So:

     

    1. If r1 and r2 both store something meaningful, this is definitely a problem: one or the other is lost.

     

    2. If r1 doesn't store data but r2 does, it's probably OK: OK if x logs the message, not OK if y logs it.

     

    3. If r2 doesn't store data but r1 does, it's probably a problem: not OK if x logs the message, OK if y logs it.
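
      The scenario can be replayed with a toy model. None of this is JBoss code: Session and Node are hypothetical, both requests are assumed to end up with the same session id ("s1") and version 1, and the warned flag stands in for the logged message.

```java
import java.util.HashMap;
import java.util.Map;

class SessionRaceSketch {

    /** Toy session: a version counter plus some data. */
    static class Session {
        final int version;
        final String data;

        Session(int version, String data) {
            this.version = version;
            this.data = data;
        }
    }

    /** Toy node: replicas overwrite the local copy; equal versions are flagged. */
    static class Node {
        final Map<String, Session> sessions = new HashMap<>();
        boolean warned = false;

        void receiveReplica(String id, Session replica) {
            Session local = sessions.get(id);
            if (local != null && local.version == replica.version) {
                warned = true; // stands in for the logged warning
            }
            sessions.put(id, replica);
        }
    }

    /** Replays the scenario and reports what node x ends up with. */
    static String replay() {
        Node x = new Node();
        Node y = new Node();
        x.sessions.put("s1", new Session(1, "set by r1")); // r1 creates the session on x
        y.sessions.put("s1", new Session(1, "set by r2")); // r2 reaches y before r1's replica
        x.receiveReplica("s1", y.sessions.get("s1"));      // y's copy overrides x's
        return x.sessions.get("s1").data + ", warned=" + x.warned;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // set by r2, warned=true
    }
}
```

      Here x logged the warning and lost r1's data, which matches the rule that whichever server logs the message has lost its own copy.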

     

     

      The best way to avoid these messages completely is to use sticky sessions with mod_jk. When other load balancers are used, these messages can still appear even with sticky sessions enabled. For example, the F5 BIG-IP load balancer provides sticky sessions using hash session persistence. This method hashes the JSESSIONID cookie and uses the value to determine a server for load balancing. Since there is no JSESSIONID cookie on the first request, but there is one on subsequent requests, it is possible that the first request is served by one server and subsequent requests are served by another. This does not happen with mod_jk because it requires each Tomcat worker to be configured with a jvmRoute per node in the cluster, which must match the worker (node) name on the mod_jk side. When Tomcat replies to the first request, it adds the jvmRoute value to the session id so that mod_jk can later redirect to the same node where the request first landed.
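
      To make the jvmRoute mechanism concrete, the sketch below extracts the route from a session id. It assumes the route is appended to the session id after a dot; routeOf is a hypothetical helper and the session ids are made up. It is not mod_jk's actual routing code.

```java
class JvmRouteSketch {

    /** Returns the route suffix of a session id such as "A1B2C3.node1", or null if absent. */
    static String routeOf(String sessionId) {
        if (sessionId == null) {
            return null; // first request: no session cookie yet
        }
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null; // no route appended
        }
        return sessionId.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(routeOf("A1B2C3.node1")); // node1
        System.out.println(routeOf("A1B2C3"));       // null
    }
}
```

      Because the route travels inside the session id itself, every request that carries the id can be sent back to the node that created the session.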

     

    If you see these messages when using a Netscaler load balancer, see the following wiki to get advice on how to configure Netscaler to use sticky sessions.

     

    I get "org.jboss.ha.framework.server.ClusterFileTransfer$ClusterFileTransferException: Did not receive response from remote machine..."

     

    See debugging farming file transfer wiki.