1 Reply Latest reply on Oct 23, 2004 10:48 PM by tom.elrod

    Network interface dies, EJB method invocations not failed ov

    tmarx

      I have three Intel boxes running RH9.0 and JBoss 3.2.1 set up in a clustered environment. In addition to the default partition, I have created an additional partition for these machines (TestPartition). In the deployed application, there are three SLSBs with clustering enabled for the TestPartition partition. They are also configured to use the round-robin load-balancing policy.

      A fourth Intel box running RH9.0 runs a separate stand-alone Java process which makes use of this cluster and retrieves the EJB remote interfaces via HA-JNDI and HA-JNDI auto discovery. The Hashtable passed in to the InitialContext constructor contains: provider_url=null, url_pkg_prefixes=org.jboss.naming:org.jnp.interfaces, jnp.partitionName=TestPartition, jnp.discoveryGroup=230.0.0.5, jnp.discoveryPort=230.0.0.5:1102. The discovery group and port match the values specified in the cluster deployment descriptor. This stand-alone process invokes methods on the clustered beans at a regular interval, and those methods in turn query db tables.
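
      For reference, the client environment described above could be built roughly like this. This is only a sketch: the class name HaJndiEnv is made up for illustration, the initial context factory is an assumption (the post does not say which one is used), and the discovery port is assumed to be the bare number 1102. The provider URL is simply left out, because a Hashtable cannot hold a null value.

```java
import java.util.Hashtable;
import javax.naming.Context;

public class HaJndiEnv {
    /** Builds the JNDI environment described in the post (hypothetical helper). */
    public static Hashtable<String, String> build() {
        Hashtable<String, String> env = new Hashtable<>();
        // Assumption: the usual JBoss naming factory; the post does not name it.
        env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
        // Context.PROVIDER_URL is deliberately not set: Hashtable rejects null
        // values, and the post relies on auto discovery instead of a fixed URL.
        env.put("jnp.partitionName", "TestPartition");
        env.put("jnp.discoveryGroup", "230.0.0.5");
        // Assumption: the port is the bare number taken from "230.0.0.5:1102".
        env.put("jnp.discoveryPort", "1102");
        return env;
    }

    public static void main(String[] args) {
        // Print the environment; a real client would pass it to new InitialContext(env).
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

      A real client would pass the result to new InitialContext(env), with the JBoss client jars on the classpath.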

      I bring up the three JBoss instances and I can see in the log files that all three machines successfully join the cluster. I start my stand-alone app and I can see that the work is being distributed amongst all three machines. If I kill one of the app server instances via a kill or kill -9, the remaining two app servers report the instance as a dead member and continue working. When I bring the instance back up, it rejoins the cluster and work is distributed to it.

      The problem I encounter is when I pull the network cable out of one of the machines or shut down the network interface. When I do that, the other members of the cluster mark that machine as dead, but any method invocations to it do not get failed over. They hang until the machine/instance comes back on-line.

      Has anyone else encountered this problem, or does anyone have any suggestions on how to fix or work around it?

      Thanks!

      -Tom

        • 1. Re: Network interface dies, EJB method invocations not faile

          Invocation failover depends strictly on the behavior of the
          transport being used. If the socket is not timing out and reporting
          that the connection is broken, there will be no failover. It's
          up to the OS TCP stack to determine how long to wait for the
          invocation to fail. We don't have a client-side timeout notion,
          so whatever OS keepalive settings exist need to be configured. The
          default is often 2 hours.
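
          Since there is no client-side timeout, one possible workaround is to
          put a time limit around each call in the client itself. The sketch
          below is not a JBoss API. It uses java.util.concurrent, so it needs a
          newer JVM than the ones current for JBoss 3.2.1, and the TimedCall
          class and its call method are made-up names. It does not unblock the
          thread stuck on the dead socket; it only lets the caller give up and,
          for example, retry with a freshly looked-up proxy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedCall {
    // Daemon threads, so a call stuck on a dead socket cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-call");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task on a worker thread and waits at most timeoutMillis for its result. */
    public static <T> T call(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requests an interrupt; a thread blocked in a socket read may ignore it.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for remote bean calls: one returns quickly, one takes too long.
        String fast = call(() -> "ok", 1000);
        System.out.println(fast);
        try {
            call(() -> { Thread.sleep(5000); return "late"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out, caller is free to retry elsewhere");
        }
    }
}
```

          In a real client the Callable would wrap the remote bean method call.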

          I checked on rh3.0, and its default is 2 hours:
          [starksm@localhost testsuite]$ cat /proc/sys/net/ipv4/tcp_keepalive_time
          7200

          [thanks Scott :) ]
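
          The value in that file is in seconds, so 7200 is 2 hours. As a small
          sketch, a client could read and report the setting at startup like
          this (the KeepaliveTime class is a made-up name, and the file only
          exists on Linux). Keep in mind that keepalive probes are only sent on
          sockets that have SO_KEEPALIVE enabled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;

public class KeepaliveTime {
    static final Path PROC_FILE = Paths.get("/proc/sys/net/ipv4/tcp_keepalive_time");

    /** Converts the raw file content (a number of seconds) into a Duration. */
    public static Duration parse(String raw) {
        return Duration.ofSeconds(Long.parseLong(raw.trim()));
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(PROC_FILE)) {
            System.out.println(PROC_FILE + " is not available on this system");
            return;
        }
        String raw = new String(Files.readAllBytes(PROC_FILE));
        Duration idle = parse(raw);
        System.out.println("Keepalive idle time: " + idle.getSeconds() + " s (" + idle.toMinutes() + " min)");
    }
}
```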

          On Solaris, you can change the tcp_keepalive_interval setting. See http://www.sun-microsystems.org/ for details (search for tcp_keepalive_interval, as it is a large page).