1 Reply Latest reply on Apr 5, 2004 8:37 AM by Marco

    SFSB problem during state replication in HA cluster (1)?

    Marco Newbie

      Hi folks,

      I'm writing a proof of concept focused on session EJB high availability. It is based on two simple session EJBs: the first is stateless and the second, obviously, stateful, with simple conversation logic. Source code and deployment descriptors follow.

      To test EJB HA I've defined a JBoss cluster configuration made of three logical nodes running in DefaultPartition: a front-end node FE1 running only a web application as a client of the remote EJBs, and two back-end nodes, BE1 and BE2, which hold the same EJB package with the clustering tags set to true.

      The cluster, for development purposes only, runs on a single machine; configurations follow. The same cluster configuration has been tested on Linux (RH9) and Win2K with identical "unexpected" results.

      With the front-end node FE1 (web app, using HA-JNDI) and only one back-end node, say BE1, hosting the EJB app, the cluster works fine with both kinds of EJB. But when I turn on the second back-end node BE2, the new node is recognized by the cluster and the stateless EJBs are still well balanced using the default RoundRobin policy, while the stateful EJB makes the cluster go 'crazy' in an unrepeatable way: it rarely works and mostly fails. It seems to create an infinite number of stateful EJB instances, ending in a stack overflow exception on the node other than the one that is about to serve the request.
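A minimal sketch of the JNDI environment a client on FE1 might use for an HA-JNDI lookup. The factory class and package prefixes are the usual JBoss naming client settings; the provider URL `localhost:1100` is an assumption (1100 is the default HA-JNDI port, but with three nodes on one machine the ports have probably been remapped), and the class and method names are hypothetical.

```java
import java.util.Properties;
import javax.naming.Context;

public class HaJndiEnv {

    /**
     * Builds the JNDI environment for a JBoss HA-JNDI lookup.
     * providerUrl is host:port of the HA-JNDI service (assumed, e.g. "localhost:1100").
     */
    public static Properties haJndiEnvironment(String providerUrl) {
        Properties env = new Properties();
        // JBoss naming client factory and URL package prefixes
        env.setProperty(Context.INITIAL_CONTEXT_FACTORY,
                "org.jnp.interfaces.NamingContextFactory");
        env.setProperty(Context.URL_PKG_PREFIXES,
                "org.jboss.naming:org.jnp.interfaces");
        // Point at the HA-JNDI port rather than the plain JNDI port
        env.setProperty(Context.PROVIDER_URL, providerUrl);
        return env;
    }

    public static void main(String[] args) {
        Properties env = haJndiEnvironment("localhost:1100");
        env.forEach((key, value) -> System.out.println(key + "=" + value));
        // With the JBoss client jars on the classpath, the lookup itself would be:
        //   new javax.naming.InitialContext(env).lookup("<JNDI name of the EJB home>");
    }
}
```

The sketch only builds and prints the properties, so it runs without JBoss on the classpath; the actual lookup is left as a comment.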

      Scenario example:

      1- the client on FE1 creates a remote instance; the home obtained through HA-JNDI serves a new remote instance from BE1,
      2- the client starts and consumes a conversation with the remote instance; the EJB generates a random number,
      3- the client executes 'n' consuming calls to terminate the conversation, then removes the remote instance.
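The three steps above can be sketched as a plain Java simulation. This is not the real bean (its source is in message (3)): `Conversation` and `LocalConversation` are hypothetical stand-ins, and the assumption that the random number is the count of consuming calls needed is mine.

```java
import java.util.Random;

public class ConversationScenario {

    /** Hypothetical stand-in for the stateful bean's remote interface. */
    public interface Conversation {
        int start();        // begins the conversation, returns the calls needed
        boolean consume();  // one conversation step; true once finished
        void remove();      // discards the instance
    }

    /** In-memory implementation holding the conversational state. */
    public static class LocalConversation implements Conversation {
        private final Random random;
        private int remaining;
        private boolean removed;

        public LocalConversation(long seed) {
            this.random = new Random(seed);
        }

        public int start() {
            checkAlive();
            remaining = 1 + random.nextInt(10); // between 1 and 10 steps
            return remaining;
        }

        public boolean consume() {
            checkAlive();
            if (remaining <= 0) {
                throw new IllegalStateException("conversation not started or already finished");
            }
            remaining--;
            return remaining == 0;
        }

        public void remove() {
            removed = true;
        }

        public boolean isRemoved() {
            return removed;
        }

        private void checkAlive() {
            if (removed) {
                throw new IllegalStateException("instance already removed");
            }
        }
    }

    /** Runs steps 2 and 3 on an already created instance; returns the calls made. */
    public static int runScenario(Conversation conversation) {
        int expected = conversation.start();
        int calls = 0;
        boolean done = false;
        while (!done) {
            done = conversation.consume();
            calls++;
        }
        if (calls != expected) {
            throw new IllegalStateException("state lost: expected " + expected + ", made " + calls);
        }
        conversation.remove();
        return calls;
    }

    public static void main(String[] args) {
        // Step 1: in the real test the instance comes from the home found via HA-JNDI
        Conversation conversation = new LocalConversation(System.nanoTime());
        System.out.println("consuming calls made: " + runScenario(conversation));
    }
}
```

Every `consume()` call depends on state left by the previous one, so if a call lands on a node that does not hold the current state, the conversation cannot complete correctly.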

      [All traces are from the Win2K test run; the Linux test results are the same ... :-( I was hoping for a Gates mistake !!!!]

      This is an excerpt from the trace ...

      See also the messages
      "SFSB problem during state replication in HA cluster (2) ? (traces)"
      "SFSB problem during state replication in HA cluster (3) ? (source code and configs)"

      The mixed-OS test (Linux + Win2K) has not been done ... I don't want to win another nightmare !!!

      Up to here this may seem like a tutorial on how to test a cluster ;-) but ...

      Since stateless EJB availability and balancing work correctly, communication between the cluster nodes seems to work;
      the problem 'seems' to be about stateful EJB state propagation. Should I configure something specific for sharing EJB state?
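One check that can be done without the cluster: state propagated between nodes has to survive Java serialization, so a serialize/deserialize round trip of the bean's conversational state is a cheap sanity test. This is a minimal sketch; `ConversationState` and its fields are hypothetical, not the real bean from message (3), and it only tests serializability, not the cluster configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StateRoundTrip {

    /** Hypothetical conversational state of the stateful bean. */
    public static class ConversationState implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int randomNumber;
        public final int remainingCalls;

        public ConversationState(int randomNumber, int remainingCalls) {
            this.randomNumber = randomNumber;
            this.remainingCalls = remainingCalls;
        }
    }

    /** Serializes the state to bytes and reads it back, as a replica would receive it. */
    public static ConversationState copyOf(ConversationState state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(state);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ConversationState) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // A NotSerializableException here points at a field that cannot be replicated
            throw new IllegalStateException("state does not survive serialization", e);
        }
    }

    public static void main(String[] args) {
        ConversationState original = new ConversationState(7, 3);
        ConversationState copy = copyOf(original);
        System.out.println("randomNumber preserved: " + (copy.randomNumber == original.randomNumber));
        System.out.println("remainingCalls preserved: " + (copy.remainingCalls == original.remainingCalls));
    }
}
```

If the round trip fails or loses a field, the problem is in the bean's state rather than in the cluster setup; if it passes, the configuration remains the main suspect.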

      Is there something wrong in the 'cluster-service.xml' configuration?

      Can anyone advise me on the next serious step to get it working?

      Bela, hey Bela ... I'm talking to you ;-)

      P.S. Sorry for such a huge message, but it is difficult to describe such strange behaviour.