1 Reply Latest reply on Sep 17, 2003 12:08 PM by rob_dickinson42

    Uneven balancing with round robin policy

    rob_dickinson42

      I apologize for the length of this post; clustering can be complicated.

      I'm using the JBoss clustering framework to export clustered RMI services. My prototype initially works properly, but as I shut down and restart server nodes I'm seeing some unfairness in the round robin load balancing. It's certainly possible this isn't a JBoss bug but rather a problem in how I'm using the clustering framework.
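For reference, the behavior expected from a round robin policy can be sketched in plain Java: each call goes to the next target in the list, wrapping around at the end, so two nodes should strictly alternate. This is only an illustration and is not the JBoss RoundRobin implementation; the class name RoundRobinSketch, the choose method, and the "primary"/"backup" labels are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT the JBoss RoundRobin class.
class RoundRobinSketch {
    private int cursor = 0;

    // Returns the next target, advancing the cursor and wrapping at the end of the list.
    String choose(List<String> targets) {
        if (targets.isEmpty()) {
            throw new IllegalStateException("no targets available");
        }
        cursor = cursor % targets.size(); // stay in range if the list shrank
        return targets.get(cursor++);
    }

    public static void main(String[] args) {
        RoundRobinSketch policy = new RoundRobinSketch();
        List<String> targets = Arrays.asList("primary", "backup");
        // With two targets the calls alternate: primary, backup, primary, ...
        for (int i = 0; i < 6; i++) {
            System.out.println(policy.choose(targets));
        }
    }
}
```

How the real policy behaves when the target list changes between calls is exactly the open question in this post; the sketch simply wraps the cursor.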

      THE TEST CASE

      I've got a simple MBean (a "beacon") that exports an RMI server using the JBoss HA framework, and then binds the RMI stub into local JNDI:

      public class Beacon
          extends org.jboss.system.ServiceMBeanSupport
          implements BeaconMBean
      {
          public void startService() throws Exception
          {
              rebind();
          }

          private void rebind() throws Exception
          {
              log.info("Rebinding...");

              // Grab HA partition
              String pname = "/HAPartition/" + partitionName;
              InitialContext context = new InitialContext();
              HAPartition partition = (HAPartition)context.lookup(pname);

              // Create HA-RMI server
              this.beaconImpl = new BeaconImpl(partition);
              this.beaconServer = new HARMIServerImpl(partition, "Beacon",
                  BeaconInterface.class, beaconImpl, 0, null, null);

              // Bind server stub
              BeaconInterface stub = (BeaconInterface)beaconServer.createHAStub(new RoundRobin());
              context.rebind(jndiName, stub);

              log.info("Rebinding complete");
          }

          // LOTS OF DETAIL OMITTED HERE!
      }


      The beacon RMI server does nothing but increment a counter and log the new value.

      My simple client looks up the beacon RMI stub via JNDI and then calls it repeatedly. It repeats the lookup every 10 calls so that I can detect when the load balancing gets stuck (the periodic lookup wouldn't otherwise be required).

      import java.util.Properties;
      import javax.naming.InitialContext;

      public class Client
      {
          public static void main(String[] args) throws Exception
          {
              Properties p = new Properties();
              p.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
              p.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
              p.put("jnp.partitionName", "DefaultPartition");
              p.put("java.naming.provider.url", "jnp://localhost:80,jnp://localhost:81");

              int count = 0;
              BeaconInterface beacon = null;
              while (true) {
                  try {
                      // Look up the stub again after a failure and on every 10th call
                      if ((beacon == null) || count++ % 10 == 0) {
                          InitialContext context = new InitialContext(p);
                          beacon = (BeaconInterface)context.lookup("mybeacon");
                      }
                      System.out.println("OK: " + beacon.execute(null));
                  } catch (Throwable t) {
                      // Drop the stub so the next iteration forces a fresh lookup
                      beacon = null;
                      t.printStackTrace();
                  } finally {
                      Thread.sleep(100);
                  }
              }
          }
      }



      RUNNING THE TEST CASE

      1) Download and expand the jboss-roundrobinbug.zip file.

      2) You need JBoss 3.2.2rc3 somewhere on your hard disk. If this isn't installed to 'c:\jboss322rc3', then edit the build properties in the test project accordingly.

      3) Run 'ant install' to configure your JBoss server to run the test project. (This creates a 'beacon' server configuration, while leaving any other servers alone.) This sets up the 'primary' JBoss server.

      4) Copy your JBoss server to a new directory (like 'c:\jboss322rc3-2'). Edit the 'cluster-service.xml' file and change the port to 81. (It's 80 by default.) This sets up the 'backup' JBoss server.

      5) Run the 'ant test' target from the Ant test project build. This starts the client polling, which initially results in a stream of SocketTimeoutExceptions.

      6) Start the primary JBoss server (run -c beacon). Now the client starts to make calls to the beacon. When the count is evenly divisible by 10, the client pauses momentarily, as expected, while the InitialContext is recreated.

      0
      1
      2
      3
      4
      ...

      7) After letting the primary run for a bit, fire up the backup JBoss server (again, run -c beacon). Now the client starts to evenly load balance:

      511
      0
      512
      1
      513
      2
      514
      ...

      8) Now stop and restart the primary JBoss server. The client now gets stuck on the backup for stretches of calls. If the InitialContext were not refreshed periodically, the client would have a 50-50 chance of never calling the primary again after it fails.

      1
      19533
      2
      19534
      3
      19535
      4
      19536
      5
      19537
      19538
      19539
      19540
      19541
      19542
      19543
      19544
      19545
      19546
      19547
      6
      19548
      7
      19549
      ...

      Once the client starts to see the behavior in step #8, there doesn't appear to be a way to get it unstuck. Bouncing either the primary or the backup again never restores the even load balancing seen in step #7.
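One way to put a number on "stuck" is to measure the longest streak of consecutive calls answered by the same node. The sketch below assumes the client can label each response with the node that served it; the class name StickinessCheck, the longestRun method, and the "primary"/"backup" labels are hypothetical and not part of the test project. Under strict round robin across two nodes the longest streak is 1, whereas the step #8 output above contains a streak of 11 backup responses (19537 through 19547).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic; the node labels are assumed to be supplied by the caller.
class StickinessCheck {
    // Length of the longest streak of consecutive calls answered by the same node.
    static int longestRun(List<String> servedBy) {
        int longest = 0;
        int current = 0;
        String previous = null;
        for (String node : servedBy) {
            // Extend the streak if the same node answered again, otherwise start over.
            current = node.equals(previous) ? current + 1 : 1;
            longest = Math.max(longest, current);
            previous = node;
        }
        return longest;
    }

    public static void main(String[] args) {
        List<String> even = Arrays.asList("primary", "backup", "primary", "backup");
        List<String> stuck = Arrays.asList("primary", "backup", "backup", "backup", "primary");
        System.out.println(longestRun(even));  // prints 1
        System.out.println(longestRun(stuck)); // prints 3
    }
}
```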

      If anyone can shed some light on what's happening here, I'd appreciate it! I rather hope this is something I'm doing wrong, rather than a framework bug...