Clustering Troubleshooting Ideas

Version 3

    Automatic Thread Dump Generation

    JIRA: JBAS-6311

    All In One JGroups Test

    - What is it? A cluster certification test in JGroups.

    - N nodes certify each other as ready for JBoss Clustering.
    - Tests to run/combine: multicast test, view demo test, large state test.

     

    Multicast test:
      - It is tough to automate an N-node send/receive rotation that checks multicast traffic is flowing.
      - This is easier to do with the view demo, so the multicast test might not be necessary.
      - Alternative: all nodes send at the same time, each adding something different to the message, and each node then checks that it got N messages back for the message it sent.
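The last idea can be sketched as a small in-memory simulation. It is only one reading of the note, under these assumptions: each node multicasts a uniquely tagged probe, every receiver multicasts an echo with its own tag appended, and a node passes when it gets N echoes back (its own included). The class name, the link matrix and the payload format are all hypothetical, and no JGroups API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for the multicast check; it does not use JGroups. */
final class MulticastEchoCheck {

    /** Builds a link matrix in which every node reaches every node. */
    static boolean[][] fullMesh(int n) {
        boolean[][] links = new boolean[n][n];
        for (boolean[] row : links) {
            Arrays.fill(row, true);
        }
        return links;
    }

    /**
     * links[a][b] == true means a multicast sent by node a reaches node b.
     * Returns, for each sender, the set of nodes whose echo came back.
     */
    static List<Set<Integer>> collectEchoes(boolean[][] links) {
        int n = links.length;
        List<Set<Integer>> echoes = new ArrayList<>();
        for (int sender = 0; sender < n; sender++) {
            Set<Integer> got = new TreeSet<>();
            String probe = "probe-from-" + sender;       // unique per sender
            for (int receiver = 0; receiver < n; receiver++) {
                if (!links[sender][receiver]) {
                    continue;                            // probe never arrived
                }
                // The receiver adds its own tag before echoing the probe.
                String echo = probe + "/seen-by-" + receiver;
                // The echo is a multicast too, so it needs the reverse link,
                // and the sender only counts echoes of its own probe.
                if (links[receiver][sender] && echo.startsWith(probe)) {
                    got.add(receiver);
                }
            }
            echoes.add(got);
        }
        return echoes;
    }

    /** Nodes that did not get all N echoes back, in ascending order. */
    static List<Integer> failedNodes(boolean[][] links) {
        List<Set<Integer>> echoes = collectEchoes(links);
        List<Integer> failed = new ArrayList<>();
        for (int node = 0; node < echoes.size(); node++) {
            if (echoes.get(node).size() != links.length) {
                failed.add(node);
            }
        }
        return failed;
    }
}
```

Clearing a single entry of the matrix (for example links[2][0]) models one blocked direction; in this sketch both endpoints of that link then fail the check, because one never sees the probe and the other never sees the echo.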

     

    View demo + large state test:
      - start with the XML configuration, the number of expected nodes in the cluster (N) and the state size to transfer (e.g. 1 MB)
      - coordinator checks that view contains the number of expected nodes.
      - the N-1 nodes that are plain members (not the coordinator) must have received the state correctly (they know the size they should expect).
      - coordinator instructs 2nd node in cluster to die/shutdown and restart channel after T timeout.
      - coordinator checks that after instruction to shutdown, the view is updated correctly.
      - coordinator checks that, after waiting long enough, the 2nd node is part of the view again and has received the state correctly.
      - repeat this with each of the other N-1 nodes that are supposed to form the cluster.
      - once the current coordinator, c1, has done this with all nodes, it shuts down its own channel so that a new coordinator is elected.
      - new coordinator, c2, does the same thing all over again with the rest of the nodes.
      - test ends once all permutations c1-cN have been done.
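The rotation above can be sketched as a plan of "coordinator->bounced node" steps. This is a rough model, not a JGroups program, and it rests on assumptions the notes do not state: the first member of the view acts as coordinator, a node that restarts its channel rejoins at the end of the view, and the outgoing coordinator also restarts and rejoins so that every coordinator has N-1 members to bounce. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plans the bounce order for the view demo + large state test. */
final class BounceSchedule {

    /**
     * Returns steps of the form "coordinator->bounced node".
     * initialView lists the members in join order; index 0 is treated
     * as the coordinator (an assumption of this sketch).
     */
    static List<String> plan(List<String> initialView) {
        List<String> view = new ArrayList<>(initialView);
        List<String> steps = new ArrayList<>();
        int n = view.size();
        for (int round = 0; round < n; round++) {
            String coordinator = view.get(0);
            // Bounce the current 2nd node N-1 times; each bounced node
            // rejoins at the end, so every other member gets its turn.
            for (int k = 0; k < n - 1; k++) {
                String bounced = view.remove(1);
                steps.add(coordinator + "->" + bounced);
                view.add(bounced);
            }
            // The coordinator shuts down its own channel; the next member
            // takes over and the old coordinator rejoins at the end.
            view.remove(0);
            view.add(coordinator);
        }
        return steps;
    }
}
```

With N members the plan has N * (N - 1) steps. For the view [A, B, C] it yields A->B, A->C, B->C, B->A, C->A, C->B. A real run would follow each step with the view and state-size checks listed above.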

     

    Testing:
      - emulate mcast traffic not working, e.g. by misconfiguring the firewall
      - emulate mcast traffic working but not able to cope with the state size (how to do this is still open)

     

    Cluster Message Flow Representation

    Show JGroups/JBossCache message flow from data gathered from multiple nodes in the cluster.