2 replies. Latest reply on Aug 20, 2012 9:01 AM by masterv

    large cluster - adding removing nodes causes cluster to fall apart

    masterv

      Hi there

      Sorry, I am part of the operations team supporting one of our applications, which runs on an Infinispan cluster of around 50 nodes using an older release, 5.0.0. The current issue is that when a new node is added, especially while the cluster is active, it can show up as a member of a new, separate cluster even though it is joining the correct multicast IP. Sometimes stopping a node causes other nodes to drop out too.
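A quick way to confirm this kind of split from the operations side is to collect the member list that each node reports and check whether all nodes agree. How the lists are collected (logs, monitoring) is left out and assumed. This is a minimal plain-Java sketch; the class name and node names are hypothetical and it is not part of Infinispan or JGroups. It treats any disagreement between reported member lists as a split.

```java
import java.util.*;

/**
 * Hypothetical diagnostic helper (not Infinispan/JGroups code): given the
 * member list each node reports, decide whether the nodes agree on one
 * cluster or have split into several.
 */
public class SplitViewCheck {

    /** Groups node names by the (ordered) member list they report. */
    static Map<List<String>, List<String>> groupByView(Map<String, List<String>> reportedViews) {
        Map<List<String>, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : reportedViews.entrySet()) {
            groups.computeIfAbsent(e.getValue(), v -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** True when more than one distinct member list is reported. */
    static boolean isSplit(Map<String, List<String>> reportedViews) {
        return groupByView(reportedViews).size() > 1;
    }

    public static void main(String[] args) {
        // Hypothetical reports: node-c has started a single-member cluster of its own.
        Map<String, List<String>> views = new LinkedHashMap<>();
        views.put("node-a", List.of("node-a", "node-b"));
        views.put("node-b", List.of("node-a", "node-b"));
        views.put("node-c", List.of("node-c"));
        System.out.println(isSplit(views)); // true
        System.out.println(groupByView(views));
    }
}
```

Reports taken while a view change is still in progress can disagree briefly, so a single mismatch is not proof of a lasting split; repeated checks over a short interval are more telling.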

      The other issue is that when certain nodes start up, their node IDs are returned as hexadecimal values.

      Having worked with Infinispan clusters of different sizes, it seems that smaller clusters behave better. I am unsure what the best range is, but let's assume that a cluster of 10 nodes would work fine and that adding or removing nodes would be less complicated, or so it would seem.
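If the single ~50-node cluster were replaced by several smaller, independent clusters of roughly that size, then as far as I know Infinispan 5.x would not share data between separate clusters by itself, so the application would have to decide which cluster owns each key. The sketch below is a hypothetical plain-Java illustration of that idea: the class `ClusterRouter` and the cluster names are made up, and it is not an Infinispan API. It simply hashes each key to one of N clusters.

```java
import java.util.*;

/**
 * Hypothetical application-level sharding sketch (not an Infinispan feature):
 * several independent clusters, with the application choosing one per key.
 */
public class ClusterRouter {
    private final List<String> clusters;

    ClusterRouter(List<String> clusters) {
        if (clusters.isEmpty()) throw new IllegalArgumentException("need at least one cluster");
        this.clusters = List.copyOf(clusters);
    }

    /** Picks the cluster that owns a key; the same key always maps to the same cluster. */
    String clusterFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return clusters.get(Math.floorMod(key.hashCode(), clusters.size()));
    }

    public static void main(String[] args) {
        // Hypothetical names for four independent clusters of roughly 12 nodes each.
        ClusterRouter router = new ClusterRouter(
                List.of("cluster-1", "cluster-2", "cluster-3", "cluster-4"));
        for (String key : List.of("session:42", "user:alice", "order:1001")) {
            System.out.println(key + " -> " + router.clusterFor(key));
        }
    }
}
```

Plain modulo hashing remaps most keys whenever the number of clusters changes; a consistent-hash ring would limit that, at the cost of more code.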

      I am trying to find the best solution to this dilemma, since most of the time this issue causes an outage, and I would like to find a way to cause less disruption.

      1. Would it be possible to run a large cluster like the one above on, say, 4 different multicast groups but still have them work together or share the cache across the groups? If so, would this be a matter of configuration, development, or both?

      2. What if the master node that starts the cluster had a much better specification than the rest of the nodes and did nothing besides acting as the master cluster server, i.e. not serving data publicly? Would this help nodes be added or removed without disturbing or breaking the entire cluster?
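As far as I understand, Infinispan 5.x has no configurable master node: its JGroups transport treats the first (oldest) member of the current view as the coordinator, and that role passes to the next-oldest member when the coordinator leaves. The simplified model below is plain Java with a hypothetical class name, not JGroups code, and only illustrates that ordering rule. It shows that a higher-spec machine coordinates only while it is the oldest member, and that after a restart it rejoins at the end of the view.

```java
import java.util.*;

/**
 * Simplified membership model (not JGroups code): the coordinator is the
 * first, i.e. oldest, member of the view. Joiners are appended, leavers removed.
 */
public class MembershipModel {
    private final List<String> view = new ArrayList<>();

    void join(String member) {
        if (!view.contains(member)) view.add(member); // newest member goes last
    }

    void leave(String member) {
        view.remove(member);
    }

    /** The oldest surviving member, or null if the view is empty. */
    String coordinator() {
        return view.isEmpty() ? null : view.get(0);
    }

    List<String> view() {
        return Collections.unmodifiableList(view);
    }

    public static void main(String[] args) {
        MembershipModel m = new MembershipModel();
        m.join("big-master"); // started first, so it coordinates
        m.join("node-1");
        m.join("node-2");
        System.out.println(m.coordinator()); // big-master
        m.leave("big-master"); // e.g. restarted for maintenance
        System.out.println(m.coordinator()); // node-1
        m.join("big-master"); // rejoins at the end of the view
        System.out.println(m.coordinator()); // node-1
    }
}
```

Whether a dedicated, otherwise idle coordinator would make joins and leaves smoother in practice is not something this model can show; it only shows that the role is not pinned to a particular machine.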

        • 1. Re: large cluster - adding removing nodes causes cluster to fall apart
          galder.zamarreno

          Hmmm, we've been creating big clusters in our environment without issues, but I'd suggest you try a more recent Infinispan version, such as 5.1.5.FINAL.

          Remember too that, if you are running in production, we now have a supported version of Infinispan called JBoss Enterprise Data Grid, for which we provide professional support:

          http://www.redhat.com/products/jbossenterprisemiddleware/data-grid/

          • 2. Re: large cluster - adding removing nodes causes cluster to fall apart
            masterv

            Thanks, Galder. We are planning to upgrade Infinispan but will need to work with the developers to ensure all is well.

            We are using IBM HS21/HS22 blades to host the cluster, and Infinispan runs inside Tomcat on these nodes. Each host in the cluster runs multiple Tomcat instances (to increase the cluster size).

            We have noticed a pattern with the blades: if a stop is issued to one node on a blade that hosts other members of the same cluster (same multicast group), then all the nodes on that blade chassis tend to break. Stopping and starting any of these nodes makes them start their own clusters. To work around this, we change the multicast IP and start up the entire cluster again.
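Since several Tomcat instances share each blade, one blade (or chassis) problem takes out several cluster members at once. The sketch below is plain Java with a hypothetical class name and an assumed "host:port" member format. It counts members per host and reports the worst-case share of the cluster lost when a single host goes down. Grouping by chassis would work the same way given a host-to-chassis mapping, which is not shown.

```java
import java.util.*;

/**
 * Hypothetical sketch: how concentrated are cluster members on physical hosts?
 * Member strings are assumed to look like "host:port".
 */
public class BlastRadius {

    /** Number of cluster members per host, sorted by host name. */
    static Map<String, Integer> membersPerHost(List<String> members) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String m : members) {
            int colon = m.lastIndexOf(':');
            String host = colon >= 0 ? m.substring(0, colon) : m;
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    /** Percentage of the cluster lost if the most heavily loaded host goes down. */
    static double worstCaseLossPercent(List<String> members) {
        if (members.isEmpty()) return 0.0;
        int max = Collections.max(membersPerHost(members).values());
        return 100.0 * max / members.size();
    }

    public static void main(String[] args) {
        // Hypothetical member list: three Tomcat instances on blade01, two on blade02.
        List<String> members = List.of(
                "blade01:8080", "blade01:8081", "blade01:8082",
                "blade02:8080", "blade02:8081",
                "blade03:8080");
        System.out.println(membersPerHost(members));       // {blade01=3, blade02=2, blade03=1}
        System.out.println(worstCaseLossPercent(members)); // 50.0
    }
}
```

If I remember correctly, Infinispan 5.x also offers topology-aware distribution (machine and rack identifiers set on the transport) so that all copies of a key are not placed on one machine; the exact attribute names should be checked in the 5.x documentation before relying on it.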

            We are unsure whether others have had similar issues with IBM blades and Infinispan clustering.