4 Replies Latest reply on Jan 18, 2006 8:23 PM by brian.stansberry

    JBAS-2677 Only update FamilyClusterInfo targets if elements

    brian.stansberry

      In the description of JBAS-2677, I proposed to fix it by changing "FamilyClusterInfoImpl.updateClusterInfo() so it doesn't change its targets unless A) it's out of sync with the server, B) the view id has changed, or C) the new target list has different elements from the old."

      I'm planning to drop step "C" from the algorithm as it will be inefficient to compare two lists in large clusters. The view id is essentially a hash of the JGroups IPAddress(es) of the nodes where a key is deployed; using it to detect whether the target list has changed should be sufficient.

      The ways I can see where this could break down are:

      1) Hash collision in viewId. Very unlikely.
      2) No change in cluster nodes or beans deployed thereon, so view id hasn't changed. But, an invoker has been redeployed. Now the invoker stub in the target list doesn't match what's on the server. Again, not likely - user would have to redeploy the invoker and all EJBs that use it, but not restart the ClusterPartition.

      If either of these occurs, the consequence is that FamilyClusterInfo misses an update and the target list includes an invalid target. As soon as a failed attempt is made to contact that target, the FCI will be marked as out of sync with the server and on the next proxy download the target list will be updated.

        • 1. Re: JBAS-2677 Only update FamilyClusterInfo targets if eleme
          starksm64

          But doesn't the round robin order show up for repeated use? This seems like a minor transient issue that evens out in the end. Introducing stateful behavior in a stateless proxy requires reuse of the proxy. If one wants globally unique iteration across all home/remote proxy invocations, a separate load balancing policy that does not depend on the ArrayList implementation details of the FamilyClusterInfo is needed.

          Why the DistributedReplicantManagerImpl is not maintaining a consistent ordering based on the jgroups list is certainly one question. It comes down to the contract guaranteed by the DistributedReplicantManager.

          • 2. Re: JBAS-2677 Only update FamilyClusterInfo targets if eleme
            brian.stansberry

            In the support case that gave rise to this issue, their test case always generated patterns where the client would end up never or almost never calling certain nodes. But you're right, with different or more varied usage patterns than their test case, the calls would probably even out over time. Wouldn't be round-robin though; more like semi-random.

            Re: doing a global iteration across all proxies of a given family, that seemed to me to be a natural benefit of ClusteringTargetsRepository and its static map, so I assumed it was meant to work that way. (Note also that because of this issue, even the calls from the single home proxy that was used throughout their test weren't round-robin).

            The DRM.ReplicantListener interface doesn't say anything about the order of the replicant list. If we tightened up some of the contracts in the HAPartition interface javadoc (e.g. say getClusterNodes() will return an array that is consistently ordered across the cluster), DRMImpl could be changed to notify listeners with a consistently ordered list of replicants and I could add that fact to the DRM.ReplicantListener contract.
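A minimal sketch of what "notify listeners with a consistently ordered list of replicants" could look like, assuming getClusterNodes() really does return the same order on every member. The class name, method name, and the node-name-to-replicant map are hypothetical and are not taken from DRMImpl.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of DistributedReplicantManagerImpl.
class ReplicantOrderingSketch {

    /**
     * Builds the replicant list for one key in cluster-node order.
     *
     * @param clusterNodes     node names in the order assumed to be
     *                         identical on every member of the cluster
     * @param replicantsByNode replicant registered by each node for the key
     */
    static List<Object> orderByClusterNodes(List<String> clusterNodes,
                                            Map<String, ?> replicantsByNode) {
        List<Object> ordered = new ArrayList<>();
        for (String node : clusterNodes) {
            Object replicant = replicantsByNode.get(node);
            // Nodes that have not deployed the key are simply skipped.
            if (replicant != null) {
                ordered.add(replicant);
            }
        }
        return ordered;
    }
}
```

With this shape the order of the list handed to listeners depends only on the cluster-node order, not on the iteration order of whatever map holds the replicants.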

            • 3. Re: JBAS-2677 Only update FamilyClusterInfo targets if eleme
              starksm64

              What was the usage pattern?

              The ClusteringTargetsRepository is still replacing the cached FamilyClusterInfoImpl targets list on the instance, so it does not really seem to be doing much. This is where your algorithm would have to be applied to prevent thrashing of the targets list. It could also maintain an ordered set of the targets to avoid relying on the order coming from the DRM.
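One way the "ordered set of the targets" idea could be sketched is to put every incoming list into a canonical order before caching it. This assumes each target has a stable, unique toString() to sort on, which is an assumption made for illustration and not something the invoker stubs are known to guarantee. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of ClusteringTargetsRepository.
class OrderedTargetsSketch {

    /**
     * Returns the targets sorted by their string form, so the result does
     * not depend on the order in which the DRM delivered them.
     * Targets whose string forms are equal collapse into one entry.
     */
    static ArrayList<Object> canonicalOrder(List<?> targets) {
        // Assumption: toString() is stable and unique per target.
        Comparator<Object> byName = Comparator.comparing(Object::toString);
        TreeSet<Object> ordered = new TreeSet<>(byName);
        ordered.addAll(targets);
        return new ArrayList<>(ordered);
    }
}
```

Two lists with the same members in different orders come out identical, so a round-robin index into the cached list keeps pointing at the same target after an update.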

              • 4. Re: JBAS-2677 Only update FamilyClusterInfo targets if eleme
                brian.stansberry

                The usage pattern was pretty simple. 3 node cluster.

                1) Look up an SLSB home.
                2) Call home.create().
                3) Invoke 2 methods on the bean.
                4) Repeat steps 2 and 3 ten times; always use the same home.

                For the creates, the patterns they reported (where the # is the server that got the invocation) were 3212212212, 2122122122 and 2113113113. For the calls on the remote itself, the one series they reported was 12312113212113212113.

                If I sat down and analyzed it, I suspect the pattern had something to do with a 3 node cluster and 3 calls per series, etc.

                The change I was thinking of was to FamilyClusterInfoImpl.updateClusterInfo():

                public ArrayList updateClusterInfo (ArrayList targets, long viewId)
                {
                   synchronized (this)
                   {
                      if (!this.isViewMembersInSyncWithViewId || (viewId != this.currentViewId))
                      {
                         this.targets = (ArrayList) targets.clone();
                      }
                      this.currentViewId = viewId;
                      this.isViewMembersInSyncWithViewId = true;
                   }
                   return this.targets;
                }


                Only change is the if test around the update of the targets member.

                As I was about to hit "submit", I realized the above change is dangerous. I think the analysis I went through in the first post is valid for the client side, but this code is also used on the server side. If a target list got updated on the server side but the viewId didn't change, the new targets would never get applied. This is highly unlikely, but very bad if it happened. Either a comparison of the list contents is needed in the "if" test, or consistent DRM ordering, or both.
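For comparison, a rough sketch of the variant with a list-content check in the "if" test. TargetListSketch is a hypothetical stand-in that models only the three fields used in the snippet above; it is not the real FamilyClusterInfoImpl.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FamilyClusterInfoImpl (illustration only).
class TargetListSketch {
    private ArrayList<Object> targets = new ArrayList<>();
    private long currentViewId = -1;
    private boolean inSync = false;

    public synchronized ArrayList<Object> updateClusterInfo(List<?> newTargets, long viewId) {
        // Replace the cached list if we are out of sync, the view id moved,
        // or the contents differ. List.equals() is element-by-element and
        // order-sensitive, so it costs O(n) per call.
        boolean changed = !inSync
                || viewId != currentViewId
                || !targets.equals(newTargets);
        if (changed) {
            targets = new ArrayList<>(newTargets);
        }
        currentViewId = viewId;
        inSync = true;
        return targets;
    }
}
```

Because equals() is order-sensitive, the same members arriving in a different order still count as a change and replace the list, so this check alone does not stop reordering from disturbing round-robin.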

                I'm inclined toward consistent DRM ordering.