1 Reply Latest reply on May 8, 2013 1:43 PM by pruivo

    RpcManager#getMembers vs. Transport#getMembers - should they be in sync?

    c_lohmn

      Hi,

       

      I noticed that when a cluster node (CacheManager) is shut down, RpcManager#getMembers returns the new cluster state noticeably earlier than Transport#getMembers.

       

      In my scenario (a unit test), a node (=CacheManager instance) is stopped while another thread fires Callables via the ExecutorService on that node.

       

      At some point I get

      java.lang.IllegalArgumentException: Target node IBIS-60036 is not a cluster member, members are [IBIS-29670, IBIS-53537]

          at org.infinispan.distexec.DefaultExecutorService.submit(DefaultExecutorService.java:454)

      which is expected. Here the members come from RpcManager#getMembers.

       

      But when I invoke cacheManager.getTransport().getMembers() directly after that (that is, still before the VIEW_CHANGED event has arrived), I still get the old cluster state with 3 members.

      Only more than 80 ms later does the VIEW_CHANGED event arrive and Transport#getMembers return the new cluster state.

       

      Is that to be expected?

       

      In that case, I assume it's better to use RpcManager#getMembers in most cases (if I have a cache instance and each node is supposed to have that cache), right?

      (relates to ISPN-2641).

       

      Cheers,

      Carsten

        • 1. Re: RpcManager#getMembers vs. Transport#getMembers - should they be in sync?
          pruivo

          Hi,

           

          Yes, I think so. When you invoke stop() on the CacheManager, that node sends a message to announce that it is leaving. After this message, a state transfer may be triggered to avoid losing any data owned by the leaving node. At this point, RpcManager.getMembers() already knows about the leaving node and returns the most up-to-date cluster members.

           

          Later, the leaving node disconnects from JGroups and the view change is triggered.

           

          Cheers,

          Pedro
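
The ordering described in the reply above (leave message first, JGroups view change later) can be pictured with a small toy model. This is only a sketch: `NodeView`, `onLeaveMessage` and `onViewChanged` are hypothetical names made up for illustration, not Infinispan or JGroups types, and the real implementation is certainly more involved. It just shows why the two member lists can disagree for a short window.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical stand-ins, NOT Infinispan or JGroups classes.
public class MembershipOrderingSketch {

    // One surviving node's two views of the cluster.
    static final class NodeView {
        // Stand-in for the list RpcManager#getMembers would report.
        private volatile List<String> cacheMembers;
        // Stand-in for the list Transport#getMembers would report.
        private volatile List<String> transportMembers;

        NodeView(List<String> initial) {
            this.cacheMembers = List.copyOf(initial);
            this.transportMembers = List.copyOf(initial);
        }

        // Step 1 (assumed): the stopping node announces that it is leaving,
        // so only the cache-level membership is updated.
        void onLeaveMessage(String leaver) {
            List<String> updated = new ArrayList<>(cacheMembers);
            updated.remove(leaver);
            cacheMembers = List.copyOf(updated);
        }

        // Step 2 (assumed): the leaver disconnects and the new view is
        // delivered, so the transport-level membership catches up.
        void onViewChanged(List<String> newView) {
            transportMembers = List.copyOf(newView);
        }

        List<String> cacheMembers() { return cacheMembers; }

        List<String> transportMembers() { return transportMembers; }

        // True once both views list the same members in the same order.
        boolean inSync() { return cacheMembers.equals(transportMembers); }
    }

    public static void main(String[] args) {
        // Node names taken from the exception message in the question.
        NodeView survivor =
                new NodeView(List.of("IBIS-29670", "IBIS-53537", "IBIS-60036"));

        // The leave message arrives first: the two views now differ.
        survivor.onLeaveMessage("IBIS-60036");
        System.out.println("cache-level members:     " + survivor.cacheMembers());
        System.out.println("transport-level members: " + survivor.transportMembers());
        System.out.println("in sync? " + survivor.inSync());

        // Some time later the view change arrives and the views agree again.
        survivor.onViewChanged(List.of("IBIS-29670", "IBIS-53537"));
        System.out.println("in sync after view change? " + survivor.inSync());
    }
}
```

Code that validates a target node against one list and then reads the other in the window between the two steps will see exactly the mismatch from the original question, so it is probably safest to pick one source of membership and use it consistently.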