Version 3

    Synchronous RPCs and view changes


    When cluster-wide RPCs are invoked synchronously (i.e., the caller is blocked until every member has replied), view changes can cause problems.


    A simple example is a cluster whose current view is V2={A,B,C,D}.


    Let's say D crashes and, at the same time, A invokes a cluster RPC.


    If A received V3={A,B,C} before invoking the RPC, it would wait for replies from itself, B and C only. However, if A invoked the RPC first and received V3 only afterwards, it would wait for D's reply forever.
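
    The two orderings can be illustrated with a small sketch. It assumes that the set of members a synchronous call waits for is a snapshot of the view taken at invocation time; the class and method names (ViewSnapshot, pendingRepliers) are hypothetical and not part of any library.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: models which members a blocked caller is still waiting for.
class ViewSnapshot {
    // viewAtInvocation: the membership the caller saw when it started the RPC.
    // repliesReceived: members whose replies have arrived so far.
    static Set<String> pendingRepliers(List<String> viewAtInvocation, Set<String> repliesReceived) {
        Set<String> pending = new LinkedHashSet<>(viewAtInvocation);
        pending.removeAll(repliesReceived);
        return pending;
    }

    public static void main(String[] args) {
        Set<String> replies = new LinkedHashSet<>(Arrays.asList("A", "B", "C"));

        // V3 was installed before the call: nothing is pending, so the call returns.
        System.out.println(pendingRepliers(Arrays.asList("A", "B", "C"), replies));

        // The call started under V2: D stays pending, so the caller keeps blocking.
        System.out.println(pendingRepliers(Arrays.asList("A", "B", "C", "D"), replies));
    }
}
```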


    There are a few things that can be done to remedy this:

    • Bound the RPC with a timeout, say 5000 ms. In this case, if the RPC is invoked first and V3 is received only afterwards, the RPC returns after 5000 ms with valid return values for A, B and C, and a null return value for D, which is marked as 'not-received'.

    • Use asynchronous RPCs if you can. Sometimes, though, especially for data collection tasks where the caller needs the replies, this won't do.
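
    The timeout remedy can be sketched in plain Java as follows. This is a minimal illustration only, not the actual RPC implementation: it assumes each member's reply is modelled as a CompletableFuture, and the names BoundedRpc and callAll are hypothetical. Members that have not replied by the deadline are mapped to null, mirroring the 'not-received' case above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a synchronous cluster call bounded by one overall timeout.
class BoundedRpc {
    // Waits at most timeoutMs in total (not per member) for all replies.
    // A member without a reply at the deadline gets a null value ("not received").
    static Map<String, String> callAll(Map<String, CompletableFuture<String>> pending, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Map<String, String> replies = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<String>> e : pending.entrySet()) {
            long remaining = Math.max(deadline - System.nanoTime(), 0L);
            String value = null;
            try {
                value = e.getValue().get(remaining, TimeUnit.NANOSECONDS);
            } catch (TimeoutException | ExecutionException ex) {
                // No usable reply from this member: leave the value as null.
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
            replies.put(e.getKey(), value);
        }
        return replies;
    }

    public static void main(String[] args) {
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        pending.put("A", CompletableFuture.completedFuture("a"));
        pending.put("B", CompletableFuture.completedFuture("b"));
        pending.put("C", CompletableFuture.completedFuture("c"));
        pending.put("D", new CompletableFuture<>()); // D crashed: this never completes

        // A short timeout keeps the demo quick; the text above uses 5000 ms.
        System.out.println(callAll(pending, 200));
    }
}
```

    A single shared deadline is used here so that the caller blocks for at most the configured timeout, regardless of how many members fail to reply.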