    Synchronous RPCs and view changes

     

    When cluster-wide RPCs are invoked synchronously (i.e., the caller is blocked until everyone has replied), view changes can cause problems.

     

    A simple example is a cluster whose current view is V2={A,B,C,D}.

     

    Let's say D crashes and - at the same time - A invokes a cluster RPC.

     

    If A received V3={A,B,C} before invoking the RPC, it would wait for replies only from itself, B and C. However, if A invoked the RPC first and received V3 only afterwards, it would wait for D's reply forever.
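
    The two orderings can be made concrete with a small sketch. It assumes, purely for illustration, that the caller snapshots the membership at invocation time and waits on exactly that set; `pending_replies` is a hypothetical helper, not part of any library.

```python
def pending_replies(view_at_invocation, live_members):
    """Members the caller still waits on after every live member has replied."""
    awaited = set(view_at_invocation)        # assumption: the view is snapshotted at call time
    replied = awaited & set(live_members)    # crashed members never reply
    return awaited - replied


V2 = {"A", "B", "C", "D"}
V3 = {"A", "B", "C"}
live = {"A", "B", "C"}  # D has crashed

# A installed V3 before invoking the RPC: nothing is outstanding.
print(pending_replies(V3, live))  # set()

# A invoked the RPC while still on V2: D's reply never arrives.
print(pending_replies(V2, live))  # {'D'}
```

    A non-empty result means a synchronous caller without a timeout would block indefinitely.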

     

    There are a few things that can be done to remedy this:

    • Bound the RPC with a timeout, say 5000. If the RPC is then invoked before V3 is received, it returns after 5000 with valid return values for A, B and C, and a null return value for D, which is marked as 'not-received'.

    • Use asynchronous RPCs if you can. Sometimes, though, especially for data collection tasks where the caller needs the replies, this won't do.
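
    The timeout remedy could look roughly like the sketch below. It is a simulation with threads rather than a real group-communication API: `cluster_rpc`, `NOT_RECEIVED` and the `call` function are hypothetical names, the crashed member is modelled as a call that never returns, and the timeout is scaled down to half a second to keep the demo quick.

```python
import threading
import time

NOT_RECEIVED = object()  # hypothetical marker for members that never replied


def cluster_rpc(members, call, timeout_s):
    """Invoke call(member) on every member, waiting at most timeout_s seconds overall."""
    results = {m: NOT_RECEIVED for m in members}
    lock = threading.Lock()

    def worker(member):
        value = call(member)
        with lock:
            results[member] = value

    # Daemon threads so a member that never replies cannot keep the process alive.
    threads = [threading.Thread(target=worker, args=(m,), daemon=True) for m in members]
    for t in threads:
        t.start()

    # One shared deadline: the total wait is bounded by timeout_s, not timeout_s per member.
    deadline = time.monotonic() + timeout_s
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))

    with lock:
        return dict(results)  # copy, so late replies cannot alter what the caller sees


never_set = threading.Event()


def call(member):
    if member == "D":
        never_set.wait()  # simulated crash: D never answers
    return f"reply from {member}"


replies = cluster_rpc(["A", "B", "C", "D"], call, timeout_s=0.5)
for member, reply in replies.items():
    print(member, "not-received" if reply is NOT_RECEIVED else reply)
```

    The caller gets the replies of A, B and C together with an explicit marker for D, and can decide for itself how to treat the missing value.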