Compensating RESTful Transactions

Version 10

    Overview

    In a previous wiki article we introduced a proposal for supporting ACID transactions in a REST based environment. That protocol has availability issues (such as the need to take locks and the synchronous nature of phase one of the completion protocol) and is not appropriate for activities that may run for extended periods. Instead we propose a compensation based approach in which participants (or compensators) make changes visible but register a compensatory action which is performed if something goes wrong. We call the model LRA and is based on an old OASIS draft, Long Running Action transaction model. It is updated to be better suited for use in REST based architectures.

     

    Within the LRA model, an activity reflects business interactions: all work performed within the scope of an activity is required to be compensatable. Therefore, an activity’s work is either performed successfully or undone. How services perform their work and ensure it can be undone if compensation is required are implementation choices and is not exposed to the LRA model which simply defines the triggers for compensation actions and the conditions under which those triggers are executed. In other words, a LRA coordinator is concerned only with ensuring compensators obey the protocol necessary to make an activity compensatable (the semantics of the business interactions are not part of the model). Issues such as isolation of services between potentially conflicting activities and durability of service work are assumed to be implementation decisions. The coordination protocol used to ensure an activity is completed successfully or compensated is not two-phase and is intended to better model business-to-business interactions. Although this may result in non-atomic behaviour for the overall business activity, other activities may be started by the application or service to attempt to compensate in some other manner.

     

    In the model a LRA is tied to the scope of an activity so that when the activity terminates the LRA coordination protocol will be automatically performed either to accept or to compensate the work. For example, when a user reserves a seat on a flight, the airline reservation centre may take an optimistic approach and actually book the seat and debit the users account, relying on the fact that most of their customers who reserve seats later book them; the compensation action for this activity would be to un-book the seat and credit the user’s account.

     

    As in any business interaction, application services may or may not be compensatable. Even the ability to compensate may be a transient capability of a service. A Compensator is the LRA participant that operates on behalf of a service to undo the work it performs within the scope of a LRA or to compensate for the fact that the original work could not be completed. How compensation is carried out will obviously be dependant upon the business logic of the service.

    The Model

    The model concerns compensators and a coordinator. A client starts a new LRA via the LRA coordinator. When a service does work that may have to be later compensated within the scope of the LRA, it enlists a compensator with the LRA coordinator. Subsequently the client closes or cancels the LRA via the coordinator which in turn tells all enlisted compensators to either complete or compensate.

     

    The compensator will be invoked in the following way by the LRA coordinator when the activity terminates:

    • Success: the activity has completed successfully. If the activity is nested then compensators may propagate themselves (or new compensators) to the enclosing LRA. Otherwise the compensators are informed that the activity has terminated and they can perform any necessary cleanups.
    • Fail: the activity has completed unsuccessfully. All compensators that are registered with the LRA will be invoked to perform compensation in the reverse order. The coordinator forgets about all compensators that indicated they operated correctly. Otherwise, compensation may be attempted again (possibly after a period of time) or alternatively a compensation violation has occurred and must be logged.

    Each service is required to log sufficient information in order to ensure (with best effort) that compensation is possible. Each compensator or subordinate coordinator (in the case of nested LRAs) is responsible for ensuring that sufficient data is made durable in order to undo the LRA in the event of failures. Interposition and check pointing of state allow the system to drive a consistent view of the outcome and recovery actions taken, but allowing always the possibility that recovery isn’t possible and must be logged or flagged for the administrator. In a large scale environment or in the presence of long term failures recovery may not be automatic and manual intervention may be necessary to restore an application’s consistency.

     

    Different usage patterns for LRAs are possible, for example LRAs may be used sequentially and concurrently, where the termination of one LRA signals the start of some other unit of work within an application. However, LRAs are units of compensatable work and an application may have as many such units of work operating simultaneously as it needs to accomplish its tasks. Furthermore, the outcome of work within LRAs may determine how other LRAs are terminated. An application can be structured so that LRAs are used to assemble units of compensatable work and then held in the active state while the application performs other work in the scope of different (concurrent or sequential) LRAs. Only when the right subset of work (LRAs) is arrived at by the application will that subset be confirmed; all other LRAs will be told to cancel (complete in a failure state).

    Protocol URLs

    LRA Coordinator URL

     

    The path or stem of a LRA coordinator URL is <base uri>/lra-coordinator where base-uri is <protocol>://<hostname> (such as http://localhost:8080).

     

    1. Performing a GET on <base uri>/lra-coordinator returns a list of all transactions.
    2. Performing a GET on <base uri>/lra-coordinator/recovery returns a list of recovering transactions.
    3. Performing a GET on <base uri>/lra-coordinator/active returns a list of inflight transactions.
    4. Performing a DELETE on any of the lra-coordinator URLs will return a 401.
    The LRA URL

     

    Each client is expected to have a unique identity which we'll call ClientID (it can be a URL too).

     

    1. Performing a POST on <base uri>/lra-coordinator/start?ClientID=<ClientID> will start a new LRA with a default timeout and return a LRA URL that uniquely identifies the new LRA. An obvious choice for the URL format could be <base uri>/lra-coordinator/<LRAId>) but returning any URL should be valid (TODO change the URLs in points 2-6 below to ensure that any URL can be used to identify a LRA). Adding a query parameter, TimeLimit=<timeout>, will start a new LRA with the specified timeout (in milliseconds). If the LRA is terminated because of a timeout then the LRA will be cancelled. Adding a query parameter, ParentLRA=<parent LRA URL>, will nest the new LRA under the parent LRA which means that closing/cancelling the parent will automatically close/cancel the new LRA.
    2. Performing a GET on <base uri>/lra-coordinator/<LRAId> returns 200 if the LRA is still active.
    3. Performing a GET on <base uri>/lra-coordinator/completed/<LRAId> returns 200 if the LRA completed successfully (a 404 response means it is not present).
    4. Performing a GET on <base uri>/lra-coordinator/compensated/<LRAId> returns 200 if the LRA compensated (a 404 response means it is not present).
    5. Performing a PUT on <base uri>/lra-coordinator/<LRAId>/close will trigger the successful completion of the LRA and all compensators will be dropped by the coordinator (the complete message will be sent to each compensator). Upon termination, the URL is implicitly deleted. If it no longer exists, then 404 will be returned. The caller cannot know for sure whether the LRA completed or compensated without enlisting a compensator. The response body of the request may contain a JSON array of application specific strings (one for each compensator).
    6. Performing a PUT on <base uri>/lra-coordinator/<LRAId>/cancel will trigger the unsuccessful completion of the LRA and all compensators will be notified by the coordinator (the compensate message will be sent to each compensator).The response body of the request may contain a JSON array of application specific strings (one for each compensator).
    7. Performing a PUT on <base uri>/lra-coordinator/<LRAId>/renew along with a query parameter, TimeLimit=<timeout>, will update the timeout, in milliseconds, for the LRA starting from the time the PUT request was acted upon.

     

    Once the LRA terminates the implementation may retain information about it for an indeterminate amount of time.

    Compensators

     

    When making an invocation on a resource that needs to participate in a LRA, the LRA context (the LRA URL) needs to be transmitted to the resource. How this happens is outside the scope of this effort. It may occur as additional payload on the initial request such as in an HTTP header, or it may be that the client sends the context out-of-band to the resource. To facilitate interoperability between different implementations of this specification we recommend that the context is passed using an HTTP header with name the name "X-lra"

     

    Once a resource has the LRA context, it can register participation in the LRA (ie enlist the compensator). The compensator is free to use whatever URL structure it desires for uniquely identifying itself with the constraint that it must be unique for the LRA (ie the same compensator cannot be involved in more than one LRA). The <compensator URL> must support the following operations:

     

    1)   Performing a GET on the compensator URL will return the current status of the compensator, or 404 if the compensator is no longer present or 412 if the compensator has not yet been asked to compete or compensate. The following types are returned by compensators to indicate their current status:

    • Compensating: the compensator is currently compensating for the LRA.
    • Compensated: the compensator has successfully compensated for the LRA.
    • FailedToCompensate: the compensator was not able to compensate for the LRA. It must maintain information about the work it was to compensate until the coordinator sends it a forget message.
    • Completing: the compensator is tidying up after being told to complete.
    • Completed: the compensator has confirmed.
    • FailedToComplete: the compensator was unable to tidy-up.

    The compensator registers with a LRA by performing a PUT on the LRA URL with a body that contains the <compensator URL>. The PUT request returns a unique handle/resource reference (aka RecoveryCoordinator) so that it can be uniquely reasoned about later:

     

          <base uri>/lra-recovery-coordinator/<RecCoordId>

     

    2)   Performing a GET on this URL will return the original <compensator URL>.
    3)   Performing a PUT on this URL will overwrite the old <compensator URL> with the new one supplied.

    4)   Performing a DELETE or POST will return a 401.

    5)  Performing a POST on <compensator URL>/compensate will cause the compensator to compensate the work that was done within the scope of the transaction. Performing a GET or PUT on this url will return 400. The compensator can optionally return an application specific string which will ultimately be made available to whoever closed or cancelled the LRA.

    6)  Performing a POST on <compensator URL>/complete will cause the compensator to tidy up and it can forget this transaction. In either case the compensator will either return a 200 OK code and a <status URL> which indicates the outcome and which can be probed (via GET) and will simply return the same (implicit) information: <URL>/cannot-compensate or <URL>/cannot-complete. In the successful case the compensator can optionally return an application specific string (instead of a status URL) which will ultimately be made available to whoever closed or cancelled the LRA.

    If the compensator is unknown then 410 will be returned. It can be assumed by the coordinator that the service compensated.

    Note, a compensator that cannot compensate must maintain its information until it is told to forget via POST <compensator URL>/forget

     

    It is expected that the receipt of cannot-compensate or cannot-complete will be handled by the application or logged if not.

     

    A compensator can resign from a LRA at any time prior to the completion of an activity by performing a PUT on <base uri>/lra-coordinator/<LRAId>/remove with the URL of the compensator in the body of the request.

     

    When a compensator is enrolled within a LRA, the entity performing the enrol can supply a number of qualifiers which may be used by the coordinator and business application to influence the overall outcome of the activity. The currently supported qualifiers are:

    • TimeLimit: the time limit (in milliseconds, although the java annotation based support makes the unit configurable) that the compensator can guarantee that it can compensate the work performed by the service. After this time period has elapsed, it may no longer be possible to undo the work within the scope of this (or any enclosing) LRA and the compensate URL will be invoked. It may therefore be necessary for the application or service to start other activities to explicitly try to compensate this work. The application or coordinator may use this information to control the lifecycle of a LRA.

     

    Current Status

    This work is still at the specification stage but there is a PoC.

     

    Java Annotations

     

    A JAX-RS implementation of the specification should be achievable via a set of CDI annotations. The service developer annotates resources to specify how LRAs should be controlled:

     

    Controlling the lifecyle of an LRA

     

    @LRA - the Type element of the LRA annotation indicates whether a bean method is to be executed within a compensatable transaction context. Supported types are:

     

    REQUIRED: a new LRA is started if none is present and ended when the method finishes

    REQUIRES_NEW: if there is already an LRA it is suspended and a new one is started. When the method finishes any preexisting LRA is resumed, otherwise the new one is ended.

    MANDATORY: if there is no LRA present the method is not called and a PRECONDITION_FAILED status code is generated

    SUPPORTS: if there is already an LRA present the bean method will be called with it as the LRA context

    NOT_SUPPORTED: if there is already an LRA it is suspended (and resumed after the method finishes)

    NEVER: if there is already an LRA present the method is not called and a PRECONDITION_FAILED status code is generated

     

    The bean writer can also specify which http status codes returned by a bean method will cancel the LRA using the cancelOn or cancelOnFamily type elements which are arrays of HTTP status codes or status families, respectively.

     

    If an annotation causes an LRA to be started it will be ended when the bean method finishes. This behaviour can be overridden by setting the delayClose element to true.

     

    If there is an LRA present when a bean method is invoked it will still be active when the method finishes. This behaviour can be overridden by setting the terminal element to true.

     

    When an LRA is present it should be made available to the business logic via request and response headers (with the name "X-lra")

     

    Example:

     

      @POST

      @Path("/book")

      @Produces(MediaType.APPLICATION_JSON)

      @LRA(

                  value = LRA.Type.REQUIRED,

                  cancelOn = {Response.Status.INTERNAL_SERVER_ERROR} // cancel on a 500 code

                  cancelOnFamily = {Response.Status.Family.CLIENT_ERROR}, // cancel on any 4xx code

                  delayClose = true) // delayClose because we want the LRA to be associated with a booking until the user confirms the booking

      public Response bookTrip(...) { ... }

     

      @PUT

      @Path("/complete")

      @Produces(MediaType.APPLICATION_JSON)

      @Consumes(MediaType.APPLICATION_JSON)

      @LRA(

                   LRA.Type.SUPPORTS,

                   terminal = true) // the confirmation should trigger the closing of the LRA started in the bookTrip bean method

      public Booking confirmTrip(Booking booking) throws BookingException { ... }

     

    Compensating Activities

     

    Compensator join LRAs using the @Compensate and @Complete annotations. When a bean method executes in the context of an LRA any methods

    in the bean class that are annotated with @Compensate, @Complete and @Status will be used to as the compensator and all three must be present.

    If an annotation is present on multiple methods an arbitrary one is chosen. When the LRA is closed the method annotated with @Complete will be invoked.

    Similarly if the @Compensate method will be invoked if the LRA is cancelled. For example

     

    @POST

      @Path("/compensate")

      @Produces(MediaType.APPLICATION_JSON)

      @Compensate

      public Response compensateWork(@HeaderParam("X-lra") String lraId) { /* compensate for whatever activity the business logic has associated with lraId */}

     

    Nesting LRAs

     

    An activity can be scoped within an existing LRA using the @NestedLRA annotation. Invoking a method marked with this annotation will start a new LRA whose outcome

    depends upon whether the enclosing LRA is closed or canceled. If the nested LRA is closed but the outer LRA is canceled then the compensators registered with the

    nested LRA will be told to compensate. In the PoC there is no annotation to cancel a closed nested LRA so the demo uses an @Injected LRAClient bean (which encapsulates

    the spec API) to cancel the nested LRA (which then triggers the compensator that was registered with the @NestedLRA annoation).

     

    Timing out LRAs and Compensators

     

    The ability to compensate may be a transient capability of a service so compensators (and LRAs) can be timed out after which the compensate is called (or LRA cancelled).

    To set such a time limit use the @TimeLimit annotation, for example:

     

      @GET

      @Path("/doitASAP")

      @Produces(MediaType.APPLICATION_JSON)

      @TimeLimit(limit = 100, unit = TimeUnit.MILLISECONDS)

      @LRA(value = LRA.Type.REQUIRED)

      public Response theClockIsTicking(@HeaderParam(LRA_HTTP_HEADER) String lraId) {...}

     

    Leaving an LRA

     

    If a bean method annotated with @Leave is annotated in the context of a LRA then if the bean class has registered a compensator with the active

    LRA it will be removed from the LRA (will not be asked to complete or compensate when the LRA is ended).

     

    Reporting the status of a compensator

     

    Compensators must provide a method for reporting the status of the compensator by annotating one of the methods with the @Status annotation.

     

    It is the responsibility of the service writer to return (as a String) a valid status value from the enum:

     

    class enum CompensatorStatus {Compensating, Compensated, FailedToCompensate, Completing, Completed, FailedToComplete}

     

    If the compensator is in none of these states it should report the error using a BAD_REQUEST HTTP status code.