Compensating Transactions: When ACID is too much

    I'm using this article as a place for the community to review a series of blog posts I'm preparing.

     

    The plan is to base my JUDCon:Boston talk roughly on this series of blog posts. The code examples will demonstrate the current design of the compensations API. I'll also be adding sections at the end of the talk, to explain what we have implemented so far and our future roadmap; I'm not sure how much of this I will cover in the blog posts. Maybe I'll have a part 5, that covers this.

     

    Compensating Transactions: When ACID is too much.

    Part 1: Introduction

     

    ACID transactions are a useful tool for application developers and can provide very strong guarantees, even in the presence of failures. However, ACID transactions are not appropriate for every situation. In this series of blog posts, I'll present several such situations and show how an alternative, non-ACID transaction model can be used.

     

    The isolation property of an ACID transaction is typically achieved through optimistic [REF] or pessimistic [REF] concurrency control. Both approaches can have a negative impact on certain classes of applications if the duration of the transaction exceeds a few seconds [REF]. This is frequently the case for transactions involving slow participants (humans, for example) or those distributed over high-latency networks (such as the Internet). Also, some actions cannot simply be rolled back, such as sending an email or invoking a third-party service.

     

    A common strategy for applications that cannot use ACID transactions is to throw out transactions altogether. However, with this approach you miss out on many of the benefits that transactions can provide. There are many alternative transaction models that relax some of the ACID properties while still retaining many of the strong guarantees essential for building robust enterprise applications. These models are often referred to as "Extended Transaction models" and should be considered before deciding not to use transactions at all.

     

    In the Narayana project, we have support for three Extended Transaction models: "Nested Top Level Transactions" [REF], "Nested Transactions" [REF] and a compensation-based model based on "Sagas" [REF]. In this series of blog posts I'll be focusing on the compensation-based approach.

     

     

    What is a ‘Compensation-based transaction’?

    Transaction systems typically use a two-phase protocol to achieve atomicity between participants. This is the case for both ACID transactions and our compensation-based transaction model. In the first phase, each participant of an ACID transaction makes durable any state changes that were made during the scope of the transaction. These state changes can then be either rolled back or committed once the outcome of the transaction has been determined. Participants in a compensation-based transaction behave slightly differently. Here, any state changes made in the scope of the transaction are committed during (or prior to) the first phase. In order to make "rollback" possible, a compensation handler is logged during the first phase. This allows the state changes to be 'undone' if the transaction later fails.
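
    The compensation-based half of this description can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the Coordinator and Handler types are hypothetical and are not the Narayana API, the "log" is an in-memory stack rather than durable storage, and running the handlers in reverse order is a choice made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; none of these types belong to the Narayana API.
final class CompensationSketch {

    /** Undo action logged by a participant once its work has been committed. */
    interface Handler {
        void compensate();
    }

    /** Toy coordinator whose handler log lives in memory (a real one would use durable storage). */
    static final class Coordinator {
        private final Deque<Handler> log = new ArrayDeque<>();

        /** Phase one: the participant's change is already committed; record how to undo it. */
        void completed(Handler handler) {
            log.push(handler);
        }

        /** Phase two, success: nothing needs undoing, so the log is discarded. */
        void close() {
            log.clear();
        }

        /** Phase two, failure: run the logged handlers, most recent first. */
        void cancel() {
            while (!log.isEmpty()) {
                log.pop().compensate();
            }
        }
    }

    public static void main(String[] args) {
        List<String> database = new ArrayList<>();
        Coordinator coordinator = new Coordinator();

        database.add("order-1"); // committed immediately, visible to everyone
        coordinator.completed(() -> database.remove("order-1"));
        database.add("payment-1");
        coordinator.completed(() -> database.remove("payment-1"));

        coordinator.cancel(); // both changes are undone by their handlers
        System.out.println(database); // prints []
    }
}
```

    Calling close() instead of cancel() would leave both entries in place, which mirrors the successful outcome described above.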

     

    What Effect Does this Have on the Isolation property of the Transaction?

    The isolation property of a transaction dictates what changes, if any, are visible outside of the transaction prior to its completion. For ACID transactions, isolation is usually strong, with database vendors offering some degree of relaxation via the isolation level configuration [REF]. In a compensation-based transaction, however, isolation is fully relaxed, allowing units of work to be completed and made visible to other transactions as the current compensation-based transaction progresses. The benefit of this model is that database resources are not held for prolonged periods of time. The downside is that the model is only applicable to applications that can tolerate this reduced level of isolation.

     

    The following two diagrams show an example in which a client coordinates invocations to multiple services that each make updates to a database. The diagrams are simplified in order to focus on the different isolation levels offered by an ACID transaction and a compensation-based transaction. The example also assumes a database is used by the service, but it could equally apply to other resources.

     

    acid-seq.png

    The diagram above shows a simplified sequence diagram of the interactions that occur in an ACID transaction. After the client begins the (ACID) transaction, it invokes the first service. This service makes a change to a database, and at this point database resources are held by the transaction. This example uses pessimistic locking. Had optimistic locking been used, the holding of database resources could have been delayed until the prepare phase, but this could result in more failures to prepare. The client then invokes the other services, which may in turn hold resources on other transactional resources. Depending on the latency of the network and the nature of the work carried out by the services, this could take some time to complete. All the while, the DB resources are still held by the transaction. If all goes well, the client then requests that the transaction manager commit the transaction. The transaction manager invokes the two-phase commit protocol, first preparing all the participants and then, if every participant prepares successfully, committing them. It's not until the database participant is told to commit that these database resources are released.

     

    From the diagram, you can see how, in an ACID transaction, DB resources could be held for a relatively long period of time. Also, assuming the service does not wish to make a heuristic decision, this duration is beyond the control of the service. It must wait to be informed of the outcome of the protocol, which is subject to any delays introduced by the other participants.

     

    compenastion-success-simple.png

     

    The diagram above shows a simplified sequence diagram of the interactions that occur in a compensation-based transaction. The client begins a new (compensation-based) transaction and then invokes the first service. The service then sends an update to the database, which is committed immediately in a relatively short, separate ACID transaction. At this point (not shown in the diagram) the service informs the transaction manager that it has completed its work, which causes the transaction manager to record the outcome of this participant's work to durable storage, along with the details of the compensation handler and any state required to carry out the compensation. It's possible to delay the commit of the ACID transaction until after the compensation handler has been logged (see here [REF]); this removes a failure window in which a non-atomic outcome could occur.

     

    The client now invokes the other services, which, in this example, behave similarly to the first service. Finally, the client can request that the transaction manager close (commit) or cancel (roll back) the compensation-based transaction. In the case of cancel, the transaction manager calls the compensating action associated with each participant that previously completed some work. In this example, the compensating action makes an update to the database in a new, relatively short ACID transaction. The service can also be notified if/when the compensation-based transaction closes. We'll cover situations in which this is useful later in this series. The close/compensate notification is retried until it is acknowledged by the service. Although this is good for reliability, it does require that the logic of the handlers be idempotent.
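
    To make the idempotency requirement concrete, here is a minimal sketch of a handler that can safely be delivered more than once. Everything in it is hypothetical: BookingStore, deliverWithRetry and the refund counter are invented for illustration, and the retry loop merely simulates a coordinator that re-sends a notification whose acknowledgement was lost.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; the names are illustrative and not part of the compensations API.
final class IdempotentHandlerSketch {

    /** Toy booking store in which a booking can be cancelled (and refunded) at most once. */
    static final class BookingStore {
        private final Set<String> cancelled = new HashSet<>();
        private int refundsIssued = 0;

        /** Safe to call repeatedly: only the first call for a given booking has an effect. */
        void cancel(String bookingId) {
            if (cancelled.add(bookingId)) {
                refundsIssued++;
            }
        }

        int refundsIssued() {
            return refundsIssued;
        }
    }

    /** Simulates re-delivery: the handler runs once, plus once per lost acknowledgement. */
    static void deliverWithRetry(Runnable handler, int lostAcknowledgements) {
        for (int attempt = 0; attempt <= lostAcknowledgements; attempt++) {
            handler.run();
        }
    }

    public static void main(String[] args) {
        BookingStore store = new BookingStore();
        deliverWithRetry(() -> store.cancel("booking-42"), 2); // handler runs three times
        System.out.println(store.refundsIssued()); // prints 1
    }
}
```

    A handler that incremented the refund counter unconditionally would issue three refunds here, which is exactly the kind of bug that repeated delivery exposes.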

     

    From the diagram, you can see that the duration for which DB resources are held is greatly reduced. This comes at the cost of relaxed isolation (see the 'changes visible' marker). However, in scenarios where compensation is rare, the relaxed isolation could be of little concern, as the visible changes are usually valid.

     

    It is also possible to mitigate this loss of isolation by marking the change as tentative in the first phase and then marking the change as confirmed/cancelled in the second phase. For example, the initial change could mark a seat on a plane as reserved; the seat could later be released or marked as booked, depending on the outcome of the transaction. Here we have traded the holding of database-level resources for the holding of application-level resources (in this case the seat). This approach is covered in more detail later in the series.
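
    The seat example can be sketched as a small state machine. This is an illustration only: SeatMap and SeatState are hypothetical names, the in-memory map stands in for a database table, and the API covered later in the series is not used here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the tentative-update pattern; not part of the compensations API.
final class SeatReservationSketch {

    enum SeatState { FREE, RESERVED, BOOKED }

    /** In-memory stand-in for a table of seats. */
    static final class SeatMap {
        private final Map<String, SeatState> seats = new HashMap<>();

        SeatState stateOf(String seat) {
            return seats.getOrDefault(seat, SeatState.FREE);
        }

        /** First phase: committed immediately, but visibly tentative to other readers. */
        boolean reserve(String seat) {
            if (stateOf(seat) != SeatState.FREE) {
                return false;
            }
            seats.put(seat, SeatState.RESERVED);
            return true;
        }

        /** Second phase, transaction closed: the tentative change becomes final. */
        void confirm(String seat) {
            if (stateOf(seat) == SeatState.RESERVED) {
                seats.put(seat, SeatState.BOOKED);
            }
        }

        /** Second phase, transaction cancelled: the seat becomes available again. */
        void release(String seat) {
            if (stateOf(seat) == SeatState.RESERVED) {
                seats.put(seat, SeatState.FREE);
            }
        }
    }

    public static void main(String[] args) {
        SeatMap seats = new SeatMap();
        System.out.println(seats.reserve("12A")); // prints true
        System.out.println(seats.reserve("12A")); // prints false: the seat is held at application level
        seats.release("12A");                     // the transaction was cancelled
        System.out.println(seats.stateOf("12A")); // prints FREE
    }
}
```

    Note that confirm and release only act on a RESERVED seat, so calling either of them twice is harmless.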

     

     

    What's Coming up in the Next Posts?

    The following three posts will each focus on a particular set of use-cases where compensation-based transactions could prove to be a better fit than ACID transactions. In each part, I'll provide a code example, using the latest iteration of our new API for compensation-based transactions (first introduced in [REF]).

     

    Part 2: Non-transactional Work. This part will cover situations where you need to coordinate multiple non-transactional resources, such as sending an email or invoking a third party service.

    Part 3: Cross-domain Distributed Transactions. This part covers a scenario where the transaction is distributed, and potentially crosses multiple business domains.

    Part 4: Long-lived Transactions. This part covers transactions that span long periods of time and shows how it's possible to continue the transaction even if some work fails.

     

    In part five, I'll cover the status of our support for compensation-based transactions and present a roadmap for our future work.

     

     

    Compensating Transactions: When ACID is too much.

    Part 2: Non-Transactional Resources

    Introduction

    In part one [REF] in this series I explained why ACID transactions are not always appropriate. I also introduced compensation-based transactions as a possible alternative to ACID transactions. In this post I'll focus on situations where the application needs to coordinate multiple non-transactional resources and show how a compensation-based transaction could be used to solve this problem.

     

    For the sake of this discussion, I'm defining a transactional resource as one that can participate in a two-phase protocol and can thus be prepared and later committed or rolled back. For example, XA-capable databases or message queues would be considered transactional resources. In contrast, a non-transactional resource is one that does not offer this facility. For example, the sending of an email or the printing of a cheque cannot easily participate in this two-phase protocol. Third-party services can also be hard to coordinate in an ACID transaction. Even though these services might be implemented with ACID transactions, they may not allow participation in any existing transaction.

     

    A compensation-based transaction could be a good fit for these situations. The non-transactional work can be carried out in the scope of the compensation-based transaction. Providing that a compensation handler is registered, the work can later be undone, should the compensation-based transaction need to be aborted. For example, the compensation handler for sending an email, could be to send a second email asking the recipient to disregard the first email. The printing of a cheque could be compensated by canceling the cheque and notifying the recipient of the cancellation.

     

    It's also possible to coordinate transactional and non-transactional resources in a compensation-based transaction. Here the application just needs to create compensation handlers for the non-transactional resources. You could still use an ACID transaction with the last resource commit optimization (LRCO) [REF] if you only have one non-transactional resource, but this approach is not recommended if you have multiple non-transactional resources.

     

    In a nutshell: If you find yourself needing to coordinate multiple non-transactional resources, you should consider using compensations.

     

    Code Example

     

    In this code example, we have a simple service that is used by an e-commerce application to sell books. As well as making updates to transactional resources, such as a database, it also needs to send an email notifying the customer that the order was made.

     

    public class BookService {
    
        @Inject
        EmailSender emailSender;
    
        @Compensatable
        public void buyBook(String item, String emailAddress) {
    
            emailSender.notifyCustomerOfPurchase(item, emailAddress);
            //Carry out other activities, such as updating inventory and charging the customer
        }
    }
    


    The above class represents the BookService. The 'buyBook' method coordinates updates to the database and notifies the customer via an email. The 'buyBook' method is annotated with '@Compensatable'. Processing of this annotation ensures that a compensation-based transaction is running when the method is invoked. This annotation is processed similarly to the @Transactional annotation (new to JTA 1.2). The key difference is that it works with a compensation-based transaction rather than a JTA (ACID) transaction. An uncaught RuntimeException (or a subclass) will cause the transaction to be canceled and any completed work to be compensated. Again, this behavior is based on the transaction-handling behavior of @Transactional in JTA 1.2.

     

    For the sake of brevity, I have excluded the calls to update the other transactional resources. Part 3 of this series will show interoperation with JTA ACID transactions.

     

    public class EmailSender {
    
        @Inject
        OrderData orderData;
      
        @CompensateWith(NotifyCustomerOfCancellation.class)
        public void notifyCustomerOfPurchase(String item, String emailAddress) {
    
            orderData.setEmailAddress(emailAddress);
            orderData.setItem(item);
            System.out.println("Sending Email to confirm Order...");
        }
    }
    

     

    This class carries out the work required to notify the customer. In this case it simulates the sending of an email. The method 'notifyCustomerOfPurchase' can later be compensated, should the transaction fail. This is configured through the use of the '@CompensateWith' annotation, which specifies the class to use to compensate the work done within the method. For this compensation to be possible, the handler needs key information about the work completed: in this case, the item ordered and the address of the customer. This data is stored in a CDI managed bean, 'orderData', which, as we will see later, is also injected into the compensation handler.

     

    @CompensationScoped
    public class OrderData {
    
        private String item;
        private String emailAddress;
        ...
    }
    

     

    This managed bean represents the state required by the compensation handler to undo the work. The key thing to notice here is that the bean is annotated with @CompensationScoped. This scope is very similar to @TransactionScoped (new in JTA 1.2). The annotation ensures that the lifecycle of the bean is tied to the currently running transaction: the compensation-based transaction in this case, or the JTA transaction in the case of @TransactionScoped. The @CompensationScoped bean will also be serialized to the transaction log, so that it is available if the compensation handler needs to be invoked at recovery time [REF].
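
    As a rough illustration of why the handler's state has to survive in serialized form, the sketch below writes a state object to a byte array and reads it back, as recovery would after a crash. The mechanism shown is an assumption: the text above does not say how the bean is serialized, so standard Java serialization is used purely for illustration, and OrderState, writeToLog and readFromLog are hypothetical names rather than part of the API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch; the real transaction log format is not described in this article.
final class StateLoggingSketch {

    /** Stand-in for the state a compensation handler needs in order to undo the work. */
    static final class OrderState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String item;
        final String emailAddress;

        OrderState(String item, String emailAddress) {
            this.item = item;
            this.emailAddress = emailAddress;
        }
    }

    /** Serializes the state, as a log writer might before the work is considered complete. */
    static byte[] writeToLog(OrderState state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the state from the log entry, as recovery would do in a fresh process. */
    static OrderState readFromLog(byte[] logEntry) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(logEntry))) {
            return (OrderState) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] logEntry = writeToLog(new OrderState("Java book", "customer@example.com"));
        OrderState recovered = readFromLog(logEntry);
        System.out.println(recovered.item + " / " + recovered.emailAddress);
    }
}
```

    The practical consequence is that anything the handler depends on must be part of the logged state; a reference to an in-memory object would not survive a restart.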

     

    public class NotifyCustomerOfCancellation implements CompensationHandler {
    
        @Inject
        OrderData orderData;
    
        @Override
        public void compensate() {
            String emailMessage = "Sorry, your order for " + orderData.getItem() + " has been cancelled";
            System.out.println("Sending '" + emailMessage + "' to '" + orderData.getEmailAddress() + "' apologising for cancellation");
        }
    }
    
    

     

    This class implements the compensation handler. For our example it simply takes the details of the order from the injected OrderData bean and then sends an email to the customer informing them that the order failed.

     

    Summary

    In this blog post I explained why it's difficult to coordinate non-transactional resources in an ACID transaction and showed how a compensation-based transaction can be used to solve this problem.

     

    Part 3 of this series will look at cross-domain distributed transactions. There I'll show that ACID transactions are not always a good choice for scenarios where the transaction is distributed and potentially crosses multiple business domains, and how a compensation-based transaction could be used to provide a better solution.

     

     

    Compensating Transactions: When ACID is too much

    Part 3: Cross-Domain Distributed Transactions

     

    Introduction

    In part one [REF] in this series I explained why ACID transactions are not always appropriate. I also introduced compensation-based transactions as a possible alternative to ACID transactions. In this post I'll show how compensation-based transactions could be a better fit than ACID transactions for distributed applications that cross high-latency networks or span multiple business domains.

     

    When your application becomes distributed, and more systems become involved, you inevitably increase the chances of failure. Many of these failures can be tolerated using a distributed ACID transaction. However, an ACID transaction can be impractical for certain distributed applications. The first reason is that distributed transactions that cross high-latency networks (such as the Internet) can take a relatively long time to run. As I showed in part 1 [REF], increasing the time it takes to run an ACID transaction can have a negative impact on your application. For example, holding database resources for prolonged periods can significantly reduce the throughput of your application. The second reason is the tight coupling between the participants of the transaction. This tight coupling occurs because the root coordinator of the transaction ultimately drives all the transactional resources through the 2PC protocol. Therefore, once prepared, a transactional resource has to either make a heuristic decision (bad) or wait for the root coordinator to inform it of the outcome. This tight coupling may be acceptable if your distributed application resides in a single business domain where you have control of all the parties. However, it is less likely to be acceptable for distributed applications that span multiple business domains.

     

    Compensation-based transactions could prove to be a better solution for these scenarios. As I showed in part 1 [REF], compensation-based transactions can be more suitable for longer lived transactions, as they don't need to hold onto database resources until the transaction completes. Compensation-based transactions can also be used to decouple the back-end resources from the transaction coordinator. This can be done by splitting the two phases of the protocol into abstract business operations, such as book/cancel. The 'book' operation makes an update to the database to create the booking. As this update is committed immediately, there is no tight-coupling between the database resources and the transaction coordinator. The cancel operation is invoked by the compensation handler, should the compensation-based transaction need to abort.

     

    Code Example

     

    In this example we'll look at a simple travel booking example, in which a client makes a hotel and taxi booking with remote services, inside a compensation-based transaction. These remote services live in different business domains and are invoked over the Internet.


    public class Client {
    
        @Compensatable
        public void makeBooking() throws BookingException {
    
             // Lookup Hotel and Taxi Web Service ports here...
    
            hotelService.makeBooking("Double", "paul.robinson@redhat.com");
            taxiService.makeBooking("Newcastle", "paul.robinson@redhat.com");
        }
    }
    
    

     

    This code forms part of the client application. The 'makeBooking' method is annotated with '@Compensatable', which ensures that the method is invoked within a compensation-based transaction. The method invokes two Web services. These Web services support WS-BA [REF], so the transaction is transparently distributed over these calls.

     

     

    @WebService
    public class HotelService {
    
        @Inject
        BookingData bookingData;
    
        @Compensatable(MANDATORY)
        @CompensateWith(CancelBooking.class)
        @ConfirmWith(ConfirmBooking.class)
        @Transactional(value=REQUIRES_NEW, rollbackOn=BookingException.class)
        @WebMethod
        public void makeBooking(String item, String user) throws BookingException {
    
            //Update the database to mark the booking as pending...
    
            bookingData.setBookingID("the id of the booking goes here");
        }
    }
    

     

    Here's the code for the Hotel's Web Service. As well as the usual JAX-WS annotations that you would expect (some omitted for brevity), there are some extra annotations to manage the transactions. The first is @Compensatable(MANDATORY); this ensures that this method is invoked within the scope of a compensation-based transaction. The second annotation (@CompensateWith) provides the compensation handler, which you should be familiar with from Part 2 in this series. The third annotation (@ConfirmWith) may be new to you. This annotation provides a handler that is invoked at the end of the transaction if it was successful. This allows the application to make final changes once it knows the transaction will not be compensated. Hopefully, the need for this feature will become clearer as we discuss this example further. Finally, this example uses the JTA 1.2 @Transactional annotation to begin a new JTA transaction. This transaction is used to make the update to the database and will commit if the 'makeBooking' method completes successfully. The JTA transaction will roll back if a BookingException (see the rollbackOn attribute) or a RuntimeException (or a subclass) is thrown.

     

    So, why does the JTA transaction commit at the end of this method call even though the compensation-based transaction is still running?

     

    This is done to reduce the amount of time the service holds onto database resources. The application could simply add the booking to the database, but it's possible that the booking will need to be canceled at some time in the future. In this example, the application therefore just marks the booking as pending, so any other transaction that reads the state of the bookings table will see that this particular booking is tentative. Remember, by using a compensation-based transaction we have relaxed isolation, and this is one way in which the application can be modified to tolerate that.

     

    public class ConfirmBooking implements ConfirmationHandler {
    
        @Inject
        BookingData bookingData;
      
        @Override
        @Transactional(REQUIRES_NEW)
        public void confirm() {
            //Confirm order for bookingData.getBookingID() in Database (in a new JTA transaction)
        }
    }
    

     

    As mentioned above, the ConfirmationHandler provides a callback that occurs when the compensation-based transaction completes successfully. In this example, the confirmation handler begins a new JTA transaction and then updates the database to mark the booking as finalized.

     

    public class CancelBooking implements CompensationHandler {
    
        @Inject
        BookingData bookingData;
    
        @Override
        @Transactional(REQUIRES_NEW)
        public void compensate() {
            //Cancel order for bookingData.getBookingID() in Database (in a new JTA transaction)
        }
    }
    

     

    Similarly, we also have a compensation handler that starts a new JTA transaction which cancels the booking.

     

    In this example we are essentially making a trade-off: two shorter-lived JTA transactions with relaxed isolation (as in this example), in place of one longer-lived JTA transaction with stronger isolation (had we used a distributed JTA transaction). We also needed to make a change to the application to accommodate the relaxed isolation. Whether this is a sensible trade-off will depend largely on your application and performance requirements.

     

    It is also worth noting that the middleware can (we don't yet implement this <REF1> <REF2>) reliably tie the compensation-based transaction and the JTA transaction together. It does this by preparing the JTA transaction when the method completes, but holding off committing it until the compensation handler has been written to the transaction log. This ensures that the work is only committed if it can later be compensated in the case of failure. The invocation of the CompensationHandler and the ConfirmationHandler is also reliable, as they are invoked repeatedly until they complete successfully. Therefore, it is important for their implementations to be idempotent.
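
    The ordering described here (prepare, log the handler, only then commit) can be sketched as follows. This is a thought experiment rather than a description of any implementation: PreparedWork, HandlerLog and complete are hypothetical names that belong to neither JTA nor Narayana, and the sketch assumes the work has already been prepared by the time complete is called.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these interfaces are illustrative and not part of JTA or Narayana.
final class PrepareThenLogSketch {

    /** Work that has been prepared and now awaits a commit or rollback decision. */
    interface PreparedWork {
        void commit();
        void rollback();
    }

    /** Durable log of compensation handlers (simulated in memory by the demo below). */
    interface HandlerLog {
        void append(String handlerName) throws IOException;
    }

    /**
     * Commits the prepared work only once its compensation handler is safely logged.
     * Returns true if the work was committed, false if it was rolled back.
     */
    static boolean complete(PreparedWork work, HandlerLog log, String handlerName) {
        try {
            log.append(handlerName);
        } catch (IOException e) {
            work.rollback(); // no handler on record, so the work must not become visible
            return false;
        }
        work.commit();
        return true;
    }

    /** Builds prepared work that records its outcome in the given event list. */
    static PreparedWork recordingWork(List<String> events) {
        return new PreparedWork() {
            @Override public void commit() { events.add("commit"); }
            @Override public void rollback() { events.add("rollback"); }
        };
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();

        // Logging succeeds, so the prepared work is committed.
        complete(recordingWork(events), name -> events.add("logged " + name), "CancelBooking");
        // Logging fails, so the prepared work is rolled back instead.
        complete(recordingWork(events), name -> { throw new IOException("log unavailable"); }, "CancelBooking");

        System.out.println(events); // prints [logged CancelBooking, commit, rollback]
    }
}
```

    Committing before logging would reopen the failure window: a crash between the two steps would leave committed work with no handler on record to undo it.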

     

    Compensating Transactions: When ACID is too much

    Part 4: Long Lived Transactions

     

    Introduction

    In part one [REF], I explained how ACID transactions can have a negative impact on applications whose transactions can take a relatively long time to run. Another potential issue with ACID transactions is that the failure of one unit of work can cause the entire transaction to be rolled back. This is less of an issue for short-running transactions, as the previously successful work can be retried quickly. However, for long-running transactions, this previously completed work may be significant, leading to a lot of waste should it need to be rolled back. In this post, I'll show how a compensation-based transaction could be a better fit for long-lived transactions.

     

    A compensation-based transaction can be composed of multiple short-lived ACID transactions. When each transaction completes, it releases the locks on the resources it held, allowing other transactions, requiring those resources, to proceed. A compensation action is registered for each ACID transaction and can be used to undo any work completed, should the entire compensation-based transaction need to be aborted. Furthermore, should one of these short-lived ACID transactions fail, it could be possible to find an alternative, preventing the entire transaction from failing. This allows forward progress to be achieved. By composing the compensation-based transaction as several units of work, you also gain the opportunity to selectively abort (compensate) particular units as the compensation-based transaction progresses. A simple example should help to clarify these ideas...

     

    Take a travel booking scenario. We begin by booking a flight. We then try to book a taxi, but that fails. At this point we don't want to compensate the flight, as it may be fully booked next time we try. Therefore we try to find an alternative taxi, which in this example succeeds. Later in the compensation-based transaction, we may find a cheaper flight, in which case we want to cancel the original flight whilst keeping the taxi and the cheaper flight. In this case we notify the transaction manager of our intentions, and it ensures that the more expensive flight is compensated when the compensation-based transaction completes.
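
    The selective compensation in this scenario can be sketched with a toy coordinator. The sketch is hypothetical throughout: the Coordinator class and its completed, markForCompensation and close methods are invented for illustration and do not correspond to the API used in this series, and the outcome is reported as strings rather than by invoking real handlers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of selective compensation; not the API used elsewhere in this series.
final class SelectiveCompensationSketch {

    static final class Coordinator {
        // Unit of work -> whether it should be compensated when the transaction closes.
        private final Map<String, Boolean> units = new LinkedHashMap<>();

        /** Records a unit of work whose changes have already been committed. */
        void completed(String unit) {
            units.put(unit, false);
        }

        /** Flags one completed unit to be undone, leaving the others untouched. */
        void markForCompensation(String unit) {
            if (units.containsKey(unit)) {
                units.put(unit, true);
            }
        }

        /** Closes the transaction: flagged units are compensated, the rest are confirmed. */
        List<String> close() {
            List<String> outcome = new ArrayList<>();
            units.forEach((unit, compensate) ->
                    outcome.add((compensate ? "compensated " : "confirmed ") + unit));
            units.clear();
            return outcome;
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        coordinator.completed("flight-A");
        // The first taxi booking failed and undid its own work, so it was never recorded.
        coordinator.completed("taxi-2");
        coordinator.completed("flight-B");           // the cheaper flight found later
        coordinator.markForCompensation("flight-A"); // drop the more expensive flight

        System.out.println(coordinator.close());
        // prints [compensated flight-A, confirmed taxi-2, confirmed flight-B]
    }
}
```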

     

    Code Example

    In this example, I expand on the Travel Agent example from part 3 <REF> in this series. Here I will show how a failure to complete one unit of work does not have to result in the whole compensation-based transaction being aborted.

     

    public class Agent {
    
        @Inject
        HotelService hotelService;
    
        @Inject
        Taxi1Service taxi1Service;
    
        @Inject
        Taxi2Service taxi2Service;
    
        @Compensatable
        public void makeBooking(String emailAddress, String roomType, String destination) throws BookingException {
    
            hotelService.makeBooking(roomType, emailAddress);
      
            try {
                taxi1Service.makeBooking(destination, emailAddress);
            } catch (BookingException e) {
                // Taxi1 rolled back, but we still have the hotel booked.
                // We don't want to lose it, so we now try Taxi2.
                taxi2Service.makeBooking(destination, emailAddress);
            }
    
        }
    }
    

    For this example, you can imagine that the Hotel and Taxi services are implemented similarly to the HotelService in part 3 <REF>.

    The makeBooking method is annotated with @Compensatable, which ensures that the method is invoked within a compensation-based transaction. The method begins by making a hotel reservation. If this fails, we don't handle the BookingException, which causes the compensation-based transaction to be canceled. We then move on to booking a taxi. If this particular booking fails, we catch the BookingException. Because the taxi service failed immediately, we know that it should have undone any of its work (this is a requirement of using this API). We can therefore choose to fail the compensation-based transaction or, as in this case, try an alternative taxi company. The important thing to note here is that we still have the hotel booked, and we don't really want to lose this booking, as the hotel may be fully booked next time we try. If the booking with the alternative taxi company also fails, we have no option but to cancel the whole transaction, as we have no other alternatives.

    Conclusion

    In this blog post, I showed how a compensation-based transaction could be a good fit for long running transactions. In the next part I'll discuss the status of our API for compensation-based transactions.

     

     

    Compensating Transactions: When ACID is too much

    Part 5: Status of our API for Compensation-Based Transactions