    Transactional File I/O in Java.

     

    Summary

    Transactions are often used to structure activities within reliable software applications. In Java EE, business logic typically involves accessing transactional resource managers (databases, message queues) within boundaries denoted by calls to JTA (begin/commit/rollback). The resource managers work with the transaction manager to perform, for example, locking, logging and recovery transparently to the application programmer. However, this separation of concerns breaks down for one important resource: the file system. Java's file I/O library does not support transactions, so application programmers who need them must implement such support manually in their programs. This project aims to develop a transaction-aware resource manager for file I/O in Java, providing application programmers with access to a filesystem that offers ACID semantics.

     

     

    Example use cases

     

    use case: transactional changes within a single file.

     

    A file on disk contains a number of business data entries, each of which must be read and processed. At the end of the processing for each entry, results are sent via JMS and the entry in the file is marked as done. There may be lots of entries and processing them can take a long time. The system may crash during processing. We never want to miss processing an entry, nor to process it more than once.

     

    Solution: within an XA transaction, read an entry from the file, process it, mark it as done and enqueue the JMS message. Two-phase commit of the transaction ensures that even if the system crashes, we still get the 'mark entry as done' and the 'send JMS message' as a single atomic update.
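
    To make the atomicity argument concrete, here is a minimal in-memory sketch of the two-phase commit protocol this solution relies on. The names `Participant`, `TwoPhaseCoordinator` and `RecordingParticipant` are hypothetical illustrations, not JTA or JMS APIs; in practice the transaction manager drives this protocol through each resource's XAResource. The two participants in the usage stand in for 'mark entry as done' and 'send JMS message'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract, loosely modelled on XA's prepare/commit/rollback.
interface Participant {
    boolean prepare();   // phase 1 vote: true = ready to commit
    void commit();       // phase 2, only after every participant voted yes
    void rollback();
}

// Hypothetical coordinator: commits only if every participant votes yes.
class TwoPhaseCoordinator {
    private final List<Participant> participants = new ArrayList<>();

    void enlist(Participant p) { participants.add(p); }

    /** Returns true if the transaction committed, false if it rolled back. */
    boolean complete() {
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A single "no" vote aborts the whole transaction.
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        participants.forEach(Participant::commit);
        return true;
    }
}

// Stand-in for one unit of work: its change is applied only in commit().
class RecordingParticipant implements Participant {
    private final String change;
    private final boolean vote;
    private final List<String> applied;

    RecordingParticipant(String change, boolean vote, List<String> applied) {
        this.change = change;
        this.vote = vote;
        this.applied = applied;
    }

    public boolean prepare() { return vote; }
    public void commit() { applied.add(change); }
    public void rollback() { /* nothing was applied yet, so nothing to undo */ }
}
```

    A real transaction manager also logs its commit decision before phase two, so that recovery can finish committing after a crash; the sketch leaves that out.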

     

    Even better: allow multiple transactions to be active on the file simultaneously, provided they are updating different parts of the file. This allows concurrent processing of entries on multi-processor systems.

     

    use case: batch changes on a group of files.

     

    A directory contains a number of files. Each file must be read, some processing done and the result written back to another file in the same directory. The source file is then deleted. As a side effect of the processing, one or more database updates may occur. We want to ensure each file is processed exactly once.

     

    Solution: within an XA transaction, read the source file, update the database and create the output file. The transaction ensures that the source file is never deleted before the output file is created, nor is the source file left intact after the output file is created. Likewise the database is never updated unless the other operations also complete. Even better: allow multiple transactions on the directory simultaneously, so different files can be processed in parallel.

     

    use case: software upgrade.

     

    A new version of a software package includes additional files, moves some existing files, edits others and deletes some. We would like an upgrade installer that will ensure that if any of these operations fails, the upgrade can be backed out smoothly, leaving the original version of the software in a usable state. Under no circumstances should an upgrade that fails part way through leave the installed software unusable.

     

    Solution: Perform all the installation operations within an XA transaction. If any of them fails, roll back the transaction, which returns the filesystem to its original state.
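
    One way to get this behaviour is an undo log: before each destructive step, record how to reverse it, and on rollback replay those undo actions newest first. The minimal sketch below assumes a hypothetical `UndoLog` class built on real java.nio.file calls (`Files.copy`, `Files.move`, `Files.delete`). Its undo list lives in memory, so it covers a step that fails but not a JVM crash part way through; that needs the durable logging discussed under crash recovery.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical undo log: each operation first records how to reverse itself.
class UndoLog {
    private interface Undo { void run() throws IOException; }

    private final Path backupDir;               // where before-images are kept
    private final Deque<Undo> undos = new ArrayDeque<>();
    private int counter = 0;

    UndoLog(Path backupDir) { this.backupDir = backupDir; }

    void create(Path file, byte[] content) throws IOException {
        Files.write(file, content, StandardOpenOption.CREATE_NEW);
        undos.push(() -> Files.deleteIfExists(file));
    }

    void overwrite(Path file, byte[] content) throws IOException {
        Path before = backup(file);
        Files.write(file, content);  // CREATE, TRUNCATE_EXISTING, WRITE
        undos.push(() -> Files.copy(before, file, StandardCopyOption.REPLACE_EXISTING));
    }

    void delete(Path file) throws IOException {
        Path before = backup(file);
        Files.delete(file);
        undos.push(() -> Files.copy(before, file));
    }

    void move(Path from, Path to) throws IOException {
        Files.move(from, to);  // fails if 'to' exists, so nothing is clobbered
        undos.push(() -> Files.move(to, from));
    }

    /** Reverses every recorded operation, newest first. */
    void rollback() throws IOException {
        while (!undos.isEmpty()) undos.pop().run();
    }

    void commit() { undos.clear(); }  // backups could now be deleted

    private Path backup(Path file) throws IOException {
        Path copy = backupDir.resolve("before-" + (counter++));
        Files.copy(file, copy);
        return copy;
    }
}
```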

     

    use case: clustered transactional file storage.

     

    A web-based application provides for the upload and storage of user images. There are so many users that we need a cluster of application servers to support them. Some files may be shared by multiple users, such as in a business workgroup. The server cluster may perform asynchronous operations (crop, rotate, etc.) on the image files in response to user requests. Simultaneous manipulations of the same image should never be allowed to corrupt the file.

     

    Solution: provide access to a shared network file system through a transactional API, such that all nodes in the cluster have the same view of the filesystem state and can lock files to prevent interference.
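
    For the locking half of this solution, java.nio already offers byte-range locks through `FileChannel.tryLock(position, size, shared)`. The sketch below only illustrates that real API; `RegionLocks` is a hypothetical helper. Such locks are held on behalf of the whole JVM rather than a thread or transaction, and whether they stop other programs, and how they behave over network filesystems, is platform-dependent, so a transactional layer would still need its own per-transaction lock table on top.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: exclusive locks on one image's byte range within a shared file.
class RegionLocks {
    /** Returns the lock, or null if an overlapping lock is already held elsewhere. */
    static FileLock tryLockRegion(FileChannel channel, long position, long size) throws IOException {
        try {
            return channel.tryLock(position, size, false);  // false = exclusive; null if another program holds it
        } catch (OverlappingFileLockException e) {
            return null;  // an overlapping lock is already held inside this JVM
        }
    }

    /** Exclusive locks need a channel opened for writing. */
    static FileChannel openForUpdate(Path file) throws IOException {
        return FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
    }
}
```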

     

    Summary of required functionality:

     

    Manipulate the contents of a single file transactionally. Conceptually this is very similar to having multiple concurrent transactions on a database. As long as they don't try to update the same data (file: blocks; db: rows of a table), they should not interfere.

     

    Allow these manipulations to be committed or rolled back as part of an XA transaction's 2PC, so that other updates, e.g. JMS or SQL, can be committed or rolled back consistently.

     

    Manipulate the contents of a directory in the same way: add or remove files in a transaction, such that the operations can be undone together if needed.

     

    Allow changes to file contents and directories in the same transaction. This basically combines the points above.

     

    Allow transactions to span multiple JVMs, for greater scalability. Conceptually this is similar to moving from an in-process main memory database to one that is accessed by multiple clients using a network protocol. JCA may be useful here.

     

    Discussion

     

    Implementing the above is likely to involve changes to some java.io code. Fortunately it's open source. We don't want to change too much: ideally just the low-level code, not things like Streams that layer on top of it. Perhaps subclassing java.io.File is one approach. Shame it's a class, not an interface! With that approach, Streams etc. might not need to change: they still see a File, though it's actually our new XAFile. The way JDBC uses Connection vs. XAConnection is worth looking at too.
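
    A minimal sketch of the subclassing idea, assuming a hypothetical `XAFile` whose underlying path is a transaction-private shadow copy of the real file. Because it is a genuine java.io.File, unmodified classes such as FileWriter or FileInputStream accept it and transparently read and write the shadow. Commit renames the shadow over the target with `Files.move(..., ATOMIC_MOVE)`, which requires the shadow directory to be on the same filesystem as the target and may otherwise throw AtomicMoveNotSupportedException.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical XAFile: a File whose path is a private shadow copy of 'target'.
// Existing stream classes accept it unchanged because it is a File; they simply
// open its path, which is the shadow, so the real file is untouched until commit.
class XAFile extends File {
    private final File target;

    private XAFile(File shadow, File target) {
        super(shadow.getPath());
        this.target = target;
    }

    /** Begins work on 'target' by copying it (if it exists) into 'shadowDir'. */
    static XAFile open(File target, File shadowDir) throws IOException {
        File shadow = File.createTempFile("xa-", ".shadow", shadowDir);
        if (target.exists()) {
            Files.copy(target.toPath(), shadow.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        return new XAFile(shadow, target);
    }

    File target() { return target; }

    /** Publishes all changes at once by renaming the shadow over the target. */
    void commit() throws IOException {
        Files.move(toPath(), target.toPath(),
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Discards the shadow; the target was never modified. */
    void rollback() throws IOException {
        Files.deleteIfExists(toPath());
    }
}
```

    The drawback shows up in getPath(), which reports the shadow rather than the file the application asked for, so code that logs or compares paths will notice the substitution. Crash recovery is not addressed here.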

     

    The ideas being discussed in JSR-203 may be useful.

     

    The Apache Commons Transaction project has also done a little work on this in its file module, but it was not very impressive the last time I looked. Microsoft have some transactional NTFS work that may be relevant.

     

    http://commons.apache.org/transaction/

     

    http://www.infoq.com/news/2008/01/file-systems-transactions

     

     

    In terms of executing the project, I'd anticipate step one covering use in a single JVM only, with step two then adding distribution, if there is enough time, so that multiple JVMs can share transactional access.

     

    This work relates only to the JVM's view of the filesystem. Indeed, it may relate only to the view of threads within the JVM that are in a transaction. Other processes on the machine will not access the filesystem through the transactional API, so they won't be affected. This has benefits and drawbacks. It will be hard to deal with the case where the filesystem is changed by an external process whilst a transaction is running. But it will be easy to keep a virtual view of the modified version of the filesystem in memory for the benefit of ongoing transactions. The new VFS used in AS 5.0 may be handy here.
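
    The in-memory virtual view could look something like this sketch, where a hypothetical `TxOverlayView` keeps one transaction's pending writes and deletes in a map and consults it before the disk. Other threads and processes keep seeing the real files until commit. Reads of files the transaction has not touched still go to the live filesystem, which is exactly where the external-process problem above shows up, and commit here is neither atomic nor crash-safe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical per-transaction overlay of pending changes on top of the real filesystem.
class TxOverlayView {
    // Sentinel for a staged delete; compared by identity, so an empty write is still a write.
    private static final byte[] DELETED = new byte[0];
    private final Map<Path, byte[]> pending = new HashMap<>();

    private static Path key(Path file) { return file.toAbsolutePath().normalize(); }

    void write(Path file, byte[] content) { pending.put(key(file), content.clone()); }

    void delete(Path file) { pending.put(key(file), DELETED); }

    /** Reads through the overlay; empty if the file is absent in this transaction's view. */
    Optional<byte[]> read(Path file) throws IOException {
        byte[] staged = pending.get(key(file));
        if (staged == DELETED) return Optional.empty();
        if (staged != null) return Optional.of(staged.clone());
        // Untouched file: read the live disk state (no snapshot isolation here).
        return Files.exists(file) ? Optional.of(Files.readAllBytes(file)) : Optional.empty();
    }

    /** Applies staged changes to disk one by one (not atomic, not crash-safe). */
    void commit() throws IOException {
        for (Map.Entry<Path, byte[]> e : pending.entrySet()) {
            if (e.getValue() == DELETED) Files.deleteIfExists(e.getKey());
            else Files.write(e.getKey(), e.getValue());
        }
        pending.clear();
    }

    void rollback() { pending.clear(); }
}
```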

     

    Some kind of logging mechanism will be needed to record changes to the filesystem for use in crash recovery. Perhaps a 'shadow' directory for each transaction, in some specially designated, out-of-the-way place on the file system. This is similar to what a database does with transaction log files or rollback segments.
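
    A minimal sketch of the shadow-directory idea, assuming a hypothetical `ShadowTx` class and a directory layout invented for illustration: new file contents are staged in the transaction's shadow directory, a `COMMITTED` marker acts as the commit record, and a recovery pass at startup rolls marked transactions forward and discards the rest (presumed abort). This is a redo-style log; rollback segments correspond more closely to the undo-style alternative. For brevity the sketch does not force data or directory entries to disk, which a real implementation must do.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical layout of one transaction's shadow directory, <logRoot>/<txId>/:
//   N.data     staged new contents for one target file
//   manifest   one line per staged file: "N.data<TAB>absolute target path"
//   COMMITTED  the commit record; its presence alone decides the outcome
// A real implementation must also force files and directories to disk.
class ShadowTx {
    private final Path dir;
    private final List<String> manifest = new ArrayList<>();

    ShadowTx(Path logRoot, String txId) throws IOException {
        this.dir = Files.createDirectories(logRoot.resolve(txId));
    }

    void stage(Path target, byte[] content) throws IOException {
        String name = manifest.size() + ".data";
        Files.write(dir.resolve(name), content);
        manifest.add(name + "\t" + target.toAbsolutePath());
    }

    /** Once COMMITTED exists, recovery will roll this transaction forward. */
    void writeCommitRecord() throws IOException {
        Files.write(dir.resolve("manifest"), manifest);
        // Write under a temporary name and rename, so a half-written record is never seen.
        Path tmp = Files.writeString(dir.resolve("COMMITTED.tmp"), "ok");
        Files.move(tmp, dir.resolve("COMMITTED"), StandardCopyOption.ATOMIC_MOVE);
    }

    void commit() throws IOException {
        writeCommitRecord();
        applyAndClean(dir);
    }

    /** Run at startup: roll committed transactions forward, discard the rest. */
    static void recover(Path logRoot) throws IOException {
        for (Path d : listDir(logRoot)) {
            if (Files.exists(d.resolve("COMMITTED"))) applyAndClean(d);
            else deleteTree(d);  // no commit record: presumed abort
        }
    }

    // Copies rather than moves, so repeating it after a crash part way through is harmless.
    private static void applyAndClean(Path d) throws IOException {
        for (String line : Files.readAllLines(d.resolve("manifest"))) {
            String[] parts = line.split("\t", 2);
            Files.copy(d.resolve(parts[0]), Paths.get(parts[1]), StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(d.resolve("COMMITTED"));  // drop the record first, then the data
        deleteTree(d);
    }

    private static void deleteTree(Path d) throws IOException {
        for (Path f : listDir(d)) Files.delete(f);
        Files.delete(d);
    }

    private static List<Path> listDir(Path d) throws IOException {
        try (Stream<Path> s = Files.list(d)) {
            return s.collect(Collectors.toList());
        }
    }
}
```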

     

    Although what's being implemented here is basically a resource manager, part of the JBossTS transaction manager may be reusable, particularly the locking and logging support provided by ArjunaCore.

     

    Thoughts on the apache commons transaction project

     

    This focuses on multi-level locks, transactional collections and transactional file access. The transactional collections are not of interest to this project. The locking overlaps the capabilities of ArjunaCore to some extent, but is less mature. The transactional file access focuses mainly on manipulation of filesystems, i.e. the 'java.io.File as dir structure' view rather than the 'java.io.File as byte[]' view. It does not support XA, so it won't interoperate with a standard transaction manager unless we provide an adaptor layer. There is no crash recovery support either. In short, it may provide a useful source of ideas and some reusable code, but it's not a workable solution by itself at present.

     

    Thoughts on transactional access to File as byte[]

     

    Conceptually a single file is an array of bytes. There may be business level structure layered on top of this e.g. fixed length records, csv, lines of text, serialized objects.  Most of what we know about tx concurrency control comes from the database community, where transactions work in terms of tuples, data blocks or tables. For example, row level locking attaches transaction meta data (locks) to data at the tuple level.  To do meaningful concurrency control on a file, we need to adapt these ideas a bit.

     

    How should we break down a file into smaller units so we can do fine-grained concurrency control? We could take a 'black box' approach, where we know nothing about the business data structure and just lock fixed-size byte blocks, or byte ranges according to what is read/written inside the scope of a tx. Or we could provide hooks through which the business logic supplies some structural metadata, so we can lock at the business record level, e.g. new RecordLengthSpecifier(long numberOfBytesPerRecord).
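
    Both options can share one lock table. A minimal in-JVM sketch, assuming a hypothetical `ByteRangeLockTable` that stores half-open byte ranges owned by transactions in shared or exclusive mode; `RecordLengthSpecifier` reuses the name proposed above, with an assumed shape that simply maps record indexes to byte ranges. Locks are held until the transaction ends (strict two-phase locking). A real manager would queue waiters and detect deadlocks rather than simply refusing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lock table for one file: half-open byte ranges [start, end).
class ByteRangeLockTable {
    private static final class Lock {
        final String txId;
        final long start, end;
        final boolean exclusive;

        Lock(String txId, long start, long end, boolean exclusive) {
            this.txId = txId; this.start = start; this.end = end; this.exclusive = exclusive;
        }
    }

    private final List<Lock> held = new ArrayList<>();

    /** Grants the lock unless another tx holds an overlapping, conflicting lock. */
    synchronized boolean tryLock(String txId, long start, long length, boolean exclusive) {
        long end = start + length;
        for (Lock l : held) {
            boolean overlaps = l.start < end && start < l.end;
            boolean conflicts = !l.txId.equals(txId) && (l.exclusive || exclusive);
            if (overlaps && conflicts) return false;
        }
        held.add(new Lock(txId, start, end, exclusive));
        return true;
    }

    /** Strict two-phase locking: release everything only at commit or rollback. */
    synchronized void releaseAll(String txId) {
        held.removeIf(l -> l.txId.equals(txId));
    }
}

// Record-level hook: business logic supplies the record length,
// and record indexes are translated into the byte ranges to lock.
class RecordLengthSpecifier {
    private final long numberOfBytesPerRecord;

    RecordLengthSpecifier(long numberOfBytesPerRecord) {
        this.numberOfBytesPerRecord = numberOfBytesPerRecord;
    }

    long offsetOf(long recordIndex) { return recordIndex * numberOfBytesPerRecord; }

    long length() { return numberOfBytesPerRecord; }
}
```

    Locking the whole file as one range degenerates to file-level locking; the record variant only changes how ranges are computed.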

     

    Also access patterns for a file may differ from a db. In a db, reads outnumber writes considerably. With a file, an app may read once, then write once or not at all. What implications does that have for the design? Overhead is likely to come from writing locks or versioned blocks/records to disk, so we want to minimise that.

     

    Where records are not fixed length, a write in the middle of a file involves moving all subsequent records back or forwards, or leaving a gap. That's potentially a lot of data copying. Also if locks are expressed in terms of byte indexes and the data moves to different indexes, we have a problem...  However, how common is this use case really?

     

    To my mind transactional access to a single file breaks down into three cases:

    • manipulation of fixed length records, e.g. read a record, process it, change a 0 to a 1 to mark it done. Record boundaries are thus fixed, so we can do concurrency control easily.

    • conditional append: we want to append a log record to the end of a file for each tx, or perhaps just for each successful tx. Ordering may or may not be important. We can't use an afterCompletion synchronization to do the write, because synchronizations are volatile and won't survive a crash. Locking here is very different from the earlier case, particularly if we can write in commit order rather than tx start time order.

    • everything else, i.e. manipulation of variable length records. This is hard.

    Perhaps we have special cases for the first two, so we can do more efficient things where the business logic indicates it's running one of those more limited use cases.
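
    For the conditional append case, a minimal sketch assuming a hypothetical `CommitOrderAppender`: each transaction buffers its records, rolled-back transactions write nothing, and because commit appends and forces the data (`FileChannel.force`) while holding a single lock, records land in commit order rather than start order. It is one-phase only; inside a full 2PC the records would also have to be made durable, e.g. logged, during prepare.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical appender: nothing reaches the file unless its transaction commits.
class CommitOrderAppender {
    private final Path file;
    private final Map<String, List<String>> pending = new HashMap<>();

    CommitOrderAppender(Path file) { this.file = file; }

    synchronized void append(String txId, String record) {
        pending.computeIfAbsent(txId, k -> new ArrayList<>()).add(record);
    }

    /** Writes and forces this tx's records as part of commit itself, not in afterCompletion. */
    synchronized void commit(String txId) throws IOException {
        List<String> records = pending.remove(txId);
        if (records == null) return;  // nothing to append
        StringBuilder sb = new StringBuilder();
        for (String r : records) sb.append(r).append('\n');
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) ch.write(buf);
            ch.force(true);  // on disk before commit returns
        }
    }

    synchronized void rollback(String txId) { pending.remove(txId); }
}
```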

     

    Thoughts on XA in an object-oriented, connectionless world.

     

    The XA spec is not object-oriented, so its mapping into Java through the JTA already has some interesting issues. XA is mainly used by JCA/JMS/JDBC, which are connection-oriented. File access is not, which leaves us either a) stretching JTA further to work in a connectionless model or b) providing a JCA-like filesystem provider that forces a connection-centric model on file I/O. The latter is not too big a stretch, since Streams are already close to Connections in essence, and users are already used to a 'connection to a remote file server' approach to storage. The choice here also feeds into work on distributed (multi-JVM) transactional access, since a Connection-based model works well there. With a connectionless API we would need to hide the distribution from the user rather than exposing it.
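
    Option b) can be sketched by letting a hypothetical `XAFileConnection` act as both the application-facing connection and its own `javax.transaction.xa.XAResource`, much as a JDBC XAConnection hands out an XAResource. Only `XAResource`, `Xid` and `XAException` are real JTA types (part of the Java SE platform); the class, its `append` operation and its in-memory branch table are assumptions. It has no durable log, so `recover()` returns nothing and `prepare()` does not yet make anything crash-safe.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical connection to a transactional append-only file. The application
// calls append(); the transaction manager drives the same object as an XAResource.
class XAFileConnection implements XAResource {
    private final Path file;
    private final Map<String, StringBuilder> branches = new HashMap<>();
    private String current;  // branch associated with this connection by start()

    XAFileConnection(Path file) { this.file = file; }

    XAResource getXAResource() { return this; }

    /** Buffered until the branch commits. */
    void append(String line) {
        if (current == null) throw new IllegalStateException("no active transaction branch");
        branches.get(current).append(line).append('\n');
    }

    // Xid implementations need not define equals/hashCode, so key on their contents.
    private static String key(Xid xid) {
        return xid.getFormatId() + ":" + Arrays.toString(xid.getGlobalTransactionId())
                + ":" + Arrays.toString(xid.getBranchQualifier());
    }

    private StringBuilder branch(Xid xid) throws XAException {
        StringBuilder b = branches.get(key(xid));
        if (b == null) throw new XAException(XAException.XAER_NOTA);
        return b;
    }

    public void start(Xid xid, int flags) throws XAException {
        if (flags == TMNOFLAGS) branches.putIfAbsent(key(xid), new StringBuilder());
        else branch(xid);  // TMJOIN / TMRESUME: the branch must already exist
        current = key(xid);
    }

    public void end(Xid xid, int flags) throws XAException { branch(xid); current = null; }

    public int prepare(Xid xid) throws XAException {
        if (branch(xid).length() == 0) { branches.remove(key(xid)); return XA_RDONLY; }
        return XA_OK;  // a real resource manager would force the buffered data to a log here
    }

    public void commit(Xid xid, boolean onePhase) throws XAException {
        StringBuilder b = branch(xid);
        try {
            Files.write(file, b.toString().getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            XAException xa = new XAException(XAException.XAER_RMERR);
            xa.initCause(e);
            throw xa;
        }
        branches.remove(key(xid));
    }

    public void rollback(Xid xid) throws XAException {
        if (branches.remove(key(xid)) == null) throw new XAException(XAException.XAER_NOTA);
    }

    public void forget(Xid xid) { branches.remove(key(xid)); }
    public Xid[] recover(int flag) { return new Xid[0]; }  // no durable log in this sketch
    public boolean isSameRM(XAResource other) { return other == this; }
    public int getTransactionTimeout() { return 0; }
    public boolean setTransactionTimeout(int seconds) { return false; }
}
```

    In a connectionless design, something else would have to associate each calling thread with its transaction branch, which is the role start() and end() play on a connection here.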