Transactional File I/O in Java
Summary
Transactions are often used to structure activities within reliable software applications. In Java EE, business logic typically involves accessing transactional resource managers (databases, message queues) within boundaries denoted by calls to the JTA (begin/commit/rollback). The resource managers work with the transaction manager to perform e.g. locking, logging and recovery transparently to the application programmer. However, this separation of concerns is broken with regard to one important resource: the file system. Java's file I/O library does not support transactions, a situation which requires application programmers to implement such support manually in their programs. This project aims to develop a transaction-aware resource manager for file I/O in Java. This library will provide application programmers with access to a filesystem that offers ACID semantics.
Example use cases
use case: transactional changes within a single file.
A file on disk contains a number of business data entries,
each of which must be read and processed. At the end of the
processing for each entry, results are sent via JMS and the
entry in the file is marked as done. There may be lots of
entries and processing them can take a long time. The system
may crash during processing. We never want to miss
processing an entry, nor to process it more than once.
Solution: within an XA transaction, read an entry from the
file, process it, mark it as done and enqueue the JMS
message. Two-phase commit of the transaction ensures that
even if the system crashes we still get the 'mark entry as
done' and the 'send JMS message' as a single atomic update.
Even better: allow multiple transactions to be active on the
file simultaneously, provided they are updating different
parts of the file. This allows concurrent processing of
entries on multi-processor systems.
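The 'different parts of the file' idea can already be approximated with the JDK's byte-range locks. A minimal sketch (not the proposed resource manager, just an illustration of disjoint-range locking with the standard java.nio API):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RangeLockDemo {
    // Returns true if two locks on non-overlapping byte ranges of the same
    // file can be held simultaneously - the property that would let two
    // transactions update different entries concurrently.
    public static boolean lockDisjointRanges(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            try (FileLock first = ch.lock(0, 100, false);      // entry 1
                 FileLock second = ch.lock(100, 100, false)) { // entry 2
                return first.isValid() && second.isValid();
            }
        }
        // an overlapping second lock would instead throw
        // OverlappingFileLockException within this JVM
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("entries", ".dat");
        try {
            System.out.println(lockDisjointRanges(tmp)); // prints "true"
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Of course this gives only locking, not the logging/undo and 2PC integration the project is about, but it suggests the granularity a per-entry concurrency control scheme could use.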
use case: batch changes on group of files.
A directory contains a number of files. Each file must be
read, some processing done and the result written back to
another file in the same directory. The source file is then
deleted. As a side effect of the processing, one or more
database updates may occur. We want to ensure each file is
processed exactly once.
Solution: within an XA transaction, read the source file,
update the database and create the output file. The
transaction ensures that the source file is never deleted
before the output file is created, nor is the source file
left intact after the output file is created. Likewise the
database is never updated unless the other operations also
complete. Even better: multiple transactions on the
directory simultaneously, so different files can be
processed in parallel.
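One way to picture the file-level half of this solution is a transaction object that defers the destructive steps until commit. This is an illustrative sketch only (not the proposed API, and with no isolation or crash-recovery logging shown):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Defers file operations until commit, so the source file is never deleted
// unless the output file is also created. Rollback just discards the intents.
public class DeferredFileTx {
    private final List<Runnable> atCommit = new ArrayList<>();

    public void writeFile(Path target, byte[] content) {
        atCommit.add(() -> {
            try { Files.write(target, content); }
            catch (IOException e) { throw new RuntimeException(e); }
        });
    }

    public void deleteFile(Path target) {
        atCommit.add(() -> {
            try { Files.delete(target); }
            catch (IOException e) { throw new RuntimeException(e); }
        });
    }

    public void commit()   { atCommit.forEach(Runnable::run); }
    public void rollback() { atCommit.clear(); }
}
```

A real resource manager would additionally log these intents durably so a crash mid-commit can be recovered, and would enlist in the XA transaction alongside the database update.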
use case: software upgrade.
A new version of a software package includes additional
files, moves some existing files, edits others and deletes
some. We would like an upgrade installer that will ensure
that if any of these operations fails, the upgrade can be
backed out smoothly, leaving the original version of the
software in a usable state. Under no circumstances should an
upgrade that fails part way through leave the installed
software unusable.
Solution: Perform all the installation operations within an
XA transaction. If any of them fails, rollback the
transaction, which returns the filesystem to its original state.
use case: clustered transactional file storage.
A web based application provides for the upload and storage
of user images. There are a large number of users, such that
we need a cluster of application servers to support them.
Some files may be shared by multiple users, such as in a
business workgroup. The server cluster may perform
asynchronous operations (crop, rotate, etc) on the image
files in response to user requests. Simultaneous
manipulations of the same image should never be allowed to
corrupt the file.
Solution: provide access to a shared network file system
through a transactional API, such that all nodes in the
cluster have the same view of the filesystem state and can
lock files to prevent interference.
Summary of required functionality:
Manipulate the contents of a single file transactionally.
Conceptually this is very similar to having multiple
concurrent transactions on a database. As long as they don't
try to update the same data (file: blocks, db: rows of
table) they should not interfere.
Allow these manipulations to be committed or rolled back as
part of an XA transaction's 2PC, so that other updates e.g.
JMS or SQL can be committed or rolled back consistently.
Manipulate the contents of a directory in the same way:
add or remove files in a transaction, such that the
operations can be undone together if needed.
Allow changes to file contents and directories in the same
transaction. This basically combines the points above.
Allow transactions to span multiple JVMs, to allow for
greater scaling. Conceptually this is similar to moving from
an in-process main memory database to one that is accessed
by multiple clients using a network protocol. JCA may be
useful here.
Discussion
Implementing the above is likely to involve changes to some
java.io code. Fortunately it's open source. We don't want
to change too much: ideally just the low level code, not the
things like Streams that layer on top of it. Perhaps
subclassing java.io.File is one approach. Shame it's a class
not an interface! Taking that approach, Streams etc don't
need to change, they still see a File, though it's actually
our new XAFile? The way JDBC uses Connection vs.
XAConnection is worth looking at too.
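To make the subclassing idea concrete, here is a minimal sketch. XAFile is the hypothetical class mentioned above; the transactional hooks are stubs, but the point is that existing stream classes accept it unchanged because it still is-a java.io.File:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical: XAFile subclasses java.io.File, so Streams etc. don't need
// to change - they still see a File. The real transactional behaviour
// (enlisting an XAResource, logging undo data) would hang off extra hooks.
public class XAFile extends File {
    public XAFile(String pathname) {
        super(pathname);
    }

    // Placeholder: a real implementation would register this file's
    // resource manager with the current transaction here.
    public void enlist() { }

    public static void main(String[] args) throws IOException {
        File tmp = Files.createTempFile("xafile", ".txt").toFile();
        XAFile xa = new XAFile(tmp.getPath());
        try (FileOutputStream out = new FileOutputStream(xa)) { // sees a File
            out.write("hello".getBytes());
        }
        try (FileInputStream in = new FileInputStream(xa)) {
            System.out.println(new String(in.readAllBytes())); // prints "hello"
        }
        tmp.delete();
    }
}
```

The catch, as noted, is that File being a class rather than an interface means we inherit all its concrete behaviour and can only intercept what we override.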
The ideas being discussed in JSR-203 may be useful.
The Apache Commons Transaction project has also done a
little work on this in their file module, but it's not very
impressive last time I looked. Microsoft have some transactional NTFS work that may be relevant.
http://commons.apache.org/transaction/
http://www.infoq.com/news/2008/01/file-systems-transactions
In terms of executing the project, I'd anticipate step one to
cover use in a single JVM only, with step two then adding
distribution so multiple JVMs can share transactional access
if there is enough time.
This work relates only to the JVM's view of the filesystem.
Indeed, it may relate only to the view of threads within the
JVM that are in a transaction. Other processes on the
machine will not access the filesystem through the
transactional API, so they won't be affected. This has
benefits and drawbacks. It will be hard to deal with the
case where the filesystem is changed by an external process
whilst a transaction is running. But it will be easy to keep
a virtual view of the modified version of the filesystem in
memory for the benefit of ongoing transactions. The new vfs used in AS 5.0 may be handy?
Some kind of logging mechanism will be needed to record
changes to the filesystem for use in crash recovery. Perhaps
a 'shadow' directory for each transaction, in some specially
designated out of the way place on the file system. Similar
to what a database does with transaction log files or
rollback segments.
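The shadow-directory idea might look something like the following sketch, under the assumption that each transaction edits a private copy and commit is an atomic rename (names and layout here are illustrative, and real crash recovery would also need a scan of the shadow directory on restart):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: per-transaction shadow copies. Commit atomically replaces the
// original with the shadow; rollback just discards the shadow, like a
// database discarding a rollback segment.
public class ShadowCopy {
    public static Path begin(Path original, Path shadowDir) throws IOException {
        Path shadow = shadowDir.resolve(original.getFileName() + ".tx");
        Files.copy(original, shadow, StandardCopyOption.REPLACE_EXISTING);
        return shadow; // the transaction now edits the shadow, not the original
    }

    public static void commit(Path shadow, Path original) throws IOException {
        // The atomic rename is the commit point: a crash leaves either the
        // old or the new version visible, never a partial write.
        Files.move(shadow, original, StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void rollback(Path shadow) throws IOException {
        Files.deleteIfExists(shadow);
    }
}
```

This trades space and copy time for simple recovery; a log of byte-level changes would be cheaper for small updates to large files.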
Although what's being implemented here is basically a resource manager, part of the JBossTS transaction manager may be reusable, particularly the locking and logging support provided by ArjunaCore.
Thoughts on the apache commons transaction project
This focuses on multi level locks, transactional collections and transactional file access. The transactional collections are not of interest to this project. The locking overlaps the capabilities of ArjunaCore to some extent, but is less mature. The transactional file access focuses mainly on manipulation of filesystems i.e. the 'java.io.File as dir structure' view rather than the 'java.io.File as byte[]' view. It does not support XA, so it won't interoperate with a standard transaction manager unless we provide an adaptor layer. There is no crash recovery support either. In short, it may provide a useful source of ideas and some reusable code, but it's not a workable solution by itself at present.
Thoughts on transactional access to File as byte[]
Conceptually a single file is an array of bytes. There may be business level structure layered on top of this e.g. fixed length records, csv, lines of text, serialized objects. Most of what we know about tx concurrency control comes from the database community, where transactions work in terms of tuples, data blocks or tables. For example, row level locking attaches transaction meta data (locks) to data at the tuple level. To do meaningful concurrency control on a file, we need to adapt these ideas a bit.
How to break down a file into smaller units so we can do fine grained cc? Take a 'black box' approach where we know nothing about the business data structure and just lock on byte blocks of fixed size, or byte ranges according to what is read/written inside the scope of a tx? Or provide hooks through which the business logic can provide some structure meta-data so we can lock at the business record level e.g. new RecordLengthSpecifier(long numberOfBytesPerRecord)
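The RecordLengthSpecifier hook mentioned above is hypothetical, but the mapping it would provide is simple: with fixed-length records, record i occupies bytes [i * recordLength, (i + 1) * recordLength), which becomes the unit of locking. A sketch:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Hypothetical hook: business logic declares its record size, and the
// resource manager translates record numbers into byte ranges for locking.
public class RecordLengthSpecifier {
    private final long recordLength;

    public RecordLengthSpecifier(long numberOfBytesPerRecord) {
        this.recordLength = numberOfBytesPerRecord;
    }

    public long startOffset(long recordNumber) {
        return recordNumber * recordLength;
    }

    public long length() {
        return recordLength;
    }

    // Lock exactly one business record rather than the whole file.
    public FileLock lockRecord(FileChannel channel, long recordNumber)
            throws IOException {
        return channel.lock(startOffset(recordNumber), recordLength, false);
    }
}
```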
Also access patterns for a file may differ from a db. In a db, reads outnumber writes considerably. With a file, an app may read once, then write once or not at all. What implications does that have for the design? Overhead is likely to come from writing locks or versioned blocks/records to disk, so we want to minimise that.
Where records are not fixed length, a write in the middle of a file involves moving all subsequent records backwards or forwards, or leaving a gap. That's potentially a lot of data copying. Also if locks are expressed in terms of byte indexes and the data moves to different indexes, we have a problem... However, how common is this use case really?
To my mind transactional access to a single file breaks down into three cases:
manipulation of fixed length records e.g. read a record, process it, change a 0->1 to mark it done. record boundaries are thus fixed, so we can do concurrency control easily
conditional append: we want to append a log record to the end of a file for each tx, or perhaps just the successful ones. Ordering may or may not be important. We can't use an afterCompletion sync to do the write because that's volatile and won't survive a crash. Locking here is very different to the earlier case, particularly if we can write in commit order rather than tx start time order.
everything else i.e. manipulation of variable length records. This is hard.
Perhaps we have special cases for the first two, so we do more efficient things where the business logic indicates it's running one of those more limited use cases.
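The conditional-append case above could be sketched as follows: the record is buffered in memory during the transaction and only written (and forced to disk) at commit, so records land in commit order and survive a crash once the commit is acknowledged. In the real design this write would happen inside the resource manager's 2PC commit path, not in application code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: durable append executed at commit time. Unlike an
// afterCompletion synchronization, the force() makes the record
// survive a crash, at the cost of an fsync per commit.
public class CommitOrderLog {
    public static void appendAtCommit(Path log, byte[] record) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap(record));
            ch.force(true); // durable before the commit is acknowledged
        }
    }
}
```

Because nothing is written until commit, there is no mid-file locking to do at all for this case; the only contention is on the append position itself.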
Thoughts on XA in an object-oriented, connectionless world.
The XA spec is not object oriented, so its mapping into Java through the JTA already has some interesting issues. XA is mainly used by JCA/JMS/JDBC, which are Connection oriented. File access is not, which leaves us either a) further stretching JTA to work in a connectionless model or b) providing a JCA like filesystem provider that forces a connection centric model on file I/O. That is not too big a stretch, since Streams are already close to Connections in essence, plus users are used to a 'connection to a remote file server' approach to storage. The choice here also feeds in to work on distributed (multi JVM) transactional access, since there a Connection based model works well. With a connectionless API we would need to hide the distribution bit from the user rather than allowing it to be exposed.