Transactional File I/O in Java.



Transactions are often used to structure activities within reliable software applications. In Java EE, business logic typically involves accessing transactional resource managers (databases, message queues) within boundaries denoted by calls to the JTA (begin/commit/rollback). The resource managers work with the transaction manager to perform e.g. locking, logging and recovery transparently to the application programmer. However, this separation of concerns breaks down with regard to one important resource: the file system. Java's file I/O library does not support transactions, which forces application programmers to implement such support manually in their programs. This project aims to develop a transaction-aware resource manager for file I/O in Java. The library will provide application programmers with access to a filesystem that offers ACID semantics.



Example use cases


Use case: transactional changes within a single file.


A file on disk contains a number of business data entries, each of which must be read and processed. At the end of the processing for each entry, results are sent via JMS and the entry in the file is marked as done. There may be many entries and processing them can take a long time. The system may crash during processing. We never want to miss processing an entry, nor to process it more than once.


Solution: within an XA transaction, read an entry from the file, process it, mark it as done and enqueue the JMS message. Two-phase commit of the transaction ensures that even if the system crashes we still get the 'mark entry as done' and the 'send JMS message' as a single atomic update.


Even better: allow multiple transactions to be active on the file simultaneously, provided they are updating different parts of the file. This allows concurrent processing of entries on multi-processor systems.
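The 'different parts of the file' idea maps naturally onto byte-range locks. As a rough, non-transactional illustration of the mechanism (using only the standard java.nio.channels.FileChannel API, not any of the proposed transactional library), overlapping exclusive ranges conflict while disjoint ranges can be held at the same time:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.*;

public class RangeLockDemo {
    // Returns true if a second, overlapping exclusive lock is rejected
    // while the first one is still held.
    public static boolean overlappingLockRejected(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            try (FileLock first = ch.lock(0, 100, false)) {    // bytes 0..99
                try {
                    ch.lock(50, 100, false).release();          // overlaps 50..99
                    return false;
                } catch (OverlappingFileLockException e) {
                    return true;  // same-JVM overlap is detected immediately
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.write(Files.createTempFile("entries", ".dat"), new byte[200]);
        System.out.println(overlappingLockRejected(f));         // true
        // Disjoint ranges, e.g. two different entries, can be locked together:
        try (FileChannel ch = FileChannel.open(f, StandardOpenOption.WRITE);
             FileLock a = ch.lock(0, 100, false);
             FileLock b = ch.lock(100, 100, false)) {
            System.out.println(a.overlaps(100, 100));           // false
        }
        Files.delete(f);
    }
}
```

Note that java.nio file locks are held on behalf of the whole JVM and are advisory on many platforms, so the real library would need its own lock manager (e.g. ArjunaCore's) layered over or instead of these.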


Use case: batch changes on a group of files.


A directory contains a number of files. Each file must be read, some processing done and the result written back to another file in the same directory. The source file is then deleted. As a side effect of the processing, one or more database updates may occur. We want to ensure each file is processed exactly once.


Solution: within an XA transaction, read the source file, update the database and create the output file. The transaction ensures that the source file is never deleted before the output file is created, nor is the source file left intact after the output file is created. Likewise the database is never updated unless the other operations also complete.


Even better: allow multiple transactions on the directory simultaneously, so different files can be processed in parallel.
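The crash-safety argument in this use case rests on ordering: the output must be durably in place before the source disappears, so recovery can tell which step was reached. A minimal non-transactional sketch of that discipline, using only java.nio.file (the transform step is a stand-in for the real processing):

```java
import java.io.IOException;
import java.nio.file.*;

// Not a real transaction, but a sketch of the ordering discipline the
// transactional version must enforce for the process-then-delete step.
public class BatchFileDemo {
    public static void process(Path source, Path output) throws IOException {
        byte[] result = transform(Files.readAllBytes(source));

        // 1. Write the result to a temp file, then atomically rename it into
        //    place, so a crash never leaves a half-written output file.
        Path tmp = output.resolveSibling(output.getFileName() + ".tmp");
        Files.write(tmp, result);
        Files.move(tmp, output, StandardCopyOption.ATOMIC_MOVE);

        // 2. Only now delete the source. If we crash between steps 1 and 2,
        //    recovery sees both files present and simply redoes the delete.
        Files.delete(source);
    }

    // Placeholder for the real business processing.
    static byte[] transform(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) out[i] = (byte) Character.toUpperCase(in[i]);
        return out;
    }
}
```

The database update in the real use case is exactly what this sketch cannot cover: coordinating it with the file operations is what requires the XA transaction.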


Use case: software upgrade.


A new version of a software package includes additional files, moves some existing files, edits others and deletes some. We would like an upgrade installer that will ensure that if any of these operations fails, the upgrade can be backed out smoothly, leaving the original version of the software in a usable state. Under no circumstances should an upgrade that fails part way through leave the installed software unusable.


Solution: perform all the installation operations within an XA transaction. If any of them fails, roll back the transaction, which returns the filesystem to its original state.
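Outside of a real XA rollback, the backout can be sketched as a backup-and-restore scheme: copy each file aside before touching it, and restore the copies in reverse order on failure. UpgradeTxnDemo and its method names below are inventions of this sketch, not part of any existing installer API; a full version would also have to cover creates, moves and deletes:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: before each destructive step, the affected file is
// copied into a per-upgrade backup directory; on failure every backup is
// restored in reverse order, leaving the original install intact.
public class UpgradeTxnDemo {
    private final Path backupDir;
    private final Deque<Path[]> backups = new ArrayDeque<>();  // [target, savedCopy]

    public UpgradeTxnDemo(Path backupDir) {
        this.backupDir = backupDir;
    }

    // Replace a file's content, saving the old version first.
    public void replace(Path target, byte[] newContent) throws IOException {
        Path saved = backupDir.resolve(target.getFileName().toString());
        Files.copy(target, saved, StandardCopyOption.REPLACE_EXISTING);
        backups.push(new Path[]{target, saved});
        Files.write(target, newContent);
    }

    // Undo everything: restore saved copies, newest first.
    public void rollback() throws IOException {
        while (!backups.isEmpty()) {
            Path[] p = backups.pop();
            Files.move(p[1], p[0], StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

To survive a crash mid-upgrade, the backup directory itself plays the role of a recovery log: if it is non-empty on restart, the restore can be replayed.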


Use case: clustered transactional file storage.


A web-based application provides for the upload and storage of user images. There are a large number of users, such that we need a cluster of application servers to support them. Some files may be shared by multiple users, such as in a business workgroup. The server cluster may perform asynchronous operations (crop, rotate, etc.) on the image files in response to user requests. Simultaneous manipulations of the same image should never be allowed to corrupt the file.


Solution: provide access to a shared network file system through a transactional API, such that all nodes in the cluster have the same view of the filesystem state and can lock files to prevent interference.


Summary of required functionality:


Manipulate the contents of a single file transactionally. Conceptually this is very similar to having multiple concurrent transactions on a database. As long as they don't try to update the same data (file: blocks, db: rows of a table) they should not interfere.


Allow these manipulations to be committed or rolled back as part of an XA transaction's 2PC, so that other updates, e.g. JMS or SQL, can be committed or rolled back consistently.


Manipulate the contents of a directory in the same way: add or remove files in a transaction, such that the operations can be undone together if needed.


Allow changes to file contents and directories in the same transaction. This basically combines the points above.


Allow transactions to span multiple JVMs, to allow for greater scaling. Conceptually this is similar to moving from an in-process main memory database to one that is accessed by multiple clients using a network protocol. JCA may be useful here.
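To make the 2PC point concrete, here is a hypothetical, heavily simplified sketch of what the file resource manager's javax.transaction.xa.XAResource might look like, using a shadow-copy scheme. ShadowFileXAResource and its shadow-file naming are inventions of this sketch, not an existing API: prepare makes the new content durable, commit atomically swings it into place, rollback discards it. A real implementation would also need locking, Xid logging for recover(), and so on.

```java
import javax.transaction.xa.*;
import java.io.IOException;
import java.nio.file.*;

// Hypothetical resource manager sketch: one file edit per transaction branch,
// made durable via a shadow copy that is renamed over the original on commit.
public class ShadowFileXAResource implements XAResource {
    private final Path target;
    private final Path shadow;
    private byte[] pending;

    public ShadowFileXAResource(Path target) {
        this.target = target;
        this.shadow = target.resolveSibling(target.getFileName() + ".shadow");
    }

    // Buffer the new content; nothing touches the disk until prepare/commit.
    public void write(byte[] newContent) { this.pending = newContent; }

    @Override public int prepare(Xid xid) throws XAException {
        try {
            Files.write(shadow, pending);   // durable before voting yes
            return XA_OK;
        } catch (IOException e) { throw new XAException(XAException.XAER_RMERR); }
    }

    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        try {
            if (onePhase) Files.write(shadow, pending);  // no separate prepare ran
            Files.move(shadow, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) { throw new XAException(XAException.XAER_RMERR); }
    }

    @Override public void rollback(Xid xid) throws XAException {
        try { Files.deleteIfExists(shadow); }
        catch (IOException e) { throw new XAException(XAException.XAER_RMERR); }
    }

    // The remaining XAResource methods are stubbed out for this sketch.
    @Override public void start(Xid xid, int flags) {}
    @Override public void end(Xid xid, int flags) {}
    @Override public void forget(Xid xid) {}
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
}
```

A transaction manager such as JBossTS would drive this object through start/end, then prepare and commit/rollback alongside the JMS and JDBC resources enlisted in the same transaction.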




Implementing the above is likely to involve changes to some existing code. Fortunately it's open source. We don't want to change too much: ideally just the low-level code, not things like Streams that layer on top of it. Perhaps subclassing is one approach. Shame File is a class, not an interface! Taking that approach, Streams etc. don't need to change; they still see a File, though it's actually our new XAFile. The way JDBC uses Connection vs. XAConnection is worth looking at too.


The ideas being discussed in JSR-203 may be useful.


The Apache Commons Transaction project has also done a little work on this in their file module, but it's not very impressive last time I looked. Microsoft have some transactional NTFS work that may be relevant.



In terms of executing the project, I'd anticipate step one covering use in a single JVM only, with step two then adding distribution so multiple JVMs can share transactional access, if there is enough time.


This work relates only to the JVM's view of the filesystem. Indeed, it may relate only to the view of threads within the JVM that are in a transaction. Other processes on the machine will not access the filesystem through the transactional API, so they won't be affected. This has benefits and drawbacks. It will be hard to deal with the case where the filesystem is changed by an external process whilst a transaction is running. But it will be easy to keep a virtual view of the modified version of the filesystem in memory for the benefit of ongoing transactions. The new VFS used in AS 5.0 may be handy?


Some kind of logging mechanism will be needed to record changes to the filesystem for use in crash recovery. Perhaps a 'shadow' directory for each transaction, in some specially designated out-of-the-way place on the file system. Similar to what a database does with transaction log files or rollback segments.
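A minimal version of the shadow-directory idea, assuming a per-transaction log file whose records are forced to disk before the real filesystem change is made (the record format and naming scheme here are invented for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.List;

// Sketch: each transaction gets its own log file in a designated shadow
// directory. A change record is forced to disk *before* the real file is
// touched, so recovery can redo or undo the change after a crash.
public class ShadowLogDemo {
    private final Path logFile;

    public ShadowLogDemo(Path shadowDir, String txId) throws IOException {
        Files.createDirectories(shadowDir);
        this.logFile = shadowDir.resolve("tx-" + txId + ".log");
    }

    // Append one record and force it to stable storage (the write-ahead rule).
    public void logChange(String record) throws IOException {
        try (FileChannel ch = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap((record + "\n").getBytes()));
            ch.force(true);   // data and metadata flushed before we proceed
        }
    }

    // On restart, recovery would scan the shadow directory and replay or
    // discard each surviving log. Here we just read the records back.
    public List<String> recover() throws IOException {
        return Files.readAllLines(logFile);
    }
}
```

Whether the records describe redo (new content) or undo (old content) determines whether in-place updates are allowed before commit, exactly as with database log files versus rollback segments.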


Although what's being implemented here is basically a resource manager, part of the JBossTS transaction manager may be reusable, particularly the locking and logging support provided by ArjunaCore.


Thoughts on the Apache Commons Transaction project


This focuses on multi-level locks, transactional collections and transactional file access. The transactional collections are not of interest to this project. The locking overlaps the capabilities of ArjunaCore to some extent, but is less mature. The transactional file access focuses mainly on manipulation of filesystems, i.e. the 'File as dir structure' view rather than the 'File as byte[]' view. It does not support XA, so it won't interoperate with a standard transaction manager unless we provide an adaptor layer. There is no crash recovery support either. In short, it may provide a useful source of ideas and some reusable code, but it's not a workable solution by itself at present.


Thoughts on transactional access to File as byte[]


Conceptually a single file is an array of bytes. There may be business-level structure layered on top of this, e.g. fixed-length records, CSV, lines of text, serialized objects. Most of what we know about transactional concurrency control comes from the database community, where transactions work in terms of tuples, data blocks or tables. For example, row-level locking attaches transaction meta-data (locks) to data at the tuple level. To do meaningful concurrency control on a file, we need to adapt these ideas a bit.


How do we break down a file into smaller units so we can do fine-grained concurrency control? Take a 'black box' approach where we know nothing about the business data structure and just lock on byte blocks of fixed size, or byte ranges according to what is read/written inside the scope of a tx? Or provide hooks through which the business logic can provide some structural meta-data so we can lock at the business record level, e.g. new RecordLengthSpecifier(long numberOfBytesPerRecord)?


Also, access patterns for a file may differ from those of a db. In a db, reads outnumber writes considerably. With a file, an app may read once, then write once or not at all. What implications does that have for the design? Overhead is likely to come from writing locks or versioned blocks/records to disk, so we want to minimise that.


Where records are not of fixed length, a write in the middle of a file involves moving all subsequent records backwards or forwards, or leaving a gap. That's potentially a lot of data copying. Also, if locks are expressed in terms of byte indexes and the data moves to different indexes, we have a problem... However, how common is this use case really?


To my mind transactional access to a single file breaks down into three cases:

  • manipulation of fixed-length records, e.g. read a record, process it, change a 0->1 flag to mark it done. Record boundaries are fixed, so we can do concurrency control easily

  • conditional append: we want to append a log record to the end of a file for each tx, or perhaps just the successful ones. Ordering may or may not be important. We can't use an afterCompletion synchronization to do the write because that's volatile and won't survive a crash. Locking here is very different to the earlier case, particularly if we can write in commit order rather than tx start time order.

  • everything else, i.e. manipulation of variable-length records. This is hard.

Perhaps we have special cases for the first two, so we do more efficient things where the business logic indicates it's running one of those more limited use cases.
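For the fixed-length case, the concurrency-control unit falls out directly: record i occupies bytes [i*len, (i+1)*len), so a byte-range lock over exactly that span is a record-level lock. A rough sketch of the 'mark done' step, again with plain java.nio locks standing in for the transactional lock manager (the record length and flag position are made up for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.*;

public class FixedRecordDemo {
    static final int RECORD_LEN = 16;   // assumed record size, illustration only
    static final int FLAG_OFFSET = 0;   // assume byte 0 of each record is the done flag

    // Lock exactly record i's byte range, flip its flag 0 -> 1,
    // and return the flag's previous value.
    public static byte markDone(Path file, int i) throws IOException {
        long pos = (long) i * RECORD_LEN;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock recordLock = ch.lock(pos, RECORD_LEN, false)) {
            ByteBuffer b = ByteBuffer.allocate(1);
            ch.read(b, pos + FLAG_OFFSET);
            byte previous = b.get(0);
            ch.write(ByteBuffer.wrap(new byte[]{1}), pos + FLAG_OFFSET);
            return previous;
        }
    }
}
```

The returned previous value gives the at-most-once check for free: a second attempt on the same record sees 1 and can skip it. The hypothetical RecordLengthSpecifier hook mentioned above would simply supply RECORD_LEN instead of it being hard-coded.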


Thoughts on XA in an object-oriented, connectionless world.


The XA spec is not object oriented, so its mapping into Java through the JTA already has some interesting issues. XA is mainly used by JCA/JMS/JDBC, which are Connection oriented. File access is not, which leaves us either a) further stretching JTA to work in a connectionless model or b) providing a JCA like filesystem provider that forces a connection centric model on file I/O.  That is not too big a stretch, since Streams are already close to Connections in essence, plus users are used to a 'connection to a remote file server' approach to storage. The choice here also feeds in to work on distributed (multi JVM) transactional access, since there a Connection based model works well. With a connectionless API we would need to hide the distribution bit from the user rather than allowing it to be exposed.