7 Replies Latest reply on Mar 31, 2008 12:17 PM by manik

FileCacheLoader failing with EOFException with fix

e2open Mar 18, 2008 12:05 AM

The FileCacheLoader in a cluster was failing with an EOFException while restoring an object under load. It looks like the original hash map was not being written properly. The original code is given below:

protected void storeAttributes(Fqn fqn, Map attrs) throws Exception
{
   File f = getDirectory(fqn, true);
   File child = new File(f, DATA);
   if (!child.exists())
      if (!child.createNewFile())
         throw new IOException("Unable to create file: " + child);
   FileOutputStream out = new FileOutputStream(child);
   ObjectOutputStream output = new ObjectOutputStream(out);
   output.writeObject(attrs);
   out.close();
}

Changing the code as given below seems to fix the issue.

protected void storeAttributes(Fqn fqn, Map attrs) throws Exception
{
   File f = getDirectory(fqn, true);
   File child = new File(f, DATA);
   if (!child.exists())
      if (!child.createNewFile())
         throw new IOException("Unable to create file: " + child);
   FileOutputStream out = new FileOutputStream(child);
   try
   {
      MarshalledValueOutputStream output =
            new MarshalledValueOutputStream(out);
      output.writeObject(attrs);
      output.close();
      out = null;
   }
   finally
   {
      if (out != null)
         out.close();
   }
}

1. Re: FileCacheLoader failing with EOFException with fix

genman Mar 18, 2008 1:49 AM (in response to e2open)

It looks like the original code did not close the ObjectOutputStream; instead it closed the underlying file stream, which isn't correct.

Could you file a JIRA issue and link to this post?

By the way, I think the FileCacheLoader is documented as not so production worthy.
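
The point about closing the outermost stream can be sketched with plain JDK classes. This is only an illustration, not the FileCacheLoader code: `AttributeFileStore`, `store`, and `load` are hypothetical names, and `ObjectOutputStream` stands in for whichever stream class the loader actually uses. Closing the wrapper flushes whatever it has buffered and then closes the file stream underneath it.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Map;

// Hypothetical helper for illustration; not part of JBoss Cache.
final class AttributeFileStore {

    // Serializes attrs to the file. try-with-resources closes the
    // ObjectOutputStream first (flushing it), then the FileOutputStream,
    // even if writeObject throws.
    static void store(File file, Map<String, String> attrs) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file);
             ObjectOutputStream output = new ObjectOutputStream(out)) {
            output.writeObject(attrs);
        }
    }

    // Reads the map back; the streams are closed in the same reverse order.
    @SuppressWarnings("unchecked")
    static Map<String, String> load(File file)
            throws IOException, ClassNotFoundException {
        try (FileInputStream in = new FileInputStream(file);
             ObjectInputStream input = new ObjectInputStream(in)) {
            return (Map<String, String>) input.readObject();
        }
    }
}
```

A round trip such as `store(file, attrs)` followed by `load(file)` should return an equal map. This covers only the stream handling; it does nothing about two writers touching the same file.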
2. Re: FileCacheLoader failing with EOFException with fix

manik Mar 18, 2008 8:38 AM (in response to e2open)

Yes, please create a JIRA for this, targeted at 2.2.0.

Thanks
Manik
3. Re: FileCacheLoader failing with EOFException with fix

e2open Mar 19, 2008 8:19 AM (in response to e2open)

The fix above reduced the EOF exceptions but did not completely eliminate the issue. I have a shared cache with 2 nodes and a file cache loader on NFS (yes, I know that you guys don't recommend it; I am just building a basic infrastructure to test with). Further debugging revealed that both nodes are trying to write to the data file at the same time. One of them runs into the EOF exception as it tries to do a readObject on a 0-byte file that is being written to by the other node. Now I am confused: isn't the lock supposed to be across the cluster at the tree cache level, avoiding this issue? Or am I missing something?
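
The 0-byte window described above can be reproduced on a single machine with plain JDK streams. This is a sketch under stated assumptions, not the cache loader itself: `TruncationWindowDemo` and `readerSeesEof` are hypothetical names, and a local file stands in for the NFS-backed data file. Opening a `FileOutputStream` on an existing file truncates it immediately, so a reader that arrives before the writer has written anything finds no stream header and fails with `EOFException`.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical demo class for illustration; not part of JBoss Cache.
final class TruncationWindowDemo {

    // Returns true if a reader that opens the file after a writer has
    // opened it, but before the writer has written anything, gets EOFException.
    static boolean readerSeesEof(File file)
            throws IOException, ClassNotFoundException {
        // A complete write first, so the file holds a valid serialized map.
        try (FileOutputStream out = new FileOutputStream(file);
             ObjectOutputStream output = new ObjectOutputStream(out)) {
            output.writeObject(new HashMap<String, String>());
        }
        // A second writer opens the file; this alone truncates it to 0 bytes.
        try (FileOutputStream writer = new FileOutputStream(file)) {
            // The "other node" reads while the writer still has nothing on disk.
            try (FileInputStream in = new FileInputStream(file);
                 ObjectInputStream input = new ObjectInputStream(in)) {
                input.readObject();
                return false;
            } catch (EOFException expected) {
                return true;
            }
        }
    }
}
```

On NFS, client-side caching may change the timing, so this shows only the basic mechanism.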
4. Re: FileCacheLoader failing with EOFException with fix

manik Mar 19, 2008 8:30 AM (in response to e2open)

No, it is still possible for one instance to perform a read while another instance performs a write, both involving a cache loader.

There is no cluster-wide lock at the start. Locking attempts to gain cluster-wide locks during the prepare phase of a 2-phase commit, and the transaction fails if these cannot be obtained.
5. Re: FileCacheLoader failing with EOFException with fix

e2open Mar 20, 2008 10:38 PM (in response to e2open)

I changed the code to write to a dot file and then rename it, to avoid the EOF on read (on Linux). The problem I see now is that more than one write is happening at the same time (from different nodes in the cluster). How is that possible if there is a write lock on the node?

protected void storeAttributes(Fqn fqn, Map attrs) throws Exception
{
   File f = getDirectory(fqn, true);
   File child = new File(f, DATA);
   File dotChild = new File(f, DOT_DATA);
   if (dotChild.exists())
      System.out.println("Found dot file : " + dotChild);
   FileOutputStream out = new FileOutputStream(dotChild);
   try
   {
      MarshalledValueOutputStream output =
            new MarshalledValueOutputStream(out);
      output.writeObject(attrs);
      output.close();
      out = null;
   }
   finally
   {
      if (out != null)
         out.close();
   }
   if (!dotChild.renameTo(child))
   {
      throw new Exception("Failed to rename '" + dotChild + "' to '" +
            child + "' : " + f.exists() + " : " + dotChild.exists() +
            " : " + child.exists());
   }
}
6. Re: FileCacheLoader failing with EOFException with fix

genman Mar 21, 2008 12:56 PM (in response to e2open)

If you trace through, you'll notice that the writing to disk doesn't happen when the write lock is obtained, but when the transaction commits.

What should probably happen is that the writes go to "dot files" during the prepare phase, perhaps named with the JGroups address, and the files are renamed during the commit phase.
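
A rough sketch of that idea, using only JDK classes, might look like the following. Everything here is an assumption for illustration: `TwoPhaseFileWriter` and its methods are hypothetical and not part of JBoss Cache, `nodeId` is a plain string standing in for the JGroups address, and `ObjectOutputStream` stands in for the loader's own stream class. Each node writes to its own dot file during prepare, so two nodes never write to the same file, and commit moves the finished file over the data file.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of "write in prepare, rename in commit".
final class TwoPhaseFileWriter {
    private final File target;   // the data file that readers open
    private final File pending;  // this node's private dot file

    // nodeId stands in for the JGroups address suggested above.
    TwoPhaseFileWriter(File dir, String dataName, String nodeId) {
        this.target = new File(dir, dataName);
        this.pending = new File(dir, "." + dataName + "." + nodeId);
    }

    // Prepare phase: write the complete payload to the dot file only.
    void prepare(Serializable attrs) throws IOException {
        try (FileOutputStream out = new FileOutputStream(pending);
             ObjectOutputStream output = new ObjectOutputStream(out)) {
            output.writeObject(attrs);
        }
    }

    // Commit phase: move the finished dot file into place.
    void commit() throws IOException {
        Files.move(pending.toPath(), target.toPath(),
                   StandardCopyOption.ATOMIC_MOVE);
    }

    // Rollback: discard the dot file and leave the data file untouched.
    void rollback() throws IOException {
        Files.deleteIfExists(pending.toPath());
    }
}
```

With `ATOMIC_MOVE`, whether an existing target is replaced is implementation specific according to the `Files.move` documentation; on Linux the move is a rename that replaces it. If two nodes commit one after the other, the later rename wins, so this avoids readers seeing half-written files but does not by itself order the writers.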
7. Re: FileCacheLoader failing with EOFException with fix

manik Mar 31, 2008 12:17 PM (in response to e2open)

BTW, what version of JBC are you referring to in your original post?