14 Replies Latest reply on Nov 8, 2008 7:34 AM by belaban

Cache corrupted by 64bit windows member

wjm Oct 13, 2008 8:28 PM

Hello. We've run into some trouble attempting to bring a 64bit windows machine into an existing cluster. The cluster has (had) two 64bit linux members and was running fine until we attempted to merge a 64bit windows machine into the group. Whenever the windows machine wrote to a node, that node was thereafter unreadable by either of the original members, which threw exceptions of the form shown below. All machines were using 32bit java. Has anyone seen anything similar? Have I missed something fundamental about 64bit windows?

Thanks in advance for any help or insight.

INFO | jvm 1 | 2008/10/10 10:05:13 | 0 [ERROR] AdjListJDBCCacheLoader.reportAndRethrowError(): - Failed to load node for fqn /IntegrationModules
INFO | jvm 1 | 2008/10/10 10:05:13 | java.lang.Exception: Unable to load to deserialize result:
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.loader.AdjListJDBCCacheLoader.loadNode(AdjListJDBCCacheLoader.java:397)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.loader.AdjListJDBCCacheLoader.get(AdjListJDBCCacheLoader.java:97)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.interceptors.CacheLoaderInterceptor.loadData(CacheLoaderInterceptor.java:530)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.interceptors.CacheLoaderInterceptor.loadNode(CacheLoaderInterceptor.java:408)
[...]
INFO | jvm 1 | 2008/10/10 10:05:13 | Caused by: java.io.EOFException
INFO | jvm 1 | 2008/10/10 10:05:13 | at java.io.DataInputStream.readInt(Unknown Source)
INFO | jvm 1 | 2008/10/10 10:05:13 | at java.io.ObjectInputStream$BlockDataInputStream.readInt(Unknown Source)
INFO | jvm 1 | 2008/10/10 10:05:13 | at java.io.ObjectInputStream.readInt(Unknown Source)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.marshall.CacheMarshaller200.populateFromStream(CacheMarshaller200.java:740)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.marshall.CacheMarshaller200.unmarshallHashMap(CacheMarshaller200.java:705)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.marshall.CacheMarshaller200.unmarshallObject(CacheMarshaller200.java:564)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.marshall.CacheMarshaller200.objectFromObjectStream(CacheMarshaller200.java:147)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.marshall.VersionAwareMarshaller.objectFromStream(VersionAwareMarshaller.java:176)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.loader.AdjListJDBCCacheLoader.unmarshall(AdjListJDBCCacheLoader.java:702)
INFO | jvm 1 | 2008/10/10 10:05:13 | at org.jboss.cache.loader.AdjListJDBCCacheLoader.loadNode(AdjListJDBCCacheLoader.java:392)

1. Re: Cache corrupted by 64bit windows member

wjm Oct 13, 2008 8:30 PM (in response to wjm)

Small addendum, we're running the latest 2.2.0GA release. Thanks again.
2. Re: Cache corrupted by 64bit windows member

manik Oct 21, 2008 4:29 AM (in response to wjm)

I doubt this is an OS problem, but just to prove it, do you see this issue if you had 1 windows and 1 linux machine in the cluster, and the other linux machine were to join?
3. Re: Cache corrupted by 64bit windows member

wjm Oct 21, 2008 4:42 AM (in response to wjm)

"manik.surtani@jboss.com" wrote:
I doubt this is an OS problem, but just to prove it, do you see this issue if you had 1 windows and 1 linux machine in the cluster, and the other linux machine were to join?

Thanks for the reply. We see this problem immediately after the 64bit machine put()s or replace()s any node in the cache. That same node(s) becomes unreadable for all other members, whether current or newly joined.

Note, we have not tried clustering two 64bit windows machines, because we only have one, but presumably they would play well together.

We're going to try using the 64bit java installation on the windows machine to see if that might "help". So far we've only used the 32bit java everywhere.

Otherwise, I hope you're right, but it's certainly evident only when 64bit windows is a member.
4. Re: Cache corrupted by 64bit windows member

wjm Oct 23, 2008 11:20 AM (in response to wjm)

Just a followup here, as mentioned. When using 64bit java for windows (1.6 latest), we do see the same result.
5. Re: Cache corrupted by 64bit windows member

manik Nov 1, 2008 11:49 AM (in response to wjm)

Do you see any issues with the replication? E.g., if you turn off the cache loader - or even use a different cache loader - does this work?

Finally, given that you are using a JDBC cache loader, which database backend are you using, and is it configured to be shared?
6. Re: Cache corrupted by 64bit windows member

wjm Nov 7, 2008 11:08 AM (in response to wjm)

"manik.surtani@jboss.com" wrote:
Do you see any issues with the replication? E.g., if you turn off the cache loader - or even use a different cache loader - does this work?

Hello again, and thanks for the followup. We tried the cache without a loader today, and got the same result.
7. Re: Cache corrupted by 64bit windows member

wjm Nov 7, 2008 12:18 PM (in response to wjm)

I've now logged this as a formal JIRA case:

https://jira.jboss.org/jira/browse/JBCACHE-1432

Thanks again!
8. Re: Cache corrupted by 64bit windows member

manik Nov 7, 2008 2:17 PM (in response to wjm)

So what does this stack trace look like when you don't use a cache loader? :-)
9. Re: Cache corrupted by 64bit windows member

wjm Nov 7, 2008 2:37 PM (in response to wjm)

Ah, yes. Very fair question. :-) Once it was apparent to me that the error lies in readInt() and readShort(), I didn't think about the rest. Clearly that's where the problem is, but this is the form we see without a loader enabled:

30 [ERROR] RequestCorrelator.receiveMessage(): - failed unmarshalling buffer into return value
java.io.EOFException
at java.io.DataInputStream.readShort(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(Unknown Source)
at java.io.ObjectInputStream.readShort(Unknown Source)
at org.jboss.cache.marshall.CacheMarshaller200.unmarshallObject(CacheMarshaller200.java:536)
at org.jboss.cache.marshall.CacheMarshaller200.objectFromObjectStream(CacheMarshaller200.java:147)
at org.jboss.cache.marshall.VersionAwareMarshaller.objectFromByteBuffer(VersionAwareMarshaller.java:154)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:544)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:746)
at org.jgroups.JChannel.up(JChannel.java:1151)
at org.jgroups.mux.Multiplexer$Task.run(Multiplexer.java:1036)
at org.jgroups.mux.Multiplexer$ExecuteTask.run(Multiplexer.java:1060)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

I'll attach this to the bug as well.
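
For reference, the exception type itself can be reproduced with no network involved. The following is only a minimal sketch (the class name `TruncatedBufferCheck` is made up for illustration and is not part of JBoss Cache or JGroups): reading a short from a buffer that holds fewer than two bytes makes `DataInputStream.readShort()` throw the same `java.io.EOFException`. It illustrates the symptom only and says nothing about why a buffer would come up short in this cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of JBoss Cache or JGroups.
class TruncatedBufferCheck {

    // Returns true if reading one short from the buffer runs off the end.
    static boolean readShortHitsEof(byte[] buffer) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer))) {
            in.readShort();
            return false;
        } catch (EOFException e) {
            // Fewer than two bytes were available.
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readShortHitsEof(new byte[] {0x01}));       // prints true
        System.out.println(readShortHitsEof(new byte[] {0x00, 0x2A})); // prints false
    }
}
```

So an EOF at this point in the trace means the reader expected more bytes than the buffer contained, whatever the underlying reason.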
10. Re: Cache corrupted by 64bit windows member

genman Nov 7, 2008 2:42 PM (in response to wjm)

You're getting an EOF, meaning somebody in your cluster is cutting the connection to your system. It might just be a network connectivity issue. Look at the logs on the other machines (TRACE) and see what's triggering the disconnect, if it indeed is within the cache.
11. Re: Cache corrupted by 64bit windows member

wjm Nov 7, 2008 2:46 PM (in response to wjm)

"genman" wrote:
You're getting an EOF, meaning somebody in your cluster is cutting the connection to your system. It might just be a network connectivity issue. Look at the logs on the other machines (TRACE) and see what's triggering the disconnect, if it indeed is within the cache.

Thanks for the feedback, but I don't think this is the case. The underlying problem seems related to the way the 64bit architecture writes Int and Short values to the object stream, causing other cluster members to throw this sort of exception when reading anything written by a windows 64bit member.

I've read that NIO handles this better than an ordinary DataInputStream, but I'm not well versed enough to say for sure.
12. Re: Cache corrupted by 64bit windows member

genman Nov 7, 2008 4:11 PM (in response to wjm)

I doubt that's the case. All the Java IO classes write the same number of bytes regardless of the underlying architecture. I've never heard otherwise since I started using JDK 1.0.
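
This is easy to check on any of the machines in question. A minimal sketch (the class name `PrimitiveWidthCheck` is hypothetical, chosen only for this example): it writes one short and one int through `DataOutputStream` and dumps the resulting bytes. The `DataOutput` contract specifies two bytes for a short and four for an int, high byte first, so the output should be identical on every JVM.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of JBoss Cache or JGroups.
class PrimitiveWidthCheck {

    // Serializes one short followed by one int and returns the raw bytes.
    static byte[] encode(short s, int i) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(s);
            out.writeInt(i);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encode((short) 0x0102, 0x03040506);
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02x ", b));
        }
        // prints "6 bytes: 01 02 03 04 05 06"
        System.out.println(encoded.length + " bytes: " + hex.toString().trim());
    }
}
```

If the Windows 64 bit box prints anything other than the line shown in the comment, that would be worth adding to the bug report.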

If you're patient, feel free to run Wireshark and see if the Windows 64 bit machine is sending anything weird.
13. Re: Cache corrupted by 64bit windows member

wjm Nov 7, 2008 5:18 PM (in response to wjm)

I'm still hoping to get a confirmation. We feel certain that anyone else attempting to blend 64bit windows in with other architectures in a cluster will experience the same result, though.
14. Re: Cache corrupted by 64bit windows member

belaban Nov 8, 2008 7:34 AM (in response to wjm)

I created a JIRA issue to verify this in JGroups (https://jira.jboss.org/jira/browse/JGRP-856). That said, the specific issue you're seeing might be caused by marshalling code relying on explicit assumptions about the size of certain types, or by big/little-endian issues.
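
To illustrate what an endian mismatch would look like, here is a minimal sketch using `java.nio.ByteBuffer` (the class `ByteOrderCheck` and its two helper methods are hypothetical and exist only for this example; this is not the JBoss Cache or JGroups marshalling code). An int written in one byte order and read back in the other comes out as a different number. If such a value were a length prefix, the reader could then try to consume more bytes than were sent, which is one way to end up with an EOFException.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; not part of JBoss Cache or JGroups.
class ByteOrderCheck {

    // Encodes an int into four bytes using the given byte order.
    static byte[] encodeInt(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Decodes four bytes into an int using the given byte order.
    static int decodeInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] little = encodeInt(1, ByteOrder.LITTLE_ENDIAN);
        // Writer and reader agree on the order: prints 1
        System.out.println(decodeInt(little, ByteOrder.LITTLE_ENDIAN));
        // Reader assumes the opposite order: prints 16777216
        System.out.println(decodeInt(little, ByteOrder.BIG_ENDIAN));
    }
}
```

Note that `DataOutputStream` and `ObjectOutputStream` always write big-endian, so a mismatch of this kind would have to come from code that picks a byte order itself, for example via `ByteOrder.nativeOrder()`.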