5 Replies Latest reply on Jun 26, 2012 10:11 AM by jbertram

    Shared File System Master Slave

    amshaik

  In the following link, ActiveMQ warns about NFSv3 locking issues that can render the data in the shared file system inaccessible to the slave node. If I understand correctly, when the master node is terminated abnormally, it does not release its lock on the file system, and the slave therefore cannot acquire it.

       

      http://activemq.apache.org/shared-file-system-master-slave.html

       

  Since HornetQ also uses a shared file system, I would like to know whether it avoids this problem, and if so, how exactly it accomplishes that. The fact that ActiveMQ openly documents this issue makes me nervous, so I would like to know whether HornetQ has solved it somehow.

       

      Thanks.

        • 1. Re: Shared File System Master Slave
          jbertram

          From http://docs.jboss.org/hornetq/2.2.14.Final/user-manual/en/html/ha.html#ha.mode.shared:

          We do not recommend you use Network Attached Storage (NAS), e.g. NFS mounts to store any shared journal (NFS is slow).

           

          Aside from the fact that NFS is slow, we've seen locking issues as well. 

          • 2. Re: Shared File System Master Slave
            amshaik

            Thanks for the quick and helpful reply Justin.

             

            Could you say a bit more about what these "locking issues" actually are? Is it the exact same problem that ActiveMQ describes? I'm currently using an NFSv3 shared directory between my live and backup HornetQ servers, and I have not yet encountered any locking issues in my testing of HornetQ. I'd like to be able to reproduce some of these locking problems for testing purposes, so any details on when and where exactly the problem occurs would be very helpful.

            • 3. Re: Shared File System Master Slave
              jbertram

              It's hard to be specific about the locking issue because I received the report from a colleague.  I believe the problem was that the backup node didn't recognize the lock the live node held on the journal. The backup therefore started up completely, and two nodes were then performing I/O on the journal, which of course caused a big mess.  I'm not sure which version of NFS was involved. 
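              For context on how a shared-store lock of this kind typically works: the live server takes an exclusive lock on a file in the shared journal directory, and the backup repeatedly tries to take the same lock, activating only once it succeeds. Below is a minimal sketch using `java.nio`'s `FileLock`; the class name, lock-file name, and helper method are illustrative, not HornetQ's actual API.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class SharedStoreLock {
    // Try to take the exclusive journal lock. Returns true if this process
    // now holds it, false if another process already does.
    // (Hypothetical helper for illustration, not HornetQ's real API.)
    public static boolean tryAcquire(File lockFile) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileChannel channel = raf.getChannel();
        FileLock lock = channel.tryLock(); // null if held by another process
        if (lock == null) {
            channel.close();
            return false;
        }
        // The lock is held for the life of the process; the OS releases it
        // when the process exits, even abnormally -- on a local filesystem.
        return true;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("server", ".lock");
        System.out.println(tryAcquire(f) ? "acquired" : "held elsewhere");
    }
}
```

              On a local filesystem this works because the kernel releases the lock when the holder dies. The failure mode described above is what makes this unreliable over NFSv3: either the backup never sees the live node's lock at all, or a crashed holder's lock is never released.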

               

              I believe NFS was tested early in HornetQ's development cycle and the results of those tests prompted the documentation excerpted above.

               

              We do not have ongoing tests of HornetQ with NFS, so things may have changed for the better in later versions. However, we haven't taken any special steps to work around the problems (real or perceived) with NFS, which I believe answers your original question.

              • 4. Re: Shared File System Master Slave
                amshaik

                Thanks Justin. Does HornetQ have any alternatives besides file locking? Can the backup server ping the live server to see if it's still alive?

                • 5. Re: Shared File System Master Slave
                  jbertram

                  Does HornetQ have any alternatives besides file locking?

                  I'm not aware of any alternative.
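                  To illustrate the "ping" idea from reply 4: a backup could in principle probe the live server's acceptor over TCP. This is not a HornetQ feature (the class name, host, and port below are hypothetical), only a sketch of what such a check might look like. Note the design trade-off: a network probe alone cannot distinguish a dead server from a network partition, so acting on it risks split-brain, which is a key reason shared-store designs rely on a lock in the shared storage instead.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class LivenessProbe {
    // Returns true if a TCP connection to the given host/port succeeds
    // within timeoutMs. A backup could poll this against the live server's
    // acceptor. (Illustrative sketch only, not a HornetQ API.)
    public static boolean isLive(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused or timed out: treat as not reachable
        }
    }

    public static void main(String[] args) {
        // 5445 is shown as a typical acceptor port; adjust for your setup.
        System.out.println(isLive("localhost", 5445, 500) ? "live" : "down");
    }
}
```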