It looks like the SAN is not really shared.
The KahaDB store uses a shared file lock (a Java NIO channel lock) to ensure exclusive access to the store directory on the file system. It would be useful to validate whether this mechanism works with your SAN. Start both BrokerA and BrokerB; only one of them should get the lock and start up successfully. If both get a lock, there is a sharing/sync problem, because the two brokers are seeing different versions of the directory.
This appears to be what is happening in your test. Does the SAN volume need to be unmounted on serverA and remounted on serverB as part of failover so that the SAN state is consistent?
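If you want to check the locking behaviour of the SAN without involving the brokers, a small standalone probe along the following lines may help. This is only a sketch, not ActiveMQ's own locking code: the class name SharedLockProbe and the file name lock-probe are made up for illustration, and the directory to test is passed as an argument. It uses the same kind of lock (FileChannel.tryLock from java.nio) that the paragraph above refers to.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical probe: not part of ActiveMQ, only illustrates an exclusive NIO file lock.
public class SharedLockProbe {

    // Tries to take an exclusive lock on <dir>/lock-probe (an assumed file name).
    // Returns the lock, or null if it is already held by this or another process.
    static FileLock tryAcquire(Path dir) throws IOException {
        Path lockFile = dir.resolve("lock-probe");
        FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock(); // null when another process holds it
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            // Thrown when the lock is already held inside this same JVM.
            channel.close();
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SharedLockProbe <directory-on-SAN>");
            return;
        }
        Path dir = Paths.get(args[0]);
        FileLock lock = tryAcquire(dir);
        if (lock == null) {
            System.out.println("Lock is held elsewhere; the volume looks shared.");
            return;
        }
        System.out.println("Lock acquired. Run the probe on the other server now,");
        System.out.println("then press Enter here to release the lock.");
        System.in.read();
        lock.release();
        lock.channel().close();
    }
}
```

Run it on serverA against the store's parent directory and leave it waiting, then run it on serverB against the same path. If the second run also reports that it acquired the lock, the two servers are not seeing the same volume state, which matches the symptom described above.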
We were able to get this issue resolved.
The root cause was that we forgot to add a dependency between the SAN drive's logical name and the physical disk resource in Windows Cluster Administrator. As a result, after failover the secondary broker did not have access to the physical disk resource that held the persisted messages. We have added that dependency, and persistence now seems to work fine during failover.
Thanks for pointing us in the right direction.