Strange problem with Standalone 2.2.5 journal/pages...
didka Sep 22, 2011 2:55 PM
Hi!
We run a standalone HornetQ 2.2.5 installation on Ubuntu Server: one active server and one backup, with a static connection between them. The shared store is on a SAN, mounted on Ubuntu as ocfs2. Connections use NIO, and the journal type is AIO.
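For reference, a setup like this would typically appear in hornetq-configuration.xml roughly as follows. This is a minimal sketch, not the actual file: the directory paths, acceptor name, host and port are placeholders, and the backup server would additionally set the backup element to true.

```xml
<configuration xmlns="urn:hornetq">
   <!-- Active and backup servers use the same store on the SAN (ocfs2 mount) -->
   <shared-store>true</shared-store>

   <!-- Placeholder paths on the shared mount -->
   <paging-directory>/mnt/shared/hornetq/paging</paging-directory>
   <bindings-directory>/mnt/shared/hornetq/bindings</bindings-directory>
   <journal-directory>/mnt/shared/hornetq/journal</journal-directory>
   <large-messages-directory>/mnt/shared/hornetq/large-messages</large-messages-directory>

   <!-- AIO journal (the alternative value is NIO) -->
   <journal-type>ASYNCIO</journal-type>

   <acceptors>
      <!-- Netty acceptor using Java NIO for connections; name, host and port are placeholders -->
      <acceptor name="netty">
         <factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>
         <param key="use-nio" value="true"/>
         <param key="host" value="0.0.0.0"/>
         <param key="port" value="5445"/>
      </acceptor>
   </acceptors>
</configuration>
```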
Two instances of the same application (servlet + ssb on clustered JBoss, connecting via JCA) send 1-2K messages to one paged queue. Each instance's connection pool grows to 50 connections.
The queue has an MDB consumer (10 sessions) and a non-exclusive divert to a second queue, whose paging policy is drop. The second queue has no consumers yet.
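The queue layout described above could be expressed in hornetq-configuration.xml along these lines. It is a sketch under assumptions: the queue names (mainQueue, copyQueue), the divert name and the size limits are hypothetical and not taken from the actual configuration.

```xml
<diverts>
   <!-- Non-exclusive divert: messages stay on the first queue and a copy is forwarded -->
   <divert name="copy-divert">
      <address>jms.queue.mainQueue</address>
      <forwarding-address>jms.queue.copyQueue</forwarding-address>
      <exclusive>false</exclusive>
   </divert>
</diverts>

<address-settings>
   <!-- First queue: page to disk once the address reaches max-size-bytes -->
   <address-setting match="jms.queue.mainQueue">
      <max-size-bytes>10485760</max-size-bytes>
      <page-size-bytes>1048576</page-size-bytes>
      <address-full-policy>PAGE</address-full-policy>
   </address-setting>
   <!-- Second queue: drop new messages once the address is full -->
   <address-setting match="jms.queue.copyQueue">
      <max-size-bytes>10485760</max-size-bytes>
      <address-full-policy>DROP</address-full-policy>
   </address-setting>
</address-settings>
```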
The applications send messages at 300-700 msg/sec, and the MDB consumes them. Everything seems fine, except for the following:
The logs contain many messages like this:
[Thread-3 (group:HornetQ-scheduled-threads-22262475)] 19:21:18,874 WARNING [org.hornetq.core.transaction.impl.ResourceManagerImpl] transaction with xid XidImpl (1973652184 bq:55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.49.49.100.52 formatID:131075 gtxid:49.45.55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.49.49.56.55 timed out
Sometimes we also see messages like this:
[Thread-3 (group:HornetQ-scheduled-threads-22262475)] 19:21:18,875 SEVERE [org.hornetq.core.transaction.impl.ResourceManagerImpl] failed to timeout transaction, xid:XidImpl (2021427911 bq:55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.49.49.52.97 formatID:131075 gtxid:49.45.55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.48.102.50.51
java.lang.IllegalStateException: Transaction is in invalid state SUSPENDED
at org.hornetq.core.transaction.impl.TransactionImpl.rollback(TransactionImpl.java:345)
at org.hornetq.core.transaction.impl.ResourceManagerImpl$TxTimeoutHandler.run(ResourceManagerImpl.java:228)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
And sometimes this:
[New I/O server worker #1-34] 19:30:53,650 WARNING [org.hornetq.core.protocol.core.impl.HornetQPacketHandler] Reattach request from /192.168.168.197:53634 failed as there is no confirmationWindowSize configured, which may be ok for your system
However, the confirmation window size is configured (3 MB) on the ConnectionFactory in hornetq-jms.xml.
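For reference, a 3 MB confirmation window on a connection factory in hornetq-jms.xml would look roughly like this. It is a sketch only: the factory name, connector name and JNDI entry are placeholders, and the value 3145728 assumes that 3 MB means 3 * 1024 * 1024 bytes.

```xml
<configuration xmlns="urn:hornetq">
   <!-- Placeholder factory name -->
   <connection-factory name="NettyConnectionFactory">
      <xa>true</xa>
      <connectors>
         <!-- Must match a connector defined in hornetq-configuration.xml -->
         <connector-ref connector-name="netty"/>
      </connectors>
      <entries>
         <!-- Placeholder JNDI name -->
         <entry name="/XAConnectionFactory"/>
      </entries>
      <!-- Confirmation window in bytes; -1 disables it -->
      <confirmation-window-size>3145728</confirmation-window-size>
   </connection-factory>
</configuration>
```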
After several hours, or 1-2 days, the server gets stuck, and all consumers and producers get stuck with it. There is no failover, and restarting does not help. However, if I stop the server, clear the shared store (remove the journal and pages) and start it again, consumers and producers are able to reconnect and the system starts working again.
Could you help? What could the problem be?