ClientSessionFactoryImpl: inconsistent delay for scheduleWithFixedDelay
michael10 Jan 30, 2013 10:42 AM
We are using HornetQ version 2.2.14.
In our log files we found unexpected reconnection messages, although there were no network connection problems.
We traced the problem to the java.util.concurrent.ScheduledExecutorService (ScheduledThreadPoolExecutor) that you use in ClientSessionFactoryImpl.
It seems that the ScheduledExecutorService did not execute the ping Runnables on time (we use the default period of 30 sec.).
So the server did not see the pings in time, and after 60 sec. (connectionTTL) it closed the client connection. The client then had to reconnect, and so on.
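The timing argument can be written down as a small check. This is a simplified model, not HornetQ code: it assumes the server drops the connection once the gap between two received pings exceeds connectionTTL, and the class and method names are made up for illustration.

```java
// Hypothetical model of the ping/TTL relationship described above.
// Assumption: the server closes the connection when the gap between two
// received pings is larger than connectionTTL.
class PingTtlModel
{
   // Returns true if, in this simplified model, the connection would be closed.
   static boolean wouldExpire(final long lastPingMillis, final long nextPingMillis, final long connectionTtlMillis)
   {
      return nextPingMillis - lastPingMillis > connectionTtlMillis;
   }
}
```

With a 30 sec. ping period and a 60 sec. TTL, a ping has to arrive more than 30 sec. late before the connection is dropped in this model.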
It was only reproducible on specific systems (Windows Server 2003 SP2 Build 3790) and may be hardware specific. We are also trying the newest Java version.
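To check whether a given machine really runs scheduleWithFixedDelay tasks late, the gaps between runs can be measured directly. The following is only a diagnostic sketch; FixedDelayProbe and measureGaps are hypothetical names, not part of HornetQ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic: records the gap between consecutive runs of a
// task scheduled with scheduleWithFixedDelay.
class FixedDelayProbe
{
   // Runs a no-op task `runs` times and returns the observed gaps in milliseconds.
   // The first gap is measured from the moment the task was scheduled.
   static List<Long> measureGaps(final long delayMillis, final int runs)
   {
      final List<Long> gaps = new ArrayList<Long>();
      final long[] last = { System.nanoTime() };
      final CountDownLatch done = new CountDownLatch(runs);
      // Single worker thread, so `last` is only touched by one thread after scheduling.
      ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
      executor.scheduleWithFixedDelay(new Runnable()
      {
         public void run()
         {
            long now = System.nanoTime();
            synchronized (gaps)
            {
               // Ignore any extra run that sneaks in before shutdown.
               if (gaps.size() < runs)
               {
                  gaps.add(TimeUnit.NANOSECONDS.toMillis(now - last[0]));
               }
            }
            last[0] = now;
            done.countDown();
         }
      }, delayMillis, delayMillis, TimeUnit.MILLISECONDS);
      try
      {
         done.await();
      }
      catch (InterruptedException e)
      {
         Thread.currentThread().interrupt();
      }
      executor.shutdownNow();
      synchronized (gaps)
      {
         return new ArrayList<Long>(gaps);
      }
   }
}
```

Calling FixedDelayProbe.measureGaps(30000, 10) on the affected host and comparing the gaps with the 30 sec. ping period should show whether the executor is drifting there.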
As our bugfix we replaced the ScheduledExecutorService in the ClientSessionFactoryImpl code with Timer.schedule() and a TimerTask.
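A replacement along these lines could look roughly like the sketch below. It is not the actual patch: TimerPinger and its methods are hypothetical names, and the ping itself is passed in as a plain Runnable.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical helper showing a Timer/TimerTask based ping schedule.
class TimerPinger
{
   private Timer timer;

   // Starts pinging with a fixed delay between runs; any earlier schedule is cancelled first.
   synchronized void start(final Runnable ping, final long periodMillis)
   {
      cancel();
      timer = new Timer("ping-timer", true); // daemon thread, so it does not block JVM exit
      // Timer.schedule(task, delay, period) uses fixed-delay execution.
      timer.schedule(new TimerTask()
      {
         public void run()
         {
            ping.run();
         }
      }, periodMillis, periodMillis);
   }

   // Stops the timer thread; without this call the Timer stays alive.
   synchronized void cancel()
   {
      if (timer != null)
      {
         timer.cancel();
         timer = null;
      }
   }

   synchronized boolean isRunning()
   {
      return timer != null;
   }
}
```

Note that an unchecked exception thrown by the ping would terminate the Timer thread, so a real implementation would probably want to catch and log it inside run().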
In the method that handles connection failover you call cancelScheduledTasks() only if reconnectAttempts is not equal to 0.
Is there any reason why you don't call it if reconnectAttempts is set to 0?
We set reconnectAttempts=0 and run into trouble when cancelScheduledTasks() is not called: the Timer will not be garbage collected if the TimerTask is not cancelled.
So I added cancelScheduledTasks() to the else branch as well, and it works fine.
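The change can be illustrated with a stripped-down model of the two branches. Everything here is a hypothetical stand-in (FailoverBranchSketch, onConnectionFailure, the pingTimer field); only the placement of cancelScheduledTasks() mirrors what is described above.

```java
import java.util.Timer;

// Simplified, hypothetical model of the failover branches. The real method
// does much more; this only shows where cancelScheduledTasks() is called.
class FailoverBranchSketch
{
   private final int reconnectAttempts;
   private Timer pingTimer = new Timer("ping-timer", true);
   private boolean tasksCancelled;

   FailoverBranchSketch(final int reconnectAttempts)
   {
      this.reconnectAttempts = reconnectAttempts;
   }

   void onConnectionFailure()
   {
      if (reconnectAttempts != 0)
      {
         cancelScheduledTasks();
         // the real code reconnects the sessions here
      }
      else
      {
         // the real code destroys the connection here
         cancelScheduledTasks(); // added: also stop the ping timer when no reconnect is attempted
      }
   }

   private void cancelScheduledTasks()
   {
      if (pingTimer != null)
      {
         pingTimer.cancel(); // ends the timer thread so the Timer can be collected
         pingTimer = null;
      }
      tasksCancelled = true;
   }

   boolean isTasksCancelled()
   {
      return tasksCancelled;
   }
}
```

Since both branches now cancel the tasks, the call could equally be moved in front of the if statement.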
The original code of ClientSessionFactoryImpl:
private void failoverOrReconnect(final Object connectionID, final HornetQException me)
{
   Set<ClientSessionInternal> sessionsToClose = null;
   synchronized (failoverLock)
   {
      if (connection == null || connection.getID() != connectionID)
      {
         // We already failed over/reconnected - probably the first failure came in, all the connections were failed
         // over then a async connection exception or disconnect
         // came in for one of the already exitLoop connections, so we return true - we don't want to call the
         // listeners again
         return;
      }
      if (ClientSessionFactoryImpl.isTrace)
      {
         ClientSessionFactoryImpl.log.trace("Client Connection failed, calling failure listeners and trying to reconnect, reconnectAttempts=" + reconnectAttempts);
      }
      // We call before reconnection occurs to give the user a chance to do cleanup, like cancel messages
      callFailureListeners(me, false, false);
      // Now get locks on all channel 1s, whilst holding the failoverLock - this makes sure
      // There are either no threads executing in createSession, or one is blocking on a createSession
      // result.
      // Then interrupt the channel 1 that is blocking (could just interrupt them all)
      // Then release all channel 1 locks - this allows the createSession to exit the monitor
      // Then get all channel 1 locks again - this ensures the any createSession thread has executed the section and
      // returned all its connections to the connection manager (the code to return connections to connection manager
      // must be inside the lock
      // Then perform failover
      // Then release failoverLock
      // The other side of the bargain - during createSession:
      // The calling thread must get the failoverLock and get its' connections when this is locked.
      // While this is still locked it must then get the channel1 lock
      // It can then release the failoverLock
      // It should catch HornetQException.INTERRUPTED in the call to channel.sendBlocking
      // It should then return its connections, with channel 1 lock still held
      // It can then release the channel 1 lock, and retry (which will cause locking on failoverLock
      // until failover is complete
      if (reconnectAttempts != 0)
      {
         lockChannel1();
         final boolean needToInterrupt;
         synchronized (exitLock)
         {
            needToInterrupt = inCreateSession;
         }
         unlockChannel1();
         if (needToInterrupt)
         {
            // Forcing return all channels won't guarantee that any blocked thread will return immediately
            // So we need to wait for it
            forceReturnChannel1();
            // Now we need to make sure that the thread has actually exited and returned it's connections
            // before failover occurs
            synchronized (exitLock)
            {
               while (inCreateSession)
               {
                  try
                  {
                     exitLock.wait(5000);
                  }
                  catch (InterruptedException e)
                  {
                  }
               }
            }
         }
         // Now we absolutely know that no threads are executing in or blocked in createSession, and no
         // more will execute it until failover is complete
         // So.. do failover / reconnection
         CoreRemotingConnection oldConnection = connection;
         connection = null;
         try
         {
            connector.close();
         }
         catch (Exception ignore)
         {
         }
         cancelScheduledTasks();
         connector = null;
         reconnectSessions(oldConnection, reconnectAttempts);
         if (oldConnection != null)
         {
            oldConnection.destroy();
         }
      }
      else
      {
         CoreRemotingConnection connectionToDestory = connection;
         if (connectionToDestory != null)
         {
            connectionToDestory.destroy();
         }
         connection = null;
      }
      if (connection == null)
      {
         synchronized (sessions)
         {
            sessionsToClose = new HashSet<ClientSessionInternal>(sessions);
         }
         callFailureListeners(me, true, false);
      }
   }
   // This needs to be outside the failover lock to prevent deadlock
   if (connection != null)
   {
      callFailureListeners(me, true, true);
   }
   if (sessionsToClose != null)
   {
      // If connection is null it means we didn't succeed in failing over or reconnecting
      // so we close all the sessions, so they will throw exceptions when attempted to be used
      for (ClientSessionInternal session : sessionsToClose)
      {
         try
         {
            session.cleanUp(true);
         }
         catch (Exception e)
         {
            ClientSessionFactoryImpl.log.error("Failed to cleanup session");
         }
      }
   }
}
Greetings Michael