-
1. Re: RHQ Server 4.4.0 fails to discover that agent is up after receiving component down alert
tsegismont Dec 10, 2012 5:27 AM (in response to dfradkov)
Hi,
How do your resources appear in the RHQ server after the outage is resolved: UP or DOWN? I'm just trying to figure out whether the problem comes from the availability report or from the alert subsystem.
Is the problem occurring on all resource types or only on some?
How are your availability check intervals configured?
Thanks
-
2. Re: RHQ Server 4.4.0 fails to discover that agent is up after receiving component down alert
dfradkov Dec 10, 2012 10:04 AM (in response to tsegismont)
Hi,
>How do your resources appear in the RHQ server after the outage is resolved: UP or DOWN? I'm just trying to figure out whether the problem comes from the availability report or from the alert subsystem.
Resources appear down.
>Is the problem occurring on all resource types or only on some?
We have modified DNS entries on the RHQ server, and the problem now seems to affect only some servers; it used to affect all of them. When it happens, we have to restart the agent, and then the resource appears as up on the RHQ Dashboard.
>How are your availability check intervals configured?
Metric collection intervals vary from one minute to twenty minutes.
Thanks.
-
3. Re: RHQ Server 4.4.0 fails to discover that agent is up after receiving component down alert
tsegismont Dec 10, 2012 12:34 PM (in response to dfradkov)
>Resources appear down
OK, so the alerting subsystem may not be involved.
>We have modified DNS entries on the RHQ server and it seems that this problem now affects only some servers.
Sounds weird. Are there any particular error messages in the server or agent logs? Can you tell me precisely which resource types continue to show status "down"?
-
4. Re: RHQ Server 4.4.0 fails to discover that agent is up after receiving component down alert
dfradkov Dec 10, 2012 4:07 PM (in response to tsegismont)
Well, there are a couple of exceptions in the server log.
Partial stack traces for a couple of errors thrown in the last two days:
2012-12-09 00:33:07,470 WARN [org.hibernate.util.JDBCExceptionReporter] SQL Error: 12899, SQLState: 72000
2012-12-09 00:33:07,473 ERROR [org.hibernate.util.JDBCExceptionReporter] ORA-12899: value too large for column "RHQ"."RHQ_PACKAGE_VERSION"."LICENSE_NAME" (actual: 863, maximum: 255)
2012-12-09 00:33:07,473 WARN [org.hibernate.util.JDBCExceptionReporter] SQL Error: 12899, SQLState: 72000
2012-12-09 00:33:07,473 ERROR [org.hibernate.util.JDBCExceptionReporter] ORA-12899: value too large for column "RHQ"."RHQ_PACKAGE_VERSION"."LICENSE_NAME" (actual: 863, maximum: 255)
2012-12-09 00:33:07,473 ERROR [org.hibernate.event.def.AbstractFlushingEventListener] Could not synchronize database state with session
org.hibernate.exception.GenericJDBCException: Could not execute JDBC batch update
at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:103)
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:91)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:254)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:237)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:141)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:338)
at org.hibernate.ejb.AbstractEntityManagerImpl$1.beforeCompletion(AbstractEntityManagerImpl.java:515)
<--------------------------------------------------------snippet----------------------------------------------------------------------------------------------------->
Caused by: java.sql.BatchUpdateException: ORA-12899: value too large for column "RHQ"."RHQ_PACKAGE_VERSION"."LICENSE_NAME" (actual: 863, maximum: 255)
at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:10345)
at oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:230)
at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeBatch(CachedPreparedStatement.java:476)
at org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:774)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:247)
... 167 more
2012-12-09 22:11:20,967 WARN [org.jboss.resource.connectionmanager.JBossManagedConnectionPool] Throwable while attempting to get a new connection: null
org.jboss.resource.JBossResourceException: Could not create connection; - nested throwable: (java.sql.SQLRecoverableException: IO Error: Socket read timed out)
at org.jboss.resource.adapter.jdbc.local.LocalManagedConnectionFactory.createManagedConnection(LocalManagedConnectionFactory.java:190)
at org.jboss.resource.connectionmanager.InternalManagedConnectionPool.createConnectionEventListener(InternalManagedConnectionPool.java:619)
at org.jboss.resource.connectionmanager.InternalManagedConnectionPool.getConnection(InternalManagedConnectionPool.java:264)
at org.jboss.resource.connectionmanager.JBossManagedConnectionPool$BasePool.getConnection(JBossManagedConnectionPool.java:575)
at org.jboss.resource.connectionmanager.BaseConnectionManager2.getManagedConnection(BaseConnectionManager2.java:347)
at org.jboss.resource.connectionmanager.BaseConnectionManager2.getManagedConnection(BaseConnectionManager2.java:332)
at org.jboss.resource.connectionmanager.BaseConnectionManager2.allocateConnection(BaseConnectionManager2.java:402)
at org.jboss.resource.connectionmanager.BaseConnectionManager2$ConnectionManagerProxy.allocateConnection(BaseConnectionManager2.java:849)
at org.jboss.resource.adapter.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:89)
at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:160)
at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:112)
at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:164)
at org.quartz.impl.jdbcjobstore.JobStoreSupport.doRecoverMisfires(JobStoreSupport.java:3108)
at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.manage(JobStoreSupport.java:3887)
at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.run(JobStoreSupport.java:3907)
Caused by: java.sql.SQLRecoverableException: IO Error: Socket read timed out
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:546)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:236)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
at org.jboss.resource.adapter.jdbc.local.LocalManagedConnectionFactory.createManagedConnection(LocalManagedConnectionFactory.java:172)
... 14 more
Caused by: oracle.net.ns.NetException: Socket read timed out
at oracle.net.ns.Packet.receive(Packet.java:339)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:296)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1102)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:320)
... 19 more
>Can you tell me precisely which resource types continue to show status "down"?
Servers and server agents.
-
5. Re: RHQ Server 4.4.0 fails to discover that agent is up after receiving component down alert
tsegismont Dec 11, 2012 6:10 AM (in response to dfradkov)
These errors should not affect the availability subsystem.
Please make sure all your monitored servers have correct forward and reverse DNS mappings.
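One rough way to check this is a forward-then-reverse lookup for each monitored host. The script below is only a sketch and not part of RHQ: the function name `check_dns_mapping` is made up for illustration, and the hostnames you pass in are whatever names your agents and server use. It relies only on Python's standard `socket` module and on whatever resolver the machine running it is configured with, so run it on the RHQ server (and ideally on an agent host too) to see what each side resolves.

```python
import socket
import sys


def check_dns_mapping(hostname, forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back, and compare.

    Returns (ok, detail). The forward/reverse parameters exist only so
    the lookups can be replaced with stubs; by default the system
    resolver is used.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed for {hostname}: {exc}"
    try:
        name, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup failed for {ip}: {exc}"
    # Compare case-insensitively and ignore a trailing dot.
    known = {n.lower().rstrip(".") for n in [name, *aliases]}
    if hostname.lower().rstrip(".") in known:
        return True, f"{hostname} -> {ip} -> {name}"
    return False, f"{hostname} -> {ip} -> {name} (mismatch)"


if __name__ == "__main__":
    # Usage: python check_dns.py host1 host2 ...
    for host in sys.argv[1:]:
        ok, detail = check_dns_mapping(host)
        print("OK  " if ok else "FAIL", detail)
```

A host reported as a mismatch or a failed reverse lookup is a candidate for a DNS fix; a clean result does not by itself prove that DNS is unrelated to the availability problem.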
-
6. Re: RHQ Server 4.4.0 fails to discover that agent is up after receiving component down alert
dfradkov Dec 12, 2012 9:50 AM (in response to tsegismont)
We are planning to update all DNS entries. Hopefully that will help, as I am running out of ideas.