Automatic agent upgrade failure with secure communications
ymartin Nov 3, 2015 10:54 AM
Hello,
My RHQ server installation (no HA or cluster) has been upgraded from 4.4 to 4.13.1, but the agents now fail to upgrade themselves automatically. To diagnose the problem, I raised the log level in conf/log4j.xml and got:
2015-11-03 15:56:36,040 DEBUG [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.args-processed}Agent container has processed its command line arguments: [--daemon]
2015-11-03 15:56:36,124 INFO [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.identify-version}Version=[RHQ 4.4.0], Build Number=[516c434], Build Date=[May 7, 2012 11:05 PM]
...
2015-11-03 15:56:36,604 DEBUG [main] (enterprise.communications.command.client.ClientCommandSender)- {ClientCommandSender.added-state-listener}Added the command client sender state listener [org.rhq.enterprise.agent.AgentMain$2@3febb011]; sender is sending=[false]; notify listener immediately=[true]
2015-11-03 15:56:36,890 FATAL [RHQ Server Polling Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.agent-not-supported}This version of the agent is not supported by the server - an agent update must be applied
2015-11-03 15:56:36,893 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentUpdateThread)- {AgentUpdateThread.started}The agent update thread has started - will begin the agent auto-update now!
...
2015-11-03 15:56:37,918 DEBUG [RHQ Agent Registration Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.agent-registration-attempt}Agent will now attempt to register with the server [AgentRegistrationRequest: [name=[dev-srv1]; address=[172.20.6.14]; port=[16163]; remote-endpoint=[sslsocket://172.20.6.14:16163/?rhq.communications.connector.rhqtype=agent&numAcceptThreads=1&maxPoolSize=303&clientMaxPoolSize=304&socketTimeout=60000&enableTcpNoDelay=true&backlog=200]; regenerate-token=[false]; original-token=[<was not null>]; agent-version=[4.4.0(516c434)]]
2015-11-03 15:56:37,922 DEBUG [RHQ Agent Registration Thread] (org.rhq.enterprise.agent.SecurityTokenCommandPreprocessor)- {SecurityTokenCommandPreprocessor.no-security-token-yet}There is no security token yet - the server will not accept commands from this agent until the agent is registered.
...
2015-11-03 15:56:41,954 INFO [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.shut-down}Agent has been shut down
2015-11-03 15:56:41,954 FATAL [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.start-failure}Failed to start the agent
org.rhq.core.clientapi.server.core.AgentNotSupportedException
    at org.rhq.enterprise.agent.AgentMain.waitForServer(AgentMain.java:1611)
    at org.rhq.enterprise.agent.AgentMain.start(AgentMain.java:655)
    at org.rhq.enterprise.agent.AgentMain.main(AgentMain.java:428)
2015-11-03 15:56:41,956 DEBUG [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentUpdateVersion)- {AgentUpdateVersion.update-version-retrieval}Getting the agent update version via URL [https://rhqserver:7443/agentupdate/version]
2015-11-03 15:56:41,990 FATAL [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentUpdateThread)- {PromptCommand.update.download-failed}Failed to download the agent update binary. Cause: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2015-11-03 15:56:41,990 FATAL [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentUpdateThread)- {AgentUpdateThread.exception}The agent update thread encountered an exception: javax.net.ssl.SSLHandshakeException:sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> javax.net.ssl.SSLHandshakeException:sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> sun.security.validator.ValidatorException:PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> sun.security.provider.certpath.SunCertPathBuilderException:unable to find valid certification path to requested target
2015-11-03 15:56:41,990 FATAL [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentUpdateThread)- {AgentUpdateThread.cannot-restart-retry}The agent cannot restart after the aborted update, will try to update again in [60,000]ms
And here is the relevant content of command-trace.log:
2015-11-03 15:56:38,139 TRACE {send.initiate}==>CoreServerService.connectAgent|?
2015-11-03 15:56:38,162 TRACE {send.complete}=>>CoreServerService.connectAgent|?|failed:java.lang.reflect.InvocationTargetException:null -> org.rhq.core.clientapi.server.core.AgentNotSupportedException:Agent [dev-srv1] is an unsupported aent: 4.4.0(516c434)
Using openssl s_client -connect rhqserver:7443, I have confirmed that HTTPS port 7443 is open and serves the published certificates.
Also, I believe the keystore and truststore in the 4.4 agent's conf/ directory are configured properly, since the agent is able to query the server version.
My guess is that the self-signed server certificate is rejected when the agent downloads the update binary, because the agent does not use the configured truststore for that download. How can I diagnose this to confirm it? Is there a workaround for upgrading when secure communications are already set up?
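One way to test this hypothesis is to perform two TLS handshakes against the update URL from the log: one with the JVM's default trust settings and one trusting only the agent's truststore. The sketch below is a hypothetical standalone helper (TrustStoreCheck is not part of RHQ). It assumes the truststore is a JKS or PKCS12 file whose path and password you take from the agent's conf/ configuration, and it also prints a SHA-256 fingerprint for each truststore entry so you can compare it with the certificate that openssl s_client shows.

```java
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.MessageDigest;
import java.security.cert.Certificate;
import java.util.Collections;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical diagnostic helper, not part of RHQ.
public class TrustStoreCheck {

    // Load a truststore file of the given type ("JKS" or "PKCS12").
    static KeyStore loadTrustStore(Path path, String type, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance(type);
        try (InputStream in = Files.newInputStream(path)) {
            ks.load(in, password);
        }
        return ks;
    }

    // Build an SSLContext whose only trust anchors are the certificates in trustStore.
    static SSLContext sslContextTrusting(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    // Colon-separated uppercase SHA-256 fingerprint, the same layout as
    // "openssl x509 -noout -fingerprint -sha256".
    static String sha256Fingerprint(Certificate cert) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(cert.getEncoded());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < digest.length; i++) {
            if (i > 0) sb.append(':');
            sb.append(String.format("%02X", digest[i]));
        }
        return sb.toString();
    }

    // Open an HTTPS connection and describe the outcome instead of throwing.
    // A null ctx means "use the JVM default trust settings".
    static String probe(URL url, SSLContext ctx) {
        try {
            HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
            if (ctx != null) {
                conn.setSSLSocketFactory(ctx.getSocketFactory());
            }
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            int status = conn.getResponseCode(); // forces the TLS handshake
            conn.disconnect();
            return "handshake OK, HTTP " + status;
        } catch (Exception e) {
            return "failed: " + e;
        }
    }

    // Usage: java TrustStoreCheck <https-url> <truststore-file> <JKS|PKCS12> <password>
    public static void main(String[] args) throws Exception {
        URL url = URI.create(args[0]).toURL();
        KeyStore ts = loadTrustStore(Paths.get(args[1]), args[2], args[3].toCharArray());
        for (String alias : Collections.list(ts.aliases())) {
            Certificate c = ts.getCertificate(alias);
            System.out.println("entry " + alias + ": "
                    + (c == null ? "no certificate" : sha256Fingerprint(c)));
        }
        System.out.println("JVM default trust: " + probe(url, null));
        System.out.println("given truststore:  " + probe(url, sslContextTrusting(ts)));
    }
}
```

For example: java TrustStoreCheck https://rhqserver:7443/agentupdate/version <truststore-file> JKS <password>. If the default-trust probe fails with the same PKIX error as the agent log while the truststore probe succeeds, that supports the theory that the update download does not use the agent's truststore. A hostname mismatch (certificate name not matching rhqserver) would show up as a different error, not the PKIX one. If the download does rely on the JVM default trust settings (an assumption this check is meant to test), pointing the standard JSSE system properties javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword of the agent JVM at the same file might serve as a workaround.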
Thank you in advance for your help
Regards
Yves