First Deploy or Redeploy of bundle frequently fails - repeats are OK ("Failed to schedule... may be down")
ehle Sep 13, 2013 4:40 PM
Hello,
I was wondering if anyone else had seen this or might have some insight.
Basic scenario:
Several RH JBoss EAP6 servers running - configured as per "best practices"
2 Test RHQ (JBoss-ON 3.1.2) servers (independent, with different client sets)
All servers are using RPM client:
Release : 1.el6_3
Size : 8.0 M
1. Upload a bundle.
2. Set up Destination for deployment (one or more EAP6 Servers' deployments/<app_name> directory)
3. Deploy
"Frequently," the first attempt will fail. The RHQ/JON GUI gives a message like this when you dig into the red Exclamation points:
Deployment Requested - Success
Deployment - "Failed to schedule, agent on [Resource[id=10404, uuid=4e27d296-10ed-4920-8033-3f89699d8bd6, type={JBossAS7}JBossAS7 Standalone Server, key=/usr/share/jbossas/standalone, name=EAP (192.168.1.92:9990), parent=myusername-dev03.private.com, version=EAP 6.1.0.GA]] may be down: java.lang.reflect.UndeclaredThrowableException"
(IP#s/DNS names obfuscated for privacy)
Server Log:
2013-09-13 11:39:20,697 ERROR [org.jboss.remoting.transport.socket.SocketClientInvoker] Got marshalling exception, exiting
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe
at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1476)
<SNIP>
at java.lang.Thread.run(Thread.java:724)
Caused by: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1886)
<SNIP>
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
... 22 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
<SNIP>
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122)
... 105 more
2013-09-13 11:39:20,706 ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[schedule], targetInterfaceName=org.rhq.core.clientapi.agent.bundle.BundleAgentService}]]. Cause: java.rmi.MarshalException:Failed to communicate. Problem during marshalling/unmarshalling; nested exception is:
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe
2013-09-13 11:39:21,278 ERROR [org.jboss.remoting.transport.socket.SocketClientInvoker] Got marshalling exception, exiting
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe
at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1476)
<SNIP>
at java.lang.Thread.run(Thread.java:724)
Caused by: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
<SNIP>
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
... 22 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
<SNIP>
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122)
... 105 more
2013-09-13 11:39:21,283 ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[schedule], targetInterfaceName=org.rhq.core.clientapi.agent.bundle.BundleAgentService}]]. Cause: java.rmi.MarshalException:Failed to communicate. Problem during marshalling/unmarshalling; nested exception is:
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe -> javax.net.ssl.SSLException:Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe -> javax.net.ssl.SSLException:java.net.SocketException: Broken pipe -> java.net.SocketException:Broken pipe. Cause: java.rmi.MarshalException: Failed to communicate. Problem during marshalling/unmarshalling; nested exception is:
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe
Agent (Debug Logging) doesn't mention the deploy attempt, only a closed socket:
2013-09-13 11:38:57,999 DEBUG [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Built availability report for [0] resources with a size of [261] bytes in [256]ms
2013-09-13 11:39:05,919 DEBUG [WorkerThread#0[<RHQ.SERVER.IP.#>:50822]] (jboss.remoting.transport.socket.ServerThread)- WorkerThread#0[146.139.115.113:50822] closing socketWrapper: ServerSocketWrapper[5f8aff2d[TLS_DHE_DSS_WITH_AES_256_CBC_SHA: Socket[addr=/146.139.115.113,port=50822,localport=16163]].5f8aff2d]
2013-09-13 11:39:05,919 DEBUG [WorkerThread#0[<RHQ.SERVER.IP.#>:50822]] (jboss.remoting.transport.socket.ServerSocketWrapper)- wrote CLOSING
2013-09-13 11:39:05,920 DEBUG [WorkerThread#0[<RHQ.SERVER.IP.#>:50822]] (jboss.remoting.transport.socket.SocketWrapper)- ServerSocketWrapper[5f8aff2d[TLS_DHE_DSS_WITH_AES_256_CBC_SHA: Socket[addr=/146.139.115.113,port=50822,localport=16163]].5f8aff2d] closing
2013-09-13 11:39:26,537 DEBUG [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementSenderRunner)- Measurement report contains no data - not sending to Server.
2013-09-13 11:39:26,599 DEBUG [EventManager.poller-2] (core.pluginapi.event.log.LogFileEventPoller)- /usr/share/jbossas/standalone/log/server.log: 36708 new bytes
If you try again immediately, everything works.
The problem is most common after the client/server has been sitting idle for a while. Once a deploy succeeds, you can redeploy dozens of times without an issue.
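For reference, the manual workaround described above amounts to a single immediate retry. A minimal sketch of that pattern in Python, assuming a hypothetical `action` callable that stands in for whatever triggers the bundle deployment (it is not an RHQ/JON API, and the names here are made up for illustration):

```python
import time


def call_with_retry(action, attempts=2, delay_seconds=0.0, retry_on=(OSError,)):
    """Call action(); if it raises one of `retry_on`, try again.

    `action` is a hypothetical stand-in for the deploy trigger.
    `attempts` is the total number of tries, so 2 means one retry.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as exc:
            last_error = exc
            # Only pause if another attempt is still to come.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

This only hides the symptom. If the first failure really comes from a stale connection, the retry succeeds because a fresh connection gets opened, which matches what I see when retrying by hand.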
When deploying to a group of EAP6 systems, I first observed that the first system JON/RHQ tried to deploy to would fail but the second would succeed.
On a hunch I gave the JVM more memory, and the problem went away for a while (after the server restart...), but when it came back both would usually fail.
Whether or not you choose a "Clean" deploy doesn't seem to have an impact.
I have a feeling this might be a timeout of some sort - that the client isn't responding fast enough, or the server side is not waiting long enough for the connection to be established, but I can't prove it.
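One test I could run to check the idle/timeout theory is a raw TCP probe: open a connection, leave it idle for a chosen time, then write to it and see whether the write fails. A rough sketch in Python follows. It assumes nothing about RHQ itself: the function name and the single-byte payload are arbitrary choices, and it does not speak the agent's TLS/remoting protocol. It can therefore only show whether something on the path (firewall, NAT, or the peer) drops idle connections, and is best pointed at a harmless listening port on the same network path.

```python
import socket
import time


def probe_idle_connection(host, port, idle_seconds, payload=b"\x00"):
    """Connect, stay idle, then write twice; return (ok, detail).

    Two writes are used because the first write on a dead connection
    often appears to succeed; the error (for example "Broken pipe")
    tends to show up on the next write.
    """
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            time.sleep(idle_seconds)   # simulate the idle period
            sock.sendall(payload)      # first write after idling
            time.sleep(0.5)            # give a reset time to arrive
            sock.sendall(payload)      # second write exposes a dead peer
    except OSError as exc:
        return False, repr(exc)
    return True, "ok"
```

Running it with increasing `idle_seconds` (say 60, 300, 900) and noting where it starts returning `False` would give an estimate of the idle cutoff, if there is one.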
If you could shed some light on the matter, it would be much appreciated - or even suggestions for tests to run.
Thanks much!
David.