Jopr 2.2 server crashing sporadically
fbrueseke Jun 8, 2009 7:43 AM
Hi Jopr experts.
I am currently load testing my measurement program with Jopr at the very heart of it. After about 5.5 hours of operation with only the usual errors, the server crashes. Shortly before the crash I get a strange error message; the server keeps running for a little while and then goes down. The following are the last lines in the "rhq-server-log4j.log" logfile:
2009-06-05 19:22:56,246 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Purging availablities that are older than Thu Jun 05 19:22:56 CEST 2008
2009-06-05 19:22:58,121 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Availability data purged [0] - completed in [1875]ms
2009-06-05 19:22:58,121 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Database maintenance starting at Fri Jun 05 19:22:58 CEST 2009
2009-06-05 19:22:58,121 INFO [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] Performing hourly database maintenance
2009-06-05 19:25:09,701 ERROR [org.jboss.remoting.transport.socket.ServerThread] Worker thread initialization failure
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
    at org.jboss.remoting.transport.socket.ServerThread.createServerSocketWrapper(ServerThread.java:706)
    at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:364)
    at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:165)
Caused by: java.net.SocketException: Software caused connection abort: recv failed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
    at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2213)
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2226)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2694)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:761)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:277)
    at org.jboss.remoting.loading.ObjectInputStreamWithClassLoader.<init>(ObjectInputStreamWithClassLoader.java:95)
    at org.jboss.remoting.serialization.impl.java.JavaSerializationManager.createInput(JavaSerializationManager.java:54)
    at org.jboss.remoting.marshal.serializable.SerializableUnMarshaller.getMarshallingStream(SerializableUnMarshaller.java:72)
    at org.jboss.remoting.marshal.serializable.SerializableUnMarshaller.getMarshallingStream(SerializableUnMarshaller.java:55)
    at org.jboss.remoting.transport.socket.ClientSocketWrapper.createInputStream(ClientSocketWrapper.java:185)
    at org.jboss.remoting.transport.socket.ClientSocketWrapper.createStreams(ClientSocketWrapper.java:164)
    at org.jboss.remoting.transport.socket.ClientSocketWrapper.<init>(ClientSocketWrapper.java:66)
    at org.jboss.remoting.transport.socket.ServerSocketWrapper.<init>(ServerSocketWrapper.java:46)
    ... 7 more
2009-06-05 19:25:55,571 WARN [org.rhq.enterprise.server.core.AgentManagerBean] Have not heard from agent [PADFBRUESEKE1.pad.orga-systems.net] since [Fri Jun 05 19:23:50 CEST 2009]. Will be backfilled since we suspect it is down
2009-06-05 19:26:22,694 INFO [org.rhq.enterprise.server.core.CoreServerServiceImpl] Agent [PADFBRUESEKE1.pad.orga-systems.net][1.2.0(3872)] would like to connect to this server
2009-06-05 19:27:09,236 INFO [org.rhq.enterprise.server.core.CoreServerServiceImpl] Agent [PADFBRUESEKE1.pad.orga-systems.net] has connected to this server at Fri Jun 05 19:27:09 CEST 2009
2009-06-05 19:27:28,906 INFO [org.rhq.enterprise.server.measurement.MeasurementServerServiceImpl] Performance: measurement merge [20] timing (19467)ms
2009-06-05 19:27:29,499 INFO [org.rhq.enterprise.server.measurement.MeasurementServerServiceImpl] Performance: measurement merge [623] timing (20060)ms
I am not sure whether this exception is related to the server crash. I can also see from my load test results that the system load was unusually high at that time. The measured system's throughput was exceptionally low for about 15 minutes (i.e. in three five-minute intervals), and it went right back up once the Jopr server was down.
I have experienced this behaviour a few times now and have no explanation whatsoever. So, what log configuration changes would you recommend to shed some light on this issue?
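To line the log up with the load test intervals, one option is to look for long silences between consecutive log entries, such as the roughly two minutes between the database maintenance message and the ERROR entry. The following is only a minimal sketch, assuming the timestamp and level layout seen in the excerpt; the function name find_gaps and the 60-second threshold are my own choices for illustration and are not part of Jopr.

```python
import re
from datetime import datetime, timedelta

# Entry header as it appears in the excerpt: "2009-06-05 19:22:56,246 INFO ..."
# (layout assumed from the excerpt; adjust if the log4j pattern differs)
STAMP = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?:DEBUG|INFO|WARN|ERROR|FATAL)\b"
)

def find_gaps(text, threshold=timedelta(seconds=60)):
    """Return (previous, current, gap) for consecutive log entries
    whose timestamps are further apart than `threshold`."""
    stamps = [
        datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        for m in STAMP.finditer(text)
    ]
    return [(a, b, b - a) for a, b in zip(stamps, stamps[1:]) if b - a > threshold]

# Three entry headers taken from the excerpt, messages shortened.
sample = (
    "2009-06-05 19:22:58,121 INFO [DataPurgeJob] Performing hourly database maintenance\n"
    "2009-06-05 19:25:09,701 ERROR [ServerThread] Worker thread initialization failure\n"
    "2009-06-05 19:25:55,571 WARN [AgentManagerBean] Have not heard from agent\n"
)

for previous, current, gap in find_gaps(sample):
    print(f"silence of {gap} between {previous} and {current}")
```

Because the scan uses a regular expression rather than line splitting, it would also work on a log whose line breaks were lost when pasting. To run it against the real file, replace `sample` with the contents of "rhq-server-log4j.log".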
Kind regards
Frank