So are you thinking about a service that generates thread dumps (or tweaking the existing one), with configuration options to control where the dumps go? With hooks to inject the service into other interested services, e.g. HAPartition? HAPartition would then decide whether an event (e.g. a timeout on an RPC) justifies calling into the thread dump service.
In the case of an RPC timeout, only the caller knows it happened.
So are you thinking about a service that generates thread dumps (or tweaking the existing one), with configuration options to control where the dumps go?
Something along those lines, but I was thinking more of a service that lives in the management console, responds to specific JMX events, and then requests generation of thread dumps. It could be a standard service in AS, but it seems to fit the monitoring area better.
This service should be able to instruct not only the local node but other cluster members to generate thread dumps as well. For example, if the TimeoutException (TE) happened during a sync replication or a sync JGroups RPC call, request that the node where the sync repl/RPC failed generate a thread dump.
Information that would be needed:
- timestamp (kill -3 does not provide a timestamp for the thread dump!)
- thread dump
- some kind of unique id shared by all the thread dumps, on all nodes, that were generated for a specific failure.
- some information to match the thread dump(s) to the failure in the logs.
The aim is for someone to be able to say something like: "Machine A reported a TE (with suspected=false), and these are the thread dumps taken immediately afterwards on Machines B, C and D in the cluster that are associated with this TE. I already have the GC logs in case the TE was due to a long garbage collection."
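As a sketch of the record each node would hand back, the fields listed above could be bundled like this. This is purely a hypothetical illustration (the class and field names are mine, not an existing AS API); on Java 6+, ThreadMXBean.dumpAllThreads can supply the dump text with lock info included:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Hypothetical sketch, not an existing JBoss service: one node's thread
// dump plus the metadata listed above (timestamp, shared failure id, origin).
public class ClusterThreadDump {
    private final long timestamp = System.currentTimeMillis(); // kill -3 omits this
    private final String failureId; // same id on every node's dump for one failure
    private final String nodeName;  // which cluster member produced the dump
    private final String dump;      // the thread dump text itself

    public ClusterThreadDump(String failureId, String nodeName) {
        this.failureId = failureId;
        this.nodeName = nodeName;
        StringBuilder sb = new StringBuilder();
        // Java 6+: ask for locked monitors and locked synchronizers too
        for (ThreadInfo ti : ManagementFactory.getThreadMXBean()
                                              .dumpAllThreads(true, true)) {
            sb.append(ti);
        }
        this.dump = sb.toString();
    }

    public long getTimestamp() { return timestamp; }
    public String getFailureId() { return failureId; }
    public String getNodeName() { return nodeName; }
    public String getDump() { return dump; }
}
```

The failure id would be generated once by the node that detected the TE and passed along with the dump request, so all resulting dumps can be matched up later.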
With hooks to inject the service into other interested services, e.g. HAPartition? HAPartition would then decide whether an event (e.g. a timeout on an RPC) justifies calling into the thread dump service.
In the case of an RPC timeout, only the caller knows it happened.
That could work as well. My comment above on JMX notifications amounts to pretty much this: HAPartition generates a JMX notification upon an RPC timeout, and a service in the monitoring tool does the job.
Where do you think such a service would fit better?
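To make the notification idea concrete, here is a minimal sketch of the monitoring-side listener. The notification type "rpc.timeout" is a made-up name for illustration, not something HAPartition emits today:

```java
import javax.management.Notification;
import javax.management.NotificationListener;

// Hypothetical sketch: a monitoring-side JMX listener that reacts to a
// made-up "rpc.timeout" notification type by kicking off dump collection.
public class TimeoutDumpListener implements NotificationListener {

    // Decides whether a notification should trigger cluster-wide dumps.
    static boolean isDumpTrigger(Notification n) {
        return "rpc.timeout".equals(n.getType());
    }

    public void handleNotification(Notification n, Object handback) {
        if (isDumpTrigger(n)) {
            // Here: assign a shared failure id and request thread dumps
            // from the local node and the remote members involved.
            System.out.println("Requesting cluster thread dumps: " + n.getMessage());
        }
    }
}
```

The listener itself stays dumb; each trigger type (RPC timeout, repl sync failure, shutdown timeout) would just be another notification type it knows about.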
One thing I need to get to the bottom of regarding thread dumps is that Thread.getAllStackTraces() does not provide the same information as kill -3. Some lock information seems to be missing from Thread.getAllStackTraces(), which is why I recommend against the JMX method for generating stack traces.
A couple of things on Thread.getAllStackTraces():
It leaks :). Fixed in 6u1
This function returns the information provided by Thread.getAllStackTraces(). This does not include the light weight thread id, which is important for troubleshooting hung and spinning processes; therefore, it is recommended to use the methods described above to obtain the stack trace if possible.
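For comparison, here is a small sketch of the two in-process approaches side by side. Thread.getAllStackTraces() only ever returns StackTraceElements, so lock information is structurally unavailable; on Java 6+, ThreadMXBean.dumpAllThreads(true, true) does report locked monitors:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;

// Sketch: contrast the two in-process ways of dumping all threads.
public class LockInfoDemo {

    // Thread.getAllStackTraces(): stack frames only, no monitor/lock info.
    public static String dumpWithoutLocks() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Java 6+ ThreadMXBean: can include locked monitors and synchronizers.
    public static String dumpWithLocks() {
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo ti : ManagementFactory.getThreadMXBean()
                                              .dumpAllThreads(true, true)) {
            sb.append(ti); // toString() prints "- locked ..." lines when present
        }
        return sb.toString();
    }
}
```

Note that ThreadInfo.toString() truncates long stacks, so even this is not a byte-for-byte replacement for kill -3 output.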
There are four ways to generate a thread dump. I have taken the output from the same run with each method:
1.- kill -3. Can be parsed by tools like https://tda.dev.java.net/ (simply, one of the most useful tools I've found out there!). Example:
"JBossLifeThread" prio=1 tid=0x0a690cc0 nid=0x1741 in Object.wait() [0x867f2000..0x867f30b0] at java.lang.Object.wait(Native Method) - waiting on <0x9247e200> (a java.lang.Object) at java.lang.Object.wait(Object.java:474) at org.jboss.system.server.ServerImpl$LifeThread.run(ServerImpl.java:940) - locked <0x9247e200> (a java.lang.Object)
2.- JMX via org.jboss.system.server.ServerInfo.listThreadDump(). Is it the same as what Thread.getAllStackTraces() would do? I need a JIRA to get to the bottom of this. Here's some output:
Thread: JBossLifeThread : priority:5, demon:false, threadId:58, threadState:WAITING, lockName:java.lang.Object@10cff6b
	java.lang.Object.wait(Native Method)
	java.lang.Object.wait(Object.java:474)
	org.jboss.system.server.ServerImpl$LifeThread.run(ServerImpl.java:940)
The lock name is different from the lock monitor obtained via kill -3, and it doesn't look like it prints previously locked monitors either. It cannot be parsed by TDA because it's in a JBoss-specific format. Useless.
3.- Haven't tested the output from this. What would this print? I couldn't find an MBean that uses/parses the output from this, but I suspect it's the same as method 2, especially since http://jira.jboss.com/jira/browse/JBAS-1448 has already been fixed.
4.- JConsole Threads view:
Name: JBossLifeThread
State: WAITING on java.lang.Object@10cff6b
Total blocked: 0  Total waited: 1
Stack trace:
	java.lang.Object.wait(Native Method)
	java.lang.Object.wait(Object.java:474)
	org.jboss.system.server.ServerImpl$LifeThread.run(ServerImpl.java:940)
It seems to print the same info as JBoss's ServerInfo.
This has been bugging me for ages; I had to get to the bottom of it :). So far, there's nothing like "killing" JBoss.
I've just realised that I never created a JIRA for a service that would request thread dumps from 1..n nodes upon some trigger or event.
A couple of weeks back I saw this JIRA, which seems to attempt a similar thing: get a thread dump when the server shutdown times out.
The idea is basically the same, except that the triggers are different. My initial motivation is members that didn't respond to a sync call, whether an AS cluster-wide RPC or JBC's sync replication of data.
To get a stack trace of all nodes in a cluster, you can use JGroups' probe:
http://www.jboss.org/community/docs/DOC-11689 discusses how probe can be extended, e.g. to add custom functionality.
The downside is that you get back a string; there is currently no API to access the results. This is planned though.
Probe's dump would indeed be really handy for retrieving the thread dumps across the cluster, but it would be called from some Java logic (a notification consumer for each type of situation in which to generate a thread dump), and hence I would need to call the JGroups API to get the same output that "probe.sh dump" returns.
Also, I would expect all the thread dumps from problem X to be dumped locally, and then someone would inspect them with TDA or Samurai, so a string sounds good to me. Note that it would help if the thread dump string were formatted the same way the JVM formats it (kill -3), as this would aid compatibility with tools that can already read this type of dump.
Formatting is the same as the JVM dumps.
Regarding the API: the benefit here would be that you can drop responses from different clusters. Well, you can do that too with probe.sh, but you'd need a little help from sed/awk/cut and friends.
Hmmm, I don't understand this very well. I need to test it, but ideally whatever API you call to get thread dumps for all the nodes in the cluster would return, in an array or similar, each individual thread dump together with information about which node it came from.
The idea here is that the problem, plus each individual node's thread dumps, are logged somewhere. Obviously, if the same problem happens many times in a row, you'd only log it once; otherwise you could end up swamping the server's HD.
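That "log it once" behaviour could be as simple as a per-failure-key throttle. A hypothetical sketch (class and method names are mine):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: write a given failure's dumps at most once per time
// window, so a repeating problem doesn't swamp the server's disk.
public class DumpThrottle {
    private final Map<String, Long> lastLogged = new ConcurrentHashMap<String, Long>();
    private final long windowMillis;

    public DumpThrottle(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if dumps for this failure key should be written now. */
    public boolean shouldLog(String failureKey, long now) {
        Long last = lastLogged.get(failureKey);
        if (last != null && now - last < windowMillis) {
            return false; // same problem seen recently; skip this occurrence
        }
        lastLogged.put(failureKey, now);
        return true;
    }
}
```

The failure key could be something like the trigger type plus the unresponsive member's address, so distinct problems still each get logged once.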
I'm not sure we'd want thread dump responses from different clusters at this point, unless there was some kind of inter-cluster sync RPC call or replication.