-
15. Re: Strange classloading behavior -- thread stuck
adrian.brock Feb 3, 2009 10:34 AM (in response to brian.stansberry)"adrian@jboss.org" wrote:
"adrian@jboss.org" wrote:
The code is the same as JBoss4.x
There is one difference between 4.x and 5.x, which is the 4.x code won't loop forever
since it uses a for loopJBoss 4.x int size = taskList != null ? taskList.size() : 0; synchronized( taskList ) { for(int i = 0; i < size; i ++) JBoss5.x synchronized (taskList) { while (taskList.isEmpty() == false) {
But using a for loop would just hide the problem.
The thread on Trace2 still wouldn't be woken up because the other thread wouldn't
do the notification on its task list (it would still assign the task to the wrong thread).
One horrible thought is that JBoss4.x has the same problem but we haven't seen it
as clearly before because of the for loop. ;-( -
16. Re: Strange classloading behavior -- thread stuck
brian.stansberry Feb 3, 2009 10:58 AM (in response to brian.stansberry)Just an FYI: Bela's likely out for the rest of the day. I'm trying to reproduce w/ TRACE logging now, but so far w/o success. If I can figure out the secret recipe to reproduce, I'll switch the jboss-classloader.jar and see what happens.
-
17. Re: Strange classloading behavior -- thread stuck
adrian.brock Feb 3, 2009 2:01 PM (in response to brian.stansberry)I think I know what the problem is, but if I'm correct it will be very difficult to
reproduce the problem.
If you look at the old code in JBoss4, LoadMgr3 in nextTask()
when releaseInNextTask == true just removes the lock on the classloader
and unassigns the current thread as the owner of the classloader.
The important part is that it doesn't try to reschedule any outstanding tasks
back to their original threads. That only occurs in endLoadTask().if( threadTask.releaseInNextTask == true ) { if( trace ) log.trace("Releasing loadLock and ownership of UCL: "+threadTask.ucl); synchronized( registrationLock ) { loadClassThreads.remove(threadTask.ucl); } synchronized( threadTask.ucl ) { ucl3.release(); ucl3.notifyAll(); } }
While in JBoss5, it does:if (threadTask.isReleaseInNextTask()) threadTask.getClassLoader().unlock();
The classloader unlock() will always try to reassign the outstanding tasks
when it invokes ClassLoaderManager.unregisterLoaderThread()
so it is reassigning tasks in both places. -
18. Re: Strange classloading behavior -- thread stuck
adrian.brock Feb 3, 2009 2:09 PM (in response to brian.stansberry)"adrian@jboss.org" wrote:
I think I know what the problem is, but if I'm correct it will be very difficult to reproduce the problem.
The reason I think it will be difficult to reproduce the problem is because
I think this only occurs when you hit the ClassCircularity bug.
The releaseInNextTask == true means that it got a temporary lock on the classloader.
Its not the original classloader used to initiate the classloading attempt.
This can only occur if the thread owns the classloader so it should in normal circumstances be able to complete the task, release the classloader and reschedule back the other requests.
If however, it hits the ClassCircularity problem (see the catch block just above the unlock), it will reschedule that task. So when it does the unlock, it will loop
on the reassignment of the task back to the same thread. -
19. Re: Strange classloading behavior -- thread stuck
brian.stansberry Feb 3, 2009 2:21 PM (in response to brian.stansberry)"adrian@jboss.org" wrote:
The reason I think it will be difficult to reproduce the problem is because I think this only occurs when you hit the ClassCircularity bug.
Perhaps the mod_cluster objects being deserialized trigger the ClassCircularity bug? That may explain why Bela is seeing this with mod_cluster while we haven't gotten a ton of reports on it for AS 5.
Still trying to reproduce... Bela said it wasn't that hard for him to reproduce by stumbling into it, so if I can't there's a reasonable chance he'll be able to. -
20. Re: Strange classloading behavior -- thread stuck
adrian.brock Feb 3, 2009 2:40 PM (in response to brian.stansberry)"bstansberry@jboss.com" wrote:
"adrian@jboss.org" wrote:
The reason I think it will be difficult to reproduce the problem is because I think this only occurs when you hit the ClassCircularity bug.
Perhaps the mod_cluster objects being deserialized trigger the ClassCircularity bug? That may explain why Bela is seeing this with mod_cluster while we haven't gotten a ton of reports on it for AS 5.
It also depends which version of the JDK you use. e.g. Sun claim it is fixed from
1.5.0-b08 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=2139628 -
21. Re: Strange classloading behavior -- thread stuck
adrian.brock Feb 3, 2009 2:46 PM (in response to brian.stansberry)I've uploaded a second snapshot which includes a fix for the reassignment problem.
http://snapshots.jboss.org/maven2/org/jboss/cl/jboss-classloader/2.0.3-SNAPSHOT/
i.e. this part of the code should now be back to what it was in JBoss-4.x
But Ales says he is going to do the 2.0.3.GA release tonight anway.
I'm going to close JBCL-81 since the reassignment problem has all the symptoms
of the original problem and it is consistent with the stacktraces posted before.
An extra check would be to get a TRACE log (if you reproduce the problem)
that has the ClassCircularityError or LinkageError. We only log it as an error
if it fails after 10 attempts. :-) -
22. Re: Strange classloading behavior -- thread stuck
brian.stansberry Feb 3, 2009 2:53 PM (in response to brian.stansberry)OK.
Note that Bela was using some JDK 6 release; you can tell from the "java.lang.Thread.State: RUNNABLE" in the thread dumps. -
23. Re: Strange classloading behavior -- thread stuck
adrian.brock Feb 3, 2009 3:08 PM (in response to brian.stansberry)"bstansberry@jboss.com" wrote:
OK.
Note that Bela was using some JDK 6 release; you can tell from the "java.lang.Thread.State: RUNNABLE" in the thread dumps.
Notice I said "claim" to have fixed it. :-)
I remember still seeing it on JDK 6 with one of our classloading tests, I forwarded
it to one of the guys at Sun working on the concurrent classloading in JDK 7
but I never heard back from him. Maybe it got fixed in a later release?
The changes in JDK 7 (if they happen) will make all this nonsense unnecessary! -
24. Re: Strange classloading behavior -- thread stuck
brian.stansberry Feb 3, 2009 7:03 PM (in response to brian.stansberry)Any reason why the category for the logger in org.jboss.classloader.spi.base.ClassLoaderManager is
private static Logger log = Logger.getLogger("org.jboss.detailed.classloader.ClassLoaderManager");
which means enabling TRACE for org.jboss.classloader doesn't pick up output from this class. :-(
I'd just change it but the hardcoded String category makes me think there's a 0.5% chance there's a reason for that, so I'm asking first. ;-)
Anyway, good news is I can reproduce this now; will fix my jboss-log4j.xml to work around above issue and will have logs showing what's going on.
Then I'll try the 2.0.3-SNAPSHOT. -
25. Re: Strange classloading behavior -- thread stuck
brian.stansberry Feb 3, 2009 11:40 PM (in response to brian.stansberry)I was able to induce the classloading loop quite easily using JDK 6 (u12) a stock 5.0.0.GA. But with the 2.0.3-SNAPSHOT libraries I've been trying for over half an hour without success.
So, something you did helped. :-)
More mysterious is exactly what caused the problem. I have logs from a server that failed and grepped for "Circular" or "LinkageError" and saw nothing. The logs are quite complex so it would probably take hours of detail analysis to get anything out of them. I attached them to JBCL-81 just so they are archived somewhere in case they prove useful.
I believe a bug I found today (https://jira.jboss.org/jira/browse/MODCLUSTER-47) was relevant here. It basically resulted in one node continually sending an RPC to another, with a complex object passed as a param and a complex object returned. So, fair bit of classloading calls involved. This RPC happens about 20 times a second. Failure seemed to occur when you made an AJP request to the server while this was going on; the thread handling the AJP request ends up in the state shown in my 3rd post on this thread; the JGroups thread ends up in the tight loop.
The continual RPC issue is why the log attached to the JIRA is so large. -
26. Re: Strange classloading behavior -- thread stuck
belaban Feb 4, 2009 2:40 AM (in response to brian.stansberry)I downloaded Adrian's snapshot and I wasn't able to reproduce the class loading issue.
However, I ran into hangs when shutting down an instance. I think this is related to MODCLUSTER-47, so it is not relevant to this forum. I'll continue in MODCLUSTER-47.
Thanks ! -
27. Re: Strange classloading behavior -- thread stuck
adrian.brock Feb 4, 2009 8:10 AM (in response to brian.stansberry)"bstansberry@jboss.com" wrote:
Any reason why the category for the logger in org.jboss.classloader.spi.base.ClassLoaderManager isprivate static Logger log = Logger.getLogger("org.jboss.detailed.classloader.ClassLoaderManager");
which means enabling TRACE for org.jboss.classloader doesn't pick up output from this class. :-(
Sorry about that. I did it deliberately (but forgot I did it)
because I wanted to keep Scott's logging for the
thread scehduling in the code, but not normally show it.
As you found, this thread scheduling logging tends to swamp the main
classloader logging making it hard to read for normal classloader debugging
e.g. why didn't it load that class from where I expected? -
28. Re: Strange classloading behavior -- thread stuck
brian.stansberry Feb 4, 2009 9:28 AM (in response to brian.stansberry)No problem. :-)
The ClassLoadingTask log category had I typo:protected static Logger log = Logger.getLogger("org.jboss.detailed.classloader.ClassLoadinTask");
Missing "g" in "Loading". I just fixed that, r83853. To which I just a just realized I didn't add a commit message. :( Better get some coffee. -
29. Re: Strange classloading behavior -- thread stuck
adrian.brock Feb 4, 2009 9:39 AM (in response to brian.stansberry)"bstansberry@jboss.com" wrote:
I was able to induce the classloading loop quite easily using JDK 6 (u12) a stock 5.0.0.GA. But with the 2.0.3-SNAPSHOT libraries I've been trying for over half an hour without success.
So, something you did helped. :-)
More mysterious is exactly what caused the problem. I have logs from a server that failed and grepped for "Circular" or "LinkageError" and saw nothing. The logs are quite complex so it would probably take hours of detail analysis to get anything out of them. I attached them to JBCL-81 just so they are archived somewhere in case they prove useful.
I've been examing your log and I think I understand another way this can happen.
My fix is also correct in that case as well, which is why it helped.
Its failry complicated so bare with me (I'll try and simplify it to the bare minimum :-).
Step 1
Thread1: gets a lock on ClassLoader A to do a normal classloading request on ClassA
Step 2
Thread1: to load ClassA it also needs to load ClassB from ClassLoader B
Thread1: aquires the lock - but hasn't scheduled the task yet
Step 3
Thread2: wants to load ClassC from ClassLoaderA and schedules against Thread1
to do that since it owns the lock
Step 4
Thread1: schedules the load ClassB, but it is behind ClassC
Step 5
Thread1: does the load on ClassC but this requires to load ClassD from ClassLoaderA
(note Thread1 is now the requestingThread even though the requestingThread for the
initial load of ClassC is Thread2 - this is the importan part)
Step 6
Thread1: Schedules the load of ClassD
Step 7
Thread1: does the load of ClassB and then tries to reassign the tasks.
But there is still the load of ClassD for the same thread in the list from step (5)
So it loops trying to reassign to itself.
As I noted in step (5) the case I'd missed when trying to determine how it could happen
is when you get a recursive classloading request. Although the load of ClassC
belongs to Thread2, its knock on effect to load ClassD actually belongs to Thread1
So the ClassCircularity bug is not required to reproduce the problem,
just this "jumping the queue" or more correctly, interleaving of recursive requests
that get executed on the same thread.