-
1. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
starksm64 Dec 8, 2005 11:10 PM (in response to brian.stansberry)From my "@todo there is still too much synrhonization on keyListeners", I really think the synchronization on the keyListeners is just unnecessary since the change to the use of the ConcurrentReaderHashMap. It can stay on the registerListener/unregisterListener because of the non-atomic step of having to add/remove an ArrayList, but the synchronization in notifyKeyListeners is just uneeded. So what if a listener is added or removed during dispatch?
You can make a copy as you indicate and that is certainly better than what exists now. All it really does is shorten the period over which list changes may affect the targets of the current notification, and is really not needed in my view.
The other issue that we have cleaned up in other uses is that we should not be doing any dispatches into arbitrary layers from the jgroups notification thread. The same should be done here. Notifications should be ocurring from another thread so that the jgroups stack is not blocked. -
2. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
brian.stansberry Dec 9, 2005 12:31 AM (in response to brian.stansberry)Are you aware of any intent to use the synchronized block in notifyKeyListeners to serialize calls to the method? Or was it just there to coordinate access to the collection?
-
3. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
starksm64 Dec 9, 2005 12:57 AM (in response to brian.stansberry)I don't know of that intent, but its a possibility. Such serialization cannot be done from the jgroups callback though.
-
4. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
belaban Dec 9, 2005 4:14 AM (in response to brian.stansberry)I have a feeling Ovidiu or Scott fixed this (by removing the synchronization when iterating through the listeners) some time ago, are you looking at the latest code in head ?
We could use a CopyOnWriteArray or ConcurrentReaderHashmap, then synchronization could go -
5. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
starksm64 Dec 9, 2005 12:00 PM (in response to brian.stansberry)No, its still there. Some synchronization was removed, but I only removed the minimum required for that particular deadlock. Now its time to completely fix this.
-
6. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
brian.stansberry Dec 9, 2005 1:36 PM (in response to brian.stansberry)OK, without objection here's what I plan to do. (This approach is colored by a desire to get fixes in 4.0.4 but limited bandwidth before the release :)
1) Punt on the issue of whether or not calls through notifyKeyListeners() need to be serialized. The effect of the current code is to only allow a single thread to flow through the method; I'm going to make the method synchronized to keep that behavior. Add a big fat TODO comment to reconsider this. Personally I doubt calls through the method need to be serialized.
2) Remove synchronization on the keyListeners collection in notifyKeyListeners(). This fixes JBAS-2539.
3) Have all calls to notifyKeyListeners() that originate from the JGroups thread go through an async handler thread. These calls are easy to isolate, as they come from special purpose methods _add() and _remove().
Another option would be to have *all* calls to notifyKeyListeners go through the async handler thread and serialize them via the queue that thread processes, but then I'm adding asynchronous behavior to local calls, and right now don't have the bandwidth to properly consider implications of that. -
7. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
belaban Dec 9, 2005 2:36 PM (in response to brian.stansberry)We already have an async handler somewhere else, maybe there's some common code to be refactored ?
E.g. a service that can be used by all subsystems emitting notifications
E.g. a notification service, with its own thread pool ?
I son't suggest this needs to be done now, but at least we should put a JIRA issue on it. -
8. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
adrian.brock Dec 9, 2005 3:07 PM (in response to brian.stansberry)"bstansberry@jboss.com" wrote:
OK, without objection here's what I plan to do. (This approach is colored by a desire to get fixes in 4.0.4 but limited bandwidth before the release :)
4) Write a stress test that excercises the problem.
That way you won't have to fix the problem again in 6 months time. :-) -
9. Re: JBAS-2539 Deadlock in accessing DistributedReplicantMana
starksm64 Dec 9, 2005 6:45 PM (in response to brian.stansberry)The minimum change I want to see is an asych handoff to a notification thread so that the jgroups stack is not blocked. Its got be trivial to design a key notification listener that turns around and causes a deadlock by modifying a distributed key either directly or indirectly.
Mark the issue as critical and until this has been tested the release does not go out.