-
30. Re: Remoting Transport Transaction Inflow Design Discussion
jason.greene Sep 19, 2011 12:40 PM (in response to jhalliday)
Jonathan Halliday wrote:
Sorry, running a bit behind what with having spent the last two work days in a meeting room with lousy connectivity :-(
The discussion above is on the basis of 'what would we ideally like to have in the future' per the related conf call. We're not even at the stage of 'we will deliver feature enhancement X by date Y' yet. What we eventually commit to implement will depend on what other requirements we have competing for attention in the same timeframe and we don't have all that information yet.
For the AS7.1 release we've already delivered the final TS feature release, so if you want tx over remoting in that you will indeed need to build it on whatever is already there, which basically means JCA tx inflow.
OK, sounds like we are in agreement then.
-
31. Re: Remoting Transport Transaction Inflow Design Discussion
marklittle Sep 20, 2011 7:34 AM (in response to jason.greene)
OK, but are we all in agreement that this is a stop-gap measure and not where we'd like to be in some later release?
Also can you (Jason) stick the design discussions around what you will do in this forum, or at least cross post?
-
32. Re: Remoting Transport Transaction Inflow Design Discussion
jason.greene Sep 20, 2011 8:30 AM (in response to marklittle)
Sure, we'll post something when we have time. Although honestly we have wasted enough time on this just to find out that the transaction project has no intention of resolving the issues anytime soon. After EAP6 my team is going to look at doing our own fork, which is unfortunate but it's the only way I see it happening by EAP6.
-
33. Re: Remoting Transport Transaction Inflow Design Discussion
jason.greene Sep 20, 2011 8:32 AM (in response to jason.greene)
That should read 6.1
-
34. Re: Remoting Transport Transaction Inflow Design Discussion
marklittle Sep 20, 2011 9:07 AM (in response to jason.greene)
Your own fork of what?!
-
35. Re: Remoting Transport Transaction Inflow Design Discussion
marklittle Sep 20, 2011 9:38 AM (in response to jason.greene)
Did you not see Jonathan's response from a few weeks back? There was no response to that either. I'll copy it here:
"
From the point of view of the transactions project roadmap, it seems that -
ClientUserTransaction requires no additional implementation work. The same hooks that supported the earlier implementations of this can be used for the new one too.
Transaction context inflow support falls into two parts: whole transaction (gtrid) interposition and branch only (bqual) interposition.
For whole transaction interposition, a new subordinate transaction context is created on each node receiving an inflow. Synchronizations are handled purely locally. The JCA inflow API can be used, albeit with semantics which IMO are not spec compliant. This model would be relatively simple to implement on the transaction manager side, as the existing recovery architecture will mostly still apply. IMO it's of limited utility for users though, as resource managers will see independent transactions and not do any transaction branch coupling. That impacts both functionality and performance.
For branch only transaction interposition, subordinate nodes maintain the inflowed gtrid but create new branches within an allocated portion of the bqual state space. This requires information about the allocation of bqual space to be communicated, either by explicit parameter passing for the general case or by encoding in the Xid for the jboss->jboss inflow case. It affords the opportunity for branch coupling in resource managers used by more than one node in the same transaction, but leads to more complicated recovery needs. Specifically recovery can no longer be driven off consideration of the gtrid ownership alone, but must also consider bqual ownership. This naturally requires that the bqual value actually contain node ownership information, which will require a new encoding. On the other hand we probably need one anyhow to communicate delegation of the bqual state space on links where we're working with 3rd party implementations and thus constrained to the JCA api rather than one that could carry additional parameters. For links where we do control both ends, we need additional methods to support afterCompletion as a separate phase. BeforeCompletion is already available as a separate step on subordinate transactions, although it may need to be even finer grained to allow for JTA 1.1 TSR sync interposition semantics to be transaction global rather than node local.
In both models the communication is entirely top down - the coordinator does not exist as a network endpoint as in JTS, as persistent ids are not supported by the remoting transport. This constrains recovery to use the XA recovery scan model rather than the JTS replayCompletion one. One consequence of this is that parent nodes will require to maintain a list of all possible subordinates and have a recovery module plugin for them, but that's probably not an undue burden for most deployment scenarios. Another consequence is that it's probably going to be better to build it as distributed hooks into JBossJTA rather than a pluggable transport layer for JBossJTS. That should also offer better performance as resource records will get inlined to the tx rather than be separate ostore entries."
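As a rough illustration of the top-down recovery scan described in the quote above, here is a minimal Java sketch. It is not JBossTS code: the `Subordinate` interface, the `scan` method and the `"<nodeId>:"` prefix encoding for gtrid and bqual ownership are all assumptions invented for the example. Only `javax.transaction.xa.Xid` is a real API. The parent holds a list of its known subordinates, asks each one for its in-doubt branches, and keeps only those where both the gtrid and the bqual ownership checks pass.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.transaction.xa.Xid;

/** Illustrative only: none of these types exist in JBossTS or the AS. */
final class RecoveryScanSketch {

    /** Minimal Xid whose gtrid and bqual both start with "<ownerNodeId>:" (an assumed encoding). */
    static final class SimpleXid implements Xid {
        private final byte[] gtrid;
        private final byte[] bqual;

        SimpleXid(String gtrid, String bqual) {
            this.gtrid = gtrid.getBytes(StandardCharsets.UTF_8);
            this.bqual = bqual.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int getFormatId() { return 1; } // arbitrary value for the sketch
        @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        @Override public byte[] getBranchQualifier() { return bqual.clone(); }
    }

    /** Hypothetical recovery plugin for one subordinate node known to the parent. */
    interface Subordinate {
        String nodeId();
        List<Xid> inDoubt(); // stands in for an XAResource.recover style scan over the transport
    }

    static Subordinate subordinate(final String nodeId, final List<Xid> inDoubt) {
        return new Subordinate() {
            @Override public String nodeId() { return nodeId; }
            @Override public List<Xid> inDoubt() { return inDoubt; }
        };
    }

    /** True if the id bytes begin with the assumed "<nodeId>:" ownership prefix. */
    private static boolean ownedBy(byte[] id, String nodeId) {
        byte[] prefix = (nodeId + ":").getBytes(StandardCharsets.UTF_8);
        return id.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(id, prefix.length), prefix);
    }

    /** The parent polls every known subordinate; nothing ever calls back up to the parent. */
    static List<Xid> scan(String parentNodeId, List<Subordinate> subordinates) {
        List<Xid> toResolve = new ArrayList<>();
        for (Subordinate sub : subordinates) {
            for (Xid xid : sub.inDoubt()) {
                // gtrid ownership alone is not enough once branches are interposed:
                // the bqual must also identify the node that created the branch.
                if (ownedBy(xid.getGlobalTransactionId(), parentNodeId)
                        && ownedBy(xid.getBranchQualifier(), sub.nodeId())) {
                    toResolve.add(xid);
                }
            }
        }
        return toResolve;
    }

    public static void main(String[] args) {
        Xid mine = new SimpleXid("node1:tx7", "node2:b1");
        Xid foreign = new SimpleXid("node9:tx3", "node2:b2");
        List<Subordinate> subs = Arrays.asList(subordinate("node2", Arrays.asList(mine, foreign)));
        System.out.println(scan("node1", subs).size() + " branch(es) to resolve");
    }
}
```

The sketch leaves out everything that makes recovery hard in practice (deciding commit versus rollback from the parent's log, unreachable subordinates, heuristic outcomes); it only shows why the parent needs the subordinate list and why the bqual has to carry node ownership.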
-
36. Re: Remoting Transport Transaction Inflow Design Discussion
jhalliday Sep 20, 2011 2:54 PM (in response to jason.greene)
OK, so I guess we're jumping ahead to the implementation roadmap discussion without waiting for the EAP6.1 spec then. Is that the same Jason who just got done telling me the JBossTS planning cycle is too far in advance of the AS planning cycle? :-)
If the AS has no preference either way, then I'm inclined to go with the tx branch interposition model rather than full interposition, on the basis that transaction branch lock sharing is a more intuitive and powerful model for users. Whilst I think it's the right goal, it's undeniably more complex to implement than the full interposition model. However, rather than go with full interposition as a stopgap and then change over and waste that work, I'd prefer to look at how we can break down the branch interposition implementation into smaller pieces to deliver the functionality incrementally.
With an incremental approach we may even be able to put in place a partial solution for some of the more limited but common use cases in AS7.1, much less a later release. For example, I think it's feasible to extend the org.jboss.tm spi to support a subordinate tx type that exposes beforeCompletion for remote usage. JCA tx inflow does not offer that, but the TS impl under it already does, so we just need the ability to wire it through without tying the AS too closely to the TS implementation classes. That kind of conservative change should be possible on the existing maintenance branch without forking, as it's not going to impact compatibility. Unifying the way the AS and TS do node identity and putting in place the AS side management of the peer node id list that's needed for recovery can also be done up front to avoid later alteration to the domain model - not all the work is going to be on the TS code side.
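To make the "subordinate tx type that exposes beforeCompletion" idea concrete, here is a small Java sketch of what such a type might look like from the caller's side. Every name in it (`SubordinateTx`, `complete`, `recording`) is hypothetical and invented for illustration; it is not the org.jboss.tm SPI or any TS class. The point it shows is ordering: beforeCompletion runs on every subordinate before any of them is asked to prepare, and afterCompletion is a separate final phase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; these names are not part of org.jboss.tm or JBossTS. */
final class SubordinateCompletionSketch {

    /** Hypothetical remote-facing view of a subordinate transaction. */
    interface SubordinateTx {
        void beforeCompletion();              // run the node's beforeCompletion synchronizations
        boolean prepare();                    // true means "vote commit"
        void commit();
        void rollback();
        void afterCompletion(boolean committed);
    }

    /** Finishes each phase on all subordinates before starting the next phase. */
    static boolean complete(List<SubordinateTx> subs) {
        for (SubordinateTx s : subs) {
            s.beforeCompletion();
        }
        boolean allPrepared = true;
        for (SubordinateTx s : subs) {
            if (!s.prepare()) {
                allPrepared = false;
                break;                        // one "no" vote is enough to roll back
            }
        }
        // Simplification: error handling and heuristic outcomes are ignored here.
        for (SubordinateTx s : subs) {
            if (allPrepared) {
                s.commit();
            } else {
                s.rollback();
            }
        }
        for (SubordinateTx s : subs) {
            s.afterCompletion(allPrepared);
        }
        return allPrepared;
    }

    /** Test double that appends each call to a shared log so the ordering is visible. */
    static SubordinateTx recording(final String name, final boolean vote, final List<String> log) {
        return new SubordinateTx() {
            @Override public void beforeCompletion() { log.add(name + ".before"); }
            @Override public boolean prepare() { log.add(name + ".prepare"); return vote; }
            @Override public void commit() { log.add(name + ".commit"); }
            @Override public void rollback() { log.add(name + ".rollback"); }
            @Override public void afterCompletion(boolean committed) { log.add(name + ".after:" + committed); }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        complete(Arrays.asList(recording("a", true, log), recording("b", true, log)));
        System.out.println(log);
    }
}
```

JCA transaction inflow through XATerminator offers prepare, commit and rollback but no separate beforeCompletion call, which is the gap the paragraph above refers to.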
Some of the more involved TS internal recovery changes may also make sense to do even before we know if the wider feature does make the cut when it comes to prioritizing work for the next release cycle. We could for example potentially enable recovery for multiple resources even in the broken JCA spec semantics, which is an outstanding customer request. It's been a low priority thus far as it's limited to one customer, albeit a large one. But if the work overlaps with wider tx inflow support then it's easier to justify the allocation of resources to it. As the new JCA integration spi is already in place (finally!) the remaining bits needed would likely be entirely arjuna internal, so again probably doesn't require a major version rev or fork on the TS code side.
Whilst getting at least some functionality out sooner rather than later is probably a good thing, one of my concerns on delivering this in staged releases, potentially including the current cycle, is the impact it will have on docs and QE. We'd need to clearly specify what scenarios are expected to work in each release we provide, in order to test correctly and manage user expectations thoroughly. We probably need an internal discussion with those teams to see what resource they have available in addition to seeing what dev resource we can shake loose for this by deprioritizing other things.
-
37. Re: Remoting Transport Transaction Inflow Design Discussion
tomjenkinson Sep 20, 2011 3:41 PM (in response to jhalliday)
Jonathan Halliday wrote:
Whilst getting at least some functionality out sooner rather than later is probably a good thing, one of my concerns on delivering this in staged releases, potentially including the current cycle, is the impact it will have on docs and QE. We'd need to clearly specify what scenarios are expected to work in each release we provide, in order to test correctly and manage user expectations thoroughly. We probably need an internal discussion with those teams to see what resource they have available in addition to seeing what dev resource we can shake loose for this by deprioritizing other things.
I agree wholeheartedly with Jonathan and also think we definitely need to take heed on his point re functionality available at each stage being documented accurately and the overhead this will add to EAP. Jason, are we certain this is a feature that needs to be in EAP6 vs EAP6.1? I guess more generally can you clarify your preferred and maximum timescales you would like us to deliver this functionality within.
Another point that was raised earlier (or rather a detail of the one surrounding QA) is the question of testing this with remoting itself. Clearly some work will be done on the remoting side to chat with the API we expose so is it realistic to expect this level of testing to be performed in the timescale needed?
It sounds like a nice piece of work, but our worry is we may simply not be able to add this feature with enough capability/testing to address your desired use cases in the timescale we anticipate you need this in. That said we are ready to work on this right now so hopefully we can make an effective solution in the time available.
-
38. Re: Remoting Transport Transaction Inflow Design Discussion
dmlloyd Sep 20, 2011 4:13 PM (in response to jhalliday)
Jonathan Halliday wrote:
If the AS has no preference either way, then I'm inclined to go with the tx branch interposition model rather than full interposition, on the basis that transaction branch lock sharing is a more intuitive and powerful model for users. Whilst I think it's the right goal, it's undeniably more complex to implement than the full interposition model. However, rather than go with full interposition as a stopgap and then change over and waste that work, I'd prefer to look at how we can break down the branch interposition implementation into smaller pieces to deliver the functionality incrementally.
Sounds great.
Jonathan Halliday wrote:
With an incremental approach we may even be able to put in place a partial solution for some of the more limited but common use cases in AS7.1, much less a later release. For example, I think it's feasible to extend the org.jboss.tm spi to support a subordinate tx type that exposes beforeCompletion for remote usage. JCA tx inflow does not offer that, but the TS impl under it already does, so we just need the ability to wire it through without tying the AS too closely to the TS implementation classes. That kind of conservative change should be possible on the existing maintenance branch without forking, as it's not going to impact compatibility. Unifying the way the AS and TS do node identity and putting in place the AS side management of the peer node id list that's needed for recovery can also be done up front to avoid later alteration to the domain model - not all the work is going to be on the TS code side.
To be honest I was already planning on copying the things I liked out of the JBossXATerminator implementation since Remoting inflow for EJBs will obviously not execute in terms of JCA Work objects. So it sounds like we're aligned on this point at least. I wasn't sure about beforeCompletion because I didn't see any of that in the code I have, so I'm glad to hear that the underlying API has that.
As for tying AS to the TS implementation, that's not really a problem as that's the whole point of the AS Transactions subsystem. If and when we need an SPI to abstract around multiple TS implementations, we can probably do that on the AS side. I'm not aware of any plans on our part to allow other TS implementations though. I'm not really in favor of SPIs for the sake of SPIs.
Jonathan Halliday wrote:
Some of the more involved TS internal recovery changes may also make sense to do even before we know if the wider feature does make the cut when it comes to prioritizing work for the next release cycle. We could for example potentially enable recovery for multiple resources even in the broken JCA spec semantics, which is an outstanding customer request. It's been a low priority thus far as it's limited to one customer, albeit a large one. But if the work overlaps with wider tx inflow support then it's easier to justify the allocation of resources to it. As the new JCA integration spi is already in place (finally!) the remaining bits needed would likely be entirely arjuna internal, so again probably doesn't require a major version rev or fork on the TS code side.
Whilst getting at least some functionality out sooner rather than later is probably a good thing, one of my concerns on delivering this in staged releases, potentially including the current cycle, is the impact it will have on docs and QE. We'd need to clearly specify what scenarios are expected to work in each release we provide, in order to test correctly and manage user expectations thoroughly. We probably need an internal discussion with those teams to see what resource they have available in addition to seeing what dev resource we can shake loose for this by deprioritizing other things.
Well before we get ahead of ourselves and drown in borrowed trouble, it might be better to identify what specific tasks need to be accomplished, and by whom. Once we have a clear sense of what needs to be done, then we can figure out how we want the releases to go and what resources should be dedicated to what.
In particular I still only have a general sense of what is involved in the branch interposition model - to be specific, it's not clear what the transport requirements would be for negotiating the XID space; you mentioned alternate XID formats as well as bitmask-based solutions, but none of this is really specific. And I don't have a clear understanding of what issues we'll face in terms of recovery - though perhaps I don't need to, as long as it is clear what the transport requirements would be for this as well. By "transport requirements" I just mean, what communication steps will the coordinators need to perform in order to accomplish the tasks which make up propagation and recovery.
Also it's not clear to me how XIDs can actually be created, which I imagine will follow directly from the solution we decide on for how the bqual is selected. It seems like once this is figured out, the rest of the actual inflow code should be pretty straightforward.
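One possible shape for XID creation under the branch interposition model, sketched in Java purely for discussion. The bqual layout used here (one length byte, the node id, then a 4-byte sequence number) is an invented assumption and not an agreed JBossTS encoding, and `BranchXidSketch`, `newBranch` and `ownerOf` are hypothetical names. The sketch keeps the inflowed format id and gtrid untouched and mints a new bqual inside the receiving node's own portion of the bqual space, so that two nodes in the same transaction can never generate the same branch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import javax.transaction.xa.Xid;

/** Hypothetical sketch of bqual allocation; the byte layout is an assumption. */
final class BranchXidSketch {

    /** Plain immutable Xid implementation for the example. */
    static final class SimpleXid implements Xid {
        private final int formatId;
        private final byte[] gtrid;
        private final byte[] bqual;

        SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
            this.formatId = formatId;
            this.gtrid = gtrid.clone();
            this.bqual = bqual.clone();
        }

        @Override public int getFormatId() { return formatId; }
        @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        @Override public byte[] getBranchQualifier() { return bqual.clone(); }
    }

    /**
     * Branch-only interposition: keep the inflowed gtrid, create a new bqual of the form
     * [node id length][node id bytes][sequence], which must fit in Xid.MAXBQUALSIZE bytes.
     */
    static Xid newBranch(Xid inflowed, String nodeId, int sequence) {
        byte[] node = nodeId.getBytes(StandardCharsets.UTF_8);
        int size = 1 + node.length + Integer.BYTES;
        if (size > Xid.MAXBQUALSIZE) {
            throw new IllegalArgumentException("bqual would exceed " + Xid.MAXBQUALSIZE + " bytes");
        }
        byte[] bqual = ByteBuffer.allocate(size)
                .put((byte) node.length)
                .put(node)
                .putInt(sequence)
                .array();
        return new SimpleXid(inflowed.getFormatId(), inflowed.getGlobalTransactionId(), bqual);
    }

    /** Reads the owning node id back out of a bqual produced by newBranch. */
    static String ownerOf(Xid xid) {
        byte[] bqual = xid.getBranchQualifier();
        return new String(bqual, 1, bqual[0], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Xid inflowed = new SimpleXid(1, new byte[] {1, 2, 3}, new byte[] {0});
        Xid onNode2 = newBranch(inflowed, "node2", 1);
        Xid onNode3 = newBranch(inflowed, "node3", 1);
        System.out.println(ownerOf(onNode2) + " and " + ownerOf(onNode3) + " share one gtrid");
    }
}
```

With an encoding along these lines the only thing the transport has to carry for propagation is the inflowed Xid itself, since each node can derive its bqual locally from its own id. A bitmask-based split of the bqual space, or an allocation passed as an explicit parameter, would be alternatives with different transport needs.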
So how should we break down the task list? Off the cuff we have:
- Design the XID negotiation/generation scheme (conceptual work, no coding)
- Implement the XID negotiation/generation scheme (possibly split between AS-side and TS-side)
- Design the XA inflow scheme (conceptual)
- Implement XA inflow (probably mostly AS-side, once the XID stuff is in place)
- Design the recovery scheme (conceptual)
- Implement the recovery transport mechanism (mostly AS-side)
For the design points, I think we should probably split each into a forum thread, but honestly I think you TS folks should take the lead on these since you have the experience. I'll jump into the implementation parts as soon as the design is clear enough.
Thoughts?
-
39. Re: Remoting Transport Transaction Inflow Design Discussion
marklittle Sep 20, 2011 4:29 PM (in response to dmlloyd)
Tom said: "It sounds like a nice piece of work, but our worry is we may simply not be able to add this feature with enough capability/testing to address your desired use cases in the timescale we anticipate you need this in. That said we are ready to work on this right now so hopefully we can make an effective solution in the time available."
Let's all remember that we have a "baking in the community first" mantra here, and for critical (read complex) pieces of code like this, I definitely want to see that baking taken into consideration. So let's be sure to factor that in too!
-
40. Re: Remoting Transport Transaction Inflow Design Discussion
marklittle Sep 20, 2011 4:32 PM (in response to dmlloyd)
I'll talk with Jonathan tomorrow about this, but when you say:
"In particular I still only have a general sense of what is involved in the branch interposition model - to be specific, it's not clear what the transport requirements would be for negotiating the XID space; you mentioned alternate XID formats as well as bitmask-based solutions, but none of this is really specific."
I don't see how this will have any requirements on Remoting other than as a communication channel, i.e., the protocol should work just as well (conceptually) using HTTP, raw socket, IIOP or carrier pigeon.
-
41. Re: Remoting Transport Transaction Inflow Design Discussion
marklittle Sep 20, 2011 4:37 PM (in response to dmlloyd)
David Lloyd wrote:
So how should we break down the task list? Off the cuff we have:
- Design the XID negotiation/generation scheme (conceptual work, no coding)
- Implement the XID negotiation/generation scheme (possibly split between AS-side and TS-side)
- Design the XA inflow scheme (conceptual)
- Implement XA inflow (probably mostly AS-side, once the XID stuff is in place)
- Design the recovery scheme (conceptual)
- Implement the recovery transport mechanism (mostly AS-side)
For the design points, I think we should probably split each into a forum thread, but honestly I think you TS folks should take the lead on these since you have the experience. I'll jump into the implementation parts as soon as the design is clear enough.
I definitely want to see the design happen in the forums so it's all inclusive. As to the steps, I'm pretty sure that Jonathan and Tom already have a plan of attack since this is something we've discussed for a while and most recently last week at the JBossTS f2f. I'll encourage one or other of them to post it here as soon as possible.
-
42. Re: Remoting Transport Transaction Inflow Design Discussion
dmlloyd Sep 20, 2011 4:37 PM (in response to marklittle)
Mark Little wrote:
I'll talk with Jonathan tomorrow about this, but when you say:
"In particular I still only have a general sense of what is involved in the branch interposition model - to be specific, it's not clear what the transport requirements would be for negotiating the XID space; you mentioned alternate XID formats as well as bitmask-based solutions, but none of this is really specific."
I don't see how this will have any requirements on Remoting other than as a communication channel, i.e., the protocol should work just as well (conceptually) using HTTP, raw socket, IIOP or carrier pigeon.
Right. But Remoting will have to carry the messages so the sooner I know what they are, the better. Also this does impact the AS side of things insofar as I have to use the right TS API and feed the results to/from Remoting to get the XID to start the transaction.
But most of all, I just want to break the inertia and get things jump-started here, so this is mostly just a push in what I hope is the right direction.
-
43. Re: Remoting Transport Transaction Inflow Design Discussion
marklittle Sep 20, 2011 4:43 PM (in response to tomjenkinson)
Tom Jenkinson wrote:
Jason, are we certain this is a feature that needs to be in EAP6 vs EAP6.1? I guess more generally can you clarify your preferred and maximum timescales you would like us to deliver this functionality within.
I've mentioned this before, but the overriding issue here is that we will deliver EAP 6 on schedule. If the use cases we need to have in EAP 6 from TS cannot be done by the ideal solution then we probably have no option but to have a stop-gap implementation. We've discussed that on this forum before and elsewhere so I think everyone understands what that would be. On the other hand, if we can skip EAP 6 and go straight to EAP 6.1 and cover the use cases with the ideal solution as outlined, then that gives us the better implementation, more time to bake in the community, more time to QA etc.
Another point that was raised earlier (or rather a detail of the one surrounding QA) is the question of testing this with remoting itself. Clearly some work will be done on the remoting side to chat with the API we expose so is it realistic to expect this level of testing to be performed in the timescale needed?
Yes, testing this is critical. We've seen over the past 20 years that testing the recovery scenarios always takes more time than anything else, and for good reasons. So we shouldn't underestimate how long this testing will take.
-
44. Re: Remoting Transport Transaction Inflow Design Discussion
marklittle Sep 20, 2011 4:45 PM (in response to dmlloyd)
David Lloyd wrote:
Right. But Remoting will have to carry the messages so the sooner I know what they are, the better. Also this does impact the AS side of things insofar as I have to use the right TS API and feed the results to/from Remoting to get the XID to start the transaction.
But most of all, I just want to break the inertia and get things jump-started here, so this is mostly just a push in what I hope is the right direction.
Yes, believe me, I understand the inertia aspect. But I'm expecting the impact on you and Remoting for this specific bit of the protocol to be minimal. Let's put it this way: if we have to change Remoting to cope with this then I'd be even more concerned than I am at the moment.