Perhaps there is some way to prevent both warnings? Perhaps we can add the CMR xid to a different list too that is cleaned before phaseOne but can be consulted by the CommitMarkableResourceRecovery class?
tomjenkinson ok, the point is that I have no idea in what circumstances the warning will be shown. I can see only the comment in the code narayana/XARecoveryModule.java at master · jbosstm/narayana · GitHub which says there could be such occasion. Would you give me a pointer what is here the scenario where that could happen?
About the CMR: for me removing the record from the _xidScans seems to me the best way as it defines it was already committed. If there is any other module depending on the outcome of the AtomicActionRecoveryModule it will be automatic processed.
I can imagine to use some information from the CommitMarkableResourceRecoveryModule but it sounds me strange and going against the current design (no recovery modules are touching other ones at least what I can see).
What's happening is - the responsibility for commititng the XAResource during periodic recovery is at hands of the XAResourceRecord. The XAResourceRecord looks through the list of the recovery modules to get the XARecoveryModule and verify if the particular XAResource is available and if it can be committed. That's fine.
What makes here troubles is the fact that XARecoveryModule orphan detection tries to rollback the resource in the next step. Do you think that the XARecoveryModule should grasp an instance of the CommitMarkableResourceRecoveryModule and verify if the XAResource was not part of the CMR transaction? (as I mentioned above, it's probably possible but I don't know if it's a good way. using the _xidScans seems to me cleaner).
It is related to a tooling change:
What happens is that if you connect the tooling up, the tooling calls restore_state on the XAResourceRecord which will have the effect of calling getNewXAResource which will remove the Xid from the scanned list and so if the terminator is called at the same time as this is going on it will result in an XATerminator::2pc call not being able to complete the branch which pertains to a resource that exists there.
I've been investigating on what you described but I'm not able to find out what is or how to achieve the error. I took the code updated with my change in the recovery module.
What I can see is that the XATerminator::2pc can finish the inflow transactions but I can't see the XATerminator 2pc using some of the XARecoveryModule methods to influence the tooling or recovery. The XATerminator works with the XARecoveryModule only when "recover" is called. But I can't find how they influence each other.
Would you have some test to run or steps to follow?
I don't have a test but the thing that I think can happen is that:
1. Tx propagated in over XATerminator
2. Resource enlisted to subordinate
3. EIS calls prepare on XATerminator
4. XARM scans resources in phase 1
5. ** TOOLING does XAResourceRecord::restore_state, this would remove the Xids from _xids with your patch
6. XATerminator::commit comes in but there is nothing in _xids
That is the problem I think should exist
tomjenkinson another round of harassment
I re-checked your scenario but I can't find the issue you described. I tried to simulate the situation but I was not successful on getting the error.
If I understand it right the XARM loads the XAResource to the _xidScans. That's true when for example the server is started after a JVM failure happens. XARM is called but during the XAResourceRecord periodic recovery initialization the _theXAResource is loaded and prepared. When the tooling is called then the xid is removed from the _xidScans but it's not a trouble as the _theXAResource is ready, already loaded and the doCommit just use it.
I think about another kind of errors but I don't know what could be potentially the issue.
Of course if you think that not touching the _xidScans is safer I can work on some solution where CMRRecoveryManager can be used in the XARecoveryModule. Just I don't know if it's reasonable when it brings complexity and tight bound of the two recovery modules. While _xidScans can be just updated as it was before.
Ah that is right - I had not considered that it was already loaded into the instance that the XATerminator would be using. Thanks for checking!
tomjenkinson I would like to check with you what is the preferred way for the fix.
I can use the first proposal where your change is reverted and _xidScans is updated after each XARecoveryModule::getNewXAResource is called.
It's the change in the branch (and the commit):
Meanwhile I've worked on the other option that's based on your initial idea. It's about marking the resource in the CMR recovery module as processed and then forcing it to be removed by the XARecoveryModule but leaving other xids to be garbage collected from the _xidScans later in second phase.
It's chnage in the branch (and the commit):
I am OK with the first one really as we believe it is correct. It does mean that we can occasionally get nonxaresourcerecovery message but the other approach is more complicated and introduces some coupling of the two. Perhaps attach a patch with the current state of your work to the JBTM-3034 in case we ever need to go back to revisit?