-
1. Re: Issue using REST-AT
gytis Mar 26, 2015 4:38 AM (in response to ajcmartins)
Hello,
do you have log files with trace level enabled for org.jboss.jbossts.star and org.jboss.narayana.rest categories? That would be really helpful.
Also, which quickstart did you use?
Thanks,
Gytis
-
2. Re: Issue using REST-AT
ajcmartins Mar 26, 2015 6:57 AM (in response to gytis)
Hello, and thanks for your reply. Please take a look at the logs and see if they help.
What I am doing is based on the recovery2 quickstart.
Thank you,
- server1.log.zip 5.4 KB
- server2.log.zip 8.2 KB
-
3. Re: Issue using REST-AT
ajcmartins Mar 26, 2015 10:22 AM (in response to ajcmartins)
OK Gytis, at this point I am almost sure it's a bug.
If I add an extra step and restart server1 (which contains the transaction/recovery coordinator) before bringing server2 back up, then everything recovers correctly. It seems that something isn't being updated correctly after the commit fails, the same something that is updated on the coordinator restart.
I just need someone with better insight to confirm this, and you look like that person.
Thanks,
-
4. Re: Issue using REST-AT
gytis Mar 26, 2015 10:43 AM (in response to ajcmartins)
Thanks for the update.
I see that server2 fails to contact the recovery coordinator. Leave this with me and I'll try to figure it out. I'll let you know once I have something.
Thanks,
Gytis
-
5. Re: Issue using REST-AT
tomjenkinson Mar 26, 2015 10:44 AM (in response to ajcmartins)
Can you produce a simple test that replicates the issue? It would really help us diagnose it if we have something to fire up.
Thanks,
Tom
-
6. Re: Issue using REST-AT
ajcmartins Mar 26, 2015 11:46 AM (in response to tomjenkinson)
Hello Tom, I am unable to provide that kind of test at this point. Nevertheless, since I suspected that restarting was fixing the problem, I went ahead, downloaded the code from GitHub, and applied a small patch that solved the issue I was experiencing.
I added the following code just after the line at: narayana/Coordinator.java at master · jbosstm/narayana · GitHub
recoveringTransactions = getRecoveringTransactions(transactions);
I don't actually know whether it's valid or how negative the impact may be in other situations. But maybe it helps shed light on what may be happening?
Thanks,
-
7. Re: Issue using REST-AT
mmusgrov Mar 30, 2015 12:10 PM (in response to ajcmartins)
I took a look at the logs and it appears that the application itself is aborting the transaction. Here is what I can glean from the logs you uploaded:
Server B did know about the participant (appB) and wrote its state into persistent storage before the crash (see the message with timestamp 2015-03-26 10:25:18,298 in server2.log).
The coordinator on server 1 has logged the transaction and is now trying to replay it on appB, but it is receiving a 404 because server B does not know about the participant (this is the message with timestamp 2015-03-26 10:28:53,044 in server1.log).
Since server B did persist details about appB, something has removed the persistent log for it. This happened at timestamp 2015-03-26 10:27:05,024 in server2.log:
2015-03-26 10:27:05,024 INFO [services.iap.subscriptions.service.tx.UpdateSubscriptionWork] (MSC service thread 1-3) Aborting transaction..
This is coming from a thread inside App B itself. Can you take a look at your application code and figure out under what conditions it will abort the transaction branch? I.e., I don't think it is REST-AT framework code that is issuing the abort request.
-
8. Re: Issue using REST-AT
ajcmartins Mar 30, 2015 12:39 PM (in response to mmusgrov)
Hello Michael,
As I said in my first post, the rollback is happening on the server B restart, during the local recovery system startup. During this flow, the local recovery system tries to sync/update its info on the transaction/recovery coordinator (server A), which in turn answers with a 404 stating that the transaction doesn't exist. The code that does this is at:
- narayana/RecoveryManager.java at master · jbosstm/narayana · GitHub
- narayana/RecoveryManager.java at master · jbosstm/narayana · GitHub
The log of this invocation on server A is:
2015-03-26 10:27:05,001 TRACE [org.jboss.jbossts.star.service.Coordinator] (default task-17) coordinator: replace: recovery-coordinator/0_ffffc0a801f1_62c1ef98_5513de56_49?URL=http://192.168.3.227:8180/rest-at-participant/0:ffffc0a801f1:-1c33e0e9:5513de66:19
Now the problem (and the bug, in my understanding) is that the transaction coordinator answers this operation with a 404 despite knowing about the transaction, since it keeps trying to replay it.
Thanks,
P.S. The "Aborting transaction" message may be misleading. That's just a log statement in the rollback method implementation of the participant interface. It should be read as "Received rollback callback".
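The restart flow described in this post can be sketched roughly as follows. This is only an illustration of the behaviour reported in the thread, not the Narayana code: the class, the method `recoverParticipants`, and the `putNewUrl` callback (standing in for the HTTP PUT to the recovery coordinator) are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class ParticipantRecoverySketch {
    static final int NOT_FOUND = 404;

    /**
     * For every participant found in the local persistent log, tell the
     * recovery coordinator where the participant can now be reached.
     *
     * @param persistedParticipants participant id mapped to its recovery-coordinator URL
     * @param newBaseUrl            base URL of the restarted participant service
     * @param putNewUrl             hypothetical HTTP PUT: (recoveryUrl, newParticipantUrl) -> status code
     * @return ids of the participants that were rolled back
     */
    public static List<String> recoverParticipants(
            Map<String, String> persistedParticipants,
            String newBaseUrl,
            BiFunction<String, String, Integer> putNewUrl) {
        List<String> rolledBack = new ArrayList<>();
        for (Map.Entry<String, String> entry : persistedParticipants.entrySet()) {
            String participantId = entry.getKey();
            String recoveryUrl = entry.getValue();
            int status = putNewUrl.apply(recoveryUrl, newBaseUrl + "/" + participantId);
            // A 404 is taken to mean "the coordinator no longer knows this
            // transaction", so the local branch is rolled back and forgotten.
            if (status == NOT_FOUND) {
                rolledBack.add(participantId);
            }
        }
        return rolledBack;
    }
}
```

Under this reading, a spurious 404 from the coordinator is enough to make the participant discard a branch that the coordinator still intends to commit, which matches the symptoms above.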
-
9. Re: Issue using REST-AT
mmusgrov Mar 30, 2015 1:03 PM (in response to mmusgrov)
mmusgrov wrote:
This is coming from a thread inside App B itself. Can you take a look at your application code and figure out under what conditions it will abort the transaction branch? I.e., I don't think it is REST-AT framework code that is issuing the abort request.
Ah wait. There is some code in recreateParticipantInformation that will do the abort. Let me look further into what's happening ...
-
10. Re: Issue using REST-AT
mmusgrov Mar 30, 2015 1:31 PM (in response to ajcmartins)
ajcmartins wrote:
As I said in my first post, the rollback is happening on the server B restart, during the local recovery system startup. During this flow, the local recovery system tries to sync/update its info on the transaction/recovery coordinator (server A), which in turn answers with a 404 stating that the transaction doesn't exist. The code that does this is at:
P.S. The "Aborting transaction" message may be misleading. That's just a log statement in the rollback method implementation of the participant interface. It should be read as "Received rollback callback".
When server A looks for the transaction id, it searches for it in memory, and if it isn't there it looks for it on disk, but it fails to find it. So yes, I agree it does look like a bug in the coordinator. Is there any other log output from server A when it does this check (it should be at about timestamp 10:27:04,807, in amongst the replay requests)? I think I will probably need to put a debugger on it to figure out why it's not found. Since I will first need to recreate your issue, it may take a couple of days.
You also said that calling recoveringTransactions = getRecoveringTransactions(transactions); fixed the problem. This is good but puzzling, since the method replaceParticipant in the coordinator already does this if the transaction isn't in memory. That is another reason why we probably need to debug it.
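The lookup order described here (memory first, then the on-disk log) can be sketched as below. It is a minimal illustration under stated assumptions, not the actual Coordinator.java: the class, the `TransactionLog` interface, and the simplified `replaceParticipant` signature are hypothetical, and transactions are reduced to a map from id to participant URL.

```java
import java.util.HashMap;
import java.util.Map;

public class CoordinatorLookupSketch {
    /** Hypothetical stand-in for the coordinator's persistent transaction log. */
    public interface TransactionLog {
        Map<String, String> loadRecoveringTransactions();
    }

    private final Map<String, String> inMemory = new HashMap<>();
    private final Map<String, String> recovering = new HashMap<>();
    private final TransactionLog log;

    public CoordinatorLookupSketch(TransactionLog log) {
        this.log = log;
    }

    public void begin(String txId, String participantUrl) {
        inMemory.put(txId, participantUrl);
    }

    /** Returns 200 if the participant URL was replaced, 404 if the transaction is unknown. */
    public int replaceParticipant(String txId, String newUrl) {
        if (inMemory.containsKey(txId)) {
            inMemory.put(txId, newUrl);
            return 200;
        }
        // Not in memory: refresh the recovering set from the log before
        // deciding that the transaction does not exist.
        log.loadRecoveringTransactions().forEach(recovering::putIfAbsent);
        if (recovering.containsKey(txId)) {
            recovering.put(txId, newUrl);
            return 200;
        }
        return 404;
    }

    public String participantUrl(String txId) {
        String url = inMemory.get(txId);
        return url != null ? url : recovering.get(txId);
    }
}
```

If the refresh step were skipped or its result not consulted, a logged transaction would be reported as 404 even though the coordinator keeps replaying it, which is the behaviour under investigation.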
-
11. Re: Issue using REST-AT
ajcmartins Mar 30, 2015 5:21 PM (in response to mmusgrov)
Great! I am glad that I could at least help point you in the right direction.
Cheers,
-
12. Re: Issue using REST-AT
mmusgrov Apr 1, 2015 12:54 PM (in response to ajcmartins)
Gytis has a fix. You can track our progress via JIRA: [JBTM-2356] REST-AT recovery failure - JBoss Issue Tracker