12 Replies Latest reply on Apr 28, 2008 7:19 AM by marklittle

    Distributed Transactions: what to do when coordinator crashes

      Hello,

      I have several participants in a distributed transaction and a coordinator. I use transaction bridging (written by Halliday), so each participant is a JTA transaction.

      I don't know what to do when the coordinator crashes (or a participant crashes). I would like to roll back the distributed transaction manually with a database command. This is really a database question for a database forum (I use Oracle), but maybe somebody here knows a solution.

      I can see the global transaction in the view v$global_transaction and the participants in the view v$transaction, but I don't know how to roll them back or kill them.

      The only workaround is to shut down the database :-). After that the views are empty and Oracle inserts one row into DBA_2PC_PENDING that describes the distributed transaction. The transaction is in state PREPARED. I can see the global_trans_id and I can issue the command 'rollback force <global_trans_id>'. But as I wrote, I can only do that after shutting down the database.

      Thank you for your help
      Pavel Kadlec
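For readers who end up with a row in DBA_2PC_PENDING as described in the post above, here is a minimal JDBC sketch of the manual 'rollback force' step. The class and method names (InDoubtCleanup, rollbackForceSql, preparedLocalIds, rollbackAllPrepared) are made up for illustration. It assumes the connected user can read DBA_2PC_PENDING and holds the FORCE TRANSACTION (or FORCE ANY TRANSACTION) privilege, and it uses the LOCAL_TRAN_ID column rather than the global id. Treat it as a sketch under those assumptions, not a tested recipe.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JBossTS or the Oracle driver.
final class InDoubtCleanup {

    /** Builds the ROLLBACK FORCE statement for one transaction id. */
    static String rollbackForceSql(String tranId) {
        // Reject quotes so the id cannot break out of the string literal.
        if (tranId == null || tranId.isEmpty() || tranId.contains("'")) {
            throw new IllegalArgumentException("bad transaction id: " + tranId);
        }
        return "ROLLBACK FORCE '" + tranId + "'";
    }

    /** Lists local ids of transactions that Oracle reports as prepared. */
    static List<String> preparedLocalIds(Connection conn) throws SQLException {
        List<String> ids = new ArrayList<>();
        String sql = "SELECT local_tran_id FROM dba_2pc_pending "
                   + "WHERE LOWER(state) = 'prepared'";
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                ids.add(rs.getString(1));
            }
        }
        return ids;
    }

    /** Forces a rollback of every prepared entry; returns how many were forced. */
    static int rollbackAllPrepared(Connection conn) throws SQLException {
        int forced = 0;
        for (String id : preparedLocalIds(conn)) {
            try (Statement st = conn.createStatement()) {
                st.execute(rollbackForceSql(id));
                forced++;
            }
        }
        return forced;
    }
}
```

Forcing a rollback is a heuristic decision made behind the coordinator's back, so it is only reasonable when you are sure the coordinator will never deliver a commit for that transaction.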

        • 1. Re: Distributed Transactions: what to do when coordinator cr
          jhalliday

          you don't need to shut it down, you just need to be patient. Oracle won't move the tx into the pending view until it hits a timeout, which IIRC is around 20 seconds by default.

          • 2. Re: Distributed Transactions: what to do when coordinator cr

            What is this timeout for? Could you point me to some docs? I waited 30 minutes and the global transaction is still in v$global_transaction. Its state is 'COLLECTING'. In v$transaction there is one participant's transaction, in state 'PREPARED'. The other participants' transactions disappeared (their state was 'ACTIVE').

            You should probably know that when I created the distributed transaction I used com.arjuna.mw.wst.UserTransaction.begin and set the transaction timeout to one day :-). I have a long business process (10 hours) running in that distributed transaction.


            • 3. Re: Distributed Transactions: what to do when coordinator cr
              jhalliday

              how are the participants connecting to the db? If they share user credentials, oracle may optimize multiple branches down to one.

              • 4. Re: Distributed Transactions: what to do when coordinator cr

                The participants share user credentials. They are connected through a datastore. The participants use Hibernate to persist entities.

                I don't know much about Oracle, so I don't know anything about this optimization or what it means.

                • 5. Re: Distributed Transactions: what to do when coordinator cr

                  I have some hints from an Oracle forum that this could be an Oracle bug related to XARecovery. I will let you know the results.

                  • 6. Re: Distributed Transactions: what to do when coordinator cr

                    I have done some debugging. The problem can arise after a participant has entered state 'PREPARED' and before it receives commit or rollback. If the participant or the coordinator crashes in this window, the participant hangs in state 'PREPARED' forever.

                    It looks like either a bug in Oracle or a bad Oracle configuration on my side. There is surely some timeout that limits how long a participant can stay in state 'PREPARED', but I don't know where to find it. Was this timeout the one mentioned above?

                    • 7. Re: Distributed Transactions: what to do when coordinator cr

                      Was this timeout the one mentioned above?

                      • 8. Re: Distributed Transactions: what to do when coordinator cr
                        jhalliday

                        > the participant hangs in state 'PREPARED' forever.

                        yup, it's supposed to. It's waiting on the coordinator telling it what to do. The real problem is that the current XTS coordinator does not have crash recovery enabled, so it's never going to send the participant a decision. Coordinator crash recovery will be coming along shortly with a bit of luck, it's one of our priorities for the JBossTS 4.4 release.
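To illustrate why a prepared participant is "supposed to" wait, here is a toy state machine for one two-phase-commit participant. It is not JBossTS or Oracle code; TwoPhaseParticipant and its methods are invented for this sketch. It only models the rule discussed in this thread: a local timeout may abort work that is still ACTIVE, but once the participant has voted in prepare, only a coordinator decision may end the transaction.

```java
// Toy model of one 2PC participant; all names here are hypothetical.
final class TwoPhaseParticipant {

    enum State { ACTIVE, PREPARED, COMMITTED, ROLLED_BACK }

    private State state = State.ACTIVE;

    State state() {
        return state;
    }

    /** Phase 1: vote yes and give up the right to decide alone. */
    void prepare() {
        if (state != State.ACTIVE) {
            throw new IllegalStateException("cannot prepare from " + state);
        }
        state = State.PREPARED;
    }

    /** Phase 2, coordinator decided commit. */
    void commit() {
        if (state != State.PREPARED) {
            throw new IllegalStateException("cannot commit from " + state);
        }
        state = State.COMMITTED;
    }

    /** Coordinator decided rollback (allowed before or after prepare). */
    void rollback() {
        if (state != State.ACTIVE && state != State.PREPARED) {
            throw new IllegalStateException("cannot roll back from " + state);
        }
        state = State.ROLLED_BACK;
    }

    /**
     * Local timeout handler. Returns true if the timeout aborted the work.
     * A prepared participant stays in doubt: it has promised to follow
     * the coordinator, so it must keep waiting.
     */
    boolean onTimeout() {
        if (state == State.ACTIVE) {
            state = State.ROLLED_BACK;
            return true;
        }
        return false;
    }
}
```

This matches what was observed earlier in the thread: the branches that were still 'ACTIVE' disappeared, while the one in 'PREPARED' stayed put because no coordinator was left to tell it the outcome.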

                        • 9. Re: Distributed Transactions: what to do when coordinator cr

                          Is there a workaround to roll back the participant? Is there a database command that rolls back the participant?

                          I have to find a workaround. Shutting down the database just so that I can roll back the participant is not an option for me.

                          • 10. Re: Distributed Transactions: what to do when coordinator cr

                            I ran the following test: after the participant had prepared, I killed the participant. I hoped I would be able to use the recover feature of com.arjuna.ats.internal.jta.transaction.arjunacore.jca.XATerminatorImple.

                             try {
                                 XATerminatorImple xaTerminatorImple = new XATerminatorImple();

                                 // Start a recovery scan and fetch the Xids of in-doubt transactions.
                                 Xid[] xids = xaTerminatorImple.recover(XAResource.TMSTARTRSCAN);

                                 // Roll back every transaction the scan returned (guard against null).
                                 if (xids != null) {
                                     for (Xid xid : xids) {
                                         xaTerminatorImple.rollback(xid);
                                     }
                                 }

                                 // End the recovery scan.
                                 xaTerminatorImple.recover(XAResource.TMENDRSCAN);
                             } catch (XAException e) {
                                 e.printStackTrace();
                             }

                            Unfortunately I got:

                            17:50:46,723 WARN [loggerI18N] 17:50:46,723 WARN [loggerI18N] [com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa] [com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa] Could not find new XAResource to use for recovering non-serializable XAResource < 131075, 30, 64, 1--3f57ff95:c83b:4811d097:50be
                            17:51:08,007 WARN [arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.BasicAction_54] - Top-level abort of action -3f57ff95:c83b:4811d097:50bf received TwoPhaseOutcome.FINISH_ERROR from <ClassName:RecordType.JTA_RECORD>
                            


                            I don't have any other ideas for a workaround...

                            Please, is there anything I can do about this situation?
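One variation on the attempt in this post, offered only as a sketch and not something the thread confirms works: ask the database's own XAResource for its in-doubt Xids through the standard javax.transaction.xa API and roll those back directly, instead of going through XATerminatorImple. XaManualRollback and rollbackInDoubt are hypothetical names. The XADataSource is assumed to be configured elsewhere, and its database user is assumed to have whatever privileges the driver needs for recover(). The format id filter uses 131075, the first number in the Xid from the log above, so that branches owned by other transaction managers are left alone.

```java
import java.sql.SQLException;
import javax.sql.XAConnection;
import javax.sql.XADataSource;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical helper built only on the standard XA interfaces.
final class XaManualRollback {

    /**
     * Rolls back every in-doubt branch whose Xid carries the given format id.
     * Returns the number of branches rolled back.
     */
    static int rollbackInDoubt(XAResource xaRes, int formatId) throws XAException {
        // A single call with both flags starts and ends the recovery scan.
        Xid[] xids = xaRes.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        if (xids == null) {
            return 0;
        }
        int rolledBack = 0;
        for (Xid xid : xids) {
            if (xid.getFormatId() != formatId) {
                continue; // not ours, leave it for its own transaction manager
            }
            xaRes.rollback(xid);
            rolledBack++;
        }
        return rolledBack;
    }

    /** Convenience overload that opens and closes an XA connection. */
    static int rollbackInDoubt(XADataSource ds, int formatId)
            throws SQLException, XAException {
        XAConnection xaConn = ds.getXAConnection();
        try {
            return rollbackInDoubt(xaConn.getXAResource(), formatId);
        } finally {
            xaConn.close();
        }
    }
}
```

Like 'rollback force', this overrides the coordinator, so it only makes sense when the coordinator is known to be gone for good.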

                            • 11. Re: Distributed Transactions: what to do when coordinator cr

                              I ran the code above on the same JBoss instance where the participant runs, after I had killed that whole JBoss instance and restarted it.

                              • 12. Re: Distributed Transactions: what to do when coordinator cr
                                marklittle

                                Wait for crash recovery to be enabled.