15 Replies. Latest reply on May 15, 2008 10:05 AM by marklittle

    Crash scenarios and recovery

    dimonv

      Hello all,

      There are certainly similar topics in this forum already. Each describes some kind of crash scenario that the Arjuna recovery manager is expected to recover from.

      My case is similar to the case described in
      http://www.jboss.com/index.html?module=bb&op=viewtopic&t=116486.
      I'm also simulating a network crash during the prepare phase:
      I run two XA DataSources connecting to Oracle 10g.
      At the end of the com.arjuna.ats.arjuna.coordinator.BasicAction.prepare() method I cut the JDBC connections. As a result:
      1. The prepare on the db is executed.
      2. The atomic action is written to the object store with the status COMMITTED (or similar, I'm not sure).
      3. The pending tx remains in the db.
      4. XARecoveryModule.xaRecovery() tries to recover the tx, but since the state of the atomic action remaining in the object store is considered to be okay, the rollback is not executed:

      if (!transactionLog((Xid) xids[j]))
      xares.rollback((Xid) xids[j]);

      5. The lock on the db remains :-(
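A minimal sketch of the rule behind the snippet in step 4, using only the standard javax.transaction.xa types. It is not the JBossTS source: the class name, the method name and the `hasTransactionLog` predicate are hypothetical stand-ins for XARecoveryModule and its transactionLog() check. In-doubt branches reported by the resource manager are rolled back only when the coordinator has no log entry for them; branches with a log entry are left alone so that top-down recovery can finish them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Illustrative only: not the JBossTS XARecoveryModule implementation. */
final class InDoubtScanSketch {

    /**
     * Asks the resource manager for its in-doubt branches and rolls back
     * those that have no coordinator log entry (presumed abort).
     *
     * @param hasTransactionLog hypothetical stand-in for transactionLog(xid)
     * @return the Xids that were rolled back
     */
    static List<Xid> rollbackOrphans(XAResource xares, Predicate<Xid> hasTransactionLog)
            throws XAException {
        List<Xid> rolledBack = new ArrayList<>();
        // A single call that both starts and ends the recovery scan.
        Xid[] xids = xares.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        if (xids == null) {
            return rolledBack;
        }
        for (Xid xid : xids) {
            if (!hasTransactionLog.test(xid)) {
                // No log entry: the coordinator never decided to commit.
                xares.rollback(xid);
                rolledBack.add(xid);
            }
            // A log entry exists: leave the branch for top-down recovery.
        }
        return rolledBack;
    }
}
```

Under this rule the behaviour in step 4 is what one would expect: a log entry exists, so the bottom-up scan does not roll the branch back.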

      Is my test scenario perhaps too hard?

      So my question is:
      Can anybody say which kinds of crashes (or their results) Arjuna's recovery can handle at all?
      This information would be very helpful, and could prevent further topics like this with similar crash tests.

      Thanks a lot in advance.

        • 1. Re: Crash scenarios and recovery
          jhalliday

          > Is my test scenario perhaps too hard?

          Nope, more likely you just don't have the correct config yet. Are you running inside JBossAS or standalone?

          • 2. Re: Crash scenarios and recovery
            dimonv

            Hi jhalliday,

            thanks for such a fast response.

            I'm running my test in JBoss 4.2.2 GA.
            I configured it as described in the JBossTS documentation.
            I configured JDBCXARecovery as well and tried to switch 1PC.

            Do you have any suspicions?

            • 3. Re: Crash scenarios and recovery
              jhalliday

              Ahh, that's your problem then. The standalone recovery config described in the JBossTS docs does not work for XADatasources deployed via -ds.xml files in JBossAS. You need the shiny new JBossAS specific recovery code.

              http://jira.jboss.com/jira/browse/JBTM-319

              • 4. Re: Crash scenarios and recovery
                dimonv

                I guess XARecovery is not the issue.
                Actually I'm using com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery.
                I extended this class with a connection validation and it works fine and provides pending Xids from db.
                The AtomicAction entry is also in the store but, as I wrote earlier, because of its status XARecoveryModule doesn't call rollback on xares.
                I think the issue is that the status of the object store entry has not been changed accordingly. I came to this conclusion while debugging, when I noticed that the following condition is false:

                if (!transactionLog((Xid) xids[j]))
                 xares.rollback((Xid) xids[j]);
                



                BTW, JBossTS version I'm using is 4.2.3SP5 shipped with JBoss 4.2.2GA.

                • 5. Re: Crash scenarios and recovery
                  jhalliday

                  > The AtomicAction entry is also in the store but, as I wrote earlier, because of its status XARecoveryModule doesn't call rollback on xares.

                  As far as I can tell from your description, it's not supposed to. It should call commit in this scenario. The reason is that it may have crashed after starting the second phase and sending a commit to a resource manager, but before seeing the response. It has to assume at least one resource manager committed and proceed accordingly. Rolling back the resource may cause a heuristic outcome. Is AtomicAction recovery working? It should really drive the tx recovery top down if it has a log entry for the tx. Mail me your test code, jbossjta-properties.xml and -ds.xml files if you get stuck; I'll take a look when I have time.
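The reasoning above can be sketched as follows, assuming nothing beyond the standard javax.transaction.xa API. The class, the `Outcome` enum and the way error codes are grouped are illustrative assumptions, not how JBossTS classifies them internally. Once a log entry exists, recovery replays commit on each prepared branch and decides from the XAException error code whether to retry later or to flag a heuristic.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Illustrative only: a hypothetical helper, not JBossTS code. */
final class ReplayCommitSketch {

    enum Outcome { COMMITTED, RETRY_LATER, HEURISTIC }

    /** Replays phase 2 for one prepared branch and classifies the result. */
    static Outcome replayCommit(XAResource xares, Xid xid) {
        try {
            // onePhase = false: the branch was already prepared.
            xares.commit(xid, false);
            return Outcome.COMMITTED;
        } catch (XAException e) {
            switch (e.errorCode) {
                case XAException.XAER_NOTA:
                    // Assumption: the RM no longer knows the branch because
                    // an earlier commit got through before the crash.
                    return Outcome.COMMITTED;
                case XAException.XA_HEURCOM:
                    // The RM committed on its own, which matches the decision.
                    return Outcome.COMMITTED;
                case XAException.XA_HEURRB:
                case XAException.XA_HEURMIX:
                case XAException.XA_HEURHAZ:
                    // The RM decided differently or partially: needs an operator.
                    return Outcome.HEURISTIC;
                case XAException.XA_RETRY:
                case XAException.XAER_RMFAIL:
                default:
                    // RM unavailable or unclear error: keep the log, try again
                    // on the next recovery pass.
                    return Outcome.RETRY_LATER;
            }
        }
    }
}
```

In this sketch a dead resource manager yields RETRY_LATER rather than a rollback, so the log entry stays in place until the resource manager is reachable again.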

                  • 6. Re: Crash scenarios and recovery
                    dimonv

                    You're right, it has to call commit.
                    During recovery iteration:
                    AtomicActionRecoveryModule.doRecoverTransaction() is called and it leads to the BasicAction.doCommit(RecordList rl, boolean reportHeuristics) call. But the instance attribute preparedList is empty. I guess this is why the AtomicAction is not committed during recovery.

                    • 7. Re: Crash scenarios and recovery
                      marklittle

                      Do you see any warnings during commit such as "activation failed for transaction " (paraphrasing)?

                      If there has been a heuristic outcome then it's possible for the preparedList to be empty but for the heuristic list to have 1..* entries within it. Heuristics must be resolved manually through the heuristic resolution tool.

                      • 8. Re: Crash scenarios and recovery
                        dimonv

                        After I dropped the JDBC connections at the end of prepare there was a warning:

                        14:45:33,265 WARN [loggerI18N] [com.arjuna.ats.internal.jta.resources.arjunacore.commitxaerror] [com.arjuna.ats.internal.jta.resources.arjunacore.commitxaerror] XAResourceRecord.commit - xa error XAException.XAER_RMFAIL

                        Afterwards, during the recovery iteration:
                        14:58:45,109 WARN [loggerI18N] [com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa] [com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa] Could not find new XAResource to use for recovering non-serializable XAResource < 131075, 27, 25, 49454553571025410156579958995656585256509710149505758549745535710254101565799589956565852565097101495057585553 >
                        14:58:45,109 WARN [arjLoggerI18N] [com.arjuna.ats.internal.arjuna.gandiva.inventory.StaticInventory_1] - cannot find null implementation.
                        14:59:49,484 WARN [arjLoggerI18N] [com.arjuna.ats.arjuna.recovery.RecoverAtomicAction_4] - RecoverAtomicAction: transaction -59f6e89c:c88:482ae129:6a not activated, unable to replay phase 2 commit


                        • 9. Re: Crash scenarios and recovery
                          marklittle

                          The first warning is because your RM is "dead".

                          The second is explained at http://wiki.jboss.org/wiki/TxNonSerializableXAResource.

                          • 10. Re: Crash scenarios and recovery
                            dimonv

                            >The first warning is because your RM is "dead".
                            Yes, it is. I killed it:-)
                            For recovery I'm using the XAResourceRecovery implementation
                            com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery

                            Here is the config:

                            <property name="com.arjuna.ats.jta.recovery.XAResourceRecovery1"
                             value="com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery;TrailDS"/>
                            


                            As I wrote, I extended it a bit with connection validation so that the XAResource is checked before use. It works fine.
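For readers wondering what such a validation step could look like, here is a minimal sketch. It is not the poster's extension and not AppServerJDBCXARecovery itself: the class name and the timeout parameter are hypothetical, and only standard javax.sql / java.sql calls are used. The idea is to keep one XAConnection, check it with Connection.isValid() before each use, and reopen it if the check fails.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.XAConnection;
import javax.sql.XADataSource;
import javax.transaction.xa.XAResource;

/** Illustrative only: hands out an XAResource backed by a validated connection. */
final class ValidatingXAResourceSource {
    private final XADataSource dataSource;
    private final int validationTimeoutSeconds; // hypothetical tuning knob
    private XAConnection cached;

    ValidatingXAResourceSource(XADataSource dataSource, int validationTimeoutSeconds) {
        this.dataSource = dataSource;
        this.validationTimeoutSeconds = validationTimeoutSeconds;
    }

    /** Returns an XAResource, reopening the connection if the cached one is dead. */
    synchronized XAResource getXAResource() throws SQLException {
        if (cached != null && !isUsable(cached)) {
            closeQuietly(cached);
            cached = null;
        }
        if (cached == null) {
            cached = dataSource.getXAConnection();
        }
        return cached.getXAResource();
    }

    private boolean isUsable(XAConnection xaConnection) {
        // getConnection() returns a logical handle; closing it does not
        // close the underlying physical connection.
        try (Connection handle = xaConnection.getConnection()) {
            return handle.isValid(validationTimeoutSeconds);
        } catch (SQLException e) {
            return false;
        }
    }

    private static void closeQuietly(XAConnection xaConnection) {
        try {
            xaConnection.close();
        } catch (SQLException ignored) {
            // The connection is already considered dead.
        }
    }
}
```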
                            By the way, regarding AppServerJDBCXARecovery: it presumes that the XADataSource uses a db user with DBA privileges, which in my opinion could be considered a security hole in a production environment:
                            http://jira.jboss.org/jira/browse/JBTM-319?page=comments

                            But I think, as you wrote, the problem is a heuristic outcome; there is a heuristic entry in the list.
                            Is there a way to avoid or prevent such a situation?

                            Which heuristic resolution tools did you mean in your previous message?

                            Thanks.

                            • 11. Re: Crash scenarios and recovery
                              jhalliday

                              > By the way, regarding AppServerJDBCXARecovery: it presumes that the XADataSource uses a db user with DBA privileges, which in my opinion could be considered a security hole in a production environment:

                              Actually it requires the user to have a small subset of DBA privileges needed for transaction recovery. If you want to use a different db user account for recovery then you simply deploy a different datasource specifically for that. There is no requirement that recovery use the same datasource as the production app, nor even that it runs in the same JBoss server.

                              • 12. Re: Crash scenarios and recovery
                                dimonv

                                >Actually it requires the user to have a small subset of DBA privileges needed for transaction recovery.

                                Yes, you're right. But it is still a bit more than just read/write on the application's db schema.

                                >If you want to use a different db user account for recovery then you simply deploy a different datasource specifically for that.

                                From my point of view, that's another drawback of the current recovery configuration: it depends on the deployment. The content of the deploy directory can change; new applications with new XA datasources can be deployed and want to benefit from the recovery features. The recovery configuration in the conf directory (infrastructure) then has to be adjusted as well.
                                I'm looking for a solution to this and am thinking of a JBoss service that provides the XAResourceRecovery with the required information, such as the JNDI name of the XA datasource.

                                • 13. Re: Crash scenarios and recovery
                                  jhalliday

                                  Yes, the new recovery process on the drawing board for AS 5.0 relies on the datasource deployers providing XAResources directly to the recovery system, so the recovery code does not need to mess around with datasources or such, nor even need to know in advance which datasources exist. We just can't do that on the AS 4.x maintenance branch as it will involve API changes and hence not be backwards compatible. Therefore in the short term we have a less than ideal interim solution.
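A rough sketch of the idea described above, under stated assumptions: it is not the AS 5.0 API, and every name in it (the registry class, register/unregister, resourcesForRecoveryPass) is hypothetical. Deployers register a supplier of XAResources when a datasource is deployed and remove it on undeploy; the recovery pass simply asks the registry for whatever is currently available, so it never needs a static list of datasources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import javax.transaction.xa.XAResource;

/** Illustrative only: a hypothetical registry, not the AS 5.0 recovery API. */
final class XAResourceRegistrySketch {
    // Keyed by a deployment name, for example the datasource's JNDI name.
    private final Map<String, Supplier<XAResource>> providers = new ConcurrentHashMap<>();

    /** Called by a deployer when a datasource is deployed. */
    void register(String name, Supplier<XAResource> provider) {
        providers.put(name, provider);
    }

    /** Called by a deployer when a datasource is undeployed. */
    void unregister(String name) {
        providers.remove(name);
    }

    /** Called by the recovery system at the start of each recovery pass. */
    List<XAResource> resourcesForRecoveryPass() {
        List<XAResource> resources = new ArrayList<>();
        for (Supplier<XAResource> provider : providers.values()) {
            XAResource resource = provider.get();
            if (resource != null) {
                // A provider may return null if it has nothing to offer right now.
                resources.add(resource);
            }
        }
        return resources;
    }
}
```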

                                  • 14. Re: Crash scenarios and recovery
                                    dimonv

                                    Thanks, that's good news.

                                    I would like to switch to my "sheep" now :-)

                                    Do you have any suggestions regarding how to deal with the "dead" RM and remaining heuristics or how can I avoid/prevent them?
                                    What kind of heuristic resolution tools can be used to resolve db locks?

                                    Thanks in advance
