4 Replies Latest reply on Apr 1, 2011 4:44 PM by rhauch

    Trouble with ReferentialIntegrityException and importXML

    sjahan

      Hi everyone,

       

      I'm currently encountering a strange issue with the importXML method:

       

      I have an XML file exported using session.exportDocumentView(). Everything seems fine: the XML is properly filled in.

      The trouble comes when I try to import it. If no node exists with the same UUID, everything is OK and the nodes are imported correctly. But if the node I want to import already exists, the import fails.

       

      I'm using the IMPORT_UUID_COLLISION_REMOVE_EXISTING behavior, but I always get a ReferentialIntegrityException. If I then try to delete the node the usual way (node.remove()), I get the same exception.

      So the node seems to be completely corrupted. More interesting still, restarting JBoss doesn't help: I keep getting the same exception, so I guess the node is corrupted for good. I have to remove some higher-level nodes and re-add them, which makes testing really painful. Is there a way to get rid of this exception without erasing half my tree and re-adding it?

       

      I read up on this exception, and it appears that I cannot remove a node while it is still referenced. OK, but I don't have a clue which reference this is about, because I do nothing except log in, call importXML, and save the session, and it always goes wrong. I would like to know which object is holding a reference to the node I want to remove, and possibly remove that reference myself.

      Is it possible?

       

      I assume I'm missing something obvious. Could you please help me with this?

       

      Thank you very much in advance!

       

      SJ.

        • 1. Re: Trouble with ReferentialIntegrityException and importXML
          rhauch

          I'm using the IMPORT_UUID_COLLISION_REMOVE_EXISTING behavior, but I always get a ReferentialIntegrityException. If I then try to delete the node the usual way (node.remove()), I get the same exception.

          This could very well be the correct behavior w/r/t JSR-283. For example, let's say you imported this XML file and some of its nodes have 'jcr:uuid' properties (i.e., they are "mix:referenceable"); let's call these "mix:referenceable" nodes A, B, and C. After you import the file, you then create one other node (we'll call it D) that references node B.

           

          You then attempt to import the same XML file with the IMPORT_UUID_COLLISION_REMOVE_EXISTING option, which means that during the import process the repository should remove any nodes that have the same UUIDs as nodes being imported. In other words, the import process will need to remove the existing A, B, and C nodes before it can import these nodes from the XML file.

           

          In this example, however, the importer cannot remove node B because node D still has a REFERENCE property pointing to node B. (Note that it would not be a problem if node D had used a WEAKREFERENCE property.)
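
          The rule in this scenario can be sketched with a toy in-memory model. This is not the JCR API: ToyRepository, its methods, and the use of IllegalStateException in place of ReferentialIntegrityException are all hypothetical names chosen for illustration. It only mimics the idea that a strong REFERENCE blocks removal of its target while a WEAKREFERENCE does not.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model, NOT the JCR API. It only mimics the rule that a node
// cannot be removed while a strong REFERENCE property still points at it.
class ToyRepository {
    enum RefType { REFERENCE, WEAKREFERENCE }

    // One reference property: the node that holds it, the node it targets, its type.
    static final class Ref {
        final String owner;
        final String target;
        final RefType type;

        Ref(String owner, String target, RefType type) {
            this.owner = owner;
            this.target = target;
            this.type = type;
        }
    }

    private final List<String> nodes = new ArrayList<>();
    private final List<Ref> refs = new ArrayList<>();

    void addNode(String name) {
        nodes.add(name);
    }

    void addReference(String owner, String target, RefType type) {
        refs.add(new Ref(owner, target, type));
    }

    boolean contains(String name) {
        return nodes.contains(name);
    }

    // Strong references block removal; weak references do not.
    // IllegalStateException stands in for ReferentialIntegrityException here.
    void remove(String name) {
        for (Ref r : refs) {
            if (r.target.equals(name) && r.type == RefType.REFERENCE) {
                throw new IllegalStateException(name + " is still referenced by " + r.owner);
            }
        }
        nodes.remove(name);
        // References held by the removed node disappear with it.
        refs.removeIf(r -> r.owner.equals(name));
    }
}
```

          With nodes A, B, C, and D and a REFERENCE from D to B, remove("B") throws and B stays in place; with a WEAKREFERENCE from D to B, the same call succeeds.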

          So the node seems to be completely corrupted. More interesting still, restarting JBoss doesn't help: I keep getting the same exception, so I guess the node is corrupted for good.

          As I pointed out in my example scenario above, the nodes are not necessarily corrupted. However, there are other scenarios that would indicate a bug.

           

          For example, consider an XML file that contains "mix:referenceable" nodes (e.g., nodes A, B and C) and other nodes that have REFERENCE properties pointing to nodes within this file (e.g., E has a reference to A, F has a reference to C). If you import this XML file into an empty repository (with the necessary node types already registered) and then immediately re-import the same XML file using the IMPORT_UUID_COLLISION_REMOVE_EXISTING option, you should not get this exception. If this is your case, then there is a problem in ModeShape's importer, so please log a bug in our JIRA and attach an XML file that can replicate the situation. We already have tests that check this kind of import, so if you're seeing this problem then we must not be handling some specific condition properly. Having a file that causes this problem will make it vastly easier to fix.

           

          I would like to know which object is holding a reference to the node I want to remove, and possibly remove that reference myself.

          Is it possible?

          The best way to know if there is a problem is to find out which nodes are referencing the nodes that you want to delete. There are two ways to do this.

           

          The JCR API provides a "getReferences()" method on javax.jcr.Node that returns all of the REFERENCE javax.jcr.Property objects that point to the node. So in my earlier example, calling "getReferences()" on node B will return node D's reference property. Unfortunately, to find all of the REFERENCE properties that point to nodes within some subgraph, you'll have to walk the subgraph and call "getReferences()" on each node in it. Note that calling "getReferences()" on a node that is not "mix:referenceable" is perfectly acceptable and will return an empty iterator. This is probably the easiest way.
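
          The subgraph walk could look roughly like the sketch below. To keep it runnable without a repository, it uses a small in-memory stand-in: ToyNode and ReferenceWalker are hypothetical classes, not part of JCR or ModeShape. Against a real repository, the two loops would iterate over the results of javax.jcr.Node.getReferences() and javax.jcr.Node.getNodes() instead of the plain lists used here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for javax.jcr.Node; it holds only what the walk needs.
class ToyNode {
    final String path;
    final List<ToyNode> children = new ArrayList<>();
    // Paths of REFERENCE properties (held by other nodes) that point at this node.
    final List<String> referencedBy = new ArrayList<>();

    ToyNode(String path) {
        this.path = path;
    }

    ToyNode addChild(String name) {
        String childPath = path.equals("/") ? "/" + name : path + "/" + name;
        ToyNode child = new ToyNode(childPath);
        children.add(child);
        return child;
    }
}

class ReferenceWalker {
    // Depth-first walk that records "referring property -> referenced node"
    // for the given node and every node below it.
    static List<String> findReferences(ToyNode root) {
        List<String> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(ToyNode node, List<String> found) {
        // Real JCR: iterate over node.getReferences()
        for (String propertyPath : node.referencedBy) {
            found.add(propertyPath + " -> " + node.path);
        }
        // Real JCR: iterate over node.getNodes()
        for (ToyNode child : node.children) {
            collect(child, found);
        }
    }
}
```

          Each entry in the result names both the referring property and the node it targets, which is the information needed to decide which reference to remove before retrying the import.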

           

          A second way to find references is to use ModeShape's query facility and a JCR-SQL2 query with a subquery. (This query will not work in other JCR repositories unless they also support subqueries.) Here's the query:

           

           

          SELECT [jcr:path] FROM [nt:base] WHERE REFERENCE() IN ( 
             SELECT [jcr:uuid] FROM [mix:referenceable] 
             WHERE ISDESCENDANTNODE([mix:referenceable],'/x/y/z') )

           

          This query finds the paths of all nodes that have a REFERENCE property (of any name) with a UUID value that is among those returned by the subquery. The subquery finds the UUIDs of all 'mix:referenceable' nodes that exist under the '/x/y/z' path (note this excludes the '/x/y/z' node itself). You can remove the "ISDESCENDANTNODE" criterion from the subquery if you want to find all nodes that reference other nodes.

           

          Unfortunately, the query listed above will only return the paths of the nodes that have references to nodes within the specified subgraph. To figure out which nodes are referenced or which REFERENCE properties are the culprit, you'll have to get the nodes and do a bit of investigation.

           

          Hopefully this helps.

          • 2. Trouble with ReferentialIntegrityException and importXML
            sjahan

            Thank you Randall,

             

            I finally assumed that this was the normal behavior, since I found the following option:

            <performReferentialIntegrityChecks jcr:primaryType="mode:option" mode:value="false"/>

            I put this in the modeshape-repositories.xml file and it works fine now!

            Since we wrap ModeShape's methods behind our own front-end methods, we shouldn't run into integrity breaches.

             

            I'd like to take the opportunity of this "import" thread to ask you about encoding.

            As I'm French (sorry about that), we have accented characters. They cause a problem during import:

             

            18:07:28,477 ERROR [STDERR] [Fatal Error] :1:30597: Invalid byte 2 of 3-byte UTF-8 sequence.

            18:07:28,477 ERROR [STDERR] javax.jcr.InvalidSerializedDataException: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.

            18:07:28,478 ERROR [STDERR]     at org.modeshape.jcr.JcrSession.importXML(JcrSession.java:965)

            18:07:28,478 ERROR [STDERR]     at org.oea.jcr.management.impl.JCRManagerImpl.importJCRData(JCRManagerImpl.java:1117)

             

            I just took a look at the source code, and the problem could come from the emit method in the StreamingContentHandler class. When the bytes are obtained from the string, maybe it should call getBytes("UTF-8") instead of just getBytes(), but I'm no Java expert. In any case, the accented characters are unreadable in the output file.
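
            A minimal sketch of why the charset argument matters, independent of ModeShape. It assumes, purely for illustration, that the exporting JVM's default charset was ISO-8859-1; EncodingSketch and its methods are hypothetical names. In ISO-8859-1 the character é is the single byte 0xE9, which a UTF-8 parser reads as the start of a three-byte sequence, so a following plain ASCII byte is invalid. That would be consistent with the "Invalid byte 2 of 3-byte UTF-8 sequence" error above.

```java
import java.nio.charset.StandardCharsets;

class EncodingSketch {
    // Encode with an explicit charset so the result does not depend on the JVM default.
    // (StandardCharsets needs Java 7+; on older JVMs getBytes("UTF-8") does the same.)
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Simulates a platform whose default charset is ISO-8859-1 (an assumption).
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u00e9t\u00e9"; // the French word for "summer"

        System.out.println(toUtf8(text).length);   // 5: each accented e takes two bytes
        System.out.println(toLatin1(text).length); // 3: one byte per character

        // Reading ISO-8859-1 bytes as UTF-8 does not give back the original text.
        String misread = new String(toLatin1(text), StandardCharsets.UTF_8);
        System.out.println(text.equals(misread));  // false
    }
}
```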

             

            Anyway, thank you very much and have a good weekend!

             

            SJ.

            StreamingContentHandler.java

            • 3. Re: Trouble with ReferentialIntegrityException and importXML
              rhauch

              I finally assumed that this was the normal behavior, since I found the following option:

              <performReferentialIntegrityChecks jcr:primaryType="mode:option" mode:value="false"/>

              I put this in the modeshape-repositories.xml file and it works fine now!

              That option simply turns off all referential integrity checks, and basically makes ModeShape treat all of your REFERENCE properties as if they were WEAKREFERENCE properties.

               

              I'd like to take the opportunity of this "import" thread to ask you about encoding.

              As I'm French (sorry about that), we have accented characters. They cause a problem during import:

               

              18:07:28,477 ERROR [STDERR] [Fatal Error] :1:30597: Invalid byte 2 of 3-byte UTF-8 sequence.

              18:07:28,477 ERROR [STDERR] javax.jcr.InvalidSerializedDataException: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.

              18:07:28,478 ERROR [STDERR]     at org.modeshape.jcr.JcrSession.importXML(JcrSession.java:965)

              18:07:28,478 ERROR [STDERR]     at org.oea.jcr.management.impl.JCRManagerImpl.importJCRData(JCRManagerImpl.java:1117)

               

              I just took a look at the source code, and the problem could come from the emit method in the StreamingContentHandler class. When the bytes are obtained from the string, maybe it should call getBytes("UTF-8") instead of just getBytes(), but I'm no Java expert. In any case, the accented characters are unreadable in the output file.

               

              Thanks, SJ. I've created MODE-1137 to record this issue. I should have it fixed very soon.

               

              Anyway, thank you very much and have a good weekend!

               

              Thanks, and you, too!

              • 4. Trouble with ReferentialIntegrityException and importXML
                rhauch

                I've committed and merged the fix to MODE-1137 into our 'master' branch, and have marked the issue as resolved. If you get a chance to test with the latest and find it is still a problem, please reopen MODE-1137.

                 

                Best regards