6 Replies Latest reply on Dec 11, 2014 7:05 AM by bwallis42

    Very slow node delete in 2.8.1

    bwallis42

      Hi, we have a system in production using ModeShape 2.8.1 that has a little bug in it. It creates new node trees for incoming data, but the code to delete them once it has finished with the data was accidentally omitted from the system.

       

      We now have some thousands of SNS nodes at the top level (schema attached, top level node is inf:document) and it is taking many minutes to insert a new document. We have a utility program to try to delete the nodes but each delete is taking 30 minutes! We cannot delete the unused nodes as fast as new ones are being created.

       

      What I don't understand is why it is taking so long for each delete. There are only about 5-10 child or grandchild nodes under each top level node but we do have 10000 - 15000 top level nodes.

       

      Any suggestions how I can clean up this mess would be most appreciated!

       

      The code we are using to do the deletes works on the top level nodes: we get a list of the node IDs and then loop through them to remove each one.

       

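       // Find the top level nodes that are no longer needed.
       List<NodeDTO> unusedDocuments = searchUnusedDocuments(repoAddress, repoName, username, password, limit);
       session = getJcrRepository(repoAddress, repoName)
                   .login(new SimpleCredentials(username, password.toCharArray()));

       // Mark every unused node for removal in the session's transient state...
       for (NodeDTO unusedDocument : unusedDocuments)
       {
           session.getNodeByIdentifier(unusedDocument.getIdentifier()).remove();
       }
       // ...then persist all of the removals with a single save.
       session.save();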
      
      

       

      Simple enough but very very slow. This code is running inside of a JBoss 6.1 appserver (old, not the EAP).

       

      thanks.

        • 1. Re: Very slow node delete in 2.8.1
          bwallis42

          Just noted this in the documentation:

           

          If a node with same-name siblings is removed, this decrements by one the indices of all the siblings with indices greater than that of the removed node. In other words, a removal compacts the array of same-name siblings and causes the minimal re-numbering required to maintain the original order but leave no gaps in the numbering.

           

          Would my deleting go somewhat faster if I started with the highest possible numbered node? i.e. the node where Node.getIndex() has the highest value?

          • 2. Re: Very slow node delete in 2.8.1
            rhauch

            When possible, delete child nodes that are at the end of the list of children. For child nodes that have the same name (e.g., are same name siblings), deleting any node but the last SNS will be expensive, since all SNS following the removed node have to be updated.

             

            (This behavior in 3.x and 4.x is much improved, since SNS indexes are computed on the fly rather than managed.)
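
            To make the renumbering cost concrete, here is a minimal sketch of a toy model. It is an assumption-laden illustration, not ModeShape's implementation: the class SnsCompactionDemo and its methods are hypothetical, and it only counts how many stored indexes would need rewriting when one same-name sibling is removed and the rest are compacted as the documentation describes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not ModeShape code) of a parent node whose
 * same-name siblings each carry a stored 1-based index.
 */
public class SnsCompactionDemo {

    /** Stored indexes of the siblings, in document order. */
    private final List<Integer> indexes = new ArrayList<>();

    public SnsCompactionDemo(int siblingCount) {
        for (int i = 1; i <= siblingCount; i++) {
            indexes.add(i);
        }
    }

    /**
     * Removes the sibling with the given 1-based index and renumbers every
     * sibling that followed it so that no gap remains.
     *
     * @return how many siblings had to be renumbered
     */
    public int remove(int snsIndex) {
        int position = snsIndex - 1;
        indexes.remove(position); // position is an int, so this removes by list position
        int renumbered = 0;
        for (int i = position; i < indexes.size(); i++) {
            indexes.set(i, i + 1);
            renumbered++;
        }
        return renumbered;
    }

    public int size() {
        return indexes.size();
    }

    public static void main(String[] args) {
        SnsCompactionDemo removeFirst = new SnsCompactionDemo(10_000);
        System.out.println("Removing the first sibling renumbers "
                + removeFirst.remove(1) + " siblings");

        SnsCompactionDemo removeLast = new SnsCompactionDemo(10_000);
        System.out.println("Removing the last sibling renumbers "
                + removeLast.remove(10_000) + " siblings");
    }
}
```

            In this model, removing the first of N siblings touches N - 1 entries while removing the last touches none. Whether that difference dominates in a real 2.8.1 repository depends on other costs that the model ignores.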

            • 3. Re: Very slow node delete in 2.8.1
              bwallis42

              Thanks Randall for confirming what I suspected after reading the doco on the remove function. We will try re-ordering our delete operations.

              • 4. Re: Very slow node delete in 2.8.1
                bwallis42

                Is it possible to order the results of a query by the SNS index, in descending order? I've been reading about queries in JCR but cannot see how to do this, since the index doesn't seem to be available as a property on the nodes.

                 

                thanks.

                • 5. Re: Very slow node delete in 2.8.1
                  rhauch

                  We do not expose the SNS index as a pseudocolumn, so at the moment there is no way to order by SNS index. However, you can order by 'jcr:name', as this pseudocolumn contains the local part and SNS index (if there is one).
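
                  If ordering in the query does not give a numeric order (a string comparison would put "[10]" before "[9]"), one alternative is to sort on the client. The sketch below assumes the node paths have already been collected, for example from Node.getPath(), and relies only on the standard JCR path syntax in which a same-name-sibling index appears in square brackets and an index of 1 may be omitted. The class SnsPathSorter and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that orders JCR paths by the SNS index of their last segment. */
public class SnsPathSorter {

    /** Matches a trailing "[n]" on the last path segment. */
    private static final Pattern TRAILING_INDEX = Pattern.compile("\\[(\\d+)\\]$");

    /** Returns the SNS index of the last segment, or 1 when the path carries none. */
    public static int snsIndex(String path) {
        Matcher matcher = TRAILING_INDEX.matcher(path);
        return matcher.find() ? Integer.parseInt(matcher.group(1)) : 1;
    }

    /** Returns a new list with the highest SNS index first; the input is not modified. */
    public static List<String> sortByIndexDescending(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort(Comparator.comparingInt(SnsPathSorter::snsIndex).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/inf:document[9]", "/inf:document", "/inf:document[10]");
        for (String path : sortByIndexDescending(paths)) {
            System.out.println(path);
        }
    }
}
```

                  When Node objects are already in hand, comparing on Node.getIndex() directly avoids the path parsing.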

                  • 6. Re: Very slow node delete in 2.8.1
                    bwallis42

                    We have successfully run our node deletions.

                     

                    We found that deleting 100 nodes and then calling save() takes only about 2-3 times longer than deleting a single node, and the time per batch decreased as we deleted more. We are now down from something like 2800 nodes to about 150, and performance is back to normal.
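
                    For anyone wanting to try the same approach, here is a minimal sketch of the batching logic. It is not our production code. The NodeStore interface is a hypothetical stand-in for the two JCR calls involved, so that the loop can be run and tested without a repository; a real adapter would call session.getNodeByIdentifier(id).remove() and session.save() and wrap any RepositoryException.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of removing nodes in fixed-size batches, saving once per batch. */
public class BatchedDelete {

    /** Hypothetical stand-in for the JCR session calls used by the delete loop. */
    public interface NodeStore {
        /** In JCR terms: session.getNodeByIdentifier(identifier).remove(). */
        void remove(String identifier);

        /** In JCR terms: session.save(). */
        void save();
    }

    /**
     * Removes every identified node, calling save() after each full batch and
     * once more for any remainder.
     *
     * @return the number of save() calls made
     */
    public static int removeInBatches(NodeStore store, List<String> identifiers, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int saves = 0;
        int pending = 0;
        for (String identifier : identifiers) {
            store.remove(identifier);
            pending++;
            if (pending == batchSize) {
                store.save();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) { // persist the final, partially filled batch
            store.save();
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            ids.add("id-" + i);
        }
        // A do-nothing store, used only to show the call pattern.
        NodeStore noOp = new NodeStore() {
            @Override public void remove(String identifier) { }
            @Override public void save() { }
        };
        System.out.println("save() calls: " + removeInBatches(noOp, ids, 100));
    }
}
```

                    The batch size of 100 is simply the value we happened to use; the best value for another repository is something to measure.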

                     

                    Deleting from the highest numbered node did not seem to help with the performance.
