1 Reply Latest reply on Sep 13, 2013 9:20 AM by rhauch

    Merge operation fails to correctly process children that are same-name siblings, and does not correctly clone and remove child nodes

    ashelepko

      Hello,

       

      In my attempts to get merge() working, i've stumbled upon additional several issues. This time, with processing child nodes.

       

      First, some snippets from JCR 2.0 specification on merge and update:

      10.8.2 Getting a Corresponding Node

      Finding the path of a node's corresponding node in another workspace is done with

      String Node.getCorrespondingNodePath(String workspaceName).

      This method returns the absolute path of the node in the specified workspace that corresponds to this node

       

      10.8.3 Updating Nodes Across Workspaces

      Node correspondence governs the behavior of the update method:

      void Node.update(String srcWorkspace)

      causes this node to be updated to reflect the state of its corresponding node in srcWorkspace.

      If this node does have a corresponding node in the workspace srcWorkspace, then this replaces this node and its subgraph with a clone of the corresponding node and its subgraph.

      If this node does not have a corresponding node in srcWorkspace, then the method has no effect.

      If the update succeeds, the changes made to this node and its subgraph are applied to the workspace immediately, there is no need to call save.

      The update method does not respect the checked-in status of nodes. An update may change a node even if it is currently checked-in.

      15.9 Merge

      Note that a deep merge performed on a subgraph with no versionable nodes at all (or indeed in a repository that does not support versioning in the first place) will be equivalent to an update.

       

      ...


      15.9.1 Merge Algorithm


      The above declarative description can also be expressed in pseudo-code as follows:

       

      ...

       

      doupdate(n, n')

      replace set of properties of n with those of n'.

      let S be the set of child nodes of n.

      let S' be the set of child nodes of n'.

      judging by the name of the child node:

      let C be the set of nodes in S and in S'

      let D be the set of nodes in S but not in S'.

      let D' be the set of nodes in S' but not in S.

      remove from n all child nodes in D.

      for each child node of n' in D' copy it (and its subgraph) to n
      as a new child node (if an incoming node has the same

      identifier as a node already existing in this workspace,

      the already existing node is removed).

      for each child node m of n in C domerge(m).

       

      The above algorithm is implemented in JcrVersionManager$MergeCommand.doUpdate(AbstractJcrNode targetNode, AbstractJcrNode sourceNode)

      It appears to have the following issues:

      1. The sets of child nodes C, D and D' are created by comparing nodes by name (LinkedHashMap of names to nodes). IMHO, while this follows the specification pseudocode, this is simply not true:
        • This does not work for children that have the same name. Sets/maps will contain only one child, because all children have the same name.
        • The spec states that merge() should behave itself as update() for unversioned nodes, and update() uses node correspondence semantics. It's not a totally valid point for versioned nodes, but the code won't work for unversioned ones as well.
      2. Node removal for nodes from set D does not work for nodes that are checked in.
      3. IMHO, nodes from set D' should be cloned, not copied:
        • Once again, merge should behave itself as update(), and update() adds new nodes with clone(), not copy().
        • Cloned nodes have the same ID and version history as original. This fits well with
          • Merge() use cases. Merge is intended to work on nodes which share the same version history.
          • Node correspondence semantics. If we try to run the same merge again, or do a reverse-merge, we'll get new child nodes for each call. This is incorrect.
      4. Adding new nodes from set D' does not work for nodes that are checked in.

       

      I've created MODE-2006 and attached a pull request to it that fixes the above issues:

      1. Use node correspondence semantics in creating sets of nodes instead of using node names;
      2. Add a new package-private AbstractJcrNode.internalRemove() method that allows skipping the version check, and use it to remove nodes;
      3. Replaced workspace.copy() with workspace.clone()
      4. Add a new package-private JcrWorkspace.internalClone() method that allows skipping the version check, and use it to clone nodes.

       

      Thanks.