2 Replies Latest reply on Aug 30, 2012 9:57 AM by Eric Wittmann

    Should we use properties or sub-nodes to model a Map?

    Eric Wittmann Apprentice

      In S-RAMP we have nodes in ModeShape where we need to store (basically) a java.util.Map<String, String> of data.  Randall suggested using "residual properties" to store the data.  Going a bit further, I figured I could use residual properties within a custom namespace.  That way it would be easy for me to quickly pull out just the data corresponding to the Map.

       

      However, I suspect we could also use child nodes to model this data.

       

      I was wondering - are there any significant functional or performance differences between these two approaches?

        • 1. Re: Should we use properties or sub-nodes to model a Map?
          Randall Hauch Master

          Storing all of the map entries as properties on a single node is perhaps the most straightforward. The node simply needs a primary type or mixin type that allows residual property definitions.

           

          1. accessing each map entry is little more than a map lookup by property (key) name (there is additional JCR validation logic, though that's done as lazily as we can make it)
          2. when querying the nodes, you can easily apply criteria to the properties and get all the property values in the result set (see below)
          3. large string values in the map entries are automatically handled as "large values" in ModeShape; that is, they're stored in the same way that BINARY values are stored
          4. this approach performs slightly worse if there are many hundreds of entries (or more) in the map.
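
As a rough illustration of the custom-namespace idea from the question, here is a minimal sketch of pulling just the map entries back out of a node's properties by prefix. It works on a plain java.util.Map of property names to values so it stays self-contained; the `map:` prefix and the class name are hypothetical. Against a real repository one would probably read the properties with the JCR call `Node.getProperties(String namePattern)` using a pattern such as "map:*" instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: recover the logical Map from a node's properties by namespace prefix. */
final class PrefixedMapProperties {

    // Hypothetical prefix reserved for map entries; not defined by S-RAMP or ModeShape.
    static final String PREFIX = "map:";

    /** Keeps only entries whose property name starts with PREFIX, with the prefix stripped. */
    static Map<String, String> extractMap(Map<String, String> allProperties) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : allProperties.entrySet()) {
            String name = e.getKey();
            if (name.startsWith(PREFIX)) {
                result.put(name.substring(PREFIX.length()), e.getValue());
            }
        }
        return result;
    }
}
```

Properties outside the prefix (for example jcr:primaryType) are simply ignored, which is the main convenience of reserving a namespace for the map.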

           

          Using child nodes for each entry in the map is a more heavyweight approach:

           

          1. each entry in the map would be a separate child (where each child is stored separately), and getting each entry would require a separate lookup in the Infinispan cache
          2. each node would be small, so the relative overhead per node becomes greater, too
          3. querying the "map" is at best a bit harder but at worst very, very difficult (if not impossible) depending upon what you want to do in your query (see below)
          4. large string values in the map entries are automatically handled as "large values" in ModeShape; that is, they're stored in the same way that BINARY values are stored
          5. this approach performs better if there are many hundreds of entries (or more) in the map.

           

          So, most of the difference is in how your client code accesses the repository and in performance/scalability. However, the ability to query is very different.

           

          In the case of storing all the map entries on a single node, queries are very straightforward and natural:

           

          SELECT * FROM [acme:map] WHERE prop1 = 'value1' OR ( prop2 = 'value2' AND prop3 = 'value3' )
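
A query like the one above can also be generated from a set of required entries. The following is only a sketch: the class and method names are hypothetical, it handles just the "all entries must match" (AND) case, and it assumes JCR-SQL2 conventions of bracketed property names and single quotes doubled inside string literals.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Sketch: build a JCR-SQL2 query over residual properties on a single node. */
final class SingleNodeMapQuery {

    /** Doubles single quotes so the value can be used as a string literal. */
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Matches [acme:map] nodes whose properties equal every given entry. */
    static String allOf(Map<String, String> required) {
        String select = "SELECT * FROM [acme:map]";
        if (required.isEmpty()) {
            return select; // no criteria, so no WHERE clause
        }
        StringJoiner where = new StringJoiner(" AND ");
        for (Map.Entry<String, String> e : required.entrySet()) {
            // Bracket the property name so names containing ':' stay valid.
            where.add("[" + e.getKey() + "] = " + literal(e.getValue()));
        }
        return select + " WHERE " + where;
    }
}
```

Every criterion lands in the same WHERE clause against one selector, so the query stays flat no matter how many entries are tested.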
          

           

          However, if map entries were stored on separate children, then queries become far less natural and require joins. Plus, the results will come back as separate rows for each entry:

           

          SELECT a.value FROM [acme:map] AS parent JOIN [acme:mapEntry] AS a ON ISCHILDNODE(a,parent)
          

           

          Adding simple OR criteria is fairly easy, but AND criteria require a separate join for each child, which also makes it far more difficult to get all of the child nodes.
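
To make the cost of AND criteria concrete, here is a sketch that generates the child-node form of the query with one join per required entry. It rests on assumptions that are not in the thread: each child is named after its map key (tested with NAME()), the value lives in a property called value as in the query above, and the aliases e1, e2, ... plus the class and method names are made up for the example.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Sketch: build a JCR-SQL2 query when each map entry is a child node. */
final class ChildNodeMapQuery {

    /** Doubles single quotes so the value can be used as a string literal. */
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Matches [acme:map] parents that have a matching child for every given entry. */
    static String allOf(Map<String, String> required) {
        StringBuilder from =
                new StringBuilder("SELECT parent.[jcr:path] FROM [acme:map] AS parent");
        StringJoiner where = new StringJoiner(" AND ");
        int i = 0;
        for (Map.Entry<String, String> e : required.entrySet()) {
            String alias = "e" + (++i);
            // Each ANDed entry needs its own selector, hence its own join.
            from.append(" JOIN [acme:mapEntry] AS ").append(alias)
                .append(" ON ISCHILDNODE(").append(alias).append(", parent)");
            where.add("(NAME(" + alias + ") = " + literal(e.getKey())
                    + " AND " + alias + ".[value] = " + literal(e.getValue()) + ")");
        }
        return i == 0 ? from.toString() : from + " WHERE " + where;
    }
}
```

The number of joins grows with the number of criteria, whereas the single-node form only grows its WHERE clause.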

           

          In short, if queries are important, I'd really try using residual properties on a single node. If queries are not important, then the choice becomes less clear.