2 Replies Latest reply on Apr 28, 2013 12:10 AM by sahar

Content Modeling in ModeShape For n:n or 1:n Relationship between nodes

sahar Apr 23, 2013 4:51 AM

Hi, I'm a newbie to ModeShape and I want to have some nodes that are related to each other, like this:

User node and Email node

Each user can send/receive several emails to/from several users (carbon copy).

I have already worked with relational databases, using IDs to express relations, so I use them in content models as well.

This relation is a "many to many" or "one to many" relationship.

Since it is better for items to be identified by path, I don't know how to implement this content model.

This link http://wiki.apache.org/jackrabbit/DavidsModel was useful, but I'm very confused.

I would appreciate it if anybody could help me.

Thank you

1. Re: Content Modeling in ModeShape For n:n or 1:n Relationship between nodes

rhauch Apr 23, 2013 11:34 AM (in response to sahar)
The "rules" in David's Model are indeed useful, but I'd generally consider them guidelines rather than absolute rules. Except for rule #1, which I would consider a must-do.

I think it's probably relatively straightforward to create one branch in the repository for users and another for emails. There are several ways of organizing each branch, but truthfully you should try several to see what works best for your application and how it accesses and updates data.

The bottom line (in terms of David's Model) is that you should try out several of these structures to see which of these (or other designs) works best for your needs. In other words, prototype the structure and write code that accesses that content like your application does. If something doesn't work, change it. If it does work, then you have a starting point for part of your application. And initially, don't worry about the primary type or mixin types. Just focus initially on the structure and how your application accesses and updates the structure. (If you want to use certain features like versioning or shareable nodes, you'll need to use the appropriate mixins. But those can be added to any node of any primary type.)

Users

Users might be organized by username, and if you only have many tens of thousands of users, then you might be able to get away with putting them all under a single parent. So one node called "users" and under it a child node for each user named by their username. That's nice and simple, and makes it very easy to navigate to a user given their username. Even if "users" is not under the root, your application should know where the "users" node is. Plus usernames are likely to be quite stable (often they are fixed), so path-based navigation is easy.

If you know you'll have more users (e.g., 100K or more), then having that many child nodes under a single parent will start to become a problem performance-wise. In this case, you might want to create a simple structure of intermediate nodes between "users" and each user node. The first level of intermediate nodes might represent the first character (or two) of the username. For example, given a username "jsmith", the path might be "/users/js/jsmith". This segregates the users into groups that are more easily accessed. You can have one or more levels of intermediate nodes. This technique tends to work really well, even if the number of users is only in the tens of thousands.
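A minimal sketch of that naming scheme in plain Java (no ModeShape API involved), assuming the "users" node lives at "/users" and a two-character bucket; the class and method names are hypothetical. It only computes the path; your application would still have to create the intermediate bucket node the first time a given bucket is needed.

```java
import java.util.Locale;

// Hypothetical helper: where a user node lives when users are bucketed
// under an assumed "/users" node by the first two characters of the username.
final class UserPaths {
    static final String USERS_ROOT = "/users"; // assumed location of the "users" node
    static final int PREFIX_LENGTH = 2;        // "first character (or two)"; two chosen here

    // Name of the intermediate bucket node, e.g. "js" for "jsmith".
    // Lower-cased so that "JSmith" and "jsmith" land in the same bucket.
    static String bucketFor(String username) {
        String prefix = username.length() <= PREFIX_LENGTH
                ? username
                : username.substring(0, PREFIX_LENGTH);
        return prefix.toLowerCase(Locale.ROOT);
    }

    // Absolute path of the user node, e.g. "/users/js/jsmith".
    // Assumes usernames contain no characters that are illegal in JCR node names.
    static String userPath(String username) {
        return USERS_ROOT + "/" + bucketFor(username) + "/" + username;
    }
}
```

For example, `UserPaths.userPath("jsmith")` yields "/users/js/jsmith". Adding a second level of buckets would just mean another segment (say, the first character, then the first two).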

Emails

My initial thought would be to organize this area of the repository by date, and again it probably makes sense to create some intermediate organizational nodes to handle larger volumes of data. But here, if we design the structure well, we can also make it easy for the application to navigate by date. For example, you might consider "/emails/{year}/{monthNumber}/{dateOfMonth}" for the organizational structure, with a node for each email under that structure in the requisite location. Again, if the email volume dictates it, consider extending that date-based organizational structure to also include path segments for hours and maybe even minutes.
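The date-based layout can be sketched the same way. This assumes the "emails" node is at "/emails", that timestamps are normalized to UTC before filing, and that the month, day, and hour segments are zero-padded so sibling names sort in date order; those are illustrative choices, and the class name is hypothetical.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Locale;

// Hypothetical helper for the "/emails/{year}/{monthNumber}/{dateOfMonth}" layout.
final class EmailPaths {
    static final String EMAILS_ROOT = "/emails"; // assumed location of the "emails" node

    // Folder under which an email sent at `sentAt` would be stored.
    // Converts to UTC (an assumption) so every email is filed by the same clock,
    // and zero-pads month/day/hour so sibling node names sort chronologically.
    static String folderFor(ZonedDateTime sentAt, boolean includeHour) {
        ZonedDateTime utc = sentAt.withZoneSameInstant(ZoneOffset.UTC);
        String folder = String.format(Locale.ROOT, "%s/%d/%02d/%02d",
                EMAILS_ROOT, utc.getYear(), utc.getMonthValue(), utc.getDayOfMonth());
        return includeHour
                ? folder + String.format(Locale.ROOT, "/%02d", utc.getHour())
                : folder;
    }
}
```

An email sent at 2013-04-23T04:51Z would go under "/emails/2013/04/23" (or "/emails/2013/04/23/04" with the hour segment). Using `Locale.ROOT` keeps the digits ASCII regardless of the JVM's default locale.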

Relationships between users and emails

If your use case also involves individual inboxes, you might even consider looking at shareable nodes. Here, you could store the emails separately (and only once) in the branch described above, and then under each user you could define an "inbox" where you could share each appropriate email. Since shareable nodes can "exist" at multiple places in the repository tree, the emails would exist in both the central email area and under the users' inboxes. Removing an email from a user's inbox would not remove it from the central area or from any other user's inbox.

Using shareable nodes may or may not work well for your case. (Again, if you think it might, try it!) If not, then there still are options.

It sounds like you want each email node to contain information about which users were involved (e.g., To, From, CC, BCC, etc.). So, assuming you want each email to have separate properties for To, From, CC, and BCC, you have a choice of types for those properties. Each of these properties might contain one of the following:

1. weak or strong references to the appropriate user node
2. the string identifier of the appropriate user node
3. the path to the appropriate user node
4. the username

Each of these has its own advantages and disadvantages.

Weak/strong references allow you to navigate from the email directly to the user nodes, but they also allow you to navigate from a user node to the emails that they were associated with. That might sound useful, but think about the number of emails that a user might be involved in -- that's probably a huge number, and maintaining those back-references from user to email nodes would quickly become expensive. Plus, there's no organization of those back references.

The other three approaches don't use references but rather some form of "pointer" that your application can easily resolve. With a string identifier, you can simply call "Session.getNodeByIdentifier(...)" to retrieve the user node(s). This will be extremely fast, and string identifiers are durable and never change, even if the username does. However, you probably will have to resolve the opaque identifier into an email address or username before you can display anything useful about that user. (Remember that ModeShape has an internal cache of recently accessed nodes, so if a certain user is needed frequently, it may very well remain in the cache. And if this is important to you, you might eventually want to explicitly configure this internal workspace cache. But that's only when you're optimizing your system, as your application code would remain unchanged.)

Using the path to the user node or the username itself is similar, except that it will be a tiny bit slower and use a different approach to look up nodes. With a path, you simply call "Session.getNode(...)" and supply the absolute path. With a username, you simply build up a relative path from the "users" node and then call "Node.getNode(relativePath)" on that "users" node. But the advantage of either of these approaches is that you know the usernames of those involved in a particular email without having to look up those user nodes. (Yes, you may very well want to know their real names, and that would still require looking them up. But perhaps your application doesn't need to display or use their real names right away.)
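To illustrate that last point, here is a plain-Java sketch (hypothetical names, assuming the bucketed "/users/js/jsmith" layout described earlier and a "users" node at "/users") of how a stored username becomes either the relative path you would pass to `Node.getNode(...)` on the "users" node or the absolute path for `Session.getNode(...)`, and how a stored path yields the username without touching the repository at all.

```java
import java.util.Locale;

// Hypothetical conversions between the "username" and "path" property choices.
final class UserRefs {
    static final String USERS_NODE_PATH = "/users"; // assumed absolute path of the "users" node

    // Username -> relative path for usersNode.getNode(relativePath), e.g. "js/jsmith".
    static String relativePathFor(String username) {
        String bucket = username.length() <= 2 ? username : username.substring(0, 2);
        return bucket.toLowerCase(Locale.ROOT) + "/" + username;
    }

    // Username -> absolute path for session.getNode(absolutePath), e.g. "/users/js/jsmith".
    static String absolutePathFor(String username) {
        return USERS_NODE_PATH + "/" + relativePathFor(username);
    }

    // Path -> username: just the last path segment, so no node lookup is required.
    static String usernameFromPath(String absolutePath) {
        return absolutePath.substring(absolutePath.lastIndexOf('/') + 1);
    }
}
```

Only the final `getNode(...)` call touches the repository; everything needed to show who sent or received an email is already in the stored string.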

What about finding emails for particular users?

This is one area where you might very well want to use queries rather than references, and you'll have much more success (and faster query performance) if you define and use node types on your nodes. The good news is that it's pretty easy to specify a primary type or add a mixin when you create your nodes. So look at your structure, determine what properties you'll want to use in criteria on queries, and then define some node types that match.

Even after you do this, you may find that your structure isn't quite right. You still can alter the structure and your code pretty easily. Doing this early will help ensure your application works as it matures.

Using primary types and mixins

Again, the first thing is to create a prototype of several structure designs to see how each of them works for your application's needs without considering node types or schema validation. But definitely consider using fake content so that you test realistic numbers of users and emails.

Once you do find a good candidate design, then you should consider defining and using node types (as primary types and/or mixins). The advantage of using node types is that ModeShape can then help validate your structure, and they make for much more powerful querying. For example, a node type with properties for each of the fields might look something like this (but please use an appropriate custom namespace for your own node types and property names):

[acme:email]
- acme:from (STRING) mandatory
- acme:to (STRING) multiple mandatory
- acme:cc (STRING) multiple
- acme:bcc (STRING) multiple
- acme:subject (STRING) mandatory
- acme:content (STRING)
- acme:timestamp (DATE) mandatory
+ * (nt:file) = nt:file sns

(That last line defines a child node definition for all of the attachments. I used "nt:file", but you could use anything. You could also create a single "attachments" child node and put all of the attachments under it; this might be useful if there are other non-attachment child nodes and you want to categorize the child nodes.)

So if your prototype already uses property names like those above, the only thing you have to do is explicitly set the primary node type (or add a mixin) when you create an email node.

But with this node type, you can then create a query to find all of the emails sent by a particular user within a given date range (e.g., within a particular year). If you're using options 3 or 4 above, then this is simple:

SELECT * FROM [acme:email] WHERE [acme:from] = 'jsmith' AND [acme:timestamp] BETWEEN CAST('2011-01-01T00:00:00.000Z' AS DATE) AND CAST('2012-01-01T00:00:00.000Z' AS DATE) EXCLUSIVE
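If the BETWEEN ... EXCLUSIVE bounds are unclear, this plain-Java sketch applies the same criteria to hypothetical in-memory records: the lower bound is inclusive and only the upper bound (the one followed by EXCLUSIVE) is excluded, so an email stamped exactly at midnight on 2012-01-01 falls outside the 2011 range. It is not how ModeShape evaluates the query, only an illustration of the intended semantics; the class and record names are made up.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical in-memory illustration of the query's WHERE clause.
final class EmailFilter {
    // Stand-in for an acme:email node: only the properties the query uses.
    record Email(String from, Instant timestamp) {}

    // [acme:from] = username AND start <= [acme:timestamp] < end
    static List<Email> sentByWithin(List<Email> emails, String username, Instant start, Instant end) {
        return emails.stream()
                .filter(e -> e.from().equals(username))
                .filter(e -> !e.timestamp().isBefore(start) && e.timestamp().isBefore(end))
                .toList();
    }
}
```

A half-open range like this lets consecutive years (or months) share a boundary value without any email being counted twice.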

Using the user's identifier would be just as easy if you already had the user node, but you could also use a JOIN. Also look at bind variables in queries, so that you can easily parameterize your queries. (Definitely consider doing this if your application uses user-supplied information within queries. SQL injection is always something you should worry about, but with ModeShape it's less of a problem than in other databases.)

Hopefully this gives you enough help to get started.
2. Re: Content Modeling in ModeShape For n:n or 1:n Relationship between nodes

sahar Apr 28, 2013 12:10 AM (in response to rhauch)

thanks so much