Some questions about ModeShape| JBoss.org Content Archive (Read Only)

1. Re: Some questions about ModeShape

michaelwesterngate Jan 20, 2012 6:04 AM (in response to michaelwesterngate)

Hi,

One more sorry ;-) How does ModeShape clustering using JGroups and JBoss AppSrv go together?

Furthermore.. Any idea when a ready-made package for JBoss7 will be avail?

thanks,

Alex

2. Re: Some questions about ModeShape

rhauch Jan 20, 2012 10:02 AM (in response to michaelwesterngate)

I've got the use case to create jcr repositories on demand, i.e. not pre-configured but depending on user action; for example, I receive a new user so I will create a new repository for him etc. without restart.
Is this possible? If so, how? I know, I can create configurations dynamically but that isn't of help in that case. And even more important: Will and how will that work over clusters?

For 2.x, it's not possible to dynamically create Repository instances. (You can dynamically create workspaces, but that's not what you're asking.)

For 3.x, you will be able to dynamically create new Repository instances, and you'll even be able to modify their configuration on the fly, while the repository is being used. (There are just a few configuration settings that will not be able to be changed while the repository is running: things like the Infinispan cache where the repository's owned content is being stored. But almost everything else can be changed at any time.)

Unless otherwise specified, the remaining answers apply to both 2.x and 3.x.

What kind of connection modes are available to the repository? I know REST and I guess when I am using a cluster I need to use that one right? Or can I boot up a JcrEngine in my web application and it discovers the other nodes of the cluster?

Clustering is set up in the configuration, but after that it is completely transparent to JCR clients. Clustering merely allows multiple processes to use the same repository through JCR, and each process still sees the changes and events whether they originate in the same process or in a different one.

As far as remote connection modes, you can use REST, WebDAV, or JDBC to talk to a remote ModeShape instance deployed into an app/web container, whether or not ModeShape is clustered. You can also use the local JDBC driver within the same app/web container in which ModeShape is running. The JCR API itself only works when your application is running in the same process as ModeShape.

My use case is that I will have 1000's of different repositories identified by their UID. Whenever my web app (or any other) boots up, it retrieves after login of a user the UID of the repository it wants to use and only that one. What'd be the most efficient way to realize that (also using clusters) without having 1000's of repositories running at all time? Write my own "repository" server or something?

Are you sure you can't use workspaces? Or better yet, partition a single workspace (or a smaller number of workspaces) into different areas for each user? You can implement your own security to prevent users from seeing what they're not supposed to see. Think of a JCR repository like a database in a relational DBMS ... you generally don't create 1000's of databases. More often, you create a few databases and limit the data that each user can see. Obviously there are some situations where you can and do create a separate database for each user. Each 2.x ModeShape repository has some weight to it, so creating 1000's of them is not really recommended at this time.

In ModeShape 3.x, repositories are lighter weight, but they still are quite a bit heavier than workspaces. And in 3.x, workspaces are quite lightweight. In fact, there's only a little difference in how ModeShape 3 will handle a repository with a total of N nodes either in one workspace or spread across many workspaces.
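
To make the "partition a single workspace into different areas for each user" suggestion concrete, here is a minimal sketch in plain Java. It is an illustration under assumptions, not anything ModeShape ships: the `/users/<uid>` layout, the `UserAreas` class, and the UID pattern are all hypothetical. A real deployment would enforce the check in its own authorization provider rather than in a helper like this.

```java
import java.util.regex.Pattern;

/** Maps a user UID to a private subtree of one shared workspace (hypothetical layout). */
final class UserAreas {
    // Hypothetical root node under which every user's area lives.
    private static final String ROOT = "/users";
    // Accept only simple identifiers so a UID can never inject "/" or "..".
    private static final Pattern SAFE_UID = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    private UserAreas() {}

    /** Returns the absolute path of the user's area, e.g. /users/a1b2. */
    static String areaFor(String uid) {
        if (uid == null || !SAFE_UID.matcher(uid).matches()) {
            throw new IllegalArgumentException("invalid UID: " + uid);
        }
        return ROOT + "/" + uid;
    }

    /** True only if the path is the user's area itself or a node beneath it. */
    static boolean mayAccess(String uid, String absPath) {
        // Refuse "." and ".." segments so a path cannot climb out of the area.
        for (String segment : absPath.split("/")) {
            if (segment.equals(".") || segment.equals("..")) {
                return false;
            }
        }
        String area = areaFor(uid);
        return absPath.equals(area) || absPath.startsWith(area + "/");
    }
}
```

The trailing "/" in the prefix test matters: without it, user `a1b2` would also match `/users/a1b2x`.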

How mature is the REST server & client? Is there a list of JCR features it supports?

They're relatively simple and minimal, and are not REST APIs for all of JCR. They basically allow a client to use REST to read, update, and delete content. So no events, no workspace operations (e.g., import, export, node type management, locking, etc.).

What's the most efficient backend for storing large amounts of data? And have there ever been any thoughts on writing a "native jcr repository db"?

In 2.x, the fastest is probably the Infinispan connector, but the Disk connector is pretty fast, too. See How To Select The Right Connectors and How To Tune ModeShape for Better Performance.

In 3.x, the fastest will definitely be using Infinispan as an in-memory data grid (with or without cache loaders to persist content in the data grid). For smaller installations where a distributed grid isn't feasible, Infinispan using the BerkeleyDB cache loader has shown (so far) to be the fastest.

I'm not sure what you mean by "native jcr repository db".

One more sorry ;-) How does ModeShape clustering using JGroups and JBoss AppSrv go together?

In 2.x, the clustering is configured within the ModeShape configuration file, which contains an embedded JGroups configuration (even when deployed to JBoss AS 5 or 6). This works fine, since ModeShape uses a separate channel anyway, and we don't want to piggyback on the same channel as other components.

In 3.x, clustering will also use JGroups, and will still allow you to embed a JGroups configuration. However, when deployed within AS7, we hope to simply just reference the domain's clustering configuration, meaning that ModeShape 3 clustering on AS7 should just work with AS7 clustering.

Any idea when a ready-made package for JBoss7 will be avail?

We'll only be supporting a JBoss AS7 subsystem for ModeShape 3, and we hope to have something by the first 3.0 alpha (we're shooting for next week). There would be so little integration with ModeShape 2, that it's just as easy to use the "embedded" technique with AS7 that is used for Tomcat or other servlet/app containers.

Hope this helps!

3. Re: Some questions about ModeShape

michaelwesterngate Jan 20, 2012 10:36 AM (in response to rhauch)

hi!

Wow, thanks a bunch for those extensive and fast answers, highly appreciated!

I hope you don't mind me asking a few more which should finally help me to see whether I should vote to use ModeShape in our company's project:

I am still quite confused about clustering but that surely might be caused by my general missing knowledge about that topic. Anyway, In my understanding, I would want to create a cluster of JCR Repositories on different machines to gather load balancing and replication features, right? Is that what the system does? If so, how does it guarantee from which node to read data from? And does it write to the master only? If so, how does he make sure that everything's synchronized?
I am thinking about creating a bunch of cluster nodes on different machines. Now, all I'd need is access to the REST and WebDAV service. For my personal guess, just to be "cluster" nodes, jboss is way too heavy for my needs so would it be easily possible to switch and i.e. create a simple "server" instance (using embedded jetty) as cluster nodes? Would that work without any additional setup besides the jgroups/cluster setup?
About security: I'd like to use a SSO system. I am quite curious how that would go together with the REST service?
Switching to workspaces instead sounds fair to me so yes, I think that's a good idea
With native db I meant to say a database storage backend that fits jcr's nature best, especially considering performance and stability. For my needs, I think writing a db backend not using JPA but writing/reading from a graph database would be much more efficient, no?
What's the relationship of ModeShape to http://www.jboss.org/exojcr.html?
Is it possible to add federated repositories into a workspace at runtime? Furthermore, is it possible to add federated repositories to specific workspaces only, not the whole repository? What I am talking of is something like "linking" one of our repositories into the workspace of a user, etc. If so, how's that done?
3.x sounds pretty neat.. when you think it'd be feature complete with the things you've mentioned? Will it be (API) compatible to 2.x? Is there a roadmap somewhere? I might have missed it..

Thank you SO much! Already like modeshape pretty much.. :-)

4. Re: Some questions about ModeShape

rhauch Jan 20, 2012 2:00 PM (in response to michaelwesterngate)

I am still quite confused about clustering but that surely might be caused by my general missing knowledge about that topic. Anyway, In my understanding, I would want to create a cluster of JCR Repositories on different machines to gather load balancing and replication features, right? Is that what the system does? If so, how does it guarantee from which node to read data from? And does it write to the master only? If so, how does he make sure that everything's synchronized?

Yes, clustering is used to have a number of processes with ModeShape running inside so that they can balance/share load, be tolerant of some processes failing, etc. But in 2.x, all ModeShape engines in the cluster:

- use connectors to the same underlying storage system, which can be an Infinispan cache/grid, the database (JPA) connector, or the disk-based connector; each repository in each engine has direct access to any/all of the content in the underlying storage (there's no "sharding" of the data, no session affinity)
- fire events for changes made in that engine/repository to a shared JGroups channel, and receive events for changes made in other processes via the same channel
- can be used by any client to establish a JCR Session; the client uses that Session and when finished logs out.

For example, consider a web application that uses JCR and that is going to be deployed to a cluster of application servers. Each of those servers would also have a ModeShape engine (in the case of JBoss AS, ModeShape would be deployed as a service, but in the case of other application servers ModeShape can be embedded in the web app or a different "JCR engine" web app and accessed via JNDI). In this case, the web app just uses the local ModeShape engine via the JCR API. But ModeShape ensures that all changes made via any engine are coordinated with the underlying storage, and that any JCR events generated by those changes are fired on all of the JCR observation listeners (in all processes). Note that the JCR API lets you define whether each listener is interested in local-only events.

ModeShape 3.x will behave from the JCR app's perspective in a similar fashion to 2.x, but the storage options will be different. In addition to ModeShape running in each app server, Infinispan will also be running in each app server and may (if Infinispan cache loaders are configured) persist cached content in an external store. So Infinispan will distribute (and replicate) content across the cluster, but still any ModeShape engine will be able to access all of the content in each repository.

I am thinking about creating a bunch of cluster nodes on different machines. Now, all I'd need is access to the REST and WebDAV service. For my personal guess, just to be "cluster" nodes, jboss is way too heavy for my needs so would it be easily possible to switch and i.e. create a simple "server" instance (using embedded jetty) as cluster nodes? Would that work without any additional setup besides the jgroups/cluster setup?

The REST and WebDAV services are merely servlet applications that can be deployed to the same app server as ModeShape. (Each works as follows: when a request is received, it creates a session, processes the request, produces the response, and then closes the session.) The REST and WebDAV services don't really know or care whether or not the ModeShape repository they're using is clustered.

About security: I'd like to use a SSO system. I am quite curious how that would go together with the REST service?

ModeShape works out of the box with JAAS, and I think there are ways of configuring JAAS to work with a SSO. Alternatively, you can plug in your own authentication/authorization provider into each ModeShape repository.

With native db I meant to say a database storage backend that fits jcr's nature best, especially considering performance and stability. For my needs, I think writing a db backend not using JPA but writing/reading from a graph database would be much more efficient, no?

Yes, JPA is not the fastest, but isn't too bad if properly configured with a second level cache. But yes, we're really not using much of the "relational" aspect of a DBMS. Instead, we really just need a transactional, distributable document store. And Infinispan actually fits the bill really nicely (it can even persist cached content to NoSQL stores), which is why we are completely replacing 2.x connectors with Infinispan. (There are lots of other benefits, too. See Next Generation ModeShape for some background.)

What's the relationship of ModeShape to http://www.jboss.org/exojcr.html?

eXo JCR is the JCR 1.0 implementation used by the eXo Platform, and that was brought into the JBoss.org system in the last 2 years (ModeShape is ~4 years old). It's only used by JBoss Enterprise Portal Platform. ModeShape 2 is currently used by the JBoss SOA Platform 5.x and is an optional choice for JBoss BRMS 5.x. (My hope is that ModeShape 3 becomes the default JCR in BRMS 6.0).

Is it possible to add federated repositories into a workspace at runtime? Furthermore, is it possible to add federated repositories to specific workspaces only, not the whole repository? What I am talking of is something like "linking" one of our repositories into the workspace of a user, etc. If so, how's that done?

Well, it's technically possible in 2.x, but really difficult and requires non-public APIs. In 2.x configurations, sources define what each connector is talking to; for the federated sources, this includes which other sources are being federated and where (e.g., which node in a workspace). Then each repository merely uses one of the sources, and maps the workspaces to the source's workspaces. See our Reference Guide for more details.

In 3.x, we'll actually allow dynamically configuring the connectors to external sources, and (we think) federation will be configured by properties/mixins on the specific federated nodes.

3.x sounds pretty neat.. when you think it'd be feature complete with the things you've mentioned? Will it be (API) compatible to 2.x? Is there a roadmap somewhere? I might have missed it..

Our goal is that 3.0 becomes feature complete in late Feb or March.

As far as API compatibility, ModeShape 3.x will continue to use the JCR 2.0 API, so that won't change. Our goal is that any application using the JCR API and ModeShape 2.x will simply just work when migrated to ModeShape 3.0. We do have a small public API that we allow applications to use, and a handful of methods/interfaces were marked as deprecated in 2.7.0.Final and will be removed in 3.0. A couple of these are no longer needed with JCR 2.0 (e.g., they were introduced in ModeShape 1.x when we used JCR 1.0, and we couldn't remove them in 2.x), while the remainder already have replacements in 2.7.0.Final (e.g., loading node types via CND files). We've also added a few things to the ModeShape 3.0 public API.

With 3.0, though, quite a few "under the covers" things are changing dramatically:

- Our configuration file format is completely different, and we have sensible defaults for almost everything. Where we used to have one large configuration file for the whole engine, in 3.0 each repository will have a completely separate configuration that is independent from all other repositories. We hope to provide a tool that will convert 2.x configurations to 3.0 configurations, but the best bet will be to manually convert the configuration files.
- Many of our SPIs are changing. Where 2.x sequencers used our internal graph API, in 3.x they only use the JCR API. The interfaces for text extractors and MIME type detectors are changing package names, but are otherwise unchanged. And the 3.x connector SPI is going to be radically different.

5. Re: Some questions about ModeShape

michaelwesterngate Jan 23, 2012 9:24 AM (in response to rhauch)

Hi there!

Thank you so much thus far for answering my questions! Would you mind me asking a few more to find final clarifications?

As I understand it, MS 3.x will no longer include custom connectors but instead use Infinispan for everything. I have no clue of infinispan so, let me ask
- Will persistence still work? I don't want to keep more than, say 1GB frequently used nodes in memory, rest should be kept on disk (safely!)
- How could we easily write custom persistence using a db of our choice for the jcr content?
- How to connect to other sources like SVN and JDBC sources etc. to be able to integrate other sources into the repo? Using the federation stuff?
- How does the federation stuff then work when custom connectors are no longer available?
How is it easily possible to create workspaces at runtime with 2.x as well as with 3.x?
How is it easily possible to add federation (i.e. to various sources like jdbc, svn, other repos etc.) with 2.x as well as with 3.x at runtime?
Where would we need to start from to be able to extend / implement our own REST Service?
Does the REST Service allow regular file download or will nt:file data be returned as base64 encoded json stuff?

Oh, and where can we get a snapshot from 3.x? And is there a list of features it already implements?

thank you so much!

Alex

6. Re: Some questions about ModeShape

rhauch Jan 23, 2012 10:30 AM (in response to michaelwesterngate)

Thank you so much thus far for answering my questions! Would you mind me asking a few more to find final clarifications?

No, sorry. You've reached your limit. (Just kidding. I'm happy to answer them.)

As I understand it, MS 3.x will no longer include custom connectors but instead use Infinispan for everything. I have no clue of infinispan so, let me ask
- Will persistence still work? I don't want to keep more than, say 1GB frequently used nodes in memory, rest should be kept on disk (safely!)
- How could we easily write custom persistence using a db of our choice for the jcr content?
- How to connect to other sources like SVN and JDBC sources etc. to be able to integrate other sources into the repo? Using the federation stuff?
- How does the federation stuff then work when custom connectors are no longer available?

See Extended Federated Connector for a discussion about connectors and how we likely will bring the concept (not API or implementations!) forward at least for external content being federated into the repository. For content stored within the repository, we will indeed use Infinispan rather than our older connector framework.

- You can think of Infinispan as a big heap-based map, which we're using to store for each node a JSON-like document value keyed by a unique identifier. (I say "JSON-like" because it's actually an in-memory representation of the JSON document; we do NOT have to parse JSON text over and over.) Infinispan can be configured to never store anything (at which point it's more like a cache than a store), but you can also configure it to persist some or all of the content to an external store. When persisting some, then it's storing what's not kept in memory; when it's persisting all, then what's in-memory is a subset of what's stored. The latter will be the most common configuration for non-clustered or small cluster scenarios, and will behave like you suggest (keeping the most recently-used values in-memory while passivating those that haven't been used in a while). For scenarios that involve large-scale clusters, multi-site deployments, and/or high performance, you can also set up Infinispan to be a data grid, where it keeps in-memory multiple copies of every node (document) distributed across multiple processes, and yet allows any of the clients (in this case ModeShape) to access/update any of the nodes (documents) from any process.
- Infinispan's cache loaders do the job of reading and writing values to an external persistence store, and Infinispan already has several (including two that store values in a relational database). You'll be able to just use these within a ModeShape installation, or you can easily write your own.
- Federating external content will also be possible, but that will involve either using ModeShape connectors or writing a custom connector. We've not locked down our connector API yet.
- You'll be able to write custom connectors, but they'll use a different (and hopefully far simpler) API than with 2.x.
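
The "persist all, keep the most recently used values in memory" configuration described above can be pictured with a small plain-Java model. This is only a toy sketch of the idea and does not use Infinispan's API: the `LruDocumentCache` class, its `store` map (standing in for an external cache store), and the size limit are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model: every value is persisted, only the most recently used stay in memory. */
final class LruDocumentCache<K, V> {
    // Stands in for the external persistent store (hypothetical).
    private final Map<K, V> store = new HashMap<>();
    // Bounded in-memory view, ordered from least to most recently accessed.
    private final LinkedHashMap<K, V> memory;

    LruDocumentCache(final int maxInMemory) {
        // accessOrder=true makes get() and put() move an entry to the "recent" end.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Dropping from memory is safe: the store already has the value.
                return size() > maxInMemory;
            }
        };
    }

    /** Write-through: persist first, then keep a copy in memory. */
    void put(K key, V value) {
        store.put(key, value);
        memory.put(key, value);
    }

    /** Serve from memory when possible, otherwise reload from the store. */
    V get(K key) {
        V value = memory.get(key);
        if (value == null && store.containsKey(key)) {
            value = store.get(key);
            memory.put(key, value); // may push out the least recently used entry
        }
        return value;
    }

    boolean inMemory(K key) { return memory.containsKey(key); }

    boolean inStore(K key) { return store.containsKey(key); }
}
```

With a limit of two entries, putting `a`, `b`, `c` leaves only `b` and `c` in memory, and a later `get("a")` reloads `a` from the store and drops `b` from memory.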

How is it easily possible to create workspaces at runtime with 2.x as well as with 3.x?

Creating workspaces at runtime will work in 3.x the same way that it does in 2.x: through the JCR API for creating workspaces.

How is it easily possible to add federation (i.e. to various sources like jdbc, svn, other repos etc.) with 2.x as well as with 3.x at runtime?

See the Extended Federated Connector thread for much more detail about where we're going with federation in 3.x. 2.x federation has limitations that make adding federation at runtime difficult if not impossible (depending upon how dynamic you want it to be).

Where would we need to start from to be able to extend / implement our own REST Service?

We do want to improve our simplistic RESTful service, and can hopefully make it easier to extend. At the present time, feel free to write your own or extend ours or copy our code as a starting point (adhering to our licenses as required by the LGPL). We can help you if you have more specific questions, ideally in a separate discussion thread.

Does the REST Service allow regular file download or will nt:file data be returned as base64 encoded json stuff?

This is one of the improvements we want to make. Right now, all content is encoded, which indeed does make it harder to use. WebDAV, however, does allow you to easily download the files.

Oh, and where can we get a snapshot from 3.x? And is there a list of features it already implements?

We don't yet have a snapshot, but we'll be releasing our first Alpha this week and will be providing snapshots shortly thereafter. You'll be able to use the alpha to create an embedded ModeShape and most of the JCR features supported in 2.x will work (except for queries, sharable nodes, and observation). The JBoss AS7 kit won't be runnable. We hope to then start issuing releases every few weeks, and will switch from 'alpha' to 'beta' as soon as we are feature complete.

7. Re: Some questions about ModeShape

michaelwesterngate Jan 23, 2012 10:15 PM (in response to rhauch)

hi,

Oh well, now that my limit of questions is exhausted I'm no longer scared on bombing you with some more questions - Promising that I am getting close to finish up with 'em

Aight, here they go:

We'd like to federate a lot of different content easily, like RSS feeds and the such. More important, we'd also need "temporal" nodes. That is, they should not be persisted or at least not for too long, they're just floating around and get cleaned at some point, imagine i.e. federating an event bus into the repository. Would that work? Is there a way with 2.x or at least with 3.x to have such kind of "temporal" nodes (or at least a way to implement it on our own) that float around, get cleaned easily and are NOT persistent after all?
When do you think would be a good time for us, considering our mentioned requirements, to start off with modeshape 3.x?
I seem to be too dumb to find the real entry point in creating our own REST Service.. any pointers?
Any clue why the out-of-the-box JBoss 6 deployment of ModeShape doesn't propagate the jndi repository name as specified in the documentation? It is all running just fine, using the exact default config, we can access it using REST and Webdav, we've also got the modeshape admin stuff and a ModeShapeDS Datasource but when trying to access it using JCRExplorer (which runs fine by itself) using the proposed jndi name (tried a lot of different ones as well) it can't find the name?!
Can we use the full power of lucene (also lucene queries) within ModeShape or do we need to add yet another search indexer? If we can, how do we use lucene queries and setup indexing etc.?

thanks abunch!

8. Re: Some questions about ModeShape

rhauch Jan 24, 2012 1:10 PM (in response to michaelwesterngate)

We'd like to federate a lot of different content easily, like RSS feeds and the such. More important, we'd also need "temporal" nodes. That is, they should not be persisted or at least not for too long, they're just floating around and get cleaned at some point, imagine i.e. federating an event bus into the repository. Would that work? Is there a way with 2.x or at least with 3.x to have such kind of "temporal" nodes (or at least a way to implement it on our own) that float around, get cleaned easily and are NOT persistent after all?

ModeShape 2.x doesn't have support for this kind of feature, but it could be simulated by adding content, annotating it with an appropriate mixin and timestamp, saving it, and periodically removing older content. Not ideal, but it definitely would work.

This doesn't seem like it'd be too difficult to add, but it hasn't been covered in our discussion on federated use cases. If you respond to that thread and describe your scenarios (being as detailed as possible), we can definitely make sure it's covered (if not in 3.0 then in 3.1 or 3.2, which would be late summer at the latest).
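
The 2.x workaround described above (stamp the content when it is added, then sweep out older content periodically) can be sketched as follows. This is a plain-Java model of the bookkeeping only, with a map standing in for the repository. The `TemporalContent` class and its method names are hypothetical; in a real application the `add` step would be a JCR node with a mixin and a timestamp property followed by a save, and `sweep` would query for and remove the expired nodes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for nodes annotated with a creation timestamp. */
final class TemporalContent {
    // Path of each "temporal" node mapped to the time it was added.
    private final Map<String, Instant> createdByPath = new TreeMap<>();

    /** Step 1: add content and stamp it. */
    void add(String path, Instant created) {
        createdByPath.put(path, created);
    }

    /** Step 2 (run periodically): remove everything older than maxAge; returns the removed paths. */
    List<String> sweep(Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Instant>> it = createdByPath.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> entry = it.next();
            if (entry.getValue().isBefore(cutoff)) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }

    boolean contains(String path) {
        return createdByPath.containsKey(path);
    }
}
```

The sweep could be driven by something like a `ScheduledExecutorService`. Note that this simulation still persists each node until the sweep runs, which is why it is "not ideal" for content that should never be stored at all.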

When do you think would be a good time for us, considering our mentioned requirements, to start off with modeshape 3.x?

Federation probably won't be available until 3.0 Beta1, which should be next month. Of course, you could start trying out the alphas to start getting a feel for the standard JCR stuff, considering Infinispan configurations, and helping us out by general testing. Depends on how much time you can afford to help us out.

I seem to be too dumb to find the real entry point in creating our own REST Service.. any pointers?

You can look at our RESTful service for an example. The main Java code (that uses JAX-RS and RESTEasy) is here (in the 3.x codebase, which is as yet unchanged from 2.x), and we have a separate Maven module for building the WAR file (though it's possible to have the main module build the WAR file). The "starting" point in the code is the JcrResources class that contains methods with JAX-RS annotations for each HTTP verb and other criteria, and the JcrApplication class that contains the list of annotated classes (including the JcrResources class and the exception handlers). The "WEB-INF/web.xml" then specifies stuff needed to bootstrap RESTEasy to handle the incoming requests, delegate to our JcrResources class (which generates the response), and catch any exceptions (which are then translated into the HTTP response).

Note that the JcrResources methods (or any other class that was annotated to handle the HTTP requests) would essentially just:

- get a JCR Session
- perform the operation and save the session (if required)
- generate the response
- close the session (always)

How your methods do this is up to you. Our JcrResources class uses a number of other classes that encapsulate various functionality and generate the output (we use JSON), but yours could be quite a bit simpler.

This is the nice thing about JAX-RS (and RESTEasy): you literally create a class, annotate methods for the various HTTP operations and resources, and then build and deploy the WAR file.
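
The four-step lifecycle listed above (get a session, perform and optionally save, generate the response, always close) can be sketched without any JAX-RS or JCR dependency. Everything named here is an assumption made for illustration: `RepoSession` is a minimal stand-in for the two session methods the pattern needs, and `RequestTemplate.handle` is a hypothetical helper, not part of ModeShape's REST code. In a real service the body of an annotated resource method would follow the same try/finally shape.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Minimal stand-in for the parts of a JCR session this sketch needs (hypothetical). */
interface RepoSession {
    void save();
    void logout();
}

final class RequestTemplate {
    private RequestTemplate() {}

    /**
     * Handles one request: opens a session, runs the operation, saves if asked,
     * and always logs out, even when the operation throws.
     */
    static <S extends RepoSession, R> R handle(Supplier<S> login,
                                               Function<S, R> operation,
                                               boolean saveAfter) {
        S session = login.get();                   // 1. get a session
        try {
            R response = operation.apply(session); // 2. perform the operation ...
            if (saveAfter) {
                session.save();                    //    ... and save if required
            }
            return response;                       // 3. generate the response
        } finally {
            session.logout();                      // 4. close the session (always)
        }
    }
}
```

Putting the logout in `finally` is the important design choice: a failed request must not leak a session, and the exception still propagates so the exception handlers can translate it into an HTTP response.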

Any clue why the out-of-the-box JBoss 6 deployment of ModeShape doesn't propagate the jndi repository name as specified in the documentation? It is all running just fine, using the exact default config, we can access it using REST and Webdav, we've also got the modeshape admin stuff and a ModeShapeDS Datasource but when trying to access it using JCRExplorer (which runs fine by itself) using the proposed jndi name (tried a lot of different ones as well) it can't find the name?!

This would be a good question to discuss in another thread (as it'd probably be useful for others). But since the ModeShape JDBC driver, RESTful service, and WebDAV service all use the JCR RepositoryFactory mechanism, my guess is that it probably boils down to one of two problems:

- Either the JNDI lookup is not correct in JCRExplorer, or
- The JCRExplorer web app contains the JCR API, and in our JBoss AS5/6 kit the JCR API is already on the classpath. The result would be that JCRExplorer sees the Repository instance, but it isn't an instance of its own Repository class. If this is the case, then remove the JCR API JAR from the WAR file.

Can we use the full power of lucene (also lucene queries) within ModeShape or do we need to add yet another search indexer? If we can, how do we use lucene queries and setup indexing etc.?

I'm not sure what you're asking. Yes, your application can use the JCR API to create and execute JCR-SQL2 queries (or any other query languages supported by ModeShape), and this will return either result sets (with tuples) or nodes. All this is done via the standard JCR API, and even works in ModeShape 2.x. Note that ModeShape has a non-standard full-text query language that is more like a search engine.

Your application cannot, however, directly query the Lucene indexes without potentially troublesome side effects.

9. Re: Some questions about ModeShape

michaelwesterngate Jan 25, 2012 3:32 AM (in response to rhauch)

Randall Hauch wrote:

We'd like to federate a lot of different content easily, like RSS feeds and the such. More important, we'd also need "temporal" nodes. That is, they should not be persisted or at least not for too long, they're just floating around and get cleaned at some point, imagine i.e. federating an event bus into the repository. Would that work? Is there a way with 2.x or at least with 3.x to have such kind of "temporal" nodes (or at least a way to implement it on our own) that float around, get cleaned easily and are NOT persistent after all?

ModeShape 2.x doesn't have support for this kind of feature, but it could be simulated by adding content, annotating it with an appropriate mixin and timestamp, saving it, and periodically removing older content. Not ideal, but it definitely would work.

This doesn't seem like it'd be too difficult to add, but it hasn't been covered in our discussion on federated use cases. If you respond to that thread and describe your scenarios (being as detailed as possible), we can definitely make sure it's covered (if not in 3.0 then in 3.1 or 3.2, which would be late summer at the latest).

Hmm I am not sure whether this feature would fit into the federated use case. See, what I am looking for is something simple like annotating a node with a timestamp like you've said. Setting it to "indefinite" will never remove the node and persist it (same as without annotating with a timestamp). A value of zero would mean that the node will never make into the repository. I.e. that's like adding and immediately removing it BUT the idea is that the observation events would still be fired for it. By doing so, you wouldn't stress the repository with a new node that's "already dead" but instead you'd simply receive events for it. This would allow to build a simple yet efficient event system using repository capabilities. Besides that, you could add other timestamps in milliseconds (or whole dates) to be able to let nodes go out of scope at a certain point of time. Those nodes are not required to come from a federation though that brings me to my new question -> is it easily possible to clone nodes including whole subtrees? Because by doing so, you could oberserve a certain path for new nodes with timestamp=0 and "transform" them (when necessary) into another structure providing the result into the federated source (i.e. an Enterprise Bus). Know what I mean?

Furthermore, there's one more question -> Is it possible to use "filtering" (aka queries) with nodes in-memory only, without checking the repository structure? What I am talking about is being able to apply a filter to nodes with timestamp=0 to see whether they fit into a certain category and do some work with them. As the node with timestamp=0 will never make it into the repository after all, we'd need a way to separately try to match "filters" (the WHERE part of queries) to nodes.

Should I open a ticket for this idea/feature? It'd be certainly something we'd need before we could use the jcr stuff...

Can we use the full power of Lucene (also Lucene queries) within ModeShape or do we need to add yet another search indexer? If we can, how do we use Lucene queries and set up indexing etc.?

I'm not sure what you're asking. Yes, your application can use the JCR API to create and execute JCR-SQL2 queries (or any other query languages supported by ModeShape), and this will return either result sets (with tuples) or nodes. All this is done via the standard JCR API, and even works in ModeShape 2.x. Note that ModeShape has a non-standard full-text query language that is more like a search engine.

Your application cannot, however, directly query the Lucene indexes without potentially troublesome side effects.

Hmm, so in the end, we'd need to use another, separate indexer to be able to use the full power of queries, right? I mean, having had a quick look at the full-text query language of ModeShape, it doesn't look like it gets even close to the power of Lucene queries?

thank you so much!

10. Re: Some questions about ModeShape

rhauch Jan 25, 2012 9:41 AM (in response to michaelwesterngate)

Michael Westerngate wrote:

Hmm, I am not sure whether this feature would fit into the federated use case. See, what I am looking for is something simple like annotating a node with a timestamp, like you've said. Setting it to "indefinite" will never remove the node and will persist it (same as without annotating with a timestamp). A value of zero would mean that the node will never make it into the repository. I.e. that's like adding and immediately removing it, BUT the idea is that the observation events would still be fired for it. By doing so, you wouldn't stress the repository with a new node that's "already dead" but instead you'd simply receive events for it. This would allow building a simple yet efficient event system using repository capabilities. Besides that, you could add other timestamps in milliseconds (or whole dates) to let nodes go out of scope at a certain point in time.

You're right. I combined the fact that you're pulling information from RSS and presumed the transient-ness feature was for federation.

I think it might make more sense to define a mixin (e.g., something like "mode:transient") that has a required property (e.g., maybe "mode:timeToLiveInSeconds" or "mode:timeToLiveDate") that defines how long the node is to be kept in-memory. The challenge is that it would have to apply to all child nodes, and it also makes the parent reference to this transient node more difficult because we need to handle (and clean up) the invalid child reference once the child has been pruned. BTW, any transient nodes that did get pruned should probably fire the appropriate JCR remove-node events.
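To make the idea concrete, here is a minimal sketch of the pruning behavior described above, written against plain Java collections rather than the ModeShape or JCR API. Everything in it (the `TransientStore` class, its method names, the path-prefix test for descendants) is a hypothetical illustration; only the "time to live in seconds" notion comes from the proposal. It shows that pruning an expired node also has to take its subtree with it, and that one remove event is recorded per pruned node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; NOT the ModeShape or JCR API.
class TransientStore {
    // All node paths currently in the store, in insertion order.
    private final List<String> paths = new ArrayList<>();
    // Expiry time (epoch millis) for nodes marked transient; absent means "keep forever".
    private final Map<String, Long> expiresAt = new HashMap<>();
    // Stand-in for the remove-node events that pruning should fire.
    final List<String> removedEvents = new ArrayList<>();

    void add(String path) {
        paths.add(path);
    }

    void addTransient(String path, long nowMillis, long timeToLiveInSeconds) {
        paths.add(path);
        expiresAt.put(path, nowMillis + timeToLiveInSeconds * 1000L);
    }

    boolean exists(String path) {
        return paths.contains(path);
    }

    // Removes every expired transient node together with its descendants,
    // recording one remove event per pruned node.
    void prune(long nowMillis) {
        List<String> expiredRoots = new ArrayList<>();
        for (Map.Entry<String, Long> entry : expiresAt.entrySet()) {
            if (entry.getValue() <= nowMillis) {
                expiredRoots.add(entry.getKey());
            }
        }
        for (String root : expiredRoots) {
            List<String> doomed = new ArrayList<>();
            for (String path : paths) {
                if (path.equals(root) || path.startsWith(root + "/")) {
                    doomed.add(path);
                }
            }
            for (String path : doomed) {
                paths.remove(path);
                expiresAt.remove(path);
                removedEvents.add(path);
            }
        }
    }
}
```

A periodic task would call `prune(System.currentTimeMillis())`; the parent node itself is left alone, which is where the "invalid child reference" cleanup mentioned above would have to happen in a real repository.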

We should probably treat this as a new feature. I'm not sure we'll have time to add this into 3.0 (we can try), but most likely it'll get pushed to 3.1 (probably about 6 weeks after 3.0). I can think of a couple of ways of implementing it pretty efficiently (using Infinispan's infrastructure), so I don't think we need to worry about the 3.0 design not handling it.

Those nodes are not required to come from a federation, though that brings me to my new question -> is it easily possible to clone nodes including whole subtrees? Because by doing so, you could observe a certain path for new nodes with timestamp=0 and "transform" them (when necessary) into another structure, providing the result to the federated source (e.g. an Enterprise Bus). Know what I mean?

I think I do, but this would be client code that would use observation, and the ESB would need a new/custom connector.

Furthermore, there's one more question -> Is it possible to use "filtering" (aka queries) with nodes in-memory only, without checking the repository structure? What I am talking about is being able to apply a filter to nodes with timestamp=0 to see whether they fit into a certain category and do some work with them. As the node with timestamp=0 will never make it into the repository after all, we'd need a way to separately try to match "filters" (the WHERE part of queries) to nodes.

Wouldn't JCR queries work for this? You could easily use a query like:

SELECT * FROM [mode:transient] where [mode:transient].[mode:timeToExpire] < CAST('2012...' AS DATE)

Maybe I'm not following what you're trying to do.

Should I open a ticket for this idea/feature? It'd be certainly something we'd need before we could use the jcr stuff...

Sure, go ahead and open a ticket for the transient feature. Please include a thorough description and use case.

Hmm, so in the end, we'd need to use another, separate indexer to be able to use the full power of queries, right? I mean, having had a quick look at the full-text query language of ModeShape, it doesn't look like it gets even close to the power of Lucene queries?

Yes, it's true that the full-text search language doesn't really expose much of the Lucene power. But the JCR-SQL2 language is extremely powerful, and we translate much of these queries into Lucene queries. Can you provide a more concrete example of what you want to do?

11. Re: Some questions about ModeShape

michaelwesterngate Jan 27, 2012 4:05 AM (in response to rhauch)

hi,

I've added a ticket for it - https://issues.jboss.org/browse/MODE-1388

Please check it, as I am not sure I've described / set up everything properly.

thanks!

12. Re: Some questions about ModeShape

rhauch Jan 27, 2012 9:07 AM (in response to michaelwesterngate)

I guess I'm still not understanding the desired behavior of the transient node feature. Specifically, I don't understand the benefit of setting the timespan/lifetime to 0, and why the node will be immediately removed right after being added. This is definitely an edge case that we have to consider, but I guess I would have expected the timespan/lifetime property to require a positive value.

IIUC, the description is expecting an event mechanism that allows the listener to effectively veto the removal of transient nodes. I'm not sure this could be done with the JCR observation API, since the API does not allow vetoes. In particular, see Section 12.5 of the JSR-283 specification:

This observation mechanism is asynchronous in that the operation that causes an event to be dispatched does not wait for a response to the event from the listener; execution continues normally on the thread that performed the operation.
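The consequence of that paragraph can be sketched without any JCR classes. The following is a hypothetical model using only `java.util.concurrent`: the "listener" runs on a separate dispatcher thread and is deliberately held back until the "operation" has finished. Because the operating thread never waits for the listener, the operation completes first and the listener has nothing left to veto. The class name, log strings and the latch are assumptions made for the demonstration, not part of any repository API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of asynchronous event delivery; NOT the JCR observation API.
class AsyncObservationDemo {

    // Returns the order in which the operation and the listener ran.
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        CountDownLatch operationFinished = new CountDownLatch(1);

        // The "listener" runs on the dispatcher thread. It is held back until the
        // operation is done, which would deadlock if the operation waited for it.
        dispatcher.execute(() -> {
            try {
                operationFinished.await();
                log.add("listener: notified of removal");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The "operation" continues on the calling thread without waiting.
        log.add("operation: node removed");
        operationFinished.countDown();

        dispatcher.shutdown();
        try {
            dispatcher.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the listener", e);
        }
        return log;
    }
}
```

Calling `AsyncObservationDemo.run()` returns the two log entries with the operation first, which is the ordering a veto-style listener cannot work with.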

The description mentions that when a transient subgraph is removed, any REFERENCE properties pointing to any nodes in the subgraph would be removed. That's an option, although it may cause authorization problems if the referring node cannot be modified. An alternative might be to disallow setting REFERENCE properties to any node (or descendant of a node) that is marked as transient. WEAKREFERENCE properties could always be used, since they don't prevent removal like REFERENCE properties do. I'm not sure which is better: the former is more attractive, while the latter would seem difficult for client apps to understand. A third option is to prevent transient nodes from being referenceable.
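The difference between the two reference kinds can be illustrated with a small model. This is a sketch in plain Java with hypothetical names (`ReferenceModel`, `setReference`, `setWeakReference`), not the JCR API: a hard reference blocks removal of its target, while a weak reference does not and is simply left dangling. In JCR itself, as far as I recall, the blocked removal surfaces as an exception when the change is saved; the boolean return value here is a simplification.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of referential integrity; NOT the JCR API.
class ReferenceModel {
    private final Set<String> nodes = new HashSet<>();
    // Referring node id -> target node id, for each kind of reference.
    private final Map<String, String> hardRefs = new HashMap<>();
    private final Map<String, String> weakRefs = new HashMap<>();

    void addNode(String id) {
        nodes.add(id);
    }

    void setReference(String from, String to) {
        hardRefs.put(from, to);
    }

    void setWeakReference(String from, String to) {
        weakRefs.put(from, to);
    }

    boolean exists(String id) {
        return nodes.contains(id);
    }

    // A node that is the target of a hard reference cannot be removed;
    // weak references are ignored when deciding.
    boolean remove(String id) {
        if (hardRefs.containsValue(id)) {
            return false;
        }
        nodes.remove(id);
        return true;
    }

    // True if the weak reference held by 'from' points at a node that no longer exists.
    boolean weakReferenceDangling(String from) {
        String target = weakRefs.get(from);
        return target != null && !nodes.contains(target);
    }
}
```

Under this model, disallowing hard references to transient nodes means pruning never hits the `return false` branch, at the cost of clients having to cope with dangling weak references.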

So I think there's still a bit of work to flesh out all of the requirements. It also would be very useful to give one or more use cases that describe why/how this feature would be used and why it cannot be done another way.