5 Replies Latest reply on Jun 7, 2013 10:08 AM by rhauch

    JBoss ModeShape integration with Liferay Portal

    j.thakkar87

      Hello

       

      I want to integrate JBoss ModeShape as a JCR Implementation for Liferay Document management.

       

      Before that I want to understand ModeShape in depth.

       

      So I started reading the JBoss ModeShape documentation and found that it is possible to deploy ModeShape on a Tomcat server, but I don't know how. Please provide detailed steps if anyone has an idea.

       

      Second, I need a working example of a Java web application where ModeShape is used to manage documents using a configuration file.

       

      I already have an existing database in MySQL and I want to use it through configuration, but I don't know much about the configuration XML file. Please suggest how I can configure my MySQL database for ModeShape and implement a working example.

       

      Also, please suggest which version is best for this kind of requirement.

       

      Please provide the steps or documentation for the same.

       

      Thanks in advance.

        • 1. Re: JBoss ModeShape integration with Liferay Portal
          rhauch

          First of all, I would strongly recommend using the latest version (at this time ModeShape 3.2; 3.3 will be released later this week).

           

          Secondly, it is possible to deploy ModeShape on Tomcat, though this is not documented well in our documentation.

           

          With Tomcat (or Glassfish), simply use ModeShape like you would in an embedded Java SE application (see here). There are two ways to do this, and which you choose will depend on how many web applications you want to share a repository.

           

          By far the easiest and simplest approach applies when you have a single application that uses the repository. In this case, embed ModeShape inside your application by following the instructions for embedding ModeShape inside a Java SE application. Just be sure to include all the ModeShape (and dependency) JARs in your WAR file. And, as the documentation describes, your application code should use the RepositoryFactory approach with a Map that contains a property that points to your ModeShape configuration file (in JSON format). The rest of your code will then handle a request by using the returned javax.jcr.Repository instance to create a Session, read/write content (to fulfill the request), and close the session.

           

          However, it's also possible (but more complicated) to deploy ModeShape within the Tomcat server as a global resource so that multiple applications can simply look up the Repository instances in JNDI. To do this, place all the ModeShape (and dependency) JARs in Tomcat's library folder, and then modify either the server.xml or context.xml file with the resource defining each ModeShape repository. See this section of our documentation for more details.

           

           

          I already have an existing database in MySQL and I want to use it through configuration, but I don't know much about the configuration XML file. Please suggest how I can configure my MySQL database for ModeShape and implement a working example.

           

          You can continue to use the database to persist ModeShape content by configuring Infinispan to use the JDBC cache store (see here for more information). However, be aware that ModeShape will create its own tables and will store the content there; it will not use any existing tables or schema.

           

          Please see our Getting Started documentation for much more detail.

          • 2. Re: JBoss ModeShape integration with Liferay Portal
            j.thakkar87

            Hi Randall

             

            Thank you very much for your prompt reply.

             

            I hope you don't mind if I ask some more questions

             

            I am working on a web-based enterprise project using a Java-based portal framework. In my project, I have a requirement to manage a DMS, and for that we chose the JCR approach. I went through Jackrabbit, but it does not work properly in a heavy-write scenario in a clustered environment. I heard that JBoss ModeShape is an alternative to Jackrabbit, but it is not clear to me whether it provides the following features:

             

            1) Can we configure JBoss ModeShape in embedded mode? If yes, then how does data synchronization happen across different nodes, and how is the data persisted and secured? If heavy writes arrive from multiple nodes at once, how does JBoss ModeShape handle synchronization and performance?

             

            2) If we configure JBoss ModeShape as standalone, please provide the link or steps to do so. Also, how can different nodes or instances of my application communicate with ModeShape? Can we use REST, WebDAV, or CMIS to communicate with ModeShape from different instances? What about the risk, performance, and security of network communication with ModeShape from my application in a heavy write or read scenario?

             

            3) Please provide the steps or links for all communication mechanisms, i.e., how can I use the REST API, WebDAV, or CMIS to communicate with ModeShape from my application (from a configuration perspective)? Which is best in my scenario of heavy writes and high scalability?

             

            4) If I use WebDAV, is it fully JCR 2.0 compliant?

             

            Please provide your inputs on this and on which one is best for me to choose based on my requirement.

             

            I want heavy write and read support and high scalability to maintain my DMS in my Java-based enterprise web application in a portal framework, with high performance. It would be great if the solution is JCR 2.0 compliant.

             

            One more question: can't we write our configuration file in .xml format if we use ModeShape 3.x? Is it compulsory to use the .json format? I am asking because Jackrabbit uses the .xml format to configure repositories.

             

            Is it necessary to use only Infinispan to manage our backend data?

             

            Please provide your inputs with proper links, steps, or documentation.

            • 3. Re: JBoss ModeShape integration with Liferay Portal
              rhauch

              First of all, our documentation is here and our examples are here. We also have quickstarts if you're deploying on top of JBoss EAP.

               

              1) Can we configure JBoss ModeShape in embedded mode? If yes, then how does data synchronization happen across different nodes, and how is the data persisted and secured? If heavy writes arrive from multiple nodes at once, how does JBoss ModeShape handle synchronization and performance?

              Yes, you can embed ModeShape inside your regular Java applications. ModeShape is still clustered the same way (via JGroups), and data is still stored in Infinispan (which also needs to be clustered).

               

              ModeShape uses transactions, so concurrent writes are absolutely possible without global write locks (as is the case with Jackrabbit). See this blog post for an in-depth discussion of concurrency. Note that we rely upon Infinispan's support for and use of transactions to help make this possible. It is true that if you have multiple JCR sessions (in the same process or spread across your cluster) that are regularly writing/updating the same node and saving at the same time, then all of these changes will be serialized (one of them will block the others). But most of the time the multiple JCR sessions will be updating different nodes, in which case there is no blocking.
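
              The difference between writers hitting the same node and writers hitting different nodes can be sketched with a toy store that keeps one write lock per node. This is only an illustration of per-node locking in plain Java, not ModeShape or Infinispan code; the class name, method names, and paths are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: a toy store with one write lock per node key.
// This is NOT ModeShape or Infinispan code; all names are hypothetical.
final class PerNodeLockDemo {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<String, Integer> values = new ConcurrentHashMap<>();

    // Read-modify-write on one node while holding only that node's lock.
    void increment(String nodeKey) {
        ReentrantLock lock = locks.computeIfAbsent(nodeKey, k -> new ReentrantLock());
        lock.lock();
        try {
            int current = values.getOrDefault(nodeKey, 0);
            values.put(nodeKey, current + 1);
        } finally {
            lock.unlock();
        }
    }

    int value(String nodeKey) {
        return values.getOrDefault(nodeKey, 0);
    }

    // Start `writers` threads; each performs `writesEach` increments, either all
    // on one shared node (writers queue up) or each on its own node (no blocking).
    static PerNodeLockDemo run(int writers, int writesEach, boolean sameNode) {
        PerNodeLockDemo store = new PerNodeLockDemo();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            String key = sameNode ? "/docs/shared" : "/docs/file" + i;
            threads[i] = new Thread(() -> {
                for (int n = 0; n < writesEach; n++) {
                    store.increment(key);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        PerNodeLockDemo contended = run(4, 1000, true);
        PerNodeLockDemo spread = run(4, 1000, false);
        System.out.println("shared node total: " + contended.value("/docs/shared"));
        System.out.println("file0 total: " + spread.value("/docs/file0"));
    }
}
```

              In both runs no update is lost. The only difference is that writers to the shared node wait for each other, while writers to separate nodes never contend for the same lock.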

               

               

              2) If we configure JBoss ModeShape as standalone, please provide the link or steps to do so. Also, how can different nodes or instances of my application communicate with ModeShape? Can we use REST, WebDAV, or CMIS to communicate with ModeShape from different instances? What about the risk, performance, and security of network communication with ModeShape from my application in a heavy write or read scenario?

               

              If you want to use the JCR API in your application, then you need to run ModeShape within the same process(es). So if you're running as regular Java SE applications, your application would instantiate and start the ModeShapeEngine. If your applications are deployed to a web server (e.g., Tomcat), then you can either embed ModeShape inside your single web app or have the web server run ModeShape (e.g., in Tomcat via the "server.xml" file) and have your application(s) look up the repositories in JNDI. If you deploy on top of JBoss AS/EAP, then use our kit that installs ModeShape as a service inside AS/EAP, and again your applications just look up the repositories in JNDI.

               

              See the clustering section of our documentation.

               

               

              3) Please provide the steps or links for all communication mechanisms, i.e., how can I use the REST API, WebDAV, or CMIS to communicate with ModeShape from my application (from a configuration perspective)? Which is best in my scenario of heavy writes and high scalability?

               

              If you're using REST or WebDAV, then your applications will simply access the server using our REST API or the WebDAV protocol. If you're going to use CMIS, then your application would use the CMIS REST API or a CMIS client application framework (such as Apache Chemistry). See our test case for a simple example.

               

              All of these remote protocols will be less efficient than using the JCR API from within the same process, simply because they require network communication.

               

              I can't really say which topology will be best for your scenario, but if you're concerned about heavy writes, then you might start with using the JCR API (which is also a much richer API).

               

              4) If I use WebDAV, is it fully JCR 2.0 compliant?

              Our WebDAV implementation only exposes some functionality of ModeShape, even though ModeShape is JCR 2.0 compliant.

               

               

              I want heavy write and read support and high scalability to maintain my DMS in my Java-based enterprise web application in a portal framework, with high performance. It would be great if the solution is JCR 2.0 compliant.

              I would probably suggest starting with using the JCR API, which means running ModeShape within the same process.

               

               

              One more question: can't we write our configuration file in .xml format if we use ModeShape 3.x? Is it compulsory to use the .json format? I am asking because Jackrabbit uses the .xml format to configure repositories.

              We do not have an XML format at this time, so JSON is currently the only format we support. (The exception is if you're deploying ModeShape within JBoss EAP, since then you configure ModeShape using the EAP configuration mechanism.)

               

               

              Is it necessary to use only Infinispan to manage our backend data?

              Yes, ModeShape always uses Infinispan, but you can have Infinispan store data in a variety of ways.

              • 4. Re: JBoss ModeShape integration with Liferay Portal
                j.thakkar87

                Hi Randall

                 

                Thank you very much for your prompt reply. It would be a great help if you could answer the questions below as well.

                 

                Before asking any other questions, I just want to clarify that we are working on an enterprise product which runs on a web server and application server, where we have multiple web applications which share the same repository. Our environment is in cluster mode and we are using Tomcat as the web server.

                 

                Now we want to cluster our environment and keep the repository in a common place.

                 

                But as per your comments above, I understood that if we want to use ModeShape with high performance and heavy writes, then I have to embed it in my application server and then share the repository with all the other web applications.

                 

                Now my main question is this: if we cluster the environment, each node or instance of my application server has its own ModeShape and Infinispan as a data store. Then how can each node or instance synchronize the data that we store in the repository, given that in this case the repository is local to each instance and we use the Replicated or Distributed structure from here?

                 

                Then, in this case, consider the scenario where 1000 writes come to all the nodes: how does ModeShape handle it? What is the risk of data failure? How does the locking mechanism work? How is data synchronized between each node and the other nodes?

                 

                If I make a write request on node1 and there is a request to read the same data from node2, then how do I get the data? Is the data synchronized properly? What happens when read and write requests both come at the same time? Which has priority? If the write completes successfully and then a read comes, do I get the same data that was just written on node1 when reading from node2?

                 

                Please explain the synchronization process that keeps the data in sync across different nodes. In this case, which cluster mechanism will be most helpful out of these (Local, Replicated (not shared), Replicated (shared), Distributed, Remote)?

                 

                How costly is the synchronization process that keeps the data in sync across different nodes?

                 

                Please provide the steps or a link to implement whichever one you suggest, and the pros and cons of each one if possible.

                 

                Thanks

                • 5. Re: JBoss ModeShape integration with Liferay Portal
                  rhauch

                  Before asking any other questions, I just want to clarify that we are working on an enterprise product which runs on a web server and application server, where we have multiple web applications which share the same repository. Our environment is in cluster mode and we are using Tomcat as the web server.

                   

                  Now we want to cluster our environment and keep the repository in a common place.

                  Great.

                   

                  But as per your comments above, I understood that if we want to use ModeShape with high performance and heavy writes, then I have to embed it in my application server and then share the repository with all the other web applications.

                  Yes, if you want to use the JCR API. You could, of course, put ModeShape into a separate cluster of web/application servers, and have your application remotely access the repository through our RESTful service. But as I mentioned, our RESTful service exposes a lot of basic functionality, but it does not expose all of the functionality that the JCR API exposes. Now, you could create your own RESTful service that is designed exactly around how your application would access the content. This would be more efficient than using ours, since you could minimize the network hops while still having your service use the full JCR API.

                   

                  I don't know which approach would be best, since it is so dependent upon your application needs.

                   

                   

                  Now my main question is this: if we cluster the environment, each node or instance of my application server has its own ModeShape and Infinispan as a data store. Then how can each node or instance synchronize the data that we store in the repository, given that in this case the repository is local to each instance and we use the Replicated or Distributed structure from here?

                   

                  Then, in this case, consider the scenario where 1000 writes come to all the nodes: how does ModeShape handle it? What is the risk of data failure? How does the locking mechanism work? How is data synchronized between each node and the other nodes?

                  Yes, every process in the ModeShape cluster (whether ModeShape is embedded in your web server cluster or a separate cluster of servers) would have its own copy of the data. Your applications (or service) would obtain a JCR Session, read and/or make changes to content, optionally save any changes, and then close the session. This happens very frequently and sessions are short-lived. But when your application saves the changes in a session, ModeShape will start a transaction (since you're very likely not using user transactions in your architecture), apply all of the changes you've made, and immediately commit the transaction. It is during this transaction that the distributed/replicated nature of Infinispan takes effect, because Infinispan ensures that all of its processes that have a copy of the changed nodes actually lock those nodes for writing, ensuring that no other sessions can change the node at that time. (Again, I describe this in more detail in my aforementioned blog post.) When the transaction completes, all node locks are released and the next session blocking on the write locks for those nodes then obtains the lock, etc.

                   

                  So to answer your question, Infinispan is doing most of the work here, because it is ensuring that all of the Infinispan processes are collaborating so that they all have the latest version of every node they own. If you're running Infinispan in a replicated mode, that means every process in the cluster "owns" the node. If you're running in distributed mode, only those processes that have a copy of the node will "own" the node. Obviously, if your cluster is small, replication works great. But as the cluster size grows (even to a half-dozen processes), distributed mode starts to outperform replicated mode. And this makes logical sense: if you have a cluster of 20 processes, a replicated setup would have 20 copies of each node, whereas a distributed setup would ensure that you have, say, 4 copies of every node, but these copies are distributed across the cluster so that you do not lose data if you lose machines. And for very large clusters that span data centers, you don't even need to use a cache store, since your distributed cluster maintains multiple copies (on multiple machines in separate racks in separate data centers) and is very fault tolerant. Again, Infinispan is doing all of this work for us.
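
                  The 20-process comparison can be worked through as simple arithmetic. The sketch below is only a back-of-the-envelope calculation in plain Java with hypothetical names; it assumes entries are spread evenly across the cluster and that the number of owners per entry is a fixed setting (Infinispan's distributed mode calls this numOwners). The figure of one million repository nodes is made up for illustration.

```java
// Back-of-the-envelope arithmetic only; class and method names are hypothetical.
final class CopyCount {
    // Replicated mode: every process in the cluster holds every entry.
    static int copiesReplicated(int clusterSize) {
        return clusterSize;
    }

    // Distributed mode: each entry is held by a fixed number of owners,
    // which cannot exceed the number of processes in the cluster.
    static int copiesDistributed(int clusterSize, int owners) {
        return Math.min(clusterSize, owners);
    }

    // Average number of entries one process holds, assuming an even spread.
    static long entriesPerProcess(long entries, int clusterSize, int copies) {
        return entries * copies / clusterSize;
    }

    public static void main(String[] args) {
        int clusterSize = 20;
        long repositoryNodes = 1_000_000L; // made-up figure for illustration

        int replicated = copiesReplicated(clusterSize);
        int distributed = copiesDistributed(clusterSize, 4);

        System.out.println("replicated: " + replicated + " copies, "
                + entriesPerProcess(repositoryNodes, clusterSize, replicated) + " entries per process");
        System.out.println("distributed: " + distributed + " copies, "
                + entriesPerProcess(repositoryNodes, clusterSize, distributed) + " entries per process");
    }
}
```

                  Under these assumptions each process in the replicated setup holds all one million entries and every write touches 20 processes, while in the distributed setup each process holds about 200,000 entries and a write touches only 4.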

                   

                  Now, this is how the stored representations of nodes are shared/synchronized across the cluster. But with just this kind of architecture, we still need a way for the changes committed in one session to be announced to all other sessions in all processes in the cluster. That's why ModeShape itself needs to be clustered, and it's how ModeShape implements JCR event listeners and maintains the query indexes.

                   

                  There are a variety of ways that the ModeShape processes in the cluster can be configured, depending upon your needs and your environment. See this documentation for more details. Note that JMS-based sharable indexes are generally more scalable, but not all web server environments have JMS.

                   

                   

                  If I make a write request on node1 and there is a request to read the same data from node2, then how do I get the data? Is the data synchronized properly? What happens when read and write requests both come at the same time? Which has priority? If the write completes successfully and then a read comes, do I get the same data that was just written on node1 when reading from node2?

                  Using "node" to describe processes in the cluster is difficult with ModeShape, since "node" is a piece of data within a repository content tree.

                   

                  But, assuming you meant processes: The request to write on process1 will get/wait for a lock on the data it is writing. Any other request to read that data (e.g., nodes) will never be blocked by the write lock, and they can proceed immediately. So if a request comes in and your application uses a session (within the same process or a different process) to read the data (e.g., nodes) but before the write session commits the changes, then the request will see the earlier version of the data (e.g., node). However, if it arrives after the write session's changes are committed, the request will see the updated version of the data (e.g., node).

                   

                  You might think that if a session gets an earlier version of a JCR node, and another session immediately changes that node's persisted representation, that the first session is immediately corrupted. It is not. True, it might seem that the application using that first session might generate a web page that contains slightly out-of-date information, but it actually wasn't out-of-date when the information was read. (This is true of every architecture: by the time a user actually sees the data, it might already be obsolete.)

                   

                  When an application uses a session to make a change to a node, the session actually records the changes being made (e.g., a new property was added, an existing property was changed/removed, a new child was added). Then, when the application calls Session.save(), ModeShape actually:

                   

                  1. gets the write locks for all nodes that will be changed,
                  2. for each node
                    1. reads the most-up-to-date version of the node,
                    2. applies the changes to the node representation, and
                    3. writes the updated node-representation back to Infinispan
                  3. commits the transaction to release all of the write locks

                   

                  If there was a change/delta that cannot be applied to the updated node representation because that change is not compatible, then ModeShape will throw an exception. But this is rare. Often it's because the session is affecting a node that was just deleted by another session (e.g., adding a reference to node X, but node X was just removed).
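
                  The save sequence and the failure case above can be modelled with a small toy in plain Java. This is a sketch under heavy simplification, not ModeShape's implementation: the "repository" is an in-memory map, the per-node locks stand in for the transactional write locks that Infinispan provides, and every class and method name is hypothetical. It also shows the read behaviour described earlier, where a reader sees the last committed version until save() completes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the save sequence. NOT ModeShape code; every name is hypothetical.
final class ToyRepository {
    // Committed node representations: path -> property map (replaced, never mutated).
    private final Map<String, Map<String, String>> committed = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new ReentrantLock());
    }

    void create(String path) {
        committed.put(path, new HashMap<>());
    }

    // Removal takes the same write lock that save() uses.
    void remove(String path) {
        ReentrantLock lock = lockFor(path);
        lock.lock();
        try {
            committed.remove(path);
        } finally {
            lock.unlock();
        }
    }

    // Readers take no lock: they see whatever was last committed.
    String read(String path, String property) {
        Map<String, String> node = committed.get(path);
        return node == null ? null : node.get(property);
    }

    ToySession login() {
        return new ToySession(this);
    }

    static final class ToySession {
        private final ToyRepository repo;
        // Recorded deltas, sorted by path so locks are always taken in the same order.
        private final TreeMap<String, Map<String, String>> pending = new TreeMap<>();

        ToySession(ToyRepository repo) {
            this.repo = repo;
        }

        // Record the change in the session only; nothing is visible to others yet.
        void setProperty(String path, String name, String value) {
            pending.computeIfAbsent(path, p -> new HashMap<>()).put(name, value);
        }

        void save() {
            List<ReentrantLock> held = new ArrayList<>();
            try {
                // Step 1: get the write locks for all nodes that will be changed.
                for (String path : pending.keySet()) {
                    ReentrantLock lock = repo.lockFor(path);
                    lock.lock();
                    held.add(lock);
                }
                // Reject incompatible deltas up front so the save is all-or-nothing.
                for (String path : pending.keySet()) {
                    if (!repo.committed.containsKey(path)) {
                        throw new IllegalStateException("node no longer exists: " + path);
                    }
                }
                // Step 2: read the latest version, apply the delta, write it back.
                for (Map.Entry<String, Map<String, String>> delta : pending.entrySet()) {
                    Map<String, String> latest = new HashMap<>(repo.committed.get(delta.getKey()));
                    latest.putAll(delta.getValue());
                    repo.committed.put(delta.getKey(), latest);
                }
                pending.clear();
            } finally {
                // Step 3: "commit" by releasing every write lock.
                for (ReentrantLock lock : held) {
                    lock.unlock();
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyRepository repo = new ToyRepository();
        repo.create("/docs/report");
        ToySession writer = repo.login();
        writer.setProperty("/docs/report", "title", "Q2");
        System.out.println("before save: " + repo.read("/docs/report", "title"));
        writer.save();
        System.out.println("after save: " + repo.read("/docs/report", "title"));
    }
}
```

                  If a second session records a change to /docs/report and the node is removed before that session calls save(), the toy throws an IllegalStateException, which mirrors the rare incompatible-change case described above.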

                   

                  Please explain the synchronization process that keeps the data in sync across different nodes. In this case, which cluster mechanism will be most helpful out of these (Local, Replicated (not shared), Replicated (shared), Distributed, Remote)?

                   

                  I can't answer this, because there is no one right answer. (If there were, we wouldn't give users the option.) I can give you some guidelines, though:

                   

                  • You should rule out setting up Infinispan with a "local" mode, since that simply is not clustered.
                  • Use "replicated" for small cluster sizes (e.g., no more than 4-6-ish); otherwise, distributed will likely be more efficient
                  • Use a shared cache store only for JDBC or some other cache store that actually supports it; many will not

                   

                  I addressed remotely accessing ModeShape earlier. I do believe that if this is important for your architecture, you should build your own (RESTful?) service on top of ModeShape that is deployed with the ModeShape cluster. The benefit is that you can size the ModeShape cluster for throughput/load separately from your application; the disadvantage is the additional network overhead.