18 Replies. Latest reply on Mar 8, 2013 10:22 PM by clementp

    Very slow replication in cluster

    anweibel

      Another beginner's question...

       

      We have set up an Infinispan replication cluster with two members, both running on the same machine. When we change the content of a node in one process, the other member of the cluster seems to pick up the change only after several minutes, which seems like a very long time.

       

      Is there a way to force Infinispan to replicate changes immediately? Or have I misconfigured something? Or is the delay not caused by Infinispan, but by another layer of caching in ModeShape?

       

      Thank you for any hints!

       

      -------------------------------------------------------

       

      This is how we save changes to nodes

       

          public static void nodeSave( String identifier ) throws Exception {
              com.arjuna.ats.jta.UserTransaction.userTransaction().begin();

              Node node = getJcrSession().getNodeByIdentifier(identifier);
              node.setProperty("title", params.get("title") != null ? params.get("title") : "");
              node.setProperty("src", params.get("src") != null ? params.get("src") : "");

              getJcrSession().save();

              com.arjuna.ats.jta.UserTransaction.userTransaction().commit();
          }
      

       

      Infinispan configuration

       

      <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
              xmlns="urn:infinispan:config:5.1">
      
          <global>
              <transport clusterName="infinispan-cluster">
                  <properties>
                      <property name="configurationFile" value="jgroups-tcp.xml" />
                  </properties>
              </transport>
          </global>
      
          <namedCache name="clusteredCache">
      
              <loaders passivation="false" shared="false" preload="false">
                  <loader class="org.infinispan.loaders.file.FileCacheStore"
                      fetchPersistentState="false" purgeOnStartup="false">
                      <properties>
                          <property name="location" value="infinispanCacheDir" />
                      </properties>
                  </loader>
              </loaders>
      
              <transaction transactionMode="TRANSACTIONAL" lockingMode="OPTIMISTIC" 
              transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
              />
      
              <clustering mode="replication">
                  <stateTransfer chunkSize="0" fetchInMemoryState="true"
                      timeout="240000" />
                  <sync replTimeout="20000" />
              </clustering>
          </namedCache>
      
      </infinispan>
      

       

      Repository configuration

       

      {
          "name" : "myRepository",
          "workspaces" : {
              "predefined" : ["otherWorkspace"],
              "default" : "default",
              "allowCreation" : true,
              "initialContent" : {
                  "default" : "xmlImport/init.xml" 
              }
          },
          "storage" : {
              "cacheConfiguration" : "infinispan.xml",
              "cacheName" : "clusteredCache",
              "binaryStorage" : {
                  "minimumBinarySizeInBytes" : 4096,
                  "minimumStringSize" : 4096,
                  "type" : "file",
                  "directory" : "infinispanBinaryStorage"
              }
          },
          "node-types" : ["types.cnd"],
          "security" : {
              "anonymous" : {
                  "roles" : ["readonly","readwrite","admin"],
                  "useOnFailedLogin" : false
              }
          }
      }
      

       

      Output at startup time

       

      13:39:29,672 INFO  ~ ModeShape version 3.1.2.Final
      13:39:29,759 INFO  ~ Configured the freemarker templating system
      13:39:29,763 INFO  ~ Application 'myApp' is now started !
      13:39:31,286 INFO  ~ ISPN000078: Starting JGroups Channel
      13:39:31,737 INFO  ~ ISPN000094: Received new cluster view: [client-43181|1] [client-43181, client11-24967]
      13:39:31,739 INFO  ~ ISPN000079: Cache local address is client-24967, physical addresses are [127.0.0.1:7801]
      13:39:31,746 INFO  ~ ISPN000128: Infinispan version: Infinispan 'Brahma' 5.1.2.FINAL
      13:39:31,747 WARN  ~ ISPN000149: Fetch persistent state and purge on startup are both disabled, cache may contain stale entries on startup
      13:39:31,822 INFO  ~ ISPN000107: Retrieving transaction manager Transaction: unknown
      13:39:32,756 INFO  ~ ARJUNA012163: Starting service com.arjuna.ats.arjuna.recovery.ActionStatusService on port 52027
      13:39:32,760 INFO  ~ ARJUNA012337: TransactionStatusManagerItem host: 127.0.0.1 port: 52027
      13:39:32,764 INFO  ~ ARJUNA012170: TransactionStatusManager started on port 52027 and host 127.0.0.1 with service com.arjuna.ats.arjuna.recovery.ActionStatusService
      
        • 1. Re: Very slow replication in cluster
          rhauch

          We have set up an infinispan replication cluster with two members, both running on the same machine.

           

          Is there a way to force infinspan to replicate changes immediately? Or have I misconfigured something? Or is the delay not caused in infinispan, but there is another layer of caching in modeshape?

          It sounds like something is misconfigured; changes should propagate immediately.

           

           

          This is how we save changes to nodes

           

              public static void nodeSave( String identifier ) throws Exception {
                  com.arjuna.ats.jta.UserTransaction.userTransaction().begin();

                  Node node = getJcrSession().getNodeByIdentifier(identifier);
                  node.setProperty("title", params.get("title") != null ? params.get("title") : "");
                  node.setProperty("src", params.get("src") != null ? params.get("src") : "");

                  getJcrSession().save();

                  com.arjuna.ats.jta.UserTransaction.userTransaction().commit();
              }
          

           

           

          You don't need to use a UserTransaction if you're simply doing a single Session.save() within the transaction boundary. ModeShape will implicitly start, use, and commit a transaction within the Session.save() if there is no running UserTransaction. So you could just as easily do the following and achieve exactly the same effect (assuming there is no other requirement to have a UserTransaction):

           

           

              public static void nodeSave( String identifier ) throws Exception {
                  Node node = getJcrSession().getNodeByIdentifier(identifier);
                  node.setProperty("title", params.get("title") != null ? params.get("title") : "");
                  node.setProperty("src", params.get("src") != null ? params.get("src") : "");

                  getJcrSession().save();
              }

           

          Now, as far as your configuration, I have one comment:

              <loaders passivation="false" shared="false" preload="false">
                  <loader class="org.infinispan.loaders.file.FileCacheStore"
                      fetchPersistentState="false" purgeOnStartup="false">
                      <properties>
                          <property name="location" value="infinispanCacheDir" />
                      </properties>
                  </loader>
              </loaders>

          You are hard-coding the directory on the local file system where the FileCacheStore will persist the Infinispan content. Are you using the same Infinispan configuration file for all processes? If so, then you have two processes sharing the same storage, but the FileCacheStore is not capable of working this way. Each process needs to own and use a separate storage area; state transfer within the cluster will ensure that each process is kept up to date. (The JDBC-based cache stores can be shared, but you have to mark them with shared="true".)

           

          I'd suggest trying again but using different directories for each repository's storage (including indexes, which you're not currently storing on disk). ModeShape and Infinispan configuration files support variables, which means you still can use the same configuration file but you run each process with different system property values for these variables. One pattern I've seen is to use a single variable for a "process identifier", and then use that variable within the directory paths. However, using separate variables for each directory path is certainly a possibility. You can even supply defaults and use multiple variable names. See our documentation for details and examples.
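
As a sketch of that pattern (the variable name `process.id` and its default are invented here for illustration, assuming the variable-substitution support described above), the cache-store location could be parameterized like this:

```xml
<!-- Hypothetical: ${process.id:node1} is resolved from a system property,
     falling back to "node1" when the property is not set. -->
<loader class="org.infinispan.loaders.file.FileCacheStore"
    fetchPersistentState="false" purgeOnStartup="false">
    <properties>
        <property name="location" value="infinispanCacheDir/${process.id:node1}" />
    </properties>
</loader>
```

Each process would then be launched with its own value, e.g. `java -Dprocess.id=node2 ...`, so both can share one configuration file while writing to separate directories.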

          • 2. Re: Very slow replication in cluster
            anweibel

            Thanks a lot for the fast reply and the helpful suggestions!

             

            However, the two processes are started from different locations, so they already use separate storage directories. Could there be any other reason for the slow replication?

            • 3. Re: Very slow replication in cluster
              rhauch

              Let's see. From your log:

               

              13:39:29,672 INFO  ~ ModeShape version 3.1.2.Final
              13:39:29,759 INFO  ~ Configured the freemarker templating system
              13:39:29,763 INFO  ~ Application 'myApp' is now started !
              13:39:31,286 INFO  ~ ISPN000078: Starting JGroups Channel
              13:39:31,737 INFO  ~ ISPN000094: Received new cluster view: [client-43181|1] [client-43181, client11-24967]
              13:39:31,739 INFO  ~ ISPN000079: Cache local address is client-24967, physical addresses are [127.0.0.1:7801]
              13:39:31,746 INFO  ~ ISPN000128: Infinispan version: Infinispan 'Brahma' 5.1.2.FINAL
              13:39:31,747 WARN  ~ ISPN000149: Fetch persistent state and purge on startup are both disabled, cache may contain stale entries on startup
              13:39:31,822 INFO  ~ ISPN000107: Retrieving transaction manager Transaction: unknown
              13:39:32,756 INFO  ~ ARJUNA012163: Starting service com.arjuna.ats.arjuna.recovery.ActionStatusService on port 52027
              13:39:32,760 INFO  ~ ARJUNA012337: TransactionStatusManagerItem host: 127.0.0.1 port: 52027
              13:39:32,764 INFO  ~ ARJUNA012170: TransactionStatusManager started on port 52027 and host 127.0.0.1 with service com.arjuna.ats.arjuna.recovery.ActionStatusService

               

              It looks like this process is seeing the other process, so they've both joined the cluster.

               

              You should probably use fetchPersistentState="true", so that when a process joins the cluster it gets all the state. That probably explains this log output:

               

                   Fetch persistent state and purge on startup are both disabled, cache may contain stale entries on startup
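
              Concretely, that change is a single attribute on the loader element from the configuration earlier in the thread:

```xml
<loaders passivation="false" shared="false" preload="false">
    <loader class="org.infinispan.loaders.file.FileCacheStore"
        fetchPersistentState="true" purgeOnStartup="false">
        <properties>
            <property name="location" value="infinispanCacheDir" />
        </properties>
    </loader>
</loaders>
```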

               

              However, if you continue to have problems, you may want to enable debug logging on the following packages:

               

              org.modeshape

              org.infinispan.statetransfer

              org.infinispan.schematic

               

              The logging will be verbose, but it may be helpful.

              • 4. Re: Very slow replication in cluster
                anweibel

                Thank you for your help!

                 

                Turning on fetchPersistentState didn't make a difference, unfortunately.

                 

                I have turned logging for the three packages on now, and the results seem interesting...

                 

                If I interpret the log correctly, the two members of the Infinispan cluster exchange the changed value instantaneously. However, the value I get from session.getNodeByIdentifier only changes after around 90 seconds.

                 

                2013-02-22 18:41:48,961 --- [This is when I do the update in the other process] ---

                2013-02-22 18:41:48,961 TRACE ~ Deserialized delta as document { "metadata" : { "id" : "2a2282e7505d64091be640-4746-45ff-86ea-afad10d9b42b" , "contentType" : "application/json" } , "content" : { "key" : "2a2282e7505d64091be640-4746-45ff-86ea-afad10d9b42b" , "parent" : "2a2282e7505d6418b163ad-582d-461c-a784-96fbff9dadfc" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{[deleted]}template" } , "created" : { "$date" : "2013-02-21T12:27:51.723Z" } , "lastModified" : { "$date" : "2013-02-22T18:41:48.958Z" } , "lastModifiedBy" : "<anonymous>" , "createdBy" : "<modeshape-worker>" } , "" : { "src" : "This is some test content new28" , "title" : "new value" } } } }

                2013-02-22 18:41:48,961 TRACE ~ Acquired shared state transfer shared lock, total holders: 1

                2013-02-22 18:41:48,962 TRACE ~ Merging whole doc delta into existing literal, resulting in SchematicEntryLiteral{ "metadata" : { "id" : "2a2282e7505d64091be640-4746-45ff-86ea-afad10d9b42b" , "contentType" : "application/json" } , "content" : { "key" : "2a2282e7505d64091be640-4746-45ff-86ea-afad10d9b42b" , "parent" : "2a2282e7505d6418b163ad-582d-461c-a784-96fbff9dadfc" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{[deleted]}template" } , "created" : { "$date" : "2013-02-21T12:27:51.723Z" } , "lastModified" : { "$date" : "2013-02-22T18:41:48.958Z" } , "lastModifiedBy" : "<anonymous>" , "createdBy" : "<modeshape-worker>" } , "" : { "src" : "This is some test content new28" , "title" : "new value" } } } }

                2013-02-22 18:41:48,962 TRACE ~ Committed 2a2282e7505d64091be640-4746-45ff-86ea-afad10d9b42b: { "metadata" : { "id" : "2a2282e7505d64091be640-4746-45ff-86ea-afad10d9b42b" , "contentType" : "application/json" } , "content" : { "key" : "2a2282e7505d64091be640-4746-45ff-86ea-afad10d9b42b" , "parent" : "2a2282e7505d6418b163ad-582d-461c-a784-96fbff9dadfc" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{[deleted]}template" } , "created" : { "$date" : "2013-02-21T12:27:51.723Z" } , "lastModified" : { "$date" : "2013-02-22T18:41:48.958Z" } , "lastModifiedBy" : "<anonymous>" , "createdBy" : "<modeshape-worker>" } , "" : { "src" : "This is some test content new28" , "title" : "new value" } } } }

                2013-02-22 18:41:48,962 TRACE ~ Released shared state transfer shared lock, remaining holders: 0

                2013-02-22 18:41:48,967 TRACE ~ Acquired shared state transfer shared lock (for commit), total holders: 1

                2013-02-22 18:41:48,969 TRACE ~ Released shared state transfer shared lock, remaining holders: 0

                2013-02-22 18:41:55,181 INFO  ~ Title value = title=old value

                2013-02-22 18:42:18,451 INFO  ~ Title value = title=old value

                2013-02-22 18:42:35,121 INFO  ~ Title value = title=old value

                2013-02-22 18:42:50,216 INFO  ~ Title value = title=old value

                2013-02-22 18:43:06,288 INFO  ~ Title value = title=old value

                2013-02-22 18:43:24,254 INFO  ~ Title value = title=new value

                 

                Here is how I read the node value:

                public static void node( String identifier ) throws RepositoryException {
                    Node node = JCR.session.getNodeByIdentifier(identifier);
                    notFoundIfNull(node);

                    NodeType nodeType = node.getPrimaryNodeType();
                    Logger.info("Title value = " + node.getProperty("title"));
                }

                 

                (JCR.session is a static variable, i.e. I do not login again after each request.)

                • 5. Re: Very slow replication in cluster
                  rhauch

                  (JCR.session is a static variable, i.e. I do not login again after each request.)

                  Oh, that's wrong. You're essentially allowing multiple threads to reuse the same session, and because Session is stateful, each thread will see the other threads' concurrent changes.

                   

                  Every request should obtain a new Session - this is the way the JCR API was designed.
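
                  A minimal sketch of that pattern, assuming an already-initialized `javax.jcr.Repository` named `repository` (the variable, method name, and logging call are placeholders, not code from this thread):

```java
public static void handleRequest( String identifier ) throws RepositoryException {
    // Obtain a fresh, thread-confined session for this request only.
    Session session = repository.login();
    try {
        Node node = session.getNodeByIdentifier(identifier);
        Logger.info("Title value = " + node.getProperty("title").getString());
    } finally {
        // Always log out so the session's cached state cannot leak across requests.
        session.logout();
    }
}
```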

                   

                  If you're seeing concurrent updates, then you need to use the PESSIMISTIC locking mode and not OPTIMISTIC.

                  • 6. Re: Very slow replication in cluster
                    anweibel

                    Thanks again for the helpful comment. We are now using a different JCR session per request (in the future, we'll have one JCR session per http session).

                     

                    However, even when we do a new login to get a new Session for each request, we see updates to nodes only after around 2 minutes... It seems as if there were a cache that is invalidated after this time, but I don't understand whether this is an Infinispan issue (that I could probably configure away somehow) or whether ModeShape has another cache on top of Infinispan.

                    • 7. Re: Very slow replication in cluster
                      cbeer

                      Hi Andreas -- I've been struggling with the same kind of problem too. Next week I'll try to pull together a simple test case that demonstrates this behavior.

                      • 8. Re: Very slow replication in cluster
                        rhauch

                        I just took an example used to replicate MODE-1830 and uploaded a new version with some changes and best practices for configuring clustered ModeShape and Infinispan in a JavaSE environment. Have a look to see if it helps.

                         

                        Note that MODE-1830 appears to be a problem with how ModeShape handled events originating from other nodes in the cluster. The core problem is that when changes were made in one process (say Process A), events were correctly sent across the cluster but the other processes (e.g., Processes B, C, ...) did not properly use the events to purge the changed items from the local cache. Obviously this means that changes did not become visible until other changes were made to the same nodes. Kind of a big deal. But interestingly, ModeShape still applied the changes correctly to the persistent storage -- IOW, no data was lost, which I think is why we didn't really see it in any of our testing. But I think it does explain many of the problems people have been having lately with clustering. See the issue for the details.

                         

                        The good news is that I've created a pull request and we should be able to merge this tomorrow AM. And we're ready to release 3.1.3.Final tomorrow, too, so if you're running into similar problems please give 3.1.3 a try as soon as possible.

                        • 9. Re: Very slow replication in cluster
                          anweibel

                          Thank you Randall.

                           

                          I have tested our application with 3.1.3 - and we still have the same issue. There seems to be more logging information in this version, though. Here are the log entries for the same scenario as above:

                           

                          2013-02-28 14:53:11,502 [In this moment, we change the title on the other host]
                          2013-02-28 14:53:11,502 TRACE ~ Deserialized delta as document { "metadata" : { "id" : "2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0" , "contentType" : "application/json" } , "content" : { "key" : "2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0" , "parent" : "2a2282e7505d64ef00047c-cab4-4a36-a8e1-d687dc1e5c1f" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{...}template" } , "created" : { "$date" : "2013-02-27T09:46:52.991Z" } , "lastModified" : { "$date" : "2013-02-28T14:53:11.502Z" } , "lastModifiedBy" : "<anonymous>" , "createdBy" : "<modeshape-worker>" } , "" : { "src" : "This is some test content" , "title" : "new value" , "identifier" : "4a1cd3dc-4951-44e3-a606-113599e483d0" , "controller" : "content" , "body" : "authenticityToken=976022cfa047dd49eb928ab365764e362fd256a4&identifier=4a1cd3dc-4951-44e3-a606-113599e483d0&src=This+is+some+test+content&jcrNodeName=testTemplate&title=new+value" , "action" : "nodesave" , "authenticityToken" : "976022cfa047dd49eb928ab365764e362fd256a4" } } } }

                          2013-02-28 14:53:11,502 TRACE ~ Acquired shared state transfer shared lock, total holders: 1

                          2013-02-28 14:53:11,502 TRACE ~ Merging whole doc delta into existing literal, resulting in SchematicEntryLiteral{ "metadata" : { "id" : "2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0" , "contentType" : "application/json" } , "content" : { "key" : "2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0" , "parent" : "2a2282e7505d64ef00047c-cab4-4a36-a8e1-d687dc1e5c1f" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{...}template" } , "created" : { "$date" : "2013-02-27T09:46:52.991Z" } , "lastModified" : { "$date" : "2013-02-28T14:53:11.502Z" } , "lastModifiedBy" : "<anonymous>" , "createdBy" : "<modeshape-worker>" } , "" : { "src" : "This is some test content" , "title" : "new value" , "identifier" : "4a1cd3dc-4951-44e3-a606-113599e483d0" , "controller" : "content" , "body" : "authenticityToken=976022cfa047dd49eb928ab365764e362fd256a4&identifier=4a1cd3dc-4951-44e3-a606-113599e483d0&src=This+is+some+test+content&jcrNodeName=testTemplate&title=new+value" , "action" : "nodesave" , "authenticityToken" : "976022cfa047dd49eb928ab365764e362fd256a4" } } } }

                          2013-02-28 14:53:11,503 TRACE ~ Committed 2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0: { "metadata" : { "id" : "2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0" , "contentType" : "application/json" } , "content" : { "key" : "2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0" , "parent" : "2a2282e7505d64ef00047c-cab4-4a36-a8e1-d687dc1e5c1f" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{...}template" } , "created" : { "$date" : "2013-02-27T09:46:52.991Z" } , "lastModified" : { "$date" : "2013-02-28T14:53:11.502Z" } , "lastModifiedBy" : "<anonymous>" , "createdBy" : "<modeshape-worker>" } , "" : { "src" : "This is some test content" , "title" : "new value" , "identifier" : "4a1cd3dc-4951-44e3-a606-113599e483d0" , "controller" : "content" , "body" : "authenticityToken=976022cfa047dd49eb928ab365764e362fd256a4&identifier=4a1cd3dc-4951-44e3-a606-113599e483d0&src=This+is+some+test+content&jcrNodeName=testTemplate&title=new+value" , "action" : "nodesave" , "authenticityToken" : "976022cfa047dd49eb928ab365764e362fd256a4" } } } }

                          2013-02-28 14:53:11,503 TRACE ~ Released shared state transfer shared lock, remaining holders: 0

                          2013-02-28 14:53:11,503 TRACE ~ Acquired shared state transfer shared lock (for commit), total holders: 1

                          2013-02-28 14:53:11,503 TRACE ~ Released shared state transfer shared lock, remaining holders: 0

                          2013-02-28 14:53:15,606 TRACE ~ Node '4a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:53:15,606 INFO  ~ Title value = title=old value

                          2013-02-28 14:53:21,082 TRACE ~ Node '4a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:53:21,082 INFO  ~ Title value = title=old value

                          2013-02-28 14:53:51,066 TRACE ~ Node '4a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:53:51,066 INFO  ~ Title value = title=old value

                          2013-02-28 14:54:07,245 TRACE ~ Node '4a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:54:07,245 INFO  ~ Title value = title=old value

                          2013-02-28 14:54:19,336 TRACE ~ Node '4a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:54:19,336 INFO  ~ Title value = title=old value

                          2013-02-28 14:54:29,991 TRACE ~ Node '4a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:54:29,991 INFO  ~ Title value = title=old value

                          2013-02-28 14:54:38,307 TRACE ~ Node '4a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:54:38,307 INFO  ~ Title value = title=old value

                          2013-02-28 14:55:10,849 TRACE ~ Node '4a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:55:10,849 TRACE ~ Node '2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:55:10,849 TRACE ~ Materialized document '2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0' in 'default' workspace from store: { "key" : "2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0" , "parent" : "2a2282e7505d64ef00047c-cab4-4a36-a8e1-d687dc1e5c1f" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{...}template" } , "created" : { "$date" : "2013-02-27T09:46:52.991Z" } , "lastModified" : { "$date" : "2013-02-28T14:53:11.502Z" } , "lastModifiedBy" : "<anonymous>" , "createdBy" : "<modeshape-worker>" } , "" : { "src" : "This is some test content" , "title" : "new value" , "identifier" : "4a1cd3dc-4951-44e3-a606-113599e483d0" , "controller" : "content" , "body" : "authenticityToken=976022cfa047dd49eb928ab365764e362fd256a4&identifier=4a1cd3dc-4951-44e3-a606-113599e483d0&src=This+is+some+test+content&jcrNodeName=testTemplate&title=new+value" , "action" : "nodesave" , "authenticityToken" : "976022cfa047dd49eb928ab365764e362fd256a4" } } }

                          2013-02-28 14:55:10,850 INFO  ~ Title value = title=new value

                          2013-02-28 14:55:10,850 TRACE ~ Node '2a2282e7505d64ef00047c-cab4-4a36-a8e1-d687dc1e5c1f' is not found in the 'default' workspace cache; looking in store

                          2013-02-28 14:55:10,850 TRACE ~ Materialized document '2a2282e7505d64ef00047c-cab4-4a36-a8e1-d687dc1e5c1f' in 'default' workspace from store: { "key" : "2a2282e7505d64ef00047c-cab4-4a36-a8e1-d687dc1e5c1f" , "parent" : "2a2282e7505d6442e84908-e489-49ac-bcf0-84aaaa0f375f" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{...}templates" } } } , "children" : [ { "key" : "2a2282e7505d644a1cd3dc-4951-44e3-a606-113599e483d0" , "name" : "testTemplate" } ] , "childrenInfo" : { "count" : 1 } }

                           

                          It still seems that Infinispan is exchanging information about the update immediately, while ModeShape only picks up the change after around 90 seconds.

                           

                          When I stop and start the application, it immediately starts with the new value.

                           

                          (Let me know if you prefer these comments in the Jira issue instead of in the discussion forum.)

                          • 10. Re: Very slow replication in cluster
                            rhauch

                            Actually, I just noticed that your repository configuration does not have a "clustering" section. You absolutely need this in order for one ModeShape process to be notified of changes made in another process. Here's what it should look like for your Infinispan configuration:

                             

                            "clustering" : {
                                "clusterName" : "infinispan-cluster"
                            }

                             

                            That reuses the same JGroups configuration that Infinispan uses.
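
                            Merged into the repository configuration posted at the start of the thread, the top level would then look like this (a sketch; unchanged sections are elided):

```json
{
    "name" : "myRepository",
    "clustering" : {
        "clusterName" : "infinispan-cluster"
    },
    "workspaces" : { ... },
    "storage" : { ... },
    "node-types" : ["types.cnd"],
    "security" : { ... }
}
```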

                             

                            If that doesn't resolve the problem, try enabling TRACE logging on these packages to watch ModeShape processing those events:

                             

                            • org.modeshape.jcr.bus
                            • org.modeshape.jcr.cache
                            • org.modeshape.jcr.txn

                             

                            And if that doesn't help, please include a self-contained test case that replicates the problem.

                            • 11. Re: Very slow replication in cluster
                              clementp

                              Curiously, I am also seeing this problem when I run ModeShape (3.1.3) in a JBoss AS (7.1.1) cluster. Data retrieved is immediately refreshed to its current state if the node is restarted. I will have to dig deeper to see what's going on.

                               

                              My ModeShape configuration:

                               

                              <subsystem xmlns="urn:jboss:domain:modeshape:1.0">
                                      <repository name="dshell" cache-name="dshell" cache-container="modeshape" cluster-name="modeshape-cluster" cluster-stack="tcp"/>
                              </subsystem>
                              

                               

                              Infinispan configuration:

                               

                              <cache-container name="modeshape" default-cache="dshell">
                                   <transport lock-timeout="60000"/>
                                   <distributed-cache name="dshell" mode="SYNC">
                                          <transaction mode="FULL_XA" locking="PESSIMISTIC"/>
                                          <file-store relative-to="jboss.server.data.dir" path="modeshape/store/dshell" passivation="false" purge="false">
                                                 <write-behind/>
                                          </file-store>
                                   </distributed-cache>
                              </cache-container>
                              

                               

                              JGroups configuration:

                               

                              <subsystem xmlns="urn:jboss:domain:jgroups:1.1" default-stack="tcp">
                                <stack name="tcp">
                                  <transport type="TCP" socket-binding="jgroups-tcp"/>
                                  <protocol type="TCPPING">
                                    <property name="initial_hosts">
                                      server1[7600],server2[7600]
                                    </property>
                                    <property name="num_initial_members">
                                      1
                                    </property>
                                    <property name="port_range">
                                      0
                                    </property>
                                    <property name="timeout">
                                      2000
                                    </property>
                                  </protocol>
                                  <protocol type="MERGE2"/>
                                  <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
                                  <protocol type="FD"/>
                                  <protocol type="VERIFY_SUSPECT"/>
                                  <protocol type="BARRIER"/>
                                  <protocol type="pbcast.NAKACK"/>
                                  <protocol type="UNICAST2"/>
                                  <protocol type="pbcast.STABLE"/>
                                  <protocol type="pbcast.GMS"/>
                                  <protocol type="UFC"/>
                                  <protocol type="MFC"/>
                                  <protocol type="FRAG2"/>
                                </stack>
                              </subsystem>
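                              As a first sanity check (a stdlib-only diagnostic sketch, not part of any of the configurations above), it can help to verify that the JGroups TCP ports listed in TCPPING's initial_hosts are actually reachable from each node; the host names and port below mirror the initial_hosts list, and an unreachable entry here would explain a channel never forming a cluster view:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal reachability check for the JGroups TCP ports from TCPPING's
// initial_hosts. Purely diagnostic: a refused, timed-out, or unresolvable
// host/port would explain why a channel never receives a cluster view.
public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // connection refused, timed out, or host not resolvable
        }
    }

    public static void main(String[] args) {
        // Hosts and port taken from the TCPPING initial_hosts above.
        String[] hosts = { "server1", "server2" };
        for (String h : hosts) {
            System.out.println(h + ":7600 reachable = " + reachable(h, 7600, 2000));
        }
    }
}
```

If both ports are reachable but the channel still forms no view, the problem is more likely in the stack configuration than in the network.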
                              
                              • 12. Re: Very slow replication in cluster
                                clementp

                                 I was able to debug the code and see that the servers never actually receive cluster views from each other, which is odd since there are other JGroups channels in JBoss and they all print something similar to:

                                 

                                [Server:server-one] 07:18:52,820 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,null) ISPN000094: Received new cluster view: [master:server-one/modeshape|1] [master:server-one/modeshape, slave:server-one/modeshape]

                                 

                                 The above is Infinispan's own channel for the cache, but the channel used for the ModeShape repository itself never receives those messages. Setting a breakpoint in ClusteredRepositoryChangeBus shows that channelConnected() is called on the listener, but viewAccepted() on the receiver is never called.

                                 

                                 Curiously, the same log message is printed whenever JGroups starts a channel (here is the one from Infinispan, which successfully receives cluster views when a node joins):

                                 

                                [Server:server-one] 07:08:51,019 INFO  [stdout] (pool-27-thread-1) -------------------------------------------------------------------

                                [Server:server-one] 07:08:51,020 INFO  [stdout] (pool-27-thread-1) GMS: address=master:server-one/modeshape, cluster=modeshape, physical address=10.XX.XX.XX:7600

                                [Server:server-one] 07:08:51,021 INFO  [stdout] (pool-27-thread-1) -------------------------------------------------------------------

                                 

                                 And here is the one for the ModeShape repository channel (which never receives cluster views):

                                 

                                [Server:server-one] 07:12:26,484 INFO  [stdout] (MSC service thread 1-2) -------------------------------------------------------------------

                                [Server:server-one] 07:12:26,485 INFO  [stdout] (MSC service thread 1-2) GMS: address=master:server-one/modeshape-cluster, cluster=modeshape-cluster, physical address=10.XX.XX.XX:7600

                                [Server:server-one] 07:12:26,486 INFO  [stdout] (MSC service thread 1-2) -------------------------------------------------------------------

                                 

                                 I can't see any difference in how they are configured (I set breakpoints in the exact code that starts both channels), but the ModeShape channel just never kicks in (no cluster view updates). Hence my guess is that a locally cached view of the data is never purged properly, leading to stale data on the other nodes.
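                                 One way to confirm this from the logs alone (a sketch: the message format is copied from the log excerpts above, and the log file path and node prefix `master:server-one` will vary per installation) is to count the cluster-view messages each channel has logged:

```shell
#!/bin/sh
# count_views LOGFILE CLUSTER
# Counts "Received new cluster view" messages for a given cluster name
# (e.g. "modeshape" for the Infinispan cache channel, "modeshape-cluster"
# for the ModeShape repository channel). The message format follows the
# log excerpts above.
count_views() {
  grep -c "Received new cluster view: \[master:server-one/$2" "$1" 2>/dev/null || true
}
```

In the scenario described here, `count_views server.log modeshape` would return a nonzero count while `count_views server.log modeshape-cluster` returns 0, matching the observation that only the Infinispan channel receives views.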

                                • 13. Re: Very slow replication in cluster
                                  rhauch

                                  Clement:

                                   

                                   Doesn't your Infinispan configuration need to specify the (JGroups) transport stack? Something like this:

                                   

                                  <cache-container name="modeshape" default-cache="dshell">
                                       <transport lock-timeout="60000" stack="tcp"/>
                                       <distributed-cache name="dshell" mode="SYNC">
                                              <transaction mode="FULL_XA" locking="PESSIMISTIC"/>
                                              <file-store relative-to="jboss.server.data.dir" path="modeshape/store/dshell" passivation="false" purge="false">
                                                     <write-behind/>
                                              </file-store>
                                       </distributed-cache>
                                  </cache-container>

                                   

                                   Anyway, the symptom you describe (that is, restarting a server makes all changes visible) sounds exactly like ModeShape is not connecting via JGroups, and thus not receiving any events from the other members of the cluster. However, I don't see anything wrong with the ModeShape configuration.

                                  • 14. Re: Very slow replication in cluster
                                    clementp

                                     I believe the default stack is tcp, and I do see Infinispan print that it has found its peers properly. However, I do not see the corresponding trace from ModeShape.
