2 Replies Latest reply on Oct 12, 2016 12:02 PM by kaape

    Outdated session cache in domain mode Wildfly cluster

    kaape

      Hello Modeshape Community,

       

      we're currently trying to build a Wildfly (10.0.0.Final) cluster in domain mode with ModeShape 5.2.0.Final.

      In our minimal test setup the cluster runs on a single machine with two nodes.

      The repositories use database persistence with an Oracle RDBMS.

       

      Unfortunately, the changes our application makes in a repository on cluster node 1 are not visible on cluster node 2, although both nodes use the same Oracle DataSource and "db" cluster-locking.

      This is (probably) caused by an outdated session cache on node 2.

       

      From my understanding of ModeShape clustering with JGroups and the explanation in https://developer.jboss.org/thread/272341 JGroups messages should be sent from node 1 to node 2 to invalidate the session cache as soon as repository data is changed on node 1.

      With TRACE-level logs for "org.jgroups" I could confirm that messages are properly sent by node 1 and received on node 2 after changes.

      Using tcpdump on the JGroups TCP socket port, we could further verify that the payload of these messages contains information about the changeset, e.g. the attributes and data of a newly created repository node.

      Hence the communication within the cluster / between the nodes seems to work, but the session cache is not updated.

       

      If we configure the cache-size of the workspace to 1 (0 is not schema-valid), the changes are immediately visible, but performance takes a major hit.

       

      Is my understanding correct that the relevant parts of the session cache of node 2 should be invalidated as soon as JGroups messages with a changeset are received?

       

      Could you please help us find the wrong parts in our minimal cluster configuration?

      Please see the attached domain, host-master and jgroups-config xml files.

      (modcluster is currently not configured properly. We're testing by accessing the two nodes manually and retrieving the repository state with modeshape-rest)

       

      Thanks for your help!

        • 1. Re: Outdated session cache in domain mode Wildfly cluster
          hchiorean

          Your understanding of how workspace cache clearing works is correct: when a session is saved successfully, a number of events containing the changes from that session are broadcast via JGroups to all members of the cluster. Once these messages are (eventually) received, the receiving nodes should clear their local workspace (WS) caches, which means that the "next read operation" should get fresh data from the persistent store. Note that "eventually" is important here, as this operation is not synchronous. So in the [transaction.commit(), events received] interval, other cluster nodes may see stale data.
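
          To make the stale-read window concrete, here is a minimal, self-contained sketch of the idea in plain Java. It is not ModeShape code: the class and method names (WorkspaceCacheSketch, Member, deliverEvents) are hypothetical, a map stands in for the database, and a queue stands in for JGroups delivery. As a simplification it evicts only the changed keys, whereas the real implementation may clear more than that.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class WorkspaceCacheSketch {

    /** Stand-in for the shared persistent store (an assumption; not a real database). */
    static final Map<String, String> STORE = new ConcurrentHashMap<>();

    /** One hypothetical cluster member with a local cache and an inbox of change events. */
    static final class Member {
        final Map<String, String> cache = new ConcurrentHashMap<>();
        final Queue<String> inbox = new ConcurrentLinkedQueue<>();

        /** Reads through the local cache; only a cache miss goes to the store. */
        String read(String key) {
            return cache.computeIfAbsent(key, STORE::get);
        }

        /** Persists a change, then queues a change event for every other member. */
        void save(String key, String value, Member... others) {
            STORE.put(key, value);
            cache.put(key, value);
            for (Member other : others) {
                other.inbox.add(key); // asynchronous: nothing is evicted on the receiver yet
            }
        }

        /** Processes pending change events and returns how many were handled. */
        int deliverEvents() {
            int handled = 0;
            String key;
            while ((key = inbox.poll()) != null) {
                cache.remove(key); // the next read of this key goes back to the store
                handled++;
            }
            return handled;
        }
    }

    public static void main(String[] args) {
        Member node1 = new Member();
        Member node2 = new Member();
        STORE.put("/doc", "v1");

        System.out.println(node2.read("/doc")); // v1, now cached on node 2
        node1.save("/doc", "v2", node2);
        System.out.println(node2.read("/doc")); // v1, stale: the event is not processed yet
        node2.deliverEvents();
        System.out.println(node2.read("/doc")); // v2, cache miss followed by a fresh read
    }
}
```

          In this model, if deliverEvents() is never invoked on node 2 (for instance because no channel between the members exists), the second read stays stale indefinitely, which matches the symptom described in the original question.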

           

          As long as messages are dispatched and received correctly, everything should work. You can investigate the cache clearing behavior by enabling TRACE logging and looking for the messages printed here: modeshape/WorkspaceCache.java at master · hchiorean/modeshape · GitHub. After a WS cache is cleared, the first lookup for a particular node key should retrieve a copy from the DB unless there is an active transaction which has already loaded that node. In theory this case should not be possible because exclusive locking is used (unless there's a bug somewhere). In other words, ModeShape explicitly disallows multiple transactions concurrently changing the same node(s).

           

          In terms of the attached configuration files, the local jgroups-config.xml file looks fine to me. Note that I have no experience with WF domain mode, so I can't tell for sure whether the JGroups WF configuration is ok or not. For example, I don't see the oob_thread_pool.enabled="false" and thread_pool.enabled="false" properties, which are critical for correct event dispatching (but they may be 'false' by default in WF).

           

          So first I would suggest looking at TRACE logging in the WS cache to check that the caches are cleared. The other thing you can probably test (if you haven't done so already) is a simple non-WF cluster using your JG config which should indicate if this particular issue is WF related or not.
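
          When checking visibility from a second node in either setup, keep in mind that event delivery is asynchronous, so a single read immediately after a save can legitimately return stale data. A small polling helper along these lines can separate "not yet propagated" from "never propagated". This is only a sketch: EventualRead and awaitValue are hypothetical names, and the Supplier stands in for whatever read you actually perform (for example a call against modeshape-rest).

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Supplier;

public class EventualRead {

    /**
     * Polls the given read until it returns the expected value or the timeout elapses.
     * Returns true if the expected value was observed in time, false otherwise
     * (including when the waiting thread is interrupted).
     */
    static <T> boolean awaitValue(Supplier<T> read, T expected, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (Objects.equals(read.get(), expected)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the expected value
            }
            try {
                Thread.sleep(50); // fixed poll interval, chosen arbitrarily for this sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical reader: returns "v1" for the first 200 ms, then "v2".
        Supplier<String> reader =
                () -> System.currentTimeMillis() - start < 200 ? "v1" : "v2";
        System.out.println(awaitValue(reader, "v2", Duration.ofSeconds(2)));
    }
}
```

          If such a check still times out with a generous limit, the cause is more likely missing cache invalidation than the normal asynchronous delay.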

          • 2. Re: Outdated session cache in domain mode Wildfly cluster
            kaape

            Thanks for your input.

            It helped me pinpoint the cause of the error.

             

            The JGroups channel for the repository cluster is not established, because the ClusteringService isn't even activated.

            Unfortunately the cluster settings in the jboss subsystem (domain.xml) are not properly loaded on repository startup.

            I've filed an issue: [MODE-2638] Repository cluster settings are not properly loaded during Wildfly startup - JBoss Issue Tracker