7 Replies Latest reply on Mar 3, 2016 1:23 PM by haiaw

    Jboss/Infinispan clustered Treecache - blocks server to start

    haiaw

      Hi,

       

      I see this Cache example: https://github.com/infinispan/infinispan-quickstart/tree/master/clustered-cache

       

      I have a very similar setup based on JBossCache 1.4.

       

      I want to migrate to JBoss, but first I want to know whether the concept is the same.

       

      In JBossCache 1.4 I have 4 nodes. While they connect to the cache cluster using JGroups, with the so-called JGroups "Promises" (acknowledging the readiness of the different protocols), they suspend the whole deployment process on WebLogic Server. Since I have a problem setting up the cluster between node A and node B on a different machine, one of the machines is deploying endlessly...

       

      I would like a mechanism that lets the deployment continue and joins the cache cluster in the background...

       

      Is it possible in Infinispan?

       

      Best regards

        • 1. Re: Jboss/Infinispan clustered Treecache - blocks server to start
          nadirx

          Not sure I understand what you mean.

          Infinispan uses JGroups to handle all things related to cluster communication. The default configurations don't wait for the entire cluster to form, so you can start one node at a time. There is an initial discovery process which blocks for a short while (the GMS join timeout), but after that it does essentially "start in the background".
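Independently of what the cache library does, the "start in the background" idea can also be approximated at the application level. The following is only a minimal sketch using the plain JDK: `BackgroundStarter` is a hypothetical helper (not part of Infinispan, JBossCache or JGroups), and the `Supplier` passed to it stands for whatever blocking call creates and joins the cache.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper: runs a blocking start (e.g. a cluster join) off the deployment thread. */
final class BackgroundStarter<T> {
    private final CompletableFuture<T> future;

    BackgroundStarter(Supplier<T> blockingStart) {
        // Daemon thread, so a join that never finishes cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "cache-join");
            t.setDaemon(true);
            return t;
        });
        this.future = CompletableFuture.supplyAsync(blockingStart, executor);
        executor.shutdown(); // accept no further tasks; the submitted one still runs
    }

    /** Returns the started resource, or null while the start is pending or if it failed. */
    T getIfReady() {
        return future.isDone() && !future.isCompletedExceptionally() ? future.join() : null;
    }

    /** Waits up to timeoutMillis; returns null on timeout, failure or interruption. */
    T awaitReady(long timeoutMillis) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException | TimeoutException e) {
            return null;
        }
    }
}
```

With something like this, deployment code would construct the starter and return immediately, and callers would treat a `null` from `getIfReady()` as "cache not available yet" and go to the underlying data source instead.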

          • 2. Re: Jboss/Infinispan clustered Treecache - blocks server to start
            haiaw

            Are you sure it does that?

             

            I am using JBossCache 1.4.1, which also uses JGroups (2.4.1), and GMS keeps retrying and timing out endlessly, so the deployment is blocked the whole time. I am debugging the JGroups sources and cannot find the reason. GMS is one of the last protocols on the stack and, strangely, it succeeds 2 times in 10. I wonder whether Infinispan could do better; this information is very important to me.

            • 3. Re: Jboss/Infinispan clustered Treecache - blocks server to start
              nadirx

              Pretty sure

              Also we are using JGroups 3.6. Things have changed. But belaban can probably provide more insight

              • 4. Re: Jboss/Infinispan clustered Treecache - blocks server to start
                belaban

                I have no idea what Colin is talking about, Colin, can you rephrase?

                • 5. Re: Jboss/Infinispan clustered Treecache - blocks server to start
                  haiaw

                  1) I have one WAR which is deployed on two virtual machines (two separate IP addresses). Each virtual machine has one WebLogic installation, and on each WebLogic I have two servers, front and backend. So my WAR file is deployed 4 times: front and back on virtual machine 1, and front and back on virtual machine 2. I just want each WAR to deploy seamlessly, independently of any cache problems. Each node should attach to the cluster in the background so as not to block the WAR file deployment. Front and back on virtual machine 1 start OK, but the deployment of front or back on virtual machine 2 never ends because the GMS protocol keeps retrying the connection endlessly. It succeeds about 1 time in 6, and I still don't know why.

                   

                  2) Moreover, I am wondering whether we have a good cache architecture. I have 4 deployments of the same WAR file on two machines, so there are two IP addresses. Each WAR has a cluster config file based on TreeCache. The cache can be replicated sync/async or invalidated sync/async, depending on the settings. But if something very important is cached and must be replicated or invalidated ASAP, for example a snapshot of user permissions, then problems can occur either way. If a timeout occurs during replication/invalidation, the other nodes have wrong user permissions in their cache. If, on the other hand, it works but too slowly, and the cache is replicated/invalidated in sync mode, the GUI and the end user are left waiting for the server to respond. Worse, if a timeout occurs in a JBoss Cache invalidation/replication transaction, it rolls back or blocks the whole use case started by the user in the GUI. It is not acceptable that cache problems roll back or block user actions. That's why I am thinking of changing the cache architecture in our project.

                  Maybe the best solution would be to have one cache source, not clustered, which every node would query and update. We have a database to which every node is connected, and that works well. Just think what would happen if each node had its own database that required some synchronization process; nobody does such things.
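The requirement that cache problems must never block or roll back a user action can be illustrated with a time-bounded lookup that falls back to the database. This is only a sketch under assumptions: `TimeBoundedLookup`, `cacheLookup` and `sourceOfTruth` are hypothetical names, the two functions stand for a call to the shared cache and a database query, and nothing here is an Infinispan or JBossCache API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical helper: asks the cache first, but never waits longer than a fixed timeout. */
final class TimeBoundedLookup<K, V> {
    private final Function<K, V> cacheLookup;   // assumption: may be slow or may fail
    private final Function<K, V> sourceOfTruth; // assumption: e.g. a database query
    private final long timeoutMillis;
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "cache-lookup");
        t.setDaemon(true); // lookup threads must not keep the JVM alive
        return t;
    });

    TimeBoundedLookup(Function<K, V> cacheLookup, Function<K, V> sourceOfTruth, long timeoutMillis) {
        this.cacheLookup = cacheLookup;
        this.sourceOfTruth = sourceOfTruth;
        this.timeoutMillis = timeoutMillis;
    }

    V get(K key) {
        Callable<V> task = () -> cacheLookup.apply(key);
        Future<V> pending = pool.submit(task);
        try {
            V cached = pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            if (cached != null) {
                return cached;
            }
        } catch (TimeoutException e) {
            pending.cancel(true); // give up on the cache for this request
        } catch (ExecutionException e) {
            // the cache lookup itself failed; fall through to the source of truth
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sourceOfTruth.apply(key);
    }
}
```

The trade-off is that a slow cache costs at most `timeoutMillis` per request plus one database query, instead of blocking the use case. It covers reads only and does not address how stale entries get invalidated.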

                   

                  What does Infinispan offer in terms of architecture for this scenario?

                  • 6. Re: Jboss/Infinispan clustered Treecache - blocks server to start
                    belaban

                    I suggest you try out the latest stable Infinispan/JGroups combo, to see if your GMS connection problems disappear. Alternatively, you could run a JGroups standalone demo on all 4 servers (e.g. Chat) with your config, to look at networking issues separately.

                    • 7. Re: Jboss/Infinispan clustered Treecache - blocks server to start
                      haiaw

                      Thanks for your reply. I am doing two things now: a new git branch for migrating to Infinispan, and the old branch for repairing the old cache.
                      As for the old cache, it seems I succeeded in joining all servers to the cluster. I used the probe.sh script from the JBossCache sources; I didn't know something like this even existed. The script showed me that I had 10 strange clusters with old settings on the servers. I had been restarting the server with kill -9, and these clusters stayed in the JVM with the same multicast port and different bind ports (or maybe the same ones in some cases). TreeCache.stop wasn't invoked because of kill -9.

                      It is strange that these zombie clusters stay in the JVM and, what is worse, I don't know how to kill them.

                      After changing the multicast port I managed to join all servers to the cluster, even twice, so it seems to work.
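On the point that TreeCache.stop was never invoked: a JVM shutdown hook is one way to get a stop call on a normal exit or a plain `kill` (SIGTERM). Shutdown hooks do not run on `kill -9`, so this only helps if the servers are stopped without `-9`. A minimal sketch follows; `StopOnShutdown` is a hypothetical helper and the `Runnable` stands for whatever stops the cache.

```java
/** Hypothetical helper: registers a JVM shutdown hook that runs the given stop action. */
final class StopOnShutdown {
    private StopOnShutdown() {
    }

    /** Registers the hook and returns it, so it can be removed again if the cache is stopped earlier. */
    static Thread register(Runnable stopAction) {
        Thread hook = new Thread(stopAction, "cache-stop-hook");
        // Runs on normal JVM exit and on SIGTERM, but not on SIGKILL (kill -9).
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

In a WAR, stopping the cache when the web application is undeployed may be the more natural place for this; the hook above is only a fallback for JVM shutdown.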

                       

                      Below is the type of configuration I have (not identical but very similar, copied from an example), so that you know what I am writing about.

                       

                          <?xml version="1.0" encoding="UTF-8" ?>
                          <server>
                            <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar" />

                            <!--  ====================================================================  -->
                            <!--  Defines TreeCache configuration                                       -->
                            <!--  ====================================================================  -->
                            <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">
                              <depends>jboss:service=Naming</depends>
                              <depends>jboss:service=TransactionManager</depends>

                              <!-- Configure the TransactionManager -->
                              <attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>

                              <!--
                                      Node locking scheme :
                                                          PESSIMISTIC (default)
                                                          OPTIMISTIC
                              -->
                              <attribute name="NodeLockingScheme">PESSIMISTIC</attribute>

                              <!--
                                      Node locking isolation level :
                                                           SERIALIZABLE
                                                           REPEATABLE_READ (default)
                                                           READ_COMMITTED
                                                           READ_UNCOMMITTED
                                                           NONE

                                      (ignored if NodeLockingScheme is OPTIMISTIC)
                              -->
                              <attribute name="IsolationLevel">REPEATABLE_READ</attribute>

                              <!--     Valid modes are LOCAL
                                                       REPL_ASYNC
                                                       REPL_SYNC
                                                       INVALIDATION_ASYNC
                                                       INVALIDATION_SYNC
                              -->
                              <attribute name="CacheMode">LOCAL</attribute>

                              <!--  Whether each interceptor should have an mbean
                                  registered to capture and display its statistics.  -->
                              <attribute name="UseInterceptorMbeans">true</attribute>

                              <!-- Name of cluster. Needs to be the same for all clusters, in order
                                       to find each other -->
                              <attribute name="ClusterName">JBoss-Cache-Cluster</attribute>

                              <attribute name="ClusterConfig">
                                <config>
                                  <!-- UDP: if you have a multihomed machine,
                                          set the bind_addr attribute to the appropriate NIC IP address
                                  -->
                                  <!-- UDP: On Windows machines, because of the media sense feature
                                           being broken with multicast (even after disabling media sense)
                                           set the loopback attribute to true
                                  -->
                                  <UDP mcast_addr="228.1.2.3" mcast_port="45566" ip_ttl="64" ip_mcast="true"
                                     mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
                                     ucast_recv_buf_size="80000" loopback="false" />
                                  <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />
                                  <MERGE2 min_interval="10000" max_interval="20000" />
                                  <FD shun="true" up_thread="true" down_thread="true" />
                                  <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
                                  <pbcast.NAKACK gc_lag="50" max_xmit_size="8192" retransmit_timeout="600,1200,2400,4800" up_thread="false"
                                     down_thread="false" />
                                  <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
                                  <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
                                  <FRAG frag_size="8192" down_thread="false" up_thread="false" />
                                  <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
                                  <pbcast.STATE_TRANSFER up_thread="false" down_thread="false" />
                                </config>
                              </attribute>

                              <!--    The max amount of time (in milliseconds) we wait until the
                                      initial state (ie. the contents of the cache) are retrieved from
                                      existing members in a clustered environment
                              -->
                              <attribute name="InitialStateRetrievalTimeout">5000</attribute>

                              <!--    Number of milliseconds to wait until all responses for a
                                      synchronous call have been received.
                              -->
                              <attribute name="SyncReplTimeout">10000</attribute>

                              <!--  Max number of milliseconds to wait for a lock acquisition -->
                              <attribute name="LockAcquisitionTimeout">15000</attribute>

                              <!--  Name of the eviction policy class. -->
                              <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute>

                              <!--  Specific eviction policy configurations. This is LRU -->
                              <attribute name="EvictionPolicyConfig">
                                <config>
                                  <attribute name="wakeUpIntervalSeconds">5</attribute>
                                  <!--  Cache wide default -->
                                  <region name="/_default_">
                                    <attribute name="maxNodes">5000</attribute>
                                    <attribute name="timeToLiveSeconds">1000</attribute>
                                    <!-- Maximum time an object is kept in cache regardless of idle time -->
                                    <attribute name="maxAgeSeconds">120</attribute>
                                  </region>

                                  <region name="/org/jboss/data">
                                    <attribute name="maxNodes">5000</attribute>
                                    <attribute name="timeToLiveSeconds">1000</attribute>
                                  </region>

                                  <region name="/org/jboss/test/data">
                                    <attribute name="maxNodes">5</attribute>
                                    <attribute name="timeToLiveSeconds">4</attribute>
                                  </region>
                                </config>
                              </attribute>

                            </mbean>
                          </server>