15 Replies Latest reply on Aug 6, 2012 8:27 AM by belaban

    Infinispan/Jgroups relay across 2 sites

    dex80526

      I came across a few posts on using RELAY to do geo-failover across data centers, but I did not find any specific requirements for this deployment.

       

      My potential deployment looks like this:

       

      infinispan cluster ONE in Replication (3 nodes) at site 1  <-------RELAY-----> infinispan cluster TWO (3 nodes) in Replication at site 2

       

      I came across a post which states replication does not work with RELAY. Is it true?

        • 1. Re: Infinispan/Jgroups relay across 2 sites
          dex80526

          I saw Bela's JBW presentation, which states that RELAY is only for DISTRIBUTION cluster mode.

           

          Hence, the answer to my earlier question is yes.

           

          Are there any real production deployments that use RELAY for geo-failover? What are the overhead/performance implications of RELAY for a local cluster?

          • 2. Re: Infinispan/Jgroups relay across 2 sites
            belaban

            Yes, replication is not supported. I'm currently working on RELAY2 (JGRP-1433), a successor to RELAY, which allows connecting more than 2 local clusters (sites). The focus is on DIST, although we'll allow multicasts to be routed, so REPL might be supported as well at a later stage. The design is described here: https://community.jboss.org/wiki/DesignForCrossSiteReplication

            • 3. Re: Infinispan/Jgroups relay across 2 sites
              dex80526

              Bela: thanks for the response.  What is the timeline for RELAY2? Is RELAY2 going to be in the Infinispan 5.2 release?

               


              Another question on the configuration:

               

              In my case, my local cluster will have multiple named caches configured, but I need to replicate only ONE of the named caches across sites using RELAY.  Is this possible?  How do I configure it?

              • 4. Re: Infinispan/Jgroups relay across 2 sites
                belaban

                I don't know whether RELAY2 will be in ISPN 5.2. I anticipate a first beta of RELAY2 to be available by mid Sept if not sooner.

                 

                Since Infinispan uses a shared JChannel, I don't think that the scenario you describe above will work (at least not without changes in Infinispan)... You'd have to create a separate cache with its own JChannel.

                • 5. Re: Infinispan/Jgroups relay across 2 sites
                  dex80526

                  Does that (creating a separate cache with its own JChannel) mean I need a separate CacheManager for each cache that needs to be RELAYed in Infinispan?

                   

                  I do not know how Infinispan decides which messages need to be put in a queue for RELAY. But I think conceptually we could use something like a "NO_RELAY" flag/mode to tag a named cache that should not be relayed.

                   

                  Could anyone from the Infinispan team provide some insight on how to selectively enable RELAY for named caches in one cache manager?

                  • 6. Re: Infinispan/Jgroups relay across 2 sites
                    dex80526

                    Bela: I tried a simple POC with RELAY today. I ran into some configuration issues, I think.

                     

                    Basically, I followed ISPN's GUI demo, but changed it to use TCP for the local clusters.

                     

                    But I cannot make the bridge/relay work, i.e., the local cluster does not see the other (remote) cluster.

                     

                    I think I am missing something in my configuration.

                     

                    In particular, I do not see any configuration examples using RELAY with ISPN, especially what needs to be put into the tcp.xml for the bridge.

                     

                    According to the GUI demo sample, it uses jgroups-tcp.xml. In that case, what should we give for initial_hosts in the TCPPING element?  Should we list all nodes from both clusters?  What port does RELAY use?

                     

                    I am testing this by running 2 single-node clusters on 2 different nodes, using jgroups-tcp.xml for bridge_props, with all nodes in both clusters listed in initial_hosts.

                     

                    With the above settings, I saw the nodes indeed form a single cluster instead of bridging 2 clusters.

                     

                    Here are some logs I saw:

                    2012-07-16 17:05:10,348 DEBUG [FD] (Timer-2,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:12,843 WARN  [TCP] (OOB-1,bridge-cluster,hanode1-54665) discarded message from different cluster "testCluster1" (our cluster is "bridge-cluster"). Sender was hanode1-36757

                    2012-07-16 17:05:13,350 DEBUG [FD] (Timer-4,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:16,352 DEBUG [FD] (Timer-3,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:19,355 DEBUG [FD] (Timer-2,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:22,356 DEBUG [FD] (Timer-5,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:25,358 DEBUG [FD] (Timer-4,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:28,360 DEBUG [FD] (Timer-3,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:31,364 DEBUG [FD] (Timer-2,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:32,723 WARN  [TCP] (OOB-1,bridge-cluster,hanode1-54665) discarded message from different cluster "testCluster1" (our cluster is "bridge-cluster"). Sender was hanode1-36757

                    2012-07-16 17:05:34,365 DEBUG [FD] (Timer-5,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:37,367 DEBUG [FD] (Timer-4,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:40,370 DEBUG [FD] (Timer-3,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:43,373 DEBUG [FD] (Timer-3,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:46,375 DEBUG [FD] (Timer-4,bridge-cluster,hanode1-54665) sending are-you-alive msg to dexlaptop-3506 (own address=hanode1-54665)

                    2012-07-16 17:05:46,908 WARN  [TCP] (OOB-1,bridge-cluster,hanode1-54665) discarded message from different cluster "testCluster1" (our cluster is "bridge-cluster"). Sender was hanode1-36757

                     

                    Here, "testCluster1" is the local cluster, which should have only one node: hanode1.  dexlaptop is another node, which should belong to a separate cluster.

                     

                    Please help.

                    • 7. Re: Infinispan/Jgroups relay across 2 sites
                      dex80526

                      Just an update on my previous post. I got my simple RELAY POC working using TCP in the local clusters, I think.  The warnings I saw earlier were due to my configuration in tcp.xml for the bridge.

                       

                      It looks like there are basically 3 clusters (2 local ones and one bridge cluster).

                       

                      Moving on, I am going to experiment a little more to understand how it works in my environment and how the cache manager handles API calls such as getCoordinator(), getMembers(), etc.  It seems to me that those methods are not aware of the relay/bridge cluster.

                      • 8. Re: Infinispan/Jgroups relay across 2 sites
                        dex80526

                        Bela, I hope you are still following this thread.

                         

                        I am using ISPN 5.1.5 final.

                         

                        I did some experiments in my testing. Now I am not sure if my configuration is really working as intended.

                         

                        I have one cluster (relay1) with 2 nodes and another cluster (relay2) with one node, with RELAY between them.

                         

                        I changed the configuration of my local clusters to use "replication" as the cluster mode along with RELAY. It seems to make no difference.

                        I still see the cluster being formed and data being replicated across clusters.

                        If I set relay=false on the second cluster (relay2) in its RELAY element, the replication/relay happens in ONE direction, as expected.

                         

                        Am I doing something completely wrong?

                         

                         

                        Here is my configuration:

                         

                        relay1.xml:

                        <infinispan

                              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                              xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"

                              xmlns="urn:infinispan:config:5.1">

                         

                           <global>

                              <transport clusterName="c1"

                                        machineId="c1-node1"

                                        siteId="nyc"

                                        rackId="r1" nodeName="hanode1"

                              >

                                 <properties>

                                    <property name="configurationFile" value="./jgroups-tcp-relay1.xml" />

                                 </properties>

                              </transport>

                              <globalJmxStatistics enabled="true"/>

                           </global>

                         

                           <default>

                             <locking

                                 isolationLevel="READ_COMMITTED"

                                 lockAcquisitionTimeout="20000"

                                 writeSkewCheck="false"

                                 concurrencyLevel="5000"

                                 useLockStriping="false"

                              />

                              <deadlockDetection enabled="true" spinDuration="1000"/>

                              <jmxStatistics enabled="true"/>

                          </default>

                          <namedCache name="keychain" >

                                <clustering mode="replication">

                        <!--

                                 <l1 enabled="false" lifespan="10000"/>

                                 <hash numOwners="2" rehashWait="5000" rehashRpcTimeout="10000" />

                        -->

                               <async/>

                              </clustering>

                              <transaction transactionMode="NON_TRANSACTIONAL"/>

                          </namedCache>

                        </infinispan>

                         

                        jgroups-tcp-relay1.xml:

                        <config>

                          .....

                        <TCPPING timeout="3000"

                                initial_hosts="${jgroups.tcpping.initial_hosts:10.200.22.40[7800],10.200.22.41[7800]}"

                                    port_range="1"

                                    num_initial_members="2"

                                    ergonomics="false"

                                />

                           <MERGE2 max_interval="30000"

                                   min_interval="10000"/>

                           <FD_SOCK/>

                           <FD timeout="3000" max_tries="3"/>

                           <VERIFY_SUSPECT timeout="1500"/>

                           <pbcast.NAKACK

                                 use_mcast_xmit="false"

                                 retransmit_timeout="300,600,1200,2400,4800"

                                 discard_delivered_msgs="false"/>

                           <UNICAST2 timeout="300,600,1200"/>

                           <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

                                          max_bytes="400000"/>

                           <pbcast.GMS print_local_addr="false" join_timeout="7000" view_bundling="true"/>

                           <UFC max_credits="2000000" min_threshold="0.10"/>

                           <MFC max_credits="2000000" min_threshold="0.10"/>

                           <FRAG2 frag_size="60000"/>

                           <RELAY site="nyc" bridge_name="dr_bridge" level="TRACE" bridge_props="./jgroups-tcp.xml"  />

                        </config>
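A note on the initial_hosts value above: JGroups configuration values written as ${name:default} are resolved against system properties, and the text after the first colon is used when the property is not set. The following Java sketch mimics that resolution for a single value. It is an illustration only, not JGroups' own parser; the class and method names (PlaceholderSketch, resolve) are hypothetical.

```java
import java.util.Properties;

/** Illustrative sketch of "${name:default}" resolution; NOT the JGroups parser. */
final class PlaceholderSketch {

    // Resolves one "${name:default}" value against the given properties.
    // Values not wrapped in "${...}" are returned unchanged (trimmed).
    // Returns null if the property is unset and no default was given.
    static String resolve(String raw, Properties props) {
        String s = raw.trim();
        if (!s.startsWith("${") || !s.endsWith("}")) {
            return s;
        }
        String body = s.substring(2, s.length() - 1);
        int colon = body.indexOf(':'); // first colon separates name from default
        String name = colon < 0 ? body : body.substring(0, colon);
        String fallback = colon < 0 ? null : body.substring(colon + 1);
        String value = props.getProperty(name);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // Value copied from jgroups-tcp-relay1.xml in this thread.
        String raw = "${jgroups.tcpping.initial_hosts:10.200.22.40[7800],10.200.22.41[7800]}";
        // Run with -Djgroups.tcpping.initial_hosts=... to see the override take effect.
        System.out.println(resolve(raw, System.getProperties()));
    }
}
```

In practice this means the host list in the XML is only a fallback: each node can be started with its own -Djgroups.tcpping.initial_hosts value without editing the file.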

                         

                        jgroups-tcp.xml:

                        <config>

                           <!-- Ergonomics, new in JGroups 2.11, are disabled by default in TCPPING until JGRP-1253 is resolved -->

                           <TCPPING timeout="3000"

                                    initial_hosts="${jgroups.tcpping.initial_hosts:10.100.30.66[8900], 10.200.22.40[8900], 10.200.22.41[8900]}"

                         

                                    port_range="1"

                                    num_initial_members="3"

                                    ergonomics="false"

                          />

                           <MERGE2 max_interval="30000"

                                   min_interval="10000"/>

                           <FD_SOCK/>

                           <FD timeout="3000" max_tries="3"/>

                           <VERIFY_SUSPECT timeout="1500"/>

                           <pbcast.NAKACK

                                 use_mcast_xmit="false"

                                 retransmit_timeout="300,600,1200,2400,4800"

                                 discard_delivered_msgs="false"/>

                           <UNICAST2 timeout="300,600,1200"/>

                           <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

                                          max_bytes="400000"/>

                           <pbcast.GMS print_local_addr="false" join_timeout="7000" view_bundling="true"/>

                           <UFC max_credits="2000000" min_threshold="0.10"/>

                           <MFC max_credits="2000000" min_threshold="0.10"/>

                           <FRAG2 frag_size="60000"/>

                        </config>
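One thing worth checking in a pair of configs like the two above is that the bridge's initial_hosts does not point at endpoints that the local cluster's TCPPING also uses. The "discarded message from different cluster" warnings earlier in the thread are consistent with that kind of overlap, although the thread does not state the exact cause. The Java sketch below expands both host lists and reports shared host:port pairs. It assumes that TCPPING probes ports from port through port + port_range; the class and method names are hypothetical and nothing here is JGroups API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative check for overlapping discovery endpoints; not part of JGroups. */
final class EndpointOverlapSketch {

    // Expands "host[port],host[port]" into "host:port" strings.
    // ASSUMPTION: each entry covers ports port .. port + portRange inclusive.
    static Set<String> expand(String initialHosts, int portRange) {
        Set<String> endpoints = new LinkedHashSet<>();
        for (String entry : initialHosts.split(",")) {
            String e = entry.trim();
            if (e.isEmpty()) {
                continue;
            }
            int open = e.indexOf('[');
            int close = e.indexOf(']');
            if (open < 0 || close < open) {
                throw new IllegalArgumentException("expected host[port]: " + e);
            }
            String host = e.substring(0, open);
            int port = Integer.parseInt(e.substring(open + 1, close));
            for (int p = port; p <= port + portRange; p++) {
                endpoints.add(host + ":" + p);
            }
        }
        return endpoints;
    }

    // Returns the endpoints that appear in both the local and the bridge list.
    static Set<String> overlap(String localHosts, String bridgeHosts, int portRange) {
        Set<String> shared = expand(localHosts, portRange);
        shared.retainAll(expand(bridgeHosts, portRange));
        return shared;
    }

    public static void main(String[] args) {
        // Default host lists copied from the two configs in this thread.
        String local = "10.200.22.40[7800],10.200.22.41[7800]";
        String bridge = "10.100.30.66[8900], 10.200.22.40[8900], 10.200.22.41[8900]";
        System.out.println("shared endpoints: " + overlap(local, bridge, 1));
    }
}
```

An empty result only says the two discovery lists are disjoint; it does not verify bind ports or firewall rules.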

                         

                        relay3.xml and jgroups-tcp-relay2.xml are omitted here.

                        • 9. Re: Infinispan/Jgroups relay across 2 sites
                          dex80526

                          I have more questions as I experiment more with RELAY, and hope someone can clarify them for me:

                           

                          I use the following code to test if a node is a cluster coordinator or not:

                          >>>cacheManager.getTransport().isCoordinator();

                           

                          With my configuration (2 clusters and RELAY among them): cluster 1 (relay1) has 2 nodes  and cluster 2 (relay2) has 1 node.

                          When I run my test code on 3 different nodes, I expect 2 nodes (one from each cluster) to be coordinators.  But the actual result is that only one node is coordinator.

                           

                          This indicates that the 3 nodes are forming a single cluster instead of 2 local clusters bridged with RELAY.

                           

                          If that is not the case, what is the right way to get the coordinator for each local cluster?

                           

                          CacheManager.getMembers() also returns all nodes from both clusters instead of a local cluster.

                           

                          I am wondering if I really have 2 clusters bridged by RELAY or just one big cluster with all the nodes as members. Please help.

                          • 10. Re: Infinispan/Jgroups relay across 2 sites
                            dex80526

                            I did not see responses to my previous questions, and spent more time on RELAY last week.  I found more questions than answers.

                             

                            BTW, I did read the JGroups documents related to RELAY that I could find, but did not find any Infinispan specifics.

                             

                            Here are more questions for anyone in the know:

                             

                            1) Does RELAY relay the messages across clusters asynchronously?  It is not obvious from the configuration I have seen so far. How do we configure the behaviour of RELAY?

                             

                            2) Does one-directional RELAY still require bi-directional network access between the 2 clusters? For example, I have 2 clusters: a local cluster inside the company's firewall, and a remote cluster in Amazon EC2. I want to use the EC2 cluster as a DR site, and configure one-way RELAY from the local cluster to the EC2 cluster  (local cluster -----RELAY----> EC2 cluster). In this case, the nodes in the local cluster can access the nodes in EC2, but not the other way around.

                            Does one-way RELAY work in this setting?

                             

                            3) How do the 2 bridged clusters in RELAY (re)sync their state?

                             

                            For example, if for some reason the connectivity between the 2 clusters is lost and then re-established, how do they (re)sync the state?

                             

                            I'll be happy to share my experience if I work this out. Please help.

                            • 11. Re: Infinispan/Jgroups relay across 2 sites
                              mircea.markus

                              dex - sorry for the late answer.

                               

                              With my configuration (2 clusters and RELAY among them): cluster 1 (relay1) has 2 nodes  and cluster 2 (relay2) has 1 node.

                              When I run my test code on 3 different nodes, I expect 2 nodes (one from each cluster) to be coordinators.  But the actual result is that only one node is coordinator.

                               

                              The current implementation of RELAY (it is currently being rewritten) indeed creates a virtual cluster over all the nodes. You can read more about it here [1], in particular the section dedicated to Views.

                               

                              [1] http://www.jgroups.org/manual/html/user-advanced.html#RelayAdvanced
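To make the effect of the virtual cluster concrete, here is a toy Java model (plain collections, no JGroups or Infinispan classes). It relies on the JGroups convention that the first member of a view is the coordinator, and it assumes for illustration that the virtual view is simply the local views concatenated in site order; the real ordering is decided by RELAY. Node names and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of local views vs. one virtual view; not JGroups or Infinispan code. */
final class VirtualViewSketch {

    // ASSUMPTION: the virtual view is the local views concatenated in order.
    static List<String> virtualView(List<List<String>> localViews) {
        List<String> all = new ArrayList<>();
        for (List<String> view : localViews) {
            all.addAll(view);
        }
        return all;
    }

    // JGroups convention: the first member of a view is its coordinator.
    static boolean isCoordinator(String node, List<String> view) {
        return !view.isEmpty() && view.get(0).equals(node);
    }

    // Counts how many of the given nodes consider themselves coordinator of the view.
    static int countCoordinators(List<String> nodes, List<String> view) {
        int count = 0;
        for (String node : nodes) {
            if (isCoordinator(node, view)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> siteOne = Arrays.asList("node-a", "node-b"); // hypothetical names
        List<String> siteTwo = Arrays.asList("node-c");
        List<String> virtual = virtualView(Arrays.asList(siteOne, siteTwo));

        System.out.println("virtual view: " + virtual);
        System.out.println("coordinators seen through the virtual view: "
                + countCoordinators(virtual, virtual));
    }
}
```

Under this model a check against the virtual view can be true on only one node, and a member listing returns every node from both sites, which matches the behaviour reported earlier in the thread.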

                              • 12. Re: Infinispan/Jgroups relay across 2 sites
                                dex80526

                                Mircea: thanks for the response.

                                 

                                I am struggling with the lack of documentation on what ISPN does and does not support in terms of RELAY across clusters/data centers.

                                 

                                I was able to get something going with my POC. The basic replication across clusters seems to be working.

                                 

                                Do you know if there is any way I can enable RELAY for certain named caches of a cache manager, but not for others?

                                 

                                The use case is that I have 4 named caches configured, and I want to RELAY just one of them for performance reasons.

                                • 13. Re: Infinispan/Jgroups relay across 2 sites
                                  belaban

                                  dex chen wrote:

                                   

                                  Bela: thanks for the response.  What is the timeline for RELAY2? Is RELAY2 going to be in the Infinispan 5.2 release?

                                   

                                  No, I don't think so, but I'll know more next week when I get back to RELAY2.

                                   

                                   


                                  Another question on the configuration:

                                   

                                  In my case, my local cluster will have multiple named caches configured, but I need to replicate only ONE of the named caches across sites using RELAY.  Is this possible?  How do I configure it?

                                   

                                  Yes, I think this will be possible; you'll have to configure a separate cache for relaying, pointing to a different JGroups config (with RELAY2 in it). We'll post more on this as soon as we have a first prototype.

                                  • 14. Re: Infinispan/Jgroups relay across 2 sites
                                    belaban

                                    Yes, exactly. You have 2 local clusters ONE and TWO, and the 2 coordinators of ONE and TWO also join a third cluster BRIDGE.
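The three-cluster layout described above can be sketched as a small Java model (plain collections only, no JGroups classes). It takes the coordinator of each local cluster to be the first member of its view, and it assumes that when a coordinator leaves, the next member takes over and represents the site on the bridge. The site names ONE and TWO come from this thread; the node names and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of two local clusters plus a bridge cluster; not JGroups code. */
final class BridgeTopologySketch {

    // The bridge cluster consists of the coordinator (first member) of each site.
    static List<String> bridgeMembers(Map<String, List<String>> sites) {
        List<String> bridge = new ArrayList<>();
        for (List<String> view : sites.values()) {
            if (!view.isEmpty()) {
                bridge.add(view.get(0));
            }
        }
        return bridge;
    }

    public static void main(String[] args) {
        Map<String, List<String>> sites = new LinkedHashMap<>();
        sites.put("ONE", new ArrayList<>(Arrays.asList("one-1", "one-2", "one-3")));
        sites.put("TWO", new ArrayList<>(Arrays.asList("two-1", "two-2", "two-3")));

        System.out.println("BRIDGE members: " + bridgeMembers(sites));

        // ASSUMPTION: if the coordinator of ONE leaves, the next member
        // becomes coordinator and joins the bridge in its place.
        sites.get("ONE").remove("one-1");
        System.out.println("BRIDGE members after one-1 leaves: " + bridgeMembers(sites));
    }
}
```

The point of the model is only that each site contributes exactly one member to BRIDGE at a time, so inter-site traffic passes through the two coordinators.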
