6 Replies Latest reply on Jun 20, 2011 2:39 PM by monty-temboo

    Basic Replication Configuration Question

    monty-temboo

      I've been playing with the 2-node example on GitHub.  I modified it so that Node0 just prints the size of the cache every few seconds, and Node1 puts in 100 entries with random strings as keys and values.  I also modified the waitForClusterToForm() method to listen for the cacheStarted() callback and then sleep for a second before allowing the node to start reading or writing.

       

      So, in playing around, I started the reader node, then a writer and saw 100 things in there.  Started another writer and saw 200 things in there.  Started another reader, and that one showed 0 items.  Ran another writer and the first reader then showed 300, second reader 100 items.

       

      I have replication turned on.  Here is my config, pretty much straight out of the example directory, but with no jgroups xml referenced.

       

      <?xml version="1.0" encoding="UTF-8"?>

      <infinispan

            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

            xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"

            xmlns="urn:infinispan:config:5.0">

         <global>

            <globalJmxStatistics

                  enabled="true"

                  jmxDomain="org.infinispan"

                  cacheManagerName="SampleCacheManager"/>

            <transport

                  clusterName="infinispan-cluster"

                  machineId="m1"

                  rackId="r1" nodeName="Node-A">

            </transport>

         </global>

         <default>

            <locking

               isolationLevel="READ_COMMITTED"

               lockAcquisitionTimeout="20000"

               writeSkewCheck="false"

               concurrencyLevel="5000"

               useLockStriping="false"

            />

            <jmxStatistics enabled="true"/>

            <clustering mode="replication">

               <stateRetrieval

                  timeout="240000"

                  fetchInMemoryState="false"

                  alwaysProvideInMemoryState="false"

               />

               <sync replTimeout="20000"/>

            </clustering>

       

         </default>

      </infinispan>

       

       

      My cache manager is created like this:

       

      this.cacheManager = new DefaultCacheManager("infinispan.xml");

       

      I imagine that there is some other configuration I need to do to make this work.  After looking again at the config, I decided to try setting fetchInMemoryState and alwaysProvideInMemoryState to true.  If I do that I get an exception like this for anything other than the first node up:

       

      15:35:26,374  INFO JGroupsTransport -- ISPN00078: Starting JGroups Channel

      15:35:26,374  INFO JGroupsTransport -- ISPN00088: Unable to use any JGroups configuration mechanisms provided in properties {}.  Using default JGroups configuration!

      15:35:26,424  INFO JChannel -- JGroups version: 2.12.0.Final

      15:35:27,027  WARN UDP -- receive buffer of socket java.net.DatagramSocket@1fea6a1c was set to 20MB, but the OS only allocated 65.51KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)

      15:35:27,027  WARN UDP -- receive buffer of socket java.net.MulticastSocket@56dc64a2 was set to 25MB, but the OS only allocated 65.51KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)

      15:35:30,208  INFO JGroupsTransport -- ISPN00094: Received new cluster view: [Node-A-16741|1] [Node-A-16741, Node-A-63371]

      15:35:30,309  INFO JGroupsTransport -- ISPN00079: Cache local address is Node-A-63371, physical addresses are [fe80:0:0:0:21f:5bff:feb6:9ae3%5:61547]

      15:35:30,310  INFO GlobalComponentRegistry -- ISPN00128: Infinispan version: Infinispan 'Pagoa' 5.0.0.CR5

      15:35:30,367  WARN GMS -- Node-A-63371: not member of view [Node-A-16741|2] [Node-A-16741]; discarding it

      15:35:30,582  INFO CacheJmxRegistration -- ISPN00031: MBeans were successfully registered to the platform mbean server.

      15:35:30,583  INFO RpcManagerImpl -- ISPN00074: Trying to fetch state from Node-A-16741

      15:35:30,597  WARN STREAMING_STATE_TRANSFER -- State reader socket thread spawned abnormaly

      java.net.NoRouteToHostException: No route to host

          at java.net.PlainSocketImpl.socketConnect(Native Method)

          at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)

          at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)

          at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)

          at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)

          at java.net.Socket.connect(Socket.java:525)

          at org.jgroups.util.Util.connect(Util.java:276)

          at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:510)

          at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:462)

          at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:223)

          at org.jgroups.protocols.FRAG2.up(FRAG2.java:189)

          at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)

          at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)

          at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)

          at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)

          at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:613)

          at org.jgroups.protocols.UNICAST.up(UNICAST.java:294)

          at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)

          at org.jgroups.protocols.BARRIER.up(BARRIER.java:119)

          at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:177)

          at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:275)

          at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)

          at org.jgroups.protocols.Discovery.up(Discovery.java:291)

          at org.jgroups.protocols.PING.up(PING.java:66)

          at org.jgroups.protocols.TP.passMessageUp(TP.java:1102)

          at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1658)

          at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1640)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          at java.lang.Thread.run(Thread.java:637)

      15:35:30,599  WARN STREAMING_STATE_TRANSFER -- Could not connect to state provider. Closing socket...

       

      All of this is running on the same machine, by the way.
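
      For reference, the change described above amounts to flipping the two state-retrieval flags in the clustering section of the same file. This is only a sketch of that edit, using the attribute names already present in the config; nothing else in the file is assumed to change.

      ```xml
      <clustering mode="replication">
         <!-- Ask a joining node to fetch the existing in-memory state from a running member -->
         <stateRetrieval
            timeout="240000"
            fetchInMemoryState="true"
            alwaysProvideInMemoryState="true"
         />
         <sync replTimeout="20000"/>
      </clustering>
      ```

      With both flags set to false, a node that joins late would be expected to see only the entries written after it joined, which is consistent with the second reader starting at 0 items.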

        • 1. Re: Basic Replication Configuration Question
          amalrajvinoth

          Hi,

           

          Can you post the jgroups-tcp.xml or jgroups-udp.xml you are using when you get this exception?

          Is it unicast or multicast?

           

          thanks, amalraj.

          • 2. Re: Basic Replication Configuration Question
            monty-temboo

            I used the default configuration--no jgroups xml was specified.  If you have a recommendation I'll be happy to try the tcp or udp files that came with the distribution.
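
            A sketch of how a JGroups file is usually referenced from an Infinispan 5.0 configuration: a configurationFile property inside the transport element. The file name jgroups-tcp.xml is an assumption here (any JGroups stack file that Infinispan can locate would be referenced the same way); the other transport attributes are copied from the config in the first post.

            ```xml
            <transport
                  clusterName="infinispan-cluster"
                  machineId="m1"
                  rackId="r1" nodeName="Node-A">
               <!-- Assumed file name; point this at the JGroups stack you want to use -->
               <properties>
                  <property name="configurationFile" value="jgroups-tcp.xml"/>
               </properties>
            </transport>
            ```

            Without such a property, the log line about "Using default JGroups configuration" in the first post is what one would expect to see.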

            • 3. Re: Basic Replication Configuration Question
              monty-temboo

              So I tried UDP but got a message about the UDP address not being multicast.  Since I don't know how multicast works, I then tried the jgroups-tcp.xml file.  At first I got a message that it wasn't set up for state streaming, so I changed that.  Now when I run my example I get no exceptions, but nothing appears to be shared between the nodes.  The messages on each node are the same, except that one is on port 7800 and the other on 7801.  I don't know what the messages about port 7500 below refer to; that port isn't in any of my configuration files that I can see.  I guess I'll start reading the JGroups configuration documentation to learn what is supposed to happen here.

               

              10:25:11,387  INFO JGroupsTransport -- ISPN00078: Starting JGroups Channel

              10:25:11,438  INFO JChannel -- JGroups version: 2.12.0.Final

              10:25:12,080  WARN TCP -- failed to join /ff0e:0:0:0:0:0:75:75:7500 on vnic1: java.net.SocketException: Address family not supported by protocol family

              10:25:12,082  WARN TCP -- failed to join /ff0e:0:0:0:0:0:75:75:7500 on vnic0: java.net.SocketException: Address family not supported by protocol family

               

              -------------------------------------------------------------------

              GMS: address=Node-A-10441, cluster=infinispan-cluster, physical address=fe80:0:0:0:21f:5bff:feb6:9ae3%5:7801

              -------------------------------------------------------------------

              10:25:14,128  INFO JGroupsTransport -- ISPN00094: Received new cluster view: [Node-A-10441|0] [Node-A-10441]

              10:25:14,206  INFO JGroupsTransport -- ISPN00079: Cache local address is Node-A-10441, physical addresses are [fe80:0:0:0:21f:5bff:feb6:9ae3%5:7801]

              10:25:14,208  INFO GlobalComponentRegistry -- ISPN00128: Infinispan version: Infinispan 'Pagoa' 5.0.0.CR5

              10:25:14,443  INFO CacheJmxRegistration -- ISPN00031: MBeans were successfully registered to the platform mbean server.

              10:25:14,444  INFO ComponentRegistry -- ISPN00128: Infinispan version: Infinispan 'Pagoa' 5.0.0.CR5

              • 4. Re: Basic Replication Configuration Question
                monty-temboo

                Here's the jgroups-tcp configuration file I used:

                 

                <!--

                  Fast configuration for local mode, ie. all members reside on the same host. Setting ip_ttl to 0 means that

                  no multicast packet will make it outside the local host.

                  Therefore, this configuration will NOT work to cluster members residing on different hosts !

                 

                  Author: Bela Ban

                  Version: $Id: fast-local.xml,v 1.9 2009/12/18 14:50:00 belaban Exp $

                -->

                 

                <config xmlns="urn:org:jgroups"

                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-2.8.xsd">

                    <TCP bind_port="7800" port_range="10"

                         recv_buf_size="20000000"

                         send_buf_size="640000"

                         loopback="false"

                         discard_incompatible_packets="true"

                         max_bundle_size="64000"

                         max_bundle_timeout="30"

                         enable_bundling="true"

                         enable_unicast_bundling="true"

                         enable_diagnostics="true"

                         thread_naming_pattern="cl"

                 

                         timer_type="new"

                         timer.min_threads="4"

                         timer.max_threads="10"

                         timer.keep_alive_time="3000"

                         timer.queue_max_size="1000"

                         timer.wheel_size="200"

                         timer.tick_time="50"

                 

                         thread_pool.enabled="true"

                         thread_pool.min_threads="2"

                         thread_pool.max_threads="8"

                         thread_pool.keep_alive_time="5000"

                         thread_pool.queue_enabled="true"

                         thread_pool.queue_max_size="100000"

                         thread_pool.rejection_policy="discard"

                 

                         oob_thread_pool.enabled="true"

                         oob_thread_pool.min_threads="1"

                         oob_thread_pool.max_threads="8"

                         oob_thread_pool.keep_alive_time="5000"

                         oob_thread_pool.queue_enabled="false"

                         oob_thread_pool.queue_max_size="100"

                         oob_thread_pool.rejection_policy="discard"/>

                 

                    <MPING timeout="2000"

                          num_initial_members="3" />

                 

                    <MERGE2 max_interval="30000"

                            min_interval="10000"/>

                 

                    <FD_SOCK/>

                    <FD_ALL interval="2000" timeout="5000" />

                    <VERIFY_SUSPECT timeout="500"  />

                    <BARRIER />

                    <pbcast.NAKACK use_stats_for_retransmission="false"

                                   use_mcast_xmit="false" gc_lag="0"

                                   retransmit_timeout="100,300,600,1200"

                                   discard_delivered_msgs="true" />

                    <UNICAST2 timeout="300,600,1200" />

                 

                    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

                                   max_bytes="10m"/>

                    <pbcast.GMS print_local_addr="true" join_timeout="5000"

                                max_bundling_time="30"

                                view_bundling="true"/>

                    <UFC max_credits="2M"

                         min_threshold="0.4"/>

                    <MFC max_credits="2M"

                         min_threshold="0.4"/>

                    <FRAG2 frag_size="60000"  />

                    <pbcast.STREAMING_STATE_TRANSFER  />

                </config>

                • 5. Re: Basic Replication Configuration Question
                  amalrajvinoth

                  You have to add one of the discovery protocols when using TCP.

                  Please try adding the following:

                   

                  <TCPPING timeout="3000"

                          initial_hosts="10.21.38.24[7800],10.21.38.38[7800]"

                          num_initial_members="3"

                          ergonomics="false"

                      />
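
                  For nodes that all run on one machine, as in the original question, the same element might look like the sketch below. The address is an assumption (substitute whatever address the nodes actually bind to); ports 7800 and 7801 follow from bind_port="7800" in the posted TCP config and the port seen in the earlier log. TCPPING would typically take the place of MPING in the stack, since both are discovery protocols.

                  ```xml
                  <!-- Static discovery: list every address[port] a member may be listening on -->
                  <!-- 192.168.1.102 is a placeholder address, not a recommended value -->
                  <TCPPING timeout="3000"
                           initial_hosts="192.168.1.102[7800],192.168.1.102[7801]"
                           num_initial_members="2"/>
                  ```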

                  • 6. Re: Basic Replication Configuration Question
                    monty-temboo

                    It turned out that when I added -Djgroups.bind_addr=192.168.1.102 to the JVM arguments then things worked, even without the TCPPING section you recommended.  That was recommended in the docs somewhere.

                     

                    I tried adding just bind_addr=192.168.1.102 to the TCP properties, but that did not work for me.

                     

                    This is good enough to get me started.  Obviously I have a bit to learn about the discovery protocol, thanks for the link!  That's exactly what I needed to know.