9 Replies · Latest reply on Jan 5, 2012 4:39 AM by galder.zamarreno

    FD, FD_SOCK configuration

    pmancham

      I have 3 machines (m1, m2, m3), all configured as follows:

       

      <FD_SOCK bind_addr="${jgroups.bind_addr:127.0.0.1}"
               start_port="7900"
               client_bind_port="8900"
               port_range="5"/>

      <FD max_tries="3" timeout="3000"/>

       

      I am printing the servers that the client is working with using TcpTransportFactory.getServers().
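
      For reference, a minimal sketch of how that check might look. The server_list property and the RemoteCacheManager(Properties) constructor are the regular 5.1 client API; there is no public API for reading the active server list, so the reflective access below (including the transportFactory field name) is an assumption about the 5.1 internals and may differ between releases. Also note that with max_tries="3" and timeout="3000", FD should suspect a dead member after roughly 3 × 3 s = 9 s, while FD_SOCK normally notices an abrupt process kill as soon as the peer's socket closes.

      import java.lang.reflect.Field;
      import java.util.Properties;

      import org.infinispan.client.hotrod.RemoteCacheManager;
      import org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory;

      public class TopologyPrinter {
          public static void main(String[] args) throws Exception {
              // Point the client at the three Hot Rod servers (default port 11222).
              Properties props = new Properties();
              props.setProperty("infinispan.client.hotrod.server_list",
                                "m1:11222;m2:11222;m3:11222");
              RemoteCacheManager rcm = new RemoteCacheManager(props);
              rcm.getCache().put("probe", "value"); // force a connection and a topology update

              // Not a public API: pull the internal transport factory out via reflection.
              // The field name "transportFactory" is an assumption, not a documented contract.
              Field f = RemoteCacheManager.class.getDeclaredField("transportFactory");
              f.setAccessible(true);
              TcpTransportFactory factory = (TcpTransportFactory) f.get(rcm);
              System.out.println("Current servers: " + factory.getServers());

              rcm.stop();
          }
      }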

       

      I start m1 first, m2 second and m3 last. When I abruptly shut down Infinispan on one of the machines, the behavior differs depending on which machine was shut down, and it is not consistent.

       

      1) After starting all 3 servers and printing the topology on the client, I see all 3 servers in the list (TcpTransportFactory.getServers() = m1, m2, m3). When I shut m2 down, the list only has m1. Then I restart m2 and the list has m1, m2. For m3 to come back into the list I have to restart Infinispan on m3.

       

      2) Sometimes, when I shut only m1 down (m2 and m3 still running), the client starts getting "IllegalStateException - shouldnt be here" errors.

       

      Is my configuration wrong?

        • 1. Re: FD, FD_SOCK configuration
          pmancham

          As we can see below, the server prepares a new view with 2 members when m2 is down. The client, however, ends up with a new view containing just one server, m1.

           

          Hotrod Server log:

           

          2011-12-28 10:23:33,991 TRACE (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.marshall.jboss.AbstractJBossMarshaller] Stop unmarshaller

          2011-12-28 10:23:33,991 TRACE (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] Attempting to execute command: CacheViewControlCommand{cache=___hotRodTopologyCache, type=PREPARE_VIEW, sender=n1-65223, newViewId=4, newMembers=[n1-65223, n3-42135], oldViewId=3, oldMembers=[n1-65223, n2-63201, n3-42135]} [sender=n1-65223]

          2011-12-28 10:23:33,991 TRACE (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.cacheviews.CacheViewsManagerImpl] ___hotRodTopologyCache: Preparing cache view CacheView{viewId=4, members=[n1-65223, n3-42135]}, committed view is CacheView{viewId=3, members=[n1-65223, n2-63201, n3-42135]}

          2011-12-28 10:23:33,991 TRACE (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.cacheviews.CacheViewInfo] ___hotRodTopologyCache: Preparing view CacheView{viewId=4, members=[n1-65223, n3-42135]}

          2011-12-28 10:23:33,991 TRACE (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.statetransfer.BaseStateTransferManagerImpl] Received new cache view: ___hotRodTopologyCache CacheView{viewId=4, members=[n1-65223, n3-42135]}

          2011-12-28 10:23:33,992 TRACE (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.distribution.ch.TopologyAwareConsistentHash] Positions are: {1316340906=n3-42135, 1501433928=n1-65223}

          2011-12-28 10:23:33,992 TRACE (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.distribution.ch.TopologyAwareConsistentHash] Consistent hash initialized: TopologyAwareConsistentHash {1316340906: n3-42135, 1501433928: n1-65223}

          2011-12-28 10:23:33,992 DEBUG (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.statetransfer.ReplicatedStateTransferTask] Commencing state transfer 4 on node: n3-42135. Before start, data container had 3 entries

          2011-12-28 10:23:33,992 DEBUG (OOB-1,cache-test-cluster,n3-42135) [org.infinispan.statetransfer.StateTransferLockImpl] Blocking new write commands for cache view 4

           

           

           

          Hotrod Client log:

           

           

          2011-12-28 10:23:34,503 TRACE (main) [org.infinispan.client.hotrod.impl.protocol.Codec10] Received operation code is: 0x04

          2011-12-28 10:23:34,503 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Offset: 0, len=2, size=2

          2011-12-28 10:23:34,503 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Successfully read array with size: 2

          2011-12-28 10:23:34,503 TRACE (main) [org.infinispan.client.hotrod.impl.consistenthash.ConsistentHashFactory] No hash function configured for version 2

          2011-12-28 10:23:34,504 TRACE (main) [org.infinispan.client.hotrod.impl.consistenthash.ConsistentHashFactory] Trying to use default value: org.infinispan.client.hotrod.impl.consistenthash.ConsistentHashV2

          2011-12-28 10:23:34,509 TRACE (main) [org.infinispan.client.hotrod.impl.protocol.Codec11] Topology change request: newTopologyId=3, numKeyOwners=2, hashFunctionVersion=2, hashSpaceSize=2147483647, clusterSize=1, numVirtualNodes=1

          2011-12-28 10:23:34,509 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Offset: 0, len=11, size=11

          2011-12-28 10:23:34,509 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Successfully read array with size: 11

          2011-12-28 10:23:34,509 TRACE (main) [org.infinispan.client.hotrod.impl.transport.AbstractTransport] Read string is: 192.168.0.1

          2011-12-28 10:23:34,509 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Offset: 0, len=2, size=2

          2011-12-28 10:23:34,510 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Successfully read array with size: 2

          2011-12-28 10:23:34,510 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Offset: 0, len=4, size=4

          2011-12-28 10:23:34,510 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Successfully read array with size: 4

          2011-12-28 10:23:34,511 TRACE (main) [org.infinispan.client.hotrod.impl.protocol.Codec11] Server(192.168.0.1:11222) read with base hash code -177493001, and normalized hash code 1501433928

          2011-12-28 10:23:34,512 TRACE (main) [org.infinispan.client.hotrod.impl.protocol.Codec11] Hash code is: 1501433928

          2011-12-28 10:23:34,512 INFO  (main) [org.infinispan.client.hotrod.impl.protocol.Codec11] ISPN004006: New topology: [/192.168.0.1:11222]

          2011-12-28 10:23:34,519 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] Current list: [/192.168.0.4:11222, /192.168.0.1:11222, /192.168.0.2:11222]

          2011-12-28 10:23:34,519 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] New list:

          2011-12-28 10:23:34,519 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] Added servers:

          2011-12-28 10:23:34,519 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] Removed servers:

          2011-12-28 10:23:34,519 INFO  (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] ISPN004014: New server added(/192.168.0.1:11222), adding to the pool.

          2011-12-28 10:23:34,520 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory] Created tcp transport: TcpTransport{socket=Socket[addr=/192.168.0.1,port=11222,localport=49167], serverAddress=/192.168.0.1:11222, id =5}

          2011-12-28 10:23:34,520 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory] Returning to pool: TcpTransport{socket=Socket[addr=/192.168.0.1,port=11222,localport=49167], serverAddress=/192.168.0.1:11222, id =5}

          2011-12-28 10:23:34,520 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.RoundRobinBalancingStrategy] New server list is: [/192.168.0.1:11222]

          2011-12-28 10:23:34,521 INFO  (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] ISPN004016: Server not in cluster anymore(/192.168.0.4:11222), removing from the pool.

          2011-12-28 10:23:34,522 INFO  (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] ISPN004016: Server not in cluster anymore(/192.168.0.2:11222), removing from the pool.

          2011-12-28 10:23:34,523 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory] About to destroy tcp transport: TcpTransport{socket=Socket[addr=/192.168.0.2,port=11222,localport=49166], serverAddress=/192.168.0.2:11222, id =4}

          2011-12-28 10:23:34,523 TRACE (main) [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport] Successfully closed socket: Socket[unconnected]
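
          What the TcpTransportFactory lines above amount to is a set difference between the old and the new server list: servers only in the new list get added to the pool, servers only in the old list get destroyed. Note that /192.168.0.1:11222 is reported as newly added even though it was already in the current list, which suggests the comparison itself went wrong. A minimal sketch of the intended diffing step (method and variable names are hypothetical; only the log messages are taken from the output above):

          import java.net.InetSocketAddress;
          import java.net.SocketAddress;
          import java.util.Arrays;
          import java.util.Collection;
          import java.util.HashSet;
          import java.util.Set;

          public class TopologyDiff {

              // Apply a topology update: add servers that are new, drop servers
              // that are no longer part of the cluster view.
              static void updateServers(Collection<SocketAddress> current,
                                        Collection<SocketAddress> incoming) {
                  Set<SocketAddress> added = new HashSet<SocketAddress>(incoming);
                  added.removeAll(current);                // "Added servers:"
                  Set<SocketAddress> removed = new HashSet<SocketAddress>(current);
                  removed.removeAll(incoming);             // "Removed servers:"

                  for (SocketAddress a : added)
                      System.out.println("New server added(" + a + "), adding to the pool.");
                  for (SocketAddress r : removed)
                      System.out.println("Server not in cluster anymore(" + r + "), removing from the pool.");

                  current.removeAll(removed);
                  current.addAll(added);
              }

              public static void main(String[] args) {
                  Collection<SocketAddress> current = new HashSet<SocketAddress>(Arrays.<SocketAddress>asList(
                          new InetSocketAddress("192.168.0.4", 11222),
                          new InetSocketAddress("192.168.0.1", 11222),
                          new InetSocketAddress("192.168.0.2", 11222)));
                  Collection<SocketAddress> incoming = new HashSet<SocketAddress>(Arrays.<SocketAddress>asList(
                          new InetSocketAddress("192.168.0.1", 11222)));
                  updateServers(current, incoming);
                  System.out.println("New server list is: " + current);
              }
          }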

          • 2. Re: FD, FD_SOCK configuration
            galder.zamarreno

            Infinispan version? Can you try with 5.1.0.CR2?

            • 3. Re: FD, FD_SOCK configuration
              pmancham

              Yes, it is version 5.1.0.CR2.

              • 4. Re: FD, FD_SOCK configuration
                galder.zamarreno

                I don't think you've updated the client though, because it shows:

                2011-12-28 10:23:34,503 TRACE (main) [org.infinispan.client.hotrod.impl.protocol.Codec10] Received operation code is: 0x04

                 

                And a 5.1.0.CR2 client would show:

                2012-01-03 11:42:06,217 1212  TRACE [org.infinispan.client.hotrod.impl.protocol.Codec11] (main:) Received operation code is: 0x18

                 

                Notice the change in the class name (Codec10 vs Codec11).

                 

                Anyway, I don't think this is crucial, but update the client and try again. If the issue still occurs, I'd suggest the following:

                - Enable TRACE logging on both the client and the server for the org.infinispan packages (you seem to have done that already; see the sketch after this list)

                - Repeat the test and attach the complete log files from both clients and servers (zipped)
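
                If the client's logging backend happens to be log4j 1.x (an assumption; Infinispan logs through JBoss Logging, which can delegate to log4j), the TRACE override can be a one-liner before the client starts:

                import org.apache.log4j.Level;
                import org.apache.log4j.Logger;

                public class TraceLogging {
                    public static void main(String[] args) {
                        // Raise org.infinispan to TRACE before any Hot Rod client code runs.
                        // Equivalent to log4j.logger.org.infinispan=TRACE in log4j.properties.
                        Logger.getLogger("org.infinispan").setLevel(Level.TRACE);
                        // ... create the RemoteCacheManager and run the test after this point.
                    }
                }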

                • 5. Re: FD, FD_SOCK configuration
                  galder.zamarreno

                  Side note: this and https://issues.jboss.org/browse/ISPN-1654 could be related.

                  • 6. Re: FD, FD_SOCK configuration
                    pmancham

                    The client was using 5.1.0.CR1 jars, so I updated them to use 5.1.0.CR2.

                     

                    1) I started m1 first, then m3, and finally m2

                    2) shut down m3 => client topology [m1]

                    3) started m3 again => client topology [m1, m3]

                    4) shut down m2 => client topology [m1]

                    5) started m2 => client topology [m1, m2]

                     

                    Attached are the logs for the client and all servers.

                    • 7. Re: FD, FD_SOCK configuration
                      galder.zamarreno

                      Thanks Prasant, I was able to find the bug. A fix for it should be available in CR3.

                      • 8. Re: FD, FD_SOCK configuration
                        pmancham

                        Thank you! When will the fixed client be available? Once https://issues.jboss.org/browse/ISPN-1654 is marked resolved?

                        • 9. Re: FD, FD_SOCK configuration
                          galder.zamarreno

                          The fix is actually on the server side; nothing changes on the client. Once it's in, you can build the master branch of https://github.com/infinispan/infinispan and you'll have the jars available to you. Otherwise, wait for CR3 to be out.