12 Replies Latest reply on Apr 11, 2018 1:53 PM by Tristan Tarrant

    Configuration for three parallel Infinispan cluster

    Alexander Diedler Newbie

      Hello,

      We have three cluster environments, each with two nodes. We assigned a distinct Hot Rod port per environment: 11122 for ENV1, 11132 for ENV2 and 11142 for ENV3. But we see that the nodes find each other across environments, which must be avoided, because this is a three-tier setup with Integration, Pre-Production and Production.

      How can we separate/isolate each (replicated) Infinispan cluster and prevent replication to the other environments? On the other environments we see the message "node xxx joined", and if we stop the node, "node xxx has left the cluster".

      We changed the ports in clustered.xml.

      Here are the default settings. My idea was to increment all ports by 1 (or by 2 for the production system) to separate the environments from each other, but it does not work.

       

       <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
              <socket-binding name="management-http" interface="management" port="${jboss.management.http.port:9990}"/>
              <socket-binding name="management-https" interface="management" port="${jboss.management.https.port:9993}"/>
              <socket-binding name="hotrod" port="11222"/>
              <socket-binding name="hotrod-internal" port="11223"/>
              <socket-binding name="hotrod-multi-tenancy" port="11224"/>
              <socket-binding name="jgroups-mping" port="0" multicast-address="${jboss.default.multicast.address:234.99.54.14}" multicast-port="45700"/>
              <socket-binding name="jgroups-tcp" port="7600"/>
              <socket-binding name="jgroups-tcp-fd" port="57600"/>
              <socket-binding name="jgroups-udp" port="55200" multicast-address="${jboss.default.multicast.address:234.99.54.14}" multicast-port="45688"/>
              <socket-binding name="jgroups-udp-fd" port="54200"/>
              <socket-binding name="memcached" port="11211"/>
              <socket-binding name="rest" port="8080"/>
              <socket-binding name="rest-multi-tenancy" port="8081"/>
              <socket-binding name="rest-ssl" port="8443"/>
              <socket-binding name="txn-recovery-environment" port="4712"/>
              <socket-binding name="txn-status-manager" port="4713"/>
              <socket-binding name="websocket" port="8181"/>
              <outbound-socket-binding name="remote-store-hotrod-server">
                  <remote-destination host="remote-host" port="11222"/>
              </outbound-socket-binding>
              <outbound-socket-binding name="remote-store-rest-server">
                  <remote-destination host="remote-host" port="8080"/>
              </outbound-socket-binding>
          </socket-binding-group>
      
        • 1. Re: Configuration for three parallel Infinispan cluster
          Tristan Tarrant Master

          Nodes find each other using the "*PING" family of JGroups protocols. You should therefore ensure that each cluster uses a dedicated port/address combination. In particular, if you want to use multicast discovery, start your server with:

           

          -Djboss.default.multicast.address=a.b.c.d

           

          where a.b.c.d is unique to your cluster.
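          A sketch of what this could look like per environment (the multicast addresses and offsets below are hypothetical; pick values unique to each environment). Both properties appear in the posted socket-binding-group:

```shell
# Hypothetical per-environment startup: a unique multicast address keeps
# JGroups discovery isolated, and a port offset separates all socket bindings.
./standalone.sh -c clustered.xml \
    -Djboss.default.multicast.address=234.99.54.14 \
    -Djboss.socket.binding.port-offset=0     # ENV1 (Integration)

./standalone.sh -c clustered.xml \
    -Djboss.default.multicast.address=234.99.54.15 \
    -Djboss.socket.binding.port-offset=100   # ENV2 (Pre-Production)

./standalone.sh -c clustered.xml \
    -Djboss.default.multicast.address=234.99.54.16 \
    -Djboss.socket.binding.port-offset=200   # ENV3 (Production)
```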

          • 2. Re: Configuration for three parallel Infinispan cluster
            Alexander Diedler Newbie

            Hello, yes, that was part of the answer. By using the port-offset flag we now give each environment its own ports, and it works. In addition we chose dedicated multicast ports in the clustered.xml file, and we renamed the cache-containers to dedicated names.
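            For example (hypothetical port values), the second environment's clustered.xml could use its own multicast ports on top of the port offset, mirroring the default bindings posted above:

```xml
<!-- ENV2: dedicated multicast address/ports so discovery cannot cross environments -->
<socket-binding name="jgroups-mping" port="0"
                multicast-address="${jboss.default.multicast.address:234.99.54.15}"
                multicast-port="45710"/>
<socket-binding name="jgroups-udp" port="55200"
                multicast-address="${jboss.default.multicast.address:234.99.54.15}"
                multicast-port="45698"/>
```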

            • 3. Re: Configuration for three parallel Infinispan cluster
              Alexander Diedler Newbie

              Hello, the separation was successful, but the communication between the nodes dies from time to time ("Request timeout to get response from ...") and I have no idea what the problem could be.

              Memory? I did not see any OutOfMemoryError messages in the Infinispan server.log.

              • 4. Re: Configuration for three parallel Infinispan cluster
                Alexander Diedler Newbie

                "If you want to use..." I don´t know If I want to use it. I want only a simple and robust 2 node cluster for replication of simple and complex stored values (Arrays and Structs). For the moment it was very unstable sometimes the nodes discover on startup, sometimes not. Then Timeouts happens in communication and the clustered.xml was used "out-of-the-box" from me, only and single modification is to raise the offset for the ports to seperate the clusters from each others.

                • 5. Re: Configuration for three parallel Infinispan cluster
                  Radim Vansa Master

                  Have you checked GC logs? Maybe you have long GC pauses that correlate with these failures.

                  • 6. Re: Configuration for three parallel Infinispan cluster
                    Alexander Diedler Newbie

                    Hello,

                     Thank you all for your tips, but I don't think it is related to GC or memory. I made a fresh installation of Infinispan 9.1.4 on the Red Hat server and started it with ./standalone.sh -c clustered.xml, with no modifications. I configured my application and connector to put my values into the default replicated cache. The connector uses the Hot Rod protocol.

                     In server.log I see different error messages, and I am not sure which one is the root cause:

                     

                     Sometimes, when we restart the 2nd node, I see on the first node:

                    2018-04-09 14:48:20,108 ERROR [org.infinispan.CLUSTER] (transport-thread--p4-t17) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 10 from wwhelapp0120
                     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
                     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
                     at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
                     at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:620)
                     at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:484)
                     at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:359)
                     at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:338)
                     at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:83)
                     at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:765)
                     at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
                     at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
                     at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
                     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                     at java.lang.Thread.run(Thread.java:745)
                    Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 10 from wwhelapp0120
                     at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:163)
                     at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:86)
                     at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:21)
                     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
                     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
                     ... 3 more
                    2018-04-09 17:09:42,952 FATAL [org.infinispan.CLUSTER] (transport-thread--p4-t17) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [wwhelapp0119, wwhelapp0120].

                     

                    2018-04-09 12:52:13,903 WARN  [org.infinispan.server.hotrod.Decoder2x] (HotRod-ServerWorker-5-4) ISPN006011: Operation 'REMOVE' forced to return previous value should be used on transactional caches, otherwise data inconsistency issues could arise under failure situations
                    2018-04-09 12:52:28,913 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p3-t1) ISPN000136: Error executing command RemoveCommand, writing keys [WrappedByteArray{bytes=[B0x033E104445534B54..[19], hashCode=-1420082805}]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 18 from wwhelapp0120
                     at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:163)
                     at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:86)
                     at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:21)
                     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
                     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
                     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                     at java.lang.Thread.run(Thread.java:745)

                     

                    2018-04-09 13:15:06,446 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (jgroups-16,wwhelapp0119) ISPN000136: Error executing command RemoveCommand, writing keys [WrappedByteArray{bytes=[B0x033E104445534B54..[19], hashCode=-1420082805}]: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from wwhelapp0120, see cause for remote stack trace
                     at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:27)
                     at org.infinispan.remoting.transport.ValidSingleResponseCollector.withException(ValidSingleResponseCollector.java:41)
                     at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:25)
                     at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:51)
                     at org.infinispan.remoting.transport.impl.SingleTargetRequest.onResponse(SingleTargetRequest.java:35)
                     at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53)
                     at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1328)
                     at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1238)
                     at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:121)
                     at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1366)
                     at org.jgroups.JChannel.up(JChannel.java:819)
                     at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:134)
                     at org.jgroups.stack.Protocol.up(Protocol.java:340)
                     at org.jgroups.protocols.FORK.up(FORK.java:134)
                     at org.jgroups.protocols.FRAG3.up(FRAG3.java:171)
                     at org.jgroups.protocols.FlowControl.up(FlowControl.java:343)
                     at org.jgroups.protocols.FlowControl.up(FlowControl.java:343)
                     at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864)
                     at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240)
                     at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002)
                     at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728)
                     at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383)
                     at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600)
                     at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119)
                     at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199)
                     at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252)
                     at org.jgroups.protocols.MERGE3.up(MERGE3.java:276)
                     at org.jgroups.protocols.Discovery.up(Discovery.java:267)
                     at org.jgroups.protocols.TP.passMessageUp(TP.java:1229)
                     at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87)
                     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                     at java.lang.Thread.run(Thread.java:745)
                    Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key WrappedByteArray{bytes=[B0x033E104445534B54..[19], hashCode=-1420082805} and requestor CommandInvocation:wwhelapp0119:764. Lock is held by CommandInvocation:wwhelapp0119:763
                     at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:253)
                     at org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockAndRecord(AbstractLockingInterceptor.java:269)
                     at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitNonTxDataWriteCommand(AbstractLockingInterceptor.java:130)
                     at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitDataWriteCommand(NonTransactionalLockingInterceptor.java:38)
                     at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitRemoveCommand(AbstractLockingInterceptor.java:105)
                     at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:63)
                     at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58)
                     at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:306)
                     at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:252)
                     at org.infinispan.statetransfer.StateTransferInterceptor.visitRemoveCommand(StateTransferInterceptor.java:108)
                     at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:63)
                     at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58)
                     at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitRemoveCommand(CacheMgmtInterceptor.java:214)
                     at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:63)
                     at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:127)
                     at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:96)
                     at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:60)
                     at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54)
                     at org.infinispan.interceptors.DDAsyncInterceptor.visitRemoveCommand(DDAsyncInterceptor.java:65)
                     at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:63)
                     at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50)
                     at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234)
                     at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommandAsync(BaseRpcInvokingCommand.java:63)
                     at org.infinispan.commands.remote.SingleRpcCommand.invokeAsync(SingleRpcCommand.java:57)
                     at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:102)
                     at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99)
                     at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71)
                     at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40)
                     ... 3 more
                    • 7. Re: Configuration for three parallel Infinispan cluster
                      Radim Vansa Master

                      If this is not happening under heavy load (where timeouts are to be expected) and the cluster is at least partially operational, I am out of advice. At that point you need to enable trace logging (including org.jgroups) and try to find out what is happening to the messages.

                       

                      Just to be sure, could you try turning off the firewall?

                      • 8. Re: Configuration for three parallel Infinispan cluster
                        Alexander Diedler Newbie

                        Hello, this is now on the test system. What happens: the first call of a page fills the cache with 2000 elements in one block; reads are then fast for a few minutes, and then the communication between the nodes seems to break and never comes back.

                        I am not sure what to think about Infinispan: it does not work "out-of-the-box" and there are no specialists available. All the people I ask for (paid) support tell me they have no deep experience with Infinispan as a cluster (although they advertise Infinispan installation support on their websites). Is this cluster framework just beta quality, or does it work for enterprise, large-scale infrastructures?

                        Is there no good example configuration available? Does nobody here have deep experience with Infinispan as a cluster who can help me build a stable system?

                         

                        What is the best practice for this scenario?

                        an application based on a two-node cluster

                        on each node there is a local Infinispan installation

                        the two cluster nodes should operate in replication mode, meaning that if one node fails, the other node keeps operating without any impact

                        this two-node cluster exists three times: one test cluster, one validation cluster and one production cluster, all in the same network
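                        For reference, a minimal sketch of what the cache definition in clustered.xml might look like for this scenario (the subsystem schema version and the container name may differ in your installation):

```xml
<!-- Sketch: a synchronously replicated cache inside the clustered cache container.
     With two nodes and SYNC replication, each node holds a full copy, so the
     surviving node keeps serving data if the other one fails. -->
<cache-container name="clustered" default-cache="default">
    <transport lock-timeout="60000"/>
    <replicated-cache name="default" mode="SYNC"/>
</cache-container>
```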

                        • 9. Re: Configuration for three parallel Infinispan cluster
                          Radim Vansa Master

                          I don't know whom you're getting support from, but Infinispan as a project is not supported (besides this forum and IRC). Most of the developers are Red Hat employees and the project is productized as Red Hat JBoss Data Grid (it's also used inside WildFly -> EAP, Keycloak -> RH SSO and others). So this is where you should be looking for paid support. On this forum you'll find answers from the core developers, so I'd say these are people who have experience with clustering: basically all the features are developed as clustered. And yes, there are setups with hundreds of clustered nodes in production.

                           

                          I am sorry it does not work out of the box, and it seems that you are not hitting any of the common configuration issues (like misconfiguring IPv4/IPv6 and such), but with the information you have given it is hard to see what's happening. Stack traces aren't enough in distributed systems.

                          • 10. Re: Configuration for three parallel Infinispan cluster
                            Tristan Tarrant Master

                            As Radim said, Infinispan is definitely used in production in single node and clustered configurations up to 100s of nodes.

                            Clustering is a complex thing to get right, so you cannot expect things to just work without putting some effort into understanding discovery, transports and the type of network you're dealing with.

                            If your cluster is breaking up, we need to understand why. This could be due to Infinispan/JGroups misconfiguration, network/switch issues, operating system networking, etc.

                            In particular, enabling debug logs for JGroups would help.

                            • 11. Re: Configuration for three parallel Infinispan cluster
                              Alexander Diedler Newbie

                              Thank you for your clear words. I am very impatient to get things flying, because we have been configuring this cluster for weeks now and it seems to end in a total outage every time, which annoys me. I have spent hours and hours on this topic.

                              What seems to help: I have now configured the default stack to TCP instead of UDP, and this seems to be stable in general, for hours now.

                              But as far as I know, TCP with unicast is "expensive" in network communication and UDP with multicast should be preferred, shouldn't it?
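                              For the record, the switch to TCP was done like this (assuming clustered.xml exposes the default stack via the jboss.default.jgroups.stack system property, as my copy does; otherwise edit the channel's stack attribute directly):

```shell
# Select the TCP stack instead of the default UDP stack at startup.
# Note: TCP still needs a discovery protocol (e.g. MPING or TCPPING) in its stack definition.
./standalone.sh -c clustered.xml -Djboss.default.jgroups.stack=tcp
```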

                               

                              Once we get this reference system working, we will certainly use Infinispan in all future projects, because I am very pleased with the management interface and with the functionality in general.

                              I don't know how to enable trace mode in Infinispan; where is the switch for that?

                              • 12. Re: Configuration for three parallel Infinispan cluster
                                Tristan Tarrant Master

                                TCP is not necessarily more expensive than UDP: it depends on the size of the cluster, distributed vs. replicated mode, the size of the entries, etc. Also, it is always wiser to trade a little performance for stability.

                                 

                                As for logging, if you are using clustered.xml, look at the logging subsystem and add a relevant logger, e.g.:

                                 

                                <logger category="org.jgroups">

                                     <level name="DEBUG"/>

                                </logger>

                                 

                                Since our server is based on WildFly, you can also tune loggers at runtime using the CLI:

                                 

                                How To - WildFly 10 - Project Documentation Editor
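A sketch of the runtime approach (the CLI script name and management port are assumptions based on a default Infinispan server install; the logging-subsystem operations themselves are standard WildFly management operations):

```shell
# Connect to the running server's management interface and raise JGroups logging,
# without editing clustered.xml or restarting the server.
./ispn-cli.sh --connect controller=127.0.0.1:9990 \
    --command="/subsystem=logging/logger=org.jgroups:add(level=TRACE)"

# Later, lower the level again once you have captured the problem:
./ispn-cli.sh --connect controller=127.0.0.1:9990 \
    --command="/subsystem=logging/logger=org.jgroups:write-attribute(name=level, value=INFO)"
```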