Issue clustering embedded Infinispan (Spring Boot) with Infinispan Server (9.4.4)
soupy989 Feb 26, 2019 6:08 PM

Hi all,

I have been trying to cluster an embedded Spring Boot Infinispan instance with a remote standalone server node, with no luck so far, and was hoping someone here could help.
The error I'm seeing on the Infinispan Server side (NODE_1):
17:12:26,408 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: Infinispan Server 9.4.4.Final (WildFly Core 6.0.2.Final) started in 10731ms - Started 274 of 322 services (163 services are lazy, passive or on-demand)
17:19:49,480 INFO [org.infinispan.CLUSTER] (jgroups-8,{NODE_1}) ISPN000094: Received new cluster view for channel cluster: [{NODE_1}|1] (2) [{NODE_1}, {NODE_2}-52218]
17:19:49,483 INFO [org.infinispan.CLUSTER] (jgroups-8,{NODE_1}) ISPN100000: Node {NODE_2}-52218 joined the cluster
17:20:49,644 WARN [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p4-t23) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1 from {NODE_2}-52218
	at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: org.infinispan.util.logging.TraceException
		at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
		at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
		at java.lang.Thread.run(Thread.java:748)
	Suppressed: org.infinispan.util.logging.TraceException
		at org.infinispan.remoting.transport.Transport.invokeRemotely(Transport.java:75)
		at org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:525)
		at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:508)
		at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:321)
The error seen from Spring Boot (NODE_2):
17:19:49.171 [main] DEBUG o.i.manager.DefaultCacheManager - Starting cache manager null
17:19:49.180 [main] INFO o.i.f.GlobalComponentRegistry - ISPN000128: Infinispan version: Infinispan 'Infinity Minus ONE +2' 9.4.4.Final
17:19:49.343 [main] INFO o.i.r.t.jgroups.JGroupsTransport - ISPN000078: Starting JGroups channel cluster
17:19:49.429 [main] DEBUG org.jgroups.stack.Configurator - set property TCP.diagnostics_addr to default value /224.0.75.75
17:19:49.434 [main] DEBUG org.jgroups.protocols.TCP - thread pool min/max/keep-alive: 0/200/60000 use_fork_join=false, internal pool: 0/4/30000 (4 cores available)
17:19:49.439 [main] DEBUG org.jgroups.protocols.MPING - bind_addr=/10.252.132.39, mcast_addr=/234.99.54.14, mcast_port=45700
17:19:49.447 [main] DEBUG org.jgroups.protocols.MPING - receiver thread started
17:19:49.452 [main] DEBUG org.jgroups.protocols.pbcast.GMS - address={NODE_2}-52218, cluster=cluster, physical address=10.252.132.39:47500
17:19:49.471 [main] DEBUG org.jgroups.protocols.pbcast.GMS - {NODE_2}-52218: sending JOIN({NODE_2}-52218) to {NODE_1}
17:19:49.652 [main] DEBUG o.jgroups.protocols.pbcast.NAKACK2 -
[{NODE_2}-52218 setDigest()]
existing digest:  []
new digest:       {NODE_1}: [0 (0)], {NODE_2}-52218: [0 (0)]
resulting digest: {NODE_1}: [0 (0)], {NODE_2}-52218: [0 (0)]
17:19:49.652 [main] DEBUG org.jgroups.protocols.pbcast.GMS - {NODE_2}-52218: installing view [{NODE_1}|1] (2) [{NODE_1}, {NODE_2}-52218]
17:19:49.665 [main] INFO org.infinispan.CLUSTER - ISPN000094: Received new cluster view for channel cluster: [{NODE_1}|1] (2) [{NODE_1}, {NODE_2}-52218]
17:19:49.666 [FD_SOCK pinger-11,{NODE_2}-52218] DEBUG org.jgroups.protocols.FD_SOCK - {NODE_2}-52218: pingable_mbrs=[{NODE_1}, {NODE_2}-52218], ping_dest={NODE_1}
17:19:49.666 [main] DEBUG o.i.r.t.jgroups.JGroupsTransport - Joined: [{NODE_1}, {NODE_2}-52218], Left: []
17:19:49.744 [main] INFO o.i.r.t.jgroups.JGroupsTransport - ISPN000079: Channel cluster local address is {NODE_2}-52218, physical addresses are [10.252.132.39:47500]
17:20:49.756 [main] DEBUG o.i.t.ClusterTopologyManagerImpl - Timed out waiting for rebalancing status from coordinator, trying again
The JGroups XML configuration used from Spring Boot (pulled from the 9.4.4 infinispan-core jar):
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">
   <TCP bind_addr="${jgroups.tcp.address:127.0.0.1}"
        bind_port="${jgroups.tcp.port:7800}"
        enable_diagnostics="false"
        thread_naming_pattern="pl"
        send_buf_size="640k"
        sock_conn_timeout="1000"
        bundler_type="no-bundler"
        thread_pool.min_threads="${jgroups.thread_pool.min_threads:0}"
        thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"
        thread_pool.keep_alive_time="60000"
   />
   <TCPPING initial_hosts="{HOST_1}[47500],{HOST_2}[47500]"/>
   <MPING bind_addr="${jgroups.tcp.address:127.0.0.1}"
          mcast_addr="234.99.54.14"
          mcast_port="45700"
          ip_ttl="${jgroups.udp.ip_ttl:2}"/>
   <MERGE3 min_interval="10000"
           max_interval="30000"
   />
   <FD_SOCK start_port="57600"/>
   <!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->
   <FD_ALL timeout="10000"
           interval="2000"
           timeout_check_interval="1000"
   />
   <VERIFY_SUSPECT timeout="1500"/>
   <pbcast.NAKACK2 use_mcast_xmit="false"
                   xmit_interval="100"
                   xmit_table_num_rows="50"
                   xmit_table_msgs_per_row="1024"
                   xmit_table_max_compaction_time="30000"
                   resend_last_seqno="true"
   />
   <UNICAST3 xmit_interval="100"
             xmit_table_num_rows="50"
             xmit_table_msgs_per_row="1024"
             xmit_table_max_compaction_time="30000"
   />
   <pbcast.STABLE stability_delay="500"
                  desired_avg_gossip="5000"
                  max_bytes="1M"
   />
   <pbcast.GMS print_local_addr="false"
               join_timeout="${jgroups.join_timeout:5000}"
   />
   <MFC max_credits="2m"
        min_threshold="0.40"
   />
   <FRAG3/>
</config>
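For context, the embedded cache manager is pointed at this file roughly like so (a minimal sketch, not the exact application code; the classpath location "jgroups-tcp.xml" is a placeholder for wherever the stack above lives):

```java
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
global.transport()
      .clusterName("cluster")                                // matches the channel name in the logs above
      .addProperty("configurationFile", "jgroups-tcp.xml");  // placeholder path to the JGroups stack above
DefaultCacheManager cacheManager = new DefaultCacheManager(global.build());
```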
Compared to the JGroups configuration used by the server:
<stack name="tcp">
    <transport type="TCP" socket-binding="jgroups-tcp"/>
    <protocol type="TCPPING">
        <property name="initial_hosts">
            {HOST_2}[47500],{HOST_1}[47500]
        </property>
    </protocol>
    <protocol type="MPING" socket-binding="jgroups-mping"/>
    <protocol type="MERGE3"/>
    <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
    <protocol type="FD_ALL"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2">
        <property name="use_mcast_xmit">
            false
        </property>
    </protocol>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS"/>
    <protocol type="MFC_NB"/>
    <protocol type="FRAG3"/>
</stack>
Some background:
- jgroups.tcp.address and jgroups.tcp.port are set at startup through JVM options.
- Infinispan Server version: 9.4.4
- infinispan-spring-boot-starter-embedded: 2.1.1
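For reference, the JVM options look roughly like this (a sketch; the address and port are the ones visible in the logs above, and the jar name is a placeholder):

```
java \
  -Djgroups.tcp.address=10.252.132.39 \
  -Djgroups.tcp.port=47500 \
  -jar app.jar
```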
The oddest part about this is that I am able to cluster two standalone Infinispan instances between NODE_1 and NODE_2 with no problem (using the JGroups TCP config above). I enabled trace logging for both org.jgroups and org.infinispan but found no useful details. Watching the UNICAST traffic (in the trace logs) between the two, I don't see where a request was dropped - all requests from NODE_1 receive an ACK from NODE_2 in under 60 seconds.
Any ideas on cause/resolution would be greatly appreciated.
Thank you.