-
15. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
ez1234 Jul 24, 2016 3:02 AM (in response to ez1234)
Adding some more logs at DEBUG level.
See attached.
In this scenario, a large number of entries were not replicated between cluster nodes one and two.
There were many errors in the logs. Please take a look and see whether they help move the investigation forward.
Thanks,
Eli
-
replication error2.zip 192.8 KB
-
-
16. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
nadirx Jul 25, 2016 5:02 AM (in response to ez1234)
It looks like the cluster is splitting. Are you sure you're not hitting GC limits? Do you have GC logs?
-
17. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
ez1234 Jul 27, 2016 7:01 AM (in response to nadirx)
Hi,
I do not know whether I am hitting GC limits. I do not have GC logs.
I do know that we start the server with 10 GB of heap memory, on one machine with 24 GB of RAM and on another with 50 GB of RAM.
How do I enable GC logs?
We encounter this scenario even when not writing many entries to the cache. The cache is not under load at all.
Attached is another set of logs. I can see in the logs that, as you said, the cache cluster is splitting and nodes are disconnecting. Why is the cluster splitting, and why are the nodes disconnecting from each other?
Is this related to the cluster configuration? A networking issue? A bug?
Thanks,
Eli
-
replication error-3.zip 1.1 MB
-
-
18. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
ez1234 Aug 3, 2016 7:56 AM (in response to ez1234)
Hi,
Any updates on this issue?
Thanks,
Eli
-
19. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
pruivo Aug 3, 2016 9:25 AM (in response to ez1234)
Hi Eli,
To enable the GC logs, add these flags to the JVM: -XX:+PrintGCDetails -Xloggc:<file/to/store/gc-logs>
There are a few reasons why the cluster can split:
* GC. If it stops the application for too long, the heartbeats between nodes are not sent and the nodes are considered down. The GC logs will tell us more.
* Remote thread pool exhausted. This can happen under heavy load, when there may be no free threads left to send the heartbeat. A stack trace would help to find out.
* Network issues. It could be a firewall blocking a connection, a dropped physical connection, or a hardware failure.
* A bug? Yes, of course. This isn't a perfect world.
Cheers,
Pedro
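As a complement to the GC logs, long pauses can also be spotted from inside the JVM. The following is only a minimal sketch and is not part of Infinispan or JGroups: the class name GcPauseProbe, the 1-second sampling interval, and the 2-second warning threshold are assumptions chosen for illustration. It sleeps for a fixed interval, measures how much longer than requested the sleep took, and reports how much the cumulative GC time (from the standard GarbageCollectorMXBean) grew in the meantime.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseProbe {

    /** Cumulative time in ms spent in GC so far, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** How many ms longer than requested a sleep took (never negative). */
    public static long overshootMillis(long requestedMillis, long startNanos, long endNanos) {
        long elapsedMillis = (endNanos - startNanos) / 1_000_000L;
        return Math.max(0L, elapsedMillis - requestedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5; // assumed default
        long intervalMillis = 1000;   // assumed sampling interval
        long warnAboveMillis = 2000;  // assumed threshold, tune to the failure-detection timeout

        long lastGc = totalGcMillis();
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            Thread.sleep(intervalMillis);
            long over = overshootMillis(intervalMillis, start, System.nanoTime());
            long gcNow = totalGcMillis();
            if (over > warnAboveMillis) {
                System.out.printf("stalled %d ms beyond a %d ms sleep; GC time grew by %d ms%n",
                        over, intervalMillis, gcNow - lastGc);
            }
            lastGc = gcNow;
        }
    }
}
```

A large overshoot with a matching jump in GC time points at the collector; a large overshoot without one points elsewhere (for example swapping or CPU starvation). This does not replace the GC logs, which remain the more detailed source.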
-
20. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
ez1234 Aug 4, 2016 1:51 AM (in response to pruivo)
Shall I switch to using G1 GC as well? Or should I stay with the default for now to try to catch the problem again?
-
21. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
ez1234 Aug 4, 2016 2:00 AM (in response to ez1234)
Shall I switch to using G1 GC as well? Or should I stay with the default for now to try to catch the problem again?
Another question: are there any configuration changes or tuning we can apply to the JGroups / Infinispan modules to reduce the risk of cluster split brain and of disconnections between nodes?
-
22. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
ez1234 Aug 4, 2016 4:01 AM (in response to ez1234)
I have now reproduced the issue with GC logs enabled.
The attached zip file contains the GC log for each server.
-
replication error 5.zip 99.3 KB
-
-
23. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
sathishkumarbt Aug 17, 2016 6:35 AM (in response to wdfink)
Hi,
I am facing a similar issue, with the following warnings:
Aug 17, 2016 3:15:05 PM org.jgroups.protocols.FlowControl start
WARNING: The fragmentation size of the fragmentation protocol is 60000, which is greater than min_credits (40000). This can lead to blockings (https://issues.jboss.org/browse/JGRP-1659)
Aug 17, 2016 3:15:05 PM org.jgroups.blocks.cs.TcpServer$Acceptor run
WARNING: JGRP000006: failed accepting connection from peer
java.net.SocketException: BaseServer.TcpConnection.readPeerAddress(): cookie read by 192.168.15.135:7860 does not match own cookie; terminating connection
    at org.jgroups.blocks.cs.TcpConnection.readPeerAddress(TcpConnection.java:256)
    at org.jgroups.blocks.cs.TcpConnection.<init>(TcpConnection.java:54)
    at org.jgroups.blocks.cs.TcpServer$Acceptor.handleAccept(TcpServer.java:132)
    at org.jgroups.blocks.cs.TcpServer$Acceptor.run(TcpServer.java:117)
    at java.lang.Thread.run(Thread.java:745)
JGroups configuration (jgroups.xml):
<!--
TCP based stack, with flow control and message bundling. This is usually used when IP
multicasting cannot be used in a network, e.g. because it is disabled (routers discard multicast).
Note that TCP.bind_addr and TCPPING.initial_hosts should be set, possibly via system properties, e.g.
-Djgroups.bind_addr=192.168.5.2 and -Djgroups.tcpping.initial_hosts=192.168.5.2[7800]".
author: Bela Ban
-->
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-3.6.xsd">
    <TCP loopback="true"
         bind_addr="${jgroups.tcp.address:153.88.45.71}"
         bind_port="${jgroups.tcp.port:7860}"
         recv_buf_size="${tcp.recv_buf_size:20M}"
         send_buf_size="${tcp.send_buf_size:640K}"
         discard_incompatible_packets="true"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         enable_bundling="true"
         use_send_queues="true"
         sock_conn_timeout="300"
         timer_type="new"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"
         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="30"
         thread_pool.keep_alive_time="60000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="discard"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="2"
         oob_thread_pool.max_threads="30"
         oob_thread_pool.keep_alive_time="60000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>
    <!-- <TCP_NIO -->
    <!-- bind_port="7800" -->
    <!-- bind_interface="${jgroups.tcp_nio.bind_interface:bond0}" -->
    <!-- use_send_queues="true" -->
    <!-- sock_conn_timeout="300" -->
    <!-- reader_threads="3" -->
    <!-- writer_threads="3" -->
    <!-- processor_threads="0" -->
    <!-- processor_minThreads="0" -->
    <!-- processor_maxThreads="0" -->
    <!-- processor_queueSize="100" -->
    <!-- processor_keepAliveTime="9223372036854775807"/> -->
    <!-- <TCPGOSSIP initial_hosts="192.168.15.135[7860]"/> -->
    <!-- <TCPPING initial_hosts="${jgroups.tcpping.initial_hosts}" -->
    <!-- port_range="0" -->
    <!-- timeout="3000" -->
    <!-- /> -->
    <TCPGOSSIP initial_hosts="${jgroups.tcpping.initial_hosts:153.88.45.71[7860],153.88.45.71[7861]}" />
    <MERGE2 max_interval="30000" min_interval="10000"/>
    <FD_SOCK/>
    <FD timeout="3000" max_tries="3"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK
        use_mcast_xmit="false"
        retransmit_timeout="300,600,1200,2400,4800"
        discard_delivered_msgs="true"/>
    <UNICAST2 timeout="300,600,1200"
              stable_interval="5000"
              max_bytes="1m"/>
    <pbcast.STABLE stability_delay="500" desired_avg_gossip="5000" max_bytes="1m"/>
    <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
    <UFC max_credits="200k" min_threshold="0.20"/>
    <MFC max_credits="200k" min_threshold="0.20"/>
    <FRAG2 frag_size="60000"/>
    <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />
</config>
Infinispan configuration:
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:8.2 http://www.infinispan.org/schemas/infinispan-config-8.2.xsd"
            xmlns="urn:infinispan:config:8.2">
    <jgroups>
        <stack-file name="configurationFile" path="jgroups.xml"/>
    </jgroups>
    <cache-container>
        <!-- <transport cluster="x-cluster" stack="configurationFile" /> -->
        <transport stack="configurationFile" />
        <replicated-cache name="transactional-type" mode="SYNC">
            <transaction mode="NON_XA" locking="OPTIMISTIC" transaction-manager-lookup="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup" auto-commit="true" />
            <locking acquire-timeout="60000"/>
            <expiration lifespan="43200000"/>
        </replicated-cache>
    </cache-container>
</infinispan>
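The FlowControl warning in the log above can be checked against the posted UFC/MFC and FRAG2 settings. The sketch below is not JGroups code; the class FlowControlCheck and its methods are hypothetical. It assumes that min_credits is derived as max_credits * min_threshold and that the "k" suffix is a decimal multiplier (1000); both assumptions are inferred only from the fact that they reproduce the 40000 reported in the warning.

```java
public class FlowControlCheck {

    /** Parses sizes such as "200k" or "1m", assuming decimal multipliers (k = 1000, m = 1000000). */
    public static long parseSize(String text) {
        String s = text.trim().toLowerCase();
        long factor = 1L;
        if (s.endsWith("k")) {
            factor = 1_000L;
        } else if (s.endsWith("m")) {
            factor = 1_000_000L;
        }
        String digits = (factor == 1L) ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * factor;
    }

    /** Assumed derivation: min_credits = max_credits * min_threshold. */
    public static long minCredits(long maxCredits, double minThreshold) {
        return (long) (maxCredits * minThreshold);
    }

    /** True when a single fragment is larger than min_credits, the condition the warning describes. */
    public static boolean fragSizeTooLarge(long fragSize, long minCredits) {
        return fragSize > minCredits;
    }

    public static void main(String[] args) {
        long maxCredits = parseSize("200k");  // UFC/MFC max_credits from the posted config
        double minThreshold = 0.20;           // UFC/MFC min_threshold from the posted config
        long fragSize = 60000L;               // FRAG2 frag_size from the posted config

        long min = minCredits(maxCredits, minThreshold);
        System.out.println("min_credits = " + min
                + ", frag_size = " + fragSize
                + ", frag_size exceeds min_credits: " + fragSizeTooLarge(fragSize, min));
    }
}
```

Under these assumptions the posted values give a min_credits of 40000, below the frag_size of 60000, which matches the warning. A frag_size at or below min_credits (or a larger max_credits) would pass this check; whether that is the right tuning for this cluster is a separate question, see JGRP-1659.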
-
24. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
wdfink Aug 17, 2016 7:45 AM (in response to sathishkumarbt)
Hello Sathish,
please avoid double posting and continue in your own thread, "Replication is not happening with Infinispan 8.2.2".