Infinispan server: Cache entries are not replicated although cache is in mode replicated async | JBoss.org Content Archive (Read Only)

24 Replies. Latest reply on Aug 17, 2016 7:45 AM by wdfink

15. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

ez1234 Jul 24, 2016 3:02 AM (in response to ez1234)
Adding some more logs at DEBUG level.

See attached.

In this scenario a large number of entries were not replicated between cluster nodes one and two.
There were many errors in the logs. Please take a look and see whether they can move the investigation forward.

Thanks,
Eli

replication error2.zip 192.8 KB
16. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

nadirx Jul 25, 2016 5:02 AM (in response to ez1234)

It looks like the cluster is splitting. Are you sure you're not hitting GC limits? Do you have GC logs?
17. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

ez1234 Jul 27, 2016 7:01 AM (in response to nadirx)
Hi,

I do not know whether I am hitting GC limits; I do not have GC logs.
I do know that we start the server with 10 GB of heap, on one machine with 24 GB of RAM and on another with 50 GB of RAM.
How do I enable GC logs?

We encounter this scenario even when we are not writing many entries to the cache; the cache is not under load at all.
Attached is another set of logs. I can see in them that, as you said, the cache cluster is splitting and nodes are disconnecting from each other. Why is this happening?
Is it related to the cluster configuration? A networking issue? A bug?

Thanks,
Eli

replication error-3.zip 1.1 MB
18. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

ez1234 Aug 3, 2016 7:56 AM (in response to ez1234)

Hi,

Any updates on this issue?

Thanks,
Eli
19. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

pruivo Aug 3, 2016 9:25 AM (in response to ez1234)

Hi Eli,

To enable GC logs, add these flags to the JVM options: -XX:+PrintGCDetails -Xloggc:<file/to/store/gc-logs>

There are a couple of reasons why the cluster can split:
* GC: if a GC pause stops the application for too long, the heartbeats between nodes are not sent and the nodes are considered down. The GC logs will tell us more.
* Remote thread pool exhaustion: under heavy load the pool may have no free threads left to send the heartbeat. A stack trace would help to find out.
* Network issues: a firewall blocking a connection, a dropped physical connection, or a hardware failure.
* A bug? Yes, of course. This isn't a perfect world.
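The first point above comes down to how long failure detection waits before it suspects a member. As a sketch of a wider detection window, the fragment below uses FD_SOCK, FD_ALL and VERIFY_SUSPECT; the choice of FD_ALL and every timeout value are illustrative assumptions that appear nowhere else in this thread, and a longer window also means a member that really crashed is excluded later.

```xml
<!-- Illustrative values only (assumptions, not tuned for any workload). -->
<!-- FD_SOCK: suspects a member quickly when its failure-detection socket
     closes abnormally, e.g. because the process died. -->
<FD_SOCK/>
<!-- FD_ALL: every member sends a heartbeat every 'interval' ms and a member is
     suspected after 'timeout' ms without a heartbeat from it, so a GC pause
     well below 'timeout' should not by itself cause a suspicion. -->
<FD_ALL timeout="60000" interval="15000"/>
<!-- VERIFY_SUSPECT: double-checks a suspicion for 'timeout' ms before the
     member is removed from the view. -->
<VERIFY_SUSPECT timeout="5000"/>
```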

Cheers,
Pedro
20. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

ez1234 Aug 4, 2016 1:51 AM (in response to pruivo)

Shall I switch to the G1 GC as well? Or should I stay with the default for now, to try to catch the problem again?
21. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

ez1234 Aug 4, 2016 2:00 AM (in response to ez1234)

Shall I switch to the G1 GC as well? Or should I stay with the default for now, to try to catch the problem again?

Another question: are there configuration changes or tuning we can apply to the JGroups / Infinispan modules to reduce the risk of cluster split-brain and disconnections between nodes?
22. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

ez1234 Aug 4, 2016 4:01 AM (in response to ez1234)
I have now reproduced the issue with GC logs enabled.

The attached zip file contains the GC log for each server.

replication error 5.zip 99.3 KB
23. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

sathishkumarbt Aug 17, 2016 6:35 AM (in response to wdfink)

Hi,
I am facing a similar issue, with the following warnings:
Aug 17, 2016 3:15:05 PM org.jgroups.protocols.FlowControl start
WARNING: The fragmentation size of the fragmentation protocol is 60000, which is greater than min_credits (40000). This can lead to blockings (https://issues.jboss.org/browse/JGRP-1659)
Aug 17, 2016 3:15:05 PM org.jgroups.blocks.cs.TcpServer$Acceptor run
WARNING: JGRP000006: failed accepting connection from peer
java.net.SocketException: BaseServer.TcpConnection.readPeerAddress(): cookie read by 192.168.15.135:7860 does not match own cookie; terminating connection
at org.jgroups.blocks.cs.TcpConnection.readPeerAddress(TcpConnection.java:256)
at org.jgroups.blocks.cs.TcpConnection.<init>(TcpConnection.java:54)
at org.jgroups.blocks.cs.TcpServer$Acceptor.handleAccept(TcpServer.java:132)
at org.jgroups.blocks.cs.TcpServer$Acceptor.run(TcpServer.java:117)
at java.lang.Thread.run(Thread.java:745)
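The first warning compares two settings from the JGroups configuration in this post. For UFC/MFC, min_credits is max_credits × min_threshold = 200,000 × 0.20 = 40,000 bytes (the value the warning prints), while FRAG2 uses frag_size="60000"; the warning itself says a fragment size above min_credits can lead to blocking (JGRP-1659). A minimal sketch of one way to clear the mismatch, assuming the rest of the stack stays as posted, is to lower frag_size below 40,000; the value 30000 is an illustrative assumption, and raising max_credits instead would be the other direction.

```xml
<!-- Sketch (assumption): keep FRAG2's fragment size below the flow-control
     min_credits = max_credits * min_threshold = 200000 * 0.20 = 40000 bytes. -->
<UFC max_credits="200k" min_threshold="0.20"/>
<MFC max_credits="200k" min_threshold="0.20"/>
<!-- 30000 is an illustrative value below 40000, not a tuned recommendation. -->
<FRAG2 frag_size="30000"/>
```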

JGroups configuration:


<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-3.6.xsd">

    <TCP loopback="true"
         bind_addr="${jgroups.tcp.address:153.88.45.71}"
         bind_port="${jgroups.tcp.port:7860}"
         recv_buf_size="${tcp.recv_buf_size:20M}"
         send_buf_size="${tcp.send_buf_size:640K}"
         discard_incompatible_packets="true"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         enable_bundling="true"
         use_send_queues="true"
         sock_conn_timeout="300"
         timer_type="new"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"
         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="30"
         thread_pool.keep_alive_time="60000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="discard"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="2"
         oob_thread_pool.max_threads="30"
         oob_thread_pool.keep_alive_time="60000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>

    <TCPGOSSIP initial_hosts="${jgroups.tcpping.initial_hosts:153.88.45.71[7860],153.88.45.71[7861]}" />
    <MERGE2 max_interval="30000" min_interval="10000"/>
    <FD_SOCK/>
    <FD timeout="3000" max_tries="3"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK
        use_mcast_xmit="false"
        retransmit_timeout="300,600,1200,2400,4800"
        discard_delivered_msgs="true"/>
    <UNICAST2 timeout="300,600,1200"
              stable_interval="5000"
              max_bytes="1m"/>
    <pbcast.STABLE stability_delay="500" desired_avg_gossip="5000" max_bytes="1m"/>
    <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
    <UFC max_credits="200k" min_threshold="0.20"/>
    <MFC max_credits="200k" min_threshold="0.20"/>
    <FRAG2 frag_size="60000"/>
    <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />
</config>
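Discovery in this stack is done by TCPGOSSIP, which looks members up through one or more GossipRouter processes listed in initial_hosts. The property name jgroups.tcpping.initial_hosts and the listed ports (7860 and 7861, next to the TCP bind_port of 7860) suggest that static member discovery with TCPPING may have been intended; that reading is an assumption, and so is any connection to the cookie warning above. Under that assumption, a minimal sketch of the replacement discovery element is:

```xml
<!-- Assumption: the hosts below are cluster members, not GossipRouters,
     so TCPPING is used instead of TCPGOSSIP. initial_hosts takes host[port]
     entries; port_range="0" probes only the listed ports. -->
<TCPPING initial_hosts="${jgroups.tcpping.initial_hosts:153.88.45.71[7860],153.88.45.71[7861]}"
         port_range="0"/>
```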

Infinispan configuration:

<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:8.2 http://www.infinispan.org/schemas/infinispan-config-8.2.xsd"
            xmlns="urn:infinispan:config:8.2">

    <jgroups>
        <stack-file name="configurationFile" path="jgroups.xml"/>
    </jgroups>

    <cache-container>
        <transport stack="configurationFile"/>
        <replicated-cache name="transactional-type" mode="SYNC">
            <transaction mode="NON_XA" locking="OPTIMISTIC"
                         transaction-manager-lookup="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
                         auto-commit="true"/>
            <locking acquire-timeout="60000"/>
            <expiration lifespan="43200000"/>
        </replicated-cache>
    </cache-container>
</infinispan>
24. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async

wdfink Aug 17, 2016 7:45 AM (in response to sathishkumarbt)

Hello Sathish,
Please avoid double posting and continue in your own thread, "Replication is not happening with Infinispan 8.2.2".
