24 Replies Latest reply on Aug 17, 2016 7:45 AM by wdfink
      • 15. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
        ez1234

        Adding some more logs, this time at DEBUG level.

         

        See attached.

         

        In this scenario, a massive number of entries were not replicated between cluster nodes one and two.

        There were many errors in the logs. Please take a look and see whether this helps move the investigation forward.

         

        Thanks,

        Eli

        • 16. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
          nadirx

          It looks like the cluster is splitting. Are you sure you're not hitting GC limits? Do you have GC logs?

          • 17. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
            ez1234

            Hi,

             

            I do not know whether I am hitting GC limits. I do not have GC logs.

            I do know that we start the server with 10 GB of heap memory, on one machine with 24 GB of RAM and another with 50 GB of RAM.

            How do I enable GC logs?

             

            We also encounter this scenario even when we are not writing many entries to the cache. The cache is not under load at all.

            Attached is another set of logs. I can see in them that, as you said, the cache cluster is splitting and nodes are disconnecting. Why is the cluster splitting, and why are the nodes disconnecting from each other?

            Is this related to the cluster configuration? A networking issue? A bug?

             

            Thanks,

            Eli

            • 18. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
              ez1234

              Hi,

               

              Any updates on this issue?

               

              Thanks,

              Eli

              • 19. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
                pruivo

                Hi Eli,

                 

                To enable the GC logs, set these flags on the JVM: -XX:+PrintGCDetails -Xloggc:<file/to/store/gc-logs>


                There are several reasons why the cluster can split:

                * GC is one of them: if it stops the application for too long, the heartbeats between nodes are not sent and the nodes are considered down. The GC logs will tell us more.

                * Remote thread pool exhausted. This can happen under heavy load, when there may be no free threads left to send the heartbeat. A stack trace would help to find out.

                * Network issues. It could be a firewall blocking a connection, a dropped physical connection, or a hardware failure.

                * A bug? Yes, of course. This isn't a perfect world.

                 

                Cheers,

                Pedro
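
                As a minimal sketch of the failure-detection side of the GC point above (an illustration with assumed values, not a setting taken from this thread): in a plain JGroups stack file, the window after which a silent node is suspected is set on the failure-detection protocol. Assuming a JGroups 3.6-style stack that uses FD_ALL, it might look like the fragment below. The Infinispan server declares its protocols in its own subsystem syntax, so the attributes would have to be carried over there.

                ```xml
                <!-- Illustrative values only (assumptions, not from the thread). -->
                <!-- FD_ALL: each node sends a heartbeat to the cluster every `interval` ms
                     and suspects a member after `timeout` ms without hearing from it. -->
                <FD_ALL timeout="60000" interval="15000"/>
                <!-- VERIFY_SUSPECT double-checks a suspected member before it is excluded. -->
                <VERIFY_SUSPECT timeout="5000"/>
                ```

                The trade-off is that a longer timeout tolerates longer GC pauses but also delays the detection of a node that has really crashed.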

                • 20. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
                  ez1234

                  Shall I switch to using G1 GC as well? Or, for now, just stay with the default to try to catch the problem again?

                  • 21. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
                    ez1234

                    Shall I switch to using G1 GC as well? Or, for now, just stay with the default to try to catch the problem again?


                    Another question: are there any configuration changes or tuning we can apply to the JGroups / Infinispan modules to reduce the risk of cluster split brain and disconnections between nodes?

                    • 22. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
                      ez1234

                      I have now reproduced the issue with GC logs.


                      See the attached zip file, which contains the GC log for each server.

                      • 23. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
                        sathishkumarbt

                        Hi,

                        I am facing a similar issue, with the following warnings:

                        Aug 17, 2016 3:15:05 PM org.jgroups.protocols.FlowControl start
                        WARNING: The fragmentation size of the fragmentation protocol is 60000, which is greater than min_credits (40000). This can lead to blockings (https://issues.jboss.org/browse/JGRP-1659)
                        Aug 17, 2016 3:15:05 PM org.jgroups.blocks.cs.TcpServer$Acceptor run
                        WARNING: JGRP000006: failed accepting connection from peer
                        java.net.SocketException: BaseServer.TcpConnection.readPeerAddress(): cookie read by 192.168.15.135:7860 does not match own cookie; terminating connection
                          at org.jgroups.blocks.cs.TcpConnection.readPeerAddress(TcpConnection.java:256)
                          at org.jgroups.blocks.cs.TcpConnection.<init>(TcpConnection.java:54)
                          at org.jgroups.blocks.cs.TcpServer$Acceptor.handleAccept(TcpServer.java:132)
                          at org.jgroups.blocks.cs.TcpServer$Acceptor.run(TcpServer.java:117)
                          at java.lang.Thread.run(Thread.java:745)

                         

                         

                        JGroups

                         

                        <!--
                            TCP based stack, with flow control and message bundling. This is usually used when IP
                            multicasting cannot be used in a network, e.g. because it is disabled (routers discard multicast).
                            Note that TCP.bind_addr and TCPPING.initial_hosts should be set, possibly via system properties, e.g.
                            -Djgroups.bind_addr=192.168.5.2 and -Djgroups.tcpping.initial_hosts=192.168.5.2[7800].
                            author: Bela Ban
                        -->
                        <config xmlns="urn:org:jgroups"
                                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                                xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-3.6.xsd">

                            <TCP loopback="true"
                                 bind_addr="${jgroups.tcp.address:153.88.45.71}"
                                 bind_port="${jgroups.tcp.port:7860}"
                                 recv_buf_size="${tcp.recv_buf_size:20M}"
                                 send_buf_size="${tcp.send_buf_size:640K}"
                                 discard_incompatible_packets="true"
                                 max_bundle_size="64K"
                                 max_bundle_timeout="30"
                                 enable_bundling="true"
                                 use_send_queues="true"
                                 sock_conn_timeout="300"
                                 timer_type="new"
                                 timer.min_threads="4"
                                 timer.max_threads="10"
                                 timer.keep_alive_time="3000"
                                 timer.queue_max_size="500"
                                 thread_pool.enabled="true"
                                 thread_pool.min_threads="2"
                                 thread_pool.max_threads="30"
                                 thread_pool.keep_alive_time="60000"
                                 thread_pool.queue_enabled="false"
                                 thread_pool.queue_max_size="100"
                                 thread_pool.rejection_policy="discard"
                                 oob_thread_pool.enabled="true"
                                 oob_thread_pool.min_threads="2"
                                 oob_thread_pool.max_threads="30"
                                 oob_thread_pool.keep_alive_time="60000"
                                 oob_thread_pool.queue_enabled="false"
                                 oob_thread_pool.queue_max_size="100"
                                 oob_thread_pool.rejection_policy="discard"/>

                            <!-- <TCP_NIO -->
                            <!--         bind_port="7800" -->
                            <!--         bind_interface="${jgroups.tcp_nio.bind_interface:bond0}" -->
                            <!--         use_send_queues="true" -->
                            <!--         sock_conn_timeout="300" -->
                            <!--         reader_threads="3" -->
                            <!--         writer_threads="3" -->
                            <!--         processor_threads="0" -->
                            <!--         processor_minThreads="0" -->
                            <!--         processor_maxThreads="0" -->
                            <!--         processor_queueSize="100" -->
                            <!--         processor_keepAliveTime="9223372036854775807"/> -->
                            <!--   <TCPGOSSIP initial_hosts="192.168.15.135[7860]"/> -->
                            <!-- <TCPPING initial_hosts="${jgroups.tcpping.initial_hosts}" -->
                            <!--          port_range="0" -->
                            <!--          timeout="3000" -->
                            <!--          /> -->
                            <TCPGOSSIP initial_hosts="${jgroups.tcpping.initial_hosts:153.88.45.71[7860],153.88.45.71[7861]}" />
                            <MERGE2 max_interval="30000" min_interval="10000"/>
                            <FD_SOCK/>
                            <FD timeout="3000" max_tries="3"/>
                            <VERIFY_SUSPECT timeout="1500"/>
                            <pbcast.NAKACK
                                use_mcast_xmit="false"
                                retransmit_timeout="300,600,1200,2400,4800"
                                discard_delivered_msgs="true"/>
                            <UNICAST2 timeout="300,600,1200"
                                      stable_interval="5000"
                                      max_bytes="1m"/>
                            <pbcast.STABLE stability_delay="500" desired_avg_gossip="5000" max_bytes="1m"/>
                            <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
                            <UFC max_credits="200k" min_threshold="0.20"/>
                            <MFC max_credits="200k" min_threshold="0.20"/>
                            <FRAG2 frag_size="60000"/>
                            <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />
                        </config>
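
                        For reference, the FlowControl warning quoted earlier in this post follows from the posted values, assuming JGroups derives min_credits as max_credits × min_threshold and reads the `k` suffix as 1,000: 200,000 × 0.20 = 40,000, which is smaller than the FRAG2 frag_size of 60,000. A minimal sketch of one way to clear the warning is to raise the credit settings so that min_credits exceeds frag_size. The values below are illustrative assumptions, not a tested recommendation for this cluster:

                        ```xml
                        <!-- Illustrative values (assumptions, not from the thread):
                             min_credits = 2,000,000 * 0.4 = 800,000, which exceeds frag_size. -->
                        <UFC max_credits="2M" min_threshold="0.4"/>
                        <MFC max_credits="2M" min_threshold="0.4"/>
                        <FRAG2 frag_size="60000"/>
                        ```

                        Alternatively, lowering frag_size below 40000 would satisfy the same condition with the original credit values.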

                         

                        Infinispan

                         

                        <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                                    xsi:schemaLocation="urn:infinispan:config:8.2 http://www.infinispan.org/schemas/infinispan-config-8.2.xsd"
                                    xmlns="urn:infinispan:config:8.2">

                            <jgroups>
                                <stack-file name="configurationFile" path="jgroups.xml"/>
                            </jgroups>
                            <cache-container>
                                <!-- <transport cluster="x-cluster" stack="configurationFile" /> -->
                                <transport stack="configurationFile" />
                                <replicated-cache name="transactional-type" mode="SYNC">
                                    <transaction mode="NON_XA" locking="OPTIMISTIC" transaction-manager-lookup="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup" auto-commit="true" />
                                    <locking acquire-timeout="60000"/>
                                    <expiration lifespan="43200000"/>
                                </replicated-cache>
                            </cache-container>
                        </infinispan>

                        • 24. Re: Infinispan server: Cache entries are not replicated although cache is in mode replicated async
                          wdfink

                          Hello Sathish,

                          Please avoid double posting and continue in your own thread, "Replication is not happening with Infinispan 8.2.2".
