1 2 3 Previous Next 35 Replies Latest reply on Jul 25, 2017 6:35 AM by galder.zamarreno

    [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)

    vishalhighq

      Hi everyone,

       

      We have been facing infinispan server issue while we are testing our application with 100 concurrent active session each cluster node of our application.

       

      Infinispan server configuration.

       

      • Two Domain with Two node
      • Near Cache is configured,INVALIDATE type , with no max entry limit.
      • our Application is having two cluster node as well.
      • Infinispan version : 8.2.3

       

      Error we are finding in infinispan server:

       

      2017-06-08 18:49:18,464 DEBUG [org.infinispan.server.hotrod.HotRodExceptionHandler] (HotRodServerWorker-8-7) Exception caught: java.io.IOException: An existing connection was forcibly closed by the remote host

        at sun.nio.ch.SocketDispatcher.read0(Native Method)

        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)

        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)

        at sun.nio.ch.IOUtil.read(IOUtil.java:192)

        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)

        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)

        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:898)

        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)

        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)

        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)

        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:485)

        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:399)

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:371)

        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)

        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)

        at java.lang.Thread.run(Thread.java:745)

       

      After getting this error, we tried disabling near cache with infinispan server. That configuration get the things correct but It also result in very slow performance which is not acceptable as we are using infinispan server for performance purpose only.

       

                     

      Our Template using which we are creating cache container in Infinispan

                           <distributed-cache-configuration name="OurTempateName" owners="2" segments="20" mode="SYNC" remote-timeout="30000" start="EAGER">

                              <locking striping="false" acquire-timeout="30000" concurrency-level="1000"/>

                              <transaction mode="NONE"/>

                              <security/>

                          </distributed-cache-configuration>

       

      With Client server infinispan setup, Can anyone suggest configuration which can overcome this issue?

       

      nadirx Can you please help us here in this case?  Any modification in configuration to sort out this issue / to improve performance when we are having more that 100 active session over production ?

        • 1. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
          nadirx

          Do you have any client-side logs ?

          How are the client connection pools configured ?

          Can you share the output of

          netstat -ntlp | grep 11222
          

          for the server ?

          Do you use any client-side listeners aside from the near cache ?

          As for the configuration, you can simplify it greatly and increase the number of segments:

           

          <distributed-cache-configuration name="OurTempateName" owners="2" segments="256" mode="SYNC" remote-timeout="30000" start="EAGER">
              <security/>
          </distributed-cache-configuration>
          

           

          I see "security" in there. Are you using authorization ?

          I would also recommend upgrading to Infinispan 9.0 since we have done a lot of work on performance and scalability.

           

          Tristan

          • 2. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
            vishalhighq

            Hi nadirx

             

            We are still getting same issue with Infinispan 9.0.1 and with above mention configuration.

             

            Do you use any client-side listeners aside from the near cache ?

             

            No We do not have any other listeners other than near cache.

             

            Are you using authorization ?

             

            No  we are not using authorization.

             

            Do you have any client-side logs ?

             

            we are only getting single line NullPointerException. Like "java.lang.NullPointerException"

             

            How are the client connection pools configured ?

            Can you share the output of

             

             

            1. netstat -ntlp | grep 11222 

            for the server ?

             

            I will let you know once getting this information.

             

             

            Please suggest any further solution which can solve this huddle for us.

            • 3. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
              nadirx

              that NPE on the client could be interesting. Any way we can get a stacktrace from it ? Do you have logging turned on ?

              • 4. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                vishalhighq

                Hi nadirx

                 

                We have logging enabled with debug level. Still getting Single Line Log from Infinspan.  Due to this , we have enabled logging in infinispan and take log from infinispan iteself.

                 

                Infinispan log is attached within start of this question which you may want to refer.

                 

                Apart from this we are not getting any log from each side.

                • 5. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                  vishalhighq

                  Hi nadirx

                   

                  After converting all type of logger to debug in infinispan, following is the log i am able to get from our application but with Infinispan 8.2.3 version.

                  1251479 [ERROR] ClientListenerNotifier: ISPN004043: Unrecoverable error reading event from server /192.168.156.222:11222, exiting event reader thread

                  org.infinispan.client.hotrod.exceptions.TransportException: java.io.IOException: An existing connection was forcibly closed by the remote host

                          at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.readByte(TcpTransport.java:195) ~[infinispan-remote-8.2.3.Final.jar:8.2.3.Final]

                          at org.infinispan.client.hotrod.impl.protocol.Codec20.readMagic(Codec20.java:305) ~[infinispan-remote-8.2.3.Final.jar:8.2.3.Final]

                          at org.infinispan.client.hotrod.impl.protocol.Codec20.readEvent(Codec20.java:147) ~[infinispan-remote-8.2.3.Final.jar:8.2.3.Final]

                          at org.infinispan.client.hotrod.event.ClientListenerNotifier$EventDispatcher.run(ClientListenerNotifier.java:264) [infinispan-remote-8.2.3.Final.jar:8.2.3.Final]

                          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_51]

                          at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_51]

                          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]

                          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]

                          at org.infinispan.client.hotrod.event.ClientListenerNotifier.lambda$null$0(ClientListenerNotifier.java:79) [infinispan-remote-8.2.3.Final.jar:8.2.3.Final]

                          at org.infinispan.client.hotrod.event.ClientListenerNotifier$$Lambda$13/703586411.run(Unknown Source) [infinispan-remote-8.2.3.Final.jar:8.2.3.Final]

                          at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]

                  Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host

                          at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[?:1.8.0_51]

                          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[?:1.8.0_51]

                          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_51]

                          at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:1.8.0_51]

                          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_51]

                   

                  Also with Infinispan version 9.0.3, this weird log I am getting when Infinspan server stop responding.

                   

                   

                  ERROR [vishalr_dev][bt:2017_06_12_15_32_34][cb:4.1.4][LoadCacheHelperThread 11][2017-06-12 16:35:53,015]- com.os.util.helper.griffin.ThreadPoolHelper.CachingServerExceptionMailThread.run(CachingServerExceptionMailThread.java:110) - java.io.FileNotFoundException: D:\tools\tomcat8-9\temp\Infinispan server error ISPN004034: Unable to unmarshall bytes 030409000000176A6176612E7574696C2E4C696E6B6564486173684D617034C04E5C106CC0FB000000010000000B6163636573734F72646572200038000000116A6176612E7574696C2E486173684D61700507DAC1C31660D1000000020000000A6C6F6164466163746F722600000000097468726573686F6C642300163F4000000000000C320800000010000000094B00000001043BFE3F4000000000000C320800000010000000013E03706E67040900000025636F6D2E6F732E64626F2E6772696666696E2E4F6372537570706F7274446F63756D656E745443135EE7296BD300000000090000001F636F6D2E6F732E64626F2E6772696666696E2E446F63756D656E7454797065FDEB0FF5762C272700000013000000116170706C69616E636544657461696C494423000000000C636F6E76657274546F504446230000000016636F6E76657274546F5044464265666F726553656E6423000000000B646973706C61794E616D65160000000013646F63446973706C6179457874656E73696F6E16000000000C646F63457874656E73696F6E160000000012646F63547970654465736372697074696F6E160000000009646F6354797065494423000000000B646F635479706549636F6E16000000000E656E6372797074696F6E466C6167230000000009696D6167655479706523000000000C6973437573746F6D49636F6E2300000000106C61726765446F635479706549636F6E16000000000C6D6F6469666965644461746516000000000A6F6372537570706F72742300000000166F6E6C696E6545646974696E6756696157656244415623000000000B7376675479706549636F6E20000000000F74726565446F635479706549636F6E16000000000D766965776572537570706F72742300160000000000000000000000000139FE39FE3E18506F727461626C65204E6574776F726B20477261706869630000007B3E07706E672E706E670000000000000000000000000101000000000000000000010000000035004B00000002043BFC3F4000000000000C320800000010000000013E03706E67043BFE0000000000000000000000000139FE39FE3E18506F727461626C65204E6574776F726B20477261706869630000007B3E07706E672E706E670000000000000000000000000101000000000000000000010000000035004B00000043043BFC3F4000000000000C320800000010000000013E03706E67043BFE0000000000000000000000000139FE39FE3E18506F727461626C65204E6574776F726B20477261706869630000007B3E07706E672E706E670000000000000000000000000101000000000000000000010000000035004B00000042043BFC3F4000000000000C320800000010000000013E03706E67043BFE0000000000000000000000000139FE39FE3E18506F727461626C65204E6574776F726B20477261706869630000007B3E07706E672E706E670000000000000000000000000101000000000000000000010000000035004B00000047043BFC3F4000000000000C320800000010000000013E03706E67043BFE0000000000000000000000000139FE39FE3E18506F727461626C65204E6574776F726B20477261706869630000007B3E07706E672E706E670000000000000000000000000101000000000000000000010000000035004B00000045043BFC3F4000000000000C320800000010000000013E03706E67043BFE0000000000000000000000000139FE39FE3E18506F727461626C65204E6574776F726B20477261706869630000007B3E07706E672E706E670000000000000000000000000101000000000000000000010000000035004B00000059043BFC3F4000000000000C320800000010000000013E03706E67043BFE0000000000000000000000000139FE39FE3E18506F727461626C65204E6574776F726B20477261706869630000007B3E07706E672E706E670000000000000000000000000101000000000000000000010000000035004B0000004D043BFC3F4000000000000C320800000010000000013E03706E67043BFE0000000000000000000000000139FE39FE3E18506F727461626C65204E6574776F726B20477261706869630000007B3E07706E672E706E670000000000000000000000000101000000000000000000010000000035003500 exception.txt (The filename, directory name, or volume label syntax is incorrect)

                   

                   

                  Both log is fetched from our application log. At the same time,  There was not any log in infinispan server.

                  • 6. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                    galder.zamarreno

                    Are there any log files attached? I can't see any attachments.

                     

                    Anyway, independent of that, seems to me the server stops responding. Many reasons why that could happen, but the usual things to check are:

                     

                    * Are the server(s) running out of memory? Do you see any OutOfMemoryError messages? Even if these don't appear, do you have any GC logs? The server might be too busy doing GC and unable to reply. Apart from getting any GC logs, I'd also suggest getting a heap dump when server stops responding.

                     

                    * Another possibility for the server to stop responding is for some deadlock to happen. You might be able to see that if you get a thread dump of the server(s) when they stop responding.

                     

                    I can't say much else with the info provided so far. Also, use gist or pastebin for any longer log outputs or thread dumps.

                    • 7. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                      vishalhighq

                      Hi galder.zamarreno

                       

                      Our setup as of now to get requested file.

                       

                      Our Appliction is with single cluster node

                      Infinispan (.0.1)

                      One Domain

                      One Host

                       

                      Our configuration for cache is same as nadirx suggested.

                      within 50 constant user , it stop responding using load testing.

                       

                      This is link where you can find this files as zip : https://files.fm/u/vmggsjjk https://files.fm/u/vmggsjjk

                      • 8. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                        galder.zamarreno

                        Thanks for the files. At first glance I don't see anything meaningful, but I've noticed something that I had missed before:

                         

                        The exceptions are appearing in the server, not in the client. I originally thought the exceptions were reported by the client.

                         

                        Anyway, the server's heap dumps look fine, so I'd suggest you try to get a thread dump when the system stops in the server.

                         

                        Also, since the exceptions in the server are saying that the client stopped the connection, is that because the client application was halted somehow? You should also check GC logs, heap dump and thread dumps for the client. I don't think these are in the zip you attached.

                         

                        Finally, if none of this shows anything, I'd suggest enabling TRACE on org.infinispan package in both server and client and attach those.

                        • 9. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                          vishalhighq

                          galder.zamarreno

                           

                          I have fetched requested information in following zip file: https://files.fm/u/f4h9uh28

                          This zip contains:

                          1. GC log of client application
                          2. Heap Dump of Client application
                          3. Thread Dump of Client application
                          4. server log of client application
                          5. server log of infinispan domain
                          6. server log of infinispan node

                           

                          Here is code using which we are connecting to infinispan

                           

                          try{

                               if (manager != null && manager.isStarted())

                                 {

                                      manager.stop();

                                 }

                           

                                 ConfigurationBuilder cb = new ConfigurationBuilder();

                                 logger.info("connectWithInfinispanServer : started");

                                 ExhaustedAction exhaustedAction = ExhaustedAction.WAIT;

                           

                                 cb.tcpNoDelay(true).connectionPool()

                                 .maxActive(-1)

                                 .maxIdle(-1)

                                 .maxWait(-1)

                                 .minIdle(1)

                                 .numTestsPerEvictionRun(3)

                                 .exhaustedAction(exhaustedAction)

                                 .timeBetweenEvictionRuns(120000)

                                 .minEvictableIdleTime(1800000)

                                 .testWhileIdle(true)

                                 .socketTimeout(1000)

                                 .connectionTimeout(500).tcpKeepAlive(true)

                                 .addServer().host(serverUrl).maxRetries(Constants.INFINISPAN_SERVER_MAX_RETRIES)

                                 .nearCache().mode(NearCacheMode.INVALIDATED).maxEntries(-1);

                           

                                 builder = cb;

                                 manager = new RemoteCacheManager(builder.build());

                                 if (!manager.isStarted())

                                 {

                                      manager.start();

                                 }

                           

                                 logger.info("connectWithInfinispanServer builder and manager created, now next step to get Cache with Name : " + cacheName);

                                 cache = manager.getCache(cacheName);

                                 logger.info("connectWithInfinispanServer Successfully get Cache with Name : " + cacheName);

                            }

                            catch (Exception e)

                            {

                                 setException(e);

                                 logger.error(e);

                            }

                           

                           

                          Also, From our point of analysis, It seems that we need proper setup for communication between Infinspan and Tomcat server which can handle connection failour and discover scenario as well. Do you have any idea about it ?

                           

                          Can you also please let us know, How our hotroad client will go to another domain of infinispan when It is unable to connect to first infinispan domain, as we are giving both domain URL in one string comma separated as serverUrl (ie. "jboss1domain.dev.com ,jboss2domain.dev.com").

                           

                          One more error found continuously: https://files.fm/u/4gdeqkyz .

                          • 10. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                            galder.zamarreno

                            Thx for all the info. Much appreciate the effort you've put into compiling this information.

                             

                            There's a few issues here:

                             

                            First, I think you have a very low socket timeout configuration in the clients, 1000ms which is 1s. That would explain why the server would say that the client has closed the connection. If you're running load tests, the system might be a bit overloaded to handle the concurrency required and if the client does not get a response in 1s, it will close the connection. Also, you need to take into account that you're configuring a cluster, so data will be replicated and the size of the values passed around, so there's a time penalty there.

                             

                            I also see that the lock timeout is 10s, so your socket timeout should be at least bigger than that. So, on top of the lock timeout, you should add replication timeout that you've set. The default is 60s which should work fine, but you might want to even increase that if running load tests.

                             

                            The second issue I see is that there are a lot of lock timeouts around a particular key, 0x034B0000002B. Does you system update a given a key a lot? The lock seems to be held by "CommandInvocation:C2LOCAL:server-one:599" but there's not enough information in the logs to find out what's up with this. We would need TRACE logging to find out more. The logs only contain DEBUG logging.

                             

                            I also see that the value sizes seem quite big, could you give us an estimate of the size of the values?

                            • 11. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                              vishalhighq

                              Hi galder.zamarreno

                               

                              Thanks for you precious review about our configuration.

                               

                              We would need TRACE logging to find out more. The logs only contain DEBUG logging.

                              I tried with TRACE level enable, but it put lots of logger and server become very slow . Even login screen comes after 1 or 2 minuts. So I am not able to do load testing with Trace enable .

                               

                               

                              I also see that the value sizes seem quite big, could you give us an estimate of the size of the values?

                               

                              It can not be calculated as it depends on different data of our client is having for different object.

                               

                               

                              Does you system update a given a key a lot?

                               

                              For Load testing we are only adding & getting data from cache. No Update or Delete process is being done on Cache entries. So then Why it is holding lock for any entry?  Please refer this image:https://files.fm/u/tfmd2pff

                               

                               

                              We have modified our configuration for socket timeout. However we have disabled cluster and running infinispan with One domain and its node only.Still we are facing this issue.

                               

                              Can we have a discussion for our configuration of infnispan regarding this issue on skype or hangout if it is fine with you and you have time?

                              • 12. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                                vishalhighq

                                Hi nadirx and galder.zamarreno

                                 

                                Can anyone please suggest further solution ?

                                 

                                We are currently stuck with this in our production environment.  Any mediator which can handle communication between Infinispan and our application tomcat which can be solution of this issue?

                                 

                                Please help us out.

                                • 13. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                                  vishalhighq

                                  nadirx and galder.zamarreno

                                   

                                  We have dig further for exception which we were getting during server stops responding. We tried to discover for which type of data we are getting exception from infinispan.

                                   

                                  org.infinispan.client.hotrod.exceptions.HotRodClientException:: ISPN004034: Unable to unmarshall bytes 030409000000176A6176612E7574696C2E4C696E6B6564486173684D617034C04E5C106CC0FB000000010000000B6163636573734F72646572200038000000116A6176612E7574696C2E486173684D61700507DAC1C31660D1000000020000000A6C6F6164466163746F722600000000097468726573686F6C642300163F40000000000030320800000040000000234B00000066043BFF3F4000000000000C320800000010000000014B00000066040900000022636F6D2E6F732E64626F2E6772696666696E2E536974655061727469636970616E74E937D7AE55744AA50000000D00000017636F6E74656E7441646D696E416363657373417272617916000000000E637265617465536974655573657220000000000C6372656174656442794D6170160000000011696E7665746174696F6E53656E744D617016

                                  We found that above unmarshall exception is coming for LinkedHashMap Object.  However we are not getting same issue for HashMap. For few object we have converted them to HashMap and Error was not coming for this object anymore. But still sometime we are getting this error in remaining LinkedHashMap. So Need your input for this issue.

                                   

                                  One more thing which we notice is that, After ignoring those unmarshall exception ( for which if comes we are getting data from db ), We are now getting two types of issue.

                                      1. NullPointerException from Infinispan.

                                       2. org.infinispan.client.hotrod.exceptions.TransportException:: Could not fetch transport

                                   

                                  We are getting few NullPointerException , prior to org.infinispan.client.hotrod.exceptions.TransportException in our log. Does this help in getting some idea for this issue?

                                   

                                   

                                  After searching through internet for this issue, we found one discussion thread Show stopper: Infinispan hot rod server gets stuck / dead lock in high load with registered client listener in hot rod client - infinispan-server-8.2.1.Final   in which Some thread issue creating hot rod client to get stuck.

                                   

                                  So that we have closely analysis of all running thread during jemeter load testing. We found that when we get above exception at that time,  Thread name like this > RMI TCP Connection(23)-192.168.156.135  is stuck with park status for 1 to 3 second in our application as well as in one of three java application of infinispan.

                                   

                                  FYI:

                                   

                                  "RMI TCP Connection(idle)" #436 daemon prio=5 os_prio=0 tid=0x000000001e65b800 nid=0xa2c waiting on condition [0x00000000366ae000]

                                     java.lang.Thread.State: TIMED_WAITING (parking)

                                  at sun.misc.Unsafe.park(Native Method)

                                  - parking to wait for  <0x00000000e1d89868> (a java.util.concurrent.SynchronousQueue$TransferStack)

                                  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

                                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)

                                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)

                                  at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)

                                  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)

                                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)

                                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                                  at java.lang.Thread.run(Thread.java:745)

                                   

                                   

                                     Locked ownable synchronizers:

                                  - None

                                   

                                   

                                  Looking forward for your reply for above issues.

                                   

                                  PS: This whole debugging is done with Infinispan 9.0.1 and with one domain and one node configuration. Also We have recovered lot after applying your suggested configuration about socket time out , but still at some point we are getting this issue in load testing.

                                  • 14. Re: [Blocker] Inifnispan server stop responding (near cache configured , clustered environment, client server configuration)
                                    galder.zamarreno

                                    Vishal Patel wrote:

                                     

                                    Does you system update a given a key a lot?

                                     

                                    For Load testing we are only adding & getting data from cache. No Update or Delete process is being done on Cache entries. So then Why it is holding lock for any entry? Please refer this image:https://files.fm/u/tfmd2pff

                                    Hmmm, if you're never updating a key, the problem might be due to updating two keys concurrently that are kept under the same segment underneath. Just in case that's what's causing issues, you could also try to increase locking XML element, concurrency-level attribute. By default this is 32, you could try putting 256 as value. Try also with 1024 as value and see whether it improves things.

                                    1 2 3 Previous Next