17 Replies Latest reply on Nov 7, 2009 6:08 AM by Radim Chlad

    Messaging blocked by long time-out

    Ralf Torsten Menche Newbie

      Hi,

      I posted this question originally in the Messaging forum (http://www.jboss.org/index.html?module=bb&op=viewtopic&t=152037), but was told that it seems to be more of a Remoting issue.

      I'm repeating the introductory description here, but would ask you to follow the link above for the full stack trace.


      We are using JBAS 4.2.2.GA with JBM 1.4.2.GA-SP1 and jboss-remoting-2.2.2.SP11-brew.jar.

      The server (running on Linux) is publishing on several topics to some Windows clients. If the network connection to one of the clients fails, the updates to ALL clients stop until a time-out occurs after about 15 to 18 minutes!

      Below is the stack trace of the server-side exception caused by the time-out. It should help to identify the offending operation that takes so long to time out.

      Can I do anything to shorten this time-out substantially to, say, 10 secs?


      I suppose that the JIRA mentioned by Tim in his reply might be one of JBREM-1069, JBREM-1082 or JBMESSAGING-1482 that all seem to be concerned with passing parameters from the Messaging configuration to Remoting. But I'm rather confused whether my problem is really related to one of these and whether it should have been expected to be fixed by either Remoting 2.2.2.SP11-brew or Messaging 1.4.2.GA-SP1.

      Any help welcome.



        • 1. Re: Messaging blocked by long time-out
          Tarek Hammoud Novice

          We are experiencing the exact same problem on Linux. This is very easy to reproduce. Have two clients running against a server and pull the network cable from one of the clients simulating a kernel death. A few seconds later, the server puts out:

          13:07:15,064 WARN [SimpleConnectionManager] A problem has been detected with the connection to remote client 5c4o12v-svccoz-fs969gkr-1-fs96isi3-1z, jmsClientID=b1-uvp969sf-1-rkg969sf-zoccvs-v21o4c5. It is possible the client has exited without closing its connection(s) or the network has failed. All associated connection resources will be cleaned up.


          If you dump the stack trace you will see:



          "Timer-87" Id=954 RUNNABLE (in native)
          at java.net.SocketOutputStream.socketWrite0(Native Method)
          at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
          at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
          at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
          at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
          - locked java.io.BufferedOutputStream@4773d99f
          at java.io.DataOutputStream.flush(DataOutputStream.java:106)
          at org.jboss.jms.wireformat.ClientDelivery.write(ClientDelivery.java:93)
          at org.jboss.jms.wireformat.JMSWireFormat.write(JMSWireFormat.java:237)
          at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.versionedWrite(MicroSocketClientInvoker.java:971)
          at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:606)
          at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.transport(BisocketClientInvoker.java:422)
          at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:133)
          at org.jboss.remoting.Client.invoke(Client.java:1645)
          at org.jboss.remoting.Client.invoke(Client.java:559)
          at org.jboss.remoting.Client.invokeOneway(Client.java:609)
          at org.jboss.remoting.callback.ServerInvokerCallbackHandler.handleCallback(ServerInvokerCallbackHandler.java:826)
          at org.jboss.remoting.callback.ServerInvokerCallbackHandler.handleCallbackOneway(ServerInvokerCallbackHandler.java:697)
          at org.jboss.jms.server.endpoint.ServerSessionEndpoint.performDelivery(ServerSessionEndpoint.java:1452)
          at org.jboss.jms.server.endpoint.ServerSessionEndpoint.handleDelivery(ServerSessionEndpoint.java:1364)
          - locked org.jboss.jms.server.endpoint.ServerSessionEndpoint@6d47a5f
          at org.jboss.jms.server.endpoint.ServerConsumerEndpoint.handle(ServerConsumerEndpoint.ja


          After the timeout, the server dumps:


          13:22:51,975 ERROR [SocketClientInvoker] Got marshalling exception, exiting
          java.io.IOException: No route to host
          at java.net.SocketOutputStream.socketWrite0(Native Method)
          at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
          at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
          at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
          at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
          at java.io.DataOutputStream.flush(DataOutputStream.java:106)
          at org.jboss.jms.wireformat.ClientDelivery.write(ClientDelivery.java:93)
          at org.jboss.jms.wireformat.JMSWireFormat.write(JMSWireFormat.java:237)
          at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.versionedWrite(MicroSocketClientInvoker.java:971)
          at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:606)
          at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.transport(BisocketClientInvoker.java:422)
          at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:133)
          at org.jboss.remoting.Client.invoke(Client.java:1645)
          at org.jboss.remoting.Client.invoke(Client.java:559)


          and other clients start getting ticks.

          Delivery does not resume until the blocked socket write times out. I believe that is a tcp keep alive issue, and on Linux the default is measured in minutes.

          Is there any configuration that we can use? We do have:

          <attribute name="callbackTimeout">10000</attribute> set in our remoting-bisocket-service.xml file.


          We are dead in the water in our migration effort from ActiveMQ to JBoss Messaging because of this. Thank you for any help.



          • 2. Re: Messaging blocked by long time-out
            Ron Sigal Master

            Hi guys,

            First, a little background. JBossMessaging sends messages to a consumer by calling org.jboss.remoting.callback.ServerInvokerCallbackHandler.handleCallback(), which, in the case of the bisocket transport, results in a call to org.jboss.remoting.Client.invoke(), which eventually leads to a write on a java.net.Socket. Now, it's true that there is a parameter, "callbackTimeout", that can be used to configure the callback Client and, ultimately, the Sockets used by the Client. But it's important to note that the Socket timeout value, set by Socket.setSoTimeout(), affects only blocking reads on the Socket's SocketInputStream. It doesn't affect blocking writes. See, for example,

            http://java.sun.com/j2se/1.4.2/docs/api/java/net/Socket.html#setSoTimeout(int).

            See also

            http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4031100,

            where someone asked (in 1997!) for a write timeout for Sockets. The request was rejected since they anticipated NIO would eliminate the problem.
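            To illustrate the gap described above, here is a minimal sketch of one common workaround: a watchdog thread that closes the socket if a write has not completed in time, which makes the blocked write fail with an IOException. The class WriteTimeoutSketch and the method writeWithTimeout are hypothetical names for illustration only; this is not code from JBoss Remoting and not necessarily how any Remoting release implements a write timeout.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of JBoss Remoting.
final class WriteTimeoutSketch {

    // A single daemon thread acts as the watchdog for all guarded writes.
    private static final ScheduledExecutorService WATCHDOG =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "write-watchdog");
                t.setDaemon(true);
                return t;
            });

    // Writes data to the socket. If the write has not finished after
    // timeoutMillis, the watchdog closes the socket, so the blocked
    // write fails with an IOException instead of hanging.
    static void writeWithTimeout(Socket socket, byte[] data, long timeoutMillis)
            throws IOException {
        ScheduledFuture<?> killer = WATCHDOG.schedule(() -> {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do here; the writer sees its own exception.
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(data);
            out.flush();
        } finally {
            // The write finished (or failed) on its own; disarm the watchdog.
            killer.cancel(false);
        }
    }
}
```

            Note that the socket is unusable after a timeout, and there is a small race in which the watchdog fires just as the write completes. A caller would treat the IOException as a broken connection and clean up that client.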

            "thammoud" wrote:

            I believe that is a tcp keep alive issue


            Yes, I think you can use TCP to discover that the connection has failed. Indeed, it's probably a keepalive failure that's causing the exceptions you're both seeing. Fortunately, it is possible to configure the keepalive parameters so that the failure is detected more quickly. Unfortunately, as far as I know, you have to set the parameters at the TCP level rather than the Java level. For Linux configuration, see, for example,

            http://www.linux.org/docs/ldp/howto/TCP-Keepalive-HOWTO/usingkeepalive.html

            section 3.1, in particular, and for Windows, see, for example,

            http://msdn.microsoft.com/en-us/library/ms819735.aspx
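            As a rough worked example of what tuning those parameters buys you, assuming keepalive really is what detects the failure: on an idle connection with keepalive enabled, the worst-case detection time is the idle time before the first probe plus one probe interval per unanswered probe. The sketch below uses the usual Linux defaults (tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9) and one made-up tuned setting that lands near the 10 seconds asked for earlier. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration: estimates how long TCP keepalive
// needs to declare an idle peer dead.
final class KeepaliveDetectionTime {

    // idleTime      ~ tcp_keepalive_time   (seconds of idleness before the first probe)
    // probeInterval ~ tcp_keepalive_intvl  (seconds between unanswered probes)
    // probes        ~ tcp_keepalive_probes (unanswered probes before giving up)
    static long detectionSeconds(long idleTime, long probeInterval, int probes) {
        return idleTime + probeInterval * probes;
    }

    public static void main(String[] args) {
        // Usual Linux defaults: 7200 + 75 * 9
        System.out.println(detectionSeconds(7200, 75, 9)); // prints 7875
        // Made-up tuned values: 5 + 1 * 5
        System.out.println(detectionSeconds(5, 1, 5));     // prints 10
    }
}
```

            With the defaults that is a little over two hours, so the parameters have to be lowered considerably before keepalive helps here. These are system-wide settings and affect every socket on the machine that has keepalive enabled.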

            On the other hand, there is an org.jboss.remoting.Lease on the server side which, if it doesn't receive pings from the client in a timely fashion, will declare that a connection is broken and inform any registered listeners. Now JBossMessaging registers a listener, so it should be informed about the broken connection, and it should attempt to shut it down. Remoting won't try to kill a Socket in the middle of a read() or write(), but I'm a little surprised that JBossMessaging doesn't clean things up enough to prevent all the other clients from receiving messages.

            Are you seeing the output from

             log.trace("Notified connection listener of lease expired due to lost connection from client (client session id = " + clientHolder.getSessionId());
            


            on your server logs?

            -Ron

            • 3. Re: Messaging blocked by long time-out
              Jun Liao Newbie

              Hi all
              I am facing the same problem. Can Remoting provide a way to resolve it, like the fix in AMQ?
              http://issues.apache.org/activemq/browse/AMQ-1993

              • 4. Re: Messaging blocked by long time-out
                Mark Swanson Newbie

                Java does allow setting the keep-alive option programmatically on a socket. It appears from the Javadocs, however, that this is only an on/off switch (either the default of about 2 hours or nothing), with no way to set the interval.
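                A minimal sketch of that switch, using only java.net.Socket.setKeepAlive() and getKeepAlive(). The class name KeepAliveFlagSketch is made up for illustration, and the loopback connection exists only so that the example is self-contained.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical example class; shows that SO_KEEPALIVE is a plain boolean in Java.
final class KeepAliveFlagSketch {

    // Opens a loopback connection, turns SO_KEEPALIVE on, and reports the flag.
    static boolean enableKeepAlive() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // The only control offered: on or off. The probe timing itself
            // comes from the operating system's TCP settings.
            socket.setKeepAlive(true);
            return socket.getKeepAlive();
        }
    }
}
```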

                • 5. Re: Messaging blocked by long time-out
                  Tarek Hammoud Novice

                  ActiveMQ's solution was above and beyond the TTL. This is a serious issue that needs to be fixed by the JBoss folks ASAP. No respectable messaging middleware should ever be held hostage by a rogue client.

                  Unfortunately, we had to abandon our migration to JBM from ActiveMQ because of this issue (and a couple of other anomalies that have to do with reconnects). While we love the clustering capabilities of the system, the reliability leaves much to be desired under stressful conditions. For now, our MDBs no longer use jbossmq and now use JBM, but our client interaction with our servers will stay with ActiveMQ until these issues are resolved.

                  • 6. Re: Messaging blocked by long time-out
                    Jun Liao Newbie

                    Yes, this is a serious problem.
                    JBM is not reliable under stressful conditions. We have to consider choosing another MOM product.

                    • 7. Re: Messaging blocked by long time-out
                      Tim Fox Master

                      +1

                      I believe this is a serious issue and remoting should apply a fix as per the AMQ fix.

                      Ron, do you have a JIRA and ETA for this?

                      • 8. Re: Messaging blocked by long time-out
                        Ron Sigal Master

                         

                        "timfox" wrote:

                        Ron, do you have a JIRA and ETA for this?


                        JBREM-1120 "Add a socket write timeout facility".

                        I was hoping to get something into Remoting 2.5.1, which I just released, but time didn't allow. I've scheduled JBREM-1120 for releases 2.2.2.SP12 and 2.5.2.

                        -Ron

                        • 9. Re: Messaging blocked by long time-out
                          w y Newbie

                          does this bug affect JBM queue too? or topics only?

                          • 10. Re: Messaging blocked by long time-out
                            Wayland Chan Newbie

                            Any ETA on 2.2.2SP12?

                            • 11. Re: Messaging blocked by long time-out
                              Ron Sigal Master

                               

                              "jlaemthonglang" wrote:
                              does this bug affect JBM queue too? or topics only?


                              As far as I know, it should affect both. I'm sure someone from JBossMessaging could be more definite.

                              • 12. Re: Messaging blocked by long time-out
                                Ron Sigal Master

                                 

                                "waylandc" wrote:
                                Any ETA on 2.2.2SP12?


                                Release 2.2.2.SP12 morphed into 2.2.3, which was released in May. The next release, 2.2.3.SP1, should come out in early September, and I'll aim to get the write timeout facility in that release.

                                • 13. Re: Messaging blocked by long time-out
                                  Ron Sigal Master

                                  I've just attached to JBREM-1120 two versions of jboss-remoting.jar with the write timeout facility implemented:

                                  * 2.2.3.SP1 preview (874 kb): writes "Remoting version: 2.2.3.SP1-preview: 8/19/2009 - 14:54" when loaded

                                  * 2.5.2 preview (1.07 Mb): writes "JBossRemoting Version 2.5.2 (Flounder) preview: 8/19/09-14:59" when loaded

                                  Note that release 2.2.3.SP1 is scheduled for September 14, 2009. Right now I don't have a release date for 2.5.2.

                                  -Ron

                                  • 14. Re: Messaging blocked by long time-out
                                    Ron Sigal Master

                                     

                                    "Chul Yoon" wrote:

                                    I assume I can just set the write timeout attribute for JBM in remoting-bisocket-service.xml

                                    Something like this?:

                                    <attribute name="writeTimeout">200000</attribute>
                                    



                                    That will set "writeTimeout" on the server side, which will affect (1) writing responses to the client and (2) sending messages to the client. If you want to configure the client side as well, then add the "isParam" attribute, which will put "writeTimeout" in the InvokerLocator which gets sent to the client:

                                    <attribute name="writeTimeout" isParam="true">200000</attribute>
                                    


                                    -Ron
