0 Replies Latest reply on Aug 15, 2013 1:58 PM by cdegeiso

    JBoss 5.1.0.GA Severe Latency Problem

    cdegeiso

      Hi all,

       

        I am seeing something that may be either a symptom or a cause of a severe latency problem in a deployment of a COTS package on JBoss 5.1.0.GA, and I have not been able to determine which. We use 5.1 because it is the version currently supported by the application. I have heard the next release will support JBoss 7, and I am eager to move on to something more solid.

       

        We've been running 5.1 for a couple of years now (and JBoss 4.2.1 for years before that). The application ran fine on JBoss 4 and for a little over a year on 5.1; then performance started degrading, and at times it has become unusable. Although we have been using JBoss for a long time, I am by no means a JBoss expert. I know enough to get it running and processing data for our application, and that's about it. Unfortunately, due to apathy among the system admins (don't ask), the task of figuring out what is going on has been assigned to me (an application administrator), so I will have to get up to speed on it quickly.

       

        First, some symptoms. After the application has been running for a few days, latency starts to creep in. It gets so bad that transactions are clocked at roughly 75 seconds each. CPU and memory usage look fine on the Windows boxes and on the HP/UX database box, and I can access the JMX console and monitor activity there, but some anomalies exist. The first is that the JCA counts for available connections and connections in use do not add up to our total connection pool size. For example, our connection pool is set to a maximum of 300 connections, and I will see something like 164 connections in use and 52 available. When we aren't having issues, these two numbers always add up to 300. The only way we have found to recover is to completely shut off the JBoss servers, bump the database, and restart the JBoss servers. Stopping the JBoss services doesn't fix it, and neither does restarting the JBoss servers alone. The database has to clear out all sessions on its end before JBoss will connect again. Our DBA states that the database is fully functional during this time and has not reported any anomalies in the logs on the Oracle end.
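        To put a number on the pool discrepancy described above, here is a minimal sketch in Python. It assumes the three figures are read by hand from the JMX console. The function name `unaccounted_connections` is made up for illustration and is not part of JBoss.

```python
def unaccounted_connections(max_pool_size, in_use, available):
    """Return how many pool slots are neither in use nor available."""
    return max_pool_size - (in_use + available)


# Figures quoted above: pool maximum 300, 164 in use, 52 available.
missing = unaccounted_connections(300, 164, 52)
print(f"{missing} of 300 connections are unaccounted for")
```

        With the figures quoted above, 84 connections are missing from both counters. Recording this value alongside each thread dump would show whether the gap grows together with the latency.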

       

        The second anomaly shows up when we get slammed with latency: when I run a thread dump to see what is being processed at that moment, I see a lot of threads that look like this:

       

      Thread: Thread-6293325 : priority:5, demon:false, threadId:6370353, threadState:RUNNABLE

       

      java.net.SocketInputStream.socketRead0(Native Method)
      java.net.SocketInputStream.__AW_read(Unknown Source)
      java.net.SocketInputStream.read(Unknown Source)
      oracle.net.ns.Packet.receive(Packet.java:300)
      oracle.net.ns.DataPacket.receive(DataPacket.java:106)
      oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
      oracle.net.ns.NetInputStream.__AW_read(NetInputStream.java:260)
      oracle.net.ns.NetInputStream.read(NetInputStream.java)
      oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
      oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
      oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
      oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
      oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
      oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
      oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
      oracle.jdbc.driver.T4CTTIoping.doOPING(T4CTTIoping.java:52)
      oracle.jdbc.driver.T4CConnection.doPingDatabase(T4CConnection.java:4008)
      - locked <0x14948d6e> (a oracle.jdbc.driver.T4CConnection)
      oracle.jdbc.driver.PhysicalConnection$3.run(PhysicalConnection.java:7868)
      java.lang.Thread.run(Unknown Source)

      Thread: Thread-6293326 : priority:5, demon:true, threadId:6370354, threadState:RUNNABLE

      Thread: Thread-6293327 : priority:5, demon:true, threadId:6370355, threadState:RUNNABLE

       

      java.net.SocketInputStream.socketRead0(Native Method)
      java.net.SocketInputStream.__AW_read(Unknown Source)
      java.net.SocketInputStream.read(Unknown Source)
      oracle.net.ns.Packet.receive(Packet.java:300)
      oracle.net.ns.DataPacket.receive(DataPacket.java:106)
      oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
      oracle.net.ns.NetInputStream.__AW_read(NetInputStream.java:260)
      oracle.net.ns.NetInputStream.read(NetInputStream.java)
      oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
      oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
      oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
      oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
      oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
      oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
      oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
      oracle.jdbc.driver.T4CTTIoping.doOPING(T4CTTIoping.java:52)
      oracle.jdbc.driver.T4CConnection.doPingDatabase(T4CConnection.java:4008)
      - locked <0x65f90b32> (a oracle.jdbc.driver.T4CConnection)
      oracle.jdbc.driver.PhysicalConnection$3.run(PhysicalConnection.java:7868)
      java.lang.Thread.run(Unknown Source)

      Thread: Thread-6293328 : priority:5, demon:true, threadId:6370356, threadState:RUNNABLE

      Thread: Thread-6293329 : priority:5, demon:true, threadId:6370357, threadState:RUNNABLE

      Thread: Thread-6293330 : priority:5, demon:true, threadId:6370358, threadState:RUNNABLE

      Thread: Thread-6293331 : priority:5, demon:true, threadId:6370359, threadState:RUNNABLE

      Thread: Thread-6293332 : priority:5, demon:true, threadId:6370360, threadState:RUNNABLE

      Thread: Thread-6293323 : priority:5, demon:true, threadId:6370361, threadState:RUNNABLE

      Thread: Thread-6293333 : priority:5, demon:true, threadId:6370362, threadState:RUNNABLE

      Thread: Thread-6293334 : priority:5, demon:true, threadId:6370363, threadState:RUNNABLE

       

        All of the other threads have names that tell me what they are used for, like:

       

      Thread: ajp-0.0.0.0-8005-197

      Thread: ReportQueueAgent

      Thread: HeartbeatThread

       

        All of these named threads have stack traces that show what they are doing, and I can be reasonably sure that they are doing something productive. The generic threads (as I call them) above don't really seem to do much other than open connections and hold onto them. When things are running smoothly, the generic threads do not exist at all. They gradually build up over time as latency kicks in. Their numbers also seem to correlate with user load (the more users that are logged in, the more likely performance will tank).
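        One way to track how the generic threads build up is to count them in each saved thread dump. The following is a rough sketch in Python, assuming the dump has been saved as text in the same layout as the excerpts above (a `Thread: <name>` header line followed by stack frames). The function names `parse_thread_dump` and `summarize` are hypothetical, and the `doPingDatabase` marker is simply taken from the stack traces shown earlier.

```python
import re
from collections import Counter

# Header lines look like "Thread: <name> : priority:5, demon:true, ..."
# or just "Thread: <name>" (format assumed from the excerpts above).
THREAD_HEADER = re.compile(r"^Thread:\s*(?P<name>.+?)(?:\s+:\s+priority.*)?$")
# "Generic" threads are the unnamed ones, e.g. Thread-6293325.
GENERIC_NAME = re.compile(r"^Thread-\d+$")


def parse_thread_dump(text):
    """Split dump text into (thread_name, [stack_frames]) pairs."""
    threads = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = THREAD_HEADER.match(line)
        if header:
            threads.append((header.group("name"), []))
        elif threads:
            # Any other non-blank line belongs to the most recent thread.
            threads[-1][1].append(line)
    return threads


def summarize(threads):
    """Count named threads, generic threads, and generic threads in a DB ping."""
    counts = Counter()
    for name, frames in threads:
        if GENERIC_NAME.match(name):
            counts["generic"] += 1
            if any("doPingDatabase" in frame for frame in frames):
                counts["generic_in_ping"] += 1
        else:
            counts["named"] += 1
    return counts


sample = """
Thread: Thread-6293325 : priority:5, demon:false, threadId:6370353, threadState:RUNNABLE

java.net.SocketInputStream.socketRead0(Native Method)
oracle.jdbc.driver.T4CConnection.doPingDatabase(T4CConnection.java:4008)
oracle.jdbc.driver.PhysicalConnection$3.run(PhysicalConnection.java:7868)
java.lang.Thread.run(Unknown Source)

Thread: Thread-6293326 : priority:5, demon:true, threadId:6370354, threadState:RUNNABLE

Thread: HeartbeatThread
"""

print(dict(summarize(parse_thread_dump(sample))))
```

        Running this against dumps taken at regular intervals would show whether the generic thread count really climbs as latency sets in, and how many of those threads are sitting in a database ping at the time of the dump.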

       

        As I said earlier, I am no expert, but I feel there is a direct correlation between these threads and our latency issue. While they may not be the cause, I would like to know how to get JBoss to tell me why these threads are being created, so that I can find a way to stop it or, in the case of a code defect, submit it to the developer for review.

       

        Here is some more information about our environment. I am not sure it matters, but in case it does, here are some specs:

       

      User Base: Total 4,000 users

      Concurrent Users: 175 (peak), 110 (avg)

      No Load balancing - App and process servers serve two completely separate functions, although they both run JBoss and TRIRIGA.

       

      (All are virtual with the exception of HP/UX)

       

      Web Server: AMD64 Dual Core processor, 4GB RAM, Windows 2003, IIS 6.0, ISAPI Redirector 1.2.32

      App Server: AMD64 Dual Core processor, 12GB RAM, Windows 2003, JBoss 5.1.0.GA (hosting TRIRIGA) - Java Heap 8GB

      Process Server: AMD64 Dual Core processor, 10GB RAM, Windows 2003, JBoss 5.1.0.GA (hosting TRIRIGA) - Java Heap 6GB

      Report Server: AMD64 Dual Core processor, 4GB RAM, Windows 2003, Crystal RAS 2008 SP3

      Database Server: 6 processors, 18GB RAM, Oracle 11gR2

       

      Windows boxes will be upgraded to Windows 2008R2 within the next couple of months.

       

      Thanks,

       

      Chris