Well, the Solaris JVM and NIO are notoriously buggy, so the first step would be to upgrade the JDK to the latest update (u131 atm),
and then we can look into other options.
Also, upgrading WildFly to 10.1 would not hurt.
Besides the upgrade, do you have any recommendations on how to tackle/investigate this? The issue appeared overnight, without any modifications to the application itself.
You may try increasing direct buffer memory using the -XX:MaxDirectMemorySize=512m JVM option. However, consider upgrading the JDK to the latest update, and WildFly too.
MaxDirectMemorySize is not set at this point. How can I determine its default value, i.e. the value that my app is using right now?
You may use jinfo as in the example:
jinfo -flag MaxDirectMemorySize <PID>
Given that the flag is not set, the jinfo command has the following output:
myzone# jinfo -flag MaxDirectMemorySize PID
What is the maximum size of direct memory in this case?
According to this DZone article, you can use a Java API to print the max direct memory; the value is in bytes.
You can use the following to display the value as MB
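The snippet for this step did not survive in the thread, so here is a minimal sketch of what it could look like. It assumes JDK 8, where the internal class sun.misc.VM exposes maxDirectMemory(); the class is looked up reflectively so the code still compiles on newer JDKs. The class name MaxDirectMemory and the fallback to Runtime.maxMemory() are assumptions made for illustration, not something taken from the article.

```java
import java.lang.reflect.Method;

public class MaxDirectMemory {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes here). */
    public static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Returns the direct memory limit of the running JVM in bytes. */
    public static long maxDirectMemoryBytes() {
        try {
            // sun.misc.VM is an internal JDK 8 class, so look it up reflectively.
            Method m = Class.forName("sun.misc.VM").getMethod("maxDirectMemory");
            return (Long) m.invoke(null);
        } catch (ReflectiveOperationException e) {
            // Assumption: without -XX:MaxDirectMemorySize, HotSpot sizes the
            // direct memory limit from the max heap, so report that instead.
            return Runtime.getRuntime().maxMemory();
        }
    }

    public static void main(String[] args) {
        long bytes = maxDirectMemoryBytes();
        System.out.println("Max direct memory: " + bytes + " bytes (" + toMb(bytes) + " MB)");
    }
}
```

This has to run inside the same JVM (or with the same JVM options) as the application, since the limit is per process.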
This is the value I get when printing sun.misc.VM.maxDirectMemory(): 24811929600 bytes. This is approx. 23 GB, which means setting it to 512 MB would make no sense. Am I right? Are there any MBeans or commands that would indicate the usage of this memory? Is this process specific?
We managed to print the usage of direct memory by using the java.nio BufferPool MBean attributes. The usage reaches around 1.5 GB when the OutOfMemory errors occur. We reproduced the issue even after upgrading the JDK to 1.8 update 131 and WildFly to 10.1.0. Any hints on how we should continue to tackle this?
Note: the issue does not occur in a similar environment where the number of processors is 64 (instead of 256). This difference in the number of processors triggers a difference in the number of threads (IO workers). We will run a test with a decreased number of threads (IO workers).
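For anyone wanting to take the same measurement, a rough sketch of reading the java.nio BufferPool MBeans from inside the JVM follows. It uses the standard java.lang.management.BufferPoolMXBean API; the class name DirectBufferUsage and the 16 MB test allocation are only illustrative, and the poster's actual monitoring code is not shown in the thread.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferUsage {

    /** Returns the bytes used by the "direct" buffer pool, or -1 if it is not found. */
    public static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Allocate a direct buffer so the pool has something to report.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        // Print every buffer pool the JVM exposes, with its main attributes.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName()
                    + ": count=" + pool.getCount()
                    + ", memoryUsed=" + pool.getMemoryUsed()
                    + ", totalCapacity=" + pool.getTotalCapacity());
        }
        System.out.println("Allocated buffer capacity: " + buffer.capacity());
    }
}
```

The same attributes (Count, MemoryUsed, TotalCapacity) are visible remotely under the java.nio:type=BufferPool,name=direct MBean, for example through JConsole.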
Yeah, tuning the IO subsystem worker pools would be the proper approach, I think.
The default config calculates the number of threads and buffers based on the available CPUs.
And with such a huge number of CPUs, the default formula is not appropriate anymore.
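To make the scaling concrete, here is a small arithmetic sketch. It assumes the defaults described for the WildFly io subsystem worker, namely io-threads = 2 x CPU count and task-max-threads = 16 x CPU count; please verify these multipliers against the documentation for your WildFly version. The class IoWorkerDefaults is hypothetical and not part of WildFly.

```java
public class IoWorkerDefaults {

    // Assumption: the default io-threads value is twice the CPU count.
    public static int ioThreads(int cpus) {
        return cpus * 2;
    }

    // Assumption: the default task-max-threads value is 16 times the CPU count.
    public static int taskMaxThreads(int cpus) {
        return cpus * 16;
    }

    public static void main(String[] args) {
        // Compare the two environments mentioned in the thread with the local machine.
        int[] cpuCounts = {64, 256, Runtime.getRuntime().availableProcessors()};
        for (int cpus : cpuCounts) {
            System.out.println(cpus + " CPUs -> io-threads=" + ioThreads(cpus)
                    + ", task-max-threads=" + taskMaxThreads(cpus));
        }
    }
}
```

Under these assumptions the 256-CPU box would get 512 IO threads versus 128 on the 64-CPU box, which fits the observation that only the larger machine runs out of direct memory. Setting io-threads and task-max-threads explicitly on the worker avoids depending on the CPU-based defaults.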