We're running 3.0.4 (with a few tweaks pulled from 3.0.5 and my quick hack to let one partition communicate with another), and over the past couple of days we have been getting a lot of failures on the HANaming listener.
My company, Dorado Software, builds network management software. As you can imagine, we have a LOT of random equipment around - production quality, beta quality, you name it. My colleague and I suspect that some piece or pieces of equipment are sending bad UDP packets to the HA listen port. I've confirmed at least that it is bad data coming in and not a JVM error by putting up a small listener of my own on the same port and seeing it die.
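A throwaway diagnostic listener of that kind might look roughly like the sketch below. It is only an illustration, not the listener actually used: the class name `PacketDump`, the 4096-byte buffer, and taking the port from the command line are all assumptions, and it targets a current JDK. It binds a plain `DatagramSocket`; if the service being imitated joins a multicast group, a `MulticastSocket` with the matching group would be needed instead.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Hypothetical diagnostic tool: prints the sender and a hex dump of every
// datagram that arrives on the given UDP port.
public class PacketDump {

    // Render the first len bytes of data as space-separated hex pairs.
    static String toHex(byte[] data, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    // Block until one datagram arrives and describe it as a single line.
    static String receiveOne(DatagramSocket socket) throws IOException {
        byte[] buf = new byte[4096]; // assumed upper bound on payload size
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet);
        return packet.getAddress().getHostAddress() + ":" + packet.getPort()
                + " len=" + packet.getLength()
                + " " + toHex(buf, packet.getLength());
    }

    public static void main(String[] args) throws IOException {
        // The port under suspicion is supplied by the caller.
        int port = Integer.parseInt(args[0]);
        try (DatagramSocket socket = new DatagramSocket(port)) {
            while (true) {
                System.out.println(receiveOne(socket));
            }
        }
    }
}
```

Run with the server stopped so the port is free, for example `java PacketDump <port>`, and leave it up until the stray traffic shows itself; the sender address on each line points at the offending device.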
Overall, we've been extremely pleased with JBoss. Converting from WebLogic has been relatively painless so far, going smoothly and quickly. The community is responsive, the code is clean and easy to understand, and the server is rock-solid. This is the first hitch we've come to that's causing a real issue in terms of reliability.
In order to ensure that reliability despite our noisy network, I've made the following change to our code. I believe it is the right thing to do - especially on a large and chaotic internal network - and should be applied to the 3.0 branch.
--- HANamingService.java~ Fri Jan 24 11:21:18 2003
+++ HANamingService.java Fri Jan 24 12:43:16 2003
@@ -459,9 +459,11 @@
if (!stopping) log.error ("HA-JNDI AutomaticDiscovery stopped", e);
- // Create a new thread to accept the next datagram
- listen ();
+ // Create a new thread to accept the next datagram
+ listen ();
// Return the naming server IP address and port to the client
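The hunk above shows only a few lines of context, so here is a generic sketch of the defensive pattern being argued for: a malformed datagram is logged and skipped, and the listener carries on with the next one. This is not the JBoss code. `ResilientListener`, `handleAll`, and the `PING` payload in the usage note are hypothetical names chosen for illustration, and the network receive is replaced by a list of payloads so the idea can be tried without a socket.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of "one bad packet must not stop the listener".
public class ResilientListener {

    // Pass each payload to the handler. A failure on one payload is logged
    // and counted, and processing continues with the next payload.
    // Returns the number of payloads that were rejected.
    static int handleAll(List<byte[]> payloads, Consumer<byte[]> handler) {
        int failures = 0;
        for (byte[] payload : payloads) {
            try {
                handler.accept(payload);
            } catch (RuntimeException e) {
                failures++;
                System.err.println("Ignoring bad datagram: " + e);
            }
        }
        return failures;
    }
}
```

A handler that accepts only the ASCII payload `PING` and throws on anything else would, given two good packets with garbage between them, process both good ones and report a single failure, rather than stopping at the garbage.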
Thank you. The fix has been applied today on Branch_3_0, Branch_3_2 and HEAD. It will be part of JBoss 3.0.7.