-
15. Re: Reproducible failure under load, SSL only
stuarthalloway Dec 10, 2012 3:48 PM (in response to clebert.suconic)Unfortunately I have now been able to reproduce the problem on the first attempt to connect, which seems to rule out the "attack" theory, as no close has occurred. I have very limited time this week, but will try to catch Norman on IRC at some point.
-
16. Re: Reproducible failure under load, SSL only
clebert.suconic Dec 10, 2012 3:56 PM (in response to stuarthalloway)I will talk to Norman tomorrow anyway, but I'm not sure how to help there. The issue is definitely environmental. I will see what I can gather from Norman. Please keep us posted if you find anything.
-
17. Re: Reproducible failure under load, SSL only
stuarthalloway Dec 11, 2012 7:37 AM (in response to clebert.suconic)I have isolated a different failure mode when using HornetQ in Datomic. The error there is
HornetQException[errorCode=2 message=Cannot connect to server(s). Tried with all available servers.]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:619)
Unfortunately, I do not have a small test case demonstrating this error. However, it is similar to the error discussed above in the following ways:
- the error happens only occasionally, in a code path that works most of the time
- the arguments being passed to createSessionFactory are exactly the same in the cases that succeed, and the cases that fail
- if I reconfigure Datomic to disable SSL, the error never occurs.
There is another open thread reporting a similar error, and suggesting that "localhost" is special and to blame. I believe I have eliminated that possibility by reproducing the problem using all the different network names for localhost.
-
18. Re: Reproducible failure under load, SSL only
normanmaurer Dec 11, 2012 11:38 AM (in response to stuarthalloway)I have some good news... I can reproduce the "hang" and know what the problem is. It's actually a "bug" in Netty which we already fixed in the next major version. However, it is not fixed in the 3.x series yet, and so you see the problem here. To fix this I will need to backport the fix to the Netty 3.x series. Once it is released you should be able to just upgrade the Netty jar and everything should "just work".
The "problem" arises when this happens:
1. the client successfully connects to the server
2. the client issues a handshake
3. the server closes the connection and the client does not see the RST; at this point the handshake is still not complete
4. the client waits (almost) forever on the handshake, which will never complete
Stay tuned, I'm working on it ...
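The four steps above can be illustrated without any Netty code. What follows is only a minimal sketch under stated assumptions: the class `HandshakeTimeoutSketch`, the method `awaitHandshake`, and the `Outcome` enum are hypothetical names invented for this illustration, and a plain `CompletableFuture` stands in for the pending handshake. It is not the fix that was backported to Netty 3.x; it only shows why an unbounded wait hangs and how a bounded wait turns the hang into a reportable failure.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandshakeTimeoutSketch {

    /** Possible results of waiting for the (simulated) handshake. */
    enum Outcome { COMPLETED, TIMED_OUT, FAILED }

    /**
     * Waits for the handshake future, but gives up after timeoutMillis
     * instead of blocking indefinitely.
     */
    static Outcome awaitHandshake(CompletableFuture<Void> handshake, long timeoutMillis)
            throws InterruptedException {
        try {
            handshake.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            // Step 4: the peer is gone and nothing will ever complete the future.
            handshake.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The handshake finished, but with an error.
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A future that is never completed stands in for the missed RST (step 3).
        CompletableFuture<Void> lost = new CompletableFuture<>();
        System.out.println(awaitHandshake(lost, 200));

        // A handshake that has already completed returns immediately.
        CompletableFuture<Void> ok = CompletableFuture.completedFuture(null);
        System.out.println(awaitHandshake(ok, 200));
    }
}
```

With a bounded wait, the caller gets a definite failure it can report or retry, rather than a thread that blocks (almost) forever.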
-
19. Re: Reproducible failure under load, SSL only
stuarthalloway Dec 11, 2012 1:52 PM (in response to normanmaurer)That's great news. I tried running my example against Netty 4.0.0.Alpha4, and got a totally unrelated error immediately on startup. Am I right to assume that the Netty 4 line is not compatible with my project, and I should wait and test your fix on the 3 line?
-
20. Re: Reproducible failure under load, SSL only
normanmaurer Dec 11, 2012 2:00 PM (in response to stuarthalloway)Yes you are right it is not compatible. I hope to have a fix ready to test tomorrow.
-
21. Re: Reproducible failure under load, SSL only
clebert.suconic Dec 11, 2012 6:57 PM (in response to stuarthalloway)@Stuart: We were chatting on IRC today. Netty 4 has a different API. Norman will make the changes in HornetQ to integrate Netty 4 on master as soon as they release Netty 4 Beta.
-
22. Re: Reproducible failure under load, SSL only
clebert.suconic Dec 11, 2012 6:59 PM (in response to stuarthalloway)There's another issue fixed about the first node and localhost on the list. I'm confused on the versions you tried now. Are you still seeing that even with a checkout on Branch_2_2_AS7 or the latest beta?
-
23. Re: Reproducible failure under load, SSL only
normanmaurer Dec 12, 2012 6:52 AM (in response to normanmaurer)@stuart: could you please clone this git repo, build HornetQ from it, and verify that it fixes the problem:
https://github.com/normanmaurer/hornetq/tree/handshake_workaround
Thanks!
-
24. Re: Reproducible failure under load, SSL only
stuarthalloway Dec 12, 2012 8:06 AM (in response to normanmaurer)Where the sample app used to hang, I now see
HornetQException[errorType=NOT_CONNECTED message=HQ119026: Cannot connect to server(s). Tried with all available servers.]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:841)
at hornet.samples.PingProducer.main(PingProducer.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:291)
at java.lang.Thread.run(Thread.java:680)
Will also test in Datomic, where we have application-level retry logic, and see what happens there.
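For readers unfamiliar with the term, application-level retry logic around a connect call can look roughly like the sketch below. Everything in it is an assumption made for illustration: `ConnectRetrySketch` and `withRetry` are hypothetical names, the lambda in `main` is a stand-in for a call such as `createSessionFactory`, and none of this is Datomic's actual implementation.

```java
import java.util.concurrent.Callable;

public class ConnectRetrySketch {

    /**
     * Calls the action up to maxAttempts times, sleeping between failed
     * attempts, and rethrows the last failure if every attempt fails.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread has been asked to stop.
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a connect call that fails twice, then succeeds.
        final int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Cannot connect to server(s)");
            }
            return "connected";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A wrapper like this only helps once the connect call fails promptly with an exception, as in the trace above; it cannot recover a call that hangs.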
-
25. Re: Reproducible failure under load, SSL only
normanmaurer Dec 13, 2012 1:13 AM (in response to stuarthalloway)@stuart: OK, so we have one of two issues fixed now. I will do more debugging for the SSL error, but I guess it will take me some time, as debugging SSL problems is a pain :/
-
26. Re: Reproducible failure under load, SSL only
normanmaurer Dec 17, 2012 7:58 AM (in response to normanmaurer)@stuart: Could you please build Netty from the following branch and try your app with it:
https://github.com/netty/netty/tree/ssl_race_fix
I ran your example app and sent/received over 1 million messages without a problem using the above Netty code.
Please let me know if it also fixes the problem for you.
-
27. Re: Reproducible failure under load, SSL only
stuarthalloway Dec 19, 2012 7:20 AM (in response to normanmaurer)Hi Norman,
I am unable to see this branch -- is it possibly not pushed?
Stu
-
28. Re: Reproducible failure under load, SSL only
normanmaurer Dec 19, 2012 7:32 AM (in response to stuarthalloway)Hey Stuart,
it was merged into the 3 branch:
https://github.com/netty/netty/tree/3
Please test and let me know as we would like to cut a release ASAP.
Thanks!
-
29. Re: Reproducible failure under load, SSL only
stuarthalloway Dec 19, 2012 8:53 AM (in response to normanmaurer)Working on it. The maven build now requires JDK 7? I can test that, but not sure we can force our users there...