Any other messages during the runtime?
Since you are doing tests.. any chance you could try trunk?
Hi, (I was working with bizz on this test)
We didn't see any messages during that time (I assume you meant log messages).
We only monitored the process and the disk usage, and saw that the process was accessing the journal files.
I'm building trunk now, and we'll try to reproduce tomorrow. We were using 2.1.1; were there any recent fixes in this area?
thanks for your help
It's possible you were having compacting issues.. something may have turned off reclaiming, which would make your journal never compact or reclaim any files.
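For reference, journal compacting in HornetQ is controlled by two elements in hornetq-configuration.xml. The element names below come from the HornetQ configuration reference, and the values shown are (to my knowledge) the defaults; please verify against the docs for your version. A minimal sketch:

```xml
<configuration>
   <!-- Compacting only starts once at least this many journal files exist.
        Setting journal-compact-min-files to 0 disables compacting entirely. -->
   <journal-compact-min-files>10</journal-compact-min-files>

   <!-- Compacting only starts when live data occupies less than this
        percentage of the journal's total size. -->
   <journal-compact-percentage>30</journal-compact-percentage>
</configuration>
```

If compacting were accidentally disabled (e.g. min-files set to 0), the journal would keep growing and never reclaim files, which matches the symptom described.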
I wanted to know if you saw any error logs on the system.
Maybe you could share your test with me if you replicate the issue. We're about to release.. if you replicate it tomorrow your timing would be perfect.
I'll be happy to help,
What kind of info can I give you to help you better understand the situation?
Maybe I can elaborate on our use case and on the testing.
We have a system with a lot of jobs, around 15 million, and most of them need to run at different frequencies.
We thought we could use HornetQ to do the scheduling of those jobs. Every message represents a job. A message is pretty small; it only carries a few IDs that identify the entities it works on. Once a message is received, the job is run, and once it's finished the message is acknowledged. A job can decide it needs to run again at a future time, in which case it sends a new message with a scheduled delivery time.
This means that at any given time there will only be around 100,000 messages that need to be handled, while a few million will be "pending".
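As a concrete sketch of the "send a new message with a scheduled delivery time" step: HornetQ delays delivery of any message carrying the `_HQ_SCHED_DELIVERY` property (the core API exposes it as `Message.HDR_SCHEDULED_DELIVERY_TIME`), whose value is an absolute epoch-millis timestamp. The actual JMS send needs a running broker, so it is only sketched in comments; the `deliveryTime` helper and class name are ours, not HornetQ API.

```java
public class ScheduledJobSketch {
    // Property name HornetQ inspects for delayed delivery (per the HornetQ
    // docs; the core-API constant is Message.HDR_SCHEDULED_DELIVERY_TIME).
    static final String HDR_SCHEDULED_DELIVERY = "_HQ_SCHED_DELIVERY";

    // Absolute delivery timestamp: "now" plus the job's requested delay.
    static long deliveryTime(long nowMillis, long delayMillis) {
        return nowMillis + delayMillis;
    }

    public static void main(String[] args) {
        // Example: the job asks to run again in one minute.
        long runAt = deliveryTime(System.currentTimeMillis(), 60_000L);

        // With a live JMS session/producer (not shown), the re-send would be:
        //   TextMessage m = session.createTextMessage(jobIdsPayload);
        //   m.setLongProperty(HDR_SCHEDULED_DELIVERY, runAt);
        //   producer.send(m);

        System.out.println(runAt > System.currentTimeMillis()
                ? "scheduled in the future" : "clock went backwards?");
    }
}
```

The broker holds the message and only dispatches it to consumers once the timestamp passes, which is exactly the "pending" state described above.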
The first thing we wanted to test was how HornetQ copes when there are that many pending messages.
That's why we sent a few million messages and used a small number of consumers.
As Levona said, everything was going pretty well: the time it took to receive a message was pretty short (a few milliseconds), the time it took to send was slower (it got up to a few seconds) but still reasonable, and the system was running smoothly, until at one point we decided to restart the test.
That's when we were surprised to find that it took the server a few hours to finish starting up.
One thing we didn't mention is that the test was running on an Amazon instance; we don't know whether that matters.
Please let me know how I can help to further investigate. Tomorrow I will attach the test application and the config files we are using, if you think it will help.
Thanks again for all the help.
HornetQ seems by far the best fit for our needs, we just need to get past this little bump :-)
"What kind of info can i give you in order to better understand the situation?"
The best thing would be a way to replicate the scenario. (against trunk, please)
I'm cheering for you to not find the issue on trunk.. so it's important you try trunk first before raising any issues around this.
"One thing we didn't mention is that the test was running on an amazon instance"
From our point of view I think it's fine.
The only thing I'm not sure about with Amazon is how real the disk they give you is. I mean.. is a sync really a sync or not? If you configure the virtual hardware to sync.. does that mean a physical sync?
I'm not sure how optimistic/pessimistic Amazon is in regard to crashes. I read their docs and didn't find much information.
From what a few people have told me at cloud conferences, EC2 will respect the configuration you set on your virtual hardware. So... that means it's fine for now.. but it's something to watch for on Amazon.
But anyway.. this is drifting a bit from the initial thread title.
Going back to the original thread... I will take a look as soon as you provide me a test.
I want to see if anything is going on, such as compacting.. or if there's anything I can do to speed up startup for you.
We did the same test on trunk.
As for your questions about errors: we didn't see any errors in the log, and no stuck threads (we took thread dumps with kill -3). It really seems like it simply took a long time to load the journal files.
In the trunk test we could see an improvement in how the journal files are managed. I guess compacting kicked in, because some files were deleted from the journal during the run, and it took us much longer to reach a 1.1 GB journal (containing approximately 2.5 million messages).
We restarted the server three hours ago, and it still hasn't finished starting.
(We also restarted the server once during the test when the journal was 300 MB, and that took 7 minutes.)
Attached you can find the configuration files we used both for the server and clients, and the code of our clients.
Thanks for all the help
hornetq-testcase.rar.zip 9.1 MB
Loading 1.1 GB should take about one second on a normal machine with an ext3 file system. Are you sure you're not running on some weird file system?
The persistence chapter describes what file systems we support.
Do you have any way you could give me the data files? You could mail me a link in pvt.. or something?
I will try your test here.. but since you already have the data.. it would save me some time.
Good point from Tim. Are you guys using ext3?
The filesystem is ext3.
Any chance you guys could send me the data somehow?
You mean the *.hq files? Do you have an FTP server I can upload them to? Do you need all of them or just some?
Is there anything else we can try? Maybe turn on some logging, or something else that could help?
Did you take a look at the config files? I'm kind of hoping the whole thing is just something we misconfigured (though we didn't change many of the default values).
Is there anything else about the machine itself? If it's not the filesystem (which is ext3), is there anything else that might make a difference?
thanks for all your help
> "you mean the *.hq files? do you have an FTP server i can upload them to? do yuo need all of them or just some?"
You could zip your data and send it to me through FTP. Send me an email in pvt (you have it here on my account) and I will synchronize with you.
I did a few tweaks on loading, and I can load your test data in about 1 minute.
If all of those messages (2.5 million) were scheduled... it would take about 2 more minutes to reschedule all of them.
Notice that a messaging system is not intended to be a scheduling system.
These changes will make it into the next release over the weekend.