We had some weird phenomenons at a customer site that we are unable to explain. I am actually not sure if anyone can explain that. I can already see adrian asking me what drugs I am taking :)
The customer is currently running acceptance tests for our product which will take about 2 months.
Our product is J2EE based using EJBs and JMS and for this customer is planned to be deployed on JBoss 3.2.2 on HP-UX with Oracle 9.2 (also used for JMS). We are using JMS to publish all logging information (audit trails, error logs etc.)
14 days ago the customer started to take down the Oracle 9.2 database during night for offline backups. They did not take down JBoss and there is also a service running accessing the system in a regular interval. When the database is down this is leading to some exceptions of course which is expected at this point of time.
3 days ago they were unable to perform any actions because the got a "unique constraint violation" from the JMS system.
We investigated and found about 150 JMS messages in the JMS table in Oracle. Those entries where messages for our ErrorMessageTopic but it seems that topic was handled as a a temporary topic at that time (we don't have temporary topics). We wrote a small program to extract the blob to get some more information.
All the messages were from the first 4 days of the 14 day time period from the timeframe when the database was down.
But here is now the really disturbing bit. The message in the JMS table are messages reporting a database connection error (expected errors because the database was down). But how can it be that i have JMS messages stored in Oracle reporting that is not possible to connect to oracle? Is Jboss under certain circumstances caching messages and persisting them later?
I was not able to re-produce this behavior so far. Has anyone ever encountered something similar?
My comment about the messages being sent to a temporary topic is actually not true. I mis-interpreted the "T" in the TXOP column. It only means they are inserted by the message cache, right?
I still do not know how JBossMQ works when the database is down. I would expect to get an exception immediately, but it seems that JBossMQ is holding the messages in the cache and when the database comes back online it seems that the messages are then inserted using the CacheStore without being consumed.
Any ideas, hints?
Getting a little bit confused. My previous post was not right, I think.
The destination for the entries in the JMS table is something like TOPIC.ErrorTopic.ID:3.-2147483648
This syntax with the ID is used for temporary topics, right?
So, next guess:
For some reason on the customer site the topics are temporary topics (nothing in the startup log is indicating that though). When the database is down, the service that runs and executes methods on our J2EE application is producing errors which we catch and send to JMS for logging purposes. The amount of errors exceeds the HighMemoryMark and as a result some of these messages are persisted when the database comes back online, but are not consumed (or at least not removed from the database).
So our original problem seems to be the fact that the topics are temporary topics.
We don't have any temp topics on our test systems and the startup output from the customer site looks fine, too:
The topics/queues are bound
10:36:59,061 INFO [AdminTopic] Bound to JNDI name: topic/efpAdminTopic
10:36:59,061 INFO [AdminTopic] Started jboss.mq.destination:service=Topic,name=AdminTopic
10:36:59,064 INFO [ErrorTopic] Bound to JNDI name: topic/efpErrorTopic
10:36:59,064 INFO [ErrorTopic] Started jboss.mq.destination:service=Topic,name=ErrorTopic
10:36:59,066 INFO [AuditTopic] Bound to JNDI name: topic/efpAuditTopic
10:36:59,067 INFO [AuditTopic] Started jboss.mq.destination:service=Topic,name=AuditTopic
10:36:59,076 INFO [ActiveActivityQueue] Bound to JNDI name: queue/efpActiveActivityQueue
10:36:59,077 INFO [ActiveActivityQueue] Started jboss.mq.destination:service=Queue,name=ActiveActivityQueue
and the MDBs are deployed without any warnings about non-existing destinations or created temporary destinations
10:37:36,479 INFO [JMSContainerInvoker] Started jboss.j2ee:binding=message-driven-bean,jndiName=local/SysAuditLogger,plugin=invoker,service=EJB
10:37:36,479 INFO [MessageDrivenInstancePool] Started jboss.j2ee:jndiName=local/SysAuditLogger,plugin=pool,service=EJB
10:37:36,479 INFO [MessageDrivenContainer] Started jboss.j2ee:jndiName=local/SysAuditLogger,service=EJB
10:37:36,516 INFO [DLQHandler] Started null
10:37:36,516 INFO [JMSContainerInvoker] Started jboss.j2ee:binding=message-driven-bean,jndiName=local/HistAuditLogger,plugin=invoker,service=EJB
10:37:36,517 INFO [MessageDrivenInstancePool] Started jboss.j2ee:jndiName=local/HistAuditLogger,plugin=pool,service=EJB
10:37:36,517 INFO [MessageDrivenContainer] Started jboss.j2ee:jndiName=local/HistAuditLogger,service=EJB
10:37:36,541 INFO [DLQHandler] Started null
10:37:36,541 INFO [JMSContainerInvoker] Started jboss.j2ee:binding=message-driven-bean,jndiName=local/ErrorLogger,plugin=invoker,service=EJB
10:37:36,542 INFO [MessageDrivenInstancePool] Started jboss.j2ee:jndiName=local/ErrorLogger,plugin=pool,service=EJB
10:37:36,542 INFO [MessageDrivenContainer] Started jboss.j2ee:jndiName=local/ErrorLogger,service=EJB
10:37:36,576 INFO [DLQHandler] Started null
10:37:36,577 INFO [JMSContainerInvoker] Started jboss.j2ee:binding=message-driven-bean,jndiName=local/BFAuditLogger,plugin=invoker,service=EJB
10:37:36,577 INFO [MessageDrivenInstancePool] Started jboss.j2ee:jndiName=local/BFAuditLogger,plugin=pool,service=EJB
10:37:36,578 INFO [MessageDrivenContainer] Started jboss.j2ee:jndiName=local/BFAuditLogger,service=EJB
10:37:36,615 INFO [DLQHandler] Started null
10:37:36,616 INFO [JMSContainerInvoker] Started jboss.j2ee:binding=message-driven-bean,jndiName=local/ActiveActivity,plugin=invoker,service=EJB
10:37:36,616 INFO [MessageDrivenInstancePool] Started jboss.j2ee:jndiName=local/ActiveActivity,plugin=pool,service=EJB
10:37:36,617 INFO [MessageDrivenContainer] Started jboss.j2ee:jndiName=local/ActiveActivity,service=EJB
You'll need to upgrade. This problem was caused by an error in the SQL
where the "T" messages were not deleted at restart on Oracle and other
This SQL has been replaced with a DELETE_TEMPORARY_MESSAGES
in the recent versions.
thanks for your response. I am aware of the temp message in that version. But this is not original problem in my opinion. I configured a DurableSubscriber for the Topic to keep all the messages in the table. The system is definitely not using temporary topics. It seems that these messages (TXOP='T') are cached while the database is down and (some of them) stored in the table when the database is up again. What I don't understand is why. We do not want transactional JMS for this particular topic so we are using java:ConnectionFactory to send the error logs from an EJB to the JMS system. The messages are deilvered to the JMS and picked up by the MDB although the dtabase is down. So there is actually no need for caching and persisting them at all.
Upgrading to a newer version is unfortunately not an option at this stage. Some configuration bits in 3.2.5 are stored in different configuration files in different directories than in 3.2.2. For the customer this makes it a new release instead of a patch, which means they would have to start the whole test cycle again.
Durable subscriptions persist messages by default (unless you explicitly send
non persistent messages).
The "T" messages are non persistent messages that don't fit into memory.
They are moved to disk to free up memory for something else when memory
Here is one my "favourite" links:
Aren't you an IONA customer? It seems strange that you would be posting in the forums
where there is no guaranteed service level (witness the 5 days delay in answering your
original question) when you have access to support where you can get a
The IONA support was engaged by the PROJECT team after I started the thread. I am responsible for the PRODUCT development and try to get answers from whereever I can. The guaranteed answers from official support teams are often enough "sorry, we don't know". I have more trust in the community that is actually using the particular product. But it seems that this problem is too specific. Simply the fact we can not re-produce it on other systems is weird.
I know how a durable subscriber works. As I stated in the previous message I just configured the durable subscription to keep all messages in the db to compare them with the "T" messages. And the result is that all messages that are sent while the database is online are sent correctly to the right topic with TXOP="A".
Only after the database was down messages with "T" appear in the database, messages that were sent while the database was down, messages that were received by the corresponding MDB although the database was down. And this has nothing to do with the durable subcriber because I configured that one only for a short test cycle.
I know that there are configurable cache limits and I understand that mesages have to be persisted if the cache grows too large, but it seems to me that something is going wrong when JBoss has to persist messages because of cache "overflow" while the database is not available. Given the fact that the messages are still delivered, it is also strange that we have a cache overflow at all.
But the strangest thing is that we are not able to reproduce the problem on our systems. The customer changed a lot of the port configurations, could that result in such a behavior?
Anyway, some of my team and someone from IONA will be at the customer site tomorrow. Let's hope we get some more information from them afterwards.
just to clarify, when I am saying "professional support teams" I am not talking about the JBoss support or IONA in specific. I was just making a generalised statement based on almost 20 years of experience with support teams from almost every major software company.
Btw, when I am posting on this forum I actually do not expect you to answer all the questions immediately, sometimes not even at all. The aspect of exchanging information and discussing issues with other JBoss users is often the main reason for posting.
a quick update on this issue.
According to our investigations the problem was the result of probably two bugs in Jboss.
If the database is not available and the cache limit is reached, the messages are not deleted from the cache after delivery (you can see the the counter going up in the JMX console all the time). When the database is up again JBoss is writing all these messages to the database according to its cache configuration. Because the messages are already delivered these entries are never pickud up again and stay in the database.
This in combination with the message cleanup bug on Jboss startup can lead to a unique constraint violation if new messages are written.
It might be worth mentioning that the JMS cache configuration was too low given the used hardware, but still the messages should be removed from the cache after delivery.
As a solution for now we increased the cache size and implemented a startup services that is cleaning the "T" entries from the JMS Messages table.
Which is what DELETE_TEMPORARY_MESSAGES does (since 3.2.4)
We know that, but unfortunately you decided to switch from tomcat 4.x to tomcat 5 which makes 3.2.4/5 a new major release and not a minor release or patch.
The customer would have to start their whole test cycle again, which would move the live date to next year, which would cost us more than $200000. If that happens I could never convince my superior management that JBoss is a trustworthy Applcation Server and it would definitely disappear from our suggested platform list. So you see, all I am trying is to keep you in the game for our future projects...
and deleting the messages is only fighting a symptom anyway. Those messages should never appear in the database because they were already delivered before the database comes online again and there are no durable subscribers.
That is not correct.
It is part of the design of the way the jdbc pm works.
Non persistent messages added to store via the cache don't survive a restart.
The cache does not fail just because it couldn't remove the persistent copy at
acknowledgement time. It will let the restart tidy it up.
The problem in earlier versions was that it was issuing an invalid sql statement
delete from jms_messages where txop='T' and txid=null
which some databases allow, but Oracle interprets correctly.
It should have been
delete from jms_messages where txop='T' and txid IS null
It was something I fixed a while ago, but the code had regressed.
Since the txid being null is irrelevent anyway, later versions do
delete from jms_messages where txop='T'