6 Replies Latest reply on Feb 22, 2005 11:22 AM by sheckler

HAJMS and Oracle TAF

sheckler Feb 4, 2005 4:19 AM

Hi all,

I am running HAJMS (with oracle-jdbc2-service) on JBoss 3.2.6. The Oracle server is in fact a Real Application Cluster (RAC) with Transparent Application Failover (TAF).

During a TAF failover there may be a short period in which no Oracle instance is available.

Errors such as
ORA-01089: immediate shutdown in progress - no operations are permitted
or
ORA-03113: end-of-file on communication channel
may occur.

Write transactions are normally rolled back when the failover has finished or when the JBoss transaction timeout occurs. Read transactions simply continue (this is handled by the Oracle OCI driver).

How will HAJMS (e.g. a MDB) behave in this case? Will it enter a reconnection loop until oracle is available again?

Can I configure a separate transaction timeout for JMS?

Are there special settings in oracle-ds.xml when it is used by JMS?

Thank You all
Stefan

PS: I am using Oracle 9.2 and the ojdbc14.jar driver

oracle-jms-ds.xml

<datasources>
  <local-tx-datasource>
    <jndi-name>OracleJMSDS</jndi-name>
    <connection-url>jdbc:oracle:oci8:@TNSNAMES-ENTRY</connection-url>
    <driver-class>oracle.jdbc.driver.OracleDriver</driver-class>
    <security-domain>OracleDbRealm</security-domain>

    <check-valid-connection-sql>select table_name from all_tables where owner = 'RWCO_B'</check-valid-connection-sql>
    <new-connection-sql>select table_name from all_tables where owner = 'RWCO_B'</new-connection-sql>
    <exception-sorter-class-name>org.jboss.resource.adapter.jdbc.vendor.OracleExceptionSorter</exception-sorter-class-name>
    <track-statements>nowarn</track-statements>
    <set-tx-query-timeout>false</set-tx-query-timeout>

    <min-pool-size>1</min-pool-size>
    <max-pool-size>10</max-pool-size>
    <blocking-timeout-millis>30000</blocking-timeout-millis>
    <idle-timeout-minutes>10</idle-timeout-minutes>
  </local-tx-datasource>
</datasources>

1. Re: HAJMS and Oracle TAF

adrian.brock Feb 4, 2005 10:11 PM (in response to sheckler)

As far as JBossMQ is concerned (HAJMS considerations are irrelevant to your question)
any send or receive that requires persistence/transactions will fail during the DB failover.

Senders will receive an exception, receivers will have the message redelivered.

MDBs will not reconnect since there is no connection failure between the client (MDB)
and server (JBossMQ), only between the server and the DB.

JBossMQ does not support a transaction timeout beyond that provided by JTA.

The answers to your other questions can be found in the JCA FAQ.
2. Re: HAJMS and Oracle TAF

sheckler Feb 7, 2005 7:04 AM (in response to sheckler)

Thank You Adrian,

To verify this I tried the following:
running JBoss (3.2.7 by now) on my PC, with Oracle on a remote server and the following configuration:

datasource for JMS
..
<blocking-timeout-millis>300000</blocking-timeout-millis>
..

jboss-service.xml

..
<attribute name="TransactionTimeout">600</attribute>
..

After disconnecting my PC from the LAN for about one minute I saw the following
message several times:
10:56:07,359 INFO [JMSContainerInvoker] Reconnected to JMS provider
and everything recovered.
The message comes from the MDBs, doesn't it?

If the database connection is away for longer than the configured datasource blocking-timeout for JMS, JBoss doesn't come back.

If the fault tolerance of JMS (meaning: the server is running consistently and stably again after a database failover) depends on the blocking-timeout of the datasource that JMS uses (the blocking-timeout should be at least as long as the database failover can last), that is fine for me.

Stefan
3. Re: HAJMS and Oracle TAF

adrian.brock Feb 7, 2005 5:43 PM (in response to sheckler)

The blocking timeout is just how long the
DataSource.getConnection() waits when all database connections are in use.
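
To make the meaning of the blocking timeout concrete, here is a minimal Java sketch of a toy pool. It is not JBoss's pool implementation; the class name `BlockingTimeoutSketch` and its methods are hypothetical and exist only for illustration. It shows that the timeout bounds how long a caller waits for a free slot when the pool is exhausted, and says nothing about whether the database behind a slot is reachable.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy pool illustrating a blocking timeout (hypothetical, not JBoss code). */
public class BlockingTimeoutSketch {
    private final Semaphore slots;            // one permit per pooled connection
    private final long blockingTimeoutMillis; // plays the role of blocking-timeout-millis

    public BlockingTimeoutSketch(int maxPoolSize, long blockingTimeoutMillis) {
        this.slots = new Semaphore(maxPoolSize, true);
        this.blockingTimeoutMillis = blockingTimeoutMillis;
    }

    /** Waits up to the blocking timeout for a free slot; false means the wait timed out. */
    public boolean tryCheckout() {
        try {
            return slots.tryAcquire(blockingTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns a slot to the pool. */
    public void checkin() {
        slots.release();
    }

    public static void main(String[] args) {
        BlockingTimeoutSketch pool = new BlockingTimeoutSketch(1, 200);
        System.out.println(pool.tryCheckout()); // true: a slot is free
        System.out.println(pool.tryCheckout()); // false: pool exhausted, gives up after about 200 ms
        pool.checkin();
        System.out.println(pool.tryCheckout()); // true: the slot was returned
    }
}
```

In this sketch a longer timeout only lengthens the wait in the second call; it does not change what happens to a connection that has already been handed out.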

What does this mean?

"If the database connection is away for longer than the configured datasource blocking-timeout for JMS, JBoss doesn't come back."

I don't see the relevance to database failover.

If JBossMQ cannot get a JDBC connection
it will propagate the exception to the client, in one of two ways:
1) Cannot send message - in this case the message was never sent and the client
knows this
2) Cannot acknowledge message - in this case the message was never received;
the message is not lost, because only a successful acknowledge deletes the message,
and it will be redelivered
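
Since a failed send is reported to the client as an exception, the client can choose to retry it. The following is a minimal Java sketch of such a retry loop, under the assumption that the failing action is a JMS send that throws while the database is unavailable. The class `SendRetrySketch` and its `retry` method are hypothetical; a plain `Callable` stands in for the real send so that the example runs without a JMS provider.

```java
import java.util.concurrent.Callable;

/** Retry helper for an action that may fail during a DB failover (hypothetical sketch). */
public class SendRetrySketch {

    /**
     * Calls the action up to maxAttempts times, pausing delayMillis between failed attempts.
     * Throws IllegalStateException (wrapping the last failure) if every attempt fails.
     */
    public static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // in real code this would typically be a JMSException from the send
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // stop retrying if interrupted
                    break;
                }
            }
        }
        throw new IllegalStateException("giving up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated send: fails twice (as if the DB were failing over), then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new Exception("simulated: no JDBC connection");
            }
            return "sent";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts"); // sent after 3 attempts
    }
}
```

The total retry window (`maxAttempts` times `delayMillis`) would need to cover the longest expected failover; the values here are placeholders.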

If you are seeing something else show me the logging (READ THIS FIRST)
and describe the behaviour.

I am usually not very tolerant of posts like this that say "IT DOES NOT WORK".
What it usually means is the poster "DOES NOT KNOW HOW IT IS SUPPOSED TO WORK".
But in this case I sniff a bug report if I correctly understand your "JBoss doesn't come back". Although what this "JBoss" is that doesn't come back I can only guess?
Where did this "JBoss" go in the first place? Do you mean the MDB, the JMS server,
the sender, the database connection or what?
4. Re: HAJMS and Oracle TAF

sheckler Feb 8, 2005 4:49 AM (in response to sheckler)

Hello Adrian,
thanks for Your reply.

It is quite difficult to explain in a few words what (I think) is going on. We are developing a workforce management system that communicates with an external network control system (electricity).

We need high availability.

Therefore we use JBoss clustering, HAJMS and Oracle RAC/TAF.

As an example: we can interchange information with mobile units (cars) over GPRS or GSM via the network control system.

Car --> network control system -> JBoss

This interface between the network control system and JBoss is implemented as an MBean using NIO and HAJMS to get information from the car to the application server. This MBean service is a cluster singleton service. In a test environment, an update from the car arrives via socket every 10 seconds and is put into an HA queue. An MDB processes this information and triggers some business logic. I can see the log message every 10 seconds. If the message cannot be put into the queue, the send is retried several times.

And now I come to the point. "JBoss comes back" means I can see these log messages again and JBoss is ready for clients to connect (after the database was unplugged for about one minute). No exceptions are seen afterwards.

"JBoss doesn't come back" means I don't see the log messages any longer, clients cannot connect (the JBoss console is dead), and I can see exceptions (after the database was unplugged for more than 10 minutes, which is the blocking timeout). The last exception is
10:15:09,218 WARN [LocalManagedConnectionFactory] Destroying connection that is not valid, due to the following exception:
java.sql.SQLException: ORA-12571: TNS: Fehler beim Paket-Schreiber
(in English: "TNS:packet writer failure"). Then nothing more happens. There is no reaction to a shutdown request.

I thought I had empirically found a relation between the blocking timeout of the JMS datasource and the length of time I can cut off the database connection without killing the server completely.

I saw HAJMS failover messages from the container as well as from my own classes while the database was unplugged.

Please excuse my English
Stefan
5. Re: HAJMS and Oracle TAF

adrian.brock Feb 8, 2005 2:55 PM (in response to sheckler)

I don't want to know what you "*think* is going on".
I want to see what *is* going on from the logging.

"JBoss console is dead" .
If you mean the jmx console, this is just Tomcat with no interaction on the database.
If I understand you correctly, this probably means your JVM has crashed
(assuming no funny network interactions).

I'd guess (and it is a guess) that you are using OCI which is native code.
And that this native code has caused the jvm to crash due to a bug????

Trying to take a thread dump as explained in "READ THIS FIRST"
will tell you whether the jvm is still active and what it is doing.
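
As a rough illustration of what a thread dump contains, the following Java sketch collects one from inside the JVM. This is an assumption-laden alternative, not the procedure from "READ THIS FIRST": it relies on `Thread.getAllStackTraces()`, which requires Java 5 or later, and it can only run if the JVM is still responsive enough to execute code. The class name `ThreadDumpSketch` is hypothetical.

```java
import java.util.Map;

/** In-process thread dump using Thread.getAllStackTraces() (hypothetical sketch, Java 5+). */
public class ThreadDumpSketch {

    /** Returns the name, state and stack frames of every live thread as text. */
    public static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            out.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

A thread that is stuck inside native driver code would show a native method as its top frame in such a dump. If the JVM has actually crashed, no dump can be taken at all, which is itself informative.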

If I don't see something more concrete about the problem in your future posts
I will be ignoring this thread.
6. Re: HAJMS and Oracle TAF

sheckler Feb 22, 2005 11:22 AM (in response to sheckler)

Yes, I am using OCI.
Since I changed to the newest version of the OCI driver (from 9.2.0.2 to 9.2.0.5 of Oracle 9i), I have not been able to reproduce any of the behaviour described.
The differences in the test results are definitely caused by the OCI driver.