Shutting down JBoss while an invoke is executing
anujbhatia Dec 12, 2011 3:52 AMHi,
If I try to shutdown JBoss while a BPEL process instance is executing an INVOKE activity, the server does not shutdown completely untill the response is recieved from the external web service (or the request timesout). Is this by design?
I think this would be a good feature to have to make sure that the process is not left in an inconsistent state. However, there seems to be a problem and this is not working as I expected. Once the response (or timeout) for the INVOKE is recieved, the server that is partially shutdown tries to process the response but is not able to do so, I've pasted the log messages below. Also, if this is running in a cluster the other cluster node takes up the INVOKE_CHECK job from the node that is now partially shutdown. But the other cluster node that is still running is not able to process the response (or timeout) and continue the process execution.
My environment is JBoss 5.1.0 GA, JBossWS CXF 3.4.0, RiftSaw 2.3.1-SNAPSHOT and JDK 1.6. (I'm using a snapshot build at revision 1429 that includes the fix for RIFTSAW-404)
Log messages on Shutdown:
13:35:12,373 INFO [org.jboss.bootstrap.microcontainer.ServerImpl] Runtime shutdown hook called, forceHalt: true
13:35:12,373 INFO [org.apache.coyote.http11.Http11Protocol] Pausing Coyote HTTP/1.1 on http-0.0.0.0-8080
13:35:12,373 INFO [org.apache.coyote.http11.Http11Protocol] Stopping Coyote HTTP/1.1 on http-0.0.0.0-8080
13:35:12,389 INFO [org.apache.coyote.ajp.AjpProtocol] Pausing Coyote AJP/1.3 on ajp-0.0.0.0-8009
13:35:12,389 INFO [org.apache.coyote.ajp.AjpProtocol] Stopping Coyote AJP/1.3 on ajp-0.0.0.0-8009
13:35:12,389 INFO [org.jboss.soa.bpel.deployer.as5.AS5BPELDeployer] Undeploy: Quickstart_bpel_simple_invoke-1.jar relativePath=deploy.xml
13:35:12,389 INFO [org.jboss.soa.bpel.deployer.as5.AS5BPELDeployer] Not undeploying BPEL: Quickstart_bpel_simple_invoke-1.jar relativePath=deploy.xml
13:35:12,389 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/Quickstart_bpel_simple_invoke
13:35:12,404 INFO [org.jboss.wsf.framework.management.DefaultEndpointRegistry] remove: jboss.ws:context=Quickstart_bpel_simple_invoke,endpoint=HelloWorldWS
13:35:12,404 INFO [org.springframework.context.support.GenericApplicationContext] Closing org.springframework.context.support.GenericApplicationContext@af5c74: startup date [Mon
Dec 12 13:34:50 IST 2011]; parent: org.jboss.wsf.stack.cxf.client.configuration.JBossWSBusApplicationContext@38a6d2
13:35:12,404 INFO [org.jboss.wsf.stack.cxf.client.configuration.JBossWSBusApplicationContext] Closing org.jboss.wsf.stack.cxf.client.configuration.JBossWSBusApplicationContext@3
8a6d2: startup date [Mon Dec 12 13:34:50 IST 2011]; root of context hierarchy
13:35:12,435 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/web-console
13:35:12,435 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/invoker
13:35:12,498 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Stopping partition DefaultPartition
13:35:12,498 INFO [org.jboss.cache.RPCManagerImpl] Disconnecting and closing the Channel
13:35:12,639 INFO [org.jboss.cache.RPCManagerImpl] Stopping the RpcDispatcher
13:35:12,764 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Partition DefaultPartition stopped.
13:35:12,764 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Partition DefaultPartition destroyed.
13:35:12,764 INFO [org.jboss.soa.bpel.runtime.engine.service.BPELEngineService] Stopping JBoss BPEL Engine
Now this JBoss instance does not exit but waits for the INVOKE reponse to return. In the mean time *on the other cluster node*:
2011-12-12 13:35:14,392 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) I am (10.40.41.241:1099) received membershipChanged event:
<snip>
2011-12-12 13:35:14,392 INFO [org.jboss.soa.bpel.clustering.ODEJobClusterListener] (AsynchViewChangeHandler Thread) The available nodes now are [10.40.41.241:1099]
Once the response is recieved *on the orignal JBoss node that is partially shutdown*:
13:36:08,951 DEBUG [org.apache.cxf.transport.http.HTTPConduit] Response Code: 200 Conduit: {http://simple_invokehelloworld}HelloWorldPort.http-conduit
<snip>
13:36:09,045 DEBUG [org.apache.ode.bpel.engine.PartnerRoleMessageExchangeImpl] create work event for mex=hqejbhcnphr6tp10hgvakx
<snip>
13:36:09,061 DEBUG [org.apache.ode.sql] prepareStatement: insert into BPEL_MESSAGE (DATA, HEADER, MESSAGE_EXCHANGE_ID, TYPE) values (?, ?, ?, ?)
13:36:09,061 DEBUG [org.apache.ode.sql] executeUpdate, bound (2,null) (3,hqejbhcnphr6tp10hgvakx) (4,{http://simple_invokehelloworld}HelloWorld_sayHelloResponse)
13:36:09,076 DEBUG [org.apache.ode.sql] close()
13:36:09,076 DEBUG [org.apache.ode.sql] close()
13:36:09,076 DEBUG [org.apache.ode.sql] getConnection (tx=2)
13:36:09,076 DEBUG [org.apache.ode.sql] prepareStatement: update BPEL_MESSAGE_EXCHANGE set CALLEE=?, CHANNEL=?, CORRELATION_ID=?, CORRELATION_KEYS=?, CORRELATION_STATUS=?, CORR_ID=?, CREATE_TIME=?, DIRECTION=?, EPR=?, FAULT=?, FAULT_EXPLANATION=?, OPERATION=?, PARTNER_LINK_ID=?, PARTNER_LINK_MODEL_ID=?, PATTERN=?, PIPED_ID=?, PORT_TYPE=?, PROCESS_ID=?, PROCESS_INSTANCE_ID=?, PROPAGATE_TRANS=?, REQUEST_MESSAGE_ID=?, RESPONSE_MESSAGE_ID=?, STATUS=?, SUBSCRIBER_COUNT=? where MESSAGE_EXCHANGE_ID=?
13:36:09,076 DEBUG [org.apache.ode.sql] executeBatch, bound (1,null) (2,23) (3,null) (4,null) (5,null) (6,null) (7,2011-12-12 13:35:08.0) (8,P) (10,null) (11,null) (12,sayHello) (13,368) (14,18) (15,REQUEST_RESPONSE) (16,null) (17,{http://simple_invoke/helloworld}HelloWorld) (18,129) (19,113) (20,false) (21,643) (22,644) (23,RESPONSE) (24,0) (25,hqejbhcnphr6tp10hgvakx)
13:36:09,076 DEBUG [org.apache.ode.sql] close()
13:36:09,076 DEBUG [org.apache.ode.sql] close()
13:36:09,076 DEBUG [org.apache.ode.sql] getConnection (tx=2)
13:36:09,076 DEBUG [org.apache.ode.sql] prepareStatement: insert into ODE_JOB (channel, correlationKeySet, correlatorId, detailsExt, inMem, instanceId, mexId, nodeid, processId,retryCount, scheduled, ts, transacted, type, jobid) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
13:36:09,076 DEBUG [org.apache.ode.sql] executeBatch, bound (1,23) (2,@2) (3,null) (4,null) (5,false) (6,113) (7,hqejbhcnphr6tp10hgvakx) (8,10.40.42.88:1099) (9,null) (10,0) (11,true) (12,1323677169061) (13,true) (14,INVOKE_RESPONSE) (15,hqejbhcnphr6tp10hgval1)
13:36:09,076 DEBUG [org.apache.ode.sql] close()
13:36:09,076 DEBUG [org.apache.ode.sql] close()
13:36:09,076 DEBUG [org.apache.ode.bpel.engine.BpelServerImpl] BPEL SERVER STOPPING
13:36:09,092 DEBUG [org.apache.ode.sql] getConnection (tx=4)
13:36:09,092 DEBUG [org.apache.ode.sql] Set isolation level to 2
13:36:09,092 INFO [org.apache.ode.bpel.engine.BpelServerImpl] BPEL Server Stopped.
13:36:09,092 DEBUG [org.apache.ode.sql] prepareStatement: delete from ODE_JOB where jobid=? and nodeid=?
13:36:09,092 INFO [STDOUT] BPAFLogAdapter shutdown
13:36:09,092 DEBUG [org.apache.ode.sql] executeUpdate, bound (1,hqejbhcnphr6tp10hgval1) (2,10.40.42.88:1099)
13:36:09,092 DEBUG [org.apache.ode.sql] close()
13:36:09,092 DEBUG [org.apache.ode.sql] close()
13:36:09,092 ERROR [org.apache.ode.scheduler.simple.SimpleScheduler] Error while processing a persisted job: [JobId: hqejbhcnphr6tp10hgval1,nodeId: 10.40.42.88:1099,scheduled: true,transacted: true,ts: 1323677169061,channel: 23,instaceId : 113,type: INVOKE_RESPONSE,retrycount: 0]
java.lang.NullPointerException
at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:315)
at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:251)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.processInTransactionContext(SimpleScheduler.java:566)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.call(SimpleScheduler.java:546)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.call(SimpleScheduler.java:533)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
13:36:09,092 ERROR [STDERR] org.apache.ode.bpel.iapi.ContextException: Cannot locate the transaction manager; the server might be shutting down.
13:36:09,092 ERROR [STDERR] at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:256)
13:36:09,092 ERROR [STDERR] at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:251)
13:36:09,092 ERROR [STDERR] at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.processInTransactionContext(SimpleScheduler.java:611)
13:36:09,092 ERROR [STDERR] at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.call(SimpleScheduler.java:546)
13:36:09,092 ERROR [STDERR] at org.apache.ode.scheduler.simple.SimpleScheduler$RunJobCallable.call(SimpleScheduler.java:533)
13:36:09,092 ERROR [STDERR] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
13:36:09,092 ERROR [STDERR] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
13:36:09,092 ERROR [STDERR] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
13:36:09,092 ERROR [STDERR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
13:36:09,092 ERROR [STDERR] at java.lang.Thread.run(Thread.java:619)
13:36:09,514 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/
13:36:09,514 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/gwt-console-server
13:36:09,514 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/bpel-console
13:36:09,514 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/jbossws
13:36:09,514 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/jmx-console
13:36:09,514 INFO [org.jboss.web.tomcat.service.deployers.TomcatDeployment] undeploy, ctxPath=/Quickstart_bpel_simple_invokeWS
13:36:09,530 INFO [org.apache.catalina.core.StandardService] Stopping service jboss.web
13:36:09,592 INFO [org.jboss.resource.connectionmanager.ConnectionFactoryBindingService] Unbound ConnectionManager 'jboss.jca:service=DataSourceBinding,name=BPELDB' from JNDI na
me 'java:BPELDB'
13:36:09,623 INFO [org.jboss.jmx.adaptor.snmp.agent.SnmpAgentService] SNMP agent stopped
13:36:09,733 INFO [org.jboss.resource.connectionmanager.ConnectionFactoryBindingService] Unbound ConnectionManager 'jboss.jca:service=DataSourceBinding,name=DefaultDS' from JNDI
name 'java:DefaultDS'
13:36:09,873 INFO [com.arjuna.ats.jbossatx.jta.TransactionManagerService] Stopping transaction recovery manager
13:36:09,873 INFO [com.arjuna.ats.jbossatx.jta.TransactionManagerService] Destroying TransactionManagerService
And *on ther other cluster node* I see the INVOKE_CHECK for the process is processed, but the INVOKE is not retries nor does process continue execution from the next step.
2011-12-12 13:37:08,573 DEBUG [org.apache.ode.bpel.engine.BpelEngineImpl] (ODEServer-2) handleJobDetails: InvokeCheck event for mexid hqejbhcnphr6tp10hgvakx
2011-12-12 13:37:08,573 DEBUG [org.apache.ode.sql] (ODEServer-2) getConnection (tx=2)
2011-12-12 13:37:08,588 DEBUG [org.apache.ode.sql] (ODEServer-2) prepareStatement: select messageexc0_.MESSAGE_EXCHANGE_ID as MESSAGE1_6_1_, messageexc0_.CALLEE as CALLEE6_1_, messageexc0_.CHANNEL as CHANNEL6_1_, messageexc0_.CORRELATION_ID as CORRELAT4_6_1_, messageexc0_.CORRELATION_KEYS as CORRELAT5_6_1_, messageexc0_.CORRELATION_STATUS as CORRELAT6_6_1_, messageexc0_.CORR_ID as CORR20_6_1_, messageexc0_.CREATE_TIME as CREATE7_6_1_, messageexc0_.DIRECTION as DIRECTION6_1_, messageexc0_.EPR as EPR6_1_, messageexc0_.FAULT as FAULT6_1_, messageexc0_.FAULT_EXPLANATION as FAULT11_6_1_, messageexc0_.OPERATION as OPERATION6_1_, messageexc0_.PARTNER_LINK_ID as PARTNER21_6_1_, messageexc0_.PARTNER_LINK_MODEL_ID as PARTNER13_6_1_, messageexc0_.PATTERN as PATTERN6_1_, messageexc0_.PIPED_ID as PIPED15_6_1_, messageexc0_.PORT_TYPE as PORT16_6_1_, messageexc0_.PROCESS_ID as PROCESS22_6_1_, messageexc0_.PROCESS_INSTANCE_ID as PROCESS23_6_1_, messageexc0_.PROPAGATE_TRANS as PROPAGATE17_6_1_, messageexc0_.REQUEST_MESSAGE_ID as REQUEST24_6_1_, messageexc0_.RESPONSE_MESSAGE_ID as RESPONSE25_6_1_, messageexc0_.STATUS as STATUS6_1_, messageexc0_.SUBSCRIBER_COUNT as SUBSCRIBER19_6_1_, props1_.MEX_ID as MEX2_3_, props1_.ID as ID3_, props1_.ID as ID14_0_, props1_.MEX_ID as MEX2_14_0_, props1_.PROP_KEY as PROP3_14_0_, props1_.PROP_VALUE as PROP4_14_0_ from BPEL_MESSAGE_EXCHANGE messageexc0_ left outer join BPEL_MEX_PROP props1_ on messageexc0_.MESSAGE_EXCHANGE_ID=props1_.MEX_ID where messageexc0_.MESSAGE_EXCHANGE_ID=?
2011-12-12 13:37:08,588 DEBUG [org.apache.ode.sql] (ODEServer-2) executeQuery, bound (1,hqejbhcnphr6tp10hgvakx)
2011-12-12 13:37:08,588 DEBUG [org.apache.ode.sql] (ODEServer-2) close()
2011-12-12 13:37:08,588 DEBUG [org.apache.ode.sql] (ODEServer-2) close()
2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) getConnection (tx=2)
2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) prepareStatement: select processdao0_.ID as ID9_0_, processdao0_.GUID as GUID9_0_, processdao0_.PROCESS_ID as PROCESS3_9_0_, processdao0_.PROCESS_TYPE as PROCESS4_9_0_, processdao0_.VERSION as VERSION9_0_ from BPEL_PROCESS processdao0_ where processdao0_.ID=?
2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) executeQuery, bound (1,129)
2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) close()
2011-12-12 13:37:08,681 DEBUG [org.apache.ode.sql] (ODEServer-2) close()
On the other hand if I kill the JBoss process from the windows task manager, instead of attempting a clean shutdown, the other cluster node is able to resume the process normally. The other cluster node takes over the scheduled INVOKE_CHECK activity. Once the INVOKE_CHECK is processed the INVOKE is reported to have failed, it is retried and the process instance continues executing on the surviving cluster node.
Thanks
Anuj