Below is some background on the issue I am facing:
Overall architecture:
Currently we have 2 Apache server instances and 4 JBoss server instances in Production for 2A.1. The presentation layer, i.e. the CSS, images and JS, is hosted on Apache, and the rest of the application (classes, libs, etc.) is deployed as ATG.ear on the 4 JBoss machines. There is also an Apache server instance for HTTP compression.
Issues with JBoss
Since 2A.1 (the first release, deployed a few months back) went live, JBoss has gone down and been restarted many times, but the failures are sporadic and inconsistent.
The Java-related errors can be identified and, after analysis, are handed to the dev team for fixing, but if any major issue comes up, the only remaining alternative is to:
remove each of the JBoss instances from the cluster,
rebuild the application,
deploy the ear,
put the instance back into the cluster and
start the application
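The steps above can be sketched as a loop that cycles one instance at a time. This is only an illustration under assumptions: the ear is rebuilt once up front, the per-instance order mirrors the list above, and every action (`remove_from_cluster`, `deploy_ear`, `add_to_cluster`, `start_application`, `build_ear`) is a hypothetical callable standing in for whatever scripts are actually used.

```python
from typing import Callable, Dict, List

# Per-instance steps, in the same order as the list above.
# Rebuilding the application is assumed to happen once, before the loop.
PER_INSTANCE_STEPS = [
    "remove_from_cluster",
    "deploy_ear",
    "add_to_cluster",
    "start_application",
]


def rolling_redeploy(
    instances: List[str],
    actions: Dict[str, Callable[[str], None]],
    build_ear: Callable[[], None],
) -> List[str]:
    """Run every step for one instance before touching the next one.

    Returns an ordered log of what was done, which is handy for auditing.
    """
    build_ear()  # hypothetical: build the ear once and reuse it everywhere
    log = ["build_ear"]
    for name in instances:
        for step in PER_INSTANCE_STEPS:
            actions[step](name)  # hypothetical hook for the real script
            log.append(f"{step}:{name}")
    return log


if __name__ == "__main__":
    # Placeholder actions that do nothing; instance names are made up.
    noop_actions = {step: (lambda name: None) for step in PER_INSTANCE_STEPS}
    for entry in rolling_redeploy(
        ["jboss1", "jboss2", "jboss3", "jboss4"], noop_actions, lambda: None
    ):
        print(entry)
```

Handling one instance completely before moving on means the other instances stay in the cluster while it is being redeployed.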
Issues in following the above steps:
There is a downtime of 1 hr until all the JBoss instances have started afresh with the freshly built ear deployed on them.
The BCC deployed in live Production goes down and then suddenly comes back up automatically; even the tech architecture team has not been able to fix this problem so far.
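To pin down exactly when a server goes down and comes back, the state changes could be recorded with timestamps and compared against the server logs. The following is a minimal sketch, not something we run today: `probe` is a hypothetical function that returns True when the server answers (for example, a wrapper around an HTTP request to a status page), and `record_transitions` is an illustrative name.

```python
import time
from typing import Callable, List, Tuple


def record_transitions(
    probe: Callable[[], bool],
    checks: int,
    interval_s: float = 0.0,
    clock: Callable[[], float] = time.time,
) -> List[Tuple[float, str]]:
    """Poll `probe` and record a timestamped event on every UP/DOWN change."""
    events: List[Tuple[float, str]] = []
    previous = None  # unknown state before the first check
    for i in range(checks):
        up = probe()  # hypothetical health check supplied by the caller
        if up != previous:
            events.append((clock(), "UP" if up else "DOWN"))
            previous = up
        if interval_s and i < checks - 1:
            time.sleep(interval_s)  # wait before the next check
    return events


if __name__ == "__main__":
    # Simulated probe results instead of a real server.
    simulated = iter([True, True, False, False, True])
    for stamp, state in record_transitions(lambda: next(simulated), checks=5):
        print(stamp, state)
```

Only changes are stored, so a long stable period produces a single entry and each outage shows up as a DOWN/UP pair.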
Any help with this would really be appreciated.
Thanks in advance,