I've discussed several times in the past why you need something like a Java EE application server, such as AS7, as the basis for your PaaS. Emphasis on "something like", because if you've got a CORBA implementation or an old DCE deployment lying around, those will do too at a pinch, though I have to say that AS7 is a better option. But precisely why is this the case?
Let's look at the fundamental core capabilities that I keep mentioning, because they will drive home the reasons:
- transactions; unless you're working with only one datasource, at some point you'll need transactions. These have the nice property of guaranteeing isolation and failure atomicity for work conducted within their scope. Using them with multiple databases and messaging services, for instance, will ensure that either your updates occur and the message is delivered, or nothing happens, even in the presence of machine crashes or concurrent users. And these days, in a multi-core world with increasing parallelism, transactions in the form of Software Transactional Memory are becoming a first class programming construct. Across the industry, transaction systems such as JBossTS (aka Arjuna Transaction Service), CICS and Tuxedo have been around for decades, quietly running the backbones of mission critical environments.
- replication; now although transactions are great for guaranteeing consistency and correctness in the presence of failures, they can't provide forward progress - a machine crash will be tolerated by a transaction, but if the transaction is retried and the crash remains or happens again, then the application will not move on. If you replicate the machine or just some of the services on it, using an appropriate replica consistency protocol, then you can tolerate a finite number of failures. The types of failure you can tolerate range from timing, through value, to Byzantine. As with transactions, replication protocols have been employed in standards-based and bespoke distributed systems for many years, in areas including air traffic control, finance and healthcare. JGroups is one of the foremost group communications frameworks and can be used as the basis of various replication protocols.
- security; making sure that your competitors can't read or write your data is pretty important, especially when it's in a shared environment. Making sure your users don't see data they're not entitled to is also a necessity. So security and authentication go hand in hand. In fact security is often one of the first things that any enterprise platform will incorporate, because otherwise it really doesn't matter if you can tolerate machine crashes if that just gives others longer to hack into your data and processes! The industry has worked on many protocols and models, such as XACML, TLS and Bell-LaPadula, so there are solid foundations in existing platforms.
- messaging; it goes without saying that in a distributed system you need some way for participants to communicate. That could be through approaches like RPC or asynchronous message passing. And you might also want to incorporate other techniques like retained results and timeout/retries to make your life easier in the world of lost responses and out of order delivery. Fortunately, because distributed systems live or die on their messaging implementations (reliability, performance, ability to cope with large payloads etc.), it's an area where maturity is a necessity. It doesn't matter if you were using CORBA or one of the bespoke distributed systems that predated even DCE, these things typically worked well. In Java EE, which builds on these, JMS is the standard, and really good implementations such as our HornetQ project exist and are well integrated into the platform.
- persistence; applications that don't need state are of very limited use! Whether your state is bank account details or the latest sky map details, it's got to be stored somewhere. In the past, relational databases (RDBMS) such as MySQL or Oracle have been the data repository of choice, although other obvious candidates can be used, such as the file system and replicated in-memory storage (after all, durability is probabilistic no matter what implementation medium you choose). These days we're seeing more discussions around what is being termed NoSQL (used to mean No SQL, but now tends to stand for Not Only SQL as people recognise it's not an either/or problem). Whereas RDBMS are good generalist solutions, there are problem areas where they are suboptimal, e.g., where the number of participants is large or they are physically remote. In these situations, theories such as CAP come into play and you have to make tradeoffs. There really is no such thing as a free beer! But NoSQL is still a relatively early field and not every use case needs to move away from RDBMS, or can tolerate the data inconsistencies that often occur in NoSQL, particularly when you realise that some implementations don't support transactions. (Hey, eventual consistency could very well be immediately too late for your application!) The work that the Infinispan and Hibernate teams are doing in this space is very important and well worth a look.
- standards; hopefully this one is fairly obvious? Vendor lock-in really doesn't help anyone except the vendor. Short term advantages, especially in the area of cost, can easily come back and bite you in the derrière! And where PaaS is concerned, let's not forget that being able to move out of the cloud is just as important as moving into it!
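To make the transactions point a little more concrete, here is a minimal sketch in plain Java of the all-or-nothing behaviour described above for a database update plus a message send. Everything in it is an assumption for illustration: `Participant`, `InMemoryResource` and `runAtomically` are hypothetical names, not the JTA or JBossTS API, and a real transaction manager also writes a durable log so that it can recover after a crash, which this toy leaves out.

```java
import java.util.ArrayList;
import java.util.List;

final class AtomicUpdateSketch {

    /** Hypothetical participant, loosely shaped like the prepare/commit/rollback steps of two-phase commit. */
    interface Participant {
        boolean prepare();  // vote: true means "ready to commit"
        void commit();
        void rollback();
    }

    /** Toy in-memory resource standing in for a database or a message queue. */
    static class InMemoryResource implements Participant {
        private final boolean failOnPrepare;
        private final List<String> committed = new ArrayList<>();
        private String pending;

        InMemoryResource(boolean failOnPrepare) {
            this.failOnPrepare = failOnPrepare;
        }

        /** Record work that only becomes visible if the transaction commits. */
        void stage(String value) {
            pending = value;
        }

        @Override public boolean prepare() { return !failOnPrepare && pending != null; }
        @Override public void commit() { committed.add(pending); pending = null; }
        @Override public void rollback() { pending = null; }

        List<String> committed() { return committed; }
    }

    /** Commits every participant if all vote yes; otherwise rolls every participant back. */
    static boolean runAtomically(List<? extends Participant> participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        InMemoryResource db = new InMemoryResource(false);
        InMemoryResource queue = new InMemoryResource(true); // simulate a broken message service
        db.stage("debit account by 10");
        queue.stage("send confirmation");

        boolean ok = runAtomically(List.of(db, queue));
        System.out.println("committed=" + ok + " db=" + db.committed() + " queue=" + queue.committed());
    }
}
```

The point of the sketch is only that the database update never becomes visible unless the message side also agrees to commit; coping with a crash between the two phases is exactly the hard part that mature transaction services handle for you.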
EE6 is a good standard that brings these and more together into a well-defined stack. And AS7 is the best implementation of that standard, one that puts to death the old myths and FUD that Java EE is bloated and unusable. I could write a multi-page technical paper about all of the above and maybe I will. Until then, I believe I've outlined why existing enterprise middleware stacks like AS7 (and specifically AS7!) are a very appropriate platform on which to build PaaS, and why AS7+OpenShift is a great way to put your toe into the water as well as jump in completely. And this is (and will be) irrespective of the application development language.