11 Replies Latest reply on Jul 15, 2003 6:54 AM by pbrant

    How does clustering on JBoss actually work?

    pbrant

      I have studied the paid doco for EJB clustering, and have thoroughly scanned this forum for any discussion regarding the underlying dynamic of clustering in JBoss...

      unfortunately, i have been unable to find satisfaction, due no doubt to my bad research abilities...


      Let me clarify my question.


      first, a description of my production environment.

      i am working on a system that has three logical tiers...

      1. a web tier, that consists of two physical servers, each configured with JBoss 3.2.1 (tomcat) and deployed with servlets and jsp (struts)

      2. an application tier, also two servers, configured with JBoss 3.2.1 (tomcat) and deployed with stateless session beans and entity beans...

      3. a data tier, two servers, configured with MySQL and using data replication...


      my question concerns the application tier and the nature of clustering as defined by JBoss...


      1. Stateless Session Beans:

      what does it mean to cluster a stateless session bean?

      why do it?

      what do you get?

      i came across the following, on the Oracle9iAS site:
      http://otn.oracle.com/tech/java/oc4j/doc_library/902/ejb/cluster.htm#1006535

      "Stateless session beans do not require any state to be replicated among nodes in a cluster. Thus, the only use of the clustering methods that stateless session beans have is load balancing between nodes."

      do you guys concur with this?
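      To make the quoted claim concrete, here is a minimal sketch of what "clustering" a stateless bean boils down to: since no node holds per-client state, a dispatcher can send each call to whichever node comes next. The names `QuoteService`, `QuoteServiceNode` and `RoundRobinQuoteService` are made up for illustration; this is not JBoss code, only the idea behind it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical business interface standing in for a stateless session bean.
interface QuoteService {
    String quote(String symbol);
}

// One cluster node. It keeps no per-client state, so any node can serve any call.
final class QuoteServiceNode implements QuoteService {
    private final String nodeName;

    QuoteServiceNode(String nodeName) {
        this.nodeName = nodeName;
    }

    @Override
    public String quote(String symbol) {
        return symbol + "@" + nodeName;
    }
}

// Round-robin dispatcher: each call goes to the next node in turn.
final class RoundRobinQuoteService implements QuoteService {
    private final List<QuoteService> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinQuoteService(List<? extends QuoteService> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    @Override
    public String quote(String symbol) {
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index).quote(symbol);
    }
}

final class StatelessBalancingDemo {
    public static void main(String[] args) {
        QuoteService service = new RoundRobinQuoteService(Arrays.asList(
                new QuoteServiceNode("app1"), new QuoteServiceNode("app2")));
        for (int call = 0; call < 4; call++) {
            // Alternates between app1 and app2; nothing needs to be replicated.
            System.out.println(service.quote("JBOSS"));
        }
    }
}
```

      Nothing is copied between the nodes at any point, which is the whole argument of the quote: with no state to replicate, load balancing is the only thing left to gain.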



      2. Entity Beans

      also from the same site:


      "The state of the entity bean is saved in a persistent storage, such as a database. Thus, when the client loses the connection to one node in the cluster, it can switch to another node in the cluster without worrying about replication of the entity bean state. However, to ensure that the state is updated from the persistent storage when the load balancing occurs, the entity bean that changes state notifies other nodes that their state is no longer in synch. That is, that their state is "dirty". At this point, nothing is done. If failover occurs and the client accesses another node for this entity bean, then the bean notices that its cache is dirty and resynchronizes its cache to the "READ_COMMITTED" state within the database. "


      Does this square with how JBoss handles entity bean clustering?
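      The scheme in the quote can be sketched in a few lines. This is only a model of the idea (a node that changes an entity tells its peers to drop their cached copy, and a peer reloads from the database the next time it is asked), not how JBoss or OC4J actually implement it. `SharedStore` and `CachingNode` are hypothetical names, and the "database" is an in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the shared database, the single source of truth.
final class SharedStore {
    private final Map<String, String> rows = new HashMap<>();
    int reads = 0; // counts trips to the "database"

    String load(String key) {
        reads++;
        return rows.get(key);
    }

    void save(String key, String value) {
        rows.put(key, value);
    }
}

// One app-server node with its own local entity cache.
final class CachingNode {
    private final SharedStore store;
    private final Map<String, String> cache = new HashMap<>();
    private final List<CachingNode> peers = new ArrayList<>();

    CachingNode(SharedStore store) {
        this.store = store;
    }

    void addPeer(CachingNode peer) {
        peers.add(peer);
    }

    String read(String key) {
        // Only go to the store when there is no valid cached copy.
        if (!cache.containsKey(key)) {
            cache.put(key, store.load(key));
        }
        return cache.get(key);
    }

    void write(String key, String value) {
        store.save(key, value);
        cache.put(key, value);
        // Send peers the key only, never the new state.
        for (CachingNode peer : peers) {
            peer.invalidate(key);
        }
    }

    void invalidate(String key) {
        cache.remove(key); // the next read on this node reloads from the store
    }
}

final class InvalidationDemo {
    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        store.save("order-1", "NEW");
        CachingNode a = new CachingNode(store);
        CachingNode b = new CachingNode(store);
        a.addPeer(b);
        b.addPeer(a);

        System.out.println(b.read("order-1")); // NEW, loaded from the store
        a.write("order-1", "SHIPPED");         // b's cached copy is dropped
        System.out.println(b.read("order-1")); // SHIPPED, reloaded from the store
    }
}
```

      Note that only the key travels between nodes. No entity state is replicated, so this buys cache consistency rather than failover; the state is always recoverable from the database.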



      thanks
      Peter


      PS. no one has done so in quite a while, so what the hell, i am gonna go for it, and claim the largest JBoss cluster...

      the architecture described above has:

      6 physical servers

      2 ethernet switches (failover ready), separating the web tier from the app tier

      2 ethernet switches (failover ready), separating the app tier from the data tier

      2 hardware-based firewalls (separating the web tier from the app tier)

      1 hardware based load balancer sitting in front of the web tier

      12 CPUs (Xeon)

      8 GB RAM

      600 GB HDD

      Gb Ethernet fabric



      c'mon guys, waddya got?

      :-)

        • 1. Re: How does clustering on JBoss actually work?

          http://www.jboss.org/modules/bb/index.html?module=bb&op=viewtopic&t=

          There is a cache invalidation service for entity beans.

          Regards,
          Adrian

          • 2. Re: How does clustering on JBoss actually work?
            buckman1

            Well, if that is the largest jboss cluster, we'll have you beat soon enough ;) Our initial foray is a simple 2 or 3 node jboss cluster. But we will no doubt grow them quite large in our ASP environment! So you win... for now. ;)


            So let me explain what I know to answer your questions, hopefully it will help somewhat.

            In your setup, your ideal situation is to do all the session fail-over, state replication, etc in the web tier. While as far as I know JBoss and other app servers can do stateful clustering, fail-over, etc in the ejb tier, it seems the web tier is a bit ahead in that arena. Maybe I am wrong, JBoss seems pretty capable, as do others. I have just seen the web tier used more often for this, for many reasons. One reason in particular seems to be the issue of sticky sessions and clients. We have some pretty elaborate Cisco load balancers and they work by keeping track of where a client is balanced to, using the user's cookie passed from the browser. In our situation, we happen to be using Java Swing clients connecting directly to the EJB tier, and we have not really found a good way to load balance between servers other than using JBoss' smart proxy client balancing capability. It works well enough, but seems to suffer a few minor issues.

            Anyway, back to your questions. Why cluster EJB? Well, I suppose if you can't maintain state in the client, it may be more beneficial to maintain it in the EJB tier. Somewhere you want to maintain state if necessary by your application. Not all applications require this. Another reason is a multiple request/response transaction. I can't really think of a good example, but there are cases where you may start a transaction, update part of a DB, continue with some info, and so forth. At any point, the transaction could be rolled back should the full requirements of completing the transaction not be met. This may be that a user must enter some info, it saves in the DB, then the user enters more info based on what they entered previously. By storing the state on the EJB side, you may be able to use a web client, applet client, CORBA client, RMI client, etc all connected to your system without each client having to maintain the state. Much like a cart system though, you'll want a way to ensure that if the server a client's info is on fails, the client doesn't see any interruption. Thus, that may be an example of when to use EJB stateful beans. I have not personally worked with stateful entity beans, and I am not sure why you would need to do that when the container caches beans as necessary, etc. Maybe someone can give a good example of that?

            If you are using a web tier, I would seriously consider doing statefulness in that tier. Unless you have outside applications having to connect directly to the ejb tier, httpsessions seem to me to be much simpler for statefulness, replication, etc. I would also consider using Jetty over Tomcat. It is a faster web server, and uses the same JSP engine as Tomcat.

            Crap, dinner time. Sorry, food is priority #1! ;) I hope I have helped a little bit. Post some more questions, keep the thread going, let's get a good line of questions and answers from all that know in regards to your post!

            • 3. Re: How does clustering on JBoss actually work?
              pbrant

              Hello Adrian,

              thank you for your response...

              if you don't mind i would like to clarify your statement, to ensure that i understand...

              "there is a cache invalidation service for entity beans"

              i take the above statement to mean that jboss does indeed implement the scenario described in my previous post, whereby an entity bean will broadcast a "dirty state status" message to all of its "peer entities" deployed throughout the cluster...

              is this correct?


              also, does this mean:

              a. that this service is supplied by jboss automatically when you configure entity beans for clustering?

              or

              b. that we need to configure this cache invalidation service in order to get the benefits?

              or what?


              also, i read the thread that you linked to, and the main posting that describes clustering for each layer in the stack was very interesting...thank you for that.

              more questions :-)

              1. you state the following:

              "Stateful session beans - failover/load balancing
              and replicated sessions"

              i think i understand the failover/load balancing, however i am uncertain about the replicated session part...

              when you say replicated session i have to assume that you are not referring to HTTP Sessions, rather you mean the session between a Remote Interface on a client and the Session bean on the server?

              am i correct so far?

              if so, am i correct in assuming that by replicating a client/server session across each app server in the cluster, jboss is able to provide seamless failover for the client in the case of a crashed server?

              is that how failover works at the session bean layer?



              2. regarding entity beans...

              what you state here is even more interesting...

              "Entity Beans - failover/load balancing
              if you really want to access entity beans remotely :-)
              State is in the db."


              with my current production deployment, the session beans are colocated with the entity beans in the same JVM, and this arrangement is duplicated across each app server in the cluster...

              also, my web tier does not access the entity beans directly, rather access is always performed through a session bean...

              so, given my situation, does your statement regarding the inadvisability of accessing entity beans directly from a remote client, imply that i should not bother clustering entity beans, since i am already clustering the session beans?

              in other words, is it redundant to cluster both session beans and entity beans if they are colocated in the same JVM?


              thank you for any guidance that you can provide.

              Peter

              • 4. Re: How does clustering on JBoss actually work?
                pbrant

                Hello Kevin,

                thank you for the thoughtful response, you have posted a lot here, so let me take it one portion at a time...

                first off, i have no doubt that you will soon surpass my *massive cluster*,

                "glory is fleeting" :-)

                i am impressed that you have committed to jboss for an ASP application, that is a huge vote of confidence for the jboss platform, very cool, i wish you great success...


                to business.

                You state the following:

                "In your setup, your ideal situation is to do all the session fail-over, state replication, etc in the web tier."

                the client was quite specific in that they wanted to keep the web tier completely stateless and lightweight...basically a pass thru layer, using struts...

                i think that the motivation behind this was security...in the sense that the web tier is exposed to the Internet, and the app tier is behind firewall protection, therefore keep all state, including session state on the app tier, and protected...

                i am only guessing here, "who knows what clients really think" :-)...

                does this line of thinking make sense?

                are they in fact getting better security by relying on the app tier for replication/failover?

                perhaps there is a performance reason as well...

                keep in mind that my application stack is purely web based, using struts, and it would seem to me that using non-sticky sessions may improve performance...

                by relying on session bean clustering, and i assume this implies the replication of "Remote Interface/Session Bean" sessions between the web server and each app server (see my previous posting to Adrian regarding this), we free up the web server to decide which client request is best routed to which app server, on a request by request basis...

                would this not lead to a more performant application?

                or is the overhead implication going to drown out any benefits?

                or do i know what the hell i am talking about?

                your thoughts?


                another question, on this topic...

                by relying on all state replication and failover in the web tier, are we not then relying on the abilities of the embedded servlet container, be it jetty or tomcat?

                i mean, it is the servlet container that handles HTTP Sessions, and it also manages the Servlet and JSP resources underlying Struts, and finally, in my Web Tier, i am only deploying the .war file to each web server...

                am i on the right track here?

                is that what you meant when you said "the web tier is a bit ahead in that arena" ?



                yet another question, or perhaps simply a thought stream...


                i have read somewhere that HTTP Session replication is quite expensive and will therefore not scale very well...but performance is relative, and is a meaningless concept unless compared against an alternative...

                so, let's compare with Session Bean replication...


                what are your thoughts on this?

                it seems like you guys had no choice but to rely on HTTP Session replication since you are offering highly interactive swing based clients, but if you had the choice, which would you choose, from a performance standpoint?


                one more,

                what do you mean by "smart proxies", i don't remember reading about these, could you point me to info on these?



                finally,


                "I would also consider using Jetty over Tomcat. It is a faster web server, and uses the same JSP engine as Tomcat."

                the client specified tomcat because it is the "reference" and is therefore politically safer i guess...

                i would agree from a design standpoint...i have had the pleasure of investigating, in depth, the source code and architecture of Jetty v4.2.10 and Tomcat v4.1.24...

                Jetty is tight, elegant, and quite straight forward...it took me about 2 hours to compile from source and figure out how things hang together...

                Tomcat on the other hand, good god, what a mess!
                I'll spare the details, but one thing was particularly disturbing, the external jar dependencies...it borders on spaghetti....if you don't believe me try downloading the source and compiling it...


                but i wonder, is Jetty really faster?
                i mean, in your experience?

                have you tested between them and compared, on your current ASP implementation...

                i will say this, they are seamlessly compatible, disregarding a few startup parameters, i know this because our developers are using embedded jetty and the production servers use tomcat....lunacy i know, but there you have it.


                i also know that tomcat has made optimization a priority starting with v4, and reading their release notes, it continues to be a priority for v5...

                v4 is certainly much faster than v3...i have direct experience of this

                also, i wonder, considering the recent split between JBoss and Core Developers Network...JBoss just announced that tomcat is the new priority and default web container for jboss v4 and moving forward...and of course, ALL, of the core developers for jetty went with CDN, so you have to wonder about the political ramifications...




                look forward to your response...
                Peter

                • 5. Re: How does clustering on JBoss actually work?
                  pbrant

                  Kevin,

                  i forgot to mention that i very much agree with your suggestion about building this thread of discussion...

                  let's thrash the sh*t out of it, and build some shared practical knowledge regarding clustering on jboss....


                  i am certainly game if you are...i know i have many more questions and points of discussion :-)


                  cheers
                  Peter

                  • 6. Re: How does clustering on JBoss actually work?

                    > Hello Adrian,
                    >
                    > thank you for your response...
                    >
                    > if you don't mind i would like to clarify your
                    > statement, to ensure that i understand...
                    >
                    > "there is a cache invalidation service for entity
                    > beans"
                    >
                    > i take the above statement to mean that jboss does
                    > indeed implement the scenario described in my
                    > previous post, whereby an entity bean will broadcast
                    > a "dirty state status" message to all of its "peer
                    > entities" deployed throughout the cluster...
                    >
                    > is this correct?
                    >

                    Yes

                    >
                    > also, does this mean:
                    >
                    > a. that this service is supplied by jboss
                    > automatically when you configure entity beans for
                    > clustering?
                    >
                    > or
                    >
                    > b. that we need to configure this cache invalidation
                    > service in order to get the benefits?
                    >
                    > or what?
                    >

                    http://www.onjava.com/pub/a/onjava/2003/05/28/jboss_optimization.html
                    At the end of the second page.

                    >
                    > also, i read the thread that you linked to, and the
                    > main posting that describes clustering for each layer
                    > in the stack was very interesting...thank you for
                    > that.
                    >
                    > more questions :-)
                    >
                    > 1. you state the following:
                    >
                    > "Stateful session beans - failover/load balancing
                    > and replicated sessions"
                    >
                    > i think i understand the failover/load balancing,
                    > however i am uncertain about the replicated session
                    > part...
                    >
                    > when you say replicated session i have to assume that
                    > you are not referring to HTTP Sessions, rather you
                    > mean the session between a Remote Interface on a
                    > client and the Session bean on the server?
                    >
                    > am i correct so far?
                    >
                    > if so, am i correct in assuming that by replicating a
                    > client/server session across each app server in the
                    > cluster, jboss is able to provide seamless failover
                    > for the client in the case of a crashed server?
                    >
                    > is that how failover works at the session bean
                    > layer?
                    >

                    Yes, failover means replicating the state across the
                    cluster.

                    >
                    >
                    > 2. regarding entity beans...
                    >
                    > what you state here is even more interesting...
                    >
                    > "Entity Beans - failover/load balancing
                    > if you really want to access entity beans remotely
                    > :-)
                    > State is in the db."
                    >
                    >
                    > with my current production deployment, the session
                    > beans are colocated with the entity beans in the same
                    > JVM, and this arrangement is duplicated across each
                    > app server in the cluster...
                    >
                    > also, my web tier does not access the entity beans
                    > directly, rather access is always performed through a
                    > session bean...
                    >
                    > so, given my situation, does your statement regarding
                    > the inadvisability of accessing entity beans directly
                    > from a remote client, imply that i should not bother
                    > clustering entity beans, since i am already
                    > clustering the session beans?
                    >
                    > in other words, is it redundant to cluster both
                    > session beans and entity beans if they are colocated
                    > in the same JVM?
                    >

                    You will need it for cache invalidation, you
                    don't need it for the session beans.

                    >
                    > thank you for any guidance that you can provide.
                    >
                    > Peter

                    Regards,
                    Adrian

                    • 7. Re: How does clustering on JBoss actually work?
                      buckman1

                      Hey Peter,

                      Sorry, did not know that the web tier had to remain stateless. Are your clients running a java swing client to connect through web services at all? You indicate that it will be stateless and a "pass thru". A couple of thoughts arise. First, you could use SSL and, even better, a hardware SSL decoder/encoder past your first firewall that decodes ALL SSL so that your web container itself never has to deal with the SSL stuff. They have devices for a few grand that decode thousands of SSL requests simultaneously these days. By going SSL with 128-bit encryption, that should eliminate any security woes, but that isn't always going to impress your clients.

                      But in reality, we too face some clients that refuse to open any ports other than 80 and 443, and thus our swing client using port 1100 is out of the question. These are the facts of life, and the client (or customer) is always right if you want to keep the job. Therefore, I'd say the line of questioning is not really relevant unless you feel you can persuade this client into going the route above or something similar. I am guessing they are using web browsers.

                      I personally feel the web tier would be faster at replication because in general it has a lot less to deal with. In-memory replication simply serializes/deserializes objects as the setAttribute() call is made on a given session. Multi-cast and/or other methods are used to sync the various nodes of a web tier cluster, but in general, the clustering doesn't deal with remote objects, entity beans, etc. It only deals with the httpsession objects. Therefore, although I can't back this up with actual facts at this time, I would think the web tier would be better suited to cluster and handle fail over. Plus, since ALL requests come in through this tier, it is the first point you are going to be handling lots of traffic through. I am also guessing the ejb tier has a lot more overhead when creating stubs, remote interfaces, objects, entity beans, etc than the web tier. Thus, the web tier may be a bit better at handling lots of clients than the ejb tier. Most likely you would put hardware load balancers and/or switches in front of the web tier, as well as between the web tier and the ejb tier. My personal preference is to always keep the ejb tier stateless, as it is faster and thus handles more. Let it do all the logic of the app, but store the state in the web tier because of its relative ease and because it's the place you will build your pages from. You might be able to get better performance on your site as well. If the state lives in the ejb tier and a user requests to refresh his cart, you have to make an extra trip to the ejb side to get his information state. If it's on the web tier, it's right there. That alone could aid in better performance, avoiding tons of ejb tier calls.
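                      As a rough illustration of the in-memory replication described above, the sketch below serializes an attribute when setAttribute() is called and hands each peer node its own deserialized copy. `ReplicatedSession` is a made-up class, not the servlet API or any container's real implementation, and the "network" is just a direct method call between objects in one JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical session object; one instance per node in the cluster group.
final class ReplicatedSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private final List<ReplicatedSession> peers = new ArrayList<>();

    void addPeer(ReplicatedSession peer) {
        peers.add(peer);
    }

    void setAttribute(String name, Serializable value) {
        attributes.put(name, value);
        byte[] wire = serialize(value); // serialized once per call
        for (ReplicatedSession peer : peers) {
            // Each peer deserializes its own independent copy.
            peer.attributes.put(name, deserialize(wire));
        }
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    private static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("attribute could not be serialized", e);
        }
    }

    private static Object deserialize(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("attribute could not be deserialized", e);
        }
    }
}

final class SessionReplicationDemo {
    public static void main(String[] args) {
        ReplicatedSession node1 = new ReplicatedSession();
        ReplicatedSession node2 = new ReplicatedSession();
        ReplicatedSession node3 = new ReplicatedSession();
        node1.addPeer(node2);
        node1.addPeer(node3);

        ArrayList<String> cart = new ArrayList<>();
        cart.add("book");
        node1.setAttribute("cart", cart);

        System.out.println(node2.getAttribute("cart"));         // [book]
        System.out.println(node3.getAttribute("cart") == cart); // false: a copy, not the same object
    }
}
```

                      One consequence worth noticing in a model like this: changing the cart object after the call does not reach the other nodes, because they hold copies. Replication is triggered by setAttribute(), so the attribute has to be set again after it is modified.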

                      As for non-sticky sessions, well again I refer you to reading up on hardware load balancers. They can handle a LOT more than your web servers most likely. The Cisco switches we have have self-contained cookie capabilities and I am betting most pro grade balancers, switches, etc have similar capabilities. So long as the first request to a server can be saved in the balancer so that subsequent requests go to that server, you are fine. Although, if you are using in-memory replication, it really shouldn't matter what server a web request goes to, as long as it goes to the same partition or island, cluster group, what have you. That is, if you do the "good setup" of 3 nodes per cluster, you want to make sure that the client always hits that same group of nodes. Every time your app stores something in the http session it will automatically be replicated across to the other two nodes. Therefore, if the next request goes to any of the 3 nodes, that should be fine. With EJB it is a bit different I assume, we are also figuring this part out. Ideally we see that the user's requests should remain at the same server, mostly because of the entity bean caching stuff. If the user pulls a large query on one server, then comes back and hits another server, that large query has to be done again before the data is available. That to me is another reason to keep state at the web tier. Your web tier can request data from any ejb node and not worry about state. So long as they are load balanced, you don't need them failed over since none will contain any state to fail over from.

                      As for more performance using the web tier to figure out what ejb server to go to, well, it sounds more like you are using the web tier to serve pages, but also to act as a load balancer to the ejb tier. I don't know that I agree with this approach, but again I am not an expert on this. I have not done what you are doing in this regard.

                      This is also my second attempt at clustering. My first was with Orion app server which was pretty nice. JBoss has more capabilities as far as I can tell, and they both offer the multi-cast auto discovery, etc. Both were very simple to initially get nodes to see each other. I only did web clustering on Orion, never EJB, so I don't know how well it performs there.

                      Yes, the web container itself would handle the clustering stuff as well. I don't know how good Tomcat is, but Jetty has pretty good clustering and replication capabilities. I still think in-memory is best with 3 nodes per cluster group. If you need to scale, add another partition of 3 nodes. Hopefully sub-node clustering will come into place soon. Basically it's the same thing as adding more partitions, only you add nodes to the same partition and group 3 nodes to cluster with each other and no others. This at least allows a single app to belong to a single partition, but still allows you to scale that particular partition without the growing memory requirements you would get if you added more nodes to the same one cluster group.

                      As for HTTPSession being expensive, naturally it is a lot more expensive than a non-clustered app. But again I believe the ejb tier is more expensive due to its larger overhead in what it achieves. I don't know for sure, but when you store an object in the http session, it gets serialized, that one object, to each node in the same group. In a 3 node setup, this should be relatively fast, only two serialize/deserialize steps are needed to ensure the user's state is failed over. It may be similar in EJB, but I think with the worry of entity beans, cached entities, local and remote references, and so forth, there is a lot more that can go wrong and a lot more that can eat up cycles in the ejb tier.

                      Also, keep in mind different replication mechanisms eat up more or less time but offer more or less capabilities. If you use a DB that all nodes replicate to, you don't need the 3-node limit per cluster group. You can expand forever, so long as your DB can handle all the extra nodes you may add. You can even load-balance the DB layer that handles storing httpsession state so that if you have 10 nodes with one DB and that proves to be the "max", adding another 10 nodes should require a 2nd DB and a load balancer between them. Then you have to decide whether you should replicate the data to all DBs, or whether the 10 nodes should always use the one DB, in which case your state replication must somehow be "sticky" between the 10 nodes and the specific DB that they use (if you added 2 or more DBs behind a load balancer).

                      Smart proxy is how JBoss allows a client to load balance without the use of a load balancer. RMI allows an object to be sent back to the client from the server. Normally this is an empty stub or something (not an expert on this either). JBoss sends back a little bit extra. It sends code back that allows ALL client requests to use this one proxy to load balance between the app servers. You should see something like this regardless of client (Swing or web). In other words, you don't have to use a load balancer. Keep in mind the point of failure issues. If you have a load balancer, you need two and two virtual IPs for your one main IP to your site. Otherwise you stand a single point of failure at your load balancer. Our IT guy hammered this into our heads! ;) But because JBoss returns a "smart proxy", every client request can be round robin or first available (or any other policy you decide for your beans) to the ejb tier. Thus, you can get by without a load balancer and the single point of failure would be your client itself. Read up more on that, or ask more and I'll respond more later on that topic.
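                      To give a feel for the idea, here is a small sketch of a client-side proxy that picks a node per call and moves on to the next one if a node fails. It uses the standard java.lang.reflect.Proxy as a stand-in; the real JBoss proxy is sent down from the server and works differently inside. `Greeter`, `GreeterNode` and `SmartProxyHandler` are hypothetical names, and the "nodes" are local objects rather than remote servers.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical remote interface of a bean.
interface Greeter {
    String greet(String name);
}

// Stand-in for the bean on one server; "up" simulates whether the server is alive.
final class GreeterNode implements Greeter {
    private final String nodeName;
    private final boolean up;

    GreeterNode(String nodeName, boolean up) {
        this.nodeName = nodeName;
        this.up = up;
    }

    @Override
    public String greet(String name) {
        if (!up) {
            throw new IllegalStateException(nodeName + " is down");
        }
        return "hello " + name + " from " + nodeName;
    }
}

// Client-side handler: round robin across targets, with failover to the next one.
final class SmartProxyHandler implements InvocationHandler {
    private final List<Object> targets;
    private final AtomicInteger cursor = new AtomicInteger();

    SmartProxyHandler(List<?> targets) {
        this.targets = new ArrayList<>(targets);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Throwable last = null;
        for (int attempt = 0; attempt < targets.size(); attempt++) {
            Object target = targets.get(Math.floorMod(cursor.getAndIncrement(), targets.size()));
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Simplification: any failure counts as "node down". A real proxy
                // would retry only on communication errors, not application ones.
                last = e.getCause();
            }
        }
        throw last != null ? last : new IllegalStateException("no targets configured");
    }

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, List<? extends T> targets) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new SmartProxyHandler(targets));
    }
}

final class SmartProxyDemo {
    public static void main(String[] args) {
        Greeter greeter = SmartProxyHandler.create(Greeter.class, Arrays.asList(
                new GreeterNode("app1", false), new GreeterNode("app2", true)));
        // app1 is down, so the call silently fails over to app2.
        System.out.println(greeter.greet("Peter")); // hello Peter from app2
    }
}
```

                      The client code only ever sees the `Greeter` interface. The choice of node and the retry live inside the proxy, which is why no separate load balancer is needed between the client and the ejb tier.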

                      I did not read anything about the tomcat/jetty issue. I was under the impression that Jetty was the web container of choice for good. Sacha, Bill, can either of you shed some light on this topic? Perhaps JBoss is trying to be more "standards" based by using the Reference Tomcat as opposed to Jetty? I don't know for sure.

                      Jetty is faster in regards to web pages being returned. JSP pages are another story, they both use the same JSP engine, Jasper. Not entirely sure these days, but I also don't care for how Tomcat is to be configured. Kind of a mess in my opinion.

                      Ask away, I'll answer more later.

                      • 8. Re: How does clustering on JBoss actually work?
                        pbrant

                        >> in other words, is it redundant to cluster both
                        >> session beans and entity beans if they are
                        >> colocated in the same JVM?

                        >You will need it for cache invalidation, you don't need it for the session beans.

                        so, to clarify, what i think you are saying is:

                        it is NOT redundant to cluster both session beans and entity beans colocated in the same JVM...

                        each clustering strategy (session, entity) provides different benefits...which coexist and provide synergistic benefit.

                        1. clustering session beans provides seamless session failover to the client in case of a server crash

                        2. entity beans, are in essence, a highly robust object cache, which ultimately, if designed and used properly, will dramatically improve application performance...

                        in other words, entity beans enable us to cache system state in working memory, and therefore minimize database interaction.


                        by clustering entity beans, we are enabling a distributed object cache that spans each application server in the "java group" cluster...

                        however, clustering entity beans does not imply replication of state and therefore does not provide failover...


                        am i getting more precise in my understanding, or am i veering off course?


                        also, thank you for the link to that article, it is excellent.

                        i intend to study this article more closely, is it ok to post questions when they come up, i know you are extremely busy?



                        thanks
                        Peter

                        • 9. Re: How does clustering on JBoss actually work?
                          pbrant

                          Kevin,

                          wow, great response, thank you.

                          once again, i will take it one portion at a time....


                          1. "Sorry, did not know that the web tier had to remain stateless.."

                          hmmm, you know, to be honest, i can't remember if the client actually specified that the web tier should be stateless, or if our senior architects simply dictated it as a "best practice"...

                          i think the latter....in fact, now that i think of it, i am certain that it came from us, not the client....

                          and yes, you are correct, the application is accessed exclusively by browsers, this is what i meant by web-based, but i see now that saying "web-based" in no way implies browser access...sorry for the lazy nomenclature.

                          i tend to agree with the philosophy (design pattern) of creating a thin presentation layer, preferably through a browser, that remains thin and non-complex...but then again, of course i would...fully interactive, gui based clients are so much more complex and time consuming...even with the relative cleanliness of swing...

                          anyway, our application is accessed exclusively by Internet browsers, and the presentation layer is remarkably thin, considering the functionality that it provides...

                          why did you guys settle on the much more complex swing platform to implement your presentation layer...is it because you needed advanced user interaction?

                          or what?


                          2.
                          "I personally feel the web tier would be faster at replication because in general it has a lot less to deal with.", and the rest of this paragraph...

                          yeah, everything you say here rings true, now that you say it, it is obvious, of course the web tier has less to deal with...clearly this is true.


                          3.
                          "My personal preference is to always keep the ejb tier stateless"

                          hang on, by stating that the ejb tier should be stateless, are you not implying that an application should only implement stateless session beans in the app tier, and forgo entity beans entirely?

                          what i mean is, entity beans, are by definition, stateful, that is their point for being, as i understand it, so by suggesting a stateless ejb tier, are you not, by implication, suggesting an app tier with no entity beans?

                          please clarify.


                          4. ok, here is the meat!

                          "if you do the "good setup" of 3 nodes per cluster"

                          why is 3 nodes per cluster the ideal?
                          what are the advantages?
                          the disadvantages?


                          also, you say,

                          "I still think in-memory is best with 3 nodes per cluster group. "


                          by *in-memory*, i assume you mean in-memory state, stored in an http session, and the distribution of this state to peers in the cluster...

                          is this correct?



                          so, to summarize, as i understand it, you are suggesting that we push the clustering technology as far up the application stack as possible, right into the web tier...

                          then, we intercept and process incoming requests as quickly and directly as possible, and deal with each request in a clustered context as early as possible,

                          furthermore, by pushing clustering into the web tier, we must, almost by definition, support the concept of sticky sessions...and implement it in our clustered web tier...

                          so the emphasis is to minimize the notion of clustering the session and entity beans in the app tier, and the fundamental rationale is that it is much easier, more proven, and ultimately, more efficient, than the app server alternative...

                          is this correct?


                          Peter

                          • 10. Re: How does clustering on JBoss actually work?
                            buckman1

                            Hi,

                            As for why we went with Swing over web, well, I can say that one application was too complex for modern-day browsers. Meaning, we had the ability to add many, many lines, ongoing, dynamically, each line containing several input boxes. But more so, it worked across 7 frames! That is, we had different displays in each frame to keep the user updated. If they changed something in one box, with javascript I had to update a page in another frame. When they clicked on a submit button I had to submit 4 frames, and hope all came back in the right order, which was never the case! It got very complex and would definitely have been much easier to implement in a Swing client.

                            Some applications are just extremely difficult to build rapidly in web form, and I think part of it is also security, and the other thing is the much greater control over the look of the app. Web is just too limited for certain types of apps, and doesn't lend itself well to those that require multiple views going at one time. It works, it is just overly complex and too easy to get screwed up. Other business reasons play into this as well.

                            Not that I don't like the web stuff. I built my own MVC framework which I am about to open source, got to know servlets/jsp, clustering (through Orion at the time), session state replication, etc. I even wrote a chapter in a Wrox book on performance and scalability of web apps. I love the stuff.


                            As for ejb stateless, that is not at all what I meant. I mean keep the session beans stateless, so that communication between your stateful web layer and your stateless logic/ejb layer is very fast and simple, and you know ejb is doing your business logic but doesn't need to maintain any state. Entity beans, as you said, are stateful in that they represent the data in the DB. So naturally you have stateful entity beans, but not in the manner where they would be failed over. I can't quite come up with a reason why entity beans would ever be used for user state. They are cached with data, store data, yes, but they shouldn't be storing the state of a user, such as user id, user name, login time, etc. I mean, actually, they would be used to store that data, but you wouldn't generally use entity beans to store the state that is accessed from the web tier! That would mean your web tier has to make a remote call to a session bean, which in turn has to make either a local (hopefully) or remote call to an entity bean to get that info. And if entity bean state is not replicated, you are then forced to make sure the client ALWAYS goes to the same one server, which then becomes a single point of failure again. Thus, I see no point in entity beans ever being part of the stateful fail-over equation. However, I can see where they may want to "replicate" their data across nodes so that whichever server's entity bean an ejb session bean uses (whether remote or local), it has the same cached data.

                            As for the "meat". If you are doing in-memory replication, which basically means that for every node in the same cluster, its entire HttpSession state (all objects in it) is replicated to every other node in that cluster, you have to consider memory. If one node has 4GB of memory, and the server actually uses say 2GB of that for session state, and all that state has to replicate to the other nodes, then each node has to have enough memory for its own state plus every other node's state. Two things here. First, if any one server is using 2GB worth of HttpSession, your app is seriously in need of rewriting. Session state is for things like cart data, login info, etc. Anything that is not used that often and can be retrieved by a remote lookup (which generally is going to be pretty fast, since most two-tier web/ejb setups sit within the same 100Mbit or 1Gbit LAN in a rack somewhere in the same colo facility... but not always) should stay on the ejb side. With entity caching, and more on the ejb side, there is no need to store stuff in the HttpSession that is easily retrievable from the ejb side of things and is not used that often. On the other hand, dynamic data, such as form data while forms are being processed, no doubt has to stay in the http session. Second, another thing I have seen is that developers often don't clean up their state! They fill it up, keep a ref to it somewhere, or wait for the GC to clean up. The thing is, when you are done with a particular "state", such as when a user in a cart system submits his/her order, remove the state from the HttpSession. No need to keep it lingering. Even when the application is done using the state, the user has logged out, whatever, if the HttpSession data for that user is not removed, it lingers, thus taking up more memory, on every node no less.
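
To put rough numbers on the replication cost described above, here is a minimal sketch in Java. It assumes full replication (every node holds a copy of every other node's session state) and that each node generates about the same amount of session data. The class and method names are made up for illustration and are not part of JBoss, Jetty, or Tomcat.

```java
public class ReplicationFootprint {

    /**
     * Session memory each node must hold under full in-memory replication:
     * its own session state plus a copy of every other node's state.
     * Assumes every node generates roughly ownSessionGb of session data.
     */
    static double requiredPerNodeGb(int nodes, double ownSessionGb) {
        return nodes * ownSessionGb;
    }

    /** True if the replicated session state fits in one node's memory. */
    static boolean fits(int nodes, double ownSessionGb, double nodeMemoryGb) {
        return requiredPerNodeGb(nodes, ownSessionGb) <= nodeMemoryGb;
    }

    public static void main(String[] args) {
        // The 4GB node / 2GB session state example, on a 3-node cluster.
        System.out.println("needed per node (GB): " + requiredPerNodeGb(3, 2.0));
        System.out.println("fits in 4GB: " + fits(3, 2.0, 4.0));
        // A leaner app keeping 0.5GB of session state per node.
        System.out.println("0.5GB each fits in 4GB: " + fits(3, 0.5, 4.0));
    }
}
```

The point of the arithmetic is only that the replicated footprint grows with the number of nodes in the cluster, which is why small session state and small cluster groups go together.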

                            In-memory is also the fastest stateful capability, but not always the best in all cases. Ideally you only need 2 nodes to have session failover, right? But the problem is, if one node dies, you leave one node left with no fail-over path. With a 3-node setup, if one node dies, you still have two nodes.

                            You would also want to test performance on two nodes only, leaving the 3rd as an extra "cushion" for peak times, as well as for when one node dies. Ideally, load test for about 80 to 90% capacity on two nodes. Add a 3rd node and you should be able to do continuous 60% or so (give or take..) and leave room for spikes, peak traffic.
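
The 80 to 90% on two nodes versus roughly 60% on three figure can be checked with a little arithmetic. The sketch below is a back-of-the-envelope calculation in Java, assuming the load balancer spreads load evenly and that capacity scales linearly with node count (real clusters lose some capacity to replication overhead). The names are hypothetical.

```java
public class ClusterHeadroom {

    /**
     * Per-node utilization when a load that was measured as
     * testedUtilization on testedNodes is spread evenly over liveNodes.
     * Assumes even balancing and linear scaling.
     */
    static double utilization(int testedNodes, double testedUtilization, int liveNodes) {
        if (liveNodes <= 0) {
            throw new IllegalArgumentException("need at least one live node");
        }
        double totalLoad = testedNodes * testedUtilization; // in "node capacities"
        return totalLoad / liveNodes;
    }

    public static void main(String[] args) {
        for (double tested : new double[] {0.80, 0.90}) {
            System.out.printf("tested at %.0f%% on 2 nodes: %.0f%% on 3 nodes, %.0f%% if only 1 is left%n",
                    tested * 100,
                    utilization(2, tested, 3) * 100,
                    utilization(2, tested, 1) * 100);
        }
    }
}
```

With one of the three nodes down, the cluster is back at the tested two-node load, which is the "cushion" case. With two nodes down, the remaining node is pushed past 100%, so the setup only tolerates a single failure at that load.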

                            Now, let's not confuse the issue by assuming all containers are created equal. Jetty handles in-memory and DB. In-memory is best suited for 3 nodes per cluster. If you use DB, you can use a single partition, and have all nodes store state in the DB. It is slower, and if you don't have redundancy at the load balancer and DB layer for storing state you may introduce yet another single point of failure, but you can add node after node to scale. I personally don't like this setup.

                            Yes, you have to use sticky sessions. A good load balancer (Cisco) is sticky/cookie aware. It will properly take a user's cookie and keep track of what server it went to, always routing the user to that server. In case of failure, it is smart enough to route the user to another server in the same cluster. It may even be possible to have the load balancer updated with cpu usage per node, requests per node, and other performance measures to ensure optimum load balancing.
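
The sticky routing behaviour described above can be sketched as a toy router in Java. This is only an illustration of the idea (pin a session to a node, re-pin to another live node in the same cluster on failure). It is not how any particular load balancer is implemented, and the class name, node names, and round-robin choice for new sessions are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StickyRouter {
    private final List<String> liveNodes = new ArrayList<>();
    private final Map<String, String> pinned = new HashMap<>(); // session id -> node
    private int next = 0; // round-robin cursor for new or re-pinned sessions

    public StickyRouter(Collection<String> nodes) {
        liveNodes.addAll(nodes);
    }

    /** Called when a health check decides a node is gone. */
    public void markDead(String node) {
        liveNodes.remove(node);
    }

    /** Returns the node that should serve this session's request. */
    public String route(String sessionId) {
        if (liveNodes.isEmpty()) {
            throw new IllegalStateException("no live nodes in cluster");
        }
        String node = pinned.get(sessionId);
        if (node == null || !liveNodes.contains(node)) {
            // New session, or its node died: pick a live node and stick to it.
            node = liveNodes.get(next % liveNodes.size());
            next++;
            pinned.put(sessionId, node);
        }
        return node;
    }

    public static void main(String[] args) {
        StickyRouter router = new StickyRouter(List.of("node1", "node2", "node3"));
        String first = router.route("session-A");
        System.out.println("session-A pinned to " + first);
        System.out.println("same node again: " + first.equals(router.route("session-A")));
        router.markDead(first);
        System.out.println("after failure, session-A goes to " + router.route("session-A"));
    }
}
```

The failover step only helps if the new node already has the session state, which is what the in-memory replication within the cluster group is for.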

                            I am not the authoritative figure on all this. I am going by what I have researched, seen in use, and played with myself. I feel the web stack is a better and less risky place to load balance. There are specific uses for ejb statefulness, transactions in particular, although I personally haven't seen a use for it and prefer the stateless setup for its better performance and less worry about fail-over issues. I also feel that if you are doing a web app, the web tier is by far the best place for state. You are generating pages dynamically based on the state. So why have to make calls to the ejb tier to get that state? If a user changes state and saves it, you store it through ejb. If a user flips pages and you need to determine how to build a page based on previous selections, then using HttpSession is the only way to go for performance. I suppose a combo of both could be done as well.

                            Hope that helps.

                            • 11. Re: How does clustering on JBoss actually work?
                              pbrant

                              Kevin,

                              very helpful indeed,

                              thank you so much for your high quality input and analysis, it is most appreciated!

                              i have much to think about.


                              i will reply meaningfully, as soon as i wrap my head around the above content.


                              thanks again.
                              Peter