9 Replies Latest reply on Oct 28, 2009 2:09 AM by rmdavidson

    3-node cluster with stickySession and failover

    rmdavidson

      Hi everyone,

      I've managed to mostly get mod_cluster working in a cluster with two Apache load-balancer boxes running mod_cluster 1.0.2.GA and 3 JBossAS 4.2.3.GA nodes.

      Sticky sessions are working well in general - people are being routed to various web nodes and their sessions are sticky to those nodes.

      I've been playing with the stickySessionRemove and stickySessionForce settings. If I understand correctly, stickySessionRemove should remove the cookie if the session cannot be routed to the node it's stuck to, e.g. if that node goes offline. If that's the case, then shouldn't the cookie be updated to contain the name of a node that IS running, possibly on the second request?

      What I'm trying to achieve is to have a 3 node cluster, with sticky sessions. When one node dies, the sessions assigned to that node should be re-assigned to other nodes in the cluster as needed. The cookie should then be updated and the user will be stuck to a new node.

      Our 3 JBossAS node jvmRoute settings are ws01, ws02 and ws03. If a user is stuck to node ws02, for example, and I shut down this node, they will be routed to either node ws01 or ws03. Their JSESSIONID cookie is not updated and still has ".ws02" appended to the end. So I'm guessing that since this jvmRoute is now dead, mod_cluster is sending the request to any available node, but it's also NOT updating the cookie. Is it possible to get mod_cluster to update the cookie to a valid node automatically?
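
      To illustrate what I mean (session ID shortened, values hypothetical):

      ```
      # Before failover: session stuck to ws02
      JSESSIONID=ABCD1234.ws02

      # After ws02 is shut down: requests now go to ws01 or ws03,
      # but the cookie still carries the dead route
      JSESSIONID=ABCD1234.ws02    # expected: ABCD1234.ws01 (or .ws03)
      ```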

      JBossAS Config (jboss-web.deployer/server.xml):

      
      <Listener className="org.jboss.modcluster.ModClusterListener"
       advertise="false"
       proxyList="lb1:80,lb2:80"
       excludedContexts="ROOT,invoker,jbossmq-httpil,jbossws,jmx-console,juddi,web-console"
       balancer="mybal"
       domain="mydomain.example.com"
       stickySession="true"
       stickySessionForce="false"
       stickySessionRemove="true"
       nodeTimeout="300"/>
      
      


      Then further down, in the same file, under the AJP connector section, we have:

      
      <Engine name="jboss.web" defaultHost="localhost" jvmRoute="ws01">
      
      


      In Apache, the following config applies. Basic mod_cluster config:

      
      LoadModule slotmem_module modules/mod_slotmem.so
      LoadModule manager_module modules/mod_manager.so
      LoadModule proxy_cluster_module modules/mod_proxy_cluster.so
      LoadModule advertise_module modules/mod_advertise.so
      
      <Location /mod_cluster-manager>
       SetHandler mod_cluster-manager
       Order deny,allow
       Deny from all
       Allow from 127.0.0.1
       Allow from xxx.xxx.xxx.xxx
      </Location>
      
      


      Apache virtualhost configuration:

      
      <VirtualHost *:80>
       ServerName mydomain.example.com
       ManagerBalancerName mybal
       ServerAdvertise off
       CreateBalancers 1
       ProxyPass / balancer://mybal/ stickysession=JSESSIONID
      </VirtualHost>
      
      



        • 1. Re: 3-node cluster with stickySession and failover
          jfclere

          domain="mydomain.example.com"...
          If all the nodes are in the same domain, the sessionid is not going to be removed; the domain means that the sessions are "replicated" between the nodes of the same domain.
          Remove domain="mydomain.example.com" and stickySessionRemove="true" will work.
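
          A sketch of the Listener with that change applied (other attributes as in the original post; excludedContexts left out here for brevity):

          ```xml
          <!-- domain attribute removed, so stickySessionRemove="true" can drop
               the stale route cookie when the node the session is stuck to dies -->
          <Listener className="org.jboss.modcluster.ModClusterListener"
                    advertise="false"
                    proxyList="lb1:80,lb2:80"
                    balancer="mybal"
                    stickySession="true"
                    stickySessionForce="false"
                    stickySessionRemove="true"
                    nodeTimeout="300"/>
          ```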

          • 2. Re: 3-node cluster with stickySession and failover
            rmdavidson

            Thanks for the information. I've removed the domain="mydomain.example.com" part and restarted the JBoss instances and also the Apache servers just for completeness.

            This hasn't helped though. When I manually shut down the node I was on (ws01 this time) and visited the site again, the JSESSIONID cookie was still showing ".ws01" on the end. It was then sending my requests to random nodes again.

            Any other ideas?

            • 3. Re: 3-node cluster with stickySession and failover
              rmdavidson

              Some more info:

              Setting stickySessionRemove="true" or stickySessionRemove="false" on the nodes seems to have no effect, nor do I get an updated cookie. It's like it's stuck at "false", if I understand the documentation correctly.

              Setting stickySessionForce="true" and taking the node I'm "stuck" to offline does not produce a 500 error, even though the documentation leads me to believe it should; instead I just get directed to random nodes.

              I've also recompiled mod_cluster with extended debugging on (HAVE_CLUSTER_EX_DEBUG = 1 or something similar) and set Apache to LogLevel debug. I've noticed something that looks odd to me; I'm not familiar with the code so it might be nothing, but here goes.

              I've stopped all JBoss instances.
              After that, I've restarted the two Apache balancers I've set up.
              Then I've restarted all of the nodes (ws01, ws02, ws03 and ws-adm01)
              FWIW, ws-adm01 is not for public use and is in a different balancer="..." than the other 3 nodes, which are for public use. I only mention this in case having more than one balancer has any significance.

              After starting all nodes, I send a request that goes to one of the backends (ws01/02/03), then I shut down whichever node I was "stuck" to with the sticky session. In this case it was node ws01.

              I then send a second request and get bumped to another node, but here is what shows up in the logs, which suggests to me that either it thinks a domain is still set up (which it's not) or that ws01 is still online (it's not).

              Also - I've had to replace some things in these log entries. "testsite", "www.testsite.com.au" and "mybal" are all substitutions, mainly to protect myself from potentially getting in trouble with manager types...

              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1710): proxy_cluster_trans for 0 (null) (null) uri: /nuxeo/site/testsite/video args: (null) unparsed_uri: /nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1567): cluster: Found value 7899B443DF124122D67DF549557B3C69.ws01 for stickysession JSESSIONID|jsessionid
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1575): cluster: Found route ws01
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1522): find_nodedomain: finding node for ws01: mybal
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1535): find_nodedomain: finding domain for ws01: mybal
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1567): cluster: Found value 7899B443DF124122D67DF549557B3C69.ws01 for stickysession JSESSIONID|jsessionid
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1575): cluster: Found route ws01
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1522): find_nodedomain: finding node for ws01: mybal
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1535): find_nodedomain: finding domain for ws01: mybal
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(868): get_balancer_by_node testing node ws02 for /nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(892): get_balancer_by_node testing host localhost
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(902): get_balancer_by_node testing context /nuxeo
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(914): get_balancer_by_node found context /nuxeo
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1758): proxy_cluster_trans using mybal uri: proxy:balancer://mybal/nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1775): proxy_cluster_canon url: balancer://mybal/nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(2087): proxy_cluster_pre_request: url balancer://mybal/nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1020): proxy: Entering byrequests for CLUSTER (balancer://mybal)
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(868): get_balancer_by_node testing node ws02 for /nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(892): get_balancer_by_node testing host localhost
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(902): get_balancer_by_node testing context /nuxeo
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(914): get_balancer_by_node found context /nuxeo
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(868): get_balancer_by_node testing node ws03 for /nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(892): get_balancer_by_node testing host localhost
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(902): get_balancer_by_node testing context /nuxeo
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(914): get_balancer_by_node found context /nuxeo
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(1102): proxy: byrequests balancer DONE (ajp://10.250.206.145:8009)
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_cluster.c(2237): proxy_cluster_pre_request: balancer (balancer://mybal) worker (ajp://10.250.206.145:8009) rewritten to ajp://10.250.206.145:8009/nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy.c(993): Running scheme balancer handler (attempt 0)
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_http.c(1910): proxy: HTTP: declining URL ajp://10.250.206.145:8009/nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] mod_proxy_ajp.c(626): proxy: AJP: serving URL ajp://10.250.206.145:8009/nuxeo/site/testsite/video
              [Tue Oct 27 21:32:13 2009] [debug] proxy_util.c(1991): proxy: AJP: has acquired connection for (10.250.206.145)
              [Tue Oct 27 21:32:13 2009] [debug] proxy_util.c(2047): proxy: connecting ajp://10.250.206.145:8009/nuxeo/site/testsite/video to 10.250.206.145:8009
              [Tue Oct 27 21:32:13 2009] [debug] proxy_util.c(2145): proxy: connected /nuxeo/site/testsite/video to 10.250.206.145:8009
              [Tue Oct 27 21:32:13 2009] [debug] proxy_util.c(2300): proxy: AJP: fam 2 socket created to connect to 10.250.206.145
              [Tue Oct 27 21:32:13 2009] [debug] ajp_utils.c(31): Into ajp_handle_cping_cpong
              [Tue Oct 27 21:32:13 2009] [debug] ajp_utils.c(102): ajp_handle_cping_cpong: Done
              [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb
              [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [Host] = [www.testsite.com.au]
              [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [User-Agent] = [Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3]
              
              



              Any thoughts as to what's happening here?

              Thanks.


              • 4. Re: 3-node cluster with stickySession and failover
                clarkee

                Sounds like you want the same configuration as me.
                Users should fail-over to another app server in case of an outage.

                (I presume you've got jboss-web-cluster.sar deployed and configured for session replication too, always helps!)

                If this is the case then you need to set both values to false.

                stickySessionRemove="false"
                stickySessionForce="false"


                stickySessionRemove will remove the session stickiness if your app server fails. You do want the session to be sticky on the next app server it hits, right? So you set this to false.

                stickySessionForce will not allow the session to fail over to another app server in case of failure and will return the user an error. This ain't what you want either!

                Also, you don't need to specify stickySession="true" as it's the default value.
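
                Putting that together, the relevant part of the Listener would look something like this (a sketch; the rest of the attributes stay as in your original config):

                ```xml
                <!-- Failover-friendly stickiness: stick to one node while it's up,
                     fail over (without an error page) when it dies, and stay sticky
                     on whichever node picks the session up. Assumes session state is
                     replicated (e.g. via jboss-web-cluster.sar) so nothing is lost. -->
                <Listener className="org.jboss.modcluster.ModClusterListener"
                          stickySession="true"
                          stickySessionForce="false"
                          stickySessionRemove="false"/>
                ```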

                FYI - 1.1 is due out soon, which will correct issues with removing nodes when they're not ping-able and displaying them as down on the mod_balancer page, so it'll be easier to see the state of your cluster from one place. Hopefully they'll also fix some other layout issues we've mentioned and make the default font size a little smaller =p

                • 5. Re: 3-node cluster with stickySession and failover
                  jfclere

                  "Any thoughts as to whats happening here? "

                  Can't tell more than that it looks ok...
                  Would need the CONFIG messages (to make sure that there is no domain, or that the nodes are in different domains).
                  Would need a little more of the headers:
                  ajp_marshal_into_msgb: Header[n] - the cookie shouldn't be there.

                  • 6. Re: 3-node cluster with stickySession and failover
                    jfclere

                    "FYI - 1.1 is due out soon which will correct issues with removing ndoes when they're not ping-able and displaying them as down on the mod_balancer page so it'll be easier to see the state of your cluster from 1 place. Hopefully they'll also fix some other layout issues we've mentioned and make the default font size a little smaller =p"

                    Hmmm not sure what you mean... JIRA?

                    • 7. Re: 3-node cluster with stickySession and failover
                      rmdavidson

                      Thanks for the help guys. I might try putting each node into its own domain and see if that has any effect, but here are the rest of the headers as requested:

                      
                       [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [Host] = [www.testsite.com.au]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [User-Agent] = [Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Accept] = [text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[3] [Accept-Language] = [en-us,en;q=0.5]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[4] [Accept-Encoding] = [gzip,deflate]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[5] [Accept-Charset] = [ISO-8859-1,utf-8;q=0.7,*;q=0.7]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[6] [Referer] = [http://www.testsite.com.au/testsite/]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[7] [Cookie] = [JSESSIONID=7899B443DF124122D67DF549557B3C69.ws01; __utma=267912469.736550832.1256639446.1256639446.1256639446.1; __utmb=267912469.1.10.1256639446; __utmc=267912469; __utmz=267912469.1256639446.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __gads=ID=8ca197413c6d9267:T=1256639447:S=ALNI_MbwefJXVMFgXOa0iW3EonE8aCqbOQ]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[8] [X-Forwarded-For] = [220.253.148.49]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[9] [X-Forwarded-Host] = [www.testsite.com.au]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[10] [X-Forwarded-Server] = [www.testsite.com.au]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[11] [Connection] = [Keep-Alive]
                      [Tue Oct 27 21:32:13 2009] [debug] ajp_header.c(430): ajp_marshal_into_msgb: Done
                      
                      



                      I'll try capturing some CONFIG messages from the nodes, and I may post them here also.

                      Thanks.


                      • 8. Re: 3-node cluster with stickySession and failover
                        rmdavidson

                         Also, here are the DUMP and INFO outputs from the mod_cluster-manager console:

                        DUMP:

                        
                        balancer: [1] Name: mybal Sticky: 1 [JSESSIONID]/[jsessionid] remove: 1 force: 0 Timeout: 0 Maxtry: 1
                        balancer: [2] Name: mybal-adm Sticky: 1 [JSESSIONID]/[jsessionid] remove: 1 force: 0 Timeout: 0 Maxtry: 1
                        node: [1:1],Balancer: mybal,JVMRoute: ws02,Domain: [],Host: 10.250.206.145,Port: 8009,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 33,ttl: 60,timeout: 300
                        node: [2:2],Balancer: mybal,JVMRoute: ws03,Domain: [],Host: 10.250.206.81,Port: 8009,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 33,ttl: 60,timeout: 300
                        node: [3:3],Balancer: mybal-adm,JVMRoute: ws-adm01,Domain: [],Host: 10.250.115.31,Port: 8009,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 33,ttl: 60,timeout: 300
                        node: [4:4],Balancer: mybal,JVMRoute: ws01,Domain: [],Host: 10.250.154.128,Port: 8009,Type: ajp,flushpackets: 0,flushwait: 10,ping: 10,smax: 33,ttl: 60,timeout: 300
                        host: 1 [localhost] vhost: 1 node: 3
                        host: 2 [localhost] vhost: 1 node: 2
                        host: 3 [localhost] vhost: 1 node: 1
                        host: 4 [localhost] vhost: 1 node: 4
                        context: 1 [/nuxeo] vhost: 1 node: 3 status: 1
                        context: 2 [/nuxeo] vhost: 1 node: 2 status: 1
                        context: 3 [/nuxeo] vhost: 1 node: 1 status: 1
                        context: 4 [/nuxeo] vhost: 1 node: 4 status: 1
                        
                        


                        INFO:

                        
                        Node: [1],Name: ws02,Balancer: mybal,Domain: ,Host: 10.250.206.145,Port: 8009,Type: ajp,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 33,Ttl: 60000000,Elected: 157,Read: 2177567,Transfered: 7490788,Connected: 0,Load: 100
                        Node: [2],Name: ws03,Balancer: mybal,Domain: ,Host: 10.250.206.81,Port: 8009,Type: ajp,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 33,Ttl: 60000000,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 100
                        Node: [3],Name: ws-adm01,Balancer: mybal-adm,Domain: ,Host: 10.250.115.31,Port: 8009,Type: ajp,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 33,Ttl: 60000000,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 0
                        Node: [4],Name: ws01,Balancer: mybal,Domain: ,Host: 10.250.154.128,Port: 8009,Type: ajp,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 33,Ttl: 60000000,Elected: 342,Read: 11290986,Transfered: 109115,Connected: 4,Load: 100
                        Vhost: [3:1:1], Alias: localhost
                        Vhost: [2:1:2], Alias: localhost
                        Vhost: [1:1:3], Alias: localhost
                        Vhost: [4:1:4], Alias: localhost
                        Context: [3:1:1], Context: /nuxeo, Status: ENABLED
                        Context: [2:1:2], Context: /nuxeo, Status: ENABLED
                        Context: [1:1:3], Context: /nuxeo, Status: ENABLED
                        Context: [4:1:4], Context: /nuxeo, Status: ENABLED
                        
                        




                        • 9. Re: 3-node cluster with stickySession and failover
                          rmdavidson

                          Hi all,

                           I've done some more tweaking and have come up with the following. With each node in its own unique domain (basically, I've set the domain for each node to the same value as its jvmRoute), failover works correctly.

                           The setting (true/false) of stickySessionRemove doesn't make a difference; with or without it, failover is working as expected and sessions stay stuck to their new nodes once failed over.

                           The configuration in server.xml is currently this (this is the config from the node with jvmRoute="ws02"):

                          
                           <Listener className="org.jboss.modcluster.ModClusterListener"
                           advertise="false"
                           proxyList="lb1:80,lb2:80"
                           excludedContexts="ROOT,invoker,jbossmq-httpil,jbossws,jmx-console,juddi,web-console"
                           balancer="mybal"
                           domain="ws02"
                           stickySession="true"
                           stickySessionForce="false"
                           stickySessionRemove="false"
                           nodeTimeout="300" />
                          
                          


                          Thank you very much for your help jfrederic and clarkee. Clarkee, it was your mention of domains that gave me the idea to try putting each node within its own domain, and this is what seems to have solved the problem.

                           I'm a little worried about stickySessionRemove. It doesn't seem to matter whether it's true or false (with or without all nodes in their own domains); it doesn't seem to have any effect on the cookie being replaced when a node fails. The only thing that seems to have cured the problem was putting the nodes in their own domains.

                           I've set stickySessionRemove to false, though I'm not sure that this is really what I want. The behaviour I'm after is that, for example, when node ws01 crashes, the JSESSIONID cookie is updated with a new node and the client stays stuck to that node, unless that node also fails, in which case they are (hopefully) stuck to yet another node.

                          So based on that, I seem to think that the correct setting for stickySessionRemove would be "true". Is this correct?