Questions and Answers on mod_cluster webinar

Version 3

    Want to know more, explained by the developper see:

    http://www.vimeo.com/13180921

    and read the resulting QA:

     

    Q: Is the demo application available?
    A: Yes, it's part of the mod-cluster download (under demo/client). The
       SessionDemo itself is not available, but it's a simple demo
       adding data to an HTTP session. I can make it available if
       necessary...

     


    Q: Are the slides available?
    A: www.jboss.org/webinars

     


    Q: Is this a direct competition to Terracotta's offering?
    A: No; mod-cluster is about (1) dynamic discovery of workers, (2) web
       applications, and (3) intelligent load balancing. Clustering is an
       orthogonal aspect; as a matter of fact, clustering could be used
       among a number of workers which are not clustered.

     


    Q: Is the clustering between jboss instances within a domain done @ JVM level?
    A: No; we use JGroups (www.jgroups.org) and JBossCache
       (jboss.org/jbosscache) to replicate sessions. In JBoss 6, we've
       replaced JBossCache with Infinispan (infinispan.org) to replicate
       and/or distribute sessions among a cluster.

     


    Q: Why should the deployment topology use httpd? Can't the tomcat (bundled in JBoss) use APR.
    A: Yes, JBossWeb can use APR, and as a matter of fact does use it if
       the shared APR lib is found on the library path. However, using APR
       and httpd are orthogonal issues; while the mod-cluster module could
       theoretically be used in JBossWeb directly, we haven't tried it
       out, as many deployments still use httpd in production.
       Note that JBossWen cannot be used as a reverse proxy.

     


    Q: What are the steps involved to migrate a setup which is on mod_jk to mod_cluster?
    A: There are only a few steps involved (more details can be found on
       jboss.org/mod_cluster):
       - Use the httpd modules downloadable from jboss.org/mod_cluster
       - Configure httpd.conf accordingly
       - Drop workers.properties and uriworkermap.properties
       - Configure JBoss AS to include the addresses of the httpd
         daemon(s) running
       - (Optional) Configure the domain for the JBoss AS instance
       The steps are described in detail in
       http://docs.jboss.org/mod_cluster/1.1.0/html/mod_jk.html

     

     

     


    Q: Are there seperate logging mechanism for mod_cluster like we use to have for mod_jk
    A: No; mod-cluster uses the normal httpd log, and this is configured in httpd.conf (similar
       to mod-jk / mod-proxy). On the JBoss AS side, the normal AS logging
       is used (e.g. conf/log4j.xml)

     

     

     

    Q: Is the mod_cluster the same as mod_proxy_balancer?
    A: No; mod_proxy_balancer requires manual configuration
       (e.g. hosts to be balanced over). Also, web applications have to be
       present on all hosts, and don't register themselves
       automatically. Plus, mod_proxy_balancer doesn't have any notion of
       load balance factors sent to it by the workers.

     


    Q: I have an application that uses an HASingleton(ejbtimer). In case of a multidomains architecture, my application would fail because I would have an ejbtimer in each domain. How would you get a large cluster work in this scenario.
    A: If one singleton timer per domain is not desired, then one could
       place the singleton timer into a separate cluster, which spans
       multiple domains. Note that an HASingleton ejb timer and
       distributed cache will use separate channels by default.

     


    Q: Is it not efficient to avoid sticky-sessions? If we avoided sticky sessions, then We could use hardware based load-balancers which did load balancing @ Transport (TCP/IP) layer rather than Application layer.
    A: Making sessions non-sticky means that access to sessions can be random, ie. requests
       for an HTTP session can go to any node within a domain. However, this means that we should
       not use asynchronous replication, as a write to an attribute followed by an immediate read
       of the same attribute but on a different node might lead to the reading of stale data.
       However, using synchronous replication is slower because every write incurs a round trip to
       the cluster, and the caller blocks until all responses have been received.
       Our recommendation is to use sticky sessions and asynchronous replication, for the best
       performance.

     


    Q: Is it possible to configure mod_cluster or mod_jk in a way that certain IPs requests go to just a particular domain
    A: Not easily. One could configure virtual hosts in httpd.conf, and
       workers connect to certain virtual hosts only, but there is no
       enforcement of which domains are hit from the httpd side.

     


    Q: We used appliance for loadbalancing. Can we use mod-cluster for dynamic configuration instead of using static properties?
    A: No, mod-cluster requires the httpd to run. We intend to talk to load balancer vendors and
       get them to implement the MCMP protocol, so that their balancers could be used with
       mod-cluster enabled workers.

     

     

     

    Q:Is mod_cluster delivered as a native module in Apache just as mod_proxy?
    A: Yes, on the httpd side. On the JBoss AS side, we use a service archive (mod_cluster.sar), in /deploy

     


    Q:a little more general clustering question. What about distributing jboss servers across datacenters but that belong to the same cluster?
    A: This is possible, however, in most cases IP multicasting would not be available over a
       WAN. Therefore, the configuration of JBoss AS should use a TCP based stack rather than a UDP
       based stack.

     


    Q: Can you suggest the pattern to cluster the Apache server for Fail over when acting as Load balancer for Jboss Cluster
    A: This is very simple: just start multiple httpds and add them to JBoss AS,
       e.g. mod_cluster.proxyList=host1:8000,Host2:8000 etc
       Workers (JBoss instances) will then register themselves and their applications with all
       httpds in the list.

     


    Q: Is mod_cluster available with JBoss AS (community) or JBoss Enterprise Application Platform from Red Hat?
    A: Currently, mod-cluster 1.1.0.CR3 will ship as part of JBoss AS 6. The mod-cluster
       functionality is part of EAP 5.0.1 and will also be part of JBoss EAP 5.1.

     


    Q: Can the worker nodes be configured from JON?
    A: Not yet (with respect to mod-cluster configuration). This is on the roadmap.

     

     

     

    Q: What is the configuration for dynamically adding nodes as load increases?
    A: This feature is not available. It might be available as part of our Deltacloud product. Currently, third party vendor's products, such as RightScale, could be used to do this.

     


    Q: Which version of mod_cluster do you use ? in my version i cannot see the sessions
    A: To see sessions in mod_cluster_manager, the following entry has to be added to httpd.conf:
       <IfModule mod_manager.c>
            MaxsessionId 50
       </IfModule>

     

       Note that sessions are by default not shown in mod_cluster_manager.
       Refer to the documentation at jboss.org/mod_cluster for details.

     


    Q: can you show config quickly how mod_cluster automatically detect new hosts
    A: When a new JBoss instance is started, as soon as the mod_cluster.sar service is deployed,
       the host and all of its applications will be registered with all httpds, so this happens
       immediately.

     


    Q: What do you advice in a multi-datacenter setup? Can we use mod_cluster and won't this cause a event-storm when one of the datacenters goes down?
    A: When you have domains across multiple data centers, and one data center goes down, then the
       other data center has to accommodate the traffic from the data center which is down. This
       causes more traffic to the surviving data center, so when doing capacity planning this
       should be taken into account. If the nodes in a domain run in the cloud, then one could
       envisage automatically starting new virtualized instances to accommodate the handling of
       this increased traffic.

     


    Q: Are there seperate logging mechanism for mod_cluster like we use to have for mod_jk
    A: mod-cluster is configured through the usual mechanism in httpd.conf

     

     

     

    Q: Do we need a mod_cluster manager on all nodes [in the cluster]?
    A: Note that mod_cluster_manager is only available on the httpd side

     

     

     

    Q: Is gossiprouter high available?
    A: Yes, multiple GossipRouters can be started. Note that, if running only on EC2, then a
       protocol called S3_PING can be used as an alternative. It uses an S3 bucket to store cluster
       topology information.

     


    Q: For the group of HTTP daemons in front of the clusters, I assume those can be round robin'd DNS, or any other method of load balancing them?
    A: No, DNS round robin (or a hardware load balancer fronting the httpds) works. When using
       sticky sessions, the jsessionid is sent with each request (cookie or URL rewriting) and it
       is suffixed with the jvmRoute of the node which hosts a given session.

     

     

     

    Q: Does JBoss support UNICAST messaging?
    A: Yes; JGroups would have to be configured appropriately to do that. When using TCP, this is
       done automatically. When using UDP, ip_mcast="false" would have to be set.

     

     

     

    Q: Is there support for mount point exclusions like JkUnMount in mod-jk?
    A: Yes, use <property name="excludedContexts">jmx-console,web-admin,ROOT</property> in
       /deploy/mod_cluster.sar/META-INF/mod_cluster-jboss-beans.xml

     

     

     

    Q: What are the steps involved to migrate a setup which is on mod_jk to mod_cluster?
    A: See the previous answer above (http://docs.jboss.org/mod_cluster/1.1.0/html/mod_jk.html)

     

     

     

    Q: There is implicit, a concept, of starting connections from the jboss "backend" to the frontend" ,this seems odd to me
    A: This is only conceptual; workers will *not* create a socket connection to
       httpd. Instead httpd connects to the workers (ie. JBoss AS instances) and the workers use
       the same channel to send status updates, registration of web applications etc.

     

     

     

    Q: Can you use buddy list to replicate session accross domains?
    A: Yes, that can be done, as a domain doesn't need to have the same scope as a cluster; a
       cluster can span multiple domains. However, for scalability purposes, we recommend to
       restrict a cluster to a domain

     


    Q: How does full replication in each domain compare to using buddy replication and just one cluster/domain?
    A: The scalability of full replication is a function of cluster size and average data size, so
       if we have many nodes and/or large data sets, then we hit a scalability ceiling.
       If DATA_SIZE * NUMBER_OF_HOSTS is smaller than the memory available to each host, the full
       replication is preferred, as reads are always local. If this is not the case, then we can
       use multiple domains, or we can use one single cluster, but switch from full replication to
       either buddy replication (JBossCache) or distribution (Infinispan).
       Distribution only stores N copies of a session, therefore scales much better than full
       replication.

     


    Q: Is there any turorial provided?
    A: There's a quick start guide available at jboss.org/mod_cluster

     

     

     

    Q: Is it possible to limit which hosts are allowed to join the cluster easily?
    A: Yes. This can be done at the JGroups level, by using a protocol called AUTH
       (http://community.jboss.org/wiki/JGroupsAUTH). It provides passwords, X.509 certificates,
       host lists and simple MD5 hashes as authentication, but it is pluggable, so other mechanisms
       can be included. Post questions on AUTH to the JGroups mailing list (jgroups.org).

     

     

     

    Q: to uprade without downtime you have to have at least two domains for each application, right?
    A: Yes

     


    Q: Is there any method/workaround to avail Session Replication across Domains?
    A: A cluster isn't restricted in scope to a domain, it can span multiple domains. However, that
       defeats the purpose of a domain (divide-and-conquer), and makes rolling upgrade more
       difficult. For instance, if a cluster spans 2 domains, then it is better to club the 2
       domains together into one.

     

     

     

    Q: I missed some of the demo - I saw the session replication/migration in the demo, but wanted to know if I have 2 apache servers in front of the jboss cluster and a network load balancer doing round robins will mod_cluster maintain the session across them?
    A: Yes. The jvmRoute is appended to the jsessionid and identifies the node in a given domain
       uniquely. See also the question above on DNS round robin.

     

     

     

    Q: on apache side, which is required versions? 2.2 or also 2.0?
    A: 2.2.8 or higher

     

     

     

    Q: Im using JBoss 4.2.2 GA.. Should I migrate to JBoss6?
    A: JBoss 5 or higher. You *can* use mod_cluster with JBoss 4.2.2 - but
       you'd need to configure it as you would for JBoss Web standalone
       (or Tomcat) - and consequently has slightly limited functionality,
       e.g. no HA-mode, limited to 1 load metric.

     


    Q: UDP broadcast?
    A: The ability to send a packet to all hosts on a given subnet. IP multicasting is more
       efficient because a packet is only sent to subscribed hosts. IP multicasting is more efficient
       than TCP is large clusters, because the switch copies the packet to all recipients, whereas
       with TCP a packet has to be sent N-1 times (where N is the cluster size)

     

     

     

    Q: Normally how much time it takes for new node to be detected by mod_cluster..is it configurable?
    A: No, it is not configurable. As soon as the JBoss instance is started, it (and its webapps)
       will get registered.
       The time required to do this depends on how the node finds out
       about the proxy. If you've configured mod_cluster with a static
       proxy list, then it registers with the httpd proxy upon startup. If
       you configured mod_cluster server-side to use an HASingleton (via
       HAModClusterService), then it knows about the proxy upon joining
       the cluster - also upon startup. Otherwise, you are relying on the
       advertise mechanism - so the time required to register with the
       proxy is a product of the advertise interval (AdvertiseFrequency,
       configured in httpd.conf), and the status interval
       (Engine.backgroundProcessorDelay, configured in server.xml)

     

     

     

    Q: how the new servers got added pick up the sessions? are they new or existing sessions?
    A: The new servers use a mechanism provided by JGroups called state transfer (see
       http://www.jgroups.org/manual/html/user-channel.html#GetState), which copies the existing
       sessions into a new server. This way, the new server can be failed over to should an
       existing server crash.
       Note that state transfer is not needed if we use distribution instead of replication (see
       above).

     

     

     

    Q: When performing rolling upgrades, how do you mitigate issues where the database schema changes? So certain domains may be using JNDI to hook into one core db - if another domain is upgraded in a roll out then hibernate will update / alter those tables
    A: Schema migration is a difficult topic, outside the scope of mod-cluster. One possible way
       could be to have a separate DB in the new domain, drain the old domain, and - when the old
       domain is shut down - transfer the data from the old to the new DB. But, again, this is very
       application dependent, and generic advise moot.

     


    Q: Is mod_cluster also wokring with JBoss 5.1 with the same power, or does it require Jboss 6?
    A: mod-cluster works with 5.1, but is already integrated into AS 6 out
       of the box.
       The latest mod_cluster 1.1.0.CR3 release will work with JBoss 5.1
       with no configuration changes - just drop in the mod_cluster.sar
       into the $JBOSS_HOME/server/all/deploy directory.

     


    Q: How do nodes identify other nodes within their cluster?  In other words how do EC2 nodes only cluster with EC2 nodes etc.?
    A: Nodes find other nodes through JGroups (www.jgroups.org). On EC2, we can either use a
       GossipRouter, which is a separate lookup process, or S3_PING which is based on S3 buckets.
       A cluster is defined via (a) the same configuration and (b) the same cluster name. All nodes
       which have (a) and (b) form a cluster. Nodes which have (a) but a different cluster name for
       a different cluster.

     

     

     

    Q: Is it possible to shutdown and drain a single web app?
    A: Yes. The steps are:
       - Disable the app
       - Wait until the sessions for the app have drained
       - Undeploy the app
       - Deploy the new app
       Note that the old and new webapp needs to be compatible, ie. classes cannot change between
       redeployments.
       If there is an incompatible change, I recommend to drain all webapps of the same type
       (context) in a domain.

     

       Note that undeploy of a web application will perform the above
       operations automatically! Use the
       stopContextTimeout/stopContextTimeoutUnit config properties to
       control the default drain timeout. If you're using session
       replication, then you don't need to wait for all sessions to drain
       - just all current requests to complete, since those session will
       be available elsewhere. The method of draining is determined by
       whether or not the target web application is distributable or
       not. Additionally, the sessionDrainingStrategy config property can
       be used to always force session draining, even for distributable
       web applications.

     

       Alternatively, you can stop a single context manually in once step
       via the stopContext(...) JMX operation.

     

     

     

    Q: Is mod_cluster delivered as a native module in Apache , just as mod_proxy?
    A: Yes

     

     

     

    Q: Does the "load balancer demo app" come with mod_cluster?
    A: Yes, under /demo/client

     

     

     

    Q: Can you configure the jboss nodes to announce themselves to the httpd servers over a local/private network keeping that communication private and seperate from the public access to the application?
    A: Yes. You can - since a separate connection is used, provided these
       routes exist. This private network address/port would be provided by
       the advertise mechanism or via the server-side proxyList.
       The private and public network could be created in httpd.conf,
       using virtual hosts.

     

     

     

    Q: When a new version of a web app is deployed, how does JBoss/mod_cluster know how to replicate between old versions and new versions.
    A: The webapp needs to be compatible to existing versions. If it isn't, deploy it into a new
       domain, or redeploy all existing webapps of the same type.

     

     

     


    Q: Can mod_clustered enabled when v use configure Elastic Load Balance?
    A: Yes, but this doesn't make much sense. Compared to ELB, mod-cluster is (1) cloud independent
       (ELB only exists in EC2), (2) allows for dynamic registration of workers (this is static in
       ELB), (3) allows for dynamic registration/de-registration of webapps (ELB doesn't) and (4)
       sends dynamic load balancer information back to httpd (ELB has some built-in LB
       functionality, but it is not extensible).

     

     

     


    Q: what about the performance when we divide one large cluster in to small clusters ?
    A: Performance is probably better, for various reasons. For example, if we use TCP, cluster
       wide calls (RPCs) have a cost of N-1. With smaller N's, these calls become less costly.