0 Replies Latest reply on Sep 19, 2018 11:15 PM by prateekk.in

    [New/Edited*] Infinispan's JGroups not joining the same cluster in Docker services

    prateekk.in

      **Update** - We now need to deploy across two (or more) Docker hosts that sit on a private, on-premise LAN and run in swarm mode. Services hosted on a single Docker host are able to form a cluster; the open question is how to get services distributed across multiple Docker hosts to form a cluster, which they currently do not. Thanks!

       

      (The actual question is in the Query section further down.)

      Cross-posted on Stack Overflow: "Infinispan's JGroups not joining the same cluster in Docker services"


      Environment:

      Infinispan 9.13

      Embedded cache in a cluster with jgroups

      Single file store

      Using JGroups inside Docker services on a single [Edit: or multiple] Docker host/daemon (not on AWS yet).

      Infinispan.xml:-

      <jgroups>

              <stack-file name="external-file" path="${path.to.jgroups.xml}"/>

      </jgroups>

      Application = 2 webapps + database

       

      Issue:

      When I deploy the two webapps in separate Tomcat instances directly on a machine (no Docker), the Infinispan cache manager that initializes the cache in each webapp joins the same cluster through JGroups, i.e. it works. With exactly the same configuration (and the same channel name in JGroups), when the webapps are deployed as Docker services they do not join the same cluster; instead each one forms a separate cluster with a single member (logs below).

       

      The services are Docker containers built from images of the form linux + tomcat + webapp and are launched with a Docker Compose v3 file.

       

      I have tried the instructions at GitHub - belaban/jgroups-docker (Dockerfile for a container containing JGroups and a couple of demos), which suggest either running the Docker services with --network=host or setting external_addr=docker_host_IP_address in jgroups.xml. Host networking does work, but we cannot use it because the config files would need separate ports for each instance if we scale. Setting external_addr is NOT working, and the query is how to make it work.
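
Because both bind_addr and external_addr depend on which addresses a container actually has, it may help to list them from inside each container. The following is a minimal diagnostic sketch using only the JDK; the class name InterfaceLister is made up for illustration, and the output will differ per container.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: prints every interface/address pair visible to the JVM.
final class InterfaceLister {

    /** Returns one descriptive line per (interface, address) pair; empty if none are found. */
    static List<String> describeInterfaces() throws SocketException {
        List<String> lines = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return lines; // some JDK versions return null when no interface exists
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                lines.add(nic.getName() + " " + addr.getHostAddress()
                        + " loopback=" + addr.isLoopbackAddress()
                        + " multicast=" + nic.supportsMulticast());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws SocketException {
        for (String line : describeInterfaces()) {
            System.out.println(line);
        }
    }
}
```

Running this in both containers shows whether the address passed as jgroups.tcp.address belongs to an interface on the shared webnet network, and whether that interface reports multicast support at all.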

       

      It's not a timing issue: I also tried adding a significant delay before the second service in the stack starts, but each app's Infinispan cluster still has just one member in its view (the container itself). Calling cacheManager.getMembers() inside each app also returns just one entry (it should return two).

       

      Log showing just one member in the first app:

      org.infinispan.remoting.transport.jgroups.JGroupsTransport.receiveClusterView ISPN000094: Received new cluster view for channel CHANNEL_NAME: [FirstContainerId-6292|0] (1) [FirstContainerId-6292].

      org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded ISPN000079: Channel CHANNEL_NAME local address is FirstContainerId-6292, physical addresses are [10.xx.yy.zz:7800]

       

      Log showing just one member in second app:

      org.infinispan.remoting.transport.jgroups.JGroupsTransport.receiveClusterView ISPN000094: Received new cluster view for channel CHANNEL_NAME: [SecondContainerId-3502|0] (1) [SecondContainerId-3502]

      29-Apr-2018 11:47:42.357 INFO [localhost-startStop-1] org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded ISPN000079: Channel CHANNEL_NAME local address is 58cfa4b95c16-3502, physical addresses are [10.xx.yy.zz:7800]
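
For reference, the view in an ISPN000094 message reads as [coordinator|view id] (member count) [member list]. A small sketch like the one below, using only the JDK, could pull the member list out of such a line when checking many logs; the class name ClusterViewLine is hypothetical and the pattern is only an assumption based on the two log lines above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the view out of an ISPN000094 log line.
final class ClusterViewLine {

    // Matches "[coordinator|viewId] (count) [member, member, ...]".
    private static final Pattern VIEW =
            Pattern.compile("\\[([^|\\]]+)\\|(\\d+)\\]\\s+\\((\\d+)\\)\\s+\\[([^\\]]*)\\]");

    /** Returns the members listed in the view, or an empty list if no view is found. */
    static List<String> members(String logLine) {
        Matcher m = VIEW.matcher(logLine);
        if (!m.find()) {
            return Collections.emptyList();
        }
        String memberList = m.group(4).trim();
        if (memberList.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(memberList.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        String line = "ISPN000094: Received new cluster view for channel CHANNEL_NAME: "
                + "[FirstContainerId-6292|0] (1) [FirstContainerId-6292]";
        List<String> members = members(line);
        System.out.println(members.size() + " member(s): " + members);
    }
}
```

A healthy two-node cluster would list both container names in the final bracket of the same view; here each app reports only itself.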

       

      The docker compose V3 is below and shows the overlay network:

      version: "3"
      services:
        app1:
          image: app1:version
          ports:
            - "fooPort1:barPort"
          volumes:
            - "foo:bar"
          networks:
            - webnet

        app2:
          image: app2:version
          ports:
            - "fooPort2:barPort"
          volumes:
            - "foo:bar"
          networks:
            - webnet

      volumes:
        dbdata:

      networks:
        webnet:

       

      Deployed using: $ docker stack deploy --compose-file docker-compose.yml OurStack

       

      The relevant part of jgroups.xml is below:

      <TCP
               external_addr="${ext-addr:docker.host.ip.address}"
               bind_addr="${jgroups.tcp.address:127.0.0.1}"
               bind_port="${jgroups.tcp.port:7800}"
               enable_diagnostics="false"
               thread_naming_pattern="pl"
               send_buf_size="640k"
               sock_conn_timeout="300"
               bundler_type="sender-sends-with-timer"
               thread_pool.min_threads="${jgroups.thread_pool.min_threads:1}"
               thread_pool.max_threads="${jgroups.thread_pool.max_threads:10}"
               thread_pool.keep_alive_time="60000"/>

      <MPING
               bind_addr="${jgroups.tcp.address:127.0.0.1}"
               mcast_addr="${jgroups.mping.mcast_addr:228.2.4.6}"
               mcast_port="${jgroups.mping.mcast_port:43366}"
               ip_ttl="${jgroups.udp.ip_ttl:2}"/>
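
The ${name:default} values in this file are resolved from Java system properties, with the part after the colon used when the property is not set. The sketch below illustrates that lookup rule with plain JDK code; it is not JGroups' own implementation, the class name PlaceholderSketch is made up, and 10.0.0.5 is a made-up address.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: mimics "${name:default}" lookup, not the real JGroups code.
final class PlaceholderSketch {

    // Group 1 is the property name, group 2 the optional default after the colon.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Replaces each placeholder with the property value, else its default, else leaves it as is. */
    static String resolve(String value, Properties props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) != null ? m.group(2) : m.group(0);
            String resolved = props.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties unset = new Properties();
        // No jgroups.tcp.address property: the default after the colon is used.
        System.out.println(resolve("${jgroups.tcp.address:127.0.0.1}", unset));

        Properties set = new Properties();
        set.setProperty("jgroups.tcp.address", "10.0.0.5"); // made-up container address
        System.out.println(resolve("${jgroups.tcp.address:127.0.0.1}", set));
    }
}
```

With the configuration above, this means bind_addr for both TCP and MPING falls back to the loopback address 127.0.0.1 unless -Djgroups.tcp.address=... is passed to the JVM in each container, so it may be worth confirming which value each Tomcat actually starts with.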

       

       

      The code is similar to:

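      // The String constructor expects the Infinispan configuration file, which in turn references the JGroups stack file.
      DefaultCacheManager manager = new DefaultCacheManager(jgroupsConfigFile.getAbsolutePath());

      // Cache is an interface, so it is obtained from the manager rather than constructed directly.
      Cache<Object, Object> someCache = manager.getCache("SOME_CACHE").getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);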

       

      Query:

      How do we deploy with docker-compose (as two services in Docker containers), using the jgroups.xml above, so that the Infinispan caches in the two webapps join and form one cluster, letting both apps access the data that the other reads/writes in the cache? Right now they connect to the same channel name, but each becomes a cluster with one member, even when we point JGroups at external_addr.

       

      Tried so far:

      1. Putting a delay in the second service's startup so that the first has enough time to advertise itself.
      2. Following "JGroups - Running JGroups in Docker": I am able to deploy the belaban/jgroups containers as two services in a stack using Docker Compose, and they do form a cluster (chat.sh inside the container shows a two-member view).
      3. Tried --net=host, which works but is infeasible. Tried external_addr=docker.host.ip in jgroups.xml, which would be the ideal solution, but it is not working (the logs above are from that attempt).
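
One further check that could narrow things down is whether one container can open a plain TCP connection to the other's JGroups port (7800 in the config above), independent of discovery. The sketch below uses only the JDK; the class name PortProbe is hypothetical, and the default host app2 is an assumption that the Compose service name resolves on the webnet network.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP connection to host:port can be opened.
final class PortProbe {

    /** Returns true if a TCP connection succeeds within the timeout, false on any I/O failure. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unknown host, connection refused, and timeout.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "app2"; // assumed service name from the compose file
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7800;
        System.out.println(host + ":" + port + " reachable = " + canConnect(host, port, 2000));
    }
}
```

If the port is reachable but the views still contain one member each, the problem is more likely in discovery (MPING relies on IP multicast) than in the TCP transport itself.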

       

      Thanks! Will try to provide any specific info if required.