19 Replies Latest reply on Dec 2, 2005 6:45 PM by kbisla

    JBoss Cache using Multicast.

    kbisla

      I'm facing a strange problem with JBoss Cache using multicast; any pointers would be helpful.
      I have a JBoss server (4.0.1) running JBoss Cache using multicast. The setup is like this:

      |client|---invoking session bean to add data----------->|server|
      |____|<===== cache update that data added=====|______|

      The client just listens to the cache for updates. To add data it calls the server directly, then hears about the change through the cache.
      Only the server writes to the cache. We are using DummyTransactionManager.
      This whole setup works fine under Linux 2.4, but under 2.6 it doesn't.
      My first guess was that multicast wasn't working right, but I checked and it was (I used the org.jgroups.tests.McastReceiverTest and McastSenderTest classes to confirm).
      Under 2.6 it works fine for only a few minutes, and then the client stops getting any updates. I have to use jboss-4.0.1/bin/twiddle.sh to stop and start the JBossCache MBean to get it working again, but that also only works for a few minutes. I suspect some deadlock issue.
      Any ideas/direction would be greatly appreciated.
      Thanks
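
      [Editor's note] For readers who want to reproduce the multicast check without pulling in the JGroups test classes, here is a minimal stdlib-only sketch of what McastSenderTest/McastReceiverTest do. The group address and port are taken from the config posted later in this thread; the class name and everything else are illustrative:

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;

// Rough stand-in for org.jgroups.tests.McastSenderTest / McastReceiverTest:
// join a multicast group, send one datagram, and try to read it back
// (IP_MULTICAST_LOOP is on by default, so a working host hears its own packet).
public class McastSmokeTest {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("228.1.2.3");
        int port = 48866;
        try (MulticastSocket sock = new MulticastSocket(port)) {
            sock.joinGroup(group);
            sock.setSoTimeout(3000); // don't block forever if multicast is broken
            byte[] msg = "ping".getBytes("UTF-8");
            sock.send(new DatagramPacket(msg, msg.length, group, port));
            DatagramPacket in = new DatagramPacket(new byte[64], 64);
            try {
                sock.receive(in);
                System.out.println("received: "
                        + new String(in.getData(), 0, in.getLength(), "UTF-8"));
            } catch (SocketTimeoutException e) {
                System.out.println("no packet received -- multicast looks broken on this host");
            }
            sock.leaveGroup(group);
        }
    }
}
```

      If this fails while the OS otherwise looks healthy, the problem is below JGroups (NIC driver, routing, firewall), which is consistent with how this thread eventually resolves.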

        • 1. Re: JBoss Cache using Multicast.
          kbisla

          One other thing to add: the JBoss Cache instance running on the JBoss server uses the following transaction manager lookup:

          org.jboss.cache.JBossTransactionManagerLookup

          while the client uses
          DummyTransactionManagerLookup

          hope this helps.
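
          [Editor's note] For reference, the lookup class in each case is selected through the TreeCache MBean attribute shown commented out in the shared config later in this thread; the two setups differ only in this one line:

```xml
<!-- server side: delegate to the JBoss transaction manager -->
<attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>

<!-- client side: no real TM is available outside the app server -->
<attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>
```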

          • 2. Re: JBoss Cache using Multicast.

            I am not sure why either. But if it is a deadlock, it should time out in 15 seconds (the default), so you should be able to see it in the log.
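
            [Editor's note] The timeout being referred to is the TreeCache LockAcquisitionTimeout attribute (milliseconds), which the config shared later in this thread sets explicitly:

```xml
<!-- Max number of milliseconds to wait for a lock acquisition -->
<attribute name="LockAcquisitionTimeout">15000</attribute>
```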

            • 3. Re: JBoss Cache using Multicast.
              kbisla

              So here's what's happening.
              TreeCache is running happily on the JBoss server; with debug turned on, it prints all the _put calls to the cache.
              But the client cannot receive any cache updates (actually it does briefly, sometimes for up to a few minutes, and then never hears anything again).
              I suspect the TreeCache on the server is not writing the message out on the wire or something (the JGroups channel).

              I also updated to the latest JBoss Cache, i.e. 1.2.4, and JGroups 2.2.8.
              Not to be paranoid, but to be very sure, I have
              jboss-4.0.1/server/default/deploy/myapp/my.ear/my-jmx.sar/jboss-cache-service.xml
              as a symbolic link to the same jboss-cache-service.xml the client is
              using, so they are definitely using the same multicast port etc.
              I have also put the XML in this post.

              Also, I want to mention the client is a Java Swing application.
              Another thing I did to see what's happening on the cache: I wrote a
              very simple Java app which just prints whatever it hears on the cache to the console.
              After the Swing app stopped receiving messages, I started this simple app, and here's its output:

              using [jboss-cache-service.xml] cache config and [/] region
              0 [main] INFO cache.PropertyConfigurator - Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@15cda3f
              38 [main] INFO cache.PropertyConfigurator - configure(): attribute size: 13
              60 [main] INFO cache.TreeCache - setting cluster properties from xml to: UDP(mcast_addr=228.1.2.3;mcast_port=48866;ip_ttl=64;ip_mcast=true;mcast_send_buf_size=150000;mcast_recv_buf_size=150000;ucast_send_buf_size=150000;ucast_recv_buf_size=80000;loopback=true):PING(timeout=20000;num_initial_members=1;up_thread=false;down_thread=false):MERGE2(min_interval=10000;max_interval=20000):FD_SOCK:VERIFY_SUSPECT(timeout=15000;up_thread=false;down_thread=false):pbcast.NAKACK(gc_lag=50;retransmit_timeout=600,1200,2400,4800;max_xmit_size=8192;up_thread=false;down_thread=false):UNICAST(timeout=600,1200,2400;window_size=100;min_threshold=10;down_thread=false):pbcast.STABLE(desired_avg_gossip=20000;up_thread=false;down_thread=false):FRAG(frag_size=8192;down_thread=false;up_thread=false):pbcast.GMS(join_timeout=10000;join_retry_timeout=20000;shun=false;print_local_addr=true):pbcast.STATE_TRANSFER(up_thread=true;down_thread=true)
              cache name : TreeCache-Cluster
              cluster props : UDP(mcast_addr=228.1.2.3;mcast_port=48866;ip_ttl=64;ip_mcast=true;mcast_send_buf_size=150000;mcast_recv_buf_size=150000;ucast_send_buf_size=150000;ucast_recv_buf_size=80000;loopback=true):PING(timeout=20000;num_initial_members=1;up_thread=false;down_thread=false):MERGE2(min_interval=10000;max_interval=20000):FD_SOCK:VERIFY_SUSPECT(timeout=15000;up_thread=false;down_thread=false):pbcast.NAKACK(gc_lag=50;retransmit_timeout=600,1200,2400,4800;max_xmit_size=8192;up_thread=false;down_thread=false):UNICAST(timeout=600,1200,2400;window_size=100;min_threshold=10;down_thread=false):pbcast.STABLE(desired_avg_gossip=20000;up_thread=false;down_thread=false):FRAG(frag_size=8192;down_thread=false;up_thread=false):pbcast.GMS(join_timeout=10000;join_retry_timeout=20000;shun=false;print_local_addr=true):pbcast.STATE_TRANSFER(up_thread=true;down_thread=true)
              init state retrival time out : 5000
              cache mode : REPL_ASYNC
              76 [main] WARN cache.TreeCache TreeCache - No transaction manager lookup class has been defined. Transactions cannot be used
              94 [main] INFO cache.TreeCache TreeCache - interceptor chain is:
              class org.jboss.cache.interceptors.CallInterceptor
              class org.jboss.cache.interceptors.LockInterceptor
              class org.jboss.cache.interceptors.UnlockInterceptor
              class org.jboss.cache.interceptors.ReplicationInterceptor
              94 [main] INFO cache.TreeCache TreeCache - cache mode is REPL_ASYNC
              434 [DownHandler (UDP)] INFO protocols.UDP - sockets will use interface 192.168.3.1
              437 [DownHandler (UDP)] INFO protocols.UDP - socket information:
              local_addr=192.168.3.1:32854, mcast_addr=228.1.2.3:48866, bind_addr=/192.168.3.1, ttl=64
              sock: bound to 192.168.3.1:32854, receive buffer size=80000, send buffer size=131071
              mcast_recv_sock: bound to 192.168.3.1:48866, send buffer size=131071, receive buffer size=131071
              mcast_send_sock: bound to 192.168.3.1:32856, send buffer size=131071, receive buffer size=131071
              
              -------------------------------------------------------
              GMS: address is 192.168.3.1:32854
              -------------------------------------------------------
              10464 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32854) failed, retrying
              40468 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32854) failed, retrying
              .
              .
              .
              .
              .
              .
              130491 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32854) failed, retrying
              


              The shared jboss-cache-service.xml


              <?xml version="1.0" encoding="UTF-8"?>
              
              <!-- ===================================================================== -->
              <!-- -->
              <!-- Sample TreeCache Service Configuration -->
              <!-- -->
              <!-- ===================================================================== -->
              
              <server>
              
               <!-- ==================================================================== -->
               <!-- Defines TreeCache configuration -->
               <!-- ==================================================================== -->
              
               <mbean code="org.jboss.cache.TreeCache"
               name="jboss.cache:service=TreeCache">
              
               <depends>jboss:service=Naming</depends>
               <depends>jboss:service=TransactionManager</depends>
              
               <!--
               Configure the TransactionManager
               <attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>
               -->
               <!--
               Isolation level : SERIALIZABLE
               REPEATABLE_READ (default)
               READ_COMMITTED
               READ_UNCOMMITTED
               NONE
               -->
               <attribute name="IsolationLevel">NONE</attribute>
              
               <!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC -->
               <attribute name="CacheMode">REPL_ASYNC</attribute>
              
               <!-- Just used for async repl: use a replication queue -->
               <attribute name="UseReplQueue">true</attribute>
              
               <!-- Replication interval for replication queue (in ms) -->
               <attribute name="ReplQueueInterval">0</attribute>
              
               <!-- Max number of elements which trigger replication -->
               <attribute name="ReplQueueMaxElements">0</attribute>
              
               <!-- Name of cluster. Needs to be the same for all clusters, in order to find each other -->
               <attribute name="ClusterName">TreeCache-Cluster</attribute>
              
               <!-- JGroups protocol stack properties. Can also be a URL, e.g. file:/home/bela/default.xml
               <attribute name="ClusterProperties"></attribute>
               -->
              
               <attribute name="ClusterConfig">
               <config>
               <!-- UDP: if you have a multihomed machine,
               set the bind_addr attribute to the appropriate NIC IP address -->
               <!-- UDP: On Windows machines, because of the media sense feature
               being broken with multicast (even after disabling media sense)
               set the loopback attribute to true -->
               <UDP mcast_addr="228.1.2.3" mcast_port="48866"
               ip_ttl="64" ip_mcast="true"
               mcast_send_buf_size="150000" mcast_recv_buf_size="150000"
               ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
               loopback="true"/>
               <PING timeout="20000" num_initial_members="1"
               up_thread="false" down_thread="false"/>
               <MERGE2 min_interval="10000" max_interval="20000"/>
               <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
               <FD_SOCK/>
               <VERIFY_SUSPECT timeout="15000"
               up_thread="false" down_thread="false"/>
               <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
               max_xmit_size="8192" up_thread="false" down_thread="false"/>
               <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
               down_thread="false"/>
               <pbcast.STABLE desired_avg_gossip="20000"
               up_thread="false" down_thread="false"/>
               <FRAG frag_size="8192"
               down_thread="false" up_thread="false"/>
               <pbcast.GMS join_timeout="10000" join_retry_timeout="20000"
               shun="false" print_local_addr="true"/>
               <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
               </config>
               </attribute>
              
               <!--
               Whether or not to fetch state on joining a cluster
               -->
               <attribute name="FetchStateOnStartup">true</attribute>
              
               <!--
               The max amount of time (in milliseconds) we wait until the
               initial state (ie. the contents of the cache) are retrieved from
               existing members in a clustered environment
               -->
               <attribute name="InitialStateRetrievalTimeout">5000</attribute>
              
               <!--
               Number of milliseconds to wait until all responses for a
               synchronous call have been received.
               -->
               <attribute name="SyncReplTimeout">20000</attribute>
              
               <!-- Max number of milliseconds to wait for a lock acquisition -->
               <attribute name="LockAcquisitionTimeout">15000</attribute>
              
              
               <!-- Name of the eviction policy class. -->
               <attribute name="EvictionPolicyClass"></attribute>
              
               <!--
               Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
               class loader, e.g., inside an application server. Default is "false".
               -->
               <attribute name="UseMarshalling">false</attribute>
              
               </mbean>
              
              
               <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
               <!-- <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">-->
               <!-- <depends>jboss.cache:service=TreeCache</depends>-->
               <!-- <attribute name="CacheService">jboss.cache:service=TreeCache</attribute>-->
               <!-- </mbean>-->
              
              
              </server>
              


              • 4. Re: JBoss Cache using Multicast.
                manik

                I know this may sound simplistic, but does multicast work in the first place? I.e., have you been successful in running tests like Draw?

                http://www.jgroups.org/javagroupsnew/docs/newuser/node13.html

                • 5. Re: JBoss Cache using Multicast.
                  kbisla

                   As I said in my original post, that's the first thing I thought too, so I tested multicast using
                   org.jgroups.tests.McastSenderTest and McastReceiverTest
                   and found it to be working absolutely fine. I haven't tried the Draw app though.

                   Initially the cache seems to work fine, during which the client gets all the updates etc.,
                   but after a few minutes the client never hears anything again.
                   So, to investigate further what's happening at the transport level, I also started org.jgroups.tests.McastReceiverTest,
                   which prints out all the updates sent out by the server;
                   but once the client stops seeing the cache updates, McastReceiverTest
                   only prints the NAKACK/STABLE messages...
                   See the output below.
                   Does this ring a bell?

                  _replicateur[Ljava.lang.Object;??X?s)lxpsrjava.util.LinkedList)S]J`?"xpwsq~w_putuq~psrorg.jboss.cache.Fqnp?0??yxpwt1xt1q~srjava.lang.
                  [sender=192.168.3.1:32773]
                  0228j?NAKACKSTABLE???????UDPTreeCache-Cluster[sender=192.168.3.1:32777]
                  0228j?NAKACKSTABLE???????UDPTreeCache-Cluster[sender=192.168.3.1:32777]
                  

                  Also if i start another jboss-cache client right after the main client stops getting updates here's what it outputs....
                  -------------------------------------------------------
                  GMS: address is 192.168.3.1:32789
                  -------------------------------------------------------
                  10461 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32789) failed, retrying
                  40465 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32789) failed, retrying
                  70470 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32789) failed, retrying
                  
                  


                  • 6. Re: JBoss Cache using Multicast.

                    This looks like a JGroups issue. So the question is: what are your OS and JDK? Maybe it is this, or your environment, that is causing the problem.

                    One way to troubleshoot this is to use two standalone JBossCache instances (i.e., not the MBean) and run a load test to see if you can reproduce it.

                    • 7. Re: JBoss Cache using Multicast.
                      kbisla

                      The OS is Linux running kernel 2.6.12.2 (i386), and Java is 1.4.2_08.
                      OK, I'll try running two standalone JBoss Cache instances and run some tests. Thanks!

                      • 8. Re: JBoss Cache using Multicast.

                        I'm having the same problem in a similar situation:

                        We are running a JBossCache on JBoss 4.0.3, and multiple external instances outside of JBoss, using JBossCache 1.2.4 and JGroups 2.2.9rc1. The JBoss instance is using a JBossTransactionManager and the external instances DummyTransactionManager.

                        Initially everything seems to work fine, but if one of the external caches is restarted after a few minutes it fails to rejoin the group:

                        WARN GMS: join(192.168.30.103:55055) failed (coord=192.168.30.101:59333), retrying

                        I've found that a workaround is to not specify a transaction manager to the JBoss instance. Of course this means that the cache is not transactional anymore, but fortunately in this application it does not matter.
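
                        [Editor's note] In terms of the shared config earlier in this thread, this workaround amounts to leaving the lookup attribute commented out on the JBoss instance as well:

```xml
<!--
Configure the TransactionManager
<attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>
-->
```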

                        • 9. Re: JBoss Cache using Multicast.

                          On further investigation it does not seem to be related to the transaction manager.

                          The following error:

                          2005-11-23 10:19:13 WARN GMS: join(192.168.100.82:59955) failed (coord=192.168.100.79:62951), retrying

                          refers to a coordinator on 192.168.100.79, but the cache instance on that particular machine was not actually running at the time, so it seems that the coordinator was shunned but no new coordinator was elected.

                          This is a nasty problem as the only way to fix it is to restart all cache instances.

                          • 10. Re: JBoss Cache using Multicast.
                            belaban

                            This could be due to http://jira.jboss.com/jira/browse/JGRP-126, and is fixed in 2.2.9

                            • 11. Re: JBoss Cache using Multicast.

                              We are running jgroups 2.2.9rc1.

                              • 12. Re: JBoss Cache using Multicast.
                                kbisla

                                 I found the problem to be related to the D-Link network driver; upgrading the driver fixed it,
                                 but now I'm running into an out-of-memory problem with the cache.

                                 Maybe you should check whether updates are available for your driver.
                                 If you don't want to upgrade, or there is no newer driver that fixes this,
                                 you could add a route for the multicast range (224.0.0.0/4), which helps in most cases:

                                 route add -net 224.0.0.0 netmask 240.0.0.0 dev ethXYZ
                                 


                                 On a different note, can I use JGroups 2.2.9RC1 with JBoss Cache 1.2.4?



                                • 13. Re: JBoss Cache using Multicast.
                                  belaban

                                   2.2.9RC1 has not been verified to work with 1.2.4, but it should work.

                                  • 14. Re: JBoss Cache using Multicast.
                                    kbisla

                                     About the OutOfMemoryError: one sender may be overwhelming the receivers.
                                     Is there some kind of flow control I could add to the stack?
                                     I noticed the receivers report out-of-memory before the sender does.
                                     Also, why would a receiver run out of memory if all the sender is doing is updating the same object over and over again, albeit at a very high rate?
                                     Any pointers would be helpful.
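
                                     [Editor's note] JGroups does ship a credit-based flow-control protocol, FC, that can be inserted into the stack for exactly this situation. The placement and attribute values below are illustrative and untested against this exact setup; they are not from this thread:

```xml
<!-- credit-based flow control: a sender blocks once it has max_credits
     bytes in flight, until receivers replenish its credits -->
<FC max_credits="2000000" min_threshold="0.10"
    up_thread="false" down_thread="false"/>
```

                                     Note that FC only paces senders relative to receivers; if a single receiver is simply slow to apply updates, its NAKACK retransmission buffers can still grow until STABLE garbage collection catches up.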
