19 Replies Latest reply on Dec 2, 2005 6:45 PM by kbisla

    JBoss Cache using Multicast.

    kbisla

      I'm facing a strange problem with JBoss Cache using multicast; any pointers would be helpful.
      I have a JBoss server (4.0.1) running JBoss Cache using multicast. The setup is like this:

      |client|---invoking session bean to add data----------->|server|
      |____|<===== cache update that data added=====|______|

      The client just listens to the cache for updates. To add data it calls the server directly, then hears about the change through the cache.
      Only the server writes to the cache. We are using DummyTransactionManager.
      This whole setup works fine under Linux 2.4, but under 2.6 it doesn't.
      My first guess was that multicast wasn't working right, but I checked and it was (I used the org.jgroups.tests.McastReceiverTest and McastSenderTest classes to confirm).
      Under 2.6 it works fine for only a few minutes, and then the client stops getting any updates. I have to use jboss-4.0.1/bin/twiddle.sh to stop and start the JBossCache MBean to get it working again, but that also only works for a few minutes. I suspect some deadlock issue.
      Any ideas/direction would be greatly appreciated.
      Thanks
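
      [Editor's note] For readers who want to reproduce the multicast check without pulling in the JGroups test classes, here is a minimal stdlib-only sketch of what McastSenderTest/McastReceiverTest do. The group address and port are taken from the config posted later in this thread; the class name and everything else are illustrative:

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;

// Rough stand-in for org.jgroups.tests.McastSenderTest / McastReceiverTest:
// join a multicast group, send one datagram, and try to read it back
// (IP_MULTICAST_LOOP is on by default, so a working host hears its own packet).
public class McastSmokeTest {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("228.1.2.3");
        int port = 48866;
        try (MulticastSocket sock = new MulticastSocket(port)) {
            sock.joinGroup(group);
            sock.setSoTimeout(3000); // don't block forever if multicast is broken
            byte[] msg = "ping".getBytes("UTF-8");
            sock.send(new DatagramPacket(msg, msg.length, group, port));
            DatagramPacket in = new DatagramPacket(new byte[64], 64);
            try {
                sock.receive(in);
                System.out.println("received: "
                        + new String(in.getData(), 0, in.getLength(), "UTF-8"));
            } catch (SocketTimeoutException e) {
                System.out.println("no packet received -- multicast looks broken on this host");
            }
            sock.leaveGroup(group);
        }
    }
}
```

      If this fails while the OS otherwise looks healthy, the problem is below JGroups (NIC driver, routing, firewall), which is consistent with how this thread eventually resolves.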

        • 1. Re: JBoss Cache using Multicast.
          kbisla

          One other thing to add: the JBoss Cache instance running on the JBoss server uses the following transaction manager lookup:

          org.jboss.cache.JBossTransactionManagerLookup

          while the client uses
          DummyTransactionManagerLookup

          hope this helps.
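
          [Editor's note] For reference, the lookup class in each case is selected through the TreeCache MBean attribute shown commented out in the shared config later in this thread; the two setups differ only in this one line:

```xml
<!-- server side: delegate to the JBoss transaction manager -->
<attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>

<!-- client side: no real TM is available outside the app server -->
<attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>
```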

          • 2. Re: JBoss Cache using Multicast.

            I am not sure why either. But if it is a deadlock, it should time out in 15 seconds (the default), so you should be able to see it in the log.
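
            [Editor's note] The timeout being referred to is the TreeCache LockAcquisitionTimeout attribute (milliseconds), which the config shared later in this thread sets explicitly:

```xml
<!-- Max number of milliseconds to wait for a lock acquisition -->
<attribute name="LockAcquisitionTimeout">15000</attribute>
```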

            • 3. Re: JBoss Cache using Multicast.
              kbisla

              So here's what's happening.
              TreeCache is running happily on the JBoss server; with debug turned on, it prints all the _put calls to the cache.
              But the client cannot receive any cache updates (actually it does briefly, sometimes for up to a few minutes, and then never hears anything again).
              I suspect the TreeCache on the server is not writing the message out on the wire or something (the JGroups channel).

              I also updated to the latest JBoss Cache, i.e. 1.2.4, and JGroups 2.2.8.
              Not to be paranoid, but to be very sure, I have
              jboss-4.0.1/server/default/deploy/myapp/my.ear/my-jmx.sar/jboss-cache-service.xml
              as a symbolic link to the same jboss-cache-service.xml the client is
              using, so they are definitely using the same multicast port etc.
              I have also put the XML in this post.

              Also, I want to mention the client is a Java Swing application.
              Another thing I did to see what's happening on the cache: I wrote a
              very simple Java app which just prints whatever it hears on the cache to the console.
              After the Swing app stopped receiving messages, I started this simple app, and here's its output:

              using [jboss-cache-service.xml] cache config and [/] region
              0 [main] INFO cache.PropertyConfigurator - Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@15cda3f
              38 [main] INFO cache.PropertyConfigurator - configure(): attribute size: 13
              60 [main] INFO cache.TreeCache - setting cluster properties from xml to: UDP(mcast_addr=228.1.2.3;mcast_port=48866;ip_ttl=64;ip_mcast=true;mcast_send_buf_size=150000;mcast_recv_buf_size=150000;ucast_send_buf_size=150000;ucast_recv_buf_size=80000;loopback=true):PING(timeout=20000;num_initial_members=1;up_thread=false;down_thread=false):MERGE2(min_interval=10000;max_interval=20000):FD_SOCK:VERIFY_SUSPECT(timeout=15000;up_thread=false;down_thread=false):pbcast.NAKACK(gc_lag=50;retransmit_timeout=600,1200,2400,4800;max_xmit_size=8192;up_thread=false;down_thread=false):UNICAST(timeout=600,1200,2400;window_size=100;min_threshold=10;down_thread=false):pbcast.STABLE(desired_avg_gossip=20000;up_thread=false;down_thread=false):FRAG(frag_size=8192;down_thread=false;up_thread=false):pbcast.GMS(join_timeout=10000;join_retry_timeout=20000;shun=false;print_local_addr=true):pbcast.STATE_TRANSFER(up_thread=true;down_thread=true)
              cache name : TreeCache-Cluster
              cluster props : UDP(mcast_addr=228.1.2.3;mcast_port=48866;ip_ttl=64;ip_mcast=true;mcast_send_buf_size=150000;mcast_recv_buf_size=150000;ucast_send_buf_size=150000;ucast_recv_buf_size=80000;loopback=true):PING(timeout=20000;num_initial_members=1;up_thread=false;down_thread=false):MERGE2(min_interval=10000;max_interval=20000):FD_SOCK:VERIFY_SUSPECT(timeout=15000;up_thread=false;down_thread=false):pbcast.NAKACK(gc_lag=50;retransmit_timeout=600,1200,2400,4800;max_xmit_size=8192;up_thread=false;down_thread=false):UNICAST(timeout=600,1200,2400;window_size=100;min_threshold=10;down_thread=false):pbcast.STABLE(desired_avg_gossip=20000;up_thread=false;down_thread=false):FRAG(frag_size=8192;down_thread=false;up_thread=false):pbcast.GMS(join_timeout=10000;join_retry_timeout=20000;shun=false;print_local_addr=true):pbcast.STATE_TRANSFER(up_thread=true;down_thread=true)
              init state retrival time out : 5000
              cache mode : REPL_ASYNC
              76 [main] WARN cache.TreeCache TreeCache - No transaction manager lookup class has been defined. Transactions cannot be used
              94 [main] INFO cache.TreeCache TreeCache - interceptor chain is:
              class org.jboss.cache.interceptors.CallInterceptor
              class org.jboss.cache.interceptors.LockInterceptor
              class org.jboss.cache.interceptors.UnlockInterceptor
              class org.jboss.cache.interceptors.ReplicationInterceptor
              94 [main] INFO cache.TreeCache TreeCache - cache mode is REPL_ASYNC
              434 [DownHandler (UDP)] INFO protocols.UDP - sockets will use interface 192.168.3.1
              437 [DownHandler (UDP)] INFO protocols.UDP - socket information:
              local_addr=192.168.3.1:32854, mcast_addr=228.1.2.3:48866, bind_addr=/192.168.3.1, ttl=64
              sock: bound to 192.168.3.1:32854, receive buffer size=80000, send buffer size=131071
              mcast_recv_sock: bound to 192.168.3.1:48866, send buffer size=131071, receive buffer size=131071
              mcast_send_sock: bound to 192.168.3.1:32856, send buffer size=131071, receive buffer size=131071
              
              -------------------------------------------------------
              GMS: address is 192.168.3.1:32854
              -------------------------------------------------------
              10464 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32854) failed, retrying
              40468 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32854) failed, retrying
              .
              .
              .
              .
              .
              .
              130491 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32854) failed, retrying
              


              The shared jboss-cache-service.xml


              <?xml version="1.0" encoding="UTF-8"?>
              
              <!-- ===================================================================== -->
              <!-- -->
              <!-- Sample TreeCache Service Configuration -->
              <!-- -->
              <!-- ===================================================================== -->
              
              <server>
              
               <!-- ==================================================================== -->
               <!-- Defines TreeCache configuration -->
               <!-- ==================================================================== -->
              
               <mbean code="org.jboss.cache.TreeCache"
               name="jboss.cache:service=TreeCache">
              
               <depends>jboss:service=Naming</depends>
               <depends>jboss:service=TransactionManager</depends>
              
               <!--
               Configure the TransactionManager
               <attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>
               -->
               <!--
               Isolation level : SERIALIZABLE
               REPEATABLE_READ (default)
               READ_COMMITTED
               READ_UNCOMMITTED
               NONE
               -->
               <attribute name="IsolationLevel">NONE</attribute>
              
               <!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC -->
               <attribute name="CacheMode">REPL_ASYNC</attribute>
              
               <!-- Just used for async repl: use a replication queue -->
               <attribute name="UseReplQueue">true</attribute>
              
               <!-- Replication interval for replication queue (in ms) -->
               <attribute name="ReplQueueInterval">0</attribute>
              
               <!-- Max number of elements which trigger replication -->
               <attribute name="ReplQueueMaxElements">0</attribute>
              
               <!-- Name of cluster. Needs to be the same for all clusters, in order to find each other -->
               <attribute name="ClusterName">TreeCache-Cluster</attribute>
              
               <!-- JGroups protocol stack properties. Can also be a URL, e.g. file:/home/bela/default.xml
               <attribute name="ClusterProperties"></attribute>
               -->
              
               <attribute name="ClusterConfig">
               <config>
               <!-- UDP: if you have a multihomed machine,
               set the bind_addr attribute to the appropriate NIC IP address -->
               <!-- UDP: On Windows machines, because of the media sense feature
               being broken with multicast (even after disabling media sense)
               set the loopback attribute to true -->
               <UDP mcast_addr="228.1.2.3" mcast_port="48866"
               ip_ttl="64" ip_mcast="true"
               mcast_send_buf_size="150000" mcast_recv_buf_size="150000"
               ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
               loopback="true"/>
               <PING timeout="20000" num_initial_members="1"
               up_thread="false" down_thread="false"/>
               <MERGE2 min_interval="10000" max_interval="20000"/>
               <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
               <FD_SOCK/>
               <VERIFY_SUSPECT timeout="15000"
               up_thread="false" down_thread="false"/>
               <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
               max_xmit_size="8192" up_thread="false" down_thread="false"/>
               <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
               down_thread="false"/>
               <pbcast.STABLE desired_avg_gossip="20000"
               up_thread="false" down_thread="false"/>
               <FRAG frag_size="8192"
               down_thread="false" up_thread="false"/>
               <pbcast.GMS join_timeout="10000" join_retry_timeout="20000"
               shun="false" print_local_addr="true"/>
               <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
               </config>
               </attribute>
              
               <!--
               Whether or not to fetch state on joining a cluster
               -->
               <attribute name="FetchStateOnStartup">true</attribute>
              
               <!--
               The max amount of time (in milliseconds) we wait until the
               initial state (ie. the contents of the cache) are retrieved from
               existing members in a clustered environment
               -->
               <attribute name="InitialStateRetrievalTimeout">5000</attribute>
              
               <!--
               Number of milliseconds to wait until all responses for a
               synchronous call have been received.
               -->
               <attribute name="SyncReplTimeout">20000</attribute>
              
               <!-- Max number of milliseconds to wait for a lock acquisition -->
               <attribute name="LockAcquisitionTimeout">15000</attribute>
              
              
               <!-- Name of the eviction policy class. -->
               <attribute name="EvictionPolicyClass"></attribute>
              
               <!--
               Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
               class loader, e.g., inside an application server. Default is "false".
               -->
               <attribute name="UseMarshalling">false</attribute>
              
               </mbean>
              
              
               <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
               <!-- <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">-->
               <!-- <depends>jboss.cache:service=TreeCache</depends>-->
               <!-- <attribute name="CacheService">jboss.cache:service=TreeCache</attribute>-->
               <!-- </mbean>-->
              
              
              </server>
              


              • 4. Re: JBoss Cache using Multicast.
                manik

                I know this may sound simplistic, but does multicast work in the first place? I.e., have you been successful in running tests like Draw?

                http://www.jgroups.org/javagroupsnew/docs/newuser/node13.html

                • 5. Re: JBoss Cache using Multicast.
                  kbisla

                   As I said in my original post, that's the first thing I thought too, so I tested multicast using
                   org.jgroups.tests.McastSenderTest and McastReceiverTest
                   and found it to be working absolutely fine. I haven't tried the Draw app though.

                   Initially the cache seems to work fine, during which the client gets all the updates etc.,
                   but after a few minutes the client never hears anything again.
                   So, to investigate further what's happening at the transport level, I also started org.jgroups.tests.McastReceiverTest,
                   which prints out all the updates sent out by the server;
                   but once the client stops seeing the cache updates, McastReceiverTest
                   only prints the NAKACK/STABLE messages...
                   See the output below.
                   Does this ring a bell?

                  _replicateur[Ljava.lang.Object;??X?s)lxpsrjava.util.LinkedList)S]J`?"xpwsq~w_putuq~psrorg.jboss.cache.Fqnp?0??yxpwt1xt1q~srjava.lang.
                  [sender=192.168.3.1:32773]
                  0228j?NAKACKSTABLE???????UDPTreeCache-Cluster[sender=192.168.3.1:32777]
                  0228j?NAKACKSTABLE???????UDPTreeCache-Cluster[sender=192.168.3.1:32777]
                  

                  Also if i start another jboss-cache client right after the main client stops getting updates here's what it outputs....
                  -------------------------------------------------------
                  GMS: address is 192.168.3.1:32789
                  -------------------------------------------------------
                  10461 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32789) failed, retrying
                  40465 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32789) failed, retrying
                  70470 [DownHandler (GMS)] WARN pbcast.ClientGmsImpl - join(192.168.3.1:32789) failed, retrying
                  
                  


                  • 6. Re: JBoss Cache using Multicast.

                    This looks like a JGroups issue. So the question is: what are your OS and JDK? Maybe it is this, or your environment, that is causing the problem.

                    One way to troubleshoot this is to use two standalone JBossCache instances (i.e., not the MBean) and run a load test to see if you can reproduce it.

                    • 7. Re: JBoss Cache using Multicast.
                      kbisla

                      The OS is Linux running kernel 2.6.12.2 (i386), and Java is 1.4.2_08.
                      OK, I'll try running two standalone JBoss Cache instances and run some tests. Thanks!

                      • 8. Re: JBoss Cache using Multicast.

                        I'm having the same problem in a similar situation:

                        We are running a JBossCache on JBoss 4.0.3, and multiple external instances outside of JBoss, using JBossCache 1.2.4 and JGroups 2.2.9rc1. The JBoss instance is using a JBossTransactionManager and the external instances DummyTransactionManager.

                        Initially everything seems to work fine, but if one of the external caches is restarted after a few minutes it fails to rejoin the group:

                        WARN GMS: join(192.168.30.103:55055) failed (coord=192.168.30.101:59333), retrying

                        I've found that a workaround is to not specify a transaction manager to the JBoss instance. Of course this means that the cache is not transactional anymore, but fortunately in this application it does not matter.
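
                        [Editor's note] In terms of the shared config earlier in this thread, this workaround amounts to leaving the lookup attribute commented out on the JBoss instance as well:

```xml
<!--
Configure the TransactionManager
<attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>
-->
```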

                        • 9. Re: JBoss Cache using Multicast.

                          On further investigation it does not seem to be related to the transaction manager.

                          The following error:

                          2005-11-23 10:19:13 WARN GMS: join(192.168.100.82:59955) failed (coord=192.168.100.79:62951), retrying

                          refers to a coordinator on 192.168.100.79, but the cache instance on that particular machine was not actually running at the time, so it seems that the coordinator was shunned but no new coordinator was elected.

                          This is a nasty problem as the only way to fix it is to restart all cache instances.

                          • 10. Re: JBoss Cache using Multicast.
                            belaban

                            This could be due to http://jira.jboss.com/jira/browse/JGRP-126, and is fixed in 2.2.9

                            • 11. Re: JBoss Cache using Multicast.

                              We are running jgroups 2.2.9rc1.

                              • 12. Re: JBoss Cache using Multicast.
                                kbisla

                                 I found the problem to be related to the D-Link network driver; upgrading the driver fixed it,
                                 but now I'm running into an out-of-memory problem with the cache.

                                 Maybe you should check whether updates are available for your driver.
                                 If you don't want to upgrade, or there is no newer driver that fixes this,
                                 you could add a route for the multicast range (224.0.0.0/4), which helps in most cases:

                                 route add -net 224.0.0.0 netmask 240.0.0.0 dev ethXYZ
                                 


                                 On a different note, can I use JGroups 2.2.9RC1 with JBoss Cache 1.2.4?



                                • 13. Re: JBoss Cache using Multicast.
                                  belaban

                                   2.2.9RC1 has not been verified to work with 1.2.4, but it should work.

                                  • 14. Re: JBoss Cache using Multicast.
                                    kbisla

                                     About the OutOfMemoryError: one sender may be overwhelming the receivers.
                                     Is there some kind of flow control I could add to the stack?
                                     I noticed the receivers report out-of-memory before the sender does.
                                     Also, why would a receiver run out of memory if all the sender is doing is updating the same object over and over again, albeit at a very high rate?
                                     Any pointers would be helpful.
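
                                     [Editor's note] JGroups does ship a credit-based flow-control protocol, FC, that can be inserted into the stack for exactly this situation. The placement and attribute values below are illustrative and untested against this exact setup; they are not from this thread:

```xml
<!-- credit-based flow control: a sender blocks once it has max_credits
     bytes in flight, until receivers replenish its credits -->
<FC max_credits="2000000" min_threshold="0.10"
    up_thread="false" down_thread="false"/>
```

                                     Note that FC only paces senders relative to receivers; if a single receiver is simply slow to apply updates, its NAKACK retransmission buffers can still grow until STABLE garbage collection catches up.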
