1. Re: Remoting On Existing POJO Channel
shorrockin Jul 6, 2006 5:29 PM (in response to shorrockin)
For all who care, I finally figured out a way to get this to work (which may also point out a bug inside TreeCache).
The first thing you have to do is create your JChannel before you create your PojoCache. My earlier attempts (in my first message) extended PojoCache to expose the JChannel through a public method; this did not work and resulted in the error listed in my original message. Creating the JChannel beforehand works, but only after some tweaking.
The main problem is that the method below (in TreeCache.class) waits forever at line 3177 when you use your own JChannel:

synchronized (members) {
    while (members.size() == 0) {
        log.debug("waiting on viewAccepted()");
        try {
            members.wait();
        } catch (InterruptedException iex) {
        }
    }
}
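The hang above is a plain wait/notify handshake gone wrong: the starting thread waits on the members Vector until viewAccepted() adds a member and notifies the monitor. Here is a self-contained sketch of that pattern in plain Java; the class and method names (ViewWaiter, waitForView) are illustrative, not the actual TreeCache code:

```java
import java.util.Vector;

// Minimal sketch of the wait/notify handshake TreeCache relies on:
// one thread blocks on the members Vector until another thread
// delivers a "view" and calls notifyAll().
public class ViewWaiter {
    private final Vector<String> members = new Vector<>();

    // Invoked when a view change arrives; if this is never called
    // (the situation in this thread), waitForView() blocks forever.
    public void viewAccepted(String member) {
        synchronized (members) {
            members.add(member);
            members.notifyAll();
        }
    }

    public String waitForView() throws InterruptedException {
        synchronized (members) {
            while (members.isEmpty()) {
                members.wait(); // releases the lock until notifyAll()
            }
            return members.get(0);
        }
    }

    public static void main(String[] args) throws Exception {
        ViewWaiter w = new ViewWaiter();
        Thread notifier = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            w.viewAccepted("node-1");
        });
        notifier.start();
        System.out.println(w.waitForView()); // prints node-1 once notified
        notifier.join();
    }
}
```

If the notifying side never runs, which is what happens when the listener is never attached to the channel, the waiting thread stays blocked indefinitely; that is exactly the symptom described above.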
To fix this, we have to figure out why this works when we let JBossCache create the JChannel but not when we do it ourselves. I believe the problem lies in the _createService method. If you look at this method, there's an if statement which checks whether a JChannel exists and returns (prematurely), like so:

log.info("cache mode is " + mode2String(cache_mode));
if (channel != null) { // already started
    log.info("channel is already running");
    return;
}
The problem with returning is that after the if block there are several lines which attach a listener that I assume is required; the absence of this listener causes problems:

channel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);
channel.setOpt(Channel.AUTO_GETSTATE, Boolean.TRUE);
// COMMENTED OUT CODE HERE
disp = new RpcDispatcher(channel, ml, this, this);
disp.setMarshaller(getMarshaller());
break;
So because this code doesn't get executed, we have to manually attach this listener ourselves. The whole process of creating the JChannel and setting up the listener looks like:

JChannelFactory factory = new JChannelFactory(
        this.getClass().getClassLoader().getResource(_jgroupsConfiguration));
_channel = (JChannel) factory.createChannel();
_channel.setOpt(Channel.GET_STATE_EVENTS, Boolean.TRUE);
_channel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);
_channel.setOpt(Channel.AUTO_GETSTATE, Boolean.TRUE);

// creates our own dispatcher used to proxy method requests across the cluster
_dispatcher = new RpcDispatcher(_channel, null, null, this);

_serviceCache = new PojoCache(_channel);
PropertyConfigurator config = new PropertyConfigurator();
config.configure(_serviceCache, _cacheConfiguration);

// START HACK: STOP JBOSSCACHE FROM BLOCKING FOREVER
RpcDispatcher dispatcher = new RpcDispatcher(_channel,
        _serviceCache.getMessageListener(), _serviceCache, _serviceCache);
dispatcher.setMarshaller(_serviceCache.getMarshaller());
// END HACK

_serviceCache.startService();
I believe this could all be avoided by removing the return statement on line 1352 so that the MessageListenerAdaptor gets properly attached. -
2. Re: Remoting On Existing POJO Channel
shorrockin Jul 7, 2006 10:41 AM (in response to shorrockin)
Even though it seems like I'm talking to myself, it turns out my previous approach doesn't quite work, even after fixing the obvious bug so that it reads:
_serviceCache = new PojoCache(_channel) {
    public void start() throws Exception {
        // START HACK: STOP JBOSSCACHE FROM BLOCKING FOREVER
        this.disp = new RpcDispatcher(_channel,
                _serviceCache.getMessageListener(), _serviceCache, _serviceCache);
        this.disp.setMarshaller(getMarshaller());
        // END HACK
        super.start();
    }
};
This does fix the problem in TreeCache and allows me to create my own RpcDispatcher that I can use for my own purposes, but this RpcDispatcher still does not work: it produces an EOFException, which causes the main thread to sit in a wait block (maybe because there are two RpcDispatchers, one for JBossCache and one for me?). I'd love some advice from the authors on how I should be dealing with this problem, but I'm not sure anybody is there. -
3. Re: Remoting On Existing POJO Channel
manik Jul 11, 2006 5:09 AM (in response to shorrockin)
Hi,
Sorry for the slow response.
You shouldn't look at using JBoss Cache as an RPC mechanism. It is a library for distributing state, and that is it. I try to discourage people from using JBoss Cache as an RPC library, which is why the callRemoteMethods() methods are deprecated. They will disappear in 2.0.0.
If you need to implement an RPC protocol, using JGroups + RpcDispatcher is one way to do it. And as you've seen, this is how we perform RPC in JBoss Cache.
If you need JBoss Cache functionality *as well as* RPC functionality, perhaps you should start a separate JGroups channel (different mcast address/port/cluster name) for RPC calls and a separate channel for JBoss Cache. This will prevent any conflicts in messages, etc. -
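The separate-channel advice above comes down to giving the RPC channel its own JGroups stack. A sketch of what that second stack could look like, modelled on the cache stack quoted later in this thread; the mcast_addr, mcast_port, and timeout values below are placeholders, and the RPC channel would additionally be connected with a different cluster name than the cache channel:

```
<config>
    <!-- RPC-only stack: same protocol chain as the cache stack, but a
         different mcast_addr/mcast_port so the two groups never see
         each other's traffic (values here are placeholders) -->
    <UDP mcast_addr="228.4.5.6" mcast_port="48877" ip_ttl="64" ip_mcast="true" loopback="true"/>
    <PING timeout="2000" num_initial_members="2"/>
    <MERGE2 min_interval="10000" max_interval="20000"/>
    <FD_SOCK/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"/>
    <UNICAST timeout="600,1200,2400"/>
    <pbcast.STABLE desired_avg_gossip="20000"/>
    <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true"/>
    <FRAG frag_size="8192"/>
    <pbcast.STATE_TRANSFER/>
</config>
```

With two fully separate stacks and cluster names, the RpcDispatcher on the RPC channel never competes with JBoss Cache for messages on the same channel, which avoids the two-dispatchers-on-one-channel situation described earlier in the thread.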
4. Re: Remoting On Existing POJO Channel
kblanken May 5, 2008 1:08 PM (in response to shorrockin)
Hi there,
thank you, Shorrockin, for describing the problem and more importantly the solution in detail. In fact, I'm having the very same problem of creating a TreeCache instance with a multiplexer channel (JBoss Cache 1.4.1.SP9).
Here's my code:

JChannelFactory factory = new JChannelFactory();
factory.setMultiplexerConfig(getClass().getResource("stacks.xml"));
Channel channel = factory.createMultiplexerChannel("udp-safe", cacheName);
channel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);
cache = new TreeCache((JChannel) channel);
PropertyConfigurator configurator = new PropertyConfigurator();
configurator.configure(cache, getClass().getResourceAsStream("replSync-pessimistic-service.xml"));
cache.createService();
cache.startService();
When executing this code, the last line of this snippet hangs. The stack trace is:

Vector(Object).wait(long, int) line: not available [native method]
Vector(Object).wait() line: 199
TreeCache.getCoordinator() line: 1840
TreeCache.determineCoordinator() line: 1818
TreeCache.startService() line: 1580
Obviously no one is calling notify() on the Vector object. The only place this could happen is in the viewAccepted method of the MembershipListener interface. As this is just what Shorrockin described, it seems to me that he had the same problem back then.
Am I missing something here? BTW, I'm outside JBoss AS, so I can't access the cache MBean. (Can I?) -
5. Re: Remoting On Existing POJO Channel
manik May 9, 2008 6:19 AM (in response to shorrockin)
Is there any particular reason you are using 1.4.1.SP9 rather than 2.X?
-
6. Re: Remoting On Existing POJO Channel
kblanken May 14, 2008 4:09 AM (in response to shorrockin)
Hi Manik, thank you for your reply; I hadn't seen it until now.
The reason we're on 1.4.1 is that we're stuck with WebSphere 6.0 for now, which means Java 1.4. -
7. Re: Remoting On Existing POJO Channel
manik May 14, 2008 5:30 AM (in response to shorrockin)
Fair enough. Do you see JGroups membership messages in your logs? The Vector is waiting on a view change coming in from other instances in the cluster.
-
8. Re: Remoting On Existing POJO Channel
kblanken May 14, 2008 8:34 AM (in response to shorrockin)
No view change messages. I've seen them before using the multiplexer, but not now:
[5/14/08 12:48:16:183 CEST] 00000028 TreeCache I org.jboss.cache.TreeCache _createService channel is already running
[5/14/08 12:48:16:215 CEST] 00000029 UDP I org.jgroups.protocols.UDP createSockets sockets will use interface 80.253.209.188
[5/14/08 12:48:16:230 CEST] 00000029 UDP I org.jgroups.protocols.UDP createSockets socket information: local_addr=80.253.209.188:4786, mcast_addr=234.56.78.90:48866, bind_addr=/80.253.209.188, ttl=64
sock: bound to 80.253.209.188:4786, receive buffer size=80000, send buffer size=150000
mcast_recv_sock: bound to 80.253.209.188:48866, send buffer size=150000, receive buffer size=80000
mcast_send_sock: bound to 80.253.209.188:4787, send buffer size=150000, receive buffer size=80000
[5/14/08 12:48:16:246 CEST] 0000002a SystemOut O
-------------------------------------------------------
GMS: address is 80.253.209.188:4786
-------------------------------------------------------
[5/14/08 12:48:18:327 CEST] 0000002b ENCRYPT I org.jgroups.protocols.ENCRYPT down handling view: [80.253.209.188:4786|0] [80.253.209.188:4786]
[5/14/08 12:48:18:342 CEST] 0000002b ENCRYPT I org.jgroups.protocols.ENCRYPT becomeKeyServer I have become key server 80.253.209.188:4786
[5/14/08 12:48:18:358 CEST] 00000028 TreeCache I org.jboss.cache.TreeCache startService TreeCache local address is 80.253.209.188:4786
And this is where the code hangs.
Shouldn't the startService() method continue regardless of other potential members? -
9. Re: Remoting On Existing POJO Channel
manik May 14, 2008 10:16 AM (in response to shorrockin)
What version of JGroups do you have? Older versions did have issues with the multiplexer not properly propagating views, IIRC. I'd search the JGroups forums and mailing lists for that.
"kblanken" wrote:
Shouldn't the startService() method continue regardless of other potential members?
Even if you are the only member in the cluster, when you join you receive a view change containing only yourself. This is enough to determine that you are the coordinator and for the Vector to be notified as such. -
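Manik's point, that a one-member view is enough to resolve coordinatorship, can be sketched in plain Java: the coordinator is simply the first address in the current view. The helper below (determineCoordinator) is a hypothetical stand-in for illustration, not the actual TreeCache method:

```java
import java.util.List;

// Sketch: coordinatorship is "am I the first member of the view?",
// so a view containing only yourself is sufficient to decide it.
// determineCoordinator is illustrative, not the TreeCache method.
public class CoordinatorCheck {
    static boolean determineCoordinator(List<String> view, String localAddress) {
        return !view.isEmpty() && view.get(0).equals(localAddress);
    }

    public static void main(String[] args) {
        // First (and only) member: the view contains just ourselves.
        System.out.println(determineCoordinator(
                List.of("192.168.0.1:4786"), "192.168.0.1:4786"));
        // A later joiner: someone else heads the view.
        System.out.println(determineCoordinator(
                List.of("192.168.0.1:4786", "192.168.0.2:4786"), "192.168.0.2:4786"));
    }
}
```

This is why the hang in this thread points at a missing view delivery rather than a missing second node: even alone in the cluster, the joining member should receive its own one-member view and immediately become coordinator.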
10. Re: Remoting On Existing POJO Channel
kblanken May 15, 2008 1:52 PM (in response to shorrockin)
My JGroups version is 2.4.2. So far I haven't found anything on their site, but I'm still searching.
The Multiplexer doesn't seem to be the reason anyway. I've narrowed down the problem to the following code:

// Builds a properties String
JGroupsProperties props = new JGroupsProperties();
JChannel channel = new JChannel(props.getProps());
cache = new TreeCache(channel);
cache.setCacheMode(TreeCache.REPL_SYNC);
// Uncomment to prevent wait lock:
// channel.connect(cacheName);
// View view = channel.getView();
// if (cache.getMembers() == null || cache.getMembers().size() == 0) {
//     cache.getMembers().addAll(view.getMembers());
// }
cache.startService();
This code hangs with the stack trace posted earlier. The viewChanged method is never called.
Enabling the commented-out code triggers a different code path in TreeCache.getCoordinator(), bypassing members.wait(). This is a very crude workaround I'd rather not keep as a final solution.
The wait seems to have been introduced in http://jira.jboss.com/jira/browse/JBCACHE-507.
My system is a Windows XP machine, UDP loopback is enabled.
I now suspect the original error might indeed lie in JGroups, which doesn't pass the view change event generated by the first member down/up the stack. Can you confirm this? Where is the join event triggered in TreeCache?
Any help is really appreciated. I'll gladly run some tests if you need me to.
Regards,
Kai -
11. Re: Remoting On Existing POJO Channel
manik May 15, 2008 5:30 PM (in response to shorrockin)
Do you see any logs from JBoss Cache (trace level) after calling createService() and startService()?
-
12. Re: Remoting On Existing POJO Channel
kblanken May 16, 2008 10:15 AM (in response to shorrockin)
The trace.log starting from the call of startService() follows:
[5/16/08 16:01:21:840 CEST] 00000027 TreeCache I org.jboss.cache.TreeCache _createService No transaction manager lookup class has been defined. Transactions cannot be used
[5/16/08 16:01:21:934 CEST] 00000027 TreeCache 1 org.jboss.cache.TreeCache createEvictionPolicy Not using an EvictionPolicy
[5/16/08 16:01:22:325 CEST] 00000027 InterceptorCh 1 org.jboss.cache.factories.InterceptorChainFactory createPessimisticInterceptorChain interceptor chain is:
class org.jboss.cache.interceptors.CallInterceptor
class org.jboss.cache.interceptors.PessimisticLockInterceptor
class org.jboss.cache.interceptors.UnlockInterceptor
class org.jboss.cache.interceptors.ReplicationInterceptor
class org.jboss.cache.interceptors.TxInterceptor
class org.jboss.cache.interceptors.CacheMgmtInterceptor
[5/16/08 16:01:24:606 CEST] 00000027 TreeCache 1 org.jboss.cache.TreeCache _createService cache mode is REPL_SYNC
[5/16/08 16:01:24:637 CEST] 00000027 TreeCache I org.jboss.cache.TreeCache _createService channel is already running
[5/16/08 16:01:24:668 CEST] 00000028 STABLE 3 org.jgroups.protocols.pbcast.STABLE startStableTask stable task started
[5/16/08 16:01:24:778 CEST] 00000029 SystemOut O
-------------------------------------------------------
GMS: address is 80.253.209.188:2438
-------------------------------------------------------
[5/16/08 16:01:24:825 CEST] 0000002a PingSender 3 org.jgroups.protocols.PingSender run sending GET_MBRS_REQ
[5/16/08 16:01:24:840 CEST] 0000002b FD_SOCK 3 org.jgroups.protocols.FD_SOCK$ServerSocketHandler run waiting for client connections on 0.0.0.0/0.0.0.0:2440
[5/16/08 16:01:24:840 CEST] 0000002c PingWaiter 3 org.jgroups.protocols.PingWaiter findInitialMembers waiting for initial members: time_to_wait=2000, got 0 rsps
[5/16/08 16:01:25:840 CEST] 0000002a PingSender 3 org.jgroups.protocols.PingSender run sending GET_MBRS_REQ
[5/16/08 16:01:26:840 CEST] 0000002c PingWaiter 3 org.jgroups.protocols.PingWaiter findInitialMembers initial mbrs are []
[5/16/08 16:01:26:856 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.ClientGmsImpl join initial_mbrs are []
[5/16/08 16:01:26:856 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.ClientGmsImpl join no initial members discovered: creating group as first member
[5/16/08 16:01:26:934 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.GMS installView [local_addr=80.253.209.188:2438] view is [80.253.209.188:2438|0] [80.253.209.188:2438]
[5/16/08 16:01:26:950 CEST] 00000028 FC 3 org.jgroups.protocols.FC handleViewChange new membership: [80.253.209.188:2438]
[5/16/08 16:01:26:950 CEST] 00000028 FC 3 org.jgroups.protocols.FC handleViewChange creditors are []
[5/16/08 16:01:26:965 CEST] 00000028 STABLE 3 org.jgroups.protocols.pbcast.STABLE resetDigest resetting digest from NAKACK: [80.253.209.188:2438#-1]
[5/16/08 16:01:26:965 CEST] 0000002d FD_SOCK 1 org.jgroups.protocols.FD_SOCK down VIEW_CHANGE received: [80.253.209.188:2438]
[5/16/08 16:01:26:981 CEST] 0000002d FD_SOCK 1 org.jgroups.protocols.FD_SOCK getCacheFromCoordinator first member; cache is empty
[5/16/08 16:01:26:981 CEST] 0000002e FD_SOCK 3 org.jgroups.protocols.FD_SOCK up i-have-sock: 80.253.209.188:2438 --> 80.253.209.188:2440 (cache is {80.253.209.188:2438=80.253.209.188:2440})
[5/16/08 16:01:26:996 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.GMS setImpl 80.253.209.188:2438 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl
[5/16/08 16:01:26:996 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.ClientGmsImpl becomeSingletonMember created group (first member). My view is [80.253.209.188:2438|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
[5/16/08 16:01:27:043 CEST] 00000027 TreeCache I org.jboss.cache.TreeCache startService TreeCache local address is 80.253.209.188:2438
[5/16/08 16:01:27:090 CEST] 00000027 JChannel 3 org.jgroups.JChannel getState cannot get state from myself (80.253.209.188:2438): probably the first member
[5/16/08 16:01:27:137 CEST] 00000027 TreeCache 1 org.jboss.cache.TreeCache getCoordinator getCoordinator(): waiting on viewAccepted()
-
13. Re: Remoting On Existing POJO Channel
manik May 16, 2008 11:59 AM (in response to shorrockin)
Could I see your JGroups cfg?
-
14. Re: Remoting On Existing POJO Channel
kblanken May 19, 2008 4:11 AM (in response to shorrockin)
Sure. It is the same as in replAsync-service.xml, save for the loopback and num_initial_members attributes.
<config>
    <!-- UDP: if you have a multihomed machine, set the bind_addr attribute to the appropriate NIC IP address -->
    <!-- UDP: On Windows machines, because of the media sense feature being broken with multicast
         (even after disabling media sense), set the loopback attribute to true -->
    <UDP mcast_addr="228.1.2.3" mcast_port="48866" ip_ttl="64" ip_mcast="true"
         mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
         ucast_send_buf_size="150000" ucast_recv_buf_size="80000" loopback="true"/>
    <PING timeout="2000" num_initial_members="2" up_thread="false" down_thread="false"/>
    <MERGE2 min_interval="10000" max_interval="20000"/>
    <!-- <FD shun="true" up_thread="true" down_thread="true"/> -->
    <FD_SOCK/>
    <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false"/>
    <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192"
         up_thread="false" down_thread="false"/>
    <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false"/>
    <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false"/>
    <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true"/>
    <FC max_credits="2000000" down_thread="false" up_thread="false" min_threshold="0.20"/>
    <FRAG frag_size="8192" down_thread="false" up_thread="true"/>
    <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
</config>