
    Remoting On Existing POJO Channel

    shorrockin

      I'm currently developing a standalone app that needs a mechanism by which remote methods can be executed on individual servers in the cluster.

      My initial attempt at this was to use the JGroups RpcDispatcher, but with that class in place, starting up other nodes in the cluster resulted in an exception whose message reads:
      org.jboss.cache.CacheException: Initial state transfer failed: Channel.getState() returned false

      (removing the construction of the RpcDispatcher fixes the problem)

      Fair enough, I figure this isn't supported with JBossCache, so I'm trying to determine how it is suggested I do this. It appears that all the callRemoteMethod calls in TreeCache have been deprecated, and the comment which reads "this is due to be moved to an interceptor." doesn't lead me to a clear replacement.

      My questions are as follows:
      1) Should JGroups RpcDispatcher work?
      2) If not, what approach should I take to build this type of framework that utilizes the channel that JBossCache is sitting on?

        • 1. Re: Remoting On Existing POJO Channel
          shorrockin

          For all who care I finally figured out a way to get this to work (which also may point out a bug inside TreeCache).

          The first thing you have to do is create your JChannel before you create your PojoCache. My earlier attempts (in my first message) extended PojoCache to expose the JChannel through a public method; this did not work and resulted in the error listed in my original message. Creating your JChannel beforehand works, but only after some tweaking.

          The main problem is that, when you use your own JChannel, the method below (in TreeCache.class, around line 3177) waits forever:

           synchronized (members)
           {
              while (members.size() == 0)
              {
                 log.debug("waiting on viewAccepted()");
                 try
                 {
                    members.wait();
                 }
                 catch (InterruptedException iex)
                 {
                 }
              }
           }
          


          To fix this we have to figure out why this works when we allow JBossCache to create the JChannel, but not when we create it ourselves. I believe the problem lies in the _createService() method. If you look at this method, there is an if statement which checks whether a JChannel already exists and returns (prematurely), like so:

           log.info("cache mode is " + mode2String(cache_mode));
           if (channel != null)
           {  // already started
              log.info("channel is already running");
              return;
           }
          


          The problem with returning early is that, after the if block, there are several lines which attach a listener that I assume is required; the absence of this listener causes problems:

           channel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);
           channel.setOpt(Channel.AUTO_GETSTATE, Boolean.TRUE);
          
           // COMMENTED OUT CODE HERE
          
           disp = new RpcDispatcher(channel, ml, this, this);
           disp.setMarshaller(getMarshaller());
           break;
          


          Because this code never gets executed, we have to attach this listener manually ourselves. The whole process of creating the JChannel and setting up the listener looks like this:

           JChannelFactory factory = new JChannelFactory(this.getClass().getClassLoader().getResource(_jgroupsConfiguration));

           _channel = (JChannel) factory.createChannel();
           _channel.setOpt(Channel.GET_STATE_EVENTS, Boolean.TRUE);
           _channel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);
           _channel.setOpt(Channel.AUTO_GETSTATE, Boolean.TRUE);

           // creates our own dispatcher used to proxy method requests across the cluster
           _dispatcher = new RpcDispatcher(_channel, null, null, this);

           // hand the pre-built channel to the cache
           _serviceCache = new PojoCache(_channel);

           PropertyConfigurator config = new PropertyConfigurator();
           config.configure(_serviceCache, _cacheConfiguration);

           // START HACK: STOP JBOSSCACHE FROM BLOCKING FOREVER
           // re-attach the MessageListener that _createService() skips when the
           // channel already exists, so that viewAccepted() actually gets delivered
           RpcDispatcher dispatcher = new RpcDispatcher(_channel, _serviceCache.getMessageListener(), _serviceCache, _serviceCache);
           dispatcher.setMarshaller(_serviceCache.getMarshaller());
           // END HACK

           _serviceCache.startService();
          


          I believe this could all be avoided by removing the return statement on line 1352, so that the MessageListenerAdaptor gets properly attached.

          • 2. Re: Remoting On Existing POJO Channel
            shorrockin

            Even though it appears I'm talking to myself: my previous method of doing things doesn't quite work, even after fixing the obvious bug so that it reads:

             _serviceCache = new PojoCache(_channel)
             {
                public void start() throws Exception
                {
                   // START HACK: STOP JBOSSCACHE FROM BLOCKING FOREVER
                   // attach the cache's own MessageListener before starting, since
                   // _createService() skips this step when the channel already exists
                   this.disp = new RpcDispatcher(_channel, getMessageListener(), this, this);
                   this.disp.setMarshaller(getMarshaller());
                   // END HACK

                   super.start();
                }
             };


            This does fix the problem in TreeCache and allows me to create my own RpcDispatcher that I can use for my own means. However, this RpcDispatcher still does not work: it produces an EOFException which causes the main thread to sit in a wait block (maybe because there are two RpcDispatchers, one for JBossCache and one for me?). I'd love some advice from the authors on how I should be dealing with this problem, but I'm not sure anybody is there.

            • 3. Re: Remoting On Existing POJO Channel
              manik

              Hi,

              Sorry for the slow response.

              You shouldn't look at using JBoss Cache as an RPC mechanism. It is a library for distributing state, and that is it. I try to discourage people from using JBoss Cache as an RPC library, which is why the callRemoteMethods() methods are deprecated. They will disappear in 2.0.0.

              If you need to implement an RPC protocol, using JGroups + RpcDispatcher is one way to do it. And as you've seen, this is how we perform RPC in JBoss Cache.

              If you need JBoss Cache functionality *as well as* RPC functionality, perhaps you should start a separate JGroups channel (different mcast address/port/cluster name) for RPC calls and a separate channel for JBoss Cache. This will prevent any conflicts in messages, etc.
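
              Something along these lines, as a rough, untested sketch (EchoService, the stack file name, and the cluster name are all placeholders, not anything shipped with JBoss Cache):

               import org.jgroups.JChannel;
               import org.jgroups.blocks.GroupRequest;
               import org.jgroups.blocks.RpcDispatcher;
               import org.jgroups.util.RspList;

               // hypothetical server object whose methods are invoked remotely
               public class EchoService
               {
                  public String echo(String msg) { return "echo: " + msg; }

                  public static void main(String[] args) throws Exception
                  {
                     // a channel dedicated to RPC, with its own stack config and
                     // cluster name so it cannot collide with the cache's channel
                     JChannel rpcChannel = new JChannel("rpc-stack.xml");
                     RpcDispatcher dispatcher = new RpcDispatcher(rpcChannel, null, null, new EchoService());
                     rpcChannel.connect("MyRpcCluster"); // NOT the cache's cluster name

                     // invoke echo(String) on all members, waiting up to 5s for replies
                     RspList rsps = dispatcher.callRemoteMethods(null, "echo",
                           new Object[] { "hello" }, new Class[] { String.class },
                           GroupRequest.GET_ALL, 5000);
                     System.out.println(rsps);
                  }
               }

              The important point is that the RpcDispatcher above owns its channel outright, so it cannot interfere with the dispatcher JBoss Cache creates internally on its own channel.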



              • 4. Re: Remoting On Existing POJO Channel
                kblanken

                Hi there,

                thank you, Shorrockin, for describing the problem and more importantly the solution in detail. In fact, I'm having the very same problem of creating a TreeCache instance with a multiplexer channel (JBoss Cache 1.4.1.SP9).

                Here's my code:

                JChannelFactory factory=new JChannelFactory();
                factory.setMultiplexerConfig(getClass().getResource("stacks.xml"));
                Channel channel = factory.createMultiplexerChannel("udp-safe", cacheName);
                channel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);
                
                cache = new TreeCache((JChannel) channel);
                
                PropertyConfigurator configurator = new PropertyConfigurator();
                configurator.configure(cache, getClass().getResourceAsStream("replSync-pessimistic-service.xml"));
                
                cache.createService();
                cache.startService();


                When executing this code, the last line of the snippet hangs. The stack trace is:
                Vector(Object).wait(long, int) line: not available [native method]
                Vector(Object).wait() line: 199
                TreeCache.getCoordinator() line: 1840
                TreeCache.determineCoordinator() line: 1818
                TreeCache.startService() line: 1580
                


                Obviously no one is calling notify() on the Vector object. The only place this could happen is in the viewAccepted method of the MembershipListener interface. Since this is just what Shorrockin described, it seems to me he ran into the same problem back then.
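
                My mental model of the pairing, sketched below (this is just how I read the pattern, not the actual TreeCache source):

                 import java.util.Vector;
                 import org.jgroups.View;

                 public class CoordinatorWait
                 {
                    private final Vector members = new Vector();

                    public void getCoordinator() throws InterruptedException
                    {
                       synchronized (members)
                       {
                          while (members.size() == 0)
                          {
                             members.wait(); // this is where my thread is parked forever
                          }
                       }
                    }

                    // MembershipListener callback; if JGroups never delivers the
                    // first view, notifyAll() never runs and getCoordinator() blocks
                    public void viewAccepted(View new_view)
                    {
                       synchronized (members)
                       {
                          members.clear();
                          members.addAll(new_view.getMembers());
                          members.notifyAll();
                       }
                    }
                 }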

                Am I missing something here? BTW, I'm outside JBoss AS, so I can't access the cache MBean. (Can I?)


                • 5. Re: Remoting On Existing POJO Channel
                  manik

                  Is there any particular reason you are using 1.4.1.SP9 rather than 2.X?

                  • 6. Re: Remoting On Existing POJO Channel
                    kblanken

                    Hi Manik, thank you for your reply, I haven't seen it until now.
                    The reason we're on 1.4.1 is that we're stuck using WebSphere 6.0 for now, which means Java 1.4.

                    • 7. Re: Remoting On Existing POJO Channel
                      manik

                      Fair enough. Do you see JGroups membership messages in your logs? The Vector is waiting on a view change coming in from other instances in the cluster.

                      • 8. Re: Remoting On Existing POJO Channel
                        kblanken

                        No view change messages. I've seen them before using the multiplexer, but not now:

                        [5/14/08 12:48:16:183 CEST] 00000028 TreeCache I org.jboss.cache.TreeCache _createService channel is already running
                        [5/14/08 12:48:16:215 CEST] 00000029 UDP I org.jgroups.protocols.UDP createSockets sockets will use interface 80.253.209.188
                        [5/14/08 12:48:16:230 CEST] 00000029 UDP I org.jgroups.protocols.UDP createSockets socket information:
                        local_addr=80.253.209.188:4786, mcast_addr=234.56.78.90:48866, bind_addr=/80.253.209.188, ttl=64
                        sock: bound to 80.253.209.188:4786, receive buffer size=80000, send buffer size=150000
                        mcast_recv_sock: bound to 80.253.209.188:48866, send buffer size=150000, receive buffer size=80000
                        mcast_send_sock: bound to 80.253.209.188:4787, send buffer size=150000, receive buffer size=80000
                        [5/14/08 12:48:16:246 CEST] 0000002a SystemOut O
                        -------------------------------------------------------
                        GMS: address is 80.253.209.188:4786
                        -------------------------------------------------------
                        [5/14/08 12:48:18:327 CEST] 0000002b ENCRYPT I org.jgroups.protocols.ENCRYPT down handling view: [80.253.209.188:4786|0] [80.253.209.188:4786]
                        [5/14/08 12:48:18:342 CEST] 0000002b ENCRYPT I org.jgroups.protocols.ENCRYPT becomeKeyServer I have become key server 80.253.209.188:4786
                        [5/14/08 12:48:18:358 CEST] 00000028 TreeCache I org.jboss.cache.TreeCache startService TreeCache local address is 80.253.209.188:4786

                        And this is where the code hangs.
                        Shouldn't the startService() method continue regardless of other potential members?

                        • 9. Re: Remoting On Existing POJO Channel
                          manik

                          What version of JGroups do you have? Older versions did have issues with the multiplexer not properly propagating views, IIRC. I'd search the JGroups forums and mailing lists for that.

                          "kblanken" wrote:

                          Shouldn't the startService() method continue regardless of other potential members?


                          Even if you are the only member in the cluster, when you join you receive a view change containing only yourself. This is enough to determine that you are the coordinator and for the Vector to be notified as such.
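
                          A quick way to verify this outside of JBoss Cache entirely is to connect a bare channel using the same protocol stack and watch for the first view. Rough, untested sketch (the config file and cluster names are placeholders):

                           import org.jgroups.JChannel;
                           import org.jgroups.ReceiverAdapter;
                           import org.jgroups.View;

                           public class ViewTest
                           {
                              public static void main(String[] args) throws Exception
                              {
                                 JChannel ch = new JChannel("my-stack.xml"); // same stack your cache uses
                                 ch.setReceiver(new ReceiverAdapter()
                                 {
                                    public void viewAccepted(View v)
                                    {
                                       // even as the sole member this should fire once,
                                       // with a view of size 1 containing yourself
                                       System.out.println("viewAccepted: " + v.getMembers());
                                    }
                                 });
                                 ch.connect("ViewTestCluster");
                              }
                           }

                          If viewAccepted() never fires here either, the problem lies in the JGroups stack (or the multiplexer) rather than in TreeCache.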

                          • 10. Re: Remoting On Existing POJO Channel
                            kblanken

                            My JGroups version is 2.4.2. So far I haven't found anything on their site, but I'm still searching.

                            The Multiplexer doesn't seem to be the reason anyway. I've narrowed down the problem to the following code:

                            // Builds a properties String
                            JGroupsProperties props = new JGroupsProperties();
                            
                            JChannel channel = new JChannel(props.getProps());
                            cache = new TreeCache(channel);
                            cache.setCacheMode(TreeCache.REPL_SYNC);
                            
                            // Uncomment to prevent wait lock:
                            // channel.connect(cacheName);
                            // View view = channel.getView();
                            // if (cache.getMembers()==null || cache.getMembers().size()==0) {
                            // cache.getMembers().addAll(view.getMembers());
                            // }
                            
                            cache.startService();


                            This code hangs with the stack trace posted earlier. The viewAccepted() method is never called.
                            Enabling the commented-out code triggers a different code path in TreeCache.getCoordinator(), bypassing the members.wait(). This is a very crude workaround I'd rather not keep as a final solution.

                            The wait seems to have been introduced in http://jira.jboss.com/jira/browse/JBCACHE-507.
                            My system is a Windows XP machine, UDP loopback is enabled.

                            I now see the original error might indeed lie in JGroups, which doesn't pass the view change event generated by the first member up/down the stack. Can you confirm this? Where is the join event triggered in TreeCache?

                            Any help is really appreciated. I'll gladly run some tests if you need me to.

                            Regards,

                            Kai

                            • 11. Re: Remoting On Existing POJO Channel
                              manik

                              Do you see any logs from JBoss Cache (trace level) after calling createService() and startService()?

                              • 12. Re: Remoting On Existing POJO Channel
                                kblanken

                                The trace.log output, starting from the call to startService(), follows:

                                [5/16/08 16:01:21:840 CEST] 00000027 TreeCache I org.jboss.cache.TreeCache _createService No transaction manager lookup class has been defined. Transactions cannot be used
                                [5/16/08 16:01:21:934 CEST] 00000027 TreeCache 1 org.jboss.cache.TreeCache createEvictionPolicy Not using an EvictionPolicy
                                [5/16/08 16:01:22:325 CEST] 00000027 InterceptorCh 1 org.jboss.cache.factories.InterceptorChainFactory createPessimisticInterceptorChain interceptor chain is:
                                class org.jboss.cache.interceptors.CallInterceptor
                                class org.jboss.cache.interceptors.PessimisticLockInterceptor
                                class org.jboss.cache.interceptors.UnlockInterceptor
                                class org.jboss.cache.interceptors.ReplicationInterceptor
                                class org.jboss.cache.interceptors.TxInterceptor
                                class org.jboss.cache.interceptors.CacheMgmtInterceptor
                                [5/16/08 16:01:24:606 CEST] 00000027 TreeCache 1 org.jboss.cache.TreeCache _createService cache mode is REPL_SYNC
                                [5/16/08 16:01:24:637 CEST] 00000027 TreeCache I org.jboss.cache.TreeCache _createService channel is already running
                                [5/16/08 16:01:24:668 CEST] 00000028 STABLE 3 org.jgroups.protocols.pbcast.STABLE startStableTask stable task started
                                [5/16/08 16:01:24:778 CEST] 00000029 SystemOut O
                                -------------------------------------------------------
                                GMS: address is 80.253.209.188:2438
                                -------------------------------------------------------
                                [5/16/08 16:01:24:825 CEST] 0000002a PingSender 3 org.jgroups.protocols.PingSender run sending GET_MBRS_REQ
                                [5/16/08 16:01:24:840 CEST] 0000002b FD_SOCK 3 org.jgroups.protocols.FD_SOCK$ServerSocketHandler run waiting for client connections on 0.0.0.0/0.0.0.0:2440
                                [5/16/08 16:01:24:840 CEST] 0000002c PingWaiter 3 org.jgroups.protocols.PingWaiter findInitialMembers waiting for initial members: time_to_wait=2000, got 0 rsps
                                [5/16/08 16:01:25:840 CEST] 0000002a PingSender 3 org.jgroups.protocols.PingSender run sending GET_MBRS_REQ
                                [5/16/08 16:01:26:840 CEST] 0000002c PingWaiter 3 org.jgroups.protocols.PingWaiter findInitialMembers initial mbrs are []
                                [5/16/08 16:01:26:856 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.ClientGmsImpl join initial_mbrs are []
                                [5/16/08 16:01:26:856 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.ClientGmsImpl join no initial members discovered: creating group as first member
                                [5/16/08 16:01:26:934 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.GMS installView [local_addr=80.253.209.188:2438] view is [80.253.209.188:2438|0] [80.253.209.188:2438]
                                [5/16/08 16:01:26:950 CEST] 00000028 FC 3 org.jgroups.protocols.FC handleViewChange new membership: [80.253.209.188:2438]
                                [5/16/08 16:01:26:950 CEST] 00000028 FC 3 org.jgroups.protocols.FC handleViewChange creditors are []
                                [5/16/08 16:01:26:965 CEST] 00000028 STABLE 3 org.jgroups.protocols.pbcast.STABLE resetDigest resetting digest from NAKACK: [80.253.209.188:2438#-1]
                                [5/16/08 16:01:26:965 CEST] 0000002d FD_SOCK 1 org.jgroups.protocols.FD_SOCK down VIEW_CHANGE received: [80.253.209.188:2438]
                                [5/16/08 16:01:26:981 CEST] 0000002d FD_SOCK 1 org.jgroups.protocols.FD_SOCK getCacheFromCoordinator first member; cache is empty
                                [5/16/08 16:01:26:981 CEST] 0000002e FD_SOCK 3 org.jgroups.protocols.FD_SOCK up i-have-sock: 80.253.209.188:2438 --> 80.253.209.188:2440 (cache is {80.253.209.188:2438=80.253.209.188:2440})
                                [5/16/08 16:01:26:996 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.GMS setImpl 80.253.209.188:2438 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl
                                [5/16/08 16:01:26:996 CEST] 00000028 GMS 1 org.jgroups.protocols.pbcast.ClientGmsImpl becomeSingletonMember created group (first member). My view is [80.253.209.188:2438|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
                                [5/16/08 16:01:27:043 CEST] 00000027 TreeCache I org.jboss.cache.TreeCache startService TreeCache local address is 80.253.209.188:2438
                                [5/16/08 16:01:27:090 CEST] 00000027 JChannel 3 org.jgroups.JChannel getState cannot get state from myself (80.253.209.188:2438): probably the first member
                                [5/16/08 16:01:27:137 CEST] 00000027 TreeCache 1 org.jboss.cache.TreeCache getCoordinator getCoordinator(): waiting on viewAccepted()


                                • 13. Re: Remoting On Existing POJO Channel
                                  manik

                                  Could I see your JGroups cfg?

                                  • 14. Re: Remoting On Existing POJO Channel
                                    kblanken

                                    Sure. It is the same as in replAsync-service.xml, save for the loopback and num_initial_members attributes.

                                    <config>
                                       <!-- UDP: if you have a multihomed machine,
                                            set the bind_addr attribute to the appropriate NIC IP address -->
                                       <!-- UDP: On Windows machines, because of the media sense feature
                                            being broken with multicast (even after disabling media sense),
                                            set the loopback attribute to true -->
                                       <UDP mcast_addr="228.1.2.3" mcast_port="48866"
                                            ip_ttl="64" ip_mcast="true"
                                            mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                                            ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                                            loopback="true"/>
                                       <PING timeout="2000" num_initial_members="2"
                                            up_thread="false" down_thread="false"/>
                                       <MERGE2 min_interval="10000" max_interval="20000"/>
                                       <!-- <FD shun="true" up_thread="true" down_thread="true"/> -->
                                       <FD_SOCK/>
                                       <VERIFY_SUSPECT timeout="1500"
                                            up_thread="false" down_thread="false"/>
                                       <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
                                            max_xmit_size="8192" up_thread="false" down_thread="false"/>
                                       <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
                                            down_thread="false"/>
                                       <pbcast.STABLE desired_avg_gossip="20000"
                                            up_thread="false" down_thread="false"/>
                                       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                                            shun="true" print_local_addr="true"/>
                                       <FC max_credits="2000000" down_thread="false" up_thread="false"
                                            min_threshold="0.20"/>
                                       <FRAG frag_size="8192" down_thread="false" up_thread="true"/>
                                       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
                                    </config>

