7 Replies Latest reply on Jan 15, 2007 7:26 AM by oew

    TCP replication conf issue ?

    oew

      Hi All,

      I am facing a strange problem: I am trying to cluster 2 (and later 3) caches. My tests are based on test.examples.StudentMaintTest. The code remains the same except for

      private PojoCache createCache(String name)
      There is one such method per cache, each reading a different conf file.

      One conf file has:

      <TCP bind_addr="10.32.2.48" loopback="true" start_port="7800" enable_diagnostics="false" />
      <TCPPING down_thread="true" initial_hosts="10.32.2.48[7800],10.32.2.48[7801]" num_initial_members="3" port_range="5" timeout="3500" up_thread="true" />
      <MERGE2 max_interval="10000" min_interval="5000"/>
      <FD down_thread="true" max_tries="5" shun="true" timeout="2500" up_thread="true"/>
      <VERIFY_SUSPECT down_thread="false" timeout="1500" up_thread="false"/>
      <pbcast.NAKACK down_thread="true" gc_lag="100" retransmit_timeout="3000" up_thread="true"/>
      <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false"/>
      <pbcast.GMS down_thread="true" join_retry_timeout="2000" join_timeout="5000" print_local_addr="false" shun="false" up_thread="true"/>
      <pbcast.STATE_TRANSFER down_thread="true" up_thread="true"/>


      The other one has:

      <TCP bind_addr="10.32.2.48" loopback="true" start_port="7801" enable_diagnostics="false" />
      <TCPPING down_thread="true" initial_hosts="10.32.2.48[7800],10.32.2.48[7801]" num_initial_members="3" port_range="5" timeout="3500" up_thread="true"/>
      <MERGE2 max_interval="10000" min_interval="5000"/>
      <FD down_thread="true" max_tries="5" shun="true" timeout="2500" up_thread="true"/>
      <VERIFY_SUSPECT down_thread="false" timeout="1500" up_thread="false"/>
      <pbcast.NAKACK down_thread="true" gc_lag="100" retransmit_timeout="3000" up_thread="true"/>
      <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false"/>
      <pbcast.GMS down_thread="true" join_retry_timeout="2000" join_timeout="5000" print_local_addr="false" shun="false" up_thread="true"/>
      <pbcast.STATE_TRANSFER down_thread="true" up_thread="true"/>


      With these, nothing is synchronized and the test fails: a NullPointerException is thrown when calling
      joe2.addCourse(bar_);
      where joe2 comes from
      Student joe2 = (Student) cache2_.getObject("/students/65432");

      If I do NOT list all the "servers" in TCPPING, it works!

      Did I do something wrong in the service.xml files, or did I miss something?
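One quick sanity check on configs like the two above is whether each node's own bind_addr and start_port actually appear in its initial_hosts list. The following is only a minimal sketch using plain Java string handling; the class and method names (InitialHostsCheck, parse, listsSelf) are made up for illustration and are not part of JGroups or JBoss Cache, and the sketch does not model port_range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JGroups: inspects a TCPPING initial_hosts value.
final class InitialHostsCheck {
    // Matches one "host[port]" entry, e.g. 10.32.2.48[7800].
    private static final Pattern ENTRY =
        Pattern.compile("\\s*([^\\[\\],\\s]+)\\[(\\d+)\\]\\s*");

    /** Splits an initial_hosts value into "host:port" strings. */
    static List<String> parse(String initialHosts) {
        List<String> result = new ArrayList<>();
        for (String entry : initialHosts.split(",")) {
            Matcher m = ENTRY.matcher(entry);
            if (!m.matches()) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            result.add(m.group(1) + ":" + Integer.parseInt(m.group(2)));
        }
        return result;
    }

    /** True if the node's own bind_addr/start_port pair is one of the listed hosts. */
    static boolean listsSelf(String initialHosts, String bindAddr, int startPort) {
        return parse(initialHosts).contains(bindAddr + ":" + startPort);
    }

    public static void main(String[] args) {
        // initial_hosts value taken from the two conf files in this post.
        String hosts = "10.32.2.48[7800],10.32.2.48[7801]";
        System.out.println(parse(hosts));
        System.out.println(listsSelf(hosts, "10.32.2.48", 7801)); // second cache
        System.out.println(listsSelf(hosts, "10.32.2.48", 7802)); // a third cache
    }
}
```

A node that is missing from the list is not necessarily misconfigured, since port_range can widen what TCPPING probes, but the check makes it easy to see which addresses are named explicitly.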

      Thanks for your answers

        • 1. Re: TCP replication conf issue ? now Works
          oew

          Hi All,

          Working around the problem and testing conf possibilities, I found that:

          <TCP start_port="7802" />
          <TCPPING down_thread="true" initial_hosts="10.32.2.48[7800],10.32.2.48[7801],10.32.2.48[7802]" timeout="3500" up_thread="true" />
          <MERGE2 max_interval="10000" min_interval="5000"/>
          <FD down_thread="true" max_tries="5" shun="true" timeout="2500" up_thread="true"/>
          <VERIFY_SUSPECT down_thread="false" timeout="1500" up_thread="false"/>
          <pbcast.NAKACK down_thread="true" gc_lag="100" retransmit_timeout="3000" up_thread="true"/>
          <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false"/>
          <pbcast.GMS down_thread="true" join_retry_timeout="2000" join_timeout="5000" print_local_addr="false" shun="false" up_thread="true"/>
          <pbcast.STATE_TRANSFER down_thread="true" up_thread="true"/>


          works as expected.
          The change was removing the bind_addr attribute from TCP and the port_range attribute from TCPPING.

          Now I have 3 caches running on the same PC that replicate well.

          If someone has an explanation...



          • 2. Problems are back
            oew

            Last week I finally had my 3 caches running (step 1), then I tried to set up a TCPCacheServer and assign a cache to it (step 2).

            According to this link: http://jira.jboss.com/jira/browse/JBCACHE-690 this was a hopeless attempt.

            What I did was upgrade from 1.4.0 SP1 to the fix version, 1.4.1 GA.

            The problem now is that the caches do not replicate anymore, even when returning to step 1, but they work again when downgrading!

            Am I the only person in this forum who has such problems trying to set up standalone replicated servers?

            What can I do?

            • 3. Re: TCP replication conf issue ?
              manik

              JBCACHE-690 is a fix around JMX lifecycle state of the TCP cache server, used as a target for the TcpDelegatingCacheLoader. Is this what you intend to use? From what I gather from your post, I don't think so.

              I think you should first debug your network and JGroups stack - have a look at debugging tips in the JGroups docs. Test that your JGroups configuration allows the nodes to see each other properly and communicate.
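As a first, very coarse network check before moving on to the JGroups tools, one can see whether anything accepts TCP connections on the ports the caches are supposed to use. This is only a sketch with plain java.net sockets; the class name PortProbe and the 2000 ms timeout are arbitrary choices, and the default host and ports are simply the ones quoted in this thread. A successful connect shows that some process is listening, not that the JGroups members can form a view together.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
final class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port range taken from the configs in this thread; override via args[0].
        String host = args.length > 0 ? args[0] : "10.32.2.48";
        for (int port = 7800; port <= 7802; port++) {
            System.out.println(host + ":" + port
                + " listening=" + isListening(host, port, 2000));
        }
    }
}
```

Run it while the caches are up. A port that reports listening=false points at a bind address, port, or firewall problem rather than at the cache itself.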

              • 4. Re: TCP replication conf issue ?
                manik

                Also note that JBC 1.4.1 ships with JGroups 2.4.1.

                • 5. Re: TCP replication conf issue ?
                  oew

                  Hi Manik,

                  Using TcpDelegatingCacheLoader is my final intention, but it will come later; I am doing this step by step.

                  Regarding the JGroups version, I use the one shipped with the cache distribution in both cases (1.4.0 and 1.4.1).

                  Here is jbosscache.log when I use 1.4.1:

                  2007-01-12 15:06:42,492 INFO [org.jboss.cache.PropertyConfigurator] (main) Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@a17083
                  2007-01-12 15:06:42,492 INFO [org.jboss.cache.PropertyConfigurator] (main) attribute size: 15
                  2007-01-12 15:06:42,648 INFO [org.jboss.cache.factories.InterceptorChainFactory] (main) interceptor chain is:
                  class org.jboss.cache.interceptors.CallInterceptor
                  class org.jboss.cache.interceptors.PessimisticLockInterceptor
                  class org.jboss.cache.interceptors.CacheStoreInterceptor
                  class org.jboss.cache.interceptors.CacheLoaderInterceptor
                  class org.jboss.cache.interceptors.UnlockInterceptor
                  class org.jboss.cache.interceptors.ReplicationInterceptor
                  class org.jboss.cache.interceptors.TxInterceptor
                  class org.jboss.cache.interceptors.CacheMgmtInterceptor
                  2007-01-12 15:06:46,975 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7800|0] [10.32.2.48:7800]
                  2007-01-12 15:06:47,022 INFO [org.jboss.cache.TreeCache] (main) TreeCache local address is 10.32.2.48:7800
                  2007-01-12 15:06:47,022 INFO [org.jboss.cache.TreeCache] (main) State could not be retrieved (we are the first member in group)
                  2007-01-12 15:06:47,100 INFO [org.jboss.cache.TreeCache] (main) parseConfig(): PojoCacheConfig is empty
                  2007-01-12 15:06:47,100 INFO [org.jboss.cache.PropertyConfigurator] (main) Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@197bb7
                  2007-01-12 15:06:47,116 INFO [org.jboss.cache.PropertyConfigurator] (main) attribute size: 15
                  2007-01-12 15:06:47,131 INFO [org.jboss.cache.factories.InterceptorChainFactory] (main) interceptor chain is:
                  class org.jboss.cache.interceptors.CallInterceptor
                  class org.jboss.cache.interceptors.PessimisticLockInterceptor
                  class org.jboss.cache.interceptors.CacheStoreInterceptor
                  class org.jboss.cache.interceptors.CacheLoaderInterceptor
                  class org.jboss.cache.interceptors.UnlockInterceptor
                  class org.jboss.cache.interceptors.ReplicationInterceptor
                  class org.jboss.cache.interceptors.TxInterceptor
                  class org.jboss.cache.interceptors.CacheMgmtInterceptor
                  2007-01-12 15:06:50,864 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7801|0] [10.32.2.48:7801]
                  2007-01-12 15:06:50,864 INFO [org.jboss.cache.TreeCache] (main) TreeCache local address is 10.32.2.48:7801
                  2007-01-12 15:06:50,864 INFO [org.jboss.cache.TreeCache] (main) State could not be retrieved (we are the first member in group)
                  2007-01-12 15:06:50,896 INFO [org.jboss.cache.TreeCache] (main) parseConfig(): PojoCacheConfig is empty
                  2007-01-12 15:06:50,896 INFO [org.jboss.cache.PropertyConfigurator] (main) Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@80cac9
                  2007-01-12 15:06:50,911 INFO [org.jboss.cache.PropertyConfigurator] (main) attribute size: 15
                  2007-01-12 15:06:50,943 INFO [org.jboss.cache.factories.InterceptorChainFactory] (main) interceptor chain is:
                  class org.jboss.cache.interceptors.CallInterceptor
                  class org.jboss.cache.interceptors.PessimisticLockInterceptor
                  class org.jboss.cache.interceptors.CacheStoreInterceptor
                  class org.jboss.cache.interceptors.CacheLoaderInterceptor
                  class org.jboss.cache.interceptors.UnlockInterceptor
                  class org.jboss.cache.interceptors.ReplicationInterceptor
                  class org.jboss.cache.interceptors.TxInterceptor
                  class org.jboss.cache.interceptors.CacheMgmtInterceptor
                  2007-01-12 15:06:51,177 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7801|1] [10.32.2.48:7801, 10.32.2.48:7802]
                  2007-01-12 15:06:51,177 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7801|1] [10.32.2.48:7801, 10.32.2.48:7802]
                  2007-01-12 15:06:51,177 INFO [org.jboss.cache.TreeCache] (main) TreeCache local address is 10.32.2.48:7802
                  2007-01-12 15:06:51,224 INFO [org.jboss.cache.statetransfer.StateTransferGenerator_140] (UpHandler (STATE_TRANSFER)) returning the state for tree rooted in /(5619 bytes)
                  2007-01-12 15:06:51,224 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) received the state (size=5619 bytes)
                  2007-01-12 15:06:51,317 INFO [org.jboss.cache.TreeCache] (main) state was retrieved successfully (in 140 milliseconds)
                  2007-01-12 15:06:51,317 INFO [org.jboss.cache.TreeCache] (main) parseConfig(): PojoCacheConfig is empty
                  


                  7800 and 7801 do not see each other. The test case inserts in 7800 and reads from 7801 and 7802, so of course it fails.
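The split is visible in the viewAccepted() lines of the log above: two different coordinators (7800 and 7801) each announce their own view. A rough way to pull this out of a longer log is sketched below with a plain regex; the class name ViewLogCheck is made up, and the pattern only assumes the line format shown in this post. More than one coordinator can also appear legitimately over time (for example after the first coordinator leaves), so treat the result as a hint rather than proof of a partition.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes viewAccepted() lines from jbosscache.log.
final class ViewLogCheck {
    // Captures coordinator, view id, and member list from lines such as
    // "viewAccepted(): [10.32.2.48:7801|1] [10.32.2.48:7801, 10.32.2.48:7802]".
    private static final Pattern VIEW = Pattern.compile(
        "viewAccepted\\(\\): \\[([^|\\]]+)\\|(\\d+)\\] \\[([^\\]]*)\\]");

    /** Maps each coordinator to the member list of its highest-numbered view. */
    static Map<String, String> latestViews(List<String> logLines) {
        Map<String, Integer> latestId = new LinkedHashMap<>();
        Map<String, String> members = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = VIEW.matcher(line);
            if (!m.find()) {
                continue; // not a view line
            }
            String coordinator = m.group(1);
            int id = Integer.parseInt(m.group(2));
            if (id >= latestId.getOrDefault(coordinator, -1)) {
                latestId.put(coordinator, id);
                members.put(coordinator, m.group(3));
            }
        }
        return members;
    }

    public static void main(String[] args) {
        // Message part of the view lines from the 1.4.1 log above (timestamps omitted).
        List<String> lines = Arrays.asList(
            "viewAccepted(): [10.32.2.48:7800|0] [10.32.2.48:7800]",
            "viewAccepted(): [10.32.2.48:7801|0] [10.32.2.48:7801]",
            "viewAccepted(): [10.32.2.48:7801|1] [10.32.2.48:7801, 10.32.2.48:7802]");
        latestViews(lines).forEach((coordinator, view) ->
            System.out.println(coordinator + " -> [" + view + "]"));
    }
}
```

For the 1.4.1 lines this yields two entries, one per coordinator, whereas a single formed group would yield one entry listing all three members.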

                  And here is the log when running with 1.4.0:
                  
                  2007-01-12 15:04:22,474 INFO [org.jboss.cache.PropertyConfigurator] (main) Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@c88440
                  2007-01-12 15:04:22,490 INFO [org.jboss.cache.PropertyConfigurator] (main) attribute size: 15
                  2007-01-12 15:04:22,615 INFO [org.jboss.cache.factories.InterceptorChainFactory] (main) interceptor chain is:
                  class org.jboss.cache.interceptors.CallInterceptor
                  class org.jboss.cache.interceptors.PessimisticLockInterceptor
                  class org.jboss.cache.interceptors.CacheStoreInterceptor
                  class org.jboss.cache.interceptors.CacheLoaderInterceptor
                  class org.jboss.cache.interceptors.UnlockInterceptor
                  class org.jboss.cache.interceptors.ReplicationInterceptor
                  class org.jboss.cache.interceptors.TxInterceptor
                  class org.jboss.cache.interceptors.CacheMgmtInterceptor
                  2007-01-12 15:04:27,051 INFO [org.jboss.cache.TreeCache] (main) TreeCache local address is 10.32.2.48:7800
                  2007-01-12 15:04:27,051 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7800|0] [10.32.2.48:7800]
                  2007-01-12 15:04:27,082 INFO [org.jboss.cache.TreeCache] (main) State could not be retrieved (we are the first member in group)
                  2007-01-12 15:04:27,192 INFO [org.jboss.cache.TreeCache] (main) parseConfig(): PojoCacheConfig is empty
                  2007-01-12 15:04:27,192 INFO [org.jboss.cache.PropertyConfigurator] (main) Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@1a918d5
                  2007-01-12 15:04:27,207 INFO [org.jboss.cache.PropertyConfigurator] (main) attribute size: 15
                  2007-01-12 15:04:27,207 INFO [org.jboss.cache.factories.InterceptorChainFactory] (main) interceptor chain is:
                  class org.jboss.cache.interceptors.CallInterceptor
                  class org.jboss.cache.interceptors.PessimisticLockInterceptor
                  class org.jboss.cache.interceptors.CacheStoreInterceptor
                  class org.jboss.cache.interceptors.CacheLoaderInterceptor
                  class org.jboss.cache.interceptors.UnlockInterceptor
                  class org.jboss.cache.interceptors.ReplicationInterceptor
                  class org.jboss.cache.interceptors.TxInterceptor
                  class org.jboss.cache.interceptors.CacheMgmtInterceptor
                  2007-01-12 15:04:30,909 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7800|1] [10.32.2.48:7800, 10.32.2.48:7801]
                  2007-01-12 15:04:30,909 INFO [org.jboss.cache.TreeCache] (main) TreeCache local address is 10.32.2.48:7801
                  2007-01-12 15:04:30,909 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7800|1] [10.32.2.48:7800, 10.32.2.48:7801]
                  2007-01-12 15:04:30,972 INFO [org.jboss.cache.statetransfer.StateTransferGenerator_140] (UpHandler (STATE_TRANSFER)) returning the state for tree rooted in /(5619 bytes)
                  2007-01-12 15:04:30,972 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) received the state (size=5619 bytes)
                  2007-01-12 15:04:31,081 INFO [org.jboss.cache.TreeCache] (main) state was retrieved successfully (in 172 milliseconds)
                  2007-01-12 15:04:31,097 INFO [org.jboss.cache.TreeCache] (main) parseConfig(): PojoCacheConfig is empty
                  2007-01-12 15:04:31,097 INFO [org.jboss.cache.PropertyConfigurator] (main) Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@5bb966
                  2007-01-12 15:04:31,112 INFO [org.jboss.cache.PropertyConfigurator] (main) attribute size: 15
                  2007-01-12 15:04:31,144 INFO [org.jboss.cache.factories.InterceptorChainFactory] (main) interceptor chain is:
                  class org.jboss.cache.interceptors.CallInterceptor
                  class org.jboss.cache.interceptors.PessimisticLockInterceptor
                  class org.jboss.cache.interceptors.CacheStoreInterceptor
                  class org.jboss.cache.interceptors.CacheLoaderInterceptor
                  class org.jboss.cache.interceptors.UnlockInterceptor
                  class org.jboss.cache.interceptors.ReplicationInterceptor
                  class org.jboss.cache.interceptors.TxInterceptor
                  class org.jboss.cache.interceptors.CacheMgmtInterceptor
                  2007-01-12 15:04:31,300 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7800|2] [10.32.2.48:7800, 10.32.2.48:7801, 10.32.2.48:7802]
                  2007-01-12 15:04:31,300 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7800|2] [10.32.2.48:7800, 10.32.2.48:7801, 10.32.2.48:7802]
                  2007-01-12 15:04:31,300 INFO [org.jboss.cache.TreeCache] (main) TreeCache local address is 10.32.2.48:7802
                  2007-01-12 15:04:31,300 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) viewAccepted(): [10.32.2.48:7800|2] [10.32.2.48:7800, 10.32.2.48:7801, 10.32.2.48:7802]
                  2007-01-12 15:04:31,300 INFO [org.jboss.cache.statetransfer.StateTransferGenerator_140] (UpHandler (STATE_TRANSFER)) returning the state for tree rooted in /(5619 bytes)
                  2007-01-12 15:04:31,315 INFO [org.jboss.cache.TreeCache] (UpHandler (STATE_TRANSFER)) received the state (size=5619 bytes)
                  2007-01-12 15:04:31,362 INFO [org.jboss.cache.TreeCache] (main) state was retrieved successfully (in 62 milliseconds)
                  2007-01-12 15:04:31,362 INFO [org.jboss.cache.TreeCache] (main) parseConfig(): PojoCacheConfig is empty
                  


                  The same code produces these 2 logs! Only the libs are different.



                  • 6. Re: TCP replication conf issue ?
                    manik

                    Like I said, I would try the JGroups config using one of the JGroups test programs (like Draw) to make sure the config works.

                    • 7. Re: TCP replication conf issue ?
                      oew

                      Hi Manik,

                      You are right!

                      I cannot get two Draw instances to see each other. I'll keep investigating this and come back to the cache when it is fixed.

                      Thanks for your support.