11 Replies Latest reply on Jan 7, 2008 7:13 AM by fredrikj

    Buddyrep issue

    fredrikj

      Hi.
      I am currently using cache 2.1.0 GA and jgroups 2.6.1 with buddy replication. Buddy rep is configured to use one buddy only.

      The setup is four nodes with ip addresses like:

      172.16.0.5
      172.16.0.6
      172.16.0.7
      172.16.0.8


      They are all started in the stated order, so .5 is the coordinator. The first node up (.5) inserts data into the cache, and data then gravitates to the other nodes when needed. This happens mostly initially, when load is first applied to the system. Data affinity is handled by a layer above the cache.
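
      For reference, each node is brought up with something like the sketch below (class and file names are placeholders, not our actual code; the real config enables buddy replication with one buddy and data gravitation as described above):

       // Rough sketch of how a node is started (placeholder names).
       import org.jboss.cache.Cache;
       import org.jboss.cache.CacheFactory;
       import org.jboss.cache.DefaultCacheFactory;
       import org.jboss.cache.Fqn;

       public class NodeBootstrap {
           public static void main(String[] args) {
               // "tablespace-cache.xml" is a placeholder; the real file turns on buddy
               // replication (numBuddies = 1) and configures JGroups for the four nodes above.
               CacheFactory factory = DefaultCacheFactory.getInstance();
               Cache cache = factory.createCache("tablespace-cache.xml");

               // The coordinator (.5) then populates the cache, e.g. (placeholder value):
               cache.put(Fqn.fromString("/91"), "91", "table state goes here");
           }
       }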

      Using this scenario with 2.0.0 GA presented no problems except when adding new nodes during load, so we are now investigating 2.1.0.

      The issue I'm facing is that the coordinator seems to get two buddy backups, one being itself.

      This is the contents on 172.16.0.5 (coordinator):

      null {}
       /91 {91=com.cubeia.firebase.game.table.InternalMetaData@1134d9b com.cubeia.testgame.server.game.TestGame@1cdc8ce}
       /63 {63=com.cubeia.firebase.game.table.InternalMetaData@cc9ff5 com.cubeia.testgame.server.game.TestGame@951aeb}
       /92 {92=com.cubeia.firebase.game.table.InternalMetaData@e60a39 com.cubeia.testgame.server.game.TestGame@185c84e}
       ...
      
       /_BUDDY_BACKUP_ {}
       /172.16.0.8_8786 {}
       /15 {15=com.cubeia.firebase.game.table.InternalMetaData@d9e1c0 null}
       /16 {16=com.cubeia.firebase.game.table.InternalMetaData@742062 null}
       /172.16.0.5_8786 {}
       /31 {}


      Notice that there are two members listed under /_BUDDY_BACKUP_: one is .8 and the other is .5, i.e. the node itself.

      Now, on 172.16.0.8 we get a lot of lock timeouts like the one below:

      Caused by: org.jboss.cache.lock.TimeoutException: read lock for /_BUDDY_BACKUP_/172.16.0.5_8786 could not be acquired by GlobalTransaction:<172.16.0.6:8786>:41 after 5000 ms. Locks: Read lock owners: []
      Write lock owner: GlobalTransaction:<172.16.0.6:8786>:1
      , lock info: write owner=GlobalTransaction:<172.16.0.6:8786>:1 (activeReaders=0, activeWriter=Thread[Incoming,TableSpace,172.16.0.8:8786,5,Thread Pools], waitingReaders=25, waitingWriters=0, waitingUpgrader=0)
      


      172.16.0.8 also shows two members under the buddy backup:

      null {}
       /28 {}
       /29 {}
       /92 {}
       ...
       /_BUDDY_BACKUP_ {}
       /172.16.0.7_8786 {}
       /91 {91=com.cubeia.firebase.game.table.InternalMetaData@1fbeed6 null}
       /41 {41=com.cubeia.firebase.game.table.InternalMetaData@fd3922 null}
       /115 {115=com.cubeia.firebase.game.table.InternalMetaData@b215d9 null}
       ...
       /172.16.0.5_8786 {}
       /31 {}
      


      It seems that .8's buddy backup is in fact .7, but we still hold some buddy reference to the .5 member as well. In fact, all the lock timeouts on .8 are related to the .5 buddy backup Fqn:

      failure acquiring lock: fqn=/_BUDDY_BACKUP_/172.16.0.5_8786
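
      For what it's worth, the backup regions a node holds can also be listed programmatically; a rough sketch (the helper name is just illustrative, not part of our code):

       import java.util.Set;
       import org.jboss.cache.Cache;
       import org.jboss.cache.Fqn;
       import org.jboss.cache.Node;

       public class BackupInspector {
           // Lists which members this node holds a backup region for.
           // On 172.16.0.5 this currently reports both 172.16.0.8_8786 and 172.16.0.5_8786.
           public static void printBackupOwners(Cache cache) {
               Node backupRoot = cache.getRoot().getChild(Fqn.fromString("/_BUDDY_BACKUP_"));
               if (backupRoot == null) {
                   System.out.println("no backup subtree on this node");
                   return;
               }
               Set owners = backupRoot.getChildrenNames();
               System.out.println("backup regions held for: " + owners);
           }
       }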


        • 1. Re: Buddyrep issue
          fredrikj

          (Cont.)

          Further, we see that all locks that fail with a timeout originate from .6, which does have .5 as its buddy backup.

          So, my question is: has buddy replication changed between 2.0.0 and 2.1.0?

          In any case the behavior has changed, since this worked with 2.0.0 and no longer does. Why does the .5 member show up twice, and as a buddy backup of itself? It does not make sense to me anyway...

          Anyway, I hope I have managed to describe the setup we are using; it tends to get a bit complex to explain with multiple nodes and buddy rep =)

          • 2. Re: Buddyrep issue
            manik

             

            "FredrikJ" wrote:
            I am currently using cache 2.1.0 GA


            I presume you mean 2.1.0.CR2.

            No such behaviour should have changed between 2.0.0 and 2.1.0. Some of the internal code has changed, but that is more along the lines of refactoring and performance enhancements, not logical behaviour in any way.

            I can't seem to recreate this; we have a functional test for creating 3 cache instances each with 1 buddy, and then checking which cache belongs to which buddy group, etc.

            Do you have a unit test that recreates this? If not, do you have any more information that could help - are you using a custom BuddyLocator, etc?

            Thanks
            Manik

            • 3. Re: Buddyrep issue
              fredrikj

              2.1.0 CR2 is correct.

              We are not using anything extra apart from turning on buddy replication and using data gravitation. I tried to recreate it today as well in a separate unit-test but with no success so far. I will give it another shot tomorrow since I think the underlying use case scenario is what might be triggering this.

               Ideally you would run the nodes on different machines to introduce some latency between the gravitations; I'll try to do that tomorrow.

              • 4. Re: Buddyrep issue
                fredrikj

                I have now tried to reproduce the issue in a standalone unit test and have succeeded to at least some extent =)

                 I am now running two caches locally where one is producing data and the other one is inspecting the cache - causing data to gravitate to the second cache. The issue is replicated in the sense that the producer does get a buddy backup node for itself, but I can't seem to get that node to contain any data.

                 The standalone test is much simpler and single threaded compared to the real life application, so that's a plausible reason why we do not see the exact same behaviour (i.e. data inside the redundant buddy backup node).

                In short:
                The producer (master) is started:

                Master created: 192.168.1.135:51469


                The slave joins the cluster and is added as a buddy to the master:

                MASTER:
                null {}
                 /1 {1=c6m0p888dfvz}
                 /_BUDDY_BACKUP_ {}
                 /192.168.1.135_51470 {}


                 The slave fetches the cache contents and the data is gravitated to the slave, thus moving into the slave's buddy backup region on the master. But we also see the master having a buddy backup for itself (with no data):

                MASTER:
                null {}
                 /_BUDDY_BACKUP_ {}
                 /192.168.1.135_51469 {}
                 /192.168.1.135_51470 {}
                 /1 {1=c6m0p888dfvz}


                 The backup node for itself does not contain any data, unlike in our real application, but I think it is symptomatic of the same issue.
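
                 For what it's worth, the 'inspection' on the slave boils down to a gravitating read, roughly like the sketch below (shown with the per-invocation gravitation override; class and method names are illustrative, not the actual test code):

                  import org.jboss.cache.Cache;
                  import org.jboss.cache.Fqn;

                  public class SlaveReader {
                      // Sketch of a gravitating read on the slave: the get misses locally,
                      // gravitates the node over from the master, and the data should then
                      // only live under the slave's backup region on the master.
                      public static Object readWithGravitation(Cache cache, String name, Object key) {
                          cache.getInvocationContext().getOptionOverrides().setForceDataGravitation(true);
                          return cache.get(Fqn.fromString("/" + name), key);
                      }
                  }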

                You can find the full logfiles here:
                http://www.cubeia.com/misc/cache/log

                 You can find the source code for the test here: http://www.cubeia.com/misc/cache

                Warning, the code is a bit of a hack =)

                 To run it, start one instance with the argument 'master' and then another instance with the argument 'slave' directly after. The system properties used are documented in the javadoc of the main class.

                • 5. Re: Buddyrep issue
                  manik

                  Sorry, I didn't quite understand your use case in the beginning.

                  If you are adding data to .5 while at the same time reading the same data from the other nodes (causing data gravitation), this will naturally cause some timeouts since there is contention on a buddy backup subtree.

                  In general though, we very strongly recommend that you have session affinity if you are using buddy replication. I.e., only perform reads and writes on ONE INSTANCE in the cluster for each key. Data gravitation should really only be used as a form of recovery if a node dies and your reads and writes are now redirected to a standby instance.

                  See http://wiki.jboss.org/wiki/Wiki.jsp?page=JBossCacheBuddyReplicationDesign.

                  I guess judging by your access patterns, you need something like Partitioning, which won't be available for at least another 6 months or so - unless you're willing to help out! ;-)

                  Without better knowledge of your application, I can't really recommend an alternative strategy - do you use a load balancer to distribute requests between caches? Do you use some form of session id per request and is data shared between "sessions"?

                  • 6. Re: Buddyrep issue
                    fredrikj

                    We do apply session affinity.

                     Look at it this way:

                    1. Cache A starts
                    2. Cache A adds 10 nodes to the cache
                    3. Cache B starts
                    4. Cache B 'gets' the 10 nodes thus causing a gravitation

                     After #4 in the sequence we end up with the weird buddy rep settings as discussed above. This is exactly what we are doing in the test provided. We only go through the sequence once; we do not keep modifying/adding etc. to the cache. (There is also a 5 second sleep between 3 and 4.) A rough code outline of the sequence follows below.
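
                     (Illustrative only: the real test, linked above, runs A and B as two separate JVMs, and the config file name is a placeholder.)

                      import org.jboss.cache.Cache;
                      import org.jboss.cache.CacheFactory;
                      import org.jboss.cache.DefaultCacheFactory;
                      import org.jboss.cache.Fqn;

                      public class GravitationSequence {
                          public static void main(String[] args) throws Exception {
                              CacheFactory factory = DefaultCacheFactory.getInstance();

                              // 1. Cache A starts
                              Cache cacheA = factory.createCache("br-cache.xml");

                              // 2. Cache A adds 10 nodes to the cache
                              for (int i = 0; i < 10; i++) {
                                  cacheA.put(Fqn.fromString("/" + i), i % 5, "value-" + i);
                              }

                              // 3. Cache B starts, then we wait ~5 seconds
                              Cache cacheB = factory.createCache("br-cache.xml");
                              Thread.sleep(5000);

                              // 4. Cache B 'gets' the nodes, causing gravitation
                              for (int i = 0; i < 10; i++) {
                                  cacheB.getInvocationContext().getOptionOverrides().setForceDataGravitation(true);
                                  cacheB.get(Fqn.fromString("/" + i), i % 5);
                              }

                              // After #4, A's /_BUDDY_BACKUP_ subtree unexpectedly contains a region
                              // named after A itself, in addition to the expected one for B.
                          }
                      }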

                    Just to clarify:
                    We do not add data to the cache while trying to read it.
                     We do not modify data in the cache while trying to read it.
                    In fact there is no concurrent access at all in the test provided.

                     The issue is clearly related to data gravitation: in the logs we see that the buddy backup is correctly set up before gravitation occurs (i.e. after #3 in the sequence above), but it has changed to include the node itself after data gravitation has occurred (after #4 in the sequence).

                    I'll try and make a comparison run with 2.0.0 GA and post the logs, if behavior is unchanged then we should get the same buddy backup result.




                    • 7. Re: Buddyrep issue
                      manik

                       

                      "FredrikJ" wrote:

                      But we also see the master having a buddy backup for itself (with no data):

                      MASTER:
                      null {}
                       /_BUDDY_BACKUP_ {}
                       /192.168.1.135_51469 {}
                       /192.168.1.135_51470 {}
                       /1 {1=c6m0p888dfvz}


                      The backup node for itself does not contain any data like in our real application but I think it might be symptomatic for the issue.



                      Come to think of it, I did see something like this in trunk a couple of weeks back and fixed it. Do you want to try a snapshot build from SVN and see if you still see the problem?


                      • 8. Re: Buddyrep issue
                        fredrikj

                         I tried a snapshot build from SVN (revision 4982) but the result is still the same.

                        Actually I was not entirely accurate in the sequence description. It should be like this:

                        1. Cache A starts
                        2. Cache A adds 10 nodes to the cache
                        3. Cache B starts
                        4. Cache B 'gets' 9 nodes thus causing a gravitation

                         I'm not sure why I decided to gravitate 9/10 nodes, but there it is. I have modified the producer log to include the sequence numbers so you can follow them more easily. You can find the log containing the sequences here: http://www.robotsociety.com/cache/br/producer_seq.log

                        You can see a comparison between running the provided test for 2.0.0 GA and 2.1.0 CR2 here:
                        2.0.0 GA producer log : http://www.robotsociety.com/cache/br/2.0.0/producer_2.0.0.log
                        2.1.0 CR2 producer log : http://www.robotsociety.com/cache/br/2.1.0%20CR2/producer_2.1.0_cr2.log

                         To sum the comparison up, the 2.0.0 GA and 2.1.0 CR2 runs are identical until data has gravitated. Then we can see that for 2.0.0 GA the cache in the producer looks like:

                        2.0.0 GA:
                        null {}
                         /9 {4=85969gehp9xt}
                         /_BUDDY_BACKUP_ {}
                         /192.168.1.135_51890 {}
                         /3 {4=xapfkubptgx0}
                         /5 {2=17z0dcpunsivo}
                         /7 {2=1kr0r0fpucxmy}
                         /2 {1=1cdp3dootdbu9}
                         /0 {0=cvwo5pn5r7ou}
                         /4 {4=1wzvk2sq01r1p}
                         /8 {0=1vfthyv0lbkvw}
                         /6 {2=i5az9hje5gxo}
                         /1 {0=10eaf93uzvsru}


                        And for 2.1.0 CR2:

                        2.1.0 CR2:
                        null {}
                         /9 {0=13gmz9ol9twc}
                         /_BUDDY_BACKUP_ {}
                         /192.168.1.135_51892 {}
                         /3 {1=kf0p8vtuad5y}
                         /5 {4=uvykukzoi2h0}
                         /7 {1=zaouao7ntsy7}
                         /2 {0=ib5tifo2tqco}
                         /0 {4=1d1boyx12sha7}
                         /4 {2=gyxlbtnjgpc}
                         /8 {3=1ivb35jlxspf2}
                         /6 {0=pte5ws2mpe11}
                         /1 {1=1erqp88pgexrv}
                         /192.168.1.135_51891 {} // <--- wtf?


                         As I stated before in my first, slightly confusing post, it is not the additional (recursive) buddy node in the cache itself that is the dealbreaker for us. But I strongly believe it is one of the causes, if not the cause, of the lock timeouts we see. Remember that with 2.0.0 GA buddy rep works like a charm for us, but with 2.1.0 CR2 it is broken. That, together with the fact that the caches end up in different states after gravitation has occurred using the simple test code provided, indicates to me that something has changed and/or a regression has occurred.

                        • 9. Re: Buddyrep issue
                          manik

                          This is a regression. Thanks for this - JBCACHE-1256

                          • 10. Re: Buddyrep issue
                            manik

                            Hi - I have a fix in svn trunk if you feel like trying it out. Note that trunk is unstable and Hudson still hasn't finished thoroughly checking my fix for further regressions. Details are in JIRA.

                            • 11. Re: Buddyrep issue
                              fredrikj

                              Wow, that was fast =)

                              The issue is indeed fixed for the standalone test case. We will probably wait for the CR3 release before we test with our real application.