1 2 Previous Next 15 Replies Latest reply on Mar 11, 2011 11:10 AM by belaban

    Deterministic Hash Wheel Positions

    shane_dev

      We have been working on optimizing performance during a rehash. One of the things we realized is that when a node is restarted, it can end up in a new location within the hash wheel.

       

      For example...

       

      • Our wheel looks is in the following order: A, B, C, D
      • We then restart C.
      • It is now: A, C, B, D

       

      This is due to JGroupsTransport appending a random number to the node name and then using that value as the channel name. For example, node1 becomes node1-19539. If we restart node1, it may then become node1-48520.

       

      We have made some changes locally so that we use a static value for the channel name. Now when the address is used as the hash wheel key its hash is always the same, before and after restarts. This is of tremendous value us.

       

      1. The distribution is the same for every single test run (for a specific key set).
      2. We can play with the positions via trial and error to get a better distribution (for a specific key set).
      3. We know exactly what the consequences are for a rehash.
      4. And the biggest advantage...
        • If you set numOwners to 1 and a node is shut down, you lose those keys. That is fine with us.
          • If no keys are added and the node is started again, there is NO state transfer.
          • If some keys are added and the node is started again, only a portion of those NEW keys will be transferred.

       

      I understand that multiple instances can run on the same node using the same configuration and so that is why the random numbers are appended to the node name. However, perhaps the configuration can be made more flexible so that you can choose to include them or not.

        • 1. Deterministic Hash Wheel Positions
          genman

          It seems like a fairly easy contribution to make. I wonder if you couldn't provide a patch yourself via github?

           

          I would probably use some sort of JVM-wide counter (i.e. static variable :-) ) as an alternative. The first cache instance created would be XX-1, second XX-2, etc. The counter could decrement as instances are released. It wouldn't be perfect across JVMs, but it'd work for you and the purpose of testing.

          • 2. Deterministic Hash Wheel Positions
            shane_dev

            I see what you are saying about the counter, but I'm not quite sure how to go about it.

             

            Ideally, if we start 3 instances on the same box I would like to see nodename-1, nodename-2, and nodename-3 as you are suggesting. The question is, where can we store this counter? I'm having a hard time thinking of something other than a shared file or something. Or some sort of remote service.

             

            I don't necessarily want to make it too complex though.

             

            The short term answer is that we can use a specific configuration per instance so that we can manually set the nodename ala nodename-1 and what not in the config file of each one.

            • 3. Deterministic Hash Wheel Positions
              kapilnayar1

              In fact the node location policy should be configurable.

               

              Typically, in a clustered environment a new node is required to be added when the current cluster nodes tend to overload.

              The consistent hashing algorithm mandates the newly added node to share the load from the neighbouring nodes.

              Under these circumstances ideally the new node should be a neighbour to the most loaded node in the hashing wheel.

               

              Kapil

              • 4. Deterministic Hash Wheel Positions
                galder.zamarreno

                Hmmm, the reason JGroups behaves that way is due to https://issues.jboss.org/browse/JGRP-129 which was developed to solve reincarnation issues as explained in https://issues.jboss.org/browse/JGRP-130

                 

                What I think it'd be more useful is to be able to hash on the static part of the address, i.e. "node1" rather than on "node1-19539". However, if you do that how do you deal with multiple Infinispan instances bound to same "node1"? They'd both hash to the same position which is not good. Maybe hashing on physical address could be better?

                • 5. Deterministic Hash Wheel Positions
                  belaban

                  > This is due to JGroupsTransport appending a random number to the node name and then using that value as the channel name. For example,

                  > node1 becomes node1-19539. If we restart node1, it may then become node1-48520.

                   

                  While this is true, it applies only to the *logical* name. The position on the hashwheel is determined by the hash of the address. The address is an org.jgroups.util.UUID and - yes - will change after a disconnect-connect sequence, in order to prevent reincarnation.

                   

                  Unless I'm completely mistaken, Infinispan uses the UUID.hashCode() to determine the position, *not* the logical name !

                  • 6. Deterministic Hash Wheel Positions
                    belaban

                    Yes, I confirmed that, we're using an Address.hashCode(), which resolves to JGroupsAddress.address.hashCode() --> UUID.hashCode().

                     

                    Of course you can always write a consistent hash implementation which hashes on the logical name, but then this is not guaranteed to be unique...

                    • 7. Deterministic Hash Wheel Positions
                      shane_dev

                      Indeed. As everyone has mentioned, our short term solution is to hash on the static part of the address. The problem now is how do we support multiple instances on one box. I believe the answer is with configuration. I wonder if we can just add an option to use static or dynamic addresses.

                       

                      This way if an organization does not care they can choose to use dynamic addresses. However, for those similar to the ones I've worked with we can choose to use static addresses and simply use specific configuration files for each instance.

                       

                      This has been extremely helpful for us so far, it is difficult for me to imagine anyone NOT wanting this feature. To be honest, I'm suggesting a config option to be nice but I think Infinispan should simply require a proper config file per instance. That way you can hardwire 'node1-ins1' and 'node1-inst2' and so on. I see little benefit in havining the product generate a random number for you rather than you specifying the number yourself.

                      • 8. Deterministic Hash Wheel Positions
                        shane_dev

                        What if we used something akin to a counter in the hashwheel? For example, if node1-593 joins it gets added using a hash of node1-1. Then when node1-485 joins it gets added as a hash of node1-2.

                        • 9. Deterministic Hash Wheel Positions
                          belaban

                          OK, so just to make sure we're on the same page: you wrote your own consistent hash function which hashes on the *logical name* of an address compared to the DefaultConsistentHash which hashes on the address (org.jgroups.util.UUID) itself. Correct ?

                           

                          If you do this, why don't you use the TopologyAwareConsistentHash function and set siteId, rackId and machineId ?

                           

                          I have some new code in [1] which encapsulates siteId, rackId and machineId (*if* TopologyAwareConsistentHash is enabled) in *every* address.

                           

                          You could do the following:

                          Address addr; // Infinispan address

                          if(addr instanceof TopologyAwareAddress) {

                               JGroupsTopologyAwareAddress jg_addr=(JGroupsTopologyAwareAddress)addr;

                               String machineId=jg_addr.getMachineId();

                               // use hash on machineId to place into wheel

                          }

                           

                           

                          [1] https://github.com/belaban/infinispan/tree/rebalance-changes

                          • 10. Deterministic Hash Wheel Positions
                            shane_dev

                            Actually, we had updated the JGroupsTransport where the channel name is set. That is where the random suffix is generated and added. Not that I'm arguing it should be there. I did start to think that it should be in the hash wheel itself.

                             

                            We actually attempted to use the TopologyAwareConsistentHash by taking advantage of the machineId and treating the node name like an 'instance' name. However, it can only be used when numOwners > 1 per this issue. Not a bad thing, I understand it was original intended specifically for backups and eventually we'll want them. But at this point, we're still testing with just numOwners = 1.

                             

                            The question is are we going to remove the dynamic suffix with the assumption that there is a unique config for each instance?

                             

                            Or, do we look for solution that accomadate both scenarios: common config with randox suffix, and unique config per instance.

                            • 11. Deterministic Hash Wheel Positions
                              mircea.markus

                              Actually, we had updated the JGroupsTransport where the channel name is set. That is where the random suffix is generated and added. Not that I'm arguing it should be there. I did start to think that it should be in the hash wheel itself.

                               

                              Didin't you also extended the CH in a way described by Bela?  By default, the position on wheel is only influenced by  the JGroups address which does not rely on the channel name.

                              • 12. Deterministic Hash Wheel Positions
                                shane_dev

                                I'll have to check since I wasn't the one who actually wrote the code. I don't have the Infinispan/JGroups source in front of me but if I recall correctly the UUID is encapsulated in the Address and it will use the name if present for things like toString() and what not. Though I need to check on this. I'll respond after I find the actual code changes and take a look at them. I didn't necessarily want to use ours as a reference since it was 'quick and dirty'.

                                • 13. Deterministic Hash Wheel Positions
                                  belaban

                                  Note that UUID.hashCode() hashes on the 2 longs in it, *not* on the logical name ! The logical name is only used in toString, everything else (including equals() and compareTo() use the longs)

                                  • 14. Deterministic Hash Wheel Positions
                                    shane_dev

                                    I just checked the code for what we did. In fact we had extended both the JGroupsTransport and the JChannel. We extended the JGroupsTransport to remove the appending of that dynamic suffix. We extended the JChannel to manually set the id and UUID.

                                     

                                    So, we had a node name of node_123. We parsed out 123 and used it as the channel id. When we set the address we used UUID(0, id). That is why it worked. We were controlling the actual long values.

                                     

                                    However, I don't see that as a proper solution. I'd rather not touch JGroups since there is obviously nothing wrong with it. Perhaps we could simply use the string representation of the Address as the key in the hash wheel? This way we could use the logical name.

                                    1 2 Previous Next