11 Replies Latest reply on Jun 29, 2006 9:32 PM by ben.wang

    Proposal: Flat 'Heap' for POJOs stored in PojoCache

    brian.stansberry

      We talked about the concept of a flat 'heap' for storing POJOs at the end of our clustering meeting in Las Vegas -- wanted to post here both to flesh out the idea a bit and because I've found another situation that seems to call for it.

      General Problem:

      Let's say we have a Person POJO with 3 fields: String name, int age, Address address. Address has string fields street, city and postcode. Application calls cache.putObject("/husband", person);

      Current implementation: This data gets stored internally in a Node with Fqn "/husband". The data is stored as follows -- name and age as key value pairs in the node's data map, and Address as another Node with Fqn "/husband/address". This second "address" node is stored in the "husband" node's children map under key "address".

      There are 2 ugly things about this set up.

      1) The fields that define a Person are spread over two separate data structures -- the Node's data map and its children map. Not a big deal, but this gives off a bit of a strange smell.
      2) The node for the Address POJO is tightly coupled to the node for the Person POJO even though the Address object has no reference at all to the Person object. This leads to all sorts of issues when there is more than one reference to the Address, since the location in which the Address' data is stored implies the continued existence of a particular instance of Person.

      Proposal:

      Separate the tree structure from the storage structure. Store all complex objects in a flat structure that is not meant to be externally accessed. The cache's tree structure is maintained solely as a way to allow users to organize the objects they place in the cache. If a user calls putObject("/abc", pojo), pojo is stored in the internal area -- all that is stored in node "/abc" is a pointer to the internal area.

      Each unique POJO would be stored in an individual node under root "/_JBossInternal_/_HEAP_". When the cache first identifies the existence of a new POJO, it will assign it a PojoId:

      public class PojoId implements Serializable {
       private Address creatorAddress;
       private java.rmi.server.UID uid;
      }


      The combination of Address and UID is sufficient to make a globally unique identifier for that pojo.

      The Fqn for any particular Pojo would be "/_JBossInternal_/_HEAP_/".

      All information that describes a POJO would be stored in the node's data map. The difference from the current implementation is that for any field that points to a complex type, a key/value pair for that field would be stored in the data map, but the value would be the PojoId of the pojo referenced by the field. So, if we had a Person pojo stored under PojoId (for short) 1234 with a reference to an Address pojo stored under PojoId 9876, the data map for /_JBossInternal_/_HEAP_/1234 would have these key/value pairs:

      name="Brian Stansberry"
      age=40
      address=<PojoId:9876>
      __jboss:internal:class__=org.jboss.cache.test.Person
      AopInstance=<AopInstance@fd34a7>

      If this Person were placed in the cache under Fqn "/husband", all that would be stored in the "/husband" node would be an AopInstance object with a refFqn field pointing to /_JBossInternal_/_HEAP_/1234.
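      A minimal sketch of the layout described above, to make the indirection concrete. It is a toy model built on plain maps, not PojoCache code: FlatHeapSketch and HeapRef are hypothetical names, a plain String stands in for PojoId, and the user-visible node holds a String pointer where the proposal would hold an AopInstance with a refFqn.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed PojoId; a String keeps the sketch short.
final class HeapRef {
    final String pojoId;
    HeapRef(String pojoId) { this.pojoId = pojoId; }
}

// Toy model: a flat heap of data maps plus a user-visible tree of pointers.
final class FlatHeapSketch {
    static final String HEAP_ROOT = "/_JBossInternal_/_HEAP_/";

    // Internal area: heap Fqn -> data map of exactly one POJO.
    final Map<String, Map<String, Object>> heap = new HashMap<>();
    // User tree: user Fqn -> heap Fqn (the only thing stored at the user's node).
    final Map<String, String> tree = new HashMap<>();

    // Store one POJO's fields under its own heap node.
    void store(String pojoId, Map<String, Object> data) {
        heap.put(HEAP_ROOT + pojoId, data);
    }

    // Make a stored POJO reachable from a user Fqn such as "/husband".
    void attach(String userFqn, String pojoId) {
        tree.put(userFqn, HEAP_ROOT + pojoId);
    }

    // Resolve a user Fqn, then follow one field whose value is a HeapRef.
    Map<String, Object> follow(String userFqn, String field) {
        Map<String, Object> owner = heap.get(tree.get(userFqn));
        HeapRef ref = (HeapRef) owner.get(field);
        return heap.get(HEAP_ROOT + ref.pojoId);
    }
}
```

      With a Person stored as 1234 and an Address as 9876, follow("/husband", "address") resolves to the 9876 node. A second Person that references the same Address resolves to the very same node, and nothing about the Address' location depends on "/husband".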

        • 1. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache
          brian.stansberry

          I mentioned before that I found another problem that seems to call for the use of a flat heap.

          See unit test o.j.c.buddyreplication.aop.BuddyReplicationFailoverTest. This test currently fails due to JBCACHE-669, but even if 669 were resolved the test would still fail due to the following problem:

          2 Person objects, with a shared reference to an Address. Using buddy replication.

          1 Person is placed in Cache 1 under "/husband" then the other is placed in Cache 1 under "/wife". The address is stored in "/husband/address" -- "/wife/address" just has an indirect pointer to "/husband/address".

          Cache 1 dies. User fails over to Cache 2.

          1) User calls getObject("/wife").
          2) This causes nodes "/wife" and "/wife/address" to gravitate.
          3) User calls wife.getAddress().
          4) JBCACHE-669 is fixed, so this causes "/_JBossInternal_/_RefMap_/husband_address" to gravitate.
          5) PojoCache dereferences and follows the RefMap entry to "/husband/address", so node "/husband/address" is gravitated.
          6) Gravitating "/husband/address" *does not cause the data map of "/husband" to gravitate*!!! Rather, Cache 2 just creates an empty node at "/husband" to maintain the tree structure.
          7) User calls getObject("/husband"). This returns null, because node "/husband" does not have any data. No data gravitation is performed, because the "/husband" node *exists* in Cache 2.

          The fundamental problem here is the intermingling of the TreeCache structure with the storage of data. You can't access the Address object without involving the concept of "husband", which leads to all sorts of problems.

          BTW, I don't think this particular issue has to be fixed for 1.4.0 -- we could just say BR is for plain cache operations. However, JBCACHE-669 does need to be fixed, as FIELD replication does not work correctly without it.
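          Steps 5 to 7 above can be reproduced in a toy model. This is only a sketch under assumptions: GravitationSketch is a hypothetical class built on plain maps, it does not use the TreeCache API, and it returns an empty data map where getObject would return null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class GravitationSketch {
    // Data this cache holds as a backup for the dead Cache 1.
    final Map<String, Map<String, Object>> backup = new TreeMap<>();
    // Nodes that exist in this cache's own tree (Cache 2).
    final Map<String, Map<String, Object>> local = new TreeMap<>();

    // Pull fqn and everything below it out of the backup. Ancestors are
    // created as empty nodes only to keep the tree structure intact.
    void gravitate(String fqn) {
        for (Map.Entry<String, Map<String, Object>> e : backup.entrySet()) {
            String key = e.getKey();
            if (key.equals(fqn) || key.startsWith(fqn + "/")) {
                local.put(key, e.getValue());
            }
        }
        int slash = fqn.lastIndexOf('/');
        while (slash > 0) {
            String parent = fqn.substring(0, slash);
            local.putIfAbsent(parent, new HashMap<String, Object>());
            slash = parent.lastIndexOf('/');
        }
    }

    // Gravitation is attempted only when the node does not exist locally.
    Map<String, Object> get(String fqn) {
        if (!local.containsKey(fqn)) {
            gravitate(fqn);
        }
        return local.get(fqn);
    }
}
```

          If the backup holds data for "/husband" and "/husband/address", calling get("/husband/address") first leaves an empty "/husband" node behind. A later get("/husband") then finds that node, skips gravitation, and comes back with no data even though the backup still has it.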

          • 2. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache

            Yes, this is something that I plan to do. It is not necessary to store all the internal nodes under the same _HEAP_ space, though, since issues such as locking need to be taken into account.

            Secondly, to address your second post: a better solution, as I mentioned in my previous reply to your forum post, is to have _JBOSS_INTERNAL_ reside in an individual region. This way, again, we assume the object relationship only has scope within that region. But this should be a perfectly reasonable assumption. For example, within a webapp, we don't expect the relationship to persist across different webapps, right?

            • 3. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache
              brian.stansberry

              IMHO, a flat heap *improves* locking semantics by making them more like standard java locking.

              E.g. in standard Java, say you have a Person with a sub-object Address. One thread has a ref to the Address; another thread has a ref to the Person. If the 2nd thread synchronizes on the Person, that doesn't prevent the 1st thread from updating the Address.

              PojoCache would prevent an update to the Address if the Person has been updated. But, AFAICT, only sometimes! If Address is stored in /husband/address, an update to /husband locks address. But an update to /wife with an indirect ref to /husband/address doesn't lock the Address.
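              The plain-Java half of that comparison can be checked with a small program. This is just a sketch: Person, Address and MonitorSketch are minimal hypothetical classes, and the 5 second join timeout is an arbitrary safety net so that the method still returns if the updater were somehow blocked.

```java
// Minimal hypothetical POJOs; only the parts needed for the locking demo.
final class Address {
    private String city;
    synchronized void setCity(String city) { this.city = city; }
    synchronized String getCity() { return city; }
}

final class Person {
    final Address address;
    Person(Address address) { this.address = address; }
}

final class MonitorSketch {
    // Returns true if another thread can update the Address while the
    // calling thread holds the Person's monitor.
    static boolean addressUpdatableWhilePersonLocked() {
        final Address address = new Address();
        final Person person = new Person(address);
        Thread updater = new Thread(() -> address.setCity("Taipei"));
        synchronized (person) {
            updater.start();
            try {
                // Wait for the updater while still holding person's monitor.
                updater.join(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            return !updater.isAlive() && "Taipei".equals(address.getCity());
        }
    }
}
```

              The updater finishes while the Person's monitor is still held, because the Person and the Address each have their own monitor. A flat heap with one node per POJO would give the cache the same one-lock-per-object shape.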

              • 4. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache
                belaban

                Ben, do we have a JIRA issue regarding this ?

                • 5. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache
                  belaban

                  +1 on this. Allocating a child node for a complex field (e.g. address) creates an unwanted dependency between parent and child which doesn't exist in real life, as an address can exist independently of a person.

                  • 6. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache

                    Yeah,
                    http://jira.jboss.com/jira/browse/JBCACHE-589

                    Per our discussion in Vegas, this is the direction that I am going as well. I have been trying to walk through a couple of use cases to see if there are any holes in it.

                    • 7. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache
                      belaban

                      JBCACHE-589 doesn't look like the flat heap task; that one is about pojo level locking.

                      • 8. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache

                        Oops! Sorry. :-)

                        I mean this one, which I created a while ago:
                        http://jira.jboss.com/jira/browse/JBCACHE-173

                        I have added the link to this post and will update it further later.

                        I am planning to implement the flat space approach in 2.0 (or 2.1) depending on the release schedule, I think.

                        On matching the Java locking semantics, one thing that I mentioned previously in the Jira is what happens in an example like the following:

                        Person p1 = new Person();
                        Address addr = new Address();
                        
                        cache.attach(p1);
                        cache.attach(addr);
                        
                        p1.setAddress(addr);
                        


                        Then if thread 1 is doing:
                        tx.begin();
                        p1.setBlah();
                        ...
                        


                        thread 2 will be blocked in this operation as well:
                        p1.getAddress().setZip(95123);
                        


                        but this will not block thread 2:
                        addr.setZip(95123)
                        


                        The same applies if thread 2 goes first: thread 1 is then blocked as well.

                        In Java, the behavior really depends on whether synchronized(this) is used for all of the methods or not.


                        • 9. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache

                           

                          "bstansberry@jboss.com" wrote:

                          2 Person objects, with a shared reference to an Address. Using buddy replication.

                          1 Person is placed in Cache 1 under "/husband" then the other is placed in Cache 1 under "/wife". The address is stored in "/husband/address" -- "/wife/address" just has an indirect pointer to "/husband/address".

                          Cache 1 dies. User fails over to Cache 2.

                          1) User calls getObject("/wife").
                          2) This causes nodes "/wife" and "/wife/address" to gravitate.
                          3) User calls wife.getAddress().
                          4) JBCACHE-669 is fixed, so this causes "/_JBossInternal_/_RefMap_/husband_address" to gravitate.
                          5) PojoCache dereferences and follows the RefMap entry to "/husband/address", so node "/husband/address" is gravitated.
                          6) Gravitating "/husband/address" *does not cause the data map of "/husband" to gravitate*!!! Rather, Cache 2 just creates an empty node at "/husband" to maintain the tree structure.
                          7) User calls getObject("/husband"). This returns null, because node "/husband" does not have any data. No data gravitation is performed, because the "/husband" node *exists* in Cache 2.

                          The fundamental problem here is the intermingling of the TreeCache structure with the storage of data. You can't access the Address object without involving the concept of "husband", which leads to all sorts of problems.

                          BTW, I don't think this particular issue has to be fixed for 1.4.0 -- we could just say BR is for plain cache operations. However, JBCACHE-669 does need to be fixed, as FIELD replication does not work correctly without it.


                          Brian, I have thought about this issue while designing the new mapping scheme. While the scenario that you mentioned can happen, I think it is rather unlikely. The reason is that buddy replication requires "sticky sessions" to operate in cases like http session repl.

                          If it is http session repl, then every data structure will be stored under a sessionID. In this case, my understanding is that we will gravitate everything under sessionID in one shot during failover; am I correct? Therefore, a shared reference between "joe" and "mary" will still work, for example.

                          Using a flat mapping, on the other hand, will require special handling during gravitation to also gravitate the corresponding _JBoss_Internal_ area (since everything is stored in the internal flat heap now). In essence, we will need to walk the object graph.

                          For example, I am wondering what the best behavior for data gravitation would be if, after failover, I do:
                          joe = getAttribute("joe", joe);
                          joe.getName();
                          

                          Should we also gravitate the corresponding address field? Or just lazily gravitate it when needed, e.g., when
                          joe.getAddress().setCity("Taipei");
                          

                          is called?


                          • 10. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache
                            brian.stansberry

                             

                            "ben.wang@jboss.com" wrote:

                            Brian, I have thought about this issue while designing the new mapping scheme. While the scenario that you mentioned can happen, I think it is rather unlikely. The reason is that buddy replication requires "sticky sessions" to operate in cases like http session repl.

                            If it is http session repl, then every data structure will be stored under a sessionID. In this case, my understanding is that we will gravitate everything under sessionID in one shot during failover; am I correct? Therefore, a shared reference between "joe" and "mary" will still work, for example.


                            Yes, definitely. That's why this issue was not a big priority for me for 1.4.0 -- the session replication use case works even if this issue isn't resolved.

                            Re: walking the object graph and gravitating aggressively vs. lazy gravitation, my instinct is that walking the object graph and gravitating aggressively would be more performant. Otherwise you end up doing a bunch of single node gravitations at random points. I could be wrong though. For example, gravitation now is a 2 step process:

                            1) Please give me everything under Fqn x -- return is a list of NodeData objects.
                            2) Please remove everything you had under Fqn x, as I now own it. This is a simple call, I just pass "x".

                            Doing it in 2 steps is important as you avoid removing the data from the old backup node until the new owner acknowledges he's received it.

                            With a flat heap it becomes more complex:

                            1) Please walk an object graph starting at x and give me all nodes.
                            2) Please rewalk that object graph, but now delete all nodes you have.

                            I guess that's not that different, but it is different.
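                            To make the flat-heap variant of the two steps concrete, here is a rough sketch of what the old backup node would have to do. Everything in it is an assumption for illustration: HeapOwner and Ref are hypothetical names, PojoIds are plain Strings, and the real gravitation code would hand back NodeData objects and not raw maps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker for a field value that points at another heap node.
final class Ref {
    final String id;
    Ref(String id) { this.id = id; }
}

final class HeapOwner {
    // Flat heap held by the old backup node: PojoId -> data map.
    final Map<String, Map<String, Object>> heap = new HashMap<>();

    // Step 1: walk the object graph starting at rootId and return every
    // reachable node. Already-visited ids are skipped, so cycles are safe.
    Map<String, Map<String, Object>> collect(String rootId) {
        Map<String, Map<String, Object>> found = new LinkedHashMap<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(rootId);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            Map<String, Object> data = heap.get(id);
            if (data == null || found.containsKey(id)) {
                continue;
            }
            found.put(id, data);
            for (Object value : data.values()) {
                if (value instanceof Ref) {
                    pending.push(((Ref) value).id);
                }
            }
        }
        return found;
    }

    // Step 2: once the new owner has acknowledged receipt, rewalk the same
    // graph and delete every node that was handed over.
    void release(String rootId) {
        heap.keySet().removeAll(collect(rootId).keySet());
    }
}
```

                            The caller still passes a single starting point in both steps, as it does today with "x". The extra cost is that the old backup node has to follow references itself, where today it hands over one subtree.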

                            • 11. Re: Proposal: Flat 'Heap' for POJOs stored in PojoCache

                              Yeah, active loading is what I have in mind. As a matter of fact, if we can safely assume that all BR will have a corresponding session ID fqn associated with it, then all we need is to gravitate twice: once on the regular /JSESSION/sessionId and once on _JBOSS_INTERNAL/sessionId. All the object graphs should be contained within.