18 Replies. Latest reply on Dec 1, 2004 1:49 AM by jiwils

    JBossCache Bug in get(someNode, key)?

    jiwils

      I have two JVMs that are invoking put(blah, key, value) and get(blah, key), where blah is the same node in a TreeCacheAop instance. The keys and the values are String instances. A key, in this case, is a hashed version of its corresponding value, which is just an FQN to another location in the cache. When many threads (200 or more) are creating nodes and adding key/value pairs for those nodes into the cache, I occasionally get null back from the cache on a get invocation. This occurs in between get invocations for the same key in which a value was returned. At no time are key/value pairs being removed from the blah node, and at no time are other nodes being removed from the cache.

      Can anyone help me postulate why this might be occurring? I am using JBossCache 1.1.1 with an isolation level of SERIALIZABLE (it happened with REPEATABLE_READ too) and with asynchronous replication.

        • 1. Re: JBossCache Bug in get(someNode, key)?
          belaban

           

          "bela@jboss.com" wrote:
          Jimmy, if you could provide the unit test, that would be great, because I want to release JBossCache 1.2 by mid-December.


          I have created a unit test that causes the described behavior. Because it involves a little more than a single JUnit TestCase class, I do not think it would be wise to post it on the forum. I will send it along to you and Ben via an e-mail.

          • 2. Re: JBossCache Bug in get(someNode, key)?
            jiwils

            Okay, Jimmy, could you create a unit test that reproduces the problem ?
            If you say replication doesn't affect this b/c it is in the same VM, this should make it even easier.
            Cheers,
            Bela

            • 3. Re: SRP: Multiple clients
              jiwils


              Here is some more information that might help in finding out what the problem is.

              On the server side I have the following configuration:

              <application-policy name = "CustomFwRealm">

              <login-module code= "org.jboss.security.srp.jaas.SRPCacheLoginModule"
              flag = "required">
              <module-option name = "cacheJndiName">srp-fw/AuthenticationCache
              </module-option>
              </login-module>

              <login-module code = "com.security.jaas.FwServerLoginModule"
              flag = "required">
              <module-option name = "password-stacking">useFirstPass</module-option>
              </login-module>

              </application-policy>

              The FwServerLoginModule is a login module that creates a principal and gets the user roles.
              The strange part is that after Client application 2 has logged in and Client application 1 calls the method, the login method of this module is called again.
              Although the login returns true, I get the exception

              2004-12-07 12:23:15,690 ERROR [org.jboss.ejb.plugins.SecurityInterceptor] Authentication exception, principal=nmeira

              and commit is never called...

              • 4. Re: JBossCache Bug in get(someNode, key)?
                norbert

                Try this with synchronous replication to see whether any 'CacheException' is thrown when 'put' is called right before retrieving 'null'.

                (Asynchronous replication will catch these exceptions internally, since they do not occur synchronously with the corresponding method call.)

                • 5. Re: JBossCache Bug in get(someNode, key)?
                  jiwils

                   

                  "norbert" wrote:
                  Try this with synchronous replication to see whether any 'CacheException' is thrown when 'put' is called right before retrieving 'null'.

                  (Asynchronous replication will catch these exceptions internally, since they do not occur synchronously with the corresponding method call.)


                  An interesting idea (and I will try it to see what happens), but in my case, the put in question is happening in another VM. Furthermore, I know that the replication is working because before (and after) the null occurrence, get invocations for the same key are working. Could there be a "hidden" exception in the get invocation that your suggestion might expose? Since I am not using transactions, I am not sure what that would be. I am only using SERIALIZABLE as an isolation level because it seemed to fix an unrelated issue where a cache node existed, but its corresponding object did not (yet).

                  • 6. Re: JBossCache Bug in get(someNode, key)?
                    jiwils

                    My application was running on a Linux box, but all of the application's JAR files were available via NFS. By moving the application's JAR files to local disk, the described problem seems to have disappeared.

                    Unfortunately, I was unable to try the aforementioned test (turning on synchronous replication) before this change was made, so I do not know what that might have produced. If the problem reappears, I will try this to see what it produces.

                    My theory is that the classloader was failing to load class information across the NFS-mounted file system and causing the issue. Has anyone else experienced this type of strange behavior in their Java applications when utilizing NFS?

                    • 7. Re: JBossCache Bug in get(someNode, key)?
                      jiwils

                       

                      "jiwils" wrote:
                      My application was running on a Linux box, but all of the application's JAR files were available via NFS. By moving the application's JAR files to local disk, the described problem seems to have disappeared.


                      Correction...the problem is back (with everything on the local disk). I will be attempting to see what happens if I turn on synchronous replication.

                      • 8. Re: JBossCache Bug in get(someNode, key)?
                        jiwils

                         

                        "jiwils" wrote:
                        Correction...the problem is back (with everything on the local disk). I will be attempting to see what happens if I turn on synchronous replication.

                        I am still getting a null result when calling get(blah, key), just like before, even when using synchronous replication. No CacheException (or derivative) was reported during the get that returned null, nor during the put in the remote VM where the key/value pair was placed into the cache. In the same VM as the get that returned null, other threads received a value for the same key before and after the invocation that returned null.

                        • 9. Re: JBossCache Bug in get(someNode, key)?
                          jiwils

                           

                          "jiwils" wrote:
                          I am still getting a null result when...


                          For further clarification, here is the snippet of code that this is occurring in and the log that it produces (proof that this is really happening).

                          The code snippet:
                          String nodeID = null;
                          String fqn = null;
                          
                          try
                          {
                              nodeID = new String(id);
                              fqn = (String) _cache.get("/blah", nodeID);
                          
                              if (log.isDebugEnabled())
                              {
                                  StringBuffer message = new StringBuffer("Retrieved");
                                  message.append(SPACE);
                                  message.append(fqn);
                                  message.append(SPACE);
                                  message.append("for");
                                  message.append(SPACE);
                                  message.append(nodeID);
                                  message.append(SPACE);
                                  message.append("from blah.");
                          
                                  log.debug(message.toString());
                              }
                          }
                          catch (CacheException ce)
                          {
                              log.error("Unable to use blah:", ce);
                          }
                          


                          The log snippet:
                          00:13:40,887 DEBUG [Test] Retrieved /blah/some_test for bwLbCSdPI80ONJeMWEmEgA== from blah.
                          00:13:40,889 DEBUG [Test] Retrieved /blah/some_test/test for MyTOeDTaGhly7bp4/Smy6g== from blah.
                          ...snip...
                          00:13:40,889 DEBUG [Test] Retrieved null for MyTOeDTaGhly7bp4/Smy6g== from blah.
                          ...snip...
                          00:13:40,900 DEBUG [Test] Retrieved /blah/some_test for bwLbCSdPI80ONJeMWEmEgA== from blah.
                          00:13:40,901 DEBUG [Test] Retrieved /blah/some_test/test for MyTOeDTaGhly7bp4/Smy6g== from blah.
                          


                          Again nothing is being removed from the blah node's map during this test, and no CacheException is logged (even though synchronous replication has been enabled).
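
A standalone harness along the following lines might help turn the observation above into a repeatable test. It is only a sketch: `NullReadProbe` and `countNullReads` are hypothetical names, and a plain `java.util.Map` stands in for the cache so the code runs without JBossCache on the classpath. To probe the real cache, the `store.get(key)` call would be replaced by the `_cache.get("/blah", nodeID)` call from the snippet above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness, not part of JBossCache.
class NullReadProbe {

    // Starts `threads` readers that each call get(key) `reads` times and
    // returns how many of those calls came back null.
    static int countNullReads(Map<String, String> store, String key, int threads, int reads) {
        AtomicInteger nulls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all readers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int r = 0; r < reads; r++) {
                    if (store.get(key) == null) {
                        nulls.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return nulls.get();
    }

    public static void main(String[] args) {
        // Stand-in store: the key is added once and never removed.
        Map<String, String> store = new ConcurrentHashMap<>();
        store.put("MyTOeDTaGhly7bp4/Smy6g==", "/blah/some_test/test");
        int nulls = countNullReads(store, "MyTOeDTaGhly7bp4/Smy6g==", 200, 1000);
        System.out.println("null reads: " + nulls);
    }
}
```

With a store that never removes the key, any non-zero count would point at the store rather than at the calling code, which is the property the log above calls into question.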

                          • 10. Re: JBossCache Bug in get(someNode, key)?
                            norbert

                            jiwils wrote:

                            Since I am not using transactions...


                            I guess you should use transactions. Without transactions, the transaction isolation level is not enforced and there is no node locking, so the content of a node may be modified at the very moment you are calling 'get()'.

                            If you are not running in a J2EE-container, you might just use the DummyTransactionManager that comes with JBossCache.

                            • 11. Re: JBossCache Bug in get(someNode, key)?

                              A couple of suggestions:

                              1. Like Norbert mentioned, use tx now. There is a known problem in locking when tx is not used. We are fixing that in release 1.2.

                              2. If you are using TreeCacheAop, I'd suggest you try the latest pre-1.2 release in jboss-head. I have done some refactoring and bug fixing on the aop part.

                              3. If you are still seeing the problem, the best way to help me troubleshoot is to write a JUnit test case. That cuts down the time I would need to write one myself. Besides, I can check your test case into the src tree as well. :-)

                              Thanks,

                              -Ben

                              • 12. Re: JBossCache Bug in get(someNode, key)?
                                belaban

                                 

                                "jiwils" wrote:

                                Do I need locking when I am "getting" a value that is already in a node's map? I do not care if other items are added to the map while I am "getting" this value do I? Is the issue you refer to the reason I am getting null?


                                Yes, you need locks on reads: if isolation level is SERIALIZABLE, then a get() might block if another read or write is happening.

                                Bela
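
The point about read locks can be illustrated with a toy model. This is not JBossCache's implementation; `LockedNode` is a hypothetical class, and the two-step update (remove, then put) is an assumption made purely to create a transient state in which the key is absent. Because readers take the read lock, they block while the writer is mid-update and never observe that state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model only; NOT how JBossCache stores or locks node data.
class LockedNode {
    private final Map<String, String> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writer: replaces a value in two steps while holding the write lock.
    void replace(String key, String value) {
        lock.writeLock().lock();
        try {
            data.remove(key);     // transient state: the key is absent here
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader: waits for any in-progress write to finish before reading.
    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs one writer and `readers` reader threads; returns the number of null reads.
    static int countNullReads(int readers, int iterations) {
        LockedNode node = new LockedNode();
        String key = "MyTOeDTaGhly7bp4/Smy6g==";
        node.replace(key, "/blah/some_test/test");
        AtomicInteger nulls = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        threads.add(new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                node.replace(key, "/blah/some_test/test");
            }
        }));
        for (int r = 0; r < readers; r++) {
            threads.add(new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    if (node.get(key) == null) {
                        nulls.incrementAndGet();
                    }
                }
            }));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return nulls.get();
    }

    public static void main(String[] args) {
        System.out.println("null reads with read locks: " + countNullReads(8, 10000));
    }
}
```

If `get` skipped the read lock, a reader could run between the `remove` and the `put` and see null. Whether anything similar happens inside the cache when no transaction is active is not established in this thread.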

                                • 13. Re: JBossCache Bug in get(someNode, key)?
                                  belaban

                                   

                                  "jiwils" wrote:

                                  Can anyone help me postulate why this might be occurring? I am using JBossCache 1.1.1 with an isolation level of SERIALIZABLE (it happened with REPEATABLE_READ too) and with asynchronous replication.


                                  You *cannot* use async repl and expect that the backup cache(s) have already been updated when you update the primary cache. Use sync_repl.
                                  BTW: there is a unit test case (forgot which one) that tests this. Check it out.

                                  Bela

                                  • 14. Re: JBossCache Bug in get(someNode, key)?
                                    jiwils

                                     

                                    "bela@jboss.com" wrote:
                                    You *cannot* use async repl and expect that the backup cache(s) have already been updated when you update the primary cache. Use sync_repl. BTW: there is a unit test case (forgot which one) that tests this. Check it out.


                                    I agree with the above; I do not expect an update from one cache VM to be immediately propagated to the other VMs in the cache cluster (and I have been testing with SYNC replication anyway, though I prefer ASYNC replication).

                                    The problem I am seeing is after the replication occurs. A key/value pair exists in a node's map, then it does not, then it does. All of this happens inside the *same* VM. Isn't the initial presence of the key/value pair in the node's map proof that replication has already occurred? The VM in which this behavior occurs does not place this value into the map, but after it is replicated, it disappears only to reappear again later (as the log showed). The disappearance/reappearance of the key/value pair is my concern, not when it is replicated.

                                    Note that the key/value pair that goes missing (there are actually two that exhibit this behavior) is not being modified or removed by any VM in the cache cluster once it has been added.
