7 Replies Latest reply on May 24, 2006 2:58 PM by manik

    Hanging on startup

    mraccola

      I am getting the following warning message when starting up a replicated JBoss TreeCache 1.3.0.SP2 instance with JGroups 2.2.8 on WAS 5.1.1. The application never comes up and this message is repeated over and over. I am unable to stop the server using the stopserver command, and end up having to reboot the physical Windows machine.

      WARN [UpHandler (UNICAST)] (org.jgroups.stack.UpHandler:65) 2006-05-16 23:23:53,141 - UpHandler (UNICAST) exception: java.lang.NoClassDefFoundError: org/jgroups/protocols/UNICAST$Entry

      I was able to consistently re-create the problem using the following steps:

      On a single node WebSphere instance:
      1) Set TreeCache CacheMode=REPL_SYNC (I know it should be LOCAL, but this is the fail scenario)
      2) Start-up application, cache starts up, cache node obtains a GMS address and joins the "cluster"
      3) Shut-down application
      4) Start-up application

      Step 4 leads to the application hanging forever and issuing the log message shown above.

      Something appears not to be shutting down when the application shuts down. The NoClassDefFoundError might be a side effect of the first class loader being shut down by the container.

      I changed to CacheMode=LOCAL for the single node and it seems to work fine in that case. However, I am a little worried about what would happen if I deploy to a multi-node environment with CacheMode=REPL_SYNC while one of the nodes is down for maintenance. Would that hang the application server?

      I can send the complete log4j output at INFO level for both org.jboss.cache and org.jgroups if necessary. I can also provide the tree cache configuration files. Note: I have two TreeCache instances running.

        • 1. Re: Hanging on startup
          belaban

          This smells like a class-not-found problem. Where did you place your JBossCache and JGroups JARs? Place them as high as possible 'up the stack', e.g. in ./lib if such a thing exists in WAS.

          • 2. Re: Hanging on startup
            mraccola

            I have read the JBossCacheAndWAS Wiki on how to accomplish this and I have a concern with this approach. Moving the JBoss Cache JARs to the %WAS_ROOT%/lib/ext directory requires overriding the WebSphere JMX JAR jmxc.jar in %WAS_ROOT%/lib. I believe this would interfere with WebSphere and the administration console, not to mention meet resistance from the infrastructure folks.

            Is this the only option? What would prevent the cache and/or JGroups channel from shutting down cleanly when the application class loader is shutting down? Am I correct that there is still a cache or channel which was loaded from the old application class loader still being notified of broadcasts from the new application? Why would this situation lock up the application server process?

            Alternatively, is there any plan to remove the dependency on JBoss JMX, as the JMX capability is not used outside of JBoss AS?

            • 3. Re: Hanging on startup
              belaban

              Yes, there is, but it is not very high prio. If JMX is indeed the problem, we might bump up the prio though...

              • 4. Re: Hanging on startup
                mraccola

                Thank you for the quick responses. Yes, in my opinion, removing the JMX dependency would be a big deal for multiple reasons. I can do a separate post on that so as not to clutter this one.

                I still don't understand what could be causing the server to hang. Can you think of anything that could be misconfigured in TreeCache or JGroups that would cause them not to shut down cleanly (and without any notification in the logs)? As I mentioned, I have two TreeCache instances running; both are using "localhost" as the mcast address. They have different cluster names. Other than that, I haven't changed many properties in the TreeCache XML configuration.

                • 5. Re: Hanging on startup
                  belaban

                  I realized that we will probably remove the JBoss JMX dependency in the 2.0 series, where we do heavy refactoring anyway.
                  But, looking at your issue, I now don't think this is related to JMX. Are you using the TreeCacheMarshaller (set it to true)? This is described in the docs. I suggest you (a) use the 1.4 beta and (b) enable the TreeCacheMarshaller.

                  • 6. Re: Hanging on startup
                    mraccola

                    I think I found a case where a reference to the cache was being held from outside the application (a JAAS login module in the container). This could have prevented the referenced cache from shutting down, I guess. I have fixed the problem and, at any rate, I can no longer re-create it. I did not have to switch to TreeCacheMarshaller.
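
A minimal sketch of that kind of leak and its fix, in plain Java. This is only an illustration and not JBoss Cache code: `StoppableCache`, `ContainerRegistry`, `FakeCache`, and `AppLifecycle` are hypothetical names, and `stop()` stands in for whatever shutdown call the real cache exposes. The idea is that the application stops the cache and also clears the reference it handed to container-level code (such as a login module) when it shuts down, so nothing outside the application keeps the old instance alive.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CacheShutdownSketch {

    /** Hypothetical stand-in for a cache with an explicit lifecycle. */
    interface StoppableCache {
        void stop();
        boolean isRunning();
    }

    /** Hypothetical container-level holder, e.g. something a login module reads. */
    static final class ContainerRegistry {
        private static final AtomicReference<StoppableCache> SHARED =
                new AtomicReference<StoppableCache>();

        static void publish(StoppableCache cache) { SHARED.set(cache); }

        static StoppableCache current() { return SHARED.get(); }

        /** Clears the shared slot only if it still points at the given cache. */
        static void clear(StoppableCache cache) { SHARED.compareAndSet(cache, null); }
    }

    /** Trivial in-memory implementation used only for this demonstration. */
    static final class FakeCache implements StoppableCache {
        private volatile boolean running = true;
        public void stop() { running = false; }
        public boolean isRunning() { return running; }
    }

    /** Application-side lifecycle: start publishes the cache, stop withdraws it. */
    static final class AppLifecycle {
        private StoppableCache cache;

        void start() {
            cache = new FakeCache();
            ContainerRegistry.publish(cache);
        }

        void stop() {
            if (cache == null) {
                return;
            }
            try {
                cache.stop();
            } finally {
                // Drop the outside reference even if stop() throws.
                ContainerRegistry.clear(cache);
                cache = null;
            }
        }
    }

    public static void main(String[] args) {
        AppLifecycle app = new AppLifecycle();
        app.start();
        System.out.println("shared before stop: " + (ContainerRegistry.current() != null));
        app.stop();
        System.out.println("shared after stop: " + (ContainerRegistry.current() != null));
    }
}
```

In a real deployment the `stop()` call would be driven by whatever shutdown callback the container offers to the application; which callback that is depends on the setup and is not shown here.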

                    By the way, does Hibernate support the TreeCacheMarshaller with its integration with JBoss TreeCache?

                    Thanks for your help.

                    • 7. Re: Hanging on startup
                      manik

                      "By the way, does Hibernate support the TreeCacheMarshaller with its integration with JBoss TreeCache?"

                      Yes, it would. Basically Hibernate uses JBC as a black box cache provider.