4 Replies Latest reply on Feb 5, 2006 1:45 AM by asafz

    RetryInterceptor problem with cluster shutdown

    asafz

      I'm using org.jboss.proxy.ejb.RetryInterceptor as in the example in the admin guide.
      I wanted to use it for recovery in case of a total cluster failure.
      But when I shut down all the cluster nodes and then access the SLSB from my client, the client gets stuck, and it stays stuck after I start the cluster nodes back up.
      This is not what I expected for this kind of failure. I thought it should reconnect to the cluster nodes once the cluster is back up.

      By the way, I'm using jboss-4.0.3sp1.

      Thanks,

        • 1. Re: RetryInterceptor problem with cluster shutdown
          brian.stansberry

          Yes, it should. It does this by doing a JNDI lookup of the bean invoker stub. If it's failing to reconnect, I suspect the problem is the JNDI lookup is failing.

          What are your jndi.properties? And is auto-discovery turned on in the HA-JNDI service on the server side? (It is by default; you'd have to explicitly turn it off.)

          The RetryInterceptor will just use a default InitialContext() unless you pass a set of JNDI properties to the static RetryInterceptor.setRetryEnv() method. Often people don't do that (it requires a call to JBoss-specific code). But, even if you don't do it, the default InitialContext() should eventually fall back to auto-discovery of HA-JNDI and find the cluster that way.
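
          A minimal sketch of building such a set of JNDI properties on the client, for setups where multicast auto-discovery is not available. The host names are hypothetical, port 1100 is assumed to be the HA-JNDI port of the installation, and the commented-out call assumes setRetryEnv() accepts a java.util.Properties; check all three against your own server configuration and JBoss version.

```java
import java.util.Properties;
import javax.naming.Context;

public class RetryEnvExample {

    /** Builds JNDI properties that name the HA-JNDI servers explicitly. */
    static Properties buildRetryEnv(String providerUrl) {
        Properties env = new Properties();
        // Standard JBoss 4.x client naming settings.
        env.setProperty(Context.INITIAL_CONTEXT_FACTORY,
                "org.jnp.interfaces.NamingContextFactory");
        env.setProperty(Context.URL_PKG_PREFIXES,
                "org.jboss.naming:org.jnp.interfaces");
        // Comma-separated list of HA-JNDI endpoints (hypothetical hosts).
        env.setProperty(Context.PROVIDER_URL, providerUrl);
        return env;
    }

    public static void main(String[] args) {
        Properties env =
                buildRetryEnv("node1.example.com:1100,node2.example.com:1100");
        // With the JBoss client jars on the classpath, hand the environment
        // to the interceptor once at client startup (assumed signature):
        // org.jboss.proxy.ejb.RetryInterceptor.setRetryEnv(env);
        System.out.println(env.getProperty(Context.PROVIDER_URL));
    }
}
```

          Listing more than one node in the provider URL gives the lookup an alternative if the first node has not come back yet.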

          • 2. Re: RetryInterceptor problem with cluster shutdown
            asafz

            Thank you for your answer.
            I don't have a jndi.properties on the client, so like you said, the RetryInterceptor is probably looking for HA-JNDI at the default multicast address.

            That is not working because my switches are not configured for multicast and my client is in a different network segment than the server, so it is not supposed to work. I'll try to pass the JNDI properties to the RetryInterceptor myself and check whether that does the job.

            I still have 2 questions:
            1. What is the interceptor supposed to do if it doesn't find any JNDI server (total cluster failure)? Is it supposed to block until it gets an answer, retrying every so many seconds until it finds one?

            2. A general JBoss cluster question: in JBoss you don't really have one cluster but 2 or 3 clusters (if you use EJB3).
            Every time you use TreeCache you also use JGroups to find the members that should be replicated to, so the Tomcat cluster is actually not related to the main DefaultPartition JBoss cluster. The TreeCache for the EJB3 SFSBs and entity beans likewise uses its own JGroups channel.
            I think all of them should use the same JGroups definition (some kind of singleton), because it's the same cluster after all. The way it is defined now causes a lot of configuration problems, and I think it's logically wrong.
            (If you change one of these files on one server by mistake, you could for example end up with one JBoss cluster but a different Tomcat cluster.)
            What do you think? Is there any reason why it is like this now?

            Thanks.

            • 3. Re: RetryInterceptor problem with cluster shutdown
              brian.stansberry

              1) Yep, it retries once a second until it succeeds. The downside to this is if it never reconnects your client is hung.

              2) Your opinion is correct. We have it on the roadmap to allow a shared JGroups channel between multiple services, and it's high priority. The reason it's the way it is now is that there are some tricky challenges to solve in JGroups to allow multiple services to share a channel without conflicting with each other (e.g. how do they get their service-specific state when they start up independently, how do you prevent one service hanging a thread and preventing other services getting messages, etc.).


              Also, regarding retry, in 4.0.4 there will be some new features. First, if you use org.jboss.naming.NamingContextFactory instead of org.jnp.interfaces.NamingContextFactory, whatever environment properties you pass to new InitialContext(Hashtable) will get stored in a ThreadLocal. The RetryInterceptor can then access that ThreadLocal to find them, saving the need for you to call RetryInterceptor.setRetryEnv().
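
              To illustrate the idea only, here is a sketch of the ThreadLocal pattern described above. It is not the JBoss implementation: the class and method names are hypothetical stand-ins for what a context factory and an interceptor might do.

```java
import java.util.Hashtable;

public class EnvThreadLocalSketch {

    // One slot per thread: the environment most recently used on that thread.
    private static final ThreadLocal<Hashtable<String, Object>> LAST_ENV =
            new ThreadLocal<>();

    /** Hypothetical hook a context factory could call when a context is created. */
    static void rememberEnv(Hashtable<String, Object> env) {
        // Store a copy so later changes by the caller do not leak in.
        LAST_ENV.set(new Hashtable<>(env));
    }

    /** Hypothetical accessor a retrying interceptor could use; null if nothing was stored. */
    static Hashtable<String, Object> currentEnv() {
        return LAST_ENV.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put("java.naming.provider.url", "node1.example.com:1100");
        rememberEnv(env);

        // Same thread: the stored environment is visible.
        System.out.println("main thread sees: " + currentEnv());

        // A different thread has its own (empty) slot.
        Thread other = new Thread(
                () -> System.out.println("other thread sees: " + currentEnv()));
        other.start();
        other.join();
    }
}
```

              One consequence of this pattern is that the stored environment is only visible on the thread that created the InitialContext, so a bean invoked from a different thread would not see it.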

              Also, RetryInterceptor now exposes a protected c'tor that takes 2 args: int maxRetries, long sleepMillisecs. You can use this to write your own trivial subclass that, for example, will retry once a second for 30 secs and then fail; in your subclass' default c'tor just call super(30, 1000). This gives you finer control of client behavior: retry for a time that's appropriate for your app, and fail if not successful.
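
              The effect of those two arguments can be sketched with a generic bounded retry loop. This is an illustration of the behavior described above, not the RetryInterceptor source; the class name, the retry() helper and the simulated lookup are all hypothetical.

```java
import java.util.function.Supplier;

public class BoundedRetry {

    /**
     * Calls attempt up to maxRetries times, sleeping sleepMillis between
     * failed attempts, and fails once the attempts are used up.
     */
    static <T> T retry(Supplier<T> attempt, int maxRetries, long sleepMillis) {
        RuntimeException last = null;
        for (int i = 0; i < maxRetries; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
            if (i < maxRetries - 1) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if the thread is interrupted
                }
            }
        }
        throw new IllegalStateException("retries exhausted or interrupted", last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated lookup: fails twice (cluster down), then succeeds.
        String stub = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("cluster down");
            }
            return "stub";
        }, 30, 10L);
        System.out.println(stub + " after " + calls[0] + " attempts");
    }
}
```

              With an unbounded loop the caller hangs for as long as the cluster stays down; with a bound, the caller gets an exception it can handle after roughly maxRetries times sleepMillis.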

              I'll be adding more on this stuff to the RetryInterceptor wiki page next week.

              • 4. Re: RetryInterceptor problem with cluster shutdown
                asafz

                Thanks,

                The RetryInterceptor now works fine. (I added jndi.properties to the classpath.)