2 Replies Latest reply on Oct 8, 2008 4:02 AM by timfox

    Customers having problems during startup trying to use the j

    jay.howell

      This is in relation to https://jira.jboss.org/jira/browse/JBMESSAGING-1274

      Problem:
      The problem is that many customers start their systems with the default peer ID of 0 and then hit an error. The error message states that there are two peers using node 0 as the peer ID. They then look up the error and figure out that they need to either set the environment variable or manually adjust the peer ID.

      But by the time they get the duplicate peer ID message, the database is already hosed. If the peers are started at the same time, they both generate channel IDs using node 0, so you end up with two datastores with the same node ID and different channel IDs. This creates an error condition where you just have to wipe your database.

      This entire thing could be avoided if users could use the old JBoss paradigm: to start a cluster, just start JBoss on different nodes in the same network and they will join. This would require something other than the default 0 as the peer ID.

      Possible Solution to this problem:

      We could change the peer ID to a string field (the DB would need changing). So how does the peer ID being a string help us here? It helps because we can then pick something that naturally occurs in the environment to be that peer ID. One thing that comes to mind is the bind address, ${bind_address}. The bind address could be used as the peer ID, or anything else that is in the environment. Right now there are very few int-type things available in the environment. This would also pave the way for a possible future fix that adds programmatic peer ID assignment based on things such as the MAC address, or anything else that uniquely identifies the peer.
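For illustration only, here is a minimal sketch of deriving a peer ID from the bind address. The class and method names (`BindAddressPeerId`, `fromIpv4`) are hypothetical and not part of JBM. It relies on the fact that an IPv4 address is 32 bits wide and therefore fits in a Java `int`. Whether JBM would accept such a value as a peer ID (it is negative for addresses whose first octet is 128 or higher) is an assumption that would need checking, and IPv6 addresses do not fit at all.

```java
final class BindAddressPeerId {

    private BindAddressPeerId() {
    }

    // Hypothetical helper: packs a dotted-quad IPv4 address into a 32-bit int,
    // most significant octet first. Not an existing JBM API.
    static int fromIpv4(String bindAddress) {
        String[] parts = bindAddress.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted-quad IPv4 address: " + bindAddress);
        }
        int packed = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range in: " + bindAddress);
            }
            // Shift the octets collected so far and append the next one.
            packed = (packed << 8) | octet;
        }
        return packed;
    }
}
```

Two nodes bound to different IPv4 addresses get different values from `fromIpv4`, but two nodes bound to the same address (for example, several instances on one host using different ports) would still collide.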

      Jay:)

        • 1. Re: Customers having problems during startup trying to use t
          clebert.suconic

           

          "jhowell@redhat.com" wrote:
          This is in relation to https://jira.jboss.org/jira/browse/JBMESSAGING-1274
          But the problem is by the point they get the duplicate peer id message, the database is already hosed.


          Wouldn't it be easier to prevent the HSQL database from being damaged than to do any substantial re-engineering or refactoring of JBM 1.4?

          IMO, changing the ServerPeerID in JBM 1.4 from an Integer to a String is technically possible but unlikely to be done. It would change all the clustering test cases and require extensive testing, all for the sake of something that won't ever be used in production. In other words, it would divert all of us from JBM2.

          We need to avoid the temptation of creating something that would look like a JBM 1.5.

          If you prevented the DB from being corrupted in the case you describe, the user would just have to restart the server with the correct ID.

          • 2. Re: Customers having problems during startup trying to use t
            timfox

            The main reason the server peer ID is not a string is that it's used in the PK of tables in the schema.

            Making it a string would increase DB size and slow things down.

            Ints are very difficult (if not impossible) to make globally unique without getting them from some shared counter (e.g. using JGroups).

            Also, the ID must be unique for the lifetime of the server, not just for a single run, so it would need to be persisted after being generated.
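A rough sketch of the "generate once, then persist" requirement, for illustration only. The class `PersistentPeerId`, the method `loadOrCreate`, and the idea of keeping the ID in a local file are all assumptions, not how JBM stores anything. The sketch also leaves open the hard part noted above: the `generator` passed in would still have to produce a value that no other node uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.IntSupplier;

final class PersistentPeerId {

    private PersistentPeerId() {
    }

    // Hypothetical helper: returns the ID stored in idFile if the file exists;
    // otherwise asks the generator for a new ID, stores it, and returns it.
    static int loadOrCreate(Path idFile, IntSupplier generator) {
        try {
            if (Files.exists(idFile)) {
                String stored = new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8);
                return Integer.parseInt(stored.trim());
            }
            int id = generator.getAsInt();
            Files.write(idFile, Integer.toString(id).getBytes(StandardCharsets.UTF_8));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On the first start the generator runs and its result is written out; on every later start the stored value wins and the generator is never called, so the ID stays stable across restarts.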

            It's hard to see what we could do here, and, as Clebert says, if we were to make a change from int to string it would be a big change and unlikely to happen in 1.4 which is maintenance mode.