4 Replies Latest reply on Nov 4, 2013 12:25 AM by genman

    4.9 and Cassandra; multiple announce, duplicate hard link, other issues with 4.9

    genman

      4.9 has a bug where, if two or more nodes are in the 'announce' state, they can never move out of it. The problem is that some of the queries were designed to return a single result. I'm hoping this has been fixed. Yes, it seems much safer to add one node at a time, but I was trying to get a 4-node cluster up and running right away and got quite stuck.
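To illustrate the failure mode being described (this is a hypothetical sketch, not RHQ's actual code; all class and method names here are made up): deployment logic that assumes the "nodes in announce state" query returns a single row only ever advances one node, so when two nodes announce at once the rest stay stuck.

```java
import java.util.List;

// Hypothetical illustration of the reported bug: state-machine code that
// assumes at most one node is in the ANNOUNCE state at a time.
public class AnnounceStateDemo {
    enum OperationMode { NORMAL, ANNOUNCE, BOOTSTRAP }

    static class StorageNode {
        final String address;
        OperationMode mode;
        StorageNode(String address, OperationMode mode) {
            this.address = address;
            this.mode = mode;
        }
    }

    // Buggy shape: built around a single-result query, so it only ever
    // advances the first announcing node. With two nodes announcing at
    // once, the second one never leaves ANNOUNCE.
    static void advanceFirstAnnouncingNode(List<StorageNode> cluster) {
        for (StorageNode n : cluster) {
            if (n.mode == OperationMode.ANNOUNCE) {
                n.mode = OperationMode.BOOTSTRAP;
                return; // single-result assumption baked in here
            }
        }
    }

    // Fixed shape: treat the query result as a list and advance every
    // announcing node, so simultaneous deployments can all make progress.
    static void advanceAllAnnouncingNodes(List<StorageNode> cluster) {
        for (StorageNode n : cluster) {
            if (n.mode == OperationMode.ANNOUNCE) {
                n.mode = OperationMode.BOOTSTRAP;
            }
        }
    }
}
```

The point of the sketch is only the shape of the fix: anywhere a query result was assumed to be a single row, it needs to be handled as a collection.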

       

      One other issue is Bug 1017961, which has to do with MBeans appearing to be down: I've seen this on 4.5.1 and 4.9, where MBeans are reported unavailable when in fact they are there. This shows up with Cassandra: if the system is restarted while the agent is up, you get into this state. I have a proposed fix for this which seems to solve the problem.

       

      The other problem I've seen is with taking snapshots. All three nodes failed with:

      Caused by: javax.management.RuntimeMBeanException: java.lang.RuntimeException: Tried to create duplicate hard link to /data05/rhq/data/system/NodeIdInfo/snapshots/1383438602316/system-NodeIdInfo-ic-1-TOC.txt

       

       

      I'm not sure what this is about. The help isn't too helpful.
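For context on what the error text means (a hedged sketch, not Cassandra's actual implementation): Cassandra snapshots are built as hard links to the live SSTable files, and creating a hard link fails if a file with that name already exists, e.g. a leftover link from an earlier, partially completed snapshot under the same snapshot name. The names below are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration of the "duplicate hard link" snapshot failure: linking a
// snapshot file fails when a stale link with the same name already exists.
public class SnapshotLinkDemo {

    // Attempt the link the way a snapshot does. A leftover link from a
    // previous snapshot attempt makes the second call fail, which is the
    // situation the "Tried to create duplicate hard link" error reports.
    static boolean tryLink(Path link, Path target) {
        if (Files.exists(link)) {
            return false; // stale snapshot entry; the old snapshot must be cleared first
        }
        try {
            Files.createLink(link, target);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper that builds a throwaway target file and link path for the demo.
    static Path[] demoPaths() {
        try {
            Path dir = Files.createTempDirectory("snapshot-demo");
            Path target = Files.createFile(dir.resolve("demo-TOC.txt"));
            Path link = dir.resolve("snapshot-link-TOC.txt");
            return new Path[] { link, target };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If this reading is right, the usual recovery is to clear the stale snapshot directory (for example with `nodetool clearsnapshot`) before retrying, though I can't confirm that's what happened here.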

        • 1. Re: 4.9 and Cassandra; multiple announce, duplicate hard link, other issues with 4.9
          john.sanda

          Hi Elias,

           

          I have created https://bugzilla.redhat.com/show_bug.cgi?id=1026088 for the deployment issue. This will be fixed in 4.10. For the other two issues, would you mind creating separate forum threads, since they are distinct from the deployment issue? I have probably run into BZ 1017961; that sounds familiar. I'm not sure about the hard link issue. Can you provide any more details on how you hit it?

           

          Thanks

          • 2. Re: 4.9 and Cassandra; multiple announce, duplicate hard link, other issues with 4.9
            genman

            John Sanda wrote:

             

            Not sure about the hard link issue. Can you provide any more details on how you hit that?

             

            Thanks

            I'm not sure how I hit it. I originally had a four-node cluster; one node stopped talking, and I had it decommissioned. Since I wasn't terribly methodical about things, maybe something got corrupted. It seems to be working now, though.

             

            I'll start a few more threads if you like...

            • 3. Re: 4.9 and Cassandra; multiple announce, duplicate hard link, other issues with 4.9
              john.sanda

              Thanks for creating the separate threads. I have created another bug about adding support for deploying multiple nodes simultaneously - https://bugzilla.redhat.com/show_bug.cgi?id=1026128. There is some work that needs to be done in order to support this properly. There is nothing in RHQ 4.9 to prevent you from attempting to deploy multiple nodes simultaneously (other than the bug you hit), but I strongly discourage doing it, as you are likely to run into problems. For example, suppose you have a 3-node cluster with nodes N1, N2, and N3, and then you decide to deploy N4 and N5 at the same time. N4 and N5 will likely not be able to talk to one another, and that ought to lead to a whole host of interesting problems.

               

              For now, the fastest solution is to install multiple nodes before installing the server. It requires more manual steps, but it completely bypasses the deployment process that happens when new storage nodes are imported into inventory.

              • 4. Re: 4.9 and Cassandra; multiple announce, duplicate hard link, other issues with 4.9
                genman

                Thanks John,

                 

                What I initially tried to do was create one node, and then I added three more. :-(

                 

                I've had lots of problems, like bug 1025783, where the installer ends up creating multiple agents in inventory, causing trouble.

                 

                What would be helpful is a log showing the steps: what it's doing and what failed. It would also be nice if the state within RHQ reflected the state reported by Cassandra itself, rather than guessing at it based on installation operations. For example, I have a node that's working, but it shows up as DOWN and then gets kicked out of the cluster.
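The reconciliation being suggested could look something like the following sketch (hypothetical code, not part of RHQ; all names are made up). The idea is to compare RHQ's recorded per-node state against the live membership Cassandra itself reports, e.g. via `nodetool status` or Cassandra's StorageService JMX MBean, and log each disagreement rather than acting on a guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: reconcile RHQ's recorded storage-node state with
// the set of nodes Cassandra currently reports as live, and report every
// disagreement instead of silently trusting the recorded state.
public class ClusterStateReconciler {

    // rhqRecordedState maps node address -> RHQ's recorded mode (e.g. "NORMAL", "DOWN").
    // cassandraLiveNodes is the live membership as Cassandra reports it.
    static List<String> findMismatches(Map<String, String> rhqRecordedState,
                                       Set<String> cassandraLiveNodes) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> e : rhqRecordedState.entrySet()) {
            boolean live = cassandraLiveNodes.contains(e.getKey());
            boolean recordedUp = "NORMAL".equals(e.getValue());
            if (live != recordedUp) {
                mismatches.add(e.getKey() + ": RHQ says " + e.getValue()
                        + " but Cassandra says " + (live ? "UP" : "DOWN"));
            }
        }
        return mismatches;
    }
}
```

A mismatch list like this would have caught the case above: a node Cassandra considers live that RHQ has marked DOWN would be surfaced in the log before being kicked out of the cluster.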