6 Replies Latest reply on Jul 4, 2013 8:06 AM by john.sanda

    4.8 Error running data migration tool to Cassandra

    pathduck

      Hi,

       

      We've recently upgraded to 4.8 and had problems with running the migration of metric data to Cassandra.

       

      What happened was I had chosen the "rhqctl upgrade" command to output the command needed to manually run rhq-data-migration.

      However this crashed with an error indicating it could not connect to host "/127.0.0.1" - (not sure why the leading slash is there?)

       

      So I could not get the command to manually migrate, and was stuck.

       

      In rhq-storage/conf/cassandra.yaml the listen_address is set to "10.51.9.38", which is the IP of the server. The data migration, however, seems to try to connect to localhost (127.0.0.1). Since Cassandra is not bound to that interface, it does not listen on localhost and the script fails.
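One way to confirm which address a service is actually listening on is to probe the port from both addresses. The following is a minimal sketch and not part of RHQ; the class name PortProbe is made up for illustration, and the port must be supplied by the caller because the thread does not state which port the migrator uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unreachable host all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java PortProbe <port>");
            return;
        }
        int port = Integer.parseInt(args[0]);
        // The two addresses mentioned in this post: loopback and the server IP.
        for (String host : new String[] {"127.0.0.1", "10.51.9.38"}) {
            System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
        }
    }
}
```

If the service is bound only to the server IP, the loopback probe would be expected to fail while the second one succeeds.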

       

      It states that setting it to 0.0.0.0 is wrong. Why? It would probably work fine, and we have no security issue with binding to both interfaces.

       

      I searched in vain for any documentation of rhq-data-migration.sh, but after a long time found --help, so I could create a properties file and use it to complete the process.

       

      By the way, we have chosen to install the storage data under /opt/rhq/rhq-storage.

      What is the recommended location for this if the user is not running as root? IMO it is wrong of RHQ to just assume the user has write access to /var...

       

      Thanks for any info I can get on how this is meant to work. I find the new Cassandra functions quite confusing and hard to understand when things fail...

       

      Stian

        • 1. Re: 4.8 Error running data migration tool to Cassandra
          john.sanda

          Hi Stian,

           

          listen_address is used for inter-node communication, and 0.0.0.0 will not work with Cassandra's gossip (i.e., inter-node communication) protocol. See this FAQ[1] for more details.

           

          The host/127.0.0.1 is from the java.net.InetAddress.toString method. I agree that the format there can be a bit confusing. It does not mean that the data migrator is trying to connect to the address /127.0.0.1. The value before the slash is the hostname portion of the InetAddress object and the value after the slash is the IP address portion of the InetAddress object. When no hostname is set, the part before the slash is simply empty.
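The formatting described above can be reproduced with a few lines of standard Java. This is only an illustrative sketch (the class name InetAddressFormat is made up); it builds the addresses from raw bytes so that no DNS lookup is involved.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class InetAddressFormat {
    // Address built from raw bytes only: the hostname portion is empty.
    static String withoutHostname() throws UnknownHostException {
        return InetAddress.getByAddress(new byte[] {127, 0, 0, 1}).toString();
    }

    // Address built with an explicit hostname: it appears before the slash.
    static String withHostname() throws UnknownHostException {
        return InetAddress.getByAddress("localhost", new byte[] {127, 0, 0, 1}).toString();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(withoutHostname()); // prints "/127.0.0.1"
        System.out.println(withHostname());    // prints "localhost/127.0.0.1"
    }
}
```

So a leading slash in an error message just means the InetAddress was created without a hostname.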

           

          Storing the data under /opt/rhq/rhq-storage is perfectly fine. RHQ actually does not assume the user has write permission to /var. rhqctl checks that the user has write permission to the data directory, be it the default of /var/lib/rhq/storage or a non-default location such as /opt/rhq/rhq-storage. The install/upgrade will fail with a detailed error message if you do not have write permission to the data directory.
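If you want to verify the permissions yourself before running the installer, a check along these lines would do. It is a rough sketch and not the code rhqctl uses; the class name DataDirCheck is hypothetical, and it assumes the directory already exists, which the installer may not require.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DataDirCheck {
    // True only if the path is an existing directory the current user can write to.
    static boolean isWritableDir(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        // The default and the non-default location mentioned in this thread.
        String[] candidates = {"/var/lib/rhq/storage", "/opt/rhq/rhq-storage"};
        for (String candidate : candidates) {
            System.out.println(candidate + " writable: " + isWritableDir(Paths.get(candidate)));
        }
    }
}
```

Run it as the same user that will run rhqctl, since the result depends on that user's permissions.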

           

          There is a bug[2] with the rhqctl upgrade command that you may have hit. Were you able to use rhq-data-migrator.sh successfully?

           

          The new Cassandra backend is a major change, and we want to do everything we can to provide a smooth upgrade process. Your feedback is very helpful.

           

          [1] http://wiki.apache.org/cassandra/FAQ#cant_listen_on_ip_any

           

          [2] https://bugzilla.redhat.com/show_bug.cgi?id=976790

          • 2. Re: 4.8 Error running data migration tool to Cassandra
            nstefan

            Hello Stian,

             

            The data migrator was not looking in the rhq-server.properties file for additional configuration in RHQ 4.8.0. So in your case it was running with the default host for the Storage Node (which is 127.0.0.1). That is now fixed and will be part of the next RHQ release. The second link posted above by John has all the details. In your case, the best solution is to run the data migration independently post-install and configure it either via command line arguments or a separate configuration file with the arguments.

             

            Were you able to run the migration process post install? Do you have any feedback on the usage of the data migrator? Was the --help documentation good?

             

             



            • 3. Re: 4.8 Error running data migration tool to Cassandra
              pathduck

              Hey, thanks for the enlightening replies

               

              Just wondering, what exactly is it trying to connect to at 127.0.0.1? Cassandra, or a local PostgreSQL db since it doesn't know of Oracle (which we use)?

               

              It does not mean that the data migrator is trying to connect to the address /127.0.0.1

               

              Just to clarify it *does* try to connect to 127.0.0.1, without the leading slash, right?

               

              The data migrator was not looking in the rhq-server.properties file for additional configuration in RHQ 4.8.0. So in your case it was running with the default host for the Storage Node (which is 127.0.0.1).

               

              Should it not look in cassandra.yaml to find the bind address to connect to for the Cassandra node? At least to me it seemed that the error lay in it *not* being able to connect to 127.0.0.1, since the bind address of Cassandra was set to the actual IP of the server. Are our servers not configured correctly to return a proper value of java.net.InetAddress?

               

              Were you able to run the migration process post install? Do you have any feedback on the usage of the data migrator? Was the --help documentation good?

               

              Yes I was able to migrate properly - it took about 5 minutes for 5-600MB of data from our Oracle server.

               

              I had a read of the help output and figured the best solution was to use a properties file instead of putting all the info needed on a massive command line. I'll have another look at the --help output to see if there's anything I am not sure about there.

               

              It would be a great help if the upgrade process was able to just upgrade and find the correct values in rhq-server.properties as filling out the properties file is a hassle. And also having to document the process while doing it...

               

              cheers,

              Stian

              • 4. Re: 4.8 Error running data migration tool to Cassandra
                john.sanda

                The data migrator was trying to connect to 127.0.0.1 without the leading slash. I am assuming you initially tried running the migration with a command line something like,

                 

                $ rhqctl upgrade --run-data-migrator=true

                 

                in which case rhqctl will run the data migrator using default options. And because of https://bugzilla.redhat.com/show_bug.cgi?id=976790 the data migrator is trying to connect to 127.0.0.1. The data migrator is not set up to parse and read cassandra.yaml. Everything it needs is in rhq-server.properties, including the connection info for Cassandra. When rhqctl runs, it updates rhq-server.properties with the connection info for Cassandra, because the RHQ Server needs that connection info as well. The fix for BZ 976790 will go into the next RHQ release, which means you won't have to fill out the properties file. Of course, I suppose you won't have to worry about it since you have already gone through the upgrade. We are still in the process of getting docs updated.

                 

                Am I correct in assuming you did the data migration while your RHQ server was up and running? The data migration can be run with the RHQ server offline. For larger loads (i.e., number of agents, number of metrics schedules, frequency of collections), taking your RHQ server offline could potentially speed up the migration considerably.

                 

                Thanks again for the feedback and glad to hear you are up and running.

                 

                John

                • 5. Re: 4.8 Error running data migration tool to Cassandra
                  pathduck

                  Thanks for the clarification John.

                   

                  I now see there is an rhq.cassandra.seeds property there, so if it is parsed it should be able to find Cassandra.
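For anyone who wants to check what that property contains, here is a small sketch using java.util.Properties. It is not the migrator's own code; the class name SeedsLookup is made up, and the sample value is a placeholder, since the real value in rhq-server.properties may use a different format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class SeedsLookup {
    // Loads properties from the reader and returns rhq.cassandra.seeds, or null if absent.
    static String readSeeds(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props.getProperty("rhq.cassandra.seeds");
    }

    public static void main(String[] args) throws IOException {
        // Placeholder content; the actual value format in RHQ may differ.
        String sample = "rhq.cassandra.seeds=10.51.9.38\n";
        System.out.println(readSeeds(new StringReader(sample)));
    }
}
```

To read the real file, pass a java.io.FileReader pointing at your rhq-server.properties instead of the StringReader.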

                   

                  John Sanda wrote:

                   


                  Am I correct in assuming you did the data migration while your RHQ server was up and running? The data migration can be run with the RHQ server offline. For larger loads (i.e., number of agents, number of metrics schedules, frequency of collections), taking your RHQ server offline could potentially speed up the migration considerably.

                   

                   

                  Yes, I actually found a thread discussing whether the server should be stopped or not. I didn't really think about it when doing it, though.

                   

                  I might try having the server off when upgrading another installation with the same amount of data. But I guess Cassandra still needs to run, correct? So one will need to shut down just the server.

                   

                  thanks,

                  Stian

                  • 6. Re: 4.8 Error running data migration tool to Cassandra
                    john.sanda

                    Right, Cassandra still has to be running. Both server and agent can be shut down though. After finishing the installation you could do,

                     

                    $ rhqctl stop --server --agent

                     

                    and then run the data migrator.