18 Replies. Latest reply on May 28, 2014 2:28 AM by pilhuhn

    Api functionality

    pilhuhn

      We need to find out more use cases to be able to better shape the data model. The tags discussion is one aspect. Another is whether we want a way to find out what we store, whether we want hyperlinking in the REST API, or whether we leave this completely to clients.

       

      To quote a comment in the source

       

              // The idExists call [...]. We

              // need to decide whether or not this check is really necessary. There isn't really

              // an efficient way to do this in Cassandra unless we either query all keys or

              // introduce new schema to support this method.

       

      Basically we are saying that a user who does not know whether a metric with an id of "lulu" exists will never be able to find out:

       

      snert:~ hrupp$ curl -i http://localhost:8080/rhq-metrics/metrics/lulu

      HTTP/1.1 200 OK

       

      I think we should have an index over existing ids that can be searched / browsed, and an api call like the curl example above should correctly return a 404 Not Found if such an id does not exist.

      So in this case we may need a different table in Cassandra to record those, perhaps alongside the tags from the other discussion.

       

      Further:

      1. Do we want to allow the user to delete metrics? Individual items? Full history?
      2. Do we want to add api methods that bucketize metric results (e.g. if we display 60 bars, we need 60 buckets)?
      3. How and where do we want to store metadata (units, monotonically increasing vs. dynamic)?

       

      For 1 I think yes and yes

      For 2 I think yes, as this reduces the need for clients to implement it themselves. It also makes it much easier to compare two metrics with different ids at a certain point in time: this becomes a comparison of the values of one bucket, which matters because metric values are very often not taken at exactly the same instant.
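A minimal sketch of the bucketing idea (the helper name and signature are mine, not the rhq-metrics API): each timestamp in the query range is mapped to one of a fixed number of equal-width buckets, so two metrics sampled at slightly different instants still land in comparable buckets.

```java
public class BucketSketch {

    // Map a timestamp into one of `buckets` equal-width intervals
    // covering [startMillis, endMillis). Hypothetical helper, not the
    // actual rhq-metrics API.
    static int bucketIndex(long startMillis, long endMillis, int buckets, long tMillis) {
        long width = (endMillis - startMillis) / buckets;
        int idx = (int) ((tMillis - startMillis) / width);
        return Math.min(idx, buckets - 1); // clamp the range end into the last bucket
    }

    public static void main(String[] args) {
        // One hour of data in 60 buckets: one bucket per minute.
        // A sample taken at 90s and one taken at 91s fall into the same
        // bucket, so two metrics can be compared bucket by bucket.
        System.out.println(bucketIndex(0, 3_600_000, 60, 90_000));
        System.out.println(bucketIndex(0, 3_600_000, 60, 91_000));
    }
}
```

A real endpoint would additionally aggregate (min/max/avg) the values that fall into each bucket; the index computation above is just the part that makes near-simultaneous samples comparable.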

      For 3 I think it may make sense to have a table in C* to encode this -- perhaps along with the table of ids discussed above, so that the metadata does not need to be stored in the tags on each tuple (I know, I am thinking relationally here).

        • 1. Re: Api functionality
          heiko.braun

          Something related to the current API:

           

          There is currently a mixture of async and sync API invocation methods [1]. IMO these should all be async.

           

           I was going to ask if it's a good idea to expose the guava ListenableFuture at this level. Plain JDK Futures would certainly do the job as well and simplify things a lot. Unless somebody has a compelling reason to use listenable Futures, I would suggest replacing them.

           

          [1] rhq-metrics/core-api/src/main/java/org/rhq/metrics/core/MetricsService.java at master · rhq-project/rhq-metrics · GitHub

          • 2. Re: Api functionality
            heiko.braun

             I would propose to remove the Futures from the add*() operations completely. I don't really see a benefit (from a client point of view) that justifies the additional resource allocations this requires.

            • 3. Re: Api functionality
              john.sanda

              Heiko Braun wrote:

               

              Something related to the current API:

               

              There is currently a mixture of async and sync API invocation methods [1]. IMO these should all be async.

               

               Furthermore I was going to ask if it's a good idea to expose the guava ListenableFuture at this level. IMO plain JDK Futures would certainly do and simplify things a lot. Unless somebody has a compelling reason to use listenable Futures, I would suggest replacing them.

               

              [1] rhq-metrics/core-api/src/main/java/org/rhq/metrics/core/MetricsService.java at master · rhq-project/rhq-metrics · GitHub

              The methods for reading/writing metrics are async. I have not taken the time yet to make the other methods async because I am not convinced that we need them.

               

              The DataStax driver already uses ListenableFuture which is the primary reason it is exposed. And Guava's Futures API provides a framework for callbacks that we do not have with plain Java Futures.
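To make the trade-off concrete (this sketch is mine, not code from the thread): a plain java.util.concurrent.Future only supports blocking get(), while a callback-capable future lets the caller react when the result arrives. To stay self-contained without Guava on the classpath, the sketch uses the JDK's CompletableFuture to illustrate the same callback style; addData and its behavior are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class CallbackSketch {

    // Hypothetical stand-in for an async write to the metrics store.
    // A real implementation would issue a Cassandra insert and complete
    // the future when the driver reports success or failure.
    static CompletableFuture<Void> addData(String id, double value) {
        return CompletableFuture.completedFuture(null);
    }

    // Attach a callback instead of blocking on get(): the caller is
    // notified of success or failure without tying up a thread.
    static String tryInsert(String id, double value) {
        AtomicReference<String> outcome = new AtomicReference<>();
        addData(id, value).whenComplete((result, error) ->
            outcome.set(error == null ? "insert succeeded"
                                      : "insert failed: " + error.getMessage()));
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(tryInsert("lulu", 42.0));
    }
}
```

With Guava, the equivalent attachment point is Futures.addCallback on the ListenableFuture returned by the driver; a plain JDK Future offers no such hook, which is the asymmetry being discussed here.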

              • 4. Re: Api functionality
                john.sanda

                Heiko Braun wrote:

                 

                I would propose to remove the Futures from the add*() operations completely. I don't really see the benefit (from a client point of view) for the additional resource allocations this requires.

                That would effectively make the add methods fire-n-forget. How would the client know whether or not the inserts succeed?

                • 5. Re: Api functionality
                  heiko.braun
                  How would the client know whether or not the inserts succeed?

                   If needed at all, this could be achieved through the query APIs. But I am still not convinced clients would actually do that. But maybe I am missing something here?

                  • 6. Re: Api functionality
                    heiko.braun
                    The DataStax driver already uses ListenableFuture which is the primary reason it is exposed. And Guava's Futures API provides a framework for callbacks that we do not have with plain Java Futures.

                     Yes, that's what I thought. Can you elaborate on the need for the listeners? I guess there are very few real clients that directly operate on that API; most of them would probably rely on a specific transport/protocol like HTTP/REST. Are there any specific use cases you have in mind for keeping the listener semantics?

                    • 7. Re: Api functionality
                      john.sanda

                      Heiko Braun wrote:

                       

                      How would the client know whether or not the inserts succeed?

                       If needed at all, this could be achieved through the query APIs. But I am still not convinced clients would actually do that. But maybe I am missing something here?

                      If you are concerned about the resources involved to allocate the listeners, then you ought to be a lot more concerned about performing reads after writes to verify whether or not the writes succeeded. I think that the listener overhead is very minimal. Of course, we can let testing bear this out. While some clients might be perfectly fine with fire and forget semantics, RHQ is not.

                      • 8. Re: Re: Api functionality
                        john.sanda

                        Heiko Braun wrote:

                         

                        The DataStax driver already uses ListenableFuture which is the primary reason it is exposed. And Guava's Futures API provides a framework for callbacks that we do not have with plain Java Futures.

                         Yes, that's what I thought. Can you elaborate on the need for the listeners? I guess there are very few real clients that directly operate on that API; most of them would probably rely on a specific transport/protocol like HTTP/REST. Are there any specific use cases you have in mind for keeping the listener semantics?

                        The listeners are used extensively in the RHQ aggregation code that will likely be ported over at some point in the near future. They are also used in the implementations of the current REST endpoints, e.g., AsyncEndPoint.java. I don't see any point blocking a server thread to wait for a Cassandra request to complete since the DataStax driver notifies us when the results are available.

                        • 9. Re: Api functionality
                          heiko.braun
                          you ought to be a lot more concerned about performing reads after writes

                           Yes, that's true. I had assumed these would be sparse operations.

                           

                          While some clients might be perfectly fine with fire and forget semantics, RHQ is not.

                           If the metrics service accepts the data (ACK) and the service itself provides some kind of resilience, why does the client need to be informed about the success or failure of the operations that actually persist the data? Can you elaborate on the RHQ use cases?

                          • 10. Re: Api functionality
                            heiko.braun
                            I don't see any point blocking a server thread to wait for a Cassandra request to complete since the DataStax driver notifies us when the results are available.

                             Yes, the way you describe it makes complete sense. But I did not suggest blocking calls. I was just wondering about the actual semantics and rationale behind that particular API.

                            • 11. Re: Re: Api functionality
                              john.sanda

                              Sure, I can elaborate on the RHQ case. It was discussed on the rhq-devel mailing list a little while back. Here is my original post on the subject,

                               

                               Currently there exists the possibility of numeric data loss when merging measurement reports. If there is an error storing raw data, we log the error but do nothing else. Suppose for example that while the server is storing a set of raw data, the storage cluster goes down halfway through. In this scenario it is likely that the latter half of that data is lost. There has been some recent discussion about the potential for data loss, and I want to open it up to the list for additional thoughts, opinions, etc. I will briefly summarize a few options for dealing with data loss.

                               

                               

                              option 1 - do nothing

                               The case can be made that loss of metric data may not be as significant as losing inventory or configuration data, for example. If the data loss is limited to a single measurement report or a subset thereof, then it probably is not very significant, since we are dealing with the loss of a single data point for some number of schedules. Of course, some dropped metrics here and some dropped metrics there can quickly add up to a substantial amount of data loss, and this would be bad.

                               

                               

                              option 2 - Rely on agent/server comm layer guaranteed delivery

                               MeasurementServerService.mergeMeasurementReport(MeasurementReport report) has guaranteed delivery semantics. If the call fails for whatever reason, the agent will retry it. The agent also spools the report to disk so that if it gets disconnected from the server, it can retry after reconnecting. The downside of the guaranteed delivery is that the agent continually retries. If storing raw data failed because the storage cluster is overloaded, this could exacerbate the problem. I have actually experienced this in test environments where I was putting a heavy write load on the server and storage cluster. My server would be down or in maintenance mode for a while, and then when it came back up, all my agents hammered it with spooled measurement reports.

                               

                               

                               There is another aspect to consider in terms of efficiency. Suppose an agent sends 10,000 raw data points to the server. An error occurs after storing 9,995 of them. The agent will resend, and the server will store all 10,000 again. This is less than optimal and brings me to option 3.

                               

                               

                              option 3 - Do not overwhelm the server and only retry failed data

                              The server can report back to the agent the raw data that it failed to store. The agent can spool that data to disk, and resend it at some point in the future. There could be some different approaches. The agent could retry on some fixed interval, or maybe it uses some initial delay with an increasing back off, e.g., 2 minutes, 4 minutes, 8 minutes, etc. This option requires the most work, but I think that it is the most robust.

                               

                              Later on in the rhq-devel mailing list thread someone suggested handling it completely on the server side by spooling the data to disk and retrying it at some point in the future. Maybe that is what we need to do, but as of right now I prefer to let the client handle it for a couple reasons. First, even if we let the server handle the failures, I think we still want to provide some error reporting to the client for logging/debugging if for nothing else. Secondly, I am concerned about the additional burden that this could put on the server. If we are dealing with a small number of data points, then it is not a big concern. But suppose we have a large, rapidly growing amount of data on disk that will have to be retried. We will want an efficient solution for processing the data on disk, maybe a queue of some sort. Shouldn't that queue be distributed though in the event we are running multiple servers? I just think it is easier and potentially more scalable to let the client retry if it wants.

                               

                              Even if the server handles failures, I still think it would be nice to provide error reporting back to the client so that it can act accordingly even if that means nothing more than logging the errors for debugging.

                              • 12. Re: Api functionality
                                mithomps

                                Another one is if we want to have a way to find out what we store, have hyperlinking in the rest-api or if we completely leave this to clients.

                                 I believe hyperlinking in the REST API (HATEOAS) is the way to go, as it can save some client round-trips and saves having to implement some of these states in the client. Anything we can do to help the client, we should provide.

                                 

                                  // The idExists call [...]. We

                                   // need to decide whether or not this check is really necessary. There isn't really

                                 I don't see this one as a must-have. [If we did have it, I assume this would be a REST HEAD call on a resource.] Instead, I would like to see an API to search just the ids so we can provide dynamic type-as-you-go selection of valid ids in a UI.

                                 

                                Do we want to allow the user to delete metrics? Individual items? Full history?

                                Low priority, probably out of scope for now (especially if it can be done via CQL).

                                 

                                Do we want to add api methods that bucketize metric results (e.g. if we display 60 bars, we need 60 buckets)?

                                 +1. This makes the client UI much simpler and eliminates the possibility of overloading the client with too much data. I think both continuous data and bucketized data are required to give clients the flexibility to create both types of charts/tables. Certain graph types support the continuous data model well, and other graph types support the discrete interval (bucketized) model. I haven't seen many metrics engines offer both methods, so this can also be a strong point for rhq-metrics.

                                 

                                How where do we want to store metadata (units, monotonically increasing/dynamic)?

                                 Metadata can provide the glue for the additional data and gives a graph more context for decision making. It's not first priority but eventually needs to be there. There are many details here that would still need to be hashed out.

                                 

                                Just my opinions.

                                • 13. Re: Re: Api functionality
                                  john.sanda

                                  I am -1 on the idExists method if the only use case is to determine if an id exists. We can wind up with ids stored in Cassandra that have no corresponding data because it has all aged out. What purpose does having an id with no data serve? We could easily build a method on top of the core API to check whether or not there is any metric data for a given id.

                                   

                                  What are the use cases for manual delete operations?

                                   

                                  I am +1 on providing bucketing, i.e., real time aggregation of query results, but not as the only way of returning query results as is the case in RHQ.

                                   

                                   I think we can add support for metadata pretty easily now that there is support for static columns in CQL. We could do,

                                   

                                  CREATE TABLE metrics (
                                     bucket text,
                                     metric_id text,
                                     time timestamp,
                                     value map<int, double>,
                                     meta_data map<text, text> static,
                                     PRIMARY KEY ((bucket, metric_id), time)
                                  )
                                  

                                   

                                   A static column is only stored once for the entire partition, which is exactly what we need. And using a map allows us to store arbitrary key/value pairs.
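To illustrate how the static column behaves (the values are made up, assuming the schema above): metadata is written once per (bucket, metric_id) partition and is visible on every row of that partition, while time/value rows are written per data point.

```sql
-- Written once per partition; later writes overwrite it for all rows.
INSERT INTO metrics (bucket, metric_id, meta_data)
VALUES ('raw', 'lulu', {'units': 'ms', 'type': 'dynamic'});

-- Written once per data point.
INSERT INTO metrics (bucket, metric_id, time, value)
VALUES ('raw', 'lulu', '2014-05-28 00:00:00+0000', {0: 42.0});
```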

                                  • 14. Re: Api functionality
                                    mithomps

                                    What about adding simple counters (these are very common in websites)?

                                    Example: Api.incrementCounter('home.page.counter');

                                    Cassandra even has a special counter type.
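For reference, a sketch of what that could look like with the counter type (table and column names are made up; note that besides the primary key, a counter table may only contain counter columns):

```sql
CREATE TABLE page_counters (
   name text PRIMARY KEY,
   hits counter
);

-- Counters are never inserted, only updated:
UPDATE page_counters SET hits = hits + 1 WHERE name = 'home.page.counter';
```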
