Version 1

    One of the objectives of the BAM infrastructure is to analyse the activity information as it is generated in realtime, and to use it to present a service overview. The service overview will show the dependencies between the services (and their operations), along with the metrics associated with the invocation patterns between those services.

     

    The problem is how to manage the service overview information in a realtime manner, without having to reconstruct the representation each time a client application requests the overview.

     

    The best way to achieve this is to incrementally update the service overview with the relevant information from each activity unit as it is processed. Although this is the most efficient way to construct the service overview, it has a disadvantage: the service overview will represent a cumulative summary of the interactions/metrics from all captured activity units.
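
    As a minimal sketch of the incremental update, the example below folds each activity unit into a running overview rather than rebuilding it. The shape of the activity unit, the ServiceOverview and Metrics classes and the field names are all assumptions made for illustration, not part of the BAM infrastructure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Metrics:
    count: int = 0          # number of invocations seen so far
    total_ms: float = 0.0   # summed response time in milliseconds


class ServiceOverview:
    """Cumulative metrics per (caller, callee, operation) link (hypothetical model)."""

    def __init__(self):
        self.links = defaultdict(Metrics)

    def apply(self, activity_unit):
        # Fold one activity unit into the overview without rebuilding it.
        for inv in activity_unit["invocations"]:
            key = (inv["caller"], inv["callee"], inv["operation"])
            metrics = self.links[key]
            metrics.count += 1
            metrics.total_ms += inv["duration_ms"]


overview = ServiceOverview()
overview.apply({"invocations": [
    {"caller": "OrderService", "callee": "CreditCheck",
     "operation": "check", "duration_ms": 12.0},
    {"caller": "OrderService", "callee": "CreditCheck",
     "operation": "check", "duration_ms": 8.0},
]})
```

    Each call to apply() only touches the links named in that activity unit, so the cost of an update does not depend on how much history has already been accumulated.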

     

    This cumulative information can be interesting, but it is less useful to a business than a short-term 'rolling' view of service performance, which enables the business to react more quickly to situations as they occur.

     

    Service Overview Snapshots

     

    A potential solution for providing this 'rolling' summary would be the use of snapshots. There are two ways in which snapshots could be used to provide it:

     

    1) The server could record the set of snapshots and, when a client application requests a summary, construct it from the latest set of snapshots. The number of snapshots used would be dictated by configuration, to represent the time period of interest.

     

    2) Build a single 'summary' view, but periodically deduct from it the information in a snapshot that has aged out of the time period of interest, to ensure that the summary only represents the most recent information.

     

     

    Although the second approach seems the less intuitive of the two, it has a number of benefits:

     

    a) A single 'active collection' can be maintained with the summary information, enabling it to be directly accessed by clients via that collection. This is useful, as the information it contains can also be used for other purposes (e.g. to identify all of the service definitions and their operations).

     

    b) Approach (1) will only represent the services/operations used over that time period, whereas approach (2) will include all services/operations, even if they have not been used recently.

     

     

    So the current preference is to use approach (2).
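
    Approach (2) could be sketched roughly as follows. This is only an illustration under stated assumptions: the RollingSummary class, its method names and the use of plain invocation counts are hypothetical, and for brevity each interval's delta is tracked directly rather than derived by comparing copies of the overview.

```python
from collections import Counter, deque


class RollingSummary:
    """Single summary view that deducts snapshots once they leave the window."""

    def __init__(self, max_snapshots):
        self.max_snapshots = max_snapshots  # configured length of the window
        self.summary = Counter()            # what clients read
        self.current = Counter()            # activity since the last snapshot
        self.window = deque()               # snapshots still inside the window

    def record(self, link, count=1):
        self.summary[link] += count
        self.current[link] += count

    def roll(self):
        """Close the current interval and deduct any snapshot that has expired."""
        self.window.append(self.current)
        self.current = Counter()
        if len(self.window) > self.max_snapshots:
            expired = self.window.popleft()
            for link, count in expired.items():
                # Item assignment keeps the key even when the count reaches 0,
                # so services that have gone quiet remain visible.
                self.summary[link] -= count


summary = RollingSummary(max_snapshots=2)
summary.record("OrderService->CreditCheck")
summary.roll()
summary.record("OrderService->Logistics")
summary.roll()
summary.roll()  # the first snapshot now falls outside the window
```

    After the third roll() the first link is still present in the summary with a count of zero, which matches benefit (b): the summary continues to list services/operations that have not been used recently.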

     

     

    Creating the snapshot

     

    At certain (configurable) time intervals, a copy will be taken of the current accumulated service overview information. This will be compared to the previous copy taken, and used to derive a 'delta' of the information, which will then be recorded as the snapshot of the service overview information associated with that time period.
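
    The delta derivation described above might look something like the sketch below. It assumes the overview can be reduced to additive metrics per link (here a hypothetical count and total_ms); the function name and data layout are illustrative only.

```python
def derive_delta(previous, current):
    """Return the per-link metric increases between two cumulative copies."""
    delta = {}
    for link, metrics in current.items():
        before = previous.get(link, {})
        changed = {name: value - before.get(name, 0)
                   for name, value in metrics.items()}
        # Only links with some activity in the interval go into the snapshot.
        if any(changed.values()):
            delta[link] = changed
    return delta


previous_copy = {("OrderService", "buy"): {"count": 10, "total_ms": 250}}
current_copy = {
    ("OrderService", "buy"): {"count": 14, "total_ms": 330},
    ("CreditCheck", "check"): {"count": 3, "total_ms": 45},
}
snapshot = derive_delta(previous_copy, current_copy)
```

    Because the snapshot only holds differences, links that saw no activity during the interval take up no space in it.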

     

     

    Historic Analysis of Service Overview Information

     

    There are two approaches that could be used to perform historic analysis of service overview information:

     

    1) For the period of interest, query all of the activity units and derive the service overview information.

     

    2) Store the snapshots used for managing the 'rolling window' and retrieve these to build the service overview information over the required time period.

     

     

    Although option (1) is feasible, it may be less efficient than (2). We may need to do some performance testing to understand whether the performance difference is significant and/or acceptable. If option (1) is acceptable, it would be simpler, as it requires less intermediate/derived information to be stored.

     

    However, in case option (1) is a performance problem, the following subsections describe how the snapshots could be used.

     

    Storing snapshots

     

    Although the snapshots were initially created as a means of implementing the 'rolling window' on the service overview information, in support of providing a realtime view on the activity of a service oriented system, we must also consider how historic queries can be performed on this information.

     

    To achieve this we need to store the snapshot information for later processing. One of the complexities is that the service overview information is derived locally within each server that is presenting the information to clients, so in a clustered environment multiple servers will be locally deriving this information. Storing the information therefore means contending with multiple servers recording snapshots. As some servers within a cluster may be down at times, they may not all record a complete set of snapshots over a particular time period. It may therefore be necessary to consolidate snapshots recorded by different servers, while using information from the fewest possible servers, to ensure the information is as consistent as possible.

     

    The proposed solution would be to have a snapshot server/service that could record arbitrary information against date/time, server and information type fields. In this case the information type would be service overview.
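
    To make the proposal more concrete, here is a small in-memory sketch of such a snapshot service. The SnapshotRecord and SnapshotStore names, the "service-overview" type string and the query signature are assumptions for illustration; a real service would need persistent storage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any


@dataclass
class SnapshotRecord:
    timestamp: datetime   # date/time the snapshot relates to
    server: str           # server that derived the snapshot
    info_type: str        # e.g. "service-overview"
    payload: Any          # arbitrary information, opaque to the store


class SnapshotStore:
    """In-memory stand-in for the proposed snapshot server/service."""

    def __init__(self):
        self._records = []

    def record(self, timestamp, server, info_type, payload):
        self._records.append(SnapshotRecord(timestamp, server, info_type, payload))

    def query(self, info_type, start, end):
        # Snapshots of one type with start <= timestamp < end, oldest first.
        matches = [r for r in self._records
                   if r.info_type == info_type and start <= r.timestamp < end]
        return sorted(matches, key=lambda r: (r.timestamp, r.server))


store = SnapshotStore()
t0 = datetime(2012, 1, 1, 12, 0)
store.record(t0 + timedelta(minutes=5), "server-b", "service-overview", {"count": 2})
store.record(t0, "server-a", "service-overview", {"count": 4})
store.record(t0, "server-a", "other-type", {})
results = store.query("service-overview", t0, t0 + timedelta(minutes=10))
```

    Keeping the payload opaque lets the same service hold other information types later, with service overview being just one value of the information type field.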

     

     

    Querying snapshots

     

    When snapshot information of a certain type, over a certain time range, is requested, the aim should be to obtain all of the information from the same server. So the first step is to determine which server has the most information, and then check if there are any gaps that require filling.

     

    For each gap, repeat the process: identify the server that has the most information within the time period associated with the gap, and continue until no gaps remain or no server can fill them.
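
    The selection process could be sketched as below, assuming the requested time range is divided into numbered snapshot intervals ("slots") and that we know which slots each server recorded. The function names and the tie-break by server name are assumptions, not part of the proposal.

```python
def contiguous_gaps(slots):
    """Split integer slots into sorted runs of consecutive values."""
    runs, run = [], []
    for slot in sorted(slots):
        if run and slot != run[-1] + 1:
            runs.append(run)
            run = []
        run.append(slot)
    if run:
        runs.append(run)
    return runs


def choose_sources(needed, available):
    """Map each needed slot to the server that should supply it.

    needed: iterable of slot numbers; available: dict of server -> set of slots.
    """
    needed = set(needed)
    if not needed:
        return {}
    # Prefer the server with the most snapshots in this period (ties: by name).
    best = max(sorted(available),
               key=lambda server: len(available[server] & needed),
               default=None)
    if best is None or not (available[best] & needed):
        return {}  # no server recorded these slots, so the gap stays open
    covered = available[best] & needed
    plan = {slot: best for slot in covered}
    # Repeat the process for each remaining gap.
    for gap in contiguous_gaps(needed - covered):
        plan.update(choose_sources(gap, available))
    return plan


available = {
    "server-a": {0, 1, 2, 5},
    "server-b": {1, 2, 3},
    "server-c": {3, 4},
}
plan = choose_sources(range(6), available)
```

    In this example server-a supplies four of the six slots, and the single gap covering slots 3 and 4 is filled entirely from server-c, since it holds more of that gap than server-b does.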

     

    Once the snapshots have been retrieved, then they should be consolidated into a single service overview.
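
    A possible consolidation step is sketched below. It assumes each snapshot is a delta holding only additive metrics per link (hypothetical count and total_ms fields), so that snapshots can simply be summed; derived values such as averages would be computed from the consolidated totals afterwards.

```python
from collections import defaultdict


def consolidate(snapshots):
    """Sum a sequence of snapshot deltas into one service overview."""
    overview = defaultdict(lambda: defaultdict(int))
    for snapshot in snapshots:
        for link, metrics in snapshot.items():
            for name, value in metrics.items():
                overview[link][name] += value
    # Convert back to plain dicts for the caller.
    return {link: dict(metrics) for link, metrics in overview.items()}


retrieved = [
    {("OrderService", "buy"): {"count": 4, "total_ms": 80}},
    {("OrderService", "buy"): {"count": 1, "total_ms": 30},
     ("CreditCheck", "check"): {"count": 3, "total_ms": 45}},
]
historic_overview = consolidate(retrieved)
```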

     

     

    Service Overview Change

     

    Previously we discussed deriving a delta between two sets of service overview information, to create the snapshot. We also need to be able to compare two service overviews, to determine how the metrics between the two have changed.
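
    One way such a comparison might work is sketched below. The choice of metrics (invocation count and average response time), the field names and the compare_overviews function are assumptions for illustration; the actual change representation is not defined here.

```python
def average_ms(metrics):
    """Mean response time, or 0.0 when nothing was invoked."""
    return metrics["total_ms"] / metrics["count"] if metrics["count"] else 0.0


def compare_overviews(before, after):
    """Report, per link, how the count and average response time moved."""
    empty = {"count": 0, "total_ms": 0.0}
    changes = {}
    for link in sorted(before.keys() | after.keys()):
        old = before.get(link, empty)   # a link missing on one side counts as zero
        new = after.get(link, empty)
        changes[link] = {
            "count_change": new["count"] - old["count"],
            "avg_ms_change": average_ms(new) - average_ms(old),
        }
    return changes


before = {("OrderService", "buy"): {"count": 10, "total_ms": 200.0}}
after = {
    ("OrderService", "buy"): {"count": 20, "total_ms": 500.0},
    ("CreditCheck", "check"): {"count": 5, "total_ms": 50.0},
}
changes = compare_overviews(before, after)
```

    Unlike the snapshot delta, which records raw increases for later summation, this comparison is aimed at presentation: it shows whether a link has become busier or slower between the two overviews.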