4 Replies Latest reply on Apr 20, 2012 6:02 AM by objectiser

    Activity Model - schema


      This discussion concerns whether we should adopt an existing activity event schema, or define a bespoke schema that can evolve with our requirements. The following list is not exhaustive, although there do not appear to be many candidates in this area. Please let us know if there are any other schemas that may be of interest.


      (a) TPTP (cbe.pdf) - "Test & Performance Tool Platform" at Eclipse: http://www.eclipse.org/tptp/

      (This candidate was highlighted by Rob Cernich in a previous post).


      Although this project is now archived and no further work will be done on it, the event schema captures work from a number of companies on how to structure this type of information. The implementation provided by the project was EMF based, which I personally would not want to use; however, the XSD schema could be used to generate a lighter-weight version.


      The key point about this schema is that it is generic - there are no derived element types representing the specific events (or situations) that may occur. A particular event type would be defined in an 'extension name' field, and the associated details defined in an 'extended metadata' hierarchy.


      This has pros and cons - it means that new types of event could easily be supported, however it essentially bypasses the benefits of the schema, as the information provided for a particular 'event type' may be wrong or missing without the schema detecting it. Schema validation would therefore be of little use for the event-specific details. From a serialization perspective, the information would also be more verbose, as each additional field needs to provide its name, type and value.


      Looking at the schema from a simplistic perspective, it basically has three elements of interest, i) component identification, to determine what is reporting the activity, ii) context, to identify any correlation info required to relate it to other events, and iii) situation (i.e. the activity), which has a type and set of supporting name/value pair type data.
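
To make the trade-off concrete, here is a minimal Java sketch of a generic event with the three parts described above. It is not the TPTP/CBE API: every class, field and value name (GenericEvent, ExtendedData, "RequestReceived", and so on) is hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Optional;

/** Sketch of a generic, CBE-style event. All names here are hypothetical. */
final class GenericEventSketch {

    /** One extended field: a generic schema must carry name, type and value for each. */
    record ExtendedData(String name, String type, String value) {}

    /** i) component identification, ii) correlation context, iii) the situation. */
    record GenericEvent(String componentId,
                        List<ExtendedData> context,
                        String extensionName,        // stands in for a derived event type
                        List<ExtendedData> details) {

        /** Lookup by name: nothing in the schema guarantees that the field exists. */
        Optional<String> detail(String name) {
            return details.stream()
                    .filter(d -> d.name().equals(name))
                    .map(ExtendedData::value)
                    .findFirst();
        }
    }

    public static void main(String[] args) {
        GenericEvent event = new GenericEvent(
                "order-service",
                List.of(new ExtendedData("orderId", "string", "A-100")),
                "RequestReceived",
                List.of(new ExtendedData("operation", "string", "submitOrder")));

        System.out.println(event.detail("operation").orElse("<missing>"));
        // A misspelt or omitted field is only discovered at run time.
        System.out.println(event.detail("messageType").isPresent());
    }
}
```

The lookup returning an Optional is the point of the sketch: with a generic schema, whether a given event type carries the expected fields has to be checked by the consumer rather than by schema validation.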



      (b) CIM "Common Information Model" - (referenced by the TPTP project as well as Heiko's original SAMM project wiki page): http://dmtf.org/standards/cim


      This is more management focused, so it encompasses descriptions of a managed environment and its applications, as well as events and metrics. It is therefore more concerned with interoperability of information between management systems.




      My current preference is to simply define our own schema. I don't believe there is any significant benefit of strictly adopting either of the 'standard' schemas identified above, although it would be wise to follow some useful patterns followed by these schemas.



        • 1. Re: Activity Model - schema

          My preference would also be to start by defining our own schema, which would tend to be a subset of TPTP at the start. I agree that we may borrow some elements from TPTP; I just don't want to accommodate all of it, especially at the start.




          • 2. Re: Activity Model - schema

            My current preference is to simply define our own schema. I don't believe there is any significant benefit of strictly adopting either of the 'standard' schemas identified above, although it would be wise to follow some useful patterns followed by these schemas.





            The only benefit of using CBE would have been the ability to leverage the existing TPTP infrastructure and tools. However, since that project is no longer active, I think leveraging TPTP may be more of a hindrance than a help.


            That said, what do other projects/products use today?  Log scrapers?

            • 3. Re: Activity Model - schema

              Rob Cernich wrote:


              That said, what do other projects/products use today?  Log scrapers?


              Good question - not sure. I think we eventually need to support log scrapers, but you can't necessarily guarantee that all the required information will be available in a log. For example, to be able to correlate activity between two services based on an invocation or message exchange, we need to have context information that may only be available in the header or contents of a message. Generally, due to the potential size of the message payload, this level of detail is not recorded. It may be recorded if the lowest level of diagnostics is enabled, but this does not usually occur in a production system for performance reasons.

              • 4. Re: Activity Model - schema

                Ok, assuming we are going to define our own schema, at the top level I think activity events should be recorded in groups related to a 'unit of work' - the CIM model uses the concept of unit of work, but in relation to the scope of service metrics.


                I was thinking that if we are directly intercepting events, and have an understanding of the transaction scope, then all events that occur within a particular XA transaction scope should be recorded as a group. This means that the individual events don't need to be concerned with their context, or their relationship to other events, as this will be taken care of by their inclusion in the group. Recording by transaction scope also makes sense in a JEE context, as the transaction may get rolled back, in which case the activities essentially did not occur.
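
A rough Java sketch of this idea follows. It assumes commit and rollback are delivered as plain method calls; a real implementation would presumably hook into the transaction manager instead. The Recorder class and its methods are hypothetical names, and events are reduced to strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: buffer events per transaction, publish on commit, drop on rollback. */
final class TransactionScopedRecorderSketch {

    /** Hypothetical recorder bound to a single transaction scope. */
    static final class Recorder {
        private final List<String> pending = new ArrayList<>();
        private final Consumer<List<String>> sink;

        Recorder(Consumer<List<String>> sink) {
            this.sink = sink;
        }

        /** Called for each intercepted activity inside the transaction. */
        void record(String event) {
            pending.add(event);
        }

        /** On commit the buffered events are published as one group. */
        void commit() {
            if (!pending.isEmpty()) {
                sink.accept(List.copyOf(pending));
            }
            pending.clear();
        }

        /** On rollback the activities effectively did not occur, so drop them. */
        void rollback() {
            pending.clear();
        }
    }

    public static void main(String[] args) {
        List<List<String>> published = new ArrayList<>();
        Recorder recorder = new Recorder(published::add);

        recorder.record("RequestReceived");
        recorder.record("ResponseSent");
        recorder.commit();                  // one group of two events is published

        recorder.record("RequestReceived");
        recorder.rollback();                // nothing is published for this scope

        System.out.println(published);
    }
}
```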


                So at the top level, there would be:


                • a header of some sort containing the system wide information (e.g. server details (port), host, etc)
                  • this is simpler than the TPTP component id concept, as it does not try to cater for all the different identifiers that may be relevant to different events - this information should be specific to the event type being reported
                  • if recording a group of activity events, each of those events could occur at a different time, possibly within milliseconds of each other - however some transactions may be open for minutes, resulting in some activities being quite spread out. So the timestamp should be recorded on the individual events, not in the header
                • context - basically a set of properties that can be used for correlation
                  • the individual properties should have a name and value, although essentially it is the value that is important, as different models (for different services) may name the correlation information differently
                  • the properties should also have a type, to indicate what the information represents. Possible candidates include business identifier (or possibly just identifier), process instance id, etc
                • events - the list of activity events that occurred within the transaction scope


                This approach should hopefully reduce the duplication of header and correlation information that would occur if the individual events were reported separately.
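
The top-level structure outlined in the list above might be sketched in Java as follows. This is only one possible shape, not an agreed schema: the type names (Header, Context, ActivityEvent, ActivityUnit), the two ContextType values and the span() helper are all assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

/** Sketch of a grouped 'unit of work'. All names are hypothetical. */
final class ActivityUnitSketch {

    /** System-wide information, stated once per group. */
    record Header(String host, int port) {}

    /** What a correlation property represents. */
    enum ContextType { IDENTIFIER, PROCESS_INSTANCE_ID }

    /** The value matters most; the name may differ between service models. */
    record Context(ContextType type, String name, String value) {}

    /** The timestamp lives on the event, because a transaction may stay open for minutes. */
    record ActivityEvent(String eventType, Instant timestamp) {}

    record ActivityUnit(Header header, List<Context> context, List<ActivityEvent> events) {

        /** Time between the earliest and latest event, or zero if there are none. */
        Duration span() {
            if (events.isEmpty()) {
                return Duration.ZERO;
            }
            Instant first = events.stream().map(ActivityEvent::timestamp)
                    .min(Comparator.naturalOrder()).orElseThrow();
            Instant last = events.stream().map(ActivityEvent::timestamp)
                    .max(Comparator.naturalOrder()).orElseThrow();
            return Duration.between(first, last);
        }
    }

    public static void main(String[] args) {
        ActivityUnit unit = new ActivityUnit(
                new Header("host1", 8080),
                List.of(new Context(ContextType.IDENTIFIER, "orderId", "A-100")),
                List.of(new ActivityEvent("RequestReceived", Instant.ofEpochSecond(0)),
                        new ActivityEvent("ResponseSent", Instant.ofEpochSecond(90))));
        System.out.println(unit.span());
    }
}
```

Header and context appear once per unit, while each event carries only its type and timestamp, which is where the reduction in duplication comes from.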


                From an analysis perspective there are some pros and cons:


                • If analysing the time taken to perform a task, and understanding where the time was spent within that task, then the grouping on a transaction basis may provide all the information required
                • The CEP rules that may be used to analyse the activity events may only be interested in individual event types, and then perform any correlation between the event types itself, without wanting to be concerned about potential transaction groups
                • A solution may be to initially dispatch the activity group into the analysis phase, but enable individual analysis steps to easily extract particular event types of interest as individual events, which are then passed to subsequent modules for further analysis
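
As a rough illustration of the last point, an analysis step could flatten the groups and pull out a single event type, attaching the group's context to each extracted event so that later modules can still correlate. The extract method and the record shapes below are hypothetical, with context reduced to a list of strings for brevity.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: lift events of one type out of their transaction groups. */
final class EventExtractionSketch {

    record ActivityEvent(String eventType, String detail) {}

    record ActivityUnit(List<String> context, List<ActivityEvent> events) {}

    /** An event lifted out of its group, carrying the group's context with it. */
    record Extracted(List<String> context, ActivityEvent event) {}

    /** Returns every event of the given type, in group order, as an individual event. */
    static List<Extracted> extract(List<ActivityUnit> units, String eventType) {
        return units.stream()
                .flatMap(u -> u.events().stream()
                        .filter(e -> e.eventType().equals(eventType))
                        .map(e -> new Extracted(u.context(), e)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ActivityUnit> units = List.of(
                new ActivityUnit(List.of("A-100"), List.of(
                        new ActivityEvent("RequestReceived", "submitOrder"),
                        new ActivityEvent("ResponseSent", "submitOrder"))),
                new ActivityUnit(List.of("A-101"), List.of(
                        new ActivityEvent("RequestReceived", "cancelOrder"))));

        extract(units, "RequestReceived").forEach(System.out::println);
    }
}
```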



                From a visualisation perspective:


                • It should be easier to correlate between groups, to build up a graph of related groups which can then be presented to a user
                • The 'activity units of work' can be expanded to show the individual detailed activities that were performed to achieve a particular business step
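
One way the graph of related groups could be built is by linking any two units of work that share a correlation value, since it is the value rather than the property name that matters. The sketch below assumes each unit has been reduced to an id plus its set of context values. The Unit record and relate method are hypothetical, and a real implementation would probably also take the property type into account.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: relate units of work that share at least one context value. */
final class GroupCorrelationSketch {

    /** A unit of work reduced to an id and its correlation values. */
    record Unit(String id, Set<String> contextValues) {}

    /** Returns, for each unit id, the ids of the units it is related to. */
    static Map<String, Set<String>> relate(List<Unit> units) {
        // Index: context value -> ids of the units that carry it.
        Map<String, List<String>> byValue = new LinkedHashMap<>();
        for (Unit u : units) {
            for (String value : u.contextValues()) {
                byValue.computeIfAbsent(value, k -> new ArrayList<>()).add(u.id());
            }
        }

        // Every unit gets a node, even if it ends up with no edges.
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (Unit u : units) {
            graph.put(u.id(), new LinkedHashSet<>());
        }

        // Units that share a value are linked in both directions.
        for (List<String> ids : byValue.values()) {
            for (String a : ids) {
                for (String b : ids) {
                    if (!a.equals(b)) {
                        graph.get(a).add(b);
                    }
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        List<Unit> units = List.of(
                new Unit("u1", Set.of("A-100")),
                new Unit("u2", Set.of("A-100", "P-7")),
                new Unit("u3", Set.of("P-7")));
        System.out.println(relate(units));
    }
}
```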