Some early work on a BAM infrastructure has started within the Governance organisation on GitHub, including:
1) A gadget-server repo, to provide pluggable visual components that can present different capabilities to users
2) The bam repo, providing the ability to collect, analyse and organise activity information
The purpose of this post is to describe the functionality that an initial demo should deliver, both to help focus the current work and to illustrate the potential capabilities. The demo should include the following components:
- Switchyard application - the first step is a simple Switchyard application that can be monitored to produce performance metrics, which are then analysed against an SLA
- Switchyard activity event collector - initially this will use an ExchangeHandler configured with the application, but it may eventually be supported by the Switchyard infrastructure itself
- Activity Server to receive collected activity events
- Event Processor Network, configured with an example network to:
    - derive service invocation metrics - the information should include service type, operation name, optional fault name, duration (ms)
    - apply SLA rules to the service metrics to determine whether they violate the contract
- Result Processing
    - SLA violations should be reported via JMX notifications
    - SLA violations should be stored in an active collection
    - Service metric information should be stored in an active collection
- Visual presentation
    - Gadget to display list of SLA violations
    - Gadget to display service metrics (more details below)
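To make the event processing and result processing steps more concrete, here is a minimal sketch of deriving a service metric, checking it against a duration limit and reporting a violation as a JMX notification. All class names, the `sla.violation` notification type and the single "maximum duration" rule are assumptions made for illustration, not the design in the bam repo. The plain list only stands in for an active collection, and registering the monitor with an MBeanServer (which would also need an MBean interface) is left out.

```java
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the bam repo.
class SlaCheckSketch {

    /** One service invocation metric: type, operation, optional fault, duration (ms). */
    static final class ServiceMetric {
        final String serviceType;
        final String operation;
        final String fault;      // null when the invocation did not fault
        final long durationMs;
        final long timestamp;    // epoch millis at which the response was seen

        ServiceMetric(String serviceType, String operation, String fault,
                      long requestTime, long responseTime) {
            this.serviceType = serviceType;
            this.operation = operation;
            this.fault = fault;
            // The duration is derived from the request and response activity events.
            this.durationMs = responseTime - requestTime;
            this.timestamp = responseTime;
        }
    }

    /** Records and broadcasts a violation when a metric exceeds the assumed SLA limit. */
    static final class SlaMonitor extends NotificationBroadcasterSupport {
        static final String VIOLATION_TYPE = "sla.violation"; // assumed notification type
        private final long maxDurationMs;
        private long sequence;
        // Stand-in for the active collection of SLA violations.
        final List<ServiceMetric> violations = new ArrayList<>();

        SlaMonitor(long maxDurationMs) {
            this.maxDurationMs = maxDurationMs;
        }

        /** Returns true if the metric violates the SLA. */
        boolean check(ServiceMetric m) {
            if (m.durationMs <= maxDurationMs) {
                return false;
            }
            violations.add(m);
            sendNotification(new Notification(VIOLATION_TYPE, this, ++sequence, m.timestamp,
                    m.serviceType + "/" + m.operation + " took " + m.durationMs + " ms"));
            return true;
        }
    }
}
```

A JMX client would subscribe with `addNotificationListener` and receive one notification per violating invocation. A real rule set would probably be more expressive than a single threshold.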
The gadget for displaying service metrics will need to allow a user to customise it to specify the service type, operation name and optionally a fault type. For the demo, this could be handled by simply providing text fields where the user directly enters the relevant values to be used for filtering results. In the final version it should be possible to use RESTful queries to retrieve a valid set of values that can be presented to the user in dropdown lists.
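As a rough illustration of the demo's text-field filtering, the sketch below builds a filter from the three entered values and applies it to a list of metrics. The `Metric` and `Filter` classes are hypothetical, and treating a blank fault field as "do not filter on fault" is an assumption rather than agreed behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; the bam repo may model metrics differently.
class MetricFilterSketch {

    static final class Metric {
        final String serviceType;
        final String operation;
        final String fault;     // null when the invocation completed without a fault
        final long durationMs;

        Metric(String serviceType, String operation, String fault, long durationMs) {
            this.serviceType = serviceType;
            this.operation = operation;
            this.fault = fault;
            this.durationMs = durationMs;
        }
    }

    /** Filter built from the gadget's three text fields. */
    static final class Filter {
        private final String serviceType;
        private final String operation;
        private final String fault;   // empty means "do not filter on fault" (assumption)

        Filter(String serviceType, String operation, String fault) {
            // Trim the raw text-field input so stray spaces do not break matching.
            this.serviceType = serviceType.trim();
            this.operation = operation.trim();
            this.fault = fault == null ? "" : fault.trim();
        }

        boolean matches(Metric m) {
            return serviceType.equals(m.serviceType)
                    && operation.equals(m.operation)
                    && (fault.isEmpty() || fault.equals(m.fault));
        }
    }

    /** Returns the metrics that match the filter, in their original order. */
    static List<Metric> select(Filter filter, List<Metric> metrics) {
        List<Metric> result = new ArrayList<>();
        for (Metric m : metrics) {
            if (filter.matches(m)) {
                result.add(m);
            }
        }
        return result;
    }
}
```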
The metric information may represent individual durations, but may also represent data aggregated over a time range. If a range is defined, the duration will be the average value (drawn as the main line on the graph), with the min and max values bounding a region that should be highlighted on the graph (e.g. as a lighter filled background behind the solid line representing the average).
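One possible way to produce the average, min and max for each time range is to group the samples into fixed-width buckets, as sketched below. The fixed bucket width and the `long[]{timestamp, durationMs}` sample layout are assumptions made to keep the example short.

```java
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the bam repo.
class MetricAggregationSketch {

    /**
     * Groups samples of the form {timestampMs, durationMs} into buckets of
     * bucketMs milliseconds, keyed by the start time of each bucket.
     */
    static Map<Long, LongSummaryStatistics> aggregate(long[][] samples, long bucketMs) {
        // A TreeMap keeps the buckets in time order, ready for plotting.
        Map<Long, LongSummaryStatistics> buckets = new TreeMap<>();
        for (long[] sample : samples) {
            long bucketStart = Math.floorDiv(sample[0], bucketMs) * bucketMs;
            buckets.computeIfAbsent(bucketStart, k -> new LongSummaryStatistics())
                   .accept(sample[1]);
        }
        return buckets;
    }
}
```

For each bucket, `getAverage()` would give the point on the main line, while `getMin()` and `getMax()` would give the lower and upper edges of the highlighted region.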
If possible, it would also be good to access the SLA violations active collection and overlay any relevant violations for the selected service type, operation and fault onto the graph, as markers at the relevant date/time.
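A minimal sketch of selecting the violations to overlay might look like the following. It assumes each violation carries the same service type, operation and fault fields as a metric plus a timestamp, and that a null fault means "any fault". The `Violation` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real active collection API is not shown here.
class ViolationOverlaySketch {

    static final class Violation {
        final String serviceType;
        final String operation;
        final String fault;      // null when no fault was involved
        final long timestamp;    // epoch millis of the violation

        Violation(String serviceType, String operation, String fault, long timestamp) {
            this.serviceType = serviceType;
            this.operation = operation;
            this.fault = fault;
            this.timestamp = timestamp;
        }
    }

    /** Returns the times at which a marker should be drawn on the graph. */
    static List<Long> markerTimes(List<Violation> violations, String serviceType,
                                  String operation, String fault, long fromMs, long toMs) {
        List<Long> times = new ArrayList<>();
        for (Violation v : violations) {
            boolean sameService = serviceType.equals(v.serviceType)
                    && operation.equals(v.operation);
            boolean sameFault = fault == null || fault.equals(v.fault);
            // Only violations inside the displayed time range get a marker.
            boolean visible = v.timestamp >= fromMs && v.timestamp <= toMs;
            if (sameService && sameFault && visible) {
                times.add(v.timestamp);
            }
        }
        return times;
    }
}
```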