1. Background
Typical CEP use cases derive complex events from sequences of atomic events. For instance: a check-in event, followed by a bag loaded event, followed by a boarding card scanned event can be abstracted into a higher level passenger boarded event. In pseudo code, this could be represented as (the -> conditional element means "followed by"):
{code}event pb creates PassengerBoarded as
$checkin : CheckIn() ->
$bag: BagLoaded( cid == $checkin.id ) ->
$bcard: BoardingCardScanned( passengerId = $checkin.passengerId )
end{code}
These sequences of events can be represented in Drools 5.4 by making use of temporal operators. Unfortunately this has two inconveniences:
- It is more verbose than it needs to be
- It places a burden on the user to manage the lifecycle of the events that are part of the matched sequence, and this is usually hard to do correctly.
The goal of this proposal is to introduce a few constructs into both the language and the runtime engine that make it easy for rule and query authors to define sequences of events that should be matched by the engine, allow these sequences to be constructed into a higher level event and handled as one, and define the conditions in which the engine can automatically control the lifecycle of such events.
2. Event Abstraction
As of Drools 5.4, it is possible to define arbitrary data structures (classes) to represent events (atomic or composite) or to instruct the engine to assign event semantics to existing java classes in the classpath, by using the "@role()" annotation in a "declare" construct. E.g.:
{code}declare VoiceCallEvent
@role( event )
callStart : CallStartEvent // a primitive event
callEnd : CallEndEvent // a primitive event
customer : Customer // related data
end{code}
This is a flexible and effective way of defining and using events, but requires the rules author to write rules to instantiate and insert complex events into the working memory based on conditional elements matching events and/or data in the session (working memory). E.g.:
{code}rule "derive VoiceCallEvent"
when
// a LHS with some conditions and the proper bound variables
then
insert( new VoiceCall( $start, $end, $customer ) );
end{code}
In order to reduce the verbosity, the engine could derive the data directly from the bound variables in the LHS, instantiate a new VoiceCallEvent and insert it into the proper entry point. The proposed syntax (EBNF re-using rules from our current parser) is:
{code}EVENT <ID> CREATES <ID> (INTO <stringId>)?
<annotation>*
<attributes>?
AS
<lhs>
(RETRACT <ID> (COMMA <ID)*)?
END{code}
Note: uppercase words are lexer tokens and <> denotes parser rules. <annotation>, <attributes> and <lhs> accept the same values as the ones defined for rules, although we might have to verify each rule attribute for compatibility, during the detailed design and implementation.
E.g., for the previous voice call event, one could define it like:
{code}event vc creates VoiceCallEvent
into
voiceCalls
as
callStart : CallStartEvent() ->
callEnd : CallEndEvent( callingNumber == callStart.callingNumber )
$phone : MobileDevice( number == callStart.callingNumber )
customer : Customer( id == $phone.customerId )
retract
callStart, callEnd
end{code}
Please note that in the previous example:
- "vc1" is an arbitrary unique ID, like a rule name is for rules, that allows the engine to uniquely identify that statement, and the application to be able to later remove that statement or replace it by a new version by referencing this unique ID.
- -> is the "followed by" conditional element that is defined in the next section of this article. It defines a temporal relationship between events callStart and callEnd
- the name of the event definition matches the name of the declared type with assigned event @role (i.e. VoiceCallEvent)
- the bound variables with the same names as the fields in the declared type, will have their values automatically assigned to the instantiated event (i.e., callStart, callEnd and customer).
- $phone is a locally bound variable with no corresponding field name in the declared event and as such, is only used in the constraints themselves and is not assigned to any field in the declared event.
- "into voiceCalls" instructs the engine to insert the instantiated event into the "voiceCalls" entry point, instead of the default entry point.
- "retract callStart, callEnd" instructs the engine to retract the atomic events from the working memory after creating the composite event
OPEN ISSUE: I am not happy with the syntax for the unique ID: "event x creates Y...". Need to figure out an alternative.
{quote}*NOTE:* Wolfgang proposed the use of a keyword like "next" as a compact syntax for consuming the matched events. Should we support that? E.g.:
{code} event Call as next
offHook: OffHook( $s: subscriber ) ->
dialing: Dialing( subscriber == $s ) ->
onHook: OnHook( subscriber == $s )
end{code}{quote}
3. Definition: Event Tuple
An event tuple is a tuple composed by one or more events, optionally containing inter-mixed data (facts and other tuples). The start timestamp of an event tuple is the minimum start timestamp among all events in the tuple, and the end timestamp of an event tuple is the maximum end timestamp among all events in the tuple. The end timestamp of an event is an implicit attribute derived from the sum of the start timestamp and duration attributes of the event. For point-in-time events (i.e., events with a duration of 0), the end timestamp is the same as the start timestamp. An event tuple is not an event by itself, but share some of the same properties and can be used to instantiate new events.
4. Definition: Sequence Conditional Elements
Defining sequential relationships for events requires the introduction of several new conditional elements. All of these conditional elements consider the order of the events in the input stream of events, but may also enforce additional temporal constraints as defined below:
Conditional Element | Description |
---|---|
A -> B | A followed by B |
A => B | A strictly followed by B |
A ~> B | A loosely followed by B |
A \\ B | A independently followed by B |
In all the above examples, A and B can be events or event tuples.
4.1. The "followed by" (->) conditional element
The "followed by" conditional element (->) defines a temporal relationship between two event tuples where the left event tuple's end timestamp happens before the right event tuple's start timestamp. This semantic resembles the semantic of the "after" operator, but is stronger, works on tuples instead of individual events or attributes and defines additional constraints that the engine can use to infer new information and manage the relationship of the events. It is easy to understand if we consider the left event tuple to be composed of a single A event, and the right event tuple to be composed of a single B event. In this case:
{code}"A -> B" implies that "A.endTime <= B.startTime"{code}
A and B in the previous example can be event tuples, but according to the previous definition of event tuples, the "followed by" relationship holds.
4.2. The "strictly followed by" (=>) conditional element
The "strictly followed by" (=>) operator determines that the event tuple on the left must be strictly followed by the event tuple on the right, i.e., without any other intervening tuples. The usability and feasibility of such conditional element needs to be investigated.
{code}"A => B" *implies that* "A.endTime <= B.startTime" *AND* there is no C where "C.startTime >= A.endTime \&\& C.startTime <= B.startTime"{code}
4.3. The "loosely followed by" (~>) conditional element
The "loosely followed by" (~>) operator determines that the event tuple on the left must start before the event tuple on the right, but does not need to finish before it. The usability and feasibility of such conditional element needs to be investigated.
{code}"A ~> B" *implies* that "A.startTime <= B.startTime"{code}
4.4. The "independently followed by" (\\) conditional element
The "independently followed by" (\\) operator determines that the event tuple on the left must be completed before the event tuple on the right, but no additional temporal constraints are enforced. This usually means that all the events in the event tuple on the left must happen before on the event stream than all but one of the events in the event tuple in the left. The usability and feasibility of such conditional element needs to be investigated.
{code}"A \\ B" *does not imply* any relationship between A and B other than event ordering in the stream. {code}
4.5. The semantics of "or"
A sequence can have as one of its terms event tuples connected by other conditional elements like "and" and "or". Since Drools unifies the semantics of CEP, BRE and BPM, the semantics for the "or" conditional element will be the same as in traditional BRE rules, i.e., an inclusive or. For instance, for the stream of events:
{code}A1 B1 C1 D1{code}
The following sequence pattern will generate 2 matches:
{code}A() -> ( B() or C() ) -> D(){code}
Named:
{code}M' = A1 B1 D1
M'' = A1 C1 D1{code}
5. Pattern repetitions
When matching events in streams, there are frequently patterns that will match multiple events. Sometimes, the users wants to ignore multiple event occurrences considering only the first or last event, sometimes the user wants to constrain the matching algorithm to a given number of repetitions, and sometimes the user wants to match and consider all such repetitions.
To define pattern repetitions, the "[ ]" (repetition) operator is used in front of a conditional element or pattern. The general syntax for the repetition operator is:
{code}[ <repetition_counter> (| <sequence_conditional_element>)? ]{code}
The repetition counter can be:
Syntax | Result |
---|---|
* | Matches zero or more |
+ | Matches one or more |
? | Matches zero or one |
min..max | Matches between min and max |
..max | Matches between zero and max |
min.. | Matches min or more |
For all the cases above where multiple repetitions are supported, the result will always be the "maximal match", i.e., it will match only the chain of events that consists of the maximal number of events, not a subchain consisting of some of those events. This is consistent with what is defined by Dr. David Luckham in his book "The Power of Events", but nevertheless requires users to be careful when defining repetition patterns in order to make sure matches will complete and not remain open forever, waiting for more events to match. This is particularly true when using repetition counters as the last pattern in a sequence with any operator other than "strictly followed by" (=>).
The default sequence conditional element that connects the repeated pattern is the followed by (->) conditional element, but it is possible to define a different conditional element. E.g., if the relationship between the repeated patterns is strictly followed by, it would be defined as:
{code}[ 1..10 | => ] A{code}
Please note that A can be an event or a tuple event. The supported sequence conditional elements are "->", "=>", "~>" and "\\", as previously defined.
E.g.: if the application should match an A event, followed by at least one B event followed by a C event, the rule would be written as:
{code}rule X
when
A() ->
[+]B() ->
C()
then
...
end{code}
5.1. Variable bindinds and the concept of multi-dimensional tuples
Tuples are essentially an ordered list of values. Rules and event definitions match events in a stream and produce tuples. E.g.:
{code}rule X when
$a : A() ->
$b : B() ->
$c : C()
then
// ...
end{code}
The rule X above will match tuples of 3 values, one each for A, B and C. A possible match for the previous rule could be the tuple:
{code}[a, b, c]{code}
In the previous tuple, one can address the "b" value in the tuple by its bound variable "$b".
The aproach above works well for situations without pattern repetitions, but how could one address individual elements on a repeated pattern? For instance, what is the value of "$b" in the example below?
{code}rule X
when
$a : A() ->
$b : [+]B() ->
$c : C()
then
...
end{code}
The answer is, $b is an ordered list of values, i.e., a tuple by itself. The resulting tuple is, then, a tuple of tuples, or what can be called a "multi-dimensional tuple". In textual form, a tuple of tuples can be written with nested [ ] to indicate nested tuples, or in graphical form, it can be represented with a tree structure. For instance, the following multi-dimensional tuple is a possible match for the previous rule:
{code}[a, [b0, b1, b2], c]{code}
In this case, $b is a variable of the type Tuple and it is bound to [b0, b1, b2].
5.2. Addressing individual elements in a multi-dimensional tuple
To address individual elements in a multi-dimensional tuple, a few concepts are defined. The first is the type Tuple. In the previous example, $b is a variable of the type Tuple.
Individual elements of the type Tuple can be addressed by a numeric offset, like in an array, where the first element in a tuple has offset 0. In the previous example, the following conditions are true:
{code}$b[0] == b0
$b[1] == b1
$b[2] == b2{code}
The type Tuple also defines special properties:
Property | Description |
---|---|
length | the number of elements in the tuple |
first | the first element in the tuple |
last | the last element in the tuple |
asList | returns an ordered list of the elements in the tuple |
iterator | an iterator for all elements in the tuple |
For the previous example, the following conditions are true:
{code}$b.length == 3
$b.first == b0
$b.last == b2 {code}
Multi-dimensional tuples can be several levels deep, depending on the definition of the patterns that created that tuple. For instance:
{code}$t : ( first $ab : [1:5]( first $a : A() -> first $b : [1:3]B()) -> first $c : C() ){code}
The example above is definiting a pattern matching sequence of between 1 and 5 groups of one A followed by 1 to 3 B's. These groups are then followed by 1 C. All occurrences are qualified as "first". The whole tuple is bound to a variable $t of the type Tuple.
A possible result tuple could be:
{code}[ [ [ a0, [b0] ], [ a1, [ b1, b2 ] ], [a2, [ b3, b4 ] ] ], c0]{code}
This example demonstrates that variables can be bound to nested patterns, effectivelly creating nested variables. Such variables can be addressed with the usual . navigation and can be understood as "aliases" to the tuple indexes previously defined. The following expressions would hold true for the above example:
{code}$t[0] == [ [ a0, [b0] ], [ a1, [ b1, b2 ] ], [a2, [ b3, b4 ] ] ]
$t.$ab == [ [ a0, [b0] ], [ a1, [ b1, b2 ] ], [a2, [ b3, b4 ] ] ]
$t[1] == c0
$t.$c == c0
$t.$ab[0] == [ a0, [b0] ]
$t.$ab.first == [ a0, [b0] ]
$t.$ab[0][0] == a0
$t.$ab[0].$a == a0
$t.$ab[0].first == a0{code}
Complex cases like the previous one will rarely occur in real use cases, but nevertheless, it is quite common to have tuples with 1 or 2 nested levels. In any case, this proposal guarantees that all elements in a multi-dimensional tuple are accessible.
6. The event qualifiers
While the repetition operator allows the user to constrain event pattern matching, the event qualifiers allow the user to define what the engine should consider when encountering event repetitions. For instance, assume the following sequence of events:
{code}A1 B1 A2 A3 C1 B2 C2 A4 C3{code}
How can the user define a complex event X that is composed by A followed by C, but disregarding event repetitions?
Using the following syntax would not achieve the desired result:
{code}event X as
A() -> C()
end{code}
The previous definition would result in instantiating one X event for each pair of [A,C] where A happens before C. I.e.:
{code}X1 = [A1, C1]
X2 = [A2, C1]
X3 = [A3, C1]
X4 = [A1, C2]
X5 = [A2, C2]
X6 = [A3, C2]
X7 = [A1, C3]
X8 = [A2, C3]
X9 = [A3, C3]
X10 = [A4, C3]{code}
The desired result could be obtained by using an event qualifier instead:
{code}event X as
first A() -> first C()
end{code}
The above code would result:
{code}X1 = [A1, C1]{code}
Please note that in the previous examples, the letters A..D can represent either individual events or event tuples.
6.1. The "every" qualifier
The "every" qualifier instructs the engine to start a new match for each event matching the pattern. This is the default behavior of all patterns in the engine and so the every qualifier is optional for single patterns, but still needed for event tuple patterns. E.g., assuming the sequence of events in the beginning of this chapter:
{code}event X as
A() -> C()
end{code}
is the same as
{code}event X as
every A() -> every C()
end{code}
and yields the results:
{code}X1 = [A1, C1]
X2 = [A2, C1]
X3 = [A3, C1]
X4 = [A1, C2]
X5 = [A2, C2]
X6 = [A3, C2]
X7 = [A1, C3]
X8 = [A2, C3]
X9 = [A3, C3]
X10 = [A4, C3]{code}
6.2. The "any" qualifier
The "any" qualifier will consider any event that matches the pattern, but only one of them. So
{code}event X as
any A() -> any C()
end{code}
will result in a single X event composed by one A followed by one C. There are no guarantees about which A and which C the engine will pick (althought the implementation will usually pick the first event that matches the pattern, the users should not care or rely on that). A possible result could be:
{code}X1 = [A1, C1]{code}
Please note that once matched, the pattern will never match again, as all subsequent events will be ignored because of the "any" qualifier. If the user wants the above event definition to match a new [A,C] pair once the first pair is found, it can then compose the qualifiers:
{code}event X as
every( any A() -> any C() )
end{code}
The above will result in:
{code}X1 = [A1, C1]
X2 = [A4, C3]{code}
6.3. The "first" qualifier
The "first" qualifier works similar to the "any" qualifier, but instructs the engine to pick always the first event in a sequence that matches the given pattern. For instance:
{code}event X as
first A() -> first C()
end{code}
Will result in:
{code}X1 = [A1, C1]{code}
This qualifier can be composed with others to generate different results.
6.4. The "last" qualifier
The "last" qualifier works similar to the "any" qualifier, but instructs the engine to pick always the last event in a sequence that matches the given pattern. For instance:
{code}event X as
last A() -> first C()
end{code}
Will result in:
{code}X1 = [A1, C1]{code}
IMPORTANT: please note that it is not possible to use the "last" qualifier in the last pattern of a sequence as that would create an unbound sequence that would never match. In the above example, we defined a match as the last A() followed by the first C(), but it would be impossible to define any match composed by the "last C" as it would be an unbound condition.
This qualifier can be composed with others to generate different results.
6.5. The "not" qualifier
The "not" qualifier is actually a delimiter on a sequence of events and instructs the engine to discard any partial match that matches the negated condition. For instance:
{code}event X as
first A() -> not B() -> first C()
end{code}
Will result in:
{code}X1 = [A2, C1]{code}
Please note that the result differs from the example using the "first" qualifier alone because the "not B()" condition causes the partial match [A1,...] to be discarded when B1 event arrives.
IMPORTANT: please note that it is not possible to use the "not" qualifier in the last pattern of a sequence as that would create an unbound sequence that would never match.
This qualifier can be composed with others to generate different results.
7. Event expiration support
When definiting event sequences, expiration of partial matches becomes a critical aspect of the design, or otherwise, the hardware resources will be consumed and never released. In order to allow such support, sliding windows need to be supported for individual patterns and composite CEs as well. The usual "over window:time()" and "over window:length()" syntax seems a bit too verbose for event sequencing, and since the syntax of the window itself is not ambiguous, as timers use time units (ms, s, m, h, d) and events can adopt the unit "e", this proposal suggests the following:
For temporal constraints:
{code}(<pattern>|<CE>) over <interval>{code}
E.g.:
{code}A() -> B() over 30s
A() -> (B() -> C()) over 1h30m
A() -> (not B() -> any C()) over 14h00m30s150ms{code}
For event constraints:
{code}(<pattern>|<CE>) over <event_count>{code}
E.g.
{code}A() -> B() over 10e
A() -> (B() -> C()) over 25e
A() -> (not B() -> any C()) over 5e{code}
Please note that it does not make sense to define window constraints in the first pattern or CE in a sequence of events.
OPEN ISSUE: the proposed window syntax makes the definition of multiple parameters ambiguous, so it might not be feasible. We might need to revert to our original window syntax.
8. Entry point reference
Any pattern in Drools can have a distinct, inline declared, source. For instance, if one is looking for A events/facts in the X entry point, the syntax would be:
{code}A() from entry-point X{code}
This is fine for a few patterns, but in event sequences, when most of the time all patterns refer to events in the same stream/entry-point, a syntax should be provided to avoid the repetition of the sufix in every single pattern. This article proposes the following syntax:
{code}event X from entry-point Y as
A() -> B() -> C()
end{code}
In that case, unless a pattern has a specific "from" clause, they default to the source defined in the header, i.e., entry point Y in the example. In case one or more patterns should match events in a different stream, they can individually refer to different streams:
{code}event X from entry-point Y as
A() -> B() from entry-point Z -> C()
end{code}
9. Internal design
9.1. Event Abstraction
The implementation of event abstraction is trivial given the current engine runtime. The event instantiations will be converted into rules during compilation time with generated consequences that instantiate and properly populate the attributes based on their name and matching bound variables. It might still require support to arbitrary variable binding, but that would be implemented in a second phase.
9.2. Event Sequencing
The implementation of Event Sequencing will require extensive changes in the current runtime algorithm.
9.2.1. Tuple Normalization
As of Drools 5.4, the engine uses 2 tuple interfaces: LeftTuple and RightTuple. This split is there mostly for historical purposes but there is plently of code that depend on that differentiation. Nevertheless, its use already forces the engine into using adapters for some cases, like wrapping left tuples into right tuples when subnetworks are required.
The implementation of sequencing will require the concatenation or arbitrary tuples and the differentiation between LeftTuple and RightTuple makes that task very fiddly. The current algorithm can only concatenate right tuples into left tuples, but not any other combination.
This seems the opportunity to remove this split and unify the interfaces into a top left Tuple interface that is used for both left and right tuples. Concatenating tuples then becomes a standard operation as all tuples are the same. This has the side effect of allowing the support to what Gary Riley calls "Joins from the Right" and that he implemented for CLIPS back in 2009.
Tuple normalization will affect the whole core engine and the code changes required to support it will be very extensive: nearly every class in the core will be touched by this change.
9.2.2. Sequencing of events
Defining and matching event sequences is very similar to applying regular expressions to streams of characters. The same principles apply and similar algorithms can be used. At this point in time, it seems that compiling the source code into either an NFA or DFA and applying an online matching algorithm seems to be the best option. Since no back-references are necessary in event sequencing, though, it seems on a first analysis that a good option would be to implement a variation of the Thompson algorithm described in the following link would be the best way forward:
http://swtch.com/~rsc/regexp/regexp1.html
The algorithm performs with O(n^2) in the worst case, that is much better than the exponential O(2^n) of the backtracking NFA algorithm used by most modern languages, including Java, for regex matching with back-refences support. There are still open questions, though, specially on the support to constraints on the state transitions, including temporal constraints. E.g.:
{code}A( $x : x ) -> B( y == $x ){code}
The above sequence depends not only on receiving a B token (i.e. event), but also that the token in question must match a constraint defined in terms of a previously matched event in the sequence. The constraints can also be temporal in nature:
{code}$a : A() -> B( this after[1s,10s] $a ){code}
At this point, it seems possible to implement such support by having transitions being represented as constrained transitions. This obviously increases the complexity and space requirements of the algorithm as the NFA/DFA can no longer be represented exclusively by efficient numeric matrixes, but since event sequency patterns should be relatively small in nature, the impact should be manageable.
Another open question at this point is the requirement in the Thompson algorithm to have counted repetition groups to be transformed into actual non-counted transitions, i.e., "[3]A()" would have to be transformed into "A() -> A() -> A()"
Given the requirement above to support constrained transitions, it seems to be possible to have the transition counter to be stored as part of the current state context, in a way that we can support counted repetition groups without increasing the space requirements by increasing the state nodes.
9.2.3. Sequencing node
The sequencing algorithm above would be added to a new node called SequencingNode. Basically it would allow the engine to use Rete for the alpha and beta matching for each of the individual tuples of the sequence, that would in turn be propagated into the sequencing node that would create the matches for the sequencing expression. E.g., in the following sequence expression, where A, B, C and D are event types and X is a fact type (non-event):
{code}A(...) -> (B(...) and X(...)) -> (C(...) or D(...)){code}
* Rete would be used to match/resolve all alpha constraints in all patterns, the join of B and X, the disjunction of C and D, as well as any beta constraints in X that depend on B.
* Each individual result of this matches would be an input into the sequencing node that in the previous example would have 4 inputs: A(), B() and X(), C(), D(). The sequencing node would alos resolve any beta constraint on the transitions themselves.
The above example clearly demonstrate that this node would require multiple inputs, as opposed to all other rete nodes that require only one or two inputs. It also requires the building algorithm to be changed in order to support this change that will require smarter analysis of variable scoping, proper determination of logical branches during compilation, among other changes.
Comments