
Kevin Conner's Blog

June 20, 2011

On Tuesday May 3rd, at 4pm EST, Craig Muzilla, Mark Little and Burr Sutter went on stage to present the JBoss World keynote. Mark talked in detail about JBoss Everywhere, his vision of ubiquitous computing, and Burr performed a fantastic anchorman role for a very ambitious demo.

 

I was fortunate to play a large part in the creation of the demo but could not be there to see it performed live. I did, however, watch the keynote over a live stream and was thrilled to see the audience reaction.

 

To call this a very ambitious demo may sound like boasting but, in my view, this is not an overstatement. There were a lot of technologies in play, including some unusual servers, and this was all done without any backup plan.

 

Had it gone wrong, it would have done so in spectacular fashion!

 

Altogether we had

  • a live Twitter stream being fed into an Infinispan grid via Hibernate OGM
  • audience members casting votes through tweets
  • Drools filtering the data in the grid to identify useful information (the votes) before and after the cheating started
  • HornetQ transporting data to the second grid
  • Plug Computers hosting the filtered data in a second grid
  • three user interfaces displaying live aspects of each grid, each written using a different technology and all of them running on diverse devices (mobile and otherwise)

and to top it off we even let an audience member 'pull the plug' on a random node to demonstrate the resilience of an infinispan grid. What more could you want?

 

A lot of folks said we were crazy to attempt such a demo, and they may have been right, but the reaction of the audience was worth everything!

 

It was a great team who put this together and it was a lot of fun doing it.

 

So what goes into a demo such as this?  From the above list it is obvious that there are a lot of technologies involved, but that only tells part of the story.  It is now time to delve deeper and expose some of the details.  In this case we will be discussing Drools, and how we can take advantage of its capabilities to deal with those who wish to subvert the voting process to their own advantage.

 

The challenge for Drools

 

The Twitter stream is a large, and very diverse, data stream containing postings on topics such as #myperfectmorning, #nationaldonutday and even Tom & Jerry.  Hidden in the midst of all those postings are the gems we seek, in this case the votes cast by the keynote audience for specific JBoss technologies.  The stream would be captured by a different service within the keynote demo, responsible for constructing the Tweet entities and, using Hibernate OGM as the Java Persistence API (JPA) layer, persisting these entities into an Infinispan grid.

 

The Requirements

 

The initial execution of the Drools integration would be straightforward: intercept the Infinispan cache entries as they were being added into the grid, identify the valid votes and then pass these through to a second Infinispan grid, via the HornetQ integration, for further processing and visualization.

 

Having allowed this to run for some time we would then notice a peculiar voting pattern: one person submitting excessive votes for an individual project.  At this point we would stop the processing of the Infinispan data, alter the ruleset so that each person is limited to a single vote for a technology within a three minute window, and then replay all the captured data through the new ruleset.
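
The stop, alter and replay cycle can be pictured without Drools at all.  What follows is a rough plain Java model of the requirement, not the demo code: VoteReplay and everything nested inside it (CapturedVote, VotingRule, AcceptEverything, OneVotePerWindow) are hypothetical names, the sample data is made up, and the sketch assumes the captured votes are replayed in timestamp order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of "capture everything, then replay it through a new rule". */
final class VoteReplay {
    private VoteReplay() { }

    /** Simplified vote: who voted, for what, and when (in milliseconds). */
    static final class CapturedVote {
        final String screenName;
        final String technology;
        final long timestampMs;

        CapturedVote(String screenName, String technology, long timestampMs) {
            this.screenName = screenName;
            this.technology = technology;
            this.timestampMs = timestampMs;
        }
    }

    /** A swappable rule; use a fresh instance per replay so no state carries over. */
    interface VotingRule {
        boolean accept(CapturedVote vote);
    }

    /** The initial behaviour: every vote counts. */
    static final class AcceptEverything implements VotingRule {
        @Override
        public boolean accept(CapturedVote vote) {
            return true;
        }
    }

    /** The altered behaviour: one vote per person and technology within the window. */
    static final class OneVotePerWindow implements VotingRule {
        private final long windowMs;
        private final Map<String, Long> lastAccepted = new HashMap<>();

        OneVotePerWindow(long windowMs) {
            this.windowMs = windowMs;
        }

        @Override
        public boolean accept(CapturedVote vote) {
            String key = vote.screenName + "|" + vote.technology;
            Long previous = lastAccepted.get(key);
            if (previous != null && vote.timestampMs - previous < windowMs) {
                return false; // an accepted vote already exists inside the window
            }
            lastAccepted.put(key, vote.timestampMs);
            return true;
        }
    }

    /** Runs every captured vote through the rule and returns the ones it accepts. */
    static List<CapturedVote> replay(List<CapturedVote> captured, VotingRule rule) {
        List<CapturedVote> accepted = new ArrayList<>();
        for (CapturedVote vote : captured) {
            if (rule.accept(vote)) {
                accepted.add(vote);
            }
        }
        return accepted;
    }

    /** Made-up data: one cheat voting four times, one honest voter voting once. */
    static List<CapturedVote> sampleVotes() {
        return Arrays.asList(
            new CapturedVote("cheat", "drools", 0L),
            new CapturedVote("alice", "infinispan", 5_000L),
            new CapturedVote("cheat", "drools", 10_000L),
            new CapturedVote("cheat", "drools", 20_000L),
            new CapturedVote("cheat", "drools", 200_000L));
    }

    public static void main(String[] args) {
        List<CapturedVote> captured = sampleVotes();
        System.out.println(replay(captured, new AcceptEverything()).size());
        System.out.println(replay(captured, new OneVotePerWindow(180_000L)).size());
    }
}
```

With the sample data the first replay accepts all five votes, while the second accepts three: the cheat's votes at 10 and 20 seconds fall inside the window opened by the first vote, and the vote at 200 seconds falls outside it.  Because the captured data is kept separate from the rule, swapping the rule and replaying is all that is needed.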

 

The Knowledge Base and Session


Before we go on to discuss the rulesets, let's take a look at how we configure

  • stream analysis to allow reasoning over time periods
  • control of the session clock to allow replay of the data

 

Drools allows the developer to configure the event processing mode on the Knowledge Base, providing support for the following modes

  • cloud

This is the default processing mode, which behaves as a normal rulebase.  There is no concept of time nor event aging.

  • stream

Enables optimizations specifically for stream processing, such as the capability to expire events.

 

We wish to enable stream processing so that we can associate a duration with each event, allowing Drools to expire events automatically once they are no longer applicable.  This can be done by specifying the stream option on the Knowledge Base Configuration as follows

 

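// Stream mode enables time-aware processing, including event expiry
final KnowledgeBaseConfiguration conf = KnowledgeBaseFactory.newKnowledgeBaseConfiguration() ;
conf.setOption(EventProcessingOption.STREAM) ;
final KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase(conf) ;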
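
To picture what expiring events means in practice, here is a small plain Java model.  It is only a conceptual sketch and makes no claim about how Drools implements expiry internally: ExpiringEvents is a hypothetical class, events are reduced to their timestamps, and they are assumed to be inserted in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a store that forgets events once they are too old. */
final class ExpiringEvents {
    private final long lifetimeMs;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    ExpiringEvents(long lifetimeMs) {
        this.lifetimeMs = lifetimeMs;
    }

    /** Remembers an event by its timestamp; assumes non-decreasing timestamps. */
    void insert(long timestampMs) {
        timestamps.addLast(timestampMs);
    }

    /** Drops every event older than the lifetime and returns how many remain. */
    int expire(long nowMs) {
        while (!timestamps.isEmpty() && nowMs - timestamps.peekFirst() > lifetimeMs) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }

    public static void main(String[] args) {
        ExpiringEvents events = new ExpiringEvents(180_000L);
        events.insert(0L);
        events.insert(60_000L);
        events.insert(120_000L);
        System.out.println(events.expire(180_000L)); // nothing is older than three minutes yet
        System.out.println(events.expire(250_000L)); // the first two events have aged out
    }
}
```

In cloud mode there is no notion of "now", so nothing like expire could exist; stream mode is what gives the engine a clock to measure event age against.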

 

In order to reason over a period of time, the session will schedule jobs through the TimerService associated with the session.  Drools provides two implementations of the TimerService

  • realtime

A TimerService which provides time references and scheduling based on the current time

  • pseudo

A TimerService which is controlled by the application, often for testing or simulation.

 

In order to allow the replay of the twitter stream, once the cheat is discovered, we must be able to control the session's perception of time.  For this reason we choose the pseudo timer service rather than the realtime variant, and enable it as follows

 

final KnowledgeSessionConfiguration ksconf = KnowledgeBaseFactory.newKnowledgeSessionConfiguration() ;
ksconf.setOption(ClockTypeOption.get(ClockType.PSEUDO_CLOCK.getId())) ;
final StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession(ksconf, null) ;

 

Now that we have a session through which we can insert the tweets, either in response to the infinispan cache events or by querying the data held in the infinispan grid, how do we handle the insert while controlling time?

 

Although this may sound complicated it is, in reality, a straightforward operation.  There are a couple of issues relating to the manipulation of the time that you must bear in mind but, apart from these, the steps you would take are very similar to how a normal session would be used.

 

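// ksession is the StatefulKnowledgeSession created above
final SessionPseudoClock sessionClock = ksession.getSessionClock() ;

// Only ever move the clock forwards, as tweets may arrive out of order
final long currentTime = sessionClock.getCurrentTime() ;
final long tweetTimestamp = tweet.getTimestamp() ;
if (tweetTimestamp > currentTime)
{
    sessionClock.advanceTime(tweetTimestamp - currentTime, TimeUnit.MILLISECONDS) ;
}

// entryPoint is the session's "twitter" entry point
entryPoint.insert(tweet) ;
// Advancing by zero triggers any jobs scheduled as a result of the insert
sessionClock.advanceTime(0, TimeUnit.SECONDS) ;
ksession.fireAllRules() ;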

 

The first thing to notice is that we only update the clock's time if we find that the time has advanced, as there is a possibility that tweets may arrive with timestamps that are not in a monotonically increasing sequence.  We then insert the tweet into the session via an entry point, which is the drools generalization of stream events, before triggering the scheduled jobs that may result from the insert.  The final step is to fire all the rules, ensuring that any queued actions are executed.
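
The "never move the clock backwards" behaviour described above can be tried out in isolation.  The sketch below uses a hypothetical ManualClock as a stand-in for the pseudo clock (it is not the Drools class, it only mimics the two calls used above), and ClockSync.syncTo is an illustrative helper name.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical demonstration of advancing an application-controlled clock. */
final class ClockSync {
    private ClockSync() { }

    /** Stand-in for a pseudo clock: time moves only when the caller says so. */
    static final class ManualClock {
        private long currentTimeMs;

        long getCurrentTime() {
            return currentTimeMs;
        }

        long advanceTime(long amount, TimeUnit unit) {
            currentTimeMs += unit.toMillis(amount);
            return currentTimeMs;
        }
    }

    /** Moves the clock forward to the tweet's timestamp, never backwards. */
    static long syncTo(ManualClock clock, long tweetTimestamp) {
        long currentTime = clock.getCurrentTime();
        if (tweetTimestamp > currentTime) {
            clock.advanceTime(tweetTimestamp - currentTime, TimeUnit.MILLISECONDS);
        }
        return clock.getCurrentTime();
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock();
        long[] timestamps = {1_000L, 500L, 2_000L};
        for (long timestamp : timestamps) {
            System.out.println(timestamp + " -> " + syncTo(clock, timestamp));
        }
    }
}
```

Running main prints 1000 -> 1000, 500 -> 1000 and 2000 -> 2000: the late tweet leaves the clock where it was, so it is treated as if it had arrived at the current clock time.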

 

The Rulesets

 

Now that we know how to insert the tweets and control the perception of time, let's take a look at the rulesets and how we can modify them to handle the voters who are intent on cheating the system.  During this discussion we will focus solely on handling the cheats and will assume that all Tweets being passed through the ruleset have already been identified as votes.

 

The first ruleset is very simple

  • declare the Tweet instances with the event role
  • obtain the current tweet from the named entry point
  • pass the tweet through for processing

 

The resulting ruleset will look similar to the following

 

declare Tweet
    @role( event )
    @expires( 0ms )
end

rule "Send all tweets"
when
    $t : Tweet() from entry-point "twitter"
then
    tweetProcessor.processTweet($t) ;
end

 

This ruleset will take all the tweets and pass them straight through for processing, without any restrictions, as there are no temporal operators being used.  When using this ruleset the cheat will be able to subvert the voting process to their advantage.

 

Once the cheating pattern has been identified we then take a decision to alter the ruleset so that we can still make use of the data collected within the infinispan grid, while reducing the impact of the cheating on the outcome of the vote.  After some deliberation we decide that the best course of action will be to restrict each person to a single vote, for a particular technology, within a three minute window.

 

How do we modify the existing ruleset to allow us to reason over time?  What we need to do is track votes that have already been cast, requiring the introduction of another event type into the ruleset.  Once we have access to this type we can apply temporal logic to the ruleset, allowing the event and its existence to determine whether the same vote has previously been cast.  If we discover an existing vote within the preceding three minute window then no further processing of the current tweet should occur, otherwise we will log the vote, by creating and inserting it into the session, and continue to process the current tweet as before.

 

The modified ruleset is as follows

 

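// tweets need not be retained once they have been processed
declare Tweet
    @role( event )
    @expires( 0ms )
end

// votes are events too, so temporal operators can be applied to them
declare Vote
    @role( event )
end

rule "Send one vote every three minutes"
when
    $t : Tweet($screenName: screenName, $vote : hashtags) from entry-point "twitter"
    // no vote by this person, for this technology, within the window before $t
    not (Vote( screenName == $screenName, vote == $vote, this before[0, 2m59s999ms] $t))
then
    // record the vote so that later tweets can be checked against it
    insert(new Vote($screenName, $vote));
    tweetProcessor.processTweet($t) ;
end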

 

The important section of this ruleset is the following line

 

not (Vote( screenName == $screenName, vote == $vote, this before[0, 2m59s999ms] $t))

 

which will evaluate to true if no matching vote, from the person with the same screen name and for the same technology, has been created at any time within the preceding 2m 59s 999ms.
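
The boundaries of that window are easy to get wrong, so here is the same check written out as plain Java arithmetic.  This is a sketch for point-in-time events only, using a hypothetical BeforeWindow helper rather than anything from Drools: the vote blocks the tweet when the tweet occurs between 0 and 179,999 milliseconds after the vote.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper mirroring "vote before[0, 2m59s999ms] tweet". */
final class BeforeWindow {
    /** Upper bound of the window: one millisecond short of three minutes. */
    static final long MAX_DISTANCE_MS = TimeUnit.MINUTES.toMillis(3) - 1;

    private BeforeWindow() { }

    /** True when the tweet falls inside the window that starts at the vote. */
    static boolean voteBlocksTweet(long voteTimeMs, long tweetTimeMs) {
        long distance = tweetTimeMs - voteTimeMs;
        return distance >= 0 && distance <= MAX_DISTANCE_MS;
    }

    public static void main(String[] args) {
        long voteTime = 0L;
        System.out.println(voteBlocksTweet(voteTime, 179_999L)); // last millisecond of the window
        System.out.println(voteBlocksTweet(voteTime, 180_000L)); // exactly three minutes later
    }
}
```

This prints true and then false: a second vote cast exactly three minutes after the first is allowed through, which is why the upper bound in the rule is 2m59s999ms rather than 3m.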

 

With this ruleset in place we can now prevent the cheat from skewing the vote in their favour while still being able to rely on the data currently stored within the infinispan grid.

 

Although we ended up with a contrived and rather simplified example scenario, many of the Drools capabilities exhibited within the demonstration are applicable across a wide range of real world applications.  The ability to apply temporal logic to a data set, alter the rules when necessary and then re-apply them across the complete data set is a powerful tool to include within your application.

 

Where to go next?

 

A simplified version of the drools demo code has been attached to this posting, showing the application of the rules discussed above.  Please download it and play around, using it as a basis for exploring some of the other temporal logic capabilities available within drools.

 

If this introduction has been enough to whet your appetite for temporal logic then I would also encourage you to download the latest Drools release and explore the many examples provided in the distribution.  We have only scratched the surface of this interesting topic and there is a great deal more to learn.

 

Thanks for reading this far,

    Kev
