For most Java developers, the idea of using a rule engine evokes thoughts of vendors in suits selling their bosses a complex and expensive piece of software they don’t need, and of introducing something completely foreign and intrusive into their code base. Drools 5 (http://www.jboss.org/drools/) aims to change this perception by bridging the gap between the Java developer and the world of rule-based systems.
The simplest explanation of a rule engine is that it is a very efficient pattern matcher. It matches data, referred to as “facts”, against rules. Rules are simple if-then constructs that operate on the matched data. For example, imagine an application in which the data is a loan application containing the applicant’s credit score. A rule could express a credit score requirement for a loan such as:
In the context of this example, assume that the following Java classes exist:
rule "LowCreditScoreRejection"
    dialect "java"
    when
        mortgage : Mortgage(
            lender : lenderName == "ACME"
        )
        application : LoanApplication(
            lender == mortgage.lenderName,
            score : creditScore < 680
        )
    then
        application.reject("the score " + score + " is too low. \n" +
            "A credit score of at least 680 is required");
        insert(new RejectionNotice(application));
end
Listing SAM-1 A simple rule
The rule named “LowCreditScoreRejection” is a typical Drools Rule Language (DRL) rule. It has two parts: the “when” part, also known as the “predicate”, “premise”, “condition” or simply the “left-hand side” (LHS for short), and the “then” part, also known as the “consequence”, “action”, “conclusion” or “right-hand side” (RHS).
The “when” part, or rule condition, determines the patterns to be matched, that is, the types and characteristics of the objects that will activate the rule. In the example shown, we are looking to match two objects: an object of type Mortgage and an object of type LoanApplication. The “mortgage” object must have a lenderName (mortgage.getLenderName()) equal to “ACME”, and the “application” must have a matching lender name and a creditScore (application.getCreditScore()) that is less than 680.
As you can see, this rule is only evaluated if two objects of the aforementioned types are present, and it is only activated if those two objects’ properties match the conditions in parentheses.
An observant Java developer will notice that the rule looks somewhat like Java code, but not quite. The “when” part lists the classes of the objects that must be present and the property values those instances need in order to activate the rule. The “when” part is purely a pattern-matching expression using first-order logic. You can think of it as the “where” clause in a SQL statement. Just as in a SQL statement, you can create aliases for the objects being matched (and their properties). In the “LowCreditScoreRejection” rule we have four aliases: “mortgage”, “lender”, “application” and “score”. Once you have aliased a matched object, it can be used elsewhere in the rule condition and also in the rule consequence.
We have seen that the condition part of the rule determines what pattern of objects will activate the rule. The “then” part, or rule consequence, is what happens when the conditions set forth in the “when” part are met. In Drools the consequence is simply a block of Java code. In the “LowCreditScoreRejection” example there are two things happening in the “then” part. First, we call the “reject” method on the application, passing a message that explains why the application is being rejected (notice the use of the alias “score”). Second, we create a new object of type RejectionNotice, passing the application object to its constructor, and hand the newly created object to the insert method. The insert method is a Drools working memory method that tells the rule engine that there is a new object to consider when evaluating the rules. In this case the expectation is that another rule has a condition matching objects of type RejectionNotice and will act upon them.
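The Java classes behind the rule are not shown, so the following is only a minimal sketch of what they might look like. The property names (lenderName, lender, creditScore), the reject method and the RejectionNotice constructor are taken from the rule; the field types, the rejected flag and the extra getters are assumptions made for illustration.

```java
// Hypothetical fact classes for the "LowCreditScoreRejection" rule.
// Drools reads pattern fields through JavaBean getters, so a constraint
// on lenderName resolves to getLenderName(). In a real project each class
// would be public and live in its own file.

class Mortgage {
    private final String lenderName;

    Mortgage(String lenderName) {
        this.lenderName = lenderName;
    }

    public String getLenderName() {
        return lenderName;
    }
}

class LoanApplication {
    private final String lender;
    private final int creditScore;
    private boolean rejected;          // assumed: not named in the article
    private String rejectionReason;    // assumed: not named in the article

    LoanApplication(String lender, int creditScore) {
        this.lender = lender;
        this.creditScore = creditScore;
    }

    public String getLender() {
        return lender;
    }

    public int getCreditScore() {
        return creditScore;
    }

    public boolean isRejected() {
        return rejected;
    }

    public String getRejectionReason() {
        return rejectionReason;
    }

    // Called from the rule consequence with the explanatory message.
    public void reject(String reason) {
        this.rejected = true;
        this.rejectionReason = reason;
    }
}

class RejectionNotice {
    private final LoanApplication application;

    RejectionNotice(LoanApplication application) {
        this.application = application;
    }

    public LoanApplication getApplication() {
        return application;
    }
}
```

Keeping these classes as plain JavaBeans is what lets the rule condition refer to lenderName and creditScore directly.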
From the simple example of the “LowCreditScoreRejection” rule we see that a rule engine is a system that matches facts (our data objects) against rules. The rules are then used to infer conclusions about the data. In the example, the conclusions inferred were the rejection of the loan application and the creation of the rejection notice. This type of system is referred to as a data-driven, forward-chaining reasoning system. At the heart of the system is an “inference engine”: the component that does the pattern matching, activates the rules and determines how to execute the activated rules. This process is typically referred to as the match-resolve-act (or recognize-act) cycle. Under the covers Drools uses a custom version of the popular Rete algorithm (see http://en.wikipedia.org/wiki/Rete_algorithm). As we can see, the “forward chaining” process starts with the available facts and uses the rules to infer more facts (such as RejectionNotice) until a desired goal has been reached (determining whether a loan application is approved or rejected and communicating the rejections).
Rules like “LowCreditScoreRejection” are contained in DRL files (files with the extension .drl) that comply with the native Drools rule language syntax. To use the created rules in a Java application you must:
Inside the rule engine, after you have inserted your facts into a knowledge session and invoked the session’s fireAllRules() method:
This loop is how a rule engine infers knowledge from existing facts using rules. You can see that a rule engine uses pattern matching to reduce or transform the problem space and arrive at a set of facts that can be considered a solution to the problem at hand.
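To make the match-and-fire loop concrete, here is a deliberately naive forward-chaining sketch in plain Java. It is not how Drools works internally (Drools uses Rete-based matching rather than re-testing every rule on each pass); the ToyRule and NaiveForwardChainer classes and the string facts are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// A toy rule: a condition over the current facts plus the single new fact
// asserted when the rule fires. (Hypothetical type, not part of Drools.)
class ToyRule {
    final String name;
    final Predicate<Set<String>> when;
    final String then;

    ToyRule(String name, Predicate<Set<String>> when, String then) {
        this.name = name;
        this.when = when;
        this.then = then;
    }
}

class NaiveForwardChainer {
    // Keeps firing every rule whose condition holds until a full pass
    // adds no new fact, then returns the resulting fact set.
    static Set<String> infer(Set<String> initialFacts, List<ToyRule> rules) {
        Set<String> facts = new LinkedHashSet<>(initialFacts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (ToyRule rule : rules) {
                // Set.add returns false when the fact is already known,
                // which is what eventually stops the loop.
                if (rule.when.test(facts) && facts.add(rule.then)) {
                    changed = true;
                }
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        List<ToyRule> rules = new ArrayList<>();
        // Listed "out of order" on purpose: the notice rule can only fire
        // after the rejection rule has added its fact.
        rules.add(new ToyRule("notify",
                f -> f.contains("application rejected"),
                "rejection notice created"));
        rules.add(new ToyRule("reject",
                f -> f.contains("lender is ACME") && f.contains("score below 680"),
                "application rejected"));

        Set<String> start = new LinkedHashSet<>();
        start.add("lender is ACME");
        start.add("score below 680");

        // prints [lender is ACME, score below 680, application rejected, rejection notice created]
        System.out.println(infer(start, rules));
    }
}
```

The second rule only becomes applicable because the first one added a fact, which is the essence of forward chaining: new facts feed further rule activations until nothing new can be inferred.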
Rule engine development introduces enough conceptual complexity (mainly inherited from A.I. lingo and academia) that it can feel fairly unapproachable to us Java developers. So, let’s tackle a simple yet representative problem using Drools 5.
Recently Twitter has become the darling of social media applications. Classified by some as microblogging, Twitter wants its users to constantly answer the question “What are you doing?” Of course, the intended usage of a tool by its creators has no bearing on how people will actually use it. As an active Twitter user, I find it especially annoying to deal with those attempting to exploit the tool in ways detrimental to other users’ experience.
Yes, I am speaking about spammers! Recently blogger Allan Young wrote about what he termed the “Twitter Influence Ratio” (http://allantyoung.com). In that blog entry he described a simple way to measure a Twitter user’s influence as the ratio of the number of followers to the number of people the user is following. The article also discusses the inexactitude of the measurement for certain notable users such as Robert Scoble.
Based on the Twitter influence ratio, Evan Prodromou came up with a simple scale to classify Twitter users:
The goal of our Drools 5 project is to corroborate or repudiate the classification above by also providing a number that can be used to further narrow down the classification. After reading about the ways people judge whether a user is a potential spammer (see references), I’ve collected what I think is a small set of rules that can help us narrow down a Twitter user’s classification:
You might be asking how scientific or statistically accurate the rules above are, and the answer is: “I haven’t a clue”. These rules are exploratory; much as with a learning algorithm, the rule author can use rules to discover hidden patterns in the data. What I’m attempting to do here is to set up a framework that can be easily tweaked and enhanced.
One of the big decisions you’ll face when implementing a rule-based system with Drools is how much Java to put in your rules. As with any other object-oriented application, we want to encapsulate complex behavior to make our rules more readable. There are also things that are much more easily developed, tested and accomplished outside the realm of the rule engine.
To deal with the interaction with Twitter I decided to use Twitter4J (http://yusuke.homeip.net/twitter4j), a Java library to interact with the Twitter API. Using Twitter4J I created a simple collection of static utility methods contained in the class TwitterUtils.java. Some of the available methods are:
Twitter4J provides the classes twitter4j.Twitter, which represents the authenticated Twitter user, and twitter4j.User, which represents detailed information about a Twitter user.
Implementing a rule-based system is all about choices and trade-offs. The first rule we’ll develop is an example of such a trade-off. The rule is more of a utility rule to extract and classify the users. To accomplish this I’m using a simple Java enumeration called TwitterUserType that contains a static method to return the right enumeration value given a Twitter user’s influence ratio:
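import java.util.EnumSet;

public enum TwitterUserType {
    // Each type covers the influence-ratio interval (low, high]:
    // the lower bound is exclusive, the upper bound inclusive.
    // UNCLASSIFIED is only a fallback; getType never consults its bounds.
    UNCLASSIFIED (Double.MIN_VALUE, 0.0),
    TWITTER_CASTER (0.0, 0.2),
    NOTABLE (0.2, 0.5),
    SOCIALLY_HEALTY (0.5, 1.0),
    NEWBIE (1.0, 2.0),
    POTENTIAL_SPAMMER (2.0, Double.MAX_VALUE);

    private final Double low, high;

    TwitterUserType(Double low, Double high) {
        this.low = low;
        this.high = high;
    }

    // Returns the first type whose interval contains the ratio,
    // or UNCLASSIFIED if no interval matches.
    public static TwitterUserType getType(Double influenceRatio) {
        for (TwitterUserType userType : EnumSet.range(TWITTER_CASTER, POTENTIAL_SPAMMER)) {
            if ((influenceRatio > userType.low) &&
                (influenceRatio <= userType.high)) {
                return userType;
            }
        }
        return UNCLASSIFIED;
    }
}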
Listing SAM-2 A Java Enum to classify Twitter Users
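The boundary behavior of getType is easy to miss, so here is a small usage sketch. The enum is repeated (without the public modifier) only so that the example compiles on its own; the TwitterUserTypeDemo class is hypothetical and not part of the sample application.

```java
import java.util.EnumSet;

// Copy of the enum from Listing SAM-2, repeated so the example is self-contained.
enum TwitterUserType {
    UNCLASSIFIED (Double.MIN_VALUE, 0.0),
    TWITTER_CASTER (0.0, 0.2),
    NOTABLE (0.2, 0.5),
    SOCIALLY_HEALTY (0.5, 1.0),
    NEWBIE (1.0, 2.0),
    POTENTIAL_SPAMMER (2.0, Double.MAX_VALUE);

    private final Double low, high;

    TwitterUserType(Double low, Double high) {
        this.low = low;
        this.high = high;
    }

    static TwitterUserType getType(Double influenceRatio) {
        for (TwitterUserType userType : EnumSet.range(TWITTER_CASTER, POTENTIAL_SPAMMER)) {
            if (influenceRatio > userType.low && influenceRatio <= userType.high) {
                return userType;
            }
        }
        return UNCLASSIFIED;
    }
}

// Hypothetical driver that probes a few ratios, including the interval edges.
class TwitterUserTypeDemo {
    public static void main(String[] args) {
        System.out.println(TwitterUserType.getType(0.1));   // TWITTER_CASTER
        System.out.println(TwitterUserType.getType(0.2));   // TWITTER_CASTER (upper bound is inclusive)
        System.out.println(TwitterUserType.getType(0.35));  // NOTABLE
        System.out.println(TwitterUserType.getType(3.0));   // POTENTIAL_SPAMMER
        System.out.println(TwitterUserType.getType(0.0));   // UNCLASSIFIED (lower bound is exclusive)
        // An infinite ratio (for example from a division by zero on doubles)
        // exceeds Double.MAX_VALUE and therefore falls through as well.
        System.out.println(TwitterUserType.getType(Double.POSITIVE_INFINITY)); // UNCLASSIFIED
    }
}
```

A ratio of exactly 0.0 and an infinite ratio both come back as UNCLASSIFIED, so whatever computes the ratio may need to decide how users with zero followers or zero followees should be treated.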
In Drools 5 we can create custom data types right at the DRL level using the “declare” keyword:
declare Follower
    user : User
    classification : TwitterUserType
    follows : Twitter
    hasPicture : Boolean
    followedBack : Boolean
    inactive : Boolean
    averageTweetsPerDay : Double
    followersInCommon : Integer
    followeesInCommon : Integer
    ranking : Double
end
Listing SAM-3 A DRL custom data type
In Listing SAM-3 we declare a simple POJO called Follower that will contain some of the metrics used by the rules. The rule “Extract and classify followers” will match any object of type Twitter (the Twitter4J class representing the authenticated Twitter user), extract its followers and, for each follower, create a Follower object and set its values using the static methods in TwitterUtils. Each object created is then inserted into the knowledge session using the insert method.
rule "Extract and classify followers"
    dialect "java"
    when
        twitter : Twitter()
    then
        for (User user : twitter.getFollowers()) {
            Follower follower = new Follower();
            follower.setUser(user);
            follower.setFollows(twitter);
            follower.setClassification(TwitterUserType.getType(TwitterUtils.getTwitterInfluenceRatio(user)));
            follower.setHasPicture(TwitterUtils.hasSetProfileImage(user));
            follower.setFollowedBack(TwitterUtils.isFollowing(twitter, user));
            follower.setInactive(TwitterUtils.inactiveForTheLast(user, 30));
            follower.setAverageTweetsPerDay(TwitterUtils.averageTweetsPerDay(user));
            follower.setFollowersInCommon(TwitterUtils.followersInCommon(twitter, user));
            follower.setFolloweesInCommon(TwitterUtils.followingInCommon(twitter, user));
            follower.setRanking(0.00);
            logger.info("Inserting follower => " + user.getScreenName());
            insert(follower);
        }
end
Listing SAM-4 A utility rule to extract and classify Twitter followers
One of the things that you’ll discover early on is that your application domain objects might not be well suited to be used as rule engine facts. In the case of this simple tool, having a simple data object such as the DRL-specific Follower object makes rule creation simpler and consequently makes the rules much more readable and easier to maintain.
With the Follower objects created and inserted into the knowledge session we can now write a set of rules that match Follower objects with certain characteristics.
The “User has no picture” rule matches any object of type Follower where the hasPicture boolean value is false. It aliases the matched object as “follower” and in the consequence it subtracts 30.0 points from the ranking value.
rule "User has no picture"
    dialect "java"
    when
        follower : Follower(hasPicture == false)
    then
        follower.setRanking(follower.getRanking() - 30.0);
end
Listing SAM-5 The “User has no picture” rule
The “Follower with no mutual followers” rule is equally simple:
rule "Follower with no mutual followers"
    dialect "java"
    when
        follower : Follower(followersInCommon == 0)
    then
        follower.setRanking(follower.getRanking() - 10.0);
end
Listing SAM-6 The “Follower with no mutual followers” rule
As you can see, once we have created and populated objects suitable for the rule engine, writing the rules becomes a simple task. The rest of the rules are left as an exercise for the reader (or you can download them with the complete sample application on GitHub).
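To see how the two ranking rules combine for a single follower, the same arithmetic can be traced in plain Java. This is only a sketch of the scoring, not of how Drools evaluates rules; the FollowerScore class and its score method are hypothetical stand-ins for the DRL Follower type and the two rules above.

```java
// Hypothetical plain-Java trace of the two ranking rules shown above.
class FollowerScore {
    // Applies the same deductions as the "User has no picture" and
    // "Follower with no mutual followers" rules to a starting ranking of 0.0.
    static double score(boolean hasPicture, int followersInCommon) {
        double ranking = 0.0;        // initial value set by the extraction rule
        if (!hasPicture) {
            ranking -= 30.0;         // "User has no picture"
        }
        if (followersInCommon == 0) {
            ranking -= 10.0;         // "Follower with no mutual followers"
        }
        return ranking;
    }

    public static void main(String[] args) {
        System.out.println(score(false, 0));  // both rules apply: -40.0
        System.out.println(score(true, 0));   // only the second rule applies: -10.0
        System.out.println(score(true, 12));  // neither rule applies: 0.0
    }
}
```

Because each rule only subtracts a constant, the order in which the engine fires them does not affect the final ranking.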
The Java application that will exercise our Twitter rules is a simple class with a main method. We’ll pass a Twitter username and password via the String[] arguments. The code needed to read the DRL file and create a knowledge package is standard boilerplate Drools code, as shown in Listing SAM-7:
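// imports used by this fragment (Drools 5 knowledge API)
import java.util.Collection;
import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.definition.KnowledgePackage;
import org.drools.io.ResourceFactory;

// get a knowledge builder
KnowledgeBuilder knowledgeBuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
// parse and compile the DRL file
knowledgeBuilder.add(
    ResourceFactory.newClassPathResource("TwitterRules.drl", TwitterDroolsExample.class),
    ResourceType.DRL);
// check the builder for errors
if (knowledgeBuilder.hasErrors()) {
    logger.error(knowledgeBuilder.getErrors().toString());
    throw new RuntimeException("Unable to compile \"TwitterRules.drl\".");
}
// get the compiled packages (which are serializable)
Collection<KnowledgePackage> pkgs = knowledgeBuilder.getKnowledgePackages();
// add the packages to a knowledge base (deploy the knowledge packages)
KnowledgeBase knowledgeBase = KnowledgeBaseFactory.newKnowledgeBase();
knowledgeBase.addKnowledgePackages(pkgs);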
Listing SAM-7 Loading and compiling the rules
With a knowledge base in place we can now create and configure a knowledge session as shown in Listing SAM-8:
StatefulKnowledgeSession knowledgeSession = knowledgeBase.newStatefulKnowledgeSession();
Twitter twitter = new Twitter(twitterUser, twitterPassword);
knowledgeSession.insert(twitter);
knowledgeSession.fireAllRules();
Listing SAM-8 Asserting facts and firing the rules
The knowledge session is created from the knowledge base, and then we insert a Twitter4J Twitter object. That object will be matched by our “Extract and classify followers” rule, which in turn will produce Follower objects that will be evaluated by the ranking rules.
The astute reader will notice that the DRL-scoped Follower objects represent the “result” of our process. The question is then: how do we retrieve those objects from the knowledge session after the rules have executed? Although we could put System.out.println or logging statements in our rules’ consequence blocks, in a real application you’ll most likely need to retrieve those objects from the knowledge session to use them in your Java code.
To retrieve an object from the knowledge session Drools provides a querying facility. Drools queries are just like rules that have no consequence block. For example if we wanted to just retrieve all Follower objects in the knowledge session we could write a query like:
query "get all followers"
    follower : Follower()
end
Listing SAM-9 A Drools Query
The Drools query in Listing SAM-9 simply matches any objects of type Follower and aliases those as “follower”. In the Java code then we could use the query as shown in Listing SAM-10 to retrieve the results:
// look up the DRL-declared type so its fields can be read reflectively
FactType followerType = knowledgeBase.getFactType("org.drools.examples", "Follower");
QueryResults results = knowledgeSession.getQueryResults("get all followers");
// QueryResults is Iterable, so each row can be visited with a for-each loop
for (QueryResultsRow row : results) {
    // the key matches the alias declared in the query
    Object follower = row.get("follower");
    User user = (User) followerType.get(follower, "user");
    TwitterUserType type = (TwitterUserType) followerType.get(follower, "classification");
    Double ranking = (Double) followerType.get(follower, "ranking");
    logger.info(user.getScreenName() + " is a " + type + " with a ranking of " + ranking);
}
Listing SAM-10 Using a Drools Query and dealing with custom data types
Running the application will produce output similar to:
stuarthalloway is a TWITTER_CASTER with a ranking of -5.0
bmaso is a NEWBIE with a ranking of 5.0
bobmcwhirter is a NOTABLE with a ranking of 5.0
jaredrichardson is a NOTABLE with a ranking of 10.0
As the output shows, the numeric ranking begins to shed light on the influence and intentions of your followers on Twitter. Our friend Stu is classified as a Twitter Caster, but due to his low activity (average tweets per day) our rules took 5 points off his ranking, while Jared is ranked a little lower, as a Twitter Notable, but due to his high activity he gets a ranking of +5. The next step is to tweak our rules based on observation and investigation of the flagged users. We can see that the simple Twitter Influence Ratio is not sufficient to accurately predict a Twitter spammer. But if we continue adding rules that go deeper than the simple static analysis we’ve performed here, we can start getting closer to our goal. With a few more rules, possibly taking advantage of semantic analysis, we could look at hash tags, URLs embedded in tweets and other content, and classify Twitter users more accurately.
Rule engines can provide a Java developer with an environment in which logic and data are clearly separated. In the simple example used in this article it is easy to see how the DRL file becomes your laboratory of centralized knowledge about the problem at hand. Once the plumbing code is in place you can truly concentrate on the “business logic” in atomic, discrete, manageable chunks.
In this article we’ve barely scratched the surface of the capabilities provided by Drools. Drools 5 is a complete offering that includes the rule engine (Drools Expert), a Business Rule Management System (Drools Guvnor), a process/workflow engine (Drools Flow) and an event processing/temporal reasoning engine (Drools Fusion).
The code for the example above can be found at https://github.com/bsbodden/drools-twitter