Content Negotiation in Internet Mail




Previous: Odyssey Of Content Negotiation Next: Machine Learning In The Laboratory

8 Rule Based Negotiation

To summarize the previous chapters: Content Negotiation is a nearly impossible task. The complexity of information in emails calls for an intelligent and fault-tolerant implementation. The approach therefore has to be modeled on nature: how do human beings understand and rate such complex information structures?

8.1 Patterns

The Nature Model sets the example. Information is negotiated by patterns, which are correlated and rated by a network of experiences (knowledge). Combining new patterns with already well-proven patterns constructs a piece of information.

8.1.1 Pattern Correlation

Recurring or similar patterns can be grouped into a single pattern that is used to negotiate the category they belong to. The following two example messages are used to compare two approaches that produce different results:

  1. Invitation to team meeting 9:30 GMT 1 room 4.22
  2. Invitation to our wedding May, 26th 2006

8.1.1.1 One Shot Matching

The simplest approach is to identify a single, significant key pattern. Here this pattern can be the word Invitation, which is a key word of the Private Category. With One Shot Matching, both messages are therefore correlated as private email. Unfortunately, this is correct only for the second message.
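One Shot Matching can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the `KEY_PATTERNS` table and the function name `one_shot_match` are assumptions made for the example, and only the key word Invitation comes from the text.

```python
# Hypothetical key pattern table; the text only names "invitation"
# as a key word of the Private Category.
KEY_PATTERNS = {"invitation": "private"}

def one_shot_match(message, key_patterns=KEY_PATTERNS):
    """Return the category of the first key pattern found, or None."""
    for word in message.lower().split():
        category = key_patterns.get(word)
        if category is not None:
            return category
    return None

messages = [
    "Invitation to team meeting 9:30 GMT 1 room 4.22",
    "Invitation to our wedding May, 26th 2006",
]
# Both messages contain the single key pattern, so both get the same rating.
results = [one_shot_match(m) for m in messages]
```

Both entries of `results` come out as private, which reproduces the weakness described above: the team meeting is misrated because a single key pattern decides everything.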

8.1.1.2 Relation Matching

This more intelligent approach starts like One Shot Matching by identifying a unique and significant key pattern. At this point the two messages differ, but both are still rated private. Now a second key pattern (and potentially more) is identified for each message. Here the secondary key patterns can be meeting and wedding.

Now the knowledge base is consulted. Both Business Email and Private Email contain correlations with the pattern Invitation. The pattern Meeting exists only in the business category; conversely, the pattern Wedding is found only in the private category. The correct result is that message 1 is rated as Business Email and message 2 as Private Email.
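The lookup described above can be sketched as follows. The `KNOWLEDGE_BASE` dictionary and the scoring by simple overlap count are assumptions chosen for illustration; the text does not prescribe how correlations are stored or weighted.

```python
# Hypothetical knowledge base: category -> patterns known for that category.
KNOWLEDGE_BASE = {
    "business": {"invitation", "meeting"},
    "private": {"invitation", "wedding"},
}

def relation_match(message, knowledge_base=KNOWLEDGE_BASE):
    """Rate a message by how many of its words each category knows."""
    words = set(message.lower().split())
    scores = {cat: len(words & patterns) for cat, patterns in knowledge_base.items()}
    best = max(scores.values())
    winners = [cat for cat, score in scores.items() if score == best]
    # Only a unique, non-zero best score counts as a decision.
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]

rating_1 = relation_match("Invitation to team meeting 9:30 GMT 1 room 4.22")
rating_2 = relation_match("Invitation to our wedding May, 26th 2006")
```

The shared pattern Invitation scores for both categories, so the secondary patterns decide: `rating_1` is business and `rating_2` is private. A message that contains only the shared pattern stays undecided and returns `None`.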

8.2 Deriving Rules

From an informational point of view, a pattern can be formulated as a rule. Rules can be implemented and applied by an algorithm. Using a set of rules in combination with a special mnemonic can help to solve semiotic problems.

8.2.1 Integrity of Rules

Rules have to preserve integrity. This means that new rules must not affect information that was previously negotiated correctly: a piece of information that is correlated with a class beyond doubt must not be moved to another class by applying a new rule.

For example, the R.I.P.P.E.R algorithm (see next chapter [ripper]) handles this well. When new rules are derived, the algorithm checks the integrity of classes and rules. If conflicts appear, a so-called Reduction Phase is invoked and the rule set is reduced until integrity is restored.
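The integrity requirement itself can be illustrated with a much simpler check than R.I.P.P.E.R performs. The sketch below is not that algorithm and has no Reduction Phase: it merely rejects a new rule if it would move any already confirmed message to another class. The rule format (keyword, category) and all function names are assumptions.

```python
def classify(rules, message):
    """Apply (keyword, category) rules in order; the first match wins."""
    words = set(message.lower().split())
    for keyword, category in rules:
        if keyword in words:
            return category
    return None

def add_rule_with_integrity(rules, new_rule, confirmed):
    """Return the rule set with new_rule added in front, or the unchanged
    rule set if new_rule would reclassify a confirmed (message, category) pair."""
    candidate = [new_rule] + list(rules)
    for message, category in confirmed:
        if classify(candidate, message) != category:
            return list(rules)  # reject: integrity would be violated
    return candidate

rules = [("wedding", "private"), ("meeting", "business")]
confirmed = [
    ("Invitation to team meeting 9:30 GMT 1 room 4.22", "business"),
    ("Invitation to our wedding May, 26th 2006", "private"),
]
# This rule would move the team meeting to "private", so it is rejected.
after_bad = add_rule_with_integrity(rules, ("invitation", "private"), confirmed)
# This rule touches no confirmed message, so it is accepted.
after_good = add_rule_with_integrity(rules, ("lunch", "private"), confirmed)
```

The new rule is placed in front of the existing ones so that any conflict with confirmed messages shows up immediately in the check.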

8.3 Teaching Rules

The aim of teaching rules is to discover and build a knowledge base (Nature Model). Several methods, derived from Software Engineering techniques, can be applied to the teaching process.

8.3.1 Unit Training

A Content Negotiation Algorithm using Unit Training (see figure [unittraining]) is trained on examples for each category before processing begins. The aim is to provide both unambiguous and problematic examples. This helps the algorithm to draw conclusions and build rules that preserve integrity. After the training phase has finished, the algorithm should produce good results. If not, another round of training is needed.

Figure: Unit Training and Processing
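A minimal sketch of Unit Training, assuming that rules are simple word-to-category mappings derived from labeled examples. The functions `train` and `process` are hypothetical names; words that occur in more than one category are treated as ambiguous and yield no rule, which is one simple way to keep the derived rules consistent with the training examples.

```python
from collections import defaultdict

def train(examples):
    """Derive word -> category rules from (message, category) examples.
    Words seen in more than one category are ambiguous and are dropped."""
    seen = defaultdict(set)
    for message, category in examples:
        for word in message.lower().split():
            seen[word].add(category)
    return {word: next(iter(cats)) for word, cats in seen.items() if len(cats) == 1}

def process(rules, message):
    """Let every known word vote; return the category with the most votes,
    or None if no rule applies."""
    votes = defaultdict(int)
    for word in message.lower().split():
        if word in rules:
            votes[rules[word]] += 1
    if not votes:
        return None
    return max(votes, key=votes.get)

# Training phase: runs completely before any processing starts.
rules = train([
    ("Invitation to team meeting 9:30 GMT 1 room 4.22", "business"),
    ("Invitation to our wedding May, 26th 2006", "private"),
])
# Processing phase: the rule set is now fixed.
rating = process(rules, "Wedding rehearsal on Friday")
```

Here the word Invitation appears in both training examples, so it produces no rule, and a message such as "Invitation to lunch" stays unrated. In the terms of the text, that would be a signal that another round of training is needed.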

8.3.2 Real Time Training

Real Time Training (see figure [realtimetraining]) builds a knowledge base in a way similar to the Software Engineering Waterfall Concept (footnote 1: cited from Ian Sommerville00). The software begins processing immediately and often misattributes at first. The training process is to let the categorizer software work and to improve its learning by user interception whenever the software fails. With every user interception the software derives new rules. An interception after a miscorrelation makes the training doubly effective:

  1. The software learns what to avoid in the future.
  2. The software learns a new kind of correlation.

The Real Time Training technique takes time to train the software and requires a high amount of user interception, but it results in a high probability of correct results. After the training phase, user interception is rarely needed.

Figure: Real Time Training with User Interception
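The double effect of a user interception can be sketched with a categorizer that starts without any knowledge. The class `RealTimeCategorizer`, its per-word weights and the default category "unsorted" are assumptions made for this example only.

```python
from collections import defaultdict

class RealTimeCategorizer:
    """Starts with an empty knowledge base and learns from user interception."""

    def __init__(self, default="unsorted"):
        self.default = default
        # word -> category -> weight
        self.weights = defaultdict(lambda: defaultdict(int))

    def categorize(self, message):
        scores = defaultdict(int)
        for word in message.lower().split():
            for category, weight in self.weights.get(word, {}).items():
                scores[category] += weight
        positive = {c: s for c, s in scores.items() if s > 0}
        if not positive:
            return self.default
        return max(positive, key=positive.get)

    def correct(self, message, wrong, right):
        """User interception after a miscorrelation."""
        for word in set(message.lower().split()):
            self.weights[word][wrong] -= 1  # 1. what to avoid in the future
            self.weights[word][right] += 1  # 2. a new kind of correlation

software = RealTimeCategorizer()
meeting = "Invitation to team meeting 9:30 GMT 1 room 4.22"
wedding = "Invitation to our wedding May, 26th 2006"

first_guess = software.categorize(meeting)            # nothing learned yet
software.correct(meeting, wrong=first_guess, right="business")
second_guess = software.categorize(wedding)           # misattributed via "invitation"
software.correct(wedding, wrong=second_guess, right="private")
```

After the two interceptions the software rates both example messages correctly, and a new message such as "Team meeting moved to room 3.10" is rated business without further help.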

8.3.3 Paradigm Training

Paradigm Training is a multi-phase approach (see figure [paradigmtraining]). Processing begins only after the learning phases have finished.

  1. Observing Phase: The software watches the user's actions and stores every event in a local database.
  2. Conclusion Phase: The software reads all saved events from the database and tries to derive rules and knowledge by conclusion.
  3. Processing Phase: The software starts processing using the rules and the knowledge base.

This technique follows the paradigm: Watch, Think, Do.

Figure: Paradigm Training - Understanding User Actions
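The three phases can be sketched with Python's built-in `sqlite3` module standing in for the local database. The event format (a message and the folder the user filed it into), the table layout and the function names are assumptions for this illustration.

```python
import sqlite3
from collections import Counter, defaultdict

def observe(db, message, folder):
    """Observing Phase (Watch): store one user action in the local database."""
    db.execute("INSERT INTO events (message, folder) VALUES (?, ?)", (message, folder))

def conclude(db):
    """Conclusion Phase (Think): map each word to the folder it was filed
    into most often; words without a clear majority yield no rule."""
    counts = defaultdict(Counter)
    for message, folder in db.execute("SELECT message, folder FROM events"):
        for word in set(message.lower().split()):
            counts[word][folder] += 1
    rules = {}
    for word, folders in counts.items():
        ranked = folders.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            rules[word] = ranked[0][0]
    return rules

def process(rules, message):
    """Processing Phase (Do): majority vote over the rules that match."""
    votes = Counter(rules[w] for w in message.lower().split() if w in rules)
    return votes.most_common(1)[0][0] if votes else None

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (message TEXT, folder TEXT)")
observe(db, "Invitation to team meeting 9:30 GMT 1 room 4.22", "business")
observe(db, "Invitation to our wedding May, 26th 2006", "private")
rules = conclude(db)
rating = process(rules, "Team meeting tomorrow")
```

In contrast to Real Time Training, the user is never asked for corrections here: the software only records what the user does anyway and draws its conclusions afterwards.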

8.4 Static Negotiation

The Pattern and Rules Concept is implemented in nearly all email software available today. The simplest approach, and one that is sufficient for small sites, is static negotiation. This technique uses a set of user-defined rules to determine how to categorize incoming email. No knowledge base is involved.
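A static rule set can be sketched as an ordered list that is checked from top to bottom. The rule format (header field, substring, target folder), the example addresses and the folder names are hypothetical.

```python
# Hypothetical user-defined rules: (header field, substring, target folder).
RULES = [
    ("from", "@example.com", "Work"),
    ("subject", "invitation", "Private"),
]

def file_message(headers, rules=RULES, default="Inbox"):
    """Return the folder of the first matching rule; nothing is learned."""
    for field, needle, folder in rules:
        if needle in headers.get(field, "").lower():
            return folder
    return default

from_boss = file_message({"from": "boss@example.com",
                          "subject": "Invitation to team meeting"})
from_friend = file_message({"from": "friend@example.org",
                            "subject": "Invitation to our wedding"})
# An unknown sender falls through to the keyword rule and is misfiled.
from_colleague = file_message({"from": "colleague@example.org",
                               "subject": "Invitation to team meeting"})
```

The last call files a team meeting under Private, and it will do so every time until the user edits the rules by hand, because the rule set never changes on its own.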

8.4.1 Mail User Agents

Popular MUAs like Microsoft Outlook and its derivatives implement static negotiation with the help of rule sets. These rules help to pre-categorize email in everyday work. The agents let the user define simple rules using predefined templates and macros (see also figure [applemail-static]):

Figure: Apple Mail Using Static Rules

The problem with Static Negotiation Techniques concerns new (and therefore unknown) email information. They also suffer from all the semiotic problems described in section [semiotic].

8.4.2 Mail Transfer Agents

MTAs like Sendmail and Exim4, and delivery agents like Procmail, provide simple rule processing, as the MUAs do. It is possible to configure email routes and aliases, deny mail from specific hosts or addresses, move email, or use external tools to categorize and rate email.

A promising approach is the Milter Interface (Milter: Mail Filter Interface, see http://www.milter.org), which allows third-party applications to rate an email through a plugin interface (see figure [milter]). This allows complex filter software to be plugged into otherwise simple MTAs.

Figure: MTA using the Milter Interface

Milter software ranges from simple spam and anti-virus filters up to complex content negotiation software.
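The plugin idea can be illustrated with a toy model. The sketch below does not use the real Milter API, which is not shown in this chapter; `ToyMTA`, the filter signature and the verdict strings are all hypothetical and only demonstrate how an MTA without content knowledge can delegate the rating to plugged-in filters.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
# A filter returns a verdict string, or None for "no opinion".
Filter = Callable[[Message], Optional[str]]

class ToyMTA:
    """Stand-in for an MTA that knows nothing about content itself."""

    def __init__(self) -> None:
        self.filters: List[Filter] = []

    def register(self, mail_filter: Filter) -> None:
        self.filters.append(mail_filter)

    def handle(self, message: Message) -> str:
        # Ask each plugged-in filter in turn; the first verdict wins.
        for mail_filter in self.filters:
            verdict = mail_filter(message)
            if verdict is not None:
                return verdict
        return "deliver:inbox"

def spam_filter(message: Message) -> Optional[str]:
    """Simple filter: reject on a single suspicious word."""
    return "reject" if "lottery" in message.get("subject", "").lower() else None

def category_filter(message: Message) -> Optional[str]:
    """More complex filter: propose a category for delivery."""
    subject = message.get("subject", "").lower()
    if "meeting" in subject:
        return "deliver:business"
    if "wedding" in subject:
        return "deliver:private"
    return None

mta = ToyMTA()
mta.register(spam_filter)
mta.register(category_filter)
verdict = mta.handle({"subject": "Invitation to team meeting"})
```

The MTA only iterates over its registered filters, so a simple spam check and a full content negotiation component can be exchanged without touching the MTA itself.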

Possible MTA Rules:

8.5 Dynamic Negotiation

The disadvantages of the Static Negotiation approach are obvious. Without a knowledge base, every piece of information is treated as new information every time. The correct negotiation of information is therefore uncertain for each arriving email.

Due to the dynamic diversity of email information, a dynamic approach is needed to keep up with the information flood. Machine Learning Techniques address this issue.

8.5.1 Machine Learning

The advantage of rule sets using Machine Learning is the ever-growing Knowledge Base. The training of rules never has to end (see Real Time Training and Paradigm Training, figures [realtimetraining] and [paradigmtraining]).

Unique advantages are:

A good example of a (simple) Machine Learning implementation is the MUA Apple Mail (see figure [applemail]). The negotiation software starts processing with an initial (pre-trained) knowledge base, which already produces good results. When an email is misrated, user interception is possible: Apple Mail offers a button to tell the software that the last action was incorrect.

Figure: Smart Real Time Teaching in Apple Mail


Managed with the WikiSH written in /bin/sh
and the T2W Tex to Wiki Translator
2005, by Sebastian Misch