Summarizing the message of the previous chapters: Content Negotiation is a nearly impossible thing to do. The complexity of the topic Information in Emails calls for an intelligent and fault-tolerant implementation. The approach has to be modeled on nature: how do human beings understand and rate such complex information structures?
The Nature Model presupposes the following: information is negotiated by patterns that are correlated and rated by a network of experiences (knowledge). Combinations of patterns with already well-proven patterns construct a piece of information.
Recurring or similar patterns can be grouped together into a unique pattern for the negotiation of the category they belong to. The following example shows two approaches with different results:
The simplest approach is to identify a unique and significant key pattern. This pattern can be the word Invitation, which is a key word of the Private Category. Using the One Shot Matching approach, both pieces of information are correlated as private email. Unfortunately, this is correct only for the second one.
The more intelligent approach starts like One Shot Matching by identifying a unique and significant key pattern. The two pieces of information differ, but so far both are rated private. Now a second key pattern (and potentially even more) is identified for each piece of information. These secondary key patterns can be Meeting and Wedding.
Now the knowledge base is looked up. Both Business Email and Private Email contain correlations with the pattern Invitation. The pattern Meeting exists only in the business category; conversely, the pattern Wedding is found only in the private category. The correct result is that information 1 is rated as Business Email and information 2 as Private Email.
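The two approaches can be contrasted in a minimal Python sketch. Only the patterns Invitation, Meeting and Wedding and the two categories come from the example above; the knowledge base layout, the function names `one_shot_match` and `multi_pattern_match`, and the two sample texts are assumptions made for illustration.

```python
# Hypothetical knowledge base: category -> patterns known to correlate with it.
KNOWLEDGE_BASE = {
    "business": {"invitation", "meeting"},
    "private": {"invitation", "wedding"},
}


def tokenize(text):
    """Lower-case the text and split it into a set of bare words."""
    return {word.strip(".,!?:;").lower() for word in text.split()}


def one_shot_match(text, key_pattern="invitation", category="private"):
    """One Shot Matching: a single key pattern decides the category."""
    return category if key_pattern in tokenize(text) else None


def multi_pattern_match(text):
    """Rate each category by how many of its known patterns occur in the text."""
    words = tokenize(text)
    scores = {cat: len(words & patterns) for cat, patterns in KNOWLEDGE_BASE.items()}
    best = max(scores.values())
    winners = [cat for cat, score in scores.items() if score == best]
    # No match at all, or a tie between categories, is treated as undecided.
    return winners[0] if best > 0 and len(winners) == 1 else None


# Assumed sample texts standing in for information 1 and information 2.
info_1 = "Invitation to the quarterly meeting"
info_2 = "Invitation to our wedding"
```

With these assumed inputs, `one_shot_match` rates both texts as private, while `multi_pattern_match` separates them because the secondary patterns break the tie on Invitation.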
From the informational point of view, a pattern can be formulated as a rule. Rules can be implemented and used by an algorithm. Using a set of rules in combination with a special mnemonic can help to solve semiotic problems.
Rules have to ensure integrity. This means that new rules must not affect information that was previously negotiated correctly. Information that is correlated with a class beyond doubt must not be moved to another class by applying a new rule.
For example, the R.I.P.P.E.R algorithm (see next chapter [ripper]) handles this well. When new rules are derived, the algorithm checks the integrity of classes and rules. If conflicts appear, a so-called Reduction Phase is invoked and the rule set is reduced to restore integrity.
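The integrity requirement itself can be illustrated with a small Python sketch. This is not the R.I.P.P.E.R algorithm, only a simple check under assumed conventions: rules are hypothetical `(pattern, category)` pairs, the first matching rule wins, and a new rule is tried at the front of the rule set.

```python
def classify(rules, text):
    """Return the category of the first rule whose pattern occurs in the text."""
    words = set(text.lower().split())
    for pattern, category in rules:
        if pattern in words:
            return category
    return None


def preserves_integrity(rules, new_rule, confirmed):
    """Check that adding new_rule keeps every confirmed example in its class.

    confirmed is a list of (text, category) pairs that were already
    negotiated correctly.
    """
    candidate = [new_rule] + rules  # assumption: the new rule takes precedence
    return all(classify(candidate, text) == category for text, category in confirmed)


rules = [("wedding", "private"), ("meeting", "business")]
confirmed = [
    ("invitation to our wedding", "private"),
    ("invitation to the meeting", "business"),
]

# A rule on the shared pattern would move the business example to private.
conflicting = ("invitation", "private")
# A rule on a pattern absent from the confirmed examples leaves them untouched.
harmless = ("birthday", "private")
```

A rule set that fails this check would need to be reduced or the new rule rejected before processing continues.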
The aim of teaching rules is to discover and build a knowledge base (Nature Model). Several methods derived from Software Engineering Techniques can be applied to the teaching process.
A Content Negotiation Algorithm using Unit Training (see figure [unittraining]) is trained on examples for each category before processing begins. The aim is to provide both unambiguous and problematic examples. This helps the algorithm to derive rules that preserve integrity. After the training phase has finished, the algorithm should produce good results. If not, another training run is needed.
Figure: Unit Training and Processing
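Unit Training can be sketched in Python as a separate training step followed by a check before processing starts. The word-count knowledge base, the function names and the sample mails below are assumptions for illustration, not the implementation described in this document.

```python
from collections import Counter, defaultdict


def train(knowledge, examples):
    """Count, per category, how often each word occurs in the training examples."""
    for text, category in examples:
        knowledge[category].update(text.lower().split())
    return knowledge


def categorize(knowledge, text):
    """Pick the category whose learned word counts best cover the text."""
    words = text.lower().split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in knowledge.items()}
    if not scores or max(scores.values()) == 0:
        return None  # nothing learned yet, or no known word in the text
    return max(scores, key=scores.get)


def accuracy(knowledge, examples):
    """Share of labelled examples the current knowledge base rates correctly."""
    hits = sum(categorize(knowledge, text) == cat for text, cat in examples)
    return hits / len(examples)


# Assumed training and check examples, one per category.
knowledge = defaultdict(Counter)
training_set = [
    ("invitation to the project meeting", "business"),
    ("invitation to our wedding party", "private"),
]
check_set = [
    ("meeting agenda attached", "business"),
    ("wedding photos", "private"),
]
train(knowledge, training_set)
```

If `accuracy(knowledge, check_set)` falls below a chosen threshold, `train` can be called again with further examples before processing begins.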
Real Time Training (see figure [realtimetraining]) as a way to build a knowledge base is similar to the Software Engineering Waterfall Concept (cited from Ian Sommerville00). The software begins processing immediately and often misattributes at first. The training process then is to let the categorizer software work and to improve its learning through user interception whenever the software fails. With every user interception the software derives new rules. In case of an interception after a miscorrelation, the training is doubly effective:
The Real Time Training technique takes time to train the software and requires a high amount of user interception, but it results in a high probability of correct results. After the training phase, user interception is rarely needed.
Figure: Real Time Training with User Interception
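A minimal Python sketch of Real Time Training follows. The class `RealTimeCategorizer`, its word-weight knowledge base and the sample stream are assumptions for illustration. In particular, the sketch reads the doubled training effect as one interception both weakening the wrong correlation and strengthening the right one; that reading is an assumption, not something this document spells out.

```python
from collections import Counter, defaultdict


class RealTimeCategorizer:
    """Starts with an empty knowledge base and learns only from user interception."""

    def __init__(self):
        self.weights = defaultdict(Counter)  # category -> word -> weight

    def categorize(self, text):
        words = text.lower().split()
        scores = {c: sum(w[x] for x in words) for c, w in self.weights.items()}
        best = max(scores, key=scores.get, default=None)
        return best if best is not None and scores[best] > 0 else None

    def intercept(self, text, wrong, correct):
        """User interception: weaken the wrong category, strengthen the right one."""
        for word in text.lower().split():
            if wrong is not None:
                self.weights[wrong][word] -= 1
            self.weights[correct][word] += 1


def process(categorizer, stream):
    """Categorize each mail at once; the label stands in for the user's verdict."""
    interceptions = 0
    for text, expected in stream:
        guess = categorizer.categorize(text)
        if guess != expected:
            categorizer.intercept(text, guess, expected)
            interceptions += 1
    return interceptions


# Assumed mail stream with the category the user would confirm.
stream = [
    ("invitation to the meeting", "business"),
    ("invitation to our wedding", "private"),
    ("meeting tomorrow", "business"),
    ("wedding photos", "private"),
]
categorizer = RealTimeCategorizer()
first_pass = process(categorizer, stream)
```

In this assumed stream the first two mails need an interception, after which the remaining two are rated without help from the user.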
Paradigm Training is a multi-phase approach (see figure [paradigmtraining]). Processing begins after the learning phase has finished.
This technique follows the paradigm: Watch, Think, Do.
Figure: Paradigm Training - Understanding User Actions
The Pattern and Rules Concept is implemented in nearly all email software available today. The simplest approach, and one that is sufficient for small sites, is static negotiation. This technique uses a set of user-defined rules to determine how to categorize incoming email. No knowledge base is involved.
Popular MUAs like Microsoft Outlook and its derivatives implement static negotiation with the help of rule sets. These rules help to pre-categorize email in everyday work. These agents let the user define simple rules using predefined templates and macros (see also figure [applemail-static]):
Figure: Apple Mail Using Static Rules
The problem with Static Negotiation Techniques concerns new (and therefore unknown) email information; they also suffer from all the semiotic problems described in section [semiotic].
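Static negotiation and its limitation can be illustrated with a short Python sketch. The rule format `(header field, substring, target folder)`, the folder names and the sample messages are assumptions for illustration and do not reproduce the rule syntax of any particular MUA.

```python
# Hypothetical user-defined rules: (header field, substring, target folder).
STATIC_RULES = [
    ("from", "@example.com", "Business"),
    ("subject", "wedding", "Private"),
]


def apply_static_rules(message, rules=STATIC_RULES, default="Inbox"):
    """Return the folder of the first matching rule, or the default folder."""
    for field, needle, folder in rules:
        if needle.lower() in message.get(field, "").lower():
            return folder
    return default


known = {"from": "boss@example.com", "subject": "Meeting"}
# A mail about the same topic in unknown wording matches no rule at all.
unknown = {"from": "friend@example.org", "subject": "Marriage ceremony"}
```

Because the rule set never changes on its own, `unknown` stays in the default folder until the user writes another rule by hand.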
MTAs like Sendmail and Exim4, and delivery agents like Procmail, provide simple rule processing as the MUAs do. It is possible to configure email routes and aliases, deny mail from specific hosts or addresses, move email, or use external tools to categorize and rate email.
A promising approach is the Milter Interface (Milter: Mail Filter Interface, see http://www.milter.org), which allows third-party applications to rate an email using a plugin interface (see figure [milter]). This allows complex filter software to be plugged into otherwise simple MTAs.
Figure: MTA using the Milter Interface
Milter software runs the gamut from simple spam and anti-virus filters to complex content negotiation software.
Possible MTA Rules:
The disadvantage of the Static Negotiation approach is obvious. Without a knowledge base, every piece of information is treated as new information every time. The correct negotiation of information is therefore uncertain with each arriving email.
Due to the dynamic diversity of email information, a dynamic approach is needed to keep up with the information flood. Machine Learning Techniques address this need.
The advantage of rule sets using Machine Learning is the ever-growing Knowledge Base. The training of rules never has to end (see figures [realtimetraining] and [paradigmtraining] on Real Time Training and Paradigm Training).
Unique advantages are:
A good example of a (simple) Machine Learning implementation is the MUA Apple Mail (see figure [applemail]). The negotiation software starts processing with an initial (pre-trained) knowledge base and therefore produces good results from the start. If an email is misrated, user interception is possible: Apple Mail offers a button to tell the software that the last action was incorrect.
Figure: Smart Real Time Teaching in Apple Mail