Introduction to VisiRex
Inductive Rule Extraction is related to the fields of Machine Learning, Knowledge Discovery, Expert Systems and Artificial Intelligence. Rule Extraction is sometimes called "Decision Tree Classification". The method depends on the concept of "Entropy", a term scientists use to measure the amount of randomness, disorder, or uncertainty in a population. Claude Shannon introduced entropy to the field of Information Theory in 1948.
The following example gives a brief demonstration of how entropy can be used to extract rules from a database.
Suppose you have a database of patients, each diagnosed as either healthy or sick, and you notice that all patients are shown as being healthy.
The proportion healthy is 100% (P = 1), while the proportion sick is 0% (Q = 0).
Likewise, if all patients were shown as sick, the proportion healthy would be 0% (P = 0) and the proportion sick 100% (Q = 1).
The worst case occurs when half the patients are healthy and half are sick: the proportion healthy is 50% (P = 0.5) and the proportion sick is 50% (Q = 0.5).
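The entropy of each of these three cases can be worked out with the standard two-class entropy formula from information theory, measured in bits (base-2 logarithm) and taking 0·log2 0 as 0. This is a sketch assuming that standard formula is the measure the example intends:

```latex
\begin{aligned}
H(P, Q) &= -P\log_2 P - Q\log_2 Q, \qquad Q = 1 - P \\
H(1, 0) &= -1\cdot\log_2 1 - 0 = 0 && \text{(all healthy)} \\
H(0, 1) &= -0 - 1\cdot\log_2 1 = 0 && \text{(all sick)} \\
H(0.5, 0.5) &= -0.5\log_2 0.5 - 0.5\log_2 0.5 = 0.5 + 0.5 = 1 && \text{(half healthy, half sick)}
\end{aligned}
```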
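Notice that entropy is zero when every patient has the same diagnosis and reaches its maximum of one at an even split; with two classes, measured in bits, it always lies between zero and one.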
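A short Python sketch can check the zero-to-one bound numerically. The function name `binary_entropy` is hypothetical (not part of VisiRex); it implements the standard two-class formula -P*log2(P) - Q*log2(Q), with P the proportion healthy and Q = 1 - P the proportion sick:

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a two-class population where a fraction p is in
    one class (e.g. healthy) and q = 1 - p is in the other (e.g. sick)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    q = 1.0 - p
    # By convention 0 * log2(0) is taken as 0, so zero proportions are skipped.
    return sum(-x * math.log2(x) for x in (p, q) if x > 0)


if __name__ == "__main__":
    # The three cases from the text: all healthy, all sick, half and half.
    for p in (1.0, 0.0, 0.5):
        print(f"P={p:.1f}  Q={1 - p:.1f}  entropy={binary_entropy(p):.3f}")
    # Scan proportions from 0% to 100% in 1% steps to check the 0..1 bound.
    values = [binary_entropy(i / 100) for i in range(101)]
    print(f"min={min(values):.3f}  max={max(values):.3f}")
```

The scan shows the curve peaking at exactly one bit when P = 0.5 and falling to zero at either pure extreme.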
Now suppose the above database is widened to include fields for [Age] ("Young" or "Old") and [Wealth] ("Rich" or "Poor").
Next, you determine that 60% of the patients are poor and 40% are rich.
Try splitting the data into two portions based on wealth, and calculate the entropy of the diagnosis within each portion.
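The example does not spell out how the two per-portion entropies become a single figure for the field. A common convention, used by Quinlan's ID3 algorithm, weights each portion's entropy by its share of the patients; assuming that convention, the 60/40 wealth split gives:

```latex
H_{\text{Wealth}} = 0.6\,H_{\text{Poor}} + 0.4\,H_{\text{Rich}},
\qquad
H_{\text{portion}} = -P\log_2 P - Q\log_2 Q
```

Here P and Q are the proportions healthy and sick within that portion. The same weighting applied to the young/old split yields the [Age] figure to compare against.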
The lower entropy on [Wealth] (0.29) versus the higher entropy on [Age] (0.63) tells you that [Wealth] contains more information about the diagnosis than [Age]. Therefore, you should use [Wealth] as the first branch of your decision tree.
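The comparison can be sketched in Python. The ten patient records below are invented for illustration (chosen so that 60% are poor, as in the text) and are not the original example's tables, so the numbers differ from 0.29 and 0.63. The helper names `entropy`, `split_entropy` and `best_field` are hypothetical, and combining per-portion entropies by weighting each portion by its share of the rows is an assumption (the ID3 convention):

```python
import math
from collections import Counter

# Hypothetical records (Age, Wealth, Diagnosis), invented for illustration;
# 6 of the 10 are poor, but these are NOT the original example's tables.
RAW = [
    ("Young", "Rich", "Healthy"), ("Old", "Rich", "Healthy"),
    ("Young", "Rich", "Healthy"), ("Old", "Rich", "Healthy"),
    ("Young", "Poor", "Sick"), ("Old", "Poor", "Sick"),
    ("Young", "Poor", "Healthy"), ("Old", "Poor", "Sick"),
    ("Young", "Poor", "Healthy"), ("Old", "Poor", "Sick"),
]
PATIENTS = [dict(zip(("Age", "Wealth", "Diagnosis"), r)) for r in RAW]


def entropy(labels):
    """Entropy, in bits, of a list of class labels such as diagnoses."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def split_entropy(rows, field, target="Diagnosis"):
    """Split rows on `field`, take the target's entropy within each portion,
    and weight each portion by its share of the rows (assumed ID3 convention)."""
    portions = {}
    for row in rows:
        portions.setdefault(row[field], []).append(row[target])
    return sum(len(p) / len(rows) * entropy(p) for p in portions.values())


def best_field(rows, fields, target="Diagnosis"):
    """Pick the field whose split leaves the least entropy."""
    return min(fields, key=lambda f: split_entropy(rows, f, target))


if __name__ == "__main__":
    for field in ("Age", "Wealth"):
        print(f"{field}: {split_entropy(PATIENTS, field):.3f}")
    print("First branch:", best_field(PATIENTS, ["Age", "Wealth"]))
```

With these made-up records, splitting on [Wealth] leaves about 0.55 bits of entropy against about 0.85 for [Age], so [Wealth] is chosen as the first branch.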
Continue this procedure within each branch to grow the tree further, always placing the fields with the least entropy in the uppermost branches of the evolving tree.
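Repeating the choice inside every branch gives a recursive procedure much like Quinlan's ID3. The sketch below is an illustration under stated assumptions, not VisiRex's implementation: it uses invented patient records, stops when a portion is pure or no fields remain (then predicting the majority diagnosis), and reads each root-to-leaf path as an IF ... THEN rule. All names (`build_tree`, `to_rules`, and the helpers) are hypothetical:

```python
import math
from collections import Counter

# Hypothetical records (Age, Wealth, Diagnosis), invented for illustration only.
RAW = [
    ("Young", "Rich", "Healthy"), ("Old", "Rich", "Healthy"),
    ("Young", "Rich", "Healthy"), ("Old", "Rich", "Healthy"),
    ("Young", "Poor", "Sick"), ("Old", "Poor", "Sick"),
    ("Young", "Poor", "Healthy"), ("Old", "Poor", "Sick"),
    ("Young", "Poor", "Healthy"), ("Old", "Poor", "Sick"),
]
PATIENTS = [dict(zip(("Age", "Wealth", "Diagnosis"), r)) for r in RAW]


def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def split_entropy(rows, field, target):
    """Size-weighted entropy of the target after splitting rows on `field`."""
    portions = {}
    for row in rows:
        portions.setdefault(row[field], []).append(row[target])
    return sum(len(p) / len(rows) * entropy(p) for p in portions.values())


def build_tree(rows, fields, target="Diagnosis"):
    """Recursively split on the field with least entropy (ID3-style sketch)."""
    labels = [row[target] for row in rows]
    # Leaf: the portion is pure, or no fields are left; predict the majority.
    if len(set(labels)) == 1 or not fields:
        return Counter(labels).most_common(1)[0][0]
    best = min(fields, key=lambda f: split_entropy(rows, f, target))
    remaining = [f for f in fields if f != best]
    return {best: {
        value: build_tree([r for r in rows if r[best] == value], remaining, target)
        for value in sorted({r[best] for r in rows})
    }}


def to_rules(tree, target="Diagnosis", conditions=()):
    """Read each root-to-leaf path of the tree as an IF ... THEN rule."""
    if not isinstance(tree, dict):
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN [{target}] = {tree}"]
    field, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules += to_rules(subtree, target, conditions + (f"[{field}] = {value}",))
    return rules


if __name__ == "__main__":
    tree = build_tree(PATIENTS, ["Age", "Wealth"])
    for rule in to_rules(tree):
        print(rule)
```

On these records the tree splits first on [Wealth]; the rich branch is already pure, and the poor branch is split again on [Age].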
A Complete System for Inductive Rule Extraction