Introduction to VisiRex
Table of Contents

Introduction

A Rule Tree

Rule Extraction

Suggested Uses

VisiRex Overview

Bibliography


What is Inductive Rule Extraction?

Inductive Rule Extraction is related to the fields of Machine Learning, Knowledge Discovery, Expert Systems, and Artificial Intelligence. Rule Extraction is sometimes called "Decision Tree Classification". The method depends on the concept of "Entropy", a term used by scientists to measure the amount of randomness, disorder, or uncertainty in a population. The use of entropy in the field of Information Theory was introduced by Claude Shannon in 1948.

-Entropy = P * log2(P) + Q * log2(Q)

where P and Q are the proportions of the two classes in the population, with P + Q = 1.
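As a minimal sketch of this formula (the function name binary_entropy is our own illustration, not part of VisiRex), the calculation can be written as:

    import math

    def binary_entropy(p):
        # Entropy, in bits, of a two-class population where one class
        # has proportion p and the other has proportion q = 1 - p.
        # By convention, 0 * log2(0) is taken to be 0.
        total = 0.0
        for portion in (p, 1.0 - p):
            if portion > 0:
                total -= portion * math.log2(portion)
        return total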

The following example gives a brief demonstration of how entropy can be used to extract rules from a database.
Imagine a large database of medical patients that contains only one field, a diagnosis recorded as "Sick" or "Healthy". Your job is to make a diagnosis using any information you can extract from this single-column database.

Now suppose you notice that all patients are shown as being healthy. The proportion healthy is 100% (P=1), while the proportion sick is 0% (Q=0).
For this example, -Entropy = 1 * log2(1) + 0 * log2(0), so Entropy = 0.0 (taking 0 * log2(0) = 0 by convention),
implying no randomness, so a diagnosis of "healthy" could be made with high confidence.

Likewise, if all patients were shown as sick, the proportion healthy is 0% (P=0), while the proportion sick is 100% (Q=1).
For this example, -Entropy = 0 * log2(0) + 1 * log2(1), so Entropy = 0.0,
implying no randomness, so a diagnosis of "sick" could be made with high confidence.

The worst case occurs when half the patients are healthy and half are sick. The proportion healthy is 50% (P=.5), while the proportion sick is 50% (Q=.5).
For this example, -Entropy = .5 * log2(.5) + .5 * log2(.5), so Entropy = 1.0,
implying total randomness. This database contains zero information regarding the diagnosis.

Notice how the value of entropy always lies between zero and one for a two-class population.
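Using the binary_entropy sketch above, the three cases can be checked directly:

    print(binary_entropy(1.0))   # all healthy -> 0.0, no randomness
    print(binary_entropy(0.0))   # all sick    -> 0.0, no randomness
    print(binary_entropy(0.5))   # half & half -> 1.0, total randomness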

Now suppose the above database is widened to include fields for [Age] ("Young" or "Old") and [Wealth] ("Rich" or "Poor").
You determine that 80% of the patients are young and 20% are old. Try splitting the data into two portions based on age, and calculate the entropy of diagnosis for each portion.
Say the entropy of the "Young" portion = .65 and the entropy of the "Old" portion = .55.
The combined overall entropy for [Age] is the weighted average, (80% * .65) + (20% * .55) = .63.

Next you determine that 60% of the patients are poor and 40% are rich. Try splitting the data into two portions based on wealth, and calculate the entropy of diagnosis for each portion.
Say the entropy of the "Poor" portion = .45 and the entropy of the "Rich" portion = .05.
The combined overall entropy for [Wealth] is the weighted average, (60% * .45) + (40% * .05) = .29.
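Both weighted averages can be reproduced with a small helper (split_entropy is our own name for the calculation, not a VisiRex routine):

    def split_entropy(portions):
        # Weighted average entropy of a field split into sub-populations.
        # Each item is (weight of the portion, entropy of diagnosis within it).
        return sum(weight * entropy for weight, entropy in portions)

    age_entropy    = split_entropy([(0.80, 0.65), (0.20, 0.55)])  # 0.63
    wealth_entropy = split_entropy([(0.60, 0.45), (0.40, 0.05)])  # 0.29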

The lower entropy on [Wealth] (.29) versus the higher entropy on [Age] (.63) tells you that [Wealth] contains more information about the diagnosis than [Age]. Therefore, you should use [Wealth] as the first branch of your decision tree.

You continue this procedure to build the tree further, always choosing the fields with the least entropy as the uppermost branches of the evolving tree, as the sketch below illustrates.
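A hypothetical sketch of that greedy selection step follows. In a real tree builder the entropies would be recomputed from the records reaching each node; here the figures from the example above are supplied directly:

    def best_field(fields):
        # Pick the field whose split yields the lowest weighted entropy;
        # `fields` maps a field name to its list of (weight, entropy) portions.
        return min(fields, key=lambda name: split_entropy(fields[name]))

    fields = {
        "Age":    [(0.80, 0.65), (0.20, 0.55)],   # weighted entropy 0.63
        "Wealth": [(0.60, 0.45), (0.40, 0.05)],   # weighted entropy 0.29
    }
    print(best_field(fields))  # -> Wealth, the first branch of the tree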


A Complete System for Inductive Rule Extraction

CorMac Technologies Inc.
34 North Cumberland Street ~ Thunder Bay ON P7A 4L3 ~ Canada