VisiRex -- Extracting Rules

VisiRex Tips & Tricks

Tips for Extracting Rules

Begin the three-step rule extraction wizard by pressing the exclamation mark (!) icon on the toolbar.

Field Selection

Text fields are always treated as discrete classes.
Numeric fields can be treated as either discrete or continuous. Double click on the name of a numeric field to toggle it to discrete.
Discrete fields cannot be used if they contain more than 256 unique values (classes).
The Field to Predict must be a discrete field.
The Fields to Use may be either discrete or continuous.
HINT: Using continuous fields where possible tends to produce a smaller tree. This happens because a continuous field splits two ways at a certain value, while any discrete field that appears on the tree must show all possible classes for that field.
Try to avoid using fields that are obviously redundant measures of the same thing. Also ignore fields that are obviously unrelated to the target prediction.
The Field Statistics view is shown to assist in selecting the fields.
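
The field rules above can be checked on your own data before starting the wizard. The following is a minimal Python sketch, not part of VisiRex. The function names, the table layout (a dict of column lists) and the force_discrete argument (standing in for the double-click toggle) are all assumptions made for illustration.

```python
MAX_CLASSES = 256  # limit on unique values for a discrete field, from the tips above


def field_kind(values, force_discrete=False):
    """Classify a column: text is always discrete, numeric is continuous unless toggled."""
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if all_numeric and not force_discrete:
        return "continuous"
    return "discrete"


def check_fields(table, predict, force_discrete=()):
    """Hypothetical pre-check. table maps field name -> list of values.

    Returns field name -> (kind, usable), where usable is False when a
    discrete field has more than MAX_CLASSES unique values, or when the
    field to predict is not discrete.
    """
    report = {}
    for name, values in table.items():
        kind = field_kind(values, name in force_discrete)
        usable = kind == "continuous" or len(set(values)) <= MAX_CLASSES
        if name == predict and kind != "discrete":
            usable = False
        report[name] = (kind, usable)
    return report


table = {
    "color": ["red", "blue", "red"],   # text field: always discrete
    "age": [31, 45, 27],               # numeric field: continuous by default
    "customer_id": list(range(300)),   # numeric field toggled to discrete below
}
report = check_fields(table, predict="color", force_discrete={"customer_id"})
```

In this sketch, customer_id is reported as unusable because it has 300 unique classes once it is treated as discrete. Leaving it continuous, or dropping it as unrelated to the prediction, avoids the problem.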

Row Selection

The black numbers can be dragged with your mouse, or you can double-click a black number to overtype the current value. Remember to press Enter after typing a new value.
For data mining small databases, set both the training set and the testing set to cover all rows.
For initial experiments on large databases, you will save time if you select only a small portion of data for both training and testing.
If you decide to use a training set that is different from the testing set, always think about whether your data is well shuffled.
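
To see why shuffling matters, consider data sorted by class: a training range taken from the top and a testing range taken from the bottom would then contain different classes. The Python sketch below shows one way to shuffle rows before splitting them. It assumes the shuffling is done on the data outside VisiRex, and the function name and the 70% training fraction are illustrative choices, not VisiRex settings.

```python
import random


def shuffled_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then cut it into training and testing ranges."""
    rows = list(rows)                   # copy, so the caller's data is not reordered
    random.Random(seed).shuffle(rows)   # fixed seed gives a repeatable shuffle
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Rows sorted by class: without shuffling, the last rows would all be class "B".
rows = [("A", i) for i in range(5)] + [("B", i) for i in range(5)]
train, test = shuffled_split(rows, train_fraction=0.7, seed=1)
```

After the shuffle, a contiguous row range is much more likely to contain a mix of classes, so a training range and a testing range taken from different parts of the file are comparable.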

Rule Extraction

The extracted rules are normally shown as a color-coded flow chart. However, if your tree is too wide (approximately 200 to 300 nodes), the program will change to Tree Chart view. If you later prune down to a smaller tree, you must manually select Flow Chart if you wish to return to that view.
When a flow chart has dozens of nodes, they will be difficult to read on the computer screen. You can zoom in on the flow chart by simply dragging a rectangle around the area of interest. This drag method seems to work better than using the provided magnifying glass. You can control the printed size of the final flow chart by using the print tiling option.
This is a visual, interactive screen where you experiment with various settings for Prune Rate and Min Items until you arrive at an acceptable rule tree. Keep observing the tree size, Node Count and Confidence as you experiment with the extraction settings. The goal is to build the simplest tree possible for the desired level of confidence. It is up to your subjective judgement to trade off tree size against accuracy. During this experimentation, you may gain new insight into which fields are important and how the fields relate to the prediction. This is knowledge discovery!
Remember to press Extract Rules Now to refresh the tree to your latest settings.
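
If you note down the Node Count and Confidence for each combination of settings you try, the goal of "simplest tree for the desired confidence" can be written as a small selection rule. The Python sketch below is only an illustration: the record layout is an assumption, Confidence is assumed to be a percentage, and the numbers are invented, not results from VisiRex.

```python
def simplest_acceptable(trials, min_confidence):
    """Return the trial with the fewest nodes whose confidence meets the target.

    Ties on node count go to the higher confidence. Returns None when no
    trial reaches the target.
    """
    acceptable = [t for t in trials if t["confidence"] >= min_confidence]
    if not acceptable:
        return None
    return min(acceptable, key=lambda t: (t["node_count"], -t["confidence"]))


# Invented example notes from four extraction runs.
trials = [
    {"prune_rate": 10, "min_items": 1, "node_count": 85, "confidence": 97.0},
    {"prune_rate": 30, "min_items": 1, "node_count": 41, "confidence": 94.5},
    {"prune_rate": 30, "min_items": 5, "node_count": 23, "confidence": 91.0},
    {"prune_rate": 60, "min_items": 5, "node_count": 9, "confidence": 82.0},
]
best = simplest_acceptable(trials, min_confidence=90.0)
```

With a target of 90, the sketch picks the 23-node tree. The 9-node tree is smaller but falls short of the target. In practice the choice remains a judgement call, as described above.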

Hints for Tree Pruning

The Prune Rate is the most intelligent way to prune, as it goes to the heart of the entropy calculations. A higher setting produces a smaller, less accurate tree.
Min Items is a cruder way of pruning, as it simply ignores cases that occur less frequently than this minimum setting. Min Items works best on large or noisy databases, where you wish to discourage the extraction of trivial rules to explain small amounts of data. A higher setting produces a smaller, less accurate tree.
In practice, the optimal tree occurs at some combination of Prune Rate and Min Items.
You may return to Rule Extraction at any time.
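
For readers unfamiliar with the entropy calculations mentioned above, the Python sketch below shows the textbook entropy and information-gain measures used by many decision-tree methods, together with two pruning tests. This is not VisiRex's actual algorithm, which is not described here. Treating Prune Rate as a minimum-gain threshold (min_gain) and Min Items as a minimum branch size (min_items) are both assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy of the parent minus the size-weighted entropy of its branches."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


def accept_split(labels, groups, min_gain=0.0, min_items=1):
    """Hypothetical pruning test (assumed, not VisiRex's documented behavior).

    Reject the split if any branch holds fewer than min_items cases, or if
    the information gain does not exceed min_gain.
    """
    if any(len(g) < min_items for g in groups):
        return False
    return information_gain(labels, groups) > min_gain


labels = ["yes", "yes", "no", "no"]
groups = [["yes", "yes"], ["no", "no"]]   # a split that separates the classes perfectly
gain = information_gain(labels, groups)
```

Raising either threshold in this sketch rejects more splits, which is consistent with the hints above: a higher setting gives a smaller, less accurate tree.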

A Complete System for Inductive Rule Extraction

CorMac Technologies Inc.
34 North Cumberland Street ~ Thunder Bay ON P7A 4L3 ~ Canada