Tips & Tricks
Choosing a Project
Configuring a Project:
Choosing Between SFAM and BackProp
- Choose SFAM if you are trying to predict a class based on clean data.
- Choose BackProp if you are trying to predict a value.
- BackProp may be used for classification, especially if the data is dirty.
Try to avoid using SFAM if your data contains any blatant contradictions, such as records with identical inputs but different target classes.
- In some cases, the data can be cleaned using BackProp, then the project can be completed using SFAM.
- It is possible to convert numeric predictions to class predictions by dividing the target numbers
into ranges, for example "HIGH", "MEDIUM" and "LOW".
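If you prepare your data outside NeuNet Pro before importing it, the range conversion might look like this minimal Python sketch. The function name `to_class` and the cut points 33.3 and 66.7 are hypothetical; choose ranges that make sense for your own target field.

```python
def to_class(value, low_cut, high_cut):
    """Map a numeric target to a class label using two hypothetical cut points."""
    if value < low_cut:
        return "LOW"
    if value < high_cut:
        return "MEDIUM"
    return "HIGH"

# Illustrative target values, not real data.
targets = [12.0, 47.5, 88.1, 63.0]
labels = [to_class(t, 33.3, 66.7) for t in targets]
print(labels)  # ['LOW', 'MEDIUM', 'HIGH', 'MEDIUM']
```

Values that fall exactly on a cut point go to the higher class here; that boundary choice is arbitrary and worth keeping consistent between training and prediction.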
How Many BackProp Nodes Should be Used?
- SFAM nodes are created automatically as required.
- With BackProp you may specify the number of nodes under advanced project configuration.
- The program defaults to the rule of thumb: number of nodes = 2 x SquareRoot(Number of Inputs + 1).
- The default number of nodes appears to work very well; however, you are free to experiment with different numbers.
- If too many nodes are used, BackProp will tend to memorize the training data instead of learning to make useful predictions.
The performance will appear very good on training data, but predictions will be poor on testing data which has not been seen during training.
- If too few nodes are used, BackProp may have difficulty making accurate predictions on the training data, but the predictions on testing data may improve.
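The default rule of thumb above can be evaluated with a short Python sketch. Rounding to the nearest whole node is an assumption; this page does not say how the program rounds, and `default_hidden_nodes` is a hypothetical name.

```python
import math

def default_hidden_nodes(num_inputs):
    """Rule of thumb from this page: 2 x SquareRoot(Number of Inputs + 1).

    Rounding to the nearest whole node is an assumption, not documented behavior.
    """
    return round(2 * math.sqrt(num_inputs + 1))

for n in (3, 8, 15, 24):
    print(n, default_hidden_nodes(n))
# 3 4
# 8 6
# 15 8
# 24 10
```

The node count grows only with the square root of the input count, so doubling the inputs raises the default by roughly 40 percent rather than doubling it.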
Which Fields Should be Used for Input?
- Input fields must be numbers, dates or times. Text fields cannot be used as inputs.
- Try to avoid using redundant input fields.
For example, bond prices and interest rates are really measuring the same thing.
Choose either bond prices or interest rates, but not both.
Reducing the number of inputs lowers the chance of discovering spurious correlations, and the project will train faster.
- If some of your records are missing input values, those records will be skipped during training and NeuNet Pro will show "N/A" as the prediction for that record.
- It is possible to estimate the missing values by creating a new project that will treat the problem column as the target prediction.
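One way to spot redundant inputs before configuring a project is to compute the correlation between candidate input columns. This is not a NeuNet Pro feature; it is a minimal Python sketch that assumes your columns are available as lists of numbers. The column names, values and the 0.95 threshold are hypothetical, and Pearson correlation detects only linear redundancy.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns (neither may be constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def redundant_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for column pairs whose |r| meets the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Made-up illustrative values, not real market data.
columns = {
    "interest_rate": [2.0, 3.0, 4.0, 5.0],
    "bond_price": [98.0, 97.0, 96.0, 95.0],
    "volume": [120.0, 80.0, 150.0, 90.0],
}
print(redundant_pairs(columns))  # [('interest_rate', 'bond_price', -1.0)]
```

A strong negative correlation counts as redundancy too, which is why the check uses the absolute value: rising rates and falling bond prices carry the same information.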
How Can Text Fields be Used for Input?
- Text fields must be converted to number fields if they are to be used for inputs.
- Suppose you have an input field that contains one of three colors: Red, Green or Blue.
These words could be converted to color number 1, 2, 3.
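- It is better to create 3 new fields for this color example, because numbering the colors 1, 2, 3 implies an order and a distance that do not exist (Green is not halfway between Red and Blue).
The three new fields could be titled "Red", "Green" and "Blue".
A color "Red" would be entered as 1, 0, 0; "Green" would be entered as 0, 1, 0; and "Blue" would be entered as 0, 0, 1.
Now it is possible to enter any mix of colors as a combination of values in the "Red", "Green" and "Blue" fields.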
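The three-field conversion described above (often called one-hot encoding) can be done while preparing your data, as in this minimal Python sketch. The names `COLORS` and `one_hot` are hypothetical helpers, not part of NeuNet Pro.

```python
COLORS = ["Red", "Green", "Blue"]

def one_hot(color, categories=COLORS):
    """Return one 0/1 value per category, with 1 in the matching position.

    An unrecognized value yields all zeros.
    """
    return [1 if color == c else 0 for c in categories]

print(one_hot("Red"))    # [1, 0, 0]
print(one_hot("Green"))  # [0, 1, 0]
print(one_hot("Blue"))   # [0, 0, 1]
```

Each resulting list supplies the values for the new "Red", "Green" and "Blue" input fields of one record.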
Setting User Min/Max:
- User Min/Max is adjusted in the Configure Project... Advanced screen.
- User Min/Max defaults to the actual minimum and maximum values found in your entire data table.
- User Min/Max is used to control the internal "normalization" of the data.
A data value greater than or equal to User Max is normalized to 1.0.
A data value less than or equal to User Min is normalized to 0.0.
All other values are scaled proportionally between 0 and 1.
For example, a value halfway between User Min and User Max is normalized to 0.5.
- In some cases, you may wish to adjust User Min/Max:
- If you expect future data to fall outside the Min/Max range of your current data,
it is helpful to widen User Min/Max to cover the expected range of future values.
For example, if all your fields are data readings from the same instrument,
you may wish to set User Min/Max to the range of that instrument.
- If your data contains a few outliers, these outliers will distort the normalization for the rest of your data.
You should narrow the User Min/Max so the normalization is scaled to the majority of your data.
For example, suppose you are working on 1000 rows of data and 997 rows have a certain field valued between 100 and 200.
Three rows have this field valued at 1000. You should decrease User Max from 1000 to around 200, so the normalization of the data reflects the majority range.
- If a large proportion of your data occurs at User Min and/or User Max,
it is helpful to widen the range of User Min/Max so there is some "elbow room" for your data.
This technique will tend to move your data out of the unresponsive tails of the sigmoid function, leading to better predictions.
For example, suppose you are trying to predict a field that has values of 0 or 1. Try setting User Min to -0.25 and User Max to 1.25, and your predictions should improve.
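The normalization rule described in this section (clamp at User Min/Max, then scale linearly) can be sketched in Python to see the effect of each adjustment. This is a model of the documented behavior, not NeuNet Pro's internal code; `normalize` is a hypothetical name, User Min = 100 in the outlier example is an assumption, and `user_max` must be greater than `user_min`.

```python
def normalize(value, user_min, user_max):
    """Clamp to [user_min, user_max], then scale linearly into [0.0, 1.0]."""
    if value <= user_min:
        return 0.0
    if value >= user_max:
        return 1.0
    return (value - user_min) / (user_max - user_min)

# Outlier example: most values lie in 100..200, a few rows hold 1000.
for user_max in (1000, 200):
    scaled = [round(normalize(v, 100, user_max), 3) for v in (100, 150, 200, 1000)]
    print(user_max, scaled)
# 1000 [0.0, 0.056, 0.111, 1.0]
# 200 [0.0, 0.5, 1.0, 1.0]

# Elbow-room example: a 0/1 target with User Min -0.25 and User Max 1.25.
print([round(normalize(v, -0.25, 1.25), 3) for v in (0, 1)])
# [0.167, 0.833]
```

With User Max left at 1000, the majority of the data is squeezed into roughly 0 to 0.11; narrowing it to 200 spreads that data across the full 0 to 1 range and clamps the outliers at 1.0. In the elbow-room case, the 0/1 targets map to about 0.167 and 0.833, values a sigmoid output can reach, whereas it approaches 0 and 1 only asymptotically.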