"Some Thoughts About Horse Race Prediction "
By Doug McCormack, CorMac Technologies Inc.
Several NeuNet purchasers report they intend to use it for horse & dog racing, as well as team sports. I have received very little feedback as to how well they are doing. Eugene Mosby firstname.lastname@example.org reported excellent results using SFAM for greyhound races. An article in IEEE Expert magazine, Dec/94, reported good results using neural nets to predict greyhound races. It was interesting to see how they structured the inputs. That is really the key.
Here are my ideas for horse racing:
SECOND: I would compile factors A,B,C,D,E for each horse in a particular race.
THIRD: I would re-state factors A,B,C,D,E for each horse, so they indicate
the amount of fall-back from the best horse in this race.
FOURTH: I would arrange a neural network pattern for each horse in the race using only one pattern per horse. Inputs are A,B,C,D,E and output is NUMBER OF LENGTHS finished behind winner. It is not right to simply predict a WIN because a win by 5 lengths is very different than a win by a nose.
The above method allows you to make predictions on races where none of the horses have met before.
A problem is that horse race data is very noisy *grin*. There are many "upsets" that occur. So you must use a very large sample to average out these upsets. I believe it should be possible to train the NN to predict the final crowd odds instead of the horse finishes. Studies have shown that crowd odds are a very accurate predictor on average. You would be using the crowd as a noise filter to reduce the size of your sample -- Just a thought.
Another method of structuring your data is called "Pairwise Matching". Each neural net pattern contains data for only two horses, and the neural net learns to predict the winner. When a group of horses meet, you run your predictions on one pair at a time.
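As a rough illustration of pairwise matching, here is a minimal Python sketch of how the patterns for one race might be generated. The horse names, the factor values, and the choice to place the two horses' factors side by side are assumptions for illustration only; the layout of a pairwise pattern is not specified above.

```python
from itertools import combinations

def pairwise_patterns(factors):
    """Build one input pattern per pair of horses in a race.

    factors: dict mapping horse name -> list of numeric factors.
    Returns a list of (horse_a, horse_b, inputs), where inputs is
    horse_a's factors followed by horse_b's factors.
    """
    patterns = []
    for a, b in combinations(sorted(factors), 2):
        patterns.append((a, b, factors[a] + factors[b]))
    return patterns

# Hypothetical factor values for a three-horse field.
field = {"A": [1.0, 2.0], "B": [0.5, 3.0], "C": [2.0, 1.0]}
for a, b, inputs in pairwise_patterns(field):
    print(a, "vs", b, inputs)
```

A field of n horses produces n*(n-1)/2 pairs, so the number of predictions per race grows quickly with field size.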
I became interested in horse racing 18 years ago when I was living in Edmonton, just a few blocks from the Northlands Racetrack. At that time I was reading every possible library book on stock market prediction, and I stumbled upon a chapter about horse race prediction. I had never even been to a horse race, but it appealed to my mathematical nature. I spent many hours over the next few years developing and testing various systems. I bought books by Ainslie, Gibson, Sullivan, Thorp, Beyer, Permutter, Mitchell, Barr, Bauman, Quirin, Davidowitz, and Ziemba. I borrowed many library books. The two best books were "Beat The Racetrack" by Dr. William Ziemba and "The Science of Winning" by Fabricand.
I developed a system that really works! However since moving back to my home town of Thunder Bay, the nearest racetrack is 400 miles away; so I have kind of lost interest. My memory is a little rusty, but I will try to summarize some of my studies and opinions for you.
The problem with most handicapping systems is that they are obsessed with trying to select the horse most likely to win. They don't look for the best bet! Suppose the Ainslie pick will win 2 out of 3 times, but will pay only a 40% profit if it wins. This horse is the one most likely to win, but repeated bets on this horse will be almost certain to lose your money. Another horse will win only 1 time out of 10, but will return 12 times your money. This horse may be the least likely to win, but may be the best bet. Repeated bets on this horse will almost certainly make money (long term). Please don't interpret this example to mean that long shots are a better bet (actually the reverse is often true). This example shows that it is not enough to know the most likely winner. You must look at each horse's probability of winning and compare that to the reward offered.
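The two bets above can be checked with a few lines of Python. This sketch uses the relation Expectation = Probability * (Odds+1) - 1, where odds means profit per unit staked. It assumes "return 12 times your money" means a total return of 12 units per unit bet, which is odds of 11 to 1. The function name is illustrative.

```python
def expectation(win_probability, odds):
    """Expected profit per unit staked.

    odds is the profit paid per unit staked when the horse wins
    (0.4 for a 40% profit, 11 for a total return of 12 times the stake).
    """
    return win_probability * (odds + 1) - 1

favourite = expectation(2 / 3, 0.4)   # wins 2 of 3, pays a 40% profit
long_shot = expectation(1 / 10, 11)   # wins 1 of 10, returns 12x the stake

print(f"favourite: {favourite:+.3f} per unit bet")
print(f"long shot: {long_shot:+.3f} per unit bet")
```

The favourite loses about 6.7 cents per dollar bet on average, while the long shot gains 20 cents, even though it wins far less often.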
Once you have identified an attractive bet, the mathematics gets really interesting. How much of your bankroll should you risk on this horse? Gamblers refer to this question as "Money Management", while investors call it "Asset Allocation". I did some deep calculus on this problem and came up with a paper I call the MARS Equation (Maximized Asymptotic Return Series). This equation, when reduced for horse racing, says the maximum long-term growth of your bankroll will occur when

Wager in % of current bankroll = 100% * [(odds+1) * chance of winning - 1] / odds

Suppose a neural network determines that a horse has a 40% chance of winning, and the horse goes off at odds of 3 to 1. Your optimal bet is 100% * [(3+1) * 0.40 - 1] / 3 = 20% of your current bankroll.
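The wager formula is easy to put into code. The Python sketch below is a direct transcription of it. The function name is illustrative, and returning zero when the formula goes negative (no bet on an unattractive horse) is an added assumption rather than something stated above.

```python
def mars_wager_fraction(odds, win_probability):
    """Fraction of the current bankroll to wager on one horse.

    Implements [(odds + 1) * win_probability - 1] / odds.
    A negative result means the bet has negative expectation, so the
    function returns 0.0 in that case (an assumption of this sketch).
    """
    fraction = ((odds + 1) * win_probability - 1) / odds
    return max(fraction, 0.0)

# The example from the text: 40% chance of winning at odds of 3 to 1.
print(f"{mars_wager_fraction(3, 0.40):.0%} of bankroll")
```

Because the wager is a fraction of the current bankroll, the bet size rises and falls with the bankroll itself.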
The odds can easily be determined by studying the tote board just before post time. But how do you get an accurate assessment of the probability of winning? There are many handicapping methods that will indicate the relative probabilities of each horse. I experimented with multivariable regression. One of the most accurate indicators of win probability is simply what portion of the crowd's money is bet on each horse. If a horse receives 20% of the win pool, it has a 20% chance of winning. Economists refer to this situation as an "Efficient Market". One major study showed that the crowd's efficiency was slightly biased toward long-shots. If we assume that the crowd is perfectly efficient and there is a 17% track take with nickel breakage, a horse's chance of winning will be 83% / (odds+1.05).
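As a quick check of the efficient-crowd formula, here is a small Python sketch. The function and parameter names are illustrative. The 0.83 and 0.05 defaults are simply the 17% track take and the nickel-breakage allowance quoted above.

```python
def crowd_win_probability(odds, payout_share=0.83, breakage_allowance=0.05):
    """Win probability implied by tote-board odds for an efficient crowd.

    payout_share: portion of the win pool returned to bettors (1 - track take).
    breakage_allowance: adjustment for nickel breakage.
    """
    return payout_share / (odds + 1 + breakage_allowance)

for odds in (0.2, 1.0, 3.0, 10.0):
    print(f"odds {odds:>4}: {crowd_win_probability(odds):.1%}")
```

A different track take or breakage rule can be tried by changing the two keyword arguments.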
The win rate that horses should show at each level of odds, compared with what they actually won:

Odds     Should win    Actually won
 0.2       66.4%          86.7%
 0.5       53.5%          66.7%
 1.0       40.5%          46.8%
 1.5       32.5%          30.8%
 2.0       27.2%          27.2%
 3.0       20.5%          16.7%
 4.0       16.4%          15.7%
 5.0       13.7%          13.0%
10.0        7.5%           5.8%
15.0        5.2%           4.4%
20.0        3.9%           2.8%
30.0        2.7%           2.1%
50.0        1.6%           0.7%

If you let the crowd determine the probabilities for you, you will never find an attractive bet, because the payback is calculated from the same crowd numbers, less track-take. Any truly winning system must determine a horse's win probability to be significantly greater than the crowd's consensus. Your number must be right and the crowd consensus must be wrong. This is a tall order, because the above table shows the crowd's consensus is not wrong by much.
You must be very careful not to fool yourself into believing you have a winning system!!!!! If you try hard enough and long enough, you will eventually predict five coin tosses in a row. Your brain starts to truly believe you have a system to predict coin tosses, when your success is totally due to luck.
Anyone can cook-up a system that works well on a single previous race. He then "fine tunes" his system so it works well on a whole bunch of previous races. Actually any group of previous races can be considered as just one very large previous race. This system may be of no value for future races.
It is quite possible that a "lucky" player using a bad system will outperform someone who is using a truly winning system. Your results will be a mixture of luck and your system. The mechanics of horse racing are such that the performance of your system tends to be overshadowed by giant waves of luck (both bad and good). Lady Luck will trick you into thinking you have a winning system when you don't. Or she will cause you to reject a truly winning system. It is a huge task to properly evaluate a system (refer to "Beat the Racetrack" by Dr. William Ziemba).
MY WINNING SYSTEM:
I used the tote-board win pools to calculate the crowd's probabilities for each horse. These numbers were adjusted slightly to allow for the crowd's tendency to under-bet favourites and over-bet long shots (as shown in the above table). At this point I assumed I had accurate win probabilities.
Knowing the win probabilities, it is possible to calculate the place and show probabilities for each horse. I calculated hundreds of different scenarios for the first three horses to cross the finish line. Each scenario had a certain probability and a certain payback. Probability times payback equals expectation. For each horse I summed the expectation for all possible place and show scenarios. I was delighted to find that distortions in the place and show pools caused occasional cases of positive expectation. The crowd was inefficient because the win/place/show pools did not have the correct mathematical relationship to each other. I used my MARS equation to place my bets. I purchased a Radio Shack Model 100 laptop computer to run my program. The computer attracted plenty of interest in those years. The computer required three minutes to perform the calculations, so I would enter the pool numbers at four minutes before post, then I would stand in line at the wickets. By the time I got to the front, the computer would have the bet(s) calculated.
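The exact formula used to turn win probabilities into place and show probabilities is not given here. One widely used assumption is the Harville model: once a horse has finished, the remaining horses' win probabilities are rescaled to sum to one and the process repeats for the next position. The Python sketch below enumerates every ordered top-three scenario under that assumption. The win probabilities are hypothetical, and the payback side of each scenario (which depends on the place and show pools) is not modelled.

```python
from itertools import permutations

def harville_in_the_money(win_probs):
    """Place and show probabilities per horse under the Harville assumption.

    win_probs: list of win probabilities summing to 1 (at least 3 horses).
    Returns (place, show) where place[i] is the chance horse i finishes
    first or second, and show[i] the chance it finishes in the top three.
    """
    n = len(win_probs)
    place = [0.0] * n
    show = [0.0] * n
    for first, second, third in permutations(range(n), 3):
        p1 = win_probs[first]
        # Rescale the remaining horses after each finisher is removed.
        p2 = win_probs[second] / (1 - p1)
        p3 = win_probs[third] / (1 - p1 - win_probs[second])
        scenario = p1 * p2 * p3
        for horse in (first, second):
            place[horse] += scenario
        for horse in (first, second, third):
            show[horse] += scenario
    return place, show

# Hypothetical win probabilities for a four-horse field.
place, show = harville_in_the_money([0.4, 0.3, 0.2, 0.1])
print([round(p, 3) for p in place])
print([round(s, 3) for s in show])
```

The place probabilities sum to 2 and the show probabilities sum to 3, since two and three horses respectively finish in the money in every race.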
In 1985, I purchased a book called "Beat The Racetrack" by a mathematician named Dr. William Ziemba. I was shocked to find his "DR. Z METHOD" was the same as mine. He makes some simplifying assumptions, so the calculations can be done more easily and quickly. He even has precalculated graphs you can use. The most interesting part was his "proof" that the method works. His graphs show how even a successful system is overshadowed by waves of luck.
A POSSIBLE FUTURE SYSTEM:
I believe an ideal system is one that would place a number of offsetting bets on the same race, so the effects of luck would be cancelled, and you would see a slow, steady growth in your bankroll. I expanded my MARS equation to include multiple bets, but the math became very difficult with complex numbers and imaginary roots to multiple equations. I obtained a copy of a paper published in the Bell System Technical Journal, July 1956. The title is "A New Interpretation Of Information Rate" by J. L. Kelly Jr.
Kelly talks about sending bits of information into a noisy phone line. Each bit has a certain probability of being clobbered before it arrives at its destination. He places a bet on each bit, then tries to optimize the growth of his money. He shows that his equation is identical to the "Shannon Theory" for transmission rate over a noisy channel. He then goes on to illustrate how this theory could be used for betting on a horse race.
Today, some gamblers talk about a "KELLY" betting system, which means they always bet some judicious portion of their current bankroll. They will never go broke because they bet only a portion of what they have left. If their bankroll grows, their bets become larger. I did some research into Kelly betting systems with extremely interesting conclusions. This work led up to my MARS equation.
I have read the Kelly paper dozens of times, taking the math line by line; but I always get lost at one point about half of the way through. Therefore I must confess that I do not understand Kelly's paper. However, his conclusions for multiple bets on a horse race are summarized as follows:
Assuming each horse i has a win probability Pi and odds Oi,
Expectation = Probability * (Odds+1) - 1
NeuNet Tip #1:
NeuNet Tip #2:
for example the factors for three horses are
Assuming a three horse race, the pattern would be
Using this method, the NN has a chance to learn that you are talking about horse #1 (the first 4 inputs), while it has a chance to size up the competition.
The second method is the one shown in the IEEE article.
Here you enter only one horse per pattern, but the pattern is adjusted
to show how this horse stacks up against the competition. For example,
assume your horse #2 came first, #1 came second and #3 came third.
Each input should be adjusted to show the difference between it and best
in the race for each input.
With this method, the patterns can be shuffled into any order, and you don't have to worry about the number of horses in the race. I would recommend you make an entirely different project for races of different lengths. I would try to predict LENGTHS BEHIND WINNER instead of finish position.
We wanted to include a horse racing example with NeuNet, but we became discouraged at the amount of data that is required in order to average out all the "UPSETS" that occur. Perhaps predicting the FINAL ODDS instead of LENGTHS would act as a filter to reduce the data requirements. You would simply rank the predicted finish to match the predicted ODDS.
It is a tedious job to extract data from thousands of races by hand. Perhaps you can purchase this data on CD-ROM or download from the internet. If anyone knows a good source, please post a message in the NeuNet forum.
The IEEE article uses factors like % wins this year, % places, % shows, average $ winnings per race this year and last year, fastest time in last 5 races, average finish position in last 5 races.
NeuNet Tip #3:
NeuNet Tip #4:
If your horse has a 45% chance of winning,
NeuNet Tip #5:
HORSE1 190,110,300,500:Lost by 0.0 lengths (WINNER)
The NN inputs would be:
10,00,00,05:0.0
20,10,30,00:3.0
00,00,20,10:0.1
Notice how each input is the difference between this horse and the best in the race FOR THAT ONE FACTOR. For example the 10 for horse1 comes from comparing horse1's 190 to horse3's 200. This method should predict the finish order (and gaps between each horse - great for exotic bets). You could run a prediction on any one horse alone. You could have up to 254 different factors considered for each horse. Do not mix races of different lengths. If POST POSITION is one of your factors, do not mix races of different field sizes.
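The difference-from-best adjustment can be sketched in a few lines of Python. Only HORSE1's raw factors and horse3's 200 are given above. The remaining raw values for HORSE2 and HORSE3 below are illustrative, back-calculated so that the differences match the three patterns shown, and the sketch assumes a larger value is better for every factor.

```python
def fall_back_inputs(race):
    """Replace each factor by its gap to the best value in the race.

    race: dict mapping horse name -> list of raw factor values.
    The best value of each factor is assumed to be the largest one.
    """
    best = [max(column) for column in zip(*race.values())]
    return {
        horse: [b - f for b, f in zip(best, factors)]
        for horse, factors in race.items()
    }

race = {
    "HORSE1": [190, 110, 300, 500],  # raw factors given in the text
    "HORSE2": [180, 100, 270, 505],  # illustrative, back-calculated
    "HORSE3": [200, 110, 280, 495],  # illustrative, back-calculated
}

for horse, inputs in fall_back_inputs(race).items():
    print(horse, inputs)
```

For a factor where smaller is better, such as a race time, min would replace max and the subtraction would be reversed.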
I hope this long message gives you some food for thought. Comments are welcome in the NeuNet Discussion Forum at http://www.cormactech.com/neunet
Congratulations on winning the UK national competition. You must have put a lot of work into your horse racing program, or you are very lucky *grin*.
You are correct that North American racing is much different. Our odds are calculated from the tote board numbers. Our PLACE and SHOW are different from yours.
About 2 years ago I had a long series of emails from a chap in Australia who was very interested in this subject. He had a database of 40,000 races and he was looking for someone to develop a program using his data.
His name is John Wikman. His email was email@example.com. You might try dropping him an email to see if he is still interested in this subject. I recall that his database of Australian races was based on the UK model, not the North American model.
From: Miles Michelson
Doug, interested in your MARS equation as you termed it on your web page.
Looking for a few good betting strategies and you seem to know what you're
talking about mathematically.
Also, on the Kelly front, I just finished a Kelly Calculator for Win95 this
afternoon http://www.dmtc.com/tools/wagering_tools.html to download a
copy. I spent the better part of the last two days digesting his AT&T paper
on transmission rates and finally figured out the complete algorithm. One
small thing about your Kelly summary that isn't complete. When you say: