TeraForge Tutorial Data Model

For those with little or no exposure to thoroughbred racing, this brief discussion should help you follow the remainder of the tutorial. (If you're already comfortable with horse racing concepts, you can skim the data model section below, then proceed to the first tutorial.)

Horse racing has been a part of our social fabric since the earliest days of civilization. Thoroughbred horse racing is the most basic organized form of the sport, where distance, speed, class, and minimal equipment determine which horse is best. While horse racing has a colorful, exciting, and occasionally seamy history, we will focus on the details and jargon of the sport with respect to managing a database that will assist in picking winners of races, a process known as handicapping.

As you are probably aware, thoroughbred horse racing is a sport that encourages - indeed, thrives on - wagering on race outcomes. One interesting bit of history: US President Andrew Jackson collected the money to finance his presidential election bid by winning a horse race.

Data Model Definitions

Races are run

The surface conditions for a given race may be affected by the weather; e.g., after a heavy rain, the dirt track may be muddy or sloppy, and a turf course may be "yielding".

There are different types of races, and each type has its own set of classes:

Each race also has a purse, the amount of money awarded to the winner (with fractional portions awarded to the 2nd, 3rd, etc. horses). The type, class, and purse of a race are considered an overall guide to the quality of horses likely to be running in the race.

From a handicapping perspective, another important aspect of a race is its fractions, which indicate the amount of time for the leading horse to reach a certain intervening point in the race, measured from the start of the race. For example, a 1 mile race may have fractions recorded at the 1/4 mile, 1/2 mile, 3/4 mile and final 1 mile distances. When these fractional times are recorded, the distance from each horse to the horse behind it is also recorded, measured in lengths. A length is a generalized measurement of the distance from the shoulders of a horse to its backside, approx. 7 feet. For horses closer to each other than a length, fractional lengths may be used, or the terms "nose", "head", or "neck" may be used to indicate being ahead by a nose, head, or neck. The combination of fractions is useful in handicapping for several reasons:

One additional aspect of a race is the number of horses which come out of it and win or place in their very next race; a race that produces several such horses is known as a key race. A horse which comes out of a key race may have lost against other high quality horses, and thus may be more capable than its actual finish position indicates.
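The key race count described above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own implementation: the tuple layout, the race identifiers, and the sample rows are all hypothetical, and "win or place" is taken to mean a 1st or 2nd place finish.

```python
from collections import defaultdict
from datetime import date

# Each result is (horse, race_id, race_date, finish_position).
# The layout and the sample rows are hypothetical, for illustration only.
results = [
    ("Alpha", "R1", date(2004, 3, 1), 3),
    ("Bravo", "R1", date(2004, 3, 1), 5),
    ("Cairo", "R1", date(2004, 3, 1), 1),
    ("Alpha", "R2", date(2004, 3, 15), 1),
    ("Bravo", "R3", date(2004, 3, 20), 2),
    ("Cairo", "R4", date(2004, 3, 22), 6),
]

def key_race_counts(results):
    """Count, per race, the horses that won or placed in their very next start."""
    by_horse = defaultdict(list)
    for horse, race_id, race_date, finish in results:
        by_horse[horse].append((race_date, race_id, finish))

    counts = defaultdict(int)
    for starts in by_horse.values():
        starts.sort()  # chronological order for this horse
        # Pair each start with the same horse's following start.
        for (_, race_id, _), (_, _, next_finish) in zip(starts, starts[1:]):
            if next_finish <= 2:  # 1 = win, 2 = place
                counts[race_id] += 1
    return dict(counts)

print(key_race_counts(results))  # {'R1': 2}
```

Here Alpha and Bravo both lost race R1 but finished 1st and 2nd in their next starts, so R1 is credited with two such horses. How many are needed before a race is treated as a key race is left open, as in the text.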

A race entry consists of:

  1. the horse (of course)
  2. the jockey that rode the horse,
  3. the trainer who cares for and schools the horse
  4. the post position (the position in the starting gate)
  5. position in the final order of finish
  6. position relative to the lead horse at the various fractions
  7. any special equipment (e.g., blinkers)
  8. any medications (e.g., Lasix) used by the horse
  9. the morning line, i.e., opening odds that the entry will win the race
  10. the final odds that the entry will win the race, determined by the amount bet on the entry relative to the total amount bet on the race
  11. brief commentary on the horse's trip in the race

Each of these is an important aspect for handicapping (some more important than others, but all useful to know).

For those interested in wagering, it is important to know the morning line, which is assigned to the horse by handicapping specialists based on the estimated quality of the horse as determined from its prior race history, its breeding, or other factors. Comparing the morning line to the current or final odds on the horse when the race actually starts can be important in gauging the knowledge of both the betting public and the handicappers who set the morning lines, and is useful for determining the "value" of betting a horse. For example, a horse that won a previous race in which the pace was actually below par may be mistakenly assigned low odds to win its next race, while a horse that came in 4th in a very fast race may be given higher odds, and thus present a better betting value.
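As a rough illustration of how final odds follow from the amounts bet, and how they can be compared to the morning line, here is a small Python sketch. The function names, the `takeout` parameter, and the sample amounts are assumptions made for this example; real tracks deduct a takeout from the pool and round the payoffs, and the exact rules vary by track.

```python
def final_odds(amount_on_entry, total_pool, takeout=0.0):
    """Approximate win odds (to 1) from the amount bet on an entry.

    takeout is the fraction of the pool withheld by the track; it defaults
    to 0.0 here only to keep the arithmetic easy to follow.
    """
    if amount_on_entry <= 0:
        raise ValueError("amount_on_entry must be positive")
    net_pool = total_pool * (1.0 - takeout)
    return (net_pool - amount_on_entry) / amount_on_entry

def odds_drift(morning_line, final):
    """Positive when the final odds ended up higher than the morning line."""
    return final - morning_line

odds = final_odds(amount_on_entry=2000.0, total_pool=10000.0)
print(odds)                                      # 4.0, i.e. 4-1
print(odds_drift(morning_line=3.0, final=odds))  # 1.0
```

In this made-up case the entry opened at 3-1 and went off at 4-1, so the public bet it less heavily than the morning line suggested. Whether that is real value depends on the horse's past performances.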

Finally, many tracks provide brief commentaries on how a horse ran in a given race. These comments can shed light on why a particular horse performed unexpectedly well ('on the rail, clear, driving', indicating a perfect, easy trip) or poorly ('bumped gate, pushed 4 wide entering, bumped stretch', indicating a very rough trip indeed).

So when we handicap a race, we want to compare the horses based on their past performances (aka PPs), looking for horses that ran very well but lost due to a fast pace or a rough trip, and thus may have higher odds than they deserve, and for horses that won races, perhaps due to a slow pace or a perfect trip, and thus may be getting lower odds than they deserve.

NOTE: There are numerous other factors used in handicapping, e.g., breeding, "figures", etc., which we'll disregard for purposes of this tutorial.

Data Model

Our data model is very simple: we will track individual races and the entries in those races. We will also keep a summary table of key races, and a summary table of par fractions for each track, surface, distance, and race type and class. We will also track the wagering results in each race.

Our race table contains the following columns:

The primary key of this table will be the track, racedate, and racenumber.

Our entries table has

The track, racedate, and horse are the primary key of the entries table.
The track, racedate, and race number are the foreign key into the race table.
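The key relationships between the two tables can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than Teradata, so the DDL syntax and column types are assumptions; only the key columns named in the text are included, and the track code and horse names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE race (
    track      TEXT    NOT NULL,
    racedate   TEXT    NOT NULL,
    racenumber INTEGER NOT NULL,
    PRIMARY KEY (track, racedate, racenumber)
);
CREATE TABLE entries (
    track      TEXT    NOT NULL,
    racedate   TEXT    NOT NULL,
    racenumber INTEGER NOT NULL,
    horse      TEXT    NOT NULL,
    PRIMARY KEY (track, racedate, horse),
    FOREIGN KEY (track, racedate, racenumber)
        REFERENCES race (track, racedate, racenumber)
);
""")

# Hypothetical sample rows: one race and one entry in it.
conn.execute("INSERT INTO race VALUES ('XYZ', '2004-03-01', 1)")
conn.execute("INSERT INTO entries VALUES ('XYZ', '2004-03-01', 1, 'Alpha')")

# An entry that points at a race which does not exist should be rejected.
try:
    conn.execute("INSERT INTO entries VALUES ('XYZ', '2004-03-01', 9, 'Bravo')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```

Keying the entries table on track, race date, and horse reflects the assumption that a horse runs at most once per day at a given track, while the race number ties each entry back to its race.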

Racing Style

We derive a racing "style" indication for the horse in the race using the fractional positions.

These letters are sequenced into a string that provides a simple shorthand to describe a horse's running style in the race, e.g. 'PPPP' is a horse that ran with the lead the entire trip, while 'TTSP' indicates a stalking horse that saves energy for the final burst to the finish.

When handicapping a race, we will only use the 5 most recent races for each horse entered. We join the race table to the entries table on the track, race date, and race number to assemble the full past performance data. We also join to the keyrace table on track, race date, and race number to retrieve any key race information. Finally, we join to the track pars table on track, distance, surface, race type, and race class to provide comparative pars for the fractions of the race.
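The first of these joins, together with the "5 most recent races" rule, can be sketched in plain Python as below. This is only an illustration of the logic: the dictionary layout, the `distance` value, and the sample rows are hypothetical, and race dates are assumed to be ISO-formatted strings so that they sort chronologically. The key race and track pars lookups would follow the same pattern with their own join keys.

```python
from collections import defaultdict

def past_performances(races, entries, limit=5):
    """Join entries to races and keep each horse's most recent `limit` starts."""
    race_by_key = {
        (r["track"], r["racedate"], r["racenumber"]): r for r in races
    }
    by_horse = defaultdict(list)
    for e in entries:
        race = race_by_key.get((e["track"], e["racedate"], e["racenumber"]))
        if race is not None:  # inner join: skip entries with no matching race
            by_horse[e["horse"]].append({**race, **e})
    return {
        horse: sorted(rows, key=lambda row: row["racedate"], reverse=True)[:limit]
        for horse, rows in by_horse.items()
    }

# Hypothetical sample data: one horse with seven starts at one track.
races = [
    {"track": "XYZ", "racedate": f"2004-03-{day:02d}", "racenumber": 1, "distance": "6f"}
    for day in range(1, 8)
]
entries = [
    {"track": r["track"], "racedate": r["racedate"], "racenumber": 1, "horse": "Alpha"}
    for r in races
]

pps = past_performances(races, entries)
print([row["racedate"] for row in pps["Alpha"]])
# ['2004-03-07', '2004-03-06', '2004-03-05', '2004-03-04', '2004-03-03']
```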

Collecting Race Data

Historical race result data (known as a "result chart") is available from the Equibase website as HTML-formatted pages. In the interest of brevity, we'll skip the details of retrieving and processing this data; instead, for purposes of this tutorial, a small sample of data has been collected into a VARTEXT-formatted file, along with an associated BTEQ load script used to create and load the base tables.

Copyright© 2004, Presicient Corporation, USA. All rights reserved.
Teradata® is a registered trademark of NCR corporation.