Saturday, November 5, 2016

Watchout Ratings: What They Mean, Part III

Implications and Value in Assessing Performance

I have said this many times before, and I am creating this post to emphasize a point brought up in the comments recently: my ratings, like any other performance evaluation, are inherently imperfect. Their purpose is not to pinpoint the exact caliber of every athlete's performance (that would be the ideal, but realistically it is impossible). Rather, the purpose is to estimate roughly how Race A compares to Race B, based on how Athlete A's time compared to the rest of the field in the race, as well as the estimated caliber of the field at that race.

I use these ratings as a tool to predict potential future outcomes. Here is a very quick and rough lesson in the statistics of it all.

For any mid- or late-season race, the standard deviation of the field I am comparing (after eliminating all outliers) usually ends up at around 0.80-1.80% of the race rating. In general, it is closer to 1.80% if the races are about 3 weeks apart with fewer common athletes to compare, and closer to 0.80% if the races are only 1-2 weeks apart with a fair number of common athletes to compare. As an example, the AVERAGE standard deviation for the Washington State Qualifier meets ended up being 1.08%, with the individual meets as follows:

4A District II = 1.08%
Westside Classic = 0.79%
District I = 0.88%
4A Eastern Regional = 1.20%
2A District IV = 1.93%
2A Eastern Regional = 1.58%
3A District II/V+ = 0.81%
4A District VI = 1.35%

Oregon State Qualifiers had similar standard deviations:

6A-1 = 1.28%
6A-2 = 1.18%
6A-3 = 1.71%
6A-4 = 1.26%
6A-5 = 1.53%
6A-6/5A-3/4A-5 = 1.26%
5A-1 = 1.34%
5A-2 = 1.09%
5A-4 = 1.64%
5A-5 = 1.20%
4A-1 = 1.45%
4A-2 = 0.82%
4A-3 = 1.54%
4A-4 = 1.36%
4A-6 = 1.98%
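A minimal sketch of how a field standard deviation like the ones above could be computed, in Python. It assumes that each common athlete's rating change can be approximated by the percent change in his or her time from Race A to Race B, and that outliers are trimmed with a simple 1.5×IQR rule; both are illustrative assumptions, not necessarily how these ratings are actually computed. The function names and the race data are hypothetical.

```python
import statistics

def percent_changes(race_a, race_b):
    """Percent change in time from Race A to Race B for athletes who ran both.

    race_a and race_b map a (hypothetical) athlete name to a time in seconds.
    """
    common = sorted(race_a.keys() & race_b.keys())
    return [100.0 * (race_b[name] - race_a[name]) / race_a[name] for name in common]

def trim_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; an assumed outlier rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def field_sd_percent(race_a, race_b):
    """Sample standard deviation (in %) of the trimmed percent changes.

    The mean change reflects course and conditions; the spread around it is
    what this sketch treats as the field's standard deviation.
    """
    trimmed = trim_outliers_iqr(percent_changes(race_a, race_b))
    return statistics.stdev(trimmed)

# Hypothetical example: five consistent runners and one outlier (10% slower).
race_a = {"A": 1000, "B": 1000, "C": 1000, "D": 1000, "E": 1000, "F": 1000}
race_b = {"A": 1010, "B": 1020, "C": 1015, "D": 1005, "E": 1012, "F": 1100}
print(f"{field_sd_percent(race_a, race_b):.2f}%")  # prints 0.56% (the 10% outlier is trimmed)
```

`statistics.quantiles` needs Python 3.8 or newer. The IQR rule is only one common choice; any outlier filter that removes the "extreme race" cases before measuring the spread would fit the same idea.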

If high school XC performances from week to week resembled a normal distribution (they don't, but once outliers are removed, let's say they get close enough to make this general point), then we can put rough probabilities on the range of performances from one race to another. In a normal distribution, about 68% of the data falls within one standard deviation of the mean (average), and about 95% falls within two standard deviations of the mean. Let's treat those percentages as the chances of a runner ending up within 1 or 2 standard deviations of the estimated performance from Race A to Race B, IF the runner's performance isn't an outlier.
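The 68% and 95% figures come straight from the normal distribution: the probability of landing within k standard deviations of the mean is erf(k/√2). A short Python check using the standard library's `math.erf` (the helper name `prob_within` is just illustrative):

```python
import math

def prob_within(k):
    """Probability that a normally distributed value lands within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within {k} SD: {prob_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
```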

Hypothetical scenario: I rate a race with a fairly low standard deviation (1.20%). An athlete runs 16:00 (960 seconds). Given the statistics above, if that was not an outlier performance (and he doesn't have an outlier performance in the hypothetical next race), he would have about a 68% chance of finishing within about 11.5 seconds of that time in a similarly rated race (1.20% of 960 seconds), and about a 95% chance of finishing within about 23 seconds (two standard deviations). To use the Washington 4A Projections as an example, Central Valley's Gabe Romney fits that description. The implication is that (according to my ratings) if everyone in the field has roughly the same performance as they did at their League/State Qualifier race, he has about a 68% chance to finish between 10th and 26th, and about a 95% chance to finish between 6th and 40th, assuming he doesn't have an extreme race (i.e., an outlier performance).

Or, to look at it another way, let's say an athlete runs 10:30 on the track in a normal situation: at a standard deviation of roughly 0.80%, my ratings would have about a 68% chance of estimating that performance as somewhere between 10:25 and 10:35, and about a 95% chance of estimating it as somewhere between 10:20 and 10:40. The ratings provide a ballpark estimate, not a precise measurement.
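The arithmetic behind these windows is just the time multiplied by the standard deviation (as a fraction) and by the number of standard deviations. A small Python sketch, with hypothetical helper names, that reproduces the 16:00 example at 1.20%:

```python
def parse_time(mmss):
    """Convert 'M:SS' (or 'M:SS.s') to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)

def format_time(total_seconds):
    """Convert seconds to 'M:SS.s', rounding to tenths first so 59.96 never shows as 60.0."""
    minutes, seconds = divmod(round(total_seconds, 1), 60)
    return f"{int(minutes)}:{seconds:04.1f}"

def performance_window(mmss, sd_percent, k):
    """Return (half-width in seconds, fastest, slowest) for a +/- k SD window."""
    t = parse_time(mmss)
    half_width = t * sd_percent / 100 * k
    return half_width, format_time(t - half_width), format_time(t + half_width)

print(performance_window("16:00", 1.20, 1))  # about 11.5 s: 15:48.5 to 16:11.5
print(performance_window("16:00", 1.20, 2))  # about 23.0 s: 15:37.0 to 16:23.0
```

Plugging in roughly 0.80% instead, `performance_window("10:30", 0.80, 1)` gives 10:25.0 to 10:35.0, in line with the track example.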

This is the goal of my ratings: to put athletes, and teams, into general packs that describe how they compare to one another. When many athletes and teams are similarly rated, the projections are highly volatile (in other words, the “form charts” don’t mean very much in these situations). Most athletes/teams will find themselves in that situation when they go up against similar athletes/teams, which for the best athletes/teams doesn’t really happen until meets like NXR/NXN and FLW/FLN, though it sometimes also happens at major invitationals and their state meet.
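To see why tightly packed fields make projections volatile, here is a hypothetical Python sketch: hold everyone else's projected time fixed, move one runner across his ±1 standard deviation window, and count how many places that window spans. Both fields below are invented for illustration; the function names and numbers are not from the ratings.

```python
def place(time_s, others):
    """Finishing place if everyone else runs exactly their projected time."""
    return 1 + sum(1 for t in others if t < time_s)

def place_range(time_s, sd_percent, others, k=1):
    """Best and worst place within a +/- k SD window around time_s."""
    half_width = time_s * sd_percent / 100 * k
    return place(time_s - half_width, others), place(time_s + half_width, others)

# Hypothetical fields of other runners' projected times (seconds).
packed = [930 + i for i in range(60)]        # one runner every second, 15:30 to 16:29
spread = [930 + 5 * i for i in range(20)]    # one runner every 5 seconds, 15:30 to 17:05

print(place_range(960, 1.20, packed))  # (20, 43): a 23-place swing
print(place_range(960, 1.20, spread))  # (5, 10): a 5-place swing
```

The same 1.20% window covers far more places when the field is bunched, which is the situation where the form chart says the least.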

EDIT: Post-State Meet update. These are the standard deviations for each of the five mainland NW State meets; the smaller the standard deviation, the better the ratings ended up fitting the results.

Idaho State = 1.11%
Montana State = 0.92%
Wyoming State = 0.87%
Oregon State = 0.82%
Washington State = 0.70%
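For a sense of scale, here is a short Python sketch converting each state meet's standard deviation into seconds for a hypothetical 16:00 (960-second) runner, the same reference time as the earlier scenario, sorted from best fit to worst. The 16:00 reference is an assumption chosen for illustration.

```python
STATE_MEET_SD_PERCENT = {
    "Idaho State": 1.11,
    "Montana State": 0.92,
    "Wyoming State": 0.87,
    "Oregon State": 0.82,
    "Washington State": 0.70,
}

REFERENCE_TIME_S = 960  # assumed 16:00 reference, as in the hypothetical scenario

def sd_seconds(sd_percent, time_s=REFERENCE_TIME_S):
    """One standard deviation, in seconds, at the given time."""
    return time_s * sd_percent / 100

# Smallest standard deviation first (best fit of the ratings).
for meet, sd in sorted(STATE_MEET_SD_PERCENT.items(), key=lambda item: item[1]):
    print(f"{meet}: {sd:.2f}% = about {sd_seconds(sd):.1f} s per SD at 16:00")
```

At that reference time, Washington's 0.70% works out to about 6.7 seconds per standard deviation and Idaho's 1.11% to about 10.7 seconds.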
