Sunday, October 2, 2016

Watchout Ratings: What they mean, part 2

Continuing the discussion on what my ratings mean, here is a chart that plots athletes' times at the 2015 Washington State XC meet against their 3200m times (the average of the 2015 SB and the 2015-2016 PR). The times, of course, are in raw seconds, with 3200m times on the left (y) and State XC times on the bottom (x). As you can see in the legend, the blue dots and (linear) trendline represent ALL of the data (n = 437), while the orange dots and (linear) trendline represent only the marks that were not statistical outliers (n = 313). The two formulas at the left are for the trendlines: the one on top is the raw-data trendline and the one on the bottom is the final trendline.

Keep in mind, as always, that these are all ROUGH estimates. There are lots of things to consider, such as whether the State XC race was a particularly good or bad race for a given athlete (you are comparing against a previous SB and a later PR, of course), and some athletes are simply going to be better at one event than the other.

Using these formulas, for example, would show that a 15:20.2 at the Washington State XC meet last fall would be on par with a 9:20.30. One of the more extreme marks that was still not an outlier belonged to Peter Hogan of Bishop Blanchet, who placed third in the 3A race in 15:20.2 and whose averaged 3200m time was 9:34.34: he ran 9:20.36 later that year, but had only run 9:48.32 the year prior.

On the other side of the extreme, but still not an outlier, an example would be the 18:32.3 at the Washington State XC meet run by Rachel Kostama of Puyallup (4A #14). The previous year she ran 11:02.05 and this spring she ran 10:45.63, while the formula would equate an 18:32.3 to an 11:13.72. In other words, her track marks were better than her XC performance would suggest both the spring before and the spring after. It could be argued that she didn't have a good race at State XC (IMO she ran slightly better at the Westside Classic, finishing fourth, while she was the fifth finisher from the region at State), although it would seem more like she was just better in track than XC last fall.

A third example, of someone closer to the middle, would be Heidi Smith of Glacier Peak (3A #10): she ran 18:53.5 at State (equivalent to 11:26.23), compared to 11:23.1h in 2015 and 11:30.65 in 2016 (note: the data above used only FAT marks, so her 2015 mark, as well as her 2015-2016 PR included in the data, was actually her 11:26.98 at the 2015 WESCO Championships).
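For anyone who wants to reproduce the "averaged 3200m time" used in the examples above, here is a minimal sketch in Python. The helper names (to_seconds, to_mark, average_mark) are my own for illustration and are not part of the original spreadsheet; it assumes FAT marks written as m:ss.xx and does not handle hand-timed marks with an "h" suffix.

```python
def to_seconds(mark):
    """Convert a mark like '9:20.36' into raw seconds (560.36)."""
    minutes, seconds = mark.split(":")
    return int(minutes) * 60 + float(seconds)


def to_mark(total_seconds):
    """Convert raw seconds back into m:ss.xx, rounded to hundredths."""
    # Work in whole hundredths so rounding cannot produce ':60.00'.
    hundredths = round(total_seconds * 100)
    minutes, remainder = divmod(hundredths, 6000)
    return f"{minutes}:{remainder // 100:02d}.{remainder % 100:02d}"


def average_mark(season_best, personal_record):
    """Average two marks (e.g. 2015 SB and 2015-2016 PR), in raw seconds."""
    return (to_seconds(season_best) + to_seconds(personal_record)) / 2


# Peter Hogan's two 3200m marks quoted above.
print(to_mark(average_mark("9:48.32", "9:20.36")))
```

Run on the two marks quoted for Peter Hogan, this gives the 9:34.34 mentioned above.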

The data above points to an estimated rating of 200.0 = 14:46.44, while the final revised rating I did a few months ago (which was based on XC marks) was 200.0 = 14:45.00 ... not an exact match, but a VERY close one (it suggests I might be undervaluing the Washington & Northwest XC performances very slightly, but 1.44 seconds really isn't a noticeable difference in the scheme of things).

In short, a large part of evaluating times in XC and comparing them to track marks is a very ROUGH process: there is a lot of variation (standard deviation for the non-outliers above is 1.69% of the median), and everyone has their strengths and weaknesses. The ratings attempt to average that out over the entire field in order to get an idea of how a performance at Race X compares to a performance at Race Y. Early in the season, all my ratings are based pretty heavily on 3k/3200m/2mile marks coming into the season, and they gradually incorporate more from other XC races on known courses as the season goes on (both current-season comparisons within a few weeks of performances and sometimes a comparison of a race's profile vs. that of previous years). There are some innate issues with the process, but that will always be the case when doing analysis like this, and in the end all the ratings are just estimates (meaning approximate rather than exact) anyway.

Note: if you wanted to know how a 3200m time would translate to WA State XC (instead of the other way around, as above), swapping the x and y axes gives the formula y = 1.6512x - 0.8379 (where y = 2015 State XC mark and x = 3200m mark, both in seconds). Keep in mind, though, that the 3200m mark used is the average of the 2015 SB and the 2015-2016 PR, so calculating from just the SB will give a different formula. FWIW, the average year-to-year rate of improvement (ROI) for boys XC qualifiers was 2.16% (n = 243) and for girls it was 1.30% (n = 194), meaning that if you assume an average rate of improvement, the 3200m mark you would use would be 1.08% faster than the previous SB for boys and 0.65% faster for girls. Or, if you look beyond State XC qualifiers and include all athletes, the ROI was 2.09% for boys (n = 684) and 0.35% for girls (n = 427), meaning 1.04% and 0.17% for the halved ROI.
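As a rough illustration of the note above, here is a small Python sketch that applies the 3200m-to-State-XC formula, with an optional adjustment from a previous SB using the halved ROI for State XC qualifiers. The function and constant names are my own, and treating "1.08% faster" as multiplying the time by (1 - 0.0108) is an assumption about how the adjustment would be applied.

```python
# Trendline quoted in the note above: y = 1.6512x - 0.8379
# x = averaged 3200m mark, y = 2015 State XC mark, both in raw seconds.
SLOPE = 1.6512
INTERCEPT = -0.8379

# Halved average rates of improvement for State XC qualifiers (from the note).
HALF_ROI = {"boys": 0.0108, "girls": 0.0065}


def adjust_sb(sb_seconds, group):
    """Estimate the averaged 3200m mark from a previous SB alone.

    Assumption: 'X% faster' means the time shrinks by X%.
    """
    return sb_seconds * (1.0 - HALF_ROI[group])


def estimate_xc(avg_3200_seconds):
    """Estimate a 2015 WA State XC time (seconds) from an averaged 3200m mark."""
    return SLOPE * avg_3200_seconds + INTERCEPT


# Example: a 9:20.30 averaged 3200m mark is 560.30 seconds.
print(round(estimate_xc(560.30), 2))
```

A 9:20.30 comes out at about 924.3 seconds (roughly 15:24.3) rather than exactly the 15:20.2 from the earlier example. That is expected: a trendline refit with the axes swapped is not the exact algebraic inverse of the original trendline, and all of this is a ROUGH estimate anyway.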

Additional side-note: You'll notice that there are more data points, and in particular more outliers, below the trendline than above it. This is to be expected, for some combination of a few reasons: 1. this is a comparison of 3200m PRs and/or Season Bests vs. the result of one specific race; 2. when you measure athletes by what they do at or near their best, any one specific performance will likely be closer to "average" (meaning worse) in comparison; 3. high school athletes are often more developed anaerobically than aerobically, meaning they aren't as strong at longer distances; 4. high school coaches might focus more of their year-long training on the anaerobic system rather than the aerobic system, compared to what is ideal for racing at the Washington State XC Championships.
