Monday, October 8, 2012

How Accurate is the Slope System?



Introduction - The introduction of the Slope System to golf handicapping has given the illusion of scientific accuracy.  Players have an index calculated to the first decimal.  Rather than being a 10-handicap as in the old days, one is now a 10.4 index.  The decimal gives an impression of increased accuracy that may not be deserved.

When a player finds his or her handicap on a slope conversion table, another level of deception occurs.  The table emits an aura of authority with its rows and columns of neatly printed numbers.  It may not occur to the player with a 10.4 index to ask why his handicap should jump from 10 to 11, a ten percent increase, when the Slope Rating of a course goes from 114 to 115.  The player may not ask whether a course can be rated with enough precision to distinguish between a 114 and a 115 Slope Rating.  Has the Slope System contributed to the precision of golf handicaps or merely added another layer of mathematical obfuscation?

The research presented here attempts to measure any contribution the Slope System has made in decreasing the error in estimating golf handicaps.  The Slope Rating of a course is estimated by combining estimates of the Course Rating and the Bogey Rating.  To understand the errors in the Slope Rating, the errors in estimating the Course and Bogey Ratings are examined in turn.

 Error in the Course Rating - To estimate the uncertainty stemming from errors in estimating the Course Rating, it is necessary to examine the USGA course rating model.  Since the USGA has steadfastly refused to release either statistical information on model estimation or the raw data,[1] much of what follows has to be based on informed guesswork.

By definition, the Course Rating should be the average of the better half of 20 scores submitted by a “scratch” golfer.  This suggests that if a sufficiently large number of rounds by scratch golfers were submitted, a very good estimate of the Course Rating could be made.  Unfortunately, you cannot tell if a player is scratch without knowing the Course Rating.  And you cannot determine the Course Rating without knowing if a player is scratch.

            The USGA gets around the “chicken and the egg” problem by assuming that players in the United States Amateur Championship are scratch players.  The model is estimated by using the average of the better half of their scores.[2]  This methodology underestimates the Course Rating.  The USGA is taking the low 144 scores out of 288 competitors.  But handicaps are based on the best 10 out of 20 scores.  To more accurately reflect the handicapping system, the USGA should have randomly selected groups of twenty players, and then taken the lowest ten scores from each group.  This sample selection method should have been used to estimate the coefficients in the USGA Course Rating Model discussed below.[3]
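Footnote 3 reports this comparison using actual tournament scores.  The sketch below shows the same comparison on simulated data; the assumed score distribution (normal, mean 72, standard deviation 3) and the group sizes are illustrative only and are not USGA figures.

```python
# Minimal simulation sketch (not USGA data) comparing the two sampling methods
# described above for estimating a "scratch average score."
import random
import statistics

random.seed(1)
scores = [random.gauss(72, 3) for _ in range(288)]      # 288 hypothetical competitors

# Method 1: better half of all 288 scores (the approach described above)
better_half = sorted(scores)[:144]
method_1 = statistics.mean(better_half)

# Method 2: best 10 from each randomly formed group of 20, mirroring the
# best-10-of-20 basis of the handicap system (14 full groups; remainder ignored)
random.shuffle(scores)
groups = [scores[i:i + 20] for i in range(0, 280, 20)]
best_tens = [s for g in groups for s in sorted(g)[:10]]
method_2 = statistics.mean(best_tens)

print(f"better half of all scores:   {method_1:.2f}")
print(f"best 10 of each group of 20: {method_2:.2f}")
```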

            The USGA course rating system is based on the following model:

Scratch Average Score = a + b•Yardage + c1•(R1 - S1) + c2•(R2 - S2) + … + c10•(R10 - S10)

Where,
    Scratch Average Score = Average of the better half of the scores of 288 players in the U.S. Amateur
    Yardage = Hole length
    Ri = ith obstacle factor value
    Si = Reference value for the ith obstacle factor

The USGA used a form of least squares regression analysis to estimate the coefficients of the Course Rating Model.[4] 
                  

The USGA model assumes yardage makes the same contribution to scoring over all ranges of hole length.  This implies that adding 50 yards to a 180-yard hole would lead to the same marginal increase in the average hole score as adding 50 yards to a 500-yard hole.  An alternative hypothesis is that yardage has a different effect depending on hole length.  One test of this hypothesis would be to take the USGA data and estimate separate equations for par 3, par 4, and par 5 holes.  If the coefficient of the yardage variable were approximately the same in all three equations, then the USGA model would be validated.  If the coefficients were different, then a specification error of unknown proportion would have been introduced into the estimate of the course rating.
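Because the USGA has not released its data, only the form of the proposed test can be shown.  The sketch below uses a small, made-up hole data set and a simplified model (a single aggregate obstacle term instead of ten), fits it separately by par, and compares the yardage coefficients.

```python
# Sketch of the proposed specification test.  The USGA data are not public, so
# `holes` is hypothetical: (par, yardage, obstacle value, average scratch score).
# With real data each subset would contain many holes.
import numpy as np

holes = [
    (3, 150, 0.1, 2.95), (3, 165, 0.2, 3.05), (3, 185, 0.3, 3.12), (3, 210, 0.5, 3.28),
    (4, 350, 0.2, 3.90), (4, 385, 0.3, 4.00), (4, 410, 0.4, 4.12), (4, 440, 0.6, 4.26),
    (5, 490, 0.2, 4.48), (5, 515, 0.4, 4.60), (5, 540, 0.5, 4.72), (5, 565, 0.7, 4.88),
]

def yardage_coefficient(rows):
    """Least-squares fit of score = a + b*yardage + c*obstacle; returns b."""
    X = np.array([[1.0, y, obs] for _, y, obs, _ in rows])
    s = np.array([score for *_, score in rows])
    coef, *_ = np.linalg.lstsq(X, s, rcond=None)
    return coef[1]

for par in (3, 4, 5):
    subset = [row for row in holes if row[0] == par]
    print(f"par {par}: yardage coefficient = {yardage_coefficient(subset):.5f}")
# If the three coefficients differ by more than their standard errors, the
# single-yardage-coefficient assumption is suspect.
```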

The USGA claims an unbiased estimate of the hole-by-hole error of predictions is:[5]

s = (SSE/(N - 12))^.5 = .067

Where,
    s = Hole-by-hole standard error of prediction
    SSE = Sum of squares of the differences between the average hole scores and the estimated average hole scores
    N = Number of holes in the estimation sample
The USGA claims the “root-mean-square error” in rating a golf course is only .285 strokes (s•(18)^.5).  The USGA does not report either the value of the model coefficients or whether they are statistically significant. The USGA may not have even tested for statistical significance, since multiple regression programs were not readily available at the time the model was estimated. There is no public record of the USGA revisiting the Course Rating Model now that more data and powerful analytic techniques are easily accessible. Interestingly, the yardage variable has not changed in the past 20 years even though players are hitting the ball much farther.  It is also doubtful that all ten obstacle coefficients are significant.  The inclusion of so many variables has given an illusion of precision that probably cannot be justified by the data.[6]
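A quick check of the arithmetic: if the 18 per-hole prediction errors are treated as independent, they add in quadrature, which is where the 18-hole figure comes from.  (The N - 12 divisor presumably reflects 12 estimated coefficients, an intercept, yardage, and ten obstacle terms, although the USGA does not say.)

```python
# Check of the quoted figures: the 18-hole rating error assumes the 18 per-hole
# prediction errors are independent, so they combine in quadrature.
import math

s_hole = 0.067                      # hole-by-hole standard error claimed by the USGA
s_course = s_hole * math.sqrt(18)   # error for an 18-hole rating
print(f"course rating error = {s_course:.3f} strokes")   # 0.284, essentially the quoted .285
```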

The variance is also underestimated because of the collinearity among the yardage and obstacle values.  This collinearity stems from both design (architects tend to design more obstacles into long championship courses) and definition (the “green target” obstacle, for example, is related to the length of the hole).  This implies the explanatory variables are not independent, which violates one of the key assumptions of the regression analysis.

In summary, the course rating model may have specification, measurement, and collinearity errors.  The size of these errors is unknown.  To be conservative, these errors are neglected, and the USGA error estimate of ±.3 strokes will be used in estimating possible errors in the Slope Rating.

Error in the Bogey Rating - The Bogey Rating is the expected average score of a bogey golfer, who is defined as a player with a USGA Handicap Index between 17.5 and 22.4.  The USGA has taken a group of 60 golfers with indices within the required range to create a norm reference group.  How golfers with these indices were found before the slope methodology was developed has never been documented by the USGA.  In correspondence, however, a USGA official stated that the standard deviation of the bogey rating estimate was ±.5 strokes.[7]  The estimate of the standard deviation is subject to all of the errors discussed previously for the Course Rating.  The USGA is given the benefit of the doubt, however, and this estimate of the standard deviation is accepted here for the purposes of this paper.


Error in the Slope Rating -  The Slope Rating is a measure of the course difficulty for a bogey golfer.  It is calculated by the equation:

Slope Rating = 5.381•(BR - CR)

Where,
                                                                BR =   Bogey Rating
                                                                CR =   Course Rating

The estimated equation for the Slope Rating can be found by substituting the equations for the course and bogey rating into the equation above:

Slope Rating = 52.7 + .00917•Yardage + 5.381•(CBOV - CSOV)

Where,
    CBOV = Bogey Obstacle Value for 18 holes
    CSOV = Scratch Obstacle Value for 18 holes

The Slope Rating is not very sensitive to yardage.  A five-hundred-yard difference in course length would only mean a difference of about 5 points in the Slope Rating.  What drives the difference in Slope Ratings among courses is the last term, which measures the difference in obstacle values for the bogey and scratch golfer.

An example of two courses will demonstrate the impact of the obstacle values on the variation in Slope Ratings among courses.  The contributions to the Slope Rating of yardage and obstacle values for the Stadium Course at PGA West and for Rancho Park, a municipal course of moderate difficulty, are shown in Table 1.


Table 1
Contribution to the Slope Rating

Course                       Constant    Yardage    Obstacle Value    Total
PGA West (7265 yards)          52.7        66.6          31.7          151
Rancho Park (6271 yards)       52.7        57.5           6.8          117

       
When only the constant and yardage contributions are considered, the Slope Rating for PGA West is only 8 percent higher than for Rancho Park.  When all contributions are considered, however, the Slope Rating for PGA West is 29 percent higher.  The point is that the Slope Rating is heavily impacted by the most subjective, and hence most error-prone, part of the rating system.
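The Table 1 figures follow directly from the substituted Slope Rating equation.  The sketch below recomputes the constant and yardage terms; the obstacle-value contributions (31.7 and 6.8) are taken from the table itself rather than derived.

```python
# Reproduces the Table 1 contributions from the substituted Slope Rating equation.
courses = {"PGA West (Stadium)": (7265, 31.7), "Rancho Park": (6271, 6.8)}

for name, (yards, obstacle_contrib) in courses.items():
    yardage_contrib = 0.00917 * yards          # yardage term of the equation
    total = 52.7 + yardage_contrib + obstacle_contrib
    print(f"{name}: constant 52.7, yardage {yardage_contrib:.1f}, "
          f"obstacles {obstacle_contrib}, total {round(total)}")
```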

            The variance in the estimate of the Slope Rating is:

Variance(Slope Rating) = (5.381)^2•(BRV + CRV)

Where,
    BRV = Variance in the estimate of the Bogey Rating
    CRV = Variance in the estimate of the Course Rating

The variance of each rating is simply the square of its standard deviation.  Using the USGA values of .5 and .3 for the standard deviations, the variance of the Slope Rating becomes:

Variance(Slope Rating) = 28.96•(.25 + .09) = 9.85

            The standard deviation of the estimate of the Slope Rating would be the square root of the variance or approximately 3.1 rating points.
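The arithmetic behind these two steps, under the assumption that the errors in the bogey and course ratings are independent, is shown below.

```python
# Arithmetic behind the Slope Rating error estimate: independent errors in the
# bogey and course ratings mean their variances add.
import math

sd_bogey, sd_course = 0.5, 0.3
variance_slope = 5.381**2 * (sd_bogey**2 + sd_course**2)   # about 9.8 (9.85 if 5.381^2 is rounded to 28.96)
sd_slope = math.sqrt(variance_slope)
print(f"variance of Slope Rating estimate: {variance_slope:.1f}")
print(f"standard deviation: {sd_slope:.1f} rating points")   # about 3.1
```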

Impact of the Slope System on the Accuracy of Handicaps - Does the Slope System enhance or degrade the accuracy of handicaps?  The answer is “It depends.”  Let’s examine three cases to judge the efficacy of the Slope System.

Case 1: No Slope Effect - The existence of the slope effect (i.e., bogey golfers doing relatively worse on a course with a high slope rating) has never been empirically demonstrated.  In this case where there is no slope effect, possible errors associated with the estimating methodology are not a key concern.  It is not the errors in estimating the slope, but the entire model that needs to be re-examined.  This is not the purpose here, however, so we go on to other cases where the validity of the Slope System is not challenged.

Case 2: Small Range in Slope Ratings - In this case, it is assumed the slope effect does exist but the range of true slope ratings is quite small.  For our purposes, it is assumed that courses have a range of ten Slope Rating points.  Such a small range is not as unlikely as it may appear.  In a survey in Southern California, 48 percent of all courses with a Slope Rating over 100 (i.e., a rough measure of being a regulation length golf course) had Slope Ratings from the regular tees between 114 and 123.

When courses have about the same true slope, measurement errors become more important.  Assume, for example, that two courses have the same true slope.  The probability of estimating the true Slope Rating at any one course is only around .13; that is, the USGA's estimated Slope Rating will equal the true Slope Rating in only about 1 in 8 tries.[8]

More important, however, is the error in the difference between the two estimated slopes, since this will partially determine the size of any portability error.  The standard error of the estimate of the difference in two Slope Ratings is approximately 4.4 rating points.  The probability that the difference in two slopes will be estimated correctly is only about .09.  The probability of making errors of various sizes in the estimate of the difference in two Slope Ratings is shown in Table 2.

Table 2
Probability of an Error in Estimating the Difference in Two Slope Ratings

N (rating points)    Probability of an Error of More than N Rating Points
 0                       .91
 1                       .73
 2                       .57
 3                       .42
 4                       .31
 5                       .21
 6                       .14
 7                       .09
 8                       .05
 9                       .03
10                       .02

The table indicates there is a one in five chance the difference in Slope Ratings will be off by more than 5 rating points. In this example, the Slope System can actually lessen the equity of competition when the true slopes of courses are closely clustered.
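Table 2 and the .13 figure above can be reproduced under a few stated assumptions: estimation errors are normally distributed, the single-course standard deviation is about 3.1 rating points (so the difference of two independent estimates has a standard error of about 4.4), and "off by more than N points" is read as missing by at least N + 0.5, since published Slope Ratings are whole numbers.

```python
# Sketch reproducing the probabilities above under the stated assumptions.
from statistics import NormalDist
import math

sd_single = 3.1                       # std. dev. of one Slope Rating estimate
sd_diff = sd_single * math.sqrt(2)    # std. dev. of the difference of two estimates

def p_error_exceeds(n, sd):
    """P(|estimate - true value| > n + 0.5) under a normal error model."""
    return 2 * (1 - NormalDist(0, sd).cdf(n + 0.5))

print(f"P(single course rated exactly right) = {1 - p_error_exceeds(0, sd_single):.2f}")  # about .13
for n in range(11):
    # Agrees with Table 2 to within about .01 (rounding in the original table)
    print(f"N = {n:2d}: P(error > N) = {p_error_exceeds(n, sd_diff):.2f}")
```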

Case 3: Large Range in Slope Ratings - The value of the Slope System is more evident where there is a large difference in the true slope among courses.  If the true difference were 10 rating points, for example, the more difficult course would almost always have the higher Slope Rating.  That is, a player's handicap (as long as it is not below scratch) will stay the same or increase when going to the course that is tougher for the bogey golfer.  The increase may not reflect the true difference in difficulty because of measurement errors.  Measurement errors, however, may be small in comparison to the true difference in the Slope Rating.


Summary - The overall assessment of the efficacy of the Slope System rests in part on the range of Slope Ratings a player encounters.  Are the errors involved in rating similar courses outweighed by the more accurate handicaps at courses with substantially higher or lower Slope Ratings?  The answer to this question lies within data controlled by the USGA.  It is only with this data that the size of other possible errors can be estimated.  Remember, this analysis used the best possible case for the Slope System.  And most importantly, USGA data could reveal whether there is indeed a slope effect large enough to justify the Slope System.

The USGA Responds

The USGA was not pleased with this paper.  It responded with little understanding of the statistics underlying its own model.[9]  The USGA denied there could be such large errors in the Slope Ratings:

In the USGA Rating System we admit that scratch ratings are only accurate to within ±.3 strokes and bogey ratings to within ±.5 strokes.  Assuming the widest spread… the resulting error in the slope rating is .8•(5.381) or 4.3.  So the certainty that 130 is more difficult than 125 is a little better than the certainty we assign to course ratings.

First, the USGA makes an error in assuming the widest spread can be .8.  The standard error does not mean that the estimate cannot be off by more than ±.3.  The standard error only means that the course rating methodology has at best a 68 percent chance of bracketing the true mean to within ±.3 strokes.  This is an elementary mistake in statistics and reflects the level of sophistication behind the USGA rating model.
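For readers who want the arithmetic: independent standard errors combine as the square root of the sum of their squares, not by adding their half-widths, and the result is still a standard error rather than a maximum possible error.

```python
# Contrast between the USGA's "widest spread" arithmetic and the usual
# root-sum-square combination of independent standard errors.
import math

usga_bound = 5.381 * (0.3 + 0.5)                   # 4.3, treated (wrongly) as a maximum
rss_error = 5.381 * math.sqrt(0.3**2 + 0.5**2)     # about 3.1, a one-standard-error figure
print(f"USGA 'widest spread': {usga_bound:.1f} rating points")
print(f"root-sum-square standard error: {rss_error:.1f} rating points")
# A standard error of 3.1 means roughly a one in three chance the estimate is
# off by more than 3.1 points; it is not an upper bound on the error.
```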

Second, the USGA response only deals with the error at one course.  What is important to the success of the Slope System is the error in the difference in Slope Ratings between courses.  If you only play one course, any error in its Slope Rating will have no effect on your handicap.  The errors between courses, however, can be substantial, distort the computation of a player's handicap, and lead to inequitable competition in inter-club matches.



[1] Letter from David Fay, Executive Director of the USGA to the author, May 31, 1991
[2] It is assumed the USGA used the better half of the 18-hole scores rather than the better half of the scores on each hole.  Using the latter scores would lead to an underestimate of the course rating.
[3] In order to estimate the magnitude of the difference between the two sampling methods, 280 scores from the 1991 Atlanta Open were examined.  Using the better half of all scores, the average score was 68.3.  Taking the 10 best scores out of sets of 20 scores (players were grouped alphabetically into groups of 20), the average was 68.5 strokes.  The method of selecting the sample led to a difference in the course rating estimate of .2 strokes.  The difference is likely to be larger in the USGA estimate of the course rating since there is probably a greater variance in scores at the U.S. Amateur than there is at most professional tournaments.
[4] One of the major assumptions of this type of regression analysis is not met.  Least squares regression techniques assume there are no errors in measuring the independent variables (i.e., the yardage and obstacle values).  This is not the case with obstacle values, however.  For example, with what level of precision can a water hazard be rated a “2”?  Would other raters put the value at 3?  Is the hazard really a 2.45, but rated at 2 because the rating methodology does not allow for such fine distinctions?  The model proposed by the USGA belongs to the class of problems termed “errors in both variables.”  The USGA has made no attempt to estimate the size of this error.  Estimating techniques for such models can be found in Wonnacott and Wonnacott, Econometrics, Wiley and Sons, New York, 1970, p. 164.
[5] Knuth, D., “A Two Parameter Golf Course Rating System,” Science and Golf: the Proceedings of the First World Scientific Congress, Rutledge, Chapman and Hall, London, 1990
[6] Another problem with the methodology employed by the USGA is the sample of holes selected.  The USGA had data for 126 holes, but chose to use data from only 74 holes.  In essence, over 40 percent of the sample was not used.  The criteria for the selection of holes have not been given.  The omission of so much data, however, could lead to very different estimates of the coefficients or much larger error estimates than the USGA has presented.
[7] Letter from Warren Simmons, Executive Director of the Colorado Golf Association to Dean Knuth, Director of Handicapping of the USGA. February 14, 1991
[8] To estimate the true slope, the estimate must be off by less than .5 of a rating point.  The standard deviation of the estimate of the mean is 3.1 rating points.  Therefore, the estimate must be within .16 standard errors of the true mean.  Assuming a normal distribution of the error, such an estimate should occur only about 13 percent of the time.
[9] Letter from Warren Simmons, op. cit.

2 comments:

  1. http://www.golfdigest.com/story/how-far-do-average-golfers-really-hit-it-new-distance-data-will-surprise-you
    Let me know if you think the data makes sense. Equipment is helpful but not as much as you would think. Still have to hit on the face, most golfers can barely accomplish that.

    Reply: I believe the general argument of the article is the average player has not gained much from technical advancements in clubs and balls. While one might quibble with the methodology, the finding is consistent with my experience. The back tees at most clubs are rarely used. The purchase of new clubs is essentially a case of hope overcoming experience. I'd write more but I am off to demo day...
