Monday, October 22, 2012

Is Your Tournament Equitable?

Introduction - A golf handicap is a rough measure of a player's ability.  It is not a perfect measure.  It is biased in favor of the low handicap player, biased against a player with a large variance in his scoring, and biased against the player whose scores are trending upward.  Even with these flaws, however, it is probably the best predictor of a player's gross score in a stroke play event.
         Many tournaments, however, are played in a format where no scoring data are available.  In a four-ball event, each player may post an individual score, but the team score is not posted.  There are clear practical problems with trying to establish a team handicap for four-ball events (e.g., not enough scores with the same partnership).  To get around this problem, studies have examined the relative performance of generic teams.  For example, the USGA believes teams with higher combined handicaps will do better in four-ball stroke play than teams with lower combined handicaps.  To correct for this inequity, the USGA recommends each player receive only 90 percent of his course handicap (see USGA Handicap System, Sec. 9-4bii).  The 90 percent figure is called an allowance.  A tournament committee has a great deal of discretion in choosing the allowance for a particular format, and that choice will have a large impact on the equity of the tournament.  Typically, there is no ex post facto examination of equity after a tournament is completed.  This lack of analysis only ensures the same mistakes will be made the next year.
         This paper examines the equity of one tournament to provide an example of what can be done to increase equity.  The exemplar was a two-day tournament with a two-man scramble and a four-ball format.  Thirty-two teams participated and could elect to play from any of three sets of tees, with handicaps adjusted according to Section 3-5 of the USGA Handicap System.  Five areas of possible equity problems are studied: 1) prize format, 2) scramble handicap allowance, 3) four-ball handicap allowance, 4) the effectiveness of Sec. 3-5 in ensuring fairness, and 5) the spread in the difference in handicaps between partners.  A concluding section suggests possible policy and research implications for the United States Golf Association (USGA).

 Prize Format – One of the most common prize formats is “Equal Gross and Net.”  Unfortunately, putting “equal” in the title does not make it so.  In the tournament in question, three gross prizes and five net prizes were awarded.  The three gross prizes went to the three teams with the lowest combined indices.  A conservative estimate of their probability of winning a prize, barring a major health emergency, would be between 80 and 95 percent.  This left the five net prizes to be fought over by the remaining 29 teams (i.e., each team in theory had only a 17 percent chance of winning a prize).  The disparity in probabilities of winning is one measure of inequity.  (Equally annoying, but not a problem in this tournament, is giving a gross prize in the B-flight.  This gives an edge to a team that was sufficiently mediocre to miss the cut for the A-flight.)
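The 17 percent figure in the parenthesis is simple arithmetic; a one-line check using this tournament's prize counts:

```python
# 3 gross prizes effectively reserved for the 3 lowest-index teams
# leaves 5 net prizes for the remaining 29 teams.
net_prizes, net_contenders = 5, 29
p_net = net_prizes / net_contenders
print(round(100 * p_net))  # 17 (percent), versus 80-95 percent for the gross teams
```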

The argument for “equal gross and net” is that the low-handicap player cannot compete against the high-handicap player in a net tournament.  This argument may be valid when there are a large number of players and a wide range of handicaps.  Typically, a high-handicap player has a larger variance in his scoring, so it is likely one of the high-handicappers has a good chance of winning—as well as of finishing dead last.  If the competition is flighted and the range of handicaps within each flight is relatively small, the argument that a low-handicap player cannot compete loses much if not all of its strength.  This tournament had only one flight, with handicaps ranging from 1 to 28.  The low-handicap players, however, did very well.  For example, the team winning low gross also had the lowest net score.  The advantage of the low-handicap player stemmed from an inequitable handicap allowance formula, which is discussed next.

Scramble Handicap Allowance – In the scramble event, the player with the lower course handicap was allowed 25 percent of his course handicap, and the player with the higher course handicap was allowed 15 percent of his.  The total was rounded off, with fractions of .5 or more rounded up.  This allocation seems inequitable on its face.  A team of ten-handicap players would receive a handicap of 4; a team of scratch players would receive a handicap of 0.  In essence, the ten-handicappers would have to play the scratch players even over 14 holes and lose by only a stroke on the other 4 just to tie.  This seems unlikely.
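The allocation arithmetic can be sketched in a few lines (the rounding convention is the one stated above; the function name is my own):

```python
import math

def scramble_handicap(low_hcp, high_hcp, low_pct=0.25, high_pct=0.15):
    """Team scramble handicap: a percentage of each course handicap,
    with fractions of .5 or more rounded up."""
    raw = low_pct * low_hcp + high_pct * high_hcp
    return math.floor(raw + 0.5)

print(scramble_handicap(10, 10))  # 4 strokes for two ten-handicappers
print(scramble_handicap(0, 0))    # 0 strokes for two scratch players
```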

In an ideal tournament, net scores should not be correlated with handicaps.  Fig. 1 shows a plot of each team’s net score against its scramble handicap.  Net scores and handicaps are highly correlated (R2 = .54).  The linear regression equation predicts that for each one-stroke increase in team handicap, the net score will increase by 1.3 strokes.  This explains why the lowest net scores were posted by the three teams with the lowest combined indices.

Clearly, the 25,15 allocation was unfair to the high-handicap players.  Is there a more equitable allocation—that is, one that reduces the slope of the regression line to near zero?  The USGA suggests a 35,15 allocation.  Using the USGA allocation, the estimate of the slope falls to 0.8 for the full sample, as shown in Table 1.  Based on this data set, the “best” allocation would be 50,25.  This allocation produces a minimal slope (.2), and the R2 value indicates a team’s handicap accounts for only 14 percent of the variance in net scores.  Because of the small sample size, however, this result only suggests the USGA-recommended allocations may be too low; further study is needed.
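The search for a flatter allocation can be automated.  The sketch below uses invented data (the tournament's scores are not reproduced here), but it shows the mechanics: recompute each team's handicap under a candidate allocation, regress net score on handicap, and compare slopes.

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented 32-team field: two course handicaps per team and a gross
# score that rises with the players' combined handicap.
low = rng.integers(1, 15, 32).astype(float)
high = low + rng.integers(0, 14, 32).astype(float)
gross = 60.0 + 0.30 * (low + high) + rng.normal(0, 1.0, 32)

def net_slope(low_pct, high_pct):
    """OLS slope of net score on team handicap for one allocation."""
    hcp = low_pct * low + high_pct * high
    return np.polyfit(hcp, gross - hcp, 1)[0]

# A richer allowance flattens the net-score-versus-handicap line.
print(net_slope(0.25, 0.15) > net_slope(0.50, 0.25))  # True
```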


Table 1

Bias in Scramble Handicap Allocations
(table values not reproduced)

 Four-Ball Allowance – The net score of each team is plotted against its average team handicap in Figure 2.  The regression equation indicates average handicap may have a small negative effect (-.1 stroke for every one-stroke increase in average handicap) on net score.  If there is a wide range in average handicaps, however, even a small effect could be important.  In this tournament there was a 20-stroke difference in handicap between the low-handicap and high-handicap teams.  This translates (20 x .1) into a 2-stroke edge for the high-handicap team.  Handicaps, however, were not reduced by 10 percent as recommended by the USGA.  If they had been, it is likely the effect of average handicap on net score would disappear.

The coefficient for the average-handicap variable, however, was not significant at the 95 percent level of confidence.  The finding of bias here is merely suggestive, and a more definitive conclusion awaits more and larger samples.

Sec. 3-5 – In this tournament, teams were allowed to compete from any of three sets of tees.  Players’ handicaps were adjusted in accordance with Sec. 3-5.  The assumption was that the particular set of tees chosen would not have an effect on the team’s net score.  This assumption is examined for the scramble and four-ball competitions.

Scramble – To test for any effect of tee selection, a dummy variable (T) was created.  The variable was assigned a value of 1 if the team played from the longest tees and a value of 0 if the team played from the shortest tees.  (Note: Only three teams played from the combination tees, so they were excluded from the sample.)  The following equation was estimated:

             Net Score = a + b1 · Scramble Handicap + b2 · T

The estimated equation was:

Net Score = 60.4 + 1.3 · Scramble Handicap - 0.2 · T 

The coefficient of the T variable was not significant (t-statistic = -0.17).  This would indicate Sec. 3-5 adequately compensates for the differences in tees.   
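Coefficient estimates and t-statistics of the kind reported here can be reproduced with ordinary least squares.  A self-contained sketch with invented data: only the 1.3 handicap slope is borrowed from the text as the true effect, and no tee effect is built in, so the dummy's t-statistic should come out small.

```python
import numpy as np

def ols_with_t(X, y):
    """OLS coefficients and their t-statistics; X must include
    a column of ones for the intercept."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)     # coefficient covariance matrix
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
hcp = rng.uniform(0, 10, 29)
tee = rng.integers(0, 2, 29).astype(float)
net = 60 + 1.3 * hcp + rng.normal(0, 1.0, 29)   # no true tee effect

X = np.column_stack([np.ones(29), hcp, tee])
beta, t = ols_with_t(X, net)
# beta[2] should be near zero and |t[2]| small, mirroring the
# insignificant tee dummy found in the scramble regression.
```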

Four-ball – A similar model was estimated for the four-ball competition.  The estimated equation was:

Net Score = 63.8 – 0.4 · Average Handicap + 1.6 · T

The equation estimates that playing the longer tees results in a 1.6-stroke increase in the team’s net score.  Again, the coefficient of the T variable was not significant (t-statistic = 0.93) at the 95 percent level of confidence.  The equation does suggest, however, that Sec. 3-5 has not equalized competition in the four-ball event.

Limitation on the Difference in Handicaps Between Partners - The USGA is convinced, mainly on the basis of research done in the 1970s, that the spread between handicaps is an important determinant of net score in four-ball events.  The argument is, for example, that a team composed of a 6 and a 12 handicap is better than one composed of two 9 handicaps.  To examine whether the USGA’s assertion is correct, the following model was estimated using data from the four-ball event:

             Net Score = a + b1 · LH + b2 · T + b3 · Spread

                        LH = Low handicap of the two players, adjusted in accordance with Sec. 3-5
                          T = Dummy variable representing tee selection (1 = long tees, 0 = short tees)
                Spread = Difference in handicaps between partners

The estimated equation was:

                Net Score(Four-Ball) = 64.8 + .01·LH + 1.9·T – 0.5·Spread

The coefficient for LH was not significant, as before.  The coefficient for the T variable was slightly more significant (t-statistic = 1.24) but still did not pass the 95 percent level of confidence.  The coefficient for the Spread variable was significant at the 95 percent level (t-statistic = -2.18).  This supports the USGA’s recommendation of placing a limit on the difference in handicaps between partners.

USGA research, however, applies to four-ball events and not to scrambles.  To examine if “spread” was important in scramble events a model similar to that above was employed.  The estimated equation was:

                Net Score (Scramble) = 60.4 + 1.3 · H - .3· T - .01·Spread


                                H = Scramble Handicap (25,15 allowance)
                                 T = Dummy variable representing tee selection (1= Long tees, 0 = short tees)
                     Spread = Difference in handicaps between partners

The coefficients of the T and Spread variables were not significant (t-statistics = -.2 and -.1, respectively).  This would indicate the limitation on the difference in handicaps between partners may not be necessary for scramble events.  Given the peculiar nature of this tournament (the use of different tees and a biased handicap allowance), any finding from the scramble event is not much more than conjecture.

 Implications for Future Research - The limited purpose of this paper was to demonstrate a methodology for evaluating the equity of golf tournaments.  In the course of this research, however, several policy and research questions surfaced that should be addressed by the USGA.

·        The USGA requires clubs to use Sec. 3-5 when players are competing from different tees.  The USGA, however, has not published any research showing that Sec. 3-5 provides for equitable competition across different formats.  This should be corrected.  Moreover, the USGA should provide guidance on when Sec. 3-5 should be used and when it should be avoided if possible.  The USGA has published "How to Conduct a Competition," but it is of little help in selecting formats to help ensure equity or in analyzing tournament results.

·         The four-ball allowance recommended by the USGA was developed around 1978.  That was 34 years ago.  It seems time to revisit the allowance since there have been changes to the handicap system since then.

·         The USGA only gives a soft recommendation on the allowance for scramble events—i.e., 35,15 seems to work, but you can use anything you want.  The USGA could instruct handicap chairpersons on ways to evaluate the equity of tournaments, as done here, so they are in a better position to select the appropriate allowances.






Monday, October 8, 2012

How Accurate is the Slope System?

 Introduction - The introduction of the Slope System to golf handicapping has given the illusion of scientific accuracy.  Players have an index calculated to the first decimal.  Rather than being a 10-handicap as in the old days, one is now a 10.4 index.  The decimal gives the impression of increased accuracy which may not be deserved.

When a player finds his or her handicap on a slope conversion table, another level of deception occurs.  The table emits an aura of authority with its rows and columns of neatly printed numbers.  It may not occur to the player with a 10.4 index to ask why he should get a ten percent increase in handicap when the slope of a course goes from 114 to 115.  The player may not ask whether a course can be rated with enough precision to distinguish between a 114 and a 115 Slope Rating.  Has the Slope System contributed to the precision of golf handicaps or merely added another layer of mathematical obfuscation?

The research presented here attempts to measure any contribution the Slope System has made in decreasing the error in estimating golf handicaps.  The Slope Rating of a course is estimated by combining estimates of the Course Rating and the Bogey Rating.  To understand the errors in the Slope Rating, the errors in estimating the Course and Bogey Ratings are examined in turn.

 Error in the Course Rating - To estimate the uncertainty stemming from errors in estimating the Course Rating, it is necessary to examine the USGA course rating model.  Since the USGA has steadfastly refused to release either statistical information on model estimation or the raw data,[1] much of what follows has to be based on informed guesswork.

By definition, the Course Rating should be the average of the better half of 20 scores submitted by a “scratch” golfer.  This suggests that if a statistically significant number of rounds by scratch golfers were submitted, a very good estimate of the Course Rating could be made.  Unfortunately, you cannot tell if a player is scratch without knowing the Course Rating, and you cannot determine the Course Rating without knowing if a player is scratch.

            The USGA gets around the “chicken and the egg” problem by assuming that players in the United States Amateur Championship are scratch players.  The model is estimated by using the average of the better half of their scores.[2]  This methodology underestimates the Course Rating.  The USGA is taking the low 144 scores out of 288 competitors.  But handicaps are based on the best 10 out of 20 scores.  To more accurately reflect the handicapping system, the USGA should have randomly selected groups of twenty players, and then taken the lowest ten scores from each group.  This sample selection method should have been used to estimate the coefficients in the USGA Course Rating Model discussed below.[3]
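The two sampling rules can be compared directly.  The sketch below uses a made-up field of 288 scores (footnote 3 performs the real comparison with 1991 Atlanta Open scores and finds a .2-stroke gap):

```python
import random

random.seed(3)
# Made-up field of 288 rounds.
scores = [round(random.gauss(73, 3)) for _ in range(288)]

# USGA method: average the better half of all scores.
usga_avg = sum(sorted(scores)[:144]) / 144

# Handicap-style method: groups of 20, averaging each group's best 10
# (the last 8 scores are dropped to keep the groups even).
groups = [scores[i:i + 20] for i in range(0, 280, 20)]
group_best = [s for g in groups for s in sorted(g)[:10]]
group_avg = sum(group_best) / len(group_best)

# The grouped selection can never beat the global best half, so the
# USGA method tends to produce the lower (easier) rating.
```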

            The USGA course rating system is based on the following model:

             Scratch Average Score = a + b•Yardage + c1(R1 - S1) + c2(R2 - S2) + … + c10(R10 - S10)

                    Scratch Average Score = Average of the better half of the scores of 288 players in the U.S. Amateur
                    Yardage = Hole length
                    Ri = ith obstacle factor (i = 1, …, 10)
                    Si = Reference value for the ith obstacle

The USGA used a form of least squares regression analysis to estimate the coefficients of the Course Rating Model.[4] 

            The USGA model assumes yardage makes the same contribution to scoring over all ranges of hole length.  This implies that adding 50 yards to a 180 yard hole would lead to the same marginal increase in the average hole score as adding 50 yards to a 500 yard hole.  An alternative hypothesis is that yardage has a different effect depending on hole length.  One test of this hypothesis would be to take the USGA data and estimate separate equations for par 3, par 4, and par 5 holes.  If the coefficient of the yardage variable was approximately the same in all three equations, then the USGA model would be validated.  If the coefficients were different, then a specification error of unknown proportion has been introduced into the estimate of the course rating.

The USGA claims an unbiased estimate of the hole-by-hole error of prediction is:[5]

                    s = (SSE/(N-12)).5 = .067

                    s = Hole-by-hole standard error of prediction
                    SSE = Sum of the squared differences between the actual and the estimated average hole scores
                    N = Number of holes in the sample (12 is the number of estimated coefficients)
The USGA claims the “root-mean-square error” in rating a golf course is only .285 strokes (s·(18).5).  The USGA does not report either the values of the model coefficients or whether they are statistically significant.  The USGA may not have even tested for statistical significance, since multiple regression programs were not readily available at the time the model was estimated.  There is no public record of the USGA revisiting the Course Rating Model now that more data and powerful analytic techniques are easily accessible.  Interestingly, the yardage variable has not changed in the past 20 years even though players are hitting the ball much farther.  It is also doubtful that all ten obstacle coefficients are significant.  The inclusion of so many variables has given an illusion of precision that probably cannot be justified by the data.[6]
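The .285-stroke figure is just the hole-by-hole error scaled to 18 holes, under an assumption of independent errors across holes:

```python
import math

s_hole = 0.067                      # USGA hole-by-hole standard error
s_course = s_hole * math.sqrt(18)   # independence across holes assumed
print(round(s_course, 3))           # 0.284, i.e. the ~.285 strokes claimed
```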

            The variance is also underestimated because of the collinearity between the yardage and obstacle values.  This collinearity stems from both design (architects tend to design more obstacles into long championship courses) and definition (the “green target” obstacle, for example, is related to the length of the hole).  This implies the variables are not independent, which violates one of the key assumptions of regression analysis.

            In summary, the course rating model may have specification, measurement, and collinearity errors.  The size of these errors is unknown.  To be conservative, these errors are neglected, and the USGA error estimate of ±.3 strokes will be used in estimating possible errors in the Slope Rating.

Error in the Bogey Rating - The Bogey Rating is the expected average score of a bogey golfer, who is defined as a player with a USGA Handicap Index between 17.5 and 22.4.  The USGA has taken a group of 60 golfers with indices within the required range to create a norm reference group.  How golfers with these indices were found before the slope methodology was developed has never been documented by the USGA.  In correspondence, however, a USGA official stated that the standard deviation of the bogey rating estimate was ±.5 strokes.[7]  This estimate is subject to all of the errors discussed previously for the Course Rating.  The USGA is given the benefit of the doubt, however, and its estimate is accepted for the purposes of this paper.

Error in the Slope Rating -  The Slope Rating is a measure of the course difficulty for a bogey golfer.  It is calculated by the equation:

                                                 Slope Rating =   5.381• (BR - CR)

                                                                BR =   Bogey Rating
                                                                CR =   Course Rating

The estimated equation for the Slope Rating can be found by substituting the equations for the course and bogey rating into the equation above:

                                                 Slope Rating =   52.7 + .00917•Yardage + 5.381•(CBOV - CSOV)

                                                           CBOV =   Bogey Obstacle Value for 18 holes
                                                           CSOV =   Scratch Obstacle Value for 18 holes

           The Slope Rating is not very sensitive to yardage.  A five hundred yard difference in course length would only mean a difference of 5 points in the slope rating.  What drives the difference in Slope Ratings among courses is the last term that measures the difference in obstacle values for the bogey and scratch golfer.

            An example of two courses will demonstrate the impact of the obstacle values on the variation in Slope Rating among courses.  The contributions to the Slope Rating of yardage and obstacle values for the Stadium Course at PGA West and for Rancho Park, a municipal course of moderate difficulty, are shown in Table 1.

Table 1
Contribution to the Slope Rating: PGA West, Stadium Course (7265 yards) and Rancho Park (6271 yards)
(table values not reproduced)

When only the constant and yardage contributions are considered, the slope rating for PGA West is only 8 percent higher than for Rancho Park.  When all contributions are considered, however, the slope rating for PGA West is 29 percent higher.  The point is that the Slope Rating is heavily impacted by the most subjective, and hence the more prone to error, part of the rating system.
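The 8 percent figure can be checked from the constant and yardage terms of the Slope Rating equation given earlier:

```python
C0, B_YDS = 52.7, 0.00917          # constant and yardage coefficient

pga_west = C0 + B_YDS * 7265       # yardage-only contribution, PGA West
rancho = C0 + B_YDS * 6271         # yardage-only contribution, Rancho Park
print(round(100 * (pga_west / rancho - 1)))  # 8 (percent)
```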

            The variance in the estimate of the Slope Rating is:

                                Variance(Slope Rating)=   (5.381)2 (BRV + CRV)

                                                 BRV =   Variance in the estimate of the Bogey Rating
                                                 CRV =   Variance in the estimate of the Course Rating

             Now the variance is just the square of the standard deviation of the two ratings.  Using the USGA values of .5 and .3 for the standard deviations, the variance of the slope rating becomes:

                               Variance(Slope) =   28.96 (.25 +.09) = 9.85

            The standard deviation of the estimate of the Slope Rating would be the square root of the variance or approximately 3.1 rating points.
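The variance arithmetic is easy to verify:

```python
import math

K = 5.381                           # Slope Rating multiplier for men
bogey_sd, course_sd = 0.5, 0.3      # USGA-stated standard deviations

var_slope = K ** 2 * (bogey_sd ** 2 + course_sd ** 2)
sd_slope = math.sqrt(var_slope)
print(round(var_slope, 2), round(sd_slope, 1))  # 9.84 and 3.1
```

(The 9.85 in the text comes from rounding 5.381² to 28.96 before multiplying; the difference is immaterial.)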

Impact of the Slope System on the Accuracy of Handicaps - Does the Slope System enhance or degrade the accuracy of handicaps?  The answer is “It depends.”  Let’s examine three cases to judge the efficacy of the Slope System.

Case 1: No Slope Effect - The existence of the slope effect (i.e., bogey golfers doing relatively worse on a course with a high slope rating) has never been empirically demonstrated.  In this case where there is no slope effect, possible errors associated with the estimating methodology are not a key concern.  It is not the errors in estimating the slope, but the entire model that needs to be re-examined.  This is not the purpose here, however, so we go on to other cases where the validity of the Slope System is not challenged.

Case 2: Small Range in Slope Ratings - In this case, it is assumed the slope effect does exist but the range of true slope ratings is quite small.  For our purposes, it is assumed that courses have a range of ten Slope Rating points.  Such a small range is not as unlikely as it may appear.  In a survey in Southern California, 48 percent of all courses with a Slope Rating over 100 (i.e., a rough measure of being a regulation length golf course) had Slope Ratings from the regular tees between 114 and 123.

            When courses have about the same true slope, measurement errors become more important.  Assume, for example, that two courses have the same true slope.  The probability of estimating the true Slope Rating at any one course is only around .13--i.e., the USGA estimated Slope Rating will equal the true Slope Rating in only about 1 in 8 tries.[8]

More important, however, is the difference in the estimates of the two slopes, since this will partially determine the size of any portability error.  The standard error of the estimate of the difference in two Slope Ratings is approximately 4.4 rating points.  The probability that the difference in two slopes will be estimated correctly is about .09.  The probability of making errors of various sizes in the estimate of the difference in two Slope Ratings is shown in Table 2.

Table 2
Probability of an Error in Estimating
the Difference in Two Slope Ratings
(columns: N = difference in Slope Ratings, and the probability of an error of more than N rating points; values not reproduced)

The table indicates there is a one-in-five chance the difference in Slope Ratings will be off by 5 or more rating points.  In this example, the Slope System can actually lessen the equity of competition when the true slopes of courses are closely clustered.
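The .13 and .09 probabilities quoted above follow from the normal error model; a sketch, assuming normally distributed, independent rating errors:

```python
import math

def p_within(half_width, sd):
    """P(|error| < half_width) for a normal estimate with standard deviation sd."""
    return math.erf(half_width / (sd * math.sqrt(2)))

sd_one = 3.1                        # one Slope Rating estimate
sd_diff = sd_one * math.sqrt(2)     # difference of two independent estimates

print(round(p_within(0.5, sd_one), 2))   # 0.13: hitting one true (integer) rating
print(round(p_within(0.5, sd_diff), 2))  # 0.09: hitting the true difference
```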

Case 3: Large Range in Slope Ratings - The value of the Slope System is more evident where there is a large difference in the true slope among courses.  If the true difference were 10 rating points, for example, the more difficult course would almost always have a higher Slope Rating.  That is, a player’s handicap (as long as it is not below scratch) will stay the same or increase in going to the course that is tougher for the bogey golfer.  The increase may not reflect the true difference in difficulty because of measurement errors.  Measurement errors, however, may be small in comparison to the true difference in the Slope Rating.

Summary - The overall assessment of the efficacy of the Slope System rests in part on the range of Slope Ratings a player encounters.  Are the errors involved in rating similar courses outweighed by the more accurate handicaps at courses with substantially higher or lower Slope Ratings?  The answer to this question lies within data controlled by the USGA.  It is only with these data that the size of the other possible errors can be estimated.  Remember, this analysis used the best possible case for the Slope System.  And most importantly, USGA data could reveal whether there is indeed a slope effect large enough to justify the Slope System.

The USGA Responds

            The USGA was not pleased with this paper.  It responded with little understanding of the statistics underlying its own model.[9]  The USGA denied there could be such large errors in the Slope Ratings:

In the USGA Rating System we admit that scratch ratings are only accurate to within ±.3 strokes and bogey ratings to within ±.5 strokes.  Assuming the widest spread… the resulting error in the slope rating is .8•(5.381) or 4.3.  So the certainty that 130 is more difficult than 125 is a little better than the certainty we assign to course ratings.

First, the USGA errs in assuming the widest spread can be .8.  The standard error does not mean the estimate cannot be off by more than ±.3.  It means only that the course rating methodology has at best a 68 percent chance of bracketing the true mean to within ±.3 strokes.  This is an elementary mistake in statistics and reflects the level of sophistication behind the USGA rating model.

            Second, the USGA response deals only with the error at one course.  What is important to the success of the Slope System is the error in the difference in Slope Ratings between courses.  If you play only one course, any error in its Slope Rating will have no effect on your handicap.  The errors between courses, however, can be substantial; they can distort the computation of a player’s handicap and lead to inequitable competition in inter-club matches.

[1] Letter from David Fay, Executive Director of the USGA to the author, May 31, 1991
[2] It is assumed the USGA used the better half of the 18-hole scores rather than the better half of the scores on each hole.  Using the latter would lead to an underestimate of the course rating.
[3] In order to estimate the magnitude of the difference in the two different sampling methods, 280 scores from the 1991 Atlanta Open were examined.  Using the better half of all scores, the average score was 68.3.  Taking the 10 best scores out of sets of 20 scores (players were grouped alphabetically into groups of 20), the average was 68.5 strokes.  The method of selecting the sample led to a difference in the course rating estimate of .2 strokes.  The difference is likely to be larger in the USGA estimate of the course rating since there is probably a greater variance in scores at the USGA Amateur than there is at most professional tournaments.
[4] One of the major assumptions of this type of regression analysis is not met.  Least squares regression techniques assume there are no errors in measuring the independent variables (i.e., the yardage and obstacle values).  This is not the case with obstacle values, however.  For example, with what level of precision can a water hazard be rated a “2”?  Would other raters put the value at 3?  Is the hazard really a 2.45, but rated at 2 because the rating methodology does not allow for such fine distinctions?  The model proposed by the USGA belongs to the class of problems termed “errors in both variables.”  The USGA has made no attempt to estimate the size of this error.  Estimating techniques for such models can be found in Wonnacott and Wonnacott, Econometrics, Wiley and Sons, New York, 1970, p. 164.
[5] Knuth, D., “A Two Parameter Golf Course Rating System,” Science and Golf: the Proceedings of the First World Scientific Congress, Rutledge, Chapman and Hall, London, 1990
[6] Another problem with the methodology employed by the USGA is the sample of holes selected.  The USGA had data for 126 holes but chose to use data from only 74.  In essence, over 40 percent of the sample was not used.  The criteria for the selection of holes have not been given.  The omission of so much data, however, could lead to very different estimates of the coefficients or much larger error estimates than the USGA has presented.
[7] Letter from Warren Simmons, Executive Director of the Colorado Golf Association to Dean Knuth, Director of Handicapping of the USGA. February 14, 1991
[8] To estimate the true slope correctly, the estimate must be off by less than .5 of a rating point.  The standard deviation of the estimate is 3.1 rating points.  Therefore, the estimate must be within .16 standard errors of the true mean.  Assuming a normal distribution of the error, such an estimate should occur only about 13 percent of the time.
[9] Letter from Warren Simmons, op. cit.