Monday, December 10, 2012

Handicapping Four-Ball Stroke Play Events


Introduction - The handicap system is designed to bring equity to the competition between individual competitors.  Many competitions, however, are based on team performance (four-ball, foursome, scramble).  The problem then is how to combine individual handicaps into team handicaps to make for fair competition.

            The United States Golf Association (USGA) has made recommendations on handicaps for different formats.[1]  The USGA, however, has not published any empirical research validating its recommended team handicaps.  Moreover, USGA recommendations are not internally consistent.  In four-ball match play, the USGA recommends teams play at full handicap.  In four-ball stroke play for men, however, handicaps are reduced by 10 percent.  For women, the recommended reduction for four-ball stroke play is only 5 percent.  It would be difficult to justify on theoretical grounds why four-ball match play should be played at full handicap while four-ball stroke play should be played at 90 percent of course handicap.[2]  There is also nothing inherent in the handicap system that could explain why men and women should have different handicap adjustments in four-ball stroke play competition.

            This paper presents a methodology for empirically verifying the equity of a handicap procedure and applies it to four-ball stroke play.[3]  The examination is organized around three questions.  First, is the 90 percent allowance recommended by the USGA equitable?  Second, do teams with larger differences in handicaps between partners have an advantage?  Third, what is the optimal handicap allowance for four-ball stroke play?


Is the 90 Percent Allowance Equitable? - USGA recommendations on handicap allowances appear to be based in part on the work of F. Scheid of the USGA Handicap Research Team.  In a 1971 study, Scheid examined the net scores of players with various combinations of handicaps.[4]  His study was based on 50 players from one club.  Scheid did not have data from a four-ball event; instead, he took the scores of these 50 players and simulated matches between them.  Based on these simulations, Scheid claimed players should be allowed 107 percent of their handicap in four-ball stroke play competition.  In 1971, however, a player’s handicap was computed as 85 percent of the average of his ten best scores.  Today, that percentage has been increased to 96 percent.  Translated into the current handicap system, Scheid’s recommendation would allow players roughly 95 percent of their handicap.
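The conversion in the last sentence is a single rescaling: Scheid’s 107 percent is multiplied by the ratio of the 1971 multiplier (85 percent) to the current one (96 percent). The sketch below assumes handicaps scale in direct proportion to that multiplier, which is a simplification; the function name is illustrative only.

```python
# Rescale an allowance stated under one handicap multiplier to another.
# Assumption: handicaps scale in direct proportion to the multiplier.
def convert_allowance(allowance, old_multiplier, new_multiplier):
    return allowance * old_multiplier / new_multiplier

scheid_today = convert_allowance(1.07, 0.85, 0.96)
print(f"{scheid_today:.1%}")  # prints 94.7%, i.e. roughly 95 percent
```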

Scheid’s argument that handicaps should be reduced proportionately for four-ball stroke play competition can be represented by Model I below:


            Net Score = a + bH


                        H = Average handicap of the two players


If high handicappers had an advantage in four-ball competitions, a team’s net score would be expected to decrease as its handicap increases.  A plot of tournament net scores illustrating such a bias is shown in the figure below.


The b-coefficient in Model I is a measure of the inequity of the handicap.  This coefficient is the slope of the line drawn in the figure above.  If b were equal to -0.1, for example, the model would predict that for every ten-stroke increase in average handicap, net score would decrease by one stroke.  If the coefficient were zero, there would be no correlation between handicap and net score, which is the ideal situation.  Estimates of this coefficient from various tournaments indicate the magnitude and direction of any bias.
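The following is a minimal sketch of how b, and the t-statistic reported alongside it in the tables, could be estimated by ordinary least squares from a list of teams. The function name and input layout are illustrative assumptions, not the author’s code. The five teams in the usage line are made up.

```python
import math
from statistics import mean

def fit_model_one(avg_handicaps, net_scores):
    """Ordinary least squares for Net Score = a + b*H.

    Returns (a, b, t), where t is b divided by its standard error.
    """
    n = len(avg_handicaps)
    h_bar, y_bar = mean(avg_handicaps), mean(net_scores)
    sxx = sum((h - h_bar) ** 2 for h in avg_handicaps)
    sxy = sum((h - h_bar) * (y - y_bar)
              for h, y in zip(avg_handicaps, net_scores))
    b = sxy / sxx
    a = y_bar - b * h_bar
    # Residual variance uses n - 2 degrees of freedom (a and b are fitted).
    sse = sum((y - (a + b * h)) ** 2 for h, y in zip(avg_handicaps, net_scores))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return a, b, b / se_b

# Hypothetical field of five teams: average handicap H and team net score.
a, b, t = fit_model_one([4, 8, 12, 16, 20], [66, 64, 65, 63, 64])
print(round(a, 3), round(b, 3), round(t, 2))  # 65.9 -0.125 -1.67
```

A negative b with a large |t| is the pattern that would indicate the bias described above.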

            The b-coefficient of Model I was estimated with data from the 1997 Southern California Golf Association (SCGA) Four-Ball Stroke Play Championship.  The competition was conducted at four qualifying sites, followed by two rounds at the championship site (i.e., six samples in all).  Each player’s handicap was computed as 90 percent of his course handicap.

Five of the six estimates of the model coefficient are negative, indicating a likely negative relationship between net score and average handicap (see Table 1).  That is, even with a 10 percent reduction, higher handicap players had an advantage.  In two cases (Menifee Lakes and Vista Valley), the hypothesis that the b-coefficient equals zero could be rejected at the 95 percent level of confidence.

            This research only raises the suspicion that the 90 percent allowance may not be equitable.  Any conclusion has to be tempered by the data limitations, which are serious and are discussed in the concluding section.

Table 1

Model I Coefficient (90% Allowance)

Site                  b-coefficient (t-stat.)
Menifee Lakes         -0.38 (2.55)
Sierra La Verne       -0.09 (0.38)
Soule Park             0.31 (0.16)
Vista Valley          -0.46 (2.09)
Bear Creek Day 1      -0.11 (0.54)
Bear Creek Day 2      -0.06 (0.25)
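Table 1 reports t-statistics rather than p-values. The sample size at each site is not given, so the sketch below assumes the large-sample normal approximation to convert each t-statistic into a two-sided p-value. With small fields, the Student t distribution would give somewhat larger p-values. The t-statistics are copied from Table 1, and the function name is illustrative.

```python
import math

# t-statistics from Table 1 (reported as absolute values).
TABLE_1_T = {
    "Menifee Lakes": 2.55,
    "Sierra La Verne": 0.38,
    "Soule Park": 0.16,
    "Vista Valley": 2.09,
    "Bear Creek Day 1": 0.54,
    "Bear Creek Day 2": 0.25,
}

def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal approximation.

    For small samples the Student t distribution gives larger p-values,
    so this approximation slightly favors rejecting b = 0.
    """
    return math.erfc(abs(t) / math.sqrt(2))

significant = [site for site, t in TABLE_1_T.items()
               if two_sided_p_normal(t) < 0.05]
print(significant)  # ['Menifee Lakes', 'Vista Valley']
```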


Do Teams with Large Differences in Handicaps Between Partners Have an Advantage? - Scheid argued that the average handicap of a team is not an adequate description of its ability in four-ball competition.[5]  He found that teams with a spread in handicaps do better (e.g., a 25-handicap and a 1-handicap will team better than two 13-handicap players, even though both teams have the same average handicap).

Scheid’s finding suggests Model II:


            Net Score = a + bH + cDIFF



    H = Average Handicap of the Two Players

DIFF = Difference between the Higher and Lower Handicaps of the Two Players


If high handicappers had an advantage, the b-coefficient should be negative.  And if larger differences in handicaps between players are advantageous, then the c-coefficient should also be negative.

            The model was estimated using standard multiple regression techniques.  The tournament imposed a five-stroke limit on the difference in handicap between partners; where the limit was exceeded, the higher handicapped player had his handicap reduced until the limit was met.  Such teams, however, were excluded from the sample used to estimate the model.  The estimates of the model coefficients are shown in Table 2 below:


Table 2

Model II Coefficients

Site                  b-coefficient (t-stat.)    c-coefficient (t-stat.)
Menifee Lakes         -0.403 (2.36)              -0.203 (0.63)
Sierra La Verne       -0.077 (0.34)               0.119 (0.27)
Vista Valley          -0.489 (2.20)              -0.456 (0.87)
Soule Park             0.025 (0.12)              -0.038 (0.09)
Bear Creek Day 1      -0.079 (0.37)               0.340 (0.75)
Bear Creek Day 2       0.048 (0.20)               1.00 (1.96)


            The c-coefficient is never statistically significant at the 95 percent level of confidence.  In three cases the estimate of the c-coefficient is positive, and in three cases it is negative.  Within the limitations of the data, it is concluded that the difference in handicap did not have a significant effect on a team’s net score.

            As expected, the evidence for the b-coefficient is similar to that found with Model I.  In two cases (Menifee Lakes and Vista Valley) the b-coefficient is both significant and negative.  The size of the b-coefficient at these two sites indicates a large bias favoring the higher handicapped player.[6]  In the other four cases, however, the b-coefficient is not statistically significant.
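Model II has two regressors, so the coefficients must be solved jointly rather than taken from a single covariance. The following is a minimal sketch using NumPy’s least-squares solver. The function name is hypothetical. The usage data are constructed from known coefficients so the fit can be checked, and they are not tournament data. Teams affected by the five-stroke limit would be dropped before fitting, as described above.

```python
import numpy as np

def fit_model_two(avg_hcp, hcp_diff, net_scores):
    """Least-squares estimates of a, b, c in Net Score = a + b*H + c*DIFF."""
    # Design matrix: intercept column, average handicap, handicap difference.
    X = np.column_stack([np.ones(len(avg_hcp)), avg_hcp, hcp_diff])
    y = np.asarray(net_scores, dtype=float)
    coef, _residuals, _rank, _sv = np.linalg.lstsq(X, y, rcond=None)
    return coef  # array([a, b, c])

# Hypothetical teams built exactly from a = 70, b = -0.2, c = -0.1.
h = np.array([8, 10, 12, 14, 9, 11], dtype=float)
diff = np.array([0, 2, 4, 1, 3, 5], dtype=float)
net = 70 - 0.2 * h - 0.1 * diff
print(np.round(fit_model_two(h, diff, net), 3))
```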


What is the Optimal Allowance for Four-Ball Stroke Play Competition? - As shown above, the 90 percent allowance did not lead to equitable competition at some sites.  What allowance would have eliminated the correlation between net score and average handicap?  To explore this question, tournament results were simulated using the actual gross scores while varying the handicap allowance.  The b-coefficient was estimated for each allowance, and the allowance that yielded an estimate nearest to zero was considered optimal.
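The simulation can be sketched as a grid search. The following is a minimal version that rests on several assumptions not spelled out in the text:

- Scores are available hole by hole.
- Handicap strokes are allocated by each hole’s stroke index.
- The reduced handicap is rounded with .5 going up.
- The team’s better net ball counts on each hole.
- b is the slope of team net score on the team’s average full course handicap.

All function names are hypothetical.

```python
import math
from statistics import mean

def round_half_up(x):
    """Round to the nearest integer, with .5 rounded up (assumed convention)."""
    return math.floor(x + 0.5)

def strokes_received(playing_hcp, stroke_index):
    """Strokes on a hole for a non-negative playing handicap."""
    full_rounds, extra = divmod(playing_hcp, 18)
    return full_rounds + (1 if stroke_index <= extra else 0)

def team_net(partner_a, partner_b, stroke_indexes, allowance):
    """Better-ball net score for one team.

    Each partner is a (course_handicap, gross_hole_scores) pair, and
    stroke_indexes gives the stroke index (1-18) of each hole in order.
    """
    partner_nets = []
    for course_hcp, gross in (partner_a, partner_b):
        playing_hcp = round_half_up(allowance * course_hcp)
        partner_nets.append([g - strokes_received(playing_hcp, si)
                             for g, si in zip(gross, stroke_indexes)])
    return sum(min(x, y) for x, y in zip(*partner_nets))

def ols_slope(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx

def b_for_allowance(teams, stroke_indexes, allowance):
    """Model I slope of team net score on the team's average course handicap."""
    avg_hcp = [(p[0] + q[0]) / 2 for p, q in teams]
    nets = [team_net(p, q, stroke_indexes, allowance) for p, q in teams]
    return ols_slope(avg_hcp, nets)

def optimal_allowance(teams, stroke_indexes, allowances):
    """Allowance whose estimated b is nearest zero (the first one wins ties)."""
    return min(allowances,
               key=lambda al: abs(b_for_allowance(teams, stroke_indexes, al)))

# Hypothetical field: each player makes 4 on every hole plus one stroke on each
# hole where he receives a stroke, i.e. he plays exactly to his handicap.
stroke_indexes = list(range(1, 19))
teams = [((h, [4 + (si <= h) for si in stroke_indexes]),) * 2
         for h in (0, 6, 12, 18)]
grid = [round(0.80 + 0.01 * k, 2) for k in range(31)]
print(optimal_allowance(teams, stroke_indexes, grid))  # 0.98
```

In this artificial field, every allowance from 0.98 to 1.02 gives b = 0 exactly, because rounding returns each player’s full handicap. That is a small instance of the "optimal range" discussed below the table.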

            The simulations found that the estimated optimal allowance differed widely among sites, as shown in Table 3.


Table 3

Optimal Handicap Allowance


Optimal Handicap
Menifee Lakes
Vista Valley
Sierra LaVerne
Soule Park
Bear Creek Day 1
Bear Creek Day 2


            The optimal allowance is shown as a point estimate in Table 3.  In fact, however, there is a range of allowances that yield little correlation between net scores and average handicap.  For example, at Bear Creek Day 2, any allowance between 87 and 103 percent would have produced an estimated b-coefficient of less than 0.1.[7]  By this measure, the USGA-recommended 90 percent allowance would give equitable results at four of the sites.
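An optimal range like this one can be read off the simulation output with a simple filter. The sketch below assumes the simulation produced a mapping from allowance to estimated b-coefficient. The b values in the usage line are hypothetical and not taken from the study. The function reports only the smallest and largest qualifying allowance, so it assumes the qualifying allowances are contiguous.

```python
def acceptable_allowances(b_by_allowance, tolerance=0.1):
    """Smallest and largest allowance whose |b| is below the tolerance.

    b_by_allowance maps an allowance (e.g. 0.90) to its estimated b.
    Returns None if no allowance qualifies.
    """
    ok = [al for al, b in b_by_allowance.items() if abs(b) < tolerance]
    return (min(ok), max(ok)) if ok else None

# Hypothetical simulation output, for illustration only.
example = {0.85: 0.14, 0.87: 0.09, 0.95: 0.01, 1.03: -0.08, 1.05: -0.12}
print(acceptable_allowances(example))  # (0.87, 1.03)
```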

            At two of the sites (Menifee Lakes and Vista Valley), however, the optimal allowance is well below the USGA recommendation.  It is impossible to tell from these data whether the results at these two sites are due to random variation in the optimal allowance or to anomalies caused by players (e.g., sandbaggers).  An examination of the slope ratings at these two courses did not reveal errors large enough to produce the results presented here (see Appendix B).

Data Limitations and Future Research - The conclusions of this research are far from definitive because of the following data limitations:


1.    Narrow Range of Handicaps - Handicaps in the championship flight at Bear Creek ranged only from 4 to 20.  Only about 14 percent of the field had single-digit handicaps.  A broader and more evenly distributed sample of handicaps is needed for better estimates and to extend the results to the general population of golfers.

2.    Error in the Slope Rating - An error in the slope rating could bias the estimate of the optimal allowance.  For example, if the correct slope rating at Bear Creek were 128 rather than 136, handicaps would be overestimated by about 6 percent.  Players would then actually be playing at about 96 percent of their true handicap rather than at the 90 percent allowance.

3.    Limited Difference in Partnership Handicaps - The SCGA limited the difference in handicaps (after taking 90 percent) to five strokes.  To better explore the effect of the difference in handicap between partners a much more varied sample is required.[8]

4.    Small Sample Size - Estimates of the b-coefficient were obtained at only six sites.  To get a better understanding of the mean and variance of this coefficient, many more sites need to be examined.
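The arithmetic behind item 2 follows from the fact that course handicaps scale linearly with the slope rating (Course Handicap = Index × Slope / 113, before rounding). The following is a minimal sketch, and the function name is illustrative.

```python
def effective_allowance(allowance, rated_slope, true_slope):
    """Share of a player's true handicap actually received when the course's
    slope rating is wrong; course handicaps scale linearly with slope."""
    return allowance * rated_slope / true_slope

print(f"{136 / 128 - 1:.2%}")                        # 6.25% overstatement
print(f"{effective_allowance(0.90, 136, 128):.1%}")  # 95.6%
```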


To mitigate these limitations, data should be collected from a number of four-ball tournaments held at clubs.  These should be member tournaments, not member-guests or invitationals.  This will minimize the slope rating error problem, since handicaps would be based (for the most part) on scores from the same course where the tournament was played.  Such tournaments should also attract a greater diversity of handicaps, and of differences between partners, than the SCGA tournament.

With data from, say, 20 clubs, estimates of the model coefficient could be made for a variety of handicap allowances.  If a narrow range of allowances is consistently best across the clubs, then a recommendation on the allowance to be used in four-ball tournaments can be made with some confidence.




Appendix A


Changing Index Restrictions in the SCGA Four-Ball Tournament


            The SCGA places two restrictions on the handicaps of competitors that do not contribute to the equity of competition.  This appendix examines those restrictions and makes recommendations for change.


Five-Stroke Difference in Handicap Between Partners - The five-stroke limit meant that some qualifying teams had their handicaps reduced when they played the championship course (Bear Creek), where the slope rating was 136.  Assume a team had indices of 9.5 and 15.4.  At a qualifying site with a low slope rating, they would have handicaps of 9 and 14 (90 percent of their course handicaps).  At Bear Creek, 90 percent of their course handicaps would be 10 and 16.  The 16-handicap would have to be reduced to 15 to meet the five-stroke differential.  In essence, this player is playing at 83 percent of his course handicap, while most of his fellow competitors are playing at 90 percent of theirs.

            To eliminate this inequity, the SCGA should place the restriction on the difference in indices rather than the difference in course handicaps.  For example, the maximum difference in indices between partners could be set at 5.0.  This would be easier for players to understand, would make it administratively simple for the SCGA tournament staff to compute course handicaps, and would eliminate the inequity of players competing at different percentages of their handicaps because of the slope rating.


Computing the Reduced Handicap - The USGA recommends taking 90 percent of a player’s course handicap for four-ball stroke play.  It is suspected, however, that this recommendation was written before the advent of the Slope System and has never been revised. 

The USGA’s method introduces two rounding errors.  The first occurs when a player’s index is converted to an integer course handicap.  The second occurs when 90 percent of the course handicap is rounded to the nearest integer to obtain the four-ball handicap.

            It is suggested that the SCGA compute a player’s four-ball handicap as 90 percent (or whatever allowance is used) of his index rather than his course handicap.  This method is easier to compute and yields handicaps as close to or closer to a player’s ability (as measured by the unrounded course handicap) than the USGA method.[9]
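The two procedures differ only in where the rounding happens. The following is a minimal sketch comparing them, using the standard course handicap formula (Index × Slope / 113). It assumes .5 is rounded up, and the function names are illustrative. The usage line reproduces the example in footnote 9.

```python
import math

def round_half_up(x):
    """Round to the nearest integer with .5 rounded up (assumed convention)."""
    return math.floor(x + 0.5)

def course_handicap(index, slope):
    """Unrounded course handicap: Handicap Index x Slope Rating / 113."""
    return index * slope / 113

def usga_four_ball(index, slope, allowance=0.90):
    """USGA method: round the course handicap, apply the allowance, round again."""
    return round_half_up(allowance * round_half_up(course_handicap(index, slope)))

def index_four_ball(index, slope, allowance=0.90):
    """Suggested method: apply the allowance to the index, then round once."""
    return round_half_up(course_handicap(allowance * index, slope))

exact = 0.90 * course_handicap(22.0, 136)
print(round(exact, 2), usga_four_ball(22.0, 136), index_four_ball(22.0, 136))
# 23.83 23 24
```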

Appendix B


Testing the Accuracy of the Slope Rating


            The methodology described in this paper can also be used to assess the accuracy of a course’s slope rating.  According to the theory behind the slope system, net score should not be correlated with handicap if the slope rating is correct.[10]  To examine this premise, the adjusted net score of each player was regressed against his handicap.  The b-coefficient estimated at each site is shown in the table below:


Site                  b-coefficient (t-stat.)
Menifee Lakes         -0.16 (1.12)
Sierra La Verne        0.07 (0.50)
Soule Park             0.18 (1.24)
Vista Valley           0.06 (0.40)
Bear Creek Day 1       0.12 (0.72)
Bear Creek Day 2       0.09 (0.50)


The results at Bear Creek, Vista Valley, and Sierra La Verne indicate the course slope rating is working according to theory (i.e., the estimated coefficients are not significantly different from zero).  In the other two cases, Menifee Lakes and Soule Park, there is some apparent bias in the results.  High handicappers have an advantage at Menifee Lakes, while low handicappers have an advantage at Soule Park.  The estimates of the coefficients at these sites approach, but do not reach, statistical significance at the 90 percent level of confidence.  To bring the estimate of the b-coefficient into the range of the other sites (i.e., ±.09), the slope rating at Menifee Lakes would have to be reduced from 115 to 107.  Similarly, the slope rating at Soule Park would have to be raised from 107 to 120.

            If Menifee Lakes were overrated, that would help explain the bias favoring high handicappers at this site.  Running a simulation at Menifee Lakes with a slope rating of 107, however, does not eliminate the bias as measured by the b-coefficient, which remains statistically significant.
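Re-running a site under a different slope rating starts by recomputing every player’s course handicap from his index. The following is a minimal sketch of that first step. It assumes the standard formula (Index × Slope / 113) with .5 rounded up, and the indices in the loop are hypothetical. The adjustment grows with the index. Lowering the slope by 8 points removes about 1.4 strokes from a 20.0 index but less than half a stroke from a 5.0 index, which is why a re-rating shifts the advantage between high and low handicappers.

```python
import math

def round_half_up(x):
    """Round to the nearest integer with .5 rounded up (assumed convention)."""
    return math.floor(x + 0.5)

def course_handicap(index, slope):
    """Course handicap rounded to an integer: Index x Slope / 113."""
    return round_half_up(index * slope / 113)

# Hypothetical indices: course handicaps at the current 115 vs. a re-rated 107.
for index in (5.0, 12.0, 20.0, 28.0):
    print(index, course_handicap(index, 115), course_handicap(index, 107))
```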

            This methodology cannot say conclusively that Menifee Lakes is overrated and Soule Park is underrated.  The research does indicate, however, the Rating Committee should review the slope rating at these two courses.


[1] USGA, USGA Handicap System, Far Hills, New Jersey, 1994.
[2] An argument could be made for reducing a player’s handicap for match play.  A high handicapper will have his handicap based in part on hole scores that would not matter to the outcome in match play (e.g., a 9 on a par five).  The USGA recommendation of lowering handicaps for stroke play and not for match play, however, does not appear to have a theoretical basis.
[3] In four-ball stroke play, two competitors play as partners.  The lower of the partners’ scores is the score for the hole.
[4] Francis Scheid, “You’re not getting enough strokes,” Golf Digest, Trumbull, Connecticut, June 1971, p. 52.
[5] Ibid.
[6] There is no obvious explanation for why scores at Menifee Lakes and Vista Valley behaved differently from other sites.  If the slope ratings at these two courses were overestimated, some of the bias could be explained.  It would take an error of over 40 slope rating points, however, to explain all of the bias at Menifee Lakes.  Another possible explanation is that Menifee Lakes had a disproportionate number of “sandbaggers” with high handicaps.  Qualifiers from Menifee Lakes went on to take first, third, and a tie for fourth in the championship tournament.  When these three teams were excluded from the Menifee Lakes sample, the b-coefficient in both models was no longer statistically significant.
[7] At Bear Creek, for example, the same teams finished in the top ten whether 90 or 100 percent of the handicap was used.
[8] For equity and administrative ease, it is suggested that the SCGA: 1) base handicaps on 90 percent of a player’s index rather than 90 percent of the player’s course handicap, and 2) place the limit on the difference in players’ indices rather than on the difference in course handicaps.  Appendix A details the reasoning behind these recommendations.
[9] Assume a player had an index of 22.0 and was to play a course with a slope rating of 136.  His four-ball handicap with a 90 percent allowance would be 23.83 before rounding.  Using the USGA method, his four-ball handicap would be 23 (i.e., a course handicap of 26 multiplied by 0.9 and rounded to 23).  Using the 90 percent of index method, his four-ball handicap would be 24.  The 90 percent of index method yields a handicap closer to the player’s actual ability (i.e., 23.83).
[10] Stroud, R.C., Riccio, L.J., “Mathematical underpinning of the slope handicap system,” in Science and Golf I, E & FN Spon, London, 1990, pp. 135-146.

Thursday, November 15, 2012

Comparing World Golf Ranking Systems

(Note: World Rankings have gained much importance over the last decade.  To my knowledge there has not been any published research validating the methodology for ranking players.  The paper below, written over ten years ago, tried to evaluate competing ranking systems by how well each predicted performance.  Neither ranking system did very well.  Rankings are not a powerful predictor because there is random variation in scoring (i.e., the best golfer will often lose in any single event), past performance is not a perfect indicator of future performance (e.g., a mutual fund that did well last year may not have similar results next year, as every prospectus will tell you), and results are affected by the attributes of the course being played (i.e., there are horses for courses).  Though a ranking system is not a good predictor of performance, it does serve two important functions.  First, the ranking system acts as a (hopefully) neutral arbiter in deciding who qualifies for major tournaments.  Second, it feeds our desire for lists (e.g., the 10 best places to retire, the 10 best Mexican restaurants), which also makes for endless 19th Hole banter--Luke Donald, are you kidding!)
            Golf Digest printed an article (April 1999) critical of the World Golf Rankings (hereinafter termed IMG rankings).  The author, Dean Knuth, devised a new system for ranking players.  Knuth explained why the Golf Digest system would do a better job, but supplied no empirical evidence to substantiate his claim.[1]  The purpose of this paper is to evaluate the two ranking systems in their ability to explain performance at the Tournament Players Championship (TPC).
            Rankings from both systems were available on 59 players who entered the TPC.  The rankings were not the most current since the Golf Digest rankings were only available as of January 31, 1999.
            Under each system, the players were ranked from 1 to 59.  First, the rankings were correlated with the standings after the second round, using the Spearman coefficient of rank correlation.[2]  A coefficient of 1.0 would mean the rankings and the order of finish were identical; a coefficient of zero would mean there was no relationship between the rankings and performance.  The IMG ranking had a slightly higher correlation coefficient (.56 > .54), as shown in the table below.
Spearman Correlation Coefficient

               2nd Round    All Players-Final    Cut Players
Golf Digest    .54                               .54
IMG            .56                               .53
Next, rankings were correlated with the final standings.   Those players who missed the cut or were disqualified were ranked last.  There was no significant difference between the correlation coefficients for the Golf Digest and IMG systems. 
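Ranking every player who missed the cut as "last" creates ties, and Spearman’s coefficient handles ties by giving tied players their average rank. The following is a minimal sketch of that calculation in plain Python. The function names and the five-player example are hypothetical. SciPy’s `scipy.stats.spearmanr` computes the same statistic, also using average ranks for ties.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical: five players' pre-tournament rank vs. finishing position,
# with the two players who missed the cut tied for last.
world_rank = [1, 2, 3, 4, 5]
finish = [2, 1, 3, 4.5, 4.5]
print(round(spearman(world_rank, finish), 3))  # 0.872
```

Squaring a coefficient of about .55 gives about .30, which matches the "approximately 30 percent" of variance mentioned in footnote 4.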
            Last, the study was limited to only those players who made the cut.  The 38 players in the study who made the cut were re-ranked, and that ranking was correlated with the final standings.[3]  The Golf Digest system had a slightly, but not significantly, larger correlation coefficient (.54 vs. .53) than the IMG system.  Plots of the TPC standings versus the rankings of each system (see the Appendix below) demonstrate the similarity of results under the two systems.  They also demonstrate that neither system explains a great deal of the variance in performance.[4]
In summary, the Golf Digest system does not do any better in explaining performance than the IMG system.  There may be elements of the Golf Digest system worthy of adoption.  Further research should isolate the effect of each of Golf Digest’s recommended changes to determine if it advances or impedes equity in the rankings.
            As with all quasi-experiments, there are significant caveats to these results.
·         Only One Tournament - There is a large random component in performance among golfers from week to week.  A player’s performance will also vary with the type of course (long, tight, etc.).  Even in the best case studied here, the rankings only explained 34 percent of the variation in the order of finish.  Therefore, the superiority of a ranking system could easily be due to chance when only one tournament is considered.  The methodology demonstrated here should be repeated at the four majors and WGC tournaments to determine if the Golf Digest system (or other variation) is consistently better than the IMG system.
·         Use of Old Rankings - The only set of both rankings available were as of January 31, 1999.  Any definitive study would need rankings current with the tournament under study.
·         Missing Rankings - The study was done using only 59 players out of the 144 who played.  The research should have had rankings for all players so a more complete evaluation could have been done.  It is especially important to see how well the ranking systems do in the lower regions – 40th to 70th – because of the new criterion for admission to tournaments based on the World Rankings.  Without more rankings on players in the lower ranges, no judgment on the efficacy of the two systems over various regions could be made.
            Rankings serve purposes beyond predicting performance.  The rankings will clearly be one factor in determining where and how often the best players play.  Tours will also be affected.  Decreasing the points awarded on the Japanese tour may threaten that tour’s financial viability and harm the popularity of the game in that country (e.g., what if no Japanese player qualified for a major event because of the ranking system?).[5]  It is important that these incentives be shaped into a framework that will promote and serve the best interests of the game.
Within that framework, however, it is possible that a more equitable ranking system could be built.  Many of the suggestions put forward by Golf Digest seem reasonable.  Policy should be changed, however, not on the basis of what appears reasonable but on the basis of what has been empirically shown to improve the accuracy of the rankings.





[1] This is consistent with Knuth’s defense of the Slope System – all theory and no empirical verification.  In setting forth his credentials for writing this article, Knuth again referenced a 30-year-old test score.
[2] Kendall, Maurice and Jean Gibbons, Rank Correlation Methods, Oxford University Press, NY, 1990.  The Pearson correlation coefficient gave approximately the same results as those shown in the table.
[3] Thirty-nine of the players in the study made the cut (i.e., the low 70 players and ties).  Nick Faldo was disqualified and was treated as a non-cut player.
[4] The two systems explain approximately 30 percent of the variance in performance.  There appears to be a large random component in determining performance.  That is why there are so many upsets in match play golf tournaments.
[5] Golf Digest’s claim of the inferiority of Japanese tour players was not borne out at the TPC.  Joe Ozaki led the tournament after two days and finished well ahead of his ranking.  Brian Watts tied for the lead after the first day, but finished slightly lower than his IMG ranking.  Shigeki Maruyama and Carlos Franco both missed the cut, but because of their low rankings (43rd and 41st) this was not unexpected.  To assess the relative strength of the various tours, the performance of each tour’s players should be evaluated over an extended period.  It would then be possible to adjust the points for each tour to bring performance and ranking into parity.  This is a much more scientific and equitable approach than simply slashing the points for the Japanese and Australian tours, as Golf Digest recommends.