As mentioned in the previous post, I’m not a huge fan of the Hot List testing. It reminds me more of the special advertising sections often found in magazines than the critical, objective, scientific study it attempts to pass itself off as.
Below I’ll summarize the methodology Golf Digest outlines, point out a few weaknesses of the methodology and finally suggest a few enhancements.
Methodology
First, Golf Digest assembled six scientists, seven retailers and 21 players. The scientists provide input regarding the various clubs’ technological innovations, the retailers provide input regarding demand for the clubs and the players provide input based on their use of the clubs.
Next, the “judges,” three Golf Digest editors and one Golf Digest writer, use the input provided by the scientists, retailers and players to assign a numerical score to each club in four categories: performance, innovation, look/sound/feel, and demand.
Finally, the four categories are weighted (45% performance, 30% innovation, 20% look/sound/feel, and 5% demand) to arrive at an overall score. Clubs with an overall score of 93-100 are designated “Gold” and clubs with a score of 88-92.99 are designated “Silver.”
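To make the arithmetic concrete, here is a rough sketch of how the weighting and the Gold/Silver cutoffs would work. The weights and cutoffs come from the magazine's description; the function names and the sample category scores are my own inventions, and I'm assuming anything from 88 up to (but not including) 93 counts as Silver, since the article doesn't say what happens between 92.99 and 93.

```python
# Category weights as stated in the Hot List methodology.
WEIGHTS = {
    "performance": 0.45,
    "innovation": 0.30,
    "look_sound_feel": 0.20,
    "demand": 0.05,
}


def overall_score(scores):
    """Weighted sum of the four category scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def medal(score):
    """Map an overall score to a designation.

    Assumption: scores of 93 and up are Gold, 88 up to 93 are Silver,
    and anything lower gets no medal (None).
    """
    if score >= 93:
        return "Gold"
    if score >= 88:
        return "Silver"
    return None


# Hypothetical category scores, made up purely for illustration.
example = {
    "performance": 95,
    "innovation": 90,
    "look_sound_feel": 92,
    "demand": 80,
}
total = overall_score(example)
print(round(total, 2), medal(total))
```

Note how little the demand score matters: at a 5% weight, a 20-point swing in demand moves the overall score by only one point, while the same swing in performance moves it by nine.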
Methodology Problems – Part 1: The Players
Let’s start from the beginning: the input provided by the players (I’ll give the retailers and scientists a pass for now).
My first concern is with fit. With 21 players and 98 clubs, 2058 club fittings would need to be conducted before testing begins. Even if every player didn’t test every club, that’s a lot of club fittings. I’m guessing the fittings didn’t get done or didn’t get done thoroughly.
Additionally some clubs simply don’t offer options that fit certain players. My experience with the Callaway Razr Hawk Tour is a perfect example of this. As this club is only offered with the Aldila RIP shaft (not a good match for me due to its low kick point) there isn’t a stock option that suits me. This doesn’t mean it’s a bad club, just not a good fit for me. With all of the players and clubs involved in the testing, surely there were similar misfits. Were such misfits identified and excluded from test results? If not, the results could be skewed.
In addition to fit, there are questions regarding players’ objectivity. While I’m sure that the players involved in the testing are well intentioned, our best efforts to be objective do not make us objective.
I think it’s safe to assume that all of the players involved in the testing own their own golf clubs. It’s therefore to be expected that there will be some confirmation bias reflected in the test results. Said another way, a player is more likely to score a Ping driver favorably if he has a Ping driver in his bag at home, as this validates his belief that the Ping driver belongs in his bag.
Players can also be biased by advertisements. Advertisements are effective. If they weren’t, companies wouldn’t spend so much money on advertising. It can therefore be assumed that the more a player has been exposed to a golf equipment company’s advertising, the more likely he is to be positively predisposed to that company’s products.
Additionally, players can be biased by their fellow players. If the players are able to talk amongst themselves, they will likely share their opinions. In this fashion, others’ opinions act as social proof (one means people use to determine what is correct is to find out what other people think is correct). For example, if one player mentions to another player that he loved the Adams hybrid, that other player would be more likely to rate the Adams club favorably. Conversely, if the first player said he hated the club, the other player would be more likely to rate the Adams club unfavorably.
The Hot List article does not address any steps taken to account for these biases. It would seem to me that these biases would tend to favor the larger equipment companies. The larger companies are more likely to have clubs already in the players’ bags at home (confirmation bias) and more likely to have influenced players through their advertising.
Methodology Problems – Part 2: The Judges
The judges are tasked with taking the input provided by the scientists, retailers and players and using it to assign a numerical score to each club in four categories: performance, innovation, look/sound/feel, and demand. I find this step very concerning.
My first concern is that this task is incredibly difficult. Can you translate player feedback such as “consistent flight and straight” and “you don’t have to force it” (note: this is some of the actual feedback judges were working with) into meaningful performance scores on a 100-point scale? The editors of Golf Digest would have you believe you can. In my opinion you cannot.
My second concern is objectivity. In addition to the objectivity challenges the players face, discussed above, the judges are faced with the “these are our advertisers” bias. Golf equipment companies have a lot of discretion in how they allocate their marketing spend. If you think that one of Golf Digest’s major advertisers such as Taylor Made or Callaway wouldn’t consider moving some of its Golf Digest marketing allocation to other magazines, television or online after getting unfavorable reviews in the Hot List issue, get in touch with me through the comments section (I’ve got a business associate with a bank account in Zaire I’d like to introduce you to).
An analysis of the number of pages of advertising purchased by equipment companies and the number of clubs those equipment companies have appearing on the Hot List shows a high degree of correlation. For the analysis I combined Callaway and Odyssey (as Odyssey is part of Callaway Golf) and I gave Titleist credit for Footjoy ads (both Titleist and Footjoy are owned by Acushnet). This analysis yielded a correlation coefficient of 0.798 (correlation is measured from -1 to 1). Ping was a bit of an outlier in that it had 14 clubs appearing on the Hot List and only one page of advertising. However, Ping’s one page of advertising was the back cover. This position makes the Ping ad highly visible and I’m sure Ping paid extra for this prime real estate. If we increase Ping’s advertising page count to three to reflect this special positioning, the correlation coefficient jumps to 0.876.
This correlation does not prove anything untoward is going on. One would expect some correlation between the two measures. The largest golf equipment companies have the largest marketing budgets so you’d expect to see the most ads from them and they also have the largest product lines. However it does highlight the fact that Golf Digest is in a very difficult position of reviewing its advertisers’ products.
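For anyone who wants to run this kind of check themselves, here is a minimal sketch of the calculation, assuming the ordinary Pearson correlation coefficient between ad pages and Hot List appearances. The function name is my own, and the numbers in the example are placeholders chosen to show how the function behaves; they are not the page counts or club counts from the magazine.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns a value between -1 and 1. Raises ZeroDivisionError if either
    sequence has no variation (all values identical).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder data, NOT the figures from the magazine: one entry per company.
ad_pages = [1, 2, 3, 4, 5]
hot_list_clubs = [2, 4, 6, 8, 10]
r = pearson(ad_pages, hot_list_clubs)
print(r)
```

Changing a single company's ad page count, as I did for Ping, simply means editing one entry in the first list and rerunning the calculation. With only a handful of companies in the sample, one adjustment like that can move the coefficient noticeably.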
Suggested Improvements
My first suggestion for improvement would be to attempt to reduce the players’ bias. One way this could be accomplished would be by disguising the clubs. I bet that if Golf Digest asked, the equipment companies would provide clubs that did not contain any names or other identifying markings. Alternatively, Golf Digest could use tape and/or spray paint to disguise the clubs. Of course there are some distinctive features, such as the Taylor Made R11s’ ASP Plate, which would be difficult to disguise, but this would be a step in the right direction as it would help reduce players’ confirmation bias and reduce the influence of advertising on players’ feedback.
In addition to the player testing, I’d also like to see Golf Digest implement some lab testing. For example, hot faces were frequently cited as key technology in many of the clubs but there was no attempt to determine which clubs offered the best face technology. I’d like to see Golf Digest measure the COR of each club’s clubface and present the data in a heat map graphic.
Lastly and most importantly, I’d like to see Golf Digest remove the editors from the equation. Why have the editors translate the scientists’, retailers’, and players’ feedback into numerical scores when the scientists, retailers, and players can do this themselves? The editors’ involvement in this process threatens to distort the findings in two ways. The first way in which the results are distorted I’ll call the “whisper down the lane” problem: the editors misinterpret the feedback they’re provided. The second way in which results are distorted is that editors can allow Golf Digest’s commercial interests (i.e. keeping advertisers happy) to influence their ratings. The scientists, retailers, and players who do not have this conflict are therefore in a better position to provide these ratings directly.