Over the past decade, some academics have claimed to have shown scientifically that humans tend to become paralyzed by too many choices, a phenomenon often called the “paradox of choice.” Probably the best-known piece of evidence is the “jam experiment,” in which shoppers bought more jam when presented with fewer flavors than when confronted with many. This claim about human decisionmaking matters politically, because it has been used extensively to argue for paternalistic social policies under which the government would help people by restricting the choices available to them.
But what if one of the crucial experiments at the foundation of this mountain of inference showed no such thing?
Libertarian writer Virginia Postrel opens her recent New York Times review of a new book on the paradox of choice with this:
Sheena Iyengar is the psychologist responsible for the famous jam experiment. You may have heard about it: At a luxury food store in Menlo Park, researchers set up a table offering samples of jam. Sometimes, there were six different flavors to choose from. At other times, there were 24. (In both cases, popular flavors like strawberry were left out.) Shoppers were more likely to stop by the table with more flavors. But after the taste test, those who chose from the smaller number were 10 times more likely to actually buy jam: 30 percent versus 3 percent. Having too many options, it seems, made it harder to settle on a single selection.
Wherever she goes, people tell Iyengar about her own experiment. The head of Fidelity Research explained it to her, as did a McKinsey & Company executive and a random woman sitting next to her on a plane. A colleague told her he had heard Rush Limbaugh denounce it on the radio. That rant was probably a reaction to Barry Schwartz, the author of “The Paradox of Choice” (2004), who often cites the jam study in antimarket polemics lamenting the abundance of consumer choice. In Schwartz’s ideal world, stores wouldn’t offer such ridiculous, brain-taxing plenitude. Who needs two dozen types of jam?
It turns out that I, too, was told the story of the jam experiment (for the umpteenth time) at a business conference a couple of weeks ago. But what piqued my interest was a telling detail, highlighted in Postrel’s characteristic way, that I had never heard before: those who chose from the smaller assortment were ten times more likely to buy jam. I’ve designed and analyzed a lot of retail experiments, and causing a 10X increase in sales by changing a shelf assortment would be a truly astounding result.
Before getting into the detailed analysis, stop to notice that if this result were valid and applicable with the kind of generality required to be relevant as the basis for social policy, it would imply that lots of retailers could simultaneously eliminate 75 percent of their inventory and increase sales by 900 percent. I don’t believe in purely efficient markets, but that doesn’t seem very plausible to me.
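The arithmetic behind that implication is simple: keeping 6 of 24 flavors removes 75 percent of the assortment, and a tenfold increase in sales is a 900 percent increase over the baseline. A minimal Python check (the variable names are ours, for illustration only):

```python
# Assortment sizes from the jam experiment: 6 (limited) vs. 24 (extensive) flavors.
small, large = 6, 24

# Share of the larger assortment removed when cutting it down to the smaller one.
share_eliminated = 1 - small / large           # 0.75, i.e., 75 percent

# A 10X sales multiple expressed as a percentage increase over the baseline.
sales_multiple = 10
percent_increase = (sales_multiple - 1) * 100  # 900 percent

print(f"eliminated: {share_eliminated:.0%}, sales increase: {percent_increase}%")
```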
As I dug into the experiment, I became pretty sure that this is not what happened, and I’ll try to describe why.
Some detail on what the researchers actually did is important. On two consecutive Saturdays, they operated a tasting booth inside a specific grocery store in Menlo Park, Calif., for five hours each day. Here is how the original academic paper describes the procedure:
Two research assistants, dressed as store employees, invited passing customers to “come try our Wilkin and Sons jams.” Shoppers encountered one of two displays. On the table were either 6 (limited-choice condition) or 24 (extensive-choice condition) different jams. On each of two Saturdays, the displays were rotated hourly; the hours of the displays were counterbalanced across days to minimize any day or time-of-day effects.
Consumers were allowed to taste as many jams as they wished. All consumers who approached the table received a coupon for a $1 discount off the purchase of any Wilkin & Sons jam. Afterwards, any shoppers who wished to purchase the jam needed to go to the relevant jam shelf, select the jam of their choice, and then purchase the item at the store’s main cash registers.
Across the ten-hour experimental period, 145 people stopped at the extensive-assortment booth, and of these, four bought jam with the coupon (a 3 percent redemption rate). One hundred and four people stopped at the limited-assortment booth, and of these, 31 bought jam with the coupon (a 30 percent redemption rate).
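The headline figures follow directly from these counts. A short Python check (the helper name `redemption_rate` is ours, not the paper’s) shows that the rates round to about 3 and 30 percent, that their ratio is roughly 10.8, and that the whole comparison rests on 35 redeemed coupons:

```python
def redemption_rate(stopped: int, bought: int) -> float:
    """Fraction of booth visitors who later bought jam with the coupon."""
    return bought / stopped

# Counts as reported for the two displays.
extensive = redemption_rate(stopped=145, bought=4)   # about 0.028
limited = redemption_rate(stopped=104, bought=31)    # about 0.298

print(f"extensive: {extensive:.1%}, limited: {limited:.1%}")
print(f"ratio: {limited / extensive:.1f}x, jars sold: {4 + 31}")
```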
The fundamental problem that confronts all retail-store experiments is “signal-to-noise”: the background variation in day-to-day store performance is typically very large compared to the actual causal effect of the business program being tested. These researchers are careful scholars who worked hard to correct for this by using alternating hours across two days, but the design is simply not sufficient.
What they were really testing was not the effect of changing assortment breadth on sales, but the effect of changing the assortment breadth of an in-store display on the redemption rate of a store-distributed coupon. While it seems intuitively unlikely that a smaller display could produce a tenfold improvement in redemption over a larger one, it also seems implausible that a difference in response rate this large could be due to random chance, right? Not necessarily.
What are the odds that one randomly chosen group of about 100 of the people who were given a coupon would have a redemption rate ten times as large as another similarly sized random group given the exact same coupon? They are higher than you might think. Consider an example. A recent in-store coupon run by a large-format grocery-store chain was distributed to more than 1.3 million shoppers. I randomly divided them into about 13,000 groups of 100 shoppers each. I then randomly paired each of these groups with one other, creating about 6,500 randomly matched pairs. In a little over 9 percent of these pairings, the redemption rate was at least ten times as high in one group as in its matched pair. By this simplified and indicative metric, the jam experiment would fail to achieve the standard level of statistical confidence required to reject the hypothesis that its result was just random variation.
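The grouping-and-pairing procedure is easy to sketch. We don’t have the chain’s shopper-level data, so the Python sketch below runs on synthetic shoppers drawn with a hypothetical, uniform redemption probability (`p=0.03` is our assumption); it illustrates the mechanics of the calculation and is not meant to reproduce the 9 percent figure. The helpers `synthetic_shoppers` and `tenfold_gap_share` are hypothetical names, and two further rules are our assumptions: a pair in which one group had zero redemptions and the other had at least one counts as a tenfold gap, and leftover shoppers or an unpaired leftover group are dropped.

```python
import random

def synthetic_shoppers(n: int, p: float, seed: int = 1) -> list[int]:
    """Hypothetical stand-in for real coupon data: 1 = redeemed, 0 = did not.
    Every shopper redeems independently with the same probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def tenfold_gap_share(outcomes, group_size=100, ratio=10, seed=0):
    """Randomly split shoppers into equal groups, randomly pair the groups, and
    return (share of pairs with a >= `ratio`-fold redemption gap, number of pairs)."""
    rng = random.Random(seed)
    shoppers = list(outcomes)
    rng.shuffle(shoppers)                          # random division into groups
    n_groups = len(shoppers) // group_size         # leftover shoppers are dropped
    counts = [sum(shoppers[i * group_size:(i + 1) * group_size])
              for i in range(n_groups)]
    rng.shuffle(counts)                            # random pairing of groups
    pairs = list(zip(counts[0::2], counts[1::2]))  # an odd leftover group is dropped
    if not pairs:
        raise ValueError("need at least two full groups")
    hits = 0
    for a, b in pairs:
        low, high = min(a, b), max(a, b)
        # Equal group sizes, so comparing counts is the same as comparing rates.
        # Assumption: 0 vs. at least 1 redemption counts as a `ratio`-fold gap.
        if high > 0 and high >= ratio * low:
            hits += 1
    return hits / len(pairs), len(pairs)

share, n_pairs = tenfold_gap_share(synthetic_shoppers(1_300_000, p=0.03))
print(f"{n_pairs} pairs, {share:.1%} with a tenfold gap")
```

With 1.3 million shoppers and groups of 100, the sketch forms 6,500 pairs, matching the description above. The share it reports is sensitive to the assumed redemption probability and to how zero-redemption groups are treated, so it is worth trying several values before reading anything into a single run.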
And while the specifics will vary for any given coupon, depending on characteristics like product category, average redemption rate, and time of year, this indicative analysis almost certainly understates the actual probability of seeing this much difference between the two groups in the experiment. The two groups of jam buyers were not assigned randomly. Because the experiment ran for a total of ten hours in only one store, and because shoppers were grouped in hourly chunks, there are all kinds of reasons why the people who happened to show up during the five hours of limited assortment might have a different propensity to respond to a one-dollar-off coupon for a specific line of jams than those who arrived during the other five hours. Maybe a soccer game ended at a particular time, and several parents whose propensity to redeem differs from the average shopper’s came in at nearly the same time; or maybe a bad traffic jam in a part of town with an unusual propensity to respond to the coupon kept several people from reaching the store during one hour rather than another. Remember, all of the inference is built on the purchase of a grand total of 35 jars of jam. This is one reason why rigorous retail experiments, when a lot of money is at stake, are typically run in dozens of randomly assigned stores for a period of weeks, and even sample sizes like that push the limits of causal inference.
But the result is at least interesting, and the right way to figure out whether or not the result is valid and generalizable is replication. Over the past ten years, a number of such experiments have been done by academics to evaluate the asserted paradox of choice for product categories ranging from mp3 players to mutual funds, and a paper was published in February (Scheibehenne, et al.) that conducted a meta-analysis of 50 of them (h/t Tim Harford). Across all of these experiments, the average effect of increasing choice on consumption or satisfaction was “virtually zero.” Further, this meta-analysis showed a positive average effect of increasing choices for those experiments that, like the jam experiment, tested the effect of choice on consumption quantity, rather than some measure of satisfaction, as the outcome. That is, when it comes to sales, more choice is better.
This is consistent with all of the unpublished assortment experiments that I’ve seen, and should not be surprising. As a store adds more and more products to a given product line (say, canned soup), sales will rise sub-linearly with product count. The first product in a category will generally be the one with the highest sales (say, Campbell’s tomato soup), and the 1,000th one added will generally have a small market. Further, people will not keep increasing their consumption of canned soup as a category just because more choices are available. Costs, on the other hand, continue to rise as the store adds more and more kinds of canned soup. At some point, incremental products will add some small positive revenue but also enough cost to be unprofitable, so the most profitable assortment will still leave out some products that would increase revenue for the category. Since most assortment experiments are designed to find the profit optimum, adding products in this range will almost always drive some gain in revenue. There are exceptions, such as a store that has grossly misestimated demand in some category, or a business change that combines a reduced assortment with a massive investment in improving the overall merchandising of the department, but these are rare. And at some point, obviously, an assortment would get so large that sales would actually decline, for practical reasons like consumers simply not being able to get to products. The paradox of choice will surely occur in some contexts; it’s just that markets don’t seem to produce this outcome very often.
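The logic of a profit optimum that sits below the revenue-maximizing assortment can be illustrated with a deliberately simple toy model. In the Python sketch below, both functional forms and both parameters are hypothetical assumptions chosen for illustration (revenue that grows logarithmically with product count, and a fixed carrying cost per product), not estimates from any retailer. Under these assumptions profit peaks at 99 products, yet a 100th product would still add revenue.

```python
import math

# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
REVENUE_SCALE = 1000.0   # scales the sub-linear revenue curve
COST_PER_PRODUCT = 10.0  # constant carrying cost for each product stocked

def revenue(n: int) -> float:
    """Toy category revenue: always rising with product count n, but sub-linearly."""
    return REVENUE_SCALE * math.log1p(n)

def cost(n: int) -> float:
    """Toy assortment cost: rises linearly with product count n."""
    return COST_PER_PRODUCT * n

def profit(n: int) -> float:
    return revenue(n) - cost(n)

def profit_optimal_count(max_n: int = 1000) -> int:
    """Product count between 0 and max_n that maximizes toy profit."""
    return max(range(max_n + 1), key=profit)

best = profit_optimal_count()
print(best)                                # 99
print(revenue(best + 1) > revenue(best))   # True: one more product still adds revenue
print(profit(best + 1) < profit(best))     # True: ...but it lowers profit
```

Because the toy revenue curve never turns down, it deliberately ignores the point at which an assortment becomes so large that sales actually fall.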
This does not mean, of course, that more choice is always better in every way. None of my comments address the more ineffable feelings of discontent that might be created in a world of many choices, because that is not the most prominent claim made on the basis of the jam experiment, which is what I am responding to. On that point, I will only note that there is not a lot of evidence of widespread voluntary abandonment of the choices offered by the modern world. Sure, lots of people consciously simplify their lives; this has been a real social movement for at least the past decade. In less self-dramatizing ways, all of us do this without announcing it when we use brands and other methods to restrict the alternatives we consider, because we have only finite time and energy to devote to any given purchase decision.
But I think that viewing this kind of decisionmaking as evidence of the need to restrict choice coercively is a mistake. We make these decisions within nested hierarchies of choice. Person A decides to shop for hammers at Home Depot because the enormous range of choices is important to him in this category, but buys his beer at 7-Eleven. Person B shops for beer at a specialty store, but buys whatever hammer he can find at Walgreens. One quick observation is that A has probably spent time learning about hammers and B about beer; otherwise they, too, would have felt overwhelmed by the variety of choices on offer. (The Scheibehenne meta-analysis showed that decisionmakers with strong prior preferences or expertise benefit from having more options to choose from. In a narrow context, this is another reason why most retail experiments, unlike the jam experiment, run for several weeks: it gives consumers time to figure out how to respond to the offer.) A second observation is that having Home Depot, 7-Eleven, Walgreens, and a specialty beer shop available is in the collective interest of both A and B, even though neither of them shops at all four stores. At a higher level, A might like shopping in general and so be happy that lots of stores offer lots of things for sale, while B likes sailing and surfing and so is happy that our society offers lots of alternative ways to get out on the water. And so on, up through ever-higher levels of abstraction. We choose to simplify some decisions in our lives, but that doesn’t mean we want somebody else choosing for us where we should have broad versus narrow ranges of choice.
In a 2005 article in Reason magazine, Postrel placed this debate in its proper philosophical context using elegant, even beautiful prose:
Ultimately, the debate about choice is not about markets but about character. Liberty and responsibility really do go together; it’s not just a platitude. The more freedom we have to control our lives, the more responsibility we have for how they turn out. In a world of constraints, learning to be happy with what you’re given is a virtue. In a world of choices, virtue comes from learning to make commitments without regrets. And commitment, in turn, requires self-confidence and self-knowledge.
“We are free to be the authors of our lives,” says Schwartz, “but we don’t know exactly what kind of lives we want to ‘write.’” Maturity lies in deciding just that.