Everything Is Obvious, by Duncan J. Watts

CHAPTER 6: THE DREAM OF PREDICTION

  
1.
 See Rosenbloom (2009).

  
2.
 See Tetlock (2005) for details.

  
3.
 See Schnaars (1989, pp. 9–33) for his analysis and lots of entertaining examples. See also Sherden (1998) for additional evidence of the lousy forecasting record of futurologists. See also Kuran (1991) and Lohmann (1994) for discussions of the unpredictability of political revolutions, specifically the 1989 collapse of East Germany. And see Gabel (2009) for a retrospective look at the Congressional Budget Office’s Medicare cost predictions.

  
4.
 See Parish (2006) for a litany of intended blockbusters that tanked at the U.S. box office (although some, like Waterworld, later became profitable through foreign box office revenues and video and DVD sales). See Seabrook (2000) and Carter (2006) for some entertaining stories about some disastrous miscalculations and near-misses inside the media industry. See Lawless (2005) for some interesting background on the publisher Bloomsbury’s decision to acquire Harry Potter (for £2,500). General information about production in cultural industries is given in Caves (2000) and Bielby and Bielby (1994).

  
5.
 In early 2010, the market capitalization of Google was around $160B, but it has fluctuated as high as $220B. See Makridakis, Hogarth, and Gaba (2009a) and Taleb (2007) for lengthier descriptions of these and other missed predictions. See Lowenstein (2000) for the full story of Long-Term Capital Management.

  
6.
 Newton’s quote is taken from Janiak (2004, p. 41).

  
7.
 The Laplace quote is taken from http://en.wikipedia.org/wiki/Laplace%27s_demon.

  
8.
 Lumping all processes into two coarse categories is a vast oversimplification of reality, as the “complexity” of a process is not a sufficiently well understood property to be assigned anything like a single number. It’s also a somewhat arbitrary one, as there’s no clear definition of when a process is complex enough to be called complex. In an elegant essay, Warren Weaver, then vice president of the Rockefeller Foundation, differentiated between what he called disorganized and organized complexity (Weaver 1958), where the former corresponds to systems of very large numbers of independent entities, like molecules in a gas. Weaver’s point was that disorganized complexity can be handled with the same kinds of tools that apply to simple systems, albeit in a statistical rather than deterministic way. By organized complexity, however, he meant systems that are neither simple nor subject to the helpful averaging properties of disorganized systems. In my dichotomous classification scheme, in other words, I have effectively lumped together simple systems with disorganized systems. As different as they are, however, they are similar from the perspective of making predictions; thus the conflation does not affect my argument.

  
9.
 See Orrell (2007) for a slightly different take on prediction in simple versus complex systems. See Gleick (1987), Watts (2003), and Mitchell (2009) for more general discussions of complex systems.

10.
 When I say we can predict only the probability of something happening, I am speaking somewhat loosely. The more correct way to talk about prediction for complex systems is that we ought to be able to predict properties of the distribution of outcomes, where this distribution characterizes the probability that a specified class of events will occur. So, for example, we might predict the probability that it will rain on a given day, or that the home team will win, or that a movie will generate more than a certain level of revenue. Equivalently, we might ask questions about the number of points by which we expect the home team to win, or the revenue we expect a particular class of movies to earn, or even the variance that we expect to observe around the average. Regardless, all these predictions are about “average properties” in the sense that they can be expressed as an expectation of some statistic over many draws from the distribution of outcomes.
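
As a rough illustration of this idea (mine, not the author’s), the short Python sketch below draws many samples from a hypothetical distribution of outcomes and computes exactly the kinds of “average properties” described above; the distribution, its parameters, and the $100M threshold are invented purely for illustration.

    import random

    # Minimal sketch (illustrative only): predictions about a complex system are
    # statements about the distribution of outcomes, i.e., expectations of some
    # statistic over many draws, not claims about any single draw.

    random.seed(0)

    def simulate_revenue():
        # Hypothetical opening-weekend revenue in $M; parameters are made up.
        return random.lognormvariate(2.5, 1.0)

    draws = [simulate_revenue() for _ in range(100_000)]

    mean_revenue = sum(draws) / len(draws)
    p_blockbuster = sum(1 for r in draws if r > 100) / len(draws)
    variance = sum((r - mean_revenue) ** 2 for r in draws) / len(draws)

    print(f"expected revenue: ${mean_revenue:.1f}M")
    print(f"P(revenue > $100M): {p_blockbuster:.3f}")
    print(f"variance around the average: {variance:.1f}")

Each printed number is a property of the distribution as a whole; none of them says what any particular movie will earn.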

11.
 For a die roll, it’s even worse: The best possible performance is to be right one time out of six, or less than 17 percent. In real life, therefore, where the range of possible outcomes can be much greater than a die roll—think, for example, of trying to predict the next bestseller—a track record of predicting the right outcome 20 percent of the time might very well be as good as possible. It’s just that being “right” 20 percent of the time also means being “wrong” 80 percent of the time; that just doesn’t sound very good.
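
To make the arithmetic concrete, here is a tiny simulation (mine, not from the book); the result is just the fraction 1/6 recovered numerically.

    import random

    # Sketch: the best possible strategy for predicting a fair die roll, namely
    # always guessing the same face, is still wrong more than 83 percent of the time.

    random.seed(0)
    rolls = [random.randint(1, 6) for _ in range(100_000)]
    guess = 3  # for a fair die, any fixed guess performs equally well
    accuracy = sum(1 for r in rolls if r == guess) / len(rolls)
    print(f"accuracy of the best possible guess: {accuracy:.1%}")  # roughly 16.7%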

12.
 See http://www.cimms.ou.edu/~doswell/probability/Probability.html. Orrell (2007) also presents an informative discussion of weather prediction; however, he is mostly concerned with longer-range forecasts, which are considerably less reliable.

13.
 Specifically, “frequentists” insist that statements about probabilities refer to the relative fraction of particular outcomes being realized, and therefore apply only to events, like flipping a coin, that can in principle be repeated ad infinitum. Conversely, the “evidential” view is that a probability should be interpreted only as the odds one ought to accept for a particular gamble, regardless of whether it is repeated or not.

14.
 See de Mesquita (2009) for details.

15.
 As Taleb explains, the term “black swan” derives from the European settlement of Australia: Until the settlers witnessed black swans in what is now Western Australia, conventional wisdom held that all swans must be white.

16.
 For details of the entire sequence of events surrounding the Bastille, see Sewell (1996, pp. 871–78). It is worth noting, moreover, that other historians of the French Revolution draw the boundaries rather differently from Sewell.

17.
 Taleb makes a similar point—namely that to have predicted the invention of what we now call the Internet, one would have to have known an awful lot about the applications to which the Internet was put after it had been invented. As Taleb puts it, “to understand the future to the point of being able to predict it, you need to incorporate elements from this future itself. If you know about the discovery you are about to make, then you have almost made it” (Taleb 2007, p. 172).

CHAPTER 7: THE BEST-LAID PLANS

  
1.
 Interestingly, a recent story in Time magazine (Kadlec 2010) contends that a new breed of poker players is relying on statistical analysis of millions of games played online to win at major tournaments.

  
2.
 See Ayres (2008) for details. See also Baker (2009) and Mauboussin (2009) for more examples of supercrunching.

  
3.
 For more details on prediction markets, see Arrow et al. (2008), Wolfers and Zitzewitz (2004), Tziralis and Tatsiopoulos (2006), and Sunstein (2005). See also Surowiecki (2004) for a more general overview of the wisdom of crowds.

  
4.
 See Rothschild and Wolfers (2008) for details of the Intrade manipulation story.

  
5.
 In a recent blog post, Ian Ayres (author of Supercrunchers) calls the relative performance of prediction markets “one of the great unresolved questions of predictive analytics” (http://freakonomics.blogs.nytimes.com/2009/12/23/prediction-markets-vs-super-crunching-which-can-better-predict-how-justice-kennedy-will-vote/).

  
6.
 To be precise, we had different amounts of data for each of the methods—for example, our own polls were conducted over only the 2008–2009 season, whereas we had nearly thirty years of Vegas data, and TradeSports predictions ended in November 2008, when it was shut down—so we couldn’t compare all six methods over any given time interval. Nevertheless, for any given interval, we were always able to compare multiple methods. See Goel, Reeves, et al. (2010) for details.

  
7.
 In this case, the model was based on the number of screens the movie was projected to open on, and the number of people searching for it on Yahoo! the week before it opened. See Goel, Reeves, et al. (2010) for details. See Sunstein (2005) for more details on the Hollywood Stock Exchange and other prediction markets.

  
8.
 See Erikson and Wlezien (2008) for details of their comparison between opinion polls and the Iowa Electronic Markets.

  
9.
 Ironically, the problem with experts is not that they know too little, but rather that they know too much. As a result, they are better than nonexperts at wrapping their guesses in elaborate rationalizations that make them seem more authoritative, but are in fact no more accurate. See Payne, Bettman, and Johnson (1992) for more details of how experts reason. Not knowing anything, however, is also bad, because without a little expertise, one has trouble even knowing what one ought to be making guesses about. For example, while most of the attention paid to Tetlock’s study of expert prediction was directed at the surprisingly poor performance of the experts—who, remember, were more accurate when making predictions outside their area of expertise than in it—Tetlock also found that predictions made by naïve subjects (in this case university undergraduates) were significantly worse than those of the experts. The correct message of Tetlock’s study, therefore, was not that experts are no better than anyone else at making predictions, but rather that someone with only general knowledge of the subject, but not no knowledge at all, can outperform someone with a great deal of knowledge. See Tetlock (2005) for details.

10.
 Spyros Makridakis and colleagues have shown in a series of studies over the years (Makridakis and Hibon 2000; Makridakis et al. 1979; Makridakis et al. 2009b) that simple models are about as accurate as complex models in forecasting economic time series. Armstrong (1985) also makes this point.

11.
 See Dawes (1979) for a discussion of simple linear models and their usefulness to decision making.

12.
 See Mauboussin (2009, Chapters 1 and 3) for an insightful discussion on how to improve predictions, along with traps to be avoided.

13.
 The simplest case occurs when the distribution of probabilities is what statisticians call stationary, meaning that its properties are constant over time. A more general version of the condition allows the distribution to change as long as changes in the distribution follow a predictable trend, such as average house prices increasing steadily over time. However, in either case, the past is assumed to be a reliable predictor of the future.
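
A small Python sketch (mine, with invented numbers) may help make the distinction concrete: for a stationary series, the historical average is a reasonable forecast of the future; for a series with a predictable trend, the history is still useful, but only if the trend itself is modeled rather than averaged away.

    import random

    # Sketch (illustrative only): compare a stationary series with one whose
    # mean follows a predictable upward trend.

    random.seed(0)

    def stationary(t):
        return 100 + random.gauss(0, 5)          # mean constant over time

    def trending(t):
        return 100 + 2 * t + random.gauss(0, 5)  # mean rises predictably with t

    past, future = range(50), range(50, 60)

    for name, series in (("stationary", stationary), ("trending", trending)):
        past_mean = sum(series(t) for t in past) / len(past)
        future_mean = sum(series(t) for t in future) / len(future)
        print(name, round(past_mean, 1), round(future_mean, 1))

    # The stationary series' past mean is close to its future mean; the trending
    # series' is not, unless the fitted trend is extrapolated forward.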

14.
 Possibly if the models had included data from a much longer stretch of time—the past century rather than the past decade or so—they might have captured more accurately the probability of a large, rapid, nationwide downturn. But so many other aspects of the economy also changed over that period of time that it’s not clear how relevant much of this data would have been. Presumably, in fact, that’s why the banks decided to restrict the time window of their historical data the way they did.

15.
 See Raynor (2007, Chapter 2) for the full story.

16.
 Sony did in fact pursue a partnership with Matsushita, but abandoned the plan in light of Matsushita’s quality problems. Sony therefore opted for product quality while Matsushita opted for low cost—both reasonable strategies that had a chance of succeeding.

17.
 As Raynor writes, “Sony’s strategies for Betamax and MiniDisc had all the elements of success, but neither succeeded. The cause of these failures was, simply put, bad luck: the strategic choices Sony made were perfectly reasonable; they just turned out to be wrong.” (p. 44).

18.
 For an overview of the history of scenario planning, see Millet (2003). For theoretical discussions, see Brauers and Weber (1988), Schoemaker (1991), Perrottet (1996), and Wright and Goodwin (2009). Scenario planning also closely resembles what Makridakis, Hogarth and Gaba (2009a) call “future perfect thinking.”

19.
 For details of Pierre Wack’s work at Royal Dutch/Shell, see Wack (1985a; 1985b).

20.
 Raynor actually distinguishes three kinds of management: functional management, which is about optimizing daily tasks; operational management, which is focused on executing existing strategies; and strategic management, which is focused on the management of strategic uncertainty. (Raynor 2007, pp. 107–108)

21.
 For example, a 2010 story about Ford’s then CEO claimed that “What Ford won’t do is change direction again, at least not under Mr. Mulally’s watch. He promises that he—and Ford’s 200,000 employees—will not waver from his ‘point of view’ about the future of the auto industry. ‘That is what strategy is all about,’ he says. ‘It’s about a point of view about the future and then making decisions based on that. The worst thing you can do is not have a point of view, and not make decisions.’” New York Times, January 9, 2010.

22.
 This example was originally presented in Beck (1983), but my discussion of it is based on the analysis by Schoemaker (1991).

23.
 According to Schoemaker (1991, p. 552), “A deeper scenario analysis would have recognized the confluence of special circumstances (e.g. high oil prices, tax incentives for drilling, conducive interest rates, etc.) underlying this temporary peak. Good scenario planning goes beyond just high-low projections.”

24.
 See Raynor (2007, p. 37).

CHAPTER 8: THE MEASURE OF ALL THINGS

  
1.
 Some more details about Zara’s supply chain management are provided in a Harvard Business Review case study of the company (2004, pp. 69–70). Additional details are provided in Kumar and Linguri (2006).

  
2.
 Mintzberg, it should be noted, was careful to differentiate strategic planning from “operational” planning, which is concerned with short-term optimization of existing procedures. The kind of planning models that don’t work for strategic plans actually do work quite well for operational planning—indeed, it was for operational planning that the models were originally developed, and it was their success in this context that Mintzberg believed had encouraged planners to repurpose them for strategic planning. The problem is therefore not that planning of any kind is impossible, any more than prediction of any kind is impossible, but rather that certain kinds of plans can be made reliably and others can’t be, and that planners need to be able to tell the difference.

  
3.
 See Helft (2008) for a story about the Yahoo! home page overhaul.

  
4.
 See Kohavi et al. (2010) and Tang et al. (2010).

  
5.
 See Clifford (2009) for a story about startup companies using quantitative performance metrics to substitute for design instinct.

  
6.
 See Alterman (2008) for Peretti’s original description of the Mullet Strategy. See Dholakia and Vianello (2009) for a discussion of how the same approach can work for communities built around brands, and the associated tradeoff between control and insight.

  
7.
 See Howe (2008, 2006) for a general discussion of crowdsourcing. See Rice (2010) for examples of recent trends in online journalism.

  
8.
 See Clifford (2010) for more details on Bravo, and Wortman (2010) for more details on Cheezburger Network. See http://bit.ly/9EAbjR for an interview with Jonah Peretti about contagious media and BuzzFeed, which he founded.

  
9.
 See http://blog.doloreslabs.com for many innovative uses of crowdsourcing.

10.
 See Paolacci et al. (2010) for details of turker demographics and motivations. See Kittur et al. (2008) and Snow et al. (2008) for studies of Mechanical Turk reliability. And see Sheng, Provost, and Ipeirotis (2008) for a method for improving turker reliability.

11.
 See Polgreen et al. (2008) and Ginsberg et al. (2008) for details of the influenza studies. Recently, the CDC has reduced its reporting delay for influenza
caseloads (Mearian 2009), somewhat undermining the time advantages of search-based surveillance.

12.
 The Facebook happiness index is available at http://apps.facebook.com/usa-gnh. See also Kramer (2010) for more details. A similar approach has been used to extract happiness indices from song lyrics and blog postings (Dodds and Danforth 2009) as well as Twitter updates (Bollen et al. 2009).

13.
 See http://yearinreview.yahoo.com/2009 for a compilation of the most popular searches in 2009. Facebook has a similar service based on status updates, as does Twitter. As some commenters have noted (http://www.collisiondetection.net/mt/archives/2010/01/the_problem_wit.php), these lists often produce rather banal results, and so might be more interesting or useful if constrained to more specific subpopulations of interest to a particular individual—his or her friends, for example. Fortunately, modifications like this are relatively easy to implement; thus the fact that topics of highest average interest are unsurprising or banal does not imply that the capability to reflect collective interest is itself uninteresting.

14.
 See Choi and Varian (2008) for more examples of “predicting the present” using search trends.

15.
 See Goel, Hofman, Lahaie, et al. (2010) for details of using web search to make predictions.

16.
 Steve Hasker and I wrote about this approach to planning in marketing a few years ago in the Harvard Business Review (Watts and Hasker 2006).

17.
 The relationship between sales and advertising is in fact a textbook example of what economists call the endogeneity problem (Berndt 1991).

18.
 In fact, there was a time when controlled experiments of this kind enjoyed a brief burst of enthusiasm among advertisers, and some marketers, especially in the direct-mail world, still run them. In particular, Leonard Lodish and colleagues conducted a series of advertising experiments, mostly in the early 1990s using split cable TV (Abraham and Lodish 1990; Lodish et al. 1995a; Lodish et al. 1995b; and Hu et al. 2007). Also see Bertrand et al. (2010) for an example of a direct-mail advertising experiment. Curiously, however, the practice of routinely including control groups in advertising campaigns, for TV, word-of-mouth, and even brand advertising, never caught on, and these days it is mostly overlooked in favor of statistical models, often called “marketing mix models” (http://en.wikipedia.org/wiki/Marketing_mix_modeling).
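
For readers who want to see the underlying logic, here is a minimal Python sketch of a controlled advertising experiment; the purchase rates, the lift, and the simulate_user function are all hypothetical, and the code illustrates the general idea rather than any particular study cited above.

    import random

    # Sketch (hypothetical numbers): users are randomly assigned to see an ad
    # (treatment) or not (control); the difference in purchase rates estimates
    # the ad's causal effect, which observational data alone cannot pin down.

    random.seed(0)

    BASE_RATE = 0.020   # assumed purchase rate without the ad
    AD_LIFT = 0.004     # assumed true effect of seeing the ad

    def simulate_user(sees_ad):
        rate = BASE_RATE + (AD_LIFT if sees_ad else 0.0)
        return random.random() < rate

    n = 100_000  # users per group
    treatment = [simulate_user(True) for _ in range(n)]
    control = [simulate_user(False) for _ in range(n)]

    lift = sum(treatment) / n - sum(control) / n
    print(f"estimated lift in purchase rate: {lift:.4f}")  # close to the true 0.004

The randomized control group supplies the counterfactual directly, which is exactly what purely statistical models have to assume or estimate.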

19.
 See, for example, a recent Harvard Business School article by the president and CEO of comScore (Abraham 2008). Curiously, the author was one of Lodish’s colleagues who worked on the split-cable TV experiments.

20.
 User anonymity was maintained throughout the experiment by using a third-party service to match Yahoo! and retailer IDs without disclosing individual identities to the researchers. See Lewis and Reiley (2009) for details.

21.
 More effective advertising may even be better for the rest of us. If you only saw ads when there was a chance you might be persuaded by them, you’d
probably see many fewer ads, and possibly wouldn’t find them as annoying.

22.
 See Brynjolfsson and Schrage (2009). Department stores have long experimented with product placement, trying out different locations or prices for the same product in different stores to learn which arrangements sell the most. But now that virtually all physical products are labeled with unique barcodes, and many also contain embedded RFID chips, retailers have the potential to track inventory and measure variation between stores, regions, times of the day, or times of the year—possibly leading to what Marshall Fisher of the University of Pennsylvania Wharton School has called the era of “Rocket Science” retailing (Fisher 2009). Ariely (2008) has also made a similar point.

23.
 See http://www.povertyactionlab.org/ for information on the MIT Poverty Action Lab. See Arceneaux and Nickerson (2009) and Gerber et al. (2009) for examples of field experiments run by political scientists. See Lazear (2000) and Bandiera, Barankay, and Rasul (2009) for examples of field experiments run by labor economists. See O’Toole (2007, p. 342) for the example of the national parks and Ostrom (1999, p. 497) for a similar attitude to common pool resource governance, in which she argues that “all policy proposals must be considered as experiments.” Finally, see Ayres (2007, Chapter 3) for other examples of field experiments.

24.
 Ethical considerations also limit the scope of experimental methods. For example, although the Department of Education could randomly assign students to different schools, and while that would probably be the best way to learn which education strategies really work, doing so would impose hardship on the students who were assigned to the bad schools, and so would be unethical. If you have a reasonable suspicion that something might be harmful, you cannot ethically force people to experience it even if you’re not sure; nor can you ethically refuse them something that might be good for them. All of this is as it should be, but it necessarily limits the range of interventions to which aid and development agencies can assign people or regions randomly, even if they could do so practically.

25.
 For specific quotes, see Scott (1998) pp. 318, 313, and 316, respectively.

26.
 See Leonhardt (2010) for a discussion of the virtues of cap and trade. See Hayek (1945) for the original argument.

27.
 See Brill (2010) for an interesting journalistic account of the Race to the Top. See Booher-Jennings (2005) and Ravitch (2010) for critiques of standardized testing as the relevant metric for student performance and teacher quality.

28.
 See Heath and Heath (2010) for their definition of bright spots. See Marsh et al. (2004) for more details of the positive deviance approach. Examples of positive deviance can be found at http://www.positivedeviance.org/. The hand-washing story is taken from Gawande (2008, pp. 13–28), who describes an initial experiment run in Pittsburgh. Gawande cautions that it is still uncertain how well the initial results will last, or whether they will generalize to other hospitals; however, a recent controlled experiment (Marra et al. 2010) suggests that they might.

29.
 See Sabel (2007) for a description of bootstrapping. See Watts (2003, Chapter 9) for an account of Toyota’s near catastrophe with “just in time” manufacturing, and also their remarkable recovery. See Nishiguchi and Beaudet (2000) for the original account. See Helper, MacDuffie, and Sabel (2000) for a discussion of how the principles of the Toyota production system have been adopted by American firms.

30.
 See Sabel (2007) for more details on what makes for successful industrial clusters, and Giuliani, Rabellotti, and van Dijk (2005) for a range of case studies. See Lerner (2009) for cautionary lessons in government attempts to stimulate innovation.

31.
 Of course, in attempting to generalize local solutions, one must remain sensitive to the context in which they are used. Just because a particular hand-washing practice works in one hospital does not necessarily mean that it will work in another, where a different set of resources, constraints, problems, patients, and cultural attitudes may prevail. We don’t always know when a solution can be applied more broadly—in fact, it is precisely this unpredictability that makes central bureaucrats and administrators unable to solve the problem in the first place. Nevertheless, working out which local solutions can be generalized, and how far, should be the focus of the plan.

32.
 Easterly (2006, p. 6).
