The Baseball Economist: The Real Game Exposed (33 page)

As an economist, I marvel every day at the power of market forces in allocating resources in all kinds of markets. I can find no reason to believe that these market forces are doing anything but bringing fans the baseball we deserve . . . and we must have been awfully good.
Epilogue
Under normal conditions the research scientist is not an innovator but a solver of puzzles, and the puzzles upon which he concentrates are just those which he believes can both be stated and solved within the existing scientific tradition.
—THOMAS KUHN
I CONSIDERED calling this book An Economist Ruins Baseball. After all, many people feel that economics really is dismal. However, I hope I have offered useful methods for understanding baseball on and off the field and demonstrated how satisfying and fun economics can be.
Applying my professional training to the game that I love has taught me new, unexpected lessons. Notions that I held about the game turned out to be false. I thought Leo Mazzone was overrated as a pitching coach and that batters protected one another in the lineup. I’m happy to have eliminated some of my ignorance.
One of my former professors once expressed to me his admiration for a book in which he disagreed with almost every single one of the author’s conclusions. Why was he so fond of the book? The methods used by the author were right, so, because the conclusions reached were unsatisfying, my professor thought about the subject in new ways. The book forced him to examine why he thought the author was wrong and so helped him better understand the subject. Unsatisfying answers provide opportunities to expand knowledge. I hope that if you find any of my conclusions to be unsatisfying, this will motivate your search for better ones.
If you ever find yourself next to an economist at a social function, ask what his or her hobbies are. I guarantee, within five minutes of discussing the topic, you’ll learn about a puzzle or two and some satisfying, intriguing solutions. Whatever you do, don’t ask about how to invest your money, unless you want the economist to notice his drink needs refilling. Economics furthers our understanding of important topics in so many areas. But really, what could be more important than baseball?
Acknowledgments
FIRST, I want to thank the many readers of my blog, Sabernomics, who have encouraged and challenged me over the past two years. Darren Viola at BaseballThinkFactory.com, David Pinto at BaseballMusings.com, and Mac Thomason at BravesJournal.com have provided other Internet forums to discuss and guide people to my work. The feedback has been helpful and kept me going.
This book would have been much more difficult without the availability of baseball data. The Lahman Baseball Archive (maintained by Sean Lahman) provides seasonal baseball statistics in a database format. The Retrosheet Project, which is maintained by David Smith but is the product of many volunteers, provided the historical play-by-play, game-by-game, and box score information. I think there are few people who have visited Baseball-Reference.com more than I have in the past two years. The site administrator, Sean Forman, has created the best baseball site on the web. I am grateful for Sean’s efforts, and I can’t wait to see what he has in store for the future as he sets his sights on making the site even better. Furthermore, Sean read parts of the manuscript and gave me several good suggestions.
I want to acknowledge and thank all of those people who helped me with this project in other ways. Some gave a lot, others a little, but I appreciate all of the contributions: Jim Albert, Dave Berri, Andy Bradbury, Dennis Coates, Tom Coker, Tyler Cowen, Mark Crain, Craig Depken, Heather Fain, David Gassko, Jahn Hakes, James Hall, Charlie Hallman, Elizabeth Hamilton, Jill Hendrickson, Kevin Holman, Brad Humphreys, Charles Israel, Dave Laband, Rich Lederer, Robert Mayhew, Chris McDonough, Jeff Merron, Jim Porter, Skip Sauer, Alan Schwarz, Johnny Shoaf, Clay Shonkwiler, Bill Shughart, Frank Stephenson, Dave Studenmund, Greg Tamer, Todd Thrasher, and Bob Tollison. Also, seminar participants at Clemson University, Wofford College, the Southern Economic Association, the Western Economic Association, and the American Mathematical Society provided many helpful suggestions. I thank John McArthur for his economic inspiration and for encouraging his sports economics students to find errors in the first edition of this book.
I owe a huge debt of gratitude to Stephen Morrow, my editor. He convinced me to write this book for Dutton, improved the manuscript in ways that I never would have seen on my own, and handled the bureaucratic tasks to make life much simpler for me. It’s been an absolute pleasure to work with him. I also want to thank my publicist, Beth Parker, and the rest of the folks at Dutton who contributed to the book.
This project would not have happened without my crossing paths with Doug Drinen five years ago, when we both joined the faculty at the University of the South. Though Doug is a mathematician, he’s a natural economist and an avid sports fan. He runs the best football statistics site on the web (Pro-Football-Reference.com) and happens to own a complete collection of The Bill James Baseball Abstracts. We spent many lunches discussing sports, politics, and life. I didn’t know how many undiscovered questions existed in sports until we met. We challenged each other’s ideas, and he corrected many of my mistaken notions. He helped me develop the ideas in the book, and several of the chapters are the product of joint research projects. Though he has not read the entire manuscript, no idea here is new to him. Additionally, he taught me several useful computer skills for organizing data, and in some instances he did the computer work for me. Doug is also a good friend, and I miss our weekly lunches, summer research projects, and Sunday football watching now that I have moved to Kennesaw State University. Though our research continues at a distance, it is not as enjoyable.
There is no way I could have completed this project without my loving wife, Rachael, who gave time so that I could write. She also provided plenty of encouragement and offered many valuable suggestions. Our three-year-old daughter, Rebekah, continues to be a constant source of inspiration. My in-laws contributed good advice and encouragement.
My mother and father, both former journalists, gave me plenty of suggestions, though their moral support was much more important. They spent many hours taking me to Little League practices and games, which helped build my passion for baseball. My father shared his childhood stories of going to Yankee Stadium to watch Yogi Berra and Mickey Mantle. And, most importantly, he sympathized with the pain felt by a ten-year-old who struck out at every trip to the plate; he made time for extra practice to teach me how to hit well enough to make the All-Star team. We tend to like the things we are good at; if it were not for my father’s persistence, I don’t think I’d care much for baseball today, which is why I dedicate this book to him.
APPENDIX A
A Simple Guide to Multiple Regression Analysis
WHAT DO YOU do if you want to know the impact of one factor on an observed outcome, but that outcome results from several potential explanatory factors? Disentangling the responsibility of different factors in the real world is virtually impossible without the aid of certain statistical tools. In the physical sciences, holding constant, or controlling for, the influence of factors is often done in a laboratory. For example, a physicist who wants to know how a baseball behaves absent the friction of the atmosphere could place a baseball inside a vacuum, thereby removing the atmosphere entirely, and run experiments. The economist who studies human behavior doesn’t normally have the opportunity to run such controlled experiments. We have to observe individuals going about their daily lives and attempt to make them comparable through statistical tools. Multiple regression analysis is one such tool. A spreadsheet program like Microsoft Excel can run a basic regression.
88
Let’s assume that we want to figure out the impact of years of schooling on the annual income of workers. We take an unbiased sample of workers, meaning we don’t want to accidentally pick workers of a certain type, and look at the education and incomes of these individuals. Figure 20 maps the income and years of schooling of a sample of fifteen workers in a scatterplot. Fifteen is a far smaller sample than we would like (typically, samples of thirty or larger are best, with more observations preferred to fewer), but lowering the number for this example makes the analysis easier to see graphically. Each point represents an individual according to these two characteristics. We often refer to the characteristics of observations as variables, because the characteristics vary among those in the sample.
From eyeballing the figure, there appears to be a slight upward trend in the relationship between education and income, but the relationship is far from perfect. There are plenty of wealthy people without high school diplomas, just as there are many Ph.D.s living on modest means. However, the characteristics tend to move in the same direction. If more educated individuals tended to earn less income than those with less education, this would be a negative, or inverse, correlation. In that case, the points would form a pattern that would slope down toward the southeast corner of the diagram.
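The direction of a relationship like this can also be checked numerically rather than by eye. The following is a minimal sketch in Python that computes the Pearson correlation coefficient; the sample values are invented for illustration and are not the data behind Figure 20, and the function name `pearson_r` is a hypothetical helper.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical sample, invented for illustration (not the Figure 20 data).
schooling = [8, 10, 12, 12, 14, 16, 16, 18]                        # years
income = [22000, 31000, 30000, 38000, 41000, 52000, 47000, 60000]  # dollars

r = pearson_r(schooling, income)
print(f"correlation = {r:.2f}")  # a value above zero means an upward-sloping pattern
```

A coefficient above zero corresponds to points sloping up toward the northeast corner of the scatterplot; a coefficient below zero corresponds to the inverse pattern described above.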
Knowing that there is a positive correlation between two variables is useful, but it doesn’t really provide much information. We would like to know two further bits of information: magnitude and certainty. How big an effect does an additional year of education have on income? $100, $1,000, $10,000, or some other amount? And once we estimate a magnitude, how confident are we in this estimate given the range of other possible estimates? For example, should we give or take $50 or $500? To answer these questions, we can estimate a regression line through the points to predict an average impact of education on earnings. A common technique for generating a regression line from the data is ordinary least squares (OLS). The method compares the distances from the observed values to the predictions of many hypothetical lines and then “picks” the line that minimizes the sum of the squared errors in prediction.
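For a single explanatory variable, the line that minimizes the sum of squared errors has a well-known closed-form solution, so no search over candidate lines is actually needed. The sketch below shows that calculation in Python; the data are invented for illustration, and `ols_fit` is a hypothetical helper name rather than anything from the book.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-movement of x and y divided by the variation in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sample, invented for illustration (not the Figure 20 data).
schooling = [8, 10, 12, 12, 14, 16, 16, 18]                        # years
income = [22000, 31000, 30000, 38000, 41000, 52000, 47000, 60000]  # dollars

intercept, slope = ols_fit(schooling, income)
print(f"income = {intercept:.2f} + {slope:.2f} * schooling")
```

The slope printed here is the estimated dollar change in income associated with one more year of schooling in this made-up sample.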
Figure 21 adds an OLS estimated regression line to the scatterplot of worker income and schooling. The vertical distance between the line and the actual points shows the individual prediction mistakes, or residuals. For the OLS line, squaring the prediction errors of all observations and adding them together yields a number lower than for any other hypothetical line drawn through these points. Squaring the errors serves two valuable functions: it counts positive and negative errors equally, and bigger errors receive greater weight. Because this is a linear function, it can be expressed in a simple formula, which many of us probably remember from middle school: Y = mX + b, where m is the slope of the line (remember rise-over-run, ∆Y/∆X) and b is the Y-intercept. In this example, the Y-variable, or the explained variable, is income, and the X-variable is schooling, which is an explanatory variable. Econometricians typically use a different notation, replacing the slope m with β and the Y-intercept b with α, so that Y = α + βX.
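The claim that the OLS line has a smaller sum of squared residuals than any other line can be checked directly. The following Python sketch fits Y = α + βX to an invented sample and compares its sum of squared errors with a few arbitrarily shifted lines; the data, the helper names `ols_fit` and `sse`, and the alternative lines are all assumptions made for illustration.

```python
def ols_fit(x, y):
    """Return (alpha, beta) of the least-squares line Y = alpha + beta * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta

def sse(x, y, alpha, beta):
    """Sum of squared residuals for the line Y = alpha + beta * X."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical sample, invented for illustration (not the Figure 21 data).
schooling = [8, 10, 12, 12, 14, 16, 16, 18]
income = [22000, 31000, 30000, 38000, 41000, 52000, 47000, 60000]

alpha, beta = ols_fit(schooling, income)
best = sse(schooling, income, alpha, beta)

# Arbitrary alternative lines: shift the intercept, the slope, or both.
alternatives = [(alpha + 1000, beta), (alpha, beta + 100), (alpha - 500, beta - 50)]
for a, b in alternatives:
    print(sse(schooling, income, a, b) > best)  # each alternative fits worse
```

Because the residuals are squared, a line that misses one worker by $2,000 is penalized four times as heavily as a line that misses by $1,000, which is the extra weight on bigger errors mentioned above.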
The slope of the line, β, is the estimated magnitude of the effect of schooling on income. Every one-unit increase in X is associated with a β-unit change in Y. Thus, for this example, every additional year of schooling is associated with a $β increase in income. The Y-intercept, α, accounts for all of the factors that are not included in the regression equation. In a properly specified regression model, the factors are random and will cancel out.
The magnitude of the β estimate (also known as a regression coefficient) is important, but so is the confidence we have in it. Regression procedures, such as OLS, pick the intercept and slope that minimize prediction mistakes from a range of possible β values; but, given the dispersion of the data, the estimates may not be precise. Therefore, we also want information on the spread of the range of estimates, known as a confidence interval. Even when an estimated β coefficient is positive or negative, there might be no true relationship between education and income; in that case, the β we observe is just an artifact of random fluctuations in the data. For example, if we estimate β = $2,500, what is the range of this possible relationship? The commonly preferred range of estimates is based on a 95 percent probability that the true magnitude lies within the range, although higher levels of confidence are sometimes preferred. For example, if there is a 95 percent chance that the true value lies within $1,000 of the estimate, then we can say that each year of education increases income by between $1,500 and $3,500. Can we say for certain that this correlation is positive? No, but we have a pretty good idea that it is. Because there is less than a 5 percent chance that the relationship between education and income is zero, we say that the relationship is positive. If the 95 percent confidence interval lies entirely above or below zero, then we have a high level of confidence that the relationship is not a product of random chance and is therefore statistically significant.
89
Oftentimes, researchers will report t-statistics or z-statistics so that others may observe the significance of the estimates. These statistics are computed by dividing each coefficient estimate by its standard error. As a rule of thumb, t-statistics and z-statistics greater than two in absolute value are statistically significant at roughly the 5 percent level.
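To make the link between the standard error, the t-statistic, and the confidence interval concrete, here is a minimal Python sketch for a one-variable regression. It uses the rule-of-thumb multiplier of two for the interval, which is only an approximation (the exact multiplier depends on the sample size and is larger for small samples); the data and the helper name `slope_inference` are invented for illustration.

```python
from math import sqrt

def slope_inference(x, y):
    """Return (beta, standard error, t-statistic, approximate 95% interval)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Residual variance uses n - 2 because two parameters were estimated.
    sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta = sqrt(sse / (n - 2) / sxx)
    t_stat = beta / se_beta
    # Rule-of-thumb interval: estimate plus or minus two standard errors.
    interval = (beta - 2 * se_beta, beta + 2 * se_beta)
    return beta, se_beta, t_stat, interval

# Hypothetical sample, invented for illustration.
schooling = [8, 10, 12, 12, 14, 16, 16, 18]
income = [22000, 31000, 30000, 38000, 41000, 52000, 47000, 60000]

beta, se_beta, t_stat, (low, high) = slope_inference(schooling, income)
print(f"beta = {beta:.0f}, standard error = {se_beta:.0f}, t = {t_stat:.2f}")
print(f"approximate 95% interval: {low:.0f} to {high:.0f}")
```

If the printed interval lies entirely above zero, the t-statistic will be greater than two, which is the same test of significance expressed two ways.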
