Authors: William Poundstone
Tags: #Business & Economics, #Investments & Securities, #General, #Stocks, #Games, #Gambling, #History, #United States, #20th Century
to Vannevar Bush, Shannon wrote, “Off and on, I have been working on an analysis of some of the fundamental properties of general systems for the transmission of intelligence, including telephony, radio, television, telegraphy, etc.” This letter describes the beginning of information theory. As Shannon would ultimately realize, his theory of communication has surprising relevance to the problem of gambler’s ruin.
Before Shannon, most engineers did not see much of a connection between the various communications media. What applied to television did not apply to telegraphs. Communications engineers had learned some of the technical limits of each medium through trial and error, in about the way the cathedral builders of the Middle Ages learned something of structural engineering. Mostly, they learned what didn't work.
Shannon sensed that the field was due for a new synthesis. He apparently came to this subject without any coaching from Bush, and before he worked for Bell Labs, where it would have obvious economic value to AT&T.
Your home may have a fiber-optic cable leading into it, carrying TV channels, music, web pages, voice conversations, and all the other content we loosely call information. That cable is an example of a “communications channel.” It is a pipeline for messages. In some ways, it’s like the water pipe leading into your home. Pipe or cable, each can carry so much and no more. In the case of a water pipe, capacity is simply a matter of the width of the pipe’s bore. With a communications channel, the capacity is called bandwidth.
Flow of water through pipes is limited, not only by capacity but also by friction. The contact between the water and the inner wall of the pipe causes drag and turbulence, diminishing the flow. Communications channels are subject to noise that garbles messages. One of the rules of thumb that engineers had evolved was that noise diminishes the flow of information. When there’s a lot of noise, it may not be possible to transmit at all.
There is one extremely important way in which a fiber-optic cable (or any communications channel) is different from a water pipe. Water cannot be compressed, at least not much at the pressures used in household plumbing. A gallon of water always occupies a gallon’s worth of pipe. You can’t squish it into a pint in order to send more water through the same pipe. Messages are different. It is often easy to abbreviate or compress a message with no loss of meaning.
The first telegraph wires were precious commodities. Operators economized their nineteenth-century bandwidth by stripping out unnecessary words, letters, and punctuation marks. Today’s mobile phone users economize with text messages or slangy codes. As long as the receiver can figure out what was meant, that’s good enough.
You might compare messages to orange juice. Brazilian orange producers boil their juice into a syrupy concentrate. They send the concentrate to the United States, saving on shipping costs. At the end of the process, American consumers add water, getting approximately what the producers started with. Sending messages efficiently also involves a process of concentrating and reconstituting. Of course, with messages as well as orange juice, there is the question of whether some of the subtler nuances have been lost.
A particularly powerful way to compress messages is to encode them. Mobile phone and Internet connections do this automatically, without us having to think about it. A good encoding scheme can compress a message a lot more than a few abbreviations can.
The code that Morse devised for his telegraph was relatively good because the most common letter, E, is represented with the shortest code, a single dot. Uncommon letters like Z have longer codes with multiple dots and dashes. This makes most messages more concise than they were in some of the early telegraphic codes. This principle, and many more subtle ones, figures in today’s codes for compressing digital pictures, audio, and video.
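Morse's principle can be sketched numerically. The toy comparison below uses rough English letter frequencies and a four-letter alphabet (both illustrative assumptions, not figures from the text), pitting Morse-style variable-length codes against a naive code that gives every letter the same length:

```python
# Toy illustration of Morse's principle: frequent letters get short codes.
# Frequencies are rough English estimates (an assumption, not exact data);
# code lengths count the dots/dashes in International Morse.
freq = {"E": 0.127, "T": 0.091, "A": 0.082, "Z": 0.0007}  # partial table
morse_len = {"E": 1, "T": 1, "A": 2, "Z": 4}              # symbols per letter
fixed_len = 4  # a naive code assigning every letter the same length

# Expected symbols per letter, weighted by how often each letter occurs
# (normalizing over just these four letters for the comparison).
total = sum(freq.values())
avg_morse = sum(freq[c] / total * morse_len[c] for c in freq)

print(f"variable-length average: {avg_morse:.2f} symbols/letter")
print(f"fixed-length code:       {fixed_len} symbols/letter")
```

Because common letters dominate the weighted average, the variable-length scheme comes out far shorter per letter than the fixed-length one.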
The success of these compression schemes implies that messages are like sponges. They are mostly “air” with little “substance.” As long as you preserve the substance, you can squeeze out the air.
The question that all of Shannon’s predecessors tried to tackle was: What is the “substance” of a message, the essential part that can’t be dispensed with? To most, the answer was meaning. You can squeeze anything out of a message except meaning. Without meaning, there is no communication.
Shannon’s most radical insight was that meaning is irrelevant. To paraphrase Laplace, meaning was a hypothesis Shannon had no need of. Shannon’s concept of information is instead tied to chance. This is not just because noise randomly scrambles messages. Information exists only when the sender is saying something that the recipient doesn’t already know and can’t predict. Because true information is unpredictable, it is essentially a series of random events like spins of a roulette wheel or rolls of dice.
If meaning is excluded from Shannon’s theory, what is the incompressible substance that exists in every message? Shannon concluded that this substance can be described in statistical terms. It has only to do with how unpredictable the stream of symbols composing the message is.
A while back, a phone company ran ads showing humorous misunderstandings resulting from mobile phone noise. A rancher calls to order “two hundred oxen.” Because of the poor voice quality, he gets two hundred dachshunds—which are no good at pulling plows at all. A wife calls her husband at work and asks him to bring home shampoo. Instead he brings home Shamu, the killer whale.
The humor of these spots derived from a gut-level understanding of Shannon’s ideas that we all share whether we know it or not. Try to analyze what happened in the Shamu commercial: (1) The wife said something like, “Pick up shampoo!” (2) The husband heard “Pick up Shamu!” (3) The husband wound up the conversation, said goodbye, and on the way home picked up the killer whale.
It is only the third action that is ridiculous. It is ridiculous because “Pick up Shamu” is an extremely low-probability message. In real conversations, we are always trying to outguess each other. We have a continuously updated sense of where the conversation is going, of what is likely to be said next, and what would be a complete non sequitur. The closer two people are (personally and culturally), the easier this game of anticipation is. A long-married couple can finish each other’s sentences. Teen best friends can be in hysterics over a three-character text message.
It would be unwise to rely on verbal shorthand when speaking to a complete stranger or someone who doesn’t share your cultural reference points. Nor would the laconic approach work, even with a spouse, when communicating a message that can’t be anticipated. If you actually wanted your spouse to bring home Shamu, you wouldn’t just say, “Pick up Shamu!” You would need a good explanation. The more improbable the message, the less “compressible” it is, and the more bandwidth it requires. This is Shannon’s point: the essence of a message is its improbability.
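Shannon's point has a standard quantitative form (the formula is textbook notation, not quoted in this chapter): the information content, or "surprisal," of a message $m$ with probability $p(m)$ is

```latex
I(m) = -\log_2 p(m) \quad \text{bits}
```

so every halving of a message's probability adds exactly one bit to what is needed to convey it.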
Shannon was not the first to define information approximately the way he did. His two most important predecessors were both Bell Labs scientists working in the 1920s: Harry Nyquist and Ralph Hartley. Shannon read Hartley’s paper in college and credited it as “an important influence on my life.”
As he developed these ideas, Shannon needed a name for the incompressible stuff of messages. Nyquist had used “intelligence,” and Hartley had used “information.” In his earliest writings, Shannon favored Nyquist’s term. The military connotation of “intelligence” was fitting for the cryptographic work. “Intelligence” also implies meaning, however, which Shannon’s theory is pointedly not.

John von Neumann of Princeton’s Institute for Advanced Study advised Shannon to use the word entropy. Entropy is a physics term loosely described as a measure of randomness, disorder, or uncertainty. The concept of entropy grew out of the study of steam engines. It was learned that it is impossible to convert all the random energy of heat into useful work. A steam engine requires a temperature difference to run (hot steam pushing a piston against cooler air). With time, temperature differences tend to even out, and the steam engine grinds to a halt. Physicists describe this as an increase in entropy. The famous second law of thermodynamics says that the entropy of the universe is always increasing. Things run down, fall apart, get used up.
Use “entropy” and you can never lose a debate, von Neumann told Shannon—because no one really knows what “entropy” means. Von Neumann’s suggestion was not entirely flippant. The equation for entropy in physics takes the same form as the equation for information in Shannon’s theory. (Both are logarithms of a probability measure.)
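The formal parallel von Neumann alluded to can be made explicit. In their standard textbook forms (not spelled out in this chapter), the two quantities are

```latex
H = -\sum_i p_i \log_2 p_i \qquad \text{(Shannon entropy, bits per symbol)}
```

```latex
S = -k_B \sum_i p_i \ln p_i \qquad \text{(Gibbs entropy in statistical physics)}
```

where the $p_i$ are the probabilities of the possible symbols or microstates. Apart from the constant $k_B$ and the base of the logarithm, the two expressions are identical.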
Shannon accepted von Neumann’s suggestion. He used both the word “entropy” and its usual algebraic symbol, H. Shannon later christened his Massachusetts home “Entropy House”—a name whose appropriateness was apparent to all who set eyes on its interior.
“I didn’t like the term ‘information theory,’” Robert Fano said. “Claude didn’t like it either.” But the familiar word “information” proved too appealing. It is this term that has stuck, both for Shannon’s theory and for its measure of message content.
Shannon went far beyond the work of his precursors. He came up with results that surprised everyone. They seemed almost magical then. They still do.
One of these findings is that it is possible, through the encoding of messages, to use virtually the entire capacity of a communication channel. This was surprising because no one had come anywhere close to that in practice. No conventional code (Morse code, ASCII, “plain English”) is anywhere near as efficient as the theory said it could be.
It’s as if you were packing bowling balls into an orange crate. You’re going to find that there’s a lot of unused space no matter how you arrange the bowling balls, right? Imagine packing bowling balls so tightly that there’s no empty space at all—the crate is filled 100 percent with bowling balls. You can’t do this with bowling balls and crates, but Shannon said you can do it with messages and communications channels.
Another unexpected finding involves noise. Prior to Shannon, the understanding was that noise may be minimized by using up more bandwidth. To give a simple example, you might take the precaution of sending the same message three times (Pick up shampoo—Pick up shampoo—Pick up shampoo). Maybe the other person receives Pick up shampoo—Pick up Shamu—Pick up shampoo. By comparing the three versions, the recipient can identify and correct most noise errors. The drawback is that this eats up three times the bandwidth.
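The three-copies scheme amounts to a per-character majority vote. A minimal Python sketch (the received strings are invented for illustration, and the garbled copy is assumed to arrive at the same length):

```python
from collections import Counter

def majority_decode(copies):
    """Decode a message sent several times over a noisy channel by
    taking a per-character majority vote across the received copies."""
    assert len({len(c) for c in copies}) == 1, "copies must be the same length"
    return "".join(Counter(chars).most_common(1)[0][0]
                   for chars in zip(*copies))

# The Shamu scenario: noise garbled one of the three copies in transit.
received = ["Pick up shampoo",
            "Pick up Shamu!!",  # this copy was hit by noise
            "Pick up shampoo"]
print(majority_decode(received))  # prints: Pick up shampoo
```

Each character position is decided by two clean copies outvoting one garbled one, which is exactly why the scheme pays threefold in bandwidth for its error correction.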
Shannon proved that you can have your cake and eat it too. It is possible to encode a message so that the chance of noise errors is as small as desired—no matter how noisy the channel—and do this without using any additional bandwidth. This defied the common sense of generations of engineers. Robert Fano remarked,
To make the chance of error as small as you wish? Nobody had ever thought of that. How he got that insight, how he even came to believe such a thing, I don’t know. But almost all modern communication engineering is based on that work.
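Shannon's claim can be stated precisely. His noisy-channel coding theorem (given here in its standard textbook form, not quoted in this chapter) says that every channel has a capacity C, and at any transmission rate below C the error probability can be driven as close to zero as desired. For the common case of a bandlimited channel with Gaussian noise, the capacity is the Shannon–Hartley formula:

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```

where B is the bandwidth in hertz and S/N is the signal-to-noise power ratio. Reliable communication is possible at any rate below C and impossible above it.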
Initially it was hard to imagine how Shannon’s results would be used. No one in the 1940s pictured a day when people would navigate supermarket aisles with a mobile phone pressed to the side of their face. Bell Labs’ John Pierce had his doubts about the theory’s practical merit. Just use more bandwidth, more power, Pierce suggested. Laying cable was cheap compared to the computing power needed to use digital encoding.
Sputnik and the U.S. space program changed that mind-set. It cost millions to put a battery in space. Satellite communications had to make the best of anemic power and bandwidth. Once developed for NASA, digital codes and integrated circuits became cheap enough for consumer applications.
We would be living in a very different world today without Shannon’s work. All of our digital gear is subject to the noise of current surges, static, and cosmic rays. Every time a computer starts up, it reads megabytes of information from disk. Were even a few bits garbled, programs would be corrupted and would likely crash. Shannon’s theory showed that there is a way to make the chance of misread data negligible. The ambivalent blessing of Internet file sharing also derives from Shannon. Were it not for Shannon-inspired error-correcting codes, music and movie files would degrade every time they were transmitted over the Internet or stored on a hard disk. As one journalist put it recently, “No Shannon, no Napster.”
By the 1950s, the general press started to pick up on the importance of Shannon’s work. Fortune magazine declared information theory to be one of humanity’s “proudest and rarest creations, a great scientific theory which could profoundly and rapidly alter man’s view of the world.”
The very name “information theory” sounded expansive and open-ended. In the 1950s and 1960s, it was often used to embrace computer science, artificial intelligence, and robotics (fields that fascinated Shannon but which he considered distinct from information theory). Thinkers intuited a cultural revolution with computers, networks, and mass media at its base.
“The word communication will be used here in a very broad sense to include all of the procedures by which one mind may affect another,” begins the introduction to a 1949 book, The Mathematical Theory of Communication, reprinting Shannon’s paper. “This, of course, involves not only written and oral speech, but also music, the pictorial arts, the theater, the ballet, and in fact all human behavior.” These words were written by Shannon’s former employer Warren Weaver. Weaver’s essay presented information theory as a humanistic discipline—perhaps misleadingly so.
Strongly influenced by Shannon, media theorist Marshall McLuhan coined the term “information age” in Understanding Media (1964). Oracular as some of his pronouncements were, McLuhan spoke loud and clear with that concise coinage. It captured the way the electronic media (still analog in the 1960s) were changing the world. It implied, more presciently than McLuhan could have known, that Claude Shannon was a prime mover in that revolution.
There were earnest attempts to apply information theory to semantics, linguistics, psychology, economics, management, quantum physics, literary criticism, garden design, music, the visual arts, and even religion. (In 1949 Shannon was drawn into a correspondence with science fiction writer L. Ron Hubbard, apparently by way of John Pierce. Hubbard had just devised “Dianetics,” and Shannon referred him to Warren McCulloch, a scientist working on neural networks. To this day Hubbard’s Scientology faith cites Shannon and information theoretic jargon in its literature and web sites. Hubbard was known for repeating George Orwell’s dictum that the way to get rich is to start a religion.)
Shannon himself dabbled with an information-theoretic analysis of James Joyce’s Finnegans Wake. Betty Shannon created some of the first “computer-generated” music with Pierce. Bell Labs was an interdisciplinary place. Several of its scientists, notably Billy Kluver, collaborated with the New York avant-garde: John Cage, Robert Rauschenberg, Nam June Paik, Andy Warhol, David Tudor, and others, some of whom lived and worked steps away from Bell Labs’ Manhattan building on West Street. Many of these artists were acquainted with at least the name of Claude Shannon and the conceptual gist of his theory. To people like Cage and Rauschenberg, who were exploring how minimal a work of music or art may be, information theory appeared to have something to say—even if no one was ever entirely sure what.
Shannon came to feel that information theory had been over-sold. In a 1956 editorial he gently derided the information theory “bandwagon.” People who did not understand the theory deeply were seizing on it as a trendy metaphor and overstating its relevance to fields remote from its origin. Other theorists such as Norbert Wiener and Peter Elias took up this theme. It was time, Elias acidly wrote, to stop publishing papers with titles like “Information Theory, Photosynthesis, and Religion.”
To Shannon, Wiener, and Elias, the question of information theory’s relevance was more narrowly defined than it was for Marshall McLuhan. Does information theory have deep relevance to any field outside of communications? The answer, it appeared, is yes. That is what a physicist named John Kelly described, in a paper he titled “Information Theory and Gambling.”