The Most Human Human
Brian Christian
Ah. Okay.
____rry!
Presumably “sorry,” and said with a friendly finality that seemed to signal that she was expecting me to hang up any second: probably this balcony room was out of my league. Fair enough.
Ok, thanks! Bye.
____ye, now!
I suppose I got off the phone with a mixture of bemusement and guilt—it hadn’t actually been all that necessary to hear what she was saying. I knew what she was saying. The boilerplate, the template of the conversation—my ability to guess at what she was asking me and what her possible responses could be—pulled me through.
It occurred to me that I’d been able to pull off the same trick the last time I’d been in Europe, a two-week-long Eurail-pass whirlwind through Spain, France, Switzerland, and Italy the summer after college: though I speak only English and some Spanish, I did much of the ticket buying in France and Italy, and managed for the most part to pull it off. Granted, I did nod understandingly (and, I confess, impatiently) when the woman sold us our overnight tickets to Salzburg and kept stressing, it seemed to me unnecessarily, “est … station … est … est … station”—“This station, this one, I understand, yeah, yeah,” I replied, knowing, of course, that in Spanish “este” means this. Of Paris’s seven train stations, our overnighter would be coming to this one, right here.
Perhaps you’re saying, “Wait, Salzburg? But he didn’t say anything about seeing Austria …” Indeed.
What I failed to remember, of course, is that “este” in Spanish also means east—a fact that dawns on me as we stand dumbfounded on a ghostly empty train platform at midnight in Paris’s Austerlitz Station, checking our watches, realizing that not only have we blown the chance at realizing our Austrian Sound of Music Alp-running fantasies, but we’ll have to do an emergency rerouting of our entire, now Austria-less, itinerary, as we’ll be just about a million meters off course by morning. Also: it’s now half past midnight, the guidebook is describing our present location as “dicey,” and we don’t have a place to sleep. Our beds have just departed from East Station, fast on their way to Salzburg.
Now, this rather—ahem—serious exception aside, I want to emphasize that by and large we did just fine, armed in part with my knowledge of a sibling Romance language, and in part with a guidebook that included useful phrases in every European tongue. You realize the talismanic power of language in these situations: you read some phonetically spelled gobbledygook off a sheet, and before you know it, abracadabra, beers have appeared at your table, or a hostel room has been reserved in your name, or you’ve been directed down a mazelike alley to the epicenter of late-night flamenco. “Say the magic word,” the expression goes, but all words seem, in one way or another, to conjure.
The dark side of this is that the sub-fluent traveler risks solipsism—which can only be cracked by what linguists and information theorists call “surprisal,” which is more or less the fancy academic way of saying “surprise.” The amazing thing about surprisal, though, is that it can actually be quantified numerically. A very strange idea—and a very important one. We’ll see how exactly that quantification happens later in this chapter; for now, suffice it to say that, intuitively, a country can only become real to you, that is, step out of the shadow of your stereotypes of the place, by surprising you. Part of this requires that you pay attention—most of life’s surprises are on the small side and often go unnoticed. The other part of it requires that you put yourself into situations where surprise is possible; sometimes this requires merely the right attitude of openness on your part, but other times it’s impossible without serious effort and commitment ahead of time (e.g., learning the language). Template-based interactions—“Je voudrais un hot dog, s’il vous plaît … merci!”; “Où est le WC? … merci!”—where you more or less treat your interlocutor as a machine, are navigable for precisely the reason that they are of almost no cultural or experiential value. Even if your interlocutor’s response is surprising or interesting, you might miss it.
Wielding language’s magic is intoxicating; becoming susceptible to it, even more so.
Perhaps you’re starting to feel by now how all of this parallels the Turing test. In France I behaved, to my touristy chagrin, like a bot. Speaking was the easy part—provided I kept to the phrase book (this in itself was embarrassing, that my desires were so similar to those of every other American tourist in France that a one-size-fits-all FAQ sheet sufficed handily). But listening was almost impossible. So I tried only to have interactions that didn’t really require it.
Interacting with humans in this way is, I believe, shameful. The Turing test, bless it, has now given us a yardstick for this shame.
It seems, at first glance, that information theory—the science of data transmission, data encryption, and data compression—would be mostly a question of engineering, having little to do with the psychological and philosophical questions that surround the Turing test and AI. But these two ships turn out to be sailing quite the same seas. The landmark paper that launched information theory is Claude Shannon’s 1948 “A Mathematical Theory of Communication,” and as it happens, this notion of scientifically evaluating “communication” binds information theory and the Turing test to each other from the get-go.
What is it, exactly, that Shannon identified as the essence of communication? How do you measure it? How does it help us, and how does it hurt us—and what does it have to do with being human?
These connections present themselves in all sorts of unlikely places, and one among them is your phone. Cell phones rely heavily on “prediction” algorithms to facilitate text-message typing: guessing what word you’re attempting to write, auto-correcting typos (sometimes overzealously), and the like—this is data compression in action. One of the startling results that Shannon found in “A Mathematical Theory of Communication” is that text prediction and text generation turn out to be mathematically equivalent. A phone that could consistently anticipate what you were intending to write, or at least that could do as well as a human, would be just as intelligent as the program that could write you back like a human. Meaning that the average American teenager, going by the New York Times’s 2009 statistics on cell phone texting, participates in roughly eighty Turing tests a day.
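To get a feel for why prediction and generation are two faces of the same thing (this is only a toy sketch, not Shannon’s argument), here is a tiny character-level model in Python: one and the same table of statistics is used both to predict the likeliest next character and to generate new text by sampling. The little corpus is just a placeholder.

    import random
    from collections import Counter, defaultdict

    # Placeholder corpus; any text will do.
    corpus = "the cat sat on the mat and the cat ate the rat "

    # Count, for each character, which characters tend to follow it.
    followers = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        followers[a][b] += 1

    def predict(ch):
        """Prediction: the single most probable character to follow ch."""
        return followers[ch].most_common(1)[0][0]

    def generate(ch, length=30):
        """Generation: sample a continuation from the very same table."""
        out = ch
        for _ in range(length):
            options = followers[out[-1]]
            out += random.choices(list(options), weights=list(options.values()))[0]
        return out

    print(predict("h"))   # in this corpus, the likeliest letter after "h" is "e"
    print(generate("t"))  # e.g. "the cat sat on the mat ate the..."

The better the table anticipates you, the more plausible its own output sounds; the two abilities rise and fall together.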
This turns out to be incredibly useful and also incredibly dangerous. In charting the links between data compression and the Turing test’s hunt for the human spark, I’ll explore why. I want to begin with a little experiment I did recently, to see if it was possible to use a computer to quantify the literary value of James Joyce.
1
I took a passage from Ulysses at random and saved it on my computer as raw text: 1,717 bytes.
Then I wrote the words “blah blah blah” over and over until it matched the length of the Joyce excerpt, and saved that: 1,717 bytes.
Then I had my computer’s operating system, which happens to be Mac OS X, try to compress them. The “blah” file compressed all the way down to 478 bytes, just 28 percent of its previous size, but Ulysses only came down to 79 percent of its prior size, or 1,352 bytes—leaving it nearly three times as large as the “blah” file.
When the compressor pushed down, something in the Joyce pushed back.
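The experiment is easy to repeat at home. Here is a rough sketch in Python using the standard zlib compressor (not the same compressor Mac OS X uses, so the exact byte counts will differ; the file name is a placeholder for whatever passage you choose):

    import zlib

    # Any prose excerpt will do; "ulysses_excerpt.txt" is a placeholder file name.
    joyce = open("ulysses_excerpt.txt", "rb").read()
    blah = (b"blah " * (len(joyce) // 5 + 1))[:len(joyce)]  # filler padded to the same length

    for name, data in [("Joyce", joyce), ("blah", blah)]:
        packed = zlib.compress(data, 9)
        print(f"{name}: {len(data)} bytes -> {len(packed)} bytes "
              f"({100 * len(packed) / len(data):.0f}% of original)")

Whatever the exact numbers, the filler collapses and the prose resists.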
Imagine flipping a coin a hundred times. If it’s a fair coin, you can expect about fifty heads and fifty tails, of course, distributed randomly throughout the hundred. Now, imagine telling someone which flips came out which—it’d be a mouthful. You could name all of the outcomes in a row (“heads, heads, tails, heads, tails, …”), or you could name only the locations of the heads (“the first, the second, the fourth, …”) or only the tails, letting the other be implicit—either way, the description comes out to be about the same length.
2
But if it’s a biased coin, your job gets easier. If the coin comes up heads only 30 percent of the time, then you can save breath by just naming which flips came up heads. If it’s heads 80 percent of the time, you simply name which flips were tails. The more biased the coin, the easier the description becomes, all the way up to a completely biased coin, our “boundary case,” which compresses down to a single word—“heads” or “tails”—that describes the entire set of results.
So, if the result of the flips can be expressed with less language the more biased the coin is, then we might argue that in these cases the result literally contains less information. This logic extends down, perhaps counterintuitively, perhaps eerily, into the individual events themselves—for any given flip, the more biased the coin, the less information the flip contains. There’s a sense in which flipping the seventy-thirty coin just doesn’t deliver what flipping the fifty-fifty coin does.
3
This is the intuition of “information entropy”: the notion that the amount of information in something can be measured.
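Shannon’s formula for this quantity is compact: the entropy of a set of outcomes is the sum, over all the outcomes, of -p log2(p). A few lines of Python show how the coins above stack up:

    import math

    def entropy(probs):
        """Shannon entropy in bits: H = -sum(p * log2(p)), skipping impossible outcomes."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin:           1.000 bit per flip
    print(entropy([0.7, 0.3]))   # seventy-thirty coin: ~0.881 bits per flip
    print(entropy([0.8, 0.2]))   # eighty-twenty coin:  ~0.722 bits per flip
    print(entropy([1.0, 0.0]))   # completely biased:   0.000 bits per flip

The fair coin sits at the maximum; the completely biased coin, our boundary case, carries no information at all.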
“Information can be measured”—at first this sounds trivial, of course. We buy hard drives and fill them up, wonder if shelling out the extra fifty dollars for the 16 GB iPod will be worth it compared to the 8 GB one, and so on. We’re used to files having size values in bytes. But the size of a file is not the same thing as the amount of information in a file. Consider as an analogue the difference between volume and mass; consider Archimedes and the golden crown—in order to determine whether the crown’s gold was pure, Archimedes needed to figure out how to compare its mass to its volume.
4
How do we arrive at the density of a file, the karat of its bytes?
We could compress the biased coin because it was biased. Crucially, if all outcomes of a situation are equally probable—what’s called a “uniform distribution”—then entropy is at its maximum. From there it decreases, all the way to a minimum value when the outcome is fixed or certain. Thus we might say that as a file hits its compression floor, the fixities and certainties shake out; pattern and repetition shake out; predictability and expectancy shake out; the resulting file—before it’s decompressed back into its useful form—starts looking more and more random, more and more like white noise.
Information, defined intuitively and informally, might be something like “uncertainty’s antidote.” This turns out also to be the formal definition—the amount of information comes from the amount by which something reduces uncertainty. (Ergo, compressed files look random: nothing about bits 0 through n gives you any sense what bit n + 1 will be—that is, there is no pattern or trend or bias noticeable in the digits—otherwise there would be room for further compression.
5
) This value, the informational equivalent of mass, comes originally from Shannon’s 1948 paper and goes by the name of “information entropy” or “Shannon entropy” or just “entropy.”
6
The higher the entropy, the more information there is. It turns out to be a value capable of measuring a startling array of things—from the flip of a coin to a telephone call, to a Joyce novel, to a first date, to last words, to a Turing test.
One of the most useful tools for quantitatively analyzing English goes by the name of the Shannon Game. It’s kind of like playing hangman, one letter at a time: the basic idea is that you try to guess the letters of a text, one by one, and the (logarithm of the) total number of guesses required tells you the entropy of that passage. The idea is to estimate how much knowledge native speakers bring to a text. Here’s the result of a round of the Shannon Game, played by yours truly:
7
We can see immediately that the information entropy here is wildly nonuniform: I was able to predict “the_living_room_is_a_” completely correctly, but almost exhausted the entire alphabet before getting the h of “handful”—and note how the “and” in “handful” comes easily but the entropy spikes up again at the f, then goes back down to the minimum at l. And “remo” was all I needed to fill in “te_control.”
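You can fake a crude version of the game in a few lines of Python. Crude, because the real game needs a human (or a good model of one) bringing context to every guess, whereas this stand-in just runs through the letters in rough English frequency order; the sample sentence below is stitched together from the fragments quoted above, so it is only an approximation of the round I actually played.

    # A context-free stand-in for the human guesser: space first, then letters
    # in rough English frequency order. The number of guesses a character takes
    # is a crude proxy for its surprisal.
    GUESS_ORDER = " etaoinshrdlcumwfgypbvkjxqz"

    def shannon_game(text):
        """Return (character, guesses needed) pairs for each character in text."""
        return [(ch, GUESS_ORDER.index(ch) + 1)
                for ch in text.lower() if ch in GUESS_ORDER]

    for ch, n in shannon_game("the living room is a handful of remote controls"):
        print(f"{ch!r}: guessed on try {n}")

Even this crude guesser shows the unevenness: spaces and common letters come cheap, while a rarer letter like the f in “handful” costs it seventeen tries.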
We computer users of the twenty-first century are perhaps more aware of information entropy—if not by name—than any generation before. When I use Google, I intuitively type in the most unusual or infrequent words or phrases, ignoring more common or expected words, as they won’t much narrow down my results. When I want to locate a passage in the huge MS Word document that contains this manuscript, I intuitively start to type the most unusual part of the passage I have in mind: either a proper noun or an unusual diction choice or a unique turn of phrase.
8
Part of the effectiveness of the strange editing mark “tk,” for “to come,” is that k rarely follows t in English, much more rarely than c follows t, and so a writer can easily use a computer to sift through a document and tell him whether there are any “tk’s” he missed. (Searching this manuscript for “tc” pulls up over 150 red herrings, like “watch,” “match,” throughout the manuscript; but with only one exception, all occurrences of “tk”—out of the roughly half-million characters that comprise a book—appear in this paragraph.) When I want to pull up certain songs or a certain band in my iTunes library, say Outkast, I know that “out” is such a prevalent string of letters (which pulls up all Outkast songs, plus 438 others I don’t want) that I’m better off just typing “kast” into the search box. Or even just that same trusty rare bigram “tk,” which pulls up all the songs I want and only three I don’t.
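If you want to check the arithmetic on a text of your own, the tally takes a few lines of Python (the file name is a placeholder for whatever manuscript you point it at):

    # Count how often each candidate bigram shows up; "tk" should be vanishingly rare.
    text = open("manuscript.txt", encoding="utf-8").read().lower()
    for bigram in ("tc", "tk"):
        print(bigram, text.count(bigram))

The rarer the string, the sharper the search: entropy, once again, doing the work.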