The Emperor of All Maladies: A Biography of Cancer

Author: Siddhartha Mukherjee

Screening trials in cancer are among the most slippery of all clinical trials—notoriously difficult to run, and notoriously susceptible to errors. To understand why, consider the odyssey from the laboratory to the clinic of a screening test for cancer. Suppose a new test has been invented in the laboratory to detect an early, presymptomatic stage of a particular form of cancer, say, the level of a protein secreted by cancer cells into the serum. The first challenge for such a test is technical: its performance in the real world. Epidemiologists think of screening tests as possessing two characteristic performance errors. The first error is overdiagnosis—when an individual tests positive but does not have cancer. Such individuals are called “false positives.” Men and women who falsely test positive find themselves trapped in the punitive stigma of cancer, the familiar cycle of anxiety and terror (and the desire to “do something”) that precipitates further testing and invasive treatment.

The mirror image of overdiagnosis is underdiagnosis—an error in which a patient truly has cancer but does not test positive for it. Underdiagnosis falsely reassures patients of their freedom from disease. These men and women (“false negatives” in the jargon of epidemiology) enter a different punitive cycle—of despair, shock, and betrayal—once their disease, undetected by the screening test, is eventually uncovered when it becomes symptomatic.
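
The two errors are conventionally summarized as rates: sensitivity (the share of true cancers a test catches) and specificity (the share of cancer-free people it correctly clears). Here is a minimal Python sketch, assuming a hypothetical tally of test results checked against true disease status; the function name `screening_rates` and the counts are illustrative, not drawn from any real test.

```python
def screening_rates(true_pos: int, false_pos: int, true_neg: int, false_neg: int) -> dict:
    """Summarize a screening test's two characteristic errors as rates.

    Hypothetical helper for illustration; inputs are counts of people.
    """
    with_cancer = true_pos + false_neg     # everyone who truly has the disease
    without_cancer = true_neg + false_pos  # everyone who does not
    return {
        "sensitivity": true_pos / with_cancer,
        "specificity": true_neg / without_cancer,
        # false negatives: falsely reassured, disease found only when symptomatic
        "false_negative_rate": false_neg / with_cancer,
        # false positives: falsely alarmed, sent on to further testing
        "false_positive_rate": false_pos / without_cancer,
    }


# Hypothetical counts for 1,000 people screened, 10 of whom truly have cancer.
rates = screening_rates(true_pos=8, false_pos=95, true_neg=895, false_neg=2)
print(rates)
```

With these toy counts the test catches 8 of 10 cancers (sensitivity 0.8), yet 95 of the 103 people it flags do not have cancer at all, which is why a seemingly small false-positive rate can dominate a screening program.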

The trouble is that overdiagnosis and underdiagnosis are often intrinsically conjoined, locked perpetually on two ends of a seesaw. Screening tests that strive to limit overdiagnosis—by narrowing the criteria by which patients are classified as positive—often pay the price of increasing underdiagnosis because they miss patients that lie in the gray zone between positive and negative. An example helps to illustrate this trade-off. Suppose—to use Egan’s vivid metaphor—a spider is trying to invent a perfect web to capture flies out of the air. Increasing the density of that web, she finds, certainly increases the chances of catching real flies (true positives) but it also increases the chances of capturing junk and debris floating through the air (false positives). Making the web less dense, in contrast, decreases the chances of catching real prey, but every time something is captured, chances are higher that it is a fly. In cancer, where both overdiagnosis and underdiagnosis come at high costs, finding that exquisite balance is often impossible. We want every cancer test to operate with perfect specificity and sensitivity. But the technologies for screening are not perfect. Screening tests thus routinely fail because they cannot even cross this preliminary hurdle—the rate of over- or underdiagnosis is unacceptably high.
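
The seesaw can be made concrete with the serum-protein test imagined earlier. The sketch below assumes, hypothetically, that a result counts as positive when the protein level meets a cutoff, and it uses invented readings in which the two groups overlap (the gray zone). Raising the cutoff plays the role of loosening the spider’s web.

```python
def confusion_at(threshold, cancer_levels, healthy_levels):
    """Count outcomes when 'level >= threshold' is called positive.

    Returns (true_pos, false_pos, true_neg, false_neg). Hypothetical helper.
    """
    tp = sum(level >= threshold for level in cancer_levels)
    fn = len(cancer_levels) - tp
    fp = sum(level >= threshold for level in healthy_levels)
    tn = len(healthy_levels) - fp
    return tp, fp, tn, fn


# Invented serum-protein readings (arbitrary units); the groups overlap.
cancer = [4.1, 5.0, 5.8, 6.3, 7.2, 8.0]
healthy = [1.2, 2.0, 2.2, 2.7, 3.3, 3.9, 4.4, 5.1]

for cutoff in (3.0, 4.5, 6.0):
    tp, fp, tn, fn = confusion_at(cutoff, cancer, healthy)
    print(f"cutoff {cutoff}: false positives={fp}, false negatives={fn}")
# cutoff 3.0: false positives=4, false negatives=0
# cutoff 4.5: false positives=1, false negatives=1
# cutoff 6.0: false positives=0, false negatives=3
```

No cutoff in this toy example drives both counts to zero, because some cancer readings (4.1) fall below some healthy ones (5.1); moving the threshold only trades one error for the other.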

Suppose, however, our new test does survive this crucial bottleneck. The rates of overdiagnosis and underdiagnosis are deemed acceptable, and we unveil the test on a population of eager volunteers. Suppose, moreover, that as the test enters the public domain, doctors immediately begin to detect early, benign-appearing, premalignant lesions—in stark contrast to the aggressive, fast-growing tumors seen before the test. Is the test to be judged a success?

No; merely detecting a small tumor is not sufficient. Cancer demonstrates a spectrum of behavior. Some tumors are inherently benign, genetically determined to never reach the fully malignant state; and some tumors are intrinsically aggressive, and intervention at even an early, presymptomatic stage might make no difference to the prognosis of a patient. To address the inherent behavioral heterogeneity of cancer, the screening test must go further. It must increase survival.

Imagine, now, that we have designed a trial to determine whether our screening test increases survival. Two identical twins, call them Hope and Prudence, live in neighboring houses and are offered the trial. Hope chooses to be screened by the test. Prudence, suspicious of overdiagnosis and underdiagnosis, refuses to be screened.

Unbeknownst to Hope and Prudence, identical forms of cancer develop in both twins at the exact same time—in 1990. Hope’s tumor is detected by the screening test in 1995, and she undergoes surgical treatment and chemotherapy. She survives five additional years, then relapses and dies in 2000, ten years after her cancer first arose. Prudence, in contrast, detects her tumor only when she feels a growing lump in her breast in 1999. She, too, has treatment, with some marginal benefit, then relapses and dies at the same moment as Hope in 2000.

At the joint funeral, as the mourners stream by the identical caskets, an argument breaks out among Hope’s and Prudence’s doctors. Hope’s physicians insist that she had a five-year survival: her tumor was detected in 1995 and she died in 2000. Prudence’s doctors insist that her survival was one year: Prudence’s tumor was detected in 1999 and she died in 2000. Yet both cannot be right: the twins died from the same tumor at the exact same time. The solution to this seeming paradox—called lead-time bias—is immediately obvious. Using survival as an end point for a screening test is flawed because early detection pushes the clock of diagnosis backward. Hope’s tumor and Prudence’s tumor possess exactly identical biological behavior. But since doctors detected Hope’s tumor earlier, it seems, falsely, that she lived longer and that the screening test was beneficial.
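
The funeral-side arithmetic can be replayed directly. A minimal Python sketch, using a hypothetical `Patient` record that holds only the years given in the story: survival measured from diagnosis differs between the twins by exactly the lead time, while the span from onset to death is identical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record of one twin's disease timeline (calendar years)."""
    name: str
    onset: int      # year the cancer actually arose
    diagnosed: int  # year it was first detected
    died: int       # year of death

    def survival_from_diagnosis(self) -> int:
        # The figure each set of doctors quotes at the funeral.
        return self.died - self.diagnosed

    def years_with_disease(self) -> int:
        # Onset to death: the biology, unaffected by when the clock started.
        return self.died - self.onset


def lead_time(earlier: Patient, later: Patient) -> int:
    """How many years sooner the earlier patient's diagnosis clock started."""
    return later.diagnosed - earlier.diagnosed


hope = Patient("Hope", onset=1990, diagnosed=1995, died=2000)
prudence = Patient("Prudence", onset=1990, diagnosed=1999, died=2000)

print(hope.survival_from_diagnosis(), prudence.survival_from_diagnosis())  # 5 1
print(lead_time(hope, prudence))                                           # 4
print(hope.years_with_disease(), prudence.years_with_disease())            # 10 10
```

The four-year “survival advantage” (five years minus one) is exactly the four-year lead time; nothing about the disease itself changed.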

So our test must now cross an additional hurdle: it must improve mortality, not survival. The only appropriate way to judge whether Hope’s test was truly beneficial is to ask whether Hope lived longer regardless of the time of her diagnosis. Had Hope lived until 2010 (outliving Prudence by a decade), we could have legitimately ascribed a benefit to the test. Since both women died at the exact same moment, we now discover that screening produced no benefit.

A screening test’s path to success is thus surprisingly long and narrow. It must avoid the pitfalls of overdiagnosis and underdiagnosis. It must steer past the narrow temptation to use early detection as an end in itself. Then, it must navigate the treacherous straits of bias and selection. “Survival,” seductively simple, cannot be its end point. And adequate randomization at each step is critical. Only a test capable of meeting all these criteria—proving mortality benefit in a genuinely randomized setting with an acceptable over- and underdiagnosis rate—can be judged a success. With the odds stacked so steeply, few tests are powerful enough to withstand this level of scrutiny and truly provide benefit in cancer.

In the winter of 1963, three men set out to test whether screening a large cohort of asymptomatic women using mammography would prevent mortality from breast cancer. All three, outcasts from their respective fields, were seeking new ways to study breast cancer. Louis Venet, a surgeon trained in the classical tradition, wanted to capture early cancers as a means to avert the large and disfiguring radical surgeries that had become the norm in the field. Sam Shapiro, a statistician, sought to invent new methods to mount statistical trials. And Philip Strax, a New York internist, had perhaps the most poignant of reasons: he had nursed his wife through the torturous terminal stages of breast cancer in the mid-1950s. Strax’s attempt to capture preinvasive lesions using X-rays was a personal crusade to unwind the biological clock that had ultimately taken his wife’s life.

Venet, Strax, and Shapiro were sophisticated clinical trialists: right at the onset, they realized that they would need a randomized, prospective trial using mortality as an end point to test mammography. Methodologically speaking, their trial would recapitulate Doll and Hill’s famous smoking trial of the 1950s. But how might such a trial be logistically run? The Doll and Hill study had been the fortuitous by-product of the nationalization of health care in Great Britain—its stable cohort produced, in large part, by the National Health Service’s “address book” of registered doctors across the United Kingdom. For mammography, in contrast, it was the sweeping wave of privatization in postwar America that provided the opportunity to run the trial. In the summer of 1944, lawmakers in New York unveiled a novel program to provide subscriber-based health insurance to groups of employees in New York. This program, called the Health Insurance Plan (HIP), was the ancestor of the modern HMO.

The HIP filled a great void in insurance. By the mid-1950s, a triad of forces—immigration, World War II, and the Depression—had brought women out of their homes to comprise nearly one-third of the total workforce in New York. These working women sought health insurance, and the HIP, which allowed its enrollees to pool risks and thereby reduce costs, was a natural solution. By the early 1960s, the plan had enrolled more than three hundred thousand subscribers spread across thirty-one medical groups in New York—nearly eighty thousand of them women.

Strax, Shapiro, and Venet were quick to identify the importance of the resource: here was a defined—“captive”—cohort of women spread across New York and its suburbs that could be screened and followed over a prolonged time. The trial was kept deliberately simple: women enrollees in the HIP between the ages of forty and sixty-four were divided into two groups. One group was screened with mammography while the other was left unscreened. The ethical standards for screening trials in the 1960s made the identification of the groups even simpler. The unscreened group—i.e., the one not offered mammography—was not even required to give consent; it could just be enrolled passively in the trial and followed over time.
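
The passage does not say exactly how the HIP enrollees were allocated, so what follows is only a generic illustration of simple randomization, not the trial’s actual procedure. It applies the stated eligibility window (ages forty to sixty-four) and then shuffles the eligible members with a seeded random number generator so the split can be reproduced. The `(member_id, age)` record format, the function name, and the roster are hypothetical.

```python
import random


def assign_arms(enrollees, seed=0):
    """Split eligible enrollees at random into a screened arm and a control arm.

    enrollees: iterable of (member_id, age) pairs (hypothetical format).
    Eligibility follows the trial's stated window: ages 40 to 64 inclusive.
    """
    eligible = [member_id for member_id, age in enrollees if 40 <= age <= 64]
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    rng.shuffle(eligible)      # every ordering of members is equally likely
    half = len(eligible) // 2
    return {"screened": eligible[:half], "control": eligible[half:]}


# Toy roster: 1,000 hypothetical members aged 30 to 74.
roster = [(member_id, 30 + member_id % 45) for member_id in range(1000)]
arms = assign_arms(roster)
print(len(arms["screened"]), len(arms["control"]))  # 275 275
```

Because assignment depends only on the shuffle, neither the women nor the investigators choose who is screened, which is what allows a later difference in mortality to be attributed to screening rather than to who volunteered.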

The trial, launched in December 1963, was instantly a logistic nightmare. Mammography was cumbersome: a machine the size of a full-grown bull; photographic plates like small windowpanes; the slosh and froth of toxic chemicals in a darkroom. The technique was best performed in dedicated X-ray clinics, but unable to convince women to travel to these clinics (many of them located uptown), Strax and Venet eventually outfitted a mobile van with an X-ray machine and parked it in midtown Manhattan, alongside the ice-cream trucks and sandwich vendors, to recruit women into the study during lunch breaks.

Strax began an obsessive campaign of recruitment. When a subject refused to join the study, he would call, write, and call her again to persuade her to join. The clinics were honed to a machinelike precision to allow thousands of women to be screened in a day:


“Interview . . . 5 stations X 12 women per hour = 60 women. . . . Undress-Dress cubicles: 16 cubicles X 6 women per hour = 96 women per hour. Each cubicle provides one square of floor space for dress-undress and contains four clothes lockers for a total of 64. At the close of the ‘circle,’ the woman enters the same cubicle to obtain her clothes and dress. . . . To expedite turnover, the amenities of chairs and mirrors are omitted.”
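
The quoted plan’s arithmetic checks out, and it also shows which of the two quoted stages limits the flow. A short Python sketch, assuming (as for any assembly line) that hourly throughput is capped by the slowest stage; the stages elided from the quotation are not modeled, and the variable names are illustrative.

```python
# Hourly capacity of the two stages named in the quoted plan.
stages = {
    "interview": 5 * 12,       # 5 stations x 12 women per hour
    "undress-dress": 16 * 6,   # 16 cubicles x 6 women per hour
}
lockers = 16 * 4               # four clothes lockers in each of 16 cubicles

# The slowest stage caps how many women can pass through per hour.
bottleneck = min(stages, key=stages.get)

print(stages)      # {'interview': 60, 'undress-dress': 96}
print(lockers)     # 64
print(bottleneck)  # interview
```

On these two figures alone, the interview stations rather than the cubicles set the pace, at sixty women an hour.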

Curtains rose and fell. Closets opened and closed. Chairless and mirrorless rooms let women in and out. The merry-go-round ran through the day and late into the evening. In an astonishing span of six years, the trio completed a screening effort that would ordinarily have taken two decades.

If a tumor was detected by mammography, the woman was treated according to the conventional intervention available at the time—surgery, typically a radical mastectomy, to remove the mass (or surgery followed by radiation). Once the cycle of screening and intervention had been completed, Strax, Venet, and Shapiro could watch the experiment unfold over time by measuring breast cancer mortality in the screened versus unscreened groups.
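
The end point the trio settled on can be sketched as a rate per arm. The key design choice is the denominator: deaths are counted against every woman assigned to an arm, diagnosed or not, so an earlier diagnosis cannot by itself improve the figure the way it improved Hope’s “survival.” A minimal Python sketch with toy numbers that are not the HIP trial’s results; the function names are hypothetical.

```python
def mortality_per_1000(deaths: int, women: int) -> float:
    """Breast-cancer deaths per 1,000 women assigned to an arm."""
    return 1000 * deaths / women


def compare_arms(screened_deaths: int, screened_n: int,
                 control_deaths: int, control_n: int) -> dict:
    """Compare mortality between arms; denominators are all women assigned."""
    screened = mortality_per_1000(screened_deaths, screened_n)
    control = mortality_per_1000(control_deaths, control_n)
    return {
        "screened_per_1000": screened,
        "control_per_1000": control,
        "relative_reduction": 1 - screened / control,
    }


# Toy numbers for illustration only; they are not the HIP trial's results.
result = compare_arms(screened_deaths=30, screened_n=10_000,
                      control_deaths=40, control_n=10_000)
print(result)
# {'screened_per_1000': 3.0, 'control_per_1000': 4.0, 'relative_reduction': 0.25}
```

Here the screened arm has 3 deaths per 1,000 against 4 in the control arm, a 25 percent relative reduction; a real analysis would also ask whether a gap that size could plausibly arise by chance.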
