Authors: Bruce Schneier

BOOK: Data and Goliath

Last year, when my refrigerator broke, the serviceman replaced the computer that controls
it. I realized that I had been thinking about the refrigerator backwards: it’s not
a refrigerator with a computer, it’s a computer that keeps food cold. Just like that,
everything is turning into a computer. Your phone is a computer that makes calls.
Your car is a computer with wheels and an engine. Your oven is a computer that bakes
lasagnas. Your camera is a computer that takes pictures. Even our pets and livestock
are now regularly chipped; my cat is practically a computer that sleeps in the sun
all day.

Computers are getting embedded into more and more kinds of products that connect to
the Internet. A company called Nest, which Google purchased in 2014 for more than
$3 billion, makes an Internet-enabled thermostat. The smart thermostat adapts to your
behavior patterns and responds to what’s happening on the power grid. But to do all
that, it records more than your energy usage: it also tracks and records your home’s
temperature, humidity, ambient light, and any nearby movement. You can buy a smart
refrigerator that tracks the expiration dates of food, and a smart air conditioner
that can learn your preferences and maximize energy efficiency.

There’s more coming: Nest is now selling a smart smoke and carbon monoxide detector
and is planning a whole line of additional home sensors. Lots of other companies are
working on a wide range of smart appliances. This will all be necessary if we want
to build the smart power grid, which will reduce energy use and greenhouse gas emissions.

We’re starting to collect and analyze data about our bodies as a means of improving
our health and well-being. If you wear a fitness tracking device like Fitbit or Jawbone,
it collects information about your movements awake and asleep, and uses that to analyze
both your exercise and sleep habits. It can determine when you’re having sex. Give
the device more information about yourself—how much you weigh, what you eat—and you
can learn even more. All of this data you share is available online, of course.

Many medical devices are starting to be Internet-enabled, collecting and reporting
a variety of biometric data. There are already—or will be soon—devices that continually
measure our vital signs, our moods, and our brain activity. It’s not just specialized
devices; current smartphones have some pretty sensitive motion sensors. As the price
of DNA sequencing continues to drop, more of us are signing up to generate and analyze
our own genetic data. Companies like 23andMe hope to use genomic data from their customers
to find genes associated with disease, leading to new and highly profitable cures.
They’re also talking about personalized marketing, and insurance companies may someday
buy their data to make business decisions.

Perhaps the extreme in the data-generating-self trend is lifelogging: continuously
capturing personal data. Already you can install lifelogging apps that record your
activities on your smartphone, such as when you talk to friends, play games, watch
movies, and so on. But this is just a shadow of what lifelogging will become. In the
future, it will include a video record. Google Glass is the first wearable device
that has this potential, but others are not far behind.

These are examples of the Internet of Things. Environmental sensors will detect pollution
levels. Smart inventory and control systems will reduce waste and save money. Internet-connected
computers will be in everything—smart cities, smart toothbrushes, smart lightbulbs, smart sidewalk squares, smart pill bottles,
smart clothing—because why not? Estimates put the current number of Internet-connected
devices at 10 billion. That’s already more than the number of people on the planet,
and I’ve seen predictions that it will reach 30 billion by 2020. The hype level is
pretty high, and we don’t yet know which applications will work and which will be
duds. What we do know is that they’re all going to produce data, lots of data. The
things around us will become the eyes and ears of the Internet.

The privacy implications of all this connectivity are profound. All those smart appliances
will reduce greenhouse gas emissions—and they’ll also stream data about how people
move around within their houses and how they spend their time. Smart streetlights
will gather data on people’s movements outside. Cameras will only get better, smaller,
and more mobile. Raytheon is planning to fly a blimp over Washington, DC, and Baltimore
in 2015 to test its ability to track “targets”—presumably vehicles—on the ground,
in the water, and in the air.

The upshot is that we interact with hundreds of computers every day, and soon it will
be thousands. Every one of those computers produces data. Very little of it is the
obviously juicy kind: what we ordered at a restaurant, our heart rate during our evening
jog, or the last love letter we wrote. Rather, much of it is a type of data called
metadata. This is data about data—information a computer system uses to operate or data that’s
a by-product of that operation. In a text message system, the messages themselves
are data, but the accounts that sent and received the message, and the date and time
of the message, are all metadata. An e-mail system is similar: the text of the e-mail
is data, but the sender, receiver, routing data, and message size are all metadata—and
we can argue about how to classify the subject line. In a photograph, the image is
data; the date and time, camera settings, camera serial number, and GPS coordinates
of the photo are metadata. Metadata may sound uninteresting, but, as I’ll explain,
it’s anything but.
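
The e-mail case can be made concrete with a minimal sketch. It assumes a made-up message and made-up addresses, and it uses Python's standard `email` module; the function name `split_data_and_metadata` is purely illustrative. The subject line is deliberately left out of both buckets, since its classification is arguable.

```python
from email import message_from_string

# A made-up message; the addresses and text are illustrative only.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 02 Mar 2015 09:30:00 -0500
Subject: Lunch?

Are you free at noon on Thursday?
"""


def split_data_and_metadata(raw):
    """Return (metadata, data) for a raw e-mail message."""
    msg = message_from_string(raw)
    # Metadata: what the mail system needs in order to operate,
    # plus by-products of handling the message.
    metadata = {
        "sender": msg["From"],
        "receiver": msg["To"],
        "date": msg["Date"],
        "size_bytes": len(raw.encode("utf-8")),
    }
    # Data: the text the sender actually wrote.
    data = msg.get_payload()
    return metadata, data


metadata, data = split_data_and_metadata(RAW_MESSAGE)
print(metadata)
print(data)
```

Nothing in `metadata` contains a word of the message body, yet it already records who wrote to whom, and when.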

Still, this smog of data we produce is not necessarily a result of deviousness on
anyone’s part. Most of it is simply a natural by-product of computing. This is just
the way technology works right now. Data is the exhaust of the information age.

HOW MUCH DATA?

Some quick math. Your laptop probably has a 500-gigabyte hard drive. That big backup
drive you might have purchased with it can probably store two or three terabytes.
Your corporate network might have one thousand times that: a petabyte. There are names
for bigger numbers. A thousand petabytes is an exabyte (a billion billion bytes),
a thousand exabytes is a zettabyte, and a thousand zettabytes is a yottabyte. To put
it in human terms, an exabyte of data is 500 billion pages of text.
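
The ladder of units is easy to check with a few lines of Python. This is a small sketch using decimal units (each step is a factor of one thousand); the helper name `convert` is an assumption made for illustration.

```python
# Decimal storage units: each one is a thousand times the one before it.
UNITS = [
    "kilobyte", "megabyte", "gigabyte", "terabyte",
    "petabyte", "exabyte", "zettabyte", "yottabyte",
]
BYTES_PER = {name: 1000 ** (i + 1) for i, name in enumerate(UNITS)}


def convert(amount, from_unit, to_unit):
    """Convert an amount of storage from one unit to another."""
    return amount * BYTES_PER[from_unit] / BYTES_PER[to_unit]


# An exabyte is a billion billion bytes.
exabyte_in_bytes = BYTES_PER["exabyte"]

# How many 500-gigabyte laptop drives fit in one petabyte?
laptops_per_petabyte = convert(1, "petabyte", "gigabyte") / 500

print(exabyte_in_bytes)
print(laptops_per_petabyte)
```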

All of our data exhaust adds up. By 2010, we as a species were creating more data
per day than we did from the beginning of time until 2003. By 2015, 76 exabytes of
data will travel across the Internet every year.

As we start thinking of all this data, it’s easy to dismiss concerns about its retention
and use based on the assumption that there’s simply too much of it to save, and in
any case it would be too hard to sift through for nuggets of meaningful information.
This used to be true. In the early days of computing, most of this data—and certainly
most of the metadata—was thrown away soon after it was created. Saving it took too
much memory. But the cost of all aspects of computing has continuously fallen over
the years, and amounts of data that were impractical to store and process a decade
ago are easy to deal with today. In 2015, a petabyte of cloud storage will cost $100,000
per year, down 90% from $1 million in 2011. The result is that more and more data
is being stored.

You could probably store every tweet ever sent on your home computer’s disk drive.
Storing the voice conversation from every phone call made in the US requires less
than 300 petabytes, or $30 million, per year. A continuous video lifelogger would
require 700 gigabytes per year per person. Multiply that by the US population and
you get 2 exabytes per year, at a current cost of $200 million. That’s expensive but
plausible, and the price will only go down. In 2013, the NSA completed its massive
Utah Data Center in Bluffdale. It’s currently the third largest in the world, and
the first of several that the NSA is building. The details are classified, but experts
believe it can store about 12 exabytes of data. It has cost $1.4 billion so far. Worldwide,
Google has the capacity to store 15 exabytes.
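
The cost figures above follow from one price: $100,000 per petabyte per year. The sketch below takes the storage volumes quoted in the text as given and simply multiplies them out; the constant and function names are illustrative assumptions.

```python
# Prices quoted in the text, in US dollars per petabyte per year.
COST_PER_PETABYTE_2011 = 1_000_000
COST_PER_PETABYTE_2015 = 100_000
PETABYTES_PER_EXABYTE = 1_000


def annual_cost(petabytes, cost_per_petabyte=COST_PER_PETABYTE_2015):
    """Yearly cost in dollars of keeping this many petabytes in cloud storage."""
    return petabytes * cost_per_petabyte


# Price drop from 2011 to 2015, as a whole-number percentage.
price_drop_percent = (
    (COST_PER_PETABYTE_2011 - COST_PER_PETABYTE_2015) * 100
    // COST_PER_PETABYTE_2011
)

# Every US phone call: the text's upper bound of 300 petabytes per year.
phone_call_cost = annual_cost(300)

# Nationwide video lifelogging: the text's figure of 2 exabytes per year.
lifelog_cost = annual_cost(2 * PETABYTES_PER_EXABYTE)

print(price_drop_percent, phone_call_cost, lifelog_cost)
```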

What’s true for organizations is also true for individuals, and I’m a case study.
My e-mail record stretches back to 1993. I consider that e-mail archive to be part
of my brain. It’s my memories. There isn’t a week that
goes by that I don’t search that archive for something: a restaurant I visited some
years ago, an article someone once told me about, the name of someone I met. I send
myself reminder e-mails all the time; not just reminders of things to do when I get
home, but reminders of things that I might want to recall years in the future. Access
to that data trove is access to me.

I used to carefully sort all that e-mail. I had to decide what to save and what to
delete, and I would put saved e-mails into hundreds of different folders based on
people, companies, projects, and so on. In 2006, I stopped doing that. Now, I save
everything in one large folder. In 2006, for me, saving and searching became easier
than sorting and deleting.

To understand what all this data hoarding means for individual privacy, consider Austrian
law student Max Schrems. In 2011, Schrems demanded that Facebook give him all the
data the company had about him. This is a requirement of European Union (EU) law.
Two years later, after a court battle, Facebook sent him a CD with a 1,200-page PDF:
not just the friends he could see and the items on his newsfeed, but all of the photos
and pages he’d ever clicked on and all of the advertising he’d ever viewed. Facebook
doesn’t use all of this data, but instead of figuring out what to save, the company
finds it easier to just save everything.

2

Data as Surveillance

Governments and corporations gather, store, and analyze the tremendous amount of data
we chuff out as we move through our digitized lives. Often this is without our knowledge,
and typically without our consent. Based on this data, they draw conclusions about
us that we might disagree with or object to, and that can impact our lives in profound
ways. We may not like to admit it, but we are under mass surveillance.

Much of what we know about the NSA’s surveillance comes from Edward Snowden, although
people both before and after him also leaked agency secrets. As an NSA contractor,
Snowden collected tens of thousands of documents describing many of the NSA’s surveillance
activities. In 2013, he fled to Hong Kong and gave them to select reporters. For a
while I worked with Glenn Greenwald and the Guardian newspaper, helping analyze some
of the more technical documents.

The first news story to break that was based on the Snowden documents described how
the NSA collects the cell phone call records of every American. One government defense,
and a sound bite repeated ever since, is that the data collected is “only metadata.”
The intended point was that the NSA wasn’t collecting the words we spoke during our
phone conversations, only the phone numbers of the two parties, and the date, time,
and duration of the call. This seemed to mollify many people, but it shouldn’t
have. Collecting metadata on people means putting them under surveillance.

An easy thought experiment demonstrates this. Imagine that you hired a private detective
to eavesdrop on someone. The detective would plant bugs in that person’s home, office,
and car. He would eavesdrop on that person’s phone and computer. And you would get
a report detailing that person’s conversations.

Now imagine that you asked the detective to put that person under surveillance. You
would get a different but nevertheless comprehensive report: where he went, what he
did, who he spoke with and for how long, who he wrote to, what he read, and what he
purchased. That’s metadata.

Eavesdropping gets you the conversations; surveillance gets you everything else.

Telephone metadata alone reveals a lot about us. The timing, length, and frequency
of our conversations reveal our relationships with others: our intimate friends, business
associates, and everyone in-between. Phone metadata reveals what and who we’re interested
in and what’s important to us, no matter how private. It provides a window into our
personalities. It yields a detailed summary of what’s happening to us at any point in time.

A Stanford University experiment examined the phone metadata of about 500 volunteers
over several months. The personal nature of what the researchers could deduce from
the metadata surprised even them, and the report is worth quoting:

•  Participant A communicated with multiple local neurology groups, a specialty pharmacy,
a rare-condition management service, and a hotline for a pharmaceutical used solely
to treat relapsing multiple sclerosis.

•  Participant B spoke at length with cardiologists at a major medical center, talked
briefly with a medical laboratory, received calls from a pharmacy, and placed short
calls to a home reporting hotline for a medical device used to monitor cardiac arrhythmias.

•  Participant C made a number of calls to a firearms store that specializes in the
AR semiautomatic rifle platform, and also spoke at length with customer service for
a firearm manufacturer that produces an AR line.

•  In a span of three weeks, Participant D contacted a home improvement store, locksmiths, a hydroponics
dealer, and a head shop.

•  Participant E had a long early morning call with her sister. Two days later, she
placed a series of calls to the local Planned Parenthood location. She placed brief
additional calls two weeks later, and made a final call a month after.

That’s a multiple sclerosis sufferer, a heart attack victim, a semiautomatic weapons
owner, a home marijuana grower, and someone who had an abortion, all from a single
stream of metadata.
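
To see how little is needed for this kind of inference, here is a rough sketch. It is not the Stanford researchers' method; it only illustrates the general idea, assuming invented call records and a hypothetical directory that maps phone numbers to the kind of business behind them. No call content appears anywhere in it.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One line of call metadata: who called whom, when, and for how long."""
    caller: str
    callee: str
    start: datetime
    duration_seconds: int


# Hypothetical directory from phone number to type of business.
DIRECTORY = {
    "555-0101": "neurology clinic",
    "555-0102": "specialty pharmacy",
    "555-0103": "pizza delivery",
}

# Invented records for one invented subscriber.
CALLS = [
    CallRecord("555-0199", "555-0101", datetime(2014, 3, 3, 9, 15), 840),
    CallRecord("555-0199", "555-0102", datetime(2014, 3, 3, 10, 2), 120),
    CallRecord("555-0199", "555-0101", datetime(2014, 3, 10, 9, 5), 600),
    CallRecord("555-0199", "555-0103", datetime(2014, 3, 14, 19, 30), 45),
]


def contact_profile(calls, caller, directory):
    """Count how often a caller reached each category of callee."""
    profile = Counter()
    for call in calls:
        if call.caller == caller:
            profile[directory.get(call.callee, "unknown")] += 1
    return profile


profile = contact_profile(CALLS, "555-0199", DIRECTORY)
print(profile.most_common())
```

Repeated calls to a neurology clinic and a specialty pharmacy suggest a diagnosis, even though not one spoken word was recorded.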

Web search data is another source of intimate information that can be used for surveillance.
(You can argue whether this is data or metadata. The NSA claims it’s metadata because
your search terms are embedded in the URLs.) We don’t lie to our search engine. We’re
more intimate with it than with our friends, lovers, or family members. We always
tell it exactly what we’re thinking about, in words as clear as possible. Google knows
what kind of porn each of us searches for, which old lovers we still think about,
our shames, our concerns, and our secrets. If Google decided to, it could figure out
which of us is worried about our mental health, thinking about tax evasion, or planning
to protest a particular government policy. I used to say that Google knows more about
what I’m thinking of than my wife does. But that doesn’t go far enough. Google knows
more about what I’m thinking of than I do, because Google remembers all of it perfectly and forever.
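
The point about search terms being embedded in URLs is easy to demonstrate. The sketch below uses a made-up search address and assumes the query travels in a parameter named `q`, which is a common convention but still an assumption here; the function name is illustrative.

```python
from urllib.parse import parse_qs, urlparse


def search_terms_from_url(url, param="q"):
    """Pull the search terms out of a search URL's query string."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per parameter; take the first value if present.
    return query.get(param, [""])[0]


# A made-up URL; the host and parameter names are illustrative only.
url = "https://search.example.com/search?q=symptoms+of+depression&lang=en"
terms = search_terms_from_url(url)
print(terms)
```

Anyone who logs the URL, whatever they choose to call it, has logged the question that was asked.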
