Total Recall: How the E-Memory Revolution Will Change Everything

Authors: C. Gordon Bell, Jim Gemmell

Bush was writing with scientists in mind. “There is a growing mountain of research,” he lamented. “We are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember.”

But he also realized that quantity was not the core problem. “The difficulty,” he wrote, “seems to be not so much that we publish unduly . . . but rather that publication has been extended far beyond our present ability to make real use of the record . . . [It] must be continuously extended, it must be stored, and above all it must be consulted.”

Bush wanted to free his fellow scientists from the drudgery of searching and cross-referencing their books, journals, and notes so that they could focus more on the creative side of their work.

“Creative thought and essentially repetitive thought are very different things,” said Bush. “For the latter there are, and may be, powerful mechanical aids.”

Bush did not want any scientist to worry about running out of space in his or her storage unit, which would mean having to discard items that might later prove useful. Memex was to have practically unlimited storage. “If the user inserted five thousand pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely,” he wrote.

Memex would allow the scientist to annotate any item in the collection by speech or writing. Bush also wanted to support the way our minds work in associating one idea with another. He contrasted existing data storage with biological memory:

When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and reenter on a new path.
The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental pictures, is awe-inspiring beyond all else in nature.

Bush hoped that “[selection] by association, rather than indexing, may yet be mechanized.” To this end, he proposed that “trails” be created, connecting one document with the next in a sequence that could be followed again later. Trails could be named and shared with friends, and all their links were two-way. (The familiar hyperlinks on the World Wide Web are only partially realized trails: they are one-way, and they are not grouped or named.)
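To make the idea of a trail concrete, here is a minimal Python sketch, assuming a trail is simply a named, ordered list of item identifiers. The class `TrailStore`, its methods, and the sample item names are hypothetical, not taken from Bush or from MyLifeBits. Because each trail keeps its whole sequence, a link can be followed backward as easily as forward, every item can report the trails that pass through it, and `export` suggests how a trail might be shared as plain data.

```python
from collections import defaultdict


class TrailStore:
    """Hypothetical sketch of Bush-style trails: named, shareable
    sequences of items whose links can be followed both ways."""

    def __init__(self):
        self.trails = {}                   # trail name -> ordered item ids
        self.backlinks = defaultdict(set)  # item id -> names of trails through it

    def create_trail(self, name, items):
        self.trails[name] = list(items)
        for item in self.trails[name]:
            self.backlinks[item].add(name)

    def next_item(self, name, item):
        """Follow the trail forward; None at the end.
        (If an item appears twice, this uses its first position.)"""
        seq = self.trails[name]
        i = seq.index(item)
        return seq[i + 1] if i + 1 < len(seq) else None

    def previous_item(self, name, item):
        """Follow the same link backward: the two-way part."""
        seq = self.trails[name]
        i = seq.index(item)
        return seq[i - 1] if i > 0 else None

    def trails_through(self, item):
        """Every named trail that passes through an item."""
        return sorted(self.backlinks.get(item, set()))

    def export(self, name):
        """Plain data that could be handed to a friend and re-imported."""
        return {"name": name, "items": list(self.trails[name])}


store = TrailStore()
store.create_trail("bow-research",
                   ["encyclopedia-entry", "history-note", "elasticity-table"])
print(store.next_item("bow-research", "history-note"))      # elasticity-table
print(store.previous_item("bow-research", "history-note"))  # encyclopedia-entry
```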

Bush’s memex was inspirational. The time was ripe to realize his dream—and to extend it far beyond the realm of scientific research into the lives of everyone.

A REAL MEMEX

We named our research project MyLifeBits, and adopted memex as its minimal requirement. Our goals were twofold:

1. To create software for lifelogging, and the subsequent recall and usage of one’s e-memories. We wanted software to record a diverse array of information about one’s life and activities, from a variety of sources and devices, and to do so as easily, as unobtrusively, and as automatically as possible. The software would have to give people powerful tools for searching, organizing, annotating, and pattern-mining their ultimately huge e-memories.
2. To identify the benefits, drawbacks, technical issues, sticking points, and usability of Total Recall in real life. We wanted to try it out (as much as we could) and see what it was like.

Since 2001 I have been serving as the primary test subject, but Jim Gemmell is also an avowed user, while Roger and Vicki have tried out numerous aspects of it in real life. A number of universities also have used our software and have experimented with it.

MyLifeBits is not a commercial product; it is a research project. In fact, MyLifeBits software is not a single application. It is a prototypical suite of applications, and a storage system that blends files with a database. You won’t see Microsoft eventually ship MyLifeBits version 1.0. Instead, you will gradually see more and more of the kinds of things done in MyLifeBits also being done in operating systems and in applications.

Our aim was to preview and to help lay the groundwork for the Total Recall systems that are coming soon, very soon, and that by 2020 will be as commonplace as Web browsers and cell phones are today. A few early steps in the Total Recall revolution have already hit the market. These include Evernote, reQall, OneNote, Google’s Web history, and support for desktop search in operating systems. But as this book is being written, these products remain small, discrete pieces of a much larger puzzle.

HOW TO ORGANIZE AN E-MEMORY

Back in 2001 I could see we still had a lot of basic things to figure out about how to store and organize my data. We had just my sixteen gigabytes of documents and photographs loaded in my imperfect classification hierarchy of folders, and we had no good way to search them, sort them, annotate them, link them together into Bush’s “trails,” or analyze them for patterns and trends.

The files-and-folders method of organizing data is a fundamental feature of all modern operating systems such as Windows, Macintosh, and Linux. File-and-folder hierarchies, even when stored digitally, suffer from the same basic limitation as libraries once did: Each book can exist only in one place, filed under one category. But an item might properly belong to several categories, or hundreds. A Brief History of Time is a physics book, but it’s also a book by Stephen Hawking, it was a best-seller, it talked about black holes, and it was published in 1988. You could easily come up with dozens of other attributes that would be perfectly legitimate criteria for tracking down A Brief History of Time and for sorting and grouping it with other books (and for that matter, for sorting and grouping it with other media of any kind: with lecture recordings, with songs, with articles, with pictures, with old news footage).

Here is how the MyLifeBits project ran into the problem: Logically, my Eagles folder should have been stored in both my Fun folder and my Animals folder, but in practice I had to make an arbitrary choice. And often, if I wanted to find some half-remembered piece of data again, I’d have to hunt for it in various folders, asking myself, “If I were me, where would I file it right now?”

Librarians have lived with this restriction for centuries. A copy of a book can sit on only one shelf, in one section, often determined by the Dewey decimal system of book topics. So they created paper card catalogs, in which a card served as a surrogate for a book. Now the book (or at least its surrogate card) could be filed in more than one place. Dewey might place it in the physics section, but it could also appear in the title card catalog, filed alphabetically by title, and in the author card catalog, filed alphabetically by author. For your convenience, the Dewey subject index would be duplicated in the card catalog too, letting you flip through cards with your fingers rather than hiking down the aisles.
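A card catalog is, in effect, several indexes over one collection. The minimal Python sketch below assumes each book is a record with a few fields; each catalog maps a field value (the “card”) to the identifiers of the books it stands for, so one stored book can be found by title, author, subject, or year. The second record and the function name `build_catalog` are hypothetical, for illustration only.

```python
from collections import defaultdict

# Each book is stored once; the second record is a hypothetical example.
books = {
    1: {"title": "A Brief History of Time", "author": "Hawking, Stephen",
        "subject": "Physics", "year": 1988},
    2: {"title": "An Example Physics Primer", "author": "Doe, Jane",
        "subject": "Physics", "year": 2000},
}


def build_catalog(records, field):
    """One card catalog: each card (field value) points back to book ids."""
    catalog = defaultdict(list)
    for book_id, record in records.items():
        catalog[record[field]].append(book_id)
    return dict(sorted(catalog.items()))  # filed in order, like a card drawer


catalogs = {field: build_catalog(books, field)
            for field in ("title", "author", "subject", "year")}

print(catalogs["author"]["Hawking, Stephen"])  # [1]
print(catalogs["subject"]["Physics"])          # [1, 2]
```

Adding another catalog costs one more dictionary of small “cards,” never another copy of the book itself.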

So here I was, with a system that was worse than a library with paper card catalogs. I was like a librarian who was not allowed to have a card catalog. Jim Gray, who is widely celebrated as a pioneer, even a founding father, of database design, shook his head over me.

“You need to use a database, Gordon. When are you going to listen to me?” he would ask.

I was resistant. “We don’t need no stinkin’ databases,” I’d reply.

My resistance stemmed from my first experiences with databases back in the 1970s. Back then databases were still new and much hyped; they were also space-hogging and hard to use. I knew they’d been improved in the time since, but I’d heard enough horror stories over the years to keep my prejudices well nourished. Also, I wasn’t clear on exactly what I wanted out of even a well-behaved one. But it turned out that Jim Gray was right, as usual.

A database is a program for storing and retrieving large collections of interrelated information. Modern databases let you very quickly retrieve all the records with a given attribute. You can rapidly sort, sift, and combine information in just about any way you can imagine. There was once a technical distinction between how a database indexed and looked up records and how full-text retrieval systems searched documents, but databases have since subsumed full-text search; they are happy to store documents and perform Google-like retrieval.
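To illustrate retrieving every record with a given attribute and then sorting and grouping the results, here is a small sketch using Python’s built-in `sqlite3` module. SQLite is simply a convenient stand-in, not necessarily the engine MyLifeBits used, and the table layout and rows are invented for the example.

```python
import sqlite3

# In-memory database; the schema and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        id      INTEGER PRIMARY KEY,
        kind    TEXT,   -- photo, document, email, ...
        title   TEXT,
        created TEXT    -- 'YYYY-MM-DD HH:MM:SS', so text order = time order
    )""")
conn.execute("CREATE INDEX idx_items_kind ON items(kind)")  # fast lookup by attribute
conn.executemany(
    "INSERT INTO items (kind, title, created) VALUES (?, ?, ?)",
    [
        ("photo", "Pony ride", "1944-06-01 10:00:00"),
        ("document", "Memo draft", "2001-03-15 09:30:00"),
        ("photo", "Lab visit", "2001-03-15 14:00:00"),
    ],
)


def items_of_kind(kind):
    """All records with a given attribute, newest first."""
    rows = conn.execute(
        "SELECT title FROM items WHERE kind = ? ORDER BY created DESC", (kind,))
    return [title for (title,) in rows]


print(items_of_kind("photo"))  # ['Lab visit', 'Pony ride']
print(conn.execute(
    "SELECT kind, COUNT(*) FROM items GROUP BY kind ORDER BY kind").fetchall())
# [('document', 1), ('photo', 2)]
```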

In his memex paper, Bush had expressed the hope that the search algorithms of the future would be better than simple index lookup on some attribute like author or date. He held up the human brain’s associative memory as the ideal. In an associative network, items are linked together by contiguity in time and space, by similarity, by context, and by usefulness. There are often numerous paths to each item.

Bush was right that trails and associative linking were critical components of an effective e-memory machine. But his dismissal of indexing was one of his rare failures of imagination. In his day, indexing meant alphabetical lookup in a predefined, noncomprehensive list of topics or keywords, as in a library card catalog. With the speed of modern computers, it has become possible to index every single word and phrase in every document and to search all of them in an instant. When indices are so comprehensive, and lookup by the index instantaneous, then indexing is actually the mechanism by which associative memory becomes possible.
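Indexing every word can be sketched as an inverted index: a table from each word to the documents containing it, so a query becomes a few lookups and set intersections instead of a scan through every document. This is a minimal sketch under simplifying assumptions (lowercase alphanumeric tokens, no ranking, no phrase search), not how any particular product works; the documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: every word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Ids of documents containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {  # hypothetical documents
    1: "Black holes and the expanding universe",
    2: "Notes on the memex and associative trails",
    3: "Black-and-white photo of a pony",
}
index = build_index(docs)
print(sorted(search(index, "black")))        # [1, 3]
print(sorted(search(index, "black holes")))  # [1]
```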

The MyLifeBits research project revealed that any system that aspires to be sold as an e-memory machine in the age of Total Recall will have to use a database storage engine, including full-text indexing. Only a database will allow you to create two-way links between items (including annotations) and to regroup and recategorize items and collections of items in an open-ended fashion. Only full-text indexing will give you keyword access to all of your e-memory.

With MyLifeBits we could find all items that share a certain property, such as having the same creation date, or having been edited during a particular meeting, or having been viewed within a certain span of hours after a particular phone call.
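A query like “items viewed within a span of hours after a particular phone call” becomes straightforward once each activity is logged with a timestamp. The sketch below assumes a hypothetical `events` table recording what happened to each item and when, again using SQLite as a stand-in; SQLite’s `datetime()` function computes the end of the time window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items  (id INTEGER PRIMARY KEY, title TEXT);
    -- at: 'YYYY-MM-DD HH:MM:SS' (UTC), the same format datetime() returns
    CREATE TABLE events (item_id INTEGER, action TEXT, at TEXT);
""")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    (1, "Phone call with a colleague"),
    (2, "Budget spreadsheet"),
    (3, "Conference paper"),
    (4, "Old memo"),
])
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "phone_call", "2007-05-01 10:00:00"),
    (2, "viewed",     "2007-05-01 10:45:00"),
    (3, "viewed",     "2007-05-01 11:30:00"),
    (4, "viewed",     "2007-05-01 14:00:00"),
])


def viewed_after_call(call_id, hours):
    """Titles of items viewed within `hours` after the given call, in view order."""
    rows = conn.execute("""
        SELECT i.title
        FROM events AS c
        JOIN events AS v
          ON v.action = 'viewed'
         AND v.at >  c.at
         AND v.at <= datetime(c.at, ?)
        JOIN items AS i ON i.id = v.item_id
        WHERE c.item_id = ? AND c.action = 'phone_call'
        ORDER BY v.at
    """, (f"+{hours} hours", call_id))
    return [title for (title,) in rows]


print(viewed_after_call(1, 2))  # ['Budget spreadsheet', 'Conference paper']
```

Storing timestamps in one fixed text format keeps string comparison consistent with time order, which is what lets the window test work.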

To make a database-style system work, we needed to include what is called metadata, or “data about data.” Metadata is essentially digital annotation about a file or other software object. Metadata may be embedded inside a file, or it may be “attached” to it from the outside. Conceptually, it’s a bit like a sticky label on a manila folder that characterizes its contents.

Your computer’s operating system keeps a little metadata on each file for bookkeeping, such as its creation date, the date last modified, the size of the file, and who is allowed to access it. Certain file types support additional metadata. For instance, the Microsoft Word document I am typing in right now lets me enter author, title, subject, keywords, category, status, and comments. Pictures in JPEG format can record things like the date taken, location, camera model, focal length, f-stop, and exposure time. Nearly all music formats include artist, album, composer, genre, and length.
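The difference between bookkeeping metadata the operating system keeps and metadata attached from the outside can be sketched in Python. `os.stat` returns the system’s own fields (size, modification time, permission bits); the `attached` dictionary and the `annotate` helper are hypothetical stand-ins for an external store of user-supplied fields such as author and keywords. Creation time is left out on purpose, because `st_ctime` means creation time on Windows but metadata-change time on most Unix systems.

```python
import os
import tempfile
from datetime import datetime, timezone


def bookkeeping_metadata(path):
    """Metadata the operating system keeps for every file."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "mode": oct(st.st_mode & 0o777),  # permission bits: who may read/write/execute
    }


# Hypothetical "attached" metadata, kept outside the file and keyed by path.
attached = {}


def annotate(path, **fields):
    attached.setdefault(path, {}).update(fields)


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("draft chapter")
    path = f.name

annotate(path, author="Gordon Bell", keywords=["memex", "e-memory"])
record = {**bookkeeping_metadata(path), **attached[path]}  # merged view of both kinds
os.remove(path)
print(record["size_bytes"], record["author"])  # 13 Gordon Bell
```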

Some of this metadata gets filled in automatically. Digital cameras fill in the JPEG fields when they take a picture. CD-ripping programs look up the album information on the Web and populate the metadata for each song. In contrast, the metadata for Microsoft Office documents must be entered manually. Your name, which you were prompted for when you installed Office, is prepopulated as the author in new documents, but everything else is blank, and tends to stay that way. Many of these manual-entry attributes will remain blank until systems more like MyLifeBits give people an actual payback for the work of entering them.

One kind of metadata attribute that is getting a lot of attention these days is the tag. A tag is simply a single word or short phrase. Tags aren’t much different from the keywords attribute I have for this Word document. But they are creating a stir because some great photo applications, such as Flickr, make tags easy to create and very useful for finding things again. You can add any number of tags to a file. For example, I have a photo of myself at age ten, with my four-year-old sister Sharon and my pony Snippy, who liked to bite. I kept that photo in a scrapbook for most of my life. It existed on one page only, and I was the only person who could find it quickly. But by putting the image into a database, I can tag it in any way I might find useful: Gordon, Sharon, Snippy, 1944, black-and-white, Missouri. Thereafter, I can use the tags for searching and sorting.
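Tagging can be sketched as two mappings kept in step: tag to items and item to tags, so a search for several tags is a set intersection. The `TagIndex` class, the file names, and the second photo are hypothetical; the tags on the first photo are the ones listed above, and case-insensitive matching is an assumption of this sketch.

```python
from collections import defaultdict


class TagIndex:
    """Minimal in-memory tag store: any number of tags per item,
    searchable in either direction."""

    def __init__(self):
        self.items_by_tag = defaultdict(set)
        self.tags_by_item = defaultdict(set)

    def tag(self, item, *tags):
        for t in tags:
            t = t.lower()  # treat "Missouri" and "missouri" as one tag
            self.items_by_tag[t].add(item)
            self.tags_by_item[item].add(t)

    def find(self, *tags):
        """Items that carry all of the given tags."""
        matches = [self.items_by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*matches) if matches else set()


photos = TagIndex()
photos.tag("pony_photo.jpg", "Gordon", "Sharon", "Snippy", "1944",
           "black-and-white", "Missouri")
photos.tag("lab_photo.jpg", "Gordon", "color")  # hypothetical second photo
print(sorted(photos.find("Gordon")))          # ['lab_photo.jpg', 'pony_photo.jpg']
print(sorted(photos.find("gordon", "1944")))  # ['pony_photo.jpg']
```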

We also needed to be able to forge links (Bush’s “trails”) between any items or collections of items in my database. For instance, I wanted to be able to link some photographs to an entry in my calendar, to indicate that they were photos of that event. Or, if I recorded some audio of myself talking about a photo, I wanted to be able to link the audio to the photo, so it was clear that it was a comment about the photo.
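One way to store such links, sketched here with SQLite as a stand-in, is a single `links` table in which each row records one connection; querying it from both ends makes every link navigable in both directions without storing it twice. The table names, link labels, and sample items are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, kind TEXT, title TEXT);
    CREATE TABLE links (
        src   INTEGER REFERENCES items(id),
        dst   INTEGER REFERENCES items(id),
        label TEXT,
        PRIMARY KEY (src, dst)
    );
""")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, "calendar", "Team offsite"),
    (2, "photo", "Group shot"),
    (3, "audio", "Comment on group shot"),
])
conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
    (2, 1, "photo of event"),  # photo -> calendar entry
    (3, 2, "comment about"),   # audio -> photo
])


def linked(item_id):
    """(title, label) of every item linked to item_id, in either direction."""
    return conn.execute("""
        SELECT i.title, l.label FROM links AS l JOIN items AS i ON i.id = l.dst
        WHERE l.src = ?
        UNION
        SELECT i.title, l.label FROM links AS l JOIN items AS i ON i.id = l.src
        WHERE l.dst = ?
        ORDER BY 1
    """, (item_id, item_id)).fetchall()


print(linked(2))
# [('Comment on group shot', 'comment about'), ('Team offsite', 'photo of event')]
```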

So Jim Gray and some other colleagues convinced us to take the plunge, and we created a database to hold all our files and other information. It was great. We could still view my data using my original folder-based organization scheme if we wanted, but that became just one of myriad ways we could view it. We could group items into what we called “collections,” and each item could belong to more than one collection.

We worked hard on annotations and metadata, making them automatic whenever possible, and otherwise trying to make it quick and easy to add information at any moment. For instance, I could select a bunch of items and then type a comment about them at any time. If I didn’t feel like typing, I could click a button and just say the comment. Silence would automatically be stripped out of the beginning and end of the comment so I could be relaxed about hitting the start and stop buttons. I could also add ratings to any item. I could comment on and rate Web pages from inside the browser. I could rate and comment on anything that came up on my screensaver.
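Stripping silence from the ends of a spoken comment can be approximated by dropping samples at the start and end whose amplitude stays below a threshold. This sketch assumes audio as a list of floating-point samples normalized to the range -1.0 to 1.0; the function name, the threshold, and the `pad` parameter (samples to keep on each side of the detected speech) are illustrative, and a production system would more likely measure energy over short frames than test single samples.

```python
def trim_silence(samples, threshold=0.02, pad=0):
    """Return samples with leading and trailing near-silence removed.

    Quiet samples in the middle are kept; only the ends are trimmed.
    """
    n = len(samples)
    start = 0
    while start < n and abs(samples[start]) < threshold:
        start += 1
    if start == n:                     # the whole clip is silent
        return []
    end = n - 1
    while abs(samples[end]) < threshold:  # stops at or before `start`
        end -= 1
    start = max(0, start - pad)        # keep a little context if requested
    end = min(n - 1, end + pad)
    return samples[start:end + 1]


clip = [0.0, 0.01, 0.5, -0.3, 0.0, 0.4, 0.001, 0.0]
print(trim_silence(clip))  # [0.5, -0.3, 0.0, 0.4]
```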