The Boy Who Could Change the World (4 page)

Authors: Aaron Swartz

The Fruits of Mass Collaboration

http://www.aaronsw.com/weblog/masscollab

July 18, 2006

Age 19

I often think that the world needs to be a lot more organized. Lots of people write reviews of television shows, but nobody seems to collect and organize them all. Good introductory guides to subjects are essential for learning, yet I only stumble upon them by chance. The cumulative knowledge of science is one of our most valuable cultural products, yet it can only be found scattered across thousands of short articles in hundreds of different journals.

I suspect the same thoughts occur to many of a similar cast of mind, since there's so much effort put into discouraging them. The arbiters of respectable opinion are frequently found to mock such grand projects or point out deficiencies in them. And a friend of mine explained to me that soon out of school he nearly killed himself by trying to embark on such a grand project and now tries to prevent his friends from making the same mistake.

One can, of course, make the reverse argument: since there is so much need for such organization projects, they must be pretty impossible. But upon closer inspection, that isn't true. Is there a project more grand than an encyclopedia or a dictionary? Who dares to compress all human knowledge or an entire language into a single book? And yet, there's not just one but several brands of each!

It seems that when the audience is large enough (and just about everyone has use for encyclopedias and dictionaries), it is possible to take on grand projects. This suggests that the holdup is not practical, but economic. The funding simply isn't there to do the same for other things.

But all this is only true for the era of the book, where such a project means gathering together a group of experts and having them work full-time to build a reference work which can be published and sold expensively to libraries. I tend to avoid net triumphalism, but the Internet, it would seem, changes that. Wikipedia was created not by dedicated experts but by random strangers, and while we can complain about its deficiencies, all admit that it's a useful service.

The Internet is the first medium to make such projects of mass collaboration possible. Certainly numerous people send quotes to Oxford for compilation in the Oxford English Dictionary, but a full-time staff is necessary to sort and edit these notes to build the actual book (not to mention all the other work that must be done). On the Internet, however, the entire job (collection, summarization, organization, and editing) can be done in spare time by mutual strangers.

An even more striking, but less remarked-upon, example is Napster. Within only months, almost as a by-product, the world created the most complete library of music and music catalog data ever seen. The contributors to this project didn't even realize they were doing this! They all thought they were simply grabbing music for their own personal use. Yet the outcome far surpassed anything consciously attempted.

The Internet fundamentally changes the practicalities of large organization projects. Things that previously seemed silly and impossible, like building a detailed guide to every television show, are now being done as a matter of course. It seems like we're in for an explosion of such modern reference works, perhaps with new experiments into tools for making them.

The Techniques of Mass Collaboration: A Third Way Out

http://www.aaronsw.com/weblog/masscollab2

July 19, 2006

Age 19

I'm not the first to suggest that the Internet could be used for bringing users together to build grand databases. The most famous example is the Semantic Web project (where, in full disclosure, I worked for several years). The project, spearheaded by Tim Berners-Lee, inventor of the web, proposed to extend the working model of the web to more structured data, so that instead of simply publishing text web pages, users could publish their own databases, which could be aggregated by search engines like Google into major resources.

The Semantic Web project has received an enormous amount of criticism, much (in my view) rooted in misunderstandings, but much legitimate as well. In the news today is just the most recent example, in which famed computer scientist turned Google executive Peter Norvig challenged Tim Berners-Lee on the subject at a conference.

The confrontation symbolizes the (at least imagined) standard debate on the subject, which Mark Pilgrim termed million-dollar markup versus million-dollar code. Berners-Lee's W3C, the supposed proponent of million-dollar markup, argues that users should publish documents that state in special languages that computers can process exactly what they want to say. Meanwhile Google, the supposed proponent of million-dollar code, thinks this is an impractical fantasy, and that the only way forward is to write more advanced software to try to extract the meaning from the messes that users will inevitably create.*

But yesterday I suggested what might be thought of as a third way out, one Pilgrim might call million-dollar users. Both the code and the markup positions make the assumption that users will be publishing their own work on their own websites and thus we'll need some way of reconciling it. But Wikipedia points to a different model, where all the users come to one website, where the interface for inputting data in the proper format is clear and unambiguous, and the users can work together to resolve any conflicts that may come up.

Indeed, this method strikes me as so superior that I'm surprised I don't see it discussed in this context more often. Ignorance doesn't seem plausible; even if Wikipedia was a latecomer, sites like ChefMoz and MusicBrainz followed this model and were Semantic Web case studies. (Full disclosure: I worked on the Semantic Web portions of MusicBrainz.) Perhaps the reason is simply that both sides, W3C and Google, have the existing web as the foundation for their work, so it's not surprising that they assume future work will follow from the same basic model.

One possible criticism of the million-dollar-users proposal is that it's somehow less free than the individualist approach. One site will end up being in charge of all the data and thus will be able to control its formation. This is perhaps not ideal, certainly, but if the data is made available under a free license it's no worse than things are now with free software. Those angry with the policies can always exercise their right to “fork” the project if they don't like the direction things are going. Not ideal, certainly, but we can try to dampen such problems by making sure the central sites are run as democratically as possible.

Another argument is that innovation will be hampered: under the individualist model, any person can start doing a new thing with their data, and hope that others will pick up the technique. In the centralized model, users are limited by the functionality of the centralized site. This too can be ameliorated by making the centralized site as open to innovation as possible, but even if it's closed, other people can still do new things by downloading the data and building additional services on top of it (as indeed many have done with Wikipedia).

It's been eight years since Tim Berners-Lee published his Semantic Web Roadmap and it's difficult to deny that things aren't exactly going as planned. Actual adoption of Semantic Web technologies has been negligible and nothing that promises to change that appears on the horizon. Meanwhile, the million-dollar-code people have not fared much better. Google has been able to launch a handful of very targeted features, like music search and answers to very specific kinds of questions, but these are mere conveniences, far from changing the way we use the web.

By contrast, Wikipedia has seen explosive growth, Amazon.com has become the premier site for product information, and when people these days talk about user-generated content, they don't even consider the individualized sense that the W3C and Google assume. Perhaps it's time to try the third way out.

* I say supposed because although this is typically how the debate is seen, I don't think either the W3C or Google actually hold the strict positions on the subject typically ascribed to them. Nonetheless, the question is real and it's convenient to consider the strongest forms of the positions.

Wikimedia at the Crossroads

http://www.aaronsw.com/weblog/wikiroads

August 31, 2006

Age 19

A couple weeks ago I had the great privilege of attending Wikimania, the international Wikimedia conference. Hundreds from all over the world gathered there to discuss the magic that is Wikipedia, thinking hard about what it means and why it works. It was an amazing intellectual and emotional experience.

The main attraction was seeing the vibrant Wikipedia community. There were the hardcore Wikipedians, who spend their days reviewing changes and fixing pages. And there were the elder statesmen, like Larry Lessig and Brewster Kahle, who came to meet the first group and tell them how their work fits into a bigger picture. Spending time with all these people was amazing fun: they're all incredibly bright, enthusiastic, and, most shockingly, completely dedicated to a cause greater than themselves.

At most “technology” conferences I've been to, the participants generally talk about technology for its own sake. If use ever gets discussed, it's only about using it to make vast sums of money. But at Wikimania, the primary concern was doing the most good for the world, with technology as the tool to help us get there. It was an incredible gust of fresh air, one that knocked me off my feet.

There was another group attending, however: the people holding up the platform on which this whole community stands. I spent the first few days with the mostly volunteer crew of hackers who keep the websites up and running. In later days, I talked to the site administrators who exercise the power that the software gives them. And I heard much about the Wikimedia Foundation, the not-for-profit that controls and runs the sites.

Much to my surprise, this second group was almost the opposite of the first. With a few notable exceptions, when they were offstage they talked gossip and details: how do we make the code stop doing this, how do we get people to stop complaining about that, how can we get this other group to like us more. Larger goals or grander visions didn't come up in their private conversations; instead they seemed absorbed by the issues of the present.

Of course, they have plenty to be absorbed by. Since January, Wikipedia's traffic has more than doubled and this group is beginning to strain under the load. At the technical level, the software development and server systems are both managed by just one person, Brion Vibber, who appears to have his hands more than full just keeping everything running. The entire system has been cobbled together as the site has grown, a messy mix of different kinds of computers and code, and keeping it all running sounds like a daily nightmare. As a result, actual software development goes rather slowly, which cannot help but affect the development of the larger project.

The small coterie of site administrators, meanwhile, are busy dealing with the ever-increasing stream of complaints from the public. The recent Seigenthaler affair, in which the founding editor of USA Today noisily attacked Wikipedia for containing a grievous error in its article on him, has made people very cautious about how Wikipedia treats living people. (Although to judge just from the traffic numbers, one might think more such affairs might be a good idea . . .) One administrator told me how he spends his time scrubbing Wikipedia clean of unflattering facts about people who call the head office to complain.

Finally, the Wikimedia Foundation Board seems to have devolved into inaction and infighting. Just four people have been actually hired by the Foundation, and even they seem unsure of their role in a largely volunteer community. Little about this group (which, quite literally, controls Wikipedia) is known by the public. Even when they were talking to dedicated Wikipedians at the conference, they put a public face on things, saying little more than “Don't you folks worry, we'll straighten everything out.”

The plain fact is that Wikipedia's gotten too big to be run by just a couple of people. One way or another, it's going to have to become an organization; the question is what kind. Organizational structures are far from neutral: whose input gets included decides what actions get taken, the positions that get filled decide what things get focused on, the vision at the top sets the path that will be followed.

I worry that Wikipedia, as we know it, might not last. That its feisty democracy might ossify into staid bureaucracy, that its innovation might stagnate into conservatism, that its growth might slow to stasis. Were such things to happen, I know I could not just stand by and watch the tragedy. Wikipedia is just too important, both as a resource and as a model, to see fail.

That is why, after much consideration, I've decided to run for a seat on the Wikimedia Foundation's Board. I've been a fairly dedicated Wikipedian since 2003, adding and editing pages whenever I came across them. I've gone to a handful of Wikipedia meetups and even got my photo on the front page of the Boston Globe as an example Wikipedian. But I've never gotten particularly involved in Wikipedia politics: I'm not an administrator, I don't get involved in policy debates, I hardly even argue on the “talk pages.” Mostly, I just edit.

And, to be honest, I wish I could stay that way. When people at Wikimania suggested I run for a Board seat, I shrugged off the idea. But since then, I've become increasingly convinced that I should run, if only to bring attention to these issues. Nobody else seems to be seriously discussing this challenge.

The election begins today and lasts three weeks. As it rolls on, I plan to regularly publish essays like this one, examining the questions that face Wikipedia in depth. Whether I win or not, I hope we can use this opportunity for a grand discussion about where we should be heading and what we can do to get there. That said, if you're an eligible Wikipedian, I hope that you'll please vote for me.
