The Boy Who Could Change the World
Authors: Aaron Swartz
http://www.aaronsw.com/weblog/whowriteswikipedia
September 4, 2006
Age 19
I first met Jimbo Wales, the face of Wikipedia, when he came to speak at Stanford. Wales told us about Wikipedia's history, technology, and culture, but one thing he said stands out. “The idea that a lot of people have of Wikipedia,” he noted, “is that it's some emergent phenomenon (the wisdom of mobs, swarm intelligence, that sort of thing): thousands and thousands of individual users each adding a little bit of content and out of this emerges a coherent body of work.” But, he insisted, the truth was rather different: Wikipedia was actually written by “a community . . . a dedicated group of a few hundred volunteers” where “I know all of them and they all know each other.” Really, “it's much like any traditional organization.”
The difference, of course, is crucial. Not just for the public, who wants to know how a grand thing like Wikipedia actually gets written, but also for Wales, who wants to know how to run the site. “For me this is really important, because I spend a lot of time listening to those four or five hundred and if . . . those people were just a bunch of people talking . . . maybe I can just safely ignore them when setting policy” and instead worry about “the million people writing a sentence each.”
So did the Gang of 500 actually write Wikipedia? Wales decided to run a simple study to find out: he counted who made the most edits to the site. “I expected to find something like an 80-20 rule: 80% of the work being done by 20% of the users, just because that seems to come up a lot. But it's actually much, much tighter than that: it turns out over 50% of all the edits are done by just .7% of the users . . . 524 people. . . . And in fact the most active 2%, which is 1400 people, have done 73.4% of all the edits.” The remaining 25% of edits, he said, were from “people who [are] contributing . . . a minor change of a fact or a minor spelling fix . . . or something like that.”
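Wales's measure is easy to state in code. The Python below is a minimal sketch, not his actual method: it assumes a hypothetical input with one username per edit, and the name top_share is invented here. It ranks users by edit count and returns the share of all edits made by the most active slice of them.

```python
from collections import Counter


def top_share(editors, fraction):
    """Share of all edits made by the most active `fraction` of users.

    editors: hypothetical input, one username per edit.
    """
    counts = sorted(Counter(editors).values(), reverse=True)
    k = max(1, round(len(counts) * fraction))  # size of the "top" group
    return sum(counts[:k]) / sum(counts)


# Toy history: 100 edits by 13 users, heavily concentrated in one user.
editors = ["alice"] * 50 + ["bob"] * 30 + ["carol"] * 10 + [f"anon{i}" for i in range(10)]
print(top_share(editors, 0.1))   # 0.5
print(top_share(editors, 0.25))  # 0.9
```

In this toy history one user makes 50 of the 100 edits, so the top tenth of the 13 users (rounded to one person) accounts for half the edits. Every edit counts the same here, whether it adds a paragraph or fixes a comma.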
Stanford wasn't the only place he's made such a claim; it's part of the standard talk he gives all over the world. “This is the group of around a thousand people who really matter,” he told us at Stanford. “There is this tight community that is actually doing the bulk of all the editing,” he explained at the Oxford Internet Institute. “It's a group of around a thousand to two thousand people,” he informed the crowd at GEL 2005. These are just the three talks I watched, but Wales has given hundreds more like them.
At Stanford the students were skeptical. Wales was just counting the number of edits: the number of times a user changed something and clicked save. Wouldn't things be different if he counted the amount of text each user contributed? Wales said he planned to do that in “the next revision” but was sure “my results are going to be even stronger,” because he'd no longer be counting vandalism and other changes that later got removed.
Wales presents these claims as comforting. Don't worry, he tells the world, Wikipedia isn't as shocking as you think. In fact, it's just like any other project: a small group of colleagues working together toward a common goal. But if you think about it, Wales's view of things is actually much more shocking: around a thousand people wrote the world's largest encyclopedia in four years for free. Could this really be true?
Curious and skeptical, I decided to investigate. I picked an article at random (“Alan Alda”) to see how it was written.
Today the Alan Alda page is a pretty standard Wikipedia page: it has a couple photos, several pages of facts and background, and a handful of links. But when it was first created, it was just two sentences: “Alan Alda is a male actor most famous for his role of Hawkeye Pierce in the television series MASH. Or [sic] recent work, he plays sensitive male characters in drama movies.” How did it get from there to here?
Edit by edit, I watched the page evolve. The changes I saw largely fell into three groups. A tiny handful (probably around 5 out of nearly 400) were “vandalism”: confused or malicious people adding things that simply didn't fit, followed by someone undoing their change. The vast majority, by far, were small changes: people fixing typos, formatting, links, categories, and so on, making the article a little nicer but not adding much in the way of substance. Finally, a much smaller number were genuine additions: a couple sentences or even paragraphs of new information added to the page.
Wales seems to think that the vast majority of users are just doing the first two (vandalizing or contributing small fixes) while the core group of Wikipedians writes the actual bulk of the article. But that's not at all what I found. Almost every time I saw a substantive edit, I found the user who had contributed it was not an active user of the site. They generally had made fewer than 50 edits (typically around 10), usually on related pages. Most never even bothered to create an account.
To investigate more formally, I purchased some time on a computer cluster and downloaded a copy of the Wikipedia archives. I wrote a little program to go through each edit and count how much of it remained in the latest version.
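The original program isn't reproduced here, but the idea can be sketched in Python under stated assumptions. The sketch assumes the history is already available as a hypothetical list of (author, text) pairs, oldest first, with the last pair being the current article; the names attribute and min_len are invented for illustration. For each revision it repeatedly calls difflib's SequenceMatcher.find_longest_match to find the longest stretch of that revision still present in an uncredited part of the latest text, credits those characters to the revision's author, and marks them as taken so later revisions can't claim them again. Because any leftover piece of a revision may match any leftover piece of the latest text, text that was moved around is still credited.

```python
from difflib import SequenceMatcher


def attribute(revisions, min_len=1):
    """Credit each character of the latest text to the earliest revision
    (in list order) that contained it.

    revisions: hypothetical list of (author, text) pairs, oldest first;
    the last pair is the current version of the article.
    min_len: shortest match worth crediting (an illustrative knob).
    Returns a dict mapping author -> number of surviving characters.
    """
    latest = revisions[-1][1]
    sm = SequenceMatcher(None, autojunk=False)  # no "popular item" junking on prose
    sm.set_seq2(latest)                         # b = latest text, indexed once
    unclaimed = [(0, len(latest))]              # spans of latest not yet credited
    credit = {}

    for author, text in revisions:
        sm.set_seq1(text)
        pending = [(0, len(text))]              # spans of this revision not yet used
        while True:
            # Longest match between ANY unused span of this revision and ANY
            # uncredited span of the latest text, so order doesn't matter.
            best = None
            for i, (alo, ahi) in enumerate(pending):
                for j, (blo, bhi) in enumerate(unclaimed):
                    m = sm.find_longest_match(alo, ahi, blo, bhi)
                    if best is None or m.size > best[0].size:
                        best = (m, i, j)
            if best is None or best[0].size < max(1, min_len):
                break
            m, i, j = best
            credit[author] = credit.get(author, 0) + m.size
            # Cut the matched piece out of both span lists.
            alo, ahi = pending.pop(i)
            pending += [s for s in ((alo, m.a), (m.a + m.size, ahi)) if s[0] < s[1]]
            blo, bhi = unclaimed.pop(j)
            unclaimed += [s for s in ((blo, m.b), (m.b + m.size, bhi)) if s[0] < s[1]]
    return credit


history = [
    ("alice", "Alan Alda is an actor."),
    ("bob", "Alan Alda is an actor. He played Hawkeye."),
    ("carol", "He played Hawkeye. Alan Alda is an actor."),  # only reorders
]
print(attribute(history))  # {'alice': 22, 'bob': 19}
```

Here carol, who only swapped the two sentences, is credited with nothing, even though counting edits would rank her equal to the others. Passing autojunk=False matters: by default SequenceMatcher treats any item that makes up more than 1% of a sequence of 200 or more items as junk, which for ordinary prose would throw away spaces and common letters. The exhaustive search over span pairs favors clarity over speed and would be slow on long articles.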
Instead of counting edits, as Wales did, I counted the number of letters a user actually contributed to the present article.
If you just count edits, it appears the biggest contributors to the Alan Alda article (7 of the top 10) are registered users who (all but 2) have made thousands of edits to the site. Indeed, #4 has made over 7,000 edits while #7 has over 25,000. In other words, if you use Wales's methods, you get Wales's results: most of the content seems to be written by heavy editors.
But when you count letters, the picture dramatically changes: few of the contributors (2 out of the top 10) are even registered and most (6 out of the top 10) have made fewer than 25 edits to the entire site. In fact, #9 has made exactly one edit: this one! With the more reasonable metric (indeed, the one Wales himself said he planned to use in the next revision of his study), the result completely reverses.
I don't have the resources to run this calculation across all of Wikipedia (there are over 60 million edits!), but I ran it on several more randomly selected articles and the results were much the same. For example, the largest portion of the “Anaconda” article was written by a user who only made 2 edits to it (and only 100 on the entire site). By contrast, the largest number of edits were made by a user who appears to have contributed no text to the final article (the edits were all deleting things and moving things around).
When you put it all together, the story becomes clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site, the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But it's the outsiders who provide nearly all of the content.
And when you think about it, this makes perfect sense. Writing an encyclopedia is hard. To do anywhere near a decent job you have to know a great deal of information about an incredibly wide variety of subjects. Writing so much text is difficult, but doing all the background research seems impossible.
On the other hand, everyone has a bunch of obscure things that, for one reason or another, they've come to know well. So they share them, clicking the edit link and adding a paragraph or two to Wikipedia. At the same time, a small number of people have become particularly involved in Wikipedia itself, learning its policies and special syntax, and spending their time tweaking the contributions of everybody else.
Other encyclopedias work similarly, just on a much smaller scale: a large group of people write articles on topics they know well, while a small staff formats them into a single work. This second group is clearly very important (it's thanks to them encyclopedias have a consistent look and tone), but it's a severe exaggeration to say that they wrote the encyclopedia. One imagines the people running Britannica worry more about their contributors than their formatters.
And Wikipedia should too. Even if all the formatters quit the project tomorrow, Wikipedia would still be immensely valuable. For the most part, people read Wikipedia because it has the information they need, not because it has a consistent look. It certainly wouldn't be as nice without one, but the people who (like me) care about such things would probably step up to take the place of those who had left. The formatters aid the contributors, not the other way around.
Wales is right about one thing, though. This fact does have enormous policy implications. If Wikipedia is written by occasional contributors, then growing it requires making it easier and more rewarding to contribute occasionally. Instead of trying to squeeze more work out of those who spend their life on Wikipedia, we need to broaden the base of those who contribute just a little bit.
Unfortunately, precisely because such people are only occasional contributors, their opinions aren't heard by the current Wikipedia process. They don't get involved in policy debates, they don't go to meetups, and they don't hang out with Jimbo Wales. And so things that might help them get pushed on the back burner, assuming they're even proposed.
Out of sight is out of mind, so it's a short hop to thinking these invisible people aren't particularly important. Thus Wales's belief that 500 people wrote half an encyclopedia. Thus his assumption that outsiders contribute mostly vandalism and nonsense. And thus the comments you sometimes hear that making it hard to edit the site might be a good thing.
“I'm not a wiki person who happened to go into encyclopedias,” Wales told the crowd at Oxford. “I'm an encyclopedia person who happened to use a wiki.” So perhaps his belief that Wikipedia was written in the traditional way isn't surprising. Unfortunately, it is dangerous. If Wikipedia continues down this path of focusing on the encyclopedia at the expense of the wiki, it might end up not being much of either.
*
The details: I downloaded a copy of the enwiki-20060717-pages-meta-history.xml.bz2 archive, broke it up into pages, iterated over the revisions and recursively applied Python's difflib.SequenceMatcher.find_longest_match to each revision and the latest revision. (I used find_longest_match instead of get_matching_blocks because get_matching_blocks didn't properly handle blocks being reordered.) I only counted the characters which hadn't already been matched by an earlier revision.
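The reordering problem is easy to reproduce. In the short Python sketch below (the two sentences are invented for illustration), a revision swaps two sentences. get_matching_blocks only reports matches that occur in the same order in both strings, so it finds one sentence and misses the other; calling find_longest_match again on the leftover regions, pairing the text before the first match in one string with the text after it in the other, recovers the missing sentence.

```python
from difflib import SequenceMatcher

old = "Born in 1936. Starred in MASH. "
new = "Starred in MASH. Born in 1936. "  # same two sentences, swapped

sm = SequenceMatcher(None, old, new, autojunk=False)

# get_matching_blocks: matches must appear in the same order in both strings.
in_order = sum(block.size for block in sm.get_matching_blocks())
print(in_order, "of", len(new))  # 17 of 31: only "Starred in MASH. " is found

# find_longest_match over the whole strings picks that same 17-character block...
first = sm.find_longest_match(0, len(old), 0, len(new))
# ...but searching the text before it in `old` against the text after it in
# `new` (a pairing get_matching_blocks never tries) finds the other sentence.
second = sm.find_longest_match(0, first.a, first.b + first.size, len(new))
print(first.size + second.size, "of", len(new))  # 31 of 31
```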
http://www.aaronsw.com/weblog/whorunswikipedia
September 7, 2006
Age 19
During Wikimania, I gave a short talk proposing some new features for Wikipedia. The audience, which consisted mostly of programmers and other high-level Wikipedians, immediately began suggesting problems with the idea. “Won't bad thing X happen?” “How will you prevent Y?” “Do you really think people are going to do Z?” For a while I tried to answer them, explaining technical ways to fix the problem, but after a couple rounds I finally said:
Stop.
If I had come here five years ago and told you I was going to make an entire encyclopedia by putting up a bunch of web pages that anyone could edit, you would have been able to raise a thousand objections: It will get filled with vandalism! The content will be unreliable! No one will do that work for free!
And you would have been right to. These were completely reasonable expectations at the time. But here's the funny thing: it worked anyway.
At the time, I was just happy this quieted them down. But later I started thinking more about it. Why did Wikipedia work anyway?
It wasn't because its programmers were so farsighted that the software solved all the problems. And it wasn't because the people running it put clear rules in place to prevent misbehavior. We know this because when Wikipedia started it didn't have any programmers (it used off-the-shelf wiki software) and it didn't have clear rules (one of the first major rules was apparently “Ignore all rules”).
No, the reason Wikipedia works is because of the community, a group of people that took the project as their own and threw themselves into making it succeed.
People are constantly trying to vandalize Wikipedia, replacing articles with random text. It doesn't work; their edits are undone within minutes, even seconds. But why? It's not magic; it's a bunch of incredibly dedicated people who sit at their computers watching every change that gets made. These days they call themselves the “recent changes patrol” and have special software that makes it easy to undo bad changes and block malicious users with a couple clicks.
Why does anyone do such a thing? It's not particularly fascinating work, they're not being paid to do it, and nobody in charge asked them to volunteer. They do it because they care about the site enough to feel responsible. They get upset when someone tries to mess it up.
It's hard to imagine anyone feeling this way about Britannica. There are people who love that encyclopedia, but have any of them shown up at their offices offering to help out? It's hard even to imagine. Average people just don't feel responsible for Britannica; there are professionals to do that.
Everybody knows Wikipedia as the site anyone can edit. The article about tree frogs wasn't written because someone in charge decided they needed one and assigned it to someone; it was written because someone, somewhere, just went ahead and started writing it. And a chorus of others decided to help out.
But what's less well known is that it's also the site that anyone can run. The vandals aren't stopped because someone is in charge of stopping them; it was simply something people started doing. And it's not just vandalism: a “welcoming committee” says hi to every new user, a “cleanup taskforce” goes around doing fact-checking. The site's rules are made by rough consensus. Even the servers are largely run this way: a group of volunteer sysadmins hang out on IRC, keeping an eye on things. Until quite recently, the Foundation that supposedly runs Wikipedia had no actual employees.
This is so unusual, we don't even have a word for it. It's tempting to say “democracy,” but that's woefully inadequate. Wikipedia doesn't hold a vote and elect someone to be in charge of vandal fighting. Indeed, “Wikipedia” doesn't do anything at all. Someone simply sees that there are vandals to be fought and steps up to do the job.
This is so radically different that it's tempting to see it as a mistake. Sure, perhaps things have worked so far on this model, but when the real problems hit, things are going to have to change: certain people must have clear authority, important tasks must be carefully assigned, everyone else must understand that they are simply volunteers.
But Wikipedia's openness isn't a mistake; it's the source of its success. A dedicated community solves problems that official leaders wouldn't even know were there. Meanwhile, their volunteerism largely eliminates infighting about who gets to be what. Instead, tasks get done by the people who genuinely want to do them, who just happen to be the people who care enough to do them right.
Wikipedia's biggest problems have come when it's strayed from this path, when it's given some people official titles and specified tasks. Whenever that happens, real work slows down and squabbling speeds up. But it's an easy mistake to make, so it gets made again and again.
Of course, that's not the only reason this mistake is made; it's just the most polite. The more frightening problem is that people love to get power and hate to give it up. Especially with a project as big and important as Wikipedia, with the constant swarm of praise and attention, it takes tremendous strength to turn down the opportunity to be its official X, to say instead, “It's a community project, I'm just another community member.”
Indeed, the opposite is far more common. People who have poured vast amounts of time into the project begin to feel they should be getting something in return. They insist that, with all their work, they deserve an official job or a special title. After all, won't clearly assigning tasks be better for everyone?
And so, the trend is clear: more power, more people, more problems. It's not just a series of mistakes, it's the tendency of the system.
It would be absurd for me to say that I'm immune to such pressures. After all, I'm currently running for a seat on the Wikimedia Board. But I also lie awake at night worrying that I might abuse my power.
A systemic tendency like this is not going to be solved by electing the right person to the right place and then going back to sleep while they solve the problem. If the community wants to remain in charge, it's going to have to fight for it. I'm writing these essays to help people understand that this is something worth fighting for. And if I'm elected to the Board, I plan to keep on writing.
Just as Wikipedia's success as an encyclopedia requires a world of volunteers to write it, Wikipedia's success as an organization requires the community of volunteers to run it. On the one hand, this means opening up the Board's inner workings for the community to see and get involved in. But it also means opening up the actions of the community so the wider world can get involved. Whoever wins this next election, I hope we all take on this task.