The Boy Who Could Change the World

False Outliers

http://www.aaronsw.com/weblog/writefp

September 5, 2006

Age 19

So far my Wikipedia script has churned through about 200 articles, calculating who wrote what in each. This morning I looked through them to see if there were any that didn't match my theory. The script printed out a couple and I decided to investigate.
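
The post does not show the script itself, so what follows is only a minimal sketch of one way to calculate "who wrote what," not the actual code. It assumes an article's history is already available as a list of (author, text) pairs, oldest first, and credits each word of the final text to whoever first added it. The names attribute_words and share_by_author are hypothetical, and word-level diffing with Python's difflib is an assumption about method.

```python
from collections import Counter
from difflib import SequenceMatcher


def attribute_words(revisions):
    """Credit each word of the latest revision to the author who added it.

    `revisions` is a list of (author, text) pairs, oldest first.
    Returns a list of (word, author) pairs for the final text.
    """
    words, owners = [], []
    for author, text in revisions:
        new_words = text.split()
        matcher = SequenceMatcher(a=words, b=new_words, autojunk=False)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Surviving words keep their original author.
                new_owners.extend(owners[i1:i2])
            elif tag in ("insert", "replace"):
                # Newly added words are credited to this revision's author.
                new_owners.extend([author] * (j2 - j1))
            # "delete": the words are gone, so there is nothing to credit.
        words, owners = new_words, new_owners
    return list(zip(words, owners))


def share_by_author(revisions):
    """Return each author's fraction of the words in the final text."""
    counts = Counter(owner for _, owner in attribute_words(revisions))
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()} if total else {}


# Hypothetical three-revision history, for illustration only.
history = [
    ("alice", "alkanes are hydrocarbons"),
    ("bob", "alkanes are saturated hydrocarbons"),
    ("carol", "alkanes are acyclic saturated hydrocarbons"),
]
shares = share_by_author(history)
```

A count like this credits text to whoever pasted it in, which is exactly why translated, merged, or copied text can look like original authorship in the numbers.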

The first it found was “Alkane,” a long technical article about acyclic saturated hydrocarbons that it said was largely written by Physchim62. Yesterday a good friend was telling me that he thought long technical articles were likely written by a single person, so I immediately thought that here was the proof that he was right. But, just to check, I decided to look in the edit history to make sure my script hadn't made an error.

It hadn't, I found, but once again simply looking at the numbers missed the larger point. Physchim62 had indeed contributed most of the article, but according to the edit comments, it was by translating the German version! I don't have the German data, but presumably it was written in the same incremental way as most of the articles in my study.

The next serious case was “Characters in Atlas Shrugged,” which the script said was written by CatherineMunro. Again, it seemed plausible that one person could have written all those character bios. But again, an investigation into the actual edit history found that Munro hadn't written them; instead she'd copied them from a bunch of subpages, merging them into one bigger page.

The final serious example was “Anchorage, Alaska,” which appeared to have been written by JeffreyAllen1975. Here the contributions seemed quite genuine; JeffreyAllen1975 made tons of edits, each contributing a paragraph at a time. The work seemed to take quite a toll on him; at his user page he noted, “I just got burned-out and tired of the online encyclopedia. My time is being taken away from me by being with Wikipedia.” He lasted about four months.

Still, something seemed fishy about JeffreyAllen1975, so I decided to investigate further. Currently, the “Anchorage” page has a tag noting that “The current version of the article or section reads like an advertisement.” A bit of Googling revealed why: JeffreyAllen1975's contributions had been copied and pasted from other websites, like the Anchorage Chamber of Commerce (“Anchorage's public school system is ranked among the best in the nation. . . . The district's average SAT and ACT College entrance exam scores are consistently above the national average and Advanced Placement courses are offered at each of the district's larger high schools”).

I suspect JeffreyAllen1975 didn't know what he was doing. His writing style suggests he's just a kid: “In my free time, I am very proud of my-self by how much I've learned by making good edits on Wikipedia articles.” I'm pretty sure he just thought he was helping the project: “Wikipedia is like the real encyclopedia books (A through Z) that you see in the library, but better.” But his plagiarism will still have to be removed.

When I started, just looking at the numbers, there seemed to be several cases that strongly contradicted my theory. And had I just stuck to looking at the numbers, I would have believed that to be the case as well. But, once again, investigation shows the picture to be far more interesting: translation, reorganization, and plagiarism. Exciting stuff!

(The Dandy Warhols) Come Down

http://www.aaronsw.com/weblog/comedown

September 22, 2006

Age 19

Well, the Wikipedia election has finally ended. The good news is that I can now talk about other things again. (For example, did you know that Erik Möller eats babies?) I have a backlog of about 20 posts that I built up over the course of the election. But instead of springing them on you all at once, I'll try to do daily posting again starting Monday. (Oooh.)

The actual results haven't been announced yet (and probably won't be for another couple days, while they check the list of voters for people who voted twice) but my impression is that I probably lost. Many wags have commented on how my campaign was almost destined to lose: I argued that the hard-core Wikipedia contributors weren't very important, but those were precisely the people who could vote for me—in other words, I alienated my only constituency.

“Aaron Swartz: Why is he getting so much attention?” wrote fellow candidate Kelly Martin. “The community has long known that edit count is a poor measure of contributions.” Others, meanwhile, insisted my claims were so obviously wrong as to not be even worth discussing.

Jimbo Wales, on the other hand, finally sent me a nice message the other day letting me know that he'd removed the offending section from his talk and looked forward to sitting down with me and investigating the topic more carefully.

And for my part, I hope to be able to take up some of the offers I've received for computer time and run my algorithm across all of Wikipedia and publish the results in more detailed form. (I'd also like to use the results to put up a little website where you can type in the name of a page and see who wrote what, color-coded or something like that.)

As for the election itself, it's much harder to draw firm conclusions. It's difficult in any election, this one even more so because we have so little data—no exit polls or phone surveys or even TV pundits to rely upon. Still, I'm fairly content seeing the kind words of all the incredible people I respect. Their support means a great deal to me.

The same is true of the old friends who wrote in during my essays along with all the new people who encouraged me to keep on writing. Writing the essays on a regular schedule was hard work—at one point, after sleeping overnight at my mother's bedside in the hospital, I trundled down at seven in the morning to find an Internet connection so I could write and post one—but your support made it worth the effort.

I hope that whoever wins takes what I've written into consideration. I'm not sure who that is yet, but there are some hints. I was reading an irreverent site critical of Wikipedia when I came across its claim that Jimbo Wales had sent an email to the Wikipedia community telling them who they should vote for. I assumed the site had simply made it up to attack Jimbo, but when I searched I found it really was genuine:

I personally strongly strongly support the candidacies of Oscar and Mindspillage.

[. . .]

There are other candidates, some good, but at least some of them are entirely unacceptable because they have proven themselves repeatedly unable to work well with the community.

For those reading the tea leaves, this suggests that the results will be something like: Eloquence, Oscar, Mindspillage. But we'll see.

The letdown after the election is probably not the best time to make plans, but if I had to, I'd probably decide to stay out of Wikipedia business for a while. It's a great and important project, but not the one for me.

Anyway, now everyone can go back to vandalizing my Wikipedia page. Laters.

Up with Facts: Finding the Truth in WikiCourt

http://www.aaronsw.com/weblog/001175

February 19, 2004

Age 17

I'm an optimist. I believe that statements like “Bush went AWOL” or “Gore claims to have invented the Internet” can be evaluated and decided pretty much true or false. (The conclusion can be a little more nuanced, but the important thing is that there's a definitive conclusion.)

And even crazier, I believe that if there was a fair and accurate system for determining which of these things were lies, people would stop repeating the lies. I would certainly try to. No matter how much I wanted to believe “Dean's state record sealing was normal” or “global warming does exist,” if a fair system had decided against it, I would stop.

And perhaps most crazy of all, I want to stop repeating falsehoods. I believe the truth is more important than particular political goals, so I want to build a system I can trust. I want to know that when I make claims, I'm not speaking out of political distortion but out of honest truth. And I want to be able to evaluate the claims of others too.

So how would such a system work? First, large claims (“Gore is a serial liar,” “Ronald Reagan was a great president”) would be broken down into smaller component parts (“Gore claimed to have invented the Internet,” “Ronald Reagan's economic plan created jobs”). On each small claim, we'd run The Process. Let's take “Gore falsely claimed to have invented the Internet.”

First, some ground rules. Everything is open. Anyone can submit anything, and all the records are put on a public website.

We'd begin with collecting evidence. Anyone could submit helpful factual evidence. We'd get videotape from CNN of what exactly Gore said. We'd get congressional records about Gore's funding of the Arpanet. We'd get testimony from people involved. And so on. If someone challenged a piece of evidence's validity (e.g., “that photo is doctored,” “that testimony is forged”), a Mini-Process could be started to resolve the issue.

Then there'd be the argument phase. A wiki page would be created where each side would try to take facts from the evidence and use them to build an argument for their case. But then the other side could modify the page to provide their own evidence, expand selective quotations, and otherwise modify the page to make it more accurate and less partisan. Each side would continue bashing the other side's work until the page gave the best arguments from each side, presented in such a way that nobody could object. (You may think that this is impossible, but Wikipedia has ably proven that it can work.)

Finally, there'd be the adjudication phase. This is the hard part. A group of twelve fair-minded intelligent people (experts in the field, if necessary) would agree to put aside their partisanship and come to a conclusion based on the argument. Hopefully, most of the time this conclusion would be (after a little wiki-rewriting from both sides) unanimous. For example, “While Gore's phrasing was a little misleading, it is clear Gore was claiming to have led the fight for providing funding for research that was later developed into the Internet—a claim that is mostly true. Gore was one of the research's major backers, although others were involved.”

The panel would be assembled by selecting people widely seen as fair-minded and intelligent, but coming from different sides of the political spectrum. It is likely many would accept—all they'd need to do was read a page and spend a little time agreeing to summarize it. And in doing so, they'd provide a great contribution to political debate (as well as getting their side represented).

All of these phases would be going on essentially simultaneously—the argument could be updated as new evidence came to light, new evidence could be added to fill holes in the argument, and the adjudicating jury could keep tabs on the page as updated.

And once a decision on an issue was made, it could be cited as evidence in the argument for a related issue (“Gore is a serial liar”).
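
To make the moving parts concrete, here is a rough sketch of how the records behind The Process might be modeled. Nothing like this appears in the original proposal: the class names Evidence and Claim, their fields, and the cite_verdict helper are all hypothetical, and the sketch deliberately leaves out how the panel reaches its conclusion.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    description: str          # e.g. "CNN videotape of the interview"
    submitted_by: str         # anyone can submit
    challenged: bool = False  # a challenge would start a Mini-Process


@dataclass
class Claim:
    statement: str
    evidence: List[Evidence] = field(default_factory=list)
    argument_page: str = ""        # the wiki page both sides keep editing
    verdict: Optional[str] = None  # filled in by the panel, if it decides
    parts: List["Claim"] = field(default_factory=list)  # smaller component claims


def cite_verdict(decided: Claim, target: Claim) -> Evidence:
    """Add a decided claim's verdict to another claim's evidence."""
    if decided.verdict is None:
        raise ValueError("claim has not been decided yet")
    item = Evidence(
        description=f'Decided: "{decided.statement}": {decided.verdict}',
        submitted_by="panel",
    )
    target.evidence.append(item)
    return item


# Break a large claim into a smaller one, decide it, then cite the decision.
large = Claim("Gore is a serial liar")
small = Claim("Gore falsely claimed to have invented the Internet")
large.parts.append(small)
small.evidence.append(Evidence("CNN videotape of what Gore said", "anyone"))
small.verdict = "The underlying claim is mostly true"
cited = cite_verdict(small, large)
```

Because every field can be changed at any time, the evidence, the argument page, and the verdict can all be revised as new material comes in, which matches the idea of the phases running essentially simultaneously.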

Everything would be very fluid and wiki-like. We'd make up the rules as we went along, seeing what was necessary. And when we learned from our mistakes, we could go back and fix them.

This seems like an awful lot of effort for just coming to a decision on a couple of silly issues, but I think it's far more than that. The result would be a vast collection of trustable arguments for many of the hot topics of the day, a collection that could be relied on through time to give you the fair truth—because everybody had essentially signed off on it (it is publicly modifiable, after all). And if you look at the effort expended on these claims and political fights, spending a little time getting the facts right seems like a small price to pay.
