Authors: Douglas Edwards
Sanjay worked on the new indexing system, which would be tested for the first time with the April index. Instead of taking three or four months, the index would have to be finished in one. "When something failed, we were on it," he told me. "People would just wake up every few hours and see if anything bad had happened, and then go fix it. The working style was long hours, constant attention, and quickly fixing things as they went bad."
Jeff remembers this entire period as the most demanding of his decade at Google: "From March 2000 to the end of 2001 was just frantically spent redesigning our systems and trying out different ideas on a very short time scale. The difficulty was the product of several different dimensions. How many queries do you have? How big is your index? How often do you want to update your index?"
The Bens provided a perfect barometer for reading the state of the software development. As each shouldered his part of the load, signs of the pressure he was under increased. Gomes was first, because he was helping JeffnSanjay with the code for the actual crawler and indexer and updating the pageranker. Polk was next as he worked with Bogdan to push the index out to the data centers.
"That was a pretty tricky business," Gomes recalls, "because you had to copy onto machines that were currently serving the traffic and then at some point arrange for the flipover to take place. The new index was almost always larger and more complex, and you were never sure what would go wrong at a flip." Something almost always blew up. Gomes once started to explain the process to a colleague: "If things go well ..." Then he paused and asked, "Why the hell am I saying, 'If things go well'?"
Ben Smith owned the front-end infrastructure that enabled Google to serve the index to Yahoo. Smith and Craig Silverstein were the experts on the Google Web Server (GWS), the system that actually communicated directly with users. This put Smith in the role of riding herd over latency problems.
"It was the most miserable couple of months in my life," Smith said about the Yahoo buildup. "I'd be driving home with the sun coming up. I'd get four hours of sleep and then head back to work."
Every neck felt the hot breath of failure, and every throat tasted the bile waiting to erupt if they fell behind schedule. Not everyone felt it with the same intensity, though. "I wasn't too worried about it," Urs told me. "What we promised Yahoo was a lot smaller than our goal in terms of coverage [the 1B index]. The scary things were the reliability parts, not the quality. They can't measure quality."
Urs knew that ultimately it was just a business deal and that Yahoo had the upper hand. "If Yahoo wanted to walk away," he conceded, "they could walk away. They didn't even need a pretext. It was a pretty one-sided contract."
Google would be taking a calculated risk by giving Yahoo guarantees, but Urs made that calculation and felt comfortable enough with the odds that he slept easy at night. "We promised ninety-nine-point-five percent uptime," he said, "and we weren't reeeeaaallllly quite there. So you look at the penalties and say, fine, if it occasionally happens, then we'll pay some of these penalties. Hopefully in a good partnership, people are going to be rewarding you for seriously trying. And we were definitely seriously trying."
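To put that promise in numbers: 99.5 percent uptime leaves a downtime budget of 0.5 percent. Assuming the guarantee was measured over every hour of a 365-day year (the text does not say how Yahoo measured it), the budget works out as follows.

```latex
\begin{aligned}
(1 - 0.995) \times 8760\,\text{h} &= 43.8\,\text{h of downtime per year} \\
(1 - 0.995) \times 720\,\text{h} &= 3.6\,\text{h of downtime per 30-day month}
\end{aligned}
```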
So what did all this effort produce?
"Mostly," Jeff said, "we wanted to get many more queries per second served out of these machines. One of the big things we did was completely change the index format to make it much more compact."
In layman's terms, Google's index was full of spaces that didn't need to be there—it fit the data like baggy pants in constant danger of hitting the ground. Google wasted precious time searching empty pockets to find the bits it needed. One of JeffnSanjay's innovations was to shove most occurrences of a particular word into a single block in the database. Kind of like putting all your nickels in one pocket, dimes in another, so if you see a nickel, you know not to waste time searching through that pocket for a dime. The software searching the index could tell quickly from the block header that it didn't need anything in that block and skip ahead, which made each machine faster.
"We improved that," Jeff said, "and we added skip tables to skip even larger chunks than just blocks." The goal was to minimize the number of times Google read each hard drive, because physically moving a head across a disk is far, far slower than doing things within an electronic circuit. JeffnSanjay rewrote the disk-scheduling systems to give each disk its own set of code. That cut search times by thirty to forty percent. A thirty-percent improvement was like running a four-minute mile in under three. A stunning accomplishment. But it wasn't enough.
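The two tricks Jeff describes, block headers that let the reader skip a block without scanning it and skip tables that jump over several blocks at once, can be illustrated with a minimal Python sketch. It is not Google's index format: the block size, the use of the largest document ID as the block header, the skip-table layout, and the names `PostingList`, `next_geq`, and `intersect` are all hypothetical. The `blocks_read` counter shows how few block bodies a two-word query actually has to open.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

BLOCK_SIZE = 4   # hypothetical: doc IDs stored per block
SKIP_EVERY = 2   # hypothetical: blocks covered by one skip-table entry


@dataclass
class Block:
    last_doc: int    # header: the largest doc ID in this block
    docs: list       # sorted IDs of documents containing the word


@dataclass
class PostingList:
    blocks: list = field(default_factory=list)
    skip_table: list = field(default_factory=list)  # (last doc ID in group, first block index)
    blocks_read: int = 0                            # block bodies actually scanned

    @classmethod
    def build(cls, doc_ids):
        docs = sorted(set(doc_ids))
        pl = cls()
        for i in range(0, len(docs), BLOCK_SIZE):
            chunk = docs[i:i + BLOCK_SIZE]
            pl.blocks.append(Block(last_doc=chunk[-1], docs=chunk))
        for b in range(0, len(pl.blocks), SKIP_EVERY):
            last = min(b + SKIP_EVERY, len(pl.blocks)) - 1
            pl.skip_table.append((pl.blocks[last].last_doc, b))
        return pl

    def next_geq(self, target):
        """Return the smallest doc ID >= target, or None if there is none."""
        # Skip table: jump over whole groups of blocks at once.
        for group_last, first_block in self.skip_table:
            if group_last >= target:
                break
        else:
            return None
        # Block headers: pass over blocks without reading their contents.
        for block in self.blocks[first_block:]:
            if block.last_doc < target:
                continue
            self.blocks_read += 1
            return block.docs[bisect_left(block.docs, target)]
        return None


def intersect(a, b):
    """Doc IDs that contain both words, as for a two-word query."""
    result = []
    doc = a.next_geq(0)
    while doc is not None:
        other = b.next_geq(doc)
        if other is None:
            break
        if other == doc:
            result.append(doc)
            doc = a.next_geq(doc + 1)
        else:
            doc = a.next_geq(other)
    return result


common = PostingList.build(range(1, 200, 2))  # a frequent word: 100 docs, 25 blocks
rare = PostingList.build([7, 151])            # a rare word
print(intersect(common, rare))                # [7, 151]
print(common.blocks_read)                     # 5
```

Here the rare word drives the query: the common word's list spans 25 blocks, but only 5 of them are ever opened. A production system would binary-search the skip table instead of walking it; the linear walk just keeps the sketch short.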
So Jeff and Sanjay got rid of "the."
"The" is the most common word in English and conveys little useful information. JeffnSanjay decided to ignore it, freeing up one percent of the space being used by the index. The only downside? It became infinitely harder to find information about the eighties alternative rock band "The The." Engineering lives and dies by its tradeoffs.
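Stopword removal is simple to sketch, and so is its cost. This Python fragment assumes whitespace tokenization and lowercasing, neither of which the text specifies; the stopword set holds only "the" because that is the only word the text says was dropped.

```python
STOPWORDS = {"the"}  # assumption: the text only says "the" was dropped


def index_terms(text):
    """Split text into indexable terms, discarding stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


print(index_terms("The history of the Web"))  # ['history', 'of', 'web']
print(index_terms("The The"))                  # []: nothing left to look up
```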
To keep the failure of a single machine from corrupting the data and requiring a restart of the entire crawl, the war room team implemented checkpointing, which saved the state of the crawl so that if things blew up they could go back to the last checkpoint instead of starting over from the beginning.
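Checkpointing can be sketched in a few dozen lines. This is a hypothetical single-process Python illustration, not the war room's code: the JSON checkpoint file, the `checkpoint_every` interval, and the `fetch_links` callback (standing in for fetching a page and extracting its links) are all assumptions. The crawl state is the frontier of URLs still to visit plus the set already done, and it is written atomically so that a crash during the write cannot destroy the previous checkpoint.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: a crash mid-write leaves the old checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on POSIX systems


def load_checkpoint(path, seeds):
    """Resume from the last checkpoint, or start fresh from the seed URLs."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"frontier": list(seeds), "done": []}


def crawl(path, seeds, fetch_links, max_pages, checkpoint_every=100):
    state = load_checkpoint(path, seeds)
    frontier = state["frontier"]
    done = set(state["done"])
    since_checkpoint = 0
    while frontier and len(done) < max_pages:
        url = frontier.pop(0)
        if url in done:
            continue
        for link in fetch_links(url):  # may raise if a machine fails
            if link not in done:
                frontier.append(link)
        done.add(url)
        since_checkpoint += 1
        if since_checkpoint >= checkpoint_every:
            save_checkpoint(path, {"frontier": frontier, "done": sorted(done)})
            since_checkpoint = 0
    save_checkpoint(path, {"frontier": frontier, "done": sorted(done)})
    return done
```

If `fetch_links` raises partway through, calling `crawl` again with the same path resumes from the last saved frontier; only the pages processed since that checkpoint are fetched again, not the whole crawl.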
With the hardware on its way to the data centers and the crawler, the indexer, the ranker, and the serving side progressing, only one issue remained. Yahoo wanted its search results to appear current, so it insisted that at least part of its index be updated on a daily basis.
Think of a card shark at a blackjack table. She carefully arranges the cards to ensure that everyone gets a good hand, but not as good as hers. She starts dealing around the table. Now imagine her trying to add new cards to the deck in her hand as she deals, improving all the results, including her own. It was that kind of problem.
Google's PageRank algorithm required a full day and a half to score an index. Adding additional information every twenty-four hours meant the pageranker would have to run faster, while integrating the new data in all the appropriate places. "It is a much harder problem to update an index every day than it is to have a static index," Jeff explained. "There are many more moving pieces to deal with."
Jeff was maxed out. Sanjay was overloaded. Ben Gomes had a full plate. Developing an incremental indexing system could take a dedicated team of programmers years, and there were only weeks before the contract went into effect. Larry and Sergey, understanding the desperate need, threw the resource floodgates open and gave Urs carte blanche to do what was needed. Never one to waste an opportunity, Urs went all out. He hired a guy.
"I had no experience with crawls," Anurag Acharya recalls, "and Google didn't tell people what they would be working on." Urs had sung his siren song at perfect pitch and persuaded his former UC Santa Barbara colleague to abandon academia for Silicon Valley.
On his first day, Anurag focused on part of the indexing system. That same evening, Urs stopped by for a chat about his next assignment.
"I'll take a look at the logs," Anurag suggested, "and see what problems there might be."
"Why don't you do incremental indexing for a while," Urs casually replied, "and then we'll see?"
"I say 'Yeah,'" Anurag told me about that conversation, "like I know what doing incremental indexing really means. So there went the next five months."
Google didn't haze newbies, but Anurag must have felt as if he'd been led blindfolded into a room full of drunken frat boys with wooden paddles. He was hit with the complex issues of how to crawl additional sites, rank them appropriately, and then integrate them seamlessly into the existing index.
"I don't think I was brought in specifically for the index," he said. "It just happened. I showed up at that point, and at that point, those were the problems."
"Anurag started and a couple of us in the company knew him," said Ben Smith, who had been Anurag's student at UCSB, "and he basically just disappeared. He wouldn't come down to lunch. He was always in his office. He was there late for two months. What is up with this guy? And then Urs called me into his office and said, 'This is what's coming. Soon. Can you help him out?' Okay. Now I understood why."
Smith knew exactly what he was getting into. The first time Urs had asked him to take on incremental indexing had been almost a year earlier, on his first day as an intern at Google. Smith had refused. "I said," he told me with a laugh, "'That's way too big for a summer project, nobody really knows how to do that. I don't wanna tackle that.'" Now he and Anurag would have to figure it out in a matter of weeks.
Smith had already sped up Google's response rate by improving the search engine's ability to cache queries. The first time someone searched for "hotels in Madrid," Google searched the entire index, then stored the query and the results it had found. The next time someone searched for "hotels in Madrid," Smith's code delivered the same results from memory, without having to search the index. Instead of accessing hundreds of machines, a cached query used only one—an enormous reduction in the cost of search. Unfortunately for Smith, the new incremental index threatened to undo his work, because a continuously refreshed index would quickly make cached queries obsolete.
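Smith's cache can be sketched as a least-recently-used map from query to results. In this Python illustration, the capacity, the query normalization (lowercasing and collapsing spaces), and the decision to clear every entry when the index changes are illustrative assumptions, not details from the text. The `invalidate_all` method shows why a continuously refreshed index undercuts caching: each refresh sends the next lookup back to the expensive search.

```python
from collections import OrderedDict


class QueryCache:
    """Minimal LRU result cache (a hypothetical sketch, not Google's design)."""

    def __init__(self, search_fn, capacity=1000):
        self.search_fn = search_fn      # the expensive, many-machine search
        self.capacity = capacity
        self.entries = OrderedDict()    # query -> results, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, query):
        key = " ".join(query.lower().split())  # normalize case and spacing
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)       # mark as most recently used
            return self.entries[key]
        self.misses += 1
        results = self.search_fn(key)
        self.entries[key] = results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        return results

    def invalidate_all(self):
        """Drop everything: after an index update, cached results may be stale."""
        self.entries.clear()


calls = []

def expensive_search(query):
    calls.append(query)  # stands in for fanning out to hundreds of machines
    return [f"result for {query}"]

cache = QueryCache(expensive_search)
cache.lookup("hotels in Madrid")
cache.lookup("Hotels in  Madrid")   # served from memory
print(cache.hits, cache.misses)     # 1 1
```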
"Anurag cranked maybe six to eight weeks and he had something that kinda worked," recalls Smith. "He wrote a new server called 'the mixer,' which hid the fact that we were talking to two different indices [a daily index and the main index] and mixed them together."
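The text says only that the mixer queried the daily index and the main index and merged their answers so the rest of the system saw a single index. A minimal Python sketch of that merging step, assuming each index returns a list of (score, url) pairs and that the fresher daily entry wins when a URL appears in both; the score format, the tie-breaking policy, and the name `mix` are assumptions.

```python
import heapq


def mix(main_results, daily_results, k=10):
    """Merge ranked results from the main and daily indices into one top-k list.

    Each argument is a list of (score, url) pairs. When a URL appears in both,
    the daily (fresher) entry replaces the main one, which is an assumed policy.
    """
    best = {url: score for score, url in main_results}
    best.update({url: score for score, url in daily_results})  # daily wins
    top = heapq.nlargest(k, best.items(), key=lambda item: item[1])
    return [url for url, _ in top]


main = [(0.9, "a.example/old"), (0.5, "b.example")]
daily = [(0.7, "b.example"), (0.6, "c.example/news")]
print(mix(main, daily, k=3))  # ['a.example/old', 'b.example', 'c.example/news']
```

Callers see one ranked list and never learn which index a result came from, which is the property Smith describes.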
*
"Anurag and I were very stressed," Smith went on. "For whatever reason, we had to keep it quiet." They couldn't talk about what they were doing or why they were in the office every night after even the vampire coders had gone home. "Many, many days, we'd leave somewhere between three and five a.m. That was the time when Anurag and I could try to plug in our new system, because that was when Google had the least amount of search traffic. There were a lot of days where it was, 'Let's turn it on and see how it works,' because we didn't really know. The mixer would talk to the cache and the mixer would talk to the incremental. And sometimes the mixer would melt down and sometimes the incremental would melt down, because it didn't have enough capacity, and we'd say, 'Okay. Why? What happened and how do you fix it?'"
The hours and the stress shaved tolerances among the engineers until little remained to insulate their frustrations from the friction of the outside world.
"For a large fraction of my career here," Smith explained to me, "I worked on infrastructure or on the serving side. Larry seemed much more interested in the product aspect of things. He wasn't interested in the infrastructure side of the Yahoo deal—he didn't even know what was going on regarding it. I remember one time he wandered into my office and made some crack like 'You need to relax more,' and I just chewed him out."
Because the 1B index devoured almost all the available machines, only a few hundred remained for the incremental team to use. Even if ops could have built them faster, there were no data centers in which to put them. The team struggled on as the last days of May passed and July loomed over the horizon like the Imperial Death Star.
The final deadline was a week away.
The machines were built, the data centers filled. The crawler had worked. The indexer had worked. The pageranker had worked. Google had identified a billion URLs and now could search them. We had the superior technology. The Yahoo deal proved we had the business smarts to go with it. It was time to take our light from under its bushel and show it to the world.
At 2:59 a.m. on Monday, June 26, 2000, Cindy sat in her office, her fingers poised on the keyboard, waiting to hit Send. On her screen was a press release announcing that Google was now the largest search engine on the planet. A minute later, just in time to feed the gaping morning news maw on the East Coast, the message was on its way. Cindy gave the business and technology editors an hour to digest that tantalizing morsel, then served the pièce de résistance: a brief announcement that Google had signed a contract to replace Inktomi as the search technology provider for Yahoo. It was the biggest accomplishment in our company's short life.
The experts were underwhelmed.
"Analysts agreed that the announcement may have hurt Inktomi's pride," CNET reported, "but they said the implications for its revenues and profitability are mild ... That side of its business is a money loser that has increasingly played second fiddle to its exploding networking-services division. The search market in general, meanwhile, remains a low-margin, commodity business ... Dick Pierce, Inktomi's chief operating officer, said ... losing the portal as a search licensing partner ... will have 'little impact with respect to profitability.'"
Wall Street didn't buy the expert view. In fact, it sold heavily. By the end of the day, Inktomi's share price had fallen eighteen percent. This despite the fact that Yahoo had thrown Inktomi a bone, naming them a "corporate search" partner for an initiative launched the same day—because everyone knew the real money in search was on the corporate side.
With impeccable timing, I had planned my first vacation to coincide with the most momentous week in Google's history. Sunday night I had trouble falling asleep in our Lake Tahoe hotel, and on Monday I was up early flipping through the cable channels looking for news about the blockbuster Yahoogle deal as my family snuggled under their blankets. Much to my surprise, it wasn't the lead story on any of the major networks and, unbelievably, it didn't make headlines in the Tuesday papers. The San Francisco Chronicle had a brief mention in the business section and the Mercury News had slightly more, yet even that thin coverage signified that things had changed. Up to that point, the mainstream media had portrayed Google as another quirky startup and California cultural oddity, with an emphasis on the wacky ways of western entrepreneurs. Now, however, Google was a business-section item, suggesting that the company should be taken seriously as a corporate entity.