Authors: Douglas Edwards
"You're the only neurosurgeon around," replied Schwim, assessing the situation with both concern and an engineer's practicality. "Do you think you can fix yourself?"
The facilities team, for reasons known only to them, had unscrewed the support for the crossbeam while removing the wall. In Jim's professional medical opinion, the beam would have done him considerably more damage if it had landed a couple of inches to either side. "They were very, very, very kind to us after that," he said about the Exodus crew.
Jim finally found some unoccupied space in a corner of the building and Exodus agreed to throw fence walls around it. He spent the better part of June and July installing two thousand brand-new computers into the cage. The machines didn't always work. They were built quickly and with parts purchased at very reasonable prices. "For some racks," Jim recalls, "we got fifty-six out of eighty working, so we'd spend a week installing these machines and then another week repairing the ones that didn't install."
Eventually Larry let Jim in on why his work was so urgent. Google had signed a deal with Netscape to be their fall-through search engine. If Netscape users couldn't find what they wanted using Netscape's open directory, they would be able to search from the directory page using Google. So Google needed more computing power to handle the potential traffic.
Jim load-tested Google's capacity as he added machines—checking to see that it could handle the increased traffic and any occasional spikes that might occur. "In general," he told me, "you like to see two times capacity. For peaks, you like to see four times capacity. Netscape anticipated a one-point-seven-times increase over existing Google traffic, so I tested it. At nearly five times, we were completely in the clear."
It wasn't easy. Jim and Schwim were still at the data center installing machines the night of June 24. Netscape would announce the deal and start directing traffic to Google the morning of June 25. Schwim worked until two a.m., when the cumulative lack of sleep caught up with him and he went home to crash.
Fortunately, Jim had recruited another tech guy to help them over the finish line. Though it had been a while since Sergey had dirtied his hands installing machines, he stayed at the data center with Jim until five a.m. "He didn't know all the technical details of how the routing went," Jim remembered, "but he was in there crawling under the floorboards, running cables, and hooking up switches."
For Jim it was the culmination of weeks of exhausting physical labor, and when he finally dragged himself off to bed it was with a sense of accomplishment. Google had averted a potential disaster by tripling its capacity in record time.
An hour later, his phone rang. It was Sergey. "Get in here right away. We're melting down."
Netscape's press release had hit the newswires at six a.m. West Coast time. Within seconds Google's traffic had increased not the expected one-point-seven times, but sevenfold. The servers couldn't handle the load. Sergey and Jim rushed back to Exodus and began desperately throwing the last batch of machines they had into racks and hooking them up.
Meanwhile they did everything they could to clear away extraneous demands on Google's infrastructure. They stopped the crawler from adding websites to Google's index and reallocated those machines to serving results. It helped, but not enough. Response times had slowed perceptibly, and some users got no results at all. Google's most important launch to date teetered on the brink of becoming an epic pooch-screwing.
The atmosphere in the office Craig Silverstein shared with Amit Patel was grim as Larry and Sergey, Urs, and the rest of Google's engineers reviewed their options. Netscape was not a small partner like their first client, VMware. If this relationship went down the tubes, everyone would know and Google's tech reputation would be toast. They could think of only one way to increase capacity to handle Netscape's users.
"Shut off queries to Google.com," Larry instructed the team.
For the next couple of hours, anyone who went to Google.com saw a static page explaining that Google was down. Every computer and every bit of bandwidth Google had at its command was serving results to Netscape users. Larry and Sergey were risking their own site's reputation to maintain credibility as a reliable technology partner.
By lunchtime, traffic had subsided enough that Larry and Sergey gave the okay to turn Google.com back on. Schwim and Jim returned to Exodus to finish installing the last of the servers, and within four hours they had brought an additional three hundred machines online, ending the immediate crisis.
As Jim and Schwim left the controlled environment of the data center and headed out into the warm evening air, they received another call. Netscape's engineering team was at the Tied House Brewery in Mountain View, celebrating the partnership, and they wanted Google's tech team to join them.
"They threw us a great post-launch party," Jim remembers. "And the thing that came up over and over again was, 'I can't believe you guys shut down your own site just to serve our traffic.'" The Googlers in attendance noted well that their sacrifice had paid off handsomely. The deal with Netscape promised to blossom into a beautiful friendship. Google gained not only trust, but also access to a whole new set of data in Netscape's query stream—data we could analyze and compare with our own traffic. Most important, the company's first major crisis battle-hardened it. Larry and Sergey would never again underestimate the challenges of occupying new territory. Though it seemed epic at the time, the battle of Netscape would go down as a minor skirmish once Google fully engaged the major players in the war for search supremacy.
That day was coming.
A little after midnight one Saturday night in the fall of 1999, Jim's phone interrupted his sleep again. Again it was Sergey.
"The site's down. What's up?" he wanted to know.
"Not me," Jim replied with a yawn. "You woke me."
A circuit breaker at Exodus had flipped, taking down Google's main switch, an inexpensive little piece of Hewlett-Packard hardware through which all of Google's traffic flowed. Exodus had set up the switch before Google moved the first racks into its cage, and had done it in a hurry. The device had been placed on the floor under one of the racks and was cabled in such a way that it had to stay there. It was known to all the techs by the designation "Switch on the ground." There was no backup, and when it crashed Google went offline until someone did something about it.
"Sergey had been at a party. He came home and noticed we were down," recalls Jim, who logged in, figured out the problem, and had Exodus turn the circuit on again. Google was offline for about half an hour.
"We should probably be monitoring our site, huh?" said Sergey when Jim called to let him know it was back up.
Jim spent the rest of Saturday night and Sunday morning writing a script to monitor Google. His script checked the site every five seconds to make sure it was operational and called a phone number if something went wrong. The next week everyone in operations got a pager.
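The script itself isn't reproduced here, but the idea is simple enough to sketch. What follows is a minimal illustration, not Jim's code: the names `site_is_up` and `monitor` are hypothetical, the alert is an arbitrary callback standing in for the phone call, and alerting only when the site goes from up to down (rather than on every failed check) is an assumption made to keep the pager quiet.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def site_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False


def monitor(
    check: Callable[[], bool],
    alert: Callable[[str], None],
    interval: float = 5.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll check() every `interval` seconds and call alert() each time
    the site goes from up to down. Returns the number of alerts sent.

    max_checks=None polls forever; a finite value is handy for testing.
    """
    alerts = 0
    was_up = True  # assume the site starts out healthy
    checks_done = 0
    while max_checks is None or checks_done < max_checks:
        up = check()
        if was_up and not up:
            alert("site is down")
            alerts += 1
        was_up = up
        checks_done += 1
        # Skip the final sleep when a finite number of checks was requested.
        if max_checks is None or checks_done < max_checks:
            sleep(interval)
    return alerts
```

Running it for real would look something like `monitor(lambda: site_is_up("https://www.google.com/"), print)`, with `print` swapped for whatever actually dials the phone.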
Google had gone dark for a second time, but no tempers flared and no heads rolled. "If Larry and Sergey were upset about anything," Jim told me, "it was, Why didn't any of us think of that? We're a bunch of bright people here and none of us even thought to monitor our own site."
The pager alert system created problems of its own. "Claus,"* a logs engineer, was one of the first to be hooked up, and he watched carefully as our traffic numbers kept redlining, threatening to crash the logs system. The logs were money (we billed advertisers on the basis of the data they contained), so he set up his own scripts to crunch the numbers and to call his pager when they were done. That happened about three times an hour, every hour, all day long. According to engineer Chad Lester, Claus "kept Google alive in the early days. He'd be sleeping at his desk in twenty-minute intervals between pages. One month he got a pager bill in the thousands of dollars."
Google renegotiated its pager service contract but never compromised on site reliability again. Google.com would stay online, no matter what.
During the spring of 2000, I didn't sense any great strain in the fabric of the company as I grew accustomed to its rhythms. The basic elements had coalesced: a physical plant, a core engineering team, finance and HR staff, and even marketing in support of a product for which the demand seemed insatiable. The coming months would be about holding on. A previously unmet need was rushing headlong toward the provider of a free solution—bucking our audience numbers higher and higher with each lunge forward. On May 8, 2000, Google's traffic topped eight million searches a day. Two weeks later, it was nine million. In theory, we could grow forever, but each bounding leap threatened to bring our ride to an abrupt and messy end because we couldn't add capacity fast enough.
The biggest jump lay just weeks ahead. No one spoke about it, but as I stood in line at the café, debating what I could actually eat from Charlie's Appalachian Day menu (pickled pigs feet, okra consommé, free-range pork rinds, moon pies with mayonnaise, and Twinkie cheez-dogs), it seemed there were more than the usual number of empty seats. The few engineers I did glimpse hurriedly filled their trays and headed back to their desks wearing stress and fatigue like battle-tattered hockey jerseys.
Rumors and whispers about a big hairy deal had been spreading over the cables and through the cubicles, but no one would confirm whose business we were attempting to capture.
Urs knew. He rode herd on his ops team to build capacity in the data-center cages as fast as humanly possible. We would need every server we could cobble together to feed the ravenous behemoth we hoped to contain there.
We were going after Yahoo.
Inktomi's contract to supply search results to Yahoo was up for renewal in June 2000, and Yahoo did not intend to extend the partnership, a fact they were hiding from the world at large. They wanted Google to provide the fall-through search on their site, just as we did for Netscape. If users couldn't find what they wanted in Yahoo's directory, they would use Google to search the web.
Why the shift? Inktomi saw portal search as an unprofitable sideline (they focused on providing search services for the internal networks of large enterprises), so they didn't feel the need to push themselves on Yahoo's behalf. That opened the door for Google. Larry and Sergey dug deep to offer favorable financial terms,* and it didn't hurt that the Stanford guys running Yahoo and the Stanford guys running Google had common ground, or that Omid Kordestani knew Udi Manber (the top search guy at Yahoo), or that Google and Yahoo shared a board member in Mike Moritz of Sequoia Capital, a communication channel that smoothed the progress of the deal.
Google also promised dramatic improvements in search quality. Google's technology had surpassed Inktomi's and would continue to do so, because Google did care, truly and deeply, about consumer search. Google, however, wasn't the only contender pushing for a seat at the table to eat Inktomi's lunch.
"FAST† was a scare for a while in early 2000," Urs admitted. "They came out with a large index and they were pretty fast. They were not bad quality-wise, but they had real trouble keeping their index fresh. Maybe they were trying to do too many things. By 2001 we felt we were clearly better than Inktomi results-wise, clearly better than AltaVista, clearly better than FAST. We had the best search engine."
And what about Google's comparative quality the year before, when Netscape had become a partner? "Netscape was kind of crazy to switch their search to us," Urs confessed. He believed they made the change "in part because they didn't care about search that much ... It was a cost center."
Not to mention that Omid Kordestani happened to be an excellent salesperson. "Omid could type in 'IBM' on Google and type in 'IBM' on AltaVista," Urs recalls, "and say 'Hey look, aren't our results better?' That was the level of sophistication. Our search was good, but our coverage was bad. You had all kinds of queries where we didn't have the page and AltaVista or Inktomi had it. People's expectations were just low."
That wasn't the case with Yahoo, where Udi Manber, a search specialist, was chief scientist.* "Udi wrote the contract," Urs told me, "so he paid attention to the important things." Manber would settle for nothing less than the best that Google could produce, which he knew to be more than what the search engine was offering at the time.
Larry and Sergey committed to Yahoo that Google would make numerous improvements in a matter of weeks: to set up two new data centers—including one on the East Coast—to freshen the index by crawling more frequently, to reduce spam in results, and to meet strict limits on latency, the time between the search being entered and the results being delivered. Each of these promises would require enormous effort to fulfill, but if all were met, Google would give Yahoo users a visibly better search experience.
Better, but not the best. "What was important," Urs confided to me, "was that if you syndicated something and you had your own property, you wanted to make sure that over time you could innovate and actually have something better than Yahoo search powered by Google. And that was Google search, unrestricted by commercial agreement."