Exclusive: How Google's Algorithm Rules the Web
`
`• By Steven Levy
• 02.22.10
• 12:00 pm
`
`http://www.wired.com/2010/02/ff_google_algorithm/all/
`
`5/16/2014
`
`EXHIBIT 2045
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00480
`
`
`
Exclusive: How Google's Algorithm Rules the Web | Magazine | WIRED
`
When it comes to finding stuff, there's Google - and there's everyone else.
Photo: Mauricio Alejo
`
`Want to know how Google is about to change your life? Stop by the Ouagadougou conference room on
`a Thursday morning. It is here, at the Mountain View, California, headquarters of the world's most
`powerful Internet company, that a room filled with three dozen engineers, product managers, and
`executives figure out how to make their search engine even smarter. This year, Google will introduce
`550 or so improvements to its fabled algorithm, and each will be determined at a gathering just like this
`one. The decisions made at the weekly Search Quality Launch Meeting will wind up affecting the results
you get when you use Google's search engine to look for anything - "Samsung SF-755p printer," "Ed
`Hardy MySpace layouts," or maybe even "capital Burkina Faso," which just happens to share its name
`with this conference room. Udi Manber, Google's head of search since 2006, leads the proceedings. One
`by one, potential modifications are introduced, along with the results of months of testing in various
`countries and multiple languages. A screen displays side-by-side results of sample queries before and
after the change. Following one example - a search for "guitar center wah-wah" - Manber cries out,
`"I did that search!"
`
`You might think that after a solid decade of search-market dominance, Google could relax. After all, it
`holds a commanding 65 percent market share and is still the only company whose name is synonymous
`with the verb search. But just as Google isn't ready to rest on its laurels, its competitors aren't ready to
`concede defeat. For years, the Silicon Valley monolith has used its mysterious, seemingly omniscient
`algorithm to, as its mission statement puts it, "organize the world's information." But over the past five
`years, a slew of companies have challenged Google's central premise: that a single search engine,
`through technological wizardry and constant refinement, can satisfy any possible query. Facebook
`launched an early attack with its implication that some people would rather get information from their
`friends than from an anonymous formula. Twitter's ability to parse its constant stream of updates
`introduced the concept of real-time search, a way of tapping into the latest chatter and conversation as it
unfolds. Yelp helps people find restaurants, dry cleaners, and baby sitters by crowdsourcing the ratings.
None of these upstarts individually presents much of a threat, but together they hint at a wide-open,
messier future of search - one that isn't dominated by a single engine but rather incorporates a grab bag
`of services.
`
`Still, the biggest threat to Google can be found 850 miles to the north: Bing. Microsoft's revamped and
`rebranded search engine - with a name that evokes discovery, a famous crooner, or Tony Soprano's
strip joint - launched last June to surprisingly upbeat reviews. (The Wall Street Journal called it "more
inviting than Google.") The new look, along with a $100 million ad campaign, helped boost Microsoft's
share of the US search market from 8 percent to about 11 - a number that will more than double once
regulators approve a deal to make Bing the search provider for Yahoo.
`
`Team Bing has been focusing on unique instances where Google's algorithms don't always satisfy. For
`example, while Google does a great job of searching the public Web, it doesn't have real-time access to
`the byzantine and constantly changing array of flight schedules and fares. So Microsoft purchased
Farecast - a Web site that tracks airline fares over time and uses the data to predict when ticket prices
will rise or fall - and incorporated its findings into Bing's results. Microsoft made similar acquisitions
`in the health, reference, and shopping sectors, areas where it felt Google's algorithm fell short.
`
`Even the Bingers confess that, when it comes to the simple task of taking a search term and returning
`relevant results, Google is still miles ahead. But they also think that if they can come up with a few areas
`where Bing excels, people will get used to tapping a different search engine for some kinds of queries.
`"The algorithm is extremely important in search, but it's not the only thing," says Brian MacDonald,
`Microsoft's VP of core search. "You buy a car for reasons beyond just the engine."
`
Google's response can be summed up in four words: mike siwek lawyer mi.
`
`Amit Singhal types that koan into his company's search box. Singhal, a gentle man in his forties, is a
`Google Fellow, an honorific bestowed upon him four years ago to reward his rewrite of the search
`engine in 2001. He jabs the Enter key. In a time span best measured in a hummingbird's wing-flaps, a
`page of links appears. The top result connects to a listing for an attorney named Michael Siwek in Grand
Rapids, Michigan. It's a fairly innocuous search - the kind that Google's servers handle billions of
times a day - but it is deceptively complicated. Type those same words into Bing, for instance, and the
`first result is a page about the NFL draft that includes safety Lawyer Milloy. Several pages into the
`results, there's no direct referral to Siwek.
`
`The comparison demonstrates the power, even intelligence, of Google's algorithm, honed over countless
iterations. It possesses the seemingly magical ability to interpret searchers' requests - no matter how
`awkward or misspelled. Google refers to that ability as search quality, and for years the company has
`closely guarded the process by which it delivers such accurate results. But now I am sitting with Singhal
`in the search giant's Building 43, where the core search team works, because Google has offered to give
`me an unprecedented look at just how it attains search quality. The subtext is clear: You may think the
`algorithm is little more than an engine, but wait until you get under the hood and see what this baby can
`really do.
`
Key Advances in Google Search

Google's search algorithm is a work in progress - constantly
tweaked and refined to return higher-quality results. Here are
some of the most significant additions and adaptations since the
dawn of PageRank. - Steven Levy
`
`Backrub
[September 1997]
`
`This search engine, which had run on Stanford's servers for almost two years, is renamed Google. Its
`breakthrough innovation: ranking searches based on the number and quality of incoming links.
`
`New algorithm
[August 2001]
`
`The search algorithm is completely revamped to incorporate additional ranking criteria more easily.
`
`Local connectivity analysis
`[February 2003]
`
`Google's first patent is granted for this feature, which gives more weight to links from authoritative
`sites.
`
`Fritz
`[Summer 2003]
`
`This initiative allows Google to update its index constantly, instead of in big batches.
`
`Personalized results
`[June 2005]
`
`Users can choose to let Google mine their own search behavior to provide individualized results.
`
`Bigdaddy
`[December 2005]
`
`Engine update allows for more-comprehensive Web crawling.
`
Universal search
`[May 2007]
`
`Building on Image Search, Google News, and Book Search, the new Universal Search allows users to
`get links to any medium on the same results page.
`
`Real-Time Search
`[December 2009]
`
`Displays results from Twitter and blogs as they are published.
`
The story of Google's algorithm begins with PageRank, the system invented in 1997 by cofounder
Larry Page while he was a grad student at Stanford. Page's now legendary insight was to rate pages
based on the number and importance of links that pointed to them - to use the collective intelligence of
the Web itself to determine which sites were most relevant. It was a simple and powerful concept, and
- as Google quickly became the most successful search engine on the Web - Page and cofounder
Sergey Brin credited PageRank as their company's fundamental innovation.
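
The link-counting idea can be illustrated with a minimal sketch of the publicly described power-iteration form of PageRank. The toy link graph, the damping factor of 0.85, and the function name are assumptions made for illustration; this is not Google's production code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by the number and importance of their incoming links.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web: a, b, and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)  # "c", the page with the most incoming links
```

Page "a" ends up ranked above "b" and "d" even though each has a single outgoing link, because its one incoming link comes from the most important page.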
`
`But that wasn't the whole story. "People hold on to PageRank because it's recognizable," Manber says.
`"But there were many other things that improved the relevancy." These involve the exploitation of
`certain signals, contextual clues that help the search engine rank the millions of possible results to any
`query, ensuring that the most useful ones float to the top.
`
`Web search is a multipart process. First, Google crawls the Web to collect the contents of every
`accessible site. This data is broken down into an index (organized by word, just like the index of a
`textbook), a way of finding any page based on its content. Every time a user types a query, the index is
`combed for relevant pages, returning a list that commonly numbers in the hundreds of thousands, or
`millions. The trickiest part, though, is the ranking process - determining which of those pages belong
`at the top of the list.
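
An index "organized by word" can be sketched as a small inverted index. The page names and text below are invented for illustration, and the matching rule (a page must contain every query word) is a simplifying assumption.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages.
pages = {
    "p1": "capital of Burkina Faso is Ouagadougou",
    "p2": "Burkina Faso travel guide",
    "p3": "guitar center wah-wah pedals",
}
index = build_index(pages)
matches = lookup(index, "capital Burkina Faso")  # {"p1"}
```

The lookup only produces the unordered candidate set; deciding which candidates go first is the separate ranking step.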
`
`That's where the contextual signals come in. All search engines incorporate them, but none has added as
`many or made use of them as skillfully as Google has. PageRank itself is a signal, an attribute of a Web
page (in this case, its importance relative to the rest of the Web) that can be used to help determine
relevance. Some of the signals now seem obvious. Early on, Google's algorithm gave special
consideration to the title on a Web page - clearly an important signal for determining relevance.
`Another key technique exploited anchor text, the words that make up the actual hyperlink connecting
`one page to another. As a result, "when you did a search, the right page would come up, even if the page
`didn't include the actual words you were searching for," says Scott Hassan, an early Google architect
`who worked with Page and Brin at Stanford. "That was pretty cool." Later signals included attributes
`like freshness (for certain queries, pages created more recently may be more valuable than older ones)
`and location (Google knows the rough geographic coordinates of searchers and favors local results). The
`search engine currently uses more than 200 signals to help rank its results.
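
The article does not say how the signals are combined. As a sketch only, the following assumes a simple weighted sum over three of the signals named above (title, anchor text, PageRank); the weights, URLs, and field names are hypothetical.

```python
def score(page, query_words, weights):
    """Combine a few signals into one relevance score (assumed weighted sum)."""
    query = {w.lower() for w in query_words}
    denom = max(len(query), 1)
    title_words = set(page["title"].lower().split())
    anchor_words = set(page["anchor_text"].lower().split())
    signals = {
        "title": len(query & title_words) / denom,    # query words in the page title
        "anchor": len(query & anchor_words) / denom,  # query words in links pointing here
        "pagerank": page["pagerank"],                 # link-based importance
    }
    return sum(weights[name] * value for name, value in signals.items())


def rank(pages, query, weights):
    """Order candidate pages from most to least relevant."""
    words = query.split()
    return sorted(pages, key=lambda p: score(p, words, weights), reverse=True)


# Hypothetical candidates and weights.
pages = [
    {"url": "c.example", "title": "Cooking tips", "anchor_text": "recipes", "pagerank": 0.6},
    {"url": "b.example", "title": "Wired fan blog", "anchor_text": "", "pagerank": 0.1},
    {"url": "a.example", "title": "Welcome", "anchor_text": "wired magazine", "pagerank": 0.5},
]
weights = {"title": 2.0, "anchor": 2.0, "pagerank": 1.0}
ordered = [p["url"] for p in rank(pages, "wired magazine", weights)]
# ['a.example', 'b.example', 'c.example']
```

Here a.example comes out on top even though neither query word appears on the page itself, which is the anchor-text effect Hassan describes.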
`
`Google's engineers have discovered that some of the most important signals can come from Google
`itself. PageRank has been celebrated as instituting a measure of populism into search engines: the
`democracy of millions of people deciding what to link to on the Web. But Singhal notes that the
engineers in Building 43 are exploiting another democracy - the hundreds of millions who search on
Google. The data people generate when they search - what results they click on, what words they
replace in the query when they're unsatisfied, how their queries match with their physical locations -
turns out to be an invaluable resource in discovering new signals and improving the relevance of results.
The most direct example of this process is what Google calls personalized search - a feature that uses
someone's search history and location as signals to determine what kind of results they'll find useful. [1]
`But more generally, Google has used its huge mass of collected data to bolster its algorithm with an
`amazingly deep knowledge base that helps interpret the complex intent of cryptic queries.
`
`Take, for instance, the way Google's engine learns which words are synonyms. "We discovered a nifty
`thing very early on," Singhal says. "People change words in their queries. So someone would say,
`'pictures of dogs,' and then they'd say, 'pictures of puppies.' So that told us that maybe 'dogs' and
`'puppies' were interchangeable. We also learned that when you boil water, it's hot water. We were
`relearning semantics from humans, and that was a great advance."
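
A minimal sketch of the idea Singhal describes: when a searcher retypes a query with exactly one word swapped, count the swapped pair as a synonym candidate. The session data, the one-word-difference rule, and the threshold are assumptions for illustration.

```python
from collections import Counter


def count_substitutions(sessions):
    """Count word pairs that searchers swap between consecutive queries."""
    counts = Counter()
    for queries in sessions:
        for before, after in zip(queries, queries[1:]):
            a, b = before.lower().split(), after.lower().split()
            if len(a) != len(b):
                continue  # only compare rewrites of the same length
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                counts[frozenset(diffs[0])] += 1  # order of the swap is ignored
    return counts


def synonym_candidates(counts, min_count=2):
    """Keep pairs that were swapped at least min_count times."""
    return {tuple(sorted(pair)) for pair, c in counts.items() if c >= min_count}


# Hypothetical search sessions (consecutive queries by one user).
sessions = [
    ["pictures of dogs", "pictures of puppies"],
    ["cute puppies", "cute dogs"],
    ["boil water", "hot water"],
    ["dogs", "weather today"],
]
counts = count_substitutions(sessions)
candidates = synonym_candidates(counts)  # {("dogs", "puppies")}
```

Pairs learned this way carry no notion of context, so each one is only a candidate until something else confirms it.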
`
`But there were obstacles. Google's synonym system understood that a dog was similar to a puppy and
`that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The
`problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein's theories
`about how words are defined by context. As Google crawled and archived billions of documents and
`Web pages, it analyzed what words were close to each other. "Hot dog" would be found in searches that
`also contained "bread" and "mustard" and "baseball games" - not poached pooches. That helped the
algorithm understand what "hot dog" - and millions of other terms - meant. "Today, if you type
`'Gandhi bio,' we know that bio means biography," Singhal says. "And if you type 'bio warfare,' it
`means biological."
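
The "what words were close to each other" analysis can be sketched as a count of the words that appear within a few positions of a phrase. The documents, the window size, and the whitespace-only tokenization are illustrative assumptions.

```python
from collections import Counter


def context_words(documents, phrase, window=3):
    """Count the words that appear within `window` words of a phrase."""
    target = phrase.lower().split()
    k = len(target)
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        for i in range(len(words) - k + 1):
            if words[i:i + k] == target:
                left = words[max(0, i - window):i]
                right = words[i + k:i + k + window]
                counts.update(left + right)
    return counts


# Hypothetical crawled text.
documents = [
    "grilled hot dog with mustard on bread",
    "a hot dog and mustard at baseball games",
    "the puppy drank water",
]
neighbors = context_words(documents, "hot dog")
# "mustard" appears near "hot dog" twice; "puppy" never does.
```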
`
`Throughout its history, Google has devised ways of adding more signals, all without disrupting its users'
core experience. Every couple of years there's a major change in the system - sort of equivalent to a
new version of Windows - that's a big deal in Mountain View but not discussed publicly. "Our job is
`to basically change the engines on a plane that is flying at 1,000 kilometers an hour, 30,000 feet above
`Earth," Singhal says. In 2001, to accommodate the rapid growth of the Web, Singhal essentially revised
`Page and Brin's original algorithm completely, enabling the system to incorporate new signals quickly.
`(One of the first signals on the new system distinguished between commercial and noncommercial
`pages, providing better results for searchers who want to shop.) That same year, an engineer named
`Krishna Bharat, figuring that links from recognized authorities should carry more weight, devised a
`powerful signal that confers extra credibility to references from experts' sites. (It would become
`Google's first patent.) The most recent major change, codenamed Caffeine, revamped the entire
`indexing system to make it even easier for engineers to add signals.
`
`Google is famously creative at encouraging these breakthroughs; every year, it holds an internal demo
fair called CSI - Crazy Search Ideas - in an attempt to spark offbeat but productive approaches. But
`for the most part, the improvement process is a relentless slog, grinding through bad results to determine
`what isn't working. One unsuccessful search became a legend: Sometime in 2001, Singhal learned of
`poor results when people typed the name "audrey fino" into the search box. Google kept returning
Italian sites praising Audrey Hepburn. (Fino means fine in Italian.) "We realized that this is actually a
`person's name," Singhal says. "But we didn't have the smarts in the system."
`
`The Audrey Fino failure led Singhal on a multiyear quest to improve the way the system deals with
names - which account for 8 percent of all searches. To crack it, he had to master the black art of
"bi-gram breakage" - that is, separating multiple words into discrete units. For instance, "new york"
represents two words that go together (a bi-gram). But so would the three words in "new york times,"
`which clearly indicate a different kind of search. And everything changes when the query is "new york
`times square." Humans can make these distinctions instantly, but Google does not have a Brazil-like
`back room with hundreds of thousands of cubicle jockeys. It relies on algorithms.
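
One way to picture the segmentation problem is as a search for the highest-scoring way to split a query into known phrases. The sketch below uses dynamic programming over a hypothetical table of phrase scores; the article does not describe Google's actual method, and the scores are invented.

```python
def segment(query, phrase_scores):
    """Split a query into the highest-scoring sequence of phrases and single words."""
    words = query.lower().split()
    n = len(words)
    # best[i] holds (score, segmentation) for the first i words.
    best = [(0.0, [])] + [None] * n
    for end in range(1, n + 1):
        candidates = []
        for start in range(end):
            chunk = " ".join(words[start:end])
            if end - start == 1:
                gain = 0.0  # a lone word is always allowed but earns nothing
            elif chunk in phrase_scores:
                gain = phrase_scores[chunk]
            else:
                continue  # unknown multiword chunk: not a valid unit
            prev_score, prev_segments = best[start]
            candidates.append((prev_score + gain, prev_segments + [chunk]))
        best[end] = max(candidates, key=lambda c: c[0])
    return best[n][1]


# Hypothetical scores, e.g. derived from how often each phrase is searched.
phrase_scores = {"new york": 5.0, "new york times": 8.0, "times square": 6.0}

segment("new york", phrase_scores)               # ['new york']
segment("new york times", phrase_scores)         # ['new york times']
segment("new york times square", phrase_scores)  # ['new york', 'times square']
```

Adding one word flips the best split, which is why a greedy longest-match rule would not be enough here.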
`
Voila - when a hot dog is not a boiling puppy.
Photo: Mauricio Alejo

`The Mike Siwek query illustrates how Google accomplishes this. When Singhal types in a command to
`expose a layer of code underneath each search result, it's clear which signals determine the selection of
the top links: a bi-gram connection to figure it's a name; a synonym; a geographic location.
`"Deconstruct this query from an engineer's point of view," Singhal explains. "We say, 'Aha! We can
`break this here!' We figure that lawyer is not a last name and Siwek is not a middle name. And by the
`way, lawyer is not a town in Michigan. A lawyer is an attorney."
`
`This is the hard-won realization from inside the Google search engine, culled from the data generated by
`billions of searches: a rock is a rock. It's also a stone, and it could be a boulder. Spell it "rokc" and it's
`still a rock. But put "little" in front of it and it's the capital of Arkansas. Which is not an ark. Unless
`Noah is around. "The holy grail of search is to understand what the user wants," Singhal says. "Then
`you are not matching words; you are actually trying to match meaning."
`
And Google keeps improving. Recently, search engineer Maureen Heymans discovered a problem with
"Cindy Louise Greenslade." The algorithm figured out that it should look for a person - in this case a
psychologist in Garden Grove, California - but it failed to place Greenslade's homepage in the top 10
`results. Heymans found that, in essence, Google had downgraded the relevance of her homepage
`because Greenslade used only her middle initial, not her full middle name as in the query. "We needed
`to be smarter than that," Heymans says. So she added a signal that looks for middle initials. Now
`Greenslade's homepage is the fifth result.
`
`At any moment, dozens of these changes are going through a well-oiled testing process. Google employs
`hundreds of people around the world to sit at their home computer and judge results for various queries,
`marking whether the tweaks return better or worse results than before. But Google also has a larger army
of testers - its billions of users, virtually all of whom are unwittingly participating in its constant
`quality experiments. Every time engineers want to test a tweak, they run the new algorithm on a tiny
`percentage of random users, letting the rest of the site's searchers serve as a massive control group.
`There are so many changes to measure that Google has discarded the traditional scientific nostrum that
`only one experiment should be conducted at a time. "On most Google queries, you're actually in
`multiple control or experimental groups simultaneously," says search quality engineer Patrick Riley.
`Then he corrects himself. "Essentially," he says, "all the queries are involved in some test." In other
`words, just about every time you search on Google, you're a lab rat.
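
Running many experiments at once can be sketched by hashing each user separately for each experiment, so that one searcher may be a test subject in one experiment and a control in another. The experiment names, percentages, and hashing scheme are assumptions, not a description of Google's system.

```python
import hashlib


def bucket(user_id, experiment, treatment_percent):
    """Deterministically place a user in 'treatment' or 'control' for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    slot = int(digest, 16) % 10000  # 0..9999, i.e. hundredths of a percent
    return "treatment" if slot < treatment_percent * 100 else "control"


def assignments(user_id, experiments):
    """Return the user's group in every running experiment."""
    return {name: bucket(user_id, name, pct) for name, pct in experiments.items()}


# Hypothetical experiments: name -> percent of users who see the tweak.
experiments = {"middle_initial_signal": 1.0, "latest_results_box": 5.0}
groups = assignments("user-42", experiments)
```

Because the experiment name is part of the hash input, group membership in one experiment is effectively independent of membership in another, and the same user always lands in the same group.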
`
This flexibility - the ability to add signals, tweak the underlying code, and instantly test the results -
is why Googlers say they can withstand any competition from Bing or Twitter or Facebook. Indeed, in
the last six months Google has made more than 200 improvements, some of which seem to mimic -
even outdo - the offerings of its competitors. (Google says this is just a coincidence and points out that
`it has been adding features routinely for years.) One is real-time search, eagerly awaited since Page
`opined some months ago that Google should be scanning the entire Web every second. When someone
`queries a subject of current interest, among the 10 blue links Google now puts a "latest results" box: a
`scrolling set of just-produced posts from news sources, blogs, or tweets. Once again, Google uses
signals to ensure that only the most relevant tweets find their way into the real-time stream. "We look at
`what's retweeted, how many people follow the person, and whether the tweet is organic or a bot,"
`Singhal says. "We know how to do this, because we've been doing it for a decade."
`
`Along with real-time search, Google has introduced other new features, including a service called
`Goggles, which treats images captured by users' phones as search queries. It's all part of the company's
`relentless march toward search becoming an always-on, ubiquitous presence. With a camera and voice
`recognition, a smartphone becomes eyes and ears. If the right signals are found, anything can be query
`fodder.
`
`Google's massive computing power and bandwidth give the company an undeniable edge. Some
`observers say it's an advantage that essentially prohibits startups from trying to compete. But Manber
`says it's not infrastructure alone that makes Google the leader: "The very, very, very key ingredient in
`all of this is that we hired the right people."
`
By all standards, Qi Lu qualifies as one of those people. "I have the highest regard for him," says
Manber, who worked with the 48-year-old computer scientist at Yahoo. But Lu joined Microsoft early
last year to lead the Bing team. When asked about his mission, Lu, a diminutive man dressed in jeans
and a Bing T-shirt, pauses, then softly recites a measured reply: "It's extremely important to keep in
mind that this is a long-term journey." He has the same I'm-not-going-away look in his eye that Uma
`Thurman has in Kill Bill.
`
`Indeed, the company that won last decade's browser war has a best-served-cold approach to search, an
eerie certainty that at some point, people are going to want more than what Google's algorithm can
`provide. "If we don't have a paradigm shift, it's going to be very, very difficult to compete with the
`current winners," says Harry Shum, Microsoft's head of core search development. "But our view is that
`there will be a paradigm shift."
`
`Still, even if there is such a shift, Google's algorithms will probably be able to incorporate that, too.
`That's why Google is such a fearsome competitor; it has built a machine nimble enough to absorb
almost any approach that threatens it - all while returning high-quality results that its competitors can't
`match. Anyone can come up with a new way to buy plane tickets. But only Google knows how to find
`Mike Siwek.
`
Senior writer Steven Levy (steven_levy@wired.com) wrote about Twitter in issue 17.11.
`
1. Correction appended [February 25]: Google's personalized search uses someone's
search history and location to determine what kind of results they'll find useful. It does
not require them to opt-in or be logged in, as previously reported.