0:00
`0:10 MALE SPEAKER: Good afternoon, my dear
`0:12 colleagues, dear friends.
`0:14 It's my real privilege and pleasure to welcome here in
`0:18 the Czech Technical University Mr. Douglas Merrill, who is
`0:24 currently a vice president of Google.
`0:27 You may see that Google is just rolling over the Czech
`0:31 Technical University, because we have got an excellent
`0:34 opportunity in April when Vinton Cerf was here and he
`0:39 was speaking about the internet on Mars, while
`0:43 Douglas is going to contribute on the
`0:46 search side of the Universe.
`0:49 So it means his lecture is about the search
`0:53 possibilities, and I think he is going to be quite excellent
`0:57 in this field, because he's not far away from your age who
`1:01 are sitting over here, and he's of your mentality.
`1:06 And I think the main point is, please, this lecture will be
`1:11 around 45 minutes.
`1:13 Then, of course, it is expected at least hundreds of
`1:17 questions rising.
`1:19 Please write them down on the paper to smooth the process,
`1:23 and send them to the girls who will be going down and up.
`1:27 So please do it in this way.
`1:30 And as well, if there is some presence paper, please sign
`1:36 it, because it is fine to know who is really interested in
`1:39 such a field.
`1:40 And in fact, I'm not really to be upon your time, Douglas.
`1:44 It's your floor, and possibly even your microphone.
`1:48 You have got it.
`1:50 So the floor is yours.
`1:52 DOUGLAS MERRILL: Thank you very much.
`1:53
`2:02 Hi, thanks for coming.
`2:03 It's a great honor to get to come to talk to a university
`2:10 that's 300 years old about a little tiny company that was
`2:13 founded eight years ago, nine years ago by two crazy
`2:18 graduate students.
`2:19 And Stanford University, where Larry and Sergey were
`2:25 students, has a bunch of classrooms that
`2:27 look just like this.
`2:29 And so I guess my deepest hope is that the next Larry and
`2:33 Sergey are sitting in the audience right now, and will
`2:35 be inspired by something stupid that I say during this
`2:38 lecture to go out and prove me wrong.
`2:40 So that's my challenge to all of you.
`2:42 Find what I say that's wrong and fix it.
`
`EXHIBIT 2078
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00480
`2:45 Thank you so much.
`2:46 My name is Douglas Merrill.
`2:47 I'm a Vice President of Engineering at Google.
`2:49 So just for those of you who are in the front, I recommend
`2:51 that you loosen up your neck a little bit, just relax.
`2:54 I pace a lot.
`2:56 And so down here, you guys, you're going to
`2:58 get a little seasick.
`2:59 It's OK.
`3:00 If you feel seasick, close your eyes and
`3:02 breathe, it's OK.
`3:03 Up there, you guys are going to forget who I look like.
`3:04 It's all fine.
`3:05 You cannot see me up there anyway, so it's irrelevant.
`3:09 This is Alex.
`3:10 Alex is right now having a nightmare that he has a test.
`3:14 Do you guys all have that nightmare where you're in the
`3:16 front of class, and you have a test,
`3:17 and you haven't prepared?
`3:18 Alex is unprepared.
`3:19 Next slide.
`3:20
`3:24 Well done, sir.
`3:26 So Larry and Sergey met at Stanford in the computer
`3:29 science school.
`3:31 They were both students in an information theory class in
`3:36 about 1998.
`3:37 And they didn't like each other.
`3:40 Larry thought that Sergey was argumentative, and Sergey
`3:45 thought that Larry was arrogant.
`3:47 They were probably both right.
`3:50 However in their class project, they came up with the
`3:55 idea to try and apply some basic principles of
`4:00 information theory to unstructured web search.
`4:03 Now, it's 1998.
`4:04 Keep in mind at the time, web search is a solved problem.
`4:09 Everybody knows how to do search.
`4:12 There's no questions left to be worked on.
`4:15 So these guys said, oh, but wait, there is.
`4:18 And they're really kind of interesting questions.
`4:20 And they set themselves a goal to organize all the world's
`4:23 information and make it universally
`4:24 accessible and useful.
`4:27 That's a kind of a small goal.
`4:29 These guys didn't shoot high.
`4:30 All the world's information, universally
`4:32 accessible and useful.
`4:34 What I want to talk about today is a little bit of what
`4:36 the web looked like in 1998, before most of you were born,
`4:43 and what it looks like today, and what we think it's going
`4:48 to look like in the next 10 years.
`4:49 Next slide.
`4:50
`4:53 In 1998, there were a couple of dominant search engines.
`4:58 Neither one of them exists anymore, don't worry about
`5:00 their names.
`5:03 And they knew how to do search.
`5:05 Here's what they did.
`5:07 They had these people who sat in these big rooms, kind of
`5:09 like this, with computers in front of them, kind of like
`5:11 all of you.
`5:13 And they were surfing the web-- kind of like all of you.
`5:16
`5:19 And what they would do is they would find a page and they
`5:21 would read the page.
`5:22 And they would say, oh, you know what?
`5:23 This particular page is about soccer.
`5:26 I'm American, I know you guys call the game something else.
`5:29 Sorry.
`5:31 And they would have this little toolbar that they would
`5:33 pull down and they label "soccer." And then that page
`5:35 would be indexed.
`5:37 And they knew that this was going to work.
`5:41 They were wrong.
`5:44 They were wrong because the world changes too fast. On
`5:50 average, 10% of the web changes every month.
`5:56 Here's the interactive portion, boys and girls.
`5:58 If 10% changes every month, 10 times 12,
`6:02 carry the one, right.
`6:04 Likely everything changes every year, which means that
`6:07 these poor horrible people with this awful job of surfing
`6:11 the web and marking down each page have to look at every
`6:13 page every year.
`6:15 Additionally, the web is doubling at this point in
`6:17 history about every four or five months.
`6:21 So twice a year or so you've doubled the size, everything
`6:25 that's already existed has changed at least once, and
`6:29 keep in mind that it turns out that the web's
`6:32 not entirely in English.
`6:35 Who knew?
`6:36
`6:38 So now you have to have rooms full of people who speak all
`6:41 these languages.
`6:42 Not a scalable model.
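The back-of-envelope numbers in this passage (10% of pages changing each month, the web doubling every four or five months) can be checked with a short sketch. It assumes each page changes independently with the same monthly probability, which the talk does not state; the function names are illustrative only. Under that assumption roughly 72% of pages change at least once in a year, so "everything changes every year" is a rounded-up joke, but the scaling conclusion is the same.

```python
def fraction_changed(monthly_rate: float, months: int) -> float:
    """Fraction of pages that changed at least once after `months`,
    assuming (hypothetically) independent changes at a fixed monthly rate."""
    return 1.0 - (1.0 - monthly_rate) ** months


def growth_factor(months: float, doubling_months: float) -> float:
    """How many times larger the web is after `months`,
    given that it doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Share of last year's pages a human labeler would have to revisit.
    print(f"changed within a year: {fraction_changed(0.10, 12):.1%}")
    # Growth over one year for the two doubling periods mentioned in the talk.
    for period in (4, 5):
        print(f"doubling every {period} months: x{growth_factor(12, period):.1f} per year")
```

Either way, a room of human labelers has to cover both the revisits and the growth, which is why the manual model breaks down.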
`6:44 Next slide, please.
`6:47 Really not scalable today.
`6:49
`6:52 So this slide is more shocking to Americans, because it turns
`6:55 out that Americans think that no one else
`6:56 in the world exists.
`6:59 You guys have all heard the joke, right?
`7:00 If you speak three languages, you're trilingual, if you
`7:02 speak two, you're bilingual, if you
`7:03 speak one, you're American.
`7:05 [LAUGHTER]
`7:13 It turns out most of us aren't American.
`7:17 So the approach used by the search engines in 1998 would
`7:20 not have gotten us today.
`7:21 They would not have gotten us here.
`7:23 What got us here was an insight that Larry--
`7:27 mostly Larry had, but Larry and Sergey had together--
`7:29 called Page Rank.
`7:32 So what does Page Rank do?
`7:34 Page Rank allows you to figure out whether a particular web
`7:39 page is interesting or not.
`7:40 That makes sense.
`7:41 Is this particular page useful?
`7:43 So it's called Page Rank.
`7:45 Obviously it's named because you're ranking web pages.
`7:50 No, Larry named it after himself.
`7:52 Larry's last name is Page.
`7:54 What is today's lesson, boys and girls?
`7:56 Computer scientists are not funny.
`7:59 Next slide, please.
`8:01
`8:09 In a second, I'm going to talk about how the
`8:11 stuff actually works.
`8:13 I'm hoping that's more interesting to you.
`8:14 But first, I want to pull back a little bit, and I want to
`8:17 talk about the context.
`8:19 So I mentioned that web search is about more than web pages
`8:23 in English.
`8:26 It matters a lot to actually understand the context within
`8:31 which you are working.
`8:33 So for example, if you do a search for BMW on google.cz,
`8:39 you ought to get different results than if you do a
`8:42 search for BMW on google.com.
`8:45 And indeed you will.
`8:46 We'll recognize that probably you want to go to
`8:48 the .cz site instead.
`8:50 Part of our ranking signals are more
`8:51 than just Page Rank.
`8:52 It also is about the context from which you come.
`8:54 Next slide, please.
`8:56
`8:58 Google publishes--
`8:59 [LAUGHTER]
`9:01 I'll give you a second to enjoy the list. The previous
`9:13 slide was called "Being Local Matters." It turns out it only
`9:19 matters in certain regards.
`9:21 So we publish a list called the Zeitgeist. The Zeitgeist
`9:24 captures the most actively growing and most popular
`9:28 queries, and we do it by country and by language and a
`9:31 bunch of things.
`9:32 And there's a couple of truths.
`9:34 Apparently, they are universal.
`9:36 The most popular search in every country
`9:38 is a beautiful woman.
`9:39
`9:41 And apparently game shows and television are also pretty
`9:46 popular to everyone.
`9:47 Prison Break, for those of you who don't know, is a really
`9:49 bad American television show.
`9:50 So fundamentally, it matters, if you're going to do search
`9:54 right, you need to understand that the web is growing too
`9:58 fast, it's changing too fast, and it's not all in English.
`10:04 So the lesson that I want to talk about is,
`10:06 how do we do that?
`10:07 And I'm hoping, again, to reiterate that you guys-- one
`10:11 of you, or two of you, or 10 of you-- are going to hear
`10:13 something that I get wrong.
`10:15 And you're going to say, hey, I have a better
`10:16 idea and go try it.
`10:19 OK, how does it work?
`10:22 All right, let's build a search engine.
`10:24 This is the first of the interactive portions of
`10:26 today's talk, boys and girls.
`10:28 How many of you have had to build a search engine in a
`10:33 computer science class?
`10:34
`10:37 How many of them were any good?
`10:41 Oh, good.
`10:41 OK, so back in the day, when the web was first created, the
`10:51 terms were all coined by Tim Berners-Lee.
`10:53 And he talked about the fact that these pages were all
`10:56 inter-linked like a world wide web.
`11:00 What goes on webs?
`11:03 Spiders.
`11:04 What do spiders do?
`11:05 They crawl.
`11:06 Hence the term of art for finding information in web
`11:09 search is called crawling.
`11:12 This would be my second instantiation of how computer
`11:14 scientists are not funny.
`11:16 That is supposed to be a joke.
`11:18 So we're going to start out by crawling information.
`11:21 How does a crawler work?
`11:23 Simple kinds of crawlers start from a known web page like
`11:27 aol.com or pick your favorite portal.
`11:32 And they go through each link, and they essentially click on
`11:36 each link, and that expands to more web pages.
`11:38 Each of those pages has links.
`11:40 You click on each link from there, and you keep doing
`11:42 depth-first work recursion until you run out of time,
`11:44 space, or the Universe ends.
`11:48 Crawling sounds easy, right?
`11:50 That's probably, what, 10 lines of Python.
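A minimal sketch of the simple crawler described above: start from a known page, follow every link depth-first, and stop when you run out of budget. To keep it self-contained it crawls a hypothetical in-memory link graph (`TOY_WEB`) instead of the real web; `fetch_links` stands in for downloading a page and extracting its links, and none of these names come from the talk.

```python
from typing import Dict, List, Set

# Hypothetical toy web: each URL maps to the links found on that page.
TOY_WEB: Dict[str, List[str]] = {
    "portal.example": ["a.example", "b.example"],
    "a.example": ["c.example", "portal.example"],
    "b.example": ["c.example"],
    "c.example": [],
}


def fetch_links(url: str) -> List[str]:
    """Stand-in for fetching a page and parsing out its links."""
    return TOY_WEB.get(url, [])


def crawl(seed: str, max_pages: int = 1000) -> List[str]:
    """Depth-first crawl from `seed`, visiting each URL at most once."""
    seen: Set[str] = set()
    order: List[str] = []
    stack = [seed]
    while stack and len(order) < max_pages:
        url = stack.pop()
        if url in seen:
            continue  # already crawled via another link
        seen.add(url)
        order.append(url)
        # Push links in reverse so the first link on the page is followed first.
        stack.extend(reversed(fetch_links(url)))
    return order


if __name__ == "__main__":
    print(crawl("portal.example"))
```

The `seen` set here is keyed on the URL alone, which is exactly the shortcut the next part of the talk picks apart.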
`11:54 What's hard?
`11:57 Remember, everything changes.
`11:58 So you've got to recrawl a lot.
`12:00 How often do you have to recrawl?
`12:03 If 10% of it changes every month, you have to recrawl the
`12:05 natural log of 10 times the number of months since the
`12:07 last time you completed a crawl.
`12:09 That's a very big number.
`12:10 You've got to crawl a lot, is the answer.
`12:14 Second thing that's hard.
`12:16 How do you know if you've already seen a page?
`12:20 Oh, that's easy, right?
`12:22 Take a hash of the URL.
`12:24 That would work, wouldn't it?
`12:26 What happens if they change the title of the page?
`12:28
`12:31 What happens if it's a copy of the page?
`12:34 Oh right, the hash of the URL won't work.
`12:36 OK, still no problem.
`12:38 I'll take a hash of all the content of the page.
`12:40 That will work, right?
`12:41 Won't it work?
`12:42 What happens if they've got a space?
`12:45 What happens if they misspelled a
`12:46 word in their copy?
`12:47 What happens if they inserted a picture in different spots?
`12:50 Naive crawlers get roughly 25 to 40 percent of their
`12:54 content is content they have already seen before.
`12:56 Which means that on average, you're
`12:57 wasting one byte in four.
`12:59 If you're crawling to the end of the world, you want those
`13:02 bytes back.
`13:03 Crawls are hard.
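The duplicate problem above can be sketched in a few lines. An exact hash of the content changes as soon as one space is added, while a comparison of overlapping word sequences ("shingles") barely moves. Shingling with Jaccard similarity is a common textbook technique chosen here for illustration; the talk does not say which method Google uses, and the function names and the 0.8 threshold are assumptions.

```python
import hashlib
import re
from typing import Set, Tuple

Shingle = Tuple[str, ...]


def exact_fingerprint(text: str) -> str:
    """Hash of the raw content: any byte-level change gives a new value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Set of overlapping k-word sequences, ignoring case and whitespace."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """True when two pages share most of their shingles (threshold is illustrative)."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


if __name__ == "__main__":
    page = "The quick brown fox jumps over the lazy dog near the river bank"
    copy_with_space = "The quick  brown fox jumps over the lazy dog near the river bank"
    print(exact_fingerprint(page) == exact_fingerprint(copy_with_space))
    print(near_duplicate(page, copy_with_space))
```

On short texts a single misspelled word still knocks out several shingles, so the threshold has to be tuned to page length; production systems typically use compact sketches of these sets rather than storing them whole.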
`13:04 Additionally, crawls are hard because how do you store the
`13:07 data once you've got it?
`13:09
`13:12 How many of you have had a database class?
`13:15 Come on, guys, I know you're out there.
`13:17 I hear you breathing.
`13:17 Come on!
`13:19
`13:20 How would you store a page in a database?
`13:23 It's hard work.
`13:24 Databases aren't optimized for this.
`13:27 And you're not going to need to do joins.
`13:28 There's no concept of query structures here.
`13:33 So we ended up having to build a file system called the
`13:36 Google File System.
`13:37 And it's that technology called Big Table that I'll
`13:39 talk about in a second.
`13:40 If you're interested, all the papers are hung off of
`13:43 google.com, and they're publicly available.
`13:46 Precisely to let us grab a piece of information, take
`13:51 what's called a hashmap of it-- which is a hash that has
`13:54 an error code in it, so that if you add spaces or move
`13:57 words around, I notice it--
`13:58 and then store them in a way which is redundant.
`14:02 Because then you always have the operational side as well.
`14:04 What happens if you lose a machine?
`14:06 Crawling seems easy.
`14:08 It's hard.
`14:08 And it's the easiest thing on this slide.
`14:11 After I crawl, remember we've got the crawl that's running
`14:15 until the end of time, until you run out of space--
`14:17 I can't remember the joke I made before, but rewind a
`14:20 little bit.
`14:21 For those of you who are surfing the web, just go find
`14:23 a crawl paper.
`14:25 Then you have to index everything you just crawled so
`14:29 you can find it later.
`14:31 What's the right index structure?
`14:33 Come on, this is easy.
`14:34 Come on.
`14:36 It's not easy?
`14:37 What's the right index structure?
`14:39 You could index every single word on the page.
`14:43
`14:46 Easier, you could index every character on the page.
`14:49 Pop quiz--
`14:50 what's the most common character
`14:52 in the English language?
`14:54 Space.
`14:55
`14:59 What's the second most common character
`15:01 in the English language?
`15:02
`15:07 You're going to end up with a lot of index entries for
`15:08 space, aren't you?
`15:09
`15:16 OK, so you can index every character.
`15:18 It's not very useful.
`15:19 Why is it not very useful?
`15:22 Because every single time you get a query, you're going to
`15:24 have to go through and reassemble all those
`15:26 characters into words and then map against all the documents.
`15:30 Probably the wrong index
`15:32 structure, but pretty flexible.
`15:33
`15:35 You could index trigrams, index three words at a time.
`15:40 Douglas C. Merrill, you can index that, right?
`15:44 Would that be better or worse?
`15:47 Well, different.
`15:48 What happens if you do a search for Douglas Merrill?
`15:51 Or worse, you shorten my name, which annoys the hell out of
`15:53 me, and do a search for Doug Merrill.
`15:58 A trigram index is going to break because you're not going
`16:00 to have that entry.
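The trade-off between index granularities can be sketched with a toy inverted index. The same builder is run once over single words and once over word trigrams, and the trigram index misses the two-word query exactly as described. The document text, the function names, and the intersect-the-postings lookup are illustrative assumptions, not a description of any real engine's index.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def build_ngram_index(docs: Dict[str, str], n: int) -> Dict[Key, Set[str]]:
    """Map every run of n consecutive words to the documents containing it."""
    index: Dict[Key, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for i in range(len(words) - n + 1):
            index[tuple(words[i:i + n])].add(doc_id)
    return index


def lookup_phrase(index: Dict[Key, Set[str]], query: str) -> Set[str]:
    """Exact lookup: the whole query must be a single key in the index."""
    return set(index.get(tuple(tokenize(query)), set()))


def lookup_words(index: Dict[Key, Set[str]], query: str) -> Set[str]:
    """Word-level lookup: intersect the posting sets of each query word."""
    postings = [index.get((word,), set()) for word in tokenize(query)]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    docs = {"bio": "Douglas C. Merrill is a vice president of engineering"}
    words = build_ngram_index(docs, 1)
    trigrams = build_ngram_index(docs, 3)
    print(lookup_phrase(trigrams, "Douglas C. Merrill"))  # found as one entry
    print(lookup_phrase(trigrams, "Douglas Merrill"))     # no such trigram
    print(lookup_words(words, "Douglas Merrill"))         # found word by word
    print(lookup_words(words, "Doug Merrill"))            # "doug" was never indexed
```

The word index is more flexible but needs an intersection per query, while the trigram index answers in one lookup only when the query happens to match what was stored.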
`16:04 If you look at all the search engines in the world today,
`16:07 they all have one or more of these index structures.
`16:10 No, I'm not going to tell you what ours is.
`16:12 But it's in the space of somewhere between characters
`16:14 and trigrams. And the index structure is going to have
`16:18 huge implications on the stuff I'm going to
`16:20 talk about in a second.
`16:23 And so far, we're still in the easy stuff.
`16:27 Then you get a query.
`16:30 So you go to google.cz, you key some words into the box,
`16:34 you hit enter, you get a bunch of results back.
`16:35 Simple, right?
`16:37 No problem.
`16:38 On average, we return 10 results in 400 milliseconds,
`16:43 half a second.
`16:45 That's not too bad.
`16:47 What's the speed of light, latency, from a query served
`16:51 here, if it's served from, say, Northern California?
`16:54
`16:57 About 2/3 of that time.
`16:58 So clearly we can't serve everything
`17:00 from one data center.
`17:02 Leave aside the storage and power, et cetera,
`17:04 et cetera, et cetera.
`17:06 And then there's all the fun of actually doing the ranking
`17:09 and picking out which result goes to the top, et cetera.
`17:13 It turns out it's harder to build a search engine than it
`17:15 seems. Next slide, please.
`17:18
`17:21 We want to give you the right answer at the top every time.
`17:27 So there's a lip right here.
`17:30 I'm taking bets about the odds that I go head over heels over
`17:33 the lip at some point during this talk.
`17:35 I'm currently giving 5:1 that I end up on my face, just FYI.
`17:38 Any takers?
`17:40 OK, we want to give the right answer every time at the top.
`17:46 This is the art of ranking.
`17:48 How do you know what the right result is?
`17:53 Larry and Sergey came up with the concept of Page Rank.
`17:57 So have any of you read the Page Rank paper?
`18:00
`18:03 Wow.
`18:03 What classes have assigned it, or are you guys just
`18:05 over-achievers?
`18:07
`18:09 There's a lot of you.
`18:09 That's creepy.
`18:10 OK, usually like one person raises their hand, and it's
`18:14 the person you don't like.
`18:15 There are, like, 30 of you.
`18:17 Wow.
`18:18 This is cool.
`18:21 Core concept of Page Rank.
`18:24 How many of you have met me?
`18:26
`18:29 Come on, you guys have met me?
`18:30 Stephanie--
`18:33 the Google people should raise their hands, geez!
`18:36 And so none of you have any idea who I am.
`18:38 Why are you here?
`18:40
`18:42 Oh, right.
`18:42 You're here because somebody you trust--
`18:47 or, well--
`18:48 [LAUGHTER]
`18:52 OK, let's just pretend.
`18:56 You're here because somebody you trust said that I was
`18:59 worth listening to.
`19:00 You're here to listen to me-- and I make it--
`19:02 you're here to listen to me because somebody else
`19:05 suggested that I would have content--
`19:09 how are you--
`19:11 twice, I made it-- that I would have content worth
`19:14 listening to.
`19:16 Fundamentally, you're trusting that I have useful content
`19:18 because someone you trust said so.
`19:21 Page Rank is the same idea.
`19:24 Some arbitrary page on the web is most likely garbage.
`19:27
`19:30 However, if someone you like links to that page, basically
`19:35 saying this page isn't garbage, it's more likely that
`19:40 page is useful.
`19:41 Page Rank is simply a sum of the vertices
`19:50 of a directed graph.
`19:53 Start from a top page, make a graph downward of links.
`19:57 Edges are links, nodes are pages.
`20:00 Take a sum of the weights across those links,
`20:02 you get Page Rank.
`20:03 Thus, the more linked something is, the
`20:05 higher its Page Rank.
`20:07 Thus, the more a page is connected across the web, the
`20:10 more likely that page is good.
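The "trust flows along links" idea can be sketched as an iterative computation over a small directed graph. The talk only gives the informal summary above; this sketch follows the commonly published formulation, in which each page splits its score among its outgoing links and a damping factor (0.85 here, an assumption) mixes in a uniform baseline. The graph and function name are hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Iteratively redistribute rank along links; scores sum to 1."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the uniform baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: a, b and d all link to c; c links back to a.
    graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Page `c` ends up on top because three pages vouch for it, and `a` outranks `b` and `d` because its single inbound link comes from the highly ranked `c`. A link from a trusted page counts for more than a link from an unknown one.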
`20:15 What's wrong with this algorithm?
`20:16
`20:20 What if the links are garbage?
`20:23 So say for example, you have a blog, and your
`20:28 blog has open comments.
`20:31 And I write a bot that goes and finds your blog with all
`20:34 of its open comments and inserts a comment which is a
`20:37 link back to this page.
`20:40 Page Rank will see that as a link, and thus will think, oh,
`20:43 this page is better.
`20:45 Do you think that link is a useful signal?
`20:48 Probably not.
`20:50 So Page Rank was our first ranking algorithm designed to
`20:53 get the right results at the top every time.
`20:55 We now use something more than 200.
`20:57 Spam is an arms race.
`21:00 Every day, we have hundreds of engineers that work on trying
`21:03 to figure out what the person who's trying to game the
`21:06 system is going to do next.
`21:07 Now there's a fun job.
`21:09 Every day, you get to go to battle with the bad guys.
`21:13 Next slide, please.
`21:14
`21:18 And then you start thinking about, in addition to crawling
`21:29 the web, indexing the web, ranking the pages, maybe you
`21:35 ought to be nice to your users.
`21:37 Those pesky users.
`21:40 Some languages, like English, are relatively easy to enter
`21:43 search terms on.
`21:44 English doesn't have accents, I don't think.
`21:48 Do we have any?
`21:48 I don't think so.
`21:51 English doesn't have diacriticals.
`21:52
`21:55 So my English keyboard has one mode.
`21:59 Full stop.
`22:00 Not true for you guys.
`22:03 But as search engines get better and better coverage,
`22:07 they can get smarter and smarter, and they can start
`22:10 noticing things.
`22:11 For example, we can notice errors in user entries,
`22:18 specifically like you dropped the diacriticals, and we know
`22:22 it, so we can just add them back for you.
`22:25 How do we do that?
`22:26
`22:29 Come on, somebody guess.
`22:29 There's an obvious guess.
`22:30 Come on.
`22:32
`22:35 OK, I'll come out here and I'll guess.
`22:36 OK.
`22:37 OK, I'm going to come sit right next to you, and I'm
`22:38 going to guess.
`22:39 OK.
`22:41 I think you do it by having a bunch of
`22:42 people who speak Czech.
`22:44
`22:47 Four times, I made it without falling.
`22:48 AUDIENCE: [INAUDIBLE]
`22:50
`22:54 DOUGLAS MERRILL: That's a great guess, much
`22:56 better than my guess.
`22:57 Not right, but much better.
`22:59 Much better.
`23:00 So my guess is dumb.
`23:01 Why is my guess dumb?
`23:03 Because it doesn't scale.
`23:05 Your guess makes a lot of sense.
`23:07 Except it means that I have to teach the crawler and the
`23:11 indexer what is a diacritical.
`23:13 AUDIENCE: Is that hard?
`23:15 DOUGLAS MERRILL: Not as hard as doing it by hand.
`23:17 But you know what's easier still?
`23:20 What's easier still is watching your users.
`23:23 You take anonymized search traffic, and I can see people
`23:26 who start with that entry up top, and then go, ugh, and
`23:31 retype the entry below.
`23:35 And I can do statistical machine learning that says, oh
`23:38 right, these two are probably actually the same word.
`23:41 And then I don't have to teach it about diacriticals, I don't
`23:43 have to teach it about language, I just have to watch
`23:46 anonymized user traffic.
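A minimal sketch of learning corrections from what users retype, assuming a hypothetical log in which each session is simply the ordered list of queries one anonymous user entered. It counts how often one query is immediately followed by a different one and suggests the most frequent rewrite. The session data, function names, and the `min_count` cutoff are all illustrative; the talk describes the approach only as statistical machine learning over anonymized traffic.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional


def learn_rewrites(sessions: Iterable[List[str]]) -> Dict[str, Counter]:
    """Count, for each query, which different query users typed right after it."""
    rewrites: Dict[str, Counter] = defaultdict(Counter)
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            if first != second:
                rewrites[first][second] += 1
    return rewrites


def suggest(rewrites: Dict[str, Counter], query: str,
            min_count: int = 2) -> Optional[str]:
    """Return the most common rewrite if it was seen at least `min_count` times."""
    candidates = rewrites.get(query)
    if not candidates:
        return None
    best, count = candidates.most_common(1)[0]
    return best if count >= min_count else None


if __name__ == "__main__":
    # Hypothetical anonymized sessions: users drop the diacritics, then retype.
    sessions = [
        ["kun", "kůň"],
        ["kun", "kůň"],
        ["kun", "kino"],
        ["praha"],
    ]
    model = learn_rewrites(sessions)
    print(suggest(model, "kun"))
    print(suggest(model, "praha"))
```

Nothing in the code knows what a diacritic is; the pattern comes entirely from the counts. A real system would also need to filter out follow-up queries that are unrelated rather than corrections, for example by requiring the two strings to be similar.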
`23:48 AUDIENCE: Are there any users that [INAUDIBLE]
`23:51 DOUGLAS MERRILL: Say again?
`23:51 AUDIENCE: Are there any users that use diacriticals
`23:54 [UNINTELLIGIBLE] when searching?
`23:55 Because I never do.
`23:56 I always type it out, whatever it is.
`23:59 DOUGLAS MERRILL: Thank you for helping to
`24:00 improve our search quality.
`24:03 The answer oddly enough is yes, but fewer and fewer
`24:06 because we did the right thing.
`24:07 Next slide, please.
`24:08 But the next slide's the same--
`24:10 this is even better.
`24:11 This is the same problem only done from the other side.
`24:14
`24:17 We can do the same thing I just talked about, about
`24:19 diacritics and provide spell checking.
`24:22 How do we do it?
`24:23 The same way I just talked about.
`24:25 You see people starting at the top, which is the word for
`24:27 gym, right?
`24:28 For gymnasium.
`24:29 Apparently they're tired because they skipped a letter.
`24:32 So there's some sort of weird--
`24:34 But we can notice that you typed that word in, you
`24:39 probably will get a few results.
`24:40 In general, the other grand truth of the internet-- so
`24:43 grand truth number one was that the top-rated search is
`24:45 always about some woman.
`24:46 Grand truth two is no matter how badly you misspell a word,
`24:51 somebody's got a page that spelled it that way.
`24:53
`24:57 Anyway, it's never the right page.
`25:02 And so we always find that a couple minutes later, or a
`25:05 couple seconds later, more often, you redo the search.
`25:07 And so by doing statistical machine learning, I can learn
`25:10 how to spell in almost every language on the planet without
`25:16 having any notion of morphology, without having any
`25:18 generative grammar, without having any of the stuff that
`25:20 Steven Pinker talks about.
`25:23 All I've got is spell correction,
`25:25 which is pretty useful.
`25:26 In fact, it's so useful in English that I use it to
`25:29 actually spell check my words, because there are all these
`25:31 words I can't figure out how to spell, so it will teach me.
`25:34 And all done simply with statistical machine learning.
`25:38 So how many of you have had a statistical machine learning
`25:40 class, or has [UNINTELLIGIBLE] a topic in a class?
`25:42 Pay attention next time.
`25:43 It's important.
`25:44 Next slide.
`25:46
`25:50 OK, however to your question, it was in there someplace, I
`25:58 lost where.
`25:58 I apologize.
`25:59 Who actually does searches with diacriticals?
`26:01 Good point.
`26:02 We do, however, have more sources of
`26:04 data than just search.
`26:05 And those sources are the local products we've released
`26:08 in the market.
`26:11 The more content that gets created, the better off the
`26:15 internet is.
`26:17 But what's the interesting story of the internet?
`26:19 It's not actually Google or Seznam or Yahoo.
`26:22 That's not the interesting part.
`26:23 The interesting part of the story is the democratization
`26:25 of information creation.
`26:28 History has always been written by the winners.
`26:32 400 years ago, about 2% of the people could read or write.
`26:37
`26:39 And apparently all of them went to this university.
`26:41
`26:45 Now 200 years ago, between 10 and 20% of the people in the world
`26:53 could read or write, depending on your perspective.
`26:55 Nowadays, it's more than that.
`27:00 I hope a lot more, but I don't actually--
`27:03 have you ever read an American newspaper?
`27:05 It might surprise you.
`27:06 Anyway, leaving that aside, what the internet and tools
`27:12 like that have let us do is they have let everyone tell
`27:14 their story.
`27:15 So instead of history being written only by the winners,
`27:18 it's written by everyone.
`27:19 Everyone gets to tell their story, which is cool.
`27:22 Pop quiz, what's the difference between a
`27:25 revolution and a civil war?
`27:28 Who won.
`27:31 Because if the reigning government won,
`27:34 it's a civil war.
`27:35 If the reigning government lost, it's a revolution.
`27:37
`27:40 We built a bunch of tools to help people tell their story.
`27:43 We built a bunch of tools that help people tell their story
`27:45 in Czech, which allows me to improve my search quality even
`27:49 if, in fact, no one searches with diacriticals, because I'm
`27:52 getting content created that I can index.
`27:55 Next slide, please.
`27:57
`28:00 I don't really have anything to say on this slide, but it's
`28:02 a pretty picture.
`28:03
`28:06 So pretty?
`28:07 Yes?
`28:09 Anyone have any comments on this slide?
`28:11 Me neither.
`28:11 Next slide, please.
`28:13
`28:18 So the next time you have a class assignment to build a
`28:20 search engine, you know what you have to figure out.
`28:27 You have to figure out how to do a crawl and recognize that
`28:31 you've seen a page before and find an efficient way to store
`28:35 the page, find an efficient way to figure out if you've
`28:38 seen it before.
`28:40 And then you have to decide on an indexing scheme.
`28:42 You have to index characters, or maybe words, or maybe
`28:45 bigrams.
`28:48 You have to figure out a ranking system.
`28:49 Maybe you'll use Page Rank.
`28:51 Or maybe you'll be like us and you'll do hundreds of
`28:54 different things, some of which are fascinating computer
`28:57 science, and some of which are funny little hats.
`29:03 But all of the things will then ultimately result in a
`29:07 search which works well in one context.
`29:11 Here's the place where I hope all of you are
`29:12 actually paying attention.
`29:14 So everyone who's asleep, please wake up.
`29:18 The last 10 years have been fascinating.
`29:21 We've done such great things worldwide in search.
`29:23 Seznam's done great things.
`29:25 We've done interesting stuff.
`29:26 There have been great companies doing
`29:28 great work for 10 years.
`29:29 The future's much harder, and much more interesting.
`29:34 Next slide, please.
`29:35
`29:37 So our mission was all the world's information
`29:39 universally accessible and useful.
`29:41 All the world's information universally
`29:44 accessible and useful.
`29:46 There are at least four huge computer science problems to
`29:51 solve in that context.
`29:53 For those of you who are interested in winning Turing
`29:55 Awards, pay attention.
`29:56 There's at least 30 of them on the next couple of slides.
`29:59 Next slide.
`30:01 Audience participation part number whatever--
`30:05 three, four, five, whatever number I'm on.
`30:08 What is this?
`30:09 AUDIENCE: The world.
`30:10 DOUGLAS MERRILL: OK.
`30:11
`30:14 OK, fair point.
`30:14 Yes, it's the world.
`30:16 I did actually give this talk once and I showed this slide,
`30:18 and someone said it's a photograph of the Earth.
`30:19
`30:22 And I was sort of intrigued by this.
`30:24 So how do you take a picture of the Earth and
`30:26 have it all be dark?
`30:28 But let's ignore that for now.
`30:31 Fair enough.
`30:32 It's not a photograph of the Earth, but it is
`30:33 a map of the world.
`30:34 What are the spots on it?
`30:35 What's changing?
`30:38 AUDIENCE: The number of searches conducted?
`30:39 DOUGLAS MERRILL: How did you know that?
`30:42 Nobody gets that right.
`30:43 Hey, you get out of here.
`30:44
`30:48 Well done.
`30:49 So pretend he's not here.
`30:53 Everybody says, hey look, it's city lights at night.
`30:56 It's not.
`30:58 OK, what we did is we took our query traffic for a day, and
`31:04 we put a little white dot every place that
`31:09 a query came from.
`31:12 So we geo-located the source of a query and we plotted it
`31:17 on the map over time.
`31:18 And you see some things, like you can see the United States
`31:22 pretty clearly.
`31:23 You can see Western Europe pretty clearly.
`31:25 You can see Tokyo over there, it's [UNINTELLIGIBLE], a
`31:27 little bit of China.
`31:29 And you can see it's clearly temporal, because remember,
`31:31 time is flowing in this diagram.
`31:33 And although I've taken the scale off it, it turns out the
`31:36 people seem to search a lot in the morning and the night,
`31:38 which makes sense because we all work for a living, except
`31:41 all of you.
`31:42 But anyway what else is interesting about this slide?
`31:49 Where is Africa?
`31:50
`31:55 I flew over it a couple of days ago.
`31:57 It was there.
`32:00 Really.
`32:01 So what's going on?
`32:02 What's going on is it turns out that the continent of
`32:05 Africa is served by basically two very large
`32:08 wired internet cables.
`32:09 Two.
`32:10 One runs down the east coast, one runs down the west coast.
`32:13 Remarkable how that works.
`32:15 Each of those internet cables is connected to the ground by
`32:19 things called points of presence.
`32:20 Those points of presence, there are about 10 of them,
`32:22 land in governmentally controlled centers.
`32:27 What is true about the internet
`32:29 everywhere in the world?
`32:31 One, it destabilizes authoritarian governments, and
`32:35 two, it's a great source of tax revenue.
`32:40 So what does that suggest is going to be the case for the
`32:43 wired internet in Africa?
`32:45
`32:49 AUDIENCE: [INAUDIBLE]
`32:50 controlled by government.
`32:51 DOUGLAS MERRILL: Oh, well done, sir.
`32:53 It's going to be controlled by the government.
`32:54 It's going to be really, really darn spendy.
`32:56 In fact, in some parts of sub-Saharan Africa, the cost
`33:01 of an hour's internet time in an internet cafe is about the
`33:05 same as one month's total salary on average.
`33:10 That suggests there ain't going to be a whole lot of
`33:13 wired internet use, right?
`33:14 So there are about 100,000 plus/minus wired internet
`33:17 connections in Africa.
`33:19 But you know what else there are?
`33:21 10 million internet-enabled mobile phones.
`33:26 Let's say your mission is all the world's information
`33:28 universally accessible and useful.
`33:30 What would you be working on?
`33:32 Search on mobile devices.
`33:33 Next slide, please.
`33:35
`33:38 So how many of you are carrying a laptop?
`33:41 It should be almost all of you, right?
`33:44 OK, how many of you are carrying a phone?
`33:46
`33:49 Even in a classroom, there are probably 50%
`33:53 more phones than laptops.
`33:55 Imagine what it's like in places that aren't scho
