EXHIBIT 2087
Facebook, Inc. et al.
v.
Software Rights Archive, LLC
CASE IPR2013-00479
`
`

`

Copyright © Alexander Halavais 2009

The right of Alexander Halavais to be identified as Author of this Work has
been asserted in accordance with the UK Copyright, Designs and Patents Act
1988.

First published in 2009 by Polity Press

Polity Press
65 Bridge Street
Cambridge CB2 1UR, UK

Polity Press
150 Main Street
Malden, MA 02148, USA

All rights reserved. Except for the quotation of short passages for the purpose
of criticism and review, no part of this publication may be reproduced, stored
in a retrieval system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording or otherwise, without the prior
permission of the publisher.

ISBN-13: 978-0-7456-4214-7
ISBN-13: 978-0-7456-4215-4 (paperback)

A catalogue record for this book is available from the British Library.

Typeset in 10.25 on 13 pt FF Scala
by Servis Filmsetting Ltd, Stockport, Cheshire
Printed and bound in Great Britain by MPG Books Ltd, Bodmin, Cornwall

The publisher has used its best endeavours to ensure that the URLs for
external websites referred to in this book are correct and active at the time of
going to press. However, the publisher has no responsibility for the websites
and can make no guarantee that a site will remain live or that the content is or
will remain appropriate.

Every effort has been made to trace all copyright holders, but if any have been
inadvertently overlooked the publishers will be pleased to include any
necessary credits in any subsequent reprint or edition.

For further information on Polity, visit our website: www.polity.co.uk
`
`

`

providers: finding the least expensive airfares for a given route,
for example.
Most crawlers make an archival copy of some or all of a webpage,
and extract the links immediately to find more pages to
crawl. Some crawlers, like the Heritrix spider employed by the
Internet Archive, the "wget" program often distributed with
Linux, and web robots built into browsers and other web
clients, are pretty much done at this stage. However, most
crawlers create an archive that is designed to be parsed and
organized in some way. Some of this processing (like "scraping"
out links, or storing metadata) can occur within the
crawler itself, but there is usually some form of processing of
the text and code of a webpage afterward to try to obtain
structural information about it.
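The link-extraction step described above can be sketched in a few lines. This is only an illustration, not the method of any crawler named in the text: the class name, the function name, and the example URLs are all hypothetical, and the sketch uses Python's standard-library HTML parser on a page held in memory rather than fetching anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the hyperlinks a crawler follows.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content and URLs, for illustration only.
page = (
    '<p>See <a href="/about.html">about</a> and '
    '<a href="http://example.org/">this site</a>.</p>'
)
frontier = extract_links(page, "http://example.com/index.html")
```

A crawler would add the URLs in `frontier` to its list of pages still to visit, and repeat.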
The most basic form of processing, common to almost every
modern search engine, is extraction of key terms to create a
keyword index for the web by an "indexer." We are all familiar
with how the index of a book works: it takes information about
which words appear on any given page and reverses it so that
you may learn which pages contain any given word. In retrospect,
a full-text index of the web is one of the obvious choices
for finding material online, but particularly in the early
development of search engines it was not clear what parts should be
indexed: the page titles, metadata, hyperlink text, or full text
(Yuwono et al. 1995). If indexing the full text of a page, is it
possible to determine which words are most important?
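The "reversal" that turns pages-with-words into words-with-pages is usually called an inverted index. A minimal sketch follows, assuming the simplest possible notion of a word (whitespace-separated, lowercased); the function name and the three sample pages are invented for illustration.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the sorted list of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    # Sort the page ids so lookups give a stable, readable answer.
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical miniature "web" of three pages.
pages = {
    "page1": "horses eat hay",
    "page2": "hay is dried grass",
    "page3": "horses run fast",
}
index = build_inverted_index(pages)
```

Looking up `index["horses"]` now answers "which pages mention horses?" without rereading any page, which is the whole point of building the index ahead of time.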
In practice, even deciding what constitutes a "word" (or a
"term") can be difficult. For most western languages, it is
possible to look for words by finding letters between the spaces
and punctuation, though this becomes more difficult in
languages like Chinese and Japanese, which have no clear
markings between terms. In English, contractions and
abbreviations cause problems. Some spaces mean more than others;
someone looking for information about "York" probably has
little use for pages that mention "New York," for instance. A
handful of words like "the" and "my" are often dismissed as
`
`

`

"stop words" and not included in the index because they are
so common. Further application of natural language processing
(NLP) is capable of determining the parts of speech of
terms, and synonyms can be identified to provide further clues
for searching. At the most extreme end of indexing are efforts
to allow a computer to in some way understand the genre or
topic of a given page by "reading" the text to determine its
meaning.1
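To make the difficulty concrete, here is a rough sketch of the "letters between spaces and punctuation" approach with a stop-word filter. The stop-word list is a tiny invented sample, not any search engine's actual list, and the regular expression is an assumption that only suits unaccented English text.

```python
import re

# Illustrative stop words only; real lists are longer and vary by engine.
STOP_WORDS = {"the", "my", "a", "an", "and", "of"}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def index_terms(text):
    """Return the tokens that would be indexed, with stop words dropped."""
    return [term for term in tokenize(text) if term not in STOP_WORDS]


terms = index_terms("The history of New York, and my visit to York.")
```

The result contains "york" twice, once from "New York" and once from "York", so this naive tokenizer cannot tell the two places apart. That is exactly the "some spaces mean more than others" problem: solving it requires treating certain word sequences as single terms.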
An index works well for a book. Even in a fairly lengthy
work, it is not difficult to check each occurrence of a keyword,
but the same is not true of the web. Generally, an exhaustive
examination of each of the pages containing a particular
keyword is impossible, particularly when much of the material is
not just unhelpful, but - as in the case of spam - intentionally
misleading. This is why results must be ranked according to
perceived relevance, and the process by which a particular
search engine indexes its content and ranks the results is really
a large part of what makes it unique. One of the ways Google
leapt ahead of its competitors early on is that it developed an
algorithm called PageRank that relied on hyperlinks to infer
the authority of various pages containing a given keyword.
Some of the problems of PageRank will be examined in
chapter 4. Here, it is enough to note that the process by which an
index is established, and the attributes that are tracked, make
up a large part of the "secret recipes" of the various search
engines.
The crawling of the web and processing of that content
happens behind the scenes, and results in a database of indexed
material that may then be queried by an individual. The final
piece of a search engine is its most visible part: the interface, or
"front end," that accepts a query, processes it, and presents the
results. The presentation of an initial request can be, and often
is, very simple: the search box found in the corner of a
webpage, for example. The sparse home page for the Google
search engine epitomizes this simplicity. However, providing
people with an extensive set of tools to tailor their search, and
`
`

`

`The Engines
`
to refine their search, can lead to interesting challenges,
particularly for large search engines with an extremely diverse set
of potential users.
In some ways, the ideal interface anticipates people's
behaviors, understanding what they expect and helping to reveal
possibilities without overwhelming them. This can be done in
a number of ways. Clearly the static design of the user interface
is important, as is the process, or flow, of a search request.
Westlaw, among other search engines, provides a thesaurus
function to help users build more comprehensive searches.
Search engines like Yahoo have experimented with
auto-completing searches, anticipating what the person might be
trying to type in the search box, and providing suggestions in
real time (Calore 2007). It is not clear how effective these
particular elements are, but they exemplify the aims of a good
interface: a design that meets the user half-way.
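One simple way to picture auto-completion is a prefix lookup over a sorted list of earlier queries. The sketch below assumes that approach purely for illustration; the source does not say how Yahoo or anyone else implemented the feature, and the function name and sample queries are made up.

```python
import bisect


def suggest(prefix, sorted_queries, limit=3):
    """Return up to `limit` stored queries that start with `prefix`.

    `sorted_queries` must already be in sorted order, so that every
    query sharing the prefix sits in one contiguous run.
    """
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_queries, prefix)
    matches = []
    for query in sorted_queries[start:]:
        if not query.startswith(prefix):
            break  # past the run of matching queries
        matches.append(query)
        if len(matches) == limit:
            break
    return matches


# Hypothetical log of past queries.
past_queries = sorted([
    "search engine",
    "search engines history",
    "seahorse",
    "staph infections",
    "street view",
])
```

Calling `suggest("sear", past_queries)` returns the two stored queries beginning with "sear". A production system would also rank suggestions by popularity rather than alphabetically.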
Once a set of results is created, they are usually ranked in
some way to provide a list of topics that present the most
significant "hits" first. The most common way of displaying
results is as a simple list, with some form of summary of each
page. Often the keywords are presented in the context of the
surrounding text. In some cases, there are options to limit or
expand the search, to change the search terms, or to alter the
search in some other way. More recently, some search engines
provide results in categories, or mapped graphically.
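Showing a keyword "in the context of the surrounding text" can be approximated by cutting a fixed window of characters around the first match. The sketch below is one simple way to do it, not a description of how any particular engine builds its summaries; the function name, the window size, and the sample sentence are assumptions.

```python
def snippet(text, keyword, width=30):
    """Return the keyword with up to `width` characters of context per side.

    Returns None when the keyword does not occur in the text.
    """
    position = text.lower().find(keyword.lower())
    if position == -1:
        return None
    start = max(0, position - width)
    end = min(len(text), position + len(keyword) + width)
    # Mark truncation so the reader knows the page continues.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


# Hypothetical page text.
page_text = "Gopher made browsing files more practical on the early internet."
result = snippet(page_text, "files", width=10)
```

Because the window is measured in characters, it may cut a word in half at either edge; a more careful version would widen the window to the nearest word boundary.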
All of these elements work together to keep a search engine
continuously updated. The largest search engines are
constantly under development to better analyze and present
searchable databases of the public web. Some of this work is
aimed at making search more efficient and useful, but some is
required just to keep pace. The technologies used on the web
change frequently, and, when they do, search engines have to
change with them. As people employ Adobe Acrobat or Flash,
search engines need to create tools to make sense of these
formats. The sheer amount of material that must be indexed
increases exponentially each year, requiring substantial
`
`

`

of these pages, but limited itself to the titles of the files.
Nonetheless, it represented a first effort to rein in a quickly
growing, chaotic information resource, not by imposing order
on it from above, but by mapping and indexing the disorder to
make it more usable.
The Gopher system was another attempt to bring order to
the early internet. It made browsing files more practical, and
represented an intermediary step in the direction of the World
Wide Web. People could navigate through menus that
organized documents and other files, and made it easier, in theory, to
find what you might be looking for. Gopher lacked hypertext -
you could not indicate a link and have that link automatically
load another document in quite the same way it can be done on
the web - but it facilitated working through directory
structures, and insulated the individual from a command-line
interface. Veronica, named after Archie's girlfriend in 1940s-era
comics, was created to provide a broader index of content
available on Gopher servers. Like Archie, it provided the capability
of searching titles (actually, menu items), rather than the full
text of the documents available, but it required a system that
could crawl through the menu-structured directories of
"gopherspace" to discover each of the files (Parker 1994).
In 1991, the World Wide Web first became available, and
with the popularization of a graphical browser, Mosaic, in 1993,
it began to grow even more quickly. The most useful tool for the
web user of the early 1990s was a good bookmark file, a
collection of URLs that the person had found to be useful. People
began publishing their bookmark files to the web as pages, and
this small gesture has had an enormous impact on how we use
the web today. The collaborative filtering and tagging sites that
are popular today descended from this practice, and the
updating and annotating of links to interesting new websites led to
some of the first proto-blogs. Most importantly, it gave rise to
the first collaborative directories and search engines.
The first of these search engines, Wandex, was developed by
Matthew Gray at the Massachusetts Institute of Technology,
`
`

`

`
and was based on the files gathered by his crawler, the World
Wide Web Wanderer. It was, again, developed to fulfill a
particular need. The web was made for browsing, but perhaps to an
even greater degree than FTP and Gopher, it had no
overarching structure that would allow people to locate documents
easily. Many attribute the genesis of the idea of the web to an
article that had appeared just after the Second World War
entitled "As we may think", in which Vannevar Bush (1945)
suggests that a future global encyclopedia will allow individuals to
follow "associative trails" between documents. In practice, the
web grows in a haphazard fashion, like a library that consists of
a pile of books that grows as anyone throws anything they wish
onto the pile. A large part of what an index needed to do was to
discover these new documents and make sense of them.
Perhaps more than any previous collection, the web cried out
for indexing, and that is what Wandex did.
As with Veronica, the Wanderer had to work out a way to
follow hyperlinks and crawl this new information resource, and,
like its predecessors, it limited itself to indexing titles. Brian
Pinkerton's WebCrawler, developed in 1994, was one of the
first web-available search engines (along with the
Repository-Based Software Engineering ["RBSE"] spider and indexer; see
Eichmann 1994) to index the content of these pages. This was
important, Pinkerton suggested, because titles provided little
for the individual to go on; in fact, a fifth of the pages on the web
had no titles at all (Pinkerton 1994). Receiving its millionth
query near the end of 1994, it clearly had found an audience on
the early web, and, by the end of 1994, more than a half-dozen
search engines were indexing the web.
`
Searching the web

Throughout the 1990s, advances in search engine technology
were largely incremental, with a few exceptions. Generally, the
competitive advantage of one search engine or another had
more to do with the comparative size of its database, and how
`
`

`

`Search
`
`Society
`
quickly that database was updated. The size of the web and
its phenomenal growth were the most daunting technical
challenge any search engine designer would have to face. But
there were some advances that had a significant impact. A
number of search engines, including SavvySearch, provided
metasearch: the ability to query multiple search engines at
once (A.E. Howe & Dreilinger 1997). Several, particularly
Northern Light, included material under license as part of
their search results, extending access beyond what early web
authors were willing to release broadly (and without charge) to
the web. Northern Light was also one of the first to experiment
with clustering results by topic, something that many search
engines are now continuing to develop. Ask Jeeves attempted
to make the query process more user-friendly and intuitive,
encouraging people to ask fully formed questions rather than
use Boolean search queries, and Alta Vista provided some early
ability to refine results from a search.
One of the greatest challenges search engines had to face,
particularly in the late 1990s, was not just the size of the web,
but the rapid growth of spam and other attempts to manipulate
search engines in an attempt to draw the attention of a larger
audience. A later chapter will address this game of
cat-and-mouse in more detail, but it is worth noting here that it
represented a significant technical obstacle and resulted in a
perhaps unintended advantage for Google, which began
providing search functionality in 1998. It took some time for
those wishing to manipulate search engines to understand
how Google's reliance on hyperlinks as a measure of
reputation worked, and to develop strategies to influence it.
At the same time, a number of directories presented a
complementary paradigm for organizing the internet. Yahoo,
LookSmart, and others, by using a categorization of the
internet, gave their searches a much smaller scope to begin with.
The Open Directory Project, by releasing its volunteer-edited,
collaborative categorization, provided another way of mapping
the space. Each of these provided the ability to search, in
`
`

`

It is impossible for me or anyone else to guess why this
particular posting became especially popular, but every page
on the web that becomes popular relies at least in part on its
initial popularity for this to happen. The exact mechanism is
unclear, but after some level of success, it appears that
popularity in networked environments becomes "catching" (or
"glomming"; Balkin 2004). The language of epidemiology is
intentional. Just as social networks transmit diseases, they can
also transmit ideas, and the structures that support that
distribution seem to be in many ways homologous.
Does this mean that this power law distribution of the web is
an unavoidable social fact? The distribution certainly seems
prevalent, not just in terms of popularity on the web, but in a
host of distributions that are formed under similar conditions.
More exactly, the power law distribution appears to encourage
its own reproduction, by providing an easy and conventional
path to the most interesting material. And when individuals
decide to follow this path, they further reinforce this lopsided
distribution. Individuals choose their destination based on
popularity, a fully intentional choice, but this results in the
winner-take-all distribution, an outcome none of the
contributors desired to reinforce; it is, to borrow a phrase from
Giddens, "everyone's doing and no one's" (1984, p. 10). This
sort of distribution existed before search engines began
mining linkage data, but has been further reinforced and
accelerated by a system that benefits from its reproduction.
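The feedback loop described here, in which popular pages attract visitors because they are already popular, can be imitated with a toy "rich-get-richer" simulation. This is an illustrative assumption rather than a model taken from the text or fitted to real web data: each simulated visitor picks a page with probability proportional to that page's visits so far (plus one, so unvisited pages still have a chance), and all names and parameters are invented.

```python
import random


def simulate_visits(num_pages, num_visits, seed=0):
    """Simulate visitors who favor pages in proportion to past visits.

    Returns the visit counts sorted from most to least visited.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    visits = [0] * num_pages
    for _ in range(num_visits):
        # The "+ 1" gives every page a nonzero chance of being chosen.
        weights = [count + 1 for count in visits]
        chosen = rng.choices(range(num_pages), weights=weights, k=1)[0]
        visits[chosen] += 1
    return sorted(visits, reverse=True)


counts = simulate_visits(num_pages=50, num_visits=5000)
# Share of all visits captured by the five most visited pages.
top_share = sum(counts[:5]) / sum(counts)
```

Every page starts out identical, yet the final counts are typically lopsided: a few early leaders capture a share of attention well beyond the 10 percent that five of fifty pages would hold under an even split. The sketch shows only the self-reinforcing mechanism and makes no claim to reproduce the exact shape of the web's distribution.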
In the end, the question is probably not whether the web and
the engines that search it constitute an open, even, playing
field, or even, as Cooper had it with newspapers, "whether a
community derives most good or evil, from the institution"
(Cooper 2004, p. 113). Both questions are fairly settled: some
information on the web is more visible than other
information. We may leave to others whether or not the web is, in sum,
a good thing; the question has little practical merit as we
can hardly expect the web to quietly disappear any time soon.
What we may profitably investigate is how attention is guided
`
`

`

`Attention
`
differently on the web from how it has been in earlier
information environments, and who benefits from this.
`
PageRank

By the end of the 1990s search engines were being taken
seriously by people who produced content for the web. This was
particularly true of one of the most profitable segments of
the early web: pornography. Like many advertising-driven
industries, "free" pornography sites were often
advertising-supported, and in order to be successful they needed to attract
as many viewers as possible. It did not really matter whether or
not the viewer was actually looking for pornography - by
attracting them to the webpages, the site would probably be
paid by the advertiser for the "hit," or might be able to entice
the visitor into making a purchase. The idea of the hawker
standing on the street trying to entice people into a store is
hardly a new one. Search engines made the process a bit more
difficult for those hawkers. In fact, for many search engines,
securing their own advertising profits required them to
effectively silence the pornographers' hawkers.
Search engines were trying to avoid sending people to
pornography sites, at least unless the searcher wanted that,
which some significant proportion did. What they especially
wanted to avoid was having a school-aged child search for
information on horses for her school report and be sent -
thanks to aggressive hawking by pornography producers - to
an explicit site, especially since in the 1990s there was a
significant amount of panic (particularly in the United States)
about the immoral nature of the newly popular networks. Most
advertisers have a vested interest in getting people to come to
their site, and are willing to do whatever they can in order to
encourage this. Google became the most popular search
engine, a title it retains today, by recognizing that links could
make it possible to understand how the web page was regarded
by other authors. They were not the first to look to the
`
`

`

`
hyperlinked structure of the web to improve their search
results, but they managed to do so more effectively than other
search engines had. Along with good coverage of the web, a
simple user interface, and other design considerations, this
attention to hyperlink structure served them well.
Google and others recognized that hyperlinks were more
than just connections: they could be considered votes. When
one page linked to another page, it was indicating that the
content there was worth reading, worth discovering. After all, this
is most likely how web surfers and the search engine's
crawlers encountered the page: by following links on the web
that led there. If a single hyperlink constituted an
endorsement, a large number of links must suggest that a page was
particularly interesting or worthy of attention. This logic,
probably reflecting the logic of the day-to-day web user, was
amplified by the search engine. Given the need to sort through
thousands of hits on most searches, looking at the links proved
to be helpful.
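The "links as votes" idea can be made concrete with a small iterative calculation in the spirit of the published PageRank algorithm. What follows is a simplified sketch under stated assumptions, not Google's production method: the damping value of 0.85 is the figure conventionally quoted for PageRank and is assumed here, pages with no outbound links are assumed to spread their score evenly, and the four-page "web" is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages so that links from well-scored pages count for more.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" equally among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links votes for everyone.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical pages: three of them link to "advice".
web = {
    "advice": [],
    "gel": ["advice"],
    "blog": ["advice", "gel"],
    "forum": ["advice"],
}
ranks = pagerank(web)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph "advice" comes first in `ordered` because it collects the most inbound links, while "gel", with a single inbound link, still outranks the pages that nobody links to.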
Take, for example, a search for "staph infections." At the
time of writing, Google reports nearly 1.3 million pages include
those words, and suggests some terms that would help to
refine that search (a feature of their health-related index). The
top result, the one that the most people will try, is from
Columbia University's advice site Go Ask Alice; the hundredth
result is a page with information about a pain-relieving gel that
can be used for staph infections; and the last page listed
(Google only provides the first several hundred hits) is a blog
posting. Gathering link data from search engines is not
particularly reliable (Thelwall n.d.), but Alta Vista reports that these
three sites receive 143, 31, and 3 inbound hyperlinks,
respectively.3 People search for things for very different reasons, but
the Columbia site represents a concise, authoritative, and
general overview of the condition. The last result is a blog entry by
someone who is writing about her husband's recent infection,
and while it certainly may be of interest to someone who is
faced with similar circumstances, it does not represent the sort
`
`

`

CHAPTER EIGHT

Future Finding

At present, we think of search engines largely as a way to find
information. In practice, we are already using them to find
people and places as well. As we move to what has been termed
an "internet of things," we begin to move beyond an index of
knowledge, and toward an index of everything. As the sociable
web grows to include not only the services we are familiar with,
but collaborative virtual and augmented realities, the central
position of search engines in social life will continue to gain
strength.
What does that future search engine look like? There are
indications both of technological alternatives to the current
state of search, and of organizational differences. Experimental
search engines present information in a map of clustered
topics, or collect information from our life and use it to infer
search restrictions. Just as the creation of content has been
distributed in interesting ways over the last few years, there are
indications that a centralized search engine may be just one of
a number of alternatives for those engaging the social world
online.
A 14-year-old subject in a study by Lewis and Fabos (2005)
suggested "Everybody does it. I've grown up on it. It's like how
you felt about stuff when you were growing up." She was
talking about instant messaging, but the same could easily be said
of search engines. They now feed into the background of our
everyday activities and media use, only of note when they are
frustratingly absent. Search engines remain in the news
because of the clashes between the search giants and
traditional sources of institutional power. While this has held
`
`

`

`
our attention, research into new ways of wrangling the web
continues. What does the future of search hold?
In the near term, many wonder what technologies might, as
one commentator suggested, dethrone Google as the "start
page of the internet" (Sterling 2007). This final chapter briefly
explores some common predictions about the direction of
search, and what these changes might mean for our social lives
in the next decade.
`
Everything findable

Eric Brewer (2001) dreams of a search engine that will let him
find things in his chaotic office. Because we have learned to
turn to search, it can be frustrating when something is not
searchable, but every day more of the world is becoming
searchable.
The first step of this is to make all text searchable. The
computer brought about two surprises. First, productivity did not
increase; it decreased, and documents took more work to
prepare instead of less. Second, the paperless office never really
happened, and paper use has increased rather than decreased.
Gradually, however, things are being born digital and never
make their way onto paper. People bank, file their taxes, hand in
their homework, distribute memos, and publish books online.
All this digital media becomes fodder for search engines. While
it may not yet be part of the general-purpose search engines like
Google or Yahoo, eventually the contents of all of these work
flows are likely to show up there as well, at least for those who
are permitted to access them. There are even services that will
open your mail and scan it for you, so that paper never pollutes
your office. Optical character recognition (OCR) technologies
are improving to such a degree that they are increasingly able to
recognize written texts, as well as printed texts, allowing for at
least partial indexing of hand-written documents for access
in search engines (Milewski 2006). Under those conditions,
Brewer's searchable office is almost here.
`
`

`

`
Things that were recorded in books, on audio tape, and on
film are gradually being digitized, and often opened up to the
web. Major book scanning projects by Google, Amazon, and
the Internet Archive (supported by Microsoft and Yahoo) are
aiming to unlock hundreds of years of printing and make it
searchable. Once images, audio, and video are scanned, the
question is how they will be made searchable. Especially now
that do-it-yourself video is so common on the web, finding a
way of searching that material - short of relying on creators
and commentators to describe it in some useful way - has
proven difficult. A great deal of current research is dedicated to
extracting meaningful features from video, identifying faces,
and recognizing music and speech.
A number of companies are working at making sense of the
continual streams of data that are available. BBN Technologies,
for example, has created a Broadcast Monitoring System,
which detects speech and translates it in real time. It makes it
possible to search for terms in your own language and find
whether they have been mentioned in a broadcast anywhere in
the world.
Some of the greatest innovations over the last few years have
been in moving the once arcane field of geographical
information systems into the public eye. Not only is there the
possibility of using a searcher's geographical context to find "what's
near me," but mapping and visualization products allow for far
easier searching for locations, and navigating to those
locations. Mapping will continue to improve with higher
resolution and more quickly updated views. Google's Street View
makes navigating through cities far easier, and no doubt this
will continue to expand. By identifying the times and places
videos and photographs were taken, it is possible to compile a
profile of a location, assembled through a number of disparate
records. Experimental robotic airships designed to deliver
city-wide wireless connections also promise real-time aerial views
of a city, a staple of science fiction (Haines 2005; Williams
2005). Google's investment in 23andme, a site that provides
`
`
