throbber
Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00481
`
`

`

`Copyrigln e 2:003 Pietri' Baldi. P..l!Cllo Pt3SCOtli :u:td P3dlu:'lic Smy1h
`Johil Wiky & Soil§. Ltd. The Atrium .. So111hetn Gate, Chichestet.
`Published by
`West Su$So!.X P019 &SQ. England
`(+44) 12:43779171
`Plum~
`
`Email (fot or'ders and C~K-tomer service etlqui.tic~): ~boob(iwUey.ro.uk
`Vi!l.it out Ho•ni': Pste·OO www.wUeyeurop;>.('(lm Of ..... -ww.witey.com
`
`All RigbiS Re~tved. No psrt of this publication may be l't'ptoduetd. uoi'OO in a rettieval sysum or
`tr.ltlsmitti"d tn any fotm ot by any 1ll!e:ltlS. eJecw:mic. m.ech:utical photocopying. ti'eotditlg. s..~m\it1g ot
`ol.M:twise. exttpt undet 1M tetms or the CopyrigbL Desig.n..~ and P::lttt'lu Act 1988 ot undet tbe ~etms of
`a licetltt issued by the Cop)•right lictnsing Agency Lid. 90 Tonenham Coon Rood. l.():udocl WIT 4LP.
`UK. ~·i thoul the pennissioo in writi.t~& of the Publisbu. Requests to the Publlsher sbou.ld be addtessed
`10 the Per1niJ>doos JA_>p:~,tment. John Wiley & Sou.~ Ltd. '\"he. Aulurn. Soothem Gate. Otiebe:~et. West
`su.~~x P019 8SQ. El~&l:ltld.oten~o:~iled to pennreq@wiley.oo.uk. ot faxed(() (+44) 12:43770620.
`
`Tbis publication is cksigMd 10 prcwidc accurate :ltld aulhotit:ui ve lnJonootJoo itl regard to the subject rtt.:iuer
`CO\-tml. ll is soJdoo 1M undm.tanding th:u 1.be Publi~ber is not eng:ageditl rendering ptofi'ssiou~l set\'ires.
`If profe$Sio.tit.l advltt or otl)l!( expen a<;sist:ltltt is ti'quit.;-d, the seo.•icts of a rornpeu•.nt professional.sboold
`be !lMlght.
`
`Oth" wu~.,. EtlitiN'Ittt O.Qkn
`
`John Wiky & Sons ln.-.. l l l Rhw St~t. Hoboti'n. NJ 07030. USA
`
`Jossey-Bas.~. 989 MMW Stn.~L. S:ltl Pr:tudsro. CA 94!03- 1741. USA
`
`\VIky-VCH Vetlag O.nbH. Boscbsu. 12:. D-69469 Weinhl'im .. Gemt:ltly
`
`Jobn Wiley & Sons Australia Ltd. 33 Pad: Road. Milton. Q""""n.<;J:ltld 4064. Austr:llla
`
`John \Viky & Sons (Asia) Pte Lid. 2: Ckmetlti Loop fi!02:-()I, Jit1 Xing Distrl.~•1:. Sing~ 1298()1)
`Jobu Wiky & Son!l. Csnada Ltd. 2:2: \\~cester R03CI. Etobicoke. Otll.3rio. CaWa M9W 1 Ll
`
`Witty also publishes its books tn a V3:f~ty of electronic fonuau. Some ro.ruent tha1 :lppt'at!l. in pri1U may
`f)Ot be 8\'llilable in electronic books.
`
`Brlt/sll Ubru.ry Ctuak>gul~tg In hblkttJi(nt Dtm~
`
`A cat~Jos:ue. record for this book is :l\'8Uable !torn the Bri1i~b Libtaty
`
`ISBN 0-470-849()6..1
`
`'Jype~t in I<Y12:pt1imes by T&T Ptoductioos ltd. L.oudotl.
`Ptlnted and bou1:td in G~at Britain by Diddles Lid. GuildfOotd. Su.wy.
`Tbis boot is pti..nted on acid-fn.~ papo.->r responsibly ounuf::u:1utro fronl sustail)3h~ fOoti'Sll')'
`in which at le:l<;t t'A'O trees a:ti' plamed fill' ucb ooe used fot paper productiOil.
`
`

`

`5
`
`Link Analysis
`
`Computing the relevance. of a document is a major issue io Web-based in.form.ation
`retrieval. As we h.ave seen in Chapter 4. when considering unsttucrured collections
`of documents, it is possible to compute relevance with respect to user quedes such
`as those involving keyv.·ord-based Boolean expressions. or those involving rnea..c:oures
`of similarity between documentc:o. In each of these cases the score is a functiOJl of
`the document aod the query. Latge collections of hypertext such as the Web ate
`mote i nteresti ng in t11h; respec.t since rhey allow us ro compute scoring functions that
`inc:lude topologic.aJ information about the hypestext graph. The basic assumption is
`t.h.at hyperlinks comain infonnation about the humrul judgme.Jll of a documenL To
`a 6rst approximation, the more incoming links that exist for a document, the more
`likely it is that the document wa.~; judged to be 'important' by the authors of ot11er
`documents linking to it.
`lncoming links e.mbody a common oolio1l of popularity that also e:<iSLIO in other
`domain.s or i1l other webs. Nern·or-ks of interaction h.ave bee.n studied for a long time
`in soc-ial sciences (Wa.~;.serman and Faust 1994), where nodes correspond to persons
`or organ.iz.alions, and edges represent some type of social intetaction. I.Jltuitively,
`incceasing the number of incoming links to a node should iocrea.~;e a con1mon-sense
`mea..o;ure of strulding or popularity or prestige for that node. Howevet. it should be also
`common sense that just counting t11e number of lioks does not 1\ecessarily provide an
`accurate mea.~; w-e of standing. For example, measuriog the prestige of an enterprise
`b)' the mere number of its c-lients could be misleading, since ditl'ece.Jlt clients may
`have \'el')' different weighK
`An impor[a.llt example is the oetwor'k obtained by coosidel'ing the .scientific lite-ra(cid:173)
`lure. Nodes in dlis case are papers. books. or entire joumal.s. and edges c-orte$pond to
`c itation.~;. h makes se.nse to assume that the more citation.o; a pape-r or a book receives,
`the more it can be a.~;sumed to be important. since it ha..o;; beeo judged as useful by other
`sc.ientis-ts. llte systematic COJlSlJ'UCtion of such networks through citation indexes was
`introduc-ed by Garfield (1955), who later proposed a mea.~;ure of standing for jounlals
`lltal is still in u.10e. l11is me.as.ure, called imtntctfacwr (Gat-field 1972). is defined as
`lhe avernge oumber of citatioo.s per recently published itern. More prec-isely, if C is
`llle total number of c.imtions in a given time imerval(t. 1 + taf to anides published
`MIJddlng lhrlnl~m~l unJ lhr \\W.. P. Baldi. P. Ft:l~ a•"'- P. Smyth
`0 2003 P. &ldi, P. ~ti :md P. Smyth
`ISBN: 0-47~84906- 1
`
`

`

`126
`
`EARLY APPROACHES TO LINK ANALYSIS
`
`by a g iven j ournal du1·ing [1 - !z, 1 ] . and N is the total number of artic.les p ublished
`Iz.l], the. impact factor is detined as C{N (typically It = J
`by th.at journaJ in 11 -
`yeat and 12 = 2 years). Thus. the impact factor is a very simple measure, s-ince it
`ba.~ically cotresponds to the normalized iodegtee of a joumal in a subgrnph of the
`citation network. Graph-based tink aoalysis foe the scienti6c literature goes bac.k to
`the 1960s (Gatnec l967). However. these ideas were JlOt exploited inlhe developme.Jll
`of first-gene-ralion Web search tools.
`The paper by Bray (1996) reports an early attempt to apply sociaJ nerwotksooncepts
`to the Web. He suggested a We.b vi.5:ualization approach where the ' .. . appearance of
`a site should reflect i t(i \'i!iibiliry, a(j measured by the numbe.r of other site.~ that have
`pointei'S to it . .. ' and ' . . . il~ lwnino.fity, as measured by the nwnber of pointers with
`whic.h it casu oa"igatiooallight off-site . . . •. Visibility and luminosity detioed io this
`W3)' are direc.tly related ro the indegree and llle outdegree of websites, re.~pectively.
`More recently, toward the end of the 1990s. link anaJysis methods became more
`widely known and used itl a searc.h engine come.~l, leadiog ro what is sometimes
`c.alled the second genen11io" of Web searching tools.
`11lis chapter reviews the most common approaches to link analysis and how these
`rechn.ique.~ are be applied to compute the popularity of a document or a site. The aJgo(cid:173)
`rithms presemed ill this chapter extract emerge.Jlt propertie.~ from a complex netwot•k
`of interconnectioo.s. attempting ro model (indirectly) subjec.tive human judgmems.
`ll re.main.~ debatable as: to whether popularity (a.~ implied by the mechanism of cita(cid:173)
`tions) captures well the notions of relevance and quality as lhey are subjectively
`perceived by humatts, and whether link analysis algo•·ithms Catl successfully model
`human judgl'netlK
`
`5.1 Early Approaches to Link Analysis
`
`Our notation for hypertext will be .straighrfotward. For each vette.~ v ill the hypertext
`graph G = (V. £). d(v) denotes the 0011tents of the docutnellt at venex u. (f d(v)
`is considered to be an isolaled document, then iL~ .sc.ote with respect to a query q
`i.s .t(v I d(v), q). Whe.o c.onsideritlg d(v) in ils hypertext oon£ext. the .score shouJd
`alsodepe11d on G and will be denoted as S(v I d(v), q, G). In the following, we will
`.simplify the 110tation oflhese scores by ju.st writiog .f( v) and S( v) if the dependencies
`Oil d(v). q. and G are obvious from the context.
`To quantify visibility (Jumino.sity) S(v) couJd simply be designed to grow with
`lite indegtee (outdegree) of vas hhued by Bray (1996). Cle.arly, however, such an
`approac.h suffe.ts from a fuodamemal limitation: it would fail ro capture tJte relative
`importa11ce of diffetent pare.nts (dtildre.Jl) ill lhe graph. For example, a Web page
`with a .sraallnumber of lioks comillg from importrull .sile.111 s.hould be considered mote
`popular than a Web page with a latger number of links and whose sources are all from
`unimportrult or irrelevant sites. Hence. rather than a me.re count, popularity should be
`computed as a weighted sum of tJte c-itations a docume.m receive.~ through hyperte.xt
`links. ldeas having this flavor are less recentlh.an we might e.xpect.
`
`

`

`LINK ANALYSIS
`
`127
`
`11:te use of hypene.xt illfonnation in information retrieval is older than the Web.
`Mark ( 1988), for example, wasconcenled with retrieving hypertext cal'ds ill a medkal
`domai11 a11d no1ed that • ... often cards do not even mention what they ate about, but
`assume that l11e ce.ader understands the c.ome.xt because he or she ha.'> read ettrlier
`c.ards.· He the-n ptoposed a simple aJgoritJun for sc.oring docume.JHS where relevance
`infonnation was traosmined fcom documents to their patems in the hypet1ext graph
`G. More precisely. the •gtobaJ' score of v giveo the query atld the £Opology of G was
`computed as:
`S(u) = s(u) + - - L S(w).
`
`I
`
`(5.1)
`
`[ch[uJI w<J<hi•JI
`This simple algorithm soroewh.al cesembles message passing sc.hemes that are very
`cotnn\On in connectionism (McClelland and Rumelh.art 1986) or in graphiC'.al mode-l(cid:173)
`i ng (Pearl I 988). As such. it require..;; G to be a OAG so that a topological sort 1 can be
`chosen for updating the global scotes S. llte DAG as..sumption is reasonable in small
`hypertexts with a root documem and a rehllivel)• sltong hierarchical structure (iJtthis
`case, eve11 if G is 1101 acyc.lic, not mudl infomtati on would be lost by replacing it with
`i ts spanning r.ree). The Web. however, is a large and comple-x graph. This may explain
`why search engines largely ignored i ts topology For several years.
`11:te paper by Marchioti ( I 997} was probably the first one to discuss the quantitative
`concept of h)ipCr infumrmiou to complemem rext1uil injonnttiion in order to obta.i11
`lhe overall information contaioed in a Web document. The idea somewhat tesembles
`frisse's approach. l ndeed, if we rewrite Equation (5.1) as
`S(v) = s(u) + ll(u),
`then J(u) crut be thought of a.~ d:te textual infonnation (that only depends on the
`documem and the query). It( v) corresponds to lite hyper infonnation that depends on
`d:te link sttucture where v is embedded, and S(v) is the overall i 11formation. Mate-hiOti
`( 1997) did oot cite Ma~k ( J 988). butllOiletheles...;; he identified a fundame.ntal proble-m
`with Equation (5.1). If an irrelevant page v has a s·ingle link to a relevam page w,
`EquatiOJl (5.1) implies that S(v) ~ S(w). Tite scenario would be even woi'Se in a
`ch.ain of documems t.'(). u,, . .. . v~;-. Here if S(vk) is ver)' high but S(t\>) .. . . . S(v,._, )
`are almost zero, dten t.tl would receJ\'e a global score highe-t dlan VI:, even lhough a
`user would need k clicks lO reach the important document.
`As a remedy. Marchjori suggested lllill i ll this case dte hype-r infotmmion of vo
`should be c.omputed as
`h(u) = L F'(•.w>s(w).
`
`(5.2)
`
`(5.3)
`
`U!E~tl{vj
`
`lchl vJI) is the rank
`where F e (0. I) is a fading oonstaLtt a11d r(v. w) e ( I .. .
`of w after sorti ng (in a..;;c.e-•tding order) the c.hi ldre!l of u according to the value of
`
`1 A 1opolog.ical SOtt is 81Hitdetin& '<' of1he verlices such th:ll v < v' if and only if there is a dtre<:led
`p:.tb !tom v' to u (Connen t't a/. 2001).
`
`

`

`142
`
`PROBABILISTIC liNK ANALYSIS
`
`Section 5.3). It can be shown that the pages in a community connected in this way
`can be ranked highly by HITS. higher than page.111 in a much latger colleclio•l where
`ooly !UJme hubs link to some authorities. Clearly the TKC errect could be deliberately
`created by spammers intere..1111ed in pushing the trulk of certai11 webs-ites. lempel and
`Momn (200 1) consLntcted examples of oommuniry pairs C., C01111ected in a TKC
`fashion. and c, spatSely COilJlected, aod proved d1a1 authoritie.lll of C$ are ra..1ked
`above ~te autho rities of C1 by HITS but not by SALSA.
`hl a similar vein. Rafiei and Mendelzon (2000) aod Ng e1 a/. (200 I) have proposed
`varirum of t11e HITS algorithm based Oil a tandom walk model witJl re.o;et. similar 10
`the ooe used by PageRa11k. More precisely, a tandom surfer staru at time 1 = 0 at a
`tandom page and subsequently follows links from the cumm page witJl probability
`I - s , or (s)he jumps to a new random page widl p'I'Obability E . Unlike PageRa11k.
`in this model the surfer will follow a forward link on odd steps but a backward link
`oo even steps. For large 1. two statiooary distribmions re.111ult from this rru1dom walk.
`one fot odd \'alues of 1. that corre.~ponds to aJl authority distributi011, and one for
`even values of 1 that cotrespond to a hubness distribuljOI\. In \'ector llOtation the two
`distributions are proportional to
`il:ZJ+I = £1 +(I - £)Arlt:z,,
`h11 = d +(l - £)A~a:zt-l ·
`
`(5.21)
`
`(5.22)
`
`The stability prope.rties of these ranking distributions are similar to tJtose of PageRank
`(Ng <Ill/. 2!Xll).
`Some funher impro,•eme.ns of HITS ru1d SALSA, a.111 well as theoretical analyses
`011 the prope1tie.111 of the.o;e algorithms can be found in Borodin e1 fll. (2001 ).
`
`5.6.2 PHITS
`
`Cohn and Chang (2000) point out a differe.nt probte.~n with HITS. Sioce only the
`ptincipaJ eige.ovecmr is extracted. t11e authOI'ity along the remaining eige.nvectors is
`completely neglected. de.lllpile the fact that it could besig•titicanc An obvious approach
`to address this limitation consi.stlll of taking into account several eigenvectors of the
`co.citatiOil matrix, in the s.ame s-pirit as PCA is used to exttacl several factors that
`are responsible for variations in multi\'atiate data. As we h.ave discus..o;ed ill Sec(cid:173)
`tion 4.5.2. however, the statislical as..111umptions UJlderlying PCA are not sound for
`multinomial data such as tem'H:I.ocument occurrence..111 or bibliographical citations.
`PHJTS can be vie\\•ed as probabilistic LSA (see Sectio11 4.5.2) applied to co-ciLalion
`and bibliogrnphic coupling mauices. In this ca111e citations replace temts. As in PLSA.
`a dOCUJ)'lelll d is ge.nerated accordillg to a p'I'Obability disttibut.ion P(d) and a 'la.te•n·
`variable .z L~ then anached 10 d with probability P(l I d). He.re z could represe.ot
`resea..'Ch ateao; (ill the c.ase of bibliographic data) or a (topical) con1munity in the
`case of Web documents. Cit:ltions (link.111) are t11en chosen according (0 a probability
`dillltribution P (d' l ;::) .
`
`

`

`LINK ANALYSIS
`
`143
`
`Figun- 5.7 A link: famt. Shaded nodc-..s an: all copies of the same page.
`5.7 Limitations of Link Analysis
`Search engines can be 'spammed' by websites faking hjgh rele\'ance with respect to
`some topics for lhe sole purpose of attracting visitors. Matchiori (1997) c.alled tJlis
`phenomenon Search Engine Pers:uasioo (SEP). f i rst-genera6on engi1les that ba.c:;ed
`their tanking on classic-.al information retrieval mea.c;ures were c-.Jeatly very sensitive
`to SEP. ComnlOn eatly techniques used to fool tJtese eogines included the use of
`in.appropriatesite tides or descriptions. the inclusion of META key"'·ords in the HTML
`code, or eve1l the use of extra text invisible to humrul sul'fe.rs. Link analysis was
`immediately recognized as a solid defense against SEP. lo their seminaJ paper on
`PageRrulk. Page'' a/. ( 1998) sta[ed thal
`. . . foe a page m get a high PageRank. it must co•wioce rul imponam page.
`or a lm of non-important pages to link to it. At worsL you can have manip(cid:173)
`ulation ill the form of buyi•tg advel'tisemem.s Oinks) on important sites.
`But this seems well under comrol since it costs mo1ley. This i1nmunily
`to nunipulation is an extremely important propetty.
`In the inte.rveniog (itne period, ranking page..;; u...;;ing link analysis bas become the Slan(cid:173)
`datd approach followed by all the 'sec.ond ge•leratioo' search engioes. Con..;;equently.
`website owners have realized the e.normous impo11aoce of maximizing PageRa.nk or
`similar lillk-based scoting fUJlC.[ions ill order [0 increase visibility. For some websites.
`having high search engine ranking (with respect to some keywords) is so importaJlt
`th.at it juslifie..;; COL1.11iiderab1e financial iovestme .• us. Buying links a..;; a fonn of adver(cid:173)
`tisement ha~ become ceality and pechaps ill some ca..;;e.~ it is a seosible aJte-rn.ative to
`payiog search engine c.ompanies directly ror advertised liok.~.
`A link farm is a densely connected Web subgraph artificially buih for the purpose
`of accumulating PageRank (oc similar mea~ures of popularity). Link fam1.s c.an be
`buil( in ma..1y ways bu[ link exchange is a common approac.h. A website owner who
`
`

`

`144
`
`liMITATIONS OF liNK ANALYSIS
`······ · ..
`... ············
`.. . · . ·
`)--"-:::::::~~ \ .
`·-•. · ..
`
`.·
`..... ··"
`
`. ...
`·······
`woa
`
`/
`
`·····
`
`G
`
`F iJ;Urt' 5.8 CanonicaJ dccompooition of t he Web relntive co a subgrnph (;.
`
`decides to join the fann agcee.'\ to store a copy of a 'hub' page on her server and to
`link it from the root of her site. ln rerum. the m<~ill URL of her site is added lO the
`hub page. which is iJl tum redisttibuted to the sites participatl1tg in the link exchange.
`Tile tesuh is a de.nsely connected subgtaph like lhe one shown in figure 5.7.
`ll may appear that, s ince struc.tute.'\ of this kind ate highly regular, they should
`be rehuively easy to detec.t (see Exercise 5.8) and lhus liok famting should nm be
`a serious concern for search e11gioes. Howe,·er, it is possible to boHd fanns lltat are
`more tightly eotangled in the Weh and are thel'efore more di fficult to detect by simple
`topological anal yses. This problem ha<> been recently poilued out by Bianchini eta/.
`(200J), who ha,•e shownlltat e\'ery community, de.fined as an arbiuary subgraph G
`of the Web, 1HUS'I s.alisfy a special fotm of •energy balance'. The overall PageRan.k
`assigned to pages in G grows with tlte 'energy' that flows i o from page~<> linkiog to
`llle community and decreases with the energy dispersed in sinks and passed to pages
`out<> ide tlte community. With reference to Figure 5.8. let Gout de.note the subgrnph
`of G ioduced by page..; tJ\at contain hyperlinks pointing outside to G and let Gstnt
`de-1101e tJte sink subgtaph of G. Also. let Win be the subgrnph induced by the pages
`outside G that l in.k to pages in G . Then the equation
`UrG II = "''IGI + E(; - Ed'"k - £Gu.'.
`holds, whece criGI is the 'defauJt' energy that is assigned to the community,
`. 1-• "
`£('; = - - ~ [G(w)r(w)

`wrEIV;•
`
`(5.23)
`
`is the energy flowing ill from outside. whe-re fc(w) is the frac.tjon of lioks in w that
`poillt to pages in G,
`
`1 -£ "
`£~' = - ,- ~ ( I - /c(w))r(w)
`wEGw,
`
`is the ene.rgy flowil1g out to pages outside the community, and
`
`£~llk = 1 ; S L r(w).
`
`~rtt~EG$ifll:
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket