throbber
102
`
`ACM ECHT CONFERENCE
`
`Making Use of Hypertext Links when Retrieving Information
`
`RP. Frei, D. Stieger
`Swiss Federal Institute ofTechnology (EI'H) Zurich
`Department of Computer Science
`8092 Zurich, Switzerland
`
`Abstract
`Hypennedia links were invented to support the manu(cid:173)
`al browsing through large hypertext or hypermedia
`collections. However, J:e1lieving specific portions of
`inf'onnation in such a collection caimot be achieved
`by browsing only; J:e1lieval mechanisms are neces(cid:173)
`saey. In this paper we show how to use the semantic
`content of hypertext linkS for ~eval., We present
`special plllpOS8 indexing and retrieval algorltbms tbat
`exploit the node and link content. First retrieval re(cid:173)
`sults in a hypertext test colleCtiOn am preented: the
`reSults am cleady better than those obtained when the
`links are ignored The hope is that these results can
`be extended to h~edilrinfmmation and that they
`can be impxoved by mme sophisticated indexing algo(cid:173)
`rlthms.
`
`Introduction
`1
`The main idea of hyper documents is that documents
`- or parts thereof- can be brought into relation to
`each other and that additional infODDation may be at(cid:173)
`tached to any part of a document. The document
`pans are called nodes. When these nodes are con(cid:173)
`nected by links a hyper c1ocmnellt web results. Each
`hypermedia node constitutes an information item.
`Complex nodes are paditioned into simpler ones and
`-·at the lowest level -
`are simple linearly orga(cid:173)
`nized hypermedia nodes of a. single media type.
`'IbCse nodes and the links connecting them constitute
`a~ gmph. A link~ (n1, ni) xepresents a con(cid:173)
`nection from the somee node n1 to the des9nation
`noden2.
`
`PermiSIIion to eopy without fee all or pan of this material
`is granted provided that c.opiEB are not made or distn'buted
`for direct commr:rclal advantage, the ACM copyright no(cid:173)
`tice and ihc title o1' the publicatiOil a~:~d it!~ date appear,
`and notice .is given that copying is by permi11ion o1' the
`Association for Computing Machinery. To copy or.herw!Se,
`or to 'republish, requilu a. feel and/or spec:ific permission.
`@1992 ACM 0.89791-S47-X/S2/0011/0102/ $1.60
`
`Moat of the conventional Information Retrieval
`(lR) algOrithms have been developed for searcbing in
`large linear text collections [11]. They are not suited
`to n:trleve infonnation in non-linearly organized hy~
`per collections. If they are applied to the individual
`linear nodes of hyPer collections, the hyper structure
`that contributes a great deal to the content of a
`-
`is simply ignoxed and the re-.
`h~r coDection -
`trleval results·are accordingly poor. In addition, con(cid:173)
`ventional IR algorithms are not suited to retrieve
`non-textual information. iiS they employ teXtual de(cid:173)
`scriptors as indexing features. To retrieve informa(cid:173)
`tion in multimedia environments, suitable features
`have to be introduced for every individual medium as
`done for speech documtmts [5].
`This paper focuses on the problem of explo.iting
`'the Jinks when specific content-related infonnation is
`to be retrieved. We present new IR algorithms that
`make use of the semantic content of the liJJb in(cid:173)
`volved. Currently, we are considering text informa(cid:173)
`tion and textul!l descriptors exclusively. We hope
`that the methods developed are geheral enough to be
`extended to non-textual features and thus to real hy(cid:173)
`permedia collections.
`
`2 Hypertext Nodes and Links
`
`2.1 Hypertext Information
`We aheady pohited. out that hypertext information
`consists of two parts. FJI'St, thexe iS the information
`contained in the hypertext nodes. Second, the inter(cid:173)
`connections between the nodes, the links, define the
`structure of the hyper document and the nature of ev(cid:173)
`ery partiCular inter~ection [6]. Therefore, m al(cid:173)
`gorithms have to deal with both parts: nodes and
`Jinks.· In this paper, we concenttate on using the
`semantic content of links and employ weB-known IR
`algorithms to deal with the textual information con.
`tained in the nodes.
`.

`Links allow the user to discover rela.tionsbips that
`are difficdlt to determine without a hyper structure.
`We distinguish two types of Jinks [3]:
`
`EXHIBIT 2030
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00479
`
`

`

`MILANO, NOVEMBER 30- DECEMBER 4, 1992
`
`103
`
`- referential links,
`- semantic links.
`The main purpose of referential links is comfortable
`reading of the document. The purpose of semantic
`links is to point to similar, more detailed, or addi(cid:173)
`tional information. The reasons for establishing
`such semantic links contribute to the topic descrip(cid:173)
`tion of the link.
`Fig. 1 depicts a small example of a hypermedia or
`hypertext collection. The small part of the collec(cid:173)
`tion shown includes the two hypernets Nt = {nt,
`D}.}. Dl.2• D2, ns, D6, ng} and N2 = {n3, 114, n7}.
`The nodes nu and n1.2 are sub-nodes of nt; they
`structure node nt similarly to the way two paragraphs
`structure a chapter. The links connecting n1 with
`these two nodes are referential links. On the other
`hand, the nodes n3 and n7 contain information related
`to the information of node 114· This is the reason for
`the semantic links pointing from 114 to n3 and from
`n4 tO n7.
`
`r::--1 node
`~ refereotiallink
`~ semantic link
`
`Fig. 1: Referential and Semantic Links
`
`To facilitate content-specific retrieval, nodes and se(cid:173)
`mantic links have to be indexed. The problem of
`how to index independent text nodes and queries has
`been discussed thoroughly in theIR literature [11].
`This is why we concentrate here on the indexing of
`links and on the subsequent usage of indexing infor(cid:173)
`mation.
`
`2. 2 The Nature of Links
`Referential links serve the same purpose as foreign
`keys in a domain of a relational database. Therefore,
`they do not provide additional information to the
`topic of the document. They are plain pointers to
`improve both reading and browsing. Their use for
`information retrieval purposes is not discussed in this
`paper.
`Conversely, semantic links provide additional in(cid:173)
`formation on the topic of the hyper document web.
`Semantic links point to nodes which would be diffi(cid:173)
`cult to f"md otherwise. Such nodes contain special
`
`annotations, corrigenda, or similar, contradicting,
`generalizing, specializing, or simply additional in(cid:173)
`formation. Every link has associated some well(cid:173)
`structured attributes like creation time, author name,
`and the like. The information associated with a link
`may be consulted both by a 'reader' when browsing
`through the document and by a retrieval algorithm
`when processing a query.
`A link consists of the following components:
`A. = <t, I, Is, ld>,
`where
`t
`is the link type
`is a set of structured link attributes
`I
`(auxiliary link information)
`Is signifies the source node of the link A.
`ld signifies the destination node of the link A.
`The link type t specifies whether the link is of type
`referential or semantic. In addition, the parameter t
`could be used later to distinguish more than these two
`types, in particular to distinguish several subtypes of
`semantic links. The intention is to restrict ourselves
`to a few link types so that their semantics may be
`understood fully by authors and users. This is in
`sharp contrast to other hypermedia paradigms which
`are based on up to 80 different kinds of links [15].
`
`2.3 The Link Description
`We associate a link description i with every semantic
`link. This description contains mainly the content
`related reasons for the existence of the link, usually
`expressed by some topic descriptors (fea!_ures). It is
`the result of an indexing procedure A. ~ A. that takes
`the neighboring nodes of the link into account, in
`particular the source and destination nodes. In addi(cid:173)
`tion, the content-specific link description can be ex(cid:173)
`panded or even changed by users when they read or
`browse through the hypernet. New links can be es(cid:173)
`tablished by users when content-related relations were
`not recognized by the initial authoring and indexing
`processes.
`-
`The link description A. is a vector with feature
`weights as components. In the case of purely textual
`nodes (e.g. hypertext), the features are usually
`weighted terms or phrases.
`As hyper collections result from a dynamic pro(cid:173)
`cess, some nodes are more densly linked than others.
`Specifically. older nodes tend to be well linked
`whereas newer ones are often poorly linked. Thus
`we believe that the absence of a (semantic) link does
`not imply necessarily the absence of referential or
`semantic relations between the nodes. , If there is no
`link we a:.sume that possible dependen~ies are un(cid:173)
`known. This is in contrast to Frisse's interpretation
`that says "the absence of a link (or a path of links)
`between two cards asserts that the utilities of the two
`
`

`

`104
`
`ACM ECHT CONFERENCE
`
`'i. = <lo, ... , lm-1>
`
`cards are conditionally independent" [4]. The prob(cid:173)
`lem of missing links becomes apparent when a hyper
`document collection is in use: Poorly linked nodes
`are less likely to be found when browsing through
`the collection. Sparsely linked nodes also hamper
`those automatic retrieval methods that use links.
`In what follows we assume that the indexing vo(cid:173)
`cabulary contains m tel!1ls. The node description ii
`and the link description A. are therefore given by:
`ii =<no, ... , Dm-t>
`, where nj with 0 S i < m
`are the term weights of
`noden,
`, where li with 0 S i < m
`are the topic descriptor
`weights of the link A..
`The link description 'i. depends mainly on the link
`type t E T, on the source node Is, on the destination
`node ld, and possibly on an user expansion ii repre(cid:173)
`sented by a feature vector:
`ii =<uo, ... ,um-1> ,whereujwithOSi<m
`are the weights of descrip(cid:173)
`tors provided bY users.
`We define a simple link indexing function i:
`i:TXJRffiXJRm~ Rm,
`(t, fs, id) H a (j (t, iS, ld))
`where f ; T X R mx R m ~ lR m is a function
`modelling why the semantic link was estab(cid:173)
`lished; the link type t may play an amplifying
`or negating role. The purpose of this func(cid:173)
`tion is to find common abstract concepts
`(expressed by weighted indexing terms) in the
`source and destination nodes. A similar idea
`was followed by Croft and Turtle in their
`probabilistic hypertext retrieval model [2, p.
`219].
`a : R m ~ R m is a (non-injective) map(cid:173)
`ping funcf!on reducing small components of
`the vector A. to 0. As the number of links in
`a hyper document collection is usually much
`larger than the number of nodes, the aim is to
`keep link descriptions as compact as possible.
`The correstion function iu takes an existing link de(cid:173)
`scription ~ and a user e~sion 'ii in order to create a
`modified link description A.':
`iu: JRffiX JRID ~ JRID,
`<'i.. u) H a <~+ii)
`More elaborate link indexing functions would also
`take information outside of the immediate neighbor(cid:173)
`hood of the two nodes into account. In this way, a
`weak destination node followed by promising descen-
`
`dants would also get a chance to be considered by a
`retrieval algorithm.
`
`3 Retrieval Strategies exploiting Hy(cid:173)
`pertext Links
`
`3.1 General Considerations
`When users browse, they normally follow only a
`small number of the existing links. Conversely, a
`retrieval algorithm may follow many links and may
`create both high retrieval costs and doubtful results.
`Retrieval experiments in a collection of bibliographic
`references showed that following citations-a .kind of
`referentiallinks-p:oduces ambiguous results [8, 12]:
`. .. An evaluation of the process shows that many
`useful content words can be extracted from related
`document titles, as well as marry terms of doubtful
`value . . .
`[12, pp. 385]
`The hope is that our semantic links contain the in(cid:173)
`formation necessary to decide whether a further node
`should be visited by the retrieval algorithm or· not.
`The prbposed automatic navigation through the hyper
`web is governed by the following considerntions:
`the further away from the initial node the IR al- .
`gorithm searches the less likely it is to find suit(cid:173)
`able information;
`links are only followed when they promise to
`point to nodes containing information relevant to
`the query;
`it may become mandatory to visit a specific node
`(e.g., because of given restrictions during there(cid:173)
`trieval process), depending on the retrieval re(cid:173)
`somces available.
`In this way the descriptions of semantic links control
`the navigation process of the retrieval algorithm.
`The existence of such link descriptions and their use
`for retrieval purposes constitute the main difference
`between our retrieval algorithm and approaches de(cid:173)
`scribed elsewhere [4, 8, 9, 13].
`
`3.2 Retrieval Algorithm and Retrieval
`Costs
`A node nj is said to be adjacent to node Dj if and only
`if there exists a link A. (ni, llj) or a linK A. (nj. ni).
`Given a link A. (nit Dj), Dj is said to be the destina(cid:173)
`tion nOde of this link {or,less precisely, a destination
`node of the node nj), conversely ni is a source node of
`Dj. The outdegree of a node n is the number of links
`with n as source node, the indegree is the number of
`links with n as destination node.
`The retrieval algorithm determining a Retrieval
`Status Value (RSV) between the query q and the hy(cid:173)
`pemet node n includes an initialization and a naviga(cid:173)
`tion phase:
`
`

`

`MILANO, NOVEMBER 30- DECEMBER 4, 1992
`
`105
`
`{standard retrieval}
`{initial RSV}
`
`{hypertext retrieval}
`
`{navigation procedure described below}
`{result, to be sorted}
`

`
`InitialiZation phase, step <D
`(])
`for all rwdes n of collection do
`RS~~ :=similarity (q, n)
`end for
`for all rwdes n of collection do

`rsv := RSV'l;
`navigation(n, rsv, 1)
`IniettList ( q, n, rsv)
`end for
`• Navigation phase, iteration of decision step® and navigation step @.
`procedure navigation (inn, in/out rsv;in distance)
`if(outdgree (n) > 0) and (distance ~maximum_distance) then
`for all destination nodes n' of node n do
`.
`.
`if sim (q, .t (n, n')) > threshold_value then
`updatersv
`navigation (n', rsv, distance+ I)
`end if
`end for
`end if
`We use the definitions for walk, trail, and path in the
`same sense as they are used in graph theory. If (no.
`n1, ... , lid) is a sequence '?f nodes of a hypemet H,
`such that ni is adjacent to ni+ 1 'V O~<d, then these
`nodes and the corresponding links are called a walk of
`length d. If the links are distinct, this walk is called
`a trail of length d. If all the nodes of a trail are dis-
`tinct, the trail is called a path of length d.
`The navigation distance is the minimum path
`length from a given reference node n in the hyper
`web to any node that can be reached from this node.
`If a maximum navigation distance is given, the re(cid:173)
`trieval algorithm ignores those nodes whose distance
`from the reference node exceeds the· maximum dis(cid:173)
`tance.
`We assume a homogeneously linked hyper colleC-
`tion with nodes of an average outdegree of k. If a
`node traversal starts from a given reference node n,
`the number of traversed nodes of all possible paths
`with length d grows exponentially. To limit there(cid:173)
`trieval costs, a navigation strategy with restrictive
`propagation must be found.
`
`Two different hypermedia retrieval strategies can be
`distinguished:
`a) A user who knows little about the hyper collec(cid:173)
`tion is looking for an entry point convenient to
`start browsing. Here the aim of the retrieval
`strategy is to retrieve nodes that represent the in(cid:173)
`terest of the user. For this purpose an exhaus(cid:173)
`tive search through the network is necessary.
`The potentially large search effort can be·iimited
`by taking into account the descriptions of the
`semantic links.
`b) The user is "sitting" on a node satisfying some
`of her or his information needs. In this case an
`automatic navigational search along the hyper
`structure is performed to find further useful
`nodes.
`In what follows we describe these two strategies,
`namely, the exhaustive search and the navigational
`. search along semantic links.
`
`Fig. 2:.Navigation Distance
`
`

`

`106
`
`ACM ECHT CONFERENCE
`
`3. 3 Exhaustive Search
`The aim of the exhaustive search algorithm is to re(cid:173)
`trieve nodes that are in the focus of the user's inter(cid:173)
`est. Each node in the hyper collection is regarded as
`a reference candidate from where a search may start.
`~ blltializationpha~:
`An initial RSV'l~ is obtained by determining the
`similarity between the reference node n and the
`query q given by a m-dimensional real vector q ==
`< qo, ... , 'lm-1 >. This can be done by applying
`a conventional retrieval function p [11]:
`p : JR mx JR m ~ JR , (q, ii) H p (q, ii)
`This retrieval function was called 'similarity' in
`3.2.
`® Decision step:
`The algorithm decides if a navigation step to the
`next node has to be performed. This is the case
`if the link description signalizes the existence of
`infonnation related to the query in adjacent
`nodes. This can be determined by the similarity
`function <1 between _the query description q and
`the link description A.:
`<1: lR mx lR m ~ lR, (q, ~) H <1 (q, ~)
`If <1 (q, £) is smaller than a threshold value v,
`navigation is prevented. Jn addition, a predicate
`P on the attribute set I limits navigation. In
`this way the function sim of chapter 3.2 be(cid:173)
`comes:
`
`( ( <1 (q, i) > v) AND P (I))
`@ Navigation step:
`Each time the algorithm navigates, the RSV
`with respect to the reference node is modified.
`This modification depends primarily on the desti(cid:173)
`nation node and on the distance from the cwrent
`reference node. Furthennore, it can depend on
`the net topology.
`The modification of the current RSV when a nav(cid:173)
`igation step is perfonned from a node of distance
`d to a node of distance d+ 1 is calculated as fol(cid:173)
`lows (n~1 denominates the destination node i of
`node n at distanced+ 1):
`
`d+l
`RSV3.j.~ := RSV3'n + wd ·I RSV~'ni
`i
`Wd is a propagation factor depending on the navi·
`gation distance previously covered and on the net
`topology. We are considering to use the 'link
`quality' -
`a function of <1 -
`as an auxiliary
`weighting factor.
`
`The method presented for the exhaustive search ap(cid:173)
`proach- in particular the choice of specific retrieval
`functions and weighting factors - will be discussed
`in chapter 4 where an example is introduced.
`
`3.4 Navigational· Search
`
`The navigational search deals with the situation
`where a user is 'sitting on' a node essential to the
`scope of her or his information need. The assump(cid:173)
`tion then is that further interesting nodes can be
`found in the neighborhood of this reference node pro(cid:173)
`vided that the retrieval algorithm follows suitable
`semantic links. The retrieval algorithm first deter(cid:173)
`mines a subset of further nodes to be visited depend(cid:173)
`ing on the maximum navigation distance (cf. 3.2).
`Subsequently, a search is initiated as described in
`the previous chapter. It is to be noted that this navi(cid:173)
`gation is limited by the propagation threshold value v
`and the predicate P as described in the previous chap(cid:173)
`ter.
`
`4
`
`Some Experiments
`
`4.1 Test Collection
`Functions
`
`and
`
`Indexing
`
`To show the effects of the method presented, we used
`a conventional test collection which was expanded au(cid:173)
`tomatically to a hypertext. The collection consists
`of a subset of the INSPEC collection comprising
`2472 documents and 65 queries [7]. The expansion
`to a hypertext was made by considering documents to
`be nodes and establishing two directed links between
`nodes with common phrases in their manual descrip(cid:173)
`tions (paragraph 'DE'). The result was a hypertext
`consisting of a large net of 2397 nodes and 7 inde(cid:173)
`pendent small nets (1 of 5, 2 of 4 and 4 of 2 nodes).
`Jn addition, there are 54 single independent nodes
`without any links to the rest of the collection. The
`average outdegree is 23.07, the average path length
`between two arbitrary nodes within a net is 4.2.
`This represents a iknsly linked hyper collection.
`The node and the query descriptions were obtained .
`by applying Porter's word stemming [10] and a term
`weighting of tf ·idf [11, p.63]. As all links are in(cid:173)
`terpreted as semantic links. the link type t was omit(cid:173)
`ted. Therefore, the link indexing function i is re-
`duced to
`.
`isimple : (iS, id) H a2o (f (iS, id.)) ,
`where iS and id. are vectors who~ components are the
`frequencies of the terms occuring in source and desti(cid:173)
`nation nodes respectively. For simplicity reasons,
`a2o restricts the resulting vector to the 20 highest
`valued weights (all other vector components are ~t to
`0).
`
`

`

`MILANO, NOVEMBER 30- DECEMBER 4, 1992
`
`107
`
`Tim;e different link indexing functions J are applied
`and their perfonnance is compared: .
`f1 : fs + id.
`topics covered by both nodes
`f2 : fs + id + ii with additional link information
`(by'user')
`f3. : u
`'user' link information exclu-
`sive}~
`The function f 1 represents a simple attempt to pro(cid:173)
`vide a first link description. The idea is to express
`the common topics of both involved nodes.
`The 'user' link information ii was not provided by
`a real user in our particular case but was determined
`automatically. It was obtained by decomposing the
`phrase of paragraph 'DE' ~ which established the
`link- into reduced words (applying Porter's algo(cid:173)
`rithm [10]).
`Where function f2 simulates a plausible real situa(cid:173)
`an initial link description is modified by an
`tion -
`t}le function f3 i!l
`auxiliary manual description -
`only of academic interest. In real hyper collections
`the 'user' link information will probably be provided
`by relevance feedback meehanisms as pointed out in
`[14] and only in very few and special cases by a man(cid:173)
`ual link description. Conversely, the function f3 al(cid:173)
`lows us to compare the simple automatic approach
`f 1 with an elaborated manUal link description.
`For the following experiments the functions p
`(initialiZation phase) and a (link similarity function)
`are identical to the cosine measure [11, p.121J.
`
`4.2. Basic Evaluations
`Fig. 3 shows the resu!! of 11pplying the link descrip(cid:173)
`tion function ft (ie.ls + 10). The propagation dis-.
`tance was restricted to 1. The effectiveness was mea(cid:173)
`sured by taking the average number of relevant nodes
`of the 20 nodes with the highest RSVs (we chose
`this simple measure for reasons of simplicity in the
`data representation, as each of the following graphs
`represents the evaluations of at l~st 600 cOmplete· re(cid:173)
`trieval evaluations of 65 queries on 2472 nodes).
`The other two axes show the parameters 'propagation
`factor' Wd (chapter 3.3) and 'threshold value' v
`(chapter 3.3) determining if a propagation has to take
`place.

`.
`The constant 'platform' for large values of v shows
`the effectiveness level of the algorithm. High values
`of v prevent the propagation. Low values of Wd, in
`particular Wd = 0, prevent the RSV to be incre(cid:173)
`mented. In both cases the initial retrieval status
`value RsV": (cf. 3.2) remains unchanged.
`A threshold value of v = 0 returns the best re(cid:173)
`sults. Althoggh 39.6% of the links show a similar(cid:173)
`ity of a (q, ~) ""0 (and thcroforc arc not taken into
`account), the retrieval costs are very high for v = 0 as
`
`the algorithm visited 13.9 links per node on the aver(cid:173)
`age.
`
`Fig. 3: Evaluation !1 = id +IS
`It turned out that taking all links into account (on the
`average 23 links) in this experiment does not provide
`significantly different retrieval results compared to the
`ones with threshold value v = 0 (not shown in Fig.
`3). Conversely, newer results of experiments per(cid:173)
`formed on another collection [14] that suggest link
`descriptions (based on fl) may enhance the retrieval
`effectiveness even when taking significantly less
`links into account. Taking fewer links -
`and there(cid:173)
`fore reducing the retrieval cost -
`provides satisfac(cid:173)
`tory results only for small propagation factors Wd.
`The main problem is to balance retrieval cost
`against retrieval effectivei)ess. We are measuring the
`effectiveness by comparing a method A with a
`method B. If no method B is explicitely specified,
`we compare a given method with the retrieval that
`does not take links into account.
`As an example let· us look at v = 0.2 and Wd =
`0.45: the improvement is only 7.7% on the average
`(compared with 10.3% at v = 0 and Wd = 1.05). In
`this example ,only 0.56 links per node are followed

`on the average.
`Fig. 4 and 5 show the improvement by taking an
`auxiliary user description into account. Fig. 4
`shows the result when the additional features are
`weighted with their term frequency in the phrase.
`
`

`

`108
`
`ACM ECHT CONFERENCE
`
`compared to using single terms automatically e:J{.~
`tracted from the documents. Also, our 'user descrip(cid:173)
`tion' originated from a manual indexing of the nOdes
`instead of being a real description of the links. Also
`it is debatable whether a deComposition of phrases is
`suitable because of the loss of information.
`Fig 6. shows the ·effect of applying the link index(cid:173)
`ing function f3:
`
`Fig. 4: Improvement by a 'Virtual' User Description:

`!2 = IS + id + ii, simple weight
`
`Fig. 5 shows the result when the weights of the aux(cid:173)
`iliary user features are doubled to increase their influ(cid:173)
`ence.
`
`. Fig. 6: User Description Only: f3 = ii
`
`A small improvement of the retrieval effectiveness
`can be gotten by using the indexing function.f3, i.e.
`the additional use~ description only. This provides
`two further advantages:
`• The retrieval costs are significantly lower:
`Only -1.79 nodes are visited on the average for a
`threshold value of v = 0. With v = 0.1, only
`1.25 and with v = 0.2, only 0.53 nodes are vis(cid:173)
`ited
`.
`.
`• The ·method is less sensible to variations of Wd
`andv:
`.
`Th~ small nwnber of__:_ in the major part -
`spe(cid:173)
`cific features in the semantic link description al(cid:173)
`lows only a very restrictive navigation; the vis(cid:173)
`ited nodes seem more useful than the nodes that
`are visited when applying the indexing functions.
`f1 and f2. As far as we know there are no prob(cid:173)
`lems with outliers. Applying a median function
`instead of the average function during the naviga(cid:173)
`tion step (cf. 3.2) does not modify the results
`significantly for the indexing functions f1 to f3.
`This result shows that besides the over-all effective(cid:173)
`ness - which behaves as expected -
`other factors·
`determine 1the quality of a link description. A good
`choice of the link description seems to be crucial for
`stable retrieval algorithms.
`
`Fig. 5: Improvement by a 'Virtual' User Description:
`· fz = IS + 1(1 + u, stressed weight
`
`The improvement compared to the evaluation of !1 is
`relatively small in both cases. This result is not
`very swprisi,ng. As mentioned in 4.1 our 'user de(cid:173)
`scription' was derived automatically from phrases
`similar to the way descriptors were obtained. in the
`Cranfield Project [1]. These Cranfield experiments
`showed that retrieval effectiveness is not increased
`when a controlled vocabulary and phrases are used
`
`

`

`MILANO, NOVEMBER 30 - DECEMBER 4, 199:!
`
`109
`
`4. 3 Visiting Nodes with Distances larger
`than 1.
`
`To show the effects when taking nodes up to a dis(cid:173)
`tance of 2, we experimented with the link indexing
`functions f2 = iS + id + u (Fig. 4) and f3 = u (Fig.
`6). fu Fig. 7 and 8 the propagation factor Wd for the
`distances 1 and 2 are depicted instead of the axes Wd
`and v. The value axis represents the average number
`of relevant nodes within the 20 nodes with highest
`RSV.
`Because bidirectional/inks are generated in our ex(cid:173)
`periments, two series of experiments can be done for
`all parameter combinations:
`An evaluation without restriction, i.e. the algo(cid:173)
`rithm may return to the node itis coming from.
`An evaluation where the backward link is bloclred
`when the corresponding forward link is activated.
`
`Although we did both types of experiments, only the
`ones with blocked return links are presented here.
`The results are in fact very similar, but we think that
`it is too early to make general conclusions.
`
`Fig. 8: Distance 2 (v=O) for function !3
`
`· Both evaluations were performed with the link index(cid:173)
`ing function f2. Fig. 9 shows the threshold pair
`Vdist=l= 0.05 and Vdist=2= 0.10; Fig; 10 the pair
`Vdist=l= 0.10 and VcJist=2= 0.20.
`
`Fig. 7: Distance 2 (v=O) for function f2
`
`Fig. 7 shows the evaluation using the node indexing
`function f2. Fig. 8 uses the function f3. The val(cid:173)
`ues· of v are 0 for both distances 1 and 2. The two
`gtaphs show a smaller improvement when distance 2
`nodes are added as opposed to the improvement gotten
`when we added the distance 1 nodes. This can be ex(cid:173)
`plained by the decreasing similarities between the
`query and the nodes that are 1jlrther away from the ref(cid:173)
`erence node. It goes without saying that the retrieval
`cost grows extremely fast with increasing distance.
`Fig 9 and 10 show the evaluations for threshold
`values higher than 0 (to reduce retrieval cosis).
`
`Fig. 9: Distance 2 (v=0.05/0.10); f2 = id +iS+ ii
`
`. In our -
`densly linked.- test collection it is ques(cid:173)
`tionable if it is worth propagating up to distance 2
`given these circumstances. A small improvement of
`the effectiveness must be paid for with a significant
`increase of retrieval costs.

`
`

`

`110
`
`ACM ECllT CONFERENCE
`
`Further experiments will be necessary to investi(cid:173)
`gate. the role of the source and destination nodes wllen
`links are automatically indexed. Furthermore, it
`should be possible to find approxiniations for mi~s­
`ing node descriptions, e.g. for not indexed nodes that
`contain not supported media types.
`
`6
`
`Cc;mclusioris
`
`We showed that it is possible to enhance the retrieval
`effectiveness by having the retrieval algorithms fol(cid:173)
`low semantic links thus considering the conteilt-ori(cid:173)
`eiited neighborhood of a reference node. Semantic
`links have to be established between nodes whenever
`appropriate for this purpose. These links must be
`indexed by a suitable indexing method that delivers a
`link description depending on the content of at least·
`the two adjacent nodes.
`We devised retrieval algorithms taking the descrip(cid:173)
`tions of both the nodes and the links into account.
`After evaluating the similarity between the query and .
`a reference node, these algorithms determine whether
`links should be followed and if so, which links are
`the most promising ones to follow: The experi(cid:173)
`ments we carried out with ~ hypertext test collection
`showed encouraging improvements in retrieval effec(cid:173)
`tiveness.
`
`References
`
`[1] Cleverdon, C.W., Keen, E.M.: Factors· Deter(cid:173)
`mining the Perfomumce of Indexing Systems.
`Vol. 1 and 2, Aslib Cranfield Research Project,
`Cranfield. England. 1966;
`
`[2] Croft, W.~ .• Turtle, H.:.A Retrieval Model for
`Incorporating Hypertext Links. · Proc. of the
`1st European Conf. on Hypertext (ECHI') 90,
`Ed. A. Rizk, N. Streitz, J. Andre, Cambridge
`University Press, 1990, pp. 213-224.
`
`[3]
`
`[4]
`
`Frei, H.P., Schlluble, P.: Designing a Hyper(cid:173)
`media Information System. Proc. DEXA '91,
`Springer-Verlag, Wien, 1991, pp. 449-454.
`
`Frisse, M.E.: Searching for Information in a
`Hypertext Medical Handbook. Commun. ACM
`31, No. 7, July 1988, pp. 880-886.
`
`[5] Glavitsch, U., Schauble, P.: A SyStem for Re(cid:173)
`trieving Speech Documents. Proc.l5th ACM
`SIGIR Conf., SIGIR Forum, ACM Press,
`June 1992, pp. 168-176.
`[6} Haan, B., Kahn, P., Riley, V., Coombs, J.,
`Meyrowitz, N.: IRIS Hypermedia ServiCes.
`Commun. ACM 35, No. 1, January 1992, pp~
`36-51.
`
`Fig. 10: Distance 2 (v=O.l0/0.20); f2 = ld +iS+ ii
`There is evidence that in a less densiy linked eollec.:.
`tion it might be worthwhile to take distances larger.
`than 1 into account. When we removed 50% of the
`links randomly in our test collection there was -
`of
`course -
`a smaller effectiveness improvement with
`distance 1. On the other hand. we still observed a
`significant improvement with distance 2. This effect
`is reproducible with all the link indexing functions
`presented .here.
`
`5
`
`c·urrent and Future Work
`
`The experiments described in lhls paper showed how
`important i

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket