`
`ACM ECHT CONFERENCE
`
`Making Use of Hypertext Links when Retrieving Information
`
`RP. Frei, D. Stieger
`Swiss Federal Institute ofTechnology (EI'H) Zurich
`Department of Computer Science
`8092 Zurich, Switzerland
`
`Abstract
`Hypennedia links were invented to support the manu(cid:173)
`al browsing through large hypertext or hypermedia
`collections. However, J:e1lieving specific portions of
`inf'onnation in such a collection caimot be achieved
`by browsing only; J:e1lieval mechanisms are neces(cid:173)
`saey. In this paper we show how to use the semantic
`content of hypertext linkS for ~eval., We present
`special plllpOS8 indexing and retrieval algorltbms tbat
`exploit the node and link content. First retrieval re(cid:173)
`sults in a hypertext test colleCtiOn am preented: the
`reSults am cleady better than those obtained when the
`links are ignored The hope is that these results can
`be extended to h~edilrinfmmation and that they
`can be impxoved by mme sophisticated indexing algo(cid:173)
`rlthms.
`
`Introduction
`1
`The main idea of hyper documents is that documents
`- or parts thereof- can be brought into relation to
`each other and that additional infODDation may be at(cid:173)
`tached to any part of a document. The document
`pans are called nodes. When these nodes are con(cid:173)
`nected by links a hyper c1ocmnellt web results. Each
`hypermedia node constitutes an information item.
`Complex nodes are paditioned into simpler ones and
`-·at the lowest level -
`are simple linearly orga(cid:173)
`nized hypermedia nodes of a. single media type.
`'IbCse nodes and the links connecting them constitute
`a~ gmph. A link~ (n1, ni) xepresents a con(cid:173)
`nection from the somee node n1 to the des9nation
`noden2.
`
`PermiSIIion to eopy without fee all or pan of this material
`is granted provided that c.opiEB are not made or distn'buted
`for direct commr:rclal advantage, the ACM copyright no(cid:173)
`tice and ihc title o1' the publicatiOil a~:~d it!~ date appear,
`and notice .is given that copying is by permi11ion o1' the
`Association for Computing Machinery. To copy or.herw!Se,
`or to 'republish, requilu a. feel and/or spec:ific permission.
`@1992 ACM 0.89791-S47-X/S2/0011/0102/ $1.60
`
`Moat of the conventional Information Retrieval
`(lR) algOrithms have been developed for searcbing in
`large linear text collections [11]. They are not suited
`to n:trleve infonnation in non-linearly organized hy~
`per collections. If they are applied to the individual
`linear nodes of hyPer collections, the hyper structure
`that contributes a great deal to the content of a
`-
`is simply ignoxed and the re-.
`h~r coDection -
`trleval results·are accordingly poor. In addition, con(cid:173)
`ventional IR algorithms are not suited to retrieve
`non-textual information. iiS they employ teXtual de(cid:173)
`scriptors as indexing features. To retrieve informa(cid:173)
`tion in multimedia environments, suitable features
`have to be introduced for every individual medium as
`done for speech documtmts [5].
`This paper focuses on the problem of explo.iting
`'the Jinks when specific content-related infonnation is
`to be retrieved. We present new IR algorithms that
`make use of the semantic content of the liJJb in(cid:173)
`volved. Currently, we are considering text informa(cid:173)
`tion and textul!l descriptors exclusively. We hope
`that the methods developed are geheral enough to be
`extended to non-textual features and thus to real hy(cid:173)
`permedia collections.
`
`2 Hypertext Nodes and Links
`
`2.1 Hypertext Information
`We aheady pohited. out that hypertext information
`consists of two parts. FJI'St, thexe iS the information
`contained in the hypertext nodes. Second, the inter(cid:173)
`connections between the nodes, the links, define the
`structure of the hyper document and the nature of ev(cid:173)
`ery partiCular inter~ection [6]. Therefore, m al(cid:173)
`gorithms have to deal with both parts: nodes and
`Jinks.· In this paper, we concenttate on using the
`semantic content of links and employ weB-known IR
`algorithms to deal with the textual information con.
`tained in the nodes.
`.
`·
`Links allow the user to discover rela.tionsbips that
`are difficdlt to determine without a hyper structure.
`We distinguish two types of Jinks [3]:
`
`EXHIBIT 2030
`Facebook, Inc. et al.
`v.
`Software Rights Archive, LLC
`CASE IPR2013-00479
`
`
`
`MILANO, NOVEMBER 30- DECEMBER 4, 1992
`
`103
`
`- referential links,
`- semantic links.
`The main purpose of referential links is comfortable
`reading of the document. The purpose of semantic
`links is to point to similar, more detailed, or addi(cid:173)
`tional information. The reasons for establishing
`such semantic links contribute to the topic descrip(cid:173)
`tion of the link.
`Fig. 1 depicts a small example of a hypermedia or
`hypertext collection. The small part of the collec(cid:173)
`tion shown includes the two hypernets Nt = {nt,
`D}.}. Dl.2• D2, ns, D6, ng} and N2 = {n3, 114, n7}.
`The nodes nu and n1.2 are sub-nodes of nt; they
`structure node nt similarly to the way two paragraphs
`structure a chapter. The links connecting n1 with
`these two nodes are referential links. On the other
`hand, the nodes n3 and n7 contain information related
`to the information of node 114· This is the reason for
`the semantic links pointing from 114 to n3 and from
`n4 tO n7.
`
`r::--1 node
`~ refereotiallink
`~ semantic link
`
`Fig. 1: Referential and Semantic Links
`
`To facilitate content-specific retrieval, nodes and se(cid:173)
`mantic links have to be indexed. The problem of
`how to index independent text nodes and queries has
`been discussed thoroughly in theIR literature [11].
`This is why we concentrate here on the indexing of
`links and on the subsequent usage of indexing infor(cid:173)
`mation.
`
`2. 2 The Nature of Links
`Referential links serve the same purpose as foreign
`keys in a domain of a relational database. Therefore,
`they do not provide additional information to the
`topic of the document. They are plain pointers to
`improve both reading and browsing. Their use for
`information retrieval purposes is not discussed in this
`paper.
`Conversely, semantic links provide additional in(cid:173)
`formation on the topic of the hyper document web.
`Semantic links point to nodes which would be diffi(cid:173)
`cult to f"md otherwise. Such nodes contain special
`
`annotations, corrigenda, or similar, contradicting,
`generalizing, specializing, or simply additional in(cid:173)
`formation. Every link has associated some well(cid:173)
`structured attributes like creation time, author name,
`and the like. The information associated with a link
`may be consulted both by a 'reader' when browsing
`through the document and by a retrieval algorithm
`when processing a query.
`A link consists of the following components:
`A. = <t, I, Is, ld>,
`where
`t
`is the link type
`is a set of structured link attributes
`I
`(auxiliary link information)
`Is signifies the source node of the link A.
`ld signifies the destination node of the link A.
`The link type t specifies whether the link is of type
`referential or semantic. In addition, the parameter t
`could be used later to distinguish more than these two
`types, in particular to distinguish several subtypes of
`semantic links. The intention is to restrict ourselves
`to a few link types so that their semantics may be
`understood fully by authors and users. This is in
`sharp contrast to other hypermedia paradigms which
`are based on up to 80 different kinds of links [15].
`
`2.3 The Link Description
`We associate a link description i with every semantic
`link. This description contains mainly the content
`related reasons for the existence of the link, usually
`expressed by some topic descriptors (fea!_ures). It is
`the result of an indexing procedure A. ~ A. that takes
`the neighboring nodes of the link into account, in
`particular the source and destination nodes. In addi(cid:173)
`tion, the content-specific link description can be ex(cid:173)
`panded or even changed by users when they read or
`browse through the hypernet. New links can be es(cid:173)
`tablished by users when content-related relations were
`not recognized by the initial authoring and indexing
`processes.
`-
`The link description A. is a vector with feature
`weights as components. In the case of purely textual
`nodes (e.g. hypertext), the features are usually
`weighted terms or phrases.
`As hyper collections result from a dynamic pro(cid:173)
`cess, some nodes are more densly linked than others.
`Specifically. older nodes tend to be well linked
`whereas newer ones are often poorly linked. Thus
`we believe that the absence of a (semantic) link does
`not imply necessarily the absence of referential or
`semantic relations between the nodes. , If there is no
`link we a:.sume that possible dependen~ies are un(cid:173)
`known. This is in contrast to Frisse's interpretation
`that says "the absence of a link (or a path of links)
`between two cards asserts that the utilities of the two
`
`
`
`104
`
`ACM ECHT CONFERENCE
`
`'i. = <lo, ... , lm-1>
`
`cards are conditionally independent" [4]. The prob(cid:173)
`lem of missing links becomes apparent when a hyper
`document collection is in use: Poorly linked nodes
`are less likely to be found when browsing through
`the collection. Sparsely linked nodes also hamper
`those automatic retrieval methods that use links.
`In what follows we assume that the indexing vo(cid:173)
`cabulary contains m tel!1ls. The node description ii
`and the link description A. are therefore given by:
`ii =<no, ... , Dm-t>
`, where nj with 0 S i < m
`are the term weights of
`noden,
`, where li with 0 S i < m
`are the topic descriptor
`weights of the link A..
`The link description 'i. depends mainly on the link
`type t E T, on the source node Is, on the destination
`node ld, and possibly on an user expansion ii repre(cid:173)
`sented by a feature vector:
`ii =<uo, ... ,um-1> ,whereujwithOSi<m
`are the weights of descrip(cid:173)
`tors provided bY users.
`We define a simple link indexing function i:
`i:TXJRffiXJRm~ Rm,
`(t, fs, id) H a (j (t, iS, ld))
`where f ; T X R mx R m ~ lR m is a function
`modelling why the semantic link was estab(cid:173)
`lished; the link type t may play an amplifying
`or negating role. The purpose of this func(cid:173)
`tion is to find common abstract concepts
`(expressed by weighted indexing terms) in the
`source and destination nodes. A similar idea
`was followed by Croft and Turtle in their
`probabilistic hypertext retrieval model [2, p.
`219].
`a : R m ~ R m is a (non-injective) map(cid:173)
`ping funcf!on reducing small components of
`the vector A. to 0. As the number of links in
`a hyper document collection is usually much
`larger than the number of nodes, the aim is to
`keep link descriptions as compact as possible.
`The correstion function iu takes an existing link de(cid:173)
`scription ~ and a user e~sion 'ii in order to create a
`modified link description A.':
`iu: JRffiX JRID ~ JRID,
`<'i.. u) H a <~+ii)
`More elaborate link indexing functions would also
`take information outside of the immediate neighbor(cid:173)
`hood of the two nodes into account. In this way, a
`weak destination node followed by promising descen-
`
`dants would also get a chance to be considered by a
`retrieval algorithm.
`
`3 Retrieval Strategies exploiting Hy(cid:173)
`pertext Links
`
`3.1 General Considerations
`When users browse, they normally follow only a
`small number of the existing links. Conversely, a
`retrieval algorithm may follow many links and may
`create both high retrieval costs and doubtful results.
`Retrieval experiments in a collection of bibliographic
`references showed that following citations-a .kind of
`referentiallinks-p:oduces ambiguous results [8, 12]:
`. .. An evaluation of the process shows that many
`useful content words can be extracted from related
`document titles, as well as marry terms of doubtful
`value . . .
`[12, pp. 385]
`The hope is that our semantic links contain the in(cid:173)
`formation necessary to decide whether a further node
`should be visited by the retrieval algorithm or· not.
`The prbposed automatic navigation through the hyper
`web is governed by the following considerntions:
`the further away from the initial node the IR al- .
`gorithm searches the less likely it is to find suit(cid:173)
`able information;
`links are only followed when they promise to
`point to nodes containing information relevant to
`the query;
`it may become mandatory to visit a specific node
`(e.g., because of given restrictions during there(cid:173)
`trieval process), depending on the retrieval re(cid:173)
`somces available.
`In this way the descriptions of semantic links control
`the navigation process of the retrieval algorithm.
`The existence of such link descriptions and their use
`for retrieval purposes constitute the main difference
`between our retrieval algorithm and approaches de(cid:173)
`scribed elsewhere [4, 8, 9, 13].
`
`3.2 Retrieval Algorithm and Retrieval
`Costs
`A node nj is said to be adjacent to node Dj if and only
`if there exists a link A. (ni, llj) or a linK A. (nj. ni).
`Given a link A. (nit Dj), Dj is said to be the destina(cid:173)
`tion nOde of this link {or,less precisely, a destination
`node of the node nj), conversely ni is a source node of
`Dj. The outdegree of a node n is the number of links
`with n as source node, the indegree is the number of
`links with n as destination node.
`The retrieval algorithm determining a Retrieval
`Status Value (RSV) between the query q and the hy(cid:173)
`pemet node n includes an initialization and a naviga(cid:173)
`tion phase:
`
`
`
`MILANO, NOVEMBER 30- DECEMBER 4, 1992
`
`105
`
`{standard retrieval}
`{initial RSV}
`
`{hypertext retrieval}
`
`{navigation procedure described below}
`{result, to be sorted}
`
`®
`
`InitialiZation phase, step <D
`(])
`for all rwdes n of collection do
`RS~~ :=similarity (q, n)
`end for
`for all rwdes n of collection do
`·
`rsv := RSV'l;
`navigation(n, rsv, 1)
`IniettList ( q, n, rsv)
`end for
`• Navigation phase, iteration of decision step® and navigation step @.
`procedure navigation (inn, in/out rsv;in distance)
`if(outdgree (n) > 0) and (distance ~maximum_distance) then
`for all destination nodes n' of node n do
`.
`.
`if sim (q, .t (n, n')) > threshold_value then
`updatersv
`navigation (n', rsv, distance+ I)
`end if
`end for
`end if
`We use the definitions for walk, trail, and path in the
`same sense as they are used in graph theory. If (no.
`n1, ... , lid) is a sequence '?f nodes of a hypemet H,
`such that ni is adjacent to ni+ 1 'V O~<d, then these
`nodes and the corresponding links are called a walk of
`length d. If the links are distinct, this walk is called
`a trail of length d. If all the nodes of a trail are dis-
`tinct, the trail is called a path of length d.
`The navigation distance is the minimum path
`length from a given reference node n in the hyper
`web to any node that can be reached from this node.
`If a maximum navigation distance is given, the re(cid:173)
`trieval algorithm ignores those nodes whose distance
`from the reference node exceeds the· maximum dis(cid:173)
`tance.
`We assume a homogeneously linked hyper colleC-
`tion with nodes of an average outdegree of k. If a
`node traversal starts from a given reference node n,
`the number of traversed nodes of all possible paths
`with length d grows exponentially. To limit there(cid:173)
`trieval costs, a navigation strategy with restrictive
`propagation must be found.
`
`Two different hypermedia retrieval strategies can be
`distinguished:
`a) A user who knows little about the hyper collec(cid:173)
`tion is looking for an entry point convenient to
`start browsing. Here the aim of the retrieval
`strategy is to retrieve nodes that represent the in(cid:173)
`terest of the user. For this purpose an exhaus(cid:173)
`tive search through the network is necessary.
`The potentially large search effort can be·iimited
`by taking into account the descriptions of the
`semantic links.
`b) The user is "sitting" on a node satisfying some
`of her or his information needs. In this case an
`automatic navigational search along the hyper
`structure is performed to find further useful
`nodes.
`In what follows we describe these two strategies,
`namely, the exhaustive search and the navigational
`. search along semantic links.
`
`Fig. 2:.Navigation Distance
`
`
`
`106
`
`ACM ECHT CONFERENCE
`
`3. 3 Exhaustive Search
`The aim of the exhaustive search algorithm is to re(cid:173)
`trieve nodes that are in the focus of the user's inter(cid:173)
`est. Each node in the hyper collection is regarded as
`a reference candidate from where a search may start.
`~ blltializationpha~:
`An initial RSV'l~ is obtained by determining the
`similarity between the reference node n and the
`query q given by a m-dimensional real vector q ==
`< qo, ... , 'lm-1 >. This can be done by applying
`a conventional retrieval function p [11]:
`p : JR mx JR m ~ JR , (q, ii) H p (q, ii)
`This retrieval function was called 'similarity' in
`3.2.
`® Decision step:
`The algorithm decides if a navigation step to the
`next node has to be performed. This is the case
`if the link description signalizes the existence of
`infonnation related to the query in adjacent
`nodes. This can be determined by the similarity
`function <1 between _the query description q and
`the link description A.:
`<1: lR mx lR m ~ lR, (q, ~) H <1 (q, ~)
`If <1 (q, £) is smaller than a threshold value v,
`navigation is prevented. Jn addition, a predicate
`P on the attribute set I limits navigation. In
`this way the function sim of chapter 3.2 be(cid:173)
`comes:
`
`( ( <1 (q, i) > v) AND P (I))
`@ Navigation step:
`Each time the algorithm navigates, the RSV
`with respect to the reference node is modified.
`This modification depends primarily on the desti(cid:173)
`nation node and on the distance from the cwrent
`reference node. Furthennore, it can depend on
`the net topology.
`The modification of the current RSV when a nav(cid:173)
`igation step is perfonned from a node of distance
`d to a node of distance d+ 1 is calculated as fol(cid:173)
`lows (n~1 denominates the destination node i of
`node n at distanced+ 1):
`
`d+l
`RSV3.j.~ := RSV3'n + wd ·I RSV~'ni
`i
`Wd is a propagation factor depending on the navi·
`gation distance previously covered and on the net
`topology. We are considering to use the 'link
`quality' -
`a function of <1 -
`as an auxiliary
`weighting factor.
`
`The method presented for the exhaustive search ap(cid:173)
`proach- in particular the choice of specific retrieval
`functions and weighting factors - will be discussed
`in chapter 4 where an example is introduced.
`
`3.4 Navigational· Search
`
`The navigational search deals with the situation
`where a user is 'sitting on' a node essential to the
`scope of her or his information need. The assump(cid:173)
`tion then is that further interesting nodes can be
`found in the neighborhood of this reference node pro(cid:173)
`vided that the retrieval algorithm follows suitable
`semantic links. The retrieval algorithm first deter(cid:173)
`mines a subset of further nodes to be visited depend(cid:173)
`ing on the maximum navigation distance (cf. 3.2).
`Subsequently, a search is initiated as described in
`the previous chapter. It is to be noted that this navi(cid:173)
`gation is limited by the propagation threshold value v
`and the predicate P as described in the previous chap(cid:173)
`ter.
`
`4
`
`Some Experiments
`
`4.1 Test Collection
`Functions
`
`and
`
`Indexing
`
`To show the effects of the method presented, we used
`a conventional test collection which was expanded au(cid:173)
`tomatically to a hypertext. The collection consists
`of a subset of the INSPEC collection comprising
`2472 documents and 65 queries [7]. The expansion
`to a hypertext was made by considering documents to
`be nodes and establishing two directed links between
`nodes with common phrases in their manual descrip(cid:173)
`tions (paragraph 'DE'). The result was a hypertext
`consisting of a large net of 2397 nodes and 7 inde(cid:173)
`pendent small nets (1 of 5, 2 of 4 and 4 of 2 nodes).
`Jn addition, there are 54 single independent nodes
`without any links to the rest of the collection. The
`average outdegree is 23.07, the average path length
`between two arbitrary nodes within a net is 4.2.
`This represents a iknsly linked hyper collection.
`The node and the query descriptions were obtained .
`by applying Porter's word stemming [10] and a term
`weighting of tf ·idf [11, p.63]. As all links are in(cid:173)
`terpreted as semantic links. the link type t was omit(cid:173)
`ted. Therefore, the link indexing function i is re-
`duced to
`.
`isimple : (iS, id) H a2o (f (iS, id.)) ,
`where iS and id. are vectors who~ components are the
`frequencies of the terms occuring in source and desti(cid:173)
`nation nodes respectively. For simplicity reasons,
`a2o restricts the resulting vector to the 20 highest
`valued weights (all other vector components are ~t to
`0).
`
`
`
`MILANO, NOVEMBER 30- DECEMBER 4, 1992
`
`107
`
`Tim;e different link indexing functions J are applied
`and their perfonnance is compared: .
`f1 : fs + id.
`topics covered by both nodes
`f2 : fs + id + ii with additional link information
`(by'user')
`f3. : u
`'user' link information exclu-
`sive}~
`The function f 1 represents a simple attempt to pro(cid:173)
`vide a first link description. The idea is to express
`the common topics of both involved nodes.
`The 'user' link information ii was not provided by
`a real user in our particular case but was determined
`automatically. It was obtained by decomposing the
`phrase of paragraph 'DE' ~ which established the
`link- into reduced words (applying Porter's algo(cid:173)
`rithm [10]).
`Where function f2 simulates a plausible real situa(cid:173)
`an initial link description is modified by an
`tion -
`t}le function f3 i!l
`auxiliary manual description -
`only of academic interest. In real hyper collections
`the 'user' link information will probably be provided
`by relevance feedback meehanisms as pointed out in
`[14] and only in very few and special cases by a man(cid:173)
`ual link description. Conversely, the function f3 al(cid:173)
`lows us to compare the simple automatic approach
`f 1 with an elaborated manUal link description.
`For the following experiments the functions p
`(initialiZation phase) and a (link similarity function)
`are identical to the cosine measure [11, p.121J.
`
`4.2. Basic Evaluations
`Fig. 3 shows the resu!! of 11pplying the link descrip(cid:173)
`tion function ft (ie.ls + 10). The propagation dis-.
`tance was restricted to 1. The effectiveness was mea(cid:173)
`sured by taking the average number of relevant nodes
`of the 20 nodes with the highest RSVs (we chose
`this simple measure for reasons of simplicity in the
`data representation, as each of the following graphs
`represents the evaluations of at l~st 600 cOmplete· re(cid:173)
`trieval evaluations of 65 queries on 2472 nodes).
`The other two axes show the parameters 'propagation
`factor' Wd (chapter 3.3) and 'threshold value' v
`(chapter 3.3) determining if a propagation has to take
`place.
`·
`.
`The constant 'platform' for large values of v shows
`the effectiveness level of the algorithm. High values
`of v prevent the propagation. Low values of Wd, in
`particular Wd = 0, prevent the RSV to be incre(cid:173)
`mented. In both cases the initial retrieval status
`value RsV": (cf. 3.2) remains unchanged.
`A threshold value of v = 0 returns the best re(cid:173)
`sults. Althoggh 39.6% of the links show a similar(cid:173)
`ity of a (q, ~) ""0 (and thcroforc arc not taken into
`account), the retrieval costs are very high for v = 0 as
`
`the algorithm visited 13.9 links per node on the aver(cid:173)
`age.
`
`Fig. 3: Evaluation !1 = id +IS
`It turned out that taking all links into account (on the
`average 23 links) in this experiment does not provide
`significantly different retrieval results compared to the
`ones with threshold value v = 0 (not shown in Fig.
`3). Conversely, newer results of experiments per(cid:173)
`formed on another collection [14] that suggest link
`descriptions (based on fl) may enhance the retrieval
`effectiveness even when taking significantly less
`links into account. Taking fewer links -
`and there(cid:173)
`fore reducing the retrieval cost -
`provides satisfac(cid:173)
`tory results only for small propagation factors Wd.
`The main problem is to balance retrieval cost
`against retrieval effectivei)ess. We are measuring the
`effectiveness by comparing a method A with a
`method B. If no method B is explicitely specified,
`we compare a given method with the retrieval that
`does not take links into account.
`As an example let· us look at v = 0.2 and Wd =
`0.45: the improvement is only 7.7% on the average
`(compared with 10.3% at v = 0 and Wd = 1.05). In
`this example ,only 0.56 links per node are followed
`·
`on the average.
`Fig. 4 and 5 show the improvement by taking an
`auxiliary user description into account. Fig. 4
`shows the result when the additional features are
`weighted with their term frequency in the phrase.
`
`
`
`108
`
`ACM ECHT CONFERENCE
`
`compared to using single terms automatically e:J{.~
`tracted from the documents. Also, our 'user descrip(cid:173)
`tion' originated from a manual indexing of the nOdes
`instead of being a real description of the links. Also
`it is debatable whether a deComposition of phrases is
`suitable because of the loss of information.
`Fig 6. shows the ·effect of applying the link index(cid:173)
`ing function f3:
`
`Fig. 4: Improvement by a 'Virtual' User Description:
`·
`!2 = IS + id + ii, simple weight
`
`Fig. 5 shows the result when the weights of the aux(cid:173)
`iliary user features are doubled to increase their influ(cid:173)
`ence.
`
`. Fig. 6: User Description Only: f3 = ii
`
`A small improvement of the retrieval effectiveness
`can be gotten by using the indexing function.f3, i.e.
`the additional use~ description only. This provides
`two further advantages:
`• The retrieval costs are significantly lower:
`Only -1.79 nodes are visited on the average for a
`threshold value of v = 0. With v = 0.1, only
`1.25 and with v = 0.2, only 0.53 nodes are vis(cid:173)
`ited
`.
`.
`• The ·method is less sensible to variations of Wd
`andv:
`.
`Th~ small nwnber of__:_ in the major part -
`spe(cid:173)
`cific features in the semantic link description al(cid:173)
`lows only a very restrictive navigation; the vis(cid:173)
`ited nodes seem more useful than the nodes that
`are visited when applying the indexing functions.
`f1 and f2. As far as we know there are no prob(cid:173)
`lems with outliers. Applying a median function
`instead of the average function during the naviga(cid:173)
`tion step (cf. 3.2) does not modify the results
`significantly for the indexing functions f1 to f3.
`This result shows that besides the over-all effective(cid:173)
`ness - which behaves as expected -
`other factors·
`determine 1the quality of a link description. A good
`choice of the link description seems to be crucial for
`stable retrieval algorithms.
`
`Fig. 5: Improvement by a 'Virtual' User Description:
`· fz = IS + 1(1 + u, stressed weight
`
`The improvement compared to the evaluation of !1 is
`relatively small in both cases. This result is not
`very swprisi,ng. As mentioned in 4.1 our 'user de(cid:173)
`scription' was derived automatically from phrases
`similar to the way descriptors were obtained. in the
`Cranfield Project [1]. These Cranfield experiments
`showed that retrieval effectiveness is not increased
`when a controlled vocabulary and phrases are used
`
`
`
`MILANO, NOVEMBER 30 - DECEMBER 4, 199:!
`
`109
`
`4. 3 Visiting Nodes with Distances larger
`than 1.
`
`To show the effects when taking nodes up to a dis(cid:173)
`tance of 2, we experimented with the link indexing
`functions f2 = iS + id + u (Fig. 4) and f3 = u (Fig.
`6). fu Fig. 7 and 8 the propagation factor Wd for the
`distances 1 and 2 are depicted instead of the axes Wd
`and v. The value axis represents the average number
`of relevant nodes within the 20 nodes with highest
`RSV.
`Because bidirectional/inks are generated in our ex(cid:173)
`periments, two series of experiments can be done for
`all parameter combinations:
`An evaluation without restriction, i.e. the algo(cid:173)
`rithm may return to the node itis coming from.
`An evaluation where the backward link is bloclred
`when the corresponding forward link is activated.
`
`Although we did both types of experiments, only the
`ones with blocked return links are presented here.
`The results are in fact very similar, but we think that
`it is too early to make general conclusions.
`
`Fig. 8: Distance 2 (v=O) for function !3
`
`· Both evaluations were performed with the link index(cid:173)
`ing function f2. Fig. 9 shows the threshold pair
`Vdist=l= 0.05 and Vdist=2= 0.10; Fig; 10 the pair
`Vdist=l= 0.10 and VcJist=2= 0.20.
`
`Fig. 7: Distance 2 (v=O) for function f2
`
`Fig. 7 shows the evaluation using the node indexing
`function f2. Fig. 8 uses the function f3. The val(cid:173)
`ues· of v are 0 for both distances 1 and 2. The two
`gtaphs show a smaller improvement when distance 2
`nodes are added as opposed to the improvement gotten
`when we added the distance 1 nodes. This can be ex(cid:173)
`plained by the decreasing similarities between the
`query and the nodes that are 1jlrther away from the ref(cid:173)
`erence node. It goes without saying that the retrieval
`cost grows extremely fast with increasing distance.
`Fig 9 and 10 show the evaluations for threshold
`values higher than 0 (to reduce retrieval cosis).
`
`Fig. 9: Distance 2 (v=0.05/0.10); f2 = id +iS+ ii
`
`. In our -
`densly linked.- test collection it is ques(cid:173)
`tionable if it is worth propagating up to distance 2
`given these circumstances. A small improvement of
`the effectiveness must be paid for with a significant
`increase of retrieval costs.
`·
`
`
`
`110
`
`ACM ECllT CONFERENCE
`
`Further experiments will be necessary to investi(cid:173)
`gate. the role of the source and destination nodes wllen
`links are automatically indexed. Furthermore, it
`should be possible to find approxiniations for mi~s
`ing node descriptions, e.g. for not indexed nodes that
`contain not supported media types.
`
`6
`
`Cc;mclusioris
`
`We showed that it is possible to enhance the retrieval
`effectiveness by having the retrieval algorithms fol(cid:173)
`low semantic links thus considering the conteilt-ori(cid:173)
`eiited neighborhood of a reference node. Semantic
`links have to be established between nodes whenever
`appropriate for this purpose. These links must be
`indexed by a suitable indexing method that delivers a
`link description depending on the content of at least·
`the two adjacent nodes.
`We devised retrieval algorithms taking the descrip(cid:173)
`tions of both the nodes and the links into account.
`After evaluating the similarity between the query and .
`a reference node, these algorithms determine whether
`links should be followed and if so, which links are
`the most promising ones to follow: The experi(cid:173)
`ments we carried out with ~ hypertext test collection
`showed encouraging improvements in retrieval effec(cid:173)
`tiveness.
`
`References
`
`[1] Cleverdon, C.W., Keen, E.M.: Factors· Deter(cid:173)
`mining the Perfomumce of Indexing Systems.
`Vol. 1 and 2, Aslib Cranfield Research Project,
`Cranfield. England. 1966;
`
`[2] Croft, W.~ .• Turtle, H.:.A Retrieval Model for
`Incorporating Hypertext Links. · Proc. of the
`1st European Conf. on Hypertext (ECHI') 90,
`Ed. A. Rizk, N. Streitz, J. Andre, Cambridge
`University Press, 1990, pp. 213-224.
`
`[3]
`
`[4]
`
`Frei, H.P., Schlluble, P.: Designing a Hyper(cid:173)
`media Information System. Proc. DEXA '91,
`Springer-Verlag, Wien, 1991, pp. 449-454.
`
`Frisse, M.E.: Searching for Information in a
`Hypertext Medical Handbook. Commun. ACM
`31, No. 7, July 1988, pp. 880-886.
`
`[5] Glavitsch, U., Schauble, P.: A SyStem for Re(cid:173)
`trieving Speech Documents. Proc.l5th ACM
`SIGIR Conf., SIGIR Forum, ACM Press,
`June 1992, pp. 168-176.
`[6} Haan, B., Kahn, P., Riley, V., Coombs, J.,
`Meyrowitz, N.: IRIS Hypermedia ServiCes.
`Commun. ACM 35, No. 1, January 1992, pp~
`36-51.
`
`Fig. 10: Distance 2 (v=O.l0/0.20); f2 = ld +iS+ ii
`There is evidence that in a less densiy linked eollec.:.
`tion it might be worthwhile to take distances larger.
`than 1 into account. When we removed 50% of the
`links randomly in our test collection there was -
`of
`course -
`a smaller effectiveness improvement with
`distance 1. On the other hand. we still observed a
`significant improvement with distance 2. This effect
`is reproducible with all the link indexing functions
`presented .here.
`
`5
`
`c·urrent and Future Work
`
`The experiments described in lhls paper showed how
`important i