`
`Web Site Personalizers for Mobile Devices
`
`Corin R. Anderson, Pedro Domingos, Daniel S. Weld
`{corin, pedrod, weld}@cs.washington.edu
`University of Washington, Seattle, WA, USA
`
`Abstract
`The fastest growing community of web users is that
`of mobile visitors who browse with wireless PDAs,
`cell phones, and pagers. Unfortunately, most web
`sites today are optimized exclusively for desktop,
`broadband clients, and deliver content poorly suited
`for mobile devices — devices that can display only
`a few lines of text using slow wireless networks.
`To best serve the needs of this growing community,
`we propose building web site personalizers that ob-
`serve the behavior of web visitors and automati-
`cally customize and adapt sites for each individual
`mobile visitor. In this paper, we give an overview of
`our approach to web site personalization as utility-
`maximizing search through the space of person-
`alized web sites. Following this framework we
`have implemented two personalizers: PROTEUS
`and MINPATH. PROTEUS allows changes to site
`navigation (adding or removing links) as well as
`content manipulation (rearranging or eliding con-
`tent), and evaluates the result with a learned model
`of the current visitor. MINPATH concentrates ex-
`clusively on adding “shortcut” links, but uses a
`model learned by clustering visitors based on their
`sequences of page requests. We introduce PRO-
`TEUS and MINPATH, and outline our current and
`future directions for these personalizers.
`
`Introduction
`1
`The fastest growing community of web users is that of mobile
`visitors — people who browse the web with wireless PDAs,
`cell phones, and pagers. Ninety-five percent of cell phones
`sold today are “web-ready” and authorities predict that the
`number of wireless Internet devices will outnumber desktop
`computers by 2003. Despite this trend, however, few web
`sites today cater to mobile visitors, instead optimizing their
`content for desktop clients. Unfortunately, mobile devices
`are not as capable as their desktop counterparts, being lim-
`ited by small screens, low-bandwidth networks and slower
`processors. Thus the user experience for mobile visitors at
`these “one-size-fits-all” sites suffers. To address this prob-
`lem, we propose building web site personalizers that auto-
`matically adapt and personalize a web site to each individual
`
`mobile visitor.
`Mobile web visitors exhibit a variety of browsing behav-
`iors: random surfing, task completion (e.g., buying stocks),
`information-goal seeking (i.e., answering questions), etc.
`Information-goal seeking is of particular interest because it
`is generally predictable: visitors tend to have similar infor-
`mation goals in the future as in the past. Some example goals
`include: “What is the current stock price of MSFT?”; “Are
`there any Pentax K-mount zoom lenses on auction at eBay?”;
`“What office is Dan Weld in?”. This behavior is predictable
`because visitors generally follow the same set of links, view
`the same set of pages, to achieve these goals each time, and
`attempt to do so in a direct and efficient manner.
`In addi-
`tion, visitors tend to view pages with similar content as pages
`viewed in the past (e.g., a photographer may frequently view
`pages containing words “zoom lens” and “f-stop”, although
`the URLs requested may differ). By mining past interactions
`with the web site for these behaviors, we can automatically
`personalize the web content for each individual visitor. We
`envision web site personalizers that act on behalf of a mo-
`bile visitor to adapt web content as the visitor browses. A
`web site personalizer is an intermediary between the web site
`and the visitor and may be situated on the web server, on the
`visitor’s device, or at a proxy server in between. A web site
`personalizer can:
`• Make frequently-visited destinations easier to find, by
`highlighting relevant links or adding new links to a page.
`• Highlight content that interests the visitor, by rearrang-
`ing content on the page, or by adding visual cues.
`• Elide uninteresting content and structure, replacing them
`simply with links to the omitted material.
`In
`Web site personalization follows a two-step process.
`the first step, the personalizer builds a model of each visitor
`by mining the access logs and site content. The model in-
`cludes information about navigational browsing behavior as
`well as content interests. This model could also include “out-
`of-band” information, such as visitors’ geographic location or
`demographics. In the second step, the personalizer transforms
`the site to maximize the expected utility [9] for a given visitor.
`The expected utility of a personalized web site is a measure
`of how much benefit the visitor will receive by browsing the
`site; the personalizer computes this value based on the visitor
`model derived in the first step.
`
`SCEA Ex. 1046 Page 1
`
`
`
`a link at the top of the current page may not be difficult, but
`reaching a page many links away will require scrolling (to
`find the links) and waiting for intermediate pages to down-
`load over the wireless network. We transform this intuition
`into practice by recursively defining the utility of a page as
`the sum of its intrinsic utility — the utility of the page, in
`isolation — and its extrinsic utility — the utility of the linked
`pages2. We then calculate the utility of the site by evaluat-
`ing this recursion beginning with the page requested by the
`visitor. We make these concepts more precise below.
`Web site model for evaluation
`We find it advantageous to transform the search state model
`slightly when calculating expected utility. Specifically, we
`into a sequence of “screens”
`now decompose a page pi
`(cid:104)si0, . . . , sim(cid:105), each of which represents the web content
`that can be seen in one window of the visitor’s browser. A
`screen sij is composed of web content (i.e., text and graph-
`ics), which we denote as Tij, and a set of links lij1, ..., lijk.
`Expected utility
`Let ˆp be the personalized page for the requested URL u and
`let UV (ˆp) be the utility of ˆp for visitor V . The evaluation
`of W is the result of a recursive traversal through the site
`beginning with ˆp. Because only the first screen of ˆp is initially
`visible to the visitor, the expected utility of ˆp (or any pi, in
`fact) is the expected utility of its first screen:
`E[UV (pi)] = E[UV (si0)]
`The expected utility of a screen sij, in turn, is the sum of
`its intrinsic and extrinsic utilities:
`E[UV (sij)] = E[IUV (sij)] + E[EUV (sij)]
`The intrinsic utility of a screen measures how useful the
`screen’s content is towards fulfilling the visitor’s information
`goal, independent of the rest of the web site. Typically, the in-
`trinsic utility depends on the visitor model — past history and
`demographics. A more detailed description of intrinsic utility
`depends on particular assumptions regarding visitor interests
`and goals; section 3 discusses the method used in PROTEUS.
`The extrinsic utility measures the value of a screen by its
`connections to the rest of the web site. To reach the rest of the
`site, we model the visitor as choosing from a fixed set of nav-
`igation actions, any number of which the visitor may select
`(i.e., the actions are not mutually exclusive). Specifically, if
`the visitor is at screen sij, then the visitor may: scroll down
`to the next screen (assuming that sij is not the last screen of
`the page); or follow any link that appears on the screen. We
`maintain independent probabilities that the visitor will take
`each action, denoted as P (action), as well as the cost (i.e.,
`negative utility) each action imposes on the visitor, denoted
`γs and γl for scrolling and following a link, respectively. If
`we let dijk be the destination page of link lijk, then the ex-
`trinsic utility of screen sij is a sum weighted by probabilities:
`E[EUV (sij)] = P (scroll)(E[UV (si,j+1)] − γs) +
`[P (lijk)(E[UV (dijk)] − γl)]
`
`(cid:88)k
`
`2In many ways, intrinsic and extrinsic utilities are analogous to
`Kleinberg’s authority and hub weights [10].
`
`The focus of our work is in defining this framework of
`personalization as search and exploring how to instantiate
`this framework in a practical manner. This paper discusses
`our recent efforts along these lines and details current work.
`This paper draws heavily from previous papers describing our
`work [1; 2].
`
`2 Personalization as search
`We first describe our approach briefly. Our web site person-
`alizer performs a search through the space of possible web
`sites. The initial state is the original web site of unmodified
`pages. The state is transformed by any of a number of adap-
`tation functions, which can create pages, remove pages, add
`links between pages, etc. The value of the current state (i.e.,
`web site) is measured as the expected utility of the site for
`the current visitor. The search continues either until no bet-
`ter state can be found, or until computational resources (e.g.,
`time) expire.
`
`2.1 State representation
`Each state in our search space is an entire web site, W .
`Although an actual implemented system (such as those dis-
`cussed in sections 3 and 4) may choose to personalize only a
`single page at a time, we model the entire web site to allow
`adaptations to be made anywhere in the site. The web site
`W is modeled as a directed graph whose nodes are pages,
`p0, . . . , pn, and whose arcs are hypertext links, l0, . . . , lm.
`A link lk is a triple (ps, pd, a) where ps is the source page
`(i.e., the page on which the link appears), pd is the destina-
`tion page, and a is the anchor text. Each page pi is modeled as
`a hierarchy of web content, much in the same way the parse
`tree of an HTML document confers a hierarchy of HTML
`tags. pi is thus represented as the root of this hierarchy, and
`is a content node. A content node c is a pair (C, B) where C
`is a sequence of children (cid:104)c1, . . . , ck(cid:105) of c and B is a behav-
`ior that c imparts on its children. The elements of C may be
`either plain text or (recursively) content nodes. The behavior
`B is the action that affects the human-viewable content. For
`example, if c were a “<strong>” node, then B would ren-
`der its children in boldface; or if c were an “<a>” node, then
`B would render them as a hypertext link. Summarizing:
`S = {W0, W1, . . .} Each state in the space is a web site
`W = ({p0, . . . , pn}, A web site is a directed graph of
`{l0, . . . lm})
`pages and links
`lk = (ps, pd, a)
`A link has a source, destination,
`and anchor
`pi = ci
`A page is a root content node
`ci = ((cid:104)ci1, . . . , cik(cid:105), A content node is a sequence of
`B)
`children and node behavior; or
`ci = text
`A content node is plain text
`
`2.2 State Evaluation
`We estimate the quality of the personalized web site as its
`expected utility from the point of view of the requested page.
`Intuitively, the expected utility is the sum of the utility the vis-
`itor receives by browsing each page in the site, discounted by
`the difficulty of reaching each page1. For example, following
`
`1We actually compute utility at a finer granularity than a page.
`
`SCEA Ex. 1046 Page 2
`
`
`
`This equation recursively references the utility of other
`pages and screens. The recursion is halted when the ex-
`pected utility of a screen or page is less than the cost of
`reaching that content (i.e., when E[UV (si,j+1)] < γs or
`E[UV (dijk)] < γl).
`If evaluated na¨ıvely, expected utility is not computationally
`tractable – it would require a screen-by-screen decomposition
`of potentially every page in the entire web site. Fortunately,
`the evaluation is made computationally much simpler with a
`few assumptions. First, when the cost of scrolling dominates
`the cost of following a link (γl + γs ≈ γl), then we treat all
`pages but ˆp as single-screen pages and can ignore the recur-
`sion due to scrolling on these pages. Second, we limit how
`deeply our recursion proceeds by setting a minimum thresh-
`old on the probability of viewing a screen or page — if the
`probability of it being visited is lower than the threshold, then
`it is excluded from the calculation. The threshold allows us to
`trade off performance for accuracy: the lower the threshold,
`the more accurate the evaluation, at the expense of recurring
`through more of the site.
`
`3 Proteus
`We have implemented this search framework in the web site
`personalizer PROTEUS. While most of the details of the im-
`plementation should be clear from the framework, we discuss
`a few implementation-specific issues here.
`
`3.1 Search operators
`To reduce the complexity of PROTEUS, we require that search
`operators directly affect the requested page ˆp in some way,
`e.g., adding links to ˆp or manipulating content on ˆp. We ex-
`clude operators that, for instance, generate new pages or add
`links between two other pages.
`PROTEUS supports two transformation operators3: elide-
`content and add-shortcut. elide-content replaces a block
`of content on ˆp with a link to the original content in a fash-
`ion similar to Digestor [3] (Figure 1). The add-shortcut
`operator creates a new link from ˆp to some other page that
`can be reached by following at most k links from ˆp. Thus,
`add-shortcut creates a link that “shortcuts” a longer path of
`links. For example, if the visitor previously followed the path
`ˆp → pa → pb → pc → pd, add-shortcut may create a link
`directly from ˆp to pd. Furthermore, add-shortcut places the
`new link ˆp → pd next to the original link ˆp → pa, anticipating
`that, when the visitor wants to find pd again, the visitor will
`look towards the link to pa first. This placement is possible
`only if the visitor actually followed the path previously, oth-
`erwise, add-shortcut places the link near paths other visitors
`at the site have taken. The anchor text of the link is chosen
`heuristically as either the destination’s <title> or <h1>.
`
`3.2 Expected Utility
`Our implementation measures intrinsic utility of a screen as a
`weighted sum of two terms: how well the screen’s content
`matches the visitor’s previously viewed content, according
`
`3PROTEUS supports a third operator, swap-siblings, but we omit
`its discussion for space considerations.
`
`to a text-similarity measure, and how frequently the visitor
`viewed this screen. See earlier work [2] for more detail.
`PROTEUS estimates the action probabilities by measuring
`the frequency with which the visitor took each action in the
`past. For example, the probability that the visitor follows a
`link ps → pd is the quotient of the number of sessions in
`which the visitor viewed pd sometime after ps divided by the
`number of sessions in which the visitor viewed ps. The prob-
`ability for scrolling is derived empirically and is held con-
`stant at 0.85, although we are in the process of determining
`this number as part of the visitor model. Also through em-
`pirical evaluation, we set the cost of scrolling, γs, at 0.01 and
`the cost of following a link, γl, at 0.05. These values work
`acceptably in practice, although our results are largely insen-
`sitive to the exact values. In practice, the dominant term in
`the expected utility equation is the product of probabilities of
`taking chains of actions. For example, for all but the most
`probable links, the contribution of a remote page to expected
`utility is already vanishingly small, irrespective of the cost of
`following the link.
`
`3.3 Results
`In this section we present the results of a small user study
`of PROTEUS. We track ten test subjects’ browsing habits on
`their desktop workstations and then measure how effectively
`they use a suite of personalized and non-personalized web
`sites on a wireless Palm Connected Organizer. We measure
`visitor effort in terms of both time to attain the goal and the
`amount of navigation (number of scrolling actions and links
`followed) required.
`To collect training data, we asked the subjects to perform
`a suite of information-seeking tasks that we provided daily.
`We directed the subjects to attain their goals by browsing ex-
`clusively at the given site starting from a given page. An ex-
`ample question is: “Find the current stock price for MSFT,
`starting at finance.yahoo.com”. The tasks in the seed
`suite were drawn randomly from a distribution of parametric
`questions and represent a coherent model of visitor interest.
`Following the training phase, we asked the subjects to an-
`swer another suite of questions using a wireless Palm Con-
`nected Organizer. We asked them to answer the questions
`twice: once, on the unmodified web site, and again on a per-
`sonalized version of the target site. PROTEUS personalized
`the target site for each visitor by first building a model of that
`visitor, based on the subject’s past browsing data, and then
`creating adapted pages for the testing suite. Because our cur-
`rent implementation is not yet fast enough to adapt a single
`page in real-time, we personalized the sites before the sub-
`jects performed their tests. We chose which pages for PRO-
`TEUS to adapt by using our human judgment of where the
`subjects would likely visit during the test. Note that we have
`not influenced the personalization at all — we have merely
`selected the subset of pages that PROTEUS personalizes, for
`efficiency.
`Figure 2 compares links followed to attain each goal on
`the personalized versus unmodified web sites (the graphs of
`the time required and scrolling actions are similar). The y-
`axis shows the number of links while the x-axis shows the
`location of each goal listed chronologically. The graph shows
`
`SCEA Ex. 1046 Page 3
`
`
`
`Figure 1: Elided content. On the left is an unmodified web page. On the right a number of blocks of content have been elided
`and replaced with hypertext links.
`
`4.1 The MinPath algorithm
`A trail [16] T = (cid:104)p0, p1, . . . pn(cid:105) is a sequence of page re-
`quests such that each request occurs within some fixed time
`window of the previous request and is the destination of a
`link on the previous page. The personalizer watching a vis-
`itor’s behavior midway through the trail sees only a prefix,
`(cid:104)p0, . . . , pi(cid:105). The trail suffix, (cid:104)pi+1, . . . , pn(cid:105), must be hypoth-
`esized by the personalizer.
`If one had knowledge of the complete trail (cid:104)p0, . . . ,
`pi, . . . pn(cid:105), selecting the best shortcut at any page pi is easy:
`simply, pi → pn. Of course, given only a trail prefix, the per-
`sonalizer must infer the remaining pages. Our approach uses
`a model of the visitor’s behavior to compute a probability for
`every possible trail suffix (cid:104)qi+1, . . . , qn(cid:105) on the site.
`Intu-
`itively, these suffixes are all possible trails originating from
`pi. Given a suffix and its probability, we assign an expected
`savings to the shortcut pi → qj for each qj in the suffix as
`the product of the probability of the suffix and the number of
`links saved by the shortcut. Note that many trail suffixes may
`pass through the same page qj, and so the expected savings
`of a shortcut pi → qj is summed over all suffixes.
`For example, suppose that a visitor requests the trail prefix
`(cid:104)A, B, C(cid:105) and we wish to find shortcuts to add to page C.
`Suppose further that our model of the visitor indicates there
`are exactly two sequences of pages the visitor may complete
`the trail with: (cid:104)D, E, F, G, H(cid:105), with probability 0.6, and
`(cid:104)I, J, H, K(cid:105) with probability 0.4. The expected savings from
`the shortcut C → E would be 0.6×1 = 0.6, because the trail
`with E occurs with probability 0.6 and the shortcut saves only
`one link. The expected savings for shortcut C → H includes
`a contribution from both suffixes: 0.6 × 4 + 0.4 × 2 = 3.2.
`MINPATH constructs trail suffixes by traversing the di-
`rected graph induced by the web site’s link structure. Start-
`ing at the page last requested by the visitor, pi, MINPATH
`computes the probability of following each link and recur-
`sively traverses the graph until the probability of viewing a
`page falls below a threshold, or a depth bound is exceeded.
`
`Figure 2: Links followed.
`
`that for a majority of the sites, PROTEUS’s personalizations
`appear quite useful: the addition of shortcut links and elision
`of unnecessary content reduced the amount of visitor naviga-
`tion, and hence the amount of time spent, at the sites.
`
`4 MinPath
`
`One of PROTEUS’s weaknesses is that it builds its visitor
`models in isolation of each other. When training data is
`sparse, these models can be quite inaccurate.
`Instead, we
`would like to combine data from many, similarly-behaving
`visitors to build more robust models. To this end, we devel-
`oped MINPATH, an algorithm that finds high-quality short-
`cuts by modeling clusters of visitors, and using these mod-
`els to predict where in the site the visitor is likely to travel.
`By leveraging data from many visitors in a single cluster, the
`models MINPATH builds are more accurate and lead to bet-
`ter personalizations. Moreover, while not as general as PRO-
`TEUS’s expected utility, MINPATH’s evaluation metric — ex-
`pected savings — can be very efficiently computed.
`
`Unmodified
`Personalized
`
`0123456
`
`# links followed
`
`cs.washington.edu
`
`cnet.com
`cs.washington.edu
`
`ebay.com
`finance.yahoo.com
`
`cnn.com
`cs.washington.edu
`finance.yahoo.com
`
`cnn.com
`cs.washington.edu
`finance.yahoo.com
`
`ebay.com
`
`SCEA Ex. 1046 Page 4
`
`
`
`The savings at each page is the product of the probability of
`reaching that page and the number of links saved. MINPATH
`collates the results and returns the best shortcuts. We next
`describe how we obtain the model required by MINPATH.
`
`4.2 Predictive Models
`The key element to MINPATH’s success is the predictive
`model of web usage. The probabilistic model predicts the
`next web page request pi given a trail prefix (cid:104)p0, . . . , pi−1(cid:105)
`and the visitor’s identity V : P (pi = q|(cid:104)p0, . . . , pi−1(cid:105), V ). Of
`course, a model may condition this probability on only part
`or even none of the available data; we explore such models in
`our experiments. We fit the models to past web usage data,
`either to all the visitors to the site at once, or to clusters of
`visitors. We group visitors by clustering their sequences of
`web page requests, and use EM to simultaneously find the
`clusters and fit the models. For the en masse and cluster-
`ing approaches, we evaluate two types of models: uncondi-
`tional, which predicts the next link without regard to previ-
`ous browsing, and Markov, which predicts the next link given
`the previous request. When applied to clusters of visitors, we
`effectively have mixture models, either Na¨ıve Bayes mixture
`models [6] or mixtures of Markov models [5]. We describe
`these models in more detail in our other work [1].
`
`4.3 MinPath results
`We evaluate MINPATH’s performance on usage at our home
`institution’s web site. We use web access data for September
`2000 to produce a training set of 35,212 trails (approximately
`20 days of web usage) and a test set of 2,500 trails (1.5 days);
`the time period from which the test trails were drawn occurred
`strictly after the training period. We selected only those trails
`with link length at least two, because shorter trails cannot be
`improved. We measure MINPATH’s performance by the num-
`ber of links a visitor must follow to reach the end of the trail.
`We compare MINPATH’s performance using the models
`described earlier (see Figure 3). The first column shows the
`number of links followed in the unmodified site. In the sec-
`ond and third sets of columns, MINPATH uses, respectively,
`an unconditional and Markov model, each fitted to all the
`site’s visitors, and produces 1, 3, or 5 shortcuts per page4.
`In the last two sets, MINPATH uses mixture models of either
`10 or 25 clusters, and selects the distribution of the models
`in the mixtures based on only the current trail prefix (ignor-
`ing past visitor behavior). This graph demonstrates that the
`shortcuts MINPATH finds are of high quality — when using a
`mixture of Markov models and suggesting just three short-
`cuts, MINPATH can eliminate an average of 0.97 links, or
`40% of the possible savings. Of course, the actual effective-
`ness of adding shortcuts will depend on the visitor’s ability
`to discern whether the link will be of value, and in section 6
`we describe our ongoing work to intelligently select appro-
`priate link anchor texts. Finally, we note that MINPATH’s
`running time is quite small. The models are learned offline,
`but the process usually requires only several minutes. Given
`a model and the trail prefix, MINPATH finds a set of short-
`cuts in 0.65 seconds on an average desktop PC. MINPATH’s
`
`4We limit MINPATH to only as many shortcuts as can reasonably
`be displayed on the small screen of a wireless web browser.
`
`Figure 3: MinPath’s performance. Each column shows the
`average number of links followed in a trail. Mixture model
`columns are annotated with the number of clusters. Error-
`bars denote 95% confidence intervals.
`
`expected savings computation is substantially faster than and
`positively complements PROTEUS’s more general approach.
`
`5 Related work
`Two closely related lines of research are IndexFinder [15] and
`Digestor [3].
`IndexFinder creates singular transformations
`that appeal to all visitors at the site, in particular, generat-
`ing new index pages — hubs of links to other pages on the
`site. These pages are evaluated based strictly on the naviga-
`tional usage patterns of past visitors — the pages requested
`— irrespective of their content. Digestor optimizes pages for
`small-screen display and uses a steepest-descent search simi-
`lar to ours. However, Digestor rates the quality of a web page
`simply by how much screen space it occupies, a metric that
`encourages degenerate pages — a blank web page receives
`the highest quality value. In contrast, our approach personal-
`izes content per visitor, and evaluates the adaptations using a
`principled, utility-maximizing approach.
`The Web Browser Intelligence (WBI) [12] project pro-
`poses an architecture of pluggable intermediaries between
`web servers and clients. These intermediaries generate, trans-
`form, and monitor the content they see as visitors browse, and
`can be used either individually or in chains. An interesting
`line of future research would be to integrate PROTEUS into
`the WBI framework and investigate what what other interme-
`diaries could be usefully composed with PROTEUS.
`Countless other systems attack all or part of the web site
`personalization problem; we mention briefly several related
`systems. The Daily Learner [4] learns a Palm VII user’s pref-
`erence for news content, by monitoring exactly which stories
`the user requests from both the Palm device and the user’s
`corresponding desktop computer. Mobasher et. al. [13] mine
`web usage patterns and web content to personalize the vis-
`itor experience, specifically, to recommend new content the
`visitor may like to see. The PersonalClipper [7] allows visi-
`tors to build their own custom views of web sites by record-
`ing navigational macros using a VCR-metaphor and select-
`ing components of the target page to view with the mobile
`device. Letizia [11], WebWatcher [8], and adaptive web site
`agents [14] guide visitors by suggesting pages they may like
`
`3.42
`
`3.13
`
`2.75
`
`2.63
`
`2.74
`
`2.47
`
`2.31
`
`2.72 2.70
`
`0 shortcuts
`1 shortcut
`3 shortcuts
`5 shortcuts
`
`2.45 2.45
`
`Unmodified
`
`Unconditional
`model
`
`Markov
`model
`
`10
`25
`Naïve Bayes
`mixture model
`
`25
`
`10
`Mixture of
`Markov models
`
`4.00
`
`3.50
`
`3.00
`
`2.50
`
`2.00
`
`1.50
`
`1.00
`
`0.50
`
`0.00
`
`Average # of links per trail
`
`SCEA Ex. 1046 Page 5
`
`
`
`to see. Our work differs from this earlier agent-based work in
`that we concentrate on personalizing the web site for each vis-
`itor, instead of merely suggesting which links to follow next,
`and in the depth to which we mine the web logs for useful
`access patterns.
`
`6 Future work
`Our approach to personalization as search offers many fruit-
`ful lines of continued research; we are currently exploring
`several directions. First, we are investigating how to incor-
`porate MINPATH’s cluster-based model of visitors into PRO-
`TEUS’s expected utility framework. The models MINPATH
`and PROTEUS employ differ substantially: MINPATH builds
`a predictive model of visitor behavior, while PROTEUS builds
`a descriptive model of visitor behavior and interests. At the
`heart of this work is applying the cluster-based techniques to
`descriptive models.
`Second, we are exploring how to automatically select con-
`cise and descriptive anchor texts for shortcut links. PRO-
`TEUS’s heuristics to select either a shortcut destination’s
`<title> or <h1> text as the anchor are frequently in-
`adequate. The title text of a shortcut destination may contain
`redundant information. A page’s <h1> tag may contain too
`little information, or may not exist at all. Instead, we propose
`using an information-based approach to select anchor words
`with the highest information content.
`In a third direction we are considering incorporating guid-
`ance from both the web master and the web visitor into the
`personalizer’s search process. The web master can provide
`suggestions, either directly or in the form of annotations in
`HTML documents, on what personalizations are allowed or
`disallowed and which might be the most useful. The web
`visitor can express to the personalizer certain aspects of the
`visitor model, perhaps as a list of keywords the visitor finds
`interesting, or a set of pages that the visitor wants to be able
`to easily find.
`Finally, we are expanding the set of transformations that
`PROTEUS can make to a site. Specifically, we are investi-
`gating a transformation that can aggregate blocks of content
`from many pages into a single view for the visitor. Such a
`transformation can allow PROTEUS to effectively create “por-
`tal pages” that integrate information from several separate re-
`sources on a web site. The challenge of this work lies in ap-
`propriately melding blocks of content into a cohesive whole
`by carefully analyzing the donor and recipient documents for
`clues about their presentation.
`
`7 Conclusions
`As the community of mobile web visitors grows, so grows the
`need for web sites to cater to visitors off the desktop and off
`broadband network connections. We propose building web
`site personalizers to meet the need of this community, and
`this paper gives an overview of our approach to personalizers
`as utility-maximizing search. Initial results from our person-
`alizers, PROTEUS and MINPATH, indicate that this approach
`holds great promise for improving the mobile web experi-
`ence. In the future we plan to extend their capabilities and
`continue to improve browsing for mobile visitors.
`
`[7]
`
`[5]
`
`References
`[1] C. R. Anderson, P. Domingos, and D. S. Weld. Adaptive
`web navigation for wireless devices. in Proc. of the 17th
`Intl. Joint Conf. on Art. Int., 2001.
`[2] C. R. Anderson, P. Domingos, and D. S. Weld. Person-
`alizing web sites for mobile users. In Proc. of the 10th
`Intl. WWW Conf., 2001.
`[3] T. W. Bickmore and B. N. Schilit. Digestor: Device-
`independent access to the World Wide Web. In Proc. of
`the 6th Intl. WWW Conf., 1997.
`[4] D. Billsus, M. J. Pazzani, and J. Chen. A learning agent
`for wireless news access. In Proc. of the Conf. on Intel-
`ligent User Interfaces, 2000.
`I. V. Cadez, D. Heckerman, C. Meek, P. Smyth, and
`S. White. Visualization of navigation patterns on a web
`site using model based clustering. In Proc. of the 6th.
`Intl. Conf. on Know. Disc. and Data Mining, 2000.
`[6] P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and
`D. Freeman. AutoClass: A Bayesian classification sys-
`tem. In Proc. of the 5th Intl. Conf. on Mac. Learning,
`1988.
`J. Freire and B. Kumar. Web services and information
`delivery for diverse environments. In Proc. of the VLDB
`Wkshp. on Tech. for E-Services, 2000.
`[8] T. Joachims, D. Freitag, and T. Mitchell. WebWatcher:
`A tour guide for the World Wide Web. In Proc. of the
`15th Intl. Joint Conf. on Art. Int., 1997.
`[9] R. L. Keeney and H. Raiffa. Decisions with Multiple
`Objectives: Preferences and Value Trade-Offs. Wiley,
`New York, NY, 1976.
`[10] J. Kleinberg. Authoritative sources in a hyperlinked en-
`vironment. In Proc. 9th ACM-SIAM Symp. on Discrete
`Alg., 1998.
`[11] H. Lieberman. Letizia: An agent that assists web brows-
`ing.
`In Proc. of the 14th Intl Joint Conf. on Art. Int.,
`1995.
`[12] P. P. Maglio and R. Barrett. Intermediaries personalize
`information streams. Comm. of the ACM, 43(8), 2000.
`[13] B. Mobasher, H. Dai, T. Luo, Y. Sun, and J. Zhu. Com-
`bining web usage and content mining for more effec-
`tive personalization.
`In Proc. of the Intl. Conf. on E-
`Commerce and Web Technologies (ECWeb), 2000.
`[14] M. J. Pazzani and D. Billsus. Adaptive web site agents.
`In Proc. of the 3rd Intl. Conf. on Auto. Agents, 1999.
`[15] M. Perkowitz and O. Etzioni. Towards adaptive web
`sites: Conceptual framework and case study. Art. Int. J.,
`118(1–2), 2000.
`[16] A. Wexelblat and P. Maes. Footprints