A Multilevel Approach to Intelligent
`Information Filtering: Model, System, and
`Evaluation
`
`J. MOSTAFA
`Indiana University
`S. MUKHOPADHYAY
`Purdue University
`W. LAM
`The Chinese University of Hong Kong
`and
`M. PALAKAL
`Purdue University
`
`In information-filtering environments, uncertainties associated with changing interests of the
user and the dynamic document stream must be handled efficiently. In this article, a filtering
model is proposed that decomposes the overall task into subsystem functionalities and
`highlights the need for multiple adaptation techniques to cope with uncertainties. A filtering
`system, SIFTER, has been implemented based on the model, using established techniques in
`information retrieval and artificial intelligence. These techniques include document represen-
`tation by a vector-space model, document classification by unsupervised learning, and user
`modeling by reinforcement learning. The system can filter information based on content and a
`user’s specific interests. The user’s interests are automatically learned with only limited user
`intervention in the form of optional relevance feedback for documents. We also describe
`experimental studies conducted with SIFTER to filter computer and information science
`documents collected from the Internet and commercial database services. The experimental
`results demonstrate that the system performs very well in filtering documents in a realistic
`problem setting.
`Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Informa-
`tion Search and Retrieval—clustering; selection process; I.2.6 [Artificial Intelligence]:
`Learning; I.7.3 [Text Processing]: Index Generation
`
`S. Mukhopadhyay was partially supported by NSF CAREER grant ECS-9623971 during the
`course of the research reported in this article.
`Authors’ addresses: J. Mostafa, School of Library and Information Science, Indiana Univer-
`sity, Bloomington, IN 47405-1801; email: jm@juliet.ucs.indiana.edu; S. Mukhopadhyay and M.
`Palakal, Computer and Information Science, Purdue University School of Science at Indianap-
`olis, Indianapolis, IN 46202; W. Lam, Department of Systems Engineering and Engineering
`Management, The Chinese University of Hong Kong, Shatin, Hong Kong.
`Permission to make digital / hard copy of part or all of this work for personal or classroom use
`is granted without fee provided that the copies are not made or distributed for profit or
`commercial advantage, the copyright notice, the title of the publication, and its date appear,
`and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to
`republish, to post on servers, or to redistribute to lists, requires prior specific permission
`and / or a fee.
`© 1997 ACM 1046-8188/97/1000 –0368 $03.50
`
`ACM Transactions on Information Systems, Vol. 15, No. 4, October 1997, Pages 368 –399.
`
`IPR2017-01039
`Unified EX1025 Page 1
`
`General Terms: Algorithms, Experimentation, Theory
`Additional Key Words and Phrases: Automated document representation, information filter-
`ing, user modeling
`
`1. INTRODUCTION
`Information-filtering (IF) systems have recently gained popularity, mainly
`as part of various information services based on the Internet [Edwards et
`al. 1996; Oard 1996]. These systems are similar to conventional informa-
`tion retrieval (IR) systems in that they aid in selecting documents that
`satisfy users’ information needs. However, certain fundamental differences
do exist between IF and IR systems, making IF systems an interesting and
independent object of analysis [Belkin and Croft 1992]. IR systems are
usually designed to facilitate rapid retrieval of information units for
relatively short-term needs of a diverse population of users. In contrast, IF
`systems are commonly personalized to support long-term information needs
`of a particular user or a group of users with similar needs. They accomplish
`the goal of personalization by directly or indirectly acquiring information
`from the user. In IF systems, these long-term information needs are
`represented as interest profiles (Lewis [1992a] refers to them as standing
`queries), which are subsequently used for matching or ranking purposes.
`The interest profiles are maintained beyond a single session and may be
`modified based on users’ feedback. Another important difference has to do
`with the document source. IR systems usually operate on a relatively static
`set of documents, whereas IF systems are usually concerned with identify-
`ing relevant documents from a continuously changing document stream.
`To operate efficiently, IF systems must acquire and maintain accurate
`knowledge regarding documents as well as users. The dynamic nature of
`users’ interests and the document stream makes the maintenance of such
`knowledge quite complex. Acquiring correct user interest profiles is diffi-
`cult, since users may be unsure of their interests and may not wish to
`invest a great deal of effort in creating such a profile. Acquiring informa-
`tion regarding documents is also difficult, because of the size of the
`document stream and the computational demands associated with parsing
`voluminous texts. At any time, new topics may be introduced in the
document stream, or a user’s interests related to topics may change. Furthermore,
sufficiently representative documents may not be available to facilitate
a priori analysis or training. Research on filtering, so far, has not
`clarified to a significant extent how these particular problems associated
`with users and documents may influence the overall filtering process.
`In this article, we present both an analytical and an empirical examina-
`tion of the basic problems in filtering. In our investigation here of the
`demands placed on IF systems, we identify the relevant functions and
`express them at a suitable abstraction level. This abstraction (we refer to it
`as the model) is then implemented as a system using well-known tech-
`niques from information science and machine learning. Following this, the
`performance of the resulting system is subjected to rigorous experimental
`analysis to clarify the influence of major constituent functions on the
`overall filtering process. The primary objective of an IF system is to
`perform a mapping from a space of documents to a space of user relevance
`values. This mapping, in turn, can be decomposed into a multilevel process,
`where the intermediate functions involve the subproblems of representa-
`tion, classification, and profile management. To ensure effective service, we
`further assume that these functions must be realized under two strict
`constraints. First, user intervention in the operation of the system must be
`minimized. That is, the system should rely on automated techniques as
`much as possible for acquiring information about documents and users.
`Second, when faced with changes in documents or users’ information needs,
`the system must adjust quickly with little or no degradation in perfor-
`mance.
`In the rest of this section, we discuss in more detail the challenges
`associated with performing effective filtering while minimizing user inter-
`vention and system degradation. We then identify some of the basic
`problems associated with filtering and delineate our approach for address-
`ing them. We conclude the section by surveying related research. In Section
`2, we present our model for information filtering. A description of an
`implementation of the model, named SIFTER (Smart Information Filtering
`Technology for Electronic Resources), is provided in Section 3. Results of
`experimental analysis conducted on SIFTER (and indirectly on the under-
`lying model used) are presented in Section 4. In Section 5, we discuss
`possible future extensions of SIFTER. Finally, we present our conclusions in
`Section 6.
`
`1.1 Problem Description
Uncertainties in the filtering environment, especially the dynamic nature
of users’ interests and the document stream, make it extremely difficult to
`gather and maintain accurate information necessary for filtering. Rapid or
`gradual changes introduced in the environment, viewed from the perspec-
`tive of the filtering system, are sources of uncertainty. To manage such
`uncertainties requires a high level of adaptivity on the system’s part. This
`adaptivity can be achieved by applying various machine-learning tech-
`niques. The overall problem of IF may then be broadly posed as learning a
`map from a space of documents to the space of real-valued user relevance
`factors. More precisely, denoting the space of documents as D, the objective
is to learn a map f : D → ℝ such that f(d) corresponds to the relevance of a
`document d. Given that such a map is known for all points in D, a finite set
`of documents can always be rank-ordered and presented in a prioritized
`fashion to the user.
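As a minimal sketch of this last point, assuming some estimate of f is already available, ranking a finite document set reduces to sorting by f(d). The names `rank_documents` and `toy_relevance` are illustrative assumptions and are not part of SIFTER; the toy relevance function merely stands in for a learned map.

```python
from typing import Callable, Iterable, List, Tuple


def rank_documents(
    docs: Iterable[str], f: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Return (document, relevance) pairs, most relevant first."""
    scored = [(d, f(d)) for d in docs]
    # Python's sort is stable, so documents with equal scores keep arrival order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_relevance(doc: str) -> float:
    """Hypothetical stand-in for f: fraction of words equal to 'filtering'."""
    words = doc.lower().split()
    return words.count("filtering") / len(words) if words else 0.0


ranked = rank_documents(
    ["neural networks", "information filtering", "filtering filtering"],
    toy_relevance,
)
```

Any real-valued scoring function can be passed in place of `toy_relevance`; the ranking step itself does not depend on how f was learned.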
`In an IF system, f is not known a priori and has to be estimated on-line
`based on queries and user feedback. This could, in principle, be accom-
`plished by setting up some form of a parameterized map approximator
`(such as artificial neural networks) and updating the parameters based on
`the feedback. Such a direct on-line learning of the map f, however, is
computationally intensive and requires a large amount of user feedback,
`considering the high dimensionality of any reasonable representation of the
`documents. To provide a practically feasible solution to the filtering prob-
`lem, we decompose the latter into two levels. The higher level represents a
classification mapping f1 from the document space to a finite number of
classes {C1, ..., Cm} (i.e., f1 : D → {C1, ..., Cm}). This mapping is
`learned in an off-line setting, based on a representative database of
`documents, either by using prior information concerning the classes and
`examples or by automatically discovering abstractions using a clustering
technique. Hence, this higher level partitions the document space into m
equivalence classes over which user relevance is estimated. The lower level
`subsequently estimates the mapping f2 describing user relevance for the
different classes (i.e., f2 : {C1, ..., Cm} → ℝ). Since f2, unlike f and f1,
`deals with a finite input set of relatively few classes, the on-line learning of
`f2 is not unrealistically time consuming and burdensome on the user. Thus,
`the map f is being learned as the composition of f1 and f2. The decomposi-
`tion of f into f1 and f2 clearly limits the maximum achievable filtering
`accuracy, since a class may not correspond to a constant user interest.
`However, in our experience, the resulting inaccuracy is more than ade-
`quately compensated for by the substantial reduction in learning complex-
`ity. If greater accuracy is desired, it can be achieved as a two-stage process.
`In the first stage, a two-level map (i.e., f1 and f2) is learned as stated
`before. Subsequently, a more general single-level learning scheme can be
`initialized on the basis of learned f1 and f2. From then onward, the general
`map can be used for ranking purposes and can be updated on the basis of
`user feedback.
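The two-level decomposition can be sketched as follows. This is an illustration under stated assumptions, not the SIFTER implementation: f1 is approximated by a nearest-centroid rule over hypothetical document vectors, and f2 is a plain lookup table with one entry per class. The centroid and relevance values are invented for the example.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def f1(doc_vec: Vector, centroids: List[Vector]) -> int:
    """Higher level: assign a document vector to its nearest class centroid."""
    distances = [math.dist(doc_vec, c) for c in centroids]
    return distances.index(min(distances))


def make_f(
    centroids: List[Vector], class_relevance: Dict[int, float]
) -> Callable[[Vector], float]:
    """Compose the two levels: f(d) = f2(f1(d))."""
    def f(doc_vec: Vector) -> float:
        # Lower level: f2 is a table lookup over the m classes.
        return class_relevance[f1(doc_vec, centroids)]
    return f


# Hypothetical values: m = 3 classes in a 2-dimensional feature space.
centroids = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
class_relevance = {0: 0.9, 1: 0.1, 2: 0.5}  # only m numbers to learn on-line
f = make_f(centroids, class_relevance)
```

The sketch makes the trade-off visible: every document assigned to the same class receives the same relevance value, which is the accuracy limit noted above, while the on-line part of the problem shrinks to estimating m numbers.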
`Decomposition of f only aids in reducing the learning complexity; it does
`not eliminate it. The on-line learning problem is made even more difficult
`due to the following factors:
`
`(1) Difficulty of Representation: In general, it is not possible to represent D
`exactly by a finite-dimensional space that corresponds to some features
`of the documents (e.g., the relative frequencies of some predefined
keywords). Hence, any finite-dimensional representation space D′ is
`merely an approximation to D, and there is always a loss of information
`in the process. The area of document representation and indexing
`[Salton and McGill 1983] is devoted to discovering methods for finite-
`dimensional representations that minimize the information loss in
`some sense. In a dynamic environment, to make the problem more
`difficult, the most preferable representation scheme is also a function of
`time. The choice of the representation scheme directly affects the
`realization of function f1.
`(2) Stochasticity of Feedback: The user relevance feedback may at certain
`times appear to be random to the filtering system. This can occur due to
`several reasons. First, the particular user interacting with the system
`may have uncertain needs or may not be very discriminating in
`expressing his or her needs. Second, depending on the f1 chosen, the
`target classes may not correspond to the way a user would normally
`group documents. This may lead to the generation of different user
`relevance feedback values for documents belonging to the same class.
`The third and final factor relates to the difficulty described in (1). On
`certain occasions, user feedback may be motivated by particular fea-
`tures (e.g., keywords) in documents that are actually not part of the
`underlying representation scheme. Feedback generated based on such
`“missing features” would appear as random, because the system would
`be unable to determine what caused such feedback.
`(3) Changing Interests of the User: Due to personal or professional reasons,
`a user’s interests may shift or change. These changes may happen in a
`relatively short duration of time or over a long period. We refer to all
`such situations as the nonstationary user case. The shifts can affect the
`user’s interests partially or fully. Whatever the scope of such shifts, the
`interest profile must be updated accordingly. The map f2 is directly
`affected by this problem.
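One generic way to cope with factors (2) and (3) is to keep a running estimate of relevance for each class and move it a small step toward every new feedback value. The sketch below uses an exponentially weighted average for this purpose. It is an illustrative stand-in and not the reinforcement learning algorithm used in SIFTER; the function name, the learning rate of 0.2, and the feedback sequences are all assumptions made for the example.

```python
from typing import Dict


def update_relevance(
    estimates: Dict[int, float], cls: int, feedback: float, rate: float = 0.2
) -> None:
    """Move the estimate for class `cls` a fraction `rate` toward the feedback."""
    old = estimates.get(cls, 0.0)
    estimates[cls] = old + rate * (feedback - old)


estimates: Dict[int, float] = {}

# Noisy but mostly positive feedback for class 0 (stochastic feedback).
for fb in [1.0, 0.0, 1.0, 1.0, 1.0, 1.0]:
    update_relevance(estimates, 0, fb)
before_shift = estimates[0]

# An interest shift: the user now consistently rejects class 0.
for fb in [0.0] * 10:
    update_relevance(estimates, 0, fb)
after_shift = estimates[0]
```

A single inconsistent rating only dents the estimate, whereas a sustained change in feedback pulls it to the new level. A larger `rate` tracks shifts faster but is more sensitive to random feedback.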
`
`As mentioned earlier, due to the inherent complexity, filtering based on a
`direct learning approach is very difficult to accomplish in an efficient
`fashion. Decomposition allows us to isolate more specific problems, and we
`solve them by relying on existing and newly developed approaches. The
`main contributions of this article can now be summarized as follows:
`
`—We present a general model of filtering. As a way to reduce complexity,
`the architecture of the model incorporates multilevel functional decompo-
`sition and supports generality through modularity. It admits application
`of virtually any preferred techniques for basic tasks involving represen-
`tation, classification, and profile management.
`—The idea of learning is made central to the filtering process. We show
`how learning techniques can support the high degree of adaptivity
`required while minimizing user intervention. We apply learning tech-
`niques for acquiring information about both documents and users. To
`support adaptation to changes in the document stream, an unsupervised
`cluster discovery method is used. A reinforcement learning algorithm
`with very low overhead is used for user interest profile acquisition.
`—We demonstrate how representation can be conducted on a dynamic
`stream of text. The method provides a high degree of control in determin-
`ing what content to capture and what to ignore. The classification process
`is also designed to be flexible. The set of classes (i.e., the target of f1) can
`easily be changed by invocation of a relearning process. Both of these
`features allow convenient tuning of the filter to minimize user interven-
`tion.
`—We describe a method to handle profile degradation due to shifts in user
`interests. Graceful handling of interest shifts without requiring addi-
`tional data from the user is supported by the method. It is capable of
`detecting multiple interest shifts in the same user and can take appropri-
`ate actions to minimize possible negative effects on the function f2.
`Preliminary simulation experiments involving an implementation of the
`general model have been reported in earlier sources [Lam et al. 1996;
`Mukhopadhyay et al. 1996]. These experiments simulated the operation of
`filtering LISTSERV emails. In this article, we describe the integration of
`all functionalities into a complete working system, conduct studies involv-
`ing human users in a real-world filtering application, and systematically
`analyze the influence of various user- and system-related parameters on
`the filtering performance.
`
`1.2 Related Work
`As mentioned in the introduction, IF systems are strongly related to IR
`systems in their functional goals and in the methods they apply to accom-
`plish those goals. Belkin and Croft [1992] provided an excellent review of
`IF, comparing it with IR and several other closely related processes (e.g.,
`text routing, text categorization, etc.). We do not intend to repeat such a
`comparison here. Instead, we review literature that deals more directly
`with the problems delineated in the last section.
`A basic filtering problem is to transform a large volume of information
`(text) into entities that permit efficient computation without significant
`loss of content. In our formulation of the problem, this is the objective of
`function f1, mapping documents to a more limited space of document
`classes. This particular task is generally referred to in IR as automated
`document classification. The first step in this process demands that a
`representative feature set for each document be identified. Various tech-
`niques have been developed for feature selection, ranging from simple
`procedures that calculate statistical distribution of keywords to more
`sophisticated techniques relying on analysis based on natural language
`processing (NLP) algorithms. Lewis [1992a] provided a thorough review of
`feature selection procedures and the influence of such procedures on
`document classification. The general and surprising finding of feature
`selection research in IR is that simple keyword-distribution-based ap-
`proaches are almost as effective as more sophisticated approaches [Lewis
`1992b]. The next step in classification involves assigning documents to one
`or more groups. In IR, hierarchical cluster generation techniques such as
`single-link and complete-link methods are commonly applied [Salton 1989].
`A particular track of research in IR has concentrated on generating
`predictive classifier functions based on off-line training conducted on
`representative document sets. As far back as 1963, Borko and Bernick
`[1963] described a text classifier based on a simple linear regression model
`that produced good results. A more recent successful effort that also
`applied a linear model classifier, based on least-squares fit, was described
`by Yang and Chute [1994]. The DARPA-sponsored MUC (Message Under-
`standing Conferences) initiative has generated significant research in the
`area of text routing [Lewis and Tong 1992]. The MUC efforts rely on strong
`NLP approaches to develop classifiers, since analysis is necessary at a
`fine-grain level to assess document content (e.g., identification of terrorist
`events based on news stories). It has been demonstrated, however, that less
`complex linear models may be appropriate if the type of information a
`system must handle is relatively simple (e.g., elements that constitute
`bibliographic document information). Generally, in most IF systems, the
`classification process has to be conducted fast, whereas the classifier
`building process can be delegated to a slower process.
Traditionally, IR paid little attention to the users’ role: specifically,
`identification of users’ interests, representation of interests, and applica-
`tion of such representations in interactions [Belkin and Croft 1992]. My-
`aeng and Korfhage’s [1990] work on user profiles is one of the few and
`important efforts in this area. It attempted to integrate user interest
`profiles in IR systems and focused on various combinations of queries and
`profiles in enhancing retrieval. The profiles however had certain limita-
`tions. They had to be contributed directly by the user (who may be
`uncertain or unwilling to take the trouble), and profiles did not change
`during interaction. To keep up with changes in users’ interests automati-
`cally, systems can rely on internal knowledge representations or on learn-
`ing. Rich [1983] demonstrated how in an IR setting, in the absence of direct
`evidence about information needs, stereotypes can be applied to generate
`user models representing long-term interests. This is an innovative tech-
`nique; however, substantive human investment in knowledge engineering
`would be required to build the user stereotypes. Relevance feedback, a
`highly constrained and indirect form of evidence, has been successfully
`used to learn and adapt representations used for the purpose of query
`reformulation [Frants et al. 1993; Goker and McCluskey 1991]. It should be
`noted, though, that in many IF systems queries (in an IR sense) are not
`necessary, and users’ interests are more stable than in typical IR situa-
`tions. These factors must be taken into consideration in devising methods
`that minimize user involvement in profile management.
`A body of IF research exists that directly addresses problems associated
with profile acquisition and maintenance, applying mostly AI-based techniques.
Malone et al. [1987] described an intelligent message-sharing
system called InfoLens in which users can generate profiles using rules.
`The rules prescribe appropriate actions with tests on content-based factors
`such as message type, date, and sender. Such explicit user-based knowl-
`edge acquisition methods support a high degree of transparency, permit-
`ting users to follow an “up-to-the-moment” knowledge state of the system.
`InfoScope, a system that applies a similar technique, has been developed
`for filtering Usenet news [Fischer and Stevens 1991]. InfoScope uses
`heuristic rules associating common patterns of usage (e.g., number of
`sessions, newsgroups read, frequencies of relevant terms in an article, etc.)
`to appropriate actions. To refine profiles, users must add or remove terms
`from the profile and must set appropriate rule-triggering thresholds. The
`requirement for direct and explicit user input for profile management, in
`our view, is somewhat demanding, and furthermore, such rule-based ap-
`proaches may be too “brittle” to support efficient profile adaptation. News-
`Weeder [Lang 1995] is another Usenet filtering tool. In this, users’ ratings
`of documents are used as training examples for a machine-learning algo-
`rithm that is executed nightly to generate the user interest profiles for the
`next day. By limiting the user input to only ratings of documents, News-
`Weeder is successful in reducing user involvement. However, NewsWeed-
`er’s inability to adapt the profile in an on-line fashion limits its utility.
`SIFT (Stanford Information Filtering Tool) has also been developed to filter
`Usenet news [Yan and Garcia-Molina 1995]. SIFT requires users to specify
`keywords to generate the initial profile. Depending on the user’s choice, the
`filter may be represented using the vector-space model or simply as a
`boolean formula. If a vector-space approach is selected, SIFT can provide
`some adaptivity in profile refinement. In this mode, SIFT requires users to
`provide relevance feedback (by pointing out documents of interest), based
`on which weights in the profile are adjusted accordingly. Finally, NewT
(news tailor) [Sheth 1994] offers the user the option to select multiple
`profiles from a set of predefined profiles that cover common topical areas.
`NewT also applies relevance feedback for profile adaptation. To further
`reduce user involvement in profile refinement, NewT utilizes a genetic
`algorithm to evolve profiles toward increased fitness.
`In summary, IR provides a solid basis to exploit various document
`representation techniques, especially for the intermediate IF stage of
`document classification (i.e., f1). Relevance feedback and machine-learning-
`based approaches show promise in handling the subsequent IF operation of
`user modeling. However, at this point little is known as to how multiple
`functional components can be integrated satisfactorily in a single IF
`system, and additional empirical evidence is required to clarify how char-
`acteristics associated with users and the document stream may affect
`filtering performance.
`
`2. FILTERING MODEL
`There are three important and independent entities that constitute a
`filtering environment. These are the document source, the filter, and the
`user (Figure 1). Documents may exist at various sites and may be received
by the user through disparate channels. The task of storing such documents,
before filtering, is handled by a component we call document
acquisition and management (DAM). DAM is a separate component from
`the filter, and its actual design may vary from one environment to another.
`For example, at its core, DAM may be a web-crawler utility that retrieves
`documents from designated sites, a daemon that maintains indexed files, or
`even a sophisticated DBMS. Whatever the construction of DAM, when
`invoked, it would produce a stream of documents that flows into the filter.
Fig. 1. Model of the filtering process.

The filter itself consists primarily of three modules: (M1) representer,
(M2) classifier, and (M3) profile manager. In the context of the multilevel
decomposition of the map f : D → ℝ (i.e., f = f2 ∘ f1) discussed in Section
1.1, M1 determines the input space for f1; M2 maps the resulting vector
representation to the classification space (i.e., the output for f1); and M3
implements the mapping f2. The functions of these modules are best
`described in terms of two different modes: filter application and filter
`tuning. In the filter application mode, upon arrival of new documents the
`representer module transforms the stream into more efficient representa-
`tions. This transformation would involve identifying relevant concepts and
`correctly assessing the discriminatory value of concepts in relation to
`specific documents. To avoid unnecessary parsing of concepts, a thesaurus
`management submodule would be used to select concepts for only those
`domains that are of interest to the user. The function of the classifier
`module is to identify for each document its corresponding document class or
`group. The classifier module utilizes a classification scheme, generated by a
`submodule, as an off-line process. In selecting an appropriate size for the
space of classes, a crucial constraint must be followed. The space of classes
in the filter must be smaller than the input document space. This
`aspect of the filtering model ensures a significant reduction in computa-
`tional complexity (see the discussion in Section 1.1). The profile manager
`module has the dual role of maintaining accurate interest profiles and
applying the profiles to assess the relevance of documents. The profile
representation consists of information concerning user preferences for the document
classes utilized by the filter. Such preference information may be acquired
`in various ways, but the method requiring the least user effort should be
`favored (i.e., it should be the default method used by the system). It
`appears that the best automatic profile acquisition methods are available
`from the machine-learning literature, relying on relevance feedback from
`the user. Whatever method is ultimately chosen, users should always have
`the option to enter or modify values in their profiles directly to ensure
`transparency of the filter. Once profile representation is achieved, docu-
`ments are ranked in relation to their membership in classes. It is worth
`noting here that, due to the strict imposition of a class space, assignment of
semantically related documents to different classes may occur. But, as
profile learning is always conducted over the set of classes, such assignments would have
minimal effect on the overall document ranking. After the profile is
`learned, the classes that are semantically related are treated approxi-
`mately equally by the system, for ranking purposes.
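To make the role of the representer module (M1) concrete, the following sketch turns a text into a fixed-length vector by counting occurrences of thesaurus terms only, so that concepts outside the domains of interest are never parsed into the representation. It is a simplified illustration under stated assumptions: the three thesaurus terms are hypothetical, only single-word terms are handled, and raw counts stand in for whatever weighting scheme assesses the discriminatory value of a concept.

```python
import re
from typing import Dict, List


def represent(text: str, thesaurus: List[str]) -> List[int]:
    """M1 sketch: count occurrences of each thesaurus term in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts: Dict[str, int] = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    # Terms outside the thesaurus are ignored, so the vector has fixed size.
    return [counts.get(term, 0) for term in thesaurus]


thesaurus = ["filtering", "retrieval", "clustering"]  # hypothetical terms
vec = represent("Filtering and retrieval: filtering news by topic.", thesaurus)
```

The resulting vector is what the classifier module would receive as input; its dimensionality equals the number of thesaurus terms, regardless of document length.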
`At the output end of the filtering model lies the presentation and access
`management (PAM) system. PAM is more tightly coupled with the filter
`than DAM and would normally be the user interface of the filtering system.
`To support the filter application mode, PAM can offer various functions, the
`most important being the actual presentation of documents. PAM must
`allow the user to select documents for display and to control the way
`documents are actually displayed (e.g., window size, font size, color, etc.).
`Another important function of PAM is to collect information for the purpose
of profile management. For example, if relevance feedback is chosen,
functions should exist to permit users to point out relevant documents.
`In modeling the filter, we also identified ways of tuning the filter so as to
`customize and improve its performance. Various types of tuning operations
`can be performed to influence the behavior of the three modules that
`constitute the filter. The frequency of such tuning would vary depending on
`the proximity of the particular module to the user (i.e., from the PAM end
`of the model). Hence, the profile manager is subjected to frequent tuning.
`An important type of tuning that applies to the profile manager module is
`avoidance of profile degradation when a user’s interests change due to some
`external circumstances. Because such a case can have an immediate effect
`on the filter’s performance, it should preferably be handled automatically.
`This would require continuous monitoring of users’ feedback and predicting
`shifts as quickly as possible. We show this tuning operation as a submodule
`of the profile manager module. The structure, size, and content of the
`classification scheme can also have a significant influence on the filter’s
`behavior. Such a scheme is usually generated using a training document
`set (a large and representative document set). However, the content of the
`document stream may change sufficiently over time to demand regenera-
`tion of the classification scheme. This type of tuning would be necessary
`less frequently and can be conducted by a submodule of the classifier (using
`the last n documents as the new training set). Finally, the structure and
`content of the thesaurus may directly affect document representation and
`consequently the rest of the filtering processes. When a domain or a field
`experiences significant change (which usually happens very slowly), tuning
`operations would be needed to update the thesaurus to keep up with such
`changes. We show these operations as a submodule of the representer
`module.
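The classifier tuning operation described above, regenerating the classification scheme from the last n documents, can be sketched with a bounded buffer. This is a hedged illustration only: the class name `ClassifierTuner`, the `learn` callback, and the trivial `mean_centroid` routine are assumptions, and any clustering procedure that returns centroids could be substituted.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Vector = Sequence[float]


class ClassifierTuner:
    """Keep the last n document vectors and relearn classes on request."""

    def __init__(self, n: int, learn: Callable[[List[Vector]], List[Vector]]):
        self.recent: Deque[Vector] = deque(maxlen=n)  # oldest vectors drop out
        self.learn = learn  # any clustering routine returning centroids
        self.centroids: List[Vector] = []

    def observe(self, doc_vec: Vector) -> None:
        """Record a newly arrived document vector."""
        self.recent.append(doc_vec)

    def retune(self) -> None:
        """Regenerate the classification scheme from the current window."""
        self.centroids = self.learn(list(self.recent))


def mean_centroid(vectors: List[Vector]) -> List[Vector]:
    """Trivial stand-in for clustering: a single centroid at the mean."""
    dim = len(vectors[0])
    return [tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))]


tuner = ClassifierTuner(n=2, learn=mean_centroid)
for v in [(0.0, 0.0), (2.0, 0.0), (4.0, 2.0)]:
    tuner.observe(v)
tuner.retune()
```

Because the buffer is bounded, the relearned scheme reflects only recent content of the stream; how often `retune` is called remains a separate policy decision.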
`
`3. SIFTER: AN IMPLEMENTATION OF THE FILTERING MODEL
`As a way to empirically investigate the utility of the model, we imple-
`mented a filtering system named SIFTER (written in C and TCL/TK for a
`Unix environment) that incorporates the major components described in
`the last section. We now describe these components in detail. We begin
`with the filter part of SIFTER, focusing mainly on the three constituent
`modules.
`
`3.1 Document Representation Using a Vector-Space Model
`The first component of the filter (i.e., the document representation module)
`needs to convert documents into structures that can efficiently be parsed
`without the loss of vital content. We chose the vector-space model [Salton
`1989] for document representation, because it has been widely tested and is
`general enough to support other computational requirements of the filter-
`ing environment. This, in turn, relies on a thesaurus management submod-
`ule. At the core of the latter is a set of technical terms or concepts culled
`from authoritative sources representing a given area (presently, we are
`using the ACM Computing Reviews Classi
