`Tso et al.
`
`111111111111111 IIII IIII 1111 11111111111111111 IIII IIIII Ill Ill Ill
`US006385602Bl
`US 6,385,602 Bl
`May 7, 2002
`
`(10) Patent No.:
`( 45) Date of Patent:
`
`(54) PRESENTATION OF SFARCH RESULTS
`USING DYNAMIC CATEGORI7ATION
`
`(75)
`
`Inventors: Michael TS>, Cupertino; Jeff Clarke,
`Menlo Park; Eugene Rollins,
`Sunnyvale; Arkady Borkovsky, San
`Francisco, all of CA (US)
`
`(73) Assignee: e-centlves, Inc., Bethesda, M D (US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S .C. 154(b) by0 days.
`
`(21) Appl. No.: 09/186,044
`
`Nov. 3, 1998
`(22) Filed:
`(51) Int. Cl.7 ................................................ G06F 17/30
`(52) U.S. Cl. ........................................................... 707/3
`(58) Field of Search ....................... 707/1 10, 100 104,
`707/201 203, 205; 709/201 203, 217; 345/473,
`428
`
`(56)
`
`References Cited
`
`U.S. PJUENT DOCUMENTS
`
`5,875,446 A
`5,924,090 A
`5,940,821 A
`6,028,605 A
`
`• 2/1999 Brown et al. .................. 7ITT/3
`• 7/1999 Krellenstein ................... 7ITT/5
`• 8/1999 Wical ............................ 7ITT/3
`•
`'2/2fXX) Conrad et al. .............. 345/354
`
`OTHER PUBLICATIONS
`
`Can et al., " Concept and effectiveness of the cover coeffi
`cient based clustering methodology for text database",
`1990, ACM Press, NY, USA, pp. 483 517. *
`* cited by examiner
`
`Primary Examiner John Breene
`Assistant Examiner-Mohammad Ali
`(74) Attorney, Agent, or Firm-Hickman Palermo Truong
`& Becker LLP; Edward A. Becker
`ABSTRACT
`
`(57)
`
`An approach for presenting search results using dynamic
`categorization involves examining search results and
`dynamically establishing one or more categories of search
`results based upon attributes of the search results. A variety
`of grouping or clustering techniques may be used to dynami
`cally establish the categories of search results. The catego
`ries of search results are then presented using category
`indicators.
`
`5,742,816 A
`
`• 4/1998 Barr et al. .................. 707/104
`
`48 Claims, 5 Drawing ShEets
`
`100 "
`
`START
`
`102
`
`RECEIVE SEARCH RESl.l. TS
`
`DYNAMICAU Y ESTABLISH 01\E CR P.mE SEARCH
`RESULT CATE~ES BASED UPOJ ATTRIBUTES
`CfTI-E MATCHING DATA ITEMS
`
`106
`
`PRESENT SEAFDi RESULTS BASED UPON THE
`OOE OR MORE SEARQ-i RESULT CATEGCRIES
`
`108
`
`END
`
`110
`
`001
`
`GOOGLE 1006
`
`
`
`U.S. Patent
`
`May 7, 2002
`
`Sheet 1 of 5
`
`US 6,385,602 Bl
`
`FIG. 1
`
`100 "
`
`102
`
`RECEIVE SEARCH RESULTS
`
`104
`
`DYNAMICALLY ESTABLISH ONE OR MORE SEARCH
`RESULT CATEGORIES BASED UPON ATTRIBUTES
`OF THE MATCHING DATA ITEMS
`
`106
`
`PRESENT SEARCH RESULTS BASED UPON THE
`ONE OR MORE SEARCH RESULT CATEGORIES
`
`110
`
`002
`
`
`
`U.S. Patent
`
`May 7, 2002
`
`Sheet 2 of 5
`
`US 6,385,602 Bl
`
`FIG. 2
`
`200
`\
`
`202
`
`RECEIVE SEARCH RESULTS
`
`204
`
`FILTER SEARCH RESULTS AND GENERATE QUALIFYING DATA ITEMS
`
`OPTIONAL SORT BY ATTRIBUTES OF QUALIFYING DATA ITEMS AND
`GENERATE SORTED SEARCH RESULTS
`
`208
`
`210
`
`IDENTIFY COMMON ATTRIBUTES AMONG QUALIFYING DATA ITEMS
`
`DETERMINE SIMILARllY DATA FOR QUALIFYING DATA ITEMS
`
`212
`
`214
`
`GROUP QUALIFYING DATA ITEMS BASED ON THE SIMILA.Rill' DATA
`
`216
`
`SELECT CATEGORIES BASED ON THE GROUPINGS
`
`218
`
`ASSIGN QUALIFIED DATA ITEMS TO CATEGORIES
`
`220
`
`PRESENT CATEGORIES AND QUALIFIED DATA llEMS
`
`END
`
`224
`
`003
`
`
`
`U.S. Patent
`
`May 7, 2002
`
`Sheet 3 of 5
`
`US 6,385,602 Bl
`
`FIG. 3A
`
`302-- Automobiles: Compact Cars
`Tango
`{
`Foxtrot
`
`308
`
`USER INTERFACE
`
`---,__300
`
`310
`
`{
`
`304- Automobiles: Full Size Cars
`Zebra
`Elephant
`Rhino
`<more in this category>
`
`314-
`
`306-- Automobiles: Sports Cars
`312-
`Spark
`
`FIG. 3B
`
`302- Automobiles: Compact Cars
`Tango
`{
`3oa
`Foxtrot
`
`304- Automobiles: Full Size Cars
`316-
`<$25,000
`Zebra
`{
`320
`Elephant
`318-
`>$25,000
`322-
`Rhino
`324-
`<more in this category>
`
`306-+ Automobiles: Sports Cars
`312----
`Spark
`
`USER INTERFACE
`
`i----...300
`
`004
`
`
`
`U.S. Patent
`
`May 7, 2002
`
`Sheet 4 of 5
`
`US 6,385,602 Bl
`
`FIG. JC
`
`USER INTERFACE
`
`300
`
`Automobiles:
`Automobiles:
`Automobiles:
`Automobiles:
`Automobiles:
`Automobiles:
`Automobiles:
`
`Sub-Compact Cars
`Compact Cars
`Mid-Size Cars
`Full-Size Cars
`Sports Cars
`Classic Cars
`Custom Cars
`
`(12)
`(19)
`(39)
`(24)
`(28)
`(14)
`(2)
`
`\
`332
`
`" 330
`
`005
`
`
`
`FIG.4
`
`DISPLAY
`.4..12
`
`INPUT DEVICE
`ill
`
`CURSOR
`CONTROL
`.4.1fi
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`SERVER
`.430
`
`INTERNET
`
`.402
`
`PROCESSOR
`40A
`
`COMMUN ICATION
`.4..18.
`INTERFACE
`
`NETWORK
`: LINK
`
`- - - - - - - -
`
`-
`
`- - - - - -
`
`-
`
`I 420
`_ __ __ _J
`
`4(lO_
`
`LOCAL
`NETWORK
`ill
`
`HOST
`ill
`
`006
`
`
`
`US 6,385,602 Bl
`
`1
`PRESENTATION OF SEARCH RESULTS
`USING DYNAMIC CATEGORIZATION
`
`FIELD OF THE INVENTION
`
`The present invention relates to information retrieval, and
`more specifically, to an approach for presenting search
`results using dynamic categorization.
`
`BACKGROUND OF THE INVENTION
`
`Information systems provide for the storage, retrieval and
`sometimes management of data. Information is typically
`retrieved from an information system by submitting a query
`to the information system, where the query specifies a set of
`retrieval criteria. The information system processes the 15
`query against a database and provides data that satisfies the
`search criteria (search results) to a user.
`The form of search results depends upon the context in
`which a particular search is performed. For example, in the
`context of a database search, search results might consist of
`a set of rows from a table. In the context of the global
`information network known as the "Internet", the search
`results might consist of links to web pages.
`For the purpose of explanation, the specific data items
`against which a search query is executed are referred to 25
`herein as searchable data items. The set of all searchable data
`items against which a query is executed is referred to herein
`as the searchable data set. The specific searchable data items
`that satisfy a particular query are referred to herein as
`matching data items. The set of all matching data items for 30
`a given query are referred to herein as the search results of
`the query.
`Processing a query containing general or generic search
`terms against a large searchable data set can result in a large
`number of unorganized matching data items, sometimes
`referred to as "hits." For example, processing a query
`containing general or generic terms on the Internet can
`generate millions of hits.
`On the Internet, search queries are processed by search
`tools known as "search engines" that typically present a
`sequential list of matching data items ranked by relevance,
`from most relevant to least relevant. As a result, the match(cid:173)
`ing data items that best satisfy the search criteria are
`presented at the top of the list, with the other matching data
`items presented further down the list in order of decreasing
`relevance. For example, web pages or web sites with web
`pages that contain the greatest number of the search terms
`receive the highest relevance ranking and are presented at
`the top of the list.
`Because the search results are presented serially, with
`approximately ten to twenty hits per page, reviewing a large
`number of hits, for example several thousand, or even only
`several hundred hits, is often impractical. This is not nec(cid:173)
`essarily a problem in situations where the relevancy ranking 55
`drops off quickly after a relatively few number of hits
`because a user will typically only view the most relevant
`matching data items. However, in situations where a large
`number of hits have a high relevancy ranking, it can be
`impractical to review all of the most relevant hits.
`One alternative approach for presenting search results is
`the static category approach. The static category approach
`involves pre-assigning all searchable data items to pre(cid:173)
`defined or "static" subject matter categories based upon their
`content When a search is performed, a relatively fewer
`number of categories that satisfy the search criteria are
`displayed instead of or, in addition to, the actual matching
`
`2
`data items. The members of those static categories (which
`may or may not satisfy the search criteria) can then be
`accessed through the categories.
`In the context of the Internet, for example, all web pages
`5 and web sites containing subject matter relating to the topic
`of baseball would be statically assigned to a baseball cat(cid:173)
`egory. When a query containing the term "baseball" is
`processed, the baseball category is displayed, instead of or
`in addition to, all of the individual web pages that satisfy the
`10 query terms. A user can then select the baseball category to
`view the web pages and web sites assigned to the baseball
`category. Categories containing a large number of search(cid:173)
`able data items can be divided into sub-categories to create
`a statically-defined category hierarchy.
`Although the static category approach is helpful in allow-
`ing a user to navigate through a large number of searchable
`data items in an organized manner, it suffers from several
`drawbacks. First, if the amount of information being
`searched is large, a large amount of resources can be
`20 required to pre-assign all of the searchable data items to
`categories. Furthermore, when the searchable data set
`changes, the category assignments must be updated to reflect
`the changes. For example, if new searchable data items are
`added to the searchable data set and the categories are not
`updated to reflect the new searchable data items, then a user
`cannot access the new searchable data items through the
`categories. As a result, the new searchable data items that
`cannot be accessed through the categories are effectively
`lost.
`Another drawback to the static category approach is that
`the statically-defined categories may not be helpful in find(cid:173)
`ing information that does not fit squarely into the predefined
`categories. Thus, a search may result in the display often
`categories, where each of the ten categories has a relatively
`35 low degree of relevance.
`These problems are particularly acute on the Internet for
`at least two reasons. First, the Internet provides access to a
`vast amount of information which requires an enormous
`amount of resources to assign searchable data items to
`40 categories. Secondly, the information available through the
`Internet is constantly changing and new information is being
`added at an astounding rate. Consequently, a large amount of
`resources is required to maintain static categories that do not
`necessarily reflect all of the searchable data set Therefore,
`45 based upon the need to present a large number of matching
`data items in an organized manner and the limitations of
`prior approaches, an approach for presenting a large number
`of matching data items in an organized manner that does not
`suffer from the limitations of prior approaches is highly
`50 desirable.
`
`SUMMARY OF THE INVENTION
`According to one aspect of the invention, a method is
`provided for presenting search results using dynamic cat(cid:173)
`egorization. The method comprises the steps of receiving
`search results, dynamically establishing one or more search
`result categories based upon attributes of the search results
`and presenting one or more category identifiers correspond(cid:173)
`ing to the one or more search result categories.
`According to another aspect of the invention, a method is
`provided for presenting search results on a user interface
`using dynamic categorization. The method comprises the
`steps of dynamically establishing one or more search result
`categories based upon attributes of the search results and
`65 displaying on the user interface one or more interface
`objects corresponding to the one or more search result
`categories.
`
`60
`
`007
`
`
`
`US 6,385,602 Bl
`
`3
`According to another aspect of the invention, a computer
`system is provided for presenting search results to a user
`using dynamic categorization. The computer system com(cid:173)
`prises a user interface, one or more processors and a memory
`coupled to the one or more processors. The memory contains
`one or more sequences of one or more instructions which,
`when executed by the one or more processors, cause the
`computer system to perform the steps of receiving search
`results, dynamically establishing one or more search result
`categories based upon attributes of the search results and
`displaying on the user interface one or more category
`indicators corresponding to the one or more search result
`categories.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Embodiments of the invention are illustrated by way of
`example, and not by way of limitation, in the figures of the
`accompanying drawings and in which like reference numer(cid:173)
`als refer to similar elements and in which:
`FIG. 1 is a high-level flow chart illustrating an approach
`for presenting search results using dynamic categorization
`according to an embodiment of the invention;
`FIG. 2 is a detailed flow chart illustrating an approach for
`presenting search results using dynamic categorization 25
`according to another embodiment of the invention;
`FIG. 3Ais a block diagram illustrating a user interface for
`presenting search results using dynamic categorization
`according to an embodiment of the invention;
`FIG. 3B is a block diagram illustrating a user interface for 30
`presenting search results using dynamic categorization and
`sub-categories according to an embodiment of the invention;
`FIG. 3C is a block diagram illustrating a user interface for
`presenting search results using dynamic categorization and
`user-selectable categories according to an embodiment of 35
`the invention; and
`FIG. 4 is a block diagram of a computer system on which
`embodiments of the invention may be implemented.
`
`4
`hierarchy. Finally, dynamic categorization may be used to
`modify search queries, as described in more detail herein(cid:173)
`after.
`FIG. 1 is a flow chart 100 illustrating an approach for
`5 presenting search results using dynamic categorization
`according to an embodiment of the invention. After starting
`in step 102, in step 104 search results are received. In step
`106, the search results are examined and one or more search
`result categories are dynamically established based upon
`10 attributes of the matching data items that satisfy the query.
`In step 108, the search results are presented to a user based
`upon the one or more search result categories, as described
`in more detail hereinafter. Finally, the process is complete in
`step 110.
`15 1. DYNAMICALLY DETERMINING CATEGORIES
`Dynamically determining categories involves identifying
`similarities and/or dissimilarities of attributes in the match(cid:173)
`ing data items and establishing a set of candidate categories
`based upon the identified similarities and/or dissimilarities.
`20 The nature of the attributes used to determine similarities
`and/or dissimilarities may differ based on the nature of the
`matching data items. For example, if the matching data
`items are structured records, the attributes used to determine
`the categories may be selected fields of the structured
`records. On the other hand, if the matching data items are
`relatively unstructured text-based electronic documents,
`then the attribute values used to determine categories may
`simply be similarity coefficients that have been generated
`based on comparisons between the text contents of the
`documents.
`The candidate categories may be filtered or otherwise
`processed to select an appropriate number of final categories
`from the candidate categories. In situations where the num(cid:173)
`ber of candidate categories is sufficiently small, the filtering
`may not be necessary. Ideally, the number of final categories
`is selected so that when the final categories are presented to
`a user, the user can review the final categories in a relatively
`short period of time. Accordingly, the actual number of final
`categories necessarily depends upon both the requirements
`40 of a particular application and the way in which the final
`categories are presented to the user.
`Once the final categories are determined, the matching
`data items are assigned to the final categories and the final
`categories are presented to the user. The steps of determining
`45 candidate categories, determining final categories based
`upon the candidate categories and assigning the matching
`data items to the final categories are collectively referred to
`as "clustering." The particular clustering technique used
`depends upon the particular requirements of an application
`50 and the invention is not limited to any particular clustering
`technique. Examples of clustering techniques include Baye(cid:173)
`sian clustering, neural networks, Jaccard similarity
`coefficients, semantic analysis and various natural language
`processing algorithms. The particular clustering algorithm
`55 used may be user-defined.
`The approach of presenting search results using dynamic
`categorization is now described with reference to the flow
`chart 200 of FIG. 2. After starting in step 202, in step 204
`search results are received. The particular way in which a
`60 search is performed is not germane to embodiments of the
`invention and embodiments of the invention are not limited
`to any particular type of search.
`In step 206, a determination is made as to whether initial
`criteria are satisfied. According to one embodiment of the
`65 invention, the initial criteria include a minimum number of
`search results. If the number of matching data items are
`below a minimum threshold, then dynamic categorization is
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`In the following description, for the purposes of
`explanation, specific details are set forth in order to provide
`a thorough understanding of the invention. However, it will
`be apparent that the invention may be practiced without
`these specific details. In other instances, well-known struc(cid:173)
`tures and devices are depicted in block diagram form m
`order to avoid unnecessarily obscuring the invention.
`
`FUNCTIONAL OVERVIEW
`
`In general, search results are presented using dynamic
`categorization. Dynamic categorization involves examining
`search results and dynamically establishing one or more
`search result categories based upon attributes of the search
`results. As described in more detail hereinafter, a varied of
`grouping or clustering techniques may be used to dynami(cid:173)
`cally establish the search result categories. The search result
`categories are then presented using category indicators, as
`described in more detail hereinafter.
`Dynamic categorization allows search result categories to
`be generated on a search-by-search basis while ensuring that
`all matching data items are assigned to at least one search
`result category. As a result, categories may be tailored to
`each set of search results and based on user or application
`preferences. Dynamic categorization may be used in com(cid:173)
`bination with static categories to provide a hybrid category
`
`008
`
`
`
`US 6,385,602 Bl
`
`5
`not used and traditional presentation approaches are used
`instead. Another example of the initial criteria is whether the
`search results consist of data from more than one data source
`(e.g. different databases, such as a real time query and a
`static database query), where dynamic categorization is used
`to combine the data from the different sources to be pre(cid:173)
`sented to the user. If the initial criteria are not satisfied, then
`the process is complete in step 224.
`If, however, in step 206, a determination is made that the
`initial criteria are satisfied, then in step 208 the matching
`data items (search results) are filtered to generate filtered
`search results. According to one embodiment of the
`invention, the matching data items are filtered by a relevance
`threshold. Traditional search techniques provide a relevancy
`rating for search results that indicates how well individual
`matching data items satisfy the search criteria In situations
`where a query results in a large number of matching data
`items, it is often useful to reduce the amount of matching
`data items by discarding matching data items that do not
`satisfy a minimum relevance threshold.
`For example, for particular search results containing a
`large amount of data, all matching data items having a
`relevancy of less than fifty percent might be discarded.
`According to another embodiment of the invention, a par(cid:173)
`ticular number of the most relevant hits are retained, with the 25
`remaining hits being discarded. For example, suppose a
`determination is made that at most one hundred hits are
`desired. A particular search is performed and the search
`results include twenty thousand hits. In this situation the
`relevancy ratings for the matching data items are used to
`identify and keep the one hundred most relevant hits and
`discard the remaining nineteen thousand, nine hundred hits.
`For the purpose of explanation, the matching data items
`that are not discarded during the filtering process are
`referred to herein as qualifying data items. Thus, in the 35
`example given above, the query resulted in twenty thousand
`matching data items, but only one hundred qualifying data
`items.
`In step 210, the qualifying data items are optionally sorted
`by one or more attributes to generate sorted search results. 40
`For example, in the context of search results that include
`addresses, the search results might be sorted by zip code.
`In step 212, common attribute values among the qualify(cid:173)
`ing data items are identified. The common attribute values
`are specific to each set of search results. For example, for 45
`search results pertaining to automobiles, common attribute
`values may include compact cars, mid-size cars, fill size
`cars, and sports cars.
`In step 214, similarity data is determined for the search
`results that indicates the occurrence of the common attribute 50
`values among the qualifying data items. For example, the
`similarity data would indicate how many of the hits in the
`filtered search results have the attribute values of compact
`cars, mid-size cars, full size cars, and sports cars, respec(cid:173)
`tively. In step 216, the search results are grouped based upon
`the similarity data. For example, the qualifying data items
`having the compact car attribute value are grouped together
`and the hits in the search results having the mid-size car
`attribute value are grouped together.
`In step 218, one or more categories are selected based
`upon the groupings. According to one embodiment of the
`invention, the one or more categories are selected by a
`majority vote. Specifically, the categories having the most
`qualifying data items are selected. Categories having rela(cid:173)
`tively few numbers of qualifying data items are collapsed
`into broader categories, so as to reduce the total number of
`selected categories.
`
`6
`In step 220, the qualifying data items are assigned to the
`categories. For example, the hits having the compact car
`attribute are assigned to the compact car category. For hits
`having attributes of categories that were collapsed into
`5 broader categories, those hits are assigned to the broader
`category. For example, if the mid-size car and fill size car
`categories are collapsed into a single full size car category,
`then all of the hits having the mid-size car attribute are
`included in the full size car category. In step 222, the
`10 categories and qualifying data items are presented to the
`user, as described in more detail hereinafter. The process is
`complete in step 224.
`In steps 214 and 216, more than one algorithm may be
`used to produce a number of groupings according to one
`15 embodiment of the invention, an optimal grouping may be
`selected as the grouping presented to the user. An optimal
`grouping is typically determined based upon the require(cid:173)
`ments of a particular application. For example, grouping by
`one attribute may produce more categories than grouping by
`20 another attribute. Conversely, some groupings may cluster
`results with similar relevance scores (which may be inde(cid:173)
`pendent of the categorization criteria). This may be more
`preferable in some circumstances than groupings with
`smaller number of categories.
`An application can also have access to the different
`groupings formed during steps 214 and 216, so that the
`application or the user may elect to view a different grouping
`other than the one initially selected for presentation. This
`ability to take different views of what is basically the sane
`30 large collection of data is akin to doctors using X-ray, MRI,
`and CatScan to look at the same tumor in different ways in
`order to understand it better.
`2. PRESENTING SEARCH RESULTS
`FIG. 3A illustrates a user interface 300 for presenting
`search results using dynamic categorization according to an
`embodiment of the invention. User interface 300 may be
`implemented in any combination of discrete hardware cir(cid:173)
`cuitry and computer software. Typically, user interface 300
`is provided as a graphical representation on a computer
`screen that is generated by the execution of sequences of
`instructions by one or more processors.
`Categories that are dynamically determined in accordance
`with embodiments of cw the invention are presented using
`category indicators. A category indicator is any object that is
`capable of representing a category. Since the invention is not
`limited to any particular medium for presenting search
`results, the type of category indicator may vary depending
`upon the requirements of a particular application. For
`example, for presenting search results on a user interface, a
`user interface object may be used as a category indicator.
`The user interface object may provide some indicia that it
`corresponds to a particular category of search results,
`dynamically determined in accordance with embodiments of
`the invention. For presenting search results in a data file or
`55 on a printer, a category indicator may include a text string
`identifying the corresponding category.
`Referring to the prior example of search results pertaining
`to automobiles, user interface 300 includes three category
`indicators 302, 304 and 306 that correspond to the
`60 dynamically-determined categories previously described.
`Category indicator 302 corresponds to the category "auto(cid:173)
`mobiles: compact cars" and includes two qualifying data
`items from the search results, designated by the reference
`numeral 308. Qualifying data items 308 include compact
`65 cars "Tango" and "Foxtrot". Category indicator 304 corre(cid:173)
`sponds to the category "Automobiles: Full Size Cars" that
`includes qualifying data items 310. Qualifying data items
`
`009
`
`
`
`US 6,385,602 Bl
`
`7
`310 include full size cars, "Zebra," "Elephant" and "Rhino."
`Category indicator 306 corresponds to the category "Auto(cid:173)
`mobiles: Sports Cars" that includes a qualifying data item
`312. Qualifying data item 312 is a sports car "Spark."
`For purposes of illustration, in FIG. 3A the qualifying data
`items 308, 310, 312 and 314 are displayed with their
`respective category indicators 302, 304 or 306. However,
`according to another embodiment of the invention, qualify(cid:173)
`ing data items 308, 310, 312 and 314 are not initially
`displayed. Rather, only category indicators 302, 304 and 306 10
`are initially displayed to reduce the amount of information
`on user interface 300. The respective qualifying data items
`308,310,312 and 314 are displayed in response to a user
`selection of category indicators 302, 304 and 306. For
`example, in response to a user selection of category indicator 15
`302, qualifying data items 308 are displayed. In response to
`another user selection (de-selection) of category indicator
`302, qualifying data items 308 are undisplayed from user
`interface 300. This is particularly helpful when category
`indicator 302 contains a sufficiently large number of quali- 20
`fying data items 308 such that other category indicators 304
`and 306 cannot be displayed simultaneously with the mem(cid:173)
`bers of the category associated with category indicator 302.
`User interface 300 also includes an indicator 314 identi(cid:173)
`fied as "<more in this category>." In response to the
`selection of indicator 314 by a user, additional hits in the
`category corresponding to category indicator 304 are dis(cid:173)
`played on user interface 300. Indicator 314 provides the
`benefit of informing a user that additional hits for the
`category corresponding to category indicator 304 are
`available, without over-cluttering user interface 300.
`For example, if qualifying data items 308,310 and 312 are
`structured records, the text titles may be derived from fields
`in the structured records. In the present example, both of the
`qualifying data items 308, namely "Tango" and "Foxtrot"
`may have a "compact car" field. In circumstances where
`qualifying data items 308, 310 and 312 are relatively
`unstructured text-based electronic documents, then category
`indicators 302, 304 and 306 may not be displayed at all.
`Instead, the first qualifying data item in qualifying data items
`308, 310 and 312, namely "Tango," "Zebra," and "Spark"
`would be displayed on user interface 300 followed by a
`user-selectable "<more like this>" indicator. This approach
`displays a representative qualifying data item in qualifying
`data item 308, 310 and 312 while allowing a user to easily
`view the remaining qualifying data items by selecting the
`"<more like this>" indicator. The text titles provided with
`category indicators 302, 304 or 306 are derived from
`attributes of their respective qualifying data items 308, 310
`and 312.
`Categories within a group may be presented to users in
`any order. However, some orderings may be preferable to
`others. For example, a group by unit price range may be
`more suitably displayed initially sorted by price range. A
`common way of presenting groups during "fuzzy" searches 55
`(where matches aren't exact) is by relevance. A category
`relevance rating can be calculated for each category, and the
`categories can then be presented in relevance sorted order.
`Category relevance can be calculated in any number of
`ways depending on the requirements of a particular appli- 60
`cation. One way is to assign the highest relevance score of
`any item in the category as the category's score. This has the
`effect of elevating groups containing at least one high
`scoring item to the top. Another way is to assign the average
`score of all items in the category as the category's score. Yet 65
`another way is to use the median, or a weighted average. In
`the case where there isn't a clear ordering even after
`
`8
`assigning the scores to the categories, ( e.g. scores are very
`similar), another ordering (such as alphabetical) may be used
`as a tie breaker. Again, the user and the application may have
`complete control on which algorithm is used, and can select
`5 different algorithms.
`3. SUB-CATEGORIES
`Dynamic categorization may also be used to generate
`sub-categories. Generating sub-categories is particularly
`useful when a category has a large number of hits. For
`example, referring to FIG. 3B, in the situation where the
`category corresponding to category indicator 304 contains a
`large number of hits, sub-categories are generated and
`subcategory indicators 316 and 318 corresponding to the
`sub-categories are presented on user interface 300. The
`sub-categories corresponding to sub-category indicators 316
`and 318 are generated based upon attributes of qualifying
`data items 310 contained in the category corresponding to
`category indicator 304.
`In the present example, qualifying data items 310 have a
`price attribute which is used to generate the sub-categories
`that correspond to sub-category indicators 316 and 318.
`Specifically, the sub-category corresponding to sub-category
`indicator 316 is generated for bits having a price attribute of
`less than $25,000. In the present example, this sub-category
`includes entries 320 "Zebra" and "Elephant." On the other
`25 hand, the sub-category corresponding to sub-category indi(cid:173)
`cator 318 is generated for hits having a price attribute of
`more than $25,000. This sub-category includes a hit 322
`"Rhino." The sub-category corresponding to sub-category
`indicator 318 also includes a hit 324 designated as "<more
`30 in this category>" that provides a