`Bowman et al.
`
`[19]
`
`US006006225A
`[11] Patent Number:
`[45] Date of Patent:
`
`6,006,225
`Dec. 21, 1999
`
`[54] REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`[75] Inventors; Dwayne E. Bowman, Woodinville;
`Ruben E. Ortega; Michael L.
`Hamrick, both of Seattle; Joel R.
`Spiegel, Woodinville; Timothy R.
`Kohn, Seattle, all of Wash.
`[73] Assignee: Amazon.Oom, Seattle, Wash.
`
`Quarter?)eck Web Page, Downloaded Sep. 9, 1996, http://
`aracnid.gdeck.com/qdeck/products/webcompass.
`Towell et al. “Learning Collection Fusion Strategies for
`Information Retrieval”, Proceedings of the 12th Annual
`Machine Learning Conference, Jul. 1995, pp. 540–548.
`Voorhees et al., “Learning Collection Fusion Strategies”,
`Proceedings of SIGIR '95, Jul. 1995, pp. 172–179.
`Voorhees et al., “The Collection Fusion Problem” Proceed
`ings of TREC–3, NIST Special Publication 500–225, Apr.
`1995, pp. 95–104.
`Abstract of Generating Advanced Query Interfaces, Lee,
`Srivastava and Vista, Computer Networks and ISDN Sys
`tems Conference Title: Comput. Netw. ISDN Syst. (Neth
`[21] Appl. No.: 09/145,360
`erlands) vol. 30, No. 1–7, pp. 656–657 (1998).
`[22] Filed:
`Sep. 1, 1998
`Abstract of Using Combination of Evidence for Term Expan
`sion, Wilkinson, Information Retrieval Research, Proceed
`Related U.S. Application Data
`ings of the 19th Annual BCS-IRSG Colloquium on IR
`[60] Provisional application No. 60/089,244, Jun. 15, 1998.
`Research (1997).
`[51] Int. Cl." … G06F 17/30
`(List continued on next page.)
`[52] U.S. Cl. ......................... 707/5; 707/2; 707/4; 707/10
`Primary Examiner—Paul R. Lintz
`[58] Field of Search ................................... 707/5, 2, 10, 4
`Attorney, Agent, or Firm—Knobbe, Martens, Olson & Bear,
`LLP
`[56]
`References Cited
`ABSTRACT
`[57]
`A search engine is disclosed which suggests related terms to
`the user to allow the user to refine a search. The related terms
`are generated using query term correlation data which
`reflects the frequencies with which specific terms have
`previously appeared within the same query. The correlation
`data is generated and stored in a look-up table using an
`off-line process which parses a query log file. The table is
`regenerated periodically from the most recent query sub
`missions (e.g., the last two weeks of query submissions), and
`thus strongly reflects the current preferences of users. Each
`related term is presented to the user via a respective hyper
`link which can be selected by the user to submit a modified
`query. In one embodiment, the related terms are added to and
`selected from the table so as to guarantee that the modified
`queries will not produce a NULL query result.
`28 Claims, 10 Drawing Sheets
`
`U.S. PATENT DOCUMENTS
`
`5,675,819 10/1997 Schuetze ................................... 704/10
`5,721,897 2/1998 Rubinstein ...
`... 707/2
`5,787,422 7/1998 Tukey et al. ..
`... 707/5
`5,794,233 8/1998 Rubinstein .......
`... 707/4
`5,864,845
`1/1999 Voorhees et al.
`... 707/5
`5,911,140 6/1999 Tukey et al. .....
`... 707/5
`5,913,215 6/1999 Rubinstein ................................ 707/10
`OTHER PUBLICATIONS
`Bartell et al., “Automatic Combination of Multiple Ranked
`Retrieval Systems”, Proceedings of SIGIR '94, Jul. 1994,
`pp. 173–181.
`Belkin et al., “The Effect of Multiple Query Representations
`on Information System Performance” Proceedings of SIGIR
`’93, Jun. 1993, pp. 339–346.
`Shaw et al., “Combination of Multiple Searches”, Proceed
`ings of TREC–3, Apr. 1995, pp. 105–108.
`
`BEGIN )
`
`PARSE DAILY LOG FILE
`TO EXTRACT OUERY
`SUBMISSIONS FOR WHICH
`UND > 0
`
`4.3%
`
`CORRELATE TERMS BASED
`QN FREQUENCY OF OCCURRENCE
`WITHIN SAME QUERY
`
`CREATE DALY
`RESULTS FILE
`
`4.32
`
`47.4%%
`
`MERGE DAILY
`RESULTS FILES FOR
`| AST M DAYS
`
`
`
`QWERWRITE EXISTING CORRELATION
`TABLE WITH NEW CORRELATION TABLE
`
`
`
`3. (52
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 1
`
`
`
`6,006,225
`Page 2
`
`OTHER PUBLICATIONS
`Abstract of Inquirus, the NECI Meta Search Engine,
`Lawrence and Giles, Computer Networks and ISDN Sys
`tems Conference Title: Comput. NetwISDN Syst. (Nether
`lands) vol. 30, No. 1–7, pp. 95–105 (1998).
`Abstract of Facilitating Complex Web Queries Through
`Visual User Interfaces and Query Relaxtion, Li and Shim,
`Computer Networks and ISDN Systems Conference Title:
`Comput. Netw. ISDN Syst. (Netherlands) vol. 30, No. 1–7,
`pp. 149–159 (1998).
`A User-centred Evaluation of Ranking Algorithms for Inter
`active Query Expansion, Efthimiadis, Proceedings of the
`16th Annual International ACM SIGIR Conference, Pitts
`burgh, pp. 146–159 (1993).
`Concept Based Query Expansion, Qiu and Frei, Proceedings
`of the 16th Annual International ACM SIGIR Conference,
`Pittsburgh, pp. 160–169 (1993).
`Improving Retrieval Performance by Relevance Feedback,
`Salton and Buckley, J. of Am. Society for Info. Science
`41(4):288–297 (1990).
`
`Query Expansion Using Domain—Adapted, Weighted The
`saurus in an Extended Boolean Model, Kwon, Kim and
`Choi, Proceedings of the 3rd International Conference on
`Information and Knowledge Management (CIKM’94), pp.
`140–146 (1994).
`Browsing Through Querying: Designing for Electronic
`Books, Charoenkitkarn, Tam, Chignell and Golovichinsky,
`at the 5th ACM Conference on Hypertext, Seattle, WA
`206–216 (1993).
`A Survey of Information Retrieval and Filtering Methods,
`Faloutsos and Oard, Univ. of Maryland, 22 pages (undated).
`A Corpus Analysis Approach for Automatic Query Expan
`sion, Gauch and Wang, Proceedings of the 6th International
`Conference on Information and Knowledge Mangement, pp.
`278–284 (1997).
`Discovering Web Acess Patterns and Trends by Applying
`OLAP and Data Mining Technology on Web Logs, Zaiane,
`Xin and Han, Proceedings of the IEEE Forum on Research
`and Technology Advances in Digital Libraries (IEEE
`ADL’98), pp. 19–29 (1998).
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 2
`
`
`
`US. Patent
`6,006,225
`Sheet 1 0f 10
`l)ec.21,1999
`____________________________
`
`MMK
`
`”Emmm;Nfix
`_llllllllllllllllllllllllllllllllll
`
`IFSQ
`
`o_Im<moo:m_mmmmwwnmm
`
`mm<m<~<o2mgom._.<._mm
`
`
`
`mm>mmm
`
`
`
`mmmoommzo:.om._mm
`
`\Mx
`
`QNx
`
`aygnwx
`
`Ann-quuuv
`
`42F:
`
`HES.
`
`mmmooE
`
`zofi<mm2mo
`-——>e<o
`
`>mu=o
`
`\M.\
`
`%M\
`
`ownozzz<moommumoF¢>2ozomhm<lm
`
`
`
`m._m<._.zo_._.<._mmmoo>530
`
`<><TméKE
`
`o
`
`o._
`
`1.
`
`z...mwm_m<|mmommmmmzzznm
`
`
`
`«mmmummooum0mmz<o<mu<
`
`\HQK
`
`BloomReach, Inc. EX1026 Page 3
`
`|PR2019—01304
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 3
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 2 of 10
`
`6,006,225
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`2/7/7
`
`<= => & = (\ Q. - (6 A &
`Font Mail
`Bock Forw... Stop Refresh Home Search Favorite
`Address http://www.omazon.com/book_search
`NZ
`
`
`2/62
`
`amazon.com Book Search
`—-
`Enter Author and/or Title
`Author: |
`
`(C) Exact Name
`
`OLast, First Name
`
`OStart of Last Name
`
`Title:
`O Exact Title
`
`©Title Word(s)
`
`OStart(s) of Title Words
`
`22/7
`
`Author Search Tips / Title Search Tips
`
`Search by Subject
`
`24/7
`
`Subject:
`O Exact Subject
`
`OStart of Subject
`
`©Subject Word(s) O Start(s) of Subject Word(s)
`
`
`
`
`
`Search. Now
`
`|Clear the Form
`
`Subject Search Tips
`
`Other Search Methods:
`ISBN, Publisher/Date, Quick Search
`
`Amazon.com Home | Music Search | Your Account
`
`A/22
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 4
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 3 of 10
`
`6,006,225
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`J.2/2
`
`J.3/7
`
`
`
`J4/7
`
`
`
`/35
`
`Friday, 13-Feb-98 02:23:52
`User Identifier = 29.3847.19.287
`HTTP_REFERRER= http://www.amazon.com/book_search_page
`PATH_INFO=/book_search
`title = Snow Crdsh
`items found = 2
`
`Friday, 13-Feb-98 02:24:11
`User Identifier = 29.3847.19.287
`HTTP_REFERRER= http://www.amazon.com/book_search
`PATH INFO=/ISBN = 0553562614
`
`Friday, 13-Feb-98 06:15:03
`User Identifier = 54730543.261
`HTTP_REFERRER= http://www.amazon.com/music_search_page
`PATH_INFO=/music_search
`artist = This dnd That
`items found = 0
`
`Friday, 13-Feb-98 10:07:34
`User Identifier = 0.273859 18272
`HTTP_REFERRER= http://www.amazon.com/book_search_page
`PATH_INFO=/book_search
`subject = outdoor trail
`items found = 22
`
`
`
`
`
`
`
`A/2J
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 5
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 4 of 10
`
`6,006,225
`
`BEGIN
`
`4//7
`
`PARSE DAILY LOG FILE
`TO EXTRACT QUERY
`SUBMISSIONS FOR WHICH
`|TEMS FOUND > 0
`
`4.267
`
`
`
`CORRELATE TERMS BASED
`ON FREQUENCY OF OCCURRENCE
`WITHIN SAME QUERY
`
`CREATE DALY
`RESULTS FILE
`
`45/7
`
`4.4/7
`
`
`
`MERGE DAILY
`RESULTS FILES FOR
`LAST M DAYS
`
`OVERWRITE EXISTING CORRELATION
`TABLE WITH NEW CORRELATION TABLE
`
`45/2
`
`END
`
`A/24
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 6
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 5 of 10
`
`6,006,225
`
`Suuu 91 N
`
`
`
`
`
`
`
`
`
`þº39//
`
`(yg)
`(yç)
`(z ) ESIONBOX3-S
`
`(G9) 8000100-S
`
`(ç) NITXINW83–W
`
`(z) NOSTJV0-V
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 7
`
`
`
`US. Patent
`
`Dec. 21, 1999
`
`Sheet 6 0f 10
`
`6,006,225
`
`2532
`
`%\M.\
`
`.__<E.lm
`
`
`
`33
`
`:3A3
`
`ca
`
`éa5iv33
`
`ozfifimlm
`
`Cmmm<01<
`
`mvzmlm
`
`EVmxalm
`
`oz_N<._mIH
`so50225;
`8322.288;
`a:“mammoxmum3zomééé
`$3EDI;52323:;
`
`moooHDOIm
`
`zo_._.<o<>lm
`
`mEOmmlm
`
`zoxa>lm
`
`xilm
`
`2.3mEommum
`
`$3425%
`33zo:<o<>|mE4.5:;
`ac62055;,
`38mooEDOIm
`avSHEA?535¢at457m355$;
`
`mum.SQ
`
`km“.
`
`NV\
`
`BloomReach, Inc. EX1026 Page 8
`
`|PR2019-01304
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 8
`
`
`
`
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 7 of 10
`
`6,006,225
`
`Date
`
`Daily Log
`File
`
`Daily Results
`File
`
`Dates of Query
`Correlation Toble
`
`Query Correlation
`Toble
`
`
`
`1 –Feb-98
`
`2–Feb-98
`
`3-Feb-98
`
`4-Feb-98
`
`5–Feb-98
`
`6–Feb-98
`
`7—Feb-98
`
`8—Feb-98
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 9
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 8 of 10
`
`6,006,225
`
`BEGIN
`
`77/2
`FOR EACH TERM
`IN THE QUERY
`
`Z2/7
`LOOK UP TERM IN THE
`CORRELATION TABLE
`
`Z3/7
`RETRIEVE THE TERM'S
`RELATED TERMS LIST
`
`ZZ/7
`
`| NEXT TERM II
`
`Z5/7
`
`
`
`
`
`MULTI—TERM
`QUERY
`2
`
`COMBINE ALL RELATED
`TERMS LISTS
`
`
`
`Z77
`SELECT X TERMS WITH
`HIGHEST VALUES
`
`A/2 2.
`
`END
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 10
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 9 of 10
`
`6,006,225
`
`Top 3 Related Terms:
`&//7
`TRAIL – MIX
`TRAIL – YUKON --"
`TRAIL – BIKE
`
`A/(2 &A
`
`Intersecting Terms:
`S – BIKE
`S – SPORTS
`S — VACATION
`
`&_3/2
``--
`
`Top 3 Related Terms:
`OUTDOOR TRAIL – BIKE
`OUTDOOR TRAIL – SPORTS
`OUTDOOR TRAIL – VACATION
`
`A/22%’
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 11
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 10 of 10
`
`6,006,225
`
`$200
`
`File
`
`Edit View Go Favorite
`
`Help
`
`
`
`<= = @ E ?º Q - 6 A GR
`Font
`Bock Forw... Stop Refresh Home Search Favorite
`Address http://www.dmozon.com/book_search
`NZ
`
`
`Mcil
`
`amazon.com Book Search
`_--~
`
`Your Search Results
`for: the subject words include "OUTDOOR TRAIL"
`
`Related Query Terms: Click on a link below to narrow your
`search by adding a related term.
`OUTDOOR TRAIL – BIKE
`
`9/62
`
`OUTDOOR TRAIL – SPORTS
`
`OUTDOOR TRAIL – WACATION
`
`Top Matches for this search:
`* Outdoor & Trail Guide to Yosemite
`* Outdoor Training on Mountian Bike Trails
`
`
`
`
`
`
`
`
`
`
`
`Full Results: The First 100 are shown below. to see
`more results scroll down and click the "More" button.
`
`The blessed trail; outdoor diary
`T he Great Outdoor Cookbook: Trail Recipies
`
`Outdoor & Trail Guide to Yosemite
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`A/2.9
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 12
`
`
`
`1
`REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`RELATED APPLICATION
`This application claims the benefit of U. S. Provisional
`Application No. 60/089,244, filed Jun. 15, 1998, the disclo
`sure of which is hereby incorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of Invention
`This present invention relates to query processing, and
`more specifically relates to techniques for facilitating the
`process of refining search queries.
`2. Description of Related Art
`With the increasing popularity of the Internet and the
`World Wide Web, it is common for on-line users to utilize
`search engines to search the Internet for desired information.
`Many web sites permit users to perform searches to identify
`a small number of relevant items among a much larger
`domain of items. As an example, several web index sites
`permit users to search for particular web sites among known
`web sites. Similarly, many on-line merchants, such as
`booksellers, permit users to search for particular products
`among all of the products that can be purchased from the
`merchant. Other on-line services, such as Lexis" and
`Westlaw", allow users to search for various articles and
`court opinions.
`In order to perform a search, a user submits a query
`containing one or more query terms. The query may also
`explicitly or implicitly identify a record field or segment to
`be searched, such as title, author, or subject classification of
`the item. For example, a user of an on-line bookstore may
`submit a query containing terms that the user believes
`appear within the title of a book. A query server program of
`the search engine processes the query to identify any items
`that match the terms of the query. The set of items identified
`by the query server program is referred to as a “query
`result.” In the on-line bookstore example, the query result is
`a set of books whose titles contain some or all of the query
`terms. In the web index site example, the query result is a set
`of web sites or documents. In web-based implementations,
`the query result is typically presented to the user as a
`hypertextual listing of the located items.
`If the scope of the search is large, the query result may
`contain hundreds, thousands or even millions of items. If the
`user is performing the search in order to find a single item
`or a small set of items, conventional approaches to ordering
`the items within the query result often fail to place the
`sought item or items near the top of the query result list. This
`requires the user to read through many other items in the
`query result before reaching the sought item. Certain search
`engines, such as Excite" and AltaVista", suggest related
`query terms to the user as a part of the “search refinement”
`process. This allows the user to further refine the query and
`narrow the query result by selecting one or more related
`query terms that more accurately reflect the user’s intended
`request. The related query terms are typically generated by
`the search engine using the contents of the query result, such
`as by identifying the most frequently used terms within the
`located documents. For example, if a user were to submit a
`query on the term “FOOD,” the user may receive several
`thousand items in the query result. The search engine might
`then trace through the contents of some or all of these items
`and present the user with related query terms such as
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6,006,225
`
`2
`“RESTAURANTS,” “RECIPIES,” and “FDA* to allow the
`user to refine the query.
`The related query terms are commonly presented to the
`user together with corresponding check boxes that can be
`selectively marked or checked by the user to add terms to the
`query. In some implementations, the related query terms are
`alternatively presented to and selected by the user through
`drop down menus that are provided on the query result page.
`In either case, the user can add additional terms to the query
`and then resubmit the modified query. Using this technique,
`the user can narrow the query result down to a more
`manageable set consisting primarily of relevant items.
`One problem with existing techniques for generating
`related query terms is that the related terms are frequently of
`little or no value to the search refinement process. Another
`problem is that the addition of one or more related terms to
`the query sometimes leads to a NULL query result. Another
`problem is that the process of parsing the query result items
`to identify frequently used terms consumes significant pro
`cessor resources, and can appreciably increase the amount of
`time the user must wait before viewing the query result.
`These and other deficiencies in existing techniques hinder
`the user’s goal of quickly and efficiently locating the most
`relevant items, and can lead to user frustration.
`SUMMARY OF THE INVENTION
`The present invention addresses these and other problems
`by providing a search refinement system and method for
`generating and displaying related query terms (“related
`terms”). In accordance with the invention, the related terms
`are generating using query term correlation data that is based
`on historical query submissions to the search engine. The
`query term correlation data (“correlation data”) is preferably
`based at least upon the frequencies with which specific terms
`have historically been submitted together within the same
`query. The incorporation of such historical query informa
`tion into the process tends to produce related terms that are
`frequently used by other users in combination with the
`submitted query terms, and significantly increases the like
`lihood that these related terms will be helpful to the search
`refinement process. To further increase the likelihood that
`the related terms will be helpful, the correlation data is
`preferably generated only from those historical query sub
`missions that produced a successful query result (at least one
`match).
`In accordance with one aspect of the invention, the
`correlation data is stored in a correlation data structure
`(table, database, etc.) which is used to look up related terms
`in response to query submissions. The data structure is
`preferably generated using an off-line process which parses
`a query log file, but could alternatively be generated and
`updated in real-time as queries are received from users. In
`one embodiment, the data structure is regenerated periodi
`cally (e.g., once per day) from the most recent query
`submissions (e.g., the last M days of entries in the query
`log), and thus strongly reflects the current tastes of the
`community of users; as a result, the related terms suggested
`by the search engine strongly reflect the current tastes of the
`community. Thus, for example, in the context of a search
`engine of an online merchant, the search engine tends to
`suggest related terms that correspond to the current best
`selling products.
`In a preferred embodiment, each entry in the data struc
`ture is in the form of a key term and a corresponding related
`terms list. Each related terms list contains the terms which
`have historically appeared together (in the same query) with
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 13
`
`
`
`6,006,225
`
`3
`the respective key term with the highest degree of frequency,
`ignoring unsuccessful query submissions (query submis
`sions which produced a NULL query result). The data
`structure thus provides an efficient mechanism for looking
`up the related terms for a given query term.
`To generate a set of related terms for refining a submitted
`query (the “present query”), the related terms list for each
`term in the present query is initially obtained from the
`correlation data structure. If this step produces multiple
`related terms lists (as in the case of a multiple-term query),
`the related terms lists are preferably combined by taking the
`intersection between these lists (i.e., deleting the terms that
`are not common to all lists). The related terms which remain
`are terms which have previously appeared, in at least one
`successful query submission, in combination with every
`term of the present query. Thus, assuming items have not
`been deleted from the database being searched, any of these
`related terms can be individually added to the present query
`while guaranteeing that the modified query will not produce
`a NULL query result. To take advantage of this feature, the
`related terms are preferably presented to the user via a user
`interface that requires the user to add no more than one
`related term per query submission. In other embodiment, the
`related terms are selected and displayed without guarantee
`ing a successful query result.
`Because the related terms are identified from previously
`generated correlation data, without the need to parse docu
`ments or correlate terms, the related terms can be identified
`and presented to the user with little or no added delay.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`These and other features will now be described with
`reference to the drawings summarized below. These draw
`ings and the associated description are provided to illustrate
`a preferred embodiment of the invention, and not to limit the
`scope of the invention.
`Throughout the drawings, reference numbers are re-used
`to indicate correspondence between referenced elements. In
`addition, the first digit of each reference number indicates
`the figure in which the element first appears.
`FIG. 1 illustrates a system in which users access web site
`information via the Internet, and illustrates the basic web site
`components used to implement a search engine which
`operates in accordance with the invention.
`FIG. 2 illustrates a sample book search page of the web
`site.
`FIG. 3 illustrates sample log entries of a daily query log
`file.
`FIG. 4 illustrates the process used to generate the corre
`lation table of FIG. 1.
`FIG. 5A illustrates a sample mapping before a query is
`added.
`FIG. 5B illustrates a sample mapping after a query is
`added.
`FIG. 6 illustrates a process for generating the correlation
`table from the most recent daily query log files.
`FIG. 7 illustrates a process for selecting the related query
`terms from the correlation table.
`FIG. 8A illustrates a set of related query terms from a
`single-term query.
`FIG. 8B illustrates a set of intersecting terms and a set of
`related query terms from a multiple-terrn query.
`FIG. 9 illustrates a sample search result page of the web
`site.
`
`4
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`The present invention provides a search refinement sys
`tem and method for generating related query terms (“related
`terms”) using a history of queries submitted to a search
`engine by a community of users. Briefly, the system gener
`ates query term correlation data which reflects the frequency
`with which specific terms have previously occurred together
`within the same query. The system uses the query term
`correlation data in combination with the query term(s)
`entered by the user to recommend additional query terms for
`refining the query. The incorporation of such historical query
`information into the process tends to produce related terms
`that are frequently used by other users in combination with
`the submitted query terms, and significantly increases the
`likelihood that these related terms will be helpful to the
`search refinement process. To further increase the likelihood
`that the related terms will be helpful, the correlation data is
`preferably generated only from those historical query sub
`missions that produced a successful query result (at least one
`match).
`In the preferred embodiment, the query term correlation
`date is regenerated periodically from recent query
`submissions, such as by using the last M days of entries in
`a query log, and thus heavily reflects the current tastes of
`users. As a result, the related terms suggested by the search
`engine tend to be terms that correspond to the most fre
`quently searched items during the relevant time period.
`Thus, for example, in the context of a search engine of an
`online merchant, the search engine tends to suggest related
`terms that correspond to the current best-selling products. In
`one embodiment, the technique used to generate the related
`terms and present these terms to the user guarantees that the
`modified query will not produce a NULL query result.
`The search refinement methods of the invention may be
`implemented, for example, as part of a web site, an Internet
`site, an on-line services network, a document retrieval
`system, or any other type of computer system that provides
`searching capabilities to a community of users. In addition,
`the method may be combined with other methods for
`suggesting related terms, such as methods which process the
`contents of located documents.
`A preferred web-based implementation of the search
`refinement system will now be described with reference to
`FIGS. 1–9. For purposes of illustration, the system is
`described herein in the context of a search engine that is used
`to assist customers of Amazon.com Inc. in locating items
`(e.g., books, CDs, etc.) from an on-line catalog of products.
`Throughout the description, reference will be made to vari
`ous implementation-specific details of the Amazon.com
`implementation. These details are provided in order to fully
`illustrate a preferred embodiment of the invention, and not
`to limit the scope of the invention. The scope of the
`invention is set forth in the appended claims.
`I. Overview of Web Site and Search Engine
`FIG. 1 illustrates the Amazon.com web site 130, including
`components used to implement a search engine in accor
`dance with the invention.
`As it is well known in the art of Internet commerce, the
`Amazon.com web site includes functionality for allowing
`users to search, browse, and make purchases from an online
`catalog of book titles, music titles, and other types of items
`via the Internet 120. Because the catalog contains millions
`of items, it is important that the site provide an efficient
`mechanism for assisting users in locating items.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`IPR2019-01304
`BloomReach, Inc. EX1026 Page 14
`
`
`
`6,006,225
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`5
`As illustrated by FIG. 1, the web site 130 includes a web
`server application 131 (“web server”) which processes user
`requests received from user computers 110 via the Internet
`120. These requests include queries submitted by users to
`search the on-line catalog for products. The web server 131
`records the user transactions, including query submissions,
`within a query log 135. In the embodiment depicted in FIG.
`1, the query log 135 consists of a sequence of daily query log
`files 135(1)–135(M), each of which represents one day of
`transactions.
`The web site 130 also includes a query server 132 which
`processes the queries by searching a bibliographic database
`133. The bibliographic database 133 includes information
`about the various products that users may purchase through
`the web site 130. This information includes, for example, the
`titles, authors, publishers, subject descriptions, and ISBNs
`(International Standard Book Numbers) of book titles, and
`the titles, artists, labels, and music classifications of music
`titles. The information for each item is arranged within fields
`(such as an “author” field and a “title” field), enabling the
`bibliographic database 133 to be searched on a field
`restricted basis. The site also includes a database 134 of
`HTML (Hypertext Markup Language) content which
`includes, among other things, product information pages
`which show and describe the various products.
`The query server 132 includes a related term selection
`process 139 which identifies related query terms based on
`query term correlation data stored in a correlation table 137.
`As depicted in FIG. 1 and described below, the correlation
`table 137 is generated periodically from the M most recent
`daily query log files 135(1)–135(M) using an off-line table
`generation process 136.
`The web server 131, query server 132, table generation
`process 136, and database software run on one or more
`Unix"-based servers and workstations (not shown) of the
`web site 130 although other types of platforms could be
`used. The correlation table 137 is preferably cached in RAM
`(random access memory) on the same physical machine as
`that used to implement the query server 132. To accommo
`date large numbers of users, they query server 132 and the
`correlation table 137 can be replicated across multiple
`machines. The web site components that are invoked during
`the searching process are collectively referred to herein as a
`“search engine.”
`FIG. 2 illustrates the general format of a book search page
`200 of the Amazon.com web site 130 that can be used to
`search the bibliographic database 133 for book titles. Users
`have access to other search pages that can be used to locate
`music titles and other types of products sold by the on-line
`merchant. The book search page 200 includes author, title,
`and subject fields 210, 220, 240 and associated controls that
`allow the user to initiate fieldrestricted searches for book
`titles. Users can perform searches by first typing in the
`desired information into a search field 210, 220, 240 and
`then clicking on the appropriate search button 230,250. The
`term or string of terms submitted to the search engine is
`referred to herein as the “query.” Other areas of the web site
`ask the user to submit queries without limiting the terms to
`specific fields.
`When the user submits a query from the book search page
`200 to the web site 130, a the query sever 132 applies the
`query to the bibliographic database 133, taking into account
`any field restrictions within the query. If the query result is
`a single item, the item’s product information page is pre
`sented to the user. If the query result includes multiple items,
`the list of items is presented to the user through a query
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`result page which contains hypertextual links to the items’
`respective product information pages.
`For multiple-term queries, the query server 132 effec
`tively logically ANDs the query terms together to perform
`the search. For example, if the user enters the terms “JAVA’
`and “PROGRAMMING” into the title field 220, the query
`server 132 will search for and return a list of all items that
`have both of these terms within the title. Thus, if any query
`term does not produce a match (referred to herein as a
`“non-matching term”), the query will produce a NULL
`query result. Presenting a NULL query result to the user can
`cause significant user frustration. To reduce this problem, in
`this event, the user may be presented with a list of items that
`are deemed to be “close matches.” Although the search
`engine described herein logically ANDs the query terms
`together, it will be recognized that the invention can be
`applied to search engines that use other methods for pro
`cessing queries.
`In accordance with the invention, the search engine uses
`the query term correlation data stored in the correlation table
`137 to select the related terms that best match the user’s
`query. The search engine then presents the related terms to
`the user, allowing the user to refine the search and enhance
`discovery of relevant information. The query term correla
`tion data indicates relationships between query terms, and is
`used to effectively predict query terms that are likely to be
`helpful to the search refinement process. In accordance with
`another aspect of the invention, the correlation table 137
`preferably contains or reflects historical information about
`the frequencies with which specific query terms have
`appeared together within the same query.
`The general format of the correlation table 137 is illus
`trated in FIG. 1. In the embodiment depicted in FIG. 1 and
`described in detail herein, the correlations between query
`terms are based solely on frequency of occurrence within the
`same query. As described below, other types of query tern
`correlations can additionally be used. In addition, although
`the disclosed implementation uses a table to store the query
`term correlation data, other types of databases can be used.
`As illustrated by FIG. 1, each entry within the correlation
`table 137 (two entries shown) has two primary components:
`a “key” term 140, and a “related terms” list 142 for that key
`term. The related terms list 142 is a list of the N (e.g. 50)
`query terms that have appeared within the same query as the
`keyword with the highest degree of frequency, and is
`ordered according to frequency. For example, the entry for
`the key term COSMOS (ignoring the single-term prefixes,
`which are discussed below) is:
`COSMOS: ASTRONOMY, SAGAN, UNIVERSE,
`indicating that ASTRONOMY has appeared together with
`COSMOS with the highest degree of frequency; SAGAN
`has appeared with COSMOS with the second highest degree
`of frequency, and so on. Each term that appears within the
`related terms list 142 is deemed to