`United States Patent
`6,006,225
`Dec.21, 1999
`[45] Date of Patent:
`Bowmanet al.
`
`(11] Patent Number:
`
`US006006225A
`
`[54] REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`[75]
`
`Inventors: Dwayne E. Bowman, Woodinville;
`Ruben E. Ortega; Michael L.
`Hamrick, both of Seattle; Joel R.
`Spiegel, Woodinville; Timothy R.
`Kohn,Seattle, all of Wash.
`
`
`
`73] Assignee: Amazon.Com, Seattle, Wash.
`
`21] Appl. No.: 09/145,360
`
`22]
`
`Filed:
`
`Sep. 1, 1998
`
`Related U.S. Application Data
`Provisional application No. 60/089,244, Jun. 15, 1998.
`60]
`SL] Tn, Co eeecccecssssssesscecssssscecssessneeeess G06F 17/30
`52] US. Cheeee 707/5; 707/2; 707/4; 707/10
`58] Field of Search oes 707/5, 2, 10, 4
`
`56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`5,675,819
`5,721,897
`5,787,422
`5,794,233
`5,864,845
`5,911,140
`5,913,215
`
`10/1997 SchuetZe woes cesses 704/10
`2/1998 Rubinstein ....
`-- FO7/2
`
`we FOT/S
`7/1998 Tukeyet al.
`..
`
`8/1998 Rubinstein ........
`. F07/4
`. FOT/S
`1/1999 Voorheeset al.
`.
`
`6/1999 Tukeyetal. ......
`. FOT/S
`6/1999 Rubinstein oes FOT/L0
`OTHER PUBLICATIONS
`
`Bartell et al., “Automatic Combination of Multiple Ranked
`Retrieval Systems”, Proceedings of SIGIR °94, Jul. 1994,
`pp. 173-181.
`Belkin et al., “The Effect of Multiple Query Representations
`on Information System Performance” Proceedings of SIGIR
`°93, Jun. 1993, pp. 339-346.
`Shaw et al., “Combination of Multiple Searches”, Proceed-
`ings of TREC-3, Apr. 1995, pp. 105-108.
`
`QuarterDeck Web Page, Downloaded Sep. 9, 1996, http://
`aracnid.qdeck.com/qdeck/products/webcompass.
`Towell et al. “Learning Collection Fusion Strategies for
`Information Retrieval”, Proceedings of the 12th Annual
`Machine Learning Conference, Jul. 1995, pp. 540-548.
`Voorhees et al., “Learning Collection Fusion Strategies”,
`Proceedings of SIGIR ’95, Jul. 1995, pp. 172-179.
`Voorheesetal., “The Collection Fusion Problem” Proceed-
`ings of TREC—3, NIST Special Publication 500-225, Apr.
`1995, pp. 95-104.
`Abstract of Generating Advanced Query Interfaces, Lee,
`Srivastava and Vista, Computer Networks and ISDN Sys-
`tems Conference Title: Comput. Netw. ISDN Syst. (Neth-
`erlands) vol. 30, No. 1-7, pp. 656-657 (1998).
`Abstract of Using Combination ofEvidence for Term Expan-
`sion, Wilkinson, Information Retrieval Research, Proceed-
`ings of the 19th Annual BCS-IRSG Colloquium on IR
`Research (1997).
`
`(List continued on next page.)
`Primary Examiner—Paul R. Lintz
`Attorney, Agent, or Firm—Knobbe, Martens, Olson & Bear,
`LLP
`
`[57]
`
`ABSTRACT
`
`Asearch engineis disclosed which suggests related terms to
`the userto allow the userto refine a search. The related terms
`are generated using query term correlation data which
`reflects the frequencies with which specific terms have
`previously appeared within the same query. The correlation
`data is generated and stored in a look-up table using an
`off-line process which parses a query log file. The table is
`regenerated periodically from the most recent query sub-
`missions(e.g., the last two weeks of query submissions), and
`thus strongly reflects the current preferences of users. Lach
`related term is presented to the user via a respective hyper-
`link which can be selected by the user to submit a modified
`query. In one embodiment,the related terms are added to and
`selected from the table so as to guarantee that the modified
`queries will not produce a NULL queryresult.
`
`28 Claims, 10 Drawing Sheets
`
`
`
`PARSE DAILY LOG FILE
`
`
`TO EXTRACT QUERY
`SUBMISSIONS FOR WHICH
`ITEMS FOUND > 6
`
`
`
`
`
`
` LAST
` OVERWRITE EXISTING CORRELATION
`
`CORRELATE TERMS SASED
`ON FREQUENCY OF OCCURRENCE
`
`WITHIN SAME QUERY
`
`CREATE DAILY
`RESULTS FILE
`
`MERGE DAILY
`RESULTS FILES FOR
`DAYS
`
`
`
`450
`
`TABLE WITH NEW CORRELATION TABLE
`
`
`
`ELASTIC - EXHIBIT 1026
`ELASTIC - EXHIBIT 1026
`
`
`
`6,006,225
`
`Page 2
`
`OTHER PUBLICATIONS
`
`the NECI Meta Search Engine,
`Abstract of Inquirus,
`Lawrence and Giles, Computer Networks and ISDN Sys-
`tems Conference Title: Comput. Netw ISDN Syst. (Nether-
`lands) vol. 30, No. 1-7, pp. 95-105 (1998).
`Abstract of Facilitating Complex Web Queries Through
`Visual User Interfaces and Query Relaxtion, Li and Shim,
`Computer Networks and ISDN Systems Conference Title:
`Comput. Netw. ISDN Syst. (Netherlands) vol. 30, No. 1-7,
`pp. 149-159 (1998).
`A User—centred Evaluation ofRankingAlgorithms for Inter-
`active Query Expansion, Efthimiadis, Proceedings of the
`16th Annual International ACM SIGIR Conference, Pitts-
`burgh, pp. 146-159 (1993).
`Concept Based Query Expansion, Qtu and Frei, Proceedings
`of the 16th Annual International ACM SIGIR Conference,
`Pittsburgh, pp. 160-169 (1993).
`Improving Retrieval Performance by Relevance Feedback,
`Salton and Buckley, J. of Am. Society for Info. Science
`41(4):288-297 (1990).
`
`Query Expansion Using Domain—Adapted, Weighted The-
`saurus in an Extended Boolean Model, Kwon, Kim and
`Choi, Proceedings of the 3rd International Conterence on
`Information and Knowledge Management (CIKM’94), pp.
`140-146 (1994).
`Browsing Through Querying: Designing for Electronic
`Books, Charoenkitkarn, Tam, Chignell and Golovichinsky,
`at
`the 5th ACM Conference on Hypertext, Seattle, WA
`206-216 (1993).
`A Survey of Information Retrieval and Filtering Methods,
`Faloutsos and Oard, Univ. of Maryland, 22 pages (undated).
`A Corpus Analysis Approach for Automatic Query Expan-
`sion, Gauch and Wang, Proceedings of the 6th International
`Conference on Information and Knowledge Mangement, pp.
`278-284 (1997).
`Discovering Web Acess Patterns and Trends by Applying
`OLAP and Data Mining Technology on Web Logs, Zaiane,
`Xin and Han, Proceedings of the IEEE Forum on Research
`and Technology Advances in Digital Libraries
`(TEEE
`ADL’98), pp. 19-29 (1998).
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 1 of 10
`
`6,006,225
`
`°°!(GrzIdv-S©0¢) 3SY3AINN-S
`
`
`JIHdVSDOITI!saYSANaS
`
`ASVavVLvdWaldalviay
`0Z¢)DNINNVEDONd-SOly)AWONOULSY-S
`
`
`Z6Z334409-S06SNVOVS-¥
`
`
`F1aVLNOIWWISYYODAYINO
`£EhCEL
`
`wAWr-S|_—=—=—sSOWSOO-L|$07/
`
`
`
`ALIS€3M
`
`
`
`patsGamLLMMe
`
`LES
`
`GEL
`
`
`
`$$300UdNOILOITIS
`
`SES
`
`(W)6ES
`
`cy
`
`WH
`
`wav
`
`S$3908d
`
`NOILVYANIO
`-;Ava
`
`9Auand
`
`01
`
`oa
`
`O64
`
`LOU
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 2 of 10
`
`6,006,225
`
`200
`
`file
`Edit
`View
`Go
`Favorite
`Help
`==> © & Q@ &
`Back Forw...
`Stop Refresh Home Search
`Favorite
`
`Print A ee
`
`
`
`Address|http: //www.amazon.com/book_search Iv]
`
`aniszon.comcom Book Search
`Enter Author and/or Title
`
`Amazon.com Home | Music Search | Your Account
`
`Author:
`
`210
`
`@Exact Name
`
`OlLast, First Name
`
`OStart of Last Name
`
`Title:
`
`LEO
`
`OStart(s) of Title Words
`@Title Word(s)
`OFxact Title
`
` Search Now |
`
`||Clear the Form
`
`Author Search Tips / Title Search Tips
`
`Search by Subject
`
`L240
`
`@Subject Word(s)
`OStart of Subject
`OExact Subject
`
`
`
`Search Now
`Clear the Form
`
`© Start(s) of Subject Word(s)
`
`Subject Search Tips
`
`Other Search Methods:
`ISBN, Publisher/Date, Quick Search
`
`FIG. 2
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 3 of 10
`
`6,006,225
`
`ITO
`
`User Identifier = 29384719287
`HTTP_REFERRER= http://www.amazon.com/book_search_page
`PATH_INFO=/book_search
`title = Snow Crash
`ifems_found = 2
`
`Friday, 13-Feb—98 02:24:11
`User Identifier = 29384719287
`HTTP_REFERRER= http://www.amazon.com/book_search
`PATH_INFO=/ISBN = 0553562614
`
`I20
`
`IIO
`
`J40 Friday, 13-Feb—-98 02:23:52
`
`Friday, 13-Feb-98 06:15:03
`User Identifier = 54730543261
`HTTP_REFERRER= http://www.amazon.com/music_search_page
`PATH_INFO=/music_search
`artist = This and That
`items_found = 0
`
`Friday, 13-Feb-98 10:07:34
`User Identifier = 027385918272
`HTTP_REFERRER= http://www.amazon.com/book_search_page
`PATH_INFO=/book_search
`subject = outdoor trail
`items_found = 22
`
`FIG. F
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 4 of 10
`
`6,006,225
`
`BEGIN
`
`470
`
`PARSE DAILY LOG FILE
`TO EXTRACT QUERY
`SUBMISSIONS FOR WHICH
`
`ITEMS FOUND > 0
`ON FREQUENCY OF OCCURRENCE WITHIN SAME QUERY
`
`CORRELATE TERMS BASED
`
`420
`
`430
`
`CREATE DAILY
`RESULTS FILE
`
`440
`
`MERGE DAILY
`RESULTS FILES FOR
`
`LAST M DAYS
`
`OVERWRITE EXISTING CORRELATION
`TABLE WITH NEW CORRELATION TABLE
`
`450
`
`END
`
`FIC. 4
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 5 of 10
`
`6,006,225
`
`sua]N
`
`WLEL
`
`WVYL-S
`
`
`(Sc)ONIZVIE-S
`
`(ZL)ayld-L
`
`(Z1)3SIDYIOXI-S
`(Zz)
`
`NOSTYYO-V
`
`SLYOdS-Sa3LUV“(9¢)Nowvonaa-L
`(Z6)XIN-S(iy)
`
`
`
`(3)oNIZvIa-L(001)ONINIG-L
`
`(7)ASSNH-L
`
`
`
`($9)yoodLNO-s
`
`
`
`(S$)NITNVYI-V
`
`
`
`(7S) Ylvdad—L
`
`(v1)Wal-L
`
`CRS
`
`
`
`
`
`
`
`(¢z)YOOGLNO-S(s9)TIval-S
`
`fyNouyvansv4NOILYOVA-S
`=LTWWYL-L
`(lr)TIVul-S
`
`VGDL
`
`
`
`
`
`NONNA-L(OL)YSNDSVM-V
`
`(8)S3LVA-V
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 6 of 10
`
`6,006,225
`
`sudo]N
`
`GLEL
`
`
`
`(sc)SONIZVIE—-S
`
`(S)9ONIZVIE—-L
`
`(2)
`
`IXI8-S
`
`(Z6)
`
`
`
`(vZ)YOOdLNO-Ss
`
`
`
`(Zt)SL¥OdS-S
`
`NOANA-S
`
`(99)
`
`(2)
`
`(01)
`
`TWUL-S
`
`WWeYl-L
`
`GSUs
`
`O65
`
`CPL
`
`WVYL-S
`
`
`
`(1%)LLauyvo-V
`
`(79)alg—-S
`
`(6)NOILVOVA-S
`XIN-S(17)
`
`
`(9¢)NOILVONGI-L
`
`(¢Z)NOILVOVA-S
`(001)
`YINOVM-V
`SLYOdS-S$
`ONINIG-L
`
`(Z1)3SIOU3OXI-S
`(99)(yo)=ASAMH-LyOOdLNO-S
`
`
`(yS)—alWd3u-L
`
`(Zr)TVYL-S
`
`(v1)TWal-L
`(z)NOSTY¥O-V
`(S)NITINVYI-V
`(8)S3LVA-V
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 7 of 10
`
`6,006,225
`
`Date
`
`Daily Log
`ile
`
`Daily Results
`File
`
`Dates of Query
`Correlation Table
`
`Query Correlation
`Table
`
`1-Feb-98 2-Feb-98
`
`
`F[E|
`
`
`
`
`
`3-Feb-98
`
`4-Feb-98
`
`5-Feb-98
`
`6—-Feb-98
`
`7-Feb—-98
`
`8-Feb-98
`
`670
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 8 of 10
`
`6,006,225
`
`SIO
`
`MULTI-TERM
`QUERY
`?
`
`
`
`760
`
`Terrrea
`
`
`
`
`COMBINE ALL RELATED
`TERMS LISTS
`
`SELECT X TERMS WITH
`HIGHEST VALUES
`
`FIG. /
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 9 of 10
`
`6,006,225
`
`Top 3 Related Terms:
`
`TRAIL — MIX
`
`TRAIL — YUKON _/
`
`TRAIL — BIKE
`
`E10
`
`FIG. EA
`
`Intersecting Terms:
`
`E20
`
`S — BIKE
`S -— SPORTS
`S — VACATION
`
`E30
`QS
`
`Top 3 Related Terms:
`OUTDOOR TRAIL — BIKE
`QUTDOOR TRAIL — SPORTS
`OUTDOOR TRAIL — VACATION
`
`FIG. EB
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 10 of 10
`
`6,006,225
`
`GOO
`
`\
`-————
`
`LI le
` View
`
`
`
`
`File
`Edit
`Go
`Favorite
`Help
`
`as ®F & Qa FAaQ
`Back Forw... Stop Refresh Home
`Search
`Favorite
`Font
`
`
`
`Address|http: //www.amazon.com/book_search iv
`
`amazon.com Book Search
`aaa
`
` FIO
`
`
`
`
`
`
`
`
`
`
`
`
`Your Search Results
`
`for: the subject words include “OUTDOOR TRAIL”
`
`Related Query Terms:Click on a link below to narrow your
`search by adding a related term.
`OUTDOOR TRAIL — BIKE
`
`OUTDOOR TRAIL — SPORTS
`
`OUTDOOR TRAIL — VACATION
`
`Top Matches for this search:
`
`* Outdoor & Trail Guide to Yosemite
`
`
`
`
`
`* Outdoor Training on Mountian Bike Trails
`
`
`to See
`Full Results: tne First 100 are shown below.
`more results scroll down and click the “More” button.
`
`The blessed trail: outdoor diary
`
`
`
`
`
`
`Outdoor & Trail Guide to Yosemite
`
`G20
`
`T
` he Great Outdoor Cookbook: Trail Recipies
`
`
`
`
`
`6,006,225
`
`1
`REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`RELATED APPLICATION
`
`This application claims the benefit of U.S. Provisional
`Application No. 60/089,244, filed Jun. 15, 1998, the disclo-
`sure of which is herebyincorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`10
`
`1. Field of Invention
`
`2
`“RESTAURANTS,” “RECIPIES,” and “FDA”to allow the
`user to refine the query.
`The related query terms are commonly presented to the
`user together with corresponding check boxes that can be
`selectively marked or checked by the user to add terms to the
`query. In some implementations, the related query terms are
`alternatively presented to and selected by the user through
`drop down menusthat are provided on the queryresult page.
`In either case, the user can add additional terms to the query
`and then resubmit the modified query. Using this technique,
`the user can narrow the query result down to a more
`manageable set consisting primarily of relevant items.
`One problem with existing techniques for generating
`related query termsis that the related terms are frequently of
`little or no value to the search refinement process. Another
`problem is that the addition of one or morerelated terms to
`the query sometimes leads to a NULL queryresult. Another
`problem is that the process of parsing the query result items
`to identify frequently used terms consumessignificant pro-
`cessor resources, and can appreciably increase the amount of
`time the user must wait before viewing the query result.
`These and other deficiencies in existing techniques hinder
`the user’s goal of quickly and efficiently locating the most
`relevant items, and can lead to user frustration.
`
`SUMMARYOF THE INVENTION
`
`The present invention addresses these and other problems
`by providing a search refinement system and method for
`generating and displaying related query terms (“related
`terms”). In accordance with the invention, the related terms
`are generating using query term correlation data that is based
`on historical query submissions to the search engine. The
`query term correlation data (“correlation data”) is preferably
`based at least upon the frequencies with which specific terms
`have historically been submitted together within the same
`query. The incorporation of such historical query informa-
`tion into the process tends to produce related termsthat are
`frequently used by other users in combination with the
`submitted query terms, and significantly increases the like-
`lihood that these related terms will be helpful to the search
`refinement process. To further increase the likelihood that
`the related terms will be helpful,
`the correlation data is
`preferably generated only from those historical query sub-
`missions that produced a successful query result (at least one
`match).
`the
`In accordance with one aspect of the invention,
`correlation data is stored in a correlation data structure
`(table, databasc, ctc.) which is used to look up related terms
`in response to query submissions. The data structure is
`preferably generated using an off-line process which parses
`a query log file, but could alternatively be generated and
`updated in real-time as queries are received from users. In
`one embodiment, the data structure is regenerated periodi-
`cally (e.g., once per day)
`from the most recent query
`submissions (¢.g., the last M days of entries in the query
`log), and thus strongly reflects the current tastes of the
`community of users; as a result, the related terms suggested
`by the search engine strongly reflect the current tastes of the
`community. Thus, for example, in the context of a search
`engine of an online merchant, the search engine tends to
`suggest related terms that correspond to the current best-
`selling products.
`In a preferred embodiment, each entry in the data struc-
`ture is in the form of a key term and a corresponding related
`termslist. Each related terms list contains the terms which
`have historically appeared together (in the same query) with
`
`15
`
`,,
`
`This present invention relates to query processing, and
`more specifically relates to techniques for facilitating the
`process of refining search queries.
`2. Description of Related Art
`With the increasing popularity of the Internet and the
`World Wide Web, it is common for on-line users to utilize
`search engines to search the Internet for desired information.
`Manywebsites permit users to perform searchesto identify
`a small number of relevant items among a muchlarger
`domain of items. As an example, several web indexsites
`permit users to search for particular web sites among known
`web sites. Similarly, many on-line merchants, such as
`booksellers, permit users to search for particular products
`among all of the products that can be purchased from the
`merchant. Other on-line services, such as Lexis™ and
`Westlaw™, allow users to search for various articles and
`court opinions.
`In order to perform a search, a user submits a query
`containing one or more query terms. The query may also
`explicitly or implicitly identify a record field or segment to
`be searched, suchastitle, author, or subject classification of
`the item. For example, a user of an on-line bookstore may 3
`submit a qucry containing terms that
`the uscr believes
`appear within the title of a book. A query server program of
`the search engine processes the query to identify any items
`that match the terms of the query. The set of itemsidentified
`by the query server program is referred to as a “query
`result.” In the on-line bookstore example, the query result is
`a set of books whosetitles contain someorall of the query
`terms. In the web indexsile example, the query result is a set
`of web sites or documents. In web-based implementations,
`the query result is typically presented to the user as a
`hypertextual listing of the located items.
`If the scope of the search is large, the query result may
`contain hundreds, thousands or even millions of items.If the
`user is performing the search in order to find a single item
`or a small set of items, conventional approachesto ordering
`the items within the query result often fail to place the
`sought item or items near the top of the query result list. This
`requires the user to read through many other items in the
`query result before reaching the sought item. Certain search
`engines, such as Excite™ and AltaVista™, suggest related
`query terms to the user as a part of the “search refinement”
`process. This allows the user to further refine the query and
`narrow the query result by selecting one or more related
`query terms that more accurately reflect the user’s intended
`request. The related query terms are typically generated by
`the search engine using the contents of the query result, such
`as by identifying the most frequently used terms within the
`located documents. For example, if a user were to submit a
`query on the term “FOOD,” the user may receive several
`thousand items in the query result. The search engine might
`then trace through the contents of someor all of these items
`and present
`the user with related query terms such as
`
`40
`
`45
`
`60
`
`65
`
`
`
`6,006,225
`
`10
`
`15
`
`3
`the respective key term with the highest degree of frequency,
`ignoring unsuccessful query submissions (query submis-
`sions which produced a NULL query result). The data
`structure thus provides an efficient mechanism for looking
`up the related terms for a given query term.
`To generate a set of related termsfor refining a submitted
`query (the “present query”), the related terms list for each
`term in the present query is initially obtained from the
`correlation data structure. If this step produces multiple
`related termslists (as in the case of a multiple-term query),
`the related termslists are preferably combined by taking the
`intersection between these lists (i.c., deleting the terms that
`are not commonto all lists). The related terms which remain
`are terms which have previously appeared, in at least one
`successful query submission,
`in combination with cvery
`term of the present query. Thus, assuming items have not
`been deleted from the database being searched, any of these
`related terms can be individually added to the present query
`while guaranteeing that the modified query will not produce
`a NULL queryresult. To take advantage of this feature, the
`related terms are preferably presented to the user via a user
`interface thal requires the user to add no more than one
`related term per query submission. In other embodiment, the
`related terms are selected and displayed without guarantee-
`ing a successful query result.
`Because the related terms are identified from previously-
`generated correlation data, without the need to parse docu-
`ments or correlate terms, the related terms can be identified
`and presented to the user with little or no added delay.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`These and other features will now be described with
`
`reference to the drawings summarized below. These draw-
`ings and the associated description are providedto illustrate ;
`a preferred embodimentof the invention, and notto limit the
`scope of the invention.
`Throughout the drawings, reference numbers are re-used
`to indicate correspondence betweenreferenced elements. In
`addition, the first digit of each reference numberindicates
`the figure in which the elementfirst appears.
`FIG, 1 illustrates a system in which users access website
`information via the Internet, and illustrates the basic website
`components used to implement a search engine which
`operates in accordance with the invention.
`FIG, 2 illustrates a sample book search page of the web
`site.
`
`40
`
`45
`
`4
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`The present invention provides a search refinement sys-
`tem and methodfor generating related query terms (“related
`terms”) using a history of queries submitted to a search
`engine by a community of users. Briefly, the system gener-
`ates query term correlation data whichreflects the frequency
`with which specific terms have previously occurred together
`within the same query. The system uses the query term
`correlation data in combination with the query term(s)
`entered bythe user to recommend additional query termsfor
`refining the query. ‘he incorporation of such historical query
`information into the process tends to produce related terms
`that are frequently used by other users in combination with
`the submitted query terms, and significantly increases the
`likelihood that these related terms will be helpful to the
`search refinementprocess. To further increase the likelihood
`that the related terms will be helpful, the correlation data is
`preferably generated only from those historical query sub-
`missions that produced a successfulqueryresult (at least one
`match).
`In the preferred embodiment, the query term correlation
`date is regenerated periodically from recent query
`submissions, such as by using the last M daysofentries in
`a query log, and thus heavily reflects the current tastes of
`users. As a result, the related terms suggested by the search
`engine tend to be terms that correspond to the most fre-
`quently searched items during the relevant time period.
`Thus, for example, in the context of a search engine of an
`online merchant, the search engine tends to suggest related
`terms that correspondto the current best-selling products.In
`one embodiment, the technique used to generate the related
`terms and present these terms to the user guarantees that the
`modified query will not produce a NULL queryresult.
`The search refinement methods of the invention may be
`implemented, for example, as part of a web site, an Internet
`site, an on-line services network, a document retrieval
`system, or any other type of computer system that provides
`searching capabilities to a community of users. In addition,
`the method may be combined with other methods for
`suggesting related terms, such as methods which process the
`contents of located documents.
`
`A preferred web-based implementation of the search
`refinement system will now be described with reference to
`FIGS. 1-9. For purposes of illustration,
`the system is
`described herein in the context of a search enginethatis used
`to assist customers of Amazon.com Inc. in locating items
`(e.g., books, CDs,etc.) from an on-line catalog of products.
`Throughout the description, reference will be made to vari-
`ous implementation-specific details of the Amazon.com
`implementation. These details are provided in orderto fully
`illustrate a preferred embodimentof the invention, and not
`to limit
`the scope of the invention. The scope of the
`invention is set forth in the appended claims.
`
`FIG. 3 illustrates sample log entries of a daily query log
`file.
`
`FIG. 4 illustrates the process used to generate the corre-
`lation table of FIG. 1.
`
`FIG. 5A illustrates a sample mapping before a query is
`added.
`
`FIG. 5B illustrates a sample mapping after a query is
`added.
`
`FIG. 6 illustrates a process for generating the correlation
`table from the most recent daily query log files.
`FIG, 7 illustrates a process for sclecting the related query
`terms from the correlation table.
`
`FIG, 8A illustrates a set of related query terms from a
`single-term query.
`FIG. 8B illustrates a set of intersecting terms and a set of
`related query terms from a multiple-terrn query.
`FIG. 9 illustrates a sample search result page of the web
`site.
`
`I. Overview of Web Site and Scarch Engine
`
`60
`
`65
`
`FIG. 1 illustrates the Amazon.com website 130, including
`components used to implement a search engine in accor-
`dance with the invention.
`As it is well knownin the art of Internet commerce, the
`Amazon.com web site includes functionality for allowing
`users to search, browse, and make purchases from an online
`catalog of booktitles, musictitles, and other types of items
`via the Internet 120. Because the catalog contains millions
`of items, it is important that the site provide an efficient
`mechanism for assisting users in locating items.
`
`
`
`6,006,225
`
`6
`5
`result page which contains hypertextual links to the items’
`As illustrated by FIG. 1, the web site 130 includes a web
`server application 131 (“web server”) which processes user
`respective product information pages.
`requests received from user computers 110 via the Internet
`For multiple-term queries,
`the query server 132 effec-
`120. These requests include queries submitted by users to
`tively logically ANDs the query terms together to perform
`search the on-line catalog for products. The web server 131
`the search. For example, if the user enters the terms “JAVA”
`records the user transactions, including query submissions,
`and “PROGRAMMING?”into the title field 220, the query
`server 132 will search for and returnalist of all items that
`within a query log 135. In the embodiment depicted in FIG.
`1, the query log 135 consists of a sequence of daily query log
`have both of these terms within the title. Thus, if any query
`files 135(1)-135(M), each of which represents one day of
`term does not produce a match (referred to herein as a
`transactions.
`“non-matching term”),
`the query will produce a NULL
`query result. Presenting a NULL queryresult to the user can
`cause significant user frustration. To reduce this problem, in
`this event, the user may be presented with a list of items that
`are deemed to be “close matches.” Although the search
`engine described herein logically ANDs the query terms
`together,
`it will be recognized that the invention can be
`applied to search engines that use other methods for pro-
`cessing queries.
`In accordance with the invention, the search engine uses
`the query term correlation data stored in the correlation table
`137 to select the related terms that best match the user’s
`query. The search engine then presents the related terms to
`the user, allowing the user to refine the search and enhance
`discovery of relevant information. The query term correla-
`tion data indicates relationships between query terms, and is
`used to effectively predict query terms that are likely to be
`helpful to the search refinement process. In accordance with
`another aspect of the invention, the correlation table 137
`preferably contains or reflects historical information about
`the frequencies with which specific query terms have
`appeared together within the same query.
`The general format of the correlation table 137 is illus-
`trated in FIG. 1. In the embodiment depicted in FIG. 1 and
`described in detail herein, the correlations between query
`terms are based solely on frequency of occurrence within the
`same query. As described below, other types of query tern
`correlations can additionally be used. In addition, although
`the disclosed implementation uses a table to store the query
`term correlation data, other types of databases can be used.
`As illustrated by FIG. 1, each entry within the correlation
`table 137 (two entries shown) has two primary components:
`a “key” term 140, and a “related terms”list 142 for that key
`term. ‘The related terms list 142 is a list of the N (e.g. 50)
`query terms that have appeared within the same query as the
`keyword with the highest degree of frequency, and is
`ordered according to frequency. For example, the entry for
`the key term COSMOS(ignoring the single-term prefixes,
`which are discussed below) is:
`COSMOS: ASTRONOMY, SAGAN, UNIVERSE,
`indicating that ASTRONOMYhasappeared together with
`COSMOSwith the highest degree of frequency; SAGAN
`has appeared with COSMOSwith the second highest degree
`of frequency, and so on. Each term that appears within the
`related terms list 142 is deemed to be related to the corre-
`
`HTML (Hypertext Markup Language) content which
`includes, among other things, product information pages
`which show and describe the various products.
`The query server 132 includes a related term selection
`process 139 which identifies related query terms based on
`query term correlation data stored in a correlation table 137.
`As depicted in FIG. 1 and described below, the correlation
`table 137 is generated periodically from the M most recent
`daily query log files 135(1)-135(M)using an off-line table
`generation process 136.
`The web server 131, query server 132, table generation
`process 136, and database software run on one or more
`Unix™-based servers and workstations (not shown) of the
`web site 130 although other types of platforms could be
`used. The correlation table 137 is preferably cached in RAM
`(random access memory) on the same physical machine as
`that used to implement the query server 132. To accommo-
`date large numbers of users, they query server 132 and the
`correlation table 137 can be replicated across multiple
`machines. The web site components that are invoked during
`the searching process are collectively referred to herein as a
`“search engine.”
`FIG, 2 illustrates the gencral format of a book scarch page
`200 of the Amazon.com website 130 that can be used to
`
`The web site 130 also includes a query server 132 which
`processes the queries by searching a bibliographic database
`133. The bibliographic database 133 includes information
`about the various products that users may purchase through
`the website 130. This information includes, for cxample, the
`titles, authors, publishers, subject descriptions, and ISBNs
`(International Standard Book Numbers) of booktitles, and
`the titles, artists, labels, and music classifications of music
`titles. The information for each item is arranged within fields
`(such as an “author”field and a “title” field), enabling the
`bibliographic database 133 to be searched on a field-
`restricted basis. The site also includes a database 134 of
`
`10
`
`15
`
`40
`
`45
`
`search the bibliographic database 133 for booktitles. Users
`have access to other search pages that can be used to locate
`music titles and other types of products sold by the on-line
`merchant. The book search page 200 includes author,title,
`and subject fields 210, 220, 240 and associated controls that
`allow the user to initiate fieldresiricted searches for book
`
`titles. Users can perform searches by first typing in the
`desired information into a search field 210, 220, 240 and ;
`then clicking on the appropriate search button 230, 250. The
`term or string of terms submitted to the search engine is
`referred to herein as the “query.” Other areas of the website
`ask the user to submit queries without limiting the terms to
`specific fields.
`Whenthe user submits a query from the book search page
`200 to the web site 130, a the query sever 132 applies the
`query to the bibliographic database 133, taking into account
`any field restrictions within the query. If the query result is
`a single item, the item’s product information page is pre-
`sentedto the user. If the query result includes multiple items,
`the list of items is presented to the user through a query
`
`60
`
`65
`
`sponding key term 140 by virtue of the relatively high
`frequency with which the terms have occurred within the
`same query.
`As further depicted by FIG. 1, each related term and each
`key term 140 preferably includes a single-character field
`prefix which indicates the search field 210, 220, 240 to
`which the term corresponds. These prefixes may, for
`example, be as follows: A=author, T=title, S=subject,
`R=artist, L=label, G=generic. In addition, each related term
`is stored together with a correlation score 146 which,in the
`preferred embodiment, indicates the number of times the
`related term has appeared in combination with the key term
`
`
`
`6,006,225
`
`7
`(within the search fields indicated by their respective field
`prefixes), not counting queries that produced a NULL query
`result.
`
`the related term (including prefix)
`Thus, for example,
`S-ASTRONOMYhasa correlation score of 410 under the
`
`key term T-COSMOS,indicating that four hundred and ten
`“successful” queries were received (during the time period
`to which the table 137 corresponds) which included the
`combination of COSMOS in the title field and
`
`field. Although the field
`ASTRONOMY in the subject
`prefixes and correlation scores 146 carry information which
`is useful to the related terms selection process (as described
`below), such information need not be preserved.
`In operation, when a user submits a query to the website
`130, the web server 131 passes the query to the query server
`132, and the query server applies the query to the biblio-
`graphic database 133. If the number o