`United States Patent
`6,006,225
`(11)
`[45]
`Date of Patent:
`Bowman et al.
`Dec. 21, 1999
`
`Patent Number:
`
`USO0G006225A00
`
`[54] REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`[75]
`
`Inventors: Dwayne E. Bowman, Woodinville;
`Ruben E. Ortega; Michael L.
`Hamrick, both of Seattle: Joel R.
`Spiegel, Woodinville; Timothy R.
`Kohn, Seattle, all of Wash.
`
`[73] Assignee: Amazon.Com, Seattle, Wash.
`
`[21] Appl. No. 09/145,360
`
`[22]
`
`Filed:
`
`Sep. 1, 1998
`
`Related U.S. Application Data
`Provisional application No. 60/089244_ Jun. 15, 1998.
`[60]
`Int. CL.°....
`[Sl]
`GO06F 17/30
`
`. 1707/5; 707/2; 7O7/4; 707/10
`[52] U.S. Cl.
`.....
`Field of Searely0.cen TOTS, 2, 10, 4
`[58]
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`QuarierDeck Web Page, Downloaded Sep. 9, 1996, hup://
`aracnid.qdeck.com/qdeck/ products/webcompass.
`Towell et al. “Learning Collection Fusion Strategies for
`Information Retrieval”, Proceedings of the 12th Anoual
`Machine Learning Conference, Jul. 1995, pp, 540-548.
`Voorhees et al., “Learning Collection Vusion Strategies”,
`Proceedings of SIGIR "95, Jul. 1995, pp. 172-179.
`Voorheeset al., “The Collection Fusion Problem” Proceed-
`ings of TREC—3, NIST Special Publication 500-225, Apr.
`1995, pp. 95-104,
`Abstract of Generating Advanced Query Interfaces, Lee,
`Srivastava and Vista, Computer Networks and ISDN Sys-
`tems Conference Tile: Comput. Netw, ISDN Syst. (Neth-
`erlands) vol. 30, No. 1-7, pp. 656-657 (1998).
`Abstract ofUsing Combination ofEvidencefor Term Expan-
`sion, Wilkinson, Information Retrieval Research, Proceed-
`ings of the 19h Annual BCS-IRSG Colloquium on IR
`Research (1997),
`
`(List continued on next page.)
`Primary Examiner—Paul R. Linte
`Attorney, Agent, or Pirm—Knobhe, Martens, Olson & Bear,
`LLP
`
`[57]
`
`ABSTRACT
`
`ee TOSIO
`10/1997 Schuetz ous
`5,675,819
`Aseirch engine is disclosed which suggests related terms to
`the user to allow the user to refine 4 search. The related terms
`. W729
`2/1998 Rubinstein ..
`S.721,897
`FORS
`7/1998 Tukey etal.
`5,787,422
`are generated using query term correlation data which
`» FOF
`S/L898, Rubinstein .....
`5,794,233
`reflects the frequencies with which specific terms have
`
`5.864.845=1/1999 Voorhees et al. TONS
`previously appeared within the same query, The correlation
`S.911,140
`6/1999 Tukey el al...
`. WTS
`data is generated and stored in a look-up table using an
`FOF/LO
`S{13,215
`6/1999 Rubinstein...
`oll-line process which parses a query log file. The table is
`OTHER PUBLICATIONS
`regenerated periodically from the most recent query sub-
`missions (e2., (he last bwo weeks of query submissions), and
`thus strongly reflects the current preferences of users, Each
`related term is presented to the user via 4 respective hyper-
`link which can be selected by the user to submit a modified
`query. In one embodiment, the relatedterms are added to and
`selected from the table so as to guarantee that the modilied
`queries Will not produce a NULLqueryresult.
`
`
`
`Bartell et al., “Automatic Combination of Multiple Ranked
`Retrieval Systems", Proceedings of SIGIR ‘94, Jul, 1994,
`pp. 173-181,
`Belkin etal, “The Effect of Multiple Query Representations
`on Information System Perlormance™ Proceedings of SIGIR
`"93, Jun. 1993, pp. 339-346.
`Shaw et al., “Combination of Multiple Searches”, Proceed-
`ings of TREC-3, Apr. 1995, pp. 105-108.
`
`28 Claims, 10 Drawing Sheets
`
`(BEGIN)
`aFit
`— i
`PARSE DAILY LOG FILE
`
`To EXTRACT QUERY
`
`SUBMISSIONS FOR WHICH
`WEMS FOUND > 0
`
`
`uy
`CORRELATETERMS @ASEI
`ON FREQUENCY OF OCCURRENCE
`WITHIN SAME QUERY
`
`
`
`| CREATE Daley
`|
`RESULTS FILE
`~440
`
`MERGE DAILY
`RESULTS FILES FOR
`LAST M DAYS:
`
`
`
`_——
`|, OVERWRITE EXISTING CORRELATION
`TABLE WITH NEW CORRELATION TABLE
`
`tie
`
`ENO
`
`PETITIONERS - EXHIBIT 1026
`PETITIONERS- EXHIBIT 1026
`
`IPR2022-00217
`
`IPR2022-00217
`
`
`
`6,006,225
`Page 2
`
`OTHER PUBLICATIONS
`
`the NECI Meta Search Engine,
`Abstract of JInguirts,
`Lawrence and Giles, Computer Networks and ISDN Sys-
`tems Conference Title: Comput. Netw ISDN Syst. (Nether-
`lands) vol. 30, No. 1-7, pp. 95-105 (1998).
`Abstract of Facilitating Complex Web Queries Through
`Visual User Interfaces and Query Relaxtion, Li and Shim,
`Computer Networks and ISDN Systems Conference Title:
`Comput. Netw. ISDN Syst. (Netherlands) vol. 30, No. 1-7,
`pp. 149-159 (1998),
`A User—centred Evaluation ofRanking Algoritims for Inter-
`active Query Expansion, Etthimiadis, Proceedings of the
`16th Annual International ACM SIGIR Conference, Pitts-
`burgh, pp. 146-159 (1993).
`Concep! Based Query Expansion, Qiu and Frei, Proceedings
`of the 16th Annual International ACM SIGIR Conference,
`Pittsburgh, pp. 160-169 (1993).
`Improving Retrieval Performance by Relevance Feedback,
`Salion and Buckley, J. of Am. Society for Info. Science
`41(4):288-297 (1990).
`
`Query Expansion Using Domain-Adapted, Weighted The-
`saurus invan Extended Boolean Model, Kwon, Kim and
`Choi, Proceedings of the 3rd International Conference on
`Information and Knowledge Management (CIKM‘94), pp.
`140-146 (1994),
`Browsing Through Querying: Designing for Flectranic
`Books, Charoenkitkarn, Tam, Chignell and Golovichinsky,
`at
`the 5th ACM Conference on Hypertext, Seatlle, WA
`206-216 (1993),
`A Survey of Information Retrieval and Filtering Methods,
`Faloutsos and Oard, Univ, of Maryland, 22 pages (undated).
`A Corpus Analysis Approach for Automatic Query Expan-
`sion, Gauch and Wang, Proceedings of the 6th Loternational
`Conference on Information and Knowledge Mangement,pp.
`278-284 (1997).
`Discovering Web Acess Patterns and Trends by Applying
`OLAP and Data Mining Technology on Web Logs, Zaiane,
`Xin and Han, Proceedings of the IEEE Forum on Research
`and ‘Technology Advances
`in Digital Libraries (IEEE
`ADL*98), pp. 19-29 (1998).
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 1 of 10
`
`NIANIS|JIHdvYDONElsti||Y3AUIS||SSAgno||si;LEE||ausa3Mey|peeeheeeSaSSSSESSNy
`
` NOILWYSNI9|wav!||||||||SS390¥dNOWOSTSS|asvavivdWY3L
`
`
` AYaNO||LEE||||ss3008d||—-4
`
`
`FlavLNOlVTaNyODONINNVSDONd-SOly)ANONOYLSY-S||vAWr-S|___——=—=—«SOWSOD=1|$4/|T
`SSSSSSSSSSSe‘—*——————SS>.-_7]|||GFL||||elt:*|(sezidv-s_|(\(eoe) 3su3AINN-Sort||Z6Z334409-SO6ENVOYS-¥|OZ)
`
`
`O3LV13y
`
`6,006,225
`
`
`
`OCs
`
`OLL
`
`LO
`
`
`
`
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 2 of 10
`
`6,006,225
`
`File
`Edit
`View
`Go
`Favorite
`Help
`pea e38 AgaeQg Aa
`Back Forw...
`Stop Refresh Home Search
`Favorite
`Font
`
`
`
`
`Address|http: //www.amozon.com/book_search v7]
`
`amazon.com Book Search
`a—tio,
`
`Amazon.com Home | Music Search | Your Account
`
`
`
`
`Enter Author and/or Title
`
`Author:
`
`@Exact Name
`
`OLast, First Name
`
`OStart of Last Name
`
`210
`
`Title:
`
`LEO
`
`O&Exact Title
`
`@Title Word(s)
`
`OStart(s) of Title Words
`
`
`
`Author Search Tips / Title Search Tips
`
`search by Subject
`
`240
`
`OExact Subject
`| Search Now |
`
`OStart of Subject
`@Subject Word(s)
`| Clearthe Form |
`
`OStart(s) of Subject Word(s)
`
`Subject Search Tips
`
`Other Search Methods:
`ISBN, Publisher/Date, Quick Search
`
`FIG.
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 3 of 10
`
`6,006,225
`
`User Identifier = 29384719287
`HTTP_REFERRER= http://www.amazon.com/book_search_pags
`PATH_INFO=/book_search
`title = Snow Crash
`items_found = 2
`
`Friday, 13-Feb—98 02:24:11
`User Identifier = 29384719287
`HTTP_REFERRER= http://www.amazon.com/book_search
`PATH_INFO=/ISBN = 0553562614
`
`370
`
`S20
`
`530
`
`SAO Friday, 13-Feb-98 02:23:52
`
`Friday, 13-Feb-—98 06:15:03
`User Identifier = 54730543261
`HTTP_REFERRER= http://www.amazon.com/music_search_page
`PATH_INFO=/music_search
`artist = This and That
`items_found = 0
`
`Friday, 13-Feb-98 10:07:34
`User Identifier = 027385918272
`HTTP_REFERRER= hitp://www.amazon.com/book_search_page
`PATH_INFO=/book_search
`subject = outdoortrail
`itams_found = 22
`
`FIG. F
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 4 of 10
`
`6,006,225
`
`470
`
`PARSE DAILY LOG FILE
`TO EXTRACT QUERY
`SUBMISSIONS FOR WHICH
`ITEMS FOUND > 0
`
`CREATE DAILY
`RESULTS FILE
`
`440
`
`
`MERGE DAILY
`RESULTS FILES FOR
`LAST M DAYS
`
`
`CORRELATE TERMS BASED
`ON FREQUENCY OF OCCURRENCE
`WITHIN SAME QUERY
`
`
`
`
`
`
`
`
`
`
`
`
`
`OVERWRITE EXISTING CORRELATION
`TABLE WITH NEW CORRELATION TABLE
`
`450
`
`FIG. 4
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 5 of 10
`
`6,006,225
`
`sw4a]N
`
`BLES
`
`NOMNA-L
`
`VGDL/
`
`(1Z)LIaaeav9-V
`
`(£9)qlg—-S
`
`(Z6)XIN-S
`
`(Ss)9ONIZVIE-S
`
`(€z)YOOGLNO-S
`(Zt) S1YOdS-S
`
`(S)9ONIZWIE-L
`(6)NOILVOVA-S
`
`(001)ONINIG~—L
`
`(9¢)NOWVONdI-L
`
`(¢Z)3I8-L
`
`(17) SLYOdS-S
`
`(Sg)TIVYL-S
`(€Z)NOILYOVA-S
`(Ol)Y3NOVM-¥
`
`(Z)Tival-L
`
`(Z1)aSIOY3OX3-S
` (8) SALVA-V(7S)=-aIVd3u—-L
`
`
`($9)4YOoOdLNO-S(yo)=ASSNH—L
`
`
`(71)Tval-L
`
`(1+)WVul-S
`(S)NIMXNVaI-V
`
`WVL-S
`
`(z)NosTYyvo-V
`
`CRS
`
`
`
`
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 6 of 10
`
`6,006,225
`
`GLE1
`
`oeTIVal-S
`
`
`
`suLELINsoe(yz)yooaLno-s(99)jivui-s|(v5)—alvd3u-L
`
`
`
`(79)3xIg-S=(S)NIOMNVYI-V
`
`(se)9NIZzvIa-Srh(z)NoSTayo-¥
`(ZL)SLYOdS-S(Z)qIWal-L(Zr)Tval-s
`
`
`
`
`
`
`
`(Z6)XIN-S(17)S1¥OdS-S(99)yOoodLNO-s
`
`
`-(S)ONIZWI€-L(v2)aeek(Z1)asiod¥3ox3-S
`
`(001)ONINIG=L
`
`NONNA-S
`
`(6)NOILYOVA-S
`
`(01) YaNOVM-V
`
`(€zZ)NOILVOVA-S
`
`(71)TWVel-L
`(8)S3LVA-V
`
`GGD7
`
`06S
`
`
`
`SZP/
`
`
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 7 of 10
`
`6,006,225
`
`Date
`
`Daily Log
`File
`
`Daily Results
`File
`
`Dates of Query
`Correlation Table
`
`Query Correlation
`Table
`
`8—-Feb-98I"
`
`1—Feb-98
`
`2—Feb-98
`
`3—Feb-98
`
`4—Feb-98
`
`5—Feb-98
`
`6—-Feb-98
`
`7—Feb-98
`
`670
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 8 of10
`
`6,006,225
`
`770
`
`FOR EACH TERM
`IN THE QUERY
`
`720
`
`
`
`LOOK UP TERM IN THE
`CORRELATION TABLE
`
`
`
`
`
`[enreru
`
`
`
`
`730
`
`RETRIEVE THE TERM’S
`RELATED TERMS LIST
`
`440
`
`7IO
`
`MULTI-TERM
`QUERY
`?
`
`
`
`760
`
`COMBINE ALL RELATED
`TERMS LISTS
`
`SELECT X TERMS WITH
`HIGHEST VALUES
`
`FIG. 7
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 9 of 10
`
`6,006,225
`
`Top 3 Related Terms:
`
`TRAIL — MIX
`
`TRAIL — YUKON 3H
`
`TRAIL — BIKE
`
`E10
`
`F1G.EA
`
`Intersecting Terms:
`
`620
`
`S — BIKE
`S — SPORTS
`S — VACATION
`
`E30
`Pee
`
`Top 3 Related Terms:
`OUTDOOR TRAIL — BIKE
`OUTDOOR TRAIL — SPORTS
`OUTDOOR TRAIL — VACATION
`
`FIG. EB
`
`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 10 of 10
`
`6,006,225
`
`900
`
`Address|http: //www.amezon.com/book_search 7]
`
`
`
`
`
`
`
`
`
`File
`Edit
`View
`Go
`Faverite
`Help
`
`
`
`== @H & Qa GAeB@
`
`
`Bock Forw..
`Stop Refresh Home Seerch
`Fovorite
`Font
`
`
`
`
`
`Your Search Results
`
`
`
`
`
`Related Query Terms:Click on a link below to narrow your
`search by adding a related term.
`(
`OUTDOOR TRAIL — BIKE
`
`
`
`i OUTDOOR TRAIL — VACATION
`
`
`
`
`
`
` Outdoor & Trail Guide to Yosemite
`
`amazon.com Book Search
`oe
`
`for: the subject words include "OUTDOOR TRAIL"
`
`#10
`
`OUTDOOR TRAIL — SPORTS
`
`Top Matches for this search:
`
`* Outdoor & Trail Guide to Yosemite
`
`* Outdoor Training on Mountian Bike Trails
`
`Full Results: the First 100 are shown below. to See
`more results scroll down and click the “More” button.
`
`The blessed trail: outdoor diary
`
`
`
`
`T h
`
`G20
`
`e Great Outdoor Cookbook: Trail Recipies
`
`FIG.G
`
`
`
`6,006,225
`
`I
`REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`RELATED APPLICATION
`
`This application claims the benefit of U. S. Provisional
`Application No. 60/089,244, filed Jun. 15, 1998, the disela-
`sure of which is hereby incorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of Invention
`
`aq
`
`2
`“RESTAURANTS,” “RECIPIES,” and “FDA”toallow the
`user to reline the query.
`The related query terms are commonly presented to the
`user togeiher with corresponding check boxes that can he
`selectively marked or checked by the user to add terms to the
`query. [In some implementations, the related query terms are
`alternatively presented to and selected by the user through
`drop down menusthat are provided on the query result page.
`In either case, the user can add additional terms to the query
`aod then resubmit the modified query. Using this technique,
`the user can narrow the query result down to a more
`manageable set consisting primarily of relevant items.
`One problem with existing techniques for generating
`related query termsis that the related termsare frequently of
`little or no value to the search refinement process, Another
`problem is that the addition of one or more related terms to
`the query sometimesleacls lo a NULL query result. Another
`problem is thal the process of parsing the query resull items
`to identify frequently used terms consumes significant pro-
`cessor resources, and can appreciably increase the amount of
`lime the user must wait before viewing the query result.
`These and other deficiencies in existing techniques hinder
`the user’s goal of quickly and efficiently locating the most
`relevant items, and cao lead to user frustration,
`
`SUMMARY OF‘THE INVENTION
`
`The present invention addresses these and other problems
`by providing a search refinement system and method for
`generating and displaying related query terms (“related
`terms”). In accordance with the invention, the related terms
`are generating using query term correlation data thal is based
`on historical query submissions to the search engine, The
`query lerm correlation data (“correlation cata”) is preferably
`based at least upon the frequencies with which specific terms
`have historically been submitted together within the same
`query. The incorporation of such historical query inlorma-
`lion into the process tends to produce related terms that are
`frequently used by other users in combination with the
`submitted query terms, and significantly increases the like-
`lihood that these related terms will be helpful to the search
`refinement process. To further increase the likelihood that
`the related terms will be helpful,
`the correlation data is
`preferably generated only from those historical query sub-
`missions that produced a successful query result (at least one
`match).
`the
`In accordance with one aspect of the invention,
`correlation data is stored in a correlation data structure
`(table, database, ete.) which is used to look up related terms
`in response to query submissions. The data structure is
`preferably generated using an off-line process which parses
`a query log file, but could alternatively be generated and
`updated in real-Lime as queries are received from users. Ip
`one embodiment, the data structure is regenerated periodi-
`cally (e.g. once per day)
`from the most
`recent query
`submussions (e.g., the last M days of entries in the query
`log), and thus strongly reflects the current lasles of the
`community ofusers; as a result, the related terms suggested
`by the search engine strongly reflect the current tastes of the
`community. Thus, for example,
`in the context of a search
`engine of an online merchant, the search engine tends to
`suggest related terms that correspond to the current best-
`selling products.
`In a preferred embodiment, each entry in the data struc-
`ture is in the form of a key term and a corresponding related
`termslist. Each related terms list contains the terms which
`have historically appeared together (in the same query) with
`
`invention relates to query processing, and
`This present
`more specifically relates to techniques for facilitating the
`process ofrefining search queries,
`2, Description of Related Art
`With the increasing popularity of the [nternet and the
`World Wide Web, it is common for on-line users to ulilize
`search engines to search the Internet for desired information.
`Many websites permit users to perform searches to identify
`a small number of relevant
`items among a much larger
`domain of items. As an example, several web index sites
`permit users to search for particular web siles among known
`web sites. Similarly, many on-line merchants, such as
`booksellers, permit users to search for particular products
`among, all of the products thal can be purchased from the
`merchant. Other on-line services, such as Lexis™ and
`Westlaw™), allow users to search for various articles and
`court opinions.
`In order to perform a search, a user submils a query
`containing one or more query terms. The query may also
`explicitly or implicitly identify a record field or segment to
`be searched, such as title, author, or subject classification of
`the item. For example, a user of an on-line bookstore may 2
`submit a query centaining terms that
`the user believes
`appear within the title of a book. A query server program of
`the search engine processes the query to identify any items
`that match the terms of the query. The set of items identified
`by the query server programis referred to as a “query
`result.” In the on-line bookstore example, the query result is
`a sel of books whose litles contain someor all of the query
`terms. In the web index site example, ihe query resull is a set
`of web sites or documents. In web-based implementations,
`the query result
`is typically presenied to the user as a
`hypertextual listing of the located items.
`If the scope of the search is large, the query result may
`contain hundreds, thousands or even millions of items. Ifthe
`user is performing the search in order to find a single item
`ora small set of items, conventional approaches to ordering
`the items within the query result often fail
`to place the
`sought item or items near the top of the query result list. This
`requires the user lo read through many other items in the
`query result before reaching the sought item. Certain search
`engines, such as Excite™ and AlltaVista™, suggest related
`query terms to the user as a part of the “search refinement”
`process. This allows the user to further refine the query and
`narrow the query result by selecting one or more related
`query terms that more accurately reflect the user's intended
`request. The related query terms are typically generated by
`ihe search engine using the contents of the query result, such
`as by identifying the most frequently used terms within the
`located documents. For example, if a user were to submit a
`query on the term “FOOD,” the user may receive several
`thousand items in the query result. The search engine might
`then tracethrough the contents of some or all of these items
`and present
`the user with related query terms such as
`
`40
`
`50)
`
`55
`
`6
`
`65
`
`
`
`6,006,225
`
`3
`the respective key term with the highest degree of frequency,
`ignoring unsuccessful query submissions (query submis-
`sions which produced a NULI. query result), The data
`structure thus provides an efficient mechanism for looking
`up the related terms for a grven query term.
`‘To generate a set ofrelated termsfor refining a submitied
`query (the “present query”), the related terms List for each
`term in the present query is initially obtained from the
`correlation data structure. If this step produces multiple
`related termslists (as in the case of a multiple-term query),
`the related terms lists are preferably combined by taking the
`intersection between these lists (i.¢., deleting the terms that
`are not common to all lists). The related terms which remain
`are terms which have previously appeared,
`in at least one
`successful query submission,
`in combination with every
`term ofthe present query. Thus, assuming items have not
`been deleted from the database being searched, any ofthese
`related terms can be individually added to the present query
`while guaranteeing that the modified query will not produce
`a NULL query resull. To take advantage ofthis feature, the
`related terms are preferably presented to the user via a user
`interface that requires the user to add no more than one
`related term per query submission. In other embodiment, the
`related terms are selected anddisplayed wilhout guarantee-
`ing a successful query result.
`Because the related terms are identified from previously-
`generated correlation data, without the need fo parse docu-
`menis or correlate terms, the related terms can be identified
`and presented to the user with little or no added delay,
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`2Q
`
`These and other features will now be described with
`reference to the drawings summarized below, These draw-
`ings and the associated description are provided to illustrate;
`a preferred embodiment of the invention, and not to limit the
`scope of the invention.
`Throughout the drawings, reference numbers are re-used
`to indicate correspondence between referenced elements. In
`addition, the first digit of each reference number indicates
`the figure in which the element first appears.
`FIG. 1 illustrates a system in which users access website
`information via the Internet, and illustrates the basic website
`components used to implement a search engine which
`operates in accordance with the invention,
`FIG, 2 illustrates a sample book search page of the web
`sile.
`
`40
`
`FIG. 3 illustrates sample log entries of a daily query log
`file.
`
`FIG. 4 illustrates the process used to generate the corre-
`lation table of PIG. 1.
`
`PIG, 5A illustrates a sample mapping before a query is
`added.
`
`PIG, 5B illustrates a sample mapping aller a query is
`added.
`
`FIG. 6 illustrates a process for generating the correlation
`table from the most recent daily query log files.
`FIG, 7 illustrates a process for selecting the related query
`lerms from the correlation table.
`
`FIG. SA illustrates a set of related query terms from a
`single-lerm query.
`FIG. 8B illustrates a set of intersecting terms and a set of
`related query terms from a multiple-terrn query.
`FIG. 9 illustrates a sample search result page of the web
`site.
`
`SU)
`
`55
`
`60
`
`65
`
`4
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`The present invention provides a search refinement sys-
`tem and method for generating related query terms (“related
`terms”) using a history of querics submitted to a search
`cogine by a community of users. Briefly, the system gener-
`ales query term correlation data which reflects the frequency
`with whichspecific terms have previously occurred together
`within the same query. The system uses the query term
`correlation data in combination with the query term(s)
`entered by the user to recommend additional query terms for
`refining the query. The incorporation of such historical query
`information into the process tends to produce related terms
`ihal are frequently used by other users in combination wilh
`the submitted query terms, and signilicantly increases the
`likelihood that these related terms will be helpful to the
`search refinement process. To further increase the likelihood
`that the related terms will be helpful, the correlation data is
`preferably generated only from those historical query sub-
`missions that produced a successful query result (at leasi one
`match).
`In the preferred embodiment, the query lerm correlation
`date is regenerated periodically from recent query
`submissions, such as by using the last M days of entries in
`a query log, and thus heavily reflects the current tastes of
`users. As a result, the related terms suggested by the search
`engine tend to be terms that correspond to the most [re-
`quently searched items during the relevant
`time period,
`Thus, for example, in the context of a search engine af an
`online merchant, the search engine tends to suggest related
`terms that correspond to the current best-selling products. In
`one embodiment, the technique used to generate the related
`terms and present these terms to the user guaranices that the
`modified query will not produce a NULL query result.
`‘The search relinement methods of the invention may be
`implemented, for example, as part of a web site, an Internet
`sile, an on-line services network,
`a document
`retrieval
`sysiem, or any other type of compuler system that provides
`searching capabilities to a community of users. In addition,
`the method may be combined with other methods for
`suggesting related terms, such as methods which process the
`contents of located documenis.
`
`A preferred web-based implementation of the search
`refinement system will now be described with reference to
`FIGS. 1-9. For purposes of illustration,
`the system is
`described herein in the context of a search engine that is used
`lo assist customers of Amazon.com Inc.
`in locating items
`(e.g., books, CDs, etc.) from an on-line catalog of products.
`Throughout the description, reference will be made to vari-
`ous implementation-specific details of the Amazon.com
`implementation, These details are providedin order to fully
`illustrate a preferred embodiment of the invention, and not
`to limit
`the scope of ihe invention. The scope of the
`invention is set forth in the appended claims.
`
`L. Overview of Web Site and Search Engine
`FIG. Lillustrates the Amazon.com web site 130, including
`components used to implement a search engine in aceor-
`dance with the invention.
`As it is well known in the art of Internet commerce, the
`Amazon.com web site includes functionality for allowing
`users to search, browse, and make purchases from an online
`catalog of book titles, music titles, and other types of items
`via the Internet 120. Because the catalog contains millions
`of items,
`it is important that the site provide an efficient
`mechanismfor assisting users in locating items.
`
`
`
`6,006,225
`
`5
`Asillustrated by FIG. 1, the website 130 includes a web
`server application 131 (“web server") which processes user
`requests received from user computers 110 via the Internet
`120. These requests include queries submitied by users to
`search the on-line catalog for products. The web server 131
`records the user transactions, including query submissions,
`within a query log 135. In the embodiment depicted in FIG,
`1, the query log 135 consists of a sequence of daily query log
`files 135(1)}-135(M), each of which represents one day of
`transactions.
`
`The website 130 also includes a query server 132 which
`processes the queries by searching a bibliographic database
`133, The bibliographic database 133 includes information
`about the various products thal users may purchase through
`the web site 130, This information includes, for example, the
`litles, authors, publishers, subject descriptions, and ISBNs
`(International Standard Book Numbers) of book titles, and
`the titles, artists, labels, and music classifications of music
`utles. The information lor cach item is arranged within fields
`(such as an “author” field and a “title” field), enabling the
`bibliographic database 133 to be searched on a
`field-
`restricted basis. The site also includes a database 134 of
`HTML (Hypertext Markup Language) content which
`includes, among other things, product
`information pages
`which show and describe the various products.
`The query server 132 includes a related term selection
`process 139 which identifies related query terms based on
`query term correlation data stored in a correlation table 137.
`As depicted in FIG, 1 and described below, ihe correlation
`table 137 is generated periodically from the M most recent
`daily query log files 135(1)-135(M) using an off-line table
`generation process 136.
`The web server 131, query server 132, table generation
`process 136, and database software run on one or more
`Unix'™-based servers and workstations (not shown) of the
`web site 130 although other types of platforms could be
`used. The correlation table 137 is preferably cached inRAM
`(random access memory) on the same physical machine as
`that used to implement the query server 132. To accommo-
`date large numbers of users, they query server 132 and the
`correlation table 137 can be replicated across. multiple
`machines. The web site components that are invoked during
`the searching process are collectively referred to herein as a
`“search engine.”
`FIG. 2 illustrates the general format of a book search page
`200 of the Amazon.com website 130 that can be used to
`searchthe bibliographic database 133 for book titles. Users
`have access to other search pages thal can be used to locate
`music titles and other types of products sold by the on-line
`merchant. The book search page 200 includes author, lille,
`aod subjectfields 210, 220, 240 and associated controls that
`allow the user to initiate ficldrestricted searches for book
`lites. Users can perform searches by first
`typing in the
`desired information into a search field 210, 220, 240 and
`then clicking on the appropriate search button 230, 250. The
`term or string of terms submitted to the search engine is
`referred to herein as the “query.” Other areas of the web site
`ask the user to submit queries without limiting the terms to
`specific fields.
`When the user submits a query from the book search page
`200io the web site 130, a the query sever 132 applies the
`query to the bibliographic database 133, raking into account
`any field restrictions within the query. If the query result is
`a single item, the ilem’s produc! information page is pre-
`sented to the user. If the query result includes multiple items,
`the list of items is presented to the user through a query
`
`20
`
`45
`
`SU)
`
`55
`
`60
`
`65
`
`6
`result page which contains hypertextual links to the items’
`respective product information pages.
`For mulliple-lerm queries, the query server 132 effec-
`tively logically ANDs the query terms together to perform
`the search. For example, if the user enters the terms “JAVA”
`and "PROGRAMMING" into the title field 220, the query
`server 132 will search for and return a list of all ttems that
`have both of these terms within the title. Thus, if any query
`term does nat produce a match (referred to herein as a
`“pon-matching term”),
`the query will produce a NULL
`query result. Presenting a NULL query result to the user can
`cause significant user rustration. To reduce this problem, in
`this event, the user may be presented with a list of items that
`are deemed to be “close matches,” Although the search
`engine described herein logically ANDs the query terms
`together,
`it will be recognized that
`the invention can be
`applied to search engines that use other methods for pro-
`cessing queries.
`Tn accordance with the invention, the search engine uses
`the query term correlation data stored in the correlation table
`137 to select the related terms that best match the user's
`query. The search engine then presents the related terms to
`the user, allowing the user lo refine the search and enhance
`discovery of relevant information, The query term correla-
`tion data indicates relationships between query terms, andis
`used to effectively predict query terms thai are likely to be
`helpful to the search refinement process. In accordance with
`another aspect of the invention, the correlation table 137
`preferably contains or reflects historical information about
`the frequencies with which specific query terms have
`appeared together within the same query.
`‘The general format of ithe correlation table 137 is illus-
`trated in FIG. L. In the embodiment depicted in FIG. 1 and
`described in detail herein, the correlations between query
`terms are based solely on frequency of occurrence within the
`same query. As described below, other types of query tern
`correlations can additionally be used. In addition, although
`the disclosed implementation uses a table to store the query
`term correlation data, other types of databases can be used,
`As illustrated by PIG. 1, each entry within the correlation
`table 137 (two entries shown) has two primary components:
`a“key” term 140, and a “related terms”list 142 for that key
`term, The related termslist 142 is a list of the N (e.g. 50)
`query terms that have appeared within the same query as the
`keyword with the highest degree of frequency, and is
`ordered according to frequency, Por example, the entry lor
`the key term COSMOS(ignoring the single-lerm prefixes,
`which are discussed below) is:
`COSMOS: ASTRONOMY, SAGAN, UNIVERSE,
`indicating that ASTRONOMY has appeared together wilh
`COSMOS with the highest degree of frequency; SAGAN
`has appeared with COSMOSwith ihe second highest degree
`of frequency, and so on. Each term that appears within the
`related terms list 142 is deemedto be related to the corre-
`sponding key term 140 by virtue of the relatively high
`frequency with which the terms have oecurred within the
`same query,
`As further depicted by FIG. 1, cach related term and each
`key term 140 preferably includes a single-charaeter field
`prefix which indicates the search ficld 210, 220, 240 to
`which the term corresponds, These prelixes may,
`for
`example, be as follows; A=author, T'=tithe, S=subject,
`R=arlist, L=label, G=generic. In addition, each related term
`is stored together with a correlation score 146 which, in the
`preferred embodiment,
`indicates the number of times the
`related term has appeared in combination with the key term
`
`
`
`6,006,225
`
`20)
`
`2
`
`40
`
`7
`(within the search fields indicated by their respective field
`prelixes), nol counting queries that produced a NULL query
`resull.
`the related term (including prefix)
`for example,
`Thus,
`5-ASTRONOMY has a correlation score of 410 under the
`key term 1-COSMOS,indicating that four hundred and ten
`“successful” queries were received (during the time period
`to which the table 137 corresponds) which included the
`combination of COSMOS in the tithe
`field and
`ASTRONOMYin the subject
`field. Although the field
`prefixes and correlation scores 146 carry information which
`is useful to the related terms selection process (as described
`below), such information need not be preserved,
`In opera