throbber
p15
`United States Patent
`6,006,225
`Dec.21, 1999
`[45] Date of Patent:
`Bowmanet al.
`
`(11] Patent Number:
`
`US006006225A
`
`[54] REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`[75]
`
`Inventors: Dwayne E. Bowman, Woodinville;
`Ruben E. Ortega; Michael L.
`Hamrick, both of Seattle; Joel R.
`Spiegel, Woodinville; Timothy R.
`Kohn,Seattle, all of Wash.
`
`
`
`73] Assignee: Amazon.Com, Seattle, Wash.
`
`21] Appl. No.: 09/145,360
`
`22]
`
`Filed:
`
`Sep. 1, 1998
`
`Related U.S. Application Data
`Provisional application No. 60/089,244, Jun. 15, 1998.
`60]
`SL] Tn, Co eeecccecssssssesscecssssscecssessneeeess G06F 17/30
`52] US. Cheeee 707/5; 707/2; 707/4; 707/10
`58] Field of Search oes 707/5, 2, 10, 4
`
`56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`5,675,819
`5,721,897
`5,787,422
`5,794,233
`5,864,845
`5,911,140
`5,913,215
`
`10/1997 SchuetZe woes cesses 704/10
`2/1998 Rubinstein ....
`-- FO7/2
`
`we FOT/S
`7/1998 Tukeyet al.
`..
`
`8/1998 Rubinstein ........
`. F07/4
`. FOT/S
`1/1999 Voorheeset al.
`.
`
`6/1999 Tukeyetal. ......
`. FOT/S
`6/1999 Rubinstein oes FOT/L0
`OTHER PUBLICATIONS
`
`Bartell et al., “Automatic Combination of Multiple Ranked
`Retrieval Systems”, Proceedings of SIGIR °94, Jul. 1994,
`pp. 173-181.
`Belkin et al., “The Effect of Multiple Query Representations
`on Information System Performance” Proceedings of SIGIR
`°93, Jun. 1993, pp. 339-346.
`Shaw et al., “Combination of Multiple Searches”, Proceed-
`ings of TREC-3, Apr. 1995, pp. 105-108.
`
`QuarterDeck Web Page, Downloaded Sep. 9, 1996, http://
`aracnid.qdeck.com/qdeck/products/webcompass.
`Towell et al. “Learning Collection Fusion Strategies for
`Information Retrieval”, Proceedings of the 12th Annual
`Machine Learning Conference, Jul. 1995, pp. 540-548.
`Voorhees et al., “Learning Collection Fusion Strategies”,
`Proceedings of SIGIR ’95, Jul. 1995, pp. 172-179.
`Voorheesetal., “The Collection Fusion Problem” Proceed-
`ings of TREC—3, NIST Special Publication 500-225, Apr.
`1995, pp. 95-104.
`Abstract of Generating Advanced Query Interfaces, Lee,
`Srivastava and Vista, Computer Networks and ISDN Sys-
`tems Conference Title: Comput. Netw. ISDN Syst. (Neth-
`erlands) vol. 30, No. 1-7, pp. 656-657 (1998).
`Abstract of Using Combination ofEvidence for Term Expan-
`sion, Wilkinson, Information Retrieval Research, Proceed-
`ings of the 19th Annual BCS-IRSG Colloquium on IR
`Research (1997).
`
`(List continued on next page.)
`Primary Examiner—Paul R. Lintz
`Attorney, Agent, or Firm—Knobbe, Martens, Olson & Bear,
`LLP
`
`[57]
`
`ABSTRACT
`
`Asearch engineis disclosed which suggests related terms to
`the userto allow the userto refine a search. The related terms
`are generated using query term correlation data which
`reflects the frequencies with which specific terms have
`previously appeared within the same query. The correlation
`data is generated and stored in a look-up table using an
`off-line process which parses a query log file. The table is
`regenerated periodically from the most recent query sub-
`missions(e.g., the last two weeks of query submissions), and
`thus strongly reflects the current preferences of users. Lach
`related term is presented to the user via a respective hyper-
`link which can be selected by the user to submit a modified
`query. In one embodiment,the related terms are added to and
`selected from the table so as to guarantee that the modified
`queries will not produce a NULL queryresult.
`
`28 Claims, 10 Drawing Sheets
`
`
`
`PARSE DAILY LOG FILE
`
`
`TO EXTRACT QUERY
`SUBMISSIONS FOR WHICH
`ITEMS FOUND > 6
`
`
`
`
`
`
` LAST
` OVERWRITE EXISTING CORRELATION
`
`CORRELATE TERMS SASED
`ON FREQUENCY OF OCCURRENCE
`
`WITHIN SAME QUERY
`
`CREATE DAILY
`RESULTS FILE
`
`MERGE DAILY
`RESULTS FILES FOR
`DAYS
`
`
`
`450
`
`TABLE WITH NEW CORRELATION TABLE
`
`
`
`ELASTIC - EXHIBIT 1026
`ELASTIC - EXHIBIT 1026
`
`

`

`6,006,225
`
`Page 2
`
`OTHER PUBLICATIONS
`
`the NECI Meta Search Engine,
`Abstract of Inquirus,
`Lawrence and Giles, Computer Networks and ISDN Sys-
`tems Conference Title: Comput. Netw ISDN Syst. (Nether-
`lands) vol. 30, No. 1-7, pp. 95-105 (1998).
`Abstract of Facilitating Complex Web Queries Through
`Visual User Interfaces and Query Relaxtion, Li and Shim,
`Computer Networks and ISDN Systems Conference Title:
`Comput. Netw. ISDN Syst. (Netherlands) vol. 30, No. 1-7,
`pp. 149-159 (1998).
`A User—centred Evaluation ofRankingAlgorithms for Inter-
`active Query Expansion, Efthimiadis, Proceedings of the
`16th Annual International ACM SIGIR Conference, Pitts-
`burgh, pp. 146-159 (1993).
`Concept Based Query Expansion, Qtu and Frei, Proceedings
`of the 16th Annual International ACM SIGIR Conference,
`Pittsburgh, pp. 160-169 (1993).
`Improving Retrieval Performance by Relevance Feedback,
`Salton and Buckley, J. of Am. Society for Info. Science
`41(4):288-297 (1990).
`
`Query Expansion Using Domain—Adapted, Weighted The-
`saurus in an Extended Boolean Model, Kwon, Kim and
`Choi, Proceedings of the 3rd International Conterence on
`Information and Knowledge Management (CIKM’94), pp.
`140-146 (1994).
`Browsing Through Querying: Designing for Electronic
`Books, Charoenkitkarn, Tam, Chignell and Golovichinsky,
`at
`the 5th ACM Conference on Hypertext, Seattle, WA
`206-216 (1993).
`A Survey of Information Retrieval and Filtering Methods,
`Faloutsos and Oard, Univ. of Maryland, 22 pages (undated).
`A Corpus Analysis Approach for Automatic Query Expan-
`sion, Gauch and Wang, Proceedings of the 6th International
`Conference on Information and Knowledge Mangement, pp.
`278-284 (1997).
`Discovering Web Acess Patterns and Trends by Applying
`OLAP and Data Mining Technology on Web Logs, Zaiane,
`Xin and Han, Proceedings of the IEEE Forum on Research
`and Technology Advances in Digital Libraries
`(TEEE
`ADL’98), pp. 19-29 (1998).
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 1 of 10
`
`6,006,225
`
`°°!(GrzIdv-S©0¢) 3SY3AINN-S
`
`
`JIHdVSDOITI!saYSANaS
`
`ASVavVLvdWaldalviay
`0Z¢)DNINNVEDONd-SOly)AWONOULSY-S
`
`
`Z6Z334409-S06SNVOVS-¥
`
`
`F1aVLNOIWWISYYODAYINO
`£EhCEL
`
`wAWr-S|_—=—=—sSOWSOO-L|$07/
`
`
`
`ALIS€3M
`
`
`
`patsGamLLMMe
`
`LES
`
`GEL
`
`
`
`$$300UdNOILOITIS
`
`SES
`
`(W)6ES
`
`cy
`
`WH
`
`wav
`
`S$3908d
`
`NOILVYANIO
`-;Ava
`
`9Auand
`
`01
`
`oa
`
`O64
`
`LOU
`
`
`
`
`
`
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 2 of 10
`
`6,006,225
`
`200
`
`file
`Edit
`View
`Go
`Favorite
`Help
`==> © & Q@ &
`Back Forw...
`Stop Refresh Home Search
`Favorite
`
`Print A ee
`
`
`
`Address|http: //www.amazon.com/book_search Iv]
`
`aniszon.comcom Book Search
`Enter Author and/or Title
`
`Amazon.com Home | Music Search | Your Account
`
`Author:
`
`210
`
`@Exact Name
`
`OlLast, First Name
`
`OStart of Last Name
`
`Title:
`
`LEO
`
`OStart(s) of Title Words
`@Title Word(s)
`OFxact Title
`
` Search Now |
`
`||Clear the Form
`
`Author Search Tips / Title Search Tips
`
`Search by Subject
`
`L240
`
`@Subject Word(s)
`OStart of Subject
`OExact Subject
`
`
`
`Search Now
`Clear the Form
`
`© Start(s) of Subject Word(s)
`
`Subject Search Tips
`
`Other Search Methods:
`ISBN, Publisher/Date, Quick Search
`
`FIG. 2
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 3 of 10
`
`6,006,225
`
`ITO
`
`User Identifier = 29384719287
`HTTP_REFERRER= http://www.amazon.com/book_search_page
`PATH_INFO=/book_search
`title = Snow Crash
`ifems_found = 2
`
`Friday, 13-Feb—98 02:24:11
`User Identifier = 29384719287
`HTTP_REFERRER= http://www.amazon.com/book_search
`PATH_INFO=/ISBN = 0553562614
`
`I20
`
`IIO
`
`J40 Friday, 13-Feb—-98 02:23:52
`
`Friday, 13-Feb-98 06:15:03
`User Identifier = 54730543261
`HTTP_REFERRER= http://www.amazon.com/music_search_page
`PATH_INFO=/music_search
`artist = This and That
`items_found = 0
`
`Friday, 13-Feb-98 10:07:34
`User Identifier = 027385918272
`HTTP_REFERRER= http://www.amazon.com/book_search_page
`PATH_INFO=/book_search
`subject = outdoor trail
`items_found = 22
`
`FIG. F
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 4 of 10
`
`6,006,225
`
`BEGIN
`
`470
`
`PARSE DAILY LOG FILE
`TO EXTRACT QUERY
`SUBMISSIONS FOR WHICH
`
`ITEMS FOUND > 0
`ON FREQUENCY OF OCCURRENCE WITHIN SAME QUERY
`
`CORRELATE TERMS BASED
`
`420
`
`430
`
`CREATE DAILY
`RESULTS FILE
`
`440
`
`MERGE DAILY
`RESULTS FILES FOR
`
`LAST M DAYS
`
`OVERWRITE EXISTING CORRELATION
`TABLE WITH NEW CORRELATION TABLE
`
`450
`
`END
`
`FIC. 4
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 5 of 10
`
`6,006,225
`
`sua]N
`
`WLEL
`
`WVYL-S
`
`
`(Sc)ONIZVIE-S
`
`(ZL)ayld-L
`
`(Z1)3SIDYIOXI-S
`(Zz)
`
`NOSTYYO-V
`
`SLYOdS-Sa3LUV“(9¢)Nowvonaa-L
`(Z6)XIN-S(iy)
`
`
`
`(3)oNIZvIa-L(001)ONINIG-L
`
`(7)ASSNH-L
`
`
`
`($9)yoodLNO-s
`
`
`
`(S$)NITNVYI-V
`
`
`
`(7S) Ylvdad—L
`
`(v1)Wal-L
`
`CRS
`
`
`
`
`
`
`
`(¢z)YOOGLNO-S(s9)TIval-S
`
`fyNouyvansv4NOILYOVA-S
`=LTWWYL-L
`(lr)TIVul-S
`
`VGDL
`
`
`
`
`
`NONNA-L(OL)YSNDSVM-V
`
`(8)S3LVA-V
`
`
`
`
`
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 6 of 10
`
`6,006,225
`
`sudo]N
`
`GLEL
`
`
`
`(sc)SONIZVIE—-S
`
`(S)9ONIZVIE—-L
`
`(2)
`
`IXI8-S
`
`(Z6)
`
`
`
`(vZ)YOOdLNO-Ss
`
`
`
`(Zt)SL¥OdS-S
`
`NOANA-S
`
`(99)
`
`(2)
`
`(01)
`
`TWUL-S
`
`WWeYl-L
`
`GSUs
`
`O65
`
`CPL
`
`WVYL-S
`
`
`
`(1%)LLauyvo-V
`
`(79)alg—-S
`
`(6)NOILVOVA-S
`XIN-S(17)
`
`
`(9¢)NOILVONGI-L
`
`(¢Z)NOILVOVA-S
`(001)
`YINOVM-V
`SLYOdS-S$
`ONINIG-L
`
`(Z1)3SIOU3OXI-S
`(99)(yo)=ASAMH-LyOOdLNO-S
`
`
`(yS)—alWd3u-L
`
`(Zr)TVYL-S
`
`(v1)TWal-L
`(z)NOSTY¥O-V
`(S)NITINVYI-V
`(8)S3LVA-V
`
`
`
`
`
`
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 7 of 10
`
`6,006,225
`
`Date
`
`Daily Log
`ile
`
`Daily Results
`File
`
`Dates of Query
`Correlation Table
`
`Query Correlation
`Table
`
`1-Feb-98 2-Feb-98
`
`
`F[E|
`
`
`
`
`
`3-Feb-98
`
`4-Feb-98
`
`5-Feb-98
`
`6—-Feb-98
`
`7-Feb—-98
`
`8-Feb-98
`
`670
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 8 of 10
`
`6,006,225
`
`SIO
`
`MULTI-TERM
`QUERY
`?
`
`
`
`760
`
`Terrrea
`
`
`
`
`COMBINE ALL RELATED
`TERMS LISTS
`
`SELECT X TERMS WITH
`HIGHEST VALUES
`
`FIG. /
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 9 of 10
`
`6,006,225
`
`Top 3 Related Terms:
`
`TRAIL — MIX
`
`TRAIL — YUKON _/
`
`TRAIL — BIKE
`
`E10
`
`FIG. EA
`
`Intersecting Terms:
`
`E20
`
`S — BIKE
`S -— SPORTS
`S — VACATION
`
`E30
`QS
`
`Top 3 Related Terms:
`OUTDOOR TRAIL — BIKE
`QUTDOOR TRAIL — SPORTS
`OUTDOOR TRAIL — VACATION
`
`FIG. EB
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 10 of 10
`
`6,006,225
`
`GOO
`
`\
`-————
`
`LI le
` View
`
`
`
`
`File
`Edit
`Go
`Favorite
`Help
`
`as ®F & Qa FAaQ
`Back Forw... Stop Refresh Home
`Search
`Favorite
`Print
`Font
`Mail
`
`
`
`Address|http: //www.amazon.com/book_search iv
`
`amazon.com Book Search
`aaa
`
` FIO
`
`
`
`
`
`
`
`
`
`
`
`
`Your Search Results
`
`for: the subject words include “OUTDOOR TRAIL”
`
`Related Query Terms:Click on a link below to narrow your
`search by adding a related term.
`OUTDOOR TRAIL — BIKE
`
`OUTDOOR TRAIL — SPORTS
`
`OUTDOOR TRAIL — VACATION
`
`Top Matches for this search:
`
`* Outdoor & Trail Guide to Yosemite
`
`
`
`
`
`* Outdoor Training on Mountian Bike Trails
`
`
`to See
`Full Results: tne First 100 are shown below.
`more results scroll down and click the “More” button.
`
`The blessed trail: outdoor diary
`
`
`
`
`
`
`Outdoor & Trail Guide to Yosemite
`
`G20
`
`T
` he Great Outdoor Cookbook: Trail Recipies
`
`
`
`

`

`6,006,225
`
`1
`REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`RELATED APPLICATION
`
`This application claims the benefit of U.S. Provisional
`Application No. 60/089,244, filed Jun. 15, 1998, the disclo-
`sure of which is herebyincorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`10
`
`1. Field of Invention
`
`2
`“RESTAURANTS,” “RECIPIES,” and “FDA”to allow the
`user to refine the query.
`The related query terms are commonly presented to the
`user together with corresponding check boxes that can be
`selectively marked or checked by the user to add terms to the
`query. In some implementations, the related query terms are
`alternatively presented to and selected by the user through
`drop down menusthat are provided on the queryresult page.
`In either case, the user can add additional terms to the query
`and then resubmit the modified query. Using this technique,
`the user can narrow the query result down to a more
`manageable set consisting primarily of relevant items.
`One problem with existing techniques for generating
`related query termsis that the related terms are frequently of
`little or no value to the search refinement process. Another
`problem is that the addition of one or morerelated terms to
`the query sometimes leads to a NULL queryresult. Another
`problem is that the process of parsing the query result items
`to identify frequently used terms consumessignificant pro-
`cessor resources, and can appreciably increase the amount of
`time the user must wait before viewing the query result.
`These and other deficiencies in existing techniques hinder
`the user’s goal of quickly and efficiently locating the most
`relevant items, and can lead to user frustration.
`
`SUMMARYOF THE INVENTION
`
`The present invention addresses these and other problems
`by providing a search refinement system and method for
`generating and displaying related query terms (“related
`terms”). In accordance with the invention, the related terms
`are generating using query term correlation data that is based
`on historical query submissions to the search engine. The
`query term correlation data (“correlation data”) is preferably
`based at least upon the frequencies with which specific terms
`have historically been submitted together within the same
`query. The incorporation of such historical query informa-
`tion into the process tends to produce related termsthat are
`frequently used by other users in combination with the
`submitted query terms, and significantly increases the like-
`lihood that these related terms will be helpful to the search
`refinement process. To further increase the likelihood that
`the related terms will be helpful,
`the correlation data is
`preferably generated only from those historical query sub-
`missions that produced a successful query result (at least one
`match).
`the
`In accordance with one aspect of the invention,
`correlation data is stored in a correlation data structure
`(table, databasc, ctc.) which is used to look up related terms
`in response to query submissions. The data structure is
`preferably generated using an off-line process which parses
`a query log file, but could alternatively be generated and
`updated in real-time as queries are received from users. In
`one embodiment, the data structure is regenerated periodi-
`cally (e.g., once per day)
`from the most recent query
`submissions (¢.g., the last M days of entries in the query
`log), and thus strongly reflects the current tastes of the
`community of users; as a result, the related terms suggested
`by the search engine strongly reflect the current tastes of the
`community. Thus, for example, in the context of a search
`engine of an online merchant, the search engine tends to
`suggest related terms that correspond to the current best-
`selling products.
`In a preferred embodiment, each entry in the data struc-
`ture is in the form of a key term and a corresponding related
`termslist. Each related terms list contains the terms which
`have historically appeared together (in the same query) with
`
`15
`
`,,
`
`This present invention relates to query processing, and
`more specifically relates to techniques for facilitating the
`process of refining search queries.
`2. Description of Related Art
`With the increasing popularity of the Internet and the
`World Wide Web, it is common for on-line users to utilize
`search engines to search the Internet for desired information.
`Manywebsites permit users to perform searchesto identify
`a small number of relevant items among a muchlarger
`domain of items. As an example, several web indexsites
`permit users to search for particular web sites among known
`web sites. Similarly, many on-line merchants, such as
`booksellers, permit users to search for particular products
`among all of the products that can be purchased from the
`merchant. Other on-line services, such as Lexis™ and
`Westlaw™, allow users to search for various articles and
`court opinions.
`In order to perform a search, a user submits a query
`containing one or more query terms. The query may also
`explicitly or implicitly identify a record field or segment to
`be searched, suchastitle, author, or subject classification of
`the item. For example, a user of an on-line bookstore may 3
`submit a qucry containing terms that
`the uscr believes
`appear within the title of a book. A query server program of
`the search engine processes the query to identify any items
`that match the terms of the query. The set of itemsidentified
`by the query server program is referred to as a “query
`result.” In the on-line bookstore example, the query result is
`a set of books whosetitles contain someorall of the query
`terms. In the web indexsile example, the query result is a set
`of web sites or documents. In web-based implementations,
`the query result is typically presented to the user as a
`hypertextual listing of the located items.
`If the scope of the search is large, the query result may
`contain hundreds, thousands or even millions of items.If the
`user is performing the search in order to find a single item
`or a small set of items, conventional approachesto ordering
`the items within the query result often fail to place the
`sought item or items near the top of the query result list. This
`requires the user to read through many other items in the
`query result before reaching the sought item. Certain search
`engines, such as Excite™ and AltaVista™, suggest related
`query terms to the user as a part of the “search refinement”
`process. This allows the user to further refine the query and
`narrow the query result by selecting one or more related
`query terms that more accurately reflect the user’s intended
`request. The related query terms are typically generated by
`the search engine using the contents of the query result, such
`as by identifying the most frequently used terms within the
`located documents. For example, if a user were to submit a
`query on the term “FOOD,” the user may receive several
`thousand items in the query result. The search engine might
`then trace through the contents of someor all of these items
`and present
`the user with related query terms such as
`
`40
`
`45
`
`60
`
`65
`
`

`

`6,006,225
`
`10
`
`15
`
`3
`the respective key term with the highest degree of frequency,
`ignoring unsuccessful query submissions (query submis-
`sions which produced a NULL query result). The data
`structure thus provides an efficient mechanism for looking
`up the related terms for a given query term.
`To generate a set of related termsfor refining a submitted
`query (the “present query”), the related terms list for each
`term in the present query is initially obtained from the
`correlation data structure. If this step produces multiple
`related termslists (as in the case of a multiple-term query),
`the related termslists are preferably combined by taking the
`intersection between these lists (i.c., deleting the terms that
`are not commonto all lists). The related terms which remain
`are terms which have previously appeared, in at least one
`successful query submission,
`in combination with cvery
`term of the present query. Thus, assuming items have not
`been deleted from the database being searched, any of these
`related terms can be individually added to the present query
`while guaranteeing that the modified query will not produce
`a NULL queryresult. To take advantage of this feature, the
`related terms are preferably presented to the user via a user
`interface thal requires the user to add no more than one
`related term per query submission. In other embodiment, the
`related terms are selected and displayed without guarantee-
`ing a successful query result.
`Because the related terms are identified from previously-
`generated correlation data, without the need to parse docu-
`ments or correlate terms, the related terms can be identified
`and presented to the user with little or no added delay.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`These and other features will now be described with
`
`reference to the drawings summarized below. These draw-
`ings and the associated description are providedto illustrate ;
`a preferred embodimentof the invention, and notto limit the
`scope of the invention.
`Throughout the drawings, reference numbers are re-used
`to indicate correspondence betweenreferenced elements. In
`addition, the first digit of each reference numberindicates
`the figure in which the elementfirst appears.
`FIG, 1 illustrates a system in which users access website
`information via the Internet, and illustrates the basic website
`components used to implement a search engine which
`operates in accordance with the invention.
`FIG, 2 illustrates a sample book search page of the web
`site.
`
`40
`
`45
`
`4
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`The present invention provides a search refinement sys-
`tem and methodfor generating related query terms (“related
`terms”) using a history of queries submitted to a search
`engine by a community of users. Briefly, the system gener-
`ates query term correlation data whichreflects the frequency
`with which specific terms have previously occurred together
`within the same query. The system uses the query term
`correlation data in combination with the query term(s)
`entered bythe user to recommend additional query termsfor
`refining the query. ‘he incorporation of such historical query
`information into the process tends to produce related terms
`that are frequently used by other users in combination with
`the submitted query terms, and significantly increases the
`likelihood that these related terms will be helpful to the
`search refinementprocess. To further increase the likelihood
`that the related terms will be helpful, the correlation data is
`preferably generated only from those historical query sub-
`missions that produced a successfulqueryresult (at least one
`match).
`In the preferred embodiment, the query term correlation
`date is regenerated periodically from recent query
`submissions, such as by using the last M daysofentries in
`a query log, and thus heavily reflects the current tastes of
`users. As a result, the related terms suggested by the search
`engine tend to be terms that correspond to the most fre-
`quently searched items during the relevant time period.
`Thus, for example, in the context of a search engine of an
`online merchant, the search engine tends to suggest related
`terms that correspondto the current best-selling products.In
`one embodiment, the technique used to generate the related
`terms and present these terms to the user guarantees that the
`modified query will not produce a NULL queryresult.
`The search refinement methods of the invention may be
`implemented, for example, as part of a web site, an Internet
`site, an on-line services network, a document retrieval
`system, or any other type of computer system that provides
`searching capabilities to a community of users. In addition,
`the method may be combined with other methods for
`suggesting related terms, such as methods which process the
`contents of located documents.
`
`A preferred web-based implementation of the search
`refinement system will now be described with reference to
`FIGS. 1-9. For purposes of illustration,
`the system is
`described herein in the context of a search enginethatis used
`to assist customers of Amazon.com Inc. in locating items
`(e.g., books, CDs,etc.) from an on-line catalog of products.
`Throughout the description, reference will be made to vari-
`ous implementation-specific details of the Amazon.com
`implementation. These details are provided in orderto fully
`illustrate a preferred embodimentof the invention, and not
`to limit
`the scope of the invention. The scope of the
`invention is set forth in the appended claims.
`
`FIG. 3 illustrates sample log entries of a daily query log
`file.
`
`FIG. 4 illustrates the process used to generate the corre-
`lation table of FIG. 1.
`
`FIG. 5A illustrates a sample mapping before a query is
`added.
`
`FIG. 5B illustrates a sample mapping after a query is
`added.
`
`FIG. 6 illustrates a process for generating the correlation
`table from the most recent daily query log files.
`FIG, 7 illustrates a process for sclecting the related query
`terms from the correlation table.
`
`FIG, 8A illustrates a set of related query terms from a
`single-term query.
`FIG. 8B illustrates a set of intersecting terms and a set of
`related query terms from a multiple-terrn query.
`FIG. 9 illustrates a sample search result page of the web
`site.
`
`I. Overview of Web Site and Scarch Engine
`
`60
`
`65
`
`FIG. 1 illustrates the Amazon.com website 130, including
`components used to implement a search engine in accor-
`dance with the invention.
`As it is well knownin the art of Internet commerce, the
`Amazon.com web site includes functionality for allowing
`users to search, browse, and make purchases from an online
`catalog of booktitles, musictitles, and other types of items
`via the Internet 120. Because the catalog contains millions
`of items, it is important that the site provide an efficient
`mechanism for assisting users in locating items.
`
`

`

`6,006,225
`
`6
`5
`result page which contains hypertextual links to the items’
`As illustrated by FIG. 1, the web site 130 includes a web
`server application 131 (“web server”) which processes user
`respective product information pages.
`requests received from user computers 110 via the Internet
`For multiple-term queries,
`the query server 132 effec-
`120. These requests include queries submitted by users to
`tively logically ANDs the query terms together to perform
`search the on-line catalog for products. The web server 131
`the search. For example, if the user enters the terms “JAVA”
`records the user transactions, including query submissions,
`and “PROGRAMMING?”into the title field 220, the query
`server 132 will search for and returnalist of all items that
`within a query log 135. In the embodiment depicted in FIG.
`1, the query log 135 consists of a sequence of daily query log
`have both of these terms within the title. Thus, if any query
`files 135(1)-135(M), each of which represents one day of
`term does not produce a match (referred to herein as a
`transactions.
`“non-matching term”),
`the query will produce a NULL
`query result. Presenting a NULL queryresult to the user can
`cause significant user frustration. To reduce this problem, in
`this event, the user may be presented with a list of items that
`are deemed to be “close matches.” Although the search
`engine described herein logically ANDs the query terms
`together,
`it will be recognized that the invention can be
`applied to search engines that use other methods for pro-
`cessing queries.
`In accordance with the invention, the search engine uses
`the query term correlation data stored in the correlation table
`137 to select the related terms that best match the user’s
`query. The search engine then presents the related terms to
`the user, allowing the user to refine the search and enhance
`discovery of relevant information. The query term correla-
`tion data indicates relationships between query terms, and is
`used to effectively predict query terms that are likely to be
`helpful to the search refinement process. In accordance with
`another aspect of the invention, the correlation table 137
`preferably contains or reflects historical information about
`the frequencies with which specific query terms have
`appeared together within the same query.
`The general format of the correlation table 137 is illus-
`trated in FIG. 1. In the embodiment depicted in FIG. 1 and
`described in detail herein, the correlations between query
`terms are based solely on frequency of occurrence within the
`same query. As described below, other types of query tern
`correlations can additionally be used. In addition, although
`the disclosed implementation uses a table to store the query
`term correlation data, other types of databases can be used.
`As illustrated by FIG. 1, each entry within the correlation
`table 137 (two entries shown) has two primary components:
`a “key” term 140, and a “related terms”list 142 for that key
`term. ‘The related terms list 142 is a list of the N (e.g. 50)
`query terms that have appeared within the same query as the
`keyword with the highest degree of frequency, and is
`ordered according to frequency. For example, the entry for
`the key term COSMOS(ignoring the single-term prefixes,
`which are discussed below) is:
`COSMOS: ASTRONOMY, SAGAN, UNIVERSE,
`indicating that ASTRONOMYhasappeared together with
`COSMOSwith the highest degree of frequency; SAGAN
`has appeared with COSMOSwith the second highest degree
`of frequency, and so on. Each term that appears within the
`related terms list 142 is deemed to be related to the corre-
`
`HTML (Hypertext Markup Language) content which
`includes, among other things, product information pages
`which show and describe the various products.
`The query server 132 includes a related term selection
`process 139 which identifies related query terms based on
`query term correlation data stored in a correlation table 137.
`As depicted in FIG. 1 and described below, the correlation
`table 137 is generated periodically from the M most recent
`daily query log files 135(1)-135(M)using an off-line table
`generation process 136.
`The web server 131, query server 132, table generation
`process 136, and database software run on one or more
`Unix™-based servers and workstations (not shown) of the
`web site 130 although other types of platforms could be
`used. The correlation table 137 is preferably cached in RAM
`(random access memory) on the same physical machine as
`that used to implement the query server 132. To accommo-
`date large numbers of users, they query server 132 and the
`correlation table 137 can be replicated across multiple
`machines. The web site components that are invoked during
`the searching process are collectively referred to herein as a
`“search engine.”
`FIG, 2 illustrates the gencral format of a book scarch page
`200 of the Amazon.com website 130 that can be used to
`
`The web site 130 also includes a query server 132 which
`processes the queries by searching a bibliographic database
`133. The bibliographic database 133 includes information
`about the various products that users may purchase through
`the website 130. This information includes, for cxample, the
`titles, authors, publishers, subject descriptions, and ISBNs
`(International Standard Book Numbers) of booktitles, and
`the titles, artists, labels, and music classifications of music
`titles. The information for each item is arranged within fields
`(such as an “author”field and a “title” field), enabling the
`bibliographic database 133 to be searched on a field-
`restricted basis. The site also includes a database 134 of
`
`10
`
`15
`
`40
`
`45
`
`search the bibliographic database 133 for booktitles. Users
`have access to other search pages that can be used to locate
`music titles and other types of products sold by the on-line
`merchant. The book search page 200 includes author,title,
`and subject fields 210, 220, 240 and associated controls that
`allow the user to initiate fieldresiricted searches for book
`
`titles. Users can perform searches by first typing in the
`desired information into a search field 210, 220, 240 and ;
`then clicking on the appropriate search button 230, 250. The
`term or string of terms submitted to the search engine is
`referred to herein as the “query.” Other areas of the website
`ask the user to submit queries without limiting the terms to
`specific fields.
`Whenthe user submits a query from the book search page
`200 to the web site 130, a the query sever 132 applies the
`query to the bibliographic database 133, taking into account
`any field restrictions within the query. If the query result is
`a single item, the item’s product information page is pre-
`sentedto the user. If the query result includes multiple items,
`the list of items is presented to the user through a query
`
`60
`
`65
`
`sponding key term 140 by virtue of the relatively high
`frequency with which the terms have occurred within the
`same query.
`As further depicted by FIG. 1, each related term and each
`key term 140 preferably includes a single-character field
`prefix which indicates the search field 210, 220, 240 to
`which the term corresponds. These prefixes may, for
`example, be as follows: A=author, T=title, S=subject,
`R=artist, L=label, G=generic. In addition, each related term
`is stored together with a correlation score 146 which,in the
`preferred embodiment, indicates the number of times the
`related term has appeared in combination with the key term
`
`

`

`6,006,225
`
`7
`(within the search fields indicated by their respective field
`prefixes), not counting queries that produced a NULL query
`result.
`
`the related term (including prefix)
`Thus, for example,
`S-ASTRONOMYhasa correlation score of 410 under the
`
`key term T-COSMOS,indicating that four hundred and ten
`“successful” queries were received (during the time period
`to which the table 137 corresponds) which included the
`combination of COSMOS in the title field and
`
`field. Although the field
`ASTRONOMY in the subject
`prefixes and correlation scores 146 carry information which
`is useful to the related terms selection process (as described
`below), such information need not be preserved.
`In operation, when a user submits a query to the website
`130, the web server 131 passes the query to the query server
`132, and the query server applies the query to the biblio-
`graphic database 133. If the number o

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket