throbber
jv
`United States Patent
`6,006,225
`(11)
`[45]
`Date of Patent:
`Bowman et al.
`Dec. 21, 1999
`
`Patent Number:
`
`USO0G006225A00
`
`[54] REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`[75]
`
`Inventors: Dwayne E. Bowman, Woodinville;
`Ruben E. Ortega; Michael L.
`Hamrick, both of Seattle: Joel R.
`Spiegel, Woodinville; Timothy R.
`Kohn, Seattle, all of Wash.
`
`[73] Assignee: Amazon.Com, Seattle, Wash.
`
`[21] Appl. No. 09/145,360
`
`[22]
`
`Filed:
`
`Sep. 1, 1998
`
`Related U.S. Application Data
`Provisional application No. 60/089244_ Jun. 15, 1998.
`[60]
`Int. CL.°....
`[Sl]
`GO06F 17/30
`
`. 1707/5; 707/2; 7O7/4; 707/10
`[52] U.S. Cl.
`.....
`Field of Searely0.cen TOTS, 2, 10, 4
`[58]
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`QuarierDeck Web Page, Downloaded Sep. 9, 1996, hup://
`aracnid.qdeck.com/qdeck/ products/webcompass.
`Towell et al. “Learning Collection Fusion Strategies for
`Information Retrieval”, Proceedings of the 12th Anoual
`Machine Learning Conference, Jul. 1995, pp, 540-548.
`Voorhees et al., “Learning Collection Vusion Strategies”,
`Proceedings of SIGIR "95, Jul. 1995, pp. 172-179.
`Voorheeset al., “The Collection Fusion Problem” Proceed-
`ings of TREC—3, NIST Special Publication 500-225, Apr.
`1995, pp. 95-104,
`Abstract of Generating Advanced Query Interfaces, Lee,
`Srivastava and Vista, Computer Networks and ISDN Sys-
`tems Conference Tile: Comput. Netw, ISDN Syst. (Neth-
`erlands) vol. 30, No. 1-7, pp. 656-657 (1998).
`Abstract ofUsing Combination ofEvidencefor Term Expan-
`sion, Wilkinson, Information Retrieval Research, Proceed-
`ings of the 19h Annual BCS-IRSG Colloquium on IR
`Research (1997),
`
`(List continued on next page.)
`Primary Examiner—Paul R. Linte
`Attorney, Agent, or Pirm—Knobhe, Martens, Olson & Bear,
`LLP
`
`[57]
`
`ABSTRACT
`
`ee TOSIO
`10/1997 Schuetz ous
`5,675,819
`Aseirch engine is disclosed which suggests related terms to
`the user to allow the user to refine 4 search. The related terms
`. W729
`2/1998 Rubinstein ..
`S.721,897
`FORS
`7/1998 Tukey etal.
`5,787,422
`are generated using query term correlation data which
`» FOF
`S/L898, Rubinstein .....
`5,794,233
`reflects the frequencies with which specific terms have
`
`5.864.845=1/1999 Voorhees et al. TONS
`previously appeared within the same query, The correlation
`S.911,140
`6/1999 Tukey el al...
`. WTS
`data is generated and stored in a look-up table using an
`FOF/LO
`S{13,215
`6/1999 Rubinstein...
`oll-line process which parses a query log file. The table is
`OTHER PUBLICATIONS
`regenerated periodically from the most recent query sub-
`missions (e2., (he last bwo weeks of query submissions), and
`thus strongly reflects the current preferences of users, Each
`related term is presented to the user via 4 respective hyper-
`link which can be selected by the user to submit a modified
`query. In one embodiment, the relatedterms are added to and
`selected from the table so as to guarantee that the modilied
`queries Will not produce a NULLqueryresult.
`
`
`
`Bartell et al., “Automatic Combination of Multiple Ranked
`Retrieval Systems", Proceedings of SIGIR ‘94, Jul, 1994,
`pp. 173-181,
`Belkin etal, “The Effect of Multiple Query Representations
`on Information System Perlormance™ Proceedings of SIGIR
`"93, Jun. 1993, pp. 339-346.
`Shaw et al., “Combination of Multiple Searches”, Proceed-
`ings of TREC-3, Apr. 1995, pp. 105-108.
`
`28 Claims, 10 Drawing Sheets
`
`(BEGIN)
`aFit
`— i
`PARSE DAILY LOG FILE
`
`To EXTRACT QUERY
`
`SUBMISSIONS FOR WHICH
`WEMS FOUND > 0
`
`
`uy
`CORRELATETERMS @ASEI
`ON FREQUENCY OF OCCURRENCE
`WITHIN SAME QUERY
`
`
`
`| CREATE Daley
`|
`RESULTS FILE
`~440
`
`MERGE DAILY
`RESULTS FILES FOR
`LAST M DAYS:
`
`
`
`_——
`|, OVERWRITE EXISTING CORRELATION
`TABLE WITH NEW CORRELATION TABLE
`
`tie
`
`ENO
`
`PETITIONERS - EXHIBIT 1026
`PETITIONERS- EXHIBIT 1026
`
`IPR2022-00217
`
`IPR2022-00217
`
`

`

`6,006,225
`Page 2
`
`OTHER PUBLICATIONS
`
`the NECI Meta Search Engine,
`Abstract of JInguirts,
`Lawrence and Giles, Computer Networks and ISDN Sys-
`tems Conference Title: Comput. Netw ISDN Syst. (Nether-
`lands) vol. 30, No. 1-7, pp. 95-105 (1998).
`Abstract of Facilitating Complex Web Queries Through
`Visual User Interfaces and Query Relaxtion, Li and Shim,
`Computer Networks and ISDN Systems Conference Title:
`Comput. Netw. ISDN Syst. (Netherlands) vol. 30, No. 1-7,
`pp. 149-159 (1998),
`A User—centred Evaluation ofRanking Algoritims for Inter-
`active Query Expansion, Etthimiadis, Proceedings of the
`16th Annual International ACM SIGIR Conference, Pitts-
`burgh, pp. 146-159 (1993).
`Concep! Based Query Expansion, Qiu and Frei, Proceedings
`of the 16th Annual International ACM SIGIR Conference,
`Pittsburgh, pp. 160-169 (1993).
`Improving Retrieval Performance by Relevance Feedback,
`Salion and Buckley, J. of Am. Society for Info. Science
`41(4):288-297 (1990).
`
`Query Expansion Using Domain-Adapted, Weighted The-
`saurus invan Extended Boolean Model, Kwon, Kim and
`Choi, Proceedings of the 3rd International Conference on
`Information and Knowledge Management (CIKM‘94), pp.
`140-146 (1994),
`Browsing Through Querying: Designing for Flectranic
`Books, Charoenkitkarn, Tam, Chignell and Golovichinsky,
`at
`the 5th ACM Conference on Hypertext, Seatlle, WA
`206-216 (1993),
`A Survey of Information Retrieval and Filtering Methods,
`Faloutsos and Oard, Univ, of Maryland, 22 pages (undated).
`A Corpus Analysis Approach for Automatic Query Expan-
`sion, Gauch and Wang, Proceedings of the 6th Loternational
`Conference on Information and Knowledge Mangement,pp.
`278-284 (1997).
`Discovering Web Acess Patterns and Trends by Applying
`OLAP and Data Mining Technology on Web Logs, Zaiane,
`Xin and Han, Proceedings of the IEEE Forum on Research
`and ‘Technology Advances
`in Digital Libraries (IEEE
`ADL*98), pp. 19-29 (1998).
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 1 of 10
`
`NIANIS|JIHdvYDONElsti||Y3AUIS||SSAgno||si;LEE||ausa3Mey|peeeheeeSaSSSSESSNy
`
` NOILWYSNI9|wav!||||||||SS390¥dNOWOSTSS|asvavivdWY3L
`
`
` AYaNO||LEE||||ss3008d||—-4
`
`
`FlavLNOlVTaNyODONINNVSDONd-SOly)ANONOYLSY-S||vAWr-S|___——=—=—«SOWSOD=1|$4/|T
`SSSSSSSSSSSe‘—*——————SS>.-_7]|||GFL||||elt:*|(sezidv-s_|(\(eoe) 3su3AINN-Sort||Z6Z334409-SO6ENVOYS-¥|OZ)
`
`
`O3LV13y
`
`6,006,225
`
`
`
`OCs
`
`OLL
`
`LO
`
`
`
`

`

`
`
`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 2 of 10
`
`6,006,225
`
`File
`Edit
`View
`Go
`Favorite
`Help
`pea e38 AgaeQg Aa
`Back Forw...
`Stop Refresh Home Search
`Favorite
`Print
`Font
`
`Mail
`
`
`
`Address|http: //www.amozon.com/book_search v7]
`
`amazon.com Book Search
`a—tio,
`
`Amazon.com Home | Music Search | Your Account
`
`
`
`
`Enter Author and/or Title
`
`Author:
`
`@Exact Name
`
`OLast, First Name
`
`OStart of Last Name
`
`210
`
`Title:
`
`LEO
`
`O&Exact Title
`
`@Title Word(s)
`
`OStart(s) of Title Words
`
`
`
`Author Search Tips / Title Search Tips
`
`search by Subject
`
`240
`
`OExact Subject
`| Search Now |
`
`OStart of Subject
`@Subject Word(s)
`| Clearthe Form |
`
`OStart(s) of Subject Word(s)
`
`Subject Search Tips
`
`Other Search Methods:
`ISBN, Publisher/Date, Quick Search
`
`FIG.
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 3 of 10
`
`6,006,225
`
`User Identifier = 29384719287
`HTTP_REFERRER= http://www.amazon.com/book_search_pags
`PATH_INFO=/book_search
`title = Snow Crash
`items_found = 2
`
`Friday, 13-Feb—98 02:24:11
`User Identifier = 29384719287
`HTTP_REFERRER= http://www.amazon.com/book_search
`PATH_INFO=/ISBN = 0553562614
`
`370
`
`S20
`
`530
`
`SAO Friday, 13-Feb-98 02:23:52
`
`Friday, 13-Feb-—98 06:15:03
`User Identifier = 54730543261
`HTTP_REFERRER= http://www.amazon.com/music_search_page
`PATH_INFO=/music_search
`artist = This and That
`items_found = 0
`
`Friday, 13-Feb-98 10:07:34
`User Identifier = 027385918272
`HTTP_REFERRER= hitp://www.amazon.com/book_search_page
`PATH_INFO=/book_search
`subject = outdoortrail
`itams_found = 22
`
`FIG. F
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 4 of 10
`
`6,006,225
`
`470
`
`PARSE DAILY LOG FILE
`TO EXTRACT QUERY
`SUBMISSIONS FOR WHICH
`ITEMS FOUND > 0
`
`CREATE DAILY
`RESULTS FILE
`
`440
`
`
`MERGE DAILY
`RESULTS FILES FOR
`LAST M DAYS
`
`
`CORRELATE TERMS BASED
`ON FREQUENCY OF OCCURRENCE
`WITHIN SAME QUERY
`
`
`
`
`
`
`
`
`
`
`
`
`
`OVERWRITE EXISTING CORRELATION
`TABLE WITH NEW CORRELATION TABLE
`
`450
`
`FIG. 4
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 5 of 10
`
`6,006,225
`
`sw4a]N
`
`BLES
`
`NOMNA-L
`
`VGDL/
`
`(1Z)LIaaeav9-V
`
`(£9)qlg—-S
`
`(Z6)XIN-S
`
`(Ss)9ONIZVIE-S
`
`(€z)YOOGLNO-S
`(Zt) S1YOdS-S
`
`(S)9ONIZWIE-L
`(6)NOILVOVA-S
`
`(001)ONINIG~—L
`
`(9¢)NOWVONdI-L
`
`(¢Z)3I8-L
`
`(17) SLYOdS-S
`
`(Sg)TIVYL-S
`(€Z)NOILYOVA-S
`(Ol)Y3NOVM-¥
`
`(Z)Tival-L
`
`(Z1)aSIOY3OX3-S
` (8) SALVA-V(7S)=-aIVd3u—-L
`
`
`($9)4YOoOdLNO-S(yo)=ASSNH—L
`
`
`(71)Tval-L
`
`(1+)WVul-S
`(S)NIMXNVaI-V
`
`WVL-S
`
`(z)NosTYyvo-V
`
`CRS
`
`
`
`
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 6 of 10
`
`6,006,225
`
`GLE1
`
`oeTIVal-S
`
`
`
`suLELINsoe(yz)yooaLno-s(99)jivui-s|(v5)—alvd3u-L
`
`
`
`(79)3xIg-S=(S)NIOMNVYI-V
`
`(se)9NIZzvIa-Srh(z)NoSTayo-¥
`(ZL)SLYOdS-S(Z)qIWal-L(Zr)Tval-s
`
`
`
`
`
`
`
`(Z6)XIN-S(17)S1¥OdS-S(99)yOoodLNO-s
`
`
`-(S)ONIZWI€-L(v2)aeek(Z1)asiod¥3ox3-S
`
`(001)ONINIG=L
`
`NONNA-S
`
`(6)NOILYOVA-S
`
`(01) YaNOVM-V
`
`(€zZ)NOILVOVA-S
`
`(71)TWVel-L
`(8)S3LVA-V
`
`GGD7
`
`06S
`
`
`
`SZP/
`
`
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 7 of 10
`
`6,006,225
`
`Date
`
`Daily Log
`File
`
`Daily Results
`File
`
`Dates of Query
`Correlation Table
`
`Query Correlation
`Table
`
`8—-Feb-98I"
`
`1—Feb-98
`
`2—Feb-98
`
`3—Feb-98
`
`4—Feb-98
`
`5—Feb-98
`
`6—-Feb-98
`
`7—Feb-98
`
`670
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 8 of10
`
`6,006,225
`
`770
`
`FOR EACH TERM
`IN THE QUERY
`
`720
`
`
`
`LOOK UP TERM IN THE
`CORRELATION TABLE
`
`
`
`
`
`[enreru
`
`
`
`
`730
`
`RETRIEVE THE TERM’S
`RELATED TERMS LIST
`
`440
`
`7IO
`
`MULTI-TERM
`QUERY
`?
`
`
`
`760
`
`COMBINE ALL RELATED
`TERMS LISTS
`
`SELECT X TERMS WITH
`HIGHEST VALUES
`
`FIG. 7
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 9 of 10
`
`6,006,225
`
`Top 3 Related Terms:
`
`TRAIL — MIX
`
`TRAIL — YUKON 3H
`
`TRAIL — BIKE
`
`E10
`
`F1G.EA
`
`Intersecting Terms:
`
`620
`
`S — BIKE
`S — SPORTS
`S — VACATION
`
`E30
`Pee
`
`Top 3 Related Terms:
`OUTDOOR TRAIL — BIKE
`OUTDOOR TRAIL — SPORTS
`OUTDOOR TRAIL — VACATION
`
`FIG. EB
`
`

`

`U.S. Patent
`
`Dec. 21, 1999
`
`Sheet 10 of 10
`
`6,006,225
`
`900
`
`Address|http: //www.amezon.com/book_search 7]
`
`
`
`
`
`
`
`
`
`File
`Edit
`View
`Go
`Faverite
`Help
`
`
`
`== @H & Qa GAeB@
`
`
`Bock Forw..
`Stop Refresh Home Seerch
`Fovorite
`Print
`Font
`Mail
`
`
`
`
`
`Your Search Results
`
`
`
`
`
`Related Query Terms:Click on a link below to narrow your
`search by adding a related term.
`(
`OUTDOOR TRAIL — BIKE
`
`
`
`i OUTDOOR TRAIL — VACATION
`
`
`
`
`
`
` Outdoor & Trail Guide to Yosemite
`
`amazon.com Book Search
`oe
`
`for: the subject words include "OUTDOOR TRAIL"
`
`#10
`
`OUTDOOR TRAIL — SPORTS
`
`Top Matches for this search:
`
`* Outdoor & Trail Guide to Yosemite
`
`* Outdoor Training on Mountian Bike Trails
`
`Full Results: the First 100 are shown below. to See
`more results scroll down and click the “More” button.
`
`The blessed trail: outdoor diary
`
`
`
`
`T h
`
`G20
`
`e Great Outdoor Cookbook: Trail Recipies
`
`FIG.G
`
`

`

`6,006,225
`
`I
`REFINING SEARCH QUERIES BY THE
`SUGGESTION OF CORRELATED TERMS
`FROM PRIOR SEARCHES
`
`RELATED APPLICATION
`
`This application claims the benefit of U. S. Provisional
`Application No. 60/089,244, filed Jun. 15, 1998, the disela-
`sure of which is hereby incorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of Invention
`
`aq
`
`2
`“RESTAURANTS,” “RECIPIES,” and “FDA”toallow the
`user to reline the query.
`The related query terms are commonly presented to the
`user togeiher with corresponding check boxes that can he
`selectively marked or checked by the user to add terms to the
`query. [In some implementations, the related query terms are
`alternatively presented to and selected by the user through
`drop down menusthat are provided on the query result page.
`In either case, the user can add additional terms to the query
`aod then resubmit the modified query. Using this technique,
`the user can narrow the query result down to a more
`manageable set consisting primarily of relevant items.
`One problem with existing techniques for generating
`related query termsis that the related termsare frequently of
`little or no value to the search refinement process, Another
`problem is that the addition of one or more related terms to
`the query sometimesleacls lo a NULL query result. Another
`problem is thal the process of parsing the query resull items
`to identify frequently used terms consumes significant pro-
`cessor resources, and can appreciably increase the amount of
`lime the user must wait before viewing the query result.
`These and other deficiencies in existing techniques hinder
`the user’s goal of quickly and efficiently locating the most
`relevant items, and cao lead to user frustration,
`
`SUMMARY OF‘THE INVENTION
`
`The present invention addresses these and other problems
`by providing a search refinement system and method for
`generating and displaying related query terms (“related
`terms”). In accordance with the invention, the related terms
`are generating using query term correlation data thal is based
`on historical query submissions to the search engine, The
`query lerm correlation data (“correlation cata”) is preferably
`based at least upon the frequencies with which specific terms
`have historically been submitted together within the same
`query. The incorporation of such historical query inlorma-
`lion into the process tends to produce related terms that are
`frequently used by other users in combination with the
`submitted query terms, and significantly increases the like-
`lihood that these related terms will be helpful to the search
`refinement process. To further increase the likelihood that
`the related terms will be helpful,
`the correlation data is
`preferably generated only from those historical query sub-
`missions that produced a successful query result (at least one
`match).
`the
`In accordance with one aspect of the invention,
`correlation data is stored in a correlation data structure
`(table, database, ete.) which is used to look up related terms
`in response to query submissions. The data structure is
`preferably generated using an off-line process which parses
`a query log file, but could alternatively be generated and
`updated in real-Lime as queries are received from users. Ip
`one embodiment, the data structure is regenerated periodi-
`cally (e.g. once per day)
`from the most
`recent query
`submussions (e.g., the last M days of entries in the query
`log), and thus strongly reflects the current lasles of the
`community ofusers; as a result, the related terms suggested
`by the search engine strongly reflect the current tastes of the
`community. Thus, for example,
`in the context of a search
`engine of an online merchant, the search engine tends to
`suggest related terms that correspond to the current best-
`selling products.
`In a preferred embodiment, each entry in the data struc-
`ture is in the form of a key term and a corresponding related
`termslist. Each related terms list contains the terms which
`have historically appeared together (in the same query) with
`
`invention relates to query processing, and
`This present
`more specifically relates to techniques for facilitating the
`process ofrefining search queries,
`2, Description of Related Art
`With the increasing popularity of the [nternet and the
`World Wide Web, it is common for on-line users to ulilize
`search engines to search the Internet for desired information.
`Many websites permit users to perform searches to identify
`a small number of relevant
`items among a much larger
`domain of items. As an example, several web index sites
`permit users to search for particular web siles among known
`web sites. Similarly, many on-line merchants, such as
`booksellers, permit users to search for particular products
`among, all of the products thal can be purchased from the
`merchant. Other on-line services, such as Lexis™ and
`Westlaw™), allow users to search for various articles and
`court opinions.
`In order to perform a search, a user submils a query
`containing one or more query terms. The query may also
`explicitly or implicitly identify a record field or segment to
`be searched, such as title, author, or subject classification of
`the item. For example, a user of an on-line bookstore may 2
`submit a query centaining terms that
`the user believes
`appear within the title of a book. A query server program of
`the search engine processes the query to identify any items
`that match the terms of the query. The set of items identified
`by the query server programis referred to as a “query
`result.” In the on-line bookstore example, the query result is
`a sel of books whose litles contain someor all of the query
`terms. In the web index site example, ihe query resull is a set
`of web sites or documents. In web-based implementations,
`the query result
`is typically presenied to the user as a
`hypertextual listing of the located items.
`If the scope of the search is large, the query result may
`contain hundreds, thousands or even millions of items. Ifthe
`user is performing the search in order to find a single item
`ora small set of items, conventional approaches to ordering
`the items within the query result often fail
`to place the
`sought item or items near the top of the query result list. This
`requires the user lo read through many other items in the
`query result before reaching the sought item. Certain search
`engines, such as Excite™ and AlltaVista™, suggest related
`query terms to the user as a part of the “search refinement”
`process. This allows the user to further refine the query and
`narrow the query result by selecting one or more related
`query terms that more accurately reflect the user's intended
`request. The related query terms are typically generated by
`ihe search engine using the contents of the query result, such
`as by identifying the most frequently used terms within the
`located documents. For example, if a user were to submit a
`query on the term “FOOD,” the user may receive several
`thousand items in the query result. The search engine might
`then tracethrough the contents of some or all of these items
`and present
`the user with related query terms such as
`
`40
`
`50)
`
`55
`
`6
`
`65
`
`

`

`6,006,225
`
`3
`the respective key term with the highest degree of frequency,
`ignoring unsuccessful query submissions (query submis-
`sions which produced a NULI. query result), The data
`structure thus provides an efficient mechanism for looking
`up the related terms for a grven query term.
`‘To generate a set ofrelated termsfor refining a submitied
`query (the “present query”), the related terms List for each
`term in the present query is initially obtained from the
`correlation data structure. If this step produces multiple
`related termslists (as in the case of a multiple-term query),
`the related terms lists are preferably combined by taking the
`intersection between these lists (i.¢., deleting the terms that
`are not common to all lists). The related terms which remain
`are terms which have previously appeared,
`in at least one
`successful query submission,
`in combination with every
`term ofthe present query. Thus, assuming items have not
`been deleted from the database being searched, any ofthese
`related terms can be individually added to the present query
`while guaranteeing that the modified query will not produce
`a NULL query resull. To take advantage ofthis feature, the
`related terms are preferably presented to the user via a user
`interface that requires the user to add no more than one
`related term per query submission. In other embodiment, the
`related terms are selected anddisplayed wilhout guarantee-
`ing a successful query result.
`Because the related terms are identified from previously-
`generated correlation data, without the need fo parse docu-
`menis or correlate terms, the related terms can be identified
`and presented to the user with little or no added delay,
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`2Q
`
`These and other features will now be described with
`reference to the drawings summarized below, These draw-
`ings and the associated description are provided to illustrate;
`a preferred embodiment of the invention, and not to limit the
`scope of the invention.
`Throughout the drawings, reference numbers are re-used
`to indicate correspondence between referenced elements. In
`addition, the first digit of each reference number indicates
`the figure in which the element first appears.
`FIG. 1 illustrates a system in which users access website
`information via the Internet, and illustrates the basic website
`components used to implement a search engine which
`operates in accordance with the invention,
`FIG, 2 illustrates a sample book search page of the web
`sile.
`
`40
`
`FIG. 3 illustrates sample log entries of a daily query log
`file.
`
`FIG. 4 illustrates the process used to generate the corre-
`lation table of PIG. 1.
`
`PIG, 5A illustrates a sample mapping before a query is
`added.
`
`PIG, 5B illustrates a sample mapping aller a query is
`added.
`
`FIG. 6 illustrates a process for generating the correlation
`table from the most recent daily query log files.
`FIG, 7 illustrates a process for selecting the related query
`lerms from the correlation table.
`
`FIG. SA illustrates a set of related query terms from a
`single-lerm query.
`FIG. 8B illustrates a set of intersecting terms and a set of
`related query terms from a multiple-terrn query.
`FIG. 9 illustrates a sample search result page of the web
`site.
`
`SU)
`
`55
`
`60
`
`65
`
`4
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`The present invention provides a search refinement sys-
`tem and method for generating related query terms (“related
`terms”) using a history of querics submitted to a search
`cogine by a community of users. Briefly, the system gener-
`ales query term correlation data which reflects the frequency
`with whichspecific terms have previously occurred together
`within the same query. The system uses the query term
`correlation data in combination with the query term(s)
`entered by the user to recommend additional query terms for
`refining the query. The incorporation of such historical query
`information into the process tends to produce related terms
`ihal are frequently used by other users in combination wilh
`the submitted query terms, and signilicantly increases the
`likelihood that these related terms will be helpful to the
`search refinement process. To further increase the likelihood
`that the related terms will be helpful, the correlation data is
`preferably generated only from those historical query sub-
`missions that produced a successful query result (at leasi one
`match).
`In the preferred embodiment, the query lerm correlation
`date is regenerated periodically from recent query
`submissions, such as by using the last M days of entries in
`a query log, and thus heavily reflects the current tastes of
`users. As a result, the related terms suggested by the search
`engine tend to be terms that correspond to the most [re-
`quently searched items during the relevant
`time period,
`Thus, for example, in the context of a search engine af an
`online merchant, the search engine tends to suggest related
`terms that correspond to the current best-selling products. In
`one embodiment, the technique used to generate the related
`terms and present these terms to the user guaranices that the
`modified query will not produce a NULL query result.
`‘The search relinement methods of the invention may be
`implemented, for example, as part of a web site, an Internet
`sile, an on-line services network,
`a document
`retrieval
`sysiem, or any other type of compuler system that provides
`searching capabilities to a community of users. In addition,
`the method may be combined with other methods for
`suggesting related terms, such as methods which process the
`contents of located documenis.
`
`A preferred web-based implementation of the search
`refinement system will now be described with reference to
`FIGS. 1-9. For purposes of illustration,
`the system is
`described herein in the context of a search engine that is used
`lo assist customers of Amazon.com Inc.
`in locating items
`(e.g., books, CDs, etc.) from an on-line catalog of products.
`Throughout the description, reference will be made to vari-
`ous implementation-specific details of the Amazon.com
`implementation, These details are providedin order to fully
`illustrate a preferred embodiment of the invention, and not
`to limit
`the scope of ihe invention. The scope of the
`invention is set forth in the appended claims.
`
`L. Overview of Web Site and Search Engine
`FIG. Lillustrates the Amazon.com web site 130, including
`components used to implement a search engine in aceor-
`dance with the invention.
`As it is well known in the art of Internet commerce, the
`Amazon.com web site includes functionality for allowing
`users to search, browse, and make purchases from an online
`catalog of book titles, music titles, and other types of items
`via the Internet 120. Because the catalog contains millions
`of items,
`it is important that the site provide an efficient
`mechanismfor assisting users in locating items.
`
`

`

`6,006,225
`
`5
`Asillustrated by FIG. 1, the website 130 includes a web
`server application 131 (“web server") which processes user
`requests received from user computers 110 via the Internet
`120. These requests include queries submitied by users to
`search the on-line catalog for products. The web server 131
`records the user transactions, including query submissions,
`within a query log 135. In the embodiment depicted in FIG,
`1, the query log 135 consists of a sequence of daily query log
`files 135(1)}-135(M), each of which represents one day of
`transactions.
`
`The website 130 also includes a query server 132 which
`processes the queries by searching a bibliographic database
`133, The bibliographic database 133 includes information
`about the various products thal users may purchase through
`the web site 130, This information includes, for example, the
`litles, authors, publishers, subject descriptions, and ISBNs
`(International Standard Book Numbers) of book titles, and
`the titles, artists, labels, and music classifications of music
`utles. The information lor cach item is arranged within fields
`(such as an “author” field and a “title” field), enabling the
`bibliographic database 133 to be searched on a
`field-
`restricted basis. The site also includes a database 134 of
`HTML (Hypertext Markup Language) content which
`includes, among other things, product
`information pages
`which show and describe the various products.
`The query server 132 includes a related term selection
`process 139 which identifies related query terms based on
`query term correlation data stored in a correlation table 137.
`As depicted in FIG, 1 and described below, ihe correlation
`table 137 is generated periodically from the M most recent
`daily query log files 135(1)-135(M) using an off-line table
`generation process 136.
`The web server 131, query server 132, table generation
`process 136, and database software run on one or more
`Unix'™-based servers and workstations (not shown) of the
`web site 130 although other types of platforms could be
`used. The correlation table 137 is preferably cached inRAM
`(random access memory) on the same physical machine as
`that used to implement the query server 132. To accommo-
`date large numbers of users, they query server 132 and the
`correlation table 137 can be replicated across. multiple
`machines. The web site components that are invoked during
`the searching process are collectively referred to herein as a
`“search engine.”
`FIG. 2 illustrates the general format of a book search page
`200 of the Amazon.com website 130 that can be used to
`searchthe bibliographic database 133 for book titles. Users
`have access to other search pages thal can be used to locate
`music titles and other types of products sold by the on-line
`merchant. The book search page 200 includes author, lille,
`aod subjectfields 210, 220, 240 and associated controls that
`allow the user to initiate ficldrestricted searches for book
`lites. Users can perform searches by first
`typing in the
`desired information into a search field 210, 220, 240 and
`then clicking on the appropriate search button 230, 250. The
`term or string of terms submitted to the search engine is
`referred to herein as the “query.” Other areas of the web site
`ask the user to submit queries without limiting the terms to
`specific fields.
`When the user submits a query from the book search page
`200io the web site 130, a the query sever 132 applies the
`query to the bibliographic database 133, raking into account
`any field restrictions within the query. If the query result is
`a single item, the ilem’s produc! information page is pre-
`sented to the user. If the query result includes multiple items,
`the list of items is presented to the user through a query
`
`20
`
`45
`
`SU)
`
`55
`
`60
`
`65
`
`6
`result page which contains hypertextual links to the items’
`respective product information pages.
`For mulliple-lerm queries, the query server 132 effec-
`tively logically ANDs the query terms together to perform
`the search. For example, if the user enters the terms “JAVA”
`and "PROGRAMMING" into the title field 220, the query
`server 132 will search for and return a list of all ttems that
`have both of these terms within the title. Thus, if any query
`term does nat produce a match (referred to herein as a
`“pon-matching term”),
`the query will produce a NULL
`query result. Presenting a NULL query result to the user can
`cause significant user rustration. To reduce this problem, in
`this event, the user may be presented with a list of items that
`are deemed to be “close matches,” Although the search
`engine described herein logically ANDs the query terms
`together,
`it will be recognized that
`the invention can be
`applied to search engines that use other methods for pro-
`cessing queries.
`Tn accordance with the invention, the search engine uses
`the query term correlation data stored in the correlation table
`137 to select the related terms that best match the user's
`query. The search engine then presents the related terms to
`the user, allowing the user lo refine the search and enhance
`discovery of relevant information, The query term correla-
`tion data indicates relationships between query terms, andis
`used to effectively predict query terms thai are likely to be
`helpful to the search refinement process. In accordance with
`another aspect of the invention, the correlation table 137
`preferably contains or reflects historical information about
`the frequencies with which specific query terms have
`appeared together within the same query.
`‘The general format of ithe correlation table 137 is illus-
`trated in FIG. L. In the embodiment depicted in FIG. 1 and
`described in detail herein, the correlations between query
`terms are based solely on frequency of occurrence within the
`same query. As described below, other types of query tern
`correlations can additionally be used. In addition, although
`the disclosed implementation uses a table to store the query
`term correlation data, other types of databases can be used,
`As illustrated by PIG. 1, each entry within the correlation
`table 137 (two entries shown) has two primary components:
`a“key” term 140, and a “related terms”list 142 for that key
`term, The related termslist 142 is a list of the N (e.g. 50)
`query terms that have appeared within the same query as the
`keyword with the highest degree of frequency, and is
`ordered according to frequency, Por example, the entry lor
`the key term COSMOS(ignoring the single-lerm prefixes,
`which are discussed below) is:
`COSMOS: ASTRONOMY, SAGAN, UNIVERSE,
`indicating that ASTRONOMY has appeared together wilh
`COSMOS with the highest degree of frequency; SAGAN
`has appeared with COSMOSwith ihe second highest degree
`of frequency, and so on. Each term that appears within the
`related terms list 142 is deemedto be related to the corre-
`sponding key term 140 by virtue of the relatively high
`frequency with which the terms have oecurred within the
`same query,
`As further depicted by FIG. 1, cach related term and each
`key term 140 preferably includes a single-charaeter field
`prefix which indicates the search ficld 210, 220, 240 to
`which the term corresponds, These prelixes may,
`for
`example, be as follows; A=author, T'=tithe, S=subject,
`R=arlist, L=label, G=generic. In addition, each related term
`is stored together with a correlation score 146 which, in the
`preferred embodiment,
`indicates the number of times the
`related term has appeared in combination with the key term
`
`

`

`6,006,225
`
`20)
`
`2
`
`40
`
`7
`(within the search fields indicated by their respective field
`prelixes), nol counting queries that produced a NULL query
`resull.
`the related term (including prefix)
`for example,
`Thus,
`5-ASTRONOMY has a correlation score of 410 under the
`key term 1-COSMOS,indicating that four hundred and ten
`“successful” queries were received (during the time period
`to which the table 137 corresponds) which included the
`combination of COSMOS in the tithe
`field and
`ASTRONOMYin the subject
`field. Although the field
`prefixes and correlation scores 146 carry information which
`is useful to the related terms selection process (as described
`below), such information need not be preserved,
`In opera

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket