Kurtzman, II

US006044376A
[11] Patent Number: 6,044,376
[45] Date of Patent: Mar. 28, 2000

[54] CONTENT STREAM ANALYSIS

[75] Inventor: Stephen J. Kurtzman, II, San Jose, Calif.

[73] Assignee: IMGIS, Inc., Cupertino, Calif.

[21] Appl. No.: 08/847,778
[22] Filed: Apr. 24, 1997

[51] Int. Cl.7: G06F 17/30
[52] U.S. Cl.: 707/102; 707/10; 707/100; 707/103; 345/327
[58] Field of Search: 707/4, 10, 100, 102, 5, 6, 14; 345/327; 705/10; 395/226; 455/42

[56] References Cited

U.S. PATENT DOCUMENTS
5,278,980    1/1994   Pedersen et al.       707/4
5,692,132   11/1997   Hogan                 395/227
5,696,965   12/1997   Dedrick               707/10
5,710,887    1/1998   Chelliah et al.       395/226
5,712,979    1/1998   Graber et al.         395/200.11
5,717,860    2/1998   Graber et al.         395/200.12
5,717,923    2/1998   Dedrick               707/102
5,721,827    2/1998   Logan et al.          395/200.47
5,724,424    3/1998   Gifford               380/24
5,727,156    3/1998   Herr-Hoyman et al.    395/200.49
5,737,619    4/1998   Judson                395/761
5,740,549    4/1998   Reilly                395/214
5,751,956    5/1998   Kirsch                395/200.33
5,757,917    5/1998   Rose et al.           380/25
5,781,904    7/1998   Oren et al.           707/100
5,794,235    8/1998   Chess                 707/5
5,802,515    9/1998   Adar et al.           707/5
5,835,087   11/1998   Herz et al.           345/327
5,848,396   12/1998   Gerace                705/10
5,948,061    9/1999   Merriman et al.       709/219

FOREIGN PATENT DOCUMENTS
0 749 081 A1   12/1996   European Pat. Off.   G06F 17/60
WO 97/21183     6/1997   WIPO                 G06F 151/00
`
`OTHER PUBLICATIONS
`
Article by Ellis Booker entitled: "Seeing a Gap, A Palo Alto
Startup Will Debut Advertising Server for the Net" published
by Web Week, Vol. 2, Issue 2 in Feb. 1996.
Article by Bob Metcalfe entitled: "From the Ether", InfoWorld,
V18 issue 3, Aug. 12, 1996.
NetGravity Announces AdServer 2.0, Oct. 14, 1996, published
at http://www.netgravity.com
Article titled "Internet access: Internet marketing revolution
begins in the U.S. this Sep.": Hyper Net Offering on Dec.
1996 EDGE: Work-group Computer Report, V7 N316.
Article by Youji Kohda et al. entitled: "Ubiquitous advertising
on the WWW: Merging advertisement on the browser" published
by Computer Networks and ISDN Systems, Vol. 28, No. 11,
May 1996, pp. 1493-1499.
Declaration of Dwight Allen Merriman submitted with 37
CFR 1.131 petition during prosecution of U.S. Patent No.
5,948,061.
`
Primary Examiner: Wayne Amsbury
Assistant Examiner: Thuy Pardo
Attorney, Agent, or Firm: Fenwick & West LLP
`
`[57]
`
`ABSTRACT
`
Content stream analysis is a user profiling technique that
generates a user profile based on the content files selected
and viewed by a user. This user profile can then be used to
help select an advertisement or other media presentation to be
shown to the user.
`
`10 Claims, 5 Drawing Sheets
`
[Representative drawing: flowchart of determining an affinity
measure. Steps: create ad feature vectors; create content feature
vectors; determine similarity measures; multiply similarity
measures by decay factor; sum similarity measures.]

`
`
[Drawing sheet 2 of 5, FIG. 2: schematic of the user system and
the website system. User system: user 200, display 202, input
devices 204, application memory (browser) 206, CPU 208,
modem/network 210, working memory. WWW 220 connects the two
systems. Website system: website server 110, website corpus 230,
affinity server 100, ad bank 120, working memory 240, application
memory 242, affinity server instructions 246.]

`
[Drawing sheet 3 of 5. FIG. 3: website server 110, affinity
server instructions 246, HTTP and TCP/IP 330, socket 340,
affinity server 100, content stream analysis 170, network
hardware 350, user 200. FIG. 4: user 200, current page 410,
new page 420, dynamically generated page 430, website server 110,
affinity server 100.]

`
[Drawing sheet 5 of 5, FIG. 7: flowchart. Advertisement branch:
convert ads into individual words (702); discard HTML formatting
tags (704); discard stop words (706); apply stemming procedure
(708); determine frequencies of each word/word-stem (710); create
multi-dimensional vectors of word/frequency pairs (712). Site
corpus branch: receive site corpus (720); convert content files
into individual words (722); discard HTML formatting tags (724);
discard stop words (726); apply stemming procedure (728);
determine number of files in which each word/word-stem occurs
(730). Decision: modify ad vector? If so, use word frequency
statistics to modify the ad feature vector.]

`
`CONTENT STREAM ANALYSIS
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
This invention relates to a method of selecting an advertisement
to be shown to a user based on the content files selected and
viewed by the user. More particularly, this invention relates to
determining an affinity measure between an advertisement and a
set of content files.
`2. Background of the Invention
Product advertisements in media such as newspapers and
television have the advantage of reaching many people. At
the same time, these forms of advertisement are indiscriminate
and may reach many people who are not interested in
the product advertised.
An advertisement is more effective when it can be targeted
to a specific market that is more likely to be interested
in the product advertised. For example, advertisements for
fishing equipment will be more effective when placed in a
fishing magazine.
On the World-Wide Web (WWW), advertisers can target
specific markets with more discrimination than in other media.
The manner in which content is presented on the WWW
means that advertisers can reach increasingly well-defined
segments of the market. For example, a high percentage of
people who access a stock quotes WWW page may be
interested in a stock broker. A stock broker who places an
advertisement on this WWW page may reach a smaller
group of people, but a much higher percentage of this group
will be potential customers. This is in stark contrast to other
media such as newspaper and television, in which the target
market may only be a small percentage of the total market
reached.
Other media, including emerging and developing technologies
such as on-demand television, will also give advertisers
similar ability to target specific markets.
To take advantage of this ability to target specific markets
on the WWW, advertisers often estimate a user's interests
using a variety of profiling techniques. These profiling
techniques can help an advertiser to select an advertisement
to present to the user. Current profiling techniques use a
combination of demographic, geographic, psychographic,
collaborative filtering, digital identification, and hypertext
transfer protocol (HTTP) information. However, these current
techniques have met with only limited success.
What is needed is a more sophisticated profiling technique
for generating a more useful user profile. This more useful
user profile would be valuable in selecting an advertisement
to be shown to the user.
`
`OBJECTS AND SUMMARY OF THE
`INVENTION
`
Accordingly, an object of the invention is to provide a
more sophisticated profiling technique for generating a more
useful user profile.
A further object of the invention is to use this user profile
to help select an advertisement or other media presentation
to be shown to the user.
These and other objects of the invention are achieved by
using the actual content files accessed and viewed by the
user. These content files may be used alone or in combination
with the other elements known in the prior art to help
select an advertisement or other media presentation to be
shown to the user. This selection process is performed by an
affinity server.
`
First, the affinity server receives both the content files and
the available advertisements. Second, the advertisements are
compactly represented as advertisement feature vectors. In
one example, advertisement feature vectors are multi-dimensional
vectors comprised of individual words mapped
to their frequency of occurrence. The advertisement feature
vectors may be modified by weighting the importance of
each word in the context of the website corpus.
Next, a content stream including a sequence of one or
more pages selected and viewed by the user and including
content data is also compactly represented in a sequence of
content feature vectors.
Lastly, the affinity is calculated. This is done by calculating
similarity measures between each advertisement and
the content stream. An affinity measure is obtained by
combining the similarities. This affinity measure is then used
to help select an advertisement to be shown to a user.
The method described by this invention can also be
applied to user-feedback media other than the WWW, such
as broadcast television or interactive television. For
example, content streams can be created from the television
program content, such as reflected in closed caption text,
length of time viewed, and how recently the show was
viewed. These content streams can then be used in the
method described above to select a commercial to be shown
to the viewer. The method described can also target material
other than advertising, such as entertainment, education, and
instructional materials.
`
`BRIEF DESCRIPTION OF THE FIGURES
`
FIG. 1 shows a conceptual view of content stream analysis.
FIG. 2 shows a schematic of a user and a computer
connected to a website server which contains the content
stream analysis capability.
FIG. 3 shows a schematic of how the content stream is
directed.
FIG. 4 shows a schematic of how content stream analysis is
performed for a dynamically generated page.
FIG. 5 shows a flowchart of content stream analysis.
FIG. 6 shows a flowchart of determining an affinity
measure.
FIG. 7 shows a flowchart of creating an advertisement
feature vector.
FIG. 8 shows a sample advertisement feature vector.
FIG. 9 shows a flowchart of creating a content feature
vector.
`
`DETAILED DESCRIPTION
Referring to the figures, FIG. 1 is a conceptual diagram
placing content stream analysis in the context of its
environment. Requests for advertisements are received by the
website server 110. The website server 110 sends these
requests to the affinity server 100.
The affinity server 100 receives requests and selects an
advertisement. The affinity server 100 has access to an
advertisement bank 120. The advertisement bank 120 contains
advertisements selected and controlled by the advertisement
manager 130.
The affinity server 100 uses a combination of procedures
to select an advertisement, including sponsorship categories
140, ad inventory 150, and user profiling 160.
Sponsorship categories 140 include page, keyword, and
floating advertisements. Page sponsorship is an advertisement
anchored to a location on a particular page, typically
in a prominent position. Keyword sponsorship refers to
showing an advertisement in response to keywords the user
has entered to perform a search or other query. Floating
advertisements are not anchored, and may appear anywhere
on the page.
Ad inventory 150 uses impression, freshness, time/day,
and sequence techniques. Impression refers to the number of
times an advertisement is shown to all users. Freshness
refers to the number of times an advertisement is shown to
a particular user, and how soon the advertisement may be
shown again and how many times the advertisement may be
shown without losing effectiveness. Time/day techniques
refer to selecting an advertisement based on the time and
day, e.g. showing a fast food advertisement immediately
before lunch time. Sequence techniques refer to showing a
sequence of advertisements which form a unified
presentation, e.g. a first brand-awareness advertisement, a
second product-specific advertisement, and a final
where-to-buy advertisement.
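As a rough illustration of the freshness and sequence techniques described above, the following Python sketch filters out advertisements a user has already seen too often and picks the next advertisement in a three-part sequence. The function names, the per-user cap of three showings, and the advertisement identifiers are assumptions made for this example; the description does not specify them.

```python
def fresh_ads(ad_ids, times_shown_to_user, max_showings=3):
    """Keep ads shown to this user fewer than max_showings times (assumed cap)."""
    return [ad for ad in ad_ids if times_shown_to_user.get(ad, 0) < max_showings]


def next_in_sequence(sequence, already_seen):
    """Return the next ad of a unified presentation, or None when it is finished."""
    return sequence[already_seen] if already_seen < len(sequence) else None


# Hypothetical inventory and per-user showing counts.
shown = {"soda": 3, "burger": 1}
eligible = fresh_ads(["soda", "burger", "pizza"], shown)

# The three-step sequence named in the text above.
campaign = ["brand-awareness", "product-specific", "where-to-buy"]
second = next_in_sequence(campaign, already_seen=1)
print(eligible, second)
```

A real ad inventory would presumably combine these checks with impression counts and time/day rules; the sketch only shows the per-user bookkeeping.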
User profiling 160 uses content stream analysis 170, as
well as demographic, geographic, psychographic, digital
identification, and HTTP information. Content stream analysis
170 refers to the particular pages selected and viewed by
the user. Demographic information refers to factors such as
income, gender, age, and race. Geographic information
refers to where the user lives. Psychographic information
refers to user responses to a questionnaire. Digital
identification information refers to user domain, browser,
operating system, and hardware information. HTTP information
refers to transfer protocol information.
FIG. 2 shows a display 202, input devices 204, and a
browser 206, all of which allow a user 200 to interact with
a CPU 208. The CPU 208 is connected through a modem or
network connection 210 to the WWW 220. The WWW 220
allows user 200 to send instructions through browser 206 to
the website server 110.
The website server 110 controls a website corpus 230,
made up of numerous website files. The website server 110
uses a working memory 240 and an application memory
242. The application memory 242 contains the instructions
246 to use the affinity server 100.
The website server 110 receives instructions from the user
200 through the WWW 220. The user 200 instructs the
website server 110 to access the website corpus 230 and
retrieve and transmit specific website files. These specific
files selected and viewed by the user 200 are recorded by the
affinity server 100. The content stream to be analyzed
includes the specific files selected and viewed by the user.
FIG. 3 shows one example of how the content stream is
directed. After receiving instructions, the website server 110
uses instructions 246 to send the files 320 through the
protocol stack 330 and network hardware 350 to the user
200. Preferably at the same time, the website server 110 also
sends the files 320 through a socket 340 to the affinity server
100, where content stream analysis 170 is performed.
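The data flow of FIG. 3 can be pictured with a small Python sketch in which the same page bytes are written to the user's connection and to a second socket feeding the affinity server. This is only an illustration under stated assumptions: the function names are hypothetical, and in-process socket pairs stand in for the network hardware and for the socket to the affinity server.

```python
import socket


def send_page(page_bytes, user_conn, affinity_conn):
    """Send one page to the user and a copy to the affinity server."""
    user_conn.sendall(page_bytes)
    affinity_conn.sendall(page_bytes)


def read_all(conn):
    """Read from a connection until the peer closes it."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


# In-process socket pairs stand in for the network and the affinity socket.
user_end, server_to_user = socket.socketpair()
affinity_end, server_to_affinity = socket.socketpair()

page = b"<html><title>Stock quotes</title></html>"
send_page(page, server_to_user, server_to_affinity)
server_to_user.close()        # closing signals end-of-stream to each reader
server_to_affinity.close()

user_copy = read_all(user_end)
affinity_copy = read_all(affinity_end)
user_end.close()
affinity_end.close()
```

Both readers end up with identical bytes, which is the property the affinity server relies on to record the files the user actually viewed.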
FIG. 4 shows how a page may be dynamically generated
using content stream analysis. The user 200 views a current
page 410, which contains links to other pages. When the user
decides to follow a link leading to another page, the website
server 110 retrieves the new page 420 and sends it to the
affinity server 100. The affinity server 100 then selects an
advertisement. This advertisement is sent back to the website
server 110, where it is associated with the new page 420
and sent to the user 200, where the advertisement and the
new page 420 comprise a dynamically generated page 430.
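One way to picture the association of the selected advertisement with the new page is the short Python sketch below. It assumes, purely for illustration, that the advertisement is a fragment of HTML placed just before the closing body tag; the function name and the placement rule are not taken from the description.

```python
import re

# Matches a closing body tag regardless of letter case.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def build_dynamic_page(new_page_html, ad_html):
    """Insert ad_html before the last closing body tag, or append it if none."""
    matches = list(_BODY_CLOSE.finditer(new_page_html))
    if not matches:
        return new_page_html + ad_html
    cut = matches[-1].start()
    return new_page_html[:cut] + ad_html + new_page_html[cut:]


new_page = "<html><body><p>News</p></body></html>"
ad = "<div>AD</div>"
dynamic_page = build_dynamic_page(new_page, ad)
print(dynamic_page)
```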
FIG. 5 is a flowchart of content stream analysis 170,
which involves: (1) receiving a group of advertisements
from an advertisement bank (block 510); (2) receiving a
content stream (block 520); (3) determining an affinity
measure between each advertisement and the content stream
(block 530); and (4) selecting and presenting an advertisement
to the user, based wholly or partially upon these affinity
measures (block 540).
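The final selection step (block 540) can be sketched in Python as picking the advertisement with the best score. The blending rule below is an assumption used only to illustrate "wholly or partially": with `affinity_weight=1.0` the choice rests on affinity alone, while smaller values mix in a hypothetical `other_scores` table standing in for sponsorship or inventory rules.

```python
def select_ad(affinities, other_scores=None, affinity_weight=1.0):
    """Return the ad id with the best blended score (illustrative rule)."""
    other_scores = other_scores or {}

    def blended(ad_id):
        other = other_scores.get(ad_id, 0.0)
        return affinity_weight * affinities[ad_id] + (1.0 - affinity_weight) * other

    return max(affinities, key=blended)


affinities = {"fishing-rod": 2.25, "stock-broker": 0.5}  # made-up affinity measures
best = select_ad(affinities)
blended_best = select_ad(affinities, {"stock-broker": 10.0}, affinity_weight=0.5)
print(best, blended_best)
```

Selecting on affinity alone picks the fishing advertisement here; giving half the weight to the hypothetical other score flips the choice to the stock broker.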
FIG. 6 shows the determination of an affinity measure
between an advertisement and a content stream (block 610).
This involves: (1) creating an advertisement feature vector
for each advertisement (block 620); (2) creating a content
feature vector for each content file in the content stream
(block 630); (3) determining a similarity measure between
the advertisement feature vector and the content feature
vectors (block 640); (4) multiplying the similarity
measures by a decay factor (block 66); and (5) summing the
similarity measures (block 650).
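The five steps can be sketched in Python as follows. The similarity is the dot product that the description gives later for an advertisement feature vector and a content feature vector. Two things are assumptions of this sketch rather than statements from the patent: the decay is applied as a power of the decay factor, so the i-th most recent page is weighted by `decay ** i`, and the value 0.5 is only a placeholder. The feature vectors are hand-made word counts.

```python
from collections import Counter


def similarity(ad_vec, content_vec):
    """Dot product of two sparse word -> weight vectors."""
    # Iterate over the smaller vector; absent words contribute zero.
    if len(ad_vec) > len(content_vec):
        ad_vec, content_vec = content_vec, ad_vec
    return sum(w * content_vec.get(term, 0) for term, w in ad_vec.items())


def affinity(content_stream, ad_vec, decay=0.5):
    """Sum of decayed similarities; content_stream[0] is the most recent page."""
    return sum((decay ** i) * similarity(ad_vec, v)
               for i, v in enumerate(content_stream))


# Hypothetical content stream, most recent page first.
stream = [Counter({"fish": 2, "rod": 1}),
          Counter({"stock": 3}),
          Counter({"fish": 1})]
ad = Counter({"fish": 1, "reel": 1})

score = affinity(stream, ad)
print(score)  # 1*2 + 0.5*0 + 0.25*1 = 2.25
```

Older pages still contribute, but less: the oldest fishing page adds only 0.25 to the total, against 2 for the page the user viewed most recently.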
FIG. 7 shows the creation of an advertisement feature
vector (block 610). First, an advertisement is converted into
individual words (block 702). Text data may be parsed into
its individual words, while voice data may require automated
voice recognition and transcription to be converted
into its individual words.
Words which are deemed insignificant for discerning the
content of the advertisement are discarded. Discarded words
include formatting codes, such as those which occur inside
hypertext markup language (HTML) formatting tags, e.g.
<title> and <bold> (block 704). The HTML standard is
available at the World Wide Web Consortium website
(http://www.w3.org/pub/WWW/) and is incorporated by reference.
Discarded words also include stop words, e.g. articles,
prepositions, and common adjectives, adverbs, and verbs
(block 706). Words which are deemed particularly significant
may be given extra weight, e.g. words labeled by the
HTML <meta keyword> or <title> tags.
Next, the individual words are passed through a stemming
procedure to obtain words and word-stems (block 708). This
is done to map all words with a common meaning to the
same word. For example, a stemming procedure might map
the words nation, national, and nationally to the stem "nati."
The book "Information Retrieval" by William Frakes and
Ricardo Baeza-Yates, eds., Prentice Hall, 1992, is incorporated
by reference as one example of a stemming procedure.
The stemming procedure used is a modified version of the
procedure found in Frakes, et al. This modified version adds
new rules for inferring suffixes, and also contains a word
prefix processing scheme. The modified version recognizes
when a word begins with a common prefix, and removes the
prefix before the stemming process is applied. After the
stemming process is complete, the prefix is added back on
to the word. This improves the accuracy of the stemming
process, as words that incorrectly stem to the same word
under the original procedure no longer do so.
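The prefix handling can be illustrated with a small Python wrapper around any stemming function: strip a recognized prefix, stem the remainder, then put the prefix back. Everything specific here is an assumption for the sake of the example. The prefix list, the minimum root length, and the toy suffix stripper are hypothetical and are neither the Frakes procedure nor the modified rules of the patent; the toy stemmer yields "nation" rather than the stem "nati" mentioned above.

```python
COMMON_PREFIXES = ("inter", "multi", "over", "un", "re")  # illustrative list only


def suffix_stem(word):
    """Toy suffix stripper standing in for a real stemming procedure."""
    for suffix in ("ally", "al", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def prefix_aware_stem(word, stem_fn, prefixes=COMMON_PREFIXES, min_root=3):
    """Remove a common prefix, stem the rest, then add the prefix back."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_root:
            return prefix + stem_fn(word[len(prefix):])
    return stem_fn(word)


for w in ("nationally", "international", "unit"):
    print(w, "->", prefix_aware_stem(w, suffix_stem))
```

Because the prefix is restored after stemming, "international" and "nationally" keep distinct stems, and short words such as "unit" are left alone rather than being split at "un".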
After the stemming procedure, the frequencies of each
word and word-stem are determined (block 710). Finally,
these frequencies are paired with the words and word-stems
to create a multi-dimensional vector (block 712). This
multi-dimensional vector is known as an advertisement
feature vector.
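The steps of blocks 702 through 712 can be put together in a minimal Python sketch that turns a piece of HTML into a word-to-frequency mapping. The stop-word list and the toy stemmer are placeholders chosen for this example, not the lists or rules used by the patent, and the sketch does not give extra weight to title or meta keyword text.

```python
import re
from collections import Counter
from html.parser import HTMLParser

STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "is"}  # illustrative


class _TextExtractor(HTMLParser):
    """Collects text between tags, so words inside the tags are discarded."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def toy_stem(word):
    """Toy suffix stripper standing in for a real stemming procedure."""
    for suffix in ("ally", "al", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def feature_vector(html_text):
    """Map each word or word-stem to its frequency of occurrence."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    stems = [toy_stem(w) for w in words if w not in STOP_WORDS]
    return Counter(stems)


ad_vector = feature_vector("<title>Fishing rods</title><b>The best rods for fishing</b>")
print(ad_vector)
```

For the sample advertisement, the stems "fish" and "rod" each occur twice and "best" once; the stop words "the" and "for" and the tag names never reach the vector.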
The advertisement feature vector may be modified using
an inverse, logarithmic, document-frequency measure
derived from word frequency statistics (block 714). One
embodiment of the document-frequency measure is the
following:

$$
\begin{cases}
0 & \text{if } f = 0 \\
\dfrac{n}{2m} + \log\left(\dfrac{d}{f}\right) & \text{for } f > 0
\end{cases}
$$

where,
n is the number of occurrences of a particular word within the
advertisement
m is the maximum number of words in the advertisement
d is the total number of files in the site corpus
f is the number of files in the site corpus which contain the
particular word
To obtain the word frequency statistics, the site corpus is
received (block 720) and each individual content file in the
site corpus is converted into individual words (block 722).
Insignificant words such as formatting tags (block 724) and
stop words (block 726) are discarded. The individual words
are then passed through a stemming procedure to obtain
words and word-stems (block 728). The number of files in
which each word/word-stem occurs is determined, producing
the word frequency statistics (block 730). These word
frequency statistics are then used to modify the advertisement
feature vector (block 732).
FIG. 8 shows a sample advertisement feature vector. The
word/word-stems 810 are mapped to their corresponding
frequency values 820.
FIG. 9 shows the creation of content feature vectors from
the content files in the content stream (block 620). Each
content file in the content stream is converted into individual
words (block 910). Insignificant words such as HTML
formatting tags (block 920) and stop words (block 930) are
discarded. The individual words are then passed through a
stemming procedure to obtain words and word-stems (block
940). The words and word-stems are counted to determine
their frequencies (block 950). These frequencies are paired
with the words and word-stems to create a multi-dimensional
vector for each content file in the content
stream (block 960).
The similarity measure is the dot vector product of an
advertisement feature vector and a content feature vector.
Mathematically, let $A = (v_0, v_1, \ldots, v_n)$ represent the
content stream, where $v_0$ represents the most recent content
feature vector in the content stream and $v_n$ represents the
oldest content feature vector in the content stream. Let $w$ be
an advertisement feature vector. The similarity measure of $v$
to $w$ is denoted $\mathrm{Sim}(v, w)$. The affinity measure of
$A$ to $w$ is denoted $\mathrm{Aff}(A, w)$ and is calculated by:

$$
\mathrm{Aff}(A, w) = \sum_{i=0}^{n} \alpha^{i} \, \mathrm{Sim}(v_i, w)
$$

where $\alpha$ is the decay factor.
Although the methods here have been described using
WWW files as an example, they could just as easily be
applied to television programs and other forms of user
feedback media. With the advent and development of interactive
television and automated voice recognition and transcription
systems, the methods described here could be
easily applied to television programs and help determine
what kind of commercials will be shown to the user.

What is claimed is:
1. A method of selecting an advertisement from a file of
advertisements having a target consumer, comprising the
steps of:
receiving content data representing content having
particular characteristics;
receiving advertisement data representing advertisements
in the file;
creating a content data structure which indicates features
of the content having particular characteristics;
creating an advertisement data structure which indicates
features of the advertisements in the file;
determining similarity measures between the content data
structure and the advertisement data structure by
calculating dot vector products between the content data
structure and the advertisement data structure and
multiplying the dot vector products by a decay factor;
determining affinity measures between the content data
and the advertisement data in response to the similarity
measures; and
presenting to the consumer an advertisement from the file
in response to the affinity measures.
2. The method of claim 1, wherein content data includes
WWW files.
3. The method of claim 1, wherein content data includes
television programs.
4. The method of claim 1, wherein creating a content data
structure which indicates features of the content having
particular characteristics comprises the steps of:
converting the content data into individual words;
applying a stemming procedure to the individual words to
obtain words and word-stems;
determining frequencies of particular words and
word-stems; and
creating a multi-dimensional vector comprised of the
words and word-stems mapped to their respective
frequencies.
5. The method of claim 4, further comprising the steps of:
discarding stop words; and
discarding words which occur inside HTML formatting
tags, except for those which occur inside a meta keyword
tag.
6. The method of claim 1, wherein creating an advertisement
data structure which indicates features of the advertisements
in the file comprises the steps of:
converting the advertisement data into individual words;
applying a stemming procedure to the individual words to
obtain words and word-stems;
determining frequencies of particular words and
word-stems; and
creating a multi-dimensional vector comprised of the
words and word-stems mapped to their respective
frequencies.
7. The method of claim 6, further comprising the steps of:
discarding stop words;
discarding words which occur inside HTML formatting
tags, except for those which occur inside a meta keyword
tag.
8. The method of claim 6, further comprising the steps of:
determining word frequency statistics for a content available
at a site;
modifying the advertisement data structure using an
inverse, logarithmic, document-frequency measure
derived from the word frequency statistics.
9. The method of claim 8, wherein determining word
frequency statistics for the site corpus comprises the steps
of:
converting the content available at a site into individual
words;
applying a stemming procedure to the individual words to
obtain words and word-stems; and
determining frequencies of particular words and
word-stems.
10. The method of claim 1, wherein presenting to the user
an advertisement from the file in response to the affinity
measures comprises the steps of:
retrieving the advertisement;
retrieving a content page;
combining the advertisement and the content page;
transmitting the advertisement and the content page to the
user.