throbber
as) United States
`a2) Patent Application Publication (10) Pub. No.: US 2003/0086341 Al
`(43) Pub. Date: May8, 2003
`
`Wells et al.
`
`US 20030086341A1
`
`(54) AUTOMATIC IDENTIFICATION OF SOUND
`RECORDINGS
`
`Related U.S. Application Data
`
`(75)
`
`Inventors: Maxwell Wells, Seattle, WA (US);
`Vidya Venkatachalam, Bellevue, WA
`(US); Luca Cazzanti, Bellevue, WA
`(US); Kwan Fai Cheung, Bellevue,
`WA (US); Navdeep Dhillon, Seattle,
`WA (US); Somsak Sukittanon, Seattle,
`WA(US)
`
`Correspondence Address:
`STAAS & HALSEY LLP
`
`700 11TH STREET, NW
`SUITE 500
`
`WASHINGTON,DC 20001 (US)
`
`(73) Assignee: GRACENOTE,INC., Berkeley, CA
`
`(21) Appl. No.:
`
`10/200,034
`
`(22)
`
`Filed:
`
`Jul. 22, 2002
`
`(60) Provisional application No. 60/306,911, filed on Jul.
`20, 2001.
`
`Publication Classification
`
`Int. C1?eee G11B 11/00; GO6F 17/00
`(51)
`(52) U.S. Ch ecco 369/13.56; 700/94; 705/51;
`704/208
`
`ABSTRACT
`(57)
`Copiesof original sound recordingsare identified by extract-
`ing features from the copy, creating a vector of those
`features, and comparing that vector against a database of
`vectors. Identification can be performed for copies of sound
`recordings that have been subjected to compression and
`other manipulation such that they are not exact replicas of
`the original. Computational efficiency permits many hun-
`dreds of queries to be serviced at the same time. The vectors
`maybeless than 100 bytes, so that many millions of vectors
`can be stored on a portable device.
`
`103
`
`
`Test set
`Candidate
`
`
`of
`Fingerprint
`recordings
`Component
`
`
`
`106
`
`
`
`
`
`Dif-
`Acceptfor
`Compare
`
`ference
`further
`105
`Candidate
`
`
`
`
`< x testing
`
`Fingerprints
`
`
`
`
`
` Test sets of
`Candidate
`
`Fingerprint
`Recordings
`
`
`
`Component
`Reject or
`with effects
`
`
`
`modify
`
`
`108
`
`EX1049
`Roku V. Media Chain
`U.S. Patent No. 10,489,560
`
`EX1049
`Roku V. Media Chain
`U.S. Patent No. 10,489,560
`
`

`

`Patent Application Publication
`
`May8, 2003 Sheet 1 of 20
`
`US 2003/0086341 Al
`
`
`
`Jo,|dssoV
`
`Jayyny
`
`Buljs9]
`
`901
`
`“$!Q
`
`g0Uasa}
`
`LX>
`
`Jopelay
`
`Aj|pou
`
`801
`
`Vi‘Sls
`
`COL
`
`
`
`quiduaBbul4jo
`
`ayeplpuedv8SOL
`
`
`
`yuauodwosdsBulpiooe
`
`aeduoy
`
`s}epipueyD
`
`sjuidiebul
`
`SOL
`
`ayepIpuey
`
`juUdebul4
`
`JUSUCdWUOD
`
`
`
`JOs}es}SOL
`
`sBulpio0ey
`
`SJO@YOYIM
`
`
`
`
`
`
`
`

`

`Patent Application Publication
`
`May8, 2003 Sheet 2 of 20
`
`US 2003/0086341 Al
`
`90¢GO0e
`
`0d
`
`£0?
`
`
`
`Aouenbel4yoezul
`
`Aouenbai
`
`gyejauese}eJBUey
`
`JOJOJOBA,JO10]D8/A,Sul
`
`
`
`SBDUBLIEAjetjUuaDAquanbe14
`
`
`yoesulsejouepud|XE
`
`joxiyey\
`
`Aouanbel4
`
`soepnyduiy
`
`Sul]ye
`
`S|JEAIO}U|
`
`oul|
`
`uolsodwo0eq
`Aguenbel4
`Huluonipuod
`
`jeuBls
`
`
`
`
`
`judeBul4WIOY0}SONjEAJOPSAayeUs}EQUOy
`
`Burjoajes
`
`Buuepio
`
`pue
`
`Bunybiana
`
`a}BIBUS)
`
`JoJO}S/A
`
`sepny|duiy
`
`awl]Yyoes}e
`
`Q}e18U95
`
`JO10}09/,
`
`saiouenbel4
`
`oul|yoed}e
`
`802
`
`gb‘Sls
`
`
`
`
`
`
`
`
`
`
`

`

`Patent Application Publication May 8,2003 Sheet 3 of 20
`
`US 2003/0086341 Al
`
`FIG. 2
`
`218 220
` 221
`
`Histogram
`
`Normalized
`
`224
`
`
`Histogram
`
`
`228
`
` Histogram
`
`
` Cumulative
`
`Probability
`Density
`
`
`
`
`Remapped
`Cumulative
`
` 226
`
`
`Probability
`Density
`
` Remapped
` Histogram
` 230
`Equalized
`
`
`Audio
`
`sample
`
`

`

`Patent Application Publication
`
`May 8, 2003
`
`Sheet 4 of 20
`
`US 2003/0086341 Al
`
`232
`
`Audio
`Sample
`
`
`
`Time
`frequency
`matrix
`
`244
`
`534 | frame
`
`Remove
`sensitive bands
`
`250
`
`
`Compute
`mean across
`
`
`bands and
`
`
`frames
`
`252
`
`DCT
`
`
`
`
`Split into
`
`frequency
`
`
`bands
`
`246
`
`240
`
`Normalize the
`
`DCT values
`
`Standard
`
`Normalize
`
`band and
`
`254
`
`frame means
`
`Concatenate band
`
`and frame means
`
`256
`
`into fingerprint
`
`
`
`

`

`Patent Application Publication
`
`
`volsnpJA
`SUI}Ul:SOWel]coe
`
`fouonboly|SAqsanien
`Owl}4cgjeiedasWWUOJSUBL|re8ZESUISO)
`
`ONG
`~LESssedpueg)epueg
`FOG
`
`(su9}|I4
`
`CCEOZE
`
`May8, 2003 Sheet 5 of 20
`
`US 2003/0086341 Al
`
`9ZE
`
`9190/981q
`
`
`
`

`

`Patent Application Publication May 8,2003 Sheet 6 of 20
`
`US 2003/0086341 Al
`
`512
`
`501
`
`sample
`
`component
`
`Reduced
`
`matrix
`
`frequency
`
`Windowed
`;
`audio
`frame
`
`508
`
`502
`
`Compute
`power
`matrix
`
`Transform
`to
`|
`cale
`O tog §
`
`509
`
`t
`
`Compute
`powerin time-
`frequency
`
`blocks
`
`Ear
`transfer
`function
`
`510
`
`Normalize
`
`power
`
`Generate
`fingerprint
`
`514
`
`F I g . 5
`
`513
`511 Discard
`
`
`Matrix
`
`506
`
`Advance
`frame
`
`505
`
`Time
`
`
`
`
`
`frequency and/or
`
`time bands
`
`

`

`Patent Application Publication May 8,2003 Sheet 7 of 20
`
`US 2003/0086341 Al
`
`FIG.6A
`
`FIG.6B
`
`FIG.6C
`
`NS
`Ww
`Ww)
`
`co
`i>
`uw
`
`°O
`
`_
`
`LO
`
`oO
`
`10
`LO
`
`o
`1.
`—
`
`CUM
`Oo
`
`So
`—
`
`«(1
`CS
`

`
`vr
`
`LO
`
`oO
`
`=
`Ss
`
`CP
`vv
`

`
`a
`

`
`

`

`Patent Application Publication
`
`May8, 2003 Sheet 8 of 20
`
`US 2003/0086341 Al
`
`JUusWAa|y
`
`UIY}IM
`
`Souelsip
`
`uopueqge
`
`yolees
`
`614
`
`al
`
`802
`
`SOA
`
`ON
`
`SOA
`
`
`
`wuewlel3E01
`
`
`
`Zouesipsan\uIIMqua
`
`youees
`
`SLL
`
`uopueqeON
`
`Vl‘Sls
`
`polUOpuegqe
`
`yoleas
`
`UIUIMSJUBLUE|e
`
`SoAQoue}sIp
`ebéONBuipsoesdUM
`
`
`
`uiyjimsyuaLue|eONSOZaoe20198
`
`
`
`
`BuipsoeidUyIM
`
`Sdq[IBJOJos
`
`Sd|B$0Jos
`
`yolees
`
`pLbbl
`
`202
`
`
`
`=3
`
`902
`
`LOZ
`
`
`
`
`
`
`
`
`

`

`Patent Application Publication May 8,2003 Sheet 9 of 20
`
`US 2003/0086341 Al
`
`
`Compare distance of candidate
`from all reference FPsin set
`
`
`
`“7
`
`Choose
`
`closest
`
`distance
`
`Within
`
`distance
`
`
`
`No match in
`
`
`
`
`
`
`
`
`threshold?
`
`database
`
`FIG. 7B
`
`Matching
`
`
`
`
`
`fingerprint
`
`
`

`

`
`
`/
`

`
`—
`
`Patent Application Publication May 8,2003 Sheet 10 of 20
`
`US 2003/0086341 Al
`
`oO
`9
`
`oO
`N
`
`QN
`
`i
`
`-
`
`oOel
`
`w
`
`;
`Oo
`
`t
`L
`i
`Q
`OQ
`oO
`oO
`oO
`oO
`S
`Oo
`Oo
`Oo
`oO
`N
`Oo
`co

`-
`-
`
`™) PO |
`Oo

`Oo
`Oo
`oO
`oa
`wt
`N
`
`oO
`
`O°
`(o>)
`oO
`oO
`
`

`

`
`
`FIG.8B
`
`Patent Application Publication May 8,2003 Sheet 11 of 20
`
`US 2003/0086341 Al
`
`oc
`
`o
`
`25
`
`15
`
`10
`
`
`
`

`

`Patent Application Publication
`
`————|.V6‘9ls
`
`
`36‘SIs
`
`d6Sls
`
`May8, 2003 Sheet 12 of 20
`
`US 2003/0086341 Al
`
`
`
` OLxaouerlsid
`
`
`
`
`
`
`d6‘Sls
`
`

`

`Patent Application Publication
`
`May8, 2003 Sheet 13 of 20
`
`US 2003/0086341 Al
`
`018Za
`
`CLS40uolodold
`
`a)1
`
`VOLSIs
`
`gob‘Sls
`
`

`

`Patent Application Publication May 8,2003 Sheet 14 of 20
`
`US 2003/0086341 Al
`
`1500
`
`
`
`Candidate
`fingerprint
`
`1501
`
`1502
`
`Search LRU
`
`cache
`
`
`
`
`
`1507
`
`1508
`
`
`Return
` matching FP
`
`Populate
`
`LRU cache
`
`Fig. 11
`
`
`
`FP notin
`
`database
`
`
`
`

`

`Patent Application Publication May 8,2003 Sheet 15 of 20
`
`US 2003/0086341 Al
`
` 1601
`Stream of
`
`
`music
`
`
`
`1602
`
`at regular
`
`
`intervals
`
`Search
`
`1603
`
`
`
`database
`
` against
`
` Extract
`
`
`
`
`fingerprints
`1600
`
`Database offingerprints
`
`identified
`
`1605
`
`Fig. 12
`
`
`
`One
`
` Stream not
`
`or more
`
`fingerprint
`
`1604
`
`matches
`
`1607
`
`
`
`Stream
`
`additional
`
`
`identified
`
`Meets
`
`criteria
`
`
`?
`
`

`

`Patent Application Publication
`
`May8, 2003 Sheet 16 of 20
`
`US 2003/0086341 Al
`
`
`
`
`
`S3UBISIPYdJBW1seso|7)——
`
`wee eee ee eet ee es! wee ee eee
`
`1''1i1,''1k
`
`rrr TPs
`
`ee pe de ee
`
`iVelSis
`
`OcL
`
`001
`
`6ebSls
`
`

`

`Patent Application Publication
`
`May8, 2003 Sheet 17 of 20
`
`US 2003/0086341 Al
`
`
`
`eeee
`
`eee eye eee yr rrr
`
`ee ee Je ee ede eee dee
`
`boo eee eb eee eee
`'''
`t'1‘
`
`‘
`procter
`
`eee
`
`ieee Joe ee eed ee
`
`eee qe eee
`
`BYSIBUUOY
`
`ANoot
`
`SIBLUO}gSSOl0e@jljJadsulodyesigpz-sbues$6
`
`06
`
`08OLog0SOvO€0gOl
`
`VbSls
`
`
`
`
`

`

`Patent Application Publication May 8,2003 Sheet 18 of 20
`
`US 2003/0086341 Al
`
`Song
`
`Choose a
`
`
`
`principal FP
`
`Extract FPs
`
`1701
`
`
`every 15
`
`Compute
`
`
`seconds
`distance between
`each FP and
`
`1704
`
`principal
`
`1702
`
`
`
`
`Fig. 15A
`
`Create
`
`representative
`vector
`
`
`4705
`
`

`

`youolj}e007
`
`US 2003/0086341 Al
`
`SUL|
`
`gs)einbis
`
`00
`
`Patent Application Publication
`
`May8, 2003 Sheet 19 of 20
`
`jediouud
`
`quudsaBuy
`
`Ol
`
`
`
`useMmjieqsoue}sid
`
`pueajdiouud
`
`
`
`sjuLdiabBuyJ@u}O
`
`

`

`Patent Application Publication May 8,2003 Sheet 20 of 20
`
`US 2003/0086341 Al
`
`1606
`
`oc
`
`FIG.16
`
`O
`
`©9
`
`)
`a)
`Lu
`oO
`
`Ooo
`
`1604
`
`

`

`US 2003/0086341 Al
`
`May8, 2003
`
`AUTOMATIC IDENTIFICATION OF SOUND
`RECORDINGS
`
`CROSS-REFERENCE TO RELATED
`APPLICATION(S)
`
`[0001] This application is related and claims priority to
`US. provisional application entitled AUTOMATIC IDEN-
`TIFICATION OF SOUND RECORDINGS, havingserial
`No. 60/306,911, by Wells et al., filed Jul. 20, 2001 and
`incorporated by reference herein.
`
`BACKGROUND OF THE INVENTION
`
`[0002]
`
`1. Field of the Invention
`
`[0003] The present invention is directed to the identifica-
`tion of recordings and, more particularly, to the identifica-
`tion of sound recordings, such as recordings of music or
`spoken words.
`
`[0004]
`
`2. Description of the Related Art
`
`Identification is a process by which a copy of a
`[0005]
`sound recording is recognized as being the same as the
`original or reference recording. There is a need to automati-
`cally identify sound recordings for the purposes of registra-
`tion, monitoring and control, all of which are important in
`ensuring the financial compensationof the rights owners and
`creators of music. There is also a need for identification for
`
`the purposes of adding value to, or extracting value from the
`music. Registration is a process by which the owner of
`content recordshis or her ownership. Monitoring records the
`movementand use of content so that it can be reported back
`to the owner, generally for purposes of payment. Control is
`a process by which the wishes of a content owner regarding
`the use and movementof the content are enforced.
`
`[0006] Some examples of adding value to music include:
`identification of unlabelled or mislabeled content to makeit
`
`easier for users of the music to access and organize their
`music and identification so that the user can be provided
`with related content, for example, information about the
`artist, or recommendations of similar pieces of music.
`
`[0007] Some examples of extracting value from music
`include: identification for the provision of buying opportu-
`nities and identification for the purpose of interpreting
`something about
`the psychographics of the listener. For
`example, a particular song may trigger an offer to purchase
`it, or a related song by the sameartist, or an article of
`clothing made popular by that artist. This extracts value
`from the music by using it as a delivery vehicle for a
`commercial message. In addition, psychographics uses psy-
`chological, sociological and anthropological factors to deter-
`mine how a market is segmented by the propensity of groups
`within the market
`to make a decision about a product,
`person,
`ideology or otherwise hold an attitude or use a
`medium. This information can be used to better focus
`
`commercial messages and opportunities. This extracts value
`from the music by usingit to profile the listener.
`
`[0008] There have been two types of monitoring, reflect-
`ing the delivery of stored music and the delivery of played
`music. Stored music is considered to be copies for which
`there are “mechanical” or “reproduction” rights. Played
`music may be considered to be a performance, whether or
`not the performanceis live or recorded. This demarcation is
`
`reflected in different payment structures, which are admin-
`istered by different organizations. One organization (Harry
`Fox Agency) collects reproduction royalties when CDsor
`tapes are sold. These physical goods are counted and moni-
`tored using a variety of accounting practices and techniques.
`ASCAP, BMI and SESACcollect performance royalties
`when live or recorded music is played on the radio or in
`public spaces. These performances are monitored using a
`combination of automatic identification methods and human
`verification.
`
`[0009] There are several different methods used for deliv-
`ery of music. Live music is “delivered” in a performance
`space, by radio and TV (both analog and digital) and over
`the Internet. Stored music or other sound recordings may be
`delivered on physical media associated with the recordings
`(CDs, cassettes, mini discs, OD-RWs, DVDs) which may be
`moved (stored, distributed, sold, etc). However, a sound
`recording does not have to be associated with a physical
`medium; it can also be easily transported in electronic form
`by streaming, or by moving from one storage location to
`another. In both cases, either radio or the Internet may be
`used to transport the sound recording.
`
`[0010] Digital music and the Internet are changing the way
`music is delivered and used, and are changing the require-
`ments for music identification. These changes are brought
`about because the Internet can be used to deliver both
`performances and copies, and the Internet
`increases the
`number of delivery channels.
`
`{0011] Whereas a terrestrial radio station may reach one
`thousand listeners at any momentin time while playing the
`same one song, an Internet radio station may reach one
`thousand listeners at one time while playing one thousand
`different songs. This means that a larger and more diverse
`selection of songs must be identified.
`
`[0012] Existing business models for music are being chal-
`lenged. For example, CD readers attached to personal com-
`puters, and peer-to-peer services are making it easier to copy
`and exchange music. New methods for registering, moni-
`toring, controlling, and extracting value from music are
`needed.
`
`[0013] The copying of digital music is easy. Users are able
`to make copies on a variety of different media formats, for
`a variety of consumerelectronic devices. This creates a need
`to identify more copies of songs, across multiple media
`formats and types of device. Some of the devices are not
`connected to the Internet, which introduces an additional
`requirement on an identification system.
`
`[0014] There is a need for a single solution that can
`identify streamed or moved music acrossall delivery chan-
`nels. A single solution is preferable due to economies of
`scale, to remove the need to reconcile across methods and
`databases, and to provide a simple solution forall aspects of
`the problem.
`
`[0015] Current methods rely on attaching tags, water-
`marks, encryption, and fingerprints (the use of intrinsic
`features of the music). Tags are attached to the physical
`media or to the digital copy. The lowest common denomi-
`nator is the artist-title pair (ATP). Other information can
`include publisher, label and date. Attempts to give a sound
`recoding a unique ID include the ISRC (International Stan-
`dard Recording Code), the ISWC (International Standard
`
`

`

`US 2003/0086341 Al
`
`May8, 2003
`
`Work Code), the EAN (European Article Number), the UPC
`(Universal Product Code), ISMN (International Standard
`Music Number) and the CAE (Compositeur, Auteur, Edi-
`teur). All are alphanumeric codesthat are either attached to
`physical copies of the sound recording, or embeddedin the
`digital copy. Part of the rationale for creating the various
`codes was to assist with the automated identification and
`
`tracking of the works.
`
`[0016] However, there are problems with the use of ATPs
`and alpha-numeric codes. They can be easily detached or
`changed (as evidenced by the recent attempts by Napster to
`use ATPsto block content). Once detached or changed, they
`require human intervention (listening) to be reattached or
`corrected. There is no way to automatically authenticate that
`the content is what it’s tag claims it to be. They must be
`attached at source, prior to duplication, which reducestheir
`utility with legacy content. They are applied intermittently
`or incorrectly. They require a critical mass of industry
`participants to be useful. EAN/UPCidentify the CD and are
`not useful for individual music tracks. In some countries,
`there are laws against
`transmitting data along with the
`music, which limitstheir utility. Also, transmitting such data
`may require additional bandwidth.
`
`[0017] Watermarks add an indelible and inaudible signal
`that is interpreted by a special reader. Watermarks can be
`robust to noise. They are good for combinations of live and
`recorded content, for example where an announcer speaks
`over recorded background music. Watermarks can deliver
`additional information without the need to access a database.
`
`The problems with watermarksare: they are not necessarily
`indelible nor inaudible; they require addition at source, prior
`to duplication, and therefore have limited utility for legacy
`content; and if applied to legacy content, there still needs to
`be a wayto first identify the music.
`
`[0018] Encryption uses techniques embedded in software
`to make the content inaccessible without a key. Identification
`is done prior to encryption, and the identification informa-
`tion (metadata) is locked up with the music. Some of the
`problemswith encryption are: it has limited utility for legacy
`content, if applied to legacy content, there still needs to be
`a way to identify that content; and there is consumer
`resistance to locking up music. These problems are caused
`by incompatibilities between equipment that plays locked
`music and equipment that does not, leading to a reluctance
`to purchase equipmentthat maynot play their existing music
`collections and to purchasing music that may not play on
`equipment the consumers currently own.
`
`[0019] Another approach is to use intrinsic properties of
`the music to provide a “fingerprint.”The identifying features
`are a part of the music, therefore changing them changesthe
`music. The advantages of this method include: nothing is
`added to the music; the fingerprints can be regenerated at
`any time; fingerprints work on legacy content and do not
`require broad industry adoption to be applicable to all
`content; and fingerprints can madeofan entire song, and can
`therefore ensure that song’s completeness and authenticity.
`
`[0020] Current fingerprinting methodsare not suitable, for
`reasons that will be described in more detail later. Their
`limitations come about because of the requirements for (1)
`identifying large numbers of songs, and (2) identifying
`songs that have slight variations from the original. These
`variations are insufficient to cause a human to judge the
`
`songs as being different, but they can be sufficient to cause
`a machine to do so. In sum,
`the problems with current
`fingerprinting methods are that some systems can handle a
`large number of songs, but cannot handle the variations,
`while other systems can handle many variations, but cannot
`handle a large number of songs.
`
`[0021] Variations in songs may be caused by numerous
`“delivery channel effects.” For example, songs played on the
`radio are subjected to both static and dynamic frequency
`equalization and volume normalization. Songs may also be
`speeded up or slowed down to shorten or lengthen their
`playing time. Stored music can vary from the original
`because of the same effects found in radio, and because of
`other manipulations. The most common manipulationis the
`use of a codec to reduce the size of a file of stored music to
`
`make it more suitable for storage or movement. The most
`common codec is the MP3. The codec encodes the song to
`a compressed form, and at playback decodes, or expands,it
`for listening. An ideal codec will remove only those parts of
`the original that are minimally perceptually salient so that
`the version that has undergone compression and expansion
`soundslike the original. However, the process is lossy and
`changes the waveform of the copy from that of the original.
`Other manipulations and their manifestations (delivery
`channeleffects) are described below.
`
`identifying
`[0022] Existing methods are intended for
`stored sound recordings, and for identifying sound record-
`ings as they are being played (performances). The main
`distinctions between the two identification systemsare:
`
`[0023] Played music identification systems must be
`capable of identifying a song without any knowledge
`of the song’s start point. It is easier to find the start
`point in stored music.
`
`[0024] Played music identification can have an upper
`capacity of about 10,000 reference recordings.
`Stored music requires a larger capacity.
`
`is being
`[0025] Played music is identified as it
`played, so there is not a stringent requirement for
`speed of fingerprint extraction or lookup. For many
`applications, stored music must be identified at many
`timesreal time.
`
`[0026] Played music identification may be limited to
`several thousand radio stations. There is a need for
`
`stored music identification by tens of millions of
`individual music users.
`
`[0027] Played music must be identified in the pres-
`ence of manipulations that create variations from the
`original. Methodsof identifying stored music in the
`prior art are not designed to compensate for varia-
`tions.
`
`[0028] Both categories include techniques that rely on the
`use of intrinsic properties, the addition of metadata or the
`addition of inaudible signals. However the examination will
`concentrate on those identification techniques that use the
`intrinsic properties of the sound recording, either by them-
`selves, or in combination with other information.
`
`[0029] One commonly used technique for identifying cop-
`ies of music on a compact disc (CD) is to use the spacing
`between tracks and the duration of tracks or the “Table of
`
`Contents” of a CD to create a unique identifier for the CD,
`
`

`

`US 2003/0086341 Al
`
`May8, 2003
`
`as described in U.S. Pat. No. 6,230,192. The CD identity is
`used to lookup the name and order of the tracks from a
`previously completed database. This method does not work
`once the music has been removed from the CD, and is a copy
`on a computer hard drive.
`
`bursts, and signal dropout. It is capable of monitoring for
`one of approximately 10,000 songs in each of 5 radio
`channels. The disclosed technique is fairly robust, but the
`size of the database of reference songsis limited, primarily
`due to the database search techniques used.
`
`[0030] Another technique uses a hash algorithm to label a
`file. Hash algorithms, such as the Secure Hash Algorithm
`(SHAL) or MD5,are meantfordigital signature applications
`where a large message has to be “compressed” in a secure
`manner before being signed with the private key. The
`algorithms maybe applied to a musicfile of arbitrary length
`to produce a 128-bit message digest. The benefits of the hash
`values are they are quick to extract, they are small in size,
`and they can be used to perform rapid database searches
`because each hash is a unique identifier for a file. The
`disadvantages include:
`
`(1) The algorithms are designed to be secure
`[0031]
`to tampering, so any change to the file, however
`minor, will result in a different hash value. As a
`result,
`the hash value changes when the file is
`subjected to any of the channeleffects. For example,
`there are on average 550 variants of each song on a
`large file sharing exchange such as Napster. A slight
`alteration of a song(e.g. the removal of one sample)
`will result in a different hash, which will not be able
`to be used to identify the song.
`
`(2) Each variant of a song file requires that a
`[0032]
`different hash be stored in the database, resulting in
`a large database with a many-to-one relationship.
`
`‘Yet another technique is described in U.S. Pat. No.
`[0033]
`5,918,223. The method extracts a series of feature vectors
`from a piece of music which it then sends to a database for
`identification. The advantages of this technique are that the
`feature vectors consist of intrinsic properties of music that
`are claimed to be perceptually salient. This meansthat they
`should be robust to manyofthe distribution channel effects.
`The disadvantages are:
`
`(1) The feature vector
`[0034]
`intensive to extract
`
`is computationally
`
`[0035]
`
`(2) The feature vector is large, which means:
`
`(a) It takes long time to look up and is
`[0036]
`expensive to implement
`for large numbers of
`queries.
`
`[0037]
`traffic
`
`(b)
`
`It
`
`increases the amount of network
`
`(3) Each individual vector does not contain
`[0038]
`sufficient information to uniquely identify a song.
`Identification is accomplished after a series of fea-
`ture vectors are matched in the database. The data-
`
`base therefore takes a long time to search and must
`be limited in size.
`
`(4) There is no evidence that the technique is
`[0039]
`immuneto all delivery channeleffects.
`
`[0040] One method for identifying played sound record-
`ings is described by Kenyon in US. Pat. No. 5,210,820. The
`820 patent is primarily designed for radio station monitor-
`ing wherethe signal is acquired from listening stations tuned
`to a terrestrial radio station ofinterest. The system is capable
`of identifying songs irrespective of speed variation, noise
`
`Identifying all sound recordings includes stored
`[0041]
`music for around 10 million different songs in early 2002.
`For streamed music this numberis in the tens of thousands.
`The prior art has focused on streamed music with a much
`smaller number of songs.
`
`Identifying legacy content applies to approxi-
`[0042]
`mately 500 billion copies of digital music in existence.
`Methods that require the music to be identified at the point
`of origin cannot identify these copies.
`
`[0043] New content consists of relatively few songs that
`comprise the majority of popular music, distributed from a
`few points of origin, with processes in place to control the
`workflow, plus a larger number of songs distributed from
`many points of origin. These points are geographically
`distributed, and have diverse methods of workflow manage-
`ment. Therefore, methods that require the music to be
`identified at the point of origin cannot identify the majority
`of songs.
`
`SUMMARYOF THE INVENTION
`
`[0044] An aspect of the invention is to automatically
`identify all sound recordings, including legacy content and
`new content.
`
`[0045] Another aspect of the inventionis to identify sound
`recordings rapidly. The system should be able to identify
`music at many timesreal time. For example a three minute
`song should be identified in less than three seconds.
`
`[0046] A further aspect of the invention is to automatically
`identify sound recordings with computational efficiency of
`extraction and lookup. Computational efficiency of the fin-
`gerprint extraction and lookupis desirable because many of
`the songs will be identified on consumerelectronics devices
`with limited processing power.
`
`‘Yet another aspect of the invention is to automati-
`[0047]
`cally identify sound recordings using a small fingerprint
`extracted from each sound recording and compact lookup
`code. Both are desirable because many of the songs will be
`identified on consumer electronics devices with limited
`storage space.
`
`[0048] Astill further aspect of the invention is to identify
`sound recordings whether the tags are absent or incorrectly
`applied, whether intentionally or not.
`
`‘Yet another aspect of this invention is to automati-
`[0049]
`cally identify variations of sound recordings where those
`variations are caused by delivery channeleffects. The mani-
`festations of those effects that should be considered include:
`
`(1) DC value—the average value of a digi-
`[0050]
`tized song waveform amplitude in the time domain.
`
`(2) Phase Inversion—the process of multiply-
`[0051]
`ing every time domain digital sample of a song
`waveform by -1. For a multichannel song, phase
`inversion is applied to all channels.
`
`

`

`US 2003/0086341 Al
`
`May8, 2003
`
`(3) Pitch-invariant speed increase—the pro-
`[0052]
`cess of speeding up the playback rate of a song
`without affecting its pitch.
`
`(4) Peak limiting—the process of limiting the
`[0053]
`maximum signal amplitude to a specified threshold.
`
`(5) Volume normalization—the process by
`[0054]
`which the gain of an audio file is increased until its
`loudest point (or sample) is at maximum level.
`
`(6) Dynamic range reduction—the process by
`[0055]
`which the dynamic range of a sound is reduced.
`Dynamic Range is the ratio of the strongest, or
`loudest part
`to the weakest, or softest, part of a
`sound; it is measured in dB
`
`(7) Equalization—theprocess usedto alter the
`[0056]
`relative balance of frequencies to produce desired
`tonal characteristics in sounds.
`
`(8) Remastering—the process of mastering a
`[0057]
`recording after the first mastering has been done.
`May happen when the “master tape” is re-processed
`because a recording is reissued, or included in a
`different album. Sometimes an actual mastering
`house is used, and other times the “mastered” mate-
`rial is sent directly to a duplication facility where
`they can also do the final few steps. Typical master-
`ing effects include many potential processes of the
`audio signal such as equalization, compression, lim-
`iting, normalization, wideningthe stereo image, edit-
`ing fades, and just putting the songs in the correct
`order
`
`in kbs at which an
`(9) Bit rates—the rate,
`[0058]
`original song is compressed by a codec.
`
`(10) Start time variations—variations in the
`[0059]
`fingerprint caused by whatdifferent players consider
`the start of a song.
`
`(11) Different rippers—variations in the fin-
`[0060]
`gerprint
`caused by different
`rippers
`(software
`devices that extract a song from a CD for compres-
`sion).
`
`(12) Codecs—variations in the fingerprint
`[0061]
`caused by different coding and decoding schemes.
`
`(13) Watermarking—variations in the finger-
`[0062]
`prints caused by the addition of a watermark.
`
`(14) Addition of noise—variations in the fin-
`[0063]
`gerprint caused by the addition of noise to the audio,
`from various sources.
`
`[0064] The requirementsfor being able to deal with legacy
`content preclude systems based on encryption, watermark-
`ing or tagging at source. The requirement to be robust to
`simple manipulations of the tags precludes tagging systems.
`This leaves fingerprinting as the only way of meeting most
`of the requirements.
`
`[0065] An additional requirement for some applications is
`that the entire song be checked to ensure thatit is all present
`and correct. Reasons for this requirement:
`include:
`(1)
`quality assurance where the rights owner of a song, or an
`artist, may wish to assure that their song is only distributed
`in its entirety, and (2) prevention of spoofing which relates
`to attempts to misrepresent identification which may be a
`
`tactic used to distribute songs illegally over a network.If a
`fingerprint is taken from a small section of the song, such as
`near the beginning, someone trying to spoof the system
`might prepend a section of a legal song onto the front of an
`illegal song.
`
`[0066] A further aspect of this invention is automatic
`identification and authentication of entire songs.
`
`[0067] The above aspects can be attained by a method of
`identifying recordings by extracting at least one candidate
`fingerprint from at
`least one portion of an unidentified
`recording; and searching for a match between at least one
`value derived from the at least one candidate fingerprint and
`a value in at least one reference fingerprint amonga plurality
`of reference fingerprints.
`
`[0068] These together with other aspects and advantages
`which will be subsequently apparent, reside in the details of
`construction and operation as more
`fully hereinafter
`described and claimed, reference being had to the accom-
`panying drawings forming a part hereof, wherein like
`numerals refer to like parts throughout.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0069] FIG. 1A is a flowchart of a fingerprint component
`testing procedure according to an embodimentof the inven-
`tion.
`
`[0070] FIG. 1B is a flowchart of a procedure for finger-
`print creation according to an embodiment of the present
`invention.
`
`FIG.2 is flowchart of the procedure for histogram
`[0071]
`equalization of soundfiles.
`
`[0072] FIG. 3 is flowchart of the procedure for band-by-
`band frequency equalization.
`
`[0073] FIG. 4 is a block diagram of time-frequency
`decomposition to create a matrix of frequency amplitude at
`time intervals in FIG. 1B.
`
`FIG.5 is flowchart of the procedure for creating a
`[0074]
`fingerprint based on a perceptual model of audition.
`
`[0075] FIGS. 6A-6C are wavelet based fingerprints of
`three songs with three variants each.
`
`[0076] FIGS. 7A and 7B is flowchart of a procedure for
`searching a database of reference fingerprints.
`
`[0077] FIGS. 8A and 8B are graphs of SRR search
`parameters overlaid on an example of a fingerprint.
`
`[0078] FIGS. 9A-9D are graphs of the distributions of
`matches for wavefiles, blade 128files, blade 32 files and fhg
`128 kg MP3files.
`
`[0079] FIG. 10A is a graph of the efficacy of Search by
`Range Reduction.
`
`[0080] FIG. 10B is a graph oftotal error (type 1+type 2)
`as a function of the Search by Range Reduction threshold.
`
`[0081] FIG. 11 is flowchart of a procedure for combining
`fuzzy and exact matches between candidate and reference
`fingerprints.
`
`[0082] FIG. 12 is flowchart of a procedure for using
`fingerprints to identify a stream of music.
`
`

`

`US 2003/0086341 Al
`
`May8, 2003
`
`[0083] FIG. 13A is a graph of the distance of the closest
`match based on extracting one fingerprint every second from
`a sample song.
`
`[0084] FIG. 13B is a graph of the song ID in the database
`corresponding to the closest match in FIG. 7A.
`
`[0085] FIG. 14 is a graph of percentage agreement
`between machine-extracted and human-extracted break-
`points (accuracy) for 95 songs.
`
`[0086] FIG. 15A is flowchart of a procedure for repre-
`senting an entire song as a compact vector.
`
`[0087] FIG. 15B is graph of the procedure illustrated in
`FIG. 15A.
`
`[0088] FIG. 16 is a simplified block diagram of a syst

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket