US 20030086341 A1

(12) Patent Application Publication          (10) Pub. No.: US 2003/0086341 A1
     Wells et al.                             (43) Pub. Date:  May 8, 2003

(54) AUTOMATIC IDENTIFICATION OF SOUND RECORDINGS

(75) Inventors: Maxwell Wells, Seattle, WA (US); Vidya Venkatachalam, Bellevue,
     WA (US); Luca Cazzanti, Bellevue, WA (US); Kwan Fai Cheung, Bellevue,
     WA (US); Navdeep Dhillon, Seattle, WA (US); Somsak Sukittanon, Seattle,
     WA (US)

     Correspondence Address:
     STAAS & HALSEY LLP
     700 11TH STREET, NW
     SUITE 500
     WASHINGTON, DC 20001 (US)

(73) Assignee: GRACENOTE, INC., Berkeley, CA

(21) Appl. No.: 10/200,034

(22) Filed: Jul. 22, 2002

Related U.S. Application Data

(60) Provisional application No. 60/306,911, filed on Jul. 20, 2001.

Publication Classification

(51) Int. Cl.7 .................. G11B 11/00; G06F 17/00
(52) U.S. Cl. .................. 369/13.56; 700/94; 705/51; 704/208
(57) ABSTRACT

Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.
`
`103
`
`
`Test set
`Candidate
`
`
`of
`Fingerprint
`recordings
`Component
`
`
`
`106
`
`
`
`
`
`Dif-
`Acceptfor
`Compare
`
`ference
`further
`105
`Candidate
`
`
`
`
`< x testing
`
`Fingerprints
`
`
`
`
`
` Test sets of
`Candidate
`
`Fingerprint
`Recordings
`
`
`
`Component
`Reject or
`with effects
`
`
`
`modify
`
`
`108
`
`
`
`
[Sheet 1 of 20, FIG. 1A: flowchart of the fingerprint component testing procedure (test sets of candidate recordings with and without effects, candidate fingerprint component, compare, difference < x, accept for further testing or reject/modify; elements 103-108).]
`
[Sheet 2 of 20, FIG. 1B: flowchart of fingerprint creation (signal conditioning; time-frequency decomposition into a matrix of frequency amplitudes at time intervals; generate vectors of frequencies, amplitudes and variances at each time; weighting, ordering and selecting; concatenate vector values to form the fingerprint; elements 203-208).]
`
[Sheet 3 of 20, FIG. 2: flowchart of histogram equalization of sound files (histogram, normalized histogram, cumulative probability density, remapped cumulative probability density, remapped histogram, equalized audio sample; elements 218-230).]
`
[Sheet 4 of 20, FIG. 3: flowchart of band-by-band frequency equalization (audio sample 232, time-frequency matrix, split into frequency bands, remove sensitive bands, compute mean across bands and frames, DCT, normalize the DCT values, normalize band and frame means, concatenate band and frame means into fingerprint; elements 232-256).]
[Sheet 5 of 20, FIG. 4: block diagram of time-frequency decomposition (bandpass filters, cosine transform, separation of time-frequency values into frames; elements 320-337).]
`
[Sheet 6 of 20, FIG. 5: flowchart of fingerprint creation based on a perceptual model of audition (windowed audio frame, compute power matrix, transform to log scale, ear transfer function, compute power in time-frequency blocks, normalize power, discard frequency and/or time bands, reduced matrix, advance frame, generate fingerprint; elements 501-514).]
`
[Sheet 7 of 20, FIGS. 6A-6C: wavelet-based fingerprints of three songs with three variants each (plots; axis values not reproduced).]
`
[Sheet 8 of 20, FIG. 7A: flowchart of a search over the set of all fingerprints, testing whether elements are within a threshold distance of preceding elements and abandoning or continuing the search accordingly (elements 701-715).]
`
[Sheet 9 of 20, FIG. 7B: flowchart continuing the search (compare distance of candidate from all reference FPs in set, choose closest distance, within distance threshold?, matching fingerprint or no match in database).]
`
[Sheet 10 of 20, FIG. 8A: graph of SRR search parameters overlaid on an example fingerprint (axis values not reproduced).]
`
[Sheet 11 of 20, FIG. 8B: graph of SRR search parameters overlaid on an example fingerprint (axis values not reproduced).]
`
[Sheet 12 of 20, FIGS. 9A-9D: graphs of the distributions of matches (distance axes; values not reproduced).]
`
[Sheet 13 of 20, FIGS. 10A-10B: graphs of the efficacy of Search by Range Reduction and of total error versus the SRR threshold (values not reproduced).]
`
[Sheet 14 of 20, FIG. 11: flowchart combining cache and database lookup (candidate fingerprint 1501, search LRU cache 1502, return matching FP, populate LRU cache, FP not in database; elements 1500-1508).]
`
[Sheet 15 of 20, FIG. 12: flowchart of identifying a stream of music (stream of music 1601, extract fingerprints at regular intervals 1602, search against database of fingerprints 1600/1603, one or more fingerprint matches 1604, meets additional criteria?, stream identified 1607 or stream not identified 1605).]
`
[Sheet 16 of 20, FIGS. 13A-13B: graphs of closest-match distance and corresponding song ID (values not reproduced).]
`
[Sheet 17 of 20, FIG. 14: graph of percentage agreement between machine- and human-extracted breakpoints across 95 songs (values not reproduced).]
`
[Sheet 18 of 20, FIG. 15A: flowchart for representing an entire song as a compact vector (extract FPs every 15 seconds 1701, choose a principal FP 1702, compute distance between each FP and the principal 1704, create representative vector 1705).]
`
[Sheet 19 of 20, FIG. 15B: graph of the procedure of FIG. 15A (location in time of the principal fingerprint and distances between the principal and the other fingerprints).]
`
[Sheet 20 of 20, FIG. 16: simplified block diagram of a system (elements 1604, 1606).]
`
`
`AUTOMATIC IDENTIFICATION OF SOUND
`RECORDINGS
`
`CROSS-REFERENCE TO RELATED
`APPLICATION(S)
`
[0001] This application is related to and claims priority to U.S. provisional application entitled AUTOMATIC IDENTIFICATION OF SOUND RECORDINGS, having Serial No. 60/306,911, by Wells et al., filed Jul. 20, 2001 and incorporated by reference herein.
`
`BACKGROUND OF THE INVENTION
`
`[0002]
`
`1. Field of the Invention
`
[0003] The present invention is directed to the identification of recordings and, more particularly, to the identification of sound recordings, such as recordings of music or spoken words.
`
`[0004]
`
`2. Description of the Related Art
`
[0005] Identification is a process by which a copy of a sound recording is recognized as being the same as the original or reference recording. There is a need to automatically identify sound recordings for the purposes of registration, monitoring and control, all of which are important in ensuring the financial compensation of the rights owners and creators of music. There is also a need for identification for the purposes of adding value to, or extracting value from the music. Registration is a process by which the owner of content records his or her ownership. Monitoring records the movement and use of content so that it can be reported back to the owner, generally for purposes of payment. Control is a process by which the wishes of a content owner regarding the use and movement of the content are enforced.
`
[0006] Some examples of adding value to music include: identification of unlabelled or mislabeled content to make it easier for users of the music to access and organize their music, and identification so that the user can be provided with related content, for example, information about the artist, or recommendations of similar pieces of music.
`
[0007] Some examples of extracting value from music include: identification for the provision of buying opportunities and identification for the purpose of interpreting something about the psychographics of the listener. For example, a particular song may trigger an offer to purchase it, or a related song by the same artist, or an article of clothing made popular by that artist. This extracts value from the music by using it as a delivery vehicle for a commercial message. In addition, psychographics uses psychological, sociological and anthropological factors to determine how a market is segmented by the propensity of groups within the market to make a decision about a product, person, ideology or otherwise hold an attitude or use a medium. This information can be used to better focus commercial messages and opportunities. This extracts value from the music by using it to profile the listener.
`
[0008] There have been two types of monitoring, reflecting the delivery of stored music and the delivery of played music. Stored music is considered to be copies for which there are “mechanical” or “reproduction” rights. Played music may be considered to be a performance, whether or not the performance is live or recorded. This demarcation is reflected in different payment structures, which are administered by different organizations. One organization (Harry Fox Agency) collects reproduction royalties when CDs or tapes are sold. These physical goods are counted and monitored using a variety of accounting practices and techniques. ASCAP, BMI and SESAC collect performance royalties when live or recorded music is played on the radio or in public spaces. These performances are monitored using a combination of automatic identification methods and human verification.
`
[0009] There are several different methods used for delivery of music. Live music is “delivered” in a performance space, by radio and TV (both analog and digital) and over the Internet. Stored music or other sound recordings may be delivered on physical media associated with the recordings (CDs, cassettes, mini discs, CD-RWs, DVDs) which may be moved (stored, distributed, sold, etc.). However, a sound recording does not have to be associated with a physical medium; it can also be easily transported in electronic form by streaming, or by moving from one storage location to another. In both cases, either radio or the Internet may be used to transport the sound recording.
`
[0010] Digital music and the Internet are changing the way music is delivered and used, and are changing the requirements for music identification. These changes are brought about because the Internet can be used to deliver both performances and copies, and the Internet increases the number of delivery channels.
`
[0011] Whereas a terrestrial radio station may reach one thousand listeners at any moment in time while playing the same one song, an Internet radio station may reach one thousand listeners at one time while playing one thousand different songs. This means that a larger and more diverse selection of songs must be identified.
`
[0012] Existing business models for music are being challenged. For example, CD readers attached to personal computers, and peer-to-peer services are making it easier to copy and exchange music. New methods for registering, monitoring, controlling, and extracting value from music are needed.
`
[0013] The copying of digital music is easy. Users are able to make copies on a variety of different media formats, for a variety of consumer electronic devices. This creates a need to identify more copies of songs, across multiple media formats and types of device. Some of the devices are not connected to the Internet, which introduces an additional requirement on an identification system.
`
[0014] There is a need for a single solution that can identify streamed or moved music across all delivery channels. A single solution is preferable due to economies of scale, to remove the need to reconcile across methods and databases, and to provide a simple solution for all aspects of the problem.
`
[0015] Current methods rely on attaching tags, watermarks, encryption, and fingerprints (the use of intrinsic features of the music). Tags are attached to the physical media or to the digital copy. The lowest common denominator is the artist-title pair (ATP). Other information can include publisher, label and date. Attempts to give a sound recording a unique ID include the ISRC (International Standard Recording Code), the ISWC (International Standard Work Code), the EAN (European Article Number), the UPC (Universal Product Code), ISMN (International Standard Music Number) and the CAE (Compositeur, Auteur, Editeur). All are alphanumeric codes that are either attached to physical copies of the sound recording, or embedded in the digital copy. Part of the rationale for creating the various codes was to assist with the automated identification and tracking of the works.
`
[0016] However, there are problems with the use of ATPs and alpha-numeric codes. They can be easily detached or changed (as evidenced by the recent attempts by Napster to use ATPs to block content). Once detached or changed, they require human intervention (listening) to be reattached or corrected. There is no way to automatically authenticate that the content is what its tag claims it to be. They must be attached at source, prior to duplication, which reduces their utility with legacy content. They are applied intermittently or incorrectly. They require a critical mass of industry participants to be useful. EAN/UPC identify the CD and are not useful for individual music tracks. In some countries, there are laws against transmitting data along with the music, which limits their utility. Also, transmitting such data may require additional bandwidth.
`
[0017] Watermarks add an indelible and inaudible signal that is interpreted by a special reader. Watermarks can be robust to noise. They are good for combinations of live and recorded content, for example where an announcer speaks over recorded background music. Watermarks can deliver additional information without the need to access a database. The problems with watermarks are: they are not necessarily indelible nor inaudible; they require addition at source, prior to duplication, and therefore have limited utility for legacy content; and if applied to legacy content, there still needs to be a way to first identify the music.
`
[0018] Encryption uses techniques embedded in software to make the content inaccessible without a key. Identification is done prior to encryption, and the identification information (metadata) is locked up with the music. Some of the problems with encryption are: it has limited utility for legacy content; if applied to legacy content, there still needs to be a way to identify that content; and there is consumer resistance to locking up music. These problems are caused by incompatibilities between equipment that plays locked music and equipment that does not, leading to a reluctance to purchase equipment that may not play their existing music collections and to purchase music that may not play on equipment the consumers currently own.
`
[0019] Another approach is to use intrinsic properties of the music to provide a “fingerprint.” The identifying features are a part of the music, therefore changing them changes the music. The advantages of this method include: nothing is added to the music; the fingerprints can be regenerated at any time; fingerprints work on legacy content and do not require broad industry adoption to be applicable to all content; and fingerprints can be made of an entire song, and can therefore ensure that song's completeness and authenticity.
`
[0020] Current fingerprinting methods are not suitable, for reasons that will be described in more detail later. Their limitations come about because of the requirements for (1) identifying large numbers of songs, and (2) identifying songs that have slight variations from the original. These variations are insufficient to cause a human to judge the songs as being different, but they can be sufficient to cause a machine to do so. In sum, the problems with current fingerprinting methods are that some systems can handle a large number of songs, but cannot handle the variations, while other systems can handle many variations, but cannot handle a large number of songs.
`
[0021] Variations in songs may be caused by numerous “delivery channel effects.” For example, songs played on the radio are subjected to both static and dynamic frequency equalization and volume normalization. Songs may also be speeded up or slowed down to shorten or lengthen their playing time. Stored music can vary from the original because of the same effects found in radio, and because of other manipulations. The most common manipulation is the use of a codec to reduce the size of a file of stored music to make it more suitable for storage or movement. The most common codec is the MP3. The codec encodes the song to a compressed form, and at playback decodes, or expands, it for listening. An ideal codec will remove only those parts of the original that are minimally perceptually salient so that the version that has undergone compression and expansion sounds like the original. However, the process is lossy and changes the waveform of the copy from that of the original. Other manipulations and their manifestations (delivery channel effects) are described below.
`
[0022] Existing methods are intended for identifying stored sound recordings, and for identifying sound recordings as they are being played (performances). The main distinctions between the two identification systems are:
`
[0023] Played music identification systems must be capable of identifying a song without any knowledge of the song's start point. It is easier to find the start point in stored music.

[0024] Played music identification can have an upper capacity of about 10,000 reference recordings. Stored music requires a larger capacity.

[0025] Played music is identified as it is being played, so there is not a stringent requirement for speed of fingerprint extraction or lookup. For many applications, stored music must be identified at many times real time.

[0026] Played music identification may be limited to several thousand radio stations. There is a need for stored music identification by tens of millions of individual music users.

[0027] Played music must be identified in the presence of manipulations that create variations from the original. Methods of identifying stored music in the prior art are not designed to compensate for variations.
`
[0028] Both categories include techniques that rely on the use of intrinsic properties, the addition of metadata or the addition of inaudible signals. However, the examination will concentrate on those identification techniques that use the intrinsic properties of the sound recording, either by themselves, or in combination with other information.
`
[0029] One commonly used technique for identifying copies of music on a compact disc (CD) is to use the spacing between tracks and the duration of tracks, or the “Table of Contents” of a CD, to create a unique identifier for the CD, as described in U.S. Pat. No. 6,230,192. The CD identity is used to look up the name and order of the tracks from a previously completed database. This method does not work once the music has been removed from the CD, and is a copy on a computer hard drive.
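
By way of illustration only (this is not the method of U.S. Pat. No. 6,230,192), a TOC-based identifier can be sketched as a hash over the track offsets read from the disc; the offsets, the helper name `disc_id` and the use of SHA-1 below are assumptions made for the sketch. The example also shows why such an identifier is lost once a track is copied off the disc: the key depends on track spacing, not on the audio itself.

```python
import hashlib

def disc_id(toc_frames):
    """Derive a lookup key from a CD table of contents.

    toc_frames: list of absolute track start offsets in CD frames
    (75 frames per second), followed by the lead-out offset.
    The key depends only on track spacing/duration, so it cannot
    identify a track once it has been ripped to a separate file.
    """
    digest = hashlib.sha1()
    for offset in toc_frames:
        digest.update(offset.to_bytes(4, "big"))
    return digest.hexdigest()

# Example: a 3-track disc (offsets are illustrative, not real data).
example_toc = [150, 18_042, 41_500, 63_000]   # last entry = lead-out
print(disc_id(example_toc))
```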
`
`
[0030] Another technique uses a hash algorithm to label a file. Hash algorithms, such as the Secure Hash Algorithm (SHA-1) or MD5, are meant for digital signature applications where a large message has to be “compressed” in a secure manner before being signed with the private key. The algorithms may be applied to a music file of arbitrary length to produce a 128-bit message digest. The benefits of the hash values are they are quick to extract, they are small in size, and they can be used to perform rapid database searches because each hash is a unique identifier for a file. The disadvantages include:
`
[0031] (1) The algorithms are designed to be secure to tampering, so any change to the file, however minor, will result in a different hash value. As a result, the hash value changes when the file is subjected to any of the channel effects. For example, there are on average 550 variants of each song on a large file sharing exchange such as Napster. A slight alteration of a song (e.g. the removal of one sample) will result in a different hash, which will not be able to be used to identify the song.
`
[0032] (2) Each variant of a song file requires that a different hash be stored in the database, resulting in a large database with a many-to-one relationship.
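
The fragility of file hashes described above is easy to demonstrate. The sketch below is illustrative only (the byte buffer stands in for a music file and is invented): hashing the buffer with MD5 and then deleting a single two-byte sample yields an unrelated digest, so the altered copy no longer matches the stored hash.

```python
import hashlib

# A short, invented 16-bit PCM buffer standing in for a music file.
samples = bytes(range(256)) * 4

original_digest = hashlib.md5(samples).hexdigest()

# Remove one "sample" (two bytes) from the middle of the buffer.
altered = samples[:500] + samples[502:]
altered_digest = hashlib.md5(altered).hexdigest()

print(original_digest)
print(altered_digest)
print("identical:", original_digest == altered_digest)   # False
```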
`
[0033] Yet another technique is described in U.S. Pat. No. 5,918,223. The method extracts a series of feature vectors from a piece of music which it then sends to a database for identification. The advantages of this technique are that the feature vectors consist of intrinsic properties of music that are claimed to be perceptually salient. This means that they should be robust to many of the distribution channel effects. The disadvantages are:
`
[0034] (1) The feature vector is computationally intensive to extract.

[0035] (2) The feature vector is large, which means:

[0036] (a) It takes a long time to look up and is expensive to implement for large numbers of queries.

[0037] (b) It increases the amount of network traffic.

[0038] (3) Each individual vector does not contain sufficient information to uniquely identify a song. Identification is accomplished after a series of feature vectors are matched in the database. The database therefore takes a long time to search and must be limited in size.

[0039] (4) There is no evidence that the technique is immune to all delivery channel effects.
`
[0040] One method for identifying played sound recordings is described by Kenyon in U.S. Pat. No. 5,210,820. The '820 patent is primarily designed for radio station monitoring where the signal is acquired from listening stations tuned to a terrestrial radio station of interest. The system is capable of identifying songs irrespective of speed variation, noise bursts, and signal dropout. It is capable of monitoring for one of approximately 10,000 songs in each of 5 radio channels. The disclosed technique is fairly robust, but the size of the database of reference songs is limited, primarily due to the database search techniques used.
`
[0041] Identifying all sound recordings includes stored music for around 10 million different songs in early 2002. For streamed music this number is in the tens of thousands. The prior art has focused on streamed music with a much smaller number of songs.
`
[0042] Identifying legacy content applies to approximately 500 billion copies of digital music in existence. Methods that require the music to be identified at the point of origin cannot identify these copies.
`
[0043] New content consists of relatively few songs that comprise the majority of popular music, distributed from a few points of origin, with processes in place to control the workflow, plus a larger number of songs distributed from many points of origin. These points are geographically distributed, and have diverse methods of workflow management. Therefore, methods that require the music to be identified at the point of origin cannot identify the majority of songs.
`
SUMMARY OF THE INVENTION
`
`[0044] An aspect of the invention is to automatically
`identify all sound recordings, including legacy content and
`new content.
`
[0045] Another aspect of the invention is to identify sound recordings rapidly. The system should be able to identify music at many times real time. For example, a three minute song should be identified in less than three seconds.
`
[0046] A further aspect of the invention is to automatically identify sound recordings with computational efficiency of extraction and lookup. Computational efficiency of the fingerprint extraction and lookup is desirable because many of the songs will be identified on consumer electronics devices with limited processing power.
`
[0047] Yet another aspect of the invention is to automatically identify sound recordings using a small fingerprint extracted from each sound recording and compact lookup code. Both are desirable because many of the songs will be identified on consumer electronics devices with limited storage space.
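
As a rough illustration of how a fingerprint can be kept compact (the abstract mentions vectors of less than 100 bytes), the sketch below quantizes a fixed-length feature vector to one byte per element; the 64-element length and the clipping range are assumptions, not values taken from this specification. At 64 bytes per fingerprint, roughly ten million reference recordings occupy well under 1 GB.

```python
def pack_fingerprint(values, lo=-4.0, hi=4.0):
    """Quantize a fixed-length feature vector to one byte per element."""
    out = bytearray()
    for v in values:
        v = min(max(v, lo), hi)                      # clip to the assumed range
        out.append(int(round((v - lo) / (hi - lo) * 255)))
    return bytes(out)

def unpack_fingerprint(blob, lo=-4.0, hi=4.0):
    """Recover approximate feature values from the packed bytes."""
    return [lo + b / 255 * (hi - lo) for b in blob]

vec = [0.1 * i - 3.0 for i in range(64)]             # illustrative 64-element vector
blob = pack_fingerprint(vec)
print(len(blob), "bytes")                            # 64 bytes, under 100
print(unpack_fingerprint(blob)[:4])
```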
`
[0048] A still further aspect of the invention is to identify sound recordings whether the tags are absent or incorrectly applied, whether intentionally or not.
`
[0049] Yet another aspect of this invention is to automatically identify variations of sound recordings where those variations are caused by delivery channel effects. The manifestations of those effects that should be considered include:
`
[0050] (1) DC value—the average value of a digitized song waveform amplitude in the time domain.

[0051] (2) Phase Inversion—the process of multiplying every time domain digital sample of a song waveform by -1. For a multichannel song, phase inversion is applied to all channels.

[0052] (3) Pitch-invariant speed increase—the process of speeding up the playback rate of a song without affecting its pitch.

[0053] (4) Peak limiting—the process of limiting the maximum signal amplitude to a specified threshold.

[0054] (5) Volume normalization—the process by which the gain of an audio file is increased until its loudest point (or sample) is at maximum level.

[0055] (6) Dynamic range reduction—the process by which the dynamic range of a sound is reduced. Dynamic range is the ratio of the strongest, or loudest, part to the weakest, or softest, part of a sound; it is measured in dB.

[0056] (7) Equalization—the process used to alter the relative balance of frequencies to produce desired tonal characteristics in sounds.

[0057] (8) Remastering—the process of mastering a recording after the first mastering has been done. This may happen when the “master tape” is re-processed because a recording is reissued, or included in a different album. Sometimes an actual mastering house is used, and other times the “mastered” material is sent directly to a duplication facility where they can also do the final few steps. Typical mastering effects include many potential processes of the audio signal such as equalization, compression, limiting, normalization, widening the stereo image, editing fades, and just putting the songs in the correct order.

[0058] (9) Bit rates—the rate, in kbs, at which an original song is compressed by a codec.

[0059] (10) Start time variations—variations in the fingerprint caused by what different players consider the start of a song.

[0060] (11) Different rippers—variations in the fingerprint caused by different rippers (software devices that extract a song from a CD for compression).

[0061] (12) Codecs—variations in the fingerprint caused by different coding and decoding schemes.

[0062] (13) Watermarking—variations in the fingerprints caused by the addition of a watermark.

[0063] (14) Addition of noise—variations in the fingerprint caused by the addition of noise to the audio, from various sources.
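
Several of the manipulations listed above are simple waveform operations, which is why byte-exact identifiers fail while a fingerprint must key on features that survive them. The NumPy sketch below applies a DC offset, phase inversion, volume normalization and peak limiting to an invented test signal; all parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000, endpoint=False)
x = 0.3 * np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)

dc_shifted = x + 0.05                      # (1) DC value added to the waveform
inverted = -x                              # (2) phase inversion (every sample * -1)
normalized = x / np.max(np.abs(x))         # (5) volume normalization to full scale
limited = np.clip(x, -0.2, 0.2)            # (4) peak limiting at a fixed threshold

# Every variant has different sample values, yet all would sound essentially
# like the original, so identification must rely on features robust to them.
for name, y in [("dc", dc_shifted), ("inv", inverted),
                ("norm", normalized), ("limit", limited)]:
    print(name, "byte-identical to original:", bool(np.array_equal(x, y)))
```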
`
[0064] The requirements for being able to deal with legacy content preclude systems based on encryption, watermarking or tagging at source. The requirement to be robust to simple manipulations of the tags precludes tagging systems. This leaves fingerprinting as the only way of meeting most of the requirements.
`
[0065] An additional requirement for some applications is that the entire song be checked to ensure that it is all present and correct. Reasons for this requirement include: (1) quality assurance, where the rights owner of a song, or an artist, may wish to assure that their song is only distributed in its entirety, and (2) prevention of spoofing, which relates to attempts to misrepresent identification, which may be a tactic used to distribute songs illegally over a network. If a fingerprint is taken from a small section of the song, such as near the beginning, someone trying to spoof the system might prepend a section of a legal song onto the front of an illegal song.
`
`[0066] A further aspect of this invention is automatic
`identification and authentication of entire songs.
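
One way to read the whole-song requirement is that fingerprints taken at several offsets across the recording must all resolve to the same reference; if only the opening segment matches (as when a legal song is prepended to an illegal one), authentication fails. The outline below is a hypothetical sketch: `extract_fingerprint` and `lookup` are assumed helpers, not functions defined in this specification.

```python
def authenticate_song(samples, sample_rate, extract_fingerprint, lookup,
                      segment_seconds=15):
    """Check that every segment of a recording matches one reference ID.

    extract_fingerprint(segment) -> fingerprint vector   (assumed helper)
    lookup(fingerprint) -> reference song ID or None     (assumed helper)
    """
    step = segment_seconds * sample_rate
    ids = set()
    for start in range(0, max(len(samples) - step, 1), step):
        fp = extract_fingerprint(samples[start:start + step])
        ids.add(lookup(fp))
    # Authentic only if every segment resolved to the same known recording.
    return len(ids) == 1 and None not in ids
```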
`
[0067] The above aspects can be attained by a method of identifying recordings by extracting at least one candidate fingerprint from at least one portion of an unidentified recording; and searching for a match between at least one value derived from the at least one candidate fingerprint and a value in at least one reference fingerprint among a plurality of reference fingerprints.
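
Read narrowly, the claimed method derives a value from a candidate fingerprint and searches for a match against values of reference fingerprints. The sketch below is only one possible reading: the Euclidean distance, the fixed threshold and the toy reference data are assumptions rather than details taken from the claims.

```python
import math

def euclidean(a, b):
    """Distance between two equal-length fingerprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(candidate, references, threshold=1.0):
    """Return the ID of the closest reference fingerprint within threshold.

    references: dict mapping song ID -> reference fingerprint vector.
    """
    best_id, best_dist = None, float("inf")
    for song_id, ref in references.items():
        d = euclidean(candidate, ref)
        if d < best_dist:
            best_id, best_dist = song_id, d
    return best_id if best_dist <= threshold else None

# Toy reference database and query (invented values).
refs = {"song-a": [0.1, 0.9, 0.4], "song-b": [0.8, 0.2, 0.7]}
print(identify([0.12, 0.88, 0.41], refs))   # -> "song-a"
```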
`
[0068] These together with other aspects and advantages which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
[0069] FIG. 1A is a flowchart of a fingerprint component testing procedure according to an embodiment of the invention.
`
[0070] FIG. 1B is a flowchart of a procedure for fingerprint creation according to an embodiment of the present invention.
`
[0071] FIG. 2 is a flowchart of the procedure for histogram equalization of sound files.

[0072] FIG. 3 is a flowchart of the procedure for band-by-band frequency equalization.
`
`[0073] FIG. 4 is a block diagram of time-frequency
`decomposition to create a matrix of frequency amplitude at
`time intervals in FIG. 1B.
`
[0074] FIG. 5 is a flowchart of the procedure for creating a fingerprint based on a perceptual model of audition.
`
[0075] FIGS. 6A-6C are wavelet-based fingerprints of three songs with three variants each.

[0076] FIGS. 7A and 7B are a flowchart of a procedure for searching a database of reference fingerprints.
`
[0077] FIGS. 8A and 8B are graphs of SRR search parameters overlaid on an example of a fingerprint.

[0078] FIGS. 9A-9D are graphs of the distributions of matches for wave files, blade 128 files, blade 32 files and fhg 128 kb MP3 files.
`
[0079] FIG. 10A is a graph of the efficacy of Search by Range Reduction.

[0080] FIG. 10B is a graph of total error (type 1 + type 2) as a function of the Search by Range Reduction threshold.
`
[0081] FIG. 11 is a flowchart of a procedure for combining fuzzy and exact matches between candidate and reference fingerprints.

[0082] FIG. 12 is a flowchart of a procedure for using fingerprints to identify a stream of music.
`
`
`
`
`[0083] FIG. 13A is a graph of the distance of the closest
`match based on extracting one fingerprint every second from
`a sample song.
`
`[0084] FIG. 13B is a graph of the song ID in the database
`corresponding to the closest match in FIG. 7A.
`
[0085] FIG. 14 is a graph of percentage agreement between machine-extracted and human-extracted breakpoints (accuracy) for 95 songs.
`
[0086] FIG. 15A is a flowchart of a procedure for representing an entire song as a compact vector.

[0087] FIG. 15B is a graph of the procedure illustrated in FIG. 15A.
`
`[0088] FIG. 16 is a simplified block diagram of a syst