(12) Patent Application Publication (10) Pub. No.: US 2003/0028796 A1
`
Roberts et al. (43) Pub. Date: Feb. 6, 2003
`
`
`
`(54) MULTIPLE STEP IDENTIFICATION OF
`RECORDINGS
`
(52) U.S. Cl. 713/193
`
(75) Inventors: Dale T. Roberts, San Anselmo, CA (US); David Hyman, Kensington, CA (US); Stephen White, Berkeley, CA (US)
`
`Correspondence Address:
SUITE 500
WASHINGTON, DC 20001 (US)
`
`(73) Assignee: Gracenote, Inc.
`
(21) Appl. No.: 10/208,189

(22) Filed: Jul. 31, 2002

Related U.S. Application Data

(60) Provisional application No. 60/308,594, filed on Jul. 31, 2001.
`
`Publication Classification
`
(51) Int. Cl.7 H04L 9/32
`
(57) ABSTRACT

Multiple pieces of information are extracted from an unknown recording and from information associated with it. Associated information includes the filename, if the recording is a computer file in, e.g., MP3 format, or table of contents (TOC) data, if the recording is on a removable medium, such as a compact disc. At least one, and preferably several, algorithmically determined fingerprints are extracted from the recording using one or more fingerprint extraction methods. The extracted information is compared with corresponding information in a database maintained for reference recordings. Identification starts with the most accurate and efficient method available, e.g., using a hash ID, a unique ID or text. Fingerprint matching is used to confirm other matches, and validation is performed by comparing the duration of the unknown recording with that of a possibly matching reference recording.
`
[Representative drawing: Master Metadata Database (Title, Artist/Author Name, Owner Name) with Audio Fingerprint DB and Video Fingerprint DB; reference numerals 430, 460, 470]
`
`EX1063
`Roku V. Media Chain
`U.S. Patent No. 10,489,560
`
`
`
`
`Patent Application Publication
`
`Feb. 6, 2003 Sheet 1 of 6
`
`US 2003/0028796 Al
`
[FIG. 1: functional block diagram. Recording 100 is read by Client Device 110, which extracts the filename, unique ID, ID3 tag, duration, metadata and fingerprints; Server 120 performs ID matching, text matching, fingerprint matching (Fingerprint 1 to Fingerprint N) and duration validation; Administrative Interfaces 130 with reporting and analysis tools and usage data; Results with Level of Confidence 140.]
`
`
`
`
`
`
`
`
`
`
`
[FIG. 2: flowchart of fingerprint extraction. Unidentified Recording 100 (CD, DVD, digital file); Fingerprint Extraction Client 110; Recognition System 210; Results 220 (TUID, metadata); Fingerprint Extractor 230; Fingerprint/ID Send Cache 240; Fingerprint Uploader 250; Fingerprint Recognition Server 120 with Fingerprint Receive Cache 260, Fingerprint Validation 270, Fingerprint Stitching 280 and Fingerprint Database 290.]
`
`
`
[FIG. 3: flowchart of a method of recognizing unknown recordings, beginning with client input extraction 310; step labels legible in the scan include hash interfaces, TUID processing, filename and text matching, waveform fingerprint signature(s), and duration validation.]
`
`
`
`
`
`
`
`
[Drawing sheet 4 of 6 (FIGS. 4A-4C): Master Metadata Database records with Title, Artist/Author Name, Owner Name, Hash 410, Audio Fingerprint and Video Fingerprint fields.]
`
`
`
`
`
`
`
`
`
`
[Drawing sheet 5 of 6 (FIGS. 4A-4C): a Client Device sends metadata (filename, ID3 tag), unique ID, hash number, track duration, audio fingerprints and video fingerprints for a Recording to a Recognition Server; Unique ID Matching, Text Matching, Audio Fingerprint 1 to N and Video Fingerprint 1 to N matching each produce a Result with Level of Confidence, followed by Track Duration Validation Matching and a Result Aggregator.]
`
`
`
`
`
`
`
`
`
`
`
[Drawing sheet 6 of 6 (FIGS. 4A-4C): results processing. High-confidence results are stored as recognized in a results database; low-confidence results undergo manual review or heuristics; unrecognized recordings are placed in a holding bin for resubmission to recognition processing.]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`US 2003/0028796 Al
`
`Feb. 6, 2003
`
MULTIPLE STEP IDENTIFICATION OF
RECORDINGS

CROSS-REFERENCE TO RELATED
APPLICATION(S)

[0001] This application is related to and claims priority to U.S. provisional application entitled DIGITAL MUSIC MULTIPLE STEP IDENTIFICATION METHOD AND SYSTEM having serial No. 60/308,594, by Dale T. Roberts, et al., filed Jul. 31, 2001, and incorporated by reference herein.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention is directed to recognition of recordings from their content, and, more particularly, to combining fingerprint recognition with other information about a recording to increase reliability of recognition and to accomplish reliable recognition efficiently by using the least expensive forms of recognition first and layering on more complex forms as needed.

[0004] 2. Description of the Related Art

[0005] There are many uses for recognition of audio (and video) recordings. Many of the uses relate to compensation or control by the rights holders for reproduction and performance of the works recorded. Such systems have become more important since the development of file sharing software, such as Napster, and the many other similar services available at the end of the twentieth century and the beginning of the twenty-first century. Although the need for accurate recognition has been significant for several years, no system has been successful in meeting this need.

[0006] Another use of recording recognition is to provide added value to users when listening to (or watching) recordings. One example is the CDDB Music Recognition Service from Gracenote, Inc. of Berkeley, Calif., which recognizes compact discs (CDs) and supplies information regarding a recognized CD, such as album name, artist, track names and access to related content on the Internet, including album covers, artist and fan websites, etc. While the CDDB service is effective for recognizing compact discs, there are several drawbacks in using it to recognize files that are not stored on a removable disc, such as a CD or DVD.

[0007] All audio fingerprinting techniques have “blind spots”, places where a system using that technique sees similarities and differences in audio where it shouldn’t. Because they rely on just one fingerprinting technique, single-source solutions are less accurate when they encounter a “blind spot”.

[0008] One of the more popular uses for the Gracenote CDDB system is in applications that digitally encode audio files into MP3 and other formats. These encoding applications utilize Gracenote’s CDDB service to recognize the compact disc being encoded and to write the correct metadata into the title and ID tags. Gracenote’s CDDB service returns a unique ID (TUID) for each track and supports the insertion of such IDs in the ID3V2 tags for MP3 files. The TUID is both hashed and proprietary, and can only be read by the Gracenote system. However, the ID3V2 tags can easily be manipulated to store the TUID for one file in the ID3V2 tag for another file; therefore, the TUID alone is not a reliable identifier of the audio content in a file.

[0009] Gracenote’s CDDB service also provides a text matching capability that can be utilized to identify digital audio files from their file names, file paths, ID tags (titles), etc. by matching the text extracted by a client device against a metadata database of track, artist, and album names. Although this text matching utilizes user-generated spelling variants associated with each record to improve recognition, there has been no way to verify that the text matches the audio content of the recording once the recording has been separated from a compact disc and stored in a file in any format.
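
The text matching described in [0009] can be sketched as a lookup of normalized text against an index of user-submitted spelling variants. This is a minimal illustration, not Gracenote's implementation: the normalization rules, the filename clean-up, and record IDs such as `TUID-1` are assumptions made for the example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def build_variant_index(records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized spelling variant to the record IDs that list it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, variants in records.items():
        for variant in variants:
            index[normalize(variant)].add(record_id)
    return index


def match_filename(filename: str, index: dict[str, set[str]]) -> set[str]:
    """Drop the extension and any leading track number, then look up the text."""
    stem = re.sub(r"\.[A-Za-z0-9]+$", "", filename)  # "01 - x.mp3" -> "01 - x"
    stem = re.sub(r"^\d+\s*[-_.]\s*", "", stem)      # "01 - x" -> "x"
    return set(index.get(normalize(stem), set()))
```

Because every variant is normalized the same way as the query, underscores, accents and punctuation in a filename do not prevent a match; as [0009] notes, nothing here verifies that the text actually describes the audio.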
`
SUMMARY OF THE INVENTION
`
[0010] An aspect of the present invention is maximizing identification of recordings while minimizing resource usage.

[0011] Another aspect of the present invention is using multiple identification methods so that resource-intensive methods, such as audio fingerprinting, are employed only when necessary.

[0012] A further aspect of the invention is minimizing the processing of unidentified data.

[0013] Yet another aspect of the present invention is to use the least expensive recognition technique first, with progressively more expensive recognition techniques layered onto the process until a desired confidence level is reached.

[0014] A still further aspect of the invention is validation of content-based identification of a recording by comparing text associated with an unidentified recording and text associated with identification records.

[0015] Yet another aspect of the present invention is use of recording identification methods from different sources to increase reliability.

[0016] A still further aspect of the invention is validation of content-based recording identification using fuzzy track length analysis.

[0017] Yet another aspect of the invention is automatic extraction of identification data for use in a reference database and for identification of recordings.

[0018] A still further aspect of the invention is that unidentified recordings are periodically re-run through the system to determine whether recently added data or recently improved techniques will result in recognition.
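
Aspect [0013] describes layering progressively more expensive techniques until a desired confidence is reached. The sketch below is one way to express that control flow; the `Recognizer` interface, the relative costs, the 0.95 target, and the rule for combining agreeing confidences are all illustrative assumptions, not values taken from this application.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    record_id: Optional[str]  # matched reference recording, or None if no match
    confidence: float         # 0.0 to 1.0
    method: str               # which recognizer(s) produced the result


# Hypothetical interface: a recognizer inspects the extracted data and returns a Result.
Recognizer = Callable[[dict], Result]


def identify(extracted: dict,
             recognizers: list[tuple[float, str, Recognizer]],
             target_confidence: float = 0.95) -> Result:
    """Run recognizers from cheapest to most expensive, stopping once confident."""
    best = Result(None, 0.0, "none")
    for _cost, name, recognize in sorted(recognizers, key=lambda entry: entry[0]):
        result = recognize(extracted)
        if result.record_id is None:
            continue
        if result.record_id == best.record_id:
            # Agreeing evidence, combined as if the methods erred independently (assumption).
            combined = 1.0 - (1.0 - best.confidence) * (1.0 - result.confidence)
            best = Result(best.record_id, combined, best.method + "+" + name)
        elif result.confidence > best.confidence:
            best = Result(result.record_id, result.confidence, name)
        if best.confidence >= target_confidence:
            break  # skip the remaining, more expensive recognizers
    return best
```

Sorting by cost is what keeps fingerprinting from running when a cheap unique-ID or text match, confirmed by one more method, already clears the target.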
`
[0019] The above aspects can be attained by a method of identifying recordings by extracting information about an unknown recording stored in media possessed by a user and at least one algorithmically determined fingerprint from at least one portion of the unknown recording; determining a possible identification of the unknown recording using at least one piece of the information extracted from the unknown recording and an identification database of corresponding information for reference recordings; and identifying the unknown recording when the possible identification based on each of the at least one piece of the information in combination with the at least one algorithmically determined fingerprint identifies a single reference recording with respective confidence levels. The at least one portion of the unknown recording may contain audio, video or both.
`
[0020] Preferably, the database is maintained by a provider of identification services which supplies unique identifiers that can be recognized only by servers under the control of the provider of identification services. The unique identifiers are associated with recordings once they have been identified. Subsequently, copies of the recordings are recognized using the unique identifiers, which greatly speeds up the process. The unique identifiers optionally are cached in high-speed RAM or specially indexed database tables.
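
[0020] mentions caching unique identifiers in high-speed RAM. A minimal sketch of such a cache, assuming a hypothetical slower `lookup` function (for example, a database query) that maps a unique ID to a reference record. Misses are deliberately not cached, so reference records added later become visible on the next lookup.

```python
from collections import OrderedDict
from typing import Callable, Optional


class UniqueIdCache:
    """In-memory LRU cache in front of a slower unique-ID lookup."""

    def __init__(self, lookup: Callable[[str], Optional[dict]], capacity: int = 100_000):
        self._lookup = lookup    # slow path, e.g. an indexed database table (hypothetical)
        self._capacity = capacity
        self._entries: "OrderedDict[str, dict]" = OrderedDict()

    def get(self, unique_id: str) -> Optional[dict]:
        if unique_id in self._entries:
            self._entries.move_to_end(unique_id)  # mark as most recently used
            return self._entries[unique_id]
        record = self._lookup(unique_id)
        if record is None:
            return None                           # do not cache misses
        self._entries[unique_id] = record
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)     # evict the least recently used entry
        return record
```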
`
[0021] When non-waveform data is not available for an unknown recording, the unknown recording is preferably identified by extracting fingerprints from at least one portion of the unknown recording using a plurality of algorithms; determining a possible identification of the unknown recording using at least two of the fingerprints extracted from the unknown recording and at least one database of correspondingly generated fingerprints for reference recordings; and identifying the unknown recording when the possible identification based on each of the fingerprints identifies a single reference recording with respective confidence levels.
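
[0021] requires that at least two fingerprints, each produced by a different algorithm and matched against its own reference database, identify a single reference recording. A sketch of that decision rule, assuming each algorithm has already produced its best candidate and a confidence score; the 0.8 threshold and the algorithm names are placeholders.

```python
from typing import Optional


def consensus(matches: dict[str, tuple[Optional[str], float]],
              min_agreeing: int = 2,
              min_confidence: float = 0.8) -> Optional[str]:
    """Return a reference ID only if enough fingerprint algorithms confidently agree.

    `matches` maps an algorithm name to (best reference ID or None, confidence)
    from that algorithm's own reference-fingerprint database.
    """
    confident = [ref for ref, conf in matches.values()
                 if ref is not None and conf >= min_confidence]
    if len(set(confident)) != 1:
        return None  # nothing confident, or algorithms point to different recordings
    if len(confident) < min_agreeing:
        return None  # a single algorithm alone is not enough
    return confident[0]
```

Requiring agreement between algorithms is how the multi-technique approach covers a "blind spot" in any one fingerprinting method.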
`
[0022] Preferably, an existing database used to identify recordings possessed by users, which does not contain fingerprint information, is expanded by obtaining non-waveform data associated with a recording possessed by a user of the database; extracting at least one fingerprint from at least one portion of the recording; and storing the at least one fingerprint as identifying information for the recording when a match is found in the database for the non-waveform data. For example, while digital music files are being encoded from an audio CD possessed by a user, a recognition system can be used to identify the audio CD so that fingerprints extracted during the encoding process can be directly associated with the audio CD using a unique ID system.
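
[0022] grows a fingerprint database from an existing non-waveform (TOC-based) database. A minimal sketch, assuming a hypothetical in-memory `toc_index` keyed by a (disc ID, track number) pair; a real system would use the provider's own identifiers and storage.

```python
from typing import Optional

TrackKey = tuple[str, int]  # (disc ID derived from the TOC, track number): hypothetical key


def expand_reference_db(toc_index: dict[TrackKey, str],
                        fingerprint_db: dict[str, list[bytes]],
                        key: TrackKey,
                        fingerprint: bytes) -> Optional[str]:
    """Store a fingerprint only if the non-waveform data identified the track."""
    record_id = toc_index.get(key)
    if record_id is None:
        return None  # no match for the non-waveform data: discard the fingerprint
    fingerprint_db.setdefault(record_id, []).append(fingerprint)
    return record_id
```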
`
[0023] Recognition of recordings using either fingerprints or unique identifiers is preferably validated by other information maintained in the identification database, such as the length of the recording or a numeric identifier embedded within the recording. Information about recordings that do not pass validation, or that match some but not all of the information used for identification, may be stored for later analysis of the reason for the error. If the fingerprints are obtained as described above, there may have been an error in obtaining the fingerprint. Therefore, errors may be output to an operator, or the system could correct the information stored in the database based on recognition of patterns in the information that is stored for improper matches. For example, if a large percentage of matching fingerprints are stored but the other information consistently does not match them, there could be an error in the fingerprint database which needs to be flagged to an operator.
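
[0023] suggests storing validation failures and flagging a likely fingerprint-database error when fingerprint matches are consistently contradicted by other information. A sketch of that bookkeeping; the minimum event count and the 50% failure rate are arbitrary example thresholds, and `ValidationLog` is a hypothetical name.

```python
from collections import defaultdict


class ValidationLog:
    """Count fingerprint matches and how often their other information failed validation."""

    def __init__(self, min_events: int = 20, max_failure_rate: float = 0.5):
        self.min_events = min_events
        self.max_failure_rate = max_failure_rate
        self.matches: dict[str, int] = defaultdict(int)
        self.failures: dict[str, int] = defaultdict(int)

    def record(self, reference_id: str, other_info_matched: bool) -> None:
        """Log one fingerprint match and whether duration, text, etc. agreed."""
        self.matches[reference_id] += 1
        if not other_info_matched:
            self.failures[reference_id] += 1

    def suspect_references(self) -> list[str]:
        """References whose stored fingerprint may be wrong, to flag to an operator."""
        return sorted(
            ref for ref, count in self.matches.items()
            if count >= self.min_events
            and self.failures[ref] / count > self.max_failure_rate
        )
```

The minimum event count keeps a handful of bad uploads from flagging an otherwise sound reference fingerprint.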
`
[0024] The present invention includes a system for identifying recordings that includes an extraction unit to extract information about an unknown recording stored in media possessed by a user and at least one algorithmically determined fingerprint from at least one portion of the unknown recording; and an identification unit, coupled to the extraction unit, to make a possible identification of the unknown recording using at least one piece of the information extracted from the unknown recording and an identification database of corresponding information for reference recordings, and to identify the unknown recording when the possible identification based on each of the at least one piece of the information in combination with the at least one algorithmically determined fingerprint identifies a single reference recording with respective confidence levels.
`
[0025] The present invention also includes a system for identifying recordings that includes an extraction unit to extract fingerprints from at least one portion of an unknown recording using a plurality of algorithms, and an identification unit, coupled to said extraction unit, to make a possible identification of the unknown recording using at least two of the fingerprints extracted from the unknown recording and at least one database of correspondingly generated fingerprints for reference recordings, and to identify the unknown recording when the possible identification based on each of the fingerprints identifies a single reference recording with respective confidence levels.
`
[0026] In either of the systems described above, the extraction unit is typically a client unit connected by a network, such as the Internet, to at least one server acting as the identification unit. The client device may be a personal computer with a drive accessing the recording, a consumer electronics device with a network connection, or a server computer transmitting the unknown recording from one location to another. Furthermore, a portion of the database may be available locally, and the extraction unit and identification unit may reside in the same device and share components.
`
[0027] The present invention also includes a system for obtaining reference information stored in a database used to identify unknown recordings, including a receiving unit to obtain non-waveform data associated with a recording possessed by a user of the database for identification of recordings possessed by the user; an extraction unit to extract at least one fingerprint from at least one portion of the recording; and a storage unit, coupled to said receiving unit and said extraction unit, to store the at least one fingerprint as identifying information for the recording when a match is found in the database for the non-waveform data.
`
[0028] These, together with other aspects and advantages which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout.

BRIEF DESCRIPTION OF THE DRAWINGS
`
[0029] FIG. 1 is a functional block diagram of a system according to the present invention.

[0030] FIG. 2 is a flowchart of fingerprint extraction according to the present invention.

[0031] FIG. 3 is a flowchart of a method of recognizing unknown recordings.

[0032] FIGS. 4A-4C are a block diagram of a system according to the present invention.

`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
[0033] According to the present invention, a suite of identification components is provided in a system like that illustrated in FIG. 1 to facilitate analysis and identification of audio (and video) files utilizing multiple methods. Preferably, an existing database 90 containing recording identifiers and text data is combined with text-based digital audio and audio fingerprinting identification methods. Preferably, the text data in database 90 is obtained from user submissions and includes user-submitted spelling variants. One such database is available as the CDDB Music Recognition Service from Gracenote, Inc.
`
[0034] As illustrated in FIG. 1, a recording 100 is accessed by client device 110 via any conventional method, such as reading a digital audio file from a hard drive or a compact disc. Information is extracted from recording 100 and from associated information (metadata). Fingerprints are extracted from recording 100, as described in more detail below. The information that is extracted from the metadata includes the duration of the recording, which is the track length (from the TOC) for a CD track; the filename and ID3 tag if the recording is in an MP3 file; and the table of contents (TOC) data if the recording is on a CD. If the file containing the recording was produced by a client device operating according to the invention, a unique ID will be extracted from the ID3 tag, but initially it will be assumed that this information is not available.
`
[0035] In an exemplary embodiment, the extracted information is sent from client 110 to server 120 to determine a possible identification of the unknown recording using at least one piece of the information extracted from recording 100 and a database 130 of correspondingly generated fingerprints for reference recordings. If text or a unique ID was extracted, an attempt is made to find a match. If a match is found using the text or unique ID, at least one algorithmically determined fingerprint is compared with the fingerprint(s) stored in the matching records to determine whether there is a single reference recording that matches the information extracted from recording 100, with respective confidence levels for each item of information that matches. If no matches can be found based on text and unique ID, an attempt is made to identify a single reference recording using at least two of the fingerprints extracted from recording 100. If a single reference recording is located using either method, the duration of recording 100 is preferably compared with the duration of the single reference recording as a final validation step.
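
The flow in [0035] (a text or unique-ID match confirmed by a fingerprint, otherwise at least two fingerprints naming one reference, then a duration check) can be sketched with plain dictionaries standing in for the real matchers. Exact dictionary lookups replace fuzzy text and fingerprint matching here, the dictionary keys are hypothetical, and the duration tolerance (2 seconds or 1%, whichever is larger) is an assumed value for the fuzzy track length analysis of [0016].

```python
from typing import Optional


def durations_agree(unknown_s: float, reference_s: float,
                    abs_tol_s: float = 2.0, rel_tol: float = 0.01) -> bool:
    """Fuzzy track-length check: allow 2 s or 1% difference, whichever is larger."""
    return abs(unknown_s - reference_s) <= max(abs_tol_s, rel_tol * reference_s)


def identify(unknown: dict,
             text_index: dict[str, str],
             fingerprint_index: dict[str, str],
             durations: dict[str, float]) -> Optional[str]:
    """Return a reference ID, or None if the recording stays unidentified."""
    fp_refs = [fingerprint_index.get(fp) for fp in unknown.get("fingerprints", [])]
    candidate = text_index.get(unknown.get("text_or_uid", ""))
    if candidate is not None:
        # Text/unique-ID match: require at least one fingerprint to confirm it.
        # (This sketch does not fall back to fingerprint-only matching; an assumption.)
        if candidate not in fp_refs:
            return None
    else:
        # No text/ID match: require at least two fingerprints naming one reference.
        hits = [ref for ref in fp_refs if ref is not None]
        if len(hits) < 2 or len(set(hits)) != 1:
            return None
        candidate = hits[0]
    # Final validation step: compare durations.
    if not durations_agree(unknown["duration"], durations[candidate]):
        return None
    return candidate
```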

[0036] Preferably, related metadata is used for validation of the match obtained by fingerprint recognition. Like any recognition system, fingerprinting can produce erroneous results. Without a validation component, such an error can propagate throughout the system and return erroneous data to large percentages of users. The use of validation criteria such as track length comparison enables the system to catch potential errors and flag them for validation.

[0037] A system according to the present invention preferably includes custom result reporting and flexible administrative interfaces 130 to enable weighting of the various identification methods and setting the order of their engagement. Analysis of successful match rates for specific identification methods allows an administrator to adjust the identifying criteria for each component to maximize the identification probability. A system according to the present invention preferably incorporates usage data from over 28 million users utilizing the CDDB database via the Gracenote Data Services division to help guide results 140.
`
[0038] The flexibility of a system according to the present invention allows different configurations to be used for identifying recordings in different environments. An application that monitors streaming audio, for example, requires a very different system and solution architecture than one that identifies files in a peer-to-peer system or one that identifies analog input. However, the present invention can be configured for identification of recordings in each of these situations.
`
[0039] A system according to the present invention maximizes identification while minimizing resource usage. The use of multiple identification methods ensures that more resource-intensive methods, such as audio fingerprinting, are employed only when necessary. The use of multiple audio fingerprinting technologies reduces data collisions and covers any “blind spots” in a given audio fingerprinting technology. The “blind spots” found in single-source fingerprinting systems are avoided by using multiple sources for different fingerprinting techniques. This also provides the ability to fine-tune deployment for specific target applications.
`
`[0040] Preferably, fingerprints are obtained using multiple
`fingerprint recognition services using the method illustrated
`in FIG. 2. This increases the ability of the system to
`accurately recognize recordings of various types.
`
[0041] As illustrated in FIG. 2, when unidentified (unknown) recording 100 is accessed by fingerprint extraction client 110, conventional TOC/file recognition is performed, if possible, by recognition system 210, and results 220 are returned to fingerprint client 110. Results 220 include a unique identifier (TUID) that points into a master metadata database (not shown in FIG. 2), if the TUID is found. Recording 100 is also processed by fingerprint extractor 230 using at least one, and preferably several, different algorithmically derived fingerprint extraction systems to obtain fingerprint(s), which are stored in fingerprint/ID send cache 240. As described below in more detail, instructions are received regarding when fingerprint uploader 250 should send the fingerprints to fingerprint recognition server 120.
`
[0042] In fingerprint recognition server 120, the fingerprints transmitted by fingerprint uploader 250 are initially stored in fingerprint receive cache 260. The fingerprints then undergo fingerprint validation 270 using an algorithmic comparator that attempts to cross-correlate fingerprints for a recording with fingerprints uploaded and extracted by different end users. If the fingerprints are found to be substantially similar, they are validated. This is not the only method available for validation, but it serves as one example of a process that could be used to reject bad data.
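
[0042] describes validating an uploaded fingerprint by cross-correlating it with fingerprints of the same recording uploaded by other users. The sketch assumes fingerprints are sequences of 32-bit sub-fingerprint words compared by bit agreement, a common representation but not one this application specifies; the 0.85 threshold and the shift range are placeholders.

```python
def similarity(a: list[int], b: list[int], max_shift: int = 3) -> float:
    """Best fraction of equal bits between two streams of 32-bit sub-fingerprints.

    Small relative shifts are tried to absorb timing differences between users'
    copies; shifts leaving less than half of the shorter stream overlapping are skipped.
    """
    min_overlap = max(1, min(len(a), len(b)) // 2)
    best = 0.0
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(a[i], b[i + shift]) for i in range(len(a)) if 0 <= i + shift < len(b)]
        if len(pairs) < min_overlap:
            continue
        differing = sum(bin(x ^ y).count("1") for x, y in pairs)
        best = max(best, 1.0 - differing / (32 * len(pairs)))
    return best


def validate(candidate: list[int], from_other_users: list[list[int]],
             threshold: float = 0.85) -> bool:
    """Accept a fingerprint if it is substantially similar to another user's upload."""
    return any(similarity(candidate, other) >= threshold for other in from_other_users)
```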
`
[0043] In this embodiment, fingerprints that are determined to be valid and related undergo stitching 280. For example, if fingerprints are taken from 30-second segments of the recording, the fingerprints are assembled into a continuous fingerprint stream. This could simplify recognition of segments of the recording. The resulting fingerprints are stored in fingerprint database 290 associated with existing database 90 (FIG. 1).
`
[0044] The CDDB database has in part been generated through user submissions to create a metadata database with over 12 million tracks and 900,000 albums as of mid-2002. This database contains both basic metadata (artist, album, and track names) and extended data (genre, label, etc.).
`
[0045] A similar distributed collection method may be utilized in the creation of a waveform database using the system illustrated in FIG. 1. In the case where recording 100 is a raw audio waveform, e.g., when a CD is encoded into another format, such as an MP3 file, client device 110 obtains non-waveform data associated with recording 100, which is possessed by a user of database 90, and executes extraction algorithm(s) to extract fingerprints from at least one portion of the recording. The fingerprints are then sent to server 120 with a unique ID, preferably derived from the TOC of the CD. When the unique ID is available, i.e., when a match is found in the database for the non-waveform data, server 120 is able to associate the appropriate metadata in database 90 with the fingerprint(s) with the same level of accuracy as identification of CDs by the existing database 90, which is provided for identification of recordings possessed by users. Fingerprints dynamically gathered in this manner may be sent to a fingerprint collection server (not shown in FIG. 1), which would accumulate fingerprints from authenticated clients, as described in more detail below, prior to storing the at least one fingerprint as identifying information for the recording.
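
[0045] says only that the unique ID is preferably derived from the TOC of the CD. As one well-known example of such a derivation, the sketch computes the classic freedb/CDDB1 32-bit disc ID from track start offsets and the lead-out offset; whether the unique ID described here uses this exact formula is not stated in the application.

```python
def _digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb1_disc_id(track_offsets: list[int], leadout_offset: int) -> int:
    """Classic freedb/CDDB1 disc ID from a CD table of contents.

    Offsets are in CD frames (75 per second) and include the 150-frame
    lead-in, as reported by most CD-ROM drives.
    """
    checksum = sum(_digit_sum(offset // 75) for offset in track_offsets)
    total_seconds = leadout_offset // 75 - track_offsets[0] // 75
    return ((checksum % 0xFF) << 24) | (total_seconds << 8) | len(track_offsets)
```

For a two-track disc with tracks starting at frames 150 and 15000 and the lead-out at frame 30000, the ID is `04018e02` in the usual eight-digit hex form.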
`
[0046] Multiple fingerprint gathering extractors can also be run over a set of static waveforms from a commercial encoder such as Loudeye or Muse. The challenge with this approach is associating the fingerprints with the appropriate metadata. The method described above enables audio fingerprints to be logically associated with parent records and associated back to the original audio source. In the preferred embodiment, the unique ID provides differentiation between live and studio versions of the same song while simultaneously linking those records to the same artist and their respective albums.
`
[0047] Preferably, server(s) 120 store information in parallel record sets that are linked by unique IDs. When client 110 asks server 120 to recognize media (CD, digital audio file, video file), server 120 may also return a record describing how fingerprints should be gathered for this particular CD. This is called the Gathering Instructions Record (GIR). The GIR may include a set of instructions that the remote fingerprint gathering code follows. The record may be pre-computed in off hours or may be dynamically computed at the time of recognition.
`
[0048] Server 120 may use information it knows about the popularity of a CD to drive decisions about gathering. Everything about a rare CD could be gathered, because the opportunity to get its fingerprints should not be missed (even if gathering is somewhat burdensome to the user). The opposite could be true for a very popular CD: the load may be distributed across many users so that they would not even notice that any fingerprint gathering was occurring.
`
[0049] The rules and procedures for building the GIR may be manual or automated and may change over time. They may also be applied uniquely to specific users, applications or geographic locations.
`
[0050] In one embodiment, the server dynamically gathers fingerprints by modifying the GIR to remove fingerprints that have been gathered previously. The frequency of updating GIRs may vary from instantaneous to delays of days, weeks or months. Some example instructions that may be included in the GIR are:

[0051] A list of tracks and segments to be gathered and their priority.

[0052] A fingerprint generator algorithm to use.

[0053] Parameters that tell the fingerprint generator how to process the fingerprint, such as:

[0054] Frequency of audio samples

[0055] Bands of the frequency domain to process

[0056] Resolution of the fingerprint

[0057] Desired quality of audio

[0058] When to do the fingerprint gathering, such as:

[0059] Before encoding the track

[0060] After encoding the track

[0061] In parallel with encoding the track

[0062] Instructions for caching the fingerprint and when to transmit it back to the server, such as:

[0063] Before encoding the track

[0064] After encoding the track

[0065] After the CD has been fully encoded

[0066] When the communication channel back to the server is not busy

[0067] When the next CD is looked up

[0068] When a group of fingerprints is ready for transmission

[0069] Instructions to take CPU power into account during the process so as not to overload the computer
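
The GIR described in [0047] through [0069] is essentially a small instruction record. A sketch of one possible shape, and of a builder that applies two of the rules described above (drop segments already gathered; spread the work thinly across users for a popular CD). Every field name, default value, and threshold here is hypothetical, and priority is simply track order.

```python
from dataclasses import dataclass


@dataclass
class GatheringInstructions:
    """Hypothetical Gathering Instructions Record (GIR)."""
    segments: list[tuple[int, int]]      # (track, segment index) in priority order
    algorithm: str = "fingerprint-v1"    # fingerprint generator to use (placeholder)
    sample_rate_hz: int = 11025          # processing parameters (illustrative)
    resolution_bits: int = 32
    gather_when: str = "parallel_with_encoding"
    upload_when: str = "after_cd_encoded"
    max_cpu_fraction: float = 0.25       # avoid overloading the user's computer


def build_gir(track_count: int, segments_per_track: int,
              already_gathered: set[tuple[int, int]],
              lookups_per_month: int, user_slot: int,
              popular_threshold: int = 10_000,
              per_user_quota: int = 2) -> GatheringInstructions:
    """Omit previously gathered segments; give each user a small slice for popular CDs."""
    missing = [(t, s)
               for t in range(1, track_count + 1)
               for s in range(segments_per_track)
               if (t, s) not in already_gathered]
    if lookups_per_month >= popular_threshold and missing:
        # Popular CD: rotate a small, user-specific slice of the remaining work.
        quota = min(per_user_quota, len(missing))
        start = (user_slot * quota) % len(missing)
        missing = (missing + missing)[start:start + quota]
    return GatheringInstructions(segments=missing)
```

A rare CD gets every missing segment in one GIR, matching the "gather everything" rule of [0048], while each user of a popular CD is asked for only a couple of segments.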
`
[0070] Preferably, the system attempts to improve the quality of the fingerprints during operation. The quality of the source signal and the parameters used for fingerprinting, along with improvements in the fingerprinting algorithms, result in a complex quality matrix that is used by server 120 to determine which fingerprints to gather if higher quality is available. An example of source quality is provided below. Preferably, database 90 or a similar database maintained by fingerprint collection server(s) stores the source quality for fingerprints stored in the database, so that when a fingerprint from a higher quality source is available, the fingerprint may be replaced.
`
Source Quality Table

Name                Error Correction   Sample/Bit Rate   Compression   Quality Index
CD_Audio_HEC        Hardware           44,100 Hz         None
CD_Audio_SEC        Software           44,100 Hz         None
CD_Audio            None               44,100 Hz         None
CDR_Audio           None               44,100 Hz         None
CDR_Made_From_MP3   None               44,100 Hz         mp3
MP3_File            None               160 kbps          mp3
`
[0071] Dynamically gathered fingerprints may contain information that helps validate quality. Information such as errors encountered while reading from the media may be sent to the fingerprint collector. The system may reject fingerprints that had high error rates from the source media.
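
[0070] and [0071] describe keeping the fingerprint from the best available source and rejecting fingerprints read with high error rates. A sketch of that replacement rule; because the Quality Index values in the scanned table are not legible, the ranking below simply assumes the table rows run from best to worst, and the 1% error-rate limit is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ranking, higher is better (table row order taken as best to worst).
QUALITY_RANK = {
    "CD_Audio_HEC": 6, "CD_Audio_SEC": 5, "CD_Audio": 4,
    "CDR_Audio": 3, "CDR_Made_From_MP3": 2, "MP3_File": 1,
}


@dataclass
class Submission:
    fingerprint: bytes
    source: str             # a key of QUALITY_RANK
    read_error_rate: float  # fraction of the media that produced read errors


def choose(stored: Optional[Submission], new: Submission,
           max_error_rate: float = 0.01) -> Optional[Submission]:
    """Return the submission to keep for a recording."""
    if new.read_error_rate > max_error_rate:
        return stored  # reject fingerprints read with high error rates
    if stored is None or QUALITY_RANK[new.source] > QUALITY_RANK[stored.source]:
        return new     # first fingerprint, or one from a higher-quality source
    return stored
```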
`
[0072] As noted above, instead of immediately storing a fingerprint, multiple fingerprints for a recording may be gathered by a fingerprint collection server before being added to the database. These fingerprints may be compared algorithmically to determine their correlation. If the correlation is not adequate, additional fingerprints may be gathered until adequate correlation is achieved, and then one of the fingerprints or a composite fingerprint is stored in the database. This prevents bad fingerprints from becoming part of the database.
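
[0072] gathers several fingerprints of a recording, checks their correlation, and stores one of them or a composite. The sketch below assumes equal-length, already aligned streams of 32-bit words, measures correlation as mean pairwise bit agreement, and builds the composite by bitwise majority vote; these are illustrative choices, not the application's stated method.

```python
from itertools import combinations
from typing import Optional


def bit_agreement(a: list[int], b: list[int]) -> float:
    """Fraction of equal bits between two aligned 32-bit word streams."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a[:n], b[:n]))
    return 1.0 - differing / (32 * n)


def composite_if_correlated(samples: list[list[int]],
                            min_agreement: float = 0.9,
                            min_samples: int = 3) -> Optional[list[int]]:
    """Return a majority-vote composite, or None meaning "keep gathering"."""
    if len(samples) < min_samples:
        return None
    pairs = list(combinations(samples, 2))
    mean = sum(bit_agreement(a, b) for a, b in pairs) / len(pairs)
    if mean < min_agreement:
        return None
    length = min(len(s) for s in samples)
    composite = []
    for k in range(length):
        word = 0
        for bit in range(32):
            votes = sum((s[k] >> bit) & 1 for s in samples)
            if 2 * votes > len(samples):  # strict majority sets the bit
                word |= 1 << bit
        composite.append(word)
    return composite
```

The majority vote lets isolated bit errors in any single upload cancel out, which is one way a composite can be better than each individual fingerprint.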
`
[0073] Stitching of the segmented fingerprints may be necessary since slight variations in timing could result in overlap of the fingerprints. Algorithmic stitching could result in a higher quality continuous fingerprint. Simple stitching appends segmented fingerprints in order of appearance in the recording. Complex stitching could involve scaling fingerprints of different qualities to the lowest common denominator and then appending them in order of their appearance in the recording. Preferably, some form of mathematical fitting is utilized if the fingerprint segmentation contains jitter, so that appending is a fuzzy process rather than simple addition of the data stream.
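
[0073] (and stitching 280 in [0043]) joins segment fingerprints in recording order while absorbing small timing overlaps. A sketch of a simple fuzzy stitch, assuming each segment is a list of 32-bit words and that overlaps are at most a few words; it removes the overlap whose bits agree best and otherwise appends the segment unchanged. The limits are placeholders, and this does not attempt the quality scaling the paragraph mentions.

```python
def _mismatch(a: list[int], b: list[int]) -> float:
    """Fraction of differing bits between two equal-length 32-bit word lists."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b)) / (32 * len(a))


def stitch(segments: list[list[int]], max_overlap: int = 8,
           max_mismatch: float = 0.15) -> list[int]:
    """Join per-segment fingerprints in recording order, removing timing overlap.

    For each new segment, overlaps of 1..max_overlap words are tried and the one
    whose tail/head bits agree best (within max_mismatch) is dropped; if none
    agrees well enough, the segment is simply appended.
    """
    stream: list[int] = []
    for seg in segments:
        best_k, best_err = 0, max_mismatch
        for k in range(1, min(max_overlap, len(stream), len(seg)) + 1):
            err = _mismatch(stream[-k:], seg[:k])
            if err <= best_err:
                best_k, best_err = k, err
        stream.extend(seg[best_k:])
    return stream
```

With `max_mismatch` set to 0 this degenerates to exact-overlap removal; a small positive value is what makes the appending "fuzzy" in the sense of [0073].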
`
[0074] One example of audio fingerprinting that can be used is described in the U.S. patent application entitled Automatic Identification of Sound Recordings, filed by Maxwell Wells et al. on Jul. 22, 2002 and incorporated herein by reference. However, any known algorithmically derived fingerprinting technique may be used, not only for digital audio, but also for video, TV programs (both analog and digital) and DVDs. Appropriate identifiers and recognition techniques will be used for the media to be recognized in a particular application.
`
`access to them, to determine if the files are allowed in the
`system, a process known as “filter-in”.
`
`[0077] Client device 110 (FIG. 1) extracts information
`310 (FIG. 3) from an audio file at the time of upload to
`server 120 (FIG. 1). The extracted information preferably
`includes non-waveform data, such as a unique ID, ID3 tag,
`filename text data, track duration, et