`
`
`
`Exhibit 23
`
`
`
`
`
`
`
`
`
`
`(12) United States Patent
`WOld
`
`USOO6968337B2
`(10) Patent No.:
`US 6,968,337 B2
`(45) Date of Patent:
`Nov. 22, 2005
`
`(54) METHOD AND APPARATUS FOR
`IDENTIFYING ANU OWN WORK
`N
`NGAN UNKNOWN
`(75) Inventor: Erling H. Wold, El Cerrito, CA (US)
`
`(73) Assignee: Audible Magic Corporation, Los
`Gatos, CA (US)
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 458 days.
`
`(*) Notice:
`
`EP
`EP
`WO
`WO
`
`
`
`(Continued)
`FOREIGN PATENT DOCUMENTS
`O402210 A 12/1990 ........... GO6F/11/08
`O517405 A2 * 12/1992 ............ GO7C/9/00
`WOO123981 A1 * 4/2001
`............. GO6F/1/OO
`WO O2/15035 A2
`2/2002 ........... G06F/17/OO
`OTHER PUBLICATIONS
`L. Baum et al., A Maximization Technique Occuring in the
`Statistical Analysis of Probabilistic Functions of Markov
`Chains, The Annals of Mathematical Statistics, vol. 41, No.
`1 pp. 164-171, 1970 (no month).
`(Continued)
`Primary Examiner Frantz Coby
`(74) Attorney, Agent, or Firm-Sierra Patent Group, Ltd.
`(57)
`ABSTRACT
`A System for determining an identity of a received work. The
`System receives audio data for an unknown work. The audio
`data is divided into Segments. The System generates a
`Signature of the unknown work from each of the Segments.
`portion of the Signatures. The reduced dimension signatures
`are then compared to reduced dimensions Signatures of
`known works that are Stored in a database. A list of candi
`dates of known works is generated from the comparison.
`The Signatures of the unknown works are then compared to
`the Signatures of the known works in the list of candidates.
`The unknown work is then identified as the known work
`having Signatures matching within a threshold.
`
`Reduced dimension signatures are then generated at least a
`
`(21) Appl. No.: 10/192,783
`(22) Filed:
`Jul. 9, 2002
`(65)
`Prior Publication Data
`US 2003/0023852 A1 Jan. 30, 2003
`O
`O
`Related U.S. Application Data
`(60) Provisional application No. 60/304,647, filed on Jul. 10,
`2001.
`(51) Int. Cl. .................................................. G06F 7700
`(52) U.S. CI.
`707/100; 707/101; 707/102;
`(58) Field of Search
`707/100, 101
`707/102,103 R, 104.1,705f26; 386s3.
`s
`s
`• us
`..
`s
`725/22; 380/54, 28
`s
`s
`
`(56)
`
`References Cited
`
`- - - - - - - - - - - - - - - - - - - - - - -
`
`707/103 R. 707/1041
`
`U.S. PATENT DOCUMENTS
`179/1 SB
`3.919,479 A 11/1975 Moon et all
`4,230,990 A 10/1980 Lert, Jr. et al... 455/67
`4,450,531 A 5/1984 Kenyon et al. ............. 364/604
`
`2 2 2
`
`C. C.
`
`“.
`
`---...
`
`22O
`
`Stricaining Application
`Server
`
`Web
`Server
`
`224
`
`212
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 2 of 19
`
`26 Claims, 9 Drawing Sheets
`
`218
`
`-
`
`-r
`
`26
`
`Cable Head End st-e
`
`2O
`
`,
`
`Cable ModeII
`
`24
`
`206
`
`208
`
`
`
`US 6,968.337 B2
`Page 2
`
`2002/O198789 A1 12/2002 Waldman ..................... 705/26
`U.S. PATENT DOCUMENTS
`OTHER PUBLICATIONS
`4,677,455 A 6/1987 Okajima ...................... 357/38
`-
`-
`-
`4,677,466 A
`6/1987 Lert, Jr. et al. ............... 358/84
`A. P. Dempster et al. “Maximum Likelihood from Incom
`4,739,398 A
`4/1988 Thomas et al. ............... 358/84
`plete Data via the SEMS Algorithm”, Journal of the Royal
`4,843,562 A
`6/1989 Kenyon et al. ............. 364/487
`Statistical Society, Series B (Methodological), vol.39, Issue
`4,918,730 A 4/1990 Schulze ......
`381/43
`1, pp. 1-38, 1977 (no month).
`5,210,820 A 5/1993 Kenyon ...
`... 395/2
`D. Reynolds et al., “Robust Text-Independent Speaker Iden
`5,283,819 A 2/1994 Glick et al. ..
`... 379/90
`E. A 3. E. Ele al - - - - - - - - - - - - - - - - - - - i. tification Using Gaussian Mixture Speaker Models”, IEEE
`2- Y
`-
`IS el al. . . . . . . . . . . . . . . . . . . . .
`5,581,658 A 12/1996 O'Hagan et al. ............. 395/22
`Transactions on Speech and Audio Processing, vol. 3, No. 1,
`5,613,004 A * 3/1997 Coo
`tal
`2s.
`pp. 72-83, Jan. 1995.
`5,710,916 A
`1/1998 Barbara et al. ............. 395/609
`B. Pellom et al., “Fast Likelihood Computation Techniques
`5,918.223 A 6/1999 Blum et al. .................... 707/1
`in Nearest-Neighbor Based search for Continuous Speech
`5,930,369 A 7/1999 Cox et al. ..................... 380/54
`Recognition', IEEE Signal Processing Letters, vol. 8, No. *
`5,949,885 A * 9/1999 Leighton ...
`... 380/54
`pp. 221-224, Aug. 2001.
`6,006,256 A 12/1999 Zdepski et al. .
`... 709/217
`J. Haitsma et al., “Robust Audio hashing for Content Iden
`6,011,758 A 1/2000 Dockes et al. ................ 369/30
`tification’, CBMI 2001, Second International Workshop On
`6,026,439 A 2/2000 Chowdhury et al. ........ 709/233
`Content Based Multimedia and Indexing, Sep. 19-21, 2001,
`6,044,402 A 3/2000 Jacobson et al. ........... 709/225
`Brescia, Italy, Sep. 19-21, 2001.
`6,118,450 A 9/2000 Proehl et al. ...
`... 345/349
`6.192340 B1
`2
`PacketHound Tech Specs, palisdesys.com/products/pack
`2- -- 12
`/2001 Abecassis ...
`... 704/270
`6,243,615 B1
`6/2001 Neway et al. ..
`... 700/108
`ethound/tech specs/prod Phtechspecs.shtml, 2002 (no
`6.253,193 B1
`6/2001 Ginter et al. ................. 705/57
`month).
`6.253,337 B1
`6/2001 Maloney et al. .............. 714/38
`“How Does PacketHound Work?”, palisadesys.com/prod
`6,422,061 B1
`7/2002 Sunshine et al. .......... 73/29.01
`ucts/packethound/how does itwork/prod Phhow.shtml,
`6,771885 B1
`8/2004 Agnihotri et al. ............. 386/83
`2002 (no month).
`2002/0O82999 A1
`6/2002 Lee et al. ..................... 705/51
`2002/0O87885 A1
`7/2002 Peled et al. ................. 713/201
`
`* cited by examiner
`
`66
`
`-
`
`-
`
`-
`
`2Y------
`
`perman et al. ..........
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 3 of 19
`
`
`
`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 1 of 9
`
`US 6,968,337 B2
`
`VI ‘QIH
`
`
`
`UOQUQAUI QuºS9.IA
`
`
`
`-
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 4 of 19
`
`
`
`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 2 of 9
`
`US 6,968,337 B2
`
`
`
`
`
`XIowa pº?dures 3A1302),
`
`- - - -
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 5 of 19
`
`
`
`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 3 of 9
`
`US 6,968,337 B2
`
`9 IZ
`
`8 IZ
`
`UOQU3AU I JU3S3.J? Z 'OIH
`
`zí z
`
`ÞZZ
`
`| od
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 6 of 19
`
`
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 7 of 19
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 7 of 19
`
`US. Patent
`
`Nov. 22, 2005
`
`Sheet 4 0f 9
`
`US 6,968,337 B2
`
`308
`
`HopSize
`
`306\
`
`
`
`FIG3PresentInvention
`
`
`
`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 5 of 9
`
`US 6,968,337 B2
`
`
`
`i
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 8 of 19
`
`i.
`
`
`
`U.S. Patent
`
`US 6,968,337 B2
`
`
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 9 of 19
`
`
`
`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 7 of 9
`
`US 6,968,337 B2
`
`
`
`:38, e?epe?ðIAN
`
`
`606?899 WV
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 10 of 19
`
`
`
`UOIQU’9AUI QUÐS?JA 9 '{DIH
`
`
`
`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 8 of 9
`
`US 6,968,337 B2
`
`
`
`
`
`solepipupo Jons? antaro
`
`
`
`UOQUÐAUI QUÐSQ.I.-I. V L 'OIH
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 11 of 19
`
`
`
`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 9 of 9
`
`US 6,968,337 B2
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 12 of 19
`
`
`
`1
`METHOD AND APPARATUS FOR
`IDENTIFYING AN UNKNOWN WORK
`
`US 6,968,337 B2
`
`PRIORITY CLAIM
`This application claims the benefit of U.S. Provisional
`Application Ser. No. 60/304,647, filed Jul. 10, 2001.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`The present invention relates to data communications. In
`particular, the present invention relates to a novel method
`and apparatus for identifying an unknown work.
`
`BACKGROUND
`
`15
`
`2. The Prior Art
`Digital audio technology has greatly changed the land
`Scape of music and entertainment. Rapid increases in com
`puting power coupled with decreases in cost have made it
`possible for individuals to generate finished products having
`a quality once available only in a major Studio. One conse
`quence of modern technology is that legacy media Storage
`Standards, Such as reel-to-reel tapes, are being rapidly
`replaced by digital Storage media, Such as the Digital Ver
`satile Disk (DVD), and Digital Audio Tape (DAT).
`Additionally, with higher capacity hard drives Standard on
`most personal computers, home users may now Store digital
`files Such as audio or Video tracks on their home computers.
`Furthermore, the Internet has generated much excitement,
`particularly among those who see the Internet as an oppor
`tunity to develop new avenues for artistic expression and
`communication. The Internet has become a virtual gallery,
`where artists may post their works on a Web page. Once
`posted, the WorkS may be viewed by anyone having acceSS
`to the Internet.
`One application of the Internet that has received consid
`erable attention is the ability to transmit recorded music over
`the Internet. Once music has been digitally encoded, the
`audio may be both downloaded by users for play, or broad
`cast (“streamed”) over the Internet. When audio is streamed,
`it may be listened to by Internet users in a manner much like
`traditional radio Stations.
`Given the widespread use of digital media, digital audio
`files, or digital Video files containing audio information, may
`need to be identified. The need for identification of digital
`files may arise in a variety of Situations. For example, an
`artist may wish to Verify royalty payments or generate their
`own Arbitron(R)-like ratings by identifying how often their
`Works are being Streamed or downloaded. Additionally,
`users may wish to identify a particular work. The prior art
`has made efforts to create methods for identifying digital
`audio WorkS.
`However, systems of the prior art suffer from certain
`disadvantages. One area of difficulty arises when a large
`number of reference Signatures must be compared to an
`unknown audio recording.
`The Simplest method for comparing an incoming audio
`Signature (which could be from a file on the Internet, a
`recording of a radio or Internet radio broadcast, a recording
`from a cell phone, etc) to a database of reference Signatures
`for the purpose of identification is to Simply compare the
`incoming Signature to every element of the database.
`However, since it may not be known where the reference
`Signatures might have occurred inside the incoming
`Signature, this comparison must be done at many time
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 13 of 19
`
`2
`locations within the incoming Signature. Each individual
`Signature-to-signature comparison at each point in time may
`also be done in a "brute-force' manner using techniques
`known in the art, essentially computing the full Euclidean
`distance between the entire Signatures feature vectors. A
`match can then be declared when one of these comparisons
`yields a Score or distance that is above or below Some
`threshold, respectively.
`However, when an audio signature or fingerprint contains
`a large number of features Such a brute-force Search
`becomes too expensive computationally for real-world data
`bases which typically have Several hundred thousand to
`Several million Signatures.
`Many researchers have worked on methods for multi
`dimensional indexing, although the greatest effort has gone
`into geographical (2-dimensional) or spatial (3-dimensional)
`data. Typically, all of these methods order the elements of
`the database based on their proximity to each other.
`For example, the elements of the database can be clus
`tered into hyper-spheres or hyper-rectangles, or the Space
`can be organized into a tree form by using partitioning
`planes. However, when the number of dimensions is large
`(on the order of 15 or more), it can be shown mathematically
`that more-or-leSS uniformly distributed points in the Space
`all become approximately equidistant from each other. Thus,
`it becomes impossible to cluster the data in a meaningful
`way, and comparisons can become both lengthy and inac
`Curate.
`Hence, there exists a need to provide a means for data
`comparison which overcomes the disadvantages of the prior
`art.
`
`BRIEF DESCRIPTION OF THE INVENTION
`A method and apparatus for identifying an unknown work
`is disclosed. In one aspect, a method may includes the acts
`of providing a reference database having a reduced dimen
`Sionality containing Signatures of Sampled works, receiving
`a Sampled work, producing a Signature from the work; and
`reducing the dimensionality of the Signature.
`BRIEF DESCRIPTION OF THE DRAWING
`FIGURES
`FIG. 1A is a flowchart of a method according to the
`present invention.
`FIG. 1B is a flowchart of another method according to the
`present invention.
`FIG. 2 is a diagram of a system suitable for use with the
`present invention.
`FIG. 3 is a diagram of Segmenting according to the
`present invention.
`FIG. 4 is a detailed diagram of Segmenting according to
`the present invention showing hop size.
`FIG. 5 is a graphical flowchart showing the creating of a
`Segment feature vector according to the present invention.
`FIG. 6 is a diagram of a Signature according to the present
`invention.
`FIG. 7A is a flowchart of a method for preparing a
`reference database according to the present invention.
`FIG. 7B is a flowchart of method for identifying an
`unknown work according to the present invention.
`DETAILED DESCRIPTION OF PREFERRED
`EMBODIMENTS
`Persons of ordinary skill in the art will realize that the
`following description of the present invention is illustrative
`
`
`
`3
`only and not in any way limiting. Other embodiments of the
`invention will readily Suggest themselves to Such skilled
`perSons having the benefit of this disclosure.
`It is contemplated that the present invention may be
`embodied in various computer and machine-readable data
`Structures. Furthermore, it is contemplated that data Struc
`tures embodying the present invention will be transmitted
`acroSS computer and machine-readable media, and through
`communications Systems by use of Standard protocols Such
`as those used to enable the Internet and other computer
`networking Standards.
`The invention further relates to machine-readable media
`on which are Stored embodiments of the present invention.
`It is contemplated that any media Suitable for Storing instruc
`tions related to the present invention is within the Scope of
`the present invention. By way of example, Such media may
`take the form of magnetic, optical, or Semiconductor media.
`The present invention may be described through the use
`of flowcharts. Often, a single instance of an embodiment of
`the present invention will be shown. AS is appreciated by
`those of ordinary skill in the art, however, the protocols,
`processes, and procedures described herein may be repeated
`continuously or as often as necessary to Satisfy the needs
`described herein. Accordingly, the representation of the
`present invention through the use of flowcharts should not
`be used to limit the Scope of the present invention.
`The present invention may also be described through the
`use of web pages in which embodiments of the present
`invention may be viewed and manipulated. It is contem
`plated that Such web pages may be programmed with web
`page creation programs using languages Standard in the art
`such as HTML or XML. It is also contemplated that the web
`pages described herein may be viewed and manipulated with
`web browserS running on operating Systems Standard in the
`art, Such as the Microsoft Windows(E) and Macintosh(E)
`versions of Internet Explorer(R) and Netscape(R).
`Furthermore, it is contemplated that the functions performed
`by the various web pages described herein may be imple
`mented through the use of Standard programming languages
`Such a JavaE) or similar languages.
`The present invention will first be described in general
`overview. Then, each element will be described in further
`detail below.
`Referring now to FIG. 1A, a flowchart is shown which
`provides a general Overview of the present invention as
`related to the preparation of a database of reference Signa
`tures. Two overall acts are performed to prepare a reference
`database in accordance with the present invention: in act
`100, the present invention reduces the dimensionality of
`reference Signatures, and the reference database is indexed
`in act 102.
`Referring now to FIG. 1B, a flowchart is shown which
`provides a general Overview of the present invention as
`related to the identification of an unknown signature in
`accordance with the present invention. In act 104, a Sampled
`work is received. In act 106, the present invention reduces
`the dimensionality of the received work. In act 108, the
`present invention determines initial candidates. In act 110,
`the present invention Searches for the best candidate.
`Prior to presenting a detailed Overview of each act of
`FIGS. 1A and 1B, some background will first be presented.
`Structural Embodiment of the Present Invention
`Referring now to FIG. 2, a diagram of a System Suitable
`for use with the present invention is shown. FIG. 2 includes
`a client system 200. It is contemplated that client system 200
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 6,968,337 B2
`
`4
`may comprise a personal computer 202 including hardware
`and Software Standard in the art to run an operating System
`Such as Microsoft Windows.(R), MAC OS(R) Palm OS, UNIX,
`or other operating Systems Standard in the art. Client System
`200 may further include a database 204 for storing and
`retrieving embodiments of the present invention. It is con
`templated that database 204 may comprise hardware and
`Software Standard in the art and may be operatively coupled
`to PC 202. Database 204 may also be used to store and
`retrieve the works and Segments utilized by the present
`invention.
`Client system 200 may further include an audio/video
`(A/V) input device 208. A/V device 208 is operatively
`coupled to PC 202 and is configured to provide works to the
`present invention which may be Stored in traditional audio or
`video formats. It is contemplated that A/V device 208 may
`comprise hardware and Software Standard in the art config
`ured to receive and sample audio works (including video
`containing audio information), and provide the sampled
`Works to the present invention as digital audio files.
`Typically, the A/V input device 208 would supply raw audio
`samples in a format such as 16-bit stereo PCM format. A/V
`input device 208 provides an example of means for receiv
`ing a Sampled work.
`It is contemplated that Sampled WorkS may be obtained
`over the Internet, also. Typically, Streaming media over the
`Internet is provided by a provider, such as provider 218 of
`FIG. 2. Provider 218 includes a streaming application server
`220, configured to retrieve works from database 222 and
`Stream the works in a formats Standard in the art, Such as
`Real(F), Windows Media(E), or QuickTime(R). The server then
`provides the streamed works to a web server 224, which then
`provides the streamed work to the Internet 214 through a
`gateway 216. Internet 214 may be any packet-based network
`standard in the art, such as IP, Frame Relay, or ATM.
`To reach the provider 218, the present invention may
`utilize a cable or DSL head end 212 standard in the art
`operatively, which is coupled to a cable modem or DSL
`modem 210 which is in turn coupled to the system's network
`206. The network 206 may be any network standard in the
`art, such as a LAN provided by a PC 202 configured to run
`Software Standard in the art.
`It is contemplated that the Sampled work received by
`system 200 may contain audio information from a variety of
`Sources known in the art, including, without limitation,
`radio, the audio portion of a television broadcast, Internet
`radio, the audio portion of an Internet Video program or
`channel, Streaming audio from a network audio Server, audio
`delivered to personal digital assistants over cellular or
`wireleSS communication Systems, or cable and Satellite
`broadcasts.
`Additionally, it is contemplated that the present invention
`may be configured to receive and compare Segments coming
`from a variety of Sources either Stored or in real-time. For
`example, it is contemplated that the present invention may
`compare a real-time Streaming work coming from Streaming
`server 218 or A/V device 208 with a reference segment
`stored in database 204.
`Segmenting Background
`It is contemplated that a wide variety of Sampled works
`may be utilized in the present invention. However, the
`inventors have found the present invention especially useful
`with Segmented works. An Overview of a Segmented work
`will now be provided.
`FIG. 3 shows a diagram showing the Segmenting of a
`work according to the present invention. FIG. 3 includes
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 14 of 19
`
`
`
`S
`audio information 300 displayed along a time axis 302. FIG.
`3 further includes a plurality of segments 304,306, and 308
`taken of audio information 300 over some segment size T.
`In an exemplary non-limiting embodiment of the present
`invention, instantaneous values of a variety of acoustic
`features are computed at a low level, preferably about 100
`times a second. In particular, 10 MFCCs (cepstral
`coefficients) are computed. It is contemplated that any
`number of MFCCs may be computed. Preferably, 5-20
`MFCCs are computed, however, as many as 30 MFCCs may
`be computed, depending on the need for accuracy verSuS
`Speed.
`Segment-level features are disclosed U.S. Pat. No. 5,918,
`223 to Blum, et al., which is assigned to the assignee of the
`current disclosure and incorporated by reference as though
`fully Set forth herein. In an exemplary non-limiting embodi
`ment of the present invention, the Segment-level acoustical
`features comprise Statistical measures as disclosed in the
`223 patent of low-level features calculated over the length
`of each Segment. The data structure may store other book
`keeping information as well (segment size, hop size, item
`ID, UPC, etc). As can be seen by inspection of FIG. 3, the
`segments 304, 306, and 308 may overlap in time. This
`amount of overlap may be represented by measuring the
`time between the center point of adjacent Segments. This
`amount of time is referred to herein as the hop Size of the
`Segments, and is So designated in FIG. 3. By way of
`example, if the Segment length T of a given Segment is one
`Second, and adjacent Segments overlap by 50%, the hop size
`would be 0.5 second.
`The hop size may be set during the development of the
`Software. Additionally, the hop sizes of the reference data
`base and the real-time Signatures may be predetermined to
`facilitate compatibility. For example, the reference Signa
`tures in the reference database may be precomputed with a
`fixed hop and Segment size, and thus the client applications
`should conform to this Segment size and have a hop size
`which integrally divides the reference Signature hop size. It
`is contemplated that one may experiment with a variety of
`Segment sizes in order to balance the tradeoff of accuracy
`with Speed of computation for a given application.
`The inventors have found that by carefully choosing the
`hop Size of the Segments, the accuracy of the identification
`proceSS may be significantly increased. Additionally, the
`inventors have found that the accuracy of the identification
`proceSS may be increased if the hop Size of reference
`Segments and the hop Size of Segments obtained in real-time
`are each chosen independently. The importance of the hop
`Size of Segments may be illustrated by examining the
`proceSS for Segmenting pre-recorded works and real-time
`WorkS Separately.
`Reference Signatures
`Prior to attempting to identify a given work, a reference
`database of Signatures must be created. When building a
`reference database, a Segment length having a period of leSS
`than three Seconds is preferred. In an exemplary non
`limiting embodiment of the present invention, the Segment
`lengths have a period ranging from 0.5 Seconds to 3 Seconds.
`For a reference database, the inventors have found that a hop
`size of approximately 50% to 100% of the segment size is
`preferred.
`It is contemplated that the reference Signatures may be
`Stored on a database Such as database 204 as described
`above. Database 204 and the discussion herein provide an
`example of means for providing a plurality of reference
`Signatures each having a Segment size and a hop size.
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 15 of 19
`
`US 6,968,337 B2
`
`6
`
`Unknown Signatures
`The choice of the hop size is important for the Signatures
`of the audio to be identified, hereafter referred to as
`“unknown audio.'
`FIG. 4 shows a detailed diagram of the Segmentation of
`unknown audio according to the present invention. FIG. 4
`includes unknown audio information 400 displayed along a
`time axis 402. FIG. 4 further includes segments 404 and 406
`taken of audio information 400 over some segment length T.
`In an exemplary non-limiting embodiment of the present
`invention, the Segment length of unknown audio Segments is
`chosen to range from 0.5 to 3 Seconds.
`As can be seen by inspection of FIG. 4, the hop size of
`unknown audio Segments is chosen to be Smaller than that of
`reference Segments. In an exemplary non-limiting embodi
`ment of the present invention, the hop Size of unknown
`audio Segments is less than 50% of the Segment size. In yet
`another exemplary non-limiting embodiment of the present
`invention, the unknown audio hop Size may be 0.1 Seconds.
`The inventors have found Such a Small hop Size advan
`tageous for the following reasons. The ultimate purpose of
`generating unknown audio Segments is to analyze and
`compare them with the reference Segments in the database to
`look for matches. The inventors have found at least two
`major reasons why an unknown audio recording would not
`match its counterpart in the database. One is that the
`broadcast channel does not produce a perfect copy of the
`original. For example, the work may be edited or processed
`or the announcer may talk over part of the work. The other
`reason is that larger Segment boundaries may not line up in
`time with the original segment boundaries of the target
`recordings.
`The inventors have found that by choosing a Smaller hop
`size, Some of the Segments will ultimately have time bound
`aries that line up with the original Segments, notwithstand
`ing the problems listed above. The Segments that line up
`with a “clean” segment of the work may then be used to
`make an accurate comparison while those that do not So line
`up may be ignored. The inventors have found that a hop Size
`of 0.1 seconds seems to be the maximum that would solve
`this time shifting problem.
`AS mentioned above, once a work has been Segmented,
`the individual Segments are then analyzed to produce a
`Segment feature vector. FIG. 5 is a diagram showing an
`Overview of how the Segment feature vectors may be created
`using the methods described in U.S. Pat. No. 5,918,223 to
`Blum, et al. It is contemplated that a variety of analysis
`methods may be useful in the present invention, and many
`different features may be used to make up the feature vector.
`The inventors have found that the pitch, brightness,
`bandwidth, and loudness features of the 223 patent to be
`useful in the present invention. Additionally, Spectral fea
`tures may be used analyzed, Such as the energy in various
`spectral bands. The inventors have found that the cepstral
`features (MFCCs) are very robust (more invariant) given the
`distortions typically introduced during broadcast, Such as
`EQ, multi-band compression/limiting, and audio data com
`pression techniques Such as MP3 encoding/decoding, etc.
`In act 500, the audio segment is sampled to produce a
`Segment. In act 502, the Sampled Segment is then analyzed
`using Fourier Transform techniques to transform the Signal
`into the frequency domain. In act 504, mel frequency filters
`are applied to the transformed Signal to extract the Signifi
`cant audible characteristics of the spectrum. In act 506, a
`Discrete Cosine Transform is applied which converts the
`Signal into mel frequency cepstral coefficients (MFCCs).
`
`
`
`US 6,968,337 B2
`
`7
`Finally, in act 508, the MFCCs are then averaged over a
`predetermined period. In an exemplary non-limiting
`embodiment of the present invention, this period is approxi
`mately one Second. Additionally, other characteristics may
`be computed at this time, Such as brightness or loudness. A
`5
`Segment feature vector is then produced which contains a list
`containing at least the 10 MFCCS corresponding average.
`The disclosure of FIGS. 3, 4, and 5 provide examples of
`means for creating a signature of a Sampled work having a
`Segment size and a hop size.
`FIG. 6 is a diagram showing a complete signature 600
`according to the present invention. Signature 600 includes a
`plurality of Segment feature vectors 1 through n generated as
`shown and described above. Signature 600 may also include
`an identification portion containing a unique ID. It is con
`templated that the identification portion may contain a
`unique identifier provided by the RIAA (Recording Industry
`ASSociation of America) or Some other audio authority or
`cataloging agency. The identification portion may also con
`tain information such as the UPC (Universal Product Code)
`of the various products that contain the audio corresponding
`to this Signature. Additionally, it is contemplated that the
`Signature 600 may also contain information pertaining to the
`characteristics of the file itself, Such as the hop size, Segment
`size, number of Segments, etc., which may be useful for
`Storing and indexing.
`Signature 600 may then be stored in a database and used
`for comparisons.
`The following computer code in the C programming
`language provides an example of a database Structure in
`memory according to the present invention:
`
`8
`Signature derived from a Segmentation of the original audio
`work as described above. In a presently preferred
`embodiment, reference Signatures have 20 non-overlapping
`Segments, where each Segment is one Second in duration,
`with one-Second spacing from center to center, as described
`above. Each of these segments is represented by 10 Mel
`filtered cepstral coefficients (MFCCs), resulting in a feature
`vector of 200 dimensions. Since indexing a vector Space of
`this dimensionality is not practical, the number of dimen
`Sions used for the initial Search for possible candidates is
`reduced according to the present invention.
`Reducing the Dimensionality
`FIG. 7A is a flowchart of dimension reduction according
`to the present invention. The number of dimensions used for
`the initial Search for possible candidates is reduced, resulting
`in what the inventors refer to as a Subspace. By having the
`present invention Search a Subspace at the outset, the effi
`ciency of the Search may be greatly increased.
`Referring now to FIG. 7A, the present invention accom
`plishes two tasks to develop this Subspace: (1) the present
`invention uses less than the total number of Segments in the
`reference signatures in act 701; and (2) the present invention
`performs a principal components analysis to reduce the
`dimensionality in act 703.
`Using Less Segments to Perform an Initial Search
`The inventors empirically have found that using data from
`two consecutive segments (i.e., a two-second portion of the
`Signature) to search for approximately 500 candidates is a
`good tradeoff between computation complexity and accu
`racy. The number of candidates can be altered for different
`applications where either speed or accuracy is more or leSS
`important.
`For example, the present invention may be configured to
`extract a predetermined percentage of candidates. In an
`exemplary non-limiting embodiment of the present
`invention, a list of candidates may comprise 2% of the size
`of the reference Signature database when using 2 Segments
`for the initial Search. In another exemplary non-limiting
`embodiment of the present invention, a list of candidates
`may be those reference Signatures whose distances based on
`the initial 2 Segment Search are below a certain threshold.
`As will be appreciated by those of ordinary skill in the ar