throbber
Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 1 of 19
`
`
`
`Exhibit 23
`
`
`
`
`
`
`
`
`

`

`(12) United States Patent
`WOld
`
`USOO6968337B2
`(10) Patent No.:
`US 6,968,337 B2
`(45) Date of Patent:
`Nov. 22, 2005
`
`(54) METHOD AND APPARATUS FOR
`IDENTIFYING ANU OWN WORK
`N
`NGAN UNKNOWN
`(75) Inventor: Erling H. Wold, El Cerrito, CA (US)
`
`(73) Assignee: Audible Magic Corporation, Los
`Gatos, CA (US)
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 458 days.
`
`(*) Notice:
`
`EP
`EP
`WO
`WO
`
`
`
`(Continued)
`FOREIGN PATENT DOCUMENTS
`O402210 A 12/1990 ........... GO6F/11/08
`O517405 A2 * 12/1992 ............ GO7C/9/00
`WOO123981 A1 * 4/2001
`............. GO6F/1/OO
`WO O2/15035 A2
`2/2002 ........... G06F/17/OO
`OTHER PUBLICATIONS
`L. Baum et al., A Maximization Technique Occuring in the
`Statistical Analysis of Probabilistic Functions of Markov
`Chains, The Annals of Mathematical Statistics, vol. 41, No.
`1 pp. 164-171, 1970 (no month).
`(Continued)
`Primary Examiner Frantz Coby
`(74) Attorney, Agent, or Firm-Sierra Patent Group, Ltd.
`(57)
`ABSTRACT
`A System for determining an identity of a received work. The
`System receives audio data for an unknown work. The audio
`data is divided into Segments. The System generates a
`Signature of the unknown work from each of the Segments.
`portion of the Signatures. The reduced dimension signatures
`are then compared to reduced dimensions Signatures of
`known works that are Stored in a database. A list of candi
`dates of known works is generated from the comparison.
`The Signatures of the unknown works are then compared to
`the Signatures of the known works in the list of candidates.
`The unknown work is then identified as the known work
`having Signatures matching within a threshold.
`
`Reduced dimension signatures are then generated at least a
`
`(21) Appl. No.: 10/192,783
`(22) Filed:
`Jul. 9, 2002
`(65)
`Prior Publication Data
`US 2003/0023852 A1 Jan. 30, 2003
`O
`O
`Related U.S. Application Data
`(60) Provisional application No. 60/304,647, filed on Jul. 10,
`2001.
`(51) Int. Cl. .................................................. G06F 7700
`(52) U.S. CI.
`707/100; 707/101; 707/102;
`(58) Field of Search
`707/100, 101
`707/102,103 R, 104.1,705f26; 386s3.
`s
`s
`• us
`..
`s
`725/22; 380/54, 28
`s
`s
`
`(56)
`
`References Cited
`
`- - - - - - - - - - - - - - - - - - - - - - -
`
`707/103 R. 707/1041
`
`U.S. PATENT DOCUMENTS
`179/1 SB
`3.919,479 A 11/1975 Moon et all
`4,230,990 A 10/1980 Lert, Jr. et al... 455/67
`4,450,531 A 5/1984 Kenyon et al. ............. 364/604
`
`2 2 2
`
`C. C.
`
`“.
`
`---...
`
`22O
`
`Stricaining Application
`Server
`
`Web
`Server
`
`224
`
`212
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 2 of 19
`
`26 Claims, 9 Drawing Sheets
`
`218
`
`-
`
`-r
`
`26
`
`Cable Head End st-e
`
`2O
`
`,
`
`Cable ModeII
`
`24
`
`206
`
`208
`
`

`

`US 6,968.337 B2
`Page 2
`
`2002/O198789 A1 12/2002 Waldman ..................... 705/26
`U.S. PATENT DOCUMENTS
`OTHER PUBLICATIONS
`4,677,455 A 6/1987 Okajima ...................... 357/38
`-
`-
`-
`4,677,466 A
`6/1987 Lert, Jr. et al. ............... 358/84
`A. P. Dempster et al. “Maximum Likelihood from Incom
`4,739,398 A
`4/1988 Thomas et al. ............... 358/84
`plete Data via the SEMS Algorithm”, Journal of the Royal
`4,843,562 A
`6/1989 Kenyon et al. ............. 364/487
`Statistical Society, Series B (Methodological), vol.39, Issue
`4,918,730 A 4/1990 Schulze ......
`381/43
`1, pp. 1-38, 1977 (no month).
`5,210,820 A 5/1993 Kenyon ...
`... 395/2
`D. Reynolds et al., “Robust Text-Independent Speaker Iden
`5,283,819 A 2/1994 Glick et al. ..
`... 379/90
`E. A 3. E. Ele al - - - - - - - - - - - - - - - - - - - i. tification Using Gaussian Mixture Speaker Models”, IEEE
`2- Y
`-
`IS el al. . . . . . . . . . . . . . . . . . . . .
`5,581,658 A 12/1996 O'Hagan et al. ............. 395/22
`Transactions on Speech and Audio Processing, vol. 3, No. 1,
`5,613,004 A * 3/1997 Coo
`tal
`2s.
`pp. 72-83, Jan. 1995.
`5,710,916 A
`1/1998 Barbara et al. ............. 395/609
`B. Pellom et al., “Fast Likelihood Computation Techniques
`5,918.223 A 6/1999 Blum et al. .................... 707/1
`in Nearest-Neighbor Based search for Continuous Speech
`5,930,369 A 7/1999 Cox et al. ..................... 380/54
`Recognition', IEEE Signal Processing Letters, vol. 8, No. *
`5,949,885 A * 9/1999 Leighton ...
`... 380/54
`pp. 221-224, Aug. 2001.
`6,006,256 A 12/1999 Zdepski et al. .
`... 709/217
`J. Haitsma et al., “Robust Audio hashing for Content Iden
`6,011,758 A 1/2000 Dockes et al. ................ 369/30
`tification’, CBMI 2001, Second International Workshop On
`6,026,439 A 2/2000 Chowdhury et al. ........ 709/233
`Content Based Multimedia and Indexing, Sep. 19-21, 2001,
`6,044,402 A 3/2000 Jacobson et al. ........... 709/225
`Brescia, Italy, Sep. 19-21, 2001.
`6,118,450 A 9/2000 Proehl et al. ...
`... 345/349
`6.192340 B1
`2
`PacketHound Tech Specs, palisdesys.com/products/pack
`2- -- 12
`/2001 Abecassis ...
`... 704/270
`6,243,615 B1
`6/2001 Neway et al. ..
`... 700/108
`ethound/tech specs/prod Phtechspecs.shtml, 2002 (no
`6.253,193 B1
`6/2001 Ginter et al. ................. 705/57
`month).
`6.253,337 B1
`6/2001 Maloney et al. .............. 714/38
`“How Does PacketHound Work?”, palisadesys.com/prod
`6,422,061 B1
`7/2002 Sunshine et al. .......... 73/29.01
`ucts/packethound/how does itwork/prod Phhow.shtml,
`6,771885 B1
`8/2004 Agnihotri et al. ............. 386/83
`2002 (no month).
`2002/0O82999 A1
`6/2002 Lee et al. ..................... 705/51
`2002/0O87885 A1
`7/2002 Peled et al. ................. 713/201
`
`* cited by examiner
`
`66
`
`-
`
`-
`
`-
`
`2Y------
`
`perman et al. ..........
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 3 of 19
`
`

`

`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 1 of 9
`
`US 6,968,337 B2
`
`VI ‘QIH
`
`
`
`UOQUQAUI QuºS9.IA
`
`
`
`-
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 4 of 19
`
`

`

`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 2 of 9
`
`US 6,968,337 B2
`
`
`
`
`
`XIowa pº?dures 3A1302),
`
`- - - -
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 5 of 19
`
`

`

`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 3 of 9
`
`US 6,968,337 B2
`
`9 IZ
`
`8 IZ
`
`UOQU3AU I JU3S3.J? Z 'OIH
`
`zí z
`
`ÞZZ
`
`| od
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 6 of 19
`
`
`
`

`

`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 7 of 19
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 7 of 19
`
`US. Patent
`
`Nov. 22, 2005
`
`Sheet 4 0f 9
`
`US 6,968,337 B2
`
`308
`
`HopSize
`
`306\
`
`
`
`FIG3PresentInvention
`
`

`

`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 5 of 9
`
`US 6,968,337 B2
`
`
`
`i
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 8 of 19
`
`i.
`
`

`

`U.S. Patent
`
`US 6,968,337 B2
`
`
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 9 of 19
`
`

`

`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 7 of 9
`
`US 6,968,337 B2
`
`
`
`:38, e?epe?ðIAN
`
`
`606?899 WV
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 10 of 19
`
`
`
`UOIQU’9AUI QUÐS?JA 9 '{DIH
`
`

`

`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 8 of 9
`
`US 6,968,337 B2
`
`
`
`
`
`solepipupo Jons? antaro
`
`
`
`UOQUÐAUI QUÐSQ.I.-I. V L 'OIH
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 11 of 19
`
`

`

`U.S. Patent
`
`Nov. 22, 2005
`
`Sheet 9 of 9
`
`US 6,968,337 B2
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 12 of 19
`
`

`

`1
`METHOD AND APPARATUS FOR
`IDENTIFYING AN UNKNOWN WORK
`
`US 6,968,337 B2
`
`PRIORITY CLAIM
`This application claims the benefit of U.S. Provisional
`Application Ser. No. 60/304,647, filed Jul. 10, 2001.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`The present invention relates to data communications. In
`particular, the present invention relates to a novel method
`and apparatus for identifying an unknown work.
`
`BACKGROUND
`
`15
`
`2. The Prior Art
`Digital audio technology has greatly changed the land
`Scape of music and entertainment. Rapid increases in com
`puting power coupled with decreases in cost have made it
`possible for individuals to generate finished products having
`a quality once available only in a major Studio. One conse
`quence of modern technology is that legacy media Storage
`Standards, Such as reel-to-reel tapes, are being rapidly
`replaced by digital Storage media, Such as the Digital Ver
`satile Disk (DVD), and Digital Audio Tape (DAT).
`Additionally, with higher capacity hard drives Standard on
`most personal computers, home users may now Store digital
`files Such as audio or Video tracks on their home computers.
`Furthermore, the Internet has generated much excitement,
`particularly among those who see the Internet as an oppor
`tunity to develop new avenues for artistic expression and
`communication. The Internet has become a virtual gallery,
`where artists may post their works on a Web page. Once
`posted, the WorkS may be viewed by anyone having acceSS
`to the Internet.
`One application of the Internet that has received consid
`erable attention is the ability to transmit recorded music over
`the Internet. Once music has been digitally encoded, the
`audio may be both downloaded by users for play, or broad
`cast (“streamed”) over the Internet. When audio is streamed,
`it may be listened to by Internet users in a manner much like
`traditional radio Stations.
`Given the widespread use of digital media, digital audio
`files, or digital Video files containing audio information, may
`need to be identified. The need for identification of digital
`files may arise in a variety of Situations. For example, an
`artist may wish to Verify royalty payments or generate their
`own Arbitron(R)-like ratings by identifying how often their
`Works are being Streamed or downloaded. Additionally,
`users may wish to identify a particular work. The prior art
`has made efforts to create methods for identifying digital
`audio WorkS.
`However, systems of the prior art suffer from certain
`disadvantages. One area of difficulty arises when a large
`number of reference Signatures must be compared to an
`unknown audio recording.
`The Simplest method for comparing an incoming audio
`Signature (which could be from a file on the Internet, a
`recording of a radio or Internet radio broadcast, a recording
`from a cell phone, etc) to a database of reference Signatures
`for the purpose of identification is to Simply compare the
`incoming Signature to every element of the database.
`However, since it may not be known where the reference
`Signatures might have occurred inside the incoming
`Signature, this comparison must be done at many time
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 13 of 19
`
`2
`locations within the incoming Signature. Each individual
`Signature-to-signature comparison at each point in time may
`also be done in a "brute-force' manner using techniques
`known in the art, essentially computing the full Euclidean
`distance between the entire Signatures feature vectors. A
`match can then be declared when one of these comparisons
`yields a Score or distance that is above or below Some
`threshold, respectively.
`However, when an audio signature or fingerprint contains
`a large number of features Such a brute-force Search
`becomes too expensive computationally for real-world data
`bases which typically have Several hundred thousand to
`Several million Signatures.
`Many researchers have worked on methods for multi
`dimensional indexing, although the greatest effort has gone
`into geographical (2-dimensional) or spatial (3-dimensional)
`data. Typically, all of these methods order the elements of
`the database based on their proximity to each other.
`For example, the elements of the database can be clus
`tered into hyper-spheres or hyper-rectangles, or the Space
`can be organized into a tree form by using partitioning
`planes. However, when the number of dimensions is large
`(on the order of 15 or more), it can be shown mathematically
`that more-or-leSS uniformly distributed points in the Space
`all become approximately equidistant from each other. Thus,
`it becomes impossible to cluster the data in a meaningful
`way, and comparisons can become both lengthy and inac
`Curate.
`Hence, there exists a need to provide a means for data
`comparison which overcomes the disadvantages of the prior
`art.
`
`BRIEF DESCRIPTION OF THE INVENTION
`A method and apparatus for identifying an unknown work
`is disclosed. In one aspect, a method may includes the acts
`of providing a reference database having a reduced dimen
`Sionality containing Signatures of Sampled works, receiving
`a Sampled work, producing a Signature from the work; and
`reducing the dimensionality of the Signature.
`BRIEF DESCRIPTION OF THE DRAWING
`FIGURES
`FIG. 1A is a flowchart of a method according to the
`present invention.
`FIG. 1B is a flowchart of another method according to the
`present invention.
`FIG. 2 is a diagram of a system suitable for use with the
`present invention.
`FIG. 3 is a diagram of Segmenting according to the
`present invention.
`FIG. 4 is a detailed diagram of Segmenting according to
`the present invention showing hop size.
`FIG. 5 is a graphical flowchart showing the creating of a
`Segment feature vector according to the present invention.
`FIG. 6 is a diagram of a Signature according to the present
`invention.
`FIG. 7A is a flowchart of a method for preparing a
`reference database according to the present invention.
`FIG. 7B is a flowchart of method for identifying an
`unknown work according to the present invention.
`DETAILED DESCRIPTION OF PREFERRED
`EMBODIMENTS
`Persons of ordinary skill in the art will realize that the
`following description of the present invention is illustrative
`
`

`

`3
`only and not in any way limiting. Other embodiments of the
`invention will readily Suggest themselves to Such skilled
`perSons having the benefit of this disclosure.
`It is contemplated that the present invention may be
`embodied in various computer and machine-readable data
`Structures. Furthermore, it is contemplated that data Struc
`tures embodying the present invention will be transmitted
`acroSS computer and machine-readable media, and through
`communications Systems by use of Standard protocols Such
`as those used to enable the Internet and other computer
`networking Standards.
`The invention further relates to machine-readable media
`on which are Stored embodiments of the present invention.
`It is contemplated that any media Suitable for Storing instruc
`tions related to the present invention is within the Scope of
`the present invention. By way of example, Such media may
`take the form of magnetic, optical, or Semiconductor media.
`The present invention may be described through the use
`of flowcharts. Often, a single instance of an embodiment of
`the present invention will be shown. AS is appreciated by
`those of ordinary skill in the art, however, the protocols,
`processes, and procedures described herein may be repeated
`continuously or as often as necessary to Satisfy the needs
`described herein. Accordingly, the representation of the
`present invention through the use of flowcharts should not
`be used to limit the Scope of the present invention.
`The present invention may also be described through the
`use of web pages in which embodiments of the present
`invention may be viewed and manipulated. It is contem
`plated that Such web pages may be programmed with web
`page creation programs using languages Standard in the art
`such as HTML or XML. It is also contemplated that the web
`pages described herein may be viewed and manipulated with
`web browserS running on operating Systems Standard in the
`art, Such as the Microsoft Windows(E) and Macintosh(E)
`versions of Internet Explorer(R) and Netscape(R).
`Furthermore, it is contemplated that the functions performed
`by the various web pages described herein may be imple
`mented through the use of Standard programming languages
`Such a JavaE) or similar languages.
`The present invention will first be described in general
`overview. Then, each element will be described in further
`detail below.
`Referring now to FIG. 1A, a flowchart is shown which
`provides a general Overview of the present invention as
`related to the preparation of a database of reference Signa
`tures. Two overall acts are performed to prepare a reference
`database in accordance with the present invention: in act
`100, the present invention reduces the dimensionality of
`reference Signatures, and the reference database is indexed
`in act 102.
`Referring now to FIG. 1B, a flowchart is shown which
`provides a general Overview of the present invention as
`related to the identification of an unknown signature in
`accordance with the present invention. In act 104, a Sampled
`work is received. In act 106, the present invention reduces
`the dimensionality of the received work. In act 108, the
`present invention determines initial candidates. In act 110,
`the present invention Searches for the best candidate.
`Prior to presenting a detailed Overview of each act of
`FIGS. 1A and 1B, some background will first be presented.
`Structural Embodiment of the Present Invention
`Referring now to FIG. 2, a diagram of a System Suitable
`for use with the present invention is shown. FIG. 2 includes
`a client system 200. It is contemplated that client system 200
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 6,968,337 B2
`
`4
`may comprise a personal computer 202 including hardware
`and Software Standard in the art to run an operating System
`Such as Microsoft Windows.(R), MAC OS(R) Palm OS, UNIX,
`or other operating Systems Standard in the art. Client System
`200 may further include a database 204 for storing and
`retrieving embodiments of the present invention. It is con
`templated that database 204 may comprise hardware and
`Software Standard in the art and may be operatively coupled
`to PC 202. Database 204 may also be used to store and
`retrieve the works and Segments utilized by the present
`invention.
`Client system 200 may further include an audio/video
`(A/V) input device 208. A/V device 208 is operatively
`coupled to PC 202 and is configured to provide works to the
`present invention which may be Stored in traditional audio or
`video formats. It is contemplated that A/V device 208 may
`comprise hardware and Software Standard in the art config
`ured to receive and sample audio works (including video
`containing audio information), and provide the sampled
`Works to the present invention as digital audio files.
`Typically, the A/V input device 208 would supply raw audio
`samples in a format such as 16-bit stereo PCM format. A/V
`input device 208 provides an example of means for receiv
`ing a Sampled work.
`It is contemplated that Sampled WorkS may be obtained
`over the Internet, also. Typically, Streaming media over the
`Internet is provided by a provider, such as provider 218 of
`FIG. 2. Provider 218 includes a streaming application server
`220, configured to retrieve works from database 222 and
`Stream the works in a formats Standard in the art, Such as
`Real(F), Windows Media(E), or QuickTime(R). The server then
`provides the streamed works to a web server 224, which then
`provides the streamed work to the Internet 214 through a
`gateway 216. Internet 214 may be any packet-based network
`standard in the art, such as IP, Frame Relay, or ATM.
`To reach the provider 218, the present invention may
`utilize a cable or DSL head end 212 standard in the art
`operatively, which is coupled to a cable modem or DSL
`modem 210 which is in turn coupled to the system's network
`206. The network 206 may be any network standard in the
`art, such as a LAN provided by a PC 202 configured to run
`Software Standard in the art.
`It is contemplated that the Sampled work received by
`system 200 may contain audio information from a variety of
`Sources known in the art, including, without limitation,
`radio, the audio portion of a television broadcast, Internet
`radio, the audio portion of an Internet Video program or
`channel, Streaming audio from a network audio Server, audio
`delivered to personal digital assistants over cellular or
`wireleSS communication Systems, or cable and Satellite
`broadcasts.
`Additionally, it is contemplated that the present invention
`may be configured to receive and compare Segments coming
`from a variety of Sources either Stored or in real-time. For
`example, it is contemplated that the present invention may
`compare a real-time Streaming work coming from Streaming
`server 218 or A/V device 208 with a reference segment
`stored in database 204.
`Segmenting Background
`It is contemplated that a wide variety of Sampled works
`may be utilized in the present invention. However, the
`inventors have found the present invention especially useful
`with Segmented works. An Overview of a Segmented work
`will now be provided.
`FIG. 3 shows a diagram showing the Segmenting of a
`work according to the present invention. FIG. 3 includes
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 14 of 19
`
`

`

`S
`audio information 300 displayed along a time axis 302. FIG.
`3 further includes a plurality of segments 304,306, and 308
`taken of audio information 300 over some segment size T.
`In an exemplary non-limiting embodiment of the present
`invention, instantaneous values of a variety of acoustic
`features are computed at a low level, preferably about 100
`times a second. In particular, 10 MFCCs (cepstral
`coefficients) are computed. It is contemplated that any
`number of MFCCs may be computed. Preferably, 5-20
`MFCCs are computed, however, as many as 30 MFCCs may
`be computed, depending on the need for accuracy verSuS
`Speed.
`Segment-level features are disclosed U.S. Pat. No. 5,918,
`223 to Blum, et al., which is assigned to the assignee of the
`current disclosure and incorporated by reference as though
`fully Set forth herein. In an exemplary non-limiting embodi
`ment of the present invention, the Segment-level acoustical
`features comprise Statistical measures as disclosed in the
`223 patent of low-level features calculated over the length
`of each Segment. The data structure may store other book
`keeping information as well (segment size, hop size, item
`ID, UPC, etc). As can be seen by inspection of FIG. 3, the
`segments 304, 306, and 308 may overlap in time. This
`amount of overlap may be represented by measuring the
`time between the center point of adjacent Segments. This
`amount of time is referred to herein as the hop Size of the
`Segments, and is So designated in FIG. 3. By way of
`example, if the Segment length T of a given Segment is one
`Second, and adjacent Segments overlap by 50%, the hop size
`would be 0.5 second.
`The hop size may be set during the development of the
`Software. Additionally, the hop sizes of the reference data
`base and the real-time Signatures may be predetermined to
`facilitate compatibility. For example, the reference Signa
`tures in the reference database may be precomputed with a
`fixed hop and Segment size, and thus the client applications
`should conform to this Segment size and have a hop size
`which integrally divides the reference Signature hop size. It
`is contemplated that one may experiment with a variety of
`Segment sizes in order to balance the tradeoff of accuracy
`with Speed of computation for a given application.
`The inventors have found that by carefully choosing the
`hop Size of the Segments, the accuracy of the identification
`proceSS may be significantly increased. Additionally, the
`inventors have found that the accuracy of the identification
`proceSS may be increased if the hop Size of reference
`Segments and the hop Size of Segments obtained in real-time
`are each chosen independently. The importance of the hop
`Size of Segments may be illustrated by examining the
`proceSS for Segmenting pre-recorded works and real-time
`WorkS Separately.
`Reference Signatures
`Prior to attempting to identify a given work, a reference
`database of Signatures must be created. When building a
`reference database, a Segment length having a period of leSS
`than three Seconds is preferred. In an exemplary non
`limiting embodiment of the present invention, the Segment
`lengths have a period ranging from 0.5 Seconds to 3 Seconds.
`For a reference database, the inventors have found that a hop
`size of approximately 50% to 100% of the segment size is
`preferred.
`It is contemplated that the reference Signatures may be
`Stored on a database Such as database 204 as described
`above. Database 204 and the discussion herein provide an
`example of means for providing a plurality of reference
`Signatures each having a Segment size and a hop size.
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`Case 1:14-cv-02396-PGG-SN Document 239-5 Filed 11/12/20 Page 15 of 19
`
`US 6,968,337 B2
`
`6
`
`Unknown Signatures
`The choice of the hop size is important for the Signatures
`of the audio to be identified, hereafter referred to as
`“unknown audio.'
`FIG. 4 shows a detailed diagram of the Segmentation of
`unknown audio according to the present invention. FIG. 4
`includes unknown audio information 400 displayed along a
`time axis 402. FIG. 4 further includes segments 404 and 406
`taken of audio information 400 over some segment length T.
`In an exemplary non-limiting embodiment of the present
`invention, the Segment length of unknown audio Segments is
`chosen to range from 0.5 to 3 Seconds.
`As can be seen by inspection of FIG. 4, the hop size of
`unknown audio Segments is chosen to be Smaller than that of
`reference Segments. In an exemplary non-limiting embodi
`ment of the present invention, the hop Size of unknown
`audio Segments is less than 50% of the Segment size. In yet
`another exemplary non-limiting embodiment of the present
`invention, the unknown audio hop Size may be 0.1 Seconds.
`The inventors have found Such a Small hop Size advan
`tageous for the following reasons. The ultimate purpose of
`generating unknown audio Segments is to analyze and
`compare them with the reference Segments in the database to
`look for matches. The inventors have found at least two
`major reasons why an unknown audio recording would not
`match its counterpart in the database. One is that the
`broadcast channel does not produce a perfect copy of the
`original. For example, the work may be edited or processed
`or the announcer may talk over part of the work. The other
`reason is that larger Segment boundaries may not line up in
`time with the original segment boundaries of the target
`recordings.
`The inventors have found that by choosing a Smaller hop
`size, Some of the Segments will ultimately have time bound
`aries that line up with the original Segments, notwithstand
`ing the problems listed above. The Segments that line up
`with a “clean” segment of the work may then be used to
`make an accurate comparison while those that do not So line
`up may be ignored. The inventors have found that a hop Size
`of 0.1 seconds seems to be the maximum that would solve
`this time shifting problem.
`AS mentioned above, once a work has been Segmented,
`the individual Segments are then analyzed to produce a
`Segment feature vector. FIG. 5 is a diagram showing an
`Overview of how the Segment feature vectors may be created
`using the methods described in U.S. Pat. No. 5,918,223 to
`Blum, et al. It is contemplated that a variety of analysis
`methods may be useful in the present invention, and many
`different features may be used to make up the feature vector.
`The inventors have found that the pitch, brightness,
`bandwidth, and loudness features of the 223 patent to be
`useful in the present invention. Additionally, Spectral fea
`tures may be used analyzed, Such as the energy in various
`spectral bands. The inventors have found that the cepstral
`features (MFCCs) are very robust (more invariant) given the
`distortions typically introduced during broadcast, Such as
`EQ, multi-band compression/limiting, and audio data com
`pression techniques Such as MP3 encoding/decoding, etc.
`In act 500, the audio segment is sampled to produce a
`Segment. In act 502, the Sampled Segment is then analyzed
`using Fourier Transform techniques to transform the Signal
`into the frequency domain. In act 504, mel frequency filters
`are applied to the transformed Signal to extract the Signifi
`cant audible characteristics of the spectrum. In act 506, a
`Discrete Cosine Transform is applied which converts the
`Signal into mel frequency cepstral coefficients (MFCCs).
`
`

`

`US 6,968,337 B2
`
`7
`Finally, in act 508, the MFCCs are then averaged over a
`predetermined period. In an exemplary non-limiting
`embodiment of the present invention, this period is approxi
`mately one Second. Additionally, other characteristics may
`be computed at this time, Such as brightness or loudness. A
`5
`Segment feature vector is then produced which contains a list
`containing at least the 10 MFCCS corresponding average.
`The disclosure of FIGS. 3, 4, and 5 provide examples of
`means for creating a signature of a Sampled work having a
`Segment size and a hop size.
`FIG. 6 is a diagram showing a complete signature 600
`according to the present invention. Signature 600 includes a
`plurality of Segment feature vectors 1 through n generated as
`shown and described above. Signature 600 may also include
`an identification portion containing a unique ID. It is con
`templated that the identification portion may contain a
`unique identifier provided by the RIAA (Recording Industry
`ASSociation of America) or Some other audio authority or
`cataloging agency. The identification portion may also con
`tain information such as the UPC (Universal Product Code)
`of the various products that contain the audio corresponding
`to this Signature. Additionally, it is contemplated that the
`Signature 600 may also contain information pertaining to the
`characteristics of the file itself, Such as the hop size, Segment
`size, number of Segments, etc., which may be useful for
`Storing and indexing.
`Signature 600 may then be stored in a database and used
`for comparisons.
`The following computer code in the C programming
`language provides an example of a database Structure in
`memory according to the present invention:
`
`8
`Signature derived from a Segmentation of the original audio
`work as described above. In a presently preferred
`embodiment, reference Signatures have 20 non-overlapping
`Segments, where each Segment is one Second in duration,
`with one-Second spacing from center to center, as described
`above. Each of these segments is represented by 10 Mel
`filtered cepstral coefficients (MFCCs), resulting in a feature
`vector of 200 dimensions. Since indexing a vector Space of
`this dimensionality is not practical, the number of dimen
`Sions used for the initial Search for possible candidates is
`reduced according to the present invention.
`Reducing the Dimensionality
`FIG. 7A is a flowchart of dimension reduction according
`to the present invention. The number of dimensions used for
`the initial Search for possible candidates is reduced, resulting
`in what the inventors refer to as a Subspace. By having the
`present invention Search a Subspace at the outset, the effi
`ciency of the Search may be greatly increased.
`Referring now to FIG. 7A, the present invention accom
`plishes two tasks to develop this Subspace: (1) the present
`invention uses less than the total number of Segments in the
`reference signatures in act 701; and (2) the present invention
`performs a principal components analysis to reduce the
`dimensionality in act 703.
`Using Less Segments to Perform an Initial Search
`The inventors empirically have found that using data from
`two consecutive segments (i.e., a two-second portion of the
`Signature) to search for approximately 500 candidates is a
`good tradeoff between computation complexity and accu
`racy. The number of candidates can be altered for different
`applications where either speed or accuracy is more or leSS
`important.
`For example, the present invention may be configured to
`extract a predetermined percentage of candidates. In an
`exemplary non-limiting embodiment of the present
`invention, a list of candidates may comprise 2% of the size
`of the reference Signature database when using 2 Segments
`for the initial Search. In another exemplary non-limiting
`embodiment of the present invention, a list of candidates
`may be those reference Signatures whose distances based on
`the initial 2 Segment Search are below a certain threshold.
`As will be appreciated by those of ordinary skill in the ar

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket