`
`
`
`Exhibit 26
`
`
`
`
`
`
`
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 2 of 32
`ETlQmégbfigv-OBQE-PGG-SN Document 239-8 Filed 11/12/20 PageZof 32
`
`This application is submitted in the name of inventor Erling H. Wold.
`
`AMC—Ol—OO4P
`
`SPECIFICATION
`
`METHOD AND APPARATUS FOR
`
`IDENTIFYING AN UNKNOWN WORK
`
`BACKGROUND OF THE INVENTION
`
`Field of the Invention
`
`The present invention relates to data communications. In particular, the present
`
`
`
`
`
`
`,4“?
`
`10
`
`invention relates to a novel method and apparatus for identifying an unknown work.
`
`The Prior Art
`
`Background
`
`Digital audio technology has greatly changed the landscape of music and
`
`entertainment. Rapid increases in computing power coupled with decreases in cost have
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 3 of 32
`ETR§§§6W02396PGGSN Document 239-8 Filed 11/12/20 Page3of 32
`
`AMC-Ol-OO4P
`
`made it possible for individuals to generate finished products having a quality once
`
`available only in a major studio. One consequence of modern technology is that legacy
`
`media storage standards, such as reel—to—reel tapes, are being rapidly replaced by digital
`
`storage media, such as the Digital Versatile Disk (DVD), and Digital Audio Tape (DAT).
`
`Additionally, with higher capacity hard drives standard on most personal computers,
`
`home users may now store digital files such as audio or video tracks on their home
`
`computers .
`
`Furthermore, the Internet has generated much excitement, particularly among
`
`those who see the Internet as an opportunity to develop new avenues for artistic
`
`expression and communication. The Internet has become a Virtual gallery, where artists
`
`may post their works on a Web page. Once posted, the works may be viewed by anyone
`
`having access to the Internet.
`
`
`
`
`One application of the Internet that has received considerable attention is the
`
`ability to transmit recorded music over the Internet. Once music has been digitally
`
`15
`
`encoded, the audio may be both downloaded by users for play, or broadcast (“streamed”)
`
`over the Internet. When audio is streamed, it may be listened to by Internet users in a
`
`manner much like traditional radio stations.
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 4 of 32
`ETfifiggébW-OBQE-PGG-SN Document 239-8 Filed 11/12/20 Page40f 32
`
`AMC-Ol-OO4P
`
`Given the widespread use of digital media, digital audio files, or digital video files
`
`containing audio information, may need to be identified. The need for identification of
`
`digital files may arise in a variety of situations. For example, an artist may wish to verify
`
`royalty payments or generate their own Arbitron®—like ratings by identifying how often
`
`their works are being streamed or downloaded. Additionally, users may wish to identify
`
`a particular work. The prior art has made efforts to create methods for identifying digital
`
`audio works.
`
`
`
`
`itin—3412‘.=dim.
`
`However, systems of the prior art suffer from certain disadvantages. One area of
`
`difficulty arises when a large number of reference signatures must be compared to an
`
`unknown audio recording.
`
`The simplest method for comparing an incoming audio signature (which could be
`
`from a file on the Internet, a recording of a radio or Internet radio broadcast, a recording
`
`from a cell phone, etc) to a database of reference signatures for the purpose of
`
`identification is to simply compare the incoming signature to every element of the
`
`15
`
`database. However, since it may not be known where the reference signatures might have
`
`occurred inside the incoming signature, this comparison must be done at many time
`
`locations within the incoming signature. Each individual signature-to—signature
`
`comparison at each point in time may also be done in a “brute-force” manner using
`
`U)
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 5 of 32
`ETEfigégW-OBQdPGG-SN DO0Ument239—8 Filed 11/12/20 Page50f 32
`
`AMC-Ol-OO4P
`
`techniques known in the art; essentially computing the full Euclidean distance between
`
`the entire signatures’ feature vectors. A match can then be declared when one of these
`
`comparisons yields a score or distance that is above or below some threshold,
`
`respectively.
`
`However, when an audio signature or fingerprint contains a large number of
`
`features such a brute—force search becomes too expensive computationally for real-world
`
`databases which typically have several hundred thousand to several million signatures.
`
`Many researchers have worked on methods for multi—dimensional indexing,
`
`although the greatest effort has gone into geographical (2-dimensional) or spatial (3-
`
`dimensional) data. Typically, all of these methods order the elements of the database
`
`based on their proximity to each other.
`
`For example, the elements of the database can be clustered into hyper-spheres or
`
`hyper—rectangles, or the space can be organized into a tree form by using partitioning
`
`planes. However, when the number of dimensions is large (on the order of 15 or more), it
`
`
`
`
`15
`
`can be shown mathematically that more-or-less uniformly distributed points in the space
`
`all become approximately equidistant from each other. Thus, it becomes impossible to
`
`cluster the data in a meaningful way, and comparisons can become both lengthy and
`
`inaccurate.
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 6 of 32
`Equggwm-OZPJQE-PGG-SN Document 239-8 Filed 11/12/20 Page60f 32
`
`Hence, there exists a need to provide a means for data comparison which
`
`overcomes the disadvantages of the prior art.
`
`AMC-01—004P
`
`BRIEF DESCRIPTION OF THE INVENTION
`
`A method and apparatus for identifying an unknown work is disclosed. In one
`
`5
`
`aspect, a method may includes the acts of providing a reference database having a
`
`
`
`
`
`reduced dimensionality containing signatures of sampled works; receiving a sampled
`
`work; producing a signature from the work; and reducing the dimensionality of the
`
`signature.
`
`BRIEF DESCRIPTION OF THE DRAWING FIGURES
`
`Figure 1A is a flowchart of a method according to the present invention.
`
`Figure 1B is a flowchart of another method according to the present invention.
`
`Figure 2 is a diagram of a system suitable for use with the present invention.
`
`Figure 3 is a diagram of segmentng according to the present invention.
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 7 of 32
`Efigfgfiblfifig-OZEG-PGG-SN Document 239-8 Filed 11/12/20 Page7of 32
`
`Figure 4 is a detailed diagram of segmenting according to the present invention
`
`AMC-Ol-004P
`
`showing hop size.
`
`Figure 5 is a graphical flowchart showing the creating of a segment feature vector
`
`according to the present invention.
`
`5
`
`Figure 6 is a diagram of a signature according to the present invention.
`
`
`
`Figure 7A is a flowchart of a method for preparing a reference database according
`
`to the present invention.
`
`Figure 7B is a flowchart of method for identifying an unknown work according to
`
`.1
`
`the present invention.
`
`DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
`
`Persons of ordinary skill in the art will realize that the following description of the
`
`present invention is illustrative only and not in any way limiting. Other embodiments of
`
`the invention will readily suggest themselves to such skilled persons having the benefit of
`
`15
`
`this disclosure.
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 8 of 32
`EngqsgéGggtpg-OBQE-PGG-SN Document 239-8 Filed 11/12/20 Page80f 32
`
`It is contemplated that the present invention may be embodied in various computer
`
`AMC-Ol-004P
`
`and machine-readable data structures. Furthermore, it is contemplated that data
`
`structures embodying the present invention will be transmitted across computer and
`
`machine—readable media, and through communications systems by use of standard
`
`protocols such as those used to enable the Internet and other computer networking
`
`standards.
`
`
`
`15
`
`The invention further relates to machine-readable media on which are stored
`
`embodiments of the present invention. It is contemplated that any media suitable for
`
`storing instructions related to the present invention is within the scope of the present
`
`invention. By way of example, such media may take the form of magnetic, optical, or
`
`semiconductor media.
`
`The present invention may be described through the use of flowcharts. Often, a
`
`single instance of an embodiment of the present invention will be shown. As is
`
`appreciated by those of ordinary skill in the art, however, the protocols, processes, and
`
`procedures described herein may be repeated continuously or as often as necessary to
`
`satisfy the needs described herein. Accordingly, the representation of the present
`
`invention through the use of flowcharts should not be used to limit the scope of the
`
`present invention.
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 9 of 32
`EMpggmg-OBQE-PGG-SN Document 239-8 Filed 11/12/20 Page90f 32
`
`AMC-Ol-OO4P
`
`The present invention may also be described through the use of web pages in
`
`which embodiments of the present invention may be Viewed and manipulated. It is
`
`contemplated that such web pages may be programmed with web page creation programs
`
`using languages standard in the art such as HTML or XML. It is also contemplated that
`
`the web pages described herein may be viewed and manipulated with web browsers
`
`running on operating systems standard in the art, such as the Microsoft Windows® and
`
`Macintosh® versions of Internet Explorer® and Netscape®. Furthermore, it is
`
`contemplated that the functions performed by the various web pages described herein
`
`may be implemented through the use of standard programming languages such a Java®
`
`or similar languages.
`
`The present invention will first be described in general overview. Then, each
`
`element will be described in further detail below.
`
`
`
`
`Referring now to Figure 1A, a flowchart is shown which provides a general
`
`overview of the present invention as related to the preparation of a database of reference
`
`15
`
`signatures. Two overall acts are performed to prepare a reference database in
`
`accordance with the present invention:
`
`in act 100, the present invention reduces the
`
`dimensionality of reference signatures; and the reference database is indexed in act 102.
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 10 of 32
`EfigsgflgéglgggOBQE-PGG-SN Document 239-8 Filed 11/12/20 Page 10 of 32
`
`AMC-01-004P
`
`Referring now to Figure 1B, a flowchart is shown which provides a general
`
`overview of the present invention as related to the identification of an unknown signature
`
`in accordance with the present invention. In act 104, a sampled work is received. In act
`
`106, the present invention reduces the dimensionality of the received work. In act 108,
`
`the present invention determines initial candidates. In act 110, the present invention
`
`searches for the best candidate.
`
`Prior to presenting a detailed overview of each act of FIGS. 1A and 1B, some
`
`background will first be presented.
`
`Structural embodiment of the present invention
`
`Referring now to Figure 2, a diagram of a system suitable for use with the present
`
`invention is shown. FIG. 2 includes a client system 200. It is contemplated that client
`
`system 200 may comprise a personal computer 202 including hardware and software
`
`standard in the art to run an operating system such as Microsoft Windows ®, MAC OS ®
`
`
`
`Palm OS, UNIX, or other operating systems standard in the art. Client system 200 may
`
`15
`
`further include a database 204 for storing and retrieving embodiments of the present
`
`invention. It is contemplated that database 204 may comprise hardware and software
`
`standard in the art and may be operatively coupled to PC 202. Database 204 may also be
`
`used to store and retrieve the works and segments utilized by the present invention.
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 11 of 32
`EfifiggéégggSOZBQE-PGG-SN Document 239-8 Filed 11/12/20 Page 11 of 32
`
`AMC-Ol—OO4P
`
`Client system 200 may further include an audio/video (A/V) input device 208.
`
`A/V device 208 is operatively coupled to PC 202 and is configured to provide works to
`
`the present invention which may be stored in traditional audio or video formats. It is
`
`contemplated that A/V device 208 may comprise hardware and software standard in the
`
`art configured to receive and sample audio works (including video containing audio
`
`information), and provide the sampled works to the present invention as digital audio
`
`files. Typically, the A/V input device 208 would supply raw audio samples in a format
`
`such as 16-bit stereo PCM format. A/V input device 208 provides an example of means
`
`for receiving a sampled work.
`
`It is contemplated that sampled works may be obtained over the Internet, also.
`
`Typically, streaming media over the Internet is provided by a provider, such as provider
`
`218 of FIG. 2. Provider 218 includes a streaming application server 220, configured to
`
`retrieve works from database 222 and stream the works in a formats standard in the art,
`
`
`
`
`
`such as Real®, Windows Media®, or QuickTime®. The server then provides the
`
`15
`
`streamed works to a web server 224, which then provides the streamed work to the
`
`Internet 214 through a gateway 216. Internet 214 may be any packet—based network
`
`standard in the art, such as IP, Frame Relay, or ATM.
`
`10
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 12 of 32
`EfifigfigaggggOBQE-PGG-SN Document 239-8 Filed 11/12/20 Page 12 of 32
`
`AMC-01-004P
`
`To reach the provider 218, the present invention may utilize a cable or DSL head
`
`end 212 standard in the art operatively, which is coupled to a cable modem or DSL
`
`modem 210 which is in turn coupled to the system’s network 206. The network 206 may
`
`be any network standard in the art, such as a LAN provided by a PC 202 configured to
`
`run software standard in the art.
`
`
`
`
`It is contemplated that the sampled work received by system 200 may contain
`
`audio information from a variety of sources known in the art, including, without
`
`limitation, radio, the audio portion of a television broadcast, Internet radio, the audio
`
`portion of an Internet video program or channel, streaming audio from a network audio
`
`server, audio delivered to personal digital assistants over cellular or wireless
`
`communication systems, or cable and satellite broadcasts.
`
`Additionally, it is contemplated that the present invention may be configured to
`
`receive and compare segments coming from a variety of sources either stored or in real-
`
`time. For example, it is contemplated that the present invention may compare a real-time
`
`15
`
`streaming work coming from streaming server 218 or A/V device 208 with a reference
`
`segment stored in database 204.
`
`S egmenting background
`
`11
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 13 of 32
`Efififigéc’bébfié-OZBQE-PGG-SN Document 239-8 Filed 11/12/20 Page 13 of 32
`
`It is contemplated that a wide variety of sampled works may be utilized in the
`
`present invention. However, the inventors have found the present invention especially
`
`useful with segmented works. An overview of a segmented work will now be provided.
`
`AMC—01-004P
`
`Figure 3 shows a diagram showing the segmenting of a work according to the
`
`present invention. FIG. 3 includes audio information 300 displayed along a time axis
`
`302. FIG. 3 further includes a plurality of segments 304, 306, and 308 taken of audio
`
`information 300 over some segment size T.
`
`In an exemplary non-limiting embodiment of the present invention, instantaneous
`
`values of a variety of acoustic features are computed at a low level, preferably about 100
`
`times a second. In particular, 10 MFCCS (cepstral coefficients) are computed. It is
`
`contemplated that any number of MFCCs may be computed. Preferably, 5-20 MFCCs
`
`are computed, however, as many as 30 MFCCs may be computed, depending on the need
`
`for accuracy versus speed.
`
`
`
`
`
`
`
`Segment-level features are disclosed US Patent #5,918,223 to Blum, et al., which
`
`15
`
`is assigned to the assignee of the current disclosure and incorporated by reference as
`
`though fully set forth herein. In an exemplary non-limiting embodiment of the present
`
`invention, the segment-level acoustical features comprise statistical measures as disclosed
`
`in the ‘223 patent of low-level features calculated over the length of each segment. The
`
`12
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 14 of 32
`EfiflfSSlSWOBga'PGG'SN Document 239-8 Filed 11/12/20 Page 14 of 32
`
`data structure may store other bookkeeping information as well (segment size, hop size,
`
`item ID, UPC, etc).
`
`AMC-01-004P
`
`As can be seen by inspection of FIG. 3, the segments 304, 306, and 308 may
`
`overlap in time. This amount of overlap may be represented by measuring the time
`
`between the center point of adjacent segments. This amount of time is referred to herein
`
`as the hop size of the segments, and is so designated in FIG. 3. By way of example, if the
`
`segment length T of a given segment is one second, and adjacent segments overlap by
`
`50%, the hop size would be 0.5 second.
`
`The hop size may be set during the development of the software. Additionally,
`
`the hop sizes of the reference database and the real—time signatures may be predetermined
`
`to facilitate compatibility. For example, the reference signatures in the reference
`
`database may be precomputed with a fixed hop and segment size, and thus the client
`
`applications should conform to this segment size and have a hop size which integrally
`
`
`
`divides the reference signature hop size. It is contemplated that one may experiment with
`
`15
`
`a variety of segment sizes in order to balance the tradeoff of accuracy with speed of
`
`computation for a given application.
`
`The inventors have found that by carefully choosing the hop size of the segments,
`
`the accuracy of the identification process may be significantly increased. Additionally,
`
`13
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 15 of 32
`Efigsssfléafibfié-OBQE-PGG-SN Document 239-8 Filed 11/12/20 Page 15 of 32
`
`AMC-01-004P
`
`the inventors have found that the accuracy of the identification process may be increased
`
`if the hop size of reference segments and the hop size of segments obtained in real-time
`
`are each chosen independently. The importance of the hop size of segments may be
`
`illustrated by examining the process for segmenting pre-recorded works and real—time
`
`works separately.
`
`Reference signatures
`
`
`
`
`
`
`
`Prior to attempting to identify a given work, a reference database of signatures
`
`must be created. When building a reference database, a segment length having a period
`
`of less than three seconds is preferred. In an exemplary non-limiting embodiment of the
`
`present invention, the segment lengths have a period ranging from 0.5 seconds to 3
`
`seconds. For a reference database, the inventors have found that a hop size of
`
`approximately 50% to 100% of the segment size is preferred.
`
`It is contemplated that the reference signatures may be stored on a database such
`
`as database 204 as described above. Database 204 and the discussion herein provide an
`
`15
`
`example of means for providing a plurality of reference signatures each having a segment
`
`size and a hop size.
`
`Unknown signatures
`
`l4
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 16 of 32
`Efigf85%6b§-fig-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 16 of 32
`
`The Choice of the hop size is important for the signatures of the audio to be
`
`identified, hereafter referred to as “unknown audio.”
`
`AMC-01-004P
`
`Figure 4 shows a detailed diagram of the segmentation of unknown audio
`
`according to the present invention. FIG. 4 includes unknown audio information 400
`
`displayed along a time axis 402. FIG. 4 further includes segments 404 and 406 taken of
`
`audio information 400 over some segment length T. In an exemplary non—limiting
`
`embodiment of the present invention, the segment length of unknown audio segments is
`
`chosen to range from 0.5 to 3 seconds.
`
`As can be seen by inspection of FIG. 4, the hop size of unknown audio segments is
`
`chosen to be smaller than that of reference segments. In an exemplary non-limiting
`
`embodiment of the present invention, the hop size of unknown audio segments is less
`
`than 50% of the segment size. In yet another exemplary non—limiting embodiment of the
`
`present invention, the unknown audio hop size may be 0.1 seconds.
`
`
`
`
`
`The inventors have found such a small hop size advantageous for the following
`
`15
`
`reasons. The ultimate purpose of generating unknown audio segments is to analyze and
`
`compare them with the reference segments in the database to look for matches. The
`
`inventors have found at least two major reasons why an unknown audio recording would
`
`not match its counterpart in the database. One is that the broadcast channel does not
`
`15
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 17 of 32
`Efigfisslécbébfié-OBQE-PGG-SN Document 239-8 Filed 11/12/20 Page 17 of 32
`
`produce a perfect copy of the original. For example, the work may be edited or processed
`
`or the announcer may talk over part of the work. The other reason is that larger segment
`
`boundaries may not line up in time with the original segment boundaries of the target
`
`AMC-Ol-OO4P
`
`recordings.
`
`The inventors have found that by choosing a smaller hop size, some of the
`
`segments will ultimately have time boundaries that line up with the original segments,
`
`notwithstanding the problems listed above. The segments that line up with a “Clean”
`
`segment of the work may then be used to make an accurate comparison while those that
`
`do not so line up may be ignored. The inventors have found that a hop size of 0.1 seconds
`
`seems to be the maximum that would solve this time shifting problem.
`
`As mentioned above, once a work has been segmented, the individual segments
`
`are then analyzed to produce a segment feature vector. Figure 5 is a diagram showing an
`
`overview of how the segment feature vectors may be created using the methods described
`
`in US Patent #5,918,223 to Blum, et a1. It is contemplated that a variety of analysis
`
`
`
`
`15
`
`methods may be useful in the present invention, and many different features may be used
`
`to make up the feature vector.“ The inventors have found that the pitch, brightness,
`
`bandwidth, and loudness features of the ‘223 patent to be useful in the present invention.
`
`Additionally, spectral features may be used analyzed, such as the energy in various
`
`l6
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 18 of 32
`EfigffisfléaébfisrozwfiPGG-SN Document 239-8 Filed 11/12/20 Page 18 of 32
`
`AMC—Ol-004P
`
`spectral bands. The inventors have found that the cepstral features (MFCCs) are very
`
`robust (more invariant) given the distortions typically introduced during broadcast, such
`
`as EQ, multi-band compression/limiting, and audio data compression techniques such as
`
`MP3 encoding/decoding, etc.
`
`
`
`
`In act 500, the audio segment is sampled to produce a segment. In act 502, the
`
`sampled segment is then analyzed using Fourier Transform techniques to transform the
`
`signal into the frequency domain. In act 504, mel frequency filters are applied to the
`
`transformed signal to extract the significant audible characteristics of the spectrum. In
`
`act 506, a Discrete Cosine Transform is applied which converts the signal into mel
`
`frequency cepstral coefficients (MFCCs). Finally, in act 508, the MFCCS are then
`
`averaged over a predetermined period. In an exemplary non—limiting embodiment of the
`
`present invention, this period is approximately one second. Additionally, other
`
`characteristics may be computed at this time, such as brightness or loudness. A segment
`
`feature vector is then produced which contains a list containing at least the 10 MFCCS
`
`15
`
`corresponding average.
`
`The disclosure of FIGS. 3, 4, and 5 provide examples of means for creating a
`
`signature of a sampled work having a segment size and a hop size.
`
`17
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 19 of 32
`Efiagfagégfilécys-OZBQSPGG-SN Document 239-8 Filed 11/12/20 Page 19 of 32
`
`AMC-Ol-OO4P
`
`Figure 6 is a diagram showing a complete signature 600 according to the present
`
`invention. Signature 600 includes a plurality of segment feature vectors 1 through It
`
`generated as shown and described above. Signature 600 may also include an
`
`identification portion containing a unique ID. It is contemplated that the identification
`
`portion may contain a unique identifier provided by the RIAA (Recording Industry
`
`Association of America) or some other audio authority or cataloging agency. The
`
`identification portion may also contain information such as the UPC (Universal Product
`
`Code) of the various products that contain the audio corresponding to this signature.
`
`Additionally, it is contemplated that the signature 600 may also contain information
`
`pertaining to the characteristics of the file itself, such as the hop size, segment size,
`
`number of segments, etc., which may be useful for storing and indexing.
`
`Signature 600 may then be stored in a database and used for comparisons.
`
`The following computer code in the C programming language provides an
`
`example of a database structure in memory according to the present invention:
`
`typedef struct
`{
`
`float hopSize;
`float
`segmentSize;
`MFSignature* signatures;
`} MFDatabase;
`
`/* hop size */
`/* segment size */
`/* array of signatures */
`
`l8
`
`
`
`
`15
`
`20
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 20 of 32
`gage 1'14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 20 of 32
`056099US
`
`AMC-O l -004P
`
`10
`
`
`
`The following provides an example of the structure of a segment according to the
`present invention:
`
`typedef struct
`{
`
`char* id;
`
`numSegrnents;
`long
`float* features;
`
`long
`size;
`float hopSize;
`float
`segmentSize;
`} MFSignature;
`
`/* unique ID for this audio clip */
`/"< number of segments */
`/* feature array */
`/* size of per—segment feature vector */
`
`The discussion of FIG. 6 provides an example of means for storing segments and
`
`signatures according to the present invention.
`
`A more detailed description of the operation of the present invention will now be
`
`provided.
`
`Referring now to Figure 7A, a flowchart showing one aspect of a method
`
`according to the present invention is presented.
`
`Reference database preparation
`
`19
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 21 of 32
`Efig§85%&94§fi\§02396-PGG-SN Document 239-8 Filed 11/12/20 Page 21 of 32
`
`AMC-Ol—OO4P
`
`Prior to the identification of an unknown sample, a database of reference
`
`signatures is prepared in accordance with the present invention.
`
`In an exemplary non—limiting embodiment of the present invention, a reference
`
`signature may comprise an audio signature derived from a segmentation of the original
`
`audio work as described above. In a presently preferred embodiment, reference
`
`signatures have 20 non-overlapping segments, where each segment is one second in
`
`duration, with one—second spacing from center to center, as described above. Each of
`
`these segments is represented by 10 Mel filtered cepstral coefficients (MFCCs), resulting
`
`in a feature vector of 200 dimensions. Since indexing a vector space of this
`
`dimensionality is not practical, the number of dimensions used for the initial search for
`
`possible candidates is reduced according to the present invention.
`
`Reducing the dimensionality
`
`
`
`
`
`
`
`Figure 7A is a flowchart of dimension reduction according to the present
`
`invention. The number of dimensions used for the initial search for possible candidates is
`
`15
`
`reduced, resulting in what the inventors refer to as a subspace. By having the present
`
`invention search a subspace at the outset, the efficiency of the search may be greatly
`
`increased.
`
`20
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 22 of 32
`Efifigjfléc’légpkOBQE-PGG-SN Document 239-8 Filed 11/12/20 Page 22 of 32
`
`AMC—Ol—OO4P
`
`Referring now to FIG. 7A, the present invention accomplishes two tasks to
`
`develop this subspace: (1) the present invention uses less than the total number of
`
`segments in the reference signatures in act 701; and (2) the present invention performs a
`
`principal components analysis to reduce the dimensionality in act 703.
`
`Using less segments to perform an initial search
`
`
`
`
`
`The inventors empirically have found that using data from two consecutive
`
`segments (i.e., a two-second portion of the signature) to search for approximately 500
`
`candidates is a good tradeoff between computation complexity and accuracy. The
`
`number of candidates can be altered for different applications where either speed or
`
`accuracy is more or less important.
`
`For example, the present invention may be configured to extract a predetermined
`
`percentage of candidates. In an exemplary non—limiting embodiment of the present
`
`invention, a list of candidates may comprise 2% of the size of the reference signature
`
`database when using 2 segments for the initial search. In another exemplary non-limiting
`
`15
`
`embodiment of the present invention, a list of candidates may be those reference
`
`signatures whose distances based on the initial 2 segment search are below a certain
`
`threshold.
`
`21
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 23 of 32
`Efigfissléfig-fié-OBQE-PGG-SN Document 239-8 Filed 11/12/20 Page 23 of 32
`
`AMC-Ol-OO4P
`
`As will be appreciated by those of ordinary skill in the art, the dimension
`
`reduction of the present invention may be used to perform initial search using fewer
`
`segments for data other than MFCC—based feature vectors.
`
`It is contemplated that any
`
`feature—based vector set may be used in the present invention.
`
`Furthermore, the segments used in the initial search do not have to be the same
`
`size as the segments used for the final search. Since it may be better to use as few
`
`dimensions as possible in the initial search for candidates, a smaller segment size is
`
`advantageous here. The full segment size can then be used in the final search. In an
`
`exemplary non—limiting embodiment of the current invention, the initial search may use
`
`the higher-order MFCCs (since these are the most robust) — this is a simple way to reduce
`
`the dimensionality.
`
`In the next section, we will discuss another, more sophisticated, method for
`
`reducing the segment size for the initial candidate search.
`
`
`
`Perform alternate encoding
`
`15
`
`The second step is to use an alternate encoding of the MFCC data which has the
`
`same information but with fewer features.
`
`22
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 24 of 32
`fiafi6éggg§fi§2396-PGG-SN Document 239-8 Filed 11/12/20 Page 24 of 32
`
`AMC-Ol-004P
`
`To accomplish this, the present invention first performs an eigenanalysis of N
`
`candidates to determine the principal components of the MFCCs for our typical audio
`
`data. In an exemplary non-limiting embodiment of the present invention, the present
`
`invention examines 25,000 audio signatures of 20 segments each - each taken from a
`
`different recording, which gives provides 500,000 sets of MFCCs. The inventors have
`
`found that this is enough to be a good statistical sample of the feature vectors.
`
`
`
`
`
`As is appreciated by those of ordinary skill in the art, the number examined in the
`
`present invention may be adjusted to provide a good statistical sample of different kinds
`
`of music. For example, 100 or a 1000 segments may be satisfactory.
`
`Next, a Karhunen-Loeve transformation is derived. Each set of 10 MFCCS
`
`becomes a column of a matrix A. We then compute ATA and find the 10 eigenvalues and
`
`eigenvectors of this matrix. Sorting the eigenvectors by eigenvalue (largest eigenvalue
`
`first) results in a list of orthogonal basis vectors that are the principal components of the
`
`segment data. For a database of typical music recordings, 95% of the information in the
`
`15
`
`MFCCs is contained in the first 7 components of this new basis.
`
`As is known by those having ordinary skill in the art, the Karhunen—Loeve
`
`transformation is represented by the matrix that has the all 10 of the above eigenvectors
`
`as its rows. This transformation is applied to all the segments of all the reference
`
`
`
`Case 1:14-cv-02396-PGG-SN Document 239-8 Filed 11/12/20 Page 25 of 32
`gfigl%§l64690§/6032396-PGG-SN Document 239-8 Filed 11/12/20 Page 25 of 32
`
`AMC-Ol-004P
`
`signatures in the database as well as to all the segments of any signatures that are to be
`
`identified. This allows approximate