Exhibit 27
A SYSTEM AND METHOD FOR ACOUSTIC FINGERPRINTING

Docket No. G1841-906862
Field of the Invention

The present invention is related to a method for the creation of digital fingerprints that are representative of the acoustic properties of an audio signal. More particularly, it is a system that allows the creation of fingerprints which permit the recognition of audio signals independent of common signal distortions, such as normalization and psycho-acoustic compression.
Description of the Prior Art

Acoustic fingerprinting has historically been used primarily for signal recognition purposes, in particular, in terrestrial radio monitoring systems. Since these were primarily continuous audio sources, fingerprinting solutions were required which dealt with the lack of delimiters between given signals. Additionally, performance was not a primary concern of these systems, as any given monitoring system did not have to discriminate between hundreds of thousands of signals, and the ability to tune the system for speed versus robustness was not of great importance.

As a survey of the existing approaches, U.S. Patent No. 5,918,223 describes a system that builds sets of feature vectors, using such features as bandwidth, pitch, brightness, loudness, and MFCC coefficients. It has problems relating to the cost of the match algorithm (which requires summed differences across the entire feature vector set), as well as the discrimination potential inherent in its feature bank. Many common signal distortions that are encountered in compressed audio files, such as normalization, impact those features, making them unacceptable for a large-scale system. Additionally, it is not tunable for speed versus robustness, which is an important trait for certain systems.
U.S. Patent No. 5,581,658 describes a system which uses neural networks to identify audio content. It has advantages in high noise situations versus feature vector based systems, but does not scale effectively, due to the cost of running a neural network to discriminate between hundreds of thousands, and potentially millions, of signal patterns, making it impractical for a large-scale system.

U.S. Patent No. 5,210,820 describes an earlier form of feature vector analysis, which uses a simple spectral band analysis, with statistical measures such as variance, moments, and kurtosis calculations applied. It proves to be effective at recognizing audio signals after common radio style distortions, such as speed and volume shifts, but tends to break down under psycho-acoustic compression schemes such as MP3 and Ogg Vorbis, or in other high noise situations.

None of these systems proves to be scalable to a large number of fingerprints and a large volume of recognition requests. Additionally, none of the existing systems are effectively able to deal with many of the common types of signal distortion encountered with compressed files, such as normalization, small amounts of time compression and expansion, envelope changes, noise injection, and psycho-acoustic compression artifacts.
Summary of the Invention

This system for acoustic fingerprinting consists of two parts: the fingerprint generation component, and the fingerprint recognition component. Fingerprints are built off a sound stream, which may be sourced from a compressed audio file, a CD, a radio broadcast, or any of the available digital audio sources. Depending on whether a defined start point exists in the audio stream, a different fingerprint variant may be used. The recognition component can exist on the same computer as the fingerprint generation component, but will frequently be located on a central server, where multiple fingerprint sources can access it.

Fingerprints are formed by the subdivision of an audio stream into discrete frames, wherein acoustic features, such as zero crossing rates, spectral residuals, and Haar wavelet residuals, are extracted, summarized, and organized into frame feature vectors. Depending on the robustness requirement of an application, different frame overlap percentages and summarization methods are supported, including simple frame vector concatenation, statistical summary (such as variance, mean, first derivative, and moment calculation), and frame vector aggregation.

Fingerprint recognition is performed by a Manhattan distance calculation between a nearest neighbor set of feature vectors, drawn from a reference database of feature vectors, and a given unknown fingerprint vector. Additionally, previously unknown fingerprints can be recognized by their lack of similarity with existing fingerprints, allowing the system to intelligently index new signals as they are encountered. Identifiers are associated with the reference database vectors, which allows the match subsystem to return the associated identifier when a matching reference vector is found.
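As a concrete illustration of the comparison just described, the following is a minimal sketch, assuming feature vectors are plain numeric sequences; the weighting scheme and match threshold shown here are illustrative placeholders, not the specific values used by the patented system.

```python
# Illustrative sketch of a weighted Manhattan (L1) distance comparison against
# a reference database. Weights and threshold values are hypothetical.

def manhattan_distance(unknown, reference, weights=None):
    """Weighted sum of absolute differences between two feature vectors."""
    if weights is None:
        weights = [1.0] * len(unknown)
    return sum(w * abs(u - r) for u, r, w in zip(unknown, reference, weights))

def best_match(unknown, reference_db, weights, match_threshold):
    """Return the identifier of the closest reference vector, or None if no
    reference vector falls within the match threshold."""
    best_id, best_dist = None, float("inf")
    for identifier, ref_vector in reference_db.items():
        dist = manhattan_distance(unknown, ref_vector, weights)
        if dist < best_dist:
            best_id, best_dist = identifier, dist
    return best_id if best_dist < match_threshold else None
```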
Finally, comparison functions can be defined to allow the direct comparison of fingerprint vectors, for the purpose of defining similarity in specific feature areas, or from a gestalt perspective. This allows the sorting of fingerprint vectors by similarity, a useful quantity for multimedia database systems.
Brief Description of the Drawings

In the drawings:

FIG. 1 is a logic flow diagram showing the preprocessing stage of fingerprint generation, including decompression, down sampling, and DC offset correction.

FIG. 2 is a logic flow diagram giving an overview of the fingerprint generation steps.

FIG. 3 is a logic flow diagram giving more detail of the time domain feature extraction step.

FIG. 4 is a logic flow diagram giving more detail of the spectral domain feature extraction step.

FIG. 5 is a logic flow diagram giving more detail of the beat tracking feature step.

FIG. 6 is a logic flow diagram giving more detail of the finalization step, including spectral band residual computation, and wavelet residual computation and sorting.

FIG. 7 is a diagram of the aggregation match server components.

FIG. 8 is a diagram of the collection match server components.

FIG. 9 is a logic flow diagram giving an overview of the concatenation match server logic.

FIG. 10 is a logic flow diagram giving more detail of the concatenation match server comparison function.

FIG. 11 is a logic flow diagram giving an overview of the aggregation match server logic.

FIG. 12 is a logic flow diagram giving more detail of the aggregation match server string fingerprint comparison function.
Detailed Description of the Invention

The ideal context of this system places the fingerprint generation component within a database or media playback tool. This system, upon adding unknown content, proceeds to generate a fingerprint, which is then sent to the fingerprint recognition component, located on a central recognition server. The resulting identification information can then be returned to the media playback tool, allowing, for example, the correct identification of an unknown piece of music, or the tracking of royalty payments by the playback tool.
The first step in generating a fingerprint is opening a media file, identifying the file format, and if appropriate, decompressing the media file. This decompressed stream is then scanned for a DC offset error, and if one is detected, the offset is removed. Following the DC offset correction, the audio stream is down sampled to 11025 Hz, which also serves as a low pass filter of the high frequency component of the audio, and is then down mixed to a mono stream, since the current feature banks do not rely upon phase information. This step is performed both to speed up extraction of acoustic features, and because more noise is introduced in the high frequency components by compression and radio broadcast, making them less useful components from a feature standpoint. This 11025 Hz, 16 bit, mono audio stream is then passed into the fingerprint generation subsystem.
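A minimal sketch of this preprocessing stage follows, assuming the compressed file has already been decoded to a NumPy array of PCM samples (decoding MP3 or Ogg files is outside its scope); the use of scipy's polyphase resampler and the helper name are illustrative choices, not the patent's own implementation.

```python
# Preprocessing sketch: DC offset removal, mono down-mix, and 11025 Hz
# resampling of a decoded PCM sample array.
import numpy as np
from scipy.signal import resample_poly

TARGET_RATE = 11025

def preprocess(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Return a DC-corrected, mono, 11025 Hz sample stream."""
    samples = samples.astype(np.float64)
    # Down-mix to mono, since the feature banks do not rely on phase information.
    if samples.ndim == 2:
        samples = samples.mean(axis=1)
    # Remove any DC offset by subtracting the mean of the stream.
    samples -= samples.mean()
    # Down-sample to 11025 Hz; the polyphase filter also acts as the low pass
    # filter of the high frequency components.
    if sample_rate != TARGET_RATE:
        samples = resample_poly(samples, TARGET_RATE, sample_rate)
    return samples
```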
Four parameters influence fingerprint generation, specifically, frame size, frame overlap percentage, frame vector aggregation type, and signal sample length. In different types of applications, these can be optimized to meet a particular need. For example, increasing the signal sample length will audit a larger amount of a signal, which makes the system usable for signal quality assurance, but takes longer to generate a fingerprint. Increasing the frame size decreases the fingerprint generation cost, reduces the data rate of the final signature, and makes the system more robust to small misalignment in fingerprint windows, but reduces the overall robustness of the fingerprint. Increasing the frame overlap percentage increases the robustness of the fingerprint, reduces sensitivity to window misalignment, and can remove the need to sample a fingerprint from a known start point, when a high overlap percentage is coupled with a collection style frame aggregation method. It has the costs of a higher data rate for the fingerprint, longer fingerprint generation times, and a more expensive match routine.
In the present invention, two combinations of parameters were found to be particularly effective for different systems. The use of a frame size of 96,000 samples, a frame overlap percentage of 0, a concatenation frame vector aggregation method, and a signal sample length of 288,000 samples proves very effective at quickly indexing multimedia content, based on sampling the first 26 seconds of each file. It is not robust against window shifting, however, nor usable in a system wherein that window cannot be aligned. For applications where the overlap point between a reference fingerprint and an audio stream is unknown, the use of 32,000 sample frame windows, with a 75% frame overlap, a signal sample length equal to the entire audio stream, and a collection aggregation method is advised.
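To make the two presets concrete, here is a small configuration sketch; the dataclass and field names are illustrative and not taken from the specification, while the numeric values come from the two combinations described above (sample counts are at the 11025 Hz processing rate).

```python
# Configuration presets for the two parameter combinations described above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FingerprintParams:
    frame_size: int               # samples per frame
    frame_overlap: float          # fraction of overlap between adjacent frames
    aggregation: str              # "concatenation" or "collection"
    sample_length: Optional[int]  # samples to fingerprint; None = entire stream

# Fast indexing of file-based content with a known start point
# (96,000-sample frames, no overlap, first 288,000 samples, roughly 26 s).
INDEXING_PRESET = FingerprintParams(96_000, 0.0, "concatenation", 288_000)

# Stream recognition where the window alignment is unknown
# (32,000-sample frames, 75% overlap, whole stream, collection aggregation).
STREAMING_PRESET = FingerprintParams(32_000, 0.75, "collection", None)
```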
Moving into the fingerprint pipeline, the first step is to advance frame window size samples into the working buffer. Provided a full frame was read in, the time domain features of the working frame vector are computed, according to FIG. 3. The zero crossing rate is computed by storing the sign of the previous sample, and incrementing a counter each time the sign of the current sample is not equal to the sign of the previous sample, with zero samples ignored. The zero crossing total is then divided by the frame window length to compute the zero crossing mean feature. The absolute value of each sample is also summed into a temporary variable, which is also divided by the frame window length to compute the sample mean value. This is divided by the root-mean-square of the samples in the frame window to compute the mean/RMS ratio feature. Additionally, the mean energy value is stored for each block of 10624 samples within the frame. The absolute value of the difference from block to block is then averaged to compute the mean energy delta feature.
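A minimal sketch of these time domain features is shown below, assuming the frame is a mono NumPy array. The 10624-sample block size is taken from the text above; the function and feature names, and the use of the mean absolute value as the per-block energy measure, are assumptions.

```python
# Time domain feature sketch: zero crossing mean, mean/RMS ratio, and mean
# energy delta for one frame of mono samples.
import numpy as np

BLOCK_SIZE = 10624  # block size quoted in the specification text

def time_domain_features(frame: np.ndarray) -> dict:
    n = len(frame)
    # Zero crossing mean: count sign changes, ignoring zero-valued samples,
    # then divide by the frame window length.
    nonzero = frame[frame != 0]
    crossings = np.count_nonzero(np.sign(nonzero[1:]) != np.sign(nonzero[:-1]))
    zero_crossing_mean = crossings / n
    # Mean of absolute sample values divided by the RMS of the frame window.
    sample_mean = np.abs(frame).mean()
    rms = np.sqrt(np.mean(frame ** 2))
    mean_rms_ratio = sample_mean / rms if rms > 0 else 0.0
    # Per-block energy (here taken as the mean absolute value), then the
    # averaged absolute block-to-block difference.
    blocks = [frame[i:i + BLOCK_SIZE] for i in range(0, n, BLOCK_SIZE)]
    block_means = np.array([np.abs(b).mean() for b in blocks])
    mean_energy_delta = (np.abs(np.diff(block_means)).mean()
                         if len(block_means) > 1 else 0.0)
    return {"zero_crossing_mean": zero_crossing_mean,
            "mean_rms_ratio": mean_rms_ratio,
            "mean_energy_delta": mean_energy_delta}
```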
Next, a Haar wavelet transform with a transform size of 64 samples, using 1/2 for the high pass and low pass components of the transform, is computed across the frame samples. Each transform is overlapped by 50%, and the resulting coefficients are summed into a 64 point array. Each point in the array is then divided by the number of transforms that have been performed, and the minimum array value is stored as the normalization value. The absolute value of each array value minus the normalization value is then stored in the array, any values less than 1 are set to 0, and the final array values are converted to log space using the equation array[i] = 20*log10(array[i]). These log scaled values are then sorted into ascending order, to create the wavelet domain feature bank.
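The sketch below follows that description, assuming a standard multi-level Haar decomposition with 1/2 as both filter coefficients; the coefficient ordering and the handling of zeroed values before the log conversion are assumptions, since the text does not specify them.

```python
# Wavelet feature sketch: overlapped 64-point Haar transforms, averaged,
# normalized, log-scaled, and sorted into the wavelet feature bank.
import numpy as np

WAVELET_SIZE = 64

def haar_transform(block: np.ndarray) -> np.ndarray:
    """Full Haar decomposition of a 64-sample block into 64 coefficients."""
    out = block.astype(np.float64)
    length = len(out)
    while length > 1:
        evens = out[0:length:2].copy()
        odds = out[1:length:2].copy()
        half = length // 2
        out[:half] = (evens + odds) / 2.0         # low-pass (averages)
        out[half:length] = (evens - odds) / 2.0   # high-pass (details)
        length = half
    return out

def wavelet_features(frame: np.ndarray) -> np.ndarray:
    """Summed, normalized, log-scaled, sorted Haar residuals for one frame."""
    acc = np.zeros(WAVELET_SIZE)
    count = 0
    step = WAVELET_SIZE // 2  # 50% overlap between transforms
    for start in range(0, len(frame) - WAVELET_SIZE + 1, step):
        acc += haar_transform(frame[start:start + WAVELET_SIZE])
        count += 1
    acc /= max(count, 1)
    acc = np.abs(acc - acc.min())   # subtract the normalization (minimum) value
    acc[acc < 1.0] = 0.0            # floor small values to zero
    with np.errstate(divide="ignore"):
        acc = np.where(acc > 0, 20.0 * np.log10(acc), 0.0)
    return np.sort(acc)             # ascending order = wavelet feature bank
```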
Subsequent to the wavelet computation, a Blackman-Harris window of 64 samples in length is applied, and a Fast Fourier transform is computed. The resulting power bands are summed in a 32 point array, converted to a log scale using the equation spec[i] = log10(spec[i] / 4096) + 6, and then the difference from the previous transform is summed in a companion spectral band delta array of 32 points. This is repeated, with a 50% overlap between each transform, across the entire frame window. Additionally, after each transform is converted to log scale, the sum of the second and third bands, times 5, is stored in an array, beatStore, indexed by the transform number.
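A sketch of this spectral pass is shown below, assuming the 32 power bands are simply the non-DC bins of a 64-point real FFT and that the accumulated bands are averaged into per-frame means for the later finalization step; both of those details, along with the small floor added before the logarithm, are assumptions the text does not pin down.

```python
# Spectral feature sketch: overlapped windowed FFTs, log-scaled power bands,
# band deltas, and the beatStore array used by the beat tracker.
import numpy as np
from scipy.signal.windows import blackmanharris

FFT_SIZE = 64
NUM_BANDS = 32

def spectral_features(frame: np.ndarray):
    window = blackmanharris(FFT_SIZE)
    band_sums = np.zeros(NUM_BANDS)
    band_deltas = np.zeros(NUM_BANDS)
    beat_store = []
    prev = None
    count = 0
    step = FFT_SIZE // 2  # 50% overlap between transforms
    for start in range(0, len(frame) - FFT_SIZE + 1, step):
        spectrum = np.fft.rfft(frame[start:start + FFT_SIZE] * window)
        power = np.abs(spectrum[1:NUM_BANDS + 1]) ** 2   # 32 power bands
        # spec[i] = log10(spec[i] / 4096) + 6, with a floor to avoid log(0).
        spec = np.log10(np.maximum(power, 1e-12) / 4096.0) + 6.0
        band_sums += spec
        if prev is not None:
            band_deltas += spec - prev
        prev = spec
        # Low-frequency energy (second and third bands, times 5) feeds the
        # beat tracker, indexed by transform number.
        beat_store.append(5.0 * (spec[1] + spec[2]))
        count += 1
    band_means = band_sums / max(count, 1)   # averaged into per-band means
    return band_means, band_deltas, np.array(beat_store)
```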
After the calculation of the last Fourier transform, the beatStore array is processed using the beat tracking algorithm described in FIG. 5. The minimum value in the beatStore array is found, and each beatStore value is adjusted such that beatStore[i] = beatStore[i] - minimum value. Then, the maximum value in the beatStore array is found, and a constant, beatmax, is declared which is 80% of the maximum value in the beatStore array. For each value in the beatStore array which is greater than the beatmax constant, if all the beatStore values within +/- 4 array slots are less than the current value, and it has been more than 14 slots since the last detected beat, a beat is detected and the BPM feature is incremented.
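The following sketch implements that peak-picking loop; the handling of the array edges and of the very first beat is an assumption, as the text does not describe it.

```python
# Beat tracking sketch: count peaks in the beatStore array that exceed 80% of
# the maximum, are local maxima within +/- 4 slots, and are at least 15 slots
# since the previous detected beat.
import numpy as np

def count_beats(beat_store: np.ndarray) -> int:
    if beat_store.size == 0:
        return 0
    values = beat_store - beat_store.min()
    beat_max = 0.8 * values.max()
    beats = 0
    last_beat = -15  # allow a beat to be detected at the start of the array
    for i, v in enumerate(values):
        if v <= beat_max:
            continue
        lo, hi = max(0, i - 4), min(len(values), i + 5)
        neighbors = np.concatenate((values[lo:i], values[i + 1:hi]))
        is_peak = bool(np.all(neighbors < v)) if len(neighbors) else True
        if is_peak and (i - last_beat) > 14:
            beats += 1          # increment the BPM feature
            last_beat = i
    return beats
```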
Upon completing the spectral domain calculations, the frame finalization process described in FIG. 6 is used to clean up the final frame feature values. First, the spectral power band means are converted to spectral residual bands by finding the minimum spectral band mean, and subtracting it from each spectral band mean. Next, the sum of the spectral residuals is stored as the spectral residual sum feature. Finally, depending on the aggregation type, the final frame vector, consisting of the spectral residuals, the spectral deltas, the sorted wavelet residuals, the beats feature, the mean/RMS ratio, the zero crossing rate, and the mean energy delta feature, is stored. In the concatenation model, the frame vector is concatenated with any other frame vectors to form a final fingerprint vector. In the aggregation model, each frame vector is stored in a final fingerprint set, where each vector is kept separate.
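A sketch of the finalization and assembly step follows, assuming the per-frame features produced by the earlier sketches; the ordering of fields within the frame vector is illustrative, not the layout mandated by the specification.

```python
# Frame finalization sketch: spectral residuals, residual sum, frame vector
# assembly, and the two fingerprint assembly models.
import numpy as np

def finalize_frame(band_means, band_deltas, wavelet_bank, beats,
                   mean_rms_ratio, zero_crossing_mean, mean_energy_delta):
    # Convert spectral band means to residuals against the minimum band mean.
    residuals = band_means - band_means.min()
    residual_sum = residuals.sum()
    return np.concatenate((residuals, band_deltas, wavelet_bank,
                           [residual_sum, beats, mean_rms_ratio,
                            zero_crossing_mean, mean_energy_delta]))

def assemble_fingerprint(frame_vectors, aggregation="concatenation"):
    if aggregation == "concatenation":
        # One long vector formed by concatenating the frame vectors.
        return np.concatenate(frame_vectors)
    # Aggregation/collection model: keep each frame vector separate.
    return list(frame_vectors)
```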
In the preferred system, the fingerprint resolution component is located on a central server, although methods using a partitioning scheme based on the fingerprint database hash tables can also be used in a distributed system. Depending on the type of fingerprint to be resolved, the architecture of the server will be similar to FIG. 7 for concatenation model fingerprints, and similar to FIG. 8 for aggregation style fingerprints. Both models share several data tables, such as the feature vector -> identifier database, the feature vector hash index, and the feature class -> comparison weights and match distance tuple table. Within the concatenation system, the identifiers in the feature vector -> identifier database are unique GUIDs, which allows the return of a unique identifier for an identified fingerprint. The aggregation match server has several additional tables. The cluster ID occurrence rate table shows the overall occurrence rate of any given feature vector, for the probability functions within the match algorithm. The feature vector cluster table is a mapping from any feature vector to the cluster ID which identifies all the nearest neighbor feature vectors for a given feature vector. In the aggregation system, a unique integer or similar value is used in place of the GUID, since the fingerprint string database contains the GUID for aggregation fingerprints. The fingerprint string database consists of the identifier streams associated with a given fingerprint, and the cluster IDs for each component within the identifier stream. Finally, the cluster ID -> string location table consists of a mapping between every cluster ID and all the string fingerprints that contain a given cluster ID.
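To make the table layout concrete, here is an illustrative in-memory sketch of those data tables; all field names are assumptions, and a deployed system would back them with the persistent database and hash index infrastructure described above.

```python
# Match server table sketch, using in-memory dictionaries in place of real
# database tables.
from dataclasses import dataclass, field

@dataclass
class MatchServerTables:
    # Shared by both server models.
    vector_to_id: dict = field(default_factory=dict)      # feature vector key -> GUID or subsig ID
    vector_hash_index: dict = field(default_factory=dict) # hash bucket -> candidate vector keys
    class_params: dict = field(default_factory=dict)      # feature class -> (weights, match distance)
    # Aggregation match server only.
    cluster_occurrence: dict = field(default_factory=dict)   # cluster ID -> occurrence rate
    vector_to_cluster: dict = field(default_factory=dict)    # feature vector key -> cluster ID
    string_fingerprints: dict = field(default_factory=dict)  # GUID -> [(subsig ID, cluster ID), ...]
    cluster_to_strings: dict = field(default_factory=dict)   # cluster ID -> GUIDs of strings containing it
```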
To resolve an incoming concatenation fingerprint, the match algorithm described in FIG. 9 is used. First, a check is performed to see if more than one feature class exists, and if so, the incoming feature vector is compared against each reference class vector, using the comparison function in FIG. 10 and a default weight set. The feature class with the shortest distance to the incoming feature vector is used to load an associated comparison function weight scheme and match distance. Next, using the feature vector database hash index, which subdivides the reference feature vector database based on the highest weighted features in the vector, the nearest neighbor feature vector set of the incoming feature vector is loaded. Next, each loaded feature vector in the nearest neighbor set is compared, using the loaded comparison weight scheme. If any of the reference vectors have a distance less than the loaded match threshold, the linked GUID for that reference vector is returned as the match for the incoming feature vector. If none of the nearest neighbor vectors are within the match threshold, a new GUID is generated, and the incoming feature vector is added to the reference database, allowing the system to organically add to the reference database as signals are encountered. Additionally, the step of re-averaging the feature values of the matched feature vector can be taken, which consists of multiplying each feature vector field by the number of times it has been matched, adding the values of the incoming feature vector, dividing by the now incremented match count, and storing the resulting means in the reference database entry. This helps to reduce fencepost error, and moves a reference feature vector to the center of the spread for different quality observations of a signal, in the event the initial observations were of an overly high or low quality.
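The sketch below traces that flow end to end, reusing the manhattan_distance() helper from the earlier sketch; the neighbor-loading callback, the shape of the class table, and the use of Python's uuid module for GUID generation are illustrative assumptions rather than details of the patented system.

```python
# Concatenation match sketch: feature class selection, nearest neighbor
# comparison, organic indexing of unknown signals, and optional re-averaging.
import uuid

def resolve_concatenation(fp, class_table, load_neighbors, vector_to_guid):
    """class_table: feature class -> (class reference vector, weights, match distance).
    load_neighbors: callback returning candidate reference vectors for fp.
    vector_to_guid: mapping from reference vector (as a tuple) to its GUID.
    manhattan_distance() is the weighted L1 helper defined in the earlier sketch."""
    # Choose the closest feature class using a default (unweighted) comparison,
    # then load that class's comparison weights and match distance.
    best_class = min(class_table,
                     key=lambda c: manhattan_distance(fp, class_table[c][0]))
    _, weights, threshold = class_table[best_class]

    # Compare against the nearest neighbor set loaded from the hash index.
    for ref in load_neighbors(fp, weights):
        if manhattan_distance(fp, ref, weights) < threshold:
            return vector_to_guid[tuple(ref)]        # existing signal matched

    # No reference within the threshold: index the new signal under a new GUID.
    new_guid = str(uuid.uuid4())
    vector_to_guid[tuple(fp)] = new_guid
    return new_guid

def reaverage(reference, incoming, match_count):
    """Fold a newly matched observation into the stored reference vector."""
    return [(r * match_count + i) / (match_count + 1)
            for r, i in zip(reference, incoming)]
```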
Resolution of an aggregation fingerprint is essentially a two level process. First, the individual feature vectors within the aggregation fingerprint are resolved, using essentially the same process as the concatenation fingerprint, with the modification that instead of returning a GUID, the individual signatures return a subsig ID and a cluster ID, which indicates the nearest neighbor set that a given subsig belongs to. After all the aggregated feature vectors within the fingerprint are resolved, a string fingerprint, consisting of an array of subsig ID and cluster ID tuples, is formed. This format allows for the recognition of signal patterns within a larger signal stream, as well as the detection of a signal that has been reversed. Matching is performed by subdividing the incoming string fingerprint into smaller chunks, such as the subsigs which correspond to 10 seconds of a signal, looking up which cluster ID within that window has the lowest occurrence rate in the overall feature database, loading the reference string fingerprints which share that cluster ID, and doing a run length match between those loaded string fingerprints and the incoming fingerprint. Additionally, the number of matches and mismatches between the reference string fingerprint and the incoming fingerprint are stored. This is used instead of summed distances, because several consecutive mismatches should trigger a mismatch, since that indicates a strong difference in the signals between two fingerprints. Finally, if the match vs. mismatch rate crosses a predefined threshold, a match is recognized, and the GUID associated with the matched string fingerprint is returned.
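Below is a sketch of that second-level string match, building on the MatchServerTables sketch above; the window size, consecutive-mismatch limit, and match ratio are illustrative parameters, and the run length comparison is deliberately simplified relative to the routine described in the specification.

```python
# Aggregation (string fingerprint) match sketch: rarest-cluster candidate
# lookup followed by a run length comparison of cluster IDs.

def match_string_fingerprint(incoming, tables, window=40,
                             max_consecutive_miss=5, match_ratio=0.7):
    """incoming: list of (subsig_id, cluster_id) tuples.
    tables: a MatchServerTables-like object from the earlier sketch."""
    for start in range(0, len(incoming), window):
        chunk = incoming[start:start + window]
        if not chunk:
            break
        # The rarest cluster ID in this window narrows the candidate set.
        rarest = min((c for _, c in chunk),
                     key=lambda c: tables.cluster_occurrence.get(c, 0.0))
        for guid in tables.cluster_to_strings.get(rarest, ()):
            reference = tables.string_fingerprints[guid]
            if run_length_match(chunk, reference,
                                max_consecutive_miss, match_ratio):
                return guid
    return None

def run_length_match(chunk, reference, max_consecutive_miss, match_ratio):
    ref_clusters = [c for _, c in reference]
    chunk_clusters = [c for _, c in chunk]
    # Slide the chunk along the reference and compare cluster IDs positionally.
    for offset in range(len(ref_clusters) - len(chunk_clusters) + 1):
        hits, misses, consecutive = 0, 0, 0
        for i, c in enumerate(chunk_clusters):
            if ref_clusters[offset + i] == c:
                hits += 1
                consecutive = 0
            else:
                misses += 1
                consecutive += 1
                if consecutive > max_consecutive_miss:
                    break   # several consecutive mismatches abort this offset
        else:
            if hits / (hits + misses) >= match_ratio:
                return True
    return False
```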
Additional variants on this match routine include searching forwards and backwards for matches, so as to detect reversed signals, and accepting a continuous stream of aggregation feature vectors, storing a trailing window, such as 30 seconds of signal, and only returning a GUID when a match is finally detected, advancing the search window as more fingerprint subsigs are submitted to the server. This last variant is particularly useful for a streaming situation, where the start and stop points of the signal to be identified are unknown.