Case 1:14-cv-02396-PGG-MHD Document 153-15 Filed 06/28/19 Page 1 of 10
`
`EXHIBIT N
`
United States Patent (19)                                US005874686A
Ghias et al.

(11) Patent Number: 5,874,686
(45) Date of Patent: Feb. 23, 1999

(54) APPARATUS AND METHOD FOR SEARCHING A MELODY

(76) Inventors: Asif U. Ghias, 2306 Sultana Dr., Yorktown Heights, N.Y. 10598; Jonathan A. Logan, 1186 Palomino Rd., Santa Barbara, Calif. 93105; David W. Chamberlin, 206 Flynn Ave., Mountain View, Calif. 94043

(21) Appl. No.: 742,260

(22) Filed: Oct. 31, 1996

Related U.S. Application Data

(60) Provisional application No. 60/008,177, Oct. 31, 1995.

(51) Int. Cl.: A63H 5/00; G04B 13/00; G10H 7/00
(52) U.S. Cl.: 84/609; 84/616
(58) Field of Search: 84/609, 616, 654

(56) References Cited

U.S. PATENT DOCUMENTS

4,688,464   8/1987   Gibson et al.
4,771,671   9/1988   Hoff, Jr. .................... 84/645
5,038,658   8/1991   Tsuruta et al.
5,088,380   2/1992   Minamitaka .................. 84/637
5,402,339   3/1995   Nakashima et al.
5,510,572   4/1996   Hayashi et al. .............. 84/609
5,619,004   4/1997   Dame

FOREIGN PATENT DOCUMENTS

405061917A   3/1993   Japan

OTHER PUBLICATIONS

Hawley, Michael J., Structure out of Sound, 1993, MIT Doctoral Thesis, Abstract.

Primary Examiner: William M. Shoop, Jr.
Assistant Examiner: Jeffrey W. Donels
Attorney, Agent, or Firm: Hodgson, Russ, Andrews, Woods & Goodyear LLP

(57) ABSTRACT

Apparatus and method for easily, efficiently, and accurately searching a melody, wherein a melody is inputted to a computer and converted into a form of a sequence of digitized representations of relative pitch differences between successive notes thereof. A melody database is searched for at least one sequence of digitized representations of relative pitch differences between successive notes which at least approximately matches the sequence of digitized representations of relative pitch differences between successive notes of the melody.

16 Claims, 3 Drawing Sheets
[Front-page illustration (FIG. 1): a hummed query from microphone 20 is fed to a computer containing a pitch tracker (low-pass filter, center-clipping, cubic-spline wavelet tracker, and autocorrelation-based pitch detector) and a query engine coupled to a melody database, producing a ranked list of matching melodies.]
`
U.S. Patent, Feb. 23, 1999, Sheet 1 of 3, 5,874,686

[FIG. 1: block diagram of the melody-searching apparatus: the hummed query 21 from a microphone enters the computer, where a pitch tracker (low-pass filter, center-clipping, cubic-spline wavelet tracker, and autocorrelation-based pitch detector) feeds the query engine, which searches the melody database and outputs a ranked list of matching melodies 26.]
`
U.S. Patent, Feb. 23, 1999, Sheet 2 of 3, 5,874,686

[FIG. 2: a single excitation pulse 30, amplitude versus time. FIG. 3: a train of excitation pulses 30, amplitude versus time.]
`
`
U.S. Patent, Feb. 23, 1999, Sheet 3 of 3, 5,874,686

[FIG. 4: the formant structure 40, with peaks 42, amplitude versus time. FIG. 5: the synthesized pitch, amplitude versus time.]
`
APPARATUS AND METHOD FOR SEARCHING A MELODY
`
CROSS REFERENCE TO A RELATED
`APPLICATION
`This application claims priority of provisional application
`Ser. No. 60/008,177, filed Oct. 31, 1995, which is hereby
`incorporated herein by reference.
`
`15
`
BACKGROUND OF THE INVENTION

The present invention relates generally to database searching. More particularly, the present invention relates to melody searching.

Next generation databases should include image, audio, and video data in addition to traditional text and numerical data. These data types will require query methods that are more appropriate and natural to the type of respective data. For instance, a natural way to query an image database is to retrieve images based on operations on images or sketches supplied as input. Similarly, a natural way of querying an audio database (of songs) is to hum the tune of a song, as apparently addressed in T. Kageyama and Y. Takashima, "A Melody Retrieval Method With Hummed Melody" (language: Japanese), Transactions of the Institute of Electronics, Information and Communication Engineers D-II, J77D-II(8): 1543-1551, August 1994. Such a system would be useful in any multimedia database containing musical data by providing an alternative and natural way of querying. One can also imagine widespread use of such a system in the commercial music industry, music radio and TV stations, music stores, and even for one's personal use.

It has been observed that melodic contour, defined as the sequence of relative differences in pitch between successive notes, can be used to discriminate between melodies. See Stephen Handel, Listening: An Introduction to the Perception of Auditory Events, The MIT Press, 1989, which indicates that melodic contour is one of the most important methods that listeners use to determine similarities between melodies. In Michael Jerome Hawley, Structure out of Sound, PhD thesis, MIT, September 1993, a method of querying a collection of melodic themes by searching for exact matches of sequences of relative pitches input by a MIDI keyboard is briefly discussed.

U.S. Pat. No. 5,510,572 discloses utilizing pitch differences between successive notes in classifying motion for a melody analyzer and harmonizer, wherein a search may be incidentally used to find an appropriate chord progression to, for example, harmonize music so as to accompany a singer. Other art which may be of interest includes U.S. Pat. Nos. 5,040,081; 5,146,833; 5,140,886; 4,688,464; and 5,418,322.
`
`35
`
`40
`
`45
`
`50
`
SUMMARY OF THE INVENTION

It is an object of the present invention to easily, efficiently, and accurately search melodies.

In order to easily, efficiently, and accurately search melodies, in accordance with the present invention, a computer means is provided which has a database of melodies each including a plurality of notes in a form of a sequence of digitized representations of relative pitch differences between successive notes, a melody is inputted to the computer means and converted into a form of a sequence of digitized representations of relative pitch differences between successive notes thereof, and the melody database is searched for at least one sequence of digitized representations of relative pitch differences between successive notes which at least approximately matches the sequence of digitized representations of relative pitch differences between successive notes of the melody.

The above and other objects, features, and advantages of the present invention will be apparent in the following detailed description of a preferred embodiment thereof when read in conjunction with the accompanying drawings wherein the same reference numerals denote the same or similar parts throughout the several views.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an apparatus and method which embodies the present invention.

FIGS. 2 and 3 are graphs illustrating an excitation signal and a train of such excitation signals respectively used to create a synthesized pitch for the method and apparatus.

FIG. 4 is a graph of a formant structure which is formed from certain formant frequencies.

FIG. 5 is a graph of a synthesized pitch created by convolving the train of excitation pulses of FIG. 3 and the formant structure of FIG. 4.
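The construction illustrated in FIGS. 2 through 5 can be sketched numerically. The pulse period (0.01 sec., i.e., 100 Hz) and the formant frequencies (500, 1500, and 2500 Hz) are taken from the detailed description hereinafter; the sample rate and the exponential-envelope decay are illustrative assumptions, not values from the patent.

```python
import numpy as np

rate = 8000                          # assumed sample rate (Hz)
t = np.arange(0, 0.05, 1.0 / rate)   # 50 ms impulse-response window

# Formant structure (cf. FIG. 4): formants F1=500, F2=1500, F3=2500 Hz
# combined under an assumed exponential envelope.
formant = np.exp(-60.0 * t) * sum(
    np.cos(2 * np.pi * f * t) for f in (500.0, 1500.0, 2500.0)
)

# Train of excitation pulses with period T = 0.01 sec. (cf. FIG. 3).
train = np.zeros(rate // 2)          # 0.5 sec. of signal
train[::int(0.01 * rate)] = 1.0      # one pulse every 80 samples

# Synthesized pitch (cf. FIG. 5): convolution of the pulse train
# with the formant structure.
synth = np.convolve(train, formant)
```

The resulting waveform repeats every 80 samples (0.01 sec.), which is what one hears as a 100 Hz pitch.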
`
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring to FIG. 1, there is illustrated generally at 10 apparatus for searching a melody, which is represented by a series of successive notes, illustrated at 12, which of course have different pitches. In other words, it may be desirable to know the identity of the particular melody or tune 12, and a database, illustrated at 14, of melodies or tunes is searched, as described hereinafter, to locate at least one melody or tune which at least approximately matches the tune 12. The database 14 is shown to be contained within a general purpose computer 16, but, alternatively, the database 14 may be located apart from the computer 16 and suitably connected thereto for communicating between the computer and database. Both of these alternatives are meant to come within the scope of the present invention.

In accordance with the present invention, the tune 12 is hummed by a person 18 into a microphone 20, and the hummed query, illustrated at 21, is suitably digitized, in accordance with principles commonly known to those of ordinary skill in the art to which this invention pertains, and the digitized signals of the hummed query 21 are fed to a pitch tracking module 22 in computer 16. The pitch tracker assembles a contour representation of the hummed melody 12, as hereinafter discussed in greater detail, which is fed to a query engine, illustrated at 24. The query engine 24 searches the melody database 14 and outputs a ranked list of approximately matching melodies, as illustrated at 26. A preselected error tolerance may be applied to the search. The query engine 24 may of course alternatively be programmed to output the single best approximate match or, if desired, to output an exact matching melody. However, by searching for an approximate matching melody, as hereinafter discussed, various forms of anticipated errors may be taken into account.
The database 14 of melodies may be acquired, for example, by processing public domain MIDI songs, or may otherwise be suitably acquired, and may be stored as a flat-file database. Pitch tracking may be performed in, for example, Matlab software, a product of The MathWorks, Inc. of Natick, Mass., chosen for its built-in audio processing capabilities and the ease of testing a number of algorithms within it. Hummed queries may be recorded in a variety of
formats, depending upon the platform-specific audio input capabilities of the Matlab software. For example, the format may be a 16-bit, 44 kHz WAV format on a Pentium system, or an 8-bit, 8 kHz AU format on a Sun Sparcstation. The query engine 24 may use an approximate pattern matching algorithm, described hereinafter, in order to tolerate humming errors.

User input 12 (humming) to the system 10 is converted into a sequence of relative pitch transitions, as follows. A note in the input may be classified in one of three ways: a note is either the same as the previous note (S), higher than the previous note (U), or lower than the previous note (D). Thus, the input is converted into a string with a three-letter alphabet {U, D, S}. For example, the introductory theme of Beethoven's 5th Symphony would be converted into the sequence: - S S D U S S D (the first note is ignored as it has no previous pitch). For another example, the melody portion 12, as illustrated in FIG. 1 with one high note between two lower notes, would be converted into the sequence: U D.

To accomplish this conversion, a sequence of pitches in the melody must be isolated and tracked. This is not as straightforward as it sounds, however, as there is still considerable controversy over exactly what pitch is. The general concept of pitch is clear: given a note, the pitch is the frequency that most closely matches what one hears. Performing this conversion in a computer can become troublesome because some intricacies of human hearing are still not clearly understood. For instance, if one plays the 4th, 5th, and 6th harmonics of some fundamental frequency, one actually hears the fundamental frequency, not the harmonics, even though the fundamental frequency is not present. This phenomenon was first discovered by Schouten in some pioneer investigations carried out from 1938 to 1940. Schouten studied the pitch of periodic sound waves produced by an optical siren in which the fundamental of 200 Hz was cancelled completely. The pitch of the complex tone, however, was the same as that prior to the elimination of the fundamental. See R. Plomp, Aspects of Tone Sensation, Academic Press, London, 1976.

Because of the interest in tracking pitch in humming, methods were examined for automatically tracking pitch in a human voice. Before one can estimate the pitch of an acoustic signal, one must first understand how this signal is created, which requires forming a model of sound production at the source. The vibrations of the vocal cords in voiced sounds are caused as a consequence of forces that are exerted on the laryngeal walls when air flows through the glottis (the gap between the vocal cords). The terms "vocal folds" and "vocal cords" are more or less used as synonyms in the literature. Wolfgang Hess, Pitch Determination of Speech Signals, Springer-Verlag, Berlin Heidelberg, 1983, describes a model of the vocal cords as proposed by M. Hirano, "Structure and Vibratory Behavior of the Vocal Folds," in M. Sawashima and F. S. Cooper, editors, Dynamic Aspects of Speech Production, pages 13-27, University of Tokyo Press, 1976. For the purposes of the present application, though, it is sufficient to know that the glottis repeatedly opens and closes, thus providing bursts of air through the vocal tract.

The vocal tract can be modeled as a linear passive transmission system with a transfer function H(z). If one adds an additional transfer function R(z) which takes into account the radiation, the output impedance of the vocal tract can approximately be set to zero. In the neutral position where the vocal tract can be regarded as a uniform tube, the resonances of the vocal tract occur at sound wavelengths of

    lambda_k = 4L / (2k - 1);  k = 1, 2, 3, . . .

With L = 17 cm (average value of vocal-tract length) and a sound propagation speed of c = 340 meters/sec., the frequencies of these resonances will be:

    F_k = (2k - 1) c / (4L) = (2k - 1) * 500 Hz;  k = 1, 2, 3, . . .

The frequencies F_k are called formant frequencies.

The resulting sound that is heard is considered to be the convolution of the excitation pulse, illustrated at 30 in FIG. 2, created by the glottis, and the formant frequencies. Therefore, if one wants to model a speech signal, one starts with a train of excitation pulses 30, as shown in FIG. 3. The period T in the train of excitations 30 is 0.01 sec., making the pitch 100 Hz. For the formant frequencies, the above equation for F_k is used with k in {1, 2, 3}. This gives formant frequencies: F_1 = 500 Hz, F_2 = 1500 Hz, and F_3 = 2500 Hz. Combining these frequencies and adding an exponential envelope produces the formant structure shown at 40 in FIG. 4. By convolving the train of excitation pulses 30 with the formant structure 40, a synthesized pitch, as shown at 50 in FIG. 5, is obtained.

Since what one hears as pitch is considered to actually be the frequency at which the bursts of air occur, one may be able to find the pitch of the segment by tracking the bursts of air.

There are at least three approaches to tracking pitch.

The Maximum Likelihood approach, which is described by James D. Wise, James R. Caprio, and Thomas W. Parks in "Maximum Likelihood Pitch Estimation," IEEE Trans. Acoust., Speech, Signal Processing, 24(5): 418-423, October 1976, is a modification of autocorrelation (hereinafter discussed) that increases the accuracy of the pitch and decreases the chances of aliasing. However, this approach is very slow due to its computational complexity. An implementation in Matlab software may typically take approximately one hour or longer to evaluate 10 seconds of audio on a 90 MHz Pentium workstation. With some optimizations, the performance may be improved to approximately 15 minutes per 10 seconds of audio, but this is still far too slow for the purposes of this invention.

The Cepstrum Analysis approach, which is discussed in A. V. Oppenheim, "A Speech Analysis-Synthesis System Based on Homomorphic Filtering," J. Acoustical Society of America, 45: 458-465, February 1969, and in Alan V. Oppenheim and Ronald W. Schafer, Discrete-Time Signal Processing, Prentice Hall, Englewood Cliffs, N.J., 1989, does not give very accurate results for humming.

In order to provide both rapid and accurate pitch tracking, a preferred pitch tracking approach is autocorrelation, which is described in L. R. Rabiner, J. J. Dubnowski, and R. W. Schafer, "Real-Time Digital Hardware Pitch Detector," IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-24(1): 2-8, February 1976. This approach isolates and tracks the peak energy levels of the signal, which is a measure of the pitch. Referring back to FIG. 4, the signal s(n), illustrated at 40, peaks where the impulses occur. Therefore, tracking the frequency of these peaks 42 should provide the pitch of the signal. In order to get the frequency of these peaks 42, autocorrelation is employed, which is defined by:

    A(l) = sum over n of h(n) h(n + l)
`
where l is the lag and h is the input signal.

While autocorrelation is subject to aliasing (picking an integer multiple of the actual pitch) and is computationally complex, it still provides greater speed and accuracy as compared to the previously discussed approaches. The present implementation of autocorrelation was found to require approximately 45 seconds for 10 seconds of 44 kHz, 16-bit audio on a 90 MHz Pentium workstation.

In order to improve the performance, speed, and robustness of the pitch-tracking algorithm, a cubic-spline wavelet transform or other suitable wavelet transform may be used. The cubic spline wavelet peaks at discontinuities in the signal (i.e., the air impulses). One of the most significant features of the wavelet analysis is that it can be implemented in a filter bank structure and therefore computed in O(n) time.

It is preferred that the cubic spline wavelet implementation be an event-based implementation of the tracker.

The cubic spline wavelet transform is linear and shift-invariant, which are useful properties for speech signals, since they are often modelled as a linear combination of shifted and damped sinusoids.

If a signal x(t) or its derivatives have discontinuities, then the modulus of the cubic spline transform of x(t) exhibits local maxima around the points of discontinuity. This is considered to be a desirable property for a cubic spline wavelet pitch-tracker, since glottal closure causes sharp changes in the derivative of the air flow in the glottis and transients in the speech signal.

It has been shown in S. Mallat and W. L. Hwang, "Singularity Detection and Processing with Wavelets," Technical Report, Courant Institute of Mathematical Sciences, New York, March 1991, that if one chooses a wavelet function g(t) that is the first derivative of a smoothing function (a smoothing function is a function whose Fourier transform has energy concentrated in the low-frequency region) P(t), then the local maxima of the transform indicate the sharp variations in the signal whereas the local minima indicate the slow variations. Hence, the local maxima of the transform using a wavelet which is the first derivative of a smoothing function are considered to be desirable for detecting the abrupt changes or transients in the speech signal caused by glottal closure.

One of the most appealing features of the cubic spline wavelet transform is that it can be implemented in a filter bank structure. It therefore is O(n) fast.

Preferably, the autocorrelation pitch tracking approach includes low-pass filtering (for removing non-vocal frequencies) and center-clipping, as described in M. M. Sondhi, "New Methods of Pitch Extraction," IEEE Trans. Audio Electroacoust. (Special Issue on Speech Communication and Processing, Part II), AU-16: 262-266, June 1968. It is considered that these will help eliminate the formant structure that generally causes difficulty for autocorrelation-based pitch detectors.

This preferred form of autocorrelation takes between 20 and 45 seconds on a Sparc10 workstation to process typical sequences of hummed notes. A brute-force search of the database unsurprisingly shows linear growth with the size of the database, but remains below 4 seconds for 100 songs on a Sparc2. Therefore, the search time is currently effectively limited by the efficiency of the pitch-tracker.

In order to search the database, songs in the database 14 are preprocessed to convert the melody into a stream of the previously discussed U, D, S characters, and the converted user input (the key 23) is compared with all the songs.

The pattern-matching desirably uses a "fuzzy" search to allow for errors within the matches. These errors reflect the inaccuracies in the way people hum as well as errors in the representation of the songs themselves.

For performing the key-search within the database 14, it is considered desirable to use an efficient approximate pattern matching algorithm. By "approximate" is meant that the algorithm should be able to take into account various forms of errors.

Using the word CASABLANCA as an example, the following are examples of the types of errors which may occur in a typical pattern matching scheme.

    Error type            In CASABLANCA    In a melody
    Transposition error   CASABALNCA       two notes reversed
    Dropout error         CAS BLANCA       a note dropped
    Duplication error     CASAABLANCA      a note duplicated
    Insertion error       CASABTLANCA      a note inserted

Several algorithms have been developed that address the problem of approximate string matching. Running times have ranged from O(mn) for the brute force algorithm to O(kn) or O(n log(m)), where "O" means "on the order of," m is the number of pitch differences in the query, and n is the size of the string (song). See Ricardo Baeza-Yates and G. H. Gonnet, "Fast String Matching with Mismatches," Information and Computation, 1992. A preferred algorithm which is considered to offer better performance in general for this purpose is that described in Ricardo A. Baeza-Yates and Chris H. Perleberg, "Fast and Practical Approximate String Matching," Combinatorial Pattern Matching, Third Annual Symposium, pages 185-192, 1992.

This algorithm addresses the problem of string matching with k mismatches. The problem consists of finding all instances of a pattern string P = p1 p2 p3 . . . pm in a text string T = t1 t2 t3 . . . tn such that there are at most k mismatches (characters that are not the same) for each instance of P in T. When k = 0 (no mismatches) one has the simple string matching problem, solvable in O(n) time. When k = m, every substring of T of length m qualifies as a match, since every character of P can be mismatched. Each of the errors in the above CASABLANCA example corresponds to k = 1.

The worst case for this algorithm occurs when P (the key 23) consists of m occurrences of a single distinct character, and T (the contour representation of a song) consists of n instances of that character. In this case the running time is O(mn). However, this is neither a common nor useful situation for the present invention. In the average case of an alphabet in which each character is equally likely to occur, the running time is [equation not reproduced in the source], where r is the size of the alphabet.

The computer 16 may desirably be programmed so that, for a given query, the database 14 returns a list of songs ranked by how well they matched the query, not just one best match. The number of matches that the database 14 should retrieve depends upon the error-tolerance used during the key-search. This error-tolerance could be set in one of two possible ways: either it can be a user-definable parameter, or the database can itself determine this parameter based on, for
`
`
`
example, heuristics that depend on the length of the key. This would give the user an opportunity to perform queries even if the user is not sure of some notes within the tune.

From the results of the query the user can identify the song of interest. If the list is too large, the user can perform a new query on a restricted search list consisting of the songs just retrieved. This allows the user to identify sets of songs that contain similar melodies.
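The k-mismatch key-search described earlier can be illustrated with a brute-force sliding scan. This is the O(mn) baseline mentioned in the text, not the faster Baeza-Yates/Perleberg algorithm, and the function name is an assumption.

```python
def matches_with_k_mismatches(pattern, text, k):
    """Return the start positions where pattern occurs in text
    with at most k character mismatches (brute force, O(mn))."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        # Count positions where the window disagrees with the pattern.
        mismatches = sum(1 for a, b in zip(pattern, text[i:i + m]) if a != b)
        if mismatches <= k:
            hits.append(i)
    return hits

# One exact hit and one hit with a single insertion-style mismatch.
print(matches_with_k_mismatches("CASABLANCA", "XXCASABLANCAXXCASATLANCA", 1))  # [2, 14]
```

Each of the CASABLANCA-style errors tabulated earlier is found with k = 1.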
An experimental evaluation of the system tested the tolerance of the system with respect to input errors, whether from mistakes in the user's humming or from problems with the pitch-tracking.

Effectiveness of the system 10 is directly related to the accuracy with which pitches that are hummed can be tracked and the accuracy of the melodic information within the database 14. Under ideal circumstances, one can achieve close to 100% accuracy tracking humming, where "ideal circumstances" means that the user places a small amount of space between each note and hits each note strongly. For this purpose, humming short notes is encouraged. Even more ideal is for the user to aspirate the notes as much as possible, perhaps going so far as to voice a vowel, as in "haaa haaa haaa." Only male voices have been experimented with.

The evaluation database contained a total of 183 songs. Each song was converted from public domain General MIDI sources. Melodies from different musical genres were included, including both classical and popular music. A few simple heuristics were used to cut down on the amount of irrelevant information from the data, e.g., MIDI channel 10 was ignored as this is reserved for percussion in the General MIDI standard. However, the database still contained a great deal of information unrelated to the main theme of the melody. Even with this limitation, it was discovered that sequences of 10 to 12 pitch transitions were sufficient to discriminate 90% of the songs.
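The three-letter conversion underlying these keys can be sketched as follows (an illustrative function; the name and the MIDI-note-number input are assumptions, not the patent's code):

```python
def contour(pitches):
    """Convert a sequence of pitch values (e.g., MIDI note numbers)
    into the U/D/S contour string; the first note is ignored,
    as it has no previous pitch."""
    out = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            out.append("U")
        elif cur < prev:
            out.append("D")
        else:
            out.append("S")
    return "".join(out)

# Introductory theme of Beethoven's 5th: G G G E-flat, F F F D.
print(contour([67, 67, 67, 63, 65, 65, 65, 62]))  # SSDUSSD
```

A 12-transition key over a three-letter alphabet has 3^12 = 531,441 possible values, far more than the 183 songs in the evaluation database, which is consistent with short keys sufficing to discriminate most songs.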
As a consequence of using a fast approximate string matching algorithm, search keys can be matched with any portion of the melody, rather than just the beginning. As the size of the database grows larger, however, this may not prove to be an advantage.

Contour representations for each song are stored in separate files. Opening and closing files may become a significant overhead. Performance may therefore be improved by packing all the songs into one file, or by using a database manager. The programming code may be modernized to make it independent of any particular database schema.

The pattern matching algorithm in the form previously described does not discriminate between the various forms of pattern matching errors discussed earlier, but only accounts for them collectively. Some forms of errors may be more common than others depending upon the way people casually hum different tunes. For example, drop-out errors reflected as dropped notes in tunes are more common than transposition or duplication errors. Tuning the key-search so that it is more tolerant to drop-out errors, for example, may yield better results.

The generation of the melodic contours of the source songs automatically from MIDI data is convenient but not optimal. More accuracy and less redundant information could be obtained by entering the melodic themes for particular songs by keyboard.

The resolution of the relative pitch differences may desirably be increased by using query alphabets of three, five, and more possible relationships between adjacent pitches. Early experiments using an alphabet of five relative pitch differences (same, higher, much higher, lower, much lower) indicated changes of this sort are promising. One drawback
`
of introducing more resolution is that the users must be somewhat more accurate in the intervals they actually hum.
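A five-letter contour of the sort described above might be quantized as follows. This is a sketch only: the two-semitone threshold separating "higher" from "much higher" is an assumption, as the text does not specify one.

```python
def contour5(pitches, big=2):
    """Five-letter contour: S (same), u (up), U (up by more than
    `big` semitones), d (down), D (down by more than `big` semitones)."""
    out = []
    for prev, cur in zip(pitches, pitches[1:]):
        diff = cur - prev
        if diff == 0:
            out.append("S")
        elif diff > big:
            out.append("U")
        elif diff > 0:
            out.append("u")
        elif diff < -big:
            out.append("D")
        else:
            out.append("d")
    return "".join(out)

print(contour5([60, 60, 64, 65, 62, 61]))  # SUuDd
```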
It should be understood that, while the present invention has been described in detail herein, the invention can be embodied otherwise without departing from the principles thereof, and such other embodiments are meant to come within the scope of the present invention as defined by the appended claims.
What is claimed is:

1. Apparatus for searching a melody comprising a computer means having a database of melodies each including a plurality of notes in a form of a sequence of digitized representations of relative pitch differences between successive notes, means for inputting a melody to said computer means, means for converting the melody into a form of a sequence of digitized representations of relative pitch differences between successive notes thereof, and means for searching said melody database for at least one sequence of digitized representations of relative pitch differences between successive notes which at least approximately matches said sequence of digitized representations of relative pitch differences between successive notes of the melody.

2. Apparatus according to claim 1 wherein said input means comprises a microphone.

3. Apparatus according to claim 1 further comprising means for outputting from said computer a ranked list of approximately matching melodies.

4. Apparatus according to claim 1 wherein said converting means comprises an autocorrelation based pitch detector means.

5. Apparatus according to claim 1 wherein said converting means comprises an autocorrelation based pitch detector means having low-pass filter means for removing non-vocal frequencies and center-clipping means for eliminating unwanted formant structure.

6. Apparatus according to claim 1 wherein said converting means comprises a wavelet pitch tracker.

7. Apparatus according to claim 1 wherein said converting means comprises a cubic spline wavelet pitch tracker.

8. Apparatus according to claim 1 further comprising means for approximately matching sequences of digitized representations of relative pitch differences between successive notes.

9. A method for searching a melody comprising the steps of: (a) selecting a computer means having a database of melodies each including a plurality of notes in a form of a sequence of digitized representations of relative pitch differences between successive notes, (b) inputting a melody to the computer means, (c) converting the melody into a form of a sequence of digitized representations of relative pitch differences between successive notes thereof, and (d) searching the melody database for at least one sequence of digitized representations of relative pitch differences between successive notes which at least approximately matches the sequence of digitized representations of relative pitch differences between successive notes of the melody.

10. A method according to claim 9 wherein the step of inputting the melody comprises humming into a microphone.

11. A method according to claim 9 further comprising operating the computer to output a ranked list of approximately matching melodies.

12. A method according to claim 9 wherein the step of converting comprises applying to the sequence of representations of relative pitch changes between successive notes of the hummed melody an autocorrelation based pitch detector.
`
`
`
13. A method according to claim 9 wherein the step of converting comprises applying to the sequence of representations of relative pitch changes between successive notes of the hummed melody an autocorrelation based pitch detector having low pass filtering for removing non-vocal frequencies and center-clipping for eliminating unwanted formant structure.

14. A method according to claim 9 wherein the step of converting comprises applying to the sequence of representations