`Ghias et al.
`
`[54] APPARATUS AND METHOD FOR
`SEARCHING A MELODY
`
`[76] Inventors: égiggv?lgzisétiiog$133159?"
`Jonathan A. Logan, 1186 Palomino
`Rd., Santa Barbara, Calif. 93105; David
`W. Chamberlin, 206 Flynn Ave.,
`Mountam VleW’ Cahf' 94043
`[21] A 1 N 742 260
`pp .
`o.:
`,
`Filed:
`Oct 31’ 1996
`
`[22]
`
`Related US Application Data
`
`[60] Provisional application No. 60/008,177, Oct. 31, 1995.
`
`[51] Int. Cl.6 ........................... .. A63H 5/00; G04B 13/00;
`GlOH 7/00
`[52] U S C]
`8 4 /609_ 84/616
`I.
`.
`. ............................................... ..
`,
`[58] Fleld of Search """""""""""""" " 84/60gZ/660196’ 6635A:
`’
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`4688 464 8/1987 Gb
`t
`1
`
`,
`
`,
`
`1 son e a .
`
`.
`
`US005874686A
`[11] Patent Number:
`[45] Date of Patent:
`
`5,874,686
`Feb. 23, 1999
`
`5,402,339
`3/1995 Nakashima et al. .
`5,510,572
`4/1996 Hayashi et al. ......................... .. 84/609
`5,619,004
`4/1997 Dame .
`FOREIGN PATENT DOCUMENTS
`405061917A 3/1993 Japan .
`
`OTHER PUBLICATIONS
`HaWley, Michael 1., Structure out of Sound, 1993, MIT
`Doctoral Thesis, Abstract.
`
`Primary Examiner—William M. Shoop, Jr.
`Assistant Examiner—Jeffrey W. Donels
`Attorney, Agent, or Firm—Hodgson, Russ, AndreWs, Woods
`& Goodyear LLP
`[57]
`
`ABSTRACT
`
`_
`_
`Apparatus and method for easily, ef?ciently, and accurately
`searching melodies. A melody is inputted to a computer and
`Converted into a form of a Sequence of digitized represen_
`tations of relative pitch differences betWeen successive notes
`thereof. A melody database is searched for at least one
`sequence of digitized representations of relative pitch dif
`ferences betWeen successive notes Which at least approxi
`mately matches the sequence of digitized representations of
`relative pitch differences betWeen successive notes of the
`
`4,771,671
`5,038,658
`5,088,380
`
`9/1988 Hoff, Jr. .................................. .. 84/645
`8/1991 Tsuruta et a1. .
`2/1992 Minamitaka ............................ .. 84/637
`
`melody‘
`
`16 Claims, 3 Drawing Sheets
`
`2 0
`
`\pl/ P
`12
`
`HUMMED 21
`QUERY
`
`18
`
`V
`
`1O
`/
`
`16
`\
`
`COMPUTER
`
`|_L_C_)VV_P7A§S_' r ———— ——
`1
`CUBIC I
`
`_ _ _ _ _ __
`
`:Wél\_/|_EC|T_iET: L_F_|L_T§B_1 I SPLINE :
`: TRACKER : f6E?T_E§_| :WSliIEIIT-TT:
`_____ n L 9|__|BP_||1G_ j ‘LIE/‘£35581
`
`\AUTOCORRELATION
`22 / 2
`3\
`BASED PITCH
`DETECTOR
`
`14
`
`MELODY
`
`v
`
`V
`
`RANKED LIST
`OF MATCHING / 26
`MELODIES
`
`Google Ex. 1010
`
`
`
`U.S. Patent
`
`Feb. 23, 1999
`
`Sheet 1 of3
`
`5,874,686
`
`20
`
`P P P
`
`12
`
`HUMMED 21
`QUERY
`
`18
`
`v
`
`10
`/
`
`16
`\
`
`COMPUTER
`
`14
`
`_____ __
`
`|_L'OVV_PT\§S_' I‘ ____ _ -
`I
`CUBIC I
`
`: WAVELET : L_F_'L_TEB_J } SPLINE l
`' “SEER '
`IWAVECLET:
`'_ ____ __l I_TJEI\_IT_E_IQ_I '
`P'T H |
`LQILHZPING } 'LTBAPEEEI
`\
`
`22/
`
`23
`
`AUTOCORRELATION
`BASED PITCH
`DETECTOR
`
`DIE/‘18%|; —> QUERY ENGINE /24
`
`l
`
`RANKED LIST
`OF MATCHING / 26
`MELODIES
`
`Google Ex. 1010
`
`
`
`U.S. Patent
`
`Feb. 23, 1999
`
`Sheet 2 of3
`
`5,874,686
`
`0.9
`0.8
`0.7
`0.6-
`0.5
`0.4
`0.3
`0.2
`
`0.1 -
`
`30
`
`n
`
`i
`
`n
`
`/ 3°
`
`”
`
`0.9
`0.8
`0.7
`0.6
`30
`0.5 /30 /
`0.4
`0.3
`0.2
`0.1
`
`O I
`O
`
`I
`50
`
`I
`100
`
`I
`I
`I
`I
`150 200 250 300 350 400
`TIME
`F163
`
`Google Ex. 1010
`
`
`
`U.S. Patent
`
`Feb. 23, 1999
`
`Sheet 3 of3
`
`5,874,686
`
`42
`2- /
`1.5‘
`
`42
`/
`
`40
`
`TIME
`
`Google Ex. 1010
`
`
`
`5,874,686
`
`1
`APPARATUS AND METHOD FOR
`SEARCHING A MELODY
`
`CROSS REFERENCE TO A RELATED
`APPLICATION
`
`This application claims priority of provisional application
`Ser. No. 60/008,177, ?led Oct. 31, 1995, Which is hereby
`incorporated herein by reference.
`
`BACKGROUND OF THE INVENTION
`
`10
`
`20
`
`25
`
`The present invention relates generally to database
`searching. More particularly, the present invention relates to
`melody searching.
`Next generation databases should include image, audio,
`and video data in addition to traditional text and numerical
`data. These data types Will require query methods that are
`more appropriate and natural to the type of respective data.
`For instance, a natural Way to query an image database is to
`retrieve images based on operations on images or sketches
`supplied as input. Similarly, a natural Way of querying an
`audio database (of songs) is to hum the tune of a song, as
`apparently addressed in T. Kageyama and Y. Takashima, “A
`Melody Retrieval Method With Hummed Melody”
`(language: Japanese), Transactions of the Institute of
`Electronics, Information and Communication Engineers
`D-II, J77D-II(8): 1543—1551, August 1994. Such a system
`Would be useful in any multimedia database containing
`musical data by providing an alternative and natural Way of
`30
`querying. One can also imagine a Widespread use of such a
`system in commercial music industry, music radio and TV
`stations, music stores, and even for one’s personal use.
`It has been observed that melodic contour, de?ned as the
`sequence of relative differences in pitch betWeen successive
`notes, can be used to discriminate betWeen melodies. See
`Stephen Handel, Listening: An Introduction to the Percep
`tion of Auditory Events, The MIT Press, 1989, Which indi
`cates that melodic contour is one of the most important
`methods that listeners use to determine similarities betWeen
`melodies. In Michael Jerome HaWley, Structure out of
`Sound, PhD thesis, MIT, September 1993, a method of
`querying a collection of melodic themes by searching for
`exact matches of sequences of relative pitches input by a
`MIDI keyboard is brie?y discussed.
`US. Pat. No. 5,510,572 discloses utiliZing pitch differ
`ences betWeen successive notes in classifying motion for a
`melody analyZer and harmoniZer, Wherein a search may be
`incidentally used to ?nd an appropriate chord progression to,
`for example, harmoniZe music so as to accompany a singer.
`Other art Which may be of interest includes US. Pat. Nos.
`5,040,081; 5,146,833; 5,140,886; 4,688,464, and 5,418,322.
`
`45
`
`SUMMARY OF THE INVENTION
`
`55
`
`It is an object of the present invention to easily, ef?ciently,
`and accurately search melodies.
`In order to easily, ef?ciently, and accurately search
`melodies, in accordance With the present invention, a com
`puter means is provided Which has a database of melodies
`each including a plurality of notes in a form of a sequence
`of digitiZed representations of relative pitch differences
`betWeen successive notes, a melody is inputted to the
`computer means and converted into a form of a sequence of
`digitiZed representations of relative pitch differences
`betWeen successive notes thereof, and the melody database
`is searched for at least one sequence of digitiZed represen
`tations of relative pitch differences betWeen successive notes
`
`2
`Which at least approximately matches the sequence of digi
`tiZed representations of relative pitch differences betWeen
`successive notes of the melody.
`The above and other objects, features, and advantages of
`the present invention Will be apparent in the folloWing
`detailed description of a preferred embodiment thereof When
`read in conjunction With the accompanying draWings
`Wherein the same reference numerals denote the same or
`similar parts throughout the several vieWs.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a block diagram illustrating an apparatus and
`method Which embodies the present invention.
`FIGS. 2 and 3 are graphs illustrating an excitation signal
`and train of such excitation signals respectively used to
`create a synthesiZed pitch for the method and apparatus.
`FIG. 4 is a graph of a formant structure Which is formed
`from certain formant frequencies.
`FIG. 5 is a graph of a synthesiZed pitch created by
`convolving the train of excitation pulses of FIG. 3 and the
`formant structure of FIG. 4.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`
`Referring to FIG. 1, there is illustrated generally at 10
`apparatus for searching a melody, Which is represented by a
`series of successive notes, illustrated at 12, Which of course
`have different pitches. In other Words, it may be desirable to
`knoW the identity of the particular melody or tune 12, and a
`database, illustrated at 14, of melodies or tunes is searched,
`as described hereinafter, to locate at least one melody or tune
`Which at least approximately matches the tune 12. The
`database 14 is shoWn to be contained Within a general
`purpose computer 16, but, alternatively, the database 14 may
`be located apart from the computer 16 and suitably con
`nected thereto for communicating betWeen the computer and
`database. Both of these alternatives are meant to come
`Within the scope of the present invention.
`In accordance With the present invention, the tune 12 is
`hummed by a person 18 into a microphone 20, and the
`hummed query, illustrated at 21, is suitably digitiZed, in
`accordance With principles commonly knoWn to those of
`ordinary skill in the art to Which this invention pertains, and
`the digitiZed signals of the hummed query 21 are fed to a
`pitch tracking module 22 in computer 16. The pitch tracker
`assembles a contour representation of the hummed melody
`12, as hereinafter discussed in greater detail, Which is fed to
`a query engine, illustrated at 24. The query engine 24
`searches the melody database 14 and outputs a ranked list of
`approximately matching melodies, as illustrated at 26. A
`preselected error tolerance may be applied to the search. The
`query engine 24 may of course alternatively be programmed
`to output the single most approximate matching melody or,
`if desired, to output an exact matching melody. HoWever, by
`searching for an approximate matching melody, as herein
`after discussed, various forms of anticipated errors may be
`taken into account.
`The database 14 of melodies may be acquired, for
`example, by processing public domain MIDI songs, or may
`otherWise be suitably acquired, and may be stored as a
`?at-?le database. Pitch tracking may be performed in, for
`example, Matlab softWare, a product of the MatWorks, Inc.
`of Natick, Mass., chosen for its built-in audio processing
`capabilities and the east of testing a number of algorithms
`Within it. Hummed queries may be recorded in a variety of
`
`Google Ex. 1010
`
`
`
`3
`formats, depending upon the platform-speci?c audio input
`capabilities of the Matlab software. For example, the format
`may be a 16-bit, 44 KhZ WAV format on a Pentium system,
`or a 8-bit 8 KhZ AU format on a Sun Sparcstation. The query
`engine 24 may use an approximate pattern matching
`algorithm, described hereinafter, in order to tolerate hum
`ming errors.
`User input 12 (humming) to the system 10 is converted
`into a sequence of relative pitch transitions, as folloWs. A
`note 12 in the input may be classi?ed in one of three Ways:
`a note is either the same as the previous note (S), higher than
`
`10
`
`the previous note (U), or loWer than the previous note Thus, the input is converted into a string With a three letter
`
`alphabet (U,D,S). For example, the introductory theme of
`Beethoven’s 5th Symphony Would be converted into the
`sequence: — S S D U S S D (the ?rst note is ignored as it
`has no previous pitch). For another example, the melody
`portion 12, as illustrated in FIG. 1 With one high note
`betWeen tWo loWer notes, Would be converted into the
`sequence: U D.
`To accomplish this conversion, a sequence of pitches in
`the melody must be isolated and tracked. This is not as
`straight-forWard as it sounds, hoWever, as there is still
`considerable controversy over exactly What pitch is. The
`general concept of pitch is clear: given a note, the pitch is the
`frequency that most closely matches What one hears. Per
`forming this conversion in a computer can become trouble
`some because some intricacies of human hearing are still not
`clearly understood. For instance, if one plays the 4th, 5th,
`and 6th harmonics of some fundamental frequency, one
`actually hears the fundamental frequency, not the harmonics,
`even though the fundamental frequency is not present. This
`phenomenon Was ?rst discovered by Schouten in some
`pioneer investigations carried out from 1938 to 1940.
`Schouten studied the pitch of periodic sound Waves pro
`duced by an optical siren in Which the fundamental of 200
`HZ Was cancelled completely. The pitch of the complex tone,
`hoWever, Was the same as that prior to the elimination of the
`fundamental. See R. Plomp, Aspects of Tone Sensation,
`Academic Press, London, 1976.
`Because of the interest in tracking pitch in humming,
`methods Were examined for automatically tracking pitch in
`a human voice. Before one can estimate the pitch of an
`acoustic signal, We must ?rst understand hoW this signal is
`created, Which requires forming a model of sound produc
`tion at the source. The vibrations of the vocal cords in voiced
`sounds are caused as a consequence of forces that are
`exerted on the laryngeal Walls When air ?oWs through the
`glottis (the gap betWeen the vocal cords). The terms “vocal
`folds” and “vocal cords” are more or less used as synonyms
`in the literature. Wolfgang Hess, Pitch Determination of
`Speech Signals, Springer-Verlag, Berlin Heidelberg, 1983,
`describes a model of the vocal cords as proposed by M.
`Hirano, “Structure and Vibratory Behavior of the Vocal
`Folds, ” in M. SaWashima and F. S. Cooper, editors,
`Dynamic aspects of speech production, pages 13—27, Uni
`versity of Tokyo Press, 1976. For the purposes of the present
`application though, it is suf?cient to knoW that the glottis
`repeatedly opens and closes thus providing bursts of air
`through the vocal tract.
`The vocal tract can be modeled as a linear passive
`transmission system With a transfer function
`If one
`adds an additional transfer function R(Z) Which takes into
`account the radiation, the output impedance of the vocal
`tract can approximately be set to Zero. In the neutral position
`Where the vocal tract can be regarded as a uniform tube, the
`resonances of the vocal tract occur at sound Wavelengths of
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`5,874,686
`
`4
`
`With L—17 cm (average value of vocal-tract length) and a
`sound propagation speed of c=340 meters/sec., the frequen
`cies of these resonances Will be:
`
`The frequencies Fk are called formant frequencies.
`The resulting sound that is heard is considered to be the
`convolution of the excitation pulse, illustrated at 30 in FIG.
`2, created by the glottis and the formant frequencies.
`Therefore, if one Wants to model a speech signal, one starts
`With a train of excitation pulses 30, as shoWn in FIG. 3. The
`period To in the train of excitations 30 is 0.01 sec. making
`the pitch 100 HZ. For the formant frequencies, the above
`equation for Fk is used With ke {1,2,3}. This gives formant
`frequencies: F1=500 HZ, F2=1500 HZ, and F3=2500 HZ.
`Combining these frequencies and adding an exponential
`envelope produces the formant structure shoWn at 40 in FIG.
`4. By convolving the train of excitation pulses 30 With the
`formant structure 40 a synthesiZed pitch, as shoWn at 50 in
`FIG. 5, is obtained.
`Since What one hears as pitch is considered to actually be
`the frequency at Which the bursts of air occur, one may be
`able to ?nd the pitch of the segment by tracking the bursts
`of air.
`There are at least three approaches to tracking pitch.
`The Maximum Likelihood approach, Which is described
`by James D. Wise, James R. Caprio, and Thomas W. Parks
`in “Maximum Likelihood Pitch Estimation,” IEEE Trans.
`Acoust., Speech, Signal Processing, 24(5): 418—423, Octo
`ber 1976, is a modi?cation of autocorrelation (hereinafter
`discussed) that increases the accuracy of the pitch and
`decreases the chances of aliasing.
`HoWever, this approach is very sloW due to its computa
`tional complexity. An implementation in Matlab softWare
`may typically take approximately one hour or longer to
`evaluate 10 seconds of audio on a 90 MHZ Pentium Work
`station. With some optimiZations, the performance may be
`improved to approximately 15 minutes per 10 seconds of
`audio, but this is still far too sloW for the purposes of this
`invention.
`The Cepstrum Analysis approach, Which is discussed in
`A. V. Oppenheim, “A Speech Analysis-Synthesis System
`Based on Homomorphic Filtering,” J. Acoustical Society of
`America, 45: 458—465, February 1969, and in Alan V.
`Oppenheim and Ronald W. Schafer, Discretetime-Signal
`Processing, Prentice Hall, EngleWood Cliffs, N.J., 1989,
`does not give very accurate results for-humming.
`In order to provide both rapid and accurate pitch tracking,
`a preferred pitch tracking approach is autocorrelation, Which
`is described in L. R. Rabiner, J. J. DubnoWski, and R. W.
`Schafer, “Realtime Digital HardWare Pitch Detector,” IEEE
`Transactions on Acoustics, Speech and Signal Processing,
`ASSP-24(1): 2—8, February 1976. This approach isolates
`and tracks the peak energy levels of the signal Which is a
`measure of the pitch. Referring back to FIG. 4, the signal
`s(n), illustrated at 40, peaks Where the impulses occur.
`Therefore, tracking the frequency of these peaks 42 should
`provide the pitch of the signal. In order to get the frequency
`of these peaks 42, autocorrelation is employed, Which is
`de?ned by:
`
`Google Ex. 1010
`
`
`
`5,874,686
`
`6
`previously discussed U,D,S characters, and the converted
`user input (the key 23) is compared With all the songs.
`The pattern-matching desirably uses a “fuZZy” search to
`alloW for errors Within the matches. These errors re?ect the
`inaccuracies in the Way people hum as Well as errors in the
`representation of the songs themselves.
`For performing the key-search Within the database 14, it
`is considered desirable to use an ef?cient approximate
`pattern matching algorithm. By “approximate” is meant that
`the algorithm should be able to take into account various
`forms of errors.
`Using the Word CASABLAN CA as an example, the
`folloWing are examples of the types of errors Which may
`occur in a typical pattern matching scheme.
`
`Where I is the lag and h is the input signal.
`While autocorrelation is subject to aliasing (picking an
`integer multiple of the actual pitch) and is computationally
`complex, it still provides greater speed and accuracy as
`compared to the previously discussed approaches. The
`present implementation of autocorrelation Was found to
`require approximately 45 seconds for 10 seconds of 44 KHZ,
`16-bit audio on a 90 MHZ pentium Workstation.
`In order to improve the performance and speed and
`robustness of the pitch-tracking algorithm, a cubic-spline
`Wavelet transform or other suitable Wavelet transform may
`be used. The cubic spline Wavelet peaks at discontinuities in
`the signal (i.e., the air impulses). One of the most signi?cant
`features of the Wavelet analysis is that it can be implemented
`in a ?lter bank structure and therefore computed in O(n)
`time.
`It is preferred that the cubic spline Wavelet implementa
`tion be an event-based implementation of the tracer.
`The cubic spline Wavelet transform is linear and shift
`invariant Which are useful properties for speech signals since
`they are often modelled as a linear combination of shifted
`and damped sinusoids.
`If a signal x(t) or its derivatives have discontinuities, then
`the modulus of the cubic spline transform of x(t) exhibits
`local maxima around the points of discontinuity. This is
`considered to be a desirable property for a cubic spline
`Wavelet pitch-tracker since glottal closure causes sharp
`changes in the derivative of the air How in the glottis and
`transients in the speech signal.
`It has been shoWn in S. Mallat and W. L. HWang,
`“Singularity Detection and Processing With Wavelets”,
`Technical Report, Courant Institute of Mathematical
`Sciences, NeW York, March, 1991, that if one chooses a
`Wavelet function g(t) that is the ?rst derivative of a smooth
`ing function (a smoothing function is a function Whose
`Fourier transform has energy concentrated in the loW
`frequency region) P(t), then the local maxima of the trans
`form indicate the sharp variations in the signal Whereas the
`local minima indicate the sloW variations. Hence, the local
`maxima of the transform using a Wavelet, Which is the ?rst
`derivative of a smoothing function, is considered to be
`desirable for detecting the abrupt changes or transients in the
`speech signal caused by glottal closure.
`One of the most appealing features of the cubic spline
`Wavelet transform is that is can be implemented in a ?lter
`bank structure. It therefore is 0(n) fast.
`Preferably, the autocorrelation pitch tracking approach
`includes loW-pass ?ltering (for removing non-vocal
`frequencies) and center-clipping, as described in M. M.
`Sondhi, “New Methods of Pitch Extraction,” IEEE Trans.
`Audio Electroacoust. (Special Issue on Speech Communi
`cation and Processing—Part II), AU-16: 262—266, June
`1968. It is considered that these Will help eliminate the
`formant structure that generally causes dif?culty for auto
`correlation based pitch detectors.
`This preferred form of autocorrelation takes betWeen 20
`and 45 seconds on a Sparc10 Workstation to process typical
`sequences of hummed notes. A brute-force search of the
`database unsurprisingly shoWs linear groWth With the siZe of
`the database, but remains beloW 4 seconds for 100 songs on
`a Sparc2. Therefore, the search time is currently effectively
`limited by the ef?ciency of the pitch-tracker.
`In order to search the database, songs in the database 14
`are preprocessed to convert the melody into a stream of the
`
`10
`
`15
`
`25
`
`35
`
`45
`
`55
`
`65
`
`In CASABLANCA
`
`In a melody
`
`Transposition error
`Dropout error
`Duplication error
`Insertion error
`
`CASABALNCA
`CAS BLANCA
`CASAABLANCA
`CASABTLANCA
`
`tWo notes reversed
`a note dropped
`a note duplicated
`a note inserted
`
`Several Algorithms have been developed that address the
`problem of approximate string matching. Running times
`have ranged from 0(mn) for the brute force algorithm to
`0(kn) or 0(nlog(m), Where “0” means “on the order of,” m
`is the number of pitch differences in the query, and n is the
`siZe of the string (song). See Ricardo BaeZa-Yates and G. H.
`Gonnet, “Fast String Matching With Mismatches,” Informa
`tion and Computation, 1992. Apreferred algorithm Which is
`considered to offer better performance in general for this
`purpose is that described in Ricardo A. Baesa-Yates and
`Chris H. Perieberg, “Fast and Practical Approximate String
`Matching, “Combinatorial Pattern Matching, Third Annual
`Symposium,” pages 185—192, 1992.
`This algorithm addresses the problem of string matching
`With k mismatches. The problem consists of ?nding all
`instances of a pattern string P=p1,p2,p3 .
`.
`. pm in a text
`string T=t1,t2,t3 .
`.
`. tn such that there are at most k
`mismatches (characters that are not the same) for each
`instance of P in T. When k=0 (no mismatches) one has the
`simple string matching problem, solvable in O(n) time.
`When k=m, every substring of T of length m quali?es as a
`match, since every character of P can be mismatched. Each
`of the errors in the above CASABLAN CA example corre
`sponds to k=1.
`The Worst case for this algorithm occurs When P (the key
`23) consists of m occurrences of a single distinct character,
`and T (contour representation of song) consists of n
`instances of that character. In this case the running time is
`0(mn). HoWever, this is neither a common nor useful situ
`ation for the present invention. In the average case of an
`alphabet in Which each character is equally likely to occur,
`the running time is
`
`Where |r| is the siZe of the alphabet.
`The computer 16 may desirably be programmed so that,
`for a given query, the database 14 returns a list of songs
`ranked by hoW Well they matched the query, not just one best
`match. The number of matches that the database 14 should
`retrieve depends upon the error-tolerance used during the
`key-search. This error-tolerance could be set in one of tWo
`possible Ways: either it can be a user-de?nable parameter or
`the database can itself determine this parameter based on, for
`
`Google Ex. 1010
`
`
`
`5,874,686
`
`7
`example, heuristics that depends on the length of the key.
`This Would give the user an opportunity to perform queries
`even if the user is not sure of some notes Within the tune.
`From the results of the query the user can identify the
`song of interest. If the list is too large, the user can perform
`a neW query on a restricted search list consisting of songs
`just retrieved. This alloWs the user to identify sets of songs
`that contain similar melodies.
`An experimental evaluation of the system tested the
`tolerance of the system With respect to input errors, Whether
`from mistakes in the user’s humming or from problems With
`the pitch-tracking.
`Effectiveness of the system 10 is directly related to the
`accuracy With Which pitches that are hummed can be tracked
`and the accuracy of the melodic information Within the
`database 14. Under ideal circumstances, one can achieve
`close to 100% accuracy tracking humming, Where “ideal
`circumstances” means that the user places a small amount of
`space betWeen each note and hits each note strongly. For this
`purpose, humming short notes is encouraged. Even more
`ideal is for the user to aspirate the notes as much as possible,
`perhaps going so far as to voice a voWel, as in “haaa haaa
`haaa.” Only male voices have been experimented With.
`The evaluation database contained a total of 183 songs.
`Each song Was converted from public domain General MIDI
`sources. Melodies from different musical genres Were
`included, including both classical and popular music. A feW
`simple heuristics Were used to cut doWn on the amount of
`irrelevant information from the data, e.g., MIDI channel 10
`Was ignored as this is reserved for percussion in the General
`MIDI standard. HoWever, the database still contained a great
`deal of information unrelated to the main theme of the
`melody. Even With this limitation, it is discovered that
`sequences of 10 to 12 pitch transitions Were suf?cient to
`discriminate 90% of the songs.
`As a consequence of using a fast approximate string
`matching algorithm, search keys can be matched With any
`portion of the melody, rather than just the beginning. As the
`siZe of the database groWs larger, hoWever, this may not
`prove to be an advantage.
`Contour representations for each song are stored in sepa
`rate ?les. Opening and closing ?les may become a signi?
`cant overhead. Performance may therefore be improved by
`packing all the songs into one ?le, or by using a database
`manager. The programming code may be moderniZed to
`make it independent of any particular database schema.
`The pattern matching algorithm in the form previously
`described does not discriminate betWeen the various forms
`of pattern matching errors discussed earlier, but only
`accounts for them collectively. Some forms of errors may be
`more common than others depending upon the Way people
`casually hum different tunes. For example, drop-out errors
`re?ected as dropped notes in tunes are more common than
`transposition or duplication errors. Tuning the key-search so
`that it is more tolerant to drop-out errors, for example, may
`yield better results.
`The generation of the melodic contours of the source
`songs automatically from MIDI data is convenient but not
`optimal. More accuracy and less redundant information
`could be obtained by entering the melodic themes for
`particular songs by keyboard.
`The resolution of the relative pitch differences may be
`desirably increased by using query alphabets of three, ?ve,
`and more possible relationships betWeen adjacent pitches.
`Early experiments using an alphabet of ?ve relative pitch
`differences (same, higher, much higher, loWer, much loWer)
`indicated changes of this sort are promising. One draWback
`
`1O
`
`15
`
`25
`
`35
`
`45
`
`55
`
`65
`
`8
`of introducing more resolution is that the users must be
`someWhat more accurate in the intervals they actually hum.
`It should be understood that, While the present invention
`has been described in detail herein, the invention can be
`embodied otherWise Without departing from the principles
`thereof, and such other embodiments are meant to come
`Within the scope of the present invention as de?ned by the
`appended claims.
`What is claimed is:
`1. Apparatus for searching a melody comprising a com
`puter means having a database of melodies each including a
`plurality of notes in a form of a sequence of digitiZed
`representations of relative pitch differences betWeen succes
`sive notes, means for inputting a melody to said computer
`means, means for converting the melody into a form of a
`sequence of digitiZed representations of relative pitch dif
`ferences betWeen successive notes thereof, and means for
`searching said melody database for at least one sequence of
`digitiZed representations of relative pitch differences
`betWeen successive notes Which at least approximately
`matches said sequence of digitiZed representations of rela
`tive pitch differences betWeen successive notes of the
`melody.
`2. Apparatus according to claim 1 Wherein said input
`means comprises a microphone.
`3. Apparatus according to claim 1 further comprising
`means for outputting from said computer a ranked list of
`approximately matching melodies.
`4. Apparatus according to claim 1 Wherein said converting
`means comprises an autocorrelation based pitch detector
`means.
`5. Apparatus according to claim 1 Wherein said converting
`means comprises an autocorrelation based pitch detector
`means having loW-pass ?lter means for removing non-vocal
`frequencies and center-clipping means for eliminating
`unWanted formant structure.
`6. Apparatus according to claim 1 Wherein said converting
`means comprises a Wavelet pitch tracker.
`7. Apparatus according to claim 1 Wherein said converting
`means comprises a cubic spline Wavelet pitch tracker.
`8. Apparatus according to claim 1 further comprising
`means for approximately matching sequences of digitiZed
`representations of relative pitch differences betWeen succes
`sive notes.
`9. A method for searching a melody comprising the steps
`of: (a) selecting a computer means having a database of
`melodies each including a plurality of notes in a form of a
`sequence of digitiZed representations of relative pitch dif
`ferences betWeen successive notes, (b) inputting a melody to
`the computer means, (c) converting the melody into a form
`of a sequence of digitiZed representations of relative pitch
`differences betWeen successive notes thereof, and (d)
`searching the melody database for at least one sequence of
`digitiZed representations of relative pitch differences
`betWeen successive notes Which at least approximately
`matches the sequence of digitiZed representations of relative
`pitch differences betWeen successive notes of the melody.
`10. A method according to claim 9 Wherein the step of
`inputting the melody comprises humming into a micro
`phone.
`11. A method according to claim 9 further comprising
`operating the computer to output a ranked list of approxi
`mately matching melodies.
`12. A method according to claim 9 Wherein the step of
`converting comprises applying to the sequence of represen
`tations of relative pitch changes betWeen successive notes of
`the hummed melody an autocorrelation based pitch detector.
`
`Google Ex. 1010
`
`
`
`5,874,686
`
`9
`13. A method according to claim 9 wherein the step of
`converting comprises applying to the sequence of represen
`tations of relative pitch changes betWeen successive notes of
`the hummed melody an autocorrelation based pitch detector
`having loW pass ?ltering for removing non-vocal frequen
`cies and center-clipping for eliminating unWanted formant
`structure.
`14. A method according to claim 9 Wherein the step of
`converting comprises applying to the sequence of represen
`tations of relative pitch changes betWeen successive notes of 10
`the hummed melody a Wavelet pitch tracker.
`
`10
`15. A method according to claim 9 Wherein the step of
`converting comprises applying to the sequence of represen
`tations of relative pitch changes betWeen successive notes of
`the hummed melody a cubic spline Wavelet pitch tracker.
`16. A method according to claim 9 further comprising
`operating the computer means to obtain approximate
`matches of sequences of digitiZed representations of relative
`pitch differences betWeen successive notes.
`
`Google Ex. 1010
`
`