US006996521B2

(12) United States Patent
     Iliev et al.

(10) Patent No.: US 6,996,521 B2
(45) Date of Patent: Feb. 7, 2006

(54) AUXILIARY CHANNEL MASKING IN AN AUDIO SIGNAL

(75) Inventors: Alexander I. Iliev, Miami, FL (US);
     Michael S. Scordilis, Miami, FL (US)

(73) Assignee: The University of Miami, Miami, FL (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is
     extended or adjusted under 35 U.S.C. 154(b) by 839 days.

(21) Appl. No.: 09/969,615

(22) Filed: Oct. 4, 2001

(65) Prior Publication Data
     US 2002/0059059 A1   May 16, 2002

     Related U.S. Application Data
(60) Provisional application No. 60/238,009, filed on Oct. 6, 2000.

(51) Int. Cl.
     G01L 11/00 (2006.01)

(52) U.S. Cl. ................... 704/200; 704/200.1; 704/230; 375/240.03

(58) Field of Classification Search ................ 704/200, 704/200.1, 230;
     128/746; 395/2.14; 375/240.03
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

     5,645,074 A *  7/1997  Shennib et al. ............. 600/559
     5,682,461 A * 10/1997  Silzle et al. .............. 704/205
     6,738,423 B1 *  5/2004  Lainema et al. ............. 375/240.03

     FOREIGN PATENT DOCUMENTS

     WO   WO 99/11020   3/1999
     WO   WO 00/04662   1/2000

     OTHER PUBLICATIONS

International Search Report for Application No. PCT/US01/31214, dated May 28, 2002.
XP-001076669, David R. Perrott et al., "Minimum Audible Angle Thresholds for Sources Varying in Both Elevation and Azimuth", J. Acoust. Soc. Am., vol. 87, No. 4, Apr. 1990, pp. 1728-1731.

     (Continued)

Primary Examiner - W. R. Young
Assistant Examiner - Jakieda Jackson
(74) Attorney, Agent, or Firm - Christopher & Weisberg, P.A.

(57) ABSTRACT

A method is provided for embedding data into an audio signal and determining data embedded into an audio signal. In the method for embedding data into an audio signal, the audio signal is based on a first set of data and includes a phase component. The method modifies at least a portion of the phase component of the audio signal to embed a second set of data into the audio signal. The modified audio signal can be made to differ with respect to the audio signal in a manner at least one of (i) substantially imperceptible and (ii) imperceptible to a listener of the first set of data depending on the extent that the phase component of the audio signal is modified. In the method for determining data embedded into an audio signal, the audio signal is based on a first set of data of an original audio signal and includes a phase component. The method determines a second set of data embedded into the audio signal based on the phase component of the audio signal. The audio signal differs with respect to the original audio signal in a manner that is at least one of (i) substantially imperceptible and (ii) imperceptible to a listener of the first set of data.

15 Claims, 11 Drawing Sheets

[Front-page figure: the encoder block diagram of FIG. 6 - original stereo audio signal (right channel 605, left channel 610), magnitude and phase spectra (630, 635), psychoacoustical comparison and phase-threshold detection, data to be masked (650), masked-channel encoding (655), N-point FFT, erroneous-frequency-components check (675, 680), and new left channel (690).]

MZ Audio, Ex. 2008, Page 1 of 22
`
`
`
US 6,996,521 B2
Page 2

OTHER PUBLICATIONS

XP-001076670, Armin Kohlrausch, "Binaural Masking Experiments Using Noise Maskers With Frequency-Dependent Interaural Phase Differences. II: Influence of Frequency and Interaural-Phase Uncertainty", J. Acoust. Soc. Am., vol. 88, No. 4, Oct. 1990, pp. 1749-1756.
XP-002197648, Tolga Ciloglu et al., "An Improved All-Pass Watermarking Scheme for Speech and Audio", Proceedings of IEEE International Conference on Multimedia and Expo, New York, NY, USA, Jul. 30, 2000, vol. 2, pp. 1017-1020.
XP-000635079, W. Bender et al., "Techniques for Data Hiding", IBM Systems Journal, vol. 35, Nos. 3 and 4, 1996, pp. 313-335.
John F. Tilki et al., "Encoding a Hidden Auxiliary Channel Onto a Digital Audio Signal Using Psychoacoustic Masking", Proceedings of IEEE Southeastcon '97, Blacksburg, VA, USA, Apr. 12-14, 1997, pp. 331-363.

* cited by examiner
`
`
`
[Drawing Sheets 1 through 11, U.S. Patent, Feb. 7, 2006, US 6,996,521 B2. The figure artwork did not survive text extraction; only the captions and visible labels are recoverable:

Sheet 1, FIG. 1: relative phase and intensity differences due to a source located on an azimuth plane.
Sheet 2, FIG. 2: a sound source on an azimuth plane.
Sheet 3, FIG. 3: plots of the distances between a sound source (r=5 m) and each of two ears, and of the difference of those distances, against azimuth.
Sheet 4, FIG. 4: interaural phase differences (in degrees) plotted against frequency for source azimuth angles of 1, 2, and 3 degrees.
Sheet 5, FIG. 5: minimum audible angle values as a function of frequency for zero azimuth and zero elevation.
Sheet 6, FIG. 6: block diagram of a method for an encoder of an auxiliary channel (right and left channels, N-point window, N-point FFT, magnitude and phase spectra, psychoacoustical phase-threshold comparison and detection, data to be masked, masked-channel encoding, 16-bit quantization, erroneous-frequency-components check, new left channel).
Sheet 7, FIG. 7: block diagram of a method for a decoder of an auxiliary channel (audio signal with masked data, left and right channels, N-point FFT, magnitude and phase spectra, psychoacoustical phase-threshold comparison and detection, decoded masked channel).
Sheet 8, FIG. 8: flow chart of a method for embedding data into an audio signal (receive a signal including a phase component; modify the phase component of the signal).
Sheet 9, FIG. 9: flow chart of a method for determining data embedded into an audio signal (receive a signal based on a first set of data; determine a second set of data embedded into the signal).
Sheet 10, FIG. 10: an apparatus comprising a receiver, a transmitter, a processor, and a memory.
Sheet 11, FIG. 11: a machine-readable medium.]
`
`
`
AUXILIARY CHANNEL MASKING IN AN AUDIO SIGNAL

This application claims the benefit of U.S. Provisional Application No. 60/238,009, filed Oct. 6, 2000, which is incorporated in this application by this reference.

BACKGROUND

1. Field of the Invention
This invention generally relates to the field of signal processing and communications. More particularly, the present invention relates to embedding data into an audio signal and detecting data embedded into an audio signal.

2. Description of Background Information
The efficient and secure storage and distribution of digital audio signals are becoming issues of considerable importance for the information revolution currently unfolding. The challenges of the storage and distribution of such signals arise particularly from the digital nature of modern audio. Most modern digital audio allows for the creation of unlimited, perfect copies and may be easily and massively distributed via the Internet. Nevertheless, such digital nature also makes possible the adoption of intelligent techniques that can contribute to the control of unauthorized copying and distribution of multimedia information comprising audio. In addition, opportunities arise whereby digital audio may be used as a medium for the delivery of enhanced services and for a more gratifying audio and/or visual experience.

Audio delivery through a network (e.g., the Internet), presented as a stand-alone service or as part of a multimedia presentation, comes in a large range of perceived qualities. Signal quality depends on the audio content (e.g., speech and music), the quality of the original recording, the available channel bandwidth, and real-time transmission constraints. Real-time Internet audio usually applies to broadcasting services. It is generally achieved by streaming audio, which is decoded at a receiving workstation. Real-time transmission requirements impose limitations on signal quality. At present, audio streaming delivers quality comparable to AM radio.

By relaxing real-time constraints, new opportunities for services have appeared where the quality and security of the transmitted audio are enhanced. Such services include the secure downloading of CD-quality music at transmission rates that are too high for real-time transmission but lower than the CD standard. Such signal compression capitalizes on psychoacoustic properties of human hearing.

Security and authentication of audio distributed over networks (e.g., non-homogeneous networks) is also often required, in addition to low bit rates that do not compromise audio quality. Moreover, perceptual coding may be used for the insertion of new, secure information into an original audio signal in a way that this information remains inaudible and extractable by secure means. This process is generally referred to as watermarking.

Simultaneous frequency masking is used to implement perceptual coding and transparent watermarking in digital audio. Frequency masking is a property of hearing that renders audio signal components in a frequency region inaudible if a component of higher energy is in the same vicinity. The ability of the dominant component to mask others depends on its relative energy and on its proximity to the other audio signal components. In addition to simultaneous frequency masking, temporal masking is used to reduce pre-echoes and post-echoes resulting from signal processing.

While masking in the power spectrum of auditory signals dominates audio coding and watermarking techniques, the phase information has not been involved to date (see, e.g., Nyquist & Brand, Measurements of Phase Distortion, BELL SYS. TECH. J., Vol. 7, 522-49 (1930); D. Preis, Phase and Phase Equalization in Audio Signal Processing - A Tutorial Review, J. AUDIO ENGINEERING SOCIETY, Vol. 30, No. 11, 774-94 (1982)).
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts relative phase and intensity differences due to a source located on an azimuth plane.
FIG. 2 depicts a sound source on an azimuth plane.
FIG. 3 depicts, for a sound source distance of r=5 m, plots of distances between a sound source and each of two ears, and a plot of the differences of the distances between the sound source and each of the two ears.
FIG. 4 depicts interaural phase differences (in degrees) plotted against frequency for azimuth angles of 1, 2, and 3 degrees.
FIG. 5 depicts minimum audible angle values as a function of frequency for zero azimuth and zero elevation.
FIG. 6 depicts an embodiment of a method for an encoder of an auxiliary channel.
FIG. 7 depicts an embodiment of a method for a decoder of an auxiliary channel.
FIG. 8 depicts an embodiment of a method for embedding data into an audio signal.
FIG. 9 depicts an embodiment of a method for determining data embedded into an audio signal.
FIG. 10 depicts an embodiment of an apparatus for embedding data into an audio signal and/or determining data embedded into an audio signal.
FIG. 11 depicts an embodiment of a machine-readable medium having encoded information, which when read and executed by a machine causes a method for embedding data into an audio signal and/or determining data embedded into an audio signal to be performed.
`
DETAILED DESCRIPTION

An embodiment of a method allows for signal processing to include information, such as an auxiliary channel, in an audio signal (e.g., a stereo audio signal), in a manner that is not perceptible by listening. This included information does not limit the capacity of data that the original, unmodified audio signal may contain. The method, for example, uses principles of binaural hearing, and the minimum audible angle ("MAA") in particular, which is the minimum detectable angular displacement of a sound source in auditory space. The method then varies a phase spectrum (e.g., short-time phase spectrum) of an audio signal within a range controlled by the MAA to encode information (e.g., digital information) in various forms (e.g., text, images, speech, and music) into the phase spectrum of the audio signal.

This method is simpler to compute and to implement than simultaneous frequency masking in the power spectrum, mentioned above, and allows for simultaneous encoding of a masked digital multimedia signal. The method allows, in effect, for the "hiding" (e.g., masking) of considerably more information in an audio signal than simultaneous frequency masking. Also, the method may allow for the inclusion of audibly imperceptible parallel (e.g., information related to the audio signal) and/or secure (e.g., encrypted) information in an audio signal.

As used herein, the term audio signal encompasses any type of signal comprising audio. In addition to including traditional stand-alone audio, the term audio signal also encompasses any audio that is a component of a signal including other types of data. For example, the term audio signal as used herein extends to audio components of multimedia signals, of video signals, etc. Furthermore, as used herein, an auxiliary channel is simply a form of data or information that may be embedded into an audio signal and/or detected as embedded in an audio signal. While the information in an auxiliary channel as used herein may be in stream format such as audio or video, the information and data that may be embedded and/or detected in an audio signal may also be in non-stream format such as one or more images or items of text.

The detailed description refers to the accompanying drawings that illustrate embodiments of the present invention. Other embodiments are possible and modifications may be made to the embodiments without departing from the spirit and scope of the invention. Therefore, the detailed description is not meant to limit the invention. Rather, the scope of the invention is defined by the appended claims and their equivalents.

Embodiment of a Binaural Hearing Phase Tolerance Model

Binaural Phase Information in Sound Source Localization

To estimate direction and distance (i.e., location) of a sound source, a listener uses binaural (both ears) audible information. This may be achieved by the brain processing binaural differential information and includes:

interaural phase or time difference (IPD/ITD);
interaural intensity or loudness difference (IID/ILD); and
spectral notches, whose locations depend on the elevation angle of incidence of the sound wavefront.

FIG. 1 depicts an example of a sound source located on the azimuth plane and of plausible audio signal segments arriving at a listener's ears. The sound source in FIG. 1 is closer to the right ear, and as a result sound from the sound source arrives at the right ear earlier than the left ear. Since all sounds travel with equal speed in space, the time difference perceived is in the frequency-dependent interaural phase difference, or IPD. For a source at a fixed distance, a minimum angular movement may be detectable by listening. This MAA may be dependent on elevation, azimuth, and frequency. The changes in phase and intensity may vary as a sound source is moved around the head of a listener.

The IID/ILD is another type of binaural difference perceived. In FIG. 1, the audio signal at the right ear has higher intensity or loudness than the audio signal arriving at the left ear because of the inverse square distance law applying to spherical wave propagation in free space, as well as the contribution of the acoustic shadow of the head falling on the left hemisphere.
`
MAA/IPD Relationship
FIG. 2 illustrates a geometrical relationship of parameters related to a sound source located on the azimuth plane comprising a listener's ears (i.e., a horizontal plane). The MAA plays a significant role in sound localization. In FIG. 2, θ is an azimuth angle, r is a distance from the sound source to the center of the listener's head, and d is an interaural distance. The distance of the source from the right ear is Δr and from the left ear is Δl, whereas Δd is their difference, which may be expressed as:

Δr² = (r·cos θ)² + (r·sin θ − d/2)², and   (1)

Δl² = (r·cos θ)² + (r·sin θ + d/2)², so   (2)

Δd = Δr − Δl   (3)
`
FIG. 3 depicts a plot of exemplary Δr, Δl and Δd for: source distance of r=5 meters, interaural distance of d=0.17 m (i.e., a typical interaural distance for an adult listener), zero elevation, and azimuth angle changing over a complete revolution around the listener. Δd is independent of source distance (see above). The IPD is a function of frequency and may be expressed as:

Φ = Δd·(f/c)·360°  or  Φ = Δd·(f/c)·2π radians,   (4)

where Φ is the resulting IPD, f is the frequency of a sound source, and c is the speed of sound in air. FIG. 4 illustrates a plot of Φ for azimuth angles of 1°, 2° and 3°, where c=344 m/s.
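Equations (1) through (4) can be checked numerically. The following sketch is not part of the patent; the function name is illustrative, and the defaults take the values used in the text (d=0.17 m, r=5 m, c=344 m/s).

```python
import math

def interaural_phase_difference(theta_deg, f_hz, d=0.17, r=5.0, c=344.0):
    """IPD (in degrees) for a source at azimuth theta, per equations (1)-(4).

    Illustrative sketch only; defaults follow the values in the text.
    """
    theta = math.radians(theta_deg)
    # Equations (1) and (2): distances to the right and left ears.
    dr = math.sqrt((r * math.cos(theta)) ** 2 + (r * math.sin(theta) - d / 2) ** 2)
    dl = math.sqrt((r * math.cos(theta)) ** 2 + (r * math.sin(theta) + d / 2) ** 2)
    delta_d = dr - dl                      # equation (3)
    return delta_d * (f_hz / c) * 360.0    # equation (4), in degrees

# IPD grows in magnitude with both azimuth and frequency (cf. FIG. 4);
# at theta=1 degree and 1 kHz it is about -3.1 degrees.
print(interaural_phase_difference(1.0, 1000.0))
```

Note that the result at 1° azimuth and 1 kHz agrees with the per-frequency bound of equation (5) below, as expected.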
`
On the one hand, the IPD detectable by the human auditory system for a source moving on the azimuth plane is a function of the MAA, as expressed in equations (1) to (4), and depicted on FIG. 4. On the other hand, the MAA is a function of source location and frequency, with highest sensitivity corresponding to source movements confined on the azimuth plane, such as, for example, in a forward direction (e.g., azimuth angle, θ=0°) (see W. A. Yost, FUNDAMENTALS OF HEARING (1993)). FIG. 5 depicts a plot of the MAA as a function of frequency for θ=0°.
FIG. 5 illustrates that, in the most sensitive region of the acoustic space (e.g., zero azimuth and zero elevation), the MAA ranges from about 1° to 2.5°. Angular values may be smallest for frequencies below 1 kHz and increase for higher frequencies. If an MAA of 1° is assumed (the conservative, worst case selection), then the resulting maximum imperceptible IPD may be expressed as:

IPD_max = −3.104E−3 × f (degrees)   (5)

As such, the maximum IPD values may range from 0° at DC (i.e., 0 Hz) to −68.3° at 22 kHz. Source movements that result in IPD values within that frequency-dependent upper bound may not be detectable by the human auditory system.
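The frequency-dependent bound of equation (5) is straightforward to evaluate. The helper below is illustrative, not part of the patent; it reproduces the endpoints quoted above.

```python
def ipd_max_degrees(f_hz):
    """Maximum imperceptible IPD in degrees for a 1-degree MAA, equation (5).

    Illustrative helper; the name is not from the patent.
    """
    return -3.104e-3 * f_hz

# Endpoints quoted in the text: 0 degrees at DC, about -68.3 degrees at 22 kHz.
for f in (0.0, 1_000.0, 22_000.0):
    print(f, ipd_max_degrees(f))
```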
`
Inaudible Random Phase Distortion for Stereo Audio Signals
The analysis described above is based on a sound source emitting a pure sinusoidal tone and localized by binaural hearing. For a group of sound sources, the MAA and IPD results would be valid for such sources emitting the same tone. Principles of linearity and superposition suggest that a group of sound sources emitting identical pure tones at the same loudness levels may not be able to be distinguished into individual sound sources provided that their locations are within an MAA corresponding to their spatial location. As such, a pair of identical sound sources will be perceived to be fused into a single sound source if their separation is smaller than the corresponding MAA of the region containing such sound sources or if the resulting IPD is below computed maximum limits (i.e., a threshold).
In an experiment, a stereo audio signal consisting of identical tones was synthesized and an image of one channel was moved by increasing a phase difference between the two channels. Listening tests confirmed that IPDs corresponding to an MAA of between 1° and 2° were not detectable. Such observations were in agreement with results reported in the relevant scientific literature (see, e.g., W. A. Yost, FUNDAMENTALS OF HEARING (1993)).

A set of experiments was then conducted to determine the extent to which the principles of linearity and superposition apply in a case of complex acoustic stimuli, as opposed to a case of pure tones. Using Fourier analysis (e.g., using the Fast Fourier Transform (FFT) algorithm), audio signals may be expressed as a weighted sum of sinusoids at different oscillating frequencies and different phase values.

Short-time Fourier analysis, for example, was performed on speech and music stereo audio signals sampled at 44,100 Hz. FFT was applied on 1024-point rectangular windows. The resulting frequency components, located about 21.5 Hz apart, were considered as independent stimuli. The FFT algorithm provided the phase value of each component in modulo (2π) form. Because a 2π rotation of a sound source on a particular plane corresponds to one full rotation, the phase was not unwrapped to its principal value. The number of rotations a source may have made is of no consequence. The cosine of a phase difference between right and left channels (e.g., a stereo pair) was used to test the corresponding IPD. When

cos[phase(right, f_i) − phase(left, f_i)] > cos(−3.104E−3 × f_i),   (6)

where f_i is frequency samples from 0 Hz to 44,100/2 Hz, and phase is in degrees, the phase information of the stereo pair at f_i was considered blurred. All such components were identified on a short-time basis; their right channel was left intact, while the phase of their left channel at f_i was randomly changed up to the value of IPD_max corresponding to f_i. The altered stereo audio signal was resynthesized through an Inverse Fast Fourier Transform (IFFT) algorithm. Listening tests, where subjects were presented with the original and the processed stereo audio signals, revealed that it was not possible to distinguish between the two stereo audio signals. Thus, the linearity and superposition principles were proven to be valid for the given conditions; thereby, the results for pure tone audio signals may be extended to complex acoustic stimuli.

Listening tests for MAA of 2° and 3° were also performed with various types of audio selected as the masker audio signal and with broadband noise being the data, in the form of another audio signal, masked into the masker signal. When 3° was used for the MAA, the affected changes were perceivable for all masker audio signals and all listeners. When 2° was used for the MAA, the change to the audio signal remained nearly unnoticeable for rock music, and somewhat audible for speech and classical music. For the case of θ=1°, however, the broadband noise was successfully masked for all masker audio signals and all listeners, confirming that θ=1° is a possible maximum unnoticeable angular displacement of the sound source from the median plane.

Having extended the MAA results for azimuth angles to complex acoustic stimuli and determined that the phase spectrum of audio signals may be randomly disturbed within the IPD bounds resulting from the MAA, masking meaningful information into an audio signal was performed. Frequency components having an IPD below an audible threshold may be identified and set to zero to achieve signal compression. Also, new information may be included as an auxiliary channel that is not audibly perceptible to a listener (e.g., watermarked). The auxiliary channel may be made further secure by encryption.

Embodiment of an Encoder of an Auxiliary Channel
FIG. 6 depicts an embodiment of a method 600 for an encoder of a masked audio signal. In block 605, the method 600 receives a right channel of an audio signal (e.g., a CD-quality stereo audio signal). In block 610, the method 600 receives a left channel of the audio signal. The method 600 may perform a complete analysis-resynthesis loop, and may apply N-point rectangular windows to the left channel and to the right channel to start a short-time analysis of the audio signal, as block 615 illustrates. In block 620, a first FFT algorithm computes a magnitude spectrum and a phase spectrum of the right channel, illustrated in block 635. In block 625, a second FFT algorithm computes a magnitude spectrum and a phase spectrum of the left channel, illustrated in block 630. In an embodiment, a 1024-point rectangular window (e.g., N=1024) is applied.

In block 640, the method 600 compares the phase difference between the left channel and the right channel for each frequency component against an IPD psychoacoustic threshold, expressed in, for example, equation (6) where the MAA=1°, and illustrated in block 645. Phase components outside the threshold may be left untouched and passed on for synthesis. The remaining components are part of the encoding space.

In block 650, method 600 receives data to be masked into the audio signal. For the case of encoding a single bit per frequency component, whenever a logical zero is being encoded, for example, the phase values of the left channel and the right channel may be made equal. For the case of a logical one being encoded, for example, the phase difference between the two channels may be made less than or equal to the maximum permissible IPD for that frequency component. The method 600 may use a 1-bit encoding process as follows:

phase[X_L(f)] = phase[X_R(f)]   logical 0   (7.1)

phase[X_L(f)] = k·IPD_max(f)   logical 1   (7.2)

phase[X_L(f)] > IPD_max(f)   no encoding   (7.3)

The approach taken in this process is to use the right channel as reference and to alter the phase of the left channel. Constant k in equation (7.2) specifies the amount of phase difference within the IPD threshold which would denote a logical one. In an embodiment, k=1/2 was used.
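The 1-bit rule of equations (7.1) through (7.3) can be sketched per frequency component as follows. This is a simplified illustration, not the patent's full method 600: the function name is hypothetical, phases are in degrees, the per-component IPD budget follows equation (5), and k=1/2 is assumed.

```python
import numpy as np

def encode_bits(phase_right_deg, freqs_hz, bits, k=0.5):
    """Hide one bit per usable frequency component in the left channel.

    Illustrative sketch of equations (7.1)-(7.3): the right channel is
    the reference; logical 0 copies its phase, logical 1 offsets it by
    k*IPD_max(f). Components with no IPD budget (e.g., DC) are skipped.
    """
    ipd_max = np.abs(-3.104e-3 * np.asarray(freqs_hz))  # equation (5), magnitude
    phase_left = np.array(phase_right_deg, dtype=float)
    bit_iter = iter(bits)
    for i, budget in enumerate(ipd_max):
        if budget == 0.0:
            continue  # equation (7.3): no encoding at this component
        try:
            bit = next(bit_iter)
        except StopIteration:
            break  # all data bits placed
        if bit == 0:
            phase_left[i] = phase_right_deg[i]               # equation (7.1)
        else:
            phase_left[i] = phase_right_deg[i] + k * budget  # equation (7.2)
    return phase_left
```

A component at 1 kHz, for example, receives a 1.552° offset for a logical one, half of the 3.104° budget given by equation (5).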
`
In block 655, the method 600 collects all the frequency components of the left channel, both those altered as well as those left unchanged by the application of the psychoacoustical threshold comparison, and constructs the new frequency representation of the left channel, which now contains the masked data.

In block 660, method 600 computes the N-point IFFT of the new left channel to produce its time-domain representation. This is followed by a 16-bit quantization of the produced time sequence, which is a preferred representation of digital audio signals (e.g., CD-quality audio).

The effects of quantization noise on the masking process are tested in method 600 by employing an N-point FFT in block 670 that converts the obtained time sequence of the new left channel back into the frequency domain. Block 675 compares the frequency representation of the new left channel obtained via high-precision arithmetic and available at block 655 against its representation which has been subjected to 16-bit quantization in the time domain. If the quantization has disturbed the representation of the masked data, then the erroneous frequency components are detected and rendered unusable by the masking process by making their phases large enough to escape encoding in the next round. This is achieved in block 680 by making the phase of the erroneous frequency components correspond to 120% of the IPD of that frequency location. The new phase profile of the left channel is again presented to block 655 for encoding the masked data via block 640. This testing cycle repeats until no errors are detected in the masking process. If the inserted data masked in a given N-point audio signal frame has not been altered by the quantization process, and therefore no errors were detected, then the encoding process has been declared successful and the new N points of the left channel are presented for storage or transmission at block 690. This encoding process continues with subsequent N-point frames of the original audio signal until no more data are left in block 650.
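The encode/quantize/verify cycle of blocks 655 through 690 can be sketched as follows. This is a loose, hypothetical rendering: `embed` stands in for the masking step of blocks 640 and 655, erroneous components are simply flagged as unusable rather than marked at 120% of the IPD, and a phase tolerance replaces the exact comparison of block 675.

```python
import numpy as np

def quantize_16bit(x):
    """Round a float signal in [-1, 1) to the 16-bit grid of CD audio."""
    return np.round(x * 32768.0) / 32768.0

def encode_frame(spectrum, embed, max_rounds=32):
    """Encode one N-point frame, re-trying until it survives quantization.

    Illustrative sketch only. `embed(spectrum, unusable)` is a
    caller-supplied stand-in for the masking step; it must skip
    components flagged in `unusable`.
    """
    unusable = np.zeros(len(spectrum), dtype=bool)
    for _ in range(max_rounds):
        encoded = embed(spectrum, unusable)                # block 655
        time_seq = np.fft.ifft(encoded).real               # block 660, N-point IFFT
        requant = np.fft.fft(quantize_16bit(time_seq))     # blocks 665/670
        errors = ~np.isclose(np.angle(requant), np.angle(encoded),
                             atol=1e-3) & ~unusable        # block 675 comparison
        if not errors.any():
            return quantize_16bit(time_seq)                # block 690: success
        unusable |= errors                                 # block 680: drop them
    raise RuntimeError("frame did not converge")
```

With an identity `embed`, a frame that is already on the 16-bit grid passes the check immediately, which makes the loop easy to smoke-test.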
`
`
As will be apparent to those skilled in the art, a variant of method 600 may equally be applied to alter the right channel and use the left channel as reference. Additionally, to those skilled in the art, a variant of the method 600 may be applied to alter both the left and right channels. Moreover, the method 600 may be applied to just one channel or extended to more channels than just left and right channels.
`
`
Embodiment of a Decoder of an Auxiliary Channel
FIG. 7 depicts an embodiment of a method 700 for a decoder of a masked audio signal. In block 705, the method 700 receives a right channel of an audio signal (e.g., a CD-quality stereo audio signal). In block 710, the method 700 receives a left channel of the audio signal. The method 700 may apply N-point rectangular windows to the left channel and to the right channel to start a short-time analysis of the audio signal, as block 715 illustrates. The value of N should, although not necessarily, match the corresponding value used during encoding; for example, in an embodiment N=1024. In block 720, a first FFT algorithm computes a magnitude spectrum and a phase spectrum of the right channel, illustrated in block 730. In block 725, a second FFT algorithm computes a magnitude spectrum and a phase spectrum of the left channel, illustrated in block 735.

In block 740, the method 700 examines the phase information for every frequency component against an IPD psychoacoustic threshold, expressed in, for example, equation (7) and illustrated in block 750, to detect the presence of encoded data masked into the audio signal. In block 760,
`
the method 700 decodes the encoded information corresponding to the data masked into the audio signal according to the following process:

|phase[X_L(f)] − phase[X_R(f)]| ≤ r1·IPD_max(f)   logical 0   (8.1)

r1·IPD_max(f) < |phase[X_L(f)] − phase[X_R(f)]| ≤ r2·IPD_max(f)   logical 1   (8.2)

|phase[X_L(f)] − phase[X_R(f)]| > r2·IPD_max(f)   no encoding   (8.3)
`
Constants r1 and r2 in equations (8.1), (8.2) and (8.3) specify the ranges of phase differences used in the decoding process to extract logical 0, logical 1, or to indicate that no encoding was included in the particular frequency component under examination. In an embodiment, r1=1/4 and r2=3/4 were used. In this embodiment of method 700, the left channel remains unchanged and it is imperceptibly different from the "original" left channel presented to the encoder, while the right channel has been used as the reference channel in the process and it is quantitatively and perceptually the same as that presented to the encoder. The decoded data in block 760 is identical to the data provided to the encoder in block 650.
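The decoding rule of equations (8.1) through (8.3) can be sketched per frequency component as follows. This is a hypothetical illustration, not the patent's full method 700: the function name is invented, phases are in degrees, the IPD budget follows equation (5), and the bracket constants r1 and r2 are parameters.

```python
import numpy as np

def decode_bits(phase_left_deg, phase_right_deg, freqs_hz, r1=0.25, r2=0.75):
    """Extract masked bits from a stereo pair per equations (8.1)-(8.3).

    Illustrative sketch. Components whose phase difference exceeds
    r2*IPD_max(f), and components with no IPD budget at all (e.g., DC),
    carry no encoding and are skipped.
    """
    ipd_max = np.abs(-3.104e-3 * np.asarray(freqs_hz))  # equation (5), magnitude
    diff = np.abs(np.asarray(phase_left_deg) - np.asarray(phase_right_deg))
    bits = []
    for d, budget in zip(diff, ipd_max):
        if budget == 0.0 or d > r2 * budget:       # equation (8.3): no encoding
            continue
        bits.append(0 if d <= r1 * budget else 1)  # (8.1) vs (8.2)
    return bits
```

A left-channel offset of half the budget at a given frequency falls inside the (r1, r2] bracket and decodes as a logical one, matching an encoder that used k=1/2.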
As will be apparent to those skilled in the art, a variant of method 700 may equally be applied to decode the right channel where the left channel is used as reference. Additionally, to those skilled in the art, a variant of the method 700 may be applied to decode both the left and right channels. Moreover, the method 700 may be applied to just one channel or extended to m