US006996521B2

(12) United States Patent
     Iliev et al.

(10) Patent No.:      US 6,996,521 B2
(45) Date of Patent:  Feb. 7, 2006

(54) AUXILIARY CHANNEL MASKING IN AN AUDIO SIGNAL

(75) Inventors: Alexander I. Iliev, Miami, FL (US);
                Michael S. Scordilis, Miami, FL (US)

(73) Assignee: The University of Miami, Miami, FL (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended
              or adjusted under 35 U.S.C. 154(b) by 839 days.

(21) Appl. No.: 09/969,615

(22) Filed: Oct. 4, 2001

(65) Prior Publication Data
     US 2002/0059059 A1   May 16, 2002

     Related U.S. Application Data
(60) Provisional application No. 60/238,009, filed on Oct. 6, 2000.

(51) Int. Cl.
     G10L 11/00 (2006.01)

(52) U.S. Cl. ................... 704/200; 704/200.1; 704/230; 375/240.03

(58) Field of Classification Search ................ 704/200, 704/200.1, 230;
     128/746; 395/2.14; 375/240.03
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS
     5,645,074 A *  7/1997  Shennib et al. ............. 600/559
     5,682,461 A * 10/1997  Silzle et al. .............. 704/205
     6,738,423 B1 *  5/2004  Lainema et al. ......... 375/240.03

     FOREIGN PATENT DOCUMENTS
     WO   WO 99/11020   3/1999
     WO   WO 00/04662   1/2000

     OTHER PUBLICATIONS
     International Search Report for Application No. PCT/US01/31214, dated
     May 28, 2002.
     XP-001076669, David R. Perrott et al., "Minimum Audible Angle Thresholds
     for Sources Varying in Both Elevation and Azimuth", J. Acoust. Soc. Am.,
     vol. 87, No. 4, Apr. 1990, pp. 1728-1731.

     (Continued)

Primary Examiner-W. R. Young
Assistant Examiner-Jakieda Jackson
(74) Attorney, Agent, or Firm-Christopher & Weisberg, P.A.

(57) ABSTRACT

A method is provided for embedding data into an audio signal and determining
data embedded into an audio signal. In the method for embedding data into an
audio signal, the audio signal is based on a first set of data and includes a
phase component. The method modifies at least a portion of the phase component
of the audio signal to embed a second set of data into the audio signal. The
modified audio signal can be made to differ with respect to the audio signal
in a manner that is at least one of (i) substantially imperceptible and (ii)
imperceptible to a listener of the first set of data, depending on the extent
that the phase component of the audio signal is modified. In the method for
determining data embedded into an audio signal, the audio signal is based on a
first set of data of an original audio signal and includes a phase component.
The method determines a second set of data embedded into the audio signal
based on the phase component of the audio signal. The audio signal differs
with respect to the original audio signal in a manner that is at least one of
(i) substantially imperceptible and (ii) imperceptible to a listener of the
first set of data.

15 Claims, 11 Drawing Sheets

[Front-page figure: block diagram of the encoder of FIG. 6 -- Original Stereo
Audio Signal (Right Channel 605, Left Channel 610), N-point FFT, Magnitude and
Phase Spectrum (630, 635), Psychoacoustical Comparison & Detection against a
Phase Threshold (655), Data to be Masked (650), Masked Channel Encoding,
Erroneous Frequency Components (680), Errors? (675), New Left Channel (690).]
OTHER PUBLICATIONS

XP-001076670, Armin Kohlrausch, "Binaural Masking Experiments Using Noise
Maskers With Frequency-Dependent Interaural Phase Differences. II: Influence
of Frequency and Interaural-Phase Uncertainty", J. Acoust. Soc. Am., vol. 88,
No. 4, Oct. 1990, pp. 1749-1756.
XP-002197648, Tolga Ciloglu et al., "An Improved All-Pass Watermarking Scheme
for Speech and Audio", Proceedings of IEEE International Conference on
Multimedia and Expo, New York, NY, USA, Jul. 30, 2000, vol. 2, pp. 1017-1020.
XP-000635079, W. Bender et al., "Techniques for Data Hiding", IBM Systems
Journal, vol. 35, Nos. 3 and 4, 1996, pp. 313-335.
John F. Tilki et al., "Encoding a Hidden Auxiliary Channel Onto a Digital
Audio Signal Using Psychoacoustic Masking", Proceedings of IEEE Southeastcon
'97, Blacksburg, VA, USA, Apr. 12-14, 1997, pp. 331-363.

* cited by examiner

[Drawing sheets 1-11 of U.S. Patent 6,996,521 B2 (Feb. 7, 2006). The sheets are
images; only their captions and principal labels are reproduced here.]

Sheet 1  - FIG. 1: sound source on the azimuth plane and the audio signal
           segments arriving at the listener's left and right ears.
Sheet 2  - FIG. 2: geometry of a sound source on the azimuth plane (azimuth
           angle, source distance, interaural distance).
Sheet 3  - FIG. 3: ear distances and their difference versus azimuth for a
           source distance of r = 5 m.
Sheet 4  - FIG. 4: interaural phase difference (degrees) versus frequency (Hz)
           for source azimuths of 1, 2, and 3 degrees.
Sheet 5  - FIG. 5: minimum audible angle (azimuth degrees) as a function of
           frequency for zero azimuth and zero elevation.
Sheet 6  - FIG. 6: encoder block diagram (original stereo audio signal, N-point
           window and FFT, magnitude and phase spectra, psychoacoustical
           comparison and detection against a phase threshold, data to be
           masked, masked channel encoding, 16-bit quantization, erroneous
           frequency component check, new left channel).
Sheet 7  - FIG. 7: decoder block diagram (audio signal with masked data,
           N-point FFT, magnitude and phase spectra, psychoacoustical
           comparison and detection against a phase threshold, decoded masked
           channel).
Sheet 8  - FIG. 8: flow chart 800 for embedding data (receive a signal
           including a phase component; modify the phase component of the
           signal).
Sheet 9  - FIG. 9: flow chart for determining embedded data (receive a signal
           based on a first set of data; determine a second set of data
           embedded into the signal).
Sheet 10 - FIG. 10: apparatus with receiver, transmitter, processor, and
           memory.
Sheet 11 - FIG. 11: machine-readable medium.

AUXILIARY CHANNEL MASKING IN AN AUDIO SIGNAL

This application claims the benefit of U.S. Provisional Application No.
60/238,009, filed Oct. 6, 2000, which is incorporated in this application by
this reference.

BACKGROUND

1. Field of the Invention

This invention generally relates to the field of signal processing and
communications. More particularly, the present invention relates to embedding
data into an audio signal and detecting data embedded into an audio signal.

2. Description of Background Information

The efficient and secure storage and distribution of digital audio signals are
becoming issues of considerable importance for the information revolution
currently unfolding. The challenges of the storage and distribution of such
signals arise particularly from the digital nature of modern audio. Most
modern digital audio allows for the creation of unlimited, perfect copies and
may be easily and massively distributed via the Internet. Nevertheless, such
digital nature also makes possible the adoption of intelligent techniques that
can contribute to the control of unauthorized copying and distribution of
multimedia information comprising audio. In addition, opportunities arise
whereby digital audio may be used as a medium for the delivery of enhanced
services and for a more gratifying audio and/or visual experience.
Audio delivery through a network (e.g., the Internet), presented as a
stand-alone service or as part of a multimedia presentation, comes in a large
range of perceived qualities. Signal quality depends on the audio content
(e.g., speech and music), the quality of the original recording, the available
channel bandwidth, and real-time transmission constraints. Real-time Internet
audio usually applies to broadcasting services. It is generally achieved by
streaming audio, which is decoded at a receiving workstation. Real-time
transmission requirements impose limitations on signal quality. At present,
audio streaming delivers quality comparable to AM radio.

By relaxing real-time constraints, new opportunities for services have
appeared where the quality and security of the transmitted audio is enhanced.
Such services include the secure downloading of CD-quality music at
transmission rates that are too high for real-time transmission but lower than
the CD standard. Such signal compression capitalizes on psychoacoustic
properties of human hearing.

Security and authentication of audio distributed over networks (e.g.,
non-homogeneous networks) is also often required, in addition to low bit rates
that do not compromise
audio quality. Moreover, perceptual coding may be used for the insertion of
new, secure information to an original audio signal in a way that this
information remains inaudible and extractable by secure means. This process is
generally referred to as watermarking.

Simultaneous frequency masking is used to implement perceptual coding and
transparent watermarking in digital audio. Frequency masking is a property of
hearing that renders audio signal components in a frequency region inaudible
if a component of higher energy is in the same vicinity. The ability of the
dominant component to mask others depends on its relative energy and on its
proximity to the other audio signal components. In addition to simultaneous
frequency masking, temporal masking is used to reduce pre-echoes and
post-echoes resulting from signal processing.

While masking in the power spectrum of auditory signals dominates audio coding
and watermarking techniques, the phase information has not been involved to
date (see, e.g., Nyquist & Brand, Measurements of Phase Distortion, BELL SYS.
TECH. J., Vol. 7, 522-49 (1930); D. Preis, Phase and Phase Equalization in
Audio Signal Processing - A Tutorial Review, J. AUDIO ENGINEERING SOCIETY,
Vol. 30, No. 11, 774-94 (1982)).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts relative phase and intensity differences due to a source
located on an azimuth plane.

FIG. 2 depicts a sound source on an azimuth plane.

FIG. 3 depicts, for a sound source distance of r=5 m, plots of the distances
between a sound source and each of two ears, and a plot of the differences of
the distances between the sound source and each of the two ears.

FIG. 4 depicts interaural phase differences (in degrees) plotted against
frequency for azimuth angles of 1, 2, and 3 degrees.

FIG. 5 depicts minimum audible angle values as a function of frequency for
zero azimuth and zero elevation.

FIG. 6 depicts an embodiment of a method for an encoder of an auxiliary
channel.

FIG. 7 depicts an embodiment of a method for a decoder of an auxiliary
channel.

FIG. 8 depicts an embodiment of a method for embedding data into an audio
signal.

FIG. 9 depicts an embodiment of a method for determining data embedded into
an audio signal.

FIG. 10 depicts an embodiment of an apparatus for embedding data into an
audio signal and/or determining data embedded into an audio signal.

FIG. 11 depicts an embodiment of a machine-readable medium having encoded
information, which when read and executed by a machine causes a method for
embedding data into an audio signal and/or determining data embedded into an
audio signal.

DETAILED DESCRIPTION

An embodiment of a method allows for signal processing to include information,
such as an auxiliary channel, in an audio signal (e.g., a stereo audio
signal), in a manner that is not perceptible by listening. This included
information does not limit the capacity of data that the original, unmodified
audio signal may contain. The method, for example, uses principles of binaural
hearing, and the minimum audible angle ("MAA") in particular, which is the
minimum detectable angular displacement of a sound source in auditory space.
The method then varies a phase spectrum (e.g., a short-time phase spectrum) of
an audio signal within a range controlled by the MAA to encode information
(e.g., digital information) in various forms (e.g., text, images, speech, and
music) into the phase spectrum of the audio signal.

This method is simpler to compute and to implement than simultaneous frequency
masking in the power spectrum, mentioned above, and allows for simultaneous
encoding of a masked digital multimedia signal. The method allows, in effect,
for the "hiding" (e.g., masking) of considerably more information in an audio
signal than simultaneous frequency masking. Also, the method may allow for the
inclusion of audibly imperceptible parallel (e.g., information related to the
audio signal) and/or secure (e.g., encrypted) information in an audio signal.
As used herein, the term audio signal encompasses any type of signal
comprising audio. In addition to including traditional stand-alone audio, the
term audio signal also encompasses any audio that is a component of a signal
including other types of data. For example, the term audio signal as used
herein extends to audio components of multimedia signals, of video signals,
etc. Furthermore, as used herein, an auxiliary channel is simply a form of
data or information that may be embedded into an audio signal and/or detected
as embedded in an audio signal. While the information in an auxiliary channel
as used herein may be in stream format such as audio or video, the information
and data that may be embedded and/or detected in an audio signal may also be
in non-stream format such as one or more images or items of text.

The detailed description refers to the accompanying drawings that illustrate
embodiments of the present invention. Other embodiments are possible and
modifications may be made to the embodiments without departing from the spirit
and scope of the invention. Therefore, the detailed description is not meant
to limit the invention. Rather, the scope of the invention is defined by the
appended claims and their equivalents.

Embodiment of a Binaural Hearing Phase Tolerance Model

Binaural Phase Information in Sound Source Localization

To estimate direction and distance (i.e., location) of a sound source, a
listener uses binaural (both ears) audible information. This may be achieved
by the brain processing binaural differential information and includes:

  interaural phase or time difference (IPD/ITD);
  interaural intensity or loudness difference (IID/ILD); and
  spectral notches, whose locations depend on the elevation angle of incidence
  of the sound wavefront.
FIG. 1 depicts an example of a sound source located on the azimuth plane and
of plausible audio signal segments arriving at a listener's ears. The sound
source in FIG. 1 is closer to the right ear and, as a result, sound from the
sound source arrives at the right ear earlier than at the left ear. Since all
sounds travel with equal speed in space, the frequency-dependent time
difference perceived is in the interaural phase difference, or IPD. For a
source at a fixed distance, a minimum angular movement may be detectable by
listening. This MAA may be dependent on elevation, azimuth, and frequency. The
changes in phase and intensity may vary as a sound source is moved around the
head of a listener.

The IID/ILD is another type of binaural difference perceived. In FIG. 1, the
audio signal at the right ear has higher intensity or loudness than the audio
signal arriving at the left ear because of the inverse square distance law
applying to spherical wave propagation in free space, as well as the
contribution of the acoustic shadow of the head falling on the left
hemisphere.

MAA/IPD Relationship

FIG. 2 illustrates a geometrical relationship of parameters related to a sound
source located on the azimuth plane comprising a listener's ears (i.e., a
horizontal plane). The MAA plays a significant role in sound localization. In
FIG. 2, θ is an azimuth angle, r is a distance from the sound source to the
center of the listener's head, and d is an interaural distance. The distance
of the source from the right ear is Δr and from the left ear is Δl, whereas
Δd is their difference, which may be expressed as:

$$\Delta r^2 = (r\cos\theta)^2 + \left(r\sin\theta - \tfrac{d}{2}\right)^2, \qquad (1)$$

$$\Delta l^2 = (r\cos\theta)^2 + \left(r\sin\theta + \tfrac{d}{2}\right)^2, \qquad (2)$$

and

$$\Delta d = \Delta r - \Delta l. \qquad (3)$$

FIG. 3 depicts a plot of exemplary Δr, Δl and Δd for: a source distance of
r=5 meters, an interaural distance of d=0.17 m (i.e., a typical interaural
distance for an adult listener), zero elevation, and an azimuth angle changing
over a complete revolution around the listener. Δd is independent of source
distance (see above). The IPD is a function of frequency and may be expressed
as:

$$\Phi = \Delta d\,\frac{f}{c}\times 360^{\circ}
  \quad\text{or}\quad
  \Phi = \Delta d\,\frac{f}{c}\times 2\pi\ \text{radians}, \qquad (4)$$

where Φ is the resulting IPD, f is the frequency of a sound source, and c is
the speed of sound in air. FIG. 4 illustrates a plot of Φ for azimuth angles
of 1°, 2° and 3°, where c=344 m/s.
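
As a numerical illustration of equations (1) through (4) (a sketch added for
this discussion, not part of the patent text), the Python fragment below
computes the path-length difference and the resulting IPD for the example
geometry used here (r=5 m, d=0.17 m, c=344 m/s); the function name and
parameter defaults are illustrative assumptions.

```python
import math

def interaural_phase_difference(theta_deg, f_hz, r=5.0, d=0.17, c=344.0):
    """Path-length difference and IPD for a source on the azimuth plane.

    theta_deg : azimuth angle of the source, in degrees
    f_hz      : frequency of the emitted tone, in Hz
    r, d, c   : source distance (m), interaural distance (m), speed of sound (m/s)
    """
    theta = math.radians(theta_deg)
    # Equations (1) and (2): distances from the source to the right and left ears.
    delta_r = math.sqrt((r * math.cos(theta)) ** 2 + (r * math.sin(theta) - d / 2) ** 2)
    delta_l = math.sqrt((r * math.cos(theta)) ** 2 + (r * math.sin(theta) + d / 2) ** 2)
    delta_d = delta_r - delta_l              # equation (3)
    phi_deg = delta_d * (f_hz / c) * 360.0   # equation (4), in degrees
    return delta_d, phi_deg

# A 1 kHz tone at 1 degree azimuth gives an IPD of roughly -3.1 degrees,
# consistent with the 1-degree MAA bound of equation (5) below.
print(interaural_phase_difference(1.0, 1000.0))
```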
`
On the one hand, the IPD detectable by the human auditory system for a source
moving on the azimuth plane is a function of the MAA, as expressed in
equations (1) to (4) and depicted in FIG. 4. On the other hand, the MAA is a
function of source location and frequency, with highest sensitivity
corresponding to source movements confined to the azimuth plane, such as, for
example, in a forward direction (e.g., azimuth angle θ=0°) (see W. A. Yost,
FUNDAMENTALS OF HEARING (1993)). FIG. 5 depicts a plot of the MAA as a
function of frequency for θ=0°.

FIG. 5 illustrates that, in the most sensitive region of the acoustic space
(e.g., zero azimuth and zero elevation), the MAA ranges from about 1° to 2.5°.
Angular values may be smallest for frequencies below 1 kHz and increase for
higher frequencies. If an MAA of 1° is assumed (the conservative, worst-case
selection), then the resulting maximum imperceptible IPD may be expressed as:

$$\mathrm{IPD}_{max} = -3.104\times 10^{-3}\, f \ \ \text{(degrees)}. \qquad (5)$$

As such, the maximum IPD values may range from 0° at DC (i.e., 0 Hz) to -68.3°
at 22 kHz. Source movements that result in IPD values within that
frequency-dependent upper bound may not be detectable by the human auditory
system.
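
For reference (a sketch, not part of the patent text), equation (5) can be
tabulated directly; the values below reproduce the 0° to roughly -68.3° range
quoted above.

```python
def ipd_max_deg(f_hz):
    # Equation (5): maximum imperceptible IPD, in degrees, assuming an MAA of 1 degree.
    return -3.104e-3 * f_hz

for f in (0, 1000, 5000, 22000):
    print(f"{f:>6} Hz  ->  IPD_max = {ipd_max_deg(f):8.2f} degrees")
# 0 Hz -> 0.00, 22000 Hz -> about -68.29, matching the bound stated in the text.
```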
`
Inaudible Random Phase Distortion for Stereo Audio Signals

The analysis described above is based on a sound source emitting a pure
sinusoidal tone and localized by binaural hearing. For a group of sound
sources, the MAA and IPD results would be valid for such sources emitting the
same tone. Principles of linearity and superposition suggest that a group of
sound sources emitting identical pure tones at the same loudness levels may
not be able to be distinguished into individual sound sources, provided that
their locations are within an MAA corresponding to their spatial location. As
such, a pair of identical sound sources will be perceived to be fused to a
single sound source if their separation is smaller than the corresponding MAA
of the region containing such sound sources or if the resulting IPD is below
computed maximum limits (i.e., a threshold).
In an experiment, a stereo audio signal consisting of identical tones was
synthesized and an image of one channel was moved by increasing a phase
difference between the two channels. Listening tests confirmed that IPDs
corresponding to an MAA of between 1° and 2° were not detectable. Such
observations were in agreement with results reported in the relevant
scientific literature (see, e.g., W. A. Yost, FUNDAMENTALS OF HEARING (1993)).

A set of experiments was then conducted to determine the extent to which the
principles of linearity and superposition apply in a case of complex acoustic
stimuli, as opposed to a case of pure tones. Using Fourier analysis (e.g.,
using the Fast Fourier Transform (FFT) algorithm), audio signals may be
expressed as a weighted sum of sinusoids at different oscillating frequencies
and different phase values.

Short-time Fourier analysis, for example, was performed on speech and music
stereo audio signals sampled at 44,100 Hz. The FFT was applied on 1024-point
rectangular windows. The resulting frequency components, located about 21.5 Hz
apart, were considered as independent stimuli. The FFT algorithm provided the
phase value of each component in modulo 2π form. Because a 2π rotation of a
sound source on a particular plane corresponds to one full rotation, the phase
was not unwrapped to its principal value. The number of rotations a source may
have made is of no consequence.

The cosine of a phase difference between right and left channels (e.g., a
stereo pair) was used to test the corresponding IPD. When

$$\cos[\,\mathrm{phase}(\mathrm{right}, f_i) - \mathrm{phase}(\mathrm{left}, f_i)\,]
  > \cos(-3.104\times 10^{-3}\, f_i), \qquad (6)$$

where f_i is frequency samples from 0 Hz to 44,100/2 Hz and phase is in
degrees, the phase information of the stereo pair at f_i was considered
blurred. All such components were identified on a short-time basis; their
right channel was left intact, while the phase of their left channel at f_i
was randomly changed up to the value of IPD_max corresponding to f_i. The
altered stereo audio signal was resynthesized through an Inverse Fast Fourier
Transform (IFFT) algorithm. Listening tests, where subjects were presented
with the original and the processed stereo audio signals, revealed that it was
not possible to distinguish between the two stereo audio signals. Thus, the
linearity and superposition principles were proven to be valid for the given
conditions; thereby, the results for pure tone audio signals may be extended
to complex acoustic stimuli.

Listening tests for MAAs of 2° and 3° were also performed, with various types
of audio selected as the masker audio signal and with broadband noise being
the data, in the form of another audio signal, masked into the masker signal.
When 3° was used for the MAA, the effected changes were perceivable for all
masker audio signals and all listeners. When 2° was used for the MAA, the
change to the audio signal remained nearly unnoticeable for rock music, and
somewhat audible for speech and classical music. For the case of θ=1°,
however, the broadband noise was successfully masked for all masker audio
signals and all listeners, confirming θ=1° as a possible maximum unnoticeable
angular displacement of the sound source from the median plane.

Having extended the MAA results for azimuth angles to complex acoustic stimuli
and determined that the phase spectrum of audio signals may be randomly
disturbed within the IPD bounds resulting from the MAA, masking meaningful
information into an audio signal was performed. Frequency components having an
IPD below an audible threshold may be identified and set to zero to achieve
signal compression. Also, new information may be included as an auxiliary
channel that is not audibly perceptible to a listener (e.g., watermarked). The
auxiliary channel may be made further secure by encryption.

Embodiment of an Encoder of an Auxiliary Channel

FIG. 6 depicts an embodiment of a method 600 for an encoder of a masked audio
signal. In block 605, the method 600 receives a right channel of an audio
signal (e.g., a CD-quality stereo audio signal). In block 610, the method 600
receives a left channel of the audio signal. The method 600 may perform a
complete analysis-resynthesis loop, and may apply N-point rectangular windows
to the left channel and to the right channel to start a short-time analysis of
the audio signal, as block 615 illustrates. In block 620, a first FFT
algorithm computes a magnitude spectrum and a phase spectrum of the right
channel, illustrated in block 635. In block 625, a second FFT algorithm
computes a magnitude spectrum and a phase spectrum of the left channel,
illustrated in block 630. In an embodiment, a 1024-point rectangular window
(e.g., N=1024) is applied.

In block 640, the method 600 compares the phase difference between the left
channel and the right channel for each frequency component against an IPD
psychoacoustic threshold, expressed in, for example, equation (6) where the
MAA=1°, and illustrated in block 645. Phase components outside the threshold
may be left untouched and passed on for synthesis. The remaining components
are part of the encoding space.

In block 650, method 600 receives data to be masked into the audio signal. For
the case of encoding a single bit per frequency component, whenever a logical
zero is being encoded, for example, the phase values of the left channel and
the right channel may be made equal. For the case of a logical one being
encoded, for example, the phase difference between the two channels may be
made less than or equal to the maximum permissible IPD for that frequency
component. The method 600 may use a 1-bit encoding process as follows:

$$\mathrm{phase}[X_L(f)] = \mathrm{phase}[X_R(f)] \qquad \text{logical 0} \qquad (7.1)$$

$$\mathrm{phase}[X_L(f)] - \mathrm{phase}[X_R(f)] = k\cdot \mathrm{IPD}_{max}(f) \qquad \text{logical 1} \qquad (7.2)$$

$$\big|\mathrm{phase}[X_L(f)] - \mathrm{phase}[X_R(f)]\big| > \mathrm{IPD}_{max}(f) \qquad \text{no encoding} \qquad (7.3)$$

The approach taken in this process is to use the right channel as reference
and to alter the phase of the left channel. Constant k in equation (7.2)
specifies the amount of phase difference within the IPD threshold which would
denote a logical one. In an embodiment, k= was used.
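
The selection rule of equation (6) and the 1-bit rules of equations (7.1)
through (7.3) can be sketched in code as follows. This is an illustration
only, not the patent's implementation: the function names, the frame handling,
and the default k=0.5 are assumptions (the value of k is left to the
particular embodiment), and the quantization-error check of blocks 660 through
690 described below is omitted.

```python
import numpy as np

def ipd_max_deg(freqs_hz):
    # Equation (5): maximum imperceptible IPD (degrees) for an assumed MAA of 1 degree.
    return -3.104e-3 * freqs_hz

def encode_frame(left, right, bits, fs=44100, k=0.5):
    """Embed up to one bit per usable frequency component of an N-point stereo frame.

    left, right : time-domain frames of equal length N (rectangular window)
    bits        : sequence of 0/1 values to embed
    k           : fraction of the IPD bound denoting a logical 1 (illustrative value)
    Returns the modified left-channel frame; the right channel is the reference.
    """
    n = len(left)
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    ipd = ipd_max_deg(freqs)                     # negative bound, in degrees

    phase_l = np.degrees(np.angle(L))
    phase_r = np.degrees(np.angle(R))
    diff = phase_l - phase_r

    # Equation (6): a component is "blurred" (usable for encoding) when the cosine
    # of its interaural phase difference exceeds cos(IPD_max) at that frequency.
    usable = np.cos(np.radians(diff)) > np.cos(np.radians(ipd))
    usable[0] = usable[-1] = False               # DC and Nyquist phases must stay real

    bit_iter = iter(bits)
    for i in np.nonzero(usable)[0]:
        try:
            bit = next(bit_iter)
        except StopIteration:
            break                                # no more data for this frame
        # Equations (7.1) and (7.2): logical 0 -> equal phases; logical 1 -> a
        # phase difference of k * IPD_max at this frequency component.
        new_phase = phase_r[i] + (k * ipd[i] if bit else 0.0)
        L[i] = np.abs(L[i]) * np.exp(1j * np.radians(new_phase))

    # Components outside the bound are left untouched (equation (7.3)).
    return np.fft.irfft(L, n=n)
```

In the patent's method 600, each frame produced this way is then resynthesized,
16-bit quantized, and re-checked for quantization errors, as described next for
blocks 655 through 690.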
`
In block 655, the method 600 collects all the frequency components of the left
channel, both those altered as well as those left unchanged by the application
of the psychoacoustical threshold comparison, and constructs the new frequency
representation of the left channel, which now contains the masked data.

In block 660, method 600 computes the N-point IFFT of the new left channel to
produce its time-domain representation. This is followed by a 16-bit
quantization of the produced time sequence, which is a preferred
representation of digital audio signals (e.g., CD-quality audio).
The effects of quantization noise on the masking process are tested in method
600 by employing an N-point FFT in block 670 that converts the obtained time
sequence of the new left channel back into the frequency domain. Block 675
compares the frequency representation of the new left channel obtained via
high-precision arithmetic and available at block 655 against its
representation which has been subjected to 16-bit quantization in the time
domain. If the quantization has disturbed the representation of the masked
data, then the erroneous frequency components are detected and rendered
unusable by the masking process by making their phases large enough to escape
encoding in the next round. This is achieved in block 680 by making the phase
of the erroneous frequency components correspond to 120% of the IPD of that
frequency location. The new phase profile of the left channel is again
presented to block 655 for encoding the masked data via block 640. This
testing cycle repeats until no errors are detected in the masking process. If
the inserted data masked in a given N-point audio signal frame has not been
altered by the quantization process, and therefore no errors were detected,
then the encoding process is declared successful and the new N points of the
left channel are presented for storage or transmission at block 690. This
encoding process continues with subsequent N-point frames of the original
audio signal until no more data are left in block 650.
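
The comparison step of blocks 670 and 675 can be illustrated with the
self-contained sketch below, which measures how far 16-bit quantization moves
the per-component phase of a frame. The surrounding iterate-and-exclude loop
(marking erroneous components at 120% of the IPD, block 680) is described in
the text but not reproduced here, and the helper names and the noise-frame
example are assumptions.

```python
import numpy as np

def quantize_16bit(x):
    # Model CD-style 16-bit quantization of a float frame scaled to [-1, 1).
    return np.round(x * 32767.0) / 32767.0

def phase_disturbance_deg(high_precision_frame, quantized_frame):
    """Per-component phase change (degrees) introduced by time-domain quantization,
    i.e. the quantity compared in blocks 670/675 of FIG. 6 (simplified)."""
    p_hi = np.angle(np.fft.rfft(high_precision_frame))
    p_q = np.angle(np.fft.rfft(quantized_frame))
    # Wrap the difference to (-180, 180] degrees before reporting it.
    return np.degrees(np.angle(np.exp(1j * (p_q - p_hi))))

# Example: a 1024-point noise frame shows how much each bin's phase can move
# after 16-bit quantization; bins whose shift would flip a decoded bit are the
# "erroneous frequency components" excluded from encoding on the next pass.
rng = np.random.default_rng(0)
frame = 0.5 * rng.standard_normal(1024)
shift = phase_disturbance_deg(frame, quantize_16bit(frame))
print(f"largest per-bin phase shift: {np.abs(shift).max():.3f} degrees")
```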
`
As will be apparent to those skilled in the art, a variant of method 600 may
equally be applied to alter the right channel and use the left channel as
reference. Additionally, to those skilled in the art, a variant of the method
600 may be applied to alter both the left and right channels. Moreover, the
method 600 may be applied to just one channel or extended to more channels
than just left and right channels.

Embodiment of a Decoder of an Auxiliary Channel

FIG. 7 depicts an embodiment of a method 700 for a decoder of a masked audio
signal. In block 705, the method 700 receives a right channel of an audio
signal (e.g., a CD-quality stereo audio signal). In block 710, the method 700
receives a left channel of the audio signal. The method 700 may apply N-point
rectangular windows to the left channel and to the right channel to start a
short-time analysis of the audio signal, as block 715 illustrates. The value
of N should generally, although not necessarily, match the corresponding value
used during encoding; for example, in an embodiment N=1024. In block 720, a
first FFT algorithm computes a magnitude spectrum and a phase spectrum of the
right channel, illustrated in block 730. In block 725, a second FFT algorithm
computes a magnitude spectrum and a phase spectrum of the left channel,
illustrated in block 735.

In block 740, the method 700 examines the phase information for every
frequency component against an IPD psychoacoustic threshold, expressed in, for
example, equation (7), and illustrated in block 750, to detect the presence of
encoded data masked into the audio signal. In block 760,
the method 700 decodes the encoded information corresponding to the data
masked into the audio signal according to the following process:

$$\big|\mathrm{phase}[X_L(f)] - \mathrm{phase}[X_R(f)]\big| \le r_1\,\mathrm{IPD}_{max}(f) \qquad \text{logical 0} \qquad (8.1)$$

$$r_1\,\mathrm{IPD}_{max}(f) < \big|\mathrm{phase}[X_L(f)] - \mathrm{phase}[X_R(f)]\big| \le r_2\,\mathrm{IPD}_{max}(f) \qquad \text{logical 1} \qquad (8.2)$$

$$\big|\mathrm{phase}[X_L(f)] - \mathrm{phase}[X_R(f)]\big| > r_2\,\mathrm{IPD}_{max}(f) \qquad \text{no encoding} \qquad (8.3)$$

Constants r_1 and r_2 in equations (8.1), (8.2) and (8.3) specify the ranges
of phase differences used in the decoding process to extract logical 0,
logical 1, or to indicate that no encoding was included in the particular
frequency component under examination. In an embodiment, r_1=1/4 and r_2=3/4
were used.
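
A matching decoder sketch is given below. As with the encoder sketch above, it
is illustrative rather than the patent's implementation; the defaults r1=0.25
and r2=0.75 simply mirror the values given above and, like the helper names,
should be treated as assumptions.

```python
import numpy as np

def ipd_max_deg(freqs_hz):
    # Equation (5): maximum imperceptible IPD (degrees) for an assumed MAA of 1 degree.
    return -3.104e-3 * freqs_hz

def decode_frame(left, right, fs=44100, r1=0.25, r2=0.75):
    """Recover embedded bits from one N-point stereo frame (method 700, FIG. 7).

    r1, r2 : decision thresholds of equations (8.1)-(8.3), as fractions of the
             IPD bound (illustrative defaults).
    """
    n = len(left)
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    bound = np.abs(ipd_max_deg(freqs))           # magnitude of the bound, degrees

    # Interaural phase difference wrapped to (-180, 180] degrees, magnitude only.
    diff = np.abs(np.degrees(np.angle(L * np.conj(R))))

    bits = []
    for i in range(1, len(freqs) - 1):           # skip DC and Nyquist, which carry no data
        if diff[i] <= r1 * bound[i]:             # equation (8.1): logical 0
            bits.append(0)
        elif diff[i] <= r2 * bound[i]:           # equation (8.2): logical 1
            bits.append(1)
        # equation (8.3): a larger difference means no data was encoded here
    return bits
```

Applied to a frame produced by the encoder sketch before quantization, this
recovers the embedded bits in frequency-bin order; note that any usable bins
the encoder left unmodified also fall below the bound, so a practical system
would need a length or framing convention on top of this per-bin rule.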
In this embodiment of method 700, the left channel remains unchanged and it is
imperceptibly different from the "original" left channel presented to the
encoder, while the right channel has been used as the reference channel in the
process and it is quantitatively and perceptually the same as that presented
to the encoder. The decoded data in block 760 is identical to the data
provided to the encoder in block 650.
As will be apparent to those skilled in the art, a variant of method 700 may
equally be applied to decode the right channel where the left channel is used
as reference. Additionally, to those skilled in the art, a variant of the
method 700 may be applied to decode both the left and right channels.
Moreover, the method 700 may be applied to just one channel or extended to m
