(12) United States Patent
Iliev et al.

(10) Patent No.: US 6,996,521 B2
(45) Date of Patent: Feb. 7, 2006

(54) AUXILIARY CHANNEL MASKING IN AN AUDIO SIGNAL

(75) Inventors: Alexander I. Iliev, Miami, FL (US);
Michael S. Scordilis, Miami, FL (US)

(73) Assignee: The University of Miami, Miami, FL (US)

(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 839 days.

(21) Appl. No.: 09/969,615

(22) Filed: Oct. 4, 2001

(65) Prior Publication Data
US 2002/0059059 A1    May 16, 2002

Related U.S. Application Data
(60) Provisional application No. 60/238,009, filed on Oct.
6, 2000.

(51) Int. Cl.
G10L 11/00 (2006.01)
(52) U.S. Cl. 704/200; 704/200.1; 704/230; 375/240.03
(58) Field of Classification Search 704/200,
704/200.1, 230; 128/746; 395/2.14; 375/240.03
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,645,074 A *   7/1997   Shennib et al.   600/559
5,682,461 A *   10/1997  Silzle et al.    704/205
6,738,423 B1 *  5/2004   Lainema et al.   375/240.03

FOREIGN PATENT DOCUMENTS
WO   WO 99/11020   3/1999
WO   WO 00/04662   1/2000

OTHER PUBLICATIONS
International Search Report for Application No. PCT/US
01/31214, dated May 28, 2002.
XP-001076669, David R. Perrott et al., “Minimum Audible
Angle Thresholds for Sources Varying in Both Elevation and
Azimuth”, J. Acoust. Soc. Am., vol. 87 No. 4, Apr. 1990, pp.
1728-1731.

(Continued)

Primary Examiner: W. R. Young
Assistant Examiner: Jakieda Jackson
(74) Attorney, Agent, or Firm: Christopher & Weisberg,
P.A.

(57) ABSTRACT

A method is provided for embedding data into an audio
signal and determining data embedded into an audio signal.
In the method for embedding data into an audio signal, the
audio signal is based on a first set of data and includes a
phase component. The method modifies at least a portion of
the phase component of the audio signal to embed a second
set of data into the audio signal. The modified audio signal
can be made to differ with respect to the audio signal in a
manner at least one of (i) substantially imperceptible and (ii)
imperceptible to a listener of the first set of data depending
on the extent that the phase component of the audio signal
is modified. In the method for determining data embedded
into an audio signal, the audio signal is based on a first set
of data of an original audio signal and includes a phase
component. The method determines a second set of data
embedded into the audio signal based on the phase component
of the audio signal. The audio signal differs with respect
to the original audio signal in a manner that is at least one
of (i) substantially imperceptible and (ii) imperceptible to a
listener of the first set of data.

15 Claims, 11 Drawing Sheets
`
[Representative drawing (encoder block diagram, FIG. 6), image not reproduced. Recoverable block labels: Original Stereo Audio Signal; Right Channel; Left Channel; N-point FFT; Magnitude and Phase Spectrum; Comparison & Detection; Psychoacoustical Phase Threshold; Data to be Masked; Masked Channel Encoding; N-point IFFT; 16 bit Quantization; Alter Erroneous Frequency Components; New Left Channel.]

`Sony Exhibit 1027
`Sony v. MZ Audio
`
`
`
`OTHER PUBLICATIONS
`
XP-001076670, Armin Kohlrausch, “Binaural Masking
Experiments Using Noise Maskers With Frequency-Dependent
Interaural Phase Differences. II: Influence of Frequency
and Interaural-Phase Uncertainty”, J. Acoust. Soc. Am., vol.
88 No. 4, Oct. 1990, pp. 1749-1756.
XP-002197648, Tolga Ciloglu et al., “An Improved All-Pass
Watermarking Scheme for Speech and Audio”, Proceedings
of IEEE International Conference on Multimedia and Expo,
New York, NY, USA, Jul. 30, 2000, vol. 2, pp. 1017-1020.
XP-000635079, W. Bender et al., “Techniques for Data
Hiding”, IBM Systems Journal, vol. 35, Nos. 3 and 4, 1996,
pp. 313-335.
John F. Tilki et al., “Encoding a Hidden Auxiliary Channel
Onto a Digital Audio Signal Using Psychoacoustic Masking”,
Proceedings of IEEE Southeastcon '97, Blacksburg,
VA, USA, Apr. 12-14, 1997, pp. 331-363.

* cited by examiner
`
`
`
U.S. Patent    Feb. 7, 2006    Sheets 1 to 11 of 11    US 6,996,521 B2

[Sheet 1: FIG. 1, image not reproduced]
[Sheet 2: FIG. 2, image not reproduced]
[Sheet 3: FIG. 3, plot of right distance, left distance, and their difference against azimuth angle in degrees]
[Sheet 4: FIG. 4, plot of interaural phase difference (IPD) against frequency for azimuth angles of 1, 2, and 3 degrees]
[Sheet 5: FIG. 5, plot of MAA (azimuth degrees) against frequency]
[Sheet 6: FIG. 6, encoder block diagram]
[Sheet 7: FIG. 7, decoder block diagram]
[Sheet 8: FIG. 8, flowchart]
[Sheet 9: FIG. 9, flowchart]
[Sheet 10: FIG. 10, apparatus block diagram with transceiver, processor, and memory]
[Sheet 11: FIG. 11, image not reproduced]
`
AUXILIARY CHANNEL MASKING IN AN AUDIO SIGNAL
`
`This application claims the benefit of U.S. Provisional
`Application No. 60/238,009, filed Oct. 6, 2000, which is
`incorporated in this application by this reference.
`
`BACKGROUND
`
`1. Field of the Invention
This invention generally relates to the field of signal
processing and communications. More particularly, the
present invention relates to embedding data into an audio
signal and detecting data embedded into an audio signal.
`2. Description of Background Information
The efficient and secure storage and distribution of digital
audio signals are becoming issues of considerable importance
for the information revolution currently unfolding.
The challenges of the storage and distribution of such
signals arise particularly from the digital nature of modern
audio. Most modern digital audio allows for the creation of
unlimited, perfect copies and may be easily and massively
distributed via the Internet. Nevertheless, such digital nature
also makes possible the adoption of intelligent techniques
that can contribute to the control of unauthorized copying
and distribution of multimedia information comprising
audio. In addition, opportunities arise whereby digital audio
may be used as a medium for the delivery of enhanced
services and for a more gratifying audio and/or visual
experience.
Audio delivery through a network (e.g., the Internet),
presented as a stand-alone service or as part of a multimedia
presentation, comes in a large range of perceived qualities.
Signal quality depends on the audio content (e.g., speech and
music), the quality of the original recording, the available
channel bandwidth, and real-time transmission constraints.
Real-time Internet audio usually applies to broadcasting
services. It is generally achieved by streaming audio, which
is decoded at a receiving workstation. Real-time transmission
requirements impose limitations on signal quality. At
present, audio streaming delivers quality comparable to AM
radio.
`
By relaxing real-time constraints, new opportunities for
services have appeared where the quality and security of the
transmitted audio are enhanced. Such services include the
`secure downloading of CD-quality music at transmission
`rates that are too high for real-time transmission but lower
`than the CD standard. Such signal compression capitalizes
`on psychoacoustic properties of human hearing.
`Security and authentication of audio distributed over
`networks (e.g., non-homogeneous networks) is also often
`required, in addition to low bit rates that do not compromise
audio quality. Moreover, perceptual coding may be used for
the insertion of new, secure information into an original audio
signal in such a way that this information remains inaudible and
extractable by secure means. This process is generally
referred to as watermarking.
Simultaneous frequency masking is used to implement
perceptual coding and transparent watermarking in digital
audio. Frequency masking is a property of hearing that
renders audio signal components in a frequency region
inaudible if a component of higher energy is in the same
vicinity. The ability of the dominant component to mask
others depends on its relative energy and on its proximity to
the other audio signal components. In addition to simultaneous
frequency masking, temporal masking is used to
reduce pre-echoes and post-echoes resulting from signal
processing.
`While masking in the power spectrum of auditory signals
`dominates audio coding and watermarking techniques, the
`phase information has not been involved to date (see, e.g.,
Nyquist & Brand, Measurements of Phase Distortion, BELL
`SYS. TECH. J., Vol. 7, 522-49 (1930); D. Preis, Phase and
`Phase Equalization in Audio Signal Processing—A Tutorial
`Review, J. AUDIO ENGINEERING SOCIETY, Vol. 30, No.
`11, 774-94 (1982)).
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 depicts relative phase and intensity differences due
`to a source located on an azimuth plane.
`FIG. 2 depicts a sound source on an azimuth plane.
FIG. 3 depicts, for a sound source distance of r=5 m, plots
of distances between a sound source and each of two ears,
and a plot of the differences of the distances between the
sound source and each of the two ears.
FIG. 4 depicts interaural phase differences (in degrees)
plotted against frequency for azimuth angles of 1, 2, and 3
degrees.
FIG. 5 depicts minimum audible angle values as a function
of frequency for zero azimuth and zero elevation.
FIG. 6 depicts an embodiment of a method for an encoder
of an auxiliary channel.
FIG. 7 depicts an embodiment of a method for a decoder
of an auxiliary channel.
FIG. 8 depicts an embodiment of a method for embedding
data into an audio signal.
`FIG. 9 depicts an embodiment of a method for determin-
`ing data embedded into an audio signal.
`FIG. 10 depicts an embodiment of an apparatus for
`embedding data into an audio signal and/or determining data
`embedded into an audio signal.
`FIG. 11 depicts an embodiment of a machine-readable
`medium having encoded information, which when read and
`executed by a machine causes a method for embedding data
`into an audio signal and/or determining data embedded into
`an audio signal.
`
`DETAILED DESCRIPTION
`
An embodiment of a method allows for signal processing
to include information, such as an auxiliary channel, in an
audio signal (e.g., a stereo audio signal), in a manner that is
not perceptible by listening. This included information does
not limit the capacity of data that the original, unmodified
audio signal may contain. The method, for example, uses
principles of binaural hearing, and the minimum audible
angle (“MAA”) in particular, which is the minimum detectable
angular displacement of a sound source in auditory
space. The method then varies a phase spectrum (e.g.,
short-time phase spectrum) of an audio signal within a range
controlled by the MAA to encode information (e.g., digital
information) in various forms (e.g., text, images, speech, and
music) into the phase spectrum of the audio signal.
This method is simpler to compute and to implement than
`simultaneous frequency masking in the power spectrum,
`mentioned above, and allows for simultaneous encoding of
`a masked digital multimedia signal. The method allows, in
`effect, for the “hiding” (e.g., masking) of considerably more
`information in an audio signal than simultaneous frequency
`masking. Also, the method may allow for the inclusion of
`audibly imperceptible parallel (e.g., information related to
`the audio signal) and/or secure (e.g., encrypted) information
`in an audio signal.
As used herein, the term audio signal encompasses any
type of signal comprising audio. In addition to including
traditional stand-alone audio, the term audio signal also
encompasses any audio that is a component of a signal
including other types of data. For example, the term audio
signal as used herein extends to audio components of
multimedia signals, of video signals, etc. Furthermore, as
used herein, an auxiliary channel is simply a form of data or
information that may be embedded into an audio signal
and/or detected as embedded in an audio signal. While the
information in an auxiliary channel as used herein may be in
stream format such as audio or video, the information and
data that may be embedded and/or detected in an audio
signal may also be in non-stream format such as one or more
images or items of text.
`The detailed description refers to the accompanying draw-
`ings that illustrate embodiments of the present invention.
`Other embodiments are possible and modifications may be
`made to the embodiments without departing from the spirit
`and scope of the invention. Therefore, the detailed descrip-
`tion is not meant to limit the invention. Rather the scope of
`the invention is defined by the appended claims, and their
`equivalents.
`
`Embodiment of a Binaural Hearing Phase Tolerance Model
`Binaural Phase Information in Sound Source Localization
`
To estimate direction and distance (i.e., location) of a
`sound source, a listener uses binaural (both ears) audible
`information. This may be achieved by the brain processing
`binaural differential information and includes:
`
`interaural phase or time difference (IPD/ITD);
`interaural intensity or loudness difference (IID/ILD); and
`spectral notches, whose locations depend on the elevation
`angle of incidence of the sound wavefront.
`FIG. 1 depicts an example of a sound source located on
`the azimuth plane and of plausible audio signal segments
`arriving at a listener’s ears. The sound source in FIG. 1 is
`closer to the right ear and as a result sound from the sound
`source arrives at the right ear earlier than the left ear. Since
all sounds travel with equal speed in space, the frequency-dependent
time difference perceived is in the interaural
phase difference or IPD. For a source at a fixed distance, a
minimum angular movement may be detectable by listening.
This MAA may be dependent on elevation, azimuth, and
frequency. The changes in phase and intensity may vary as
a sound source is moved around the head of a listener.
`
The IID/ILD is another type of binaural difference perceived.
In FIG. 1, the audio signal at the right ear has higher
intensity or loudness than the audio signal arriving at the left
ear because of the inverse square distance law applying to
spherical wave propagation in free space, as well as the
contribution of the acoustic shadow of the head falling on
the left hemisphere.
`
`MAA/IPD Relationship
FIG. 2 illustrates a geometrical relationship of parameters
related to a sound source located on the azimuth plane
comprising a listener’s ears (i.e., a horizontal plane). The
MAA plays a significant role in sound localization. In FIG.
2, θ is an azimuth angle, r is a distance from the sound source
to the center of the listener’s head, and d is an interaural
distance. The distance of the source from the right ear is Δr
and from the left ear is Δl, whereas Δd is their difference,
which may be expressed as:

$$\Delta r^2 = (r\cos\theta)^2 + \left(r\sin\theta - \frac{d}{2}\right)^2 \qquad (1)$$

$$\Delta l^2 = (r\cos\theta)^2 + \left(r\sin\theta + \frac{d}{2}\right)^2 \qquad (2)$$

$$\Delta d = \Delta r - \Delta l \qquad (3)$$
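
As a numerical check of equations (1) to (3), the following minimal Python sketch evaluates the two ear distances and their difference. The function name `ear_distances` and the convention that θ is measured from straight ahead and is positive toward the right ear are assumptions made for illustration; they are not taken from the patent text.

```python
import math

def ear_distances(r, d, theta_deg):
    """Return (delta_r, delta_l, delta_d) following equations (1) to (3).

    r: distance from the source to the head centre (m)
    d: interaural distance (m)
    theta_deg: azimuth in degrees, assumed positive toward the right ear
    """
    theta = math.radians(theta_deg)
    x = r * math.cos(theta)               # component along the forward axis
    y = r * math.sin(theta)               # component along the interaural axis
    delta_r = math.hypot(x, y - d / 2)    # distance to the right ear, eq. (1)
    delta_l = math.hypot(x, y + d / 2)    # distance to the left ear, eq. (2)
    return delta_r, delta_l, delta_r - delta_l   # difference, eq. (3)

dr, dl, dd = ear_distances(r=5.0, d=0.17, theta_deg=1.0)
print(f"delta_r={dr:.5f} m  delta_l={dl:.5f} m  delta_d={dd:.6f} m")
```

With the source slightly to the right, the difference comes out negative because the right ear is closer, and its magnitude stays close to d·sin θ for a source several metres away.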
`
FIG. 3 depicts a plot of exemplary Δr, Δl and Δd for:
source distance of r=5 meters, interaural distance of d=0.17
m (i.e., a typical interaural distance for an adult listener),
zero elevation, and azimuth angle changing over a complete
revolution around the listener. Δd is independent of source
distance (see above). The IPD is a function of frequency and
may be expressed as:

$$\Phi = \Delta d \cdot \left(\frac{f}{c}\right) \cdot 360^\circ \quad \text{or} \quad \Phi = \Delta d \cdot \left(\frac{f}{c}\right) \cdot 2\pi \ \text{radians}, \qquad (4)$$

where Φ is the resulting IPD, f is the frequency of a sound
source, and c is the speed of sound in air. FIG. 4 illustrates
a plot of Φ for azimuth angles of 1°, 2° and 3°, where c=344
m/s.
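
Equation (4) can be evaluated directly. The sketch below is illustrative only: the helper names `path_difference` and `ipd_degrees` are hypothetical, and the geometry values (r=5 m, d=0.17 m) are borrowed from the FIG. 3 example on the assumption that the same values underlie FIG. 4.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, the value quoted for FIG. 4

def path_difference(r, d, theta_deg):
    """delta_d = delta_r - delta_l from equations (1) to (3)."""
    theta = math.radians(theta_deg)
    x, y = r * math.cos(theta), r * math.sin(theta)
    return math.hypot(x, y - d / 2) - math.hypot(x, y + d / 2)

def ipd_degrees(delta_d, freq_hz, c=SPEED_OF_SOUND):
    """Interaural phase difference in degrees, equation (4)."""
    return delta_d * (freq_hz / c) * 360.0

# IPD at 1 kHz for the three azimuth angles plotted in FIG. 4.
for azimuth in (1.0, 2.0, 3.0):
    dd = path_difference(r=5.0, d=0.17, theta_deg=azimuth)
    print(azimuth, round(ipd_degrees(dd, 1000.0), 3))
```

Because Δd does not depend on frequency, each azimuth angle gives a straight line through the origin when the IPD is plotted against frequency.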
`
On the one hand, the IPD detectable by the human
auditory system for a source moving on the azimuth plane is
a function of the MAA, as expressed in equations (1) to (4),
and depicted in FIG. 4. On the other hand, the MAA is a
function of source location and frequency, with highest
sensitivity corresponding to source movements confined to
the azimuth plane, such as, for example, in a forward
direction (e.g., azimuth angle, θ=0°) (see W. A. Yost,
FUNDAMENTALS OF HEARING (1993)). FIG. 5 depicts a plot
of the MAA as a function of frequency for θ=0°.
FIG. 5 illustrates that, in the most sensitive region of the
acoustic space (e.g., zero azimuth and zero elevation), the
MAA ranges from about 1° to 2.5°. Angular values may be
smallest for frequencies below 1 kHz and increase for higher
frequencies. If an MAA of 1° is assumed (the conservative,
worst case selection), then the resulting maximum imperceptible
IPD may be expressed as:

$$\mathrm{IPD}_{max} = -3.104\times10^{-3}\cdot f \ \text{(degrees)} \qquad (5)$$

As such, the maximum IPD values may range from 0° at DC
(i.e., 0 Hz) to -68.3° at 22 kHz. Source movements that
result in IPD values within that frequency-dependent upper
bound may not be detectable by the human auditory system.
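
The coefficient in equation (5) can be checked by hand. The worked calculation below is a sketch that assumes the FIG. 3 geometry (r=5 m, d=0.17 m), c=344 m/s, and θ=1°; the patent does not spell out these intermediate steps.

```latex
\begin{aligned}
\Delta r &= \sqrt{(5\cos 1^\circ)^2 + (5\sin 1^\circ - 0.085)^2} \approx 4.99924\ \text{m},\\
\Delta l &= \sqrt{(5\cos 1^\circ)^2 + (5\sin 1^\circ + 0.085)^2} \approx 5.00221\ \text{m},\\
\Delta d &= \Delta r - \Delta l \approx -2.966\times10^{-3}\ \text{m},\\
\mathrm{IPD}_{max} &= \Delta d\cdot\frac{f}{344}\cdot 360^\circ \approx -3.104\times10^{-3}\cdot f\ \text{degrees},\\
\mathrm{IPD}_{max}(22\ \text{kHz}) &\approx -68.3^\circ .
\end{aligned}
```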
`
`
`
Inaudible Random Phase Distortion for Stereo Audio Signals
The analysis described above is based on a sound source
emitting a pure sinusoidal tone and localized by binaural
hearing. For a group of sound sources, the MAA and IPD
results would be valid for such sources emitting the same
tone. Principles of linearity and superposition suggest that a
group of sound sources emitting identical pure tones at the
same loudness levels may not be able to be distinguished
into individual sound sources provided that their locations
are within an MAA corresponding to their spatial location.
As such, a pair of identical sound sources will be perceived
to be fused to a single sound source if their separation is
smaller than the corresponding MAA of the region containing
such sound sources or if the resulting IPD is below
computed maximum limits (i.e., a threshold).
In an experiment, a stereo audio signal consisting of
identical tones was synthesized and an image of one channel
was moved by increasing a phase difference between the two
channels. Listening tests confirmed that IPDs corresponding
to an MAA of between 1° and 2° were not detectable. Such
observations were in agreement with results reported in the
relevant scientific literature (see, e.g., W. A. Yost,
FUNDAMENTALS OF HEARING (1993)).
A set of experiments was then conducted to determine the
extent to which the principles of linearity and superposition
apply in a case of complex acoustic stimuli, as opposed to
a case of pure tones. Using Fourier analysis (e.g., using the
Fast Fourier Transform (FFT) algorithm), audio signals may
be expressed as a weighted sum of sinusoids at different
oscillating frequencies and different phase values.
Short-time Fourier analysis, for example, was performed
on speech and music stereo audio signals sampled at 44,100
Hz. FFT was applied on 1024-point rectangular windows.
The resulting frequency components located at about each
21.5 Hz apart were considered as independent stimuli. The
FFT algorithm provided the phase value of each component
in modulo 2π form. Because a 2π rotation of a sound source
on a particular plane corresponds to one full rotation, the
phase was not unwrapped to its principal value. The number
of rotations a source may have made is of no consequence.
The cosine of a phase difference between right and left
channels (e.g., a stereo pair) was used to test the corresponding
IPD. When

$$\cos\left[\mathrm{phase}(\mathrm{right}, f_i) - \mathrm{phase}(\mathrm{left}, f_i)\right] > \cos\left(-3.104\times10^{-3}\cdot f_i\right) \qquad (6)$$

where f_i is a frequency sample from 0 Hz to 44,100/2 Hz, and
phase is in degrees, the phase information of the stereo pair
at f_i was considered blurred. All such components were
identified on a short-time basis; their right channel was left
intact, while the phase of their left channel at f_i was
randomly changed up to the value of IPD_max corresponding to
f_i. The altered stereo audio signal was resynthesized through
an Inverse Fast Fourier Transform (IFFT) algorithm.
Listening tests, where subjects were presented with the original
and the processed stereo audio signals, revealed that it was
not possible to distinguish between the two stereo audio
signals. Thus, the linearity and superposition principles were
proven to be valid for the given conditions; thereby, the
results for pure tone audio signals may be extended to
complex acoustic stimuli.
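
The test in equation (6) can be sketched in a few lines of Python. The function names `ipd_max_deg` and `is_blurred` are hypothetical, and the strict inequality is an assumption about how components exactly on the bound are treated.

```python
import math

IPD_SLOPE_DEG_PER_HZ = -3.104e-3  # coefficient of equation (5), MAA of 1 degree

def ipd_max_deg(freq_hz):
    """Maximum imperceptible IPD in degrees at the given frequency, equation (5)."""
    return IPD_SLOPE_DEG_PER_HZ * freq_hz

def is_blurred(phase_right_deg, phase_left_deg, freq_hz):
    """Equation (6): True when the inter-channel phase difference at this
    frequency lies inside the imperceptible IPD bound."""
    diff = math.radians(phase_right_deg - phase_left_deg)
    bound = math.radians(ipd_max_deg(freq_hz))
    return math.cos(diff) > math.cos(bound)

print(is_blurred(10.0, 10.5, 1000.0))    # 0.5 degrees apart, bound is 3.104 degrees
print(is_blurred(0.0, 90.0, 1000.0))     # 90 degrees apart
print(is_blurred(10.0, 370.5, 1000.0))   # same as the first case after a full turn
```

Comparing cosines makes the test insensitive to the sign of the difference and to whole 2π wraps, which matches the remark that the phase was left in modulo 2π form. The comparison is meaningful here because the bound stays well below 180° over the 0 Hz to 22 kHz range.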
Listening tests for MAA of 2° and 3° were also performed
with various types of audio selected as the masker audio
signal and with broadband noise being the data, in the form
of another audio signal, masked into the masker signal.
When 3° was used for the MAA, the resulting changes were
perceivable for all masker audio signals and all listeners.
When 2° was used for the MAA, the change to the audio
signal remained nearly unnoticeable for rock music, and
somewhat audible for speech and classical music. For the
case of θ=1°, however, the broadband noise was successfully
masked for all masker audio signals and all listeners,
confirming θ=1° as a possible maximum unnoticeable
angular displacement of the sound source from the median
plane.
Having extended the MAA results for azimuth angles to
complex acoustic stimuli and determined that the phase
spectrum of audio signals may be randomly disturbed within
the IPD bounds resulting from the MAA, masking meaningful
information into an audio signal was performed.
Frequency components having an IPD below an audible
threshold may be identified and set to zero to achieve signal
compression. Also, new information may be included as an
auxiliary channel that is not audibly perceptible to a listener
(e.g., watermarked). The auxiliary channel may be made
further secure by encryption.
`
`Embodiment of an Encoder of an Auxiliary Channel
FIG. 6 depicts an embodiment of a method 600 for an
encoder of a masked audio signal. In block 605, the method
600 receives a right channel of an audio signal (e.g., a
CD-quality stereo audio signal). In block 610, the method
600 receives a left channel of the audio signal. The method
600 may perform a complete analysis-resynthesis loop, and
may apply N-point rectangular windows to the left channel
and to the right channel to start a short-time analysis of the
audio signal, as block 615 illustrates. In block 620, a first
FFT algorithm computes a magnitude spectrum and a phase
spectrum of the right channel, illustrated in block 635. In
block 625, a second FFT algorithm computes a magnitude
spectrum and a phase spectrum of the left channel, illustrated
in block 630. In an embodiment, a 1024-point rectangular
window (e.g., N=1024) is applied.
In block 640, the method 600 compares the phase difference
between the left channel and the right channel for each
frequency component against an IPD psychoacoustic threshold,
expressed in, for example, equation (6) where the
MAA=1° and illustrated in block 645. Phase components
outside the threshold may be left untouched and passed on
for synthesis. The remaining components are part of the
encoding space.
In block 650, method 600 receives data to be masked into
the audio signal. For the case of encoding a single bit per
frequency component, whenever a logical zero is being
encoded, for example, the phase values of the left channel
and the right channel may be made equal. For the case of a
logical one being encoded, for example, the phase difference
between the two channels may be made less than or equal to the
maximum permissible IPD for that frequency component.
The method 600 may use a 1-bit encoding process as
follows:

$$\mathrm{phase}[X_L(f)] = \mathrm{phase}[X_R(f)] \quad \text{logical 0} \qquad (7.1)$$

$$\mathrm{phase}[X_L(f)] = k\,\mathrm{IPD}_{max}(f) \quad \text{logical 1} \qquad (7.2)$$

$$\mathrm{phase}[X_L(f)] \geq \mathrm{IPD}_{max}(f) \quad \text{no encoding} \qquad (7.3)$$
`
The approach taken in this process is to use the right channel
as reference and to alter the phase of the left channel.
Constant k in equation (7.2) specifies the amount of phase
difference within the IPD threshold which would denote a
`
`logical one. In an embodiment, k= was used.
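
The 1-bit rule of equations (7.1) and (7.2) can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the function name `encode_bit` is hypothetical, the logical-one offset is applied relative to the right (reference) channel, the magnitude of the bound in equation (5) is used, and k is left as a required argument because its value is not legible in this copy of the text.

```python
IPD_SLOPE_DEG_PER_HZ = -3.104e-3  # coefficient of equation (5)

def encode_bit(phase_right_deg, freq_hz, bit, k):
    """Return a new left-channel phase (degrees) that carries one bit.

    The right channel is the untouched reference. k is the fraction of the
    IPD bound used to signal a logical one (assumed to satisfy 0 < k <= 1).
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k must lie in (0, 1]")
    ipd_max = abs(IPD_SLOPE_DEG_PER_HZ * freq_hz)   # magnitude of the bound
    if bit == 0:
        return phase_right_deg                      # equation (7.1)
    if bit == 1:
        return phase_right_deg + k * ipd_max        # equation (7.2), offset from the reference
    raise ValueError("bit must be 0 or 1")

# k=0.5 is an arbitrary value chosen only for this illustration.
print(encode_bit(30.0, 1000.0, 0, k=0.5))
print(encode_bit(30.0, 1000.0, 1, k=0.5))
```

Components that fail the threshold test of block 640 would simply bypass this function, which corresponds to the no-encoding case of equation (7.3).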
`
`
`
In block 655, the method 600 collects all the frequency
components of the left channel, both those altered and
those left unchanged by the application of the psychoacoustical
threshold comparison, and constructs the new frequency
representation of the left channel, which now contains the
masked data.
`
In block 660, method 600 computes the N-point IFFT of
the new left channel to produce its time-domain representation.
This is followed by a 16 bit quantization of the
produced time sequence, which is a preferred representation
of digital audio signals (e.g., CD-quality audio).
The effects of quantization noise on the masking process
are tested in method 600 by employing an N-point FFT in
block 670 that converts the obtained time sequence of the
new left channel back into the frequency domain. Block 675
compares the frequency representation of the new left channel
obtained via high-precision arithmetic and available at
block 655 against its representation which has been subjected
to 16-bit quantization in the time domain. If the
quantization has disturbed the representation of the masked
data, then the erroneous frequency components are detected
and rendered unusable by the masking process by making
their phases large enough to escape encoding in the next
round. This is achieved in block 680 by making the phase of
the erroneous frequency components correspond to 120% of
the IPD of that frequency location. The new phase profile of
the left channel is again presented to block 655 for encoding
the masked data via block 640. This testing cycle repeats
until no errors are detected in the masking process. If the
inserted data masked in a given N-point audio signal frame
has not been altered by the quantization process, and therefore
no errors were detected, then the encoding process is
declared successful and the new N points of the left
channel are presented for storage or transmission at block
690. This encoding process continues with subsequent
N-point frames of the original audio signal until no more
data are left in block 650.
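
The round trip through 16-bit quantization can be illustrated with a toy frame. The sketch below makes several assumptions that are not in the patent: a plain O(N²) DFT stands in for the N-point FFT and IFFT, the frame has only 16 points, the spectrum values are arbitrary, and a bin is flagged when its complex value moves by more than a chosen phase tolerance could explain (the patent does not state the comparison metric). All function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Plain O(N^2) discrete Fourier transform (stand-in for an N-point FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Inverse DFT of a conjugate-symmetric spectrum, returning real samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def quantize_16bit(samples):
    """Round samples in [-1, 1] to signed 16-bit steps and rescale."""
    return [max(-32768, min(32767, round(s * 32767))) / 32767 for s in samples]

def disturbed_bins(spectrum, tol_deg):
    """Indices of non-empty positive-frequency bins whose value moved, after a
    16-bit round trip, by more than a phase error of tol_deg could explain."""
    n = len(spectrum)
    round_trip = dft(quantize_16bit(idft_real(spectrum)))
    limit = math.sin(math.radians(tol_deg))
    return [k for k in range(1, n // 2)
            if abs(spectrum[k]) > 0
            and abs(round_trip[k] - spectrum[k]) > abs(spectrum[k]) * limit]

# Toy 16-point frame: a strong component in bin 2 and a very weak one in bin 5.
N = 16
frame = [0j] * N
frame[2] = 4 * cmath.exp(0.3j)
frame[5] = 1e-9 * cmath.exp(1.0j)
frame[N - 2] = frame[2].conjugate()
frame[N - 5] = frame[5].conjugate()
print(disturbed_bins(frame, tol_deg=0.1))
```

The weak component falls far below one quantization step, so its phase cannot survive the round trip, while the strong component is essentially unaffected. In the loop described above, a flagged bin would then have its phase moved to 120% of the IPD bound so that the next encoding pass skips it; that step is omitted from this sketch.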
`
As will be apparent to those skilled in the art, a variant of
method 600 may equally be applied to alter the right channel
and use the left channel as reference. Additionally, as will be
apparent to those skilled in the art, a variant of the method 600
may be applied to alter both the left and right channels.
Moreover, the method 600 may be applied to just one channel or
extended to more channels than just left and right channels.
`
`Embodiment of a Decoder of an Auxiliary Channel
FIG. 7 depicts an embodiment of a method 700 for a
decoder of a masked audio signal. In block 705, the method
700 receives a right channel of an audio signal (e.g., a
CD-quality stereo audio signal). In block 710, the method
700 receives a left channel of the audio signal. The method
700 may apply N-point rectangular windows to the left
channel and to the right channel to start a short-time analysis
of the audio signal, as block 715 illustrates. The value of N
should, although need not necessarily, match the corresponding
value used during encoding; for example, in an embodiment
N=1024. In block 720, a first FFT algorithm computes a
magnitude spectrum and a phase spectrum of the right
channel, illustrated in block 730. In block 725, a second FFT
algorithm computes a magnitude spectrum and a phase
spectrum of the left channel, illustrated in block 735.
In block 740, the method 700 examines the phase information
for every frequency component against an IPD
psychoacoustic threshold, expressed in, for example, equation
(7) and illustrated in block 750, to detect the presence
of encoded data masked into the audio signal. In block 760,
the method 700 decodes the encoded information corresponding
to the data masked into the audio signal according
to the following process:

$$\left|\mathrm{phase}[X_L(f)] - \mathrm{phase}[X_R(f)]\right| \leq r_1\,\mathrm{IPD}_{max}(f) \quad \text{logical 0} \qquad (8.1)$$

$$r_1\,\mathrm{IPD}_{max}(f) < \left|\mathrm{phase}[X_L(f)] - \mathrm{phase}[X_R(f)]\right| \leq r_2\,\mathrm{IPD}_{max}(f) \quad \text{logical 1} \qquad (8.2)$$

$$\left|\mathrm{phase}[X_L(f)] - \mathrm{phase}[X_R(f)]\right| > r_2\,\mathrm{IPD}_{max}(f) \quad \text{no encoding} \qquad (8.3)$$
`
Constants r1 and r2 in equations (8.1), (8.2) and (8.3) specify
the ranges of phase differences used in the decoding process
to extract logical 0, logical 1 or to indicate that no encoding
was included in the particular frequency component under
`examination. In an embodiment, r,= and r,=*%4 were used.
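
The three decoding ranges can be sketched as a small classifier. The names `wrap_deg` and `decode_bit` are hypothetical, the magnitude of the bound in equation (5) is used, the phase difference is wrapped to a half turn before comparison (an assumption), and r1 and r2 are required arguments because their values are not fully legible in this copy of the text.

```python
IPD_SLOPE_DEG_PER_HZ = -3.104e-3  # coefficient of equation (5)

def wrap_deg(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def decode_bit(phase_left_deg, phase_right_deg, freq_hz, r1, r2):
    """Classify one frequency component following equations (8.1) to (8.3).

    Returns 0, 1, or None when the component carries no data.
    """
    ipd_max = abs(IPD_SLOPE_DEG_PER_HZ * freq_hz)
    diff = abs(wrap_deg(phase_left_deg - phase_right_deg))
    if diff <= r1 * ipd_max:
        return 0       # equation (8.1)
    if diff <= r2 * ipd_max:
        return 1       # equation (8.2)
    return None        # equation (8.3)

# r1=0.25 and r2=0.75 are illustrative values chosen for this sketch only.
print(decode_bit(30.0, 30.0, 1000.0, r1=0.25, r2=0.75))
print(decode_bit(31.552, 30.0, 1000.0, r1=0.25, r2=0.75))
print(decode_bit(35.0, 30.0, 1000.0, r1=0.25, r2=0.75))
```

Using a band around each nominal phase value, rather than an exact match, leaves room for the small phase errors introduced by 16-bit quantization.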
In this embodiment of method 700, the left channel
remains unchanged and it is imperceptibly different from the
“original” left channel presented to the encoder, while the
right channel has been used as the reference channel in the
process and it is quantitatively and perceptually the same as
that presented to the encoder. The decoded data in block 760
is identical to the data provided to the encoder in block 650.
As will be apparent to those skilled in the art, a variant of
method 700 may equally be applied to decode the right
channel where the left channel is used as reference.
Additionally, as will be apparent to those skilled in the art, a
variant of the method 700 may be applied to decode both the
left and right channels. Moreover, the method 700 may be
applied to just one channel or extended to more channels than
just left and right channels.
`
`Embodiment for Encoding and Decoding a Plurality of Bits
`into an Audio Signal
The method in an embodiment described above concerned
the encoding of a single bit per frequency component. A
method of another embodiment, however, is provided for
increasing the masked auxiliary channel capacity by encoding
more complicated bit patterns in every suitable frequency
location, in part by relying on the finding that the
IPD threshold increases linearly with frequency, as illustrated
in FIG. 4. The method may encode multiple bits per
frequency component by setting uniform quantization levels
in phase for frequency components that satisfy the IPD
threshold test. The number of quantization steps may be kept
constant through the usable frequency range by increasing
their size linearly with frequency, following the linear increase
of the phase threshold.
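
The uniform phase quantization described above can be sketched as follows. The mapping of an M-bit word w to the fraction w/2^M of the IPD bound is an assumption that generalizes the M=2 levels listed in this section (0, 0.25, and 0.5 of the bound); the nearest-level decoder, the use of the bound's magnitude, and the function names are likewise hypothetical.

```python
IPD_SLOPE_DEG_PER_HZ = -3.104e-3  # coefficient of equation (5)

def encode_word(phase_right_deg, freq_hz, word, bits):
    """Left-channel phase (degrees) for an M-bit word using 2**M uniform levels."""
    levels = 2 ** bits
    if not 0 <= word < levels:
        raise ValueError("word does not fit in the given number of bits")
    step = abs(IPD_SLOPE_DEG_PER_HZ * freq_hz) / levels   # step grows linearly with f
    return phase_right_deg + word * step

def decode_word(phase_left_deg, phase_right_deg, freq_hz, bits):
    """Nearest-level decoding; returns None when the difference is out of range."""
    levels = 2 ** bits
    step = abs(IPD_SLOPE_DEG_PER_HZ * freq_hz) / levels
    index = round(abs(phase_left_deg - phase_right_deg) / step)
    return index if index < levels else None

for word in range(4):
    left = encode_word(10.0, 2000.0, word, bits=2)
    print(format(word, "02b"), round(left, 3), decode_word(left, 10.0, 2000.0, bits=2))
```

Because the step is a fixed fraction of the bound, the number of levels stays constant across frequency while the absolute spacing in degrees widens at higher frequencies.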
For the case of M multiple-bits-per-frequency-component
encoding, the IPD may be segmented into intervals equal in
number to 2^M, where M is the number of bits to be encoded
in a frequency component. For example, for M=2, the
following process may be used:

$$\mathrm{phase}[X_L(f)] = \mathrm{phase}[X_R(f)] \quad \text{word 00} \qquad (9.1)$$

$$\mathrm{phase}[X_L(f)] = 0.25\,\mathrm{IPD}_{max}(f) \quad \text{word 01}$$

$$\mathrm{phase}[X_L(f)] = 0.5\,\mathrm{IPD}_{max}(f) \quad \text{word 10}$$