`
`Jaap Haitsma, Michiel van der Veen, Ton Kalker and Fons Bruekers
`Philips Research Laboratories
`Prof. Holstlaan 4
`5656 AA Eindhoven, The Netherlands
[jaap.haitsma][michiel.van.der.veen][ton.kalker][fons.bruekers]@philips.com
`
`ABSTRACT
Based on existing technology used in image and video watermarking,
we have developed a robust audio watermarking technique. The
embedding algorithm operates in the frequency domain, where the
magnitudes of the Fourier coefficients are slightly modified. In the
temporal domain, an additional scale parameter and gain function are
necessary to refine the watermark and achieve perceptual
transparency. Watermark detection relies on the Symmetrical Phase
Only Matched Filtering (SPOMF) cross-correlation approach. Not only
the presence of a watermark, but also its cyclic shift is detected.
This shift supports a multi-bit payload for one particular watermark
sequence. The watermarking technology proved to be very robust to a
large number of signal processing "attacks" such as MP3 (64 kb/s),
all-pass filtering, echo addition, time-scale modification,
resampling, noise addition, etc. It is expected that this approach
may contribute to a wide variety of existing (e.g. monitoring and
copy protection) and future applications.
`
`Keywords
`audio, broadcast monitoring, copy protection, watermark
`embedding, watermark detection
`
1. INTRODUCTION
A digital audio watermark is an information label that is embedded
in an audio signal in an imperceptible manner. During the past few
years a number of new audio watermarking techniques have been
developed to support applications such as copy control [1] [2] or
broadcast monitoring [3]. Most of these operate in the time domain
and employ methods such as echo-hiding [4] or some kind of noise
addition, exploiting temporal and/or spectral masking models of the
human auditory system [5] [6].
`
Based on image and video watermarking techniques [3] [7] we have
developed an alternative approach to audio watermarking. Similar to
the work of Piva et al. [2], watermark embedding is performed in the
frequency domain. The principles of spectral masking are exploited
in a relatively simple manner by slightly modifying magnitudes of
the Fourier coefficients. The embedding algorithm is complemented
with a detection procedure adapted from cross-correlation techniques
used in image registration [9] and video watermarking [3] [8]. The
combination of both algorithms offers several advantages in terms of
robustness to some trivial signal processing "attacks" (e.g.
all-pass filtering). In this paper, we introduce both embedding and
detection algorithms and discuss briefly some key aspects such as
payload, perceptual transparency, robustness and detection
reliability.

Permission to make digital or hard copies of all or part of this
work for personal or classroom use is granted without fee provided
that copies are not made or distributed for profit or commercial
advantage and that copies bear this notice and the full citation on
the first page. To copy otherwise, to republish, to post on servers
or to redistribute to lists, requires prior specific permission
and/or a fee.
ACM Multimedia Workshop Marina Del Rey CA USA
Copyright ACM 2000 1-58113-311-1/00/11...$5.00
`
`2. EMBEDDING
A sketch of our watermark embedding algorithm is displayed in
Figure 1. A random watermark sequence W(k) is drawn from a normal
distribution with mean 0 and standard deviation 1. A cyclically
shifted version W_s(k) is used to achieve a multi-bit payload for
one particular watermark sequence W(k). Every possible shift may be
associated with a different information label. Therefore, the
payload is determined by the watermark size: a watermark of N
samples admits N distinct shifts and hence carries at most log2(N)
bits (e.g. a 1024-sample watermark corresponds to a payload of at
most 10 bit).
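
The relation between watermark length, cyclic shift and payload can
be illustrated with a minimal Python sketch. The function names and
the shift convention W_s(k) = W((k - s) mod N) are assumptions made
for illustration; the text does not fix them.

```python
import random

def payload_bits(n_samples: int) -> int:
    """Whole number of bits addressable by n_samples distinct cyclic shifts."""
    return n_samples.bit_length() - 1   # floor(log2(n_samples)) for n_samples >= 1

def cyclic_shift(w, s):
    """Return W_s(k) = W((k - s) mod N); the shift s plays the role of the label."""
    s %= len(w)
    return w[-s:] + w[:-s] if s else list(w)

random.seed(1)
W = [random.gauss(0.0, 1.0) for _ in range(1024)]   # W(k) drawn from N(0, 1)
W_s = cyclic_shift(W, 512)                           # label 512 encoded as a shift
print(payload_bits(len(W)))                          # 10
```

Because the label is carried by the shift alone, one stored sequence
W(k) is enough for the detector to test all labels.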
`
The dominant part of the perceptually weighted watermark w(n) is
derived in the Fourier domain, where spectral masking is exploited
in a relatively simple manner. First, the audio signal x(n) is
segmented into frames and transformed to the frequency domain. Here,
the magnitudes of its Fourier coefficients are slightly modified by
utilizing the shifted watermark sequence W_s(k):

    W'_i(k) = W_s(k) X_i(k),    (1)

where i indicates the frame number, X_i(k) the spectral
representation of the frame x_i(n), and W'_i(k) the resulting
frequency domain watermark. Note that the frame size is a trade-off
between perceptual transparency (small frame sizes) and detection
reliability (large frame sizes). Several experiments have
demonstrated that, in general, frame sizes of 2048 samples provide a
good compromise in this trade-off.
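
A minimal NumPy sketch of Equation 1 for a single frame is given
below. It assumes non-overlapping 2048-sample frames, a real FFT
whose first 1024 bins each receive one watermark value, and a zeroed
Nyquist bin; none of these details is specified in the text, and the
helper names are hypothetical.

```python
import numpy as np

FRAME = 2048   # frame size reported as a good compromise
N_WM = 1024    # watermark length; one value per bin is an assumption

def make_watermark(seed: int, shift: int) -> np.ndarray:
    """Draw W(k) from N(0, 1) and return the cyclically shifted W_s(k)."""
    rng = np.random.default_rng(seed)
    return np.roll(rng.standard_normal(N_WM), shift)

def frame_watermark(x_frame: np.ndarray, w_s: np.ndarray) -> np.ndarray:
    """Equation 1 followed by the inverse transform: w_i(n) = F^-1(W_s(k) X_i(k))."""
    X = np.fft.rfft(x_frame)              # FRAME // 2 + 1 complex bins
    W_prime = np.zeros_like(X)
    W_prime[:N_WM] = w_s * X[:N_WM]       # real weights scale magnitudes only
    return np.fft.irfft(W_prime, n=FRAME)

rng = np.random.default_rng(0)
x_frame = rng.standard_normal(FRAME)      # stand-in for one audio frame
w_s = make_watermark(seed=42, shift=512)
w_frame = frame_watermark(x_frame, w_s)
```

Since W_s(k) multiplies X_i(k), the watermark energy in each bin
follows the energy of the host signal in that bin, which is the
simple form of spectral masking the text refers to.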
`
Inverse Fourier transforms F^{-1} are used to reconstruct the
time-domain watermark representation w(n). Shaping the watermark in
the frequency domain (Equation 1) is not sufficient to assure
perceptual transparency. Since fixed-length Fourier transforms do
not provide accurate time-localization, watermarks computed in the
frequency domain will spread in time over the entire analysis
window. This may result in
perceptual distortions such as pre-echoes. Therefore, an additional
scale parameter α and gain function g(n) are introduced to refine
the watermark in the temporal domain:

    y(n) = x(n) + α g(n) w(n),    (2)

where α is the global scale parameter, g(n) a data-dependent gain
function with values between 0 and 1, and y(n) the watermarked
audio.
`
Analogous to the frame size, α is a parameter that influences the
trade-off between perceptual transparency and detection reliability:
very small values of α give perceptual transparency but low
detection reliability, whereas very large values give high detection
reliability but perceptual distortions. Several informal adaptive
up-down listening tests [10] were performed on a variety of
watermarked audio excerpts to extract critical values of α. We found
that perceptual transparency was achieved by selecting α between
0.15 and 0.25, depending on the audio excerpt.
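
Equation 2 can be sketched as follows. The text only states that
g(n) is data dependent and lies between 0 and 1, so the gain used
here (a moving average of |x(n)| normalised by its peak) is purely a
hypothetical stand-in, as are the function names and the window
length.

```python
import numpy as np

def envelope_gain(x: np.ndarray, win: int = 64) -> np.ndarray:
    """Hypothetical g(n): moving average of |x(n)|, scaled into [0, 1]."""
    env = np.convolve(np.abs(x), np.ones(win) / win, mode="same")
    peak = env.max()
    return env / peak if peak > 0 else np.zeros_like(env)

def embed(x: np.ndarray, w: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Equation 2: y(n) = x(n) + alpha * g(n) * w(n)."""
    return x + alpha * envelope_gain(x) * w

# Silence followed by a tone: the gain keeps the watermark out of the silence,
# which is where a pre-echo would otherwise be audible.
x = np.concatenate([np.zeros(1000), np.sin(0.05 * np.arange(1000))])
w = np.random.default_rng(0).standard_normal(x.size)
y = embed(x, w, alpha=0.2)
```

Any gain that follows the local signal level would serve the same
illustrative purpose; the value alpha = 0.2 is the one selected in
the experiments.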
`
[Block diagram; legible labels: Gain Function g(n), output y(n).]

Figure 1: Overview of watermark embedding algorithm for digital
audio. F and F^{-1} indicate Fourier and inverse Fourier transforms,
respectively.
`
`3. DETECTION
Figure 2 gives an overview of the watermark detection algorithm. It
relies on a cross-correlation procedure between the watermark
sequence W(k) and the audio. Experiments revealed that filtering
prior to cross-correlation may improve detection reliabilities
significantly. In our detection algorithm, y(n) is filtered with the
"equalization" filter d(n) according to:

    \tilde{y}(n) = y(n) * d(n),    (3)

with filter coefficients d(n) = [-1 2 -1], where * denotes
convolution. This signal is segmented into frames and transformed to
the frequency domain to obtain the magnitude of the Fourier
coefficients:

    Y_i(k) = | F(\tilde{y}_i(n)) |,    (4)

where F indicates a Fourier transform operation. For each individual
frame, the magnitudes of the Fourier coefficients Y_i(k) need to be
cross-correlated with every possible shifted version of W(k) to
extract the payload. Such a cross-correlation is calculated most
efficiently using Fourier transformed signals:

    Y_{i,F} = F(Y_i(k)),  and  W_F = F(W(k))^*.    (5)
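
The pre-processing of Equations 3 and 4 can be sketched in NumPy as
shown below. The non-overlapping 2048-sample framing and the use of a
real FFT are assumptions, and the function names are hypothetical.

```python
import numpy as np

D = np.array([-1.0, 2.0, -1.0])   # "equalization" filter d(n)

def equalize(y: np.ndarray) -> np.ndarray:
    """Equation 3: convolve y(n) with d(n), keeping the input length."""
    return np.convolve(y, D, mode="same")

def frame_magnitudes(y_eq: np.ndarray, frame: int = 2048) -> np.ndarray:
    """Equation 4: magnitude spectrum Y_i(k) of every complete frame."""
    n_frames = len(y_eq) // frame
    frames = y_eq[: n_frames * frame].reshape(n_frames, frame)
    return np.abs(np.fft.rfft(frames, axis=1))

y = np.random.default_rng(0).standard_normal(5 * 2048)   # stand-in for audio
Y = frame_magnitudes(equalize(y))                        # shape (5, 1025)
```

The coefficients of d(n) sum to zero, so the filter removes the DC
component; its magnitude response is 2 - 2 cos(ω), which emphasises
high frequencies.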
`
The traditional cross-correlation may then be written as:

    C_i = F^{-1}(Y_{i,F} · W_F),    (6)

where C_i is the cross-correlation function. Similar to detection
procedures in video watermarking [3], the detection performance may
be enhanced by using the Symmetrical Phase Only Matched Filtering
approach (SPOMF; [9]). In this cross-correlation procedure, only
phase information of the signals Y_{i,F} and W_F is used:

    C'_i = F^{-1}(P(Y_{i,F}) · P(W_F)),    (7)

where P is a phase-only operation with P(x) = x/|x| for x ≠ 0 and
P(0) = 1. To improve detection reliability even further, C'_i is
accumulated over a period of time: C'_{sum} = Σ_i C'_i. Since
C'_{sum} is distributed normally, its components may be normalized
to the standard deviation σ:

    C'_n = C'_{sum} / σ(C'_{sum}),    (8)

where C'_n is the normalized cross-correlation function. Its peak
value, expressed in standard deviations σ, is related directly to
the detection reliability, whereas its position corresponds to the
cyclic shift (payload).
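
Equations 5, 7 and 8 can be combined into a short NumPy sketch. It
runs on synthetic magnitude spectra (a shifted watermark plus
Gaussian noise) rather than on real audio, and it estimates σ from
all components of the accumulated correlation, peak included; both
choices are simplifying assumptions, and the function names are
hypothetical.

```python
import numpy as np

def phase_only(z: np.ndarray) -> np.ndarray:
    """P(x) = x / |x| for x != 0, and P(0) = 1."""
    mag = np.abs(z)
    out = np.ones_like(z)
    nonzero = mag > 0
    out[nonzero] = z[nonzero] / mag[nonzero]
    return out

def spomf_detect(mags: np.ndarray, w: np.ndarray):
    """mags: (n_frames, N) spectra Y_i(k); w: reference W(k) of length N."""
    W_F = np.conj(np.fft.fft(w))                          # Equation 5
    Y_F = np.fft.fft(mags, axis=1)                        # Equation 5
    C = np.fft.ifft(phase_only(Y_F) * phase_only(W_F), axis=1).real  # Eq. 7
    C_sum = C.sum(axis=0)                                 # accumulate frames
    C_n = C_sum / C_sum.std()                             # Equation 8
    shift = int(np.argmax(C_n))
    return shift, float(C_n[shift])

rng = np.random.default_rng(0)
w = rng.standard_normal(1024)                             # W(k)
mags = np.roll(w, 512)[None, :] + 2.0 * rng.standard_normal((20, 1024))
shift, peak = spomf_detect(mags, w)
```

In this toy setting the returned shift recovers the embedded label,
and the peak height in units of σ is the quantity the text compares
against the detection threshold.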
`
The detection reliability depends strongly on the number of
accumulated frames. In general, cross-correlation functions C'_i
need to be added over a period of 2 to 5 sec to exceed a detection
threshold of 5σ. This threshold corresponds to a false alarm
probability of 2.9 · 10^{-7}. Figure 3 displays a typical
cross-correlation function C'_n. In this example, a peak value of
≈ 13σ (false alarm probability of 6.3 · 10^{-39}) is detected at
position 512.
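
The link between a threshold in units of σ and a false alarm
probability can be checked with the one-sided Gaussian tail. That
the quoted figures were obtained this way is an assumption, although
the 5σ value agrees with it.

```python
import math

def false_alarm_probability(threshold_sigma: float) -> float:
    """One-sided Gaussian tail Q(t) = 0.5 * erfc(t / sqrt(2))."""
    return 0.5 * math.erfc(threshold_sigma / math.sqrt(2.0))

for t in (5.0, 11.0, 13.0):
    print(f"{t:4.1f} sigma -> {false_alarm_probability(t):.2e}")
```

The probability falls off extremely fast with the threshold, which
is why a few seconds of accumulation are enough to make false
detections negligible.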
`
[Block diagram; legible labels: y(n), Equalization, W(k), Payload.]

Figure 2: Overview of watermark detection
`
`4. EXPERIMENTAL RESULTS
In a number of experiments we have examined the robustness of our
audio watermark to a wide variety of signal "attacks". The following
audio excerpts were used: (i) O Fortuna by Carl Orff, (ii) Success
has made a failure of our home by Sinead O'Connor, (iii) Say what
you want by Texas and (iv) She works hard for the money by Donna
Summer. The 20 sec audio fragments were sampled at 44.1 kHz (16 bit,
mono). Based on up-down listening tests (Section 2) we selected
α = 0.2 for watermark embedding (Equation 2). All audio excerpts
were subjected to the following processing "attacks":

• MP3 Encoding/Decoding at 64 kb/s and 32 kb/s.
`
Table 1: Detection reliabilities expressed in standard deviations σ.

Attack                                      Donna
No Processing                               14.6
MP3 (64 kbit/s)
MP3 (32 kbit/s)         6.0    5.6    6.5    6.7
All-pass Filtering                          17.1
Amp. Compr.                                 17.8
Equalization                                18.2
Echo Addition                               16.2
Band-Pass Filter                            14.3
Time Scale +4%                              17.1
Time Scale -4%                              16.5
Resampling                                  12.7
Noise addition                              16.4
D/A A/D                                     7

[Plot; x-axis: Shift of watermark (W); y-axis: Detection reliability
(standard deviations).]

Figure 3: Example of cross-correlation function C'_n, accumulated
over a period of 5 sec. Dashed line indicates detection threshold of
5σ.
`
• All-pass Filtering using system function:
  H(z) = (0.81 z^2 - 1.64 z + 1) / (z^2 - 1.64 z + 0.81).

• Amplitude Compression with the following amplitude compression
  ratios: 8.94:1 for |A| > -28.6 dB; 1.73:1 for
  -46.4 < |A| < -28.6 dB; 1:1.61 for |A| < -46.4 dB.

• Equalization with a 10-band equalizer where signals within each
  band are suppressed or amplified by 6 dB.

• Echo Addition with a delay and decay of 100 ms and 50%,
  respectively.

• Band-Pass Filtering using a second order Butterworth filter with
  cut-off frequencies 100 Hz and 6000 Hz.

• Time Scale Modification of +4% or -4%, where the pitch is
  unaffected.

• Resampling consisting of subsequent down and up sampling to
  22.05 kHz and 44.10 kHz, respectively.

• Noise Addition with uniform white noise. Maximum magnitude of 150
  quantization steps.

• D/A-A/D Conversions using a commercial analogue tape recorder.
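
The all-pass attack listed above is easy to reproduce, and the
following standard-library sketch checks that its system function
H(z) has unit magnitude at every frequency. The direct-form
difference equation and the function names are illustrative choices,
not taken from the text.

```python
import cmath
import math

# H(z) rewritten in powers of z^-1:
# (0.81 - 1.64 z^-1 + z^-2) / (1 - 1.64 z^-1 + 0.81 z^-2)
B = [0.81, -1.64, 1.0]
A = [1.0, -1.64, 0.81]

def response(omega: float) -> complex:
    """Frequency response H(e^{j omega})."""
    z1 = cmath.exp(-1j * omega)
    num = B[0] + B[1] * z1 + B[2] * z1 * z1
    den = A[0] + A[1] * z1 + A[2] * z1 * z1
    return num / den

def allpass(x):
    """Apply H(z) to a sequence using the direct-form difference equation."""
    y = []
    for n, xn in enumerate(x):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(B[0] * xn + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2)
    return y

gains = [abs(response(math.pi * k / 8)) for k in range(9)]   # all close to 1
impulse_response = allpass([1.0] + [0.0] * 399)
```

Such a filter changes only the phase of the signal, whereas the
watermark is embedded in, and detected from, the magnitudes of the
Fourier coefficients.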
`
Processing was performed in MatLab and CoolEdit Pro 1.2. The
detection results were calculated by accumulating cross-correlation
functions C'_i (Equation 7) over periods of 5 sec and averaging the
four detection reliabilities.

The results are displayed in Table 1. Unprocessed watermarked audio
excerpts result in typical detection reliabilities between ≈ 13σ and
≈ 17σ. MP3 compression at very low bit-rates (e.g. 32 kb/s) results
in measurements close to the detection threshold of 5σ. The data
reveal that detection reliability is affected only marginally by
other signal attacks including MP3 compression at 64 kb/s and
all-pass filtering. In general, reliabilities are in the range
11σ - 17σ, corresponding to a false alarm probability of at most
1.9 · 10^{-28}.
`
`5. CONCLUSIONS
Based on existing technology in image and video watermarking, we
have developed new algorithms for embedding and detecting watermarks
in digital audio. Important characteristics of this new technique
were discussed. Key results of this study are:

1. Embedding: The dominant part of the perceptually weighted
   watermark is derived in the frequency domain by slightly
   modifying the magnitude of Fourier coefficients. An additional
   scale parameter and time-domain gain function were necessary to
   refine the watermark. The scale parameter may also be utilized to
   tune system characteristics such as perceptual transparency and
   detection reliability.

2. Detection: The SPOMF cross-correlation approach offered a robust
   technology for blind detection of watermarks in digital audio.

3. Robustness: Our watermark algorithm proved to be robust to a wide
   variety of signal processing "attacks" such as MP3 (64 kb/s),
   all-pass filtering, echo addition, speed change, resampling,
   noise addition, etc.

With the accomplishments described in this paper, and possible
future developments, it is expected that our audio watermarking
strategy can support a wide variety of existing (monitoring and copy
control) and future applications.
`
`6. REFERENCES
[1] E. Koch, and J. Zhao, 1995, "Towards robust and hidden image
    copyright labeling", in Nonlinear Signal Processing Workshop,
    Thessaloniki, Greece, pp. 452-455.

[2] A. Piva, M. Barni, and F. Bartolini, 1998, "Copyright protection
    of digital images by means of frequency domain watermarking",
    Proceedings of SPIE, vol. 3456, pp. 25-35.

[3] T. Kalker, G. Depovere, J. Haitsma, and M. Maes, 1999, "A video
    watermarking system for broadcast monitoring", Proceedings of
    IS&T/SPIE/EI25, Security and Watermarking of Multimedia Content,
    vol. 3657, pp. 103-112.

[4] D. Gruhl, W. Bender, and A. Lu, 1996, "Echo-hiding", Information
    hiding: 1st International Workshop, R.J. Anderson, Ed., vol.
    1174 of Lecture Notes in Computer Science, Isaac Newton
    Institute, England, pp. 295-315.

[5] P. Bassia, and I. Pitas, 1998, "Robust audio watermarking in the
    time domain", 9th European Signal Processing Conference
    (EUSIPCO98), Greece, pp. 25-28.

[6] M.D. Swanson, B. Zhu, A.H. Tewfik, and L. Boney, 1998, "Robust
    audio watermarking using perceptual masking", Signal Processing,
    vol. 66, 337-355.

[7] I. Cox, J. Kilian, F.T. Leighton, and T. Shamoon, 1996, "A
    secure, robust watermark for multimedia", In Proc. of the
    Information Hiding: First Int. Workshop, Lecture Notes in
    Computer Sciences, vol. 1174, R. Anderson, ed., Springer-Verlag,
    pp. 183-206.

[8] G.F.G. Depovere, T. Kalker, and J.P.M.G. Linnartz, 1998,
    "Improved watermark detection reliability using filtering before
    correlation", Int. Conf. on Image Processing, ICIP, Chicago IL.

[9] L.G. Brown, 1992, "A survey of image registration techniques",
    ACM Computing Surveys, vol. 24, pp. 325-376.

[10] H. Levit, 1970, "Transformed up-down methods in
    psychoacoustics", The Journal of the Acoustical Society of
    America, vol. 49, pp. 467-477.
`
`