`Content Analysis
`
`Chung-Ping Wu, Po-Chyi Su and C.-C. Jay Kuo
`Media Fair, Inc., 1055 Corporate Center Dr., Ste 580
`Monterey Park, CA 91754
`and
`Department of Electrical Engineering-Systems
`University of Southern California, Los Angeles, CA 90089-2564
E-mail: {chungpin,pochyisu,cckuo}@sipi.usc.edu
`
`ABSTRACT
Digital audio watermarking embeds inaudible information into digital audio data for the purposes of copyright protection, ownership verification, covert communication, and/or auxiliary data carrying. In this paper, we first describe the desirable characteristics of digital audio watermarks. We then review previous work on audio watermarking, which has primarily focused on the inaudibility of the embedded watermark and its robustness against attacks such as compression and noise. Special attention is paid to the synchronization attack caused by casual audio editing or malicious random cropping, which is a low-cost yet effective attack against previously developed watermarking algorithms. We propose a low-complexity digital audio watermarking scheme as an effective way to deter users from misusing or illegally distributing audio data. The proposed scheme is based on audio content analysis using a wavelet filterbank, while the watermark is embedded in the Fourier transform domain. A blind watermark detection technique is developed to identify the embedded watermark under various types of attacks.
Keywords: digital watermark, blind watermark detection, audio content analysis, synchronization attack, human auditory system, malicious cropping attack, wavelet
`
`1. INTRODUCTION
Digital audio watermarking, the embedding and detection of an imperceptible signal in digital audio data, has received increasing attention recently. Among the various uses of digital audio watermarking, copyright protection is the most demanded application. The fast growth of the Internet and the maturity of audio compression techniques have made on-line music distribution a promising market. However, since digital technology allows lossless data duplication, illegal copying and distribution become much easier than before. This concern makes music creators and distributors hesitant to enter this market quickly. Therefore, proper content protection technology is key to the emergence of this new market.
Encryption and watermarking are the two most important content protection techniques. Encryption protects the content from anyone without the proper decryption key, which is useful for protecting audio data from being intercepted during transmission. However, after the intended receiver decrypts it with the correct key, the audio data could be illegally distributed and misused. Watermarks, on the other hand, cannot be removed from audio data even by the intended receiver. The embedded watermark signal permanently remains in the audio data after repeated reproduction and redistribution. Thus, this signal can be used to protect the copyright of audio content through playback prohibition, illegal copy source tracing, and ownership establishment.
Other applications of digital audio watermarking include data hiding for covert communication, auxiliary data embedding for audio content labeling, and modification detection for authentication. Data hiding can also complement encryption, i.e., enhance communication security by concealing the existence of sensitive data transmission. Embedded auxiliary data can carry lyrics or descriptions of the host audio, or serve as links to external databases. The disappearance of a fragile watermark can indicate unauthorized modification and can thus be used for content integrity verification.
Different watermarking applications have different sets of requirements. Here, our discussion focuses on copyright protection because it imposes the most stringent requirement on the watermark's ability to survive intentional attacks, which is considered one of the most challenging issues in watermarking technology today. The other applications are less demanding: in auxiliary data embedding, users benefit from the embedded label data, and in covert communication, hackers do not know that hidden communication data exist. Thus, watermarks embedded for these two applications are generally not subject to malicious attacks.

In Security and Watermarking of Multimedia Contents II, Ping Wah Wong, Edward J. Delp, Editors, Proceedings of SPIE Vol. 3971 (2000) • 0277-786X/00/$15.00
Sony Exhibit 1021
Sony v. MZ Audio
This paper is organized as follows. The requirements for audio watermarking systems are described in Section 2. Previous work on audio watermarking is reviewed in Section 3. Our current work on salient point extraction and Fourier domain watermarking is presented in Section 4. Experimental results and their analysis are given in Section 5. Finally, concluding remarks are provided in Section 6.
`
`2. REQUIREMENTS FOR AUDIO WATERMARKING SYSTEMS
In order for the embedded watermark to effectively protect the copyright of digital audio data, it has been generally agreed!® that a good watermarking scheme should satisfy the following properties:
`
`1. The embedded watermark should not produce audible distortion to the sound quality of the original audio.
2. The computation required for watermark embedding and detection should be low. The complexity of watermark detection should be especially low to facilitate its integration into consumer electronic products.
3. Watermark detection should be done without referencing the original audio data. This property is known as blind detection.
4. The watermark should be undetectable without prior knowledge of the embedded watermark sequence. This property prevents attackers from reversing the embedding process to remove the watermark.
5. The embedded watermark should be robust against common signal processing attacks such as filtering, resampling, and compression.
6. The watermark should survive malicious attacks such as random cropping and noise adding. However, severe attacks that produce annoying noise need not be considered in the survival test.
`
`3. PREVIOUS WORK ON AUDIO WATERMARKING
`A variety of audio watermarking methods with very different characteristics have been proposed. They will be
`reviewed in this section.
Early work on audio watermark embedding achieved inaudibility by placing watermark signals in perceptually insignificant regions. One popular choice was the higher frequency region,!°1? where human sensitivity declines compared with its peak around 1 kHz. In some systems,'®"' the watermark signal is high-pass filtered before being inserted into the original audio. In another system,!? the Fourier transform magnitude coefficients over the frequency range from 2.4 kHz to 6.4 kHz are replaced with the watermark sequence. In these systems, inaudibility is further enhanced by embedding watermarks only in audio segments whose low-frequency components have high energy, since strong low-frequency signals in the original audio help to mask the embedded high-frequency watermark signal.
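As a worked example of the band-limited replacement just described, the sketch below converts the 2.4 kHz to 6.4 kHz range into DFT bin indices using bin frequency k * fs / n_fft. The 1024-sample frame and the 44.1 kHz sampling rate are assumptions for illustration (the cited system's frame size is not given here), and `band_to_bins` is a hypothetical helper.

```python
import math

def band_to_bins(f_lo_hz, f_hi_hz, n_fft, fs_hz):
    """Return the first and last DFT bin indices k whose frequency
    k * fs_hz / n_fft lies inside [f_lo_hz, f_hi_hz]."""
    k_lo = math.ceil(f_lo_hz * n_fft / fs_hz)
    k_hi = math.floor(f_hi_hz * n_fft / fs_hz)
    return k_lo, k_hi

# Assumed parameters: 1024-point frames sampled at 44.1 kHz.
print(band_to_bins(2400, 6400, 1024, 44100))  # (56, 148)
```

For a real-valued frame, any change made to bin k must be mirrored, as a complex conjugate, at bin n_fft - k so that the inverse transform stays real.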
Another domain to which humans are insensitive is the Fourier transform phase. Human ears are relatively insensitive to phase distortion and, in particular, cannot perceive absolute phase values. One scheme! substitutes the phase of an initial audio segment with a reference phase that represents the watermark; the phase of subsequent segments is adjusted to preserve the relative phase between segments. In another system,!° selected Fourier transform phase coefficients at higher frequencies are discarded, and new values are assigned based on neighboring reference coefficients. The watermark is represented by the relative phase between the selected coefficients and their neighbors. The problem with schemes that hide watermark signals in perceptually insignificant regions is that they are less robust to signal processing and malicious attacks. Compression algorithms do not preserve these regions well, and malicious hackers can apply stronger attacks in these regions without introducing annoying noise.
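The relative-phase bookkeeping of the first phase scheme can be sketched for a single frequency bin. This is a minimal illustration under stated assumptions: one phase value per segment, and a hypothetical `ref_phase` standing in for the watermark-dependent reference; the cited scheme operates on full phase vectors, and its exact encoding is not given here.

```python
def phase_code(orig_phases, ref_phase):
    """Set the first segment's phase to ref_phase, then rebuild each later
    segment's phase so that its difference from the previous segment equals
    the difference in the original audio (one frequency bin only)."""
    new = [ref_phase]
    for prev, cur in zip(orig_phases, orig_phases[1:]):
        new.append(new[-1] + (cur - prev))
    return new

original = [0.3, 1.0, -0.5, 2.0]                  # phases (radians) of one bin in 4 segments
coded = phase_code(original, 1.5707963267948966)  # pi/2 as a hypothetical reference phase
```

Only the absolute phase, to which the ear is largely insensitive, is changed; the segment-to-segment phase evolution of the original audio is kept.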
Another class of algorithms embeds watermarks as echoes of the original audio. The inaudibility of echo hiding is based on the observation that resonance is so common in our environment that humans usually do not perceive it as noise. In these algorithms,?""* watermark signals are delayed and attenuated versions of the original signal. The watermark sequence is represented by delay amounts, which are retrieved by locating autocorrelation peaks in the time domain"! or in the cepstrum domain.”
Recently, some researchers have borrowed a concept from spread spectrum communication and embedded the watermark as pseudo-random noise in the time domain. Spread spectrum theory ensures that the embedded watermark is statistically undetectable by hackers. Since human ears have different sensitivities to additive noise in different frequency bands, all proposed work uses some filter to spectrally shape the pseudo-random (white) noise and achieve inaudibility. A simple band-pass filter was used in one work,! and a nonlinear filter was adopted in another.* In yet another system,!° instead of filtering white noise, a scheme was developed to generate a band-limited pseudo-random watermark signal. The inaudibility of the embedded watermark can be further ensured by utilizing the masking effects of the human auditory system. One system!®* used MPEG-I Audio Psychoacoustic Model 1 to spectrally shape the watermark signal, while another system?” used the masking model from MPEG-II AAC. Watermark detection is done by calculating the correlation between the watermarked audio signal and the watermark signal. Backed by spread spectrum communication theory, this type of watermarking usually survives distortions and attacks well. However, synchronization is difficult to implement, and the computational cost is high.
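The correlation detector described above can be sketched with an unshaped +/-1 pseudo-random sequence; the spectral shaping and masking models mentioned in the text are deliberately omitted. The names `pn_sequence`, `embed_ss`, and `correlate_ss`, the key values, and `alpha` are hypothetical.

```python
import random

def pn_sequence(key, n):
    """Pseudo-random +/-1 sequence generated from a secret integer key."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]

def embed_ss(x, key, alpha):
    """Add the key's PN sequence, scaled by alpha, to the host samples."""
    w = pn_sequence(key, len(x))
    return [xi + alpha * wi for xi, wi in zip(x, w)]

def correlate_ss(y, key):
    """Mean of y[n] * w[n]: close to alpha if the keyed watermark is present,
    close to 0 otherwise."""
    w = pn_sequence(key, len(y))
    return sum(yi * wi for yi, wi in zip(y, w)) / len(y)

rng = random.Random(1)
host = [rng.gauss(0.0, 1.0) for _ in range(10000)]   # stand-in host audio
marked = embed_ss(host, key=42, alpha=0.1)
```

The detector must stay aligned with the PN sequence sample by sample, which is why synchronization attacks hurt this class of methods.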
`
Another trend in digital audio watermarking is to combine watermark embedding with the compression or modulation process. This integration can minimize unfavorable mutual interference between watermarking and compression and, in particular, prevent the watermark from being removed by compression. In one scheme,!*® watermark embedding is performed during vector quantization: the watermark is embedded by changing the selected code vector or the distortion weighting factor used in the search process. The need for the original audio to extract the watermark greatly limits the applications of this scheme. Another algorithm!® embeds the watermark directly in the sigma-delta modulation bitstream, eliminating the need to convert it into PCM data and thereby keeping the computational cost low. This is important for sigma-delta modulation systems, where hardware savings are the main goal. In another scheme,®° watermarking is integrated with MPEG-II AAC compression, and the watermark is embedded by modifying selected compression parameters such as the scale factor.
`
`4. PROPOSED ALGORITHM
`
Although the methods described in Section 3 have their own features and properties, they share one common problem: they are vulnerable to the synchronization attack in watermark detection. This problem can result from casual audio editing, such as cropping unwanted audio segments, or from intentional attacks, such as randomly deleting or adding samples to watermarked audio data. This random sample cropping attack is very effective in interfering with the watermark detection process of the algorithms mentioned above. It also has very low computational complexity and, when done correctly, does not introduce annoying noise into the underlying audio signal. One might argue that such a skillful attack could only be carried out by a few professionals, not by the majority of consumers. However, once a watermarking method is in wide use, it is almost certain that some professionals will produce and distribute attacking tools, so that most common users will be able to perform the attack. One method® was proposed to solve the synchronization problem using an exhaustive search, and it requires the original audio signal. Consequently, its computational complexity is too high, and the need for the original audio in watermark detection greatly limits its applications. Furthermore, it can only handle the casual editing attack, not the random sample cropping attack.
In this research, we propose a low-complexity solution to the synchronization problem caused by both casual and malicious attacks. The solution consists of a salient point extraction technique and a Fourier transform domain watermark embedding procedure. Salient points are extracted through audio content analysis during both watermark embedding and detection, so that synchronization is regained at each salient point. The extraction algorithm is designed so that salient points remain stable after distortion. Watermark embedding and detection are carried out in the Fourier transform domain because frequency domain information is less affected by sample cropping in the time domain.
`
[Figure 1: music notes plotted against frequency (Hz), with the subband partition numbered 1 to 5]

Figure 1. Illustration of the correspondence between music notes and frequency values, and the 5-subband partition adopted in this work.

One common characteristic of most existing audio watermarking algorithms is that the watermark is embedded throughout the entire audio signal. However, this may not be the most efficient way to embed and detect watermarks. A skilled attacker can apply different amounts of attack to different segments of the audio signal so as to avoid introducing annoying noise. For example, randomly cropping (deleting) one sample out of every 100 samples in high-energy tonal segments of an audio signal would produce noticeable noise, but doing the same in low-energy segments would be inaudible. Thus, watermarks embedded in highly attackable areas will face heavier attack and are more likely to be destroyed. The second major contribution of this work is the introduction of "attack-sensitive regions" via audio content analysis. If the watermark is embedded only in attack-sensitive regions, where little attack can be applied, the computational complexity of both watermark embedding and detection can be reduced.
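The random sample cropping attack discussed above is cheap to implement, which is part of what makes it dangerous. A minimal sketch, assuming the deleted sample is chosen uniformly at random within each block of `period` samples (`random_crop` is a hypothetical helper, not a tool from the paper):

```python
import random

def random_crop(samples, period=100, seed=None):
    """Delete one randomly chosen sample from every complete block of
    `period` samples; a trailing partial block is kept unchanged."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(samples), period):
        block = list(samples[start:start + period])
        if len(block) == period:
            del block[rng.randrange(period)]
        out.extend(block)
    return out

attacked = random_crop(list(range(1000)), period=100, seed=3)
```

Every sample after each deletion moves one position earlier, so a detector that assumes fixed sample positions drifts further out of alignment with each block.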
`
By combining salient point extraction, attack-sensitive region identification, and Fourier transform domain watermark embedding and detection, we propose a complete audio watermark embedding and detection system for copyright protection. This system satisfies all of the desired watermark properties described in Section 2. Furthermore, it has very low computational complexity and is robust to both casual and intentional synchronization attacks. Although we incorporate the concepts of salient point extraction and attack-sensitive regions into our own watermark embedding method here, we believe that other watermark embedding algorithms can benefit from the same concepts as well.
`
`4.1. Audio Content Analysis for Watermarking
In our system, audio content analysis is performed for two purposes: salient point extraction and attack-sensitive region identification. Salient points in an audio signal allow watermark detection to resynchronize at these locations. Synchronization by salient points requires far less computation than exhaustive search and makes blind watermark detection possible. Note that we do not insert salient points; we extract them from the raw audio via content analysis. This approach has two advantages over explicitly embedding synchronization signals. First, the content analysis approach introduces no distortion into the original audio signal, since nothing is added to it. Second, an explicitly added synchronization signal is more likely to be removed by attackers.
A good salient point extraction method should produce approximately the same set of salient points from audio signals before and after attacks such as audio compression, low-pass filtering, and noise adding. To achieve this, we extract salient points based on audio features to which human ears are sensitive. In this way, an attacker who wants to destroy these salient points would have to alter these features and would thereby produce noticeable distortion. We choose energy variation as the main feature for salient point extraction because its computational cost is low and alterations to it would be audible.
`
[Figure 2: panels (a) and (b)]

Figure 2. A 6-level dyadic wavelet decomposition, where each branch in (a) represents the structure in (b), and the outputs are numbered according to the subbands in Figure 1.

[Figure 3: magnitude spectrum |X(e^{jω})| in panels (a), (b), and (c)]

Figure 3. The effect of frequency inversion due to downsampling after high-pass filtering: (a) the spectrum before filtering, (b) the spectrum after high-pass filtering, (c) the spectrum after downsampling, where the highest frequency in (a) is now mapped to the lowest frequency.

The basic scheme is to extract salient points as locations where the audio signal energy is climbing fast to a peak value. While this approach works well for simple music pieces with few instruments, it has two problems with more complex music pieces. First, the overall energy variation becomes ambiguous for complicated music in which many instruments play together, so the stability of salient points decreases. Second, the optimal threshold value differs for music pieces of different complexity. A high threshold value is suitable for music with sharp energy variation, but applying the same value to complex music would yield very few salient points.
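The basic energy-rise rule above can be sketched as follows. The precise rule and parameters are not given in the text, so everything here is an assumption: non-overlapping frames of `frame_len` samples, a frame counts as salient when its energy is a local peak and is at least `rise_ratio` times the energy `look_back` frames earlier, and the salient point is reported at the start of that peak frame.

```python
def frame_energies(x, frame_len):
    """Sum of squared samples in each complete, non-overlapping frame."""
    return [sum(v * v for v in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, frame_len)]

def salient_points(x, frame_len=256, look_back=2, rise_ratio=4.0, eps=1e-12):
    """Sample indices of frames whose energy has climbed by at least
    `rise_ratio` over `look_back` frames and has reached a local peak."""
    e = frame_energies(x, frame_len)
    points = []
    for j in range(look_back, len(e) - 1):
        is_peak = e[j] >= e[j - 1] and e[j] > e[j + 1]
        fast_rise = e[j] >= rise_ratio * (e[j - look_back] + eps)
        if is_peak and fast_rise:
            points.append(j * frame_len)
    return points

# A quiet passage with one loud burst starting at sample 1024.
signal = [0.01] * 1024 + [1.0] * 256 + [0.01] * 1024
```

In the proposed system, a rule of this kind would be applied separately to each subband output rather than to the full-band signal, which is what the wavelet partition described next provides.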
Therefore, it is beneficial to decompose complex music into several simpler signals, so that the stability of salient points improves and the same threshold can be applied to all music pieces. Complex music is usually composed of instruments whose fundamental frequencies occupy different frequency ranges in order to form harmony. Figure 1 illustrates the correspondence between music notes and frequency values. It also shows the partition used in our design, which consists of 5 frequency ranges. Note that the frequency width of each octave is not the same. The frequency intervals in Figure 1 correspond to the outputs of a 6-level dyadic wavelet decomposition at a sampling rate of 44.1 kHz, as shown in Figure 2.
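The frequency ranges produced by a 6-level dyadic decomposition at 44.1 kHz follow from halving the current low band at each level. The sketch below lists the ideal (brick-wall) band edges. Which of these 7 outputs are combined into the 5 subbands of Figure 1 is not recoverable from the text, so the sketch stops at the raw dyadic ranges; `dyadic_band_edges` is a hypothetical helper.

```python
def dyadic_band_edges(fs_hz=44100.0, levels=6):
    """Ideal frequency ranges (Hz) of an octave-band (dyadic) tree: each level
    splits the current low band in half, so `levels` levels yield
    levels + 1 bands, returned from lowest to highest frequency."""
    hi = fs_hz / 2.0              # start from the Nyquist frequency
    bands = []
    for _ in range(levels):
        lo = hi / 2.0
        bands.append((lo, hi))    # high-pass (detail) output at this level
        hi = lo
    bands.append((0.0, hi))       # final low-pass (approximation) output
    return bands[::-1]

bands = dyadic_band_edges()
# Lowest band: 0 to 344.53125 Hz; highest band: 11025 to 22050 Hz.
```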
[Figure 5: the original salient point and the salient point after distortion]

Figure 5. The effect of salient point displacement on discrete Fourier transform domain watermarking.

To prevent the frequency inversion effect?! caused by downsampling the output of high-pass filtering, as shown in Figure 3, we modify the dyadic wavelet decomposition of Figure 2 into that of Figure 4 by eliminating the downsampling step after each high-pass filter. Salient points are then extracted separately from each of the 5 outputs in Figure 4.
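The inversion in Figure 3 can be checked numerically. Keeping every second sample doubles every normalized frequency, so a high-band component at omega in [pi/2, pi] aliases to 2*pi - 2*omega, and the top of the band lands at zero. The sketch only illustrates the effect that the modified filterbank avoids; `alias_after_downsampling` is a hypothetical helper.

```python
import math

def alias_after_downsampling(omega):
    """Normalized frequency (rad/sample, in [0, pi]) at which a component at
    `omega` appears after keeping every second sample."""
    w = (2.0 * omega) % (2.0 * math.pi)
    return 2.0 * math.pi - w if w > math.pi else w

# Components at 0.55*pi, 0.75*pi, 0.95*pi come out in reversed order.
mapped = [alias_after_downsampling(f * math.pi) / math.pi for f in (0.55, 0.75, 0.95)]
```

Since cos(2*omega*m) equals cos((2*pi - 2*omega)*m) for integer m, the downsampled high band is a mirror image of the original, which is why the text removes the downsampling after each high-pass filter.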
`
The attack-sensitive region identification procedure aims at decreasing the complexity of watermark embedding and detection, so it is important that the identification process itself does not require much computation. In this work, we integrate attack-sensitive region identification into the salient point extraction process, so that almost no extra computation is needed. The attack we are mainly concerned with is the random sample cropping attack, and the corresponding attack-sensitive regions are high-energy tonal regions. Since the salient points chosen by our algorithm are located at positions where the audio signal energy is climbing fast to a peak, the region following each salient point contains high energy. We simply define this region as the attack-sensitive region, so that no additional computation is needed.
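Defining the attack-sensitive region as the window that follows each salient point takes only a few lines. A minimal sketch, assuming windows of 2**log2_len samples (a power of two, matching the DFT window of Section 4.2) and assuming that windows running past the end of the signal or overlapping the previous window are skipped; `attack_sensitive_regions` and `log2_len` are hypothetical.

```python
def attack_sensitive_regions(salient_points, n_samples, log2_len=12):
    """(start, end) sample ranges of the window that follows each salient
    point. Windows are 2**log2_len samples long; windows that would run past
    the signal end or overlap the previous window are skipped."""
    length = 2 ** log2_len
    regions = []
    last_end = 0
    for p in sorted(salient_points):
        if p >= last_end and p + length <= n_samples:
            regions.append((p, p + length))
            last_end = p + length
    return regions

regions = attack_sensitive_regions([0, 1000, 5000, 9000], n_samples=10000)
# [(0, 4096), (5000, 9096)]
```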
`
`4.2. Fourier Transform Domain Watermark Embedding and Detection
Although salient points are selected to be as stable as possible, it is difficult to obtain exactly the same salient points after audio processing such as compression. A certain amount of displacement in salient point locations is common and should be tolerated. If the watermark were embedded and detected in the time domain, even a small displacement would cause a problem, because embedding and detection could no longer be synchronized. However, this problem is alleviated by working with the magnitudes of the discrete Fourier transform coefficients.

This property is illustrated in Figure 5, where $a(i)$, $i = 1, \ldots, 2^n$, is the watermarked region. The watermark is embedded in $|A(k)|$, $k = 1, \ldots, 2^n$, where $A(k)$ is the discrete Fourier transform coefficient of $a(i)$. Suppose that the salient point is displaced in the detection process, and the watermarked region is mistaken for another region $b(i)$, $1 \le i \le 2^n$. It is a well-known property that if $c(i)$ is formed by moving the right-most part of $a(i)$ to the left-most part (a circular shift), then $c(i)$ and $a(i)$ have identical discrete Fourier transform magnitudes, i.e.,

$$|C(k)| = |A(k)|, \qquad k = 1, \ldots, 2^n. \tag{1}$$

Let us denote the difference between $b(i)$ and $c(i)$ by

$$d(i) = c(i) - b(i), \qquad i = 1, \ldots, 2^n. \tag{2}$$

Then, by the linearity of the DFT and the triangle inequality, we have

$$|B(k)| \le |C(k)| + |D(k)| = |A(k)| + |D(k)|, \qquad k = 1, \ldots, 2^n. \tag{3}$$

From (3), we see that the error caused by the displaced salient point is bounded by $|D(k)|$; there is no disastrous mis-synchronization effect in the frequency domain. When the displacement is small relative to the window size, $c(i)$ and $b(i)$ differ in only a few samples, so the energy in $|D(k)|$ is small.
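A small numerical check of (1) and of the error bound from (3): the sketch builds a(i), a displaced window b(i), the circular shift c(i), and d(i) = c(i) - b(i) from a random stand-in signal. The window length of 64, start position, and displacement of 3 samples are arbitrary illustrative choices, and the naive O(N^2) DFT is used only to keep the example dependency-free.

```python
import cmath
import random

def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a small demo)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
            for k in range(n)]

rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(200)]
N, t0, s = 64, 100, 3                  # window length 2**6, start, displacement

a = signal[t0:t0 + N]                  # watermarked region a(i)
b = signal[t0 - s:t0 - s + N]          # region found with a displaced salient point
c = a[N - s:] + a[:N - s]              # right-most s samples of a(i) moved to the left
d = [ci - bi for ci, bi in zip(c, b)]  # d(i) = c(i) - b(i); nonzero only in the first s samples

A, B, C, D = dft(a), dft(b), dft(c), dft(d)
```

Comparing `abs(C[k])` with `abs(A[k])` confirms (1), and `abs(abs(B[k]) - abs(A[k]))` never exceeds `abs(D[k])`, which is the per-bin form of the bound in (3).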
In order for the embedded watermark to be inaudible, it is common to exploit the temporal and frequency masking effects of the human auditory system (HAS).14° Temporal masking refers to the effect that weaker signals immediately before and after a stronger signal may be inaudible, while frequency masking refers to the effect that, when two signals occur simultaneously and are close in frequency, the stronger signal may make the weaker one inaudible.
Since our watermark is embedded only in attack-sensitive regions, which have high energy, the temporal masking effect is exploited: the low-energy watermark is masked by the high-energy audio samples in these regions. To take advantage of the frequency masking effect as well, the proposed scheme embeds the watermark signal only in the magnitudes of discrete Fourier transform coefficients that have large values.
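The selection of large-magnitude coefficients can be sketched as below. The text does not state how the watermark modifies the chosen magnitudes, so the multiplicative rule |A(k)| * (1 + alpha * w) with +/-1 chips, the strength `alpha`, and the helper `embed_in_magnitudes` are all assumptions for illustration.

```python
def embed_in_magnitudes(mags, watermark, alpha=0.1):
    """Scale the len(watermark) largest DFT magnitudes by (1 + alpha * w),
    where each w is a +/-1 watermark chip. Chips are assigned to the chosen
    bins in increasing bin order. Returns (new_mags, chosen_bins)."""
    order = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)
    chosen = sorted(order[:len(watermark)])
    new = list(mags)
    for w, k in zip(watermark, chosen):
        new[k] = mags[k] * (1.0 + alpha * w)
    return new, chosen

marked_mags, bins = embed_in_magnitudes([1.0, 8.0, 2.0, 10.0, 0.5, 6.0], [1, -1, 1])
# bins == [1, 3, 5]; the three small-magnitude bins are left untouched.
```

In a full embedder, the modified magnitudes would be recombined with the original phases, mirrored to bins N - k as complex conjugates, and inverse transformed back into the region.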
Watermark detection is done by calculating the average correlation coefficient between the watermark sequence and the watermarked audio signal in the Fourier transform domain and comparing it with a threshold. The threshold is selected to minimize the expected cost of detection errors. Note that the cost of a miss (i.e., failure to detect a watermark that is present) differs from the cost of a false alarm (i.e., claiming a detection when there is no watermark). Although these costs vary across applications, the cost of a false alarm is generally much greater than the cost of a miss. The false alarm rate should be extremely low because false alarms undermine the credibility of the watermarking method as proof of copyright ownership. In contrast, the constraint on the miss (or failure-to-detect) rate need not be as stringent, since a failure-to-detect rate of 1% or 10% might have a similar effect in deterring people from illegally copying audio data. In conclusion, the detection threshold should be set relatively high to ensure that no false detection occurs.
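The averaged-correlation test can be sketched as follows, assuming each region contributes the list of DFT magnitudes at its watermarked bins and the watermark is a +/-1 sequence. The threshold is a free, hypothetical parameter; per the cost argument above it would be chosen high.

```python
import math

def corrcoef(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def detect(region_mags, watermark, threshold):
    """Average the correlation coefficient over all attack-sensitive regions
    and compare it with the threshold. Returns (average, decision)."""
    avg = sum(corrcoef(m, watermark) for m in region_mags) / len(region_mags)
    return avg, avg > threshold

regions = [[11.0, 9.0, 11.0, 9.0], [22.0, 18.0, 22.0, 18.0]]
print(detect(regions, [1, -1, 1, -1], threshold=0.5))  # (1.0, True)
```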
`
`5. EXPERIMENTAL RESULTS
The inaudibility and robustness of the proposed watermarking scheme are demonstrated with three audio pieces: a piano concerto by Bach with only a single piano, the symphony "Bolero" by Ravel with trumpet and drums, and a song with human vocals and complex background music. All signals are sampled at 44.1 kHz, and each piece is about 30 seconds long.
`
`5.1. Audio content analysis
The effectiveness of the proposed audio content analysis is measured by its ability to extract the same set of salient points from audio signals before and after attacks and/or signal processing. An example comparison between the salient points extracted from the original and processed files is shown in Table 1. As this example shows, almost every salient point is shifted by a few samples. However, as explained in Section 4.2, this does not have a catastrophic effect on watermark detection. Empirically, a displacement of less than 100 samples produces very little decrease in the average correlation coefficient used in watermark detection, so such an extraction should be regarded as successful. Some salient points may disappear and some may be created after processing. However, these phenomena again cause only marginal deterioration of the detection results.
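The success criterion above (a displacement of less than 100 samples) can be turned into a small scoring routine. This is a simplified sketch under assumptions: each original salient point is matched to the nearest point from the distorted file, the shift is reported as original minus distorted, and points created by the distortion are not counted. The exact scoring behind Table 1 is not given in the text.

```python
def match_salient_points(original, distorted, tolerance=100):
    """Match each original salient point to the nearest distorted one.
    Returns (rows, success_rate), where each row is
    (original_location, nearest_location_or_None, shift_or_None)."""
    rows = []
    hits = 0
    for p in original:
        nearest = min(distorted, key=lambda q: abs(q - p)) if distorted else None
        shift = None if nearest is None else p - nearest
        if shift is not None and abs(shift) < tolerance:
            hits += 1
        rows.append((p, nearest, shift))
    rate = hits / len(original) if original else 0.0
    return rows, rate

rows, rate = match_salient_points([100, 1000, 5000], [110, 1300, 4999])
# shifts -10, -300, 1 -> two of three points succeed
```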
Table 2 compares the success rates of correct salient point extraction with and without the wavelet filterbank. The attack used in Table 2 is MP3 compression/decompression. In our experiments, distortions such as additive noise, low-pass filtering, and downsampling cause much less salient point displacement than MP3 compression/decompression. It is observed that the more complex the music piece, the lower the success rate. However, the wavelet decomposition is most effective in raising the success rate of salient point extraction for complex music.

[Table 1: three column groups, each listing the salient point location extracted from the original file, the salient point location extracted from the distorted file, and the shift amount between the two files]

Table 1. Comparison between salient points extracted from original and processed audio files, where rows printed in bold type are regarded as failures; the success rate in this example is 78.5%.

Test audio                             Success rate without   Success rate using    Success rate
                                       wavelet filterbank     wavelet filterbank    increase
Single piano                           83.3%                  83.6%                 0.3%
Drum and trumpet music                 71.4%                  77.3%                 5.9%
Vocal with complex background music    63.0%                  73.1%                 10.1%

Table 2. The success rate of correct salient point extraction after a cascade of three MPEG Layer III compression/decompression operations at a bit rate of 64 kbps.
`
`5.2. Watermark Embedding
`
The quality of the proposed watermarking method is evaluated with a blind listening test. Listeners are presented with the original and the watermarked audio without knowing which one is watermarked, and they are asked which one has better sound quality. We do not ask "whether any differences could be detected between the two audio signals"® because people tend to imagine differences even when they cannot actually hear any. In fact, several listeners reported that the audio signals were different when the same audio clip was played twice.
`
Eleven people took the listening test, and the percentage of listeners who preferred the original audio to the watermarked audio is given in Table 3. The results show that about half of the listeners preferred the watermarked audio to the original, indicating that the embedded watermark introduces no audible distortion.
`
`389
`
`,
`
`
`
`
`
`
`Test
`Original preferred
`Audio
`to watermarked
`
`45.5%
`
`
`single piano
`54.5%
`
`drum and trumpet music
`
`45.5%
`
`vocal with complex background music
`
`
`
`
`
`Table 3. The blind listening test of watermarked audio pieces.
`
`
`Single|drum and| Vocal with
`ATTACK piano|trumpet|complex background
`
`music
`music
`|
`
`
`
`
`
`
`No attack
`2.56
`2.17
`
`170
`
`MPEG
`2.14
`1.51
`
`
`compression
`|
`
`
`Random
`2.08
`1.76
`
`
`cropping
`_|
`
`1.92
`1.7]
`
`Lowpass
`filtering
`
`
`
`=
`
`Table 4. The ratio between the correlation peak with the correct user ID and thelargest correlation in 1000 random
`trials.
`
`5.3. Blind Watermark Detection
We tested the robustness of the proposed blind watermark retrieval algorithm against several kinds of attacks, including additive noise, MPEG compression, random cropping, lowpass filtering, and resampling. The quality of watermark detection is evaluated by the ratio between the correlation value obtained with the correct user ID and the largest correlation obtained with 1000 other random user IDs. This ratio is summarized for the three test audio pieces in Table 4. Each kind of attack leads to a different decrease in the peak ratio. However, in all cases tested, the correlation peak of the correct user ID always stands out from the other correlation peaks. We have the following observations.
`
• Additive white noise.
White noise with 10% of the power of the audio signal is added. Noise at this level is clearly audible, but it causes only a moderate decrease in the peak ratio.

• MPEG compression.
In multimedia applications, lossy compression is a very common procedure for increasing transmission and storage efficiency. Some information is discarded during compression, creating a potential hazard for watermark detection. To test the robustness of the proposed approach to lossy compression, the watermarked audio signal is compressed and decompressed by an MPEG Layer III coder at a bit rate of 64 kbps. As shown in Table 4, this attack is more serious than the others. However, the watermark can still be detected correctly.

• Random cropping.
Randomly cropping one sample out of every 100 samples produces a disastrous synchronization problem for time-domain watermarking methods. However, the correlation peak ratio is only slightly decreased with the proposed method.

• Lowpass filtering.
For watermarks embedded in the frequency domain, lowpass filtering with a very low cutoff frequency could effectively eliminate the embedded watermark. However, since our watermark is embedded in the frequency bands with the highest energy, filtering out the inserted watermark also greatly affects the sound quality. In our experiment, a lowpass filter with a cutoff frequency of 4 kHz is applied to the watermarked audio signals. The loss of high-frequency components is clearly audible, but the correlation peak ratio decreases by only about 25%.
`
As shown in Table 4, the correlation peak ratios after the various attacks lie between about 1.5 and 2.5. These values could be increased if the watermark were embedded and retrieved everywhere in the audio signal, or if the original audio were used in watermark detection. However, the correlation ratios in Table 4 are already high enough for unambiguous watermark detection. The efficiency achieved by blind watermark detection and by embedding only in attack-sensitive regions is very important for the practical use of audio watermarks.
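The peak-ratio measure reported in Table 4 can be sketched generically: the detector output for the correct user ID is divided by the largest output among 1000 other randomly drawn IDs. Here `score` stands for any detector that maps an integer user ID to a correlation value; the ID range, the seed, and the helper name `peak_ratio` are assumptions, and the ratio is only meaningful when the largest competing score is positive, as it is in Table 4.

```python
import random

def peak_ratio(score, true_key, n_random=1000, seed=0):
    """Ratio of score(true_key) to the largest score among `n_random`
    distinct, randomly drawn other IDs (assumes that maximum is positive)."""
    rng = random.Random(seed)
    others = set()
    while len(others) < n_random:
        k = rng.randrange(2 ** 31)
        if k != true_key:
            others.add(k)
    return score(true_key) / max(score(k) for k in others)

# Toy detector: the correct ID scores 2.0, every other ID scores 1.0.
ratio = peak_ratio(lambda k: 2.0 if k == 7 else 1.0, true_key=7)
```

A ratio clearly above 1 means the correct user's correlation peak stands out from all competing peaks, which is the criterion used in the observations above.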
`
`6. CONCLUSION
`The rapid growth of multimedia technologies facilitates the production and transmission of digital media data.
It brings not only opportunities but also challenges to copyright protection. This paper presented an audio watermarking scheme that meets both the robustness and the low computational complexity requirements through audio content analysis. The analysis identifies attack-sensitive regions that are suitable for watermark insertion and provides consistent audio segmentation results before and after attacks. A modified dyadic wavelet filterbank
`is used to enhance the analysis results for complex music. After audio content analysis, a watermark embed