Robust and Efficient Digital Audio Watermarking Using Audio Content Analysis

Chung-Ping Wu, Po-Chyi Su and C.-C. Jay Kuo
Media Fair, Inc., 1055 Corporate Center Dr., Ste 580
Monterey Park, CA 91754
and
Department of Electrical Engineering-Systems
University of Southern California, Los Angeles, CA 90089-2564
E-mail: {chungpin,pochyisu,cckuo}@sipi.usc.edu

ABSTRACT
Digital audio watermarking embeds inaudible information into digital audio data for the purposes of copyright protection, ownership verification, covert communication, and/or auxiliary data carrying. In this paper, we first describe the desirable characteristics of digital audio watermarks. Previous work on audio watermarking, which has primarily focused on the inaudibility of the embedded watermark and its robustness against attacks such as compression and noise, is then reviewed. In this research, special attention is paid to the synchronization attack caused by casual audio editing or malicious random cropping, which is a low-cost yet effective attack on previously developed watermarking algorithms. A low-complexity digital audio watermarking scheme is proposed as an effective way to deter users from misusing or illegally distributing audio data. The proposed scheme is based on audio content analysis using a wavelet filterbank, while the watermark is embedded in the Fourier transform domain. A blind watermark detection technique is developed to identify the embedded watermark under various types of attacks.

Keywords: digital watermark, blind watermark detection, audio content analysis, synchronization attack, human auditory system, malicious cropping attack, wavelet

1. INTRODUCTION
Digital audio watermarking, the embedding and detection of an imperceptible signal in digital audio data, has received increasing attention recently. Among the various uses of digital audio watermarking, copyright protection is the application in highest demand. The fast growth of the Internet and the maturity of audio compression techniques enable the promising market of on-line music distribution. However, since digital technology allows lossless data duplication, illegal copying and distribution are much easier than before. This concern makes musical creators and distributors hesitant to step into this market quickly. Therefore, proper content protection technology is the key to the emergence of this new market.

Encryption and watermarking are the two most important content protection techniques. Encryption protects the content from anyone without the proper decryption key. It is useful in protecting the audio data from being intercepted during transmission. However, after the intended receiver decrypts it with the correct key, the audio data could be illegally distributed and misused. Watermarks, on the other hand, cannot be removed from audio data even by the intended receiver. The embedded watermark signal permanently remains in the audio data after repeated reproduction and redistribution. Thus, this signal could be used to protect the copyright of audio content by playback prohibition, illegal copy source tracing and ownership establishment.

Other applications of digital audio watermarking include data hiding for covert communication, auxiliary data embedding for audio content labeling, and modification detection for authentication. Data hiding can also be used to complement encryption, i.e., enhancing communication security by concealing the existence of sensitive data transmission. Embedded auxiliary data can carry lyrics or descriptions of the host audio data, or serve as links to external databases. The disappearance of a fragile watermark could indicate unauthorized modifications and be used for content integrity verification.

Different watermarking applications have different sets of requirements. Here, our discussion is focused on copyright protection because it has the most stringent requirement on the watermark's ability to survive intentional attacks. This is considered one of the most challenging issues in watermarking technology today. In contrast, users benefit from embedded label data and hackers are unaware of hidden communication data, so embedded watermarks in these two applications are generally not subject to malicious attacks.

In Security and Watermarking of Multimedia Contents II, Ping Wah Wong, Edward J. Delp, Editors, Proceedings of SPIE Vol. 3971 (2000) • 0277-786X/00/$15.00
Sony Exhibit 1021
Sony v. MZ Audio

This paper is organized as follows. The requirements for audio watermarking systems are described in Section 2. Previous work on audio watermarking is reviewed in Section 3. Our current work on salient point extraction and Fourier domain watermarking is presented in Section 4. Experimental results and their analysis are given in Section 5. Finally, concluding remarks are provided in Section 6.

2. REQUIREMENTS FOR AUDIO WATERMARKING SYSTEMS
In order for the embedded watermark to effectively protect the copyright of the digital audio data, it has been generally agreed that a good watermarking scheme should satisfy the following properties:

1. The embedded watermark should not produce audible distortion to the sound quality of the original audio.
2. The computation required by watermark embedding and detection should be low. The complexity of watermark detection should be especially low to facilitate its integration into consumer electronic products.
3. Watermark detection should be done without referencing the original audio data. This property is known as blind detection.
4. The watermark should be undetectable without prior knowledge of the embedded watermark sequence. This property prevents attackers from reversing the embedding process to remove the watermark.
5. The embedded watermark should be robust against common signal processing attacks such as filtering, resampling and compression.
6. The watermark should survive malicious attacks such as random cropping and noise adding. However, severe attacks that produce annoying noise can be ignored for the survival test.

3. PREVIOUS WORK ON AUDIO WATERMARKING
A variety of audio watermarking methods with very different characteristics have been proposed. They are reviewed in this section.

Early work on audio watermark embedding achieved inaudibility by placing watermark signals in perceptually insignificant regions. One popular choice was the higher frequency region, where human sensitivity declines compared to its peak around 1 kHz. In some systems, the watermark signal is high-pass filtered before being inserted into the original audio. In another system, the Fourier transform magnitude coefficients over the frequency range from 2.4 kHz to 6.4 kHz are replaced with the watermark sequence. In these systems, inaudibility is further enhanced by only embedding watermarks in audio segments whose low frequency components have a higher energy value. The strong low frequency signals in the original audio could help to mask the embedded high frequency watermark signal.

Another domain to which humans are insensitive is the Fourier transform phase coefficients. Human ears are relatively insensitive to phase distortions, and especially lack the ability to perceive the absolute phase value. One scheme proposed to substitute the phase of an initial audio segment with a reference phase that represents the watermark. The phase of subsequent segments is adjusted to preserve the relative phase between segments. In another system, selected Fourier transform phase coefficients in higher frequencies are discarded and new values are assigned based on neighboring reference coefficients. The watermark is represented by the relative phase between selected coefficients and their neighbors.

The problem with watermarking schemes that hide watermark signals in perceptually insignificant regions is that they are less robust to signal processing and malicious attacks. Compression algorithms do not preserve these regions well, and malicious hackers could implement stronger attacks in these regions without introducing annoying noise.
Another class of algorithms embeds watermarks as echo signals of the original audio. The inaudibility of echo hiding is based on the theory that resonance is so common in our environment that humans usually do not perceive it as noise. In these algorithms, watermark signals are actually delayed and attenuated versions of the original signal. The watermark sequence is represented by delay amounts, which are retrieved by observing autocorrelation peaks in the time domain or in the cepstrum domain.

Recently, some researchers have used a concept borrowed from spread spectrum communication and embedded the watermark as pseudo-random noise in the time domain. It is guaranteed by spread spectrum theory that the embedded watermark is statistically undetectable by hackers. Since human ears have different sensitivity to additive noise in different frequency bands, all proposed work uses some filter to spectrally shape the pseudo-random (white) noise and achieve inaudibility. A simple band-pass filter was used in one work, and a nonlinear filter was adopted in another. In yet another system, instead of filtering white noise, a scheme was developed to generate the band-limited pseudo-random watermark signal. The inaudibility of the embedded watermark could be further ensured by utilizing the masking effects of the human auditory system. One system used MPEG-I Audio Psychoacoustic Model 1 to spectrally shape the watermark signal, while another system used the masking model from MPEG-II AAC. Watermark detection is done by calculating the correlation between the watermarked audio signal and the watermark signal. Armed with spread spectrum communication theory, this type of watermarking usually survives well under distortions and attacks. However, synchronization is difficult to implement, and its computational cost is high.

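The correlation detector used by spread spectrum schemes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the pseudo-random sequence is a plain +1/-1 sequence seeded by a secret key, no spectral shaping or psychoacoustic masking is applied, and the host is white noise. The names `pn_sequence`, `embed` and `correlate` are hypothetical.

```python
import random

def pn_sequence(key, n):
    """Pseudo-random +1/-1 sequence derived from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(host, key, strength):
    """Add the key's pseudo-random sequence to the host in the time domain."""
    w = pn_sequence(key, len(host))
    return [h + strength * wi for h, wi in zip(host, w)]

def correlate(signal, key):
    """Average product of the signal and the key's pseudo-random sequence."""
    w = pn_sequence(key, len(signal))
    return sum(s * wi for s, wi in zip(signal, w)) / len(signal)

rng = random.Random(0)
host = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # stand-in for audio
marked = embed(host, key=1234, strength=0.1)

right = correlate(marked, key=1234)   # close to the embedding strength
wrong = correlate(marked, key=9999)   # close to zero
```

The detector needs sample-accurate alignment between `marked` and the regenerated sequence, which is why synchronization matters for this class of methods.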
Another trend in digital audio watermarking is to combine watermark embedding with the compression or modulation process. The integration could minimize unfavorable mutual interference between watermarking and compression, especially preventing the watermark from being removed by compression. In one scheme, watermark embedding is performed during vector quantization. The watermark is embedded by changing the selected code vector or changing the distortion weighting factor used in the searching process. The need for the original audio to extract the watermark greatly limits the applications of this scheme. Another algorithm embeds the watermark directly in the sigma delta modulation bitstream to eliminate the need to transform it into PCM data, thereby keeping the computational cost low. This is important to the sigma delta modulation system, where hardware savings are the main goal. In another scheme, watermarking is integrated with MPEG-II AAC compression. The watermark is embedded by modifying selected compression coefficients such as the scale factor.

4. PROPOSED ALGORITHM

Although the methods described in Section 3 have their own features and properties, they share one common problem: they are vulnerable to the synchronization attack in watermark detection. This problem can result from casual audio editing, such as cropping unwanted audio segments, or from intentional attacks, such as randomly deleting or adding samples to watermarked audio data. This random sample cropping attack is very effective in interfering with the watermark detection process of the algorithms mentioned above. It has a very low computational complexity and, when done correctly, does not introduce annoying noise to the underlying audio signals. One might argue that such a skillful attack could only be done by a few professionals and not by the majority of consumers. However, once a watermarking method is widely in use, it is almost certain that some professionals would produce and distribute attacking apparatuses so that a majority of common users would be able to perform the skillful attack. One method was proposed to solve the synchronization problem, where an exhaustive search algorithm was used and the original audio signal was required. Consequently, its computational complexity is too high, and the need for the original audio in watermark detection greatly limits its applications. Furthermore, it can only handle the casual editing attack, but not the random sample cropping attack.
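The effect of the random sample cropping attack on a time-domain correlation detector can be reproduced with a small Python sketch. The setup is an assumption made for illustration: a white-noise host, an unshaped +1/-1 watermark, and one randomly chosen sample deleted from every block of 100 samples. The helper names are hypothetical.

```python
import random

def crop_one_per_block(x, block, seed):
    """Delete one randomly chosen sample from every full block of samples."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        if len(chunk) == block:
            del chunk[rng.randrange(block)]
        out.extend(chunk)
    return out

def correlation(x, w):
    """Average product of x and w over their common length."""
    n = min(len(x), len(w))
    return sum(x[i] * w[i] for i in range(n)) / n

rng = random.Random(1)
w = [rng.choice((-1.0, 1.0)) for _ in range(10000)]
host = [rng.gauss(0.0, 1.0) for _ in range(10000)]
marked = [h + 0.2 * wi for h, wi in zip(host, w)]
attacked = crop_one_per_block(marked, block=100, seed=2)

before = correlation(marked, w)    # close to the embedding strength 0.2
after = correlation(attacked, w)   # collapses toward zero after cropping
```

Only 1% of the samples are removed, yet every sample after the first deletion is misaligned with the watermark sequence, so the correlation peak disappears.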
In this research, we propose a low-complexity solution to the synchronization problem caused by both casual and malicious attacks. The solution is composed of a salient point extraction technique and a Fourier transform domain watermark embedding procedure. Salient point extraction through audio content analysis is done during both the watermark embedding and detection processes so that synchronization is regained at each salient point. The extraction algorithm is designed such that salient points remain stable after distortion. Fourier transform domain watermark embedding and detection is adopted since frequency domain information is less affected by sample cropping in the time domain.

One common characteristic among most existing audio watermarking algorithms is that their watermark is embedded throughout the entire audio signal. However, this may not be the most efficient way to embed and detect watermarks. A skilled attacker could apply different amounts of attack to different segments of the audio signal to avoid introducing annoying noise. For example, randomly cropping (deleting) one sample out of every 100 samples in high energy tonal segments of audio signals would produce noticeable noise, but the effect of doing so in low energy segments would be inaudible. Thus, watermarks embedded in highly attackable areas will face heavier attack and are more likely to be destroyed. The second major contribution of this work is the introduction of "attack-sensitive regions" via audio content analysis. If the watermark is only embedded in attack-sensitive regions where little attack could be applied, the computational complexity of both watermark embedding and detection could be reduced.

Figure 1. Illustration of the correspondence between music notes and frequency values, and the 5-subband partition adopted in this work.

By combining the techniques of salient point extraction, attack-sensitive region identification, and Fourier transform domain watermark embedding and detection, we propose a complete audio watermark embedding and detection system for copyright protection. This system satisfies all the desired properties of watermark design described earlier. Furthermore, it has a very low computational complexity, and it is robust to casual and intentional synchronization attacks. Although we incorporate the concepts of salient point extraction and attack-sensitive regions into our own watermark embedding method here, it is our belief that other watermark embedding algorithms will benefit from the same concepts as well.

4.1. Audio Content Analysis for Watermarking
In our system, audio content analysis is performed for the purposes of salient point extraction and attack-sensitive region identification. Salient points in an audio signal allow watermark detection to resynchronize at these locations. Synchronization by salient points has far less complexity than exhaustive search and makes blind watermark detection possible. It should be noted that we do not insert salient points, but extract them from the raw audio via content analysis. This approach has two advantages over explicitly embedding synchronization signals. One is that our content analysis approach does not introduce any distortion to the original audio signal since we do not add anything to it. The other is that an explicitly added synchronization signal is more likely to be removed by attackers.

A good salient point extraction method should produce approximately the same set of salient points from audio signals before and after attacks such as audio compression, low-pass filtering and noise adding. To achieve this, we extract salient points based on audio features to which human ears are sensitive. In this way, if an attacker wants to destroy these salient points, he/she would have to alter these features and produce noticeable distortions. We choose the energy variation as the main feature for salient point extraction because the associated computational cost is low and alterations in this feature would be audible.

The basic scheme is to extract salient points as locations where the audio signal energy is climbing fast to a peak value. While this approach works well for simple music pieces with few instruments, it has two problems with more complex music pieces. The first problem is that the overall energy variation becomes ambiguous for complicated music where many instruments are played together. Thus, the stability of salient points decreases. The other problem is that optimal threshold values are different for music pieces with different complexity. While a high threshold value is suitable for music with sharp energy variation, the application of the same value to complex music would yield very few salient points.

Figure 2. A 6-level dyadic wavelet decomposition, where each branch in (a) represents the structure in (b) and outputs are numbered corresponding to subbands in Fig. 1.

Figure 3. The effect of frequency inversion due to downsampling after high-pass filtering: (a) the spectrum before filtering, (b) the spectrum after high-pass filtering, (c) the spectrum after downsampling, where the highest frequency in (a) is now mapped to the lowest frequency.
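The basic energy-climb scheme can be sketched as follows. The paper does not give the exact rule, so this Python example rests on assumptions: energy is measured over non-overlapping frames, and a salient point is declared at the start of any frame whose energy exceeds that of the previous frame by a fixed ratio. The function names and the values of `frame`, `ratio` and `floor` are hypothetical.

```python
import math

def frame_energy(x, frame):
    """Energy of consecutive non-overlapping frames of x."""
    return [sum(v * v for v in x[i:i + frame])
            for i in range(0, len(x) - frame + 1, frame)]

def salient_points(x, frame=64, ratio=4.0, floor=1e-6):
    """Sample indices where frame energy jumps by more than `ratio`."""
    energy = frame_energy(x, frame)
    points = []
    for k in range(1, len(energy)):
        # `floor` avoids flagging jumps out of near-digital silence.
        if energy[k] > ratio * (energy[k - 1] + floor):
            points.append(k * frame)
    return points

# A quiet tone followed by a loud tone: the onset is at sample 640.
quiet = [0.001 * math.sin(2 * math.pi * i / 16) for i in range(640)]
loud = [math.sin(2 * math.pi * i / 16) for i in range(640)]
points = salient_points(quiet + loud)
```

A single fixed `ratio` behaves differently on simple and complex material, which is the threshold problem that motivates the subband analysis described in the text.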
Therefore, it is beneficial to parse complex music into several simpler components so that the stability of salient points is improved and the same threshold can be applied to all music pieces. Complex music is usually composed of instruments whose fundamental frequencies occupy different frequency ranges in order to form harmony. Figure 1 illustrates the correspondence between music notes and frequency values. It also shows the partition in our design, which consists of 5 frequency ranges. Note that the frequency width of each octave is not the same. The frequency intervals in Figure 1 correspond to outputs of a 6-level dyadic wavelet decomposition under a sampling rate of 44.1 kHz, as shown in Figure 2.
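The band edges of an ideal dyadic decomposition follow directly from the sampling rate: each level halves the remaining low band. The Python sketch below computes them for 6 levels at 44.1 kHz. It assumes ideal half-band filters, and the function name is hypothetical. Which of these outputs form the 5 subbands of Figure 1 is not recoverable from the text, so the sketch lists all of them.

```python
def dyadic_band_edges(sample_rate, levels):
    """(low, high) edges in Hz of each high-pass output, then the final low band."""
    nyquist = sample_rate / 2.0
    bands = []
    for level in range(1, levels + 1):
        bands.append((nyquist / 2 ** level, nyquist / 2 ** (level - 1)))
    bands.append((0.0, nyquist / 2 ** levels))
    return bands

bands = dyadic_band_edges(44100, 6)
for low, high in bands:
    print(f"{low:10.2f} Hz to {high:10.2f} Hz")
```

Each band spans one octave, which matches the remark that the frequency width of each octave is not the same.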
In order to prevent the frequency inversion effect due to the application of downsampling to the output of high-pass filtering, as shown in Figure 3, we modify the dyadic wavelet decomposition of Figure 2 into Figure 4 by eliminating the downsampling step after each high-pass filtering. Thus, salient points are extracted separately from each of the 5 outputs in Figure 4.

Figure 5. The effect of salient point displacement on the discrete Fourier transform domain watermarking.

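The frequency inversion effect can be checked numerically. In the Python sketch below, a sinusoid in the upper half band is downsampled by two, and the result is compared with a sinusoid at the mirrored frequency. Frequencies are expressed as fractions of the Nyquist frequency; the specific value 0.9 and the function names are illustrative assumptions.

```python
import math

def downsample_by_two(x):
    """Keep every other sample."""
    return x[::2]

def inverted_fraction(f_high):
    """Map a frequency in the upper half band (fraction of the original
    Nyquist) to its location after downsampling (fraction of the new Nyquist)."""
    return 2.0 - 2.0 * f_high

f_high = 0.9  # near the top of the high-pass band
x = [math.cos(math.pi * f_high * n) for n in range(64)]
y = downsample_by_two(x)

f_low = inverted_fraction(f_high)  # about 0.2 of the new Nyquist frequency
expected = [math.cos(math.pi * f_low * m) for m in range(len(y))]
max_error = max(abs(a - b) for a, b in zip(y, expected))
```

The top of the original band (fraction 1.0) maps to 0, and the bottom of the high-pass band (fraction 0.5) maps to 1.0, which is the inversion illustrated in Figure 3.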
The procedure of attack-sensitive region identification aims at decreasing the watermark embedding and detection complexity. Thus, it is important that the identification process itself does not require too much computation. In this work, we integrate the attack-sensitive region identification process into the salient point extraction process so that almost no extra computation is needed. The attack that we are mainly concerned with is the random sample cropping attack. The corresponding attack-sensitive region is the high-energy tonal region. Since salient points chosen with our algorithm are located at positions where the audio signal energy is climbing fast to a peak, the region following each salient point contains high energy. We simply define this region as the attack-sensitive region, so that no additional computation is needed.

4.2. Fourier Transform Domain Watermark Embedding and Detection
Although salient points are selected to be as stable as possible, it is difficult to get exactly the same salient points after audio processing such as compression. A certain amount of displacement in the location of salient points is common and should be tolerated. If we embed and detect the watermark in the time domain, even a small amount of displacement causes a problem since embedding and detection cannot be synchronized. However, this problem is alleviated by considering the magnitude coefficients of the discrete Fourier transform.

This property is illustrated in Figure 5, where $a(i)$, $i = 1, \ldots, 2^p$, is the watermarked region. The watermark is embedded in $|A(k)|$, $k = 1, \ldots, 2^p$, where $A(k)$ is the discrete Fourier transform coefficient of $a(i)$. Suppose that the salient point is displaced in the detection process, and the watermarked region is mistaken to be another region $b(i)$, $1 \le i \le 2^p$. It is a well-known property that if $c(i)$ is formed by moving the right-most part of $a(i)$ to the left-most part, then $c(i)$ and $a(i)$ have identical discrete Fourier transform magnitude coefficients, i.e.

$$|C(k)| = |A(k)|, \quad k = 1, \ldots, 2^p. \tag{1}$$

Let us denote the difference between $b(i)$ and $c(i)$ by

$$d(i) = c(i) - b(i), \quad i = 1, \ldots, 2^p. \tag{2}$$

Then, we have

$$|B(k)| \le |C(k)| + |D(k)| = |A(k)| + |D(k)|, \quad k = 1, \ldots, 2^p. \tag{3}$$

Thus, from (3), we see that the error caused by the displaced salient point is bounded by $|D(k)|$. There is no disastrous mis-synchronization effect in the frequency domain. When the displacement amount is small relative to the window size, the energy in $|D(k)|$ is small.
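Equations (1) to (3) can be verified numerically with a short Python sketch. The setup is an illustrative assumption: a white-noise signal, a window of 64 samples, and a salient point displaced by 3 samples toward earlier time. The DFT is computed by a direct sum to keep the example dependency-free, and the bins are indexed from 0 rather than 1.

```python
import cmath
import math
import random

def dft_magnitudes(x):
    """Magnitudes of the discrete Fourier transform of x (direct O(n^2) sum)."""
    n = len(x)
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n)]

rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(200)]
size, start, shift = 64, 50, 3

a = signal[start:start + size]                  # region used at embedding
b = signal[start - shift:start - shift + size]  # region seen after displacement
c = a[-shift:] + a[:-shift]                     # right-most part of a moved to the left
d = [ci - bi for ci, bi in zip(c, b)]           # equation (2)

A, B, C, D = (dft_magnitudes(v) for v in (a, b, c, d))
max_gap = max(abs(ck - ak) for ck, ak in zip(C, A))                    # equation (1)
bound_holds = all(bk <= ak + dk + 1e-9 for ak, bk, dk in zip(A, B, D))  # equation (3)
```

Only the first `shift` samples of `d` are nonzero, so the energy of `D` shrinks as the displacement becomes small relative to the window size.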
In order for the embedded watermark to be inaudible, it is common to utilize the temporal and frequency masking effects of the human auditory system (HAS). Temporal masking refers to the effect that weaker signals immediately before and after a stronger signal may be inaudible, while frequency masking refers to the effect that when two signals occur simultaneously and are close together in frequency, the stronger signal may make the weaker one inaudible.

Since our watermark is only embedded in attack-sensitive regions, which have a high energy value, the temporal masking effect is used. That is, the weak-energy watermark is masked by the high energy audio samples in these regions. To take advantage of the frequency masking effect as well, the proposed scheme only embeds the watermark signal in the magnitude of the discrete Fourier transform coefficients that have large values.
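A possible form of magnitude-domain embedding is sketched below in Python. The paper does not state the embedding rule, so everything specific here is an assumption: the watermark scales the largest-magnitude bins multiplicatively by `1 + alpha * w`, phases are left untouched, and conjugate symmetry is restored so that the output stays real. The names `embed_in_largest` and `alpha` are hypothetical.

```python
import cmath
import math
import random

def dft(x):
    """Discrete Fourier transform by direct summation."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def idft_real(X):
    """Real part of the inverse discrete Fourier transform."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n)) / n).real
            for i in range(n)]

def embed_in_largest(region, watermark, alpha):
    """Scale the len(watermark) largest-magnitude bins (1 .. n/2 - 1) by 1 + alpha*w."""
    n = len(region)
    X = dft(region)
    ranked = sorted(range(1, n // 2), key=lambda k: abs(X[k]), reverse=True)
    chosen = sorted(ranked[:len(watermark)])
    for k, w in zip(chosen, watermark):
        X[k] *= 1.0 + alpha * w
        X[n - k] = X[k].conjugate()  # keep the time-domain signal real
    return idft_real(X), chosen

rng = random.Random(3)
region = [rng.gauss(0.0, 1.0) for _ in range(32)]
watermark = [1.0, -1.0, 1.0, -1.0]
marked, chosen = embed_in_largest(region, watermark, alpha=0.1)
```

A blind detector would have to re-select the large-magnitude bins from the received audio, and the selection may differ slightly after an attack; this sketch does not address that issue.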
Watermark detection is done by calculating the average correlation coefficient between the watermark sequence and the watermarked audio signal in the Fourier transform domain and comparing it with a threshold. The criterion for selecting the threshold is to minimize the expected cost of detection errors. Note that the cost of a miss (i.e., failure to detect when there is a watermark) is different from the cost of a false alarm (i.e., claiming a detection when there is no watermark). Although these costs vary in different applications, it is generally true that the cost of a false alarm is much greater than the cost of a miss. The false alarm rate should be extremely low because a false alarm undermines the credibility of the watermarking method as proof of copyright ownership. In contrast, the constraint on the miss (or failure-to-detect) rate need not be so stringent, since a failure-to-detect rate of 1% or 10% might have a similar effect in deterring people from illegally copying audio data. To conclude, the detection threshold should be set relatively high to ensure that no false detection happens.

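The detection rule described above can be sketched in Python as an average of per-region correlation coefficients compared with a threshold. The use of the Pearson correlation coefficient, the threshold value 0.5, and the function names are assumptions made for illustration; the toy magnitude lists stand in for Fourier magnitudes measured in attack-sensitive regions.

```python
import math

def correlation_coefficient(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def detect(region_magnitudes, watermark, threshold):
    """Average the per-region correlation and compare it with the threshold."""
    scores = [correlation_coefficient(m, watermark) for m in region_magnitudes]
    average = sum(scores) / len(scores)
    return average > threshold, average

watermark = [1.0, -1.0, 1.0, -1.0]
marked_regions = [[1.1, 0.9, 1.1, 0.9], [2.2, 1.8, 2.2, 1.8]]
unmarked_regions = [[1.0, 1.0, 1.0, 1.0]]

found, score = detect(marked_regions, watermark, threshold=0.5)
false_alarm, null_score = detect(unmarked_regions, watermark, threshold=0.5)
```

Raising `threshold` lowers the false alarm rate at the price of more misses, which is the trade-off the text resolves in favor of a high threshold.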
5. EXPERIMENTAL RESULTS
The inaudibility and robustness of the proposed watermarking scheme are demonstrated with three pieces of audio: a piano concerto by Bach with only a single piano, the symphony "Bolero" by Ravel with trumpet and drums, and a song with human vocal and complex background music. All signals are sampled at a frequency of 44.1 kHz, and each piece is about 30 seconds long.

5.1. Audio content analysis
The effectiveness of the proposed audio content analysis is measured by its ability to extract the same set of salient points from audio signals before and after signal attack and/or processing. An example of the comparison between the salient points extracted from the original and processed files is shown in Table 1. As we can see from this example, almost every salient point is shifted by a few samples. However, as explained in Section 4.2, this does not cause a catastrophic effect on watermark detection. Empirically, a displacement of less than 100 samples produces very little decrease in the average correlation coefficient in watermark detection. Therefore, it should be viewed as successful salient point extraction. Some salient points may disappear and some may be created after processing. However, these phenomena also cause only a marginal deterioration in detection results.

The success rates of correctly extracted salient points with and without the wavelet filterbank are compared in Table 2. The attack used in Table 2 is MP3 compression/decompression. In our experiments, distortions such as additive noise, low-pass filtering and downsampling cause much less salient point displacement than MP3 compression/decompression. It is observed that the more complex the music piece, the lower the success rate. However, the use of wavelet decomposition is most effective in raising the success rate of salient point extraction from complex music.

  Salient point location     Salient point location     Shift amount
  extracted from             extracted from             between
  original file              distorted file             two files
  -----------------------------------------------------------------
  14463                      none                       -
  21092                      21063                      29
  28657                      28651                      6
  44152                      44104                      48
  271803                     271803                     0
  297097                     297097                     0
  335700                     335700                     0
  351030                     351003                     27
  384255                     384259                     -4
  389912                     389914                     -2
  391882                     391884                     -2
  399407                     399526                     -119
  406960                     none                       -
  429936                     none                       -
  444820                     444795                     25
  460640                     460643                     -3
  [remaining rows illegible]

Table 1. Comparison between salient points extracted from original and processed audio files, where rows printed in the bold type are regarded as failures, and the success rate in this example is 78.5%.

  Test audio                            Success rate       Success rate       Success rate
                                        without wavelet    using wavelet      increase
                                        filterbank         filterbank
  -----------------------------------------------------------------------------------------
  single piano                          83.3%              83.6%              0.3%
  drum and trumpet music                71.4%              77.3%              5.9%
  vocal with complex background music   63.0%              73.1%              10.1%

Table 2. The success rate of correct salient point extraction after the cascade of three MPEG Layer III compression/decompression operations with a bit rate of 64 kbps.

5.2. Watermark Embedding

The quality of the proposed watermarking method is evaluated with a blind listening test. Listeners are presented with the original and watermarked audio without knowing which one is watermarked, and are asked to tell which one has better sound quality. We do not ask "whether any differences could be detected between the two audio signals" since people tend to imagine a difference even when they cannot actually hear any. In fact, several listeners reported that the audio signals were different when the same audio clip was played twice.

Eleven people took the listening test, and the percentage of listeners preferring the original audio to the watermarked audio is given in Table 3. The result shows that about one half of the listeners preferred the watermarked audio to the original. Therefore, no audible distortion is introduced by the embedded watermark.

Table 3. The blind listening test of watermarked audio pieces. For each test audio piece (single piano, drum and trumpet music, vocal with complex background music), the table gives the percentage of listeners who preferred the original to the watermarked audio; each reported value is either 45.5% or 54.5%.

  Attack              Single piano   Drum and trumpet   Vocal with complex
                                     music              background music
  ---------------------------------------------------------------------------
  No attack           2.56           2.17               1.70
  MPEG compression    2.14           1.51               [illegible]
  Random cropping     2.08           1.76               [illegible]
  Lowpass filtering   1.92           1.71               [illegible]

Table 4. The ratio between the correlation peak with the correct user ID and the largest correlation in 1000 random trials.

5.3. Blind Watermark Detection
We tested the robustness of the proposed blind watermark retrieval algorithm against several kinds of attacks, including additive noise, MPEG compression, random cropping, low-pass filtering, and resampling. The quality of watermark detection is evaluated by the ratio between the correlation value obtained from the correct user ID and the largest correlation obtained from 1000 other random user IDs. These ratios are summarized for the three test audio pieces in Table 4. Each kind of attack leads to a different amount of decrease in this peak ratio. However, in all cases tested, the correlation peak of the correct user ID always stands out from the remaining correlation peaks. We have the following observations.

• Additive white noise.
White noise with 10% of the power of the audio signal is added. Noise of this level is clearly audible, but only causes a moderate decrease in the peak ratio.

• MPEG compression.
In multimedia applications, lossy compression is a very common procedure to increase transmission and storage efficiency. Some information is thrown away during the compression process, thus creating a potential hazard for watermark detection. To test the robustness of the proposed watermarking approach to lossy compression, the watermarked audio signal is compressed and decompressed by an MPEG Layer III coder with a bit rate of 64 kbps. As shown in Table 4, this attack is more serious than the others. However, the watermark can still be detected correctly.

• Random cropping.
Randomly cropping one sample out of every 100 samples produces a disastrous synchronization problem for time-domain watermarking methods. However, the correlation peak ratio is only slightly decreased with the proposed method.

• Lowpass filtering.
With watermarks embedded in the frequency domain, lowpass filtering with a very low cutoff frequency could effectively eliminate the embedded watermark. However, since our watermark is embedded in the frequency bands with the highest energy, filtering out the inserted watermark also greatly affects the sound quality. In our experiment, a lowpass filter with a cutoff frequency of 4 kHz is applied to the watermarked audio signals. The loss of high frequency components is clearly audible, but the correlation peak ratio is only decreased by around 25%.

As shown in Table 4, the correlation peak ratios after various kinds of attacks are scattered between 1.5 and 2.5. These values could be increased if the watermark were embedded and retrieved everywhere in the audio signal, or if the original audio were used in watermark detection. However, the correlation ratio in Table 4 is already high enough for unambiguous watermark detection. The efficiency achieved by blind watermark detection and by embedding in attack-sensitive regions only is very important for the practical use of audio watermarks.

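The peak ratio used as the quality measure in this section can be sketched in Python as follows. The example is a toy under stated assumptions: each user ID seeds a +1/-1 watermark sequence, the "features" are white noise plus the correct watermark, and the correlation is a plain average product. The function names, the sequence length of 500 and the embedding strength of 1.0 are hypothetical and chosen only to keep the example fast.

```python
import random

def watermark_for(user_id, length):
    """Pseudo-random +1/-1 sequence derived from a user ID."""
    rng = random.Random(user_id)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def correlation(a, b):
    """Average product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

def peak_ratio(features, correct_id, trials=1000, seed=0):
    """Correlation for the correct ID divided by the largest correlation
    obtained from `trials` other randomly drawn IDs."""
    length = len(features)
    correct = correlation(features, watermark_for(correct_id, length))
    rng = random.Random(seed)
    largest = float("-inf")
    for _ in range(trials):
        other_id = rng.randrange(10 ** 9)
        if other_id == correct_id:
            continue  # only compare against other IDs
        largest = max(largest, correlation(features, watermark_for(other_id, length)))
    return correct / largest

rng = random.Random(7)
mark = watermark_for(42, 500)
features = [rng.gauss(0.0, 1.0) + m for m in mark]
ratio = peak_ratio(features, correct_id=42)
```

A ratio well above 1 means the correct user ID stands out from the random ones, which is the sense in which the values in Table 4 support unambiguous detection.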
6. CONCLUSION
The rapid growth of multimedia technologies facilitates the production and transmission of digital media data. It brings us not only opportunities but also challenges to copyright protection. An audio watermarking scheme which meets both the robustness and the low computational complexity requirements via audio content analysis was presented in this paper. The analysis identifies attack-sensitive regions which are suitable for watermark insertion, and provides consistent audio segmentation results before and after attacks. A modified dyadic wavelet filterbank is used to enhance the analysis results for complex music. After audio content analysis, a watermark embed
