ROBUST AND HIGH-QUALITY TIME-DOMAIN AUDIO WATERMARKING SUBJECT TO PSYCHOACOUSTIC MASKING

Wen-Nung Lie¹ and Li-Chun Chang²

¹Department of Electrical Engineering, National Chung Cheng University, Chia-Yi, 621, Taiwan, ROC.
E-mail: wnlie@ee.ccu.edu.tw
²Multimedia Lab., Institute for Information Industry, Taiwan, ROC.
`
ABSTRACT

We propose in this paper a new method for embedding digital watermarks into audio signals in the time domain. By testing frequency-domain characteristics (i.e., the psychoacoustic model) and making appropriate adjustments, our algorithm keeps the watermark disturbance below the level of human perception. A watermark can be extracted without knowledge of the original audio signal. Experiments show that our watermarking scheme yields good audio quality (comparable to the original, according to subjective tests) and is robust (survival rate above 98%) to pirate attacks such as MP3 compression, low-pass filtering, amplitude normalization, DA/AD same-rate reacquisition, and cropping.
`
1. INTRODUCTION

In the past few years, digital multimedia technology has made much progress and found practical applications, especially in MPEG-1 Layer 3 (MP3) [13], a new-generation standard for digital audio compression. Its compression ratio is normally between 1:10 and 1:12, which saves a lot of storage space for digital music and makes it convenient to distribute over the Internet. The popularity of MP3 consequently lets people search for music on the Internet easily. At the same time, however, this convenience also gives pirates the opportunity to copy others' original works without paying. The problem of copyright protection needs to be resolved urgently so that owners are willing to put their MP3 works on the Internet for wider popularity.
Digital watermarking, which has been a popular research issue in recent years, is a process that embeds proprietary data (often a signature, a logo, or simply an ID number) into a multimedia object to help protect the owner's rights. In order for a digital watermark to be effective and practical, it should possess the following properties:
(1) imperceptibility: embedding of this extra data should not degrade human perception of the object,
(2) security: there should be the least possibility of removing or detecting the watermarks, even for those who know the principle of the embedding algorithm,
(3) robustness: embedded watermarks should not be removed or eliminated by unauthorized distributors using common manipulations such as lossy compression, linear or non-linear filtering, cropping, etc.
Most of the literature has focused on image and video watermarking. However, similar principles can also be applied to audio watermarking. Most existing audio watermarking algorithms add their watermark labels in the time domain [1,2,5,7,9-11], some in the frequency domain [3,4,8,12,15], and a few in the compressed domain [6]. The basic approach to watermarking in the time domain is to first select a pseudo-random noise sequence (i.e., the PN sequence) and then embed it into the host audio by modifying the amplitudes accordingly. Increasing the amount of watermark disturbance generally enhances robustness against attacks, but also degrades audio quality significantly.
In this paper, we propose a new method, based on the relative energy relations between adjacent sample sections, to embed digital watermarks into a host audio signal in the time domain. Our embedding process is subject to the human auditory system (HAS) in such a way that disturbances to the host audio signal are beyond the sensing of human ears (according to subjective testing). Also, the watermarks can be extracted without knowledge of the original host signal and are robust to a number of intentional attacks, including filtering, MP3 lossy compression, amplitude normalization, D/A & A/D re-acquisition, and signal cropping.
`
2. WATERMARK EMBEDDING SCHEME

The proposed watermark embedding algorithm is based on the relative energy relations between three consecutive sample sections. First, the whole audio signal f(x) is partitioned into sample sections, each of length L. Denote any three consecutive sections by sec_1, sec_2, and sec_3, as shown in Fig. 1. Their energies E_1, E_2, and E_3 are defined and computed as

$$E_1 = \sum_{x=x_1}^{x_1+L-1} |f(x)| \tag{1}$$

$$E_2 = \sum_{x=x_1+L}^{x_1+2L-1} |f(x)| \tag{2}$$

$$E_3 = \sum_{x=x_1+2L}^{x_1+3L-1} |f(x)| \tag{3}$$

where x_1 is the starting sample index of sec_1. After sorting E_1, E_2, and E_3 by magnitude, we re-denote them as E_max, E_mid, and E_min. Computing the energy differences, we obtain

$$A = E_{max} - E_{mid} \tag{4}$$

$$B = E_{mid} - E_{min} \tag{5}$$
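As an illustration of the section energies and the differences A and B, here is a minimal Python sketch. It assumes that a section's energy is the sum of absolute amplitudes over the section (if squared magnitudes are intended, replace `abs(v)` with `v * v`), and it uses 0-based indexing for the start index. The function names `section_energies` and `energy_differences` are hypothetical and do not come from the paper.

```python
def section_energies(f, x1, L):
    """Energies (E1, E2, E3) of three consecutive length-L sections starting at x1.

    Assumption: energy is the sum of absolute amplitudes in the section.
    """
    return tuple(
        sum(abs(v) for v in f[x1 + k * L : x1 + (k + 1) * L])
        for k in range(3)
    )


def energy_differences(energies):
    """Sort the three energies and return (A, B) = (Emax - Emid, Emid - Emin)."""
    e_min, e_mid, e_max = sorted(energies)
    return e_max - e_mid, e_mid - e_min


if __name__ == "__main__":
    toy = [1, -1, 2, -2, 4, -4]          # three sections of length L = 2
    E = section_energies(toy, 0, 2)
    print(E, energy_differences(E))      # prints: (2, 4, 8) (4, 2)
```

In this toy signal A = 4 exceeds B = 2, which is the relation used to represent a watermark bit "1".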
`
We embed one information bit in every three sections, according to the relation between A and B. The watermark can originally be text data, a random data stream, or a logo image. It is then transformed into a binary bit stream and embedded as follows.

(1) for watermark bit "1":
    If A - B ≥ (E_max + 2E_mid + E_min)·d' [16]
    Then no operation
    Else increase E_max or reduce E_mid until A - B ≥ (E_max + 2E_mid + E_min)·d'
(2) for watermark bit "0":
    If B - A ≥ (E_max + 2E_mid + E_min)·d' [16]
    Then no operation
    Else increase E_mid or reduce E_min until B - A ≥ (E_max + 2E_mid + E_min)·d'

where d' is a parameter determining the difference margin that A should have with respect to B to represent "0" or "1" (and thus also the amount of watermark energy to be added or subtracted). In both cases (1) and (2), the energies E_min, E_mid, or E_max are modified by scaling (up or down) the signal amplitudes of the corresponding sample sections, that is, f'(x) = w·f(x). The scale factor w depends on the choice of d' [16]. On the other hand, d' should be properly limited so as to maintain the quality of the watermarked audio signal.

0-7803-6685-9/01/$10.00 ©2001 IEEE
`
[Figure: amplitude f(x) versus samples x, divided into three consecutive sections sec_1, sec_2, and sec_3, each of length L, with their section energies.]

Fig. 1. Use of three consecutive sample sections for watermark embedding.
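The embedding rule of this section can be sketched in Python as below. This is only one possible reading, not the authors' implementation: the paper defers the scale factor w to [16], so the closed-form factors here are derived under stated assumptions. The sketch assumes energy is the sum of absolute amplitudes (so scaling a section by w scales its energy by w), raises only E_max for bit "1", and for bit "0" raises E_mid (never above E_max, so the ordering is preserved) and lowers E_min only if that is not enough. The function name `embed_bit` and the 0-based index are hypothetical.

```python
def embed_bit(f, x1, L, bit, d=0.05):
    """Return a copy of f with `bit` embedded in the three sections starting at x1.

    Assumptions (not from the paper): energy = sum of |f(x)|; only Emax is
    raised for bit 1; for bit 0, Emid is raised (at most to Emax) and Emin is
    lowered if needed. Requires 0 < d < 1/3.
    """
    spans = [(x1 + k * L, x1 + (k + 1) * L) for k in range(3)]
    energy = [sum(abs(v) for v in f[a:b]) for a, b in spans]
    i_min, i_mid, i_max = sorted(range(3), key=lambda k: energy[k])
    e_min, e_mid, e_max = energy[i_min], energy[i_mid], energy[i_max]
    w = [1.0, 1.0, 1.0]                      # per-section scale factors

    if bit == 1:
        # A - B >= (Emax + 2Emid + Emin) d  <=>  Emax >= target_max
        target_max = (2 * (1 + d) * e_mid - (1 - d) * e_min) / (1 - d)
        if e_max < target_max:
            w[i_max] = target_max / e_max
    else:
        # B - A >= (Emax + 2Emid + Emin) d  <=>  Emid >= target_mid
        target_mid = (1 + d) * (e_max + e_min) / (2 * (1 - d))
        if e_mid < target_mid:
            if e_mid == 0:
                raise ValueError("cannot embed bit 0 in silent sections")
            new_mid = min(target_mid, e_max)
            w[i_mid] = new_mid / e_mid
            if target_mid > e_max:
                # Emid is capped at Emax, so lower Emin to close the gap.
                new_min = (1 - 3 * d) * e_max / (1 + d)
                w[i_min] = new_min / e_min

    out = list(f)
    for k, (a, b) in enumerate(spans):
        out[a:b] = [w[k] * v for v in f[a:b]]   # f'(x) = w * f(x)
    return out
```

The sketch applies a constant factor to a whole section, so by itself it would create the boundary discontinuities that Section 4 addresses with progressive weighting.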
`
3. WATERMARK EXTRACTION SCHEME

In our algorithm, watermarks can be extracted without knowledge of the original audio signal. Assuming that the start point for data embedding has been aligned, every three sample sections of the same length L can be tested for de-watermarking. Denote the section energies by E'_1, E'_2, and E'_3, computed as in Eqs. (1)~(3). They are ordered to obtain E'_max, E'_mid, and E'_min, from which we compute

$$A' = E'_{max} - E'_{mid} \tag{6}$$

$$B' = E'_{mid} - E'_{min} \tag{7}$$

Comparing A' and B', we get the retrieved bit "1" if A' > B', and bit "0" otherwise. This process is repeated every three sections to retrieve the whole watermark bit stream.
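The blind extraction rule can be sketched in a few lines of Python. The sketch assumes energy is the sum of absolute amplitudes per section and that the start index is already aligned (0-based here). The names `extract_bit` and `extract_bits` are hypothetical.

```python
def extract_bit(y, x1, L):
    """Retrieve one bit from the three length-L sections of y starting at x1."""
    e_min, e_mid, e_max = sorted(
        sum(abs(v) for v in y[x1 + k * L : x1 + (k + 1) * L]) for k in range(3)
    )
    a = e_max - e_mid          # A'
    b = e_mid - e_min          # B'
    return 1 if a > b else 0


def extract_bits(y, L, start=0):
    """Retrieve one bit per group of three sections, from `start` to the end."""
    groups = (len(y) - start) // (3 * L)
    return [extract_bit(y, start + 3 * L * i, L) for i in range(groups)]
```

Because only the ordering of the three energies and the comparison of A' with B' matter, multiplying the whole signal by a positive constant leaves the extracted bits unchanged, which is consistent with the robustness to amplitude normalization reported in the experiments.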
`
4. CONTINUITY OF AUDIO WAVEFORM

The basic embedding scheme of Section 2 is not feasible in practice as it stands, since discontinuities at the boundaries between adjacent sections (notice that only one out of three sample sections undergoes amplitude modification) cause significant audio quality defects. To cope with this problem, progressive weighting near section boundaries, as shown in Fig. 2, is adopted so as to have continuous weights (and thus a continuous waveform) thereabout. Since the embedded bit depends only on the total energy added to or subtracted from the whole section, gradually increasing or decreasing the weights near section boundaries does not influence the bit information that the section represents. As a consequence, the original audio quality can be retained.
`
[Figure: weighting versus samples, in two panels (a) and (b).]

Fig. 2. Progressive weighting curves near section boundaries: (a) the scale-up case, (b) the scale-down case.
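One way to realize progressive weighting is sketched below. The exact curve shape of Fig. 2 is not recoverable here, so a linear ramp (trapezoid) is assumed, and energy is assumed to be the sum of absolute amplitudes. The plateau height is solved so that the section's total energy is still scaled by the target factor w, which keeps the embedded bit intact. The names `ramp_profile`, `progressive_weights`, and the `ramp` parameter are hypothetical.

```python
def ramp_profile(L, ramp):
    """Trapezoid in (0, 1]: rises over `ramp` samples, stays flat, then falls."""
    return [min(1.0, (i + 1) / (ramp + 1), (L - i) / (ramp + 1)) for i in range(L)]


def progressive_weights(section, w, ramp):
    """Per-sample weights that are near 1 at the section edges.

    The weighted section has total energy w * (original energy), assuming
    energy = sum of |f(x)|.
    """
    t = ramp_profile(len(section), ramp)
    energy = sum(abs(v) for v in section)
    shaped = sum(ti * abs(v) for ti, v in zip(t, section))
    if shaped == 0:                       # silent section: nothing to scale
        return [1.0] * len(section)
    peak = 1.0 + (w - 1.0) * energy / shaped
    return [1.0 + (peak - 1.0) * ti for ti in t]
```

For a strong scale-down with a long ramp, the solved plateau could become negative; a practical implementation would need to clamp it or shorten the ramp.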
`
5. PSYCHOACOUSTIC MODEL TEST

Masking in both the frequency and time domains plays an important role in the human auditory system. Fig. 3 shows the masking threshold in quiet, or the absolute threshold of hearing [14], a frequency-domain phenomenon in which signals whose sound pressure level does not exceed the corresponding threshold are inaudible. The psychoacoustic model is applied to the watermarked audio signal to prevent the disturbance from being perceptible to human ears. The key point is to test the watermark energy and constrain it to below the masking thresholds. This requires that a suitable value of d' be chosen.
Denote the audio signal by f(x) and the watermarked signal by y(x). The watermark signal r(x), which equals y(x) - f(x), is obtained for every segment (in this paper, a segment has 1024 points) and then FFT-transformed to yield the spectrum R(f). The values of d' for all sections included in a segment are lowered until a large fraction of R(f) is below the given masking threshold curve, as shown in Fig. 3. Normally, d' is set to an initial value and then lowered depending on the result of the psychoacoustic model test.
The watermark signal is not perceivable if it is controlled by testing the psychoacoustic model and reducing d'. Notice, however, that decreasing d' also reduces the robustness of the watermark.
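The test loop described above might look like the following Python sketch. Several pieces are assumptions rather than the paper's method: the threshold in quiet is approximated with Terhardt's well-known closed-form formula instead of the curve from [14]; a full-scale sinusoid is mapped to 96 dB SPL; "a large fraction" is taken as 90%; and d' is lowered geometrically. The callable `embed(f_seg, d)` is a hypothetical stand-in for the embedding step.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]


def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    khz = freq_hz / 1000.0
    return 3.64 * khz ** -0.8 - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2) + 1e-3 * khz ** 4


def fraction_masked(r, fs=44100.0, full_scale_db=96.0):
    """Fraction of positive-frequency bins of R(f) at or below the threshold."""
    n = len(r)
    spectrum = fft(r)
    below = 0
    for k in range(1, n // 2):
        amp = 2.0 * abs(spectrum[k]) / n          # 1.0 for a full-scale sinusoid
        level = full_scale_db + 20.0 * math.log10(amp) if amp > 0 else -math.inf
        if level <= threshold_in_quiet_db(k * fs / n):
            below += 1
    return below / (n // 2 - 1)


def choose_d(f_seg, embed, d0=0.05, shrink=0.8, min_fraction=0.9, d_floor=1e-4):
    """Lower d' from d0 until enough of the watermark spectrum is masked."""
    d = d0
    while d > d_floor:
        y = embed(f_seg, d)
        r = [yi - fi for yi, fi in zip(y, f_seg)]  # watermark r(x) = y(x) - f(x)
        if fraction_masked(r) >= min_fraction:
            return d
        d *= shrink
    return d_floor
```

For the paper's setting, `f_seg` would hold one 1024-point segment. The threshold in quiet ignores masking by the host signal itself, so it is a conservative test.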
`
6. ERROR CORRECTION CODING

This procedure protects the watermark data by packing them with error correcting codes (ECC) before embedding, so as to increase their recovery capability (i.e., robustness) under external attacks.
First, the watermark is described as a series of binary data. A seed for a pseudo-random number generator is used to permute the binary bits so that the security of the watermark is improved. The permuted bit stream is then transformed into its convolutional code, which has a greater length. A tradeoff must therefore be made between robustness and embedding efficiency.
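The two steps, seeded permutation followed by convolutional encoding, can be illustrated as follows. The paper does not state the code parameters, so this sketch assumes a textbook rate-1/2, constraint-length-3 convolutional code with generators (7, 5) in octal, and uses Python's `random.Random(seed)` as the pseudo-random number generator. The function names are hypothetical. Decoding (e.g., with a Viterbi decoder) is omitted.

```python
import random


def permute(bits, seed):
    """Scramble the bit order with a seeded pseudo-random permutation."""
    order = list(range(len(bits)))
    random.Random(seed).shuffle(order)
    return [bits[i] for i in order]


def unpermute(bits, seed):
    """Invert `permute` given the same seed."""
    order = list(range(len(bits)))
    random.Random(seed).shuffle(order)
    out = [0] * len(bits)
    for pos, i in enumerate(order):
        out[i] = bits[pos]
    return out


def conv_encode(bits, generators=(0b111, 0b101)):
    """Rate-1/2 convolutional encoder (assumed generators 7 and 5, octal)."""
    state = 0
    out = []
    for b in list(bits) + [0, 0]:            # two zero bits flush the register
        state = ((state << 1) | b) & 0b111   # keep the last three input bits
        for g in generators:
            out.append(bin(state & g).count("1") % 2)   # parity of tapped bits
    return out


if __name__ == "__main__":
    print(conv_encode([1]))   # prints: [1, 1, 1, 0, 1, 1]
```

With this assumed code, N input bits become 2(N + 2) coded bits, which makes the robustness versus embedding-efficiency tradeoff concrete: the payload roughly halves for a given audio length.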
`
[Figure: (a) the watermark r(x) versus samples, with 1 segment = 1024 points; (b) sound pressure level in dB versus frequency in kHz, showing the threshold in quiet and the spectrum of the watermark.]

Fig. 3. Psychoacoustic model test. (a) FFT for each segment (1024 points) of r(x), (b) masking thresholds in quiet for frequencies up to 20 kHz.

7. SYNCHRONIZATION PROBLEM

The watermark extraction scheme presented in Section 3 assumes that the start point of the first sample section is synchronized. This assumption is violated when an attacker crops the signal or inserts redundancy at the front end. To cope with this problem, a synchronization code (a special 20-bit pattern) is concatenated with the convolution-coded watermark data before they are repeatedly embedded in the audio signal (as shown in Fig. 4).

[Figure: repeating sequence of a synchronization code (N_1 bits) followed by the convolution-coded watermark (N_2 bits).]

Fig. 4. Concatenation of synchronization code and convolution-encoded watermark.

For shift-invariant de-watermarking, we search for the synchronization code first, before the retrieval process is started. A searching interval L_2, larger than 3L(N_1+N_2) (see Fig. 4), should be appointed so that a full synchronization code can be found. Fig. 5 shows the chart, and the search algorithm is described below.

Step 1: Set the starting index of the test audio signal, I = 1, and select L_1 = 3L·N_1.
Step 2: Extract the information embedded in samples y_I ~ y_{I+L_1-1} by the technique addressed in Section 3.
Step 3: Compare the extracted data with the synchronization code. If they match, I' = I is identified as the alignment point; go to Step 5. Otherwise, proceed with the next step.
Step 4: Increase I by 1 to window the next L_1 samples and proceed with Steps 2 and 3.
Step 5: Complete the whole de-watermarking process by extracting the bits embedded in the samples after y_{I'}.

[Figure: watermarked signal versus samples, with the window used for searching the synchronization code; L_2 > 3L(N_1 + N_2).]

Fig. 5. Search of the synchronization code.

The adoption of a synchronization code makes the search for the alignment point I' much more efficient (since N_2 >> N_1) if the audio has indeed been cropped or had insignificant signals (e.g., a silence segment) inserted to disturb the de-watermarking process. Theoretically, the maximum length of samples to be searched for the synchronization code is no more than L_2 = 3L·(N_1+N_2). Our system is designed with an option to search for the synchronization code, for speed-up considerations.

8. EXPERIMENTAL RESULTS

Three kinds of music, Dulcimer, Symphony, and Popular, each of 30~60 seconds duration with a 44.1 kHz sampling rate and stereo channels, are used for the experimental tests. Parameters used in the embedding algorithm include L = 300 samples and an initial d' = 0.05 for the psychoacoustic model test. All the watermarked audio signals were listened to by seven testers, and all of them were satisfied with the quality. When presented with the original audio signals and asked to compare them with the watermarked ones (without prior knowledge), 5 out of 7 testers could not distinguish them. At this watermarked audio quality, we perform a series of intentional attacks to demonstrate the robustness of our watermarking algorithm.
We adopt 1) MP3 compression, 2) low-pass filtering, 3) amplitude normalization, 4) digital-to-analog / analog-to-digital same-rate re-acquisition, and 5) signal cropping for the attack tests.
Table 1 shows the results of watermark extraction after MP3 compression at different bit rates. To reflect the true error rate, the watermark is selected as a pattern of 101010... and is not protected by the convolution code. For commercial usage, 128 kbps is most popular, and 80 kbps does not have satisfactory quality even without a watermark being embedded. We also experimented with re-encoding (i.e., "twice") to simulate a possible attack from pirates. All results recover the watermark at more than 98% correlation, except for 80 kbps MP3 compression.
Table 2 shows the results for the low-pass filtering attack, whose bandwidth is 4 kHz. It is found that the Popular music has a higher error rate (1.9~2.8%) due to the busy nature of its amplitudes.
When the watermarked signal is attacked by DA/AD re-acquisition (D/A followed by A/D), the signal amplitudes are changed by some factor (the same phenomenon as for energy normalization) and the section start point is shifted forwards or backwards (i.e., becomes asynchronous). We apply the techniques addressed in Sections 6 and 7 to embed the watermarks. Table 3 shows the results with different kinds of watermarks adopted, such as random bits, text, and logo image. It can be found that 100% correlation can still be achieved in spite of some alignment errors. This can be attributed to the adoption of error correcting codes.
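The synchronization-code search of Section 7 (Steps 1 to 5) can be sketched in Python as follows. The sketch assumes energy is the sum of absolute amplitudes per section, uses 0-based indices instead of the paper's I = 1, and returns the first matching index. The names `extract_bit`, `find_alignment`, and the `search_len` parameter (playing the role of the searching interval) are hypothetical.

```python
def extract_bit(y, start, L):
    """Blind extraction of one bit from three length-L sections (Section 3)."""
    e_min, e_mid, e_max = sorted(
        sum(abs(v) for v in y[start + k * L : start + (k + 1) * L]) for k in range(3)
    )
    return 1 if (e_max - e_mid) > (e_mid - e_min) else 0


def find_alignment(y, sync, L, search_len):
    """Slide a window of 3*L*len(sync) samples until it decodes to `sync`.

    Returns the 0-based alignment index, or None if no match is found
    within `search_len` starting positions.
    """
    window = 3 * L * len(sync)                 # L_1 = 3L * N_1
    for i in range(search_len):                # Steps 2 to 4
        if i + window > len(y):
            break
        bits = [extract_bit(y, i + 3 * L * k, L) for k in range(len(sync))]
        if bits == sync:                       # Step 3: match found
            return i
    return None
```

Once an index is returned, Step 5 amounts to running the extraction of Section 3 from the first sample after the synchronization window. A short synchronization code can also match by chance at a wrong offset, which is one reason a longer pattern such as the paper's 20 bits is preferable to the toy lengths used when testing a sketch like this.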
`
Table 1. Results of watermark extraction after MP3 compression for Dulcimer.

Bit rate          Channel  Embedded bits  Error bits   Correlation (%)
160 kbps          L        1441           0            100
160 kbps          R        1441           [illegible]  99.3
128 kbps (once)   L        1441           6            99.58
128 kbps (once)   R        1441           3            99.79
128 kbps (twice)  L        1441           15           98.96
128 kbps (twice)  R        1441           13           99.10
[illegible]       L        1441           43           97.02

Table 2. Results of watermark extraction after low-pass filtering.

Music     Channel      Embedded bits  Error bits   Correlation (%)
Dulcimer  L            1441           [illegible]  [illegible]
Dulcimer  R            1441           [illegible]  [illegible]
Symphony  L            926            [illegible]  [illegible]
Symphony  R            926            2            99.78
Popular   [illegible]  [illegible]    [illegible]  [illegible]
Popular   R            892            [illegible]  [illegible]

Table 3. Results of watermark extraction after cropping attack.

Watermark type  Embedded data bits  Estimated alignment  Alignment [illegible]  Correlation (%)
Text data       48                  [illegible]          [illegible]            100
[remaining rows illegible]

9. CONCLUSIONS AND REMARKS

A novel method for embedding digital watermarks into audio signals subject to human auditory perception is proposed in this paper. Our algorithm maintains an energy relation between every three sample sections to represent the embedded bit information, by scaling the corresponding amplitudes up or down while conserving the audio waveforms as perceived by human ears. Though the watermark is embedded in the time domain, frequency-domain characteristics (i.e., the psychoacoustic model) are also analyzed to prevent human perception of it. Besides, the watermarks can be extracted without the original audio signals.
Experiments show that our watermarking scheme yields good audio quality (comparable to the original, according to the subjective tests) and is robust (survival rate above 98%) to pirate attacks such as MP3 compression, low-pass filtering, amplitude normalization, DA/AD reacquisition, and cropping.
Further study should be directed towards more advanced attacks, such as sampling-rate conversion (e.g., 44.1 kHz ↔ 48 kHz), resolution transformation (e.g., 8 bits ↔ 16 bits), and stereo/mono conversion.

REFERENCES

[1] L. Boney, A.H. Tewfik, and K.N. Hamdy, "Digital watermarks for audio signals," Proceedings of the Third IEEE International Conference on Multimedia Computing and Systems, pp.473~480, 1996.
[2] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," Lecture Notes in Computer Science, Information Hiding, Vol.1174, pp.295~315, Springer, 1996.
[3] J. Tilki and A.A. Beex, "Encoding a hidden auxiliary channel onto a digital audio signal using psychoacoustic masking," Southeastcon '97, Engineering New Century, Proceedings, IEEE, pp.331~333, 1997.
[4] Ye Wang, "A new watermarking method of digital audio content for copyright protection," Proceedings of ICSP '98, Fourth International Conference on Signal Processing, Vol.2, pp.1420~1423, 1998.
[5] P. Bassia and I. Pitas, "Robust audio watermarking in the time domain," EUSIPCO'98, 1998.
[6] Q. Lintian and N. Klara, "Non-invertible watermarking methods for MPEG encoded audio," Proc. of SPIE Conference on Security and Watermarking of Multimedia Contents, Vol.3657, pp.194~202, 1999.
[7] M. Ikeda, K. Takeda, and F. Itakura, "Audio data hiding by use of band-limited random sequences," IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol.4, pp.2315~2318, 1999.
[8] C.P. Wu, P.C. Su, and C.C. Jay Kuo, "Robust frequency domain audio watermarking based on audio content analysis," SMIP'99.
[9] C. Xu, J. Wu, and Q. Sun, "A robust digital audio watermarking technique," Proceedings of the Fifth International Symposium on Signal Processing and Its Applications, Vol.1, pp.95~98, 1999.
[10] C. Xu, J. Wu, Q. Sun, and K. Xin, "Application of digital watermarking technology in audio signals," Journal of the Audio Engineering Society, Vol.47, pp.805~812, 1999.
[11] C. Neubauer, J. Herre, and K. Brandenburg, "Continuous steganographic data transmission using uncompressed audio," Information Hiding, pp.208~217, 1998.
[12] J. Lacy, S. R. Quackenbush, A. R. Reibman, D. Shur, and J. H. Snyder, "On combining watermarking with perceptual coding," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol.6, pp.3725~3728, 1998.
[13] P. Noll, "MPEG digital audio coding," IEEE Signal Processing Magazine, Vol.14, No.5, pp.59~81, 1997.
[14] E. Ambikairajah, A. G. Davis, and W. T. K. Wong, "Auditory masking and MPEG-1 audio compression," Electronics & Communication Engineering Journal, Vol.9, No.4, pp.165~175, 1997.
[15] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. on Image Processing, Vol.6, No.12, pp.1673~1687, 1997.
[16] Li-Chun Chang, "Digital Watermarking of Audio Signals," Master thesis, National Chung Cheng Univ., ROC, June 2000.
`