SPREAD SPECTRUM SIGNALING FOR SPEECH WATERMARKING

Qiang Cheng
University of Illinois
Urbana-Champaign, IL

Jeffrey Sorensen
IBM T. J. Watson Research Center
Yorktown Heights, NY
sorenj@us.ibm.com

ABSTRACT

The technique of embedding a digital signal into an audio recording or image using techniques that render the signal imperceptible has received significant attention. Embedding an imperceptible, cryptographically secure signal, or watermark, is seen as a potential mechanism that may be used to prove ownership or detect tampering. While considerable attention has been devoted to spread-spectrum signaling for image and audio watermarking applications, there has been only limited study of embedding data signals in speech. Speech is an uncharacteristically narrowband signal given the perceptual capabilities of the human hearing system. However, using speech analysis techniques, one may design an effective data signal that can be used to hide an arbitrary message in a speech signal. Also included are experiments demonstrating the subliminal channel capacity of the speech data embedding technique developed here.
`
1. INTRODUCTION

Watermarking is a technique for embedding a cryptographic signature into digital content for the purposes of detecting copying or alteration of the content. This is accomplished using coding techniques that hide data within the image or audio content in a manner not normally detectable. This paper focuses on an as yet largely unexplored area of audio watermarking: speech.

For audio watermarking, Preuss et al. [8] invented a digital information hiding technique for audio using spread spectrum modulation. Boney et al. [2] explicitly make use of the MPEG-1 psychoacoustic model to obtain frequency masking values and achieve good imperceptibility. Recently, Ruiz and Deller [9] proposed a speech watermarking method for application to digital speech libraries. These methods have been extensively applied to music, but they embed information over a very wide audio band based on human hearing capabilities. A potential attacker need only low-pass filter the resulting signal to remove most of the watermarking information.

Speech differs from music in its acoustic characteristics and watermarking requirements. Speech is an acoustically rich signal that uses only a small portion of the human perceptual range. Typical speech reproduction hardware, although often the same as that used with music, includes much lower bit rate channels such as telephone or compressed voice "vocoders." However, the same analysis techniques employed in such voice coding schemes can easily be adapted to create an audio watermarking signal that is robust to speech channels. Presented here is a technique for encoding an additional, arbitrary digital message into speech signals. By making use of the well understood techniques of speech analysis, significantly higher bit rates can be embedded without affecting the perceived quality of the recording.

The digital hiding technique for speech can be applied to copyright protection for digital speech libraries and audio books, as well as to covert communication channels. The embedded information may be any digital message. Messages that can be used to prove authorship require the generation of an appropriate cryptographically secure digital message and are beyond the scope of this paper. However, consult [4] for information on the application of watermarks.
`
2. VOICEBAND SPREAD SPECTRUM SIGNAL

In contrast to the signals considered in previous work on audio watermarking, the speech signal has a considerably narrower bandwidth. The long-time-averaged power spectral density of speech indicates that the signal is confined to a range of approximately 10 Hz to 8 kHz [6]. In order that the watermark survive typical transformations of speech signals, including speech codecs, it is important that the watermark be limited to the perceptually relevant portions of the spectrum. However, the watermark should remain imperceptible. Therefore, a spread-spectrum signal with an uncharacteristically narrow bandwidth will be used.

Using a direct sequence spread spectrum [3] signal, we wish to design a PN sequence with a main lobe that fits within a typical telephone channel [5], which ranges from 250 Hz to 3800 Hz. In this work, the message sequence and the PN sequence are modulated using simple Binary Phase Shift Keying (BPSK). The center frequency of the carrier is chosen to be $f_c = 2025$ Hz. The clock rate of the PN sequence, or chip rate, is taken to be 1775 Hz, which is half of the 3550 Hz signal bandwidth. Because the width of our watermark is very close to the modulation frequency, it is necessary to low-pass filter the spread spectrum signal before modulation to prevent excessive aliasing. For this, we have chosen to use a seventh-order Butterworth filter with a cutoff of 3400 Hz.
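
The direct-sequence BPSK construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not name its PN generator or sampling rate, so the 7-bit LFSR (`prbs7`), the sample rate `FS`, and the helper `spread_bpsk` are hypothetical choices, and the seventh-order Butterworth low-pass stage is omitted to keep the sketch dependency-free.

```python
import math

FS = 16000          # sample rate in Hz (assumed; not stated in the paper)
F_C = 2025.0        # carrier center frequency, from the text
CHIP_RATE = 1775.0  # PN clock (chip) rate, from the text


def prbs7(n_chips, state=0x02):
    """Return n_chips values in {+1, -1} from a 7-bit LFSR (x^7 + x^6 + 1).

    The generator polynomial is an assumption made for illustration.
    """
    chips = []
    for _ in range(n_chips):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        chips.append(1 if new_bit else -1)
    return chips


def spread_bpsk(bits, samples_per_bit):
    """Spread each message bit with the PN sequence and BPSK-modulate it.

    One message bit occupies one frame of samples_per_bit samples.
    """
    n = len(bits) * samples_per_bit
    pn = prbs7(int(n * CHIP_RATE / FS) + 1)
    out = []
    for k in range(n):
        bit = 1 if bits[k // samples_per_bit] else -1   # antipodal message symbol
        chip = pn[int(k * CHIP_RATE / FS)]               # sample-and-hold PN chip
        carrier = math.cos(2.0 * math.pi * F_C * k / FS)
        out.append(bit * chip * carrier)
    return out


watermark = spread_bpsk([1, 0, 1, 1], samples_per_bit=160)
```

Flipping a message bit simply negates the corresponding frame, which is what later allows a correlation receiver to recover the bit from the sign of a sum.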
`
Figure 1 illustrates the power spectral density of the watermark signal, with the long-term average speech power spectrum (for both a male and a female speaker) shown for comparison. The simplest implementation of a speech watermark system would involve adding this signal, which sounds primarily like radio static, to the speech signal at the appropriate gain. However, by taking advantage of our knowledge of the speech signal itself, we are able to embed a significantly higher gain signal using techniques that are the subject of the next two sections.
`
`0-7803-7041-4/01/$10.00 ©2001 IEEE
`
`
`

`

[Plot: power spectrum magnitude (dB) versus frequency.]

[Plot: power spectral density (dB) versus frequency (Hz), 0 to 5000 Hz, including the watermark signal.]

Figure 1: Power spectral densities of the watermark, male voice, and female voice.
`
3. LPC ANALYSIS AND FILTERING

Our goal is to add as much watermark signal energy as possible to the speech signal, while still satisfying the constraint that the added signal not be perceivable when listened to. Most watermarking approaches rely on a perceptual model of human hearing. Speech is an inherently complex stimulus with rapidly changing spectral characteristics. Conventional masking effects are most often studied for spectral bands outside the range of speech, above 4 kHz. However, an effective production model for speech is available. The well-known technique of linear prediction has proven to be highly effective in modeling speech signals. In addition, human speech perception reflects the production system characteristics. Our findings indicate that using the production model can provide excellent hiding characteristics.

In our watermark signal embedding algorithm, the watermark signal is filtered to match the overall spectral shape of the speech signal. In addition, the linear predictive analysis provides an effective dynamic measure of the degree of noise already present in the speech signal. Portions of speech that have a highly white spectrum, such as fricative sounds and the rapidly changing plosive sounds, are especially good candidates for embedding additional watermark energy.

Linear predictive analysis of speech involves computing the maximum likelihood coefficients of an all-pole filter of the form

$$A(z) = \frac{1}{a_0 + a_1 z^{-1} + \cdots + a_p z^{-p}} \qquad (1)$$

There is a considerable literature on the application of linear prediction to speech signals. For our analysis, we have chosen to use the Levinson-Durbin recursive technique for evaluating the LPC coefficients $a_i$ from the short-term autocorrelation coefficients.

The short-term autocorrelation can be computed from the windowed speech frame $s(n)$ as

$$r_i = \sum_{n=i}^{N-1} s(n)\, s(n-i),$$

and the LPC coefficients satisfy

$$\begin{bmatrix} r_0 & r_1 & \cdots & r_{p-1} \\ r_1 & r_0 & \cdots & r_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p-1} & r_{p-2} & \cdots & r_0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \end{bmatrix},$$

which, in vector notation, can be represented by

$$R a = r.$$

Figure 2: Power spectrum of a segment of speech and spectrum of LPC-shaped watermark signal.

The prediction residual energy, or the average squared error, can be computed as

$$E = a^{T} R a,$$

and is a measure of the "predictability" of the speech signal and an effective measure of the noise content.
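
As a rough illustration of the analysis step, the sketch below computes the short-term autocorrelation and solves $Ra = r$ with the textbook Levinson-Durbin recursion. The function names are hypothetical, and the sign convention is an assumption: the coefficients predict $s(n) \approx \sum_i a_i s(n-i)$. The residual energy returned is the recursion's own error term, which under this convention equals $r_0 - a^T r$. How that relates to the form $E = a^T R a$ depends on how $a$ and $R$ are augmented, which the text does not spell out.

```python
def autocorr(frame, p):
    """Short-term autocorrelation r_i = sum_{n=i}^{N-1} s(n) s(n-i), i = 0..p."""
    n = len(frame)
    return [sum(frame[k] * frame[k - i] for k in range(i, n))
            for i in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the Toeplitz system R a = r for a_1..a_p.

    Returns (coefficients, residual_energy). Assumes the predictor
    convention s(n) ~ sum_i a_i s(n-i).
    """
    a = [0.0] * (p + 1)        # a[0] is unused padding
    err = r[0]
    for i in range(1, p + 1):
        if err <= 0.0:         # degenerate (e.g. all-zero) frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err          # reflection coefficient for order i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)   # residual energy shrinks at each order
    return a[1:], err


frame = [0.9 ** n for n in range(200)]   # toy decaying frame, not real speech
coeffs, residual = levinson_durbin(autocorr(frame, 2), 2)
```

A frame that is well predicted (voiced speech) leaves a small residual, while noise-like frames leave a large one, which is the property the gain rule in the next section relies on.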
Before filtering the watermark signal using the all-pole filter, a bandwidth expansion operation is performed. This moves all of the poles closer to the center of the unit circle, increasing the bandwidth of their respective resonances. The vocal tract filter often tends to have quite narrow spectral peaks. Due to masking phenomena, sounds near these peaks are unlikely to be perceived by the listener. Therefore, by increasing the bandwidth of the formant responses, larger overall watermark signal gains should be tolerable. The bandwidth parameter $\gamma$ is used to adjust the LPC coefficients,

$$\tilde{a}_i = a_i \gamma^{i},$$

where $\gamma$ may be chosen between 0 and 1.
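
A minimal sketch of bandwidth expansion followed by all-pole shaping is given below. The helper names are hypothetical, and the filter assumes the predictor convention $y(n) = x(n) + \sum_i \tilde{a}_i\, y(n-i)$, which may differ in sign from the form of Equation (1).

```python
def bandwidth_expand(a, gamma):
    """Scale coefficient a_i (i = 1..p) by gamma**i, pulling poles inward."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is expected to lie between 0 and 1")
    return [c * gamma ** (i + 1) for i, c in enumerate(a)]


def allpole_filter(x, a):
    """Shape x with the all-pole filter y(n) = x(n) + sum_i a_i y(n-i)."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


# A single pole at 0.5 moves to 0.25 with gamma = 0.5, so the impulse
# response decays faster, i.e. the resonance is broader.
sharp = allpole_filter([1.0, 0.0, 0.0, 0.0], [0.5])
broad = allpole_filter([1.0, 0.0, 0.0, 0.0], bandwidth_expand([0.5], 0.5))
```

In an embedder, `x` would be the spread spectrum watermark frame and `a` the expanded coefficients estimated from the matching speech frame.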
Figure 2 shows the power spectrum of a segment of speech, and the spectrum of the watermark signal that results after filtering using the spectral envelope of the speech segment.
`
4. WATERMARK SIGNAL GAIN

The instantaneous watermark gain is dynamically determined to match the characteristics of the speech signal. In the simplest case, when little speech energy is present (i.e., during silence), the watermark is added using a fixed gain threshold. This is selected so that the watermark becomes the effective noise floor of the recording. Perceptually, a small amount of noise is always expected in a recording, and the watermark signal is not atypical of such recording noise. In many applications, silence may not be transmitted or might be coded using extreme compression. In these circumstances, designers should choose an error correcting code (such as a convolutional code) with the proper characteristics so that the message may be recovered despite these losses.

The normalized per-sample speech energy $E_s$ for one frame is

$$E_s = \frac{1}{N} \sum_{n=0}^{N-1} s^2(n) = \frac{1}{N} r_0.$$
`
`
`

`

`
`
`
`
`
`
[Plots: bit error probability versus frame rate (ms); bit error probability versus message bit rate (bits/sec).]

Figure 4: Bit Error Probability versus Frame Rate, and Bit Error Probability versus Message Bit Rate.

[Plot: watermarking channel capacity (bits/sec) versus message bit rate (bits/sec), for a female speaker and a male speaker.]

[Plots: speech energy versus time (s); additive watermark gain versus time (s), with curves labeled "Constant Term", "Constant + Predictor Error", and "Constant, Predictor Error and Energy Terms".]

Figure 3: A segment of speech and the corresponding watermark gains.
`
The watermark gain in each frame can be determined by a linear combination of the gain for silence, the normalized per-sample residual energy $E$, and the normalized per-sample speech energy $E_s$,

$$g(t) = \lambda_0 + \lambda_1 E + \lambda_2 E_s, \qquad (2)$$

which is designed to maximize the strength of the watermark signal without incurring perceptual degradation. Figure 3 shows a segment of speech and the embedded watermark signal. The resulting watermarked speech is also shown in Figure 3. Listening tests demonstrate that the watermarked speech is indistinguishable from the original speech with this watermark gain. If the gain is increased further, there will be "hoarseness" in the watermarked speech. Though it hardly affects the naturalness of the voice, the difference from the original speech is indeed perceptible.
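
The per-frame gain of Equation (2) might be computed as in the sketch below. The weights passed as `lam0`, `lam1`, and `lam2` are placeholders chosen for illustration only; the paper does not report the values it used, and the function name `frame_gain` is hypothetical.

```python
def frame_gain(frame, residual_energy, lam0, lam1, lam2):
    """Gain g = lam0 + lam1 * E + lam2 * E_s for one frame.

    residual_energy is the LPC prediction residual energy of the frame;
    both energies are normalized per sample. The weights are assumptions.
    """
    n = len(frame)
    e_s = sum(x * x for x in frame) / n   # normalized speech energy E_s
    e = residual_energy / n               # normalized residual energy E
    return lam0 + lam1 * e + lam2 * e_s


# During silence both energy terms vanish and only the fixed floor remains.
silence_gain = frame_gain([0.0] * 4, 0.0, lam0=0.01, lam1=1.0, lam2=1.0)

# A louder, less predictable frame earns a larger gain.
active_gain = frame_gain([1.0, -1.0, 1.0, -1.0], 2.0,
                         lam0=0.5, lam1=0.25, lam2=0.125)
```

The constant term plays the role of the fixed gain used during silence, while the two energy terms raise the gain in noisy or loud frames where more watermark energy can be hidden.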
`
5. WATERMARK DETECTION

At the receiving end, the received signal $r_0(t)$ is given by

$$r_0(t) = w(t) + s(t) + I_0(t), \qquad (3)$$

where $w(t)$ is the LPC-shaped watermark signal, $s(t)$ is the original speech signal, and $I_0(t)$ represents deliberate attacks or digital signal processing. We estimate the LPC coefficients from the received signal, and then apply inverse LPC filtering to $r_0(t)$ to get $r(t)$. After inverse LPC filtering, voiced speech becomes periodic pulses, and unvoiced speech becomes whitened noise. As is typical for speech processing, we model the inverse-filtered $s(t)$ as White Gaussian Noise (WGN). Inverse LPC filtering decorrelates the speech samples $s(t)$ as well as equalizes the watermark signal $w(t)$. A correlation receiver,

$$\sum_{t=1}^{N} d(t)\, r(t) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; 0, \qquad (4)$$
`
gives us optimum detection performance in AWGN [7], where $N$ is the length of a frame, in which one message bit is embedded, and $d(t)$ is the despreading function, which is the synchronized, BPSK-modulated spreading function for the current frame. The correlation with $d(t)$ can average out the interference, thus providing the desired robustness property. The decoding rule is a maximum likelihood decision rule, which is also a minimum probability-of-error rule since 0 and 1 in the message are sent with equal probabilities.

The problem of synchronization when the original message is not available is beyond the scope of this paper. However, the PN sequence used in the spread spectrum modulation can be used to drive a phase locked loop during decoding. The techniques presented in [3] and [8] can be used in our framework for synchronization purposes.

Figure 5: Watermarking Channel Capacity versus Message Bit Rate

6. EMBEDDED CHANNEL CAPACITY

A set of simulation experiments was performed to demonstrate the relationship between the frame size and message rate (1 bit per frame) and the bit error probability, as shown in Figure 4.

The spread spectrum signal, when added to the original speech, can be considered as a noisy communication channel, called the watermarking channel. The watermark is the content of the transmitted message. Without loss of generality, the message is considered to be a binary signal with equal probability for 0 and 1. The watermark channel is binary symmetric. The channel capacity, which is the theoretical maximum rate for data transmission, is defined for the watermarking channel [1]:
$$C = R\left(1 + p \log_2 p + (1 - p) \log_2 (1 - p)\right), \qquad (5)$$

where $p$ is the crossover probability and $R$ is the message bit rate.
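
To make the link between the correlation receiver and Equation (5) concrete, the following toy simulation estimates a bit error probability and converts it to a capacity. It is a sketch under strong assumptions, not a reproduction of the paper's experiments: the despreading function is a random $\pm 1$ sequence rather than the BPSK-modulated PN signal, the inverse-filtered speech is replaced by unit-variance white Gaussian noise, and `gain`, `frame_len`, and the function names are hypothetical.

```python
import math
import random


def bsc_capacity(p, rate):
    """C = R (1 + p log2 p + (1 - p) log2 (1 - p)), taking 0 log 0 as 0."""
    def plogp(x):
        return x * math.log2(x) if x > 0.0 else 0.0
    return rate * (1.0 + plogp(p) + plogp(1.0 - p))


def detect_bit(r, d):
    """Correlation receiver: decide 1 when sum_t d(t) r(t) > 0, else 0."""
    return 1 if sum(di * ri for di, ri in zip(d, r)) > 0.0 else 0


def simulate_ber(frame_len, gain, n_bits, seed=0):
    """Estimate the crossover probability for one bit per frame."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_bits):
        bit = rng.randint(0, 1)
        sign = 1.0 if bit else -1.0
        d = [rng.choice((-1.0, 1.0)) for _ in range(frame_len)]
        # Received frame: scaled watermark plus whitened "speech" noise.
        r = [gain * sign * di + rng.gauss(0.0, 1.0) for di in d]
        if detect_bit(r, d) != bit:
            errors += 1
    return errors / n_bits


p_hat = simulate_ber(frame_len=10, gain=0.2, n_bits=2000)
capacity = bsc_capacity(p_hat, rate=800.0)   # 800 bits/sec message rate
```

Shorter frames give the correlator fewer samples to average over, so the estimated error rate rises and the capacity per transmitted bit falls, which is the tradeoff discussed in the remainder of this section.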
The simulation results for the watermarking channel capacity are plotted in Figure 5. For a binary symmetric channel, the channel capacity is achievable [1]. That is, transmission codes can be designed for reliable communication at or below this rate.

The plot shows that the frame size needs to be small when high channel capacity is desired. However, the LPC prediction suffers when the frame size is too small, which makes LPC shaping less effective. In addition, the degradation of the watermarking channel due to attacks is more severe for smaller frames (see Section 7). Therefore, there is an intrinsic tradeoff between channel capacity and survivability of the watermark. To achieve high channel capacity, good LPC predictability, and reasonable survivability simultaneously, we have chosen 800 bits per second as our message embedding rate.

7. ROBUSTNESS

Watermarked media is subject to a variety of attacks. Images may be cropped, rotated, filtered, or otherwise changed. Audio signals are less subject to these types of manipulations, as the human perceptual system is quite sensitive to changes in audio signals. However, speech signals may be affected by transformations that include analog-to-digital and digital-to-analog conversions, filtering, re-equalization, changes in playback rate, and compression. The algorithm presented here puts all of the watermark signal in the most perceptually important areas of the speech signal. Therefore, primitive attempts to remove the watermark by filtering are almost certain to prove ineffective.

In order to demonstrate the robustness of the data embedding scheme, we have used an analog reproduction system to simulate a crude attempt at duplication. A recording is made at 8 kHz, significantly reducing the bandwidth, and then the signal is re-sampled at the original rate. This could be considered similar to recording across a telephone channel, although no explicit telephone network equalization was applied. Finally, these 8 kHz recordings were compressed and decompressed using the typical speech compression algorithms IMA ADPCM and GSM 6.10. The results are summarized in Table 1.

Compression Scheme    Speech Bandwidth    Speech Bit Rate    Watermark Bit Reliability
16 bit linear PCM     22 kHz              706 kbps           74.05%
16 bit linear PCM     4 kHz               128 kbps           71.58%
IMA ADPCM             4 kHz               32 kbps            68.65%
GSM 6.10              4 kHz               13 kbps            61.23%

Table 1: Watermarking Attacks by Voice Compression

8. APPLICATIONS AND FUTURE WORK

This paper presents a technique for embedding an arbitrary message in a speech signal. In order to provide a complete watermarking application, one must choose a message that provides the appropriate cryptographic properties, such as proof of authenticity or ownership. In this respect, the embedding algorithm presented here can be used with nearly any comparable application. For example, it can be applied to copyright protection for language-learning CDs, audio books, recorded teleconferencing data, digital speech libraries [9], and Internet radio broadcasts.

In addition, a speech data embedding algorithm suggests some new and possibly unique applications. For example, a closed captioning system can be built using the data embedding algorithm presented here, where the text transcription of the speech would be hidden in the speech itself. In addition, in-band signaling applications, typically done using dual-tone "touch-tone" signals, can be replaced with embedded control signals, suggesting novel simultaneous voice and data applications. For the purpose of side-information embedding, there is little threat from intentional attacks. Thus, a larger capacity of information can be communicated with less dependency on the redundancy of error correcting codes.

9. REFERENCES

[1] R. E. Blahut. Principles and Practice of Information Theory. Addison-Wesley Publishing Company, 1987.

[2] L. Boney, A. H. Tewfik, and K. N. Hamdy. Digital watermarks for audio signals. In Proc. of Multimedia 1996, Hiroshima, 1996.

[3] G. R. Cooper and C. D. McGillem. Modern Communications and Spread Spectrum. McGraw-Hill Book Company, New York, 1986.

[4] F. Hartung and M. Kutter. Multimedia watermarking techniques. In Proceedings of the IEEE, vol. 87, July 1999.

[5] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz. NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database. In ICASSP, pages 109-112, Albuquerque, NM, 1990.

[6] N. S. Jayant and P. Noll. Digital Coding of Waveforms. Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1984.

[7] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag, New York, 1994.

[8] R. Preuss, S. Roukos, A. Huggins, H. Gish, M. Bergamo, and P. Peterson. Embedding Signalling. US Patent 5319735, 1994.

[9] R. J. Ruiz and J. R. Deller. Digital watermarking of speech signals for the National Gallery of the Spoken Word. In ICASSP, Turkey, 2000.
`
