Supplied by the British Library 08 Sep 2022, 09:49 (BST)
`
`Sony Exhibit 1013
`Sony v. MZ Audio
Signal Processing 66 (1998) 337–355
`
`Robust audio watermarking using perceptual masking
`
`Mitchell D. Swanson *, Bin Zhu, Ahmed H. Tewfik, Laurence Boney
`
Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
Ecole Nationale Supérieure des Télécommunications/SIG, 46 rue Barrault, 75634 Paris Cedex 13, France
`Received 10 February 1997; received in revised form 11 November 1997
`
`Abstract
`
`We present a watermarking procedure to embed copyright protection into digital audio by directly modifying the
`audio samples. Our audio-dependent watermarking procedure directly exploits temporal and frequency perceptual
`masking to guarantee that the embedded watermark is inaudible and robust. The watermark is constructed by breaking
`each audio clip into smaller segments and adding a perceptually shaped pseudo-random sequence. The noise-like
`watermark is statistically undetectable to prevent unauthorized removal. Furthermore, the author representation we
`introduce resolves the deadlock problem. We also introduce the notion of a dual watermark: one which uses the original
`signal during detection and one which does not. We show that the dual watermarking approach together with the
`procedure that we use to derive the watermarks effectively solves the deadlock problem. We also demonstrate the
`robustness of that watermarking procedure to audio degradations and distortions, e.g., those that result from colored
noise, MPEG coding, multiple watermarks, and temporal resampling. © 1998 Elsevier Science B.V. All rights reserved.
`
* Corresponding author.
This work was supported by AFOSR under grant AF/F49620-94-1-0461. Patent pending, Media Science, Inc., 1996.

0165-1684/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.
PII S0165-1684(98)00014-0

`Keywords: Copyright protection; Masking; Digital watermarking
`
`1. Introduction
`
`Efficient distribution, reproduction, and manip-
`ulation have led to wide proliferation of digital
`media, e.g., audio, video, and images. However,
`these efficiencies also increase the problems asso-
`ciated with copyright enforcement. For this reason,
`creators and distributors of digital data are hesitant
`to provide access to their intellectual property. They
`are actively seeking reliable solutions to the prob-
`lems associated with copyright protection of multi-
`media data.
`Digital watermarking has been proposed as
`a means to identify the owner or distributor of
`digital data. Watermarking is the process of encod-
`ing hidden copyright information in digital data by
`making small modifications to the data samples.
`Unlike encryption, watermarking does not restrict
`access to the data. Once encrypted data is decrypted,
`the media is no longer protected. A watermark is
`designed to permanently reside in the host data.
`When the ownership of a digital work is in question,
`the information can be extracted to completely
`characterize the owner.
`To function as a useful and reliable intellectual
`property protection mechanism, the watermark
`must be:
• embedded within the host media;
• perceptually inaudible within the host media;
• statistically undetectable to ensure security and thwart unauthorized removal;
• robust to manipulation and signal processing operations on the host signal, e.g., noise, compression, cropping, resizing, D/A conversions, etc.; and
• readily extracted to completely characterize the copyright owner.
`In particular, the watermark may not be stored
`in a file header, a separate bit stream, or a separate
`file. Such copyright mechanisms are easily removed.
`The watermark must be inaudible within the host
`audio data to maintain audio quality. The water-
`mark must be statistically undetectable to thwart
`unauthorized removal by a ‘pirate’. A watermark
`which may be localized through averaging, correla-
`tion, spectral analysis, Kalman filtering, etc., may
`be readily removed or altered, thereby destroying
`the copyright information.
`The watermark must be robust to signal distor-
`tions, incidental and intentional, applied to the host
`data. For example, in most applications involving
`storage and transmission of audio, a lossy coding
`operation is performed on the audio to reduce
`bit-rates and increase efficiency. Operations which
`damage the host audio also damage the embedded
`watermark. The watermark is required to survive
`such distortions to identify the owner of the data.
`Furthermore, a resourceful pirate may use a variety
`of signal processing operations to attack a digital
watermark. A pirate may attempt to defeat
`a watermarking procedure in two ways: (1) damage
`the host audio to make the watermark undetectable,
`or (2) establish that the watermarking scheme is
`unreliable, i.e., it detects a watermark when none is
`present. The watermark should be impossible to
`defeat without destroying the host audio.
`Finally, the watermark should be readily extrac-
`ted given the watermarking procedure and the
`proper author signature. Without the correct signa-
`ture, the watermark cannot be removed. The ex-
`tracted watermark must correctly identify the owner
`and solve the deadlock issue (cf. Section 2) when
`multiple parties claim ownership.
`Watermarking digital media has received a great
`deal of attention recently in the literature and the
`research community. Most watermarking schemes
`focus on image and video copyright protection, e.g.,
[1–3,7,10,14,15,18,19,22,24]. A few audio watermarking techniques have been
reported. Several techniques have been proposed in [1]. Using a phase
coding approach, data is embedded by modifying the phase values of Fourier
transform coefficients of audio segments. Embedding data as spread spectrum
noise has also been proposed. A third technique, echo coding, employs
multiple decaying echoes to place a peak in the cepstrum at a known
location. Another audio watermarking technique is proposed in [21], where
Fourier transform coefficients over the middle frequency bands are replaced
with spectral components from a signature. Some commercial products are
also available. The ICE system from Central Research Laboratories inserts
a pair of very short tone sequences into an audio track. An audio
watermarking product, MusiCode, is available from ARIS Technologies.
`Most schemes utilize the fact that digital media
`contain perceptually insignificant components
which may be replaced or modified to embed copyright
protection. However, the techniques do not
directly exploit spatial/temporal and frequency
masking. Thus, the watermark is not guaranteed
to be inaudible. Furthermore, robustness is not
maximized: the amount of modification made to each
coefficient to embed the watermark is estimated
and is not necessarily the maximum amount possible.
`In this paper, we introduce a novel watermarking
`scheme for audio which exploits the human auditory
`system (HAS) to guarantee that the embedded
watermark is imperceptible. As the perceptual characteristics
of individual audio signals vary, the
watermark adapts to and is highly dependent on
the audio being watermarked. Our watermark is
generated by filtering a pseudo-random sequence
(author id) with a filter that approximates the
frequency masking characteristics of the HAS. The
`resulting sequence is further shaped by the temporal
`masking properties of the audio. Based on pseudo-
`random sequences, the noise-like watermark is
`statistically undetectable. Furthermore, we will show
`in the sequel that the watermark is extremely robust
`to a large number of signal processing operations
`and is easily extracted to prove ownership.
`The work presented in this paper offers several
`major contributions to the field, including
`A perception-based watermarking procedure: The
`embedded watermark adapts to each individual
host signal. In particular, the temporal and frequency
distributions of the watermark are dictated
by the temporal and frequency masking characteristics
of the host audio signal. As a result, the
amplitude (strength) of the watermark increases
and decreases with the host audio, e.g., lower amplitude in
`‘quiet’ regions of the audio. This guarantees that
`the embedded watermark is inaudible while having
`the maximum possible energy. Maximizing the
`energy of the watermark adds robustness to attacks.
`An author representation which solves the deadlock
`problem: An author is represented with a pseudo-
`random sequence created by a pseudo-random
`generator [13] and two keys. One key is author
`dependent, while the second key is signal dependent.
`The representation is able to resolve rightful owner-
`ship in the face of multiple ownership claims.
`A dual watermark. The watermarking scheme
`uses the original audio signal to detect the presence
`of a watermark. The procedure can handle virtually
`all types of distortions, including cropping, temporal
`rescaling, etc., using a generalized likelihood ratio
`test. As a result, the watermarking procedure is
`a powerful digital copyright protection tool. We
`integrate this procedure with a second watermark
`which does not require the original signal. The dual
`watermarks also address the deadlock problem.
`In the next section, we introduce our noise-like
`author representation and the dual watermarking
`scheme. Our frequency and temporal masking mod-
`els are reviewed in Section 3. Our watermarking
`design and detection algorithms are introduced in
`Sections 4 and 5. Finally, experimental results
`are presented in Section 6. Watermark statistics
`and fidelity results for four test audio signals are
`
`presented. The robustness of our watermarking
`procedure is illustrated for a wide assortment of
`signal processing operations and distortions. We
`present our conclusion in Section 7.
`
`2. Author representation, dual watermarking and
`the deadlock problem
`
`Data embedding algorithms may be used to
`establish ownership and distribution of data. In
`fact, this is the application of data embedding or
`watermarking that has received most attention in
`the literature. Unfortunately, most current water-
`marking schemes are unable to resolve rightful
`ownership of digital data when multiple ownership
`claims are made, i.e., when a deadlock problem
`arises. The inability of many data embedding algo-
`rithms to deal with deadlock, first described by
`Craver et al. [4], is independent of how the water-
`mark is inserted in the multimedia data or how
`robust it is to various types of modifications.
`Today, no scheme can unambiguously determine
`ownership of a given multimedia signal if it does
`not use an original or other copy in the detection
`process to at least construct the watermark to be
`detected. A pirate can simply add his or her water-
`mark to the watermarked data or counterfeit
`a watermark that correlates well or is detected in
`the contested signal. Current data embedding
`schemes used as copyright protection algorithms
`are unable to establish who watermarked the data
`first. Furthermore, none of the current data embed-
`ding schemes has been proven to be immune to
`counterfeiting watermarks that will correlate well
`with a given signal as long as the watermark is not
`restricted to partially depend in a non-invertible
`manner on the signal.
`If the detection scheme can make use of the
`original to construct the watermark, then it may be
`possible to establish unambiguous ownership of the
`data regardless of whether the detection scheme
`subtracts the original from the signal under consid-
`eration prior to watermark detection or not. Spe-
`cifically, [5] derives a set of sufficient conditions
`that watermarks and watermarking schemes must
`satisfy to provide unambiguous proof of ownership.
For example, one can use watermarks derived from
pseudo-random sequences that depend on the signal
and the author. Ref. [5] establishes that this will
work for all watermarking procedures regardless of
whether they subtract the original from the signal
under consideration prior to watermark detection
or not. Ref. [20] independently derived a similar
result for a restricted class of watermarking
techniques that rely on subtracting a signal
derived from the original from the signal under
consideration prior to watermark detection.
`The signal-dependent key also helps to thwart
`the ‘mix-and-match’ attack described in [5].
An author can construct a watermark that depends on the audio signal and
the author and provides unambiguous proof of ownership as follows. The
author has two random keys $x_1$ and $x_2$ (i.e., seeds) from which a
pseudo-random sequence $y$ can be generated using a suitable pseudo-random
sequence generator [16]. Popular generators include RSA, Rabin,
Blum/Micali, and Blum/Blum/Shub [6]. With the two proper keys, the
watermark may be extracted. Without the two keys, the data hidden in the
signal is statistically undetectable and impossible to recover. Note that
classical maximal length pseudo-noise sequences (i.e., m-sequences)
generated by linear feedback shift registers are not used to generate a
watermark. Sequences generated by shift registers are cryptographically
insecure: one can solve for the feedback pattern (i.e., the keys) given a
small number of output bits $y$.

The noise-like sequence $y$ may be used to derive the actual watermark
hidden in the audio signal or to control the operation of the watermarking
algorithm, e.g., determine the location of samples that may be modified.
The key $x_1$ is author dependent. The key $x_2$ is signal dependent. The
key $x_1$ is the secret key assigned to (or chosen by) the author. Key
$x_2$ is computed from the audio signal which the author wishes to
watermark. It is computed from the signal using a one-way hash function.
For example, the tolerable error levels supplied by masking models (see
Section 3) are hashed in [20] to a key $x_2$. Any one of a number of
well-known secure one-way hash functions may be used to compute $x_2$,
including RSA, MD4 [17], and SHA [12]. For example, the Blum/Blum/Shub
pseudo-random generator uses the one-way function
$y = g_n(x) = x^2 \bmod n$, where $n = pq$ for primes $p$ and $q$ such that
$p \equiv q \equiv 3 \pmod 4$. It can be shown that generating $x$ or $y$
from partial knowledge of $y$ is computationally infeasible for the
Blum/Blum/Shub generator.

The signal-dependent key $x_2$ makes counterfeiting very difficult. The
pirate can only provide key $x_1$ to the arbitrator. Key $x_2$ is
automatically computed by the watermarking algorithm from the original
signal. As it is computationally infeasible to invert the one-way hash
function, the pirate is unable to fabricate a counterfeit original which
generates a desired or predetermined watermark.
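
The two-key author representation can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of SHA-256 as the one-way hash, the toy primes, and the way the two keys are merged into one seed are all assumptions made for illustration. The text only states that an author key and a signal-dependent key drive a pseudo-random generator such as Blum/Blum/Shub.

```python
import hashlib
import math

# Toy Blum primes (both congruent to 3 mod 4). Assumption for illustration:
# a real system would use large secret primes.
P, Q = 10007, 10039


def signal_key(signal_bytes: bytes) -> int:
    """Hypothetical signal-dependent key: a one-way hash of the signal data."""
    return int.from_bytes(hashlib.sha256(signal_bytes).digest(), "big")


def author_sequence(x1: int, x2: int, length: int) -> list[int]:
    """Generate `length` pseudo-random bits with a Blum/Blum/Shub generator."""
    n = P * Q
    # Assumption: the two keys are merged into one seed by hashing them together.
    seed = hashlib.sha256(f"{x1}:{x2}".encode()).digest()
    x = int.from_bytes(seed, "big") % n
    while x < 2 or math.gcd(x, n) != 1:  # the seed must be coprime to n
        x += 1
    bits = []
    for _ in range(length):
        x = (x * x) % n      # one-way step x -> x^2 mod n
        bits.append(x & 1)   # emit the least significant bit
    return bits
```

Because the second key is recomputed from the signal itself, a party who alters the claimed original also changes the sequence that the generator produces.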
`Deadlock may also be resolved using the dual
`watermarking scheme of [20]. That scheme employs
`a pair of watermarks. One watermarking procedure
`requires the original data set for watermark detec-
`tion. This paper provides a detailed description of
`that procedure and of its robustness. The second
`watermarking procedure does not require the orig-
`inal data set. A data embedding technique which
`satisfies the restrictions outlined in [5] can be used
`to insert the second watermark. The second water-
`mark need not be highly robust to editing of the
`audio segment since, as we shall see below, it is
`meant to protect the audio clip that a pirate claims
`to be his original. The robustness level of most of
`the recent watermarking techniques that do not
`require the original for watermark detection is quite
`adequate. The arbitrator would expect the original
`to be of a high enough quality. This limits the
`operations that a pirate can apply to an audio clip
`and still claim it to be his high-quality original
`sound. The watermark that requires the original
`audio sequence for its detection is very robust as we
`show in this paper.
`In case of deadlock, the arbitrator simply first
`checks for the watermark that requires the original
`for watermark detection. If the pirate is clever and
`has used the attack suggested in [4] and outlined
`above, the arbitrator would be unable to resolve
`the deadlock with this first test. The arbitrator
`simply then checks for the watermark that does not
`require the original audio sequence in the audio
`segments that each ownership contender claims to
`be his original. Since the original audio sequence of
`a pirate is derived from the watermarked copy
`produced by the rightful owner, it will contain the
`watermark of the rightful owner. On the other
`hand, the true original of the rightful owner will not
`contain the watermark of the pirate since the pirate
`has no access to that original and the watermark
`does not require subtraction of another data set for
`its detection.
`
`3. Audio masking
`
`Audio masking is the effect by which a faint but
`audible sound becomes inaudible in the presence of
`another louder audible sound, i.e., the masker [9].
`The masking effect depends on the spectral and
`temporal characteristics of both the masked signal
`and the masker. Our watermarking procedure
`directly exploits both frequency and temporal mask-
`ing characteristics to embed an inaudible and robust
`watermark.
`
`3.1. Frequency masking
`
`Frequency masking refers to masking between
`frequency components in the audio signal. If two
`signals, which occur simultaneously, are close to-
`gether in frequency, the stronger masking signal
`may make the weaker signal inaudible. The masking
`threshold of a masker depends on the frequency,
`sound pressure level (SPL), and tone-like or noise-
`like characteristics of both the masker and the
masked signal [13]. It is easier for a broadband
noise to mask a tonal signal than for a tonal signal to
mask a broadband noise. Moreover, higher-frequency
signals are more easily masked.
`The human ear acts as a frequency analyzer and
`can detect sounds with frequencies which vary from
`10 to 20 000 Hz. The HAS can be modeled by a set
`of 26 band-pass filters with bandwidths that increase
`with increasing frequency. The 26 bands are known
`as the critical bands. The critical bands are defined
`around a center frequency in which the noise band-
`width is increased until there is a just noticeable
`difference in the tone at the center frequency. Thus,
`if a faint tone lies in the critical band of a louder
`tone, the faint tone will not be perceptible.
`Frequency masking models are readily obtained
from the current generation of high-quality audio
codecs. In this work, we use the masking model
`defined in ISO-MPEG Audio Psychoacoustic
`Model 1, for Layer I [8]. We are currently updating
`our frequency masking model to the model specified
`by ISO-MPEG Audio Layer III. The Layer I mask-
`ing method is summarized as follows for a 32 kHz
`sampling rate [8,11]. The MPEG model also sup-
`ports sampling rates of 44.1 kHz and 48 kHz.
`
Step 1: Calculate the spectrum. Each 16 ms segment of the signal $s(n)$,
$N = 512$ samples, is weighted with a Hann window, $h(n)$:

$$h(n) = \frac{\sqrt{8/3}}{2}\left[1 - \cos\left(\frac{2\pi n}{N}\right)\right]. \tag{1}$$

The power spectrum of the signal $s(n)$ is calculated as

$$S(k) = 10 \log_{10}\left[\frac{1}{N}\left|\sum_{n=0}^{N-1} s(n)\,h(n)\exp\left(-j\,\frac{2\pi nk}{N}\right)\right|^{2}\right]. \tag{2}$$
`
`The maximum is normalized to a reference sound
`pressure level of 96 dB. The power spectrum of
`a 32 kHz test signal is shown in Fig. 1.
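
Step 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the reference MPEG code. The function names are hypothetical, a direct DFT is used for clarity where an FFT would be used in practice, and a small floor is applied before the logarithm to avoid taking the log of zero.

```python
import cmath
import math


def hann_window(n_samples: int) -> list[float]:
    """Eq. (1): Hann window scaled by sqrt(8/3) / 2."""
    scale = math.sqrt(8.0 / 3.0) / 2.0
    return [scale * (1.0 - math.cos(2.0 * math.pi * n / n_samples))
            for n in range(n_samples)]


def power_spectrum_db(block: list[float]) -> list[float]:
    """Eq. (2) for bins k = 0..N/2, with the maximum normalized to 96 dB."""
    n_samples = len(block)
    windowed = [s * h for s, h in zip(block, hann_window(n_samples))]
    spectrum_db = []
    for k in range(n_samples // 2 + 1):
        # Direct DFT of bin k (an FFT would be used in practice).
        acc = sum(x * cmath.exp(-2j * math.pi * n * k / n_samples)
                  for n, x in enumerate(windowed))
        power = abs(acc) ** 2 / n_samples
        spectrum_db.append(10.0 * math.log10(max(power, 1e-12)))
    peak = max(spectrum_db)
    return [value - peak + 96.0 for value in spectrum_db]
```

For a 512-sample block at 32 kHz, bin k corresponds to a frequency of 62.5k Hz.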
`
Step 2: Identify tonal components. Tonal (sinusoidal) and non-tonal (noisy)
components are identified because their masking models are different.
A tonal component is a local maximum of the spectrum ($S(k) > S(k+1)$ and
$S(k) \ge S(k-1)$) satisfying

$$S(k) - S(k+j) \ge 7\ \text{dB},$$

where
$j \in \{-2, +2\}$ if $2 < k < 63$,
$j \in \{-3, -2, +2, +3\}$ if $63 \le k < 127$,
$j \in \{-6, \ldots, -2, +2, \ldots, +6\}$ if $127 \le k \le 250$.

We add to its intensity those of the previous and
following components. Other tonal components in
the same frequency band are no longer considered.
`Non-tonal components are made of the sum of the
`intensities of the signal components remaining in
`each of the 24 critical bands between 0 and
`15 500 Hz. The auditory system behaves as a bank
`of bandpass filters, with continuously overlapping
`center frequencies. These ‘auditory filters’ can be
`approximated by rectangular filters with critical
`
`Fig. 1. Power spectrum of audio signal.
`
`bandwidth increasing with frequency. In this model,
`the audible band is therefore divided into 24 non-
`regular critical bands. Tonal and non-tonal compo-
`nents of the example audio signal are shown in
`Fig. 2.
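
The tonal test of Step 2 can be written directly from the conditions above. The sketch below is illustrative only. The function names are hypothetical, and it assumes the spectrum is a list of dB values for bins 0 to 256, so that every neighbour index it inspects exists.

```python
def neighbour_offsets(k: int) -> list[int]:
    """Offsets j that must lie at least 7 dB below bin k (Step 2)."""
    if 2 < k < 63:
        return [-2, 2]
    if 63 <= k < 127:
        return [-3, -2, 2, 3]
    if 127 <= k <= 250:
        return [-6, -5, -4, -3, -2, 2, 3, 4, 5, 6]
    return []


def find_tonal_bins(spectrum_db: list[float]) -> list[int]:
    """Return the bins k that are local maxima and pass the 7 dB test."""
    tonal = []
    for k in range(3, 251):
        is_local_max = (spectrum_db[k] > spectrum_db[k + 1]
                        and spectrum_db[k] >= spectrum_db[k - 1])
        if not is_local_max:
            continue
        if all(spectrum_db[k] - spectrum_db[k + j] >= 7.0
               for j in neighbour_offsets(k)):
            tonal.append(k)
    return tonal
```

A narrow peak passes the test, while a broad bump of the same height fails because its neighbours two bins away are less than 7 dB below the maximum.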
`Step 3: Remove masked components. Components
`below the absolute hearing threshold and tonal
`components separated by less than 0.5 Barks are
`removed. A plot of the removed components, along
`with the absolute hearing threshold is shown in
`Fig. 3.
Step 4: Individual and global masking thresholds. In this step, we account
for the frequency masking effects of the HAS. We need to discretize the
frequency axis according to hearing sensitivity and express frequencies in
Barks. Note that hearing sensitivity is higher at low frequencies. The
resulting masking curves are almost linear and depend on a masking index
that differs for tonal and non-tonal components. They are characterized by
different lower and upper slopes depending on the distance between the
masked and the masking component. We use $f_i$ to denote the set of
frequencies present in the test signal. The global masking threshold for
each frequency $f_i$ takes into account the absolute hearing threshold
$S_a$ and the masking curves $P_t$ and $P_n$ of the $N_t$ tonal components
and $N_n$ non-tonal components:

$$S_m(f_i) = 10 \log_{10}\left[10^{S_a(f_i)/10} + \sum_{j=1}^{N_t} 10^{P_t(z_j, f_i, P_j)/10} + \sum_{j=1}^{N_n} 10^{P_n(z_j, f_i, P_j)/10}\right]. \tag{3}$$
`
`The masking threshold is then the minimum of
`the local masking threshold and the absolute hear-
`ing threshold in each of the 32 equal width sub-
`bands of the spectrum. Any signal which falls below
`the masking threshold is inaudible. A plot of the
`original spectrum, along with the masking threshold,
`is shown in Fig. 4.
`As a result, for each audio block of N"512
`samples, a masking value (i.e., threshold) for each
`frequency component is produced. Modifications
`to the audio-frequency components less than the
`masking threshold create no audible distortions to
`the audio piece.
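
The combination rule of Eq. (3) adds the contributions as powers, not as decibel values. The following sketch shows that arithmetic for one frequency. The function name and argument layout are assumptions, and the individual masking-curve values are taken as given inputs instead of being computed from the MPEG model.

```python
import math


def global_threshold_db(absolute_db: float,
                        tonal_db: list[float],
                        nontonal_db: list[float]) -> float:
    """Combine the absolute threshold and individual masking levels (all in dB)
    at one frequency by summing their powers, as in Eq. (3)."""
    total_power = 10.0 ** (absolute_db / 10.0)
    total_power += sum(10.0 ** (level / 10.0) for level in tonal_db)
    total_power += sum(10.0 ** (level / 10.0) for level in nontonal_db)
    return 10.0 * math.log10(total_power)
```

Two contributions of equal level therefore raise the threshold by about 3 dB, not by a factor of two in decibels.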
`
`Fig. 2. Identification of tonal components.
`
`Fig. 3. Removal of masked components.
`
`3.2. Temporal masking
`
Temporal masking refers to both pre- and post-masking. Pre-masking effects
render weaker signals inaudible before the stronger masker is turned on,
and post-masking effects render weaker signals inaudible after the
stronger masker is turned off. Pre-masking occurs from 5 to 20 ms before
the masker is turned on while post-masking occurs from 50 to 200 ms after
the masker is turned off [13]. Note that temporal and frequency masking
effects have dual localization properties. Specifically, frequency masking
effects are localized in the frequency domain, while temporal masking
effects are localized in the time domain.

We approximate temporal masking effects using the envelope of the host
audio. The envelope is modeled as a decaying exponential. In particular,
the estimated envelope $t(i)$ of signal $s(i)$ increases with the signal
and decays as $e^{-\alpha t}$. An audio signal, along with its estimated
envelope, is shown in Fig. 5.

4. Watermark design

`Each audio signal is watermarked with a unique
`noise-like sequence shaped by the masking phe-
`nomena. The watermark consists of (1) an author
`representation (cf. Section 2), and (2) spectral and
`temporal shaping using the masking effects of the
`HAS.
Fig. 4. Original spectrum and masking threshold.

Our watermarking scheme is based on a repeated application of a basic
watermarking operation on smaller segments of the audio signal. A diagram
of our audio watermarking technique is shown in Fig. 6. The length-$N$
audio signal is first segmented into blocks $s_i(k)$ of length 512
samples, $i = 0, 1, \ldots, \lfloor N/512 \rfloor - 1$, and
$k = 0, 1, \ldots, 511$. The block size of 512 samples is dictated by the
frequency masking model we employ. Block sizes of 1024 have also been
used. The algorithm works as follows. For each audio segment $s_i(k)$:
1. compute the power spectrum $S_i(k)$ of the audio segment $s_i(k)$ (Eq. (2));
2. compute the frequency mask $M_i(k)$ of the power spectrum $S_i(k)$ (cf. Section 3.1);
3. use the mask $M_i(k)$ to weight the noise-like author representation for that audio block, creating the shaped author signature $P_i(k) = Y_i(k) M_i(k)$;
4. compute the inverse FFT of the shaped noise $p_i(k) = \mathrm{IFFT}(P_i(k))$;
5. compute the temporal mask $t_i(k)$ of $s_i(k)$ (cf. Section 3.2);
6. use the temporal mask $t_i(k)$ to further shape the frequency shaped noise, creating the watermark $w_i(k) = t_i(k) p_i(k)$ of that audio segment;
7. create the watermarked block $s'_i(k) = s_i(k) + w_i(k)$.
The overall watermark for a signal is simply the concatenation of the
watermark segments $w_i$ for all of the length-512 audio blocks. The
author signature $y_i$ for block $i$ is computed in terms of the personal
author key $x_1$ and signal-dependent key $x_2$ computed from block $s_i$.
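
Steps 3, 4, 6 and 7 of the algorithm above can be sketched as follows, assuming the two masks have already been computed. This is a minimal illustration, not the authors' code. The function names are hypothetical, the frequency mask is assumed to be a list of linear weights with one entry per DFT bin, a naive DFT stands in for the FFT, and the real part is taken after the inverse transform, which is exact only when the mask is symmetric about the Nyquist bin.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) DFT that stands in for the FFT/IFFT of the algorithm."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = [sum(v * cmath.exp(sign * 2j * math.pi * m * k / n)
               for m, v in enumerate(values))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def watermark_block(s, y, freq_mask, temporal_mask):
    """Shape the author sequence y by both masks and add it to block s."""
    Y = dft(y)                                            # spectrum of y_i
    P = [Yk * Mk for Yk, Mk in zip(Y, freq_mask)]         # step 3
    p = [v.real for v in dft(P, inverse=True)]            # step 4
    w = [tk * pk for tk, pk in zip(temporal_mask, p)]     # step 6
    marked = [sk + wk for sk, wk in zip(s, w)]            # step 7
    return marked, w
```

With both masks set to one the watermark equals the unshaped sequence, and a temporal mask of zero leaves the block untouched, which is the behaviour expected in silent regions.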
`The dual localization effects of the frequency and
`temporal masking control the watermark in both
domains. As noted earlier, frequency-domain
shaping alone is not enough to guarantee that the
watermark will be inaudible. Frequency-domain
masking computations are based on a Fourier
transform analysis. A fixed-length Fourier transform
does not provide good time localization for our
`application. In particular, a watermark computed
`using frequency-domain masking will spread in time
`over the entire analysis block. If the signal energy is
`concentrated in a time interval that is shorter than
`the analysis block length, the watermark is not
`masked outside of that subinterval. This leads to
`audible distortion, e.g., pre-echoes. The temporal
`mask guarantees that the ‘quiet’ regions are not
`disturbed by the watermark.
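
One simple way to obtain a temporal mask of the kind described in Section 3.2 is a peak follower that rises with the signal and decays exponentially. The sketch below is an assumption-laden illustration. The function name, the per-sample update rule, and the decay constant `alpha` (in units of 1/s) are not given in the text, which only states that the envelope increases with the signal and decays exponentially.

```python
import math


def estimate_envelope(signal, sample_rate, alpha):
    """Peak-following envelope: jumps up with |s(i)|, decays as exp(-alpha * t)."""
    decay = math.exp(-alpha / sample_rate)  # decay factor per sample
    envelope = []
    level = 0.0
    for sample in signal:
        level = max(abs(sample), level * decay)
        envelope.append(level)
    return envelope
```

Multiplying the frequency-shaped noise by such an envelope keeps the watermark small wherever the host block is quiet, which is the role the temporal mask plays in suppressing pre-echoes.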
`
`5. Watermark detection
`
`The watermark should be extractable even if
`common signal processing operations are applied
`to the host audio. This is particularly true in the
`case of deliberate unauthorized attempts to remove
`it. For example, a pirate may attempt to add noise,
`filter, code, re-sample, etc., an audio piece in an
`attempt to destroy the watermark. As the embedded
`watermark is noise-like, a pirate has insufficient
`knowledge to directly remove the watermark.
`Therefore, any destruction attempts are done
`blindly.
`
`Fig. 5. Audio signal and estimated envelope.
`
`Fig. 6. Diagram of audio watermarking procedure.
`
Let $r(i)$, $0 \le i \le N-1$, be $N$ samples of a recovered audio piece
which may or may not have a watermark. Assume first that we know the exact
location of the received signal. Without loss of generality, we will
assume that $r(i) = s(i) + d(i)$, $0 \le i \le N-1$, where $d(i)$ is a
disturbance that consists of noise only, or noise and a watermark. The
detection scheme relies on the fact that the author or arbitrator has
access to, or can compute, the original signal and the two keys $x_1$ and
$x_2$ required to generate the pseudo-random sequence $y$. Therefore,
detection of the watermark is accomplished via hypothesis testing. Since
$s(i)$ is known, we specifically need to consider the hypothesis test

$$\begin{aligned}
H_0:&\quad t(i) = r(i) - s(i) = n(i), && 0 \le i \le N-1 \quad \text{(No watermark)},\\
H_1:&\quad t(i) = r(i) - s(i) = w'(i) + n(i), && 0 \le i \le N-1 \quad \text{(Watermark)},
\end{aligned} \tag{4}$$

where $w'(i)$ is the potentially modified watermark, and $n(i)$ is noise.
The correct hypothesis is estimated by measuring the similarity between
the extracted signal $t(i)$ and original watermark $w(i)$:

$$\mathrm{Sim}(t, w) = \frac{\sum_{j=0}^{N-1} t(j)\,w(j)}{\sum_{j=0}^{N-1} w(j)\,w(j)} \tag{5}$$

and comparing with a threshold $T$. Note that Eq. (5) implicitly assumes
that the noise $n(i)$ is white, Gaussian with a zero mean, even though
this assumption may not be true. It also assumes that $w(i)$ has not been
modified. These two assumptions do not hold true in most situations.
However, our experiments indicate that, in practice, the detection test
given in Eq. (5) is very robust (see Section 6). Our experiments also
indicate that a threshold $T = 0.15$ yields a high detection performance.
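
The similarity test of Eq. (5) reduces to a normalized correlation. A minimal sketch follows. The function names are hypothetical, and the default threshold of 0.15 is the value quoted in the text.

```python
def similarity(extracted, watermark):
    """Eq. (5): correlation of the extracted signal with the watermark,
    normalized by the watermark energy."""
    numerator = sum(t * w for t, w in zip(extracted, watermark))
    denominator = sum(w * w for w in watermark)
    return numerator / denominator


def detect(received, original, watermark, threshold=0.15):
    """Subtract the known original, then compare the similarity to a threshold."""
    extracted = [r - s for r, s in zip(received, original)]
    return similarity(extracted, watermark) > threshold
```

An undistorted watermarked signal gives a similarity close to 1, and an unmarked copy of the original gives 0, so the threshold separates the two hypotheses with a wide margin in the noise-free case.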
Suppose now that we do not know the location of the observed clip $r(i)$.
Specifically, suppose that $r(i) = s(i+\tau) + d(i)$, $0 \le i \le N-1$,
where, as before, $d(i)$ is a disturbance that consists of noise only, or
noise and a watermark, and $\tau$ is the unknown delay corresponding to
the clip. Note that $\tau$ is not necessarily an integer. In this case, we
need to perform a generalized likelihood ratio test [23] to determine
whether the received signal has been watermarked or not. Once more, we
assume that the noise $n(i)$ is white, Gaussian with a zero mean even
though this may not be true. This leads us to compare the ratio

$$\frac{\max_{\tau} \exp\left(-\sum_{i=0}^{N-1}\bigl(r(i) - (s(i+\tau) + w(i+\tau))\bigr)^{2}\right)}{\max_{\tau} \exp\left(-\sum_{i=0}^{N-1}\bigl(r(i) - s(i+\tau)\bigr)^{2}\right)} \tag{6}$$

with a threshold. If this ratio is higher than the threshold, we would
declare the watermark to be present. Note that since $\tau$ is not
necessarily an integer, computing the numerator and denominator of Eq. (6)
requires that we perform interpolation or evaluate these expressions in
the Fourier domain using Parseval's theorem.
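
Taking the logarithm of Eq. (6) turns the ratio of maxima into a difference of minimum residual energies. The sketch below illustrates that form under a simplifying assumption: only integer delays are searched, so the interpolation or Fourier-domain evaluation needed for fractional delays is omitted. The function names are hypothetical.

```python
def residual_energy(received, reference, delay):
    """Sum of squared differences between the clip and the delayed reference."""
    return sum((r - reference[i + delay]) ** 2
               for i, r in enumerate(received))


def log_likelihood_ratio(received, original, watermark, max_delay):
    """Log of Eq. (6) over integer delays; positive values favour 'watermark'."""
    marked = [s + w for s, w in zip(original, watermark)]
    last_delay = min(max_delay, len(original) - len(received))
    delays = range(last_delay + 1)
    energy_marked = min(residual_energy(received, marked, d) for d in delays)
    energy_plain = min(residual_energy(received, original, d) for d in delays)
    return energy_plain - energy_marked
```

The result is then compared with the logarithm of the threshold applied to Eq. (6).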
`A generalized likelihood ratio test is also needed
`if one suspects that the rec
