(12) Patent Application Publication    (10) Pub. No.: US 2003/0028381 A1
Tucker et al.    (43) Pub. Date: Feb. 6, 2003

(54) METHOD FOR WATERMARKING DATA

(75) Inventors: Roger Cecil Ferry Tucker, Chepstow (GB); Paul St John Brittan, Claverham (GB)

Correspondence Address:
HEWLETT-PACKARD COMPANY
Intellectual Property Administration
P.O. Box 272400
Fort Collins, CO 80527-2400 (US)

(73) Assignee: HEWLETT PACKARD COMPANY

(21) Appl. No.: 10/201,958

(22) Filed: Jul. 25, 2002

(30) Foreign Application Priority Data

Jul. 31, 2001 (GB) 0118661.8

Publication Classification

(51) Int. Cl.7 G10L 21/00

(52) U.S. Cl. 704/273

(57) ABSTRACT
`
A method for inserting a watermark into an audio signal comprises substituting a noise-like signal portion with a replacement noise-like signal portion that is modulated with watermark data. In a preferred embodiment, Perceptual Noise Substitution is used to locate those portions of the audio signal which are noise-like and which may be replaced by synthetic noise modulated with watermark data.

Advantageously, the inventive method results in a signal having a synthetic noise signal portion which is modulated by watermark data but which is perceived merely as a noisy signal portion, not as one carrying watermark data. Furthermore, watermarks incorporated by the inventive method may be adapted to be robust to various audio compression schemes.
`
[Front-page drawing (cf. FIG. 3); legible labels: audio signal in, noise analysis, non-noise components, noise parameter extraction, noise-based synthesis, pseudo-random noise generator with known key, error-protected watermark data, audio signal out.]
`
`Sony Exhibit 1037
`Sony v. MZ Audio
`
`
`
Patent Application Publication    Feb. 6, 2003    Sheet 1 of 4    US 2003/0028381 A1

[FIGURE 1 (PRIOR ART): block diagram of the known compression process; legible labels: noise parameter encoding, normal compression, combined bit-stream.]

[FIGURE 2 (PRIOR ART): block diagram of the known decompression process; legible labels: noise-based synthesis, audio signal out.]
`
`
`
[FIGURE 3: block diagram of the watermark encoder; legible labels: signal in, noise analysis, noise parameter extraction, noise-based synthesis, pseudo-random noise generator with known key, error-protected watermark data, audio out.]
`
`
`
`
`
`
`
`
`
`
[FIGURE 5: block diagram of the watermark reader; legible labels include normalise, noise parameters, sequence checker, random noise generator.]
`
`
`
`
`
`
`
`
US 2003/0028381 A1
`
`Feb. 6, 2003
`
`METHOD FOR WATERMARKING DATA
`
`FIELD OF THE INVENTION
`
[0001] The present invention relates to a method for watermarking data, and in particular, but not exclusively, to watermarking an audio signal.
`
`BACKGROUND TO THE INVENTION
`
[0002] The process of embedding data in digitised media (audio, video or images) is often referred to as digital watermarking. Unlike the paper watermarking it is named after, a key requirement is that the digital watermark should be completely imperceptible. Other requirements depend on the application:
`
`[0003] A fragile watermark is used to show that the media
`has not been tampered with in any way, and should be
`affected whenever anything is done to the media, in particu-
`lar editing of any kind.
`
[0004] A robust watermark is mainly used to prove ownership or copyright and should not be removable no matter what is done to the media, including compression, writing to tape, editing or any other manipulation which retains the main value of the media.
`
[0005] Robust watermarking uses a combination of error correction coding (as discussed, for example, by P. Sweeney, “Error Control Coding (An Introduction)”, Prentice-Hall International Ltd., Englewood Cliffs, N.J. (1991)), spread-spectrum modulation (see, for example, R. Preuss, S. Roukos, A. Higgins, H. Gish, M. Bergamo, P. Peterson, “Embedded Signalling”, U.S. Pat. No. 5,319,735, 1994) and perceptual modelling (e.g. M. Swanson, B. Zhu, A. Tewfik, L. Boney, “Robust Audio Watermarking Using Perceptual Masking”, Signal Processing, vol. 66, no. 3, May 1998, pp. 337-355) to hide the watermark data in a way that is least perceptible but still recoverable.
`
[0006] A problem with perceptual modelling is that compression schemes use the same model to decide which parts of the signal do not need to be reproduced in the decoded audio. Thus the very part of the signal where the data is hidden is the part most likely to be removed by compression. However, even after compression, some of the watermark tends to remain, and the robustness introduced through spread-spectrum modulation and error coding allows it to be recovered as long as the embedded data bit-rate is low.
`
[0007] Some known watermarking schemes substitute part of an audio signal with a watermark signal. Examples of such schemes are given in U.S. Pat. No. 5,774,452 and by J F Tilki and A A Beex in “Encoding a Hidden Digital Signature onto an Audio Signal using Psychoacoustic Masking” (in Proc. 1996 7th Int. Conf. on Sig. Proc. Apps. and Tech., pp 476-480). Because the substituted signal is quite different from the original, these schemes rely on psychoacoustic masking to minimise the perceptual effect of the substitution. If it were possible to substitute a signal which is perceptually equivalent to the original audio, there would be no need to rely on psychoacoustic masking, and the signal would not be in danger of being removed by compression schemes like MP3 (MPEG Audio Layer 3, as set out in “Information technology - Coding of moving pictures and associated audio for digital storage media up to about 1.5 Mbit/s - Part 3: Audio”, ISO/IEC 11172-3:1993). W Bender, D Gruhl, N Morimoto and A Lu, in “Techniques for data hiding” (IBM Systems Journal, Vol. 35, Nos. 3 & 4, pp 313-336), propose just such an idea for image watermarking, a technique known as Texture Block Encoding. A human selects two areas of an image where the texture is similar, and a small amount of the first area is then copied into the second area; the shape of this copied data defines the watermark, which in the paper by Bender et al. is a few letters of text. The technique suffers from the need for a human both to select the areas and to assess the visual impact after watermarking, and is not suitable for automated watermarking.
`
[0008] A number of recent audio compression techniques search for parts of the signal that can be characterised by random noise, and substitute pseudo-random noise for these parts of the signal when decoding. R C F Tucker, in “Low Bit-Rate Frequency Extension Coding” (Audio and Music Technology: the Challenge of Creative DSP, IEE Colloquium, Nov. 18, 1998, pp 3/1-3/5), observes that the high-frequency parts of an audio signal can successfully be replaced by spectrally shaped noise for medium-quality compression. Scott Levine and Julius O Smith III, in “A Sines+Transients+Noise Audio Representation for Data Compression and Time/Pitch-Scale Modifications” (105th Audio Engineering Society Convention, San Francisco, 1998), use noise more carefully, separating out the transients from the steady-state noise and using transform coding on the transients. A more general scheme proposed by D Schulz in “Improving Audio Codecs by Noise Substitution” (J. Audio Eng. Soc., Vol. 44, No. 7/8, July/August 1996, pp 593-596), the contents of which are hereby incorporated by reference, searches all time-frequency segments above 5 kHz and uses synthetic noise to reproduce only those segments which have strong noise-like properties.
`
[0009] We have realised that a signal portion which has an attribute which is perceived to be non-information carrying, for example noise in an audio signal, can be replaced by a signal portion which has an attribute which is also perceived as being non-information carrying but which is modulated with watermark data. In particular, we have realised that it would be advantageous to replace a portion of a signal having a substantially random attribute with a replacement signal portion which also has a substantially random attribute, that attribute having been modulated with watermark data. In one embodiment of the present invention, the compression scheme suggested by D Schulz is utilised by modulating the synthetic noise with watermark data.
`
SUMMARY OF THE INVENTION
`
[0010] According to a first aspect of the invention there is provided a method of incorporating a watermark into a signal, comprising substituting a replaceable signal portion of the signal which has a substantially random attribute with a replacement signal portion, the replacement signal portion having a substantially random attribute which has been modulated by watermark data.
`
[0011] A watermark so incorporated is advantageously substantially imperceptible as a result of replacing a signal portion having a substantially random attribute with another signal portion also having a substantially random attribute.
`
[0012] An attribute of a signal portion may be the general
`nature of the signal portion or alternatively may be a
`particular parameter of the signal portion.
`
`
`
`
[0013] The method preferably comprises analysing an audio signal above a predetermined frequency for replaceable signal portions which are of a substantially random nature.
`
`[0014] The method may comprise analysing the audio
`signal for replaceable signal portions of a substantially
`random nature above 5 kHz.
`
`[0015] Preferably the method comprises analysing the
`audio signal in a predetermined frequency band for replace-
`able signal portions which are of a substantially random
`nature.
`
`[0016] Most preferably the predetermined frequency band
`is 5 kHz to 11 kHz.
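A quick way to see which analysis bands a 5 kHz to 11 kHz range covers. The sketch assumes 32 equal-width bands (the band count used in the embodiment of [0052]) spanning 0 Hz to the Nyquist frequency of a 44.1 kHz signal; the sample rate, the uniform band layout and the name `bands_in_range` are assumptions, not part of the specification.

```python
import math


def bands_in_range(f_lo_hz, f_hi_hz, sample_rate_hz=44100.0, n_bands=32):
    """Indices of equal-width sub-bands that overlap [f_lo_hz, f_hi_hz).

    Assumes n_bands uniform bands covering 0 Hz up to the Nyquist frequency.
    """
    width = (sample_rate_hz / 2.0) / n_bands  # 689.0625 Hz at 44.1 kHz
    first = int(f_lo_hz // width)  # band containing the lower edge
    last = min(n_bands - 1, math.ceil(f_hi_hz / width) - 1)  # last band below the upper edge
    return list(range(first, last + 1))


print(bands_in_range(5000, 11000))  # -> [7, 8, 9, 10, 11, 12, 13, 14, 15]
```

Under these assumptions band 7 starts at about 4.82 kHz and band 15 ends at 11.025 kHz, so the preferred range touches nine of the 32 bands; a different filter bank or sample rate would change the indices.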
`
`[0017] The replacement signal portion may comprise a
`signal generated by a random signal generator in accordance
`with a predetermined key.
`
[0018] Preferably an instantaneous signal level value of the replacement signal portion is modulated in response to a respective instantaneous value of the watermark data.
`
[0019] Preferably, where the watermark data comprises a first binary value and a second binary value, the first binary value results in a respective instantaneous signal level value of the replacement signal portion being multiplied by unity and the second binary value results in a respective instantaneous signal level value being inverted about a predetermined value of signal level.
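A minimal sketch of [0017] to [0019], assuming a Gaussian pseudo-random generator seeded by an integer key and assuming each watermark bit is spread over a fixed number of noise samples ("chips"). It also assumes that the first binary value is '1' (multiplied by unity) and the second is '0' (inverted). `keyed_noise`, `modulate` and `chip_len` are illustrative names, and Python's `random.Random` merely stands in for whatever keyed generator an implementation would use.

```python
import random


def keyed_noise(key, n):
    """Zero-mean, unit-variance Gaussian noise reproducible from an integer key."""
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def modulate(noise, bits, chip_len, ref=0.0):
    """Spread each watermark bit over chip_len consecutive noise samples.

    Assumed mapping: bit 1 -> sample multiplied by unity;
    bit 0 -> sample inverted about the level `ref` (x -> 2*ref - x),
    which for ref = 0 is a plain sign flip.
    """
    if len(noise) < len(bits) * chip_len:
        raise ValueError("not enough noise samples for the watermark bits")
    out = []
    for i, bit in enumerate(bits):
        for x in noise[i * chip_len:(i + 1) * chip_len]:
            out.append(x if bit == 1 else 2.0 * ref - x)
    return out


carrier = keyed_noise(key=1234, n=8)
signal = modulate(carrier, bits=[1, 0], chip_len=4)
```

Anyone holding the key can regenerate `carrier` exactly, which is what later allows a reader to demodulate by correlation.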
`
[0020] The watermark data may be incorporated into the signal as a plurality of discrete replacement signal portions, making the watermark data more difficult to locate.
`
`[0021] One bit of watermark data may advantageously be
`distributed over two discrete replacement signal portions.
`
`[0022] The discrete replacement signal portions are pref-
`erably temporally spaced.
`
`[0023] The discrete replacement signal portions may be
`spaced in the frequency domain.
`
[0024] A first replacement signal portion for a first portion
`of watermark data may be generated by a random signal
`generator in accordance with a first key, and a second
`replacement signal portion for a second portion of water-
`mark data may be generated by a random signal generator in
`accordance with a second key.
`
[0025] When the signal is an audio signal the signal may
`be divided into a plurality of time-frequency frames. Audio
`components within each frame are preferably analysed to
`determine a measure of the randomness of the signal pro-
`duced by the components.
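The division into time-frequency frames can be sketched as below, assuming the sub-band decomposition has already been done elsewhere (no filter bank is shown) and using the 12-sample frame length of [0052]. The function name and the dictionary layout are illustrative only. If each band were critically sub-sampled by 32 (also an assumption), a 12-sample frame would span 384 input samples, about 8.7 ms at 44.1 kHz, which is of the order of the 10 ms quoted in [0052].

```python
def tile_subbands(subbands, frame_len=12):
    """Cut per-band sample sequences into time-frequency frames.

    subbands: list of equal-length lists, one list per frequency band.
    Returns {(band_index, frame_index): samples}; a trailing partial
    frame in each band is dropped.
    """
    frames = {}
    for band, samples in enumerate(subbands):
        for t in range(len(samples) // frame_len):
            frames[(band, t)] = samples[t * frame_len:(t + 1) * frame_len]
    return frames


# 32 bands with 30 sub-band samples each -> 2 full frames per band.
demo = tile_subbands([[float(i) for i in range(30)] for _ in range(32)])
print(len(demo))  # -> 64
```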
`
[0026] The method may comprise incorporating a synchronisation sequence signal portion into the signal, the synchronisation sequence signal portion being generated by a random signal generator in accordance with a key, and the location of incorporation of the synchronisation sequence signal portion in the signal being indicative of the location of incorporation of a replacement signal portion in the signal.
`
[0027] The method may in addition comprise incorporating a header signal portion into the signal, the header signal portion comprising a signal portion generated by a random signal generator which is modulated by data which is representative of the frequency band in which the replacement signal portion is located.
`
`[0028] The replaceable signal portion may comprise a
`portion of an audio signal generated by a random signal
`generator in an audio synthesiser.
`
`[0029] The audio synthesiser may comprise a music syn-
`thesiser.
`
`[0030] The replaceable signal portion may comprise a
`portion of a speech signal.
`
`[0031] According to a second aspect of the invention there
`is provided a computer readable medium having stored
`therein instructions for causing a processing unit to execute
the method in accordance with the first aspect of the invention.
`
[0032] By ‘computer readable medium’ we mean a medium which is capable of storing instructions for a processing unit. The term ‘processing unit’ shall be taken to mean a device which accepts an input and processes that input in accordance with predetermined instructions to produce an output.
`
[0033] According to a third aspect of the invention there is
`provided an encoder which is configured to perform the
`method in accordance with the first aspect of the invention.
`
[0034] According to a fourth aspect of the invention there is provided a method of reading a signal which is provided with a watermark, comprising locating a replacement signal portion and identifying the presence of the watermark in said replacement signal portion, the replacement signal portion having a substantially random attribute which has been modulated by watermark data, and the replacement signal portion having replaced a replaceable signal portion which has a substantially random attribute.
`
[0035] The method may be a method of reading an audio signal which is provided with a watermark.
`
`[0036] Preferably the method comprises searching fre-
`quency bands for a recognisable synchronisation sequence
`signal portion.
`
[0037] The reading method desirably comprises locating a synchronisation sequence signal portion by comparing the audio signal to an output produced by a random signal generator in accordance with a key, the location of the synchronisation sequence signal portion being indicative of the location of the watermark data in the audio signal.
`
`[0038] The method may comprise demodulating the
`replacement signal portion by correlating an output pro-
`duced by a random signal generator in accordance with a
`known key with the replacement signal portion.
`
`[0039] When the signal is an audio signal the step of
`locating a replacement signal portion desirably comprises
`dividing the audio signal into a plurality of time-frequency
`frames, and analysing audio components in each frame to
`determine a measure of the randomness of the signal pro-
`duced by the components.
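[0039] (like [0053]) calls for a per-frame measure of randomness, and [0053] says it can be computed from the normalised prediction error described by Schulz. The sketch below uses one common way to obtain a normalised prediction error, a low-order linear predictor fitted with the Levinson-Durbin recursion; whether this matches Schulz's exact predictor is an assumption, and the function names are illustrative.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def normalised_prediction_error(x, order=2):
    """Prediction-error power of an order-`order` linear predictor over signal power.

    Autocorrelation method with the Levinson-Durbin recursion. Values near 1
    indicate unpredictable, noise-like content; values near 0 indicate
    predictable (for example tonal) content.
    """
    r = autocorr(x, order)
    if r[0] == 0.0:
        return 0.0
    a = [0.0] * (order + 1)  # predictor coefficients a[1..order]
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            return 0.0  # signal is perfectly predictable
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err  # reflection coefficient
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= 1.0 - k * k
    return err / r[0]


rng = random.Random(7)
noise_score = normalised_prediction_error([rng.gauss(0.0, 1.0) for _ in range(480)])
tone_score = normalised_prediction_error([math.sin(2 * math.pi * 0.05 * n) for n in range(480)])
```

A threshold on this score (its value is not given in the text) would then decide which frames are replaceable. Note that the frames of [0052] hold only 12 samples, for which such estimates are much noisier than in this 480-sample illustration.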
`
[0040] According to a fifth aspect of the invention there is provided a computer readable medium having stored therein instructions for causing a processing unit to execute the method in accordance with the fourth aspect of the invention.
`
`
`
`
[0041] According to a sixth aspect of the invention there is provided an encoder comprising a signal analyser, a random signal generator and a modulator, the arrangement being such that in use the signal analyser analyses a signal so as to determine a replaceable signal portion which has a substantially random attribute, the modulator being operative to modulate a replacement signal portion generated by the random signal generator with watermark data, and the replaceable signal portion being substituted by the replacement signal portion.
`
`[0042] According to a seventh aspect of the invention
`there is provided a reader comprising a signal analyser, a
`random signal generator and a demodulator, the arrangement
`being such that in use the signal analyser analyses a signal
`in order to determine the presence of a watermark in the
`signal, the watermark being incorporated into the signal by
way of a replacement signal portion and the replacement signal
`portion having a substantially random attribute which has
`been modulated by watermark data.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
[0043] FIG. 1 is a block diagram of a known audio signal compression process;
`
[0044] FIG. 2 is a block diagram of a known audio signal decompression process for decompressing a signal processed in accordance with FIG. 1;
`
`[0045] FIG. 3 is a block diagram of an encoder which
`incorporates watermark data into an audio signal in accor-
`dance with the invention;
`
`[0046] FIG. 4 is a schematic time frequency plot showing
`a watermark data packet; and
`
[0047] FIG. 5 is a block diagram illustrating a watermark reader for reading watermark data from an audio signal.
`
DESCRIPTION OF PREFERRED EMBODIMENTS
`
`[0048] Various embodiments of the invention will now be
`described, by way of example only, with reference to the
`accompanying drawings. With reference to FIGS. 1 and 2
`there is shown schematically a method of compressing an
`audio signal as set out in the aforementioned reference by D
`Schulz, known as Perceptual Noise Substitution (PNS).
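The PNS principle of FIGS. 1 and 2 can be illustrated under simplifying assumptions: a noise-like frame is transmitted only as its mean energy (a real codec transmits such parameters per band inside its own bitstream), and the decoder fills the frame with pseudo-random noise of the same energy. The function names are hypothetical and the frame is synthetic.

```python
import math
import random


def noise_parameter(frame):
    """Encoder side: describe a noise-like frame by its mean energy only."""
    return sum(x * x for x in frame) / len(frame)


def synthesise(energy, n, rng):
    """Decoder side: pseudo-random noise rescaled to the transmitted mean energy."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    raw_energy = sum(x * x for x in raw) / n
    scale = math.sqrt(energy / raw_energy) if raw_energy > 0.0 else 0.0
    return [scale * x for x in raw]


src = random.Random(1)
original = [src.uniform(-0.5, 0.5) for _ in range(12)]  # a noise-like frame
e = noise_parameter(original)  # the only value that would be transmitted
rebuilt = synthesise(e, len(original), random.Random(99))
```

`rebuilt` has the same energy as `original` but entirely different samples, which is why PNS only suits frames that are perceived as noise.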
`
[0049] More specifically, FIG. 1 shows an audio signal being input into a data compression unit 1. The audio signal undergoes noise analysis, whereby time-frequency frames of the signal are analysed so as to determine which of those frames are substantially noise-like, i.e. where the signal can be considered to be of a substantially random nature. Subsequently, those signal components which cannot be considered to be sufficiently noise-like are compressed in a conventional manner, whereas those components of the audio signal which have been determined to be substantially random in nature are sent to an encoder. The encoder generates data to indicate the broad frequency characteristic and energy of the components considered to be noise-like. Thus there is produced a bit-stream comprising data representing compressed non-noise-like signal components and data relating to the noise-like components. Such a method of compression results in a reduced bandwidth signal compared to one in which both noise and non-noise-like components are conventionally compressed.

[0050] Turning to FIG. 2, in order to regain the audio signal the combined bit stream is decompressed as follows. The combined bit stream is transmitted to a data decompression unit 2. The data representing the non-noise-like components is decompressed in a conventional manner. The data representing the noise-like components is fed to a synthesiser 3. The synthesiser 3 is operative to accept a signal from a pseudo-random noise generator 4 and, in response to the data representing the noise-like components, a noise signal is inserted into the audio signal where the original noise-like components were.

[0051] The following embodiment of the present invention comprises a combination of the above method carried out by the compression unit 3 and the decompression unit 14 to incorporate watermark data into an audio signal, as will be described below with reference initially to FIG. 3.

[0052] An audio signal which is to be watermarked is transmitted to watermarking apparatus 20. The audio signal is first passed to a noise analyser unit 5 in order to determine which time-frequency portions of the audio signal are to be considered noise-like, i.e. have a substantially random nature when taken in isolation. The signal is divided into thirty-two frequency bands within the audible range of frequencies. Time-frequency sub-frames are then created by sub-sampling each band and dividing the bands into groups of 12 samples representing approximately 10 ms of audio.

[0053] Each frame is then analysed to determine whether it is sufficiently noise-like to be replaced by a ‘synthetic’ noise signal portion. Each time-frequency frame is given a score indicating how noise-like the elements within that frame are. The score can be calculated from the normalised prediction error as described by Schulz in the aforementioned reference.

[0054] Having determined which frames are sufficiently noise-like, the step of noise parameter extraction comprises generating data, the noise parameters, which are representative of the energy of the frames which have been considered to be sufficiently noise-like. The noise parameters then undergo the step of noise-based synthesis, which is now described.

[0055] A pseudo-random noise generator 8 is operative to generate an audio noise signal in accordance with a known key. The output of the noise generator 8 provides an input to a modulator 7, which in addition accepts an input of a watermark data signal which is preferably error-protected. Where the watermark data is represented in binary, an error-protection scheme may comprise adding a ‘1’ or a ‘0’ to a string of digits depending on whether the string of digits consists of an even number or an odd number of ‘1’ digits respectively. Error protection allows the data to survive some deterioration in the signal, and also helps to ensure that data cannot be erroneously extracted from real noise.
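The check bit of [0055] can be sketched as follows. As described, a ‘1’ is appended when the data holds an even number of ones and a ‘0’ when it holds an odd number, so every protected word ends up with an odd number of ones; the function names are illustrative.

```python
def append_parity(bits):
    """Append the check bit of [0055]: 1 if the count of ones is even, else 0."""
    ones = sum(bits)
    return list(bits) + [1 if ones % 2 == 0 else 0]


def parity_ok(word):
    """Accept a received word only if it contains an odd number of ones."""
    return sum(word) % 2 == 1


protected = append_parity([1, 0, 1])
print(protected)  # -> [1, 0, 1, 1]
```

A single check bit detects any odd number of bit errors, and a bit string read from real noise passes only about half the time, so a stronger code (such as those covered by the Sweeney reference in [0005]) may be preferred in practice.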
`
[0056] The modulator 7 is operative to modulate the signal level of the pseudo-random noise in accordance with the watermark data. More specifically, an instantaneous amplitude value of the signal generated by the noise generator is either multiplied by unity or inverted about a predetermined signal level value, depending on whether the respective instantaneous value of the watermark data is ‘1’ or ‘0’. Thus, for example, a generated noise component of 30 which corresponds to an instantaneous watermark data value of ‘1’ would, when inverted about a zero signal level, result in a modulated value of -30.
`
[0057] The result of such modulation is a noise-like replacement signal portion which, notwithstanding the modulation, is of a substantially random nature.
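One way to see why the modulated portion stays noise-like, assuming inversion about a zero signal level: a sign flip leaves every squared sample unchanged, so the frame energy, which is the noise parameter of [0054], is exactly the same before and after modulation. The frame length, chip size and bit pattern below are arbitrary.

```python
import random


def frame_energy(x):
    return sum(v * v for v in x) / len(x)


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(48)]
bits = [1, 0, 0, 1]
chip = 12
# Bit 1 leaves a 12-sample chip unchanged; bit 0 inverts it about zero.
marked = [x if bits[i // chip] == 1 else -x for i, x in enumerate(noise)]

print(frame_energy(noise) == frame_energy(marked))  # -> True
```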
`
[0058] FIG. 4 shows a time-frequency plot in which there is shown a watermark data packet 10, comprising three signal sub-packets which are substantially contiguous in time, which has been embedded into an audio signal (not illustrated) where it has been determined that a noise-like portion in the original audio signal can be replaced by a synthetically generated modulated noise signal. The three signal sub-packets shown represent a synchronisation sequence 11, header information 12 and watermark data 13. The shorter the combined packet 10, the greater the relative overhead of the synchronisation sequence, but the shorter (and therefore more likely to occur) the noise-like portion needed to place it.
`
`[0059] As already stated a first step of the inventive
`method in this embodiment is to locate portions of the
`original signal which may be replaced by synthetically
`generated noise signal portions. A synchronisation sequence
`which is incorporated into the audio signal acts as a flag
which allows a watermark packet to be located. The synchronisation sequence is generated from the output of the noise generator using a known key, so that its signature may be recognised.
`
[0060] The synchronisation sequence achieves three purposes:

[0061] 1. it allows the exact start time of the data to be pinpointed;

[0062] 2. it allows any time, frequency or spectral distortions in the audio to be measured and compensated for in a normalisation process;

[0063] 3. it allows a further normalisation process to calculate the original noise parameters exactly, since the framing can be exactly the same as that used for the calculations conducted during insertion of the watermark data.
`
`[0064] The normalisation process can therefore recover
`the original modulated noise signal, apart from distortions
`caused by any compression that may have taken place.
`
[0065] The header contains the usual information, such as packet length, and may also contain information relating to the exact frequency band of the watermark data in the audio signal. The header and data sections are generated by modulating the information onto the output from the noise generator 8 using a known key.
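The three sub-packets of FIG. 4 can be assembled as in the sketch below. The field layout is entirely hypothetical: the specification names packet length and frequency band as header contents but gives no sizes, so an 8-bit data length and a 5-bit band index (enough for 32 bands) are assumed, together with a 64-sample sync and 8 noise samples per bit.

```python
import random


def keyed_noise(key, n):
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def to_bits(value, width):
    """Big-endian binary representation of a non-negative integer."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def build_packet(key, band, data_bits, sync_len=64, chip_len=8):
    """Assemble sync | header | data on one keyed noise carrier (hypothetical layout).

    Header = 8-bit data length + 5-bit band index.
    The sync is the unmodulated keyed noise; each header and data bit is
    spread over chip_len samples, kept for 1 and sign-flipped for 0.
    """
    if not 0 <= band < 32 or len(data_bits) > 255:
        raise ValueError("band or data length does not fit the header fields")
    payload = to_bits(len(data_bits), 8) + to_bits(band, 5) + list(data_bits)
    carrier = keyed_noise(key, sync_len + len(payload) * chip_len)
    packet = carrier[:sync_len]
    for i, bit in enumerate(payload):
        start = sync_len + i * chip_len
        packet += [x if bit == 1 else -x for x in carrier[start:start + chip_len]]
    return packet


packet = build_packet(key=5, band=9, data_bits=[1, 0, 1, 1])
```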
`
[0066] Although FIG. 4 shows the watermark data as being provided in a single packet, this need not necessarily be the case. Due to the limited length of the locations in the audio signal where a substitute noise signal portion may be inserted, the watermark data may need to be distributed over a plurality of discrete watermark data packets which are separated by portions of the original audio signal. However, even if it is not necessary to incorporate the watermark data in such a way, it would nevertheless be advantageous to distribute the watermark data over a plurality of discrete time-frequency packets. Thus, for example, one bit of the watermark data could be copied into at least two discrete watermark data packets so that increased robustness is achieved.
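[0066] only states that a bit may be copied into two or more packets; how the copies are recombined is not specified. A minimal sketch, assuming the reader has already computed one correlation score per copy (as in the demodulation of [0076]) and sums them (soft combining) before deciding:

```python
def combine_copies(correlations):
    """Decide one watermark bit from the correlation scores of its copies.

    Each score is the correlation of one received copy with its keyed carrier:
    positive favours 1, negative favours 0. Summing the soft scores lets a
    clean copy outvote a weak or damaged one (ties resolve to 1 here).
    """
    return 1 if sum(correlations) >= 0 else 0


print(combine_copies([-0.1, 0.9]))  # -> 1
```

Here the first copy, slightly damaged, is outvoted by the strong second copy; a hard majority vote over only two copies could not break such a disagreement.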
`
`[0067] Where the watermark data is dispersed over a
`plurality of discrete data packets, a different key (in a known
`sequence) may be used to start the pseudo-random noise
`generator for each packet to avoid using the same key twice
`and risking detection by autocorrelation.
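[0067] requires a different key for each packet, taken from a known sequence. The specification does not say how that sequence is produced; the sketch below assumes, purely for illustration, that each key is derived from a master key and the packet index with SHA-256 from Python's standard `hashlib` module.

```python
import hashlib


def packet_keys(master_key, count):
    """Derive a known sequence of per-packet generator keys (hypothetical scheme).

    key_i = first 8 bytes of SHA-256(master_key || i), read as an integer,
    so encoder and reader can rebuild the same sequence while no two
    packets share a key.
    """
    keys = []
    for i in range(count):
        digest = hashlib.sha256(master_key + i.to_bytes(4, "big")).digest()
        keys.append(int.from_bytes(digest[:8], "big"))
    return keys


keys = packet_keys(b"owner-secret", 4)
```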
`
[0068] The replacement signal portion should preferably be given short-term spectral colour or energy variations that make it difficult to detect by noise analysis but which are not perceptible. This exploits the necessarily conservative decision-making of any noise analysis system (such as that suggested by Schulz), which has to be careful not to make the substitution when there appear to be tonal components present. For a given noise analysis scheme, such as might be employed in a future MPEG-4 audio encoder, the noise should be altered just enough to stop it being detected whilst retaining its perception as noise.
`
`[0069] By placing the watermark packet in only a few of
`the possible substitution places in the original audio signal,
`and giving the watermark properties that make it harder to
`detect, any attempt to remove it will force the threshold at
`which substitution occurs to be lowered, and in doing so the
audio will be corrupted by making many inappropriate noise substitutions.
`
`[0070] Another possible way to ensure high robustness
`would be to adjust the properties of the generated noise
`signal according to the masking effect of the signal energy
`just beneath the noise band. The greater the energy of this
`signal, the more the masking effect and the less noise-like
the replacement signal can be. U.S. Pat. No. 5,774,452 uses this masking effect to hide frequency shift keying (FSK) data in the upper frequencies of the audio signal.
`
`[0071] The process of reading watermark data provided in
`an audio signal is now described.
`
[0072] FIG. 5 shows a watermark reader 14. The reader has stored in an associated storage device the key or set of keys used by the random noise generator 8, and from these can construct the synchronisation sequences found at the start of each packet; in FIG. 5, blocks B represent an additional step which will be needed for each key. If the reader 14 does not know the exact frequency band where the watermark packet has been placed, because that band was selected according to the original audio signal, it must estimate the possible locations in the same way as the watermark encoder 3 did. Alternatively, it could simply search all possible frequency bands until a synchronisation sequence is found, as shown schematically by blocks A in FIG. 5, which represent the requirement for a search of each frequency band. The headers 12 would contain the exact frequency band information, so that once any packet has been read, the reader knows the exact frequency band in which to search for other packets.
`
`[0073] The demodulator 18 is operative to compare the
`replacement signal portion which is modulated by water-
`mark data, with a signal produced by the random noise
`generator in accordance with the same key which generated
`the replacement signal portion before modulation.
`
`
`
`
[0074] The reader 14 searches a selected frequency band for a synchronisation sequence by approximately normalising the energy and spectrum of the audio in that band and then correlating it with a local copy (i.e. one which is known by the reader) of the synchronisation sequence 11. This correlation could take place in a conventional manner in the time domain or, for extra robustness to compression, in the same transform domain as that in which the watermark data is encoded.
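A time-domain sketch of the search in [0074], assuming the band signal has already been extracted and that `random.Random` stands in for the keyed generator. The normalised correlation makes the peak largely insensitive to the band's overall level, and the least-squares gain estimate illustrates the normalisation role of the sync listed in [0062]. All names and the simulated signal are illustrative.

```python
import random


def keyed_noise(key, n):
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def find_sync(received, sync):
    """Slide the known sync sequence along a band signal.

    Returns (offset, gain): the offset with the highest normalised correlation,
    and a least-squares gain estimate that can be used to normalise the packet.
    Returns (None, 0.0) if no positive correlation is found.
    """
    n = len(sync)
    sync_energy = sum(s * s for s in sync)
    best_offset, best_score, best_gain = None, 0.0, 0.0
    for off in range(len(received) - n + 1):
        window = received[off:off + n]
        dot = sum(w * s for w, s in zip(window, sync))
        win_energy = sum(w * w for w in window)
        if win_energy == 0.0:
            continue
        score = dot / (win_energy * sync_energy) ** 0.5  # normalised correlation
        if score > best_score:
            best_offset, best_score, best_gain = off, score, dot / sync_energy
    return best_offset, best_gain


# Simulated band: background noise with the sync embedded at half amplitude.
sync = keyed_noise(11, 64)
bg = random.Random(3)
received = [bg.gauss(0.0, 0.3) for _ in range(300)]
for i, s in enumerate(sync):
    received[120 + i] += 0.5 * s
offset, gain = find_sync(received, sync)
```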
`
`[0075] Once a positive correlation is found, demodulation
`of the located watermark data packet can begin.
`
`[0076] Demodulation is achieved by generating a random
`noise signal in accordance with the key which was used to
`generate the random noise signal which was modulated with
`watermark data during encoding. The demodulator 18 is
`operative to compare the normalised watermark packet with
`the random noise signal and hence infer the watermark data.
The watermark data so derived can then be checked against the watermark data which was encoded initially.
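The demodulation of [0073] and [0076] can be sketched as a chip-by-chip correlation with the regenerated carrier. This assumes the received samples are already aligned to the first data chip by the sync search, that each bit spans a fixed number of samples, and that '1' was sent unchanged and '0' inverted; the round trip below, with attenuation and added noise, is simulated.

```python
import random


def keyed_noise(key, n):
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def demodulate(received, key, n_bits, chip_len):
    """Regenerate the carrier from the key and correlate chip by chip.

    A positive correlation means the chip was left as is (bit 1),
    a negative one that it was inverted (bit 0).
    """
    carrier = keyed_noise(key, n_bits * chip_len)
    bits = []
    for i in range(n_bits):
        seg = slice(i * chip_len, (i + 1) * chip_len)
        corr = sum(r * c for r, c in zip(received[seg], carrier[seg]))
        bits.append(1 if corr >= 0 else 0)
    return bits


# Round trip: modulate, attenuate, add channel noise, then demodulate.
sent = [1, 0, 0, 1, 1, 0]
chip = 32
carrier = keyed_noise(77, len(sent) * chip)
tx = [x if sent[i // chip] == 1 else -x for i, x in enumerate(carrier)]
channel = random.Random(5)
rx = [0.8 * x + channel.gauss(0.0, 0.5) for x in tx]
decoded = demodulate(rx, key=77, n_bits=len(sent), chip_len=chip)
```

Only the sign of each correlation matters here, so an overall gain change does not affect the decision; the normalisation of [0064] matters more for recovering the exact noise parameters.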
`
[0077] It will be appreciated that although the encoder 3 and the reader 14 are shown schematically in FIGS. 3 and 5 respectively as comprising various physical modules or units, such as a noise generator 8 and a modulator 7, the steps which are conducted during the encoding and reading processes are carried out in one preferred embodiment by a computer comprising a processing unit and associated data storage.
`
`[0078] Many known watermark schemes mix the water-
`mark signal with the audio at a much lower, and therefore
`inaudible, signal level. Between this approach, which works
`on all types of audio, and complete substitution of the audio
`by the watermark, which works only for noise-like audio,
`there is the possibility of mixing the watermark data at an
`audible signal level where the signal is somewhat but not
completely noise-like. This approach would provide a fallback when the noise analysis fails to find enough segments
`in the original audio signal that can be completely substi-
`tuted by noise to embed a watermark. The level at which the
`watermark signal is mixed would depend on the score from
`the noise analysis.
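The specification leaves the mapping from noise-analysis score to mixing level open. A minimal sketch, assuming a linear ramp between two invented thresholds and a cross-fade from the original frame towards the watermark signal (so that the top of the ramp coincides with complete substitution); both the thresholds and the cross-fade rule are assumptions.

```python
def mix_gain(score, full_sub=0.9, floor=0.5):
    """Hypothetical map from a noise score in [0, 1] to a watermark weight.

    score >= full_sub: complete substitution (weight 1).
    score <= floor:    frame left alone (weight 0).
    in between:        linear ramp.
    """
    if score >= full_sub:
        return 1.0
    if score <= floor:
        return 0.0
    return (score - floor) / (full_sub - floor)


def mix_frame(original, watermark, score):
    """Cross-fade from the original frame towards the watermark signal."""
    g = mix_gain(score)
    return [(1.0 - g) * o + g * w for o, w in zip(original, watermark)]


print(mix_frame([1.0, 2.0], [3.0, 4.0], score=0.95))  # -> [3.0, 4.0]
```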
`
`[0079] Detection of watermark data embedded in such a
`combined way would work in the same way as described
`above, but the synchronisation sequence would need to be
`longer and the data bit rate of the watermark data lower, as
`sinusoidal components would interfere with the detection
`process.
`
[0080] The inventive method need not necessarily be implemented using noise substitution, and two other possible implementations are now discussed.
`
`[0081] Where parts of audio are generated by musical
`synthesis, eg a drum machine, synthesiser or sequencer, any
`random process in the synthesis can be exploited to carry
`watermark data. Clearly any noise-like synthetic signal can
`be used as described above, but many other opportunities
exist. For instance, since the timings of audio components produced by a background sequencer are usually randomly varied to give a less machine-like rhythm, this variation constitutes a substantially random attribute, and the exact timings can be varied to encode a few bits of data per note. Thus a signal portion comprising two such components can be considered to be a replaceable signal portion, the temporal spacing of such components being capable of being modulated by watermark data to produce a replacement signal portion.
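A sketch of carrying "a few bits of data per note" in sequencer timing. It assumes the reader knows the nominal note grid (for example from the tempo) and uses an invented scheme of 2 bits per note, selecting one of four offsets of -3, -1, +1 or +3 ms; neither the offsets nor the bit count come from the specification.

```python
LEVELS = (-1.5, -0.5, 0.5, 1.5)  # offset levels in units of step_ms


def encode_timings(nominal_ms, bits, step_ms=2.0):
    """Replace a sequencer's random timing jitter by data-driven jitter.

    Hypothetical scheme: each note carries 2 bits, selecting one of four
    offsets (LEVELS * step_ms, i.e. -3, -1, +1 or +3 ms by default) from
    its nominal grid time.
    """
    if len(bits) != 2 * len(nominal_ms):
        raise ValueError("need exactly 2 bits per note")
    played = []
    for i, t in enumerate(nominal_ms):
        symbol = 2 * bits[2 * i] + bits[2 * i + 1]
        played.append(t + LEVELS[symbol] * step_ms)
    return played


def decode_timings(nominal_ms, played_ms, step_ms=2.0):
    """Recover 2 bits per note from the offset level nearest to each timing."""
    bits = []
    for t, p in zip(nominal_ms, played_ms):
        offset = (p - t) / step_ms
        symbol = min(range(4), key=lambda s: abs(LEVELS[s] - offset))
        bits += [symbol >> 1, symbol & 1]
    return bits


grid = [0.0, 250.0, 500.0]  # nominal note times, e.g. quavers at 120 bpm
payload = [1, 0, 0, 1, 1, 1]
played = encode_timings(grid, payload)
```

With these values the levels are 2 ms apart, so timing drift of up to 1 ms is still decoded correctly.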
`
[0082] To illustrate how a random process other than noise might be exploited in audio, the varying timings in speech signals could be used to give a low data rate scheme. Speech contains pauses, not just between words but also smaller pauses as part of the sounds known as ‘stops’ (t, k, g, d, b, p in English). The precise timings of these pauses are perceived as being a substantially random attribute, and accordingly a signal portion comprising such a pause can be considered to be a replaceable signal portion. By passing a signal representing the speech through a short buffer, these pauses can be modulated by a small amount according to the watermark data to be embedded, to produce replacement signal portions. As the timings will be reproduced exactly by any compression scheme, the watermark will be robust to the particularly severe compression often applied to speech signals. For example, the speech signals may be part of a recording of a speech or may be produced by a digital voice synthesiser.
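The text does not say how modulated pauses would be read back without access to the original speech. One standard option, swapped in here as an assumption, is quantisation index modulation (QIM): each pause length is moved to a multiple of a step whose index parity encodes the bit, so the reader needs only the step size. The 4 ms step is an invented value.

```python
def embed_pause(length_ms, bit, step_ms=4.0):
    """Quantisation index modulation (QIM) of one pause length.

    The pause is moved to a multiple of step_ms whose index parity equals
    the bit, choosing the candidate nearest the original length, so the
    change never exceeds step_ms.
    """
    q = round(length_ms / step_ms)
    if q % 2 != bit:
        up, down = q + 1, q - 1
        if down < 0 or abs(up * step_ms - length_ms) <= abs(down * step_ms - length_ms):
            q = up
        else:
            q = down
    return q * step_ms


def read_pause(length_ms, step_ms=4.0):
    """Blind read-out: the parity of the nearest quantisation index."""
    return round(length_ms / step_ms) % 2


marked = [embed_pause(p, b) for p, b in zip([37.0, 52.5, 80.0, 2.0], [0, 0, 1, 1])]
```

Any timing drift smaller than half a step leaves the read-out unchanged, in line with the observation that compression reproduces the timings.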
`
`[0083] Robustness to deliberate attack by re-varying the
`pauses would require the pauses to be disguised with some
signal that is inconsequential to the human listener but will
`fool a pause detector.
`
`1. A meth