(12) United States Patent
     Bocko et al.

(10) Patent No.: US 7,289,961 B2
(45) Date of Patent: Oct. 30, 2007

(54) DATA HIDING VIA PHASE MANIPULATION OF AUDIO SIGNALS

(75) Inventors: Mark F. Bocko, Caledonia, NY (US); Zeljko Ignjatovic, Rochester, NY (US)

(73) Assignee: University of Rochester, Rochester, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 107 days.

(21) Appl. No.: 10/870,685

(22) Filed: Jun. 18, 2004

(65) Prior Publication Data

     US 2005/0033579 A1    Feb. 10, 2005

     Related U.S. Application Data

(60) Provisional application No. 60/479,438, filed on Jun. 19, 2003.

(51) Int. Cl.
     G10L 21/00    (2006.01)
(52) U.S. Cl. ........ 704/273; 704/270; 704/253
(58) Field of Classification Search ........ 704/273, 704/270, 253
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,937,000 A *    8/1999   Lee et al. ............ 375/141
6,175,627 B1     1/2001   Petrovic et al.
6,266,430 B1     7/2001   Rhoads
6,363,159 B1     3/2002   Rhoads
6,404,898 B1     6/2002   Rhoads
6,427,012 B1     7/2002   Petrovic et al.
6,430,301 B1     8/2002   Petrovic et al.
6,442,283 B1     8/2002   Tewfik et al.
6,526,385 B1 *   2/2003   Kobayashi et al. ...... 704/504
[entries illegible in source]
6,560,350 B2     5/2003   Rhoads
6,567,780 B2     5/2003   Rhoads
6,633,654 B2    10/2003   Hannigan et al.
6,647,128 B1    11/2003   Rhoads
6,647,129 B2    11/2003   Rhoads
6,650,762 B2 *  11/2003   Gibson et al. ......... 382/100
6,654,480 B2    11/2003   Rhoads
6,674,876 B1     1/2004   Hannigan et al.
6,675,146 B2     1/2004   Rhoads
6,684,199 B1     1/2004   Stebbings

(Continued)

OTHER PUBLICATIONS

Gang et al., “MP3 resistant oblivious steganography”, Acoustics, Speech, and Signal Processing, 2001, Proceedings (ICASSP ’01), IEEE international conference, May 7-11, 2001, pp. 1365-1368, vol. 3.*

(Continued)

Primary Examiner: Richemond Dorvil
Assistant Examiner: Qi Han
(74) Attorney, Agent, or Firm: Blank Rome LLP

(57) ABSTRACT
`
Data are embedded in an audio signal for watermarking, steganography, or other purposes. The audio signal is divided into time frames. In each time frame, the relative phases of one or more frequency bands are shifted to represent the data to be embedded. In one embodiment, two frequency bands are selected according to a pseudo-random sequence, and their relative phase is shifted. In another embodiment, the phases of one or more overtones relative to the fundamental tone are quantized.

10 Claims, 8 Drawing Sheets
`
Sony Exhibit 1001
Sony v. MZ Audio

U.S. PATENT DOCUMENTS

6,707,409 B1       3/2004   Ignjatovic et al.
6,737,957 B1       5/2004   Petrovic et al.
6,792,542 B1       9/2004   Lee et al.
6,996,521 B2 *     2/2006   Iliev et al. .......... 704/200
2002/0034224 A1 *  3/2002   Srinivasan ............ 375/240
2002/0107691 A1 *  8/2002   Kirovski et al. ....... 704/270
2003/0095685 A1 *  5/2003   Tewfik et al. ......... 382/100

OTHER PUBLICATIONS

H. J. Kim, et al., “Audio watermarking techniques”, in Intelligent Watermarking Techniques, H. C. Huang, H. M. Hang, and J. S. Pan (Editors), World Scientific Publishing Co., May 2004.
“Audio Signal Watermarking Based on Replica Modulation”, Rade Petrovic, Telsiks 2001, Yugoslavia, Sep. 19-21, 2001, pp. 227-234.
“Data Hiding Within Audio Signals”, Rade Petrovic, et al., Telsiks 1999, Oct. 13-15, 1999, pp. 88-95.

* cited by examiner

U.S. Patent    Oct. 30, 2007    Sheet 1 of 8    US 7,289,961 B2

[Figure 1: attribute space of data embedding techniques. Robustness axis: Fragile, Semi-Fragile, Robust. Visibility axis: Visible, Perceptually Invisible, Undetectable.]

[Figure 2: spectrogram of speech. Recovered axis label: Magnitude.]

U.S. Patent    Oct. 30, 2007    Sheet 2 of 8    US 7,289,961 B2

[Figure 3: phase diagram. Horizontal axis: Frequency.]

[Second phase diagram on this sheet, presumably Figure 4 (label not legible in source). Horizontal axis: Frequency.]

U.S. Patent    Oct. 30, 2007    Sheet 3 of 8    US 7,289,961 B2

[Figure 5: spectrogram. Recovered axis labels: Magnitude, Time.]

[Figure 6: spectrogram. Recovered axis label: Time.]

U.S. Patent    Oct. 30, 2007    Sheet 4 of 8    US 7,289,961 B2

[Figure 7: graph. Horizontal axis: SNR [dB]. Vertical axis label garbled in source, in [%].]

[Figure 8: graph. Horizontal axis: MP3 encoder bitrate (kbits), with ticks at 32, 48, 64, 80, 96, 112, 128, 160, 192, 224. Vertical axis label garbled in source, in (%).]

U.S. Patent    Oct. 30, 2007    Sheet 5 of 8    US 7,289,961 B2

[Figure 9: graph. Recovered axis labels: Bit Error Rate [%], Sample delay.]

[Figure 10: graph. Recovered axis labels: Decoding Error Rate, Correction Frames usage [%].]

U.S. Patent    Oct. 30, 2007    Sheet 6 of 8    US 7,289,961 B2

[Drawing on this sheet is not recoverable from the source text.]

U.S. Patent    Oct. 30, 2007    Sheet 7 of 8    US 7,289,961 B2

[Drawing on this sheet is not recoverable from the source text.]

U.S. Patent    Oct. 30, 2007    Sheet 8 of 8    US 7,289,961 B2

[FIG. 13: flow chart. Step 1302: DIVIDE AUDIO SIGNAL INTO TIME FRAMES AND FREQUENCY COMPONENTS. Step 1304: SELECT AT LEAST TWO FREQUENCY COMPONENTS. Remaining boxes are not legible in the source text.]

DATA HIDING VIA PHASE MANIPULATION OF AUDIO SIGNALS

REFERENCE TO RELATED APPLICATION

The present application claims the benefit of U.S. Provisional Patent Application No. 60/479,438, filed Jun. 19, 2003, whose disclosure is hereby incorporated by reference in its entirety into the present disclosure.

STATEMENT OF GOVERNMENT INTEREST

The work leading to the present invention was supported by the Air Force Research Laboratory/IFEC under grant number F30602-02-1-0129. The government has certain rights in the invention.

FIELD OF THE INVENTION

The present invention is directed to a system and method for insertion of hidden data into audio signals and retrieval of such data from audio signals and is more particularly directed to such a system and method using a phase encoding scheme.

DESCRIPTION OF RELATED ART

Digital watermarking currently is receiving a great amount of attention due to commercial interests that seek to control the distribution of digital media as well as other types of digital data. A watermark is data that is embedded in a media or document file that serves to identify the integrity, the origin or the intended recipient of the host data file. One attribute of watermarks is that they may be visible or invisible. A watermark also may be robust, fragile or semi-fragile. The data capacity of a watermark is a further attribute. Trade-offs among these three properties are possible and each type of watermark has its specific use. For example, robust watermarks are useful for establishing ownership of data, whereas fragile watermarks are useful for verifying the authenticity of data.

Steganography literally means “covered writing” and is closely related to watermarking, sharing many of the attributes and techniques of watermarking. Steganography works by embedding messages within other, seemingly harmless messages, so that seemingly harmless messages will not arouse the suspicion of those wishing to intercept the embedded messages.

As a basic example, a message can be embedded in a bitmap image in the following manner. In each byte of the bitmap image, the least significant bit is discarded and replaced by a bit of the message to be hidden. While the colors of the bitmap image will be altered, the alteration of colors will typically be subtle enough that most observers will not notice. An intended recipient can reconstruct the hidden message by extracting the least significant bit of each byte in the transmitted image. If the bitmap image has eight-bit color depth (256 colors), and the message to be hidden is a text message with eight-bit text encoding, then each letter of the text message can be encoded in and extracted from eight pixels of the bitmap image. While more sophisticated examples exist, the above example will serve to illustrate the basic concept.
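
By way of illustration only, the least-significant-bit example above might be sketched in Python as follows. The function names `embed_lsb` and `extract_lsb`, the most-significant-bit-first ordering of message bits, and the stand-in pixel bytes are assumptions made for this sketch and do not appear in the disclosure.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Hide `message` in the least significant bits of `cover`, one bit per byte."""
    # Assumed bit order: most significant bit of each message byte first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # discard the LSB, insert one message bit
    return bytes(stego)


def extract_lsb(stego: bytes, n_chars: int) -> bytes:
    """Rebuild `n_chars` eight-bit characters from the LSBs of 8 * n_chars bytes."""
    out = bytearray()
    for c in range(n_chars):
        value = 0
        for byte in stego[8 * c: 8 * c + 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(200, 216)) * 2      # 32 stand-in pixel bytes (eight per letter)
stego = embed_lsb(cover, b"Hi")
recovered = extract_lsb(stego, 2)
```

Each cover byte changes by at most one unit, which is why the alteration tends to go unnoticed.
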

The field of steganography is receiving a good deal of attention due to interest in covert communication via the Internet, as well as via other channels, and data hiding in information systems security applications. The single most important requirement of a steganographic method is that it be invisible to all but the intended recipient of the message.

FIG. 1 illustrates the attributes and uses of various categories of watermarking and steganographic techniques. Two dimensions that characterize watermarking and steganographic techniques are visibility and robustness. In FIG. 1, the “visibility” axis extends from visible to undetectable, and the “robustness” axis extends from fragile to robust. In this “attribute” space we show the regions occupied by various watermarking and steganographic techniques. Ideally, steganography should always be undetectable. A third dimension, data capacity, also may be included. In general, enhancement of any of the three attributes (visibility, robustness, and capacity) compromises the other two attributes.

Steganography in digital audio signals is especially challenging due to the acuity and complexity of the human auditory system (HAS). Besides having a wide dynamic range and a fairly small differential range, the HAS is unable to perceive absolute monaural phase, except in certain contrived situations.

FIG. 2 shows the magnitude and phase spectrogram of a few seconds of speech, specifically, a male voice saying, “This is a sample of speech.” The upper plot shows the magnitude of the spectrum as a function of time. The bands of horizontal lines represent the overtone spectrum of the pitched portions of the signal. In addition to the usual display of the magnitude of the spectral density (in the upper plot), the phase of the spectrum is also displayed (in the lower plot). The phase of the spectrum is apparently random. This was verified by computing the autocorrelation in frequency of each spectral “slice”; it was found to be highly peaked at zero delay, indicating no correlation.

Two companies, Verance and Digimarc, have introduced schemes for watermarking of audio signals. Those two schemes will be described.

Verance was formed in 1999 from the merger of ARIS Technologies Inc. and Solana Technology Development Corporation. Verance provides software packages to companies interested in controlling the use of their copyrighted digital audio content, but the major application seems to be in broadcast monitoring and verification. For that application, hidden tags are inserted into digital files for TV and radio commercials, programs and music, and a service is provided which monitors all airplay in all major US media markets so that reports can be provided to the advertisers and copyright owners.

In 1999, Verance was selected to provide a worldwide industry standard for copy protected DVD audio and in the Secure Digital Music Initiative (SDMI) and was adopted by the 4C Entity, a consortium of technology companies committed to “protecting entertainment content when recorded to physical media.” Verance’s audio watermarking technology was intended to embed inaudible yet identifiable digital codes into an audio waveform. The audio watermarks are expected to carry detailed information associated with the audio and audio-visual content for such purposes as monitoring and tracking its distribution and use as well as controlling access to and usage of the content. Embedded watermarks travel with the audio and audiovisual content wherever it goes and are highly resistant to even the most sophisticated attempts to remove them.

The problem with Verance’s technology for copyright protection, however, is that it can be hacked. It has been demonstrated that the watermark data can be detected and removed by hackers who were able to discover the key by applying general signal process analysis. This weakness was uncovered in a “hackers challenge” test, set up by the SDMI. The technology has not been accepted by the industry since its announcement in 1999.

Digimarc was founded in 1995 with a focus on deterring counterfeiting and piracy of media content through “digital watermarking,” primarily for images and video. It had revenue in 2002 of $80M. Its earliest success came from working with a consortium of leading central banks on the development of a system to deter PC counterfeiting of banknotes. The company provides products and services that enable production of millions of personal identification products such as driver’s licenses in more than 33 US states and 20 countries.

Digimarc does not have a significant business in audio watermarking, but about six years ago, Digimarc competed in an open, competitive bid process by the DVD-CCA (DVD Copy Control Association), to protect movies from piracy. The DVD-CCA includes the leading companies from the motion picture, computer and consumer electronics industries. The DVD-CCA decided on Aug. 1, 2002, that the offered technologies from Digimarc and its competitors were inadequate. An interim solution was announced by the DVD-CCA on Sep. 15, 2003. It appears that the interim DVD-CCA solution is no longer supported.

Other technologies will now be described.

An alternative data protection technique from NEC, as described in U.S. Pat. No. 6,539,475 (Method for protecting digital data through unauthorized copying), has a trigger signal embedded in the data. If the embedded trigger mark is present, the data is considered to be a scrambled copy. The device then descrambles the input data if it detects a trigger signal. In the case of an unauthorized copy that contains a trigger signal with unscrambled data, the descrambler would render the data useless.

The principal weakness of this technology lies in the requirement to remove the protection before the data can be used. If an authorized person is able to insert the recording device after the descrambling, an unprotected and descrambled copy of the data can be made.

In another patent, U.S. Pat. No. 6,684,199, assigned to the Recording Industry Association of America, the system authenticates data by introducing an authentication key in the form of a predetermined error. The purpose is to prevent piracy through unauthorized access and unauthorized copying of the data stored on the media disc. It is one of the few techniques that can survive analog conversion, but it is open to signal processing analysis by hackers.

Examination of various music and speech spectrograms indicates an apparent randomness of phase, which is not surprising since the analysis frequencies of the spectral analysis are not phase coherent with the frequencies present in the signal. So far, however, that apparent randomness of phase has not been exploited for data-hiding purposes.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to overcome the above-noted deficiencies of the prior art.

It is another object of the invention to realize a technique which resists blind signal-processing attacks.

It is still another object of the invention to realize a technique which can survive digital-to-analog conversion.

It is yet another object of the invention to realize a technique which can survive lossy audio compression, such as MPEG-1 layer III (MP3) compression, and which can even be applied directly to compressed audio files such as MP3 files.

To achieve the above and other objects, the present invention is directed to a technique in which the phase of chosen components of the host audio signal is manipulated. In a preferred embodiment, the phase manipulation, and thus the hidden message, may be detected by a receiver with the proper “key.” Without the key, the hidden data is undetectable, both aurally and via blind digital signal processing attacks. The method described is both aurally transparent and robust and can be applied to both analog and digital audio signals, the latter including uncompressed as well as compressed audio file formats such as MP3. The present invention allows up to 20 kbits of data to be embedded in compressed or uncompressed audio files.

Naturally occurring audio signals such as music or voice contain a fundamental frequency and a spectrum of overtones with well-defined relative phases. When the phases of the overtones are modulated to create a composite waveform different from the original, the difference will not be easily detected. Thus, the manipulation of the phases of the harmonics in an overtone spectrum of voice or music may be exploited as a channel for the transmission of hidden data.

The fact that the phases are random presents an opportunity to replace the random phase in the original sound file with any pseudo-random sequence in which one may embed hidden data. In such an approach, the embedded data is encoded in the larger features of the cover file, which enhances the robustness of the method. To extract the embedded data, one uses the “key” to distinguish the phase modulation encoding from the inherent phase randomness of the audio signal.

The present invention has the advantage over existing Verance algorithms of being undetectable and robust to blind signal processing attacks and of being uniquely robust to digital to analog conversion processing.

The present invention can be used to watermark movies by applying the watermark to the audio channel in such a way as to resist detection or tampering.

The present invention would allow copies of the data to be distributed as unscrambled information, but would contain the capability to identify the source of any copy. For example, a digital rights management system implementing the present invention would inform users as they download music that unauthorized copies are traceable to them and they are responsible for preventing further illegal distribution of the downloaded file.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention and variations thereon will be set forth in detail with reference to the drawings, in which:

FIG. 1 is a conceptual diagram illustrating the attributes of various data embedding techniques;
FIG. 2 is a spectrogram showing characteristics of human speech;
FIG. 3 is a phase diagram illustrating a first preferred embodiment of the present invention;
FIG. 4 is a phase diagram illustrating a second preferred embodiment of the present invention;
FIG. 5 is a spectrogram of a musical excerpt used to test the present invention;
FIG. 6 is a spectrogram of the same musical excerpt with data embedded therein;
FIG. 7 is a graph of the decoding error rate as a function of signal-to-noise ratio (SNR) for three levels of quantization;
FIG. 8 is a graph of the decoding error rate as a function of MP3 encoder bit rate for three levels of quantization;
FIG. 9 is a graph of bit error rate as a function of sample density for different frame lengths;
FIG. 10 is a graph of decoding error rate as a function of a rate of usage of synchronization frames;
FIG. 11 is a schematic diagram showing a sigma-delta modulator for reducing phase discontinuities;
FIG. 12 is a schematic diagram showing a system on which either of the preferred embodiments can be implemented; and
FIG. 13 is a flow chart summarizing the preferred embodiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Two preferred embodiments and variations thereon will be set forth in detail with reference to the drawings.

A first method of phase encoding is indicated in FIG. 3. In the illustrated method, during each time frame one selects a pair (or more) of frequency components of the spectrum and re-assigns their relative phases. The choice of spectral components and the selected phase shift can be chosen according to a pseudo-random sequence known only to the sender and receiver. To decode, one must compute the phase of the spectrum and correlate it with the known pseudo-random carrier sequence.

More specifically, a phase encoding scheme is indicated in which information is inserted as the relative phase of a pair of partials [symbols illegible in source] in the sound spectrum. In each time frame a new pair of partials may be chosen according to a pseudo-random sequence known only to the sender and receiver. The relative phase between the two chosen spectral components is then modified according to a pseudo-random sequence onto which the hidden message is encoded.
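
A minimal sketch of this first scheme, under stated assumptions, is given below. It operates directly on per-frame lists of phases rather than on audio, uses Python's seeded `random.Random` as a stand-in for the shared pseudo-random sequence, and maps a message bit to an additional offset of 0 or π on top of the pseudo-random carrier phase. The names `frame_plan`, `encode` and `decode`, and the 0/π mapping, are hypothetical choices for illustration and are not specified in the disclosure.

```python
import math
import random


def wrap(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def frame_plan(key, n_frames, n_bins):
    """Per-frame (bin_a, bin_b, carrier_phase) derived from the shared key."""
    rng = random.Random(key)
    plan = []
    for _ in range(n_frames):
        a, b = rng.sample(range(1, n_bins), 2)   # two distinct spectral components
        plan.append((a, b, rng.uniform(-math.pi, math.pi)))
    return plan


def encode(phases, bits, key):
    """Return a copy of `phases` with one bit hidden per frame."""
    out = [list(frame) for frame in phases]
    plan = frame_plan(key, len(out), len(out[0]))
    for frame, bit, (a, b, carrier) in zip(out, bits, plan):
        # Re-assign the phase of bin b relative to bin a.
        frame[b] = wrap(frame[a] + carrier + math.pi * bit)
    return out


def decode(phases, key):
    """Recover one bit per frame by removing the known carrier phase."""
    bits = []
    plan = frame_plan(key, len(phases), len(phases[0]))
    for frame, (a, b, carrier) in zip(phases, plan):
        residual = wrap(frame[b] - frame[a] - carrier)
        bits.append(1 if abs(residual) > math.pi / 2 else 0)
    return bits


rng = random.Random(0)
host = [[rng.uniform(-math.pi, math.pi) for _ in range(16)] for _ in range(8)]
message = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = decode(encode(host, message, key=1234), key=1234)
```

A receiver without the key does not know which pair of components, or which carrier phase, to examine in each frame.
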

A second preferred embodiment, called the Relative Phase Quantization Encoding Scheme or the Quantization Index Modulation (QIM) scheme, will now be disclosed with reference to FIG. 4. In that phase encoding method the following steps are employed. One first computes the spectrum of a frame of audio data, then selects an apparent fundamental tone and its series of overtones as shown in the left plot of FIG. 4; it is convenient to select the strongest frequency component in the spectrum. Then, two of the overtones in the selected series are “relative phase quantized” according to one of two quantization scales, as shown on the right. The choice of quantization levels indicates a “1” or “0” datum. The relative phase-quantized spectrum is then inversely transformed to convert back to the time domain.

The second preferred embodiment uses a variable set of phase quantization steps as explained below.

Step 1:
Segment the time representation of the audio signal $S[i]$ ($0 \le i \le I-1$) into a series of frames of $L$ points, $S_m[i]$ ($0 \le i \le L-1$). At this stage, a threshold check may be applied and the frame skipped if insufficient audio power was present in the frame.
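
Step 1 might be sketched as follows. The function name `split_frames`, the use of mean squared amplitude as the power measure, the `min_power` parameter, and the dropping of a trailing partial frame are assumptions for this sketch; the disclosure does not specify how the threshold is computed.

```python
def split_frames(signal, frame_len, min_power=0.0):
    """Yield (frame_index, frame) for each full frame with enough mean power."""
    n_frames = len(signal) // frame_len          # a trailing partial frame is dropped
    for m in range(n_frames):
        frame = signal[m * frame_len:(m + 1) * frame_len]
        power = sum(x * x for x in frame) / frame_len
        if power >= min_power:                   # threshold check: skip quiet frames
            yield m, frame


signal = [0.0] * 8 + [1.0] * 8 + [0.5] * 4
kept = [m for m, _ in split_frames(signal, frame_len=8, min_power=0.1)]
```
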

Step 2:
Compute the spectrum of each frame of audio data and calculate the phase of each frequency component within the frame, $\Phi_m(\omega_k)$ ($0 \le k \le L-1$). An idealization of a typical spectrum with a fundamental and accompanying overtone series is shown.
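
As a sketch of Step 2, the phase of each frequency component can be read from a discrete Fourier transform of the frame. A direct O(L²) transform is written out here only to keep the example free of external libraries; a practical implementation would presumably use an FFT. The names `dft`, `frame_phases` and `strongest_bin` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Direct discrete Fourier transform of one frame."""
    L = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / L) for i, x in enumerate(frame))
            for k in range(L)]


def frame_phases(frame):
    """Phase in radians of every frequency component of the frame."""
    return [cmath.phase(c) for c in dft(frame)]


def strongest_bin(frame):
    """Index of the largest-magnitude component among bins 1 .. L/2."""
    spectrum = dft(frame)
    return max(range(1, len(frame) // 2 + 1), key=lambda k: abs(spectrum[k]))


# A test tone at bin 3 with a known phase of 0.7 rad.
frame = [math.cos(2 * math.pi * 3 * i / 32 + 0.7) for i in range(32)]
k0 = strongest_bin(frame)
phase_at_k0 = frame_phases(frame)[k0]
```
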

Step 3:
Quantize the relative phases of two of the overtones in the selected frame according to one of two quantization scales, as shown on the right of FIG. 4.

$\Delta\Phi = \pi / 2^{n}$

If ‘1’ is to be embedded,

$\Phi'_m(\omega_k) = \Delta\Phi \times \mathrm{round}\left(\Phi_m(\omega_k) / \Delta\Phi\right)$

If ‘0’ is to be embedded,

$\Phi'_m(\omega_k) = \Delta\Phi \times \mathrm{round}\left(\Phi_m(\omega_k) / \Delta\Phi - 0.5\right) + \Delta\Phi / 2$

The number of quantization levels ‘n’ is variable. The greater the number of levels, the less audible the effect of phase quantization. However, when a greater number of quantization levels is employed, the probability of data recovery error increases.
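
The two quantization scales of Step 3 can be sketched as a single function. The name `quantize_phase` is hypothetical, and Python's built-in `round` (which rounds exact ties to the nearest even integer) is assumed to be an acceptable stand-in for the rounding operation in the formulas above.

```python
import math


def quantize_phase(phase, bit, n):
    """Move `phase` (radians) to the nearest level of the scale selected by `bit`."""
    step = math.pi / 2 ** n                      # quantization step for exponent n
    if bit == 1:
        return step * round(phase / step)        # scale for '1': multiples of step
    # scale for '0': the same grid offset by half a step
    return step * round(phase / step - 0.5) + step / 2


q1 = quantize_phase(1.0, 1, n=2)   # nearest multiple of pi/4
q0 = quantize_phase(1.0, 0, n=2)   # nearest odd multiple of pi/8
```

In either case the phase moves by at most half a step, so a larger `n` means a smaller change to the host signal but also less separation between the '1' and '0' scales.
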

Step 4:
Inverse transform the phase-quantized spectrum to convert back to the time representation of the signal by applying an L-point IFFT (inverse fast Fourier transform).

Recovery of the embedded data requires the receiver to compute the spectrum of the signal and to know which two spectral components were phase quantized. In the tests described later, the relative phase between the fundamental and the second harmonic was employed as the communication channel.
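
The disclosure does not spell out the decision rule used at the receiver. One plausible rule, assumed here for illustration, is to decide which of the two quantization scales lies closer to the measured relative phase. The names `nearest_level` and `decode_bit` are hypothetical.

```python
import math


def nearest_level(phase, bit, n):
    """Nearest quantization level on the scale associated with `bit`."""
    step = math.pi / 2 ** n
    if bit == 1:
        return step * round(phase / step)
    return step * round(phase / step - 0.5) + step / 2


def decode_bit(phase, n):
    """Assumed rule: pick the scale whose nearest level is closer to `phase`."""
    d1 = abs(phase - nearest_level(phase, 1, n))
    d0 = abs(phase - nearest_level(phase, 0, n))
    return 1 if d1 < d0 else 0


# Round trip with a phase disturbance of 0.1 rad, below a quarter step for n = 2.
results = []
for bit in (0, 1):
    for original in (-3.0, -1.2, 0.3, 2.9):
        received = nearest_level(original, bit, n=2) + 0.1
        results.append(decode_bit(received, n=2) == bit)
```

Under this rule a bit survives any phase disturbance smaller than a quarter of the quantization step, which is consistent with the trade-off between the number of levels and the recovery error noted above.
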

FIG. 5 shows the spectrum (magnitude is in the upper plot and the phase in the lower plot) of a musical excerpt (“Nite-Flite” by the Sammy Nestico Big Band). FIG. 6 shows the spectrum (magnitude and phase) of the same music file with 1 kbit of hidden data. The data is encoded in the phase quantization of the second harmonic of the strongest spectral component of each frame; four quantization levels are used. There is no apparent spectral evidence of the embedded data. In this method any one or several of the spectral components may be so manipulated.

The method described above was also applied to a 23-second-long classical guitar solo. Gaussian noise was introduced prior to decoding. The relative phase between the 2 strongest harmonics of the music file was quantized and embedded with 1 kbit of binary data then followed with the decoding process in the presence of Gaussian noise. The above was done for 3 different quantization scales ($2^n$ equally spaced quantization levels), with n=1, 2 and 3 respectively. The decoding error rate at 3 different quantization levels with increasing signal to noise ratio (SNR) is shown in FIG. 7.
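
The noise test described above could be set up along the following lines. This sketch only shows how Gaussian noise might be scaled to a target SNR; the function name `add_gaussian_noise`, the fixed seed, and the choice to scale the noise to the exact measured power ratio are assumptions, and the sine wave is a stand-in for the music file.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=0):
    """Return `signal` plus Gaussian noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Choose the scale so that p_signal / (scale^2 * p_noise) = 10^(snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * v for x, v in zip(signal, noise)]


clean = [math.sin(0.1 * i) for i in range(1000)]
noisy = add_gaussian_noise(clean, snr_db=20.0)
```
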

Applying the method described here to 512-point frames of 44,100 samples/sec audio one may encode 86 bits per second per chosen spectral line. This is slightly over 5 kbits/minute. We have also employed the method on up to 4 harmonics of the overtone spectrum with satisfactory results, raising the data capacity to approximately 20 kbits/minute.
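
The capacity figures follow from one bit per frame per spectral line, as the short calculation below shows. The variable names are illustrative.

```python
sample_rate = 44_100          # samples per second
frame_len = 512               # samples per frame, one bit per frame per spectral line

bits_per_second = sample_rate / frame_len        # about 86 frames, hence bits, per second
bits_per_minute = bits_per_second * 60           # slightly over 5 kbits per minute
bits_per_minute_4_lines = 4 * bits_per_minute    # roughly 20 kbits per minute
```
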

The robustness of data against lossy compression will now be described. MP3 is a common form of lossy audio compression that employs human auditory system features, specifically frequency and temporal masking, to compress audio by a factor of approximately 1:10.

The robustness of the steganographic technique described above was evaluated by hiding data in an uncompressed (.wav) audio file followed by conversion to MP3 format and then back to .wav format. The spectrograms of the final .wav files were indistinguishable from the originals, and the audio quality was typical of MP3 compressed audio. In the example presented here, we embedded 1 kbit of data in the phase of the 2nd harmonic of the strongest spectral feature in each frame. The file was then converted to MP3 using the Lame MP3 encoder, converted back to .wav format and then examined for the presence of the hidden data. In FIG. 8, the decoding error rate is illustrated as a function of the MP3 encoder output bitrate, ranging from 32 kbit/sec to 224 kbit/sec. We explored data survivability as a function of the number of quantization steps, $2^n$, for n=1, 2, 3. The frame length employed was 576 points and the sampling frequency was 44,100 Hz.

It was found that the data recovery error rate could be reduced to near zero by employing an amplitude threshold in the selection of the segments of audio data that were encoded. A weak form of error correction could be employed to guard against such infrequent errors. One also may implement the techniques described above directly in compressed audio files, which would eliminate recovery errors.

To test the robustness of the stego message under D-A-D conversion, the audio file with the embedded binary stego message was recorded to cassette tape employing a common tape deck and then re-digitized using the same deck for play-back. The tape deck introduced amplitude modulation, nonlinear time shifts (wow and flutter) and broad-band noise.

The encoding method performs best when the decoder and the encoder are synchronized. As shown in FIG. 9, de-synchronization leads to an increased bit-recovery error rate. Therefore, a synchronization method is needed to compensate for the time shifts introduced by the D-A-D conversion process. One such method that we found to be effective is as follows. First, at the encoder we chose frames distributed periodically throughout the file to encode a stego message that is known to the decoder. At the decoder these frames serve as “synchronization frames”. For example, if we encode every fourth frame in the audio file with the binary stego message ‘1’, during decoding we may check every fourth frame to assess the instantaneous time-shift and then resynchronize the remaining data frames before decoding.
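
One way to picture the synchronization search is as a scan over candidate sample offsets, keeping the offset at which the most synchronization frames decode to the known bit. The sketch below assumes a single global offset and takes the per-frame decoder as a parameter; `best_shift` is a hypothetical name, and `toy_decode` is a deliberately simple stand-in for the phase-based decoder, used only so that the example runs on its own.

```python
def best_shift(samples, frame_len, sync_every, max_shift, decode_bit, sync_bit=1):
    """Return the offset at which the most sync frames decode to `sync_bit`."""
    best, best_score = 0, -1
    for shift in range(max_shift + 1):
        score = 0
        start = shift
        while start + frame_len <= len(samples):
            if decode_bit(samples[start:start + frame_len]) == sync_bit:
                score += 1
            start += sync_every * frame_len      # jump to the next sync frame
        if score > best_score:
            best, best_score = shift, score
    return best


def toy_decode(frame):
    """Stand-in decoder: 1 if the first half is all ones and the rest all zeros."""
    half = len(frame) // 2
    return 1 if all(frame[:half]) and not any(frame[half:]) else 0


sync_frame = [1, 1, 1, 1, 0, 0, 0, 0]
data_frame = [0] * 8
# Every fourth frame is a sync frame; the stream is delayed by three samples.
stream = [0] * 3 + (sync_frame + data_frame * 3) * 3
offset = best_shift(stream, frame_len=8, sync_every=4, max_shift=7,
                    decode_bit=toy_decode)
```

Wow and flutter produce a time-varying shift, so in practice the search would presumably be repeated locally around each synchronization frame rather than once for the whole file.
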

Another factor is the ratio of power between the selected harmonics. In some frames, the power ratio is too low to allow robust encoding and those frames will be skipped. We found that for a power ratio of 1:5, the robustness of the method was maintained.

FIG. 10 shows the decoding error rate as a function of the percentage of frames employed for synchronization. As we can see from the figure, the decoding error rate decreases as the number of synchronization frames increases. For example, when 45% of the frames are employed as synchronization frames, the decoding error rate approaches 10%.

An artifact of the phase manipulation method described above is a small discontinuity at the frame boundaries caused by reassignment of the phase of one of the spectral components. Depending upon the magnitude of the discontinuity, there may be a broad spectral component, appearing as white noise, in the background of the host file spectrum. In order to reduce the magnitude of the discontinuity, three techniques have been employed. In the first, rather than reassigning the phase of a single spectral component we do so for a band of frequencies in the neighborhood of the spectral component of interest. We typically use a band of frequencies of width equal to a few percent of the signal bandwidth.

A second method is to employ an error diffusion technique using a sigma-delta modulator. Background information on sigma-delta modulation is found in our U.S. Pat. No. 6,707,409, issued Mar. 16, 2004.

FIG. 11 shows a schematic diagram of a device for error diffusion employed in conjunction with the phase-manipulation data-hiding method. FIG. 11 represents the most general case for N-th order sigma-delta modulation as used to diffuse an error resulting from embedding data into the host signal. In the device 1100 of FIG. 11, a host signal supplied to an input 1102 is integrated through a series of integrators 1104-1, 1104-2, . . . , 1104-N. The integrated signal is received in an embedding module, where a watermark or other signal received at a watermark input 1106 is embedded. The resulting signal is output through an output 1110 and is also fed back to the integrators 1104-1, 1104-2, . . . , 1104-N through subtracting circuits 1112. Although the device of FIG. 11 has been applied to frame sizes of 1,024 samples, the frame size is variable, and the resulting audio quality is clearly affected by the choice of the frame size.
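
The error-diffusion idea can be illustrated with a first-order loop, which is a simplification of the N-th order device of FIG. 11. In this sketch a plain rounding quantizer stands in for the embedding module, and the loop is written in error-feedback form; both are assumptions made for illustration, as is the name `sigma_delta`.

```python
def sigma_delta(x, step):
    """First-order error diffusion: quantize x while carrying the error forward."""
    y = []
    acc = 0.0                            # integrator: running sum of (input - output)
    for sample in x:
        acc += sample
        q = step * round(acc / step)     # stand-in for the embedding module
        y.append(q)
        acc -= q                         # feedback: subtract what was actually output
    return y


x = [0.3] * 10
y = sigma_delta(x, step=1.0)
accumulated_error = sum(x) - sum(y)
```

Because the integrator holds the running difference between input and output, the accumulated error stays within half a quantization step instead of growing with the number of samples.
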

Although both of these methods proved to be acceptable, a third method proved to be the simplest and most effective. The third method for reducing the phase discontinuities at the frame boundaries is simply to force the phase shifts to go to zero at the frame boundaries. In our implementation we employed a raised cosine function $(1+\cos)^n$ with $n=10$. At the frame boundaries the phase of the chosen harmonic is not shifted and in the central region of the frame the phase is shifted by an amount equal to the difference of the original phase of the chosen harmonic and the nearest phase quantization step. The audible artifacts are eliminated in this method.
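
A sketch of such a taper is given below. The argument of the cosine and the normalization are not given in the text, so the form used here, ((1 + cos(2π(i/(L-1) - 1/2))) / 2)^n, is an assumption chosen so that the weight is 0 at both frame boundaries and 1 at the frame centre. The names `taper` and `tapered_shift` are hypothetical.

```python
import math


def taper(L, n=10):
    """Weights that are 0 at the frame boundaries and 1 at the frame centre."""
    return [((1 + math.cos(2 * math.pi * (i / (L - 1) - 0.5))) / 2) ** n
            for i in range(L)]


def tapered_shift(original_phase, quantized_phase, L, n=10):
    """Per-sample phase shift: full shift mid-frame, no shift at the edges."""
    full_shift = quantized_phase - original_phase
    return [w * full_shift for w in taper(L, n)]


weights = taper(9)
shifts = tapered_shift(1.0, math.pi / 4, L=9)
```
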
`
`FIG. 12 shows a system on which the present invention,
`including either of the two preferred embodiments disclosed
`above, can be implemented. The system 1200 is shown as
`including an encoder 1202 and a decoder 1214, although, of
`course, either of the devices 1202, 1214 could have both
`encoding and decoding capabilities.
`In the encoder 1202, the audio signal and the data to be
`embeddedare received in an input 1204. A processor 1206
`embedsthe data in the audio signal and outputs the encoded
`file through an output 1208. From the output 1208,
`the
`encodedfile can be transmitted in any suitable fashion,e.g.,
`by being placed on a persistent storage medium 1210 (DVD,
`CD, tape, or the like) or by being transmitted over a live
`transmission system 1212.
`In the decoder 1214, the encodedfile is received at an
`input 1216. A processor 1218 extracts the embedded data
`from the signal and outputs the data through an output 1220.
`If required, the audio signal can also be output through the
`output 1220. For example, if the embedded data are used for
`watermarking purposes, the data and the audio signal can be
`supplied to a player which will not play the audio signal
`unless the required watermarking data are present.
`The preferred embodiments will now be summarized with
`reference to the flow chart of FIG. 13. In step 1302, the audio
`signal is divided into time frames and frequency compo-
`nents. In step 1304, at least two frequency components are
`selected. In step 1306, the phases are altered in accordance
`with the data.
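The three steps of FIG. 13 can be sketched as follows. This is an illustrative sketch only, not the patented encoding: the non-overlapping rectangular frames, the fixed bin indices `REF_BIN` and `DATA_BIN`, and the rule of setting the relative phase to +pi/2 or -pi/2 for a 1 or 0 bit are all assumptions made for the example, and it omits the quantization and boundary smoothing discussed above.

```python
import numpy as np

FRAME = 1024                 # frame size in samples (one value mentioned in the text)
REF_BIN, DATA_BIN = 10, 20   # hypothetical choice of the two frequency components

def embed_bits(signal, bits):
    """Embed one bit per frame in the relative phase of two FFT bins."""
    out = np.array(signal, dtype=float)
    for i, bit in enumerate(bits):
        start = i * FRAME
        frame = out[start:start + FRAME]
        if len(frame) < FRAME:
            break  # not enough samples left for a full frame
        # Step 1302: time frame -> frequency components
        spec = np.fft.rfft(frame)
        # Step 1304: select two components (fixed bins in this sketch)
        ref_phase = np.angle(spec[REF_BIN])
        # Step 1306: alter the phase in accordance with the data (assumed rule)
        target = ref_phase + (np.pi / 2 if bit else -np.pi / 2)
        spec[DATA_BIN] = np.abs(spec[DATA_BIN]) * np.exp(1j * target)
        out[start:start + FRAME] = np.fft.irfft(spec, n=FRAME)
    return out

def extract_bits(signal, count):
    """Recover bits from the sign of the sine of the relative phase."""
    bits = []
    for i in range(count):
        spec = np.fft.rfft(signal[i * FRAME:(i + 1) * FRAME])
        diff = np.angle(spec[DATA_BIN]) - np.angle(spec[REF_BIN])
        bits.append(1 if np.sin(diff) > 0 else 0)
    return bits

rng = np.random.default_rng(0)
audio = rng.standard_normal(4 * FRAME)   # stand-in for an audio signal
payload = [1, 0, 1, 1]
encoded = embed_bits(audio, payload)
recovered = extract_bits(encoded, len(payload))
```

Only the phase of the data bin changes and its magnitude is preserved, so the magnitude spectrum of each frame is unaltered. The sketch assumes the decoder is aligned to the same frame boundaries as the encoder.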
`While two preferred embodiments and variations thereon
`have been set forth above in detail, those skilled in the art
`who have reviewed the present disclosure will readily appreciate
`that other embodiments can be realized within the
`scope of the invention. For example, numerical values are
`illustrative rather than limiting, as are recitations of specific
`file formats. Moreover, in addition to steganography and
`watermarking, any suitable use for hidden data falls within
`the present invention. Furthermore, the present invention
`can be implemented on any suitable hardware through any
`suitable software, firmware, or the like. Also, audio signals
`or files are not limited to portions of data recognized as
`discrete files by an operating system, but instead may be
`continuously recorded signals or portions thereof. Therefore,
`the present invention should be construed as limited only by
`the appended claims.
`We claim:
`1. A method for embedding data in an audio signal, the
`method comprising:
`(a) dividing the audio signal into a plurality of time frames
`and, in each time frame, a plurality of frequency
`components;
`(b) in each of at least some of the plurality of time frames,
`selecting at least two of the plurality of frequency
`components; and
`(c) alte
