`_____________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`_____________
`
`SONY GROUP CORPORATION (JAPAN), SONY CORPORATION OF
`AMERICA, SONY INTERACTIVE ENTERTAINMENT LLC, SONY
`PICTURES ENTERTAINMENT INC., SONY ELECTRONICS INC., and
`VERANCE CORPORATION,
`Petitioners,
`
`v.
`
`MZ AUDIO SCIENCE, LLC,
`Patent Owner.
`_____________
`
`Case No. TBD
`Patent No. 7,289,961
`_____________
`
`DECLARATION OF RACHEL J. WATTERS
RELATING TO EXHIBIT 1023
`
`Sony Exhibit 1054
`Sony v. MZ Audio
`
`
`
`Declaration of Rachel J. Watters on Authentication of Publication
`
`I, Rachel J. Watters, am a librarian, and the Head of Resource Sharing for the
`
`General Library System, Memorial Library, located at 728 State Street, Madison,
`
`Wisconsin, 53706. Part of my job responsibilities include oversight of Wisconsin
`
`TechSearch (“WTS”), an interlibrary loan department at the University of Wisconsin-
`
`Madison.
`
I have worked as a librarian at the University of Wisconsin library system
since 1998, starting as a graduate student employee in the Kurt F. Wendt Engineering
Library and WTS, then as a librarian in Interlibrary Loan at Memorial Library. I began
professional employment at WTS in 2002 and became WTS Director in 2011. In 2019,
I became Head of Resource Sharing for UW-Madison’s General Library System. I
have a master’s degree in Library and Information Studies from the University of
Wisconsin-Madison. Through the course of my studies and employment, I have
become well informed about the operations of the University of Wisconsin library
system, which follows standard library practices.
`
`This Declaration relates to the dates of receipt and availability of the following:
`
Cheng, Q. and Sorensen, J. (2001). Spread spectrum signaling
for speech watermarking. Proceedings of the 2001 IEEE
`International Conference on Acoustics, Speech, and Signal
`Processing, Volume III of VI: Image & Multidimensional Signal
`Processing, Multimedia Signal Processing, 7-11 May, 2001, Salt
`Lake City, Utah, p. 1337-1340.
`
Standard operating procedures for materials at the University of Wisconsin-
Madison Libraries. When a volume was received by the Library, it would be checked
in, added to library holdings records, and made available to readers as soon after its
arrival as possible. The procedure normally took a few days or at most 2 to 3 weeks.
`
Exhibit A to this Declaration is a true and accurate copy of the front matter of the
Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and
Signal Processing, Volume III of VI: Image & Multidimensional Signal Processing,
Multimedia Signal Processing, 7-11 May, 2001, Salt Lake City, Utah (2001)
publication, which includes stamps on the verso page showing that this volume is the
property of the Kurt F. Wendt Library at the University of Wisconsin-Madison. Exhibit
A also includes an excerpt of pages 1337 to 1340 of that volume, showing the article
entitled Spread spectrum signaling for speech watermarking (2001).
`
Attached as Exhibit B is the cataloging system record of the University of
Wisconsin-Madison Libraries for its copy of the Proceedings of the 2001 IEEE
International Conference on Acoustics, Speech, and Signal Processing, Volume III of
VI: Image & Multidimensional Signal Processing, Multimedia Signal Processing, 7-11
May, 2001, Salt Lake City, Utah (2001) publication. As shown in the “Receiving date”
field of this Exhibit, the University of Wisconsin-Madison Libraries owned this volume
and had it cataloged in the system as of August 20, 2001.
`
Members of the interested public could locate the Proceedings of the 2001 IEEE
International Conference on Acoustics, Speech, and Signal Processing, Volume III of
VI: Image & Multidimensional Signal Processing, Multimedia Signal Processing, 7-11
May, 2001, Salt Lake City, Utah (2001) publication after it was cataloged by searching
the public library catalog or requesting a search through WTS. The search could be
done by title and/or subject key words. Members of the interested public could access
the publication by locating it on the library’s shelves or requesting it from WTS.
`
I declare that all statements made herein of my own knowledge are true and that
all statements made on information and belief are believed to be true; and further that
these statements were made with the knowledge that willful false statements and the like
so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18
of the United States Code.

Date: September 8, 2022
`
`Memorial Library
`728 State Street
`Madison, Wisconsin 53706
`
Rachel J. Watters
`Head of Resource Sharing
`
`
`
`
`
`
`
`
`
`EXHIBIT A
`
`
`
`
`
`
`
[Cover image: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, PROCEEDINGS. Sponsored by The Institute of Electrical and Electronics Engineers Signal Processing Society. Library barcode: 89075277504.]
`
`
`
`
`
`
`
`
`
`
`
`2001 IEEE International
`Conference on Acoustics, Speech,
`and Signal Processing
`
`PROCEEDINGS
`
`VOLUME III OF VI
`IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING
`MULTIMEDIA SIGNAL PROCESSING
`
`7-11 May, 2001
`Salt Palace Convention Center
`Salt Lake City, Utah, USA
`
`Sponsored by
`The Institute of Electrical and Electronics Engineers
`Signal Processing Society
`
`
`2001 IEEE International Conference on Acoustics,
`Speech, and Signal Processing
`
Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit
of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-
copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For other copying,
reprint or republication permission, write to IEEE Copyrights Manager, IEEE Operations Center, 445 Hoes Lane, P.O. Box 1331, Piscataway,
NJ 08855-1331. All rights reserved. Copyright © 2001 by the Institute of Electrical and Electronics Engineers, Inc.
`
`Printed in the United States of America by The Printing House
`
General Library System
University of Wisconsin - Madison
728 State Street
Madison, WI 53706-1494
U.S.A.

Kurt F. Wendt Library
University of Wisconsin - Madison
2415 N. Randall Avenue
Madison, WI 53706-1688
`
`IEEE Catalog Number: 01CH37221
ISBN: 0-7803-7041-4
ISBN: 0-7803-7042-2 (Microfiche Edition)
ISSN: 1520-6149

Additional Proceedings (hard-copy and CD-ROM) may be ordered from:
IEEE Operations Center
445 Hoes Lane
PO Box 1331
Piscataway, NJ 08855-1331 U.S.A.
Tel: 800-678-IEEE (U.S.A. and Canada)
732-981-0060
Fax: 732-981-9667
customer-service@ieee.org

ii
`
`
`
`ICASSP 2001 Conference Committee
`
The 2001 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2001), sponsored by the IEEE
Signal Processing Society, is the twenty-sixth in an ongoing series of annual international conferences devoted to the theoretical,
experimental, and development aspects of signal processing, acoustics, and speech. Conferences of this magnitude and scope
are possible only because of the continuing interest and support of Society members, manifested both by their submission of
high-quality papers and their participation in the conference. The ICASSP 2001 Conference Committee is grateful to the
authors, session chairs, volunteers, and all the other people who have contributed to every aspect of this conference.
`
`Conference Committee
`
General Chair
V. John Mathews
University of Utah
mathews@ee.utah.edu

Vice General Chair, Finance
Todd Moon
Utah State University
Todd.Moon@ece.usu.edu

Vice General Chair, Technical Program
A. Lee Swindlehurst
Brigham Young University
swindle@ee.byu.edu

Vice General Chair, Local Arrangements
Brian Jeffs
Brigham Young University
bjeffs@ee.byu.edu
`
Registration
Forrest Staffanson
TRW Systems
staffans@ee.utah.edu

Student Forum
Dae Hee Youn
Yonsei University and
University of Utah
dhyoun@ee.utah.edu

Conference
Management
Pamela Bendio
Utah State University
Extension
pamb@ext.usu.edu

On-Line Paper
Handling
Utah State University
Conventions and
Tradeshows

Tutorials
Louis Scharf
Colorado State University
scharf@engr.colostate.edu

Exhibits
Scott Budge
Utah State University
scott.budge@ece.usu.edu

Scott Douglas
Southern Methodist University
douglas@seas.smu.edu

Special Sessions
Rob Nowak
Rice University
nowak@rice.edu

Publications
Behrouz Farhang-Boroujeny
University of Utah
farhang@ee.utah.edu

Job Fair
Jake Gunther
Utah State University
jake@ece.usu.edu

Randall Sylvester
L-3 Communications

Industry DSP Technology
Program
José Fridman
Analog Devices, Inc.
jose.fridman@analog.com

Michael Deisher
Intel Corporation
michael.deisher@intel.com

Publicity
Tamal Bose
Utah State University
TBose@ece.usu.edu

European Liaison
Giovanni Sicuranza
University of Trieste
sicuranza@univ.trieste.it

Far East Liaison
Ken Sugiyama
NEC Corporation
ken-nec@tsl.cl.nec.co.jp

Webmaster, Communication
and Systems Administrators
Robert Holloway
Utah State University
Conventions and Tradeshows,
Logan, Utah, USA
icassptechsupport@ext.usu.edu

iii
`
`
`
SPREAD SPECTRUM SIGNALING FOR SPEECH WATERMARKING

Qiang Cheng
University of Illinois
Urbana Champaign, IL

Jeffrey Sorensen
IBM T. J. Watson Research Center
Yorktown Heights, NY
sorenj@us.ibm.com

ABSTRACT

The technique of embedding a digital signal into an audio recording
or image using techniques that render the signal imperceptible
has received significant attention. Embedding an imperceptible,
cryptographically secure signal, or watermark, is seen as a potential
mechanism that may be used to prove ownership or detect tampering.
While there has been a considerable amount of attention
devoted to the techniques of spread-spectrum signaling for use in
image and audio watermarking applications, there has only been a
limited study for embedding data signals in speech. Speech is an
uncharacteristically narrow band signal given the perceptual
capabilities of the human hearing system. However, using speech
analysis techniques, one may design an effective data signal that can be
used to hide an arbitrary message in a speech signal. Also included
are experiments demonstrating the subliminal channel capacity of
the speech data embedding technique developed here.

1. INTRODUCTION

Watermarking is a technique for embedding a cryptographic signature
into digital content for the purposes of detecting copying or
alteration of the content. This is accomplished using coding
techniques that hide data within the image or audio content in a manner
not normally detectable. This paper focuses on an, as yet, largely
unexplored area of audio watermarks: speech.

For audio watermarking, Preuss, et al. [8] invent a digital
information hiding technique for audio using the techniques of
spread spectrum modulation. Boney, et al. [2] explicitly make
use of MPEG-1 Psychoacoustic Model to obtain the frequency
masking values to achieve good imperceptibility. Recently Ruiz
and Deller [9] propose a speech watermarking method for the
application to the digital speech libraries. These methods have been
extensively applied for music applications, but embed information
over a very wide audio band based on human hearing capabilities.
A potential attacker need only low-pass filter the resulting signal
to remove most of the watermarking information.

Speech differs from music in its acoustic characteristics and
watermarking requirements. Speech is an acoustically rich signal
that uses only a small portion of the human perceptual range.
Typical speech reproduction hardware, although often the same as
used with music, includes much lower bit rate channels such as
telephone or compressed voice “vocoders.” However, the same
analysis techniques employed in such voice coding schemes can
easily be adapted to create an audio watermarking signal that is
robust to speech channels. Presented here is a technique for
encoding an additional, arbitrary digital message into speech signals. By
making use of the well understood techniques of speech analysis,
significantly higher bit rates can be embedded without affecting
the perceived quality of the recording.

The digital hiding technique for speech can be applied to
copyright protection for digital speech libraries, audio books, as well as
covert communication channels. The embedded information may
be any digital message. Messages that can be used to prove
authorship require the generation of an appropriate cryptographically
secure digital message and are beyond the scope of this paper.
However, consult [4] for information on the application of watermarks.

2. VOICEBAND SPREAD SPECTRUM SIGNAL

In contrast to previous work on audio watermarking, the speech
signal is a considerably narrower bandwidth signal. The long-
time-averaged power spectral density of speech indicates that the
signal is confined to a range of approximately 10 Hz to 8 kHz
[6]. In order that the watermark survives typical transformation
of speech signals, including speech codecs, it is important that the
watermark be limited to the perceptually relevant portions of the
spectra. However, the watermark should remain imperceptible.
Therefore, a spread-spectrum signal with an uncharacteristically
narrow bandwidth will be used.

Using a direct sequence spread spectrum [3] signal, we wish
to design a PN sequence with a main side lobe that fits within a
typical telephone channel [5], which ranges from 250 Hz to 3800
Hz. In this work, the message sequence and the PN sequence
are modulated using simple Binary Phase Shift Keying (BPSK).
The center frequency of the carrier is chosen to be $f_c = 2025$ Hz.
The clock rate of the PN sequence, or chip rate, is taken to be
1775 Hz, which is half of the signal bandwidth. Because the width
of our watermark is very close to the modulation frequency, it is
necessary to low pass filter the spread spectrum signal before
modulation to prevent excessive aliasing. For this, we have chosen to
use a seventh order Butterworth filter with a cutoff of 3400 Hz.

Figure 1 illustrates the power spectral density of the watermark
signal, with the long-term average speech power spectrum
(for both a male and female speaker) for illustration. The simplest
implementation of a speech watermark system would involve
adding this signal, which sounds primarily like radio static, to the
speech signal at the appropriate gain. However, taking advantage
of our knowledge of the speech signal itself, we are able to embed
a significantly higher gain signal using techniques that are the
subject of the next two sections.
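
As a rough illustration of the direct-sequence BPSK signal described in this section, the sketch below generates a PN chip stream and modulates it onto a carrier. Only the 1775 Hz chip rate and the 2025 Hz carrier come from the text. The sampling rate `fs`, the frame length, the shift-register recurrence, and the function names are assumptions made for illustration, and the seventh order Butterworth low-pass step is omitted.

```python
import math

def pn_sequence(length, seed=0b10101):
    """Return +/-1 chips from a 5-bit shift register (hypothetical choice).

    The register implements the recurrence a[n+5] = a[n] XOR a[n+2].
    """
    state = seed & 0b11111
    chips = []
    for _ in range(length):
        chips.append(1 if state & 1 else -1)
        feedback = (state ^ (state >> 2)) & 1
        state = (state >> 1) | (feedback << 4)
    return chips

def dsss_bpsk(bits, fs=8000.0, chip_rate=1775.0, fc=2025.0, frame_len=80):
    """Spread each message bit over one frame and BPSK-modulate it.

    fs and frame_len are assumed values; one message bit occupies one frame.
    """
    n_total = len(bits) * frame_len
    n_chips = int(n_total * chip_rate / fs) + 1
    pn = pn_sequence(n_chips)
    out = []
    for n in range(n_total):
        bit = 1 if bits[n // frame_len] else -1
        chip = pn[int(n * chip_rate / fs)]  # chip active at sample n
        out.append(bit * chip * math.cos(2.0 * math.pi * fc * n / fs))
    return out

signal = dsss_bpsk([1, 0, 1])
```

Flipping a message bit only negates the samples of its own frame, which is what lets a correlation receiver recover the bit later.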
`
0-7803-7041-4/01/$10.00 ©2001 IEEE
`
`1337
`
`
`
`
[Plot residue: one panel with axes "Power Spectrum Magnitude (dB)" versus "Frequency (Hz)"; one panel with axes "Power Spectral Density" versus "Frequency"]
`
`Figure 1: Power spectral densities of the watermark, male voice,
`and female voice.
`
3. LPC ANALYSIS AND FILTERING

Our goal is to add as much watermark signal energy as possible to
the speech signal, while still satisfying the constraint that the added
signal not be perceivable when listened to. Most watermarking
approaches rely on a perceptual model of human hearing. Speech is
an inherently complex stimulus with rapidly changing spectral
characteristics. Conventional masking effects are most often studied
for spectral bands outside the range of speech, above 4 kHz.
However, an effective production model for speech is available. The
well known technique of linear prediction has proven to be highly
effective in modeling speech signals. In addition, human speech
perception reflects the production system characteristics. Our
findings indicate that using the production model can provide excellent
hiding characteristics.
In our watermark signal embedding algorithm, the watermark
signal is filtered to match the overall spectral shape of the speech
signal. In addition, the linear predictive analysis provides an
effective dynamic measure of the degree of noise already present in the
speech signal. Portions of speech that have a highly white spectrum,
fricative sounds and the rapidly changing plosive sounds,
are especially good candidates for embedding additional watermark
energy.
Linear predictive analysis of speech involves computing the
maximum likelihood coefficients of an all-pole filter of the form

$$\frac{1}{A(z)} = \frac{1}{a_0 + a_1 z^{-1} + \cdots + a_p z^{-p}}$$

There is a considerable literature on the application of linear
prediction to speech signals. For our analysis, we have chosen to use
the Levinson-Durbin recursive technique for evaluating LPC
coefficients $a_i$ from the short-term autocorrelation coefficients.
The short-term autocorrelation can be computed from the
windowed speech frame s(t) as

$$r_i = \sum_{n=1}^{N-1} s(n)\, s(n - i)$$
`
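The coefficients $a_i$ then satisfy the normal equations

$$\begin{bmatrix} r_0 & r_1 & \cdots & r_{p-1} \\ r_1 & r_0 & \cdots & r_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p-1} & r_{p-2} & \cdots & r_0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \end{bmatrix}$$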
`
Figure 2: Power spectrum of a segment of speech and spectrum of
LPC-shaped watermark signal.

which, in vector notation, can be represented by

$$R a = r$$

The prediction residual energy, or the average squared error, can
be computed as

$$E = a^{T} R a$$

which is a measure of the “predictability” of the speech signal, and an
effective measure of the noise content.
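
The Levinson-Durbin recursion named above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the sign convention (A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, so the predictor coefficients are the negatives of a[1..p]) is an assumption that may differ from the paper's.

```python
def autocorrelation(frame, order):
    """Short-term autocorrelation r[0..order] of one windowed frame."""
    n = len(frame)
    return [sum(frame[k] * frame[k - i] for k in range(i, n))
            for i in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations recursively.

    Returns (a, err), where a[0] = 1 and err is the prediction
    residual energy left after the final recursion step.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: nothing left to predict
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for step i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= (1.0 - k * k)
    return a, err

# Autocorrelation of a first-order process with correlation 0.5 per lag.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

A small residual relative to `r[0]` indicates a highly predictable frame, while a residual close to `r[0]` indicates a noise-like frame of the kind the text singles out for extra watermark energy.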
Before filtering the watermark signal using the all-pole filter,
a bandwidth expansion operation is performed. This moves all of
the poles closer to the center of the unit circle, increasing the
bandwidth of their respective resonances. The vocal tract filter often
tends to have quite narrow spectral peaks. Due to masking
phenomena, sounds near these peaks are unlikely to be perceived by
the listener. Therefore, by increasing the bandwidth of formant
responses, larger overall watermark signal gains should be tolerable.
The bandwidth parameter $\gamma$ is used to adjust the LPC coefficients

$$a_i' = \gamma^{i} a_i$$

where $\gamma$ may be chosen between 0 and 1.
Figure 2 shows the power spectrum of a segment of speech,
and the spectrum of the watermark signal that results after filtering
using the spectral envelope of the speech segment.
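
The bandwidth expansion step can be illustrated with a few lines of code, assuming the usual form in which coefficient i is scaled by the bandwidth parameter raised to the power i. The function name and the example values are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Scale LPC coefficient a[i] by gamma**i.

    For 0 < gamma < 1 this pulls every pole of 1/A(z) toward the
    origin by the factor gamma, which widens the resonances.
    """
    return [coef * gamma ** i for i, coef in enumerate(a)]

# A(z) = 1 - 0.9 z^-1 has a single pole at z = 0.9.
expanded = bandwidth_expand([1.0, -0.9], 0.8)
pole_after = -expanded[1]  # the pole moves to 0.9 * 0.8
```

Values of the parameter close to 1 leave the spectral envelope nearly unchanged, and smaller values flatten it.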
`
`4. WATERMARK SIGNAL GAIN
`
The instantaneous watermark gain is dynamically determined to
match the characteristics of the speech signal. In the simplest case,
when little speech energy is present (i.e. during silence) the
watermark is added using a fixed gain threshold. This is selected so
that the watermark becomes the effective noise floor of the recording.
Perceptually, a small amount of noise is always expected in a
recording and the watermark signal is not atypical of such recording
noise. In many applications, silence may not be transmitted or
might be coded using extreme compression. In these circumstances,
designers should choose an error correcting code (such as
a convolutional code) with the proper characteristics so that the
message may be recovered despite these losses.
The normalized per sample speech energy $E_s$ for one frame is

$$E_s = \frac{1}{N} \sum_{n=1}^{N} s^2(n) = \frac{1}{N} r_0.$$
`
`1338
`
`
`
[Figure 4 plot residue: axes "Bit Error Probability (%)" versus "Frame Rate (ms)", and "Bit Error Probability" versus "Message Bit Rate (bits/sec)"]
`
`Figure 4: Bit Error Probability versus Frame Rate, and Bit Error
`Probability versus Message Bit Rate.
[Figure 5 plot residue: axis "Watermarking Channel Capacity (bits/sec)"; legend: Female Speaker, Male Speaker]
`
`Figure 5: Watermarking Channel Capacity versus Message Bit
`Rate
`
desired robustness property. The decoding rule is a maximum likelihood
decision rule, which is also a minimum probability-of-error
rule since 0 and 1 in the message are sent with equal probabilities.
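
The correlation receiver of Equation (4) can be sketched as below. The sketch assumes that the received signal has already been inverse LPC filtered and that the despreading function is perfectly synchronized. Mapping a positive correlation to bit 1 is an assumption, and the function names are hypothetical.

```python
def correlate(r, d):
    """Decision statistic: sum over one frame of d(t) * r(t)."""
    return sum(x * y for x, y in zip(r, d))

def decode(r, d, frame_len):
    """Decide one message bit per frame by the sign of the correlation."""
    bits = []
    for start in range(0, len(r) - frame_len + 1, frame_len):
        stat = correlate(r[start:start + frame_len],
                         d[start:start + frame_len])
        bits.append(1 if stat > 0 else 0)
    return bits

# Toy example: two frames carrying bits 1 and 0 at low amplitude,
# plus a constant interference term that the zero-mean despreading
# sequence averages out.
d_frame = [1, -1, 1, 1, -1, -1, 1, -1]
d = d_frame * 2
signs = [1] * 8 + [-1] * 8
received = [0.1 * s * c + 0.05 for s, c in zip(signs, d)]
decoded = decode(received, d, frame_len=8)
```

Longer frames average over more chips, so the statistic becomes more reliable at the cost of a lower message bit rate.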
The problem of synchronization when the original message is
not available is beyond the scope of this paper. However, the PN
sequence used in the spread spectrum modulation can be used to
drive a phase locked loop during decoding. The techniques
presented in [3] [8] can be used in our framework for synchronization
purposes.
`
`6. EMBEDDED CHANNEL CAPACITY
`
A set of simulation experiments was performed to demonstrate
the relationship between the frame size and message rate (1 bit per
frame) and the bit error probability, as shown in Figure 4.
The spread spectrum signal, when added to the original speech,
can be considered as a noisy communication channel, called the
watermarking channel. The watermark is the content of the
transmitted message. Without loss of generality, the message is
considered to be a binary signal with equal probability for 0 and 1.
The watermark channel is binary symmetric. The channel capacity,
which is the theoretical maximum rate for data transmission, is
defined for the watermarking channel [1]:

$$C = R\left(1 + p \log_2 p + (1 - p) \log_2 (1 - p)\right), \qquad (5)$$

where p is the crossover probability and R is the message bit rate.
`The simulation results for the watermarking channel capacity are
`plotted in Figure 5. For a binary symmetric channel, the chan-
`
`1339
`
[Figure 3 plot residue: panels plotted against "Time (s)"; legend: "Constant Term", "Constant + Predictor Error", "Constant, Predictor Error and Energy Terms"; axis "Watermark Additive Energy"]
`
`Figure 3: A segment of speech and the corresponding watermark
`gains.
`
The watermark gain in each frame can be determined by the
linear combination of the gains for silence, normalized per sample
residual energy $E_r$ and normalized per sample speech energy $E_s$,

$$g(t) = \lambda_0 + \lambda_1 E_r + \lambda_2 E_s, \qquad (2)$$

which is designed to maximize the strength of the watermark signals
without incurring perceptual degradations. Figure 3 shows a
segment of speech and the embedded watermark signal. The
resulting watermarked speech is also shown in Figure 3. A listening
test demonstrates that the watermarked speech is indistinguishable
from the original speech with this watermark gain. If the gain is
increased further, there will be “hoarseness” in the watermarked
speech. Though it hardly affects the naturalness of the voice, the
difference with the original speech is indeed perceptible.
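
The per-frame gain of Equation (2) is a linear combination of a constant term, the normalized residual energy, and the normalized speech energy. The sketch below shows that computation. The three weights are placeholder values chosen only for illustration, since the source does not state the values used, and the function name is hypothetical.

```python
def frame_gain(frame, residual_energy, lam0=0.01, lam1=0.5, lam2=0.1):
    """Watermark gain for one frame.

    frame           -- speech samples of the frame
    residual_energy -- LPC prediction residual energy of the frame
    lam0, lam1, lam2 -- assumed weights (not taken from the paper)
    """
    n = len(frame)
    e_s = sum(x * x for x in frame) / n  # normalized speech energy
    e_r = residual_energy / n            # normalized residual energy
    return lam0 + lam1 * e_r + lam2 * e_s

silence_gain = frame_gain([0.0] * 10, residual_energy=0.0)
voiced_gain = frame_gain([1.0] * 4, residual_energy=2.0)
```

During silence both energy terms vanish and the gain falls back to the constant term, which matches the fixed gain threshold described in Section 4.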
`
`5. WATERMARK DETECTION
`
At the receiving end, the received signal $r_0(t)$ is given by

$$r_0(t) = \sum_{t=1}^{N} w(t) + s(t) + I_0(t), \qquad (3)$$

where w(t) is the LPC-shaped watermark signal, s(t) is the original
speech signal, and $I_0(t)$ is some deliberate attack or digital
signal processing. We estimate the LPC coefficients from the
received signal, and then take the inverse LPC filtering of $r_0(t)$ to
get r(t). After inverse LPC filtering, voiced speech becomes periodic
pulses, and unvoiced speech becomes whitened noise. As is
typical for speech processing, we model the inverse filtered s(t) as
White Gaussian Noise (WGN). Inverse LPC filtering decorrelates
the speech samples s(t) as well as equalizes the watermark signal
w(t). A correlation receiver,

$$\sum_{t=1}^{N} d(t)\, r(t) \underset{H_0}{\overset{H_1}{\gtrless}} 0, \qquad (4)$$

gives us optimum detection performance in AWGN [7], where N
is the length of a frame, in which one message bit is embedded,
d(t) is the despreading function, which is the synchronized, BPSK
modulated spreading function for the current frame. The correlation
with d(t) can average out the interference, thus providing the
`
`
`
`tioning system can bebuilt using the data embedding algorithm
presented here, where the text transcription of the speech would
be hidden in the speech itself. In addition, in-band signaling
applications, typically done using dual tone “touch-tone” signals, can
be replaced with embedded control signals, suggesting novel
simultaneous voice and data applications. For the purpose of
side-information embedding, there is little threat from intentional
attacks. Thus, a larger capacity of information can be communicated
with less dependency on the redundancy of error correcting codes.
`
`9. REFERENCES
`
[1] R. E. Blahut. Principles and Practice of Information Theory.
Addison-Wesley Publishing Company, 1987.

[2] L. Boney, A. H. Tewfik, and K. N. Hamdy. Digital watermarks
for audio signals. In Proc. of Multimedia 1996, Hiroshima,
1996.

[3] G. R. Cooper and C. D. McGillem. Modern Communications
and Spread Spectrum. McGraw-Hill Book Company, New
York, 1986.

[4] F. Hartung and M. Kutter. Multimedia watermarking
techniques. In Proceedings of the IEEE, vol. 87, July, 1999.

[5] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz.
NTIMIT: A phonetically balanced, continuous speech,
telephone bandwidth speech database. In ICASSP, pages 109-112,
Albuquerque, NM, 1990.

[6] N. S. Jayant and P. Noll. Digital Coding of Waveforms.
Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1984.

[7] H. V. Poor. An Introduction to Signal Detection and
Estimation. Springer-Verlag, New York, 1994.

[8] R. Preuss, S. Roukos, A. Huggins, H. Gish, M. Bergamo, and
P. Peterson. Embedding Signalling. US Patent 5319735, 1994.

[9] R. J. Ruiz and J. R. Deller. Digital watermarking of speech
signals for the national gallery of the spoken word. In ICASSP,
Turkey, 2000.
`
`
Speech Compression    Bit Rate    Watermark Bit    Speech
Scheme                            Reliability      Bandwidth
16 bit linear PCM     706 kbps    74.05%           22 kHz
16 bit linear PCM     128 kbps    71.58%           4 kHz
IMA ADPCM             32 kbps     68.65%           4 kHz
GSM 6.10              13 kbps     61.23%           4 kHz

Table 1: Watermarking Attacks by Voice Compression
`
`nel capacity is achievable [1]. That is, transmission codes can be
designed for reliable communication under or at this rate.
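
The binary symmetric channel capacity in Equation (5) is easy to evaluate numerically. The sketch below is a direct transcription of that formula, with the convention that 0 log 0 is taken as 0. The function name and the example crossover probabilities are hypothetical and are not results from the paper.

```python
import math

def bsc_capacity(p, rate):
    """Capacity in bits/sec of a binary symmetric channel.

    p    -- crossover (bit error) probability, 0 <= p <= 1
    rate -- message bit rate R in bits/sec
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")

    def plogp(x):
        # x * log2(x), with the limit value 0 at x = 0
        return x * math.log2(x) if x > 0.0 else 0.0

    return rate * (1.0 + plogp(p) + plogp(1.0 - p))

useless = bsc_capacity(0.5, 800)  # coin-flip channel carries nothing
perfect = bsc_capacity(0.0, 800)  # error-free channel carries R
```

Because the capacity depends on both the bit rate and the error probability, raising the message rate only helps while the accompanying rise in bit errors stays small.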
The plot shows that the frame size needs to be small when
high channel capacity is desired. However, the LPC prediction
suffers when the frame size is too small, which makes LPC shaping
less effective. In addition, the degradation of the watermarking
channel due to attacks is more severe for smaller frames (see
Section 7). Therefore, there is an intrinsic tradeoff between channel
capacity and survivability of the watermark. To achieve high channel
capacity, good LPC predictability, and reasonable survivability
simultaneously, we have chosen 800 bits per second as our message
embedding rate.
`
`7. ROBUSTNESS
`
Watermarked media is subject to a variety of attacks. Images
may be cropped, rotated, filtered, or otherwise changed.
Audio signals are less subject to these types of manipulations, as
the human perceptual system is quite sensitive to changes in audio
signals. However, speech signals may be affected by transformations
that include: analog to digital and digital to analog conversions,
filtering, re-equalization, changes in playback rate, and
compression. The algorithm presented here puts all of the watermark
signal in the most perceptually important areas of the speech
signal. Therefore, primitive attempts to remove the watermark by
filtering are almost certain to prove ineffective.
In order to demonstrate the robustness of the data embedding
scheme, we have used an analog reproduction system to simulate a
crude attempt at duplication. A recording is made at 8 kHz,
significantly reducing the bandwidth, and then the signal is re-sampled
at the original rate. This could be considered similar to recording
across a telephone channel, although no explicit telephone network
equalization was applied. Finally, these 8 kHz recordings
were compressed and decompressed using the typical speech
compression algorithms IMA ADPCM and GSM 6.10. The results are
summarized in Table 1.
`
`8. APPLICATIONS AND FUTURE WORK
`
This paper presents a technique for embedding an arbitrary message
in a speech signal. In order to provide a complete watermarking
application, one must choose a message that provides the
appropriate cryptographic properties, such as proof of authenticity
or ownership. In this respect, the embedding algorithm presented
here can be used with nearly any comparable application. For
example, it can be applied to the copyright of the language-learning
CD's, audio books, recorded teleconferencing data, digital speech
libraries [9] and Internet radio broadcasts, etc.
`In addition, a speech data embedding algorithm suggests some
`new and possibly unique applications. For example, a closed cap-
`
`1340
`
`
`
`
`
`
`
`
`
`EXHIBIT B
`
`
`
`9/8/22, 1:52 PM
`
`Physical Item Editor
`
`Manage Patron Services
`
`| Scan In Items
`
`| Monitor Requests & Item Processes
`
`| Receiving Items
`
`| Manage In Process Items
`
`| Printouts Queue
`
`| Shipping Items
`
`| Deliver Digital Documen
`
Resource description: Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing / sponsored by the Institute of Electrical and Electronics Engineers Signal Processing Society. ICASSP (Conference) IEEE Service Center, Piscataway, NJ : 1998- [1520-6149]
Holdings: Verona Shelving Facility: Shelving Verona ; TA365 I102 1
Holdings ID: 22809736020002122
Barcode: 89075277509
Item ID: 23614787100002122
Process type: Loan
MMS ID: 9943740853602122
Status: Item not in place
`
`General
`
`ENUM/CHRON
`
`Notes
`
`History
`
`General Information
`
`Barcode
`
`89075277509
`
`Copy ID
`
`1
`
`Material type
`
`Book
`
`Item policy
`
`general (book)
`
`Provenance
`
`-
`
`Is magnetic
`
`No
`
`PO Line
`
`Issue date
`
`
`
`-
`
`Receiving date
`
`08/20/2001
`
`Expected receiving date
`
`-
`
`Enumeration A
`
`v.2001:3
`
`Enumeration B
`
`Chronology I
`
`Chronology J
`
`-
`
`-
`
`-
`
`Description
`
`v.2001:3
`
`Pages
`
`Pieces
`
`Replacement cost
`
`-
`
`1
`
`-
`
`Receiving operator
`
`import
`
`Physical condition
`
`-
`
`Process type
`
`Inventory Information
`
`Inventory number
`
`Inventory date
`
`Inventory price
`
`-
`
`-
`
`-
`
`Location Information
`
`-
`
`-
`
`-
`
`Permanent location
`
`Verona Shelving Facility: Shelving Verona (vsf)
`
`https://wisconsin-madison.alma.exlibrisgroup.com/ng/page;u=%2Frep%2Faction%2FpageAction.resource_editor.physical.item_general.xml.do%3Fpag… 1/2
`
`
`
`9/8/22, 1:52 PM
`
`Physical Item Editor
`
`Item call number type
`
`
`Item call number
`
`Source (Subfield 2)
`
`Storage location ID
`
`-
`
`-
`
`-
`
`-
`
`Temporary Location Information
`
`Item is in temporary location
`
`No
`
`Temporary location
`
`Temporary call number type
`
`Temporary call number
`
`Temporary source (Subfield 2)
`
`Temporary item policy
`
`Due back date
`
`Retention Information
`
`Committed to Retain
`
`Retention Reason
`
`Retention Note
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
` © Ex Libris, Part of Clarivate, 2022 | Terms of Use
`
`https://wisconsin-madison.alma.exlibrisgroup.com/ng/page;u=%2Frep%2Faction%2FpageAction.resource_editor.physical.item_general.xml.do%3Fpag… 2/2
`
`