(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2007/0143105 A1
     Braho et al.                      (43) Pub. Date: Jun. 21, 2007

(54) WIRELESS HEADSET AND METHOD FOR ROBUST VOICE DATA COMMUNICATION

(76) Inventors: Keith Braho, Murrysville, PA (US); Roger Graham Byford, Apollo, PA (US); Thomas S. Kerr, Murrysville, PA (US); Amro El-Jaroudi, Pittsburgh, PA (US)

Correspondence Address:
WOOD, HERRON & EVANS, LLP
2700 CAREW TOWER
441 VINE STREET
CINCINNATI, OH 45202 (US)

(21) Appl. No.: 11/303,271

(22) Filed: Dec. 16, 2005

Publication Classification

(51) Int. Cl. G10L 15/00 (2006.01)
(52) U.S. Cl. .......................... 704/231; 704/246

(57) ABSTRACT

A wireless device for use with speech recognition applications comprises a frame generator for generating successive frames from digitized original audio signals, the frames representing portions of the digitized audio signals. An autocorrelation circuit generates a set of coefficients for each frame, the coefficient set being reflective of spectral characteristics of the audio signal portion represented by the frame. In one embodiment, the autocorrelation coefficients may be used to predict the original audio signal, to be subtracted from the original audio signals and to generate residual signals. A Bluetooth transceiver is configured for transmitting the set of coefficients and/or residual signals as data to another device, which utilizes the coefficients for speech applications.
[Representative figure: flowchart — Generate prediction coefficients; Predict current speech values with prediction coefficients; Subtract predicted speech values from true speech signal values to generate residual signals; Encode residual signals; Transmit encoded residual signals as data]
[Sheet 1 of 3 — FIG. 1: user 10 wearing headset 12, wirelessly coupled to a Bluetooth-enabled device; FIG. 2: headset block diagram with Bluetooth transceiver 46, baseband processor 44, and antenna 48]
[Sheet 2 of 3 — FIG. 3 (upper portion): analog audio input → A/D conversion → autocorrelation → digital speech data 72]

FIG. 4 — Bluetooth link and packet parameters:

BLUETOOTH LINK TYPE | BLUETOOTH PACKET TYPE | PAYLOAD SIZE (BITS) | RETRANSMISSIONS SUPPORTED | FORWARD ERROR CORRECTION | SYMMETRIC MAX RATE (KBPS) | ASYMMETRIC MAX RATE (KBPS)
ACL | DM1 | 0-136  | YES | YES | 108.8 | 108.8 / 108.8
ACL | DM3 | 0-968  | YES | YES | 258.1 | 387.2 / 54.4
ACL | DM5 | 0-1792 | YES | YES | 286.7 | 477.8 / 36.3
ACL | DH1 | 0-216  | YES | NO  | 172.8 | 172.8 / 172.8
ACL | DH3 | 0-1464 | YES | NO  | 390.4 | 585.6 / 86.4
ACL | DH5 | 0-2712 | YES | NO  | 433.9 | 723.2 / 57.6
SCO | HV1 | 80     | NO  | YES | 64.0  | N/A
SCO | HV2 | 160    | NO  | YES | 64.0  | N/A
SCO | HV3 | 240    | NO  | NO  | 64.0  | N/A

[Sheet 3 of 3 — FIG. 5: flowchart — Generate prediction coefficients; Predict current speech values with prediction coefficients; Subtract predicted speech values from true speech signal values to generate residual signals; Encode residual signals; Transmit encoded residual signals as data]
WIRELESS HEADSET AND METHOD FOR ROBUST VOICE DATA COMMUNICATION
FIELD OF THE INVENTION

[0001] This invention relates generally to wireless communication devices, and particularly to a wireless device, such as a headset, utilized for speech recognition applications and other speech applications.
BACKGROUND OF THE INVENTION

[0002] Wireless communication devices are used for a variety of different functions and to provide a communication platform for a user. One particular wireless communication device is a headset. Generally, headsets incorporate speakers that convey audio signals to the wearer, for the wearer to hear, and also incorporate microphones to capture speech from the wearer. Such audio and speech signals are generally converted to electrical signals and processed to be wirelessly transmitted or received.

[0003] Wireless headsets have become somewhat commonplace. Wireless headsets are generally wirelessly coupled with other devices such as cell phones, computers, stereos, and other devices that process audio signals. In use, a wireless headset may be coupled with other equipment utilizing various RF communication protocols, such as the IEEE 802.11 standard for wireless communication. Other wireless communication protocols have been more recently developed, such as the Bluetooth protocol.
[0004] Bluetooth is a low-cost, low-power, short-range radio technology designed specifically as a cable replacement to connect devices, such as headsets, mobile phone handsets, and computers or other terminal equipment, together. One particular use of the Bluetooth protocol is to provide a communication protocol between a mobile phone handset and an earpiece or headpiece. The Bluetooth protocol is a well-known protocol understood by a person of ordinary skill in the art, and thus all of its particulars are not set forth herein.
[0005] While wireless headsets are utilized for wireless telephone communications, their use is also desirable for other voice or audio applications. For example, wireless headsets may play a particular role in speech recognition technology. U.S. patent application Ser. No. 10/671,140, entitled "Wireless Headset for Use in a Speech Recognition Environment," and filed on Sep. 25, 2003, sets forth one possible use for a wireless headset, and that application is incorporated herein by reference in its entirety. Speech recognition applications demand a high-quality speech or audio signal, and thus a significantly robust communication protocol. While Bluetooth provides an effective means for transmission of voice for typical telephony applications, the current Bluetooth standard has limitations that make it significantly less effective for speech recognition applications and systems.
[0006] For example, the standard most frequently used for representing voice or speech data in the telephony industry utilizes 8-bit data digitized at an 8,000 Hz sample rate. This communication standard has generally evolved from the early days of analog telephony, when it was generally accepted that a frequency range of 250 Hz to 4,000 Hz was adequate for voice communication over a telephone. More recent digital voice protocol standards, including the Bluetooth protocol, have built upon this legacy. In order to achieve an upper bandwidth limit of 4,000 Hz, a minimal sample rate of at least twice that, or 8,000 Hz, is required. To minimize link bandwidth, voice samples are encoded as 8 bits per sample and employ a non-linear transfer function to provide increased dynamic range on the order of 64-72 dB. The Bluetooth standard generally supports the most common telephony encoding schemes. At the physical layer, the Bluetooth protocol uses a "synchronous connection oriented" (SCO) link to transfer voice data. An SCO link sends data at fixed, periodic intervals. The data rate of an SCO link is fixed at 64,000 bits per second (64 Kbps). Voice packets transmitted over an SCO link do not employ flow control and are not retransmitted. Therefore, some packets are dropped during normal operation, thus resulting in loss of portions of the audio signals.
[0007] For most human-to-human communication applications, such as telephony applications, the current Bluetooth voice sampling and encoding techniques using SCO links and voice packets are adequate. Generally, humans have the ability to subconsciously use reasoning, context, and other clues to mentally reconstruct the original speech over a more lossy communication medium. Furthermore, where necessary, additional mechanisms, such as the phonetic alphabet, can be employed to ensure the reliability of the information transferred (e.g., "Z" as in Zulu).
[0008] However, for human-to-machine communication, such as speech recognition systems, significantly better speech sampling and encoding performance is necessary. First, a more reliable data link is necessary, because dropped voice packets in the typical telephony Bluetooth protocol can significantly reduce the performance of a speech recognition system. For example, each dropped Bluetooth SCO packet can result in a loss of 3.75 milliseconds of speech. This can drastically increase the probability of a speech recognition error.
[0009] Additionally, the information-bearing frequency range of speech is now understood to be in the range of 250 Hz to 6,000 Hz, with additional, less critical content available up to 10,000 Hz. The intelligibility of consonants has been shown to diminish when the higher frequencies are filtered out of the speech signal. Therefore, it is important to preserve this high end of the spectrum.
[0010] However, increasing the sample rate of the audio signal to 12,000 Hz, while still maintaining 8-bit encoding, exceeds the capability of the Bluetooth SCO link, because such an encoding scheme would require a data rate of 96 Kbps, which is above the 64 Kbps Bluetooth SCO rate.
[0011] Speech samples digitized as 8-bit data also contain a high degree of quantization error, which has the effect of reducing the signal-to-noise ratio (SNR) of the data fed to the recognition system. Speech signals also exhibit a variable dynamic range across different phonemes and different frequencies. In the frequency ranges where dynamic range is decreased, the effect of quantization error is proportionally increased. A speech system with 8-bit resolution can have up to 20 dB additional quantization error in certain frequency ranges for the "unvoiced" components of the speech signal. Most speech systems reduce the effect of quantization error by increasing the sample size to a minimum of 12 bits per sample. Thus, the current Bluetooth voice protocol for
telephony is not adequate for speech applications such as speech recognition applications.

[0012] Therefore, there is a need for an improved wireless device for use in speech and voice applications. There is particularly a need for a wireless headset device that is suitable for use in speech recognition applications and systems. Still further, it would be desirable to incorporate a Bluetooth protocol in a wireless headset suitable for use with speech recognition systems.
BRIEF DESCRIPTION OF THE DRAWINGS

[0013] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with a general description of the invention given above and the detailed description of the embodiments given below, serve to explain the principles of the invention.
[0014] FIG. 1 illustrates a schematic view of a communication system in which the present invention may be incorporated.

[0015] FIG. 2 illustrates a block diagram view of components of a wireless communication device in accordance with the principles of the invention.

[0016] FIG. 3 is a flowchart illustrating one operational embodiment of the present invention.

[0017] FIG. 4 is a table of Bluetooth protocol parameters utilized in accordance with one aspect of the present invention.

[0018] FIG. 5 is a flowchart illustrating another operational embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

[0019] The present invention addresses the above-referenced issues and noted drawbacks in the prior art by providing a wireless device that is useful for speech applications and, particularly, useful for speech recognition applications that require higher-quality speech signals for proper performance. To that end, the present invention, rather than relying upon voice sampling and coding techniques for human-to-human communication, such as over the telephone, utilizes correlation processing and represents spectral characteristics of an audio or speech signal in the form of data.
[0020] Particularly, an autocorrelation component of the invention generates a set of coefficients for successive portions or frames of a digitized audio signal. The coefficients are reflective of spectral characteristics of the audio signal portions, which are represented by multiple successive frames. The sets of coefficients reflective of audio signal frames are transmitted as data packets in a wireless format. Although various wireless transmission protocols might be used, one particular embodiment utilizes a Bluetooth transceiver and a Bluetooth protocol. However, rather than utilizing standard Bluetooth voice processing and voice packets, the present invention transmits the sets of coefficients as data, utilizing data packets in the Bluetooth protocol. Other wireless transmission schemes may utilize their data transmission parameters as well, in accordance with the principles of the present invention, as opposed to voice parameters, which are generally utilized to transmit voice for human-to-human communication. In one particular aspect, the Bluetooth transceiver utilizes an asynchronous connection-less (ACL) link for transmitting the coefficients as data.
[0021] Therefore, the present invention overcomes the inherent limitations of Bluetooth and other wireless communication methods for use with speech recognition by providing the desired link reliability between the devices, providing high-dynamic-range and lower-quantization-error coding, and requiring less link bandwidth than current methods, while avoiding additional computational complexity on the speech recognition system.
[0022] FIG. 1 illustrates a schematic view of a system incorporating the invention and aspects thereof. The aspects and features of the present invention may be incorporated into various different wireless devices which contain the signal processing capability and wireless transceiving capability for implementing the elements of the invention. For example, the invention might be incorporated into a traditional computer, such as a desktop or laptop computer, in a portable terminal or portable computing device, a cellular phone, a headset, or any other device which is operable for processing audio signals and transceiving the data which results from such signal processing according to the invention.
[0023] In one particular embodiment of the invention, it is incorporated into a wireless headset device worn by a user, and the data is transceived as data packets utilizing a Bluetooth protocol. Therefore, in the example discussed herein, a Bluetooth-enabled headset device is described. However, it should be understood that this is only one particular device and one particular data-transceiving protocol that might be utilized. Other such devices and transceiving protocols could also be used in accordance with aspects of the present invention. Therefore, the invention is not limited only to Bluetooth headsets.
[0024] Referring again to FIG. 1, a user 10 is shown utilizing or wearing a headset device 12. The headset 12 has wireless transceiving capabilities, which are implemented by appropriate processing and transceiving circuitry 14. The headset circuitry 14 also handles other signal processing as described below. The headset 12 will generally have one or more speakers 16 for the user to hear audio signals that are received, as well as to hear those audio signals that are spoken and transmitted by the user. To capture audio signals, such as speech, the headset 12 incorporates a microphone 18. The processed signals are referred to herein generally as "audio signals" and will include voice and speech signals, as well as other audio signals. Generally, it is desirable for the user to communicate commands, responses, general speech, etc. to one or more devices, which are wirelessly coupled to the headset. For example, the headset 12 might communicate with a portable terminal 20, which may be worn or carried by the user, another person, or a piece of equipment. Such a wireless link is indicated by reference numeral 21. Similarly, the wireless headset 12 might communicate with another enabled device, such as a cellular phone or other device 22, which is coupled by wireless link 23. In FIG. 1, device 22 is indicated as being a Bluetooth-enabled device, although other transceiving protocols might be used. In another embodiment, headset 12 may be coupled directly to a server or other computer device

24, generally through a wireless access point 26, which has an appropriate antenna 27. The wireless link to the server 24 is indicated by reference 25. Generally, the wireless coupling of a device such as headset 12 to various other devices (not shown) in accordance with the principles of the present invention is also possible, as long as the devices have the necessary processing and transceiving circuitry for implementing the invention.
[0025] One particular speech application for the wireless headset device 12 or other inventive wireless device is a speech recognition application, wherein the speech generated by user 10 is analyzed and processed for performing multiple tasks. For example, a user might be directed to perform a task through headset 12. Upon or during completion of the task, the user might speak to the system, through microphone 18, to confirm the instructions and task, ask for additional information, or report certain conditions, for example. The speech of the user and the words spoken must then be analyzed or "recognized" to extract the information therefrom. U.S. patent application Ser. No. 10/185,995, entitled "Terminal and Method for Efficient Use and Identification of Peripherals" and filed on Jun. 27, 2002, discusses use of a headset and speech recognition in an inventory management system, for example; that application is incorporated herein by reference in its entirety. Various different speech recognition technologies may be used to process the unique data generated by the wireless headset or other device of the invention, and persons of ordinary skill in the art know such technologies. Therefore, the particulars of a specific speech recognition system are not set forth herein.
[0026] FIG. 2 illustrates, in block diagram form, various components of a wireless headset 12 that implement one embodiment of the invention. The components illustrated, while separated into functional blocks in FIG. 2, might be combined together into a single integrated circuit, or may be implemented utilizing individual circuit components. As noted above, headset 12 incorporates a microphone 18 for capturing an audio signal, such as a speech signal from user 10. Microphone 18 is coupled to an audio coder/decoder, or audio codec, 40. The audio codec 40 performs analog-to-digital (A/D) conversion on the analog audio signal captured by microphone 18. The audio codec 40 also preferably performs anti-aliasing on the resulting digitized data as well. In effect, audio codec 40 provides a digitized audio signal reflective of the analog audio signal captured by microphone 18. Audio codec 40 supports the necessary data sample rates and bit resolutions noted below for implementing various embodiments of the present invention. Particularly, audio codec 40 provides sampling rates for high-quality audio signals that capture most of the speech frequencies that would be of interest to speech applications, such as speech recognition applications.
[0027] The digital audio data, or digitized audio signal, is supplied to a digital processor 42. The digital processor includes a microprocessor or other digital signal processor, volatile and non-volatile memory, and associated logic necessary to provide the desired processing of the signal for implementing the invention. For example, as discussed further below, the digital processor 42 may provide pre-emphasis processing, frame generation, windowing, and autocorrelation processing of the digital data stream. The product of the digital processor 42 is processed, digitized audio or speech data, which is then supplied to a baseband processor 44, such as a Bluetooth baseband processor, for example.
[0028] The baseband processor 44 then formats the processed digital speech data according to transceiving protocol standards and, in the exemplary embodiment, according to Bluetooth protocol standards. However, the digital speech data provided by baseband processor 44 is not transmitted as voice packets under the Bluetooth protocol, as it would be under typical Bluetooth telephony applications. Rather, in accordance with one aspect of the invention, the digitized speech is transmitted as data using data packets under the Bluetooth protocol. The baseband processor may perform such operations as adding packet header information, forward error correction, cyclic redundancy check, and data encryption. It also implements and manages the Bluetooth stack. As noted above, the Bluetooth transmission protocol is a standard transmission protocol, and thus will be readily understood by a person of ordinary skill in the art. As such, all of the various specifics associated with Bluetooth transmission are not discussed herein.
[0029] A wireless transceiver, such as a Bluetooth transceiver 46, coupled to an antenna 48, performs all operations necessary to transmit and receive the voice data over a wireless link, such as a Bluetooth link. Wireless transceiver 46 might be operable under another wireless communication protocol, even though the exemplary embodiment discussed herein utilizes Bluetooth. The operations of Bluetooth transceiver 46 may include, but are not limited to, such typical transceiver operations as conversion to RF frequencies, modulation and demodulation, spreading, and amplification. Antenna 48 provides efficient transmission and reception of signals in a wireless format.
[0030] While one aspect of the invention is directed to transmitting a representation of captured speech signals from a device for use in speech recognition applications, wireless headset 12 also implements a receive data link. All the various functional blocks shown in FIG. 2 support bidirectional data transfer. The audio codec 40 is capable of performing digital-to-analog (D/A) conversion and sending the analog signal to one or more speakers 16. The audio codec 40 preferably separates the A/D and D/A converters with independent channels so that full-duplex operation is possible. The receive data link can be implemented utilizing either an asynchronous connection-less (ACL) link, as discussed further below for one embodiment of the invention, or an SCO link. If telephony-quality data is acceptable on the receive link, then an SCO link can be employed, and standard Bluetooth audio processing can be performed by either the baseband processor 44 or the digital processor 42, or by some combination of both. The processed audio data will then be sent to the audio codec 40 for playback by speakers 16. Generally, an SCO link using various packets might be acceptable on the receive side, unlike the transmit side of the invention, because the receive data link may contain audio that will be listened to and interpreted by a human (i.e., the user) rather than a machine. As with typical Bluetooth voice applications, a lower-quality voice link is possible for telephony applications.
[0031] However, if a more reliable link is necessary or desired, then an ACL link might be employed on the receive side as well, according to the invention. In that case, audio processing would be performed by the digital processor 42. A more reliable receive data link may be necessary, for example, for safety-critical applications, such as for use by emergency first responders.
[0032] As noted above, it will be apparent to a person of ordinary skill in the art that the disclosed embodiment is exemplary only, and a wide range of other embodiments may be implemented in accordance with the principles of the present invention. For example, various different commercially available components are available to implement the elements described in FIG. 2, with varying levels of integration. Furthermore, the functional blocks in the Figure may be implemented using individual integrated circuit components, or several functional blocks may be combined together into a single integrated circuit.
[0033] Referring now to FIG. 3, that figure shows a processing flowchart for one embodiment of the present invention. Analog audio signals, such as a user speech or voice signal, are collected in step 60, such as by a microphone 18 in the headset 12 in the example discussed herein. Alternatively, an analog audio signal might be retrieved from a storage medium, such as tape, to be further processed and transmitted according to the invention. The audio input is provided to circuitry for performing A/D conversion 62. For example, the audio signals might be directed to a codec 40 as discussed. In the A/D conversion step 62, the analog audio signal is converted to digital samples, which are suitable for being further processed and for being used in speech applications, such as in a speech recognition system. The A/D conversion step 62 may utilize typical sampling rates for high-quality audio signals, such as sampling rates of 11,025 Hz; 16,000 Hz; 22,050 Hz; 44,100 Hz; and 48,000 Hz. For the purposes of discussion of the exemplary embodiment herein, we will address the sample rates of 11,025 Hz and 16,000 Hz. Such sample rates are suitable for capturing most of the speech frequencies that would be of interest in general speech applications, such as a speech recognition application. Accordingly, the audio codec 40 is configured and operable for achieving such sampling rates. It is also desirable that the resolution of the A/D conversion in step 62 by the codec 40 is at least 12 bits in order to provide an acceptable quantization error. Reasonably priced devices that provide up to 16 bits of resolution are commercially available and, thus, a 16-bit resolution is also discussed in the exemplary embodiment herein. Of course, other, higher resolutions might also be utilized.
[0034] The output of the A/D conversion step 62 may, therefore, provide a continuous bit stream of from 132.3 kilobits/second (Kbps) (i.e., 11,025 Hz x 12 bits resolution) to around 256 Kbps (i.e., 16,000 Hz x 16 bits resolution). While such a bit stream would clearly exceed the capability of a typical Bluetooth SCO link using voice packets to transmit the speech signal, the present invention provides generation of data reflective of the audio signal and utilizes an ACL link with data packets. Additional processing of the bit stream enhances the data for being transmitted and then subsequently used with a speech application, such as a speech recognition system; the additional processing also reduces the bandwidth needed to transfer the data over a Bluetooth link.
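The bit-rate figures above follow directly from multiplying sample rate by bits per sample; a quick sketch of the arithmetic (the function name is illustrative, not from the patent):

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Raw PCM bit rate in Kbps: sample rate (Hz) x resolution (bits)."""
    return sample_rate_hz * bits_per_sample / 1000.0

# The two bounds discussed in paragraph [0034]:
low = pcm_bit_rate_kbps(11_025, 12)   # 132.3 Kbps
high = pcm_bit_rate_kbps(16_000, 16)  # 256.0 Kbps

# Either bound exceeds the fixed 64 Kbps rate of a Bluetooth SCO link,
# which is why the invention turns to an ACL link carrying data packets.
SCO_RATE_KBPS = 64.0
print(low > SCO_RATE_KBPS, high > SCO_RATE_KBPS)
```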
[0035] Specifically, to further process the bit stream, a pre-emphasis step 64 may be utilized. A pre-emphasis step may be performed, for example, by the digital processor 42. In one embodiment, the pre-emphasis is provided in the digital processor by a first-order filter that is used to emphasize the higher frequencies of the speech spectra, which may contain information of greater value to a speech recognition system than the lower frequencies. One suitable filter may have an equation of the form:

y(t) = x(t) - a*x(t-1)   (EQ 1)

[0036] where "a" is a scaling factor that is utilized to control the amount of pre-emphasis applied. The range of the scaling factor is typically between 0.9 and 1.0, depending upon the amount of spectral tilt present in the speech data. Spectral tilt essentially refers to the overall slope of the spectrum of a speech signal, as is known to those of skill in the art.
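A minimal sketch of the EQ 1 filter in plain Python (the function and variable names are illustrative only):

```python
def pre_emphasize(x, a=0.95):
    """First-order pre-emphasis, y(t) = x(t) - a * x(t-1) (EQ 1).

    "a" (typically 0.9 to 1.0) controls how strongly the higher
    frequencies of the speech spectrum are emphasized. The first
    sample has no predecessor and is passed through unchanged.
    """
    y = [float(x[0])]
    for t in range(1, len(x)):
        y.append(x[t] - a * x[t - 1])
    return y
```

A constant (zero-frequency) input is almost cancelled, while rapid sample-to-sample changes pass through largely intact, which is the high-frequency emphasis the text describes.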
[0037] To further process the digitized audio signal in the form of the bit stream, the data stream is then processed through a frame generation step or steps 66. The frame generation might also be performed by digital signal processing circuitry, such as the digital processor 42 of FIG. 2. In the frame generation step, the data stream is subdivided into multiple successive frames to be further processed. In one embodiment of the invention, the frames are overlapping frames. Data overlap on each end of the frame is needed to eliminate artifacts that would be introduced by the signal-processing algorithms further down the processing chain. For speech recognition systems, frame buffer sizes may typically range from 10 msec (i.e., 100 frames per second) to 100 msec (i.e., 10 frames per second) of continuous audio samples. Frames may have an overlap of around 0 percent to 50 percent of the previous frame. The frames essentially represent portions of the digitized audio signal, and the successive frames thus make up the whole captured audio signal from step 60. Follow-on processing is then performed on each frame sequentially.
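The framing step can be sketched as follows (a simplified in-memory version with illustrative names; real firmware would stream samples through a buffer):

```python
def make_frames(samples, frame_size, overlap=0.5):
    """Subdivide a sample stream into successive, possibly overlapping frames.

    "overlap" is the fraction (0.0 to 0.5, per the text) of each frame
    shared with the previous one; the hop between frame starts is
    frame_size * (1 - overlap). A trailing partial frame is dropped.
    """
    hop = max(1, int(frame_size * (1 - overlap)))
    return [samples[start:start + frame_size]
            for start in range(0, len(samples) - frame_size + 1, hop)]
```

At 16,000 Hz, for example, a 10 msec frame is 160 samples; with 50 percent overlap a new frame starts every 80 samples.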
[0038] Referring again to FIG. 3, a windowing step may be provided in the digital signal processing by digital processor 42. For example, a Hamming window might be utilized to multiply each frame in one embodiment. Of course, other types of windowing circuits might also be utilized to adjust the digitized audio signal. The windowing step 68, such as with a Hamming window, serves to smooth the frequency content of the frame and reduce spectral leakage that would occur by the implicit rectangular windowing imposed by the framing operation of step 66. Without the windowing step 68 of the Hamming window, the sudden breaks at each end of the successive frames would cause ringing in the frequency content, spreading energy from some frequencies across the entire spectrum. The Hamming window tapers the signal at the edges of the frame, thereby reducing the spectral leakage that occurs. The Hamming window has a raised cosine shape and might be specified for a window of size "N," as follows:

w(i) = 0.54 - 0.46*cos(2*pi*i/(N-1))   (EQ 2)
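EQ 2 translates directly to code; a sketch of the windowing step (names are illustrative):

```python
import math

def hamming_window(n):
    """Hamming window of EQ 2: w(i) = 0.54 - 0.46 * cos(2*pi*i / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame):
    """Multiply a frame point-by-point by the Hamming window (step 68)."""
    return [s * w for s, w in zip(frame, hamming_window(len(frame)))]
```

The window tapers to 0.08 at both edges and rises to 1.0 at the center, which is the edge-tapering behavior described above.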
[0039] In accordance with a further aspect of the present invention, an autocorrelation step 70 is performed. That is, the autocorrelation of each frame is calculated in sequence. The autocorrelation step 70 generates a set of coefficients for each frame. The coefficients are reflective of spectral characteristics of the audio signal portion represented by the frame. That is, the data sent by the present invention is not simply a digitized voice signal, but rather is a set of coefficients, configured as data, that are reflective of spectral characteristics of the audio signal portion.
[0040] In a speech signal, it is the envelope of the spectrum that contains the data of interest to a speech recognition system. The autocorrelation step 70 computes a set of coefficients that parameterize the spectral envelope of the speech signal. That is, the coefficient set is reflective of the spectral envelope. This is a particular advantage of the present invention for use in speech recognition systems, because speech recognition systems also use autocorrelation coefficients. Therefore, in further processing the data sent by the inventive wireless device, no additional computational complexity would be imposed on the speech recognition system.
[0041] Autocorrelation is computed on each frame as
follows, for example:

R(i) = Σ_t x(t)·x(t-i)

EQ. 3

[0042] where "R" is the autocorrelation coefficients,

[0043] where "i" is in the range of 0 to the number of
autocorrelation coefficients generated minus 1, and

[0044] where "t" is based on the size of the frame.

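EQ. 3 with the definitions above can be sketched as follows. This is an illustrative sketch only; the function name and the choice of summing over the samples of a single frame (so that t-i stays in range) are our assumptions, not the patent's code.

```python
def autocorrelation(frame: list[float], num_coeffs: int) -> list[float]:
    """Short-time autocorrelation per EQ. 3: R(i) = sum_t x(t) * x(t - i),
    for lags i = 0 .. num_coeffs - 1, with t ranging over the frame so
    that every product stays within the frame's samples."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t - i] for t in range(i, n))
        for i in range(num_coeffs)
    ]
```

R(0) is simply the frame's energy; the higher-lag coefficients carry the spectral-envelope information that the speech recognition system consumes.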
[0045] Autocorrelation algorithms are known to a person
of ordinary skill in the art to generate spectral information
useful to a speech recognition system. The number of
coefficients to use depends primarily on the speech fre-
quency range and the spectral tilt of the speech signal. As a
general rule, two coefficients are generated for every 1,000
Hz of speech bandwidth, plus additional coefficients as
needed for the speech recognition system to compensate for
spectral tilt. In accordance with one aspect of the present
invention, the typical values of "i," as the number of coef-
ficients, range from 10 to 21 coefficients per frame. Each
coefficient that is generated in the invention is represented as
a data word, and the data word sizes typically range from 16
to 32 bits for each coefficient. Of course, different ranges of
coefficients might be utilized, as well as different sized data
words. However, the noted ranges are typical for an exem-
plary embodiment of the invention. The autocorrelation step
is also a process provided by the digital signal processor, or
digital processor 42.

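The rule of thumb above (two coefficients per 1,000 Hz of speech bandwidth, plus extras for spectral tilt) can be expressed as a small helper. Both the function and the default tilt allowance are illustrative assumptions for this sketch, not values given by the patent.

```python
def num_autocorr_coeffs(bandwidth_hz: float, tilt_extra: int = 2) -> int:
    """Two coefficients per 1,000 Hz of speech bandwidth, plus extra
    coefficients to compensate for spectral tilt. tilt_extra is an
    assumed illustrative value; the patent only says 'as needed'."""
    return int(2 * bandwidth_hz / 1000) + tilt_extra
```

For example, 4 kHz telephone-band speech with two tilt coefficients gives 10 coefficients, and 8 kHz bandwidth with five tilt coefficients gives 21, matching the 10-to-21-per-frame range cited in the text.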
[0046] The resulting output from the autocorrelation step
70 is digital speech data 72 that consists of a set of
autocorrelation coefficients reflective of the spectral charac-
teristics of the captured analog audio input. Therefore, the
coefficie
