`
(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2007/0143105 A1
Braho et al.                           (43) Pub. Date: Jun. 21, 2007
`
(54) WIRELESS HEADSET AND METHOD FOR ROBUST VOICE DATA COMMUNICATION

(76) Inventors: Keith Braho, Murrysville, PA (US); Roger Graham Byford, Apollo, PA (US); Thomas S. Kerr, Murrysville, PA (US); Amro El-Jaroudi, Pittsburgh, PA (US)

Correspondence Address:
WOOD, HERRON & EVANS, LLP
2700 CAREW TOWER
441 VINE STREET
CINCINNATI, OH 45202 (US)

(21) Appl. No.: 11/303,271

(22) Filed: Dec. 16, 2005

Publication Classification

(51) Int. Cl.
     G10L 15/00 (2006.01)
(52) U.S. Cl. ........................... 704/231; 704/246
`
(57) ABSTRACT

A wireless device for use with speech recognition applications comprises a frame generator for generating successive frames from digitized original audio signals, the frames representing portions of the digitized audio signals. An autocorrelation circuit generates a set of coefficients for each frame, the coefficient set being reflective of spectral characteristics of the audio signal portion represented by the frame. In one embodiment, the autocorrelation coefficients may be used to predict the original audio signal to be subtracted from the original audio signals and to generate residual signals. A Bluetooth transceiver is configured for transmitting the set of coefficients and/or residual signals as data to another device, which utilizes the coefficients for speech applications.
`
[Cover figure: reproduction of FIG. 5 (see Drawing Sheet 3 below).]
[Drawing Sheet 1 of 3: FIG. 1, a schematic view of a user's headset wirelessly coupled to a Bluetooth-enabled device; FIG. 2, a block diagram of the headset circuitry, including a transceiver, baseband processor, and related components.]
`
[Drawing Sheet 2 of 3: FIG. 3, a processing flow from analog audio input through A/D conversion and autocorrelation to digital speech data; FIG. 4, a table of Bluetooth link parameters listing link type (ACL or SCO), packet type, payload size (bits), whether retransmissions are supported, forward error correction, and symmetric/asymmetric maximum data rates (kbps).]
`
[Drawing Sheet 3 of 3: FIG. 5, a flowchart of generating prediction coefficients, predicting current speech values with those coefficients, subtracting the predicted speech values from the true speech signal values to generate residual signals, encoding the residual signals, and transmitting the encoded residual signals as data.]
`
`WIRELESS HEADSET AND METHOD FOR
`ROBUST VOICE DATA COMMUNICATION
`
`FIELD OF THE INVENTION
`
[0001] This invention relates generally to wireless communication devices, and particularly to a wireless device, such as a headset, utilized for speech recognition applications and other speech applications.
`
`BACKGROUND OF THE INVENTION
`
[0002] Wireless communication devices are used for a variety of different functions and to provide a communication platform for a user. One particular wireless communication device is a headset. Generally, headsets incorporate speakers that convey audio signals to the wearer, for the wearer to hear, and also incorporate microphones to capture speech from the wearer. Such audio and speech signals are generally converted to electrical signals and processed to be wirelessly transmitted or received.
`
[0003] Wireless headsets have become somewhat commonplace. Wireless headsets are generally wirelessly coupled with other devices such as cell phones, computers, stereos, and other devices that process audio signals. In use, a wireless headset may be coupled with other equipment utilizing various RF communication protocols, such as the IEEE 802.11 standard for wireless communication. Other wireless communication protocols have been more recently developed, such as the Bluetooth protocol.
`
[0004] Bluetooth is a low-cost, low-power, short-range radio technology designed specifically as a cable replacement to connect devices, such as headsets, mobile phone handsets, and computers or other terminal equipment, together. One particular use of the Bluetooth protocol is to provide a communication protocol between a mobile phone handset and an earpiece or headpiece. The Bluetooth protocol is a well-known protocol understood by a person of ordinary skill in the art, and thus all of the particulars are not set forth herein.
`
[0005] While wireless headsets are utilized for wireless telephone communications, their use is also desirable for other voice or audio applications. For example, wireless headsets may play a particular role in speech recognition technology. U.S. patent application Ser. No. 10/671,140, entitled "Wireless Headset for Use in a Speech Recognition Environment" and filed on Sep. 25, 2003, sets forth one possible use for a wireless headset, and that application is incorporated herein by reference in its entirety. Speech recognition applications demand a high-quality speech or audio signal, and thus a significantly robust communication protocol. While Bluetooth provides an effective means for transmission of voice for typical telephony applications, the current Bluetooth standard has limitations that make it significantly less effective for speech recognition applications and systems.
`
[0006] For example, the most frequently used standard for representing voice or speech data in the telephony industry utilizes 8-bit data digitized at an 8,000 Hz sample rate. This communication standard has generally evolved from the early days of analog telephony, when it was generally accepted that a frequency range of 250 Hz to 4,000 Hz was adequate for voice communication over a telephone. More recent digital voice protocol standards, including the Bluetooth protocol, have built upon this legacy. In order to achieve an upper bandwidth limit of 4,000 Hz, a minimal sample rate of at least twice that, or 8,000 Hz, is required. To minimize link bandwidth, voice samples are encoded as 8 bits per sample and employ a non-linear transfer function to provide increased dynamic range on the order of 64-72 dB. The Bluetooth standard generally supports the most common telephony encoding schemes. At the physical layer, the Bluetooth protocol uses a "synchronous connection oriented" (SCO) link to transfer voice data. An SCO link sends data at fixed, periodic intervals. The data rate of an SCO link is fixed at 64,000 bits per second (64 Kbps). Voice packets transmitted over an SCO link do not employ flow control and are not retransmitted. Therefore, some packets are dropped during normal operation, thus resulting in data loss of portions of the audio signals.
`
[0007] For most human-to-human communication applications, such as telephony applications, the current Bluetooth voice sampling and encoding techniques using SCO links and voice packets are adequate. Generally, humans have the ability to subconsciously use reasoning, context, and other clues to mentally reconstruct the original speech over a more lossy communication medium. Furthermore, where necessary, additional mechanisms, such as the phonetic alphabet, can be employed to ensure the reliability of the information transferred (e.g., "Z" as in Zulu).
`
[0008] However, for human-to-machine communication, such as speech recognition systems, significantly better speech sampling and encoding performance is necessary. First, a more reliable data link is necessary, because dropped voice packets in the typical telephony Bluetooth protocol can significantly reduce the performance of a speech recognition system. For example, each dropped Bluetooth SCO packet can result in a loss of 3.75 milliseconds of speech. This can drastically increase the probability of a speech recognition error.
`
[0009] Additionally, the information-bearing frequency range of speech is now understood to be in the range of 250 Hz to 6,000 Hz, with additional, less critical content available up to 10,000 Hz. The intelligibility of consonants has been shown to diminish when the higher frequencies are filtered out of the speech signal. Therefore, it is important to preserve this high end of the spectrum.
`
[0010] However, increasing the sample rate of the audio signal to 12,000 Hz while still maintaining 8-bit encoding exceeds the capability of the Bluetooth SCO link, because such an encoding scheme would require a data rate of 96 Kbps, which is above the 64 Kbps Bluetooth SCO rate.
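The bandwidth comparison above is straightforward arithmetic; the short Python sketch below makes it explicit. It is illustrative only, using the sample rates, bit depth, and 64 Kbps SCO figure quoted in the text.

```python
# Illustrative arithmetic only: raw PCM bit rates versus the fixed
# 64 Kbps Bluetooth SCO voice channel described above.
SCO_RATE_BPS = 64_000

def pcm_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of an uncompressed PCM stream."""
    return sample_rate_hz * bits_per_sample

for rate_hz, bits in [(8_000, 8), (12_000, 8)]:
    bps = pcm_rate_bps(rate_hz, bits)
    verdict = "fits within" if bps <= SCO_RATE_BPS else "exceeds"
    print(f"{rate_hz} Hz x {bits}-bit = {bps // 1000} Kbps, {verdict} the 64 Kbps SCO link")
```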
`
[0011] Speech samples digitized as 8-bit data also contain a high degree of quantization error, which has the effect of reducing the signal-to-noise ratio (SNR) of the data fed to the recognition system. Speech signals also exhibit a variable dynamic range across different phonemes and different frequencies. In the frequency ranges where dynamic range is decreased, the effect of quantization error is proportionally increased. A speech system with 8-bit resolution can have up to 20 dB of additional quantization error in certain frequency ranges for the "unvoiced" components of the speech signal. Most speech systems reduce the effect of quantization error by increasing the sample size to a minimum of 12 bits per sample. Thus, the current Bluetooth voice protocol for telephony is not adequate for speech applications such as speech recognition applications.
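As a rough illustration of the quantization point above: the 6.02 dB-per-bit figure used below is a standard signal-processing rule of thumb, not a number taken from this application; it is included only to make the 8-bit versus 12-bit comparison concrete.

```python
# Rule-of-thumb quantization SNR for an N-bit full-scale signal:
# SNR ~= 6.02*N + 1.76 dB. Moving from 8-bit to 12-bit samples gains
# roughly 24 dB, which is why 12 bits is cited as a practical minimum.
def quantization_snr_db(bits: int) -> float:
    return 6.02 * bits + 1.76

for bits in (8, 12, 16):
    print(f"{bits}-bit samples: ~{quantization_snr_db(bits):.1f} dB peak SNR")
```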
`
[0012] Therefore, there is a need for an improved wireless device for use in speech and voice applications. There is particularly a need for a wireless headset device that is suitable for use in speech recognition applications and systems. Still further, it would be desirable to incorporate a Bluetooth protocol in a wireless headset suitable for use with speech recognition systems.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
[0013] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with a general description of the invention given above and the detailed description of the embodiments given below, serve to explain the principles of the invention.
`
[0014] FIG. 1 illustrates a schematic view of a communication system in which the present invention may be incorporated.

[0015] FIG. 2 illustrates a block diagram view of components of a wireless communication device in accordance with the principles of the invention.

[0016] FIG. 3 is a flowchart illustrating one operational embodiment of the present invention.

[0017] FIG. 4 is a table of Bluetooth protocol parameters utilized in accordance with one aspect of the present invention.

[0018] FIG. 5 is a flowchart illustrating another operational embodiment of the present invention.
`
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
`
[0019] The present invention addresses the above-referenced issues and noted drawbacks in the prior art by providing a wireless device that is useful for speech applications and, particularly, useful for speech recognition applications that require higher quality speech signals for proper performance. To that end, rather than relying upon voice sampling and coding techniques intended for human-to-human communication, such as over the telephone, the present invention utilizes correlation processing and represents spectral characteristics of an audio or speech signal in the form of data.
`
[0020] Particularly, an autocorrelation component of the invention generates a set of coefficients for successive portions or frames of a digitized audio signal. The coefficients are reflective of spectral characteristics of the audio signal portions, which are represented by multiple successive frames. The sets of coefficients reflective of the audio signal frames are transmitted as data packets in a wireless format. Although various wireless transmission protocols might be used, one particular embodiment utilizes a Bluetooth transceiver and a Bluetooth protocol. However, rather than utilizing standard Bluetooth voice processing and voice packets, the present invention transmits the sets of coefficients as data, utilizing data packets in the Bluetooth protocol. Other wireless transmission schemes may utilize their data transmission parameters as well, in accordance with the principles of the present invention, as opposed to voice parameters, which are generally utilized to transmit voice for human-to-human communication. In one particular aspect, the Bluetooth transceiver utilizes an asynchronous connection-less (ACL) link for transmitting the coefficients as data.
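As a loose sketch of what "transmitting the coefficients as data" could look like, the snippet below packs one frame's coefficient set into a byte payload suitable for handing to an ACL data link. The frame counter, the 16-bit fixed-point coefficient words, and the little-endian layout are all illustrative assumptions; the application does not prescribe a payload format here.

```python
import struct

def pack_coefficient_frame(frame_index: int, coeffs: list[int]) -> bytes:
    """Pack one frame's autocorrelation coefficients for an ACL data payload.

    Assumes each coefficient has already been scaled to a signed 16-bit
    data word; the 2-byte frame counter is a hypothetical sequencing field.
    """
    header = struct.pack("<H", frame_index & 0xFFFF)
    body = struct.pack(f"<{len(coeffs)}h", *coeffs)
    return header + body

payload = pack_coefficient_frame(42, [21000, -5400, 870, 15, -4, 2, 0, 0, 0, 0, 0, 0, 0])
print(len(payload), "bytes per frame")  # 2 + 13*2 = 28 bytes
```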
`
[0021] Therefore, the present invention overcomes the inherent limitations of Bluetooth and other wireless communication methods for use with speech recognition by providing the desired link reliability between the devices, providing high dynamic range and lower quantization error coding, and requiring less link bandwidth than current methods, while avoiding additional computational complexity on the speech recognition system.
`
[0022] FIG. 1 illustrates a schematic view of a system incorporating the invention and aspects thereof. The aspects and features of the present invention may be incorporated into various different wireless devices which contain the signal processing capability and wireless transceiving capability for implementing the elements of the invention. For example, the invention might be incorporated into a traditional computer, such as a desktop or laptop computer, in a portable terminal or portable computing device, a cellular phone, a headset, or any other device which is operable for processing audio signals and transceiving the data which results from such signal processing according to the invention.
`
[0023] In one particular embodiment of the invention, it is incorporated into a wireless headset device worn by a user, and the data is transceived as data packets utilizing a Bluetooth protocol. Therefore, in the example discussed herein, a Bluetooth-enabled headset device is described. However, it should be understood that this is only one particular device and one particular data-transceiving protocol that might be utilized. Other such devices and transceiving protocols could also be used in accordance with aspects of the present invention. Therefore, the invention is not limited only to Bluetooth headsets.
`
[0024] Referring again to FIG. 1, a user 10 is shown utilizing or wearing a headset device 12. The headset 12 has wireless transceiving capabilities, which are implemented by appropriate processing and transceiving circuitry 14. The headset circuitry 14 also handles other signal processing as described below. The headset 12 will generally have one or more speakers 16 for the user to hear audio signals that are received, as well as to hear those audio signals that are spoken and transmitted by the user. To capture audio signals, such as speech, the headset 12 incorporates a microphone 18. The processed signals are referred to herein generally as "audio signals" and will include voice and speech signals, as well as other audio signals. Generally, it is desirable for the user to communicate commands, responses, general speech, etc. to one or more devices, which are wirelessly coupled to the headset. For example, the headset 12 might communicate with a portable terminal 20, which may be worn or carried by the user, another person, or a piece of equipment. Such a wireless link is indicated by reference numeral 21. Similarly, the wireless headset 12 might communicate with another enabled device, such as a cellular phone or other device 22, which is coupled by wireless link 23. In FIG. 1, device 22 is indicated as being a Bluetooth-enabled device, although other transceiving protocols might be used. In another embodiment, headset 12 may be coupled directly to a server or other computer device 24, generally through a wireless access point 26, which has an appropriate antenna 27. The wireless link to the server 24 is indicated by reference numeral 25. Generally, the wireless coupling of a device such as headset 12 to various other devices (not shown) in accordance with the principles of the present invention is also possible, as long as the devices have the necessary processing and transceiving circuitry for implementing the invention.
`
[0025] One particular speech application for the wireless headset device 12 or other inventive wireless device is a speech recognition application wherein the speech generated by user 10 is analyzed and processed for performing multiple tasks. For example, a user might be directed to perform a task through headset 12. Upon or during completion of the task, the user might speak to the system, through microphone 18, to confirm the instructions and task, ask for additional information, or report certain conditions, for example. The speech of the user and the words spoken must then be analyzed or "recognized" to extract the information therefrom. U.S. patent application Ser. No. 10/185,995, entitled "Terminal and Method for Efficient Use and Identification of Peripherals" and filed on Jun. 27, 2002, discusses use of a headset and speech recognition in an inventory management system, for example; that application is incorporated herein by reference in its entirety. Various different speech recognition technologies may be used to process the unique data generated by the wireless headset or other device of the invention, and persons of ordinary skill in the art know such technologies. Therefore, the particulars of a specific speech recognition system are not set forth herein.
`
[0026] FIG. 2 illustrates, in block diagram form, various components of a wireless headset 12 to implement one embodiment of the invention. The components illustrated, while separated into functional blocks in FIG. 2, might be combined together into a single integrated circuit, or may be implemented utilizing individual circuit components. As noted above, headset 12 incorporates a microphone 18 for capturing an audio signal, such as a speech signal from user 10. Microphone 18 is coupled to an audio coder/decoder, or audio codec, 40. The audio codec 40 performs analog-to-digital (A/D) conversion on the analog audio signal captured by microphone 18. The audio codec 40 also preferably performs anti-aliasing on the resulting digitized data as well. In effect, audio codec 40 provides a digitized audio signal reflective of the analog audio signal captured by microphone 18. Audio codec 40 supports the necessary data sample rates and bit resolutions noted below for implementing various embodiments of the present invention. Particularly, audio codec 40 provides sampling rates for high-quality audio signals that capture most of the speech frequencies that would be of interest to speech applications, such as speech recognition applications.
`
[0027] The digital audio data, or the digitized audio signal, is supplied to a digital processor 42. The digital processor includes a microprocessor or other digital signal processor, volatile and non-volatile memory, and associated logic necessary to provide the desired processing of the signal for implementing the invention. For example, as discussed further below, the digital processor 42 may provide pre-emphasis processing, frame generation, windowing, and autocorrelation processing of the digital data stream. The product of the digital processor 42 is processed, digitized audio or speech data, which is then supplied to a baseband processor 44, such as a Bluetooth baseband processor, for example.
`
[0028] The baseband processor 44 then formats the processed digital speech data according to transceiving protocol standards and, in the exemplary embodiment, according to Bluetooth protocol standards. However, the digital speech data provided by baseband processor 44 is not transmitted as voice packets under the Bluetooth protocol, as it would be under typical Bluetooth telephony applications. Rather, in accordance with one aspect of the invention, the digitized speech is transmitted as data using data packets under the Bluetooth protocol. The baseband processor may perform such operations as adding packet header information, forward error correction, cyclic redundancy check, and data encryption. It also implements and manages the Bluetooth stack. As noted above, the Bluetooth transmission protocol is a standard transmission protocol and thus will be readily understood by a person of ordinary skill in the art. As such, all of the various specifics associated with Bluetooth transmission are not discussed herein.
`
[0029] A wireless transceiver, such as a Bluetooth transceiver 46 coupled to an antenna 48, performs all operations necessary to transmit and receive the voice data over a wireless link, such as a Bluetooth link. Wireless transceiver 46 might be operable under another wireless communication protocol, even though the exemplary embodiment discussed herein utilizes Bluetooth. The operations of Bluetooth transceiver 46 may include, but are not limited to, such typical transceiver operations as conversion to RF frequencies, modulation and demodulation, spreading, and amplification. Antenna 48 provides efficient transmission and reception of signals in a wireless format.
`
[0030] While one aspect of the invention is directed to transmitting a representation of captured speech signals from a device for use in speech recognition applications, wireless headset 12 also implements a receive data link. All the various functional blocks shown in FIG. 2 support bidirectional data transfer. The audio codec 40 is capable of performing digital-to-analog (D/A) conversion and sending the analog signal to one or more speakers 16. The audio codec 40 preferably has separate A/D and D/A converters with independent channels so that full duplex operation is possible. The received data link can be implemented utilizing either an asynchronous connection-less (ACL) link, as discussed further below for one embodiment of the invention, or an SCO link. If telephony-quality data is acceptable on the receive link, then an SCO link can be employed, and standard Bluetooth audio processing can be performed by either the baseband processor 44 or the digital processor 42, or by some combination of both. The processed audio data will then be sent to the audio codec 40 for playback by speakers 16. Generally, an SCO link using various packets might be acceptable on the receive side, unlike the transmit side of the invention, because the received data link may contain audio that will be listened to and interpreted by a human (i.e., the user) rather than a machine. As with typical Bluetooth voice applications, a lower quality voice link is possible for telephony applications.
`
[0031] However, if a more reliable link is necessary or desired, then an ACL link might be employed on the receive side as well, according to the invention. In that case, audio processing would be performed by the digital processor 42. A more reliable receive data link may be necessary, for example, for safety-critical applications, such as for use by emergency first responders.
`
[0032] As noted above, it will be apparent to a person of ordinary skill in the art that the disclosed embodiment is exemplary only and that a wide range of other embodiments may be implemented in accordance with the principles of the present invention. For example, various different commercially available components are available to implement the elements described in FIG. 2, with varying levels of integration. Furthermore, the functional blocks in the Figure may be implemented using individual integrated circuit components, or several functional blocks may be combined together into a single integrated circuit.
`
[0033] Referring now to FIG. 3, that figure shows a processing flow chart for one embodiment of the present invention. Analog audio signals, such as user speech or voice signals, are collected, such as by a microphone 18 in the headset 12 in the example discussed herein. Alternatively, an analog audio signal might be retrieved from a storage medium, such as tape, to be further processed and transmitted according to the invention. The audio input is provided to circuitry for performing A/D conversion 62. For example, the audio signals might be directed to a codec 40 as discussed. In the A/D conversion step 62, the analog audio signal is converted to digital samples, which are suitable for being further processed and for being used in speech applications, such as in a speech recognition system. The A/D conversion step 62 may utilize typical sampling rates for high-quality audio signals, such as sampling rates of 11,025 Hz, 16,000 Hz, 22,050 Hz, 44,100 Hz, and 48,000 Hz. For the purposes of discussion of the exemplary embodiment herein, we will address the sample rates of 11,025 Hz and 16,000 Hz. Such sample rates are suitable for capturing most of the speech frequencies that would be of interest in general speech applications, such as a speech recognition application. Accordingly, the audio codec 40 is configured and operable for achieving such sampling rates. It is also desirable that the resolution of the A/D conversion in step 62 by the codec 40 is at least 12 bits in order to provide an acceptable quantization error. Reasonably priced devices that provide up to 16 bits of resolution are commercially available and, thus, a 16-bit resolution is also discussed in the exemplary embodiment herein. Of course, other, higher resolutions might also be utilized.
`
[0034] The output of the A/D conversion step 62 may, therefore, provide a continuous bit stream of from 132.3 kilobits/second (Kbps) (i.e., 11,025 Hz × 12 bits resolution) to around 256 Kbps (i.e., 16,000 Hz × 16 bits resolution). While such a bit stream would clearly exceed the capability of a typical Bluetooth SCO link using voice packets to transmit the speech signal, the present invention provides generation of data reflective of the audio signal and utilizes an ACL link with data packets. Additional processing of the bit stream enhances the data for being transmitted and then subsequently used with a speech application, such as a speech recognition system; the additional processing also reduces the bandwidth needed to transfer the data over a Bluetooth link.
`
[0035] Specifically, to further process the bit stream, a pre-emphasis step 64 may be utilized. A pre-emphasis step may be performed, for example, by the digital processor 42. In one embodiment, the pre-emphasis is typically provided in the digital processor by a first-order filter that is used to emphasize the higher frequencies of the speech spectra, which may contain information of greater value to a speech recognition system than the lower frequencies. One suitable filter may have an equation of the form:

y(t) = x(t) − a·x(t−1)    (EQ 1)
`
[0036] where "a" is a scaling factor that is utilized to control the amount of pre-emphasis applied. The range of the scaling factor is typically between 0.9 and 1.0, depending upon the amount of spectral tilt present in the speech data. Spectral tilt essentially refers to the overall slope of the spectrum of a speech signal, as is known to those of skill in the art.
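A minimal sketch of the pre-emphasis step in Python/NumPy, assuming the standard first-order form of EQ 1; the value a = 0.95 is an illustrative choice within the 0.9-1.0 range given above, not a value fixed by this application.

```python
import numpy as np

def pre_emphasize(x: np.ndarray, a: float = 0.95) -> np.ndarray:
    """First-order pre-emphasis y(t) = x(t) - a*x(t-1), boosting high frequencies."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                  # no previous sample exists for the first point
    y[1:] = x[1:] - a * x[:-1]
    return y
```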
`
[0037] To further process the digitized audio signal in the form of the bit stream, the data stream is then processed through a frame generation step or steps 66. The frame generation might also be performed by digital signal processing circuitry, such as the digital processor 42 of FIG. 2. In the frame generation step, the data stream is subdivided into multiple successive frames to be further processed. In one embodiment of the invention, the frames are overlapping frames. Data overlap on each end of the frame is needed to eliminate artifacts that would be introduced by the signal-processing algorithms further down the processing chain. For speech recognition systems, frame buffer sizes may typically range from 10 msec (i.e., 100 frames per second) to 100 msec (i.e., 10 frames per second) of continuous audio samples. Frames may have an overlap of around 0 percent to 50 percent of the previous frame. The frames essentially represent portions of the digitized audio signal, and the successive frames thus make up the whole captured audio signal from step 60. Follow-on processing is then performed on each frame sequentially.
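A short framing sketch under the parameters described above; the 25 ms frame length and 50 percent overlap are illustrative picks from the stated 10-100 ms and 0-50 percent ranges.

```python
import numpy as np

def make_frames(samples: np.ndarray, sample_rate_hz: int,
                frame_ms: float = 25.0, overlap: float = 0.5) -> np.ndarray:
    """Split a sample stream into successive, optionally overlapping frames."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    hop = max(1, int(frame_len * (1.0 - overlap)))   # step between frame starts
    starts = range(0, len(samples) - frame_len + 1, hop)
    return np.stack([samples[s:s + frame_len] for s in starts])

# e.g. one second at 11,025 Hz -> 275-sample frames advancing 137 samples each
```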
`
[0038] Referring again to FIG. 3, a windowing step may be provided in the digital signal processing by digital processor 42. For example, a Hamming window might be utilized to multiply each frame in one embodiment. Of course, other types of windowing circuits might also be utilized to adjust the digitized audio signal. The windowing step 68, such as with a Hamming window, serves to smooth the frequency content of the frame and reduce the spectral leakage that would occur by the implicit rectangular windowing imposed by the framing operation of step 66. Without the windowing step 68 of the Hamming window, the sudden breaks at each end of the successive frames would cause ringing in the frequency content, spreading energy from some frequencies across the entire spectrum. The Hamming window tapers the signal at the edges of the frame, thereby reducing the spectral leakage that occurs. The Hamming window has a raised cosine shape and might be specified for a window of size "N" as follows:

w(i) = 0.54 − 0.46·cos(2πi/(N−1)), for i = 0, 1, …, N−1    (EQ 2)
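Applied per frame, the window of EQ 2 can be expressed with NumPy, whose np.hamming(N) uses the same 0.54 − 0.46·cos(2πi/(N−1)) raised-cosine shape. This is a sketch continuing the framing example above.

```python
import numpy as np

def window_frames(frames: np.ndarray) -> np.ndarray:
    """Multiply every frame by a Hamming window to taper its edges (EQ 2)."""
    w = np.hamming(frames.shape[1])   # window length N equals the frame length
    return frames * w                 # broadcasts across all frames
```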
`
[0039] In accordance with a further aspect of the present invention, an autocorrelation step 70 is performed. That is, the autocorrelation of each frame is calculated in sequence. The autocorrelation step 70 generates a set of coefficients for each frame. The coefficients are reflective of spectral characteristics of the audio signal portion represented by the frame. That is, the data sent by the present invention is not simply a digitized voice signal, but rather is a set of coefficients, configured as data, that are reflective of spectral characteristics of the audio signal portion.
`
[0040] In a speech signal, it is the envelope of the spectrum that contains the data of interest to a speech recognition system. The autocorrelation step 70 computes a set of coefficients that parameterize the spectral envelope of the speech signal. That is, the coefficient set is reflective of the spectral envelope. This is a particular advantage of the present invention for use in speech recognition systems, because speech recognition systems also use autocorrelation coefficients. Therefore, in further processing the data sent by the inventive wireless device, no additional computational complexity would be imposed on the speech recognition system.
`
[0041] Autocorrelation is computed on each frame as follows, for example:

R(i) = Σ_t x(t)·x(t−i)    (EQ 3)

[0042] where "R" represents the autocorrelation coefficients,

[0043] where "i" is in the range of 0 to the number of autocorrelation coefficients generated minus 1, and

[0044] where "t" is based on the size of the frame.
`
[0045] Autocorrelation algorithms are known to a person of ordinary skill in the art to generate spectral information useful to a speech recognition system. The number of coefficients to use depends primarily on the speech frequency range and the spectral tilt of the speech signal. As a general rule, two coefficients are generated for every 1,000 Hz of speech bandwidth, plus additional coefficients as needed for the speech recognition system to compensate for spectral tilt. In accordance with one aspect of the present invention, the typical values of "i", as the number of coefficients, range from 10 to 21 coefficients per frame. Each coefficient that is generated in the invention is represented as a data word, and the data word sizes typically range from 16 to 32 bits for each coefficient. Of course, different ranges of coefficients might be utilized, as well as different sized data words. However, the noted ranges are typical for an exemplary embodiment of the invention. The autocorrelation step is also a process provided by the digital signal processor, or digital processor 42.
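A per-frame autocorrelation sketch following EQ 3; the choice of 13 coefficients is only an example within the 10-21 range given above, and the scaling of each coefficient into a 16- to 32-bit data word is left out.

```python
import numpy as np

def autocorrelation_coeffs(frame: np.ndarray, num_coeffs: int = 13) -> np.ndarray:
    """Return R(0)..R(num_coeffs-1) for one windowed frame, per EQ 3."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[i:], frame[:n - i]) for i in range(num_coeffs)])
```

In a full pipeline, each resulting coefficient would then be represented as a data word and handed to the baseband processor for transmission as ACL data, as described above.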
`
[0046] The resulting output from the autocorrelation step 70 is digital speech data 72 that consists of a set of autocorrelation coefficients reflective of the spectral characteristics of the captured analog audio input. Therefore, the coefficie