Express Mail mailing label no. EL212283154US
Date of Deposit: June 8, 2000
I hereby certify that this paper or fee is being deposited with the United States Postal Service "Express Mail Post Office to Addressee" service under 37 CFR 1.10 on the date indicated above and is addressed to the Assistant Commissioner for Patents, Box PROVISIONAL APPLICATION, Washington, D.C. 20231.

By: Carrie Martin

Attorney Docket No. AGLE0001PR

IN THE U.S. PATENT AND TRADEMARK OFFICE
Provisional Application Cover Sheet

Assistant Commissioner for Patents
BOX PROVISIONAL APPLICATION
Washington, D.C. 20231

Sir:

This is a request for filing a PROVISIONAL APPLICATION FOR PATENT under 37 CFR 1.53(b)(2).

INVENTOR(s)/APPLICANT(s)

Last Name     First Name    Middle Initial    Residence (City and Either State or Foreign Country)
Calderone     Ted                             San Carlos, California
Cook          Paul          M.                Woodside, California
Foster        Mark          J.                Palo Alto, California

Additional inventors are being named on separately numbered sheets attached hereto.

Title of the Invention

METHOD AND APPARATUS FOR CENTRALIZED VOICE-DRIVEN NATURAL LANGUAGE PROCESSING IN MULTI-MEDIA & HIGH BANDWIDTH APPLICATIONS

Correspondence Address

Michael A. Glenn
3475 Edison Way, Ste. L
Menlo Park, CA 94025

Telephone No. (650) 474-8400
Enclosed Application Parts (check all that apply)

(X) Specification, Number of Pages: 24, and 3 Drawing(s)
( ) Other (specify)
(X) Small Entity Statement (specify - IND or BUS)

Filing Fee and Method of Payment

(X) $75.00 for Small Entity
( ) $150 for Large Entity

The Commissioner is authorized to charge the filing fee of $75.00 and any additional fees, or credit any overpayment, to Deposit Account No. 07-1445 (Order No. AGLE0001PR). A copy is enclosed for this purpose.

Respectfully Submitted,

EA E. JENNINGS,
Reg. No. 44,804
Attorney Docket No. AGLE0001PR

Applicant/Patentee: AGILETV CORPORATION
Serial or Patent No.:
Filed or Issued: Herewith
Title: METHOD & APPARATUS FOR CENTRALIZED VOICE-DRIVEN NATURAL LANGUAGE PROCESSING IN MULTI-MEDIA & HIGH BANDWIDTH APPLICATIONS
Atty Docket No.: AGLE0001PR

VERIFIED STATEMENT (DECLARATION) CLAIMING SMALL ENTITY STATUS
37 CFR 1.9(f) and 1.27(c)--SMALL BUSINESS CONCERN

I hereby declare that I am
[   ] the owner of the small business concern identified below:
[ X ] an official empowered to act on behalf of the small business concern identified below:

NAME OF CONCERN: AGILETV CORPORATION
ADDRESS: 333 Ravenswood Ave., Bldg. 202, Menlo Park, CA 94025

I hereby declare that the above identified small business concern qualifies as a small business concern as defined in 13 CFR 121.3-18, and reproduced in 37 CFR 1.9(d), for purposes of paying reduced fees under section 41(a) and (b) of Title 35, U.S. Code, in that the number of employees of the concern, including those of its affiliates, does not exceed 500 persons. For purposes of this statement, (1) the number of employees of the business concern is the average over the previous fiscal year of the concern of the persons employed on a full-time, part-time or temporary basis during each of the pay periods of the fiscal year, and (2) concerns are affiliates of each other when either, directly or indirectly, one concern controls or has the power to control the other, or a third party or parties controls or has the power to control both.

I hereby declare that rights under contract or law have been conveyed to and remain with the small business concern identified above with regard to the invention entitled: METHOD & APPARATUS FOR CENTRALIZED VOICE-DRIVEN NATURAL LANGUAGE PROCESSING IN MULTI-MEDIA & HIGH BANDWIDTH APPLICATIONS, by inventor(s) Ted Calderone, Paul M. Cook, Mark J. Foster, described in

[ X ] the specification filed herewith.
[   ] application Serial No.            , filed
[   ] patent #                          , issued
If the rights held by the above-identified small business concern are not exclusive, each individual, concern or organization having rights to the invention is listed below* and no rights to the invention are held by any person, other than the inventor, who could not qualify as a small business concern under 37 CFR 1.9(d) or by any concern which would not qualify as a small business concern under 37 CFR 1.9(d) or a nonprofit organization under 37 CFR 1.9(e).

*Note: separate verified statements are required from each named person, concern or organization having rights to the invention averring to their status as small entities. (37 CFR 1.27)

Name:
Address:

[ ] individual    [ ] small business concern    [ ] nonprofit organization

I acknowledge the duty to file, in this application or patent, notification of any change in status resulting in loss of entitlement to small entity status prior to paying, or at the time of paying, the earliest of the issue fee or any maintenance fee due after the date on which status as a small entity is no longer appropriate. (37 CFR 1.28(b))

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information and belief are believed to be true; and further, that these statements were made with the knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or both, under section 1001 of Title 18 of the U.S. Code, and that such willful false statements may jeopardize the validity of the application, any patent issuing thereon, or any patent to which this verified statement is directed.

NAME OF PERSON SIGNING: James E. Jervis
TITLE IN ORGANIZATION: Vice President, Intellectual Property
ADDRESS OF PERSON SIGNING: 333 Ravenswood Ave., Bldg. 202, Menlo Park, CA 94025

SIGNATURE:
DATE:
Attorney Docket: AGLE0001PR

AgileTv System

Current Practice

Currently, voice-operated functions using the latest voice recognition technologies have been limited to only a handful of applications such as toys, appliances, some computers, voice dictation, cellular phones and voice control of one's home. Most of these applications use voice recognition technology that runs on a computer or on voice recognition chip technology. These voice recognition systems typically offer only a limited number of commands, the recognition accuracy can be considered only fair, and they often require voice training. There is, however, another class of voice recognition technology, called "Natural Language," which requires state-of-the-art processing software and hundreds of megabytes of RAM to support. "Natural Language" voice recognition is currently being used in high-end systems, such as billing systems for utility companies and the New York Stock Exchange, because of its ability to recognize spoken words from any voice. Some natural language systems claim to be totally user independent and are also capable of recognizing speech in several different languages. The system described in this disclosure uses a "Natural Language" voice recognition engine for both speech recognition and voice identification.

With the exception of dictation systems that use voice recognition and show recognized text on the monitor screen, none of the systems mentioned above provides immediate feedback to voice input.
In cable systems, downstream data channels carrying channel and synchronization information are typically transmitted in a band of frequencies that, in the past, was reserved for re-broadcasting FM channels over cable. Currently, most cable systems reserve some of the 88 to 108 MHz FM spectrum for set-top data transmission, leaving the unused portion of that spectrum for barker channels or additional video channels. The Open Cable Standard requires that the 70 to 130 MHz band be available for what is called Out-of-Band (OOB) or downstream transmission.
Most cable systems today use the popular Hybrid Fiber Coax (HFC) architecture, in which the downstream video signals, digital or analog, are sent to "hubs" or "nodes" via fiber optic cable. At the receiving side of the node, the optical signal from the fiber is converted to an electrical signal containing all the analog and digital video RF carriers. This signal, in turn, is amplified and distributed via coaxial cable to all the subscribers in the node, with a typical node consisting of anywhere from 500 to 1000 subscribers. The 5 to 40 MHz upstream signal from each subscriber in the node is also collected, combined and then sent to the headend, either via the same fiber used for the downstream video carriers or via a separate fiber.
Summary Introduction

Certain embodiments include a multi-user control system for audio-visual devices incorporating a voice recognition system that is centrally located. Certain further embodiments include centrally locating the voice recognition system in or near a Cable Television (CATV) headend. Certain other further embodiments include centrally locating the voice recognition system in or near a server farm. Certain other further embodiments include centrally locating the voice recognition system in or near a web site. Certain other further embodiments include centrally locating the voice recognition system in or near a gateway.
Certain embodiments are capable of recognizing the vocal commands from a cable subscriber and then acting upon those commands to control the delivery of entertainment and information services such as Video On Demand, Pay Per View, channel control and the Internet. This system is unique in that the voice command, which originates in the home of the subscriber, is sent upstream via a 5 to 40 MHz return path in the cable system to a central voice recognition and processing engine. The voice recognition and identification engine described in this disclosure is capable of processing thousands of voice commands simultaneously and therefore can offer a low-latency entertainment and information experience to the subscriber.
Functional Description (what it does)

In certain embodiments, the overall media control system consists of several functional blocks, with the first function being that of inputting a subscriber's voice into the system.

Figure 1 depicts a remote control unit 1000 coupled with set-top apparatus 1100, communicating via a two-stage wireline communications system containing a first wireline physical transport 1200, a distributor node 1300 and a high speed physical transport 1400, possessing various delivery points 1500 and entry points 1510-1518 to a tightly coupled server farm 2000 with one or more gateways 2100 and one or more tightly coupled server arrays 2200, in accordance with certain embodiments.
Certain embodiments include a remote control unit 1000 fitted with a microphone. Certain further embodiments include a remote control unit 1000 fitted with a special noise-canceling microphone. Certain other further embodiments include a remote control unit 1000 fitted with a microphone and a push-to-talk button. Certain further embodiments include a remote control unit 1000 fitted with a special noise-canceling microphone and a push-to-talk button.
The purpose of the microphone in the remote is to relay the subscriber's voice commands to the central voice recognition engine. The purpose of the push-to-talk button is to begin the process of voice recognition by informing the system that the subscriber is about to speak, and also to provide immediate address information.

In certain embodiments, voice commands from the subscriber are then preprocessed; that is, the analog signals picked up from the microphone are converted to digital signals, where they undergo additional processing before being transmitted to the voice recognition and identification engine located in the cable headend or other centralized location.
The preprocessing function can also take place in the remote control 1000 itself before the signal is transmitted to the set-top box 1100 or set-top appliance 1100, in certain embodiments.

The voice signal from the remote 1000 is a digitally modulated RF signal whose properties comply with Part 15 of the FCC rules, in certain embodiments. The set-top box 1100 or set-top appliance 1100 receives the voice signal from the remote 1000 and performs the preprocessing function mentioned above.
The set-top box 1100 or set-top appliance 1100 is also used to transmit voice and subscriber address data to the centralized location or headend for voice recognition and identification. The RF signal from the remote 1000 is received by the set-top appliance 1100 and then re-modulated for upstream transmission 1200 on the 5 to 40 MHz cable return path. If a commercial set-top box 1100 is used to transmit the upstream voice data, then the upstream channel allocation and transmission protocol are controlled by the bi-directional communication system resident in the set-top box.
In certain alternative embodiments, where a commercial set-top box 1100 is not being used to transmit the digitized voice data upstream, the set-top appliance 1100 is then responsible for receiving the upstream channel allocation and synchronization information. The data receiver in the set-top appliance 1100 is frequency agile; that is, it can be tuned to any one of several downstream data channels to receive channel and synchronization information.
The system described specifically uses the subscriber's address information as a means by which the centrally located Agile Voice Processor can fetch a particular subscriber's parameter file. The parameter file contains voice training parameter data, voice identification parameters and user profiles for each member of the family at that address. This file can also contain parental control information and other specifics for that particular household, such as language preferences, movie preferences or even Internet preferences.

The Addressed Subscriber Parameter File (ASPF) is what gives the system an extremely high probability of user identification and voice recognition. Addressing is an important feature when considering secure transactions such as banking, because the speech recognition and identification system has to identify only an average of four parameter files for any one physical address, which, of course, results in a very high probability of recognizing a specific speaker's voice.
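By way of illustration only, the following minimal Python sketch shows one way a parameter store keyed by subscriber address could be organized. The class name ASPF, its field names, and the load_aspf helper are assumptions of this sketch, not names defined in the disclosure.

    # Hypothetical sketch: fetching an Addressed Subscriber Parameter File (ASPF)
    # keyed by the set-top address. Field names are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class ASPF:
        address: str                                            # unique set-top / household address
        voice_templates: dict = field(default_factory=dict)     # per-member voice identification data
        training_params: dict = field(default_factory=dict)     # per-member recognition training data
        parental_controls: dict = field(default_factory=dict)
        preferences: dict = field(default_factory=dict)         # language, movie, Internet preferences

    def load_aspf(store: dict, address: str) -> ASPF:
        """Fetch the parameter file for one physical address.  Only the handful
        of member profiles at that address need to be searched during speaker
        identification, which is what keeps the match probability high."""
        return store.setdefault(address, ASPF(address=address))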
Financial-level transactional security (Voice Banking) can be realized with this system, and with the addition of simple voice encryption processing in the Voice Preprocessor even higher levels of security can be attained. This directly supports a method of contracting based upon an offer perceived by the user and a recognizable acceptance of the offer by an identified user.
The set-top appliance 1100 is also capable of receiving and decoding data in the downstream path. This function is required in order to organize and synchronize the transmission of upstream data. Downstream data can consist of upstream channel allocation information and voice verification overlay information coded as text.
For embodiments where the set-top box 1100 is used for both upstream and downstream communication for the described voice command function, the function of the set-top appliance is only to receive the RF signal from the remote control and then digitize and compress the voice signal, further preparing it for upstream transmission.
New RF protocol standards such as "Bluetooth" allow the remote control's RF signal to transmit the voice signal directly to the set-top box, where again the preprocessing can either be done in the remote control 1000 or in firmware within the set-top box 1100.
Set-top boxes 1100 that employ DOCSIS-type cable modems, such as Open Cable set-top boxes or the so-called "heavy" set-top boxes from Scientific Atlanta and General Instruments, are capable of sending and receiving voice data using efficient data transmission protocols. The DOCSIS protocol also incorporates error detection and correction capabilities as well as other transmission enhancements, such as pre-equalization, for more efficient and error-free transmission.
The voice signal transmitted from the subscriber's set-top box or set-top appliance 1100 is received 1510 by the 5 to 40 MHz data receiving equipment 2100 in the cable headend.
If the digitized voice signal comes from a commercial set-top box, such as a General Instruments or a Scientific Atlanta set-top, then the return path receiving equipment in the headend is specific to that type of box. Therefore, the data coming from this equipment, which will contain other upstream traffic, is parsed in such a way that only the voice commands and address information from the subscriber are used by the AgileTV Voice Recognition Engine in the headend.
If the digitized voice signal being sent upstream comes from the AgileTV set-top appliance, then the upstream data receiver in the headend is a separate standalone unit designed to receive only voice command signals from the AgileTV set-top appliance in the subscriber's home. Using the set-top appliance as the upstream transmitter allows the use of custom upstream protocols such as FM, AM, PSK or spread spectrum digital transmission. Digital transmission techniques such as QPSK or QAM can also be employed, but require more costly transmission and receiver equipment.
Upon receiving the digitized and preprocessed voice signal from the subscriber's set-top box or set-top appliance, the received upstream signal will be in the form of an Ethernet data stream containing voice and address information. Since the Agile Voice Processing Unit (AVPU) is a high speed voice processing unit capable of processing the data from several nodes, the digital voice signals from each of these nodes are combined into a single high speed digital bit stream in the input multiplexer of the AVPU.
Voice Processing Engine description

Upstream signals 1510 are received at the Agile Voice Processor Unit (AVPU) RPD 2100, in certain embodiments.

1. Voice and data signals are received from commercial return path data receivers, or:

2. Voice and data signals are received and decoded by custom return path receivers using at least one of the following protocol options: FM or AM modulation/demodulation; FDMA, TDMA, FSK, PSK, or QPSK digital modulation/demodulation; Spread Spectrum modulation/demodulation; Telephony or cellular return; or Wireless.
• Application Introduction

The AVPU Engine is not an application service in and of itself. While AgileTV may provide new end user applications, the primary function of the AVPU Engine is to provide voice control services for existing applications, such as Interactive Program Guides and Video On Demand services.
• Application Registration

At system initialization time, applications such as VOD or Interactive Program Guides that wish to utilize voice recognition services must first register with the AVPU system. A standard program interface is utilized to enable each application to specify its complete menu hierarchy in the form of a tree structure. This tree contains "labels" for each menu, along with the text of each "button" on each menu screen. This provides the information to the AVPU engine to enable it to independently provide voice navigation services through the menu hierarchy on behalf of the application. This menu hierarchy represents the "static" portion of the application's data.
In addition to the static menu structure, it is also the responsibility of the application to inform the AVPU of "dynamic" content, for example the names of movies in a VOD system, or program names and times in an interactive program guide. Each time a user enters a menu context in which dynamic content appears, the application will inform the system of this context by passing a "handle" associated with the list of names that comprise the dynamic content. The system will combine the static menu content with the augmented dynamic content (see Similarity Searching below), as well as application-independent keywords such as HELP, in order to form a complete "grammar". This construct is then passed to the voice recognition engine to maximize recognition accuracy.
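As an illustration only, the following Python sketch shows one possible shape for this registration interface. The class AVPURegistry, its method names, and the representation of the menu tree as a dictionary are assumptions of this sketch; the disclosure does not define a concrete API.

    # Hypothetical sketch of application registration: static menu hierarchy,
    # dynamic content handles, and combination into a single grammar.
    class AVPURegistry:
        def __init__(self):
            self.static_menus = {}    # app name -> {menu label: list of button texts}
            self.dynamic = {}         # handle -> list of phrases (e.g. movie titles)

        def register_application(self, app, menu_tree):
            """Called once at initialization with the complete 'static' menu hierarchy."""
            self.static_menus[app] = menu_tree

        def register_dynamic_content(self, handle, phrases):
            """Called whenever dynamic content changes (e.g. periodically for a TV guide)."""
            self.dynamic[handle] = list(phrases)

        def build_grammar(self, app, menu_label, handle=None):
            """Combine static buttons, dynamic phrases and global keywords into one grammar."""
            words = list(self.static_menus[app].get(menu_label, []))
            if handle is not None:
                words += self.dynamic.get(handle, [])
            words += ["HELP"]          # application-independent keywords
            return words               # handed to the recognition engine as its active grammar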
Given that dynamic content, by definition, varies, it is the application's responsibility to inform the system whenever the content changes. In an interactive TV guide application, for example, the application may register a new set of dynamic content every one-half hour. For a VOD system, this registration would be performed whenever the database of movies offered changes.
• Application Interface

Once registration has been completed and the system is being used, recognition of an utterance will cause a signal to be sent back to the application. This signal will inform the application to perform the requested action, and/or to update the contents of the screen as a result of the user's request. In this manner, the application can utilize the system's voice recognition services with minimal modifications to the application's code, while retaining the same graphical "look and feel" that users have become accustomed to.
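A minimal sketch of that return signal is shown below, assuming a simple callback style; the callback signature and field names are illustrative, not part of the disclosure.

    # Hypothetical sketch: delivering a recognition result back to the registered
    # application so it can act on the request and redraw its own screens.
    def on_recognition(app_callback, utterance_text, confidence, menu_context):
        """Signal the application with the recognized text, the engine's confidence,
        and the menu context in which the utterance was spoken."""
        app_callback(text=utterance_text, confidence=confidence, menu_context=menu_context)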
• Using the System

In the subscriber's home, AgileTV supplies a voice-enabled remote control, which contains both a microphone and, in certain further embodiments, a Push-To-Talk (PTT) switch, as well as traditional universal remote control functionality. While the conventional remote control functions are transmitted via IR, voice output is transmitted as RF to a small "VoiceLink" pod located at the set-top box.

When the PTT button is pushed by the user, the remote control sends a "PTT active" command to the VoiceLink, which then informs the set-top box to place an icon on the screen, indicating to the user that the system is "listening" to them. Next, as the user speaks into the microphone, the speech is digitized, compressed, and transmitted to the VoiceLink.
The VoiceLink encrypts the speech sample to provide security, then adds subscriber address information, a length code, and a Cyclical Redundancy Code (CRC) to enable data transmission errors to be detected.

In homes with "heavy" set-top boxes, the VoiceLink will transmit this voice information to the set-top box, which will then transmit it to the headend as a series of packets.

Otherwise, the VoiceLink will directly transmit the voice stream to the headend itself. This process continues until the VoiceLink receives a "PTT Release" from the remote, indicating end of speech. This information is also transmitted to the headend, signaling end of utterance.
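By way of illustration, a minimal Python sketch of that framing (address, length code, payload, CRC) follows. The field widths and the use of CRC-32 are assumptions of the sketch; the disclosure does not specify the actual frame layout or CRC polynomial.

    # Hypothetical sketch of the upstream framing described above.
    import struct
    import zlib

    def frame_voice_packet(address: int, payload: bytes) -> bytes:
        header = struct.pack(">IH", address, len(payload))   # 32-bit address, 16-bit length code
        crc = zlib.crc32(header + payload) & 0xFFFFFFFF      # lets the headend detect transmission errors
        return header + payload + struct.pack(">I", crc)

    def check_voice_packet(packet: bytes):
        address, length = struct.unpack(">IH", packet[:6])
        payload = packet[6:6 + length]
        (crc,) = struct.unpack(">I", packet[6 + length:6 + length + 4])
        if zlib.crc32(packet[:6 + length]) & 0xFFFFFFFF != crc:
            raise ValueError("CRC mismatch: transmission error")
        return address, payload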
• Address Decoding

Each individual consumer's interface (i.e., set-top box or set-top appliance) will have a unique address that is determined during the manufacturing process. As voice packets are transmitted upstream, this address information is pre-pended to the voice packets, enabling rapid determination of which household the voice sample is being received from. This address information is key to improving the efficiency of several different headend processing stages. The first address decode is used to assign an input buffer address to the sample. This input buffer is used to collect incoming voice packets until the final packet of a speech utterance has been received.
Certain further embodiments use on-the-fly Cyclical Redundancy Code (CRC) error check generation. Each time a packet is read in, the CRC is computed in CPU registers as each byte is read, and the partial CRC is then stored at the end of the stored packet. When the next packet arrives, the partial CRC is read from where it was stored, and the new packet data is appended to the end of the previous packet, overwriting the temporary CRC. This continues until a complete voice sample has been received. By doing this, memory accesses are cut in half compared to first storing the string and then making a second pass to generate the CRC.
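The following Python sketch illustrates the single-pass idea, assuming CRC-32 as a stand-in for whatever CRC the hardware actually uses; the class name and structure are illustrative only.

    # Hypothetical sketch of on-the-fly CRC accumulation: the running CRC is kept
    # with the growing utterance buffer and advanced as each packet is appended,
    # so the voice data is only touched once.
    import zlib

    class UtteranceBuffer:
        def __init__(self):
            self.data = bytearray()
            self.partial_crc = 0        # running CRC stored alongside the buffer

        def append_packet(self, packet: bytes):
            # The CRC is advanced while the bytes are being copied in; no second pass.
            self.partial_crc = zlib.crc32(packet, self.partial_crc)
            self.data.extend(packet)

        def finish(self, expected_crc: int) -> bool:
            # Called after the final packet of the utterance has arrived.
            return self.partial_crc == expected_crc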
Once a complete speech utterance has been received, the Input CPU will use the sample's source address to target the speech data to a specific voice processing CPU. This direct correspondence between the source address and a specific voice CPU is important, since it allows voice CPUs to efficiently cache user-specific parameters for the households they serve. Without this mapping, the bandwidth necessary to move household-specific data to each voice CPU would be prohibitive. In certain further embodiments, a translation table is actually used; this allows voice-to-CPU assignments to be changed dynamically in the event of a hardware failure, while retaining the efficiency advantages of direct mapping.
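A minimal sketch of such a translation table is shown below; the class name CpuDirectory, the modulo default mapping, and the fail_over helper are assumptions of this sketch rather than details given in the disclosure.

    # Hypothetical sketch of the address-to-voice-CPU mapping with a translation
    # table for failover.  Direct mapping keeps a household's parameters cached
    # on one CPU; overrides allow reassignment after a hardware failure.
    class CpuDirectory:
        def __init__(self, num_cpus: int):
            self.num_cpus = num_cpus
            self.overrides = {}                      # address -> CPU, only for remapped households

        def cpu_for(self, address: int) -> int:
            # Default direct mapping; an override wins when a CPU has been taken offline.
            return self.overrides.get(address, address % self.num_cpus)

        def fail_over(self, dead_cpu: int, spare_cpu: int, addresses):
            for a in addresses:
                if self.cpu_for(a) == dead_cpu:
                    self.overrides[a] = spare_cpu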
• Grammar File is Loaded

In order for a speech sample to be processed by the voice recognition engine, it is necessary for a voice CPU to first contain in its local memory a copy of the grammar definition associated with the household's set-top box state. Speech recognition is most effective when the speech engine is aware of which words are most likely to be spoken, the order in which these words may appear, and the meaning of various sequences of words; this information is contained in a construct known as a grammar.

Before transferring the new speech sample to a voice CPU, the grammar associated with the speech sample is transferred to the target speech CPU, using a simple LRU queue. If the voice CPU contains empty space in its grammar buffer memory, then the indicated grammar is transferred directly to the empty buffer area from disk. If not, then the least-recently-used grammar buffer entry is discarded, and the new grammar information is loaded into the vacated buffer memory.
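As an illustration only, the LRU grammar buffer could be organized along the lines of the Python sketch below; the class name GrammarBuffer and the load_from_disk callable are assumptions of the sketch.

    # Hypothetical sketch of the per-CPU LRU grammar buffer described above.
    from collections import OrderedDict

    class GrammarBuffer:
        def __init__(self, slots: int, load_from_disk):
            self.slots = slots
            self.load_from_disk = load_from_disk     # callable: grammar_id -> grammar data
            self.cache = OrderedDict()               # grammar_id -> grammar, oldest first

        def ensure_loaded(self, grammar_id):
            if grammar_id in self.cache:
                self.cache.move_to_end(grammar_id)   # mark as most recently used
                return self.cache[grammar_id]
            if len(self.cache) >= self.slots:
                self.cache.popitem(last=False)       # discard the least-recently-used grammar
            self.cache[grammar_id] = self.load_from_disk(grammar_id)
            return self.cache[grammar_id]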
• Household Parameter File is Loaded

The next step in processing the voice sample is to ensure that the parameters associated with this household are already cached in the specific voice CPU's RAM. If these parameters are not present, then the least-recently-used parameter cache entry is evicted from the cache.

To do this, the oldest cache entry on this voice CPU is first examined to see if it has been modified. If so, the cache entry will be written to disk, and the cache slot is then declared vacant. Next, the household speech parameters associated with the new speech sample are loaded into the vacated cache block. During the relatively long access times needed to load a new set of household parameters from disk (and optionally to write the old parameters to disk), the current voice sample will be held in the input buffer in a "waiting" state.

Only after the new household speech parameters have been loaded into the targeted voice CPU will the voice sample be moved into the work queue for the voice CPU. In this manner, the voice CPU is not held off from processing other voice requests during lengthy disk accesses. Instead, the voice CPU will continue to process other voice samples associated with households whose parameters are already in cache.
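A minimal sketch of that eviction policy, with write-back of modified entries, is shown below; the class name ParameterCache, the dirty-flag representation, and the read_disk/write_disk callables are assumptions of this sketch.

    # Hypothetical sketch of the household parameter cache: the oldest entry is
    # written back to disk only if it was modified, and samples whose parameters
    # are still loading stay in a "waiting" state in the input buffer.
    from collections import OrderedDict

    class ParameterCache:
        def __init__(self, slots, read_disk, write_disk):
            self.slots, self.read_disk, self.write_disk = slots, read_disk, write_disk
            self.entries = OrderedDict()             # household -> (params, dirty flag)

        def ensure_cached(self, household):
            if household in self.entries:
                self.entries.move_to_end(household)
                return "ready"
            if len(self.entries) >= self.slots:
                victim, (params, dirty) = self.entries.popitem(last=False)
                if dirty:                            # write back only if modified
                    self.write_disk(victim, params)
            # Disk access is slow; the caller keeps this voice sample "waiting"
            # and the voice CPU continues with samples that are already cached.
            self.entries[household] = (self.read_disk(household), False)
            return "loaded"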
• Assignment to a Voice CPU

Once the voice parameters associated with a speech sample are finally cached in a voice CPU, the speech sample is assigned to the voice CPU by placing a descriptor for the voice sample on the target voice CPU's "work" queue. As speech samples are processed, they are removed from the front of the work queue by the voice CPU.

Eventually, the voice CPU will reach the location of the current input sample. Once this occurs, the speech sample is transferred into the voice CPU's local memory under DMA control, and the status of this voice sample is changed to "Next". This transfer occurs in parallel with the processing of the prior speech sample, ensuring that voice CPU utilization is maximized.

Once this transfer is complete, and the voice CPU completes processing of the prior sample, the status of this voice sample is changed to "Current", and the voice recognition engine begins processing this sample.
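The sketch below illustrates this queue discipline in Python; the class name VoiceCpuQueue and the dma_copy/recognize callables are assumptions, and the sequential code only marks where the real system would overlap the transfer with recognition.

    # Hypothetical sketch of the per-CPU work queue with "Next"/"Current" staging.
    from collections import deque

    class VoiceCpuQueue:
        def __init__(self, dma_copy, recognize):
            self.work = deque()            # descriptors for cached-and-ready samples
            self.next_sample = None        # prefetched into local memory, awaiting its turn
            self.dma_copy = dma_copy
            self.recognize = recognize

        def enqueue(self, descriptor):
            self.work.append(descriptor)

        def run_once(self):
            if self.next_sample is None and self.work:
                # In hardware this DMA would overlap the previous recognition pass.
                self.next_sample = self.dma_copy(self.work.popleft())
            current, self.next_sample = self.next_sample, None   # "Next" becomes "Current"
            if current is not None:
                return self.recognize(current)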
• Deadlock Elimination

Successful processing of a speech sample requires that both the proper grammar and the proper household parameter information be simultaneously loaded into a voice CPU. The possibility exists of a race condition in which a recently-loaded grammar or speech parameter file is evicted prior to its use, in the process of loading the grammar or speech parameters for the current voice sample.

To eliminate this race condition, the total number of speech samples sitting in the waiting and working queues of a voice CPU may not exceed the number of cache entries in the voice CPU.
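Expressed as a simple admission check (an illustration only; the function name is hypothetical):

    # Hypothetical sketch of the admission rule above: accepting a new sample
    # never makes (waiting + working) exceed the number of cache entries, so a
    # grammar or parameter file loaded for one sample cannot be evicted by
    # another queued sample before it is used.
    def can_accept(waiting: int, working: int, cache_entries: int) -> bool:
        return (waiting + working) < cache_entries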
• Speaker Identification

The first step in recognizing the current speech sample is to determine which individual person pronounced the current utterance. To do this, the "Speaker Identification" software module running on the targeted voice CPU compares the vocal characteristics of this speech sample with the characteristics of the speakers who have been previously identified in this household; these voice "templates" are an important component of the speech parameters that are cached in the CPU.

In the vast majority of utterances, the incoming speech sample will match the characteristics of a previously-identified speaker. When this occurs, the speech sample is passed on to the next phase, speech recognition.

If the speech sample is not identified with an existing speaker, then a "new user" routine is invoked, enabling a new user to be associated with this household. This routine records the new individual's speech parameters in this household's speech parameters, so that during subsequent utterances, the new speaker will be identified by the speaker identification process.
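A minimal Python sketch of that decision follows; the scoring function, the threshold, and the function name are placeholders assumed for illustration, since the disclosure does not specify how the vocal characteristics are compared.

    # Hypothetical sketch of speaker identification against the household's
    # cached voice templates.  Returning None triggers the "new user" routine.
    def identify_speaker(sample_features, household_templates, score, threshold=0.5):
        """Return the best-matching household member, or None if no template
        matches well enough and a new user should be enrolled."""
        best_member, best_score = None, threshold
        for member, template in household_templates.items():
            s = score(sample_features, template)       # higher = closer match
            if s > best_score:
                best_member, best_score = member, s
        return best_member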
• Speech Recognition

The inputs to the speech recognition software module are a speech sample, an individual user's speech parameters, and the grammar to be recognized. The speech engine determines the most likely utterance based on statistical analysis, and returns a text string corresponding to the utterance. As a statistical process, this matching is probabilistic: along with the returned text string, the speech engine will also return a "percentage of match likelihood". This enables different applications to respond differently based on the calculated confidence in the recognition result.

For recognition results having a low "cost", such as a request to display listings for a particular movie, lower confidence criteria need apply. For recognition results with a high cost to the user, such as a request to purchase a movie, higher confidence thresholds may be required (furthermore, purchase verification will be requested).

When recognition accuracy is particularly low, and the voice recognition engine determines partial matches to more than one possible phrase, the engine will return the text of several possible matches. This process, known as "N-Best", enables an application, or the user, to select from several alternative recognition results.
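By way of illustration, the following Python sketch shows cost-dependent confidence handling with an N-best fallback; the threshold values, the action labels, and the function name are assumptions of the sketch, not figures given in the disclosure.

    # Hypothetical sketch: higher-cost actions demand higher confidence, and a
    # low-confidence result falls back to presenting the N-best alternatives.
    THRESHOLDS = {"browse": 0.50, "purchase": 0.90}     # higher cost -> stricter threshold

    def handle_result(action_cost, n_best):
        """n_best: list of (text, confidence) pairs, best first."""
        text, confidence = n_best[0]
        if confidence >= THRESHOLDS[action_cost]:
            return ("accept", text)
        if action_cost == "purchase" and confidence >= THRESHOLDS["browse"]:
            return ("confirm", text)                     # ask the user to verify the purchase
        # Recognition too uncertain: present the alternatives for the user to pick.
        return ("choose", [t for t, _ in n_best])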
• Voice recording

In cases where a transaction will result in a charge to the use
