Multimodal Text Entry on Mobile Devices
Bo-June (Paul) Hsu¹, Milind Mahajan², Alex Acero²
¹MIT & Microsoft Research
²Microsoft Research

1 Motivation
Text entry on mobile phones is growing in popularity due to increasing use of applications such
as SMS and e-mail. Mobile devices in general lack a keyboard as convenient as the one on a
desktop computer, and mobile phones in particular tend to have only a numeric keypad on which
multiple letters map to the same key. Users currently enter text on a numeric keypad with
methods such as multi-tap (“this” = 8 44 444 7777) or Tegic Communications’ T9 (“this” = 8447).
Novice users of these methods achieve text entry rates of 5-10 words per minute [1].
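The two keypad encodings quoted above can be reproduced with a short sketch. It assumes the standard 12-key letter layout (2 = abc through 9 = wxyz), which the text does not spell out; the names `KEYPAD`, `multitap`, and `t9` are illustrative only.

```python
# Standard 12-key letter layout; assumed here, not given in the text.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Invert the layout: letter -> (key, 1-based position of the letter on that key).
LETTER_TO_KEY = {
    letter: (key, pos)
    for key, letters in KEYPAD.items()
    for pos, letter in enumerate(letters, start=1)
}


def multitap(word):
    """Multi-tap: press a key once per position of the letter on that key."""
    presses = (LETTER_TO_KEY[c] for c in word.lower())
    return " ".join(key * pos for key, pos in presses)


def t9(word):
    """T9-style entry: press each letter's key once; a dictionary resolves ambiguity."""
    return "".join(LETTER_TO_KEY[c][0] for c in word.lower())


print(multitap("this"))  # 8 44 444 7777
print(t9("this"))        # 8447
```

For “this”, multi-tap needs ten key presses while the single-press encoding needs four, at the cost of mapping several words to the same key sequence.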
2 Multi-modal combination of speech & keypad
Speech has a high communication bandwidth, estimated at about 250 words per minute [2].
However, text entry throughput using automatic speech recognition (ASR) is much lower in
practice because the user must spend time checking for and correcting ASR errors, which are
inevitable with current state-of-the-art ASR systems. We will demonstrate a system that
combines speech and keypad input to improve the overall text entry experience for users of
mobile devices.
3 Overview of operation
The user presses and holds the microphone button on the side of the device and speaks a
sentence. The screen then shows the best hypothesis and a selection list of alternates for
only the first word of the sentence. If the best hypothesis word is correct, the user presses
the OK button (a designated confirmation key on the keypad). Otherwise, if the desired word is
in the alternates list, the user navigates to it using the up-down keys and presses the OK
button to confirm it. Alternatively, the user starts entering the word using the keypad. The
algorithm then re-computes the best hypothesis word and the alternates list using the
a posteriori probability obtained by combining the information from the following sources:
• the keypad entries for the prefix of the word
• the speech recognition result lattice
• the words before the current word, which have already been corrected
• the language model
After the user correctly enters the desired word, the process is repeated for the
subsequent words in the sentence.
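The re-ranking step described above can be sketched as follows. The text does not give the combination formula, so this sketch assumes a simple product of an ASR score and a bigram language model probability, with the keypad prefix acting as a hard filter. The function names, the flat `lattice_scores` dictionary standing in for a real lattice, and all numbers are hypothetical.

```python
# Standard 12-key letter layout (assumed; the text does not list it).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {c: key for key, letters in KEYPAD.items() for c in letters}


def matches_prefix(word, keys):
    """True if the first len(keys) letters of `word` lie on the pressed keys."""
    if len(word) < len(keys):
        return False
    return all(LETTER_TO_KEY.get(c) == k for c, k in zip(word, keys))


def rank_candidates(keys, lattice_scores, lm, prev_word, floor=1e-6):
    """Return [(word, posterior)] sorted best-first.

    keys           -- digits typed so far for the current word (may be "")
    lattice_scores -- hypothetical dict: word -> ASR score at this position
    lm             -- hypothetical dict: (prev_word, word) -> P(word | prev_word)
    prev_word      -- the last word the user has already confirmed
    """
    scores = {}
    for word, asr_score in lattice_scores.items():
        if not matches_prefix(word, keys):
            continue  # keypad evidence rules this word out
        scores[word] = asr_score * lm.get((prev_word, word), floor)
    total = sum(scores.values())
    if total == 0:
        return []
    return sorted(((w, s / total) for w, s in scores.items()),
                  key=lambda item: item[1], reverse=True)


# Invented numbers for the first word of an utterance ("<s>" marks sentence start).
lattice_scores = {"wreck": 0.5, "recognize": 0.3, "rack": 0.2}
lm = {("<s>", "wreck"): 0.1, ("<s>", "recognize"): 0.1, ("<s>", "rack"): 0.1}

before_typing = rank_candidates("", lattice_scores, lm, "<s>")
after_one_key = rank_candidates("7", lattice_scores, lm, "<s>")  # "r" is on key 7
```

With these invented scores, “wreck” ranks first before any key is pressed; after the single key press 7, “wreck” (which starts on key 9) drops out and “recognize” becomes the top hypothesis.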

Exhibit 1021
Page 01 of 02

4 Key Features
• Combination of speech and keypad
ASR is inherently ambiguous and results in a probability distribution over word sequences.
Entering a prefix of a word with a keypad will, in general, not lead to a unique word hypothesis
either. However, since the ambiguities of the two modalities are orthogonal to some extent,
combining them improves the overall performance. As an illustrative example, the words “good”
and “home” both correspond to the same key sequence 4663# under T9 but are not highly
confusable for ASR.
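A small sketch can make the “good”/“home” collision concrete. It assumes the standard 12-key letter layout and omits the trailing # of the key sequence; the vocabulary (including the extra colliding words “gone” and “hood”) and the ASR scores are invented for illustration.

```python
from collections import defaultdict

# Standard 12-key letter layout (assumed).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {c: key for key, letters in KEYPAD.items() for c in letters}


def key_sequence(word):
    """Digits pressed when each letter's key is pressed exactly once."""
    return "".join(LETTER_TO_KEY[c] for c in word.lower())


# Toy vocabulary, grouped by key sequence to expose keypad ambiguity.
vocabulary = ["good", "home", "gone", "hood", "speech"]
by_keys = defaultdict(list)
for word in vocabulary:
    by_keys[key_sequence(word)].append(word)


def disambiguate(keys, asr_scores):
    """Among words sharing `keys`, pick the one the (hypothetical) ASR scores highest."""
    return max(by_keys[keys], key=lambda w: asr_scores.get(w, 0.0))


# Invented ASR scores: the utterance sounded much more like "home" than "good".
asr_scores = {"home": 0.80, "hood": 0.10, "gone": 0.06, "good": 0.04}
best = disambiguate("4663", asr_scores)
```

The keypad alone cannot separate the four words that share 4663, while the speech evidence alone might confuse “home” with other similar-sounding words; together they leave “home” as the clear choice in this toy setup.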
• Continuous speech with sequential commit
At first glance, combining continuous speech ASR with a word-by-word correction mechanism
(sequential commit) may appear sub-optimal and is certainly counter-intuitive. However, this
is a key innovation which we believe contributes to a better overall user experience given
the current state of the art for ASR on mobile devices.
ASR errors often involve segmentation errors. Consider the famous (though hypothetical)
example of the phrase “recognize speech” being recognized by ASR as “wreck a nice beach”.
Showing the full ASR result leads to difficult choices for the correction interface. Which words
should the user select for correction? When the user attempts to correct the word “wreck”, should
the rest of the phrase change? How would the user feel about other words changing as
a side-effect of corrections?
We avoid all these issues by presenting word-by-word alternates sequentially from left to right.
In this hypothetical example, the user would perhaps select the word “recognize” as the second
alternate to the word “wreck”, and our algorithm would probably present “speech” as the top
hypothesis for the next word given the context of the previous correction.
Word-by-word text entry is already a familiar user interface for users of mobile devices.
Providing the same user interface for both speech-assisted and keypad-only text entry lets
users switch seamlessly between the two modes. This is advantageous because, for social
reasons, users may not use speech all the time even if it leads to better throughput.
• ASR latency hiding
On mobile devices, ASR result latency can be a problem. Our user interface with sequential
commit allows us to take advantage of intermediate ASR hypotheses and start presenting
word hypotheses to the user before the ASR has completely finished.
• Graceful degradation
In creating the word hypotheses, we consider the language model (a priori word probabilities
in context) in addition to the ASR lattice. This allows us to hypothesize words which are not
in the ASR lattice and leads to more graceful degradation in the presence of ASR errors.
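One way to picture the graceful-degradation behaviour is a linear interpolation between lattice scores and language model probabilities, so that a word missing from the lattice still receives a nonzero score. The interpolation itself, the weight of 0.8, the function names, and all numbers are assumptions made for this sketch; the text does not state how the two sources are actually combined.

```python
# Standard 12-key letter layout (assumed).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {c: key for key, letters in KEYPAD.items() for c in letters}


def matches_prefix(word, keys):
    """True if the first len(keys) letters of `word` lie on the pressed keys."""
    if len(word) < len(keys):
        return False
    return all(LETTER_TO_KEY.get(c) == k for c, k in zip(word, keys))


def rank_with_lm_fallback(keys, lattice_post, lm_probs, lattice_weight=0.8):
    """Rank words from the lattice OR the language model, best-first.

    lattice_post -- hypothetical dict: word -> posterior from the ASR lattice
    lm_probs     -- hypothetical dict: word -> P(word | confirmed context)
    """
    candidates = set(lattice_post) | set(lm_probs)
    scored = {
        w: lattice_weight * lattice_post.get(w, 0.0)
           + (1.0 - lattice_weight) * lm_probs.get(w, 0.0)
        for w in candidates
        if matches_prefix(w, keys)
    }
    return sorted(scored, key=scored.get, reverse=True)


# Invented numbers: the ASR missed "recognize" entirely.
lattice_post = {"wreck": 0.7, "rack": 0.3}
lm_probs = {"wreck": 0.1, "rack": 0.3, "recognize": 0.6}

no_keys = rank_with_lm_fallback("", lattice_post, lm_probs)
two_keys = rank_with_lm_fallback("73", lattice_post, lm_probs)  # "r", then "e"
```

In this toy run the lattice words rank first before any key is pressed, but after the two key presses 7 and 3 only “recognize” remains, even though the ASR lattice never contained it.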
References:
[1] C. L. James & K. M. Reischel, “Text input for mobile devices: comparing model prediction to actual
performance”, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2001.
[2] M. Kolsch & M. Turk, “Keyboards without Keyboards: A Survey of Virtual Keyboards”, University of
California at Santa Barbara Technical Report, 2002.
