Multimodal User Interfaces in the Open Agent Architecture

Douglas B. Moran
Adam J. Cheyer
Luc E. Julia
David L. Martin
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025 USA
+1 415 859 6486
{moran,cheyer,julia,martin}@ai.sri.com

Sangkyu Park
Artificial Intelligence Section
Electronics and Telecommunications Research Institute (ETRI)
161 Kajong-Dong, Yusong-Gu, Taejon 305-350 KOREA
+82 42 860 5641
skpark@com.etri.re.kr

ABSTRACT

The design and development of the Open Agent Architecture (OAA)¹ system has focused on providing access to agent-based applications through an intelligent, cooperative, distributed, and multimodal agent-based user interface. The current multimodal interface supports a mix of spoken language, handwriting and gesture, and is adaptable to the user's preferences, resources and environment. Only the primary user interface agents need run on the local computer, thereby simplifying the task of using a range of applications from a variety of platforms, especially low-powered computers such as Personal Digital Assistants (PDAs). An important consideration in the design of the OAA was to facilitate mix-and-match: to facilitate the reuse of agents in new and unanticipated applications, and to support rapid prototyping by facilitating the replacement of agents by better versions. The utility of the agents and tools developed as part of this ongoing research project has been demonstrated by their use as infrastructure in unrelated projects.

Keywords: agent architecture, multimodal, natural language, speech, gesture, handwriting

INTRODUCTION

A major component of our research on multiagent systems is in the user interface to large communities of agents. We have developed agent-based multimodal user interfaces using the same agent architecture used to build the back ends of these applications. We describe these interfaces and the larger architecture, and outline some of the applications that have been built using this architecture and interface agents.

OVERVIEW OF OPEN AGENT ARCHITECTURE

The Open Agent Architecture (OAA) is a multiagent system that focuses on supporting the creation of applications from agents that were not designed to work together, thereby facilitating the wider reuse of the expertise embodied by an agent. Part of this focus is the user interface to these applications, which can be viewed as supporting the access of human agents to the automated agents. Key attributes of the OAA are:

• Open: The OAA supports agents written in multiple languages and on multiple platforms. Currently supported languages are C, Prolog, Lisp, Java, Microsoft's Visual Basic and Borland's Delphi. Currently supported platforms are PCs (Windows 3.1 and 95), Sun workstations (Solaris 1.1 and 2.x) and SGIs.

• Distributed: The agents that compose an application can run on multiple platforms.

• Extensible: Agents can be added to the system while it is running, and their capabilities will become immediately available to the rest of the agents. Similarly, agents can be dynamically removed from the system (intentionally or not).

• Mobile: OAA-based applications can be run from a lightweight portable computer (or PDA) because only the user interface agents need run on the portable. They provide the user with access to a range of agents running on other platforms.

• Collaborative: The user interface is implemented with agents, and thus the user appears to be just another agent to the automated agents. This greatly simplifies creating systems where multiple humans and automated agents cooperate.

• Multiple Modalities: The user interface supports handwriting, gesture and spoken language in addition to the traditional graphical user interface modalities.

• Multimodal Interaction: Users can enter commands with a mix of modalities, for example, a spoken command in which the object to be acted on is identified by a pen gesture (or other graphical pointing operation).

The OAA has been influenced by work being done as part of DARPA's I3 (Intelligent Integration of Information) program (http://isx.com/pub/I3) and the Knowledge Sharing Effort (http://www-ksl.stanford.edu/knowledge-sharing/) [13].

THE USER INTERFACE

The User Interface Agent

The user interface is implemented with a set of agents that have at their logical center an agent called the User Interface (UI) Agent. The User Interface Agent manages the various modalities and applies additional interpretation to those inputs as needed. Our current system supports speech, handwriting and pen-based gestures in addition to the conventional keyboard and mouse inputs. When speech input is detected, the UI Agent sends a command to the Speech Recognition agent to process the audio input and to return the corresponding text. Three modes are supported for speech input: open microphone, push-to-talk, and click-to-start-talking. Spoken and handwritten inputs can be treated as either raw text, or interpreted by a natural language understanding agent.

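To make the division of labor concrete, the sketch below (hypothetical Python; the class names, method signatures and mode names are our own illustration, not the OAA's actual interfaces) shows a UI agent that routes raw input to the recognition agent for its modality and then either returns the raw text or hands it to a natural language agent.

    from enum import Enum

    class SpeechMode(Enum):
        OPEN_MICROPHONE = 1        # recognizer listens continuously
        PUSH_TO_TALK = 2           # audio is captured while a button is held
        CLICK_TO_START = 3         # a click opens the microphone until silence

    class UIAgent:
        """Illustrative stand-in for the OAA User Interface (UI) Agent."""

        def __init__(self, speech_agent, handwriting_agent, nl_agent,
                     speech_mode=SpeechMode.PUSH_TO_TALK):
            self.speech_agent = speech_agent            # returns text for audio
            self.handwriting_agent = handwriting_agent  # returns text for ink
            self.nl_agent = nl_agent                    # optional interpretation
            self.speech_mode = speech_mode

        def on_input(self, modality, data, interpret=True):
            # Route raw input to the recognition agent for its modality.
            if modality == "speech":
                text = self.speech_agent.recognize(data, mode=self.speech_mode)
            elif modality == "handwriting":
                text = self.handwriting_agent.recognize(data)
            else:                                       # keyboard input is text
                text = data
            # Treat the result as raw text, or pass it to the NL agent.
            return self.nl_agent.parse(text) if interpret else text
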
There are two basic styles of user interface. The first style parallels the traditional graphical user interface (GUI) for an application: the user selects an application and is presented with a window that has been designed for the application implemented by that agent and that is composed of the familiar GUI-style items. In this style of interface, the application is typically implemented as a primary agent, with which the user interacts, and a number of supporting agents that are used by the primary agent, and whose existence is hidden from the user. When text entry is needed, the user may use handwriting or speech instead of the keyboard, and the pen may be used as an alternative to the mouse. Because the UI Agent handles all the alternate modalities, the applications are isolated from the details of which modalities are being used. This simplifies the design of the applications, and simplifies adding new modalities.

In the second basic style of interface, not only is there no primary agent, the individual agents are largely invisible to the user, and the user's requests may involve the cooperative actions of multiple agents. In the systems we have implemented, this interface is based on natural language (for example, English), and is entered with either speech or handwriting. When the UI Agent detects speech or pen-based input, it invokes a speech recognition agent or handwriting recognition agent, and sends the text returned by that agent to a natural language understanding agent, which produces a logical-form representation of the user's request. This logical form is then passed to a Facilitator agent, which identifies the subtasks and delegates them to the appropriate application agents. For example, in our Map-based Tourist Information application for the city of San Francisco, the user can ask for the distance between a hotel and a sightseeing destination. The locations of the two places are in different databases, which are managed by different agents, and the distance calculation is performed by yet another agent.

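As a rough sketch of this second style (hypothetical Python; the tuple-based logical form, the agent callables and the place names are illustrative assumptions, not the OAA's actual ICL), the distance request might be decomposed into subtasks that are served by different agents:

    # Toy logical form for a request like "What is the distance between this
    # hotel and Coit Tower?" (the names and coordinates below are made up).
    request = ("distance",
               ("location", "hotel", "Hotel Example"),
               ("location", "attraction", "Coit Tower"))

    class Facilitator:
        def __init__(self):
            self.providers = {}              # capability name -> agent callable

        def register(self, capability, agent):
            self.providers[capability] = agent

        def solve(self, goal):
            head, *args = goal
            # Solve embedded subgoals first, then delegate the outer goal.
            resolved = [self.solve(a) if isinstance(a, tuple) else a for a in args]
            return self.providers[head](*resolved)

    facilitator = Facilitator()
    # A location agent (standing in for the separate databases) and a
    # distance agent that only does the arithmetic.
    facilitator.register("location",
                         lambda kind, name: {"hotel": (0.0, 0.0),
                                             "attraction": (3.0, 4.0)}[kind])
    facilitator.register("distance",
                         lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5)

    print(facilitator.solve(request))        # -> 5.0
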
These two basic styles of interface can be combined in a single interface. In our Office Assistant application, the user is presented with a user interface based on the Rooms metaphor and is able to access conventional applications such as e-mail, calendar, and databases in the familiar manner. In addition, there is a subwindow for spoken or written natural language commands that can involve multiple agents.

A major focus of our research is multimodal inputs, typically a mix of gesture/pointing with spoken or handwritten language. The UI agent manages the interpretation of the individual modalities and passes the results to a Modality Coordination agent, which returns the composite query, which is then passed to the Facilitator agent for delegation to the appropriate application agents (described in subsequent sections).

Speech Recognition

We have used different speech recognition systems, substituting them to meet different criteria. We use research systems developed by another laboratory in our organization (http://www-speech.sri.com/) [3] and by a commercial spin-off from that laboratory.² We are currently evaluating other speech recognizers, and will create agents to interface to their application programming interfaces (APIs) if they satisfy the requirements for new applications being considered.

Natural Language Understanding

A major advantage of using an agent-based architecture is that it provides simple mix-and-match for the components. In developing systems, we have used three different natural language (NL) systems: a simple one, based on Prolog DCG (Definite Clause Grammar); then an intermediate one, based on CHAT [16]; and finally, our most capable research system, GEMINI [6, 7]. The ability to trivially substitute one natural language agent for another has been very useful in rapid prototyping of systems. The DCG-based agent is used during the early stages of development because grammars are easily written and modified. Writing grammars for the more sophisticated NL agents requires more effort, but provides better coverage of the language that real users are likely to use, and hence we typically delay upgrading to the more sophisticated agents until the application crosses certain thresholds of maturity and usage.

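The mix-and-match property can be illustrated with a small sketch (hypothetical Python; the OAA's real NL agents are the Prolog DCG, CHAT and GEMINI systems cited above, none of which is shown here): as long as every NL agent maps text to the same logical-form convention, one can be substituted for another without touching the rest of the application.

    import re

    class SimpleGrammarNL:
        """Toy stand-in for the early, easily modified grammar-based agent."""
        def parse(self, text):
            m = re.match(r"show (?:me )?(?:the )?(\w+?)s? with a (\w+)", text.lower())
            if m:
                return ("show", m.group(1), ("has", m.group(2)))
            raise ValueError("out of coverage: " + text)

    class WideCoverageNL:
        """Placeholder for a more capable agent with broader coverage."""
        def parse(self, text):
            ...  # same input, same logical-form convention, wider grammar

    def handle(nl_agent, text):
        return nl_agent.parse(text)   # the caller never knows which agent it got

    print(handle(SimpleGrammarNL(), "Show hotels with a pool"))
    # -> ('show', 'hotel', ('has', 'pool'))
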
¹Open Agent Architecture and OAA are trademarks of SRI International. Other brand names and product names herein are trademarks and registered trademarks of their respective holders.

²Nuance Communications (formerly Corona Corp.), Building 110, 333 Ravenswood Avenue, Menlo Park, CA 94025 (domain: coronacorp.com).

Pen Input

We have found that including a pen in the user interface has several significant advantages. First, the gestures that users employ with a pen-based system are substantially richer than those employed with other pointing and tracking systems (e.g., a mouse). Second, handwriting is an important adjunct to spoken language. Speech recognizers (including humans) can have problems with unfamiliar words (e.g., new names). Users can use the pen to correct misspelled words, or may even anticipate the problem and switch from speaking to handwriting. Third, our personal experience is that when a person who has been using a speech-and-gesture interface faces an environment where speech is inappropriate, replacing speech with handwriting is more natural.

Using 2D gestures in the human-computer interaction holds promise for recreating the pen-and-paper situation where the user is able to quickly express visual ideas while she or he is using another modality such as speech. However, to successfully attain a high level of human-computer cooperation, the interpretation of on-line data must be accurate and fast enough to give rapid and correct feedback to the user.

The gesture-recognition engine used in our application is fully described in [9] as the early recognition process. There is no constraint on the number of strokes. The latest evaluations gave better than 96% accuracy, and the recognition was performed in less than half a second on a PC 486/50, satisfying what we judge is required in terms of quality and speed.

In most applications, this engine shares pen data with a handwriting recognizer. The use of the same medium to handle two different modalities is a source of ambiguities that are resolved by a competition between both recognizers in order to determine whether the user wrote (a sentence or a command) or produced a gesture. A remaining problem is to resolve mixed input (the user draws and writes in the same set of strokes).

The main strength of the gesture-recognition engine is its adaptability and reusability. It allows the developer to easily define the set of gestures according to the application. Each gesture is actually described with a set of parameters such as the number of directions, a broken segment, and so forth. Adding a new gesture consists of finding the description for each parameter. If a conflict appears with an existing object, the discrimination is done by creating a new parameter. For a given application, as few as four parameters are typically required to describe and discriminate the set of gestures.

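A sketch of this scheme (hypothetical Python; the parameter names below are invented for illustration, since the paper does not enumerate the engine's actual parameters) describes each gesture as a few discrete parameters and classifies a stroke set by matching them:

    # Each gesture is described by a handful of discrete parameters.
    # The parameter names here are illustrative, not the engine's real ones.
    GESTURES = {
        "arrow_left": {"strokes": (2, 3), "directions": 2, "closed": False},
        "circle":     {"strokes": (1, 1), "directions": 4, "closed": True},
        "cross_out":  {"strokes": (2, 2), "directions": 2, "closed": False},
    }

    def classify(features):
        """Return the names of gestures whose description matches the features."""
        matches = []
        for name, desc in GESTURES.items():
            lo, hi = desc["strokes"]
            if (lo <= features["strokes"] <= hi
                    and desc["directions"] == features["directions"]
                    and desc["closed"] == features["closed"]):
                matches.append(name)
        # More than one match signals a conflict: discriminate by adding a
        # new parameter to the descriptions, as described above.
        return matches

    print(classify({"strokes": 1, "directions": 4, "closed": True}))  # ['circle']
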
We can use any handwriting recognizer compatible with Microsoft's PenWindows.³

Modality Coordination Agent

Our interface supports a rich set of interactions between natural language (spoken, written, or typed) and gesturing (e.g., pointing, circling), much richer than that seen in the put-that-there systems. Deictic words (e.g., this, them, here) can be used to refer to many classes of objects, and also can be used to refer to either individuals or collections of individuals.

The Modality Coordination (MC) agent is responsible for combining the inputs in the different modalities to produce a single meaning that matches the user's intention. It is responsible for resolving references, for filling in missing information for an incoming request, and for resolving ambiguities by using contexts, equivalence or redundancy.

Taking contexts into account implies establishing a hierarchy of rules between them. The importance of each context and the hierarchy may vary during a single session. In the current system, missing information is extracted from the dialogue context (there is no graphical context or interaction context).

When the user says "Show me the photo of this hotel" and simultaneously points with the pen to a hotel, the MC agent resolves the reference based on that gesture. If no hotel is explicitly indicated, the MC agent searches the conversation context for an appropriate referent (for example, the hotel may have been selected by a gesture in the previous command). If there is no selected hotel in the current context, the MC Agent will wait a certain amount of time (currently 2 to 3 seconds) before asking the user to identify the hotel intended. This short delay is designed to accommodate different synchronizations of speech and gesture: different users (or a single user in different circumstances) may point before, during or just after speaking.

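The resolution strategy just described can be summarized in a short sketch (hypothetical Python; only the 2 to 3 second delay comes from the text, everything else is illustrative):

    import time

    GESTURE_WINDOW_SECONDS = 2.5   # the paper reports a delay of 2 to 3 seconds

    def resolve_deictic(pending_gesture, dialogue_context, ask_user,
                        wait_for_gesture):
        """Resolve a reference such as "this hotel" to a concrete object."""
        # 1. A gesture made with the utterance wins outright.
        if pending_gesture is not None:
            return pending_gesture
        # 2. Otherwise look for a suitable referent in the dialogue context
        #    (e.g., a hotel selected by a gesture in the previous command).
        if dialogue_context:
            return dialogue_context[-1]
        # 3. Otherwise wait briefly: the user may point just after speaking.
        deadline = time.monotonic() + GESTURE_WINDOW_SECONDS
        while time.monotonic() < deadline:
            gesture = wait_for_gesture(timeout=deadline - time.monotonic())
            if gesture is not None:
                return gesture
        # 4. Only then fall back to asking the user which object was meant.
        return ask_user("Which hotel did you mean?")
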
In another example, the user says "Show me the distance from the hotel to here" while pointing at a destination. The previous queries have resulted in a single hotel being focused upon, and the MC agent resolves "the hotel" from this context.⁴ The gesture provides the MC agent with the referent of "here". Processing the resulting query may involve multiple agents; for example, the locations of hotels and sightseeing destinations may well be in different databases, and these locations may be expressed in different formats, requiring another agent to resolve the differences and then compute the distance.

Flexible Sets of Modalities

The OAA allows the user maximum flexibility in what modalities will be used. Sometimes, the user will be on a computer that does not support the full range of modalities (e.g., no pen or handwriting recognition). Sometimes, the user's environment limits the choice of modalities; for example, spoken commands are inappropriate in a meeting where someone else is speaking, whereas in a moving vehicle, speech is likely to be more reliable than handwriting. And sometimes, the user's choice of modalities is influenced by the data being entered [14].

With this flexibility, the telephone has become our low-end user interface to the system. For example, we can use the telephone to check on our appointments, and we use the telephone to notify us of the arrival and content of important e-mail when we are away from our computers.

³Our preferred recognizer is Handwriter for Windows from Communication Intelligence Corp. (CIC) of Redwood City, CA.

⁴User feedback about which items are in focus (contextually) is provided by graphically highlighting them.

This flexibility has also proven quite advantageous in accommodating hardware failure. For example, moving the PC for one demonstration of the system shook loose a connection on the video card. The UI agent detected that no monitor was present, and used the text-to-speech agent to generate the output that was normally displayed graphically.

In another project's demonstration (CommandTalk), the designated computer was nonfunctional, and an underpowered computer had to be substituted. Using the OAA's innate capabilities, the application's components were distributed to other computers on the net. However, the application had been designed and tested using the microphone on the local computer, and the substitute had none. The solution was to add the Telephone agent that had been created for other applications: it automatically replaced the microphone as the input to the speech recognizer.

Learning the System

One of the well-known problems with systems that utilize natural language is in communicating to the user what can and cannot be said. A good solution to this is an open research problem. Our approach has been to use the design of the GUI to help illustrate what can be said: all the simple operations can also be invoked through traditional GUI items, such as menus, that cover much of the vocabulary.

OAA AGENTS

Overview

OAA agents communicate with each other in a high-level logical language called the Interagent Communication Language (ICL). ICL is similar in style and functionality to the Knowledge Query and Manipulation Language (KQML) of the DARPA Knowledge Sharing Effort. The differences are a result of our focus on the user interface: ICL was designed to be compatible with the output of our natural language understanding systems, thereby simplifying transforming a user's query or command into one that can be handled by the automated agents.

We have developed an initial set of tools (the Agent Development Toolkit) to assist in the creation of agents [11]. These tools guide the developer through the process, and automatically generate code templates from specifications (in the style of various commercial CASE tools). These tools are implemented as OAA agents, so they can interact with, and build upon, existing agents. The common agent support routines have been packaged as libraries, with coordinated libraries for the various languages that we support.⁵

These tools support building both entirely new agents and creating agents from existing applications, including legacy systems. These latter agents are called wrappers (or transducers); they convert between ICL and the application's API (or other interface if there is no API).

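A wrapper can be pictured as a thin translation layer (hypothetical Python; the calendar API and the request vocabulary below are invented for illustration and are not OAA interfaces):

    class LegacyCalendar:
        """Stand-in for an existing application with its own API."""
        def lookup(self, person, when):
            return {"person": person, "when": when, "room": "Conference Room B"}

    class CalendarWrapperAgent:
        """Converts between facilitator requests and the legacy API."""
        CAPABILITIES = ["appointment"]     # advertised when registering

        def __init__(self, app):
            self.app = app

        def handle(self, request):
            # A request arrives in the shared vocabulary, e.g.
            # ("appointment", "moran", "14:00"); translate it to the API call.
            _, person, when = request
            record = self.app.lookup(person, when)
            # Translate the API's answer back into the shared vocabulary.
            return ("at", person, record["room"], when)

    agent = CalendarWrapperAgent(LegacyCalendar())
    print(agent.handle(("appointment", "moran", "14:00")))
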
The Facilitator Agent

In the OAA framework, the Facilitator agents play a key role. When an agent is added to the application, it registers its capabilities with the Facilitator. Part of this registration is the natural language vocabulary that can be used to talk about the tasks that the agent can perform. When an agent needs work done by other agents within the application, it sends a request to the Facilitator, which then delegates it to an agent, or agents, that have registered that they can handle the needed tasks. The ability of the Facilitator to handle complex requests from agents is an important attribute of the OAA design. The goal is to minimize the information and assumptions that the developer must embed in an agent, thereby making it easier to reuse agents in disparate applications.

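In outline (hypothetical Python; the registration record and the matching rule are simplifications, not the OAA's actual registration protocol), registration and delegation look roughly like this:

    class Facilitator:
        def __init__(self):
            self.registry = []      # (task name, vocabulary, agent) triples

        def register(self, task, vocabulary, agent):
            # Agents declare what they can do and the words users may use for it.
            self.registry.append((task, set(vocabulary), agent))

        def delegate(self, request):
            task, *args = request
            # Send the request to every agent that registered for the task.
            handlers = [agent for t, _, agent in self.registry if t == task]
            if not handlers:
                raise LookupError("no agent registered for " + task)
            return [agent(*args) for agent in handlers]

    facilitator = Facilitator()
    facilitator.register("weather", {"forecast", "rain", "temperature"},
                         lambda city: city + ": sunny")   # toy agent
    print(facilitator.delegate(("weather", "San Francisco")))
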
The OAA supports direct communication between application agents, but this has not been heavily utilized in our implementations because our focus has been on aspects of applications in which the role of the Facilitator is crucial. First, we are interested in user interfaces that support interactions with the broader community of agents, and the Facilitator is key to handling complex queries. The Facilitator (and supporting agents) handle the translation of the user's model of the task into the system model (analogous to how natural language interfaces to databases handle transforming the user's model into the database's schemas). Second, the Facilitator simplifies reusing agents in new applications. If a community of agents is assembled using agents acquired from other communities, those agents cannot be assumed to all make atomic requests that can be handled by other agents: simple requests in one application may be implemented by a combination of agents in another application. The Facilitator is responsible for decomposing complex requests and translating the terminology used. This translation is typically handled by delegating it to another agent.

In the OAA, the Facilitator is a potential bottleneck if there is a high volume of communication between the agents. Our focus has been on supporting a natural user interface to a very large community of intelligent agents, and this environment produces relatively low volume through the Facilitator. In the CommandTalk application (discussed later), the multiagent system is actually partitioned into two communities: the user interface and the simulator. The simulator has very high volume interaction and a carefully crafted communication channel, and it appears as a single agent to the Facilitator and the user interface agents.

Triggers

In an increasing variety of conventional applications, users can set triggers (also called monitors, daemons or watchdogs) to take specific action when an event occurs. However, the possible actions are limited to those provided in that application. The OAA supports triggers in which both the condition and action parts of a request can cover the full range of functionality represented by the agents dynamically connected to the network.

⁵A release of a version of this software is planned. The announcement will appear on http://www.ai.sri.com/~oaa.

In a practical, real-world example, one of the authors successfully used agent triggers to find a new home. The local rental housing market is very tight, with all desirable offerings being taken immediately. Thus, you need to be among the first to respond to a new listing. Several of the local newspapers provide on-line versions of their advertisements before the printed versions are available, but there is considerable variability in when they actually become accessible. To automatically check for suitable candidates, the author made the following request to the agent system: "When a house for rent is available in Menlo Park for less than 1800 dollars, notify me immediately." This natural language request installed a trigger on an agent knowledgeable about the domain of World Wide Web sources for house rental listings. At regular intervals, the agent instructs a Web retrieval agent to scan data from three on-line newspaper databases. When an advertisement meeting the specified criteria is detected, a request is sent to the Facilitator for a notify action to be delegated to the appropriate other agents.

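A minimal sketch of such a trigger (hypothetical Python; the listing format, polling interval and helper callables are invented for illustration) pairs a condition that is checked at regular intervals with an action that is handed back to the Facilitator when the condition is met:

    import time

    def rental_trigger(fetch_listings, notify, poll_seconds=3600):
        """Watch on-line listings and fire a notify action on a match.

        fetch_listings: callable returning an iterable of dicts, e.g. the
        Web retrieval agent's view of the on-line newspaper ads.
        notify: callable standing in for the request sent to the Facilitator.
        """
        seen = set()
        while True:                                  # runs as a daemon
            for ad in fetch_listings():
                key = ad["id"]
                if key in seen:
                    continue
                seen.add(key)
                # Condition part of the trigger, as in the spoken request:
                # "a house for rent in Menlo Park for less than 1800 dollars".
                if ad["city"] == "Menlo Park" and ad["rent"] < 1800:
                    notify(ad)                       # action part of the trigger
            time.sleep(poll_seconds)                 # checked at regular intervals
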
The notify action involves a complex series of interactions between several agents, coordinated by the Notify and Facilitator agents. For example, if the user is in a meeting in a conference room, the Notify agent first determines his current location by checking his calendar (if no listing is found, the default location is his office, which is found from another database). The Notify agent then requests contact information for the conference room, and finds only a telephone number. Subsequent requests create a spoken version of the advertisement and retrieve the user's confirmation password. When all required information is collected, the Facilitator contacts the Telephone agent with a request to dial the telephone, ask for the user, confirm his identity with the password (entered by TouchTone), and finally play the message. Other media, including FAX, e-mail and pager, can be considered by the Notify agent if agents for handling these services happen to be connected to the network.

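The sequence reads naturally as a short script (hypothetical Python; each helper object stands in for a request that the Facilitator would route to the corresponding agent, and the method names are our own):

    def notify_user(user, message, calendar, contacts, tts, telephone, password_db):
        # 1. Find the user: calendar entry if there is one, office otherwise.
        location = calendar.current_entry(user) or contacts.office_of(user)
        # 2. Get contact information for that location (here, a phone number).
        phone = contacts.phone_of(location)
        # 3. Prepare a spoken version of the message and the user's password.
        audio = tts.speak(message)
        password = password_db.confirmation_password(user)
        # 4. Ask the Telephone agent to dial, confirm identity, play the message.
        telephone.dial(phone)
        if telephone.confirm_identity(password):      # entered by TouchTone
            telephone.play(audio)
        telephone.hang_up()
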
DISTRIBUTED SYSTEMS

Multiple Platforms

The OAA applications that we have implemented run on a variety of platforms, and the exact location of individual agents is easily changed. We currently support PCs (Windows 3.1 and 95) and Sun and SGI workstations. Our primary user interface platform is the PC, partly because it currently offers better support for pen-based computing and partly because of our emphasis on providing user interfaces on lightweight computers (portable PCs, and PDAs in the near future). PCs also have the advantage of mass-market GUI-building packages such as Visual Basic and Delphi. A lesser version of the user interface has been implemented under X for UNIX workstations.

Even when the UI is on a PC, some of the agents in the UI package are running elsewhere. Our preferred speech recognizer requires a UNIX workstation, and our natural language agents and Modality Coordination agent have been written for UNIX systems.

`Mobile Computing
`
`not only as people moving about
`We view mobile computing
`with portable computers using wireless communication,
`but
`also people moving between computers.
`Today’s user may
`have a workstation
`in his office,
`a personal
`computer
`at
`home, and a portable or PDA for meetings.
`In additional,
`when the user meets with management,
`colleagues and cus-
`tomers (“customers”
`in the broad sense of
`the people who
`require his services),
`their computers may be different plat-
`forms. From each of
`these environments,
`the user should be
`able to access his data and run his applications.
The OAA facilitates supporting multiple platforms because only the primary user interface agents need to be running on the local computer, thereby simplifying the problem of porting to new platforms and modality devices. Also, since only a minimal set of agents need to be run locally, lightweight computers (portables, PDAs, and older systems) have the resources needed to be able to utilize heavyweight, resource-hungry applications.

COLLABORATION

One of the major advantages of having an agent-based interface to a multiagent application is that it greatly simplifies the interactions between the user and the application: application agents may interact with a human in the same way they interact with any other agent.

This advantage is readily seen when building collaborative systems. Perhaps the simplest form of collaboration is to allow users to share input and output to each other's applications. This form of cooperation is inherent in the design of the OAA: it facilitates the interoperation of software developed by distributed communities, especially disparate user communities (different platforms, different conventions).

We are currently integrating more sophisticated styles of collaboration into the OAA framework, using the synchronous collaborative technology [5] built by another group within our organization. In the resulting systems, humans can communicate with agents, agents can work with other automated agents, and humans can interact in real time with other human users.

APPLICATIONS AND REUSE

Two applications, the Office Assistant and Map-based Tourist Information, have been the primary experimental environments for this research project. The agent architecture and the specific agents developed on this research project have proved to be so useful that they are being used by an expanding set of other projects within our organization. These other internal projects are helping us improve the documentation and packaging of our toolkits and libraries, and we are hoping to release a version in the near future.

Some of the projects adopting the OAA have been motivated by the availability of various agents, especially the user interface agents. Some projects have gone further and used the OAA to integrate the major software components being developed on those projects.

Office Assistant

The OAA has been used as the framework for a number of applications in several domain areas. In the first OAA-based system, a multifunctional "office assistant", fourteen autonomous agents provide information retrieval and communication services for a group of coworkers in a networked computing environment ([4]). This system makes use of a multimodal user interface running on a pen-enabled portable PC, and allows for the use of a telephone to give spoken commands to the system. Services are provided by agents running on UNIX workstations, many of which were created by providing agent wrappers for legacy applications.

In a typical scenario, agents with expertise in e-mail processing, text-to-speech translation, notification planning, calendar and database access, and telephone control cooperate to find a user and alert him or her of an important message. The office assistant system provides a compelling demonstration of how new services can arise from the synergistic combination of the capabilities of components that were originally intended to operate in isolation. In addition, as described earlier, it demonstrates the combination of two basic styles of user interaction (one that directly involves a particular agent as the primary point of contact, and one that anonymously delegates requests across a collection of agents) in a way that allows the user to switch freely between the two.

In the interface for this system, the initial screen portrays an office, in which familiar objects are associated with the appropriate functionality, as provided by some agent. For instance, clicking on a wall clock brings up a dialogue that allows one to interact with the calendar agent (that is, browsing and editing one's appointments). In this style of interaction, even though the calendar agent may call on other agents in responding to some request, it has primary responsibility, in that all requests through that dialogue are handled by it.

The alternative style of interaction is one in which the user might speak "Where will I be at 2:00 this afternoon?"

Our map-based applications have shown the agent-based approach to multimodality to be extremely useful. In these systems, all the components share a common interface, the map, and the fact that there are many agents is entirely invisible to the user.

One example is a map-based system to provide tourist information about San Francisco. Requests expressed in a variety of modalities can control the scrolling and zoom level of the map, retrieve information about locations and distances, display hotels or attractions meeting a user's preferences, or present detailed information in a variety of media about particular hotels or attractions. Where appropriate, this information is derived and updated regularly from WWW sources.

Map-based interfaces provide a rich setting in which to explore the coordination of gesture with speech and traditional GUI modalities. The tourist information system accommodates the use of a variety of familiar pen gestures, such as circling objects or regions, drawing arrows, X'ing positions or objects, and striking out objects. Depending on context and timing considerations, requests can be derived from single gestures, multiple gestures interpreted together, spoken or handwritten input, point-and-click, or some combination of these operations.

For example, an arrow drawn across a map from right to left (which itself is recognized from two or three pen strokes) is interpreted as a request to scroll the map. The same effect may be achieved by speaking "scroll left". Display of hotels can be obtained by writing or speaking "Show hotels", or, perhaps, "Show hotels with a pool". The distance between two objects or locations may be obtained by circling, X'ing, or clicking on each of them, and then drawing a straight line between them. Alternatively, one can speak "Show the distance from here to here", while selecting two locations, or one can write "distance" either before or after selecting two objects.

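The point that several input combinations collapse onto the same request can be sketched as follows (hypothetical Python; the event encoding and request tuples are illustrative only):

    def interpret(events):
        """Map a few gesture/speech/writing combinations onto requests."""
        kinds = [e[0] for e in events]

        # A right-to-left arrow, or saying "scroll left", scrolls the map.
        if kinds == ["arrow_left"] or events == [("speech", "scroll left")]:
            return ("scroll", "left")

        # Selecting two places (circle, X, or click) plus a connecting line,
        # or saying/writing "distance" with two selections, asks for distance.
        selections = [e for e in events if e[0] in ("circle", "x", "click")]
        if len(selections) == 2 and ("line" in kinds or
                                     ("speech", "distance") in events or
                                     ("write", "distance") in events):
            return ("distance", selections[0][1], selections[1][1])

        return ("unrecognized", events)

    print(interpret([("circle", "hotel_12"), ("x", "coit_tower"), ("line", None)]))
    # -> ('distance', 'hotel_12', 'coit_tower')
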
This system, and the organization of the input recognition agents, is described in detail in [2]. A related system is described in [15].

CommandTalk

