User Interfaces in the Open Agent Architecture

Douglas B. Moran, Adam J. Cheyer, Luc E. Julia, David L. Martin
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025 USA
+1 415 859 6486
{moran,cheyer,julia,martin}@ai.sri.com

Sangkyu Park
Artificial Intelligence Section
Electronics and Telecommunications Research Institute (ETRI)
161 Kajong-Dong, Yusong-Gu, Taejon 305-350 KOREA
+82 42 860 5641
skpark@com.etri.re.kr

ABSTRACT

The design and development of the Open Agent Architecture (OAA)¹ system has focused on providing access to agent-based applications through intelligent, cooperative, distributed, and multimodal agent-based user interfaces. The current multimodal interface supports a mix of spoken language, handwriting and gesture, and is adaptable to the user's preferences, resources and environment. Only the primary user interface agents need run on the local computer, thereby simplifying the task of using a range of applications from a variety of platforms, especially low-powered computers such as Personal Digital Assistants (PDAs). An important consideration in the design of the OAA was to facilitate mix-and-match: to facilitate the reuse of agents in new and unanticipated applications, and to support rapid prototyping by facilitating the replacement of agents by better versions.

The utility of the agents and tools developed as part of this ongoing research project has been demonstrated by their use as infrastructure in unrelated projects.

Keywords: agent architecture, multimodal, speech, gesture, handwriting, natural language

INTRODUCTION

A major component of our research on multiagent systems is in the user interface to large communities of agents. We have developed agent-based multimodal user interfaces using the same agent architecture used to build the back ends of these applications. We describe these interfaces and the larger architecture, and outline some of the applications that have been built using this architecture and interface agents.

Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee.
IUI 97, Orlando, Florida USA
© 1997 ACM 0-89791-839-8/96/01 ..$3.50

OVERVIEW OF OPEN AGENT ARCHITECTURE

The Open Agent Architecture (OAA) is a multiagent system that focuses on supporting the creation of applications from agents that were not designed to work together, thereby facilitating the wider reuse of the expertise embodied by an agent. Part of this focus is the user interface to these applications, which can be viewed as supporting the access of human agents to the automated agents. Key attributes of the OAA are:

• Open: The OAA supports agents written in multiple languages and on multiple platforms. Currently supported languages are C, Prolog, Lisp, Java, Microsoft's Visual Basic and Borland's Delphi. Currently supported platforms are PCs (Windows 3.1 and 95), Sun workstations (Solaris 1.1 and 2.x) and SGIs.

• Distributed: The agents that compose an application can run on multiple platforms.

• Extensible: Agents can be added to the system while it is running, and their capabilities will become immediately available to the rest of the agents. Similarly, agents can be dynamically removed from the system (intentionally or not).

• Mobile: OAA-based applications can be run from a lightweight portable computer (or PDA) because only the user interface agents need run on the portable. They provide the user with access to a range of agents running on other platforms.

• Collaborative: The user interface is implemented with agents, and thus the user appears to be just another agent to the automated agents. This greatly simplifies creating systems where multiple humans and automated agents cooperate.

• Multiple Modalities: The user interface supports handwriting, gesture and spoken language in addition to the traditional graphical user interface modalities.
• Multimodal Interaction: Users can enter commands with a mix of modalities, for example, a spoken command in which the object to be acted on is identified by a pen gesture (or other graphical pointing operation).

The OAA has been influenced by work being done as part of DARPA's I3 (Intelligent Integration of Information) program (http://isx.com/pub/I3) and the Knowledge Sharing Effort (http://www-ksl.stanford.edu/knowledge-sharing/) [13].

THE USER INTERFACE

The User Interface Agent

The user interface is implemented with a set of agents that have at their logical center an agent called the User Interface (UI) Agent. The User Interface Agent manages the various modalities and applies additional interpretation to those inputs as needed. Our current system supports speech, handwriting and pen-based gestures in addition to the conventional keyboard and mouse inputs. When speech input is detected, the UI Agent sends a command to the Speech Recognition agent to process the audio input and to return the corresponding text. Three modes are supported for speech input: open microphone, push-to-talk, and click-to-start-talking. Spoken and handwritten inputs can be treated either as raw text or interpreted by a natural language understanding agent.
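
This dispatch behavior can be pictured as a small set of routing rules. The sketch below is a hypothetical rendering in Prolog (one of the languages the OAA supports); the predicate names (handle_input/2, recognize_speech/2, and so on) and the stub facts are illustrative assumptions, not the actual OAA interfaces.

    % Hypothetical sketch of the UI Agent's input dispatch (not the actual OAA code).
    % Each input event is routed to the matching recognition agent, and the
    % resulting text is either passed through raw or sent on for NL interpretation.
    handle_input(speech(Audio), Result) :-
        recognize_speech(Audio, Text),           % ask the speech recognition agent
        interpret_or_pass(Text, Result).
    handle_input(handwriting(Strokes), Result) :-
        recognize_handwriting(Strokes, Text),    % ask the handwriting recognition agent
        interpret_or_pass(Text, Result).
    handle_input(keyboard(Text), Result) :-      % keyboard input needs no recognition step
        interpret_or_pass(Text, Result).

    % Treat the text as raw, or hand it to a natural language understanding agent.
    interpret_or_pass(Text, raw(Text)) :-
        raw_text_mode, !.
    interpret_or_pass(Text, logical_form(LF)) :-
        nl_understand(Text, LF).

    % Stubs so the sketch loads and runs on its own.
    raw_text_mode :- fail.
    recognize_speech(_, 'show me the photo of this hotel').
    recognize_handwriting(_, 'show hotels').
    nl_understand(Text, request(Text)).

Under these stubs, the query ?- handle_input(speech(audio1), R). binds R to logical_form(request('show me the photo of this hotel')).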

There are two basic styles of user interface. The first style parallels the traditional graphical user interface (GUI) for an application: the user selects an application and is presented with a window that has been designed for the application implemented by that agent and that is composed of the familiar GUI-style items. In this style of interface, the application is typically implemented as a primary agent, with which the user interacts, and a number of supporting agents that are used by the primary agent, and whose existence is hidden from the user. When text entry is needed, the user may use handwriting or speech instead of the keyboard, and the pen may be used as an alternative to the mouse. Because the UI Agent handles all the alternate modalities, the applications are isolated from the details of which modalities are being used. This simplifies the design of the applications, and simplifies adding new modalities.

In the second basic style of interface, not only is there no primary agent, the individual agents are largely invisible to the user, and the user's requests may involve the cooperative actions of multiple agents. In the systems we have implemented, this interface is based on natural language (for example, English), and is entered with either speech or handwriting. When the UI Agent detects speech or pen-based input, it invokes a speech recognition agent or handwriting recognition agent, and sends the text returned by that agent to a natural language understanding agent, which produces a logical-form representation of the user's request. This logical form is then passed to a Facilitator agent, which identifies the subtasks and delegates them to the appropriate application agents. For example, in our Map-based Tourist Information application for the city of San Francisco, the user can ask for the distance between a hotel and a sightseeing destination. The locations of the two places are in different databases, which are managed by different agents, and the distance calculation is performed by yet another agent.
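
To illustrate the flow just described, the following sketch shows how such a compound request might be decomposed into subgoals for separate agents. The logical form, the predicate names (solves/2, delegate/2) and the particular hotel and sight are invented for illustration; they convey only the style of decomposition, not the actual ICL or Facilitator implementation.

    % Hypothetical sketch: decomposing a compound request into agent subgoals.
    % Registered capabilities: which agent can solve which kind of goal.
    solves(hotel_db,   location(hotel(_), _)).
    solves(sights_db,  location(sight(_), _)).
    solves(geo_agent,  distance(_, _, _)).

    % A compound request is split into its conjuncts, each delegated to an
    % agent that registered a matching capability.
    delegate((GoalA, GoalB), [Agent-GoalA | Rest]) :-
        !,
        solves(Agent, GoalA),
        delegate(GoalB, Rest).
    delegate(Goal, [Agent-Goal]) :-
        solves(Agent, Goal).

    % "What is the distance between this hotel and that sight?"
    example_request(( location(hotel(h1), P1),
                      location(sight(s1), P2),
                      distance(P1, P2, _Miles) )).

The query ?- example_request(R), delegate(R, Plan). pairs each subgoal with an agent: the two location lookups go to the two database agents and the distance computation to a third.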

These two basic styles of interface can be combined in a single interface. In our Office Assistant application, the user is presented with a user interface based on the Rooms metaphor and is able to access conventional applications such as e-mail, calendar, and databases in the familiar manner. In addition there is a subwindow for spoken or written natural language commands that can involve multiple agents.

A major focus of our research is multimodal inputs, typically a mix of gesture/pointing with spoken or handwritten language. The UI Agent manages the interpretation of the individual modalities and passes the results to a Modality Coordination agent, which returns the composite query, which is then passed to the Facilitator agent for delegation to the appropriate application agents (described in subsequent sections).

Speech Recognition

We have used different speech recognition systems, substituting one for another to meet different criteria. We use research systems developed by another laboratory in our organization (http://www-speech.sri.com/) [3] and by a commercial spin-off from that laboratory.² We are currently evaluating other speech recognizers, and will create agents to interface to their application programming interfaces (APIs) if they satisfy the requirements for new applications being considered.

Natural Language Understanding

A major advantage of using an agent-based architecture is that it provides simple mix-and-match for the components. In developing systems, we have used three different natural language (NL) systems: a simple one, based on Prolog DCG (Definite Clause Grammar); an intermediate one, based on CHAT [16]; and finally our most capable research system, GEMINI [6, 7]. The ability to trivially substitute one natural language agent for another has been very useful in rapid prototyping of systems. The DCG-based agent is used during the early stages of development because grammars are easily written and modified. Writing grammars for the more sophisticated NL agents requires more effort, but provides better coverage of the language that real users are likely to use, and hence we typically delay upgrading to the more sophisticated agents until the application crosses certain thresholds of maturity and usage.
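
To make the simple DCG-based style concrete, here is a minimal sketch of the kind of grammar such an agent might use, written in standard Prolog DCG notation. The rules and the logical forms they build are assumptions for illustration; the grammars actually used are more extensive.

    % Minimal illustrative DCG: parses a few map commands into logical forms.
    command(show(Objs))         --> [show], objects(Objs).
    command(scroll(Dir))        --> [scroll], direction(Dir).
    command(distance(From, To)) --> [distance, from], place(From), [to], place(To).

    objects(hotels)             --> [hotels].
    objects(hotels_with(pool))  --> [hotels, with, a, pool].

    direction(left)             --> [left].
    direction(right)            --> [right].

    place(here)                 --> [here].
    place(hotel(Name))          --> [the], [Name], [hotel].

    % Example:
    %   ?- phrase(command(LF), [show, hotels, with, a, pool]).
    %   LF = show(hotels_with(pool)).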

¹Open Agent Architecture and OAA are trademarks of SRI International. Other brand names and product names herein are trademarks and registered trademarks of their respective holders.
²Nuance Communications (formerly Corona Corp.), Building 110, 333 Ravenswood Avenue, Menlo Park, CA 94025 (domain: coronacorp.com)
Pen Input

We have found that including a pen in the user interface has several significant advantages. First, the gestures that users employ with a pen-based system are substantially richer than those employed by other pointing and tracking systems (e.g., a mouse). Second, handwriting is an important adjunct to spoken language. Speech recognizers (including humans) can have problems with unfamiliar words (e.g., new names). Users can use the pen to correct misspelled words, or may even anticipate the problem and switch from speaking to handwriting. Third, our personal experience is that when a person who has been using a speech-and-gesture interface faces an environment where speech is inappropriate, replacing speech with handwriting is more natural.

Using 2D gestures in the human-computer interaction holds promise for recreating the pen-and-paper situation where the user is able to quickly express visual ideas while she or he is using another modality such as speech. However, to successfully attain a high level of human-computer cooperation, the interpretation of on-line data must be accurate and fast enough to give rapid and correct feedback to the user.

The gesture recognition engine used in our application is fully described in [9] as the early recognition process. There is no constraint on the number of strokes. The latest evaluations gave better than 96% accuracy, and the recognition was performed in less than half a second on a PC 486/50, satisfying what we judge is required in terms of quality and speed.

In most applications, this engine shares pen data with a handwriting recognizer. The use of the same medium to handle two different modalities is a source of ambiguities that are resolved by a competition between the two recognizers in order to determine whether the user wrote (a sentence or a command) or produced a gesture. A remaining problem is to resolve mixed input (the user draws and writes in the same set of strokes).

The main strength of the gesture recognition engine is its adaptability and reusability. It allows the developer to easily define the set of gestures according to the application. Each gesture is described with a set of parameters such as the number of directions, a broken segment, and so forth. Adding a new gesture consists of finding the description for each parameter. If a conflict appears with an existing object, the discrimination is done by creating a new parameter. For a given application, as few as four parameters are typically required to describe and discriminate the set of gestures.
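
Gesture descriptions of this kind can be pictured as a small table of parameter values. The sketch below is hypothetical (the parameter names and values are invented), but it shows how a new gesture is added by supplying a value for each parameter, and how a classifier simply looks the description up.

    % Hypothetical sketch: gestures described by a small set of parameters.
    % gesture(Name, NumDirections, HasBrokenSegment, Closed, AspectRatio).
    gesture(arrow_left, 2, yes, open,   wide).
    gesture(circle,     4, no,  closed, square).
    gesture(cross_out,  4, yes, open,   square).
    gesture(strike_out, 1, no,  open,   wide).

    % A new gesture is added by giving a value for each parameter; a conflict
    % with an existing gesture would be resolved by adding a further parameter.
    classify(NumDirs, Broken, Closed, Aspect, Name) :-
        gesture(Name, NumDirs, Broken, Closed, Aspect).

    % Example:
    %   ?- classify(4, no, closed, square, G).
    %   G = circle.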

We can use any handwriting recognizer compatible with Microsoft's PenWindows.³

Modality Coordination Agent

Our interface supports a rich set of interactions between natural language (spoken, written, or typed) and gesturing (e.g., pointing, circling), much richer than that seen in the put-that-there systems. Deictic words (e.g., this, them, here) can be used to refer to many classes of objects, and also can be used to refer to either individuals or collections of individuals.

The Modality Coordination (MC) agent is responsible for combining the inputs in the different modalities to produce a single meaning that matches the user's intention. It is responsible for resolving references, for filling in missing information for an incoming request, and for resolving ambiguities by using contexts, equivalence or redundancy.

Taking contexts into account implies establishing a hierarchy of rules between them. The importance of each context and the hierarchy may vary during a single session. In the current system, missing information is extracted from the dialogue context (there is no graphical context or interaction context).

When the user says "Show me the photo of this hotel" and simultaneously points with the pen to a hotel, the MC agent resolves the reference based on that gesture. If no hotel is explicitly indicated, the MC agent searches the conversation context for an appropriate reference (for example, the hotel may have been selected by a gesture in the previous command). If there is no selected hotel in the current context, the MC agent will wait a certain amount of time (currently 2 to 3 seconds) before asking the user to identify the hotel intended. This short delay is designed to accommodate different synchronizations of speech and gesture: different users (or a single user in different circumstances) may point before, during or just after speaking.
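
The resolution strategy just described (a simultaneous gesture first, then the dialogue context, then a short wait followed by a clarification question) can be summarized as a few ordered rules. The sketch below is a hypothetical Prolog rendering of that priority ordering; the predicates and the timeout value are assumptions for illustration.

    % Hypothetical sketch of the MC agent's reference resolution priority.
    % resolve_reference(+Deictic, +GestureSelection, +DialogueContext, -Referent)

    % 1. A simultaneous gesture selection wins.
    resolve_reference(_Deictic, selected(Object), _Context, Object) :- !.

    % 2. Otherwise, look for a suitable referent in the dialogue context.
    resolve_reference(Deictic, none, Context, Object) :-
        member(focused(Deictic, Object), Context), !.

    % 3. Otherwise, wait briefly (2-3 s in the current system) for a late
    %    gesture, then ask the user to identify the intended object.
    resolve_reference(Deictic, none, _Context, Object) :-
        wait_for_gesture(2.5, Result),
        (   Result = selected(Object)
        ->  true
        ;   ask_user(Deictic, Object)
        ).

    % Stubs so the sketch is self-contained.
    wait_for_gesture(_Seconds, none).
    ask_user(Deictic, unresolved(Deictic)).

    % Example:
    %   ?- resolve_reference(hotel, none, [focused(hotel, h1)], R).
    %   R = h1.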

In another example, the user says "Show me the distance from the hotel to here" while pointing at a destination. The previous queries have resulted in a single hotel being focused upon, and the MC agent resolves "the hotel" from this context.⁴ The gesture provides the MC agent with the referent of "here". Processing the resulting query may involve multiple agents; for example, the locations of hotels and sightseeing destinations may well be in different databases, and these locations may be expressed in different formats, requiring another agent to resolve the differences and then compute the distance.

Flexible Sets of Modalities

The OAA allows the user maximum flexibility in what modalities will be used. Sometimes, the user will be on a computer that does not support the full range of modalities (e.g., no pen or handwriting recognition). Sometimes, the user's environment limits the choice of modalities; for example, spoken commands are inappropriate in a meeting where someone else is speaking, whereas in a moving vehicle, speech is likely to be more reliable than handwriting. And sometimes, the user's choice of modalities is influenced by the data being entered [14].

With this flexibility, the telephone has become our low-end user interface to the system.

³Our preferred recognizer is Handwriter for Windows from Communication Intelligence Corp. (CIC) of Redwood City, CA.
⁴User feedback about which items are in focus (contextually) is provided by graphically highlighting them.
For example, we can use the telephone to check on our appointments, and we use the telephone to notify us of the arrival and content of important e-mail when we are away from our computers.

This flexibility has also proven quite advantageous in accommodating hardware failure. For example, moving the PC for one demonstration of the system shook loose a connection on the video card. The UI agent detected that no monitor was present, and used the text-to-speech agent to generate the output that was normally displayed graphically.

In another project's demonstration (CommandTalk), the designated computer was nonfunctional, and an underpowered computer had to be substituted. Using the OAA's innate capabilities, the application's components were distributed to other computers on the net. However, the application had been designed and tested using the microphone on the local computer, and the substitute had none. The solution was to add the Telephone agent that had been created for other applications: it automatically replaced the microphone as the input to the speech recognizer.

Learning the System

One of the well-known problems with systems that utilize natural language is in communicating to the user what can and cannot be said. A good solution to this is an open research problem. Our approach has been to use the design of the GUI to help illustrate what can be said: all the simple operations can also be invoked through traditional GUI items, such as menus, that cover much of the vocabulary.

OAA AGENTS

Overview

OAA agents communicate with each other in a high-level logical language called the Interagent Communication Language (ICL). ICL is similar in style and functionality to the Knowledge Query and Manipulation Language (KQML) of the DARPA Knowledge Sharing Effort. The differences are a result of our focus on the user interface: ICL was designed to be compatible with the output of our natural language understanding systems, thereby simplifying the transformation of a user's query or command into one that can be handled by the automated agents.
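
The paper does not give the concrete ICL syntax, so the following is only a hypothetical illustration of what an ICL-style request might look like when written down as a Prolog term: a logical-form goal wrapped with addressing and control parameters. The functor and parameter names are assumptions, not the actual ICL.

    % Hypothetical illustration of an ICL-style request as a Prolog term.
    % A user's command is translated into a goal plus control parameters
    % and handed to the Facilitator for delegation.
    example_icl_request(
        solve( distance(hotel(h1), sight(s1), _Miles),
               [ reply_to(ui_agent),      % where the answer should go
                 strategy(delegate),      % let the Facilitator choose agents
                 priority(normal) ] )).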

We have developed an initial set of tools (the Agent Development Toolkit) to assist in the creation of agents [11]. These tools guide the developer through the process, and automatically generate code templates from specifications (in the style of various commercial CASE tools). These tools are implemented as OAA agents, so they can interact with, and build upon, existing agents. The common agent support routines have been packaged as libraries, with coordinated libraries for the various languages that we support.⁵

These tools support building both entirely new agents and creating agents from existing applications, including legacy systems. These latter agents are called wrappers (or transducers); they convert between ICL and the application's API (or other interface if there is no API).
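
As an illustration of the wrapper idea, the sketch below hides an imagined legacy calendar API behind an ICL-style goal. The legacy call, the goal shape and the record format are all hypothetical; only the overall pattern (translate the goal in, translate the result back) reflects what is described here.

    % Hypothetical sketch of a wrapper (transducer) agent.

    % Imagined legacy API (stub): returns appointments as a raw record atom.
    legacy_calendar_lookup(Date, Raw) :-
        format(atom(Raw), "record(~w,[meeting-1400])", [Date]).

    % The wrapper exposes an ICL-style capability ...
    solve_goal(appointments_on(Date, Appointments)) :-
        legacy_calendar_lookup(Date, Raw),
        parse_legacy_record(Raw, Appointments).

    % ... and hides the legacy record format from the rest of the community.
    parse_legacy_record(Raw, [Raw]).     % trivial translation for the sketch

    % Example:
    %   ?- solve_goal(appointments_on(monday, A)).
    %   A = ['record(monday,[meeting-1400])'].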
`
The Facilitator Agent

In the OAA framework, the Facilitator agents play a key role. When an agent is added to the application, it registers its capabilities with the Facilitator. Part of this registration is the natural language vocabulary that can be used to talk about the tasks that the agent can perform. When an agent needs work done by other agents within the application, it sends a request to the Facilitator, which then delegates it to an agent, or agents, that have registered that they can handle the needed tasks. The ability of the Facilitator to handle complex requests from agents is an important attribute of the OAA design. The goal is to minimize the information and assumptions that the developer must embed in an agent, thereby making it easier to reuse agents in disparate applications.
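
A rough picture of this registration and delegation, under the same caveat that the predicate names are invented rather than the real OAA calls, might look like the following.

    % Hypothetical sketch of Facilitator-style registration and delegation.
    :- dynamic registered/3.    % registered(Agent, Capability, Vocabulary)

    % An agent joining the community registers its capabilities, together
    % with the natural language vocabulary used to talk about them.
    register(Agent, Capability, Vocabulary) :-
        assertz(registered(Agent, Capability, Vocabulary)).

    % A request is delegated to any agent whose registered capability unifies
    % with the goal; the requester never needs to know which agent handles it.
    delegate_to(Goal, Agent) :-
        registered(Agent, Goal, _Vocabulary).

    % Example session:
    %   ?- register(calendar_agent, appointments_on(_, _), [appointment, meeting]),
    %      register(mail_agent, unread_mail(_), [mail, message, inbox]),
    %      delegate_to(appointments_on(today, A), Who).
    %   Who = calendar_agent.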

The OAA supports direct communication between application agents, but this has not been heavily utilized in our implementations because our focus has been on aspects of applications in which the role of the Facilitator is crucial. First, we are interested in user interfaces that support interactions with the broader community of agents, and the Facilitator is key to handling complex queries. The Facilitator (and supporting agents) handle the translation of the user's model of the task into the system model (analogous to how natural language interfaces to databases handle transforming the user's model into the database's schemas). Second, the Facilitator simplifies reusing agents in new applications. If a community of agents is assembled using agents acquired from other communities, those agents cannot be assumed to all make atomic requests that can be handled by other agents: simple requests in one application may be implemented by a combination of agents in another application. The Facilitator is responsible for decomposing complex requests and translating the terminology used. This translation is typically handled by delegating it to another agent.

In the OAA, the Facilitator is a potential bottleneck if there is a high volume of communication between the agents. Our focus has been on supporting a natural user interface to a very large community of intelligent agents, and this environment produces relatively low volume through the Facilitator. In the CommandTalk application (discussed later), the multiagent system is actually partitioned into two communities: the user interface and the simulator. The simulator has very high volume interaction and a carefully crafted communication channel, and appears as a single agent to the Facilitator and the user interface agents.

Triggers

In an increasing variety of conventional applications, users can set triggers (also called monitors, daemons or watchdogs) to take specific action when an event occurs. However, the possible actions are limited to those provided in that application.

⁵A release of a version of this software is planned. The announcement will appear on http://www.ai.sri.com/~oaa.
The OAA supports triggers in which both the condition and action parts of a request can cover the full range of functionality represented by the agents dynamically connected to the network.

In a practical real-world example, one of the authors successfully used agent triggers to find a new home. The local rental housing market is very tight, with all desirable offerings being taken immediately. Thus, you need to be among the first to respond to a new listing. Several of the local newspapers provide on-line versions of their advertisements before the printed versions are available, but there is considerable variability in when they actually become accessible. To automatically check for suitable candidates, the author made the following request to the agent system: "When a house for rent is available in Menlo Park for less than 1800 dollars, notify me immediately." This natural language request installed a trigger on an agent knowledgeable about the domain of World Wide Web sources for house rental listings. At regular intervals, the agent instructs a Web retrieval agent to scan data from three on-line newspaper databases. When an advertisement meeting the specified criteria is detected, a request is sent to the Facilitator for a notify action to be delegated to the appropriate other agents.
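
A trigger of this kind pairs a condition over agent-provided functionality with an action to be delegated when the condition holds. The sketch below renders that idea in Prolog; the predicates (install_trigger/2, check_triggers/0, listing/3) and the sample listing are hypothetical and only illustrate the condition/action structure.

    % Hypothetical sketch of an OAA-style trigger: a condition/action pair in
    % which both parts may be served by any agent in the community.
    :- dynamic trigger/2.
    :- dynamic listing/3.      % listing(City, MonthlyRent, Url), e.g. from a Web agent

    install_trigger(Condition, Action) :-
        assertz(trigger(Condition, Action)).

    % Run periodically (e.g., after each scan of the on-line newspapers):
    % fire every trigger whose condition now holds.
    check_triggers :-
        forall(( trigger(Condition, Action), call(Condition) ),
               call(Action)).

    % The rental-housing example: notify the user of any Menlo Park listing
    % under 1800 dollars a month.
    demo :-
        install_trigger(( listing(menlo_park, Rent, Url), Rent < 1800 ),
                        notify(user, rental(Url, Rent))),
        assertz(listing(menlo_park, 1750, 'http://example.org/ad42')),
        check_triggers.

    notify(Who, What) :-
        format("notify ~w: ~w~n", [Who, What]).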
`

The notify action involves a complex series of interactions between several agents, coordinated by the Notify and Facilitator agents. For example, if the user is in a meeting in a conference room, the Notify agent first determines his current location by checking his calendar (if no listing is found, the default location is his office, which is found from another database). The Notify agent then requests contact information for the conference room, and finds only a telephone number. Subsequent requests create a spoken version of the advertisement and retrieve the user's confirmation password. When all required information is collected, the Facilitator contacts the Telephone agent with a request to dial the telephone, ask for the user, confirm his identity with the password (entered by TouchTone), and finally play the message. Other media, including FAX, e-mail and pager, can be considered by the Notify agent if agents for handling these services happen to be connected to the network.
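
To make the sequencing concrete, the following sketch lays out the notification plan as a chain of subgoals, each of which would in practice be delegated to a separate agent. The predicates and the stub facts are invented for illustration.

    % Hypothetical sketch of the multi-agent notify plan described above.
    notify_user(User, Message) :-
        current_location(User, Location),              % calendar agent (or office default)
        contact_info(Location, phone(Number)),         % database agent
        text_to_speech(Message, Audio),                % text-to-speech agent
        confirmation_password(User, Password),         % user-profile agent
        telephone_call(Number, User, Password, Audio). % telephone agent

    % Default location when the calendar has no entry.
    current_location(User, Location) :-
        (   calendar_entry(User, Location) -> true
        ;   office_of(User, Location)
        ).

    % Stub facts so the sketch runs end to end.
    calendar_entry(user1, conference_room_a).
    office_of(user1, office_b).
    contact_info(conference_room_a, phone('555-0100')).
    text_to_speech(Message, audio(Message)).
    confirmation_password(user1, '1234').
    telephone_call(Number, User, _Password, Audio) :-
        format("dialing ~w, asking for ~w, playing ~w~n", [Number, User, Audio]).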
`
DISTRIBUTED SYSTEMS

Multiple Platforms

The OAA applications that we have implemented run on a variety of platforms, and the exact location of individual agents is easily changed. We currently support PCs (Windows 3.1 and 95) and Sun and SGI workstations. Our primary user interface platform is the PC, partly because it currently offers better support for pen-based computing and partly because of our emphasis on providing user interfaces on lightweight computers (portable PCs, and PDAs in the near future). PCs also have the advantage of mass-market GUI-building packages such as Visual Basic and Delphi. A lesser version of the user interface has been implemented under X for UNIX workstations.

Even when the UI is on a PC, some of the agents in the UI package are running elsewhere. Our preferred speech recognizer requires a UNIX workstation, and our natural language agents and Modality Coordination agent have been written for UNIX systems.

Mobile Computing

We view mobile computing not only as people moving about with portable computers using wireless communication, but also as people moving between computers. Today's user may have a workstation in his office, a personal computer at home, and a portable or PDA for meetings. In addition, when the user meets with management, colleagues and customers ("customers" in the broad sense of the people who require his services), their computers may be different platforms. From each of these environments, the user should be able to access his data and run his applications.

The OAA facilitates supporting multiple platforms because only the primary user interface agents need to be running on the local computer, thereby simplifying the problem of porting to new platforms and modality devices. Also, since only a minimal set of agents need to be run locally, lightweight computers (portables, PDAs, and older systems) have the resources needed to be able to utilize heavyweight, resource-hungry applications.

COLLABORATION

One of the major advantages of having an agent-based interface to a multiagent application is that it greatly simplifies the interactions between the user and the application: application agents may interact with a human in the same way they interact with any other agent.

This advantage is readily seen when building collaborative systems. Perhaps the simplest form of collaboration is to allow users to share input and output to each other's applications. This form of cooperation is inherent in the design of the OAA: it facilitates the interoperation of software developed by distributed communities, especially disparate user communities (different platforms, different conventions).

We are currently integrating more sophisticated styles of collaboration into the OAA framework, using the synchronous collaborative technology [5] built by another group within our organization. In the resulting systems, humans can communicate with agents, agents can work with other automated agents, and humans can interact in real time with other human users.

APPLICATIONS AND REUSE

Two applications, the Office Assistant and Map-based Tourist Information, have been the primary experimental environments for this research project. The agent architecture and the specific agents developed on this research project have proved to be so useful that they are being used by an expanding set of other projects within our organization. These other internal projects are helping us improve the documentation and packaging of our toolkits and libraries, and we are hoping to release a version in the near future.

Some of the projects adopting the OAA have been motivated by the availability of various agents, especially the user interface agents. Some projects have gone further and used the OAA to integrate the major software components being developed on those projects.

Office Assistant

The OAA has been used as the framework for a number of applications in several domain areas. In the first OAA-based system, a multifunctional "office assistant", fourteen autonomous agents provide information retrieval and communication services for a group of coworkers in a networked computing environment [4]. This system makes use of a multimodal user interface running on a pen-enabled portable PC, and allows for the use of a telephone to give spoken commands to the system. Services are provided by agents running on UNIX workstations, many of which were created by providing agent wrappers for legacy applications.

In a typical scenario, agents with expertise in e-mail processing, text-to-speech translation, notification planning, calendar and database access, and telephone control cooperate to find a user and alert him or her of an important message. The office assistant system provides a compelling demonstration of how new services can arise from the synergistic combination of the capabilities of components that were originally intended to operate in isolation. In addition, as described earlier, it demonstrates the combination of two basic styles of user interaction, one that directly involves a particular agent as the primary point of contact and one that anonymously delegates requests across a collection of agents, in a way that allows the user to switch freely between the two.

In the interface for this system, the initial screen portrays an office, in which familiar objects are associated with the appropriate functionality, as provided by some agent. For instance, clicking on a wall clock brings up a dialogue that allows one to interact with the calendar agent (that is, browsing and editing one's appointments). In this style of interaction, even though the calendar agent may call on other agents in responding to some request, it has primary responsibility, in that all requests through that dialogue are handled by it.

The alternative style of interaction is one in which the user might speak "Where will I be at 2:00 this afternoon?"

We have found the agent-based approach to multimodality to be extremely useful. In these systems, all the components share a common interface, the map, and the fact that there are many agents is entirely invisible to the user.

One example is a map-based system to provide tourist information about San Francisco. Requests expressed in a variety of modalities can control the scrolling and zoom level of the map, retrieve information about locations and distances, display hotels or attractions meeting a user's preferences, or present detailed information in a variety of media about particular hotels or attractions. Where appropriate, this information is derived and updated regularly from WWW sources.

Map-based interfaces provide a rich setting in which to explore the coordination of gesture with speech and traditional GUI modalities. The tourist information system accommodates the use of a variety of familiar pen gestures, such as circling objects or regions, drawing arrows, X'ing positions or objects, and striking out objects. Depending on context and timing considerations, requests can be derived from single gestures, multiple gestures interpreted together, spoken or handwritten input, point-and-click, or some combination of these operations.

For example, an arrow drawn across a map from right to left (which itself is recognized from two or three pen strokes) is interpreted as a request to scroll the map. The same effect may be achieved by speaking "scroll left". Display of hotels can be obtained by writing or speaking "Show hotels", or, perhaps, "Show hotels with a pool". The distance between two objects or locations may be obtained by circling, X'ing, or clicking on each of them, and then drawing a straight line between them. Alternatively, one can speak "Show the distance from here to here" while selecting two locations, or one can write "distance" either before or after selecting two objects.

This system, and the organization of the input recognition agents, is described in detail in [2]. A related system is described in [15].

CommandTalk