Desktop Teleoperation via the World Wide Web*
`Ken Goldberg, Michael Mascha, Steve Gentner, Nick Rothenberg,
`Carl Sutter, and Jeff Wiegley
`University of Southern California†
`Initiated at CERN in 1992, the "World Wide Web" (WWW,
`including the HTML language and the HTTP protocol) [1]
`provides a standard graphical interface to the Internet.
`"Point-and-click" clients for reading hypertext have been
`ported to most computer platforms; the worldwide number
`of users is well over 500,000 and growing rapidly.
`As a feasibility study, we built a system that allows a robot
`manipulator to be teleoperated via the WWW. Although the
`field of teleoperation dates back over 50 years, the WWW
`provides a low-cost and widely-available interface that can
`make teleoperated resources accessible to anyone with a
`desktop (or laptop!) computer and modem.
`The "Mercury Project" consists of an industrial robot
`arm fitted with a CCD camera and a pneumatic system. We
`placed a sandbox filled with buried artifacts in the robot
`workspace. Using the ISMAP feature of HTTP, users can
`remotely move the camera to view desired locations or direct
`a short burst of compressed air into the sand to view the
`newly cleared region.
`To our knowledge, the Mercury Project is the first system
`to permit WWW users to remotely view and alter the real
`world. Since it came online September 1, 1994, the system
`has been available almost continuously. As of February 1,
`1995, the project had been accessed by over 50,000 unique
`sites around the world1.
`This paper focuses on interface design, robot hardware,
`and system architecture. Archival information including ex-
`ample images, operator logs and the answer to the puzzle is
`available at:
`1 Goals of the Project
`In the Spring of 1994, hundreds of WWW servers were
`coming online every week. We conjectured that it might
`*This work was supported in part by NSF Young Investiga-
`tor Award IRI-9457523 to Prof. Goldberg, who can be reached
`at or (+213) 740-9080. A very early version
`of this paper was presented at the Second International WWW
`Conference, Chicago, IL. Oct 17, 1994.
`†Goldberg, Gentner, and Wiegley are with the Computer Sci-
`ence Department, Mascha and Rothenberg are with the Anthropol-
`ogy Department, Sutter is with the Center for Scholarly Technol-
`ogy.
`The Mercury Project will be decommissioned in March 1995
`to prepare for a new project.
`IEEE International Conference
`on Robotics and Automation
`0-7803-1965-6/95 $4.00 ©1995 IEEE
`be possible to use this medium to allow low cost public
`access to a teleoperated robot, in effect providing "desktop
`teleoperation".
`Figure 1: Robot, camera and air nozzle above workspace.
`As illustrated in Figure 1, we set up a SCARA-type robot
`arm over a semi-annular workspace containing sand and
`buried artifacts. We attached a CCD camera to the end of
`the arm along with a nozzle to direct air bursts into the sand.
`We then developed an interface so this hardware could be
`controlled via the WWW.
`Our primary criterion was that the system be reliable
`enough to operate 24 hours a day and survive user attempts
`at sabotage. A practical criterion was that the system be low
`in cost as we had a limited budget. It is worth noting that the
`manufacturing industry uses similar criteria, reliability and
`cost, to evaluate robots for production. Thus our experience
`with RISC robotics [2] proved helpful.
`Our secondary goal was to create an evolving WWW site
`that would encourage repeat visits by users. Toward this
`end, all of the buried artifacts were derived from an unnamed
`19th Century text. Users are challenged to identify this text
`and thereby collectively solve the "puzzle". After each 5-
`minute operating session, users are prompted to describe
`their findings and hypotheses in an ongoing Operator's Log.
`As of 1 February 1995, although the Log includes over 1000
`pages of entries, the puzzle has yet to be solved.
`2 Related Work
`Goertz demonstrated one of the first "master-slave" tele-
`operators 50 years ago at the Argonne National Laboratory [3].
`Remotely operated mechanisms have long been desired for
`use in inhospitable environments such as radiation sites, un-
`dersea [4] and space exploration [5]. At General Electric,
`Mosher [6] developed a complex two-arm teleoperator with
`video cameras. Prosthetic hands were also applied to teleop-
`eration [7]. More recently, teleoperation is being considered
`for medical diagnosis [8], manufacturing [9] and microma-
`nipulation [10]. See Sheridan [11] for an excellent review
`of the extensive literature on teleoperation and telerobotics.
`Most of these systems require fairly complex hardware
`at the human interface: exoskeleton master linkages are
`attached to the human arm to minimize the kinesthetic ef-
`fects of distance to create a sense of "tele-presence". Our
`objective was to provide widespread access by using only
`the "point-and-click" interface available under the standard
`HTML language.
`A number of WWW sites provide access to remote de-
`vices such as cameras, coffee pots, and Coke machines [12].
`Although we believe our system was the first to allow WWW
`users to manipulate a remote environment, remote motion
`control was independently explored by several other re-
`searchers. In October 1994, Mark Cox of Bradford Univer-
`sity reported a system that allows WWW users to remotely
`schedule photos from a robotic telescope [13] and Rich
`Wallace of NYU demonstrated a remote camera that can
`be selectively aimed using a WWW ISMAP [14]. Shortly
`after the Mercury Project came online, Ken Taylor of the
`University of Western Australia demonstrated a remotely
`controlled six-axis telerobot with a fixed observing camera
`[15]. Although Taylor's system requires users to type in
`spatial coordinates to specify relative arm movements, his
`system allows WWW users to pick up blocks by controlling
`the robot's parallel-jaw gripper.
`3 System Design and User Interface
`To facilitate use by a wide audience of non-specialists, we
`sought to make all robot controls available via the standard
`point-and-click mouse commands as shown in Figure 2.
`This forced us to consider a 2D workspace with only a few
`buttons for out-of-plane effects. Users are trained with an
`on-line tutorial prior to operating the robot.
`The user interface centers around the bitmap that we call
`the "status image" as shown in Figure 3. Any number of
`"observers" can simultaneously view the status image, but
`only the current "operator" can send commands by clicking
`on the image. To limit access to one operator at a time,
`we implemented password authentication and a queue that
`gives each operator 5 minutes at the helm.
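The queueing rule described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name `OperatorQueue`, its methods, and the use of plain numeric timestamps are all assumptions, and password checking is taken to have happened before `join` is called. Only the 5-minute session length comes from the text.

```python
from collections import deque

SESSION_SECONDS = 5 * 60  # session length stated in the text


class OperatorQueue:
    """One operator at a time; everyone else waits in FIFO order."""

    def __init__(self):
        self.waiting = deque()
        self.operator = None
        self.session_start = None

    def join(self, user, now):
        """Add an already-authenticated user to the queue."""
        if user != self.operator and user not in self.waiting:
            self.waiting.append(user)
        self.tick(now)

    def tick(self, now):
        """Expire a finished session and promote the next waiting user."""
        expired = (
            self.operator is not None
            and now - self.session_start >= SESSION_SECONDS
        )
        if expired:
            self.operator = None
        if self.operator is None and self.waiting:
            self.operator = self.waiting.popleft()
            self.session_start = now

    def may_command(self, user, now):
        """Only the current operator's clicks are acted upon."""
        self.tick(now)
        return user == self.operator
```

Observers never enter the queue in this sketch; they would simply be served the cached status image without calling `may_command`.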
`When the operator clicks on the status image using the
`mouse, the XY coordinates are transferred back to our server,
`which interprets them to decode the desired robot action.
`This action can be: (1) a global move to center the camera
`at XY in the schematic workspace, (2) a local move to
`center the camera at XY in the camera image, (3) moving
`the camera to one of two fixed Z heights, or (4) blowing
`a burst of compressed air into the sand directly below the
`camera.
`Figure 2: The interface as viewed by a WWW browser.
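One way to picture how a click on the status image becomes one of the four actions is a lookup over rectangular regions. The sketch below assumes a layout: every pixel rectangle in `REGIONS` is hypothetical (the text does not give the status image geometry), as are the action names and the function `decode_click`.

```python
from typing import NamedTuple, Optional, Tuple


class Region(NamedTuple):
    name: str
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive


# Hypothetical layout; the real status image geometry is not given in the text.
REGIONS = [
    Region("local_move", 0, 0, 192, 165),     # camera image at left
    Region("global_move", 200, 0, 400, 165),  # schematic view at right
    Region("z_up", 0, 170, 40, 190),
    Region("z_down", 45, 170, 85, 190),
    Region("air_burst", 90, 170, 110, 190),
]


def decode_click(x: int, y: int) -> Optional[Tuple[str, int, int]]:
    """Return (action, x, y) relative to the clicked region, or None."""
    for r in REGIONS:
        if r.x0 <= x < r.x1 and r.y0 <= y < r.y1:
            return (r.name, x - r.x0, y - r.y0)
    return None
```

A click that falls outside every region returns `None`, which a server could treat as a request to simply redisplay the current status image.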
`We worked to reduce the size of the status image to min-
`imize turnaround time when a command is issued. The
`average image size for the status image, encoded as a .gif
`file, is 17.3 Kbytes. Although we were able to achieve response
`times of 10 seconds for on-campus users, cycle times of up
`to 60 seconds were reported from users in Europe operating
`via 14.4K telephone lines.
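As a rough check on these figures, the ideal transfer time of one status image over a 14.4K modem can be computed directly. The sketch assumes 1 Kbyte = 1024 bytes, a line rate of exactly 14,400 bits per second, and no protocol or modem overhead; the function name is hypothetical.

```python
def transfer_seconds(size_kbytes: float, line_bits_per_sec: float) -> float:
    """Ideal time to push size_kbytes over a line, ignoring all overhead."""
    bits = size_kbytes * 1024 * 8  # assumes 1 Kbyte = 1024 bytes
    return bits / line_bits_per_sec


modem_time = transfer_seconds(17.3, 14_400)
print(round(modem_time, 1))  # about 9.8 seconds
```

Under these assumptions the image transfer alone accounts for roughly 10 of the up to 60 seconds reported by modem users. The remainder must come from elsewhere in the loop; the text does not break it down.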
`Just for fun, we created a fictional context for the system,
`inventing the history of a deceased paleohydrologist who
`had discovered unexplained artifacts in a radioactive region
`of southwest Nevada. We explained that the Mercury robot
`was originally developed to explore that region and that one
`mandate of our grant was to make our system "available to
`the broader scientific community". A hypertext document
`describing this background provides an online introduction.
`4 Robot and Camera
`The SCARA robot is an IBM SR5427 built by Sankyo in
`early 1980. SCARA stands for "Selective Compliance As-
`sembly Robot Arm"; such robots are common in industrial assembly for
`"pick-and-place" operations because they are fast, accurate and
`have a large 2.5D workspace. We selected this robot over
`other robots in our lab due to its excellent durability, large
`workspace, and because it was gathering dust in our lab.
`Figure 3: The "status image". At the right is a schematic top view of the semi-annular workspace and robot linkage. At left
`is a CCD camera image of the view directly beneath the robot end-effector. Up/Down buttons are included for Z motion of
`the camera, and the round button is used to blow a burst of compressed air into the sand.
`Unfortunately IBM no longer supports this robot and we
`were forced to read two antiquated BASIC programs and
`monitor their serial line transmissions to decipher the proto-
`cols needed for serial control of the robot. The robot accepts
`joint motion commands using IEEE format and checksums.
`To allow users to manipulate the remote environment we
`initially planned to place a simple gripper at the end effector.
`Anticipating user attempts at sabotage (which is, after all, the
`time-honored hacker tradition), we opted to use compressed
`air as the medium for manipulation.
`The CCD camera is an EDC 1000 from Electrim Inc. This
`camera was chosen based on size and cost. Image data is
`sent from the camera back through a custom serial line to
`a video capture card. The camera image has a resolution
`of 192 by 165 pixels with 256 shades of gray, which we
`truncate to 64 shades to reduce transfer time. Exposure
`time can be changed by software to range from 64ms to
`200ms. Although we slowed the robot to minimize dynamic
`effects, mechanical settling times are long enough to cause
`image blur at the camera. To avoid this, we implemented a
`stability check by taking two images separated by 64ms and
`differencing them. Subsequent images are taken until the
`two successive images are sufficiently similar.
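The gray-level truncation and the stability check can be sketched together. This is an illustration under stated assumptions: images are flat lists of 8-bit values, `grab` stands in for the camera driver, similarity is measured by mean absolute difference, and `threshold` and `max_frames` are hypothetical tuning values. The text specifies only that two frames 64ms apart are differenced until they are sufficiently similar.

```python
from typing import Callable, List

Image = List[int]  # flat list of 8-bit gray values; a simplifying assumption


def truncate_to_64_shades(image: Image) -> Image:
    """Keep the top 6 bits of each 8-bit pixel, giving values 0..63."""
    return [p >> 2 for p in image]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average per-pixel absolute difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def capture_stable(grab: Callable[[], Image],
                   threshold: float = 2.0,
                   max_frames: int = 50) -> Image:
    """Grab frames until two successive ones differ by less than threshold.

    `grab` is expected to block for one exposure (64 ms in the text).
    """
    previous = grab()
    for _ in range(max_frames):
        current = grab()
        if mean_abs_diff(previous, current) < threshold:
            return current
        previous = current
    return previous  # give up and use the last frame
```

The `max_frames` cap is an addition of this sketch so that a vibrating arm cannot stall the server indefinitely.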
`To avoid the complexity of another servo motor, we use
`a fixed focus camera and choose a focal point that compro-
`mises between the two fixed camera heights. The workspace
`is primarily illuminated by standard fluorescent fixtures. We
`tested a contrast enhancement routine to normalize the light-
`ing of each image captured from the camera. This increased
`image quality in most cases but exaggerated intensity vari-
`ations across the workspace.
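The text does not say which contrast enhancement routine was tested. As an illustration only, a linear min-max stretch is one common choice; the function name and the flat-list image representation are assumptions.

```python
from typing import List


def stretch_contrast(image: List[int], levels: int = 256) -> List[int]:
    """Linearly rescale gray values so they span 0..levels-1."""
    lo, hi = min(image), max(image)
    if hi == lo:  # flat image: nothing to stretch
        return list(image)
    scale = (levels - 1) / (hi - lo)
    return [round((p - lo) * scale) for p in image]
```

With a routine of this kind, an image whose values span only 20 gray levels is expanded to the full range, so small lighting differences between regions become large ones. That is consistent with the exaggerated intensity variations reported above, though the actual routine may have differed.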
`5 System Architecture
`As shown in Figure 4, WWW clients from around the world
`enter our system through the Internet. The system includes
`three communicating subsystems. Server A responds to
`Universal Resource Locator (URL) requests for any file in
`the raiders/ directory. Server A runs the vanilla NCSA HTTP
`Daemon v.1.3 on a Sun SPARCserver 1000, with SunOS
`Release 5.3. Server A caches the most recent status image
`and sends it whenever an observer request comes in.
`Figure 4: System Architecture
`When a user registers as an operator by entering a pass-
`word, we use a database server to verify. This server, B,
`runs on the same machine as Server A. The database server
`is custom programmed for this project, but performs fairly
`standard database functions.
`When an operator is verified, Server A either adds the
`operator to the queue or communicates with Server C which
`controls the robot. Server A decodes the ISMAP X and Y
`mouse coordinates, and sends them across campus to Server
`C via Ethernet.
`On Server C, a custom program decodes the XY coordi-
`nates into a robot command and verifies that the command is
`legal, e.g., within the robot workspace. If it is, the command
`is then executed via a command sent to the robot over a 4800
`baud serial line. Once the command is completed, server C
`uses a local frame buffer to capture the image.
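The legality check on Server C can be illustrated for a semi-annular workspace. The radii below are hypothetical (the text gives no dimensions), coordinates are assumed to be measured from the robot base with the workspace in the half-plane y >= 0, and `send_to_robot` is a stand-in for the serial-line command.

```python
import math

# Hypothetical dimensions in metres; the real workspace radii are not given.
R_INNER = 0.25
R_OUTER = 0.60


def is_legal_target(x: float, y: float) -> bool:
    """True if (x, y), measured from the robot base, lies in the half-ring."""
    r = math.hypot(x, y)
    return y >= 0.0 and R_INNER <= r <= R_OUTER


def handle_request(x: float, y: float, send_to_robot) -> bool:
    """Forward only legal targets; `send_to_robot` is a stand-in callable."""
    if not is_legal_target(x, y):
        return False
    send_to_robot(x, y)
    return True
```

Rejecting illegal targets before anything reaches the serial line keeps malformed or hostile requests away from the robot controller.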
`Server C then generates a new schematic view of the robot
`in the resulting configuration, combines it with the camera
`image and appropriately highlighted control buttons to form
`Servers A and C are at opposite ends of the USC campus
`and are connected via Ethernet. Each machine has its own IP
`address and resides in the domain. Communication
`is achieved using a socket connection between the two ma-
`chines. The implementation on Server A was done using the
`standard BSD socket functions provided with the SunOS 4.1
`operating system and Perl. On Server C we used a publicly
`available socket package called Waterloo TCP and Borland
`C. The Waterloo TCP package was obtained from the ftp site
` in the file /pub/msdos/
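The socket exchange between the two machines can be sketched in a few lines. The original used Perl on one side and Borland C with Waterloo TCP on the other; the Python below is only an illustration run over the loopback interface. The wire format (two unsigned 16-bit integers), the `OK` reply, and all function names are assumptions; in the real system the reply was the new status image.

```python
import socket
import struct
import threading

# Hypothetical wire format: two unsigned 16-bit integers in network byte order.
CLICK = struct.Struct("!HH")


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since TCP may deliver them in pieces."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data += chunk
    return data


def serve_one_click(listener: socket.socket, handled: list) -> None:
    """Robot side: accept one connection, read one click, acknowledge it."""
    conn, _ = listener.accept()
    with conn:
        x, y = CLICK.unpack(recv_exact(conn, CLICK.size))
        handled.append((x, y))
        conn.sendall(b"OK")


def send_click(port: int, x: int, y: int) -> bytes:
    """Web-server side: send one click and wait for the reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(CLICK.pack(x, y))
        return recv_exact(conn, 2)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
handled = []
worker = threading.Thread(target=serve_one_click, args=(listener, handled))
worker.start()
reply = send_click(listener.getsockname()[1], 120, 85)
worker.join()
listener.close()
```

A fixed-size binary message keeps the receiving side simple, which matters on a single-tasking machine that must also service the robot and camera.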
`6 Performance
`We expected that the system would fail after about 6 weeks
`of continuous use. Although Gentner goes in to groom the
`sand once a day, the system is still in operation and has run
`unattended for the past 6 months.
`Network throughput averages 20 Kbytes/sec, which is
`poor compared with 500 Kbytes/sec that can be achieved
`between two Sun workstations in close proximity on the
`campus network. At this time we feel that the delays are
`being imposed by the MS-DOS operating system running
`on Server C because of its inability to support networking
`operations and its lack of multitasking abilities, which ne-
`cessitates busy waiting cycles in the PC software to obtain
`concurrence between the robotic/camera operations and the
`networking duties.
`When server C detects an error, it automatically resets
`the robot controller, recalibrates, and returns the robot to
`its previous position. Also, server A automatically sends
`email if any of the key servers stop functioning. This occurs
`on average twice a month usually due to re-starts of the
`primary USC server. Server A also sends mail to the project
`team if server C stops responding, which occurs about once
`a month.
`We monitor system usage with standard access-logs and
`with custom logs at Server B. In WWW parlance, a "hit" is
`a client request for a file from our system directory tree. In
`the period 1 Aug, 1994 through 1 Feb, 1995: 1,968,637 hits
`were made by 52,153 unique hosts (see Figures 6 and 7). If
`we define "uses" as clusters of hits with less than a half hour
`of idle time, the system was used 87,700 times due to repeat
`visits. The daily average was 430 uses which generated
`approximately 1000 new images. In 1994, the Mercury
`Project accounted for roughly half of all requests to USC's
`WWW server.
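The definition of a "use" as a cluster of hits with less than a half hour of idle time can be made concrete. The sketch assumes each log entry has been reduced to a (host, timestamp-in-seconds) pair and that clusters are formed per host; the function name and that input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

IDLE_LIMIT = 30 * 60  # seconds; the half-hour idle limit from the text


def count_uses(hits: Iterable[Tuple[str, float]]) -> int:
    """Count per-host clusters of hits separated by at least IDLE_LIMIT.

    `hits` is an iterable of (host, unix_timestamp) pairs, a hypothetical
    simplification of an access-log line.
    """
    by_host: Dict[str, List[float]] = defaultdict(list)
    for host, t in hits:
        by_host[host].append(t)

    uses = 0
    for times in by_host.values():
        times.sort()
        uses += 1  # the first hit from a host always opens a use
        for earlier, later in zip(times, times[1:]):
            if later - earlier >= IDLE_LIMIT:
                uses += 1
    return uses
```

For example, a host that makes two hits ten minutes apart and returns an hour later contributes two uses, while a second host with two hits close together contributes one.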
`Space prevents us from discussing more sophisticated
`analysis of usage patterns such as the deterministic finite
`automaton transition model created by Wallace and Fisher
`from over 1000 pages of operator's logs:
`From: Rex Kwok <>
`Date: Thu Nov 3 21:52:17 PST 1994
`Figure 5: Sample camera images: Top row shows scene be-
`fore burst of compressed air, bottom row after. Left column
`taken by camera in the up position, Right column by camera
`in the down position.
`a new status image. Server C then compresses this image
`into GIF format and returns it to Server A, which updates
`the most recent status image and returns it to the client.
`To maintain compatibility with the widest possible set of
`user platforms, we stayed within the standard HTTP pro-
`tocol. For example, although X windows permit live video
`feed, we sacrificed this feature for the sake of compatibil-
`ity. We hope that future versions of the protocol will allow
`the server to connect to and update clients to avoid manual
`re-loading of images.
`The major difficulty in implementing Server C was
`scheduling responses to the network, the local mouse, and
`the image capture board. Although we seriously considered
`a multi-tasking environment such as Linux, the Electrim
`camera was only compatible with DOS and the company
`would not part with any source code. Thus we hand-crafted
`our memory management and used the screen itself as a
`memory buffer. This enabled us to speed up a custom GIF
`encoder to a few microseconds per status image.
`5.1 Random Tokens
`Each time Server A returns a new status image to an operator
`or observer, it adds a large random number to its embedded
`URL for the update button. This random token prevents
`the client from caching the status image (otherwise repeated
`requests to update the image would simply reload the local
`image and not request an updated image from Server A).
`The random token also allows Server A to identify and
`track clients. When an operator logs in with a verified pass-
`word, Server A tracks the operator by maintaining a database
`of recent clients so that URL requests can be customized de-
`pending on the user's status. For example the queue is only
`visible to the operators and those on deck.
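The two roles of the random token, defeating client caches and tracking clients, can be sketched as follows. The path `/update`, the query parameter `token`, the in-memory `clients` table, and both function names are hypothetical, and Python's `secrets` module stands in for whatever random source the original server used.

```python
import secrets
from typing import Dict

# Hypothetical path; the text does not give the real URL of the update button.
UPDATE_PATH = "/update"

clients: Dict[str, str] = {}  # token -> status ("observer", "operator", ...)


def new_update_url(status: str) -> str:
    """Issue a fresh token, remember who holds it, and embed it in the URL."""
    token = secrets.token_hex(8)  # 64 random bits as 16 hex digits
    clients[token] = status
    return f"{UPDATE_PATH}?token={token}"


def status_for(url: str) -> str:
    """Look up the status of the client that sent this URL back."""
    token = url.rsplit("token=", 1)[-1]
    return clients.get(token, "unknown")
```

Because every page carries a different URL, a browser cannot satisfy the next update request from its local cache, and the server can tailor its response (for example, whether to show the queue) to the token it receives.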
`Figure 8: Composite image of workspace with artifacts such as miniature lantern, seed packet, etc.
`Figure 6: Cumulative number of unique (new) hosts access-
`ing the project.
`"FANTASTIC! It is amazing to operate a robot arm from
`From: Scott Hankin <>
`Date: Fri Sep 23 09:34:59 PDT 1994
`"...this site seems similar to the Internet. The search is
`analogous to trying to find something on the net, where you
`scan to locate areas of interest. Sometimes you'll encounter
`a useful nugget of information like [the antique lantern];
`other times you'll discover information which seems valid
`but may be misleading, like the sample of 'fool's gold'.
`Some information is in different languages, like the scrap of
`paper with text in English and German which points to the
`multinational nature of the net."
`Figure 7: Breakdown of total number of hits by continent.
`From: Dr. Steve M. Potter <>
`Date: Thu Oct 27 23:30:09 PDT 1994
`"What fun! Cool idea, folks. Best use of forms and click-
`able maps I have seen ... I was wondering how I know this
`is not a clever laser-disk full of pictures you grabbed, with
`no robot, until i saw the time on the watch after blasting it.
`That was when my skepticism evaporated."
`And our favorite ...
`From: James Bryant <>
`Date: Sat Sep 10 08:54:11 PDT 1994
`"I don't believe I have seen a nicer application of science,
`or its money on the net."
`8 Discussion and Future Applications
`The system design exemplifies RISC Robotics, which ad-
`vocates Reduced Intricacy in Sensing and Control. The
`SCARA-type robot requires only 4 axes, is relatively inex-
`pensive and robust, and it is easy to avoid singularities. The
`end effector we've used here is also about the minimum.
`For more on RISC as applied to industrial robotics, see [2].
`We view this project as a feasibility study for a broad
`range of new applications using the Internet to bring remote
`classrooms and scientific lab equipment to a much wider au-
`dience. Remote scholars might gain access to priceless and
`otherwise inaccessible resources (a Grecian urn, a Guten-
`berg Bible, etc.), thus providing an alternative to pre-stored
`libraries which are limited in terms of perspective and depth
`of resolution.
`Since we can offer no guarantees about transmission
`times, our interface design is not suitable for time-critical
`interactions such as remote assembly with force feedback.
`Our system also suffers by forcing operators to work se-
`quentially and wait in a queue. In our next project, we
`hope to multi-task the robot so that many operators can be
`accommodated in parallel.
`A variant of our system might allow high-school students
`to collect rock samples from the moon. Another version
`might allow researchers to remotely control a Scanning Tun-
`nelling Electron Microscope. The NSF and ARPA recently
`proposed similar ideas for "Virtual Laboratories" [9]. We
`believe that widespread access to "desktop teleoperation"
`can permit the next generation of students and researchers
`to share experiences that can advance basic and applied sci-
`ence.
`Figure 9: The Mercury robot and authors (L to R: Mascha,
`Goldberg, Sutter, Wiegley, Rothenberg, and Gentner. photo
`by Irene Firtig.)
`Richard Wallace of NYU, George Bekey, Andy Fagg, Juer-
`gen Rossman of U. Dortmund, Peter Danzig, Eric Mankin,
`Irene Firtig, John Canny, Eric Paulos, Victoria Vesna, Peter
`Lunenfeld, Zane Vellm, the ICRA reviewers, the Los Angeles
`Museum of Miniatures, the Laika Foundation, and everyone
`who participated by operating the robot.
`[1] Tim Berners-Lee, Robert Cailliau, Jean-Francois
`Groff, and Bernd Pollermann. World-wide web: The in-
`formation universe. Electronic Networking: Research,
`Applications and Policy, 8(2), Westport CT,
`Spring
`[2] John Canny and Ken Goldberg. "RISC" for in-
`dustrial robotics: Recent results and open prob-
`lems. In International Conference on Robotics
`and Automation. IEEE, May 1994. Also avail-
`able via anonymous ftp from under
`[3] Raymond Goertz and R. Thompson. Electronically
`controlled manipulator. Nucleonics, 1954.
`[4] R. D. Ballard. A last long look at Titanic. National
`Geographic, 170(6), December 1986.
`[5] A. K. Bejczy. Sensors, controls, and man-machine
`interface for advanced
`208(4450), 1980.
`[6] R. S. Mosher. Industrial manipulators. Scientific Amer-
`ican, 211(4), 1964.
`[7] R. Tomovic. On man-machine control. Automatica, 5,
`[8] A. Bejczy, G. Bekey, R. Taylor, and S. Rovetta. A re-
`search methodology for tele-surgery with time delays.
`In First International Symposium on Medical Robotics
`and Computer Assisted Surgery, Sept 1994.
`[9] Matthew Gertz, David Stewart, and Pradeep Khosla.
`A human-machine interface for distributed virtual lab-
`oratories. IEEE Robotics and Automation Magazine,
`December 1994.
`[10] T. Sato, J. Ichikawa, M. Mitsuishi, and Y. Hatamura.
`A new micro-teleoperation system employing a hand-
`held force feedback pencil. In International Confer-
`ence on Robotics and Automation. IEEE, May 1994.
`[11] Thomas B. Sheridan. Telerobotics, Automation, and
`Human Supervisory Control. MIT Press, 1992.
`[12] http://akebono.stanford.edu/yahoo/Computers/In-
`We are grateful to the following for their support and help-
`ful suggestions: Howard Moraff of NSF, Rick Lacy and
`Mark Brown from USC's Center for Scholarly Technology,
`[16] Richard S. Wallace and Shana M. Fisher. Finite-state
`machine models of world-wide web clients. Technical
`report, NYU CS TR 700, March 1995.
