`
`IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL. 9, NO. 2, APRIL 1993
`
`Automated Tracking and Grasping of a Moving
`Object with a Robotic Hand-Eye System
`
`Peter K. Allen, Member, IEEE, Aleksandar Timcenko, Student Member, IEEE,
`Billibon Yoshimi, Student Member, IEEE, and Paul Michelman, Member, IEEE
`
`Abstract- Most robotic grasping tasks assume a stationary
`or fixed object. In this paper, we explore the requirements for
`tracking and grasping a moving object. The focus of our work
`is to achieve a high level of interaction between a real-time
`vision system capable of tracking moving objects in 3-D and a
`robot arm with gripper that can be used to pick up a moving
`object. There is an interest in exploring the interplay of hand-eye
`coordination for dynamic grasping tasks such as grasping of
`parts on a moving conveyor system, assembly of articulated
`parts, or for grasping from a mobile robotic system. Coordination
`between an organism's sensing modalities and motor control
`system is a hallmark of intelligent behavior, and we are pursuing
`the goal of building an integrated sensing and actuation system
`that can operate in dynamic as opposed to static environments.
`The system we have built addresses three distinct problems in
`robotic hand-eye coordination for grasping moving objects: fast
`computation of 3-D motion parameters from vision, predictive
`control of a moving robotic arm to track a moving object, and
`interception and grasping. The system is able to operate at
approximately human arm movement rates, and experimental results are presented in which a moving model train is tracked, stably grasped, and picked up by the system. The algorithms we
`have developed that relate sensing to actuation are quite general
`and applicable to a variety of complex robotic tasks that require
`visual feedback for arm and hand control.
`
`I. INTRODUCTION
THE focus of our work is to achieve a high level of interaction between a real-time vision system capable of tracking moving objects in 3-D and a robot arm equipped
`with a dextrous hand that can be used to intercept, grasp,
`and pick up a moving object. We are interested in exploring
`the interplay of hand-eye coordination for dynamic grasping
`tasks such as grasping of parts on a moving conveyor system,
`assembly of articulated parts, or for grasping from a mobile
`robotic system. Coordination between an organism's sensing
`modalities and motor control system is a hallmark of intelli-
`gent behavior, and we are pursuing the goal of building an
`integrated sensing and actuation system that can operate in
`dynamic as opposed to static environments.
`There has been much research in robotics over the last few
`years that addresses either visual tracking of moving objects
`or generalized grasping problems. However, there have been
`
Manuscript received October 14, 1991; revised June 2, 1992. This work was supported in part by DARPA under Contract N00039-84-C-0165, by NSF under Grants DMC-86-05065, DCI-86-08845, CCR-86-12709, IRI-86-57151, and IRI-88-1319, by North American Philips Laboratories, by Siemens Corporation, and by Rockwell Inc.
`The authors are with the Department of Computer Science, Columbia
`University, New York, NY 10027.
`IEEE Log Number 9207358.
`
`few efforts that try to link the two problems. It is quite clear
`that complex robotic tasks such as automated assembly will
`need to have integrated systems that use visual feedback to
`plan, execute, and monitor grasping.
`The system we have built addresses three distinct problems
in robotic hand-eye coordination for grasping moving objects:
`fast computation of 3-D motion parameters from vision, pre-
`dictive control of a moving robotic arm to track a moving
`object, and interception and grasping. The system is able to
`operate at approximately human arm movement rates, using
`visual feedback to track, intercept, stably grasp, and pick up a
`moving object. The algorithms we have developed that relate
`sensing to actuation are quite general and applicable to a
`variety of complex robotic tasks that require visual feedback
`for arm and hand control.
`Our work also addresses a very fundamental and lim-
`iting problem that is inherent in building integrated sens-
`ing/actuation systems; integration of systems with different
`sampling and processing rates. Most complex robotic systems
`are actually amalgams of different processing devices, con-
`nected by a variety of methods. For example, our system
`consists of three separate computation systems: a parallel
`image processing computer; a host computer that filters, trian-
`gulates, and predicts 3-D position from the raw vision data; and
`a separate arm control system computer that performs inverse
`kinematic transformations and joint-level servoing. Each of
`these systems has its own sampling rate, noise characteristics,
`and processing delays, which need to be integrated to achieve
`smooth and stable real-time performance. In our case, this
`involves overcoming visual processing noise and delays with
`a predictive filter based upon a probabilistic analysis of the
`system noise characteristics. In addition, real-time arm control
`needs to be able to operate at fast servo rates regardless of
`whether new predictions of object position are available.
`The system consists of two fixed cameras that can image a
`scene containing a moving object (Fig. 1). A PUMA-560 with
`a parallel jaw gripper attached is used to track and pick up the
`object as it moves (Fig. 2). The system operates as follows:
`1) The imaging system performs a stereoscopic optic-flow
`calculation at each pixel in the image. From these optic-
`flow fields, a motion energy profile is obtained that forms
`the basis for a triangulation that can recover the 3-D
`position of a moving object at video rates.
`2) The 3-D position of the moving object computed by
`step 1 is initially smoothed to remove sensor noise,
`and a nonlinear filter is used to recover the correct
`
`
`
`
`
(Figure 1 labels: stereo cameras; circle of radius 250 millimeters.)
`Fig. 1. Tracking/grasping system.
`
`trajectory parameters which can be used for forward pre-
`diction, and the updated position is sent to the trajectory-
`planner/arm-control system.
`3) The trajectory planner updates the joint-level servos of
`the arm via kinematic transform equations. An additional
`fixed-gain filter is used to provide servo-level control
`in case of missed or delayed communication from the
`vision and filtering system.
`4) Once tracking is stable, the system commands the arm
`to intercept the moving object and the hand is used to
`grasp the object stably and pick it up.
`The following sections of the paper describe each of these
`subsystems in detail along with experimental results.
`
II. PREVIOUS WORK
`Previous efforts in the areas of motion tracking and real-
`time control are too numerous to exhaustively list here. We
`instead list some notable efforts that have inspired us to use
`similar approaches. Burt et al. [9] have focused on high-
`speed feature detection and hierarchical scaling of images
`in order to meet the real-time demands of surveillance and
`other robotic applications. Related work has been reported by
`Lee and Wohn [29] and Wiklund and Granlund [43] who use
`image differencing methods to track motion. Corke, Paul, and
`Wohn [13] report a feature-based tracking method that uses
`special-purpose hardware to drive a servo controller of an
arm-mounted camera. Goldenberg et al. [16] have developed
`a method that uses temporal filtering with vision hardware
`similar to our own. Luo, Mullen, and Wessel [30] report a real-
`time implementation of motion tracking in 1-D based on Horn
and Schunck's method. Verghese et al. [41] report real-time
`short-range visual tracking of objects using a pipelined system
`similar to our own. Safadi [37] uses a tracking filter similar to
`our own and a pyramid-based vision system, but few results
`are reported with this system. Rao and Durrant-Whyte [36]
`have implemented a Kalman filter-based decentralized tracking
`
Fig. 2. Experimental hardware.
`
`system that tracks moving objects with multiple cameras.
`Miller [31] has integrated a camera and arm for a tracking
`task where the emphasis is on learning kinematic and control
`parameters of the system. Weiss et al. [42] also use visual
`feedback to develop control laws for manipulation. Brown [8]
`has implemented a gaze control system that links a robotic
`“head” containing binocular cameras with a servo controller
`that allows one to maintain a fixed gaze on a moving object.
`Clark and Ferrier [12] also have implemented a gaze control
`system for a mobile robot. A variation of the tracking problems
`is the case of moving cameras. Some of the papers addressing
this interesting problem are [9], [15], [44], and [18].
`The majority of literature on the control problems encoun-
`tered in motion tracking experiments is concerned with the
`problem of generating smooth, up-to-date trajectories from
`noisy and delayed outputs from different vision algorithms.
`Our previous work [4] coped with that problem in a similar
way as in [38], using an α-β-γ filter, which is a
`form of steady-state Kalman filter. Other approaches can be
`found in papers by [33], [34], [28], [6]. In the work of
`Papanikolopoulos et al. [33], [34], visual sensors are used in
`the feedback loop to perform adaptive robotic visual tracking.
`Sophisticated control schemes are described which combine a
`Kalman filter’s estimation and filtering power with an optimal
`(LQG) controller which computes the robot’s motion. The
`vision system uses an optic-flow computation based on the
`SSD (sum of squared differences) method which, while time
`consuming, appears to be accurate enough for the tracking
`
`
`task. Efficient use of windows in the image can improve
`the performance of this method. The authors have presented
`good tracking results, as well as stated that the controller is
`robust enough so the use of more complex (time-varying LQG)
`methods is not justified. Experimental results with the CMU
Direct Drive Arm II show that the methods are quite accurate,
`robust, and promising.
`The work of Lee and Kay [28] addresses the problem of
uncertainty of cameras in the robot's coordinate frame. Requiring the cameras to be rigidly fixed in the robot's frame can be quite burdensome, since each time they are (most often accidentally) displaced, one must undertake the tedious job of recalibrating them. Again, the estimation of the moving
`object’s position and orientation is done in the Cartesian space
`and a simple error model is assumed. Andersen et al. [6] adopt
`a 3rd-order Kalman filter in order to allow a robotic system
`(consisting of two degrees of freedom) to play the labyrinth
`game. A somewhat different approach has been explored in the
`work of Houshangi [24] and Koivo et al. [27]. In these works,
the autoregressive (AR) and autoregressive moving-average with
`exogenous input (ARMAX) models are investigated for visual
`tracking.
`
III. VISION SYSTEM
`In a visual tracking problem, motion in the imaging system
`has to be translated into 3-D scene motion. Our approach
`is to initially compute local optic-flow fields that measure
`image velocity at each pixel in the image. A variety of
`techniques for computing optic-flow fields have been used with
varying results including matching-based techniques [5], [10], [39], gradient-based techniques [23], [32], [11], and spatio-
`temporal energy methods [20], [2]. Optic-flow was chosen as
`the primitive upon which to base the tracking algorithm for
`the following reasons.
`The ability to track an object in three dimensions implies
`that there will be motion across the retinas (image planes)
`that are imaging the scene. By identifying this motion in
`each camera, we can begin to find the actual 3-D motion.
`The principal constraint in the imaging process is high
`computational speed to satisfy the update process for
`the robotic arm parameters. Hence, we needed to be
`able to compute image motion quickly and robustly. The
Horn-Schunck optic-flow algorithm (described below) is
`well suited for real-time computation on our PIPE image
`processing engine.
`We have developed a new framework for computing
`optic-flow robustly using an estimation-theoretic frame-
`work [40]. While this work does not specifically use these
`ideas, we have future plans to try to adapt this algorithm
`to such a framework.
Our method begins with an implementation of the Horn-Schunck method of computing optic-flow [22]. The
`underlying assumption of this method is the optic-flow
`constraint equation, which assumes image irradiance at time t
and t + δt will be the same:

I(x + \delta x,\; y + \delta y,\; t + \delta t) = I(x, y, t).    (1)
`
`If we expand this constraint via a Taylor series expansion,
`and drop second- and higher-order terms, we obtain the form
of the constraint we need to compute normal velocity:

I_x u + I_y v + I_t = 0    (2)

where u and v are the velocities in image space, and I_x, I_y, and I_t are the spatial and temporal derivatives in the
`image. This constraint limits the velocity field in an image
`to lie on a straight line in velocity space. The actual velocity
`cannot be determined directly from this constraint due to
`the aperture problem, but one can recover the component of
velocity normal to this constraint line as

u_{\perp} = \frac{-I_t}{\sqrt{I_x^2 + I_y^2}}.    (3)
`A second, iterative process is usually employed to propagate
`velocities in image neighborhoods, based upon a variety of
`smoothness and heuristic constraints. These added neighbor-
hood constraints allow for recovery of the actual velocities u, v in the image. While computationally appealing, this method
`of determining optic-flow has some inherent problems. First,
`the computation is done on a pixel-by-pixel basis, creating a
`large computational demand. Second, the information on optic
`flow is only available in areas where the gradients defined
`above exist.
`We have overcome the first of these problems by using
`the PIPE image processor [26], [7]. The PIPE is a pipelined
`parallel image processing computer capable of processing 256
`x 256 x 8 bit images at frame rate speeds, and it supports
`the operations necessary for optic-flow computation in a pixel-
`parallel method (a typical image operation such as convolution,
warping, addition/subtraction of images can be done in one cycle of 1/60 s). The second problem is alleviated by our not
`needing to know the actual velocities in the image. What we
`need is the ability to locate and quantify gross image motion
`robustly. This rules out simple differencing methods which are
`too prone to noise and will make location of image movement
`difficult. Hence, a set of normal velocities at strong gradients
`is adequate for our task, precluding the need to iteratively
`propagate velocities in the image.
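To make the computation concrete, a minimal software analogue of the normal-flow calculation in (1)-(3) is sketched below in NumPy/SciPy. This is an illustration only, not the PIPE implementation: the smoothing width, gradient floor, and velocity threshold are assumed example values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normal_flow(prev_img, next_img, sigma=1.5, grad_floor=1e-3, vel_thresh=1.0):
    """Per-pixel normal component of optic flow from the constraint equation (3).

    Returns -It/|grad I| where the spatial gradient is strong, and 0 elsewhere.
    """
    i0 = gaussian_filter(prev_img.astype(float), sigma)   # smoothed frame at t
    i1 = gaussian_filter(next_img.astype(float), sigma)   # smoothed frame at t + dt

    i_t = i1 - i0                        # temporal derivative
    i_y, i_x = np.gradient(i1)           # spatial derivatives
    grad_mag = np.hypot(i_x, i_y)

    v_n = np.zeros_like(i_t)
    strong = grad_mag > grad_floor       # normal flow is defined only where gradients exist
    v_n[strong] = -i_t[strong] / grad_mag[strong]

    v_n[np.abs(v_n) < vel_thresh] = 0.0  # threshold weak responses, as in the PIPE pipeline
    return v_n
```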
`
`A. Computing Normal Optic-Flow in Real-Time
`Our goal is to track a single moving object in real time. We
`are using two fixed cameras that image the scene and need to
`report motion in 3-D to a robotic arm control program. Each
`camera is calibrated with the 3-D scene, but there is no explicit
`need to use registered (i.e., scan-line coherence) cameras. Our
`method computes the normal component of optic-flow for each
`pixel in each camera image, finds a centroid of motion energy
`for each image, and then uses triangulation to intersect the
`back-projected centroids of image motion in each camera. Four
`processors are used in parallel on the PIPE. The processors are
assigned two per camera, for the calculation of the X and Y motion energy centroids in each image. We also
`use a special processor board (ISMAP) to perform real-time
`
`
`Fig. 3. PIPE motion tracking algorithm.
`histogramming. The steps below correspond to the numbers
`in Fig. 3.
`The camera images the scene and the image is sent
`to processing stages in the PIPE.
`The image is smoothed by convolution with a Gauss-
`ian mask. The convolution operator is a built-in
`operation in the PIPE and it can be performed in
`one frame cycle.
`In the next two cycles, two more images are read in,
`smoothed and buffered, yielding smoothed images
I_0, I_1, and I_2. The ability to buffer and pipeline
`images allows temporal operations on images, albeit
`at the cost of processing delays (lags) on output.
`There are now three smoothed images in the PIPE,
`with the oldest image lagging by 3/60 s.
Images I_0 and I_2 are subtracted, yielding the temporal derivative I_t.
In parallel with step 5, image I_1 is convolved with a 3 x 3 horizontal spatial gradient operator, returning the discrete form of I_x. In parallel, the vertical spatial gradient is calculated, yielding I_y (not shown).
`The results from steps 5 and 6 are held in buffers
`and then are input to a look-up table that divides
`the temporal gradient at each pixel by the absolute
`value of the summed horizontal and vertical spatial
`gradients [which approximates the denominator in
`(3)]. This yields the normal velocity in the image at
`each pixel. These velocities are then thresholded and
`any isolated (i.e., single pixel motion energy) blobs
`are morphologically eroded. The above threshold
`velocities are then encoded as gray value 255. In
`our experiments, we thresholded all velocities below
`10 pixels per 60 ms to zero velocity.
`
9-10) In order to get the centroid of the motion information,
`we need the X and Y coordinates of the motion
`energy. For simplicity, we show only the situation for
`the X coordinate. The gray-value ramp in Fig. 3 is
`an image that encodes the horizontal coordinate value
`(0-255) for each point in the image as a gray value.
`Thus, it is an image that is black (0) at horizontal
`pixel 0 and white (255) at horizontal pixel 255. If
we logically AND each pixel of the above-threshold
`velocity image with the ramp image, we have an
`image which encodes high velocity pixels with their
`positional coordinates in the image, and leaves pixels
`with no motion at zero.
11) By taking this result and histogramming it, via a
`special stage of the PIPE which performs histograms
`at frame rate speeds, we can find the centroid of the
`moving object by finding the mean of the resulting
`histogram. Histogramming the high-velocity position
`encoded images yields 256 16-bit values (a result for
`each intensity in the image). These 256 values can
`be read off the PIPE via a parallel interface in about
`10 ms. This operation is performed in parallel to find
`the moving object’s Y centroid (and in parallel for X
`and Y centroids for camera 2). The total associated
`delay time for finding the centroid of a moving object
`becomes 15 cycles or 0.25 s.
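In software, the centroid extraction of steps 9-11 (threshold, AND with a coordinate ramp, histogram, take the mean) can be mimicked as in the sketch below; the function name and the default threshold are illustrative assumptions.

```python
import numpy as np

def motion_centroid(normal_flow_mag, vel_thresh=10.0):
    """Centroid (x, y) of above-threshold motion energy, following the
    ramp-image/histogram scheme described for the PIPE."""
    h, w = normal_flow_mag.shape
    moving = np.abs(normal_flow_mag) >= vel_thresh        # binary motion-energy image

    if not moving.any():
        return None                                       # no motion detected in this frame

    x_ramp = np.tile(np.arange(w), (h, 1))                # gray-value ramp encoding column index
    y_ramp = np.tile(np.arange(h)[:, None], (1, w))       # ramp encoding row index

    # Histogram the position-encoded moving pixels; the histogram mean is the centroid.
    x_hist = np.bincount(x_ramp[moving], minlength=w)
    y_hist = np.bincount(y_ramp[moving], minlength=h)
    cx = (np.arange(w) * x_hist).sum() / x_hist.sum()
    cy = (np.arange(h) * y_hist).sum() / y_hist.sum()
    return cx, cy
```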
`The same algorithm is run in parallel on the PIPE for the
`second camera. Once the motion centroids are known for
`each camera, they are back-projected into the scene using the
`camera calibration matrices and triangulated to find the actual
`3-D location of the movement. Because of the pipelined nature
`of the PIPE, a new X or Y coordinate is produced every 1/60
`s with this delay. Fig. 4 shows two camera images of a moving
`train, and Fig. 5 shows the motion energy derived from the
`real-time optic-flow algorithm.
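The back-projection and triangulation step can be sketched as follows for two calibrated cameras, each described by a 3 x 4 projection matrix; the midpoint-of-closest-approach construction here is a standard stand-in and not necessarily the exact intersection method used in our system.

```python
import numpy as np

def backproject_ray(P, pixel):
    """Ray (origin, unit direction) through an image pixel for camera matrix P (3 x 4)."""
    M, p4 = P[:, :3], P[:, 3]
    origin = -np.linalg.solve(M, p4)                        # camera center
    direction = np.linalg.solve(M, np.array([pixel[0], pixel[1], 1.0]))
    return origin, direction / np.linalg.norm(direction)

def triangulate(P_left, pix_left, P_right, pix_right):
    """3-D point midway along the shortest segment between the two back-projected rays."""
    o1, d1 = backproject_ray(P_left, pix_left)
    o2, d2 = backproject_ray(P_right, pix_right)
    # Solve for the ray parameters t1, t2 that minimize |o1 + t1*d1 - (o2 + t2*d2)|.
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(o2 - o1) @ d1, (o2 - o1) @ d2])
    t1, t2 = np.linalg.solve(A, b)
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))
```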
`While we are able to derive 3-D position from motion-
`stereo at real-time rates, there are a number of sources of noise
`and error inherent in the vision system. These include stereo-
triangulation error, moving shadows which are interpreted
`as object motion (we use no special lighting in the scene),
`and small shifts in centroid alignments due to the different
`viewing angles of the cameras, which have a large baseline.
`The net effect of this is to create a 3-D position signal that
`is accurate enough for gross-level object tracking, but is not
`sufficient for the smooth and highly accurate tracking required
`for grasping the object. We describe in the next section how
`a probabilistic model of the motion that includes noise can
`be used to extract a more stable and accurate 3-D position
`signal.
`
`IV. ROBOTIC ARM CONTROL
`The second part of the system is the arm control. The robotic
`arm has to be controlled in real time to follow the motion of
`the object, using the output of the vision system. The raw
`vision system output is not sufficient as a control parameter
`since its output is both noisy and delayed in time. The control
`system needs to do the following:
`
`
Fig. 4. Left and right camera images.
`
Fig. 5. Motion energy derived from optic flow (left and right cameras).
`
`filter out the noise with a digital filter;
predict the position to cope with delays introduced by both the vision subsystem and the digital filter;
`perform the kinematic transformations which will map the
`desired manipulator’s tip position from a Cartesian coor-
`dinate frame into joint coordinates, and actually perform
`the movement.
Our vision algorithm provides, at each sampling instant, a position in space as a triplet of Cartesian coordinates (x, y, z).
`The task of the control algorithm is to smooth and predict
`the trajectory, thus positioning the robot where the object is
`during its motion.
`
`A well-known and useful solution is the Kalman filter
`approach, because it successfully performs both smoothing
`and prediction. However, the assumption the Kalman filter
`makes is that the noise applied to the system is white.
That fact directly depends on the parametrization of the trajectory and, unfortunately in our case, the simplest possible parametrization (Cartesian) does not support this noise model. Our previous work [4] used a variant of this approach
`and obtained tracking that was smooth but not accurate enough
`to allow actual grasping of the moving object. Our solution to
`this problem was to appeal to a local coordinate system that
`was able to model the motion and system noise characteristics
`
`
Fig. 7. Experimental density of s_1, the expected value of the arc length (vertical axis: no. of points).
`
Fig. 6. Model of the motion in the plane: the moving object is at P_{k+1}, while the vision system computes Q_{k+1}. s_0 is the actual arc length, and s is the measured arc length.
`
`more accurately, thus producing a more accurate control
`algorithm.
`A. The Model of the Motion
`The model of the motion we are using separates 3-D space
`into an X Y plane and the Z axis, and addresses these two
`components of motion separately. In the experimental results
`we present in Section VI, we have used the model to track
`planar trajectories. However, we still need to track the Z
`dimension to determine the height above the plane of the
`motion to command the robotic arm to correctly grasp the
object. The tracking of the object in the XY plane is done using a local model presented below, while the tracking in Z is done with a Cartesian displacement using the fixed-gain filter described in [4].
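For completeness, the sketch below shows a generic fixed-gain α-β-γ (steady-state Kalman) filter of the kind referenced from [4] applied to a single coordinate such as Z; the gain values are illustrative assumptions, not the gains used in our experiments.

```python
class AlphaBetaGammaFilter:
    """Fixed-gain tracker for position, velocity, and acceleration of one coordinate."""

    def __init__(self, dt, alpha=0.5, beta=0.4, gamma=0.1):
        self.dt = dt
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.x = self.v = self.a = 0.0

    def update(self, z_measured):
        # Predict the state forward by one sample interval.
        x_pred = self.x + self.v * self.dt + 0.5 * self.a * self.dt ** 2
        v_pred = self.v + self.a * self.dt
        # Correct the prediction with the fixed gains.
        residual = z_measured - x_pred
        self.x = x_pred + self.alpha * residual
        self.v = v_pred + self.beta * residual / self.dt
        self.a = self.a + 2.0 * self.gamma * residual / self.dt ** 2
        return self.x

    def predict(self, lead_time):
        """Extrapolate the smoothed state lead_time seconds ahead (for servo-level control)."""
        return self.x + self.v * lead_time + 0.5 * self.a * lead_time ** 2
```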
`The main idea in the trajectory parametrization used in this
`paper is to describe a point in a local coordinate frame, relative
`to the point from the previous sampling instant, by the triplet
of coordinates (s_0, φ_0, Δz) where (see Fig. 6)
s_0 is the length of an arc between two points (we will approximate the arc length by a straight-line section and set s_0 to be the distance between points P_k and P_{k+1}, s_0 = ||P_{k+1} - P_k||, where ||·|| denotes Euclidean distance);
φ_0 is the "bending" of the trajectory;
Δz is the altitude difference between two consecutive points.
`Due to the existence of noise, the measured coordinates will
`be random variables with certain distributions. We have made
`the following assumptions, as a result of both reasoning about
`the vision algorithm and certain necessary simplifications.
In sampling instant k, our object is at point P_k.
In the next sampling instant k + 1, the object is at P_{k+1} and the point returned by the vision algorithm is Q_{k+1}. The measured arc length is s = ||Q_{k+1} - P_k||.
Q_{k+1} is normally distributed around P_{k+1}. The noise can be expressed by its two components, tangential n_t and normal n_n, where both n_t and n_n are random variables.
n_t and n_n are both zero-mean, with the same variance σ, and mutually not correlated. Experimentally, it has been determined that their coefficient of correlation is between 0.1 and 0.2.
Δz is independent of s_0 and φ_0 (i.e., the altitude Z is independent of the position in the XY plane).
`
`This last assumption limits the method to tracking planar tra-
`jectories in space. However, the probabilistic method described
`below can be extended to arbitrary 3-D space curve trajectories
`by finding a distribution that will allow computation of a space
`curve torsion parameter. This essentially means creating a full
`Frenet Frame representation at each point in time.
Under these assumptions, it can be shown (see Appendix I) that the velocity v and curvature κ are

v = \lim_{T \to 0} s_0 / T    (4)

\kappa = \lim_{T \to 0} \tan\phi_0 / s_0    (5)

where s_0 = ||P_{k+1} - P_k||, φ_0 = π - ∠P_{k-1}P_kP_{k+1}, and T is the sampling interval.
`Our model assumes the following coordinate transformation
`that relates the moving object’s coordinate frame at one instant
`with the next instant in time:
T_Δ = rot(z, φ_0) ∘ trans(x, s_0) ∘ trans(z, Δz)    (6)

where rot and trans are rotation about and translation along a given axis. Presented as a 4 x 4 matrix, transformation (6) is

T_\Delta = \begin{bmatrix} \cos\phi_0 & -\sin\phi_0 & 0 & s_0 \cos\phi_0 \\ \sin\phi_0 & \cos\phi_0 & 0 & s_0 \sin\phi_0 \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{bmatrix}    (7)
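A minimal sketch of (6) and (7): building T_Δ from a (s_0, φ_0, Δz) triple and chaining it onto the current pose to predict where the object will be one sample later. The pose representation (a 4 x 4 homogeneous matrix for the object frame) is an assumption made for the example.

```python
import numpy as np

def t_delta(s0, phi0, dz):
    """Homogeneous transform of eq. (7): rot(z, phi0) composed with trans(x, s0) and trans(z, dz)."""
    c, s = np.cos(phi0), np.sin(phi0)
    return np.array([[c, -s, 0.0, s0 * c],
                     [s,  c, 0.0, s0 * s],
                     [0.0, 0.0, 1.0, dz],
                     [0.0, 0.0, 0.0, 1.0]])

def predict_next_position(object_pose, s0, phi0, dz):
    """Given the object's current frame and the local motion parameters,
    return the predicted 3-D position one sampling interval ahead."""
    next_pose = object_pose @ t_delta(s0, phi0, dz)
    return next_pose[:3, 3]
```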
What are the advantages of such a parametrization? The most obvious one is the simplicity of the prediction task in this framework: all we need is to multiply the velocity v = s_0/T by the time T' > T for which we want to predict, and to do likewise with the "bending" φ_0. The next advantage is that in order to achieve an accurate prediction, we do not need a high-order model with the mostly heuristic tuning of numerous parameters. The price we have to pay is that filtering is not straightforward. It turns out that we cannot just apply a low-pass filter in order to recover a dc component from s; rather, we need a more elaborate approach that takes into account the probabilistic distribution of s. Fig. 7 is a histogram of the experimentally measured density of the computed arc length between triangulated image motion points. This distribution shows the need for a more sophisticated method than a simple averaging filter, which we have found unable to correctly estimate the movement of the object between vision samples.
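The inadequacy of simple averaging is easy to reproduce numerically: under the noise model above, the mean of the measured arc length s systematically overestimates s_0 once σ is comparable to s_0. The short simulation below uses assumed example values for s_0 and σ to illustrate the bias and the second-moment relation exploited in the next subsection.

```python
import numpy as np

rng = np.random.default_rng(0)
s0, sigma = 1.0, 0.6                     # example true arc length and noise level (assumed)

# Zero-mean noise components along the tangential and normal directions.
n_t = rng.normal(0.0, sigma, 100_000)
n_n = rng.normal(0.0, sigma, 100_000)

s = np.sqrt((s0 + n_t) ** 2 + n_n ** 2)  # measured arc length under the planar noise model

print("true s0            :", s0)
print("mean of s (biased) :", s.mean())                          # noticeably larger than s0
print("E[s^2] - 2*sigma^2 :", (s ** 2).mean() - 2 * sigma ** 2)  # close to s0^2
```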
`
`
`
`
`
Fig. 8. Overview of the filtering method. The computed arc length s = ||Q_{k+1} - P_k|| (including noise) and its square s^2 are each passed through a low-pass filter (see Section IV-C), and a nonlinear transform (see Section IV-B) then recovers the actual arc length s_0.
`
`The analysis below describes a probabilistic model of the
`experimental distribution in Fig. 7, allowing us to recover
the actual arc length parameter s_0 and the bending angle φ_0
`at each sampling instant. While this model introduces more
`complexity than a standard Cartesian model, we will see below
`that it is more effective in allowing us to accurately predict
`and smooth our trajectory.
`
B. Estimating Arc Length s_0 and Bending Parameter φ_0
We begin this subsection with an intuitive overview of the method used to recover the actual arc length s_0 and actual bending parameter φ_0 from the noisy estimates provided by the vision system. The filtering method is summarized in Fig. 8. Given the model of the motion described in the previous subsection, we want to extract the actual arc length s_0 from the measured, noisy arc length s. Using a smoothing filter (see subsection C), we can derive the expected value of the arc length, which we denote s_1, as well as its second-order moment (the expectation of s^2), which we denote s_2. These two control inputs, s_1 and s_2, can be used to create two integral equations [(13) and (15)] which, when integrated, express the known smoothed control inputs s_1 and s_2 as functions of the actual arc length s_0 and a variance σ. These equations allow us to estimate s_0 from the control inputs. To recover the actual bending parameter φ_0, our task is simplified since its distribution is symmetric, and its expectation is the actual bending parameter itself.
Let s = ||Q_{k+1} - P_k|| be the distance between the object and the next position returned by the vision algorithm. Let F_k be the moving coordinate frame of the point P_k with axes t, n, and z, where t is tangential and n is normal to the trajectory's projection in the XY plane (see Fig. 6), and let F_{k+1} be the analogous coordinate frame at the point P_{k+1}. The transformation from F_k to F_{k+1} is given by T_Δ.
The point Q_{k+1} is given in F_{k+1} by a triple (n_t, n_n, z), where n_t is a noise component along t and n_n is a noise component along n. Both n_t and n_n are Gaussian with zero mean and mutually noncorrelated. Coordinate z is the altitude at P_{k+1}.
In the coordinate frame F_k, the point Q_{k+1} is then given by T_Δ [n_t n_n z 1]^T. Thus, the distance s = ||Q_{k+1} - P_k|| is given by

s = \sqrt{(s_0 + n'_t)^2 + (n'_n)^2}    (8)

where n'_t = n_t cos φ_0 + n_n sin φ_0 and n'_n = -n_t sin φ_0 + n_n cos φ_0. n'_t and n'_n are obtained by "rotating" n_t and n_n by φ_0. It is known that the transformation of noncorrelated Gaussian random variables results in noncorrelated Gaussian random variables with the same variance. Thus, n'_t and n'_n are noncorrelated and Gaussian with the variance σ.
Now we have expressed in relation (8) the dependency of s on two random variables with known properties, n'_t and n'_n. The formula for transformations of random variables' distributions gives us the distribution function F(s). (Note that henceforth s is used to denote the argument of the distribution or distribution density functions.)

F(s) = \frac{1}{2\pi\sigma^2} \iint_D \exp\left( -\frac{(t - s_0)^2 + n^2}{2\sigma^2} \right) dt\, dn    (9)

where D is a disk of radius s about P_k.
We need to find the expectation of the random variable whose distribution is given by (9). In order to do that, we need the density function, which can be found by calculating the integral in (9). By introducing the substitution t = r cos θ, n = r sin θ, we get

F(s) = \frac{1}{2\pi\sigma^2} \int_0^s \int_0^{2\pi} \exp\left( -\frac{r^2 - 2 r s_0 \cos\theta + s_0^2}{2\sigma^2} \right) r\, d\theta\, dr.    (10)

The distribution density is given as f(s) = dF(s)/ds or, after differentiation,

f(s) = \frac{s}{2\pi\sigma^2} \exp\left( -\frac{s^2 + s_0^2}{2\sigma^2} \right) \int_0^{2\pi} \exp\left( \frac{s\, s_0 \cos\theta}{\sigma^2} \right) d\theta.    (11)
`
`
The last integral can be expressed by a modified Bessel function I_0(z), yielding

f(s) = \frac{s}{\sigma^2} \exp\left( -\frac{s^2 + s_0^2}{2\sigma^2} \right) I_0\left( \frac{s\, s_0}{\sigma^2} \right).    (12)
`
`
Fig. 9. Distribution density f(s), s_0 = 1, σ = 0.4-1.0, increment = 0.1.
`
A graph of f(s) is given in Fig. 9. Here, s_0 is fixed to 1, and σ varies from 0.4 to 1.0. Our job is to recover s_0 given f(s).
It is apparent from Fig. 9 that the peak value of f(s) depends on σ, and drifts toward higher values as σ grows. The expectation of s also depends on σ. In particular, we have

s_1 \equiv E(s) = \int_0^\infty s f(s)\, ds = \sigma\, u(s_0/\sigma)    (13)

where

u(x) = \sqrt{\frac{\pi}{2}}\, e^{-x^2/4} \left( I_0\!\left(\frac{x^2}{4}\right) + \frac{x^2}{2} \left( I_0\!\left(\frac{x^2}{4}\right) + I_1\!\left(\frac{x^2}{4}\right) \right) \right).    (14)

Here, σ is the constant for the given system and it is related to s_0. In order to estimate σ, we will use the second-order moment

s_2 \equiv E(s^2) = \int_0^\infty s^2 f(s)\, ds = s_0^2 + 2\sigma^2.    (15)
`
Equations (13) and (15) are derived in Appendix II.
Now, by eliminating s_0 from (13) and (15), we have

\frac{\sqrt{s_2}}{s_1} = \frac{\sqrt{x^2 + 2}}{u(x)}    (16)

where x = s_0/σ. Setting ρ = \sqrt{s_2}/s_1 (a ratio formed from our known control inputs) and u_1(x) = \sqrt{x^2 + 2}/u(x), we end up with an equation (see Appendix III)

u_1(x) = \rho.    (17)

Equation (17) relates our known control inputs (through ρ) to x. We can create a table of values for this function off line, and then by interpolation calculate a value of x given ρ.

Fig. 10. y = u_1(x).

Fig. 11. Distribution density f(φ).

Let x_1(ρ) be the solution of (17). Now we can express s_0 and σ as functions of s_1 and s_2 as follows:

s_0 = s_1 \frac{x_1(\rho)}{u(x_1(\rho))}    (18)

\sigma = \frac{\sqrt{s_2}}{\sqrt{x_1(\rho)^2 + 2}}.    (19)

This method requires little on-line computation; an interpolation table of values of u_1 is all we need to recover the arc length parameter s_0.
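The estimator implied by (13)-(19), as reconstructed above, can be sketched as follows: u(x) is evaluated with modified Bessel functions, u_1 is inverted numerically (by bisection here, in place of the off-line interpolation table), and s_0 and σ are recovered from the smoothed moments s_1 and s_2. This is an illustration of the moment-matching idea under the reconstructed formulas, not the authors' implementation.

```python
import numpy as np
from scipy.special import i0, i1
from scipy.optimize import brentq

def u(x):
    """Expected arc length in units of sigma, eq. (14)."""
    q = x ** 2 / 4.0
    return np.sqrt(np.pi / 2.0) * np.exp(-q) * (i0(q) + (x ** 2 / 2.0) * (i0(q) + i1(q)))

def u1(x):
    """u_1(x) = sqrt(x^2 + 2) / u(x), the function inverted in eq. (17)."""
    return np.sqrt(x ** 2 + 2.0) / u(x)

def estimate_s0_sigma(s1, s2, x_max=50.0):
    """Recover (s_0, sigma) from the smoothed moments s_1 = E[s] and s_2 = E[s^2]."""
    rho = np.sqrt(s2) / s1
    # Assumes 1 < rho < u1(0) ~ 1.128, i.e., the measurements are not pure noise.
    x = brentq(lambda t: u1(t) - rho, 1e-6, x_max)   # numerical inverse of u_1
    sigma = np.sqrt(s2) / np.sqrt(x ** 2 + 2.0)      # eq. (19)
    s0 = x * sigma                                   # since x = s_0 / sigma
    return s0, sigma

# Example: moments corresponding to s0 = 1.0, sigma = 0.5 (so x = 2).
s1_example = 0.5 * u(2.0)               # sigma * u(s0/sigma), eq. (13)
s2_example = 1.0 ** 2 + 2 * 0.5 ** 2    # s0^2 + 2*sigma^2, eq. (15)
print(estimate_s0_sigma(s1_example, s2_example))    # approximately (1.0, 0.5)
```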
To find the bending parameter φ_0, we use the