`
`47
`
`A Real-Time Video Tracking System
`
`ALTON L. GILBERT, MEMBER, IEEE, MICHAEL K. GILES, GERALD M. FLACHS, MEMBER, IEEE,
`ROBERT B. ROGERS, MEMBER, IEEE, AND YEE HSUN U, MEMBER, IEEE
`
Abstract-Object identification and tracking applications of pattern recognition at video rates are a problem of wide interest, with previous attempts limited to very simple threshold or correlation (restricted-window) methods. New high-speed algorithms together with fast digital hardware have produced a system for missile and aircraft identification and tracking that possesses a degree of "intelligence" not previously implemented in a real-time tracking system. Adaptive statistical clustering and projection-based classification algorithms are applied in real time to identify and track objects that change in appearance through complex and nonstationary background/foreground situations. Fast estimation and prediction algorithms combine linear and quadratic estimators to provide speed and sensitivity. Weights are determined to provide a measure of confidence in the data and resulting decisions. Strategies based on maximizing the probability of maintaining track are developed. This paper emphasizes the theoretical aspects of the system and discusses the techniques used to achieve real-time implementation.
`
Index Terms-Image processing, intensity histograms, object identification, optical tracking, projections, tracking system, video data compression, video processing, video tracking.
`
`INTRODUCTION
IMAGE PROCESSING methods constrained to operate on sequential images at a high repetition rate are few. Pattern recognition techniques are generally quite complex, requiring a great deal of computation to yield an acceptable classification. Many problems exist, however, where such a time-consuming technique is unacceptable. Reasonably complex operations can be performed on wide-band data in real time, yielding solutions to difficult problems in object identification and tracking.
The requirement to replace film as a recording medium to obtain a real-time location of an object in the field-of-view (FOV) of a long focal length theodolite gave rise to the development of the real-time videotheodolite (RTV). U.S. Army White Sands Missile Range began the development of the RTV in 1974, and the system is being deployed at this time. Design philosophy called for a system capable of discriminatory judgment in identifying the object to be tracked with 60 independent observations/s, capable of locating the center of mass of the object projection on the image plane within about 2 percent of the FOV in rapidly changing background/foreground situations (therefore adaptive), able to generate a predicted observation angle for the next observation, and required to output the angular displacements of the object within the FOV within 20 ms after the observation was made. The system would be required to acquire objects entering the FOV that had been prespecified by shape description. In the RTV these requirements have been met, resulting in a real-time application of pattern recognition/image processing technology.

Manuscript received September 14, 1978; revised November 19, 1978. This work was supported by the U.S. Army ILIR Program and the U.S. Army Research Office.
A. L. Gilbert and M. K. Giles are with the U.S. Army White Sands Missile Range, White Sands, NM 88002.
G. M. Flachs and R. B. Rogers are with the Department of Electrical Engineering, New Mexico State University, Las Cruces, NM 88003.
Y. H. U was with the Department of Electrical Engineering, New Mexico State University, Las Cruces, NM 88003. He is now with Texas Instruments Incorporated, Dallas, TX 75222.
The RTV is made up of many subsystems, some of which are generally not of interest to the intended audience of this paper. These subsystems (see Fig. 1) are as follows:

1) main optics;
2) optical mount;
3) interface optics and imaging subsystem;
4) control processor;
5) tracker processor;
6) projection processor;
7) video processor;
8) input/output (I/O) processor;
9) test subsystem;
10) archival storage subsystem;
11) communications interface.
`
`The main optics is a high quality cinetheodolite used for ob-
taining extremely accurate (rms error ~3 arc-seconds) angular
`data on the position of an object in the FOV. It is positioned
`by the optical mount which responds to azimuthal and eleva-
`tion drive commands, either manually or from an external
`source. The interface optics and imaging subsystem provides
`a capability to increase or decrease the imaged object size on
`the face of the silicon target vidicon through a 10:1 range,
`provides electronic rotation to establish a desired object
`orientation, performs an autofocus function, and uses a gated
`image intensifier to amplify the image and "freeze" the mo-
`tion in the FOV. The camera output is statistically decom-
`posed into background, foreground, target, and plume re-
`gions by the video processor, with this operation carried on
`at video rates for up to the full frame. The projection pro-
`cessor then analyzes the structure of the target regions to
`verify that the object selected as "target" meets the stored
`(adaptive) description of the object being tracked. The tracker
`processor determines a position in the FOV and a measured
`orientation of the target, and decides what level of confidence
`it has in the data and decision. The control processor then
generates commands to orient the mount, control the interface optics, and provide real-time data output. An I/O processor allows the algorithms in the system to be changed, interfaces with a human operator for tests and operation, and provides data to and accepts data from the archival storage subsystem, where the live video is combined with status and position data on a video tape. The test subsystem performs standard maintenance checks on the system. The communications interface provides the necessary interaction with the external world for outputting or receiving data.

Fig. 1. RTV tracking system.

0162-8828/80/0100-0047$00.75 © 1980 IEEE

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-2, NO. 1, JANUARY 1980
The video processor, projection processor, tracker processor, and control processor are four microprogrammable bit-slice microprocessors [1], which utilize Texas Instruments' (TI's) new 74S481 Schottky processor, and are used to perform the real-time tracking function.
`The four tracking processors, in turn, separate the target
`image from the background, locate and describe the target
`image shape, establish an intelligent tracking strategy, and
`generate the camera pointing signals to form a fully auto-
`matic tracking system.
Various reports and papers discuss several of the developmental steps and historical aspects of this project [2]-[7]. In this paper the video, projection, tracker, and control processors are discussed at some length.
`
`VIDEO PROCESSOR
`The video processor receives the digitized video, statistically
`analyzes the target and background intensity distributions,
`and decides whether a given pixel is background or target
`[8]. A real-time adaptive statistical clustering algorithm is
`used to separate the target image from the background scene
at standard video rates. The scene in the FOV of the TV camera is digitized to form an n X m matrix representation

P = (p_ij), i = 1, ..., n; j = 1, ..., m,

of the pixel intensities p_ij. As the TV camera scans the scene, the video signal is digitized at m equally spaced points across each horizontal scan. During each video field, there are n horizontal scans which generate an n X m discrete matrix representation at 60 fields/s. A resolution of m = 512 pixels per standard TV line results in a pixel period of 96 ns per pixel.
Every 96 ns, a pixel intensity is digitized and quantized into eight bits (256 gray levels), counted into one of six 256-level
`histogram memories, and then converted by a decision mem-
`ory to a 2-bit code indicating its classification (target, plume,
`or background).
`There are many features that can be func-
`tionally derived from relationships between pixels, e.g., tex-
`ture, edge, and linearity measures. Throughout the following
`discussion of the clustering algorithm, pixel intensity is used
`to describe the pixel features chosen.
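The per-pixel path just described (8-bit quantization, histogram counting, and a table-lookup classification through a decision memory) can be sketched in software. This is an illustrative sketch, not the RTV microcode: the toy densities and class codes below are invented, and the table is filled with the equal-prior Bayes comparison developed later in this paper.

```python
import numpy as np

# Hypothetical 2-bit class codes, as an illustration of the decision memory.
BACKGROUND, PLUME, TARGET = 0, 1, 2

def build_decision_memory(h_b, h_p, h_t):
    """Fill the 256-entry classification memory from three learned
    densities: each intensity is assigned the class whose density is
    largest there (equal priors, equal misclassification costs)."""
    return np.argmax(np.stack([h_b, h_p, h_t]), axis=0).astype(np.uint8)

def classify_field(pixels, table):
    """Real-time classification is a pure table lookup: the pixel
    intensity addresses the decision memory."""
    return table[pixels]

# Toy densities: dark background, bright target, plume in between.
x = np.arange(256)
h_b = np.exp(-((x - 40) / 20.0) ** 2)
h_p = np.exp(-((x - 128) / 15.0) ** 2)
h_t = np.exp(-((x - 220) / 10.0) ** 2)

table = build_decision_memory(h_b, h_p, h_t)
field = np.array([[30, 130, 225], [45, 120, 210]], dtype=np.uint8)
codes = classify_field(field, table)  # one 2-bit code per pixel
```

In hardware the lookup happens every 96 ns; here the same idea is a single vectorized index into the table.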
`The basic assumption of the clustering algorithm is that
`the target image has some video intensities not contained
`in the immediate background. A tracking window is placed
`about the target image, as shown in Fig. 2, to sample the
background intensities immediately adjacent to the target image. The background sample should be taken relatively close to the target image, and it must be of sufficient size to
`accurately characterize the background intensity distribution
`in the vicinity of the target. The tracking window also serves
`as a spatial bandpass filter by restricting the target search re-
`gion to the immediate vicinity of the target. Although one
`tracking window is satisfactory for tracking missile targets
`with plumes, two windows are used to provide additional
`reliability and flexibility for independently tracking a target
`and plume, or two targets. Having two independent windows
`allows each to be optimally configured and provides reliable
`tracking when either window can track.
`The tracking window frame is partitioned into a background
`region (BR) and a plume region (PR). The region inside the
`frame is called the target region (TR) as shown in Fig. 2.
`During each field, the feature histograms are accumulated
`for the three regions of each tracking window.
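The per-field accumulation just described, together with the min-ratio estimate of the background fraction α developed later in this section, can be sketched as follows. The region samples and sizes are invented for illustration; the true background fraction of the plume region is set to 0.6 here.

```python
import numpy as np

def feature_histogram(region_pixels, d=255):
    """Count pixels of each intensity and normalize to a
    relative-frequency (probability) estimate."""
    h = np.bincount(region_pixels.ravel(), minlength=d + 1).astype(float)
    return h / h.sum()

def estimate_alpha(h_pr, h_br):
    """alpha-hat = min over features x of hPR(x)/hBR(x), taken over
    intensities actually observed in the background region."""
    mask = h_br > 0
    return float(np.min(h_pr[mask] / h_br[mask]))

rng = np.random.default_rng(0)
background = rng.integers(30, 50, size=2000)      # BR: background only
# PR holds ~60% background and ~40% plume points in this toy example:
pr = np.concatenate([rng.integers(30, 50, size=600),
                     rng.integers(120, 140, size=400)])

h_br = feature_histogram(background)
h_pr = feature_histogram(pr)
alpha = estimate_alpha(h_pr, h_br)  # near 0.6, biased low by sampling noise
```

The estimate is exact only when some background intensity receives no plume points; with finite samples it sits somewhat below the true fraction, as the derivation's inequality predicts.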
The feature histogram of a region R is an integer-valued, integer-argument function hR(x). The domain of hR(x) is [0, d], where d corresponds to the dynamic range of the analog-to-digital converter, and the range of hR(x) is [0, r], where r is the number of pixels contained in the region R; thus, there are r + 1 possible values of hR(x). Since the domain of hR(x) is a subset of the integers, it is convenient to define hR(x) as a one-dimensional array of integers

h(0), h(1), h(2), ..., h(d).

Letting xi denote the ith element in the domain of x (e.g., x25 = 24), and x(j) denote the jth sample in the region R (taken in any order), hR(x) may be generated by the sum

hR(xi) = Σ from j = 1 to r of δ(xi, x(j))

where δ is the Kronecker delta function

δ(i, j) = 1 if i = j, and 0 if i ≠ j.

A more straightforward definition, which corresponds to the actual method used to obtain hR(x), uses Iverson's notation [21] to express hR(x) as a one-dimensional vector of d + 1 integers which are set to zero prior to processing the region R as

h ← (d + 1)ρ0.
`
`
`
`
Fig. 2. Tracking window.

As each pixel in the region is processed, one (and only one) element of h is incremented as

h[x(j)] ← h[x(j)] + 1.

When the entire region has been scanned, h contains the distribution of pixels over intensity and is referred to as the feature histogram of the region R.
It follows from the above definition that h satisfies the identity

r = Σ from i = 0 to d of hR(xi), or r ← +/h.

Since h is also nonnegative and finite, it can be made to satisfy the requirements of a probability assignment function by the normalization

h ← h ÷ +/h.

Hereafter, all feature histograms are assumed to be normalized and are used as relative-frequency estimates of the probability of occurrence of the pixel values x in the region over which the histogram is defined.
For the ith field, these feature histograms are accumulated for the background, plume, and target regions and written hBRi(x), hPRi(x), and hTRi(x), with Σx hRi(x) = 1, after they are normalized to the probability interval [0, 1]. These normalized histograms provide an estimate of the probability of feature x occurring in the background, plume, and target regions on a field-by-field basis. The histograms are accumulated at video rates using high-speed LSI memories to realize a multiplexed array of counters, one for each feature x.
The next problem in the formulation of a real-time clustering algorithm is to utilize the sampled histograms on a field-by-field basis to obtain learned estimates of the probability density functions for background, plume, and target points. Knowing the relative sizes of the background in PR, the background in TR, and the plume in TR allows the computation of estimates for the probability density function for background, plume, and target features. This gives rise to a type of nonparametric classification similar to mode estimation as discussed by Andrews [9], but with an implementation method that allows for real-time realization.
Letting

α = (number of background points in PR) / (total number of points in PR)
β = (number of background points in TR) / (total number of points in TR)
γ = (number of plume points in TR) / (total number of points in TR)

and assuming that 1) the BR contains only background points, 2) the PR contains background and plume points, and 3) the TR contains background, plume, and target points, one has

hBR(x) = hB(x)
hPR(x) = αhB(x) + (1 − α)hP(x)
hTR(x) = βhB(x) + γhP(x) + (1 − β − γ)hT(x).

By assuming there are one or more features x where hB(x) is much larger than hP(x), one has

hPR(x)/hBR(x) = α + ε

where ε = (1 − α)hP(x)/hB(x) << 1. Now for all features x where hP(x) = 0, one has the solution α = hPR(x)/hBR(x). For all features x where hP(x) > 0, the inequality hPR(x)/hBR(x) > α is valid. Consequently, a good estimate for α is given by

α̂ = min over x of {hPR(x)/hBR(x)}

and this estimate will be exact if there exist one or more features where hPR(x) ≠ 0 and hP(x) = 0. Having an estimate of α and hB(x) allows the calculation of hP(x).
In a similar manner, estimates of β and γ are obtained:

β̂ = min over x of {hTR(x)/hBR(x)}
γ̂ = min over x of {hTR(x)/hP(x)}.

Having field-by-field estimates of the background, plume, and target density functions (hBi(x), hPi(x), hTi(x)), a linear recursive estimator and predictor [10] is utilized to establish learned estimates of the density functions. Letting H(i|j) represent the learned estimate of a density function for the ith field using the sampled density functions hi(x) up to the jth field, we have the linear estimator

H(i|i) = wH(i|i − 1) + (1 − w)hi(x)

and linear predictor

H(i + 1|i) = 2H(i|i) − H(i − 1|i − 1).

The above equations provide a linear recursive method for compiling learned density functions. The weighting factor w can be used to vary the learning rate. When w = 0, the learning effect is disabled and the measured histograms are used by the predictor. As w increases toward one, the learning effect increases and the measured density functions have a
`
`
`
`
reduced effect. A small w should be used when the background is rapidly changing; however, when the background is relatively stationary, w can be increased to obtain a more stable estimate of the density functions.
The predictor provides several important features for the tracking problem. First, the predictor provides a better estimate of the density functions in a rapidly changing scene, which may be caused by background change or sun-glare problems. Secondly, the predictor allows the camera to have an automatic gain control to improve the target separation from the background.
With the learned density functions for the background, plume, and target features (HB(x), HP(x), HT(x)), a Bayesian classifier [11] can be used to decide whether a given feature x is a background, plume, or target point. Assuming equal a priori probabilities and equal misclassification costs, the classification rule decides that a given pixel feature x is a background pixel if

HB(x) > HP(x) and HB(x) > HT(x),

a target pixel if

HT(x) > HB(x) and HT(x) > HP(x),

or a plume pixel if

HP(x) > HB(x) and HP(x) > HT(x).

The results of this decision rule are stored in a high-speed classification memory during the vertical retrace period. With the pixel classification stored in the classification memory, the real-time pixel classification is performed by simply letting the pixel intensity address the classification memory location containing the desired classification. This process can be performed at a very rapid rate with high-speed bipolar memories.

PROJECTION PROCESSOR

The video processor described above separates the target image from the background and generates a binary picture, where target presence is represented by a "1" and target absence by a "0." The target location, orientation, and structure are characterized by the pattern of 1 entries in the binary picture matrix, and the target activity is characterized by a sequence of picture matrices. In the projection processor, these matrices are analyzed field-by-field at 60 fields/s using projection-based classification algorithms to extract the structural and activity parameters needed to identify and track the target.
The targets are structurally described and located by using the theory of projections. A projection in the x-y plane of a picture function f(x, y) along a certain direction w onto a straight line z perpendicular to w is defined by

Pw(z) = ∫ f(x, y) dw

as shown in Fig. 3. In general, a projection integrates the intensity levels of a picture along parallel lines through the pattern, generating a function called the projection. For binary digitized patterns, the projection gives the number of object points along parallel lines; hence, it is a distribution of the target points for a given view angle.

Fig. 3. Projections.

It has been shown that for sufficiently large numbers of projections a multigray-level digitized pattern can be uniquely reconstructed [12]. This means that structural features of a pattern are contained in the projections. The binary input simplifies the construction of projections and eliminates interference of structural information by intensity variation within the target pattern; consequently, fewer projections are required to extract the structural information. In fact, any convex, symmetric binary pattern can be reconstructed from only two orthogonal projections, proving that the projections do contain structural information.
Much research in the projection area has been devoted to the reconstruction of binary and multigray-level pictures from a set of projections, each with a different view angle. In the real-time tracking problem, the horizontal and vertical projections can be rapidly generated with specialized hardware circuits that can be operated at high frame rates. Although the vertical and horizontal projections characterize the target structure and locate the centroid of the target image, they do not provide sufficient information to precisely determine the orientation of the target. Consequently, the target is dissected into two equal areas and two orthogonal projections are generated for each area.
To precisely determine the target position and orientation, the target center-of-area points are computed for the top section (XcT, YcT) and bottom section (XcB, YcB) of the tracking parallelogram using the projections. Having these points, the target center-of-area (Xc, Yc) and its orientation θ can be easily computed (Fig. 4):

Xc = (XcT + XcB)/2,  Yc = (YcT + YcB)/2

θ = tan⁻¹ [(YcT − YcB)/(XcT − XcB)].

The top and bottom target center-of-area points are used, rather than the target nose and tail points, since they are much easier to locate and, more importantly, they are less sensitive to noise perturbations.
It is necessary to transform the projection functions into
`
`
`
`
Fig. 4. Projection location technique.

a parametric model for structural analysis. Area quantization offers the advantage of easy implementation and high immunity to noise. This process transforms a projection function Pw(z) into k rectangles of equal area (Fig. 5), such that

∫[zi, zi+1] Pw(z) dz = (1/k) ∫[z1, zk+1] Pw(z) dz,  for i = 1, 2, ..., k.

Fig. 5. Projection parameters.

Another important feature of the area quantization model for a projection function of an object is that the ratios of the line segments li = zi+1 − zi and L = zk − z2,

Si = li / L,  for i = 2, 3, ..., k − 1,

are object-size invariant. Consequently, these parameters provide a measure of structure of the object which is independent of size and location [13]. In general, these parameters change continuously since the projections are one-dimensional representations of a moving object. Some of the related problems of these geometrical operations are discussed by Johnston and Rosenfeld [14].
The structural parameter model has been implemented and successfully used to recognize a class of basic patterns in a noisy environment. The pattern class includes triangles, crosses, circles, and rectangles with different rotation angles. These patterns are chosen because a large class of more complex target shapes can be approximated with them.
The architecture of the projection processor consists of a projection accumulation module (PAM) for accumulating the projections and a microprogrammable processor for computing the structural parameters. The binary target picture enters the PAM as a serial stream in synchronization with the pixel classifier of the video processor. The projections are formed by the PAM as the data are received in real time. In the vertical retrace interval, the projection processor assumes addressing control of the PAM and computes the structural parameters before the first active line of the next field. This allows the projections to be accumulated in real time, while the structural parameters are computed during the vertical retrace interval.

TRACKER PROCESSOR

In the tracking problem, the input environment is restricted to the image in the FOV of the tracking optics. From this information, the tracking processor extracts the important inputs, classifies the current tracking situation, and establishes an appropriate tracking strategy to control the tracking optics for achieving the goals of the tracking system.
The state concept can be used to classify the tracking situations in terms of state variables as in control theory, or it can be interpreted as a state in a finite state automaton [15], [16]. Some of the advantages of the finite state automaton approach are as follows.
1) A finite state automaton can be easily implemented with a look-up table in a fast LSI memory.
2) A finite state automaton significantly reduces the amount of information to be processed.
3) The tracking algorithm can be easily adjusted to different tracking problems by changing the parameters in the look-up table.
4) The finite state automaton can be given many characteristics displayed by human operators.
The purpose of the tracker processor is to establish an intelligent tracking strategy for adverse tracking conditions. These conditions often result in losing the target image within or out of the FOV. When the target image is lost within the FOV, the cause can normally be traced back to rapid changes in the background scene, rapid changes in the target image due to sun glare problems, or cloud formations that obstruct the target image. When the target image is lost by moving out of the camera's FOV, the cause is normally the inability of the tracking optics dynamics to follow a rapid motion of the target image. It is important to recognize these situations and to formulate an intelligent tracking strategy to continue tracking while the target image is lost so that the target image can be reacquired after the disturbance has passed.
To establish an intelligent tracking strategy, the tracker processor evaluates the truthfulness and trackability of the track-
`
`
`
`
`52
`
`IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-2, NO. 1, JANUARY 1980
`
`ing data. The truthfulness of the tracking data relates to the
`confidence that the measured tracking data truly define the
`location of the target under track. The trackability of the
`target image relates to the question of whether the target
`image has desirable tracking properties.
The inputs to the tracker processor are derived from the projection representation of the target image by the projection processor. Area quantization is used to transform each projection function P(z) into K = 8 equal-area intervals as shown in Fig. 5.
These inputs are:
1) target size (TSZ);
2) target location (TX, TY);
3) target orientation (TO);
4) target density (TDN);
5) target shape = {(SXi, SYi) | i = 1, 2, ..., 6}.
Target size is simply the total number of target points. Target location is given by the center-of-area points of the projections. Where Xi and Yi are the parameters of Fig. 5 when projected on the x and y axes, respectively,

TX = X5 and TY = Y5.

The target orientation defines the orientation of the target image with respect to vertical boresight. Target density is derived from the target length (TL), width (TW), and size (TSZ) by

TDN = TSZ / (TL × TW).
The target shape is described by the ratios of the lengths of the equal-area rectangles to the total lengths

SXi = (Xi+2 − Xi+1)/(X8 − X2)

and

SYi = (Yi+2 − Yi+1)/(Y8 − Y2)

for i = 1, 2, ..., 6. Observe that the first and last equal-area subintervals are not used in the shape description, since they are quite sensitive to noise.
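The equal-area quantization of Fig. 5 and the size-invariant shape ratios can be sketched as follows. The cumulative-sum boundary construction is an assumed software implementation, not the RTV hardware method, and the triangular test projection is invented for illustration.

```python
import numpy as np

def equal_area_boundaries(p, k=8):
    """Boundaries X1..X(k+1) splitting the projection p(z) into k
    rectangles of equal area, via the cumulative area function."""
    c = np.cumsum(p, dtype=float)
    total = c[-1]
    # X_i = first z where the accumulated area reaches (i-1)/k of total.
    return np.array([np.searchsorted(c, total * i / k) for i in range(k + 1)])

def shape_ratios(x):
    """SXi = (X(i+2) - X(i+1)) / (X8 - X2) for i = 1..6; the first and
    last subintervals are dropped as noise-sensitive."""
    l = np.diff(x)       # segment lengths between successive boundaries
    L = x[7] - x[1]      # X8 - X2
    return l[1:7] / L

# Toy triangular projection; stretching it to twice the width (same
# shape, same area) should leave the ratios essentially unchanged.
p = np.concatenate([np.arange(1, 33), np.arange(32, 0, -1)]).astype(float)
sx = shape_ratios(equal_area_boundaries(p))
p2 = np.repeat(p, 2) / 2.0
sx2 = shape_ratios(equal_area_boundaries(p2))
```

The six ratios sum to exactly one by construction, and the stretched projection reproduces them up to discretization error, illustrating the size invariance claimed for Si.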
The tracker processor establishes a confidence weight for its inputs, computes boresight and zoom correction signals, and controls the position and shape of the target tracking window to implement an intelligent tracking strategy. The outputs of the tracker processor are as follows.
Outputs to Control Processor: 1) target X displacement from boresight (DX), 2) target Y displacement from boresight (DY), 3) desired change in zoom (DZ), 4) desired change in image rotation (DO), and 5) confidence weight (W).
Outputs to Video Processor: 1) tracking window size, 2) tracking window shape, and 3) tracking window position.
`The outputs to the control processor are used to control
`the target location and size for the next frame. The bore-
`sight correction signals are used to control the azimuth and
`elevation pointing angles of the telescope. The desired zoom
`is used to control the zoom lens, keeping the target visible
`within the FOV.
`The desired image rotation controls the
`image rotation element to keep the target image vertical. The
`
`confidence weight is used by the control processor much like
`a Kalman weight to combine the measured and predicted
`values. When the confidence weight is low, the control pro-
`cessor relies more heavily on the recent trajectory to predict
`the location of the target on the next frame.
`The outputs to the video processor define the size, shape,
`and position of the tracking window. These are computed
`on the basis of the size and shape of the target image and the
`amount of jitter in the target image location. There is no
`loss in resolution when the tracking window is made larger;
`however, the tracking window acts like a bandpass filter and
`rejects unwanted noise outside the tracking window.
`A confidence weight is computed from the structural fea-
`tures of the target image to measure the truthfulness of the
`input data. The basic objective of the confidence weight is
`to recognize false data caused by rapid changes in the back-
`ground scene or cloud formations. When these situations are
`detected, the confidence weight is reduced and the control
`processor relies more heavily on the previous tracking data
`to orientate the tracking optics toward the target image. This
`allows the control processor to continue tracking the target
`so that the target image can be reacquired after the perturba-
`tion passes.
The confidence weight measures how well the structural features of the located object fit the target image being tracked. A linear recursive filter is used to continually update the structural features to allow the algorithm to track the desired target through different spatial perspectives. Experimental studies have indicated that the structural parameters S = {(SXi, SYi) | i = 1, 2, ..., 6} and the target density are important features in detecting erratic data. Let TDN(k) and (SXi(k), SYi(k)) for i = 1, 2, ..., 6 represent the measured target density and shape parameters, respectively, for the kth field, and let (S̄Xi(k), S̄Yi(k)) represent the filtered values for the target shape parameters. The linear filter is defined by
`
S̄Xi(k + 1) = [(K − 1) S̄Xi(k) + SXi(k)] / K

S̄Yi(k + 1) = [(K − 1) S̄Yi(k) + SYi(k)] / K

for i = 1, 2, ..., 6 and a positive integer K. The confidence weight for the kth field is given by

W(k) = a1 max {1 − C(k), 0} + a2 min {1, TDN(k)}

where

C(k) = Σ from i = 1 to 6 of |SXi(k) − S̄Xi(k)| + Σ from i = 1 to 6 of |SYi(k) − S̄Yi(k)|

and 0 ≤ a1, a2 ≤ 1, a1 + a2 = 1.
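The recursive shape filter and the confidence weight can be sketched directly from these equations. The weights a1 = a2 = 0.5 and the filter constant K = 4 are assumed values for illustration; the paper does not fix them.

```python
import numpy as np

def update_filtered_shape(s_bar, s_meas, K=4):
    """Running-average update of the filtered shape parameters:
    s_bar(k+1) = ((K - 1) * s_bar(k) + s(k)) / K   (K assumed here)."""
    return ((K - 1) * s_bar + s_meas) / K

def confidence_weight(sx, sy, sx_bar, sy_bar, tdn, a1=0.5, a2=0.5):
    """W(k) = a1 * max(1 - C(k), 0) + a2 * min(1, TDN(k)), where C(k)
    sums the absolute deviations of the measured shape parameters
    from their filtered values."""
    c = np.abs(sx - sx_bar).sum() + np.abs(sy - sy_bar).sum()
    return a1 * max(1.0 - c, 0.0) + a2 * min(1.0, tdn)

sx_bar = sy_bar = np.full(6, 1.0 / 6.0)

# Measurement matching the filtered shape, dense target: high confidence.
w_good = confidence_weight(sx_bar, sy_bar, sx_bar, sy_bar, tdn=0.9)
# Badly distorted shape, sparse target: confidence collapses.
w_bad = confidence_weight(np.full(6, 0.5), sy_bar, sx_bar, sy_bar, tdn=0.2)

# One filter step pulls the stored shape 1/K of the way to the measurement.
s_next = update_filtered_shape(np.full(6, 1.0 / 6.0), np.full(6, 0.5))
```

With these assumed constants, the matched measurement scores 0.95 while the distorted one scores 0.1, which is the behavior the text describes: false data drive W down so the control processor leans on the predicted trajectory.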
This formulation for the confidence weight has been experimentally tested, and it has demonstrated the ability to measure the truthfulness of the tracking data in the tracking environment. The filtered shape parameters are not updated
`
`
`
`
`
`when the confidence weight falls below a given lower thresh-
`old.
`This prevents the shape parameters from being updated
`incorrectly during periods when the target image is lost or
`badly perturbed by the background noise.
`To formulate an intelligent tracking strategy, the tracking
`algorithm needs the ability to respond to a current input
`based upon the sequence of inputs that lead to the current
`state (or situation). A finite state sequential machine pos-
`sesses such a property because each state represents the
`collection of all input sequences that take the machine from
`the initial state to the present state (Nerode's tape equiva-
`lence [17] ). By defining an equivalence relation R on the tape
`set as
`
xRy if δ(s0, x) = δ(s0, y), ∀x, y ∈ Σ*,

the tape set Σ* can be partitioned into equivalence classes

[x] = si = {y | xRy, y ∈ Σ*}.

Consequently, a state represents all input sequences that produce a given tracking situation. This interpretation of input sequences transforms the development of the tracking algorithm into a problem of defining a finite state sequential machine

TA = (S, I, Z, δ, W).
The states of the machine S = {s1, s2, s3, ..., sn} define the different tracking situations that must be handled by the tracking algorithm. The inputs to the finite state machine are derived from the image parameters that characterize the size, shape, and location of the present target image. The output set Z defines a finite set of responses that the tracking algorithm employs for maintaining track and retaining high-resolution data. The next-state mapping δ: S × I → S defines the next state δ(si, ij) = sk when an input ij is applied to state si. The output mapping W: S → Z is a Moore output that defines the proper tracking strategy (response) for each state.
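The look-up-table realization of TA = (S, I, Z, δ, W) can be illustrated with a deliberately reduced machine. The three states, two quantized input classes, and strategy names below are hypothetical stand-ins for the larger state and input sets described in this section.

```python
# Sketch of the tracking automaton as a look-up table: next state
# indexed by (state, input), plus a Moore output per state, exactly as
# a fast LSI memory would store them. States and inputs are illustrative.

ACQUIRE, TRACK, COAST = 0, 1, 2      # reduced state set s1..s3
GOOD, BAD = 0, 1                     # reduced, quantized input classes

# delta: S x I -> S, stored as a table rather than computed.
NEXT_STATE = {
    (ACQUIRE, GOOD): TRACK, (ACQUIRE, BAD): ACQUIRE,
    (TRACK, GOOD): TRACK,   (TRACK, BAD): COAST,
    (COAST, GOOD): TRACK,   (COAST, BAD): COAST,
}

# Moore output W: S -> Z, one tracking strategy per state.
OUTPUT = {ACQUIRE: "search", TRACK: "track", COAST: "predict-ahead"}

def run(inputs, state=ACQUIRE):
    """Apply an input sequence; the current state summarizes the whole
    input history, per Nerode equivalence."""
    outputs = []
    for i in inputs:
        state = NEXT_STATE[(state, i)]
        outputs.append(OUTPUT[state])
    return state, outputs

final, outs = run([GOOD, GOOD, BAD, BAD, GOOD])
```

Losing the target (BAD inputs) moves the machine into a coasting state that keeps predicting ahead until good data reappear, mirroring the reacquisition strategy described earlier.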
The inputs to the sequential machine are chosen to give the sequential machine a discrete measure of the trackability of the target image. These inputs are:
1) target size (TSZ);
2) confidence weight (W);
3) image orientation (TO);
4) image displacement from boresight (TX, TY);
5) image movement (ΔTX, ΔTY);
6) rate of change in target size (ΔTSZ);
7) rate of change in confidence weight (ΔW).
The image size is clearly an important measure of the trackability, since the image is difficult to track when the image is either too small or too large. The confidence weight provides a measure of whether the image is the target under track. Image displacement from boresight gives a measure of whether the image is leaving the FOV. Image movement in the FOV measures the amount of jitter in the target image. The rate of change in the target size and confidence weight allows a prediction of what is likely to happen on subsequent fields. These inputs are quantized into an 8-bit binary input vector for the sequential machine.
The states of the sequential machine are chosen to define the major tracking situations. These states are:
1) target acquisition = s1;
2) normal tracking = s2;
`3) abrupt chang