tures." Proc., PRIP, August 1979, 382-390.
COVER, T. M. "Estimation by the nearest neighbor rule." IEEE Trans. Information Theory 14, January 1968, 50-55.
FU, K. S. Sequential Methods in Pattern Recognition and Machine Learning. New York: Academic Press, 1968.
FU, K. S. Syntactic Methods in Pattern Recognition. New York: Academic Press, 1974.
FUKUNAGA, K. Introduction to Statistical Pattern Recognition. New York: Academic Press, 1972.
GIBSON, J. J. The Perception of the Visual World. Cambridge, MA: Riverside Press, 1950.
HALL, E. L., R. P. KRUGER, S. J. DWYER III, D. L. HALL, R. W. MCLAREN, and G. S. LODWICK. "A survey of preprocessing and feature extraction techniques for radiographic images." IEEE Trans. Computers 20, September 1971.
HARALICK, R. M. "Statistical and structural approaches to texture." Proc., 4th IJCPR, November 1978, 45-60.
HARALICK, R. M., R. SHANMUGAM, and I. DINSTEIN. "Textural features for image classification." IEEE Trans. SMC 3, November 1973, 610-621.
HOPCROFT, J. E. and J. D. ULLMAN. Introduction to Automata Theory, Languages and Computation. Reading, MA: Addison-Wesley, 1979.
JAYARAMAMURTHY, S. N. "Multilevel array grammars for generating texture scenes." Proc., PRIP, August 1979, 391-398.
JULESZ, B. "Textons, the elements of texture perception, and their interactions." Nature 290, March 1981, 91-97.
KENDER, J. R. "Shape from texture: a brief overview and a new aggregation transform." Proc., DARPA IU Workshop, November 1978, 79-84.
KRUGER, R. P., W. B. THOMPSON, and A. F. TWINER. "Computer diagnosis of pneumoconiosis." IEEE Trans. SMC 45, 1974, 40-49.
LAWS, K. I. "Textured image segmentation." Ph.D. dissertation, Dept. of Engineering, Univ. Southern California, 1980.
LU, S. Y. and K. S. FU. "A syntactic approach to texture analysis." CGIP 7, 3, June 1978, 303-330.
MALESON, J. T., C. M. BROWN, and J. A. FELDMAN. "Understanding natural texture." Proc., DARPA IU Workshop, October 1977, 19-27.
MILGRAM, D. L. and A. ROSENFELD. "Array automata and array grammars." Proc., IFIP Congress 71, Booklet TA-2. Amsterdam: North-Holland, 1971, 166-173.
PRATT, W. K., O. D. FAUGERAS, and A. GAGALOWICZ. "Applications of stochastic texture field models to image processing." Proc. IEEE 69, 5, May 1981.
ROSENFELD, A. "Isotonic grammars, parallel grammars and picture grammars." In MI6, 1971.
STEVENS, K. A. "Representing and analyzing surface orientation." In Artificial Intelligence: An MIT Perspective, Vol. 2, P. H. Winston and R. H. Brown (Eds.). Cambridge, MA: MIT Press, 1979.
STINY, G. and J. GIPS. Algorithmic Aesthetics: Computer Models for Criticism and Design in the Arts. Berkeley, CA: University of California Press, 1972.
TAMURA, H., S. MORI, and T. YAMAWAKI. "Textural features corresponding to visual perception." IEEE Trans. SMC 8, 1978, 460-473.
TOU, J. T. and R. C. GONZALEZ. Pattern Recognition Principles. Reading, MA: Addison-Wesley, 1974.
WESZKA, J. S., C. R. DYER, and A. ROSENFELD. "A comparative study of texture measures for terrain classification." IEEE Trans. SMC 6, 4, April 1976, 269-285.
ZUCKER, S. W. "Toward a model of texture." CGIP 5, 2, June 1976, 190-202.
`
Ch. 6 Texture
`
`IPR2021-00921
`Apple EX1015 Page 210
`
`
`
`Motion
`
`7
`
`7.1 MOTION UNDERSTANDING
`
Motion imagery presents many interesting challenges to computer vision, but static scene analysis received more attention in the 1960s and 1970s. In part, this may have been due to a technical problem: with most types of input media and domains, motion vision input is much more voluminous than static vision input. However, we believe that a more basic problem has been the assumption that motion vision could best be understood (or implemented) as many static frames analyzed very quickly, with results linked up in temporal sequence. This characterization of motion vision is extreme but perhaps illuminating. First, it assumes that vision involves processing static scenes. Second, it acknowledges that massive amounts of data may be required. Third, in it, motion understanding degenerates to a postprocessing step which is mostly a matching operation: the differences or similarities between (understood) frames are analyzed and recorded. The extreme "static is basic" view is that motion is an unnaturally complex or difficult problem because it is ill-suited to the techniques available.
A modified view is that object motion provides good image cues for segmentation, much as color might. This approach leads to the use of motion for segmentation, so that motion gets a more basic role in the understanding process. In this view, motion as such is useful for basic image understanding; a motion image sequence may actually be easier to understand than a static image, because the effects of motion can help in segmentation. Recent examples may be found in [Snyder 1981].
A further departure from the "static is basic" view is that motion understanding is qualitatively different from static vision. A logical extreme of this view is that there are many visual processing operations whose primitives are points in motion, and that in fact static vision is the puzzle, being ill-suited to the needs and mechanisms of biological systems. Serious work in computer motion understanding has begun even more recently than computer vision as a whole, and it is too early to dismiss any approach out of hand. There are domains and applications in which the "static is basic" paradigm seems natural, but it also seems very reasonable that animals have perceptual systems or subsystems for which "motion is basic."
Section 7.2 is concerned with processing and understanding the "flow" of the world image across the retina. Section 7.3 considers several techniques for understanding sequences of static images.
`
`7.1.1 Domain Independent Understanding
`
Domain-independent motion processing extracts information from time-varying images using the weakest possible assumptions about the world. Processing that merely transforms the input data into another image-like structure is in the province of generalized image processing. However, if the motion processing aggregates spatial information on the basis of a common feature, then the processing is a form of segmentation.
The basic visual input for domain-independent work in motion vision understanding is optical flow. Although Helmholtz noted the striking immediacy of three-dimensional perception mediated through motion [Helmholtz 1925], Gibson is usually credited with pioneering the theory that a primary visual stimulus for motion is the flow of elements in the optic array, or pattern of luminance in the full sphere of solid angle surrounding the observer [Gibson 1950, 1957, 1965, 1966]. Human beings undoubtedly are sensitive to optical flow, as evidenced by the "looming" reflex [Schiff 1965], the effect of flow on balance [Lee and Lishman 1975], and many other documented phenomena [Nakayama and Loomis 1974].
The basic input to an "optical flow understander" is a continuously changing visual field, which may be considered a field of vectors, each expressing the instantaneous change of position on the optic array of the image of a world point. A field of such vectors is shown in Fig. 7.1. The extraction of the vectors from the changing image is a low-level operation often posited by optical flow research; one computational mechanism was given in Chapter 3. Flow may also be approximated in an image sequence by matching and difference operations (Section 7.3.1).
`Computer vision researchers have recently begun to concern themselves
`with both the geometry and computational mechanisms that might be useful in the
`understanding of optical flow [Horn and Schunck 1980; Clocksin 1980; Prager
`1979; Prazdny 1979; Lawton 1981]. Many formalisms are in use. Cartesian, polar
`space, and spherical coordinates all have their appeal in different situations;
`differential vector geometry and simple analytic geometry are both used; even the
geometry of the eye or camera varies from one study to another. This chapter does not contain a "unified flow theory"; instead it briefly describes several approaches, each of which uses a different aspect of optical flow.
`
`7.1.2 Domain Dependent Understanding
`
The use of models, or at least stronger assumptions about the world, is complementary to domain-independent processing. The changing image, or even the field of optical flow, can be treated as input to a model-driven vision process whose goal is typically to segment the input into areas corresponding to meaningful world objects. The optical flow field becomes just another component of the generalized image, together with intensity, texture, or color. Motion often reveals information similar to that from range data: flow and range are discontinuous at object boundaries, surface orientation may be derived, and so forth. Object (or world) motions determine image (or retinal) motions; we shall be explicit about which motion we mean when confusion can occur.

Fig. 7.1 An example of an optical flow field for an approaching "hill." (a) The hill. (b) Flow field.
Section 7.3 describes how knowledge of object motion phenomena can help in segmenting the flow field. One useful assumption is that the world contains rigid bodies. Tests for rigid bodies and calculations using data from them are quite useful; for example, the three-dimensional position of four points on a rigid object may be determined uniquely from three views (Section 7.3.2). A weaker object model, that objects are assemblies of compound rigid pendula (linkages), is enough to accomplish successful segmentation of very sparse motion input which consists only of images of the end points of links (Section 7.3.3). Section 7.3.4 describes work with a highly specific and detailed model which is used in several ways to restrict low-level image processing and aid in three-dimensional interpretation of human motion images. Section 7.3.5 considers the processing of sequences of segmented images.
The coherence of most three-dimensional objects and their continuity through time are two general principles which, although occasionally violated, guide many segmentation and point-matching heuristics. The assumed correspondence of regions in images with objects is one example. Motion images provide another example: object coherence implies the likelihood of many "continuity" (actually similarity) conditions on the positions and velocities of neighboring image points.
`
`
`
`
Here are five heuristics for use in matching points from images separated by a small time interval [Prager 1979] (Fig. 7.2).
1. Maximum velocity. If a world point is known to have a maximum velocity V with respect to a stationary imaging device, then it can move at most V dt between two images made dt time units apart. Thus given the location of the point in one image (and some assumptions about depth), this constraint limits where the point can appear in the second image.
2. Small velocity change. Since most visible physical objects have finite mass, this heuristic is a consequence of physical laws and the assumption of a "small interval" between images. Of course, the definition of "small interval" depends on the definition of the velocity changes one desires to measure.
3. Common motion. Spatially coherent objects often appear in successive images as regions of points sharing a "common motion." It is interesting that such a weak notion as common motion (and the related "common position") actually can serve to segment very sparse scenes of a few points with very complex motion behavior if a long-enough sequence of images is used (Sections 7.3.3 and 7.3.4).
4. Consistent match. Two points from one image generally do not match a single point from another image (exceptions arise from occlusions). This is one of the main heuristics in the stereopsis algorithm described in Chapter 3.
5. Known motion. If a world model can supply information about object motions, perhaps retinal motions can be derived, predicted, and recognized.

Fig. 7.2 Five heuristics. (Panels: Maximum Velocity, Small Velocity Changes, Common Motion, Consistent Match, Model; images at times t1 and t2.)
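The maximum-velocity and consistent-match heuristics combine naturally into a small point matcher. The sketch below illustrates that combination under our own assumptions and is not Prager's method. Points are 2D image positions. The maximum-velocity bound is expressed directly as an image-plane displacement limit `max_disp`, which stands in for V dt once a depth has been assumed. The consistent-match rule is enforced by greedily accepting the closest remaining pairs, so that no point in either image is used twice. The function name `match_points` is hypothetical.

```python
import math


def match_points(prev_pts, next_pts, max_disp):
    """Greedy one-to-one matching of 2D points between two images.

    Maximum velocity: a pair is a candidate only if its displacement is at
    most max_disp (an image-plane stand-in for V * dt).
    Consistent match: each point in either image is used at most once.
    Returns a dict mapping indices in prev_pts to indices in next_pts.
    """
    candidates = []
    for i, (x0, y0) in enumerate(prev_pts):
        for j, (x1, y1) in enumerate(next_pts):
            d = math.hypot(x1 - x0, y1 - y0)
            if d <= max_disp:              # maximum-velocity gate
                candidates.append((d, i, j))
    candidates.sort()                      # prefer the smallest displacements
    matches, used_next = {}, set()
    for d, i, j in candidates:
        if i in matches or j in used_next:
            continue                       # consistent match: no reuse
        matches[i] = j
        used_next.add(j)
    return matches


prev_pts = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)]
next_pts = [(10.5, 0.2), (0.8, 0.1), (40.0, 40.0)]
print(match_points(prev_pts, next_pts, max_disp=2.0))
```

In this example the third point has no partner within `max_disp` and stays unmatched, which is the behavior the maximum-velocity heuristic is meant to produce. A greedy pass is only one way to honor the consistent-match rule. An optimal assignment (for example, the Hungarian algorithm) would be a heavier alternative.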
In the discussions to follow, these heuristics (and others) are often used or
`implicitly taken as principles. A careful catalog of the probable behavior of objects
`in motion is often a useful practical adjunct to a mathematical treatment. The
`mathematics itself must be based on a set of assumptions, and often these are
`closely related to the phenomenological heuristics noted above.
`
`7.2 UNDERSTANDING OPTICAL FLOW
`
`This section describes some more direct calculations on optical flow, using no
`other input information. Information may be obtained from flow that seems useful
`both for survival in the world and (on a less existential level) for automated image
`understanding. As with shape from shading research (Chapter 3), the paradigm
`here is often to see mathematically what information resides in the input and to use
`this to suggest mechanisms for doing the computation. The flow input is assumed
`to be known (Chapter 3 showed how to derive optical flow by local analysis of
`changing intensity in the image).
`
`7.2.1 Focus of Expansion
`
As one moves through a world of static objects, the visual world as projected on the retina seems to flow past. In fact, for a given direction of translatory motion and direction of gaze, the world seems to be flowing out of one particular retinal point, the focus of expansion (FOE). Each direction of motion and gaze induces a unique FOE, which may be a point at infinity if the motion is parallel to the retinal (image) plane.
`These aspects of optical flow have been studied by computing the simulated
`flow pattern an observer would see while moving through a "forest" of vertical
`cylinders [Prager 1979] or Gaussian hills and valleys [Lawton 1981]. Some sample
`FOEs are shown in Fig. 7.3. Figure 7.3c shows a second FOE when the field of view
`contains an object which is itself in motion.
Fig. 7.3 FOE for rectilinear observer motion. (a) An image. (b) Later image. (c) Flow shows different FOEs for static floor and moving object.

Our first model of the imaging situation is a simplification of the imaging geometry given in Appendix 1. Let the viewpoint be at the origin with the view direction out along the positive z-axis, and let the focal length f = 1. Then the perspective distortion equations simplify to

$$ x' = \frac{x}{z} \tag{7.1} $$

$$ y' = \frac{y}{z} \tag{7.2} $$

In the next two sections the letters u, v, and w (sometimes written as functions of t) denote world point velocity components, or the time derivatives of world coordinates (x, y, z). Observer motion with instantaneous velocity (−dx/dt, −dy/dt, −dz/dt) = (−u, −v, −w), keeping the coordinate system attached to the viewpoint, gives points in a stationary world a relative velocity (u, v, w). Consider a point located at (x₀, y₀, z₀) at some initial time. After a time interval t, its image will be at

$$ (x', y') = \left( \frac{x_0 + ut}{z_0 + wt},\ \frac{y_0 + vt}{z_0 + wt} \right) \tag{7.3} $$

As t varies, this parametric "flow path" equation is that of a straight line. As t goes to minus infinity, the image of the point travels back along the straight line toward a particular point on the image, namely,

$$ \mathrm{FOE} = \left( \frac{u}{w},\ \frac{v}{w} \right) \tag{7.4} $$

This focus of expansion is where the optical flow originates on the image. If the observer changes direction (or objects in the world change their direction), the FOE changes as well.
`
`7.2.2 Adjacency, Depth, and Collision
`
`<>
`
`The flow path equation of a point moving with a constant velocity reveals informa
`tion about its depth in z. The information is not provided directly, since all flow
` alike. However, there is the elegant re
`paths for points at a given depth do not look
`lation
`
`n
`
`^ z(t)
`Pit)
`w(t)
`VU)
`Here again w is dz/dt, and Fis dD/dt. D is the distance along the straight flow path
`from the FOE to the image of the point. Thus the distance/velocity ratio of the
`point's image is the same as the distance/velocity ratio of the world point. This
`result is basic, but perhaps not immediately obvious.
The above relation is called the time-to-adjacency relation, because the right-hand side, z/w, is the z-distance of the point from the image plane divided by its velocity toward the plane. It is thus the time until the point passes through the image plane. This basic time interval is clearly useful when dealing with world objects; it changes when the magnitude of the world point's velocity (or the observer's) changes.
Knowing the depth of any point determines the depth of all others of the same velocity w, for it follows from the two time-to-adjacency equations of the points that

$$ z_2(t) = \frac{z_1(t)\, D_2(t)\, V_1(t)}{V_2(t)\, D_1(t)} \tag{7.6} $$
`
The time-to-adjacency equation allows easy determination of the world coordinates of a point, scaled by its z velocity. If the observer is mobile and in control of his own velocity, and if the world is stationary, such scaled coordinates may be useful. Using the perspective distortion equations,

$$ \frac{x(t)}{w(t)} = \frac{x'(t)\, D(t)}{V(t)} \tag{7.7} $$

$$ \frac{y(t)}{w(t)} = \frac{y'(t)\, D(t)}{V(t)} \tag{7.8} $$

$$ \frac{z(t)}{w(t)} = \frac{D(t)}{V(t)} \tag{7.9} $$
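The time-to-adjacency relation D/V = z/w, the depth-transfer relation (7.6), and the scaled coordinates can be checked with finite differences. The sketch below simulates two points approaching the image plane with a shared constant velocity. It measures only image quantities: the FOE, the distance D, and a central-difference estimate of V = dD/dt. It then compares them with the true world values.

One assumption to flag: for an approaching point w = dz/dt is negative. The sketch therefore compares D/V with z divided by the speed toward the plane, `speed = -w`, and scales coordinates by that same positive speed. This sign convention is ours. The helper names are hypothetical.

```python
import math


def project(p):
    """Perspective projection with f = 1, Eqs. (7.1)-(7.2)."""
    x, y, z = p
    return (x / z, y / z)


def position(p0, vel, t):
    """World position at time t for constant velocity."""
    return tuple(a + b * t for a, b in zip(p0, vel))


def image_D(p0, vel, t, foe):
    """D(t): image distance from the FOE along the straight flow path."""
    xi, yi = project(position(p0, vel, t))
    return math.hypot(xi - foe[0], yi - foe[1])


def image_D_and_V(p0, vel, t, foe, h=1e-4):
    """D(t) and a central-difference estimate of V(t) = dD/dt."""
    D = image_D(p0, vel, t, foe)
    V = (image_D(p0, vel, t + h, foe) - image_D(p0, vel, t - h, foe)) / (2 * h)
    return D, V


vel = (0.3, -0.2, -1.5)              # shared velocity; w < 0 means approaching
foe = (vel[0] / vel[2], vel[1] / vel[2])
p1, p2 = (2.0, 1.0, 12.0), (-1.0, 3.0, 20.0)
t = 2.0
speed = -vel[2]                      # speed toward the image plane

D1, V1 = image_D_and_V(p1, vel, t, foe)
D2, V2 = image_D_and_V(p2, vel, t, foe)
z1, z2 = position(p1, vel, t)[2], position(p2, vel, t)[2]
print("D/V:", D1 / V1, " time to adjacency:", z1 / speed)
print("z2 via Eq. (7.6):", z1 * V1 * D2 / (V2 * D1), " true z2:", z2)

# World coordinates of p1 scaled by the speed toward the plane
xi, yi = project(position(p1, vel, t))
scaled = (xi * D1 / V1, yi * D1 / V1, D1 / V1)
true_scaled = tuple(c / speed for c in position(p1, vel, t))
print(scaled, true_scaled)
```

Note that the observer never needs the true depth. The image-only ratio D/V already gives the time to adjacency, and a single known depth fixes all the others that share the same w.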
`
`
`
As a last example, let us relate optical flow to the sensing of impending collisions with world objects. The focal point of the imaging system, or origin of coordinates, is at any instant headed "toward the focus of expansion," whose image coordinates are (u/w, v/w). It is thus traveling in the direction

$$ \mathbf{O} = \left( \frac{u}{w},\ \frac{v}{w},\ 1 \right) \tag{7.10} $$

and is following at any instant a path in the environment instantaneously defined by the parametric equation

$$ (x, y, z) = t\,\mathbf{O} = t \left( \frac{u}{w},\ \frac{v}{w},\ 1 \right) \tag{7.11} $$

where t acts like a real scalar measure of time. Given this vector expression for the path of the observer, one can apply well-known vector formulas from analytic solid geometry to derive useful information about the relation of this path to world points, which are also vectors.
For example, the position P along the observer's path at which a world point approaches closest is given by

$$ \mathbf{P} = \frac{\mathbf{x} \cdot \mathbf{O}}{\mathbf{O} \cdot \mathbf{O}}\, \mathbf{O} \tag{7.12} $$

where O is the direction of observer motion and x the position of the world point. Here the period (·) is the dot product operator. The squared distance Q² between the observer and the world point at closest approach is then

$$ Q^2 = (\mathbf{x} \cdot \mathbf{x}) - \frac{(\mathbf{x} \cdot \mathbf{O})^2}{\mathbf{O} \cdot \mathbf{O}} \tag{7.13} $$
`
`7.2.3 Surface Orientation and Edge Detection
`
It is possible to derive surface orientation and to characterize certain types of surface discontinuities (edges) by their motion. A formalism, computer program, and biologically motivated computational mechanism for these calculations were developed in [Clocksin 1980].
This section outlines mainly the surface orientation aspect of this work. As usual, the model is for a monocular observer, whose focal point is the origin of coordinates. An unusual feature of the model is that the observer has a spherical retina. The world is thus projected onto an "image unit sphere" instead of an image plane. World points and surface orientation are represented in an observer-centered Cartesian coordinate system. The image sphere has a spherical coordinate system which may be considered as "longitude" θ and "latitude" φ. These coordinates bear no relation to the orientation of the retina. World points are then determined by their image coordinates and a range r. An observer-centered Cartesian coordinate system is also useful; it is related to the sphere as shown in Fig. 7.4, and by the transformations given in Appendix 1.

Fig. 7.4 Spherical coordinate system, and the definition of σ and τ.

The flow of the image of a freely moving world point may be found through the following derivation. As before, let the world velocity of the point (possibly induced by observer motion) (dx/dt, dy/dt, dz/dt) be written (u, v, w). Similarly, write the angular velocities of the image point in the θ and φ directions as

$$ \delta = \frac{d\theta}{dt} \tag{7.14} $$

$$ \varepsilon = \frac{d\phi}{dt} \tag{7.15} $$

Then from the coordinate transformation equations of Appendix 1,

$$ y = x \tan\theta \tag{7.16} $$

Differentiating and solving for dθ/dt (written as δ) gives

$$ \delta = \frac{v - u \tan\theta}{x \sec^2\theta} \tag{7.17} $$

Substituting for x its spherical coordinate expression r sin φ cos θ and simplifying yields the general expression for flow in the θ direction:
`
`(7.14)
`
`(7.15)
`
`(7.16)
`
`(7.17)
`
`ft a,
`
`V C 0 S ^ ~ U Sin®
`r sin</>
`The derivation of e proceeds from the coordinate transformation equation
`2 = r cos<f>
`Differentiating, solving for d<f>/dt (written as e), and using
`
`Understanding Optical Flow
`
`(n
`
`i o)
`
`(7.19)
`
`203
`
`IPR2021-00921
`Apple EX1015 Page 219
`
`
`
`dr _ xu + yv + zw
`dt
`r
` <f> direction:
`yields the general expression for flow in the
` — nv
`_ (xu + yv + zw) cos</>
`r2 sin<f>
As usual, general point motions are rather complicated to deal with, and more constraints are needed if the optic flow is to be "inverted" to discover much about the outside world. Let us then make the simplification that the world is stationary and the observer is traveling along the z direction at some speed S. (This assumption is briefly discussed below.) Explicitly, suppose that

$$ u = 0, \qquad v = 0, \qquad w = -S $$

Substituting these into the general flow equations (7.18) and (7.21) yields simplified flow equations:

$$ \delta = 0 \tag{7.22} $$

$$ \varepsilon = \frac{S \sin\phi}{r} \tag{7.23} $$

Thus r is a function of θ and φ, and therefore so is ε.
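The general flow equations (7.18) and (7.21), and the simplified equation (7.23), can be checked against finite differences of the spherical coordinates of a moving point. The sketch assumes the conventions implied by the derivation: x = r sin φ cos θ, y = r sin φ sin θ, z = r cos φ. The exact form of the Appendix 1 transformations is our assumption. θ is computed with `atan2` and φ with `acos`, and the helper names are hypothetical.

```python
import math


def to_sphere(p):
    """Return (theta, phi, r) with x = r sin(phi) cos(theta),
    y = r sin(phi) sin(theta), z = r cos(phi)."""
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    return math.atan2(y, x), math.acos(z / r), r


def flow_general(p, vel):
    """(delta, epsilon) from the general flow equations (7.18) and (7.21)."""
    x, y, z = p
    u, v, w = vel
    theta, phi, r = to_sphere(p)
    delta = (v * math.cos(theta) - u * math.sin(theta)) / (r * math.sin(phi))
    eps = ((x * u + y * v + z * w) * math.cos(phi) - r * w) / (r * r * math.sin(phi))
    return delta, eps


def flow_numeric(p, vel, h=1e-6):
    """Central differences of theta(t) and phi(t) for straight-line motion."""
    ahead = to_sphere(tuple(a + b * h for a, b in zip(p, vel)))
    behind = to_sphere(tuple(a - b * h for a, b in zip(p, vel)))
    return ((ahead[0] - behind[0]) / (2 * h), (ahead[1] - behind[1]) / (2 * h))


p, vel = (1.0, 2.0, 3.0), (0.4, -0.3, 0.7)
print(flow_general(p, vel), flow_numeric(p, vel))

# Stationary world, observer moving along +z at speed S: (u, v, w) = (0, 0, -S)
S = 2.0
delta, eps = flow_general(p, (0.0, 0.0, -S))
theta, phi, r = to_sphere(p)
print(delta, eps, S * math.sin(phi) / r)   # delta = 0, eps = S sin(phi) / r
```

The test point is chosen away from the poles (φ = 0) and from the θ = ±π seam of `atan2`, where the finite differences would misbehave.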
It is this simplified flow equation which forms the basis for surface orientation calculation and edge detection. The goals are to assign to any point in the flow field one of three interpretations (edge, surface, or space) and also to derive the type of edge and the orientation of the surface.
To find surface orientation, represent the surface normal of a surface I by two angles σ and τ defined as in Fig. 7.4, with the two planes of σ and τ being the RZ and QR planes, respectively. The slant is measured relative to the line of sight, denoted by R in the figure. The angles σ and τ correspond to depth changes in "depth profiles" oriented along lines of constant θ and φ, respectively. Thus,

$$ \tan\sigma = \frac{1}{r}\,\frac{\partial r}{\partial \phi} \tag{7.24} $$

$$ \tan\tau = \frac{1}{r}\,\frac{\partial r}{\partial \theta} \tag{7.25} $$

Surface orientation is defined by σ and τ or equivalently by their tangents. A surface perpendicular to the line of sight has σ = τ = 0.
Equations (7.24) and (7.25) assume the range r is known. However, one can determine the tangents without knowing r through the simplified flow equation, Eq. (7.23). The latter may be written

$$ r = \frac{S \sin\phi}{\varepsilon(\theta, \phi)} $$

where ε(θ, φ) gives the flow in the φ direction. Differentiating this with respect to θ and φ gives
`
`204
`
`Ch. 7 Motion
`
`IPR2021-00921
`Apple EX1015 Page 220
`
`
`
`
`
`dr = „ e cos 0 — sin <ft Qe/d0)
`e 2
`St
` sin t fa/d9)
`dr_ = $
`e 2
`dO
`These last three equations may be substituted into Eqs. (7.24) and (7.25), and the
`results may then be simplified to the following surface orientation equations:
`J_
`lne
`90
`
`tancr = cotd) —
`
`tar—JLfcw)
`
`(7.26)
`
`(7.27)
`
`(7.28)
`
`(7.29)
`
`These tangents are thus easily computed from optical flow. The result does
`not depend on velocity, and no depth scaling is required. In fact, absolute depth is
`not computable unless we know more, such as the observer speed.
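The surface orientation equations say that the two tangents can be read off the flow field ε(θ, φ) alone, with no knowledge of r or S. The sketch below illustrates this under several assumptions of our own. The surface is a synthetic plane z = Z0 + A·x with hypothetical values Z0 and A. Flow is generated from the plane with the simplified equation ε = S sin φ / r. Partial derivatives are taken by central differences.

The sketch compares the flow-only tangents with the range-based definitions (7.24) and (7.25), and repeats the computation for two observer speeds to show that S drops out. All names are hypothetical.

```python
import math

Z0, A = 5.0, 0.3                       # hypothetical plane z = Z0 + A * x


def range_r(theta, phi):
    """Range to the plane along the viewing direction (theta, phi)."""
    return Z0 / (math.cos(phi) - A * math.sin(phi) * math.cos(theta))


def flow_eps(theta, phi, S):
    """Simplified flow in the phi direction: eps = S sin(phi) / r."""
    return S * math.sin(phi) / range_r(theta, phi)


def partial(f, theta, phi, wrt, h=1e-5):
    """Central-difference partial derivative of f(theta, phi)."""
    if wrt == "theta":
        return (f(theta + h, phi) - f(theta - h, phi)) / (2 * h)
    return (f(theta, phi + h) - f(theta, phi - h)) / (2 * h)


def tangents_from_flow(eps, theta, phi):
    """tan(sigma), tan(tau) from the flow field alone, Eqs. (7.28)-(7.29)."""
    log_eps = lambda t, p: math.log(eps(t, p))
    tan_sigma = 1.0 / math.tan(phi) - partial(log_eps, theta, phi, "phi")
    tan_tau = -partial(log_eps, theta, phi, "theta")
    return tan_sigma, tan_tau


def tangents_from_range(theta, phi):
    """tan(sigma), tan(tau) from the range, Eqs. (7.24)-(7.25)."""
    r = range_r(theta, phi)
    return (partial(range_r, theta, phi, "phi") / r,
            partial(range_r, theta, phi, "theta") / r)


theta, phi = 0.4, 0.5
for S in (1.0, 3.0):                   # observer speed should not matter
    eps = lambda t, p, S=S: flow_eps(t, p, S)
    print(S, tangents_from_flow(eps, theta, phi))
print("from range:", tangents_from_range(theta, phi))
```

Taking the logarithm is what removes S. Because ln ε = ln S + ln sin φ − ln r, the constant ln S vanishes under differentiation.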
Turning briefly to edge perception: although physical edges are a depth phenomenon, in flow they are mirrored by ε, the flow measure that allows determination of orientation without depth. In particular, it is possible to demonstrate that the Laplacian of ε has singularities where the Laplacian of depth has singularities. An arc on the sphere projects out onto a "depth profile" in the world, along which depth may vary. If the arc is parameterized by a, relations among the depth profile, flow profile, and the singularities in flow are shown in Fig. 7.5. Thus the Laplacian of ε provides information about edge type but not about edge depth.
Fig. 7.5 The singularities of the second derivative of the flow profile inform about the type of edge. (Panels: range profile, flow profile, theoretical edge signature.)

The formal derivations are at an end. Implementing them in a computer program or in a biological system requires solutions to several technical problems. More details on the implementation of this model on a computer, and a possible implementation using low-level physiological vision primitives, appear in [Clocksin 1980]. There are some data on human performance for the types of tasks attempted by the program. The assumption of a fixed environment basically implies that flow motions in the environment are likely to be interpreted as observer motions. This view is rather strikingly borne out by "swaying room" experiments [Lee and Lishman 1975], in which a subject stands in a swayable visual environment. (A large, low-mass bottomless box suspended from above may be lowered around the subject, giving him a room-like visual environment.) When the hanging "room" is made to sway, the subject inside tends to lose balance. Further, moving surfaces in the real world are quite often objects of interest, such as animals.
A survey of depth perception experiments [Braunstein 1976] points to motion as the dominant indicator of surface orientation perception. Random-dot displays of monocular flow patterns [Rogers and Graham 1979] evoke striking perceptions of solid oriented surfaces; flow may be adequate for shape and depth perception even with no other depth information. Experiments on the perception of "edges," or discontinuities in flow caused by discontinuities in depth of textured surfaces, are less common. However, there have been enough to provide some confirmation of the model.
The computational model is consistent with, and has correctly predicted, psychological data on human thresholds for slant and edge perception in optical flow fields. (The thresholds are on the amount of slant to the surface and the depth difference of the edge sides.) The computational model can be used to determine range, but only to poor accuracy; this happens to correspond with the human trait that orientation is much more accurately determined by flow than is range. Quantitatively, the accuracies of orientation and range determinations are the same for the model and for human beings under similar conditions.
`
`7.2.4 Egomotion
`
It is possible to extract information about complex observer motions from optical flow, although at considerable computational cost. In one formulation [Prazdny 1979], a model observer is allowed to follow any space curve in an environment of stationary objects, while at the same time turning its head. It is possible to derive formulae that determine the observer's instantaneous velocity vector and head rotational vector from a small number (six) of flow vectors in the image on a (standard flat) retina.
The equations that describe flow given observer motion and head rotation can be quite compactly written by using vector operators and a polar coordinate system (similar to that of the last section). The inherent elegance and power of the vector operations is well displayed in these calculations. Inverting the equations results in a system of three cubic equations of 20 terms each. Such a system can be solved by normal methods for simultaneous nonlinear equations, but the solutions tend to be relatively sensitive to noise. In the noise-free case, the method seems to perform quite adequately.
The calculation yields a method for deriving relative depth, or the ratio of the distances of points from the observer. An approximation to surface orientation may be obtained using several relative depth measurements in a small area and assuming that the surface normal varies slowly in the area.
`
`7.3 UNDERSTANDING IMAGE SEQUENCES
`
An image sequence is an ordered set of images. The image sequences of interest here are samplings of four-dimensional space-time. Commonly, as in a movie, the images are two-dimensional projections of a three-dimensional physical world, sequenced through time. Sometimes the sequence consists of two-dimensional images of essentially two-dimensional slices of the three-dimensional world, sequenced through the third spatial dimension. Some of the techniques in this section are useful in interpreting the three-dimensional nature of objects from such spatial image sequences, but the main concern here is with temporal image sequences. In many practical applications, the input must be such a sequence, and continuous motion must be inferred from discrete location differences of image points. The thrust of work under these assumptions is often to extend static image understanding by making models that incorporate or explain objects in motion, extending segmentation to work across time [Thompson 1979; Tsotsos 1980].
When asked why he was listening to a metronome ticking, Ezra Pound is said to have replied that he did not listen to the ticks, but to the "spaces between them." Like Pound, we take the ticks, or images, as given, and are really interested in what goes on "between the ticks." We usually want to determine and describe how the images are related to each other. This information must be derived from the static images, and two approaches immediately present themselves: broadly, the first is to look for differences between the images, and the second is to look for similarities.
These two approaches are complementary, and are often used together. A general paradigm for object-oriented motion analysis is the following:
1. Segment (describe) the individual images. This process may be complex, yielding a relational structure or a segmentation into regions or edges. An important special case is the one in which the description (se