`
`Intensity and range images, (a) A (synthesized) intensity image of a
`Fig. 2.26
`street scene with potholes. The roofs all have the same intensity, which is different
`from the walls; (b) a corresponding range image. The wall and roof of each house
`have similar ranges, but the ranges differ from house to house.
`
`One basic difference between sound and visible light ranging is that a light
`beam is usually reflected off just one surface, but that a sound beam is generally
`partially transmitted and partially reflected by "surfaces." The returning sound
`pulse has structure determined by the discontinuities in impedence to sound found
`in the medium through which it has passed. Roughly, a light beam returns infor
`mation about a spot, whereas a sound beam can return information about the
`medium in the entire column of material. Thus, although sound itself travels rela
`tively slowly, the data rate implicit in the returning structured sound pulse is quite
`high. Figure 2.27 shows an image made using the range data from ultrasound. The
`
`Sec. 2.3
`
`Imaging Devices for Computer Vision
`
`55
`
`IPR2021-00921
`Apple EX1015 Page 73
`
`
`
`Image made from
`Fig. 2.27
`ultrasound ranging.
`
`sound pulses emanate from the top of the image and proceed toward the bottom,
`being partially reflected and transmitted along the way. In the figure, it is as if we
`were looking perpendicular to the beams, which are being displayed as brighter
`where strong reflectance is taking place. A single "scan line" of sound thus pro
`duces an image of an entire planar slice of medium.
`
`2.3.3 Reconstruction Imaging
`
`Twodimensional reconstruction has been the focus of much research attention
`because of its important medical applications. Highquality images such as that
`shown in Fig. 1.2b can be formed by multiple images of xray projection data. This
`section contains the principles behind the most important reconstruction algo
`rithms. These techniques are discussed in more detail with an expanded list of
`references in [Gordon and Herman 1974]. For a view of the many applications of
`twodimensional reconstruction other than transmission scanning, the reader is re
`ferred to [Gordon etal. 1975].
`Figure 2.28 shows the basic geometry to collect onedimensional projections
`of twodimensional data. (Most systems construct the image in a plane and repeat
`this technique for other planes; there are few true threedimensional reconstruc
`tion systems that use planes of projection data simultaneously to construct
`volumes.)
` projection of
`In many applications sensors can measure the onedimensional
` Gc') of an ideal image fix, y) in the
`twodimensional image data. The projection g
`direction 9 is given by J fix', y') dy' where x' = R
`9x. If enough different projec
`tions are obtained, a good approximation to the image can be obtained with two
`dimensional reconstruction techniques.
`From Fig. 2.28, with the source at the first position along line AA', we can ob
`tain the first projection datum from the detector at the first position along
`line AB is termed a ray and the measurement at B a ray sum. Moving the source
`
` BB'. The
`
`56
`
`Ch. 2
`
`Image Formation
`
`IPR2021-00921
`Apple EX1015 Page 74
`
`
`
`B
`
`Fig. 2.28 Projection geometry.
`
`and detector along lines AA'and BB'in synchrony allows us to obtain the entire
`data for projection 1. Now the lines AA'and BB'are rotated by a small angle
`about 0 and the process is repeated. In the original xray systems d9 was 1° of an
`gle, and 180 projections were taken. Each projection comprised 160 transmission
`measurements. The reconstruction problem is simply this: Given the projection
`datagjCx'), k — 0, . . ., N — I, construct the original image
`fix).
`Systems in use today use a fan beam rather than the parallel rays shown.
`However, the mathematics is simpler for parallel rays and illustrates the funda
`mental ideas. We describe three related techniques: summation, Fourier interpola
`tion, and convolution.
`
` dO
`
`The Summation Method
`k{x') over the
`The summation method is simple: Distribute every ray sum g
`image cells along the ray. Where there are N cells along a ray, each such cell is in
`cremented by —gix'). This step is termed back projection. Repeating this process
`for every ray results in an approximate version of the original [DeRosier 1971].
`This technique is equivalent (within a scale factor) to blurring the image, or con
`volving it with a certain pointspread function. In the continuous case of infinitely
`many projections, this function is simply the radically symmetric h(r) = \/r.
`
`Sec. 2.3
`
`Imaging Devices for Computer Vision
`
`57
`
`IPR2021-00921
`Apple EX1015 Page 75
`
`
`
`(a)
`
`Fig. 2.29 Basis of Fourier techniques, (a) Projection axis x'; (b) corresponding
`axis in Fourier Space.
`
`Fourier A Igorithms
`If a projection is Fouriertransformed, it defines a line through the origin in
`frequency space (Fig. 2.29). To show this formally, consider the expression for the
`twodimensional transform
`
`Fin) = fffix,
`
`y) exp \J2TT (wc + vy)] dx dy
`
`(2.47)
`
`Now consider^ = 0 (projection onto thexaxis): x'
`
` = A:and
`
`Sot*') ffix,
`The Fourier transform of this equation is
`
`y) dy
`
`5 Igoix')] = fflfix,
`
`y) dy) expj2irux dx
`
`= J J fix, y) expj2iTux dy dx
`
`which, by comparison with (2.47), is
`
`(2.48)
`
`(2.49)
`
`yfeoOcO] = F(u,0)
`Generalizing to any 0, the transform of an arbitrary g(x') defines a line in the
`k (w) is the cross section
`Fourier space representation of the cross section. Where S
`of the Fourier transform along this line,
`Sk(<o) = F(u cos0, u sin0)
`= J gk(x') exp
`Thus one way of reconstructing the original image is to use the Fourier transform
`of the projections to define points in the transform of fix),
`interpolate the
`undefined points of the transform from the known points, and finally take the in
`verse transform to obtain the reconstructed image.
`
`(2.50)
`
`(2.51)
`
`[jl7ru(x')]dx'
`
`Ch. 2 Image Formation
`
`IPR2021-00921
`Apple EX1015 Page 76
`
`
`
`r M
`
`M
`
`i «i
`
`»,
`
`Fig. 2.30 Convolution method.
`
`This technique can be applied with transforms other than the Fourier
`transform, and such methods are discussed in [DeRosier 1971; Crowther and Klug
`1971].
`The Convolution Method
`The convolution method is the natural extension of the summation method.
`Since the summation method produces an image degraded from its convolution
`with some function h, one can remove the degradation by a "deconvolution." The
`straightforward way to accomplish this is to Fouriertransform the degraded image,
`l , and inverseFourier
`multiply the result by an estimate of the transformed h~
`transform the result. However, since all the operations are linear, a faster approach
`is to deconvolve the projections before performing the back projection. To show
`this formally, we use the inverse transform
`
`fix) = ff
`
`F(u, v) exp
`
` {/2TT(UX + vy)]du dv
`
`(2.52)
`
`Changing to cylindrical coordinates ico, 9) yields
`
`fix) = ff
`
`F 9i(o)exp[j2ira>ixcos9
`
`+ y sin 9)]\oo\doo d9
`
`(2.53)
`
`Sincex'= xcos9 + y sin0, rewrite Eq. (2.53) as
`
`fix) =
`
`fr
`
`l{Feia>)Hi<o)}d9
`Since the image is bandlimited at some interval (—to
`m, co m) one can define Hico)
`arbitrarily outside of this interval. Therefore, Hico) can be defined as a constant
`minus a triangular peak as shown in Fig. 2.30. Finally, the operation inside the in
`tegral in Eq. (2.54) is a convolution. Using the transforms shown in Fig. 2.30,
`
`(2.54)
`
`fix) = flfoix')
`
`~ f„ix')co
`
`msmc2icomx')]
`Owing to its speed and the fact that the deconvolutions can be performed
`while the data are being acquired, the convolution method is the method employed
`in the majority of systems.
`
`d9
`
`(2.55)
`
`EXERCISES
`
`2.1
`
` an eye of 50 mm and a
`In a binocular animal vision system, assume a focal length /of
` Ax vs. —z using Eq. (2.9). If the resolu
`separation distance rfof 5 cm. Make a plot of
` 50 line pairs/mm, what is the useful range of the bi
`tion of each eye is on the order of
`nocular system?
`
`Exercises
`
`59
`
`IPR2021-00921
`Apple EX1015 Page 77
`
`
`
`In an opponentprocess color vision system, assume that the following relations hold:
`
`RG
`
`Red
`
`Yellow
`
`Blue
`
`BY
`
`Green
`
` components of the opponentprocess sys
`For example, if the (RG, B Y, WBk)
`tem are (0.5, 3, 4), the perceived color will be blue.
` for the following (R,G,B) measurements:
`Work out the perceived colors
`(b)
`(0.2,0.3,0)
`(0.2,0.3,0.4)
`(c)
`(7,4,1)
`(a)
`Develop an indexing scheme for a hexagonal array and define a Euclidean distance
`measure between points in the array.
`Assume that a onedimensional image has the following form:
`
`fix)
`
`= COS(2TT« 0A:)
`and is sampled with u s = u 0. Using the graphical method of Section 2.2.6, find an ex
` to the original im
`pression for fix) as given by Eq. (2.49). Is this expression equal
`age? Explain.
`A certain image has the following Fourier transform:
`
`F(u) =
`
`nonzero
`0
`
`inside a hexagonal domain
`otherwise
`
`(a)
`
`(b)
`
`What are the smallest values for u and v so that Fiu) can be reconstructed
`from F x (u) ?
` v
`Suppose now that rectangular sampling is not used but that now the u and
`directions subtend an angle of 7r/3. Does this change your answer as
` to the
`smallest wand v? Explain.
` of Fig. 2.3 to include convergence: Let the two
`Extend the binocular imaging model
` a
`imaging systems pivot in the y = 0 plane about the viewpoint. Let the system have
`baseline of 2d and be converged at some angle 0 such that a point ix,
` y, z) appears at
`the origin of each image plane.
`(a) Solve for z in terms of r and 9.
`(b) Solve for z in this situation for points with nonzero disparity.
` two Rect functions, where
`Compute the convolution of
`
`Rectlv) =
`
`1
`0
`
`0 < x< 1
`otherwise
`
`Show the steps in your calculations.
`
`Ch. 2
`
`Image Formation
`
`IPR2021-00921
`Apple EX1015 Page 78
`
`
`
`2.8
`
`Rect(x) =
`
`b
`0
`
`for |x | < a
`otherwise
`
`(a) WhatisRect(x)*8(xa)?
`(b) What is the Fourier transform of fix) where fix) = RectGc+c) +
`Rect(xc) and c> a?
`2.9 A digitizer has a sampling interval of Ax = by = A. Which of the following images
`can be represented unambiguously by their samples? (Assume that effects of a finite
`image domain can be neglected.)
`(a)
`(sm(7rx/A))/(7rx/A)
`(b) cos (7r/x/2A)cos(37rx/4A)
`(c) Rect(x) (see Problem 2.8)
`e"* 2
`
`(d)
`
`R E F E R E N C ES
`
`AGIN, G. J. "Representation and description of curved objects" (Ph.D. dissertation). AIM173, Stan
`ford AI Lab, October 1972.
`ANDREWS, H. C. and B. R.
` HUNT. Digital Image Restoration. Englewood Cliffs, NJ: PrenticeHall, Inc.,
`1977.
` KLUG. "ART and science, or, conditions for 3d reconstruction from electron
`CROWTHER, R. A. and A.
`microscope images." /. Theoretical Biology 32, 1971.
`DEROSIER, D. J. "The reconstruction of threedimensional images from electron micrographs." Con
`temporary Physics 12, 1971.
`DUDA, R. O. and P. E.
` HART. Pattern Recognition and Scene Analysis. New York: Wiley, 1973.
`GARVEY, T. D. "Perceptual strategies for purposive vision." Technical Note 117, AI Center, SRI Inter
`national, September 1976.
`GONZALEZ, R. C. and P.
` WINTZ. Digital Image Processing. Reading, MA: AddisonWesley, 1977.
`GORDON, R. and G. T.
` HERMAN. "Threedimensional reconstruction from projections: a review of al
` 111151.
`gorithms." International Review of Cytology 38, 1974,
`GORDON, R., G. T.
` HERMAN, and S. A.
` JOHNSON. "Image reconstruction from projections." Scientific
`American, October 1975.
`HERING, E. "Principles of a new theory of color sense." In Color Vision, R.C. Teevan and R.C. Birney
`(Eds.). Princeton, NJ: D. Van Nostrand, 1961.
` 201231.
` Intelligence 8, 2, April 1977,
`HORN, B. K. P. "Understanding image intensities." Artificial
`HORN, B. K. P. and R. W.
` SJOBERG. "Calculating the reflectance map." Proc, DARPA IU Workshop,
`November 1978,115126.
`HURVICH, L. M. and D.
` JAMESON. "An opponentprocess theory of color vision." Psychological Review
`64,1957,384390.
`JAIN, A. K. "Advances in mathematical models for image processing." Proc. IEEE 69, 5, May 1981,
`502528.
` GREENBERG. "Color spaces for computer graphics." Computer Graphics 12, 3,
`JOBLOVE, G. H. and D.
`August 1978, 2025.
`KENDER, J. R. "Saturation, hue, and normalized color: calculation, digitization effects, and use."
`Technical Report, Dept. of Computer Science, CarnegieMellon Univ., November 1976.
`
`References
`
`61
`
`IPR2021-00921
`Apple EX1015 Page 79
`
`
`
`LAND, E. H. "The retinex theory of color vision." Scientific American, December 1977, 108128.
`MUNSELL, A. H.A Color Notation,
` 8th ed. Baltimore, MD: Munsell Color Co., 1939.
`
` LIMPERIS. "Geometrical con
` GINSBERG, and T.
` HSIA, I. W.
` RICHMOND, J. J.
`NICODEMUS, F. E., J. C.
`siderations and nomenclature for reflectance." NBS Monograph 160, National Bureau
` of Stand
`ards, U.S. Department
` of Commerce, Washington, DC, October 1977.
`NITZAN, D., A. BRAIN, and R. DUDA. "The measurement and use of registered reflectance and range
` 2, February 1977.
`data in scene analysis." Proc. IEEE 65,
`
` CRAWFORD. "Forming models of plane
` AMBLER, and G. F.
` BROWN, A. P.
`POPPLESTONE, R. J., C. M.
` 4th IJCAI, September 1975, 664668.
`andcylinder faceted bodies from light stripes." Proc,
`PRATT, W. K. Digital Image Processing. New York: WileyInterscience, 1978.
`ROSENFIELD A. and A. C. KAK. Digital Picture Processing. New York: Academic Press, 1976.
`SMITH, A. R. "Color gamut transform pairs." Computer
` Graphics 12, 3, August 1978, 1219.
`SUGIHARA, K. "Dictionaryguided scene analysis based
` on depth information." In Progress Report on
`3D Object Recognition. Bionics Research Section, ETL, Tokyo, March 1977.
`TENENBAUM, J. M. and S.
` WEYL. "A regionanalysis subsystem for interactive scene analysis." Proc,
`4th IJCAI, September 1975, 682687.
`
` GRAMIAK. "Methods for ultrasonic imaging of the heart." Ultrasound in Medicine
`WAAG, R. B. and R.
`and Biology 2, 1976, 163170.
` a preprocessing technique for robot and machine vi
`WILL, P. M. and K. S. PENNINGTON. "Grid coding:
`sion." Artificial Intelligence 2, 3/4, Winter 1971, 319329.
`
`62
`
`Ch. 2
`
`Image Formation
`
`IPR2021-00921
`Apple EX1015 Page 80
`
`
`
`Early Processing
`
`3
`
`3.1 RECOVERING INTRINSIC STRUCTURE
`
`The imaging process confounds much useful physical information into the gray
`level array. In this respect, the imaging process is a collection of degenerate
`transformations. However, this information is not irrevocably lost, because there
`is much spatial redundancy: Neighboring pixels in the image have the same or
`nearly the same physical parameters. A collection of techniques, which we call
`early processing, exploits this redundancy in order to undo the degeneracies in the
`imaging process. These techniques have the character of transformations for
` intrinsic images [Barrow and
`changing the image into "parameter images" or
`Tenenbaum 1978; 1981] which reflect the spatial properties of the scene. Common
`intrinsic parameters are surface discontinuities, range, surface orientation, and
`velocity.
`In this chapter we neglect highlevel internal model information even though
`it is important and can affect early processing. Consider the case of the perceived
`central edge in Fig. 3.1a. As shown by Fig. 3.1b, which shows portions of the same
`image, the central edge of Fig. 3.1a is not present in the data. Nevertheless, the hu
`man perceiver "sees" the edge, and one reasonable explanation is that it is a prod
`uct of an internal block model. Modeldirected activity is taken up in later
`chapters. These examples show how high level models (e.g., circles) can affect
`lowlevel processors (e.g., edge finders). However, for the purposes of study it is
`often helpful to neglect these effects. These simplifications make it easier to derive
`the fundamental constraints between the physical parameters and gray levels. Once
`these are understood, they can be modified using the more abstract structures of
`later chapters.
`Most early computer vision processing can be done with parallel computa
`tions whose inputs tend to be spatially localized. When computing intrinsic images
`
`63
`
`IPR2021-00921
`Apple EX1015 Page 81
`
`
`
`(a)
`
`(b)
`
`Fig. 3.1
`
`(a) A perceived edge, (b) Portions of image in (a) showing the lack of image data.
`
`the parallel computations are iterated until the intrinsic parameter measurements
`converge to a set of values. A computation that falls in the paralleliterative
` relaxation [Rosenfeld et al. 1976]. Relaxa
`category is known in computer vision as
`tion is a very general computational technique that is useful in computer vision.
`Specific examples of relaxation computations appear throughout the book; general
`observations on relaxation appear in Chapter 12.
`This chapter covers six categories of early processing techniques:
`
`1. Filtering is a generic name for techniques of changing image gray levels to
`enhance the appearance of objects. Most often this means transformations
`that make the intensity discontinuities between regions more prominent.
`These transformations are often dependent on gross object characteristics. For
`example, if the objects of interest are expected to be relatively large, the image
`can be blurred to erase small intensity discontinuities while retaining those of
`the object's boundary. Conversely, if the objects are relatively small, a
`transformation that selectively removes large discontinuities may be appropri
`ate. Filtering can also compensate for spatially varying illumination.
`
`2. Edge operators detect and measure very local discontinuities in intensity or its
`gradient. The result of an edge operator is usually the magnitude and orienta
`tion of the discontinuity.
`
`3. Range transforms use known geometry about stereo images to infer the dis
`tance of points from the viewer. These transforms make use of the inverse per
`spective transform to interpret how points in threedimensional space project
`onto stereo pairs. A correspondence between points in two stereo images of
`known geometry determines the range of those points. Relative range may
`also be derived from local correspondences without knowing the imaging
`geometry precisely.
`4. Surface orientation can be calculated if the source illumination and reflectance
`properties of the surface are known. This calculation is sometimes called
`
`Ch. 3 Early Processing
`
`IPR2021-00921
`Apple EX1015 Page 82
`
`
`
`"shape from shading." Surface orientation is particularly simple to calculate
`when the source illumination can be controlled.
`
`5. Optical flow, or velocity fields of image points, can be calculated from local
`temporal and spatial variations in sequences of graylevel images.
`
`6. A pyramid is a general structure for representing copies of the image at multi
`ple resolutions. A pyramid is a "utility structure" which can dramatically im
`prove the speed and effectiveness of many early processing and later segmen
`tation algorithms.
`
`3.2 FILTERING THE IMAGE
`
`Filtering is a very general notion of transforming the image intensities in some way
`so as to enhance or deemphasize certain features. We consider only transforms
`that leave the image in its original format: a spatial array of gray levels. Spurred on
`by the needs of planetary probes and aerial reconnaissance, filtering initially
`received more attention than any other area of image processing and there are ex
`cellent detailed reference works (e.g., [Andrews and Hunt 1977; Pratt 1978; Gon
`zalez and Wintz 1977]). We cannot afford to examine these techniques in great
`detail here; instead, our intent is to describe a set of techniques that conveys the
`principal ideas.
`Almost without exception, the best time to filter an image is at the image for
`mation stage, before it has been sampled. A good example of this is the way chemi
`cal stains improve the effectiveness of microscopic tissue analysis by changing the
`image so that diagnostic features are obvious. In contrast, filtering after sampling
` noise, that are undesir
`often emphasizes random variations in the image, termed
`able effects introduced in the sampling stage. However, for cases where the image
`formation process cannot be changed, digital filtering techniques do exist. For ex
`ample, one may want to suppress low spatial frequencies in an image and sharpen
`its edges. An image filtered in this way is shown in Fig. 3.2.
`Note that in Fig. 3.2 the work of recognizing realworld objects still has to be
`done. Yet the edges in the image, which constitute object boundaries, have been
`made more prominent by the filtering operation. Good filtering functions are not
`easy to define. For example, one hazard with Fourier techniques is that sharp
`edges in the filter will produce unwanted "ringing" in the spatial domain, as evi
` a digression to discuss
`denced by Fig. 2.5. Unfortunately, it would be too much of
`techniques of filter design. Instead, the interested reader should refer to the refer
`ences cited earlier.
`
`3.2.1 Template Matching
`
`Template matching is a simple filtering method of detecting a particular feature in
`an image. Provided that the appearance of this feature in the image is known accu
`
`Sec. 3.2 Filtering the Image
`
`*5
`
`IPR2021-00921
`Apple EX1015 Page 83
`
`
`
`(a)
`
`(b)
`
`Fig. 3.2 Effects of high frequency filtering, (a) Original image, (b) Filtered image.
`
`rately, one can try to detect it with an operator called a
` template. This template is, in
`effect, a subimage that looks just like the image of the object. A similarity measure
`is computed which reflects how well the image data match the template for each
`possible template location. The point of maximal match can be selected as the loca
`tion of the feature. Figure 3.3 shows an industrial image and
` a relevant template.
`Correlation
`One standard similarity measure between a function
`the Euclidean distance diy) squared, given by
`diy)2 = y £[fix)tixy)] 2
`
` fix) and a template tix) is
`
`(3.1)
`
`X
`
`M
`N
`By ]T we mean }£ 2£ , for some M, N which define the size of the template ex
`x
`x=—My——N
`tent. If the image at point y is an exact match, then diy)
`Expanding the expression for d 2, we can see that
`d2iy) = E [ / 2 ( x) 2/(x)Kx y) + t 2ix y)]
`
` = 0; otherwise, diy)>0.
`
`(3.2)
`
`Notice that J) t 2ix — y) is a constant term and can be neglected. When
`
` £ / 2 ( x) is
`
`X
`
`X
`
`X
`
`approximately constant it too can be discounted, leaving what is called the cross
`correlation between /and t.
`
`Rfliy) = y Lfix)tixy)
`
`X
`
`(3.3)
`
`This is maximized when the portion of the image "under"
`
` t is identical to t.
`
`Ch. 3 Early Processing
`
`IPR2021-00921
`Apple EX1015 Page 84
`
`
`
`Template
`
`Industrial Image
`
`Fig. 3.3 An industrial image and template for a hexagonal nut.
`
`One may visualize the templatematching calculations by imagining the tem
`plate being shifted across the image to different offsets; then the superimposed
` multiplied together, and the products are
` added. The result
`values at this offset are
`ing sum of products forms an entry in the "correlation array" whose coordinates
`are the offsets attained by the source template.
`If the template is allowed to take all offsets with respect to the image such that
`some overlap takes place, the correlation array is larger than either the template or
`the
`image. An n x n
`image with an m x m
`template yields an
`(n + m — lxn + m — \) correlation array. If the template is not allowed to
` — m + 1 x n — m + 1); for
`shift off the image, the correlation array is (n
`m < n. Another form of correlation results from computing the offsets modulo
`the size of the image; in other words, the template "wraps around" the image. Be
`ing shifted off to the right, its right portion reappears on the left of the image. This
`sort of correlation is called periodic correlation, and those with no such wraparound
`properties are called aperiodic. We shall be concerned exclusively with aperiodic
`correlation. One can always modify the input to a periodic correlation algorithm by
`padding the outside with zeros so that the output is the aperiodic correlation.
`Figure 3.4 provides an example of (aperiodic) "shift, add, multiply" tem
`plate matching. This figure illustrates some difficulties with the simple correlation
`measure of similarity. Many of the advantages and disadvantages of this measure
`stem from the fact that it is linear. The advantages of this simplicity have mainly to
`do with the existence of algorithms for performing the calculation efficiently (in
`transform domain) for the entire set of offsets. The disadvantages have to do with
`
` a ■
`
`Template
`
`Image
`
`Correlation
`
`1 1 1
`1 1 1
`1 1 1
`
`1 10 0 0
`1 1 10 0
`10 10 0
`0 0 0 0 0
`0 0 0 0 8
`
`7 4 2 x x
`5 3 2 x x
`2 1 9 x x
`X X X X X
`
`X X X X X
`x — undefined
`
`(a) A simple template, (b) An image
`Fig. 3.4
`with noise, (c) The aperiodic correlation array of
`the template and image. Ideally peaks in the
`correlation indicate positions of good match. Here
`the correlation is only calculated for offsets that
`leave the template entirely within the image. The
`correct peak is the upper left one at 0,0 offset. The
`"false alarm" at offset 2, 2 is caused by the bright
`"noise point" in the lower right of the image.
`
`Sec. 3.2 Filtering the Image
`
`67
`
`IPR2021-00921
`Apple EX1015 Page 85
`
`
`
`the fact that the metric is sensitive to properties of the image that may vary with
`the offset, such as its average brightness. Slight changes in the shape of the object,
`its size, orientation, or intensity values can also disturb the match.
`Nonetheless, the idea of template matching is important, particularly if Eq.
`(3.3) is viewed as a
` filtering operation instead of an algorithm that does all the work
`of object detection. With this viewpoint one chooses one or more templates
`(filters) that transform the image so that certain features of an object are more
`readily apparent. These templates generally highlight subparts of the objects. One
`such class of templates is edge templates (discussed in detail in Section 3.3).
`We showed in Section 2.2.4 that convolution and multiplication are Fourier
`transform pairs. Now note that the correlation operation in (3.3) is essentially the
`same as a convolution with a function t'(x) =
` t(—x). Thus in a mathematical
`sense cross correlation and convolution are equivalent. Consequently, if the size of
`the template is sufficiently large, it is cheaper to perform the template matching
`operation in the spatial frequency domain, by the same transform techniques as for
`filtering.
`Normalized Correlation
` Eq. (3.3) was that the image en
`A crucial assumption in the development of
`ergy covered by the matching template at any offset was constant; this leads to a
`linear correlation matching technique. This assumption is approximately correct if
`the average image intensity varies slowly compared to the template size, but a
`bright spot in the image can heavily influence the correlation by affecting the sum
`of products violently in a small area (Fig. 3.4). Even if the image is well behaved,
`the range of values of the metric can vary with the size of the matching template.
`Are there ways of normalizing the correlation metric to make it insensitive to these
`variations?
`There is a wellknown treatment of the normalized correlation operation. It
`has been used for a variety of tasks involving registration and stereopsis of images
`[Quam and Hannah 1974]. Let us say that two input images are being matched to
`find the best offset that aligns them.
`2 (possi
`2 is the patch of/
`Let/i(x) and/ 2(x) be the images to be matched. q
`bly all of it) that is to be matched with a similarsized patch oif\.q\'\S
`the patch of
`/] that is covered by q
`2 when q 2 is offset by y.
`Let EQ be the expectation operator. Then
`2\h
`(3.4)
`(E{q,))
`[E(q})
`o( q])=
`2))2\h
`(3.5)
`o(q 2)= [E(qi) (E(q
`{ and q 2. (For notational con
`give the standard deviations of points in patches q
`venience, we have dropped the spatial arguments of q\ and q
`2.) Finally, the nor
`malized correlation is
`
`E(qxq2) E(q l)E(q2)
`,
`\ r
`\
`crKqi)o{q 2)
`and E(q\q 2) is the expected value of the product of intensities of points that are
`superimposed by the translation by y.
`
`N(y) =
`
`0.6)
`
`Ch. 3 Early Processing
`
`IPR2021-00921
`Apple EX1015 Page 86
`
`
`
`The normalized correlation metric is less dependent on the local properties of
`the reference and input images than is the unnormalized correlation, but it is sensi
`tive to the signaltonoise content of the images. High uncorrelated noise in the
`two images, or the image and the reference, decreases the value of the correlation.
`As a result, one should exercise some care in interpreting the metric. If the noise
`properties of the image are known, one indication of reliability is given by the
`"(signal + noise)tonoise" ratio. For the normalized correlation to be useful, the
`standard deviation of the patches of images to be matched (i.e., of the areas of im
`age including noise) should be significantly greater than that of the noise. Then a
`correlation value may be considered significant if it is approximately equal to the
`theoretically expected one. Consider uncorrelated noise of identical standard devi
`ation, in a patch of true value fix, y). Let the noise component of the image be
`n (x, y). Then the theoretical maximum correlation is
`.2
`1 gSaL
`a2(f + n)
`In matching an idealized, noisefree reference pattern, the best expected
`
`value of the cross correlation is f^TT
`
`(3.7)
`
`(38)
`
`If the noise and signal characteristics of the data are known, the patch size
`may be optimized by using that information and the simple statistical arguments
`above. However, such considerations leave out the effects of systematic, nonsta
`tistical error (such as imaging distortions, rotations, and scale differences between
`images). These systematic errors grow with patch size, and may swamp the statisti
`cal advantages of large patches. In the worst case, they may vitiate the advantages
`of the correlation process altogether.
`Since correlation is expensive, it is advantageous to ensure that there is
`enough information in the patches chosen for correlation before the operation is
`done. One way to do this is to apply a cheap "interest operator" before the rela
`tively expensive correlation. The idea here is to make sure that the image varies
`enough to give a usable correlation image. If the image is of uniform intensity,
`even its correlation with itself (autocorrelation) is flat everywhere, and no infor
`mation about where the image is registered with itself is derivable. The "interest
`operator" is a way of finding areas of image with high variance. In fact, a common
`and useful interest measure is exactly the (di