Texts in Computer Science

Editors
David Gries
Fred B. Schneider

For further volumes:
www.springer.com/series/3191
Richard Szeliski

Computer Vision

Algorithms and Applications

Springer
Dr. Richard Szeliski
Microsoft Research
One Microsoft Way
Redmond, WA 98052-6399
USA
szeliski@microsoft.com

Series Editors

David Gries
Department of Computer Science
Upson Hall
Cornell University
Ithaca, NY 14853-7501, USA

Fred B. Schneider
Department of Computer Science
Upson Hall
Cornell University
Ithaca, NY 14853-7501, USA
ISSN 1868-0941          e-ISSN 1868-095X
ISBN 978-1-84882-934-3  e-ISBN 978-1-84882-935-0
DOI 10.1007/978-1-84882-935-0
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2010936817

© Springer-Verlag London Limited 2011

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
This model is equivalent to first projecting the world points onto a local fronto-parallel image plane and then scaling this image using regular perspective projection. The scaling can be the same for all parts of the scene (Figure 2.7b) or it can be different for objects that are being modeled independently (Figure 2.7c). More importantly, the scaling can vary from frame to frame when estimating structure from motion, which can better model the scale change that occurs as an object approaches the camera.

Scaled orthography is a popular model for reconstructing the 3D shape of objects far away from the camera, since it greatly simplifies certain computations.
easier to express exact rotations. When the angle is in radians, the derivatives of R with respect to ω can easily be computed (2.36).

Quaternions, on the other hand, are better if you want to keep track of a smoothly moving camera, since there are no discontinuities in the representation. It is also easier to interpolate between rotations and to chain rigid transformations (Murray, Li, and Sastry 1994; Bregler and Malik 1998).

My usual preference is to use quaternions, but to update their estimates using an incremental rotation, as described in Section 6.2.2.
2.1.5 3D to 2D projections
Now that we know how to represent 2D and 3D geometric primitives and how to transform them spatially, we need to specify how 3D primitives are projected onto the image plane. We can do this using a linear 3D to 2D projection matrix. The simplest model is orthography, which requires no division to get the final (inhomogeneous) result. The more commonly used model is perspective, since this more accurately models the behavior of real cameras.
Orthography and para-perspective
An orthographic projection simply drops the z component of the three-dimensional coordinate p to obtain the 2D point x. (In this section, we use p to denote 3D points and x to denote 2D points.) This can be written as

$$x = \begin{bmatrix} I_{2\times 2} & 0 \end{bmatrix} p.$$
If we are using homogeneous (projective) coordinates, we can write

$$\tilde{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tilde{p},$$
i.e., we drop the z component but keep the w component. Orthography is an approximate model for long focal length (telephoto) lenses and objects whose depth is shallow relative to their distance to the camera (Sawhney and Hanson 1991). It is exact only for telecentric lenses (Baker and Nayar 1999, 2001).

In practice, world coordinates (which may measure dimensions in meters) need to be scaled to fit onto an image sensor (physically measured in millimeters, but ultimately measured in pixels). For this reason, scaled orthography is actually more commonly used,

$$x = \begin{bmatrix} s I_{2\times 2} & 0 \end{bmatrix} p.$$
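As a concrete (if minimal) illustration of these two models, the following sketch, which is not from the book and uses arbitrary made-up point data, applies plain and scaled orthographic projection with NumPy:

```python
import numpy as np

def scaled_orthographic(p, s=1.0):
    """Project 3D points p (N x 3) to 2D via x = [s*I_2x2 | 0] p.

    s = 1 gives plain orthography; s != 1 mimics the image-sensor scaling
    discussed above.
    """
    P = np.hstack([s * np.eye(2), np.zeros((2, 1))])  # 2 x 3 projection matrix
    return (P @ p.T).T                                # N x 2 image points

# Points far from the camera with shallow relative depth, where the
# orthographic approximation is reasonable (values are made up).
pts = np.array([[0.1, 0.2, 50.0],
                [0.3, -0.1, 50.5]])
print(scaled_orthographic(pts, s=0.02))
```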
Figure 2.7 Commonly used projection models: (a) 3D view of world, (b) orthography, (c) scaled orthography, (d) para-perspective, (e) perspective, (f) object-centered. Each diagram shows a top-down view of the projection. Note how parallel lines on the ground plane and box sides remain parallel in the non-perspective projections.
For example, pose (camera orientation) can be estimated using simple least squares (Section 6.2.1). Under orthography, structure and motion can simultaneously be estimated using factorization (singular value decomposition), as discussed in Section 7.3 (Tomasi and Kanade 1992).

A closely related projection model is para-perspective (Aloimonos 1990; Poelman and Kanade 1997). In this model, object points are again first projected onto a local reference plane parallel to the image plane. However, rather than being projected orthogonally to this plane, they are projected parallel to the line of sight to the object center (Figure 2.7d). This is followed by the usual projection onto the final image plane, which again amounts to a scaling. The combination of these two projections is therefore affine and can be written as
$$\tilde{x} = \begin{bmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tilde{p}.$$
Note how parallel lines in 3D remain parallel after projection in Figure 2.7b–d. Para-perspective provides a more accurate projection model than scaled orthography, without incurring the added complexity of per-pixel perspective division, which invalidates traditional factorization methods (Poelman and Kanade 1997).
Perspective
The most commonly used projection in computer graphics and computer vision is true 3D perspective (Figure 2.7e). Here, points are projected onto the image plane by dividing them by their z component. Using inhomogeneous coordinates, this can be written as

$$\bar{x} = \mathcal{P}_z(p) = \begin{bmatrix} x/z \\ y/z \\ 1 \end{bmatrix}.$$
In homogeneous coordinates, the projection has a simple linear form,

$$\tilde{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tilde{p},$$
i.e., we drop the w component of p. Thus, after projection, it is not possible to recover the distance of the 3D point from the image, which makes sense for a 2D imaging sensor.
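To make the perspective divide concrete, here is a minimal sketch (my own illustration, not code from the book) that applies the homogeneous projection above and then normalizes by the last coordinate:

```python
import numpy as np

def perspective_project(p):
    """Project 3D points p (N x 3) using true perspective, x = (x/z, y/z)."""
    P = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.],
                  [0., 0., 1., 0.]])            # drops the w component
    p_h = np.hstack([p, np.ones((len(p), 1))])  # homogeneous 4-vectors
    x_h = (P @ p_h.T).T                         # N x 3 homogeneous image points
    return x_h[:, :2] / x_h[:, 2:3]             # divide by z

pts = np.array([[1.0, 2.0, 4.0], [0.5, -0.5, 2.0]])
print(perspective_project(pts))   # -> [[0.25, 0.5], [0.25, -0.25]]
```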
A form often seen in computer graphics systems is a two-step projection that first projects 3D coordinates into normalized device coordinates in the range (x, y, z) ∈ [−1, 1] × [−1, 1] × [0, 1], and then rescales these coordinates to integer pixel coordinates using a viewport transformation (Watt 1995; OpenGL-ARB 1997). The (initial) perspective projection is then represented using a 4 × 4 matrix
$$\tilde{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \dfrac{z_{\mathrm{far}}}{z_{\mathrm{range}}} & -\dfrac{z_{\mathrm{near}}\,z_{\mathrm{far}}}{z_{\mathrm{range}}} \\ 0 & 0 & 1 & 0 \end{bmatrix} \tilde{p},$$
where z_near and z_far are the near and far z clipping planes and z_range = z_far − z_near.
Figure 2.8 Projection of a 3D camera-centered point p_c onto the sensor plane at location p. O_c is the camera center (nodal point), c_s is the 3D origin of the sensor plane coordinate system, and s_x and s_y are the pixel spacings.
Note that the first two rows are actually scaled by the focal length and the aspect ratio so that visible rays are mapped to (x, y) ∈ [−1, 1]². The reason for keeping the third row, rather than dropping it, is that visibility operations, such as z-buffering, require a depth for every graphical element that is being rendered.

If we set z_near = 1, z_far → ∞, and switch the sign of the third row, the third element of the normalized screen vector becomes the inverse depth, i.e., the disparity (Okutomi and Kanade 1993). This can be quite convenient in many cases since, for cameras moving around outdoors, the inverse depth to the camera is often a more well-conditioned parameterization than direct 3D distance.
While a regular 2D image sensor has no way of measuring distance to a surface point, range sensors (Section 12.2) and stereo matching algorithms (Chapter 11) can compute such values. It is then convenient to be able to map from a sensor-based depth or disparity value d directly back to a 3D location using the inverse of a 4 × 4 matrix (Section 2.1.5). We can do this if we represent perspective projection using a full-rank 4 × 4 matrix, as in (2.64).
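This round trip can be sketched in a few lines. The example below is my own illustration under the sign convention of the matrix reconstructed above (readers should check the convention against their own graphics API); it projects a point with the full-rank 4 × 4 matrix and then inverts the matrix to recover the 3D location from (x, y, d):

```python
import numpy as np

def ndc_perspective(z_near, z_far):
    """Full-rank 4x4 perspective matrix mapping z in [z_near, z_far] to d in [0, 1]."""
    z_range = z_far - z_near
    return np.array([[1., 0., 0., 0.],
                     [0., 1., 0., 0.],
                     [0., 0., z_far / z_range, -z_near * z_far / z_range],
                     [0., 0., 1., 0.]])

P = ndc_perspective(z_near=1.0, z_far=100.0)
p = np.array([2.0, -1.0, 10.0, 1.0])        # homogeneous 3D point (made-up values)

x = P @ p
x = x / x[3]                                # normalize by the fourth element (z)
print(x)                                    # (x/z, y/z, d, 1)

p_back = np.linalg.inv(P) @ x               # map (x, y, d, 1) back to 3D
p_back = p_back / p_back[3]
print(p_back)                               # recovers the original point
```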
Camera intrinsics
Once we have projected a 3D point through an ideal pinhole using a projection matrix, we must still transform the resulting coordinates according to the pixel sensor spacing and the relative position of the sensor plane to the origin. Figure 2.8 shows an illustration of the geometry involved. In this section, we first present a mapping from 2D pixel coordinates to 3D rays using a sensor homography M_s, since this is easier to explain in terms of physically measurable quantities. We then relate these quantities to the more commonly used camera intrinsic matrix K, which is used to map 3D camera-centered points p_c to 2D pixel coordinates x̃_s.
Image sensors return pixel values indexed by integer pixel coordinates (x_s, y_s), often with the coordinates starting at the upper-left corner of the image and moving down and to the right. (This convention is not obeyed by all imaging libraries, but the adjustment for other coordinate systems is straightforward.) To map pixel centers to 3D coordinates, we first scale the (x_s, y_s) values by the pixel spacings (s_x, s_y) (sometimes expressed in microns for solid-state sensors) and then describe the orientation of the sensor array relative to the camera projection center O_c with an origin c_s and a 3D rotation R_s (Figure 2.8).
The combined 2D to 3D projection can then be written as

$$p = \begin{bmatrix} R_s & c_s \end{bmatrix} \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ 1 \end{bmatrix} = M_s \bar{x}_s.$$
The first two columns of the 3 × 3 matrix M_s are the 3D vectors corresponding to unit steps in the image pixel array along the x_s and y_s directions, while the third column is the 3D image array origin c_s.

The matrix M_s is parameterized by eight unknowns: the three parameters describing the rotation R_s, the three parameters describing the translation c_s, and the two scale factors (s_x, s_y). Note that we ignore here the possibility of skew between the two axes on the image plane, since solid-state manufacturing techniques render this negligible. In practice, unless we have accurate external knowledge of the sensor spacing or sensor orientation, there are only seven degrees of freedom, since the distance of the sensor from the origin cannot be teased apart from the sensor spacing, based on external image measurement alone.

However, estimating a camera model M_s with the required seven degrees of freedom (i.e., where the first two columns are orthogonal after an appropriate re-scaling) is impractical, so most practitioners assume a general 3 × 3 homogeneous matrix form.
The relationship between the 3D pixel center p and the 3D camera-centered point p_c is given by an unknown scaling s, p = s p_c. We can therefore write the complete projection between p_c and a homogeneous version of the pixel address x̃_s as

$$\tilde{x}_s = \alpha M_s^{-1} p_c = K p_c.$$
The 3 × 3 matrix K is called the calibration matrix and describes the camera intrinsics (as opposed to the camera's orientation in space, which are called the extrinsics).

From the above discussion, we see that K has seven degrees of freedom in theory and eight degrees of freedom (the full dimensionality of a 3 × 3 homogeneous matrix) in practice. Why, then, do most textbooks on 3D computer vision and multi-view geometry (Faugeras 1993; Hartley and Zisserman 2004; Faugeras and Luong 2001) treat K as an upper-triangular matrix with five degrees of freedom?
While this is usually not made explicit in these books, it is because we cannot recover the full K matrix based on external measurement alone. When calibrating a camera (Chapter 6) based on external 3D points or other measurements (Tsai 1987), we end up estimating the intrinsic (K) and extrinsic (R, t) camera parameters simultaneously using a series of measurements,

$$\tilde{x}_s = K \begin{bmatrix} R & t \end{bmatrix} p_w = P p_w,$$

where p_w are known 3D world coordinates and

$$P = K \begin{bmatrix} R & t \end{bmatrix}$$
is known as the camera matrix. Inspecting this equation, we see that we can post-multiply K by R_1 and pre-multiply [R | t] by R_1^T, and still end up with a valid calibration. Thus, it is impossible based on image measurements alone to know the true orientation of the sensor and the true camera intrinsics.
Figure 2.9 Simplified camera intrinsics showing the focal length f and the optical center (c_x, c_y). The image width and height are W and H.
The choice of an upper-triangular form for K seems to be conventional. Given a full 3 × 4 camera matrix P = K[R | t], we can compute an upper-triangular K matrix using QR factorization (Golub and Van Loan 1996). (Note the unfortunate clash of terminologies: in matrix algebra textbooks, R represents an upper-triangular (right of the diagonal) matrix; in computer vision, R is an orthogonal rotation.)

There are several ways to write the upper-triangular form of K. One possibility is

$$K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad (2.57)$$
which uses independent focal lengths f_x and f_y for the sensor x and y dimensions. The entry s encodes any possible skew between the sensor axes due to the sensor not being mounted perpendicular to the optical axis and (c_x, c_y) denotes the optical center expressed in pixel coordinates. Another possibility is

$$K = \begin{bmatrix} f & s & c_x \\ 0 & a f & c_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad (2.58)$$
where the aspect ratio a has been made explicit and a common focal length f is used.

In practice, for many applications an even simpler form can be obtained by setting a = 1 and s = 0,

$$K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix}. \qquad (2.59)$$
Often, setting the origin at roughly the center of the image, e.g., (c_x, c_y) = (W/2, H/2), where W and H are the image width and height, can result in a perfectly usable camera model with a single unknown, i.e., the focal length f.
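As a small worked example (not from the book; the focal length and image size below are arbitrary), this is how such a single-parameter calibration matrix might be assembled and used to map camera-centered points to pixels:

```python
import numpy as np

def simple_K(f, W, H):
    """Calibration matrix (2.59) with a = 1, s = 0, optical center at the image center."""
    return np.array([[f, 0., W / 2.],
                     [0., f, H / 2.],
                     [0., 0., 1.]])

def project(K, p_c):
    """Map 3D camera-centered points p_c (N x 3) to 2D pixel coordinates."""
    x_h = (K @ p_c.T).T                 # homogeneous pixel coordinates
    return x_h[:, :2] / x_h[:, 2:3]     # perspective divide

K = simple_K(f=1000.0, W=1280, H=720)   # assumed values, in pixels
p_c = np.array([[0.1, -0.05, 2.0]])     # a point 2 m in front of the camera
print(project(K, p_c))                  # -> [[690., 335.]]
```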
Figure 2.9 shows how these quantities can be visualized as part of a simplified imaging model. Note that now we have placed the image plane in front of the nodal point (projection center of the lens). The sense of the y axis has also been flipped to get a coordinate system compatible with the way that most imaging libraries treat the vertical (row) coordinate. Certain graphics libraries, such as Direct3D, use a left-handed coordinate system, which can lead to some confusion.
Figure 2.10 Central projection, showing the relationship between the 3D and 2D coordinates, p and x, as well as the relationship between the focal length f, image width W, and the field of view θ.
A note on focal lengths
The issue of how to express focal lengths is one that often causes confusion in implementing computer vision algorithms and discussing their results. This is because the focal length depends on the units used to measure pixels.

If we number pixel coordinates using integer values, say [0, W) × [0, H), the focal length f and camera center (c_x, c_y) in (2.59) can be expressed as pixel values. How do these quantities relate to the more familiar focal lengths used by photographers?
Figure 2.10 illustrates the relationship between the focal length f, the sensor width W, and the field of view θ, which obey the formula

$$\tan\frac{\theta}{2} = \frac{W/2}{f} \quad\text{or}\quad f = \frac{W}{2}\left[\tan\frac{\theta}{2}\right]^{-1}. \qquad (2.60)$$
For conventional film cameras, W = 35 mm, and hence f is also expressed in millimeters. Since we work with digital images, it is more convenient to express W in pixels so that the focal length f can be used directly in the calibration matrix K as in (2.59).

Another possibility is to scale the pixel coordinates so that they go from [−1, 1) along the longer image dimension and [−a⁻¹, a⁻¹) along the shorter axis, where a ≥ 1 is the image aspect ratio (as opposed to the sensor cell aspect ratio introduced earlier). This can be accomplished using modified normalized device coordinates,
$$x'_s = (2x_s - W)/S \quad\text{and}\quad y'_s = (2y_s - H)/S, \quad\text{where } S = \max(W, H).$$
This has the advantage that the focal length f and optical center (c_x, c_y) become independent of the image resolution, which can be useful when using multi-resolution image-processing algorithms, such as image pyramids (Section 3.5).² The use of S instead of W also makes the focal length the same for landscape (horizontal) and portrait (vertical) pictures, as is the case in 35mm photography. (In some computer graphics textbooks and systems, normalized device coordinates go from [−1, 1] × [−1, 1], which requires the use of two different focal lengths to describe the camera intrinsics (Watt 1995; OpenGL-ARB 1997).) Setting S = W = 2 in (2.60), we obtain the simpler (unitless) relationship

$$f^{-1} = \tan\frac{\theta}{2}. \qquad (2.62)$$

² To make the conversion truly accurate after a downsampling step in a pyramid, floating point values of W and H would have to be maintained since they can become non-integral if they are ever odd at a larger resolution in the pyramid.
The conversion between the various focal length representations is straightforward, e.g., to go from a unitless f to one expressed in pixels, multiply by W/2, while to convert from an f expressed in pixels to the equivalent 35mm focal length, multiply by 35/W.
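These conversions are easy to get wrong in practice, so here is a short sketch (my own, with assumed numbers) going from a horizontal field of view to a focal length in pixels, then to the unitless and 35mm-equivalent values:

```python
import math

def fov_to_focal_pixels(theta_deg, W):
    """f = (W/2) / tan(theta/2), with W the image width in pixels (2.60)."""
    return (W / 2.0) / math.tan(math.radians(theta_deg) / 2.0)

W = 1280                                   # assumed image width in pixels
f_pix = fov_to_focal_pixels(60.0, W)       # 60 degree horizontal field of view
f_unitless = f_pix / (W / 2.0)             # unitless f: divide pixels by W/2
f_35mm = f_pix * 35.0 / W                  # 35mm-equivalent focal length

print(round(f_pix, 1), round(f_unitless, 3), round(f_35mm, 1))
```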
Camera matrix
Now that we have shown how to parameterize the calibration matrix K, we can put the camera intrinsics and extrinsics together to obtain a single 3 × 4 camera matrix

$$P = K \begin{bmatrix} R & t \end{bmatrix}. \qquad (2.63)$$
It is sometimes preferable to use an invertible 4 × 4 matrix, which can be obtained by not dropping the last row in the P matrix,

$$\tilde{P} = \begin{bmatrix} K & 0 \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} = \tilde{K} E, \qquad (2.64)$$
where E is a 3D rigid-body (Euclidean) transformation and K̃ is the full-rank calibration matrix. The 4 × 4 camera matrix P̃ can be used to map directly from 3D world coordinates p̄_w = (x_w, y_w, z_w, 1) to screen coordinates (plus disparity), x_s = (x_s, y_s, 1, d),

$$\bar{x}_s \sim \tilde{P} \bar{p}_w, \qquad (2.65)$$

where ~ indicates equality up to scale. Note that after multiplication by P̃, the vector is divided by the third element of the vector to obtain the normalized form x̄_s = (x_s, y_s, 1, d).
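The following sketch (my own composition, with an assumed calibration matrix and an arbitrary pose) builds the invertible 4 × 4 camera matrix of (2.64) and maps a world point to screen coordinates plus disparity, normalizing by the third element as just described:

```python
import numpy as np

def full_camera_matrix(K, R, t):
    """P~ = [K 0; 0^T 1] [R t; 0^T 1], the invertible 4x4 camera matrix (2.64)."""
    K4 = np.eye(4)
    K4[:3, :3] = K
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = t
    return K4 @ E

K = np.array([[1000., 0., 640.],
              [0., 1000., 360.],
              [0., 0., 1.]])               # assumed intrinsics
R, t = np.eye(3), np.array([0., 0., 2.])    # camera 2 m from the world origin

P = full_camera_matrix(K, R, t)
p_w = np.array([0.1, -0.05, 0.0, 1.0])      # homogeneous world point
x = P @ p_w
x = x / x[2]                                # normalize by the third element
print(x)                                    # (x_s, y_s, 1, d) with d = 1/z here
```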
Plane plus parallax (projective depth)
In general, when using the 4 × 4 matrix P̃, we have the freedom to remap the last row to whatever suits our purpose (rather than just being the "standard" interpretation of disparity as inverse depth). Let us re-write the last row of P̃ as p_3 = s_3 [n̂_0 | c_0], where ‖n̂_0‖ = 1. We then have the equation

$$d = \frac{s_3}{z}\left(\hat{n}_0 \cdot p_w + c_0\right), \qquad (2.66)$$

where z = p_2 · p̄_w = r_z · (p_w − c) is the distance of p_w from the camera center c (2.25) along the optical axis z (Figure 2.11). Thus, we can interpret d as the projective disparity or projective depth of a 3D scene point p_w from the reference plane n̂_0 · p_w + c_0 = 0 (Szeliski and Coughlan 1997; Szeliski and Golland 1999; Shade, Gortler, He et al. 1998; Baker, Szeliski, and Anandan 1998). (The projective depth is also sometimes called parallax in reconstruction algorithms that use the term plane plus parallax (Kumar, Anandan, and Hanna 1994; Sawhney 1994).) Setting n̂_0 = 0 and c_0 = 1, i.e., putting the reference plane at infinity, results in the more standard d = 1/z version of disparity (Okutomi and Kanade 1993).
Another way to see this is to invert the P̃ matrix so that we can map pixels plus disparity directly back to 3D points,

$$\bar{p}_w = \tilde{P}^{-1} \bar{x}_s. \qquad (2.67)$$
In general, we can choose P̃ to have whatever form is convenient, i.e., to sample space using an arbitrary projection.
Figure 2.11 Regular disparity (inverse depth) and projective depth (parallax from a reference plane).
This can come in particularly handy when setting up multi-view stereo reconstruction algorithms, since it allows us to sweep a series of planes (Section 11.1.2) through space with a variable (projective) sampling that best matches the sensed image motions (Collins 1996; Szeliski and Golland 1999; Saito and Kanade 1999).
Mapping from one camera to another
What happens when we take two images of a 3D scene from different camera positions or orientations (Figure 2.12a)? Using the full rank 4 × 4 camera matrix P̃ = K̃E from (2.64), we can write the projection from world to screen coordinates as

$$\tilde{x}_0 \sim \tilde{K}_0 E_0 p = \tilde{P}_0 p.$$
Assuming that we know the z-buffer or disparity value d_0 for a pixel in one image, we can compute the 3D point location p using

$$p \sim E_0^{-1} \tilde{K}_0^{-1} \tilde{x}_0$$

and then project it into another image yielding

$$\tilde{x}_1 \sim \tilde{K}_1 E_1 p = \tilde{K}_1 E_1 E_0^{-1} \tilde{K}_0^{-1} \tilde{x}_0 = \tilde{P}_1 \tilde{P}_0^{-1} \tilde{x}_0 = M_{10} \tilde{x}_0. \qquad (2.70)$$
Unfortunately, we do not usually have access to the depth coordinates of pixels in a regular photographic image. However, for a planar scene, as discussed above in (2.66), we can replace the last row of P_0 in (2.64) with a general plane equation, n̂_0 · p + c_0, that maps points on the plane to d_0 = 0 values (Figure 2.12b). Thus, if we set d_0 = 0, we can ignore the last column of M_10 in (2.70) and also its last row, since we do not care about the final z-buffer depth. The mapping equation (2.70) thus reduces to

$$\tilde{x}_1 \sim \tilde{H}_{10} \tilde{x}_0,$$

where H̃_10 is a general 3 × 3 homography matrix and x̃_1 and x̃_0 are now 2D homogeneous coordinates (i.e., 3-vectors) (Szeliski 1996). This justifies the use of the 8-parameter homography as a general alignment model for mosaics of planar scenes (Mann and Picard 1994; Szeliski 1996).
Figure 2.12 A point is projected into two images: (a) relationship between the 3D point coordinate (X, Y, Z, 1) and the 2D projected point (x, y, 1, d); (b) planar homography induced by points all lying on a common plane n̂_0 · p + c_0 = 0.
The other special case where we do not need to know depth to perform inter-camera mapping is when the camera is undergoing pure rotation (Section 9.1.3), i.e., when t_0 = t_1. In this case, we can write

$$\tilde{x}_1 \sim K_1 R_1 R_0^{-1} K_0^{-1} \tilde{x}_0 = K_1 R_{10} K_0^{-1} \tilde{x}_0, \qquad (2.72)$$
which again can be represented with a 3 × 3 homography. If we assume that the calibration matrices have known aspect ratios and centers of projection (2.59), this homography can be parameterized by the rotation amount and the two unknown focal lengths. This particular formulation is commonly used in image-stitching applications (Section 9.1.3).
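A short sketch of the pure-rotation mapping (2.72), using assumed shared intrinsics and a small rotation about the vertical axis; this is the kind of homography used when stitching a panorama from a rotating camera:

```python
import numpy as np

def rotation_homography(K1, R1, K0, R0):
    """H = K1 R1 R0^-1 K0^-1, mapping pixels in image 0 to image 1 under pure rotation.

    For a rotation matrix, R0^-1 = R0^T.
    """
    return K1 @ R1 @ R0.T @ np.linalg.inv(K0)

def rot_y(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), 0., np.sin(a)],
                     [0., 1., 0.],
                     [-np.sin(a), 0., np.cos(a)]])

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])                 # assumed (shared) intrinsics

H10 = rotation_homography(K, rot_y(5.0), K, np.eye(3))
x0 = np.array([320., 240., 1.])              # pixel at the optical center of image 0
x1 = H10 @ x0
print(x1[:2] / x1[2])                        # corresponding pixel in image 1
```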
Object-centered projection
When working with long focal length lenses, it often becomes difficult to reliably estimate the focal length from image measurements alone. This is because the focal length and the distance to the object are highly correlated and it becomes difficult to tease these two effects apart. For example, the change in scale of an object viewed through a zoom telephoto lens can either be due to a zoom change or a motion towards the user. (This effect was put to dramatic use in Alfred Hitchcock's film Vertigo, where the simultaneous change of zoom and camera motion produces a disquieting effect.)

This ambiguity becomes clearer if we write out the projection equation corresponding to the simple calibration matrix K (2.59),
$$x_s = f\,\frac{r_x \cdot p + t_x}{r_z \cdot p + t_z} + c_x, \qquad (2.73)$$

$$y_s = f\,\frac{r_y \cdot p + t_y}{r_z \cdot p + t_z} + c_y, \qquad (2.74)$$
where r_x, r_y, and r_z are the three rows of R. If the distance to the object center t_z ≫ ‖p‖ (the size of the object), the denominator is approximately t_z and the overall scale of the projected object depends on the ratio of f to t_z. It therefore becomes difficult to disentangle these two quantities.
To see this more clearly, let η_z = t_z⁻¹ and s = η_z f. We can then re-write the above equations as

$$x_s = s\,\frac{r_x \cdot p + t_x}{1 + \eta_z\,(r_z \cdot p)} + c_x,$$

$$y_s = s\,\frac{r_y \cdot p + t_y}{1 + \eta_z\,(r_z \cdot p)} + c_y$$
(Szeliski and Kang 1994; Pighin, Hecker, Lischinski et al. 1998). The scale of the projection s can be reliably estimated if we are looking at a known object (i.e., the 3D coordinates p are known). The inverse distance η_z is now mostly decoupled from the estimates of s and can be estimated from the amount of foreshortening as the object rotates. Furthermore, as the lens becomes longer, i.e., the projection model becomes orthographic, there is no need to replace a perspective imaging model with an orthographic one, since the same equation can be used, with η_z → 0 (as opposed to f and t_z both going to infinity). This allows us to form a natural link between orthographic reconstruction techniques such as factorization and their projective/perspective counterparts (Section 7.3).
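A sketch of this object-centered parameterization (my own illustration with made-up values) shows that the same code handles both the perspective case (η_z > 0) and the orthographic limit (η_z = 0):

```python
import numpy as np

def object_centered_project(p, R, t_xy, s, eta_z, c):
    """x_s = s (r_x.p + t_x)/(1 + eta_z r_z.p) + c_x, and similarly for y_s.

    s = eta_z * f is the projection scale, eta_z = 1/t_z the inverse distance;
    eta_z = 0 gives the orthographic limit.
    """
    rx, ry, rz = R                           # unpack the three rows of R
    denom = 1.0 + eta_z * (rz @ p)
    xs = s * (rx @ p + t_xy[0]) / denom + c[0]
    ys = s * (ry @ p + t_xy[1]) / denom + c[1]
    return np.array([xs, ys])

p = np.array([0.2, 0.1, 0.05])               # object point (made-up coordinates)
R = np.eye(3)                                # assumed pose
print(object_centered_project(p, R, t_xy=(0., 0.), s=500., eta_z=0.1, c=(320., 240.)))
print(object_centered_project(p, R, t_xy=(0., 0.), s=500., eta_z=0.0, c=(320., 240.)))  # orthographic limit
```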
2.1.6 Lens distortions
The above imaging models all assume that cameras obey a linear projection model where straight lines in the world result in straight lines in the image. (This follows as a natural consequence of linear matrix operations being applied to homogeneous coordinates.) Unfortunately, many wide-angle lenses have noticeable radial distortion, which manifests itself as a visible curvature in the projection of straight lines. (See Section 2.2.3 for a more detailed discussion of lens optics, including chromatic aberration.) Unless this distortion is taken into account, it becomes impossible to create highly accurate photorealistic reconstructions. For example, image mosaics constructed without taking radial distortion into account will often exhibit blurring due to the mis-registration of corresponding features before pixel blending (Chapter 9).
Fortunately, compensating for radial distortion is not that difficult in practice. For most lenses, a simple quartic model of distortion can produce good results. Let (x_c, y_c) be the pixel coordinates obtained after perspective division but before scaling by focal length f and shifting by the optical center (c_x, c_y), i.e.,

$$x_c = \frac{r_x \cdot p + t_x}{r_z \cdot p + t_z},$$

$$y_c = \frac{r_y \cdot p + t_y}{r_z \cdot p + t_z}.$$
The radial distortion model says that coordinates in the observed images are displaced away from (barrel distortion) or towards (pincushion distortion) the image center by an amount proportional to their radial distance (Figure 2.13a–b).³ The simplest radial distortion models use low-order polynomials, e.g.,

$$\hat{x}_c = x_c\left(1 + \kappa_1 r_c^2 + \kappa_2 r_c^4\right),$$

$$\hat{y}_c = y_c\left(1 + \kappa_1 r_c^2 + \kappa_2 r_c^4\right),$$

³ Anamorphic lenses, which are widely used in feature film production, do not follow this radial distortion model. Instead, they can be thought of, to a first approximation, as inducing different vertical and horizontal scalings, i.e., non-square pixels.
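A minimal sketch of applying this low-order radial distortion model to normalized image coordinates (my own illustration; the κ values below are made up):

```python
def radial_distort(xc, yc, k1, k2):
    """Apply x_hat = x*(1 + k1*r^2 + k2*r^4) to normalized coordinates (x_c, y_c)."""
    r2 = xc**2 + yc**2
    factor = 1.0 + k1 * r2 + k2 * r2**2
    return xc * factor, yc * factor

# Negative k1 shrinks radii (points move toward the center), positive k1
# grows them; k2 adds a quartic correction.
print(radial_distort(0.5, 0.25, k1=-0.2, k2=0.01))
```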
Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year-old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art?
Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos.

More than just a source of "recipes," this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques.
Topics and Features:

* Structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses

* Presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects

* Provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory

* Suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book
* Supplies supplementary course material for students at the associated website, http://szeliski.org/Book/
Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

Dr. Richard Szeliski has more than 25 years' experience in computer vision research, most notably at Digital Equipment Corporation and Microsoft Research. This text draws on that experience, as well as on computer vision courses he has taught at the University of Washington and Stanford.
ISBN 978-1-84882-934-3
springer.com