Apple Inc. v. Corephotonics
APPL-1010
Texts in Computer Science

Editors
David Gries
Fred B. Schneider

For further volumes:
www.springer.com/series/3191
Richard Szeliski

Computer Vision

Algorithms and Applications

Springer
Dr. Richard Szeliski
Microsoft Research
One Microsoft Way
98052-6399 Redmond
Washington
USA
szeliski@microsoft.com

Series Editors
David Gries
Department of Computer Science
Upson Hall
Cornell University
Ithaca, NY 14853-7501, USA

Fred B. Schneider
Department of Computer Science
Upson Hall
Cornell University
Ithaca, NY 14853-7501, USA

ISSN 1868-0941
e-ISSN 1868-095X
ISBN 978-1-84882-934-3
e-ISBN 978-1-84882-935-0
DOI 10.1007/978-1-84882-935-0
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2010936817

© Springer-Verlag London Limited 2011
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
easier to express exact rotations. When the angle is in radians, the derivatives of R with respect to ω can easily be computed (2.36).

Quaternions, on the other hand, are better if you want to keep track of a smoothly moving camera, since there are no discontinuities in the representation. It is also easier to interpolate between rotations and to chain rigid transformations (Murray, Li, and Sastry 1994; Bregler and Malik 1998).

My usual preference is to use quaternions, but to update their estimates using an incremental rotation, as described in Section 6.2.2.
2.1.5 3D to 2D projections

Now that we know how to represent 2D and 3D geometric primitives and how to transform them spatially, we need to specify how 3D primitives are projected onto the image plane. We can do this using a linear 3D to 2D projection matrix. The simplest model is orthography, which requires no division to get the final (inhomogeneous) result. The more commonly used model is perspective, since this more accurately models the behavior of real cameras.
Orthography and para-perspective

An orthographic projection simply drops the z component of the three-dimensional coordinate p to obtain the 2D point x. (In this section, we use p to denote 3D points and x to denote 2D points.) This can be written as

x = [ I_{2x2} | 0 ] p.
If we are using homogeneous (projective) coordinates, we can write

\tilde{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tilde{p},
i.e., we drop the z component but keep the w component. Orthography is an approximate model for long focal length (telephoto) lenses and objects whose depth is shallow relative to their distance to the camera (Sawhney and Hanson 1991). It is exact only for telecentric lenses (Baker and Nayar 1999, 2001).
In practice, world coordinates (which may measure dimensions in meters) need to be scaled to fit onto an image sensor (physically measured in millimeters, but ultimately measured in pixels). For this reason, scaled orthography is actually more commonly used,

x = [ s I_{2x2} | 0 ] p.

This model is equivalent to first projecting the world points onto a local fronto-parallel image plane and then scaling this image using regular perspective projection. The scaling can be the same for all parts of the scene (Figure 2.7b) or it can be different for objects that are being modeled independently (Figure 2.7c). More importantly, the scaling can vary from frame to frame when estimating structure from motion, which can better model the scale change that occurs as an object approaches the camera.
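As a concrete illustration of the two models just described, here is a minimal numpy sketch (the point values and scale factor are my own, not from the book) that applies plain and scaled orthography to a few 3D points:

import numpy as np

# Orthographic projection: drop the z component, x = [I_2x2 | 0] p.
P_ortho = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

# Scaled orthography: x = [s I_2x2 | 0] p, with a single global scale s.
s = 0.01                                  # hypothetical world-to-image scale factor
points_3d = np.array([[1.0, 2.0, 10.0],   # each row is a 3D point p
                      [3.0, -1.0, 12.0]])

x_ortho = points_3d @ P_ortho.T           # simply the (x, y) of each point
x_scaled = s * x_ortho                    # the same points, uniformly scaled
print(x_ortho, x_scaled)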
Scaled orthography is a popular model for reconstructing the 3D shape of objects far away from the camera, since it greatly simplifies certain computations. For example, pose (camera orientation) can be estimated using simple least squares (Section 6.2.1). Under orthography, structure and motion can simultaneously be estimated using factorization (singular value decomposition), as discussed in Section 7.3 (Tomasi and Kanade 1992).

Figure 2.7 Commonly used projection models: (a) 3D view of world, (b) orthography, (c) scaled orthography, (d) para-perspective, (e) perspective, (f) object-centered. Each diagram shows a top-down view of the projection. Note how parallel lines on the ground plane and box sides remain parallel in the non-perspective projections.
A closely related projection model is para-perspective (Aloimonos 1990; Poelman and Kanade 1997). In this model, object points are again first projected onto a local reference parallel to the image plane. However, rather than being projected orthogonally to this plane, they are projected parallel to the line of sight to the object center (Figure 2.7d). This is followed by the usual projection onto the final image plane, which again amounts to a scaling. The combination of these two projections is therefore affine and can be written as

\tilde{x} = \begin{bmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tilde{p}.

Note how parallel lines in 3D remain parallel after projection in Figure 2.7b-d. Para-perspective provides a more accurate projection model than scaled orthography, without incurring the added complexity of per-pixel perspective division, which invalidates traditional factorization methods (Poelman and Kanade 1997).
Perspective

The most commonly used projection in computer graphics and computer vision is true 3D perspective (Figure 2.7e). Here, points are projected onto the image plane by dividing them by their z component. Using inhomogeneous coordinates, this can be written as

\bar{x} = \mathcal{P}_z(p) = \begin{bmatrix} x/z \\ y/z \\ 1 \end{bmatrix}.

In homogeneous coordinates, the projection has a simple linear form,

\tilde{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tilde{p},

i.e., we drop the w component of p. Thus, after projection, it is not possible to recover the distance of the 3D point from the image, which makes sense for a 2D imaging sensor.
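The perspective model amounts to a division by z. A minimal numpy sketch of both the inhomogeneous and homogeneous forms above (the point value is my own illustration):

import numpy as np

p = np.array([2.0, 1.0, 4.0])             # a 3D camera-centered point (x, y, z)

# Inhomogeneous form: divide by the z component.
x_bar = np.array([p[0] / p[2], p[1] / p[2], 1.0])

# Homogeneous form: linear projection that drops the w component.
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
p_tilde = np.append(p, 1.0)               # homogeneous 3D point
x_tilde = P @ p_tilde                     # = (x, y, z)
x_bar2 = x_tilde / x_tilde[2]             # divide through by the last element
assert np.allclose(x_bar, x_bar2)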
A form often seen in computer graphics systems is a two-step projection that first projects 3D coordinates into normalized device coordinates in the range (x, y, z) ∈ [-1, 1] × [-1, 1] × [0, 1], and then rescales these coordinates to integer pixel coordinates using a viewport transformation (Watt 1995; OpenGL-ARB 1997). The (initial) perspective projection is then represented using a 4 × 4 matrix

\tilde{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -z_{far}/z_{range} & z_{near} z_{far}/z_{range} \\ 0 & 0 & 1 & 0 \end{bmatrix} \tilde{p},

where z_near and z_far are the near and far z clipping planes and z_range = z_far - z_near. Note that the first two rows are actually scaled by the focal length and the aspect ratio so that visible rays are mapped to (x, y, z) ∈ [-1, 1]². The reason for keeping the third row, rather than dropping it, is that visibility operations, such as z-buffering, require a depth for every graphical element that is being rendered.

Figure 2.8 Projection of a 3D camera-centered point p_c onto the sensor planes at location p. O_c is the camera center (nodal point), c_s is the 3D origin of the sensor plane coordinate system, and s_x and s_y are the pixel spacings.

If we set z_near = 1, z_far → ∞, and switch the sign of the third row, the third element of the normalized screen vector becomes the inverse depth, i.e., the disparity (Okutomi and Kanade 1993). This can be quite convenient in many cases since, for cameras moving around outdoors, the inverse depth to the camera is often a more well-conditioned parameterization than direct 3D distance.
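The graphics-style matrix above is easy to exercise numerically. The sketch below (with my own choice of z_near and z_far; it is an illustration, not code from the book) applies the matrix at several depths and performs the division by the fourth element; the resulting third component depends only on z, which is what z-buffering needs:

import numpy as np

z_near, z_far = 1.0, 100.0
z_range = z_far - z_near

P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, -z_far / z_range, z_near * z_far / z_range],
              [0.0, 0.0, 1.0, 0.0]])

for z in (z_near, 10.0, z_far):
    p = np.array([0.5, -0.2, z, 1.0])     # homogeneous 3D point at depth z
    x = P @ p
    x /= x[3]                             # divide by the fourth (w) element
    # x[2] is 0 at the near plane and varies monotonically with depth.
    print(z, x[2])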
While a regular 2D image sensor has no way of measuring distance to a surface point, range sensors (Section 12.2) and stereo matching algorithms (Chapter 11) can compute such values. It is then convenient to be able to map from a sensor-based depth or disparity value d directly back to a 3D location using the inverse of a 4 × 4 matrix (Section 2.1.5). We can do this if we represent perspective projection using a full-rank 4 × 4 matrix, as in (2.64).
Camera intrinsics

Once we have projected a 3D point through an ideal pinhole using a projection matrix, we must still transform the resulting coordinates according to the pixel sensor spacing and the relative position of the sensor plane to the origin. Figure 2.8 shows an illustration of the geometry involved. In this section, we first present a mapping from 2D pixel coordinates to 3D rays using a sensor homography M_s, since this is easier to explain in terms of physically measurable quantities. We then relate these quantities to the more commonly used camera intrinsic matrix K, which is used to map 3D camera-centered points p_c to 2D pixel coordinates x̃_s.

Image sensors return pixel values indexed by integer pixel coordinates (x_s, y_s), often with the coordinates starting at the upper-left corner of the image and moving down and to the right. (This convention is not obeyed by all imaging libraries, but the adjustment for other coordinate systems is straightforward.) To map pixel centers to 3D coordinates, we first scale the (x_s, y_s) values by the pixel spacings (s_x, s_y) (sometimes expressed in microns for solid-state sensors) and then describe the orientation of the sensor array relative to the camera projection center O_c with an origin c_s and a 3D rotation R_s (Figure 2.8).
The combined 2D to 3D projection can then be written as

p = \begin{bmatrix} R_s & c_s \end{bmatrix} \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ 1 \end{bmatrix} = M_s \bar{x}_s.

The first two columns of the 3 × 3 matrix M_s are the 3D vectors corresponding to unit steps in the image pixel array along the x_s and y_s directions, while the third column is the 3D image array origin c_s.
The matrix M_s is parameterized by eight unknowns: the three parameters describing the rotation R_s, the three parameters describing the translation c_s, and the two scale factors (s_x, s_y). Note that we ignore here the possibility of skew between the two axes on the image plane, since solid-state manufacturing techniques render this negligible. In practice, unless we have accurate external knowledge of the sensor spacing or sensor orientation, there are only seven degrees of freedom, since the distance of the sensor from the origin cannot be teased apart from the sensor spacing, based on external image measurement alone.

However, estimating a camera model M_s with the required seven degrees of freedom (i.e., where the first two columns are orthogonal after an appropriate re-scaling) is impractical, so most practitioners assume a general 3 × 3 homogeneous matrix form.
The relationship between the 3D pixel center p and the 3D camera-centered point p_c is given by an unknown scaling s, p = s p_c. We can therefore write the complete projection between p_c and a homogeneous version of the pixel address x̃_s as

\tilde{x}_s = \alpha M_s^{-1} p_c = K p_c.

The 3 × 3 matrix K is called the calibration matrix and describes the camera intrinsics (as opposed to the camera's orientation in space, which are called the extrinsics).

From the above discussion, we see that K has seven degrees of freedom in theory and eight degrees of freedom (the full dimensionality of a 3 × 3 homogeneous matrix) in practice. Why, then, do most textbooks on 3D computer vision and multi-view geometry (Faugeras 1993; Hartley and Zisserman 2004; Faugeras and Luong 2001) treat K as an upper-triangular matrix with five degrees of freedom?
While this is usually not made explicit in these books, it is because we cannot recover the full K matrix based on external measurement alone. When calibrating a camera (Chapter 6) based on external 3D points or other measurements (Tsai 1987), we end up estimating the intrinsic (K) and extrinsic (R, t) camera parameters simultaneously using a series of measurements,

\tilde{x}_s = K [ R \,|\, t ] p_w = P p_w,

where p_w are known 3D world coordinates and

P = K [ R \,|\, t ]

is known as the camera matrix. Inspecting this equation, we see that we can post-multiply K by R_1 and pre-multiply [R | t] by R_1^T, and still end up with a valid calibration. Thus, it is impossible based on image measurements alone to know the true orientation of the sensor and the true camera intrinsics.
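This ambiguity is easy to verify numerically. The following sketch (my own example values, not from the book) confirms that post-multiplying K by an arbitrary rotation R_1 and pre-multiplying [R | t] by R_1^T leaves the camera matrix P unchanged:

import numpy as np

def rot_z(a):
    """Rotation by angle a (radians) about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
Rt = np.hstack([rot_z(0.2), np.array([[0.3], [0.1], [2.0]])])   # extrinsics [R | t]
P = K @ Rt                                                      # camera matrix

R1 = rot_z(0.4)                         # an arbitrary extra rotation
K_alt = K @ R1                          # modified "intrinsics"
Rt_alt = R1.T @ Rt                      # compensating change to the extrinsics
assert np.allclose(P, K_alt @ Rt_alt)   # identical image measurements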
Figure 2.9 Simplified camera intrinsics showing the focal length f and the optical center (c_x, c_y). The image width and height are W and H.
The choice of an upper-triangular form for K seems to be conventional. Given a full 3 × 4 camera matrix P = K [R | t], we can compute an upper-triangular K matrix using QR factorization (Golub and Van Loan 1996). (Note the unfortunate clash of terminologies: In matrix algebra textbooks, R represents an upper-triangular (right of the diagonal) matrix; in computer vision, R is an orthogonal rotation.)

There are several ways to write the upper-triangular form of K. One possibility is

K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},    (2.57)

which uses independent focal lengths f_x and f_y for the sensor x and y dimensions. The entry s encodes any possible skew between the sensor axes due to the sensor not being mounted perpendicular to the optical axis and (c_x, c_y) denotes the optical center expressed in pixel coordinates. Another possibility is

K = \begin{bmatrix} f & s & c_x \\ 0 & a f & c_y \\ 0 & 0 & 1 \end{bmatrix},    (2.58)

where the aspect ratio a has been made explicit and a common focal length f is used.

In practice, for many applications an even simpler form can be obtained by setting a = 1 and s = 0,

K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix}.    (2.59)

Often, setting the origin at roughly the center of the image, e.g., (c_x, c_y) = (W/2, H/2), where W and H are the image width and height, can result in a perfectly usable camera model with a single unknown, i.e., the focal length f.
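For concreteness, here is a small numpy sketch of the single-focal-length model (2.59); the focal length, image size, and 3D point are my own example values. It maps a camera-centered point p_c to pixel coordinates by forming K p_c and dividing by the third element:

import numpy as np

W, H = 640, 480
f = 500.0                                   # focal length in pixels (assumed value)
cx, cy = W / 2.0, H / 2.0                   # optical center at the image center
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

p_c = np.array([0.2, -0.1, 2.0])            # a 3D point in the camera frame
x = K @ p_c
x /= x[2]                                   # perspective division
print(x[:2])                                # pixel coordinates (x_s, y_s)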
Figure 2.9 shows how these quantities can be visualized as part of a simplified imaging model. Note that now we have placed the image plane in front of the nodal point (projection center of the lens). The sense of the y axis has also been flipped to get a coordinate system compatible with the way that most imaging libraries treat the vertical (row) coordinate. Certain graphics libraries, such as Direct3D, use a left-handed coordinate system, which can lead to some confusion.
Figure 2.10 Central projection, showing the relationship between the 3D and 2D coordinates, p and x, as well as the relationship between the focal length f, image width W, and the field of view θ.
A note on focal lengths

The issue of how to express focal lengths is one that often causes confusion in implementing computer vision algorithms and discussing their results. This is because the focal length depends on the units used to measure pixels.

If we number pixel coordinates using integer values, say [0, W) × [0, H), the focal length f and camera center (c_x, c_y) in (2.59) can be expressed as pixel values. How do these quantities relate to the more familiar focal lengths used by photographers?

Figure 2.10 illustrates the relationship between the focal length f, the sensor width W, and the field of view θ, which obey the formula

\tan\frac{\theta}{2} = \frac{W}{2 f} \quad \text{or} \quad f = \frac{W}{2}\left[\tan\frac{\theta}{2}\right]^{-1}.    (2.60)

For conventional film cameras, W = 35 mm, and hence f is also expressed in millimeters. Since we work with digital images, it is more convenient to express W in pixels so that the focal length f can be used directly in the calibration matrix K as in (2.59).
Another possibility is to scale the pixel coordinates so that they go from [-1, 1) along the longer image dimension and [-a^{-1}, a^{-1}) along the shorter axis, where a ≥ 1 is the image aspect ratio (as opposed to the sensor cell aspect ratio introduced earlier). This can be accomplished using modified normalized device coordinates,

x'_s = (2 x_s - W)/S \quad\text{and}\quad y'_s = (2 y_s - H)/S, \quad\text{where}\quad S = \max(W, H).

This has the advantage that the focal length f and optical center (c_x, c_y) become independent of the image resolution, which can be useful when using multi-resolution image-processing algorithms, such as image pyramids (Section 3.5).2 The use of S instead of W also makes the focal length the same for landscape (horizontal) and portrait (vertical) pictures, as is the case in 35mm photography. (In some computer graphics textbooks and systems, normalized device coordinates go from [-1, 1] × [-1, 1], which requires the use of two different focal lengths to describe the camera intrinsics (Watt 1995; OpenGL-ARB 1997).) Setting S = W = 2 in (2.60), we obtain the simpler (unitless) relationship

f^{-1} = \tan\frac{\theta}{2}.

2 To make the conversion truly accurate after a downsampling step in a pyramid, floating point values of W and H would have to be maintained, since they can become non-integral if they are ever odd at a larger resolution in the pyramid.
The conversion between the various focal length representations is straightforward, e.g., to go from a unitless f to one expressed in pixels, multiply by W/2, while to convert from an f expressed in pixels to the equivalent 35mm focal length, multiply by 35/W.
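A small helper sketch of these conversions (the function names and sample numbers are my own; the formulas are the ones given above):

import numpy as np

def fov_to_focal_pixels(theta, W):
    """f (in pixels) from the field of view theta (radians) and image width W."""
    return (W / 2.0) / np.tan(theta / 2.0)

def focal_pixels_to_35mm(f_pixels, W):
    """Equivalent 35mm focal length from a focal length expressed in pixels."""
    return f_pixels * 35.0 / W

def unitless_to_pixels(f_unitless, W):
    """Convert a unitless focal length (S = W = 2 convention) to pixels."""
    return f_unitless * W / 2.0

theta = np.deg2rad(60.0)                    # hypothetical 60 degree field of view
f_pix = fov_to_focal_pixels(theta, W=1920)
print(f_pix, focal_pixels_to_35mm(f_pix, 1920))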
Camera matrix

Now that we have shown how to parameterize the calibration matrix K, we can put the camera intrinsics and extrinsics together to obtain a single 3 × 4 camera matrix

P = K [ R \,|\, t ].    (2.63)

It is sometimes preferable to use an invertible 4 × 4 matrix, which can be obtained by not dropping the last row in the P matrix,

\tilde{P} = \begin{bmatrix} K & 0 \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} = \tilde{K} E,    (2.64)

where E is a 3D rigid-body (Euclidean) transformation and K̃ is the full-rank calibration matrix. The 4 × 4 camera matrix P̃ can be used to map directly from 3D world coordinates p_w = (x_w, y_w, z_w, 1) to screen coordinates (plus disparity), x_s = (x_s, y_s, 1, d),

x_s \sim \tilde{P} p_w,    (2.65)

where ~ indicates equality up to scale. Note that after multiplication by P̃, the vector is divided by the third element of the vector to obtain the normalized form x_s = (x_s, y_s, 1, d).
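The full-rank form (2.64) is convenient precisely because it can be inverted. A schematic sketch (my own example intrinsics and pose) that builds P̃ = K̃E, maps a world point to (x_s, y_s, 1, d), and maps it back:

import numpy as np

def rot_y(a):
    """Rotation by angle a (radians) about the y axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
K_tilde = np.eye(4); K_tilde[:3, :3] = K       # full-rank calibration matrix
E = np.eye(4)                                  # rigid-body transform [R t; 0 1]
E[:3, :3] = rot_y(0.1)
E[:3, 3] = [0.2, 0.0, 1.0]

P_tilde = K_tilde @ E                          # 4x4 camera matrix
p_w = np.array([0.5, -0.3, 4.0, 1.0])          # homogeneous world point
x_s = P_tilde @ p_w
x_s /= x_s[2]                                  # normalize so the third element is 1
# x_s is now (x_s, y_s, 1, d); for this standard P~, d is the inverse depth.
p_back = np.linalg.inv(P_tilde) @ x_s
p_back /= p_back[3]                            # the inverse map recovers the 3D point
assert np.allclose(p_back, p_w)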
Plane plus parallax (projective depth)

In general, when using the 4 × 4 matrix P̃, we have the freedom to remap the last row to whatever suits our purpose (rather than just being the "standard" interpretation of disparity as inverse depth). Let us re-write the last row of P̃ as p_3 = s_3 [n̂_0 | c_0], where ‖n̂_0‖ = 1. We then have the equation

d = \frac{s_3}{z} (\hat{n}_0 \cdot p_w + c_0),    (2.66)

where z = p_2 · p̄_w = r_z · (p_w - c) is the distance of p_w from the camera center c (2.25) along the optical axis Z (Figure 2.11). Thus, we can interpret d as the projective disparity or projective depth of a 3D scene point p_w from the reference plane n̂_0 · p_w + c_0 = 0 (Szeliski and Coughlan 1997; Szeliski and Golland 1999; Shade, Gortler, He et al. 1998; Baker, Szeliski, and Anandan 1998). (The projective depth is also sometimes called parallax in reconstruction algorithms that use the term plane plus parallax (Kumar, Anandan, and Hanna 1994; Sawhney 1994).) Setting n̂_0 = 0 and c_0 = 1, i.e., putting the reference plane at infinity, results in the more standard d = 1/z version of disparity (Okutomi and Kanade 1993).

Another way to see this is to invert the P̃ matrix so that we can map pixels plus disparity directly back to 3D points,

\tilde{p}_w = \tilde{P}^{-1} x_s.    (2.67)

In general, we can choose P̃ to have whatever form is convenient, i.e., to sample space using an arbitrary projection. This can come in particularly handy when setting up multi-view stereo reconstruction algorithms, since it allows us to sweep a series of planes (Section 11.1.2) through space with a variable (projective) sampling that best matches the sensed image motions (Collins 1996; Szeliski and Golland 1999; Saito and Kanade 1999).

Figure 2.11 Regular disparity (inverse depth) and projective depth (parallax from a reference plane).
Mapping from one camera to another

What happens when we take two images of a 3D scene from different camera positions or orientations (Figure 2.12a)? Using the full-rank 4 × 4 camera matrix P̃ = K̃E from (2.64), we can write the projection from world to screen coordinates as

\tilde{x}_0 \sim \tilde{K}_0 E_0 p = \tilde{P}_0 p.

Assuming that we know the z-buffer or disparity value d_0 for a pixel in one image, we can compute the 3D point location p using

p \sim E_0^{-1} \tilde{K}_0^{-1} \tilde{x}_0

and then project it into another image yielding

\tilde{x}_1 \sim \tilde{K}_1 E_1 p = \tilde{K}_1 E_1 E_0^{-1} \tilde{K}_0^{-1} \tilde{x}_0 = \tilde{P}_1 \tilde{P}_0^{-1} \tilde{x}_0 = M_{10} \tilde{x}_0.    (2.70)

Unfortunately, we do not usually have access to the depth coordinates of pixels in a regular photographic image. However, for a planar scene, as discussed above in (2.66), we can replace the last row of P̃_0 in (2.64) with a general plane equation, n̂_0 · p + c_0, that maps points on the plane to d_0 = 0 values (Figure 2.12b). Thus, if we set d_0 = 0, we can ignore the last column of M_{10} in (2.70) and also its last row, since we do not care about the final z-buffer depth. The mapping equation (2.70) thus reduces to

\tilde{x}_1 \sim \tilde{H}_{10} \tilde{x}_0,

where H̃_{10} is a general 3 × 3 homography matrix and x̃_1 and x̃_0 are now 2D homogeneous coordinates (i.e., 3-vectors) (Szeliski 1996). This justifies the use of the 8-parameter homography as a general alignment model for mosaics of planar scenes (Mann and Picard 1994; Szeliski 1996).
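As a sanity check on the chained mapping M_10 = P̃_1 P̃_0^{-1}, the following self-contained sketch (my own example cameras) projects a world point into camera 0, warps the resulting (x, y, 1, d) vector into camera 1, and compares against projecting the point into camera 1 directly:

import numpy as np

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def camera(K, R, t):
    """Full-rank 4x4 camera matrix K~ E (assumed helper, not from the book)."""
    K_tilde = np.eye(4); K_tilde[:3, :3] = K
    E = np.eye(4); E[:3, :3] = R; E[:3, 3] = t
    return K_tilde @ E

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P0 = camera(K, np.eye(3), np.zeros(3))
P1 = camera(K, rot_y(0.05), np.array([0.1, 0.0, 0.0]))

p = np.array([0.4, 0.2, 5.0, 1.0])            # a 3D scene point
x0 = P0 @ p; x0 /= x0[2]                      # pixel plus disparity in camera 0
M10 = P1 @ np.linalg.inv(P0)                  # camera-to-camera mapping
x1 = M10 @ x0; x1 /= x1[2]                    # warp into camera 1
x1_direct = P1 @ p; x1_direct /= x1_direct[2]
assert np.allclose(x1, x1_direct)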
Figure 2.12 A point is projected into two images: (a) relationship between the 3D point coordinate (X, Y, Z, 1) and the 2D projected point (x, y, 1, d); (b) planar homography induced by points all lying on a common plane n̂_0 · p + c_0 = 0.
The other special case where we do not need to know depth to perform inter-camera mapping is when the camera is undergoing pure rotation (Section 9.1.3), i.e., when t_0 = t_1. In this case, we can write

\tilde{x}_1 \sim K_1 R_1 R_0^{-1} K_0^{-1} \tilde{x}_0 = K_1 R_{10} K_0^{-1} \tilde{x}_0,    (2.72)

which again can be represented with a 3 × 3 homography. If we assume that the calibration matrices have known aspect ratios and centers of projection (2.59), this homography can be parameterized by the rotation amount and the two unknown focal lengths. This particular formulation is commonly used in image-stitching applications (Section 9.1.3).
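A similar check for the pure-rotation case (2.72), again with my own example intrinsics and rotation: the 3 × 3 homography K_1 R_10 K_0^{-1} maps a pixel in image 0 to the pixel that the same viewing ray produces in image 1:

import numpy as np

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

K0 = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K1 = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
R10 = rot_y(0.1)                              # rotation between the two views

H10 = K1 @ R10 @ np.linalg.inv(K0)            # homography for pure rotation

p = np.array([0.3, -0.2, 4.0])                # a scene point in the camera 0 frame
x0 = K0 @ p; x0 /= x0[2]                      # its pixel in image 0
x1 = H10 @ x0; x1 /= x1[2]                    # warped into image 1
x1_direct = K1 @ (R10 @ p); x1_direct /= x1_direct[2]
assert np.allclose(x1, x1_direct)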
Object-centered projection

When working with long focal length lenses, it often becomes difficult to reliably estimate the focal length from image measurements alone. This is because the focal length and the distance to the object are highly correlated and it becomes difficult to tease these two effects apart. For example, the change in scale of an object viewed through a zoom telephoto lens can either be due to a zoom change or a motion towards the user. (This effect was put to dramatic use in Alfred Hitchcock's film Vertigo, where the simultaneous change of zoom and camera motion produces a disquieting effect.)

This ambiguity becomes clearer if we write out the projection equation corresponding to the simple calibration matrix K (2.59),

x_s = f \frac{r_x \cdot p + t_x}{r_z \cdot p + t_z} + c_x,    (2.73)

y_s = f \frac{r_y \cdot p + t_y}{r_z \cdot p + t_z} + c_y,    (2.74)

where r_x, r_y, and r_z are the three rows of R. If the distance to the object center t_z ≫ ‖p‖ (the size of the object), the denominator is approximately t_z and the overall scale of the projected object depends on the ratio of f to t_z. It therefore becomes difficult to disentangle these two quantities.
To see this more clearly, let η_z = t_z^{-1} and s = η_z f. We can then re-write the above equations as

x_s = s \frac{r_x \cdot p + t_x}{1 + \eta_z r_z \cdot p} + c_x,

y_s = s \frac{r_y \cdot p + t_y}{1 + \eta_z r_z \cdot p} + c_y

(Szeliski and Kang 1994; Pighin, Hecker, Lischinski et al. 1998). The scale of the projection s can be reliably estimated if we are looking at a known object (i.e., the 3D coordinates p are known). The inverse distance η_z is now mostly decoupled from the estimates of s and can be estimated from the amount of foreshortening as the object rotates. Furthermore, as the lens becomes longer, i.e., the projection model becomes orthographic, there is no need to replace a perspective imaging model with an orthographic one, since the same equation can be used, with η_z → 0 (as opposed to f and t_z both going to infinity). This allows us to form a natural link between orthographic reconstruction techniques such as factorization and their projective/perspective counterparts (Section 7.3).
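The reparameterization above is easy to exercise numerically. The sketch below (my own example values; the helper function name is hypothetical) evaluates the object-centered projection for a small η_z and shows that setting η_z = 0 gives the scaled-orthographic limit rather than a degenerate expression:

import numpy as np

def project_object_centered(p, R, t, s, eta_z, cx, cy):
    """x_s = s (r_x.p + t_x)/(1 + eta_z r_z.p) + c_x, and similarly for y_s."""
    rx, ry, rz = R                       # the three rows of R
    denom = 1.0 + eta_z * (rz @ p)
    xs = s * (rx @ p + t[0]) / denom + cx
    ys = s * (ry @ p + t[1]) / denom + cy
    return np.array([xs, ys])

R = np.eye(3)
t = np.array([0.1, -0.05, 20.0])         # distant object: t_z much larger than |p|
p = np.array([0.3, 0.2, 0.1])
f = 1000.0
eta_z = 1.0 / t[2]
s = eta_z * f                            # s = eta_z * f, as in the text

x_persp = project_object_centered(p, R, t, s, eta_z, 320.0, 240.0)
x_ortho = project_object_centered(p, R, t, s, 0.0, 320.0, 240.0)   # eta_z -> 0 limit
print(x_persp, x_ortho)                  # nearly identical for a distant object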
2.1.6 Lens distortions

The above imaging models all assume that cameras obey a linear projection model where straight lines in the world result in straight lines in the image. (This follows as a natural consequence of linear matrix operations being applied to homogeneous coordinates.) Unfortunately, many wide-angle lenses have noticeable radial distortion, which manifests itself as a visible curvature in the projection of straight lines. (See Section 2.2.3 for a more detailed discussion of lens optics, including chromatic aberration.) Unless this distortion is taken into account, it becomes impossible to create highly accurate photorealistic reconstructions. For example, image mosaics constructed without taking radial distortion into account will often exhibit blurring due to the mis-registration of corresponding features before pixel blending (Chapter 9).
Fortunately, compensating for radial distortion is not that difficult in practice. For most lenses, a simple quartic model of distortion can produce good results. Let (x_c, y_c) be the pixel coordinates obtained after perspective division but before scaling by focal length f and shifting by the optical center (c_x, c_y), i.e.,

x_c = \frac{r_x \cdot p + t_x}{r_z \cdot p + t_z},

y_c = \frac{r_y \cdot p + t_y}{r_z \cdot p + t_z}.

The radial distortion model says that coordinates in the observed images are displaced away from (barrel distortion) or towards (pincushion distortion) the image center by an amount proportional to their radial distance (Figure 2.13a-b).3 The simplest radial distortion models use low-order polynomials, e.g.,

\hat{x}_c = x_c (1 + \kappa_1 r_c^2 + \kappa_2 r_c^4),
\hat{y}_c = y_c (1 + \kappa_1 r_c^2 + \kappa_2 r_c^4),    (2.78)

3 Anamorphic lenses, which are widely used in feature film production, do not follow this radial distortion model. Instead, they can be thought of, to a first approximation, as inducing different vertical and horizontal scalings, i.e., non-square pixels.
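The quartic model (2.78) is only a couple of lines of code. A hedged sketch follows; the coefficient values are made up, and r_c^2 = x_c^2 + y_c^2 is assumed to be the usual squared radial distance for this kind of model:

import numpy as np

def radial_distort(xc, yc, kappa1, kappa2):
    """Apply the low-order radial distortion model (2.78) to normalized coordinates."""
    rc2 = xc**2 + yc**2                  # squared radial distance (assumed definition)
    factor = 1.0 + kappa1 * rc2 + kappa2 * rc2**2
    return xc * factor, yc * factor

# The sign of kappa1 controls whether points are pushed away from or pulled
# towards the image center; the values here are made up for illustration.
xc, yc = 0.4, -0.3
print(radial_distort(xc, yc, kappa1=-0.2, kappa2=0.05))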
Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art?

Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos.

More than just a source of "recipes," this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques.

Topics and Features:

- Structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses
- Presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects
- Provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory
- Suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book
- Supplies supplementary course material for students at the associated website, http://szeliski.org/Book/

Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

Dr. Richard Szeliski has more than 25 years' experience in computer vision research, most notably at Digital Equipment Corporation and Microsoft Research. This text draws on that experience, as well as on computer vision courses he has taught at the University of Washington and Stanford.

ISBN 978-1-84882-934-3

springer.com