`Apple Inc. v. Corephotonics
`
`
`
`Texts in Computer Science
`
`Editors
`David Gries
`Fred B. Schneider
`
`For further volumes:
`www.springer.com/series/3 191]
`
`APPL-1012 / Page 2 of 211
`APPL-1012 / Page 2 of 211
`
`
`
`
`
`Richard Szeliski
`
`Computer Vision
`
`Algorithms and Applications
`
`G) Springer
`
`APPL-1012 / Page 3 of 211
`APPL-1012 / Page 3 of 211
`
`
`
`
`
`Dr. Richard Szeliski
`Microsoft Research
`One Microsoft Way
`98052-6399 Redmond
`Washington
`USA
`szeliski @microsoft.com
`
`Series Editors
`David Gres
`Department of Computer Science
`Upson Hall
`Comell University
`Ithaca, NY 14853-7501, USA
`
`Fred B. Schneider
`Department of Computer Science
`Upson Hall
`Comell University
`lihaca, NY 14853-7501, USA
`
`ISSN 1868-0941
`ISBN 978-1-84882-934-3
`DOT 10, 1007/978-1-84882-935-0
`Springer London Dordrecht Heidelberg New York
`
`e-ISSN 1868-095X
`e-ISBN 978-1-84882-935-0
`
`British Library Cataloguing in Publication Data
`A catalogue record for this book is available fromthe British Library
`
`Library of Congress Control Number: 20108368 17
`
`© Springer-Verlag London Limited 2011
`Apart from any fair dealing for the purposes of research or private study, of criticism of review, as
`permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
`stored or
`transmitted,
`in any form or by any means, with the prior permission in writing of
`the
`publishers, or in the case of reprographic reproduction in accordance withthe terms of licenses issued by
`the Copynght Licensing Agency. Enquiries conceming reproduction outside those terms should be sent
`to the publishers.
`The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a
`specific statement, that such names are exempt fromthe relevant laws and regulations and therefore free
`for general use.
`The publisher makes no representation, express or implied, with regard to the accuracy ofthe information
`contained in this book and cannot accept any legal responsibility or liability for any errors or omissions
`that may be made.
`
`Printed on acid-free paper
`
`Springer is part of Springer Science+Business Media (www.springer.com)
`
`APPL-1012 / Page 4 of 211
`APPL-1012 / Page 4 of 211
`
`
`
`Chapter 2
`
`Image formation
`
`2.1 Geometric primitives and transformations .. 2.1.6. ee a as cigs ganna
`2.1.1 Geometric primitives... 2. ee es Manone RN
`21.2
`2D transformations... 2. ae ee
`21S SD tranaiirmatigng; 32 eee eee aaa ee a
`21.4
`SD rotations. eee ce ee ee TELE
`2.15.
`3Dto2D projections . 2... ee ee ees ease ees ayaa
`2G:
`Tans distortions aceasta ta eee ele ee eas A aaa
`22 Photormetricimage formation.
`. 6... ee
`22.1 Lighting... aces eee ee ee Fee wee
`2.2.2 Reflectance and shading ... 2... 66. ees te eee
`Dei:
`ROHER cay sag git ela ergata etieee £2 SS Ree eral ee
`23 The digital camera... 0.004. 2g Re ae ea ta
`2.3.1
`Sampling and aliasing. 2.0.6 ee ee ke es
`Se EUG cnt
`na:
`cersses ese caccy!
`tes
`sisecien
`erpernecee erage nes
`ice See ere
`23.3. Compression ... 2.6 ce ees CREST TRL RAR ee
`24 Additional reading ........ Sgt erty SEER og BERT ECE eae
`25:
`tbnsinee: 5 re Sse pa add a Se ata es at a WEA
`
`23
`29
`33
`36
`
`aT
`42
`52
`54
`a4
`
`61
`
`KR. Szeliski, Computer Vision: Algorithms andApplications, Texts in Computer Science,
`DOT 10,1007/978-1-84882-945-0_2, © Springer-Verlag London Limited 2011
`
`iT
`
`APPL-1012 / Page 5 of 211
`APPL-1012 / Page 5 of 211
`
`
`
`28
`
`2 Image formation
`
`
`
`Figure 2.1 A few components of the image formation process: (a) perspective projection; (b)light scattering
`when hitting a surface; (c) lens optics; (d) Bayer color filter array.
`
`APPL-1012 / Page 6 of 211
`APPL-1012 / Page 6 of 211
`
`
`
`2.1 Geometric primitives and transformations
`
`29
`
`Before we can intelligently analyze and manipulate images, we need to establish a vocabulary
`for describing the geometry of a scene. We also need to understand the image formation
`process that produced a particular image given a set of lighting conditions, scene geometry,
`surface properties, and camera optics, In this chapter, we present a simplified model of such
`an image formation process.
`.
`Section 2.1 introduces the basic geometric primitives used throughout the book (points,
`lines, and planes) and the geometric transformations that project these 3D quantities into 2D
`image features (Figure 2.1a). Section 2.2 describes how lighting, surface properties (Fig-
`ure 2.1b), and camera optics (Figure 2.1c) interact in order to produce the color values that
`fall onto the image sensor, Section 2.3 describes how continuous color images are turned into
`discrete digital samples inside the image sensor (Figure 2.1d) and how to avoid (or at least
`characterize) sampling deficiencies, such as aliasing.
`The material covered in this chapter is but a brief summary of a very rich and deepset of
`topics, traditionally covered in a number of separate fields. A more thorough introduction to
`the geometry of points, lines, planes, and projections can be found in textbooks on multi-view
`geometry (Hartley and Zisserman 2004; Faugeras and Luong 2001) and computer graphics
`(Foley, van Dam, Feiner et al. 1995). The image formation (synthesis) process is traditionally
`taught as part of a computer graphics curriculum (Foley, van Dam, Feiner et al. 1995; Glass-
`ner 1995; Watt 1995; Shirley 2005) but it is also studied in physics-based computer vision
`(Wolff, Shafer, and Healey 1992a), The behavior of camera lens systems is studied in optics
`(Miler 1988; Hecht 2001; Ray 2002), Two good books on color theory are (Wyszecki and
`Stiles 2000; Healey and Shafer 1992), with (Livingstone 2008) providing a more fun and in-
`formal introduction to the topic of color perception. Topics relating to sampling and aliasing
`are covered in textbooks on signal and image processing (Crane 1997; Jéhne 1997; Oppen-
`heim and Schafer 1996; Oppenheim, Schafer, and Buck 1999; Pratt 2007; Russ 2007; Burger
`and Burge 2008; Gonzales and Woods 2008).
`A note to students: If you have already studied computer graphics, you may wantto
`skim the material in Section 2.1, although the sections on projective depth and object-centered
`projection near the end of Section 2.1.5 may be new to you, Similarly, physics students (as
`well as computer graphics students) will mostly be familiar with Section 2.2. Finally, students
`with a good background in image processing will already be familiar with sampling issues
`(Section 2.3) as well as some of the material in Chapter3.
`
`2.1 Geometric primitivesand transformations
`
`In this section, we introduce the basic 2D and 3D primitives used in this textbook, namely
`points, lines, and planes. We also describe how 3D features are projected into 2D features,
`More detailed descriptions of these topics (along with a gentler and more intuitive introduc-
`tion) can be found in textbooks on multiple-view geometry (Hartley and Zisserman 2004;
`Faugeras and Luong 2001),
`
`2.1.1 Geometric primitives
`
`Geometric primitives form the basic building blocks used to describe three-dimensional shapes.
`In this section, we introduce points, lines, and planes. Later sections of the book discuss
`
`APPL-1012 / Page 7 of 211
`APPL-1012 / Page 7 of 211
`
`
`
`30
`
`a)
`
`2 Image formation
`
` (b)
`
`Figure 2.2 (a) 2D line equation and (b) 3D plane equation, expressed in terms of the normal # and distance to
`the origin a.
`
`curves (Sections 5.1 and 11.2), surfaces (Section 12.3), and volumes (Section 12.5).
`
`2D points. 2D points (pixel coordinates in an image) can be denoted using a pair of values,
`x = (x,y) € R?, or alternatively,
`
`= | *
`
`y
`
`:
`
`(2.1)
`
`(As stated in the introduction, we use the (2:1, a2, ...) notation to denote column vectors.)
`2D points can also be represented using homogeneous coordinates, % = (#, ij, 1b) © P?,
`where vectors that differ only by scale are considered to be equivalent, P* = ‘R* — (0,0, 0)
`is called the 2D projective space.
`A homogeneous vector @ can be converted back into an inhomogeneous vector 2 by
`dividing through by the last element t, i.c.,
`
`z= (2,9, 0) = w(z,y, 1) = we,
`
`(2.2)
`
`where & = (xr, y, 1) is the augmented vector. Homogeneous points whoselast element is w =
`O are called ideal points or points at infinity and do not have an equivalent inhomogeneous
`representation.
`
`2Dlines can also be represented using homogeneous coordinates f = (a, b,c).
`2D lines.
`The corresponding line equation is
`
`#-l=ar+by+ec=0.
`
`(2.3)
`
`Wecan normalize the line equation vector so that ! = (fi,, iy, d) = (2, d) with ||7|) = 1. In
`this case, ft is the normal vector perpendicular to the line and ¢ is its distance to the origin
`(Figure 2.2),
`(The one exception to this normalization is the line at infinity i= (0,0, 1),
`which includes all (ideal) points at infinity.)
`We can also express fi as a function of rotation angle 6, A = (Az, iy) = (cosé,sin#)
`(Figure 2.2a). This representation is commonly used in the Hough transform line-finding
`algorithm, which is discussed in Section 4.3.2. The combination (@,d) is also known as
`polar coontinates.
`When using homogeneous coordinates, we can compute the intersection of two lines as
`
`#=1, xh,
`
`(2.4)
`
`APPL-1012 / Page 8 of 211
`APPL-1012 / Page 8 of 211
`
`
`
`2.1 Geometric primitives and transformations
`
`31
`
`where x is the cross product operator. Similarly, the line joining two points can be written as
`
`i= & x Bo.
`
`(2.5)
`
`When trying to fit an intersection point to multiple lines or, conversely, a line to multiple
`points, least squares techniques (Section 6.1.1 and Appendix A.2) can be used, as discusstd
`in Exercise 2.1,
`
`2D conics. There are other algebraic curves that can be expressed with simple polynomial
`homogeneous equations. For example, the conic sections (so called because they arise as the
`intersection of a plane and a 3D cone) can be written using a quadric equation
`
`£' Qz =0.
`
`(2.6)
`
`Quadric equations play useful roles in the study of multi-view geometry and camera calibra-
`tion (Hartley and Zisserman 2004; Faugeras and Luong 2001) but are not used extensively in
`this book,
`
`3D points. Point coordinates in three dimensions can be written using inhomogeneous co-
`ordinates x = (x,y,z) € R* or homogeneous coordinates = (Z,j, 2,1) € P®. As before,
`itis sometimes useful to denote a 3D point using the augmented vector & = (2, y, 2,1) with
`= we.
`
`3D planes can also be represented as homogencous coordinates 7m = (a, b,c, d)
`3D planes.
`with a corresponding plane equation
`.
`
`m= ar+ by -+cor+d—0,
`
`(2.7)
`
`We can also normalize the plane equation as m = (fiz, iy, iz, d) = (vt, d) with ||f|] = 1.
`Tn this case, 71 1s the normal vector perpendicular to the plane and d is its distance to the
`origin (Figure 2.2b). As with the case of 2D lines, the plane atinfinity ym = (0,0,0,1),
`which contains all the points at infinity, cannot be normalized (i.c., it does not have a unique
`normal or a finite distance),
`Wecan express 7 as a function of two angles (0, 4),
`
`ft = (cos é cos @, sin d cos @, sin d),
`
`(2.8)
`
`Le. using spherical coordinates, but these are less commonly used than polar coordinates
`since they do not uniformly sample the space of possible normal vectors,
`
`3D lines. Lines in 3D are less elegant than either lines in 2D or planes in 3D. One possible
`representation is to use two points on the line, (p,q). Any other point on the line can be
`expressed as a linear combination of these two points
`
`-
`
`r= (1—A)p+Aq,
`
`(2.9)
`
`as shown in Figure 2.3. If we restrict 0 < A < 1, we get the line segment joining p and q.
`
`APPL-1012 / Page 9 of 211
`APPL-1012 / Page 9 of 211
`
`
`
`32
`
`2 Image formation
`
`
`
`Figure 2.3 3D line equation, r = (1 — A)p+ Ag.
`
`If we use homogeneous coordinates, we can write the line as
`
`r=pp+ Ag.
`
`(2.10)
`
`A special case of this is whenthe secondpointis at infinity, i.e. @ = (dz, dy,dz,0) = (d,0).
`Here, we see that d is the direction of the line, We can then re-write the inhomogeneous 3D
`line equation as
`
`r=p+daAd.
`
`(2.11)
`
`A disadvantage of the endpoint representation for 3D lines is that it has too many degrees
`of freedom,i.e., six (three for each endpoint) instead of the four degrees that a 3D line truly
`has. However, if we fix the two points on the line to lie in specific planes, we obtain a rep-
`resentation with four degrees of freedom. For example,if we are representing nearly vertical
`lines, then z = 0 and z = 1 form two suitable planes, ie., the (2, y) coordinates in both
`planes provide the four coordinates describing the line. ‘This kind of two-plane parameteri-
`zation is used in the lightfield and Lumigraph image-based rendering systems described in
`Chapter 13 to represent the collection of rays seen by a camera as it moves in front of an
`object. The two-endpoint representation is also useful for representing line segments, even
`when their exact endpoints cannot be seen (only guessed at).
`If we wish to representall possible lines without bias towards any particular orientation,
`we can use Picker coordinates (Hartley and Zisserman 2004, Chapter 2; Faugeras and Luong
`2001, Chapter 3). These coordinates are the six independent non-zero entries in the 4x4 skew
`symmetric matrix
`
`L=pq' —@p’,
`
`(2.12)
`
`where p and g are any two (non-identical) points on the line. This representation has only
`four degrees of freedom, since L is homogeneousand also satisfies det(.L) = 0, which results
`in a quadratic constraint on the Pliicker coordinates.
`Inpractice, the minimal representation is not essential for most applications. An ade-
`quate model of 3D lines can be obtained by estimating their direction (which may be known
`ahead of time, e.g., for architecture) and some point within the visible portion of the line
`(see Section 7.5.1) or by using the two endpoints, since lines are most often visible as finite
`line segments. However, if you are interested in more details about the topic of minimal
`line parameterizations, Férstner (2005) discusses various ways to infer and model 3D lines in
`projective geometry, as well as how to estimate the uncertainty in such fitted models.
`
`APPL-1012 / Page 10 of 211
`APPL-1012 / Page 10 of 211
`
`
`
`2.1 Geometric primitives and transformations
`
`a3
`
`
`elONTU
`
`
`
`Figure 2.4 Basic set of 2D planar transformations.
`
`3D quadrics.
`
`‘The 3D analog of a conic section is a quadric surface
`
`#’Qz=0
`
`(2.13)
`
`(Hartley and Zisserman 2004, Chapter 2). Again, while quadric surfaces are useful in the
`study of multi-view geometry and can also serve as useful modeling primitives (spheres,
`ellipsoids, cylinders), we do not study them in great detail in this book.
`
`2.1.2 2D transformations
`
`Having defined our basic primitives, we can now tum our attention to how they can be trans-
`formed. The simplest transformations occur in the 2D plane andare illustrated in Figure 2.4,
`
`Translation.
`
`2D translations can be written as 2" = a2 + ¢ or
`
`where J is the (2 x 2) identity matrix or
`
`wea[ lr ele
`
`Ioet
`
`v=| or |?
`
`(2.14)
`
`(2.15)
`
`where 0 is the zero vector. Using a2 x 3 matrix results in a more compact notation, whereas
`using a full-rank 3 x 3 matrix (which can be obtained from the 2 x 3 matrix by appending a
`(07 1] row) makes it possible to chain transformations using matrix multiplication. Note that
`in any equation where an augmented vector such as & appears on both sides, it can always be
`replaced with a full homogeneous vector #.
`
`Rotation + translation. This transformation is also known as 2D rigid body motion or
`the 29 Euclidean transformation (since Euclidean distances are preserved). It can be written
`asa’ = ARax+tor
`
`where
`
`=[Rt]a@
`a
`‘i
`R=|
`cos0
`.
`is an orthonormal rotation matrix with RR™ = I and |A| = 1,
`
`cos@ —sin
`
`2.16)
`
`217)
`
`APPL-1012 / Page 11 of 211
`APPL-1012 / Page 11 of 211
`
`
`
`34
`
`2 Image formation
`
`Scaled rotation. Also known as the similarity transform, this transformation can be ex-
`pressed as a’ = sft + ¢ where s is an arbitrary scale factor. It can also be written as
`
`
`
`a —b #t,|_
`e'=(sR t]#=
`
`ba «la,
`
`(2.18)
`
`where we no longer require that a? + 6° = 1. Thesimilarity transform preserves angles
`between lines.
`
`Affine. The affine transformation is written as 2’ = AZ, where A is an arbitrary 2 x 3
`matrix, 1.€.,
`
`a
`
`Oo Jz.
`
`(2.19)
`
`ait | Goo
`
`fo1
`
`Mo Gi 42
`
`Parallel lines remain parallel under affine transformations.
`
`Projective. This transformation, also known as a perspective transform or homography,
`operates on homogeneous coordinates,
`
`# = Ha,
`
`(2.20)
`
`where AT is an arbitrary 3 x 3 matrix. Note that Ai is homogeneous,i.c., it is only defined
`up to a scale, and that two AY matrices that differ only by scaleare equivalent, The resulting
`homogeneous coordinate ' must be normalized in order to obtain an inhomogeneous result
`am, Le.,
`
`
`Ayoa + ary + fiz
`ints hoot + hoy + hoa
`hagt + hay + hao
`hagt + hary + hag
`Perspective transformations preserve straight lines (i.e., they remain straight after the trans-
`formation).
`
`and y=
`
`(2.21)
`
`Hierarchy of 2D transformations. The preceding set of transformationsare illustrated
`in Figure 2.4 and summarized in Table 2.1. The easiest way to think of them is as a set
`of (potentially restricted) 3 x 3 matrices operating on 2D homogeneous coordinate vectors.
`Hartley and Zisserman (2004) contains a more detailed description of the hierarchy of 2D
`planar transformations.
`The above transformations form a nested set of growps, Le., they are closed under com-
`position and have an inverse that is a member of the same group. (This will be important
`later when applying these transformations to images in Section 3,6.) Each (simpler) group is
`a subset of the more complex group below it.
`
`Co-vectors. While the above transformations can be used to transform points in a 2D
`plane, can they also be used directly to transform a line equation? Consider the homogeneous
`equation | -& = 0. If we transform «' = Ala, we obtain
`(2.22)
`i .2@=fwe=(a'Tya=i-z=0,
`i.c., (=HaE: Thus, the action of a projective transformation on a co-vector such as a 2D
`line or 3D normal can berepresented by the transposed inverse of the matrix, which ts equiv-
`alent to the adjoint of HY, since projective transformation matrices are homogeneous. Jim
`
`APPL-1012 / Page 12 of 211
`APPL-1012 / Page 12 of 211
`
`
`
`2.1 Geometric primitives and transformations
`
`Transformation
`
`Matrix
`
`#DoF Preserves
`
`Icon
`
`translation
`rigid (Euclidean)
`similarity
`affine
`projective
`
`[ETP bess
`[R[t],.,
`[sR|t],.,
`[Adie
`[ H lve
`
`2
`3
`4
`6
`8
`
`orientation
`lengths
`angles
`parallelism
`straight lines
`
`[|
`oO
`Oo
`i!
`|
`
`Table 2.1 Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed
`in the rows below it, i.c., similarity preserves not only angles but also parallelism and straight lines, The 2 x 3
`matrices are extended with a third [07 1] row to form a full 3 x 3 matrix for homogeneous coordinate transforma-
`tions,
`
`Blinn (1998) describes (in Chapters 9 and 10) the ins and outs of notating and manipulating
`co-vectors.
`
`While the above transformations are the ones we use most extensively, a number of addi-
`tional transformations are sometimes used,
`
`Stretch/squash. This transformation changes the aspectratio of an image,
`#
`=
`=
`
`s,o +t,
`Syy + ty,
`
`y/
`
`and is a restricted form of an affine transformation. Unfortunately, it does not nest cleanly
`with the groups listed in Table 2.1.
`
`Planar surface flow. This cight-parameter transformation (Horn 1986; Bergen, Anan-
`dan, Hanna et al. 1992; Girod, Greiner, and Niemann 2000),
`
`a = ag+a,r+ agy + asx” + apry
`yi = a3+agr + agy + azz” + agzy,
`
`It can thus be thought of as a
`arises when a planar surface undergoes a small 31D motion.
`small motion approximation to a full homography. Its main attraction is that itis [imear in the
`motion parameters, a;, which are often the quantities being estimated.
`
`Bilinear interpolant. This cight-parameter transform (Wolberg 1990),
`
`a =
`/
`uv =
`
`ap + a2 + doy + agry
`
`ag + Ogu + ogy + O7ry,
`
`can be used to interpolate the deformation due to the motion of the four comer points of
`a square,
`(In fact, it can interpolate the motion of any four non-collinear points.) While
`
`APPL-1012 / Page 13 of 211
`APPL-1012 / Page 13 of 211
`
`
`
`36
`
`2 Image formation
`
`
`
`
`
` Transformation Matrix #DoF Preserves Icon
`
`
`
`
`
`translation
`rigid (Euclidean)
`similarity
`affine
`projective
`
`[1|t],.,
`[Rt],,
`[sR|t],.,
`L Ades
`[ Fi lise
`
`3
`6
`7
`12
`15
`
`L]
`orientation
`Oo
`lengths
`©
`angles
`a}
`parallelism
`straight lines C|
`
`Table 2.2 Hierarchy of 3D coordinate transformations. Each transformation also preserves the properties listed
`in the rows below it, i.c., similarity preserves not only angles but also parallelism and straight lines. The 3 x 4
`matrices are extended with a fourth [0”1] row to form a full 4 x 4 matrix for homogeneous coordinate transfor-
`mations. The mnemonic icons are drawn in 2D but are meant to suggest transformations occurring in a full 3D
`cube,
`
`the deformation is linear in the motion parameters, it does not generally preserve straight
`lines (only lines parallel to the square axes), However, it is often quite useful, e.g., in the
`interpolation of sparse grids using splines (Section 8.3).
`
`2.1.3 3D transformations
`
`The set of three-dimensional coordinate transformations is very similar to that available for
`9D transformations and is summarized in Table 2.2. As in 2D, these transformations form a
`nested set of groups. Hartley and Zisserman (2004, Section 2.4) give a more detailed deserip-
`tion of this hierarchy.
`
`Translation.
`
`3D translations can be written as 2" = a: + ¢ or
`
`v=[i t]z2
`
`(2.23)
`
`where J is the (3 x 3) identity matrix and 0 is the zero vector.
`
`Rotation + translation. Also known as 3D rigid body motion or the 3D Euclidean trans-
`formation,it can be written as a’ = Aa: + ¢ or
`
`(2.24)
`o=[(R tle
`where R is a 3 x 3 orthonormal rotation matrix with RR?’ = I and |R| = 1. Note that
`sometimes it is more convenient to describe a rigid motion using
`
`«' = Riz —c) = Rx — Re,
`
`(2.25)
`
`where ¢ is the center of rotation (often the camera center).
`Compactly parameterizing a 3D rotation is a non-trivial task, which we describe in more
`detail below.
`
`APPL-1012 / Page 14 of 211
`APPL-1012 / Page 14 of 211
`
`
`
`2.1 Geometric primitives and transformations
`
`a7
`
`Scaled rotation. The 3D similarity transform can be expressed as a’ = sFta: + ¢ where
`313 an arbitrary scale factor. It can also be written as
`
`This transformation preserves angles between lines and planes.
`
`a =[sR t ]%.
`
`(2.26)
`
`‘
`
`Affine.
`Le,
`
`‘The affine transform is written as 2’ = AZ, where A is an arbitrary 3 = 4 matrix,
`
`J
`© =]
`
`491 42 a
`
`dio G1 Gin Gis
`Oop 91
`Gaz
`fgg
`
`@o0
`
`#
`
`(2.27)
`
`Parallel lines and planes remain parallel under affine transformations.
`
`Projective. This transformation, variously known as a 3D perspective transform, homog-
`raphy, or collineation, operates on homogeneous coordinates,
`
`&' = HG,
`
`(2.28)
`
`where f is an arbitrary 4 x 4 homogeneous matrix, As in 2D, the resulting homogeneous
`coordinate &' must be normalized in order to obtain an inhomogencousresult a. Perspective
`transformations preserve straight lines (i.c., they remain straight after the transformation).
`
`2.1.4 3D rotations
`
`The biggest difference between 2D and 3D coordinate transformations is that the parameter-
`ization of the 3D rotation matrix Ft is not as straightforward but several possibilities exist.
`
`Euler angles
`
`A rotation matrix can be formed as the product of three rotations around three cardinal axes,
`eg. 0, ¥, and z, ora, y, and «, This is generally a bad idea, as the result depends on the
`order in which the transforms are applied. What is worse, it is not always possible to move
`smoothly in the parameter space, i.c., sometimes one or more of the Euler angles change
`dramatically in response to a small changein rotation.' For these reasons, we do not even
`give the formulafor Euler angles in this book—interested readers can look in other textbooks
`or technical reports (Faugeras 1993; Diebel 2006). Note that,
`in some applications, if the
`rotations are known to be a set of uni-axial transforms, they can always be represented using
`an explicit set of rigid transformations,
`
`Axis/angle (exponential twist)
`
`A rotation can be represented by a rotation axis 7 and an angle @, or equivalently by a 3D
`vector w = Of, Figure 2.5 shows how we can compute the equivalent rotation. First, we
`project the vector v onto the axis 7 to obtain
`vy = Aa(A-v) = (AAT)y,
`
`(2.29)
`
`! In robotics, this is sometiones referred to as gimbal lock,
`
`APPL-1012 / Page 15 of 211
`APPL-1012 / Page 15 of 211
`
`
`
`38
`
`:
`
`2 Image formation
`
`
`
`Figure 2.5 Rotation around an axis 7 by an angle @.
`
`which is the component of v that is not affected by the rotation. Next, we compute the
`perpendicular residual of » from 7,
`
`vp =v—vy =(1—An")u.
`
`We can rotate this vector by 90° using the cross product,
`
`vy, =A xX v= [fi],0,
`
`(2.30)
`
`(2.31)
`
`where [72], is the matrix form ofthe cross product operator with the vector 7 = (fix, fiys fiz),
`
`Ty
`fly
`O
`
`[fi]x=|fiz 0 —ne, |. (2.32)
`
`a a
`
`
`
`
`
`Note that rotating this vector by another 90° is equivalent to taking the cross product again,
`
`Use = Ti ¥ ty = jfij2v =—1,
`
`and hence
`
`vy SUL =v7+t+uyy = (I+ [fAl* jw.
`We can now compute the in-plane componentof the rotated vector w as
`
`i) =cosdv, +sindv, = (sin é[n], — cos O[fi]®, )v.
`
`Putting all these terms together, we obtain the final rotated vector as
`
`ut + oy = (2 + sin O[fi], + (1 — cos 6) {va)%, )v.
`
`(2.33)
`
`We can therefore write the rotation matrix corresponding to a rotation by @ around an axis
`as
`
`R(h,0) = 1 +sin off], + (1 — cos @)[njz,,
`
`(2.34)
`
`which is known as Rodriguez'sformula (Ayache 1989).
`The product of the axis 71 and angle #, w = 07 = (w,,wy,wz), is a minimal represen-
`tation for a 3D rotation. Rotations through common angles such as multiples of 90° can be
`represented exactly (and converted to exact matrices) if # is stored in degrees, Unfortunately,
`
`APPL-1012 / Page 16 of 211
`APPL-1012 / Page 16 of 211
`
`
`
`2.1 Geometric primitives and transformations
`
`39
`
`this representation is not unique, since we can always add a multiple of 360° (27 radians) to
`(and getthe same rotation matrix. As well, (7, @) and (—#, —@) represent the same rotation.
`However, for small rotations (e.g., corrections to rotations), this is an excellent choice,
`In particular, for small (infinitesimal or instantaneous) rotations and #? expressed in radians,
`&
`Rodriguez's formula simplifies to
`
`R(w) = I+ sin bff], = I+ [6A], =
`
`]
`
`ws
`
`—tily
`
`til,
`
`ly
`
`i =, | j
`
`Wy
`
`1
`
`(2.35)
`
`which gives a nice linearized relationship between the rotation parameters w and #2, We can
`also write R(w)v =v + w x v, which is handy when we want to compute the derivative of
`Ae with respect to uw,
`
`Oo
`
`y
`
`ss —-
`
`-r
`
`O
`
`Another way to derive a rotation through a finite angle is called the exponential twist
`(Murray, Li, and Sastry 1994), A rotation by an angle ( is equivalent to k rotations through
`6/k. In the limit as k — co, we obtain
`
`(2.37)
`R(f, 6) = lim(1 + 7lof)" = exp [w].
`If we expand the matrix exponential as a Taylor series (using the identity [fi]£*? = —[”i]*,
`k > 0, and again assuming @ is in radians), -
`
`BP
`&
`exp [w], = T+ o[f]x + >lAly + alls +°
`gs
`.
`_
`02 #
`= 14+(00-F + Nile t+ (> - At Ax
`= I+sin6[A], +(1—cosé)[fj2,
`
`(2.38)
`
`which yields the familiar Rodriguez's formula.
`
`Unit quaternions
`
`The unit quaternion representation is closely related to the angle/axis representation. A unit
`quaternion is a unit length 4-vector whose components can be written as g = (qx, Gy; Gz, Iw)
`or gq = (2, y, 2, w) for short. Unit quaternions live on the unit sphere ||q|| = 1 and antipodal
`(opposite sign) quaternions, q and —q,represent the same rotation (Figure 2.6), Other than
`this ambiguity (dual covering), the unit quaternion representation of a rotation is unique.
`Furthermore, the representation is continuous, i.e., as rotation matrices vary continuously,
`one can find a continuous quaternion representation, although the path on the quaternion
`sphere may wrap all the way around before returning to the “origin” g, = (0,0,0,1). For
`these and other réasons given below, quaternions are a very popular representation for pose
`and for pose interpolation in computer graphics (Shoemake 1985).
`
`APPL-1012 / Page 17 of 211
`APPL-1012 / Page 17 of 211
`
`
`
`40
`
`2 Image formation
`
`
`
`Figure 2.6 Unit quaternions live on the unit sphere ||q{| = 1. This figure shows a smooth trajectory through the
`three quaternions gp, q,, and q5. The antipodal point to qj, namely —qg, represents the same rotation as qo.
`
`Quaternions can be derived from the axis/angle representation through the formula
`
` #,
`a
`(2.39)
`q = (v,w) = (sin qthons 3)
`where 7 and @ are the rotation axis and angle, Using the trigonometric identities sing =
`2sin £ cos § and (1 — cos@) = 2sin’$, Rodriguez's formula can be convertedto
`
`R(A,0)=I+sin6[A], + (1 —cosé)[Al?
`T+ 2Qwie], + 2[v]2.
`(2.40)
`
`This suggests a quick way to rotate a vector v by a quaternion using a series of cross products,
`scalings, and additions. To obtain a formula for R(q) as.a function of (x,y, z, w), recall that
`
`rz
`zy
`—y—z7
`y
`-z
`0
`
`
`[v];=| O a|and [v]} =2 ry —x* — 24 yz
`
`
`-y x
`0
`zz
`yz
`—e? —y"
`
`
`
`We thus obtain
`
`R(qg)=|
`
`1-2Ay?+2*)
`Aeyt+zw)
`Oaz—yu)
`
`a(zz + yw)
`2(ry—zw)
`BAye—-cw)
`1-2(27427)
`—Ayz+aw) 1-20 + y2)
`
`|.
`
`(2.41)
`
`The diagonal terms can be made more symmetrical by replacing 1 — 2(y* + 2*) with (2? +
`wu? —y* — 2°), ete.
`The nicest aspect of unit quaternions is that there is a simple algebra for composing rota-
`tions expressed as unit quaternions. Given two quaternions gy = (vp, wo) and q, = (v1, Ww),
`the quaternion multiply operator is defined as
`
`Go = Ipod, = (vo X Vp + wWov1 + WV, WOU— Vo + V1),
`
`(2.42)
`
`with the property that R(q,) = R(g,)A(q,). Note that quaternion multiplication is not
`commutative, just as 3D rotations and matrix multiplications are not.
`
`APPL-1012 / Page 18 of 211
`APPL-1012 / Page 18 of 211
`
`
`
`2.1 Geometric primitives and transformations
`
`41
`
`procedure slerp(qp,q 1, @):
`
`return q2 = 440
`
`» Fe = 4/9 = (Vp, Wr)
`
`. ifw, < O then g, + —q,
`
`. 6, = 2tan™(\Ivp|/w,)
`
`~ fp = N(uy) = v-/||v5|]
`
`.= od,
`
`She (sin 22-72,., cos fa.)
`
`Algorithm 2.1 Spherical linear interpolation (slerp). ‘The axis and total angle are first computed from the quater-
`nion ratio. (This computation can be lifted outside an inner loop that generates a set of interpolated position for
`animation.) An incremental quaternion is then computed and multiplied by the starting rotation quaternion.
`
`Taking the inverse of a quaternion is easy: Just flip the sign of v or w (but not both!),
`(You can verify this has the desired effect of transposing the Ri matrix in (2.41).) Thus, we
`can also define guaternion division as
`
`92 = 40/41 = Gon: = (U9 Xv, + wots — wg, —wyuh — Uo- 4).
`
`(2.43)
`
`This is useful when the incremental rotation between tworotations is desired,
`In particular, if we want to determine a rotation that is partway between two given rola-
`tions, we can compute the incremental rotation, take a fraction of the angle, and compute the
`new rotation. This procedure is called spherical linear interpolation or slerp for short (Shoe-
`make 1985) and is given in Algorithm 2.1. Note that Shoemake presents two formulas other
`than the one given here. The first exponentiates q,. by alpha before multiplying the original
`quaternion,
`
`while the second treats the quaternions as 4-vectors on a sphere and uses
`
`d2 = 4r os
`
`2
`
`2
`
`sin(1 — a)é
`~
`sin@
`
`0
`
`
`sin ad}
`sin @
`
`li
`
`(2.44)
`
`(2
`
`45
`
`}
`
`where # = cos~'(qg, « q,) and the dot productis directly between the quaternion 4-vectors,
`AL of these formulas give comparable results, although care should be taken when g, and q,
`are close together, which is why I prefer to use an arctangent to establish the rotation angle.
`
`Which rotation representation is better?
`
`The choice of representation for 3D rotations depends partly on the application.
`The axis/angle representation is minimal, and hence does not require any additional con-
`straints on the patameters (no need to re-normalize after each update).
`If the angle is ex-
`pressed in degrees, it is easier to understand the pose (say, 90° twist around x-axis), and also
`
`APPL-1012 / Page 19 of 211
`APPL-1012 / Page 19 of 211
`
`
`
`42
`
`2 Image formation
`
`easier to express exact rotations. When the angle is in radians, the derivatives of A. with
`respect to w can easily be computed (2.36).
`Quaternions,on the other hand, are better if you wantto keep track of a smoothly moving
`camera, since there are no discontinuities in the representation.It is also easier to interpolate
`between rotations and to chain rigid transformations (Murray, Li, and Sastry 1994; Bregler
`and Malik 1998).
`Myusual preference is to use quaternions, but to update their estimates using an incre-
`mental rotation, as described in Section 6.2.2,
`
`2.1.5 3D to 2D projections
`
`Now that we know how to represent 2D and 3D geometric primitives and how to transform
`them spati