`
`Computer Vision
`
`Algorlthms and Apphcatlons
`
`
`
`
`
`APPL-1013 / Page 1 of 63
`Apple v. Corephotonics
`
`
`
`Texts in Computer Science
`
`Editors
`
`David Gries
`Fred B. Schneider
`
`For further volumes:
`
`www.3pringercomfseriesl3191
`
`
`
`APPL-1013 / Page 2 of 63
`
`APPL-1013 / Page 2 of 63
`
`
`
`Richard Szeliski
`
`Computer Vision
`
`av
`
`Algorithms and Applications
`
`5; Springer
`
`APPL-1013 / Page 3 0f 63
`
`APPL-1013 / Page 3 of 63
`
`
`
`
`
`Dr. Richard Szeliskj
`Microsoft Research
`One Microsoft Way
`98052—6399 Redmond
`
`Washington
`USA
`szeiiski@microsoft.com
`
`Series Editors
`David Gries
`Department of Computer Science
`Upson Hall
`Cornell University
`Ithaca. NY 14853-7501, USA
`
`Fred B. Schneider
`Department of Computer Science
`Upson Hall
`Cornell University
`Ithaca, NY 14853—7501. USA
`
`ISSN 1868—0941
`ISBN 97'8—1-84882-934-3
`DOI 10,100?:‘978-1-84882—935—0
`
`e-ISSN l868-095X
`e-ISBN 978-1-84882—935-0
`
`Springer London Dordrecht Heidelberg New York
`
`British Library Cataloguing in Publication Data
`A catalogue record for this book is available from the British Library
`
`Library of Congress Control Number: 20109368]?
`
`© Springer—Veriag London Limited 201 1
`Apart from any fair dealing for the purposes of research or privale study. or criticism or review. as
`permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
`stored or transmitted,
`in any form or by any means, with the prior permission in writing of the
`publishers. or in the case of reprographic reproduction in accordance with the terms of licenses issued by
`the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent
`to the publishers.
`The use of registered names, trademarks, etc., in this publication does not imply, even in the absence ofa
`specific statement, that such names are exempt from the relevant laws and regulations and therefore free
`for general use.
`The publisher makes no representation, express or implied, with regard to the accuracy of the infomtation
`contained in this book and cannot accept any legal responsibility or liability for any errors or omissions
`that may be made.
`
`Printed on acid—free paper
`
`Springer is part of Springer Sciencc+Business Media (www.5pringcr.com)
`
`APPL-1013 / Page 4 of 63
`
`APPL-1013 / Page 4 of 63
`
`
`
`Chapter 2
`
`Image formation
`
`2.1 Geometric primitives and transformations ................... 29
`2.1.1 Geometric primitives .......................... 29
`2.1.2
`2D transfonnations ........................... 33
`2.1.3
`3D transfonnatiOns ........................... 36
`2.1.4
`3D rotations ............................... 37
`
`2.1.5
`2.1.6
`
`3D to 2D projections .......................... 42
`Lens distortions ............................. 52
`
`2.2
`
`Photometric image formation .......................... 54
`2.2.1
`Lighting ................................. 54
`2.2.2 Reflectance and shading ........................ 55
`2.2.3 Optics .................................. 61
`2.3 The digital camera ' ........................ -....... 65
`2.3.1
`Sampling and aliasing ......................... 69
`2.3.2 Color .................................. 71
`
`2.3.3 Compression .............................. 81}
`2.4 Additional reading ............................... 82
`2.5 Exercises .................................... 82
`
`R. SzeIiski, Computer Vision: Algorithms and Applications, Texts in Computchcience,
`DOI 10. 10073978-1-84882—935—[2 © Springer-Veriag London Limited 201 l
`
`27
`
`APPL-1013 / Page 5 of 63
`
`APPL-1013 / Page 5 of 63
`
`
`
`28
`
`'
`
`2 Image formation
`
`
`
`
`
`Figure 2.1 A few components of the image formation process: (a) perspective projection; (h) light scattering
`when hitting a surfaCe; (c) lens optics; (d) Bayer color filter array.
`
`APPL-1013 / Page 6 of 63
`
`APPL-1013 / Page 6 of 63
`
`
`
`2.1 Geometric primitives and transformations
`
`29
`
`Before we can intelligently analyze and manipulate images, we need to establish a vocabulary
`for describing the geometry of a Scene. We also need to understand the image formation
`process that produced a particular image given a set of lighting conditions, scene geometry,
`surface properties, and camera Optics. In this chapter. we present a simplified model of such
`an image formalion process.
`t
`Section 2.1 introduces the basic geometric primitives used throughout the book (points,
`lines, and planes) and the geometric transformations that project these 3D quantities into 2D
`image features (Figure 2.1a). Section 2.2 describes how fighting, surface properties (Fig-
`ure 2.1b), and camera optics (Figure 2.1c) interact in order to produce the color values that
`fall onto the image sensor. Section 2.3 describes how continuous color images are turned into
`discrete digital samples inside the image sensor (Figure 2.1d) and how to avoid (or at least
`characterize) sampling deficiencies, such as aliasing.
`
`The material covered in this chapter is but a brief summary of a very rich and deep set of
`topics, traditionally covered in a number of separate fields. A more thorough introduction to
`the geometry of points, lines, planes, and projections can be found in textbooks on multi—view
`
`geometry (Hartley and Zisserman 2004; Faugeras and Luong 2001) and computer graphics
`(Foley, van Dam, Feiner at at. 1995). The image formation (synthesis) process is traditionally
`taught as part of a computer graphics curriculum (Foley, van Dam, Feiner er al. 1995; Glass—
`ner 1995; Watt 1995; Shirley 2005) but it is also studied in physics-based computer vision
`(Wolff, Shafer, and Healey 1992a). The behavior of camera lens systems is studied in optics
`(Moller 1988; Hecht 2001; Ray 2002). Two good books on color theory are (Wyszecki and
`Stiles 2000; Healey and Shafer 1992), with (Livingstone 2008) providing a more fun and in-
`formal introduction to the topic of color perception. Topics relating to sampling and aliasing
`are covered in textbooks on signal and image processing (Crane 1997; Jahne 1997; Oppen-
`heim and Schafer 1996; Oppenheim, Schafer, and Buck 1999; Pratt 2007; Russ 2007; Burger
`and Burge 2008; Gonzales and Woods 2008).
`
`A note to students: If you have already studied computer graphics, you may want to
`skim the material in Section 2.], although the sections on projective depth and object—centered
`projection near the end of Section 2.1.5 may be new to you. Similarly, physics students (as
`well as computer graphics students) will mostly be familiar with Section 2.2. Finally, students
`with a good background in image processing will already be familiar with sampling issues
`(Section 2.3) as well as some of the material in Chapter 3.
`
`2.1 Geometric primitivesand transformations
`
`In this section, we introduce the basic 2D and 3D primitives used in this textbook, namely
`points, lines, and planes. We also describe how 3D features are projected into 2D features.
`More detailed descriptions of these topics (along with a gentler and more intuitive introduc—
`tion) can be found in textbooks on multiple-view geometry (Hartley and Zisserman 2004;
`Faugeras and Luong 2001).
`
`2.1.1 Geometric primitives
`
`Geometric primitives form the basic building blocks used to describe three-dimensional shapes.
`In this section, we introduce points, lines, and planes. Later sections of the book discuss
`
`APPL-1013 / Page 7 of 63
`
`APPL-1013 / Page 7 of 63
`
`
`
`30
`
`2 Image formation
`
`
`(b)
`
`Figure 2.2 (a) 2D line equation and (b) 3D plane equation, expressed in terms of the normal it and distance to
`the origin d.
`
`curves (Sections 5.1 and 112), surfaces (Section 12.3), and volumes (Section 12.5).
`
`2D points (pixel coordinates in an image) can be denoted using a pair of values,
`ZD points.
`as = (a, y) E R2, or alternatively,
`
`3:“)
`
`a
`
`(2.1)
`
`(As stated in the introduction, we use the (931, $2, . . .) notation to denote column vectors.)
`2D points can also be represented using homogeneous coordinates, a = (E, i}, til) 6 792,
`where vectors that differ only by scale are considered to be equivalent. 392 = R3 — (0,0,0)
`is called the 2D projective space.
`
`A homogeneous vector 55: can be converted back into an inhomogeneous vector :12 by
`dividing through by the last element tit, i.e.,
`
`E: = (i,§,1fi) : 13(miyt1) 2 15:5,
`
`(2-2)
`
`where i = (a, y, 1) is the augmented vector. Homogeneous points whose last element is if.- =
`0 are called ideal points or points at infinity and do not have an equivalent inhomogeneous
`rePresentation.
`
`2D lines can also be represented using homogeneous coordinates f = (a, b, c).
`2D lines.
`The corresponding line equation is
`
`e-f=ae+by+c=0.
`
`(2.3)
`
`We can normalize the line equation vector so that! = (fit, fry, ti) 2 (ft, :13) with ”fill = 1. In
`this case, it. is the normal vector perpendicular to the line and d is its distance to the origin
`(Figure 2.2).
`(The one exception to this normalization is the line at infinity 3 = (0,0,1),
`which includes all (ideal) points at infinity.)
`We can also express it as a function of rotation angle 9, ii, = (1%, fig) = (cos 0, sin 6)
`(Figure 2.2a). This representation is commonly used in the Hough transform line-finding
`algorithm, which is discussed in Section 4.3.2. The combination (Ehd) is also known as
`polar coordinates.
`When using homogeneous coordinates, we can compute the intersection of two lines as
`
`e = i1 X112,
`
`(2.4)
`
`APPL-1013 / Page 8 of 63
`
`APPL-1013 / Page 8 of 63
`
`
`
`2.1 Geometric primitives and transformations
`
`31
`
`where x is the cross product operator. Similarly, the line joining two points can be written as
`
`t": 5:, x 37:3.
`
`(2.5)
`
`When trying to fit an intersection point to multiple lines or, conversely, a line to multiple
`points, least squares techniques (Section 6.1.1 and Appendix A2) can be used, as discussed
`in Exercise 2.1.
`
`2D conics. There are other algebraic curves that can be expressed with simple polynomial
`homogeneous equations. For example, the conic sections (so called because they arise as the
`intersection of a plane and a 3D cone) can be written using a quadric equation
`
`s1" Qi: = o.
`
`(2.5)
`
`Quadric equations play useful roles in the study of multi-view geometry and camera calibra-
`tion (Hartley and Zisserman 2004; Faugeras and Luong 2001) but are not used extensively in
`this book.
`
`3D points. Point coordinates in three dimensions can be written using inhomogeneous co-
`ordinates a: = (:c, y, z) E R3 or homogeneous coordinates :i': = (§,§,2,1IJ) 6 733. As before,
`it is sometimes useful to denote a 3D point using the augmented vector 5: = (:3, y, z, 1) with
`i 2 1.35.
`
`3D planes can also be represented as homogeneous coordinates fit = (a, b, c, d)
`3D planes.
`with a corresponding plane equation
`'
`
`:E-fitzax+by+cz+d=0.
`
`(2.7)
`
`We can also normalize the plane equation as m = (715, fry, fiz, d.) 2 (fit, d) with ”it“ = 1.
`In this case, it. is the normal vector perpendicular to the plane and d is its distance to the
`
`origin (Figure 2.2b). As with the case of 2D lines, the plane at infinity fit = (0,0,0,1),
`which contains all the points at infinity, cannot be normalized (i.e., it does not have a unique
`normal or a finite distance).
`
`We can express it as a function of two angles (9, 95),
`
`ii = (cos 19 cos <35, sine cos (:5, sin 95),
`
`(2.8)
`
`i.e., using spherical coordinates, but these are less commonly used than polar coordinates
`since they do not uniformly sample the space of possible normal vectors.
`
`3D lines. Lines in 3D are less elegant than either lines in 2D or planes in 3D. One possible
`representation is to use two points on the line, (p, q). Any other point on the line can be
`expressed as a linear combination of these two points
`
`as shown in Figure 2.3. If we restrict O S A S 1, we get the line segment joining p and q.
`
`r = (1 — 20p + Aq,
`
`(2.9)
`
`APPL-1013 / Page 9 of 63
`
`APPL-1013 / Page 9 of 63
`
`
`
`32
`
`2 Image formation
`
`
`
`Figure 2.3 31) line equation, r- : (1 — Mp + Ag.
`
`If we use homogeneous coordinates, we can write the line as
`
`F = #15 + Ari.
`
`(2.10)
`
`A special case of this is when the second point is at infinity, i.e., ti 2 (CL, rig, dz, 0) : (Ci, 0).
`Here, we see that d is the direction of the line. We can then re-write the inhomogeneous 3D
`line equation as
`
`r=p+Aai
`
`(2.11)
`
`A disadvantage of the endpoint representation for 3D lines is that it has too many degrees
`of freedom, i.e., six (three for each endpoint) instead of the four degrees that a 3D line truly
`has. However, if we fix the two points on the line to lie in specific planes, we obtain a rep—
`resentation with four degrees of freedom. For example, if we are representing nearly vertical
`
`fines, then 2 : 0 and z = 1 form two suitable planes, i.e., the (a, y) coordinates in both
`planes provide the four coordinates describing the line. This kind of two—plane parameter-i—
`zation is used in the iigkrfieid and Lumigraph image-based rendering systems described in
`Chapter 13 to represent the collection of rays seen by a camera as it moves in front of an
`object. The two-endpoint representation is also useful for representing line segments, even
`when their exact endpoints cannot be seen (only guessed at).
`If we wish to represent all possible lines without bias towards any particular orientation,
`we can use Pificksr coordinates (Hartley and Zisserrnan 2004, Chapter 2; Faugeras and Luong
`2001, Chapter 3). These coordinates are the six independent non-zero entries in the 4 X4 skew
`symmetric matrix
`
`L = 156T - 615T,
`
`(2.12)
`
`where 15‘ and {f are any two (non-identical) points on the line. This representation has only
`
`four degrees of freedom, since L is homogeneous and also satisfies det(L) = 0, which results
`in a quadratic constraint on the Plucker coordinates.
`
`In‘practice, the minimal representatiOn is not essential for most applications. An ade-
`quate model of 3D lines can be obtained by estimating their direction (which may be known
`ahead of time, e.g., for architecture) and some point within the visible portion of the line
`(see Section 7.5.1) or by using the two endpoints, since lines are most often visible as finite
`line segments. However, if you are interested in more details about the topic of minimal
`line pararneterizations, F'orstner (2005) discusses various ways to infer and model 3D lines in
`projective geometry, as well as how to estimate the uncertainty in such fitted models.
`
`APPL—1013 / Page 10 of 63
`
`APPL-1013 / Page 10 of 63
`
`
`
`2.1 Geometric primitives and transformations
`
`33
`
`translation
`
`y m0 projective
`fl#4
`fl
`
`
`Euclidean
`affine
`
`.
`
`x
`
`Figure 2.4 Basic set of 2D planar transformations.
`
`3D quadrics. The 3D analog of a conic section is a quadric surface
`
`for = 0
`
`(2.13)
`
`(Hartley and Zisserman 2004, Chapter 2). Again, while quadric surfaces are useful in the
`study of multi-view geometry and can also serve as useful modeling primitives (spheres,
`ellipsoids, cylinders), we do not study them in great detail in this book.
`
`2.1.2 2D transformations
`
`Having defined our basic primitives, we can now him our attention to how they can be trans
`formed. The simplest transformations occur in the 213 plane and are illustrated in Figure 2.4.
`
`Translation.
`
`2D translations can be written as :c’ = a: + t or
`
`where I is the {2 x 2) identity matrix or
`
`w’=[r t]:r:
`
`(2.14)
`
`-
`
`(2.15)
`
`_,
`
`I
`
`$2M. lie
`
`t
`
`_
`
`where D is the zero vector. Using a 2 x 3 matrix results in a more compact notation, whereas
`using a full-rank 3 x 3 matrix (which can be obtained from the 2 x 3 matrix by appending a
`[0T 1] row) makes it possible to chain transformations using matrix multiplication. Note that
`in any equation where an augmented vector such as an appears on both sides, it can always be
`replaced with a full homogeneous vector 5:.
`
`Rotation + translation. This transformation is also known as 2D rigid body motion or
`the 2D Euclidean transformation (since Euclidean distances are preserved). It can be written
`asm’=Ra:+tor
`
`where
`
`m’=[R fir
`
`R:
`
`_
`cosfl —sm6
`5111 6
`cos 6
`
`(2.16)
`
`(2.17)
`
`is an orthonormal rotation matrix with RRT = I and IRI = l.
`
`APPL—1013 / Page 11 of 63
`
`APPL-1013 / Page 11 of 63
`
`
`
`34
`
`2 Image formation
`
`Scaled rotation. Also known as the similarity transform, this transformation can be ex—
`
`pressed as m“ = stc + t where s is an arbitrary scale factor. It can also be written as
`
`a:=[sR t]s=[: ‘ab Ha
`
`I!
`
`s.
`
`(2.18}
`
`where we no longer require that a2 + b2 = 1. The similarity transform preserves angles
`between Lines.
`
`Affine. The affine transformation is written as as" = Ada, where A is an arbitrary 2 x 3
`matrix, i.e.,
`
`02 ] as.
`
`G12
`
`(119)
`
`w: = [
`
`a
`
`on
`
`are
`
`o
`
`or
`
`air
`
`o
`
`_
`
`Parallel lines remain parallel under affine transformations.
`
`Projective. This transformation, also known as a perspective transform or homogrophy,
`operates on homogeneous coordinates
`
`~ 2 fig,
`
`(2.20)
`
`where 1:! is an arbitrary 3 x 3 matrix. Note that Iii is homogeneous, i.e., it is only defined
`up to a scale, and that two H matrices that differ only by scaledare equivalent. The resulting
`homogeneous coordinate :3" must be normalized in order to obtain an inhomogeneous result
`9:, i.e.,
`
`._
`_
`i _ hmm + hliy + his
`93’ _ how '1' hint; + hm
`hoax + how + 471-22
`y
`haox + h212," + has
`Perspective transformations preserve straight lines (i.e., they remain straight after the trans-
`formation).
`
`.
`
`(2.21)
`
`an
`
`d
`
`Hierarchy 0! 2D transformations. The preceding set of transformations are illustrated
`in Figure 2.4 and summarized in Table 2.1. The easiest way to think of them is as a set
`of (potentially restricted) 3 x 3 matrices operating on 2D homogeneous coordinate vectors.
`Hartley and Zisserrnan (2004) contains a more detailed description of the hierarchy of 2D
`planar transformations.
`The above transformations form a nested set of groups, i.e., they are c105ed under com—
`
`(This will be important
`position and have an inverse that is a member of the same group.
`later when applying these transformations to images in Section 3.6.) Each (simpler) group is
`a subset of the more complex group below it.
`
`Ctr-vectors. While the above transformations can be used to transform points in a 2D
`
`plane, can they also he used directly to transform a line equation? Consider the homogeneous
`equationi i:-— D. If we transform a: = Ha: we obtain
`a.
`.4
`is=iHs=(nTifs =t- =0,
`
`(2.22)
`
`~ —r~
`4
`.
`re, 1—— H 1. Thus, the action of a projective transformation on a co--vecror such as a 2D
`line or 3D normal can be represented by the transposedinverse of the matrix, whichrs equiv-
`alent to the oojfoinr of H, since projective transformation matrices are homogeneous. Jim
`
`APPL—1013 / Page 12 of 63
`
`APPL-1013 / Page 12 of 63
`
`
`
`2.1 Geometric primitives and transformations
`
`35
`
`
`
`
`
` Transformation Matrix # DoF Preserves Icon
`
`
`
`
`
`translation
`rigid (Euclidean)
`similarity
`affine
`projective
`
`[ I I t ]2><3
`[ R l t ]2X3
`[ sR I t ]2X3
`[ A ]2x3
`[ I? L 3
`
`X
`
`2
`3
`4
`6
`8
`
`C]
`orientation
`lengths O
`angles
`0
`parallelism U
`straight lines
`i
`
`Table 2.1 Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed
`in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 2 X 3
`matrices are extended with a third [0T 1] row to form a full 3 x 3 matrix for homogeneous coordinate transforma-
`tions.
`
`Blinn (1998) describes (in Chapters 9 and 10) the ins and outs of notating and manipulating
`eo—vectors.
`'
`
`While the above transformations are the ones we use most extensively, a number of addi—
`tional transformations are sometimes used.
`
`Stretchisquash.
`
`'Ihis transformation changes the aspect ratio of an image,
`f
`m = smart—ta:
`!
`y = syy+tw
`
`and is a restricted form of an affine transformation. Unfortunately, it does not nest cleanly
`with the groups listed in Table 2.1.
`
`Planar surlace flow. This eight-parameter transformation (Horn 1986; Bergen, Anah—
`dan, Hanna e: at. 1992; Girod, Greiner, and Niemann 2000),
`
`r
`2
`a = :10
`I ogy E can may
`ole:
`
`
`:9" = as —— (34$ + {153; —l— sang —— asxy,
`
`arises when a planar surface undergoes a small 3D motion. It can thus be thought of as a
`small motion approximation to a full homography. Its main attraction is that it is linear in the
`motion parameters, at, which are often the quantities being estimated.
`
`Bilinear interpolant. This eight-parameter transform (Wolberg 1990),
`
`a,"r = on + ow + ogy —— nary
`
`y“ = 0.3 + (149.: + 0.53; —— [ct-fry,
`
`
`
`can be used to interpolate the deformation due to the motion of the four corner points of
`a square.
`(In fact, it can interpolate the motion of any four non-collinear points.) While
`
`APPL—1013 / Page 13 of 63
`
`APPL-1013 / Page 13 of 63
`
`
`
`36
`
`2 Image formation
`
`
`
`
`
`
`
` Transformation Matrix # DoF Preserves Icon
`
`
`
`
`
`translation
`rigid (Euclidean)
`similarity
`affine
`projective
`
`[ I i t ]3x4
`[ R j r. ]3x4
`[ sR I
`t; h“
`[ A ]3M
`I? i m
`
`3
`6
`7
`12
`15
`
`|:i
`orientation
`lengths O
`angles
`0
`parallelism E
`straight lines a
`
`Table 2.2 Hierarchy of 3D coordinate transformations. Each transformation also preserves the properties listed
`in the rows below it, La, similarity preserves not only angles but also parallelism and straight lines. The 3 x 4
`matrices are extended with a fourth [0T 1] row to form a full 4 x 4 matrix for homogeneous coordinate transfor—
`mations. The mnemonic icons are drawn in 213 but are meant to suggest transformations occurring in a full 3D
`cube.
`
`the deformation is linear in the motion parameters, it does not generally preserve straight
`
`lines (only lines parallel to the square axes). However, it is often quite useful, e.g., in the
`interpolation of sparse grids using splines (Section 8.3).
`
`2.1.3 3D transformations
`
`The set of three-dimensional coordinate transformations is very similar to that available for
`2D transformations and is summarized in Table 2.2. As in 2D, these transformations form a
`
`nested set of goups. Hartley and Zisserman (2004, Section 2.4) give a more detailed descrip-
`tion of this hierarchy.
`
`Translation.
`
`3D translations can be written as m" = a: + t or
`
`where I is the (3 x 3) identity matrix and 0 is the zero vector.
`
`az’=[r
`
`t]fi
`
`(2.23)
`
`Rotation + translation. Also known as 3D rigid body motion or the 3D Euclidean trans-
`
`formation, it can be written as m’ 2 Rs: + t or
`
`m'=[R Hr:
`
`(2.24)
`
`where R is a 3 x 3 orthonormal rotation matrix with RRT = I and JR| = 1. Note that
`sometimes it is more convenient to describe a rigid motion using
`
`ac’ = R(::c — c) = Re: — RC,
`
`(2.25)
`
`where c is the center of rotation (often the camera center).
`
`Compactiy pararneterizing a 3D rotation is a non-trivial task, which we describe in more
`detail below.
`
`APPL—1013 / Page 14 of 63
`
`APPL-1013 / Page 14 of 63
`
`
`
`2.1 Geometric primitives and transformations
`
`37
`
`Scaled rotation. The 31) similarity transform can be expressed as at“ = sRm + t where
`s is an arbitrary scale factor. It can also be written as
`
`This transformation preserves angles between lines and planes.
`
`3": [ 3R t ]:r:.
`
`(2.26)
`
`,
`
`Affine. The affine transform is written as :c’ 2 A53, where A is an arbitrary 3 x 4 matrix,
`i.e.,
`
`93’f =
`
`aon
`
`{310
`320
`
`I101
`
`(111
`(121
`
`Goa
`
`{112
`9:22
`
`ass
`
`(113
`0as
`
`52.
`
`(2.27)
`
`Parallel lines and planes remain parallel under affine transformations.
`
`Projective. This transformation, variously known as a 3D perspective transform, homog-
`raphy, or collimation, operates on homogeneous coordinates,
`
`a z are,
`
`(2.28)
`
`where I?!" is an arbitrary 4 X 4 homogeneous matrix. As in 2D, the resulting homogeneous
`coordinate 5' must be normalized in order to obtain an inhomogeneous result as. Perspective
`
`transformations preserve straight lines (i.e., they remain straight after the transformation).
`
`2.1.4 3D rotations
`
`The biggest difference between 2D and 3D coordinate transformations is that the parameter-
`ization of the 3D rotation matrix R is not as straightforward but several possibilities exist.
`
`Euler angles
`
`A rotation matrix can be formed as the product of three rotations around three cardinal axes,
`e.g., r, y, and z, or :s, y, and r. This is generally a bad idea, as the result depends on the
`order in which the transforms are applied. What is worse, it is not always possible to move
`smoothly in the parameter space, i.e., sometimes one or more of the Euler angles change
`dramatically in response to a small change in rotation.1 For these reasons, we do not even
`give the formula for Euler angles in this book—interested readers can look in other textbooks
`or technical reports (Faugeras 1993; Diebel 2006). Note that, in some applications, if the
`rotations are known to be a set of uni-axial transforms, they can always be represented using
`
`an explicit set of rigid transformations.
`
`Axlslangle (exponential twist)
`
`A rotation can be represented by a rotation axis fir, and an angle 9, or equivalently by a 3D
`vector to = 9171.. Figure 2.5 shows how we can compute the equivalent rotation. First, we
`
`project the vector 1: onto the axis r“: to obtain
`
`1 In robotics, this is sometimes referred to as gimbal lock.
`
`o“ z «are - 1.!) = (as%,
`
`(2.29)
`
`APPL—1013 / Page 15 of 63
`
`APPL-1013 / Page 15 of 63
`
`
`
`38
`
`2 Image formation
`
`
`
`Figure 2.5 Rotation around an axis ft by an angle 6.
`
`which is the component of 1; that is not afiected by the rotation. Next, we compute the
`perpendicular residual of 'o from 11,
`
`vi =v—v" = (I—fifiT 'u.
`
`We can rotate this vector by 90° using the cross product,
`
`ox=fixo:[fi]xo,
`
`(2.30)
`
`(2.31)
`
`where [fix is the matrix form of the cross product operator with the vector 73. 2 (fix, fig, fiz),
`
`[an =
`
`a —a2
`a;
`0
`
`a,
`ea .
`
`(2.32)
`
`Note that rotating this vector by another 90° is equivalent to taking the cross product again.
`
`andhence
`
`”xx = 15» X 11x = [fifiv = ‘Ul:
`
`A 2
`v" —— 'u — 'UJ_ = t: +11,“ = (I+ [n}X)v.
`
`_
`
`We can now compute the in-plane component of the rotated vector u as
`
`at = cos 3m + sin (ii-ox = (sin 6[fi}x — cos €[fi]i)o.
`
`Putting all these terms together, we obtain the final rotated vector as
`
`uzui +1)“ = (I+sin9[fi.}x +(1—cosa)[a]§)v.
`
`(2.33)
`
`We can therefore write the rotation matrix corresponding to a rotation by 6 around an axis fi.
`as
`
`ma, 9) = I + sin amp + (1 — cogenafi,
`
`(2.34)
`
`which is known as Rodriguez ’5 formula (Ayache 1989).
`
`The product of the axis ti. and angle 9, w = 9r“: = (mm, my, (dz), is a minimal represen-
`tation for a 3D rotation. Rotations through common angles such as multiples of 90" can be
`
`represented exactly (and converted to exact matrices) if 8 is stored in degrees. Unfortunately,
`
`APPL—1013 / Page 16 of 63
`
`APPL-1013 / Page 16 of 63
`
`
`
`2.1 Geometric primitives and transformations
`
`39
`
`this representation is not unique, since we can always add a multiple of 360° (211' radians) to
`9 and get the same rotation matrix. As well, (ii, 3) and (—fi, —6) represent the same rotation.
`However, for small rotations (e.g., corrections to rotations), this is an excellent choice.
`
`In particular, for small (infinitesimal or instantaneous) rotations and 6 expressed in radians,
`s,
`Rodriguez’s formula simplifies to
`
`12(9)) as I + sin 6[fi]x m I + [8171.]>< =
`
`1
`Luz
`—toy
`
`an,
`1
`(4),:
`
`toy
`~wx
`1
`
`,
`
`(2.35)
`
`which gives a nice linearized relationship between the rotation parameters to and R. We can
`also write R(w)o m 'o + w x 1), which is handy when we want to compute the derivative of
`Rt! with respect to to,
`
`0
`
`21:: = —[o]x = —z
`y
`
`z —y
`0
`a:
`—m
`0
`
`.
`
`(2.36)
`
`Another way to derive a rotation through a finite angle is called the exponential twist
`(Murray, Li, and Sastry 1994). A rotation by an angle 6' is equivalent to is rotations through
`fl/k. In the limit as it: _+ 00, we obtain
`
`R(a, 9) = :3an + épapp = exp [w]x.
`
`(2.37)
`
`If we expand the matrix exponential as a Taylor series (using the identity [iiiKiJr2 = —[fi.]§‘<.
`k > 0, and again assarning B is in radians), ~
`
`9
`s
`exp[w]x = I+S[a]x+3ei+ga1§,+
`A2
`93
`33
`A
`92
`= 1+(9-§+---)[nlx+(§—E+-~)[nlx
`= r+sma{a]x+(1—coss)[a}i,
`
`(2.38)
`
`which yields the familiar Rodriguez’s formula.
`
`Unit quaternions
`
`The unit quaternion representation is closely related to the anglea’axis representation. A unit
`quaternion is a unit length 4—veetor whose components can be written as q 2 (gm, qy, qz, qw)
`or q = (r, y, z, to) for short. Unit quaternions live on the unit sphere J|q|| = 1 and antisocial
`(opposite sign) quatemions, q and —q, represent the same rotation (Figure 2.6). Other than
`this ambiguity (dual covering), the unit quaternion representation of a rotation is unique.
`Furthermore, the representation is continuous, i.e., as rotation matrices vary continuously,
`one can find a continuous quaternion representation, although the path on the quaternicn
`sphere may wrap all the way around before returning to the “origin" go 2 (U, U, 0, 1). For
`these and other reasons given below, quaternions are a very popular representation for pose
`and for pose interpolation in computer graphics (Shoemake 1985).
`
`APPL—1013 / Page 17 of 63
`
`APPL-1013 / Page 17 of 63
`
`
`
`40
`
`2 Image formation
`
`
`
`Figure 2.6 Unit quaternions live on the unit sphere ”£1“ = 1. This figure shows a smooth trajectory through the
`three quaternions qo, Q1, and (12. The antipodal point to qg, namely -q2, represents the same rotation as Q2.
`
`Quaternions can be derived from the axisfangle representation through the formula
`
`q = (o, it!) = (sin gfi, cos g),
`
`(2.39)
`
`where fl and 9 are the rotation axis and angle. Using the trigonometric identities sint? :-
`2 sin 3 cos % and (1 — cos 6’) = 2 sin2 %, Rodriguez’s formula can be converted to
`
`me, a) = I + sin Slfilx + (1 - cos snag
`
`= I + 2w[v] x + 2a];
`
`(2.40)
`
`This suggests a quick way to rotate a vector v by a quaternion using a series of cross products,
`scalings, and additions. To obtain a formula for R(q) as .a function of (a, y, z, to), recall that
`
`[1:]x =
`
`y
`0 —z
`z
`0 —:s
`—y
`a
`0
`
`and [MK 2
`
`—y2 -— 2.2
`my
`m2:
`
`my
`—m2 — 32
`yz
`
`mz
`ya
`~$2 — y2
`
`We thus obtain
`
`1 — 2(y2 + 22)
`
`2(my — 2w)
`
`2(wz + yw)
`
`R(q) =
`
`2(xy + 211;)
`
`1 — 2(ch + :53)
`
`2(yz .1 mo)
`
`.
`
`(2.41)
`
`2(w2 — w)
`
`2(w + W)
`
`1 - 2($2 + y?)
`
`The diagonal terms can be made more symmetrical by replacing l — 2(y2 + zz) with (:32 +
`1n2 — yg — 22), etc.
`The nicest aspect of unit quaternions is that there is a simple algebra for cemposing rota-
`tions expressed as unit quaternjons. Given two quaternions 90 = (no, we) and q1 = (in, 1.91),
`the quatemion multiply operator is defined as
`
`Q2 = (10911 = (’00 X '01 + 10091 + with), ”wow: — ’00 "01):
`
`(2-42)
`
`with the property that R(q2) = R(qD)R(qi). Note that quaternion multiplication is not
`commutative, just as 3D rotations and matrix multiplications are not.
`
`APPL—1013 / Page 18 of 63
`
`APPL-1013 / Page 18 of 63
`
`
`
`2.1 Geometric primitives and transformations
`
`41
`
`
`
`procedure slerp(qo, (11, oz):
`
`1. qr = til/so = (sum)
`
`2. fits, < Othen qr <— —q,.
`
`3. 6,. = 2tan—1(|[o,.[|/wr)
`
`4- fir : NWT) : “gr/””1"”
`
`9
`
`6. get 2 (sin £211":th £51)
`
`7- return 9’2 = case
`
`
`Algorithm 2.] Spherical linear interpolation (slerp). The axis and total angle are first computed from the quater—
`nion ratio. (This computation can be lifted outside an inner loop that generates a set of interpolated position for
`animation.) An incremental quaternion is then computed and multiplied by the starting rotation quaternion.
`
`Taking the inverse of a quaternion is easy: Just flip the sign of v or to (but not both!).
`(You can verify this has the desired effect of tranSposing the R matrix in (2.41).) Thus, we
`can also define quaternion division as
`
`‘12 = 90/91 = Gloéif1 = ('00 X “”1 4‘ “wot-’1 — “wit-’0: —w0w1 *- ‘Uo "01)-
`
`(2-43)
`
`This is useful when the incremental rotation betWeen two rotations is desired.
`
`In particular, if we want to determine a rotation that is partway between two given rota-
`tions, we can compute the incremental rotation. take a fraction of the angle, and compute the
`new rotation. This prOCedure is called spherical linear interpolation or slerp for short (Shoe-
`make 1985) and is given in Algorithm 2.1. Note that Shoemake presents two formulas other
`than the one given here. The first exponentiates or by alpha before multiplying the original
`quatemion.
`
`while the second treats the quaternions as 4-vectors on a sphere and uses
`
`(12 = 9390:
`
`_ sin(l — (3):?
`sino‘
`
`sin 0:9
`‘10 + m9]:
`
`9‘2
`
`(2-44)
`
`(245)
`
`where :9 = cos—1(q0 - ‘11) and the dot product is directly between the quaternion 4—vectors.
`All of these formulas give comparable results, although care should be taken when go and q1
`are close together, which is why I prefer to use an arctangent to establish the rotation angle.
`
`Which rotation representation is better?
`
`The choice of representation for 3D rotations depends partly on the application.
`The axislangle representation is minimal, and hence does not require any additional con—
`straints on the parameters (no need to re-normalize after each update). If the angle is ex-
`pressed in degrees, it is easier to understand the pose (say, 90° twist around m-axis), and also
`
`APPL—1013 / Page 19 of 63
`
`APPL-1013 / Page 19 of 63
`
`
`
`42