`Szeliski et al.
`
`111111111111111111111111111111111111111111111111111111111111111111111111111
`US006097854A
`[11] Patent Number:
`[45] Date of Patent:
`
`6,097,854
`Aug.l, 2000
`
`[54)
`
`IMAGE MOSAIC CONSTRUCriON SYSTEM
`AND A PI'ARATUS WITH PATCH-BASED
`ALIGNMENT, GLOBAL BLOCK
`ADJUSTMENT AND PAIR-WISE MOTlON(cid:173)
`BASED LOCAL WARPING
`
`[75]
`
`Inventors: Richard S7.eliski; Heung-Yeung Shum,
`both of Bellevue, Wash.
`
`[73] As.<>ignee: Microsoft Corporatio n, Redmond,
`Wash.
`
`[21] Appl. No.: 08/905,023
`
`Aug. 1, 1997
`
`[22] Filed:
`Int. C l.7
`(51]
`................... .......... .......................... G06K 9/36
`[52) U.S. C l . ............................................. 382/284; 345/435
`[58] Field of Search ..................................... 382/107, 154,
`382/282, 284, 294, 296; 345/419, 425-438;
`348/42, 263, 580, 584, 598
`
`[56]
`
`References C ited
`
`U.S. PATENT DOCUMENTS
`
`5, 187,754
`5,488,674
`5,581,638
`5,611,000
`5,649,032
`5,907,626
`
`2/1993 Currinl cl al ........................... 382/284
`1/ 1996 Burl cl al. ............................... 382/284
`12/ 1996 Givens cl al ........................... 382/294
`3/1997 Sze1iski et al. ......................... 382!294
`7/ !997 Burl e l al. .............. ................. 382!294
`............................ 382!284
`5! 1999 Toklu el al.
`
`OTHER PUBLICATIONS
`
`Richard Szelisk:i and James Coughlan, "Spline-Based Image
`Registration," Tech Report CRL 94/1, Digital Equipment
`Corporation, Cambridge Research Lab, Cambridge, MA,
`Apr., 1994.
`P. Anandan et al., editors, IEEE Workshop on Representa(cid:173)
`tions o( Visual Scenes, Cambridge, Massachusells, Jun.
`1995, IEEE Computer Society Press. PP. 10-17.
`Anonymous. Creating full view panoramic image mosaics
`and texture-mapped models. In Computer Graphics Pro(cid:173)
`ceedings Annual Conference Series, Proc. SIGGRAPI-1'97
`(Los Angeles) Aug. 1997, ACM S!GGRAPH. PP. 251-258.
`
`J. R. Bergen, P. Anaodan, K. l Hanna, and R. Hjngorani.
`Hierarchical model- based motion estimation. In Second
`European Conference on Computer Vision (ECCY'92), pp.
`237-252, Santa Margberita Liguere, Italy, May 1992.
`Springer-Verlag.
`S.J. Gortler, R. Grzeszczuk, R. Szeliski, and M.F. Cohyen.
`The lumigrap. In Computer Graphics Proceedings, Annual
`Conference Series, pp. 43-54, Proc. S IGGRAPH' 96 (New
`Orleans), Aug. 1996. ACM SIGGRAPII.
`S. E. Cben. QuickTime VR- an image-based approach to
`virtual environment navigation. Computer Graphics (SIG(cid:173)
`GRAPH'95), pp. 29-38, Aug. 1995.
`R. J. Hartley. Self-calibration from multiple views of a
`rotating camera. In Tbird European Conference on Com(cid:173)
`puter Vision (ECCV'94), val. l, pp. 471-478, Stockholm,
`Sweden, May 1994. Springer- Verlag.
`M. Irani, S. Hsu, and P. Anandan. Video compression using
`mosaic representations. Signal Proces.siog: Image Commu(cid:173)
`nication, 7:529-522, 1995.
`S. B. Kang and R. Weiss. Characterization of errors in
`compositing panoramic images, Technical Report 96/2,
`Digital Equipment Corporation, Cambridge Research Lab,
`Jun. 1996.
`
`(List continued on next page.)
`
`Primary Exnminer--Bbavesh Mehta
`Attorney, Agent, or Firm-Micbaelson & Wallace; Peter L.
`Michaelson; Robert M. Wallace
`
`[57]
`
`ABSTRACT
`
`Tbe system of the invention aligns a set plural overlapping
`images useful in constructing a mosaic by performing patch(cid:173)
`based alignment of the set of overlapping images to produce
`a set of warped images, performing block adjustment of the
`set of warped images to produce a set of block-adjusted
`images, and then performing pair-wise motion-based local
`warping of tbe set of block-adjusted images.
`
`160 Claims, 25 Drawing Sheets
`
`(4 of 25 Drawing Sheet(s) .Filed in Color)
`
`Align Ex. 1006
`U.S. Patent No. 9,962,244
`
`0001
`
`
`
`6,097,854
`Page 2
`
`OTI-TER PUBLICATIONS
`
`M.-C. Lee et al. A layered video object ocding system using
`sprite and affine motion model. IEEE Transactions on Cir(cid:173)
`cuits and Systems For Video technology, 7(1):130-145, Feb.
`1997.
`M. Levoy and P. Hanrahan. Light field rendering. lo Com(cid:173)
`puter Graphics Proceedings, An nual Conference Series, pp.
`31-42, Proc. SIGGRAPH'96 (New Orleans), Aug. 1996.
`ACM SIGGRAPH.
`B.D. Lucas and T. Kanade. An iterative image registration
`technique with an application io stereo vision. In Seventh
`loteroatlooal Joint Conference on Artificial intelligence
`(IJCAl-81), pp. 674-679, Vancouver, 1981.
`H. E. Malde. Panoramic photographs. American Scientist,
`71(2):132-140, Mar.- Apr. 1983.
`L. McMillian and G. Bishop. Plenoptk modeling: An
`image-based rendering system. Computer Graphics (SIG(cid:173)
`GRAPl-1'95), pp. 39-46, Aug. 1995.
`S . Mann and R. W. Picard. Virt11al bellows: Constructing
`high-quality images from video. In First IEEE International
`Conference on Image Processing (ICIP-94), vol. 1, pp.
`363-367, Austin, Texas, Nov. 1994.
`W. H. Press, B.P. Flannery, S. A Teukolsky, and w:r.
`VCetterling. Numerical Recipes in C: The Art of Scientific
`Computing. Cambridge University Press, Cambridge,
`England, second edition, L992.
`G. S. Stein. Accurate internal camera calibration using
`rotation, with analysis of sources of error. In Fifth Interna(cid:173)
`tional Conference on Computer Vision (ICCV'95), pp.
`230-236, Cambridge, Massachusetts, Jun. 1995.
`
`R. Szeliski. Image mosaicing for tcle-rcality applications. In
`IEEE Workshop on Applications of Computer Vision
`(WACV'94), pp. 44-53, Sarasota, Florida, Dec. I 994. IEEE
`Computer Society.
`
`R. Szeliski. Video mosaics for virtual environments. IEEE
`Computer Graphics and Applications, pp. 22-30, Mar. 1996.
`
`G. Wolbcrg " Dig ital Image Warping" IEEE Computer Soci(cid:173)
`ety Press, Los Alamitos, Ca. 1990.
`
`Ned Greene, New York Institute of Technology "Environ(cid:173)
`ment Mapping and Other AQpplications Of World Projec(cid:173)
`tions" iEEE, Nov. 1986, pp. 21-29.
`
`Roger Y. Tsa.i. "A Versatile,e Camera Calibration Technique
`for High- Accuracy 3D Machine Vision Metrology Using
`Off- Tbe-ShelfTV Camerads and Lenses" IEEE Journal of
`Robotics and Automation vol. RA-3, Aug. 1987, pp.
`323-344.
`
`Hank Weghorst, Gary Hooper, and Donald P. Greenberg,
`Cornell University ACM Transactions on Graphics, vol. 3,
`No. 1, Jan. 1984, PP. 52-69.
`
`Lance Wi[)jams. Computer Graphics Laboratory New York
`Institute of Technology Old Westbury, New York, ACM
`0-89791-10-9-1/83, Computer Graphics vol. 17, No.3, JuL
`1983, pp. 1-11.
`
`0002
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 1 of 25
`
`6,097,854
`
`ESTIMATE FOCAL LENGTH
`
`11 0
`
`IMAGE ALIGNMENT
`(USING PATCH-BASED
`ALGORITHM, PREFERABLY)
`
`120
`
`GLOBAL IMAGE ALIGNMENT
`
`130
`
`DEGHOSTING LOCAL ALIGNMENT
`
`ENVIRONMENT{fEXTURE MAPPING OF COLORS
`BLENDED FROM OVERLAPPING IMAGES
`
`150
`
`STANDARD 3-D GRAPHICS SOFlWARE
`
`~60
`
`OUTPUT
`
`IMAGE
`
`FIG. 1
`
`0003
`
`
`
`~ I
`
`21
`~
`PROCESSING
`UNIT
`
`r - - - - - - - - - - - - - - - - - - - - - - - . .....-------.
`I SYSTEM MEMORY r-22
`I
`20 I
`---~24
`ROM
`.--r-~26
`BIOS
`~ 25
`._/ ~35
`RAM
`OPERATING A
`SYSTEM
`APPLICATION
`PROGRAMS
`
`48
`..--+---'----.
`VIDEO
`ADAPTOR
`
`MONITOR
`
`47
`
`I
`
`, . _ _ _ _ ____J
`
`23
`
`l _____ l
`I
`53 I
`46
`.--S~ER'-IA--+L---. NETWORK I
`OPTICAL
`DRIVE
`PORT
`INTERFACE
`I
`INTERFACE INTERFACE
`I
`___ j
`
`51
`
`I
`I
`I
`I
`: ~~~~~
`: I PR8~rRf'M 1,
`!.. -
`_______ ,....___
`)mmu,•J
`27
`
`,..,.."'
`_,-'
`,.. ....
`_,-'
`
`, .... '"'
`
`I
`
`I L - - - - r - - - '
`
`\
`
`\ \
`
`\
`
`\
`
`\
`
`I
`OPERATING !APPLICATION
`SYSTEM
`PROGRAMS
`
`: 29
`PROGRAM !PROGRAM
`MODULES. DATA
`
`31
`
`42
`
`KEYBOARD
`
`35
`
`36
`
`37
`
`38
`
`FIG. 2A
`
`40
`
`APPLICATION
`36~ PROGRAMS
`
`Cj
`•
`rJ).
`•
`~
`
`~ .... ~ = ....
`
`)o>
`c:
`~ .....
`N g
`
`~
`
`fJJ
`::r
`
`(">
`
`("> -N
`0 -N
`
`Vl
`
`0\
`
`~ = \0
`...
`"-1
`00
`Ul
`~
`
`0004
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 3 of 25
`
`6,097,854
`
`220 \
`
`•
`•
`'
`I ~ I
`~ I' ~
`-
`
`,
`
`2 3\0
`I IMAGE
`MEMORY
`
`oJJ
`
`240
`I
`
`PROGRAM
`MEMORY
`IMAGE
`ALIGNMENT
`ALGORITHM
`
`2\0
`IMAGE WARP
`MEMORY
`Mo, M1 , M2 ,
`M3 , • • • Mk
`
`250
`\
`
`MICROPROCESSOR
`
`'
`
`270
`I
`\
`ENVIRONMENT/
`TEXTURE MAP
`MEMORY
`3-D MODEL
`
`~
`
`IMAGE OUTPUT/
`DISPLAY/MONITOR
`
`@
`-
`TEXTURED MAP
`~ l
`
`7 11 :\
`T
`l
`
`t--
`
`.J
`
`\ I
`
`-
`-
`
`FIG. 28
`
`0005
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 4 of 25
`
`6,097,854
`
`PANARAMIC
`"----IM_A_GE____ _ _______ /
`-y-
`FIG. 3
`
`0006
`
`
`
`U.S. Patent
`
`Aug. l , 2000
`
`Sheet 5 of 25
`
`6,097,854
`
`p
`WORLD3D y
`COORD. SYS
`
`(0, 0, 0)
`
`~------------- _____________ /
`
`-y-
`FIG. 4
`
`·'
`
`-
`FIG. 5
`
`0007
`
`
`
`U.S. Patent
`
`Aug. 1, 2000
`
`Sheet 6 of 25
`
`6,097,854
`
`-~-~\---.-610
`0 I
`
`X
`
`I
`I
`
`-~~>-610
`x'
`0:
`
`I
`
`I
`I
`
`Io (x)
`
`I 1 (x)
`
`~M
`
`FIG. 6
`
`~(I+ D)
`
`(x0) y; )
`100~·
`(xo, Yo
`
`(x~, y; )
`) .__..,---105
`
`')L (' ')
`Xo, Yo
`x~, Yo
`107~~
`~__..,---102
`
`(
`
`I
`
`FIG. 7
`
`0008
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 7 of 25
`
`6,097,854
`
`INPUT PARTIALLY OVERLAPPING IMAGES
`Io (x) AND I 1 (x)
`
`810
`
`820
`
`INITALIZE TRANSFORM MATRIX M (E.G., M ...._ I)
`
`FOR U\ TEST VERSION OF 11 (x), FIND A DEFORMATION
`D OF x WHICH PROVIDES BETTER REGISTRATION OF
`11 WITH 10 , WHERE 11 (x) = 11 (x')
`
`830
`
`M-+- (It D) M
`
`840
`
`860
`
`UPDATE 11, BY RESAMPLING (925) THE LA TEST
`VERSION OF l1 USING THE U\TESTVERSION OF M
`
`NO
`
`880
`
`OUTPUT U\TEST VERSION OF
`11 AND M
`
`FIG. 8
`
`0009
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 8 of 25
`
`6,097,854
`
`905
`
`910
`
`I1 (x)
`
`935
`
`920
`940 X
`- -
`-
`-
`JACOBIAN
`GRADIENT
`955
`
`925
`
`+
`
`930
`
`9i
`
`J i
`
`9iJi9i
`
`Jigi
`
`1-
`
`-
`
`I
`
`I
`I 950
`I
`1 SUMMER
`~ I
`I
`
`T T
`Jig igiJi
`
`I
`970
`I
`
`:
`
`SUMMER
`~ I
`
`-
`
`...J
`
`M ..._.-
`
`980
`
`D = F (d)
`
`d
`
`FIG. 9
`
`0010
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 9 of 25
`
`6,097,854
`
`START AT COARSEST IMAGE RESOLUTION
`
`1010
`
`PERFORM ALIGNMENT ALGORITHM ~1 020
`USING PRIOR RESULTS
`
`1040
`
`YES
`OUTPUT
`>---~ RESULTS
`
`1050
`
`GO TO THE NEXT HIGHEST RESOLUTION LEVEL
`AND PERFORM THE REQUIRED MODIFICATION
`OF THE ALIGNMENT PARAMETERS
`e.g. MODIFY ESTIMATE OF M
`
`FIG. 10
`
`1310
`
`PERFORM AT LEAST A ROUGH ALIGNMENT OF
`I 0, I 1 USING INCREMENTAL ALIGNMENT
`ALGORITHM WITH PERSPECTIVE TRANSFORM M
`J
`~1 320
`EXTRACT INDIVIDUAL ELEMENT OF M (m 0 THROUGH m 7 )
`~
`ESTIMATE FOCAL LENGTHS f0, f1
`FROM A FUNCTION OF m 0 THROUGH M7
`
`0
`~ 133
`
`;
`
`FIG. 13
`
`0011
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 10 of 25
`
`6,097,854
`
`INPUT IMAGES Ik (x) AND II (x) AND THE
`ORIENTATIONS, Rk AND RI, OF lk AND l1
`RELATIVE TO A POINT P
`IN 3D WORLD COORDINATES
`
`1110
`
`1130
`
`FOR LA TEST VE__RSION OF I I· FIND AN INCREMENTAL
`ROTATION R (Q) OF x THROUGH A 3D ANGLE
`Q = (Wx, Wy, Wz) WHICH PROVIDES BETTER
`REGISTRATION OF I1WITH Ik
`
`UPDATE l1 BY RESAMPLING II
`USING THE LATEST VERSION OF M
`
`1160
`
`NO
`
`YES
`
`OUTPUT LA TEST VERSION OF
`I1 AND R1
`
`~1180
`
`FIG. 11
`
`0012
`
`
`
`U.S. Patent
`
`Aug. 1, 2000
`
`Sheet 11 of 25
`
`6,097,854
`
`X
`
`lo (x)
`
`1205 1225
`
`1230 + +,..._ _ ___.
`
`9i 1235
`
`1290
`_j_
`
`1215
`
`-It
`.---..At
`
`1255
`
`J i
`
`9i
`
`Jigi
`
`1260
`
`1217
`
`Jigi
`
`,..
`R1..- R1 (Q) Rz
`
`" R (Q)
`
`1285
`
`RODRIGUEZ
`FORMULA
`" R (Q) = G (Q)
`
`1250
`
`grJ!
`
`SUMMER
`~ I
`
`1270 SUMMER
`}:
`I
`
`- ~
`
`NORMAL EQUATION
`SOLVER
`
`1275
`
`FIG. 12
`
`0013
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 12 of 25
`
`6,097,854
`
`r - - - - - -
`INNERLOOP
`12901
`
`1
`I
`I
`I 1460
`I r-+--'--------,
`
`9 · I
`
`-
`II 1245
`
`- - I
`XJ
`
`COMPUTE
`JACOBIAN Jj
`AT CENTER
`OF PATCH j
`
`L--+{x
`12501
`9 i 9 iT
`.---____.JL...,.L.-----, ..----L----.12701
`}: SUM OVER ~ SUM OVER
`I
`I
`ALL PIXELS
`ALL PIXELS
`IN PATCH j
`IN PATCH j
`Aj
`1420 X 1-+------=..-----t 1425
`
`COMPUTE
`WEIGHTS
`
`Wj
`
`1485
`
`I
`I
`X
`I
`W· JAJ·T
`W· J ·b·
`J LJJ ____ J
`_______ 1 LJ_ _
`- - ------ - - --- --- - ---
`1440
`1450
`OUTER
`I
`SUM OVER ALL
`SUM OVER ALL
`LOOP
`1
`1470
`PATCHES
`PATCHES
`1 sjsm
`1 sjsm
`I
`- - - - - - - - - - ____ __ _
`j
`b
`A
`FIG. 14
`
`0014
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 13 of 25
`
`6,097,854
`
`X·1 X·
`J
`J
`r---- -------
`\ 11
`
`\
`\
`\
`\
`
`\
`\
`\
`\
`
`\ -""'-
`
`FIG. 15
`
`' ' ' ' ' ' ' ' ' ' '
`
`'
`........
`
`... ....
`
`X'1
`rJ
`.....,- ---
`'
`',
`',
`'
`'
`
`~2
`---------0
`,
`,
`D x· ,'
`t'
`,
`,
`,
`'
`'t) Xjo
`
`FIG. 17
`
`6 Xjo
`FIG. 16
`
`)(:2
`
`p'1
`
`I
`
`I
`
`X·1
`J
`'
`
`'
`
`x·
`,'
`J
`,'
`Q
`',
`...
`,
`'
`, uJ·2
`, u. •
`U .1 '' J
`,
`,
`•
`'
`J
`' ,
`,
`'
`'
`'C5 Xjo
`FIG. 19
`
`0015
`
`
`
`U.S. Patent
`
`Aug. l , 2000
`
`Sheet 14 of 25
`
`6,097,854
`
`1810
`
`PERFORM PATCH-BASED ALIGNMENT TO
`PRODUCE Rk FOR EACH IMAGE I k
`
`1820
`
`DETERMINE FROM Rk 's WHICH IMAGES ARE
`SIGNIFICANTLY OVERLAPPING PAIRS
`
`1830
`
`DEFINE PATCHES j IN ONE IMAGE h
`
`1840
`
`1850
`
`FOR EACH PATCHj IN IMAGE Ik FIND WHICH
`IMAGES OVERLAP THAT PATCH AND COMPUTE
`OPTIMAL REGISTRATION AND ERROR
`
`DISCARD (OR DEWEIGHTI PATCHES HAVING
`HIGH RESIDUAL ERROR (OR LOW VARIANCE)
`
`1860
`
`PERFORM SIMULTANEOUS INCREMENTAL 3D ROTATIONS
`R (Q) (OR WARPINGS M) OF EACH IMAGE OF THE
`Si:T OF PAIRS, SO AS ro CONVERGE EACH PAIR
`OF CENTER-RAY-DIRECTIONS
`
`FOR EACH IMAGE Ik UPDATE Rk OR M BY EON 17; 1870
`RECOMPUTE fk AND FORM Vk
`
`1890
`
`OUTPUT Rk FOR EACH IMAGE
`
`FIG. 18
`
`0016
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 15 of 25
`
`6,097,854
`
`FORk: 1-N AND I= 1-N
`DETERMINE OVERLAPPING IMAGE PAIRS
`lk I lr FROM Rk1 Rr OR Mk1 Mr
`~
`FOR EACH OVERLAPPING IMAGE PAIR
`lk I I[ WARP IMAGE I[ BY A~-1 Rk
`OR Mr1Mk AND OVERLAY ON lk
`~
`FOR EACH PATCH~tiN lk OVERLAID BY l1
`COMPUTE M
`ION ESTIMATE Ujkl
`
`,... -----
`
`2010
`
`,... I'-- 2020
`
`,... -----
`
`2030
`
`,... 1-'--.__
`2040
`
`2050 FOR EAC
`H
`PATCHj
`
`+
`FOR EACH PATCH~ COMPUTE A NORMALIZED
`AVERGAE Ujk F Ujki FOR ALL IMAGES
`OVERLAYING THEjth PATCH
`~
`REVERSE EACH Ujk - Ujk ..- Ujk
`+
`REVERSE MAP TO PIXEL LOCATIONS IN
`UNWARPED VERSIONS OF Ik USING -ujk
`AND INTERPOLATE THE SPARSE SET OF MOTION
`ESTIMATES -Ujk TO OBTAIN PER=PIXEL MOTION
`ESTIMATES u {x)
`~
`RESAMPLE PIXEL VALUES AT REVERSE-MAPPED
`,. .........
`PIXEL LOCATIONS ACCORDING TO
`lk (x) = Ik {x + u {x))
`~
`
`,. r--
`
`2060
`
`t'--
`
`2070
`
`OUTPUT RESAMPLED VERSION OF Ik (x) + r-- 2080
`
`FIG. 20
`
`0017
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 16 of 25
`
`6,097,854
`
`FIG. 21
`
`FIG. 22
`
`PATCHj
`
`Ujkm
`~
`I
`
`Ujk
`
`_- ~ Ujk/
`
`I -------------+----"
`
`FIG. 23
`
`0018
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 17 of 25
`
`6,097,854
`
`-U
`
`1k
`.....
`.....
`..... ~ ,
`...
`
`u16k
`
`-----..~...-PATCH
`CENTER
`PIXEL
`(SOURCE
`PIXEL)
`
`DESTINATION
`PIXEL
`FIG. 25A
`
`,. ..... ~ '
`~ '
`
`U1sk
`
`-----..~...-PATCH
`CENTER
`PIXEL
`
`MOTON
`DESTINATION
`PIXEL
`FIG. 24
`
`FIG. 258
`
`0019
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 18 of 25
`
`6,097,854
`
`---.-----,-....1.--r-----r--
`
`-.......--- -
`
`-
`
`-
`
`-
`
`SPARSE
`DATA
`INTERPOLATION
`
`FIG. 26
`
`0020
`
`
`
`U.S. Patent
`
`Aug. 1, 2000
`
`Sheet 19 of 25
`
`6,097,854
`
`FIG. 27
`
`FIG. 28
`
`(0,0)
`
`(1,0)
`
`FIG. 29
`
`0021
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 20 of 25
`
`6,097,854
`
`FIG. 30
`
`FIG. 31
`
`0022
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 21 of 25
`
`6,097,854
`
`:-.--32 20
`
`r-3200
`
`PAINT PSEUDOCOLORS INTO
`TEXTURE MAP TRIANGLES
`USING AUXILIARY BUFFER
`+
`FOR EVERY TRIANGLE IN 3-D MODEL,
`COMPUTE Mr MAPPING ALL VEHICLES
`p IN 3-D MODEL TO POINTS u
`, r--3210
`IN 2-D TEXTURE MAP
`+
`FOR EACH IMAGE (I)k ~k RECALL CAMERA
`MATRIX Mk MAPPING Xk TO 3-D
`/
`POINT P IN 3-D MODEL
`+
`COMPUTE IMAGE PIXELS: Xk FROM TEXTURE
`/ r--32 30
`MAP PIXELS u: Xk- MkMf1u
`f
`FOR EACH TRIANGLE IN 3-D MODEL FIND / r--32 40
`EVERY POINT u INSIDE CORRESPONDING
`TRIANGLE IN TEXTURE MAP
`50
`3~
`+
`FOR EVERY POINT u INSIDE OR BESIDE THE TRIANGLE?
`FIND THE CORRESPONDING PIXEL Xk IN
`IMAGE Ik FOR EVERY IMAGE k: 1-N AND
`COMPUTE A BLENDED COLOR ~HADE) OF
`u AS A WEIGHTED SUM OF TH COLORS
`OF THE CORRESPONDING PIXELS Xk
`IN THEN IMAGES Ik
`
`COLOR (u) = ~Wk COLOR (X ) = tWk COLOR (Mk Mf1u)
`+
`60
`f-..-...-32
`PLACE COLOR (u) IN TEXTURE MAP USING '
`PSEUDOCOLOR STENCIL
`
`FIG. 32
`
`0023
`
`
`
`U.S. Patent
`
`Aug. 1, 2000
`
`Sheet 22 of 25
`
`6,097,854
`
`FIG. 33A
`
`FIG_ 33B
`
`FIG. 33D
`
`0024
`
`
`
`U.S. Patent
`
`Aug. 1, 2000
`
`Sheet 23 of 25
`
`6,097,854
`
`FIG. 34A
`
`FIG~ 34B
`
`FIG. 34C
`
`FIG~ 35A
`
`.FIG. 35C
`
`F'IG.
`
`FIG. 34E
`
`0025
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 24 of 25
`
`6,097,854
`
`FIG. 36A
`
`FIG. 36B
`
`FIG. 36C
`
`0026
`
`
`
`U.S. Patent
`
`Aug.l, 2000
`
`Sheet 25 of 25
`
`6,097,854
`
`FIG. 3? A FIG. 3?.B FIG. 37C FIC; . 37D
`
`FTG. ~38
`
`0027
`
`
`
`6,097,854
`
`2
`panning motion becomes a pure translalion problem. Ideally,
`to btiild a cylindrical or spherical panorama from a horizon(cid:173)
`tal panning sequence, only the unknown panning angles
`need to be recovered. In practice, smaU vertical translations
`5 arc needed to compensate for vertical jitter and optical twist.
`Therefore, both a horizontal translation t,.. and a vertical
`translation t>" are estimated for each input image. To recover
`the tra nslational motion, tbe incremental translation ot=(ot,.,
`lit>") is estimated by minimizing the intensity error between
`10 two images, 10 , 11.
`
`£(61) = L:rJ.cx; + <it ) - io(-<;)f
`
`i
`
`where
`
`are corresponding points in the two images, and t=(I,,ty) is
`the global translational motion field which is ihe same for all
`pixels. After a first order Taylor series expansion, the above
`equation becomes
`
`where e;=l 1(x"1)-10 (x;) is the current intensity or color error,
`and g/='Vl1(x'1) is tbc image gradient of 11 at x',.. Tbis
`30 minimization problem has a simple least-squares solution,
`
`1
`IMAGE MOSAIC CONSTRUCTION SYSTEM
`AND APPARATUS WITH PATCH-BASED
`AUGN M EN1~ GLOBAL BLOCK
`ADJUSTMENT AND PAIR-WISE MOT ION(cid:173)
`BASED LOCAL WARPING
`BACKGROUND OF THE INVENTION
`l. Technical Field
`"The invention is related to computer systems for con(cid:173)
`structing and rendering panoramic mosaic images from a
`sequence of still images, video images or scanned photo(cid:173)
`graphic images or the like.
`2. Background Art
`Image-based rendering is a popular way to simulate a
`visually rich tele-preseoce or virtual reality experience. 15
`Instead of building and rendering a complete 30 model of
`the environmeot, a collection of images is used to render the
`scene while supporting virtu a 1 camera motion. For example,
`a single cylindrical image surrounding the viewer enables
`the user to pan and zoom inside an environment created from 20
`real images. More powerful image-based rendering systems
`can be built by adding a depth map to the image or by using
`a larger collection of images.
`The present inveotioo is particularly directed to image(cid:173)
`based rendering systems without any depth information, i.e., 25
`those which only support user panning, rotation, and zoom.
`Most of the commercial products based on this idea (such as
`QuickTime VR) use cylindrical images with a limited ver(cid:173)
`tical field of view, although newer systems support full
`spherical maps (e.g., Interactive Pictures, and RealVR). A
`number of techniques bave been-developed for capturing
`panoramic images of real-world scenes. One way is to
`record an image onto a long film strip using a panoramic
`camera to directly capture a cylindrical panoramic image.
`Another way is to use a Ieos with a very large field of view 35
`such as a fisheye Ieos. Mirrored pyramids and parabolic
`mirrors can also be used to directly capture panoramic
`images. A Jess hardware-intensive method for constructing
`ful l view panoramas is to take many regular photographic or
`video images in order to cover tbe whole viewing space. 40
`These images must then be al igned and composited into
`complete panoramic images using an image mosaic or
`"stitching" method. Most stitching systems require a care(cid:173)
`fully controlled camera motion (pure pan), and only produce
`cylindrical images.
`Cylindrical panoramas are commonly used because of
`their ease of construction. To build a cylindrical panorama,
`a sequence of images is taken by a camera moun ted on a
`leveled tripod. If the camera focal length or field of view is
`known, each perspective image can be warped into cylin- 50
`drical coordinates. To build a cylindrical panorama, 3D
`world coordinates are mapped to 20 cylindrical screen
`coordinates using
`
`45
`
`To handle larger initial displacements, a hierarchical coarse(cid:173)
`to-fine optimization scbeme is used. To reduce discontinui(cid:173)
`ties in intensity and color between the images being
`com posited, a simple feat hering process is applied in which
`the pixels in each image are weigbted proportionally to thei r
`distance to the edge (or more precisely, their distance to the
`nearest invisible pixel). Once registration is finished, the
`ends are clipped (and optionally the top and bottom), and a
`single panoramic image is produced.
`Creating panoramas in cylindrical or spherical coordi(cid:173)
`nates has several limitations. First, it can only handle the
`simple case of pure panning motion. Second, even though it
`is possible to convert an image to 20 spherical or cylindrical
`coordinates for a known tilling angle, ill-sampling at no rth
`pole and south pole causes big registration errors. (Note tha t
`cylindrical coordinates become undefined as you tilt your
`camera toward north or south pole.) Third, it requires
`knowing the focal length (or equivalently, field of view).
`While focal length can be carefully calibrated in the lab,
`ss estimating the focal length of the lens by registering two or
`more images using conventional techniques is not very
`accurate.
`The automatic construction of large, high-resolution
`image mosaics is an active area of research in the fields of
`pbotogrammetry, computer vision, image processing, and
`computer grapbics. Image mosaics can be used for many
`diJierent applications. 1be most traditional application is the
`construction of large aerial and satellite photographs from
`collections of images. More recen t applications include
`65 scene stabilization and change detection, video compression
`and video indexi ng, increasing the fie ld of view and reso(cid:173)
`lution of a camera, and even simp!e photo editing. A
`
`() = t:~n-• (X {Z) and ,. = Y j J X 2 + z2
`
`where 0 is the panning angle and v is tbe scan line. Similarly,
`3D world coordinates can be mapped into 20 spherical
`coordinates 8, <1> using
`
`Once each input image has been warped, constructing the
`panoramic mosaics for a leveled camera undergoing a pure
`
`60
`
`0028
`
`
`
`6,097,854
`
`3
`particu larly popular application is the emulation of tradi(cid:173)
`tional film-based panoramic photography with digital pan(cid:173)
`oramic mosaics, for applications such as the construction of
`virtual environments and virtual travel. ln computer vision,
`image mosaics are part of a larger recent trend, namely the 5
`study of visual scene representations. llle complete descrip(cid:173)
`tion of visual scenes and scene models often entails the
`recovery of depth or parallax information as well.
`lo computer graphics, image mosaics play an important
`role in the field of image-based rendering, which aims to 10
`rapidly render photorealistic novel views from collections of
`rea l (or pre-rendered) images. For applications such as
`virtual travel and architectural walkthroughs, it is desirable
`to have complete (full view) panoramas, i.e., mosaics which
`cover the whole viewing sphere and hence allow the user to t5
`look in any direction. Unfortunately, most of the results to
`date have been limited to cylindrical panoramas obtained
`with cameras rotating on leveled tripods with carefully
`designed stages adjusted to minimize motion parallax. This
`has limited the users of mosaic building ('"stitching") to 20
`researchers and professional photographers who can aO'ord
`such specialized equipment. Prescot techniques are difficult
`because generall y they require special camera equipment
`which provides pure panning motio n with oo motion paral(cid:173)
`lax.
`Problems to be Solved by the Invention:
`It would be desirable for any user to be able to •·paint" a
`fuJ I view panoramic mosaic with a simple band-held camera
`or camcorder. However, th ~s requires several problems to be
`overcome.
`First, cylindrical or spherical coordinates should be
`avoided for constructing the mosaic, since these represen(cid:173)
`tations introduce singularities near the poles of the viewing
`sphere.
`Second, accumulated misrcgistration errors need to be 35
`corrected, which are always present in any large image
`mosaic. For example, if one registers a sequence of images
`using pairwise alignments, there is usually a gap between the
`last image and the first one even if these two images are tbe
`same. A simple "gap closing" technique is introduced in the 40
`specification below which forces the first and last image to
`be the same and distributes the resulting corrections across
`tbc image sequence. Unfortunately, this approach works
`only for pure panning mo tions with uniform motion steps, a
`significant limitation.
`Third, any deviations from the pure parallax-free motion
`model or ideal pinhole (perspective) camera model may
`result in local misregistrations, which are visible as a loss of
`detail or multiple images (ghosting).
`
`SUMMARY OF THE INVENTION
`The specification discloses bow to avoid using cylindrical
`or spherical coordinates for constntcting the mosaic by
`associating a rotation matrix (and optionally a focal length)
`with each input image, and performing registration in the 55
`input image's coordinate system. Such mosaics are referred
`to herein as rota tional mosaics. The system can use a
`postprocessing stage to project such mosaics onto a conve(cid:173)
`nient viewing surface, i.e., to create an environment map
`represented as a texture-mapped polyhedron surrounding the
`origin. Preferably, the system of tbe invention performs the
`registration by patch-based alignment of the images for
`computational efficiency.
`1be system of the invention employs a global block
`adjustment process, which solves the problem of accumu(cid:173)
`lated misregistration errors, to find the optimal overall
`registration.
`
`4
`The system of the invention solves tbe problem of loss of
`detail or image ghosting by computing local motion esti(cid:173)
`mates (block-based optical flow) between pairs of overlap(cid:173)
`ping images, and using these estimates to warp each input
`image so as to reduce the misregistratioo. This is less
`ambitious than actually recovering a perspective depth value
`for each pixel, but bas the advantage of being able to
`simultaneously model other effects such as radial Ieos dis(cid:173)
`tortions and s mall movements in the image.
`The system of the invention aligns a set plural overlap(cid:173)
`ping images useful in constructing a mosaic by performing
`patch-based alignment of the set of overlapping images to
`produce a set of warped images, performing block adjust(cid:173)
`meat of the set of warped images to produce a set of
`block-adjusted images, and then performing pair-wise
`motion-based local warping of the set of block-adjusted
`images.
`The patch-based alignment is accomplished by first find(cid:173)
`ing an incremental -deformation of one image rela tive to a
`3-dimensional coordinate system tending to reduce registra(cid:173)
`tion error between overlapping portions of the pair of
`images, and tbeo warping the one image in accordance with
`the incremental deformation. Tbe incremen tal deformation
`is found by computing a difference error vector between the
`25 pair of images. First, the one image is divided into plural
`patches. Within each patch, gradients ace computed at pixels
`within the patch and a single J acobian is computed fo r the
`entire patch. The Jacobian is the Jacoian of a coordinate
`system of the one image wi th respect to the incremental
`30 deformation. TI1e gradients are combined with the Jacobian
`to produce a matrix for the patcb. The gradients are sepa(cid:173)
`rately combined with the error vector and with the Jacobian
`to produce a residual for the patch. Tbe matrix is summed
`over plural patches and the residual is also summed over
`plural patches to produce Hessians and residuals, respec(cid:173)
`tively. Normal equations are defined with the residuals and
`Hessians and arc solved to provide the desired incremental
`deformation.
`T he g lobal block adjustment is accomplished by
`determining, fo r each one of the warped images, ray direc(cid:173)
`tions relative to a 3-dimeosional coordinate system at plural
`predetermined pixel locations in the one image. For eacb one
`of the plural pixel locations io the one image, ray directions
`are determined relative to the 3-dimensional coordinate
`45 system of the corresponding pixel location in each one of the
`other images overlapping the one predetermined pixel loca(cid:173)
`tion of the one image. Then, incremental deformations of the
`overlapping images are computed which simultaneously
`minimize differences between the ray directions of plural
`so pairs of the overlapping images whicb include the one
`image. The l'oregoing is performed fo r each of the plural
`predetermined pixel locations of the one image simulta(cid:173)
`neously. The images are warped in accordance with the
`incremental deformations and the process is repeated.
`The pair-wise motion-based local warping, or deghosting,
`is accomplished by -deterroioing, at plural predetermined
`pixel locations of each one of the images, motions between
`tbe one image and other images of the set, combining the
`motions to produce ao estimated motion at eacb of the plural
`60 predetermined pixel locations of the one image, and then
`warping the one image in accordance witb the estimated
`motions. Preferably, it is first determined which of the
`images of the set overlies the one image. This determination
`is made by determining alignment transformations relating
`65 the images to a 3-dimensional coordinate system and then
`inferring mutual overlap between images from the transfor(cid:173)
`mations. The images are resampled in accordance with these
`
`0029
`
`
`
`6,097,854
`
`5
`
`5
`alignment transformations. The warping of each image is
`accomplished by constructing a ma pping of warped pixel
`locations from tbe estimated motions and tben resampling
`tbe one image at tbe warped pixel locations. Tbe mapping is
`preferably a reverse mapping of pixels in an unwarped
`version of tbe one image.
`The invention can be used to create full view panoramic
`mosaics from image sequences. Unlike current panoramic
`stitching methods, which usually requi re pure horizon tal
`camera panning, the disclosed system does not requi re any
`contro lled motions or constraints on how the images are
`taken (as long as there is no strong mo tion parallax). For
`example, images taken from a band-beld digital camera can
`be stitched seamlessly into panoramic mosaics.
`By taJ.:ing as many images as needed, image mosaics can t5
`be constructed which cover as la rge a field of view as
`desired, e.g., a complete environment map. Because tbe
`image mosaics is represented in the invention using a set of
`transforms, there are no singu larity problems sucb as those
`existing at the top and bottom of cylindrical or spherical 20
`maps. This method is fast and robust because it directly
`recovers 30 ro tations instead of general 8 parameter planar
`perspective transforms. By mapping the mosaic onto an
`arbitrary texture-mapped polyhedron surrounding the origin,
`the vi rtual environment can be viewed or explored using 25
`standard 30 graphics viewers and hardware wi thout requir(cid:173)
`ing special-purpose p layers. Once a mosaic bas been
`constructed, it can, of course, be mapped into cylindrical or
`spherical coordinates, and displayed usi ng a special purpose
`viewer. Such specialized representations are not necessary,
`and represen t just a particular choice of geometry and
`texture coordinate embedding. Instead, using a texture map(cid:173)
`ping process described herein, the mosaic can be convened
`to an environment map, i.e., by mapping the mosa ic onto any
`texture-mapped polyhedron surrounding tbe origin. This 35
`allows the use of standard 30 graphics APis and 30 model
`formats, and allows tbe use of 30 graphics accelerators for
`texture mapping. The mapping process can be employed in
`a rendering process that can be just as fast as special-purpose
`v iewers. Furthermore, the mapping proces.s can take advan- 40
`tage of 30 graphics (rendering) hardware, support mul ti(cid:173)
`resolution textures, and allow for easy integration with
`regular 30 graphics.
`BRIEF DESCRIPTION OF Ti lE DRAWINGS
`1be file of this patent contains at least one drawing
`executed in color. Copies of this patent with color drawings
`will be provided by the Paten t and Trademark Office upon
`request and payment of tbe neces.sary fee.
`FIG. 1 is a block diagram illustrating the operation of a 50
`system embodying the invention.
`FIGS