Signal Processing: Image Communication (Elsevier)

Efficient representations of video sequences and their applications

Michal Irani*, P. Anandan, Jim Bergen, Rakesh Kumar, Steve Hsu

David Sarnoff Research Center, CN5300, Princeton, NJ, USA

Abstract

Recently, there has been a growing interest in the use of mosaic images to represent the information contained in video sequences. This paper systematically investigates how to go beyond thinking of the mosaic simply as a visualization device, but rather as a basis for an efficient and complete representation of video sequences. We describe two different types of mosaics called the static and the dynamic mosaics that are suitable for different needs and scenarios. These two types of mosaics are unified and generalized in a mosaic representation called the temporal pyramid. To handle sequences containing large variations in image resolution, we develop a multiresolution mosaic. We discuss a series of increasingly complex alignment transformations (ranging from 2D to 3D and layers) for making the mosaics. We describe techniques for the basic elements of the mosaic construction process, namely sequence alignment, sequence integration into a mosaic image, and residual analysis to represent information not captured by the mosaic image. We describe several powerful video applications of mosaic representations including video compression, video enhancement, enhanced visualization, and other applications in video indexing, search, and manipulation.

Keywords: Video representation; Mosaic images; Motion analysis; Image registration; Video databases; Video compression; Video enhancement; Video visualization; Video indexing; Video manipulation

1. Introduction

Video is a very rich source of information. Its two basic advantages over still images are the ability to obtain a continuously varying set of views of a scene, and the ability to capture the temporal (or ‘dynamic’) evolution of phenomena.

A number of applications that involve processing the entire information within video sequences have recently emerged. These include digital libraries, interactive video analysis and softcopy exploitation environments, low-bitrate video transmission, and interactive video editing and manipulation systems. These applications require efficient representations of large video streams, and efficient methods of accessing and analyzing the information contained in the video data.

There has been a growing interest in the use of a panoramic ‘mosaic’ image as an efficient way to represent a collection of frames (e.g., see Fig. 1) [17,21,22,16]. Since successive images within a video sequence usually overlap by a large amount, the mosaic image provides a significant reduction in the total amount of data needed to represent the scene.

0923-5965/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0923-5965(95)00055-0
`
`Prime Focus Ex 1034-1
`Prime Focus v Legend3D
`IPR2016-01243
M. Irani et al. / Signal Processing: Image Communication 8 (1996) 327-351

Fig. 1. Static mosaic image of a table-tennis game sequence. (a)-(c) Three out of a 300 frame sequence obtained by a camera panning across the scene; (d) the static mosaic image constructed using a temporal median; (e) the static mosaic image constructed using a temporal average.

Although the idea of the mosaic and even some of its applications have been recognized, there has not been a systematic approach to the characterization of what the mosaic is, or even an attempt to develop any type of standard terminology or taxonomy. In practice, a single type of mosaic, such as a static mosaic image obtained from all the frames of a contiguous sequence, is suitable for only a limited class of applications. Different applications such as video database storage and retrieval and real-time transmission and processing require different types of mosaics.

Also, while mosaics have been recognized as efficient ways of providing ‘snapshot’ views of scenes, the issue of how to develop a complete representation of scenes based on mosaics has not been adequately treated. Specifically, we refer to the question of how to represent the details not captured by the mosaics, so that the sequence can be fully recovered from the mosaic representation.

The purpose of this paper is to develop a taxonomy of mosaics by carefully considering the various issues that arise in developing mosaic representations. Once this taxonomy is available, it can be readily seen how the various types of mosaics can be used for different applications. The paper includes examples of several applications of mosaics, including video compression, video visualization, video enhancement, and other applications.
`
The remainder of the paper is organized as follows. Section 2 presents various types of mosaic representations, and discusses their efficiency and completeness in terms of sequence representation. Section 3 describes the techniques that we use to align the images, construct the mosaics, and detect the significant ‘residuals’ not captured in the mosaics from the input video stream. Section 4 outlines a number of powerful video applications of the mosaic representations with examples and experimental results. Finally, Section 5 discusses the salient issues for future research on this topic.

2. The mosaic representation

A mosaic image is constructed from all frames in a scene sequence, giving a panoramic view of the scene. Although the idea of a mosaic image is simple and clear, a closer look at the definition reveals a number of subtle variations. For instance, since the different images that comprise a mosaic spatially overlap with each other, but are taken at different time instances, there is a choice regarding how the different grey values available for the same pixel are combined. Similarly, the variations in the pixel resolution between images lead to the issue of choosing the resolution of the mosaic image. Finally, there are also choices regarding the geometric transformation model used for aligning the images to each other. The different choices in these various issues are typically a result of the type of application for which the mosaic is intended. In this section we describe different ‘types’ of mosaics that arise out of the types of considerations outlined above.

2.1. Static mosaic

The static mosaic is the common mosaic representation [17,22,21,16,14], although it is usually not referred to by this name. It has been previously referred to as mosaic or as ‘salient still’ (e.g., see Figs. 1 and 2). It will be shown (in Section 4) how the static mosaic can also be extended to represent temporal subsamples of key events in the sequence to produce a static ‘event’ mosaic (or ‘synopsis’ mosaic).

The input video sequence is usually segmented into contiguous scene subsequences (e.g., see [23]), and a static mosaic image is constructed for each scene subsequence to provide a snapshot view of the subsequence. This is done in batch mode, by aligning all frames of that subsequence to a fixed coordinate system (which can be either user-defined or chosen automatically according to some other criteria). The aligned images are then integrated using different types of temporal filters into a mosaic image, and the significant residuals are computed for each frame relative to the mosaic image. The details of the mosaic construction process are described in Section 3. Note that after integration, the moving objects either disappear or leave ‘ghost-like’ traces in the panoramic mosaic image.

Examples of static mosaic images are shown in Figs. 1 and 2. In Fig. 1 a static mosaic image of a table-tennis game sequence is constructed, once using a temporal median, and once using a temporal average. In this sequence, the player and the crowd move with respect to the background, while the camera pans to the right. The constructed mosaic image displays a sharp background, with blurry crowd, and a ghost-like player. Fig. 2 shows a static mosaic image of a baseball game sequence produced using a temporal median. In this sequence two players run across the field (from right to left), while the camera pans to the left and zooms in on the players. The constructed mosaic image in this case displays a sharp image of the background with no trace of the two players. In both examples, a 2D motion model was sufficient to align the images (see Section 3).

The static mosaic image exploits long term temporal redundancies (over the entire scene subsequence) and large spatial correlations (over large portions of the image frames), and is therefore an efficient scene representation. For example, in Figs. 1 and 2, the entire video sequence can be represented by the mosaic image of the background scene with the appropriate transformations that relate each frame to the mosaic image. The only information in the sequence not captured by the mosaic image and needing additional representation are the changes in the scene with respect to the background (e.g., moving players). These residuals can either be represented independently for
each frame, or can frequently be represented more efficiently as another layer using yet another mosaic [1] (see Section 2.5).

The issue of representing residuals which are not captured by the mosaic image has frequently been overlooked by handling sequences with no scene activity [21,16,14]. The mosaic image, along with the frame alignment transformations, and with the residuals together constitute a complete and efficient representation, from which the video sequence can be fully reconstructed. These issues have been addressed to a limited extent with respect to video compression in [1], although that work does not consider how to assign a significance measure to the residuals or how to handle non-rigid layers.

The static mosaic, being an efficient scene representation, is ideal for video storage and retrieval, especially for rapid browsing in large digital libraries and to obtain efficient access to individual frames of interest. It can also be used to increase the efficiency of content-based indexing into a video sequence, to reduce the tedium associated with video manipulation and analysis. Last but not least, it can be used for enhanced visualization in the form of panoramic views, as well as a tool for enhancing the contents of the images. These applications are described in greater detail in Section 4.

Fig. 2. Static mosaic image of a baseball game sequence. (a)-(f) Six out of a 90 frame sequence obtained by a camera panning from right to left and zooming in on the runners. (g) The static mosaic image constructed using a temporal median. The black regions are scene parts that were never imaged by the camera (since the camera zoomed in on the scene).

2.2. Dynamic mosaic

Since the static mosaic is constructed in batch mode, it cannot completely depict the dynamic aspects of the video sequence. This requires a dynamic mosaic, which is a sequence of evolving mosaic images, where the content of each new mosaic image is updated with the most current information from the most recent frame. The sequence of dynamic mosaics can be visualized either with a stationary background (e.g., by completely removing any camera induced motion), or in a manner such that each new mosaic image frame is aligned to the corresponding input video image frame.
In the former case, the coordinate system of the mosaic is fixed (see Fig. 3), whereas in the latter case the mosaic is viewed within a moving coordinate system (see Fig. 4). In some cases a third alternative may be more appropriate, wherein a portion of the camera motion (e.g., high frequency jitter) is removed or a preferred camera trajectory is synthesized.

When a fixed coordinate system is chosen for the dynamic mosaic, each new image frame is warped towards the current dynamic mosaic image, and the information within its field of view is updated according to the update criterion (e.g., most recent, average, weighted average, etc. (see Section 3.2)). When the coordinate system of the mosaic is chosen to be dynamically updated to match that of the input sequence, the current dynamic mosaic image is warped towards each new frame, and then the information within the current field of view is updated according to the update criterion. When a virtual coordinate system is chosen (either predetermined by the user, or computed according to some criterion), both the dynamic mosaic and the current frame are warped towards that coordinate system. Note that the definition of the coordinate system and the warping mechanism will vary according to the world and motion model (see Section 3).

Figs. 3 and 4 show examples of the evolution of some dynamic mosaics. Fig. 3 shows an evolving dynamic mosaic image of a table-tennis game, where the player and the crowd move with respect to the background, while the camera pans to the right. In this example we chose to construct the mosaic in a fixed coordinate system (that of the first frame). Note that in the dynamic mosaic the crowd and the player do not blur out (as opposed to the static mosaic shown in Fig. 1), and are constantly being updated.

Fig. 4 shows an evolving dynamic mosaic image of a baseball game sequence, where two players run across the field (from right to left), while the camera pans to the left and zooms in on the players. In this example we chose to construct the mosaic in a dynamic coordinate system that matches that of the input video (i.e., changes with each new frame). Note that in the dynamic mosaic the players do not disappear (as opposed to the static mosaic in Fig. 2), but are constantly being updated.

The complete dynamic mosaic representation of the video sequence consists of the first dynamic mosaic, and the incremental alignment parameters and the incremental residuals that represent the changes. Note that the difference in mosaic content between the static and dynamic mosaics implies a difference in the residuals that are not represented by the mosaic. In the dynamic case, since the content of the mosaic is
dynamically being updated, the residuals reflect only changes in the scene that occur in the time elapsed between successive frames, as well as additional parts of the scene that were revealed for the first time to the camera. These are different from the residuals in the static case, which represent objects that have some motion in any portion of the video sequence (with respect to the background static mosaic). Fig. 5 shows an example of frame residuals detected for the static and the dynamic representations in the table-tennis sequence.

In general, since changes between successive frames are relatively small, the amount of residual information in the dynamic mosaic will be smaller than that in the static case. The dynamic mosaic is therefore a more efficient scene representation than the static mosaic. It too allows complete reconstruction of the original video sequence. However, due to its incremental frame reconstruction, it lacks the important capability of random access to individual frames, which is essential for video manipulation and editing.

Fig. 3. Evolution of the dynamic mosaic images of the table-tennis game sequence. Left column: Three frames from the original sequence. Right column: The corresponding dynamic mosaic images. Note that the position of the player and the crowd are constantly being updated to match the current frame.

Fig. 4. Evolution of the dynamic mosaic images of the baseball game sequence. Left column: Three frames from the original sequence. Right column: The corresponding dynamic mosaic images. Note that the coordinate system as well as the position of the players are constantly being updated to match the current frame.

Fig. 5. The residual maps of static versus dynamic cases. (a) A single frame from the table-tennis sequence. (b) The residual map computed for the corresponding frame in the static representation. The brighter values signify more significant residuals. (c) The residual map computed for the corresponding frame in the dynamic representation. Note that the amount of residuals in the dynamic case is significantly smaller than the amount of residuals in the static case.

The dynamic mosaic is an ideal tool for low bit-rate transmission (see Section 4). The choice of the coordinate system for constructing and visualizing the dynamic mosaic image will depend on the application. For example, in remote surveillance type of applications, which typically involve a narrow field of view camera that repeatedly scans the same outdoor natural scene and is usually very bouncy, it is beneficial to construct a dynamic mosaic with a fixed coordinate system, as it will also serve as a stabilization mechanism for the remote observer. However, in flight surveillance, it makes more sense to keep a dynamically updating coordinate system that matches that of the view as seen by the pilot (with a gradually growing field of view obtained as the mosaic is constructed). These issues are discussed in Section 4.
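The contrast between batch (static) and incremental (dynamic) integration can be sketched in a few lines. This is a minimal illustration, not the construction used in the paper: it assumes the frames have already been warped into one fixed mosaic coordinate system, that pixels outside a frame's field of view are marked with NaN, and that the dynamic mosaic uses the ‘most recent’ update criterion. The function names (`static_mosaic`, `dynamic_mosaics`, `make_demo`) and the toy one-row scene are hypothetical.

```python
import warnings
import numpy as np

def static_mosaic(aligned, mode="median"):
    """Batch integration: apply one temporal filter to the whole aligned stack."""
    stack = np.asarray(aligned, dtype=float)
    with warnings.catch_warnings():
        # Pixels never imaged by any frame stay NaN; silence the all-NaN warning.
        warnings.simplefilter("ignore", RuntimeWarning)
        if mode == "median":
            return np.nanmedian(stack, axis=0)
        if mode == "average":
            return np.nanmean(stack, axis=0)
    raise ValueError(f"unknown temporal filter: {mode}")

def dynamic_mosaics(aligned):
    """Incremental integration in a fixed coordinate system.

    Assumed update criterion: 'most recent' (a newly seen pixel overwrites
    the mosaic). Returns the mosaic after each frame.
    """
    mosaic = np.full(np.shape(aligned[0]), np.nan)
    history = []
    for frame in aligned:
        frame = np.asarray(frame, dtype=float)
        seen = ~np.isnan(frame)              # current field of view
        mosaic = np.where(seen, frame, mosaic)
        history.append(mosaic.copy())
    return history

def make_demo():
    """Toy 1x6 scene: background 10, a moving object 99, camera panning right."""
    width, fov = 6, 4
    offsets = [0, 0, 1, 1, 2]    # left edge of each frame, in mosaic columns
    obj_cols = [1, 2, 3, 4, 3]   # object position, in mosaic columns
    frames = []
    for off, obj in zip(offsets, obj_cols):
        f = np.full((1, width), np.nan)
        f[0, off:off + fov] = 10.0
        f[0, obj] = 99.0
        frames.append(f)
    return frames

if __name__ == "__main__":
    frames = make_demo()
    print(static_mosaic(frames, "median"))   # moving object is filtered out
    print(static_mosaic(frames, "average"))  # object leaves ghost-like traces
    print(dynamic_mosaics(frames)[-1])       # object kept at its latest position
```

In this toy scene the temporal median removes the moving object, the temporal average keeps faint traces of it, and the last dynamic mosaic shows the object where the most recent frame saw it, matching the qualitative behaviour described for Figs. 1 to 4.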
2.3. Temporal pyramid

The static and the dynamic mosaics are extremes of a continuum: One uses a completely static image, which may be based on an arbitrarily long sequence, and in principle there may be an arbitrarily long time interval between the current frame and the static mosaic. The dynamic mosaic is completely current and always has the most recent available information. As a result, the dynamic mosaic is more efficient than the static mosaic, but since it requires sequential reconstruction of the frames, it does not provide as immediate an access to the individual frames as the static mosaic. In order to bridge the gap between these two extremes and obtain the benefits of both representations (i.e., representation efficiency versus random-access to frames), a ‘temporal pyramid’ mosaic can be used.

As discussed in Section 2.2, the static mosaic does not remove as much short-term temporal redundancy among residuals as the dynamic mosaic (see Fig. 5). The static mosaic can be extended to use a hierarchy of mosaics whose levels correspond to different amounts of temporal integration. This hierarchical organization is similar to the spatial image pyramid representation. The finest level contains the set of original images, one for each frame of the input sequence. The temporal sampling decreases successively as we go from fine to coarse resolution levels of the pyramid. We will refer to the nodes of the pyramid as mosaics. For instance, in the manner analogous to the Laplacian pyramid, the sampling rate can be reduced by a factor of 2 between successive levels (although in principle, the factor can be any number). In this case, each mosaic at a given level can be obtained in the same fashion as pixel values are computed in the Laplacian pyramid (e.g., as difference of low-pass operators). The coarsest level will consist of a single mosaic, which is the same as the static mosaic described in Section 2.1. The succeeding levels represent residuals estimated over various time scales. Reconstruction can be achieved by hierarchically combining the static mosaic with the residual mosaics, namely in logarithmic time.
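The logarithmic-time reconstruction can be illustrated with a deliberately simplified pyramid. The sketch below is an assumption-laden stand-in for the Laplacian-style construction mentioned above: it uses a two-tap temporal average as the low-pass operator, requires a power-of-two number of frames, and assumes the frames are already aligned and fully overlapping. The names `build_temporal_pyramid` and `reconstruct_frame` are hypothetical.

```python
import numpy as np

def build_temporal_pyramid(frames):
    """Return (root, residual_levels) for a factor-of-2 temporal pyramid.

    root is the coarsest mosaic (here the temporal average of all frames).
    residual_levels[0] is the coarsest residual level, residual_levels[-1]
    the finest; each residual is a node minus its parent.
    """
    level = [np.asarray(f, dtype=float) for f in frames]
    n = len(level)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two frame count")
    residual_levels = []
    while len(level) > 1:
        # Parent = average of two temporally adjacent children (assumed filter).
        parents = [(level[i] + level[i + 1]) / 2.0 for i in range(0, len(level), 2)]
        residual_levels.append([child - parents[i // 2] for i, child in enumerate(level)])
        level = parents
    residual_levels.reverse()
    return level[0], residual_levels

def reconstruct_frame(root, residual_levels, index):
    """Random access to one frame: add one residual per pyramid level."""
    n_levels = len(residual_levels)
    frame = root.copy()
    for depth, residuals in enumerate(residual_levels, start=1):
        node = index >> (n_levels - depth)   # ancestor of `index` at this depth
        frame = frame + residuals[node]
    return frame

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.random((4, 5)) for _ in range(8)]
    root, levels = build_temporal_pyramid(frames)
    rebuilt = reconstruct_frame(root, levels, 5)
    print(len(levels), np.allclose(rebuilt, frames[5]))
```

For N frames the descent touches log2(N) residual mosaics, which is the random-access cost the text refers to; a dynamic mosaic would instead need to replay up to N incremental updates.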
2.4. Multiresolution mosaic

Changes in image resolution occur within the sequence, e.g., as the camera zooms in and out. Constructing the mosaic image in low resolution results in the loss of high frequency information in the regions of the mosaic that correspond to high resolution frames. Constructing the mosaic image at the highest detected resolution, on the other hand, incurs the penalty of oversampling the low resolution frames. Moreover, in the dynamic mosaic case, the resolution variation is not known in advance.

Varying image resolutions can be handled by a multiresolution mosaic data structure, which captures information from each new frame at its closest corresponding resolution level in a mosaic pyramid. It is a sparse pyramid in the sense that the resolution levels are not complete (certain mosaic regions may be represented at high resolution, others only at low resolution). When a frame is predicted/reconstructed from the mosaic pyramid, the highest existing resolution data in the mosaic which corresponds to the frame (i.e., is within its region of support) is projected onto that frame's resolution.

Note that the multiresolution mosaic is a completely different representation than the temporal pyramid mosaic, although both use a coarse-to-fine data structure. The unit elements at each level of the multiresolution mosaic are pixels, while the unit elements at each level of the temporal pyramid mosaic are mosaic images. The multiresolution mosaic data structure can be applied to the static, dynamic, and temporal pyramid mosaic representations.

2.5. Mosaic representations versus scene complexity

So far, we have described various types of mosaics that address different requirements, specifically representational efficiency and access efficiency. In this subsection, we consider another aspect of mosaic representations, namely the choice of the frame to frame alignment transformation.

2.5.1. The 2D mosaic

All the examples that have been shown so far in this paper have relied on constructing a mosaic image using 2D alignment parametric transformations. Such a mosaic provides a complete representation of the scene segment under the following circumstances: In scenarios where there is no scene activity apart from the motion of the camera and when either there is no translation of the camera, or when the entire scene can be approximated by a single parametric surface (typically a plane).

In practice, however, 2D alignment is a good approximation even when these conditions are violated, provided the violations are small. For instance, if the camera translates slowly and/or the relative distances between surface elements in the scene ($\Delta Z$) are small compared to their range ($Z$), the effects of parallax can be neglected. Similarly, if independent scene activity is confined to a small number of pixels, again these effects can be neglected during the construction of the mosaic. In both these situations, there will be nonzero but small residuals associated with the mosaic. These residuals are represented separately. The 2D alignment process is described in Section 3.1.1.

2.5.2. Parallax based mosaic

As explained above, the 2D alignment models are sufficient when the effects of 3D parallax (relative to the mosaic surface) are small. However, this does not mean that when these effects are significant, we need to abandon the mosaic-based approach to representing video sequences. One approach is to represent the parallax effects as individual frame residuals. A better and more efficient approach to modeling parallax effects is to naturally extend the mosaic representation to capture parallax in a single representation.

The key to the 3D extension lies in the observation that the residual motion after aligning a dominant planar surface in the scene is purely epipolar and is due to the combination of camera translation and the distance of the other parts of the scene from the dominant (aligned) plane [14,19]. Specifically, we use the following result derived in [14]: Given two views (under perspective projection) of a scene (possibly from two distinct uncalibrated cameras), if the image motion corresponding to an arbitrary plane (called the ‘reference plane’) is compensated (by applying an appropriate 2D parametric warping transformation to one of the images) then the residual parallax displacement field on the reference image plane is an epipolar field. Further, the magnitude of the parallax displacement vector at each point directly depends on the distance of that point from the reference plane.

The 2D motion field of a 3D planar surface is described by the 2D quadratic transformation:

$$u_p(\mathbf{x}) = p_1 x + p_2 y + p_5 + p_7 x^2 + p_8 xy,$$
$$v_p(\mathbf{x}) = p_3 x + p_4 y + p_6 + p_7 xy + p_8 y^2. \qquad (1)$$

The total motion vector of a point can be written as the sum of the motion vector due to the planar surface $(u_p, v_p)$ (as represented in Eq. (1)) and the residual parallax motion vector $(u_r, v_r)$:

$$(u, v) = (u_p, v_p) + (u_r, v_r). \qquad (2)$$

The residual vector can be represented as

[...] (3)
where $\gamma = H/P_z$, $H$ is the perpendicular distance of the point of interest from the reference plane and $P_z$ is its depth. $(T_x, T_y, T_z)$ is the displacement of the camera between two views as expressed in the coordinate system of the reference (or first) view, $T_\perp$ is the perpendicular distance from the camera center of the second view to the plane, and $f$ is the focal length. At each point in the image $\gamma$ varies directly with the height of the corresponding 3D point from the reference surface and inversely with the depth of the point. In [14,19,20] it was shown that the parallax field is a relative affine invariant. Finally, it is noted that aligning the reference plane by warping the inspection image using the parametric motion field $(u_p, v_p)$ also removes all of the rotational components of camera motion.

Since the 3D structure is usually invariant over time (at least over the duration of several seconds or minutes), it can be represented as a mosaic image that can be used to predict the parallax-induced motion over that duration. We refer to this representation of 3D structure as a ‘height’ map (relative to the dominant surface), a term borrowed from aerial imagery analysis.

Thus, the complete mosaic-based representation of a 3D scene sequence (without independently moving objects) would contain: (i) an intensity mosaic image produced by 3D alignment of the sequence frames, (ii) a corresponding height mosaic, (iii) for each frame: the computed 2D alignment transformation of the dominant surface, (iv) for each frame: the computed 3D camera translation. For more details on this representation see [14,13,15].

An example of the mosaic image produced by 3D alignment is shown in Fig. 6. The three original wall images, Frames 1-3, are shown in Figs. 6(a)-(c) respectively. Fig. 6(d) shows a 2D mosaic built using only 2D affine transformations. The 2D affine transformations used align the wall part of the images. However, the objects sticking out of the wall exhibit parallax and are not registered by the affine transformation. As a result, in the 2D mosaic (Fig. 6(d)) there are many ghost (duplicate) lines in the bottom half of the image. The reader's attention is drawn to the image regions corresponding to the duplicate lines in the boxes titled ‘TRY’ and ‘Wooden blocks’ in the left bottom and the smearing on the book title information (e.g., Excel, Word, Getting Started) in the right bottom of Fig. 6(d), respectively.

Fig. 6(e) shows a 3D corrected mosaic image. In this case, using the parallax motion information, the objects sticking out of the wall are correctly positioned and no duplicate lines are visible. The 3D corrected mosaic was made by using wall frame 3 (Fig. 6(c)) as the final destination image. Using the parallax computed from wall frames 1 and 2, wall frame 2 was reprojected into the frame 3 coordinate system. This reprojected image was then merged with frame 3 to make the mosaic image shown in Fig. 6(e). Note that in Fig. 6(c) one cannot see the boxes entitled ‘Wooden blocks’ or ‘TRY’. In the mosaic image, however, they appear and are present in the geometrically correct locations.

The efficiency of representing 3D information with a height map (heights with respect to a surface in the scene) is greater than representing it with a range map (depths relative to the camera). The increased efficiency is due to the fact that the height map is invariant to the camera motion, as opposed to the range map. Moreover, the range of values of a typical height map is significantly smaller than that of a typical camera centered range or ‘depth’ map, and can therefore be much more compactly encoded. The details of the 3D mosaic are described in [15].

Fig. 6. Parallax corrected mosaic to represent 3D scenes. (a)-(c) Three frames of the input video sequence taken from a sideways moving camera. The third frame (c) is used as the reference image. (d) The result of constructing a mosaic based on 2D planar surface alignment. Patterns on the wall are perfectly aligned. However, note that objects ‘sticking out’ of the wall are not well aligned, as indicated by the duplicate lines. (e) The result of constructing a mosaic after correcting for 3D planar parallax. Note that all portions of the scene are well aligned.

2.5.3. Layers and tiles

In principle, the 2D alignment model augmented with 3D parallax information is adequate for all scenes in which there is no independent object motion in the scene. However, in practice, when the 3D scene begins to be cluttered with objects at widely varying depths, and/or when real or ‘fence-like’ transparency is present, the parallax based representation of 3D is highly inefficient. A natural extension to the 2D mosaic is to use multiple layers of 2D mosaics in the manner suggested by Adelson [1].

In this representation, each layer can either represent a different moving object or may represent a surface at a different 3D depth. The benefits of multiple motion analysis and layered representation have been previously described in [1,10,9,11,4]. In our own work, we have developed algorithms for multiple motion segmentation [8] and layered motion recovery [18]. We plan to combine these with the 2D mosaics, and the parallax-based representations described earlier.

Another extension that is necessary in order to handle extended fields of view is a ‘tiled’ representation. To motivate this, consider the simple example of a panning camera. In this case, the use of a single mosaic image plane based on central projection does not meaningfully extend to more than a few degrees of rotation. Reprojecting the views acquired after rotating the camera by 45° or so results in significant distortion of the images, and therefore later in poor image reconstruction. Similarly, when the camera is moved around an object (e.g., even a simple object like a box or a table) to get a frontal view of all of its surfaces, projecting these to a single planar mosaic view will lead to the loss of image information from other views.

In the panning case, a natural alternative is to use a coordinate system based on a spherical retina. However, this approach does not easily generalize to the second example of a moving camera given above. A better approach may be to have a series of tiles that correspond to different mosaic imaging planes (e.g., in the case of the pan, these will be tangent planes to the sphere) and assemble these ‘tiles’ together into a larger mosaic.
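The distortion argument for a panning camera can be made concrete with a small calculation. Under a pinhole model with a pure pan about the camera center, a ray at pan angle θ from the mosaic plane's axis lands at x = f tan θ on a single planar mosaic, so the local stretch along the pan direction, relative to the mosaic center, is 1/cos²θ. The sketch below evaluates that factor and picks the nearest tangent-plane tile. It is only an illustration: the one-dimensional pan-only setup, the tile spacing, and the names `planar_stretch`, `nearest_tile` and `tiled_stretch` are assumptions, not part of the paper.

```python
import math

def planar_stretch(theta_deg):
    """Stretch along the pan direction on a planar mosaic at pan angle theta.

    Relative to the mosaic center: d(tan(theta))/d(theta) = 1 / cos(theta)**2.
    """
    theta = math.radians(theta_deg)
    return 1.0 / math.cos(theta) ** 2

def nearest_tile(pan_deg, tile_centers_deg):
    """Choose the tangent-plane tile whose center direction is closest to the pan angle."""
    return min(tile_centers_deg, key=lambda center: abs(pan_deg - center))

def tiled_stretch(pan_deg, tile_centers_deg):
    """Stretch when the view is projected onto its nearest tile instead of one plane."""
    return planar_stretch(pan_deg - nearest_tile(pan_deg, tile_centers_deg))

if __name__ == "__main__":
    tiles = [0, 30, 60, 90]                  # hypothetical tile spacing of 30 degrees
    print(planar_stretch(45))                # single plane: roughly a factor of 2
    print(tiled_stretch(45, tiles))          # nearest tile is 15 degrees away
```

With tiles every 30 degrees, no view is more than 15 degrees from its tile's axis, so the stretch stays within a few percent, whereas a single plane already doubles it at 45 degrees.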
Each image can be predicted from the tile that corresponds most to that image in terms of resolution and minimizes the distortion. In this way the representation becomes somewhat more complex,

$p_1 x + p_2 y + p_5 + p_7 x^2 + p_8 xy,$

regions of interest (which is initially the entire image region) is used as a match measure. This measure is
