IBG 1044
CBM of U.S. Patent No. 7,693,768

U.S. Patent    Mar. 20, 2001    Sheet 1 of 11    US 6,205,260 B1

[FIG. 1: buffer states in the sprite-based encoder at time t-1; the rotated drawing text is illegible in the scan.]

U.S. Patent    Mar. 20, 2001    Sheet 2 of 11    US 6,205,260 B1

[FIG. 2: buffer states in the sprite-based encoder at time t to t+1; the rotated drawing text is illegible in the scan.]

U.S. Patent    Mar. 20, 2001    Sheet 3 of 11    US 6,205,260 B1

[FIG. 3: buffer states in the sprite-based encoder at time t+1 to t+2; the rotated drawing text is illegible in the scan.]

U.S. Patent    Mar. 20, 2001    Sheet 4 of 11    US 6,205,260 B1

[FIG. 4A (flowchart): initialize mosaic fields and buffers (ℑ_mosaic(s,t)=0, M(s,t)=0) and the warping parameters W_{t0←(t-1)}( ) (19); position at beginning of image I(s,t) (20); compare I(s,t) with the previous encoded/decoded frame C⁻¹C{I(s,t-1)} (26); get next pixel (28); tag new image regions ℑ_change(s,t)=1b; threshold the change, setting ℑ_change(s,t)=0 or 1a; identify new background area (ℑ_change(s,t)==0 && W_{t←t0}(ℑ_mosaic(s,t-1))==0), setting ℑ_nbg(s,t)=1 or ℑ_nbg(s,t)=0 (40).]

U.S. Patent    Mar. 20, 2001    Sheet 5 of 11    US 6,205,260 B1

[FIG. 4B (flowchart, from FIG. 4A): is the associated background region for the mosaic known (W_{t←t0}(ℑ_mosaic(s,t-1))==1)? (42); compare I(s,t) with the current warped mosaic W_{t←t0}(M(s,t-1)); encode foreground regions in image I(s,t) where ℑ_fg(s,t)=1a or 1b; to FIG. 4C.]

U.S. Patent    Mar. 20, 2001    Sheet 6 of 11    US 6,205,260 B1

[FIG. 4C (flowchart, from FIG. 4B): position at beginning of coded/decoded foreground shape; get next pixel (70); mosaic region known (ℑ_mosaic(s,t-1)=1)? (82); background region (W_{t0←t}(ℑ_bg(s,t))=1)?; set M'(s,t-1)=M(s,t-1) or M'(s,t-1)=W_{t0←(t-1)}(C⁻¹C{I(s,t-1)}) (84); to FIG. 4D.]

U.S. Patent    Mar. 20, 2001    Sheet 7 of 11    US 6,205,260 B1

[FIG. 4D (flowchart, from FIG. 4C): encode background regions in image I(s,t) (90); position at beginning of mosaic at time t-1 (92); get next pixel (96); set ℑ_mosaic(s,t)=1 where ℑ_mosaic(s,t-1)=1 (98) or W_{t0←t}(ℑ_bg(s,t))=1 (100), else ℑ_mosaic(s,t)=0 (102); to FIG. 4E.]

U.S. Patent    Mar. 20, 2001    Sheet 8 of 11    US 6,205,260 B1

[FIG. 4E: mosaic update M(s,t) = [1 - α·W_{t0←t}(ℑ_bg(s,t))]·M'(s,t-1) + α·W_{t0←t}(ℑ_bg(s,t))·[M'(s,t-1) + W_{t0←t}(C⁻¹C{ΔI(s,t)})].]

U.S. Patent    Mar. 20, 2001    Sheet 9 of 11    US 6,205,260 B1

[FIG. 5: functional block diagram of the sprite-based encoder; the rotated drawing text is illegible in the scan.]

U.S. Patent    Mar. 20, 2001    Sheet 10 of 11    US 6,205,260 B1

[FIG. 6: video conferencing system 179 with camera control; prior to transmission, switch 190 and off-line mosaic builder 186; during transmission, sprite-based codec 188 (180, 192).]

[FIG. 8: foreground object positions shown in the mosaic 210 at times T0, T0+1 and T0+2.]

U.S. Patent    Mar. 20, 2001    Sheet 11 of 11    US 6,205,260 B1

[FIG. 7: block diagram of the sprite-based video search system; the rotated drawing text is illegible in the scan.]

US 6,205,260 B1

SPRITE-BASED VIDEO CODING SYSTEM WITH AUTOMATIC SEGMENTATION INTEGRATED INTO CODING AND SPRITE BUILDING PROCESSES
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Application No. 60/034,558, filed on Dec. 30, 1996, which is hereby incorporated by reference.
BACKGROUND OF THE INVENTION

The invention relates to a sprite-based encoder and decoder that automatically build a sprite (also called a mosaic) while operating in a separate shape/texture coding environment such as MPEG-4, and to novel applications that utilize the mosaic.
A mosaic image (the terms mosaic and sprite are used interchangeably) is built from images of a certain scene object over several video frames. For instance, a mosaic of a background scene in the case of a panning camera will result in a panoramic image of the background. Two major types of sprites and sprite-based coding are defined in MPEG-4. The first type of mosaic is called an off-line static sprite. An off-line static sprite is a panoramic image which is used to produce a sequence of snapshots of the same video object (such as background). Each individual snapshot is generated by simply warping portions of the mosaic content and copying them to the video buffer where the current video frame is being reconstructed. Static sprites are built off-line and are transmitted as side information.
The second type of mosaic is called an on-line dynamic sprite. On-line dynamic sprites are used in predictive coding of a video object. A prediction of each snapshot of the video object in a sequence is obtained by warping a section of the dynamic sprite. The residual signal is coded and used to update the mosaic in the encoder and the decoder concurrently. The content of a dynamic mosaic may be constantly updated to include the latest video object information. As opposed to static sprites, dynamic sprites are built on-line simultaneously in the encoder and decoder. Consequently, no additional information needs to be transmitted besides mosaic dimensions, a blending factor and the global motion parameters.
Both off-line static and on-line dynamic sprite-based coding require constructing a sprite. In the former case, the sprite is built prior to transmission. In the latter case, the sprite is built on-line during the transmission. So far, MPEG-4 has assumed that the outline (segmentation) of the object for which the sprite is going to be built is known a priori at every time instant. Although this is true in certain applications, especially in post-production or content generation using blue screen techniques, automatic segmentation does not currently exist for sprites built dynamically in encoding environments.

Therefore a need remains for sprite-based coding systems where sprite building does not require a priori knowledge of scene segmentation.
SUMMARY OF THE INVENTION

A sprite-based coding system includes an encoder and decoder where sprite-building is automatic and segmentation of the object used to reconstruct the mosaic is automatic and integrated into the coding process. The sprite object is distinguished from the rest of the video objects on the basis of its motion. The sprite object moves according to the dominant component of the scene motion, which is usually due to camera motion or zoom. Hence, the sprite-based coding system utilizes dominant motion to distinguish the background object in the image from foreground objects in the image. The sprite-based coding system is easily integrated into a video object-based coding framework such as MPEG-4, where shape and texture of individual video objects are coded separately. The automatic segmentation integrated in the sprite-based coding system identifies the shape and texture of the sprite object.
In one application, the sprite-based encoding/decoding system builds a background mosaic off-line prior to the bi-directional transmission of conversational and visual data. The initial mosaic reconstruction phase provides both the videophone encoder and decoder the opportunity to build either a partial or a complete representation of the background behind the videophone user. In a second phase the videophone supports a conversational service between two videophone users and the mosaic is used by the videophone encoder and decoder to implement mosaic-based predictive coding of the visual data. In very low bit rate applications, coding of video frames in terms of the video objects within them may require too much bandwidth, because the shape of such objects may consume a significant portion of the limited bit budget. In such cases, the sprite-based coding system conducts frame-based coding where automatic segmentation is only used to obtain better dominant motion estimation for sprite building and dominant motion-compensated prediction. The sprite-based encoding system can also generate a sprite that has a higher spatial resolution than the original images.
The sprite-based coding system is suitable for applications where the camera view may change frequently, such as video conferencing with multiple cameras, or a talk show captured with more than one camera. In these applications, multiple sprites are built and used as needed. For instance, if a camera goes back and forth between two participants in front of two different backgrounds, two background sprites are built and used as appropriate. More specifically, when background A is visible, building of the sprite for background B and its use in coding is suspended until background B appears again. The use of multiple sprites in this fashion is possible within the MPEG-4 framework, as described below.
A sprite is built during the encoding process. However, the resulting sprite may be subsequently used, after coding, as a representative image of the compressed video clip. Its features can be used to identify the features of a video clip, which is then used in feature-based (or content-based) storage and retrieval of video clips. Hence sprite-based coding, according to another embodiment of the invention, is used to populate a video library of bitstreams or decoded images where sprite images generated during the encoding process act as representative images of the video clips. The mosaics can also be coded using a still image coding method.
In a similar fashion, one or several event lists may be associated with a background sprite. A possible choice for an event list is the set of consecutive positions of one or several vertices belonging to each foreground object. Such a list is then used to generate token representative images of the foreground object position in the sprite. Consecutive positions of each vertex are then either linked by a straight line or share a distinct color. The consecutive positions of the vertex are then shown statically (all successive positions in the same sprite) or dynamically (vertex positions shown in the mosaic successively in time). A vertex is chosen to correspond to any distinctive feature of the foreground object, such as the center of gravity or a salient point in the shape of the object. In the latter case, and if several vertices are used simultaneously, the vertices are arranged according to a hierarchical description of the object shape. With this technique, a user or a presentation interface has the freedom to choose between coarse to finer shapes to show successive foreground object positions in the background sprite. This concept may be used in a video library system to retrieve content based on motion characteristics of the foreground object(s).
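As a concrete illustration of the event-list idea, the sketch below (hypothetical code; the function and argument names are invented) draws the consecutive positions of a single foreground-object vertex into an RGB sprite, linking successive positions with straight lines as in the static presentation:

```python
import numpy as np

def overlay_vertex_trajectory(sprite_rgb, positions, color=(255, 0, 0)):
    """Draw consecutive (x, y) vertex positions into a copy of the sprite,
    linking successive positions with naively rasterized straight lines."""
    out = sprite_rgb.copy()
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        n = max(abs(x1 - x0), abs(y1 - y0)) + 1     # samples along the segment
        for t in np.linspace(0.0, 1.0, n):
            x = round(x0 + t * (x1 - x0))
            y = round(y0 + t * (y1 - y0))
            out[y, x] = color
    return out
```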
The mosaic can also be warped at a fixed zooming factor to provide more resolution than the images used to generate the mosaic. An optimal view point for the mosaic is provided by locating an optimized center reference frame instant and then building the mosaic around the reference frame instant.

The foregoing and other objects, features and advantages of the invention will become more readily apparent from the following detailed description of a preferred embodiment of the invention, which proceeds with reference to the accompanying drawings.
BRIEF DESCRIPTIONS OF THE DRAWINGS

FIG. 1 illustrates the buffers used in the sprite-based encoder according to the invention at time t-1.

FIG. 2 illustrates the buffers used in the sprite-based encoder at time t to t+1.

FIG. 3 illustrates the buffers used in the sprite-based encoder at time t+1 to t+2.

FIGS. 4A-4E show a flow diagram describing operation of the sprite-based encoder illustrated in FIGS. 1-3.

FIG. 5 is a functional block diagram of the sprite-based encoder.

FIG. 6 is a diagram of a sprite-based video conferencing system according to another embodiment of the invention.

FIG. 7 is a block diagram of a sprite-based video search system according to another embodiment of the invention.

FIG. 8 shows how consecutive positions of a foreground object may be represented in a mosaic according to yet another aspect of the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The sprite-encoder system progressively learns to dissociate foreground from background while building a background mosaic at the same time. Once an initialization step (step 1) is complete, steps 2 to 11 described below are repeated until the construction of the background is complete or until it is aborted. The meaning of a region as referred to below is either a single set of contiguous pixels or alternatively several disconnected sets of contiguous pixels in the image. The single region or the union of these disconnected regions are referred to interchangeably as a region.
The notations are as follows: I(s,t) denotes the content of the video frame at spatial position s and at time t. W_{t←(t-1)}(I(s,t-1)) denotes a warping operator which maps the image at time (t-1) to time t. For a given pixel location s0 in a video buffer at time t, this warping operation is performed by copying the pixel value at the corresponding location s in frame (t-1). The correspondence between location s0 and location s is established by a particular and well defined transformation such as an affine or a perspective transformation, which are well known to those skilled in the art. ℑ_J(s,t), an indicator buffer for quantity J, denotes a particular state related to J at any pixel location s within the image at time t. Thresh is a threshold value. The operations ≧Thresh and ≦Thresh are symbolic and can represent complex thresholding operations.
The size (per color component) of the current image frame I(s,t) is M_I×N_I, and the size of the previous compressed/decompressed frame after warping, W_{t←(t-1)}(C⁻¹C{I(s,t-1)}), is such that it can be inscribed in a rectangular array of M_I'×N_I' pixels. A sprite M(s,t) is an image intensity (texture) buffer of size M_M×N_M per color component. The field ℑ_mosaic(s,t) is a single component field of the same size.
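To make the warping operator concrete, here is a minimal sketch, not from the patent itself: it assumes the affine case named above and invents its own function and argument names. For each destination pixel s0 at time t it copies the value at the corresponding source location s = A·s0 + b in the frame at time t-1, using nearest-neighbor sampling and leaving pixels whose source falls outside the frame support at zero. The same routine can warp indicator fields such as ℑ_mosaic.

```python
import numpy as np

def warp_affine(image, A, b):
    """Warp `image` (frame t-1) onto the current pixel grid: for each
    destination pixel s0, copy the value at source location s = A @ s0 + b.
    Nearest-neighbor sampling; out-of-support destinations stay 0."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    ys, xs = np.mgrid[0:h, 0:w]                      # destination coordinates s0
    src = A @ np.stack([xs.ravel(), ys.ravel()]) + b[:, None]
    sx = np.rint(src[0]).astype(int)                 # source column for each s0
    sy = np.rint(src[1]).astype(int)                 # source row for each s0
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ys.ravel()[valid], xs.ravel()[valid]] = image[sy[valid], sx[valid]]
    return out
```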
Referring now to FIGS. 1-4, a mosaic of the background is built as follows. The construction of the sprite is started at time t. The image I(s,t-1) has already been compressed and decompressed and is available in both the encoder and the decoder. In the following steps, the image content is assumed to have a background and a foreground part which are transmitted as separate video objects (or VOs).

FIG. 1 illustrates steps 1 through 11 from time t-1, the instant when mosaic building is initiated, to time t when the new video frame or field has been acquired. FIGS. 2 and 3 illustrate steps 2 through 11 from time t to t+1 and from time t+1 to t+2, respectively. FIG. 4 shows the operations performed by the sprite-based encoder during steps 1-11.
At the top left corner in each of these FIGS. 1-3 is shown a newly acquired video frame (A) which is compared to the previous video frame (B) (next image field to the right), once it has been coded/decoded and warped (step 2). Step 3 is illustrated by the rightmost image field (C) in the first row of FIGS. 1-3. This field shows the area where content change has been detected. The status of the mosaic buffer at time t (before updating occurs) is shown in the leftmost image field in the second row (D). The buffer is used to identify the new background areas as described in step 4 below. These areas correspond to regions where background was not known until now.

Foreground identification is illustrated by the rightmost image in the second row (E). The operations associated with this image (E) that use the change map, the mosaic indicator buffer and the new background areas to define the foreground are described in step 5. The two leftmost image fields in the third row (G, H) illustrate steps 6 and 7 of the invention, respectively. In H, background shape information comes from the coded/decoded foreground shape information obtained in G. Finally, the mosaic updating process is illustrated by the image field (I) in the bottom right corner. This process takes place in steps 8, 9, 10 and 11 of the sprite-based encoding process.
Step 1: Initialization.

In operation 19 in FIG. 4, the binary field ℑ_mosaic(s,t) is initialized to 0 for every position s in the buffer, meaning that the content of the mosaic is unknown at these locations. The content of the mosaic buffer M(s,t) is initialized to 0. The warping parameters from the current video frame I(s,t-1) to the mosaic are initialized to W_{t0←(t-1)}(any), t0 here representing an arbitrary fictive time. This initial warping is important as it provides a way to specify the "resolution" or the "time reference" used to build the mosaic. Potential applications of this initial mapping are making a mosaic with super spatial resolution or selecting an optimal time t0 minimizing the distortion introduced by the method, as described in further detail below. A super spatial resolution mosaic is a mosaic with increased resolution compared to the original individual images in the sequence.
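In code, step 1 amounts to zeroing the two mosaic buffers and fixing the reference-time warp. A minimal sketch follows; the buffer sizes and variable names are hypothetical, and the identity map stands in for W_{t0←(t-1)}(any):

```python
import numpy as np

M_M, N_M = 1024, 2048                  # hypothetical mosaic size per color component

mosaic = np.zeros((M_M, N_M), dtype=np.float32)      # M(s,t): texture, initialized to 0
mosaic_known = np.zeros((M_M, N_M), dtype=np.uint8)  # indicator: 0 = content unknown

# Warp from frame t-1 to the fictive reference time t0, initialized to the
# identity here; initializing with a scaling map instead would yield a
# super-spatial-resolution mosaic.
A_t0 = np.eye(2, dtype=np.float32)
b_t0 = np.zeros(2, dtype=np.float32)
```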
Step 2: Acquisition.

The image I(s,t) is acquired and the forward warping parameters for mapping the image I(s,t-1) to I(s,t) are computed. The number of warping parameters as well as the method for estimating these parameters are known to those skilled in the art and are therefore not specified here. For example, a dominant motion estimation algorithm such as the one given in J. Bergen et al., "A three-frame algorithm for estimating two-component image motion," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 14, pp. 886-896, September 1992, may be used. The warping parameters are composed with the current warping parameters, resulting in the mapping W_{t←t0}(any). These parameters are transmitted to a decoder either directly or indirectly in the form of a series of coded vertex coordinates. The sprite-based encoder is positioned at the beginning of the image frame I(s,t) in operation 20 in FIG. 4 and the first pixel in the image frame is obtained in operation 24.
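Under the affine assumption used in the warping sketch above, composing the newly estimated frame-to-frame parameters with the accumulated parameters is a matrix product; a sketch with invented names:

```python
import numpy as np

def compose_affine(A_outer, b_outer, A_inner, b_inner):
    """Composition of affine maps, s -> A_outer @ (A_inner @ s + b_inner) + b_outer,
    used to fold the frame-to-frame warp into the running map from time t0."""
    return A_outer @ A_inner, A_outer @ b_inner + b_outer
```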
Step 3: Detect Change in Content between Previously Coded/Decoded Frame and Current Frame.

i) Initialization of a large buffer of size M_b×N_b greater than the image (M_b>M_I, N_b>N_I) and possibly as large as the mosaic. The buffer is initialized to 3 to indicate unknown status: ℑ_change(s,t)=3.

ii) In operations 26, 28, 32, 34 and 36 in FIG. 4, scene changes (motion compensated) are computed over common image support. Give label 0 to all locations where the change in content is deemed small (34). Give label 1a to locations where the change is detected to be large (36). To make regions more homogeneous, implement additional operations (e.g. morphological operations) which either reset the label from 1a to 0 or set the label from 0 to 1a. Regions labeled 0 will typically be considered and coded as part of the background Video Object while regions labeled 1a will typically be encoded as part of the foreground Video Object.

    ℑ_change(s,t) = 1a  if |I(s,t) - W_{t←(t-1)}(C⁻¹C{I(s,t-1)})| ≧ Thres_change
                  = 0   otherwise

where Thres_change denotes a pre-defined threshold value.

iii) In operations 28 and 30 in FIG. 4, new image regions are tagged, where the support of the image at time t does not overlap with the support of the image at time (t-1), as ℑ_change(s,t)=1b.
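A per-pixel sketch of step 3 follows. It is illustrative only: the labels 1a and 1b are encoded as the integers 1 and 2, the buffer is sized like the frame rather than the larger M_b×N_b buffer of case i), and the morphological clean-up of case ii) is omitted.

```python
import numpy as np

def detect_change(frame, warped_prev_decoded, support_prev, thres_change):
    """Step 3 sketch: per-pixel change labels against the warped previous
    encoded/decoded frame. 3 = unknown, 0 = unchanged, 1 = changed over
    common support ("1a"), 2 = new image region with no overlap ("1b")."""
    change = np.full(frame.shape, 3, dtype=np.uint8)          # unknown status
    common = support_prev.astype(bool)                        # overlap of the two supports
    diff = np.abs(frame.astype(np.float32)
                  - warped_prev_decoded.astype(np.float32)) >= thres_change
    change[common & ~diff] = 0                                # small change
    change[common & diff] = 1                                 # large change: label 1a
    change[~common] = 2                                       # no overlap: label 1b
    return change
```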
Step 4: Identify New Background Areas.

In operations 38, 40 and 41 in FIG. 4, a new background area is detected if there has not been any change in image content in the last two video frames. The corresponding area in the mosaic must also indicate that the background at this location is unknown. The resulting new background area is then pasted to any neighboring regions where background is known. As will be seen in later steps, incorporation of new background data into the mosaic must be done according to compressed/de-compressed background shape information to avoid any drift between encoder and decoder.

    ℑ_nbg(s,t) = 1  if (ℑ_change(s,t)==0) && (W_{t←t0}(ℑ_mosaic(s,t-1))==0)
               = 0  otherwise

Here, the indicator value 0 means that the background is unknown.
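Step 4's test is a pixelwise conjunction of the two indicator fields; a sketch consistent with the label encoding above:

```python
import numpy as np

def detect_new_background(change, mosaic_known_warped):
    """Step 4 sketch: new background appears where no content change was
    detected (label 0) and the warped mosaic indicator still reads unknown."""
    return ((change == 0) & (mosaic_known_warped == 0)).astype(np.uint8)
```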
Step 5: Perform Foreground/Background Segmentation.

Operations 44, 46, 48 and 50 in FIG. 4 first look at regions where the background is known (ℑ_mosaic(s,t-1)=1). Thresholding is performed (46) to distinguish the foreground from the background (case (i)). For regions where the background is not known, operations 56-62 tag as foreground any regions where changes have occurred (labels 1a and 1b defined in step 3) (cases (iii) and (iv)). Operations 52 and 54 in case (ii) represent new background areas that are excluded from being part of the foreground.

i) if W_{t←t0}(ℑ_mosaic(s,t-1))==1:

    ℑ_fg(s,t) = 1  if |I(s,t) - W_{t←t0}(M(s,t-1))| > Thres_fg
              = 0  otherwise

where Thres_fg is a pre-defined threshold value, which is used here to segment foreground from background.

ii) else if ℑ_nbg(s,t)==1: ℑ_fg(s,t)=0

iii) else if (W_{t←t0}(ℑ_mosaic(s,t-1))==0) && (ℑ_change(s,t)==1a): ℑ_fg(s,t)=1a

iv) else if (W_{t←t0}(ℑ_mosaic(s,t-1))==0) && (ℑ_change(s,t)==1b): ℑ_fg(s,t)=1b

In cases (iii) and (iv), a sub-classification of the regions tagged as 1 into either regions 1a or 1b is used for the sole purpose of providing the encoder with the flexibility to follow different macroblock selection rules. For example, regions tagged as 1a might preferably be coded with inter-frame macroblocks since these regions occur over common image support. On the other hand, regions tagged as 1b might preferably be coded with intra-frame macroblocks since these regions do not share a common support with the previous frame. These regions can be tagged as regions "1" if such a distinction does not need to be stored or is not used by the encoder.

The sprite-based encoder then jumps back to operations 22 and 24 in FIG. 4. The next pixel is then obtained for the image frame and steps 3, case ii), through 5 are repeated for the next pixel.
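The four cases of step 5 translate into a few masked assignments. The sketch below uses the same hypothetical label encoding as the earlier snippets (1 for 1a, 2 for 1b):

```python
import numpy as np

def segment_foreground(frame, warped_mosaic, mosaic_known_warped,
                       new_bg, change, thres_fg):
    """Step 5 sketch: per-pixel foreground indicator. 0 = background,
    1 = foreground over common support ("1a"), 2 = foreground in a new
    image region ("1b")."""
    fg = np.zeros(frame.shape, dtype=np.uint8)
    known = mosaic_known_warped == 1
    # (i) background known in the mosaic: threshold against the mosaic prediction
    big_diff = np.abs(frame.astype(np.float32)
                      - warped_mosaic.astype(np.float32)) > thres_fg
    fg[known & big_diff] = 1
    # (ii) new background areas are excluded from the foreground
    fg[new_bg == 1] = 0
    # (iii)/(iv) background unknown: propagate the change labels
    fg[~known & (new_bg == 0) & (change == 1)] = 1            # label 1a
    fg[~known & (new_bg == 0) & (change == 2)] = 2            # label 1b
    return fg
```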
Step 6: Compress/Decompress Foreground Shape and Texture.

After all pixels in the image frame have been analyzed, and applicable pixels tagged as foreground or background regions, the sprite-based encoder jumps to operation 64 in FIG. 4. A conventional (I, P or B-VOP) prediction mode is used to encode foreground regions labeled as 1a and 1b. In the case of P or B-VOPs, individual macroblocks can either use inter-frame prediction or intra-frame coding. The pixels corresponding to regions labeled 1b (which can include revealed background not represented in the mosaic) are preferably coded as intra macroblocks.

The shape of the foreground is compressed and transmitted as well. Once de-compressed, this shape is used by the encoder and decoder to update the content of the mosaic. This process can be performed using a standard MPEG-4 VM version 9.0 encoding technique such as described in "MPEG-4 Video Verification Model Version 9.0", Document ISO/IEC JTC1/SC29/WG11/N1869. The sprite-based encoder is then positioned at the beginning of the coded/decoded foreground shape in operation 66 and the first pixel of the shape bitmap is obtained in operation 70 of FIG. 4.
Step 7: Get Background Shape.

In operations 72-76 in FIG. 4, the background shape is then obtained from the compressed/de-compressed foreground shape. Compression/de-compression is necessary here to ensure that the encoder and decoder share the same shape information.

    ℑ_bg(s,t) = 1  if C⁻¹C{ℑ_fg(s,t)} == 0
              = 0  otherwise

where C⁻¹C{ } denotes shape coding/decoding, which is performed using standard MPEG-4 coding techniques as described above.
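Step 7 is simply the complement of the decoded foreground shape; a one-line sketch:

```python
import numpy as np

def background_shape(decoded_fg_shape):
    """Step 7 sketch: the background indicator is 1 exactly where the
    compressed/decompressed foreground shape is 0, so encoder and decoder
    derive identical shapes."""
    return (decoded_fg_shape == 0).astype(np.uint8)
```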
Step 8: Initialize New Background Texture in Mosaic.

Operations 78-84 in FIG. 4 identify where new background has occurred and initialize the mosaic with content found in the previous video frame (time (t-1)). Note that the field ℑ_nbg(s,t) cannot be used here since this information is unknown to the decoder.
    M'(s,t-1) = M(s,t-1)                        if ℑ_mosaic(s,t-1)==1
              = W_{t0←(t-1)}(C⁻¹C{I(s,t-1)})    if (W_{t0←t}(ℑ_bg(s,t))==1) && (ℑ_mosaic(s,t-1)==0)
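In code, step 8 seeds M'(s,t-1) from the previous decoded frame wherever background shape is newly available; a sketch in which every array is assumed pre-warped to the mosaic reference time t0:

```python
import numpy as np

def init_new_mosaic_regions(mosaic_prev, mosaic_known, warped_bg_shape,
                            prev_decoded_warped):
    """Step 8 sketch: where the background shape is newly known but the
    mosaic is not, seed M'(s,t-1) from the previous decoded frame warped
    to time t0; elsewhere keep the existing mosaic content."""
    m_prime = mosaic_prev.copy()
    new_area = (warped_bg_shape == 1) & (mosaic_known == 0)
    m_prime[new_area] = prev_decoded_warped[new_area]
    return m_prime
```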
Step 9: Calculate Background Texture Residuals from Mosaic Prediction.

If ℑ_bg(s,t)==1, a difference signal is calculated in operation 86 by using mosaic content as a predictor. The resulting ΔI(s,t), calculated in operation 88, is used to compute the difference signal. In operations 68 and 70 in FIG. 4, the sprite-based encoder repeats steps 7-9 above for each identified foreground image pixel. The difference signal over each macroblock is compared to conventional difference signals produced by using prediction from the previous and the next video frame (P or B prediction mode). The macroblock type is selected according to the best prediction mode. The residual signal is transmitted to the decoder along with the compressed background shape as described in "Result of N3 Experiment Using Unified S2/N3 Syntax", Regis J. Crinon and Ibrahim Sezan, Document ISO/IEC JTC1/SC29/WG11/M1404.

    ΔI(s,t) = I(s,t) - W_{t←t0}(M'(s,t-1))

Operations 90 and 92 in FIG. 4 encode background regions in the image I(s,t) where ℑ_bg(s,t)==1 and then start subsequent processing at the beginning pixel of the mosaic.
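The background residual of step 9 is a masked difference against the warped mosaic predictor; a sketch:

```python
import numpy as np

def background_residual(frame, m_prime_warped_to_t, bg_shape):
    """Step 9 sketch: difference signal using the warped mosaic M'(s,t-1)
    as predictor, evaluated only where the pixel is tagged as background."""
    delta = np.zeros(frame.shape, dtype=np.float32)
    mask = bg_shape == 1
    delta[mask] = frame[mask].astype(np.float32) - m_prime_warped_to_t[mask]
    return delta
```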
Step 10: Update Background Shape in Mosaic.

Operations 98-104 in FIG. 4 update the mosaic indicator map to include the shape of the new background.

    ℑ_mosaic(s,t) = 1  if ℑ_mosaic(s,t-1)==1
                  = 1  if (W_{t0←t}(ℑ_bg(s,t))==1) && (ℑ_mosaic(s,t-1)==0)
                  = 0  otherwise
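Step 10 is a pixelwise OR of the old indicator with the warped new background shape; a sketch:

```python
import numpy as np

def update_mosaic_indicator(mosaic_known_prev, warped_bg_shape):
    """Step 10 sketch: mosaic content is known wherever it already was, or
    wherever new background shape has just been warped into the mosaic."""
    return ((mosaic_known_prev == 1) | (warped_bg_shape == 1)).astype(np.uint8)
```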
Step 11: Update Mosaic.

Operations 104 and 106 in FIG. 4 update the content of the mosaic in regions corresponding to new or non-covered background in frame t.

    M(s,t) = [1 - α·W_{t0←t}(ℑ_bg(s,t))]·M'(s,t-1)
             + α·W_{t0←t}(ℑ_bg(s,t))·[M'(s,t-1) + W_{t0←t}(C⁻¹C{ΔI(s,t)})]

The selection of the value of the blending parameter α (0≦α≦1) in the above equation is application dependent. The sprite-based encoder then jumps back to operations 94 and 96 and processes the next pixel of the mosaic. After all background regions have been updated, the segmentation process described above in steps 2-11 is repeated for the next image frame acquired in operation 108.
The method described above builds the mosaic with reference to time t0, which can be a time instant in the past or can be the current time or a future time instant. It is straightforward to rewrite the above equations for the case where the mosaic is continuously warped to the current time instant t.

Referring to FIG. 5, a block diagram of the sprite-based encoder 109 is depicted. The purpose of this drawing is to highlight the dependencies in the various components and quantities used by the method of the invention. It also emphasizes the various warping and un-warping stages that are necessary to align consecutive video fields.

The current image to be encoded 114 is used along with the previous original image 110 to estimate the global
motion (warping) parameters in 116. The output of estimator 116 is transmitted over a communication channel 122 (the encoding of these parameters can take various forms, including a series of coded vertex coordinates). The output of estimator 116 is also passed to a warping block 120 which composes (computes accumulated) warping transformations in time. A map indicating regions to be updated 128 and initialized new background 130 are used for generating a background mosaic 134. The image 144, which represents image 114 once it has been segmented into foreground and background regions, is generated from the warped background mosaic buffer 136, the warped mosaic indicator map 133 and identified new background regions 150.

A background shape signal and background update pixel values 164 are generated from the input image 114, the warped mosaic 136, uncovered background in the previous encoded/decoded image 158 and the encoded/decoded foreground shape of the new image 114 being encoded. The coded background shape and texture residuals 166 and the coded foreground shape and texture 170 are transmitted in block 172. The contents from block 172 are decoded in block 156 and warped in blocks 152 and 154. The decoded, warped, previous image from block 152 and the foreground/background-segmented picture 144 are used to calculate the foreground residual signal in block 160. Uncovered background regions are identified in block 158 from the warped decoded previous image generated in block 154 as well as from the current mosaic indicator map 124. The uncovered background regions 153 are used to initialize new mosaic regions in block 130 (this corresponds to the bottom equality in step 8 above).
A Mosaic-Based Video Conferencing and Videophone System.

FIG. 6 shows a block diagram of a video conferencing system 179 that uses an off-line built background sprite as a dynamic sprite during transmission. Prior to transmission of encoded images, the encoding system 179 conducts a time adjustable configuration phase. During the configuration phase, a switch 190 disables sprite-based encoding of images 180 until a background mosaic is built off-line by an off-line mosaic builder 186. By first building a background mosaic off-line, coding efficiency is increased because the mosaic can incorporate all the background regions that the videophone system needs to represent in the course of the conversational transmission between the two end users.
Prior to transmission, a switch 184 connects a video camera 182 to the off-line mosaic builder 186. Small displacements of a foreground object, such as the head and shoulders in image 180, are used to build a background mosaic. The head displacements allow the off-line mosaic builder 186 to identify and build the background image behind the head and shoulders. The displacements of the foreground are voluntary where the system 179 directs the foreground in image 180 to move in different directions until a substantially complete background mosaic is derived in mosaic builder 186. Alternatively, the system 179 does not direct the user to move in different directions, reducing the gain in coding efficiency. The sprite-based encoding technique described above in steps 1-11 is used to build the background mosaic.
The sprite-based codec 188 then uses the mosaic during video transmission as a dynamic sprite with the blending factor set to 0 to prevent any updating of the mosaic. In this case, macroblock types may be dynamic or static. In one extreme case, all macroblocks are static-type macroblocks, meaning that the background mosaic is used as a static sprite. In another extreme case, all macroblocks are dynamic and the mosaic is being used as a dynamic (predictive) sprite. This latter case requires a higher data transmission bandwidth.
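Freezing the mosaic in this way corresponds to α = 0 in the step 11 update sketched earlier; with toy data, using the update_mosaic sketch above:

```python
import numpy as np

# With the blending factor at 0, the step 11 update leaves the mosaic intact.
m_prime = np.random.rand(64, 64).astype(np.float32)    # toy mosaic content
residual = np.random.rand(64, 64).astype(np.float32)   # toy decoded residual
bg_shape = np.ones((64, 64), dtype=np.uint8)           # background everywhere
assert np.allclose(update_mosaic(m_prime, residual, bg_shape, alpha=0.0), m_prime)
```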
A Mosaic-Based Video Database

Referring to FIG. 7, a sprite-based video search system 194 uses the sprite-based encoder 109 to populate and search a database 198 of video bitstreams 202, i.e., a database of compressed bitstreams. The sprite-based encoder 109 compresses video clips 193 using the sprite-based encoding technique described above in steps 1-11. Features of the video clips 193 can no longer be visually identified in the compressed bitstream 202. Thus, a user would have to decompress the video bitstream 202 for all video clips 193 until the correct video clip is located.

However, the sprite-based encoder 109 generates both the compressed bitstream 202 and a mosaic 195 generated during the encoding process. The mosaic image 195 is used as a representative image of the video clip bitstream 202. The features in mosaic 195 are then analyzed and representative features 208 extracted by an analysis engine 196. The representative features 208 are inserted along with the video bitstream into a video database 198. A search
