Signal Processing: Image Communication 15 (2000) 365–385

MPEG-4 natural video coding – An overview

Touradj Ebrahimi^a,*, Caspar Horne^b

^a Signal Processing Laboratory, Swiss Federal Institute of Technology – EPFL, 1015 Lausanne, Switzerland
^b Mediamatics Inc., 48430 Lakeview Blvd., Fremont, CA 94538, USA

Abstract

This paper describes the MPEG-4 standard, as defined in ISO/IEC 14496-2. The MPEG-4 visual standard is developed to provide users a new level of interaction with visual contents. It provides technologies to view, access and manipulate objects rather than pixels, with great error robustness at a large range of bit-rates. Application areas range from digital television and streaming video to mobile multimedia and games. The MPEG-4 natural video standard consists of a collection of tools that support these application areas. The standard provides tools for shape coding, motion estimation and compensation, texture coding, error resilience, sprite coding and scalability. Conformance points in the form of object types, profiles and levels provide the basis for interoperability. Shape coding can be performed in binary mode, where the shape of each object is described by a binary mask, or in gray scale mode, where the shape is described in a form similar to an alpha channel, allowing transparency and reducing aliasing. Motion compensation is block based, with appropriate modifications for object boundaries. The block size can be 16×16 or 8×8, with half pixel resolution. MPEG-4 also provides a mode for overlapped motion compensation. Texture coding is based on the 8×8 DCT, with appropriate modifications for object boundary blocks. Coefficient prediction is possible to improve coding efficiency. Static textures can be encoded using a wavelet transform. Error resilience is provided by resynchronization markers, data partitioning, header extension codes, and reversible variable length codes. Scalability is provided for both spatial and temporal resolution enhancement. MPEG-4 provides scalability on an object basis, with the restriction that the object shape has to be rectangular. MPEG-4 conformance points are defined at the Simple Profile, the Core Profile, and the Main Profile. The Simple and Core Profiles address typical scene sizes of QCIF and CIF, with bit-rates of 64, 128 and 384 kbit/s, and 2 Mbit/s. The Main Profile addresses typical scene sizes of CIF, ITU-R 601 and HD, with bit-rates of 2, 15 and 38.4 Mbit/s. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: MPEG-4 visual; Video coding; Object based coding; Streaming; Interactivity

* Corresponding author.
E-mail address: touradj.ebrahimi@epfl.ch (T. Ebrahimi)

0923-5965/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0923-5965(99)00054-5

1. Introduction

Multimedia commands the growing attention of the telecommunications, consumer electronics, and computer industries. In a broad sense, multimedia is assumed to be a general framework for interaction with information available from different sources, including video.

A multimedia standard is expected to provide support for a large number of applications. These applications translate into specific sets of requirements which may be very different from each other. One theme common to most applications is the need for supporting interactivity with different
kinds of data. Applications related to visual information can be grouped together on the basis of several features:
- type of data (still images, stereo images, video, etc.),
- type of source (natural images, computer generated images, text/graphics, medical images, etc.),
- type of communication (ranging from point-to-point to multipoint-to-multipoint),
- type of desired functionalities (object manipulation, on-line editing, progressive transmission, error resilience, etc.).

Video compression standards MPEG-1 [5] and MPEG-2 [6], although perfectly well suited to the environments for which they were designed, are not necessarily flexible enough to efficiently address the requirements of multimedia applications. Hence, MPEG (Moving Picture Experts Group) committed itself to the development of the MPEG-4 standard, providing a common platform for a wide range of multimedia applications [3]. MPEG has been working on the development of the MPEG-4 standard since 1993, and finally, after about 6 years of effort, an International Standard covering the first version of MPEG-4 has recently been adopted [7].

This paper provides an overview of the major natural video coding tools and algorithms as defined in the International Standard ISO/IEC 14496-2 (MPEG-4). Additional features and improvements that will be defined in an amendment, due in 2000, are not described here.

2. MPEG-4 visual overview

2.1. Motivation

Digital video is replacing analog video in many existing applications. A prime example is the introduction of digital television, which is starting to see wide deployment. Another example is the progressive replacement of analog video cassettes by DVD as the preferred medium to watch movies. MPEG-2 has been one of the key technologies that enabled the acceptance of these new media. In these existing applications, digital video initially provides functionalities similar to analog video, i.e. the content is represented in digital form instead of analog, with obvious direct benefits such as improved quality and reliability, but the content remains the same to the user. However, once the content is in the digital domain, new functionalities can easily be added that will allow the user to view, access, and manipulate the content in completely new ways. The MPEG-4 standard provides key technologies that will enable such functionalities.

2.2. Application areas

2.2.1. Digital TV
With the phenomenal growth of the Internet and the World Wide Web, interest in advanced interactivity with content provided by digital television is increasing. Additional text, pictures, audio or graphics that can be controlled by the user can add to the entertainment value of certain programs, or provide valuable information unrelated to the current program but of interest to the viewer. TV station logos, customized advertising, and multi-window screen formats allowing display of sports statistics or stock quotes using data-casting are some examples of increased functionalities. Providing the capability to link and to synchronize certain events with video would further improve the experience. Coding and representation of not only frames of video, but also individual objects in the scene (video objects), can open the door for completely new ways of television programming.

2.2.2. Mobile multimedia
The enormous popularity of cellular phones and palm computers indicates the interest in mobile communications and computing. Using multimedia in these areas would enhance the user's experience and improve the usability of these devices. Narrow bandwidth, limited computational capacity and the reliability of the transmission media are limitations that currently hamper widespread use of multimedia here. Providing improved error resilience, improved coding efficiency and flexibility in assigning computational resources would bring mobile multimedia applications closer to reality.

2.2.3. TV production
Content creation is increasingly turning to virtual production techniques as extensions of the well-known chroma keying. The scene and the actors are recorded separately, and can be mixed with additional computer generated special effects. By coding video objects instead of rectangular linear video frames, and allowing access to the video objects, the scene can be rendered with higher quality and with more flexibility. Television programs consisting of composited video objects, and additional graphics and audio, can then be transmitted directly to the viewer, with the additional advantage of allowing the user to control the programming in a more sophisticated way. In addition, depending on the targeted viewers, local TV stations could inject regional advertisement video objects, better suited when international programs are broadcast.

2.2.4. Games
The popularity of games on stand-alone game machines and on PCs clearly indicates the interest in user interaction. Most games currently use three-dimensional graphics, both for the environment and for the objects that are controlled by the players. The addition of video objects to these games would make them even more realistic, and using overlay techniques, the objects could be made more life-like. Access to individual video objects is essential, and using standards-based technology would make it possible to personalize games by linking personal video databases into the games in real time.

2.2.5. Streaming video
Streaming video over the Internet is becoming very popular, using viewing tools implemented as software plug-ins for a Web browser. News updates and live music shows are just two examples of the many possible video streaming applications. Here, bandwidth is limited due to the use of modems, and transmission reliability is an issue, as packet loss may occur. Increased error resilience and improved coding efficiency will improve the experience of streaming video. In addition, scalability of the bitstream, in terms of temporal and spatial resolution, but also in terms of video objects, under the control of the viewer, will further enhance the experience and also the use of streaming video.

2.3. Features and functionalities

The MPEG-4 visual standard consists of a set of tools that enable applications by supporting several classes of functionalities. The most important features covered by the MPEG-4 standard can be clustered into three categories (see Fig. 1) and summarized as follows:

1. Compression efficiency: Compression efficiency has been the leading principle for MPEG-1 and MPEG-2, and in itself has enabled applications such as digital TV and DVD. Improved coding efficiency and coding of multiple concurrent data streams will increase acceptance of applications based on the MPEG-4 standard.
2. Content-based interactivity: Coding and representing video objects rather than video frames enables content-based applications. It is one of the most important novelties offered by MPEG-4. Based on efficient representation of objects, object manipulation, bitstream editing, and object-based scalability allow new levels of content interactivity.
3. Universal access: Robustness in error-prone environments allows MPEG-4 encoded content to be accessible over a wide range of media, such as mobile networks as well as wired connections. In addition, object-based temporal and spatial scalability allow the user to decide where to use scarce resources, which can be the available bandwidth, but also the computing capacity or power consumption.

Fig. 1. Functionalities offered by the MPEG-4 visual standard.
To support some of these functionalities, MPEG-4 should provide the capability to represent arbitrarily shaped video objects. Each object can be encoded with different parameters and at different qualities. The shape of a video object can be represented in MPEG-4 by a binary or a gray-level (alpha) plane. The texture is coded separately from its shape. For low-bit-rate applications, frame-based coding of texture can be used, similar to MPEG-1 and MPEG-2. To increase robustness to errors, special provisions are taken into account at the bitstream level to allow fast resynchronization and efficient error recovery.

The MPEG-4 visual standard has been explicitly optimized for three bit-rate ranges:
(1) below 64 kbit/s,
(2) 64–384 kbit/s,
(3) 384 kbit/s–4 Mbit/s.
For high-quality applications, higher bit-rates are also supported, using the same set of tools and the same bitstream syntax as available at the lower bit-rates.

MPEG-4 provides support for both interlaced and progressive material. The chrominance format that is supported is 4:2:0. In this format the number of Cb and Cr samples is half the number of luminance samples in both the horizontal and vertical directions. Each component can be represented by a number of bits ranging from 4 to 12.
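
As a numerical illustration of 4:2:0 subsampling, the following sketch (a hypothetical helper, not part of the standard) computes the per-component sample counts for a given frame size:

```python
def yuv420_sample_counts(width, height):
    """Return the number of samples per component for a 4:2:0 frame.

    In 4:2:0, Cb and Cr have half the luminance resolution both
    horizontally and vertically, i.e. a quarter of the Y samples.
    Illustrative helper only; width and height are assumed even.
    """
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return {"Y": luma, "Cb": chroma, "Cr": chroma}
```

For a CIF frame (352×288) this gives 101376 luminance samples and 25344 samples for each chrominance component.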

2.4. Structure and syntax

The central concept defined by the MPEG-4 standard is the audio-visual object, which forms the foundation of the object-based representation. Such a representation is well suited for interactive applications and gives direct access to the scene contents. Here we will limit ourselves mainly to natural video objects. However, the discussion remains quite valid for other types of audio-visual objects. A video object may consist of one or more layers to support scalable coding. The scalable syntax allows the reconstruction of video in a layered fashion, starting from a stand-alone base layer and adding a number of enhancement layers. This allows applications to generate a single MPEG-4 video bitstream for a variety of bandwidth and/or computational complexity requirements. A special case where a high degree of scalability is needed is when static image data is mapped onto two- or three-dimensional objects. To address this functionality, MPEG-4 provides a special mode for encoding static textures using a wavelet transform.

An MPEG-4 visual scene may consist of one or more video objects. Each video object is characterized by temporal and spatial information in the form of shape, motion and texture. For certain applications video objects may not be desirable, because of either the associated overhead or the difficulty of generating video objects. For those applications, MPEG-4 video allows coding of rectangular frames, which represent a degenerate case of an arbitrarily shaped object.

An MPEG-4 visual bitstream provides a hierarchical description of a visual scene, as shown in Fig. 2. Each level of the hierarchy can be accessed in the bitstream by special code values called start codes. The hierarchical levels that describe the scene most directly are:
- Visual Object Sequence (VS): The complete MPEG-4 scene, which may contain any 2-D or 3-D natural or synthetic objects and their enhancement layers.
- Video Object (VO): A video object corresponds to a particular (2-D) object in the scene. In the most simple case this can be a rectangular frame, or it can be an arbitrarily shaped object corresponding to an object or background of the scene.
- Video Object Layer (VOL): Each video object can be encoded in scalable (multi-layer) or non-scalable (single layer) form, depending on the application, represented by the video object layer (VOL). The VOL provides support for scalable coding. A video object can be encoded using spatial or temporal scalability, going from coarse to fine resolution. Depending on parameters such as available bandwidth, computational power, and user preferences, the desired resolution can be made available to the decoder.

There are two types of video object layers: the video object layer that provides full MPEG-4 functionality, and a reduced-functionality video object layer, the video object layer with short headers. The latter provides bitstream compatibility with baseline H.263 [4].
Fig. 2. Example of an MPEG-4 video bitstream logical structure.

Each video object is sampled in time; each time sample of a video object is a video object plane. Video object planes can be grouped together to form a group of video object planes:
- Group of Video Object Planes (GOV): The GOV groups together video object planes. GOVs can provide points in the bitstream where video object planes are encoded independently from each other, and can thus provide random access points into the bitstream. GOVs are optional.
- Video Object Plane (VOP): A VOP is a time sample of a video object. VOPs can be encoded independently of each other, or dependent on each other by using motion compensation. A conventional video frame can be represented by a VOP with rectangular shape.
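
The hierarchy just described (VS, VO, VOL, optional GOV, VOP) can be mirrored by a set of simple container classes. This is only an illustrative sketch of the logical structure; the class and field names are ours, not the normative syntax of ISO/IEC 14496-2:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VOP:
    """One time sample of a video object ("I", "P" or "B")."""
    coding_type: str

@dataclass
class GOV:
    """Optional group of VOPs providing a random access point."""
    vops: List[VOP] = field(default_factory=list)

@dataclass
class VOL:
    """One (base or enhancement) layer of a video object."""
    govs: List[GOV] = field(default_factory=list)

@dataclass
class VO:
    """A particular 2-D object in the scene."""
    layers: List[VOL] = field(default_factory=list)

@dataclass
class VS:
    """The complete visual scene."""
    objects: List[VO] = field(default_factory=list)

# A single-object, single-layer scene with one GOV of three VOPs:
scene = VS(objects=[VO(layers=[VOL(govs=[GOV(vops=[
    VOP("I"), VOP("P"), VOP("B")])])])])
```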

A video object plane can be used in several different ways. In the most common way, the VOP contains the encoded video data of a time sample of a video object. In that case it contains motion parameters, shape information and texture data. These are encoded using macroblocks. A VOP can also be used to code a sprite. A sprite is a video object that is usually larger than the displayed video, and is persistent over time. There are ways to slightly modify a sprite, by changing its brightness, or by warping it to take into account spatial deformation. It is used to represent large, more or less static areas, such as backgrounds. Sprites are encoded using macroblocks.

A macroblock contains a section of the luminance component and the spatially subsampled chrominance components. In the MPEG-4 visual standard there is support for only one chrominance format for a macroblock, the 4:2:0 format. In this format, each macroblock contains 4 luminance blocks and 2 chrominance blocks.
Fig. 3. General block diagram of MPEG-4 video.

Fig. 4. Example of VOP-based decoding in MPEG-4.

Each block contains 8×8 pixels and is encoded using the DCT. A macroblock carries the shape information, motion information, and texture information.

Fig. 3 illustrates the general block diagram of MPEG-4 encoding and decoding based on the notion of video objects. Each video object is coded separately. For reasons of efficiency and backward compatibility, video objects are coded via their corresponding video object planes in a hybrid coding scheme somewhat similar to previous MPEG standards. Fig. 4 shows an example of the decoding of a VOP.

3. Shape coding tools

In this section, we discuss the tools offered by the MPEG-4 standard for explicit coding of shape information for arbitrarily shaped VOs. Besides the shape information available for the VOP in question, the shape coding scheme also relies on motion estimation to compress the shape information even further. A general description of shape coding techniques would be out of the scope of this paper. Therefore, we will only describe the scheme adopted by the MPEG-4 natural video standard for shape coding. Interested readers are referred to [2]
for information on other shape coding techniques.

In the MPEG-4 visual standard, two kinds of shape information are considered as inherent characteristics of a video object. These are referred to as binary and gray scale shape information. By binary shape information, one means label information that defines which portions (pixels) of the support of the object belong to the video object at a given time. The binary shape information is most commonly represented as a matrix with the same size as that of the bounding box of a VOP. Every element of the matrix can take one of two possible values, depending on whether the pixel is inside or outside the video object. Gray scale shape is a generalization of the concept of binary shape, providing the possibility to represent transparent objects and reduce aliasing effects. Here the shape information is represented by 8 bits, instead of a binary value.

3.1. Binary shape coding

In the past, the problem of shape representation and coding has been thoroughly investigated in the fields of computer vision, image understanding, image compression and computer graphics. However, this is the first time that a video standardization effort has adopted a shape representation and coding technique within its scope. In its canonical form, a binary shape is represented as a matrix of binary values called a bitmap. However, for the purpose of compression, manipulation, or a more semantic description, one may choose to represent the shape in other forms, such as geometric representations or by means of its contour. Since its beginning, MPEG adopted a bitmap-based compression technique for the shape information. This is mainly due to the relative simplicity and higher maturity of such techniques. Experiments have shown that bitmap-based techniques offer good compression efficiency with relatively low computational complexity. This section describes the coding methods for binary shape information. Binary shape information is encoded by a motion compensated block-based technique allowing both lossless and lossy coding of such data. In the MPEG-4 video compression algorithm, the shape of every VOP is coded along with its other properties (texture and motion). To this end, the shape of a VOP is bounded by a rectangular window with a size that is a multiple of 16 pixels in the horizontal and vertical directions. The position of the bounding rectangle could be chosen such that it contains the minimum number of blocks of size 16×16 with nontransparent pixels. The samples in the bounding box and outside of the VOP are set to 0 (transparent). The rectangular bounding box is then partitioned into blocks of 16×16 samples and the encoding/decoding process is performed block by block.
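
The size of that bounding window can be computed by rounding the tight bounding box of the shape up to multiples of 16. A minimal sketch (our own helper, assuming inclusive pixel coordinates):

```python
def vop_window_size(x_min, y_min, x_max, y_max):
    """Width and height of the shape-coding window for a VOP whose
    tight bounding box spans the inclusive pixel coordinates given:
    both dimensions are rounded up to the next multiple of 16 so the
    window splits exactly into 16x16 blocks. Alignment of the window
    position (to minimize nontransparent blocks) is not shown."""
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    round16 = lambda n: (n + 15) // 16 * 16
    return round16(width), round16(height)
```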

The binary matrix representing the shape of a VOP is referred to as a binary mask. In this mask every pixel belonging to the VOP is set to 255, and all other pixels are set to 0. The mask is then partitioned into binary alpha blocks (BABs) of size 16×16, and each BAB is encoded separately. Starting from rectangular frames, it is common to have BABs whose pixels all have the same value, either 0 (in which case the BAB is called a transparent block) or 255 (in which case the block is said to be an opaque block). The shape compression algorithm provides several modes for coding a BAB. The basic tools for encoding BABs are the Context-based Arithmetic Encoding (CAE) algorithm [1] and motion compensation. InterCAE and IntraCAE are the variants of the CAE algorithm used with and without motion compensation, respectively. Each shape coding mode supported by the standard is a combination of these basic tools. Motion vectors can be computed by searching for a best match position (given by the minimum sum of absolute differences). The motion vectors themselves are differentially coded. Every BAB can be coded in one of the following modes:

1. The block is flagged transparent. In this case no coding is necessary. Texture information is not coded for such blocks either.
2. The block is flagged opaque. Again, shape coding is not necessary for such blocks, but texture information needs to be coded (since they belong to the VOP).
3. The block is coded using IntraCAE, without use of past information.
4. The motion vector difference (MVD) is zero, but the block is not updated.
5. The MVD is zero and the block is updated. InterCAE is used for coding the block update.
6. The MVD is non-zero, but the block is not coded.
7. The MVD is non-zero, and the block is coded.

Fig. 5. Context number selected for InterCAE (a) and IntraCAE (b) shape coding. In each case, the pixel to be encoded is marked by a circle, and the context pixels are marked with crosses. In the InterCAE, part of the context pixels are taken from the co-located block in the previous frame.
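
The first-level decision among these modes, i.e. detecting BABs that need no CAE coding at all, can be sketched as follows (an illustrative helper, not the normative decision logic):

```python
def classify_bab(bab):
    """Classify a 16x16 binary alpha block (rows of 0/255 values):
    all-transparent and all-opaque blocks are signalled by a flag
    alone (modes 1 and 2); anything else needs CAE and/or motion
    compensation (modes 3-7)."""
    values = {v for row in bab for v in row}
    if values == {0}:
        return "transparent"   # mode 1: no shape, no texture coded
    if values == {255}:
        return "opaque"        # mode 2: no shape bits, texture coded
    return "coded"             # modes 3-7

SIZE = 16
transparent = [[0] * SIZE for _ in range(SIZE)]
opaque = [[255] * SIZE for _ in range(SIZE)]
mixed = [[255 if c < 8 else 0 for c in range(SIZE)] for _ in range(SIZE)]
```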

The CAE algorithm is used to code the pixels in BABs. The arithmetic encoder is initialized at the beginning of the process. Each pixel is encoded as follows:

1. Compute a context number according to the definition of Fig. 5.
2. Index a probability table using this context number.
3. Use the retrieved probability to drive the arithmetic encoder for codeword assignment.
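
Step 1 amounts to packing the already coded template pixels of Fig. 5 into an integer. The sketch below shows only the packing; the actual template shapes and probability tables are defined by the standard:

```python
def context_number(template_pixels):
    """Pack a causal template of binary pixels (scanned in a fixed
    order) into the context number used to index the probability
    table. With a 10-pixel template this yields values 0..1023."""
    ctx = 0
    for p in template_pixels:
        ctx = (ctx << 1) | (1 if p else 0)
    return ctx
```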

3.2. Gray scale shape coding

The gray scale shape information has a structure similar to that of the binary shape, with the difference that every pixel (element of the matrix) can take on a range of values (usually 0 to 255) representing the degree of transparency of that pixel. The gray scale shape corresponds to the notion of alpha plane used in computer graphics, in which 0 corresponds to a completely transparent pixel and 255 to a completely opaque pixel. Intermediate values of the pixel correspond to intermediate degrees of transparency of that pixel. By convention, binary shape information corresponds to gray scale shape information with values of 0 and 255. Gray scale shape information is encoded using a block-based motion compensated DCT similar to that of texture coding, allowing lossy coding only. Gray scale shape coding also makes use of binary shape coding for coding of its support.
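
The alpha-plane convention can be made concrete with a per-pixel compositing sketch (illustrative only; compositing itself is outside the scope of the coding standard):

```python
def composite_pixel(fg, alpha, bg):
    """Blend a foreground pixel over a background pixel with an 8-bit
    alpha value: 0 keeps the background (fully transparent), 255 keeps
    the foreground (fully opaque), intermediate values mix the two.
    Rounded integer arithmetic."""
    return (fg * alpha + bg * (255 - alpha) + 127) // 255
```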

4. Motion estimation and compensation tools

Motion estimation and compensation are commonly used to compress video sequences by exploiting temporal redundancies between frames. The approaches for motion compensation in the MPEG-4 standard are similar to those used in other video coding standards. The main difference is that the block-based techniques used in the other standards have been adapted to the VOP structure used in MPEG-4. MPEG-4 provides three modes for encoding an input VOP, as shown in Fig. 6, namely:

1. A VOP may be encoded independently of any other VOP. In this case the encoded VOP is called an Intra VOP (I-VOP).
2. A VOP may be predicted (using motion compensation) based on another previously decoded VOP. Such VOPs are called Predicted VOPs (P-VOPs).
3. A VOP may be predicted based on past as well as future VOPs. Such VOPs are called Bidirectional Interpolated VOPs (B-VOPs). B-VOPs may only be interpolated based on I-VOPs or P-VOPs.

Fig. 6. The three modes of VOP coding. I-VOPs are coded without any information from other VOPs. P- and B-VOPs are predicted based on I- or other P-VOPs.

Obviously, motion estimation is necessary only for coding P-VOPs and B-VOPs. Motion estimation (ME) is performed only for macroblocks in the
bounding box of the VOP in question. If a macroblock lies entirely within a VOP, motion estimation is performed in the usual way, based on block matching of 16×16 macroblocks as well as 8×8 blocks (in advanced prediction mode). This results in one motion vector for the entire macroblock, and one for each of its blocks. Motion vectors are computed to half-sample precision.

For macroblocks that only partially belong to the VOP, motion vectors are estimated using the modified block (polygon) matching technique. Here, the discrepancy of matching is given by the sum of absolute differences (SAD) computed only for those pixels in the macroblock that belong to the VOP. In case the reference block lies on the VOP boundary, a repetitive padding technique assigns values to pixels outside the VOP. The SAD is then computed using these padded pixels as well. This improves efficiency, by allowing more possibilities when searching for candidate pixels for prediction at the boundary of the reference VOP.
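
The modified (polygon) matching criterion can be sketched as follows, where `mask` is the binary shape of the current block and the reference block is assumed to be already repetitively padded (a hypothetical helper, not the reference implementation):

```python
def polygon_sad(current, reference, mask):
    """Sum of absolute differences over only those pixels of the
    current block that belong to the VOP (mask nonzero). All three
    arguments are equally sized 2-D lists."""
    return sum(
        abs(c - r)
        for cur_row, ref_row, m_row in zip(current, reference, mask)
        for c, r, m in zip(cur_row, ref_row, m_row)
        if m
    )
```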

For P- and B-VOPs, motion vectors are encoded as follows. The motion vectors are first differentially coded, based on up to three vectors of previously transmitted blocks. The exact number depends on the allowed range of the vectors. The maximum range is selected by the encoder and transmitted to the decoder, in a fashion similar to MPEG-2. Variable length coding is then used to encode the motion vectors.
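
When all three candidate vectors are available, the predictor is typically the component-wise median, as in H.263; the fallback for fewer candidates below is our own simplification, not the normative rule:

```python
def mv_difference(mv, candidates):
    """Differentially code a motion vector against up to three
    previously transmitted neighbour vectors. With three candidates
    the component-wise median is used as predictor; with fewer, this
    sketch simply takes the first candidate (or zero)."""
    if len(candidates) == 3:
        pred = (sorted(v[0] for v in candidates)[1],
                sorted(v[1] for v in candidates)[1])
    elif candidates:
        pred = candidates[0]
    else:
        pred = (0, 0)
    return (mv[0] - pred[0], mv[1] - pred[1])
```

Only the difference from the predictor is then entropy coded, which is cheap because neighbouring blocks tend to move together.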

MPEG-4 also supports overlapped motion compensation, similar to the one used in H.263 [4]. This usually results in better prediction quality at lower bit-rates. Here, for each block of the macroblock, the motion vectors of neighboring blocks are considered. This includes the motion vector of the current block and its four neighbors. Each vector provides an estimate of the pixel value. The actual predicted value is then a weighted average of all these estimates.
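
For one pixel, overlapped motion compensation thus reduces to a weighted average of the estimates obtained from the candidate motion vectors. The weights below are placeholders; the standard defines fixed position-dependent weight matrices:

```python
def overlapped_prediction(estimates, weights):
    """Weighted average (with rounding) of the pixel estimates given
    by the current block's motion vector and those of its neighbours."""
    total = sum(weights)
    acc = sum(e * w for e, w in zip(estimates, weights))
    return (acc + total // 2) // total
```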

5. Texture coding tools

The texture information of a video object plane is present in the luminance, Y, and two chrominance components, Cb and Cr, of the video signal. In the case of an I-VOP, the texture information resides directly in the luminance and chrominance components. In the case of motion compensated VOPs, the texture information represents the residual error remaining after motion compensation. For encoding the texture information, the standard 8×8 block-based DCT is used. To encode an arbitrarily shaped VOP, an 8×8 grid is superimposed on the VOP. Using this grid, 8×8 blocks that are internal to the VOP are encoded without modifications. Blocks that straddle the VOP are called boundary blocks, and are treated differently from internal blocks. The transformed blocks are quantized, and individual coefficient prediction from neighboring blocks can be used to further reduce the entropy of the coefficients. This is followed by a scanning of the coefficients, to reduce the average run length between two coded coefficients. Then, the coefficients are encoded by variable length encoding. This process is illustrated in the block diagram of Fig. 7.
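
The quantization and scanning steps of this pipeline can be sketched as follows. The DCT itself and the variable length codes are omitted; the uniform quantizer corresponds to the second method of Section 5.3, and the scan shown is the classic zigzag order:

```python
def quantize(coeffs, qstep):
    """Uniform quantization: divide each DCT coefficient by a single
    step size, truncating toward zero (matrix-based weighting of the
    step size is not shown)."""
    return [[int(c / qstep) for c in row] for row in coeffs]

def zigzag(block):
    """Scan a square block in zigzag order so that high-frequency
    coefficients (mostly zero after quantization) are grouped at the
    end, shortening the runs between coded coefficients."""
    n = len(block)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [block[r][c] for r, c in order]
```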

Fig. 7. VOP texture coding process.

5.1. Boundary macroblocks

Macroblocks that straddle VOP boundaries, called boundary macroblocks, contain arbitrarily shaped texture data. A padding process is used to extend these shapes into rectangular macroblocks. The luminance component is padded on a 16×16 basis, while the chrominance blocks are padded on an 8×8 basis. Repetitive padding consists in assigning a value to the pixels of the macroblock that lie outside of the VOP. When the texture data is the residual error after motion compensation, the blocks are padded with zero values. Padded macroblocks are then coded using the technique described above.

For intra coded blocks, the padding is performed in a two-step procedure called Low Pass Extrapolation (LPE). This procedure is as follows:

1. Compute the mean of the pixels in the block that belong to the VOP, and use this mean value as the padding value, that is,

   f_{r,c} |_{(r,c) not in VOP} = (1/N) * sum over (x,y) in VOP of f_{x,y},   (1)

where N is the number of pixels of the macroblock in the VOP. This is also known as mean-repetition DCT.

2. Use the average operation given in Eq. (2) for each pixel f_{r,c}, where r and c are the row and column position of each pixel in the macroblock outside the VOP boundary. Start from the top left corner f_{0,0} of the macroblock and proceed row by row to the bottom right pixel:

   f_{r,c} |_{(r,c) not in VOP} = (f_{r,c-1} + f_{r-1,c} + f_{r,c+1} + f_{r+1,c}) / 4.   (2)

The pixels considered in the right-hand side of Eq. (2) should lie within the VOP; otherwise they are not considered and the denominator is adjusted accordingly. Once the block has been padded, it is coded in a similar fashion to an internal block.

5.2. DCT

Internal video texture blocks and padded boundary blocks are encoded using a 2-D 8×8 block-based DCT. The accuracy of the implementation of the 8×8 inverse transform is specified by the IEEE 1180 standard to minimize accumulation of mismatch errors. The DCT transform is followed by a quantization process.

5.3. Quantization

The DCT coefficients are quantized as a lossy compression step. There are two types of quantization available. Both are essentially a division of the coefficient by a quantization step size. The first method uses one of two available quantization matrices to modify the quantization step size depending on the spatial frequency of the coefficient. The second method uses the same quantization step size for all coefficients. MPEG-4 also allows for a nonlinear quantization of DC values.

5.4. Coefficient prediction

The average energy of the quantized coefficients can be further reduced by using prediction from neighboring blocks. The prediction can be performed from either the block above, the block to the left, or the block above left, as illustrated in Fig. 8. The direction of the prediction is adaptive and is selected based on a comparison of the horizontal and vertical DC gradients (increase or reduction in value) of the surrounding blocks A, B and C.

Fig. 8. Candidate blocks for coefficient prediction.
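
Putting Eqs. (1) and (2) of Section 5.1 together, the LPE padding of one boundary block can be sketched as below. Edge handling is simplified: we average over all in-block neighbours already assigned, whereas the standard adjusts the denominator as described after Eq. (2):

```python
def lpe_pad(block, mask):
    """Two-step Low Pass Extrapolation padding of an intra boundary
    block. Step 1: set every pixel outside the VOP (mask == 0) to the
    mean of the VOP pixels, as in Eq. (1). Step 2: smooth the exterior
    pixels in raster order with a 4-neighbour average, as in Eq. (2)."""
    n = len(block)
    vop = [(r, c) for r in range(n) for c in range(n) if mask[r][c]]
    mean = sum(block[r][c] for r, c in vop) // len(vop)
    out = [[block[r][c] if mask[r][c] else mean for c in range(n)]
           for r in range(n)]
    for r in range(n):
        for c in range(n):
            if not mask[r][c]:
                nbrs = [out[rr][cc]
                        for rr, cc in ((r, c - 1), (r - 1, c),
                                       (r, c + 1), (r + 1, c))
                        if 0 <= rr < n and 0 <= cc < n]
                out[r][c] = sum(nbrs) // len(nbrs)
    return out
```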
There are two types of prediction possible, DC prediction and AC prediction:
- DC prediction: The prediction is performed for the DC coefficient only, and is