A Video Compression Standard for Multimedia Applications

Didier Le Gall
`
`
`DIGITAL MULTIMEDIA SYSTEMS
`
digital video; MPEG is a standard
`that responds to a need. In this situ-
`ation a standards committee is a
`forum where precompetitive re-
`search can take place, where manu-
`facturers meet researchers, where
`industry meets academia. By and
`large, because the problem to be
`solved was perceived as important,
`the technology developed within
`MPEG is at the forefront of both
`research and industry. Now that
`the work of the MPEG committee
`has reached maturity (a "Commit-
`tee Draft" was produced in Septem-
`ber 1990), the VLSI industry is
`ready and waiting to implement
`MPEG's solution.
`
`MPEG Standard Activities
`The activity of the MPEG commit-
`tee was started in 1988 with the goal
`of achieving a draft of the standard
`by 1990. In the two years of MPEG
`activity, participation has increased
`tenfold from 15 to 150 participants.
`The MPEG activity was not started
`without due consideration to the
`related activities of other standard
`organizations. These considera-
`tions are of interest, not only be-
`cause it is important to avoid dupli-
`cation of work between standards
`committees but most of all, because
`these activities provided a very im-
`portant background and technical
`input to the work of the MPEG
`committee.
`
`Background: Relevant Standards
`The JPEG Standard. The activities
`of JPEG (Joint Photographic Ex-
`perts' Group) [10] played a consid-
`erable role in the beginning of
`MPEG, since both groups were
`originally in the same working
`group of ISO and there has been
`considerable overlap in member-
`ship. Although the objectives of
`JPEG are focused exclusively on
`still-image compression, the distinc-
`tion between still and moving image
`is thin; a video sequence can be
`
The development of digital video technology in the 1980s
`has made it possible to use digital
`video compression for a variety of
`telecommunication
`applications:
`teleconferencing, digital broadcast
`codec and video telephony.
`Standardization of video com-
`pression techniques has become a
`high priority because only a stan-
`dard can reduce the high cost of
video compression codecs and resolve the critical problem of
interoperability of equipment from different manufacturers. The
existence of a standard is often the
`trigger to the volume production of
`integrated circuits (VLSI) necessary
`for significant cost reductions. An
`example of such a phenomenon—
`where a standard has stimulated
`the growth of an industry—is the
`spectacular growth of the facsimile
`market in the wake of the standard-
`ization of the Group 3 facsimile
`compression algorithm by
`the
`CCITT. Standardization of com-
`pression algorithms for video was
`first initiated by the CCITT for tele-
`conferencing and videotelephony
[7]. Standardization of video compression techniques for
transmission of contribution-quality television signals has been
addressed in the CCIR¹ (more precisely in CMTT/2, a joint committee
between the CCIR and the CCITT).
`Digital transmission is of prime
`importance for telecommunication,
`particularly in the telephone net-
`work, but there is a lot more to digi-
`tal video than teleconferencing and
`visual telephony. The computer
`industry, the telecommunications
`industry and the consumer elec-
`tronics industry are increasingly
`sharing
`the same
`technology—
`there is much talk of a convergence,
`which does not mean that a com-
`puter workstation and a television
`receiver are about to become the
`same thing, but certainly, the tech-
`nology is converging and includes
¹CCIR is the International Consultative Committee on Broadcasting;
CCITT is the International Committee on Telegraph and Telephones.
CMTT is a joint committee of the CCITT and the CCIR working on issues
relevant to television and telephony.
`
`digital video compression. In the
`view of shared technology between
`different segments of the informa-
`tion processing industry, the Inter-
`national Organization for Stand-
`ardization (ISO) has undertaken an
`effort to develop a standard for
`video and associated audio on digi-
`tal storage media, where the con-
`cept of digital storage medium in-
`cludes conventional storage devices
(CD-ROM, DAT, tape drives, winchesters, writable optical drives)
as well as telecommunication channels such as ISDNs and local area
networks.
`This effort is known by the name
`of the expert group that started it:
`MPEG—Moving Picture Experts
`Group—and is currently part of
the ISO-IEC/JTC1/SC2/WG11.
`The MPEG activities cover more
`than video compression, since the
`compression of the associated audio
`and the issue of audio-visual syn-
`chronization cannot be worked in-
`dependently of the video compres-
`sion: MPEG-Video is addressing
`the compression of video signals at
about 1.5 Mbits/s, MPEG-Audio is
`addressing the compression of a
`digital audio signal at the rates of
`64, 128 and 192 kbits/s per channel,
`MPEG-System
`is addressing the
`issue of synchronization and multi-
`plexing of multiple compressed
`audio and video bit streams. This
`article focuses on the activities of
`MPEG-Video. The premise of
`MPEG is that a video signal and its
`associated audio can be compressed
`to a bit rate of about 1.5 Mbits/s
`with an acceptable quality.
`Two very
`important conse-
`quences follow: Full-motion video
`becomes a form of computer data,
`i.e., a data type to be integrated
`with text and graphics; Motion
`video and its associated audio can
`be delivered over existing com-
`puter and telecommunication net-
`works.
`
`Precompetitive Research
`The growing importance of digital
`video is reflected in the participa-
`tion of more and more companies
`in standards activities dealing with
`
COMMUNICATIONS OF THE ACM/April 1991/Vol.34, No.4
`
`
`
`thought of as a sequence of still
`images to be coded individually, but
`displayed sequentially at video rate.
`However, the "sequence of still
`images" approach has the disad-
`vantage that it fails to take into con-
`sideration the extensive frame-to-
`frame redundancy present in all
`video sequences. Indeed, because
`there is a potential for an additional
`factor of three in compression ex-
`ploiting the temporal redundancy,
`and because this potential has very
significant implications for many applications relying on storage
media with limited bandwidth, extending the activity of the ISO
committee to moving pictures was a natural next step.
`
CCITT Expert Group on Visual Telephony. As previously mentioned,
most of the pioneering activities in video compression were triggered
by teleconferencing and videotelephony applications. The definition
and planned deployment of ISDN (Integrated Services Digital Network)
was the motivation for the standardization of compression techniques
at the rate of p×64 kbits/s, where p takes values from one (one
B channel of ISDN) to more than 20 (primary rate ISDN is 23 or 30 B
channels). The Experts Group on visual telephony in the CCITT
Study Group XV addressed the problem and produced CCITT
Recommendation H.261: "Video Codec for Audiovisual Services at
p×64 kbits/s" [7, 9]. The focus of the
`CCITT expert group is a real-time
`encoding-decoding system, exhibit-
`ing less than 150 ms delay. In addi-
`tion, because of the importance of
`very low bit-rate operation (around
`64 kbits/s), the overhead informa-
`tion is very tightly managed.
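The p×64 family of rates mentioned above is simple arithmetic, and a short sketch makes the range concrete. The function name `px64_rate_kbits` is hypothetical and chosen only for this illustration; the 64 kbits/s B-channel rate and the values of p come from the text.

```python
B_CHANNEL_KBITS = 64  # rate of one ISDN B channel, in kbits/s


def px64_rate_kbits(p: int) -> int:
    """Return the aggregate rate, in kbits/s, of p ISDN B channels."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return p * B_CHANNEL_KBITS


# One B channel, two B channels, and the two primary-rate cases (23 or 30).
for p in (1, 2, 23, 30):
    print(f"p = {p:2d}: {px64_rate_kbits(p)} kbits/s")
```

With p = 23 the aggregate rate is 1472 kbits/s, which is close to the 1.5 Mbits/s targeted by MPEG.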
`After careful consideration by
`the MPEG committee, it was per-
`ceived that while the work of the
`CCITT expert group was of very
`high quality, relaxing the constraint
`on very low delay and the focus on
`extremely low bit rates could lead to
`a solution with increased visual
quality in the range of 1 to 1.5
`Mbits/s. On the other hand, the
`
`contribution of the CCITT expert
`group has been extremely relevant
`and the members of MPEG have
`strived to maintain compatibility,
`introducing changes only to im-
`prove quality or to satisfy the need
`of applications. Consequently, the
`emerging MPEG standard, while
`not strictly a superset of CCITT
`Recommendation H.261, has much
`commonality with that standard so
`that implementations supporting
`both standards are quite plausible.
`
CMTT/2 Activities. If digital video compression can be used for
videoconferencing or videotelephony applications, it also can be
used for transmission of compressed television signals for use by
`broadcasters. In this context the
`transmission channels are either
`the high levels of the digital hierar-
`chy, H21 (34 Mbits/s) and H22 (45
`Mbits/s) or digital satellite channels.
`The CMTT/2 addressed the com-
`pression of television signals at 34
`and 45 Mbits/s [4]. This work was
`focused on contribution quality
`codecs, which means that the de-
`compressed signal should be of
`high enough quality to be suit-
`able for further processing (such as
`chromakeying). While the technol-
`ogy used might have some com-
`monalities with the solutions con-
`sidered by MPEG, the problem and
`the target bandwidth are very dif-
`ferent.
`
`MPEG Standardization Effort
`The MPEG effort started with a
`tight schedule, due to the realiza-
`tion that failure to get significant
`results fast enough would result in
`potentially disastrous consequences
`such as the establishment of multi-
`ple, incompatible de facto standards.
`With a tight schedule came the
`need for a tight methodology, so
`the committee could concentrate on
`technical matters, rather than waste
`time in dealing with controversial
`issues.
`
`Requirements. The purpose of the
`requirement phase was twofold:
`first, precisely determine the focus
`of the effort; then determine the
`rules of the game for the competi-
`tive phase. At the time MPEG
`began its effort, the requirements
`for the integration of digital video
`and computing were not clearly
`understood, and the MPEG ap-
`proach was to provide enough sys-
`tem design freedom and enough
`quality to address many applica-
`tions. The outcome of the require-
`ment phase was a document "Pro-
`posal Package Description" [8] and
`a test methodology [5].
`
`Competition. When developing an
`international standard, it is very
`important to make sure the trade-
`offs are made on the basis of maxi-
`mum information so that the life of
`the standard will be long: there is
`nothing worse than a standard that
`is obsolete at the time of publica-
`tion. This means the technology
`behind the standard must be state
`of the art, and the standard must
`bring together the best of academic
`and industrial research. In order to
`achieve this goal, a competitive
`phase followed by extensive testing
`is necessary, so that new ideas are
`considered solely on the basis of
`their
`technical merits and
`the
`trade-off between quality and cost
`of implementation.
`In the MPEG-Video competition,
`17 companies or institutions con-
`tributed or sponsored a proposal,
`and 14 different proposals were
`presented and subjected to analysis
`and subjective testing (see Table 1).
`Each proposal consisted of a docu-
`mentation part, explaining the al-
`gorithm and documenting the sys-
`tem claims, a video part for input to
`the subjective test [5], and a collec-
`tion of computer files (program
`and data) so the compression claim
`could be verified by an impartial
`evaluator.
`
Methodology. The MPEG methodology was divided into three phases:
Requirements, Competition and Convergence.
`
`Convergence. The
`convergence
`phase is a collaborative process
`where the ideas and techniques
`identified as promising at the end
`
`
`
`
`of the competitive phase are to be
`integrated into one solution. The
convergence process is not always painless; ideas of considerable
merit frequently have to be abandoned in favor of slightly better or
`
`slightly simpler ones. The method-
`ology for convergence took the
`form of an evolving document
`called a simulation model and a se-
`ries of fully documented experi-
`ments (called core experiments).
`
TABLE 1.

Participation: Companies and institutions having contributed
an MPEG Video Proposal

Company                  Country       Proposer
AT&T                     USA           AT&T
Bellcore                 USA           Bellcore
Intel                    USA           Bellcore
GCT                      Japan         Bellcore
C-Cube Micro.            USA           C-Cube Micro.
DEC                      USA           DEC
France Telecom           France        France Telecom
Cost 211 Bis             EUR           France Telecom
IBM                      USA           IBM
JVC Corp.                Japan         JVC Corp.
Matsushita EIC           Japan         Matsushita EIC
Mitsubishi EC            Japan         Mitsubishi EC
NEC Corp.                Japan         NEC Corp.
NTT                      Japan         NTT
Philips CE               Netherlands   Philips CE
Sony Corp.               Japan         Sony Corp.
Telenorma/U. Hannover    Germany       Telenorma/U. Hannover
`
TABLE 2.

Storage Media and Channels where MPEG could have
Applications

CD-ROM
DAT
Winchester Disk
Writable Optical Disks
ISDN
LAN
Other Communication Channels
`
`
`The experiments were used to re-
`solve which of two or three alterna-
`tives gave the best quality subject to
`a reasonable implementation cost.
`
`Schedule. The schedule of MPEG
`was derived with the goal of obtain-
`ing a draft of the standard (Com-
`mittee Draft) by the end of 1990.
`Although the amount of work was
`considerable, and staying on sched-
`ule meant many meetings,
`the
`members of MPEG-Video were
`able to reach an agreement on a
`Draft in September 1990. The con-
`tent of the draft has been "frozen"
`since then, indicating that only
`minor changes will be accepted, i.e.,
`editorial changes and changes only
`meant to correct demonstrated in-
`accuracies. Figure 1 illustrates the
`MPEG schedule for the competitive
`and convergence phases.
`
`MPEG-Video Requirements
`A Generic Standard
`Because of the various segments of
`the information processing indus-
`try represented in the ISO commit-
`tee, a representation for video on
`digital storage media has to support
`many applications. This is ex-
`pressed by saying that the MPEG
`standard is a generic standard. Ge-
`neric means that the standard is
`independent of a particular appli-
`cation; it does not mean however,
`that it ignores the requirements of
`the applications. A generic stan-
`dard possesses features that make it
`somewhat universal—e.g., it fol-
`lows the toolkit approach; it does
`not mean that all the features are
`used all the time for all applica-
`tions, which would result in dra-
`matic inefficiency. In MPEG, the
`requirements on the video com-
`pression algorithm have been de-
`rived directly from the likely appli-
`cations of the standard.
`Many applications have been
`proposed based on the assumption
`that an acceptable quality of video
`
`
`
`
June 1989: Pre-registration Deadline
September 1989: Proposal Registration
Competition
October 1989: Subjective Test
March 1990: Definition of Video Algorithm (Simulation Model 1)
Convergence
`
advantages of the other media (recordability, random accessibility,
portability and low cost).
The compressed bit rate of 1.5 Mbits/s is also perfectly suitable to
computer and telecommunication networks, and the combination of
digital storage and networking can be at the origin of many new
applications, from video on local area networks (LANs) to distribution
of video over telephone lines [1].
`
`Asymmetric Applications. In order
`to find a taxonomy of applications
`of digital video compression, the
`distinction between symmetric and
`asymmetric applications is most
`useful. Asymmetric applications are
`those that require frequent use of
`the decompression process, but for
`which the compression process is
`performed once and for all at the
`production of the program. Among
`asymmetric applications, one could
`find an additional subdivision into
`electronic publishing, video games
`and delivery of movies. Table 3
`shows the asymmetric applications
`of digital video.
`
September 1990: Draft Proposal

FIGURE 1.
MPEG Schedule for the Competitive and Convergence Phases
`
`can be obtained for a bandwidth of
`about 1.5 Mbits/second (including
`audio). We shall review some of
`these applications because they put
`constraints on
`the compression
`technique that go beyond those
`required of a videotelephone or a
`videocassette recorder (VCR). The
`challenge of MPEG was to identify
`those constraints and to design an
`algorithm that can flexibly accom-
`modate them.
`
`Applications of Compressed Video
`on Digital Storage Media
`Digital Storage Media. Many stor-
`age media and telecommunication
`channels are perfectly suited to a
`video compression technique tar-
`geted at the rate of 1 to 1.5 Mbits/s
`(see Table 2). CD-ROM is a very
`important storage medium because
`of its large capacity and low cost.
`Digital audio tape (DAT) is also
`perfectly suitable to compressed
`video; the recordability of the me-
`dium is a plus, but its sequential
`nature is a major drawback when
`random access is required. Win-
`chester-type computer disks pro-
`vide a maximum of flexibility
`(recordability, random access) but
`at a significantly higher cost and
`limited portability. Writable optical
`disks are expected to play a signifi-
`cant role in the future because they
`have the potential to combine the
`
TABLE 3.

Asymmetric Applications of Digital Video

Electronic Publishing
Education and Training
Travel Guidance
Videotext
Point of Sale
Games
Entertainment (movies)
`
TABLE 4.

Symmetric Applications of Digital Video

Electronic Publishing (production)
Video Mail
Videotelephone
Video Conferencing
`
`eration of material for playback-
`only applications; (desktop video
`publishing); another class involves
`the use of telecommunication ei-
`ther in the form of electronic mail
`or in the form of interactive face-
`to-face applications. Table 4 shows
`the symmetric applications of digi-
`tal video.
`
`Features of the Video
`Compression Algorithm
`The requirements for compressed
`video on digital storage media
`(DSM) have a natural impact on the
`solution. The compression algo-
`rithm must have features that make
`it possible to fulfill all the require-
`ments. The following features have
`been identified as important in
`order to meet the need of the appli-
`cations of MPEG.
`
Symmetric Applications. Symmetric applications require essentially
equal use of the compression and
`the decompression process. In sym-
`metric applications there is always
`production of video information
`either via a camera (video mail,
`videotelephone) or by editing pre-
`recorded material. One major class
`of symmetric application is the gen-
`
`Random Access. Random access is
`an essential feature for video on a
`storage medium whether or not the
`medium is a random access me-
`dium such as a CD or a magnetic
`disk, or a sequential medium such
`as a magnetic tape. Random access
`requires that a compressed video
bit stream be accessible in its middle and any frame of video be
`
`
`
`
`decodable in a limited amount of
`time. Random access implies the
`existence of access points, i.e., seg-
`ments of information coded only
`with reference to themselves. A
`random access time of about 1/2
`second should be achievable with-
`out significant quality degradation.
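The half-second figure can be turned into a rough bound on how far apart access points may be placed. The sketch below is only an illustration under two assumptions that are not stated in the text: picture rates of 30 and 25 pictures per second, and a simplified model in which the random access time equals the time spanned by the pictures between two consecutive access points. The function name `max_access_point_spacing` is hypothetical.

```python
import math


def max_access_point_spacing(picture_rate_hz: float, access_time_s: float) -> int:
    """Largest number of pictures between access points for a given access time.

    Simplified model (an assumption): the worst-case wait equals the time
    spanned by the pictures lying between two consecutive access points.
    """
    if picture_rate_hz <= 0 or access_time_s <= 0:
        raise ValueError("rate and access time must be positive")
    return max(1, math.floor(picture_rate_hz * access_time_s))


for rate in (30, 25):  # assumed picture rates, not taken from the text
    print(rate, "pictures/s ->", max_access_point_spacing(rate, 0.5), "pictures")
```

Under these assumptions an access point roughly every 12 to 15 pictures would meet the half-second target.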
`
Fast Forward/Reverse Searches. Depending on the storage media, it
`should be possible to scan a com-
`pressed bit stream (possibly with
`the help of an application-specific
`directory structure) and, using the
`appropriate access points, display
`selected pictures to obtain a fast
`forward or a fast reverse effect.
`This feature is essentially a more
`demanding form of random acces-
`sibility.
`
`Reverse Playback. Interactive appli-
`cations might require the video sig-
`nal to play in reverse. While it is not
`necessary for all applications to
`maintain full quality in reverse
`mode or even to have a reverse
`mode at all, it was perceived that
`this feature should be possible with-
`out an extreme additional cost in
`memory.
`
`Audio-Visual Synchronization. The
`video signal should be accurately
`synchronizable to an associated
audio source. A mechanism should be provided to permanently
`resynchronize the audio and the
`video should the two signals be de-
`rived from slightly different clocks.
`This feature is addressed by the
`MPEG-System group whose task is
`to define the tools for synchroniza-
`tion as well as integration of multi-
`ple audio and video signals.
`
`Robustness to Errors. Most digital
`storage media and communication
`channels are not error-free, and
`while it is expected that an appro-
`priate channel coding scheme will
`be used by many applications, the
`source coding scheme should be
`robust to any remaining uncor-
`rected errors; thus catastrophic
`behavior in the presence of errors
`should be avoidable.
`
`Coding/Decoding Delay. As men-
`tioned previously, applications such
`as videotelephony need to maintain
`the total system delay under 150 ms
`in order to maintain the conversa-
`tional, "face-to-face" nature of the
`application. On the other hand,
`publishing applications could con-
`tent themselves with fairly long
`encoding delays and strive to main-
`tain the total decoding delay below
`the "interactive threshold" of about
`one second. Since quality and delay
`can be traded-off to a certain ex-
`tent, the algorithm should perform
`well over the range of acceptable
`delays and the delay is to be consid-
`ered a parameter.
`
`Editability. While it is understood
`that all pictures will not be com-
`pressed independently (i.e., as still
`images), it is desirable to be able to
`construct editing units of a short
`time duration and coded only with
`reference to themselves so that an
`acceptable level of editability in
`compressed form is obtained.
`
`Format Flexibility. The computer
`paradigm of "video in a window"
`supposes a large flexibility of for-
`mats in terms of raster size (width,
`height) and frame rate.
`
`Cost Tradeoffs. All the proposed
`algorithmic solutions were evalu-
`ated in order to verify that a de-
`coder is implementable in a small
`number of chips, given the technol-
`ogy of 1990. The proposed algo-
`rithm also had to meet the con-
`straint that the encoding process
`could be performed in real time.
`
`Overview of the MPEG
`Compression Algorithm
`The difficult challenge in the de-
`sign of the MPEG algorithm is the
`following: on one hand the quality
`requirements demand a very high
`compression not achievable with
`intraframe coding alone; on the
`other hand, the random access re-
`quirement is best satisfied with
`pure intraframe coding. The algo-
`rithm can satisfy all the require-
`ments only insofar as it achieves the
`
`
`
`high compression associated with
`interframe coding, while not com-
`promising random access for those
`applications that demand it. This
`requires a delicate balance between
`intra- and interframe coding, and
`between recursive and nonrecur-
`sive temporal redundancy reduc-
`tion. In order to answer this chal-
`lenge, the members of MPEG have
`resorted to using two interframe
`coding techniques: predictive and
`interpolative.
`The MPEG video compression
`algorithm [3] relies on two basic
`techniques: block-based motion
compensation for the reduction of the temporal redundancy and
transform-domain (DCT) based compression for the reduction of
spatial redundancy. Motion-compensated techniques are applied with
both causal (pure predictive coding) and noncausal predictors
(interpolative coding). The remaining signal (prediction error)
is further compressed with spatial redundancy reduction (DCT). The
information relative to motion is based on 16 × 16 blocks and is
`transmitted together with the spa-
`tial information. The motion infor-
`mation is compressed using vari-
`
`
`
`
`subsignal with low temporal resolu-
`tion (typically 1/2 or 1/3 of the
`frame rate) is coded and the full-
`resolution signal is obtained by in-
`terpolation of the low-resolution
`signal and addition of a correction
`term. The signal to be recon-
`structed by interpolation is ob-
`tained by adding a correction term
`to a combination of a past and a fu-
`ture reference.
Motion-compensated interpolation (also called bidirectional
prediction in MPEG terminology) presents a series of advantages, not
`the least of which is that the com-
`pression obtained by interpolative
`coding is very high. The other ad-
`vantages of bidirectional prediction
`(temporal interpolation) are:
`
`• It deals properly with uncovered
`areas, since an area just uncov-
`ered is not predictable from the
`past reference, but can be prop-
`erly predicted from the "future"
`reference.
`• It has better statistical properties
`since more information is avail-
`able: in particular, the effect of
`noise can be decreased by averag-
`ing between the past and the fu-
`ture reference pictures.
`• It allows decoupling between
`prediction and coding (no error
`propagation).
`• The trade-off associated with the
`frequency of bidirectional pic-
`tures is the following: increasing
`the number of B-pictures be-
`tween references decreases the
`correlation of B-pictures with the
`references as well as the correla-
`tion between
`the
`references
`themselves. Although this trade-
`off varies with the nature of the
`video scene, for a large class of
scenes it appears reasonable to space references at about
1/10th-second intervals, resulting in a combination of the type
I B B P B B P B B . . I B B P B B.
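The picture-type combination described above can be generated mechanically. The following sketch is an illustration only: the function names and the parameters `intra_period` and `ref_spacing` are hypothetical, and the reordering in `coding_order` is an inference from the statement that B-pictures need both a past and a future reference, not something this section specifies. A reference spacing of three pictures corresponds to about 1/10th second if one assumes 30 pictures per second.

```python
def display_order_types(num_pictures, intra_period, ref_spacing):
    """Picture types in display order.

    intra_period: distance in pictures between I-pictures.
    ref_spacing:  distance in pictures between references (I or P).
    """
    types = []
    for n in range(num_pictures):
        position = n % intra_period
        if position == 0:
            types.append("I")
        elif position % ref_spacing == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def coding_order(types):
    """Reorder picture indices so that each B-picture follows both references.

    Assumption: a reference is handled before the B-pictures that precede it
    in display order, since those B-pictures depend on it.
    """
    order, pending_b = [], []
    for index, picture_type in enumerate(types):
        if picture_type == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B-pictures lack a future reference here
    return order


pattern = display_order_types(9, intra_period=9, ref_spacing=3)
print(" ".join(pattern))
print(coding_order(pattern))
```

Calling `display_order_types(8, 8, 4)` gives the arrangement used as the example for Figure 2, in which three pictures out of four are interpolated.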
`
`Motion Representation, Macroblock.
`There is a trade-off between the
`coding gain provided by the motion
`information and the cost associated
`with coding the motion informa-
`
FIGURE 2.
Interframe Coding (figure label: Bidirectional Prediction)
`
able-length codes to achieve maximum efficiency.
`
`Temporal Redundancy Reduction
`Because of the importance of ran-
`dom access for stored video and the
`significant bit-rate reduction af-
`forded by motion-compensated in-
`terpolation, three types of pictures
are considered in MPEG:² Intrapictures (I), Predicted pictures (P)
and Interpolated pictures (B, for bidirectional prediction).
`Intrapictures provide access points
`for random access but only with
`moderate compression; predicted
`pictures are coded with reference
`to a past picture (Intra- or Pre-
`dicted) and will in general be used
`as a reference for future predicted
`pictures; bidirectional pictures pro-
`vide the highest amount of com-
`pression but require both a past
`and a future reference for predic-
`tion; in addition, bidirectional pic-
`tures are never used as reference.
`In all cases when a picture is coded
`with respect to a reference, motion
`compensation is used to improve
the coding efficiency. The relationship between the three picture
types is illustrated in Figure 2. The
`organization of the pictures in
`
²In addition to the three picture types mentioned in the text, an
additional type "DC-picture" has been defined. The DC-picture
type is used to make fast searches possible on sequential DSMs such
as tape recorders with a fast search mechanism. The DC-picture type
is never used in conjunction with the other picture types.
`
`MPEG is quite flexible and will de-
`pend on application-specific pa-
`rameters such as random accessibil-
`ity and coding delay. As an example
`in Figure 2, an intracoded picture is
`inserted every 8 frames, and the
`ratio of interpolated pictures to
`intra- or predicted pictures is three
`out of four.
`
`Motion Compensation.
`Prediction. Among the techniques
`that exploit the temporal redun-
`dancy of video signals, the most
`widely used is motion-compensated
`prediction. It is the basis of most
`compression algorithms for visual
`telephony such as the CCITT stan-
`dard H.261. Motion-compensated
`prediction assumes that "locally"
`the current picture can be modeled
`as a translation of the picture at
`some previous time. Locally means
`that the amplitude and the direc-
`tion of the displacement need not
`be the same everywhere in the pic-
`ture. The motion information is
`part of the necessary information to
`recover the picture and has to be
`coded appropriately.
`
`Interpolation. Motion-compensated
`interpolation is a key feature of
`MPEG. It is a technique that helps
`satisfy some of the application-
`dependent requirements since it
improves random access and reduces the effect of errors while at
the same time contributing significantly to the image quality.
In the temporal dimension, motion-compensated interpolation is a
multiresolution technique: a
`
`
`
`tion. The choice of 16 x 16 blocks
for the motion-compensation unit is the result of such a trade-off;
such motion-compensation units are called Macroblocks. In the more
general case of a bidirectionally coded picture, each 16 × 16
macroblock can be of type Intra, Forward-Predicted,
Backward-Predicted or Average. As expressed in Table 5, the
expression for the predictor for a given macroblock depends on
reference pictures (past and future) as well as the motion vectors:
$x$ is the coordinate of the picture element, $mv_{01}$ the motion
vector relative to the reference picture $I_0$, and $mv_{21}$ the
motion vector relative to the reference picture $I_2$.
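The four macroblock types can be sketched as four ways of forming a predictor from the past and future reference pictures. This is a minimal illustration, not the normative procedure: pictures are plain lists of rows, the helper names `fetch_block` and `predict_macroblock` are hypothetical, the block size is reduced from 16 to 2 for readability, and the rounding used for the Average type is an assumption. The constant 128 for the Intra type is the value given in Table 5.

```python
def fetch_block(picture, top, left, size, vector):
    """Copy a size x size block displaced by vector = (dx, dy)."""
    dx, dy = vector
    return [[picture[top + dy + y][left + dx + x] for x in range(size)]
            for y in range(size)]


def predict_macroblock(mode, past, future, top, left, size,
                       mv_past=(0, 0), mv_future=(0, 0)):
    """Return the predictor block for one macroblock of a B-picture."""
    if mode == "intra":
        return [[128] * size for _ in range(size)]  # constant predictor
    if mode == "forward":
        return fetch_block(past, top, left, size, mv_past)
    if mode == "backward":
        return fetch_block(future, top, left, size, mv_future)
    if mode == "average":
        fwd = fetch_block(past, top, left, size, mv_past)
        bwd = fetch_block(future, top, left, size, mv_future)
        # Rounded mean of the two references (the rounding is an assumption).
        return [[(p + q + 1) // 2 for p, q in zip(row_f, row_b)]
                for row_f, row_b in zip(fwd, bwd)]
    raise ValueError(f"unknown macroblock type: {mode}")


# Tiny synthetic reference pictures, 8 x 8 samples each.
past = [[x + 8 * y for x in range(8)] for y in range(8)]
future = [[100 + x + 8 * y for x in range(8)] for y in range(8)]
for mode in ("intra", "forward", "backward", "average"):
    block = predict_macroblock(mode, past, future, top=2, left=2, size=2,
                               mv_past=(1, 0), mv_future=(0, 1))
    print(mode, block)
```

The prediction error that is subsequently coded would be the difference between the actual macroblock and the block returned here.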
The motion information consists of one vector for forward-predicted
macroblocks and backward-predicted macroblocks, and of two vectors
for bidirectionally predicted macroblocks. The motion information
associated with each 16 × 16 block is coded differentially with
respect to the motion information present in the previous adjacent
block. The range of the differential motion vector can be selected on
a picture-by-picture basis, to match the spatial resolution, the
temporal resolution and the nature of the motion in a particular
sequence; the maximal allowable range has been chosen large enough to
accommodate even the most demanding situations. The differential
motion information is further coded by means of a variable-length
code to provide greater efficiency by taking advantage of the strong
spatial correlation of the motion vector field (the differential
motion vector is likely to be very small except at object boundaries).
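Differential coding of the motion vectors can be illustrated in a few lines. The sketch below is a simplified model, not the MPEG syntax: vectors are (dx, dy) pairs listed in macroblock order, the predictor for the first macroblock is assumed to be (0, 0), and the function names are hypothetical. The variable-length coding stage that follows is not shown.

```python
def differential_vectors(vectors):
    """Replace each motion vector by its difference from the previous one."""
    previous = (0, 0)  # assumed predictor for the first macroblock
    differences = []
    for dx, dy in vectors:
        differences.append((dx - previous[0], dy - previous[1]))
        previous = (dx, dy)
    return differences


def reconstruct_vectors(differences):
    """Invert differential_vectors by accumulating the differences."""
    previous = (0, 0)
    vectors = []
    for ddx, ddy in differences:
        previous = (previous[0] + ddx, previous[1] + ddy)
        vectors.append(previous)
    return vectors


# Smooth motion over four macroblocks, then an object boundary.
row = [(3, 1), (3, 1), (4, 1), (4, 2), (-6, 0)]
coded = differential_vectors(row)
print(coded)
print(reconstruct_vectors(coded) == row)
```

In this example the differences stay near zero while the motion is smooth and become large only at the last macroblock, which is the behavior a variable-length code can exploit.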
`
`Motion Estimation. Motion estima-
`tion covers a set of techniques used
`to extract the motion information
from a video sequence. The MPEG syntax specifies how to represent
the motion information: one or two motion vectors per 16 × 16
sub-block of the picture, depending on the type of motion
compensation (forward-predicted, backward-predicted, average). The
MPEG draft does not, however, specify how such vectors are to be
computed. Because of the block-based motion representation,
block-matching techniques are
`likely to be used; in a block-match-
`ing technique, the motion vector is
`obtained by minimizing a cost func-
`tion measuring the mismatch be-
`tween a block and each predictor
candidate. Let $M_i$ be a macroblock in the current picture $I_c$,
and $v$ the displacement with respect to the reference picture $I_r$;
then the optimal displacement ("motion vector") is obtained by the
formula:

$$ v_i^{*} = \arg\min_{v \in V} \sum_{x \in M_i} D\left[ I_c(x) - I_r(x + v) \right] $$
`
`where the search range V of the
`possible motion vectors and the se-
`lection of the cost function D are
`left entirely to the implementation.
`Exhaustive searches where all the
`possible motion vectors are consid-
`
`
`
`ered are known to give good re-
`sults, but at the expense of a very
`large complexity for large ranges:
the decision of how to trade off the quality of the motion vector
field against the complexity of the motion estimation process is for
the implementer to make.
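An exhaustive block-matching search of the kind discussed above can be sketched as follows. Since the choice of cost function and search range is left to the implementation, everything specific here is an assumption made for illustration: the cost D is taken to be the absolute difference (so the sum is a sum of absolute differences), the block is 4 × 4 rather than 16 × 16, pictures are plain lists of rows, and the function names are hypothetical.

```python
def block_cost(current, reference, top, left, size, dx, dy):
    """Sum of |I_c(x) - I_r(x + v)| over the block, for v = (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[top + y][left + x]
                         - reference[top + y + dy][left + x + dx])
    return total


def full_search(current, reference, top, left, size, search_range):
    """Try every displacement in the square range and keep the cheapest."""
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference.
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue
            cost = block_cost(current, reference, top, left, size, dx, dy)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


# Synthetic test: the current picture is the reference displaced by (2, -1).
reference = [[x + 16 * y for x in range(16)] for y in range(16)]
true_dx, true_dy = 2, -1
current = [[reference[min(max(y + true_dy, 0), 15)][min(max(x + true_dx, 0), 15)]
            for x in range(16)] for y in range(16)]
vector, cost = full_search(current, reference, top=6, left=6, size=4,
                           search_range=3)
print(vector, cost)
```

The number of candidates grows with the square of the search range, which is the complexity cost mentioned in the text; faster search strategies would visit only a subset of these candidates.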
`
`Spatial Redundancy Reduction
`Both still-image and prediction-
`error signals have a very high spa-
`tial redundancy. The redundancy
`reduction techniques usable to this
`effect are many, but because of the
`block-based nature of the motion-
`compensation process, block-based
`techniques are preferred. In the
`
TABLE 5.

Prediction Modes for Macroblock in B-Picture

Macroblock Type       Predictor                            Prediction Error
Intra                 $\hat{I}_1(x) = 128$
Forward Predicted     $\hat{I}_1(x) = I_0(x + mv_{01})$
Backward Predicted    $\hat{I}_1(x) = I_2(x + mv_{21})$
Average               $\hat{I}_1(x) =$