MPEG: A Video Compression Standard for Multimedia Applications
DIGITAL MULTIMEDIA SYSTEMS
`
`
`
digital video; MPEG is a standard that responds to a need. In this situation a standards committee is a forum where precompetitive research can take place, where manufacturers meet researchers, where industry meets academia. By and large, because the problem to be solved was perceived as important, the technology developed within MPEG is at the forefront of both research and industry. Now that the work of the MPEG committee has reached maturity (a "Committee Draft" was produced in September 1990), the VLSI industry is ready and waiting to implement MPEG's solution.
MPEG Standard Activities
The activity of the MPEG committee was started in 1988 with the goal of achieving a draft of the standard by 1990. In the two years of MPEG activity, participation has increased tenfold from 15 to 150 participants. The MPEG activity was not started without due consideration of the related activities of other standards organizations. These considerations are of interest not only because it is important to avoid duplication of work between standards committees but, most of all, because these activities provided a very important background and technical input to the work of the MPEG committee.
`
Background: Relevant Standards
The JPEG Standard. The activities of JPEG (Joint Photographic Experts Group) [10] played a considerable role in the beginning of MPEG, since both groups were originally in the same working group of ISO and there has been considerable overlap in membership. Although the objectives of JPEG are focused exclusively on still-image compression, the distinction between still and moving images is thin; a video sequence can be
The development of digital video technology in the 1980s has made it possible to use digital video compression for a variety of telecommunication applications: teleconferencing, digital broadcast codecs and video telephony. Standardization of video compression techniques has become a high priority because only a standard can reduce the high cost of video compression codecs and resolve the critical problem of interoperability of equipment from different manufacturers. The existence of a standard is often the trigger to the volume production of integrated circuits (VLSI) necessary for significant cost reductions. An example of such a phenomenon, where a standard has stimulated the growth of an industry, is the spectacular growth of the facsimile market in the wake of the standardization of the Group 3 facsimile compression algorithm by the CCITT. Standardization of compression algorithms for video was first initiated by the CCITT for teleconferencing and videotelephony [7]. Standardization of video compression techniques for transmission of contribution-quality television signals has been addressed in the CCIR¹ (more precisely in CMTT/2, a joint committee between the CCIR and the CCITT).
Digital transmission is of prime importance for telecommunication, particularly in the telephone network, but there is a lot more to digital video than teleconferencing and visual telephony. The computer industry, the telecommunications industry and the consumer electronics industry are increasingly sharing the same technology. There is much talk of a convergence, which does not mean that a computer workstation and a television receiver are about to become the same thing; but certainly, the technology is converging and includes
¹CCIR is the International Consultative Committee on Broadcasting; CCITT is the International Committee on Telegraph and Telephones. CMTT is a joint committee of the CCITT and the CCIR working on issues relevant to television and telephony.
`
digital video compression. In view of the shared technology between different segments of the information processing industry, the International Organization for Standardization (ISO) has undertaken an effort to develop a standard for video and associated audio on digital storage media, where the concept of digital storage medium includes conventional storage devices (CD-ROM, DAT, tape drives, Winchester disks, writable optical drives) as well as telecommunication channels such as ISDNs and local area networks.
This effort is known by the name of the expert group that started it: MPEG, the Moving Picture Experts Group, and is currently part of ISO-IEC/JTC1/SC2/WG11. The MPEG activities cover more than video compression, since the compression of the associated audio and the issue of audio-visual synchronization cannot be worked independently of the video compression: MPEG-Video is addressing the compression of video signals at about 1.5 Mbits/s, MPEG-Audio is addressing the compression of a digital audio signal at the rates of 64, 128 and 192 kbits/s per channel, and MPEG-System is addressing the issue of synchronization and multiplexing of multiple compressed audio and video bit streams. This article focuses on the activities of MPEG-Video. The premise of MPEG is that a video signal and its associated audio can be compressed to a bit rate of about 1.5 Mbits/s with an acceptable quality. Two very important consequences follow: full-motion video becomes a form of computer data, i.e., a data type to be integrated with text and graphics; and motion video and its associated audio can be delivered over existing computer and telecommunication networks.
`
Precompetitive Research
The growing importance of digital video is reflected in the participation of more and more companies in standards activities dealing with
COMMUNICATIONS OF THE ACM / April 1991 / Vol. 34, No. 4
`
thought of as a sequence of still images to be coded individually, but displayed sequentially at video rate. However, the "sequence of still images" approach has the disadvantage that it fails to take into consideration the extensive frame-to-frame redundancy present in all video sequences. Indeed, because there is a potential for an additional factor of three in compression by exploiting the temporal redundancy, and because this potential has very significant implications for many applications relying on storage media with limited bandwidth, extending the activity of the ISO committee to moving pictures was a natural next step.
`
CCITT Expert Group on Visual Telephony. As previously mentioned, most of the pioneering activities in video compression were triggered by teleconferencing and videotelephony applications. The definition and planned deployment of ISDN (Integrated Services Digital Network) was the motivation for the standardization of compression techniques at the rate of p×64 kbits/s, where p takes values from one (one B channel of ISDN) to more than 20 (primary rate ISDN is 23 or 30 B channels). The Experts Group on visual telephony in the CCITT Study Group XV addressed the problem and produced CCITT Recommendation H.261: "Video Codec for Audiovisual Services at p×64 kbits/s" [7, 9]. The focus of the CCITT expert group is a real-time encoding-decoding system exhibiting less than 150 ms delay. In addition, because of the importance of very low bit-rate operation (around 64 kbits/s), the overhead information is very tightly managed.
After careful consideration by the MPEG committee, it was perceived that while the work of the CCITT expert group was of very high quality, relaxing the constraint on very low delay and the focus on extremely low bit rates could lead to a solution with increased visual quality in the range of 1 to 1.5 Mbits/s. On the other hand, the contribution of the CCITT expert group has been extremely relevant, and the members of MPEG have strived to maintain compatibility, introducing changes only to improve quality or to satisfy the needs of applications. Consequently, the emerging MPEG standard, while not strictly a superset of CCITT Recommendation H.261, has much commonality with that standard, so that implementations supporting both standards are quite plausible.
`
CMTT/2 Activities. If digital video compression can be used for videoconferencing or videotelephony applications, it also can be used for transmission of compressed television signals for use by broadcasters. In this context the transmission channels are either the high levels of the digital hierarchy, H21 (34 Mbits/s) and H22 (45 Mbits/s), or digital satellite channels. The CMTT/2 addressed the compression of television signals at 34 and 45 Mbits/s [4]. This work was focused on contribution-quality codecs, which means that the decompressed signal should be of high enough quality to be suitable for further processing (such as chromakeying). While the technology used might have some commonalities with the solutions considered by MPEG, the problem and the target bandwidth are very different.
`
MPEG Standardization Effort
The MPEG effort started with a tight schedule, due to the realization that failure to get significant results fast enough would have potentially disastrous consequences, such as the establishment of multiple, incompatible de facto standards. With a tight schedule came the need for a tight methodology, so the committee could concentrate on technical matters rather than waste time dealing with controversial issues.
`
Requirements. The purpose of the requirement phase was twofold: first, precisely determine the focus of the effort; then determine the rules of the game for the competitive phase. At the time MPEG began its effort, the requirements for the integration of digital video and computing were not clearly understood, and the MPEG approach was to provide enough system design freedom and enough quality to address many applications. The outcome of the requirement phase was a document, the "Proposal Package Description" [8], and a test methodology [5].
`
Competition. When developing an international standard, it is very important to make sure the trade-offs are made on the basis of maximum information so that the life of the standard will be long: there is nothing worse than a standard that is obsolete at the time of publication. This means the technology behind the standard must be state of the art, and the standard must bring together the best of academic and industrial research. In order to achieve this goal, a competitive phase followed by extensive testing is necessary, so that new ideas are considered solely on the basis of their technical merits and the trade-off between quality and cost of implementation.
In the MPEG-Video competition, 17 companies or institutions contributed or sponsored a proposal, and 14 different proposals were presented and subjected to analysis and subjective testing (see Table 1). Each proposal consisted of a documentation part, explaining the algorithm and documenting the system claims; a video part for input to the subjective test [5]; and a collection of computer files (program and data) so the compression claim could be verified by an impartial evaluator.
`
Methodology. The MPEG methodology was divided into three phases: Requirements, Competition and Convergence.
`
Convergence. The convergence phase is a collaborative process where the ideas and techniques identified as promising at the end of the competitive phase are to be integrated into one solution. The convergence process is not always painless: ideas of considerable merit frequently have to be abandoned in favor of slightly better or slightly simpler ones. The methodology for convergence took the form of an evolving document, called a simulation model, and a series of fully documented experiments (called core experiments).
`
`
The experiments were used to resolve which of two or three alternatives gave the best quality subject to a reasonable implementation cost.
`
Schedule. The schedule of MPEG was derived with the goal of obtaining a draft of the standard (Committee Draft) by the end of 1990. Although the amount of work was considerable, and staying on schedule meant many meetings, the members of MPEG-Video were able to reach an agreement on a draft in September 1990. The content of the draft has been "frozen" since then, indicating that only minor changes will be accepted, i.e., editorial changes and changes meant only to correct demonstrated inaccuracies. Figure 1 illustrates the MPEG schedule for the competitive and convergence phases.
`
MPEG-Video Requirements
A Generic Standard
Because of the various segments of the information processing industry represented in the ISO committee, a representation for video on digital storage media has to support many applications. This is expressed by saying that the MPEG standard is a generic standard. Generic means that the standard is independent of a particular application; it does not mean, however, that it ignores the requirements of the applications. A generic standard possesses features that make it somewhat universal, e.g., it follows the toolkit approach; it does not mean that all the features are used all the time for all applications, which would result in dramatic inefficiency. In MPEG, the requirements on the video compression algorithm have been derived directly from the likely applications of the standard.
`
Many applications have been proposed based on the assumption that an acceptable quality of video
`
`
`
TABLE 1.
Proposer                 Country       Company
AT&T                     USA           AT&T
Bellcore                 USA           Bellcore
Intel                    USA           Bellcore
C-Cube Micro             USA           C-Cube Micro
DEC                      USA           DEC
France Telecom           France        France Telecom
COST 211 bis             EUR           France Telecom
IBM                      USA           IBM
JVC Corp.                Japan         JVC Corp.
Matsushita Elec.         Japan         Matsushita Elec.
Mitsubishi Elec.         Japan         Mitsubishi Elec.
NEC Corp.                Japan         NEC Corp.
NTT                      Japan         NTT
Philips CE               Netherlands   Philips CE
Sony Corp.               Japan         Sony Corp.
Telenorma/U. Hannover    Germany       Telenorma/U. Hannover
`
TABLE 2.
CD-ROM
DAT
Winchester Disk
Writable Optical Disks
ISDN
LAN
Other communication channels
`
`
TABLE 3. Asymmetric applications of digital video.

FIGURE 1. MPEG schedule for the competitive and convergence phases:
June 1989: Pre-registration deadline
September 1989: Proposal registration
October 1989: Subjective test
March 1990: Definition of video algorithm (Simulation Model 1)
September 1990: Draft proposal
`
can be obtained for a bandwidth of about 1.5 Mbits/second (including audio). We shall review some of these applications because they put constraints on the compression technique that go beyond those required of a videotelephone or a videocassette recorder (VCR). The challenge of MPEG was to identify those constraints and to design an algorithm that can flexibly accommodate them.
`
Applications of Compressed Video on Digital Storage Media

Digital Storage Media. Many storage media and telecommunication channels are perfectly suited to a video compression technique targeted at the rate of 1 to 1.5 Mbits/s (see Table 2). CD-ROM is a very important storage medium because of its large capacity and low cost. Digital audio tape (DAT) is also perfectly suitable for compressed video; the recordability of the medium is a plus, but its sequential nature is a major drawback when random access is required. Winchester-type computer disks provide a maximum of flexibility (recordability, random access) but at a significantly higher cost and limited portability. Writable optical disks are expected to play a significant role in the future because they have the potential to combine the advantages of the other media (recordability, random accessibility, portability and low cost). The compressed bit rate of 1.5 Mbits/s is also perfectly suitable for computer and telecommunication networks, and the combination of digital storage and networking can be at the origin of many new applications, from video on local area networks (LANs) to distribution of video over telephone lines [1].
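As a rough sanity check on these numbers, the playing time such a bit rate allows from a CD-ROM can be estimated directly. The 650-MB capacity figure below is an illustrative assumption, not a number from the article, which only notes that CD-ROM offers large capacity at low cost.

```python
# Rough estimate: playing time of compressed audio + video at about
# 1.5 Mbits/s from a CD-ROM.  The 650-MB capacity is an assumed,
# illustrative figure.
CD_CAPACITY_BYTES = 650 * 10**6   # assumed CD-ROM capacity
TOTAL_RATE_BPS = 1.5 * 10**6      # combined audio + video bit rate

playing_time_min = CD_CAPACITY_BYTES * 8 / TOTAL_RATE_BPS / 60
print(round(playing_time_min))    # roughly a full hour of program material
```

At about 58 minutes, a single disc holds a feature-length program segment, which is what makes the 1.5 Mbits/s operating point attractive for storage media.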
`
Asymmetric Applications. In order to find a taxonomy of applications of digital video compression, the distinction between symmetric and asymmetric applications is most useful. Asymmetric applications are those that require frequent use of the decompression process, but for which the compression process is performed once and for all at the production of the program. Among asymmetric applications, one could find an additional subdivision into electronic publishing, video games and delivery of movies. Table 3 shows the asymmetric applications of digital video.
`
Symmetric Applications. Symmetric applications require essentially equal use of the compression and the decompression processes. In symmetric applications there is always production of video information, either via a camera (video mail, videotelephone) or by editing prerecorded material. One major class of symmetric applications is the generation of material for playback-only applications (desktop video publishing); another class involves the use of telecommunication, either in the form of electronic mail or in the form of interactive face-to-face applications. Table 4 shows the symmetric applications of digital video.

Features of the Video Compression Algorithm
The requirements for compressed video on digital storage media (DSM) have a natural impact on the solution. The compression algorithm must have features that make it possible to fulfill all the requirements. The following features have been identified as important in order to meet the needs of the applications of MPEG.

Random Access. Random access is an essential feature for video on a storage medium, whether the medium is a random-access medium such as a CD or a magnetic disk, or a sequential medium such as a magnetic tape. Random access requires that a compressed video bit stream be accessible in its middle and any frame of video be
`
`
decodable in a limited amount of time. Random access implies the existence of access points, i.e., segments of information coded only with reference to themselves. A random access time of about 1/2 second should be achievable without significant quality degradation.
`
Fast Forward/Reverse Searches. Depending on the storage media, it should be possible to scan a compressed bit stream (possibly with the help of an application-specific directory structure) and, using the appropriate access points, display selected pictures to obtain a fast forward or a fast reverse effect. This feature is essentially a more demanding form of random accessibility.
`
Reverse Playback. Interactive applications might require the video signal to play in reverse. While it is not necessary for all applications to maintain full quality in reverse mode, or even to have a reverse mode at all, it was perceived that this feature should be possible without an extreme additional cost in memory.
`
Audio-Visual Synchronization. The video signal should be accurately synchronizable to an associated audio source. A mechanism should be provided to permanently resynchronize the audio and the video should the two signals be derived from slightly different clocks. This feature is addressed by the MPEG-System group, whose task is to define the tools for synchronization as well as integration of multiple audio and video signals.
`
Robustness to Errors. Most digital storage media and communication channels are not error-free, and while it is expected that an appropriate channel coding scheme will be used by many applications, the source coding scheme should be robust to any remaining uncorrected errors: catastrophic behavior in the presence of errors should thus be avoidable.
`
Coding/Decoding Delay. As mentioned previously, applications such as videotelephony need to maintain the total system delay under 150 ms in order to maintain the conversational, "face-to-face" nature of the application. On the other hand, publishing applications could content themselves with fairly long encoding delays and strive to maintain the total decoding delay below the "interactive threshold" of about one second. Since quality and delay can be traded off to a certain extent, the algorithm should perform well over the range of acceptable delays, and the delay is to be considered a parameter.
`
Editability. While it is understood that not all pictures will be compressed independently (i.e., as still images), it is desirable to be able to construct editing units of a short time duration, coded only with reference to themselves, so that an acceptable level of editability in compressed form is obtained.
`
Format Flexibility. The computer paradigm of "video in a window" supposes a large flexibility of formats in terms of raster size (width, height) and frame rate.
`
Cost Tradeoffs. All the proposed algorithmic solutions were evaluated in order to verify that a decoder is implementable in a small number of chips, given the technology of 1990. The proposed algorithm also had to meet the constraint that the encoding process could be performed in real time.
`
Overview of the MPEG Compression Algorithm
The difficult challenge in the design of the MPEG algorithm is the following: on one hand, the quality requirements demand a very high compression, not achievable with intraframe coding alone; on the other hand, the random access requirement is best satisfied with pure intraframe coding. The algorithm can satisfy all the requirements only insofar as it achieves the
`
`
high compression associated with interframe coding, while not compromising random access for those applications that demand it. This requires a delicate balance between intra- and interframe coding, and between recursive and nonrecursive temporal redundancy reduction. In order to answer this challenge, the members of MPEG have resorted to using two interframe coding techniques: predictive and interpolative.
The MPEG video compression algorithm [3] relies on two basic techniques: block-based motion compensation for the reduction of the temporal redundancy, and transform domain (DCT)-based compression for the reduction of spatial redundancy. Motion-compensated techniques are applied with both causal (pure predictive coding) and noncausal predictors (interpolative coding). The remaining signal (prediction error) is further compressed with spatial redundancy reduction (DCT). The information relative to motion is based on 16 x 16 blocks and is transmitted together with the spatial information. The motion information is compressed using variable-length codes to achieve maximum efficiency.
`
`
`
`
FIGURE 2. Interframe coding: forward prediction and bidirectional prediction.
`
subsignal with low temporal resolution (typically 1/2 or 1/3 of the frame rate) is coded, and the full-resolution signal is obtained by interpolation of the low-resolution signal and addition of a correction term. The signal to be reconstructed by interpolation is obtained by adding a correction term to a combination of a past and a future reference.
`
Motion-compensated interpolation (also called bidirectional prediction in MPEG terminology) presents a series of advantages, not the least of which is that the compression obtained by interpolative coding is very high. The other advantages of bidirectional prediction (temporal interpolation) are:

• It deals properly with uncovered areas, since an area just uncovered is not predictable from the past reference, but can be properly predicted from the "future" reference.
• It has better statistical properties, since more information is available: in particular, the effect of noise can be decreased by averaging between the past and the future reference pictures.
• It allows decoupling between prediction and coding (no error propagation).
• The trade-off associated with the frequency of bidirectional pictures is the following: increasing the number of B-pictures between references decreases the correlation of B-pictures with the references, as well as the correlation between the references themselves. Although this trade-off varies with the nature of the video scene, for a large class of scenes it appears reasonable to space references at about a 1/10th second interval, resulting in a combination of the type I B B P B B P B B . . . I B B P B B.
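The noise-averaging advantage can be illustrated numerically. The sketch below is not MPEG code; the block size and noise level are arbitrary illustrative choices showing why averaging two independently noisy references gives a better predictor than either one alone.

```python
import numpy as np

# Averaging two independently noisy references roughly halves the noise
# variance of the predictor.  Block size and noise level are arbitrary.
rng = np.random.default_rng(0)
true_block = rng.integers(0, 256, (16, 16)).astype(float)
past = true_block + rng.normal(0.0, 4.0, (16, 16))     # noisy past reference
future = true_block + rng.normal(0.0, 4.0, (16, 16))   # noisy future reference

forward_only = past                    # forward prediction uses the past only
average = 0.5 * (past + future)        # bidirectional "average" predictor

mse_forward = float(np.mean((forward_only - true_block) ** 2))
mse_average = float(np.mean((average - true_block) ** 2))
print(mse_average < mse_forward)       # averaging reduces the prediction error
```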
`
MPEG is quite flexible, and the organization of the pictures will depend on application-specific parameters such as random accessibility and coding delay. As an example, in Figure 2 an intracoded picture is inserted every 8 frames, and the ratio of interpolated pictures to intra- or predicted pictures is three out of four.
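A picture-type pattern of this kind is easy to generate. The function below is an illustrative sketch; the `intra_period` and `ref_spacing` parameters are assumptions chosen to reproduce an I B B P B B-style pattern, not values fixed by the draft.

```python
def picture_types(n_frames, intra_period=9, ref_spacing=3):
    # Display-order picture types: an I-picture every intra_period frames,
    # a reference (I or P) every ref_spacing frames, B-pictures elsewhere.
    # Both parameters are illustrative choices, not mandated by the draft.
    types = []
    for i in range(n_frames):
        if i % intra_period == 0:
            types.append("I")
        elif i % ref_spacing == 0:
            types.append("P")
        else:
            types.append("B")
    return types

print("".join(picture_types(12)))   # IBBPBBPBBIBB
```

Note that this is display order; because a B-picture needs its future reference, the coding (transmission) order moves each reference ahead of the B-pictures that depend on it.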
`
Motion Compensation
Prediction. Among the techniques that exploit the temporal redundancy of video signals, the most widely used is motion-compensated prediction. It is the basis of most compression algorithms for visual telephony, such as the CCITT standard H.261. Motion-compensated prediction assumes that "locally" the current picture can be modeled as a translation of the picture at some previous time. Locally means that the amplitude and the direction of the displacement need not be the same everywhere in the picture. The motion information is part of the information necessary to recover the picture and has to be coded appropriately.
`
Interpolation. Motion-compensated interpolation is a key feature of MPEG. It is a technique that helps satisfy some of the application-dependent requirements, since it improves random access and reduces the effect of errors while at
`
Temporal Redundancy Reduction
Because of the importance of random access for stored video and the significant bit-rate reduction afforded by motion-compensated interpolation, three types of pictures are considered in MPEG²: Intrapictures (I), Predicted pictures (P) and Interpolated pictures (B, for bidirectional prediction). Intrapictures provide access points for random access, but only with moderate compression; predicted pictures are coded with reference to a past picture (Intra- or Predicted) and will in general be used as a reference for future predicted pictures; bidirectional pictures provide the highest amount of compression but require both a past and a future reference for prediction; in addition, bidirectional pictures are never used as a reference. In all cases when a picture is coded with respect to a reference, motion compensation is used to improve the coding efficiency. The relationship between the three picture types is illustrated in Figure 2. The organization of the pictures in
`
`
²In addition to the three picture types mentioned in the text, an additional type, "DC-picture", has been defined. The DC-picture type is used to make fast searches possible on sequential DSMs such as tape recorders with a fast search mechanism. The DC-picture type is never used in conjunction with the other picture types.
`
the same time contributing significantly to the image quality. In the temporal dimension, motion-compensated interpolation is a multiresolution technique: a
`
Motion Representation: Macroblock. There is a trade-off between the coding gain provided by the motion information and the cost associated with coding the motion information. The choice of 16 x 16 blocks for the motion-compensation unit is the result of such a trade-off; such motion-compensation units are called Macroblocks. In the more
`
general case of a bidirectionally coded picture, each 16 x 16 macroblock can be of type Intra, Forward-Predicted, Backward-Predicted or Average. As expressed in Table 5, the expression for the predictor for a given macroblock depends on the reference pictures (past and future) as well as the motion vectors: x is the coordinate of the picture element, v_p the motion vector relative to the past reference picture I_p, and v_f the motion vector relative to the future reference picture I_f. The motion information consists of one vector for forward-predicted and backward-predicted macroblocks, and of two vectors for bidirectionally predicted macroblocks. The motion information associated with each 16 x 16 block is coded differentially with respect to the motion information present in the previous adjacent block. The range of the differential motion vector can be selected on a picture-by-picture basis, to match the spatial resolution, the temporal resolution and the nature of the motion in a particular sequence; the maximal allowable range has been chosen large enough to accommodate even the most demanding situations. The differential motion information is further coded by means of a variable-length code to provide greater efficiency by taking advantage of the strong spatial correlation of the motion vector field (the differential motion vector is likely to be very small except at object boundaries).
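The differencing step can be sketched in a few lines. The zero initial predictor and the example vectors below are assumptions for illustration; the actual predictor-reset rules and variable-length tables are defined in the draft.

```python
def differential_code(vectors):
    # Code each macroblock's motion vector as the difference from the
    # previous adjacent macroblock's vector; a smooth motion field then
    # yields mostly small, cheaply codable symbols.  The (0, 0) initial
    # predictor is an illustrative assumption.
    prev = (0, 0)
    diffs = []
    for vx, vy in vectors:
        diffs.append((vx - prev[0], vy - prev[1]))
        prev = (vx, vy)
    return diffs

row = [(4, 0), (4, 0), (5, 0), (5, 1)]    # nearly uniform motion in a row
print(differential_code(row))             # [(4, 0), (0, 0), (1, 0), (0, 1)]
```

After differencing, most symbols are near zero, which is exactly what makes the subsequent variable-length code efficient.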
`
`
`
Motion Estimation. Motion estimation covers a set of techniques used to extract the motion information from a video sequence. The MPEG syntax specifies how to represent the motion information: one or two motion vectors per 16 x 16 subblock of the picture, depending on the type of motion compensation: forward-predicted, backward-predicted or average. The MPEG draft does not specify, however, how such vectors are to be computed. Because of the block-based motion representation, block-matching techniques are likely to be used: in a block-matching technique, the motion vector is obtained by minimizing a cost function measuring the mismatch between a block and each predictor candidate. Let M_b be a macroblock in the current picture I_t and v the displacement with respect to the reference picture I_r; then the optimal displacement ("motion vector") is obtained by the formula:

    v* = min_{v ∈ V} Σ_{x ∈ M_b} D[I_t(x) - I_r(x + v)]

where the search range V of the possible motion vectors and the selection of the cost function D are left entirely to the implementation.
Exhaustive searches, where all the possible motion vectors are considered, are known to give good results, but at the expense of a very large complexity for large ranges; the decision to trade off quality of the motion vector field versus complexity of the motion estimation process is for the implementer to make.

The freedom left to manufacturers... means the existence of a standard does not prevent creativity and inventive spirit.
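An exhaustive (full) search of this kind can be sketched as follows. The SAD cost function and the ±4-pel search range are implementation choices made for illustration, exactly the freedom the text describes; they are not prescribed by the draft.

```python
import numpy as np

def full_search(cur, ref, bx, by, search=4, bs=16):
    # Exhaustive block matching: minimize a sum-of-absolute-differences
    # cost D over every candidate displacement v in the search window.
    # Cost function and search range are implementation choices.
    block = cur[by:by + bs, bx:bx + bs]
    best_cost, best_v = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bs > ref.shape[1] or y + bs > ref.shape[0]:
                continue                      # candidate falls outside picture
            cost = np.abs(block - ref[y:y + bs, x:x + bs]).sum()
            if best_cost is None or cost < best_cost:
                best_cost, best_v = cost, (dx, dy)
    return best_v

# The reference block matching the current block lies two pels to the
# right and one pel below, so the search should recover v = (2, 1).
rng = np.random.default_rng(1)
ref = rng.integers(0, 256, (64, 64)).astype(float)
cur = np.roll(np.roll(ref, -1, axis=0), -2, axis=1)
print(full_search(cur, ref, 16, 16))   # (2, 1)
```

The doubly nested loop makes the cost of large ranges plain: the work grows with the square of the search distance, which is why fast, non-exhaustive searches are a common implementation refinement.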
`
Spatial Redundancy Reduction
Both still-image and prediction-error signals have a very high spatial redundancy. The redundancy reduction techniques usable to this effect are many, but because of the block-based nature of the motion-compensation process, block-based techniques are preferred. In the
`
`
`
`
`
[Figure: transform coding pipeline — image samples → transform coefficients → quantization, zig-zag scan, run-length coding → (run, amplitude) symbols. Quantizer characteristics: quantizer with dead zone (nonintra macroblocks); quantizer with no dead zone (intra macroblocks).]
`
field of block-based spatial redundancy techniques, transform coding and vector quantization coding are the two likely candidates. Transform coding techniques, with a combination of visually weighted scalar quantization and run-length coding, have been preferred because the DCT presents a certain number of definite advantages and has a relatively straightforward implementation; the advantages are the following:

• The DCT is an orthogonal transform. Orthogonal transforms are filter-bank-oriented (i.e., they have a frequency-domain interpretation); they have locality: the samples on an 8 x 8 spatial window are sufficient to compute 64 transform coefficients (or subbands); and orthogonality guarantees well-behaved quantization in subbands.
• The DCT is the best of the orthogonal transforms with a fast algorithm, and a very close approximation to the optimal for a large class of images.
• The DCT basis functions (or subband decomposition) are sufficiently well-behaved to allow effective use of psychovisual criteria. (This is not the case with "simpler" transforms such as Walsh-Hadamard.)
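The transform, quantize, scan and run-length chain can be sketched end to end. The uniform quantizer step and the simple diagonal scan below are illustrative stand-ins for the draft's visually weighted quantization matrices and exact scan tables.

```python
import numpy as np

N = 8
# Orthonormal 8 x 8 DCT-II matrix; a block B is transformed as C @ B @ C.T.
C = np.array([[np.sqrt((1.0 if k == 0 else 2.0) / N) *
               np.cos(np.pi * (2 * n + 1) * k / (2 * N))
               for n in range(N)] for k in range(N)])

def zigzag_order(n=8):
    # Scan the block diagonal by diagonal, alternating direction.
    key = lambda ij: (ij[0] + ij[1],
                      ij[1] if (ij[0] + ij[1]) % 2 else ij[0])
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)

def encode_block(block, q=16):
    # DCT of level-shifted samples, uniform scalar quantization (the real
    # scheme uses visually weighted matrices), zig-zag scan, run-length.
    coeffs = C @ (block - 128.0) @ C.T
    quant = np.round(coeffs / q).astype(int)
    symbols, run = [], 0
    for i, j in zigzag_order():
        if quant[i, j] == 0:
            run += 1
        else:
            symbols.append((run, int(quant[i, j])))   # (run, amplitude) pair
            run = 0
    return symbols

flat = np.full((8, 8), 140.0)       # a flat block compresses to one symbol
print(encode_block(flat))           # [(0, 6)]
```

A flat block reduces to a single DC symbol, which shows why the zig-zag scan plus run-length coding is so effective: after quantization, most high-frequency coefficients are zero and are swept into long runs.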
`
In the standards for still-image coding (JPEG) an