`
`1
`Introduction to Digital Multimedia,.
`Compression, and MPEG,,2
`
1.1 THE DIGITAL REVOLUTION
`
It is practically a cliche these days to claim that all electronic communications is engaged in a digital revolution. Some communications media such as the telegraph and telegrams were always digital. However, almost all other electronic communications started and flourished as analog forms.
In an analog communication system, a transmitter sends voltage variations, which a receiver then uses to control transducers such as loudspeakers, fax printers, and TV cathode ray tubes (CRTs), as shown in Figure 1.1. With digital, on the other hand, the transmitter first converts the controlling voltage into a sequence of 0s and 1s (called bits), which are then transmitted. The receiver then reconverts or decodes the bits back into a replica of the original voltage variations.
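The conversion from voltage variations to bits and back can be sketched in a few lines of Python. The 8-bit depth, the ±1 V range, and the sine-wave input below are arbitrary choices for illustration, not values taken from the text.

```python
import math

def digitize(samples, bits=8, v_min=-1.0, v_max=1.0):
    """Convert analog voltage samples into integer codes of the given bit depth."""
    levels = 2 ** bits
    step = (v_max - v_min) / levels
    codes = []
    for v in samples:
        v = min(max(v, v_min), v_max)                 # clip to the converter's range
        codes.append(min(int((v - v_min) / step), levels - 1))
    return codes

def reconstruct(codes, bits=8, v_min=-1.0, v_max=1.0):
    """Decode the integer codes back into a replica of the original voltages."""
    step = (v_max - v_min) / (2 ** bits)
    return [v_min + (c + 0.5) * step for c in codes]

# A short stretch of an "analog" waveform, sampled 48 times per cycle.
analog = [math.sin(2 * math.pi * n / 48) for n in range(48)]
replica = reconstruct(digitize(analog))
worst_error = max(abs(a - r) for a, r in zip(analog, replica))
print(f"worst-case reconstruction error: {worst_error:.4f}")   # about half a quantizer step
```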
The main advantage of digital representation of information is the robustness of the bitstream. It can be stored and recovered, transmitted and received, processed and manipulated, all virtually without error. The only requirement is that a zero bit be distinguishable from a one bit, a task that is quite easy in all modern signal handling systems.
Over the years, long and medium distance telephone transmissions have all become digital. Fax transmission changed to digital in the 1970s and 80s. Digitized entertainment audio on compact disks (CDs) has completely displaced vinyl records and almost replaced cassette tape. Digital video satellite is available in several countries, and digital video disks have just arrived in the market. Many other video delivery media are also about to become digital, including over-the-air, coaxial cable, video tape and even ordinary telephone wires. The pace of this conversion is impressive, even by modern standards of technological innovation.
Almost all sensory signals are analog at the point of origination or at the point of perception. Audio and video signals are no exception. As an example, television, which was invented and standardized over fifty years ago, has remained mostly analog from camera to display, and a plethora of products and services have been built around these analog standards, even though dramatic changes have occurred in technology. However, inexpensive integrated circuits, high-speed communication networks, rapid access dense storage media, and computing architectures that can easily handle video-rate data are now rapidly making the analog standard obsolete.
Digital audio and video signals integrated with computers, telecommunication networks, and consumer products are poised to fuel the information revolution. At the heart of this revolution is the digital compression of audio and video signals.
`
`3
`
`
`
`
`
`Introduction to Digi,tal Multimedia, Compresswn, and MPEG-2
`
`3
`
the quality of the retrieved signal does not degrade in an unpredictable manner with multiple reads as it often does with analog storage. Also, with today's database and user interface technology, a rich set of interactions is possible only with stored digital signals. For example, with medical images the quality can be improved by noise reduction and contrast enhancement algorithms. Through the use of pseudo color, the visibility of minute variations and patterns can be greatly increased.
Mapping the stored signal to displays with different resolutions in space (number of lines per screen and number of samples per line) and time (frame rates) can be done easily in the digital domain. A familiar example of this is the conversion of film,* which is almost always at a different resolution and frame rate than the television signal.
Digital signals are also consistent with the evolving telecommunications infrastructure. Digital transmission allows much better control of the quality of the transmitted signal. In broadcast television, for example, if the signal were digital, the reproduced picture in the home could be identical to the picture in the studio, unlike the present situation where the studio pictures look far better than pictures at home. Finally, analog systems dictate that the entire television system from camera to display operate at a common clock with a standardized display. In the digital domain considerable flexibility exists by which the transmitter and the receiver can negotiate the parameters for scanning, resolution, and so forth, and thus create the best picture consistent with the capability of each sensor and display.
`
1.2 THE NEED FOR COMPRESSION
`
With the advent of fax came the desire for faster transmission of documents. With a limited bitrate available on the PSTN,† the only means available to increase transmission speed was to reduce the average number of bits per page, that is, compress the digital data. With the advent of video conferencing, the need for digital compression was even more crucial. For video, simple sampling and binary coding of the camera voltage variations produces millions of bits per second, so much so that without digital compression, any video conferencing service would be prohibitively expensive.
For entertainment TV, the shortage of over-the-air bandwidth has led to coaxial cable (CATV‡) connections directly to individual homes. But even today, there is much more programming available via satellite than can be carried on most analog cable systems. Thus, compression of entertainment TV would allow for more programming to be carried over-the-air and through CATV transmission.
For storage of digital video on small CD sized disks, compression is absolutely necessary. There is currently no other way of obtaining the quality demanded by the entertainment industry while at the same time storing feature length films, which may last up to two hours or longer.
Future low bitrate applications will also require compression before service becomes viable. For example, wireless cellular video telephony must operate at bitrates of a few dozen kilobits per second, which can only be achieved through large compression of the data. Likewise, video retrieval on the Internet or World Wide Web is only feasible with compressed video data.
`
1.2.1 What Is Compression?
Most sensory signals contain a substantial amount of redundant or superfluous information. For example, a television camera that captures 30 frames per second from a stationary scene produces very similar frames, one after the other. Compression attempts to remove the superfluous information so that a single frame can be represented by a smaller amount of finite data, or in the case of audio or time varying images, by a lower data rate.
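As a toy illustration of the frame-to-frame redundancy just described, the sketch below describes only the difference between two nearly identical frames with a simple run-length scheme. The pel values and the scheme itself are invented for illustration; they are not the techniques MPEG actually uses.

```python
def run_lengths(values):
    """Collapse a sequence into [value, repeat_count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# One scan line of two consecutive "frames" of a nearly stationary scene:
# only a single pel changes between them (values invented for illustration).
frame1 = [12] * 30 + [200] * 4 + [12] * 30
frame2 = list(frame1)
frame2[40] = 15

difference = [b - a for a, b in zip(frame1, frame2)]
print(len(frame2), "raw pel values, but only", len(run_lengths(difference)),
      "runs are needed to describe the change from the previous frame")
```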
Digitized audio and video signals contain a significant amount of statistical redundancy. That is,
`
*Most commercial movie film operates at 24 frames per second, whereas U.S. Broadcast TV operates at approximately 30 frames per second.
†Public Switched Telephone Network.
‡Community Antenna Television.
`
`5
`
`
`
`
`
`Introduction to Digi,tal Multimedia, Compression, and MPEG-2
`
`5
`
capacity, the features required by the application, and the affordability of the hardware required for implementation of the compression algorithm (encoder as well as decoder). This section describes some of the issues in designing the compression system.
1.3.1 Quality
The quality of presentation that can be derived by decoding the compressed multimedia signal is the most important consideration in the choice of the compression algorithm. The goal is to provide near-transparent coding for the class of multimedia signals that are typically used in a particular service.
The two most important aspects of video quality are spatial resolution and temporal resolution. Spatial resolution describes the clarity or lack of blurring in the displayed image, while temporal resolution describes the smoothness of motion.
`
Motion video, like film, consists of a certain number of frames per second to adequately represent motion in the scene. The first step in digitizing video is to partition each frame into a large number of picture elements, or pels* for short. The larger the number of pels, the higher the spatial resolution. Similarly, the more frames per second, the higher the temporal resolution.
Pels are usually arranged in rows and columns, as shown in Fig. 1.2. The next step is to measure the brightness of the red, green, and blue color components within each pel. These three color brightnesses are then represented by three binary numbers.
`
Fig. 1.2 Frames to be digitized are first partitioned into a large number of picture elements, or pels for short, which are typically arranged in rows and columns. A row of pels is called a scan line.

*Some authors prefer the slightly longer term pixels.
†National Television Systems Committee. It standardized composite color TV in 1953.

1.3.2 Uncompressed versus Compressed Bitrates
For entertainment television in North America and Japan, the NTSC† color video image has 29.97 frames per second; approximately 480 visible scan lines per frame; and requires approximately 480 pels per scan line in the red, green, and blue color components. If each color component is coded using 8 bits (24 bits/pel total), the bitrate produced is approximately 168 Megabits per second (Mbits/s).
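The arithmetic behind these figures is simply resolution times frame rate times 24 bits per pel. The short check below reproduces the magnitudes quoted here and in Table 1.1 (rounding conventions differ slightly for NTSC); the format list is taken from the table.

```python
def raw_bitrate_mbits(pels, lines, frames_per_second, bits_per_pel=24):
    """Uncompressed RGB bitrate in Mbits/s: pels x lines x frame rate x bits per pel."""
    return pels * lines * frames_per_second * bits_per_pel / 1e6

video_formats = {
    "Film (USA and Japan)": (480, 480, 24),
    "NTSC video":           (480, 480, 29.97),
    "PAL video":            (576, 576, 25),
    "HDTV video":           (1920, 1080, 30),
}
for name, (pels, lines, fps) in video_formats.items():
    print(f"{name:22s} ~{raw_bitrate_mbits(pels, lines, fps):7.0f} Mbits/s uncompressed")
```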
`
`7
`
`
`
`6
`
`hztroduction to Digz"tal Multimedia, Compression, and MPEG-2
`
Table 1.1 Raw and compressed bitrates for film, NTSC, PAL, HDTV, videophone, stereo audio, and five-channel audio. Compressed bitrates are typically lower for slow moving drama than for fast live-action sports.

                            Video Resolution              Uncompressed      Compressed
                            (pels X lines X frames/s)     Bitrate (RGB)     Bitrate
Film (USA and Japan)        (480 X 480 X 24 Hz)           133 Mbits/s       3 to 6 Mbits/s
NTSC video                  (480 X 480 X 29.97 Hz)        168 Mbits/s       4 to 8 Mbits/s
PAL video                   (576 X 576 X 25 Hz)           199 Mbits/s       4 to 9 Mbits/s
HDTV video                  (1920 X 1080 X 30 Hz)         1493 Mbits/s      18 to 30 Mbits/s
HDTV video                  (1280 X 720 X 60 Hz)          1327 Mbits/s      18 to 30 Mbits/s
ISDN videophone (CIF)       (352 X 288 X 29.97 Hz)        73 Mbits/s        64 to 1920 kbits/s
PSTN videophone (QCIF)      (176 X 144 X 29.97 Hz)        18 Mbits/s        10 to 30 kbits/s
Two-channel stereo audio                                  1.4 Mbits/s       128 to 384 kbits/s
Five-channel stereo audio                                 3.5 Mbits/s       384 to 968 kbits/s
`
Table 1.1 shows the raw uncompressed bitrates for film, several video formats including PAL,* HDTV,† and videophone, as well as a few audio formats.
We see from Table 1.1 that uncompressed bitrates are all very high and are not economical in many applications. However, using the MPEG compression algorithms these bitrates can be reduced considerably. We see that MPEG is capable of compressing NTSC or PAL video into 3 to 6 Mbits/s with a quality comparable to analog CATV and far superior to VHS videotape.
`
1.3.3 Bitrates Available on Various Media
A number of communication channels are available for multimedia transmission and storage. PSTN modems can operate up to about 30 kilobits/second (kbits/s), which is rather low for generic video, but is acceptable for personal videophone. ISDN‡ ranges from 64 kbits/s to 2 Mbits/s, which is often acceptable for moderate quality video.
`
Table 1.2 Various media and the bitrates they support.

Medium                      Bitrate
PSTN modems                 Up to 30 kbits/s
ISDN                        64 to 1920 kbits/s
LAN                         10 to 100 Mbits/s
ATM                         135 Mbits/s or more
CD-ROM (normal speed)       1.4 Mbits/s
Digital video disk          9 to 10 Mbits/s
Over-the-air video          18 to 20 Mbits/s
CATV                        20 to 40 Mbits/s
`
LANs§ range from 10 Mbits/s to 100 Mbits/s or more, and ATM** can be many hundreds of Mbits/s. First generation CD-ROMs†† can handle 1.4 Mbits/s at normal speed and often have double or triple speed options. Second generation digital video disks operate at 9 to 10 Mbits/s depending on format. Over-the-air or CATV channels can carry 20 Mbits/s to 40 Mbits/s, depending on distance and interference from other transmitters. These options are summarized in Table 1.2.
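A rough way to read Tables 1.1 and 1.2 together is to compare the top of each compressed bitrate range against the top of each medium's capacity, as in the toy check below; the numbers are copied from the tables, and which pairings make commercial sense is of course a separate question.

```python
# Upper ends of the ranges from Table 1.2 (media capacity) and Table 1.1
# (compressed bitrates), in Mbits/s, used for a rough feasibility check.
media_capacity = {
    "PSTN modem": 0.03, "ISDN": 1.92, "CD-ROM (normal speed)": 1.4,
    "Digital video disk": 10.0, "Over-the-air video": 20.0, "CATV": 40.0,
}
compressed_service = {
    "PSTN videophone": 0.03, "NTSC/PAL video": 6.0, "HDTV video": 30.0,
}

for service, rate in compressed_service.items():
    fits = [m for m, capacity in media_capacity.items() if capacity >= rate]
    print(f"{service} at {rate} Mbits/s fits on:", ", ".join(fits) or "none of the listed media")
```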
`
`1.4 COMPRESSION TECHNOLOGY
`
In picture coding, for example, compression techniques have been classified according to the groups
`
*Phase Alternate Line, used in most of Europe and elsewhere.
†High-Definition Television. Several formats are being used.
‡Integrated Services Digital Network. User bitrates are multiples of 64 kbits/s.
§Local Area Networks such as Ethernet, FDDI, etc.
**Asynchronous Transfer Mode. ATM networks are packetized.
††Compact Disk-Read Only Memory.
`
computing/telecommunications infrastructure and to a large extent gives us media independence for storage, processing, and transmission. The bandwidth expansion resulting from digitization can be more than compensated for by efficient compression that removes both the statistical and the perceptual redundancy present in the multimedia signal. Digital compression has many advantages, most of which come as a result of reduced data rates. However, when digital compression is bundled as a part of an application, many constraints need to be attended to.
These constraints impact directly on the choice of the compression algorithm. Many of these constraints are qualitatively examined in this chapter. A more detailed analysis of these will be made in the subsequent chapters.
`
REFERENCES

1. A. N. NETRAVALI and B. G. HASKELL, Digital Pictures: Representation, Compression and Standards, 2nd edit., Plenum Press, New York, 1995.
2. W. B. PENNEBAKER and J. L. MITCHELL, JPEG: Still Image Compression Standard, Van Nostrand Reinhold, New York, 1993.
3. J. L. MITCHELL et al., MPEG-1, Chapman & Hall, New York, 1997.
4. L. CHIARIGLIONE, "The Development of an Integrated Audiovisual Coding Standard: MPEG," Proceedings of the IEEE, Vol. 83, No. 2 (February 1995).
5. D. J. LE GALL, "MPEG Video: The Second Phase of Work," International Symposium: Society for Information Display, pp. 113-116 (May 1992).
6. A. N. NETRAVALI and A. LIPPMAN, "Digital Television: A Perspective," Proceedings of the IEEE, Vol. 83, No. 6 (June 1995).
7. R. HOPKINS, "Progress on HDTV Broadcasting Standards in the United States," Signal Processing: Image Commun., Vol. 5, pp. 355-378 (1993).
8. A. PURI, "MPEG Video Coding Standards," Invited Tutorial: International Society for Circuits and Systems (April 1995).
9. A. KOPERNIK, R. SAND, and B. CHOQUET, "The Future of Three-Dimensional TV," International Workshop on HDTV '92, Signal Processing of HDTV, IV, pp. 17-29 (1993).
10. S. OKUBO, "Requirements for High Quality Video Coding Standards," Signal Processing: Image Commun., Vol. 4, No. 2, pp. 141-151 (April 1992).
`
`15
`
`
`
`
`
`Anatomy ef MPEG-2
`
`15
`
2.1.1 MPEG-1 Systems
The MPEG-1 Systems1 part specifies a system coding layer for combining coded audio and video data and to provide the capability for combining with it user-defined private data streams as well as streams that may be specified at a later time. To be more specific,12 the MPEG-1 Systems standard defines a packet structure for multiplexing coded audio and video data into one stream and keeping it synchronized. It thus supports multiplexing of multiple coded audio and video streams, where each stream is referred to as an elementary stream. The systems syntax includes data fields to assist in parsing the multiplexed stream after random access and to allow synchronization of elementary streams, management of decoder buffers containing coded data, and identification of timing information of coded programs. The MPEG-1 Systems thus specifies syntax to allow generation of systems bitstreams and semantics for decoding these bitstreams.
Since the basic concepts of timing employed in MPEG-1 Systems are also common to MPEG-2 Systems, we briefly introduce the terminology employed. The Systems Time Clock (STC) is the reference time base, which operates at 90 kHz, and may or may not be phase locked to individual audio or video sample clocks. It produces 33-bit time representations that are incremented at 90 kHz. In MPEG Systems, the mechanism for generating the timing information from decoded data is provided by the Systems Clock Reference (SCR) fields that indicate the current STC time and appear intermittently in the bitstream spaced no further than 700 ms apart. The presentation playback or display synchronization information is provided by Presentation Time Stamps (PTSs), which represent the intended time of presentation of decoded video pictures or audio frames. The audio or video PTSs are sampled to an accuracy of 33 bits.
To ensure guaranteed decoder buffer behavior, MPEG Systems employs a Systems Target Decoder (STD) and Decoding Time Stamp (DTS). The DTS differs from PTS only in the case of video pictures that require additional reordering delay during the decoding process.
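To make the 90 kHz, 33-bit time stamps concrete, here is a minimal sketch of converting between seconds and stamp values; the 10-second starting point and the NTSC frame interval are illustrative choices, not values from the standard text.

```python
STC_HZ = 90_000        # the 90 kHz Systems Time Clock described above
STAMP_WRAP = 1 << 33   # SCR/PTS/DTS values are 33-bit counts of STC ticks

def seconds_to_stamp(t_seconds):
    """Express a time in seconds as a 33-bit count of 90 kHz ticks."""
    return round(t_seconds * STC_HZ) % STAMP_WRAP

def stamp_to_seconds(stamp):
    return stamp / STC_HZ

# Two video pictures presented one NTSC frame interval (1/29.97 s) apart
# differ by 3003 ticks of the 90 kHz clock.
pts_first = seconds_to_stamp(10.0)
pts_second = seconds_to_stamp(10.0 + 1 / 29.97)
print(pts_second - pts_first, "ticks between consecutive pictures")
print(f"second picture presented at {stamp_to_seconds(pts_second):.6f} s")
```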
`
2.1.2 MPEG-1 Video
The MPEG-1 Video2 standard was originally aimed at coding of video of SIF resolution (352 X 240 at 30 noninterlaced frames/s or 352 X 288 at 25 noninterlaced frames/s) at bitrates of about 1.2 Mbits/s. In reality it also allows much larger picture sizes and correspondingly higher bitrates. Besides the issue of bitrates, the other significant issue is that MPEG-1 includes functions to support interactivity such as fast forward (FF), fast reverse (FR), and random access into the stored bitstream. These functions exist since MPEG-1 was originally aimed at digital storage media such as video compact discs (CDs).
The MPEG-1 Video standard basically specifies the video bitstream syntax and the corresponding video decoding process. The MPEG-1 syntax supports encoding methods13,14,15 that exploit both the spatial redundancies and temporal redundancies. Spatial redundancies are exploited by using block-based Discrete Cosine Transform (DCT) coding of 8 X 8 pel blocks followed by quantization, zigzag scan, and variable length coding of runs of zero quantized indices and amplitudes of these indices. Moreover, a quantization matrix allowing perceptually weighted quantization of DCT coefficients can be used to discard perceptually irrelevant information, thus increasing the coding efficiency further. Temporal redundancies are exploited by using motion compensated prediction, which results in a significant reduction of interframe prediction error.
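The zigzag scan and the run/amplitude pairing mentioned above can be sketched as follows. The block contents are invented for illustration, and the actual variable length code tables of the standard are not reproduced here.

```python
def zigzag_order(n=8):
    """Visit an n x n block along anti-diagonals, alternating direction."""
    order = []
    for d in range(2 * n - 1):
        diagonal = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        order.extend(diagonal if d % 2 else reversed(diagonal))
    return order

def run_amplitude_pairs(scanned):
    """(zero-run, amplitude) pairs; trailing zeros would be covered by an end-of-block code."""
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

# A quantized 8 x 8 block that is mostly zero, with a few low-frequency
# coefficients left (the values are made up for illustration).
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 26, -3, 2, 1

scanned = [block[r][c] for r, c in zigzag_order()]
print(run_amplitude_pairs(scanned))   # [(0, 26), (0, -3), (0, 2), (1, 1)]
```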
The MPEG-1 Video syntax supports three types of coded pictures: Intra (I) pictures, coded separately by themselves; Predictive (P) pictures, coded with respect to immediately previous I- or P-pictures; and Bidirectionally Predictive (B) pictures, coded with respect to the immediate previous I- or P-picture as well as the immediate next I- or P-picture. In terms of coding order, P-pictures are causal, whereas B-pictures are noncausal and use two surrounding causally coded pictures for prediction. In terms of compression efficiency, I-pictures are least efficient, P-pictures are somewhat better, and B-pictures are the most efficient. In typical MPEG-1 encoding13,15 an input video sequence is divided
`17
`
`
`
`
`
`• 113818-6: Digital Storage Media-Command
`and Control (DSM-CC)
`• 113818-7: Non Backward Compatible (NBC)
`Audio
`• 113818-8: 10-Bit Video (This work item has
`been_dropped!)
`• 113818-9: Real Time Interface
`• 113818-10: Digital Storage Media-C9mmand
`and Control (DSM-CC) Conformance
`
We now present a brief overview of the aforementioned parts of the MPEG-2 standard.
`
2.2.1 MPEG-2 Systems
Since the MPEG-1 standard was intended for audio-visual coding for Digital Storage Media (DSM) applications and since DSMs typically have very low or negligible transmission bit error rates, the MPEG-1 Systems part was not designed to be highly robust to bit errors. Also, the MPEG-1 Systems was intended for software oriented processing, and thus large variable length packets were preferred to minimize software overhead.
The MPEG-2 standard on the other hand is more generic and thus intended for a variety of audio-visual coding applications. The MPEG-2 Systems4 was mandated to improve error resilience plus the ability to carry multiple programs simultaneously without requiring them to have a common time base. Additionally, it was required that MPEG-2 Systems should support ATM networks. Furthermore, MPEG-2 Systems was required to solve the problems addressed by MPEG-1 Systems and be somewhat compatible with it.
The MPEG-2 Systems17,18 defines two types of streams: the Program Stream and the Transport Stream. The Program Stream is similar to the MPEG-1 Systems stream, but uses a modified syntax and new functions to support advanced functionalities. Further, it provides compatibility with MPEG-1 Systems streams. The requirements of MPEG-2 Program Stream decoders are similar to those of MPEG-1 System Stream decoders. Furthermore, Program Stream decoders are expected to be forward compatible with MPEG-1 System Stream decoders, i.e., capable of decoding MPEG-1 System Streams. Like MPEG-1 Systems decoders, Program Stream decoders typically employ long and variable-length packets. Such packets are well suited for software based processing and error-free environments, such as when the compressed data are stored on a disk. The packet sizes are usually in the range of 1 to 2 kbytes, chosen to match disk sector sizes (typically 2 kbytes); however, packet sizes as large as 64 kbytes are also supported. The Program Stream includes features not supported by MPEG-1 Systems such as scrambling of data; assignment of different priorities to packets; information to assist alignment of elementary stream packets; indication of copyright; indication of fast forward, fast reverse, and other trick modes for storage devices; an optional field for network performance testing; and optional numbering of sequence of packets.
The second type of stream supported by MPEG-2 Systems is the Transport Stream, which differs significantly from MPEG-1 Systems as well as the Program Stream. The Transport Stream offers the robustness necessary for noisy channels as well as the ability to include multiple programs in a single stream. The Transport Stream uses fixed length packets of size 188 bytes, with a new header syntax. It is therefore more suited for hardware processing and for error correction schemes. Thus the Transport Stream is well suited for delivering compressed video and audio over error-prone channels such as coaxial cable television networks and satellite transponders. Furthermore, multiple programs with independent time bases can be multiplexed in one Transport Stream. In fact the Transport Stream is designed to support many functions such as asynchronous multiplexing of programs, fast access to a desired program for channel hopping, multiplexing of programs with clocks unrelated to the transport clock, and correct synchronization of elementary streams for playback. It also allows control of decoder buffers during startup and playback for both constant bitrate and variable bitrate programs.
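As an illustration of the fixed 188-byte packets, here is a minimal parser for the 4-byte packet header. The field layout (sync byte 0x47, 13-bit PID, continuity counter, and so on) comes from the ISO/IEC 13818-1 specification rather than from this excerpt, and the sample packet is hand-built.

```python
TS_PACKET_SIZE = 188   # fixed Transport Stream packet size noted above

def parse_ts_header(packet: bytes) -> dict:
    """Extract the main fields of the 4-byte Transport Stream packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != 0x47:   # 0x47 is the sync byte
        raise ValueError("not a valid 188-byte transport packet")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],        # 13-bit packet identifier
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }

# A hand-built packet carrying PID 0x0100, payload only, continuity counter 7.
packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
print(parse_ts_header(packet))
```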
A basic data structure that is common to the organization of both the Program Stream and Transport Stream data is called the Packetized Elementary Stream (PES) packet. PES packets are generated by packetizing the continuous streams of compressed data generated by video and audio
`19
`
`
`
`
`
`Anatomy ef MPEG-2
`
`19
`
of MPEG-2 Video was considerably revised20 to include video of higher resolutions such as HDTV at higher bitrates as well as hierarchical video coding for a range of applications. Furthermore, it also supports coding of interlaced video of a variety of resolutions.
Actually, MPEG-2 Video does not standardize video encoding; only the video bitstream syntax and decoding semantics are standardized. The standardization process for MPEG-2 Video consisted of a competitive phase followed by a collaborative phase. First an initial set of requirements for MPEG-2 Video was defined and used as the basis for a "Call for Proposals" which invited candidate coding schemes for standardization. These schemes underwent formal subjective testing and analysis to determine if they met the listed requirements. During the collaborative phase, the best elements of the top performing schemes in the competitive phase were merged to form an initial Test Model. A Test Model contains a description of an encoder, bitstream syntax, and decoding semantics. The encoder description is not to be standardized, but is only needed for experimental evaluation of proposed techniques.
Following this, a set of Core Experiments were defined that underwent several iterations until convergence was achieved. During the time that various iterations on the Test Model19 were being performed, the requirements for MPEG-2 Video also underwent further iterations.
In Fig. 2.2 we show a simplified codec19,21 for MPEG-2 Nonscalable Video Coding. The MPEG-2 Video Encoder shown for illustration purposes consists of an Inter frame/field DCT Encoder, a Frame/field Motion Estimator and Compensator, and a Variable Length Encoder (VLE). Earlier we had mentioned that MPEG-2 Video is optimized for coding of interlaced video, which is why both the DCT coding and the motion estimation and compensation employed by the Video Encoder are frame/field adaptive. The Frame/field DCT Encoder exploits spatial redundancies, and the Frame/field Motion Compensator exploits temporal redundancies in the interlaced video signal. The coded video bitstream is sent to a systems multiplexer, Sys Mux, which outputs either a Transport or a Program Stream.
The MPEG-2 Decoder in this codec consists of a Variable Length Decoder (VLD), Inter frame/field DCT Decoder, and the Frame/field Motion Compensator. Sys Demux performs the complementary function of Sys Mux and presents the video bitstream to the VLD for decoding of motion vectors and DCT coefficients. The Frame/field Motion Compensator uses a motion vector decoded by the VLD to generate a motion compensated prediction that is added back to a decoded prediction error signal to generate the decoded video out. This type of coding produces video bitstreams called nonscalable, since normally the full spatial and temporal resolution coded is the one that is expected to be decoded. Actually, this is not quite true, since, if B-pictures are used in encoding, because they do not feed back into the interframe coding loop, it is always possible to decode only some of them, thus achieving temporal resolution scaling even for nonscalable bitstreams. However, by nonscalable cod-
`
[Figure: block diagram. The MPEG-2 Nonscalable Video Encoder takes video in through an Inter frame/field DCT Encoder and a Frame/field Motion Estimator and Compensator, with the output passed through a Variable Length Encoder to Sys Mux; the MPEG-2 Nonscalable Video Decoder mirrors it with a Variable Length Decoder, Inter frame/field DCT Decoder, and Frame/field Motion Compensator producing video out.]
Fig. 2.2 A generalized codec for MPEG-2 Nonscalable Video Coding.
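The temporal-scaling remark above (decoding a nonscalable bitstream while discarding its B-pictures) can be made concrete with a one-line filter over a coded picture sequence; the picture labels continue the illustrative I/P/B pattern used earlier and are not part of the standard itself.

```python
# B-pictures are never used as references, so a decoder may simply skip them
# and still reconstruct every remaining picture, trading smoothness of motion
# for decoding effort.
coded_order = ["I0", "P3", "B1", "B2", "P6", "B4", "B5", "P9", "B7", "B8"]

reduced_layer = [pic for pic in coded_order if not pic.startswith("B")]
print(reduced_layer)   # ['I0', 'P3', 'P6', 'P9'] -- a lower temporal resolution
```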
`
`21
`
`
`
`
`
quality has also been judged sufficient for HDTV.34 Recently, newer applications of MPEG-2 Video have also emerged, necessitating combinations of coding tools requiring new Profiles.
At the time of writing of this chapter in early 1996, two new amendments to MPEG-2 Video, each adding a new Profile, are in progress. One of the two amendments is nearly complete, and the second one is expected to be completed by the end of 1996. Although MPEG-2 Video already includes various coding techniques (decoding techniques, really), in reality these techniques must also be included in one of the agreed Profiles (such Profiles can even be defined after completion of the standard). Next, we briefly introduce what application areas are being targeted by these two amendments.
The first amendment to MPEG-2 Video involves a Profile that includes tools for coding of a higher resolution chrominance format called the 4:2:2 format. We will discuss more about this format in Chapter 5; for now, we need to understand why it was relevant for MPEG-2 Video to address applications related to this format. When work on MPEG-2 Video was close to completion, it was felt that MPEG-2 Video was capable of delivering fairly high quality, and thus it could be used in professional video applications. In sorting out the needs of such applications, besides higher bitrates, it was found that some professional applications require higher than normal chrominance spatial (or spatiotemporal) resolution, such as that provided by the 4:2:2 format, while another set of applications required higher amplitude resolution for the luminance and chrominance signal. Although coding of the 4:2:2 format video is supported by existing tools in the MPEG-2 Video standard, it was necessary to verify that its performance was equal to or better than that of other coding schemes while fulfilling additional requirements such as coding quality after multiple codings and decodings.
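For orientation, the sketch below counts samples per frame for the common chrominance formats. The relationships assumed here (4:2:2 halves the chrominance horizontally only, 4:2:0 halves it in both directions) are the standard definitions discussed further in Chapter 5, and the 720 x 480 frame size is just an example.

```python
def samples_per_frame(width, height, chroma_format):
    """Luminance plus chrominance samples per frame for common chroma formats."""
    luma = width * height
    chroma_fraction = {"4:2:0": 0.25, "4:2:2": 0.5, "4:4:4": 1.0}[chroma_format]
    return int(luma + 2 * luma * chroma_fraction)   # two chrominance components

for fmt in ("4:2:0", "4:2:2", "4:4:4"):
    print(fmt, samples_per_frame(720, 480, fmt), "samples per 720 x 480 frame")
```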
The second amendment32 to MPEG-2 Video involves a Profile that includes tools for coding of a number of simultaneous video signals generated in imaging a scene, each representing a slightly different viewpoint; imaging of such scenes is said to constitute multiviewpoint video. An example of multiviewpoint video is stereoscopic video, which uses two slightly different views of a scene. Another example is generalized 3D video, where a large number of views of a scene may be used. We will discuss more about this Profile in Chapter 5 and a lot more about 3D/stereoscopic video in Chapter 15; for now, as in the case of 4:2:2 coding, we need to understand why it was relevant to address multiviewpoint coding applications. As mentioned earlier, MPEG-2 Video already includes several scalability techniques that allow layered coding which exploits correlations between layers. As a result of a recent rise in applications in video games, movies, education, and entertainment, efficient coding of multiviewpoint video such as stereoscopic video is becoming increasingly important. Not surprisingly, this involves exploiting correlations between different views, and since the scalability techniques of MPEG-2 Video already included such techniques, it was considered relatively straightforward to enable use of such techniques via definition of a Profile (which is currently in progress).
`
2.2.3 MPEG-2 Audio
Digital multichannel audio systems employ a combination of p front and q back channels; for example, three front channels (left, right, and center) and two back channels (surround left and surround right) to create surreal experiences. In addition, multichannel systems can also be used to provide multilingual programs, audio augmentation for visually impaired individuals, enhanced audio for hearing impaired individuals, and so forth.
The MPEG-2 Audio6 standard consists of two parts, one that allows coding of multichannel audio signals in a forward and backward compatible manner with MPEG-1 and one that does not. Part 3 of the MPEG-2 standard is forward and backward compatible with MPEG-1. Here