`
`Jeremiah Golston (Distinguished Member Technical Staff, Texas Instruments)
`j-golston@ti.com
`#250 at Embedded Systems Conference – San Francisco 2004
`
`Introduction
`
`Digital video is being adopted in an increasing range of applications including video telephony,
`security/surveillance, DVD, digital television, Internet video streaming, digital video camcorders, cellular
`media, and personal video recorders. Video compression is an essential enabler for these applications and an
`increasing number of video codec (compression/decompression) industry standards and proprietary
`algorithms are available to make it practical to store and transmit video in digital form. Compression
`standards are evolving to make use of advances in algorithms and take advantage of continued increases in
`available processing horsepower in low-cost integrated circuits such as digital media processors. Differences
exist in the compression standards and within implementations of standards based on optimizations for the
primary requirements of the target application. This paper provides an overview of the different compression
standards and highlights where they are best suited. It also provides an overview of compression rules of
thumb for different standards and the corresponding performance requirements for real-time
implementations.
`
`The Video Compression Challenge
`
A major challenge for digital video is that raw or uncompressed video requires lots of data to be stored or
`transmitted. For example, standard definition NTSC video is typically digitized at 720x480 using 4:2:2
`YCrCb at 30 frames/second. This requires a data rate of over 165 Mbits/sec. To store one 90-minute video
`requires over 110 GBytes or approximately 140x the storage capability of a CDROM. Even lower resolution
`video such as CIF (352x288 4:2:0 at 30 frames/second) which is often used in video streaming applications
`requires over 36.5 Mbits/s - much more than can be sustained on even broadband networks such as ADSL.
`So, it is clear that compression is needed to store or transmit digital video.
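The data rates above follow directly from the frame dimensions, sampling format, and frame rate. A quick sanity check of the arithmetic (illustrative Python, assuming 2 bytes/pixel for 4:2:2 YCrCb and 1.5 bytes/pixel for 4:2:0):

```python
def raw_video_bitrate_mbps(width, height, fps, bytes_per_pixel):
    """Raw (uncompressed) video data rate in Mbits/s.

    bytes_per_pixel: 2.0 for 4:2:2 YCrCb, 1.5 for 4:2:0.
    """
    return width * height * bytes_per_pixel * fps * 8 / 1e6

# 720x480 4:2:2 at 30 frames/s -> over 165 Mbits/s
ntsc = raw_video_bitrate_mbps(720, 480, 30, 2.0)

# Storage for a 90-minute video in GBytes -> over 110 GBytes
ntsc_gbytes = ntsc * 1e6 * 90 * 60 / 8 / 1e9

# CIF 352x288 4:2:0 at 30 frames/s -> over 36.5 Mbits/s
cif = raw_video_bitrate_mbps(352, 288, 30, 1.5)
```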
`
The goal for image and video compression is to represent (or encode) a digital image, or a sequence of images
in the case of video, using as few bits as possible while maintaining its visual appearance. The techniques
that have emerged are based on mathematical methods but require making subtle tradeoffs that approach
being an art form.
`
`Compression Tradeoffs
`
`There are many factors to consider in selecting the compression engine to use in a digital video system. The
`first thing to consider is the image quality requirements for the application and the format of both the source
`content and target display. Parameters include the desired resolution, color depth, the number of frames per
`second, and whether the content and/or display are progressive or interlaced.
`
`
`NETFLIX, INC
`Exhibit 1019
`IPR2018-01630
`
`
`
`Compression often involves tradeoffs between the image quality requirements and other needs of the
`application. For example, what is the maximum bit rate in terms of bits per second? How much storage
`capacity is available and what is the recording duration? For two-way video communication, what is the
`latency tolerance or allowable end-to-end system delay? The various compression standards handle these
`tradeoffs including the image resolution and target bit rate differently depending on the primary target
`application.
`
Another tradeoff is the cost of real-time implementation of the encoding and decoding. Typically, newer
algorithms that achieve higher compression require increased processing, which can impact the cost of
encoding and decoding devices, system power dissipation, and total memory in the system.
`
`Standards Bodies
`
`There have been two primary standards organizations driving the definition of image and video compression
`standards. The International Telecommunications Union (ITU) is focused on telecommunication applications
and has created the H.26x standards for video telephony. The International Organization for Standardization (ISO) is more
`focused on consumer applications and has defined the JPEG standards for still image compression and
`MPEG standards for compressing moving pictures.
`
`The two groups often make slightly different tradeoffs based on their primary target applications. On
`occasions the two groups have worked together such as recent work by the JVT (or Joint Video Team) on a
common standard referred to as both H.264 and MPEG-4 AVC. While almost all video standards were
targeted for a few specific applications, they are often used to advantage in other kinds of applications when
they are well suited.
`
`Standards have been critical for the widespread adoption of compression technology. The ITU and ISO have
`been instrumental in creating compression standards the marketplace can use to achieve interoperability.
`These groups also continue to evolve compression techniques and define new standards that deliver higher
`compression and enable new market opportunities.
`
[Figure 1 depicts a timeline (1984–2004) of ITU-T standards (H.261, H.263, H.263+, H.263++), joint ITU-T/MPEG standards (H.262/MPEG-2, H.264/MPEG-4 AVC), and MPEG standards (MPEG-1, MPEG-4).]

Figure 1. Progression of the ITU-T Recommendations and MPEG standards.
`
`In addition to industry standards from the ITU and ISO, several popular proprietary solutions have emerged
particularly for Internet streaming media applications. These include RealNetworks RealVideo (RV10),
Microsoft Windows Media Video 9 Series, On2 VP6, and Nancy, among others. Because of the installed
`base of content in these formats, they can become de facto standards.
`
The number of standards and de facto standards is growing rapidly, creating an increasing need for flexible
solutions for encoding and decoding. We'll step through some of the industry-standard formats in a little
more detail in the next few sections, focusing on key features and target applications.
`
`JPEG
`
JPEG, developed by the ISO, was the first widespread image compression standard [1]. It was designed to
allow compression of digital images and is now widely used for Internet web pages and digital still cameras.
The compression quality at a given rate is somewhat dependent on the image content, such as the amount of
detail or high-frequency content in the image. However, using JPEG, compression factors of 10:1 can
typically be achieved without introducing serious effects from the compression. Above this 10:1
ratio, compression artifacts such as blockiness, contouring, and blurring of the image become easy to
notice.
`
`Although JPEG is not optimized for video, it has often been used for coding video with what is sometimes
`referred to as “Motion JPEG”. Motion JPEG is not defined in the standard but typically consists of
`independently coding individual frames in a digital video sequence using JPEG. A common application for
Motion JPEG is network video surveillance. Since each image is coded independently, it is easy to search
through the content, and interoperability with PC browsers is also straightforward.
`
[Figure 2 shows the JPEG pipeline: Input Image → DCT → Quantize → Run-Length Encode (RLE) → Variable-Length Coding → Output Bitstream.

Key components:
– DCT (translate spatial data to frequency domain)
– Quantization (scale the bit allocation for different frequencies, generating many zero-valued coefficients)
– Run-length code (RLE) non-zero coefficients
– Variable-Length Coding (VLC): entropy code (e.g., Huffman) the run-length codes]

Figure 2: JPEG (Intra-frame) Compression Block Diagram
`
`
`
`
`The main functions in the JPEG standard shown in Figure 2 formed the core for all of the major compression
`algorithms that followed. Key functions include the following:
`
`Block-based Processing: Dividing each frame into blocks of pixels so that processing of the image or video
`frame can be conducted at the block level.
`Intra-frame Coding: Exploiting the spatial redundancies that exist within the image or video frame by
`coding the original blocks through transform, quantization, and entropy coding. The frame is coded based on
`spatial redundancy only. There is no dependence on surrounding frames.
8x8 DCT: Each 8x8 block of pixel values is mapped to the frequency domain, producing 64 frequency
components.
`Perceptual Quantization: Scale the bit allocation for different frequencies typically generating many zero
`valued coefficients.
`Run-length Coding: Represent the quantized frequency coefficients as a non-zero coefficient level followed
`by runs of zero coefficients and a final end of block code after the last non-zero value.
`Variable Length (Huffman) Coding: Huffman coding converts the run-level pairs into variable length
`codes (VLCs) with the bit-length optimized for the typical probability distribution.
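The key functions above can be condensed into a short sketch of the intra-frame pipeline. This is illustrative only: it uses an orthonormal DCT matrix, a single uniform quantizer step, and a row-major scan in place of JPEG's perceptual quantization table and zig-zag scan, and omits the final Huffman stage.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: row u is frequency u, column x is a pixel."""
    u = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    d = np.cos(np.pi * (2 * x + 1) * u / (2 * n)) * np.sqrt(2.0 / n)
    d[0, :] /= np.sqrt(2.0)
    return d

def jpeg_like_encode_block(block, q_step=16):
    """Intra coding of one 8x8 block: DCT -> uniform quantization -> RLE."""
    D = dct_matrix()
    coeffs = D @ (block - 128.0) @ D.T          # level-shift, then 2-D DCT
    q = np.round(coeffs / q_step).astype(int)   # many coefficients become 0
    # Run-length code: (run_of_zeros, level) pairs, then an end-of-block marker
    pairs, run = [], 0
    for level in q.flatten():
        if level == 0:
            run += 1
        else:
            pairs.append((run, int(level)))
            run = 0
    pairs.append("EOB")
    return pairs

flat = np.full((8, 8), 130.0)        # near-uniform block: only the DC survives
print(jpeg_like_encode_block(flat))  # [(0, 1), 'EOB']
```

A nearly flat block collapses to a single DC run-level pair plus the end-of-block code, which is exactly where the compression comes from.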
`
`JPEG has extensions for lossless and progressive coding. Unlike most of the video compression standards,
`JPEG supports a variety of color spaces including RGB and YCrCb.
`
`JPEG2000
`
`JPEG2000 is a new still image coding standard from the ISO that was adopted in December 2000 [2]. It was
`targeted at many of the same applications as JPEG including high-quality digital still cameras, hard copy
`devices and Internet picture applications. The primary goals were to provide improved compression along
`with more seamless quality and resolution scalability.
`
JPEG2000 achieves its improvements in resolution and bitrate scalability through several functions
that are not used by the JPEG, MPEG, and H.26x standards.
`
Discrete Wavelet Transform: The wavelet transform replaces the DCT to achieve higher
compression and improve support for scalable transmission. Wavelets are basis functions unlike the
usual cosines (DCT) and sines (FFT); they are called wavelets because they look like small waves. They
have an excellent ability to represent both stationary and transient phenomena with few coefficients.
Wavelets represent signals as a linear summation of shifted and scaled versions of a basic wave.
`JPEG2000 is coded in frequency sub-bands using the wavelet transform to allow resolution scalability. The
`same bitstream can be decoded at different resolutions. Also, a thumbnail can be sent providing excellent
`quality at lower resolution and the resolution can be gradually increased as more sub-bands are received. This
`structure also helps improve error resilience for wireless and Internet applications.
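The sub-band structure behind this resolution scalability can be illustrated with the simplest wavelet, the Haar transform. JPEG2000 itself uses the 5/3 and 9/7 filter banks, so this is only a sketch of how a half-resolution approximation (the LL band) falls out of one decomposition level:

```python
import numpy as np

def haar_2d(img):
    """One level of a 2-D Haar wavelet decomposition.

    Returns four sub-bands (LL, LH, HL, HH). LL is a half-resolution
    approximation of the input -- a ready-made "thumbnail" -- while the
    other bands hold the detail needed to restore full resolution.
    """
    # Horizontal pass: average/difference of adjacent column pairs
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Vertical pass: repeat on each half
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

img = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar_2d(img)   # ll is a 2x2 approximation of the 4x4 input
```

Repeating the decomposition on the LL band yields the hierarchy of resolutions that lets the same bitstream be decoded at different sizes.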
`Bit Plane Coding: The quantized sub-bands from the wavelet transform are divided into code blocks. Code
`blocks are entropy coded along bit planes using a combination of a bit plane coder and binary arithmetic
`coding. In JPEG2000, embedded block coding with optimized truncation (EBCOT) is used to implement bit
`plane coding. The algorithm uses symmetries and redundancies within and across the bit planes. The bit
`plane coding structure can be used to offer bitrate scalability since increasing detail can be added as more bit
`planes are decoded. Also, different quality bitstreams can be decoded at the same resolution, depending on
`the client’s bandwidth without having to re-encode separately for each client.
`
`
`Binary Arithmetic Coding: The bit plane coding outputs are entropy coded using binary arithmetic coding
`to generate the bitstream. Binary coding allows more flexibility than Huffman coding because symbols don’t
`have to be represented by an integer number of bits. The JPEG2000 arithmetic coder uses predetermined
`probability values and the adaptation state machine is also supplied by the standard.
`
JPEG2000 can sustain much better quality than JPEG at high compression ratios because the wavelet
transform degrades more gracefully. As the compression ratio decreases, the gap narrows between JPEG and
JPEG2000. For excellent quality, the wavelet transform yields about 30% more compression (e.g., 13:1)
than JPEG (10:1). The wavelet transform is more computationally intensive than the DCT, but it is really
the bit-plane coding and binary arithmetic coding functions that add most of the complexity to JPEG2000.
`
`JPEG2000 includes both lossy and lossless compression modes. Motion JPEG2000 support is also being
`defined in the standard. One of the potential uses of motion JPEG2000 is for applications such as digital
`cinema requiring some compression but with the primary focus on highest video quality. JPEG2000 can
`support pixel depths greater than 8-bit such as 10 or 12-bits/pixel and is flexible in terms of the color space.
`
`The widespread benefit of wavelets for low-bit rate video compression is still not clear. Motion estimation
`typically works best on small blocks. However, dividing images into small blocks degrades the wavelet
`performance. This makes it difficult to apply motion compensation using wavelets in the spatial domain.
Meanwhile, wavelets are not shift-invariant, so shifted versions of an image result in a totally new
representation. This creates problems for recognizing shifted versions of objects in the transform
domain. Since the transform is no longer block based, research is ongoing to find efficient ways to exploit
motion in the hierarchical or sub-band domain. There have been some proprietary video codecs that use
wavelets for I pictures and DCT for P and B pictures. However, the resulting bitrates are not comparable with
today's advanced media codecs such as H.264.
`
`H.261
`
`H.261 defined by the ITU was the first major video compression standard [3]. It was targeted for video
`conferencing applications and was originally referred to as Px64 since it was designed for use with ISDN
networks that supported multiples of 64 kbps. As seen in Figure 3, H.261 and the video compression
standards that followed use similar core functions to JPEG, such as the block-based DCT, but add features to
exploit the temporal redundancy, or commonality, from one image to the next in typical video content. For
example, background areas often stay the same from one frame to the next and do not need to be
retransmitted each frame. Video compression algorithms typically encode the differences between
neighboring frames instead of actual pixel values. Key features added in H.261 over JPEG include:
`
`Inter-frame Coding: Coding uses both spatial redundancy and temporal redundancy to achieve higher
`compression.
P Frames: Frames are coded using data from the previous decoded frame to predict the content of the new frame,
exploiting any remaining spatial redundancies within the video frame by coding the residual blocks, i.e.,
the difference between the original blocks and the corresponding predicted blocks, using DCT, quantization,
and entropy coding.
`Motion Estimation: Used in the encoder to account for motion between the reference frame and the frame
`being coded to allow best possible prediction. This is usually the most performance intensive function in
`video compression and is part of why video encoders typically require much more processing than the
`corresponding video decoder.
`Motion Compensation: Process in the decoder of bringing in the predicted data from the reference frame
`accounting for the motion identified by the motion estimation in the encoder.
Fixed Quantization: Unlike the JPEG and MPEG standards, H.261 and H.263 use fixed linear quantization
across all the AC coefficients.
Loop Filtering: A 2-D separable [1 2 1] filter is used to smooth out quantization effects in the reference frame.
It must be applied in bit-exact fashion by both the encoder and the decoder.
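The motion estimation step described above can be sketched as a brute-force block search. Real encoders use faster hierarchical or predictive searches, so this exhaustive sum-of-absolute-differences (SAD) search is purely illustrative, but it shows why encoding costs far more than decoding:

```python
import numpy as np

def full_search(ref, cur, bx, by, bsize=16, search=7):
    """Exhaustive block-matching motion estimation.

    Finds the motion vector (dx, dy) in [-search, search] minimizing the
    sum of absolute differences (SAD) between the current block at
    (bx, by) and a candidate block in the reference frame.
    """
    block = cur[by:by + bsize, bx:bx + bsize]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            cand = ref[y:y + bsize, x:x + bsize]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad

# Shift a random frame left by 3 pixels; the search should recover dx = 3
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64))
cur = np.roll(ref, -3, axis=1)   # content at x+3 in ref appears at x in cur
mv, sad = full_search(ref, cur, 16, 16)
```

The cost grows with the square of the search range, which is why the wider ranges of later standards (e.g., MPEG-2) drive encoder complexity up so quickly.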
`
`
`
`Figure 3: Inter-frame Video Compression Block Diagram
`
`Due to its focus on two-way video, H.261 includes only techniques that do not introduce major added delay
`in the compression and decompression process. In general, the encoder is designed to avoid extra complexity
`since most applications require simultaneous real-time encoding and decoding.
`
`H.263
`
`H.263 was developed after H.261 with a focus on enabling better quality at even lower bitrates [4]. One of
`the major original targets was video over ordinary telephone modems that ran at 28.8 Kbps at the time. The
`target resolution was from SQCIF (128x96) to CIF. The basic algorithms are similar to H.261 but with the
`following new features:
`
`
`Motion Estimation: Support for half-pel motion vectors and larger search range. Several annexes with
`optional features including 4 motion vectors, overlapped motion compensation, and unrestricted motion
`vectors.
3D VLC: Huffman coding that combines an end-of-block (EOB) indicator with each run-level
pair. This feature specifically targets low bitrates, where there are often only one or two coded
coefficients.
`
Adequate quality over ordinary phone lines proved very difficult to achieve, and videophones over standard
modems are still a challenge today. Because H.263 generally offered improved efficiency over H.261, it
became the preferred algorithm for video conferencing, with H.261 support still required for
compatibility with older systems. H.263 grew over time as H.263+ and H.263++ added optional annexes
supporting compression improvements and features for robustness over packet networks. For this reason, it
has found some use in networked security applications as an alternative to Motion JPEG. H.263 and its
annexes formed the core for many of the coding tools in MPEG-4.
`
`MPEG-1
`
`MPEG-1 was the first video compression algorithm developed by the ISO [5]. The driving application was
`storage and retrieval of moving pictures and audio on digital media such as video CDs using SIF resolution
`(352x240) at 30 fps. The targeted output bitrate was 1.15 Mbps, which produces effectively 25:1
`compression. MPEG-1 is similar to H.261 but encoders typically require more performance to support the
`heavier motion found in movie content versus typical video telephony.
`
Major new tools in MPEG-1 included:

B Frames: Individual macroblocks can be coded using forward, backward, or bi-directional prediction as
`illustrated in Figure 4. An example of the benefit is the ability to match a background area that was occluded
`in the previous frame using forward prediction. Bi-directional prediction can allow for decreased noise by
`averaging both forward and backward prediction. Leveraging this feature in encoders requires additional
`processing since motion estimation has to be performed for both forward and backward prediction which can
`effectively double the motion estimation computational requirements. B frame tools require a more complex
`data flow since frames are decoded out of order with respect to how they are captured and need to be
displayed. This feature results in increased latency and thus is not suitable for some applications. B frames
are not themselves used for prediction, so tradeoffs can be made in some applications. For example, they can be skipped
in low-frame-rate applications without impacting the decoding of future I and P frames.
`Adaptive Perceptual Quantization: A quantization scale factor is applied specific to each frequency bin to
`optimize for human visual perception.
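The out-of-order data flow that B frames impose can be sketched with a small reordering helper (hypothetical, not part of any standard): each B frame's future anchor (I or P) must be emitted and decoded first.

```python
def encode_order(display_types):
    """Reorder a display-order frame-type pattern into coding/decoding order.

    B frames need their *future* reference decoded first, so each anchor
    (I or P) is emitted ahead of the B frames that precede it in display
    order. Returned values are display-order indices.
    """
    order, pending_b = [], []
    for i, t in enumerate(display_types):
        if t == "B":
            pending_b.append(i)       # hold until the next anchor is coded
        else:
            order.append(i)           # anchor goes out first...
            order.extend(pending_b)   # ...then the B frames it back-predicts
            pending_b = []
    order.extend(pending_b)
    return order

# Display order I B B P B B P  ->  decode order I P B B P B B
print(encode_order(["I", "B", "B", "P", "B", "B", "P"]))  # [0, 3, 1, 2, 6, 4, 5]
```

The gap between decode order and display order is exactly the extra latency (and frame buffering) the text attributes to B frames.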
`
`Broadcasters and content providers were generally not happy with the quality that could be achieved at the
`target bitrates for MPEG-1 and so an effort was started on a new standard that would support higher
`resolution video using higher bitrates.
`
`
Intra (I) Frame Coding
– Frame is coded based on spatial redundancy only
– No dependence on surrounding frames
P (Predicted) Frame Coding
– Frame is coded using prediction from prior encoded I or P frame(s)
B (Bi-directionally) Predicted Frame
– Frame is coded with bi-directional (forward and backward) prediction
– B frames are never used for prediction

[Figure 4 illustrates these coding types on the display-order sequence I B B P B B P.]

Figure 4: Frame Coding Types
`
`MPEG-2
`
`MPEG-2 was developed targeting digital television and soon became the most successful video compression
`algorithm so far [6]. It supports standard television resolutions including interlaced 720x480 at 60 fields per
`second for NTSC used in the US & Japan and interlaced 720x576 at 50 fields per second for PAL used in
`Europe.
`
`MPEG-2 built on MPEG-1 with extensions to support interlaced video and also much wider motion
`compensation ranges. Encoders taking full advantage of the wider search range and the higher resolution
`require significantly more processing than H.261 & MPEG-1.
`
`Interlaced Coding Tools: Includes ability to optimize the motion estimation supporting both field and frame
`based predictions and support for both field and frame based DCT/IDCT.
`Wide Search Range: Due to the target for higher resolution video, MPEG-2 supports vastly wider search
`ranges than MPEG-1. This greatly increases the performance requirement for motion estimation versus the
`earlier standards.
`
`MPEG-2 performs well at compression ratios around 30:1. The quality achieved with MPEG-2 at 4-8 Mbps
`was found to be acceptable for consumer video applications and it soon became deployed in applications
`including digital satellite, digital cable, DVDs, and now high-definition TV.
`
`The processing requirements for MPEG-2 decoding were initially very high for general-purpose processors
`and even DSPs. Optimized fixed function MPEG-2 decoders were developed and have become inexpensive
`over time due to the high volumes. Availability of cost-effective silicon solutions is a key ingredient for the
`success and deployment of video codec standards.
`
`MPEG-4
`
`MPEG-4 was initiated by the ISO as a follow-on to the success of MPEG-2 [7]. Some of the early objectives
`were increased error robustness supporting wireless networks, better support for low bitrate applications, and
a variety of new tools to support merging graphic objects with video. Most of the graphics features have not
yet gained significant traction in products, and implementations have focused primarily on the improved low-
bitrate compression and error resiliency.
`
`MPEG-4 simple profile (SP) starts from H.263 baseline and adds new tools for improved compression:
`
`Unrestricted Motion Vectors: Supports prediction for objects when they partially move outside of the
`boundaries of the frame.
`Variable Block Size Motion Compensation: Allows motion compensation at either 16x16 or 8x8 block
`granularity.
`Intra DCT DC/AC Prediction: Allows the DC/AC coefficients to be predicted from neighboring blocks
`either to the left or above the current block.
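The DC part of this prediction can be sketched as follows. The gradient rule is modeled on MPEG-4 Part 2, but edge handling (missing neighbors) and the companion AC prediction are omitted, so treat this as an illustration rather than a conformant implementation:

```python
def predict_intra_dc(dc_left, dc_above_left, dc_above):
    """Choose the DC predictor for an intra block, MPEG-4 style (sketch).

    If the DC changes less between the left and above-left neighbors than
    between the above-left and above neighbors, predict from the block
    above; otherwise predict from the block on the left. Only the (small)
    difference from the predictor then needs to be coded.
    """
    if abs(dc_left - dc_above_left) < abs(dc_above_left - dc_above):
        return dc_above
    return dc_left

# Only the residual (actual DC minus predictor) is coded
residual = 62 - predict_intra_dc(100, 100, 60)   # predictor 60, residual 2
```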
`
`Error resiliency features added to support recovery for packet loss include:
`
`Slice Resynchronization: Establishes slices within images that allow quicker resynchronization after an
`error has occurred. The standard removes data dependencies between slices to allow error-free decoding at
`the start of the slice regardless of the error that occurred in the previous slice.
Data Partitioning: A mode that allows partitioning the data within a video packet into a motion part and a
DCT data part, separated by a unique motion boundary marker. This allows more stringent checks
on the validity of motion vector data and gives better visibility into the point at which an error
occurred, so that all the motion data need not be discarded when an error is found.
`Reversible VLC: VLC code tables designed to allow decoding them backwards as well as forwards. When
`an error is encountered, it is possible to sync at the next slice or start code and work back to the point where
`the error occurred.
`
`The MPEG-4 advanced simple profile (ASP) starts from the simple profile and adds B frames and interlaced
`tools similar to MPEG-2. It also adds quarter-pixel motion estimation and an option for global motion
`compensation. MPEG-4 advanced simple profile requires significantly more processing performance than
`the simple profile and has higher complexity and coding efficiency than MPEG-2.
`
MPEG-4 was initially used primarily in Internet streaming and was adopted, for example, by Apple's
QuickTime player. MPEG-4 simple profile is now finding widespread application in emerging cellular
media phones. MPEG-4 ASP forms the foundation for a popular proprietary implementation called
DivX.
`
`
`H.264/MPEG-4 AVC
`
A major breakthrough is now happening with the introduction of a new standard jointly promoted by the ITU
and ISO [8,9]. H.264/MPEG-4 AVC delivers a significant advance in compression efficiency, generally
achieving around a 2x reduction in bit rate versus MPEG-2 and MPEG-4 simple profile. In formal tests
conducted by the JVT, H.264 delivered a coding efficiency improvement of 1.5x or greater in 78% of the 85
test cases, with 77% of those showing improvements of 2x or greater, and as high as 4x in some cases [10].
`
`This new standard has been referred to by many different names as it evolved. The ITU began work on
`H.26L (for long term) in 1997 using major new coding tools. The results were impressive and the ISO
`decided to work with the ITU to adopt a common standard under a Joint Video Team. For this reason, you
`sometimes hear people refer to the standard as JVT even though this is not the formal name. The ITU
`approved the new H.264 standard in May 2003. The ISO approved the standard in October of 2003 as
`MPEG-4 Part 10, Advanced Video Coding or AVC.
`
The 2x improvement offered by H.264 creates new market opportunities such as the following:
– VHS-quality video at about 600 Kbps, which can enable video delivery on demand over ADSL lines.
– An HD movie can fit on one ordinary DVD instead of requiring new laser optics.
`
[Figure 5 shows the H.264 encoder: the video source is predicted either by intra prediction (9 4x4 and 4 16x16 modes, 13 modes total) or by motion-compensated inter prediction (seven block sizes and shapes, multiple reference picture selection, 1/4-pel motion estimation accuracy, under coding control). The residual passes through a fixed 4x4 integer transform and quantization (step sizes increased at a compounding rate of approximately 12.5%), and the quantized transform coefficients and motion vectors are entropy coded (single Universal VLC and Context-Adaptive VLC, or Context-Based Adaptive Binary Arithmetic Coding) to produce the output bitstream. The prediction loop closes through inverse quantization, the inverse transform (no mismatch), a signal-adaptive de-blocking loop filter, and the frame store.]

Figure 5: H.264 Block Diagram and Key Features
`
`While H.264 uses the same general coding techniques as previous standards, it has many new features that
`distinguish it from previous standards and combine to enable improved coding efficiency. The main
`differences are summarized in the encoder block diagram in Figure 5 and described briefly below:
`
Intra Prediction and Coding: When using intra coding, intra prediction attempts to predict the current
block from neighboring pixels in adjacent blocks along a defined set of directions. The difference between
the block and the best resulting prediction is then coded, rather than the actual block. This results in a significant
improvement in intra coding efficiency.
`Inter Prediction and Coding: Inter-frame coding in H.264 leverages most of the key features in earlier
`standards and adds both flexibility and functionality including various block sizes for motion compensation,
`quarter-pel motion compensation, multiple-reference frames, and adaptive loop deblocking.
Block sizes: Motion compensation can be performed using a number of different block sizes. Individual
motion vectors can be transmitted for blocks as small as 4x4, so up to 32 motion vectors may be transmitted
for a single macroblock in the case of bi-directional prediction. Block sizes of 16x8, 8x16, 8x8, 8x4, and 4x8
are also supported. The option for smaller motion compensation blocks improves the ability to handle fine motion
detail and results in better subjective quality, including the absence of large blocking artifacts.
`Quarter-Pel Motion Estimation: Motion compensation is improved by allowing half-pel and quarter-pel
`motion vector resolution.
`Multiple Reference Picture Selection: Up to five different reference frames can be used for inter-picture
`coding resulting in better subjective video quality and more efficient coding. Providing multiple reference
`frames can also help make the H.264 bitstream more error resilient. Note that this feature leads to increased
`memory requirement for both the encoder and the decoder since multiple reference frames must be
`maintained in memory.
`Adaptive Loop Deblocking Filter: H.264 uses an adaptive deblocking filter that operates on the horizontal
`and vertical block edges within the prediction loop to remove artifacts caused by block prediction errors. The
`filtering is generally based on 4x4 block boundaries, in which two pixels on either side of the boundary may
`be updated using a 3-tap filter. The rules for applying the loop deblocking filter are intricate and quite
`complex.
`Integer Transform: H.264 employs a purely integer 4x4 spatial transform which is an approximation of the
`DCT instead of a floating-point 8x8 DCT. Previous standards had to define rounding-error tolerances for
`fixed point implementations of the inverse transform. Drift caused by mismatches in the IDCT precision
`between the encoder and decoder were a source of quality loss. The small 4x4 shape helps reduce blocking
`and ringing artifacts.
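The forward core transform can be written out directly. The matrix below is the well-known H.264 4x4 forward transform; the scaling that makes it approximate the DCT is folded into quantization and is not shown here:

```python
import numpy as np

# H.264's 4x4 forward "core" transform matrix: an integer approximation
# of the DCT (the non-integer normalization is absorbed by quantization).
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def h264_core_transform(block4x4):
    """Y = Cf . X . Cf^T, computed exactly in integer arithmetic.

    Because every entry is a small integer, encoder and decoder compute
    bit-identical results -- eliminating the IDCT rounding-error drift
    that earlier standards had to bound with mismatch tolerances.
    """
    x = np.asarray(block4x4, dtype=np.int64)
    return CF @ x @ CF.T

flat = np.full((4, 4), 10)
y = h264_core_transform(flat)   # only the DC term is non-zero
```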
`Quantization and Transform Coefficient Scanning: Transform coefficients are quantized using scalar
`quantization with no widened dead-zone. Thirty-two different quantization step sizes can be chosen on a
`macroblock basis similar to prior standards but the step sizes are increased at a compounding rate of
`approximately 12.5%, rather than by a constant increment. The fidelity of chrominance components is
`improved by using finer quantization step sizes compared to luminance coefficients, particularly when the
`luminance coefficients are coarsely quantized.
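The compounding step-size rule can be sketched as follows. The base step below is an illustrative value, not taken from the standard's tables, but the growth rate shows the key property: the step roughly doubles every 6 QP increments (1.125^6 ≈ 2.03).

```python
def quant_step(qp, base=0.625):
    """Quantization step size growing ~12.5% per QP increment (sketch).

    H.264 tabulates exact step sizes; the compounding 1.125 factor here
    reproduces the doubling-every-6-steps behavior of the real design.
    """
    return base * 1.125 ** qp

# Step size roughly doubles every 6 QP increments
ratio = quant_step(12) / quant_step(6)   # about 2.03
```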
`Entropy Coding: The baseline profile uses a Universal VLC (UVLC)/Context Adaptive VLC (CAVLC)
`combination and the main profile also supports a new Context-Adaptive Binary Arithmetic Coder (CABAC).
`UVLC/CAVLC: Unlike previous standards that offered a number of static VLC tables depending on the
`type of data under consideration, H.264 uses a Context-Adaptive VLC for the transform coefficients and a
`single Universal VLC approach for all the other symbols. The CAVLC is superior to previous VLC
`implementations but without the full cost of CABAC.
`Context-Based Adaptive Binary Arithmetic Coding (CABAC): Arithmetic coding uses a probability
`model to encode and decode the syntax elements such as transform coefficients and motion vectors. To
`increase the coding efficiency of arithmetic coding, the underlying probability model is adapted to the
`changing statistics within a video frame, through a process called context modeling. Context modeling
`provides estimates of conditional probabilities of the coding symbols. Utilizing suitable context models, the
`
`given inter-symbol redundancy can be exploited by switching between different probability models,
`according to already coded symbols in the neighborhood of the current symbol. Each syntax element
`maintains a different model (for example, motion vectors and transform coefficients have different models).
`CABAC can provide up to about 10% bitrate improvement over UVLC/CAVLC.
`
H.264 supports three profiles: baseline, main, and extended. Currently the baseline and main profiles
are generating the most interest. The baseline profile requires less computation and system memory and is
optimized for low latency. It does not include B frames, due to their inherent latency, or CABAC, due to its
computational complexity. The baseline profile is a good match for video telephony applications as well as
other applications that require cost-effective real-time encoding [11].
`
The main profile provides the highest compression but requires significantly more processing than the
baseline profile, making it less suitable for low-cost real-time encoding and for low-latency applications.
Broadcast and content storage applications are primarily interested in the main profile to achieve the highest
possible video quality at the lowest bitrate [12].
`
`Windows Media Video 9 Series
`
`W