The JPEG Still Picture Compression Standard

Gregory K. Wallace
`Multimedia Engineering
`Digital Equipment Corporation
`Maynard, Massachusetts
`
`Submitted in December 1991 for publication in IEEE Transactions on Consumer Electronics
`
`This paper is a revised version of an article by the same
`title and author which appeared in the April 1991 issue
`of Communications of the ACM.
`
`Abstract
`For the past few years, a joint ISO/CCITT committee
`known as JPEG (Joint Photographic Experts Group)
`has been working to establish the first international
`compression standard for continuous-tone still images,
`both grayscale and color. JPEG’s proposed standard
`aims to be generic, to support a wide variety of
`applications for continuous-tone images. To meet the
`differing needs of many applications,
`the JPEG
`standard includes two basic compression methods, each
`with various modes of operation. A DCT-based method
`is specified for “lossy’’ compression, and a predictive
`method for “lossless’’ compression. JPEG features a
`simple lossy technique known as the Baseline method,
`a subset of the other DCT-based modes of operation.
`The Baseline method has been by far the most widely
`implemented JPEG method to date, and is sufficient in
`its own right for a large number of applications. This
`article provides an overview of the JPEG standard, and
`focuses in detail on the Baseline method.
`
`1 Introduction
`Advances over the past decade in many aspects of
`digital technology - especially devices for image
`acquisition, data storage, and bitmapped printing and
`display - have brought about many applications of
`digital imaging. However, these applications tend to be
`specialized due to their relatively high cost. With the
`possible exception of facsimile, digital images are not
`commonplace in general-purpose computing systems
`the way text and geometric graphics are. The majority
`of modern business and consumer usage of photographs
`and other types of images takes place through more
`traditional analog means.
`
`The key obstacle for many applications is the vast
`amount of data required to represent a digital image
`directly. A digitized version of a single, color picture
`at TV resolution contains on the order of one million
`bytes; 35mm resolution requires ten times that amount.
`Use of digital images often is not viable due to high
`storage or transmission costs, even when image capture
`and display devices are quite affordable.
`
Modern image compression technology offers a
possible solution. State-of-the-art techniques can
compress typical images from 1/10 to 1/50 their
uncompressed size without visibly affecting image
`quality. But compression technology alone is not
`sufficient. For digital image applications involving
`storage or transmission to become widespread in
`today’s marketplace, a standard image compression
method is needed to enable interoperability of
`equipment from different manufacturers. The CCITT
`recommendation for today’s ubiquitous Group 3 fax
`machines [17] is a dramatic example of how a standard
`compression method can enable an important image
`application. The Group 3 method, however, deals with
`bilevel images only and does not address photographic
`image compression.
`
`For the past few years, a standardization effort known
`by the acronym JPEG, for Joint Photographic Experts
`Group, has been working toward establishing the first
`international digital image compression standard for
continuous-tone (multilevel) still images, both
grayscale and color. The “joint” in JPEG refers to a
collaboration between CCITT and ISO. JPEG
convenes officially as the ISO committee designated
JTC1/SC2/WG10, but operates in close informal
collaboration with CCITT SGVIII. JPEG will be both
an ISO Standard and a CCITT Recommendation. The
text of both will be identical.
`
`Photovideotex, desktop publishing, graphic arts, color
`facsimile, newspaper wirephoto transmission, medical
imaging, and many other continuous-tone image
applications require a compression standard in order to
`
`1
`
`LG 1021
`
`
`
`develop significantly beyond their present state. JPEG
`has undertaken the ambitious task of developing a
`general-purpose compression standard to meet the
`needs of almost all continuous-tone still-image
`applications.
`
`If this goal proves attainable, not only will individual
`applications flourish, but exchange of images across
`application boundaries will be facilitated. This latter
`feature will become increasingly important as more
`image applications are implemented on general-purpose
`computing systems, which are themselves becoming
`increasingly interoperable and internetworked. For
applications which require specialized VLSI to meet
their compression and decompression speed
requirements, a common method will provide
economies of scale not possible within a single
application.
`
`This article gives an overview of JPEG’s proposed
`image-compression standard. Readers without prior
`knowledge of JPEG or compression based on the
`Discrete Cosine Transform (DCT) are encouraged to
`study first the detailed description of the Baseline
`sequential codec, which is the basis for all of the
`DCT-based decoders. While this article provides many
`details, many more are necessarily omitted. The reader
`should refer to the ISO draft standard [2] before
`attempting implementation.
`
`Some of the earliest industry attention to the JPEG
`proposal has been focused on the Baseline sequential
`codec as a motion image compression method - of the
`‘‘intraframe’’ class, where each frame is encoded as a
`separate image. This class of motion image coding,
`while providing less compression than ‘‘interframe’’
`methods like MPEG, has greater flexibility for video
`editing. While this paper focuses only on JPEG as a
`still picture standard (as ISO intended), it is interesting
`to note that JPEG is likely to become a ‘‘de facto’’
`intraframe motion standard as well.
`
2 Background: Requirements and Selection Process
`JPEG’s goal has been to develop a method for
`continuous-tone image compression which meets the
`following requirements:
`
`1) be at or near the state of the art with regard to
compression rate and accompanying image
fidelity, over a wide range of image quality ratings,
and especially in the range where visual fidelity to
the original is characterized as “very good” to
“excellent”; also, the encoder should be
`parameterizable, so that the application (or user)
`can set the desired compression/quality tradeoff;
`
2) be applicable to practically any kind of
continuous-tone digital source image (i.e. for most
practical purposes not be restricted to images of
certain dimensions, color spaces, pixel aspect
ratios, etc.) and not be limited to classes of imagery
with restrictions on scene content, such as
complexity, range of colors, or statistical
properties;
`
`3) have tractable computational complexity, to make
`feasible software implementations with viable
`performance on a range of CPU’s, as well as
`hardware implementations with viable cost for
`applications requiring high performance;
`
`4)
`
` have the following modes of operation:
`
`•
`
`•
`
`Sequential encoding: each image component is
`encoded in a single left-to-right, top-to-bottom
`scan;
`
`Progressive encoding: the image is encoded in
`multiple scans for applications
`in which
`transmission time is long, and the viewer
`prefers to watch the image build up in multiple
`coarse-to-clear passes;
`
`•
`
`Lossless encoding: the image is encoded to
`guarantee exact recovery of every source
`image sample value (even though the result is
`low compression compared
`to
`the
`lossy
`modes);
`• Hierarchical encoding: the image is encoded at
`multiple resolutions so that lower-resolution
`versions may be accessed without first having
`to decompress the image at its full resolution.
`
`In June 1987, JPEG conducted a selection process
`based on a blind assessment of subjective picture
`quality, and narrowed 12 proposed methods to three.
`Three informal working groups formed to refine them,
`and in January 1988, a second, more rigorous selection
`process [19] revealed that the “ADCT” proposal [11],
`based on the 8x8 DCT, had produced the best picture
`quality.
`
`At the time of its selection, the DCT-based method was
`only partially defined for some of the modes of
`operation. From 1988 through 1990, JPEG undertook
`the sizable task of defining, documenting, simulating,
`testing, validating, and simply agreeing on the plethora
`of details necessary for genuine interoperability and
`universality. Further history of the JPEG effort is
`contained in [6, 7, 9, 18].
`
`2
`
`
`
For the hierarchical mode of operation, the steps shown
are used as building blocks within a larger framework.
`
`4.1 8x8 FDCT and IDCT
`
`At the input to the encoder, source image samples are
grouped into 8x8 blocks, shifted from unsigned integers
with range [0, 2^P − 1] to signed integers with range
[−2^(P−1), 2^(P−1) − 1], and input to the Forward DCT (FDCT).
`At the output from the decoder, the Inverse DCT
`(IDCT) outputs 8x8 sample blocks to form the
`reconstructed image. The following equations are the
`idealized mathematical definitions of the 8x8 FDCT
`and 8x8 IDCT:
`
`) *
`
`) *
`
`= ; 0
`
`(1)
`
`(2)
`
`,
`
`(
`
`
`
`7
`
`7
`
`
`[
`)
` =0
` =0
`16 ]
`
` (2 +1)
`
`
`
`
`
`
`7
`
`
`
`(
`
`)
`
`(
`
`,
`
`=0
`
`) (
`16 ]
`
`
`
`
`2
`
`) = 1
` otherwise.
`
`,
`
`) = 1
`4
`
`(
`
`)
`
`(
`
` (
`
`
`
`
`4
`[
`) = 1
`
`16
`
`7
`
`=0
`
`,
`
`(
`
`
` (2 +1)
`
`16
`
`where:
`
`(
`
`),
`
`
`
`(
`
`),
`
`(
`
`(
`
`) = 1
`
`for
`
`,
`
The DCT is related to the Discrete Fourier Transform
(DFT). Some simple intuition for DCT-based
compression can be obtained by viewing the FDCT as a
harmonic analyzer and the IDCT as a harmonic
synthesizer. Each 8x8 block of source image samples
is effectively a 64-point discrete signal which is a
function of the two spatial dimensions x and y. The
FDCT takes such a signal as its input and decomposes
it into 64 orthogonal basis signals. Each contains one
of the 64 unique two-dimensional (2D) “spatial
frequencies’’ which comprise the input signal’s
“spectrum.” The output of the FDCT is the set of 64
basis-signal amplitudes or “DCT coefficients” whose
values are uniquely determined by the particular
64-point input signal.
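As an illustration of this step and of Equation (1), the level shift and the FDCT can be evaluated directly. This is a slow O(N^4) reference, not one of the fast algorithms discussed below, and the function names are mine, not part of the standard:

```python
import numpy as np

def level_shift(block, precision=8):
    """Shift an 8x8 block of unsigned samples in [0, 2^P - 1] to the
    signed range expected by the FDCT (subtract 128 when P = 8)."""
    return block.astype(np.int32) - (1 << (precision - 1))

def fdct_8x8(block):
    """Direct evaluation of the 8x8 FDCT of Equation (1) for a block
    of level-shifted samples f(x, y). Reference code only."""
    F = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            cu = 1.0 / np.sqrt(2.0) if u == 0 else 1.0
            cv = 1.0 / np.sqrt(2.0) if v == 0 else 1.0
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x, y]
                          * np.cos((2 * x + 1) * u * np.pi / 16)
                          * np.cos((2 * y + 1) * v * np.pi / 16))
            F[u, v] = 0.25 * cu * cv * s
    return F
```

A flat block maps to a single nonzero DC coefficient; the IDCT of Equation (2) is the mirror-image double sum over u and v, with C(u)C(v) moved inside the summation.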
`
`The DCT coefficient values can thus be regarded as the
`relative amount of the 2D spatial frequencies contained
`in the 64-point input signal. The coefficient with zero
`frequency in both dimensions is called the “DC
`coefficient” and the remaining 63 coefficients are
`called the “AC coefficients.’’ Because sample values
`
`3
`
`3 Architecture of the Proposed Standard
`The proposed standard contains the four “modes of
`operation” identified previously. For each mode, one
`or more distinct codecs are specified. Codecs within a
`mode differ according to the precision of source image
`samples they can handle or the entropy coding method
`they use. Although the word codec (encoder/decoder)
`is used frequently in this article, there is no requirement
`that implementations must include both an encoder and
`a decoder. Many applications will have systems or
`devices which require only one or the other.
`
`The four modes of operation and their various codecs
`have resulted from JPEG’s goal of being generic and
`from the diversity of image formats across applications.
`The multiple pieces can give the impression of
`undesirable complexity, but they should actually be
`regarded as a comprehensive “toolkit” which can span a
`wide range of continuous-tone image applications. It is
`unlikely that many implementations will utilize every
`tool -- indeed, most of the early implementations now
`on the market (even before final ISO approval) have
`implemented only the Baseline sequential codec.
`
`The Baseline sequential codec is inherently a rich and
`sophisticated compression method which will be
`sufficient for many applications. Getting this minimum
JPEG capability implemented properly and
interoperably will provide the industry with an
important initial capability for exchange of images
across vendors and applications.
`
`4 Processing Steps for DCT-Based Coding
`Figures 1 and 2 show the key processing steps which
`are the heart of the DCT-based modes of operation.
These figures illustrate the special case of
single-component (grayscale) image compression. The
reader can grasp the essentials of DCT-based
compression by thinking of it as essentially
`compression of a stream of 8x8 blocks of grayscale
`image samples. Color image compression can then be
`approximately regarded as compression of multiple
`grayscale images, which are either compressed entirely
`one at a time, or are compressed by alternately
`interleaving 8x8 sample blocks from each in turn.
`
`For DCT sequential-mode codecs, which include the
`Baseline sequential codec, the simplified diagrams
`indicate how single-component compression works in a
`fairly complete way. Each 8x8 block is input, makes
`its way through each processing step, and yields output
`in compressed form into the data stream. For DCT
`progressive-mode codecs, an image buffer exists prior
`to the entropy coding step, so that an image can be
stored and then parceled out in multiple scans with
successively improving quality.
`
[Block diagram: Source Image Data, in 8x8 blocks, flows through FDCT, Quantizer, and Entropy Encoder to become Compressed Image Data; Table Specifications feed the Quantizer and the Entropy Encoder.]

Figure 1. DCT-Based Encoder Processing Steps
`
[Block diagram: Compressed Image Data flows through Entropy Decoder, Dequantizer, and IDCT to become Reconstructed Image Data; Table Specifications feed the Entropy Decoder and the Dequantizer.]

Figure 2. DCT-Based Decoder Processing Steps
`
`typically vary slowly from point to point across an
`image, the FDCT processing step lays the foundation
`for achieving data compression by concentrating most
`of the signal in the lower spatial frequencies. For a
`typical 8x8 sample block from a typical source image,
`most of the spatial frequencies have zero or near-zero
`amplitude and need not be encoded.
`
`At the decoder the IDCT reverses this processing step.
It takes the 64 DCT coefficients (which at that point
have been quantized) and reconstructs a 64-point output
image signal by summing the basis signals.
Mathematically, the DCT is a one-to-one mapping for
64-point vectors between the image and the frequency
domains. If the FDCT and IDCT could be computed
with perfect accuracy and if the DCT coefficients were
not quantized as in the following description, the
original 64-point signal could be exactly recovered. In
principle, the DCT introduces no loss to the source
image samples; it merely transforms them to a domain
in which they can be more efficiently encoded.
`
Some properties of practical FDCT and IDCT
implementations raise the issue of what precisely
should be required by the JPEG standard. A
fundamental property is that the FDCT and IDCT
equations contain transcendental functions.
Consequently, no physical implementation can
compute them with perfect accuracy. Because of the
`DCT’s application importance and its relationship to
`the DFT, many different algorithms by which the
`
`FDCT and IDCT may be approximately computed have
`been devised [16]. Indeed, research in fast DCT
`algorithms is ongoing and no single algorithm is
`optimal for all implementations. What is optimal in
`software for a general-purpose CPU is unlikely to be
`optimal in firmware for a programmable DSP and is
`certain to be suboptimal for dedicated VLSI.
`
`Even in light of the finite precision of the DCT inputs
`and outputs, independently designed implementations
`of the very same FDCT or IDCT algorithm which differ
`even minutely in the precision by which they represent
`cosine terms or intermediate results, or in the way they
`sum and round fractional values, will eventually
`produce slightly different outputs from identical inputs.
`
`To preserve freedom for innovation and customization
within implementations, JPEG has chosen to specify
neither a unique FDCT algorithm nor a unique IDCT
algorithm in its proposed standard. This makes
compliance somewhat more difficult to confirm,
because two compliant encoders (or decoders)
generally will not produce identical outputs given
identical inputs. The JPEG standard will address this
issue by specifying an accuracy test as part of its
compliance tests for all DCT-based encoders and
decoders; this is to ensure against crudely inaccurate
cosine basis functions which would degrade image
quality.
`
`4
`
`
`
`This output value is normalized by the quantizer step
`size. Dequantization is the inverse function, which in
`this case means simply that the normalization is
`removed by multiplying by the step size, which returns
`the result to a representation appropriate for input to the
`IDCT:
`
`,
`
` )((
`
`) = (
`
`,
`
`)
`
`*
`
`(
`
`,
`
`)
`
`(4)
`
`When the aim is to compress the image as much as
`possible without visible artifacts, each step size ideally
`should be chosen as the perceptual threshold or “just
`noticeable difference” for the visual contribution of its
`corresponding cosine basis function. These thresholds
`are also functions of the source image characteristics,
`display characteristics and viewing distance. For
`applications in which these variables can be reasonably
`well defined, psychovisual experiments can be
`performed to determine the best thresholds. The
`experiment described in [12] has led to a set of
`Quantization Tables for CCIR-601 [4] images and
`displays. These have been used experimentally by
`JPEG members and will appear in the ISO standard as a
`matter of information, but not as a requirement.
`
`4.3 DC Coding and Zig-Zag Sequence
`
After quantization, the DC coefficient is treated
separately from the 63 AC coefficients. The DC
`coefficient is a measure of the average value of the 64
`image samples. Because there is usually strong
`correlation between the DC coefficients of adjacent 8x8
`blocks, the quantized DC coefficient is encoded as the
`difference from the DC term of the previous block in
`the encoding order (defined in the following), as shown
`in Figure 3. This special treatment is worthwhile, as
`DC coefficients frequently contain a significant fraction
`of the total image energy.
`
`For each DCT-based mode of operation, the JPEG
`proposal specifies separate codecs for images with 8-bit
`and 12-bit (per component) source image samples. The
12-bit codecs, needed to accommodate certain types of
medical and other images, require greater
computational resources to achieve the required FDCT
or IDCT accuracy. Images with other sample
precisions can usually be accommodated by either an
8-bit or 12-bit codec, but this must be done outside the
JPEG standard. For example, it would be the
responsibility of an application to decide how to fit or
pad a 6-bit sample into the 8-bit encoder’s input
interface, how to unpack it at the decoder’s output, and
how to encode any necessary related information.
`
`4.2 Quantization
`
`After output from the FDCT, each of the 64 DCT
`coefficients is uniformly quantized in conjunction with
`a 64-element Quantization Table, which must be
`specified by the application (or user) as an input to the
`encoder. Each element can be any integer value from 1
`to 255, which specifies the step size of the quantizer for
`its corresponding DCT coefficient. The purpose of
`quantization is to achieve further compression by
`representing DCT coefficients with no greater precision
`than is necessary to achieve the desired image quality.
`Stated another way, the goal of this processing step is
`to discard information which is not visually significant.
`Quantization is a many-to-one mapping, and therefore
`is fundamentally lossy. It is the principal source of
`lossiness in DCT-based encoders.
`
`Quantization is defined as division of each DCT
`coefficient by its corresponding quantizer step size,
`followed by rounding to the nearest integer:
`
` (
`
`(
`(
`
`,
`,
`
`)
`) )
`
`(3)
`
` %$
`
` #
`
`) =
`!
`
`,
`
` (
`
[Left: differential DC encoding, DIFF = DCi − DCi−1, computed between blocki−1 and blocki. Right: the zig-zag ordering of the 64 coefficients of a block, from DC through AC01 ... AC77.]

Figure 3. Preparation of Quantized Coefficients for Entropy Coding
`
`5
`
`
`
`&
`
`
`'
`
`
`
`
`
`
`*
`
`
`
`
`Finally, all of the quantized coefficients are ordered
`into the “zig-zag” sequence, also shown in Figure 3.
This ordering helps to facilitate entropy coding by
placing low-frequency coefficients (which are more
likely to be nonzero) before high-frequency
coefficients.
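Both preparation steps of Figure 3 can be sketched in a few lines (the names and the sort-based construction of the ordering are mine; the resulting sequence is the standard zig-zag):

```python
# Zig-zag ordering of Figure 3: coefficients sorted by diagonal u+v,
# alternating direction along each diagonal.
ZIGZAG = sorted(((u, v) for u in range(8) for v in range(8)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else p[1]))

def prepare_blocks(quantized_blocks):
    """For each 8x8 quantized block, replace the DC term by its
    difference from the previous block's DC and append the 63 AC
    terms in zig-zag order (the Figure 3 preparation)."""
    prev_dc = 0
    prepared = []
    for block in quantized_blocks:
        seq = [block[u][v] for u, v in ZIGZAG]
        prepared.append([seq[0] - prev_dc] + seq[1:])
        prev_dc = seq[0]
    return prepared
```

The first block's DC difference is taken against a predictor of zero, consistent with starting the differential chain at the beginning of a scan.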
`
`4.4 Entropy Coding
`
`The final DCT-based encoder processing step is
`entropy coding.
` This step achieves additional
`compression losslessly by encoding the quantized DCT
`coefficients more compactly based on their statistical
`characteristics. The JPEG proposal specifies two
`entropy coding methods - Huffman coding [8] and
`arithmetic coding [15]. The Baseline sequential codec
`uses Huffman coding, but codecs with both methods
`are specified for all modes of operation.
`
`It is useful to consider entropy coding as a 2-step
`process. The first step converts the zig-zag sequence of
`quantized coefficients into an intermediate sequence of
`symbols. The second step converts the symbols to a
`data stream in which the symbols no longer have
`externally identifiable boundaries. The form and
`definition of the intermediate symbols is dependent on
`both the DCT-based mode of operation and the entropy
`coding method.
`
`Huffman coding requires that one or more sets of
`Huffman code tables be specified by the application.
`The same tables used to compress an image are needed
`to decompress it. Huffman tables may be predefined
`and used within an application as defaults, or computed
specifically for a given image in an initial
statistics-gathering pass prior to compression. Such
choices are the business of the applications which use
JPEG; the JPEG proposal specifies no required
`Huffman tables. Huffman coding for the Baseline
`sequential encoder is described in detail in section 7.
`
`By contrast, the particular arithmetic coding method
`specified in the JPEG proposal [2] requires no tables to
`be externally input, because it is able to adapt to the
`image statistics as it encodes the image. (If desired,
`statistical conditioning tables can be used as inputs for
`slightly better efficiency, but this is not required.)
`Arithmetic coding has produced 5-10% better
`compression than Huffman for many of the images
`which JPEG members have tested. However, some feel
`it is more complex than Huffman coding for certain
implementations, for example, the highest-speed
hardware implementations. (Throughout JPEG’s
history, “complexity” has proved to be most elusive as
a practical metric for comparing compression methods.)
`
`If the only difference between two JPEG codecs is the
`entropy coding method, transcoding between the two is
`
`possible by simply entropy decoding with one method
`and entropy recoding with the other.
`
`4.5 Compression and Picture Quality
`
`For color images with moderately complex scenes, all
`DCT-based modes of operation typically produce the
`following levels of picture quality for the indicated
`ranges of compression. These levels are only a
guideline - quality and compression can vary
`significantly according to source image characteristics
`and scene content. (The units “bits/pixel” here mean
`the total number of bits in the compressed image -
`including the chrominance components - divided by the
`number of samples in the luminance component.)
`
`•
`
`•
`
`•
`
`•
`
`0.25-0.5 bits/pixel: moderate to good quality,
`sufficient for some applications;
`
`0.5-0.75 bits/pixel: good to very good quality,
`sufficient for many applications;
`
`0.75-1/5 bits/pixel: excellent quality, sufficient for
`most applications;
`
`1.5-2.0 bits/pixel: usually indistinguishable from
`the original, sufficient for the most demanding
`applications.
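The bits/pixel figure defined above is a trivial computation (helper name mine):

```python
def bits_per_pixel(compressed_size_bytes, luma_samples):
    """Bits/pixel as defined above: total bits in the compressed
    image (chrominance components included) divided by the number
    of samples in the luminance component."""
    return 8.0 * compressed_size_bytes / luma_samples
```

For example, a 640x480 luminance raster whose compressed image (all components) occupies 19,200 bytes comes out at 0.5 bits/pixel.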
`
`5 Processing Steps for Predictive Lossless
`Coding
`After its selection of a DCT-based method in 1988,
`JPEG discovered that a DCT-based lossless mode was
`difficult to define as a practical standard against which
encoders and decoders could be independently
implemented, without placing severe constraints on
`both encoder and decoder implementations.
`
`JPEG, to meet its requirement for a lossless mode of
`operation, has chosen a simple predictive method
`which is wholly independent of the DCT processing
`described previously. Selection of this method was not
`the result of rigorous competitive evaluation as was the
`DCT-based method. Nevertheless, the JPEG lossless
method produces results which, in light of its
simplicity, are surprisingly close to the state of the art
for lossless continuous-tone compression, as indicated
by a recent technical report [5].
`
`Figure 4 shows the main processing steps for a
`single-component image. A predictor combines the
`values of up to three neighboring samples (A, B, and C)
`to form a prediction of the sample indicated by X in
`Figure 5. This prediction is then subtracted from the
`actual value of sample X, and the difference is encoded
`
`6
`
`
`
[Block diagram: Source Image Data flows through Predictor and Entropy Encoder to become Compressed Image Data; Table Specifications feed the Entropy Encoder.]

Figure 4. Lossless Mode Encoder Processing Steps
`
selection-value    prediction
0                  no prediction
1                  A
2                  B
3                  C
4                  A+B-C
5                  A+((B-C)/2)
6                  B+((A-C)/2)
7                  (A+B)/2

Table 1. Predictors for Lossless Coding
`
`6 Multiple-Component Images
`The previous sections discussed the key processing
`steps of the DCT-based and predictive lossless codecs
`for the case of single-component source images. These
`steps accomplish the image data compression. But a
`good deal of the JPEG proposal is also concerned with
`the handling and control of color (or other) images with
`multiple components. JPEG’s aim for a generic
compression standard requires its proposal to
accommodate a variety of source image formats.
`
`6.1 Source Image Formats
`
`The source image model used in the JPEG proposal is
`an abstraction from a variety of image types and
`applications and consists of only what is necessary to
`compress and reconstruct digital image data. The
`reader should recognize that the JPEG compressed data
`format does not encode enough information to serve as
`a complete image representation. For example, JPEG
does not specify or encode any information on pixel
aspect ratio, color space, or image acquisition
characteristics.
`
`7
`
`losslessly by either of the entropy coding methods -
`Huffman or arithmetic. Any one of the eight predictors
`listed in Table 1 (under “selection-value”) can be used.
`
`Selections 1, 2, and 3 are one-dimensional predictors
`and selections 4, 5, 6 and 7 are two-dimensional
`predictors. Selection-value 0 can only be used for
differential coding in the hierarchical mode of
`operation. The entropy coding is nearly identical to
`that used for the DC coefficient as described in section
`7.1 (for Huffman coding).
`
`C B
`A X
`
`Figure 5. 3-Sample Prediction Neighborhood
`
`For the lossless mode of operation, two different codecs
`are specified - one for each entropy coding method.
`The encoders can use any source image precision from
`2 to 16 bits/sample, and can use any of the predictors
`except selection-value 0. The decoders must handle
`any of the sample precisions and any of the predictors.
Lossless codecs typically produce around 2:1
`compression for color images with moderately complex
`scenes.
`
`
`
`..
`
`.
`
`samples
`
`left
`
`CNf
`CNf-1
`
`.
`
`.
`
`.
`
`C2
`C1
`(a) Source image with multiple components
`
`top
`.
`.
`.
`.
`.
`
`yi
`
`.
`.
`.
`
`Ci
`.
`.
`.
`.
`.
`
`.
`.
`.
`.
`
`xi
`
`.
`.
`
`.
`.
`
`.
`.
`
`line
`
`right
`
`bottom
` (b) Characteristics of an image component
`
`Figure 6. JPEG Source Image Model
`
Figure 6 illustrates the JPEG source image model. A
source image contains from 1 to 255 image
components, sometimes called color or spectral bands
or channels. Each component consists of a rectangular
array of samples. A sample is defined to be an
unsigned integer with precision P bits, with any value
in the range [0, 2^P − 1]. All samples of all components
within the same source image must have the same
precision P. P can be 8 or 12 for DCT-based codecs,
and 2 to 16 for predictive codecs.
`
The ith component has sample dimensions xi by yi. To
accommodate formats in which some image
components are sampled at different rates than others,
components can have different dimensions. The
dimensions must have a mutual integral relationship
defined by Hi and Vi, the relative horizontal and
vertical sampling factors, which must be specified for
each component. Overall image dimensions X and Y
are defined as the maximum xi and yi for all
components in the image, and can be any number up to
2^16. H and V are allowed only the integer values 1
through 4. The encoded parameters are X, Y, and the
Hi and Vi for each component. The decoder reconstructs
the dimensions xi and yi for each component, according
to the relationship shown in Equation 5.
`
`of decompression. For many systems, this is only
`feasible if the components are interleaved together
`within the compressed data stream.
`
`To make the same interleaving machinery applicable to
`both DCT-based and predictive codecs, the JPEG
`proposal has defined the concept of “data unit.” A data
`unit is a sample in predictive codecs and an 8x8 block
`of samples in DCT-based codecs.
`
`The order in which compressed data units are placed in
`the compressed data stream is a generalization of
`raster-scan order. Generally, data units are ordered
`from left-to-right and top-to-bottom according to the
`orientation shown in Figure 6. (It is the responsibility
`of applications to define which edges of a source image
`are top, bottom, left and right.) If an image component
`is noninterleaved (i.e., compressed without being
`interleaved with other components), compressed data
`units are ordered in a pure raster scan as shown in
`Figure 7.
`
[Data units are ordered left-to-right along each row, and rows top-to-bottom, from the top-left of the component to the bottom-right.]

Figure 7. Noninterleaved Data Ordering
`
`When two or more components are interleaved, each
`component Ci is partitioned into rectangular regions of
`Hi by Vi data units, as shown in the generalized
`example of Figure 8. Regions are ordered within a
`component from left-to-right and top-to-bottom, and
`within a region, data units are ordered from left-to-right
`and top-to-bottom. The JPEG proposal defines the
term Minimum Coded Unit (MCU) to be the smallest
group of interleaved data units.
`
`8
`
`
`
`,.-
`
`%+ =
`1+ =
`where Ø ø is the ceiling function.
`
`/0 +0
`2 3 +354 /6874 6879;::
`
`6.2 Encoding Order and Interleaving
`
`A practical image compression standard must address
`how systems will need to handle the data during the
`process of decompression. Many applications need to
pipeline the process of displaying or printing
`multiple-component images in parallel with the process
`
`
`
[Four components are shown with their data-unit arrays: Cs1 (H1=2, V1=2), Cs2 (H2=2, V2=1), Cs3 (H3=1, V3=2), and Cs4 (H4=1, V4=1). Each component is partitioned into regions of Hi by Vi data units; MCU1, MCU2, MCU3, and MCU4 each take one region from each component in turn: four data units from Cs1, two from Cs2, two from Cs3, and one from Cs4.]

Figure 8. Generalized Interleaved Data Ordering Example
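The interleaved ordering of Figure 8 can be generated mechanically. A sketch (names mine) that assumes each component's data-unit array is an exact multiple of its Hi-by-Vi region size:

```python
def mcu_order(components, mcus_across, mcus_down):
    """List the data units of an interleaved scan in MCU order.
    components is a list of (H, V) sampling-factor pairs; each MCU
    takes an H-by-V region of data units from every component in
    turn. Returns (component_index, row, col) per data unit."""
    order = []
    for mcu_row in range(mcus_down):
        for mcu_col in range(mcus_across):
            for ci, (h, v) in enumerate(components):
                for r in range(v):
                    for c in range(h):
                        order.append((ci, mcu_row * v + r,
                                      mcu_col * h + c))
    return order
```

With the Figure 8 factors, components [(2, 2), (2, 1), (1, 2), (1, 1)] and a 3x2 grid of MCUs, each MCU contains 4 + 2 + 2 + 1 = 9 data units, starting with the four Cs1 units of one region.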
`
`6.3 Multiple Tables
`
In addition to the interleaving control discussed
previously, JPEG codecs must control application of
`the proper table data to the proper components. The
`same quantization table and the same entropy coding
`table (or set of tables) must be used to encode all
`samples within a component.
`
`JPEG decoders can store up to 4 different quantization
`tables and up to 4 different (sets of) entropy coding
tables simultaneously. (The Baseline sequential
`decoder is the exception; it can only store up to 2 sets
`of entropy coding tables.) This is necessary for
switching between different tables during
decompression of a scan containing multiple
(interleaved) components, in order to apply the proper
`table to the proper component. (Tables cannot be
`loaded during decompression of a scan