Karczewicz et al.

(10) Patent No.: US 6,856,701 B2
(45) Date of Patent: Feb. 15, 2005

(54) METHOD AND SYSTEM FOR CONTEXT-BASED ADAPTIVE BINARY ARITHMETIC CODING

(75) Inventors: Marta Karczewicz, Irving, TX (US); Ragip Kurceren, Irving, TX (US)

(73) Assignee: Nokia Corporation, Espoo (FI)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/995,240

(22) Filed: Nov. 27, 2001

(65) Prior Publication Data: US 2003/0081850 A1, May 1, 2003

Related U.S. Application Data

(60) Provisional application No. 60/322,112, filed on Sep. 14, 2001.

(51) Int. Cl.: G06K 9/36; G06K 9/34

(52) U.S. Cl.: 382/247; 382/173; 382/248

(58) Field of Search: 382/173, 232, 233, 240, 245, 247, 248, 250, 251; 341/50, 52, 59, 67, 69, 79, 106, 107; 358/505
`
(56) References Cited

U.S. PATENT DOCUMENTS

6,243,496 B1      6/2001   Wilkinson ............... 382/245
6,327,392 B1     12/2001   Li ...................... 382/248
6,339,658 B1 *    1/2002   Moccagatta et al. ....... 382/240
6,351,563 B1      2/2002   Kim et al. .............. 382/232
6,545,618 B2 *    4/2003   Yip ..................... 341/107
6,567,081 B1 *    5/2003   Li et al. ............... 345/419
6,577,251 B1 *    6/2003   Yip ...................... 341/50
`OTHER PUBLICATIONS
`I. H. Witten et al., “Arithmetic Coding for Data Compres
`sion.” Commun ACM, vol. 30, pp. 520–540, Jun. 1987).
`H.26L Test Model Long Term No. 8 (TML-8) draft.0 ITU-T
`Telecommunication Standardization Section, Study Group
`16, Video Coding Experts Group.
`Gisle Bontegaard, Recommended Simulation Conditions
`for H.26L (VCG-M75, ITU-T Video Coding Experts
`Group, Austin, TX USA, Apr. 2-4, 2001).
`* cited by examiner
`Primary Examiner Andrew W. Johns
`ASSistant Examiner Amir Alavi
`(74) Attorney, Agent, or Firm Ware, Fressola, Va Der
`Sluys & Adolphson LLP
`(57)
`ABSTRACT
`A method and System for image coding, wherein an image
`is divided into a plurality of blocks for Scanning. The pixels
`values in the Scanned block are represented by a plurality of
`level-run value pairs, wherein the level value is indicative of
`a non-Zero pixel value and the run value is indicative of the
`number of consecutive Zero pixel values preceding the
`non-Zero pixel value. A plurality of contexts indicative of the
`level-run value pairs are conveyed to a decoder for allowing
`the decoder to reconstruct the image based on the contexts.
`The assignment of the contexts is also based on the level
`value of a preceding level-run pair. Additionally, instead of
`an end-of-block symbol, the number of non-zero coefficients
`is provided to the decoder prior to conveying the contexts
`thereto.
`
`40 Claims, 16 Drawing Sheets
`
`20
`
`
`
`Walue to Bin
`Mapping
`
`Context
`Assignment
`
`Probability
`estimation
`
`Coding
`engine
`
`122
`
`
`
`
`
`
`
`
`
`Previous Levels
`and Runs
`
`16
`
`
[Sheet 3 of 16: drawing content not recoverable from the scan.]
`
`
`
[Drawing sheet: a figure marked PRIOR ART (apparently FIG. 4); content otherwise not recoverable from the scan.]
`
`
`
[Drawing sheet: a figure marked PRIOR ART; content otherwise not recoverable from the scan.]
`
`
`
[Sheet 6 of 16: drawing content not recoverable from the scan.]
`
`
`
[Sheet 7 of 16: drawing content not recoverable from the scan.]
`
`
`
[Sheet 8 of 16: drawing content not recoverable from the scan.]
`
`
`
[Sheet 9 of 16: drawing content not recoverable from the scan.]
`
`
`
[Sheet 10 of 16, FIG. 7b: Context Mapping Table for levels, reconstructed from the scan. Rows give the magnitude of the previous level, capped at Max Level = 5; columns give the bin to which the current level is mapped, capped at Max Bin Level = 3; entries are context numbers.]

Prev. level | Bin 1 | Bin 2 | Bin >= 3
     1      |   1   |   6   |   11
     2      |   2   |   7   |   12
     3      |   3   |   8   |   13
     4      |   4   |   9   |   14
   >= 5     |   5   |  10   |   15

Context Mapping Table

FIG. 7b
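The mapping in FIG. 7b reduces to a simple arithmetic rule. The following is a minimal Python sketch of that rule, assuming the reconstruction of the table above is correct; the function name and the clamping constants are illustrative, not taken from the patent text.

    MAX_BIN_LEVEL = 3   # bins above this share the contexts of bin 3
    MAX_LEVEL = 5       # previous levels above this share the contexts of level 5

    def level_context(bin_index: int, prev_level: int) -> int:
        # Reproduces the FIG. 7b table: context 1..15 from the bin of the
        # current level and the magnitude of the previous level.
        b = min(bin_index, MAX_BIN_LEVEL)
        p = min(abs(prev_level), MAX_LEVEL)
        return (b - 1) * MAX_LEVEL + p

    assert level_context(2, 3) == 8   # row "prev. level 3", column "bin 2"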
`
`
`
`
[Sheet 11 of 16, FIG. 8a: Context Mapping Table for runs, reconstructed from the scan. Rows give the run value, capped at Max Run = 4; columns give the bin to which the run is mapped, capped at Max Bin Run = 3; entries are context numbers. The row and column headers are only partially legible, so their roles are inferred.]

  Run  | Bin 1 | Bin 2 | Bin >= 3
   1   |   1   |   5   |    9
   2   |   2   |   6   |   10
   3   |   3   |   7   |   11
 >= 4  |   4   |   8   |   12

FIG. 8a
`
`
`
`
[Sheet 12 of 16: drawing content not recoverable from the scan.]
`
`
`
[Sheet 13 of 16: drawing content not recoverable from the scan.]
`
`
`
[Sheet 14 of 16: drawing content not recoverable from the scan.]
`
`
`
[Sheet 15 of 16, FIG. 11: flowchart of an encoding method: Receiving image (510); Dividing image into blocks (520); Scanning blocks (530); Obtaining current and previous levels/runs (540); Assigning contexts (550); a final step 560 whose label is not legible in the scan.]
`
`
`
[Sheet 16 of 16: flowchart of a variant method 500': Receiving image (510); Dividing image into blocks (520); Scanning blocks (530); a step 542 whose label is only partially legible as "Providing No"; Assigning contexts (550); a final step 560 whose label is not legible in the scan.]
`
`
`
`1
`METHOD AND SYSTEM FOR CONTEXT
`BASED ADAPTIVE BINARY ARTHMETIC
`CODING
`
`This patent application is based on and claims priority to
`U.S. Provisional Application No. 60/322,112, filed Sep. 14,
`2001.
`
FIELD OF THE INVENTION

The present invention relates generally to the compression of still images and video sequences and, more particularly, to a method and system for context-based adaptive binary arithmetic coding.
BACKGROUND OF THE INVENTION

A digital image in uncompressed form comprises an array of image pixels or picture elements. For example, in a commonly used digital image format, known as the Quarter Common Intermediate Format (QCIF), an image, or frame, comprises 25,344 pixels arranged in an array of 176x144 pixels. Each pixel, in turn, is represented by a certain number of bits, which carry information about the brightness (luminance) and/or color (chrominance) of the pixel. Different schemes exist for representing the luminance and/or chrominance of pixels in a digital image. Commonly, a so-called YUV color model is used. The luminance, or Y, component represents the luminance of the pixel, while the color of the pixel is represented by two chrominance or color difference components, labelled U and V. Other color models, such as RGB (Red, Green, Blue) color models, which are based on components representing the three primary colors of light, are also commonly used. However, color models based on a luminance/chrominance representation provide advantages compared with color models based on primary colors. These stem from the nature of the human visual system, which is more sensitive to intensity variations than it is to color variations. YUV color models typically exploit this property by using a lower spatial resolution for the chrominance components (U, V) than for the luminance component (Y). In this way, the amount of information needed to represent the color information in an image can be reduced without a noticeable reduction in perceived image quality.
The lower spatial resolution of the chrominance components is usually attained by sub-sampling. Typically, a block of 16x16 image pixels is represented by four blocks of 8x8 pixels comprising luminance information, and the corresponding chrominance components are each represented by one block of 8x8 pixels representing an area of the image equivalent to that of the 16x16 pixels in the luminance component. The chrominance components are thus spatially sub-sampled by a factor of 2 in the x and y directions. The resulting assembly of four 8x8 pixel luminance blocks and two spatially corresponding 8x8 pixel chrominance blocks is commonly referred to as a YUV macroblock, or macroblock for short. A QCIF image comprises 11x9 such macroblocks. If the luminance blocks and chrominance blocks are represented with 8-bit resolution (that is, by numbers in the range 0 to 255), the total number of bits required to represent the luminance and chrominance information associated with each macroblock is 6x(8x8x8)=3072 bits. Thus, the number of bits needed to represent an image in QCIF format is 99x3072=304,128 bits.
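The arithmetic above can be checked directly; this short Python snippet simply recomputes the figures quoted in the text for an 8-bit YUV macroblock and a QCIF frame:

    # Recomputing the bit counts quoted above: 8-bit samples, four luminance
    # and two chrominance 8x8 blocks per macroblock, 11x9 macroblocks per frame.
    bits_per_block = 8 * 8 * 8                 # 512 bits per 8x8 block
    bits_per_macroblock = 6 * bits_per_block   # 6 x (8x8x8) = 3072 bits
    bits_per_frame = 11 * 9 * bits_per_macroblock
    print(bits_per_macroblock, bits_per_frame)  # 3072, 304128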
It should be appreciated that, even in the situation described above, where both chrominance components of a digital color image are sub-sampled by a factor of two, an uncompressed image of only moderate size (e.g. 176x144 pixels) requires a large number of bits for its representation. This means that the amount of memory required to store digital images in uncompressed form is excessive. Furthermore, if still images are to be transferred, for example over a data communications network having a moderate or low available bandwidth, transmission times may become lengthy, or the network may become congested. Bandwidth requirements are even more severe if it is desired to transmit a series of images as a digital video sequence in real time. For example, transmission of a digital video sequence comprising a series of images in uncompressed QCIF format, represented using a YUV color model, at a rate of 30 frames per second, requires more than 9 Mbits/s (million bits per second). Such a high data rate is generally impractical for use in video recording, transmission and display applications because of the very large storage capacity, transmission channel capacity and hardware performance required. If a video sequence is to be transmitted in real time over a fixed-line network such as an ISDN (Integrated Services Digital Network) or a PSTN (Public Switched Telephone Network), the available data transmission bandwidth is typically of the order of 64 kbits/s. In mobile video-telephony, where transmission takes place at least in part over a radio communications link, the available bandwidth can be as low as 20 kbits/s. This means that a significant reduction in the amount of information used to represent video data must be achieved in order to enable transmission of digital images or video sequences over low-bandwidth communication networks. It is nevertheless desirable that this reduction should be achieved without significantly degrading the quality of the images/video sequence.
Over the past years, a considerable amount of research work has been directed towards reducing the amount of data required to represent digital images and video sequences, resulting in the development of numerous different schemes and international standards for compressing digital still images and digital video. The basic approach to image compression used in almost all still image and video encoders existing today involves block-based transform coding. Typically, transform coding translates image data from a representation comprising pixel values to a form comprising a set of coefficient values, each of which is a weighting factor (multiplier) for a basis function of the transform in question. It can be shown that there is a considerable degree of spatial redundancy within a typical digital image. In practical terms, this means that in general the value of any pixel within an image is substantially the same as the value of other pixels in its immediate vicinity; that is, there is a significant degree of correlation between pixel values. It is further known that when certain mathematical transformations, such as the two-dimensional Discrete Cosine Transform (DCT), are performed on image data, this spatial redundancy is reduced significantly, thereby producing a more compact representation of the image data.
Block-based Transform Coding as Used in JPEG Still Image Coding

In still image compression, such as that performed according to the baseline mode of the widely used JPEG standard, an image to be coded is first divided into an array of non-overlapping square blocks, each block comprising, for example, an 8x8 array of image pixels. In the case of the JPEG baseline, a two-dimensional Discrete Cosine Transform (DCT) is then applied independently to each of the image blocks. This has the effect of converting the image data from the pixel value domain to the spatial frequency domain and of producing a corresponding set of coefficient values, each of which is a weighting factor for a basis function of the two-dimensional DCT. The coefficient values thus produced are quantized and then coded in a lossless manner using entropy coding to further reduce the amount of data (i.e. number of bits) required for their representation. According to the JPEG baseline, the entropy coder employs only Huffman coding to produce a compressed bit-stream, although in other modes arithmetic coding may alternatively be used. Finally, data describing image and coding parameters (e.g. type of compression, quantization and coding tables, image size, etc.) is embedded in the bit-stream produced by the entropy encoder. As the JPEG standard comprises four alternative coding modes and places few constraints on the quantization and coding tables that can be used, this is necessary in order to enable JPEG compressed bit-streams to be interchanged among different platforms and for images to be reconstructed without any ambiguity.
A digital video sequence, like an ordinary motion picture recorded on film, comprises a sequence of still images (often referred to as frames), the illusion of motion being created by displaying the frames one after the other at a relatively fast rate, typically 15 to 30 frames per second. As in any still image, the pixel values of an individual frame within a digital video sequence exhibit considerable spatial redundancy. Therefore, the frames of a digital video sequence are amenable to block-based transform coding, just like individual still images.

Images in the consecutive frames of a video sequence also tend to be quite similar, and thus the overall change between one video frame and the next is rather small. This means that there is considerable temporal redundancy within a typical digital video sequence. For example, a scene may comprise some stationary elements, such as background scenery, and some moving areas, for example the face of a newsreader. In consecutive frames of the sequence, it is likely that the background will remain unaltered and the only movement in the scene will be due to changes in the facial expression of the newsreader. Thus, when forming a compressed representation of a video sequence, there is also a possibility to use techniques which reduce the temporal redundancy of the image data of the sequence, in addition to methods that reduce spatial redundancy, thereby allowing further data compression to be achieved.
Hybrid Video Encoder/Decoder

State-of-the-art video coding systems make use of a technique known as motion-compensated prediction to reduce the temporal redundancy in video sequences. Using motion-compensated prediction, the image content of some (often many) frames in a digital video sequence is predicted from one or more other frames in the sequence, known as reference frames. Prediction of image content is achieved by tracing the motion of objects or regions of an image between a frame to be coded (compressed) and the reference frame(s) using motion vectors. In general, the reference frame(s) may precede the frame to be coded or may follow it in the video sequence. However, as will become apparent from discussions later in the text, it is not appropriate (or possible) to apply motion-compensated prediction to all frames of a video sequence, and thus at least two types of encoding are used in state-of-the-art video coding systems.
Frames of a video sequence which are compressed using motion-compensated prediction are generally referred to as INTER-coded or P-frames. Motion-compensated prediction alone rarely provides a sufficiently precise representation of the image content of a video frame, and therefore it is
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 6,856,701 B2
`
`4
typically necessary to provide a so-called prediction error (PE) frame with each INTER-coded frame. As will be explained in greater detail later in the text, the prediction error frame represents the difference between a decoded version of the INTER-coded frame and the image content of the frame to be coded. More specifically, the prediction error frame comprises values that represent the difference between pixel values in the frame to be coded and corresponding reconstructed pixel values formed on the basis of a predicted (INTER-coded) version of the frame in question. Consequently, the prediction error frame has characteristics similar to a still image, and block-based transform coding can be applied in order to reduce the amount of data (number of bits) required to represent it.

Frames of a video sequence which are not compressed using motion-compensated prediction are referred to as INTRA-coded or I-frames. Generally, INTRA-coded frames are produced by applying block-based transform coding directly to the pixel values of the frame to be coded. Additionally, where possible, blocks of INTRA-coded frames are predicted from previously coded blocks within the same frame. This technique, known as INTRA-prediction, has the effect of further reducing the amount of data required to represent an INTRA-coded frame.

In order to illustrate the principles of block-based transform coding and motion-compensated prediction in greater detail, reference will now be made to FIG. 1, which is a schematic of a generic hybrid video encoder that employs a combination of INTRA- and INTER-coding to produce a compressed (encoded) video bit-stream. A corresponding decoder is illustrated in FIG. 2 and will be described later in the text.
The video encoder 300 comprises an input 301 for receiving a digital video signal from a camera or other video source (not shown). It also comprises a transformation unit 304, which is arranged to perform a block-based discrete cosine transform (DCT), a quantizer 306, an inverse quantizer 308, an inverse transformation unit 310, arranged to perform an inverse block-based discrete cosine transform (IDCT), combiners 312 and 316, and a frame store 320. The encoder further comprises a motion estimator 330, a motion field coder 340 and a motion-compensated predictor 350. Switches 302 and 314 are operated co-operatively by control manager 360 to switch the encoder between an INTRA mode of video encoding and an INTER mode of video encoding. The encoder 300 also comprises a video multiplex coder 370, which forms a single bit-stream from the various types of information produced by the encoder 300 for further transmission to a remote receiving terminal or, for example, for storage on a mass storage medium, such as a computer hard drive (not shown).
Encoder 300 operates as follows. Each frame of uncompressed video provided from the video source to input 301 is received and processed macroblock by macroblock, preferably in raster-scan order. When the encoding of a new video sequence starts, the first frame of the sequence is encoded as an INTRA-coded frame. Subsequently, the encoder is programmed to code each frame in INTER-coded format, unless one of the following conditions is met: 1) it is judged that the current frame being coded is so dissimilar from the reference frame used in its prediction that excessive prediction error information is produced; 2) a predefined INTRA frame repetition interval has expired; or 3) feedback is received from a receiving terminal indicating a request for a frame to be provided in INTRA-coded format.

The occurrence of condition 1) is detected by monitoring the output of the combiner 316. The combiner 316 forms a difference between the current macroblock of the frame
`
being coded and its prediction, produced in the motion-compensated prediction block 350. If a measure of this difference (for example, a sum of absolute differences of pixel values) exceeds a predetermined threshold, the combiner 316 informs the control manager 360 via a control line 319, and the control manager 360 operates the switches 302 and 314 via control line 313 so as to switch the encoder 300 into INTRA-coding mode. The occurrence of condition 2) is monitored by means of a timer or frame counter implemented in the control manager 360, in such a way that if the timer expires, or the frame counter reaches a predetermined number of frames, the control manager 360 operates the switches 302 and 314 via control line 313 to switch the encoder into INTRA-coding mode. Condition 3) is triggered if the control manager 360 receives a feedback signal, for example from a receiving terminal, via control line 321, indicating that an INTRA frame refresh is required by the receiving terminal. Such a condition may arise, for example, if a previously transmitted frame is badly corrupted by interference during its transmission, rendering it impossible to decode at the receiver. In this situation, the receiving decoder issues a request for the next frame to be encoded in INTRA-coded format, thus re-initialising the coding sequence.
Operation of the encoder 300 in INTRA-coding mode will now be described. In INTRA-coding mode, the control manager 360 operates the switch 302 to accept video input from input line 318. The video signal input is received macroblock by macroblock from input 301 via the input line 318. As they are received, the blocks of luminance and chrominance values which make up the macroblock are passed to the DCT transformation block 304, which performs a two-dimensional discrete cosine transform on each block of values, producing a two-dimensional array of DCT coefficients for each block. In a situation such as that described earlier, where each macroblock comprises four 8x8 pixel blocks of luminance values and two spatially corresponding 8x8 pixel blocks of chrominance values, DCT transformation block 304 produces an 8x8 array of coefficient values for each block.
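For readers who want to experiment, the block transform described here can be sketched in a few lines. This is a generic orthonormal 2-D DCT-II, not code from the patent; the SciPy routine and the random 8x8 example block are illustrative:

    import numpy as np
    from scipy.fftpack import dct

    def dct2(block: np.ndarray) -> np.ndarray:
        # 2-D DCT-II: transform rows, then columns, with orthonormal scaling.
        return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

    block = np.random.randint(0, 256, (8, 8)).astype(float)  # one 8x8 pixel block
    coeffs = dct2(block)  # 8x8 DCT coefficient array; DC term at coeffs[0, 0]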
The DCT coefficients for each block are passed to the quantizer 306, where they are quantized using a quantization parameter QP. Selection of the quantization parameter QP is controlled by the control manager 360 via control line 315. Quantization introduces a loss of information, as the quantized coefficients have a lower numerical precision than the coefficients originally generated by the DCT transformation block 304. This provides a further mechanism by which the amount of data required to represent each image of the video sequence can be reduced. However, unlike the DCT transformation, which is essentially lossless, the loss of information introduced by quantization causes an irreversible degradation in image quality. The greater the degree of quantization applied to the DCT coefficients, the greater the loss of image quality.
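The text does not spell out the quantization rule, so the following is only an illustration of the principle: a uniform quantizer whose step size grows with QP, so that larger QP values discard more precision. The step-size formula is an assumption made for the sketch, not the patent's rule:

    import numpy as np

    def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
        step = 2 * qp                        # assumed step size; grows with QP
        return np.round(coeffs / step).astype(int)

    def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
        step = 2 * qp
        return levels * float(step)          # the rounding loss is irreversible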
The quantized DCT coefficients for each block are passed from the quantizer 306 to the video multiplex coder 370, as indicated by line 325 in FIG. 1. The video multiplex coder 370 orders the transform coefficients for each block using a zigzag scanning procedure. This operation converts the two-dimensional array of quantized transform coefficients into a one-dimensional array. Typical zigzag scanning orders, such as that shown in FIG. 3, order the coefficients approximately in ascending order of spatial frequency. This also tends to order the coefficients according to their values, such that coefficients positioned earlier in the one-dimensional array are more likely to have larger absolute values than coefficients positioned later in the array. This is because lower spatial frequencies tend to have higher amplitudes within the image blocks. Consequently, the last values in the one-dimensional array of quantized transform coefficients are commonly zeros.
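A zigzag scan of this kind can be generated programmatically. The sketch below walks the anti-diagonals of an n x n block, alternating direction, which yields the usual low-to-high spatial-frequency ordering; it is a generic implementation, not necessarily the exact order of FIG. 3:

    def zigzag_order(n: int = 8):
        # Visit coefficient positions along anti-diagonals, alternating direction.
        order = []
        for s in range(2 * n - 1):
            diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
            order.extend(diag if s % 2 else diag[::-1])
        return order

    def zigzag_scan(block):
        # Flatten a 2-D array of quantized coefficients into a 1-D list.
        return [block[i][j] for i, j in zigzag_order(len(block))]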
Run-Level Coding of DCT Transform Coefficients

Typically, the video multiplex coder 370 represents each non-zero quantized coefficient in the one-dimensional array by two values, referred to as level and run. Level is the value of the quantized coefficient, and run is the number of consecutive zero-valued coefficients preceding the coefficient in question. The run and level values for a given coefficient are ordered such that the level value precedes the associated run value. A level value equal to zero is used to indicate that there are no more non-zero coefficient values in the block. This 0-level value is referred to as an EOB (end-of-block) symbol.
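The level/run convention just described is easy to make concrete. A minimal sketch (the function name is illustrative), following the text's convention that level precedes run and that a zero level acts as the EOB symbol:

    def run_level_encode(coeffs):
        # coeffs: 1-D list of quantized coefficients in zigzag order.
        pairs, run = [], 0
        for c in coeffs:
            if c == 0:
                run += 1                 # count zeros preceding the next level
            else:
                pairs.append((c, run))   # (level, run), level first
                run = 0
        pairs.append((0, 0))             # level 0 = EOB: no more non-zero values
        return pairs

    # run_level_encode([5, 0, 0, -2, 1, 0, 0, 0])
    #   -> [(5, 0), (-2, 2), (1, 0), (0, 0)]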
Entropy Coding

The run and level values are further compressed in the video multiplex coder 370 using entropy coding. Entropy coding is a lossless operation, which exploits the fact that symbols within a data set to be coded generally have different probabilities of occurrence. Therefore, instead of using a fixed number of bits to represent each symbol, a variable number of bits is assigned, such that symbols which are more likely to occur are represented by code-words having fewer bits. For this reason, entropy coding is often referred to as Variable Length Coding (VLC). Since certain values of levels and runs are more likely than other values to occur, entropy coding techniques can be used effectively to reduce the number of bits required to represent the run and level values. A number of different methods can be used to implement entropy coding. For example, entropy coding of the run and level parameters may be implemented by means of look-up tables which define the mapping between each possible symbol in the data set to be coded and its corresponding variable length code. Such look-up tables are often defined by statistical analysis of training material comprising symbols identical to those to be coded and having similar statistical properties. An alternative technique, known as arithmetic coding, can also be used to convert the run and level values into variable-length code-words. In arithmetic coding, a group of symbols, for example the run and level values for a block of quantized transform coefficients, is coded as a single floating-point decimal number.

Once the run and level values have been entropy coded using an appropriate method, the video multiplex coder further combines them with control information, also entropy coded using a variable length coding method appropriate for the kind of information in question, to form a single compressed bit-stream of coded image information 335.
A locally decoded version of the macroblock is also formed in the encoder 300. This is done by passing the quantized transform coefficients for each block, output by quantizer 306, through inverse quantizer 308 and applying an inverse DCT transform in inverse transformation block 310. In this way, a reconstructed array of pixel values is constructed for each block of the macroblock. The resulting decoded image data is input to combiner 312. In INTRA-coding mode, switch 314 is set so that the input to the combiner 312 via switch 314 is zero. In this way, the operation performed by combiner 312 is equivalent to passing the decoded image data unaltered.

As subsequent macroblocks of the current frame are received and undergo the previously described encoding and decoding steps in blocks 304, 306, 308, 310 and 312, a
decoded version of the INTRA-coded frame is built up in frame store 320. When the last macroblock of the current frame has been INTRA-coded and subsequently decoded, the frame store 320 contains a completely decoded frame, available for use as a prediction reference frame in coding a subsequently received video frame in INTER-coded format.
Operation of the encoder 300 in INTER-coding mode will now be described. In INTER-coding mode, the control manager 360 operates switch 302 to receive its input from line 317, which comprises the output of combiner 316. The combiner 316 receives the video input signal macroblock by macroblock from input 301. As combiner 316 receives the blocks of luminance and chrominance values which make up the macroblock, it forms corresponding blocks of prediction error information. The prediction error information represents the difference between the block in question and its prediction, produced in the motion-compensated prediction block 350. More specifically, the prediction error information for each block of the macroblock comprises a two-dimensional array of values, each of which represents the difference between a pixel value in the block of luminance or chrominance information being coded and a decoded pixel value obtained by forming a motion-compensated prediction for the block, according to the procedure described below. Thus, in a situation where each macroblock comprises four 8x8 pixel blocks of luminance values and two spatially corresponding 8x8 pixel blocks of chrominance values, the prediction error information for the macroblock similarly comprises four 8x8 blocks of luminance prediction error values and two spatially corresponding 8x8 blocks of chrominance prediction error values.
The prediction error information for each block of the macroblock is passed to DCT transformation block 304, which performs a two-dimensional discrete cosine transform on each block of prediction error values to produce a two-dimensional array of DCT transform coefficients for each block. Thus, in a situation where the prediction error information for each macroblock comprises four 8x8 blocks of luminance prediction error values and two spatially corresponding 8x8 blocks of chrominance prediction error values, DCT transformation block 304 produces an 8x8 array of transform coefficient values for each prediction error block. The transform coefficients for each prediction error block are passed to quantizer 306, where they are quantized using a quantization parameter QP, in a manner analogous to that described above in connection with operation of the encoder in INTRA-coding mode. Again, selection of the quantization parameter QP is controlled by the control manager 360 via control line 315.

The quantized DCT coefficients representing the prediction error information for each block of the macroblock are passed from quantizer 306 to video multiplex coder 370, as indicated by line 325 in FIG. 1. As in INTRA-coding mode, the video multiplex coder 370 orders the transform coefficients for each predi