`Mehrotra et al.
`
`US006571016B1
`US 6,571,016 B1
`May 27, 2003
`
`(10) Patent No.:
`(45) Date of Patent:
`
`(54)
`
`(75)
`
`(73)
`
`(21)
`(22)
`
`(63)
`
`(51)
`(52)
`(58)
`
`(56)
`
`INTRA COMPRESSION OF PIXEL BLOCKS
`USING PREDICTED MEAN
`
`Inventors: Sanjeev Mehrotra, Kirkland, WA (US);
`Albert S. Wang, Kirkland, WA (US)
`Assignee: Microsoft Corporation, Redmond, WA
`(US)
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`Notice:
`
`Appl. No.: 08/850,957
`Filed:
`May 5, 1997
`Related U.S. Application Data
`
`Continuation of application No. 08/623,299, filed on Mar.
`28, 1996, now Pat. No. 6,215,910.
`Int. Cl." ............................. G06K 9/36; G06K 9/46
`U.S. Cl. ................................... 382/236; 375/240.13
`Field of Search ................................. 382/236, 238,
`382/248, 253; 375/240.13, 240.16, 240.22,
`240.24
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`5,144,425 A 9/1992 Joseph ....................... 358/133
`5,155,594 A * 10/1992 Bernstein et al. ...... 375/240.14
`5,227,878 A * 7/1993 Puri et al. .............. 375/240.15
`5,351,095 A 9/1994 Kerdramvat ................. 348/699
`5,414,469 A 5/1995 Gonzales et al. ........... 348/408
`5,418,568 A 5/1995 Keith ......................... 348/390
`5,453,801 A 9/1995 Kim .......
`... 348/699
`5,473,379 A 12/1995 Horne ....
`.... 348/416
`5,502,492 A 3/1996 Jung .........
`.... 348/413
`5,512,952 A 4/1996 Iwamura ..................... 348/416
`5,521,988 A 5/1996 Li et al. ..................... 382/248
`
`5,537,155
`5,557,341
`5,560,038
`5,576,767
`5,585,852
`5,596,659
`5,604,867
`5,623,312
`5,623,313
`5,673,265
`5,694,173
`
`::
`
`7/1996
`9/1996
`9/1996
`11/1996
`12/1996
`1/1997
`2/1997
`4/1997
`4/1997
`9/1997
`12/1997
`
`O’Connell et al. ......... 348/699
`Weiss et al. ................ 348/699
`Haddock .................... 395/800
`Lee et al. ................... 348/413
`Agarwal ..................... 348/398
`Normile et al. ............. 382/253
`Harwood ............... 395/200.13
`Yan et al. ................... 348/416
`Naveen ...................... 348/416
`Gupta et al. ................ 370/432
`Kimura et al. .............. 348/423
`
`
`
`OTHER PUBLICATIONS
`“Video Coding for Low Bitrate Communication”, ITU-T,
`Draft H.263: Line Transmission of Non–Telephone Signals,
`Int’l Telecommunication Union, (May 2, 1996).
`Chaddha, N., et al., “Hierarchical Vector Quantization of
`Perceptually Weighted Block Transforms”, IEEE, pp. 3–12,
`(1995).
`* cited by examiner
`Primary Examiner—Phuoc Tran
`(74) Attorney, Agent, or Firm—Lee & Hayes, PLLC
`(57)
`ABSTRACT
`An apparatus and method for encoding video frames is
`provided. The video frames are divided into blocks for
`encoding. Encoding of the video blocks utilizes motion
`detection, motion estimation and adaptive compression, to
`obtain the desired compression for a particular bit rate.
`Adaptive compression includes intra compression (without
`regard to other frames) and inter compression (with regard
`to other frames). Intra compression, inter compression with
`motion detection, and inter compression with motion esti
`mation are performed on a block by block basis, as needed.
`Segmentation is provided to compare encoding of a block
`with encoding of its sub-blocks, and to select the best block
`size for encoding.
`
`50 Claims, 17 Drawing Sheets
`
`corresponding blocks
`
`8:3
`
`ls distance > threshold?
`
`-
`
`18
`
`Create header for Motion
`Detection (Header=0)
`do not Se?i? b|CCK
`
`|
`
`
`
`Increment N
`
`19 || compression
`
`create header for motion
`Detection (Header=1.)
`
`Perfºrm
`inter
`compressiºn
`Gr
`9|ack
`(Figure 7)
`select best compression
`for block
`
`Google Inc.
`GOOG 1038
`IPR2016-00212
`
`0001
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 1 of 17
`
`US 6,571,016 B1
`
`F/G, 7 (Prior art)
`
`
`
`108 -
`
`100
`
`
`
`112
`
`|
`||||
`
`AE/G. 2 (Prior art)
`
`
`
`
`
`
`
`202
`
`Minimize the
`amount of data
`
`Transform
`
`
`
`
`
`
`
`Skip
`
`y 204
`/
`
`
`
`, 206
`
`Minimize the
`precision of the data
`
`Minimize the # of
`bits per data item
`
`Code
`
`Repeat
`
`0002
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 2 of 17
`
`US 6,571,016 B1
`
`F/G, 3 (Related art)
`
`302
`
`_ 304
`
`Original
`Data
`
`…”
`
`
`
`- 314
`
`Playback |__
`346
`Device
`
`A/G 42
`
`
`
`0003
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 3 of 17
`
`US 6,571,016 B1
`
`4m
`
`408
`
`0004
`
`0004
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 4 of 17
`
`US 6,571,016 B1
`
`A/G 5
`
`.
`
`502
`
`egin
`
`…
`
`, 500
`/
`
`Get initial frame N
`
`Convert frame N to YUV
`(Figures 10, 11)
`
`_ 504
`~~~
`
`506
`
`Encode macroblocks in frame N
`using adaptive
`intra-compression
`
`508
`…~
`
`Decode frame N
`
`Get frame N4-1
`
`---
`
`…”
`
`Convert frame N-H1 to YUV ...”
`
`Perform Motion Detection
`on frame N-H 1
`
`Perform Motion Estimation
`On frame N-F1
`(Figure 6)
`
`510
`
`512
`
`514
`
`519
`
`|
`
`516
`
`} 518
`|
`
`- - - - - - - - - - - - - - - - - - - - - |
`
`Encode blocks in frame N+1
`(Figure 9)
`
`52O
`
`…” 522
`
`524
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`0005
`
`
`
`U.S. Patent
`U.S. Patent
`
`May 27, 2003
`May 27, 2003
`
`Sheet 5 of 17
`Sheet 5 of 17
`
`US 6,571,016 B1
`US 6,571,016 B1
`
`F/G, 6.9
`@a
`
`I
`
`651
`651
`
`, 652
`652
`
`
`
`F/O 7,
`Z0
`
`
`
`/ 751
`/ 754
`
`/,r 752
`752
`
`
`
`0006
`
`0006
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 6 of 17
`
`US 6,571,016 B1
`
`, 600
`
`602
`
`Obtain a block N in a spatial |__
`604
`location
`
`Calculate distance between
`corresponding blocks
`
`… 606
`
`---
`
`|s distance > threshold?
`
`
`
`Create header for Motion
`Detection (Header=1),
`
`612
`
`Create header for Motion
`Detection (Header=0),
`do not send block
`
`
`
`Perform
`
`Perform
`
`618
`
`anº Orl
`On
`block
`(Figure 8)
`
`
`
`
`
`CO º Or)
`Orn
`619
`biock
`(Figure 7)
`
`Select best compression
`for block
`
`620
`
`.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Increment N
`
`More blocks to encode?
`
`, 624
`…”
`
`
`
`
`
`0007
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 7 of 17
`
`US 6,571,016 B1
`
`A/G 7,
`
`-
`Perform inter compression
`
`_ 618
`...”
`
`-
`
`Find best matching macroblock
`
`-
`
`Calculate motion vector
`
`-
`
`-
`
`Obtain residual
`
`Code residual
`
`-
`
`... 702
`
`- 704
`
`- 706
`
`708
`
`710
`
`712
`
`| Perform adaptive compression
`on residual
`
`Perform no residual inter
`compression on macroblock
`
`716
`
`Perform reconstruction
`of macroblock
`
`0008
`
`
`
`U.S. Patent
`U.S. Patent
`
`May 27, 2003
`May 27, 2003
`
`Sheet 8 of 17
`Sheet 8 of 17
`
`US 6,571,016 B1
`US 6,571,016 B1
`
`F/G. &
`FIG, 8,52
`
`
`
`800
`
`810
`
`0009
`
`0009
`
`
`
`U.S. Patent
`U.S. Patent
`
`May 27, 2003
`May 27, 2003
`
`Sheet 9 of 17
`Sheet 9 of 17
`
`US 6,571,016 B1
`US 6,571,016 B1
`
`A/G &A
`FIG, 2%
`
`
`
`, 820
`820
`
`0010
`
`0010
`
`
`
`U.S. Patent
`U.S. Patent
`
`May 27, 2003
`May 27, 2003
`
`Sheet 10 of 17
`Sheet 10 of 17
`
`US 6,571,016 B1
`US 6,571,016 B1
`
`F/G, 9.
`902
`
`F/G, 9%
`FIG, 9%»
`
`
`
`940
`
`3
`b
`
`__ 920
`
` ,,/— 920
`--3'(Q—c-CD0.00'0J
`
`930 1
`930 < .
`
`C
`d
`
`e
`f
`9
`h
`i
`
`0011
`
`0011
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 11 of 17
`
`US 6,571,016 B1
`
`A/G 9a
`
`F/G 9/
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`000
`
`3
`
`1
`
`goto 110 2nd
`stage table
`goto 111 2nd
`Stage table
`
`OO
`O1
`10
`11
`
`C
`C
`C
`e
`
`1
`1
`2
`2
`
`000
`
`f
`
`1
`
`960
`
`0012
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 12 of 17
`
`US 6,571,016 B1
`
`F/G, J (),
`
`
`
`
`
`Obtain pixel values
`from codebook
`
`
`
`Define noise to be
`added to pixel values
`
`1004
`
`1006
`
`
`
`
`
`Add noise to pixel values
`
`
`
`1010
`
`--
`
`Clip pixel values
`
`
`
`Reduce pixel values |
`
`1012
`
`
`
`
`
`
`
`
`
`
`
`Construct RGB table
`according to
`display format
`
`1014
`
`Convert YUV to
`RGB display format
`
`- 1016
`...~~~
`
`
`
`1020
`
`1022
`
`|| Obtain YUV pixel values
`from YUV codebook
`
`
`
`1024 s
`
`Create and Concatenate
`YUV Word
`
`
`
`1026
`
`Look-up RGB value
`for display
`
`0013
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 13 of 17
`
`US 6,571,016 B1
`
`A/G //
`
`1102,
`\
`*
`.
`
`1104
`
`*
`\
`
`.
`
`1100
`/ /
`.
`
`110s,
`W
`\
`w
`
`Y-high
`
`Y-|OW
`
`U-high
`
`U-low
`
`V-high
`
`V-low
`
`
`
`0014
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 14 of 17
`
`US 6,571,016 B1
`
`F/G, 7%
`
`Encode 8x8 block
`
`~~~
`
`Encode two segmented
`8x4 blocks
`
`Calculate distortion (D1) and
`rate (R1) of 8x8 block
`
`Calculate sum (D2) of
`distortions and sum (R2) of
`rates of 8x4 blocks
`
`1202
`
`1204
`
`1206
`
`1208
`
`Compare D1 +XR1 to
`D2 + AR2, where
`X is a constant
`
`_ 1210
`
`
`
`1214
`
`Encode block as two 8x4 blocks
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`0015
`
`
`
`U.S. Patent
`U.S. Patent
`
`May 27, 2003
`May 27, 2003
`
`Sheet 15 of 17
`Sheet 15 of 17
`
`US 6,571,016 B1
`US 6,571,016 B1
`
`F/G, 7%
`FEE, 17.21»
`
`7 1250
`
`F/G, 7%
`FIG, 17.20
`
`
`
`1288
`1288
`
`1298
`
`1298
`
`0016
`
`0016
`
`
`
`U.S. Patent
`U.S. Patent
`
`May 27, 2003
`May 27, 2003
`
`Sheet 16 of 17
`Sheet 16 of 17
`
`US 6,571,016 B1
`US 6,571,016 B1
`
`F/G, 72/
`/FICE}, 112%]
`
`
`
`1252
`
` \
`
`\
`\ 1270
`‘- 1268
`\ 1268 W 1270
`
`0017
`
`0017
`
`
`
`U.S. Patent
`
`May 27, 2003
`
`Sheet 17 of 17
`
`US 6,571,016 B1
`
`A'/G 73.
`
`1300
`
``
`
`1302
`
`“,
`
`Get block N
`
`1304 s,
`
`--
`
`Read block header
`
`1306
`
`Was block
`encoded?
`
`
`
`1332
`~
`Perform YUV to RGB
`transform
`
`Increment N
`
`
`
`~ 1334
`
`
`
`
`
`all blocks
`decoded?
`
`_ 1310
`1308
`Calculate mean from
`intra
`decoded adjacent blocks
`
`inter or
`intra?
`
`1318
`
`Display frame
`
`Y ~
`
`1338
`
`Read tree and indices |
`
`- 1312
`
`inter
`Read motion Vector
`
`1320
`
`Decode residual
`
`1314
`
`
`
`Add to mean
`
`,”
`
`
`
`
`
`|s residual
`encoded?
`
`Read tree and indices
`
`1326
`
`1328
`
`
`
`1324
`
`Copy block from
`previous reconstructed
`frame, offset by the
`motion vector
`
`Decode residual
`
`_ 1330
`Add to previous block
`With motion Vector offset
`
`
`
`0018
`
`
`
`1
`INTRA COMPRESSION OF PIXEL BLOCKS
`USING PREDICTED MEAN
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`This application is related to U.S. Patent Applications:
`Ser. No. 08/625,650, filed Mar. 29, 1996, entitled “TABLE
`BASED LOW-LEVEL IMAGE CLASSIFICATION AND
`COMPRESSION SYSTEM" (now U.S. Pat. No. 6,404,
`923); Ser. No. 08/714,447, filed Sep. 16, 1996, entitled
`10
`“MULTIMEDIA COMPRESSION SYSTEM WITH ADDI
`TIVE TEMPORAL LAYERS" (pending); Ser. No. 08/818,
`805, entitled “METHOD AND APPARATUS FOR IMPLE
`MENTING MOTION DETECTION IN VIDEO
`COMPRESSION” (abandoned); Ser. No. 08/819,507
`15
`entitled “DIGITAL VIDEO SIGNAL ENCODER AND
`ENCODING METHOD” (now U.S. Pat. No. 6,118,817);
`Ser. No. 08/818,804, entitled “PRODUCTION OF A
`VIDEO STREAM WITH SYNCHRONIZED ANNOTA
`TIONS OVERACOMPUTER NETWORK" (now U.S. Pat.
`20
`No. 6,006,241); Ser. No. 08/819,586, entitled “METHODS
`AND APPARATUS FOR IMPLEMENTING CONTROL
`FUNCTIONS IN A STREAMED VIDEO DISPLAY SYS
`TEM” (now U.S. Pat. No. 6,014,706); Ser. No. 08/818,769,
`entitled “METHODS AND APPARATUS FOR AUTO
`25
`MATICALLY DETECTING PROTOCOLS IN A COM
`PUTER NETWORK" (now U.S. Pat. No. 5,999,979); Ser.
`No. 08/818,127, entitled “DYNAMIC BANDWIDTH
`SELECTION FOR EPFICIENT TRANSMISSION OF
`MULTIMEDIA STREAMS IN A COMPUTER NET
`30
`WORK” (now U.S. Pat. No. 6,292,834); Ser. No. 08/819,
`585, entitled “STREAMING AND DISPLAYING AVIDEO
`STREAM WITH SYNCHRONIZED ANNOTATIONS
`OVER A COMPUTER NETWORK" (now U.S. Pat. No.
`6,173,317); Ser. No. 08/818,644, entitled SELECTIVE
`RETRANSMISSION FOR EPFICIENT AND RELIABLE
`STREAMING OF MULTIMEDIA PACKETS IN A COM
`PUTER NETWORK” (now U.S. Pat. No. 5,918,002); Ser.
`No. 08/819,579, U.S. Patent Application Publication No.
`US-2001-0017941-A1, entitled METHOD AND APPARA
`40
`TUS FOR TABLE-BASED COMPRESSION WITH
`EMBEDDED CODING” (abandoned); Ser. No. 08/822,156,
`entitled “METHOD AND APPARATUS FOR COMMUNI
`CATION MEDIA COMMANDS AND DATA USING THE
`HTTP PROTOCOL” (now U.S. Pat. No. 6,128,653); Ser.
`45
`No. 08/818,826, entitled “DIGITAL VIDEO SIGNAL
`ENCODER AND ENCODING METHOD" (now U.S. Pat.
`No. 5,903,673); provisional U.S. Patent Applications: Serial
`No. 60/036,661, entitled “VCR-LIKE FUNCTIONS FOR
`RENDERING VIDEO ON DEMAND (VOD)”; Serial No.
`50
`60/036,662, entitled “METHODS AND APPARTUS FOR
`AUTODETECTING PROTOCOLS IN A COMPUTER
`NETWORK"; which are all incorporated herein by refer
`ence. U.S. patent application Ser. No. 08/623,299, filed Mar.
`28, 1996, entitled “TABLE-BASED COMPRESSION
`55
`WITH EMBEDDED CODING” (now U.S. Pat. No. 6,215,
`910)is incorporated herein by reference.
`BACKGROUND OF THE INVENTION
`1. Field of the Invention
`The present invention relates to a method and apparatus
`for compression of multimedia data. More specifically, the
`present invention relates to a method and apparatus for
`predictive compression of video frames.
`2. Description of the Related Art
`The creation of pictures or images has been a human
`activity since the beginning of humanity. However, until
`
`35
`
`60
`
`65
`
`US 6,571,016 B1
`
`5
`
`2
`recent history viewing of an image required the viewer to be
`physically present at the image. This was geographically
`cumbersome. Photography, both still and motion, broke this
`geographic constraint by allowing pictures to be captured
`and transported independent of the physical images they
`represented. Television enhanced transmission of images, by
`sending images, recorded or live, to any geographic location
`capable of receiving a radio signal. But, for the most part,
`viewers of television can only view images that are sched
`uled for transmission, rather than selecting images at will.
`With the development of computers, and more specifi
`cally computers that are linked across a network, images
`stored on one computer may be demanded by a viewer, and
`almost instantaneously provided to the viewer’s computer
`over the computer network. One computer network that is
`increasingly being used is the Internet, the well-known
`international computer network that links various military,
`government, education, nonprofit, industrial and financial
`institutions, commercial enterprises, and individuals.
`Images are typically of two types: 1) single pictures; or 2)
`moving pictures. Single pictures include photographs, com
`puter art, faxes and web pages. Moving pictures typically
`include a number of single images or frames organized into
`a particular sequence. Within a computer network, images
`are captured and stored on one computer, and then trans
`mitted over the network to another computer for viewing. An
`example of this is provided in FIG. 1, to which reference is
`now made.
`FIG. 1 illustrates a computer system 100 that includes a
`server 102 connected to a number of mass storage devices
`104. The mass storage devices 104 are used to store a
`number of video frames 120. The video frames 120 could be
`still images, or could be combined into sequences to create
`moving pictures, as described above. The sequences reside
`on the mass storage devices 104, and upon request, may be
`transmitted by the server 102 to other computers 108 via a
`network 106. In addition, the video frames 120 may be
`transferred to remote computers, such as the computer 112,
`via a network 116, using a router 110 and/or a modem 114.
`One skilled in the art should appreciate that the network 116
`could be a dedicated connection, or a dial-up connection,
`and could utilize any of a number of network protocols such
`as TCP/IP or Client/Server configurations.
`In operation, a user sitting at any of the computers 108,
`112 would request video frames 120 from the server 102,
`and the server would retrieve the video frames 120 from the
`mass storage devices 104, and transmit the frames 120 over
`the network 106. Upon receipt of the video frames 120, the
`computers 108, 112 would display the images for the
`requester.
`It should be appreciated that the computers 108, 112 may
`be positioned physically close to the server 102, or may be
`thousands of miles away. The computers 108, 112 may be
`connected to the server 102 via a direct LAN connection
`such as Ethernet or Token Ring, or may utilize plain old
`telephone service (POTS), ISDN or ADSL, depending on
`the availability of each of these services, their cost, and the
`performance required by the end user. As is typically of
`computer equipment and services, higher performance
`means more COSt.
`In most cases, the amount of data required to represent a
`video frame, or more specifically a sequence of video frames
`120 is significant. For example, a color image or frame is
`typically represented by a matrix of individual dots or pixels,
`each having a particular color defined by a combination of
`red, green and blue intensities (RGB). To create a palette of
`
`0019
`
`
`
`US 6,571,016 B1
`
`3
`
`16 million colors (i.e., true color), each of the RGB inten-
`sities are represented by an 8-bit value. So, for each pixel,
`24-bits are required to define a pixel’s color. A typical
`computer monitor has a resolution of 1024 pixels (across) by
`768 pixels (down). So, to create a full screen image for a
`computer requires 1024><768><24 bits=18,874,368 bits, or
`2,359,296 bytes of data to be stored. And that is just for one
`image.
`If a moving picture is to be displayed, a sequence of
`images are grouped, and displayed one after another, at a rate
`of approximately 30 frames per second. Thus, a 1 second,
`256 color, full screen movie could require as much as 60
`megabytes of data storage. With present technology, even
`very expensive storage systems, and high speed networks
`would be overwhelmed if alternatives were not provided. By
`way of example, as the resolution and the frame rate
`requirements of a video increase, the amount of data that is
`necessary to describe the video also increases.
`One alternative to reducing the amount of data required to
`represent images or moving pictures is to simply reduce the
`size of frames that are transmitted and displayed. One
`popular frame size is 320 pixels in width and 240 pixels in
`height, or 320x240. Thus, a 256 color frame of this size
`requires 320><240><24=1,843,200 bits, or 230 kilobytes of
`data. This is significantly less (1/10”“) than what is required
`for a full screen image. However, as frames are combined
`into moving pictures,
`the amount of data that must be
`transmitted is still significant.
`An additional solution to reducing the amount of storage
`space required for video frames involves compressing the
`data. The extent to which data is compressed is typically
`measured in terms of a compression ratio or a bit rate. The
`compression ratio is generally the number of bits of an input
`value divided by the number of bits in the representation of
`that input value in compressed code. Higher compression
`ratios are preferred over lower compression ratios. The bit
`rate is the number of bits per second of compressed data
`required to properly represent a corresponding input value.
`There are three basic methods involved in any data
`compression scheme: 1) transformation, 2) reduced preci-
`sion (quantization), and 3) minimization of number of bits
`(encoding). Each of these methods may be used
`independently, or may be combined with the other methods
`to obtain optimum compression. Although the number of
`scheme combinations is large,
`typically compression is
`accomplished by a sequential process of transformation,
`precision reduction, and coding. Coding is always the final
`stage of the process, but there are sometimes several trans-
`formation and precision reduction iterations. This process is
`summarized in FIG. 2, to which attention is now directed.
`In FIG. 2, a block 202 is shown to illustrate the step of
`transformation, a block 204 is shown to illustrate the step of
`quantization, and a block 206 is shown to illustrate the step
`of coding. The transformation block 202 transforms a data
`set into another equivalent data set that is in some way
`smaller than the original. Some transformations reduce the
`number of data items in a set. Other transformations reduce
`the numerical size of data items that allow them to be
`
`represented with fewer binary digits.
`To reduce the number of data items in a set, methods are
`used that remove redundant
`information within the set.
`
`Examples of such methods include Run-Length-Encoding
`(RLE) and LZW encoding. RLE is a pattern-recognition
`scheme that searches for the repetition of identical data
`values in a list. The data set can be compressed by replacing
`the repetitive sequence with a single data value and a length
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`value. Compression ratios obtainable from RLE encoding
`schemes vary depending on the type of data to be encoded,
`but generally range from 2:1 up to 5:1. LZW encoding
`replaces repeated sequences within a data set with particular
`codes that are smaller than the data they represent. Code-
`books are used during encoding and decoding to transform
`the data set back and forth from raw data to encoded data.
`
`Compression ratios for video images range from 2:1 to 9:1.
`Transformations that reduce the size of individual data
`
`items within a data set includes Differencing. Differencing is
`a scheme that attempts to reduce the size of individual data
`values within a data set by storing the difference between
`pixels values, rather than the actual data values for each
`pixel. In many cases the difference value is much smaller in
`magnitude than the original data value, and thus requires a
`smaller data space for storage.
`Other transformation schemes exist to transform a set of
`
`data values from one system of measurement into another,
`where the properties of the new data set facilitate the data’s
`compression. One such scheme called colorspace conver-
`sion transforms the RGB pixel values into luminance Y, and
`chrominance Cb and C, values. This is referred to as
`RGB/YUV conversion. Less important values, such as the
`C, component may be ignored without significantly affect-
`ing the image perceived by a viewer.
`Another scheme that transforms a set of data values from
`
`one system of measurement into another is the Discrete-
`Cosine-Transform. The DCT transforms a block of original
`data that typically represents color intensity (YUV) into a
`new set of values that represent cosine frequencies over the
`original block of data. Lower frequencies are stored in an
`upper left portion of the data block with higher frequencies
`stored in the rest of the block. If higher frequency compo-
`nents are ignored, an entire block of data may be represented
`by just a few data values in a block.
`It should be appreciated that each of the schemes
`described above are well known in the art, and may be
`combined, for a particular frame of data, to achieve maxi-
`mum compression. However, each of these schemes are
`applied to a single video frame, called intra-frame
`compression, which is independent of other video frames.
`For
`full motion video,
`including multicast video,
`teleconferencing, and interactive video, compressing each
`video frame separately is not sufficient, because of the large
`number of frames in even a short video sequence. Further
`compression may be achieved by taking advantage of the
`similarities between frames. In many instances, the differ-
`ence between one frame and the next is small because of the
`short
`time interval between frames. These schemes are
`
`referred to as inter-frame compression.
`One simple scheme stores only the pixels that actually
`change from one frame of the video sequence to the next.
`Said in a technical way, the scheme is to store only the pixels
`that produce a nonzero difference when subtracted from
`their corresponding pixels in a previous frame. Thus, rather
`than having to transmit all of the pixel values in a video
`block, only those pixels that have changed need to be
`transmitted.
`
`Another approach to video compression is to calculate the
`differences between corresponding pixels in consecutive
`frames and then encode the differences instead of the origi-
`nal values. This is called motion compensation. But,
`in
`motion pictures, pixel values often shift their spatial location
`from one frame to the next. To locate shifted pixels, a
`number of pixel values are grouped together to form a block.
`Then, a block within a present frame is compared to blocks
`
`0020
`
`0020
`
`
`
`US 6,571,016 B1
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`5
`in a previous frame to determine an offset such that all of the
`pixel differences are minimized. This is called motion esti
`mation. An offset is typically represented as a pair of
`numbers that specify a shift in the horizontal and vertical
`directions. This is referred to as a motion vector. If a motion
`vector can be determined for a particular block, that block
`may be encoded simply by supplying the motion vector,
`rather than by encoding the entire block.
`With each of the above transformation schemes, reduced
`precision may be used to further compress data, as shown by
`block 204. As was mentioned above, one of the chrominance
`values, C, could be ignored without significantly affecting
`the quality of the image. In addition, after performing a DCT
`transform, higher frequency components can be ignored.
`Furthermore, by calculating differences between pixel
`values, and ignoring minor differences, further compression
`may be achieved. This illustrates the repetition between the
`transformation block 202 and quantization block 204 of
`FIG. 2.
`The third block shown in FIG. 2 is the Code block 206.
`This block encodes a data set to minimize the # of bits
`required per data item. The coding process assigns a unique
`code value to data items in a set. One coding scheme that is
`used in compressing video frames is Huffman coding. Huff
`man codes assign a variable-length code to each possible
`data item, such that the values that occur most often in the
`data set have smaller length codes while the values that
`occur less frequently have longer-length codes. Huffman
`coding creates a tree structure where the leaf nodes are the
`original probabilities associated with each data value from
`the data set. Each branch in the tree is labeled with a one or
`a zero. The Huffman code assigned to each original data
`value is the set of labels along a path from the root node to
`the associated leaf node.
`The above provides a general overview of a number of
`different compression schemes for compressing video
`frames prior to transmitting the frames over a network to a
`remote computer. It should be appreciated that specific
`implementation of any of these schemes, or more accurately,
`a combination of particular ones of these schemes, requires
`significant preprocessing (encoding) of the video frames
`prior to transmission, as well as post processing (decoding)
`of the frames.
`As the complexity that is associated with compression and
`decompression increases, the efficiency with which video
`frames may be encoded and decoded drops. Stated another
`way, higher compression ratios require more processing, and
`take longer to encode/decode than do lower compression
`ratios. However, higher compression ratios allow more data
`to be delivered over a network in less time. Therefore, a
`tradeoff is generally made between obtaining a particular
`compression ratio, and obtaining a satisfactory bit rate of
`transfer. If a high compression ratio takes too long to decode,
`viewed images will appear choppy or disjunct. If an inad
`equate bit rate is obtained, a viewer will be kept waiting for
`the image, or the image will replay in slow motion.
`SUMMARY OF THE INVENTION
`What is needed is an apparatus and method that improves
`the efficiency of encoding/decoding video frames while
`maintaining a desired bit rate for a given resolution. More
`specifically, what is needed is an apparatus and method that
`incorporates several forms of motion estimation, and selects
`the best form for each block of data to be encoded.
`Accordingly, it is a feature of the present invention to
`provide a method to encode a video frame that is transmitted
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`over a communications medium. The method includes: 1)
`obtaining a video frame; 2) separating the frame into blocks;
`3) encoding a plurality of blocks using inter compression; 4)
`encoding the plurality of blocks using predictive intra com
`pression; and 5) selecting better block compression between
`the inter and predictive intra compression; wherein the steps
`of encoding the plurality of blocks is performed on a block
`by block basis, to provide optimum compression of the
`video frame for a given bit rate.
`DESCRIPTION OF THE DRAWINGS
`These and other objects, features, and advantages of the
`present invention will become better understood with regard
`to the following description, and accompanying drawings
`where:
`FIG. 1 is block diagram of a prior art computer network
`for encoding, transmission and decoding of video images.
`FIG. 2 is a prior art block diagram of encoding method
`ology for video images.
`FIG. 3 is a related art block diagram illustrating encoding,
`delivery and decoding of video images.
`FIGS. 4a and 4b illustrate a video frame divided into a
`number of macroblocks, and a macroblock divided into a
`number of blocks, each block having a number of different
`pixels.
`FIG. 5 is a process flow chart illustrating a process of
`encoding a video frame according to the present invention.
`FIG. 6a illustrates a comparison of two macroblocks in
`the same spatial location, but in two separate video frames.
`FIG. 6b is a flow chart illustrating motion compensation
`according to the present invention.
`FIG. 7a illustrates a comparison of two macroblocks in
`different spatial locations, in two separate video frames.
`FIG. 7b is a flow chart illustrating motion detection
`according to the present invention.
`FIGS. 8a and 8b illustrate predicted mean intra block
`compression according to the present invention.
`FIG. 9a is a tree representation of a classic Huffman
`decoder.
`FIG. 9b is a table used with a two-stage Huffman decoder
`according to the present invention.
`FIG. 9c is a table of a first stage decoding table used with
`a two stage Huffman decoder according to the present
`invention.
`FIG. 9d is a table of a second stage decoding table used
`with a two stage Huffman decoder according to the present
`invention.
`FIG. 9e is a table illustrating another second stage decod
`ing table used with a two stage Huffman decoder according
`to the present invention.
`FIG. 10a is a process flow diagram that illustrates the
`steps associated with preprocessing a codebook that is used
`with a color transformation of bits encoded using motion
`compensation according to the present invention.
`FIG. 10b is a process flow diagram that illustrates the
`steps associated with performing a color transformation on
`bits encoded using motion compensation according to the
`present invention.
`FIG. 11 is a block diagram of a color transformation
`performed on bits encoded using motion compensation
`according to the present invention.
`FIG. 12a is a process flow diagram illustrating a segmen
`tation process for blocks in accordance with the present
`invention.
`
`0021
`
`
`
`7
`FIG. 12B is a block diagram of a macroblock that is
`divided into smaller blocks according to the segmentation
`process of FIG. 12a.
`FIG. 12c is an encoding map tree illustrating segmenta
`tion of a block according to the segmentation process of
`FIG. 12a.
`FIG. 12d is an block diagram of a block that is segmented
`according to the segmentation process of FIG. 12a, and
`represented by the encoding map tree of FIG. 12c.
`FIG. 13 is a process flow diagram illustrating the steps of
`decoding blocks in a frame that has been encoded and
`transmitted by the motion compensation, segmentation and
`encoding methods of the present invention.
`DETAILED DESCRIPTION OF THE
`INVENTION
`The present invention provides a complex but efficient
`method and apparatus for compressing/decompressing video
`information that is distributed over a computer network.
`However, before discussing the detailed portions of the
`invention, a general overview will be provided.
`Referring to FIG. 3, a block diagram 300 of a data
`encoder/decoder system is shown. The system 300 includes
`original data 302, or data that is generally unencoded. The
`original data 302 may be a sequence of video frames, as
`described above, having a resolution of 320x240 pixels, and
`a color palette of 256 colors. The original data 302 is
`provided to an encoder 304 that encodes or compresses the
`data 302, and provides encoded data 306 as output. Although
`any suitable compression method may be used compress the
`original data 302, a preferred method includes that described
`in U.S. patent application Ser. No. 08/623,299 referenced
`above.
`The encoded data 306 is provided to a network delivery
`system 308 that accepts the encoded data 306 and generates
`as output encoded data 310. Typically, the network delivery
`system 308 is used to send encoded data 306 from one
`computer system on the network to another computer. The
`channels of communication used by the network delivery
`syste