`Normile et al.
`
`US005461679A
[11] Patent Number: 5,461,679
[45] Date of Patent: Oct. 24, 1995
`
`[54] METHOD AND APPARATUS FOR
`ENCODING/DECODING IMAGE DATA
`
`[75]
`
`Inventors: James 0. Normile, Sunnyvale; Chia L.
`Yeh, Saratoga; Daniel W. Wright,
`Sunnyvale; Ke-Chiang Chu, Saratoga,
`all of Calif.
`
`[73] Assignee: Apple Computer, Inc., Cupertino,
`Calif.
`
`[21] Appl. No.: 62,067
`
[22] Filed: May 14, 1993
`
`Related U.S. Application Data
`
`[63] Continuation of Ser. No. 705,284, May 24, 1991, Pat. No.
`5,212,742.
[51] Int. Cl.6 ....................................................... G06K 9/36
[52] U.S. Cl. .......................... 382/304; 382/305; 395/650; 395/163; 395/474
[58] Field of Search .................................. 382/56, 27, 41, 49; 340/752; 364/133; 395/114, 115, 425
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,484,349
`4,665,556
`4,684,997
`5,070,531
`
`1111984 McCubbrey .............................. 382/49
`511987 Fukushima et al ....................... 382/49
`811987 Romeo et al ............................. 382/56
`1211991 Schuerman et al ....................... 382/27
`
Primary Examiner-Yon J. Couso
Attorney, Agent, or Firm-Blakely, Sokoloff, Taylor & Zafman
`
`[57]
`
`ABSTRACT
`
`An apparatus and method for processing video data for
`compression/decompression in real-time. The apparatus
`comprises a plurality of compute modules, in a preferred
`embodiment, for a total of four compute modules coupled in
`parallel. Each of the compute modules has a processor, dual
`port memory, scratch-pad memory, and an arbitration
`mechanism. A first bus couples the compute modules and a
`host processor. Lastly, the device comprises a shared
`memory which is coupled to the host processor and to the
`compute modules with a second bus. The method handles
`assigning portions of the image for each of the processors to
`operate upon.
`
`4,174,514 11/1979 Sternberg .................................. 382/49
`
`15 Claims, 11 Drawing Sheets
`
[Front-page drawing: the FIG. 4 system architecture: Host 410, Display Controller 426, Frame Buffer 427, Display, Bus, compute Modules 0-3, Shared Memory (2 MB), Frame Buffer 430, with 32-bit data paths.]
`
`Ex. 1045
`IPR2016-00923, HTC v. PUMA
`Page 1 of 23
`
`
`
[Sheet 1 of 11, FIG. 1 (Prior Art): CCITT encoder 100. Blocks: DCT 101, Quantizer 102, Variable Length Coding 103, Multiplexing 104, Motion Estimation 105, Intra/Inter 106, Frame Memory 107, Inverse Quantizer 108, Inverse DCT 109, Rate Control 111, Buffer 112; Input Image 110; Side Information; Compressed Video Out 120.]
`
`
`
[Sheet 2 of 11, FIG. 2 (Prior Art): CCITT decoder 200. Blocks: Demultiplexing 201, Variable Length Decoding 202, Inverse Quantizer 203, Inverse DCT 204, Motion Compensation 206, Frame Buffer 207; Side Information 208; Compressed Video Input 210; Video Out 220.]
`
`
`
[Sheet 3 of 11, FIG. 3 (Prior Art): sequence of intra frames and intervening inter frames representing a section of motion video.]
`
`
`
[Sheet 4 of 11, FIG. 4: video processor architecture. Host 410 and Display Controller 426 (with Display 428 and Frame Buffer 427) on Bus 425; compute Modules 0-3 (401-404) on 32-bit bus 420; Shared Memory (2 MB) 405; Frame Buffer 430; 440.]
`
`
`
[Sheet 5 of 11, FIG. 5: one compute module. Arbitration Mechanism 502, Dual Port Memory (16K) 504, Digital Signal Processor 501, Local Memory (64K) 503; 32-bit connections to Bus 425.]
`
`
`
[Sheet 6 of 11, FIG. 6: partitioning of an image into horizontal strips at time t and time t+1 (regions 602, 603, 606, 607, 608, 610, 611).]
`
`
`
[Sheet 7 of 11, FIG. 7a: timeline of forward facing key frames (intra frames), inter frames, and backward facing key frames.]
`
`
`
[Sheet 8 of 11, FIG. 7b: detailed layout of one frame 712: Frame ID 751, Forward Pointer 752, Backward Pointer 753, Variable Length Data 754.]
`
`
`
[Sheet 9 of 11, FIG. 8a: improved CCITT encoder 800. Input Image 801, DCT, Quantizer, Variable Length Coding 804, Multiplexing 805, Inverse Quantizer 806, Inverse DCT 807, Rate Control 808, Intra/Inter/Still/Add selector, Scene Detector, Frame Difference, Frame Memory, Buffer 813/814; Side Information; Compressed Video Out.]
`
`
`
[Sheet 10 of 11, FIG. 8b: Scene Change Detector 815, comprising Frame Difference, Threshold, U and V Energy Computation, and Decision blocks (850-853).]
`
`
`
[Sheet 11 of 11, FIG. 9: Enhanced CCITT Decoder 900. Compressed Video Input 910, Demultiplexing 901, Variable Length Decoding 902, Inverse Quantizer 903, Inverse DCT, Frame Differencing 906, Frame Buffer 907, Play Control 908, Forward/Backward Play Control Signal 911; Side Information; Video Out 920.]
`
`
`
`1
`METHOD At'lD APPARATUS FOR
`ENCODING/DECODING IMAGE DATA
`
`5,461,679
`
`2
`
`This is a continuation Ser. No. 07/705,284, filed May 24,
`1991 now U.S. Pat. No. 5,212,742.
`
`5
`
`BACKGROUND OF THE INVENTION
`
The number of minutes of storage on a 600 megabyte disk, as a function of image size and frame rate, is:

Image Size        Frame Rate per second
    D          5       10       15       20       25       30
   64       244.20   122.10    81.40    61.06    48.84    40.70
  128        61.05    30.52    20.35    12.25    12.21    10.17
  256        15.26     7.63     5.08     3.81     3.05     2.54
  512         3.81     1.90     1.27     0.95     0.76     0.63
`
1. Field of the Invention

This invention relates to the field of video imaging systems. More specifically, this invention relates to an improved method and apparatus for video encoding/decoding.

2. Description of the Related Art

Due to the storage requirements, recent demands for full motion video in such applications as video mail, video telephony, video teleconferencing, image database browsing, multimedia, and other applications have required that standards be introduced for video compression. One image of 35 mm slide quality resolution requires 50 megabytes of data to be represented in a computer system (this number is arrived at by multiplying the horizontal by the vertical resolution by the number of bits to represent the full color range: 4096x4096x8x3 [R, G, B] /8 = 50,331,648 bytes). One frame of digitized NTSC (National Television Standards Committee) quality video comprising 720x480 pixels requires approximately one half megabyte of digital data to represent the image (720x480x1.5 bytes per pixel). In an NTSC system which operates at approximately 30 frames per second, digitized NTSC-quality video will therefore generate approximately 15.552 megabytes of data per second. Without compression, assuming a storage capability of one gigabyte with a two megabytes per second access rate, it is possible to:

a. store 65 seconds of live video on the disk and to play it back at 3 frames per second; or

b. store 21 high quality still images, taking 24 seconds to store or retrieve one such image.

Assuming that a fiber distributed data interface (FDDI) is available with a bandwidth of 200 megabits per second, 1.5 channels of live video can be accommodated, or 35 mm quality still images can be transmitted at the rate of one every two seconds. With currently available CD-ROM technology, a likely distribution medium for products containing video, the current transfer rate is approximately 0.18 megabytes per second; 0.37 megabytes per second may be attained with CD-ROM in the near future.
For illustration, take the variable parameters to be the horizontal and vertical resolution and the frame rate, and assume that 24 bits are used to represent each pixel. Let D represent the horizontal or vertical dimension and assume an aspect ratio of 4:3. The data rate in megabytes per second as a function of frame rate and image size is:

Image Size        Frame Rate per second
    D          5       10       15       20       25       30
   64        0.04     0.08     0.12     0.16     0.20     0.24
  128        0.16     0.33     0.49     0.65     0.82     0.98
  256        0.65     1.31     1.96     2.62     3.27     3.93
  512        2.62     5.24     7.86    10.48    13.10    15.72

(Formulated in a slightly different way, the number of minutes of storage these rates allow on a 600 megabyte disk is tabulated at the beginning of this section.)

It is obvious from data rate and storage considerations that data compaction is required in order for full motion video to be attained.

In light of these storage and rate problems, some form of video compression is required in order to reduce the amount of storage and increase the throughput required to display full-motion video in a quality closely approximating NTSC. Photographic and, to an even greater degree, moving images generally portray information which contains much repetition, smooth motion, and redundant information. Stated in an equivalent way, areas of an image are often correlated with each other, as are sequences of images over time. Keeping these facts in mind, several techniques have been established which eliminate redundancy in video imaging in order to compress these images to a more manageable size which requires less storage, and may be displayed at a fairly high rate. Some simple compression techniques include:

1. Horizontal and Vertical Subsampling: Sampling only a limited number of pixels horizontally or vertically across an image. The required reduction in resolution provides for poor quality images.

2. Reduction in Number of Bits Per Pixel: This technique, including the use of a Color Look Up Table, is currently used successfully to reduce from 24 to 8 bits per pixel. A reduction of approximately 3-1 is the useful limit of this method.

3. Block Truncation Coding and Color Cell Methods: Block truncation coding (BTC) was developed by Bob Mitchell in the early 1980's, targeted at low compression rate and high quality applications (Robert Mitchell, et al., Image Compression Using Block Truncation Coding, IEEE Trans. Comm., pp. 1335-1342, Vol. Com-27, No. 9, Sept. 1979). In this scheme, the first order statistics (mean) and the second order statistics (variance) of each pixel block are extracted and transmitted. The image is reconstructed using these two quantities. An 8-1 compression ratio with 4x4 block sizes was demonstrated in (Graham Campbell, Two Bit/Pixel Full Color Encoding, pp. 215-223, Proceedings of SIGGRAPH '86, Vol. 20, No. 4, Aug. 1986).

4. Vector Quantization (VQ): A simple VQ maps discrete k-dimensional vectors into a digital sequence for transmission or storage. Each vector (a block of 4x4 or 3x3 pixels) is compared to a number of templates in the code book, and the index of the best matched template is transmitted to the receiver. The receiver uses the index for table look-up to reconstruct the image. A simple VQ could provide about 20-1 compression with good quality. A more complex VQ scheme has been demonstrated to provide similar quality to the CCITT (International Consultative Committee for Telephony & Telegraphy) DCT (Discrete Cosine Transformation) scheme recommendation H.261 (T. Murakami, Scene Adaptive Vector Quantization for Image Coding, Globecom, 1988).

5. Predictive Techniques: The assumption on which this family of methods relies is that adjacent pixels are correlated. As a consequence, data reduction can be accomplished by predicting pixel values based on their neighbors. The difference between the predicted and the actual pixel value is then encoded. An extensive body of work exists on this technique and variations on it (O'Neil, J. B., Predictive Quantization Systems for Transmission of TV Signals, Bell System Technical Journal, pp. 689-721, May/June 1966).
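As a minimal illustration of the predictive family (item 5), the sketch below uses the simplest possible predictor, the previous pixel in a row, and encodes only the prediction residuals; the function names are illustrative and not from this text.

```python
def dpcm_encode(row):
    """Encode a row of pixel values as (first pixel, residuals)."""
    residuals = [row[i] - row[i - 1] for i in range(1, len(row))]
    return row[0], residuals

def dpcm_decode(first, residuals):
    """Rebuild the row by accumulating residuals onto the predictor."""
    row = [first]
    for r in residuals:
        row.append(row[-1] + r)
    return row

# Correlated pixels yield small residuals, which entropy-code compactly.
first, res = dpcm_encode([100, 101, 103, 103, 102])
assert res == [1, 2, 0, -1]
assert dpcm_decode(first, res) == [100, 101, 103, 103, 102]
```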
The compression ratio to be expected from each of these simple methods is between four and eight to one.

More complex techniques for video compression are also known in the art. It is possible to achieve data compression of between four and eight to one by using some of the simpler techniques as mentioned above. To achieve comparable quality at compression ratios from twenty to forty to one involves a superlinear increase in complexity. In this case, it is no longer appropriate to consider the compression process as a simple one-step procedure.

In general, lossless compression techniques attempt to whiten or decorrelate a source signal. Intuitively, this makes sense in that a decorrelated signal cannot be compressed further or represented more compactly. For compression ratios of greater than twenty to one, a lossy element must be introduced somewhere into the process. This is usually done through a temporal or spatial resolution reduction used in conjunction with a quantization process. The quantization may be either vector or scalar. The quantizer should be positioned so that a graceful degradation of perceived quality with an increasing compression ratio results.

Many of the succeeding methods are complex, but may be broken into a series of simpler steps. The compression process can be viewed as a number of linear transformations followed by quantization. The quantization is in turn followed by a lossless encoding process. The transformations applied to the image are designed to reduce redundancy in a representational, spatial and temporal sense. Each transformation is described individually.
`
`DECORRELATION
`
Although the RGB representation of digitized images is common and useful, it is not particularly efficient. Each one of the red, green and blue constituents is potentially of full bandwidth, although much of the information is correlated from plane to plane. The first step in the compression process is to decorrelate the three components. If an exact transformation were used, it would be image dependent, and as a consequence computationally expensive. A useful approximation which does not depend on image statistics is the following:

    ( Y )   ( .30   .59   .11 ) ( R )
    ( U ) = ( .60  -.28  -.32 ) ( G )
    ( V )   ( .21  -.52   .31 ) ( B )

In the case of NTSC images, the resulting U and V components (chrominance components containing color information) are of lower bandwidth than the Y component (the luminance component containing the monochrome information). In general, the U and V components are of less perceptual importance than the Y component. The next stage in the compression process usually consists of subsampling U and V horizontally and vertically by a factor of two or four. This is done by low pass filtering followed by decimation. At this point in the process, much of the interplane redundancy has been removed, and a data reduction by a factor of two has been achieved.

REDUCTION OF TEMPORAL REDUNDANCY

Reduction of temporal redundancy may be achieved simply by taking the difference between successive frames. In the case of no motion and no scene change, this frame difference will be zero. The situation is more complex when there is interframe motion. In this case, some reregistration of the portions of the image which have moved is required prior to frame differencing. This is done by estimating how far pixels have moved between frames. Allowing for this movement, corresponding pixels are again subtracted (Ronald Plompen, et al., Motion Video Coding in CCITT SG XV-The Video Source Coding, pp. 997-1004, IEEE Global Telecommunications Conference, Dec. 1988). The motion vectors are coded and transmitted with the compressed bit stream. These vectors are again used in the decoding process to reconstruct the images. The distinction between frame differencing and differencing using motion estimation may be expressed as follows. In the case of simple differencing, the error between frames is calculated as:

    e(x, y, t) = f(x, y, t+1) - f(x, y, t)

Using motion estimation, the error may be written as:

    e(x, y, t) = f(x + Δx, y + Δy, t+1) - f(x, y, t)

where Δx and Δy are the calculated displacements in the x and y directions respectively.

REDUCTION OF SPATIAL REDUNDANCY

In most images, adjacent pixels are correlated. Reducing spatial redundancy involves removing this correlation. The removal is achieved using a linear transformation on the spatial data. In the ideal case, this transform should depend on image statistics. Such a transform does exist and is known as the Hotelling or Karhunen-Loeve (KL) transform (N. S. Jayant, Peter Noll, Digital Coding of Waveforms, Prentice Hall, Signal Processing Series, p. 58). As it is computationally expensive and does not have a fast algorithmic implementation, it is used only as a reference to evaluate other transforms. A variety of other transforms have been applied to the problem, including: Fourier, Walsh, Slant, Hadamard (Arun N. Netravali, Barry G. Haskell, Digital Pictures: Representation and Compression, Plenum Press). The cosine transform provides the best performance (in the sense of being close to the KL transform). The discrete cosine transform is defined in the following way:

    θ(k, l) = (2 a(k) a(l) / N) · Σ[m=0..N-1] Σ[n=0..N-1] x(m, n) · cos(πk(2m+1) / 2N) · cos(πl(2n+1) / 2N)

where x(m, n) is an NxN field (the blocksize); k, l, m, n all range from 0 to N-1; and

    a(0) = 1/√2;
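A direct, unoptimized implementation of this NxN transform, sketched in Python for concreteness (the patent itself gives no code, and the 2·a(k)·a(l)/N normalization shown above is the assumed reading of the garbled original):

```python
from math import cos, pi, sqrt

def dct_2d(x):
    """Direct N x N discrete cosine transform as defined above.
    O(N^4) by brute force; real codecs use fast factorizations."""
    n = len(x)
    a = lambda j: 1 / sqrt(2) if j == 0 else 1.0
    theta = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            s = sum(x[m][i]
                    * cos(pi * k * (2 * m + 1) / (2 * n))
                    * cos(pi * l * (2 * i + 1) / (2 * n))
                    for m in range(n) for i in range(n))
            theta[k][l] = (2 * a(k) * a(l) / n) * s
    return theta

# A constant 8x8 block puts all its energy in the DC coefficient:
out = dct_2d([[1.0] * 8 for _ in range(8)])
assert abs(out[0][0] - 8.0) < 1e-9
assert all(abs(out[k][l]) < 1e-9
           for k in range(8) for l in range(8) if (k, l) != (0, 0))
```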
`
`
`
`
`5,461,679
`
`5
a(j) = 1 for j ≠ 0. A range of DCT "block sizes" have been investigated for image compression (A. Netravali, et al., Picture Coding: A Review, Proc. IEEE, pp. 366-406, March 1980), and standards bodies have decided, apparently in an arbitrary fashion, that 8x8 blocks are "best." Adaptive block sizes have also been considered, with the adaptation driven by image activity in the area to be transformed (see Chen, C. T., "Adaptive Transform Coding Via Quad-Tree-Based Variable Block Size," Proceedings of ICASSP '89, pp. 1854-1857). In summary, a combination of the above techniques, as applied to a raw video image, would be performed as follows:

1. Digitize the image;
2. transform RGB to YUV;
3. remove temporal redundancy (through frame differencing and motion compensation);
4. remove spatial redundancy (through a discrete cosine transform); and
5. entropy encode the data (using Huffman coding).

This process yields the maximum compression possible using prior state of the art techniques.
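Step 2, together with the chrominance subsampling described under Decorrelation, can be sketched as follows. The 3x3 matrix is the approximation given earlier; the 2x2 box average stands in for the low pass filter plus decimation, whose exact form the text does not specify.

```python
# Rows of the approximate RGB -> YUV transform given above.
M = [(.30,  .59,  .11),   # Y
     (.60, -.28, -.32),   # U
     (.21, -.52,  .31)]   # V

def rgb_to_yuv(r, g, b):
    return tuple(m[0] * r + m[1] * g + m[2] * b for m in M)

def subsample_2x(plane):
    """Decimate a chrominance plane by 2 in each direction,
    averaging each 2x2 box (a crude filter-and-decimate stand-in)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

y, u, v = rgb_to_yuv(255, 255, 255)       # white
assert abs(y - 255) < 1e-6                # Y weights sum to 1.0
assert abs(u) < 1e-6                      # no chrominance in gray
assert subsample_2x([[4.0, 4.0], [4.0, 4.0]]) == [[4.0]]
```

Each chrominance plane shrinks by a factor of four, which is the source of the "data reduction by a factor of two" claimed for the whole YUV image.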
`
`COMPRESSION STANDARDS
`
Three examples of state of the art compression methods using some of these techniques are known as: CCITT H.261 (from the International Consultative Committee for Telephony and Telegraphy); JPEG (Joint Photographic Experts Group); and MPEG (the Motion Picture Experts Group). The JPEG standard was developed for encoding photographic still images and sets forth a basic technique for encoding video image data. The technique converts 8x8 pixel blocks of the source image using a discrete cosine transformation (DCT) function, with each block of pixels being represented in YUV source format (representing luminance and chrominance information for the block). Psychovisual thresholding matrices are then used to quantize the DCT coefficients of the 8x8 macroblocks of the source image. Finally, each of the blocks is entropy encoded. The decoding process reverses these steps.
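The quantization step can be illustrated with a toy matrix. The entries of a real psychovisual matrix are standard-specific and are not given in this text, so the Q below is purely illustrative.

```python
def quantize(coeffs, q):
    """Divide each DCT coefficient by its matrix entry and round."""
    n = len(coeffs)
    return [[round(coeffs[i][j] / q[i][j]) for j in range(n)] for i in range(n)]

def dequantize(levels, q):
    """Approximate inverse used by the decoder."""
    n = len(levels)
    return [[levels[i][j] * q[i][j] for j in range(n)] for i in range(n)]

Q = [[16, 24], [24, 40]]            # illustrative 2x2 quantization matrix
C = [[80.0, -30.0], [25.0, 4.0]]    # illustrative DCT coefficients
levels = quantize(C, Q)
assert levels == [[5, -1], [1, 0]]  # small high-frequency terms vanish
assert dequantize(levels, Q) == [[80, -24], [24, 0]]
```

The rounding is where the loss occurs; everything after it (entropy coding) is reversible.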
The CCITT H.261 standard was developed for use in video teleconferencing and video telephony applications. It can operate at 64 kilobits (Kbits) per second to 1.92 megabits (Mbits) per second, and can operate upon images between 525 and 625 lines based upon a common intermediate format (CIF). It is performed using a method as shown in FIG. 1.

The CCITT encoder 100 consists of a DCT, a zig-zag scanned quantization, and Huffman coding. DCT 101, quantizer 102, and variable length coding 103 blocks perform the coding function. Multiplexer 104 combines the Huffman code from the variable length coding block 103, motion vector data from motion estimation block 105, quantizer data from quantizer block 102, and intra/inter type information from intra/inter block 106, and performs formatting and serializing, video synchronization and block addressing. Frame memory 107 is used to determine differences between the previous frame and the current frame using motion estimation block 105. CCITT encoder 100 further comprises inverse quantizer 108 and inverse DCT function 109 to provide frame difference information. Lastly, information multiplexed by 104 is passed to rate control 111 and buffer 112 for output as compressed bit stream 120.
The CCITT decoder is shown in FIG. 2 as 200. Demultiplexing block 201 takes the encoded bit stream 210, identifies its constituents and routes them to the relevant parts of decoder 200. The main function of variable length decoding block 202 and inverse quantizer block 203 is to reconstruct the DCT coefficients from their Huffman encoded values, rescale these values and pass them on to inverse DCT block 204. Inverse DCT block 204 takes coefficients in blocks of 8x8 and reconstitutes the underlying spatial video information. If the macro block is intra-coded, no motion estimation is invoked. If it is inter-coded, the output is a difference between the information in this frame and the motion-compensated information in the last frame. A motion vector transmitted from demultiplexer 201 via "side information" signal 208 determines the best block to be used for reconstruction from the last frame. Motion compensation is performed by 206 from information of the current image in frame buffer 207. This is fed back into the decoded stream 205 and then output as decoded information 220 in CIF format. The Y and UV components share the same motion vector information. A detailed description of the CCITT H.261 standard is given in document No. 584, published on Nov. 10, 1989 by the Specialists Group on Coding For Visual Telephony, entitled Draft Revision of Recommendation H.261, published by the CCITT SG XV Working Party XV/1 (1989).
The MPEG standard is the most recent compression specification to use transport methods which describe motion video. Though not fully finalized, the MPEG specification's goal is to obtain VHS quality on reconstruction of the images with a bit rate of approximately 1.15 megabits per second for video. This yields a total compression ratio of about 40-1. The distinguishing feature of MPEG from JPEG and CCITT H.261 is that MPEG provides a higher quality image than CCITT H.261, like JPEG, but allows motion. This is in contrast to JPEG, which only provides still-frame imagery and no audio. In addition, MPEG adds the feature of synchronized sound along with the encoded video data, although it has not been finalized. A detailed description of MPEG may be found in the document entitled MPEG Video Simulation Model 3 (SM3)-Draft No. 1, published by the International Organization for Standardization, ISO-IEC/JTC1/SC2/WG8, Coded Representation of Picture and Audio Information, ISO-IEC/JTC1/SC2/WG8 N MPEG 90/, published by A. Koster of PTT Research.

Some of the relative advantages and disadvantages of the various coding algorithms are as follows. JPEG provides no description of motion video at all. MPEG, although a full featured standard (it provides forward motion, backwards motion, and still frame), is still under development and undergoing revision. CCITT H.261, because it was developed for teleconferencing and video telephony, provides a moving source but has no provisions for viewing the motion picture images in a reverse direction, nor does it provide any means for still frame viewing. Therefore, a system is required which is fairly mature, such as the CCITT H.261 standard, but yet provides all the capabilities (including reverse play and still frame) of a full-featured compression system, such as the MPEG standard.
CCITT H.261 uses a scheme such as that shown in FIG. 3 in order to provide for full-motion video. FIG. 3 shows a series of frames which represents a particular section of moving video. 301 and 302 contain full scene information for the image at the beginning of a series of frames. 301 and 302 are known as "intra" frames or keyframes, which are used in CCITT H.261. Each intra frame 301 or 302 contains a full scene description of the frame at the time it appears. Although compressed, intra frames 301 and 302 contain substantial information. Each of the intervening frames
`
`
`
`
`5,461,679
`
`7
between two intra frames 301 and 302 are known as "inter" frames 303, 304, and 305. Each inter frame such as 303-305 contains information which should be added to the preceding frame. For example, inter frame 303 only contains information which has moved since intra frame 301. Therefore, the information represented in frames 303-305 may be substantially less than that contained in frames 301 and 302, because the inter frames contain only motion data, and not complete scene information for the entire frame. This provides a fairly high compression ratio for intervening inter frames 303-305. CCITT H.261 as represented in FIG. 3 is incapable of providing reverse motion video because a "key" frame, such as intra frame 301, only establishes information for inter frames 303-305 which follow intra frame 301 in time. In other words, 303-305 only contain information which has moved from intra frame 301, not motion information from intra frame 302. An attempt to play such a sequence of frames in reverse will generate substantial distortion of the moving image.

Because a decompression rate of approximately 30 frames per second (FPS) is required for real-time moving video, the processor performing the decoding process must have a fairly high bandwidth and be able to handle all the necessary matrix-matrix multiply operations required by the decoding process in a short period of time. To date, no single device possesses the necessary computing power to decompress an incoming compressed bit stream at the necessary rate to make data available for NTSC quality video at the 30 frame per second rate.
`
`5
`
`8
`instead of that normally connected to the host. This provides
`increased performance of the system as a whole, especially
`in video decompression tasks.
`These and other objects of the present invention are
`provided for by a method in a computer system for parti(cid:173)
`tioning an image for processing by a parallel computing
`system. The parallel computing system comprises N com(cid:173)
`puting units. First, the total length of an image is determined,
`and is divided by N. The dividend is then stored in a first
`value, the first value, in a preferred embodiment, the width
`of the image to be assigned to each parallel computing unit.
`A first region of the image is assigned to a first computing
`unit, the first region starting at a first position, and ending at
`the first position offset by the first value. Therefore, a portion
`of the image is assigned, in a preferred embodiment which
`is the full image in width, and HIN wherein H is the length
`or height of the image, and N is the total number of
`processors. Each of the N processors are assigned corre(cid:173)
`sponding sections of the image according to their position
`20 relative to the first processor, each having a section which is
`the full width of the image, and which is HIN in length.
`Height and width information is represented, in the preferred
`embodiment, in blocks containing luminance and chromi(cid:173)
`nance information.
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`The present invention is illustrated by way of example
`and not limitation in the figures of the accompanying in
`30 which like references indicate like elements and in which:
`FIG. 1 shows a prior art video encoder.
`FIG. 2 shows a prior art video decoder.
`FIG. 3 shows a prior art scheme for representing motion
`35 video.
`FIG. 4 shows the architecture of the video processor of the
`preferred embodiment.
`FIG. 5 shows one compute module used in the preferred
`embodiment of the present invention.
`FIG. 6 shows the partitioning of an image for processing
`by each of the computing modules of the preferred embodi(cid:173)
`ment.
`FIG. 7a shows the preferred embodiment's method of
`encoding motion video.
`FIG. 7b is a detailed representation of one frame in the
`preferred embodiment.
`FIG. Sa shows the improved CCITI encoder used in the
`preferred embodiment.
`FIG. Sb is a detailed representation of the scene change
`detector of the preferred embodiment.
`FIG. 9 shows the enhanced CCITI decoder used in the
`preferred embodiment.
`
`SUMMARY AND OBJECTS OF THE
`INVENTION
`
`One of the objects of the present invention is provide an
`architecture and method which has sufficient computing
`power to allow compressed moving video images to be
`decompressed and displayed in real time.
`This and other objects of the present invention are pro(cid:173)
`vided for by ·an apparatus for processing video data for
`compression/decompression in real-time which comprises a 40
`plurality of compute modules, in a preferred embodiment,
`for a total of four compute modules coupled in parallel. Each
`of the compute modules has a processor, dual port memory,
`scratch-pad memory, and an arbitration mechanism. In a
`preferred embodiment, the processor is a digital signal 45
`processor, and the device comprises 16 kilobytes of dual(cid:173)
`port dynamic random access memory and 64 kilobytes of
`local "scratch pad" dynamic random access memory. A first
`bus couples the compute modules and a host processor. In a
`preferred embodiment, the host processor is coupled to a 50
`complete computer system comprising a display, memory,
`and o
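The partitioning method above can be sketched directly. Heights that do not divide evenly by N are not addressed in the text; here the last unit simply absorbs the remainder rows, which is one plausible choice.

```python
def partition_image(height, n_units):
    """Split `height` rows among n_units processors as contiguous
    horizontal strips of roughly height // n_units rows each."""
    strip = height // n_units
    regions = []
    for i in range(n_units):
        start = i * strip
        # Last unit takes any remainder rows (choice not specified in text).
        end = height if i == n_units - 1 else start + strip
        regions.append((start, end))
    return regions

# Four modules, 480-row image: four 120-row strips, full image width.
assert partition_image(480, 4) == [(0, 120), (120, 240), (240, 360), (360, 480)]
```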