`RECENT DEVELOPMENTS IN THE DESIGN
`OF IMAGE AND VIDEO PROCESSING ICs
`
`Konstantinos Konstantinides and Vasudev Bhaskaran
`Hewlett-Packard Laboratories
`
`1. Introduction
Hardware support for image processing is very important in emerging applications
such as desktop publishing, medical imaging, and multimedia. In the past, computational
needs for intensive image processing tasks, such as image analysis of satellite
data and pattern recognition, were satisfied with custom, complex, and expensive
image processing architectures [1], [2]. However, the latest scientific workstations
have enough compute power for low-cost desktop image processing. Furthermore,
traditional image processing operations, such as texture mapping and warping,
are now combined with conventional graphics techniques. Hence, there is
increased interest in accelerated image and video processing on low-cost computational
platforms, such as personal computers and scientific workstations.
`
Many developers already provide some type of image processing support, based
mostly on general purpose microprocessors or digital signal processors (DSPs),
such as the i860 from INTEL, or the TMS320 family from Texas Instruments.
However, new application areas, such as high-definition TV (HDTV) and video
teleconferencing, demand processing power that existing general purpose DSPs
cannot provide.
`
Until recently, commercially-available image processing ICs performed only low-level
imaging operations, such as convolution, and had very limited, if any, programming
capability. This was due to the well defined operation and data representation
of the low-level imaging functions; furthermore, there was a general feeling
that the image processing market was quite small. Emerging standards, such as
JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts
Group) for image and video compression, and the opening of new market areas,
such as multimedia computing, make it easier now for manufacturers to invest in the
design and development of a new generation of image and video processing ICs.
For example, image compression ICs are now becoming widely available. Thus, a
new generation of general purpose image processors will soon emerge to extend the
capabilities of general purpose DSPs.
`
`M. A. Bayoumi et al. (eds.), VLSI Signal Processing Technology
`© Kluwer Academic Publishers 1994
`
`Petitioners HTC and LG - Exhibit 1036, p. 25
`HTC and LG v. PUMA, IPR2015-01502
`
`
`
`26
`
In this paper we present a brief overview of the specific requirements in image processing
and some recent developments in the design of image and video processing
ICs. Both application-specific (ASIC) and programmable image processors are discussed.
For the application-specific processors, special emphasis is given to the processing
requirements of recent standards in image and video compression. We close
with a discussion of a "generic" general purpose image processing architecture. As
general purpose DSPs share many common features (a Harvard architecture, multiple
data memories, etc.), we expect that future general purpose image processors
will share many of the features embodied in the generic design.
`
`2. Image Processing Requirements
`
`Image processing is a broad field that spans a wide range of applications such as
`document processing, machine vision, geophysical imaging, multimedia, graphics
arts, and medical imaging. A careful examination of the imaging operations needed
for these applications suggests that one can classify the image processing operations
into low, intermediate, and high levels of complexity. In low-level processing (e.g.,
filtering, scaling, and thresholding), all operations are performed in the pixel domain,
and both input and output are in the form of a pixel array. In intermediate-level processing,
such as edge detection, the transformed input data can no longer be
expressed as just an image-sized pixel array. Finally, high-level processing, such as
feature extraction and pattern recognition, attempts to interpret this data in order to
describe the image content.
`
Because of this large range of operations, it seems that every conceivable type of
computer architecture has been applied at one time or another to image processing
[2], and that so far there is no single architecture that can efficiently address all possible
problems. For example, Single Instruction Multiple Data (SIMD) architectures
are well suited for low-level image processing algorithms; however, due to their limited
local control, they cannot address complex, high-level algorithms [3]. Multiple
Instruction Multiple Data (MIMD) architectures are better suited for high-level
algorithms, but they require extensive support for efficient inter-processor communication,
data management, and programming. Regardless of the specifics, every
image processing architecture needs to address the following requirements: processing
power, efficient data addressing, and data management and I/O.
`
`2.1 Processing Power
`
To better understand the computational requirements of image processing algorithms,
consider first the 1-D space. In 1-D digital signal processing, a large class of
algorithms can be described by

    Y(i) = Σ_{k=1}^{p} C_k X(i - k),                                    (1)

where {X(i)} and {Y(i)} denote the input and output data, and {C_1, C_2, ..., C_p} are
algorithm-dependent coefficients. Using a general purpose DSP that has a multiply-accumulate
unit and can simultaneously access the input data and the coefficients,
an output sample can be generated every p cycles.
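As a software sketch of the computation in Eq. (1) (the function name and Python form are ours, not from the text), each output sample costs p multiply-accumulates:

```python
def fir_1d(x, c):
    """Eq. (1): Y(i) = sum over k=1..p of C_k * X(i-k).

    On a DSP with one multiply-accumulate (MAC) unit that fetches a
    data word and a coefficient each cycle, each output takes p cycles.
    """
    p = len(c)
    # start at i = p so every index i-k into x is valid
    return [sum(c[k - 1] * x[i - k] for k in range(1, p + 1))
            for i in range(p, len(x))]
```

For example, the two-tap sum `fir_1d([1, 2, 3, 4, 5], [1, 1])` yields `[3, 5, 7]`.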
`
In image processing, a large class of algorithms can be described by

    Y(i, j) = Σ_{m=1}^{p} Σ_{n=1}^{q} C_{mn} X(i - m, j - n),           (2)

where X and Y are the input and output images, and C is a p x q matrix of algorithm-dependent
coefficients. From (2), a single DSP now requires at least pq
cycles to generate an output pixel. In a multi-DSP system, at least q DSPs have to
be used to generate an output sample every p cycles, provided that all processors
have direct access to the data and there exists an efficient inter-processor communication
and data distribution mechanism. Thus, the first main requirement in image
processing is processing power.
`
`2.2 2-D Addressing
`
Another major requirement in image processing is efficient data addressing. In conventional
DSPs, a simple p-modulo counter is adequate for the data addressing in
(1) [4]. However, in image processing, the addressing schemes may be far more
complicated. For example, Fig. 1 shows an N x M image, and a p x p kernel of
coefficients positioned with its upper left corner at address A of the image. From
(2), to compute an output sample, one possible scheme is to sequentially access the
image data by generating addresses in the following sequence:
    A,           A+1,           ...,  A+(p-1),
    A+N,         A+N+1,         ...,  A+N+(p-1),
    A+2N,        A+2N+1,        ...,  A+2N+(p-1),
    ...
    A+(p-1)N,    A+(p-1)N+1,    ...,  A+(p-1)(N+1).
Fig. 2 shows a 2-D address arithmetic unit (AAU) as implemented on the Video
DSP by Matsushita [5]. This AAU has two parts, the controller and the address generator.
The AAU operation is controlled by the values in five registers: SA, NX,
NY, DX, and DY, as shown in Fig. 2. In our case, SA=A, NX=p, NY=p, DX=1,
and DY=N-(p-1). To generate addresses from A to A+(p-1), the adder is
incremented by DX=1. After each row of the kernel is processed, the "Row End"
signal enables DY=N-(p-1) as the adder input, and an address is generated for
the beginning of the next row. Operations continue until the "End of Block" signal
is enabled.
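The register-driven behavior just described can be emulated in a few lines (a behavioral sketch, not the actual hardware):

```python
def aau_addresses(SA, NX, NY, DX, DY):
    """Emulate the Fig. 2 AAU: one adder stepped by DX within a row,
    by DY at "Row End", stopping at "End of Block" after NY rows."""
    addr, out = SA, []
    for row in range(NY):
        for col in range(NX):
            out.append(addr)
            if col < NX - 1:
                addr += DX            # step to the next data in the row
        if row < NY - 1:
            addr += DY                # jump to the start of the next row
    return out
```

For a 2 x 2 kernel at A = 0 in an image with N = 4 pixels per line, SA=0, NX=NY=2, DX=1, DY=N-(p-1)=3 gives the sequence A, A+1, A+N, A+N+1, i.e. [0, 1, 4, 5].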
`
The HSP45240 IC from Harris Semiconductor is a programmable 24-bit address
sequence generator for image processing applications. It can be configured to generate
addresses for filtering, FFT, rotation, warping, zooming, etc. The Harris IC is
ideal for block-oriented address generation. Five configuration registers allow a user
to define such parameters as the beginning address of a sequence, the block size,
`
`
`..
`
`A
`
`N------i .. ~
`
`/
`
`A+(p-1)
`
`N
`A+
`
`"'"
`
`f- ~rt
`r- ~,
`-p.-
`
`1
`
`M
`
`A+(p-1 )(N+ 1 )
`
`Fig. 1 : 2-D addressing in image processing.
`
[Figure omitted: the AAU comprises a controller and an address calculator that
produces the address output. Register legend: SA: start address; NX: number of
data per row; NY: number of data per column; DX: step size in X direction;
DY: step size in Y direction.]

Fig. 2 : Block diagram of a 2-D address generator.
`
`address increment, number of blocks, etc. A crosspoint switch can be used to
`reorder the address bits. This allows for bit-reverse addressing schemes used in
`FFT computations.
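Bit-reverse addressing of this kind reduces to a fixed permutation of the address bits; a small model (ours) of what the crosspoint switch accomplishes:

```python
def bit_reverse(addr, nbits):
    """Return addr with its low nbits in reversed order, the access
    pattern a radix-2 FFT needs for in-place reordering."""
    out = 0
    for _ in range(nbits):
        out = (out << 1) | (addr & 1)   # shift the LSB of addr into out
        addr >>= 1
    return out
```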
`
Recent image processor ICs use very long instruction words (VLIW) for the control
of multiple AAUs, ALUs, and function units. A programmer has to
simultaneously control not only the two to three address generators, but also all the
other function units. Kitagaki et al. [6] presented an AAU design that allows the
user to control the address generator controller using high-level code words. Given
the high-level code word, an address microsequencer then generates the full
sequence of addresses required in a particular application. The design allows for 17
different high-level, commonly used image processing addressing modes, such
as block raster scan, neighborhood search, and 1-D and 2-D indirect access, and has
special modes for FFT and affine-transformation addressing.
`
2.3 Data Storage and I/O
`
A 512 x 512, 8-bit grayscale image requires 262 Kbytes of storage, and a 24-bit,
1024 x 1024 color image requires approximately 3 Mbytes of storage. Considering
that many applications require data processing in real time (12-30 frames per second
or more), it is easy to see why efficient data storage and I/O are extremely
important in image processing architectures.
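The storage figures above follow from simple arithmetic, and the implied real-time bandwidth is worth noting (the 30 frames/sec rate is taken from the text):

```python
# 512 x 512 grayscale at 8 bits (1 byte) per pixel
gray_bytes = 512 * 512 * 1            # 262,144 bytes, i.e. 262 Kbytes
# 1024 x 1024 color at 24 bits (3 bytes) per pixel
color_bytes = 1024 * 1024 * 3         # 3,145,728 bytes, about 3 Mbytes
# sustained input rate for the color image at 30 frames/sec
bytes_per_sec = color_bytes * 30      # about 94 Mbytes/sec
```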
`
Systolic architectures address this problem by distributing the data across an array
of processors and allowing only local data transfers. Many DSPs achieve efficient I/O
by integrating a local Direct Memory Access (DMA) engine with the rest of the
architecture [4]; furthermore, I/O operations and computations are usually performed
in parallel. As we will see next, similar techniques can also be applied to the
design of image processor ICs.
`
Image processing architectures can be divided into the following broad categories:
dedicated image processor systems, image processing acceleration systems, and
image processor ICs. The dedicated image processor systems usually include a host
controller and an imaging subsystem that includes memory, customized processing
ICs, and custom or commercially-available math processing units, connected in a
SIMD, MIMD, or mixed-type architecture. Most of these systems are being developed
either at universities for research in image processing architectures, or at corporate
laboratories for in-house use or for specific customers and application areas
[3], [7], [8].
`
Image processing acceleration systems are usually developed as board-level
subsystems for commercially-available personal computers or technical workstations.
They use a standard bus interface (VME, EISA, NuBus, etc.) to communicate
with the main host, and usually include memory, one to four commercially-available
DSPs (such as the TMS320C40 from Texas Instruments) or high performance
microprocessors (such as the INTEL i860), a video port, and a frame buffer. Most
of the board-level subsystems also include floating-point arithmetic unit(s) and have
a bus-based architecture [9].
`
In this paper, the emphasis will be on image processor ICs. These are either application-specific
circuits (for example, image compression ICs) with limited programming
capability, or programmable circuits designed for a specific range of applications
(HDTV, video conferencing, etc.). No single article can cover all existing
`
`designs. Hence, we present only a selection of chips that cover the current trends in
`hardware for image and video processing.
`
`3. Application Specific Image Processor ICs
`
Most image-processing functions can be implemented on a general purpose digital
signal processor, and several vendors offer such solutions today. Such solutions are
acceptable for testing the functionality of a particular image processing algorithm.
However, they are generally not amenable to real-world applications due to the processing
requirements of those applications and the resulting high cost associated
with the use of a multi-DSP system.
`
For example, in computer-vision systems, one might want to perform a semi-automated
PC board inspection. This task requires imaging a PC board at video
rates (256 x 256 x 8 bits/pixel acquired at 30 frames/sec), and performing edge-detection
on the digital image followed by binarization of the edge-detected image.
Then, a template matching function might be performed on the binary image to
determine faulty traces on the board. The above-mentioned tasks require at least
120 million operations per second of processing capability. Most general purpose
DSPs are not capable of delivering such a high MOP count at sustained rates. To
make applications like these feasible, IC vendors offer single-chip solutions for
many of these image processing tasks.
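The 120-MOPS figure can be sanity-checked with simple arithmetic; the per-pixel budget below is implied by the numbers in the text, not stated there:

```python
pixel_rate = 256 * 256 * 30            # pixels/sec at video rate: 1,966,080
ops_budget = 120_000_000 / pixel_rate  # ~61 operations per pixel across
                                       # edge detection, binarization, and
                                       # template matching combined
```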
`
Image compression is another area where a generic DSP solution is not acceptable.
With the emergence of multimedia computing and low-cost imaging peripherals
such as color scanners and color printers, there is a large body of emerging applications
that will incorporate high-quality images and/or digital video. To facilitate the
delivery and storage of high-quality images and digital video, considerable effort is
being expended in developing compression standards for such data-types. Presently,
the JPEG compression standard for still images and the MPEG compression standard
for digital video (primarily for CD-ROM based applications) have motivated
several IC vendors to develop single-chip solutions conforming to these standards.
Desktop video conferencing applications have also motivated IC vendors towards
offering custom chip solutions conforming to the Px64 (H.261) video compression
standard. It is envisioned that workstation-based computing systems in the next few
years will contain functionality offered by such chips, enabling the workstations
to handle still images and video in real time. In this section we
will describe application-specific imaging ICs suitable for low-level image processing
and image compression.
`
`3.1 Image Processing ICs for Computer-Vision Systems
`
`In computer vision and most traditional image processing systems, a significant
`amount of processing is performed with low-level image processing functions, such
`as image enhancement and edge-detection. For such applications there exist several
`high-performance ICs.
`
`
`ICs for Image Filtering
`
LSI Logic [10] offers the L64240 multi-bit finite impulse response filter (MFIR).
This chip is a transversal filter consisting of two 32-tap sections, and each 32-tap section
comprises four separate 8th-order FIR filter sections. The output can be
generated at a 20 MHz sample rate with 24 bits of precision. The data and coefficients
supplied to each filter section can be 8 bits wide. Each filter cell within a filter section
consists of a multiplier and an adder which adds the multiplier output and the
adder output of the preceding filter cell. The multiplier accepts 8-bit inputs (data and
coefficients) and the adder accepts 19-bit inputs. The adder output serves as the
adder input to the next filter cell within this FIR filter.
`
A format adjust block at the end of each 32-tap section provides the capability to
scale (with saturation), threshold, clip negative values, invert, or take the absolute value
of the output (these functions are useful in image enhancement and in image display).
Since the convolution kernel can be loaded synchronously with the device
clock, it is possible to perform adaptive filtering and cross-correlation with this
device. By changing the kernel (coefficient data), this filter can be used for edge
sharpening (useful as an image enhancement function), noise reduction (a preprocessing
step in many imaging applications where the image is acquired from a noisy
source), and image resampling (for printing).
`
The L64240 is reconfigurable and can perform 1-D, 2-D, and 3-D filtering. By cascading
several MFIRs, input data and convolution kernel precision can be extended
to 24 bits. This is accomplished by splitting the data sample into its high- and low-order
bits, performing convolutions separately on these two streams, and combining
the partial results. Furthermore, complex filtering can also be accomplished by configuring
the MFIR as four 16-tap filter sections and feeding the real and imaginary
parts of the data into these filter sections.
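The precision-extension trick relies only on the linearity of convolution: with x = 256*hi + lo, filtering the byte streams separately and recombining gives the full-precision result. A sketch (function names ours):

```python
def fir(x, c):
    """Plain FIR pass: one output per fully-overlapped window."""
    p = len(c)
    return [sum(c[k] * x[i - k] for k in range(p))
            for i in range(p - 1, len(x))]

def fir_16bit_via_8bit(x, c):
    """Filter 16-bit samples using two 8-bit passes, as when cascading
    MFIRs: split into high/low bytes, filter each, recombine."""
    hi = [s >> 8 for s in x]          # high-order byte stream
    lo = [s & 0xFF for s in x]        # low-order byte stream
    return [256 * h + l for h, l in zip(fir(hi, c), fir(lo, c))]
```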
`
`ICs for Image Enhancement
`
In many imaging applications, the image data is acquired from a noisy source, e.g.
a scanned image from a photocopy, and some noise reduction function needs to be
applied to this image. One such technique is rank-value filtering, e.g. median
filtering. An IC that performs this task at a 20 MHz rate is the L64220 from LSI
Logic [10]. Like the MFIR, this is a 64-tap reconfigurable filter. However, the output
is drawn from a sorted list of the input values, rather than being a weighted sum of
the inputs like the MFIR's filtering function. The filter can be configured to perform
rank-value operations on 8x8, 4x16, 2x32, or 1x64 masks. The rank-value functions
include the min, max, and median functions. The processor is pipelined, and each stage
of the pipeline operates on one bit of the input data. The total number of ones in the
input data at a certain bit position (in one stage) is counted. If this number exceeds
(or equals) the rank specified by the user, the output bit for that position is one. A
mask (which indicates which data words might still affect the output) is passed from
stage to stage and modified. The basic operations performed include ANDing (for
masking), summing
`
`and comparing.
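The stage-per-bit pipeline just described amounts to a binary search over the bit-planes; a behavioral model (ours, not the chip's datapath) of how a rank is selected:

```python
def rank_value(window, rank, nbits=8):
    """Bit-serial rank filter in the spirit of the L64220: one stage
    per bit, MSB first.  rank = r selects the r-th largest value;
    r = (len(window) + 1) // 2 gives the median."""
    active = list(window)              # the mask: words still in contention
    result = 0
    for b in range(nbits - 1, -1, -1):
        ones = [w for w in active if (w >> b) & 1]
        if len(ones) >= rank:          # enough ones: this output bit is 1
            result |= 1 << b
            active = ones              # words with a 0 here are masked out
        else:                          # output bit is 0; larger words drop out
            rank -= len(ones)
            active = [w for w in active if not (w >> b) & 1]
    return result
```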
`
In many image processing applications the acquired image needs to be enhanced for
incorporation within an electronic document. Image enhancement in this context
implies equalizing the histogram [11] of the image. This function requires computing
the histogram of the input image and then applying a function to the input which
equalizes the histogram [11]. The L64250 IC from LSI Logic performs the histogram
operation at a 20 MHz rate on data sets as large as 4K x 4K pixels. Similarly, the
HSP48410 "Histogrammer" IC from Harris Semiconductor can perform signal and
image analysis at speeds up to 40 MHz.
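Once the histogram is available, the equalization mapping itself is just a scaled cumulative sum; a minimal sketch (the function and its discrete rounding are ours):

```python
def equalization_lut(hist):
    """Build a look-up table mapping each input level through the
    scaled cumulative histogram, so output levels are used more evenly."""
    levels = len(hist)
    total = sum(hist)
    lut, cum = [], 0
    for count in hist:
        cum += count                   # running (cumulative) pixel count
        lut.append(round((levels - 1) * cum / total))
    return lut
```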
`
`ICs for Computer-Vision Tasks
`
The L64250 chip can also perform a Hough transform. The Hough transform technique
maps lines in Cartesian coordinates to points in polar coordinate space [11].
This property can be used to detect lines in the input image. Line detection is
important in many computer-vision tasks, such as detecting connectivity between
two points on a PC board.
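A minimal accumulator-space sketch of the technique (illustrative only, not the chip's implementation):

```python
import math

def hough_accumulate(points, n_theta=180):
    """Each point (x, y) lies on every line r = x*cos(t) + y*sin(t),
    so it votes once per quantized angle; collinear points pile their
    votes into a shared (theta, r) accumulator cell."""
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            r = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(t, r)] = acc.get((t, r), 0) + 1
    return acc
```

The peak accumulator cell then identifies the dominant line's angle and distance from the origin.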
`
Another IC useful in computer-vision tasks and optical character recognition (OCR)
is the object contour tracer (L64290 from LSI Logic). An internal image RAM
stores a 'binary' image of up to 128 x 128 pixels. It also stores a 'MARK' bit for
each pixel, indicating contour points that have already been traced. This chip
locates contours of objects in the image and, for each contour, returns the (x, y)
coordinates, discrete curvature, bounding box, area, and perimeter. In an OCR
application, this data can be part of a feature set used to perform template
matching in feature space to recognize the character (whose 128x128 image is
input to the contour tracer). In conventional computer-vision applications, the contour
tracer can be used as a component in a system to identify objects. This could
be the front-end signal processing component of a robot. For an N x M image, the
contour tracer takes 4.5 NM cycles to generate the contours.
`
This IC can also be used for the compression of black-and-white images, wherein
the contour information represents the compressed data. For most textual documents,
this could yield very high compression ratios compared with the traditional
Group III and Group IV fax compression methods.
`
In image recognition tasks, template matching might be employed to identify an
object within a scene. In many applications the template matching is performed in
image space. To facilitate this process, the L64230 (from LSI Logic) performs
binary template matching with a reconfigurable binary-valued filter with 1024 taps.
The filter can operate at 20 MHz in various configurations including 1-D and 2-D
filtering, template matching, and morphological processing.
`
`ICs for Image Analysis
`
`In this category, we find several ICs that accept spatial domain image data and
`
generate non-image data. Most of the ICs in this class perform some form of frequency
domain processing, and the FFT is the most popular function. Plessey Semiconductor
and LSI Logic (L64280) [10], among others, have high-speed FFT processors
that are capable of computing FFT butterflies at a 20 MHz rate.
`
`3.2 ICs for Image and Video Compression
`
Workstations today offer high performance and support large storage capacities,
and are thus viable platforms for multimedia applications that require images and
video. However, the bandwidth and computation power of the workstations are not
adequate for performing the compression functions needed for the video and image
data. A single 24-bit image at 1K x 768 resolution requires 2.3 Mbytes of storage,
and a 15-minute animation sequence would require about 60 Gbytes.
To facilitate such applications and multimedia computing in general, there
has recently been a great deal of activity in the image and video compression arena. This
activity has progressed along two fronts, namely the standardization of the compression
techniques and the development of ICs that are capable of performing the compression
functions on a single chip.
`
The image compression standards currently being developed fall into four classes, as
shown in Table 1. The JPEG standard [12] is intended for still images. However, it
is also being used in edit-level video applications, where there is a need to do frame-by-frame
processing. The MPEG (1 and 2) standards apply to full-motion video;
MPEG-1 was originally intended for CD-ROM-like applications wherein encoding
would be done infrequently compared with decoding. MPEG-1 [13] offers VHS-quality
decompressed video, whereas the intent of MPEG-2 is to offer improved
video quality suitable for broadcast applications. Recently, video broadcast companies
have proposed video transmission schemes that would use an MPEG decoder
and a modem in each home. By optimizing the decoder and modem configuration,
the broadcasters are attempting to incorporate up to four video channels in the present
6 MHz single-channel bandwidth. The increased channel capacity has significant
business implications.
`
The Px64 (H.261) standard [14] is intended for video conferencing applications.
This technology is quite mature, but the devices available to accomplish it have
until now been multi-board-level products. As CPU capabilities increase, we expect
some of these functions to migrate to the CPU. However, a simple calculation of
the MIPS (million instructions per second) requirements for these compression
schemes indicates that such an eventuality is at least five years away. For example,
in Table 2, we show the MIPS requirements for a Px64 compression scheme. Note
that the 1000 MIPS requirement for encoding is not going to be achievable in a cost-effective
manner on a desktop in the near future. Recently, several software-only
decompression procedures have been developed for MPEG-1 and Px64 real-time
decoding on workstations. These decompression procedures rely on efficient partitioning
of the decoding process so as to offload some of the computations to the display
processor. For the remaining compute-intensive tasks, several fast algorithms
`
`
Table 1
Image And Video Compression Standards

FEATURES                    JPEG           MPEG-1        MPEG-2        Px64
Full-color still images     Yes            -             -             -
Full-motion video           -              Yes           Yes           Yes
Real-time video capture
  and playback              -              Yes           Yes           Yes
Broadcast-quality
  full-motion video         -              -             Yes           -
Image size (pixels)         64K x 64K      360 x 240     640 x 480     360 x 288
                            (max)
Compression ratios          10:1 to 80:1   200:1 (max)   100:1 (max)   100:1 to 2000:1
Typical data rates
  (compressed data)         -              1.5 Mbps      5-20 Mbps     64 Kbps to 2 Mbps
`
have been devised; for example, there have been several recent developments in
fast inverse DCT (Discrete Cosine Transform) methods that are well suited for these
compression standards. For applications requiring real-time compression and
decompression, based on the MIPS requirements discussed in Table 2, it seems that
application-specific ICs may be the only viable near-term solution. In the JPEG
case, application-specific ICs are useful when dealing with motion sequences. For
still images, it is possible to optimize the JPEG algorithms to accomplish interactive
processing on a desktop workstation.
`
In Table 3, we list representative image and video compression chips that are now
commercially available. Similarly, in Table 4, we list representative image and
video compression chips that are in a research and/or development stage. In addition
to the vendors shown in these tables, other major IC companies, such as Texas
Instruments and Motorola, are planning to introduce both programmable and application-specific
ICs (ASICs) for video compression. Recently, these IC vendors have
also announced variations on the MPEG chips that are appropriate for a low-cost
implementation of decoders for set-top converters. These variations are primarily
intended to: (a) reduce memory requirements and decoding latencies by eliminating
B frames in the MPEG bitstream, (b) support the interlaced scanning mode for video,
and (c) allow fast switching between video channels.
`
All of the compression chips have a common set of core functions, namely a spatial-to-frequency
domain transformation, quantization of the frequency-domain signals, and
entropy-coding of the quantized data. For compression of motion sequences, additional
processing in the temporal domain is also employed. The main processing
flow for JPEG, MPEG, and Px64 is depicted in Fig. 3. We will briefly describe the
main processing functions in these compression schemes.
`
`
`
`
Table 2
MIPS Requirements for Px64 Compression And Decompression
(Image Resolution: 360x288, YCrCb Encoding, 30 frames/sec.)

COMPRESSION                                          MIPS
RGB to YCrCb                                           27
Motion estimation (25 searches in a 16x16 region)     608
Coding mode (motion-vector-only mode,
  interframe coding, or intraframe coding)             40
Loop filtering                                         55
Pixel prediction                                       18
2-D DCT                                                60
Quantization, zig-zag scanning                         44
Entropy coding                                         17
Reconstruct previous frame
  (a) Inverse quantization                              9
  (b) Inverse DCT                                      60
  (c) Prediction + difference                          31
TOTAL                                                 969

DECOMPRESSION                                        MIPS
Entropy decoding                                       17
Inverse quantization                                    9
Inverse DCT                                            60
Loop filter                                            55
Prediction                                             30
YCrCb to RGB                                           27
TOTAL                                                 198
`
`Spatial-to-frequency domain transformation: This is accomplished via an 8x8 2-D
`DCT (Discrete Cosine Transform) performed on the spatial domain data. Often the
`RGB spatial domain data is first converted into a color space suitable for image
`compression. The YCrCb color space is used for this purpose. The 2-D DCT can
`be performed as a straightforward matrix multiply operation or as a matrix-vector
`multiply operation (on 16 vectors), or via a fast DCT algorithm. Straightforward
`matrix-vector multiplication methods are well suited for a hardware implementation
`whereas a programmable image computing engine (such as a DSP) might use a fast
`DCT implementation.
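The matrix-multiply formulation is easy to state: with C the 8-point DCT basis matrix, the 2-D transform is Y = C X C^T. A reference sketch (ours, not any particular chip's datapath):

```python
import math

N = 8

def dct_matrix():
    """8-point DCT-II basis: C[k][n] = s_k * cos((2n+1) k pi / 16)."""
    C = [[0.0] * N for _ in range(N)]
    for k in range(N):
        s = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        for n in range(N):
            C[k][n] = s * math.cos((2 * n + 1) * k * math.pi / (2 * N))
    return C

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dct_2d(X):
    """2-D DCT of an 8x8 block as two matrix multiplies: Y = C X C^T."""
    C = dct_matrix()
    Ct = [list(row) for row in zip(*C)]
    return matmul(matmul(C, X), Ct)    # transform rows, then columns
```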
`
`Quantization: The frequency domain samples are quantized so that some of the
`
`
Table 3
Commercially Available Image And Video Compression ICs

Vendor          Part No.    Standard            Comments
C-Cube          CL550       JPEG                10 and 27 MHz.
Microsystems    CL450       MPEG-1              Decode only.
                CL950       MPEG-1              Decode only. Can be used
                                                at MPEG-2 resolutions.
                CL4000      JPEG/MPEG/Px64      Encode, decode.
AT&T            AVP-1300E   JPEG/MPEG-1/Px64    Encoder only. Data rates
                                                up to 4 Mbps.
                AVP-1400D   JPEG/MPEG-1/Px64    Decoder only. Spatial resolution
                                                up to 1K x 1K.
Integrated      VP          JPEG/MPEG-1/Px64    MPEG-1 encoding at non-real
Information                                     time. Real-time decoding at
Technology                                      352x288 spatial resolution (CIF).
LSI Logic       LS647xx     JPEG/MPEG-1/Px64    Requires up to 7 ICs for
                                                multistandard compression.
                L64702      JPEG                240x352 pixels.
                L64112      MPEG                Video decoder.
SGS-Thomson     STV3208     JPEG/MPEG           Performs 8x8 DCT for JPEG
                                                and MPEG core.
                STI3220     MPEG/Px64           Performs motion estimation for
                                                MPEG and Px64 encoding.
`
frequency components can be eliminated. This results in a loss of detail which worsens
as the compression ratio is increased. The quantization function is a simple
pointwise divide (and rounding) operation.
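As a sketch of the pointwise operation (the step sizes below are arbitrary illustrative values, not a standard table):

```python
def quantize(block, qtable):
    """Pointwise divide-and-round; small high-frequency coefficients
    collapse to zero, which is where the loss of detail comes from."""
    return [[round(v / q) for v, q in zip(brow, qrow)]
            for brow, qrow in zip(block, qtable)]

def dequantize(qblock, qtable):
    """Decoder side: multiply back; the rounding error is unrecoverable."""
    return [[v * q for v, q in zip(vrow, qrow)]
            for vrow, qrow in zip(qblock, qtable)]
```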
`
Entropy-coding: The quantized data takes on few distinct values, and there is usually
a run of zero-valued samples between non-zero-valued samples. A Huffman coding
method applied to these runs is usually employed (e.g. in the baseline JPEG standard).
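The front end of such an entropy coder can be modeled as run/value pair extraction (a simplification of baseline JPEG's run-length categories; the EOB marker is our shorthand):

```python
def run_length_pairs(coeffs):
    """Emit (zero_run, value) pairs for the non-zero quantized
    coefficients; each pair would then be Huffman-coded."""
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:                            # trailing zeros: end-of-block
        pairs.append((0, "EOB"))
    return pairs
```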
`
`
Table 4
Image And Video Compression ICs in R&D Stage

Vendor          Standard               Comments
Matsushita      H.261/JPEG/MPEG        H.261 at 15 frames/sec. 64 Kbps encoder
                                       output rate. 352x288 spatial resolution.
NEC             JPEG/MPEG-1/Px64       352x240 encoding at 30 frames/sec. JPEG
                                       solution requires only two chips.
Pioneer         JPEG/MPEG-1/           25 Mbits/sec (max). Three-chip set,
                MPEG-2/H.261           decodes only.
Array           MPEG-1/Px64, JPEG      2-chip programmable set. MPEG encode for
Microsystems                           SIF, JPEG encode/decode.
Integrated      JPEG, MPEG, Px64       JPEG for 720x480, MPEG for NTSC. MPEG-2
Information                            encoding at CCIR 601 resolution is in
Technology                             non-real time.
Media Vision    Motive video           640x480 pixels. 1:16 max compression;
                encode/decode          not MPEG compliant.
`
Motion-compensation: In CD-ROM applications, since a CD-ROM drive's output
data rate is usually 1-1.5 Mbits/sec and digitized NTSC rates are 80-130
Mbits/sec, compression ratios around 100:1 are sought for CD-ROM based video.
The MPEG-1 standard was motivated by the expectation that many video applications
will be CD-ROM based, e.g. computer-based training. To achieve such high
compression ratios, one needs to exploit both spatial and temporal redundancies.
This is accomplished through motion-compensation. This technique is used in
MPEG and Px64 encoding, wherein frame-to-frame changes are determined and
only the changes are transmitted. In most implementations, some form of block-matching
is performed between blocks (frames are partitioned into non-overlapping
blocks). This is highly parallelizable a