`Robert J. Gove
`
`Texas Instruments, Inc.
`Dallas, Texas 75265
`
`ABSTRACT
`
`We introduce a new highly integrated processing chip that performs a variety of
`functions and is particularly well suited to video compression algorithms.
`Applications include multimedia PCs, virtual reality 3D graphics, full-duplex
`videoconferencing, HDTV, and color hardcopy. We have architected the Multimedia Video
`Processor, or MVP, to provide a previously unattainable level of performance from a single
`chip, together with the programmability typically found in today's general-purpose computers.
`While advanced semiconductor design and process techniques have been used for its
`design, the key advantage of this component lies in the optimization of its architecture
`for real-time video and graphics processing. This paper analyzes video compression
`application requirements, describes the MVP architecture, and presents its potential as a very
`capable solution for a wide range of markets.
`
`INTRODUCTION
`
`The computer and consumer video industries are pursuing varied paths to offer
`cost-effective computing products that provide new forms of information and entertainment.
`Emerging products range from cable TV delivery of interactive digital movies to digital mobile
`offices. Digital compression and video processing at a reasonable cost are spurring this
`revolution. While algorithm developments have been important, most of the enabling
`advances lie in the availability of high-density memory and high-performance processing
`ICs. With the pending general availability of the Multimedia Video Processor, or MVP, in
`1994, a previously unattained level of digital signal processing performance will be available,
`with all the flexibility of present-day programmable computers. Standards-based
`videoconferencing and playback of compressed digital video and audio (using Px64, JPEG, or
`MPEG "multi-standard" codec systems) will be possible with a single MVP processor, as
`will codecs with yet-to-be-defined algorithms such as model-based compression.
`The MVP will not only support compression; it will also handle processing of
`high-resolution video, full-motion video from sources like camcorders, digital
`audio processing, hardcopy raster image processing, and 3D graphics, all under
`software control and generation. From this wide range of functions, we calculated that
`several billion operations per second are required to provide video-based applications on
`the desktop. Current and soon-to-appear desktop host processors such as X86, Pentium,
`Alpha, and MIPS do not have the computational power to meet these demands.
`
`KEYS TO THE MVP ARCHITECTURE
`
`The MVP's unique architecture and computational power enable users to integrate these
`varied functions on a single processing component. The keys to obtaining both exceptional
`processing speeds and fully programmable features with the MVP include the use of:
`
`(1) an efficient parallel processing architecture,
`(2) fast pixel processing tuned to image, video, and graphics processing,
`(3) [text illegible in source],
`(4) single-chip integration without slower chip-to-chip communications.
`
`1068-0314/94 $3.00 © 1994 IEEE
`
`HTC—LG—SAMSUNG EXHIBIT 1006
`
`PRIOR-ART_0010815
`
`
`
`
`Figure 1: MVP Block Diagram (A Single-Chip Parallel Processor)
`[Block diagram. Labeled blocks: DSP Parallel Processors (PPn), Advanced DSP Cores;
`Master Processor (MP), Advanced RISC; on-chip memory modules: Parameter RAM,
`Data RAM 0, Data RAM 1, Data RAM 2, data cache, instruction cache.]
`
`
`ALGORITHM-DIRECTED ARCHITECTURE DEFINITION
`
`Processing Requirements
`Today's proposed international video compression standards use common frequency
`domain, quantization, and entropy coding techniques to (de)compress small portions (8x8)
`of each image. While these functions demand a great deal from the encoder/decoder, many
`other varied functions remain, each with dynamic requirements that vary with the
`type of image compressed as well as the channel rate required to maintain real-time
`operation. For optimal efficiency a processor must adapt to these dynamic needs. A
`typical average of the processing demands of the Px64 videoconferencing standard
`appears in the following table.
`
`RISC vs. MVP-PP Processing Requirements for Px64 (H.261), Full-Duplex, Full-CIF
`
`Columns: Function | RISC Execution Speed (average % of time) * | MVP Execution | Speed-up of MVP-PP vs. RISC
`
`Functions (per-function values are illegible in the source):
`- Motion Estimation - Block Matching (encode)
`- Encoding Decisions - (1) Inter w/motion vectors, (2) Inter w/coded coeff., (3) Intra
`- Loop Filtering (both)
`- Difference image (current - predicted)
`- Fast DCT (encode)
`- Threshold/Quantization/Zig-Zag Run-length
`- IDCT (both) **
`- Reconstruction (both) (predicted + diff. image)
`- Bitstream Decode & Dequantization (decode)
`
`TOTAL CYCLES (MIPS): RISC 1,193 MIPS; MVP 155 PP MIPS ***
`AVERAGE SPEED-UP = 7.7
`
`
`* Multiply counted as one instruction even though most RISCs require many cycles.
`** If the "Truncated-IDCT" algorithm were used, the IDCT speed-up would increase further (see later).
`*** The total is equivalent to 3 MVP-PP processors (see the PP section below).
`**** Audio standards execute concurrently on the MVP-MP (see the MP section below).
`
`As we studied the computational requirements for motion estimation (51%) and DCTs
`(22%), it became quite apparent that a programmable image processor must excel at these
`functions. It is important to recognize that what is done poorly in a processor can dominate
`its performance. Since most architectural improvements would not accelerate all
`functions uniformly, we looked for special architectural features for these critical functions,
`while maintaining enough flexibility to benefit a larger class of algorithms. In the final
`analysis, a more uniform distribution of computational loading resulted after the
`changes.
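
To make the dominant workload concrete, the following is a minimal sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost, the kind of computation the "Motion Estimation - Block Matching" row refers to. The function names, the 8x8 block size, and the +/-2 pixel search window are illustrative assumptions, not details taken from the MVP or from the Px64 standard.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(cur, ref, top, left, size=8, search=2):
    """Full search over a (2*search+1)^2 window; returns (dy, dx, cost)."""
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(ref, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


# Hypothetical test frames: the current frame is the reference frame
# shifted by one row and one column, so one candidate matches exactly.
ref = [[y * 16 + x for x in range(16)] for y in range(16)]
cur = [[0] * 16 for _ in range(16)]
for y in range(15):
    for x in range(1, 16):
        cur[y][x] = ref[y + 1][x - 1]

result = best_motion_vector(cur, ref, top=4, left=4)
print(result)
```

Every candidate position costs one subtract, one absolute value, and one accumulate per pixel, so even this small window evaluates 25 candidates of 64 pixels each for a single block, which illustrates why the function weighs so heavily in the total.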
`
`As seen in the table, the programmable image processor must perform many other
`functions well, including bit manipulation and table look-ups for entropy encoding, and
`multiply and accumulate for various types of filtering operations. To obtain good image
`quality at any channel rate and 30 frames per second, the image processor must compute
`over 1.2 billion operations per second (BOPS).
`
`The addition of audio compression (which requires higher-precision integer and possibly
`floating-point algorithms) and network communication, necessary for videoconferencing
`(G.728 or G.711, H.242, H.230, H.221), further increases the scope of computational
`requirements. To reduce system cost, we propose to include support in the architecture
`for the required nonstandard functions such as color space conversion (YCrCb to RGB),
`decimation of the source image to CIF resolution, and variable scaling of the decompressed
`sequence. Complete implementation of compression applications such as
`videoconferencing requires over 2 BOPS from the programmable image processor.
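
As an illustration of one of the nonstandard functions named above, here is a minimal sketch of a per-pixel YCrCb-to-RGB conversion. The paper does not give a conversion matrix; the full-range coefficients below are the commonly used values derived from CCIR 601 and are an assumption here, as are the function names.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range 8-bit YCbCr pixel to 8-bit RGB.

    Coefficients are an assumption (common CCIR 601-derived values),
    not taken from the paper.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


print(ycbcr_to_rgb(128, 128, 128))
```

Each output pixel needs a handful of multiplies and adds plus clamping, which is the multiply-accumulate and conditional work that the text proposes to absorb into the programmable processor rather than into external logic.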
`
`ARCHITECTURE CHOICES
`
`We considered several candidate parallel architectures for implementation of this single-chip
`video processor [Gove-92, Guttag-92]. An architecture with a mix of dedicated and
`programmable processors was initially evaluated, then discounted when no
`single dominant function was found that was necessary almost all of the time. Besides, we
`predicted that by the time the chip was completed, a new important algorithm would
`emerge. Because dedicating resources to any one function (like a DCT) costs silicon
`efficiency, we felt compelled to seek a general-purpose, well-balanced system
`solution. Several other candidates existed; however, the mix of algorithms and practical
`implementation limitations focused us on SIMD and MIMD architectures. These differ in
`the autonomy of the processors, which function independently with MIMD, a desirable
`feature for any data-dependent algorithm operating in parallel.
`
`With MIMD desirable, the choice of a processor and memory interconnection architecture
`remained. Pipelined, shared-bus memory, communication port (mesh/array/hypercube),
`and crossbar fully shared memory were considered. Pipelined memory and processors
`(systolic arrays) are typically used for video; however, they are too restrictive in the sense
`that one must know a priori the size of the memory and the dynamics of the algorithm to
`prevent data contention and processor stalls. With our varied needs, this would lead to
`inefficiencies. A shared-bus memory structure would also have bottleneck problems with
`highly variable instruction and data streams and with moving results from one processor to
`another. The n-way connected communication port requires a very ordered flow of data,
`like a systolic or wavefront flow, or the application of a pixel per processor (not
`practical in a single chip). This approach works for large arrays of simple processors
`that can operate uniformly on images; however, we wanted more complex processors
`that could adapt to varying types of data, from bit graphics to floating-point
`representations. The crossbar fully shared memory is ideally suited to these needs,
`minimizing contention and data movement and providing flexibility for many types of
`algorithms. In fact, since the crossbar operates at the processor instruction rate, this
`architecture can functionally emulate the other approaches (pipeline, shared bus...).
`
`We wanted not only to provide this order-of-magnitude performance increase, but also
`to apply a traditional computer model of programmable processing and a large memory
`to applications with integrated image, graphics, video, and audio processing, or image
`computing. As shown in Figure 2, "MVP System Architecture", replacing the
`processing and memory pipeline of conventional video systems with the single video
`processor and large memory system model yields tremendous application flexibility. In
`effect, the system can reconfigure itself with software from videoconferencing to playing
`CD movies, just as a PC would reconfigure from a spreadsheet to a video game.
`
`
`Figure 2: The MVP "System" Architecture.
`[System diagram. Labeled blocks:
`HOST COMPUTER INTERFACE, an interface for image and audio data from computer memory
`(disk, photo-CD...), data from networks (phone or local digital), and image/video display
`on a workstation monitor;
`MEMORY, holding application memory (instructions) and multiple images, audio...;
`INPUT/OUTPUT INTERFACE, an interface for live video & audio (cameras, VCRs) and
`display on TV monitors.]
`
`THE MVP ARCHITECTURE
`
`The Multimedia Video Processor, or MVP, represents the next generation of digital signal
`processors. The MVP can be technically described as a single-chip, crossbar, shared-memory,
`heterogeneous MIMD multiprocessor. It combines RISC and advanced DSP
`processing in one parallel architecture with unique features for each. Current RISC
`processors typically use instruction pipelining, numerous registers, and a detached
`floating-point processor. On the other hand, current DSPs are optimized for one-dimensional
`multiply-accumulate functions. Newer DSPs have floating-point capabilities, yet most
`imaging and video needs only integer operations. DSPs usually have fewer registers than
`RISC processors and have direct memory access (DMA) with limited capabilities.
`
`The MVP combines the best features of RISC and DSP in parallel and adds other features
`to offer unprecedented power and flexibility. The heart of an image or video chip is its
`capability to process 2D signals. The MVP has features for 2D DSP-like processing,
`including multiply-accumulate operations. The on-chip memory and register characteristics
`of the MVP were optimized for image computing algorithms, preventing time-consuming
`cache misses or swapping of register contents. Multidimensional external memory access
`and double buffering minimize the typical memory bottleneck of current DSP solutions.
`An internal memory crossbar provides extremely efficient synchronization and
`communication among multiple processors. A very high-performance RISC processor is
`integrated on the chip, providing intelligent control of the DSP-like processors. Also
`integrated into the chip, a new floating-point architecture can act as a co-processor to any of
`the DSP-like processors or the RISC processor. By analysis of the algorithms, the
`required mix of integer ops to floating-point ops was found to be somewhere between 8:1 and 4:1, a
`balance which the MVP supports. The entire collection of processors and memory is
`configured as a MIMD architecture for ease of programming and high performance for all
`image and video computing applications. This MIMD data and control supports both
`data-dependent algorithms like object feature matching or Huffman coding and
`traditional data-independent SIMD operations like convolution.
`
`To prevent contention for memory or register access, a very wide instruction set in the
`PPs and a crossbarred memory are used in the MVP. This flexibility permits
`the programmer to produce highly parallel optimized code. A performance penalty may
`result if only one highly serial task is performed continuously; however, the very nature of
`image, video, graphics, and audio processing, with varied concurrent and complex
`processing, prevents this from occurring. The MVP integrates more functions than ever
`before into one chip, while avoiding the compromises of other architectures.
`
`Detailed Architecture Description:
`Figure 1, "MVP Block Diagram", shows the MVP chip architecture. The Master
`Processor (MP) provides a RISC processor for simple user interface, sequential
`processing, and orchestration of multiple concurrent tasks operating on the entire MVP.
`The DSP Parallel Processors (PPs), of which four will be included in the first version of the
`MVP, provide highly optimized image/video/graphics/audio processing capabilities. The
`Transfer Controller (TC) intelligently moves data and instructions on and off the MVP. All
`of these processors are locally interconnected with a crossbar to 25 on-chip 2-KByte SRAM
`modules. Other features include dual video frame timing generators (VC) and JTAG test
`and emulation circuits.
`
`With five 32-bit programmable processors operating at a targeted rate of 50 MHz
`and numerous parallel operations performed in each processor, over 2 billion operations
`per second result. In addition, 100 MFLOPS (fully IEEE-754) can occur. The peak data
`transfer rate is then 400 MBytes/second, adequate for many video applications. The
`internal bandwidth over the crossbar between on-chip memory and processors is 2.4
`GBytes/second.
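
The rates quoted above can be put on a per-cycle footing with a short calculation. This is a back-of-the-envelope sketch that uses only the three figures in the text and assumes decimal units (1 MByte = 10^6 bytes, 1 GByte = 10^9 bytes); the constant and function names are illustrative.

```python
CLOCK_HZ = 50_000_000             # targeted rate from the text: 50 MHz
PEAK_TRANSFER_BPS = 400_000_000   # peak data transfer rate: 400 MBytes/second
CROSSBAR_BPS = 2_400_000_000      # internal crossbar bandwidth: 2.4 GBytes/second


def bytes_per_cycle(bytes_per_second, clock_hz=CLOCK_HZ):
    """Average number of bytes moved in one clock cycle at the given rate."""
    return bytes_per_second / clock_hz


transfer_bytes = bytes_per_cycle(PEAK_TRANSFER_BPS)
crossbar_bytes = bytes_per_cycle(CROSSBAR_BPS)
ratio = CROSSBAR_BPS / PEAK_TRANSFER_BPS

print(f"peak transfer: {transfer_bytes:.0f} bytes per cycle")
print(f"crossbar:      {crossbar_bytes:.0f} bytes per cycle")
print(f"crossbar / peak transfer = {ratio:.0f}x")
```

Under these assumptions the crossbar carries six times the peak transfer rate, which is consistent with the paper's argument that keeping data on chip reduces the demand on slower transfers.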
`
`DSP PARALLEL PROCESSORS (PP)
`
`The PP has many powerful features beyond those found in conventional DSPs. Practically
`all video algorithms benefit from these features. Most of the features were added to permit
`scalability within the PP, supporting either many simple functions (like bit ops) in one cycle or
`fewer operations at higher precision (like 32 bits) with the same hardware. The following
`list gives each feature and its advantage:
`- 44 user registers:
`  - ease of programming/compiling and fast parallel functions.
`- Single-cycle access into crossbar memory expands effective registers to 34K:
`  - flexibility.
`- Three-level, no-overhead instruction looping:
`  - programming flexibility and faster tight loops (usually 20-30%).
`- Double parallel transfer from memory with address update:
`  - most algorithms need two pixels loaded per cycle.
`- Three-operand ALU arithmetic and logical operations:
`  - double-speed correlation and windows support.
`- Splittable multiply (8x8=16 or 16x16=32):
`  - double-speed pixel operations.
`- Word/Halfword/Byte multiple arithmetic:
`  - 4x on algorithms like motion estimation and 2x on fast DCTs.
`- Flexible data path:
`  - masking, merging, rotating... for bit stream coding (like Huffman).
`- General-purpose use of address adders:
`  - up to 6x number of adds in one cycle.
`- Conditional operations prevent need for branching (and possible pipeline stalls):
`  - adaptive algorithms will operate faster (like adaptive thresholding).
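
The "Word/Halfword/Byte multiple arithmetic" feature in the list above can be illustrated with a software emulation of four independent 8-bit additions carried in one 32-bit word. This is only a sketch of the idea of splitting one wide datapath into lanes; it is not the PP's instruction set, and the function names and masks are illustrative assumptions.

```python
LOW7 = 0x7F7F7F7F   # low seven bits of every byte lane
HIGH1 = 0x80808080  # top bit of every byte lane


def packed_add_bytes(a, b):
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Adding only the low 7 bits keeps every carry inside its own lane.
    low = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is then combined without carry (XOR).
    return (low ^ ((a ^ b) & HIGH1)) & 0xFFFFFFFF


def pack(lanes):
    """Pack four byte values (lane 0 in the low byte) into one 32-bit word."""
    b0, b1, b2, b3 = lanes
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


def unpack(word):
    """Split a 32-bit word into its four byte lanes, lane 0 first."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


total = packed_add_bytes(pack([250, 1, 128, 255]), pack([10, 2, 128, 1]))
print(unpack(total))
```

One word-wide operation here stands in for four pixel-wide additions, which is the source of the 4x figure the list gives for byte-oriented algorithms such as motion estimation.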
`
`As a result, as many as 15 RISC operations will be performed in one PP cycle. When
`multiplied by the number of PPs and added to the MP and FPU operations, a formidable
`number results. In addition, since the C compiler also influenced the architecture of the
`PP, many of these features will automatically compile into fast code; many users of the
`MVP will not need to understand the PP architecture to take advantage of its performance.
`
`MASTER PROCESSOR (MP)
`
`The MP is a general-purpose RISC processor with an integral IEEE-compatible
`floating-point unit. A 32-bit instruction is accessed from a 4-KByte instruction cache. Data loads
`can be 8, 16, 32, or 64 bits from a 4-KByte data cache or from any data module via the
`crossbar. The MP has thirty-one usable 32-bit registers. Uncommon features include:
`- Register files common to floating-point & integer operations.
`- Scoreboard keeps track of pending results of loads and FPU operations, preventing their use until updated.
`- Addressing modes support optional updating of the base-address register with
`  results of the address computation.
`- Special FPU instruction permits a new multiply, add/subtract, & increment each cycle.
`- Left-most and right-most one logic.
`- Both endians supported.
`
`Since the MP was designed to execute C programs efficiently and has added hardware for
`bitstream processing, it performs exceptionally well as the controller and data interpretation
`processor. The floating-point capability accelerates and simplifies programming of
`high-precision applications like medical imaging and 3D graphics.
`
`SHARED MEMORY & TRANSFER CONTROLLER (TC)
`
`Much of the advantage of the MVP architecture lies in the memory and data I/O
`architecture. Each processor and memory is fully interconnected through the crossbar and
`switchable at instruction rates. With more than 500 signal lines switching at nanosecond
`speeds, the crossbarred memory architecture is only possible with single-chip
`implementation. With adequate on-chip memory and the ability to reconnect the next
`processor to the data memory, rather than moving the data to another memory, the data on
`chip is not required to move as often. In effect, the original requirement of billions of
`bytes/second of data transfer is reduced to only hundreds of MBytes/second. This model works
`well as long as the algorithm uses localized regions of data (patches, blocks,
`neighborhoods, rows...), each of which fits into the on-chip memory and is accessed in
`repeated or predictable patterns. While this is usually the case with image processing, an
`extremely intelligent transfer controller was architected to help ensure the validity of this
`assumption. The TC has numerous modes of transferring data on or off chip, each
`optimized for a particular type of data flow (block, patch, fat line, indexed or guided
`patches...). Most importantly, the on-chip SRAM was architected with sufficient
`size and modularity to permit double-buffering of data I/O on and off the chip while the
`on-chip processors access the other on-chip memory modules. In effect, practically no
`overhead is required for video I/O. Many convenient methods were designed into the TC
`to prioritize these accesses. In addition, we included support for most commodity memory
`components (VRAM, SRAM, DRAM). Finally, we devised several methods to mitigate
`any contention between the processors for a particular memory module. Both round-robin
`and fixed-priority schemes are available to permit developers flexibility in structuring
`their algorithms to reduce contention. For the many image and video algorithms
`developed for the MVP thus far, contention has not been a problem.
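
The difference between the two contention schemes mentioned above can be sketched in a few lines. This is a generic illustration of fixed-priority and round-robin arbitration among requesters for one memory module, assuming one grant per cycle; it is not a description of the MVP's crossbar logic, and all names are hypothetical.

```python
def fixed_priority(requests):
    """Grant the lowest-numbered requester; requests is a list of booleans."""
    for index, wants in enumerate(requests):
        if wants:
            return index
    return None


class RoundRobinArbiter:
    """Grant requesters in rotating order, starting after the last winner."""

    def __init__(self, count):
        self.count = count
        self.last = count - 1  # so that requester 0 is considered first

    def grant(self, requests):
        for offset in range(1, self.count + 1):
            index = (self.last + offset) % self.count
            if requests[index]:
                self.last = index
                return index
        return None


# Three processors that all request the same module on every cycle.
always = [True, True, True]
fixed_grants = [fixed_priority(always) for _ in range(6)]
arbiter = RoundRobinArbiter(3)
rotating_grants = [arbiter.grant(always) for _ in range(6)]
print(fixed_grants)
print(rotating_grants)
```

Under sustained contention the fixed scheme always serves the same requester, which suits a task that must never stall, while the rotating scheme shares the module evenly; having both available is what gives developers room to structure their algorithms.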
`
`Another advantage of the crossbar architecture is expandability. We can design many
`different MVP chips as a function of the number of PPs. We simply slice the architecture,
`cutting or adding PPs and memory modules. Conceptually, the advantage of this approach
`is that, with the same package and pin-out, several different performance and price points
`can be offered. A range of applications may require a range of different MVP chips.
`Applications that require CCIR 601 studio-quality video and/or multifunction processing
`(graphics / audio / video) would most likely require an MVP with 4 processors. On the
`other hand, a more dedicated or single-function application like graphics may require
`fewer PPs. In addition, if only limited-resolution video (QCIF) processing is necessary,
`again a small number of PPs could suffice. We anticipate various versions of the MVP in
`the future.
`
`VIDEO CONTROLLER (VC)
`
`In addition, the MVP has two programmable timing controllers for generation of video and
`other timing signals. As an example, video frame grabbing and display require many pixel,
`horizontal, and vertical signals for synchronization of the external logic in the system. The
`MVP has internal logic to generate those signals under program control, relieving the
`system designer of the design of external logic to perform those functions.
`
`NEW DCT ALGORITHMS FOR COMPRESSION SPEED-UP WITH USE OF A PROGRAMMABLE ARCHITECTURE
`
`One advantage of a programmable compression chip is the optimization possible by
`selecting the least computationally demanding DCT algorithm that will meet the accuracy
`required by the application. For example, fast DCT algorithms like those of Lee and Chen
`[Lee-84, Chen-77] have a considerable advantage over traditional matrix-multiply
`approaches (a speedup of a factor of 5 or more). Separability of the 2D DCT is generally
`used for decomposition of DCTs (with successive processing of the individual rows &
`columns of an image). The size of the DCT directly influences the benefit of separability;
`however, with the 8x8 DCTs of most standards, a definite speedup results. The Lee
`algorithm tends to be easy to implement and achieves faster computation, although it has
`accuracy issues. The Chen algorithm is harder to implement and is computationally slower,
`but has good accuracy. Depending on the available processing bandwidth, the encoder
`can select an appropriate DCT or IDCT algorithm to perform the task. If errors result,
`different coding decisions result and either lower SNRs or lower compression ratios occur.
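
The separability property mentioned above can be checked numerically. The sketch below computes an 8x8 DCT two ways: directly from the 2D definition, and as 1D transforms over the rows followed by 1D transforms over the columns. It uses a plain summation for the 1D transform, not the fast algorithms of Lee or Chen, and the orthonormal scaling and function names are assumptions made for the illustration.

```python
import math


def scale(k, n):
    """Orthonormal DCT-II scale factor for frequency index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(vector):
    """Orthonormal 1D DCT-II by direct summation (not a fast algorithm)."""
    n = len(vector)
    return [scale(k, n) * sum(vector[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                              for i in range(n))
            for k in range(n)]


def dct_2d_separable(block):
    """Row transforms followed by column transforms."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to [u][v]


def dct_2d_direct(block):
    """2D DCT-II straight from the double-sum definition."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u, n) * scale(v, n) * total
    return out


block = [[(x * y + 3 * x + y) % 16 for x in range(8)] for y in range(8)]
separable = dct_2d_separable(block)
direct = dct_2d_direct(block)
max_error = max(abs(separable[u][v] - direct[u][v])
                for u in range(8) for v in range(8))
print(max_error < 1e-9)
```

For an 8x8 block the direct form evaluates 64 outputs of 64 terms each (4096 terms), while the separable form runs 16 one-dimensional transforms of 8 outputs and 8 terms each (1024 terms), before any fast 1D algorithm is applied on top.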
`
`In addition, we devised a "Truncated-IDCT" algorithm to utilize the advantages of a
`programmable architecture. Since the DCT, quantizer, and thresholding operations seek to
`minimize the population of selective frequency coefficients (for high compression),
`statistically, most of the high-frequency coefficients are zero valued. Therefore, the
`conventional IDCT will act on 8x8 matrices with a high percentage of zero-valued inputs.
`We can then significantly reduce the number of IDCT operations performed by not
`executing the zero-valued multiplies and adds (similar work has been reported [McMillan-92]).
`This is only possible with software-based IDCTs. In implementation, the program
`adaptively truncates the 8x1 IDCT summation in the vertical direction based on the
`run-length encoded input values. Further reductions by selecting a 4x1 summation when
`appropriate also shorten the process, although not as frequently. With this approach a
`speed-up of a factor of 3 or more on IDCTs will usually occur.
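
The truncation idea can be sketched for a single 8x1 inverse transform. The version below stops the summation after the last non-zero coefficient and reports how many terms it evaluated. It is an illustration under stated assumptions, not the author's implementation: the paper derives the truncation point from the run-length encoded input, whereas this sketch simply scans the coefficient list, and the function names and orthonormal scaling are hypothetical.

```python
import math


def idct_1d(coeffs, terms=None):
    """Orthonormal 1D inverse DCT using only the first `terms` coefficients.

    Returns (samples, evaluated_terms) so the saving can be observed.
    """
    n = len(coeffs)
    terms = n if terms is None else terms
    samples = []
    for i in range(n):
        total = 0.0
        for k in range(terms):
            weight = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += weight * coeffs[k] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
        samples.append(total)
    return samples, n * terms


def idct_1d_truncated(coeffs):
    """Skip the trailing zero coefficients instead of multiplying by them."""
    last = max((k for k, c in enumerate(coeffs) if c != 0), default=-1)
    return idct_1d(coeffs, last + 1)


coeffs = [10, -3, 2, 0, 0, 0, 0, 0]  # typical shape: high frequencies are zero
full, full_terms = idct_1d(coeffs)
short, short_terms = idct_1d_truncated(coeffs)
print(full_terms, short_terms)
```

With three non-zero coefficients the truncated form evaluates 24 terms instead of 64 and produces the same samples, which shows where a data-dependent speed-up of the kind described above can come from.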
`
`
`TOOLS
`
`Advances in video compression have been limited by the availability of tools to develop
`software and hardware. With the MVP, TI offers a range of software tools and direct
`on-chip support for in-circuit debug. A real-time executive, C++ compilers, an algebraic
`assembler, a windowed high-level language debugger (with JTAG emulation hardware on
`chip), and a library of primitives/applications, all tools familiar to computer application
`developers, will now be available for development of video applications.
`
`The software model for the MVP is based on two levels. At the primary level, the
`Master Processor acts as a director and scheduler of the MVP's parallelism. The
`Executive operates on the MP, performing those supervisory tasks. The Executive can
`dispatch tasks for operation in pipeline, parallel, or any other arrangement on any processor
`within the MVP. Beneath that, the level that actually performs the tasks on each processor is
`accessed through either (1) a library of primitives, (2) application tools for programming in
`assembly, or (3) a high-level language compiler. Each of these methods has advantages,
`with varying performance and skill level required to code the chip, as a function of the
`particular application. Nothing restricts the use of any processor as the master or
`slave processor; the assignment is only a software convention.
`
`COMPETING VIDEO COMPRESSION CHIP ARCHITECTURES
`
`Several semiconductor companies have reported activity in video compression chip or chip
`set solutions [Bolton-93, Konstantinides-92]. Most chip manufacturers are proposing
`hardwired or parameterized architectures, without C-level programmability. Our MVP is
`an exception. In addition, most of the other "programmable" approaches are based on an
`architecture that integrates dedicated logic modules, like DCTs and motion estimators,
`with only the controller programmable. This limits their efficiency, since the silicon devoted
`to those functions must always be kept busy with those functions to justify its cost. In
`contrast, the MVP architecture has no dedicated logic, permitting the user to balance silicon
`based on the varied and dynamic computational demands of compression. Other
`researchers have recently described simulations that support our position that universally
`programmable architectures are competitive solutions when compared with dedicated or
`hybrid architectures for video decompression [Mayer-93].
`
`Many different architectures are proposed, in development, or currently available; however,
`none except the MVP has the flexibility and computational performance to meet the
`complete demands of truly integrated digital video on the desktop, including the complete
`concert of real-time video & audio compression with image & 3D graphics processing.
`These demands cover not only compression and decompression, but also system-level
`bit-stream control, video scaling, error correction, and even audio echo cancellation.
`Architecture limitations and transistor counts limit other chips to subsets of these functions.
`
`CONCLUSION
`
`The MVP is a monolithic single-chip parallel processor that performs compression
`processing, audio & video processing, 3D graphics, and other functions, even at the same time.
`Over 2 billion operations are performed per second. This dramatic performance boost will
`enable a wide range of new applications, including desktop interactive digital video.
`Integrating fully programmable parallel DSP processors with a RISC processor on one
`chip provides software flexibility and system adaptability. A new parallel architecture,
`using a crossbar network to couple the processors and large on-chip SRAMs, and with
`MIMD (Multiple Instruction Multiple Data) operations, yields extremely high efficiency for
`most image, graphics, and video algorithms. Software tools like real-time executives,
`assemblers, and compilers all help bring a familiar computer programming model to
`multidimensional signal processing. This new technology frees developers of compression
`algorithms to optimize implementations of standard video & audio compression algorithms,
`without the restrictions found in today's compression chips (those which are limited to
`current interpretations or versions of the standards). In addition, algorithm developers can
`implement future compression algorithms without the difficulties of developing new chips
`or adapting existing chips.
`
`The MVP supports a wide range of open standards for video compression and image
`computing. The variation within each standard, which promotes creative and distinguishing
`advantages in the marketplace, and the constant urge to optimize the standard for a particular
`range of markets both work against fixed hardware solutions. This programmable,
`integrated solution gives system designers the flexibility to develop competitive algorithms
`as well as adapt to emerging standards.
`
`ACKNOWLEDGMENTS
`
`The author wishes to thank Jeremiah Golston, Dr. Chris Read, and Dr. V. Venkateswar for
`compression algorithm work relating to the MVP. In addition, thanks to the MVP Program
`Manager, Walt Bonneau, for developing and motivating a "world-class" team. A special
`thanks to my original co-MVP-architects, Keith Balmer, Karl Guttag, and Nick
`Ing-Simmons. Finally, thanks to the entire "Team-MVP"!
`
`REFERENCES
`
`[Bolton-93] Bolton, M., "A Family of MPEG Video Encoder and Decoder Chips", IEEE
`Proceedings of Conference on Hot Chips, 1993.
`
`[Chen-77] Chen, W.H., C.H. Smith, & S.C. Fralick, "A Fast Computational Algorithm
`for the Discrete Cosine Transform", IEEE Transactions on Communications, Vol. 25, pp.
`1004-1009, Sept. 1977.
`
`[Gove-92] Gove, R.J., "Architectures for Single-Chip Image Computing", SPIE
`Proceedings of Conf. on Image Processing and Interchange, San Jose, CA, Feb. 1992.
`
`[Guttag-92] Guttag, K.M., R.J. Gove, & J.R. Van Aken, "A Single-Chip Multiprocessor
`For Multimedia: The MVP", IEEE Computer Graphics & Applications, pp. 53-64, Nov. 1992.
`
`[Konstantinides-92] Konstantinides, K. & V. Bhaskaran, "Monolithic Architectures for
`Image Processing & Compression", IEEE Computer Graphics & Applications, pp. 75-86,
`Nov. 1992.
`
`[Lee-84] Lee, B.G., "A New Algorithm for the Discrete Cosine Transform", IEEE Trans.
`on Acoustics, Speech, and Signal Processing, Vol. 32, pp. 1243-1245, 1984.
`
`[Mayer-93] Mayer, A.C., "The Architecture of a Processor Array for Video
`Decompression", IEEE Trans. on Consumer Elect., Vol. 39, No. 3, pp. 565-569, Aug. 1993.
`
`[McMillan-92] McMillan, L. & L. Westover, "A Forward-Mapping Realization of the
`Inverse Discrete Cosine Transform", IEEE Proc. of the Data Compression Conference, pp.
`219-228, 1992.
`