throbber
MorphoSys: A Reconfigurable Architecture for Multimedia Applications
`
`Hartej Singh, Ming-Hau Lee, Guangming Lu, Fadi J. Kurdahi, Nader Bagherzadeh
`University of California, Irvine, CA (United States),
`and Eliseu M.C. Filho, Federal University of Rio de Janeiro (Brazil)
`
`Abstract
`
`We describe the MorphoSys reconfigurable system,
`which combines a reconfigurable array of processor cells
`with a RISC processor core and a high bandwidth memory
`interface unit. We introduce the array architecture, its
`configuration memory, inter-connection network, role of
`the
`control
`processor
`and
`related
`components.
`Architecture implementation is described in brief and the
`efficacy of MorphoSys is demonstrated through simulation
`of video compression (MPEG-2) and target-recognition
`applications. Comparison with other
`implementations
`illustrates that MorphoSys achieves higher performance by
`up to 10X.
`
`1. Introduction
`
`Reconfigurable computing systems are systems that
`combine
`reconfigurable
`hardware with
`software
`programmable processors. These systems allow the user to
`configure or customize some part of the hardware for
`different applications. Reconfigurable computing adheres
`to a hybrid approach between the extremes of ASICs
`(Application-specific ICs) and general-purpose processors.
`A reconfigurable system typically has wider applicability
`than an ASIC and better performance than a general-
`purpose processor for target applications.
`arrays
`gate
`Conventionally,
`field-programmable
`(FPGAs) have been used for reconfigurable computing.
`These allow designers to manipulate gate-level devices.
`However, FPGAs are slower than ASICs, have lower logic
`density and are inefficient for word operations. Several
`researchers have proposed other reconfigurable systems
`such as PADDI [1], rDPA [2], DPGA [3], and REMARC
`[4]. These are described briefly in a following section.
`
`1.1. MorphoSys: Reconfigurable architecture
`
`This paper describes MorphoSys, a novel architecture
`for reconfigurable computing systems. This design model,
`
`involves a reconfigurable SIMD
`in Fig. 1,
`shown
`component on the same die with a general-purpose RISC
`processor, and a high bandwidth memory interface. This
`integrated architecture model will hopefully provide the
`potential to satisfy the increasing demand for low cost
`stream/frame data processing in multimedia applications.
`MorphoSys
`targets
`applications with
`inherent
`parallelism, high regularity, computation-intensive nature
`and word-level granularity. Some examples of these
`applications are video compression (DCT/IDCT, motion
`estimation), template matching, etc. MorphoSys also
`supports complex bit-level applications such as ATR
`(Automatic Target Recognition).
`
`Main Processor
`(e.g. advanced RISC)
`
`Instruction,
`Data Cache
`
`MorphoSys
`
`Reconfigurable
`Processor Array
`
`High Bandwidth
`Memory Interface
`
`System Bus
`
`External Memory
`(e.g. SDRAM, RDRAM)
`
`Fig. 1: Integrated architectural model for
`reconfigurable systems
`
`1.2. Organization of paper
`
`Section 2 explains some important terms relevant to
`reconfigurable computing. A brief review of related
`research contributions follows. Section 4 introduces the
`architecture of M1, the first version of MorphoSys
`reconfigurable system, and describes the reconfigurable
`cell array and other components. Section 5 describes the
`physical design process with area and delay estimates. We
`provide performance estimates in Section 6 for a set of
`applications (video compression and ATR).
`
`Petitioner Microsoft Corporation - Ex. 1058, p. 134
`
`

`

`2. Important concepts
`
`4. MorphoSys: System architecture
`
`Different parameters are used to characterize the design
`of a reconfigurable computing system. Some of these are:
`(a) Granularity (fine or coarse): refers to the data size
`for operations. Bit-level operations (gates or look-up
`tables) are fine-grain but coarse-grain granularity
`implies operations on word-size data, using ALUs.
`(b) Depth of Programmability (single versus multiple):
`This pertains to the number of configuration planes
`resident in a system. Systems with multiple contexts
`(each of which specifies a configuration plane) may
`perform different functions without reload of
`configuration data.
`(c) Reconfigurability (static versus dynamic): System
`reconfiguration is either static (execution is stopped)
`or dynamic (parallel with execution). Single context
`systems have static re-configuration. Multi-context
`systems may be dynamically reconfigurable.
`(d) Interface (remote versus local): A local interface
`implies that the host processor and programmable
`component reside on the same chip.
`(e) Computation model: The computation model may
`be described as either SIMD or MIMD. Some
`systems may also follow the VLIW model.
`
`3. Related research
`
`•
`
`•
`
`There has been considerable research effort to develop
`reconfigurable computing systems. Some of these research
`prototypes are described below.
`•
`The Splash [5], a linear array of processing elements,
`is useful for linear systolic applications. DECPeRLe-1
`[6] is a two-dimensional array of 16 FPGAs. Both
`systems are fine-grained, with single configuration
`and static reconfigurability.
`PADDI [1] has a set of concurrently executing 16-bit
`(coarse-grain) functional units (EXUs). PADDI targets
`real-time DSP applications (filters, convolvers, etc.)
`rDPA [2]: The reconfigurable data-path architecture
`consists of an array of data-path units (DPUs). Each
`DPU consists of ALU, micro-programmable control
`and four registers. It is dynamically reconfigurable.
`REMARC: This [4] consists of a reconfigurable
`coprocessor, with 64 programmable units. Each 16-bit
`unit has a 32 entry instruction RAM, a 16-bit ALU, 16
`entry data RAM, instruction register, and several other
`registers. This system targets multimedia applications.
`• DPGA: A
`fine-grain
`prototype
`system,
`the
`Dynamically Programmable Gate Arrays (DPGA) [3]
`use traditional 4-input lookup tables as the basic array
`element. Array elements are grouped as sub-arrays
`that have complete row and column connectivity.
`
`•
`
`Fig. 2 shows the organization of the MorphoSys
`system. It is composed of an array of reconfigurable cells
`(RC Array) with configuration data memory (Context
`Memory), a control processor (Tiny RISC), a data buffer
`(Frame Buffer) and a DMA controller.
`
`4.1. System components
`
`Reconfigurable Cell Array: The main component of
`MorphoSys is the 8 x 8 RC (Reconfigurable Cell) Array.
`Each RC has an ALU-multiplier, a register file and is
`configured through a 32-bit context word.
`Host/Control processor: The controlling component of
`MorphoSys is a 32-bit processor, called Tiny RISC, based
`on [7]. Tiny RISC handles serial operations, initiates data
`transfers and controls operation of the RC array.
`Frame Buffer: The two-set Frame Buffer enables
`stream-lined data transfers between RC Array and main
`memory, by overlapping of computation with data load
`and store, alternately using the two sets.
`M1 Chip
`
`Tiny_RISC
`Core Processor
`
`16
`
`16
`
`16
`
`22
`
`6
`
`4
`
`Instr
`Data
`Cache
`
`32
`
`Context
`Memory
`2x8x
`16x32
`
`RC
`Array
`(8 X 8)
`
`256
`
`SDRAM
`Main
`Memory
`
`32
`
`DMA
`Controller
`
`64
`
`10
`
`Frame Buffer
`(2x2x64x64)
`
`32
`
`64
`
`64
`
`9
`
`Fig. 2: MorphoSys (M1 chip) components
`
`4.2. Salient features of MorphoSys
`
`The RC Array follows SIMD model of computation.
`All RCs in the same row/column share same configuration
`data. In brief, important features of MorphoSys are:
`•
`Coarse grain: MorphoSys operates on 8 / 16-bit data.
`•
`Configuration: RC array is configured by context
`words, which specify an instruction opcode for RC.
`• Depth of programmability: The Context Memory can
`store up to 32 planes of configuration.
`• Dynamic reconfiguration: Contexts are loaded into
`Context Memory without interrupting RC operation.
`Local/Host Processor: The control processor (Tiny
`RISC) and RC Array are resident on the same chip.
`Fast Memory Interface: Through DMA controller.
`
`•
`
`•
`
`Petitioner Microsoft Corporation - Ex. 1058, p. 135
`
`

`

`4.3. Tiny RISC instructions for MorphoSys
`
`Several new instructions were introduced in the Tiny RISC
`instruction set for effective control of the MorphoSys RC
`Array operations. They perform the following functions:
`•
`data transfer between main memory (SDRAM) and
`Frame Buffer,
`loading of context words from main memory into
`Context Memory, and
`control of execution of the RC Array.
`
`•
`
`•
`
`4.4. Architecture of the Reconfigurable Cell
`
`column and the register file. Mux B (8-to-1) takes its
`inputs from register file, an array data bus and three of the
`nearest neighbors.
`
`4.5. Context Memory
`
`The Context Memory is organized into two blocks (for row
`and column contexts) with each block having eight sets of
`sixteen context words. The RC Array configuration plane
`comprises eight context words (one from each set) from
`either the row or column block. Thus the Context Memory
`can store 32 configuration planes or 256 context words.
`Context register: Each RC is configured through a
`context word stored in a 32-bit Context Register. It is a
`part of each RC, whereas the Context Memory is separate
`from the RC Array. The different fields for the context
`word are defined in Fig. 4. The field ALU_OP specifies
`ALU function. The control bits for Mux A and Mux B are
`specified in the fields MUX_A and MUX_B. Other fields
`determine the registers to which the result of an operation
`is written (REG #), and the direction (RS_LS) and amount
`of shift (ALU_SFT) applied to output. The context
`includes a 12-bit field for the constant.
`
`The reconfigurable cell (RC) array consists of an 8x8 array
`of identical Reconfigurable Cells (RC). As Fig. 3 shows,
`each RC comprises of an ALU-multiplier, a shift unit, and
`two multiplexers for ALU inputs. It has an output register,
`a feedback register, and a small register file. A context
`word, loaded from Context Memory and stored in the
`context register (Section 4.5), defines the ALU function,
`input multiplexer control bits, result storage, and the
`direction/amount of output shift. The context word can
`also specify an immediate value.
`R0 - R3
`
`I
`
`M
`
`T
`
`C
`
`R
`
`B
`
`L
`
`E
`
`XQ
`FB
`
`U D
`
`L
`
`I
`
`MUXA
`
`MUXB
`
`W R - B u s
`
`R E G #
`
`A L U _ S F T
`
`M U X _ B
`
`C o n s t a n t
`
`3 1
`
`3 0
`
`2 9 - 2 8
`
`2 7
`
`2 6 - 2 3
`
`2 2 - 1 9
`
`1 8 - 1 6
`
`1 5 - 1 2
`
`1 1 - 0
`
`W R _ E x p
`
`R S _ L S
`
`M U X _ A
`
`A L U _ O P
`
`Context Register
`
`Context word from Context Memory
`
`Fig. 4: Definition of context word for RC
`The context word also specifies whether a particular
`RC writes to its row/column express lane (WR_Exp).
`Depending upon the context, an RC can access the input of
`any other RC in its column or row within the same
`quadrant, or else select an input from its own register file.
`Context broadcast: The context is broadcast in either
`of two modes: row-wise or column-wise. Thus, all eight
`RCs in a row (or column) execute the same context.
`Context Memory organization: The Context memory
`has two blocks (one for each broadcast mode), each of
`which stores eight sets (since there are 8 rows/columns) of
`sixteen 32-bit context words.
`Selective context enabling: It is possible to enable one
`specific row or column for operation in the RC Array. This
`feature is primarily useful in loading data into the RC
`Array, but also allows irregular operations in the RC
`Array, for e.g. zigzag re-arrangement of array elements.
`
`4.6. Interconnection network
`The RC interconnection network is comprised of three
`hierarchical levels.
`
`Petitioner Microsoft Corporation - Ex. 1058, p. 136
`
`1616
`
`Constant
`
`1212
`
`1616
`
`1616
`
`R3
`R2
`R1
`R0
`
`ALU+MULT
`
`FB REG
`FB REG
`
`ALU_OP
`
`FLAGFLAG
`
`ALU_SFT
`
`Packing
`Packing
`Register
`Register
`WR_BUS, WR_Exp
`
`SHIFT
`
`2828
`
`Register File
`
`O/P REG
`O/P REG
`
`1616
`
`Fig. 3: Architecture of a reconfigurable cell
`ALU-Multiplier: The ALU has 16-bit inputs, and the
`multiplier has 16 by 12 bit inputs. Two ALU ports, Port A
`and Port B are for data from input multiplexers. The third
`input (12 bits) is from the constant field in the context
`register (Fig. 4). The fourth port takes input from the
`output register. The ALU adder is 28 bits wide to prevent
`loss of precision during multiply-accumulate operation.
`Besides standard functions, the ALU has several additional
`functions e.g. absolute value of difference of two numbers
`and a single cycle multiply-accumulate operation.
`Multiplexer (Input): The two input multiplexers (Fig.
`3) select one of several inputs for the ALU. Mux A (16-to-
`1) provides values from the array data bus, outputs of the
`four nearest neighbors and other cells in the same row and
`
`

`

`RC Array mesh: The primary network in the array is a
`2-D mesh that provides nearest neighbor connectivity.
`Complete row/column connectivity: Within each
`quadrant, each cell can access the output of any other cell
`in its row and column.
`Express lane connectivity: At the global level, there
`are buses for routing connections between adjacent
`quadrants. These buses, also called express lanes, run
`across rows as well as columns. These supply data from
`any one cell (out of four) in a row (column) of a quadrant
`to other cells in adjacent quadrant but in same row
`(column). The express lanes greatly enhance global
`connectivity. Even irregular communication patterns, that
`otherwise require extensive
`interconnections, can be
`handled quite efficiently. For e.g., an eight-point butterfly
`is accomplished in three cycles.
`Data bus: A 128-bit data bus from Frame Buffer to RC
`array is linked to column elements of the array. It provides
`two eight bit operands to each of the eight column cells.
`Eight cycles are required to load the entire RC array.
`Context bus: The context bus transmits context data to
`the Context Register in each cell in a row/column
`depending upon the broadcast mode. Each context word is
`32 bits wide, and there are eight rows (columns), hence the
`context bus is 256 bits wide.
`
`4.7. MorphoSys program flow
`
`The typical operation of MorphoSys M1 system is as
`follows: The Tiny RISC processor handles the general-
`purpose operations itself. Specific parts of applications,
`such as multimedia tasks, are mapped to the RC Array.
`The Tiny RISC processor initiates the loading of the
`context words (configuration data) and data from external
`memory into the Context Memory through the DMA
`Controller (Fig. 2).
`Now, Tiny RISC issues one of several possible context
`broadcast instructions to start execution of the RC Array.
`While the RC Array is executing instructions, and using
`data from the first set of the Frame Buffer, the Tiny RISC
`initiates transfer of data for the next computation into the
`second set of the Frame Buffer. When the RC Array
`execution on the first data set completes, fresh data is
`available in the second set. Thus the RC Array does not
`have to wait for data load/store.
`
`5. MorphoSys M1: Design implementation
`
`In this section, we describe the physical design of major
`components of M1: the reconfigurable cell, the frame
`buffer, DMA controller, Tiny RISC and instr./data caches.
`The target technology is 0.35 microns (four layer metal).
`Target cycle time is 10 nanoseconds. The budgeted chip
`
`area is 80 sq. mm. The chip layout will be completed by
`October, and it will then be fabricated through MOSIS.
`
`5.1. Design methodology
`
`The design approach is two-fold. Components that offer
`great potential for optimization through a regular structure
`are custom designed (using Magic V.6.5), while other
`components, such as controllers, are synthesized with
`Synopsys behavioral synthesis tools. The resultant netlist
`is converted into a layout using standard cell libraries and
`Mentor Graphics tool Autocells. These layouts and custom
`layouts will then be combined for floor planning and
`global routing using MicroPlan and MicroRoute tools of
`Mentor Graphics Corp. The final design will be verified
`using Lsim simulator from Mentor Graphics.
`
`5.2. Custom component layout
`
`Reconfigurable Cell: It is important to optimize the
`area for the ALU and multiplier, since there are 64
`instances of this cell. The 28-bit ALU is designed with a
`carry-skip adder. Worst case delay is around 2.5 ns. The
`16x12 multiplier has a delay close to 3.5 ns. The critical
`path is through the input multiplexers, multiplier, ALU,
`other muxes and a logarithmic shifter and adds to approx.
`9 ns (within the target of 10 ns). The total area for one RC
`is close to 0.75 sq. mm.
`Frame Buffer: This buffer or SRAM array has four
`banks of 64words by 64 bits. A unit of four SRAM cells
`was optimized for area, where each SRAM cell uses six
`transistors. The unit was then instantiated to obtain the
`array. Each bank has an address decoder (two-level 3 input
`decoders), input and output latches (64 and 128 bits) and a
`barrel shifter at the output for byte addressing. The access
`times of the SRAM array are within 3 ns. The area of the
`SRAM array and decoder is 0.7 sq. mm. The shifter and
`latches will need an additional area of 0.15 sq. mm.
`Tiny RISC: This
`is
`implemented as a 32-bit
`(instruction and datapath) design, with the custom design
`of instruction cache (2KB), data cache (1KB), register file
`(16 registers) and ALU. Access times for the caches are 5
`ns, while register file access time is 3 ns.
`
`5.3. Behavioral synthesis for controllers
`
`DMA controller: This is represented as an FSM with a
`large number of states, and is suited for synthesis. The
`synthesis constraints are time cycle of 10 ns and minimum
`area. Estimated critical path delay is 7.5 ns. The netlist is
`read into Mentor Graphics tool Autocells, after formatting,
`and converted into a layout using standard cell libraries.
`Tiny RISC controller: This will be synthesized using
`Synopsys tools and converted into layout using Autocells.
`
`Petitioner Microsoft Corporation - Ex. 1058, p. 137
`
`

`

`Performance analysis: MorphoSys M1 performance is
`compared with three ASIC architectures implemented in
`[9], [10], [11] for matching one 8x8 reference block
`against its search area of 8 pixels displacement.
`
`Cycles
`1159
`
`1020
`
`581
`
`631
`
`1200
`900
`600
`300
`0
`
`ASIC [9]
`
`ASIC [10]
`
`ASIC [11] MorphoSys
`M1 (64
`RCs)
`
`Fig. 5: Performance comparison for Motion
`Estimation
`
`6.2. Discrete cosine transform (DCT) for MPEG
`
`The forward and inverse DCT are used in MPEG encoders
`and decoders. In the following analysis, we consider an
`algorithm for fast 8-point 1-D DCT [13]. The eight row
`(column) DCTs may be computed in parallel. For row
`(column) mode of operation, the configuration context is
`broadcast along columns (rows). The coefficients needed
`for the computation are provided as constants in context
`words. When 1-D DCT along rows (columns) is complete,
`the 1-D DCT along columns (rows) are computed in a
`similar manner.
`
`Cycles
`
`320
`
`240
`
`201
`
`TMS320C80
`
`MVP
`
`MMX
`Pentium
`
`V830R/AV
`
`54
`
`REMARC
`
`21
`
`MorphoSys
`
`M1
`
`400
`300
`200
`100
`0
`
`Fig. 6: Performance comparison for DCT (cycles)
`Performance analysis: MorphoSys requires 21 cycles
`to complete 2-D DCT (or IDCT) on 8x8 block of pixel
`data. This is in contrast to 240 cycles required by Pentium
`MMX TM [12]. The relative performance figures for
`MorphoSys and other implementations are given in Fig 6.
`
`Petitioner Microsoft Corporation - Ex. 1058, p. 138
`
`5.4. Design integration and verification
`
`The custom layouts and synthesized layouts are input into
`MicroPlan for final placement and output pad connections.
`The standard cell layouts may be changed for better area
`utilization, but custom blocks are not affected. Once the
`floor-plan
`is accomplished,
`the design
`is
`input
`to
`Microroute for routing of global buses and Power and
`Ground buses from the pads to each block. The composite
`design is then complete and verified for correctness using
`Lsim, and through back-annotated synthesis re-runs.
`Table 1: Area and delay estimates for M1
`components
`Delay
`(ns)
`9.0
`--
`9 (max)
`4.0
`4.0
`7.5
`
`Component
`Name
`Reconfigurable Cell
`RC Array
`Tiny RISC and caches
`Frame Buffer
`Context Memory
`DMA Controller
`
`Area
`(sq. mm)
`0.8
`55
`8.5
`3.5
`4.0
`7.5
`
`6. Mapping applications to M1
`
`the mapping of video
`this section, we discuss
`In
`compression and automatic target recognition (ATR) to the
`MorphoSys M1 architecture. MPEG standards for video
`compression are important for realization of digital video
`services, such as video conferencing, video-on-demand,
`HDTV and digital TV. The major functions required of a
`typical MPEG encoder are:
`• Motion Estimation and Motion Compensation:
`• DCT and Quantization:
`•
`Inverse Quantization and Inverse DCT
`
`6.1. Motion Estimation for MPEG
`
`Motion estimation helps to identify redundancy between
`frames. The most popular technique for motion estimation
`is the block-matching algorithm [9]. Among the different
`block-matching methods, full search block matching
`(FSBM) involves the maximum computations but gives an
`optimal solution with low control overhead.
`Computation cost: For a reference block size of 16x16,
`it takes 36 clock cycles to finish the matching of three
`candidate blocks. VHDL simulation results show that 5304
`cycles are required to finish the matching of the whole
`search area. If the image size is 352x288 pixels at 30
`frames per second (MPEG-2 main profile, low level),
`processing of an entire image frame would take 2.1 x 106
`cycles. At clock rate of 100 MHz for MorphoSys, the
`computation time is ≈ 21.0 ms.
`
`

`

`6.3. Mapping MPEG-2 video encoder to M1
`
`We mapped all the functions for MPEG-2 video encoder,
`except the VLC encoding, to M1. A group of pictures
`consists of a sequence of four frames in the order IBBP.
`The total number of cycles required, are depicted in Fig. 7.
`
`Motion Est.
`
`MC, DCT and Q
`
`Inv Q, IDCT, Inv MC
`
`B frame
`
`P frame
`
`I frame
`
`0
`
`1250000
`
`2500000
`Cycles
`
`3750000
`
`5000000
`
`Fig. 7: M1 performance for video encoder
`
`6.4. Automatic Target Recognition (ATR)
`
`Automatic Target Recognition (ATR) is the machine
`function
`of
`automatically
`detecting,
`classifying,
`recognizing, and
`identifying an object. The ACS
`Surveillance challenge has been quantified as the ability to
`search 40,000 square nautical miles per day with one meter
`resolution. The processing model developed at Sandia
`National Labs is used for mapping ATR to MorphoSys.
`Performance analysis: We use the same system
`parameters that are chosen for ATR systems implemented
`using Xilinx XC4010 FPGA [15] and Splash 2 system
`[16]. For 16 pairs of target templates, the processing time
`is 21 ms for MorphoSys (at 100 MHz), 210 ms for the
`Xilinx FPGA system, and 195 ms for the Splash 2 system.
`
`Time (ms)
`
`195
`
`210
`
`300
`200
`100
`0
`
`21
`
`MorphoSys
`M1 (64 RCs)
`
`Splash 2 [16] Xilinx FPGAs
`[15]
`
`Fig. 8: Performance comparison for ATR
`
`7. Conclusions and future work
`
`In this paper, we presented features of MorphoSys, an
`integrated
`reconfigurable architecture, and provided
`performance estimates for several applications. The results
`reflect impressive performance for several of the target
`applications. We are implementing MorphoSys M1 on an
`actual chip, with PCI bus interface to Windows NT.
`
`8. Acknowledgments
`This research is supported by Defense and Advanced
`Research Projects Agency (DARPA) under contract F-
`33615-97-C-1126.
`
`3.
`
`4.
`
`9. References:
`1. D. Chen , J. Rabaey, “Reconfigurable Multi-processor IC for
`Rapid Prototyping of Algorithmic-Specific High-Speed
`Datapaths,” IEEE Journal of Solid-State Circuits, V. 27, No.
`12, Dec 92
`2. R. Hartenstein, R. Kress, “A Datapath Synthesis System for
`Reconfigurable Datapath Architecture,” Proc. of Asia and S.
`Pacific DAC, 1995, pp. 479-484
`E. Tau, D. Chen, I. Eslick, J. Brown and A. DeHon, “A First
`Generation DPGA Implementation,” FPD’95, Canadian
`Workshop of Field-Prog. Devices, May 1995
`T. Miyamori and K. Olukotun, “A Quantitative Analysis of
`Reconfigurable Coprocessors for Multimedia Applications,”
`Proc. of IEEE Symp. on Field-Prog. Custom Computing
`Machines, April 1998
`5. M. Gokhale, W. Holmes, A. Kopser, S. Lucas, R. Minnich,
`D. Sweely, D. Lopresti, “Building and Using a Highly
`Parallel Programmable Logic Array,” IEEE Computer, pp.
`81-89, Jan. 1991
`P. Bertin, D. Roncin, and J. Vuillemin, “Introduction to
`Programmable Active Memories,”
`in Systolic Array
`Processors, Prentice Hall, 1989, pp. 300-309
`7. A. Abnous, C. Christensen, J. Gray, J. Lenell, A. Naylor and
`N. Bagherzadeh, “ Design and Implementation of the Tiny
`RISC microprocessor,” Microprocessors and Microsystems,
`Vol. 16, No. 4, pp. 187-94, 1992
`F. Bonomini, F. De Marco-Zompit, G. A. Mian, A. Odorico,
`“Implementing an MPEG2 Video Decoder Based on
`TMS320C80 MVP,” SPRA 332, Texas Instr., Sep 1996
`9. C. Hsieh, T. Lin, “VLSI Architecture For Block-Matching
`Motion Estimation Algorithm,” IEEE Trans. on CSVT, vol.
`2, pp. 169-175, June 1992
`10. S.H Nam, J.S. Baek, T.Y. Lee and M. K. Lee, “ A VLSI
`Design for Full Search Block Matching Motion Estimation,”
`Proc. of IEEE ASIC Conf, Rochester, NY, Sep 94, pp. 254-7
`11. K-M Yang, M-T Sun and L. Wu, “ A Family of VLSI
`Designs
`for Motion Compensation Block Matching
`Algorithm,” IEEE Trans. on Circuits and Systems, V. 36,
`No. 10, Oct 89, pp. 1317-25
`Pentium MMX,
`for
`Intel
`Application
`Notes
`http://developer.intel.com/drg/mmx/appnotes/
`13. W-H Chen, C. H. Smith and S. C. Fralick, “A Fast
`Computational Algorithm for Discrete Cosine Transform,”
`IEEE Trans. on Comm., vol. COM-25, No. 9, Sep 1977
`14. T. Arai,
`I. Kuroda, K. Nadehara and K. Suzuki,
`“V830R/AV: Embedded Multimedia Superscalar RISC
`Proc’r,” IEEE MICRO, Mar/Apr 1998, pp.36-47
`15. J. Villasenor, B. Schoner, K. Chia, C. Zapata, H. J. Kim, C.
`Jones, S. Lansing, and B. Mangione-Smith, “ Configurable
`Computing Solutions for Automatic Target Recognition,”
`Proc. of IEEE Workshop on FCCM, April 1996
`16. M. Rencher, B. Hutchings, " Automated Target Recognition
`on SPLASH 2," Proc. of IEEE Symp. on FCCM, April 1997
`
`6.
`
`8.
`
`12.
`
`Petitioner Microsoft Corporation - Ex. 1058, p. 139
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket