Case 3:14-cv-00757-REP-DJN Document 1-3 Filed 11/04/14 Page 1 of 33 PageID# 51

EXHIBIT A

US005860158A
United States Patent [19]
Pai et al.

[11] Patent Number: 5,860,158
[45] Date of Patent: Jan. 12, 1999
`
[54] CACHE CONTROL UNIT WITH A CACHE REQUEST TRANSACTION-ORIENTED PROTOCOL

[75] Inventors: Yet-Ping Pai, Milpitas; Le T. Nguyen, Monte Sereno, both of Calif.

[73] Assignee: Samsung Electronics Company, Ltd., Seoul, Rep. of Korea

[21] Appl. No.: 751,149
`
[22] Filed: Nov. 15, 1996

[51] Int. Cl.6 .......................... G06F 13/00
[52] U.S. Cl. .......................... 711/118; 711/130
[58] Field of Search .................. 711/119, 120, 130, 140, 118
`
[56] References Cited

U.S. PATENT DOCUMENTS

4,695,943   9/1987  Keeley et al. ........ 711/140
4,701,844  10/1987  Thompson et al. ...... 711/119
4,707,784  11/1987  Ryan et al. .......... 711/140
4,899,275   2/1990  Sachs et al. ......... 711/3
5,377,345  12/1994  Chang et al. ......... 395/425
5,418,973   5/1995  Ellis et al. ......... 395/800
5,524,265   6/1996  Balmer et al. ........ 711/212
5,574,849  11/1996  Sonnier et al. ....... 395/182.1
5,659,782   8/1997  Senter et al. ........ 395/800.23
`
Primary Examiner—Tod R. Swann
Assistant Examiner—Felix B. Lee
Attorney, Agent, or Firm—Skjerven, Morrill, MacPherson, Franklin & Friel, L.L.P.; Stephen A. Terrile

[57] ABSTRACT

A cache control unit and a method of controlling a cache. The cache is coupled to a cache accessing device. A first cache request is received from the device. A first request identification information is assigned to the first cache request and provided to the requesting device. The first cache request may begin to be processed. A second cache request is received from the cache accessing device. A second request identification information is assigned to the second cache request and provided to the requesting device. The first and second cache requests are finally fully serviced.

37 Claims, 8 Drawing Sheets
`
[Front-page figure: ARM_CCU Interface State Machine (A_SM)]
[Sheet 1 of 8 — FIG. 1: Block diagram of multimedia signal processor 100. Processing core 102 contains general purpose processor 110 and vector processor 120. Cache system 130 contains ICACHE 142 and DCACHE 144 (block 140), ICACHE 172 and DCACHE 174 (block 170), ROM 150, and cache control unit 160. IOBUS devices: system timer 182, UART 184, bitstream processor 186, interrupt controller 188. FBUS device: local bus interface 196.]
`
[Sheet 2 of 8 — FIG. 2: Block diagram of cache system 130. SRAM 210 with tag section 212 (write port 214, read port 216) holds GPP ICACHE 142A/TAG 142B, GPP DCACHE 144A/TAG 144B, VECTOR ICACHE 172A/TAG 172B, and VECTOR DCACHE 174A/TAG 174B; ROM 150 holds ROM CACHE 150A/TAG 150B. Cache control unit 160, data pipeline 220, and address pipeline 230 (ports 232, 234, 236) connect the vector (instruction/data) and GPP ports to FBUS and IOBUS.]
`
[Sheet 3 of 8 — FIG. 3: Block diagram of data pipeline 220. GP READ MUX 360, CACHE READ MUX 340, CACHE WRITE MUX 350, IOMUX 320, FBUS MUX 310, and BUFFER 330 connect general purpose processor 110, vector processor 120, IOBUS 180, FBUS 190, SRAM 210, and ROM 150.]
`
[Sheet 4 of 8 — FIG. 4]
`
[Sheet 5 of 8 — FIG. 5: Block diagram of address pipeline 230. Latches MEM_ADR_LAT, ADR_Q0_WB_LAT, and WB_LAT_TMP; tags UD_TAG 522, RD_TAG 506-2, and WR_TAG 506-1; address queues RD_ADR_Q and WR_ADR_Q (entries ADR_Q1–ADR_Q3); mux latches WR_ADR_MUX_LAT and RD_ADR_MUX_LAT 520; comparators 510, 511, and 521; signals RETURN_ADR, RETURN_ID, TAG_OUT, RAM_WRITE_ADR 550, and MCP_BASE; CACHE_ROM 150 and CACHE_RAM 210; interfaces to GPP 110, vector processor 120, IOBUS 180, and FBUS 190.]
`
[Sheet 6 of 8 — FIG. 6: ARM_CCU interface state machine (A_SM). FIG. 7: CCU_FBUS interface state machine (F_SM).]
`
[Sheet 7 of 8 — FIG. 8: Data receiver state machine (D_SM).]
`
`
[Sheet 8 of 8 — FIG. 9: Read state machine (RD_SM) and write state machine (WR_SM).]
`
`
CACHE CONTROL UNIT WITH A CACHE REQUEST TRANSACTION-ORIENTED PROTOCOL

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
`
BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to providing processors with fast memory access and, more particularly, to providing control of cache memory systems.

2. Description of the Related Art

Processors often employ memories which are relatively slow when compared to the clock speeds of the processors. To speed up memory access for such processors, a relatively small amount of fast memory can be used in a data cache. A cache can mediate memory accesses and lessen the average memory access time for all or a large portion of the address space of a processor even though the cache is small relative to the address space. Caches do not occupy a specific portion of the address space of the processor but instead include tag information which identifies addresses for information in lines of the cache.

Typically, a cache compares an address received from a processor to tag information stored in the cache to determine whether the cache contains a valid entry for the memory address being accessed. If such a cache entry exists (i.e. if there is a cache hit), the processor accesses (reads from or writes to) the faster cache memory instead of the slower memory. In addition to tag information, a cache entry typically contains a "validity" bit and a "dirty" bit which respectively indicate whether the associated information in the entry is valid and whether the associated information contains changes to be written back to the slower memory. If there is no cache entry for the address being accessed (i.e. there is a cache miss), access to the slower memory is required for the cache to create a new entry for the just accessed memory address.
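The tag comparison described above can be sketched as follows. This is a minimal illustrative model, not the patent's hardware; the class and method names are invented for illustration:

```python
class CacheLine:
    """One cache line: tag plus validity and dirty bits, as described above."""
    def __init__(self):
        self.tag = None
        self.valid = False
        self.dirty = False
        self.data = None

class SimpleCache:
    """Direct-mapped sketch: an index selects a line, a tag match decides hit or miss."""
    def __init__(self, num_lines=4, line_size=32):
        self.line_size = line_size
        self.num_lines = num_lines
        self.lines = [CacheLine() for _ in range(num_lines)]

    def lookup(self, address):
        index = (address // self.line_size) % self.num_lines
        tag = address // (self.line_size * self.num_lines)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return "hit", line
        return "miss", line

    def fill(self, address, data):
        """On a miss, data fetched from the slower memory creates a new entry."""
        status, line = self.lookup(address)
        if status == "miss":
            line.tag = address // (self.line_size * self.num_lines)
            line.valid = True
            line.dirty = False
            line.data = data
        return line
```

A write to a valid line would set its dirty bit, marking the line for eventual write-back to the slower memory.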
Caches use cache policies such as "least recently used" or "not last used" replacement techniques to determine which existing entries are replaced with new entries. Typically, computer programs access the same memory addresses repeatedly. Therefore, the most recently accessed data is likely to be accessed again soon after the initial access. Because recently accessed data is available in the cache for subsequent accesses, caches can improve access time across the address space of the processor.
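The "least recently used" replacement technique mentioned above can be sketched as follows (an illustrative sketch only; the patent does not specify this implementation):

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU replacement sketch: recently used entries survive, the oldest is evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> data, least recently used first

    def access(self, address, fetch):
        if address in self.entries:
            self.entries.move_to_end(address)  # hit: mark as most recently used
            return self.entries[address]
        data = fetch(address)  # miss: go to the slower memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[address] = data
        return data
```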
A different method for increasing processor speed is the use of parallel processing techniques. For example, by providing a number of functional units which perform different tasks, a "very long instruction word" (VLIW) processor can perform multiple functions through a single instruction. Also, a general purpose processor and a vector processor may be integrated to operate in parallel. An integrated multiprocessor is able to achieve high performance with low cost since the two processors perform only tasks ideally suited for each processor. For example, the general purpose processor runs a real time operating system and performs overall system management while the vector processor is used to perform parallel calculations using data structures called "vectors". (A vector is a collection of data elements typically of the same type.) Multiprocessor configurations are especially advantageous for operations involving digital signal processing such as coding and decoding video, audio, and communications data.
`
SUMMARY OF THE INVENTION

It has been discovered that accesses to a cache by multiple devices may be managed by a cache control unit that includes transaction identification logic to identify cache accesses. Such an apparatus provides the advantage of improving performance by increasing the speed of memory accesses by one or more devices. Specifically, such an apparatus allows the cache to service later arriving requests before earlier arriving requests.

In one embodiment of the present invention, a cache is coupled to a cache accessing device. A first cache request is received from the device. A first request identification information is assigned to the first cache request and provided to the requesting device. The first cache request may begin to be processed. A second cache request is received from the cache accessing device. A second request identification information is assigned to the second cache request and provided to the requesting device. The first and second cache requests are finally fully serviced.
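The request flow described in this embodiment can be sketched as follows. The class and method names are invented for illustration and are not taken from the patent:

```python
import itertools

class CacheControlUnit:
    """Sketch of a control unit that tags each cache request with an
    identification code and may complete requests out of arrival order."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # request_id -> requested address

    def submit(self, address):
        """Accept a cache request and immediately return its request ID,
        before the request has been serviced."""
        request_id = next(self._ids)
        self.pending[request_id] = address
        return request_id

    def complete(self, request_id, data):
        """Fully service a pending request, in any order."""
        self.pending.pop(request_id)
        return (request_id, data)
```

Because the requester holds an ID for each outstanding request, a later-arriving request can be completed first while an earlier one is still being serviced.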
In another embodiment, a cache system includes a cache for temporarily storing information and a cache control unit. The cache control unit includes access control logic, identification logic, and result logic. The access control logic receives and executes cache accesses by a cache accessing device. The identification logic assigns request identification information to each of the cache accesses, and provides the request identification information to the cache accessing device. The identification logic is capable of providing the request identification information prior to the execution of the cache accesses by the access control logic. The result logic provides the request identification information and the data requested by the cache accessing device to the cache accessing device if the cache access was a read.
`
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.

FIG. 1 shows a block diagram of a multimedia signal processor in accordance with an embodiment of the invention.

FIG. 2 shows a block diagram of a cache system in accordance with an embodiment of the invention.

FIG. 3 shows a block diagram of a data pipeline used in a cache system in accordance with an embodiment of the invention.

FIG. 4 shows a block diagram of a data pipeline used in a cache system in accordance with an embodiment of the invention.

FIG. 5 shows a block diagram of an address pipeline used in a cache system in accordance with an embodiment of the invention.

FIG. 6 shows a state diagram of a cache control unit and processor interface in accordance with an embodiment of the invention.

FIG. 7 shows a state diagram of a cache control unit and bus interface in accordance with an embodiment of the invention.
`
`
FIG. 8 shows a state diagram of a data receiver state machine in accordance with an embodiment of the invention.

FIG. 9 shows a state diagram of a read/write state machine in accordance with an embodiment of the invention.

The use of the same reference symbols in different drawings indicates similar or identical items.
`
DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

The following sets forth a detailed description of the preferred embodiments. The description is intended to be illustrative of the invention and should not be taken to be limiting. Many variations, modifications, additions and improvements may fall within the scope of the invention as defined in the claims that follow.

Referring to FIG. 1, processor 100 includes a general purpose processor 110 coupled to a vector processor 120. General purpose processor 110 and vector processor 120 are coupled via control bus 112 and interrupt line 114. General purpose processor 110 and vector processor 120 are coupled to cache system 130 via bus 116 and bus 118, respectively. Cache system 130 is coupled to input/output bus (IOBUS) 180 and fast bus (FBUS) 190. IOBUS 180 is coupled to system timer 182, universal asynchronous receiver-transmitter (UART) 184, bitstream processor 186 and interrupt controller 188. FBUS 190 is coupled to device interface 192, direct memory access (DMA) controller 194, local bus interface 196 and memory controller 198.

General purpose processor 110 and vector processor 120 execute separate program threads in parallel. General purpose processor 110 typically executes instructions which manipulate scalar data. Vector processor 120 typically executes instructions having vector operands, i.e., operands each containing multiple data elements of the same type. In some embodiments, general purpose processor 110 has a limited vector processing capability. However, applications that require multiple computations on large arrays of data are not suited for scalar processing or even limited vector processing. For example, multimedia applications such as audio and video data compression and decompression require many repetitive calculations on pixel arrays and strings of audio data. To perform real-time multimedia operations, a general purpose processor which manipulates scalar data (e.g. one pixel value or sound amplitude per operand) or only small vectors must operate at a high clock frequency. In contrast, a vector processor executes instructions where each operand is a vector containing multiple data elements (e.g. multiple pixel values or sound amplitudes). Therefore, vector processor 120 can perform real-time multimedia operations at a fraction of the clock frequency required for general purpose processor 110 to perform the same function. Thus, by allowing an efficient division of the tasks required for, e.g., multimedia applications, the combination of general purpose processor 110 and vector processor 120 provides high performance per cost. Although in the preferred embodiment, processor 100 is for multimedia applications, processor 100 may be any type of processor.

In one embodiment, general purpose processor 110 executes a real-time operating system designed for a media circuit board communicating with a host computer system. The real-time operating system communicates with a primary processor of the host computer system, services input/output (I/O) devices on or coupled to the media circuit board, and selects tasks which vector processor 120 executes. In that embodiment, vector processor 120 is designed to perform computationally intensive tasks requiring the manipulation of large data blocks, while general purpose processor 110 acts as the master processor to vector processor 120.

In the exemplary embodiment, general purpose processor 110 is a 32-bit RISC processor which operates at 40 MHz and conforms to the standard ARM7 instruction set. The architecture for an ARM7 reduced instruction set computer (RISC) processor and the ARM7 instruction set is described in the ARM7DM Data Sheet available from Advanced RISC Machines Ltd. General purpose processor 110 also implements an extension of the ARM7 instruction set which includes instructions for an interface with vector processor 120. The extension to the ARM7 instruction set for the exemplary embodiment of the invention is described in copending U.S. patent application Ser. No. 08/699,295, attorney docket No. M-4366 U.S., filed on Aug. 19, 1996, entitled "System and Method for Handling Software Interrupts with Argument Passing," naming Seungyoon Peter Song, Moataz A. Mohamed, Heon-Chul Park and Le Nguyen as inventors, which is incorporated herein by reference in its entirety. General purpose processor 110 is coupled to vector processor 120 by control bus 112 to carry out the extension of the ARM7 instruction set. Furthermore, interrupt line 114 is used by vector processor 120 to request an interrupt on general purpose processor 110.

In the exemplary embodiment, vector processor 120 has a single-instruction-multiple-data (SIMD) architecture and manipulates both scalar and vector quantities. In the exemplary embodiment, vector processor 120 consists of a pipelined reduced instruction set computer (RISC) central processing unit (CPU) that operates at 80 MHz and has a 288-bit vector register file. Each vector register in the vector register file can contain up to 32 data elements. A vector register can hold thirty-two 8-bit or 9-bit integer data elements, sixteen 16-bit integer data elements, or eight 32-bit integer or floating point elements. Additionally, the exemplary embodiment can also operate on a 576-bit vector operand spanning two vector registers.

The instruction set for vector processor 120 includes instructions for manipulating vectors and for manipulating scalars. The instruction set for the exemplary embodiment of the invention and an architecture for implementing the instruction set is described in the pending U.S. patent application Ser. No. 08/699,597, attorney docket No. M-4355 U.S., filed on Aug. 19, 1996, entitled "Single-Instruction-Multiple-Data Processing in a Multimedia Signal Processor," naming Le Trong Nguyen as inventor, which is incorporated herein by reference in its entirety.

General purpose processor 110 performs general tasks and executes a real-time operating system which controls communications with device drivers. Vector processor 120 performs vector tasks. General purpose processor 110 and vector processor 120 may be scalar or superscalar processors. The multiprocessor operation of the exemplary embodiment of the invention is more fully described in pending U.S. patent application Ser. No. 08/697,102, attorney docket No. M-4354 U.S., filed on Aug. 19, 1996, entitled "Multiprocessor Operation in a Multimedia Signal Processor," naming Le Trong Nguyen as inventor, which is incorporated herein by reference in its entirety.

Referring again to FIG. 1, cache system 130 contains a fast random access memory (RAM) block (shown graphically as blocks 140 and 170), read only memory (ROM) 150 and a cache control unit 160. Cache system 130 can configure the RAM block into (i) an instruction cache 142 and a data cache 144 for general purpose processor 110, and (ii) an instruction cache 172 and data cache 174 for vector processor 120. In the preferred embodiment, RAM block 140, 170 includes static RAM (SRAM).
In an embodiment of a computer system according to the invention, general purpose processor 110 and vector processor 120 share a variety of on-chip and off-chip resources which are accessible through a single address space. Cache system 130 couples a memory to any of several memory mapped devices such as bitstream processor 186, UART 184, DMA controller 194, local bus interface 196, and a coder-decoder (CODEC) device interfaced through device interface 192. Cache system 130 can use a transaction-oriented protocol to implement a switchboard for data access among the processors and memory mapped resources. For example, the transaction-oriented protocol provides that if completion of an initial cache transaction is delayed (e.g., due to a cache miss), other cache access transactions may proceed prior to completion of the initial transaction. Thus, a "step-aside-and-wait" capability is provided in this embodiment of a cache management system according to the invention. A similar transaction-oriented protocol is further described in pending U.S. patent application Ser. No. 08/731,393, attorney docket No. M-4398 U.S., filed on Oct. 18, 1996, entitled "Shared Bus System with Transaction and Destination ID," naming Amjad Z. Qureshi and Le Trong Nguyen as inventors, which is incorporated herein by reference in its entirety.
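The "step-aside-and-wait" behavior described above can be sketched as follows: a transaction that misses steps aside so that later transactions may complete first. This is an illustrative model only; all names are invented:

```python
class StepAsideCache:
    """Sketch of step-aside-and-wait: a delayed transaction (cache miss)
    does not block later transactions from completing first."""
    def __init__(self, contents):
        self.contents = dict(contents)   # address -> data currently cached
        self.waiting = []                # transactions stalled on a miss
        self.completed = []              # (address, data) in completion order

    def request(self, address):
        if address in self.contents:
            self.completed.append((address, self.contents[address]))
        else:
            self.waiting.append(address)  # step aside; a memory fetch is pending

    def memory_returns(self, address, data):
        """The slower memory finally answers; the stalled transaction completes."""
        self.waiting.remove(address)
        self.contents[address] = data
        self.completed.append((address, data))
```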
Cache system 130 couples general purpose processor 110 and vector processor 120 to two system busses: IOBUS 180 and FBUS 190. IOBUS 180 typically operates at a slower frequency than FBUS 190. Slower speed devices are coupled to IOBUS 180, while higher speed devices are coupled to FBUS 190. By separating the slower speed devices from the higher speed devices, the slower speed devices are prevented from unduly impacting the performance of the higher speed devices.

Cache system 130 also serves as a switchboard for communication between IOBUS 180, FBUS 190, general purpose processor 110, and vector processor 120. In most embodiments of cache system 130, multiple simultaneous accesses between the busses and processors are possible. For example, vector processor 120 is able to communicate with FBUS 190 at the same time that general purpose processor 110 is communicating with IOBUS 180. In one embodiment of the invention, the combination of the switchboard and caching function is accomplished by using direct mapping techniques for FBUS 190 and IOBUS 180. Specifically, the devices on FBUS 190 and IOBUS 180 can be accessed by general purpose processor 110 and vector processor 120 by standard memory reads and writes at appropriate addresses.

FBUS 190 provides an interface to the main memory. The interface unit to the memory is composed of a four-entry address queue and a one-entry write-back latch. The interface can support one pending refill (read) request from general purpose processor instruction cache 142, one pending refill (read) request from vector processor instruction cache 172, one write request from vector processor data cache 174, and one write-back request from vector processor data cache due to a dirty cache line.

FBUS 190 is coupled to various high speed devices such as a memory controller 198 and a DMA controller 194, a local bus interface 196, and a device interface 192. Memory controller 198 and DMA controller 194 provide memory interfaces. Local bus interface 196 provides an interface to a local bus coupled to a processor. Device interface 192 provides interfaces to various digital-to-analog and analog-to-digital converters (DACs and ADCs, respectively) that may be coupled to processor 100 for video, audio or communications applications.
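The direct-mapped device access described above amounts to dispatching each memory read or write to a bus and device by address range. The address ranges below are hypothetical, purely for illustration; the patent does not give concrete ranges:

```python
# Hypothetical address ranges for illustration only.
ADDRESS_MAP = [
    (0x0000_0000, 0x0FFF_FFFF, "main memory (FBUS, memory controller 198)"),
    (0x1000_0000, 0x1000_FFFF, "UART 184 (IOBUS)"),
    (0x1001_0000, 0x1001_FFFF, "bitstream processor 186 (IOBUS)"),
    (0x2000_0000, 0x2FFF_FFFF, "local bus interface 196 (FBUS)"),
]

def route(address):
    """Return which bus/device a standard memory read or write reaches."""
    for low, high, device in ADDRESS_MAP:
        if low <= address <= high:
            return device
    raise ValueError(f"unmapped address {address:#x}")
```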
Memory controller 198 provides an interface for a local memory if a local memory is provided for processor 100. Memory controller 198 controls reads and writes to the local memory. In the exemplary embodiment, memory controller 198 is coupled to and controls one bank of synchronous dynamic RAMs (two 1M×16 SDRAM chips) configured to use 24 to 26 address bits and 32 data bits and having the features of: (i) a "CAS-before-RAS" refresh protocol, performed at a programmable refresh rate, (ii) partial writes that initiate Read-Modify-Write operations, and (iii) internal bank interleave. Memory controller 198 also provides a 1:1 frequency match between the local memory and FBUS 190, manual "both bank precharge", and address and data queuing to better utilize FBUS 190. Synchronous DRAMs are known to effectively operate at such frequencies (80 MHz), and standard fast page DRAMs and extended data out (EDO) DRAMs could also be used. DRAM controllers with capabilities similar to memory controller 198 in the exemplary embodiment are known in the art.

DMA controller 194 controls direct memory accesses between the main memory of a host computer and the local memory of processor 100. Such DMA controllers are well known in the art. In some embodiments of the invention, a memory data mover is included. The memory data mover performs DMA from one block of memory to another block of memory.

Local bus interface 196 implements the required protocol for communications with a host computer via a local bus. In the exemplary embodiment, local bus interface 196 provides an interface to a 33-MHz, 32-bit PCI bus. Such interfaces are well known in the art.

Device interface 192 provides a hardware interface for devices such as audio, video and communications DACs and ADCs which would typically be on a printed circuit board with a processor 100 adapted for multimedia applications. Device interface 192 may be customized for the particular application of processor 100. In particular, device interface 192 might only provide an interface for specific devices or integrated circuits (ICs). Typical units within device interface 192 provide an interface for connection of standard ADCs, DACs, or CODECs. Designs for ADC, DAC, and CODEC interfaces are well known in the art and not described further here. Other interfaces which might be employed include but are not limited to an integrated services digital network (ISDN) interface for digital telephone and interfaces for busses such as for a microchannel bus. In one embodiment of processor 100, device interface 192 is an application specific integrated circuit (ASIC) which can be programmed to perform a desired functionality.

In the preferred embodiment, IOBUS 180 operates at a frequency (40 MHz) that is lower than the operating frequency (80 MHz) of FBUS 190. Also in the preferred embodiment, IOBUS 180 is coupled to system timer 182, UART 184, bitstream processor 186, and interrupt controller 188.

System timer 182 interrupts general purpose processor 110 at scheduled intervals which are selected by writing to registers corresponding to system timer 182. In the exemplary embodiment, system timer 182 is a standard Intel 8254 compatible interval timer having three independent 16-bit counters and six programmable counter modes.
`
`
`
`
UART 184 is a serial interface which is compatible with the common 16450 UART integrated circuit. The 16450 UART IC is for use in modem or facsimile applications which require a standard serial communication ("COM") port of a personal computer.

Bitstream processor 186 is a fixed hardware processor which performs specific functions on an input or output bitstream. In the exemplary embodiment, bitstream processor 186 performs initial or final stages of MPEG coding or decoding. In particular, bitstream processor 186 performs variable length (Huffman) coding and decoding, and packing and unpacking of video data in "zig-zag" format. Bitstream processor 186 operates in parallel with and under the control of general purpose processor 110 and vector processor 120. Processors 110 and 120 configure bitstream processor 186 via control registers. An exemplary embodiment of bitstream processor 186 is described in pending U.S. patent application Ser. No. 08/699,303, attorney docket No. M-4368 U.S., filed on Aug. 19, 1996, entitled "Methods and Apparatus for Processing Video Data," naming Cliff Reader, Jae Cheol Son, Amjad Qureshi and Le Nguyen as inventors, which is incorporated herein by reference in its entirety.

Interrupt controller 188 controls interrupts of general purpose processor 110 and supports multiple interrupt priorities. A mask register is provided to allow each interrupt priority to be individually masked. In the exemplary embodiment, interrupt controller 188 is programmable and implements the standard Intel 8259 interrupt system that is common in x86-based personal computers. A highest priority (level 0) interrupt is assigned to system timer 242. Priority levels 1, 2, 3, and 7 are respectively assigned to a virtual frame buffer, DMA controller 194 and device interface 192, bitstream processor 186, local bus interface 196, and UART 184. Interrupt priority levels 4, 5, and 6 are unassigned in the exemplary embodiment of the invention. The virtual frame buffer at priority level 1, which is included in some embodiments of the invention, emulates a standard VGA frame buffer.
Referring to FIG. 2, cache system 130 includes SRAM block 210, ROM 150, data pipeline 220, address pipeline 230 and cache control unit 160. SRAM block 210, ROM 150 and cache control unit 160 are each separately coupled to data pipeline 220 and to address pipeline 230. Data pipeline 220 is coupled to IOBUS 180, FBUS 190, general purpose processor 110 and vector processor 120. Address pipeline 230 is coupled to general purpose processor 110 and vector processor 120.

SRAM block 210 is divided into four memory banks to form instruction cache 142 and data cache 144 for use with general purpose processor 110, as well as instruction cache 172 and data cache 174 for use with vector processor 120. In any cycle, cache system 130 can accept one read request and one write request. SRAM block 210 is a dual-ported memory circuit, with read port 216 and write port 214, so that simultaneous reading and writing of SRAM block 210 is supported. SRAM block 210 also contains a tag section 212 which is subdivided into TAG 142B, TAG 144B, TAG 172B and TAG 174B for each of the respective memory banks 142A, 144A, 172A and 174A. The tag RAM has two read ports. The read port address and the write port address can be compared with the internal cache tags for hit or miss condition. The tag information for each cache line includes a tag, two validity bits, two dirty bits, and use information. Each validity bit and dirty bit corresponds to a 32-byte half of a cache line, which is equal to the amount of data transferred by a single read or write operation. Each dirty bit indicates a single 256-bit write to external memory, and each validity bit indicates a single 256-bit read from external memory. The used bits are for the entry replacement scheme used to create new entries. Four sets of cache bank select signals and three sets of line indices are needed to access SRAM block 210.
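The per-line tag layout just described (a tag, two validity bits, two dirty bits, and use information, with each bit covering one 32-byte half-line) can be sketched as follows; the field and method names are invented for illustration:

```python
from dataclasses import dataclass, field

HALF_LINE_BYTES = 32  # each validity/dirty bit covers a 32-byte (256-bit) half-line

@dataclass
class TagEntry:
    """Tag information for one 64-byte cache line, per the description above."""
    tag: int = 0
    valid: list = field(default_factory=lambda: [False, False])  # one bit per half-line
    dirty: list = field(default_factory=lambda: [False, False])  # one bit per half-line
    used: bool = False  # use information for the entry replacement scheme

    def write_half(self, half_index):
        """A write fills and dirties one 32-byte half of the line."""
        self.valid[half_index] = True
        self.dirty[half_index] = True
        self.used = True

    def needs_writeback(self):
        """Any dirty half must eventually be written back to external memory."""
        return any(self.dirty)
```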
ROM 150 includes ROM cache field 150A and ROM tag field 150B. ROM 150 can be configured as a cache. Although tag field 150B cannot be modified, individual addresses can be marked as invalid so that data or instructions can be brought from memory to be used in place of the data or instructions in ROM 150. ROM 150 contains frequently used instructions and data for general purpose processor 110 and vector processor 120. In the exemplary embodiment, ROM 150 contains: reset and initialization procedures; self-test diagnostics procedures; interrupt and exception handlers; subroutines for soundblaster emulation; subroutines for V.34 modem signal processing; general telephony functions; 2-dimensional and 3-dimensional graphics subroutine libraries; and subroutine libraries for audio and video standards such as MPEG-1, MPEG-2, H.261, H.263, G.728, and G.723.
Data pipeline 220 performs the data switchboard function of cache system 130. Data pipeline 220 is able to create multiple simultaneous data communication paths between IOBUS 180, FBUS 190, general purpose processor 110, vector processor 120 and SRAM block 210 and ROM 150. Similarly, address pipeline 230 performs switchboard functions for addresses. In the embodiment of FIG. 2, IOBUS 180 and FBUS 190 use time multiplexing for address and data signals. Cache control unit 160 provides the control lines to data pipeline 220 and address pipeline 230 to properly configure the communication channels.
In some embodiments of cache system 130, a transaction-based protocol is used to support all read and write operations. Any unit coupled to cache system 130, such as general purpose processor 110, vector processor 120, or the various devices coupled to IOBUS 180 and FBUS 190, can place a request to cache system 130. Such a request is formed by a device identification code ("device ID") and an address of the requested memory location. Each unit has a distinct device ID and cache system 130 can prioritize the requests based on the device ID of the unit making the request. When the data at the requested address becomes available, cache system 130 responds with the device ID, a transaction identification code ("transaction ID"), the address, and the requested data.
If the requested address is not contained in SRAM block 210 or ROM 150, cache system 130 will not be able to respond to the specific request for several clock cycles while the data at the memory address is retrieved. However, while the data of a first request is being