United States Patent (19)
Garde
US005922076A
(11) Patent Number: 5,922,076
(45) Date of Patent: Jul. 13, 1999
(54) CLOCKING SCHEME FOR DIGITAL SIGNAL PROCESSOR SYSTEM
(75) Inventor: Douglas Garde, Dover, Mass.
(73) Assignee: Analog Devices, Inc., Norwood, Mass.
(21) Appl. No.: 08/931,665
(22) Filed: Sep. 16, 1997
(51) Int. Cl.: G06F 1/04
(52) U.S. Cl.: 713/600
(58) Field of Search: 395/553, 555, 556, 559; 713/500, 501, 600; 709/400, 248
(56) References Cited
U.S. PATENT DOCUMENTS
5,611,075   3/1997   Garde          395/480
5,619,720   4/1997   Garde et al.   395/800
5,685,005   11/1997  Garde et al.   395/800
Primary Examiner: Thomas M. Heckler
Attorney, Agent, or Firm: Wolf, Greenfield & Sacks, P.C.
(57) ABSTRACT
A digital signal processing system includes a cluster of processors and a host. A host can access each of the processors through an external bus system that interconnects the host with each of the processors. An external port of each of the processors operates at one of a local clock frequency and host clock frequency, the local clock frequency and host clock frequency being asynchronous with one another. The host operates at the host clock frequency. Upon a host access of one of the processors, the clock frequency of operation of the external parallel port of each processor automatically is controlled to operate at the host clock frequency. In an embodiment, each processor also includes a core processor that operates at a core clock frequency that is a multiple of the local clock frequency, asynchronous with the host clock frequency. Thus, the speed of operation of the core processor and that of the external parallel port can be optimized independently.
`
`12 Claims, 6 Drawing Sheets
`
`IPR2023-00037
`Apple EX1028 Page 1
[Drawing Sheet 1 of 6. FIG. 1: labels recoverable from the figure are LCLK, HCLK, ANALOG SWITCH, BUFFERS (HCLK's to each destination) and BUFFERS (LCLK's to each destination). FIG. 2: labels recoverable from the figure are LCLK, HCLK, MUX and MCLK.]
[Drawing Sheet 2 of 6. FIG. 3: figure labels are not recoverable from this extraction.]
[Drawing Sheet 3 of 6. FIG. 4: labels recoverable from the figure are LCLK, HCLK, PERIPH. 126, FREQ MULT 128, CCLK on line 130 and CORE 132.]
[Drawing Sheet 4 of 6. Labels recoverable from the figure: DELAY LOCKED LOOP, PHASE CONTROL, COPY OF DISTRIBUTION TREE, COPY OF DELAY, INVERTER CHAIN, UPDATEN, TRIEN, OUTPUT PAD.]
[Drawing Sheet 5 of 6. FIG. 6: labels recoverable from the figure are IDFIFO, OAFIFO, DIRECT WRITE, ADR DEST, OBUF, DIRECT READ (SLAVE), EXT ADR and INTERNAL DATA BUSES.]
[Drawing Sheet 6 of 6. Figure labels are not recoverable from this extraction.]
`1
`CLOCKING SCHEME FOR DIGITAL SIGNAL
`PROCESSOR SYSTEM
`
`FIELD OF THE INVENTION
`The present invention relates to digital Signal processors,
`and more Specifically, to a digital Signal processor System
`and method having a unique asynchronous clocking Scheme.
`BACKGROUND OF THE INVENTION
`A digital signal processor (DSP) is a special purpose
`computer that is designed to optimize performance for
`digital Signal processing applications. Such as, for examples,
`fast Fourier transforms, digital filtering, image processing
`and Speech recognition. Digital Signal processing applica
`tions typically are characterized by real time operation, high
`interrupt rates and intensive numeric computations. In
`addition, digital Signal processing applications tend to be
`intensive in memory access operations and to require the
`input and output of large quantities of data. Thus, designs of
`digital Signal processors may be quite different from those of
`general purpose computers.
`A typical digital Signal processor includes at least one
`memory for Storing digital Signal processing operations
`instructions as well as operands used in the digital Signal
`processing operations, and a core processor, connected to the
`memory, for carrying out Such operations. A digital Signal
`processor also typically includes a peripheral input/output
`(I/O) device enabling communication with, and the transfer
`of data to/from, other processors and/or external devices.
`The core processor includes Some type of computation unit
`for performing the digital signal processing operations (i.e.,
`computations) on the operands based on the instructions.
`Many different computational Schemes as well as data
`Storage and transferring Schemes have been developed for
`optimizing Speed, accuracy, size and performance of digital
`Signal processors.
`A digital Signal processor commonly operates based upon
`receipt of a Single input clock. From this Single input clock
`are derived a core processor clock, on which the core
`processor operates, and an I/O clock, on which the I/O
`device operates. It is not uncommon for the input clock and
`the I/O clock to be maintained at the same frequency.
`The core processor clock may be a multiple of this input
`clock Such that the core processor operates at a different
`(typically greater) clock frequency than that of the I/O
`device. The speed of the I/O device is limited by the speed
`of the external Signals upon which they operate. The Speed
`of Such external Signals may be limited by physical con
`Straints and capacitances and inductances of external devices
`and buses. The core processor is not So limited. Therefore,
`it is preferable to have the core processor operate at a
`different, and more optimal clock frequency.
`Some digital Signal processors allow the user to Select a
`ratio (e.g., X2, X2.5, X3, X3.5, X4...) by which the input
`clock will be multiplied to produce the core processor clock.
`This enables the user to Select, within a limited range, a core
`processor frequency that is best for the particular processor.
`AS the geometries of processors shrink, internal Speed
`paths improve, enabling faster operation. For a particular
`processor, therefore, there is an optimal Speed at which the
`processor can operate. A limitation in currently available
`processors is that the core processor frequency is limited by
`the input clock and the user-Selectable core clock ratioS
`available.
In a digital signal processing system, a cluster (i.e., four, six or eight) of processors may be interconnected by an external bus system. A host computer, connected to each of the processors in the system through the bus system, may access any of the processors. The host computer operates at a host clock frequency that may be unrelated (asynchronously related) to the input clock frequency (I/O clock frequency) of each of the processors in the cluster.
When the host wishes to access any of the processors, either the host clock and the processor I/O clock must be synchronized, or asynchronous access must be enabled. Synchronization would require some type of external synchronizing interface between the host and each processor in the cluster. Alternatively, the provision of asynchronous access would require an additional, asynchronous processor I/O interface. To date, each of the approaches aimed at enabling an asynchronously operating host to access a processor requires complex and expensive circuitry. In addition, each of such approaches may be difficult for a user to implement and use.
It is a general object of the present invention to provide an improved processor clocking scheme.
SUMMARY OF THE INVENTION
One embodiment of the invention is directed to a digital signal processor. The digital signal processor receives a local clock and a system clock, wherein the local clock frequency and the system clock frequency may be asynchronous with one another. A core processor operates at a core clock frequency that is a multiple of the local clock frequency. An external parallel port, coupled to the core processor, is operable at the system clock frequency or at the local clock frequency.
In an embodiment of the invention, the digital signal processor further includes a resynchronization circuit, coupled between the external parallel port and the core processor, that receives an input command signal and latches in the command signal when valid.
Another embodiment of the invention is directed to a digital signal processing system. The system includes a plurality of processors, each connected to another by an external bus system through an external port. A host, connected to each of the plurality of processors through the external bus system, operates at a host clock frequency. The host can access each processor through the external bus system. The external port of each of the processors operates either at a local clock frequency or at the host clock frequency, or at a multiple of either the local clock frequency or host clock frequency. Upon a host access, the clock frequency of the external port of each processor automatically is controlled to operate at the host clock frequency.
In one embodiment, the system further includes an external memory unit, connected to the host and to at least one of the processors through the external bus system. The memory also operates either at the local clock frequency or at the host clock frequency. Upon a host access of either one of the processors or of the memory unit, the clock frequency of the memory unit also automatically is controlled to operate at the host clock frequency.
In an embodiment, the clock frequency of operation of the external port of each processor is user-controlled.
In an embodiment, each processor includes a switch that receives a local clock and a host clock and selects one for operation of the external parallel port. In one embodiment, the switch includes a multiplexer.
In an embodiment, the clock frequency of the memory unit is controlled by a master processor to which it is connected.
In an embodiment of the system, each processor of the system includes a core processor that operates at a multiple of the local clock frequency, wherein the local clock frequency may be asynchronous with the host clock frequency. In this embodiment, each processor further includes a resynchronization circuit, coupled between the core processor and the external port, that latches in a received command signal when valid.
A further embodiment of the invention is directed to a method of digital signal processing. The method includes: connecting a host to a plurality of digital signal processors through a bus system; operating an external port of each processor at a local clock frequency, a host clock frequency, or a multiple of either the local clock frequency or host clock frequency; and automatically switching operation of the external port of each processor to the host clock frequency upon an access by the host of one of the processors.
In an embodiment, the method further includes the step of operating a core processor of each digital signal processor at a multiple of the local clock frequency, which may be asynchronous with the system clock frequency.
The features and advantages of the present invention will be more readily understood and apparent from the following detailed description of the invention, which should be read in conjunction with the accompanying drawings and from the claims which are appended to the end of the detailed description.
BRIEF DESCRIPTION OF THE DRAWING
For a better understanding of the present invention, reference is made to the accompanying drawings, which are incorporated herein by reference.
FIG. 1 is a block diagram of a system including a cluster of processors according to one embodiment of the invention.
FIG. 2 is a block diagram of an alternate embodiment of the system shown in FIG. 1.
FIG. 3 is a block diagram of the internal components of an exemplary processor that may be used with the present invention.
FIG. 4 is a part functional, part structural block diagram of certain processor components and the different clock signals on which the components operate.
FIG. 5 is a block diagram of an exemplary delay calibration circuit that may be used with a processor of the invention.
FIG. 6 is a block diagram of an exemplary external port block that may be employed within a processor of the invention.
FIG. 7 is a part functional, part structural block diagram of a resynchronization circuit that may be employed within a processor of the invention.
DETAILED DESCRIPTION
One embodiment of the present invention is directed to a cluster of digital signal processors interconnected by a bus system, and a host that can access any of the processors through the bus system. A periphery of each of the processors, connected to the bus system, operates at one of a local clock frequency and a host clock frequency. The host operates at the host clock frequency and, when the host accesses one of the processors, the clock frequency of operation of the periphery of each of the processors automatically is switched to the host clock frequency.
Another embodiment of the present invention is directed to a digital signal processor having a core processor that may operate asynchronously with the periphery of the processor. In particular, the periphery of the processor, such as an external parallel port, may operate at either a local clock frequency or a host clock frequency, wherein a user may select between the two. A core processor of the digital processor operates at a multiple of the local clock frequency. The local clock frequency and the host clock frequency may be independently generated and may be asynchronous with one another.
FIG. 1 is a block diagram showing an exemplary embodiment of the present invention including a cluster of digital signal processors P1-P4. The system shown also includes a host 100 and a memory 102. The host 100, memory 102, and processors P1-P4 are interconnected by a bus system 104. The host may include an external computer that communicates with each of processors P1-P4 and external memory 102. External memory 102 may be any suitable external memory that operates with such a digital signal processing system, such as Synchronous Dynamic Random Access Memory (SDRAM). Data may be written to or read from each of the processors, as well as to/from the memory.
Preferably, the external bus operates as a pipelined bus. In other words, the data may arrive one, two or three cycles after an address is issued, corresponding to a pipeline delay of one, two or three cycles respectively. Addresses may be issued on every cycle. Preferably, all signals are sampled on the rising edge of the clock signal and must meet a set-up time and a hold-time requirement.
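
The pipelined timing described above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the bus protocol of the patent: the function name `pipelined_reads`, the dictionary standing in for memory, and the cycle numbering starting at 0 are all hypothetical. It only shows that one address can be issued per cycle while each data word arrives a fixed one, two or three cycles later.

```python
def pipelined_reads(memory, addresses, delay):
    """Return (arrival_cycle, address, data) tuples for a pipelined read bus.

    One address is issued per cycle, starting at cycle 0. The data for the
    address issued in cycle n arrives in cycle n + delay.
    """
    if delay not in (1, 2, 3):
        raise ValueError("the text describes pipeline delays of one, two or three cycles")
    return [(issue + delay, addr, memory[addr])
            for issue, addr in enumerate(addresses)]


# Hypothetical memory contents, used only for illustration.
mem = {0x10: "a", 0x11: "b", 0x12: "c"}
for cycle, addr, data in pipelined_reads(mem, [0x10, 0x11, 0x12], delay=2):
    print(f"cycle {cycle}: data {data!r} for address {addr:#x}")
```

With a delay of two cycles, the three back-to-back addresses return data on three consecutive cycles, so the delay affects latency but not throughput.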
During operation, host 100 may access any one of processors P1-P4 or memory 102 through bus 104. Host 100 operates on a host clock HCLK at a host clock frequency. Each processor P1-P4 receives the host clock HCLK and a local clock LCLK. In one embodiment, as explained in greater detail below, the host clock HCLK and local clock LCLK are independently generated and may be asynchronous with one another.
A periphery of each processor, that is, the portion of the processor (such as an external parallel port) that couples the internal components of the processor to the external bus system 104, may operate at either the local clock LCLK frequency or the host clock HCLK frequency. In one embodiment, as explained below, this operation is user-selectable. Similarly, the memory may operate at either the local clock LCLK frequency or the host clock HCLK frequency.
In this embodiment, a buffer 110, having multiple series-terminated outputs, provides the host clock HCLK signal to each destination, which, in this embodiment, includes host 100, each processor P1-P4, and memory 102. Similarly, buffer 112, also having multiple series-terminated outputs, provides the local clock LCLK signal to each destination, which, in this embodiment, includes each processor P1-P4 and memory 102. Each clock signal is provided on a separate trace, output from the buffer. The buffers ensure that the same clock signal timing is provided to each destination.
During operation, a periphery of each processor P1-P4 and memory 102 may be operating at the local clock LCLK frequency. When host 100 is to access one of processors P1-P4 or memory 102, the clock frequency of operation of the periphery of each processor P1-P4 automatically is switched from that of the local clock LCLK to that of the host clock HCLK. At the same time, the clock frequency of operation of the memory also is switched automatically from that of the local clock LCLK to that of the host clock HCLK.
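
The switching rule in the preceding paragraph can be stated as a small state model. This is a sketch only: the `Cluster` class and its `host_access` method are hypothetical names introduced for illustration, and the model tracks which clock drives each periphery without simulating any waveforms. The point it captures is that a host access to any one device moves every processor periphery and the memory to HCLK, not just the device being accessed.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Clock currently driving each processor periphery and the memory.
    periphery: dict = field(
        default_factory=lambda: {p: "LCLK" for p in ("P1", "P2", "P3", "P4")}
    )
    memory: str = "LCLK"

    def host_access(self, target):
        """Model a host access to one processor or to the memory."""
        if target != "memory" and target not in self.periphery:
            raise ValueError(f"unknown target {target!r}")
        # Every periphery and the memory switch, not only the accessed device.
        for name in self.periphery:
            self.periphery[name] = "HCLK"
        self.memory = "HCLK"


cluster = Cluster()
cluster.host_access("P2")
print(cluster.periphery, cluster.memory)
```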
In one embodiment, the switching occurs when a Host Bus Request (HBR) or Host Bus Grant (HBG) control signal is asserted by the host. Such control signal may be provided to each processor causing an internal switch (not shown) in each processor to switch the clock frequency from the local clock LCLK to the host clock HCLK. The switch internal to each processor may include a multiplexer, or the like. Glitch suppression is required for any clock signal switch to the processor. For example, glitch suppression can be attained by waiting for one clock to go low, and holding the clock output until the other clock goes low, and then driving the output with the first clock at that point.
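
The glitch-suppression example above can be sketched on sampled waveforms. This is one possible reading of the text, not the circuit of the patent: the function `glitch_free_switch`, the helper `high_runs`, the three-state sequence and the sample values are all assumptions made for illustration. The model follows the clock being switched away from until it is low, holds the output low until the clock being switched to is also low, and only then follows the new clock.

```python
from itertools import groupby


def glitch_free_switch(old_clk, new_clk, request_at):
    """Sampled-waveform model of a glitch-suppressed clock switch.

    old_clk, new_clk: equal-length lists of 0/1 samples.
    request_at: sample index at which the switch is requested.
    """
    state = "OLD"  # OLD -> HOLD -> NEW
    out = []
    for i, (old, new) in enumerate(zip(old_clk, new_clk)):
        if state == "OLD" and i >= request_at and old == 0:
            state = "HOLD"  # old clock is low: stop following it
        if state == "HOLD" and new == 0:
            state = "NEW"   # new clock is low as well: safe to hand over
        if state == "OLD":
            out.append(old)
        elif state == "NEW":
            out.append(new)
        else:
            out.append(0)   # output held low between the two clocks
    return out


def high_runs(samples):
    """Lengths of consecutive runs of 1s, used to spot shortened pulses."""
    return [len(list(group)) for level, group in groupby(samples) if level == 1]


old = [1, 1, 0, 0] * 6          # hypothetical clock, high for 2 samples
new = [0, 1, 1, 1, 0, 0] * 4    # hypothetical clock, high for 3 samples
naive = old[:5] + new[5:]       # abrupt multiplexer switch at sample 5
clean = glitch_free_switch(old, new, request_at=5)
print(high_runs(naive))
print(high_runs(clean))
```

In this example the abrupt switch produces a high pulse that is only one sample wide, shorter than a full high phase of either clock, while the suppressed switch produces only pulses of two or three samples.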
In one embodiment, an external analog switch 108 selects one of host clock HCLK or local clock LCLK to clock the memory. A master processor P3 provides a control signal along line 106, at the appropriate time, causing analog switch 108 to select the host clock HCLK signal and provide such signal to memory 102. Switch 108 preferably is a low-resistance analog switch, such that the switching delay is maintained to be less than 0.2 nanoseconds. For example, the switch may be made from a low-resistance Field Effect Transistor. For external switch 108, the switching from the local clock LCLK to the host clock HCLK does not have to be glitch-free because no memory access is occurring during the switch over.
In an alternate embodiment of the system shown in FIG. 1, switch 108 of FIG. 1 is replaced by an internal multiplexer 124, shown in FIG. 2. Such a system includes four processors P1-P4, host 100, and memory 102 (see FIG. 1). Like the system of FIG. 1, the host operates at a host clock HCLK frequency and a periphery (I/O port) of each of the processors P1-P4 operates at a periphery clock PCLK frequency, which may be equal to either the host clock HCLK frequency or the local clock LCLK frequency. Memory 102 operates at a memory clock MCLK frequency, which also may be equal to either the host clock HCLK frequency or the local clock LCLK frequency. As in the embodiment of FIG. 1, upon a host access (of memory or a processor), periphery clock PCLK and memory clock MCLK automatically are switched to host clock HCLK. The switching may be performed internally of each processor by multiplexer 124. Multiplexer 124 is controlled to switch automatically to the host clock HCLK upon a host bus access or grant. The output of multiplexer 124 includes the periphery clock PCLK signal and the memory clock MCLK signal. One master processor of P1-P4 may be selected to provide the memory clock MCLK signal along bus 116 to memory 102.
Each processor shown in the systems of FIGS. 1 and 2 may be implemented having the components shown in FIG. 3. As shown, the principal components of DSP 10 are computation blocks 12 and 14, a memory 16, a control block 24, link port buffers 26, an external port 28, a DRAM controller 30, an instruction alignment buffer (IAB) 32 and a primary instruction decoder 34. Computation blocks 12 and 14, instruction alignment buffer 32, primary instruction decoder 34 and control block 24 constitute a core processor which performs the main computation and data processing functions of DSP 10. External port 28 controls external communications via an external address bus 58 and an external data bus 68. External port 28 may constitute the periphery of DSP 10. Link port buffers 26 control external communication via communication ports 36. DSP 10 is preferably configured as a single monolithic integrated circuit.
Memory 16 includes three independent, large capacity memory banks 40, 42 and 44. In an embodiment, each of memory banks 40, 42 and 44 has a capacity of 64K words of 32 bits each. Each of the memory banks 40, 42 and 44 may have a 128-bit data bus. Up to four consecutive aligned data words of 32 bits each can be transferred to or from each memory bank in a single clock cycle.
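
The figures in the preceding paragraph can be checked with a few lines of arithmetic. This is a sketch for orientation only: the constant names are hypothetical, and reading "64K" as 64 × 1024 words is an assumption, since the text does not say whether K means 1000 or 1024.

```python
WORD_BITS = 32
WORDS_PER_BANK = 64 * 1024   # "64K words", taking K = 1024 (assumption)
BANK_BUS_BITS = 128          # data bus width of each memory bank
NUM_BANKS = 3                # memory banks 40, 42 and 44

bank_bytes = WORDS_PER_BANK * WORD_BITS // 8
words_per_cycle = BANK_BUS_BITS // WORD_BITS

print(bank_bytes)                    # bytes of storage per bank
print(words_per_cycle)               # 32-bit words per bank per clock cycle
print(NUM_BANKS * words_per_cycle)   # words per cycle if all banks are accessed
```

Under these assumptions each bank holds 262,144 bytes, and the 128-bit bus accounts for the "up to four" 32-bit words per bank per cycle stated in the text.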
The elements of DSP 10 are interconnected by buses for efficient, high speed operation. Each of the buses includes multiple lines for parallel transfer of binary information. A first address bus 50 (MA0) interconnects memory bank 40 (M0) and control block 24. A second address bus 52 (MA1) interconnects memory bank 42 (M1) and control block 24. A third address bus 54 (MA2) interconnects memory bank 44 (M2) and control block 24. Each of the address buses 50, 52 and 54 may be 16 bits wide. An external address bus 56 (MAE) interconnects external port 28 and control block 24. External address bus 56 is connected through external port 28 to external address bus 58. Each of the external address buses 56 and 58 may be 32 bits wide. A first data bus 60 (MD0) interconnects memory bank 40, computation blocks 12 and 14, control block 24, link port buffers 26, IAB 32 and external port 28. A second data bus 62 (MD1) interconnects memory bank 42, computation blocks 12 and 14, control block 24, link port buffers 26, IAB 32 and external port 28. A third data bus 64 (MD2) interconnects memory bank 44, computation blocks 12 and 14, control block 24, link port buffers 26, IAB 32 and external port 28. The data buses 60, 62 and 64 are connected through external port 28 to external data bus 68. Each of the data buses 60, 62 and 64 may be 128 bits wide, and external data bus 68 may be 64 bits wide.
The first address bus 50 and the first data bus 60 comprise a bus for transfer of data to and from memory bank 40. The second address bus 52 and the second data bus 62 comprise a second bus for transfer of data to and from memory bank 42. The third address bus 54 and the third data bus 64 comprise a third bus for transfer of data to and from memory bank 44. Since each of memory banks 40, 42 and 44 has a separate bus, memory banks 40, 42 and 44 may be accessed simultaneously. As used herein, "data" refers to binary words, which may represent either instructions or operands that are associated with the operation of DSP 10. In a typical operating mode, program instructions are stored in one of the memory banks, and operands are stored in the other two memory banks. Thus, at least one instruction and two operands can be provided to computation blocks 12 and 14 in a single clock cycle. As described below, each of memory banks 40, 42, and 44 is configured to permit reading and writing of multiple data words in a single clock cycle. The simultaneous transfer of multiple data words from each memory bank in a single clock cycle is accomplished without requiring an instruction cache or a data cache.
The control block 24 includes a program sequencer 70, a first integer ALU 72 (J ALU), a second integer ALU 74 (K ALU), a first DMA address generator 76 (DMAG A) and a second DMA address generator 78 (DMAG B). Integer ALUs 72 and 74, at different times, execute integer ALU instructions and perform data address generation. During execution of a program, program sequencer 70 supplies a sequence of instruction addresses on one of address buses 50, 52, 54 and 56, depending on the memory location of the instruction sequence. Typically, one of memory banks 40, 42 or 44 is used for storage of the instruction sequence. Each of integer ALUs 72 and 74 supplies a data address on one of address buses 50, 52, 54 and 56, depending on the location of the operand required by the instruction. Assume, for example, that an instruction sequence is stored in memory bank 40 and that the required operands are stored in memory banks 42 and 44. In this case, the program sequencer supplies instruction addresses on address bus 50 and the accessed instructions are supplied to the instruction alignment buffer 32, as described below. Integer ALUs 72 and 74 may, for example, output addresses of operands on address buses 52 and 54, respectively. In response to the addresses generated by integer ALUs 72 and 74, memory banks 42 and 44 supply operands on data buses 62 and 64, respectively, to either or both of computation blocks 12 and 14. Memory banks 40, 42 and 44 are interchangeable with respect to storage of instructions and operands.
Program sequencer 70 and the integer ALUs 72 and 74 may access an external memory (not shown) via external port 28. The desired external memory address is placed on address bus 56. The external address is coupled through external port 28 to external address bus 58. The external memory supplies the requested data word or data words on external data bus 68. The external data is supplied via external port 28 and one of the data buses 60, 62 and 64 to one or both of computation blocks 12 and 14. The DRAM controller 30 controls the external memory.
As indicated above, each of the memory banks 40, 42 and 44 may have a capacity of 64K words of 32 bits each. Each memory bank may be connected to a data bus that is 128 bits wide. In an alternative embodiment, each data bus may be 64 bits wide, and 64 bits are transferred on each of clock phase 1 and clock phase 2, thus providing an effective bus width of 128 bits. Multiple data words can be accessed in each memory bank in a single clock cycle. Specifically, data can be accessed as single, dual or quad words of 32 bits each. Dual and quad accesses require the data to be aligned in memory. Typical applications for quad data accesses are the fast Fourier transform (FFT) and complex FIR filters. Quad accesses also assist double precision operations. Preferably, instructions are accessed as quad words. However, as discussed below, instructions are not required to be aligned in memory.
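
The alignment requirement for dual and quad data accesses can be illustrated with a short check. This is a sketch under a loudly stated assumption: the text says such accesses must be aligned but does not define the rule, so the code assumes natural alignment, meaning the starting word address is a multiple of the access size in words. The names `ACCESS_WORDS` and `is_aligned` are hypothetical.

```python
# Access sizes named in the text, in 32-bit words.
ACCESS_WORDS = {"single": 1, "dual": 2, "quad": 4}


def is_aligned(word_address, access):
    """True if a data access of the given size starts on a natural boundary.

    Assumes natural alignment (start address is a multiple of the access
    size in words); the text requires alignment without defining it.
    """
    return word_address % ACCESS_WORDS[access] == 0


for address in (0, 2, 4, 6):
    print(address, is_aligned(address, "dual"), is_aligned(address, "quad"))
```

Under this assumption, word address 6 would be acceptable for a dual access but not for a quad access, while any address is acceptable for a single-word access.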
Using quad word transfers, four instructions and eight operands, each of 32 bits, can be supplied to computation blocks 12 and 14 in a single clock cycle. The number of data words transferred and the computation block or blocks to which the data words are transferred are selected by control bits in the instruction. The single, dual, or quad data words can be transferred to computation block 12, to computation block 14, or to both. Dual and quad data word accesses improve the performance of DSP 10 in many applications by allowing several operands to be transferred to the computation blocks 12 and 14 in a single clock cycle. The ability to access multiple instructions in each clock cycle allows multiple operations to be executed in each cycle, thereby improving performance. If operands can be supplied faster than they are needed by the computation blocks 12 and 14, then there are memory cycles left over that can be used by DMA address generators 76 and 78 to provide new data to the memory banks 40, 42 and 44 during those unused cycles, without stealing cycles from the core processor. Finally, the ability to access multiple data words makes it possible to utilize two or more computation blocks and to keep them supplied with operands. The ability to access single or dual data words reduces power consumption in comparison with a configuration where only quad data words are accessed.
In processor 10 shown in FIG. 3, external port 28 may comprise a periphery of the processor and would operate at a periphery clock PCLK. The remaining components of DSP 10, in one embodiment of the invention, would operate at a core clock CCLK, which is a multiple of local clock LCLK, as described below.
FIG. 4 is a part structural, part functional block diagram of some components of processor P1 and the clock signals on which they operate. The processor P1 shown includes a core processor 132, operating at a core clock CCLK frequency, and a periphery 126, operating at either a local clock LCLK frequency or a host clock HCLK frequency, or a multiple of either LCLK or HCLK. Periphery 126 may consist of external port 28 that communicates with external data bus 68 and external address bus 58, shown in FIG. 3.
Processor 132 receives both the local clock LCLK signal and the host clock HCLK signal as inputs. Not shown in FIG. 4 is a delay calibration circuit through which each input clock signal is run to account for propagation delays, as described in greater detail hereinafter with reference to FIG. 5. Both clock signals are provided to switch 124, which selects one as the periphery clock PCLK to periphery 126, as described above with reference to FIGS. 1 and 2.
The local clock LCLK signal also is provided to a frequency multiplier 128. Frequency multiplier 128 multiplies the local clock LCLK signal by a ratio selected by the user and outputs the product, which is the core clock signal CCLK, on line 130 to core processor 132. Frequency multiplier 128 may, for example, provide the ratios ×2, ×2.5, ×3, ×3.5 and ×4, one of which is selected by a user to produce the core clock CCLK.
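
The relationship between LCLK, the selected ratio and CCLK is a single multiplication, sketched below. The function name `core_clock_mhz` and the `RATIOS` list are hypothetical; the list contains only the example ratios named in the text, and the 100 MHz input is the example LCLK value the description gives elsewhere. Exact fractions are used so that ratios such as 2.5 and 3.5 do not introduce rounding.

```python
from fractions import Fraction

# Example ratios named in the text: x2, x2.5, x3, x3.5, x4.
RATIOS = [Fraction(2), Fraction(5, 2), Fraction(3), Fraction(7, 2), Fraction(4)]


def core_clock_mhz(lclk_mhz, ratio):
    """Core clock frequency in MHz for a local clock and a selected ratio."""
    ratio = Fraction(ratio)
    if ratio not in RATIOS:
        raise ValueError(f"ratio {ratio} is not one of the example ratios")
    return lclk_mhz * ratio


for r in RATIOS:
    print(f"LCLK 100 MHz x {float(r)} -> CCLK {float(core_clock_mhz(100, r))} MHz")
```

For a 100 MHz local clock, the example ratios give core clocks from 200 MHz to 400 MHz, while the periphery clock is unaffected by the choice.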
This embodiment of the invention enables the frequency of operation of the core processor 132 to be optimized independently of the frequency of operation of the periphery 126. The frequency of operation of the periphery 126 may be limited by the external bus should such periphery consist of the external parallel port. Such a limitation would not, however, affect the speed of the core processor. The invention also enables the frequency of operation of the periphery to be optimized independently of the speed of operation of the core.
As stated, the host clock HCLK and local clock LCLK are generated independently and may be asynchronous with one another. For example, host clock HCLK may be 66 MHz and local clock LCLK may be 100 MHz. When periphery 126 operates on the local clock LCLK, it appears to operate synchronously with core processor 132. As described above, with reference to FIGS. 1 and 2, the switch to operating on host clock HCLK occurs automatically upon an access request by the host. Because the core clock CCLK is related to the local clock LCLK and because the local clock LCLK may be asynchronously related to the host clock HCLK, periphery 126 may appear to operate asynchronously with core processor 132 (when operating at the host clock HCLK frequency). To account for such operation, an asynchronous interface (not shown in FIG. 4) exists between periphery 126 and core processor 132, and will be described in greater detail below.
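
The 66 MHz and 100 MHz example can be used to show why the periphery, when clocked by HCLK, has no fixed timing relationship to the LCLK-derived core clock. The sketch below is an idealized illustration, not a model of the patent's interface: it assumes perfect clocks that both have a rising edge at time zero, which independently generated clocks would not, and the function name `edge_offsets_ns` is hypothetical. It computes how far each HCLK rising edge falls after the most recent LCLK rising edge.

```python
from fractions import Fraction


def edge_offsets_ns(hclk_mhz, lclk_mhz, edges):
    """Offset in ns of each HCLK rising edge from the preceding LCLK rising edge.

    Assumes ideal clocks that share a rising edge at t = 0.
    """
    h_period = Fraction(1000, hclk_mhz)  # clock period in nanoseconds
    l_period = Fraction(1000, lclk_mhz)
    return [(k * h_period) % l_period for k in range(edges)]


offsets = edge_offsets_ns(66, 100, 33)
print([round(float(x), 2) for x in offsets[:4]])
print(len(set(offsets)))  # number of distinct phase relationships seen
```

Even in this idealized case the offset changes on every HCLK edge, so a signal launched in one clock domain can arrive at any point in the other domain's cycle. That is the situation an asynchronous interface between the periphery and the core has to tolerate.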
Given the high speeds at which the bus operates, skew in the periphery clock PCLK and the memory clock MCLK should be minimized. In addition, skew in the core clock CCLK should be removed in the frequency multiplexer. In one embodiment of the present
