`
United States Patent [19]
Garde

[11] Patent Number: 5,922,076
[45] Date of Patent: Jul. 13, 1999
US005922076A

[54] CLOCKING SCHEME FOR DIGITAL SIGNAL PROCESSOR SYSTEM
[75] Inventor: Douglas Garde, Dover, Mass.
[73] Assignee: Analog Devices, Inc., Norwood, Mass.
[21] Appl. No.: 08/931,665
[22] Filed: Sep. 16, 1997
[51] Int. Cl.6: G06F 1/04
[52] U.S. Cl.: 713/600
[58] Field of Search: 395/553, 555, 395/556, 559; 713/500, 501, 600; 709/400, 248

[56] References Cited
U.S. PATENT DOCUMENTS
5,611,075  3/1997  Garde  395/480
5,619,720  4/1997  Garde et al.  395/800
5,685,005  11/1997  Garde et al.  395/800

Primary Examiner-Thomas M. Heckler
Attorney, Agent, or Firm-Wolf, Greenfield & Sacks, P.C.

[57] ABSTRACT

A digital signal processing system includes a cluster of processors and a host. A host can access each of the processors through an external bus system that interconnects the host with each of the processors. An external port of each of the processors operates at one of a local clock frequency and host clock frequency, the local clock frequency and host clock frequency being asynchronous with one another. The host operates at the host clock frequency. Upon a host access of one of the processors, the clock frequency of operation of the external parallel port of each processor automatically is controlled to operate at the host clock frequency. In an embodiment, each processor also includes a core processor that operates at a core clock frequency that is a multiple of the local clock frequency, asynchronous with the host clock frequency. Thus, the speed of operation of the core processor and that of the external parallel port can be optimized independently.

12 Claims, 6 Drawing Sheets
`
[Front-page drawing (same content as FIG. 4): processor P1 receives LCLK 118 and HCLK 120; MUX 124 outputs PCLK to PERIPH. 126; LCLK also feeds FREQ. MULT. 128, which outputs CCLK on line 130 to CORE 132.]
`
`

`

U.S. Patent    Jul. 13, 1999    Sheet 1 of 6    5,922,076

[FIG. 1: block diagram of the system. Labels: HOST 100; MEM. 102 with CLK input; bus 104; line 106; ANALOG SWITCH 108; BUFFERS 110 (HCLK' in, HCLK's to each destination); BUFFERS 112 (LCLK' in, LCLK's to each destination); processors P1, P2, P3 and P4, each with LCLK and HCLK inputs.]

[FIG. 2: alternate embodiment. Labels: processor P1; LCLK and HCLK inputs; MUX 124; outputs PCLK and MCLK.]
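The clock selection performed by multiplexer 124 of FIG. 2, together with the glitch suppression that the detailed description requires for any clock switch inside a processor, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented circuit: clocks are modeled as lists of 0/1 samples, the name `glitch_free_mux` and its arguments are hypothetical, and the hand-over rule (wait for the current clock to be low, hold the output low until the newly selected clock is also low, then follow the new clock) is one reading of the glitch-suppression example given in the description.

```python
def glitch_free_mux(lclk, hclk, select_host):
    """Return sampled PCLK for sampled clocks lclk and hclk (0 or 1 per step).

    select_host[i] is True when the host clock is requested at step i.
    Assumed rule: a source change is deferred until the current clock is
    low, the output is then held low until the requested clock is low,
    and only then does the output follow the requested clock.
    """
    out = []
    source = "H" if select_host[0] else "L"  # clock currently driving the output
    parked = False                           # True while the output is held low
    for l, h, want_host in zip(lclk, hclk, select_host):
        level = {"L": l, "H": h}
        target = "H" if want_host else "L"
        if not parked and target != source and level[source] == 0:
            parked = True                    # old clock is low: stop following it
        if parked and level[target] == 0:
            source, parked = target, False   # new clock is low: hand over
        out.append(0 if parked else level[source])
    return out


if __name__ == "__main__":
    lclk = [1, 1, 0, 0] * 6                  # hypothetical local clock samples
    hclk = [0, 1, 1, 1, 0, 0] * 4            # hypothetical host clock samples
    select = [False] * 9 + [True] * 15       # host access begins at step 9
    print(glitch_free_mux(lclk, hclk, select))
```

With this rule every high pulse on the output is a complete pulse of one of the two input clocks; only the low time is stretched during the hand-over.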
`
`

`

[FIG. 3 (Sheet 2 of 6): block diagram of DSP 10. Labels: COMPUTATION BLOCK X 12 and COMPUTATION BLOCK Y; MEMORY 16 with memory banks M0 40, M1 42 and M2 44 (each 2 Mbit, 64K x 32 or 16K x 128); CONTROL BLOCK 24 containing PROGRAM SEQUENCER 70, J ALU 72, K ALU 74, DMAG A 76 and DMAG B 78; LINK PORT BUFFERS 26 with COMMUNICATION PORTS (4) 36; EXT PORT 28; DRAM controller 30; IAB 32; PRIM INSTR DEC 34 (to all blocks); address buses MA0 50, MA1 52, MA2 54 and MAE 56; data buses MD0 60, MD1 62 and MD2 64; external ADDRESS bus 58 and external DATA bus 68.]
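The memory figures quoted for FIG. 3 and in the detailed description (banks of 64K words of 32 bits, a 128-bit data bus per bank, and single, dual or quad word accesses that must be aligned) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes that "aligned" means the word address is a multiple of the number of words accessed, which the patent text does not spell out.

```python
WORD_BITS = 32                 # width of one data word
WORDS_PER_BANK = 64 * 1024     # 64K words per memory bank
BUS_BITS = 128                 # width of each internal data bus


def bank_capacity_bits():
    """Capacity of one memory bank in bits."""
    return WORDS_PER_BANK * WORD_BITS


def words_per_cycle():
    """Number of 32-bit words a 128-bit bus can carry in one clock cycle."""
    return BUS_BITS // WORD_BITS


def is_aligned(word_address, n_words):
    """Check alignment for a single (1), dual (2) or quad (4) word access.

    Assumption: an n-word access is aligned when the word address is a
    multiple of n.
    """
    if n_words not in (1, 2, 4):
        raise ValueError("access must be 1, 2 or 4 words")
    return word_address % n_words == 0


if __name__ == "__main__":
    print(bank_capacity_bits() // (1024 * 1024), "Mbit per bank")
    print(words_per_cycle(), "words per cycle")
    print(is_aligned(8, 4), is_aligned(6, 4), is_aligned(6, 2))
```

The capacity works out to 2 Mbit per bank, which agrees with both organizations shown in the figure (64K x 32 and 16K x 128).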
`
`

`

[FIG. 4 (Sheet 3 of 6): part functional, part structural block diagram of processor P1. Labels: LCLK 118 and HCLK 120 inputs; MUX 124 outputs PCLK to PERIPH. 126; LCLK also feeds FREQ. MULT. 128, which outputs CCLK on line 130 to CORE 132.]
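The relationship between the local clock and the core clock in FIG. 4 is a simple multiplication by a user-selected ratio. The following sketch shows that arithmetic; the function name `core_clock_hz` and the 40 MHz example value are hypothetical, and the ratio list is limited to the example ratios named in the description (X2, X2.5, X3, X3.5, X4), which the patent presents as examples rather than an exhaustive set.

```python
from fractions import Fraction

# Example ratios named in the description; treated here as the only
# selectable values for the purpose of this sketch.
ALLOWED_RATIOS = (Fraction(2), Fraction(5, 2), Fraction(3), Fraction(7, 2), Fraction(4))


def core_clock_hz(lclk_hz, ratio):
    """Return the core clock CCLK frequency for a local clock LCLK frequency.

    The result is an exact Fraction (CCLK = ratio * LCLK).
    """
    ratio = Fraction(ratio)
    if ratio not in ALLOWED_RATIOS:
        raise ValueError("ratio is not one of the example ratios")
    return lclk_hz * ratio


if __name__ == "__main__":
    # Hypothetical 40 MHz local clock with the X2.5 ratio selected.
    print(float(core_clock_hz(40_000_000, 2.5)))
```

Because CCLK is derived only from LCLK, changing the periphery clock PCLK between LCLK and HCLK leaves the core frequency untouched, which is the independence the description emphasizes.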
`
`

`

`
`
[FIG. 5 (Sheet 4 of 6): block diagram of an exemplary delay calibration circuit. Labels: DELAY LOCKED LOOP 136; I/O CLOCK input; INVERTER CHAIN; PHASE DET; CONTROL; DELAY T2; DISTRIBUTION TREE and COPY OF DISTRIBUTION TREE (delay T1); OUTPUT LATCHING; OUTPUT PAD; signals LATEN, UPDATEN and TRIEN; reference numerals 138 through 168.]
`
`

`

[FIG. 6 (Sheet 5 of 6): block diagram of an exemplary external port block 28. Labels: INTERNAL DATA BUSES MD0 60, MD1 62 and MD2 64; internal address bus MAE; MUXES AND DRIVERS; INPUT FIFO 170 (IFIFO, with IDFIFO and IAFIFO); output FIFO (OFIFO, with ODFIFO and OAFIFO); PACK and UNPACK blocks; OBUF; DIRECT WRITE and DIRECT READ (SLAVE) paths; EXT AD bus 58 and EXT DA bus 68.]
`
`

`

[FIG. 7 (Sheet 6 of 6): part functional, part structural block diagram of a resynchronization circuit. Labels: WRITE FROM EXT BUS (32 or 64 bits), DATA IN ON WR; HCLK; WRITE DECODER and write COUNTER; READ DECODER and read COUNTER; COMPARE; EMPTY; ARB LAT; RE-SYNCHRONIZATION LATCHES (latch on rising edge); CCLK, EARLY CCLK rising edge and CCLK (RE-SYNCHRONIZED); input FIFO 170 with 32, 64 or 128 bit output; reference numerals 190 through 206.]
`
`

`

CLOCKING SCHEME FOR DIGITAL SIGNAL PROCESSOR SYSTEM

FIELD OF THE INVENTION

The present invention relates to digital signal processors, and more specifically, to a digital signal processor system and method having a unique asynchronous clocking scheme.

BACKGROUND OF THE INVENTION

A digital signal processor (DSP) is a special purpose computer that is designed to optimize performance for digital signal processing applications such as, for example, fast Fourier transforms, digital filtering, image processing and speech recognition. Digital signal processing applications typically are characterized by real time operation, high interrupt rates and intensive numeric computations. In addition, digital signal processing applications tend to be intensive in memory access operations and to require the input and output of large quantities of data. Thus, designs of digital signal processors may be quite different from those of general purpose computers.

A typical digital signal processor includes at least one memory for storing digital signal processing operations instructions as well as operands used in the digital signal processing operations, and a core processor, connected to the memory, for carrying out such operations. A digital signal processor also typically includes a peripheral input/output (I/O) device enabling communication with, and the transfer of data to/from, other processors and/or external devices. The core processor includes some type of computation unit for performing the digital signal processing operations (i.e., computations) on the operands based on the instructions. Many different computational schemes as well as data storage and transferring schemes have been developed for optimizing speed, accuracy, size and performance of digital signal processors.

A digital signal processor commonly operates based upon receipt of a single input clock. From this single input clock are derived a core processor clock, on which the core processor operates, and an I/O clock, on which the I/O device operates. It is not uncommon for the input clock and the I/O clock to be maintained at the same frequency.

The core processor clock may be a multiple of this input clock such that the core processor operates at a different (typically greater) clock frequency than that of the I/O device. The speed of the I/O device is limited by the speed of the external signals upon which it operates. The speed of such external signals may be limited by physical constraints and capacitances and inductances of external devices and buses. The core processor is not so limited. Therefore, it is preferable to have the core processor operate at a different, and more optimal clock frequency.

Some digital signal processors allow the user to select a ratio (e.g., X2, X2.5, X3, X3.5, X4 ...) by which the input clock will be multiplied to produce the core processor clock. This enables the user to select, within a limited range, a core processor frequency that is best for the particular processor.

As the geometries of processors shrink, internal speed paths improve, enabling faster operation. For a particular processor, therefore, there is an optimal speed at which the processor can operate. A limitation in currently available processors is that the core processor frequency is limited by the input clock and the user-selectable core clock ratios available.

In a digital signal processing system, a cluster (i.e., four, six or eight) of processors may be interconnected by an external bus system. A host computer, connected to each of the processors in the system through the bus system, may access any of the processors. The host computer operates at a host clock frequency that may be unrelated (asynchronously related) to the input clock frequency (I/O clock frequency) of each of the processors in the cluster.

When the host wishes to access any of the processors, either the host clock and the processor I/O clock must be synchronized, or asynchronous access must be enabled. Synchronization would require some type of external synchronizing interface between the host and each processor in the cluster. Alternatively, the provision of asynchronous access would require an additional, asynchronous processor I/O interface. To date, each of the approaches aimed at enabling an asynchronously operating host to access a processor requires complex and expensive circuitry. In addition, each of such approaches may be difficult for a user to implement and use.

SUMMARY OF THE INVENTION

It is a general object of the present invention to provide an improved processor clocking scheme.

One embodiment of the invention is directed to a digital signal processor. The digital signal processor receives a local clock and a system clock, wherein the local clock frequency and the system clock frequency may be asynchronous with one another. A core processor operates at a core clock frequency that is a multiple of the local clock frequency. An external parallel port, coupled to the core processor, is operable at the system clock frequency or at the local clock frequency.

In an embodiment of the invention, the digital signal processor further includes a resynchronization circuit, coupled between the external parallel port and the core processor, that receives an input command signal and latches in the command signal when valid.

Another embodiment of the invention is directed to a digital signal processing system. The system includes a plurality of processors, each connected to another by an external bus system through an external port. A host, connected to each of the plurality of processors through the external bus system, operates at a host clock frequency. The host can access each processor through the external bus system. The external port of each of the processors operates either at a local clock frequency or at the host clock frequency, or at a multiple of either the local clock frequency or host clock frequency. Upon a host access, the clock frequency of the external port of each processor automatically is controlled to operate at the host clock frequency.

In one embodiment, the system further includes an external memory unit, connected to the host and to at least one of the processors through the external bus system. The memory also operates either at the local clock frequency or at the host clock frequency. Upon a host access of either one of the processors or of the memory unit, the clock frequency of the memory unit also automatically is controlled to operate at the host clock frequency.

In an embodiment, the clock frequency of operation of the external port of each processor is user-controlled.

In an embodiment, each processor includes a switch that receives a local clock and a host clock and selects one for operation of the external parallel port. In one embodiment, the switch includes a multiplexer.

In an embodiment, the clock frequency of the memory unit is controlled by a master processor to which it is connected.
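The system-level behavior summarized above (each external port and the memory unit normally run from a user-selected clock, and all of them are forced to the host clock while the host accesses the bus) can be sketched as a small selection function. This is an illustrative model only: the function name `effective_clocks`, the dictionary layout and the labels "LCLK" and "HCLK" used as plain strings are assumptions, and the sketch ignores the option of running a port at a multiple of either clock.

```python
def effective_clocks(user_selection, host_access):
    """Return the clock each external port (and the memory) actually uses.

    user_selection maps a unit name (e.g. "P1", "MEM") to the clock the
    user selected for it, "LCLK" or "HCLK".  When host_access is True,
    every unit is switched to "HCLK" regardless of that selection.
    """
    for name, clock in user_selection.items():
        if clock not in ("LCLK", "HCLK"):
            raise ValueError(f"unknown clock {clock!r} for {name}")
    if host_access:
        return {name: "HCLK" for name in user_selection}
    return dict(user_selection)


if __name__ == "__main__":
    selection = {"P1": "LCLK", "P2": "LCLK", "P3": "HCLK", "P4": "LCLK", "MEM": "LCLK"}
    print(effective_clocks(selection, host_access=False))
    print(effective_clocks(selection, host_access=True))
```

The point of the sketch is that the override applies to every unit on the bus, not only to the processor being accessed, so the host always sees a bus clocked at its own frequency.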
`
`

`

In an embodiment of the system, each processor of the system includes a core processor that operates at a multiple of the local clock frequency, wherein the local clock frequency may be asynchronous with the host clock frequency. In this embodiment, each processor further includes a resynchronization circuit, coupled between the core processor and the external port, that latches in a received command signal when valid.

A further embodiment of the invention is directed to a method of digital signal processing. The method includes connecting a host to a plurality of digital signal processors through a bus system; operating an external port of each processor at a local clock frequency, a host clock frequency, or a multiple of either the local clock frequency or host clock frequency; and automatically switching operation of the external port of each processor to the host clock frequency upon an access by the host of one of the processors.

In an embodiment, the method further includes the step of operating a core processor of each digital signal processor at a multiple of the local clock frequency, which may be asynchronous with the system clock frequency.

The features and advantages of the present invention will be more readily understood and apparent from the following detailed description of the invention, which should be read in conjunction with the accompanying drawings and from the claims which are appended to the end of the detailed description.

BRIEF DESCRIPTION OF THE DRAWING

For a better understanding of the present invention, reference is made to the accompanying drawings, which are incorporated herein by reference.

FIG. 1 is a block diagram of a system including a cluster of processors according to one embodiment of the invention.

FIG. 2 is a block diagram of an alternate embodiment of the system shown in FIG. 1.

FIG. 3 is a block diagram of the internal components of an exemplary processor that may be used with the present invention.

FIG. 4 is a part functional, part structural block diagram of certain processor components and the different clock signals on which the components operate.

FIG. 5 is a block diagram of an exemplary delay calibration circuit that may be used with a processor of the invention.

FIG. 6 is a block diagram of an exemplary external port block that may be employed within a processor of the invention.

FIG. 7 is a part functional, part structural block diagram of a resynchronization circuit that may be employed within a processor of the invention.

DETAILED DESCRIPTION

One embodiment of the present invention is directed to a cluster of digital signal processors interconnected by a bus system, and a host that can access any of the processors through the bus system. A periphery of each of the processors, connected to the bus system, operates at one of a local clock frequency and a host clock frequency. The host operates at the host clock frequency and, when the host accesses one of the processors, the clock frequency of operation of the periphery of each of the processors automatically is switched to the host clock frequency.

Another embodiment of the present invention is directed to a digital signal processor having a core processor that may operate asynchronously with the periphery of the processor. In particular, the periphery of the processor, such as an external parallel port, may operate at either a local clock frequency or a host clock frequency, wherein a user may select between the two. A core processor of the digital processor operates at a multiple of the local clock frequency. The local clock frequency and the host clock frequency may be independently generated and may be asynchronous with one another.

FIG. 1 is a block diagram showing an exemplary embodiment of the present invention including a cluster of digital signal processors P1-P4. The system shown also includes a host 100 and a memory 102. The host 100, memory 102, and processors P1-P4 are interconnected by a bus system 104. The host may include an external computer that communicates with each of processors P1-P4 and external memory 102. External memory 102 may be any suitable external memory that operates with such a digital signal processing system such as Synchronous Dynamic Random Access Memory (SDRAM). Data may be written to or read from each of the processors, as well as to/from the memory.

Preferably, the external bus operates as a pipelined bus. In other words, the data may arrive one, two or three cycles after an address is issued, corresponding to a pipeline delay of one, two or three cycles respectively. Addresses may be issued on every cycle. Preferably, all signals are sampled on the clock signal rising edge and must meet a set-up time and a hold-time requirement.

During operation, host 100 may access any one of processors P1-P4 or memory 102 through bus 104. Host 100 operates on a host clock HCLK at a host clock frequency. Each processor P1-P4 receives the host clock HCLK and a local clock LCLK. In one embodiment, as explained in greater detail below, the host clock HCLK and local clock LCLK are independently generated and may be asynchronous with one another.

A periphery of each processor, that portion of the processor, such as an external parallel port, which couples the internal components of the processor to the external bus system 104, may operate at either the local clock LCLK frequency or the host clock HCLK frequency. In one embodiment, as explained below, this operation is user-selectable. Similarly, the memory may operate at either the local clock LCLK frequency or the host clock HCLK frequency.

In this embodiment, a buffer 110, having multiple series-terminated outputs, provides the host clock HCLK signal to each destination, which, in this embodiment, includes host 100, each processor P1-P4, and memory 102. Similarly, buffer 112, also having multiple series-terminated outputs, provides local clock LCLK signal to each destination, which, in this embodiment, includes each processor P1-P4 and memory 102. Each clock signal is provided on a separate trace, output from the buffer. The buffers ensure that the same clock signal timing is provided to each destination.

During operation, a periphery of each processor P1-P4 and memory 102 may be operating at the local clock LCLK frequency. When host 100 is to access one of processors P1-P4 or memory 102, the clock frequency of operation of the periphery of each processor P1-P4 automatically is switched from that of the local clock LCLK to that of the host clock HCLK. At the same time, the clock frequency of operation of the memory also is switched automatically from that of the local clock LCLK to that of the host clock HCLK.

In one embodiment, the switching occurs when a Host Bus Request (HBR) or Host Bus Grant (HBG) control signal
`
`

`

is asserted by the host. Such control signal may be provided to each processor causing an internal switch (not shown) in each processor to switch the clock frequency from the local clock LCLK to the host clock HCLK. The switch internal to each processor may include a multiplexer, or the like. Glitch suppression is required for any clock signal switch to the processor. For example, glitch suppression can be attained by waiting for one clock to go low, and holding the clock output until the other clock goes low, and then driving the output with the first clock at that point.

In one embodiment, an external analog switch 108 selects one of host clock HCLK or local clock LCLK to clock the memory. A master processor P3 provides a control signal along line 106, at the appropriate time, causing analog switch 108 to select the host clock HCLK signal and provides such signal to memory 102. Switch 108 preferably is a low-resistance analog switch, such that the switching delay is maintained to be less than 0.2 nanoseconds. For example, the switch may be made from a low-resistance Field Effect Transistor. For external switch 108, the switching from the local clock LCLK to the host clock HCLK does not have to be glitch-free because no memory access is occurring during the switch over.

In an alternate embodiment of the system shown in FIG. 1, switch 108 of FIG. 1 is replaced by an internal multiplexer 124, shown in FIG. 2. Such a system includes four processors P1-P4, host 100, and memory 102 (see FIG. 1). Like the system of FIG. 1, the host operates at a host clock HCLK frequency and a periphery (I/O port) of each of the processors P1-P4 operates at a periphery clock PCLK frequency which may be equal to either the host clock HCLK frequency or at the local clock LCLK frequency. Memory 102 operates at a memory clock MCLK frequency which also may be equal to either the host clock HCLK frequency or at the local clock LCLK frequency. As in the embodiment of FIG. 1, upon a host access (of memory or a processor), periphery clock PCLK and memory clock MCLK automatically are switched to host clock HCLK. The switching may be performed internally of each processor by multiplexer 124. Multiplexer 124 is controlled to switch automatically to the host clock HCLK upon a host bus access or grant. The output of multiplexer 124 includes periphery clock PCLK signal and memory clock MCLK signal. One master processor P1-P4 may be selected to provide memory clock MCLK signal along bus 116 to memory 102.

Each processor shown in the systems of FIGS. 1 and 2 may be implemented having the components shown in FIG. 3. As shown, the principal components of DSP 10 are computation blocks 12 and 14, a memory 16, a control block 24, link port buffers 26, an external port 28, a DRAM controller 30, an instruction alignment buffer (IAB) 32 and a primary instruction decoder 34. Computation blocks 12 and 14, instruction alignment buffer 32, primary instruction decoder 34 and control block 24 constitute a core processor which performs the main computation and data processing functions of DSP 10. External port 28 controls external communications via an external address bus 58 and an external data bus 68. External port 28 may constitute the periphery of DSP 10. Link port buffers 26 control external communication via communication ports 36. DSP 10 is preferably configured as a single monolithic integrated circuit.

Memory 16 includes three independent, large capacity memory banks 40, 42 and 44. In an embodiment, each of the memory banks 40, 42 and 44 has a capacity of 64K words of 32 bits each. Each of the memory banks 40, 42 and 44 may have a 128-bit data bus. Up to four consecutive aligned data words of 32 bits each can be transferred to or from each memory bank in a single clock cycle.

The elements of DSP 10 are interconnected by buses for efficient, high speed operation. Each of the buses includes multiple lines for parallel transfer of binary information. A first address bus 50 (MA0) interconnects memory bank 40 (M0) and control block 24. A second address bus 52 (MA1) interconnects memory bank 42 (M1) and control block 24. A third address bus 54 (MA2) interconnects memory bank 44 (M2) and control block 24. Each of the address buses 50, 52 and 54 may be 16 bits wide. An external address bus 56 (MAE) interconnects external port 28 and control block 24. External address bus 56 is connected through external port 28 to external address bus 58. Each of the external address buses 56 and 58 may be 32 bits wide. A first data bus 60 (MD0) interconnects memory bank 40, computation blocks 12 and 14, control block 24, link port buffers 26, IAB 32 and external port 28. A second data bus 62 (MD1) interconnects memory bank 42, computation blocks 12 and 14, control block 24, link port buffers 26, IAB 32 and external port 28. A third data bus 64 (MD2) interconnects memory bank 44, computation blocks 12 and 14, control block 24, link port buffers 26, IAB 32 and external port 28. The data buses 60, 62 and 64 are connected through external port 28 to external data bus 68. Each of the data buses 60, 62 and 64 may be 128 bits wide, and external data bus 68 may be 64 bits wide.

The first address bus 50 and the first data bus 60 comprise a bus for transfer of data to and from memory bank 40. The second address bus 52 and the second data bus 62 comprise a second bus for transfer of data to and from memory bank 42. The third address bus 54 and the third data bus 64 comprise a third bus for transfer of data to and from memory bank 44. Since each of memory banks 40, 42 and 44 has a separate bus, memory banks 40, 42 and 44 may be accessed simultaneously. As used herein, "data" refers to binary words, which may represent either instructions or operands that are associated with the operation of DSP 10. In a typical operating mode, program instructions are stored in one of the memory banks, and operands are stored in the other two memory banks. Thus, at least one instruction and two operands can be provided to computation blocks 12 and 14 in a single clock cycle. As described below, each of memory banks 40, 42, and 44 is configured to permit reading and writing of multiple data words in a single clock cycle. The simultaneous transfer of multiple data words from each memory bank in a single clock cycle is accomplished without requiring an instruction cache or a data cache.

The control block 24 includes a program sequencer 70, a first integer ALU 72 (J ALU), a second integer ALU 74 (K ALU), a first DMA address generator 76 (DMAG A) and a second DMA address generator 78 (DMAG B). Integer ALU's 72 and 74, at different times, execute integer ALU instructions and perform data address generation. During execution of a program, program sequencer 70 supplies a sequence of instruction addresses on one of address buses 50, 52, 54 and 56, depending on the memory location of the instruction sequence. Typically, one of memory banks 40, 42 or 44 is used for storage of the instruction sequence. Each of integer ALU's 72 and 74 supplies a data address on one of address buses 50, 52, 54 and 56, depending on the location of the operand required by the instruction. Assume, for example, that an instruction sequence is stored in memory bank 40 and that the required operands are stored in memory banks 42 and 44. In this case, the program sequencer supplies instruction addresses on address bus 50 and the accessed instructions are supplied to the instruction alignment buffer 32, as described below. Integer ALU's 72 and 74
`
`

`

may, for example, output addresses of operands on address buses 52 and 54, respectively. In response to the addresses generated by integer ALU's 72 and 74, memory banks 42 and 44 supply operands on data buses 62 and 64, respectively, to either or both of computation blocks 12 and 14. Memory banks 40, 42 and 44 are interchangeable with respect to storage of instructions and operands.

Program sequencer 70 and the integer ALU's 72 and 74 may access an external memory (not shown) via external port 28. The desired external memory address is placed on address bus 56. The external address is coupled through external port 28 to external address bus 58. The external memory supplies the requested data word or data words on external data bus 68. The external data is supplied via external port 28 and one of the data buses 60, 62 and 64 to one or both of computation blocks 12 and 14. The DRAM controller 30 controls the external memory.

As indicated above, each of the memory banks 40, 42 and 44 may have a capacity of 64K words of 32 bits each. Each memory bank may be connected to a data bus that is 128 bits wide. In an alternative embodiment, each data bus may be 64 bits wide, and 64 bits are transferred on each of clock phase 1 and clock phase 2, thus providing an effective bus width of 128 bits. Multiple data words can be accessed in each memory bank in a single clock cycle. Specifically, data can be accessed as single, dual or quad words of 32 bits each. Dual and quad accesses require the data to be aligned in memory. Typical applications for quad data accesses are the fast Fourier transform (FFT) and complex FIR filters. Quad accesses also assist double precision operations. Preferably, instructions are accessed as quad words. However, as discussed below, instructions are not required to be aligned in memory.

Using quad word transfers, four instructions and eight

core processor 132, operating at a core clock CCLK frequency, and a periphery 126, operating at either a local clock LCLK frequency or a host clock HCLK frequency, or a multiple of either LCLK or HCLK. Periphery 126 may consist of external port 28 that communicates with external data bus 68 and external address bus 58, shown in FIG. 3.

Processor 132 receives both the local clock LCLK signal and the host clock HCLK signal as inputs. Not shown in FIG. 4 is a delay calibration circuit through which each input clock signal is run to account for propagation delays, as described in greater detail hereinafter with reference to FIG. 5. Both are provided to switch 124 which selects one as the periphery clock PCLK to periphery 126, as described above with reference to FIGS. 1 and 2.

The local clock LCLK signal also is provided to a frequency multiplier 128. Frequency multiplier 128 multiplies the local clock LCLK signal by a ratio selected by the user and outputs the product, which is the core clock signal CCLK, on line 130 to core processor 132. Frequency multiplier may, for example, include the ratios, X2, X2.5, X3, X3.5, X4, one of which is selected by a user to produce the core clock CCLK.

This embodiment of the invention enables the frequency of operation of the core processor 132 to be optimized independently of the frequency of operation of the periphery 126. The frequency of operation of the periphery 126 may be limited by the external bus should such periphery consist of the external parallel port. Such a limitation would not, however, affect the speed of the core processor. The invention also enables the frequency of operation of the periphery to be optimized independently of the speed of operation of the core.

As stated, the host clock HCLK and local clock LCLK are
`
`
`and may be asynchronous with one 35 generated independently
`
`operands, each of 32 bits, can be supplied to computation
`
`
`
`
`
`another. For example, host clock HCLK may be 66 MHz and
`blocks 12 and 14 in a single clock cycle. The number of data
`
`
`
`
`local clock LCLK may be 100 MHz. When periphery 126
`words transferred and the computati
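
The clock relationships described for FIG. 4 can be illustrated with a short Python sketch. It is a simplified numerical model only, not the circuit itself: the function names `core_clock_mhz` and `periphery_clock_mhz` are hypothetical, the ratio set is the example list given in the text (X2 through X4), and the rule that the periphery runs at HCLK during a host access is taken from the abstract. Delay calibration and clock-domain synchronization are not modelled.

```python
from fractions import Fraction

# Example ratios listed in the text (X2, X2.5, X3, X3.5, X4).
# Fractions keep the arithmetic exact.
CORE_RATIOS = (Fraction(2), Fraction(5, 2), Fraction(3), Fraction(7, 2), Fraction(4))


def core_clock_mhz(lclk_mhz, ratio):
    """Model of frequency multiplier 128: CCLK = ratio * LCLK."""
    ratio = Fraction(ratio)
    if ratio not in CORE_RATIOS:
        raise ValueError(f"unsupported ratio: {ratio}")
    return ratio * lclk_mhz


def periphery_clock_mhz(lclk_mhz, hclk_mhz, host_access):
    """Model of switch 124: PCLK is HCLK during a host access, else LCLK."""
    return hclk_mhz if host_access else lclk_mhz


if __name__ == "__main__":
    lclk, hclk = 100, 66  # example frequencies from the text, in MHz
    for r in CORE_RATIOS:
        print(f"ratio {float(r)}: CCLK = {float(core_clock_mhz(lclk, r))} MHz")
    print("PCLK, no host access:", periphery_clock_mhz(lclk, hclk, False), "MHz")
    print("PCLK, host access:   ", periphery_clock_mhz(lclk, hclk, True), "MHz")
```

In this model the core clock depends only on LCLK and the selected ratio, so changing which clock drives the periphery leaves CCLK unchanged, which is the independence the text describes.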
