Single-Chip Parallel MIMD DSP
Over 2 Billion RISC-Like Operations per Second
Master Processor (MP)
- IEEE-754 Floating-Point Capability
- 4K-Byte Instruction Cache
- 4K-Byte Data Cache
4 Parallel Processors
- 32-Bit Advanced DSP (ADSP) Processors
- 64-Bit Opcode Provides Many Parallel Operations per Cycle
- 2K-Byte Instruction Cache and 8K Bytes of Data RAM per ADSP
Transfer Controller (TC)
- 64-Bit Data Transfers
- 320M-Byte/s for '320C80-40
- 400M-Byte/s for '320C80-50
- 32-Bit Addressing
- Direct DRAM/VRAM Interface With Dynamic Bus Sizing
- Intelligent Queuing and Cycle
Video Controller (VC)
- Provides Video Timing and VRAM Control
- Dual Frame Timers for 2 Simultaneous Image Capture and/or Display Systems
Big- or Little-Endian Operation
50K Bytes of On-Chip RAM
4G-Byte Address Space
25-ns Cycle Time for '320C80-40
20-ns Cycle Time for '320C80-50
3.3-V Operation With 5-V I/O
IEEE-1149.1† Test Port (JTAG)
The TMS320C80 multimedia video processor (MVP) is a single-chip, MIMD (multiple instruction/multiple data)
parallel processor capable of performing over 2 billion operations per second. It consists of a 32-bit RISC master
processor with an 80-MFLOPS IEEE floating-point unit, four 32-bit advanced DSP (ADSP) processors, a transfer
controller with a 320M-byte/s (TMS320C80-40) or 400M-byte/s (TMS320C80-50) off-chip transfer rate,
and a video controller. All the processors are tightly coupled via an on-chip crossbar that provides shared access
to on-chip RAM. This performance and programmability make the 'C80 ideally suited for multimedia and imaging
applications.
† IEEE Std 1149.1-1990, IEEE Standard Test Access Port and Boundary-Scan Architecture
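The quoted off-chip transfer rates and cycle times are consistent with one 64-bit transfer per clock cycle. The sketch below checks that arithmetic. The 40-MHz and 50-MHz clock rates are inferred from the 25-ns and 20-ns cycle times, and the function names are illustrative only.

```python
def peak_transfer_rate_mbytes(bus_bits: int, clock_mhz: float) -> float:
    """Peak rate in millions of bytes per second, assuming one transfer per clock."""
    return bus_bits / 8 * clock_mhz


def cycle_time_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / clock_mhz


if __name__ == "__main__":
    # Assumed clock rates: 40 MHz for the -40 part, 50 MHz for the -50 part.
    for part, mhz in (("TMS320C80-40", 40), ("TMS320C80-50", 50)):
        rate = peak_transfer_rate_mbytes(64, mhz)
        print(f"{part}: {rate:.0f}M bytes/s peak, {cycle_time_ns(mhz):.0f}-ns cycle")
```

These are peak figures. Sustained rates depend on the memory timing selected for each cycle.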
Copyright © 1994, Texas Instruments Incorporated
POST OFFICE BOX 1443 • HOUSTON, TEXAS 77251-1443
SPRS023 - JULY 1994
A11, A19, A25, ca, 09,
027, D6, D12, D18, D24,
D30, E5, E13, E23, E31,
F4, F10, F16, F22, F26,
F32, J3, J33, L5, L31, M4,
M32, N5, N31, R1, R35,
V4, V32, W5, M1, M1,
M35, AC5, AC31, AD4,
AD32, AE5, AE31, AG3,
AG33, M5, AJ31, AK4,
Ame, AK14, AK20, AK26,
AK32, AL5, AL13, AL23,
AL31, AM6, AM12, AM18,
AM24, AM30, AN9, AN27,
F13, U5, U31

A7, A17, A29, B6, B12,
B13, B24, B30, C15, C21,
D4, D32, F2, F8, F12,
F20, F24, F28, F34, G1,
G35, J5, J31, M2, M34,
N1, N35, R3, R5, R31,
R33, U1, U35, V2, V34,
NR3, AA5, AA31, AA33,
AC1, AC35, AD2, AD34,
AG5, AG31, AJ1, AJ35,
AK2, AK8, AK12, AK16,
AK24, AK28, AK34, AM4,
AM32, AN15, AN21,
AN33, AP6, AP12, AP18,
Terminal Functions
Address bus. These terminals output the 32-bit byte address of the external memory cycle. The
address can be multiplexed for DRAM accesses.
Address shift selection. These signals determine how the column address appears on the address bus.
Eight shift values are supported, including zero.
Bus-size selection. These signals indicate the bus size of the memory or other device being accessed,
allowing dynamic bus sizing for data buses less than 64 bits wide, as indicated below:
8 bits
16 bits
32 bits
64 bits
Cycle-timing selection. These input signals determine the timing of the current memory cycle:
1 cycle/column
Nonpipelined, 1 cycle/column
2 cycles/column
3 cycles/column
Data bus. These signals transfer up to 64 bits of data per memory cycle into or out of the 'C80.
Data buffer enable. This signal drives the active-low output enables of bidirectional transceivers that
can be used to buffer input and output data on D63-D0.
Data-direction indicator. This signal indicates the direction of the data that passes through the
transceivers. When DDIN is low, the transfer is from external memory into the 'C80.
Fault. This input signal is driven low by external circuitry to inform the 'C80 that a fault occurred on the
current memory row access.
Page-size indication. These signals indicate the page size of the memory device(s) being accessed
by the current cycle. The 'C80 uses this to determine when to begin a new row access.
Ready. This signal indicates that the external device is ready to complete the memory cycle. This signal
is driven low by external circuitry to insert wait states into a memory cycle.
Row latch. The high-to-low transition of RL can be used to latch the valid 32-bit byte address that is
present on A31-A0.
Retry. This signal is driven low by external circuitry to indicate that the addressed memory is busy. The
'C80 will begin the cycle again.
Status code (STATUS5-STATUS0). At row time, these signals indicate the type of cycle being performed. At column time, they
identify the processor and type of request that initiated the cycle.
User-timing selection. This signal causes the timing of RAS and CAS7-CAS0 to be modified so that
custom memory timings can be generated. During reset, UTIME selects the endian mode in which the
'C80 operates.
Column-address strobes. These outputs drive the CAS inputs of DRAMs and VRAMs. The eight
strobes provide byte write access to memory.
Special function. This signal selects special VRAM functions such as block write, load color register,
and split-register transfer.
Row-address strobe. This signal drives the RAS inputs of DRAMs and VRAMs.
Transfer/output enable. During memory-read cycles, TRG is used as an output enable for DRAMs and
VRAMs. During VRAM register-transfer cycles, TRG is used as a transfer enable.
Write enable. This signal is driven low before CAS during write cycles. W controls the direction of the
transfer during VRAM transfer cycles.
† I = input, O = output, Z = high impedance
Terminal Functions (Continued)
Host acknowledge. The 'C80 drives this terminal low following an active HREQ to indicate that it has
driven the local-memory-bus signals to the high-impedance state and is relinquishing the bus. HACK
is driven high asynchronously following HREQ being detected inactive, and the 'C80 resumes driving
the bus.
Host request. An external device drives this input low to request ownership of the local-memory bus.
When HREQ is high, the 'C80 owns and drives the bus. HREQ is internally synchronized to the 'C80's
internal clock. HREQ is also used at reset to determine the power-up state of the MP. If HREQ is low
at the rising edge of RESET, the MP comes up running. If HREQ is high, the MP remains halted until
the first interrupt occurrence on EINT3.
Internal cycle request. These signals provide a two-bit code indicating the highest priority
memory-cycle request that is being received by the TC. External logic can monitor these signals to
determine if it is necessary to relinquish the local-memory bus to the 'C80.
Low-priority packet transfer, trickle refresh, idle
High-priority packet transfer
Cache/DEA (direct external access) request, urgent packet transfer
VC SRT (serial-register transfer), urgent refresh, XPT (external packet
transfer), or VCPT (VC packet transfer)
Input clock. This signal generates the internal 'C80 clocks to which all processor functions (except the
frame timers) are synchronous.
Local output clock. This signal provides a way to synchronize external circuitry to internal timings. All
'C80 output signals (except the VC signals) are synchronous to this clock.
Edge-triggered interrupts. These signals allow external devices to interrupt the MP on one of three
interrupt levels (EINT1 is the highest priority). The interrupts are rising-edge triggered. EINT3 also
serves as an unhalt signal. If the MP is powered up halted, the first rising edge on EINT3 causes the
MP to unhalt and fetch its reset vector (the EINT3 interrupt pending bit is not set in this case).
priority falls below that of the edge-triggered interrupts. Any interrupt request should remain low until
it is recognized by the 'C80.
Reset. This signal is driven low to reset the 'C80 (all processors). During reset, all internal registers
are set to their initial state and all outputs are driven to their inactive or high-impedance levels. During
the rising edge of RESET, the MP reset mode and the 'C80's operating endian mode are determined
by the levels of the HREQ and UTIME terminals, respectively.
External packet transfer. These encoded inputs are used by external devices to request a high-priority
XPT by the TC.
Emulation terminals. These terminals are used to support emulation host interrupts, special functions
targeted at a single processor, and multiprocessor halt-event communications.
Test clock. This signal provides the clock for the 'C80's IEEE-1149.1 logic, allowing it to be compatible
with other IEEE-1149.1 devices, controllers, and test equipment designed for different clock rates.
Test data input. This signal provides input data for all IEEE-1149.1 instructions and data scans of the
'C80.
Test data output. This signal provides output data for all IEEE-1149.1 instructions and data scans of
the 'C80.
Test mode select. This signal controls the IEEE-1149.1 state machine.
Test reset. This signal resets the 'C80's IEEE-1149.1 module. When low, all boundary-scan logic is
disabled, allowing normal 'C80 operation.
† I = input, O = output, Z = high impedance
‡ This terminal has an internal pullup and can be left unconnected during normal operation.
§ This terminal has an internal pulldown and can be left unconnected during normal operation.
Terminal Functions (Continued)
Composite area. These signals define a special area such as an overscan boundary. This area
represents the logical OR of the internal horizontal and vertical area signals.
Composite blanking/vertical blanking. Each of these signals provides one of two blanking functions,
depending on the configuration of the CSYNC/HBLNK terminal:
Composite blanking disables pixel display/capture during both horizontal and vertical retrace
periods and is enabled when CSYNC is selected for composite-sync video systems.
Vertical blanking disables pixel display/capture during vertical retrace periods and is enabled when
HBLNK is selected for separate-sync video systems.
Initially these signals are configured as CBLNK0, CBLNK1.
Composite sync/horizontal blanking. These terminals can be programmed for one of two functions:
Composite sync is for use on composite-sync video systems and can be programmed as an input,
output, or high-impedance signal. As an input, the 'C80 extracts horizontal and vertical sync
information from externally generated active-low sync pulses. As an output, the active-low
composite sync pulses are generated from either external HSYNC and VSYNC signals or the
'C80's internal video timers. In the high-impedance state, the terminal is neither driven nor allowed
to drive circuitry.
Horizontal blank disables pixel display/capture during horizontal retrace periods in separate-sync
video systems and can be used as an output only.
Immediately following reset, these signals are configured as high-impedance CSYNC0 and CSYNC1.
Frame clock. These signals are derived from the external video system's dot clock and are used to drive
the 'C80's video logic for frame timer 0 and frame timer 1.
Horizontal sync. These signals control the video system. They can be programmed as input, output,
or high-impedance signals. As an input, HSYNC synchronizes the video timer to externally generated
horizontal sync pulses. As an output, HSYNC is an active-low horizontal sync pulse generated by the
'C80 on-chip frame timer. In the high-impedance state, the terminal is not driven and no internal
synchronization is allowed to occur. Immediately following reset, these signals are in the
high-impedance state.
Serial data clock. These clock inputs are used by the 'C80's SRT controller to track the VRAM tap point
when using midline reload. SCLK0 and SCLK1 should be the same signals that clock the serial register
on the VRAMs controlled by frame timer 0 and frame timer 1, respectively.
Vertical sync. These signals control the video system. They can be programmed as inputs, outputs,
or high-impedance signals. As inputs, VSYNCx synchronizes the frame timer to externally generated
vertical sync pulses. As outputs, VSYNCx are active-low vertical-sync pulses generated by the 'C80
on-chip frame timer. In the high-impedance state, the terminal is not driven and no internal
synchronization is allowed to occur. Immediately following reset, this signal is in the high-impedance
state.
Ground. Electrical ground inputs
Power. Nominal 3.3-V power supply inputs
5-V power. Nominal 5-V power supply inputs
† I = input, O = output, Z = high impedance
‡ For proper operation, all VDD and VSS terminals must be connected externally.
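The bus-size selection described in the terminal functions lets a device narrower than 64 bits sit on the data bus, at the cost of more memory cycles per transfer. The sketch below only counts how many bus cycles an aligned transfer needs at each supported width. It is an illustration of the arithmetic, not a description of how the TC sequences the cycles, and the names are hypothetical.

```python
import math

# Bus widths listed under bus-size selection.
BUS_WIDTHS_BITS = (8, 16, 32, 64)


def bus_cycles(transfer_bytes: int, bus_bits: int) -> int:
    """Number of data-bus cycles for an aligned transfer (assumed model)."""
    if bus_bits not in BUS_WIDTHS_BITS:
        raise ValueError(f"unsupported bus width: {bus_bits}")
    return math.ceil(transfer_bytes / (bus_bits // 8))


if __name__ == "__main__":
    for width in BUS_WIDTHS_BITS:
        print(f"{width:2d}-bit bus: {bus_cycles(8, width)} cycle(s) for 8 bytes")
```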
Figure 1. Block Diagram Showing Datapaths
Figure 1 shows the major components of the 'C80: the master processor (MP), the advanced digital signal
processors (ADSPs), the transfer controller (TC), the video controller (VC), and the IEEE-1149.1 emulation
interface. Shared access to on-chip RAM is achieved through the crossbar. Crossbar connections are
represented by ○. Each ADSP can perform three accesses per cycle through its local (L), global (G), and
instruction (I) ports. The MP can access two RAMs per cycle through its crossbar data (C/D) and instruction
(I) ports, and the TC can access one RAM through its crossbar interface. Up to 15 simultaneous accesses are
supported in each cycle. Addresses can be changed every cycle, allowing the crossbar matrix to be changed
on a cycle-by-cycle basis. Contention between processors for the same RAM in the same cycle is resolved by
a round-robin priority scheme. In addition to the crossbar, a 32-bit datapath exists between the MP and the TC
and VC. This allows the MP to access TC and VC control registers that are memory mapped into the MP's
memory space.
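The text says only that contention for a RAM is resolved by a round-robin priority scheme. It does not give the rotation rule used by the hardware. As a rough illustration, the sketch below implements the generic textbook form of round-robin arbitration for one RAM: the requester just after the most recent winner has the highest priority in the next cycle. The class name and the use of integer requester indices are assumptions made for the example.

```python
class RoundRobinArbiter:
    """Generic round-robin arbiter for one shared resource (illustrative only)."""

    def __init__(self, n_requesters: int) -> None:
        self.n = n_requesters
        # Start so that requester 0 has the highest priority in the first cycle.
        self.last_winner = n_requesters - 1

    def grant(self, requests: set) -> "int | None":
        """Return the index granted access this cycle, or None if nobody asks."""
        for offset in range(1, self.n + 1):
            candidate = (self.last_winner + offset) % self.n
            if candidate in requests:
                self.last_winner = candidate
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(4)
    # Requesters 0 and 2 contend for the same RAM in three consecutive cycles.
    print([arbiter.grant({0, 2}) for _ in range(3)])
```

A losing requester simply retries in the next cycle, so no requester can be starved by the others.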
The 'C80 has a 4G-byte address space as shown in Figure 2. The lower 32M bytes are used to address internal
RAM and memory-mapped registers.
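The split between on-chip and external addresses can be expressed as a simple range check. The sketch below assumes only what the text states, that the lower 32M bytes of the 4G-byte space are on-chip. It does not model the reserved regions inside that range, and the names are illustrative.

```python
ADDRESS_SPACE_BYTES = 2**32          # 4G-byte address space
INTERNAL_SPAN_BYTES = 32 * 2**20     # lower 32M bytes: on-chip RAM and registers

# Remainder available for external memory, in M bytes.
EXTERNAL_MBYTES = (ADDRESS_SPACE_BYTES - INTERNAL_SPAN_BYTES) // 2**20


def is_internal(address: int) -> bool:
    """True if a byte address falls in the lower 32M bytes of the map."""
    if not 0 <= address < ADDRESS_SPACE_BYTES:
        raise ValueError("address outside the 32-bit address space")
    return address < INTERNAL_SPAN_BYTES


if __name__ == "__main__":
    print(is_internal(0x01FFFFFF), is_internal(0x02000000), EXTERNAL_MBYTES)
```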
[Memory-map diagram: the lower 32M bytes hold, for each of ADSP0 through ADSP3, a 2K-byte parameter RAM, three 2K-byte data RAMs (RAM0 through RAM2), and a 2K-byte instruction cache; the MP parameter RAM (2K bytes), MP instruction cache, and MP data cache (4K bytes each); the memory-mapped VC registers (512 bytes) and TC registers; and reserved regions. External memory (4064M bytes) occupies the rest of the address space.]
Figure 2. Memory Map
master processor (MP) overview
The MP is a 32-bit RISC (reduced instruction set computer) with an integral IEEE-754 floating-point unit. As with
other RISC processors, all accesses to memory are performed with load and store instructions, and most integer
and logical operations are performed on registers in a single cycle. The floating-point instructions are pipelined:
although a single-precision instruction such as a floating-point multiply takes three cycles to complete, such an
instruction can be started on each clock cycle. Likewise, double-precision instructions, such as square root, that
take 28 cycles to complete can start on any clock cycle. Floating-point-unit operations use the same register
file as the integer and logic unit. A register scoreboard ensures that correct register-access sequences are
maintained.
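The benefit of pipelining can be seen by counting cycles. The sketch below assumes a stream of independent operations with no register dependencies, so the scoreboard never stalls the pipeline, and compares issuing one operation per cycle with waiting for each operation to finish. The function names are illustrative.

```python
def pipelined_cycles(n_ops: int, latency: int, issue_interval: int = 1) -> int:
    """Cycles until the last of n independent operations completes."""
    if n_ops <= 0:
        return 0
    # The last operation issues (n_ops - 1) intervals after the first,
    # then needs its full latency to complete.
    return (n_ops - 1) * issue_interval + latency


def unpipelined_cycles(n_ops: int, latency: int) -> int:
    """Cycles if each operation had to finish before the next could start."""
    return max(n_ops, 0) * latency


if __name__ == "__main__":
    # 100 single-precision multiplies with the 3-cycle latency quoted above.
    print(pipelined_cycles(100, 3), unpipelined_cycles(100, 3))
```

With dependent operations the scoreboard would hold back an instruction until its source registers are ready, so real code falls somewhere between the two figures.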
Instructions and data are both fetched from on-chip caches, each 4K bytes in size. The control for these caches
is an integral part of the MP design. The MP is able to access the on-chip memories by using the crossbar.
The MP is structurally designed for efficient execution of C code. As an example, the MP contains an R0 register,
often called a zeroing register, used by C. Also, the MP instruction set is tailored to contain many of the C
executables found in compiler technology.
Figure 3 shows the block diagram for the master processor.
[Block diagram: register file (31 32-bit registers), mask generator, zero comparator, integer ALU, floating-point multiplier, floating-point adder, control registers, instruction register, program counters, emulation logic, endian multiplexers, instruction cache, data cache, and crossbar interface.]
Figure 3. Master Processor Architecture
master processor (MP) overview (continued)
Key features of the MP include:
- A 32-bit RISC processor
  - Load/store architecture
  - 3-operand arithmetic and logical instructions
- Thirty-one 32-bit general-purpose registers
- Four double-precision floating-point vector accumulators
- IEEE-754 floating-point hardware
- 4K-byte instruction cache
- 4K-byte data cache
- Data and instruction cache characteristics that include:
  - 4-way set associative
  - LRU replacement
  - Data writeback
  - No bus snooping or bus watching
- 2K-byte parameter RAM (not cached)
- Delayed branches with option to annul delay-slot instructions
- Explicit compare instructions (no dedicated status register)
- Register and accumulator scoreboard
- 15-bit or 32-bit immediate constants
- Vector floating-point instructions
  - Initiate a floating-point operation and a parallel load or store in one instruction
  - Multiply and accumulate
- Scalable timer
- Leftmost-one and rightmost-one logic
- Performance at 30 ≤ N ≤ 50 MHz internal frequency:
  - 2 × N MFLOPS peak (80 MFLOPS at 40 MHz)
  - N MIPS (40 MIPS at 40 MHz)
  - Over 2600 × N Dhrystones (104000 Dhrystones at 40 MHz)
- Control registers used for vector data loads or stores, emulation, exceptions, and cache control
- 32-bit address space for bytes
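The performance figures in the list scale linearly with the internal clock frequency N. As a quick check of the quoted numbers, the sketch below evaluates the three expressions for a given N. The function name and dictionary keys are illustrative, and the Dhrystone value is the stated lower bound rather than a measured result.

```python
def mp_peak_performance(n_mhz: int) -> dict:
    """Evaluate the MP performance expressions for an internal clock of N MHz."""
    if not 30 <= n_mhz <= 50:
        raise ValueError("N must be between 30 and 50 MHz")
    return {
        "mflops_peak": 2 * n_mhz,          # 2 x N MFLOPS peak
        "mips": n_mhz,                     # N MIPS
        "dhrystones_min": 2600 * n_mhz,    # over 2600 x N Dhrystones
    }


if __name__ == "__main__":
    for n in (40, 50):
        print(n, mp_peak_performance(n))
```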
advanced digital signal processor (ADSP) overview
The 'C80's ADSP is a programmable DSP-like 32-bit integer processor with a 64-bit instruction word that is
optimized for imaging and graphics applications. It supports the filtering and frequency domain operations
required for image processing. The ADSP can execute in parallel a multiply, an ALU operation (such as a
shift-and-add), and two memory accesses within a single instruction.
The ADSP has a three-input ALU that supports all 256 Boolean combinations of three inputs and many
combinations of arithmetic and Boolean functions. Data merging and bit-to-byte, bit-to-halfword, and bit-to-word
translations are supported by hardware along the input data path to the ALU. These merging and translation
operations allow the ADSP to accelerate graphics applications such as windowing environments. The internal
parallelism allows a single ADSP to achieve over 500 million operations per second for certain algorithms.
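The figure of 256 Boolean combinations follows from the fact that a function of three 1-bit inputs has 8 input combinations, and each can map to 0 or 1, giving 2^8 = 256 functions. The sketch below applies any such function bitwise to 32-bit words, selecting it with an 8-bit truth table. The truth-table bit ordering is an assumption made for this example and is not the ADSP opcode encoding.

```python
MASK32 = 0xFFFFFFFF


def boolean3(truth_table: int, a: int, b: int, c: int) -> int:
    """Apply a 3-input Boolean function bitwise to 32-bit words.

    Bit k of truth_table is the output for the input combination
    k = (a_bit << 2) | (b_bit << 1) | c_bit  (assumed ordering).
    """
    result = 0
    for k in range(8):
        if (truth_table >> k) & 1:
            term_a = a if k & 4 else ~a
            term_b = b if k & 2 else ~b
            term_c = c if k & 1 else ~c
            result |= term_a & term_b & term_c  # one minterm of the function
    return result & MASK32


if __name__ == "__main__":
    a, b, c = 0xF0F0F0F0, 0xCCCCCCCC, 0xAAAAAAAA
    print(hex(boolean3(0x96, a, b, c)))  # three-way XOR under this ordering
    print(hex(boolean3(0xCA, a, b, c)))  # per-bit select: b where a is 1, else c
```

The select function shown last is the kind of per-bit merge that is useful for transparency and masking in graphics code.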
key features of ADSP
Key features of the ADSP include:
- 64-bit instruction word supports many parallel operations, such as a multiply, an ALU operation, and two
  memory accesses in a single cycle
- 3-stage pipeline provides fast instruction cycle
- 8 data, 10 address, and 6 index registers
- 20 other user-visible registers
- Data unit
  - 16 x 16 integer multiplier (optional 8 x 8 multiplies)
  - Splittable 3-input ALU
  - 32-bit barrel rotator
  - Mask generator
  - Multiple-status flag expander facilitates translations to and from 1-bit-per-pixel space. It also
    supports transparency, max, min, saturation, z-buffering, and patterning.
  - Conditional operations to reduce branch requirements and delays: operations include both
    conditional assignment of data unit result(s) (16 condition codes) and conditional source selection
    (based on negative status bit).
  - Special processing hardware such as leftmost 1 and rightmost 1 detection and leftmost-bit-change
    and rightmost-bit-change detection
- Memory addressing
  - 2 address units (global and local), allowing up to two 32-bit memory accesses in parallel with data unit
  - 12 basic address modes (variations of immediate and indexed addressing)
  - Byte, halfword, and word addressability
  - 8-, 16-, and 32-bit data (or pixel) sizes
  - Loads of 8-bit or 16-bit data are either sign- or zero-extended to 32 bits
  - Indexed addressing can be scaled according to data size
  - Big- and little-endian addressing supported
  - Conditional assignment for loads (memory-to-register transfers) based on 1 of 16 condition codes
  - Conditional source selection for stores (register-to-memory transfers) based on negative status bit
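Sign and zero extension of narrow loads, listed above, can be illustrated in a few lines. The sketch below shows what each choice does to an 8-bit or 16-bit value when it is widened to 32 bits. It models the arithmetic only, and the function name and arguments are hypothetical.

```python
def extend_to_32(value: int, size_bits: int, signed: bool) -> int:
    """Widen an 8-bit or 16-bit value to 32 bits with sign or zero extension."""
    if size_bits not in (8, 16):
        raise ValueError("size_bits must be 8 or 16")
    low_mask = (1 << size_bits) - 1
    value &= low_mask
    if signed and value & (1 << (size_bits - 1)):
        # Replicate the sign bit into the upper bits of the 32-bit word.
        value |= 0xFFFFFFFF & ~low_mask
    return value


if __name__ == "__main__":
    print(hex(extend_to_32(0x80, 8, signed=True)))
    print(hex(extend_to_32(0x80, 8, signed=False)))
```

Zero extension suits unsigned pixel data, while sign extension preserves negative values such as filter coefficients.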
key features of ADSP (continued)
[Block diagram: data path (mask generator, barrel rotator, three-input ALU), local address unit, global address unit, program flow control unit (three zero-overhead loop/branch controllers, instruction and cache control), local data and global data ports, instruction address port, local address port, global address port, replicate hardware, and align/sign-extend hardware.]
Figure 4. ADSP Block Diagram
key features of ADSP (continued)
- Program flow
  - Three hardware loop controllers that support zero-overhead looping and/or branching, three nested
    loops, one loop with multiple end points, and many other flexible looping combinations
  - The program counter (PC register) is mapped into the register file. Either the ALU or the global
    address unit can write to the PC register conditionally or unconditionally to cause a branch or
    subroutine call.
  - Interrupts for message passing and context switching
  - Instruction-cache management for accelerating program execution on the ADSP
- Run-time parallel-programming-environment support
- Algebraic assembly language
typical applications of ADSP
The ADSP serves as a high-speed pixel coprocessor for the RISC-like MP (master processor). Typical tasks
performed by the ADSPs are:
- Pixel-intensive processing
  - Motion estimation
  - Mean square error
- Domain transforms
- Core graphics functions
  - Shaded fills
- Image analysis
  - Segmentation
  - Feature extraction
- Bit-stream coding/decoding
  - Data merging
  - Table lookups
transfer controller (TC) overview
The transfer controller (TC) is a combined memory controller and DMA (direct memory access) machine. It
handles the movement of data and instructions within the 'C80 system as required by the master processor,
parallel processors, video controller, and external devices.
The transfer controller performs the following data-movement and memory-control functions:
- MP and ADSP instruction-cache fills
- MP data-cache fills and dirty-block write-back
- MP and ADSP packet transfers (PTs)
- Externally initiated packet transfers (XPTs)
- VC packet transfers (VCPTs)
- MP and ADSP direct external accesses (DEAs)
- VC shift-register transfers (SRTs)
- DRAM refresh
- External bus requests
Operations are performed on the cache subblock as requested by the processors' internal cache controllers.
DEA operations transfer off-chip data directly to or from processor registers. Packet transfers are the main data
transfer operations and provide an extremely flexible method for moving multidimensional blocks of data
(packets) between on-chip and/or off-chip memory.
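To give a feel for what moving a multidimensional block involves, the sketch below generates the start address and length of each line of a two-dimensional block, such as a rectangle of pixels inside a larger image. The packet-transfer parameter format is not given in this text, so the parameter names (`base`, `line_bytes`, `line_count`, `pitch`) are hypothetical and the code illustrates the addressing idea only.

```python
from typing import Iterator, Tuple


def block_line_addresses(
    base: int, line_bytes: int, line_count: int, pitch: int
) -> Iterator[Tuple[int, int]]:
    """Yield (start_address, length) for each line of a 2-D block.

    base       - byte address of the first line (hypothetical parameter)
    line_bytes - contiguous bytes per line
    line_count - number of lines in the block
    pitch      - byte distance between the starts of successive lines
    """
    for line in range(line_count):
        yield base + line * pitch, line_bytes


if __name__ == "__main__":
    # A 16-byte-wide, 3-line rectangle inside a 640-byte-wide image.
    for start, length in block_line_addresses(0x1000, 16, 3, 640):
        print(hex(start), length)
```

A controller that generates these addresses on its own frees the processors from computing them in software for every line.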
[Block diagram: packet transfer logic, cache buffer, request queuing and prioritization logic, MP cache and ADSP cache request paths.]
Figure 5. Transfer Controller Architecture
`*3‘ TEXAS
`E>z 0 r
`n E -
`n O :
`1 § 1 O 2
`2 Q I
`-< E o
`: 0L
`L E u
`: U 5 > D <
`SPF-15023 — JULY ‘I99-1
`transfer controller (TC) overview (continued)
`Key features of the TC include:
`" Crossbar interface
`64-bit data path
`Single-cycle access
`External memory interface
`4G-byte address range
`— Dynamically configurable memory cycles
`Elus size of 8. 16. 32. or 64 bits
`Selectabie memory page size
`Selectable rowlcolumn address multiplexing
`Selectable cycle timing
`Big or little endian operation
`Cache, VRAM. and refresh controller
`Programmable refresh rate
`— VRAM bloclowrite support
`Independent source and destination addressing
`— Autonomous address generation based on packet transfer paramet