PROCESSOR AND ITS APPLICATION
DEVELOPMENT TOOLS

AT&T TECHNICAL JOURNAL

James R. Boddie, Renato N. Gadenz, Robert N. Kershaw,
W. Patrick Hays, and James Tow

James R. Boddie is head of the Signal Processing and Integrated
Circuit Design Department at AT&T Bell Laboratories in Holmdel,
New Jersey. W. Patrick Hays is a supervisor, and Renato N. Gadenz
and James Tow are members of technical staff in that department.
Robert N. Kershaw is a supervisor in the Digital IC Design
Department at AT&T Bell Laboratories in Allentown, Pennsylvania.
Mr. Boddie's department designs analog and digital signal
processing integrated circuits and support systems. Mr. Boddie
joined AT&T in 1977 and has a B.S.E.E. from Auburn University, an
S.M. and E.E. from Massachusetts Institute of Technology (MIT),
and a Ph.D. from Auburn University, all in electrical engineering.
Mr. Gadenz, who designs and tests new VLSI circuits for digital
signal processing, joined
(continued on page 104)

The WE® DSP32 digital signal processor is a high-speed,
programmable, VLSI circuit with 32-bit floating-point arithmetic.
The device can be used cost-effectively in a wide variety of
complex digital signal processing applications, such as speech
recognition, high-speed modems, low bit-rate voice coders,
multichannel signaling systems, and signal processing
workstations. In this paper, we review the architecture of the
DSP32 and present its instruction set, including some examples.
We also discuss the software and hardware support tools that are
available for developing DSP32 applications.

An Expanding Family
The WE® DSP32 digital signal processor [1,2] is the latest member
of a family of single-chip, programmable digital signal processors
developed at AT&T Bell Laboratories.
The first family member appeared in 1979 and was called simply
DSP [3]. It was one of the first single-chip, programmable digital
signal processors ever made.
As very-large-scale-integration (VLSI) technology advanced,
the demand for the DSP justified its redesign. The new device, the
WE DSP20, ran twice as fast, used less power, and cost less than
the original DSP, yet was pin, architecture, and instruction set
compatible.

DSP32 Features
Both the DSP and DSP20 are 20-bit, fixed-point processors
and can execute instructions at the rate of 1.25 and 2.5 million
instructions per second, respectively.
The DSP32, on the other hand, can execute 4 million instructions
per second, operating on 32-bit, floating-point numbers. It also
has a complete set of microprocessor instructions that operate on
16-bit, fixed-point numbers for ease of use in logic and control
operations.
In addition, the DSP32 offers both serial and parallel input/output
(I/O) with direct-memory access (DMA) capability. (DMA allows data
transfers from an input buffer to memory or from memory to an
output buffer, without program intervention.)

CommScope, Inc.
IPR2023-00066, Ex. 1012
Page 1 of 16

`Like the previous DSPs, the DSP32 has an on-
`chip, random-access memory (RAM) for storing variable
`data, and an on-chip, mask-programmable read-only mem-
`ory (ROM) for storing instructions and fixed data. Thus,
`the device can be customized for a wide variety of signal
`processing applications.
`Although there are many different types of appli-
`cations, most require real-time execution of repetitive
`multiplications and additions. Like its predecessors, the
`DSP32 is optimized for this type of computation. But it
`offers a new architecture that allows users to develop more
`complex applications requiring floating-point arithmetic.
`The DSP32 can be cost-effective for a wide vari-
`ety of digital signal processing applications, such as speech
`recognition, high-speed modems, low bit-rate voice cod-
`ers, multichannel signaling systems, and signal processing
`workstations. Algorithms that are suited for floating point
`include matrix inversion; spectral analysis; high-quality,
`low bit-rate speech graphics; and image processing.
`
Our goal in designing the DSP32 has been to balance
high performance with ease of use. By high
performance, we mean fast execution of signal processing
algorithms using high-quality arithmetic. A major
contributor to this goal is the use of a 32-bit, floating-point
data arithmetic unit.
`An 8-bit exponent in the 32-bit, floating-point
`word yields a large dynamic range that makes overflow
`unlikely, while a 24-bit normalized mantissa gives high pre-
`cision, independent of magnitude. Large dynamic range
`and high precision are often essential in advanced
`algorithms.
`Floating point also contributes to ease of use,
`because it frees programmers from concern about inter-
`mediate scaling to avoid overflow or loss of precision.
`Further, algorithms that are initially developed on main-
`frame computers or array processors and use floating-
`point arithmetic can be easily adapted to run on the
`DSP32.
`
`Ease of use applies not only to programming or
`interfacing to the DSP32, but to the support tools that are
`available for developing DSP32 applications. We will dis-
`cuss the architecture, instruction set, and support tools
`later.
`
The DSP32 can do the following, all with 32-bit,
floating-point precision and dynamic range:
• a 1024-point, complex, Fast Fourier Transform (FFT) in
  19.2 ms (including bit reversal)
• a finite-impulse-response filter in 250 ns per tap
• a second-order section (four multiply) for a recursive,
  infinite-impulse-response (IIR) filter in 1 µs.
The device is available in two packages: a 40-pin
dual in-line package, and a 100-pin pin array that can use
external memory to expand the on-chip memory. The chip
is manufactured in 1.5-µm effective channel length, n-type
metal-oxide semiconductor (NMOS) technology, and has
about 155,000 transistors in an 81-mm² area (12.70 mm by
6.35 mm).
The DSP32 operates with a 16.384-MHz clock
and a single 5-V power supply. For the 40-pin device, typical
power dissipation is 1.8 W, while worst-case power dissipation
is 2.3 W. For the 100-pin package, typical power
dissipation is 2.0 W and worst case is 2.6 W. The device is
now being manufactured in high volumes.
`Recently, the DSP32 has been manufactured with
`a 10-percent photolithographic shrink. Samples will soon
`be available from two different processes; one leads to
`higher speed devices and the other to lower power devices.
`
`Its Architecture
The DSP32’s architecture consists of several specialized
sections (Figure 1) that work in parallel to achieve
a high throughput. A 32-bit bus interconnects these sections,
and control is distributed among them.
The device has two execution units, the data
arithmetic unit (DAU) and the control arithmetic unit
(CAU), and its on-chip memory includes 2048 bytes of
ROM and 4096 bytes of RAM. The memory can be
`addressed as 8-, 16-, or 32-bit words and is organized to
`
`access 32-bit data at the same speed as 8-bit data.
A flexible serial I/O (SIO) unit allows direct interface
to a codec, a time-division multiplex line, or another
DSP32, while an 8-bit parallel I/O (PIO) unit allows
bidirectional communication with a microprocessor.
Data Arithmetic Unit. The DAU is the primary execution
unit for signal processing algorithms. As Figure 1
shows, it contains a 32-bit, floating-point multiplier; a
40-bit, floating-point adder; and four 40-bit accumulator
registers (a0 through a3) that provide temporary storage, thus
reducing memory accesses.
`The DAU has a multiply-add structure; that is, the
`multiplier output is directly connected to the adder input. It
`does 4 million instructions per second of the form,
`A = B + C * D. In this instruction, we have both a
`floating-point multiplication and a floating-point addition,
`which yields a throughput of 8 million floating-point opera-
`tions per second.
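As a rough check on these figures, the following sketch (ours, not part of any DSP32 tool) works the arithmetic in Python. It takes the nominal rate of 4 million instructions per second from the text and assumes one multiply-add instruction per filter multiply; the function and constant names are illustrative only.

```python
# Back-of-the-envelope timing, using the nominal figures quoted in the text.
INSTRUCTIONS_PER_SECOND = 4_000_000   # nominal DAU instruction rate
FLOPS_PER_INSTRUCTION = 2             # one multiply plus one add (A = B + C * D)


def instruction_time_ns(rate=INSTRUCTIONS_PER_SECOND):
    """Time for one instruction, in nanoseconds."""
    return 1e9 / rate


def peak_mflops(rate=INSTRUCTIONS_PER_SECOND):
    """Peak floating-point operations per second, in millions."""
    return rate * FLOPS_PER_INSTRUCTION / 1e6


def section_time_us(multiplies, rate=INSTRUCTIONS_PER_SECOND):
    """Time for a filter section needing one instruction per multiply."""
    return multiplies * instruction_time_ns(rate) / 1000.0


print(instruction_time_ns())   # 250.0 ns, the time for one FIR tap
print(peak_mflops())           # 8.0
print(section_time_us(4))      # 1.0 us for a four-multiply IIR section
```

These values agree with the per-tap and per-section times quoted earlier for the device.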
The DAU multiplier inputs are 32-bit, floating-point
numbers with a 24-bit mantissa and 8-bit exponent.
Inputs to the multiplier and adder can come from
memory, I/O registers, or an accumulator (a0-a3). The
adder inputs from memory or I/O are 8, 16, or 32 bits
wide, while those from the multiplier or an accumulator are
40 bits wide (an 8-bit exponent and a 32-bit mantissa that
includes eight guard bits).
These 40 bits of precision are maintained in the
adder and the accumulators; the eight guard bits of the
mantissa allow extra precision in intermediate accumulations.
However, the 40-bit result in an accumulator is
truncated to 32 bits when written to memory or I/O, or
provided as an input to the multiplier.
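The benefit of guard bits can be illustrated with a small numerical experiment. The sketch below is a simplified model, not the DSP32 number format: it assumes truncation toward zero after every addition and uses Python floats with a reduced mantissa. The helper names `truncate_mantissa` and `accumulate` are hypothetical.

```python
import math


def truncate_mantissa(x, bits):
    """Truncate x toward zero so that its mantissa keeps only `bits` bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e, with 0.5 <= |m| < 1
    return math.ldexp(math.trunc(m * (1 << bits)), e - bits)


def accumulate(values, acc_bits, out_bits=24):
    """Sum `values` in an accumulator with an `acc_bits`-bit mantissa,
    then truncate the final result to `out_bits` bits, as when the
    accumulator is written to memory."""
    acc = 0.0
    for v in values:
        acc = truncate_mantissa(acc + truncate_mantissa(v, out_bits), acc_bits)
    return truncate_mantissa(acc, out_bits)


samples = [0.1] * 1000
error_without_guard = abs(accumulate(samples, acc_bits=24) - 100.0)
error_with_guard = abs(accumulate(samples, acc_bits=32) - 100.0)
print(error_without_guard, error_with_guard)
```

In this model the 32-bit accumulator mantissa leaves a noticeably smaller error than the 24-bit one, because truncation losses in the intermediate sums fall mostly in the guard bits.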
The DAU converts floating-point data to and from
16-bit integer data. It also converts floating-point data to
and from µ-law and A-law companded data formats.
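For readers unfamiliar with companding, the sketch below shows the textbook continuous µ-law curve with µ = 255. It illustrates the idea only: the 8-bit companded formats used in telephony are segmented approximations of this curve, and nothing here is taken from the DSP32 conversion logic. The function names are our own.

```python
import math

MU = 255.0  # standard mu-law parameter for 8-bit telephony


def mu_law_compress(x, mu=MU):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


for sample in (0.01, 0.1, 0.5):
    y = mu_law_compress(sample)
    print(sample, round(y, 4), round(mu_law_expand(y), 6))
```

Small inputs are expanded toward larger companded values, which is what gives companded 8-bit data a wider useful dynamic range than 8-bit linear data.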
Control Arithmetic Unit. The CAU, a 16-bit, fixed-point
unit, has a dual function. It generates and post modifies
addresses for accessing operands in memory, and it
executes microprocessor instructions on 16-bit data for
logic and control operations. (Post modify means increment
the address after completing the operation.)
As Figure 1 shows, the CAU has 21 16-bit general-purpose
registers (r1 through r21); a 16-bit program
counter (pc); and a full-function, arithmetic logic unit
(ALU). All CAU registers are static and do not require
refreshing.
For instructions that are primarily executed in the
DAU, registers r1 through r14 are used as memory
pointers and registers r15 through r19 hold address
increments. A pointer register contains the address of an
operand in memory that is to be either read or written,
and an increment register contains a number that is added
to the pointer to alter it. This addition is done in the ALU.
Register r20, also called PIN (pointer-in), is the
pointer for serial DMA input, and register r21, also called
POUT (pointer-out), is the pointer for serial DMA output.
These registers can also be used as general-purpose registers,
but their effect on DMA operations must be
considered.
As stated above, the CAU also executes its own
set of instructions. They implement two’s complement,
fixed-point arithmetic operations; logic functions; control
operations like conditional branching; and data moves
between memory, I/O, and the CAU registers. Executing
logic and control operations in the CAU not only makes
programming the device easier but also enhances the
DSP32 performance.
`Memory. The DSP32 provides on-chip memory
`(Figure 1) that includes a 512 by 32-bit ROM and a 1024 by
`32-bit RAM that is divided into two equal 512 by 32-bit
`segments.
`Data can be 8, 16, or 32 bits wide, and memory is
`uniformly byte addressable. This means that the four indi-
`vidual bytes and the two 16-bit words in each 32-bit word
`can be addressed independently.
Byte addressability is important because, besides
using 32-bit instructions and floating-point data, the
DSP32 uses 16-bit fixed-point integers and 8-bit companded
(µ-law and A-law) data.
The RAM is dynamic and is refreshed either automatically
once every 32 instructions or under program
control. The ROM is mask programmed.
For the 100-pin pin array package, an additional 56
kbytes of memory can be accessed externally. If standard
byte-wide memory chips are used for expanding memory,
no additional interfacing devices are needed. Also, if this
external memory is fast enough (80-ns access time), there
is no speed penalty when accessing it.
`The DSP32 memory space (ROM, RAM, and
`external memory) is logically divided into two banks:
`Lower bank 0, which can be expanded with external mem-
`ory, and Upper bank 1. Memory space can be configured in
`four different ways using two of the DSP32 pins.
`To achieve maximum throughput, memory
`accesses must alternate between the two memory banks.
`As one memory bank is accessed, the other memory bank
`is being addressed. However, if the user chooses not to
`interleave, the DSP32 automatically inserts a wait state
`when two consecutive accesses occur to the same memory
`bank. This flexible memory accessing makes the DSP32
`easier to program.
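The cost of not interleaving can be estimated with a very simple model. The sketch below assumes exactly one wait state whenever an access uses the same bank as the access before it; that rule is our reading of the paragraph above, and the function name is hypothetical.

```python
def count_wait_states(bank_sequence):
    """Count wait states for a sequence of memory accesses.

    Each entry is the bank (0 or 1) touched by one access. A wait state
    is assumed whenever an access hits the same bank as the previous one.
    """
    waits = 0
    for previous, current in zip(bank_sequence, bank_sequence[1:]):
        if previous == current:
            waits += 1
    return waits


interleaved = [0, 1, 0, 1, 0, 1]
not_interleaved = [0, 0, 0, 1, 1, 1]
print(count_wait_states(interleaved))       # 0
print(count_wait_states(not_interleaved))   # 4
```

Placing the two operands of a multiply in different banks, for example coefficients in one and data in the other, keeps such a count at zero.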
`Instructions can be stored anywhere in the
`address space (ROM, RAM or external memory) and can
`be executed from any memory without a speed penalty, if
`interleaving between the two memory banks is used. Each
instruction cycle consists of four machine states, and one
read or write operation can occur during each state.
Serial I/O. The SIO is used for serial-to-parallel conversion
of input data and parallel-to-serial conversion of
output data. Serial input and output operations are independent
and asynchronous with respect to each other and
to program execution. The SIO control signals allow direct
interface to a codec, a time-division multiplexed line, or
another DSP32.
Input to the SIO (Figure 1) is loaded into the input
shift register (isr), and then into the input buffer
(ibuf). SIO outputs are loaded into the output buffer
(obuf), and then put into the output shift register (osr).
This double buffering permits a second serial transmission
to begin before the first transmission has been processed.
Data widths can be 8, 16, or 32 bits. The I/O control
register ioc in the SIO is used to select various I/O
conditions, bit lengths, internal or external clocks, and
internal or external synchronization signals, thus allowing
flexibility when interfacing to external hardware.
`For example, in the active mode, the DSP32 gen-
`erates the I/O clocks and the synchronization signal for
`external hardware. In the passive mode, the DSP32 acts as
`a slave and the external hardware generates its clocks and
`synchronization signal.
Figure 1. Block diagram of the DSP32 digital signal processor.
The numbers in parentheses designate the size of a particular
register or bus. The symbols at the left represent pins on the
device package.

SIO transfers can occur under program control
or using direct memory access. To enable DMA, the user
sets the ioc register appropriately. If DMA is enabled, an
implicit DMA request to the DSP32 control occurs whenever
the input buffer is full or the output buffer is empty.
CAU registers r20 (PIN) and r21 (POUT),
which the user can set, serve as pointers for the DMA
transfers and are automatically incremented after each
DMA access. The serial I/O DMA allows data samples to
be transferred in and out of memory without interrupting
execution of the DSP32 program.
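The pointer behavior of serial-input DMA can be pictured with a toy model. The sketch below is an illustration under stated assumptions, not the DSP32 logic: it assumes the pointer advances by the sample width in bytes and stores samples in little-endian order, neither of which is specified here. The class and method names are hypothetical.

```python
class SerialInputDma:
    """Toy model of serial-input DMA; not the actual DSP32 logic."""

    def __init__(self, memory_size, pin, width_bytes=2):
        self.memory = bytearray(memory_size)
        self.pin = pin                  # models r20 (PIN)
        self.width_bytes = width_bytes  # 1, 2, or 4 bytes per sample

    def input_buffer_full(self, sample):
        """Called when a complete sample has arrived in ibuf."""
        # Byte order is an assumption made for this illustration.
        data = sample.to_bytes(self.width_bytes, "little", signed=True)
        self.memory[self.pin:self.pin + self.width_bytes] = data
        self.pin += self.width_bytes    # automatic post-increment


dma = SerialInputDma(memory_size=16, pin=4)
for s in (100, -2, 3000):
    dma.input_buffer_full(s)
print(dma.pin)                                                  # 10
print(int.from_bytes(dma.memory[6:8], "little", signed=True))   # -2
```

The running program never touches the samples as they arrive; it only needs to know where the pointer started.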
Parallel I/O. The PIO (Figure 1) is used for bidirectional
communication between the DSP32 and a
microprocessor over an 8-bit, external PIO data bus
(PDB).
As in the SIO, PIO transfers can be made under
program or DMA control. In either case, data is transferred
in both directions through the 16-bit, parallel data
register (pdr). In the DMA mode, the 16-bit parallel
address register (par) contains the DMA pointer. The
microprocessor initializes par, which can be incremented
automatically after each memory access.
The parallel I/O DMA allows a microprocessor to
download a program without interrupting execution of
another DSP32 program. (Each DMA operation steals one
instruction cycle.) The microprocessor can then alter a
branch address in the original program, which causes the
DSP32 to execute the new program. Thus, the DSP32 can
be dynamically programmed.
Program outputs can also occur through the parallel
interrupt register (pir), a register that an external
microprocessor can also read. When the DSP32 loads this
register, it also raises an interrupt flag that the microprocessor
recognizes. The microprocessor can then take
action.
`
The PIO can also be used to handle error conditions,
such as floating-point overflow or underflow in the
DAU, or loss of external synchronization signal. When an
error occurs, a bit is set in the 6-bit, error source register
(esr) that is readable by an external microprocessor. The
10-bit, error mask register (emr), which is written by the
microprocessor, conditions the DSP32 to ignore the error
or send an interrupt signal to the microprocessor. If the
latter happens, the DSP32 could also halt itself.
Control of parallel communications is set up in the
8-bit, parallel control register (pcr), which the external
microprocessor can also access. This allows the user, for
example, to start or restart the DSP32 (after a halt), or
enable or disable automatic refreshing of the RAM.
`
`Instruction Set
`The DSP32 supports a powerful set of instruc-
`tions for signal processing algorithms. These instructions
`are divided into two types: instructions that are executed
`primarily in the DAU, and those that are executed primar-
`ily in the CAU. After they are assembled, all instructions
`are 32 bits wide.
DAU Instructions. There are two groups of DAU
instructions. The multiply/accumulate instructions perform
operations on 32-bit floating-point signal processing data.
The special function instructions perform nonlinear operations,
such as rounding and data conversions between
floating-point and µ-law, A-law, or integer formats.
A multiply/accumulate instruction requires three
operands; the first two are multiplied, and the resulting
product is added to the third. The instruction specifies the
sources for these three operands (where the data resides).
One source must be an accumulator; the other two operands
can come from an accumulator, memory, or
I/O. The CAU generates the addresses for memory
operands.
Operands from memory or I/O are transferred
over the 32-bit data bus into the DAU. The result of the
multiply/add operation is always stored in an accumulator
and, as an option, can also be written to memory or I/O.
This transfer again occurs over the 32-bit data bus.
The formats for DAU multiply/accumulate instructions
are
[Z =] aN = [-]aM {+,-} Y * X
aN = [-]aM {+,-} (Z = Y) * X
`
In these instructions, aN and aM can be any of the four
accumulators (a0-a3). The X and Y operands are memory
locations, the serial I/O input buffer, or an accumulator.
The Z operand is either a memory location or the serial
I/O output buffer.
Also, the [ ] and { } are not part of the assembly
language syntax. Values enclosed in brackets [ ] are
optional, and one of the values enclosed in braces { } must
be used. For example, “Z =” is in brackets because the
result of a multiply/accumulate operation can optionally be
written to memory.
The second format is similar to the first, except
the Y operand is written to the location that Z specifies, in
addition to being operated on by the DAU. This is especially
useful for memory-to-memory moves for tap update
in finite-impulse-response filters.
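To see why a combined move and multiply helps with tap update, consider the following Python sketch of one FIR output computation. It is an illustration only: the data layout, the function name `fir_step`, and the oldest-to-newest loop order are our assumptions, chosen so that each sample is copied one place down the delay line in the same step that multiplies it.

```python
def fir_step(delay, coeffs, new_sample):
    """Compute one FIR output and shift the delay line in the same pass.

    delay[k] holds the input from k + 1 samples ago; it has the same
    length as coeffs and is updated in place.
    """
    n_taps = len(coeffs)
    acc = 0.0
    for k in range(n_taps - 2, -1, -1):     # oldest useful sample first
        y = delay[k]
        delay[k + 1] = y                    # the "(Z = Y)" part: move the sample
        acc += y * coeffs[k + 1]            # the multiply/accumulate part
    delay[0] = new_sample
    acc += new_sample * coeffs[0]
    return acc


coeffs = [0.5, 0.25, 0.125]
delay = [0.0] * len(coeffs)
outputs = [fir_step(delay, coeffs, x) for x in (1.0, 0.0, 0.0, 0.0)]
print(outputs)   # impulse response: [0.5, 0.25, 0.125, 0.0]
```

Without the combined form, the shift of the delay line would need a separate pass over memory for every output sample.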
The format for special function instructions is

[Z =] aN = function(Y)

Here, the specified function operates on the value Y, and
the result is placed in the accumulator aN. As an option,
the result can then be written to the location that Z
specifies.
`We will give examples of these formats shortly.
The DSP32 instruction syntax is similar to the C
programming language and makes the programmer’s code
almost self-documenting. An asterisk indicates multiplication,
but an asterisk preceding a pointer (e.g., *r7)
means the memory location pointed to by the pointer. An equal
sign is assignment, a plus sign means addition, and the
plus plus (++) that follows a pointer indicates post increment
(increment the pointer after reading from or writing
to memory). Here, pointers are always CAU registers.
Consider the DAU multiply/accumulate instruction
*r5++ = a1 = a0 + *r7 * *r10++r17
that reads from right to left as follows. Multiply the contents
of the memory locations pointed to by CAU registers
r10 and r7, and add the result to the contents of accumulator
a0. Then, store this result in accumulator a1 and
the memory location pointed to by CAU register r5. Also,
post increment the contents of register r10 using the
contents of register r17, and post increment the contents
of register r5 by one. (In reality, this is an increment by
one 32-bit word address, which is equivalent to an increment
by four byte addresses.)
Clearly, much computation is represented in a single
instruction.
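The effect of this one instruction can be spelled out step by step. The Python sketch below models only the data flow described above, on toy dictionaries standing in for memory, CAU registers, and accumulators. It ignores timing, number formats, and truncation, and all names in it are illustrative.

```python
def execute_example(mem, reg, acc):
    """Model  *r5++ = a1 = a0 + *r7 * *r10++r17  on toy state.

    mem maps byte addresses to floats, reg maps CAU register names to
    addresses or increments, acc maps accumulator names to floats.
    """
    x = mem[reg["r10"]]                  # *r10
    y = mem[reg["r7"]]                   # *r7
    acc["a1"] = acc["a0"] + y * x        # a1 = a0 + *r7 * *r10
    mem[reg["r5"]] = acc["a1"]           # *r5 = a1
    reg["r10"] += reg["r17"]             # post increment r10 by r17
    reg["r5"] += 4                       # post increment r5 by one 32-bit word


mem = {0: 0.0, 8: 3.0, 16: 2.0}
reg = {"r5": 0, "r7": 8, "r10": 16, "r17": 8}
acc = {"a0": 1.0, "a1": 0.0}
execute_example(mem, reg, acc)
print(acc["a1"], mem[0], reg["r10"], reg["r5"])   # 7.0 7.0 24 4
```

Written out this way, the single instruction corresponds to two memory reads, a multiply, an add, a memory write, and two pointer updates.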
An example of a DAU special function instruction is

a0 = float(ibuf)

In this example, the contents of the input buffer ibuf
(assume that the serial input word is 16 bits wide) are converted
to DSP32 32-bit, floating-point format, and the
result is stored in accumulator a0.
CAU Instructions. There are three groups of CAU
instructions: arithmetic and logic, data move, and control.
The CAU uses two’s-complement, 16-bit fixed-point
arithmetic.
Except for a register load from memory, execution
of a CAU instruction is completed before execution of
the next instruction begins. This feature greatly simplifies
using the CAU for logic and control operations.
Here are the formats for the three groups of CAU
instructions:

rD = rD op {rS, N}

{MEM, I/O} = rD
or
rD = {MEM, I/O}

if (COND) goto {rH, N, rH + N, rH - N}
`
`95
`
`CommScope, Inc.
`IPR2023-00066, Ex. 1012
`Page 7 of 16
`
`
`
`96
`
The first format represents the arithmetic and
logic group. These instructions do 16-bit fixed-point integer
arithmetic, such as addition and subtraction, or logic
operations, such as AND, OR, XOR, and shift. The
values of rD and rS are the contents of any of the 21 16-bit
CAU registers (r1-r21), and N is a 16-bit integer.
The second format represents the data move
group. These instructions transfer data between the CAU
registers and memory, or between the registers and serial
or parallel I/O. In this form of instruction, MEM is an 8- or
16-bit memory location that is specified by a direct address
or a CAU register. Data can also be moved to or from the
serial or parallel I/O.
The third format represents the control group,
whose instructions alter the program counter to change
the sequence of program execution. The DSP32 can
branch on many conditions (if ... goto), including
CAU, DAU, and I/O conditions. The control instructions
also include an unconditional branch (goto), and a call to
and return from a subroutine.
The values of rD, rS, and rH are the contents
of any of the 21 16-bit CAU registers. However, rH can
also be the contents of the program counter (pc). As
before, N is a 16-bit integer.
Examples of CAU instructions are:

r11 = r11 & r17
*r7++ = r10
if (eq) goto r3 + 12
The first instruction belongs to the arithmetic and
logic group. It does a logical AND of the contents of CAU
registers r17 and r11 and places the result back into
r11.
The second instruction belongs to the data move
group. It takes the contents of CAU register r10 and
writes it to the 16-bit memory location pointed to by CAU
register r7. Then, register r7 is post incremented so
that it points to the next 16-bit integer location in memory
(an increment of one 16-bit word, or two bytes).
The last instruction, an example of the control
group, illustrates a conditional branch. If the result of the
last CAU operation equaled zero, then go to the memory
location specified by the contents of CAU register r3
plus 12.
`
`Appendixes A and B present two examples of
`DSP32 programs.
`
`Development Support Tools
`As we mentioned, ease of use requires the availa-
`bility of a good set of tools that allow a user to translate
`algorithms into working applications.
The DSP32 software library, DSP32-SL, is written
in the C programming language and resides in a host
computer that runs the UNIX® operating system. Software
tools [4] include an assembler, linker/loader, and
simulator. The hardware development system, DSP32-DS [4],
allows real-time program debugging and in-circuit emulation.
Software Tools. The assembler translates the user’s
assembly language program into the binary code that the
DSP32 uses for instructions. A notable feature of this
assembler that distinguishes it from typical microprocessor
and other DSP assemblers is its use of the high-level
syntax just presented.
`The assembler generates relocatable code that
`another software tool, the linker/loader, can easily alter.
`The relocatable code can reside anywhere in the addressa-
`ble space and can be combined with code that was
`assembled separately.
`The simulator is a program that simulates the
`operations of the DSP32 program in a nonreal-time envi-
`ronment. For full program debugging, the simulator allows
`access to all registers and memories. It also provides an
`interface to the DSP32 hardware development system.
The simulator provides precise simulation down
to the timing of synchronous I/O. When running on the
AT&T 3B2/300 computer, it executes about 750 instructions
per second.
`
Figure 2. The DSP32 development environment. At the center is
the hardware development system, DSP32-DS. The assembler
and simulator are part of the software library, DSP32-SL.
`
`In the simulator, the user can freeze processing on
`many conditions, such as the execution of a specified
`instruction, access of a specified register or memory loca-
`tion, or occurrence of a specified number of I/O events.
`Input data can be supplied by a file, while output data can
`be captured in a file. The user can refer to memory loca-
`tions by their symbolic names rather than their absolute
`addresses. In addition, the user can define complex com-
`mand sequences and display formats, and invoke them
`with a simple command.
Hardware Tools. The DSP32 hardware development
system consists of a small circuit board that contains a
DSP32, 14k words of external memory, a control microprocessor,
and an RS232C interface. It can be used stand-alone
or as an in-circuit emulator to the user’s system.
`The DSP32-DS can run DSP32 programs in an
`environment where its input and output are connected to
`the user’s DSP32 system. The development system also
`retains many of the simulator debugging features.
`Figure 2 depicts a typical DSP32 software and
`hardware development environment. Normally, the DSP32-
`DS is connected to a standard computer terminal and a
`host computer that runs the DSP32 simulator program.
With a special simulator command, the user’s
application program can be loaded and run using the development
system, instead of the simulator. The user can
switch back and forth between the simulator and the development
system to compare results.
`For multiple DSP32 applications, up to seven
`DSP32-DSs can be connected to a single computer and ter-
`minal system. The user can selectively communicate with
`each system or the simulator program. One application for
`the multiple DSP32-DS configuration is the use of one or
`more DSP32-DSs as digital signal generators or signal ana-
`lyzers for testing a program in another DSP32-DS.
`With the appropriate application software, a
`DSP32-DS can become a powerful digital signal processing
`workstation that consists of a flexible set of programmable
`function generators, filters, power meters, detectors, and
`spectrum analyzers.
`
`Conclusion
We have presented the DSP32, a 32-bit, floating-point,
programmable, digital signal processor, and its
support tools for developing applications.
The high performance of the device, coupled with
its ease of use, makes it suitable for a variety of advanced
signal processing applications in telecommunications,
speech, imaging, and graphics. The device can also be
used as a 16-bit fixed-point microcomputer.
The flexible on-chip memory and I/O (the latter
with DMA capability on both serial and parallel interfaces)
provide the ability to program the DSP32
dynamically and interact easily with the user’s system.
`
`Acknowledgments
`We gratefully acknowledge the contributions of
`the following people to the design, testing, and fabrication
`of the device and the development of its software and
hardware support tools: L. E. Bays, G. G. Burgstresser,
R. J. Canniff, J. D. Cuthbert, C-E Chen, D. B. Cuttriss, J.
C. Desko, B. J. Dutt, D. J. Emrich, C. J. Faust, E. M.
Fields, R. L. Freyman, J. A. Grant, C. J. Garen, G. E.
Hall, D. J. Hart, J. Hartung, J. W. Hendricks, T. D. Hopmann,
W. C. Ingram, P. W. Kempsey, J. J. Klinikowski, D.
E Koehler, R. D. Lippincott, D. A. McGillis, C. R. Miller,
K. Mondal, H. O. Morris, H. S. Moscovitz, H. N. Nham,
G. D. O’Donnell, A. A. Pignone, Y. Rotblum, M. A.
Steele, P. A. Stiling, E Subramaniam, W. A. Stocker, T. G.
Szymanski, L. V. Tran, C. J. Van Wyk, A. J. White, D. Willauer,
W. Witscher, Jr., D. S. Yaney, and E. J. Yurek.
`
`References
1. R. N. Kershaw, et al., “A Programmable Digital Signal Processor
with 32b Floating Point Arithmetic,” Digest of Technical Papers,
1985 ISSCC, February 13-15, 1985, pp. 92-93.
2. W. P. Hays, et al., “A 32-bit VLSI Digital Signal Processor,” IEEE
Journal of Solid-State Circuits, Vol. SC-20, No. 5, October 1985,
pp. 998-1004.
3. Digital Signal Processor, The Bell System Technical Journal, Vol.
60, No. 7, Part 2, September 1981.
4. WE® DSP32 Digital Signal Processor-Information Manual, AT&T
Technologies, Inc., 1986.
5. L. R. Rabiner and B. Gold, Theory and Application of Digital Signal
Processing, Prentice-Hall, Englewood Cliffs, N.J., 1975, p.
367.
6. G. Szentirmai, “FILSYN: A General Purpose Filter Synthesis Program,”
Proceedings of the IEEE, Vol. 65, October 1977, pp.
1443-1458.
`
`98
`
`Appendix A. IIR Filter Program
`This example shows how to implement an infinite-
`impulse-response (IIR) filter. It assumes that the DSP32
`digital signal processor is connected to a 16-bit linear
`codec and uses four-coefficient, second-order, IIR sections
`connected in cascade. With the DSP32 floating-point capa-
`bility, there usually is no need to consider pole-zero
`pairing, ordering of sections, or gain distribution associ-
`ated with a fixed-point arithmetic implementation (which
`would also require five-multiply sections).
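Before turning to the assembly listing, it may help to see the same filter structure in a high-level form. The Python sketch below is not a translation of the DSP32 program. It assumes each section has the transfer function (1 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2), which needs four multiplies, and the sign convention and all names are our own assumptions.

```python
def iir_section(x, state, a1, a2, b1, b2):
    """One four-multiply, second-order section in direct form.

    state is a two-element list [w1, w2] holding the previous two
    internal values; it is updated in place.
    """
    w = x - a1 * state[0] - a2 * state[1]      # recursive (denominator) part
    y = w + b1 * state[0] + b2 * state[1]      # nonrecursive (numerator) part
    state[1] = state[0]
    state[0] = w
    return y


def iir_cascade(x, sections, states):
    """Pass one sample through all sections in turn."""
    for coeffs, state in zip(sections, states):
        x = iir_section(x, state, *coeffs)
    return x


sections = [(-0.5, 0.0, 0.0, 0.0)]          # single pole at z = 0.5
states = [[0.0, 0.0] for _ in sections]
impulse = [iir_cascade(s, sections, states) for s in (1.0, 0.0, 0.0, 0.0)]
print(impulse)   # [1.0, 0.5, 0.25, 0.125]
```

Because each section's numerator starts with a coefficient of 1, any overall gain would have to be applied separately, which is straightforward in floating point.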
The line numbers in the text refer to the program
listing (Example 1), where comments and the #define
feature of the C preprocessor (lines 1 to 8) are used to
enhance program readability. Line 9, an assembler directive,
defines global variables that the DSP32 simulator and
development system can reference as symbols.
Line 10 programs the serial input/output for 16-bit
transfers and uses the DSP32 on-chip clocks to drive an
external 8-kHz sampling rate, analog-to-digital and
digital-to-analog converter chip.
Line 11 shows a wait for the arrival of the serial
input sample, indicated by a full input buffer. If ibe (input
buffer empty) is true, the program will stay in the loop,
lines 11 and 12. (All DSP32 branching instructions are
delayed branches; i.e., the instruction that follows any
branch instruction will always be executed before the
branch occurs.) Line 12 sets up register r10 as a loop
counter for use in line 21.
Reading of ibuf, line 13, will empty the input
buffer. Lines 18, 19, 20, and 22 do the four multiply and
accumulate operations for the direct-form implementation
of a second-order IIR section (Figure A).
Execution of a DSP32 instruction proceeds from
right to left. For example, line 19 first forms the product of
A_i2 (the coefficient of the denominator z^-2 term) with the
appropriate state variable (S_i2 in Figure A) and decrements
the state variable pointer. Next, it forms a sum in a0,
stores the sum as an updated state variable (pointed to by
r13), and then increments r13.
`The updated state variable value will be used in
`
[Figure A labels: output from (i-1)th section; input to (i+1)th section]
`the next s