P1: KCU
`
`Journal of VLSI Signal Processing
`
`KL430-05-Trainor
`
`April 9, 1997
`
`12:28
`
`Journal of VLSI Signal Processing 16, 41–55 (1997)
© 1997 Kluwer Academic Publishers. Manufactured in The Netherlands.
`
`Architectural Synthesis of Digital Signal Processing Algorithms Using “IRIS”
`
`D.W. TRAINOR, R.F. WOODS AND J.V. McCANNY
`Department of Electrical & Electronic Engineering, The Queen’s University of Belfast, Ashby Building,
`Stranmillis Road, Belfast BT9 5AH, Northern Ireland
`
`Received March 4, 1996; Revised August 14, 1996
`
`Abstract.
`In this paper, we present the IRIS architectural synthesis system for high-performance digital signal
`processing. This tool allows non-specialists to automatically derive VLSI circuit architectures from high-level,
`algorithmic representations, and provides a quick route to silicon implementation. By incorporating a novel synthesis
`methodology, called the Modular Design Procedure, within the IRIS system, parameterised models of complex and
`innovative DSP hardware can be derived and automatically assembled to create new DSP systems. The nature
`of this synthesis methodology is such that designers can explore a large range of architectural alternatives, whilst
`considering all the architectural implications of using specific hardware to realise the circuit. The applicability
`of IRIS is demonstrated using the design examples of a second order Infinite Impulse Response filter and a one-
`dimensional Discrete Cosine Transform circuit.
`
1. Introduction
`
`In recent years, considerable research has been car-
`ried out into the design of novel VLSI architectures for
`digital signal processing (DSP) applications [1]. The
`highly structured nature of many DSP functions makes
`it possible to derive regular circuit architectures, such
`as systolic arrays, that are ideal for VLSI implementa-
`tion. Exploitation of algorithm parallelism, and using
`techniques such as pipelining, can tailor architectures
`to particular sampling rate, area and power require-
`ments.
`In the field of electronic systems design generally,
`and DSP design in particular, systems are becoming
`increasingly complex, performance specifications are
becoming more stringent, and time-to-market pressures
`are shortening design cycles.
`In response, attention
`is now being directed at employing design automa-
`tion to carry out more abstract, system-level design
`tasks, by using CAD tools to automatically derive cir-
`cuits from algorithmic descriptions and required per-
`formance specifications. Since this process involves
`translating a required behaviour into an equivalent cir-
`cuit structure, it is referred to as architectural synthesis.
`
`The purpose of this paper is to describe the IRIS ar-
`chitectural synthesis tool for high performance DSP.
`Unlike other approaches [1–7], the emphasis in IRIS is
`to allow architectural exploration of algorithms. The
`process involves using processing unit models to gen-
`erate optimal architectures based on those units. The
`main advantage of IRIS is that it offers the designer
`full freedom to investigate a wide range of architec-
`tures using user-preferred blocks or novel processing
`techniques. It is also tightly coupled to conventional
`VHDL synthesis tools, and has been used to produce
`practical and realistic designs.
`The remaining text of this paper describes the syn-
`thesis methodology and the operation of the IRIS ar-
`chitectural synthesis tool. In Section 2, an overview of
`commonly applied architectural synthesis techniques is
`given, together with some important limitations exhib-
`ited by these techniques when applied to the synthesis
`of novel, high performance DSP circuits. This is illus-
`trated using an IIR filter example in Section 3. A solu-
`tion to these difficulties is also presented in Section 3,
`which introduces a radical new synthesis methodol-
`ogy, the Modular Design Procedure, and discusses its
`automation within IRIS. The capabilities of the IRIS
`
`Magna 2034
`TRW v. Magna
`IPR2015-00436
`
system are discussed in Section 4, and demonstrated
throughout Sections 3 and 5, by carrying out
`novel implementations of second order recursive filter-
`ing and Discrete Cosine Transform algorithms. Finally,
`Section 6 offers some conclusions that can be drawn
`from the results of this research.
`
`2. Architectural Synthesis Methodologies
`
Considerable effort has been focused on deriving methods
for mapping DSP algorithms onto VLSI architectures,
including techniques based on sets of Recurrence
Equations [2], dependence graphs [1] and algebraic
methods [3]. Whilst many of these specialist tools
produce architectural representations from high levels
of abstraction, they are unable to produce
functionally-correct, implementable circuits, as issues
such as internal word growth, truncation and data
organisation have not been considered. These issues
affect the latency and timing of data and thereby
invalidate any derived architecture. It is left to the IC
designer to modify and refine the initial architectures,
taking into consideration these design details. The
`second order Infinite Impulse Response filter example
`in Section 3 demonstrates the radical differences that
`implementation-level performance criteria can create
`between abstract algorithmic representations and a cor-
`responding physical circuit.
Synthesis systems, such as CATHEDRAL, PHIDEO, HYPER
and MARS [4–7], can take algorithmic descriptions and
apply scheduling, assignment and hardware mapping
techniques to synthesize an architecture. With these
systems, hardware mapping, the
`process that maps a flow graph onto the available hard-
`ware blocks, is carried out after the various scheduling
`and assignment procedures. The assumption is there-
`fore made that hardware mapping does not alter the
`structure or functionality of the design. For the hard-
`ware units generally used in reported design examples
`synthesized by these tools, this may be a valid assump-
`tion. However, it has been demonstrated [8] that if
`the hardware units are complex pipelined processors,
`hardware mapping can invalidate the architecture. This
`needs to be resolved if the circuit is to operate correctly
`once implemented using the chosen hardware, and has
`been shown to be a complex task [8].
`A vital issue relating to the various synthesis method-
`ologies is the manner in which the design space may be
`explored. Most of the current architectural synthesis
`tools apply a fixed set of synthesis procedures, regard-
`
`less of the different properties of the circuit in question.
`Hence, with these tools, a number of algorithmic trans-
`formation methods are derived and applied uniformly
`to all problems. This approach may not be appropriate
`for many designers, who explore design trade-offs in
`different ways, depending on the specifications and de-
`sign properties for the particular problem currently un-
`der consideration. For these designers, a tool based on
`fixed synthesis principles is undesirable, since what is
`required is a system that allows the designer to explore
`and trade-off different design issues in a completely
`open manner.
`We believe that there is considerable scope for a de-
`sign methodology that allows designers to easily imple-
`ment and optimise architectures based on processing
`elements, and hence specific hardware, of their own
`choosing. Designers could use favourite processing
`units (which allows the re-use of previously optimised
`circuit layouts) or clever novel technologies, such as
`redundant arithmetic processors [9], which can offer
`clear advantages for some applications. Therefore, the
`IRIS synthesis system has been developed, which en-
`ables the extraction of parameterised expressions from
`complex VLSI processing elements, and uses these ex-
`pressions to achieve functionally-correct solutions for
`circuits built from these processors. Designers can
`quickly create and evaluate architectures that utilise
`existing hardware blocks and can be easily realised us-
`ing commercial silicon design tools. Also, designers
`can take advantage of novel circuit designs, gaining ar-
`chitectural knowledge as well as synthesis capability.
`
`3. Architectural Synthesis Using the IRIS System
`
`The design input of the IRIS system is a Signal Flow
`Graph (SFG) representation of the algorithm, consist-
`ing of zero-delay processing nodes connected by edges
`which may be weighted with an appropriate number of
`delays. For example, consider the algorithm carried
`out by a second order Infinite Impulse Response filter,
`which is defined by Eq. (1).
$$y_n = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} + b_1 y_{n-1} + b_2 y_{n-2} \tag{1}$$
`
Figure 1 shows a SFG that is functionally equivalent
to the algorithm of Eq. (1). Each processing node
represents a Multiply/Accumulate (MAC) operation, and
each black circle on a graph edge represents a
single delay, which is necessary to compute the
algorithm.
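
As a point of reference, the behaviour that any architecture derived from the SFG must preserve can be sketched directly from Eq. (1). The following Python fragment is an illustrative software model only; the function name `iir2` and the assumption of zero initial conditions are choices made for this sketch and are not part of IRIS.

```python
def iir2(x, a, b):
    """Evaluate y[n] = a0*x[n] + a1*x[n-1] + a2*x[n-2] + b1*y[n-1] + b2*y[n-2].

    x is the list of input samples, a = (a0, a1, a2) and b = (b1, b2).
    Samples before n = 0 are taken to be zero (an assumption of this sketch).
    """
    a0, a1, a2 = a
    b1, b2 = b
    y = []
    for n in range(len(x)):
        # Delayed inputs and outputs correspond to the delay elements of the SFG.
        x1 = x[n - 1] if n >= 1 else 0
        x2 = x[n - 2] if n >= 2 else 0
        y1 = y[n - 1] if n >= 1 else 0
        y2 = y[n - 2] if n >= 2 else 0
        y.append(a0 * x[n] + a1 * x1 + a2 * x2 + b1 * y1 + b2 * y2)
    return y


# Impulse response of a filter with a single feedback coefficient b1 = 0.5.
impulse_response = iir2([1, 0, 0, 0], (1, 0, 0), (0.5, 0))
```

Each multiply and add pair in the loop body corresponds to one zero-delay MAC node; a synthesized circuit is correct only if it reproduces this input/output relationship once processor latencies are taken into account.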
`
`0002
`
`

`
`
`
`P1: KCU
`Journal of VLSI Signal Processing
`
`KL430-05-Trainor
`
`April 9, 1997
`
`12:28
`
`Architectural Synthesis
`
`43
`
`issues, using the second order IIR filter as a design
`example.
`
`3.1. Derivation of IRIS Processor Models
`
`In order to utilise complex processors as blocks that can
`be used to synthesize DSP systems in IRIS, models of
`the various processors need to be derived and placed in
`a library within the tool. It is these models that replace
`the mathematical operations of the SFG, and it is the
`information contained within the models that is used by
`IRIS to determine the data timing changes caused by
`replacing the zero-delay SFG operation with complex,
`pipelined hardware.
`The processor models abstract much of the structural
`detail of the processor architecture, but retain enough
`performance-related information so that the effects of
`placing the models into the SFG can be determined.
`When a processor model is derived, the MDP demands
`that two processor performance measurements must be
`associated with the model. The first of these is the data
`format, or “time shape” of data entering or leaving each
`input or output of the processor. The time shape may
`be defined as the position in time of the bits or digits
`of the data value relative to each other [8]. Figure 2
`shows some examples of typical data time shapes.
`The detailed structure of the particular processor,
`and particularly the placement of internal pipelining
`latches, will determine the data time shape at each pro-
`cessor input and output.
`It is necessary to maintain
`information on the processor time shapes to determine
`what extra circuitry, if any, needs to be placed between
`connected processors to convert between the format of
`the data at the output of the first processor, and the
`format expected at the input of the second.
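
The conversion circuitry described above can be illustrated with a small sketch. Here a time shape is represented as a list giving, for each bit or digit, the clock cycle at which it appears relative to the others. This list representation and the function name `conversion_delays` are assumptions made for illustration; IRIS's internal representation is not described in the text.

```python
def conversion_delays(out_shape, in_shape):
    """Return the number of latches needed on each bit line to convert
    data leaving one processor (out_shape) into the time shape expected
    by the next processor (in_shape).

    Both shapes list the relative clock cycle of each bit, in the same
    bit order. Only relative timing matters, so a common shift is chosen
    that makes the smallest per-bit delay equal to zero.
    """
    if len(out_shape) != len(in_shape):
        raise ValueError("time shapes must describe the same number of bits")
    # Smallest common shift that keeps every per-bit delay non-negative.
    shift = max(o - i for o, i in zip(out_shape, in_shape))
    return [i - o + shift for o, i in zip(out_shape, in_shape)]


# A skewed 4-bit output (one bit per cycle) feeding a bit-parallel input.
deskew = conversion_delays([0, 1, 2, 3], [0, 0, 0, 0])
```

In this representation the common shift is the extra latency that the conversion adds to the datapath, and the sum of the per-bit delays is the number of latches in the conversion circuit.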
`The second processor performance issue that must
`be addressed by the equivalent IRIS processor model
`is the latency through each datapath in the processor.
`These latency values must be incorporated in the pro-
`cessor model since the number of clock cycles required
`for a particular processor to produce its results has pro-
`found effects on the timing of data throughout the entire
`circuit.
`An important design feature of the IRIS system is
`that the information on data time shapes and data-
`path latencies within the processor model can be pa-
`rameterised in terms of other important processor de-
`sign criteria, such as the data wordlengths used and
`the number of internal pipelining stages. Therefore,
`design changes, such as changes in data wordlengths,
`
`Figure 1. SFG for second order IIR filter.
`
The synthesis technique involves replacing the
generic processing nodes of the SFG with models of
heavily-parameterised pipelined processing units from
a library and applying a methodology called the
Modular Design Procedure (MDP) [8]. Unlike other systems
[4–7], within IRIS the processor parameters exactly
model the performance of the individual processor.
The process of synthesis thus involves replacing the
zero-delay nodes in the original signal flow graph by the
model of the practical processor. The original Signal
Flow Graph with zero-delay operators defines the
algorithmic functionality, but this is changed when
practical processors with different latencies are
inserted. If the timing effects of using practical
processors are not addressed, then the resulting
architecture will implement a different algorithm.
IRIS addresses this problem by retiming the circuit
to preserve the original algorithm, but in such a
way as to minimise the number of delays that are added.
`This is achieved by calculating the maximum num-
`ber of delays within all the loops in the Signal Flow
`Graph and then using retiming [10] to ensure the global
`timing in the synthesized architecture is equivalent to
`that in the original SFG. The circuit will then have
`a maximum possible sampling rate.
`If this maxi-
`mum possible sampling rate meets the specified sam-
`pling rate, then the design has been successful and
`the user has achieved an optimised solution for the
`hardware used.
`If the maximum possible sampling
`rate is in excess of the sampling rate required, then
`the user can investigate alternative designs using dif-
`ferent, namely slower and more hardware efficient,
`blocks or using hardware sharing, to achieve a more ef-
`ficient solution. Subsections 3.1 and 3.2 discuss these
`
`0003
`
`

`
` P1: KCU
`Journal of VLSI Signal Processing
`
`KL430-05-Trainor
`
`April 9, 1997
`
`12:28
`
`44
`
`Trainor, Woods and McCanny
`
`Figure 2. Typical data time shapes.
`
`can be easily taken into account. This parameterisa-
`tion is also complementary to the recent trend of the
`production, and purchase, of libraries of parameterised
`“mega-functions” [11], which are usually written in a
`Hardware Description Language. These libraries can
`increase design re-use and reduce time to market, by
allowing systems to be constructed by connecting
complex re-usable blocks of hardware.
`Whilst a library of processor elements is available
`to the user, a processor interface capability is available
`to allow the incorporation of new processing blocks
`within IRIS. These blocks could be application-speci-
`fic components from commercial vendors, allowing a
`tight coupling between IRIS and conventional compil-
`ers. Alternatively, these processors could use novel
`processing techniques, such as redundant arithmetic
`[9], which would be difficult to capture using conven-
`tional synthesis tools.
`
`3.1.1. Derivation of IRIS Processor Models for the
`Second Order IIR Filter. In order to demonstrate the
`construction of IRIS processor models, consider the
`implementation of the second order IIR filter SFG
`shown in Fig. 1. This design requires the insertion of
`blocks of hardware, capable of the MAC operation,
`into the processing nodes of the SFG. Examples of
`high-performance modules, structurally defined at the
`bit level, that could be used to realise the filter circuit
`are the Carry-Save and Signed Binary Number Repre-
`sentation (SBNR) MAC processors [9] shown in Figs. 3
`and 4.
`
In Fig. 3, the two signals p and q are multiplied, and
the result added to the signal s. The labels pi, qi and si,
`refer to data entering the processor, whilst po, qo and
`so refer to output data. The superscript of each label
`refers to the significance of the data bit to which that
`
`Figure 3. Structure of carry-save MAC processor.
`
`0004
`
`

`
` P1: KCU
`
`Journal of VLSI Signal Processing
`
`KL430-05-Trainor
`
`April 9, 1997
`
`12:28
`
`Architectural Synthesis
`
`45
`
`depicted by black dots in Fig. 3. These latches define
`the timing of the various operations within the MAC
`processor. An important point to notice is the extra
`latches that are added to the datapath of the s signal
`at the right hand side of the structure. These latches
`ensure that each bit of the input si travels through the
`same number of pipeline stages before emerging at so,
`therefore the relative timing of the various bits of data
`entering si is maintained at so. This allows several
`MAC processors to be cascaded without having to con-
`vert the data timing from the output of one processor
`to the input of the next. This arrangement is useful for
`several important DSP circuits, particularly digital fil-
`ters, which can be easily implemented using cascaded
`MAC processors.
`Figure 1 shows that the IIR filter SFG exhibits re-
`cursion, where previously generated results form the
`inputs for future iterations of the algorithm. Previous
`research has shown that for such structures, processors
`that exhibit low, wordlength-independent latency and
`use redundant number systems to produce results most
`significant digit first can produce more efficient imple-
`mentations of the feedback paths [9]. The SBNR MAC
`processor shown in Fig. 4 represents such a structure,
`multiplying the pi signal and the qi signal (which is rep-
`resented by the dotted lines in the figure) and adding
`the si signal. The use of SBNR allows the removal
`of carry propagation chains within the processor and
`hence the production of results in a most significant
`digit first format.
`Examples of IRIS processor models are shown in
`Figs. 5 and 6. These models have been derived from
`the MAC structures of Figs. 3 and 4, and will be used
`
`Figure 4. Structure of SBNR MAC processor.
`
superscript is attached, e.g., if a bit is identified with
a superscript “1”, that bit is of significance $2^{-1}$, i.e.,
0.5. The subscript of each label indicates the relative
clock cycle at which the bit signal enters or leaves the
processor, e.g., a bit labelled “n − 1” enters or leaves
the processor one clock cycle after a bit labelled “n”.
`This notation defines the timing of the various bits of
`data as they pass through the processor.
`The scheduling of the various operations in the
`Carry-Save MAC processor results in a number of
`pipeline stages, implemented by the addition of latches,
`
`Figure 5. Carry-save MAC model.
`
`0005
`
`

`
` P1: KCU
`
`Journal of VLSI Signal Processing
`
`KL430-05-Trainor
`
`April 9, 1997
`
`12:28
`
`46
`
`Trainor, Woods and McCanny
`
`is applied. The reason for this is that, under truncation,
`a number of bits from data emerging from the output
`of a processor may not be useful and, if the data time
`shape is of a skewed format, extra clock cycles may
`have to be applied before the first useable bit emerges.
`These extra cycles are reflected in the truncation term,
the value of which depends on the output time shape
and on the output bit at which truncation is applied.
`The derivation of the two IRIS MAC processor mod-
`els gives a practical demonstration of the performance
discrepancies that exist between the zero-delay,
abstract MAC operations shown in the SFG of Fig. 1, and
`actual processing hardware, capable of performing the
`MAC operation. The physical MAC processors exhibit
`non-zero latency, expect data in specific formats, and
`exhibit performance characteristics that depend upon
`design parameters such as data wordlengths and lev-
`els of pipelining. At the SFG level of abstraction, no
`such issues have been considered, hence the need to
`translate the SFG into a practical circuit based on spe-
`cific hardware. This highlights the problems associated
`with many high-level design techniques [1–3], which
`derive SFG-like representations from algorithms, but
`leave the designer with the problem of refining the cir-
`cuit, based on the hardware chosen.
`
`3.2.
`
`IRIS Retiming Procedure
`
`As previously stated, once IRIS processor models have
`been placed into the SFG nodes, a rescheduling or re-
`timing of the circuit is now required, due to the changes
`in data timing. This procedure consists of two stages,
namely delay scaling and retiming. The IRIS scaling
process determines the minimum allowable sampling
period and retimes the SFG taking this value into
account. The minimum allowable sampling period, also
referred to as the pipelining period [12], is defined in
Eq. (2).
`
$$\alpha = \max_{\forall C}\left[\frac{T_C}{D_C}\right] \tag{2}$$
`
In Eq. (2), $T_C$ is the total number of clock cycles
taken by the processors in a loop C of the SFG, and
$D_C$ is the total number of delay elements in the same
loop. In order to preserve algorithmic functionality, the
sampling rate of the architecture must be synchronized
to the rate of the slowest loop. In order to determine
$T_C$, and hence $\alpha_C$, for each loop, IRIS must calculate
the latency of each processor in loop C, which means it
must be aware of performance characteristics of each
`
`Figure 6. SBNR MAC model.
`
`to demonstrate the IRIS synthesis and retiming proce-
`dures, by producing a practical IIR filter architecture
`from the SFG of Fig. 1. Whilst these models have been
`manually derived from the structures shown in Figs. 3
`and 4, schemes by which processor models may be
`automatically generated from suitably structured HDL
`descriptions are currently being developed.
`The graphical representations of the processor mod-
`els in Figs. 5 and 6 demonstrate the parameterisation of
`the important performance characteristics of the pro-
`cessors. At each input and output of the block, a rep-
`resentation of the data time shape is displayed, and
`through each datapath, a parameterised expression for
`the latency is shown. The parameters n and m represent
`the wordlengths of the p and q data, whilst the vari-
`able x represents the number of rows of cells between
`successive pipeline stages in the processor. When de-
`termining the latency expressions, the smallest integer
`greater than or equal to the ratios specified is used. This
is designated by the use of the symbol ⌈a⌉ in Figs. 5
and 6, which is defined as the smallest integer greater
than or equal to “a”. An important issue relating to the
`Carry-Save model is that it is based on a slightly mod-
`ified version of the architecture shown in Fig. 3, which
`extends the functionality of the processor to utilise twos
`complement arithmetic [8]. The effect of these modi-
`fications on the processor model is an increase in the
`latency of the S datapath by one clock cycle.
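
To make the idea of a parameterised latency expression concrete, the following sketch models one datapath of a processor. The expression `ceil(n / x)` is a hypothetical placeholder, chosen only because it uses the parameters and the ceiling operation named above; the actual expressions are those given in Figs. 5 and 6 and are not reproduced here. The class name `MacModel` and its fields are likewise assumptions of this sketch.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class MacModel:
    """Hypothetical processor model holding one parameterised latency."""

    name: str
    extra_cycles: int = 0  # e.g. an additional cycle on the S datapath

    def s_latency(self, n: int, x: int) -> int:
        """Latency in clock cycles for wordlength n with a pipeline
        stage every x rows of cells.

        Placeholder expression only, NOT the one shown in Figs. 5 and 6.
        """
        if n <= 0 or x <= 0:
            raise ValueError("n and x must be positive")
        return ceil(n / x) + self.extra_cycles


plain = MacModel("carry_save")
twos_complement = MacModel("carry_save_tc", extra_cycles=1)

# Changing a design parameter re-evaluates the model without re-deriving it.
latencies = [plain.s_latency(12, 4), twos_complement.s_latency(12, 4)]
```

Because the latency is held as an expression rather than a fixed number, a change in wordlength or pipelining level only requires the model to be re-evaluated, which is the property the retiming step relies on.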
`In addition to wordlength parameters, some of the la-
`tency values of the models depend on an additive trun-
`cation term, t, to reflect the fact that the latency through
`the datapath can be increased if numerical truncation
`
`0006
`
`

`
` P1: KCU
`
`Journal of VLSI Signal Processing
`
`KL430-05-Trainor
`
`April 9, 1997
`
`12:28
`
`Architectural Synthesis
`
`47
`
`parameterised processing block. When the pipelining
`period has been determined, all delay elements within
`the SFG are scaled up by this value which is equivalent
`to adjusting the rate of the entire structure to its slowest
`loop.
`After delay scaling, the retiming procedure needs
`to be applied, in order to compensate for the different
`performance characteristics between the generic SFG
`processors and the practical models, as demonstrated
`in Section 3.1. In addition to the processor latencies,
`the retiming problem must model the effects on timing
`of truncation and any required data organisation con-
`version circuitry (the nature of which is determined by
`IRIS before retiming occurs). In the retiming proce-
`dure, the latency of each datapath through a processor
`is realised by removing the appropriate number of de-
`lays from the edges of the scaled SFG and incorporating
`them within the processor (referred to as embedding the
`processor block). Latches are transferred as required
`to ensure that embedding occurs correctly, whilst min-
`imising the total number of latches used. This has been
`achieved by formulating the retiming routine as a linear
`programming problem [10].
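
The embedding condition can be sketched as a set of difference constraints. Note that this is a simplification of the procedure described above: the text formulates retiming as a linear program that also minimises the total number of latches, whereas the sketch below uses the Bellman-Ford algorithm and only finds some feasible retiming. The graph encoding, the function name `retime` and the example values are assumptions made for illustration.

```python
def retime(latency, edges):
    """Find a feasible retiming of a delay-scaled SFG.

    latency maps each node to its processor latency in clock cycles.
    edges is a list of (u, v, w): an edge from u to v carrying w delays.
    A retiming r moves delays so that edge (u, v) carries
    w + r[v] - r[u] delays. Embedding requires this to be at least
    latency[u], i.e. r[u] - r[v] <= w - latency[u].
    Returns the retimed delay count for each edge (u, v).
    """
    r = {node: 0 for node in latency}
    # Bellman-Ford relaxation over the difference constraints.
    for _ in range(len(latency) + 1):
        changed = False
        for u, v, w in edges:
            bound = w - latency[u]
            if r[v] + bound < r[u]:
                r[u] = r[v] + bound
                changed = True
        if not changed:
            return {(u, v): w + r[v] - r[u] for u, v, w in edges}
    raise ValueError("infeasible: a loop has fewer delays than its total latency")


# Two single-cycle processors in a loop that holds two delays after scaling.
latency = {"A": 1, "B": 1}
edges = [("A", "B", 0), ("B", "A", 2)]
retimed = retime(latency, edges)
```

In the example, one of the two delays on the feedback edge is moved to the edge leaving `A`, so that each processor finds the delay it needs on its output. The total number of delays around the loop is unchanged, which is why the original algorithm is preserved.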
`
`3.2.1. Retiming of the IIR Filter Example. For the
`purposes of the IIR filter example, assume that the
`circuit
`is to be implemented using twos comple-
`ment Carry-Save MAC processors for the three non-
`recursive SFG nodes, and SBNR MAC processors for
`the two recursive nodes, since the SBNR modules gen-
`erally have low latency, which leads to smaller values
of α in Eq. (2), and hence more efficient recursive
`circuits. The Carry-Save processors are employed in
`the non-recursive section because they occupy less area
`than equivalent SBNR processors.
`Parameter values for the models shown in Figs. 5
`and 6 need to be supplied to calculate specific latency
`values and data time shapes for the filter design.
`In
`this example, data wordlengths of 12 bits/digits and
`pipelining stages placed every 4 rows of MAC cells are
`assumed. Placing these values into the parameterised
`expressions of the models shown in Figs. 5 and 6, and
`assuming no extra latency caused by data truncation,
`gives the particular models displayed in Figs. 7 and 8.
`The SFG of Fig. 1 exhibits two recursive loops,
`highlighted in Fig. 9. Using the latency expressions of
`the models shown in Figs. 7 and 8, evaluating Eq. (2)
`gives the result displayed in Eq. (3).
$$\begin{aligned}
\alpha_1 &= (2 \div 1) = 2 && \text{(loop 1)} \\
\alpha_2 &= ((2 + 2) \div 2) = 2 && \text{(loop 2)}
\end{aligned} \tag{3}$$
`
`Figure 7. Carry-save MAC with resolved parameters.
`
`Equation (3) shows that both loops in the IIR fil-
`ter imply a pipelining period of 2 cycles, and hence all
`delay elements in the filter SFG must be scaled by a fac-
`tor of 2 before any further retiming takes place. This
`effectively synchronises the circuit to the rate of the
`slowest loop, and determines that the filter circuit will
`be 50% efficient, i.e., new iterations of the algorithm
`can commence every 2nd clock cycle. It is important
`to note that delay scaling is only applied to delay ele-
`ments present in the original SFG, and not to registers
`introduced by the subsequent retiming process.
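
The delay scaling step can be sketched in a few lines. The loop values below are those quoted in Eq. (3); the function names, the dictionary encoding of SFG edges, and the rounding up of non-integer ratios to a whole number of clock cycles are assumptions of this sketch.

```python
def pipelining_period(loops):
    """Return the pipelining period for a list of (T_C, D_C) pairs,
    where T_C is the processor latency around loop C in clock cycles
    and D_C is the number of delay elements in that loop."""
    periods = []
    for t_c, d_c in loops:
        if d_c <= 0:
            raise ValueError("every loop must contain at least one delay")
        periods.append(-(-t_c // d_c))  # integer ceiling of T_C / D_C
    return max(periods)


def scale_delays(edge_delays, alpha):
    """Scale the delays of the original SFG edges by the pipelining period."""
    return {edge: d * alpha for edge, d in edge_delays.items()}


# Loop 1: latency 2, one delay. Loop 2: latency 2 + 2, two delays.
alpha = pipelining_period([(2, 1), (2 + 2, 2)])
efficiency = 1 / alpha  # fraction of clock cycles that start a new iteration
scaled = scale_delays({"feedback_1": 1, "feedback_2": 2}, alpha)
```

Only the delays present in the original SFG are passed to `scale_delays`; registers introduced later by retiming are not scaled, in line with the remark above.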
`Applying delay scaling and retiming to the IIR filter
`problem results in the synthesis of a filter architecture,
`
`Figure 8. SBNR MAC with resolved parameters.
`
`0007
`
`

`
`
`
`P1: KCU
`Journal of VLSI Signal Processing
`
`KL430-05-Trainor
`
`April 9, 1997
`
`12:28
`
`48
`
`Trainor, Woods and McCanny
`
`through the SFG by IRIS to model the various laten-
`cies associated with the particular processor models
`employed.
`An important issue relating to the filter design is
`the appearance of delays on the inputs of the circuit,
`indicating that retiming has been carried out in such a
`way that the external I/O of the original SFG, where
`all inputs arrive simultaneously, has been preserved
`in the synthesized architecture. The manner in which
`these delays are considered depends on the designer
`and the application. If the inputs of the circuit need to
`enter simultaneously, then the delays on the inputs of
`Fig. 10 represent the delay circuitry that is physically
`required.
`If simultaneous input entry is not important, then
`these delay values represent the relative timing required
`between the different inputs for correct operation, and
`it is the responsibility of the designer to ensure that this
`timing of the external signals is met.
`As previously stated, new iterations of the filter algo-
`rithm can be initiated every second clock cycle, since
`the calculated pipelining period is 2. This implies that
`the sample rate will be only 50% of the corresponding
`clock frequency. There are a number of approaches
`that can be adopted to change or take advantage of this
`characteristic. If the sample rate specification has al-
`ready been met, then scheduling techniques [13] can
`be applied by IRIS, to take advantage of the idle cy-
`cles in the circuit, thereby time-multiplexing several
`similar operations onto individual processors. This re-
`sults in a hardware saving with no reduction in overall
`processing speed.
`Alternatives to scheduling procedures, which can
`be applied to increase the sample rate of the circuit,
`include architectural transformations, and employing
`lower-latency processors in the recursive section of the
`IIR filter. For example, a modified version of the SBNR
`MAC processor shown in Fig. 4 has been reported in
`[14], which can be applied to recursive filters, and ex-
`hibits a latency of only a single clock cycle.
`Essentially, our synthesis methodology provides
`three mechanisms by which the design space may
`be explored, namely scheduling and allocation tech-
`niques, architectural transformations, and the choice
`of practical processors which have parameterised
`performance characteristics. The first two of these
`mechanisms are commonly employed in architectural
`synthesis tools, however the third concept has not
`been adequately considered by previous synthesis sys-
`tems. The relationship between the different design
`
`Figure 9. Recursive loops in the IIR filter circuit.
`
`based on and optimised to the specific MAC proces-
`sor hardware chosen. The generated architecture is
`shown in Fig. 10. In this circuit, it can be seen that two
`additional blocks have been placed in the circuit, enti-
`tled “tc msd” and “msd tc”, which convert data from
`twos complement representation to SBNR and vice-
`versa. IRIS can detect the need for this extra circuitry
`by specifying the number system used by a particu-
`lar processor in the IRIS processor model, and placing
`numerical conversion circuitry where required.
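
The detection of number system mismatches can be sketched as a pass over the edges of the graph. The tags `"tc"` and `"sbnr"`, the node names and the function `insert_converters` are hypothetical; only the converter block names are taken from the text, written here with an underscore.

```python
# Converter block to use for each (producer, consumer) number system pair.
CONVERTERS = {("tc", "sbnr"): "tc_msd", ("sbnr", "tc"): "msd_tc"}


def insert_converters(number_system, edges):
    """Return a new edge list in which every edge joining processors
    with different number systems is split by a conversion block.

    number_system maps each node to "tc" (twos complement) or "sbnr".
    edges is a list of (source, destination) pairs.
    """
    result = []
    for src, dst in edges:
        pair = (number_system[src], number_system[dst])
        if pair[0] == pair[1]:
            result.append((src, dst))  # same number system: no conversion
        else:
            conv = f"{CONVERTERS[pair]}[{src}->{dst}]"
            result.extend([(src, conv), (conv, dst)])
    return result


systems = {"mac3": "tc", "mac4": "sbnr", "out": "tc"}
converted = insert_converters(systems, [("mac3", "mac4"), ("mac4", "out")])
```

The inserted blocks have latencies and time shapes of their own, so in a flow of this kind they would be placed before retiming and treated by the retiming step like any other processor.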
`In the architecture of Fig. 10, a number of blocks
`of delays have been introduced by the retiming pro-
`cess to account for the internal latencies of the pro-
`cessors and any conversion circuitry required, thereby
`ensuring that the global data timing of the synthesized
`architecture is the same as the original SFG. These
`blocks of timing delays are labelled x D, where x is
`the number of delays. Notice the block of 11 de-
`lays between the third twos complement MAC and the
`“tc msd” processor. This is caused by skewing cir-
`cuitry, required to convert the data time shape from
the MAC output to the “tc msd” input. A total of 31
`pipeline stages have been introduced and propagated
`
`Figure 10. Synthesized architecture for the IIR filter example.
`
`0008
`
`

`
`
`
`P1: KCU
`Journal of VLSI Signal Processing
`
`KL430-05-Trainor
`
`April 9, 1997
`
`12:28
`
`exploration methods will be demonstrated in the de-
`sign example discussed in Section 5.
`
`4.
`
`IRIS System Structure and Operation
`
`The functionality of IRIS can be partitioned into sev-
`eral subsystems, each carrying out a subset of the var-
`ious design functions that the tool makes available.
`Figure 11 gives a system viewpoint of the IRIS tool.
`In Fig. 11, the IRIS shell is shown as the central
`communicating process for the system. This shell is a
`windows-based program running environment, which
`processes commands from the user, invokes and moni-
`tors several tools within the IRIS framework, and relays
`the textual and graphical information produced by these
`tools back to the user. Parameter files provide the shell
`with information regarding the display, the location of
`libraries, access to help files and error messages, and
`lists the tools that can be invoked within the shell.
`From the shell, the user has control over the invo-
`cation and operation of two tools, a schematic editor,
`which also functions as an interface for the various syn-
`thesis functions, and a designer for the parameterised
`processor models required by the MDP. The processor
`designer is currently a simple parameter capture and
`analysis tool, which allows the user to create and eval-
`uate the various parameterised expressions that govern
`the characteristics of each processor model. Hierarchi-
`cal synthesis of systems is also possible by generating
`processor models from circuit architectures previously
`synthesized by IRIS. Within each processor model, in
`addition to the parameterised latency and data format
`
Figure 11. IRIS system functionality diagram.
`
`
`expressions, the designer can associate parameterised
`area and timing expressions for the processor, thereby
`allowing IRIS to make speed and area estimates for syn-
`thesized circuits. These values can be fed back from
`silicon implementation tools, and investigation is con-
`tinuing into methods by which more accurate estimates
`of circuit area and speed can be made at the architec-
`tural level. IRIS can use these values as a guide for
`determining whether or not the architectural designs
it creates meet the sample rate and area specifications
`supplied by the designer. In addition, the nature of IRIS
`is such that algorithms can very quickly be mapped onto
`silicon, and at the silicon implementation level, perfor-
`mance estimates are much more accurate than at the
`architectural level. Therefore, it is possible to iterate
`around the architectural synthesis and implementation
`stages without incurring large penalties in design time.
`Within the schematic editor, SFG schematics can be
`created using instances of various processing blocks,
`connected together by wires (representing the signals in
`the SFG) and terminated with external connectors (rep-
`resenting the external inputs and outputs to the SFG).
`From the schematic editor, a number of design verifi-
`cation and synthesis functions can be invoked, includ-
`ing a symbolic simulator which generates a difference
`equation corresponding to the SFG, and a numerical
`error analysis tool. Th
