© 1997 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
`
`Fast Implementation of 3-D Digital Filters Via
`Systolic Array Processors
`
`mertzios@demokritos.cc.duth.gr
`B. G. MERTZIOS
`Department of Electrical and Computer Engineering, Democritus University of Thrace, 67 100 Xanthi, Greece
`
`A. N. VENETSANOPOULOS
`Department of Electrical and Computer Engineering, University of Toronto, Toronto M5S 1A4, Canada
`
Abstract. This paper describes a fast implementation architecture for three-dimensional (3-D) FIR and IIR digital
filters based on systolic VLSI array processors. The modular structure presented consists of similar processing
elements in a linear cascade configuration with local interconnections. High throughput rates are attained
through high concurrency, which is achieved by exploiting both pipelining and parallelism. The 3-D FIR
and IIR filters considered may be used for the processing of reconstructed 3-D images and in medical imaging applications.
`
Key Words: 3-D digital filters, systolic arrays, pipelining, parallelism
`
`I. Introduction
`
`During the last decade we have witnessed a growing interest in the design and implemen-
`tation of three-dimensional (3-D) and M-D digital filters [1]–[10], which find numerous
`applications in medical imaging [11]–[14], computer vision [15], 3-D TV video signals
[16], image restoration and enhancement for geophysical and seismic data [17], and
processing of time-varying images [18]. In particular, digital image processing is needed
in medical imaging in order to provide higher quality images for interpretation
and diagnosis. In medical imaging applications there is usually a clinical need
to examine sections of the human body along directions in which direct image acquisition
is not possible [12]. This problem of representing 3-D images is solved
by combining a number of two-dimensional (2-D) sections and then filtering the
reconstructed 3-D images. Reconstruction and processing of 3-D images are used in
magnetic resonance imaging (MRI) and other medical imaging, and in the reconstruction
of the carotid vessel from echographic sections [14].
The need for fast processing of huge amounts of data has led to the use of special purpose
hardware architectures, since computer-based digital signal processing systems, which
are designed with a general purpose structure and data processing philosophy, have certain
features that prevent high throughput. The most prominent of the special purpose distributed
processing structures are the Array Processors (APs). APs are well suited to the fast real-time
implementation of complex digital signal processing algorithms. The aim of AP design
is to choose the pipelined and parallel processing techniques and the device
technology that deliver satisfactory performance at low cost. In particular, the
VLSI APs are special purpose, locally interconnected computing networks that maximize
concurrency by combining pipelining and parallelism. They are fully implemented by VLSI
`
`Petitioner Microsoft Corporation - Ex. 1018, p. 335
`chips and are characterized by massive concurrency and regular data flow [19], [20]. The two
`most popular special purpose VLSI APs are the systolic and wavefront arrays, which exploit
`both pipelining and parallelism by using the concept of computational wavefront [20]–[22].
`A systolic array is a network of elementary Processing Elements (PEs) that rhythmi-
`cally compute and pass data through the system. A wavefront array is a systolic array
`plus data flow computing. Both kinds of arrays are algorithm oriented and they feature
`the desired properties of regularity, modularity, local communication, data and computa-
`tional pipelining and highly synchronized multiprocessing. However, the systolic arrays are
`globally synchronized since the data movements are controlled by global timing-reference
`“beats”, while the wavefront arrays are locally synchronized since the data movements
are controlled by correct sequencing using handshaking. In addition to the classical
implementation criteria of low sensitivity to finite word length effects and absence
of overflow oscillations and limit cycles, the figures of merit in the VLSI implementation
of digital signal processors are: (i) concurrency, which is achieved by pipelining (data
and computational pipelining), parallelism and multiprocessing; (ii) a repetitive, regular
and modular structure; (iii) local communication, which is the only kind permitted; (iv) local
synchronization; and (v) workload and flow distribution.
`Since the communication in VLSI is very expensive and remains restrictive, only short
`local communication paths are used, while no shared buses are needed. VLSI APs have
`been used for the fast implementation of many matrix based algorithms (matrix decompo-
`sitions, triangularization, matrix-vector multiplications) and signal processing algorithms
`(convolution and FFT techniques, estimation) [19], [20]. Moreover, recently fast implemen-
`tation structures were presented for one-dimensional (1-D) [22]–[28] and 2-D digital filters
[29]–[31]. Recursive algorithms, or more generally realizations that need feedback,
are usually considered inappropriate for high speed implementation because of the recursive
bottleneck. The existence of feedback loops imposes a bound on the achievable
throughput rate and usually results in nonlocalized communications and nonlocalized
timing. Specifically, the maximum latency of the feedback loops determines the maximum
allowed throughput rate. Fortunately, the recursive bottleneck may be overcome by
recasting the algorithm using the principle of look-ahead computation, in order to increase the
number of delays in the feedback loops, and by retiming to pipeline the computation
within the loops effectively [25], [33], [34]. Other techniques for fast processing of recursive
algorithms are bit-level pipelining [35], [36] and block processing [23], [29], [34], [37]
(and the references therein). The use of internal local feedback loops, whenever
possible, also increases the throughput rate [23], [26], [33].
The present paper addresses the fast implementation of 3-D FIR and IIR digital filters via
VLSI array processors. A systolic-like architecture is presented, which consists of
similar PEs in a linear cascade configuration with local communications. The resulting
structures are modular and regular, with local interconnections. Concurrency is achieved by
exploiting both pipelining and parallelism. The proposed VLSI implementation of the 3-D
digital filters attains high throughput rates, which meet the requirement of high sampling
rates in real-time processing applications. The FIR and IIR 3-D digital filters considered
may be used for the processing of reconstructed 3-D images, such as medical or geophysical
images.
`
`II. Direct Form Realization of 3-D Digital Filters
`
`A. 3-D FIR Digital Filters
`
`The 3-D FIR linear digital filters may be described by the 3-D nonrecursive difference
`equation
`
$$
y(n_1, n_2, n_3) = \sum_{i_1=0}^{N_1} \sum_{i_2=0}^{N_2} \sum_{i_3=0}^{N_3} a_{i_1,i_2,i_3}\, u(n_1 - i_1, n_2 - i_2, n_3 - i_3) \tag{1}
$$
`
where u(n1, n2, n3) and y(n1, n2, n3) represent the input and output 3-D sequences, respectively.
The direct form realization of the 3-D FIR filter (1) may be seen as an extension of the
direct form realization of the 2-D FIR filter [38]; specifically, each coefficient of the
2-D filter is now replaced by a 1-D polynomial in the third variable. The three independent
variables n1, n2, n3 are associated with three distinct Unit Delay (UD) elements UD1, UD2,
UD3, corresponding to the variables z1, z2, z3, which appear in the 3-D filter’s transfer
function
`
$$
H(z_1, z_2, z_3) = \sum_{i_1=0}^{N_1} \sum_{i_2=0}^{N_2} \sum_{i_3=0}^{N_3} a_{i_1,i_2,i_3}\, z_1^{-i_1} z_2^{-i_2} z_3^{-i_3} \tag{2}
$$
`
`B. 3-D IIR Digital Filters
`
`The 3-D IIR linear digital filters may be described by the 3-D recursive difference equation
`
$$
y(n_1, n_2, n_3) = \sum_{i_1=0}^{N_1} \sum_{i_2=0}^{N_2} \sum_{i_3=0}^{N_3} a_{i_1,i_2,i_3}\, u(n_1 - i_1, n_2 - i_2, n_3 - i_3)
- \underset{(i_1,i_2,i_3) \neq (0,0,0)}{\sum_{i_1=0}^{M_1} \sum_{i_2=0}^{M_2} \sum_{i_3=0}^{M_3}} b_{i_1,i_2,i_3}\, y(n_1 - i_1, n_2 - i_2, n_3 - i_3) \tag{3}
$$
`
in analogy to the 2-D quarter-plane model [38]. Eq. (3) should be computable according to
the selected scanning. There is no restriction relating the triples of indices (M1, M2, M3)
and (N1, N2, N3), since causality in space is not a necessary condition for computability.
The 3-D transfer function associated with (3) is given by the real rational function

$$
H(z_1, z_2, z_3) = \frac{a(z_1, z_2, z_3)}{b(z_1, z_2, z_3)}
= \frac{\displaystyle \sum_{i_1=0}^{N_1} \sum_{i_2=0}^{N_2} \sum_{i_3=0}^{N_3} a_{i_1,i_2,i_3}\, z_1^{-i_1} z_2^{-i_2} z_3^{-i_3}}
{\displaystyle 1 + \underset{(i_1,i_2,i_3) \neq (0,0,0)}{\sum_{i_1=0}^{M_1} \sum_{i_2=0}^{M_2} \sum_{i_3=0}^{M_3}} b_{i_1,i_2,i_3}\, z_1^{-i_1} z_2^{-i_2} z_3^{-i_3}} \tag{4}
$$

The two known forms of direct realization exist also here, in analogy to the 1-D and 2-D
ones. The direct form I realization results from the cascade configuration of the nonrecursive
3-D FIR filter

$$
H_F(z_1, z_2, z_3) = a(z_1, z_2, z_3) \tag{5}
$$

with the recursive 3-D all-pole IIR filter (Fig. 1)

$$
H_I(z_1, z_2, z_3) = \frac{1}{b(z_1, z_2, z_3)} \tag{6}
$$

In contrast, the direct form II realization of the 3-D filter (3) results from the cascade
configuration of the subfilters HF (z1, z2, z3), HI (z1, z2, z3) in reverse order (Fig. 2). The
space-invariance property [39] of the filter considered ensures that the transfer function
remains unchanged in both realizations.

Figure 1. Block diagram of the direct form I realization of a 3-D IIR digital filter.

Figure 2. Block diagram of the direct form II realization of a 3-D IIR digital filter.

`The direct form II realization of a 3-D IIR filter is described by the equations:
`
$$
w(n_1, n_2, n_3) = u(n_1, n_2, n_3) - \underset{(i_1,i_2,i_3) \neq (0,0,0)}{\sum_{i_1=0}^{M_1} \sum_{i_2=0}^{M_2} \sum_{i_3=0}^{M_3}} b_{i_1,i_2,i_3}\, w(n_1 - i_1, n_2 - i_2, n_3 - i_3) \tag{7a}
$$

$$
y(n_1, n_2, n_3) = \sum_{i_1=0}^{N_1} \sum_{i_2=0}^{N_2} \sum_{i_3=0}^{N_3} a_{i_1,i_2,i_3}\, w(n_1 - i_1, n_2 - i_2, n_3 - i_3) \tag{7b}
$$
`
`where w(n1, n2, n3) is an intermediate variable. Here the delays UD1, UD2 in the directions
`of n1 and n2, corresponding to the variables z1 and z2 of the transfer function, may be shared
`by the subfilters HI (z1, z2, z3) and HF (z1, z2, z3).
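
The equivalence of the two direct forms can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes zero initial conditions outside the frame, evaluates the recursive equation (3) and the direct form II pair (7a), (7b) in row by row, plane by plane order, and compares the outputs. The function names and the random test data are illustrative assumptions; the entry b[0, 0, 0] is ignored because it corresponds to the leading 1 of b(z1, z2, z3).

```python
import numpy as np


def shifted_sum(coefs, x, n, skip_origin=False):
    """Sum coefs[i] * x[n - i] over all index triples i, with x = 0 outside the frame."""
    n1, n2, n3 = n
    total = 0.0
    for (i1, i2, i3), c in np.ndenumerate(coefs):
        if skip_origin and (i1, i2, i3) == (0, 0, 0):
            continue
        if n1 >= i1 and n2 >= i2 and n3 >= i3:
            total += c * x[n1 - i1, n2 - i2, n3 - i3]
    return total


def scan(shape):
    """Row by row, plane by plane order: n1 runs fastest, n3 slowest."""
    J1, J2, J3 = shape
    for n3 in range(J3):
        for n2 in range(J2):
            for n1 in range(J1):
                yield (n1, n2, n3)


def iir_direct(u, a, b):
    """Recursive difference equation (3), a single recursion on y."""
    y = np.zeros(u.shape)
    for n in scan(u.shape):
        y[n] = shifted_sum(a, u, n) - shifted_sum(b, y, n, skip_origin=True)
    return y


def iir_direct_form_2(u, a, b):
    """Direct form II: all-pole recursion on w (7a), then the FIR part on w (7b)."""
    w = np.zeros(u.shape)
    y = np.zeros(u.shape)
    for n in scan(u.shape):
        w[n] = u[n] - shifted_sum(b, w, n, skip_origin=True)  # (7a)
        y[n] = shifted_sum(a, w, n)                            # (7b)
    return y


# Hypothetical test data: a 4 x 3 x 3 frame and 2 x 2 x 2 coefficient arrays.
rng = np.random.default_rng(0)
u = rng.standard_normal((4, 3, 3))
a = rng.standard_normal((2, 2, 2))
b = 0.1 * rng.standard_normal((2, 2, 2))
print(np.allclose(iir_direct(u, a, b), iir_direct_form_2(u, a, b)))
```

The direct form II version keeps a single delayed sequence w, which is what allows the delays to be shared between the two subfilters.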
Consider the Row by Row, Plane by Plane (RRPP) scanning of a 3-D frame of size
J1 × J2 × J3, where the inputs are processed sequentially along the three rectangular axes;
the mapping of the spatial triple (n1, n2, n3) to the lexicographic index is then determined by
the index mapping
`
$$
I(n_1, n_2, n_3) = n_1 + J_1 n_2 + J_1 J_2 n_3 \tag{8}
$$
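
As an illustration of the index mapping (8), the sketch below computes the lexicographic index and its inverse. The function names and the frame size used in the example are assumptions made for illustration only.

```python
def rrpp_index(n1, n2, n3, J1, J2):
    """Lexicographic index of the sample (n1, n2, n3) under RRPP scanning, Eq. (8)."""
    return n1 + J1 * n2 + J1 * J2 * n3


def rrpp_coords(index, J1, J2):
    """Inverse mapping: recover (n1, n2, n3) from the lexicographic index."""
    n3, rest = divmod(index, J1 * J2)
    n2, n1 = divmod(rest, J1)
    return n1, n2, n3


# Hypothetical frame with J1 = 4 and J2 = 5.
J1, J2 = 4, 5
print(rrpp_index(1, 2, 3, J1, J2))   # 1 + 4*2 + 20*3 = 69
print(rrpp_coords(69, J1, J2))       # (1, 2, 3)

# A unit step in n2 spans J1 index positions; a unit step in n3 spans J1*J2.
print(rrpp_index(1, 2, 3, J1, J2) - rrpp_index(1, 1, 3, J1, J2))  # 4
print(rrpp_index(1, 2, 3, J1, J2) - rrpp_index(1, 2, 2, J1, J2))  # 20
```

The last two lines show why, in the sequential data stream, a unit delay in n2 or n3 corresponds to J1 or J1 J2 unit delays in n1.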
`
The adoption of the RRPP scanning implies that the Z-Transform (ZT) operator is associated
with the unit delays UD1, UD2, UD3 (denoted as simple delays by 1 in the corresponding
equations) according to the following relations:

$$
z_1^{-1}\, ZT[x(n_1, n_2, n_3)] = ZT[x(n_1 - 1, n_2, n_3)] \tag{9a}
$$

$$
z_2^{-1}\, ZT[x(n_1, n_2, n_3)] = ZT[x(n_1, n_2 - 1, n_3)] \tag{9b}
$$

$$
z_3^{-1}\, ZT[x(n_1, n_2, n_3)] = ZT[x(n_1, n_2, n_3 - 1)] \tag{9c}
$$

which show the correspondence of the variables z1, z2, z3 with the unit delays UD1, UD2,
UD3.
`
`III. Fast Implementation of 3-D Filters Via Systolic Arrays
`
`In this section we describe implementation structures of the 3-D FIR and IIR filters via
`VLSI array processors, which are based on the nonrecursive direct form realization and on
`the recursive direct form II realization respectively.
`
`A. 3-D FIR Digital Filters
`
The systolic array implementation of a 3-D FIR digital filter consists of a 2-D array of
Processing Units (PUs), which are locally interconnected (Fig. 3a). Each PU is formed as a
cascade configuration of N1 + 1 elementary PEs (Fig. 3b). There are in total (N2 + 1)(N3 + 1)
PUs, denoted as PU(i, j), i = 0, 1, . . . , N2, j = 0, 1, . . . , N3. Moreover, each PU(i, j)
operates on a separate input, the sample u(n1, n2 − i, n3 − j). Thus the delays D2, D3,
which, owing to the structure of the RRPP scanning, correspond to large delays (D2 = J1 D1,
D3 = J1 J2 D1), do not have to be implemented.
The structure of each PE is shown in Fig. 3c. The output of the cascade of PEs forming
PU(i2, i3) is given by

$$
v_{i_2,i_3} = \sum_{i_1=0}^{N_1} a_{i_1,i_2,i_3}\, u(n_1 - i_1, n_2 - i_2, n_3 - i_3) \tag{10}
$$
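
The decomposition of the FIR filter (1) into per-PU partial sums can be sketched as follows. This is an illustrative software model, not the hardware architecture: it assumes that each PU operates on the filter input u of Eq. (1), already delayed by (i2, i3), that the input is zero outside the frame, and that the frame is at least as large as the coefficient array. The function names and test data are hypothetical.

```python
import numpy as np


def fir_direct(u, a):
    """Eq. (1) evaluated literally, with u taken as zero outside the frame."""
    y = np.zeros(u.shape)
    for (n1, n2, n3), _ in np.ndenumerate(u):
        for (i1, i2, i3), c in np.ndenumerate(a):
            if n1 >= i1 and n2 >= i2 and n3 >= i3:
                y[n1, n2, n3] += c * u[n1 - i1, n2 - i2, n3 - i3]
    return y


def pu_output(u, a, i2, i3):
    """Partial sum of Eq. (10) for PU(i2, i3), computed for the whole frame at once."""
    J1, J2, J3 = u.shape
    # Input as seen by this PU: u delayed by i2 along n2 and by i3 along n3.
    delayed = np.zeros(u.shape)
    delayed[:, i2:, i3:] = u[:, :J2 - i2, :J3 - i3]
    # Cascade of N1 + 1 PEs: one multiply-add per tap along n1.
    v = np.zeros(u.shape)
    for i1 in range(a.shape[0]):
        v[i1:] += a[i1, i2, i3] * delayed[:J1 - i1]
    return v


def fir_via_pus(u, a):
    """Sum the outputs of all (N2 + 1)(N3 + 1) PUs."""
    _, taps2, taps3 = a.shape
    return sum(pu_output(u, a, i2, i3) for i2 in range(taps2) for i3 in range(taps3))


# Hypothetical data: 5 x 4 x 3 frame, N1 = N2 = 2 and N3 = 1.
rng = np.random.default_rng(1)
u = rng.standard_normal((5, 4, 3))
a = rng.standard_normal((3, 3, 2))
print(np.allclose(fir_direct(u, a), fir_via_pus(u, a)))
```

Each call to `pu_output` only uses shifts along n1 internally, which mirrors the point that only the short delay in n1 has to be realized inside a PU.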
`
`Considering that the delay of local communication is negligible, the delay UD1 is con-
`sidered to be equal to the time T needed to execute the operations in a PE, i.e.
`
$$
\mathrm{UD}_1 = T = \mathrm{Mu} + \mathrm{Ad} \tag{11}
$$
`
`where Mu and Ad denote the time needed to execute one multiplication and one addition
`respectively.
The whole structure operates with column block pipelining [40]. The maximum allowed
throughput rate is therefore determined by

$$
R \le \frac{1}{T} \tag{12}
$$

Thus, considering Mu = 115 ns and Ad = 19 ns for 16-bit multipliers and adders [41],
we obtain T = 134 ns and

$$
R \le \frac{1}{T} = 7.46 \times 10^{6}\ \text{operations/s} = 7.46\ \text{MOPS}
$$

Figure 3. The systolic array implementation of the direct form realization of a 3-D FIR digital filter: (a) the
layout diagram; (b) the PU(i, j), i = 0, 1, . . . , N2, j = 0, 1, . . . , N3; (c) the structure of the processing element
(PE).

`
The latency, or throughput delay, is defined as “the time separating the appearance of an
input sample at the input port from the appearance of the corresponding output sample at
the output port”. However, the latency is not of great importance in
most applications. Only the latency of the feedback loops, whenever they exist, is critical,
since it imposes an upper bound on the achievable throughput rate [23], [25].
In order to achieve global synchronization and pipelining of the proposed systolic
architecture with clock period T, the outputs of the summers in the right column of Fig. 3a are
delayed by the cutsets of the array that are orthogonal to the schedule vector s = [1 1]^T.
Each cutset introduces a delay equal to the throughput delay (N1 + 1)T of the PUs. The
latency of the whole array implementation is then readily found to be

$$
L = (N_1 + 1)(N_2 + N_3 + 2)\,T \tag{13}
$$

since there are (N2 + N3 + 2) PUs along the schedule vector s.
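
The timing figures of the FIR array can be reproduced with a few lines of arithmetic. The sketch below evaluates the PE period (11), the throughput bound (12) and the latency (13); the filter orders N1 = N2 = N3 = 2 are a hypothetical example and do not come from the paper.

```python
MU_NS = 115  # multiplication time for a 16-bit multiplier, in ns [41]
AD_NS = 19   # addition time for a 16-bit adder, in ns [41]


def pe_period_ns(mu_ns, ad_ns):
    """Clock period T = Mu + Ad of one processing element, Eq. (11)."""
    return mu_ns + ad_ns


def fir_throughput_mops(period_ns):
    """Upper bound 1/T of Eq. (12), expressed in millions of operations per second."""
    return 1e3 / period_ns


def fir_latency_ns(N1, N2, N3, period_ns):
    """Latency of the whole FIR array, Eq. (13)."""
    return (N1 + 1) * (N2 + N3 + 2) * period_ns


T = pe_period_ns(MU_NS, AD_NS)
print(T)                                   # 134
print(round(fir_throughput_mops(T), 2))    # 7.46
print(fir_latency_ns(2, 2, 2, T))          # 3 * 6 * 134 = 2412
```

Note that the latency grows with the filter orders while the throughput bound depends only on T.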
`
`B. 3-D IIR Digital Filters
`
The systolic array implementation of a 3-D IIR digital filter, unlike the FIR one, involves
at least one feedback loop, owing to its recursive nature. Since there are three independent
variables that are represented by the three delays D1, D2, and D3, and only one type of
delay is involved in the PUs, a 2-D array of PUs is needed for the implementation of a
3-D IIR digital filter (Fig. 4a). Each PU is a bi-directional array of elementary PEs and
uses only the delays D1. The PU(0, 0) involves the feedback loop and is shown in Fig. 4b.
The PU(0, j), j = 1, 2, . . . , M3 and the PU(i, j), i = 1, 2, . . . , M2, j = 0, 1, . . . , M3 are
shown in Fig. 4c and Fig. 4d respectively. The structure of a typical PE is shown in Fig. 4e.
Some remarks on the proposed implementation are in order:
`
1. Only the upper left PU has a feedback loop in its first PE. Note, however, that the sum

$$
v(n_1, n_2, n_3) = \underset{(i_1,i_2,i_3) \neq (0,0,0)}{\sum_{i_1=0}^{M_1} \sum_{i_2=0}^{M_2} \sum_{i_3=0}^{M_3}} b_{i_1,i_2,i_3}\, w(n_1 - i_1, n_2 - i_2, n_3 - i_3) \tag{14}
$$

is fed back to the summer of the feedback loop.
`
`2. The delays UD1 in the PEs are used for both the forward and the feedback paths, as
`occurs in the 1-D and 2-D direct form II realization structures.
`
3. After rescaling of the time units [19], [25], the feedback loop involves two delays UD1,
i.e. the pipelining period (time scaling factor) is a = 2 and the input data have to be
interleaved with blank data. It follows that the latency of the feedback loop, which
determines the iteration bound in the recursive structures, equals two rescaled delays,
i.e.

$$
L_f = 2\,\mathrm{UD}_1 \tag{15}
$$

Figure 4. The systolic array implementation of the direct form II realization of a 3-D IIR digital filter: (a) the
layout diagram; (b) the processing unit PU(0, 0) with the feedback loop; (c) the PU(0, j), j = 1, 2, . . . , M3;
(d) the PU(i, j), i = 1, 2, . . . , M2, j = 0, 1, . . . , M3; (e) the structure of a processing element.
`
4. The minimum possible rescaled delay UD1 equals the time needed to execute the
computations in the critical path of a PE, considering that the local communication
time is negligible. Owing to the system’s regularity and modularity, all the paths include
one multiplication and one addition, i.e.

$$
\mathrm{UD}_1 = T = \mathrm{Mu} + \mathrm{Ad} \tag{16}
$$
`
5. The auxiliary variable w(n1, n2, n3) is produced at the first PE of the PU(0, 0). This
variable needs to be delayed with respect to the three indices n1, n2, n3. The delay with
respect to n1 is implemented in the PEs of each PU. The delays with respect to n2 and n3
are implemented by using two types of registers, R2i, i = 1, 2, . . . , N2 and R3j,
j = 1, 2, . . . , N3, respectively.

6. The PU(i, j), i = 1, 2, . . . , N2, j = 0, 1, . . . , N3 are equipped with the registers R2i,
i = 1, 2, . . . , N2. Moreover, the PU(0, j), j = 1, 2, . . . , N3 are equipped with the
registers R3j, j = 1, 2, . . . , N3. All the registers are of the first-in, first-out (FIFO)
type. Specifically, the register R2i stores the J1 samples (Fig. 5)

w(n1, n2 − i − 1, n3 − j), w(n1 + 1, n2 − i − 1, n3 − j), . . . ,

w(J1 − 1, n2 − i − 1, n3 − j), w(0, n2 − i, n3 − j), . . . , w(n1 − 2, n2 − i, n3 − j),

w(n1 − 1, n2 − i, n3 − j)

at the time instant before the introduction of w(n1, n2 − i, n3 − j); at the clock cycle
where w(n1, n2 − i, n3 − j) is stored, the oldest sample w(n1, n2 − i − 1, n3 − j) is
released and erased. Moreover, the register R3j stores the J1 J2 samples (Fig. 6)

w(n1, n2, n3 − j − 1), w(n1 + 1, n2, n3 − j − 1), . . . , w(J1 − 1, n2, n3 − j − 1),

w(0, n2 + 1, n3 − j − 1), . . . , w(J1 − 1, J2 − 1, n3 − j − 1),

w(0, 0, n3 − j), . . . , w(n1 − 1, n2, n3 − j)

at the time instant before the introduction of w(n1, n2, n3 − j); at the clock cycle where
w(n1, n2, n3 − j) is stored, the oldest sample w(n1, n2, n3 − j − 1) is released and
erased.
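
The behaviour of the FIFO registers described in remarks 5 and 6 can be modelled in software. The sketch below is a simplified illustration under stated assumptions: samples arrive in RRPP order, a register of length J1 plays the role of one R2i stage and a register of length J1 J2 the role of one R3j stage, and each sample is represented by its coordinate triple so that the released sample can be identified. The class and function names are hypothetical, and the handling of frame boundaries is not modelled.

```python
from collections import deque


class FifoRegister:
    """Fixed-length FIFO: storing a new sample releases and erases the oldest one."""

    def __init__(self, length):
        self.cells = deque([None] * length, maxlen=length)

    def step(self, sample):
        oldest = self.cells[0]
        self.cells.append(sample)  # maxlen makes the deque drop the oldest entry
        return oldest


def rrpp_stream(J1, J2, J3):
    """Coordinates in row by row, plane by plane order (n1 fastest)."""
    for n3 in range(J3):
        for n2 in range(J2):
            for n1 in range(J1):
                yield (n1, n2, n3)


def simulate(J1, J2, J3):
    """Return, for every sample, what the two registers release when it is stored."""
    r2 = FifoRegister(J1)        # delay of one row
    r3 = FifoRegister(J1 * J2)   # delay of one plane
    released = {}
    for sample in rrpp_stream(J1, J2, J3):
        released[sample] = (r2.step(sample), r3.step(sample))
    return released


released = simulate(3, 2, 2)
print(released[(1, 1, 1)])  # ((1, 0, 1), (1, 1, 0))
```

When the sample at (1, 1, 1) is stored, the row register releases the sample at (1, 0, 1) and the plane register releases the sample at (1, 1, 0), i.e. the same position delayed by one unit in n2 and in n3 respectively.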
`
The maximum allowed throughput rate of the whole systolic implementation structure is
determined as a function of the latency of the feedback loop and is given by [23], [26]

$$
R \le \frac{1}{L_f} = \frac{1}{2T} = \frac{1}{2(\mathrm{Mu} + \mathrm{Ad})} \tag{17}
$$
`
`The maximum sampling rate at the input port is confined by the throughput rate of the
`implementation, i.e.
`
$$
F_S \le R \tag{18}
$$
`
The latency of the whole systolic implementation of the 3-D IIR digital filter is found to
be

$$
L = 2(M_1 + 1)(M_2 + M_3 + 1)\,T \tag{19}
$$

since there are (M2 + M3 + 1) PUs along the schedule vector s = [1 1]^T. The latency of
each PU equals (M1 + 1)T and the PEs are pipelined in both directions.
Considering again Mu = 115 ns and Ad = 19 ns for the 16-bit multipliers and adders
[41], we find that the throughput rate of the recursive 3-D IIR digital filter is

$$
R = \frac{1}{2T} = 3.73 \times 10^{6}\ \text{operations/s} = 3.73\ \text{MOPS}
$$

For comparison of the speed requirements, we note that the real-time processing of a 2-D
256 × 256 image at a TV scan rate of 30 images per second with one operation per pixel
requires a sampling rate FS = 1.97 MOPS.

Figure 5. The structure of the register R2i.
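
The IIR figures and the video-rate comparison can be checked in the same way. The sketch below evaluates the bound (17), the latency (19) and the sampling rate required by the 256 × 256, 30 images per second example; the filter orders M1 = M2 = M3 = 2 used for the latency are a hypothetical choice.

```python
def iir_throughput_mops(mu_ns, ad_ns):
    """Bound 1 / (2 (Mu + Ad)) of Eq. (17), in millions of operations per second."""
    return 1e3 / (2 * (mu_ns + ad_ns))


def iir_latency_ns(M1, M2, M3, period_ns):
    """Latency of the whole IIR array, Eq. (19)."""
    return 2 * (M1 + 1) * (M2 + M3 + 1) * period_ns


def required_rate_mops(rows, cols, images_per_second, ops_per_pixel=1):
    """Sampling rate needed for real-time processing, in millions of operations per second."""
    return rows * cols * images_per_second * ops_per_pixel / 1e6


rate = iir_throughput_mops(115, 19)
needed = required_rate_mops(256, 256, 30)
print(round(rate, 2))                   # 3.73
print(round(needed, 2))                 # 1.97
print(needed <= rate)                   # True: condition (18) holds for this example
print(iir_latency_ns(2, 2, 2, 134))     # 2 * 3 * 5 * 134 = 4020
```

Under these numbers the recursive structure leaves roughly a factor of two of headroom over the 2-D video example.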
`
IV. Conclusions
`
The systolic VLSI array processor implementation of the direct form realization of 3-D IIR
digital filters is presented. Row by row, plane by plane scanning of the processed images
is considered. Registers operating at the sample rate are used for fast implementation
of the “slow” delay operator. The proposed structures are modular, regular, use only
local communications, and achieve high throughput rates using pipelining. The obtained
implementations of the 3-D digital filters are useful for the fast processing of medical
images, 3-D video signals, 3-D computer vision images and time-varying images.

Figure 6. The structure of the register R3j.

The concurrency, and therefore the throughput rate of the proposed structures, may be
increased by exploiting parallelism in addition to pipelining. Parallel computation can be
achieved by adopting diagonal processing [6], [43], [44], which is based
on a modified sampling of the original 3-D signal, such that the sampling raster is distinct
in each one of any 3 consecutive hyperplanes. Then all the pixels belonging to the same
hyperplane are processed simultaneously and in parallel. In the 3-D case, the hyperplanes are
parallel to the diagonal planes at 45°. All the 3-D computations are then organized as
1-D ones (corresponding to a time axis), at the expense of additional hardware. The set of
pixels belonging to a diagonal hyperplane that can be computed simultaneously forms
a grid pattern, which is called a diagonal hyperstructure. Diagonal 3-D processing is
well suited to the processing of time dependent images. A systolic implementation of 3-D IIR
digital filters using diagonal 3-D processing would greatly increase the attained throughput
rate, since it would produce all the pixels belonging to a plane in a time period equal to
the latency 2T of the feedback loop. Real-time processing of 3-D images would thus become
possible. This improvement in 3-D systolic implementation is currently
under consideration.
`
`References
`
`1. B. G. Mertzios and A. N. Venetsanopoulos, “Modular Realization of M-dimensional Filters,” Signal Pro-
`cessing, vol. 7, 1984, pp. 351–369.
`
`2. A. N. Venetsanopoulos and B. G. Mertzios, “Real-Time Image Processing with Decomposition Structures,”
`Time-Varying Image Processing and Moving Object Recognition (V. Cappellini, ed.), Amsterdam: Elsevier
`Science Publishers B.V. (North-Holland), 1987, pp. 3–18.
`
`3. R. L. Webber and R. N. Nagel, “Three Dimensional Enhancement of Two-Dimensional Images,” Journal
`of Clinical Engineering, 1980, pp. 41–50.
`
`4. A. Fettweis, “Multidimensional Circuits and Systems Theory,” Tutorial Lecture, Proc. IEEE Int. Symposium
`on Circuits Syst., Montreal, Canada, May 1984, pp. 951–957.
`
`5. A. Fettweis, “Multidimensional Digital Filters with Closed Loop Behavior Designed by Complex Network
`Theory Approach,” IEEE Trans. on Circuits Syst., vol. CAS-34, no. 4, 1987, pp. 338–344.
`
`6. X. Liu and A. Fettweis, “Multidimensional Digital Filtering by Using Parallel Algorithms Based on Diagonal
`Processing,” Multidimensional Systems and Signal Processing, vol. 1, 1990, pp. 51–66.
`
`7. A. Fettweis, “Discrete Modelling of Lossless Fluid Dynamic Systems,” Archiv fur Elektronik und Ubertra-
`gungstechnik (AEU), vol. 46, no. 4, 1992, pp. 209–218.
`
`8. L. B. Bruton and N. R. Bartley, “The Design of Highly Selective Adaptive Three Dimensional Recursive
`Cone Filters,” IEEE Trans. on Circuits Syst., vol. CAS-34, no. 7, 1987, pp. 775–781.
`
9. M. Zervakis and A. N. Venetsanopoulos, “Three-Dimensional Rotated Digital Filters: Design, Stability and
Applications,” Circuits, Systems, and Signal Processing, vol. 9, no. 4, 1990, pp. 383–408.
`
`10. V. Cappellini, L. Alparone, G. Galli, P. Lange, A. Mecocci and L. Menichetti, “Digital Processing of Stereo
`Images and 3-D Reconstruction Techniques,” Int. J. Remote Sensing, vol. 12, no. 3, 1991, pp. 477–490.
`
`11. C. C. Jaffe, “Medical Imaging,” American Scientist, vol. 70, 1982, pp. 576–585.
`
`12. V. Cappellini, R. Carla and M. Melani, “3-D Digital Filtering of Biomedical Images,” Proc. 1986 European
`Signal Processing Conference, The Hague, The Netherlands, Sept. 1986, pp. 1383–1386.
`
13. G. Garibotto, S. Garozzo, M. Micca, G. Piretta and C. Giorgi, “Three Dimensional Digital Signal Processing
in Neurosurgical Applications,” Proc. of the Int. Conf. on Digital Signal Processing, Firenze, Italy, 1981,
pp. 434–444.

14. R. Carla et al., “3-D Reconstruction of Carotid Vessel by Echographic Sections,” Time-Varying Image
Processing and Moving Object Recognition (V. Cappellini, ed.), Amsterdam: Elsevier Science Publishers
B.V. (North-Holland), 1987, pp. 153–157.
`
`15. D. H. Ballard and C. M. Brown, Computer Vision, Englewood Cliffs, NJ: Prentice Hall, 1982.
`
`16. J-Y. Quellet and E. Dubois, “Sampling and Reconstruction of NTSC Video Signal at Twice the Color
`Subcarrier Frequency,” IEEE Trans. on Commun., vol. COM-29, 1981, pp. 1823–1832.
`
`17. N. Keskes, A. Boulanovar and O. Faugeras, “Application of Image Analysis Techniques to Seismic Data,”
`Proc. IEEE Int. Conf. on Acoust. Speech, Signal Processing, ICASSP-82, Paris, May 1982.
`
`18. G. Tascini, “Intrinsic Three-Dimensional Representation of Digital Images,” Proc. of Mediterranean Elec-
`trotechnical Conf., MELECON 83, Paper A8.07, Athens, Greece, May 1983.
`
`19. S. Y. Kung, H. J. Whitehouse and T. Kailath, Eds., VLSI and Modern Signal Processing, Englewood Cliffs,
`NJ: Prentice-Hall, 1985.
`
`20. S. Y. Kung, VLSI Array Processors, Englewood Cliffs, NJ: Prentice-Hall, 1987.
`
`21. H. T. Kung, “Why Systolic Architectures,” Computer, vol. C-15, 1982, pp. 37–46.
`
`22. S. Y. Kung, “On Supercomputing with Systolic/Wavefront Array Processors,” Proc. IEEE, vol. 72, 1984,
`pp. 867–884.
`
`23. H. H. Lu, E. A. Lee, and D. G. Messerschmitt, “Fast Recursive Filtering with Multiple Slow Processing
`Elements,” IEEE Trans. on Circuits Syst., vol. CAS-32, no. 11, 1985, pp. 1119–1129.
`
`24. S.K. Rao and Th. Kailath, “VLSI arrays for digital signal processing: Part I - A model identification approach
`to digital filter realizations,” IEEE Trans. on Circuits Syst., vol. CAS-32, no. 11, 1985, pp. 1105–1118.
`
`25. K. K. Parhi and D. G. Messerschmitt, “Concurrent Cellular VLSI Adaptive Filter Architectures,” IEEE
`Trans. on Circuits Syst., vol. CAS-34, no. 10, 1987, pp. 1141–1151.
`
`26. B. G. Mertzios, “Fast Implementation of Multivariable Linear Systems Via VLSI Array Processors,”
`COMPEL—The International Journal for Computation and Mathematics in Electrical and Electronic En-
`gineering, vol. 10, no. 1, 1991, pp. 1–10.
`
`27. B. G. Mertzios and A. N. Venetsanopoulos, “Implementation of Quadratic Digital Filters Via VLSI Array
`Processors,” Archiv fur Elektronik und Ubertragungstechnik (AEU), vol. 43, no. 3, 1989, 153–157.
`
`28. B. G. Mertzios and St. Scarlatos, “On the Systolic Implementation of Wave Digital Filters,” Archiv fur
`Elektronik und Ubertragungstechnik (AEU), vol. 45, no. 6, 1991, pp. 335–343.
`
`29. B. G. Mertzios, “Fast Block Implementation of Two-Dimensional Recursive Digital Filters via VLSI Array
`Processors,” Archiv fur Elektronik und Ubertragungstechnik (AEU), vol. 44, no. 1, 1990, pp. 55–58.
`
`30. T. Aboulnasr and W. Steenart, “Real-Time Systolic Array Processor for 2-D Spatial Filtering,” Proc. Third
`European Signal Processing Conf., EUSIPCO-86, The Hague, The Netherlands, Sept. 1986, pp. 687–690.
`
31. N. R. Shanbhag, “An Improved Systolic Architecture for 2-D Digital Filters,” IEEE Trans. on Circuits Syst.,
`vol. CAS-39, no. 5, 1991, pp. 1195–1202.
`
`32. B. G. Mertzios and A. N. Venetsanopoulos, “Fast Direct Implementations of Two-Dimensional IIR Digital
`Filters via Systolic and Wavefront Arrays,” Int. J. of Circuit Theory and Applications, vol. 21, 1993, pp. 275–
`285.
`
`33. K. K. Parhi and D. G. Messerschmitt, “Pipeline Interleaving and Parallelism in Recursive Digital Filters,
`Part I: Pipelining Using Scattered Look-Ahead Decomposition,” IEEE Trans. on Acoust., Speech, Signal
`Processing, vol. ASSP-37, 1989, pp. 1099–1117.
`
`34. K. K. Parhi and D. G. Messerschmitt, “Pipeline Interleaving and Parallelism in Recursive Digital Filters, Part
`II: Pipelined Incremental Block Filtering,” IEEE Trans. on Acoust., Speech, Signal Processing, vol. ASSP-
`37, 1989, pp. 1118–1134.
`
`35. K. K. Parhi and D. G. Messerschmitt, “A Bit-Parallel Bit Level Recursive Filter Architecture,” Proc. IEEE
`Int. Conf. Comput. Design, New York, 1986.
`
36. B. G. Mertzios, “Pipelining the Three-Port Wave Filter Adaptor at the Bit Level,” Circuits, Systems, and Signal
Processing, vol. 14, no. 3, 1995, pp. 285–298.
`
`37. B. G. Mertzios, “Block Realization of 2-D IIR Digital Filters,” Signal Processing, vol. 7, no. 2, 1984,
`pp. 135–149.
`
`38. D. E. Dudgeon and R. M. Mersereau, Two-Dimensional Digital Signal Processing, Englewood Cliffs, NJ:
`Prentice-Hall Inc., 1985.
`
`39. B. G. Mertzios, “Block-Space Invariance of Two-Dimensional Digital Signals,” Signal Processing, vol. 13,
`1987, pp. 141–153.
`
`40. J. R. Jump and S. R. Ahuja, “Effective Pipelining of Digital Systems,” IEEE Trans. on Computers, vol. C-27,
`1978, pp. 855–865.
`
`41. A. N. Venetsanopoulos and V. Cappellini, “Real-Time Image Processing,” in Multidimensional Systems:
`Techniques and Applications, Marcel Dekker Inc., ch. 8, pp. 345–399, 1986.
`
42. L. B. Bruton and N. R. Bartley, “A General Purpose Computer Program for the Design of Two-Dimensional
Recursive Filters,” Circuits, Systems, and Signal Processing, vol. 3, no. 2, 1984, pp. 243–264.
`
`43. A. Fettweis, “Principles of Multidimensional Wave Digital Filtering,” Digital Signal Processing (J. K. Ag-
`garwal, ed.), Point Lobas Press, 1979.
`
`44. X. Liu and A. Fettweis, “Multidimensional Digital Filtering by Using Parallel Algorithms Based on Diagonal
`Processing,” Multidimensional Systems and Signal Processing, vol. 1, 1990, pp. 51–66.
`