`
`
`
`
`OLYMPUS EX. 1021 -1/15
`
`OLYMPUS EX. 1021 - 1/15
`
`
`
`SIGNAL PROCESSING
`
`CONTENTS
`
`Volume 6 (1984) NO. 4
`
`Regular Papers
`
`M. Vetterli and H. Nussbaumer, Simple FFT and DOT algorithms with reduced number
`
`of operations
`
`J. B. Martens, Convolution algorithms, based on the CRT (Chinese Remainder
`Theorem)
`
`T. Gtingen and N. C. Geckini, An algorithm for formant tracking
`
`E.
`
`I. Plotkin,‘ L. M. Roytman and N. M. S. Swamy, Unbiased estimation of an initial phase
`
`F. F. Liedtke, Computer simulation of an automatic classification procedure for
`digitally modulated communication signals with unknown parameters
`
`Author Index
`
`Short Communications
`
`G. H. Allen, Programming an efficient radix-four FFT algorithm
`
`J. Le Roux, Non symmetric lattice structure for pole zero filters
`
`Book Review
`
`Announcements + Call for Papers
`
`OLYMPUS EX. 1021 - 2/15
`
`
`
`NAL PROCESSING
`
`A
`I
`
`I
`
`‘opean Journal devoted to the methods and applications of signal processing
`alication of the European Association for Signal Processing (EURASIP)
`
`M. Nagao (Kyoto, Japan)
`O. D. Faugeras (Paris, France)
`ir-in—Chiet
`H. Niemann (Erlangen, West Germany)
`A. L. Fawe (Sart Tilman, Belgium)
`‘
`t Kunt
`F. Pellandini (Neuchatel, Switzerland)
`A. Fettweis (Bochum, West Germany)
`ratoire de traitement de signaux
`8. Picinbono (Gif-sur-Yvette, France)
`J. G. Gander (Solothurn, Switzerland)
`3 polytechnique federale de Lausanne
`B. S. Ramakrishna (Bangalore, India)
`C, Gazanhes (Marseille, France)
`lhemin de Bellerive
`F. Rocca (Milano, Italy)
`.
`D. Godard (La Gaude, France)
`1007 Lausanne Switzerland
`8. Sankur (Istanbul, Turkey)
`G. H. Granlund (Linkoeping, Sweden)
`)2147 26 26 Telex: 24478
`L. L. Scharf (Kingston, USA)
`C. Gueguen (Paris, France)
`)rial Board
`J. C. Simon (Paris, France)
`T. S. Huang (West Lafayette, USA)
`ellanger (Le Plessis Robinson, France)
`C. Y. Suen (Montreal, Canada)
`Z. Kulpa (Warsaw, Poland)
`3ite (Mons, Belgium)
`Ju—Wei Tai (Peking, China)
`J. L. Lacoume (Grenoble, France)
`raccini (Genova, Italy)
`H. Tiziani (Heerbrugg, Switzerland)
`M. A. Lagunan-Hernandes
`appellini (Florence, Italy)
`D. Wolf (Frankfurt, West Germany)
`(Barcelona, Spain)
`arayannis (Athens, Greece)
`G. Winkler (Karlsruhe, West Germany)
`D. S. Lebedev (Moscow, USSR)
`.Constantinides (London, UK.)
`I. T. Young (Delft, The Netherlands)
`S. Levialdi (Napoli, Italy)
`acoulon (Lausanne, Switzerland)
`P. Zamperoni (Braunschweig, West
`C. E. Liedtke (Hannover, West Germany)
`Durrani (Glasgow, UK.)
`Germany)
`J. Max (Grenoble, France)
`lndres (Darmstadt, West Germany)
`scudié (Lyon, France)
`R. De Mori (Montreal, Canada)
`Secretary: I. Bezzi (Miss)
`
`orial Policy. SIGNAL PROCESSING is a European Journal
`Data PlOCeSSlng
`Remote Sensing
`.enting the theory and practice of signal processing.
`Its
`Communication Signal Processing
`iary objectives are:
`Biomedical Signal Processing
`lemination of research results and of engineering develop-
`Geophysical and Astrophysical Signal Processing
`its to European signal processing groups and individuals;
`Earth Resources Signal Processing
`sentation of practical solutions to current signal processing
`Acoustic and Vibration Signal Processing
`stems in engineering and science.
`Signal Processing Technology
`editorial policy andthetechnical content ofthe Journal are
`Speech Processing
`responsibility ofthe Editor—in-Chief and the Editorial Board.
`i Journal is self—supporting from subscription Income and
`Radar Signal PFOCGSSWQ
`Industrial Applications
`tains a minimum amount of advertisements. Advertisements
`subject to the prior approval of the Editor—in—Chief.
`
`Sonar Signal Processing
`SPECIE' Signal PiOCGSSlng
`New Applications
`
`upe. SIGNAL PROCESSING incorporates all aspects of the
`Dry and practice of signal processing (analogue and digital).
`iatures original research work, tutorial and review articles,
`I accounts of practical developments.
`It
`is intended for a
`id dissemination of knowledge and experience to engineers
`I scientists working in signal processing research, develop—
`nt or practical application.
`
`Jject coverage. Subject areas covered by the Journal include:
`nal Theory
`Signal Processing Systems
`ichastic Prooesses
`Software Developments
`tection and Estimation
`Image Processing
`sctral Analysis
`Pattern Recognition
`ering
`Optical Signal Processing
`
`Membership and Subscription Information.
`SignaIProcessing (ISSN 0165—1684) is publishedintwovolumes
`(Bissues) ayear.The subscription pricefor1984(Volumes 6, 7) is
`Dfl. 350.00 + DfI. 42.00 p.p.h. : Dfl. 392.00. Our p.p.h. (postage,
`packing and handling) charge includes surface delivery of all
`Issues, except to subscribers in the USA/Canada and India, who
`receive all issues by airdelivery (S.A.L. a Surface Airl Lifted) at no
`extracost. Forthe restlofthe world, airmailand S.A.L.charges are
`available upon request. Claims for missing issues will be
`honouredfree ofchargewithinthree months afterthe publication
`date of the issues. Mail orders and inquiries to: Elsevier Science
`Publishers B.V., Journals Department, PO. Box 211, 1000 AE
`Amsterdam, The Netherlands. For full membership information
`of the Association, including a subscription at a reduced rate,
`please contact: EURASIP. PO. Box 134, CH—1000 Lausanne13,
`Switzerland.
`
`
`1984, Elsevier Science Publishers B.V. (North—Holland)
`
`stem or transmitted in any form or by any means, electronic, mechanical,
`e Publishers B.V. (Norttholland), PO. Box 1991, 1000 BZ
`
`rights reserved. No part of this publication may be reproduced, stored in a retrieval sy
`)tocopying, recording or otherwise, without the prior permission of the publisher, Elsevier Scienc
`sterdam, The Netherlands.
`Jmission of an article for publication implies the transfer of the copyright from the author(s) to the publisher and entails the author's irrevocable and exclusive
`and/ortoactinoroutofCourtin
`horization oi the publisher to collect any sums or considerations for copying or reproduction payable by third parties (as mentioned in article 17 paragraph 2 of the
`Ich Copyright Act of 1912 and in the Royal Decree of June 20, 1974 (S. 351) pursuant to article 16b ofthe Dutch Copyright Act of 1912)
`inection therewith.
`scial regulations for readers in the U.S.A. A This journal has been registered with the Copyright Clearance Center. Inc. Consent Is given for copying of articles for
`is This consent is given on the condition that the copier pays through the Center the per-copy tee stated in
`'sonal or internal use, or for the personal use of specific clien .
`1
`code on the first page of each article for copying beyond that permitted by Sections 107 or 108 oi the US. Copyright Law. The appropriate fee should be (orwarded with
`opy ofthefirst page otthe articletothe Copyright Clearance Center, Inc,, 21 Congress Street, Salem,
`tgiven broad consent to copy and permission to copy must be obtained directly from the author. All articles published priorto 1981 may be copied fora perscopy tee oi
`$2.25, also payable through the Center. (NB. For review journals this fee is $0.25 per copy per page.) This consent does not extend to other kinds oi copying, such as
`general distribution, resale, advertising and promotion purposes, or forcreating new collective works. Special written permission must be obtained from the publisher
`such copying.
`will be asked to transfer copyright of the article to the publisher.
`eclal regulations [or authors in the USA. 7 Upon acceptance of an article by the journal, the author(s)
`is transfer will ensure the widest possible dissemination of intormation under the US. Copyright Law.
`
`ibiished bimonihiy
`
`Printed in The Netherlands
`
`OLYMPUS EX. 1021 - 3/15
`
`OLYMPUS EX. 1021 - 3/15
`
`
`
`This material may be protected by Copyright law (Title 17 U.S. Code)
`
`Signal Processing 6 (1984) 267—278
`North-Holland
`
`SIMPLE FFT AND DCT ALGORITHMS WITH REDUCED NUMBER OF
`OPERATIONS
`
`/
`Martin VETTERLI, member EURASIP, and Henri JKNNU§SBAUMER
`Laboratoire d’Informatique Technique, Ecole Polytechnique Fédérale de Lausanne, 16 Chemin de Bellerive, CH-1007 Lausanne,
`Switzerland
`
`Received 2 November 1983
`Revised 20 February 1984
`
`Abstract. A simple algorithm for the evaluation of‘discrete Fourier transforms (DFT) and discrete cosine transforms (DCT)
`is presented. This approach, based on the divide and conquer technique, achieves a substantial decrease in the number of
`additions when compared to currently used FFT algorithms (30% for a DFT on real data, 15% for a DFT on complex data
`and 25% for a DCT) and keeps the same number of multiplications as the best known FFT algorithms. The simple structure
`of the algorithm and the fact that it is best suited for real data (one does not have to take a transform oftwo real sequences
`simultaneously anymore) should lead to efficient implementations and to a wide range of applications
`
`Zusammenfassung. Ein einfacher Algorithmus zur Berechnung von diskreten Fourier Transformationen (DFT) und diskreten
`Cosinus Transformationen (DCT) wird vorgeschlagen. Diese Methode, basierend aufder “Teilen und Losen” Technik, erlaubt
`eine Verkleinerung der Anzahl Additionen gegeniiber gebraiichlichen FFT Algorithmen (30% fiir eine DFT von einem reellen
`Signal, 15% Hit eine DFT von einem complexen Signal und 25% fiir eine DCT) und braucht gleichviel Multiplikationen wie
`die besten bekannten FFT Algorithmen. Die einfache Struktur des Algorithmus und der Fakt dass er am besten fiir reelle
`Signale geeignet
`ist (man braucht nicht mehr gleichzeitig zwei reelle Signale zu transformieren) sollten zu effizienter
`Implementierung und zu zahlreichen Applikationen fiihren.
`
`Résumé. Un algorithme simple pour l’évaluation de la transformée de Fourier discrete (DFT) et de la transformée en cosinus
`discrete (DCT) est proposé. Cette approche, basée sur la méthode de la “division et solution”, permet une diminution
`Substantielle du nombre d’additions par rapport aux algorithmes de FFT courants (30% pour une DFT de signaux réels,
`15% pour une DFT de signaux complexes et 25% pour une DCT) tout en gardant un nombre de multiplications ‘égal a celui
`des meilleurs algorithmes de FFT connus. La structure simple de l’algorithme ainsi que le fait qu’il s’applique bien aux
`signaux réels (i1 n’y a plus besoin de prendre la transformée de deux signaux réels simultanément) devraient conduire a une
`implantation efficace ainsi qu’a un large champ d’applications.
`
`Ol65-1684/84/$3.00 © 1984, ‘Elsevier Science Publishers B.V. (North—Holland)
`
`transform (WFTA) [6], although a beautiful result in complexity theory, did not bring the expected
`improvements once implemented on real life computers [7], essentially due to the large total number of
`operations and to the structural complexity of the algorithm.
`The fact that most FFT’s are taken on real data is seldom fully taken into account. The algorithm using
`a FFT of half dimension for'the computation of a DFT on a real sequence [8] uses substantially more
`operations than the method of computing a single FFT on two real sequences simultaneously [9]. The
`
`Keywords. Fast Fourier transform, fast cosine transform, transforms of real data.
`
`1. Introduction
`
`Since the rediscovery of the fast Fourier transform (FFT) algorithm [1, 2] for the evaluation of discrete
`Fourier transforms, several improvements have been made to the basic divide and conquer scheme as for
`example the mixed radix FFT [3] and the real factor FFT [4, 5]. The introduction of the Winograd Fourier
`
`OLYMPUS EX. 1021 - 4/15
`
`
`
`
`
`3
`
`
`
`l
`
`268
`
`M. Verterli, H. J. Nilssbaumer/ Simple FFT and DCT Algorithms
`
`latter method has the disadvantage that one has to take two DFT’s at once and that the sorting of the
`output uses additional adds. The fact that the input and output sequences are real is used explicitly in a
`real convolution algorithm [10] where the DFT and inverse DFT are computed with a single complex FFT.
`Another transform that
`is mostly applied to real data is the discrete cosine transform. Since the
`introduction of the DCT [11], the search for a fast algorithm followed two main different approaches.
`One was to compute the DCT through a FFT of same dimension [12], where one is bound to take two
`transforms simultaneously. The other was a direct approach, leading to rather involved algorithms [13].
`It should be noted that the former technique outperforms all the latter ones when using optimal FFT’s,
`a fact often left in the dark [14].
`_
`Recently, evaluation of signal processing algorithms has shifted away from multiplication counts alone
`to the counting of the total number of operations, including data transfers [15]. This is due to the fact
`that the ratios (multiplication time)/(addition time) and (multiplication time)/(load time) are close to one
`on most computers and signal processors. Another growing concern has been the generation of time
`efficient software [16], and finally, the efficiency of an algorithm turns out to be a non-trivial combination
`of the various operation counts as well as of its structural complexity [17].
`In this communication, we address an old problem, namely, the efficient evaluation of DFT’s and DCT’s
`of real data. Efficiency is meant in the sense of m‘inimal number of multiplications and additions as well
`as in the sense of structural simplicity. As it turns out, the two problems are closely related, since a DFT
`of dimension N can be evaluated with two DCT’s of size N/4 and since a DCT of size N can be evaluated
`with a DFT of size N and additional operations. The same technique can be applied again to the reduced
`DCT of size N/4 and to the DFT of size N, and this until only trivial transforms are left over (N z 1, 2).
`This leads to an elegant recursive formulation ofthe two algorithms and to a number of multiplications
`identical
`to the best FFT’s while diminishing substantially the number of additions (typically 30%).
`Interestingly, this last saving is partly kept when computing complex DFT’s, and as an example, the total
`number of operations for a 1024-point transform is nearly 10% below the number of operations required
`for a lOO8—point WFTA. The prime factor FFT (PFA) requires about the same number of operations [18],
`but has a more complex structure.
`Note that the algorithms below were developed while searching for an efficient way to compute DCT’s
`of real data. The derived FFT algorithm for real data that follows immediately requires a number of
`multiplications identical to the one found in [19] (which is a variation of the Rader—Brenner algorithm),
`and a total number of operations that can be found in [20]. While obtaining an identical complexity, the
`derivations are quite different and the algorithm below seems more suitable for programming.
`Section 2 is used to derive the general algorithm and Section 3 evaluates its computational complexity.
`In Section 4, the results are compared to other algorithms and some implementation considerations are
`addressed.
`
`2. Derivation of the algorithms
`
`Let us define the following transforms ofthe length-N real vector x with elements x(0), x(l) ‘
`
`-
`
`- x(N — 1):
`
`Discrete Fourier transform
`Nil
`DFT(k,N,x):= z x(n)-e1'2"""/N, k=0,...,N—1,
`11:0
`
`(1)
`
`where j =
`Signal Processing
`
`OLYMPUS EX. 1021 - 5/15
`
`OLYMPUS EX. 1021 - 5/15
`
`
`
`n20
`
`where (8) or (9) are applied when necessary.
`Hence, the cosine DFT of dimension N has been mapped into a cosine DFT of N/2 and a DCT of
`N/4, at a cost of N/4 input and N/2 output additions (note that cos-DFT(k, N, x) is evaluated for
`k =0, .
`.
`.
`, N/2 but that k: N/4 does not require an output addition). The cosine DFT of N/2 can be
`handled similarly (and this until the transform becomes trivial) and the case of the DCT will be treated
`below.
`
`Turning to the sine DFT, we take a similar approach, using the fact that the sine function is odd.
`N/Zil
`
`_
`
`.
`
`21T(2n+l)k
`
`(5)
`
`(6)
`
`(7)
`
`(8)
`
`(9)
`
`(10)
`
`(11)
`
`(12)
`
`(13)
`
`M, Verterli, H. JfNussbaumer/ Simple FFT and DCT Algorithms
`
`Discrete cosine transform
`
`DCT(k, N, x) I: Z x(n) - cos<%)—
`
`2
`
`2 +1 k
`
`N"
`n:0
`
`Cosine DFT
`
`Sine DFT
`
`sin—DFT(k, N, x) I: X x(n) - sin
`n:0
`
`...
`
`.
`
`(4)
`
`Note that in the definition of the DCT, the normalizing factor l/VE at k=0 is omitted for simplicity.
`While the transforms are usually only defined for k = 0,. .
`.
`, N — 1, one will see that other values are useful
`as well and can easily be obtained from the ones defined above.
`Obviously, the following relations hold:
`
`DFT(k, N, x) = cos-DFT(k, N, x) —j sin-DFT(k, N, x),
`
`DCT( N, N, x) = 0,
`
`DCT(—k, N, x) = DCT(k, N, x),
`
`DCT(ZN ~ k, N, x) = * DCT(k, N, x),
`
`cos-DFT(N k, N, x) = cos—DFT(k, N, x),
`
`sin-DFT(N ~ k, N, x) = —sin-DFT(k, N, x).
`
`Looking at the evaluation of cos-DFT(k, N, x), we note that since the cosine function is even:
`
`C05-DFT(1<,N,x)= M2271x(2n)'COS<2fink>+NE71(X(2n +1) +x(N—2n — 1)) . cos<w),
`
`nv—O
`
`N/2
`
`":0
`
`4- N/4
`
`or, in a more succinct form:
`
`cos-DFTUc, N, x): c0s-DFT(k, N/2, x.) +DCT(k, N/4, x2),
`
`k = O, .
`
`.
`
`.
`
`, N — l,
`
`with x1(n)=x(2n),
`
`n=0,...,N/2—1,
`
`x2(n)=x(2n+l)+x(N—2n—l), n=0,...,N/4—l,
`
`sin—DFT(k, N, x): Z x(2n) - sin<
`
`217nk
`
`N/2>+ E0 (x(2n+1)—x(N—2n—-1))
`
`N/AH
`
`sm(———4IN/4
`
`Vol. 6, No. 4, August I984
`
`OLYMPUS EX. 1021 - 6/15
`
`
`
`2I
`
`}2
`
`/
`
`
`
`memmwmwWWowmemmwafififwflfiéfinmfiW§AkkwlfllWfkwfléwmy
`
`i
`
`I
`
`5
`
`,
`
`3
`
` sgé
`
`,Eti
`
`g
`
`270
`
`M. Vetterli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`Using the following identity:
`
`Sin<217(2n]\[+1)k> :(“1)“‘COS<27T(2n +1])\([N/4—k)>,
`we can rewrite (13), using (8) or (10) when necessary, as:
`
`sin-DFT(k, N, x) = sin-DFT(k, N/2, x,) +DCT(N/4— k, N/4, x3),
`with k3(n)=(—1)"-(x(2n+l)—x(N~2n—l)).
`
`k = 0, .
`
`.
`
`.
`
`, N — l,
`
`(14)
`
`(15)
`
`Therefore, the sine DFT of dimension N has been mapped into a sine DFT of N/2 and a DCT of N/4,
`at the cost of N/4 input and (N/2)—2 output additions (note that sin-DFT(k, N, x) is evaluated for
`k = 0, .
`.
`.
`, N/2 but that k : 0, N/4 and N/2 do not require any output additions). The sine DFT of N/2
`is handled in a similar fashion until the length is reduced so that the transform becomes trivial.
`
`We now focus our attention on the computation of the DCT. Using the following mapping [12]:
`
`x4(n)=x(2n),
`x4(N—n—1)=x(2n+l), n=0,...,N/2—l,
`the DCT can be evaluated as:
`N—1
`2
`4 +1 k
`DCT(k, N, x): Z x4(n)-cos(—1—n——i), k:0,..., N—l.
`
`11:0
`
`i
`
`Using basic trigonometry, (17) becomes:
`
`21Tk
`.
`21Tk
`.
`DCT(k, N, x) 2 cos TN ~cos-DFT(k, N, x4)~sm W -s1n—DFT(k, N, x4),
`
`k = 0, .
`
`.
`
`.
`
`(16)
`
`(17)
`
`, N — l.
`(18)
`
`With the symmetries of trigonometric functions, (18) can be computed as follows:
`(2%)
`. (wk)
`cos — —s1n ~—
`4N
`4N
`(217k)
`(217k)
`
`‘ [cos—DFT(k, N, 3%)]
`~
`sin~DFT(k, N, x4)
`
`’
`
`k_0
`
`"H’
`
`N/2_1
`
`] _
`[ DCT(k, N, x)
`DCT(N— k, N, x) _
`
`_
`
`sm — cos m
`4N
`4N
`
`and
`
`DCT(N/Z, N, x) = cos(1-r/4) - cos-DFT(N/2, N, x4).
`
`(19)
`
`The matrix product in (19) can be evaluated with 3 multiplications and 3 additions (instead of 4
`multiplications and 2 additions) [9] as follows:
`
`pl—COS
`
`Sln
`
`,
`
`a . (my, (age)
`
`0s 4N ,
`
`pz—sm 4N
`
`s, : cos-DFT(k, N, x4) +sin-DFT(k, N, x4),
`
`217k
`
`m1=sl - cos m,
`Signal Processing
`
`wmmmmmmmmmmmmwmmmwmmwmwm
`
`OLYMPUS EX. 1021 - 7/15
`
`OLYMPUS EX. 1021 - 7/15
`
`
`
`M. Vetterli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`~ m2 = Sin-DFT(k, N, x4) - 171,
`
`m3 : cos—DFT(k, N, x4) ' P2,
`
`DCT(k, N; x) = ml — m2,
`
`DCT(N — k, N, x) x ml +m3,
`
`I
`
`'
`
`(20)
`
`where p. and p; are precomputed. Using all simplifications, the computation of (19) requires therefore a
`total of (3N/2)72 multiplications and (3 N/2)~3 additions.
`Thus, we have shown how to map a N dimensional DFT into two DCT’s of N/4 (5, 12 and 15) and
`how to map a DCT of N into a DFT of same size (16 and 19). In other words, since the DCT is computed
`through a DFT, it is shown how to compute a DFT of N with 2 DFT’s of N/4 plus auxiliary operations}
`and this operation can be repeated until N has been reduced sufficiently so as to lead to trivial transforms.
`Figures 1
`to 4 try to visualize schematically the interaction of the various transforms. Combination of
`these figures in appropriate order show how to compute a transform of any dimension.
`
`Vol. 6, No.4, August 1984
`
`DFT(N)
`
`cos-DFT(N>
`
`sm—DFT(N)
`
`Fig.
`
`l. Decomposition ofa DFT of size N into a cosine DFT of size N and a sine DFT of size N.
`
`cos—DFT(N)
`
`cos—DFTtN/2)
`
`DCT(N/LD
`
`Fig. 2. Decomposition iof a cosine DFT of size N into a cosine DFT of size, N/2 and a DCT of size N/k4.
`
`OLYMPUS EX. 1021 - 8/15
`
`
`
`272
`
`M. Venerli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`sm—DFHN)
`
`SIN—DFT(N/2)
`
`
`
`DCT(N/Ll)
`
`
`Fig. 3. Dccomposition of a sinc DFT of size N into a sine DFT of size N/2 and a DCT of size N/4.
`
`DCT(N
`
`cos—DFT(N)
`
`SIN-DFT(N)
`
`Fig. 4. Decomposition of a DCT of size N into a cosine DFT of size N and a sine DFT of size N plus auxiliary operations.
`
`The algorithm is best suited for DFT’s on real data, but if the input is complex, one simply takes the
`real and imaginary parts separately (thus doubling the computational load) and evaluates the output with
`2N—4 auxiliary adds (k = 0 and N/2 require no additions).
`
`’3. Computational complexity
`
`Even if the derivation above used only the fact that N was a multiple of 4, the algorithm, as other
`divide and conquer schemes, performs best when N is highly composite, typically a power of 2.
`Below, we restate the number of operations required for the various steps needed in the evaluation of
`the transforms, where OM[-] and OA[-] stand for the number of multiplies and adds respectively.
`Signal Processing
`
`OLYMPUS EX. 1021 - 9/15
`
`OLYMPUS EX. 1021 - 9/15
`
`
`
`M. Vetterli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`DFT on length—N real data:
`
`OM[R-DFT( N)] = OM[cos-DFT(N)] + 0M [sin-DFT(N)],
`
`0A[R—DFT(N)] = OA[cos-DFT(NI)] + OA[sin—DFT(N)].
`
`DFT on length-N complex data:
`
`OM[C-DFT(N)] = 2- (OM[cos-DFT(N)]+ OM[sin-DFT(N)]),
`
`OA[C-DFT(N)] = 2 - (OA[cos-DFT(N)] + OA[sin-DFT(N)]) +2N —4.
`
`DCT on length-N real data:
`
`OM [DCT(N)] = OM[cos-DFT(N)] + OM[sin—DFT(N)] +(3 N/2)~2,
`
`OA[DCT(N)] = OA[cos-DFT(N)] + OA[sin-DFT(N)] +(3 N/2) — 3.
`
`Cosine DFT on length-N real data:
`
`OM[cos—DFT(N)] = OM[DCT(N/4)] + on; [cos‘DFT(N/ 2)],
`
`OA[cos—DFT(N)] : OA[DCT(N/4)] + OA[cos—DFT(N/2)] +(3 N/4).
`
`Sine DFT on length-N real data:
`
`OM[sin-DFT(N)] = OM[DC'I‘(N/4)] + OM[sin-DFT(N/2)],
`OA[sin-DFT(N)] = OA[DCT(N/4)]+ OA[sin-DFT(N/2)] +(3 N/4) —2.
`
`25
`
`(
`
`)
`
`OM[R-DFT(2)] = OM[C-DFT(2)] = OM[cos-DFT(2)] = OM[sin-DFT(2)] = 0,
`
`OM[DCT(2)] : 1,
`
`0A[R-DFT(2)] = OA[DCT(2)] : OA[cos-DFT(2)] = 2,
`
`OA[C-DFT(2)] : 4,
`
`OA[sin-DFT(2)] : 0.
`
`But, when N is a power of 2 and that the recursions are therefore applied Logz N times, the operation
`counts for the DCT reduces simply to:
`
`Vol. 6, No. 4, August I984
`
`, From (21)—(25) one can compute recursively the number of operations needed for the various transform
`types and sizes greater than 2. For N = 1, no operations are required, and the values for N = 2 are given
`in (26), thus defining the initial conditions for the above recursions.
`
`0M[DCT(N)]= N/2 ‘ L0g2(N),
`
`OA[DCT(N)] = N/2 - (3 Log2(N)—2) + 1.
`
`From (27) it follows immediately with (21)—(23) that:
`
`OM[R-DFT(N)]= N/2 - (Log2(N) —3) +2,
`
`OA[R-DFT(N)] = N/2 - (3 Log2(N)—5)+4,
`
`(27)
`
`(28)
`
`OLYMPUS EX. 1021 - 10/15
`
`
`
`274
`
`and
`
`M. Verterli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`OM[C—DFT(N)] = N- (Log2(N)—3) +4,
`.
`OA[C-DFT(N)]= 3N- (L0g2(N)— 1) +4.
`
`The above closed form expressions can now easily be compared to existing algorithms.
`
`
`
`29
`
`(
`
`)
`
`4. Comparison with existing algorithms
`
`First, the introduced algorithm (which is called in the following fast Fourier-cosine transform or FFCT)
`is compared to other algorithms which work directly on real data. Then we look at the case where
`two transforms on real data are taken simultaneously with an optimal FFT algorithm. The case of the
`FFT on complex data is investigated as well, and the issue of algorithm structure is addressed. The
`operation counts for FFT’s are taken from [9] unless otherwise specified.
`For the computation of a DFT on real data, one can use the algorithm based on an FFT of N/2 [8].
`Using again the 3 mult/3 adds approach for the auxiliary output operations, one gets the following
`complexity:
`
`OM[FFT(N)] = OM[FFT(N/2)] +(3N/2)_2,
`
`OA[FFT(N)] = 0A[FFT(N/2)] +(9N/2)—2.
`
`(30)
`
`Table 1 compares this result with the FFCT and shows the substantial savings that are obtained.
`
` .mmmm.&mwmmawWWMWMWtMM.WMgMW)WWWa.
`
`Table 1
`
`Comparison of direct computation of a DFT on real data with a FFT of N/2 or
`the FFCT algorithm (the FFT of N/2 is computed with the Rader—Brenner algorithm)
`
`size
`FFT of N/2 +ops.
`FFCT algorithm
`
`
`N
`
`real mults
`
`real adds
`
`real mults
`
`real adds
`
`20
`2
`42
`10
`8
`60
`10
`122
`26
`16
`164
`34
`290
`66
`32
`420
`98
`710
`162
`64
`1 028
`258
`1 678
`386
`128
`2 436
`642
`3 870
`898
`256
`5636
`1538
`8766
`2050
`512
`12804
`3586
`19582
`4610
`1024
`
`
`
`
`43 262 8 19410 2422 048 28 676
`
`,
`
`Turning to the DCT, one can compare the FFCT to fast discrete cosine algorithms which perform the
`transform directly, without going to the FFT (and therefore to the need to perform 2 transforms on real
`data at the same time). The various operation counts are shown in Table 2. The number of multiplications
`for the FFCT is always below the other algorithms, and even if the number of additions is slightly above
`the Chen et a]. version, the total number of operations is always substantially less.
`Signal Processing
`
`mamxmmmmemwmxmmmmwmwmmmmmwmm“888mm
`
`OLYMPUS EX. 1021 - 11/15
`
`OLYMPUS EX. 1021 - 11/15
`
`
`
`M. Vetlerli, H. J. Nussbaumer/ Simple FFT and DCTAIgorirhms
`
`Table 2 '
`
`Comparison of direct computation of a DCT on real data with the algorithms of Chen et a]. [13], Wang er a1. [14]
`and the FFCT (the operation counts for the two first algorithms are taken from [14])
`
`
`
`
`
`
`
`size FFCT Chen et a1. . Wang et a1.
`
`
`
`
`
`
`
`
`
`r. mults r. adds r. adds r. multsN r. adds
`
`29
`12
`29
`13
`26
`16
`8
`81
`32
`83
`35
`74
`44
`16
`209
`80
`219
`9|
`194
`116
`32
`513
`192
`547
`227
`482
`292
`64
`1217
`448
`1315
`547
`1154
`708
`128
`2 817
`1 024
`3 075
`l 283
`2 690
`l 668
`256
`6 401
`2 304
`7 043
`2 947
`6146
`3 844
`512
`14 337
`5 I20
`15 875
`6 659
`13 826
`8 708
`1024
`
`
`
`
`
`
`19460 30 722 14 851 35 331 112642048 31745
`
`Vol. 6, No. 4, August 1984
`
`
`
`
`
`The other general approach to transforms on real data is to compute simultaneously the transform of
`two real sequences. Beside the drawback that one has always to compute two transforms (otherwise the
`approach is suboptimal), the sorting of the output requires about N output additions. The operation count
`for a DFT on real data are given below:
`
`OM[R-DFT(N)] = 1/2 - OM[FFT(N)],
`
`OA[R-DFT(N)]=1/2' OA[FFT(N)] + N —2.
`
`(31)
`
`Table 3 compares (31) with the FFCT, and, as can be seen, the multiplication count is identical while a
`saving of about 30% is made with regard to additions.
`The situation is quite similar when computing a DCT with an FFT [12]. It is assumed that 2 DCT’s on
`real data are evaluated at the same time. Thus, the operation count is:
`
`OM[R-DFT(N)]=1/2 - OM[FFT(N)]+(3N/2)—2,
`OA[R~DFT(N)]= 1/2- OA[FFT(N)]+(5N/2)—5.
`
`Table 3
`
`Comparison of the computation of a DFT on real data with an FFT of N on two
`data sequences simultaneously and output additions or the FFCT algorithm (the
`FFT of N is computed with the Radar—Brenner algorithm)
`
`(32)
`
`
`
`size FFCT algorithm FFT of N+adds
`
`
`
`
`
`
`
`
`
`real mults real adds real multsN real adds
`
`
`
`32
`88
`
`2
`10
`
`98
`
`20
`60
`
`
`
`OLYMPUS EX. 1021 - 12/15
`
`
`
`
`
`276
`
`M. Vellerli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`A technique similar to the one used in (23) was used in the computation of the output of the DCT. The
`results are compared in Table 4 where it is seen that the additions are reduced in the FFCT by about
`25% with an identical number of multiplies.
`
`Table 4
`
`Comparison of the computation of a DCT on real data with the algorithm using
`an FFT on two data sequences simultaneously or the FFCT (the FFT is the
`Rader—Brenner version)
`
`
`
`size
`
`N
`
`FFT of N +ops.
`
`FFCT algorithm
`
`real mults
`
`real adds
`
`real mults
`
`real adds
`
`29
`12
`41
`12
`8
`81
`32
`109
`32
`16
`209
`80
`287
`80
`32
`513
`192
`707
`192
`64
`1 217
`448
`1 675
`448
`128
`2 817
`1 024
`3 867
`1 024
`256
`6401
`2304
`8763
`2304
`512
`14337
`5120
`19 579
`5120
`1024
`31 745
`11 264
`43 259
`11 264
`2 048
`
`
`Turning to the computation ofa DFT on a complex input, one can use (29) for the FFCT. The comparison
`to the Rader—Brenner FFT is given in Table 5 where the number of multiplies turns out to be identical
`while the number of additions is reduced by nearly 20%.
`Looking at last at the Winograd Fourier transform, one sees that even if the number of multiplies is
`larger in the FFCT case, the total number of operations is smaller for large transforms (for example 5%
`for the WFTA(504)/FFCT(512) and 9% for the WFTA(1008)/FFCT(1024) comparison). Note that the
`gain is less in the PFA case (5% and 2% respectively).
`Concerning the WFTA and the PFA, one recalls that its main drawback is the involved structure of the
`algorithm. Even if the structure of the FFCT is not as straightforward as the radix-2 FFT,
`it remains
`simple (it is similar to the structure of the Rader—Brenner algorithm). In the FFCT, the reduction in the
`number of operations is obtained at the cost of additional permutations and data transfers. Even if these
`
`Table 5
`
`Comparison ofthe Rader—Brenner FFT with the FFCT when applied to complex data
`
`
`
`size
`
`N
`
`FFT of N
`
`FFCT algorithm
`
`real mults
`
`real adds
`
`real mults
`
`real adds
`
`52
`4
`52
`4
`8
`148
`20
`148
`20
`16
`388
`68
`424
`68
`32
`964
`196
`l 104
`196
`64
`2308
`516
`2720
`516‘
`128
`5 380
`1 284
`6 464
`,1 284
`256
`12292
`3076
`14976
`3076
`512
`27 652
`7 172
`34 048
`7 172
`1024
`
`
`
`
`16 38876 28816 3882 048 61444
`Signal Processing
`
`OLYMPUS EX. 1021 - 13/15
`
`OLYMPUS EX. 1021 - 13/15
`
`
`
`M. Velterli, H. J. Nussbaumer/ Simple FFT and DCTAIgurillnns
`
`277
`
`permutations can be grouped with previous ones (for example at the entry of the DCT), the resulting
`structure is more complex. But this increase in topological complexity is annihilated by the substantial
`decrease in arithmetic complexity.
`Looking at the implementation, one sees the importance of using very eflicient small transforms. Together
`With the explicit coding technique [16], this should lead to fast code for real transforms, especially on
`micro and signal processors, where the number of arithmetic and data registers is small (leading to
`uneflicient implementations of the complex transform on two real sequences version).
`
`5. Concluding remarks
`
`We have introduced a simple, recursive algorithm for the computation of the discrete Fourier transform
`and the discrete cosine transform. First, this algorithm can be applied directly to real data, providing
`therefore an attractive alternative to the method using a complex transform oftwo real sequences. Secondly,
`it uses the same number of multiplies as the best structured FFT algorithms but decreases the number of
`additions by about 25 to 30% for DFT’s and DCT’s on real data and by nearly 20% for DFT’s on complex
`data when compared to currently used algorithms. At last, its structure is simple enough so that it should
`lead to efficient implementations.
`It is worth noting that the proposed algorithm, while leading to the same computational complexity
`for the FFT as the one in [20], is not isomorphic. Interestingly, the algorithm in [20] has been around for
`15 years but was seldom referenced and even less used. Meanwhile, literature was published on amelior-
`ations of the Cooley—Tukey FFT which leads to less efficient algorithms than the one in [20]... . More
`recently, actually just as the final copy was being sent to the publisher, yet another algorithm appeared
`that leads to the same complexity for the complex FFT [21].
`Investigations are under way in order to generalize the above ideas to other sinusoidal transforms and
`to higher dimensions, as well as to prove the efficiency of an implementation.
`I
`
`Vol. 6, No. 4, August 1984
`
`Acknowledgements
`
`The authors would like to thank one of the anonymous reviewers for bringing [20] and [21] to their
`attention as well as the “Fonds National Suisse de la Recherche Scientifique” for supporting this research.
`
`References
`
`[l] C. Runge, and H. Konitz, Vorlesungen iiber Numerisches Reclinen, Springer Verlag, Berlin, 1924.
`[2] J.W. Cooley, and J.W. Tukey, “An algorithm for the machine calculation of complexfourier series", Math. of Comput., Vol.
`19, pp. 297—301, April 1965.
`[3] R. Singleton, “An algorithm for computing the mixed radix fast fourier transform“, IEEE Trans. on Audio. and Electraacoustics,
`Vol. AU—l7, pp. 93—103, June 1969.
`[4] CM. Rader, and N.M. Brenner, “A new principle for fast fouricr transfoimation”, IEEE Trans. Acoust., Speech, Signal Processing,
`Vol. ASSP-24, pp. 264—265, June 1976.
`»
`[5] G. Bruun, “z—transform DFT filters and FFT’s”, IEEE Trans. Acoust, Speech, Signal Processing, Vol. ASSP-26, pp. 56—63, Feb.
`1978.
`[6] S. Winograd, “On computing the discrete fouricr transform”, Proc.