`
`x<fte‘5=
`
`HUAWEI EX. 1021 -1/15
`
`
`
`SIGNAL PROCESSING
`
`CONTENTS
`
`Volume 6 (1984) No. 4
`
`Regular Papers
`
`M. Vetterli and H. Nussbaumer, Simple FFT and DCTalgorithms with reduced number
`of operations
`
`J.B. Martens, Convolution algorithms, based on the CRT (Chinese Remainder
`Theorem)
`
`T. Gtingen and N. C. Gegkini, An algorithm for formant tracking
`
`267
`
`279
`
`293
`
`E.
`
`|. Plotkin, L. M. Roytman and N. M.S. Swamy, Unbiased estimation of an initial pnase 301
`
`F. F. Liedtke, Computer simulation of an automatic classification procedure for
`digitally modulated communication signals with unknown parameters
`
`Short Communications
`
`G.H. Allen, Programming an efficient radix-four FFT algorithm
`
`J. Le Roux, Non symmetric lattice structure for pole zero filters
`
`Book Review
`
`Announcements + Cail for Papers
`
`Author Index
`
`317
`
`325
`
`331
`
`337
`
`339
`
`343
`
`
`
`seoraraneonemcanibeneTOSNNTetg—nmrnnnaeei
`
`JEP > & 984
`
`LIBRARY
`
`HUAWEI EX.1021 -2/15
`
`HUAWEI EX. 1021 -2/15
`
`
`
`SIGNAL PROCESSING
`
`t
`
`a
`
`A Européan Journal devoted to the methods and applications of signal processing
`A publication of the European Association for Signal Processing (EURASIP)
`O. D. Faugeras (Paris, France)
`Editor-in-Chief
`A. L. Fawe (Sart Tilman, Belgium)
`Murat Kunt
`A. Fettweis (Bochum, West Germany)
`Laboratoire de traitement de signaux
`J. G. Gander (Solothurn, Switzerland)
`Ecole polytechnique fédérale de Lausanne
`C, Gazanhes (Marseille, France)
`16, Chemin de Bellerive
`D. Godard (La Gaude, France)
`CH - 1007 Lausanne Switzerland
`G.H. Granlund (Linkoeping, Sweden)
`Tel: 021 47 26 26 Telex: 24478
`C. Gueguen (Paris, France)
`Editorial Board
`T. 8, Huang (West Lafayette, USA)
`M. Bellanger (Le Plessis Robinson, France)
`Z. Kulpa (Warsaw, Poland)
`R. Boite (Mons, Belgium)
`J. L. Lacoure (Grenoble, France)
`C. Braccini (Genova, Italy)
`M. A. Lagunan-Hernandes
`V. Cappellini (Florence, Ilaty)
`(Barcelona, Spain)
`G. Carayannis (Athens, Greece)
`D. S. Lebedev (Moscow, USSR)
`A. G. Constantinides (London, U.K.)
`S. Levialdi (Napoli, Italy)
`F. de Coulon (Lausanne, Switzerland)
`GC. E. Liedtke (Hannover, West Germany)
`T.S, Durrani (Glasgow, U.K.)
`J. Max (Grenoble, France)
`Ww. Endres (Darmstadt, West Germany)
`R. De Mori (Montreal, Ganada)
`B. Escudié (Lyon, France)
`
`M. Nagao (Kyoto, Japan)
`H. Niemann (Erlangen, West Germany)
`F. Pellandini (Neuchatel, Switzerland)
`B. Picinbono (Gif-sur-Yvette, France)
`B.S. Ramakrishna (Bangalore, India)
`F. Rocca (Milano, Italy)
`B. Sankur (istanbul, Turkey)
`L.L. Scharf (Kingston, USA)
`J. C. Simon (Paris, France)
`C. Y. Suen (Montreal, Canada)
`Ju-Wei Tai (Peking, China)
`H. Tiziani (Heerbrugg, Switzerland)
`D, Wolf (Frankfurt, West Germany)
`G. Winkler (Karlsruhe, West Germany)
`|. T. Young (Delft, The Netherlands)
`P. Zamperoni (Braunschweig, West
`Germany)
`Secretary: |. Bezzi (Miss)
`
`
`Data Processing
`Editorial Policy. SIGNAL PROCESSING is a European Journal
`Remote Sensing
`presenting the theory and practice of signal processing.
`Its
`Communication Signal Processing
`primary objectives are:
`Biomedical Signal Processing
`Dissemination of research results and of engineering develop-
`Geophysical and Astrophysical Signal Processing
`ments to European signal processing groups and individuals;
`Earth Resources Signal Processing
`Presentation of practical solutions to current signal processing
`Acoustic and Vibration Signal Processing
`problems in engineering and science.
`Signal Processing Technology
`The editorial policy and the technical content of the Journal are
`Speech Processing
`the responsibility of the Editor-in-Chief and the Editorial Board,
`Radar Signal Processing
`The Journal is self-supporting from subscription income and
`Industrial Applications
`contains a minimum amount of advertisements. Advertisements
`are subjectto the prior approval of the Editor-in-Chief
`
`Sonar Signal Processing
`Special Signal Processing
`New Applications
`
`Scope. SIGNAL PROCESSING incorporatesall aspects of the
`theory and practice of signal processing (analogue and digital).
`It features original research work, tutorial and review articles,
`and accounts of practical developments. It is intended for a
`rapid dissemination of knowledge and experience to engineers
`and scientists working in signal processing research, develop-
`ment or practical application.
`
`Subject coverage. Subject areas covered by the Journalinclude:
`Signal Theory
`Signal Processing Systems
`Stochastic Processes
`Software Developments
`Detection and Estimation
`Image Processing
`Spectral Analysis
`Pattern Recognition
`Filtering
`Optical Signal Processing
`
`Membership and Subscription information.
`Signal Processing (ISSN 0165-1684) is published in two volumes
`(issues) ayear. The subscription price for 1984 (Volumes6,7) is
`Dfl, 350,00 + Dfl, 42.00 p.p.h. = Dfi. 392.00. Our p.p.h. (postage,
`packing and handling) charge includes surface delivery ofall
`issues, except to subscribers in the USA/Canada and India, who
`receive allissuesby air delivery (S.A.L.- Surface AirlLifted) atno
`extracost. Forthe rest ofthe world, airmailand S.A.L. chargesare
`available upon request. Claims for missing issues will he
`honoured free of charge within three months after the publication
`date of the issues. Mail orders andinquiries to: Elsevier Science
`Publishers B.V., Journals Department, P.O. Box 214, 1000 AE
`Amsterdam, The Netherlands. For full membership information
`of the Association, including a subscription at a reduccdrate,
`please contact: EURASIP, P.O. Box 134, CH-1000 Lausanne13,
`Switzerland.
`
`
`© 1984, Elsevier Science Publishers B.V. (North-Holland)
`Alt rights reserved. No part of this publication may be reproduced, stored ina retrieval system or transmitted in any form or by any means.electronic, mechanical,
`photocopying, recording or otherwise, without the prior permission of the publisher, Elsevier Science Publishers 8.V. (North-Holland), P.O. Box 1991, 1000 BZ
`Amsterdam, The Netherlands.
`Submission of an article for publication implies the transfer of the copyright from the author(s) to the publisher and entaiis the author's irrevocable and exclusive
`authorization of the pubtisherto collect any sumsor considerations for copying or reproduction payable by third parties (as mentionedin article 17 paragraph 2 of the
`Dutch Copyright Act of 1912 andin the Royal Decree of June 20, 1974 (S. 351) pursuantto article 166 of the Dutch Copyright Act of 1912) and/orto actin or out of Courtin
`connection therewith.
`Special regulations Jor reaciers in the U.S.A.-lhis journal has beenregistered with the Copyright Clearance Center, inc. Gonsentis given for copying of articles for
`personalor internal use,or for the personal use of specific clients. This consentis given on the condition that the copier paysthrough the Centerthe per-copy fee staledl in
`107 or 108 of the U.S. Copyright Law. The appropriate fee should be forwarded with
`the code onthefirst page of each articie for copying beyond that permitted by Sections
`gress Street, Salem, MA 01970, U.S.A. tino code appearsinan article, the author has
`a copyofthe first pageof the article to the Copyright Clearance Center, Inc., 21 Con
`not given broad consentto copy and permission tu copy must be obtained directly from the author. All articles published prior to 1981 may be copied for a per-copy fee of
`US $2.25, also payable through the Center.
`(N.B. For review journalsthis feeis $0.25 per copy per page.) This consent doesnotextendta other kinds of capying, such as
`for generaldistribution, resale, advertising and promotion purposes, or for creating new collective works. Special written permission must be obtained from the publisher
`for such copying.
`Specialregulations for authors in the U.S.A. - Upon acceptance of an article by the journal, the author(s) will be asked to transfer copyrightofthe article to the publisher
`This transfer will ensure the widest possible dissemination of Information under the U.S. Copyright Law
`
`Published bimonthly
`
`Printed in The Netherlands
`
`HUAWEI EX. 1021 -3/15
`
`
`
`HUAWEI EX. 1021 -3/15
`
`
`
`
`This material may be protected by Copyrightlaw (Title 17 U.S. Code)
`
`
`
`Signal Processing 6 (1984) 267-278
`North-Holland
`
`,
`
`267
`
`SIMPLE FFT AND DCT ALGORITHMS WITH REDUCED NUMBER OF
`OPERATIONS
`
`{
`Martin VETTERLI, member EURASIP, and Henri J..ANUSSBAUMER
`Laboratoire d’Informatique Technique, Ecole Polytechnique Fédérale de Lausanne, 16 Chemin de Bellerive, CH-1007 Lausanne,
`Switzerland
`
`Received 2 November 1983
`Revised 20 February 1984
`
`Abstract. A simple algorithm for the evaluation of discrete Fourier transforms (DFT):and discrete cosine transforms (OCT)
`is presented. This approach, based on the divide and conquer technique, achieves a substantial decrease in the number of
`additions when compared to currently used FFT algorithms (30% for a DFT onreal data, 15% for a DFT on complex data
`and 25% for a DCT) and keeps the same numberof multiplications as the best known FFT algorithms. The simple structure
`of the algorithm and the fact that it is best suited for real data (one does not have to take a transform of two real sequences
`simultaneously anymore) should lead to efficient implementations and to a wide range of applications.
`
`Zusammenfassung. Ein einfacher Algorithmus zur Berechnung von diskreten Fourier Transformationen (DFT) und diskreten
`Cosinus Transformationen (DCT) wird vorgeschlagen. Diese Methode, basierend auf der “Teilen und Lésen” Technik, erlaubt
`eine Verkleinerung der Anzahi Additionen gegeniiber gebraiichlichen FFT Algorithmen (30%fiir eine DFT von einem reellen
`Signal, 15%fiir eine DFT von einem compicxen Signal und 25%fiir cine DCT) und braucht gleichvicl Multiplikationen wic
`die besten bekannten FFT Algorithmen. Die einfache Struktur des Algorithmus und der Fakt dass er am besten fiir reelle
`Signale geeignet
`ist (man braucht nicht mehr gleichzeitig zwei reelle Signale zu transformieren) sollten zu effizienter
`Implementierung und zu zahlreichen Applikationen fiihren.
`
`Résumé. Unalgorithme simple pour l’évaluation de la transformée de Fourier discréte (DFT) et de la transformée en cosinus
`discréte (DCT) est proposé. Ceite approche, basée sur la méthode de la “division et solution”, permet une diminution
`substantielle du nombre d’additions par rapport aux algorithmes de FFT courants (30% pour une DFT de signaux réels,
`15% pour une DFT de signaux complexes et 25% pour une DCT)tout en gardant un nombre de multiplications égal 4 celui
`des meilleurs algorithmes de FFT connus. La structure simple de l’algorithme ainsi que le fait qu’il s’applique bien aux
`signaux réels (il n’y a plus besoin de prendre la transformée de deux signaux réels simultanément) devraient conduire 4 une
`implantation efficace ainsi qu’a un large champ d’applications.
`
`Keywords. Fast Fourier transform, fast cosine transform, transforms of real data.
`
`1. Introduction
`
`Since the rediscovery of the fast Fourier transform (FFT) algorithm [1, 2] for the evaluation of discrete
`Fourier transforms, several improvements have been madeto the basic divide and conquer schemeas for
`example the mixed radix FFT [3] and the real factor FFT [4, 5]. The introduction of the Winograd Fourier
`transform (WFTA) [6]; although a beautiful result in complexity theory, did not bring the expected
`improvements once implemented onreal life computers [7], essentially due to the large total number of
`operations and to the structural complexity of the algorithm.
`The fact that most FFT’s are taken on real data is seldom fully taken into account. The algorithm using
`a FFT ofhalf dimension for-the computation of a DFT on a real sequence [8] uses substantially more
`operations than the method of computing a single FFT on two real sequences simultaneously [9]. The
`
`0165-1684/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)
`
`HUAWEI EX. 1021 -4/15
`
`
`
`(MidasNAiarolesCaloASakgs
`
`
`
`
`268
`M. Vetterli, H. J. Nussbaumer { Simple FFT and DCT Algorithms
`latter method has the disadvantage that one has to take two DFT’s at once and that the sorting ofthe
`output uses additional adds. The fact that the input and output sequences arerealis used explicitly in a
`real convolution algorithm [10] where the DFT and inverse DFT are computed with a single complex FFT.
`Another transform that is mostly applied to real data is the discrete cosine transform. Since the
`introduction of the DCT [11], the search for a fast algorithm followed two main different approaches.
`One was to compute the DCT through a FFT of same dimension [12], where one is bound to take two
`transforms simultaneously. The other was a direct approach, leading to rather involved algorithms[13].
`It should be noted that the former technique outperformsall the latter ones when using optimal FFT’s,
`a fact often left in the dark [14].
`Recently, evaluation of signal processing algorithms has shifted away from multiplication counts alone
`to the counting of the total number of operations, including data transfers [15]. This is due to the fact
`that the ratios (multiplication time)/ (addition time) and (multiplication time)/(load time) are close to one
`on most computers and signal processors. Another growing concern has been the generation of time
`efficient software [16], and finally, the efficiency of an algorithm turnsout to be a non-trivial combination
`of the various operation counts as well as ofits structural complexity [17].
`In this communication, we address an old problem, namely, the efficient evaluation of DFT’s and DCT’s
`of rcal data. Efficiency is meant in the sense of minimal numberof multiplications and additions as well
`as in the senseof structural simplicity. As it turns out, the two problems are closely related, since a DFT
`of dimension N can be evaluated with two DCT’s ofsize N/4 and since a DCT of size N can be evaluated
`with a DFT ofsize N and additional operations. The same technique can be applied again to the reduced
`DCTof size N/4 and to the DFT ofsize N, and this until only trivial transformsare left over (N = 1, 2).
`This leads to an elegant recursive formulation of the two algorithms and to a numberof multiplications
`identical
`to the best FFT’s while diminishing substantially the number of additions (typically 30%).
`Interestingly, this last saving is partly kept when computing complex DFT’s, and as an example, the total
`numberof operations for a 1024-point transformis nearly 10% below the numberof operations required
`for a 1008-point WFETA. The prime factor FFT (PFA) requires about the same numberof operations [18],
`but has a more complex structure.
`Note that the algorithms below were developed while searching for an efficient way to compute DCT’s
`of real data. The derived FFT algorithm for real data that follows immediately requires a number of
`multiplications identical to the one found in [19] (which is a variation of the Rader—Brenner algorithm),
`and a total numberof operations that can be found in [20]. While obtaining an identical complexity, the
`derivations are quite different and the algorithm below seems more suitable for programming.
`Section 2 is used to derive the general algorithm and Section 3 evaluatesits computational complexity.
`In Section 4, the results are compared to otheralgorithms and some implementation considerations are
`addressed.
`
`2. Derivation of the algorithms
`
`Let us define the following transforms ofthe Iength- N real vector x with elements x(0), x(1) +++ xCN— 1):
`
`Discrete Fourier transform
`N-1
`.
`DFI(k, Nox)= Y x(ny eoPN, k=0,...,N—1,
`n=0
`
`(1)
`
`where j= 4V=1,
`Signal Processing
`
`HUAWEI EX. 1021 -5/15
`
`HUAWEI EX. 1021 -5/15
`
`
`
`M. Vetterli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`Discrete cosine transform
`
`DCT(k, N, x= ¥. x(n) cas(2ATUE), k=0,...,N-—1.
`
`Qa(2n +1)k
`
`AN
`
`NI
`
`n-0
`
`269
`
`(2)
`
`(3)
`
`(4)
`
`Cosine DFT
`
`cos-DFT(k, N,x):= ¥ x(n) cos( ny ) k=0,...,N=1.
`
`N-\
`n=
`
`
`Qarnk
`
`Sine DFT
`
`sin-DFT(k, N,x)'= ¥ x(n): sin vy ), k=0,...,N-1.
`
`N=!
`n=0
`
`
`2
`k
`
`Note that in the definition of the DCT, the normalizing factor 1/V2 at k=O is omitted for simplicity.
`While the transforms are usually only defined for k=0,..., N—1, one will see that other values are useful
`as well and caneasily be obtained from the ones defined above.
`Obviously, the following relations hold:
`
`DFT(k, N, x) =cos-DFT(k, N, x)—j sin-DFT(k, N, x),
`
`DCT(N, N, x) =0,
`
`DCT(-k, N, x)= DCT(k, N, x),
`
`DCTQN — k, N,x)=—DCT(k, N, x),
`
`cos-DFT(N — k, N, x) =cos-DFT(k, N, x),
`
`sin-DFT(N — k, N, x) =—sin-DFT(k, N, x).
`
`(5)
`
`(6)
`
`(7)
`
`(8)
`
`(9)
`
`(10)
`
`Looking at the evaluation of cos-DFT(k, N, x), we note that since the cosine function is even:
`
`N/2-1
`2
`syed In +1)+x(N—2n—-1)): (ao) i
`cos-DFT(k, N,x)= X x(2n)- cos( an
`N/2
`XY
`(xn +1)
`+x(
`n—1))+ cos
`»
`dl)
`0
`n=0
`
`4-N/4
`
`or, in a more succinct form:
`
`cos-DFT(k, N, x)= cos-DFT(k, N/2,x,)+DCT(k, N/4, x), k=0,...,N-1,
`
`with x,(n)=x(2n),
`
`n=0,...,N/2-1,
`
`x,(n)=x(2n+1l)+x(N—-2n-1), n=0,...,N/4-1,
`
`(12)
`
`where (8) or (9) are applied when necessary.
`Hence, the cosine DFT of dimension N has been mapped into a cosine DFT of N/2 and a DCT of
`N/4, at a cost of N/4 input and N/2 output additions (note that cos-DFT(k, N,x) is evaluated for
`K=0,...,N/2 but that k= N/4 does not require an output addition). The cosine DFT of N/2 can be
`handled similarly (and this until the transform becomestrivial) and the case of the DCT will be treated
`below.
`.
`Turning to the sine DFT, we take'a similar approach, using the fact that the sine function is odd.
`N/2-1
`N/4-1
`
`n=
`n=
`
`sin-DFT(k, N,x)= ¥ x(2n)- sin(=) + ¥ (x@nt+1)-x(N-2n-D)- sin(
`
`4
`2a +R)
`(13)
`4+N/4
`Vol. 6, No. 4, August 1984
`
`HUAWEI EX. 1021 -6/15
`
`HUAWEI EX. 1021 -6/15
`
`
`
`270
`
`M. Vetterli, H. J. Nussbaumer / Simple FFT and DCT Algorithms
`
`Using the following identity:
`
`sin(ment bt) ey. coa(22" + Neva)
`
`we can rewrite (13), using (8) or (10) when necéssary, as:
`
`sin-DFT(k, N, x) =sin-DFT(k, N/2, x) +DCT(N/4-k, N/4,x,),
`
`k=0,...,N-—-1I,
`
`with x3(7)=(-1)"+ (x(2n +1)-—x(N—-2n—1)).
`
`4)
`
`(15)
`
`Therefore, the sine DFT of dimension N has been mapped into a sine DFT of N/2 and a DCT of N/4,
`at the cost of N/4 input and (N/2)—2 output additions (note that sin-DFT(k, N, x) is evaluated for
`k=0,..., N/2 but that k=0, N/4 and N/2 do not require any output additions). The sine DFT of N/2
`is handled in a similar fashion until the length is reduced so that the transform becomestrivial.
`We now focus our attention on the computation of the DCT. Using the following mapping[12]:
`
`x4() = x(2n),
`
`Xxa(N —n—-l=x(2n+1), n=0,...,N/2—-1,
`the DCT can be evaluated as:
`
`DCT(k, N,x)= Yo x,(n)- cos(me k=0,...,N-1.
`
`2n(4n +1)k
`
`4N
`
`.
`
`N~-l
`
`n=0
`
`(16)
`
`G7)
`
`
`
`
`|
`
`|1
`|
`
`|
`
`Using basic trigonometry, (17) becomes:
`
`DCT(k, N, x)= cos(2 - cos-DFT(k, N, x,)—sin at) -sin-DFT(k, N,x,), k=0,...,N-1.
`
`2k
`4N
`
`2ak
`4N
`
`With the symmetries of trigonometric functions, (18) can be computed as follows:
`
`osf eX)
`
`c (=)
`|-
`DCT(kK, N, x)
`SVAN
`DCT(N-kN xd |. (=*)
`
`sin, —__
`4N
`
`inf
`
`2™
`
`in(27)
`SIVAN iraven k=O
`(224)
`-L
`sin-DFT(k, N, xa)’
`
`cos
`
`ee
`4N
`
`N/2-1
`
`ur
`
`and
`
`DCT(N/2, N, x) =cos(m/4) - cos-DFT(.N/2, N, x4).
`
`(19)
`
`The matrix product in (19) can be evaluated with 3 multiplications and 3 additions (instead of 4
`multiplications and 2 additions) [9] as follows:
`
`_
`(225) +sin(37)
`Pr CONAN) VAN)?
`- (=*)-
`(3)
`P2=sin AN
`cos 4N :
`
`5; = c0s-DFT(k, N, x4) +sin-DFT(k, N, x4),
`
`m= s,-cos| 7 J,
`Signal Processing
`
`-cos(224)
`
`=,
`
`g 1
`
`q ; :::
`
`i
`
`HUAWEI EX. 1021 -7/15
`
`HUAWEI EX. 1021 -7/15
`
`
`
`M.Vetterli, H. J. Nussbaumer / Simple FFT and DCT Algorithms
`
`271
`
`m, =sin-DFT(k, N, X4)° Pi,
`
`m; = cos-DFT(k, N, x4) * Pa,
`DCT(k, N,x)=m,—m,
`DCT(N —k, Nx)=m, +m,
`
`.
`
`(20)
`
`where p, and p, are precomputed. Using all simplifications, the computation of (19) requires therefore a
`total of G@N/2)—2 multiplications and (3N/2)--3 additions.
`Thus, we have shown how to map a N dimensional DFT into two DCT’s of N/4 (5,12 and 15) and
`how to map a DCTof N into a DFT of samesize (16 and 19). In other words, since the DCT is computed
`through a DFT,it is shown how to compute a DFT of N with 2 DFT’s of N/4 plus auxiliary operations,
`and this operation can be repeated until N has been reduced sufficiently so as to lead to trivial transforms.
`Figures |
`to 4 try to visualize schematically the interaction of the various transforms. Combination of
`these figures in appropriate order show how to compute a transform of any dimension.
`
`DFT CN)
`
`cos-DFT(N)
`
`Fig. 1. Decomposition of a DFT of size N into a cosine DFT of size N and a sine DFT ofsize N.
`
`sin-D FT ON)
`DCT CNA)
`
`cosDFT (CN)
`
`cos FT (N2>
`
`Fig. 2. Decomposition ‘of a cosine DFT ofsize N into a cosine 1)FT of size. N/2 and a DCTof size N/4.
`Vol. 6, No. 4, August 1984
`
`HUAWEI EX. 1021 -8/15
`
`HUAWEI EX. 1021 -8/15
`
`
`
`272
`
`M. Vetterli, H. J. Nussbaumer { Simple FFT and DCT Algorithms
`
`sinD FT CN)
`
`sin-D FT CN/2 )
`
`
`
`DCT CNA)
`sINDFT ON)
`
`Fig. 3. Decomposition of a sine DFT of size N into a sine DFT of size N/2 and a DCTof size N/4.
`
`DCT O(N)
`
`cosDFT«(N)
`
`Fig. 4. Decomposition of a DCT of size N into a cosine DFT of size N and a sine DFT ofsize N plus auxiliary operations.
`
`The algorithm is best suited for DFT’s on real data, but if the input is complex, one simply takes the
`real and imaginary parts separately (thus doubling the computational load) and evaluates the output with
`2N —4 auxiliary adds (k =0 and N/2 require no additions).
`
`‘3. Computational complexity
`
`Even if the derivation above used only the fact that N was a multiple of 4, the algorithm, as other
`divide and conquer schemes, performs best when N is highly composite, typically a powerof 2.
`Below, we restate the numberof operations required for the various steps needed in the evaluation of
`the transforms, where Oy[+] and O,|:] stand for the numberof multiplies and adds respectively.
`Signal Processing
`
`HUAWEI EX. 1021 -9/15
`
`HUAWEI EX. 1021 -9/15
`
`
`
`M. Vetterli, H. J. Nussbaumer / Simple FFT and DCT Algorithms
`
`DFT on length-N real data:
`
`Ox [R-DFT(N)] = O,;[cos-DFT(N)] + Oy; [sin- DFT(N)],
`
`O,[R-DFT(N)]= O,[cos-DFT(N)] + Oa[sin-DFT(N)].
`DFT on length-N complex data:
`.
`
`Om {C-DFT(N)]=2 - (Oy[cos-DFT(N)] + Oy,[sin-DFT(N)]),
`
`O,[C-DFT(N)]=2 : (O4[cos-DFT(N)] + O,[sin-DFT(N)]) +2N —4.
`
`DCT on length-N real data:
`
`Ou {[DCT(N)] = Oy[cos-DFT(N)] + Ov [sin-DFT(N)]+3N/2)-2,
`
`O,[DCT(N)] = O,[cos-DFT(N)] + O,[sin-DFT(N)]+@GN/2)—3.
`
`Cosine DFT on length-N real data:
`
`Oy [cos-DFT(N)] = Ou[DCT(N/4)] + Oyj[cos-DFT(N/2)],
`O,[cos-DFT(N)] = O4[DCT(N/4)]+ O4[cos-DFT(N/2)]+(3.N/4).
`
`Sine DFT on length-N real data:
`
`273
`
`@)
`
`(22)
`
`(23)
`
`24
`
`4)
`
`On(sin-DFT(N)] = O,,[DCT(N/4)] + Oy, [sin-DFT(.N/2)],
`
`>)
`O,[sin-DFT(N)] = O,[DCT(N/4)] + O,[sin-DFT(N/2)]+(3.N/4)—2.
`From (21)-(25) one can compute recursively the number of operations needed for the various transform
`types and sizes greater than 2. For N =1, no operations are required, and the values for N =2 are given
`in (26), thus defining the initial conditions for the above recursions.
`
`25
`
`Om[R-DFT(2)] = Om[C-DFT(2)] = Ou [cos-DFT(2)]= Ou [sin-DFT(2)]= 0,
`
`Om{DCT(2)]= 1,
`
`O,{R-DFT(2)] = O4{DCT(2)] = O4{cos-DFT(2)] =2,
`
`(26)
`
`O,[C-DFT(2)]=4,
`
`O,[sin-DFT(2)]=0.
`
`But, when N is a power of 2 and that the recursions are therefore applied Log, N times, the operation
`counts for the DCT reduces simply to:
`
`Om[DCT(N)] = N/2 + Log, (N),
`
`O,[DCT(N)]= N/2-@ Logo(N)—2) +1.
`
`(27)
`
`From (27) it follows immediately with (21)-(23) that:
`[ N/2>(Log,(N)—3)(N)]= (28)
`
`
`
`On [R-DFT(N)]= N/2 +
`(Logo(N)—3) +2,
`O,[R-DFT(N)]= N/2- (3 Log,(N)—5) +4,
`Val. 6, No. 4, August 1984
`
`HUAWEI EX.1021 -10/15
`
`HUAWEI EX. 1021 -10/15
`
`
`
`
`eteeaamesSSDSihese
`
`
`‘einonasionalurmemamsomeanaiepticratnnseaesinearerreereenonomMaNieanonemeamareilst
`
`||||
`
`32
`
`274
`and
`
`M. Vetterli, H. J. Nussbaumer / Simple FFT and DCT Algorithms
`
`Om[C-DFT(N)] = N- (Log,(N)—3)+4,
`O,[C-DFT(N)]=3N: (Logo(N)— 1) +4.
`The above closed form expressions can now easily be compared to existing algorithms.
`
`29
`”
`
`4. Comparison with existing algorithms
`
`First, the introduced algorithm (whichis called in the following fast Fourier-cosine transform or FFCT)
`is compared to other algorithms which work directly on real data. Then we look at the case where
`two transforms on real data are taken simultaneously with an optimal FFT algorithm. The case of the
`FFT on complex data is investigated as well, and the issue of algorithm structure is addressed. The
`operation counts for FFT’s are taken from [9] unless otherwise specified,
`For the computation of a DFT on real data, one can use the algorithm based on an FFT of N/2 [8].
`Using again the 3 mult/3 adds approach for the auxiliary output operations, one gets the following
`complexity:
`
`Om[EFT(N)] = Ou[FFT(N/2)]+(3.N/2)—2,
`
`Oa[FFT(N)] = O,[FFT(N/2)]+(9N/2)—2.
`
`60
`
`Table | compares this result with the FFCT and shows the substantial savings that are obtained.
`
`Table [
`
`Comparisonof direct computation of a DFT on real data with a FFT of N/2 or
`the FFCTalgorithm (the FFT of N/2is computed with the Rader—-Brenner algorithm)
`size
`FFT of N/2 +ops.
`FFCTalgorithm
`
`
`N
`real mults
`real adds
`real mults
`real adds
`
`
`20
`2
`42
`10
`8
`60
`10
`122
`26
`16
`164
`34
`290
`66
`32
`420
`98
`710
`162
`64
`1 028
`258
`1 678
`386
`128
`2 436
`642
`3 870
`898
`256
`5 636
`1538
`8 766
`2050
`512
`12 804
`3 586
`19 582
`4610
`1 024
`
`
`
`
`10 242 43 262 8 1942 048 28 676
`
`|
`
`Turning to the DCT, one can compare the FFCTto fast discrete cosine algorithms which perform the
`transform directly, without going to the FFT (and therefore to the need to perform 2 transforms on real
`data at the same time). The various operation counts are shown in Table 2. The number of multiplications
`for the FFCT is always below the other algorithms, and even if the numberofadditionsis slightly above
`the Chen ef al. version, the total number of operations is always substantially less.
`Signal Processing
`
`HUAWEI EX. 1021 -11/15
`
`HUAWEI EX. 1021 -11/15
`
`
`
`M. Vetterli, H. J. Nussbaumer / Simple FFT and DCT Algorithms
`
`275
`
`Table 2
`
`Comparison of direct computation of a DCT on real data with the algorithms of Chen e al. [13], Wang e7 al. [14]
`and the FFCT (the operation counts for the two first algorithms are taken from [14])
`
`
`
`
`
`
`
`
`
`Chen et al. . Wang etal.size FFCT
`r. mults
`r adds
`r. mults
`r. adds
`r. mults
`r. adds
`
` N
`
`8
`16
`32
`64
`128
`256
`512
`1024
`2 048
`
`16
`44
`116
`292
`708
`1 668
`3 844
`8 708
`19 460
`
`26
`74
`194
`482
`L154
`2 690
`6 146
`13 826
`30 722
`
`13
`35
`91
`227
`547
`1 283
`2947
`6 659
`1485]
`
`29
`83
`219
`547
`1315
`3075
`7 043
`15 875
`35 331
`
`12
`32
`80
`192
`A448
`1024
`2304
`5 120
`11 264
`
`29
`81
`209
`$13
`1217
`2817
`6 401
`14337
`31745
`
`The other general approach to transforms on real data is to compute simultaneously the transform of
`two real sequences. Beside the drawback that one has always to compute two transforms (otherwise the
`approachis suboptimal), the sorting of the output requires about N output additions. The operation count
`for a DFT onreal data are given below:
`
`Om [R-DFT(N)]= 1/2 + On [FFTON)],
`
`O,[R-DFT(N)]= 1/2 + O,[FFT(N)]+ N —-2.
`
`Gl)
`
`Table 3 compares (31) with the FFCT, and, as can be seen, the multiplication count is identical while a
`saving of about 30% is made with regard to additions.
`The situation is quite similar when computing a DCT with an FFT [12]. It is assumed that 2 DCT’s on
`real data are evaluated at the same time. Thus, the operation countis:
`
`Oy|[R-DFT(N)]= 1/2 + O[FFT(N)] +3 N/2)-2,
`
`O,{R-DFT(N)]= 1/2 - O4[FFT(N)]+(5N/2)—5.
`
`Table 3
`
`Comparison of the computation of a DFT on real data with an FFT of N on two
`data sequences simultaneously and output additions or the FFCT algorithm (the
`FFT of N is computed with the Rader-Brenneralgorithm)
`
`size
`
`FFT of N+adds
`
`FFCTalgorithm
`
`N
`real mults
`real adds
`real mults
`real adds
`
`
`20
`2
`32
`2
`8
`60
`10
`88
`10
`16
`164
`34
`242
`34
`32
`420
`98
`614
`98
`64
`1028
`258
`1 486
`258
`128
`2 436
`642
`3 486
`642
`256
`5 636
`1 538
`7998
`1538
`$12
`12 804
`3 586
`18 046
`3 586
`1024
`
`
`
`
`8 194 40.190 8 1942 048 28 676
`
`(32)
`
`Vol. 6, No. 4, August 1984
`
`HUAWEI EX.1021 -12/15
`
`HUAWEI EX. 1021 -12/15
`
`
`
`real adds
`real mults
`real adds
`real mults
`29
`12
`41
`12
`8
`81
`32
`109
`32
`16
`209
`80
`287
`80
`32
`513
`192
`107
`192
`64
`1217
`448
`1675
`448
`128
`2817
`1 024
`3.867
`1024
`256
`6 401
`2 304
`8763
`2304
`512
`1024
`5120
`19 579
`$120
`14337
`2048
`11 264
`43 259
`11 264
`31745
`
`-
`Turning to the computation of a DFT on a complex input, one can use (29) for the FFCT. The comparison
`to the Rader-Brenner FFTis given in Table § where the numberof multiplies turns out to be identical
`while the numberof additions is reduced by nearly 20%.
`Looking at last alt the Winograd Fourier transform, one sees that even if the number of multiplies is
`larger in the FFCTcase, the total numberof operations is smaller for large transforms (for example 5%
`for the WFTA(504)/FFCT(512) and 9% for the WFTA(1008)/FFCT(1024) comparison). Note that the
`gain is less in the PFA case (5% and 2% respectively).
`Concerning the WFTA and the PFA,one recalls that its main drawback is the involved structure of the
`algorithm, Even if the structure of the FFCT is not as straightforward as the radix-2 FET, it remains
`simple (it is similar to the structure of the Rader~Brenner algorithm). In the FFCT, the reductionin the
`numberof operations is obtained at the cost of additional permutations and data transfers. Even if these
`
`
`seiebacitwrenttarerialLSeit
`
`:
`|
`|
`|
`q
`|
`|
`;
`|
`|
`
`:|
`
`|
`
`|
`
`276
`M. Veiterli, H. J. Nussbaumer / Simple FFT and DCT Algorithms
`A technique similar to the one used in (23) was used in the computation of the output of the DCT. The
`results are compared in Table 4 whereit is seen that the additions are reduced in the FFCT by about
`25% with an identical numberof multiplies.
`Table 4
`
`Comparison of the computation of a DCT on real data with the algorithm using
`an FFT on two data sequences simultaneously or the FFCT (the FFT is
`the
`Rader~Brenner version)
`
`
`size
`FFT of N taps.
`FFCTalgorithm
`
`
`N
`
`Table 5
`
`Comparisonofthe Rader—Brenner FFTwith the FFCT whenapplied to complexdata
`
`
`
`
`
`
`FFT of Nsize FFCTalgorithm
`real mults
`real adds
`real mults
`real adds
`
` N
`
`52
`4
`52
`4
`8
`148
`20
`148
`20
`16
`388
`68
`424
`68
`32
`964
`196
`1 104
`196
`64
`2 308
`$16
`2720
`516°
`128
`5 380
`1 284
`6 464
`| 284
`256
`12 292
`3 076
`14976
`3 076
`512
`27 652
`7172
`34 048
`7172
`1024
`
`
`
`
`16 388 76 288 16 3882 048 61 444
`Signal Processing
`
`HUAWEI EX. 1021 -13/15
`
`HUAWEI EX. 1021 -13/15
`
`
`
`
`
`M. Vetterli, H. J. Nussbaumer / Simple FFT and DCT Algorithms
`
`277
`
`permutations can be grouped with previous ones (for example at the entry of the DCT), the resulting
`structure is more complex. But this increase in topological complexity is annihilated by the substantial
`decrease in arithmetic complexity.
`Looking at the implementation, one sees the importance of using very efficient small transforms. Together
`with the explicit coding technique [16], this should lead to fast code for real transforms, especially on
`micro and signal processors, where the number of arithmetic and data registers is small (leading to
`unefficient implementations of the complex transform on two real sequences version).
`
`5, Concluding remarks
`
`We have introduced a simple, recursive algorithm for the computation of the discrete Fourier transform
`and the discrete cosine transform. First, this algorithm can be applied directly to real data, providing
`therefore anattractive alternative to the method using a complex transformof two real sequences. Secondly,
`it uses the same number of multiplics as the best structured FFT algorithms but decreases the numberof
`additions by about 25 to 30%for DET’s and DCT’s on real data and by nearly 20% for DFT’s on complex
`data when compared to currently used algorithms. Atlast, its structure is simple enough sothat it should
`lead to efficient implementations.
`It is worth noting that the proposed algorithm, while leading to the same computational complexity
`for the FFT as the onein [20], is not isomorphic. Interestingly, the algorithm in [20] has been aroundfor
`15 years but was seldom referenced and even less used. Meanwhile, literature was published on amelior-
`ations of the Cooley-Tukey FFT which leads to less efficient algorithms than the one in [20].... More
`recently, actually just as the final copy was being sent to the publisher, yet another algorithm appeared
`that leads to the same complexity for the complex FFT [21].
`Investigations are under way in order to generalize the above ideas to other sinusoidal transforms and
`to higher dimensions, as well as to prove the efficiency of an implementation.
`‘
`
`Acknowledgements
`
`The authors would like to thank one of the anonymous reviewers for bringing [20] and [21] to their
`attention as well as the “Fonds National Suisse de la Recherche Scientifique’ for supporting this research.
`
`References
`
`[1] C. Runge, and H. Konitz, Vorlesungen iiber Numerisches Rechnen, Springer Verlag, Berlin, 1924.
`[2] J.W. Cooley, and J.W. Tukey, “An algorithm for the machine calculation of complex fourier series", Math. of Comput., Vol.
`19, pp. 297-301, April 1