`
`HUAWEI EX
`
`1021 -1/15
`
`HUAWEI EX. 1021 -1/15
`
`
`
`SIGNAL PROCESSING
`
`CONTENTS
`
`Volume 6 (1984) No. 4
`
`Regular Papers
`
`M. Vetterli and H. Nussbaumer, Simple FFT and DOT algorithms with reduced number
`of operations
`
`J. B. Martens, Convolution algorithms, based on the CRT (Chinese Remainder
`Theorem)
`
`T. GUngen and N. C. Geckini, An algorithm for formant tracking
`
`267
`
`279
`
`293
`
`E. I. Plotkin, L. M. Roytman and N. M. S. Swamy, Unbiased estimation of an initial phase 301
`
`F. F. Liedtke, Computer simulation of an automatic Classification procedure for
`digitally modulated communication signals with unknown parameters
`
`Short Communications
`
`(3. H. Allen, Programming an efficient radix-four FFT algorithm
`
`J. Le Roux, Non symmetric lattice structure for pole zero filters
`
`Book Review
`
`Announcements + Call for Papers
`
`Author Index
`
`311
`
`325
`
`331
`
`387
`
`339
`
`343
`
`
`
`WWW_N.WWM...W.W-M.
`
`HUAWEI EX. 1021 -2/15
`
`HUAWEI EX. 1021 -2/15
`
`
`
`SIGNAL PROCESSING
`
`i
`
`6
`
`A European Journal devoted to the methods a
`A publication of the European Association for
`
`nd applications of signal processing
`Signal Processing (EURASIP)
`
`Editor-ianhiet
`Murat Kunt
`Laboratoire de traitoment de signaux
`Ecole polytechnique ledcrale de Lausanne
`l6, Chemin de Beilerive
`CH — 1007 Lausanne Switzerland
`Tel: 021 47 26 26 Telex: 24478
`Editorial Board
`M. Bellanger (Le Plessis Robinson, France)
`R. Boitc (Mons, Belgium)
`C. Braccini (Geneva, Italy)
`V. Cappellini (Florence, Italy)
`G. Carayannis (Athens, Greece)
`A, G, Constantinides (London, UK.)
`F. de Coulon (Lausanne, Switzerland)
`T. S. Durrani (Glasgow. UK.)
`W. Endres (Darmstadt, West Germany)
`B. Escudie (Lyon, France)
`
`O. D. Faugeras (Paris, France)
`A. L, Fawe (Sart Tilman, Belgium)
`A. Fettweis (Boctrum, West Germany)
`J. G. Gander (Solothurn, Switzerland)
`C, Gazanhes (Marseille, France)
`D. Godard (La Gaude, France)
`G. H, Granlund (Linkoeping, Sweden)
`C. Gueguen (Paris. France)
`T. S. Huang (West Lafayette. USA)
`Z. Kutpa (Warsaw, Poland)
`J, L. Lacoume (Grenoble, France)
`M. A. Lagunan-Hernandes
`(Barcelona, Spain)
`D. S. Lebedev (Moscow, USSR)
`S Levialdi (Napoli, Italy)
`C. E. Liedtke (Hannover, West Germany)
`J. Max (Grenoble, France)
`R. De Mori (Montreal, Canada)
`
`M Nagao (Kyoto, Japan)
`H. Niemenn (Erlangcn, West Germany)
`F. Petlandini (Neuchatel, Switzerland)
`8. Picinbono (Gif-sur~Yvette, France)
`8.8. Ramakrishna (Bangalore, India)
`F. Rocca (Milano, Italy)
`8. Sankur (lstanbut, Turkey)
`L. L, Scharf (Kingston, USA)
`J. C. Simon (Paris, France)
`C. Y. Suen (Montreal, Canada)
`Ju—Wei Tai (Peking, China)
`H. Tiziani (Heerbrugg, SWitzorIand)
`0. Well (Frankfurt, West Germany)
`G. Winkier (Kartsruhe, West Germany)
`|. T. Young (Delft, The Netherlands)
`P. Zamperoni (Braunschweig, West
`Germany)
`Secretary: |. Bezzi (Miss)
`
`
`Data Processing
`Editorial Policy. SIGNAL PROCESSING is a European Journal
`Remote Sensing
`presenting the theory and practice of signal processing.
`its
`Communication Signal Processing
`primary objectives are:
`Biomedical Signal Processing
`Dissemination of research results and of engineering develop-
`Geophysical and Astrophysical Signal Processing
`ments to European signal processing groups and individuals;
`Earth Resources Signal I‘rocessmg
`Presentation of practical solutions to current signal processing
`Acoustic and Vibration Signal Processing
`problems in engineering and science.
`Signal Processing Technology
`The editorial policy and the technical content of the Journal are
`Speech Processing
`the responsibility of the Editor—in—Chief and the Editorial Board.
`Radar Signal Processing
`The Journal is setfvsupporting from subscription income and
`Industrial Applications
`contains a minimum amount of advertisements. Advertisements
`are subject to the prior approval of the Editor—in—Chief
`
`Sonar Signal Processing
`SpeCiai Signal Processing
`New Applications
`
`Scope. SIGNAL PROCESSING incorporates all aSpects of the
`theory and practice of signal processing (analogue and digital).
`It features original research work, tutorial and review articles,
`and accounts of practical developments.
`It
`is intended for a
`rapid dissemination of knowledge and experience to engineers
`and scientists working in signal processing research, develops
`mentor practical application.
`
`Subject coverage. Subject areas covered by the Journal include:
`Signal Theory
`Signal Processing Systems
`Stochastic Processes
`Software Developments
`Detection and Estimation
`image Processing
`Spectral Analysis
`Pattern Recognition
`Filtering
`Optical Signal Processing
`
`Membership and Subscription Information.
`SignalProcessing (ISSN 01651684) is publishedin two volumes
`(8issues)ayear.Thesubscription pricefor1984 (Volumes6,7)is
`Dfl. 350.00 + Oil. 42.00 p.p.h : Dfl. 392.00. Our p.p.h. (postage,
`packing and handling) charge includes surface delivery of all
`issues, exceptto subscribers in the USA/Canada and India, who
`receive attissues by airdelivery(S.A LeSuri‘ace AirtLifted)atno
`extracost Fortherest'ofthewortd,airmailandS.A,L chargesare
`available upon request. Claims for missing issues will he
`honouredfreeofchargewithinthrcc monthsafterttre publication
`date of the issues. Mail orders and inquiries to: Eisevier Science
`Publishers B.V., Journals Department, PO. Box 211,1000 AE
`Amsterdam, The Netherlands. For full membership information
`of the Association. including a subscription at a reduced rate,
`please contact: EURASIP. PO. [Box 134, CH—1000 Lausanne 13,
`Switzerland,
`
`
`@1984, Elsevier Science Publishers B.V. (North—Holland)
`All rights reserved. No part oi this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means. electronic, rirechanicat,
`photocopying, recording or otherwise, without the prior permission of the publisher, Elsevier Science Publishers B.V. iNortthoiland), PO Box 1991, 1000 BZ
`Amsterdam. The Netherlands.
`Submission of an article for publication implies the transfer of the copyright from the author(s) to the publisher and entails the author's irrevocable and exclusive
`authorization of the publisher to collect any sums or considerations for copying or reproduction payable by third parties (as mentioned in article 17 paragraph 2 or the
`Dutch Copyright Act of 1912 and in the Royal Decree ot June 20, 1974 (S 351) pursuant to article 16b of the Dutch Copyright Act of 1912) and/orto act in or out of Court in
`connection therewith,
`Special regulations for readers in the U.S.A.
`this journal has been registered with the Copyright Clearance Center. Inc. Consent Is given for copying of articles for
`personal orinteinat use, or for the personal use of specific clients This consent is given on the condition that the copier paysthrouqh the Centerthe per-copy fee stated in
`107 or team the U 8. Copyright Law. the appropriatefeeshouid be iorwairted with
`the code on the first page of each article for copying beyond that permitted by Sections
`gress Street, Salem, MA 01970, U.S.A. If no code appears in an article, the. author has
`a copy of the first page of the article to the Copyright Clearance Center. Inc , 21 Con
`not given broad consent to copy and permission to copy must be obtained directly from the author All articles published prior to |981 may be copied for a perrcopy tee oi
`US $2 25, also payable through the Center [N Ft For review journals this fee is $0.25 per Lzopy per page.) This consent does not exteurl to other kinds oi copying, such as
`for general distribution, resale, advertising and promotion purposes, or forcreating new collective works. Special wri119n permission must be obtained from the publisher
`for such copying.
`Specie/regulations for authors in the USA. 7 Upon acceptance of an article by the ioumat, the authorts) will be asked to transfer copyright of the article to the publisher
`This transfer will ensure the widest possrbtc dissemination of intorrriation under the US. Copyright Law
`
`Published bimonthly
`
`Printed in The Netherlands
`
`HUAWEI EX. 1021 -3/15
`
`
`
`HUAWEI EX. 1021 -3/15
`
`
`
`
`This material may be protected by Copyright law (Title 17 U.S. Code)
`
`
`
`
`Signal Processing 6 (1984) 267~278
`NortheHolland
`
`r
`
`267
`
`SIMPLE FFT AND DCT ALGORITHMS WITH REDUCED NUMBER OF
`OPERATIONS
`
`/
`Martin VETTERLI, member EURASIP, and Henri J3NU§SBAUMER
`Luburaluire d’In/"ormarique Technique, Ecale Polyueehm'que Fédérale de Lausanne, 16 Chemin de Bellerive, CH-1007 Lauranne,
`Switzerland
`
`Received 2 November 1983
`Revised 20 February 1984
`
`Abstract. A simple algorithm for the evaluation of‘discrete Fourier transforms (DFT) and discrete cosine transforms (DC/1')
`is presented. This approach, based on the divide and conquer technique, achieves a substantial decrease in the number of
`additions when compared to currently used FFT algorithms (30% for a DFT on real data, 15% for a DFT on complex data
`and 25% fora DCT) and keeps the same number of multiplications as the best known FFT algorithms. The simple structure
`of the algorithm and the fact that it is best suited for real data (one does not have to take a transform of two real sequences
`simultaneously anymore) should lead to efficient implementations and to a wide range of applications
`
`Zusammenfassung. Ein einfacher Algorithmus zur Berechnung von diskreten Fourier Transformationen (DFT) und diskreten
`Cosinus Transformationen (DCT) wird vorgeschlagen. Diese Methode, basierend aufdcr “Tcilen und Losen” chhnik, crlaubt
`cine Verkleinerung der Anzahl Additionen gegeniiber gebraiichlichen FFT Algorithmen (30% fiir eine DFT von einem reellen
`Signal, 15% fiir cine DFT von eincm complcxcn Signal und 25% fur cinc DCT) und braucht glcichvicl Multiplikationcn wic
`die bcsten bckanntcn FFT Algorithmcn. Die einfache Struktur des Algorithmus und der Fakt dass er am besten fiir reelle
`Signale gceignct
`ist (man braucht nicht mehr gleichzeitig zwei reelle Signale 7u transformieren) sollten 7u eflizienter
`lmplementierung und zu zahlreichen Applikationen fuhren.
`
`Résumé. Un algorithme simple pour l’évaluation de la transformée de Fourier discréte (DFT) et de la transformée en cosinus
`discrete (DCT) est propose. Cette approche, basée sur la méthode de la “division et solution”, permet une diminution
`substantielle du nombre d’additions par rapport aux algorithmes de FFT courants (30% pour une DFT de signaux réels,
`15% pour une DFT de signaux complexes et 25% pour une DCT) tout en gardant un nombre de multiplications‘égal a celui
`des meilleurs algorithmes de FFT connus, La structure simple de l’algorithme ainsi que le fait qu‘il s’applique bien aux
`signaux réels (il n’y a plus besoin de prendre la transformée de deux signaux réels simultanément) devraient conduire a mic
`implantation et’ficace ainsi qu’z‘t un large champ d’applications.
`
`Keywords. Fast Fourier transform, fast cosine transform, transforms of real data.
`
`1. Introduction
`
`Since the rediscovery of the fast Fourier transform (FFT) algorithm [1, 2] for the evaluation of discrete
`Fourier transforms, several improvements have been made to the basic divide and conquer scheme as for
`example the mixed radix FFT [3] and the real factor FFT [4, 5]. The introduction of the Winograd Fourier
`transform (WFTA) [6], although a beautiful result in complexity theory, did not bring the expected
`improvements once implemented on real life computers [7], essentially due to the large total number of
`operations and to the structural complexity of the algorithm.
`The fact that most FFT’S are taken on real data is seldom fully taken into account. The algorithm using
`a FFT of half dimension for'the computation of a DFT on a real sequence [8] uses substantially more
`operations than the method of computing a single FFT on two real sequences simultaneously [9]. The
`
`0165-1684/84/$3.00 © 1984, ‘Elsevier Science Publishers B.V. (North—Holland)
`
`HUAWEI EX. 1021 -4/15
`
`
`
`tt
`
`
`
`
`kmNyummy...samuaMQMWaswiawwwvwmuv
`
`
`268
`
`M. Vetterli, H. J. Nurshaumer/ Simple FFT and DCT Algorithms
`
`latter method has the disadvantage that one has to take two DFT’s at once and that the sorting of the
`output uses additional adds. The fact that the input and output sequences are real is used explicitly in a
`real convolution algorithm [10] where the DFT and inverse DFT are computed with a single complex EFT.
`Another transform that is mostly applied to real data is the discrete cosine transform. Since the
`introduction of the DCT [11], the search for a fast algorithm followed two main different approaches
`One was to compute the DCT through a FFT of same dimension [12], where one is bound to take two
`transforms simultaneously. The other was a direct approach, leading to rather involved algorithms [13].
`It should be noted that the former technique outperforms all the latter ones when using optimal FFT’s,
`a fact often left in the dark [14].
`_
`Recently, evaluation of signal processing algorithms has shifted away from multiplication counts alone
`to the counting of the total number of operations, including data transfers [15]. This is due to the fact
`that the ratios (multiplication time)/(addition time) and (multiplication time)/(load time) are close to one
`on most computers and signal processors. Another growing concern has been the generation of time
`efficient software [16], and finally, the efficiency of an algorithm turns out to be a non—trivial combination
`of the various operation counts as well as of its structural complexity [17].
`In this communication, we address an old problem, namely, the efficient evaluation of DFT’s and DCT‘s
`of real data. Efficiency is meant in the sense of m’inimal number of multiplications and additions as well
`as in the sense of structural simplicity. As it turns out, the two problems are closely related, since a DFT
`of dimension N can be evaluated with two DCT’s of size N/4 and since a DCT of size N can be evaluated
`with a DFT of size N and additional operations. The same technique can be applied again to the reduced
`DCT of size NM and to the DFT of size N, and this until only trivial transforms are left over (N = 1, 2).
`This leads to an elegant recursive formulation of the two algorithms and to a number of multiplications
`identical
`to the best FFT‘s while diminishing substantially the number of additions (typically 30%).
`Interestingly, this last saving is partly kept when computing complex DFT’s, and as an example, the total
`number of operations for a 1024-point transform is nearly 10% below the number of operations required
`for a 1008-point WFTA. The prime factor FFT (PFA) requires about the same number of operations [18],
`but has a more complex structure.
`Note that the algorithms below were developed while searching for an efficient way to compute DCT’s
`of real data. The derived FFT algorithm for real data that follows immediately requires a number of
`multiplications identical to the one found in [19] (which is a variation of the Rader—Brenner algorithm),
`and a total number of operations that can be found in [20]. While obtaining an identical complexity, the
`derivations are quite different and the algorithm below seems more suitable for programming.
`Section 2 is used to derive the general algorithm and Section 3 evaluates its computational complexity.
`In Section 4, the results are compared to other algorithms and some implementation considerations are
`addressed.
`
`2. Derivation of the algorithms
`
`Let us define the following transforms ofthe length-N real vector x with elements x(0), x(l) -
`
`-
`
`- x(N — 1):
`
`Discrete Fourier transform
`N7.
`l
`
`DFT(k,N,x)I= z xtn)-e*12""k/N, k:0,...,N—l,
`n=0
`
`(1)
`
`where j : +«/:—1.
`Signal Processing
`
`HUAWEI EX. 1021 -5/15
`
`HUAWEI EX. 1021 -5/15
`
`
`
`M. Verterli, H. J. Nussbaumer / Simple FFT and DCT Algorithms
`
`Discrete cosine transform
`
`DCT(k,N,x):= z x(n)-cos<“(—”)—), k:o,...,N—1.
`
`2
`
`2 +1 k
`
`4N
`
`M.
`
`"40
`
`Cosine DFT
`
`cos-DFT(k, N, x)Z: : x(n)‘cos< “N1 > k=0,...,Nfi 1.
`
`~71
`"=0
`
`
`2
`k
`
`Sine DFT
`
`sin-DFT(k, N, x): Z x(n) - sin< T: ),
`
`N—l
`11:0
`
`
`2
`k
`
`kZO, .
`
`.
`
`.
`
`, N— l.
`
`269
`
`(2)
`
`(3)
`
`(4)
`
`Note that in the definition of the DCT, the normalizing factor 1/x/2 at k:0 is omitted for simplicity.
`While the transforms are usually only defined for k = O,
`.
`.
`.
`, N 7 1, one will see that other values are useful
`as well and can easily be obtained from the ones defined above.
`Obviously, the following relations hold:
`
`DFT(k, N, x) = cos-DFTUc, N, x) — j sin-DFT(k, N, x),
`
`DCT( N, N, x) : 0,
`
`DCTHC, N, x) = DCT(k, N, x),
`
`DCT(2N ~ k, N, x) = — DCT(k, N, x),
`
`c0s-DFT(N * k, N, x) = cos-DFT(k, N, x),
`
`sin-DFT(N~ k, N, x) = —sin-DFT(k, N, x).
`
`Looking at the evaluation of cos-DFT(k, N, x), we note that since the cosine function is even:
`
`N/271
`2
`k
`N/4~l
`2
`2 +1 k
`
`cos—DFT(k,N,x):
`
`"go x(2n)-cos(;;2)+ .20 (x(2n+l)+x(N—2n—l))‘cos<%%>,
`
`or, in a more succinct form:
`
`cos-DFT(k, N, x)= cos-DFT(k, N/2, x.) +DCT(k, N/4, x2),
`
`k : 0,. .
`
`.
`
`, N — l,
`
`with x1(n)=x(2n),
`
`n=0,...,N/271,
`
`x2(n)=x(2n+1)+x(N72nil), n=0,...,N/4il,
`
`(5)
`
`(6)
`
`(7)
`
`(8)
`
`(9)
`
`(10)
`
`(11)
`
`(12)
`
`where (8) or (9) are applied when necessary.
`Hence, the cosine DFT of dimension N has been mapped into a cosine DFT of N/2 and a DCT of
`N/4, at a cost of N/4 input and N/2 output additions (note that cos-DFT(k, N, x) is evaluated for
`k =0, .
`.
`.
`, N/2 but that k: N/4 does not require an output addition). The cosine DFT of N/2 can be
`handled similarly (and this until the transform becomes trivial) and the case of the DCT will be treated
`below.
`,
`,
`Turning to the sine DPT, we take a similar approach, using the fact that the sine function is odd.
`N/271
`
`
`sin—DFT(k, N, x)= Z x(2n) - sin<
`
`11:0
`
`n:
`
`21:17:) +NE:(xtzn+1)-X(N—2"”l))'Sin<Zl‘lgrlTj‘llLk)I
`
`(13)
`
`Vol. 6, No. 4, August 1984
`
`H'UAWEI EX. 1021 -6/15
`
`HUAWEI EX. 1021 -6/15
`
`
`
`
`awnwwsa
`
`
`
`i
`
`i
`i
`i1
`‘
`
`
`
`2 g E i
`
`i:
`
`§ EE3
`
`5g E§
`
`DCT(N/2, N, x) : cos(Tr/4) - cos-DFT(N/2, N, x4).
`
`(19)
`
`The matrix product in (19) can be evaluated with 3 multiplications and 3 additions (instead of 4
`multiplications and 2 additions) [9] as follows:
`
`_
`(fl>+in<zfil‘)
`””008 4N
`S
`4N”
`
`t, <M>_
`p2~s1n 4N
`
`(E115)
`COS 4N ,
`
`s, = cos-DFT(k, N, x4) +sin-DFT(k, N, x4),
`
`ml—sl
`Sigmll Processing
`
`,
`
`(21c)
`
`cos 4N ,
`
`HUAWEI EX. 1021 -7/15
`
`270
`
`M, Verrerlz, H.
`
`.l. Nussbaumer/ Simple FFT and DCT Algorithms
`
`Using the following identity:
`
`Sin<2m2nN+ mic) _(_1)n ' c05(2m2n +11)\(]N/4—k))’
`
`we can rewrite (l3)7 using (8) or (10) when necessary, as:
`
`sin-DFT(k, N, x)=sin-DFT(k, N/2, x,)+DCT(N/4— k, N/4, x3),
`
`k : 0, .
`
`.
`
`.
`
`, N— I,
`
`with x3(n):(—1)"-(x(2n+l)—x(N~2nil)).
`
`(14)
`
`(15)
`
`Therefore, the sine DFT of dimension N has been mapped into a sine DFT of N/2 and a DCT of N/4,
`at the cost of N/4 input and (N/2)—2 output additions (note that sin-DFT(k, N, x) is evaluated for
`k 2 0, .
`.
`.
`, N/2 but that k : 0, N/4 and N/2 do not require any output additions). The sine DFT of N/2
`is handled in a similar fashion until the length is reduced so that the transform becomes trivial.
`We now focus our attention on the computation of the DCT. Using the following mapping [12]:
`
`X4(n)= 96(2"),
`
`x4(N—n—l):x(2n+l),
`
`r1=O,...,N/2~l,
`
`the DCT can be evaluated as:
`
`N~l
`2
`4 +1 k
`DCT(k, N, x) = Z x401) - COS<M>,
`
`4N
`
`11:0
`
`kr—O, .
`
`’
`
`.
`
`. , N7 1.
`
`Using basic trigonometry, (17) becomes:
`
`DCT(k, N, x) : cos(i) ~cos-DFT(k, N, x4) —sin<l) -sin-DFT(lc, N, x4),
`
`2 k
`4N
`
`2 k
`4N
`
`With the symmetries of trigonometric functions, (18) can be computed as follows:
`
`(16)
`
`(l7)
`
`k = 0, .
`
`.
`
`.
`
`, N — 1.
`
`(18)
`
`C
`
`[ DCT(k, N, x) ],
`DCT(N—k, N, x) _
`
`and
`
`(217k)
`s 4N
`(21%)
`
`_
`
`sm 7 cos
`4N
`
`0- — —'-1n ——
`
`.
`
`b
`
`(Zwk)
`4N
`(211k)
`
`‘ [cos-DFT(k, N, x4)]
`~
`sin»DFT(k, N, x4)
`
`’
`
`kio
`
`""’
`
`N/2_1
`
`4N
`
`HUAWEI EX. 1021 -7/15
`
`
`
`M. Velterli, H. J. Nusshaumer/ Simple FFT and DCT Algorithms
`
`271
`
`m2 = sin-DFTUC, N, x4) - p1,
`
`m5 : cos—DFT(k, N, x4) ' p2,
`
`DCT(k, N; x) = ml — m2,
`
`DCT( N ,, k, N, x) : m1+ m3,
`
`'
`
`(20)
`
`where p. and p; are precomputed. Using all simplifications, the computation of (19) requires therefore a
`total of (3 N/2)72 multiplications and (3 N/2)V-~3 additions.
`Thus, we have shown how to map a N dimensional DFT into two DCT’s of N/4 (5, 12 and 15) and
`how to map a DCT of N into a DFT of same size (16 and 19). In other words, since the DCT is computed
`through a DFT, it is shown how to compute a DFT of N with 2 DFT’s of N/4 plus auxiliary operations,
`and this operation can be repeated until N has been reduced sufficiently so as to lead to trivial transforms.
`Figures 1
`to 4 try to visualize schematically the interaction of the various transforms. Combination of
`these figures in appropriate order show how to compute a transform of any dimension.
`
`DFT(N)
`
`cos-DFT(N)
`
`Fig.
`
`l. Decomposition ofa DFT of size N into a cosine DFT of size N and a sine Dl‘T of size N.
`
`sm—DFT(N)
`DCT(N/ll)
`
`cos—DFT(N)
`
`cos-DFltN/Z)
`
`Fig. 2. Decomposition iof a cosine DFT of size N into a cosine I)FT of size, N/Z and a DCT of size N/M4.
`Vol. 6, No. 4, August 1984
`
`H'UAWEI EX. 1021 -8/15
`
`HUAWEI EX. 1021 -8/15
`
`
`
`272
`
`M. Venerli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`SIN—DFT(N)
`
`suerFT(N/2)
`
`DCT(N/ll)
`SIN-DFT(N)
`
`Fig, 3. Decomposition of a sine DFT of size N into a sine DFT of size N/2 and a DCT of size N/4,
`
`DCT<N
`
`cosADF’HN)
`
`Fig. 4. Decomposition of a DCT or" size N into a cosine DFT of size N and a sine DFT of size N plus auxiliary operations.
`
`The algorithm is best suited for DFT’s on real data, but if the input is complex, one simply takes the
`real and imaginary parts separately (thus doubling the computational load) and evaluates the output with
`2N—4 auxiliary adds (k :0 and N/2 require no additions).
`
`’3. Computational complexity
`
`Even if the derivation above used only the fact that N was a multiple of 4, the algorithm, as other
`divide and conquer schemes, performs best when N is highly composite, typically a power of 24
`Below, we restate the number of operations required for the various steps needed in the evaluation of
`the transforms, where OM[-] and OAL'] stand for the number of multiplies and adds respectively.
`Signal Processing
`
`HUAWEI EX. 1021 -9/15
`
`HUAWEI EX. 1021 -9/15
`
`
`
`M. Vet/Erli, H, J. Nussbaumer/ Simple FFT and DOT Algorithms
`
`DFT on length-N real data:
`
`OM[R-DFT(N)] = OM[cos—DFT(N)] + OMLsin—DFT(N)J,
`-
`.
`_
`OA[R»DFT(N)] = OA[cos-DFT(N)] + OA[sin—DFT(N)j.
`DPT on length-N complex data:
`i
`
`0M[C-DFT(N)] = 2- (OM[cos-DFT(N)] +OM[sin-DFT(N)]),
`OA[C-DFT(N)] = 2 - (OA[cos-DFT(N)] + 0,4[sin-DFT(N)]) +2N —4.
`
`DCT 0n length-N real data:
`
`0M [DCT(N)] : OM[cos-DFT(N)j + OM Lsin~DFT(N)] +(3 N /2) ~ 2,
`OA[DCT(N)] = OA[cos-DFT(N)] + OA[sin-DFT(N)] +(3N/2)—3.
`
`Cosine DFT on length-N real data:
`
`OM[cos—DFT(N)] = OM[DCT(N/4)] + 0,; [cos—DFT(N/2)],
`0A[cos-DFT(N)] = OA[DCT(N/4)] + OA[cos-DFT(N/2)] +(3N/4).
`
`Sine DFT on length-N real data:
`
`OM[sin-DFT(N)] = 0M[DCT(N/4)] + OM[Sin-DFT(N/2)],
`OA[sin—DFT(N)] : OA[DCT(N/4)] + OA[sin-DFT(N/2)] +(3N/4) —2.
`
`273
`
`(21)
`
`(22)
`
`23
`
`24
`
`)
`
`)
`
`(
`
`(
`
`25
`
`(
`
`)
`
`From (21)—(25) one can compute recursively the number of operations needed for the various transform
`types and sizes greater than 2. For N : 1, no operations are required, and the values for N =2 are given
`in (26), thus defining the initial conditions for the above recursions.
`
`OM[R»DFT(2)] : OMLC-DFT(2)] = OM[cos-DI<T(2)] : OM[sin-DFT(2)]: 0,
`
`OM[DCT(2)] : 1,
`
`OA[R-DFT(2)] = 0A{DCT(2)] : OA[cos-DFT(2)] = 2,
`
`(26)
`
`oltc-szu : 4,
`
`OA[sin-DFT(2)] : 0.
`
`But, when N is a power of 2 and that the recursions are therefore applied Log; N times, the operation
`counts for the DCT reduces simply to:
`
`0A4[DCT(N)] : N/2 ' Lng (N),
`
`OA[DCT(N)]= N/2 - (3 Log2(N)—2)+ 1.
`
`From (27) it follows immediately with (21)—(23) that:
`
`OM R-DFTN =N 2-112 N —3 +2,
`[
`()l
`/
`(0g()
`)
`OA[R-DFT(N)]= N/2 - (3 Log2(N)75) +4,
`
`(27)
`
`(28)
`
`Vol. 6, No. 4, August I984
`
`HUAWEI EX. 1021 -10/15
`
`HUAWEI EX. 1021 -10/15
`
`
`
`‘53
`t1
`i
`
`
`,iist..i;s¢:.ii;mum£um,mwawmedaliWBWMMMss‘Wwwmm1:
`mmmmimammmmwmmwmmmmmm.mmme
`
`2:
`
`274
`and
`
`M. Verterli, H. J. Nussbaumer/ SimpIe FFT and DCT Algorithms
`
`OM[C-DFT(N)] = N‘ (Log2(N)—3) +4,
`.
`OA[C-DFT(N)];3N- (L0g2(N)—1)+4-
`
`29
`
`(
`
`)
`
`The above closed form expressions can now easily be compared to existing algorithms.
`
`4. Comparison with existing algorithms
`
`First, the introduced algorithm (which is called in the following fast Fourier-cosine transform or FFCT)
`is compared to other algorithms which work directly on real data. Then we look at the case where
`two transforms on real data are taken simultaneously with an optimal FFT algorithm. The case of the
`FFT on complex data is investigated as well, and the issue of algorithm structure is addressed. The
`operation counts for FFT’s are taken from [9] unless otherwise specified.
`For the computation of a DFT on real data, one can use the algorithm based on an FFT of N/2 [8].
`Using again the 3 mult/3 adds approach for the auxiliary output operations, one gets the following
`complexity:
`
`OMU‘FT(N)] = OM[FFT(N/2)] +(3 N/2) _ 2,
`0A[FFT(N)] : OA[FFT(N/2)] +(9N/2)72.
`
`(30)
`
`Table 1 compares this result with the FFCT and shows the substantial savings that are obtained.
`
`Table I
`
`Comparison of direct computation of a DFT on real data with a FFT of N/2 or
`the FFCT algorithm (the FFT of N/Z is computed with the Reader—Brenner algorithm)
`
`size
`FFT of N/2 +ops.
`FFCT algorithm
`
`
`N
`real mults
`real adds
`rcal mulls
`real adds
`
`
`20
`2
`42
`10
`8
`60
`10
`122
`26
`16
`164
`34
`290
`66
`32
`420
`98
`710
`162
`64
`1 028
`258
`1 678
`386
`128
`2 436
`642
`3 870
`898
`256
`5636
`1538
`8766
`2050
`512
`12804
`3586
`19 582
`4610
`1024
`
`
`
`
`10 242 43 262 8 1942 048 28 676
`
`.
`
`Turning to the DCT, one can compare the FFCT to fast discrete cosine algorithms which perform the
`transform directly, without going to the FFT (and therefore to the need to perform 2 transforms on real
`data at the same time). The various operation counts are shown in Table 2. The number of multiplications
`for the FFCT is always below the other algorithms, and even if the number of additions is slightly above
`the Chen et a]. version, the total number of operations is always substantially less.
`Signal Processing
`
`HUAWEI EX. 1021 -11/15
`
`HUAWEI EX. 1021 -11/15
`
`
`
`M. Vellerli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`275
`
`Table 2
`
`Comparison of direct computation of a DCT on real data with the algorithms of Chen el al. [13], Wang et a1. [14]
`and the FFCT (the operation counts for the two first algorithms are taken from [14])
`
`
`size
`Chen et a1.
`.
`Wang er a1.
`FFCT
`
`N
`r. mults
`I‘. adds
`1'. mults
`1'. adds
`r. mults
`r. adds
`
`
`8
`16
`32
`64
`128
`256
`512
`1024
`2 048
`
`16
`44
`116
`292
`708
`1668
`3844
`8 708
`I9 460
`
`26
`74
`194
`482
`1154
`2 690
`6146
`13 826
`30 722
`
`13
`35
`9|
`227
`547
`1283
`2947
`6 659
`I4 851
`
`29
`83
`219
`547
`1315
`3 075
`7043
`15 875
`35 331
`
`I2
`32
`80
`192
`448
`1024
`2304
`5120
`11264
`
`29
`81
`209
`513
`1217
`2 817
`6401
`14 337
`3| 745
`
`The other general approach to transforms on real data is to compute simultaneously the transform of
`two real sequences. Beside the drawback that one has always to compute two transforms (otherwise the
`approach is suboptimal), the sorting of the output requires about N output additions. The operation count
`for a DFT on real data are given below:
`
`OM[R-DFT(N)]: 1/2 - OM[FFT(N)],
`
`OA[R-DFT(N)] 2 1/2. OA[FFT(N)]+ N —2.
`
`(31)
`
`Table 3 compares (31) with the FFCT, and, as can be seen, the multiplication count is identical while a
`saving of about 30% is made with regard to additions.
`The situation is quite similar when computing a DCT with an FFT [12]. It is assumed that 2 DCT‘s on
`real data are evaluated at the same time. Thus, the operation count is:
`
`OmiR-DFT(N)]: 1/2 - OM[FFT(N)] +(3 N/2)*2,
`OA[R~DFT(N)]=1/2- OA[FFT(N)]+(5N/2)i 5.
`
`Table 3
`
`Comparison of the computation of a DFT on real data with an FFT of N on two
`data sequences simultaneously and output additions or the FFCT algorithm (the
`FFT of N is computed with the Radar—Brenner algorithm)
`
`size
`
`FFT ofNi adds
`
`FFCT algorithm
`
`N
`real mults
`real adds
`real mults
`real adds
`
`
`20
`2
`32
`2
`8
`60
`10
`88
`10
`16
`164
`34
`242
`34
`32
`420
`98
`614
`98
`64
`1028
`258
`1486
`258
`128
`2 436
`642
`3 486
`642
`256
`5 636
`1538
`7 998
`1538
`512
`12 804
`3 586
`18 046
`3 586
`1024
`
`
`
`
`819440.19081942 048 28 676
`
`(32)
`
`Vol. 6, No, 4, August 1984
`
`HUAWEI EX. 1021 -12/15
`
`HUAWEI EX. 1021 -12/15
`
`
`
`276
`
`M. Vellerli, H. J. Nussbaumer/ Simple FFT and DCTAIgUriI/Ims
`
`A technique similar to the one used in (23) was used in the computation ofthe output of the DCT. The
`results are compared in Table 4 where it is seen that the additions are reduced in the FFCT by about
`25% with an identical number of multiplies.
`Table 4
`
`Comparison of the computation of a DCT on real data with the algorithm using
`an FFT on two data sequences simultaneously or the FFCT (the FFT is
`the
`Raider—Brenner version)
`
`
`size
`FFT of N tops.
`FFCT algorithm
`
`N
`real mults
`real adds
`real mults
`real adds
`
`12
`41
`12
`29
`8
`32
`109
`32
`81
`16
`80
`287
`80
`209
`32
`192
`707
`192
`513
`64
`448
`1675
`448
`1217
`128
`| 024
`3 867
`1 024
`2 817
`256
`2304
`8763
`2304
`6401
`512
`5120
`19579
`5120
`14 337
`1024
`
`
`11 264 43 259 11 2642 048 31 745
`
`
`
`Turning to the computation ofa DFT on a complex input, one can use (29) for the FFCT. The comparison
`to the Rader—Brenner FFT is given in Table 5 where the number of multiplies turns out to be identical
`while the number of additions is reduced by nearly 20%.
`Looking at last at the Winograd Fourier transform, one sees that even if the number of multiplies is
`larger in the FFCT case, the total number of operations is smaller for large transforms (for example 5%
`for the WFTA(504)/FFCT(512) and 9% for the WFTA(1008)/FFCT(1024) comparison). Note that the
`gain is less in the PFA case (5% and 2% respectively).
`Concerning the WFTA and the PFA, one recalls that its main drawback is the involved structure of the
`algorithm. Even if the structure of the FFCT is not as straightforward as the radix-2 FFT,
`it remains
`simple (it is similar to the structure of the Rader—Brenner algorithm). In the FFCT, the reduction in the
`number of operations is obtained at the cost of additional permutations and data transfers. Even if these
`
`Table 5
`
`Comparison ofthe Rader—Brenner FFT with the FFCT when applied to complex data
`
`
`
` N real mults real adds real mults real adds
`
`
`
`
`
`
`
`
`FFT of Nsize FFCT algorithm
`
`8
`4
`52
`4
`52
`16
`20
`148
`20
`148
`32
`68
`424
`68
`388
`64
`196
`1 104
`196
`964
`128
`516‘
`2720
`516
`2308
`256
`,1 2X4
`6 464
`1 284
`5380
`512
`3 076
`14 976
`3 076
`12 292
`1024
`7172
`34 048
`7 172
`27 652
`2048
`16388
`76 288
`16388
`61444
`
`
`
`Signal Processing
`
`HUAWEI EX. 1021 -13/15
`
`HUAWEI EX. 1021 -13/15
`
`
`
`
`
`M. Vetterli, H. J. Nuxsbaumer/ Simple FFT and DCTAIgaz-ithms
`
`277
`
`permutations can be grouped with previous ones (for example at the entry of the DCT), the resulting
`Structure is more complex. But this increase in topological complexity is annihilated by the substantial
`decrease in arithmetic complexity.
`Looking at the implementation, one sees the importance of using very efficient small transforms. Together
`With the explicit coding technique [16], this should lead to fast code for real transforms, especially on
`micro and signal processors, where the number of arithmetic and data registers is small (leading to
`unetficient implementations of the complex transform on two real sequences version).
`
`5. Concluding remarks
`
`We have introduced a simple, recursive algorithm for the computation of the discrete Fourier transform
`and the discrete cosine transform. First, this algorithm can be applied directly to real data, providing
`therefore an attractive alternative to the method using a complex transform oftwo real sequences. Secondly,
`it uses the same number of multiplies as the best structured FFT algorithms but decreases the number of
`additions by about 25 to 30% for DETS and DCT’s on real data and by nearly 20% for DFT’s on complex
`data when compared to currently used algorithms. At last, its structure is simple enough so that it should
`lead to efficient implementations.
`It is worth noting that the proposed algorithm, while leading to the same computational complexity
`for the FFT as the one in [20], is not isomorphic. Interestingly, the algorithm in [20] has been around for
`15 years but was seldom referenced and even less used. Meanwhile, literature was published on amelior-
`ations of the Cooley—Tukey FFT which leads to less efficient algorithms than the one in [20] .
`.
`.
`. More
`recently, actually just as the final copy was being sent to the publisher, yet another algorithm appeared
`that leads to the same complexity for the complex FFT [21].
`Investigations are under way in order to generalize the above ideas to other sinusoidal transforms and
`to higher dimensions, as well as to prove the efficiency of an implementation.
`1
`
`Acknowledgements
`
`The authors would like to thank one of the anonymous reviewers for bringing [20] and [21] to their
`attention as well as the “Fonds National Suisse de la Recherche Scientifique” for supporting this research.
`
`References
`
`[l] C. Runge, and H, Konitz, VUrlesungen iiber Numerischer Rerhnen, Spring