`
`HUAWEI EX. 1127 - 1/15
`
`
`
`SIGNAL PROCESSING
`
`CONTENTS
`
`Volume 6 (1984) No. 4
`
`Regular Papers
`
`M. Vetterli and H. Nussbaumer, Simple FFT and DCT algorithms with reduced number
`of operations
`
`J. B. Martens, Convolution algorithms, based on the CRT (Chinese Remainder
`Theorem)
`
`T. Gilingen and N. C. Geckini, An algorithm f.or formant tracking
`
`267
`
`279
`
`293
`
`E. I. Plotkin, L. M. Roytman and N. M. S. Swamy, Unbiased estimation of an initial phase 301
`
`F. F. Lledtke, Computer simulation of an automatic classification procedure for
`digitally modulated communication signals with unknown parameters
`
`Short Communications
`
`(3. H. Allen, Programming an efficient radix-four FFT algorithm
`
`J. Le Roux, Non symmetric lattice structure for pole zero filters
`
`Book Review
`
`V
`
`Announcements + Call for Papers
`
`Author Index
`
`311
`
`325
`
`331
`
`387
`
`339
`
`343
`
`
`
`.a.a.=........,._........7.».,._.c._._,...~.m.....,..w.....-.,_
`
`H’ ~ 3) ‘$384
`
`E»
`
`§
`
`‘:> R. A R. Y
`
`HLJAWEI EX. 1127 - 2/15
`
`HUAWEI EX. 1127 - 2/15
`
`
`
`SIGNAL PROCESSING
`
`i
`
`I
`
`A European Journal devoted to the methods a
`A publication of the European Association for
`
`rid applications of signal processing
`Signal Processing (EURASIP)
`
`Editor-in-Chiei
`Murat Kunt
`Laberatoire de traitcment de signaux
`Ecole polytechnique iederale de Lausanne
`16, Chemin de Bellerive
`CH -1007 Lausanne Switzerland
`Tel: 021 47 26 26 Telex: 24478
`Editorial Board
`M. Bellanger (Le Plessis Robinson, France)
`Ft. Boitc (Mons, Belgium)
`C. Braccini (Genova, Italy)
`V. Cappellini (Florence, Italy)
`G. Carayannis (Athens, Greece)
`A. G. Constantinides (London, U.K.)
`F. de Coulon (Lausanne. Switzerland)
`T. S. Durrani (Glasgow. UK.)
`W. Endres (Darmstadt, West Germany)
`B. Escudié (Lyon, France)
`
`O. D. Faugeras (Paris, France)
`A. L. Fawe (Sari Tilman, Belgium)
`A. Fettwels (Boctium, West Germany)
`J. G. Gander (Soiothurn, Switzerland)
`0, Gazanhes (Marseilte, France)
`D. Godard (La Gaude, France)
`G. H Granlund (Linkoepirig, Sweden)
`C. Guegiien (Paris. France)
`T. S. Huang (West Lafayette, USA)
`Z. Kulpa (Warsaw, Poland)
`J. L. Lacoume (Grenoble, France)
`M. A. Lagunan—Hernandes
`(Barcelona, Spain)
`D. S. Lebedev (Moscow, USSR)
`S Levialdi (Napoli, Italy)
`C. E. Liedtke (Hannover, West Germany)
`J. Max (Grenoble, France)
`R. De Mori (Montreal, Canada)
`
`M Nagao (Kyoto, Japan)
`H. Niemann (Erlangen. West Germany)
`3. Peltandini (Neuchatel, Switzerland)
`8. Picinbono (Gif—sur~Yvette, France)
`B.S. Ramakrishna (Bangalore, India)
`3. Rocca (Milano. Italy)
`B. Sankur (Istanbul, Turkey)
`-. L. Scharf (Kingston, USA)
`J. C. Simon (Paris, France)
`C. Y. Siren (Montreal, Canada)
`Ju—Wei Tai (Peking, China)
`—I. Tiziani (Heerbrugg, Switzerland)
`D. Woll (Frankfurt, West Germany)
`G. Winkler (Karlsruhe, West Germany’)
`I. T. Young (Delft, The Netherlands)
`3. Zamperoni (Braunschweig, West
`Germany)
`Secretary: I. Bezzi (Miss)
`
`Editorial Policy. SIGNAL PROCESSlNG is a European Journal
`presenting the theory and practice of signal processing.
`its
`primary objectives are:
`Dissemination of research results and of engineering develop-
`ments to European signal processing groups and individuals;
`Presentation oi practical solutions to current signal processing
`problems in engineering and science.
`The editorial policy and the technical content of the Journal are
`the responsibility of the Editor—in—Chief and the Editorial Board.
`The Journal is self-supporting from subscription income and
`contains a minimum amount 01 advertisements. Advertisements
`are subject to the prior approval of the Editor—in—Chiet
`
`Scope. SIGNAL PROCESSING incorporates all aspects of the
`theory and practice of signal processing (analogue and digital).
`It Ieatures original research work, tutorial and review articles,
`and accounts of practical developments.
`It
`is intended for a
`rapid dissemination of knowledge and experience to engineers
`and scientists working in signal processing research, develop-
`mentor practical application.
`
`Subject coverage. Subject areas covered by the Journal include:
`Signal Theory
`Signal Processing Systems
`Stochastic Processes
`Software Developments
`Detection and Estimation
`image Processing
`Spectral Analysis
`Pattern Recognition
`Filtering
`Optical Signal Processing
`
`Data Processing
`Remote Sensing
`Communication Signal Processing
`Biomedical Signal Processing
`Geophysical and Astrophysical Signal Processing
`Earth Resources Signal Processing
`Acoustic and Vibration Signal Processing
`Signal Processing Technology
`Speech Processing
`Radar Signal Processing
`Industrial Applications
`
`Sonar Signal Processing
`Special Signal Processing
`New Applications
`
`Membership and Subscription Intormation.
`SignatProcessing (ISSN 01654684) is publishediri two volumes
`(8issues)ayear.Thesubscription pricefor1984 (Volumes6,7)is
`Dtl. 350.00 + Dll. 42.00 p.p.h : Dfl. 392.00. Our p.p.h. (postage,
`packing and handling) charge includes surface delivery oi all
`issues, exceptto subscribers in the USA/Canada and India, who
`receive allissues by airdelivery(S.A L.~Suri‘ace AirlLifted)atno
`extracost Fortherestoftheworld,airmailandS_A.L chargesare
`available upon request. Claims for missing issues will he
`honouredfreeofchargewithinthrcc monthsaftertlie publication
`date of the issues. Mail orders and inquiries to: Elsevier Science
`Publishers B.V., Journals Department, PO. Box 211,1000 AE
`Amsterdam, The Netherlands. For trill membership information
`of the Association. including a subscription at a reduced rate,
`please contact: EURASIP, PO. [Box 134, CH—1000 Lausanne13,
`Switzerland.
`
`©1984, Elsevier Science Publishers B.V. (North—Holland)
`All rights reserved. No part oi this publication may be reprodiiced, stored in a retrieval system or transmitted in any ‘lorm or by any means. electronic, mechanical,
`photocopying, recording or otherwise, without the prior permission oi the publisher, Elsevier Science Publishers B.V. INorthvHoi|and), PO Box 1991, 1000 BZ
`Amsterdam, The Netherlands.
`Submission of an article for publication implies the transier oi the copyright from the author(s) to the publisher and entails the author's irrevocable and exclusive
`authorization oi the publisher to collect any sums or considerations tor copying or reproduction payable by third parties (as mentioned in article 17 paragraph 2 of the
`Dutch Copyright Act of 1912 and in the Royal Decree of June 20, 1974 (S. 351) pursuant to article 16b of the Dutch Copyiight Act of 1912) and/orto act in or out of Court in
`connection therewith.
`Special regulations for readers in the U.S.A.
`this journal has been registered with the Copyright Clearance Center. inc. Consent is given for copying of articles for
`n on the condition that the copier paysthrouqh the Centerthe per-copy fee stated in
`personal orinteinal rise, or for the personal use of specific clients This consent is give
`107 or 10Bef the U 8. Copyright Law. the appropriateteeshould be iorwarrled with
`the code on the first page oi each article for copying beyond that permitted by Sections
`gress Street, Salem, MA 01970, U.S./\. If no code appears in an article, the author has
`a copy of the first page of the article to the Copyright Clearance Center. Inc , 21 Con
`not given broad consent to copy and permission to copy must be obtained directly from the author All articles published prior to I981 may be copied for a per—copy tee oi
`US $2 25, also payable through the Center [N B. For review journals this fee is $0.25 per copy per page.) This consent does not extenrl to other kinds of copying, such as
`for general distribution, resale, advertising and promotion purposes, or forcreatiiig new collective works. Speclalwritteir permission must be obtained from the publisher
`for such copying.
`Specialregu/atiorrs [or authors in the L/.S.A. — Upon acceptance 01 an article by the journal, the authoris) will be asked to transfer copyright of the article to the publisher
`This transfer will ensure the widest possible dissemination oi intorniaiion under the U.S. Copyright Law
`
`Published bimonthly
`
`Printed in The Netherlands
`
`HUAWEI EX. 112.7 - 3/15
`
`
`
`HUAWEI EX. 1127 - 3/15
`
`
`
`Signal Processing 6 (1984) 267—278
`North—l-lolland
`
`V
`
`267
`
`SIMPLE FFT AND DCT ALGORITHMS WITH REDUCED NUMBER OF
`OPERATIONS
`
`/
`Martin VETTERLI, mcmber EURASIP, and Henri J.lNU_§SBAUMER
`Luboraluire d’In/"urmariqzte Technique, Ecale Polyleclmique Fédérale de Lausanne, 16 Chemin de Bellerive, CH-1007 Lausanne,
`Switzerland
`
`Received 2 November 1983
`Revised 20 February 1984
`
`Abstract. A simple algorithm for the evaluation of‘discrete Fourier transforms (DFT) and discrete cosine transforms (l,)('.'l')
`is presented. This approach, based on the divide and conquer technique, achieves a substantial decrease in the number of
`additions when compared to currently used FFT algorithms (30% for a DFT on real data, 15% for a DFT on complex data
`and 25% fora DCT) and keeps the same number of multiplications as the best known FFT algorithms. The simple structure
`of the algorithm and the fact that it is best suited for real data (one does not have to take a transform of two real sequences
`simultaneously anymore) should lead to efficient implementations and to a wide range of applications.
`
`Zusammenfassung. Ein einfacher Algorithmus zur Berechnung von diskreten Fourier Transformationen (DFT) und diskreten
`Cosinus Transformationen (DCT) wird vorgeschlagen. Diese Methode, basierend aufdcr “Tcilen und Losen” Tcchnik, crlaubt
`eine Verkleinerung der Anzahl Additionen gegeniiber gebraiichlichen FFT Algorithmen (30% fiir eine DFT von einem reellen
`Signal, 15% fiir eine DFT von einem complcxcn Signal und 25% fur cine DCT) und braucht gleichvicl Multiplikationcn wic
`die bcsten bckanntcn FFT Algorithmcn. Die einfache Struktur cles Algorithmus und der Fakt dass er am besten fiir reelle
`Signale geeignet
`ist (man braucht nicht mehr gleichzeitig zwei reelle Signale 7u transformieren) snllten 7u eflizienter
`lmplementierung und zu zahlreichen Applikationen fiihren.
`
`Résumé. Un algorithme simple pour l’évaIuation de la transformée de Fourier discrete (DFT) et de la transformée en cosinus
`discrete (DCT) est propose. Cette approche, basée sur la méthode de la “division et solution”, permet une diminution
`substantielle du nombre d’additions par rapport aux algorithmes de FFT courants (30% pour une DFT de signaux réels,
`15% pour une DFT de signaux complexes et 25% pour une DCT) tout en gardant un nombre de multiplicationségal a celui
`des meilleurs algorithmes de FFT connus, La structure simple de l’algorithme ainsi que le fait qu‘il s’applique bien aux
`signaux réels (il n’y a plus besoin de prendre la transformée de deux signaux réels simultanément) devraient conduire a une
`implantation efficace ainsi qu’2‘t un large champ d’applications.
`
`Keywords. Fast Fourier transform, fast cosine transform, transforms of real data.
`
`1. Introduction
`
`Since the rediscovery of the fast Fourier transform (FFT) algorithm [1, 2] for the evaluation of discrete
`Fourier transforms, several improvements have been made to the basic divide and conquer scheme as for
`example the mixed radix FFT [3] and the real factor FFT [4, 5]. The introduction of the Winograd Fourier
`transform (WFTA) [6], although a beautiful result in complexity theory, did not bring the expected
`improvements once implemented on real life computers [7], essentially due to the large total number of
`operations and to the structural complexity of the algorithm.
`The fact that most FFT’s are taken on real data is seldom fully taken into account. The algorithm using
`a FFT of half dimension forthe computation of a DFT on a real sequence [8] uses substantially more
`operations than the method of computing a single FFT on two real sequences simultaneously [9]. The
`
`016,5-1684/84/$3.00 © 1984, ‘Elsevier Science Publishers B.V. (North—Hol1and)
`
`HUAWEI EX. 1127 -4/15
`
`HUAWEI EX. 1127 - 4/15
`
`
`
`t
`
`268
`
`M. Venerli, H. J. Nusshamner/ Simple FFT and DCT Algorithms
`
`latter method has the disadvantage that one has to take two DFT’s at once and that the sorting of the
`output uses additional adds. The fact that the input and output sequences are real is used explicitly in a
`real convolution algorithm [10] where the DFT and inverse DFT are computed with a single complex EFT.
`Another transform that is mostly applied to real data is the discrete cosine transform. Since the
`introduction of the DCT [11], the search for a fast algorithm followed two main different approaches.
`One was to compute the DCT through a FFT of same dimension [12], where one is bound to take two
`transforms simultaneously. The other was a direct approach, leading to rather involved algorithms [13].
`It should be noted that the former technique outperforms all the latter ones when using optimal FFT’s,
`a fact often left in the dark [14].
`,
`Recently, evaluation of signal processing algorithms has shifted away from multiplication counts alone
`to the counting of the total number of operations, including data transfers [15]. This is due to the fact
`that the ratios (multiplication time)/(addition time) and (multiplication time)/(load time) are close to one
`on most computers and signal processors. Another growing concern has been the generation of time
`efficient software [16], and finally, the efficiency of an algorithm turns out to be a non—trivial combination
`of the various operation counts as well as of its structural complexity [17].
`In this communication, we address an old problem, namely, the efficient evaluation of DFT’s and DCT’s
`of real data. Efficiency is meant in the sense of mlinimal number of multiplications and additions as well
`as in the sense of structural simplicity. As it turns out, the two problems are closely related, since a DFT
`of dimension N can be evaluated with two DCT’s of size N/4 and since a DCT of size N can be evaluated
`with a DFT of size N and additional operations. The same technique can be applied again to the reduced
`DCT of size N/4 and to the DFT of size N, and this until only trivial transforms are left over (N = 1, 2).
`This leads to an elegant recursive formulation of the two algorithms and to a number of multiplications
`identical
`to the best FFT‘s while diminishing substantially the number of additions (typically 30%).
`Interestingly, this last saving is partly kept when computing complex DFT’s, and as an example, the total
`number of operations for a 1024-point transform is nearly 10% below the number of operations required
`for a 1008-point WFTA. The prime factor FFT (PFA) requires about the same number of operations [18],
`but has a more complex structure.
`Note that the algorithms below were developed while searching for an efficient way to compute DCT’s
`of real data. The derived FFT algorithm for real data that follows immediately requires a number of
`multiplications identical to the one found in [19] (which is a variation of the Rader—Brenner algorithm),
`and a total number of operations that can be found in [20]. While obtaining an identical complexity, the
`derivations are quite difierent and the algorithm below seems more suitable for programming.
`Section 2 is used to derive the general algorithm and Section 3 evaluates its computational complexity.
`In Section 4, the results are compared to other algorithms and some implementation considerations are
`addressed.
`
`2. Derivation of the algorithms
`
`Let us define the following transforms ofthe length-N real vector x with elements x(0), x(l) -
`
`-
`
`- x(N -1):
`
`Discrete Fourier transform
`N—l
`
`DFT(k,N,x)'.= 2 xtn)-e“i2""“”, k=0,...,N—l,
`11:0
`
`(1)
`
`where j 2
`Signal Processiiig
`
`HUAWEI EX. 1127 - 5/15
`
`HUAWEI EX. 1127 - 5/15
`
`
`
`M. Verterli, H. J. Nussbaumer / Simple FFT and DCT Alguritlnns
`
`Discrete cosine transform
`
`DCT(k,N,x):= 2 x(n)-cos<&L), k=o,...,N—1.
`
`2
`
`2 +1 k
`
`~—'
`
`n*0
`
`Cosine DFT
`
`cos-DFT(k, N, x)I= 2 x(n)‘cos< "73
`
`N4
`n=0
`
`2
`
`k
`
`k=0,...,N—— 1.
`
`Sine DPT
`
`269
`
`(2)
`
`(3)
`
`I‘/—-I
`11:0
`
`2
`
`k
`
`(4)
`k:0,...,N—l.
`sin-DFT(k,N,x)I: 2 x(n)-sin< “L”
`Note that in the definition of the DCT, the normalizing factor 1/x/2 at k =0 is omitted for simplicity.
`While the transforms are usually only defined for k = 0,. .., N — 1, one will see that other values are useful
`as well and can easily be obtained from the ones defined above.
`Obviously, the following relations hold:
`
`DFT(k, N, x) = cos-DFT(k, N, x) —j sin-DFT(k, N, x),
`
`DCT( N, N, x) 2 0,
`
`DCT(—k, N, x) = DCT(k, N, x),
`
`DCT(2N~ k, N, x) = - DCT(k, N, x),
`
`c0s—DFT(N — k, N, x) = cos-DFT(k, N, x),
`
`sin-DFT(N ~ k, N, .1‘): —sin-DFT(k, N, x).
`
`Looking at the evaluation of cos-DFT(k, N, x), we note that since the cosine function is even:
`WM
`2
`k WM
`2
`2
`1 k
`
`cos—DFT(k,N,x)= IE0 x(2n)-cos(]:;2)+ "E0 (x(2n+l)+x(N—2n—l))-cos< >,
`
`or, in a more succinct form:
`
`cos-DFT(k, N, x)= cos-DFT(k, N/2, x.) +DCT(k, N/4, x2),
`
`k 2 0, .
`
`.
`
`.
`
`, N — l,
`
`with x1(n)=x(2n),
`
`n=O,...,N/2—1,
`
`x2(n)=x(2n+l)+x(N—2n—l), n=O,...,N/4—l,
`
`(5)
`
`(6)
`
`(7)
`
`(8)
`
`(9)
`
`(10)
`
`(11)
`
`(12)
`
`where (8) or (9) are applied when necessary.
`Hence, the cosine DFT of dimension N has been mapped into a cosine DFT of N/2 and a DCT of
`N/4, at a cost of N/4 input and N/2 Output additions (note that cos-DFT(k, N, x) is evaluated for
`k =0, .
`.
`.
`, N/2 but that Ic: N/4 does not require an output addition). The cosine DFT of N/2 can be
`handled similarly (and this until the transform becomes trivial) and the case of the DCT will be treated
`below.
`I
`,
`Turning to the sine DPT, We take a similar approach, using the fact that the sine function is odd.
`N/2—1
`
`sin—DFT(k, N, x)= Z x(2n) - sin<2]:]T;1£() +N/iii (x(2n + l)—x(N—2n ~ 1)) - sin< 27r(2n+l)k
`4-N/4
`
`n:
`
`(13)
`
`Vol. 6, No. 4, August 1984
`
`HUAWEI EX. 1127 -6/15
`
`HUAWEI EX. 1127 - 6/15
`
`
`
`
`‘.'§¢K’..'\¥'§"~.\.:\k§..2
`
`
`
`t
`
`2
`‘
`
`
`
`2
`
`€31
`
`E:7
`
`3t Es
`
`DCT(N/2, N, x) 2 cos(~rr/4) - cos-DFT(N/2, N, x4).
`
`(19)
`
`The matrix product in (19) can be evaluated with 3 multiplications and 3 additions (instead of 4
`multiplications and 2 additions) [9] as follows:
`
`21-rk
`_
`21rk
`p12cos W +s1n W,
`
`211k
`21rk
`_
`p22s1n W —cos W,
`
`s, 2 cos-DFT(k, N, x4) +sin-DFT(k, N, x4),
`
`2TFk
`
`ml2s, ' COS W ,
`Signal Processing
`
`HUAWEI EX. 1127 - 7/15
`
`270
`
`M. Verrerlz, H.
`
`.1. Nussbrmmer/ Simple FFT and DCTAlgoriII1ms
`
`Using the following identity:
`
`Sin( > __ (._1)" . C0S< )’
`
`we can rewrite (13), using (8) or (10) when necessary, as:
`
`sin-DFT(k, N, x)2sin-DFT(k, N/2, x,)+DCT(N/4- k, N/4, x3),
`
`k 2 0, .
`
`.
`
`.
`
`, N— I,
`
`with x3(n)2(—1)"-(x(2n+1)—x(N~2n—l)).
`
`(14)
`
`(15)
`
`Therefore, the sine DFT of dimension N has been mapped into a sine DFT of N/2 and a DCT of N/4,
`at the cost of N/4 input and (N/2)—2 output additions (note that sin-DFT(k, N, x) is evaluated for
`k 2 0, .
`.
`.
`, N/2 but that k 2 0, N/4 and N/2 do not require any output additions). The sine DFT of N/2
`is handled in a similar fashion until the length is reduced so that the transform becomes trivial.
`We now focus our attention on the computation of the DCT. Using the following mapping [I2]:
`
`X4(")= x(2"),
`
`x4(N—n—l)2x(2n+l), n20,...,N/2~l,
`the DCT can be evaluated as:
`N"‘
`2
`4 +1 k
`DCT(k, N, x)2 2 x4(n) - cos<—T11—)>,
`Using basic trigonometry, (17) becomes:
`2 k
`l)CT(k, N, x) : cos(i—) -cos-DFT(k, N, x4) —sin<L) -sinmDFT(Ic, N, x4),
`4N
`
`n2O
`
`/«:0, .
`
`i
`
`. ., N— 1.
`
`With the symmetries of trigonometric functions, (18) can be computed as follows:
`
`2 k
`4N
`
`(16)
`
`(17)
`
`k = 0, .
`
`.
`
`.
`
`, N -1.
`
`(18)
`
`0- T —''m ——
`
`C
`
`[ DCT(k, N, x)
`DCT(N—k, N, x)
`
`]_
`
`and
`
`(Zwk)
`5 4N
`_ (wk)
`
`Sln 7 cos
`4N
`
`_ <2'rrk)
`4N
`<21'rk>
`
`°
`
`_ [cos-DFT(k, N, W]
`-
`sin»DFT(k, N, x4)
`
`’
`
`kfo
`
`""’
`
`N/2_1
`
`4N
`
`HUAWEI EX. 1127 - 7/15
`
`
`
`M. Velterli, II. J. Nusshaumer/ Simple FFT and DCT Algorithms
`
`271
`
`m2 = sin-DFT(k, N, x4) - pl,
`
`m5 : cos—DFT(k, N, x4) - P2,
`
`DCT(k, N," x) = m, — Inz,
`
`DCT( N —— k, N, x) =: m, + m3,
`
`'
`
`(20)
`
`where p. and p2 are precomputed. Using all simplifications, the computation of (19) requires therefore a
`total of (3 N/2)—2 multiplications and (3 N/2)e~/3 additions.
`Thus, we have shown how to map a N dimensional DFT into two DCT’s of N/4 (5, l2 and 15) and
`how to map a DCT of N into a DFT of same size (16 and l9). In other words, since the DCT is computed
`through a DFT, it is shown how to compute a DFT of N with 2 DFT’s of N/4 plus auxiliary operations,»
`and this operation can be repeated until N has been reduced sufficiently so as to lead to trivial transforms.
`Figures 1
`to 4 try to visualize schematically the interaction of the various transforms. Combination of
`these figures in appropriate order show how to compute a transform of any dimension.
`
`DFT(N)
`
`cos-DFT(N)
`
`Fig. 1. Decomposition ofa DFT of size N into a cosine DFT of size N and a sine L)l<T of size N.
`
`sm-DFT(N)
`DCT(N/ll)
`
`cos—lJFT(N)
`
`cos—DFl(N/2)
`
`Fig. 2. Decomposition of a cosine DFT of size N into a cosine I)FT of size, N/2 and a DCT of size N/M4.
`Vol. 6, No. 4, August 1984
`
`HUAWEI EX. 1127 - 8/15
`
`HUAWEI EX. 1127 - 8/15
`
`
`
`272
`
`M. Venerli, H. J. Nu.vsbamner/ Simple FFT and DCT /llgorirhms
`
`siN—DFT(N)
`
`stN—DT(N/2)
`
`DCT(N/ll)
`s1N-DFT(N)
`
`Fig. 3. Decomposition of a sine DFT of size N into a sine DFT of size N/2 and a DCT of size N/4,
`
`DCT(N
`
`cos4DFT(N)
`
`Fig. 4. Decomposition of a DCT of‘ size N into a cosine DFT of size N and a sine DFT of size N plus auxiliary operations.
`
`The algorithm is best suited for DFT’s on real data, but if the input is complex, one simply takes the
`real and imaginary parts separately (thus doubling the computational load) and evaluates the output with
`2N—4 auxiliary adds (k =0 and N/2 require no additions).
`
`*3. Computational complexity
`
`Even if the derivation above used only the fact that N was a multiple of 4, the algorithm, as other
`divide and conquer schemes, performs best when N is highly composite, typically a power of 2.
`Below, we restate the number of operations required for the various steps needed in the evaluation of
`the transforms, where ()M[-] and OAL-] stand for the number of multiplies and adds respectively.
`Signal Processing
`
`HUAWEI EX. 1127 - 9/15
`
`HUAWEI EX. 1127 - 9/15
`
`
`
`M. Vetlerli, H. J. Nu:s'baumer/ Simple FFT and DCT Algm‘ilhm.§
`
`DFT on length-N real data:
`
`OM[R-DFT(N)] = OM[cos—DFT(N)] + OM[sin—DFT(N)j,
`-
`—
`_
`_
`OA[R»DFT(N)] = OA[cos-DFT(N_)] + OA[s1n—DFT(N)j.
`DFT on length-N complex data:
`1
`
`0M[C-DFT(N)] = 2- (OM[c0s-DFT(N)] +OM[sin-DFT(N)]),
`OA[C-DFT(N)] = 2 - (OA[cos-DFT(N)] + OA[sin-DFT(N)]) +2N -4.
`
`DCT on length-N real data:
`
`OM [DCT(N)] = OM[cos-DFT(N)j + O.v1|_Sln~DFT(N)] +(3 N /2) ~ 2,
`
`OA[DCT(N)] = OA[cos-DFT(N)] + OA[sin-DFT(N)] +(3 N/2) — 3.
`
`Cosine DPT on length-N real data:
`
`OM[cos-DFT(N)] = 0M[DCT(N/4)] + 0,‘; [cos-DFT(N/2)],
`
`0A[cos-DFT(N)] = OA[DCT(N/4)] + O,,[cos-DFT(N/2)] +(3 N/4).
`
`Sine DFT on length-N real data:
`
`OM[sin—DFT(N)] = 0M[DCT(N/4)] + ()M[sin-DFT(N/2)],
`OA[sin—DFT(N)] = OA[DCT(N/4)] + OA[sin-DFT(N/2)] +(3 N/4) — 2.
`
`273
`
`(21)
`
`(22)
`
`(23)
`
`(24)
`
`25
`
`(
`
`)
`
`From (21)—(25) one can compute recursively the number of operations needed for the various transform
`types and sizes greater than 2. For N = 1, no operations are required, and the values for N =2 are given
`in (26), thus defining the initial conditions for the above recursions.
`
`OM[R~DFT(2)] = OMLC-DFT(2)] = OM[cos-Dl<T(2)] : OM[sin-DFT(2)]: 0,
`
`()M[DCT(2)] : 1,
`
`OA[R-DFT(2)] = OA{DCT(2)] = OA[cos-DFT(2)] = 2,
`
`(26)
`
`OA[C-DFT(2)] = 4,
`
`OA[sin-DFT(2)] = 0.
`
`But, when N is a power of 2 and that the recursions are therefore applied Log; N times, the operation
`counts for the DCT reduces simply to:
`
`0M[DCT(N)] = N/2 ' Logz (N),
`
`O,\[DCT(N)]= N/2 - (3 Log2(N)—2) + 1.
`
`From (27) it follows immediately with (21)—(23) that:
`
`OM[R-DFT(N)]= N/2 - (I.og2(N)—3)+2,
`OA[R-DFT(N)]= N/2 - (3 Log2(N)—5)+4,
`
`(27)
`
`(28)
`
`Vol. 6, No. 4, August I984
`
`HUAWEI EX. 1127 - 10/15
`
`HUAWEI EX. 1127 - 10/15
`
`
`
` 32,
`
`4‘l
`'4
`
`274
`and
`
`M. Verterli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`OM[C-DFT(N)] = N~ (Log;(N) — 3) +4,
`OA[C-DFT(N)] ; 3N- (Log2(N) — 1) +4.
`
`(29)
`
`The above closed form expressions can now easily be compared to existing algorithms.
`
`4. Comparison with existing algorithms
`
`First, the introduced algorithm (which is called in the following fast Fourier-cosine transform or FFCT)
`is compared to other algorithms which work directly on real data. Then we look at the case where
`two transforms on real data are taken simultaneously with an optimal FFT algorithm. The case of the
`FFT on complex data is investigated as well, and the issue of algorithm structure is addressed. The
`operation counts for FFT’s are taken from [9] unless otherwise specified.
`For the computation of a DFT on real data, one can use the algorithm based on an FFT of N/2 [8].
`Using again the 3 mult/3 adds approach for the auxiliary output operations, one gets the following
`complexity:
`
`0M[FFT(N)] = OM[FFT(N/2)] +(3 N/2) _ 2,
`0A[FFT(N)] = O,\[FFT(N/2)] +(9N/2)—2.
`
`(30)
`
`Table l compares this result with the FFCT and shows the substantial savings that are obtained.
`
`Table I
`
`Comparison of direct computation of a DFT on real data with a FFT of N/2 or
`the FFCT algorithm (the FFT of N/2 is computed with the Rader—Brenner algorithm)
`
`size
`
`N
`
`8
`I6
`32
`64
`I28
`256
`512
`I024
`2 048
`
`FFT of N/2 +ops.
`
`FFCT algorithm
`
`real mults
`
`real adds
`
`real mults
`
`real adds
`
`10
`26
`66
`162
`386
`898
`2050
`4610
`10 242
`
`_
`
`42
`I22
`290
`710
`l 678
`3 870
`8766
`I9 582
`43 262
`
`2
`10
`34
`98
`258
`642
`1538
`3586
`8 194
`
`20
`60
`164
`420
`l 028
`2 436
`5636
`12804
`28 676
`
`Turning to the DCT, one can compare the FFCT to fast discrete cosine algorithms which perform the
`transform directly, without going to the FFT (and therefore to the need to perform 2 transforms on real
`data at the same time). The various operation counts are shown in Table 2. The number of multiplications
`for the FFCT is always below the other algorithms, and even if the number of additions is slightly above
`the Chen et al. version, the total number of operations is always substantially less.
`Signal Processing
`
`HUAWEI EX. 1127 - 11/15
`
`HUAWEI EX. 1127 - 11/15
`
`
`
`M. Vetlerli, H. J. Nussbaumer/ Simple FFT and DCT Algorithms
`
`275
`
`Table 2
`
`Comparison of direct computation of a DCT on real data with the algorithms of Chen el al. [13], Wang er a1. [14]
`and the FFCT (the operation counts for the two first algorithms are taken from [14])
`
`size
`
`N
`
`8
`16
`32
`64
`128
`256
`512
`1024
`2048
`
`Chen el al.
`
`.
`
`Wang er (11.
`
`FFCT
`
`r. mults
`
`I‘. adds
`
`1‘. mults
`
`I‘. adds
`
`r. mults
`
`r. adds
`
`16
`44
`116
`292
`708
`1668
`3844
`8 708
`194611
`
`26
`74
`194
`482
`1154
`2 690
`6146
`13 826
`30722
`
`13
`35
`91
`227
`547
`1283
`2947
`6 659
`14851
`
`29
`83
`219
`547
`1315
`3 075
`7043
`15 875
`35 331
`
`12
`32
`80
`192
`448
`1024
`2304
`5120
`11264
`
`29
`81
`209
`513
`1217
`2 817
`6401
`14 337
`31745
`
`Tl1e other general approach to transforms on real data is to compute simultaneously the transform of
`two real sequences. Beside the drawback that one has always to compute two transforms (otherwise the
`approach is suboptimal), the sorting of the output requires about N output additions. The operation count
`for a DFT on real data are given below:
`
`OM[R-DFT(N)]: 1/2 - OM[FFT(N)],
`OA[R-DFT(N)]:1/2- OA[FFT(N)]+ N -2.
`
`(31)
`
`Table 3 compares (31) with the FFCT, and, as can be seen, the multiplication count is identical while a
`saving of about 30% is made with regard to additions.
`
`The situation is quite similar when computing a DCT with an FFT [12]. It is assumed that 2 DCT’s on
`real data are evaluated at the same time. Thus, the operation count is:
`
`OM[R-DFT(N)]= 1/2 - OM[FFT(N)] +(3N/2)~2,
`OA[R-DFT(N)]=1/2- OA[FFT(N)]+(5N/2)— 5.
`
`Table 3
`
`Comparison of the computation of a DFT on real data with an FFT of N on two
`data sequences simultaneously and output additions or the FFCT algorithm (the
`FFT of N is computed with the Rader—Brenner algorithm)
`
`size
`
`N
`
`8
`16
`32
`64
`128
`256
`512
`1024
`2 048
`
`FFT ofN1 adds
`
`F1"-CT algorithm
`
`real mults
`
`real adds
`
`real mulls
`
`real adds
`
`2
`10
`34
`98
`258
`642
`1538
`3586
`8194
`
`32
`88
`242
`614
`1486
`3 486
`7 998
`18 046
`40.190
`
`2
`10
`34
`98
`258
`642
`1538
`3 586
`8194
`
`20
`60
`164
`420
`1028
`2 436
`5 636
`12 804
`28 676
`
`(32)
`
`Vol. 6, No. 4, August 1984
`
`HUAWEI EX. 1127 - 12/15
`
`HUAWEI EX. 1127 - 12/15
`
`
`
`276
`
`M. Vellerli, H. J. Nu.v.vbaumer/ Simple FFT and DCTAlguri(Im1s
`
`A technique similar to the one used in (23) was used in the computation ofthe output of the DCT. The
`results are compared in Table 4 where it is seen that the additions are reduced in the FFCT by about
`25% with an identical number of multiplies.
`Table 4
`
`Comparison of the computation of a DCT on real data with the algorithm using
`an FFT on two data sequences simultaneously or the FFCT (the FFT is
`the
`Rader—-Brenner version)
`
`size
`
`N
`
`8
`16
`32
`64
`128
`256
`S12
`1024
`2 048
`
`FFT of N Hips.
`
`FFCT algorithm
`
`real mults
`
`real adds
`
`real mults
`
`real adds
`
`12
`32
`80
`192
`448
`I 024
`2304
`5120
`11 264
`
`41
`109
`287
`707
`1675
`3 867
`8763
`19579
`43 259
`
`12
`32
`80
`192
`448
`1 024
`2304
`5120
`11 264
`
`29
`81
`209
`513
`1217
`2 817
`6401
`14 337
`31 745
`
`Turning to the computation ofa DFT on a complex input, one can use (29) for the FFCT. The comparison
`to the Rader—Brenner FFT is given in Table 5 where the number of multiplies turns out to be identical
`while the number of additions is reduced by nearly 20%.
`Looking at last at the Winograd Fourier transform, one sees that even if the number of multiplies is
`larger in the FFCT case, the total number of operations is smaller for large transforms (for example 5%
`for the WFTA(504)/FFCT(5l2) and 9% for the WFTA(l008)/FFCT(1024) comparison). Note that the
`gain is less in the PFA case (5% and 2% respectively).
`Concerning the WFTA and the PFA, one recalls that its main drawback is the involved structure of the
`algorithm. Even if the structure of the FFCT is not as straightforward as the radix-2 FFT,
`it remains
`simple (it is similar to the structure of the Rader—Brenner algorithm). ln the FFCT, the reduction in the
`number of operations is obtained at the cost of additional permutations and data transfers. Even if these
`
`Table 5
`
`Comparison ofthe Rader—Brenner FFT with the FFCT when applied to complex data
`
`FFT of N
`
`FFCT algorithm
`
`real mults
`
`real adds
`
`real mults
`
`real adds
`
`4
`20
`68
`196
`516‘
`,1 284
`3 076
`7172
`16388
`
`52
`148
`424
`l 104
`2720
`6 464
`14 976
`34 048
`76 288
`
`4
`20
`68
`196
`516
`l 284
`3 076
`7 172
`16388
`
`52
`I48
`388
`964
`2308
`5 380
`12 292
`27 652
`61444
`
`size
`
`N
`
`8
`16
`32
`64
`128
`256
`512
`1024
`2048
`
`Signal Processing
`
`HUAWEI EX. 1127 - 13/15
`
`HUAWEI EX. 1127 - 13/15
`
`
`
`
`
`M. Vetterli, H. J. Nu.rsbaumer/ Simple FFTant1 DCTAIgari1lm1s
`
`277
`
`permutations can be grouped with previous ones (for example at the entry of the DCT), the resulting
`structure is more complex. But this increase in topological complexity is annihilated by the substantial
`decrease in arithmetic complexity.
`Looking at the implementation, one sees the importance of using very efficient small transforms. Together
`with the explicit coding technique [16], this should lead to fast code for real transforms, especially on
`micro and signal processors, where the number of arithmetic and data registers is small (leading to
`unetficient implementations of the complex transform on two real sequences version).
`
`5. Concluding remarks
`
`We have introduced a simple, recursive algorithm for the computation of the discrete Fourier transform
`and the discrete cosine transform. First, this algorithm can be applied directly to real data, providing
`therefore an attractive alternative to the method using a complex transform oftwo real sequences. Secondly,
`it uses the same number of multiplies as the best structured FFT algorithms but decreases the number of
`additions by about 25 to 30% for DF'l"s and DCT’s on real data and by nearly 20% for DFT’s on complex
`data when compared to currently used algorithms. At last, its structure is simple enough so that it should
`lead to efficient implementations.
`It is worth noting that the proposed algorithm, while leading to the same computational complexity
`for the FFT as the one in [20], is not isomorphic. Interestingly, the algorithm in [20] has been around for
`15 years but was seldom referenced and even less used. Meanwhile, literature was published on amelior-
`ations of the Cooley—Tukey FFT which leads to less efficient algorithms than the one in [20]. .
`.
`. More
`recently, actually just as the linal copy was being sent to the publisher, yet another algorithm appeared
`that leads to the same complexity for the complex FFT [21].
`Investigations are under way in order to generalize the above ideas to other sinusoidal transforms and
`to higher dimensions, as well as to prove the efficiency of an implementation.
`1
`
`Acknowledgements
`
`The authors would like to thank one of the anonymous reviewers for bringing [20] and [21] to their
`attention as well as the “Fonds National Suisse de la Recherche Scientifique” for supporting this research.
`
`References
`
`[1] C. Runge, and H. Konilz, Varlesungen iiber Numerisches Rerhnen, Springer Verlag, Berlin, 1924.
`[2] J.W. Cooley, and .I.W. Tukey, “An algorithm for the machine calculation of complexfouricr series”, Math.