`
`LGE EXHIBIT NO. 1006
`
`
`Amazon v. Jawbone
`U.S. Patent 11,122,357
`Amazon Ex. 1006
`
`
`
`
`
`
`
`
`Digital
`Signal
`Processing
`
`A Review Journal
`
`
`
`
Volume 12, Number 1
`January 2002
`
IDEAL First

Articles published online first
http://www.idealibrary.com
`
`TX 5-489-033
`
`Editors
`Jim Schroeder
`Joe Campbell
`
`ISSN 1051-2004
`
`
`ACADEMIC
`PRESS
`
`An Elsevier Science Imprint
`
`
`
`
`Editors
`
`Jim Schroeder
`SPRI/CSSIP
`Adelaide, SA, Australia
`E-mail: schroeder@cssip.edu.au
`
`Joe Campbell
M.I.T. Lincoln Laboratory
Lexington, Massachusetts
`E-mail: j.campbell@ieee.org
`
`Editorial Board
`
`Maurice Bellanger
`CNAM
`Paris, France
`Robert E. Bogner
`University of Adelaide
`Adelaide, SA, Australia
Johann F. Böhme
Ruhr-Universität Bochum
Bochum, Germany
James A. Cadzow
Vanderbilt University
Nashville, Tennessee
`G. Clifford Carter
`NUWC
`Newport, Rhode Island
`A. G. Constantinides
Imperial College
`London, England
`Petar M. Djuric
`State University of New York
`Stony Brook, New York
`Anthony D. Fagan
`University College Dublin
Dublin, Ireland
Sadaoki Furui
Tokyo Institute of Technology
`Tokyo, Japan
`
`John E. Hershey
`General Electric Company
`Schenectady, New York
B. R. Hunt
University of Arizona
Tucson, Arizona
James F. Kaiser
Duke University
`Durham, North Carolina
`R. Lynn Kirlin
`University of Victoria
`Victoria, British Columbia, Canada
`Ercan Kuruoglu
`Istituto di Elaborazione della Informazione
`Ghezzano, Italy
`Meemong Lee
`Jet Propulsion Laboratory
`Pasadena, California
`Petre Stoica
`Uppsala University
`Uppsala, Sweden
`Mati Wax
`Wavion, Ltd
Yoqneam, Israel
`Rao Yarlagadda
`Oklahoma State University
`Stillwater, Oklahoma
`
`Cover photo. Lower path directivity pattern at 5000 Hz. See the article by McCowan, Moore, and Sridharan in
this issue.
`
`
`Digital Signal Processing
`
`Volume 12, Number 1, January 2002
`
`© 2002 Elsevier Science (USA)
`
`All Rights Reserved
`
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy,
recording, or any information storage and retrieval system, without permission in writing from the Publisher. Exceptions: Explicit permission
from Academic Press is not required to reproduce a maximum of two figures or tables from an Academic Press article in another scientific or
research publication provided that the material has not been credited to another source and that full credit to the Academic Press article is
given. In addition, authors of work contained herein need not obtain permission in the following cases only: (1) to use their original figures or
tables in their future works; (2) to make copies of their papers for use in their classroom teaching; and (3) to include their papers as part of their
dissertations.
The appearance of the code at the bottom of the first page of an article in this journal indicates the Publisher's consent that copies of the
article may be made for personal or internal use, or for the personal or internal use of specific clients. This consent is given on the condition,
however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers,
Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to
other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for
resale. Copy fees for pre-2002 articles are as shown on the article title pages; if no fee code appears on the title page, the copy fee is the same
as those for current articles.
`
`1051-2004/02 $35.00
`MADE IN THE UNITED STATES OF AMERICA
This journal is printed on acid-free paper

DIGITAL SIGNAL PROCESSING (ISSN 1051-2004)
Published quarterly by Elsevier Science.
Editorial and Production Offices: 525 B Street, Suite 1900, San Diego, CA 92101-4495
Accounting and Circulation Offices: 6277 Sea Harbor Drive, Orlando, FL 32887-4900
2002: Volume 12. Price $343.00 U.S.A. and Canada; $374.00 all other countries
All prices include postage and handling
Information concerning personal subscription rates may be obtained by writing to the Publishers. All correspondence, permission requests, and subscription orders
should be addressed to the office of the Publishers at 6277 Sea Harbor Drive, Orlando, FL 32887-4900 (telephone: 407-345-2000). Send notices of change of address
to the office of the Publishers at least 6 to 8 weeks in advance. Please include both old and new addresses. POSTMASTER: Send changes of address to Digital Signal
Processing, 6277 Sea Harbor Drive, Orlando, FL 32887-4900.
`
`
`
`
`This material may be protected by Copyright law (Title 17 U.S. Code)
`
`
`
`
`88
`
`Digital Signal Processing Vol. 12, No. 1, January 2002
`
practical array dimensions. Low frequency performance is critical for speech
processing applications, as significant speech energy is located below 1 kHz.
By explicitly maximizing the array gain, superdirective beamforming techniques
are able to achieve greater directivity than conventional techniques with
closely spaced sensor arrays [1]. This directivity generally comes at the expense
of a controlled reduction in the white noise gain of the array. Recent work has
demonstrated the suitability of superdirective beamforming for speech enhancement
and recognition tasks [2, 3]. By employing a spherical propagation model
in its formulation, rather than assuming a far-field model, near-field
superdirectivity (NFSD) succeeds in achieving high directivity at low frequencies
for near-field speech sources in diffuse noise conditions [4]. In previous work,
near-field superdirectivity has been shown to lead to good speech recognition
performance in high noise conditions for a near-field speaker [5].
`Superdirective techniques are typically formulated assuming a diffuse noise
`field. While this is a good approximation to many practical noise conditions,
`further noise reduction would result from a more accurate model of the
`actual noise conditions during operation. Adaptive array processing techniques
`continually update their parameters based on the statistics of the measured
`input noise. The generalized sidelobe canceler (GSC) [6] presents a structure
`that can be used to implement a variety of adaptive beamformers. A block
`diagram of the basic GSC system is shown in Fig. 1. The GSC separates
`the adaptive beamformer into two main processing paths—a standard fixed
`beamformer, w, with L constraints on the desired signal response, and an
`adaptive path, consisting of a blocking matrix, B, and a set of adaptive filters, a.
`As the desired signal has been constrained in the upper path, the lower path
`filters can be updated using an unconstrained adaptive algorithm, such as the
`least-mean-square (LMS) algorithm.
While the theory of adaptive techniques promises greater signal enhancement,
this is not always the case in real situations. A common problem with
the GSC system is leakage of the desired signal through the blocking matrix,
resulting in signal degradation at the beamformer output. This is particularly
problematic for broadband signals, such as speech, and especially for speech
recognition applications where signal distortion is critical.
In this paper we propose a system that is suited to speech enhancement in a
practical near-field situation, having both the good low frequency performance
of near-field superdirectivity and the adaptability of a GSC system, while taking
`
`
`FIG. 1. Generalized sidelobe canceler structure.
`
`
`
`
`
`
`McCowan, Moore, and Sridharan: Near-field Adaptive Beamformer
`
`
`care to minimize the problem of signal degradation for near-field sources.
`We begin by formulating a concise model for near-field sound propagation in
`Section 2. This model is then used in Section 3 to develop the proposed near-
`field adaptive beamforming (NFAB) technique. To demonstrate the benefit of
`the technique over existing methods, an experimental evaluation assessing
`directivity patterns, speech enhancement performance, and speech recognition
performance is detailed in Sections 4 and 5.
`
`2. NEAR-FIELD SOUND PROPAGATION MODEL
`
In sensor array applications, a succinct means of characterizing both the
array geometry and the location of a signal source is via the propagation vector.
The propagation vector concisely describes the theoretical propagation of the
signal from its source to each sensor in the array. In this section, we develop an
expression for the propagation vector of a sound source located in the near-field
of a microphone array using a spherical propagation model. This expression is
then used in the formulation of the proposed near-field adaptive beamformer in
the following sections.
Many microphone array processing techniques assume a planar signal
wavefront. This is reasonable for a far-field source, but when the desired
source is close to the array a more accurate spherical wavefront model must
be employed. For a microphone array of length L, a source is considered to be
in the near-field if r < 2L²/λ, where r is the distance to the source and λ is the
wavelength.
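
As a quick numerical check of this criterion, the following minimal sketch evaluates r < 2L²/λ at a given frequency. The function name `is_near_field` and the example values (a 0.4 m aperture, sources at 0.6 m and 2.7 m, a 1 kHz tone) are illustrative assumptions, and c = 340 m/s is assumed for the speed of sound.

```python
def is_near_field(r, array_length, freq_hz, c=340.0):
    """Return True if a source at distance r (m) satisfies r < 2 L^2 / lambda."""
    wavelength = c / freq_hz
    return r < 2.0 * array_length ** 2 / wavelength

# Illustrative values only: a 0.4 m aperture evaluated at 1 kHz.
print(is_near_field(0.6, 0.4, 1000.0))  # source 0.6 m from the array
print(is_near_field(2.7, 0.4, 1000.0))  # source 2.7 m from the array
```

Because the wavelength appears in the bound, the same source can fall inside the near-field region at one frequency and outside it at another.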
We define the reference microphone as the origin of a 3-dimensional vector
space, as shown in Fig. 2. The position vector for a source in direction (θ_s, φ_s),
at distance r_s from the reference microphone, is denoted p_s and is given by

$$\mathbf{p}_s = r_s \begin{bmatrix} \hat{x} & \hat{y} & \hat{z} \end{bmatrix} \begin{bmatrix} \cos\theta_s \sin\phi_s \\ \sin\theta_s \sin\phi_s \\ \cos\phi_s \end{bmatrix}. \tag{1}$$

The microphone position vectors, denoted as p_i (i = 1, ..., N), are similarly
defined. The distance from the source to microphone i is thus

$$d_i = \lVert \mathbf{p}_s - \mathbf{p}_i \rVert, \tag{2}$$

where ‖·‖ is the Euclidean vector norm.
In such a model, the differences in distance to each sensor can be significant
for a near-field source, resulting in phase misalignment across sensors. The
difference in propagation time to each microphone with respect to the reference
microphone (i = 1) is given by

$$\tau_i = \frac{d_i - d_1}{c}, \tag{3}$$
`
`
FIG. 2. Near-field propagation model.
`
where c = 340 m s⁻¹ for sound. In addition, the wavefront amplitude decays at a
rate proportional to the distance traveled. The resulting amplitude differences
across sensors are negligible for far-field sources, but can be significant in
the near-field case. The microphone attenuation factors, with respect to the
amplitude on the reference microphone, are given by

$$\alpha_i = \frac{d_1}{d_i}. \tag{4}$$

Thus, if x_1(f) is the desired source at the reference microphone, the signal on
the ith microphone is given by

$$x_i(f) = \alpha_i \, x_1(f) \, e^{-j 2\pi f \tau_i}. \tag{5}$$

Consequently, we define the near-field propagation vector for a source at
distance r and direction (θ, φ) as

$$\mathbf{d}(f, r, \theta, \phi) = \left[ \alpha_1 e^{-j 2\pi f \tau_1} \; \cdots \; \alpha_i e^{-j 2\pi f \tau_i} \; \cdots \; \alpha_N e^{-j 2\pi f \tau_N} \right]^T. \tag{6}$$
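
The construction in Eqs. (2) to (6) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function name, the use of Cartesian positions instead of (r, θ, φ), and the three-microphone geometry are assumptions made for the example.

```python
import numpy as np

def near_field_propagation_vector(freq_hz, source_pos, mic_pos, c=340.0):
    """Near-field propagation vector following Eq. (6).

    source_pos: (3,) Cartesian source position in metres.
    mic_pos:    (N, 3) microphone positions; row 0 is the reference microphone.
    """
    source_pos = np.asarray(source_pos, dtype=float)
    mic_pos = np.asarray(mic_pos, dtype=float)
    dist = np.linalg.norm(source_pos - mic_pos, axis=1)   # d_i, Eq. (2)
    tau = (dist - dist[0]) / c                            # tau_i, Eq. (3)
    alpha = dist[0] / dist                                # alpha_i, Eq. (4)
    return alpha * np.exp(-2j * np.pi * freq_hz * tau)    # Eq. (6)

# Hypothetical 3-microphone line array on the x axis, source 0.6 m in front of mic 0.
mics = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]])
d = near_field_propagation_vector(1000.0, [0.0, 0.6, 0.0], mics)
print(np.abs(d))    # attenuation factors alpha_i
print(np.angle(d))  # phase terms -2*pi*f*tau_i (wrapped to (-pi, pi])
```

By construction the entry for the reference microphone is exactly 1, so every other entry describes amplitude and phase relative to it.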
`
`3. NEAR-FIELD ADAPTIVE BEAMFORMING
`
The proposed system structure is shown in Fig. 3. The objective of the
proposed technique is to add the benefit of good low frequency directivity
to a standard adaptive beamformer, as low frequency performance is critical
in speech processing applications. The upper path consists of a fixed near-
field superdirective beamformer, while the lower path contains a near-field
compensation unit, a blocking matrix and an adaptive noise canceling filter.
The principal components of the system are discussed in the following sections.
`
`
[Block diagram: fixed NFSD beamformer in the upper path; near-field compensation D(f) and blocking matrix in the lower path; the two paths are combined at a summing node.]

FIG. 3. Near-field adaptive beamformer.
`
`Section 3.1 gives an explanation of the near-field superdirective beamformer.
`Section 3.2 proposes the inclusion of a near-field compensation unit in the
`adaptive sidelobe canceling path and examines its effect on reducing signal
`distortion at the output. Once this near-field compensation has been performed,
`a standard generalized sidelobe canceling blocking matrix and adaptive filters
`can be applied to reduce the output noise power, as discussed in Section 3.3.
`
`3.1. Near-field Superdirective Beamformer
`Superdirective beamforming techniques are based upon the maximization of
`the array gain, or directivity index. The array gain is defined as the ratio of
`output signal-to-noise ratio to input signal-to-noise ratio and for the general
`case can be expressed in matrix notation as [1]
`
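$$G(f) = \frac{\mathbf{w}(f)^H \mathbf{P}(f) \mathbf{w}(f)}{\mathbf{w}(f)^H \mathbf{Q}(f) \mathbf{w}(f)}, \tag{7}$$

where w(f) is a column vector of channel gains,

$$\mathbf{w}(f) = \left[ w_1(f) \; \cdots \; w_i(f) \; \cdots \; w_N(f) \right]^T, \tag{8}$$

(·)^H is the complex conjugate transpose operator, and P(f) and Q(f) are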
`the cross-spectral density matrices of the signal and noise respectively. In
`practical speech processing applications the form of the signal and noise cross-
`spectral density matrices is generally unknown and must be estimated, either
`from mathematical models (fixed beamformers) or from thestatistics of the
`multichannel inputs (adaptive beamformers). Superdirective beamformers are
`calculated based on assumed mathematical models for the P(f) and Q(f)
`matrices.
When the desired signal is known to emanate from a single source at location
(r_s, θ_s, φ_s), the signal cross-spectral matrix P reduces, up to a scale factor, to
the outer product d d^H of the source propagation vector, and the array gain can
be expressed as

$$G(f) = \frac{\left| \mathbf{w}(f)^H \mathbf{d}(f, r_s, \theta_s, \phi_s) \right|^2}{\mathbf{w}(f)^H \mathbf{Q}(f) \mathbf{w}(f)}, \tag{9}$$
`
`
where d(f, r, θ, φ) is the propagation vector for the desired source, as defined in
Eq. (6).
A diffuse (spherically isotropic) noise field is often a good approximation for
many practical situations, particularly in reverberant closed spaces, such as in a
car or an office [7, 8]. For diffuse noise, the noise cross-spectral density matrix Q
can be formulated as

$$\mathbf{Q}(f) = \frac{1}{4\pi} \int_0^{2\pi} \int_0^{\pi} \mathbf{d}(f, \theta, \phi) \, \mathbf{d}(f, \theta, \phi)^H \sin\phi \, d\phi \, d\theta, \tag{10}$$

where d(f, θ, φ) is the propagation vector of a far-field noise source (r > 2L²/λ)
in direction (θ, φ).
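
For ideal omnidirectional microphones in free space, the integral in Eq. (10) has a well-known closed form: element (i, j) equals sin(k d_ij)/(k d_ij), with k = 2πf/c and d_ij the spacing between microphones i and j. The sketch below evaluates that closed form; the function name and the example array are assumptions, and the closed form itself does not hold for directional or baffled sensors.

```python
import numpy as np

def diffuse_noise_csd(freq_hz, mic_pos, c=340.0):
    """Diffuse-field coherence matrix for omnidirectional microphones."""
    mic_pos = np.asarray(mic_pos, dtype=float)
    diff = mic_pos[:, None, :] - mic_pos[None, :, :]
    spacing = np.linalg.norm(diff, axis=2)          # d_ij for every microphone pair
    # np.sinc(x) is sin(pi x)/(pi x), so the argument 2 f d / c gives sin(kd)/(kd).
    return np.sinc(2.0 * freq_hz * spacing / c)

# Hypothetical 4-microphone line array with 0.1 m spacing.
mics = np.array([[0.1 * i, 0.0, 0.0] for i in range(4)])
Q = diffuse_noise_csd(300.0, mics)
print(np.round(Q, 3))
```

At low frequencies the off-diagonal entries approach 1, which is why Q becomes poorly conditioned and the diagonal loading discussed next is needed in practice.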
The superdirectivity problem is thus formulated as

$$\max_{\mathbf{w}(f)} \frac{\left| \mathbf{w}(f)^H \mathbf{d}(f, r_s, \theta_s, \phi_s) \right|^2}{\mathbf{w}(f)^H \mathbf{Q}(f) \mathbf{w}(f)}. \tag{11}$$

By using a spherical propagation model to formulate the propagation
vector, d, the standard superdirective formulation can be optimized for a near-
field source [9, 4]. As such, the only difference in the calculation of the standard
and near-field superdirective channel filters is the form of the propagation
vector, d. For a near-field source, the assumption of plane wave (far-field)
propagation leads to errors in the array response to the desired signal due
to curvature of the direct wavefront. A thorough discussion of the use of a
near-field model for superdirective microphone arrays is given by Ryan and
Goubran [9].
Cox [10] gives the general superdirective filter solution subject to

1. L linear constraints, C(f)^H w(f) = g(f) (explained below); and
2. a constraint on the maximum white noise gain, w(f)^H w(f) = δ⁻², where
δ² is the desired white noise gain,

as

$$\mathbf{w}(f) = \left[ \mathbf{Q}(f) + \epsilon \mathbf{I} \right]^{-1} \mathbf{C}(f) \left\{ \mathbf{C}(f)^H \left[ \mathbf{Q}(f) + \epsilon \mathbf{I} \right]^{-1} \mathbf{C}(f) \right\}^{-1} \mathbf{g}(f), \tag{12}$$

where ε is a Lagrange multiplier that is iteratively adjusted to satisfy the white
noise gain constraint. The white noise gain is the array gain for spatially white
(incoherent) noise; that is, Q(f) = I. A constraint on the white noise gain is
necessary as an unconstrained superdirective solution will in fact result in
significant gain to any incoherent noise, particularly at low frequencies. Cox [10]
states that the technique of adding a small amount to each diagonal matrix
element prior to inversion is in fact the optimum means of solving this problem.
A study of the relationship between the multiplier ε and the desired white
noise gain δ² shows that the white noise gain increases monotonically with
increasing ε. One possible means of obtaining the desired value of ε is thus
an iterative technique employing a binary search algorithm between a specified
minimum and maximum value for ε. The computational expense of the iterative
procedure is not critical, as the beamformer filters depend only on the source
`
`
`location and array geometry, and thus must only be calculated once for a given
`configuration.
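
The iterative procedure can be sketched as a bisection over the loading value. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the single-constraint (unity response) form of the solution, searches on log10(ε) between assumed bounds, models the diffuse field with the omnidirectional closed form, and uses a hypothetical five-element array. The target gain is taken from a known loading value purely so that the search has a known answer.

```python
import numpy as np

def superdirective_weights(Q, d, eps):
    """Unity-response superdirective weights with diagonal loading eps."""
    loaded = Q + eps * np.eye(len(d))
    v = np.linalg.solve(loaded, d)        # (Q + eps I)^{-1} d
    return v / np.vdot(d, v)              # divide by d^H (Q + eps I)^{-1} d

def white_noise_gain(w, d):
    """Array gain for spatially white noise, i.e. the gain with Q = I."""
    return np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w))

def find_loading(Q, d, target_wng, eps_min=1e-6, eps_max=1e2, iters=60):
    """Bisection on log10(eps); relies on the gain growing monotonically with eps."""
    lo, hi = np.log10(eps_min), np.log10(eps_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        gain = white_noise_gain(superdirective_weights(Q, d, 10.0 ** mid), d)
        if gain < target_wng:
            lo = mid                      # gain too small: more loading needed
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))

# Hypothetical 5-element line array (5 cm spacing), source 0.6 m in front of its centre.
c, f = 340.0, 300.0
mics = np.array([[0.05 * i, 0.0, 0.0] for i in range(5)])
src = np.array([0.10, 0.6, 0.0])
dist = np.linalg.norm(src - mics, axis=1)
d = (dist[0] / dist) * np.exp(-2j * np.pi * f * (dist - dist[0]) / c)
spacing = np.linalg.norm(mics[:, None, :] - mics[None, :, :], axis=2)
Q = np.sinc(2.0 * f * spacing / c)        # diffuse field, omnidirectional microphones

target = white_noise_gain(superdirective_weights(Q, d, 1e-2), d)
eps = find_loading(Q, d, target)
w = superdirective_weights(Q, d, eps)
print(eps, white_noise_gain(w, d), target)
```

Searching on a logarithmic scale is a design choice of the sketch: useful loading values span many orders of magnitude, so a linear bisection would spend most of its iterations near the upper bound.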
The constraint matrix, C^H(f), is of order L × N, where there are L linear
constraints being applied, and the vector g(f) is a length-L column vector
of constraining values. The constraints generally include one specifying unity
response for the desired signal, d^H(f) w(f) = 1, and where this is the sole
constraint the above solution can be simplified by substituting C(f) = d(f) and
g(f) = 1, giving

$$\mathbf{w}(f) = \frac{\left[ \mathbf{Q}(f) + \epsilon \mathbf{I} \right]^{-1} \mathbf{d}(f)}{\mathbf{d}(f)^H \left[ \mathbf{Q}(f) + \epsilon \mathbf{I} \right]^{-1} \mathbf{d}(f)}. \tag{13}$$

Once the optimal filters w(f) have been calculated, the near-field superdirective
beamformer output is calculated as

$$y_u(f) = \mathbf{w}(f)^H \mathbf{x}(f), \tag{14}$$

where x(f) is the N-channel input column vector

$$\mathbf{x}(f) = \left[ x_1(f) \; \cdots \; x_i(f) \; \cdots \; x_N(f) \right]^T. \tag{15}$$
`
`3.2. Near-field Compensation Unit
The first element in the adaptive path of standard GSC is the blocking
matrix [6]. Its purpose is to block the desired signal from the adaptive noise
estimate. To ensure complete blocking, the desired signal must both be time
aligned and have equal amplitudes across all channels. If this is the case,
cancellation occurs if each row of the blocking matrix sums to zero, and all rows
are linearly independent.
For a near-field desired source, to align the desired signal on all channels,
a near-field compensation must first be applied to the input channels prior to
blocking. To ensure full cancellation we need to compensate for both phase
misalignment and amplitude scaling of the desired signal across sensors. We
define the diagonal matrix

$$\mathbf{D}(f) = \left[ \mathrm{diag}(\mathbf{d}(f)) \right]^{-1}, \tag{16}$$

where d(f) is the near-field propagation vector from Eq. (6). In this paper we
define the diagonal operator, diag( ), to produce a diagonal matrix from a vector
parameter. Conversely, if invoked with a matrix parameter, it produces a row
vector corresponding to the matrix diagonal. The near-field compensation can
be applied as

$$\mathbf{x}'(f) = \mathbf{D}(f) \, \mathbf{x}(f). \tag{17}$$

Once this near-field compensation has been performed, a standard GSC blocking
matrix can be employed to block the desired signal from the adaptive path.
The inclusion of this compensation unit is critical for a near-field desired
signal. Without compensation for both phase and amplitude differences between
sensors, blocking of the desired signal will not be ensured, leading to signal
`
`
`
`
[Plot; horizontal axis: direction of arrival (deg).]

FIG. 4. Comparison of blocking matrix row beam-patterns.
`
cancellation at the output. The near-field compensation effectively ensures that
a true null exists in the beam-pattern of each blocking matrix row in the
direction and distance corresponding to the desired source. To illustrate, Fig. 4
shows the directivity pattern at 2 kHz for the first row in the blocking matrix
using the array shown in Fig. 5, with the desired source directly in front of the
center microphone at a distance of 0.6 m. The figure shows the compensated
response in the far- and near-fields, as well as the uncompensated near-field
response. It is clear that the uncompensated system will allow a high degree of
signal leakage into the adaptive path as it blocks noise sources rather than the
desired signal.
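
The effect of the compensation can be checked numerically by passing a desired-signal-only input through a blocking matrix with and without the compensation step. The sketch below assumes a hypothetical five-element line array with 0.1 m spacing rather than the array of Fig. 5, and uses the usual difference-of-adjacent-channels blocking matrix; only the 2 kHz frequency and the 0.6 m source distance are taken from the text.

```python
import numpy as np

def griffiths_jim(n):
    """N x (N-1) blocking matrix: column j holds +1 in row j and -1 in row j+1."""
    B = np.zeros((n, n - 1))
    idx = np.arange(n - 1)
    B[idx, idx] = 1.0
    B[idx + 1, idx] = -1.0
    return B

c, f = 340.0, 2000.0
mics = np.array([[0.1 * i, 0.0, 0.0] for i in range(5)])    # hypothetical line array
src = np.array([0.2, 0.6, 0.0])                              # 0.6 m in front of the centre
dist = np.linalg.norm(src - mics, axis=1)
d = (dist[0] / dist) * np.exp(-2j * np.pi * f * (dist - dist[0]) / c)   # propagation vector

B = griffiths_jim(len(d))
D = np.diag(1.0 / d)                      # inverse of diag(d): the compensation matrix
x = 0.7 * d                               # desired signal only, arbitrary amplitude

leak_compensated = np.linalg.norm(B.conj().T @ (D @ x))
leak_uncompensated = np.linalg.norm(B.conj().T @ x)
print(leak_compensated, leak_uncompensated)
```

With compensation every channel carries the same value, so each adjacent difference vanishes to rounding error; without it, the phase and amplitude differences between channels pass straight through the blocking matrix.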
`
`
`
[Diagram labels: desired source at 60 cm; localised noise at 270 cm.]

FIG. 5. Experimental configuration.
`
[FIG. 4 plot legend: far-field; NF uncompensated (r = 0.6 m); NF compensated (r = 0.6 m). Vertical axis: blocking matrix row response (dB).]
`
`
`
`
`3.3. Blocking Matrix and Adaptive Noise Canceling Filter
The blocking matrix and adaptive noise canceling filters are taken from the
standard GSC technique [6]. The order of the blocking matrix is N × (N − L),
where there are L constraints applied in the fixed upper path beamformer.
Generally only a unity constraint on the desired signal is specified, and the
standard N × (N − 1) Griffiths-Jim blocking matrix is used:

$$\mathbf{B} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \\ 0 & 0 & 0 & \cdots & 0 & -1 \end{bmatrix}. \tag{18}$$
`
The output of the blocking matrix is calculated as

$$\mathbf{x}''(f) = \mathbf{B}^H \mathbf{x}'(f), \tag{19}$$

where x''(f) is an (N − 1)-length column vector. Defining the (N − 1)-length
adaptive filter column vector as

$$\mathbf{a}(f) = \left[ a_1(f) \; \cdots \; a_i(f) \; \cdots \; a_{N-1}(f) \right]^T, \tag{20}$$

the output of the lower path is given as

$$y_l(f) = \mathbf{a}(f)^H \mathbf{x}''(f). \tag{21}$$
`
The NFAB output is then calculated from the upper and lower path outputs as

$$y(f) = y_u(f) - y_l(f), \tag{22}$$

and the adaptive filters are updated using the standard unconstrained LMS
algorithm

$$\mathbf{a}_{k+1}(f) = \mathbf{a}_k(f) + \mu \, \mathbf{x}''_k(f) \, y_k(f), \tag{23}$$

where μ is the adaptation step size and k denotes the current frame.
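
The behaviour of the unconstrained LMS update can be illustrated on synthetic data for a single frequency bin. The sketch makes several assumptions: the upper path output is modelled as noise that is an exact linear function of the blocked signals through a hypothetical vector `h`, the blocked signals are unit-variance complex Gaussian, and the beamformer output is conjugated in the update. The conjugate is the usual complex-LMS form for a filter applied as a^H x'' and is a choice of this sketch, since the update as printed does not show it.

```python
import numpy as np

def lms_update(a, x_blocked, y_out, mu):
    """One unconstrained LMS step for a lower path computed as a^H x''."""
    return a + mu * x_blocked * np.conj(y_out)

rng = np.random.default_rng(0)
n_taps = 4
# Hypothetical linear relation between the blocked signals and the upper path noise.
h = rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)
a = np.zeros(n_taps, dtype=complex)

for _ in range(3000):
    x2 = (rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)) / np.sqrt(2)
    y_u = np.vdot(h, x2)              # upper path: only noise correlated with x''
    y = y_u - np.vdot(a, x2)          # beamformer output, upper minus lower path
    a = lms_update(a, x2, y, mu=0.05)

print(np.linalg.norm(a - h))          # distance between adapted and true filters
```

In this noise-free setting the adapted filter converges to `h`, so the lower path removes the correlated noise from the output; with a desired signal leaking into x'' the same mechanism would cancel speech, which is the motivation for the compensation unit.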
`
`3.4. Summary of Technique
`
In summary, the proposed NFAB technique is characterized by the series of
equations

$$y_u(f) = \mathbf{w}(f)^H \mathbf{x}(f) \tag{24a}$$
$$\mathbf{x}''(f) = \mathbf{B}^H \mathbf{D}(f) \, \mathbf{x}(f) \tag{24b}$$
$$y_l(f) = \mathbf{a}(f)^H \mathbf{x}''(f) \tag{24c}$$
$$y(f) = y_u(f) - y_l(f) \tag{24d}$$
$$\mathbf{a}_{k+1}(f) = \mathbf{a}_k(f) + \mu \, \mathbf{x}''_k(f) \, y_k(f), \tag{24e}$$
`
`where all terms have been defined in the preceding discussion.
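
Put together, one frame of processing at a single frequency bin might look like the sketch below. It is an illustration only: the function name `nfab_step`, the four-microphone geometry, and the use of matched weights d/(d^H d) as a stand-in for the superdirective filters are assumptions, and the conjugate in the update step is the usual complex-LMS form rather than something shown in the printed equations.

```python
import numpy as np

def nfab_step(x, w, D, B, a, mu):
    """Process one frequency bin of one frame with the five NFAB equations."""
    y_u = np.vdot(w, x)                   # (24a) fixed upper path
    x_b = B.conj().T @ (D @ x)            # (24b) compensate, then block
    y_l = np.vdot(a, x_b)                 # (24c) adaptive lower path
    y = y_u - y_l                         # (24d) beamformer output
    a_next = a + mu * x_b * np.conj(y)    # (24e); the conjugate is an assumption
    return y, a_next

# Hypothetical 4-microphone example driven by a desired-signal-only input.
c, f = 340.0, 1000.0
mics = np.array([[0.1 * i, 0.0, 0.0] for i in range(4)])
dist = np.linalg.norm(np.array([0.15, 0.6, 0.0]) - mics, axis=1)
d = (dist[0] / dist) * np.exp(-2j * np.pi * f * (dist - dist[0]) / c)   # propagation vector
w = d / np.vdot(d, d)                     # stand-in weights satisfying w^H d = 1
D = np.diag(1.0 / d)                      # near-field compensation
n = len(d)
B = np.eye(n)[:, : n - 1] - np.eye(n, k=-1)[:, : n - 1]    # difference blocking matrix
a = np.array([0.3 + 0.1j, -0.2j, 0.5])    # arbitrary current adaptive filters
s = 0.8 - 0.4j                            # desired signal at the reference microphone

y, a_next = nfab_step(s * d, w, D, B, a, mu=0.01)
print(y, np.linalg.norm(a_next - a))
```

For this input the output reproduces the desired signal and the adaptive filters are left unchanged, because nothing reaches the lower path once the compensated signal has been blocked.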
`
4. EXPERIMENTAL CONFIGURATION

For the experimental evaluation in this paper, we used the 11 element array
shown in Fig. 5. The array consists of a nine element broadside array, with an
additional two microphones situated directly behind the end microphones. The
total array is 40 cm wide and 15 cm deep in the horizontal plane. The broadside
microphones are arranged according to a standard broadband subarray design,
where different subarrays are used for different frequency ranges for the fixed
upper path beamformer. The two endfire microphones are included for use by
the near-field superdirective beamformer in the low frequency range. The four
subarrays are thus

• (f < 1 kHz): microphones 1-11;
• (1 kHz < f < 2 kHz): microphones 1, 2, 5, 8, and 9;
• (2 kHz < f < 4 kHz): microphones 2, 3, 5, 7, and 8; and
• (4 kHz < f < 8 kHz): microphones 3-7.
`
The array was situated in a computer room, with different sound source
locations, as shown in Fig. 5. The two sound sources were

1. the desired speaker situated 60 cm from the center microphone, directly
in front of the array; and
2. a localized noise source at an angle of 124° and a distance of 270 cm from
the array.
`
Impulse responses of the acoustic path between each source and microphone
were measured from multichannel recordings made in the room with the array
using the maximum length sequence technique detailed in Rife and
Vanderkooy [11]. As the impulse responses were calculated from real recordings
made simultaneously across all input channels, they take into account the real
acoustic properties of the room and the array. The multichannel desired speech
and localized noise microphone inputs were then generated by convolving the
original single-channel speech and noise signals with these impulse responses.
In addition, a real multichannel background noise recording of normal operating
conditions was made in the room with other workers present. This recording
is referred to in the experiments as the ambient noise signal and is approximately
diffuse in nature. It consists mainly of computer noise, a variable level of
background speech, and noise from an air-conditioning unit. The ambient noise
effectively represents a diffuse noise field, while the localized noise represents
a coherent noise source. In this paper, we specify the levels of the two different
noise sources independently, as the signal to ambient-noise ratio (SANR) and
`
signal to localized-noise ratio (SLNR). These values are calculated as the average
segmental SNR from the speech and noise input, as measured at the center
microphone of the array.
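
An average segmental SNR of this kind can be computed along the following lines. The frame length, the absence of frame overlap, and the rule for skipping frames with zero power are assumptions of this sketch; the text does not specify them.

```python
import numpy as np

def segmental_snr_db(speech, noise, frame_len=256):
    """Average per-frame SNR in dB; frame_len is an assumed, not documented, value."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    n_frames = min(len(speech), len(noise)) // frame_len
    snrs = []
    for k in range(n_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        p_s = np.mean(np.square(speech[seg]))
        p_n = np.mean(np.square(noise[seg]))
        if p_s > 0.0 and p_n > 0.0:       # skip frames where the ratio is undefined
            snrs.append(10.0 * np.log10(p_s / p_n))
    return float(np.mean(snrs))

# Synthetic check: speech power is ten times the noise power in every frame.
speech = np.sqrt(10.0) * np.ones(1024)
noise = np.ones(1024)
print(segmental_snr_db(speech, noise))
```

Averaging the per-frame values in dB, rather than taking one ratio over the whole recording, keeps loud passages from dominating the measure.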
`In this way, realistic multichannel input signals can be simulated for specified
`levels of ambient and localized noise. As well as facilitating the generation
`of different noise conditions, simulating the multichannel inputs using the
`impulse response method is more practical than making real recordings for
`speech recognition experiments, as existing single channel speech corpora may
`be used.
`
`5. EXPERIMENTAL RESULTS
`
This section presents the results of the experimental evaluation. The
proposed NFAB technique is compared to a conventional fixed filter-sum
beamformer, a fixed near-field superdirective beamformer, and a conventional
GSC adaptive beamformer. These beamformers are specified in Table 1.
The techniques are first assessed in terms of the directivity pattern in
order to demonstrate the advantage of the proposed NFAB over conventional
beamforming techniques, particularly at low frequencies. Following this, the
techniques are evaluated for speech enhancement in terms of the improvement
in signal to noise ratio and the log area ratio. Finally, the techniques are
compared in a hands-free speech recognition task in noisy conditions using the
TIDIGITS database [12].
`
`5.1. Directivity Analysis
As has been stated, the main objective of the proposed technique is to produce
an adaptive beamformer that exhibits good low frequency performance for near-
field speech sources. To assess the effectiveness of the proposed technique in
achieving this objective, in this section we analyze the horizontal directivity
pattern. The directivity of a filter-sum beamformer is expressed in matrix
notation as

$$h(f, r, \theta, \phi) = \mathbf{w}_o(f)^H \mathbf{d}(f, r, \theta, \phi), \tag{25}$$

where w_o is the length N channel filter vector

$$\mathbf{w}_o(f) = \left[ w_{o,1}(f) \; \cdots \; w_{o,i}(f) \; \cdots \; w_{o,N}(f) \right]^T. \tag{26}$$
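
A horizontal directivity pattern of this form can be evaluated by sweeping the source angle at a fixed distance. The following sketch is illustrative: the function names, the five-element line array, the convention that angles are measured about the reference microphone, and the matched weights used as the filter vector are all assumptions.

```python
import numpy as np

def propagation_vector(freq_hz, source_pos, mic_pos, c=340.0):
    """Near-field propagation vector; row 0 of mic_pos is the reference microphone."""
    dist = np.linalg.norm(np.asarray(source_pos, dtype=float) - mic_pos, axis=1)
    return (dist[0] / dist) * np.exp(-2j * np.pi * freq_hz * (dist - dist[0]) / c)

def horizontal_directivity_db(w, freq_hz, r, mic_pos, angles_deg):
    """|h(f, r, theta)| in dB for sources in the horizontal plane at distance r."""
    out = []
    for ang in np.deg2rad(angles_deg):
        src = np.array([r * np.cos(ang), r * np.sin(ang), 0.0])
        h = np.vdot(w, propagation_vector(freq_hz, src, mic_pos))
        out.append(20.0 * np.log10(np.abs(h) + 1e-12))     # floor avoids log(0)
    return np.array(out)

# Hypothetical line array with the reference microphone at the origin.
mics = np.array([[0.1 * i, 0.0, 0.0] for i in range(5)])
target = np.array([0.0, 0.6, 0.0])                  # 90 degrees, 0.6 m
d_t = propagation_vector(300.0, target, mics)
w_o = d_t / np.vdot(d_t, d_t)                       # matched weights, unity at target
angles = np.arange(0, 181, 10)
pattern = horizontal_directivity_db(w_o, 300.0, 0.6, mics, angles)
print(np.round(pattern, 2))
```

Replacing `w_o` with any of the filter vectors under comparison gives the corresponding pattern on the same angular grid.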
`
TABLE 1
Beamforming Techniques in Evaluation

Technique   Description                                       Filters
FS          Conventional FS beamformer                        w_o(f) = [diag(D(f))]^H
NFSD        Near-field superdirective beamformer              w_o(f) = w(f)
GSC         GSC system with FS fixed upper path beamformer    w_o(f) = [diag(D(f))]^H − D(f)Ba(f)
NFAB        Near-field adaptive beamformer                    w_o(f) = w(f) − D(f)Ba(f)
`
`
FIG. 6. Upper path directivity pattern at 300 Hz: (a) FS; (b) NFSD.
`
5.1.1. Upper path directivity. First, we seek to demonstrate the directivity
improvement that NFSD achieves at low frequencies compared to a conventional
filter-sum (FS) beamformer. For the FS beamformer, a common solution is
to choose w_o(f) = [diag(D(f))]^H. This effectively ensures that the desired signal
is aligned for phase and amplitude across sensors using a spherical propagation
model. For NFSD, we use the filter vector w(f) described in Section 3.1. Figure 6
shows the near-field directivity pattern at 300 Hz for the FS and NFSD. From
these figures, it is clear that the NFSD technique results in greater directional
discrimination at low frequencies compared to a conventional beamformer. At
higher frequencies (f > 1 kHz), conventional beamformers offer reasonable
directivity, and so the FS and NFSD techniques give comparable performance.
5.1.2. Lower path directivity. Second, we wish to demonstrate the effect
of the noise canceling path. The directivity of the noise canceling filters can be
obtained by using the channel filters w_o(f) = D(f)Ba(f). The blocking matrix
and adaptive filters essentially implement a conventional (nonsuperdirective)
beamformer that adaptively focuses on the major sources of noise. To examine
the directivity of the lower path filters, the beamformer was run on an input
speech signal with a white localized noise source (at the location shown in Fig. 5)
added at an SLNR of 0 dB and a low level of ambient noise (SANR = 20 dB).
The steady-state adaptive filter vector, a(f), was written to file for both the
proposed NFAB technique and the conventional GSC beamformer. The near-
field directivity patterns of the lower path filters are plotted in Figs. 7 and 8
for 300 and 5000 Hz, respectively. We see that the lower path adaptive filters
for both beamformers converge to similar solutions in terms of directivity,
producing a main lobe in the direction of the coherent noise source (+124° from
Fig. 5), as well as a null in the location of the desired speaker. As expected, the
directivity of the adaptive path is poor at low frequencies, as seen in Fig. 7.
5.1.3. Overall beamformer directivity. Finally, we examine the directivity
pattern of the overall beamformer for the NFAB and conventional adaptive
`
`
FIG. 7. Lower path directivity pattern at 300 Hz: (a) GSC lower path; (b) NFAB lower path.
`
systems. The near-field directivity patterns at 300 Hz are shown in Fig. 9.
We see that the directivity pattern of the NFAB system exhibits a true null
in the direction and at the distance of the noise source, while the directivity of
the conventional beamformer is too poor to significantly attenuate the noise at
this frequency. At frequencies above 1 kHz the directivity performance of both
techniques is comparable.
5.1.4. Summary of beamformer directivity. In summary we see that, in
terms of directivity, the proposed NFAB system:

• outperforms the conventional FS system in terms of low frequency
performance and the ability to attenuate coherent noise sources,
• outperforms the NFSD system due to the ability to attenuate coherent
noise sources, and
`
`
`
FIG. 8. Lower path directivity pattern at 5000 Hz: (a) GSC lower path; (b) NFAB lower path.
`
`
`
`
FIG. 9. Overall beamformer directivity pattern at 300 Hz: (a) GSC; (b) NFAB.
`
• outperforms the conventional GSC system in terms of low frequency
performance.
`In this way, we see that the proposed system succeeds in meeting the stated
`objectives and should therefore demonstrate improved performance in speech
`processing applications.
`
5.2. Speech Enhancement Analysis
The signal plots in Fig. 10 give an indication of the level of enhancement
achieved by the NFAB technique. For the desired speech signal, we used a
segment of speech from the TIDIGITS database corresponding to the digit
sequence one-nine-eight-six. Ambient noise was added at an SANR level of
10 dB, and a localized white noise signal was added at an SLNR level of 0 dB.
The plots indicate that NFAB succeeds in reducing the noise level with negligible
distortion to the desired signal.
To better measure the level of enhancement, objective speech measures were
used to compare the different techniques. Two measures were used, these
being the SNR improvement and the log area ratio distortion measure. The
SNR improvement is defined as the difference in SNR at the array output and
input. As the true SNR cannot be measured, it is estimated as the average
segmental signal-plus-noise to noise ratio. While the signal to noise ratio is a
useful measure for assessing noise reduction, it does not necessarily give a good
indication of how much distortion has been introduced to the desired speech
signal. The log area ratio (LAR) measure of speech quality is more highly
correlated with perceptual intelligibility in humans [13]. The log area ratio
measure for a frame of speech is calculated as
`
$$\mathrm{LAR}(n) = \left| \frac{1}{P} \sum_{i=1}^{P} \left[ \log \frac{1 + r_o(i)}{1 - r_o(i)} - \log \frac{1 + r_p(i)}{1 - r_p(i)} \right]^2 \right|^{1/2}, \tag{27}$$
`
`
`
[Waveform panels: clean input (centre mic); noisy input (centre mic); NFSD output; NFAB output.]

FIG. 10. Sample enhanced signal.
`
where n is the frame number, and r_o and r_p are the original and processed Pth-
order linear predictive coefficients of the nth frame, respectively. The overall
log area ratio distortion measure for the signal is calculated as the average
distortion over all input frames.
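
A per-frame log area ratio distance can be sketched as follows. Log area ratios are conventionally formed from the reflection coefficients produced by the linear predictive analysis, so the sketch derives those with a Levinson-Durbin recursion. The function names, the sign convention of the reflection coefficients, the use of the natural logarithm, and the synthetic autocorrelation sequence are assumptions of the example.

```python
import numpy as np

def reflection_coeffs(autocorr, order):
    """Levinson-Durbin recursion; returns reflection coefficients k_1..k_P."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = autocorr[0]
    ks = np.zeros(order)
    for i in range(1, order + 1):
        # acc = r[i] + sum_{j=1}^{i-1} a[j] * r[i-j]
        acc = autocorr[i] + np.dot(a[1:i], autocorr[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                # prediction error after order i
        ks[i - 1] = k
    return ks

def lar_distance(k_orig, k_proc):
    """Root-mean-square difference between the log area ratios of two frames."""
    k_orig = np.asarray(k_orig, dtype=float)
    k_proc = np.asarray(k_proc, dtype=float)
    lar_o = np.log((1.0 + k_orig) / (1.0 - k_orig))
    lar_p = np.log((1.0 + k_proc) / (1.0 - k_proc))
    return float(np.sqrt(np.mean((lar_o - lar_p) ** 2)))

# Synthetic first-order example: autocorrelation r[m] = 0.5 ** m.
r = 0.5 ** np.arange(3)
ks = reflection_coeffs(r, order=2)
print(ks, lar_distance(ks, ks), lar_distance([0.5], [0.0]))
```

Averaging `lar_distance` over all frames of an utterance gives a single distortion figure for the processed signal.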
A set of experiments was conducted in which the localized white noise was
replaced with a localized speech-like noise source taken from the NOISEX
database [14]. This is essentially a white noise signal that has been shaped
with a speech-like spectral envelope and thus represents a more realistic noise
scenario than white noise. The signal to localized noise ratio (SLNR) was varied
from 20 to 0 dB, with the ambient noise present at a constant SANR level
of 10 dB. The output signal to noise ratio improvement and log area ratios
are given in Tables 2 and 3 for the different enhancement techniques.¹ The
measures have been averaged over 10 randomly chosen speech segments taken
TABLE 2
Signal to Noise Ratio Improvement (SANR = 10 dB)

             SLNR (dB)
Technique    0      5      10     15     20
FS           0.5    0.4    0.3    0.1    0.1
NFSD         1.4    1.6    1.1    0.5    0.2
GSC          1.6    1.8    1.9    2.5    3.3
NFAB         7.9    5.5    5.9    6.4    1.5
`
`
`
`
`
1 Sample sound files are also available at http://www.speech.q