Sony v. Jawbone
U.S. Patent No. 11,122,357
Sony Ex. 1006
DIGITAL SIGNAL PROCESSING
A Review Journal

Volume 12, Number 1, January 2002
Articles published online first: http://www.idealibrary.com

[Cover figure: (a) GSC Lower Path; (b) NFAB Lower Path]

Editors
Jim Schroeder
Joe Campbell

ISSN 1051-2004

ACADEMIC PRESS
An Elsevier Science Imprint
Digital Signal Processing
A Review Journal

Editors

Jim Schroeder
SPRI/CSSIP
Adelaide, SA, Australia
E-mail: schroeder@cssip.edu.au

Joe Campbell
M.I.T. Lincoln Laboratory
Lexington, Massachusetts
E-mail: j.campbell@ieee.org

Editorial Board

Maurice Bellanger, CNAM, Paris, France
Robert E. Bogner, University of Adelaide, Adelaide, SA, Australia
Johann F. Böhme, Ruhr-Universität Bochum, Bochum, Germany
James A. Cadzow, Vanderbilt University, Nashville, Tennessee
G. Clifford Carter, NUWC, Newport, Rhode Island
A. G. Constantinides, Imperial College, London, England
Petar M. Djuric, State University of New York, Stony Brook, New York
Anthony D. Fagan, University College Dublin, Dublin, Ireland
Sadaoki Furui, Tokyo Institute of Technology, Tokyo, Japan
John E. Hershey, General Electric Company, Schenectady, New York
B. R. Hunt, University of Arizona, Tucson, Arizona
James F. Kaiser, Duke University, Durham, North Carolina
R. Lynn Kirlin, University of Victoria, Victoria, British Columbia, Canada
Ercan Kuruoglu, Istituto di Elaborazione della Informazione, Ghezzano, Italy
Meemong Lee, Jet Propulsion Laboratory, Pasadena, California
Petre Stoica, Uppsala University, Uppsala, Sweden
Mati Wax, Wavion, Ltd., Yoqneam, Israel
Rao Yarlagadda, Oklahoma State University, Stillwater, Oklahoma

Cover photo: Lower path directivity pattern at 5000 Hz. See the article by McCowan, Moore, and Sridharan in this issue.

`Digital Signal Processing
`
`Volume 12, Number 1, January 2002
`
`© 2002 Elsevier Science (USA)
`
`All Rights Reserved
`
`No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy,
`recording, or any information storage and retrieval system, without permission in writing from the Publisher. Exceptions: Explicit permission
`from Academic Press is not required to reproduce a maximum of two figures or tables from an Academic Press article in another scientific or
`research publication provided that the material has not been credited to another source and that full credit to the Academic Press article is
`given. In addition, authors of work contained herein need not obtain permission in the following cases only: (1) to use their original figures or
`tables in their future works; (2) to make copies of their papers for use in their classroom teaching; and (3) to include their papers as part of their
`dissertations.
`The appearance of the code at the bottom of the first page of an article in this journal indicates the Publisher's consent that copies of the
`article may be made for personal or internal use, or for the personal or internal use of specific clients. This consent is given on the condition,
`however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers,
`Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to
`other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for
`resale. Copy fees for pre-2002 articles are as shown on the article title pages; if no fee code appears on the title page, the copy fee is the same
`as those for current articles.

1051-2004/02 $35.00
MADE IN THE UNITED STATES OF AMERICA
This journal is printed on acid-free paper.

DIGITAL SIGNAL PROCESSING (ISSN 1051-2004)
Published quarterly by Elsevier Science.
Editorial and Production Offices: 525 B Street, Suite 1900, San Diego, CA 92101-4495
Accounting and Circulation Offices: 6277 Sea Harbor Drive, Orlando, FL 32887-4900
2002: Volume 12. Price: $343.00 U.S.A. and Canada; $374.00 all other countries
All prices include postage and handling.
Information concerning personal subscription rates may be obtained by writing to the Publishers. All correspondence, permission requests, and subscription orders should be addressed to the office of the Publishers at 6277 Sea Harbor Drive, Orlando, FL 32887-4900 (telephone: 407-345-2000). Send notices of change of address to the office of the Publishers at least 6 to 8 weeks in advance. Please include both old and new addresses. POSTMASTER: Send changes of address to Digital Signal Processing, 6277 Sea Harbor Drive, Orlando, FL 32887-4900.

Digital Signal Processing 12, 87-106 (2002)
doi:10.1006/dspr.2001.0414, available online at http://www.idealibrary.com on IDEAL

This material may be protected by Copyright law (Title 17 U.S. Code)
`
Near-field Adaptive Beamformer for Robust Speech Recognition
Iain A. McCowan, Darren C. Moore, and S. Sridharan
Speech Research Laboratory, RCSAVT, School of EESE, Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia
E-mail: iain@ieee.org; moore@idiap.ch; s.sridharan@qut.edu.au

McCowan, I. A., Moore, D. C., and Sridharan, S., Near-field Adaptive Beamformer for Robust Speech Recognition, Digital Signal Processing 12 (2002) 87-106.
This paper investigates a new microphone array processing technique specifically for the purpose of speech enhancement and recognition. The main objective of the proposed technique is to improve the low frequency directivity of a conventional adaptive beamformer, as low frequency performance is critical in speech processing applications. The proposed technique, termed near-field adaptive beamforming (NFAB), is implemented using the standard generalized sidelobe canceler (GSC) system structure, where a near-field superdirective (NFSD) beamformer is used as the fixed upper-path beamformer to improve the low frequency performance. In addition, to minimize signal leakage into the adaptive noise canceling path for near-field sources, a compensation unit is introduced prior to the blocking matrix. The advantage of the technique is verified by comparing the directivity patterns with those of conventional filter-sum, NFSD, and GSC systems. In speech enhancement and recognition experiments, the proposed technique outperforms the standard techniques for a near-field source in adverse noise conditions. © 2002 Elsevier Science (USA)
Key Words: microphone array; beamforming; near-field; adaptive; superdirectivity; speech recognition.
`
1. INTRODUCTION

Much current research aims to improve the robustness of speech recognition systems in real environments. This paper focuses on the use of a microphone array to enhance the noisy input speech signal prior to recognition. While the use of microphone arrays for speech recognition has been studied for some time by a number of researchers, a persistent problem has been the poor low frequency directivity of conventional beamforming techniques with
practical array dimensions. Low frequency performance is critical for speech processing applications, as significant speech energy is located below 1 kHz. By explicitly maximizing the array gain, superdirective beamforming techniques are able to achieve greater directivity than conventional techniques with closely spaced sensor arrays [1]. This directivity generally comes at the expense of a controlled reduction in the white noise gain of the array. Recent work has demonstrated the suitability of superdirective beamforming for speech enhancement and recognition tasks [2, 3]. By employing a spherical propagation model in its formulation, rather than assuming a far-field model, near-field superdirectivity (NFSD) succeeds in achieving high directivity at low frequencies for near-field speech sources in diffuse noise conditions [4]. In previous work, near-field superdirectivity has been shown to lead to good speech recognition performance in high noise conditions for a near-field speaker [5].

Superdirective techniques are typically formulated assuming a diffuse noise field. While this is a good approximation to many practical noise conditions, further noise reduction would result from a more accurate model of the actual noise conditions during operation. Adaptive array processing techniques continually update their parameters based on the statistics of the measured input noise. The generalized sidelobe canceler (GSC) [6] presents a structure that can be used to implement a variety of adaptive beamformers. A block diagram of the basic GSC system is shown in Fig. 1. The GSC separates the adaptive beamformer into two main processing paths: a standard fixed beamformer, w, with L constraints on the desired signal response, and an adaptive path, consisting of a blocking matrix, B, and a set of adaptive filters, a. As the desired signal has been constrained in the upper path, the lower path filters can be updated using an unconstrained adaptive algorithm, such as the least-mean-square (LMS) algorithm.
`While the theory of adaptive techniques promises greater signal enhance-
`ment, this is not always the case in real situations. A common problem with
`the GSC system is leakage of the desired signal through the blocking matrix,
`resulting in signal degradation at the beamformer output. This is particularly
`problematic for broadband signals, such as speech, and especially for speech
`recognition applications where signal distortion is critical.
In this paper we propose a system that is suited to speech enhancement in a practical near-field situation. It combines the good low frequency performance of near-field superdirectivity with the adaptability of a GSC system, while taking care to minimize the problem of signal degradation for near-field sources.

FIG. 1. Generalized sidelobe canceler structure.

We begin by formulating a concise model for near-field sound propagation in Section 2. This model is then used in Section 3 to develop the proposed near-field adaptive beamforming (NFAB) technique. To demonstrate the benefit of the technique over existing methods, an experimental evaluation assessing directivity patterns, speech enhancement performance, and speech recognition performance is detailed in Sections 4 and 5.
`
2. NEAR-FIELD SOUND PROPAGATION MODEL

In sensor array applications, a succinct means of characterizing both the array geometry and the location of a signal source is the propagation vector, which describes the theoretical propagation of the signal from its source to each sensor in the array. In this section, we develop an expression for the propagation vector of a sound source located in the near-field of a microphone array using a spherical propagation model. This expression is then used in the formulation of the proposed near-field adaptive beamformer in the following sections.
Many microphone array processing techniques assume a planar signal wavefront. This is reasonable for a far-field source, but when the desired source is close to the array a more accurate spherical wavefront model must be employed. For a microphone array of length L, a source is considered to be in the near-field if $r < 2L^2/\lambda$, where r is the distance to the source and $\lambda$ is the wavelength.
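Because $\lambda = c/f$, the near-field boundary moves with frequency. The short Python sketch below evaluates $2L^2/\lambda$ for a few frequencies, assuming an array length of L = 0.40 m (the width of the array described in Section 4) and c = 340 m/s; the function names are illustrative only.

```python
def near_field_limit(array_length_m: float, freq_hz: float, c: float = 340.0) -> float:
    """Return the boundary 2 L^2 / lambda (metres) of the near-field criterion."""
    wavelength = c / freq_hz
    return 2.0 * array_length_m ** 2 / wavelength


def is_near_field(r_m: float, array_length_m: float, freq_hz: float, c: float = 340.0) -> bool:
    """True when a source at distance r_m satisfies r < 2 L^2 / lambda."""
    return r_m < near_field_limit(array_length_m, freq_hz, c)


# Hypothetical example: 0.40 m array, source at 0.60 m.
for f in (300.0, 1000.0, 4000.0):
    limit = near_field_limit(0.40, f)
    print(f"{f:6.0f} Hz: limit {limit:.3f} m, near-field: {is_near_field(0.60, 0.40, f)}")
```

Under these assumptions the boundary is about 0.94 m at 1 kHz and about 0.28 m at 300 Hz.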
We define the reference microphone as the origin of a three-dimensional vector space, as shown in Fig. 2. The position vector for a source in direction $(\theta_s, \phi_s)$, at distance $r_s$ from the reference microphone, is denoted $\mathbf{p}_s$ and is given by

$$\mathbf{p}_s = r_s \begin{bmatrix} \cos\theta_s \sin\phi_s \\ \sin\theta_s \sin\phi_s \\ \cos\phi_s \end{bmatrix}, \tag{1}$$

whose rows are the x, y, and z coordinates. The microphone position vectors, denoted $\mathbf{p}_i$ $(i = 1, \ldots, N)$, are similarly defined. The distance from the source to microphone i is thus

$$d_i = \lVert \mathbf{p}_s - \mathbf{p}_i \rVert, \tag{2}$$

where $\lVert \cdot \rVert$ is the Euclidean vector norm.

In such a model, the differences in distance to each sensor can be significant for a near-field source, resulting in phase misalignment across sensors. The difference in propagation time to each microphone with respect to the reference microphone $(i = 1)$ is given by

$$\tau_i = \frac{d_i - d_1}{c}, \tag{3}$$

where $c = 340\ \mathrm{m\,s^{-1}}$ for sound. In addition, the wavefront amplitude decays in inverse proportion to the distance traveled. The resulting amplitude differences across sensors are negligible for far-field sources, but can be significant in the near-field case. The microphone attenuation factors, with respect to the amplitude on the reference microphone, are given by

$$\alpha_i = \frac{d_1}{d_i}. \tag{4}$$

Thus, if $x_1(f)$ is the desired source at the reference microphone, the signal on the ith microphone is given by

$$x_i(f) = \alpha_i\, e^{-j2\pi f \tau_i}\, x_1(f). \tag{5}$$

Consequently, we define the near-field propagation vector for a source at distance r and direction $(\theta, \phi)$ as

$$\mathbf{d}(f, r, \theta, \phi) = \left[\alpha_1 e^{-j2\pi f \tau_1} \;\cdots\; \alpha_i e^{-j2\pi f \tau_i} \;\cdots\; \alpha_N e^{-j2\pi f \tau_N}\right]^T, \tag{6}$$

where $\tau_i$ and $\alpha_i$ are evaluated for that source location using Eqs. (1) to (4). For the reference microphone, $\tau_1 = 0$ and $\alpha_1 = 1$.

[Figure 2: reference microphone at the origin (0, 0, 0) of the x, y, z axes, with the source direction angles and microphone i at position p_i.]

FIG. 2. Near-field propagation model.
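Equations (1) to (6) translate directly into a few lines of NumPy. The sketch below is a minimal illustration, assuming microphone coordinates in metres with the reference microphone in row 0; the five-microphone geometry and the helper names are hypothetical.

```python
import numpy as np

C_SOUND = 340.0  # speed of sound used in the text (m/s)


def source_position(r, theta, phi):
    """Eq. (1): position at distance r, azimuth theta, polar angle phi (radians)."""
    return r * np.array([np.cos(theta) * np.sin(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(phi)])


def near_field_vector(f, p_s, mic_positions, c=C_SOUND):
    """Eqs. (2)-(4) and (6) for a source at p_s.

    mic_positions: (N, 3) array in metres, row 0 = reference microphone.
    """
    dist = np.linalg.norm(p_s - mic_positions, axis=1)   # d_i, Eq. (2)
    tau = (dist - dist[0]) / c                           # tau_i, Eq. (3)
    alpha = dist[0] / dist                               # alpha_i, Eq. (4)
    return alpha * np.exp(-2j * np.pi * f * tau)         # Eq. (6)


# Hypothetical 5-microphone line along x, reference microphone at the origin.
mics = np.array([[0.025 * i, 0.0, 0.0] for i in range(5)])
p_s = source_position(0.6, np.pi / 2, np.pi / 2)         # 0.6 m along +y, horizontal plane
d = near_field_vector(1000.0, p_s, mics)
print(np.round(np.abs(d), 4))                            # amplitude factors alpha_i
print(np.round(np.angle(d), 4))                          # phases -2 pi f tau_i
```

Moving the source far away makes every $\alpha_i$ approach 1, recovering the plane-wave case.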
`
3. NEAR-FIELD ADAPTIVE BEAMFORMING

The proposed system structure is shown in Fig. 3. The objective of the proposed technique is to add the benefit of good low frequency directivity to a standard adaptive beamformer, as low frequency performance is critical in speech processing applications. The upper path consists of a fixed near-field superdirective beamformer, while the lower path contains a near-field compensation unit, a blocking matrix, and an adaptive noise canceling filter. The principal components of the system are discussed in the following sections.

[Figure 3: block diagram. The N-channel input x(f) feeds the fixed NFSD beamformer w(f) (upper path output y_u(f)) and the near-field compensation D(f), whose output x'(f) passes through the blocking matrix B (N-1 channels) and the adaptive filters a(f) (lower path output y_l(f)); the beamformer output is y(f).]

FIG. 3. Near-field adaptive beamformer.
`
`Section 3.1 gives an explanation of the near-field superdirective beamformer.
`Section 3.2 proposes the inclusion of a near-field compensation unit in the
`adaptive sidelobe canceling path and examines its effect on reducing signal
`distortion at the output. Once this near-field compensation has been performed,
`a standard generalized sidelobe canceling blocking matrix and adaptive filters
`can be applied to reduce the output noise power, as discussed in Section 3.3.
`
3.1. Near-field Superdirective Beamformer
Superdirective beamforming techniques are based upon the maximization of the array gain, or directivity index. The array gain is defined as the ratio of output signal-to-noise ratio to input signal-to-noise ratio and for the general case can be expressed in matrix notation as [1]

$$G(f) = \frac{\mathbf{w}(f)^H \mathbf{P}(f)\, \mathbf{w}(f)}{\mathbf{w}(f)^H \mathbf{Q}(f)\, \mathbf{w}(f)}, \tag{7}$$

where $\mathbf{w}(f)$ is a column vector of channel gains,

$$\mathbf{w}(f) = \left[w_1(f) \;\cdots\; w_i(f) \;\cdots\; w_N(f)\right]^T, \tag{8}$$

$(\cdot)^H$ is the complex conjugate transpose operator, and $\mathbf{P}(f)$ and $\mathbf{Q}(f)$ are the cross-spectral density matrices of the signal and noise, respectively. In practical speech processing applications the form of the signal and noise cross-spectral density matrices is generally unknown and must be estimated, either from mathematical models (fixed beamformers) or from the statistics of the multichannel inputs (adaptive beamformers). Superdirective beamformers are calculated based on assumed mathematical models for the $\mathbf{P}(f)$ and $\mathbf{Q}(f)$ matrices.
When the desired signal is known to emanate from a single source at location $(r_s, \theta_s, \phi_s)$, the signal cross-spectral matrix reduces to the outer product of the source propagation vector, $\mathbf{P}(f) = \mathbf{d}(f, r_s, \theta_s, \phi_s)\,\mathbf{d}(f, r_s, \theta_s, \phi_s)^H$, and the array gain can be expressed as

$$G(f) = \frac{\left|\mathbf{w}(f)^H \mathbf{d}(f, r_s, \theta_s, \phi_s)\right|^2}{\mathbf{w}(f)^H \mathbf{Q}(f)\, \mathbf{w}(f)}, \tag{9}$$

where $\mathbf{d}(f, r, \theta, \phi)$ is the propagation vector for the desired source, as defined in Eq. (6).
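A diffuse (spherically isotropic) noise field is often a good approximation for many practical situations, particularly in reverberant closed spaces, such as in a car or an office [7, 8]. For diffuse noise, the noise cross-spectral density matrix $\mathbf{Q}$ can be formulated as

$$\mathbf{Q}(f) = \frac{1}{4\pi} \int_0^{2\pi}\!\!\int_0^{\pi} \mathbf{d}(f, \theta, \phi)\, \mathbf{d}(f, \theta, \phi)^H \sin\phi \; d\phi \, d\theta, \tag{10}$$

where $\mathbf{d}(f, \theta, \phi)$ is the propagation vector of a far-field noise source ($r \gg 2L^2/\lambda$) in direction $(\theta, \phi)$.

The superdirectivity problem is thus formulated as

$$\max_{\mathbf{w}(f)} \; \frac{\left|\mathbf{w}(f)^H \mathbf{d}(f, r_s, \theta_s, \phi_s)\right|^2}{\mathbf{w}(f)^H \mathbf{Q}(f)\, \mathbf{w}(f)}. \tag{11}$$
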
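For unit-magnitude far-field steering vectors, the integral in Eq. (10) has the well-known closed form $Q_{ij}(f) = \sin(k\ell_{ij})/(k\ell_{ij})$, where $k = 2\pi f/c$ and $\ell_{ij}$ is the distance between microphones i and j. The sketch below uses that closed form together with Eq. (9); the array geometry and function names are hypothetical, and the check only confirms that a delay-and-sum weighting gives an array gain of N in white noise.

```python
import numpy as np


def diffuse_noise_matrix(f, mic_positions, c=340.0):
    """Closed form of Eq. (10) for unit-magnitude far-field steering vectors.

    Q_ij = sin(k l_ij) / (k l_ij), with k = 2 pi f / c and l_ij the spacing
    between microphones i and j.
    """
    diff = mic_positions[:, None, :] - mic_positions[None, :, :]
    spacing = np.linalg.norm(diff, axis=-1)
    return np.sinc(2.0 * f * spacing / c)     # np.sinc(x) = sin(pi x) / (pi x)


def array_gain(w, d, Q):
    """Eq. (9): |w^H d|^2 / (w^H Q w)."""
    numerator = np.abs(np.vdot(w, d)) ** 2    # np.vdot conjugates its first argument
    denominator = np.real(np.vdot(w, Q @ w))
    return numerator / denominator


# Hypothetical check: 5 microphones, 2.5 cm apart, broadside far-field source (d = 1).
mics = np.array([[0.025 * i, 0.0, 0.0] for i in range(5)])
d = np.ones(5, dtype=complex)
w_ds = d / 5                                  # delay-and-sum weights, w^H d = 1
Q = diffuse_noise_matrix(300.0, mics)
print("white-noise gain:", array_gain(w_ds, d, np.eye(5)))
print("diffuse-noise gain:", array_gain(w_ds, d, Q))
```

At 300 Hz the diffuse-noise gain of this closely spaced delay-and-sum array is only slightly above 1, which is the low frequency weakness that superdirective weights address.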
`By using a spherical propagation model to formulate the propagation
`vector, d, the standard superdirective formulation can be optimized for a near-
`field source [9, 4]. As such, the only difference in the calculation of the standard
`and near-field superdirective channel filters is the form of the propagation
`vector, d. For a near-field source, the assumption of plane wave (far-field)
`propagation leads to errors in the array response to the desired signal due
`to curvature of the direct wavefront. A thorough discussion of the use of a
`near-field model for superdirective microphone arrays is given by Ryan and
`Goubran [9].
Cox [10] gives the general superdirective filter solution subject to
1. L linear constraints, $\mathbf{C}(f)^H \mathbf{w}(f) = \mathbf{g}(f)$ (explained below); and
2. a constraint on the white noise gain, $\mathbf{w}(f)^H \mathbf{w}(f) = \delta^{-2}$, where $\delta^2$ is the desired white noise gain,
as

$$\mathbf{w}(f) = \left[\mathbf{Q}(f) + \epsilon \mathbf{I}\right]^{-1} \mathbf{C}(f) \left\{\mathbf{C}(f)^H \left[\mathbf{Q}(f) + \epsilon \mathbf{I}\right]^{-1} \mathbf{C}(f)\right\}^{-1} \mathbf{g}(f), \tag{12}$$

where $\epsilon$ is a Lagrange multiplier that is iteratively adjusted to satisfy the white noise gain constraint. The white noise gain is the array gain for spatially white (incoherent) noise, that is, for $\mathbf{Q}(f) = \mathbf{I}$. A constraint on the white noise gain is necessary because an unconstrained superdirective solution will in fact result in significant gain to any incoherent noise, particularly at low frequencies. Cox [10] states that adding a small amount to each diagonal matrix element prior to inversion is in fact the optimum means of solving this problem.
A study of the relationship between the multiplier $\epsilon$ and the desired white noise gain $\delta^2$ shows that the white noise gain increases monotonically with increasing $\epsilon$. One possible means of obtaining the desired value of $\epsilon$ is thus an iterative technique employing a binary search algorithm between a specified minimum and maximum value for $\epsilon$. The computational expense of the iterative procedure is not critical, as the beamformer filters depend only on the source location and array geometry, and thus need only be calculated once for a given configuration.
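Equation (12) can be evaluated with linear solves rather than explicit matrix inverses. The following NumPy sketch is a minimal, hypothetical implementation for a fixed $\epsilon$; the random Q, C, and g are placeholders used only to show that the constraints $\mathbf{C}^H\mathbf{w} = \mathbf{g}$ are met.

```python
import numpy as np


def constrained_superdirective(Q, C, g, eps):
    """Eq. (12): w = R^{-1} C (C^H R^{-1} C)^{-1} g, with R = Q + eps I.

    Q: (N, N) noise cross-spectral matrix; C: (N, L) constraint matrix;
    g: (L,) constraint values; eps: diagonal loading (the Lagrange multiplier).
    """
    R = Q + eps * np.eye(Q.shape[0])
    Rinv_C = np.linalg.solve(R, C)            # R^{-1} C, shape (N, L)
    gram = C.conj().T @ Rinv_C                # C^H R^{-1} C, shape (L, L)
    return Rinv_C @ np.linalg.solve(gram, g)  # shape (N,)


# Hypothetical example: random Hermitian positive definite Q and two constraints.
rng = np.random.default_rng(0)
N, L = 6, 2
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Q = A @ A.conj().T / N + np.eye(N)
C = rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))
g = np.array([1.0, 0.0], dtype=complex)
w = constrained_superdirective(Q, C, g, eps=0.01)
print(np.round(C.conj().T @ w, 10))           # reproduces g up to rounding
```

Using `np.linalg.solve` avoids forming $[\mathbf{Q} + \epsilon\mathbf{I}]^{-1}$ explicitly, which is numerically preferable when $\mathbf{Q}$ is nearly singular, as it is for closely spaced microphones at low frequencies.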
The constraint matrix, $\mathbf{C}(f)^H$, is of order $L \times N$, where there are L linear constraints being applied, and the vector $\mathbf{g}(f)$ is a length-L column vector of constraining values. The constraints generally include one specifying unity response for the desired signal, $\mathbf{d}(f)^H \mathbf{w}(f) = 1$, and where this is the sole constraint the above solution can be simplified by substituting $\mathbf{C}(f) = \mathbf{d}(f)$ and $\mathbf{g}(f) = 1$, giving

$$\mathbf{w}(f) = \frac{\left[\mathbf{Q}(f) + \epsilon \mathbf{I}\right]^{-1} \mathbf{d}(f)}{\mathbf{d}(f)^H \left[\mathbf{Q}(f) + \epsilon \mathbf{I}\right]^{-1} \mathbf{d}(f)}. \tag{13}$$

Once the optimal filters $\mathbf{w}(f)$ have been calculated, the near-field superdirective beamformer output is calculated as

$$y_u(f) = \mathbf{w}(f)^H \mathbf{x}(f), \tag{14}$$

where $\mathbf{x}(f)$ is the N-channel input column vector

$$\mathbf{x}(f) = \left[x_1(f) \;\cdots\; x_i(f) \;\cdots\; x_N(f)\right]^T. \tag{15}$$
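The binary search over $\epsilon$ can be sketched as follows for the single unity constraint of Eq. (13). With $\mathbf{d}^H\mathbf{w} = 1$ the white noise gain equals $1/(\mathbf{w}^H\mathbf{w}) = \delta^2$, so the search adjusts $\epsilon$ until that value reaches the target. The search bounds, iteration count, five-microphone geometry, broadside far-field steering vector, and the diffuse-noise matrix (the sinc closed form of Eq. (10)) are illustrative assumptions, not values from the paper.

```python
import numpy as np


def superdirective_weights(Q, d, eps):
    """Eq. (13): w = (Q + eps I)^{-1} d / (d^H (Q + eps I)^{-1} d)."""
    Rinv_d = np.linalg.solve(Q + eps * np.eye(len(d)), d)
    return Rinv_d / np.vdot(d, Rinv_d)


def white_noise_gain(w, d):
    """Array gain for Q = I: |w^H d|^2 / (w^H w)."""
    return np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w))


def weights_for_wng(Q, d, target_wng, eps_min=1e-8, eps_max=1e2, iters=60):
    """Binary search on log10(eps), relying on the WNG growing monotonically with eps."""
    lo, hi = np.log10(eps_min), np.log10(eps_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        w = superdirective_weights(Q, d, 10.0 ** mid)
        if white_noise_gain(w, d) < target_wng:
            lo = mid          # not enough loading yet
        else:
            hi = mid          # target met; try less loading
    eps = 10.0 ** hi          # smallest tested loading that meets the target
    return superdirective_weights(Q, d, eps), eps


# Hypothetical example: 5 microphones 2.5 cm apart, diffuse noise at 300 Hz,
# broadside far-field steering vector, target WNG of 1 (0 dB).
mics = 0.025 * np.arange(5)
spacing = np.abs(mics[:, None] - mics[None, :])
Q = np.sinc(2.0 * 300.0 * spacing / 340.0)
d = np.ones(5, dtype=complex)
w, eps = weights_for_wng(Q, d, target_wng=1.0)
print(f"eps = {eps:.3e}, WNG = {white_noise_gain(w, d):.4f}, w^H d = {np.vdot(w, d).real:.4f}")
```

Searching on $\log_{10}\epsilon$ rather than $\epsilon$ itself keeps the bisection effective when the required loading spans several orders of magnitude across frequency.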
`
`3.2. Near-field Compensation Unit
`The first element in the adaptive path of standard GSC is the blocking
`matrix [6]. Its purpose is to block the desired signal from the adaptive noise
`estimate. To ensure complete blocking, the desired signal must both be time
`aligned and have equal amplitudes across all channels. If this is the case,
`cancellation occurs if each row of the blocking matrix sums to zero, and all rows
`are linearly independent.
For a near-field desired source, a near-field compensation must first be applied to the input channels prior to blocking, in order to align the desired signal on all channels. To ensure full cancellation we need to compensate for both phase misalignment and amplitude scaling of the desired signal across sensors. We define the diagonal matrix

$$\mathbf{D}(f) = \left[\operatorname{diag}(\mathbf{d}(f))\right]^{-1}, \tag{16}$$

where $\mathbf{d}(f)$ is the near-field propagation vector from Eq. (6). In this paper we define the diagonal operator, $\operatorname{diag}(\cdot)$, to produce a diagonal matrix from a vector parameter. Conversely, if invoked with a matrix parameter, it produces a row vector corresponding to the matrix diagonal. The near-field compensation can be applied as

$$\mathbf{x}'(f) = \mathbf{D}(f)\, \mathbf{x}(f). \tag{17}$$
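A small numerical check of Eqs. (16) and (17), under a hypothetical four-microphone geometry and a single frequency bin: after compensation, the desired-only signal is identical on every channel, which is the condition the blocking matrix relies on.

```python
import numpy as np


def compensation_matrix(d):
    """Eq. (16): D(f) = [diag(d(f))]^{-1}, which is diagonal with entries 1 / d_i."""
    return np.diag(1.0 / d)


# Hypothetical geometry: 4 microphones on the x-axis (reference at the origin),
# desired source 0.6 m in front of the array centre, one frequency bin at 1 kHz.
mics = np.array([[0.05 * i, 0.0, 0.0] for i in range(4)])
src = np.array([0.075, 0.6, 0.0])
f, c = 1000.0, 340.0
dist = np.linalg.norm(src - mics, axis=1)
d = (dist[0] / dist) * np.exp(-2j * np.pi * f * (dist - dist[0]) / c)   # Eqs. (3), (4), (6)

x1 = 0.3 - 0.7j                            # desired signal at the reference microphone
x = d * x1                                 # Eq. (5): desired-only microphone signals
x_comp = compensation_matrix(d) @ x        # Eq. (17)
print(np.round(x, 4))                      # not aligned across all channels
print(np.round(x_comp, 4))                 # every channel now equals x1
```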
Once this near-field compensation has been performed, a standard GSC blocking matrix can be employed to block the desired signal from the adaptive path.
The inclusion of this compensation unit is critical for a near-field desired signal. Without compensation for both phase and amplitude differences between sensors, blocking of the desired signal is not ensured, leading to signal cancellation at the output. The near-field compensation ensures that a true null exists in the beam-pattern of each blocking matrix row in the direction and at the distance corresponding to the desired source. To illustrate, Fig. 4 shows the directivity pattern at 2 kHz for the first row in the blocking matrix using the array shown in Fig. 5, with the desired source directly in front of the center microphone at a distance of 0.6 m. The figure shows the compensated response in the far- and near-fields, as well as the uncompensated near-field response. It is clear that the uncompensated system will allow a high degree of signal leakage into the adaptive path, as it blocks sources at other locations rather than the desired signal.

[Figure 4: beam-pattern of the first blocking matrix row versus direction of arrival (0 to 180 deg) for the far field, the uncompensated near field (r = 0.6 m), and the compensated near field (r = 0.6 m).]

FIG. 4. Comparison of blocking matrix row beam-patterns.
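The effect shown in Fig. 4 can be reproduced qualitatively with a few lines of NumPy. This sketch uses a hypothetical five-microphone line centred on the origin rather than the array of Fig. 5, evaluates the first blocking matrix row (the difference of channels 1 and 2) on a 0.6 m circle at 2 kHz, and compares the response with and without D(f).

```python
import numpy as np


def propagation_vector(f, src, mics, c=340.0):
    """Eqs. (2)-(4), (6): near-field propagation vector; row 0 of mics is the reference."""
    dist = np.linalg.norm(src - mics, axis=1)
    return (dist[0] / dist) * np.exp(-2j * np.pi * f * (dist - dist[0]) / c)


# Hypothetical 5-microphone line and a desired source 0.6 m in front of its centre.
mics = np.array([[x, 0.0, 0.0] for x in (-0.2, -0.1, 0.0, 0.1, 0.2)])
f = 2000.0
d = propagation_vector(f, np.array([0.0, 0.6, 0.0]), mics)
D = np.diag(1.0 / d)                       # Eq. (16)

b1 = np.zeros(len(d))
b1[0], b1[1] = 1.0, -1.0                   # first row of B^H for the Griffiths-Jim matrix

uncomp, comp = [], []
for t in np.deg2rad(np.arange(0, 181)):    # 0..180 degrees on a 0.6 m circle
    v = propagation_vector(f, 0.6 * np.array([np.cos(t), np.sin(t), 0.0]), mics)
    uncomp.append(abs(b1 @ v))             # row response without compensation
    comp.append(abs(b1 @ (D @ v)))         # row response after D(f)
uncomp, comp = np.array(uncomp), np.array(comp)
print("at the desired source (90 deg):", uncomp[90], comp[90])
```

Only the compensated row places its null exactly at the desired source; the uncompensated row passes a substantial fraction of the desired signal into the adaptive path.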
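
[Figure 5: the 11-microphone array. Microphones 1 to 9 form a broadside line with spacing labels of 10 cm, 5 cm, and 2.5 cm; microphones 10 and 11 sit 15 cm behind the end microphones. The positions of the desired source and the localised noise source are marked.]

FIG. 5. Experimental configuration.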
3.3. Blocking Matrix and Adaptive Noise Canceling Filter
The blocking matrix and adaptive noise canceling filters are taken from the standard GSC technique [6]. The order of the blocking matrix is $N \times (N - L)$, where there are L constraints applied in the fixed upper path beamformer. Generally only a unity constraint on the desired signal is specified, and the standard $N \times (N - 1)$ Griffiths-Jim blocking matrix is used:

$$\mathbf{B} = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
-1 & 1 & \cdots & 0 \\
0 & -1 & \ddots & \vdots \\
\vdots & \vdots & \ddots & 1 \\
0 & 0 & \cdots & -1
\end{bmatrix}. \tag{18}$$

The output of the blocking matrix is calculated as

$$\mathbf{x}''(f) = \mathbf{B}^H \mathbf{x}'(f), \tag{19}$$
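The Griffiths-Jim matrix of Eq. (18) and the blocking operation of Eq. (19) can be sketched as below; the five-channel size is arbitrary. Each column of B (each row of $\mathbf{B}^H$) sums to zero, so any signal that is identical on all compensated channels is removed.

```python
import numpy as np


def griffiths_jim_blocking_matrix(n_mics):
    """Eq. (18): N x (N - 1) matrix with +1 / -1 on consecutive rows of each column."""
    B = np.zeros((n_mics, n_mics - 1))
    cols = np.arange(n_mics - 1)
    B[cols, cols] = 1.0
    B[cols + 1, cols] = -1.0
    return B


B = griffiths_jim_blocking_matrix(5)
x_comp = np.full(5, 0.3 - 0.7j)          # compensated desired signal: identical on every channel
x_blocked = B.conj().T @ x_comp          # Eq. (19): pairwise differences x'_i - x'_{i+1}
print(B)
print(np.round(x_blocked, 12))           # all zeros: the desired signal is blocked
```

The $N - 1$ columns are linearly independent, satisfying the second blocking condition stated in Section 3.2.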

where $\mathbf{x}''(f)$ is an $(N-1)$-length column vector. Defining the $(N-1)$-length adaptive filter column vector as

$$\mathbf{a}(f) = \left[a_1(f) \;\cdots\; a_i(f) \;\cdots\; a_{N-1}(f)\right]^T, \tag{20}$$

the output of the lower path is given as

$$y_l(f) = \mathbf{a}(f)^H \mathbf{x}''(f). \tag{21}$$

The NFAB output is then calculated from the upper and lower path outputs as

$$y(f) = y_u(f) - y_l(f), \tag{22}$$

and the adaptive filters are updated using the standard unconstrained LMS algorithm

$$\mathbf{a}_{k+1}(f) = \mathbf{a}_k(f) + \mu\, \mathbf{x}''_k(f)\, y_k^*(f), \tag{23}$$

where $\mu$ is the adaptation step size, $(\cdot)^*$ denotes complex conjugation, and k denotes the current frame.
`
3.4. Summary of Technique
In summary, the proposed NFAB technique is characterized by the series of equations

$$\begin{aligned}
y_u(f) &= \mathbf{w}(f)^H \mathbf{x}(f) && \text{(24a)}\\
\mathbf{x}''(f) &= \mathbf{B}^H \mathbf{D}(f)\, \mathbf{x}(f) && \text{(24b)}\\
y_l(f) &= \mathbf{a}(f)^H \mathbf{x}''(f) && \text{(24c)}\\
y(f) &= y_u(f) - y_l(f) && \text{(24d)}\\
\mathbf{a}_{k+1}(f) &= \mathbf{a}_k(f) + \mu\, \mathbf{x}''_k(f)\, y_k^*(f), && \text{(24e)}
\end{aligned}$$

where all terms have been defined in the preceding discussion.
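To show how Eqs. (24a) to (24e) interact, the sketch below runs them on a single frequency bin with simulated data. It is a hypothetical illustration, not the authors' implementation: the five-microphone line, the 500 Hz bin, the interferer at 124 degrees and 2.7 m, the step size, and especially the upper-path filter (a simple compensated average with $\mathbf{w}^H\mathbf{d} = 1$ standing in for the NFSD filter of Section 3.1) are all assumptions.

```python
import numpy as np


def propagation_vector(f, src, mics, c=340.0):
    """Near-field propagation vector, Eqs. (2)-(4) and (6); row 0 of mics is the reference."""
    dist = np.linalg.norm(src - mics, axis=1)
    return (dist[0] / dist) * np.exp(-2j * np.pi * f * (dist - dist[0]) / c)


def griffiths_jim(n):
    """Eq. (18): N x (N - 1) Griffiths-Jim blocking matrix."""
    B = np.zeros((n, n - 1))
    cols = np.arange(n - 1)
    B[cols, cols] = 1.0
    B[cols + 1, cols] = -1.0
    return B


def nfab_bin(X, w, d, mu):
    """Apply Eqs. (24a)-(24e) to one frequency bin; X holds one N-channel frame per row."""
    BhD = griffiths_jim(X.shape[1]).T @ np.diag(1.0 / d)   # B^H D(f); B is real
    a = np.zeros(X.shape[1] - 1, dtype=complex)
    y = np.empty(X.shape[0], dtype=complex)
    for k, x in enumerate(X):
        y_u = np.vdot(w, x)                  # (24a) upper path
        x2 = BhD @ x                         # (24b) compensated, blocked channels
        y_l = np.vdot(a, x2)                 # (24c) lower path
        y[k] = y_u - y_l                     # (24d) output
        a = a + mu * x2 * np.conj(y[k])      # (24e) LMS update
    return y


def complex_noise(rng, *shape):
    """Unit-power circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# Hypothetical scenario for a single 500 Hz bin: 5-microphone line, desired source
# 0.6 m in front of the centre, coherent interferer at 124 degrees and 2.7 m.
rng = np.random.default_rng(0)
f, K = 500.0, 5000
mics = np.array([[x, 0.0, 0.0] for x in (-0.2, -0.1, 0.0, 0.1, 0.2)])
d = propagation_vector(f, np.array([0.0, 0.6, 0.0]), mics)
ang = np.deg2rad(124.0)
v = propagation_vector(f, 2.7 * np.array([np.cos(ang), np.sin(ang), 0.0]), mics)

s, n = complex_noise(rng, K), complex_noise(rng, K)
X = np.outer(s, d) + np.outer(n, v) + 1e-3 * complex_noise(rng, K, len(d))
w = 1.0 / np.conj(d) / len(d)    # stand-in upper path with w^H d = 1 (not the NFSD filter)
y = nfab_bin(X, w, d, mu=0.005)

y_u = X @ np.conj(w)             # upper path output alone, for comparison
tail = slice(-1000, None)
print("upper-path error power:", np.mean(np.abs(y_u[tail] - s[tail]) ** 2))
print("NFAB error power:      ", np.mean(np.abs(y[tail] - s[tail]) ** 2))
```

Because the compensated desired signal is removed exactly by the blocking matrix, the lower path sees only noise, so the LMS filters can cancel the interferer without distorting the desired signal.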
`
4. EXPERIMENTAL CONFIGURATION

For the experimental evaluation in this paper, we used the 11-element array shown in Fig. 5. The array consists of a nine-element broadside array, with an additional two microphones situated directly behind the end microphones. The total array is 40 cm wide and 15 cm deep in the horizontal plane. The broadside microphones are arranged according to a standard broadband subarray design, where different subarrays are used for different frequency ranges for the fixed upper path beamformer. The two endfire microphones are included for use by the near-field superdirective beamformer in the low frequency range. The four subarrays are thus
• f < 1 kHz: microphones 1-11;
• 1 kHz < f < 2 kHz: microphones 1, 2, 5, 8, and 9;
• 2 kHz < f < 4 kHz: microphones 2, 3, 5, 7, and 8; and
• 4 kHz < f < 8 kHz: microphones 3-7.
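For a frequency-domain implementation, the subarray list can be turned into a per-bin lookup. The sketch below is hypothetical: the 16 kHz sampling rate and 512-point FFT are assumptions, and band edges are assigned to the upper band because the text does not say which subarray owns exactly 1, 2, or 4 kHz.

```python
# Subarray table from the list above (microphone numbers as in Fig. 5).
SUBARRAYS = [
    (0.0, 1000.0, list(range(1, 12))),       # f < 1 kHz: microphones 1-11
    (1000.0, 2000.0, [1, 2, 5, 8, 9]),
    (2000.0, 4000.0, [2, 3, 5, 7, 8]),
    (4000.0, 8000.0, [3, 4, 5, 6, 7]),       # microphones 3-7
]


def subarray_for(freq_hz):
    """Return the microphones used by the fixed upper-path beamformer at freq_hz."""
    for lo, hi, mics in SUBARRAYS:
        if lo <= freq_hz < hi:               # band edges go to the higher band
            return mics
    raise ValueError("frequency outside the 0-8 kHz design range")


def bin_frequencies(n_fft, fs):
    """Centre frequency of each non-negative FFT bin."""
    return [k * fs / n_fft for k in range(n_fft // 2 + 1)]


# Hypothetical framing: 16 kHz sampling, 512-point FFT (bins spaced 31.25 Hz apart).
plan = {k: subarray_for(f) for k, f in enumerate(bin_frequencies(512, 16000.0)) if f < 8000.0}
print(plan[10], plan[40], plan[200])
```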
The array was situated in a computer room, with the sound source locations shown in Fig. 5. The two sound sources were
1. the desired speaker, situated 60 cm from the center microphone, directly in front of the array; and
2. a localized noise source at an angle of 124° and a distance of 270 cm from the array.
Impulse responses of the acoustic path between each source and microphone were measured from multichannel recordings made in the room with the array, using the maximum length sequence technique detailed in Rife and Vanderkooy [11]. As the impulse responses were calculated from real recordings made simultaneously across all input channels, they take into account the real acoustic properties of the room and the array. The multichannel desired speech and localized noise microphone inputs were then generated by convolving the original single-channel speech and noise signals with these impulse responses.
In addition, a real multichannel background noise recording of normal operating conditions was made in the room with other workers present. This recording is referred to in the experiments as the ambient noise signal. It consists mainly of computer noise, a variable level of background speech, and noise from an air-conditioning unit. Being approximately diffuse, the ambient noise effectively represents a diffuse noise field, while the localized noise represents a coherent noise source. In this paper, we specify the levels of the two noise sources independently, as the signal to ambient-noise ratio (SANR) and the signal to localized-noise ratio (SLNR). These values are calculated as the average segmental SNR from the speech and noise input, as measured at the center microphone of the array.
In this way, realistic multichannel input signals can be simulated for specified levels of ambient and localized noise. As well as facilitating the generation of different noise conditions, simulating the multichannel inputs using the impulse response method is more practical than making real recordings for speech recognition experiments, as existing single-channel speech corpora may be used.
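The simulation procedure (convolution with impulse responses, then setting a noise level by average segmental SNR at one microphone) can be sketched as follows. Everything here is a stand-in: random decaying impulse responses replace the measured ones, white noise replaces speech, and the 256-sample frame length and absence of frame thresholding are assumptions, since the text does not give these details.

```python
import numpy as np


def segmental_snr_db(clean, noise, frame_len=256):
    """Average of per-frame SNRs in dB (no frame thresholding or clamping)."""
    n_frames = min(len(clean), len(noise)) // frame_len
    snrs = []
    for m in range(n_frames):
        seg = slice(m * frame_len, (m + 1) * frame_len)
        snrs.append(10.0 * np.log10(np.sum(clean[seg] ** 2) / np.sum(noise[seg] ** 2)))
    return float(np.mean(snrs))


def simulate_inputs(signal, impulse_responses):
    """Convolve one single-channel signal with an impulse response per microphone."""
    return np.stack([np.convolve(signal, h) for h in impulse_responses])


def scale_noise(clean_mc, noise_mc, target_db, ref_mic, frame_len=256):
    """Scale every noise channel by one gain so the reference microphone hits target_db.

    Multiplying the noise by g lowers each frame SNR by 20 log10(g) dB, so the
    average segmental SNR moves by the same amount.
    """
    current = segmental_snr_db(clean_mc[ref_mic], noise_mc[ref_mic], frame_len)
    return noise_mc * 10.0 ** ((current - target_db) / 20.0)


# Hypothetical data: random decaying impulse responses and noise stand-ins for speech.
rng = np.random.default_rng(1)
taps = np.exp(-np.arange(64) / 8.0)
speech_irs = [rng.standard_normal(64) * taps for _ in range(3)]
noise_irs = [rng.standard_normal(64) * taps for _ in range(3)]
clean_mc = simulate_inputs(rng.standard_normal(8000), speech_irs)
noise_mc = simulate_inputs(rng.standard_normal(8000), noise_irs)
noise_mc = scale_noise(clean_mc, noise_mc, target_db=10.0, ref_mic=1)
noisy_mc = clean_mc + noise_mc            # multichannel input at 10 dB segmental SNR on mic 1
print(noisy_mc.shape, round(segmental_snr_db(clean_mc[1], noise_mc[1]), 6))
```

With real speech, silent frames would make some per-frame SNRs extremely negative, so practical segmental SNR measures usually threshold or clamp frames; that refinement is omitted here.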
`
5. EXPERIMENTAL RESULTS

This section presents the results of the experimental evaluation. The proposed NFAB technique is compared to a conventional fixed filter-sum beamformer, a fixed near-field superdirective beamformer, and a conventional GSC adaptive beamformer. These beamformers are specified in Table 1.
The techniques are first assessed in terms of the directivity pattern in order to demonstrate the advantage of the proposed NFAB over conventional beamforming techniques, particularly at low frequencies. Following this, the techniques are evaluated for speech enhancement in terms of the improvement in signal to noise ratio and the log area ratio. Finally, the techniques are compared in a hands-free speech recognition task in noisy conditions using the TIDIGITS database [12].
`
5.1. Directivity Analysis
As has been stated, the main objective of the proposed technique is to produce an adaptive beamformer that exhibits good low frequency performance for near-field speech sources. To assess how well the proposed technique achieves this objective, in this section we analyze the horizontal directivity pattern. The directivity of a filter-sum beamformer is expressed in matrix notation as

$$h(f, r, \theta, \phi) = \mathbf{w}_0(f)^H \mathbf{d}(f, r, \theta, \phi), \tag{25}$$

where $\mathbf{w}_0(f)$ is the length-N channel filter vector

$$\mathbf{w}_0(f) = \left[w_{0,1}(f) \;\cdots\; w_{0,i}(f) \;\cdots\; w_{0,N}(f)\right]^T. \tag{26}$$
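Equation (25) can be evaluated over a ring of candidate source positions to produce a horizontal near-field directivity pattern. In this hypothetical sketch the five-microphone geometry is illustrative, the pattern is sampled at the desired-source distance around the array centre, and the FS filter listed in Table 1 is normalized by N so that the response toward the desired source is 0 dB.

```python
import numpy as np


def propagation_vector(f, point, mics, c=340.0):
    """Near-field propagation vector of Eq. (6) for a source at `point` (row 0 of mics = reference)."""
    dist = np.linalg.norm(point - mics, axis=1)
    return (dist[0] / dist) * np.exp(-2j * np.pi * f * (dist - dist[0]) / c)


def directivity(w0, f, points, mics):
    """Eq. (25): h = w0^H d evaluated at each candidate source position."""
    return np.array([np.vdot(w0, propagation_vector(f, p, mics)) for p in points])


# Hypothetical 5-microphone line centred on the origin; desired source 0.6 m in front.
mics = np.array([[x, 0.0, 0.0] for x in (-0.2, -0.1, 0.0, 0.1, 0.2)])
f, r = 300.0, 0.6
d_s = propagation_vector(f, np.array([0.0, r, 0.0]), mics)
w_fs = np.conj(1.0 / d_s)                  # FS filter: [diag(D(f))]^H

theta = np.deg2rad(np.arange(0, 360, 5))   # horizontal plane, points at distance r
points = r * np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
pattern_db = 20 * np.log10(np.abs(directivity(w_fs, f, points, mics)) / len(mics))
print(np.round(pattern_db[::9], 2))        # every 45 degrees
```

Swapping `w_fs` for superdirective weights from Eq. (13) gives the corresponding NFSD pattern for comparison.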
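
TABLE 1
Beamforming Techniques in Evaluation

| Technique | Description | Filters |
|---|---|---|
| FS | Conventional FS beamformer | $\mathbf{w}_0(f) = [\operatorname{diag}(\mathbf{D}(f))]^H$ |
| NFSD | Near-field superdirective beamformer | $\mathbf{w}_0(f) = \mathbf{w}(f)$ |
| GSC | GSC system with FS fixed upper path beamformer | $\mathbf{w}_0(f) = [\operatorname{diag}(\mathbf{D}(f))]^H - \mathbf{D}(f)^H \mathbf{B}\mathbf{a}(f)$ |
| NFAB | Near-field adaptive beamformer | $\mathbf{w}_0(f) = \mathbf{w}(f) - \mathbf{D}(f)^H \mathbf{B}\mathbf{a}(f)$ |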
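Table 1 expresses each system as one equivalent channel filter. A minimal NumPy check, using a random stand-in for $\mathbf{d}(f)$ and arbitrary adaptive filters (both hypothetical), confirms that writing the lower path as $\mathbf{D}(f)^H\mathbf{B}\mathbf{a}(f)$ reproduces Eqs. (24a) to (24d) and leaves the response toward the desired source unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
# Hypothetical near-field steering vector for the desired source (any nonzero entries work).
d = (1.0 + 0.1 * rng.standard_normal(N)) * np.exp(1j * rng.uniform(-np.pi, np.pi, N))
d[0] = 1.0                                   # reference microphone: alpha_1 = 1, tau_1 = 0

D = np.diag(1.0 / d)                         # Eq. (16)
B = np.zeros((N, N - 1))                     # Griffiths-Jim blocking matrix, Eq. (18)
B[np.arange(N - 1), np.arange(N - 1)] = 1.0
B[np.arange(1, N), np.arange(N - 1)] = -1.0

w = np.conj(np.diag(D))                      # FS upper path: [diag(D(f))]^H
a = rng.standard_normal(N - 1) + 1j * rng.standard_normal(N - 1)   # arbitrary adaptive filters
w_gsc = w - D.conj().T @ B @ a               # equivalent filter of the GSC row in Table 1

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)           # any input snapshot
y_paths = np.vdot(w, x) - np.vdot(a, B.conj().T @ D @ x)           # Eqs. (24a)-(24d)
print(np.vdot(w_gsc, x), y_paths)            # same output either way
print(np.vdot(w, d), np.vdot(w_gsc, d))      # lower path adds nothing at the desired source
```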

[Figure 6: polar directivity patterns at 300 Hz; (a) FS, (b) NFSD.]

FIG. 6. Upper path directivity pattern at 300 Hz.
`
5.1.1. Upper path directivity. First, we seek to demonstrate the directivity improvement that NFSD achieves at low frequencies compared to a conventional filter-sum (FS) beamformer. For the FS beamformer, a common solution is to choose $\mathbf{w}_0(f) = [\operatorname{diag}(\mathbf{D}(f))]^H$. This ensures that the desired signal is aligned for phase and amplitude across sensors using a spherical propagation model. For NFSD, we use the filter vector $\mathbf{w}(f)$ described in Section 3.1. Figure 6 shows the near-field directivity pattern at 300 Hz for the FS and NFSD beamformers. From these figures, it is clear that the NFSD technique results in greater directional discrimination at low frequencies than a conventional beamformer. At higher frequencies (f > 1 kHz), conventional beamformers offer reasonable directivity, and so the FS and NFSD techniques give comparable performance.
5.1.2. Lower path directivity. Second, we wish to demonstrate the effect of the noise canceling path. The directivity of the noise canceling filters can be obtained by using the channel filters $\mathbf{w}_0(f) = \mathbf{D}(f)^H \mathbf{B}\mathbf{a}(f)$. The blocking matrix and adaptive filters essentially implement a conventional (nonsuperdirective) beamformer that adaptively focuses on the major sources of noise. To examine the directivity of the lower path filters, the beamformer was run on an input speech signal with a white localized noise source (at the location shown in Fig. 5) added at an SLNR of 0 dB and a low level of ambient noise (SANR = 20 dB). The steady-state adaptive filter vector, $\mathbf{a}(f)$, was written to file for both the proposed NFAB technique and the conventional GSC beamformer. The near-field directivity patterns of the lower path filters are plotted in Figs. 7 and 8 for 300 and 5000 Hz, respectively. We see that the lower path adaptive filters for both beamformers converge to similar solutions in terms of directivity, producing a main lobe in the direction of the coherent noise source (approximately 124°; see Fig. 5), as well as a null in the location of the desired speaker. As expected, the directivity of the adaptive path is poor at low frequencies, as seen in Fig. 7.
5.1.3. Overall beamformer directivity. Finally, we examine the directivity pattern of the overall beamformer for the NFAB and conventional adaptive systems. The near-field directivity patterns at 300 Hz are shown in Fig. 9. We see that the directivity pattern of the NFAB system exhibits a true null in the direction and at the distance of the noise source, while the directivity of the conventional beamformer is too poor to significantly attenuate the noise at this frequency. At frequencies above 1 kHz the directivity performance of both techniques is comparable.

[Figure 7: polar directivity patterns at 300 Hz; (a) GSC Lower Path, (b) NFAB Lower Path.]

FIG. 7. Lower path directivity pattern at 300 Hz.
5.1.4. Summary of beamformer directivity. In terms of directivity, the proposed NFAB system:
• outperforms the conventional FS system in terms of low frequency performance and the ability to attenuate coherent noise sources,
`• outperforms the NFSD system due to the ability to attenuate co
