`
`SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENT
`
`By
`
`Abdul Wahab bin Abdul Rahman
`
`School of Applied Science,
`
`Nanyang Technological University,
`
`Nanyang Avenue,
`
`Singapore 639798
`
`First Year Report for the degree of PhD.
`
`In
`
`Applied Science
`
`/Abdul Wahab/PHD/wahab1.doc/Tuesday, July 22, 1997
`
`Page 1 of 29
`
`Petitioner Apple Inc.
`Ex. 1019, p. 1
`
`
`
`
`Abstract
`
The increasing demand for digital cellular telephony and other new services, including
multimedia communications, has prompted numerous studies on implementing algorithms
for low-rate speech coding below 4.8 kbits/s using the DSP processors available on the
market. In addition, there is a need to enhance speech quality that is degraded both by
road, engine and wind noise and by the echoes present on the near-end speaker side, all
of which affect the car phone input. All of these tasks must be achieved with a single
DSP chip if the system is to be cost-effective and power efficient, and thus widely
accepted. This dissertation research proposes:
`
1) To study both analytically and experimentally the above degradations.
2) To develop sound signal processing algorithms to combat these imperfections.
3) To address architectures for real-time implementation.
4) To implement them on a DSP platform using state-of-the-art devices and
   reconfigurable systems.
`
Firstly, a major impediment to speech quality in a vehicular chamber is the echo
generated by leakage of the far-end speaker's signal, which is mixed with the speech
from the near-end speaker and transmitted as a composite signal. The first task of the
proposed speech enhancer is to cancel these echoes adaptively. This necessitates the
inclusion of near-end speaker activity detection.
`
`
Secondly, in the vehicular hands-free cellular communication framework, it has been
observed that the degradation in the intelligibility and general quality of the cellular
speech due to the engine, road, and wind noise components is as disturbing as the
vehicular echoes. Hence, the second important task of the enhancer is to combat these
imperfections in the cellular speech.
`
This last point, in particular, makes a form of beamforming based on a microphone
array, followed by an adaptive filtering process, a conceptually sound candidate for our
speech enhancer. The simplest form of beamforming is delay-and-sum beamforming, which
compensates for the delay of the target signal and sums the signals in the beam so that
the target signals have the same phase while the interfering signals exhibit different
phases. The delay-and-sum beamforming technique will first be used to follow the genuine
speaker. The system will then adaptively cancel the noise coming from the interfering
speakers, the engine, the wind (especially critical when the windows of the vehicle are
down), and the road noise coming from other vehicles and the road itself. There are a
few studies in the literature on this approach for speech recognition in a hands-free
telephone set-up [1-7]. These imperfections will be looked into and a unified approach
developed to combat all of these different types of degradation.
`
In addition, the research will develop architectures for real-time implementation on a
single-chip DSP as part of the next generation of digital cellular phones operating at
4.8 kbits/s or less. At the time of this proposal, the preferred DSP platform is the
TMS320C4x family from Texas Instruments, Inc. However, the algorithms developed will be
reconfigurable.
`
`
`1. Introduction
`
It has been a matter of public debate for some time that vehicles in the future will
need to detect, process, and communicate significantly more information. This
communication will take place between the vehicle and the driver, among the people in
the vehicle, and between the vehicle and the outside world, including other vehicles,
the road itself, and the Advanced Traffic Management and Information Systems (ATMIS).
`
The driver and other passengers may want to communicate verbally with the outside
world, or to hold a conference call. These activities have traditionally been handled
by car phones or short-wave radios, where the underlying signal is the band-limited
voice-grade waveform. These signals are transmitted over a communication channel that
is heavily corrupted by echoes, both in the transmission link and inside the chamber of
the vehicle. In addition, there is natural and man-made noise from numerous sources, as
well as interfering signals from other channels, passengers, and the audio information
subsystem. It is commonly accepted that the next generation of car telephones will be
totally digital cellular and that the volume of applications will increase. However, a
number of ills will not go away and a speech processing system will be required to
tackle them. Some of the tasks for the research will be noise suppression, echo
cancellation, source localisation, speaker identification, speech coding, compression
and transmission by digital means.
`
This report discusses the spectral dissection of the various degradations in the
vehicular environment, followed by a proposal for a cost-effective model for the speech enhancer
`
`
system and an introduction to the re-configurable digital signal-processing concept. The
detailed discussion of the proposed speech enhancer system covers:
`
`1. Identifying the various man-made noises and categorising them into different
`
`optimal sub-bands so that noise cancellation and suppression can be accurately
`
`achieved.
`
`2. Handling of echoes from the near-end and the far-end speakers to adaptively cancel
`
`them.
`
`3. Genuine speaker identification and employing the beam former to follow the
`
`genuine speaker and then to adaptively cancel the noise coming from the interfering
`
`speaker, the engine, the wind and the road noise.
`
4. Since these noises can be categorised into optimal sub-bands, multi-rate signal
processing can be employed to improve the computing performance of the speech
enhancer.
`
Finally, the report presents the conclusions, a summary and the schedule of the proposed
speech enhancer system architecture for the PhD research.
`
`
`2. Spectra of Vehicular Disturbances
`
In order to justify the various components of the proposed system, it is appropriate
to observe the various ills mentioned above visually. To study the problem carefully and
to gather road data, a field test was performed. A compact van was equipped with a DAT
tape recorder and a low-cost low-pass microphone. There were two passengers to act as
interference sources in addition to the driver. A database of 40 minutes of recordings
under 16 different experimental conditions was collected by travelling along the city
streets and two expressways in Singapore for a number of hours. The data were captured
onto a hard disk using the speech I/O unit of a digital signal processing development
system. The sampling clock was set at 8,000 samples/s, which is the Nyquist rate once
the signal has been properly band-limited to the voice-grade service bandwidth of the
next generation digital cellular phones.
`
Figure 1 shows a single plot for a stationary vehicle with the ignition just started.
Each frame consists of 1,024 samples, and the peak at about the 50th frame shows the
maximum revolution of the engine during the initial start-up. Since the windows are all
up, the vehicle is not moving and no speakers are talking, the spectrum clearly reflects
the engine and air-conditioning noise, which dominates the 0 to 150 Hz range.
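The frame size and sampling rate quoted above fix the frequency resolution of the plots. The following is a minimal sketch of that arithmetic; the function names are illustrative and are not taken from the report, and nothing is assumed about how the frames were windowed or overlapped.

```python
# Frame and frequency-resolution arithmetic for the road recordings.
SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
FRAME_LENGTH = 1024     # samples per frame stated in the text


def bin_spacing_hz(sample_rate_hz, frame_length):
    """Spacing between adjacent DFT bins of one frame."""
    return sample_rate_hz / frame_length


def frame_duration_s(sample_rate_hz, frame_length):
    """Time span covered by one frame."""
    return frame_length / sample_rate_hz


def bins_up_to(freq_hz, sample_rate_hz, frame_length):
    """Number of DFT bins (including the DC bin) at or below freq_hz."""
    return int(freq_hz // bin_spacing_hz(sample_rate_hz, frame_length)) + 1


spacing = bin_spacing_hz(SAMPLE_RATE_HZ, FRAME_LENGTH)      # 7.8125 Hz per bin
duration = frame_duration_s(SAMPLE_RATE_HZ, FRAME_LENGTH)   # 0.128 s per frame
nyquist_hz = SAMPLE_RATE_HZ / 2                             # highest frequency shown
low_band_bins = bins_up_to(150, SAMPLE_RATE_HZ, FRAME_LENGTH)
```

With these figures, the 0 to 150 Hz region dominated by engine and air-conditioning noise occupies only the lowest 20 bins of each 1,024-sample frame.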
`
Figure 2 shows two plots for the vehicle moving along a minor road across speed
regulating strips spaced at varying distances. Notice the cyclic nature of the road
noise occurring between 0 and 100 Hz. The spectrum also indicates that the road noise
is concentrated at low frequencies, below 200 Hz.
`
`
`Figure 1. Spectrum of the engine noise in frequency ranges 0-1,000 Hz for a stationary
`
`vehicle during the initial ignition of the engine
`
Figure 2. Spectrum of the road noise in frequency ranges 0-1,000 Hz and 1,000 – 4,000
Hz for a vehicle moving across speed regulating strips.
`
`
The spectrum of the engine noise is presented in two plots in Figure 3 while the vehicle
is moving at a nominal speed of 60 km/h. The windows were rolled up and the chamber
was quiet. There was no other vehicle in the vicinity and it was not possible to
detect wind noise inside the vehicle. As can be seen from these two plots, the engine
noise does not have any effect above 200 Hz. It can be seen from Figures 1, 2 and 3 that
the engine noise should be very easily tackled.
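Because the engine noise is confined below about 200 Hz, one simple way to picture how it could be tackled is a fixed high-pass filter. The report does not name a filter, so the sketch below is only an assumption: a first-order (one-pole) high-pass with a 200 Hz cutoff, applied to synthetic tones at 50 Hz and 2,000 Hz.

```python
import math


def one_pole_highpass(samples, cutoff_hz, sample_rate_hz):
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def tone(freq_hz, num_samples, sample_rate_hz):
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]


FS = 8000           # sampling rate from the text
CUTOFF_HZ = 200.0   # assumed cutoff, taken from the 200 Hz limit observed above


def gain_at(freq_hz):
    """Steady-state amplitude gain of the filter for a pure tone."""
    x = tone(freq_hz, 4000, FS)
    y = one_pole_highpass(x, CUTOFF_HZ, FS)
    return rms(y[800:]) / rms(x[800:])   # skip the start-up transient


engine_gain = gain_at(50.0)      # tone inside the engine-noise band
speech_gain = gain_at(2000.0)    # tone well inside the speech band
```

A first-order filter gives only modest attenuation at 50 Hz while leaving the 2,000 Hz tone almost untouched; a practical design would likely use a steeper filter, and speech energy below the cutoff would be removed along with the noise.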
`
`Figure 3. Spectrum of the engine noise in frequency ranges 0-1,000 Hz and 200-1,000
`
`Hz for a vehicle moving at 60 km/h (windows rolled up and quiet inside the chamber.)
`
Figure 4 displays the spectra in three plots in the frequency ranges 0-1,000 Hz,
200-1,000 Hz, and 1,000-4,000 Hz. In this case, the vehicle is stationary with the
windows down.
`
`
Figure 4. The spectrum of the stationary engine noise, ambient wind noise, and
`
`interference from vehicles passing by. The windows are down and the speakers are
`
`silent. The frequency ranges are from 0-1,000 Hz, 200-1,000 Hz and 1,000-4,000 Hz,
`
`respectively.
`
`
There was a heavy vehicle moving at about 50 km/h and the levels of the ambient road
noise and the wind noise were rather significant. In addition to the very-low-frequency
components of the previous case, which represent the engine noise, we have two
additional spectral regions to consider. As can be seen from this figure, there is
considerable spectral content in the frequency range 200-400 Hz. We believe this comes
from the ambient wind noise, the wind generated by vehicles passing by, and the road
noise from tire friction on the pavement. Suppression of this degradation is not as
simple as the previous one, since it exhibits a slowly varying random behaviour.
Nevertheless, a slowly adaptive filtering process should be able to minimise its effects.
`
Noise components in the frequency range 1,000-4,000 Hz exhibit a widely spread coloured
noise spectrum. Since this spectrum covers the complete speech frequency range, it is
very difficult to tackle. Source localisation based on adaptive beamforming, followed by
a trainable and quickly adapting estimation and cancellation scheme, will be needed to
suppress the contributions from these sources.
`
Finally, in Figure 5, we display similar spectra under more severe conditions. This time,
the vehicle is travelling at a speed of 60 km/h with the windows rolled up; there are
other vehicles passing by; the driver is trying to communicate and the two passengers
keep talking. The first spectrum is very similar to those in Figures 1, 2 and 3. However,
the noise in the low frequency range 200-400 Hz is drastically reduced in comparison to
Figure 4.
`
`
`Figure 5. The Spectrum when the driver is trying to communicate and two passengers
`
`kept talking in a moving vehicle at a speed of 60 km/h with windows rolled up. As
`
`before, frequency ranges are from 0-1,000 Hz, 200-1,000 Hz and 1,000-4,000 Hz,
`
`respectively.
`
`
`In the last figure, it is possible to observe the formant structure of the speech. We
`
`believe this will be one of the most frequently encountered scenarios and the speech
`
`enhancement task will be very demanding since all three speakers are talking and their
`
`acoustical echoes are riding on all other ills. It is impossible to completely eliminate all
`
`the degradations in this case. But the advanced speech enhancement features of the
`
`proposed system will be able to improve the quality of speech to permit uninterrupted
`
`communication.
`
Lastly, Figures 6a and 6b present the spectra of a female and a male voice respectively,
clearly showing the formants of the speech.
`
`Figure 6a. The spectrum of a female voice with frequency range from 0 - 2,000 Hz and
`
`2,000 – 4,000 Hz. (using CELP coder)
`
`
`Figure 6b. The spectrum of a male voice with frequency range from 0 - 2,000 Hz and
`
`2,000 – 4,000 Hz. (using CELP coder)
`
`
`3. The Enhanced Speech Processing and Communication System
`
The speech quality of the emerging totally digital cellular phones will depend, to a
great extent, on the speech quality available at the near-end transmitter of the
communication link. Despite this, most research efforts have been directed towards
speech coding techniques, channel transmission issues of cellular telephony, and noise
control and optimisation [1][2][3][4][16]. Recently there has been some research
interest in the effects of ambient acoustical noise in the vehicular environment
[17][18][19], but most of the work on echo cancellation has been carried out in a
classroom or conference room environment. The longest dimension of a mid-size car is
about 5 m, which corresponds to an echo path delay of about 16 ms, or 128 samples at a
sampling rate of 8,000 samples per second. At this distance the IS-54 industry standard
would require a 128-tap FIR filter to cancel the echo. Throughout the world, it is
observed that a significant percentage of cellular phone users are in vehicular chambers
(cars, trucks, buses, and public transportation systems), where degradations due to
echoes, interference, and various types of noise are severe. Recently, some research
results that address some of these problems have been reported [6][7][8][9][10][11][12].
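The link between the 5 m cabin dimension, the 16 ms delay and the 128-tap filter can be checked with a short calculation. This sketch assumes a speed of sound of 343 m/s and assumes that the tap count is rounded up to the next power of two; neither assumption is stated in the report, which quotes only the 16 ms and 128-tap figures.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly 20 degrees C
SAMPLE_RATE_HZ = 8000        # sampling rate from the text


def path_delay_s(distance_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Acoustic propagation delay over the given distance."""
    return distance_m / speed_m_s


def taps_for_delay(delay_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Smallest number of FIR taps that spans the given delay."""
    return math.ceil(delay_s * sample_rate_hz)


def next_power_of_two(n):
    """Round n up to a power of two (assumed convention for filter lengths)."""
    return 1 << (n - 1).bit_length()


delay = path_delay_s(5.0)                          # about 14.6 ms
taps = taps_for_delay(delay)                       # taps needed to span that delay
filter_length = next_power_of_two(taps)            # 128
span_ms = 1000 * filter_length / SAMPLE_RATE_HZ    # 16.0 ms covered by 128 taps
```

Under these assumptions a 5 m path needs 117 taps, and rounding up to 128 taps gives exactly the 16 ms span quoted in the text.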
`
An ideal solution to these problems is an enhanced speech processing and communication
system with a re-configurable and multi-tasking architecture. The system should be able
to locate an intended speaker, cancel echoes generated inside the vehicle, combat
various noise and jamming signals, and handle all the speech processing, compression,
transmission, reception, and data and network communication tasks. In
Figure 7, we present a block diagram of the proposed speech processing and
`
`
communication system. Speech input to the system will be provided by a microphone
array strategically positioned on the dashboard to capture the various signals from
speech, different types of noise, echoes and other interference. The front-end CODEC
will have a set of 16-bit analogue-to-digital (A/D) and digital-to-analogue (D/A)
converters with a sampling rate of between 8,000 and 10,000 samples per second.
`
Before any processing task, the system should be able to locate and identify the primary
speaker. That is, the system must focus on its primary user. Speech from other people in
the vehicle and from the hi-fi system, echoes, engine noise, road noise, wind noise, and
noise from vehicles standing nearby or passing by will be considered unwanted input
signals; our objective is to eliminate them or, at least, to suppress them significantly.
This, in turn, will improve the quality of the speech from the genuine user.
`
One of the most annoying impediments to speech quality in a vehicular chamber is the
echo generated by leakage of the far-end speaker's signal. When the near-end speaker
(i.e. the driver) or any of the passengers in the car speaks, this echo is mixed with
his/her speech and transmitted as a composite signal. Thus, the first task of the
proposed speech enhancement system is to adaptively cancel the echo during non-speech
periods. However, the canceller should not adapt when the near-end speaker speaks; in
other words, no adaptation is to be performed while the near-end speaker talks. This
necessitates the inclusion of a near-end speaker activity detection mechanism. In our
literature survey [6][7][8][9][10][11][12], we have noticed that some researchers have
used a coefficient adaptation algorithm based on the least-mean-squared (LMS) error
criterion for echo cancelling. Although very successful in echo cancellation, the basic
LMS technique is not very effective in tackling other degradations.
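The adapt-only-in-silence rule described above can be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: it uses a normalised LMS update (a common variant of the basic LMS algorithm, swapped in here for numerical robustness), it takes the near-end activity flags as given rather than detecting them, and the echo path and signals are synthetic.

```python
import random


def nlms_echo_canceller(far_end, microphone, near_end_active,
                        num_taps=8, step=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the microphone.

    near_end_active[n] is True while the near-end speaker talks; the filter
    still subtracts its echo estimate then, but its coefficients are frozen.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # newest far-end sample first
    cleaned = []
    for x, d, talking in zip(far_end, microphone, near_end_active):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate       # echo-reduced signal passed on
        cleaned.append(error)
        if not talking:                 # adapt only during near-end silence
            power = sum(h * h for h in history) + eps
            gain = step * error / power
            weights = [w + gain * h for w, h in zip(weights, history)]
    return cleaned, weights


# Synthetic test signals (all values are illustrative assumptions).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, -0.3, 0.2]            # hypothetical cabin echo path
echo = [sum(echo_path[k] * far_end[n - k]
            for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(far_end))]
near_end_active = [n >= 1500 for n in range(len(far_end))]
near_speech = [rng.uniform(-1.0, 1.0) if active else 0.0
               for active in near_end_active]
microphone = [e + s for e, s in zip(echo, near_speech)]

cleaned, weights = nlms_echo_canceller(far_end, microphone, near_end_active)
```

In this noise-free example the filter learns the echo path during the first 1,500 samples, and the frozen coefficients then remove the echo from the composite signal while the near-end speaker is active. In practice the activity flags would come from the near-end speaker activity detector, whose design is outside this sketch.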
`
Figure 7. The Block Diagram of the Proposed Speech Processing and Communication System.
(Blocks shown in the diagram: Microphones Array Beamformer; CODEC with 16-bit A/D and D/A
converters, gain control for each microphone and speaker output, and a sampling rate of
8 to 10 Ks/sec; CANCELLER/ENHANCER; SPEECH CODER with 8.0 VCELP, 4.8 CELP, 2.4 MELP,
32 Kb/s LD-CELP, ADPCM and 16 Kb/s LD-CELP options; TRANSMITTER/RECEIVER.)
`
`Secondly, in the vehicular hands-free cellular communication framework, the engine,
`
`road, and wind noise components need to be considered. It has been observed that the
`
`
degradation in the intelligibility and general quality of the cellular speech due to
this imperfection is as disturbing as the echo discussed above. Hence, the second
objective of the enhanced speech processing and communication system is to combat
these imperfections of the cellular speech or data. Although there are some recent
studies and analyses on the spectra of these noise sources [7][8][9], they are not
directly applicable here since these noise sources have statistically different spectral
behaviour. For instance, the engine noise is significantly correlated with the engine
RPM and is therefore rather deterministic. On the other hand, the road and wind noises
are stochastic in nature and spread over a frequency range [17].
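One way to see why engine noise tracks the RPM is the firing frequency of a four-stroke engine, in which each cylinder fires once every two crankshaft revolutions. The sketch below applies that standard relation; the four-cylinder engine and the RPM values are assumptions chosen for illustration, since the report does not describe the test vehicle's engine.

```python
def firing_frequency_hz(rpm, cylinders=4):
    """Fundamental firing frequency of a four-stroke engine.

    Each cylinder fires once per two revolutions, so the engine produces
    cylinders / 2 firing events per revolution.
    """
    revolutions_per_second = rpm / 60.0
    return revolutions_per_second * cylinders / 2.0


# Hypothetical operating points for a four-cylinder engine.
cruise_hz = firing_frequency_hz(3000)    # 100.0 Hz
redline_hz = firing_frequency_hz(6000)   # 200.0 Hz
```

For these assumed operating points the fundamental stays at or below 200 Hz, which is in line with the low-frequency engine noise seen in the spectra of Section 2; harmonics of the firing frequency can extend higher.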
`
The worst class of degradation comes from interspeaker interference. In this case, the
primary signal and the interfering signals have similar spectra [28]. Thus, it is an
extremely difficult problem to tackle. This is the main reason why we propose the
inclusion of speaker tracking and identification capabilities in this speech processing
system.
`
This last point, in particular, suggests a type of beamforming structure based on a
microphone array [20][23][25][26][27] followed by an adaptive filtering scheme.
Beamforming techniques, which have found important applications in radar, sonar,
radio astronomy, geophysics, and biomedical signal processing, appear to be a
conceptually sound candidate for our speech enhancement task.

The simplest form of beamforming is delay-and-sum beamforming, which compensates for
the delay of the target signal and sums the signals in the beam so that the target
signals have the same phase while the interfering signals exhibit different phases.
`
`
Here we propose to use the delay-and-sum beamforming technique. First, it follows the
genuine speaker; the system then adaptively cancels the noise coming from the
interfering speakers, the engine, the wind (especially critical when the windows are
down), and the road noise coming from other vehicles and the road-tire friction. There
are some studies on this method for speech recognition in a hands-free telephone set-up
[6][7][8][9][10][11][12]. Figure 7 (The Speech Enhancement Circuit) shows the structure
of the proposed enhancer, with the microphone array and the A/D converters (D1 to Dn+1)
as the inputs. The output of the system is cleaned speech to be transmitted after
compression.
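The delay-and-sum idea described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the report: the steering delays are whole numbers of samples and are already known, there are four microphones, and the target and interferer are pure tones chosen so that the interferer cancels exactly.

```python
import math

FS = 8000   # sampling rate from the text


def delay_and_sum(channels, delays):
    """Advance each channel by its integer delay (in samples) and average."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(length)]


def tone(freq_hz, num_samples, delay=0):
    """Unit-amplitude sine that reaches a microphone `delay` samples late."""
    return [math.sin(2.0 * math.pi * freq_hz * (n - delay) / FS)
            for n in range(num_samples)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical geometry: the target reaches microphones 1..4 with delays of
# 0, 2, 4 and 6 samples; the interferer reaches all microphones simultaneously.
target_delays = [0, 2, 4, 6]
target_only = [tone(500, 400, d) for d in target_delays]
interferer_only = [tone(1000, 400) for _ in target_delays]
mixed = [[t + i for t, i in zip(tc, ic)]
         for tc, ic in zip(target_only, interferer_only)]

enhanced = delay_and_sum(mixed, target_delays)
residual_interference = delay_and_sum(interferer_only, target_delays)
```

After steering, the four copies of the 500 Hz target line up and add coherently, while the 1,000 Hz interferer ends up with phase offsets of 0, 90, 180 and 270 degrees across the channels and sums to zero. Real cabin signals are broadband and the delays are fractional, so the suppression would be partial, which is why the text follows the beamformer with an adaptive cancellation stage.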
`
Figure 7. The Speech Enhancement Circuit.
(Labels shown in the diagram: microphones M1, M2, M3, ..., Mn+1; converters D1, D2, D3,
..., Dn+1; filters FIR1, FIR2, FIR3, ..., FIRn+1; Genuine Speaker Tracker; Gain and Phase
update for Microphones array beamformer; Filter coefficient update; signal labels
"Speech + noise", "- noise and other imperfection" and "Speech output".)
`
`
`4. Re-configurable Digital Signal Processing
`
The above speech enhancement architecture requires a considerable amount of
computation. Depending on the particulars of the actual speech/speaker detection
circuitry, the beamformer, the adaptive filter banks and the digital speech compression
algorithm, we anticipate the overall computational complexity to be on the order of
35-40 million operations per second (MOPS)¹. In particular, the 2,400 bits/s U.S.
government standard MELP coder will require 22-25 MOPS [11-12]. The remaining 13-15
MOPS will be needed for all other tasks. This conservative figure should be sufficient
since all tasks other than the speech compression will be performed in a re-configurable
multi-tasking fashion². We believe the Texas Instruments, Inc. TMS320C4x DSP hardware
platform operating at 40 MHz should be able to handle all the computational needs. In
order to have a microphone array size of six or more, we propose that the front-end
audio input/output unit have eight channels with an aggregate 200,000 Hz A/D rate in a
multiplexed fashion and a minimum of two output channels.

Operation of the system will require a scheduler and a memo-passing facility so that
information can be passed from one process to another. A memo in this case will consist
of the type of processing required, the placement of the data in memory and, of course,
the originating and destination units.
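The scheduler and memo-passing facility could be pictured along the following lines. This is only a sketch of the idea: the field names, the unit names and the first-in-first-out dispatch order are hypothetical, since the report lists the contents of a memo but does not define a format or a scheduling policy.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Memo:
    task: str           # type of processing required
    data_address: int   # placement of the data in memory (hypothetical offset)
    source: str         # originating unit
    destination: str    # destination unit


class Scheduler:
    """Delivers memos to registered processing units in arrival order."""

    def __init__(self):
        self._queue = deque()
        self._units = {}

    def register(self, name, handler):
        self._units[name] = handler

    def post(self, memo):
        self._queue.append(memo)

    def run(self):
        log = []
        while self._queue:
            memo = self._queue.popleft()
            follow_up = self._units[memo.destination](memo)
            log.append((memo.source, memo.destination, memo.task))
            if follow_up is not None:   # a unit may hand work to the next one
                self._queue.append(follow_up)
        return log


# Hypothetical processing chain: beamformer -> echo canceller -> speech coder.
def beamformer(memo):
    return Memo("echo_cancel", memo.data_address, "beamformer", "echo_canceller")


def echo_canceller(memo):
    return Memo("encode", memo.data_address, "echo_canceller", "speech_coder")


def speech_coder(memo):
    return None   # end of the chain in this sketch


scheduler = Scheduler()
scheduler.register("beamformer", beamformer)
scheduler.register("echo_canceller", echo_canceller)
scheduler.register("speech_coder", speech_coder)
scheduler.post(Memo("beamform", 0x1000, "codec", "beamformer"))
log = scheduler.run()
```

Each handler here passes along only the address of the data rather than the data itself, which matches the description of a memo as carrying the placement of data in memory.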
`
`
`1 Here we use the term MOP in the framework of the Texas Instruments TMS320C4X DSP systems family.
2 It should be easy to guess that the computational complexity would increase enormously if the architecture did not have
re-configurability. That is, the overall computational load would be unacceptably high if the algorithms and circuits for all
tasks were kept running at all times.
`
`
In addition, since the spectral analysis of the noises and echoes indicates the
potential for subdividing them into optimal sub-bands, it will be ideal for the
computation to be carried out at a much lower sampling rate [29].
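The saving from a lower sampling rate can be estimated from the band edges observed in Section 2. The sketch below treats each band as a low-pass band from 0 Hz up to its upper edge and finds the largest integer decimation factor that still satisfies the Nyquist criterion; the band labels and this low-pass simplification are assumptions made for illustration.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate from the text


def max_decimation_factor(upper_edge_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Largest integer M for which sample_rate / M is still at least twice
    the upper band edge (band treated as low-pass)."""
    return int(sample_rate_hz // (2 * upper_edge_hz))


# Upper band edges taken from the spectra discussed in Section 2.
band_edges_hz = {
    "engine and road noise (below 200 Hz)": 200,
    "wind and tire noise (up to 400 Hz)": 400,
    "full speech band (up to 4,000 Hz)": 4000,
}

factors = {name: max_decimation_factor(edge)
           for name, edge in band_edges_hz.items()}
reduced_rates_hz = {name: SAMPLE_RATE_HZ / m for name, m in factors.items()}
```

Under these assumptions the lowest band could be processed at one twentieth of the original rate, provided a suitable anti-aliasing filter precedes the decimation.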
`
`
`5. Conclusions and the next step of the project
`
In this study, we propose a working model for future dashboards in intelligent vehicles.
The system includes a totally digital speech processing and communication system.
Since it is a digital system, it can easily be reconfigured to work as an advanced
packet data communication system, including fax, electronic mail, voice mail and
high-speed data transfer tasks. We have presented the enhanced speech communication
sub-system and the source tracking and noise cancellation circuitry. However, we would
like to emphasise that the proposed architecture and its components are at a very early
stage of the project and need further deliberation and analysis. In other words, we need
to study the various components in more detail and analyse their interaction. Although
there are many research papers on noise and echo cancellation, they mainly focus on
non-vehicular environments. In addition to handling noise and echo problems in a
vehicle, the speech enhancer also needs to handle interspeaker interference.
`
`The project should be able to develop expertise in the following areas:
`
1. Speech coding and analysis
2. Adaptive Signal Processing
3. Noise and Echo analysis and Cancellation
4. Microphone Array beamformer architecture
5. The DSP chip (TMS320CXX) and reconfigurability
`
`
The initial studies of noise and echo in a vehicular environment have been completed,
and the project is now at the stage of looking into adaptive cancellation of noises and
echoes. Even though there has been research in these areas, further research is required
to perform the cancellation in a sub-band fashion. With sub-band coding techniques
employing the wavelet concept, each sub-band will be analysed and processed separately,
thus reducing the computational effort.
`
In order to analyse the interspeaker interference, we need to employ blind
deconvolution or separation of the signals from the interfering speakers. The techniques
available at present are fairly complicated, require a lot of computing power, and are
therefore not acceptable for this project. A simplified form is required and, in
addition, the microphone array beamformer architecture with directive control should
prove useful. This technique needs further investigation in the noisy vehicular
environment.
`
Lastly, the implementation of the speech enhancer on the TMS320Cxx DSP chip, including
the MELP speech CODEC, will be a great challenge. The reconfigurable architecture
requires smart and fast algorithms that fit into the time slots allocated within the
MOPS available on the DSP chip.
`
`
`6. Acknowledgements
`
I am gratefully indebted to Prof. Dr. Hüseyin Abut, my thesis advisor and professor, for
his valuable guidance and support. Additionally, I would like to thank Dr. Tan Eng
Chong for his advice as my thesis supervisor.
`
`
`7. References
`
[1] Thomas E. Miller and Jeffrey Barish, “Optimizing Sound for Listening in the
Presence of Road Noise”, The International Conference on Signal Processing
Applications and Technology, ICSPAT '93, Santa Clara, Calif., USA, Sept. 28-Oct. 1,
1993, Vol. 1, pp. 97-106.
`
[2] Carlos R. Martins, Moises S. Piedade, INESC and Ceautl Lisboa, “Fast Adaptive
Noise Canceller using the LMS Algorithm”, The International Conference on
Signal Processing Applications and Technology, ICSPAT '93, Santa Clara, Calif.,
USA, Sept. 28-Oct. 1, 1993, Vol. 1, pp. 121-127.
`
[3] Harrison, W. A., J. S. Lim and E. Singer, “A New Application of Adaptive Noise
Cancellation”, IEEE Trans. Acoust., Speech and Signal Processing, Vol. ASSP-34,
No. 1, pp. 21-27, Feb. 1986.
`
`[4] H. Olson, "Electronic Control of Noise, Vibration and reverberation," J. Acoust.
`
`Soc. Am., Vol.28, 1956, pp. 966-972.
`
[5] Juha Hakkinen and Mauri Vaananen, “Background Noise Suppressor for a Car
Hands-free Microphone”, The International Conference on Signal Processing
Applications and Technology, ICSPAT '93, Santa Clara, Calif., USA, Sept. 28-Oct. 1,
1993, Vol. 1, pp. 300-307.
`
`[6] D. Messerschmitt, D. Hedberg, C. Cole, A. Haoui, and P. Winship, "Digital Voice
`
`Echo Canceller with a TMS320C20," in DSP Applications, K.-S. Lin, Ed.,
`
`Prentice-Hall, 1987.
`
`
[7] S. Oh, V. Viswanathan, and P. Papamichalis, "Hands-Free Voice Communication
in an Automobile With a Microphone Array," Proc. IEEE ICASSP-92, pp. I-281 - 284,
San Francisco, CA.
`
[8] I. Claesson, S.E. Nordholm, B.A. Bengtsson, and P. Erickson, "A Multi-DSP
Implementation of a Broad-Band Adaptive Beamformer for Use in a Hands-Free
Mobile Radio Telephone," IEEE Trans. on Vehicular Technology, Vol. 40, pp.
194-201, Feb. 1991.
`
[9] L.J. Griffiths and C.W. Jim, "An Alternative Approach to Linearly Constrained
Adaptive Beamforming," IEEE Trans. on Antennas Propag., Vol. AP-30, pp. 27-34,
January 1982.
`
`[10] E. Arkan, "Echo and Road Noise Cancellation in Digital Cellular Telephone,"
`
`M.S. Thesis, San Diego State University, Spring 1994.
`
[11] E. Arkan, H. Abut, S. Pelling, fj. harris, and G.C. Marques, "Implementation of a
5.0 KB/s Coder for Vehicular Applications: Part II: Acoustic Echo and Noise
Canceller," Proc. of ASILOMAR-1993 Conf. on Sig., Sys. & Computers, pp. 776-780,
IEEE Computer Society Press, 1993.
`
[12] J. Tardelli, Chair, "US DoD Selection of 2400 BPS Standard," Special Session
SPEC3, Proceedings of IEEE ICASSP-96, pp. 1137-1164, May 1996, Atlanta, GA.
`
`
`[13] A. McCree, K. Truong, E.B. George, T.P. Barnwell, III and V. Viswanathan, "A
`
`2.4 KBIT/S MELP Coder Candidate for the new U.S. Federal Standard,"
`
`Proceedings of the IEEE ICASSP-96, May 1996, Atlanta, GA.
`
`[14] B.S. Hong, "Adaptive Filtering for Automatic Noise Cancelling in an Interactive
`
`Classroom." Final Year Project Report, No: 74-96, School of Applied Science,
`
`Nanyang Technological University, Singapore, 1997.
`
`[15] B. Widrow and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall,
`
`Englewood Cliffs, N.J., 1985
`
`[16] Nestor Becerra Yoma, Fergus McInnes and Mervyn Jack, “Weighted Matching
`
`Algorithms and Reliability in Noise Cancelling by Spectral Subtraction”, 1997
`
`International Conference on Acoustics, Speech, and Signal Processing
`
`(ICASSP97), Munich, Germany, April 21-24, 1997, pp 1171 – 1174.
`
[17] Joerg Meyer and Klaus Uwe Simmer, “Multi-channel Speech Enhancement in a
Car Environment using Wiener Filtering and Spectral Subtraction”, 1997
International Conference on Acoustics, Speech, and Signal Processing
(ICASSP97), Munich, Germany, April 21-24, 1997, pp. 1167 – 1170.
`
[18] L. Arslan, A. McCree, and V. Viswanathan, “New methods for Adaptive noise
suppression,” Proc. IEEE Int. Conf. Acoustic, Speech and Signal Processing,
ICASSP-95, Detroit, Michigan, pp. 812 – 815, May 1995.
`
`
`[19] S. Nordholm, I. Claesson and I. Bengtsson, “Adaptive Array Noise Suppression of
`
`Handsfree speaker input in Cars”, IEEE Trans. On Vehicular Technology, vol. 42,
`
`no. 4, pp. 514 – 518, Nov. 1993.
`
`[20] Walter Kellermann, “Strategies for Combining Acoustic Echo Cancellation and
`
`Adaptive Beamforming Microphone Arrays,” Proc. of 1997 International
`
`Conference on Acoustics, Speech, and Signal Processing (ICASSP97), Munich,
`
`Germany, April 21-24, 1997, pp. 219 – 222.
`
[21] Shoji Makino, Klaus Strauss, Suehiro Shimauchi, Yoichi Haneda and Akira
Nakagawa, “Subband Stereo Echo Canceller using the Projection Algorithm with
Fast Convergence to the true Echo Path,” Proc. of the 1997 International
Conference on Acoustics, Speech, and Signal Processing (ICASSP97), Munich,
Germany, April 21-24, 1997, pp. 299 - 302.
`
`[22] H. Schütze, “Convergence of Acoustic Echo Cancellers for Hands-Free
`
`Telephones Operating Under Feedback Conditions,” IEEE Trans. Speech and
`
`Audio Processing, April 1993, Vol. 1 #2, pp. 257 – 260.
`
`[23] S. Gazor and Y. Grenier, “Criteria for Positioning of Sensors for a Microphone
`
`Array”, IEEE Trans. On Speech and Audio Processing, July 1995, Vol. 3 #4, pp.
`
`294 – 303.
`
[24] R. Martin and S. Gustafsson, “The Echo shaping approach to Acoustic Echo
Control”, Speech Communication, special issue on acoustic echo and noise
control, 20(3-4), January 1997.
`
`
`[25] Jens Meyer and Carsten Sydow, “Noise Cancelling for Microphone Arrays”,
`
`Proc. of the 1997 International Conference on Acoustics, Speech, and Signal
`
`Processing (ICASSP97), Munich, Germany, April 21-24, 1997, pp. 211 - 214.
`
`[26] Gary W. Elko and Anh-Tho Nguyen Pong, “A Steerable and Variable First-Order
`
`Differential Microphone Array”, Proc. of the 1997 International Conference on
`
`Acoustics, Speech, and Signal Processing (