`
`Differences in Voice Quality Between Men and Women: Use of the
`Long-Term Average Spectrum (LTAS)
`
Article in Journal of Voice · April 1996
`
`DOI: 10.1016/S0892-1997(96)80019-1
`
`
`Journal of Voice
`Vol. 10, No. 1, pp. 59-66
`© 1996 Lippincott-Raven Publishers, Philadelphia
`
`Differences in Voice Quality Between Men and Women:
`Use of the Long-Term Average Spectrum (LTAS)
`
Elvira Mendoza, Nieves Valencia, Juana Muñoz, and *Humberto Trujillo

Department of Personality, Evaluation and Psychological Treatment, and *Area of Methodology of the Behavioral
Sciences, Department of Social Psychology, University of Granada, Granada, Spain
`
Summary: The goal of this study was to determine whether there are acoustical
differences between male and female voices and, if there are, where exactly
these differences lie. Extended speech samples were used. The recorded readings
of a text by 31 women and by 24 men were analyzed by means of the Long-Term
Average Spectrum (LTAS), extracting the amplitude values (in decibels) at
intervals of 160 Hz over a range of 8 kHz. The results showed a significant
difference between genders, as well as an interaction of gender and frequency
level. The female voice showed greater levels of aspiration noise, located in
the spectral regions corresponding to the third formant, which causes the
female voice to have a more "breathy" quality than the male voice. The lower
spectral tilt in the women's voices is another consequence of this greater
aspiration noise. Key Words: Long-Term Average Spectrum; Voice quality; Gender
differences; Breathiness; Aspiration noise; Spectral tilt.
`
Accepted March 15, 1995.
Address correspondence and reprint requests to Dr. Elvira
Mendoza at Facultad de Psicología, Campus Universitario de
Cartuja, 18071 Granada, Spain.

The ability of the human ear to identify an individual's gender on the basis
of voice quality, regardless of linguistic content, has been discussed
previously by various investigators (1,2). Yet, the perceptual parameters or
various strategies used to discriminate between male and female voices are not
well understood. O'Kane (1) believes that this discrimination appears to be
performed routinely by human listeners by extracting a limited number of
perceptual cues; these may include various sociological factors such as
cultural stereotyping. However, Murry and Singh (3) have suggested that
listeners are able to distinguish a speaker's gender on the basis of such
acoustic characteristics as stress and pitch levels, in addition to nasality
versus hoarseness in male and female voices, respectively.
In speech studies involving gender identification, the acoustic correlates
usually submitted to judgments of listeners have been related to a set of
laryngeal and supralaryngeal parameters. Regarding the laryngeal variables, the
importance given to the fundamental frequency (F0) as an indicator of the
speaker's sex is noteworthy (4-7). The woman's pitch has a higher frequency
value than the man's pitch, although the absolute values of this difference are
in question. Depending on the study, the woman's pitch is higher by as much as
0.45 times (8) to 1.7 times (9) and even to an octave (10). Given that an
inverse relation exists between the mean F0 and the membranous vocal fold's
length, the physiological substratum appears to reside in the greater length of
the male vocal folds (11). Daniloff et al. (12) stated, "An individual's modal
frequency is governed in large part by the physical size, shape, and mass of
vocal folds and larynx. Also in part, our vocal habits and training accustom us
to select a frequency range that is comfortable, so that modal frequency is the
result of a compromise between personal habit and optimum mechanical buzz
frequency" (pp. 203-4). Nevertheless, some sociolinguistic studies have
suggested that these differences in voice quality across sexes may be due more
to sociocultural than to physiological factors, since listeners are able to
distinguish between male and female voices even when the speakers are children
(i.e., when the speaker's laryngeal physiology may be identical across sexes)
(13).
Regarding variables of the vocal tract's resonance (VTR), research is even
more scarce. Early research considered that the vocal tract's contribution to
the perception of the speaker's gender lay in the formant frequencies (14).
Bladon (15) detected that the vowels emitted by men presented narrower formant
bandwidths with a less profound drop (a flatter profile) in the spectrum than
the vowels generated by women. Others have suggested that there is a greater
amplitude of the first harmonic compared with the second in the female voice,
as opposed to that of the male voice (9,16). However, this difference may come
from the interaction between F0 and harmonic structure. Klatt and Klatt (9)
have suggested that the voice differentiation between the sexes comes from the
generation of a noisier aspiration in women's larynxes compared with that in
men. These greater levels of aspiration noise, centered in the high-frequency
spectral regions corresponding to the third formant, give the female voice a
more "breathy" quality than the male voice (17). As a consequence of this
aspiration noise in high frequencies, it can be expected that the source
spectrum has a lower spectral tilt, given that increasing the aspiration noise
in the third formant makes the general spectral tilt less steep. Löfqvist and
Mandersson (18), using the Long-Term Average Spectrum (LTAS) as an analytic
procedure, determined this overall tilt of the source spectrum through the
ratio of energy between 0-1 and 1-5 kHz. However, Klatt and Klatt (9)
established a greater spectral drop in "breathy" voice down to ~2 kHz
(-18 dB/octave in breathy vowels compared with -12 dB/octave in laryngealized
and modal vowels), maintaining the aspiration noise from that frequency
onwards. The differences between the types of spectral drop proposed by the two
sets of authors are probably due to the fact that Löfqvist and Mandersson (18)
centered their investigation on pathological voices rather than on normal
voices as in the Klatt and Klatt study (9).
Despite the great body of knowledge that has progressively accumulated
concerning the differences between the anatomy, physiology, and acoustics of
male and female voices, very few attempts have been made to classify both types
of voices by means of objective acoustic measures, with the exception of the
studies by Childers et al. (19,20), Childers and Wu (21), and Wu and Childers
(2). Following the line of these investigators, we tried to discover the
acoustical differences between male and female voices by means of mid- to
long-term averaging techniques such as the LTAS. This type of analysis has
proven to be most valuable as an averaging measure because it looks at long
speech segments and disregards linguistic content.
Besides, there are very few studies that base male and female voice
differentiations on long-term averaging measures. Tarnóczy and Fant (22)
compared the spectra of male and female Hungarian, Swedish, and German speakers
with the objective of studying the differences in the LTAS due to variations
among these languages. Although the results were confusing with respect to the
main objective, Tarnóczy and Fant (22) were able to detect differences across
sexes in the different languages. The above-mentioned differences centered in
the 0.7-1.5 kHz range for male speakers, and in the 1-2 kHz range for women.
The differences between speakers' sex were greater than expected.
In another study, Schlorhaufer et al. (23) compared the LTAS of different
genders and age groups. Five men, five women, and five children, all German
speakers, were studied. Although the spectra of these subject groups
demonstrated differences, the researchers did not attempt to quantify these
differences. Wu and Childers (2) conducted a study aimed at establishing
different templates for both sexes, stating that gender information should be
invariant, phoneme independent, and speaker independent for a given gender.
They added that these conditions can be better ensured by employing long-term
averaging measures. Following their suggestions, we believe that averaging
measures, such as LTAS, emphasize the specific information of each subject's
gender.
Improvement of the systems of synthesis of the female voice has been one of
the major goals of previous studies using methodological procedures similar to
ours. That is the reason for studying in depth the objective acoustic
differentiation between male and female voices. Given that the majority of
current voice synthesizers function with male voices, it is difficult to obtain
voice synthesis of women's or children's voices with an acceptable level of
naturalness. According to Titze (11), this may be due to the fact that the main
parameter utilized in the generation of synthesized voices has been the F0.
However, the differential synthesis of male and female voices implies much more
than a mere scaling of F0, and some basic differences in the phonatory and
articulatory mechanisms need to be considered. Titze's suggestion leads one to
believe that a great advance in the acoustic differentiation of male and female
voices is required in order to improve the current systems of analysis and
synthesis of both genders' voices. Klatt and Klatt (9) stated that the
principal difficulty in achieving this objective stemmed from the diversity of
acoustic indexes employed in the majority of the studies.
Acoustic phonetics has established the frequency distribution of formants as
the relevant variable in the identification of sounds, generally doing so by
specifying a formant's frequency by its central value. The determination of
these central values becomes more difficult as the fundamental frequency
increases. Because of the higher F0 of women's and children's voices, the
systems of analysis lose resolution, which impedes the evaluation of the
formant's frequency points in these cases (24). Furthermore, the informal
observations of Klatt (17) suggest that the vocal spectra obtained from female
voices do not completely conform to the all-pole model, possibly because of
their tracheal joint and source-filter interactions. Titze (11) questions
whether the source-filter theory of speech production would have followed the
same development if the earlier models had been based on female voices.
In the present study, we first attempted to determine, by means of the LTAS,
if a spectral profile characteristic of a speaker's gender exists and, if so,
to delineate the existing differences between male and female voice profiles
obtained by this method. Second, we sought to demonstrate that the differences
between both types of voices can be attributed to the existence of aspiration
noise in the spectral regions corresponding approximately to the third formant.
As described earlier (9), it is believed that this causes female voices to be
emitted with a more "breathy" quality than that of male voices.
`
`METHOD
`
`Subjects
Fifty-five subjects (24 men and 31 women), whose ages ranged from 20 to 50
years (with an average of 28 and 30 years, respectively), participated
voluntarily in this study. All subjects were native Spanish speakers. None had
a history of speech or auditory problems, and none suffered from colds or
respiratory infections during the length of their involvement in the study. The
voices of all subjects were determined to be normal (nondysphonic) by two
expert speech-language pathologists.
`
`Experimental task
The experimental task involved the reading of a standard text, taken from the
Spanish translation of Lewis Carroll's Alice in Wonderland, which lasted
~3 minutes and was composed of three paragraphs. The subjects were instructed
to read the text in their natural voice and at a normal speed. All recordings
took place in a soundproof room at the University of Granada's Voice
Laboratory. The recording microphone was held at a distance of 20 cm from the
mouth, in order to avoid possible aerodynamic interferences (25).
`
`Apparatus
The recording was performed with an AKG D 222 EB microphone with a flat
response and a SONY 77 ES Digital Audio Tape (DAT) with a sampling frequency of
48 kHz, keeping the volume of the DAT between -30 and -20 dB. The voice samples
were introduced via a direct connection to a DSP Sona-Graph, model 5500 (Kay
Elemetrics), and were analyzed with the LTAS portion of the Voice Analysis
Program. LTAS calculates a power magnitude spectrum across the frequency range
of the input signals. LTAS is different from the power spectrum in that it
includes only voiced segments, and it continuously averages the input signal
for 30-90 s. Unvoiced signals are screened out because they may corrupt the
average of the voiced segments and can mask the information of the voice source
(18). The LTAS program screens the input signal for voiced information based on
simple zero-crossing and energy criteria (26). The program was adjusted to
include the spectra of voiced signals, whose discrete power spectra were added
to the accumulated average. The program will not include spectral signals if
voicing is in doubt.
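The averaging procedure described above can be illustrated with a minimal sketch. It is not the Kay Elemetrics program: the frame length, the Hann window, the thresholds `max_zcr` and `min_energy`, and all function names are assumptions chosen for illustration, and a direct DFT is used only to keep the example free of external libraries.

```python
import cmath
import math


def frame_power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame, via a direct DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def is_voiced(frame, max_zcr=0.25, min_energy=1e-4):
    """Crude voicing test (assumed thresholds): few zero crossings, enough energy."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1)
    energy = sum(x * x for x in frame) / len(frame)
    return zcr <= max_zcr and energy >= min_energy


def ltas_db(signal, frame_size=128):
    """Average the power spectra of voiced frames only; return (levels in dB, frames used)."""
    total = [0.0] * (frame_size // 2 + 1)
    used = 0
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        if not is_voiced(frame):
            continue  # frames whose voicing is in doubt are left out of the average
        for k, p in enumerate(frame_power_spectrum(frame)):
            total[k] += p
        used += 1
    if used == 0:
        raise ValueError("no voiced frames found")
    return [10 * math.log10(t / used + 1e-12) for t in total], used
```

Under these assumptions, a periodic low-frequency frame passes the screen, whereas a frame that changes sign at every sample (noise-like) or a silent frame is rejected and does not affect the accumulated average.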
The following settings were selected for the analysis: a frequency range of
8 kHz, an input shaping in FLAT, maintenance of the memory's channel at 38 s, a
transform size of 128 points, the channel sensitivity at 45 dB, and the
AC-coupled option.
`
`
`Acoustic analysis
The acoustic analysis was conducted with the second paragraph of the text in
order to avoid any influence of possible vacillations at the beginning of the
reading and any fall in intensity or intonation at the end of the text.
The analysis was performed on the amplitude values, in decibels, at intervals
of 160 Hz, thus obtaining a total of 50 measurements for each subject, one for
each of the frequency levels in the total range of 8 kHz (0.160, 0.320, 0.480,
0.640 ... 8 kHz).
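As a quick check of the measurement grid, the following sketch enumerates the frequency levels implied by a 160 Hz step over an 8 kHz range. The label format (`F096`, `F800`) is inferred from the variable names in Table 2 and should be read as an assumption.

```python
STEP_HZ = 160
RANGE_HZ = 8000

# One level every 160 Hz, from 160 Hz up to and including 8000 Hz.
levels_hz = [k * STEP_HZ for k in range(1, RANGE_HZ // STEP_HZ + 1)]
levels_khz = [hz / 1000 for hz in levels_hz]

# Assumed labelling scheme: frequency in units of 10 Hz, zero-padded to 3 digits.
labels = ["F%03d" % (hz // 10) for hz in levels_hz]
```

The grid has 50 entries, which matches the 50 measurements per subject reported above.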
`
`RESULTS
`
To evaluate the possible differences in the distribution of energy between the
male and female spectra and to assess at what frequencies they may exist, a
2 x 50 analysis of variance (ANOVA) was conducted. The analysis included the
gender factor (G) at two levels, male and female, and the frequency level
factor (L) in kHz, with 50 frequency levels. The amplitude, measured in
decibels, was analyzed as the dependent variable. Figure 1 presents the means
in each frequency level. The results of the ANOVA are shown in Table 1. A
significance level of 0.05 in the sex factor and a level of 0.001 in the level
frequency factor were used. As Table 1 shows, there was a significant main
effect for the sex factor [F(1,53) = 6.678; p = 0.013]. Likewise, the main
effect for the level frequency factor was significant [F(49,2597) = 1509.978;
p < 0.001]. Significant differences were also seen in the interaction between
these two factors, G x L [F(49,2597) = 9.336; p < 0.001].
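The reported F ratios can be checked against the sums of squares and degrees of freedom given in Table 1. The sketch below assumes the usual mixed-design rule: the between-subjects effect is tested against the between-subjects error, and the within-subjects effects against the within-subjects error. Variable names are illustrative.

```python
def mean_square(sum_of_squares, df):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


# Sums of squares and degrees of freedom as listed in Table 1.
ms_gender = mean_square(584.547, 1)
ms_error_between = mean_square(4638.928, 53)
ms_level = mean_square(381473.979, 49)
ms_interaction = mean_square(2358.630, 49)
ms_error_within = mean_square(13389.684, 2597)

f_gender = ms_gender / ms_error_between        # between-subjects test
f_level = ms_level / ms_error_within           # within-subjects test
f_interaction = ms_interaction / ms_error_within
```

These ratios reproduce the F values quoted in the text to within rounding.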
Given the first objective of the study, the significant interaction between
frequency level and the speakers' sex was analyzed with a one-way ANOVA for
each frequency level. The results indicated that the spectral amplitude of
women's voices is greater (p < 0.001) in the following frequency levels: 0.8,
0.96, 2.88, 3.04, 4.16, 4.32, 4.48, 4.64, 4.80, and 4.96 kHz.
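A per-level comparison of this kind can be sketched as a one-way ANOVA computed separately for each frequency level. The function below is a generic textbook formulation, not the statistical package used in the study; turning F into a p value additionally requires the F distribution (for example `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def f_per_level(men_by_level, women_by_level):
    """Apply the one-way ANOVA at every frequency level (dict: level -> amplitudes)."""
    return {level: one_way_f([men_by_level[level], women_by_level[level]])
            for level in men_by_level}
```

With 24 men and 31 women, each per-level test has 1 and 53 degrees of freedom.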
`
[Figure 1: line plot of mean amplitude against FREQUENCY LEVELS (kHz), with separate curves for MALES and FEMALES.]
FIG. 1. Graphic representation of the mean values of amplitude (in decibels) corresponding to female and male voices in each frequency
level analyzed (in kilohertz).
`
`
TABLE 1. Results of analysis of variance for the mixed factorial design G x (L), G being
the gender factor (men and women), manipulated between subjects, and L the level
frequency factor (50 levels), manipulated within subjects. The dependent variable is the
amplitude in decibels

Source            Sum of squares   Degrees of freedom   Mean square   F
Gender (G)             584.547              1               584.547       6.678 (a)
Error                4,638.928             53                87.527
Level (L)          381,473.979             49             7,785.183   1,509.978 (b)
Level x Gender       2,358.630             49                48.135       9.336 (b)
Error               13,389.684          2,597                 5.156

(a) p = 0.013. (b) p < 0.001.
`
Later, a discriminant analysis using amplitude as the criterion factor and the
frequency levels as the prediction factor was conducted to assess if all the
frequency levels were equally important in the differentiation of voices across
gender. The results of this analysis are presented in Table 2. As seen, the
frequency levels included in the gender discrimination equation are 0.96, 1.44,
1.92, 3.04, 3.20, 3.36, and 8 kHz. The discriminant function classified 100% of
the subjects in this study correctly.
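The idea of a jackknifed (leave-one-out) classification, as reported in Table 2, can be sketched with a two-variable Fisher linear discriminant. This is a simplified illustration only: the study used a stepwise analysis with seven frequency levels, whereas the sketch assumes two predictors, equal priors, and hand-made synthetic amplitudes that are not the study's data.

```python
def mean_vec(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def pooled_cov(a, b):
    """Pooled within-group 2 x 2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (a, b):
        m = mean_vec(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(a) + len(b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def fit_lda(a, b):
    """Return (weights, threshold); scores above the threshold fall in group a."""
    ma, mb = mean_vec(a), mean_vec(b)
    s = pooled_cov(a, b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]


def classify(x, w, threshold):
    return "women" if w[0] * x[0] + w[1] * x[1] > threshold else "men"


def jackknife_accuracy(women, men):
    """Classify each case with a discriminant fitted without that case."""
    correct = 0
    for i, x in enumerate(women):
        w, t = fit_lda(women[:i] + women[i + 1:], men)
        correct += classify(x, w, t) == "women"
    for i, x in enumerate(men):
        w, t = fit_lda(women, men[:i] + men[i + 1:])
        correct += classify(x, w, t) == "men"
    return correct / (len(women) + len(men))


# Synthetic amplitudes (dB) at two hypothetical frequency levels.
women = [(40 + i % 4, 28 + i // 4) for i in range(8)]
men = [(30 + i % 4, 20 + i // 4) for i in range(8)]
accuracy = jackknife_accuracy(women, men)
```

Because each case is classified by a function that never saw it, the jackknifed rate is a less optimistic estimate than the plain classification matrix.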
To evaluate the question of whether or not female voices presented greater
levels of aspiration noise in the spectral regions corresponding to the third
formant and a lower spectral tilt than male voices,
TABLE 2. Discriminant analysis utilizing amplitude as the criterion factor and the
frequency levels as the prediction factor

Variable   F to enter/remove   U statistic   Approximate F statistic   Degrees of freedom
F096             7.326            0.1468             56.958                 5, 49
F144             7.156            0.1613             50.942                 5, 49
F192             4.031            0.1849             55.103                 4, 50
F304            44.916            0.5413             44.916                 1, 53
F320             5.321            0.1105             54.024                 7, 47
F336             4.567            0.1214             48.610                 7, 47
F800             4.949            0.1332             52.081                 6, 48

Classification Matrix
                             No. of cases classified into group
Group    Percent correct     Women     Men
Women         100.0            31        0
Men           100.0             0       24
Total         100.0            31       24

Jackknifed Classification
                             No. of cases classified into group
Group    Percent correct     Women     Men
Women         100.0            31        0
Men           100.0             0       24
Total         100.0            31       24

`
as Klatt and Klatt (9) suggested, the energy concentration at the level of the
third formant and the overall tilt of the source spectrum were analyzed. The
amplitude of the frequency points had previously been examined, showing
significantly higher values for female voices. In analyzing the overall tilt of
the source spectrum, the ratio of energy between 0-1 kHz and 1-5 kHz was
calculated, as suggested by Löfqvist and Mandersson (18). The results indicated
that this ratio is greater among male speakers (mean = 5.215; SD = 1.286) than
among females (mean = 4.565; SD = 0.731). These differences are statistically
significant [F(1,53) = 5.600; p < 0.022].
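A band-energy ratio of this kind can be sketched as follows. The text does not state exactly how the ratio was computed, so the sketch makes explicit assumptions: LTAS levels in dB are converted to linear power, summed within each band, and the ratio is expressed in dB. The function names and the `cutoff_hz` parameter are illustrative.

```python
import math


def band_energy(levels_db, freqs_hz, lo_hz, hi_hz):
    """Sum of linear power for levels whose frequency lies in (lo_hz, hi_hz]."""
    return sum(10 ** (db / 10)
               for db, f in zip(levels_db, freqs_hz) if lo_hz < f <= hi_hz)


def tilt_ratio_db(levels_db, freqs_hz, cutoff_hz=1000, upper_hz=5000):
    """Low-band to high-band energy ratio in dB; larger values mean a steeper tilt."""
    low = band_energy(levels_db, freqs_hz, 0, cutoff_hz)
    high = band_energy(levels_db, freqs_hz, cutoff_hz, upper_hz)
    return 10 * math.log10(low / high)


# Measurement grid of the study: 50 levels, 160 Hz apart.
freqs_hz = [160 * k for k in range(1, 51)]

# Two synthetic spectra (not study data) falling by 1.0 and 0.5 dB per level.
steep = [-1.0 * k for k in range(1, 51)]
shallow = [-0.5 * k for k in range(1, 51)]
ratio_steep = tilt_ratio_db(steep, freqs_hz)
ratio_shallow = tilt_ratio_db(shallow, freqs_hz)
```

Under these assumptions a spectrum that falls off faster yields a larger ratio, which is the sense in which the greater ratio for male speakers corresponds to a steeper tilt. Passing a different `cutoff_hz` moves the boundary between the low and high bands.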
`
`DISCUSSION
`
The results of this study showed that (a) significant differences were present
between genders in the distribution of energy throughout the analyzed frequency
values, taken from voice samples. This is reflected in the interaction effects
of the gender and the frequency level factors found in the ANOVA. (b)
Significant differences were not found in all of the spectrum's frequency
levels, but rather were concentrated in the frequencies between 0.80 and 5 kHz,
particularly in the frequencies 0.96, 1.44, 1.92, 3.04, 3.20, and 3.36 kHz.
According to the results of the discriminant analysis, this is the spectral
region that best differentiates the speaker's gender. (c) The spectra
corresponding to women's voices showed a lower overall tilt; this was found in
the ratio of 0-1/1-5 kHz. (d) The LTAS, as an average measure of continuous
voice signals, is a useful instrument for detecting these sex-related
differences and for determining the spectral regions where such differences are
centered.
From the results of the discriminant analysis, it is seen that the frequency
points of 0.96, 1.44, 1.92, 3.04, 3.20, 3.36, and 8.00 kHz are most important
in voice quality differentiation. Within the above-mentioned frequency points,
those corresponding to 3.04, 3.20, and 3.36 kHz are located in the spectral
regions near the third formant, and the higher values correspond to the female
voices. The implications of these results agree with the proposal of Klatt and
Klatt (9) that the acoustic characteristics of female voices lead to a
"breathier" quality than in male voices. These authors, as indicated in the
introduction, suggest that this quality can be explained by a longer opening
and the presence of a posterior opening between the vocal folds, which would
generate aspiration noise in the region of the third formant.
Klatt and Klatt (9) locate another consequence of these physioanatomical
characteristics in the lower spectral tilt, because of the greater
concentration of aspiration noise in higher frequencies. Löfqvist and
Mandersson (18) indicate a way of quantifying the general spectral tilt via
LTAS. They determined the energy drop in the spectra of hyperfunctional voices
by the ratio 0-1/1-5 kHz. The present study has confirmed that male and female
voices differ in this ratio. As the lower values were registered in the female
voices, a slower general tilt was seen in this group. However, as seen in
Fig. 1, the spectral tilt in women's voices is greater by >1.60 kHz. This means
that the general lowering of the spectral tilt (until 5.0 kHz) is due to a
greater concentration of energy in the higher frequencies (1.60-5.0 kHz),
according to Klatt and Klatt (9). These results suggest that the spectral tilt
ratio should locate the cutoff point between high and low frequencies at
1.60 kHz (0-1.60/1.60-5.0) instead of at 1 kHz, as proposed by Löfqvist and
Mandersson (18). We believe that the different cutoff point put forth by the
latter authors is due to their having attempted to distinguish between hyper-
and hypofunctional voices, whereas this study's subjects presented with no
vocal pathology.
An ANOVA was conducted with this new cutoff point, 1.60 kHz, to see whether
this value could establish clearer differences between male and female voices.
With it, the statistical significance increased [F(1,53) = 9.023; p < 0.004].
`According to the discriminant analysis, another
`frequency point, 8.0 kHz, exists that indicates dif-
`ferences between the speaker groups. In all likeli-
`hood, another procedure of acoustical analysis is
`necessary to further investigate this question.
The existence of noisy energy in frequencies >8.0 kHz has already been
studied. Shoji et al. (27) studied the energy present in frequencies >8.0 kHz
in vowel emission by normal subjects. These authors detected significant
differences in the energy distribution between vowels /a/ and /u/. Following a
methodology similar to that of Shoji et al., we discovered differences in the
configuration of the spectral energy in the regions ranging from 6-10 kHz and
10-16 kHz between vowels, and between dysphonic and nondysphonic speakers (28).
We believe that the differentiation established by the discriminant analysis at
the frequency point 8.0 kHz should move in this direction. Nevertheless, as
LTAS requires a great amount of memory when dealing with long speech segments,
our currently available equipment does not allow the study of the spectral zone
>8.0 kHz using LTAS.
Our results agree with those of Klatt and Klatt (9) regarding the presence of
greater aspiration noise in the region of the third formant in the female
voice, noise that causes, according to these authors, the female voice to
present a "breathier" quality than the male voice. This quality may possibly be
due to learning/imitation of models and perhaps restricted to American women.
The existence of similar effects in the results of analysis of the speech of
Spanish women indicates that this characteristic may not be restricted
exclusively to one female nationality subject group. It would be necessary to
study this particular aspect in various other subject groups before
generalizing this finding.
The differences in the methodological procedures in this study and in previous
studies make it difficult to compare results. The materials that have served as
stimuli in the acoustical differentiation of the speaker's sex have consisted
of syllables (29), sustained vowels (30), and vowels in syllabic contexts, as
well as prolonged voiced and unvoiced fricatives (2,20). The VTR parameters
used most frequently in these studies have been the frequency, amplitude, and
bandwidth of the first four formants. However, using a long-term averaged
spectrum, such as LTAS, one cannot affirm that the points of greater amplitude
that appear along this spectrum correspond to formant values, as they are
relative to specific sounds. In addition, the procedures employed in the
earlier studies differ from those of the present study: electroglottography,
inverse filtering, spectral analysis, and linear predictive coding (LPC)
analysis prevail in the literature, whereas this study used the LTAS.
Nevertheless, despite the procedural and analytical differences between
previous studies and the present research, the results of this study coincide
with those found by other authors, and in our case with speech samples that
were natural and independent of phonetic content.
With the data obtained in this study, we intended to identify the acoustical
physiological relations in the human voice. The existing body of knowledge on
LTAS does not permit the identification of these relations nor does it
necessarily have to be the final goal of acoustical investigation. Our
intention has been to contribute a model with which to compare, using
sufficient statistical evidence, the profile of the spectral energy's
distribution in male and female voices averaged on a long-term basis. It was
our intention to contribute significant evidence that would aid in improving
the current systems of synthesis and recognition of women's voices.
The determination of a spectral area, corresponding approximately to the third
formant, particularly sensitive to the differential establishment of male and
female voice models, can be seen as one of the more important contributions of
this study. According to the data, it is this area of the spectrum that
presents a significantly different profile in both sexes and toward which more
investigative efforts should be directed. Future research might use a
perceptive validation instrument in looking at spectral representations for the
two voice groups. This might include, for example, first masking or filtering
out the spectral regions that are irrelevant in voice identification before
having listeners decide whether a particular LTAS sample corresponds to a male
or female voice.
To conclude, different profiles of energy distribution in the spectrum can be
established for male and female voices, and these differences, apparently, are
due to the presence of greater aspiration noise in the women's voices. This
causes the female voice, in contrast to the male voice, to present a
"breathier" quality. Because of this, the spectral tilt in women's voices is
smaller than that in men's voices. Finally, the LTAS is a technique that is
sufficiently sensitive for detecting these differences.
`
Acknowledgment: This study was supported by DGICYT (Dirección General de
Investigación Científica y Técnica), Ministerio de Educación y Ciencia (Spain),
Project PS93-0203.
`
`REFERENCES
`
1. O'Kane M. Recognition of speech and recognition of speaker sex: parallel
or concurrent processes? J Acoust Soc Am 1987;82(suppl 1):S84.
2. Wu K, Childers DG. Gender recognition from speech. Part I. Coarse analysis.
J Acoust Soc Am 1991;90:1820-40.
3. Murry T, Singh S. Multidimensional analysis of male and female voices.
J Acoust Soc Am 1980;68:1294-300.
4. Henton CG. Fact and fiction in the description of female and male pitch.
J Acoust Soc Am 1987;82(suppl 1):S91.
5. Hollien H, Malzik E. Evaluation of cross-sectional studies of adolescent
voice changes in males. Speech Monograph 1967;34:80-4.
6. Saxman J, Burk K. Speaking fundamental frequency characteristics of
middle-aged female. Folia Phoniatr (Basel) 1967;19:167-72.
7. Stoicheff M. Speaking fundamental frequency characteristics of non-smoking
female adults. J Speech Hear Res 1981;24:437-41.
8. Monsen RB, Engebretson AM. Study of variations in the male and female
glottal wave. J Acoust Soc Am 1977;62:981-93.
9. Klatt DH, Klatt LC. Analysis, synthesis and perception of voice quality
variations among female and male talkers. J Acoust Soc Am 1990;87:820-57.
10. Linke CE. A study of pitch characteristics of female and their relationship
to vocal effectiveness. Folia Phoniatr (Basel) 1973;25:173-85.
11. Titze IR. Physiological and acoustic differences between male and female
voices. J Acoust Soc Am 1989;85:1699-707.
12. Daniloff R, Schuckers G, Feth L. The physiology of speech and hearing. An
introduction. Englewood Cliffs, NJ: Prentice-Hall, 1980.
13. Woods N, College L. It's not what she says. It's the way that she says it:
The influence of speaker-sex on pitch and intonational patterns. Res Speech
Percept Indiana University 1992;18 (Progress Report):84-95.
14. Coleman RO. A comparison of the contributions of two voice quality
characteristics to the perception of maleness and femaleness in the voice.
J Speech Hear Res 1976;19:168-80.
15. Bladon A. Acoustic phonetics, auditory phonetics, speaker sex and speech
recognition: a thread. In: Fallside F, Woods A, eds. Computer speech
processing. Englewood Cliffs, NJ: Prentice-Hall, 1983.
16. Henton C, Bladon R. Breathiness in normal female speech: inefficiency
versus desirability. Lang Commun 1985;5:221-7.
17. Klatt DH. Detailed spectral analysis of a female voice. J Acoust Soc Am
1986;81(suppl 1):S80.
18. Löfqvist A, Mandersson B. Long-time average spectrum of speech and voice
analysis. Folia Phoniatr (Basel) 1987;39:221-9.
19. Childers DG, Wu K, Hicks DM. Factors in voice quality: acoustic features
related to gender. Proceeding of IEEE International Conference of Acoustics,
Speech Signal Processing 1987;1:293-6.
20. Childers DG, Wu K, Hicks DM, Yegnarayana B. Voice conversion. Speech
Commun 1989;8:147-58.
21. Childers DG, Wu K. Gender recognition from speech. Part II. Fine analysis.
J Acoust Soc Am 1991;90:1841-56.
22. Tarnóczy T, Fant G. Some remarks on the average speech spectrum. Speech
Transmission Laboratory. Quarterly Progress and Status Reports (Royal Institute
of Technology, Stockholm) 1964;4:13-4.
23. Schlorhaufer W, Miller WG, Hussl B, Scharfetter L. Energieverteilung und
Dynamik bei der Mutationsfistelstimme im Vergleich zur Normalstimme. Folia
Phoniatr (Basel) 1972;24:7-18.
24. O'Shaughnessy D, ed. Speech communication: human and machine. Woburn, MA:
Addison-Wesley, 1987.
25. Titze IR, Winholtz WS. Effect of microphone type and placement on voice
perturbation measurements. J Speech Hear Res 1993;36:1177-90.
26. Rabiner L, Schafer R, eds. Digital processing of speech signals. Englewood
Cliffs, NJ: Prentice-Hall, 1978.
27. Shoji K, Regenbogen E, Yu JD, Blaugrund E. High-frequency components of
normal and dysphonic voices. J Voice 1991;5:29-35.
28. Valencia N, Mendoza E, Mateo I, Carballo G. High-frequency components of
normal and dysphonic voices. J Voice 1994;8:157-62.
29. Nittrouer S, McGowan RS, Milenkovic PH, Beehler D. Acoustic measurements of
men's and women's voices: a study of context effects and covariations. J Speech
Hear Res 1990;33:761-75.
30. Kuwabara H, Ohgushi K. Experiments on voice qualities of vowels in males
and females and correlation with acoustic fe