`Volkswagen Group of America, Inc., Petitioner
`
`1
`
`
`
`&U
`
`ta
`
`yaM
`
`Po
`
`
`
`tEm2:Qsfio$2.
`
`8meE:
`
` III'|.luI
`
`
`
`2,225.8sH._n_m
`
`h
`
`mM
`
`US 6,230,132 B1
`
`
`
`.zo:<z:m.uaS.2253L22.
`
`
`
`11zo_:.,§E52232w22:893%zo_=z__§
`
`m<222228E
`
`
`
`Emazo=<z_Eo
`
`
`
`
`
`EEGzo=<o_><z9mm~_8<zofizzaozwmzé
`
`
`
`B8258.ummsou~m=zmzo_zz__ma3%.zo=<z_Ee
`
`
`
`
`
`2
`
`
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 2 of 10, US 6,230,132 B1): FIG. 2, flowchart of the first embodiment of the input dialogue "destination location input" (steps 1000-1190; continues with step 350).]
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 3 of 10, US 6,230,132 B1): FIG. 3, flowchart of the second embodiment of the input dialogue "destination location input" (steps 1000-1140; continues with step 350).]
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 4 of 10, US 6,230,132 B1): FIG. 4, flowchart of the input dialogue "select from list" (steps 1430-1500; list displayed in pages of <K> entries).]
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 5 of 10, US 6,230,132 B1): FIG. 5, flowchart of the input dialogue "resolve ambiguity" (steps 1240-1410; list sorted by number of inhabitants).]
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 6 of 10, US 6,230,132 B1): FIG. 6, flowchart of the input dialogue "spell destination location" (steps 2010-2150; continues with step 350).]
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 7 of 10, US 6,230,132 B1): FIG. 7, flowchart of the input dialogue "coarse destination input" (steps 3000-3320; fine destination lexicon generated from 1500 locations in the vicinity of the coarse destination, then destination input without step 1010).]
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 8 of 10, US 6,230,132 B1): FIG. 8, flowchart of the input dialogue "store address" (steps 7000-7090; destination address stored under a spoken <NAME_1>).]
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 9 of 10, US 6,230,132 B1): FIG. 9, flowchart of the input dialogue "street input" (steps 5010-5200; street lexicon generated and loaded, street ambiguity resolved, continues with step 500).]
`
`
`
[Drawing sheet (U.S. Patent, May 8, 2001, Sheet 10 of 10, US 6,230,132 B1): FIG. 10, block diagram of a device for performing the method; drawing text not recoverable from the scan.]
`
`
`
`US 6,230,132 B1
`
`1
`PROCESS AND APPARATUS FOR REAL-
`TIME VERBAL INPUT OF A TARGET
`ADDRESS OF A TARGET ADDRESS SYSTEM
`
`BACKGROUND AND SUMMARY OF THE
`INVENTION
`
`This application claims the priority of German patent
`document 197 09 518.6, filed Mar. 10, 1997, the disclosure
`of which is expressly incorporated by reference herein.
`The invention relates to a method and apparatus for
`real-time speech input of a destination address into a navi-
`gation system.
`German patent document DE 196 00 700 describes a
`target guidance system for a motor vehicle in which a fixedly
`mounted circuit, a contact field circuit or a voice recognition
`apparatus can be used as an input device. The document,
`however, does not deal with the vocal input of a target
`address in a target guidance system.
`Published European patent application EP 0 736 853 A1
`likewise describes a target guidance system for a motor
`vehicle. The speech input of a target address in a target
`guidance system is, however, not the subject of this docu-
`ment.
`
`Published German patent application DE 36 08 497 A1
`describes a process for speech controlled operation of a long
`distance communication apparatus, especially an auto tele-
`phone. It is considered a disadvantage of the process that it
`does not deal with the special problems in speech input of a
`target address in a target guidance system.
`Not yet prepublished German patent application P 195 33
`541.4-52 discloses a method and apparatus of this type for
`automatic control of one or more devices, by speech com-
`mands or by speech dialogue in real time. Input speech
commands are recognized by a speech recognition device
comprising a speaker-independent speech recognition
engine and a speaker-dependent additional speech recog-
nition engine that identifies the speech command with the
highest recognition probability as the input speech
command, and initiates the functions of the device or
devices associated with this speech command. The
`speech command or speech dialogue is formed on the basis
`of at least one syntax structure, at least one basic command
`vocabulary, and if necessary at least one speaker-specific
`additional command vocabulary. The syntax structures and
`basic command vocabularies are presented in speaker-
independent form and are established in real time. The
`speaker-specific additional vocabulary is input by the
`respective speaker and/or modified by him/her, with an
`additional speech recognition engine that operates according
`to a speaker-dependent recognition method being trained in
`training phases, during and outside real-time operation by
each speaker, to the speaker-specific features of the
respective speaker by at least one-time input of the additional
`command. The speech dialogue and/or control of the devices
`is developed in real time as follows:
`Speech commands input by the user are fed to a speaker-
`independent speech recognition engine operating on
`the basis of phonemes, and to the speaker-dependent
`additional speech recognition engine where they are
`subjected to feature extraction and are checked for the
`presence of additional commands from the additional
`command vocabulary and classified in the speaker-
`dependent additional speech recognition engine on the
`basis of the features extracted therein.
`
`Then the classified commands and syntax structures of the
`two speech recognition engines, recognized with a
`
`2
`certain probability, are assembled into hypothetical
`speech commands and the latter are checked and clas-
`sified for their reliability and recognition probability in
`accordance with the syntax structure provided.
`Thereafter, the additional hypothetical speech commands
`are checked for their plausibility in accordance with
`specified criteria and, of the hypothetical speech com-
`mands recognized as plausible, the one with the highest
`recognition probability is selected and identified as the
`speech command input by the user.
`Finally, the functions of the device to be controlled that
`are associated with the identified speech command are
`initiated and/or answers are generated in accordance
`with a predetermined speech dialogue structure to
`continue the speech dialogue. According to this
`document, the method described can also be used to
`operate a navigation system, with a destination address
`being input by entering letters or groups of letters in a
`spelling mode and with it being possible for the user to
`supply a list for storage of destination addresses for the
`navigation system using names and abbreviations that
`can be determined in advance.
`
The disadvantage of this method is that the special
properties of the navigation system are not discussed, and
`only the speech input of a destination location by means of
`a spelling mode is described.
`The object of the invention is to provide an improved
`method and apparatus of the type described above, in which
the special properties of a navigation system are taken into
`account and simplified.
`Another object of the invention is to provide such an
`arrangement which enables faster speech input of a desti-
`nation address in a navigation system, improving operator
`comfort.
`
`These and other objects and advantages are achieved by
`the method and apparatus according to the invention for
`speech input of destination addresses in a navigation system,
`which uses a known speech recognition device, such as
`described for example in the document referred to above,
comprising at least one speaker-independent speech-
`recognition engine and at least one speaker-dependent addi-
`tional speech-recognition engine. The method according to
`the invention makes possible various input dialogues for
`speech input of destination addresses. In a first input dia-
`logue (hereinafter referred to as the “destination location
`input”), the speaker-independent speech recognition device
`is used to detect destination locations spoken in isolation,
`and if such destination location is not recognized, to recog-
`nize continuously spoken letters and/or groups of letters. In
`a second input dialogue (hereinafter referred to as “spell
`destination location”), the speaker-independent speech rec-
`ognition engine is used to recognize continuously spoken
`letters and/or groups of letters. In a third input dialogue
`(hereinafter referred to as “coarse destination input”), the
`speaker-independent speech-recognition engine is used to
`recognize destination locations spoken in isolation, and if
such destination location is not recognized, to recognize con-
`tinuously spoken letters and/or groups of letters. In a fourth
`input dialogue (hereinafter referred to as “indirect input”),
`the speaker-independent speech recognition engine is used
`to recognize continuously spoken numbers and/or groups of
`numbers. In a fifth input dialogue (hereinafter referred to as
`“street input”), the speaker-independent speech-recognition
device is used to recognize street names spoken in isolation
`and if the street name spoken in isolation is not recognized,
`to recognize continuously spoken letters and/or groups of
`letters.
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`12
`
`12
`
`
`
By means of the input dialogues described above, the
navigation system is supplied with verified destination
addresses, each comprising a destination location and a
street. In a sixth input dialogue (hereinafter referred to as
`“call up address”), in addition to the speaker-independent
`speech-recognition engine,
`the speaker-dependent addi-
`tional speech-recognition engine is used to recognize key-
`words spoken in isolation.
`In a seventh input dialogue
`(hereinafter referred to as “store address”), a keyword spo-
`ken in isolation by the user is assigned a destination address
`entered by the user, so that during the input dialogue “call up
`address” a destination address associated with the corre-
`sponding recognized keyword is transferred to the naviga-
`tion system.
The method according to the invention is based primarily
on the fact that the entire admissible vocabulary for a
`speech-recognition device is not loaded into the speech-
`recognition device at the moment it is activated; rather, at
`least a required lexicon is generated from the entire possible
`vocabulary during real-time operation and is loaded into the
`speech-recognition device as a function of the required input
`dialogue for executing an operating function. There are more
than 100,000 locations in the Federal Republic of Germany
`that can serve as vocabulary for the navigation system. If
`this vocabulary were to be loaded into the speech-
`recognition device,
`the recognition process would be
`extremely slow and prone to error. A lexicon generated from
`this vocabulary comprises only about 1500 words, so that
`the recognition process would be much faster and the
`recognition rate higher.
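
A minimal sketch of this runtime lexicon generation, assuming a hypothetical destination-file structure and recognizer interface (the patent describes the concept, not any particular code):

```python
# Sketch: generate a lexicon from the full destination file during
# real-time operation and load only that subset into the recognizer.

def generate_lexicon(destination_file, predicate):
    """Select a subset of the full vocabulary, e.g. on dialogue activation."""
    return [entry for entry in destination_file if predicate(entry)]

class SpeechRecognizer:
    def __init__(self):
        self.active_vocabulary = []

    def load_lexicon(self, lexicon):
        # Loading ~1500 words instead of >100,000 keeps the recognition
        # process fast and the recognition rate high.
        self.active_vocabulary = [entry["place_name"] for entry in lexicon]

# Example: on activating "destination location input", load a basic
# lexicon of all places with more than 15,000 inhabitants.
destination_file = [
    {"place_name": "Flensburg", "population": 87526},
    {"place_name": "Ahlen", "population": 52000},
]
recognizer = SpeechRecognizer()
recognizer.load_lexicon(
    generate_lexicon(destination_file, lambda e: e["population"] > 15000)
)
```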
`At least one destination file that contains all possible
`destination addresses and certain additional information for
`
`the possible destination addresses of a guidance system, and
`is stored in at least one database, is used as the database for
`the method according to the invention. From this destination
`file, lexica are generated that comprise at least parts of the
`destination file, with at least one lexicon being generated in
`real
`time as a function of at
`least one activated input
`dialogue. It is especially advantageous for the destination
`file for each stored destination location to contain additional
`
information, for example political affiliation or an additional
`naming component, postal code or postal code range, tele-
`phone area code, state, population, geographic code, pho-
`netic description, or membership in the lexicon. This addi-
`tional information can then be used to resolve ambiguities or
`to accelerate the search for the desired destination location.
`
Instead of the phonetic description itself, a transcription of
the phonetic description in the form of a chain of indices,
depending on the implementation of the transcription, can be
used for the speech-recognition device. In addition, a
so-called automatic pho-
`netic transcription that performs a rule-based conversion of
`orthographically present names using a table of exceptions
`into a phonetic description can be provided. Entry of lexicon
`membership is only possible if the corresponding lexica are
`generated in an “off-line editing mode,” separately from the
`actual operation of the navigation system, from the destina-
`tion file and have been stored in the (at least one) database,
`for example a CD-ROM or a remote database at a central
`location that can be accessed by corresponding communi-
`cations devices such as a mobile radio network. Generation
`
`of the lexica in the “off-line editing mode” makes sense only
`if sufficient storage space is available in the (at least one)
`database and is especially suitable for
`lexica that are
`required very frequently. In particular, a CD-ROM or an
`external database can be used as the database for
`the
`
`destination file since in this way the destination file can
`always be kept up to date.
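
A sketch of such an automatic phonetic transcription, with the exception table taking precedence over the rules (the rules, notation, and exception entry below are illustrative assumptions, not the patent's actual rule set):

```python
# Rule-based conversion of an orthographic name into a phonetic
# description, with a table of exceptions consulted first.

EXCEPTIONS = {"Flensburg": "fl'EnsbUrk"}  # assumed exception entry

RULES = [("sch", "S"), ("ch", "x"), ("ei", "aI"), ("g", "k")]  # toy rules

def phonetic_transcription(name: str) -> str:
    if name in EXCEPTIONS:               # exceptions take precedence
        return EXCEPTIONS[name]
    result = name.lower()
    for grapheme, phoneme in RULES:      # apply rules in order
        result = result.replace(grapheme, phoneme)
    return result

print(phonetic_transcription("Flensburg"))  # from the exception table
print(phonetic_transcription("Eichberg"))   # rule-based conversion
```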
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`At the moment, not all possible place names in the Federal
`Republic of Germany have been digitized and stored in a
`database. Similarly, a corresponding street list is not avail-
`able for all locations. Therefore it is important to be able to
`update the database at any time. An internal nonvolatile
`storage area of the navigation system can also be used as the
`database for the (at least one) lexicon generated in the
`“off-line editing mode.”
`To facilitate more rapid speech entry of a desired desti-
`nation address into the navigation system, following the
`initialization phase of the navigation system or with suffi-
`ciently large nonvolatile internal storage, a basic vocabulary
`is loaded each time the database is changed, which vocabu-
`lary comprises at least one basic lexicon generated from the
`destination file. This basic lexicon can be generated in the
`“off-line editing mode.” The basic lexicon can be stored in
`the database in addition to the destination file or can be
`
`stored in a nonvolatile internal memory area of the naviga-
`tion system. As an alternative, generation of the basic
`lexicon can wait until after the initialization phase. Dynamic
`generation of lexica during real-time operation of the navi-
`gation system, in other words during operation, offers two
important advantages. Firstly, this creates the possibility of
putting together any desired lexica from the data stored
in the (at least one) database, and secondly, considerable
`storage space is saved in the (at least one) database since not
`all of the lexica required for the various input dialogues need
`to be stored in the (at least one) database prior to activation
`of the speech-recognition engine.
`In the embodiment described below, the basic vocabulary
`comprises two lexica generated in the “off-line editing
`mode” and stored in the (at least one) database, and two
`lexica generated following the initialization phase. If the
`speech-recognition device has sufficient working memory,
`the basic vocabulary is loaded into it after the initialization
`phase, in addition to the admissible speech commands for
`the speech dialogue system, as described in the above
`mentioned German patent application P 195 33 541.4-52.
`Following the initialization phase and pressing of the PTT
`(push-to-talk) button,
`the speech dialogue system then
`allows the input of various information to control
`the
`devices connected to the speech dialogue system as well as
`to perform the basic functions of a navigation system and to
`enter a destination location and/or a street as the destination
`address for the navigation system. If the speech-recognition
device has insufficient RAM, the basic vocabulary is not
`loaded into it until a suitable operating function that accesses
`the basic vocabulary has been activated.
`The basic lexicon, stored in at least one database, com-
`prises the “p” largest cities in the Federal Republic of
`Germany, with the parameter “p” in the design described
`being set at 1000. This directly accesses approximately 53
`million citizens of the FRG or 65% of the population. The
`basic lexicon comprises all locations with more than 15,000
`inhabitants. A regional lexicon also stored in the database
`includes “z” names of regions and areas such as Bodensee,
`Schwabische Alb, etc., with the regional
`lexicon in the
`version described comprising about 100 names for example.
`The regional
`lexicon is used to find known areas and
`conventional regional names. These names cover combina-
`tions of place names that can be generated and loaded as a
`new regional lexicon after the local or regional name is
`spoken. An area lexicon, generated only after initialization,
`comprises “a” dynamically loaded place names in the vicin-
`ity of the actual vehicle location, so that even smaller places
`<<
`2:
`in the immediate vicinity can be addressed directly, with the
`parameter
`a in the embodiment described being set at 400.
`
`13
`
`13
`
`
`
`US 6,230,132 B1
`
`5
`This area lexicon is constantly updated at certain intervals
`while driving so that it is always possible to address loca-
`tions in the immediate vicinity directly. The current vehicle
`location is reported to the navigation system by a positioning
`system known from the prior art, for example by means of
`a global positioning system (GPS). The previously described
`lexica are assigned to the speaker-independent speech-
`recognition engine. A name lexicon that is not generated
`from the destination file and is assigned to the speaker-
`dependent speech-recognition engine comprises approxi-
`mately 150 keywords from the personal address list of the
`user, spoken by the user. Each keyword is then given a
`certain destination address from the destination file by the
`input dialogue “store address.” These specific destination
`addresses are transferred to the navigation system by speech
`input of the associated keywords using the input dialogue
`“call up address.” This results in a basic vocabulary of about
`1650 words that are recognized by the speech-recognition
`device and can be entered as words spoken in isolation
`(place names, street names, keyword).
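
The constant updating of the area lexicon described above might look like the following sketch, assuming a geographic code per destination entry (cf. Table 1 below) and a simple distance metric (helper names are assumptions):

```python
import math

def area_lexicon(destination_file, vehicle_position, a=400):
    """Return the 'a' places nearest the current vehicle position, as
    reported by the positioning system (e.g. GPS)."""
    lon0, lat0 = vehicle_position
    def distance(entry):
        lon, lat = entry["geo_code"]     # geographic code of the place
        return math.hypot(lon - lon0, lat - lat0)
    return sorted(destination_file, key=distance)[:a]

# Called at certain intervals while driving, for example:
# recognizer.load_lexicon(area_lexicon(destination_file, gps_position()))
```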
`Provision can also be made for transferring addresses
`from an external data source, for example a PDA (personal
`digital assistant) or a portable laptop computer, by means of
`data transfer to the speech dialogue system or to the navi-
gation system, and integrating them as an address lexicon in the
`basic vocabulary. Normally, no phonetic descriptions for the
`address data (name, destination location, street) are stored in
`the external data sources. Nevertheless in order to be able to
`
`transfer these data into the vocabulary for a speech-
`recognition device, an automatic phonetic transcription of
`these address data, especially the names, must be performed.
`Assignment to the correct destination location is then per-
`formed using a table.
`For the sample dialogues described below, a destination
`file must be stored in the (at least one) database of the
`navigation system that contains a data set according to Table
1 for each place found in the navigation system. Depending on
`the storage location and availability, parts of the information
`entered can also be missing. However, this only relates to
`data used to resolve ambiguities, for example additional
`naming component, county, telephone area codes, etc. If
`address data from an outside data source are used,
`the
address data must be supplemented accordingly. The word
subunits for the speech-recognition device, which acts as a
hidden Markov model speech recognition engine (HMM
recognition engine), are especially important.
`
TABLE 1

Description of Entry                Example

Place Name                          Flensburg
Political Affiliation or            —
  additional naming component
Postal Code or Postal Code Range    24900-24999
Telephone Area Code                 0461
County                              Flensburg, county
State                               Schleswig-Holstein
Population                          87,526
Geographic Code                     9.43677, 54.78204
Phonetic Description                Ifl'EnsIbUrkI
Word Subunits for HMM               f[LN] l e e[LN] n[C] s b[Vb]
  Speech-Recognizing Device           U[Vb] r k. or 101 79 124 117
                                      12 39 35 82 68
Lexicon Membership                  3, 4, 78 . . .
`
`Other objects, advantages and novel features of the
`present invention will become apparent from the following
`detailed description of the invention when considered in
`conjunction with the accompanying drawings.
`
`6
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a schematic diagram providing an overview of
`the possible input dialogues for speech input of a destination
`address for a navigation system according to the invention;
`FIG. 2 is a schematic representation of a flowchart of a
`first embodiment of the input dialogue “destination location
`input”;
FIG. 3 is a schematic view of a flowchart of a second
embodiment for the input dialogue “destination location
input”;
`FIG. 4 is a schematic view of a flowchart for the input
`dialogue “choose from list”;
`FIG. 5 is a schematic view of a flowchart for the input
`dialogue “resolve ambiguity”;
`FIG. 6 is a schematic diagram of a flowchart for the input
`dialogue “spell destination location”;
`FIG. 7 is a schematic view of a flowchart for the input
`dialogue “coarse destination input”;
`FIG. 8 is a schematic view of a flowchart for the input
`dialogue “store address”;
`FIG. 9 is a schematic view of a flowchart for the input
`dialogue “street input”; and
`FIG. 10 is a schematic view of a block diagram of a device
`for performing the method according to the invention.
`
`DETAILED DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 shows an overview of the possible input dialogues
`for speech input of a destination address for a navigation
`system. A speech dialogue between a user and a speech
`dialogue system according to FIG. 1 begins following the
`initialization phase with a wait state 0, in which the speech
`dialogue system stops until the PTT button (push-to-talk
`button) is actuated, and to which the speech dialogue system
`returns after the speech dialogue is terminated. The user
`activates the speech dialogue system by actuating the PTT
`button in step 100. The speech dialogue system replies in
`step 200 with an acoustic output, for example by a signal
`tone or by a speech output indicating to the user that the
`speech dialogue system is ready to receive a speech com-
`mand. In step 300, the speech dialogue system waits for an
`admissible speech command in order, by means of dialogue
`and process control, to control the various devices connected
`to the speech dialogue system or to launch a corresponding
input dialogue. However, of the admissible speech
commands, only those relating to the navigation system will
be detailed at this point. The following speech commands relating
`to the various input dialogues of the navigation system can
`now be entered:
`
`“Destination location input” E1: This speech command
`activates the input dialogue “destination location
`input.”
`“Spell destination location” E2: This speech command
`activates the input dialogue “spell destination loca-
`tion.”
`
`“Coarse destination input” E3: This speech command
`activates the input dialogue “coarse destination input.”
`“Postal code” E4 or “telephone area code” E5: The input
`dialogue “indirect
`input” is activated by these two
`speech commands.
`“Street input” E6: This speech command activates the
`input dialogue “street input.”
`“Store address” E7: This speech command activates the
`input dialogue “store address.”
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`14
`
`14
`
`
`
`US 6,230,132 B1
`
`7
`“Call up address” E8: This speech command activates the
`input dialogue “call up address.”
`Instead of the above, of course, other terms can be used
`to activate the various input dialogues. In addition to the
`above speech commands, general speech commands can
`also be used to control the navigation system, for example
`“navigation information,” “start/stop navigation,” etc.
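
A sketch of the wait state and this command dispatch (function names are hypothetical; the mapping of commands to steps follows the text above and FIG. 1):

```python
def enter_destination():  print("step 1000")   # E1
def spell_destination():  print("step 2000")   # E2
def coarse_destination(): print("step 3000")   # E3
def indirect_input():     print("step 4000")   # E4, E5
def street_input():       print("step 5000")   # E6
def call_up_address():    print("step 6000")   # E8
def store_address():      print("step 7000")   # E7

INPUT_DIALOGUES = {
    "destination location input": enter_destination,
    "spell destination location": spell_destination,
    "coarse destination input": coarse_destination,
    "postal code": indirect_input,
    "telephone area code": indirect_input,
    "street input": street_input,
    "store address": store_address,
    "call up address": call_up_address,
}

def handle_speech_command(command):
    """Steps 100-300: after the PTT button and the ready signal, an
    admissible speech command launches the matching input dialogue;
    otherwise the system stays in wait state 0."""
    dialogue = INPUT_DIALOGUES.get(command)
    if dialogue is not None:
        dialogue()

handle_speech_command("destination location input")  # prints "step 1000"
```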
`After starting an input dialogue by speaking the corre-
`sponding speech command,
`the corresponding lexica are
`loaded as the vocabulary into the speech recognition device.
`With a successfully performed speech input of the destina-
`tion location as part of the destination address input by
`means of one of the input dialogues “destination location
`input” in step 1000, “spell destination location” in step 2000,
`“coarse destination input” in step 3000, or “indirect input”
`in step 4000, a check is then made in step 350 whether or not
`a corresponding street list is available for the recognized
`destination location. If the check yields a negative result, a
`branch is made to step 450. If the check yields a positive
`result, a check is made in step 400 to determine whether or
`not the user wants to enter a street name. If the user responds
`to question 400 by “yes,” the input dialogue “street input” is
`called up. If the user answers question 400 by “no” a branch
`is made to step 450. Question 400 is therefore implemented
`only if the street names for the corresponding destination
`location are included in the navigation system. In step 450,
`the recognized desired destination location is automatically
updated by entering “center” or “downtown” as the
`street input, since only a complete destination address can be
`transferred to the navigation system, with the destination
`address in addition to the destination location also compris-
`ing a street or a special destination, for example the railroad
`station, airport, downtown, etc. In step 500, the destination
`address is passed to the navigation system. Then the speech
`dialogue is concluded and the speech dialogue system
`returns to wait state 0.
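
The flow of steps 350 through 500 can be summarized in a short sketch (helper names are assumptions):

```python
def complete_destination_address(location, has_street_list, ask_user,
                                 run_street_input, set_destination):
    """Only a complete destination address (location plus street or a
    special destination) is transferred to the navigation system."""
    if has_street_list(location) and ask_user("Enter a street?"):  # steps 350, 400
        street = run_street_input(location)                        # step 5000
    else:
        street = "center"                                          # step 450
    set_destination(location, street)                              # step 500
    # the speech dialogue then returns to wait state 0

# usage sketch:
complete_destination_address(
    "Flensburg",
    has_street_list=lambda loc: False,
    ask_user=lambda question: False,
    run_street_input=lambda loc: "Hauptstrasse",
    set_destination=lambda loc, street: print(loc, street),
)
```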
`
`If the speech command “street input” E6 was spoken by
`the user at the beginning of the speech dialogue in step 300
`and recognized by the speech recognition device, in step
`5000 the input dialogue “street input” will be activated.
`Then, following the successful input of the desired destina-
`tion location and the street, the destination address is trans-
`ferred to the navigation system in step 500. If the speech
`command “call up address” E8 was spoken by the user at the
`beginning of the speech dialogue in step 300 and was
`recognized by the speech recognition device, in step 6000
`the input dialogue “call up address” will be activated. In the
`input dialogue “call up address” a keyword is spoken by the
`user and the address associated with the spoken keyword
`will be transferred in step 500 as a destination address to the
`navigation system. If the speech command “store address”
E7 was spoken by the user at the beginning of the speech
`dialogue in step 300 and recognized by the speech recog-
`nition device, in step 7000 the input dialogue “store address”
`is activated. By means of input dialogue “store address,” a
`destination address that has been entered is stored under a
`
`keyword spoken by the user in the personal address list.
Then the input dialogue “store address” is ended and the
`system returns to wait state 0.
`FIG. 2 shows in a schematic form a first embodiment of
`
`the input dialogue “enter destination location.” Following
`activation of the input dialogue “enter destination location”
`in step 1000, by virtue of the speech command “enter
`destination location” E1 spoken in step 300 by the user and
`recognized by the speech recognition device, in step 1010
`the basic vocabulary is loaded into the speech recognition
`device as can be seen from FIG. 2. The loading of the basic
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`8
`vocabulary into the speech recognition device basically can
`also be performed at another point, for example after the
`initialization phase or following the actuation of the PTT
`button. This depends on the speed of the loading process and
`on the type of speech recognition device used. Then in step
`1020 the user is requested to enter a destination location. In
`step 1030 the user enters the desired destination location by
`speech input. This speech input is transferred in step 1040 as
`an acoustic value <destination location_1> to the speech
`recognition device and compared there with the basic
`vocabulary that was loaded; sampling values in the time or
`frequency domain or feature vectors can be transmitted to
`the speech recognition device as an acoustic value. The
`nature of the acoustic value thus transferred likewise
`
`depends on the type of speech recognition engine employed.
`As a result, the speech recognition engine supplies a first
`hypothesis list hypo.1 with place names which are sorted by
probability of recognition. If the hypothesis list hypo.1
`contains homophonic place names, i.e. place names that are
`pronounced identically but are written differently,
`for
`example Ahlen and Aalen, both place names receive the
`same recognition probability and both place names are taken
`into account in the continuation of the input dialogue. Then
`in step 1050 the place name with the greatest recognition
`probability is output as speech output <hypo.1.1> to the user
with the question as to whether or not <hypo.1.1> corre-
sponds to the desired input destination location <destination
location_1>. (At this point it still makes no difference
whether several entries are present at the first location on the
hypothesis list since the place names are pronounced
identically.) If the answer to question 1050 is “yes” a jump
`is made to step 1150. If the user answers the question with
`“no” the acoustic value <destination location_1> of the
`destination location entered in step 1060 is stored for a
`possible later recognition process using another lexicon.
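
A sketch of this hypothesis handling, including the equal treatment of homophones and the storing of the acoustic value on a “no” answer (steps 1040 through 1070; names and probabilities are illustrative):

```python
def confirm_destination(hypothesis_list, confirm, stored_values, acoustic_value):
    """hypothesis_list is sorted by recognition probability; homophones
    share the top probability and are all carried forward."""
    top_probability = hypothesis_list[0][1]
    candidates = [name for name, p in hypothesis_list if p == top_probability]
    if confirm(f"Is {candidates[0]} correct?"):   # step 1050
        return candidates                          # continue, e.g. step 1150
    stored_values.append(acoustic_value)           # step 1060: keep for later
    return None                                    # step 1070: ask again

hypo_1 = [("Ahlen", 0.92), ("Aalen", 0.92), ("Aachen", 0.40)]
stored = []
result = confirm_destination(hypo_1, lambda q: True, stored,
                             "<destination location_1>")
print(result)  # ['Ahlen', 'Aalen']: both homophones kept
```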
`Then the user is requested in step 1070 to pronounce the
`destination location again. In step 1080 the user enters the
`destination location once again by speech input. This speech
`input
`is transferred in step 1090 as the acoustic value
<destination location_2> to the speech recognition device
`and compared there with the basic vocabulary that has been
`loaded. As a result the speech recognition device offers a
`second hypothesis list hyp