(10) Patent No.:
`US 6,850,609 B1
`
`Schrage
`(45) Date of Patent:
`Feb. 1, 2005
`
US006850609B1
`
(54) METHODS AND APPARATUS FOR
`PROVIDING SPEECH RECORDING AND
`SPEECH TRANSCRIPTION SERVICES
`1
`_
`-
`.‘
`(75)
`Invcmm'
`James R' SChmge’ R'dgu'e’ld’ (" (US)
`.
`_
`.,
`‘
`.
`_,
`.
`.
`_
(73) Assignee: Verizon Services Corp., Arlington, VA (US)
( * ) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.
`
(21) Appl. No.: 09/178,395
`
`(22
`
`Filed:
`
`Oct. 23, 1998
`
`(60)
`
`(51)
`
`Related 0.8. Application Data
`Provisional application No. 610063952, filed on Oct. 28,
`1997-
Int. Cl.7 .......... H04M 3/42; H04M 1/64;
G10L 15/26; G10L 17/00
`3797202 01- 37978810-
`(5’?) U
`' """" 37978814370058 70;”;35 704,246
`.1 '
`'7
`’
`-
`_’
`'
`,
`5
`'
`'
`’
`5
`'
`_
`,,
`(58} held of Search
`379788.11, 88.01,
`379788.26, 88.07, 88.13, 100.01, 158, 20201—20601;
`7047235, 231; 348114.08, 14.09
`
`Cl
`
`(56)
`
`References Cited
`U.S. PA’I‘LTN'I‘ DOCUMENTS
`
`........... 379738.01
`611996 Russell cl a1.
`5,526,407 A *
`3797202
`911996 Bicsclin
`5,559,875 A *
`
`....................... 379167
`5,586,172 A * 1271996 Sakurai
`5,606,643 A *
`271997 Balasubramanian
`7047243
`el al.
`.. 379788.11
`471997 Fenlon el al.
`5,619,555 A *
`
`369,125.01
`771997 Ellozy el al.
`5,649,060 A *
`348104.09
`111998 Bruno cl al.
`5,710,591 A *
`301000.01
`211900 Dahlcn
`5,870,454 A *
`
`..... 7041252
`672000 Glickman el al.
`6,076,059 A *
`
`............ 7047235
`8.12000 Sharman ct al.
`6,100,882 A *
`872001 Bowalcr .................. 379788.13
`6,278,772 Bl *
`6,304,648 Bl * 1072001 Chang
`379720201
`
`6,327,343 Bl * 1272001 Epstein et al.
`379788.01
`
`.
`
`
`
`212002 Sailo
`6,349,303 Bl *
`7077101
`5.12002 Powens et at.
`6,389,114 Bl *
`379,152
`772002
`Iaylor
`6,424,935 Bl *
`7041’10
`872002 (.annon ct al.
`6,430,270 Bl *
`379788.19
`6,477,491 B1 "‘ 1172002 Chandler eta].
`..
`7047235
`
`6,501,740 Bl * 12.72002 Sun et al.
`......
`3701‘261
`6,600,725 01 *
`"02003 Roy
`3701261
`0,074,450 32 «'4
`02004 Ben-Shacharelal.
`348714.09
`‘ cited b
`examiner
`y
Primary Examiner: Fan Tsang
Assistant Examiner: J. Phan
(74) Attorney, Agent, or Firm: Leonard C. Suchyta, Esq.;
Joel Wall, Esq.; Straub & Pokotylo
`
`(57)
`
`ABSTRACT
`
Recording and automated transcription methods and apparatus
suitable for use in a communication system such as a
telephone system are described. In one embodiment, a
telephone system has a conference bridge, a transcription
system, and multiple telephone sets connected to a central
office (CO) switch via communication channels. Each
telephone set lets its user transmit speech on first and second
channels simultaneously. The CO switch selectively
connects the first channel from each telephone being used in a
conference call to the conference bridge. The bridge
interconnects the first channels to establish a telephone
conference. The CO switch selectively connects the second
channels to the transcription system, which records the
teleconference participants in separate recordings. The
separate recordings are each time stamped and identified with a
user's or communication unit ID. An automated speech-to-text
(speech recognition) system transforms the recorded
speech into textual data. A collator collates text segments
generated from the speech obtained from different channels
using the time stamps and the IDs to form a master
transcript of textual data. The present invention can be used
by single or multiple users. A single user may use the system
of the present invention as a dictation system. The system
can be used for automatically generating a transcript of a
multi-party telephone conference or meeting.
`
`24 Claims, 6 Drawing Sheets
`
`
`
`0001
`
` GTL 1004
`GTL 1004
`IPR of U.S. Pat. No. 8,135,115
`IPR of US. Pat. No.
`8,135,115
`
`
`
FIG. 1 (PRIOR ART), Sheet 1 of 6. Labeled elements: computer,
C.O. switch, conference bridge.
`
`
`
FIG. 2, Sheet 2 of 6. Labeled elements: communication system 10,
fax machine 14, switching network, transcript server,
multi-channel recording device 16, transcription system 13.
`
`
`
FIG. 3, Sheet 3 of 6. Labeled elements: communication system 10,
channels C1, C2, C3, switching network 11, switched connection 15,
channels L1, L2, Lp, fax machine 14, computer 18, transcript
server 12 (I/O interface, memory 30, interface routines 32,
recording/time stamping routines, application routines,
multi-channel recording device 16, digital data storage device),
transcription system (I/O interface 50, transcript generation
routines).
`
`
`
FIG. 4A, Sheet 4 of 6. Flow chart steps:
402: Switching network receives request for connection to a
transcription service provider.
404: Switching network establishes a communication path between
requester and transcript server.
406: Transcript server prompts user for transcription related
information.
408: Transcript server receives information from user.
410: Transcript server enters recording mode operation.
412: Transcript server signals user that it is ready to record.
414: User begins providing speech input.
Go to step 416.
`
`
`
FIG. 4B, Sheet 5 of 6. Flow chart steps (from step 414):
416: Record speech with time stamp, date, user ID, conference ID
(if any) and delivery instructions.
418: Provide audio recording to transcription system.
420: Perform speech recognition operation on recorded audio data.
422: Generate transcript from recorded speech.
424: Is a composite transcript to be generated?
426: Combine individual transcripts having same conference ID into
composite transcript.
428: Transcript delivery.
`
`
FIG. 5, Sheet 6 of 6. Labeled elements: switching network,
I/O interfaces, conference bridge 512, multi-channel recording
device 16, transcript server, transcription system 13.
`
`
`METHODS AND APPARATUS FOR
`PROVIDING SPEECH RECORDING AND
`SPEECH TRANSCRIPTION SERVICES
`
`RELATED APPLICATIONS
`
This application is a continuation-in-part of U.S. Provisional
Patent Application Ser. No. 60/065,952, titled
"TELEPHONE COMMUNICATION METHODS AND
APPARATUS" which was filed on Oct. 28, 1997, and which
is hereby expressly incorporated by reference.
`
FIELD OF THE INVENTION
`
The present invention relates to the field of electronic
communications and, more particularly, to apparatus and
methods for providing recording and/or transcription
services.
`
BACKGROUND OF THE INVENTION
`
Current communication systems provide a spectrum of
services to subscribers. Many modern communication
systems, e.g., telephone communication systems and
telephone networks, readily allow human-to-human,
computer-to-computer and human-to-computer interactions via the
transmission of audio and data over communication
channels. Connection system subscribers can now access such
diverse communication services as call messaging, call
screening, message retrieval, call waiting, call forwarding
and teleconferencing from phones and computers. The
Internet is an example of a communication system that is
currently being used to transmit both voice and data signals
and to interface computer systems and networks with
existing telephone networks.
`As competition between communication service
`providers, e.g., regional telephone companies, Internet ser-
`vice providers, long distance telephone service providers,
`etc., increases, service providers continue to look for new
`ways to distinguish themselves from their competitors and to
`increase revenues. New services often provide a way for
`communication services to distinguish themselves from
`their competitors while, at the same time, creating new
`sources of revenue.
`
In an attempt to provide new services and increase
revenues, telephone and communication service companies
have offered in recent years a host of new services. Such
services often take advantage of existing technology such as
speech recognition, the ability to make, store and transmit
voice recordings and/or the ability to transmit scheduling
information over the Internet.
`
Voice dialing telephone service is an example of a modern
telephone service which involves the use of speech
recognition. Voice mail is an example of a service which takes
`advantage of the ability to make, store and transmit voice
`recordings. Telephone network initiated conference calling,
`where a conference bridge coupled to the Internet is used to
`initiate a multi-party conference call, is an example of a
`modern service which takes advantage of the ability to
`transmit scheduling information over the Internet.
FIG. 1 illustrates a known telephone system 2 which
includes a plurality of telephones T1-TN coupled to a
telephone network conference bridge 3, via a central office
telephone switch 4. A computer 5, located in the proximity
of any one of the telephones T1-TN, can be used to transmit
telephone conference scheduling information to the
conference bridge 3 via the Internet 6 to schedule a telephone
conference. The transmitted information may include, e.g.,
the time of the telephone conference and the telephone
numbers assigned to the telephones which are to be used in
the telephone conference. In the known system, the
conference bridge initiates a telephone conference at the appointed
time by calling each of the telephone numbers associated
with a scheduled telephone conference and bridging the calls
so that the audio received from any one phone is transmitted
to all of the telephones involved in the scheduled conference
call.
`
While the services discussed above have proved useful to
many subscribers and a worthy source of revenue to many
communication service providers, there remains a need for
new communication services which can be used to
distinguish a communication service provider from its
competitors and serve as a source of revenue. New communication
services which augment or supplement existing services are
particularly desirable because they can be used to maintain
an existing subscriber base while helping to attract
subscribers from competitors' services. Entirely new services which
may be provided independent from existing services are also
desirable as a new revenue source.
`
`Telephone network hardware is maintained, updated and
`serviced by the communication service provider.
`Accordingly, services which can be implemented by adding
`or modifying a limited amount of network hardware, e.g., a
`peripheral server device, tend to be easier to implement on
`a large scale than services which require substantial new
`amounts of customer premise equipment. For this reason,
`new services which can be implemented by adding or
`modifying network hardware without requiring substantial
`changes to existing customer premise equipment tend to be
`more desirable than services which require new customer
`premise equipment or substantial modifications to customer
`premise equipment.
`As will be discussed below, while some embodiments
`require new or modified customer premise equipment, e.g.,
`telephones, many of the methods, apparatus, and services of
`the present invention can be implemented without the need
`for new customer premise equipment or substantial modi-
`fications to existing customer premise equipment.
`SUMMARY OF THE INVENTION
`
As discussed above, the present invention relates to
methods and apparatus for providing recording and/or
transcription services, e.g., as part of a communication system.
The present invention provides a remotely located
recording and/or transcription device coupled to one or more audio
sources, e.g., telephones. One or more networking devices,
e.g., telephone switches, may be used to couple the
telephones to the recording and transcription device of the
present invention.
`In a single user embodiment, the recording device records
`the audio signal received from the audio source, e.g., a single
`telephone. It then transmits the recorded audio, e.g., speech,
`to a transcription device which performs a speech recogni-
`tion operation on the received audio. A transcript, e.g., a set
`of text, generated by performing the speech recognition
`operation is produced and transmitted to the user of the
service. Transcript delivery may be by way of an E-mail
message, posting of the transcript on an Internet site to
which the user has access, and/or by faxing the generated
transcript to the user using a fax number provided by the
user.
`
`While the transcript device can be used to generate a
`transcript in real time, the system of the present invention
`need not be implemented as a real time transcription device.
`0008
`0008
`
`
`
`US 6,850,609 B]
`
`3
In one embodiment, the system of the present invention
delivers transcripts minutes or hours after the audio input is
received and recorded. This approach provides several
advantages. First, because the transcription process is not
performed in real time, the speech recognition task can be
performed in a time period that is considerably longer than
the duration of the speech upon which the recognition
operation is performed. This allows for multiple speech
recognition passes to be made using various speech
recognition techniques, as is known in the speech recognition art,
to improve recognition accuracy. It also allows for the
speech recognizer to operate at a slower speed than would be
required for real time speech recognition. Thus lower cost
(slower) processors can be used to perform the speech
recognition operation if desired. In addition, non-real time
recognition facilitates the efficient use of a speech
recognizer as a shared network resource since sequential
processing of simultaneously received and recorded audio signals is
made possible.
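The sequential, non-real-time processing described above can be sketched as a simple work queue. This is a minimal illustration only: the function names, the `(recording_id, audio)` pair layout, and the stand-in recognizer are assumptions made for the example, not details taken from the patent.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def transcribe_sequentially(
    recordings: Iterable[Tuple[str, bytes]],
    recognize: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """Process recordings one at a time with a single shared recognizer.

    The recordings may have been captured simultaneously; because the
    work is not real time, they can simply wait in a queue.
    """
    queue = deque(recordings)
    results = []
    while queue:
        recording_id, audio = queue.popleft()
        results.append((recording_id, recognize(audio)))
    return results


def fake_recognizer(audio: bytes) -> str:
    """Hypothetical stand-in for a real speech recognizer."""
    return audio.decode("ascii").upper()
```

A slower or more accurate recognizer could be substituted for `fake_recognizer` without changing the queueing logic, which is the point of treating the recognizer as a shared resource.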
Implementing the speech recognizer used in the
transcription system as a shared resource also allows for the use of
a far more expensive, and potentially far more accurate,
speech recognizer than individual users of the system could
afford to purchase independently. The potential for
providing transcripts with a greater degree of accuracy than most
customer premise equipment would be able to provide, and
the advantage of eliminating the need for customers to invest
in transcription equipment, are features which should appeal
to many potential customers of the speech transcription
services of the present invention.
While one embodiment of the present invention is
directed to generating an audio recording and transcript from
audio signals corresponding to a single user, other
embodiments are directed to making transcripts from audio signals
received from multiple sources, e.g., users. The multiple
audio sources may be, e.g., different telephones operating in
speaker phone mode placed in front of different individuals
located in the same conference room or telephones
responding to different individuals participating in a telephone
conference.
`
In accordance with various embodiments of the present
invention, a different communication channel is used to
provide the audio originating from each separate audio
source. The audio from each channel is recorded and a
source identifier, e.g., telephone number, is associated with
the recording. Time and date stamps are included in the
audio recording for subsequent use by the transcription
device of the present invention. A conference ID may also be
added to identify different recordings corresponding to the
same conference.
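One way to picture the per-channel recording and its associated data is as a small record type. The sketch below is an assumption-laden illustration: the class name, field names, and helper function are hypothetical, and only the kinds of data (source identifier, time and date stamp, optional conference ID) come from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ChannelRecording:
    """Audio from one channel plus the identifying data stored with it."""
    source_id: str                       # e.g., a telephone number
    started_at: datetime                 # time and date stamp
    audio: bytes                         # recorded audio samples
    conference_id: Optional[str] = None  # present only for conferences


def recordings_for_conference(
    recordings: List[ChannelRecording], conference_id: str
) -> List[ChannelRecording]:
    """Select the separate recordings that belong to one conference."""
    return [r for r in recordings if r.conference_id == conference_id]
```

Keeping the conference ID optional mirrors the single-user case, where no composite transcript is needed.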
`
Thus, in accordance with the present invention the speech
from each member of a meeting or telephone conference for
which a transcript is to be automatically generated is
independently recorded. The recorded time stamps facilitate the
subsequent correlation of the audio recorded from the
separate channels and allow for a combined time correlated
transcript to be generated using automated speech
recognition techniques.
`Automated speech recognition operations can be per-
`formed on the recorded audio to automatically generate text
`transcripts therefrom. Alternatively, all or portions of the
`recorded audio may be provided to human beings for
`transcription or to be used for other purposes.
In accordance with various embodiments of the present
invention, each recorded audio channel is separately
transcribed using automated speech recognition techniques. The
generated transcripts corresponding to each separate audio
source of a meeting or telephone conference are then
combined into a composite transcript.
`In one embodiment, the composite transcript includes an
`audio source identifier, e.g., telephone number or party name
`before each portion of text included in the transcript. Text
`segments corresponding to different audio sources are inter-
`leaved in the composite transcript in the order the speech
`segments occurred.
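The interleaving just described can be sketched in a few lines. The `TextSegment` type, its field names, the use of seconds as the time unit, and the "source: text" output format are assumptions for illustration; the text only requires that segments appear in the order the speech occurred, each preceded by a source identifier.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TextSegment:
    start_seconds: float  # time stamp of the speech segment
    source_id: str        # e.g., telephone number or party name
    text: str             # recognized text for the segment


def build_composite(per_source: Iterable[List[TextSegment]]) -> str:
    """Interleave segments from all sources in time order."""
    merged = sorted(
        (seg for segments in per_source for seg in segments),
        key=lambda seg: seg.start_seconds,
    )
    # Prefix each portion of text with its source identifier.
    return "\n".join(f"{seg.source_id}: {seg.text}" for seg in merged)
```

Because Python's `sorted` is stable, segments with identical time stamps keep the order in which their sources were supplied.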
In the above described manner the methods and apparatus
of the present invention provide communication service
providers, e.g., telephone companies, the opportunity to
provide a new service, e.g., a transcription service. In
addition, it offers communication service providers which
currently provide telephone conference service an enhanced
form of the service, i.e., a telephone conference service with
an automatic transcript generation feature.
Additional features, embodiments and advantages of the
methods and apparatus of the present invention are
discussed below in the detailed description which follows.
`
BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 illustrates a known system for providing a tele-
`phone conferencing service.
FIG. 2 is a generalized schematic diagram of an
exemplary communication system having transcription services
in accordance with one embodiment of the present
invention.
`
`FIG. 3 is a more detailed diagram of the system of the
`present invention illustrated in FIG. 2.
FIG. 4, which comprises the combination of FIGS. 4A and
4B, is a flow diagram illustrating the operation of the
communication system of FIGS. 2 and 3.
`FIG. 5 is a diagram of a telephone conferencing system
`with automated speech transcription capabilities that may be
`implemented in accordance with an exemplary embodiment
`of the present invention.
`
DETAILED DESCRIPTION
`
FIG. 2 shows a voice communication system 10 of the
present invention. The communication system 10 includes a
switching network 11 coupled to (n) communication units
U1, U2 and Un, e.g., telephones, via respective
communication channels C1, C2 and Cn.
Switching network 11, which may be, e.g., a public
telephone switching network, operates to selectively couple
one or more of the communication units to a transcript server
12 via one or more of (p) communication channels L1, L2
and Lp. The communication units U1, U2 and Un are
remotely located from the switching network 11 and
transcript server 12. For example, the communication units and
transcript server may be located in separate buildings,
towns, or even different countries in the case of international
telephone calls. Transcript server 12 communicates with
transcription system 13. The transcript server 12 includes a
multi-channel recording device 16 suitable for
simultaneously recording audio signals received from multiple
different communication units. The transcript server
normally records audio signals from each communication
channel separately. In various embodiments it also associates
time/date stamps, communication unit ID information and/or
conference ID information with each separate recording.
The transcript server has the capability of providing the
recording to the transcription system 13 which is coupled
thereto. The transcription system performs a speech
recognition operation on recordings received from the transcript
server 12. The transcription system 13 has the capability of
combining recognized speech from multiple channels into a
composite transcript.
FIG. 3 illustrates the communication system 10 of FIG. 2
in greater detail. As illustrated in FIG. 3, the transcript server
12 comprises an input/output (I/O) interface 28 which is
responsible for interfacing between internal components of
the server 12 and the various devices and communications
channels coupled thereto including, e.g., the transcription
system 13, the Internet 17 and communications channels L1
... Lp.
In addition to the I/O interface 28, the transcript server 12
comprises a memory 30, a CPU 38, the multi-channel
recording device 16 and a digital data storage device 40. The
storage device may be, e.g., a hard disk drive. These
transcript server components 30, 38, 16 and 40 are coupled
to each other and the I/O interface 28 by a common bus 35.
The memory includes a plurality of routines which, upon
execution by the CPU 38, control transcript server operation.
The routines stored in memory 30 include interface routines
32 for controlling I/O operations, recording/time stamping
routines 34 used for controlling the multi-channel recording
device 16 and various other application routines 36, e.g.,
speech recognition, DTMF recognition, and call connection
routines. The speech and DTMF recognition routines may be
used by the CPU 38 to recognize transcript related
identification information and delivery instructions provided by a
user of the system.
The recording device 16 is capable of simultaneously
generating a separate audio recording corresponding to each
of the supported input channels L1 to Lp. Recordings
generated by the recording device 16 are stored in the data
storage device 40 prior to transmission to the transcription
system 13. The data storage device 40 may also be used for
storing transcripts, in the form of electronic sets of text,
provided by the transcription system 13 to the transcript
server 12, e.g., for storage and delivery.
The transcription system 13 comprises an I/O interface
50, a speech recognizer 52, data storage device 54, a CPU
56 and memory 58 for storing various routines. The memory
58 includes transcript generation routines 60, text
correlation routines 62 and I/O routines 64. It may also include
speech templates and/or models used by the speech
recognizer 52 to perform speech recognition. The transcript
generation routines are responsible for controlling the CPU
56 to process the results of the speech recognition operations
performed by the speech recognizer 52. A transcript is
generated for each recording provided by the transcript
server. The non-voice data, e.g., user ID, conference ID,
time and date stamps included in the recording to which a
transcript corresponds, are also included in the generated
transcript.
The correlation routines 62 control the CPU 56 to
generate a composite transcript from multiple transcripts which
include the same conference ID. This may be done, as
discussed above, by interleaving text from multiple
transcripts as a function of the time stamps included therein.
User or device ID information is normally inserted
preceding each interleaved text segment to aid in identifying the
source of the speech. Transcripts generated by the transcript
generation routines and composite transcripts generated by
the correlation routines are stored in the data storage device
54.
The I/O routines 64 are responsible for controlling
communications between the transcript server 12 and the
transcription system 13. Thus, the I/O routines 64 control the
transmission of the completed transcript or transcripts to the
transcript server.
Operation of the system 10, in one exemplary
embodiment of the present invention, will now be discussed in
detail with reference to FIG. 4. FIG. 4 is a flow chart
illustrating the steps involved with generating a transcript
using the system 10 of the present invention.
During operation, switching network 11 receives requests
for connection to a transcription service provider from one
or more subscribers via communication units U1, U2 and
Un. This event is represented in FIG. 4 as step 402. The
request may be in the form of a telephone number
corresponding to the transcript server 12. In response, in step 404
switching network 11 establishes a communication path
between the requester's communication unit, say
communication unit U2, and transcript server 12. The established
communication path includes the corresponding
communication channel C2 which couples the communication unit to
the switching network, and an available one of communication
channels L1-Lp, say channel L1, which couples the
switching network to the transcript server 12. Dashed line 15
represents the switched connection established by switching
network 11 linking communication channels C2 and L1.
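Steps 402 and 404 amount to pairing the requester's channel with a free server-side channel. The sketch below is a simplified model under stated assumptions: the `SwitchingNetwork` class, its methods, and the first-free selection policy are hypothetical, since the text says only that an available channel among L1-Lp is used.

```python
from typing import Dict, List, Optional


class SwitchingNetwork:
    """Toy model of a switch linking unit channels to server channels."""

    def __init__(self, server_channels: List[str]) -> None:
        self._free = list(server_channels)   # e.g., ["L1", "L2"]
        self._links: Dict[str, str] = {}     # unit channel -> server channel

    def connect(self, unit_channel: str) -> Optional[str]:
        """Link a unit channel to an available server channel, if any."""
        if unit_channel in self._links:
            return self._links[unit_channel]
        if not self._free:
            return None                      # no server channel available
        server_channel = self._free.pop(0)
        self._links[unit_channel] = server_channel
        return server_channel

    def release(self, unit_channel: str) -> None:
        """Tear down a link and return its server channel to the pool."""
        server_channel = self._links.pop(unit_channel, None)
        if server_channel is not None:
            self._free.append(server_channel)
```

With channels `["L1", "L2"]`, a first call to `connect("C2")` yields `"L1"`, matching the C2-to-L1 example in the text.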
In response to establishment of the communication path to
the communication device U2, the transcript server, in step
406, prompts the user, i.e., the requester, to provide
transcription related information. This may include a request for
a speaker identifier, e.g., name, to be used in the transcript,
and information on how the transcript is to be delivered, e.g.,
by fax, E-mail, etc. A request for speaker identifier
information may be avoided by using automatic number
identification (ANI) information provided by the switching
network and an enhanced caller ID service which provides
caller name and device ID (e.g., telephone number)
information. The request may also include information
requesting a conference ID used to identify which audio input
should be combined into a composite conference transcript.
The user of the system can respond to the transcript server
prompt either orally or via the use of DTMF signals (touch
tones). Step 408 represents receipt of the requested
information from the user. Upon receiving the requested
information from the caller, the transcript server enters a
recording mode in step 410. In step 412, the transcript server
indicates that it is ready to begin recording by transmitting a
signal to the user. The signal may be, e.g., an audio tone of the type
commonly used by telephone answering machines to
indicate readiness to begin recording.
In response to the transcription server ready signal, the
requester begins providing speech input which is transmitted
from communication unit U2 to server line L1 over
communication channel C2. The transcript server 12 generates
and stores, for each communication channel in use, a
separate recording of any received speech, e.g., a digital audio
recording. Time stamp, date, user ID, device ID, conference
ID (if any) and the received transcript delivery information
are also generated and recorded with the received audio.
Recording of such information, at least in some
embodiments, is optional. For example, where a composite
transcript will not be generated, time stamp and conference
ID information may be omitted from the recorded data. In
addition, only one of the user and device IDs may be
recorded and used subsequently as a speech source identifier.
Audio recordings produced by transcript server 12 are
provided, in step 418, to transcription system 13. The
transcription system 13 is responsible for performing, in step
420, an automated speech recognition operation on the
recorded speech. The transcription system 13 generates, in
step 422, a transcript, e.g., a set of text corresponding to
recognized speech in the recording, for each recording
received from the transcript server 12.
`After a transcript is generated, in step 424 a determination
`is made by the transcription system 13 as to whether a
`composite transcript is to be produced. This check can be
`done by determining if the recording used to generate the
`transcript includes a conference ID indicating that the gen-
`erated transcript corresponds to a portion of a conference.
In the case where transcripts of speech from multiple
users are to be combined into a composite transcript, e.g., a
transcript of a conference where a different one of the
communication devices U1 ... Un is used by each
conference participant, a composite, e.g., time correlated,
transcript is generated in step 426. This can be, and in one
embodiment is, done by time correlating and interleaving the text
from the various transcripts corresponding to individual
meeting participants of the same conference into a single
composite transcript.
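The decision in step 424 and the grouping that feeds step 426 can be sketched as follows. The `Transcript` tuple and the function name are hypothetical; the only behavior taken from the text is that transcripts carrying the same conference ID are gathered for combination, while a transcript without a conference ID is handled on its own.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Transcript(NamedTuple):
    source_id: str
    conference_id: Optional[str]
    text: str


def split_by_conference(
    transcripts: List[Transcript],
) -> Tuple[List[Transcript], Dict[str, List[Transcript]]]:
    """Separate standalone transcripts from those needing a composite."""
    standalone: List[Transcript] = []
    conferences: Dict[str, List[Transcript]] = defaultdict(list)
    for transcript in transcripts:
        if transcript.conference_id is None:
            standalone.append(transcript)     # step 424: no composite
        else:
            conferences[transcript.conference_id].append(transcript)
    return standalone, dict(conferences)
```

Each list in the returned dictionary would then be time correlated and interleaved into one composite transcript.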
Transcript delivery is effected in step 428, in the FIG. 2
embodiment, via the transcript server 12. The transcript
server 12 receives transcripts generated by the transcription
system 13 and transmits them according to the received
delivery instructions. Generated and/or composite
transcripts may be delivered in step 428. Transcript delivery may
be in electronic form, e.g., by transmitting an electronic
version of the generated transcript as part of an E-mail, e.g.,
as a file attachment, to a computer 18 via the Internet 17, a
LAN connection, or another connection. The transcript may,
alternatively, be posted on a server, e.g., the computer 18
that is coupled to the Internet. In such an embodiment the
transcript can be accessed by multiple parties, e.g., the
meeting participants. Alternatively, it can be faxed by the
transcript server 12 via the switching network 11 to one or
more fax machines 14 coupled thereto. The audio recordings
from which the transcript was made can be delivered with
the transcript if desired, except, of course, in the case of fax
delivery.
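Delivery according to received instructions can be pictured as a dispatch over delivery methods. In this sketch the method names, the `(method, address)` instruction format, and the handler signature are all assumptions for illustration; real E-mail, posting, or fax transmission is left to caller-supplied handlers.

```python
from typing import Callable, Dict, List, Tuple

# A handler takes a destination address and the transcript text.
Handler = Callable[[str, str], None]


def deliver(
    transcript: str,
    instructions: List[Tuple[str, str]],
    handlers: Dict[str, Handler],
) -> List[str]:
    """Send a transcript by each requested method that has a handler.

    Returns the list of methods actually used, in request order.
    """
    used = []
    for method, address in instructions:
        handler = handlers.get(method)
        if handler is None:
            continue                  # unsupported method: skip it
        handler(address, transcript)
        used.append(method)
    return used
```

A caller might register handlers under keys such as "email", "post" and "fax", one for each delivery option named in the text.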
In an alternative embodiment, transcript delivery is
performed by the transcription system 13, via an Internet or
telephone connection via one of the above described
transmission techniques. Such an embodiment has the advantage
of not requiring further involvement by the transcript server
in the transcript preparation and delivery process after the
audio/data recordings are forwarded to the transcription
system 13.
With delivery of the transcript in step 428, transcript
generation and delivery is complete pending the next
transcript generation request.
The above description of communication system 10
broadly outlines an exemplary recording and transcription
embodiment of the present invention. It is contemplated that
many variations, modifications and specific
implementations are possible. The network formed by communication
units U1-Un, communication channels C1-Cn, and
switching network 11 may be implemented with a wide spectrum
of conventional communications networks, such as an
ordinary telephone system, an advanced digital data network, a
packet switched network, or a cell switched (e.g., ATM)
network. Communication units U1-Un may include analog
and/or digital, e.g., ISDN, land-based telephone sets, mobile
radio transceivers, or the like, while communication
channels C1-Cn may include wire pairs, ISDN lines, microwave
radio channels, optical fibers, coaxial cables, satellites, etc.
`
Transcript server 12 may be implemented as a
computer-based system capable of providing management, storage and
control functions for itself and transcription system 13.
Electronic voice recognition devices having speech-to-text
capabilities are well known. Such systems may be used, in
accordance with the methods of the present invention, to
implement transcription system 13.
Numerous modifications and variations of communication
system 10 are contemplated and will be apparent to
those skilled in the art in view of the present description. For
example, any number of transcript servers 12 and/or
transcription systems 13 may be used in the network and there
need not be a one to one match between the number of
transcript servers 12 and transcription systems 13. In
addition, the transcription system 13 may be located at a site
that is physically remote from the transcript server 12 with
a single remotely located transcription system 13 servicing
one or more transcript servers 12.
FIG. 5 is a diagram of a telephone conferencing system
500 with automated speech transcription capabilities
implemented in accordance with another embodiment of the
present invention. Elements of the FIG. 5 embodiment
which are the same as, or similar to, those of the earlier
described embodiments are identified using the same
reference numbers used in preceding figures.
`As illustrated, the telephone conferencing system 500
`includes a plurality of telephone devices T1, T2, Tn, a
`switching network 11, a computer 18, a fax machine 14,
`transcript server 5