`
`IEEE TRANSACTIONS ON COMPUTERS, VOL. C-25, NO. 4, APRIL 1976
`
`System Organizations for Speech Understanding:
`Implications of Network and Multiprocessor Computer
Architectures for AI
`
`LEE D. ERMAN, RICHARD D. FENNELL, VICTOR R. LESSER, AND D. RAJ REDDY, MEMBER, IEEE
`
Abstract: This paper considers various factors affecting
system organization for speech understanding research. The
structure of the Hearsay system, based on a set of cooperating,
independent processes using the hypothesize-and-test paradigm,
is presented. Design considerations for the effective use of
multiprocessor and network architectures in speech understanding
systems are presented: control of processes, interprocess
communication and data sharing, resource allocation, and
debugging are discussed.1
`
Index Terms: Hardware for AI, multiprocessors, networks,
parallel processing, real-time systems, software for AI, speech
recognition, speech understanding, system organization.
`
`INTRODUCTION
SYSTEM organizations for speech understanding
`systems must address many problems: effective use
`of multiple sources of knowledge, anticipation and goal-
`direction in the analysis of the incoming utterance, real-
`time response, continuous monitoring of input de-
`vice(s), errorful nature of the recognition process, expo-
`nential increase of processing requirements with the in-
`crease of desired accuracy, and so on. A particular
`model of speech perception [20] which attempts to solve
`the above problems involves the use of cooperating in-
`dependent processes using a hypothesize-and-test para-
`digm. This paper examines the effect of the problem
`constraints and the model on system organizations, pre-
`sents the structure of a system currently operational on
`a PDP-10 computer and discusses the implications of
`multiprocessor and network architectures.
`
`Manuscript received February 15, 1973. This research was support-
`ed in part by the Advanced Research Projects Agency of the Depart-
`ment of Defense under Contract F44620-70-C-0107 and monitored by
`the Air Force Office of Scientific Research.
`L. D. Erman, V. R. Lesser, and D. R. Reddy are with the Depart-
`ment of Computer Science, Carnegie-Mellon University, Pittsburgh,
`PA 15213.
`R. D. Fennell was with the Department of Computer Science, Car-
`negie-Mellon University, Pittsburgh, PA 15213. He is now with the
`Federal Judicial Center, Washington, DC 20005.
1 Since the writing of this paper (in 1973), we have made advances
in solving many of the problems described here. These solutions are
embodied in the Hearsay II system (see Lesser et al. [12], Fennell [9],
Fennell and Lesser [10], Erman and Lesser [7], and Lesser [13]).
`
`Unlike many other problems in artificial intelligence
(AI), speech understanding systems are characterized
`by the availability of diverse sources of knowledge, e.g.,
`acoustic-phonetic rules, phonological rules, articulatory
`models of speech production, vocabulary and syntactic
`constraints, semantics of the task domain, user models,
`and so on. A major problem, then, is to develop para-
`digms which can make use of all the available sources of
`knowledge in the problem solution. At the same time,
`absence of one or more sources of knowledge should not
`cripple the system. Suppose each source of knowledge is
`represented within the system as a process. In order to
`remove or add sources of knowledge, each process must
`be independent, i.e., it must not require the presence of
`other processes in the system. But at the same time
`each process must cooperate with the other processes,
`i.e., it must be able to effectively use the information
`gathered by them about the incoming utterance. Thus,
`a major design step is to establish what information is to
`be shared among processes and how this information is
`to be communicated so as to maintain the independence
`of individual processes while still allowing for necessary
`process cooperation.
`Knowledge available in the acoustic signal represents
`only one part of the total knowledge that is brought to
`bear in understanding a conversation. A good example
`of this is when one is interrupted by an appropriate re-
`sponse from the listener to a question that is as yet in-
`complete. In general, a human listener can tolerate a
`great deal of sloppiness and variability in speech be-
`cause his knowledge base permits him to eliminate most
of the possibilities even as he hears the first few words
of the utterance (if not before!). We feel that this notion
of anticipation, prediction, and hypothesis generation is
essential for machine perception systems as well. In
`general, we expect every source of knowledge to be able
`to generate hypotheses in a given context, or verify hy-
`potheses generated by others using different represen-
`tations of knowledge, if necessary. The implication is
`that knowledge processes be organized within the sys-
`tem so as to reduce the problem of recognition and un-
`derstanding to one of prediction and verification.
`
`Authorized licensed use limited to: Callie Pendergrass. Downloaded on May 03,2022 at 17:10:14 UTC from IEEE Xplore. Restrictions apply.
`
`Amazon / Zentian Limited
`Exhibit 1025
`Page 1
`
`
`
`ERMAN et al.: SYSTEM ORGANIZATIONS FOR SPEECH UNDERSTANDING
`
`In tasks such as chess and theorem proving, the
`human has sufficient trouble himself so as to make rea-
`sonably crude computer programs of interest. But, be-
`cause humans seem to perform effortlessly (and with
`only modest error) in speech (and visual) perception
`tasks, similar performance is expected from machines,
`i.e., one expects an immediate response and will not tol-
`erate any errors. To equal human performance, a speech
`understanding system must be able to understand trivi-
`al questions as soon as they are uttered. This implies
`that various processes within the system should be al-
`lowed to operate as soon as there is sufficient incoming
`data, without waiting for the completion of the whole
`utterance. If the processes within the system are inde-
`pendent and unaware of the existence of each other,
`then the system must provide facilities for activation,
`termination, and resource allocation for each of the pro-
`cesses. Further, if a process can be deactivated before it
`reaches a natural termination point, provision must be
`made to preserve the state of the process until it is reac-
`tivated. Also, it is necessary to provide interlocks on the
`data that are shared among many processes.
`This has several implications for system organization.
`The system must monitor the input device continuously
`to determine whether speech is present; this requires
`nontrivial processing. If the system is unable to process
`the incoming data, automatic buffering must be provid-
`ed. If the system is to run on a time-sharing system,
`provision must be made to ensure that no data are lost
`because the program is swapped out for a period of
`time. If the speech understanding system is to consist of
`a set of cooperating independent processes, it is further
necessary that they be able to be interrupted at
unpreprogrammed points; if the microphone monitoring
`program is not activated in time to process the incoming
`utterance, it could lead to irrevocable loss of data.
`These considerations lead to two additional require-
`ments that are not commonly available on existing
`time-sharing systems, viz., process-generated interrupts
`of other processes and user servicing of interrupts.
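
The buffering requirement above can be illustrated with a minimal sketch. It assumes a hypothetical interrupt-time callback (`on_sample`) and a hypothetical buffer capacity; neither is taken from the Hearsay implementation. The point is only that samples arriving while the recognizer is swapped out are either held in the buffer or counted as lost.

```python
from collections import deque

class InputBuffer:
    """Bounded buffer between a device interrupt handler and a recognizer."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # hypothetical size, in samples
        self.samples = deque()
        self.lost = 0                 # samples dropped because the buffer was full

    def on_sample(self, sample: int) -> None:
        # Called at interrupt time, even while the recognizer is swapped out.
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            self.lost += 1

    def drain(self) -> list:
        # Called by the recognizer when it next gets to run.
        out = list(self.samples)
        self.samples.clear()
        return out

buf = InputBuffer(capacity=4)
for s in range(6):                    # six samples arrive before the recognizer runs
    buf.on_sample(s)
received = buf.drain()
```

With a capacity of four, the last two samples are lost, which is the irrevocable loss the text warns about when the monitoring program is not activated in time.
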
`One of the characteristics of speech understanding
`systems is the presence of error at every level of analy-
`sis. To control such errors and permit recycling with im-
`proved definitions of the situation, one uses techniques
such as feedforward, feedback, and probabilistic
backtracking. If such facilities do not exist within the
system, they must be programmed explicitly.
`Speech, by its nature, appears to be computer inten-
`sive. A substantially unrestricted system capable of reli-
`ably understanding connected speech of many speakers
using a large vocabulary is likely to require systems of
the order of a proposed AI machine [3], i.e., processing
power of 10 to 100 million instructions per second and
memory of 100 to 1000 million bits.2 To obtain such
processing power, it appears necessary to consider
multiprocessor architectures. Decomposition of speech
processing systems to effectively use distributed processing
`power requires careful consideration even with primi-
`tive systems. Our model of cooperating independent
`processes, each representing a source of knowledge,
`leads to a natural decomposition of the algorithms for
`such machine architectures.
`
`THE CURRENT HEARSAY SYSTEM
`In this section we briefly describe the Hearsay speech
`understanding system as it now exists at Carnegie-Mel-
`lon University. (More detailed descriptions of the sys-
`tem are given in [20], [21], [6], [16].) We shall stress
`those aspects of its organization which are responsive to
`the constraints and model outlined above. This system
`represents a first attempt to solve those problems; thus,
`some of the constraints are only partially or poorly met,
`while others are satisfied in a more constricted way than
`necessary. We shall point out these limitations as they
are described; later sections on closely coupled and
`loosely coupled processor network architectures de-
`scribe possible corrections and improvements of the sys-
`tem.
`The Hearsay system is implemented as a small num-
`ber of parallel coroutines (see Fig. 1). Each coroutine
`(module) is realized as a separate job in the PDP-10
`time-sharing system; thus the time-sharing monitor is
`the primary scheduler for the modules. In general, the
`modules may achieve a high degree of (pseudo) parallel
`activity (through the use of shared memory and a flexi-
`ble interprocess message system3), but, in practice, we
`limit the parallelism to a very modest amount. This lim-
`itation is imposed for two reasons: first, since the PDP-
`10 is a uniprocessor system, there is nothing to be
`gained (in the time domain) by increasing the parallel-
`ism; and, second, the greater the amount of parallelism,
`the more difficult it is to control and debug the pro-
`grams within a time-sharing system that is not designed
`for cooperating processes (jobs).
`The model of recognition specifies that there be sepa-
`rate processes, each representing a different domain of
`knowledge. We have chosen three major domains of
`knowledge: acoustic-phonetics, syntax, and semantics.
`1) The acoustic-phonetic domain, which we refer to
`as just acoustics, deals with the sounds of the language
`and how they relate to the speech signal produced by
`the speaker. This domain of knowledge has traditionally
`been the only one used in most previous attempts at
`speech recognition.
`2) The syntax domain deals with the ordering of
`words in the utterance according to the grammar of the
`input language.
`3) The semantic domain considers the meaning of the
`
`2 Smaller and substantially cheaper systems can be built to perform
`useful but restricted speech understanding tasks.
`
`3 The facilities provided for inter-job control and communication
`are similar to those developed for the Stanford Hand-Eye system [8].
`
Fig. 1. Decomposition of processes in the current Hearsay system.
[Figure not reproduced. Labeled boxes: Speech Inputter, Recognition
Overlord, Task Module, Speech Outputter, Acoustics, Syntax, Semantics;
linked by message communication and shared memory.]
`
utterances of the language, in the context of the task
`that is specified for the speech understanding system.
`These processes, according to the model, are to be in-
`dependent and removable; therefore the functioning
`(and very existence) of each must not be necessary or
`crucial to the others. On the other hand, the model also
`requires that the processes cooperate and that the rec-
`ognition should run efficiently and with good error re-
`covery; these dictates imply that there be a great deal of
`interaction among the processes. Thus we seem to have
`opposing requirements for the system. These opposing
`requirements led to the design of the following struc-
`ture:
`Each process interfaces externally in a uniform way
`that is identical across processes; no process knows
`what or how many other recognition processes exist.
`A mediator, ROVER (Recognition OvERlord), handles
`the interface to each of the processes and thus serves
`as the linkage connecting the processes; the processes
`are called ROVER's "sons."
`The interface is implemented as a global data struc-
`ture which is maintained by ROVER. Each of ROVER's
`sons puts information into this data structure in a uni-
`form way. Each may access information submitted by
`its brothers, but in a manner which leaves the source of
`that information anonymous. This mechanism is analo-
`gous to a bulletin board on which messages can be left
`by several people and for which there is a monitor who
accepts the messages and arranges them in appropriate
`places on the board for others to react.
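
The bulletin-board analogy can be sketched as follows. This is an illustration only: the class name, its methods, and the decision to record the source privately are assumptions, not a description of ROVER's actual data structure. What it shows is the anonymity property: readers see the messages but not who posted them.

```python
from dataclasses import dataclass, field

@dataclass
class BulletinBoard:
    """Hypothetical stand-in for the global data structure kept by ROVER."""
    _entries: list = field(default_factory=list)

    def post(self, source: str, message: str) -> None:
        # Assumption: the monitor may note the source for its own bookkeeping.
        self._entries.append((source, message))

    def read(self) -> list:
        # Readers receive the messages only; the source stays anonymous.
        return [message for _source, message in self._entries]

board = BulletinBoard()
board.post("acoustics", "word 'rook' fits frames 12-30")
board.post("syntax", "a noun is expected after 'the'")
messages = board.read()
```
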
`This anonymous interface structure is appropriate
`only if the global data structure can be designed in such
`a way as to allow the processes to communicate mean-
`ingfully, i.e., there must be a common language which
`allows them to transmit the kind of information they
`need to help each other to work on the problem. We re-
`solve this problem by using the word as the basic unit of
`discourse among the processes.
`The basic element of the global data structure is the
`word hypothesis which represents an assertion that a
`particular word (of the input language lexicon) occurs in
`a specified position in the spoken input. A sentence hy-
`pothesis is an ordered linear sequence of word hypothe-
`ses; it represents an assertion that the words occur in
`the sentence in the order that the word hypotheses ap-
`pear in the sentence hypothesis. In addition, the unique
`"word" FILLER may appear as a word hypothesis; this is
a placeholder and represents the assertion that zero or
`more as yet unspecified words occur in this position in
`the spoken sentence. In general, there may be any num-
`ber of sentence hypotheses existing at any one time.
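
One way to picture these definitions is the sketch below. The field names and the representation of a position as a pair of time indices are assumptions made for illustration; the text specifies only that a word hypothesis names a word and a position, and that FILLER stands for zero or more unspecified words.

```python
from dataclasses import dataclass
from typing import List, Optional

FILLER = "FILLER"  # placeholder for zero or more as yet unspecified words

@dataclass
class WordHypothesis:
    word: str                      # a lexicon word, or FILLER
    start: Optional[int] = None    # assumed: position given as time indices
    end: Optional[int] = None

@dataclass
class SentenceHypothesis:
    words: List[WordHypothesis]    # ordered as asserted in the spoken sentence

    def fillers(self) -> List[int]:
        """Indices of the slots that still hold FILLER."""
        return [i for i, w in enumerate(self.words) if w.word == FILLER]

    def is_complete(self) -> bool:
        return not self.fillers()

# Before any recognition, the whole utterance is one FILLER.
initial = SentenceHypothesis([WordHypothesis(FILLER)])
# Later, two words are asserted with an unresolved gap between them.
partial = SentenceHypothesis([
    WordHypothesis("pawn", 0, 20),
    WordHypothesis(FILLER),
    WordHypothesis("four", 45, 60),
])
```
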
`The interactions among the source-of-knowledge pro-
`cesses are carried out using the hypothesize-and-test
`paradigm prescribed by the model. In general, any pro-
`cess may make a set of hypotheses about the utterance;
`all the processes (including the hypothesizer) may then
verify (i.e., reject, accept, or reorder) these hypotheses.
`In particular, hypothesization occurs when a recognition
`process (acoustics, syntax, or semantics) chooses a FILL-
`ER word from a sentence hypothesis and associates with
`it one or more option words, each of which it asserts is a
`candidate to replace all or part of the FILLER. Verifica-
`tion consists of each process examining the option
`words and rating them in the context of the rest of the
`sentence hypothesis.
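
The hypothesize-and-test cycle for a single FILLER slot might be sketched as below. The three toy verifiers, their rating scales, and the rule of summing the ratings are all assumptions chosen for illustration; the sketch shows only the shape of the cycle, in which one process proposes option words and every process rates them in context.

```python
from typing import Callable, Dict, List, Tuple

# A verifier rates one option word in the context of the sentence hypothesis.
Verifier = Callable[[List[str], int, str], float]

def hypothesize_and_test(sentence: List[str], filler_index: int,
                         options: List[str],
                         verifiers: Dict[str, Verifier]) -> List[Tuple[str, float]]:
    """Rate each option word for one FILLER slot and return them best first."""
    rated = []
    for word in options:
        # Assumed combination rule: add up the ratings of all verifiers.
        score = sum(v(sentence, filler_index, word) for v in verifiers.values())
        rated.append((word, score))
    return sorted(rated, key=lambda pair: pair[1], reverse=True)

# Toy verifiers with invented ratings.
def acoustics(sentence, i, word):
    return {"bishop": 0.6, "pawn": 0.7, "takes": 0.1}.get(word, 0.0)

def syntax(sentence, i, word):
    nouns = {"bishop", "pawn"}     # after "the", prefer a noun
    return 0.5 if i > 0 and sentence[i - 1] == "the" and word in nouns else 0.0

def semantics(sentence, i, word):
    return 0.3 if word == "bishop" else 0.0

ranking = hypothesize_and_test(
    ["the", "FILLER"], 1, ["pawn", "takes", "bishop"],
    {"acoustics": acoustics, "syntax": syntax, "semantics": semantics},
)
```

Removing one verifier from the dictionary changes the ratings but not the procedure, which is the independence property the model asks for.
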
`Several restrictions have been placed on the imple-
`mentation of this general scheme. First, at any time
`only one part of the shared, global data structure (i.e.,
`one sentence hypothesis) is accessible to the processes
`for hypothesization and verification. Second, the pro-
`cesses go through the hypothesization and verification
`stages (and several other subsidiary stages) in a syn-
`chronized and noninterruptable manner. Finally, only
`one process is allowed to hypothesize at any one time.
`Again, these restrictions were imposed both because
`parallelism on a uniprocessor does not accomplish any
`throughput increase and because the available program-
`ming and operating systems make a more general imple-
`mentation difficult to specify, debug, and instrument.
`
`These restrictions are mitigated somewhat by carefully
`adjusting the time grain of the processing so that each
`noninterruptable phase is not "excessively large."
`Each sentence hypothesis has a confidence rating as-
`sociated with it which is an estimate of how well it de-
`scribes the spoken utterance. This rating is calculated
`by ROVER, based on information supplied by the recog-
`nition processes. Errors in processing become evident
`when the overall rating given to a sentence hypothesis
`begins to drop; at that point, attention is focused on
some other sentence hypothesis with a higher rating.
`This switching of focus is the mechanism that provides
`the error recovery and backtracking that is necessary in
`any speech understanding system.
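
The switching of focus can be illustrated in a few lines. The rating values and the names "A" and "B" are invented; how ROVER actually computes ratings is not shown here. The sketch only demonstrates that selecting the highest-rated sentence hypothesis at each step yields backtracking as a side effect.

```python
def focus(ratings: dict) -> str:
    """Return the name of the sentence hypothesis with the highest rating."""
    return max(ratings, key=ratings.get)

ratings = {"A": 0.80, "B": 0.65}   # invented confidence ratings
first = focus(ratings)             # attention starts on A
ratings["A"] = 0.40                # later verification lowers A's rating
second = focus(ratings)            # attention moves to B
```
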
`
`CLOSELY COUPLED PROCESSOR SYSTEM
`ORGANIZATIONS
`As discussed in the Introduction, in order to do real-
`time speech understanding a substantial amount of
`computing power is required. Recent trends in technol-
`ogy indicate that this computing power can be economi-
`cally obtained through a closely coupled network of
`"simple" processors, where these processors can be in-
`terconnected to communicate in a variety of ways (e.g.,
`directly with each other through a highly multiplexed
`switch connected to a large shared memory [2] or
through a regular or irregular network of busses [4]).
However, the major problem with this network approach
to generating computing power is finding algorithms
which have the appropriate control and data
`structures for exploiting the parallelism available in the
`network. The model for a speech understanding system
`as previously discussed, which is decomposed into a set
`of independent processes cooperating through a hypoth-
`esize-and-test paradigm, represents a natural structure
`for exploiting this network parallelism.
`There exist three major areas for exploitation of par-
`allelism in the structure of this speech understanding
`system: preprocessing, hypothesization and verification,
`and the processing specific to each source of knowledge.
`The preprocessing task involves the repetition of a se-
`quence of simple transformations on the acoustic data,
`e.g., detection of the beginning and end of speech, am-
`plitude normalization, a simple phoneme-like labeling,
`smoothing, etc. This sequence of transformations can be
`structured as a pipeline computation in which each
`transformation is a stage in the pipe. Thus, through this
`pipeline decomposition of the preprocessing task, a lim-
`ited amount (i.e., 4) of parallel activity is generated.
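
A four-stage pipeline of this kind can be sketched with one thread per stage and a queue between neighbors. The four transformation functions are trivial placeholders, not the actual signal processing, and Python threads here illustrate the structure rather than true multiprocessor execution.

```python
import queue
import threading

def stage(fn, inbox, outbox):
    """Run one pipeline stage: apply fn to each item until the None sentinel."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)       # pass the end-of-input marker downstream
            return
        outbox.put(fn(item))

# Placeholder transformations standing in for the real preprocessing steps.
def detect_speech(frame):
    return frame

def normalize(frame):
    peak = max(abs(x) for x in frame) or 1
    return [x / peak for x in frame]

def label(frame):
    return ("loud" if max(frame) > 0.5 else "quiet", frame)

def smooth(labeled):
    return labeled[0]

fns = [detect_speech, normalize, label, smooth]
queues = [queue.Queue() for _ in range(len(fns) + 1)]
threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
           for i, fn in enumerate(fns)]
for t in threads:
    t.start()
for frame in ([1, -2, 4], [0, 0, 0]):
    queues[0].put(frame)
queues[0].put(None)
for t in threads:
    t.join()

results = []
while (item := queues[-1].get()) is not None:
    results.append(item)
```

While one frame is being labeled, the next can already be normalized, so at most four frames are in flight at once.
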
`The hypothesize-and-test paradigm for sequencing
`the activity of the different sources of knowledge can
`also be structured so as to exhibit parallelism, but the
`amount of parallelism is potentially much greater. This
`parallel activity is generated by the simultaneous pro-
`cessing of multiple sentence hypotheses and the simul-
`taneous hypothesization and verification by all sources
`of knowledge. The simultaneous processing of multiple
`sentence hypotheses, rather than processing just the
`currently most likely candidate, can conceptually intro-
`duce unnecessary work. But in practice, because of the
`errorful nature of the processing, there may be a consid-
`erable amount of necessary backtracking to find the
`best matching sentence hypothesis. It is appropriate to
quote a conjecture of Minsky and Papert [15, sect.
12.7.6] on this point.
"[While for the exact match problem] relatively small
factors of redundancy in memory size yield very large
increases in speed, ... [for the best match problem] ...
for large data sets with long word lengths there
are no practical alternatives to large searches that
inspect large parts of the memory."
`Thus, the parallel activity generated by simultaneous
`processing of more than one sentence hypothesis can re-
`sult in a proportional speed-up of the recognition pro-
`cess.4 Correspondingly, simultaneous hypothesization
`and verification by all sources of knowledge also results
`in a proportional speed-up of the recognition process
`because each source of knowledge is independent and is
`designed so that its knowledge contribution is additive.
`Finally, the verification algorithm of each source of
`knowledge can be decomposed into a set of parallel pro-
`cesses in two ways. The first kind of decomposition is
`based on the fact that verifications are performed on a
`set of option words rather than a single word at a time.
`Thus, for each source of knowledge there can be multi-
`ple instantiations of its verification process, each oper-
`ating on a different option word. The second kind of de-
`composition involves the parallelizing of the verification
`algorithms themselves; thus, each instantiation of a ver-
`ification process may itself be composed of a set of par-
`allel processes. However, this set of instantiations may
`not be totally independent because the rating produced
`by the verification process may be dependent on the
`particular set of option words to be verified and also on
`the local data base which is common to all the instantia-
`tions. For example, the acoustic verification process is a
`hierarchical series of progressively more sophisticated
`tests. The first few levels of testing look only at the con-
`text of a single option word, while the more sophisticat-
`ed tests compare one option word against another.
`Thus, only at the first few levels of tests can the acous-
`tic verification algorithm be parallelized in a straight-
`forward manner.
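
The distinction between the two levels of acoustic tests can be sketched as follows. Both scoring functions are invented stand-ins. The early test depends on a single option word and so can run as independent instances, while the later test needs the whole set of results and must wait for them.

```python
from concurrent.futures import ThreadPoolExecutor

def context_free_test(word: str) -> float:
    """Early-level test: looks at a single option word only (toy score)."""
    return len(word) / 10.0

def comparative_test(scores: dict) -> dict:
    """Later-level test: compares option words against one another (toy rule)."""
    best = max(scores.values())
    return {word: score / best for word, score in scores.items()}

options = ["rook", "bishop", "pawn"]
with ThreadPoolExecutor() as pool:
    # One verification instance per option word; these do not interact.
    early = dict(zip(options, pool.map(context_free_test, options)))
final = comparative_test(early)    # cannot start until every early result exists
```
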
The parallelism generated by parallelizing the
hypothesize-and-test control structure and that generated
by parallelizing the verification processes are
multiplicative in their parallel activity (i.e., performing
in parallel the updating of n sentence hypotheses, where
each hypothesis invokes m verification processes and each
verification process operates on o option words, leads to
a potential parallelism of n*m*o). This parallelism,
together with the pipeline
`
`4 Simulation studies are currently being carried out on evaluating
`this speed-up factor. These studies are based on data generated from
`the current version of the Hearsay system.
`
`parallelism of the preprocessing, leads to what appears
`to be a large amount of potential parallelism to be ex-
`ploited by a closely coupled network. However, it is still
`not clear just how much potential parallel activity exists
`over the entire recognition system; nor is it known how
`much of this potential will be dissipated because of soft-
`ware and hardware overhead.
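
As a worked instance of the product n*m*o, the values below are chosen for illustration only: three sentence hypotheses, three verification processes (one per source of knowledge), five option words, and the four pipeline stages mentioned earlier. Treating the pipeline stages as adding to the total is an assumption, and the result is an upper bound on potential parallelism, not a measured figure.

```latex
% Illustrative values only: n, m, o, and the stage count p are assumptions.
\[
  P_{\text{recognition}} = n \cdot m \cdot o, \qquad
  P_{\text{total}} \le n \cdot m \cdot o + p
\]
\[
  n = 3,\; m = 3,\; o = 5,\; p = 4
  \;\Rightarrow\;
  P_{\text{recognition}} = 45, \qquad P_{\text{total}} \le 49
\]
```
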
`In order to answer these questions, a parallel decom-
`position of the Hearsay speech understanding system is
`now being implemented on C.mmp, a closely coupled
`network of PDP-11's which communicate through a
`large shared memory [2]. The C.mmp hardware configu-
`ration can contain up to 16 PDP-11's; the highly multi-
`plexed switch that connects processors to memory per-
`mits up to 16 simultaneous memory references if these
`references are not to the same memory module. Thus, if
`processors are referencing different memory modules,
`then each processor can run at full speed. In addition,
`C.mmp can be configured for a specific application (e.g.,
`speech) by replacing a processor by a special purpose
`hardware device which directly accesses memory (e.g., a
`signal processor).
The Hydra software operating system [24], which is
`associated with C.mmp, provides an appropriate kernel
`set of facilities for implementing the parallel version of
`the speech system. These facilities permit control of
`real-time devices, convenient building of a tree of pro-
`cesses, message queues, and shared data base communi-
`cation among processes, user-defined scheduling strate-
`gies, arbitrary interruption of running processes, and
`dynamic creation of new processes. Building up from
`this base, a debugging system will be constructed which,
`in addition to the normal features, will permit the re-
`cording of all communication among processes, the trac-
`ing of all process activity, and the monitoring of global
`variables (including a recording of which processes have
`modified them). These additional capabilities are cru-
`cial for isolating errors and understanding the dynamic
`behavior patterns of the parallel system.
`The major software problem to be investigated in this
`parallel implementation of the Hearsay system is how to
`efficiently map virtual parallelism (process activity)
`into actual parallelism (processor activity). This map-
`ping problem in turn centers on three design issues,
`each of which relates to how processes interact:
`1) the design of the interlock structure for a shared
`data base;
`2) the choice of the smallest computational grain at
`which the system exhibits parallel activity; and
`3) the techniques for scheduling a large number of
closely coupled processes.
`The first design issue is important because in a close-
`ly coupled process structure many processes may at-
`tempt to access a shared data base at the same time. In
`a uniprocessor system, the sequentialization of access to
`this shared data base does not significantly affect per-
`formance because there is only one process running at a
`time. In a multiprocessor system, however, if the inter-
`lock structure for a shared data base is not properly de-
`signed so as to permit as many noninterfering accesses
`as possible, then access to the shared data base becomes
`a significant bottleneck in the system's performance
`[14].
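
One possible interlock structure, sketched under the assumption that the unit of locking is the individual sentence hypothesis, is shown below. The class and its layout are hypothetical. Writers to different hypotheses never wait for each other; only writers to the same hypothesis are serialized.

```python
import threading

class SharedHypotheses:
    """Toy shared data base with one lock per sentence hypothesis."""

    def __init__(self, names):
        self.ratings = {name: 0.0 for name in names}
        self._locks = {name: threading.Lock() for name in names}

    def add_rating(self, name: str, delta: float) -> None:
        # Only writers to the same hypothesis contend for this lock.
        with self._locks[name]:
            self.ratings[name] += delta

db = SharedHypotheses(["A", "B"])
workers = [threading.Thread(target=db.add_rating, args=(name, 1.0))
           for name in ["A", "B"] * 50]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

A single lock over the whole data base would give the same final ratings but would force all one hundred updates into one sequence.
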
`The second issue relates to how closely coupled pro-
`cesses can interact. If the grain of decomposition is such
`that the overhead involved in process communication is
`significant in relation to the amount of computation
`done by the process, then the added virtual parallelism
`achieved by a finer decomposition can decrease, rather
`than increase, the performance of the system. Thus, un-
`derstanding the relationship between the grain of de-
`composition and the overhead of communication is an
`important design parameter.
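
The trade-off between grain and overhead can be made concrete with a toy cost model. The model is an assumption made for illustration: a fixed amount of work is split into k equal processes, each process pays a fixed communication overhead, and the processes run in batches on the available processors. The numbers are invented.

```python
import math

def elapsed(work: float, k: int, processors: int, overhead: float) -> float:
    """Toy estimate of elapsed time for k equal processes with fixed overhead each."""
    rounds = math.ceil(k / processors)      # batches needed to run all k processes
    return rounds * (work / k + overhead)

times = {k: elapsed(100.0, k, processors=4, overhead=5.0) for k in (1, 4, 16, 64)}
```

Under these assumptions the estimate falls from 105 to 30 time units when the work is split four ways, then rises again to 45 and 105 as the decomposition becomes finer than the hardware can exploit.
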
`The third issue relates to a phenomenon called the
`"control working set" [11]. This phenomenon predicts
`that the execution of a closely coupled process structure
`on a multiprocessor may result in a significant amount
`of supervisory overhead caused by a large number of
`process context switches. The reason for this high num-
`ber of process context switches is analogous to the rea-
`son for "thrashing" within a data working set [5]. For
`example, in a uniprocessor system, if two parallel pro-
`cesses closely interact with each other, then each time
`one process is waiting for a communication from the
`other it would have to be context switched so as to allow
`the other process to execute. If these two processes com-
`municate often, then there would be a large number of
`context switches. However, if there were two processors,
`each containing one of the processes, then there would
`be no process switching.
`The implications of this phenomenon on constructing
`process structures are the following:
`1) Processes should be formed into clusters where
`communication among cluster members is closely cou-
`pled whereas communication among clusters is loosely
`coupled. This process structuring paradigm has also
`been suggested as a model for the operation of complex
`human and natural systems [22].
`2) The size of a process cluster cannot be chosen inde-
`pendent of the particular hardware configuration that
`will be used to execute it. For example, a cluster size of
`8 may be appropriate for a hardware system containing
`16 processors while being inappropriate for a system
`containing 6 processors.
`3) The scheduler of a multiprocessor system should
`use a strategy that schedules process clusters rather
`than single processes. (This is analogous to the advan-
`tage of preloading the data working set rather than dy-
`namically constructing the working set at each context
`swap.)
`4) The use of process structures to implement inher-
`ently sequential, though complex, control structures
`(e.g., coroutines, etc.) may lead to inefficient scheduling
`of process structures on a multiprocessor system (i.e.,
`the scheduling strategy should be able to easily differ-
`
`entiate those processes that can go on in parallel from
`those that are sequentialized).
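
Scheduling by cluster rather than by single process, as in implications 2) and 3), might look like the greedy sketch below. The cluster names and sizes are invented, and the packing rule is a simple first-fit heuristic rather than a strategy described in the text. A cluster is always placed whole, and a cluster larger than the machine is rejected.

```python
def schedule_clusters(clusters: dict, processors: int) -> list:
    """Pack whole clusters into time slots of at most `processors` processes."""
    slots = []                     # each slot is [cluster names, processors used]
    for name, size in sorted(clusters.items(), key=lambda kv: -kv[1]):
        if size > processors:
            raise ValueError(
                f"cluster {name} needs {size} processors, only {processors} exist")
        for slot in slots:
            if slot[1] + size <= processors:
                slot[0].append(name)
                slot[1] += size
                break
        else:
            slots.append([[name], size])   # no existing slot had room
    return [names for names, _used in slots]

# Invented cluster sizes for a 16-processor configuration.
plan = schedule_clusters(
    {"acoustics": 8, "syntax": 4, "semantics": 4, "input": 2}, 16)
```

Calling the same function with 6 processors raises an error for the cluster of size 8, which mirrors the observation that cluster size cannot be chosen independently of the hardware configuration.
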
`
`NETWORK ORGANIZATIONS
`The multiprocessor type organization described ear-
`lier implies a closely coupled set of processes on a set of
`closely coupled processors cooperating to accomplish
`the common goal of utterance recognition. The key idea
`in such a system is that both the processes and proces-
`sors are closely coupled-that is, the cost of communi-
`cation between processes or processors is relatively
`cheap with respect to the amount of computation to be
`done by any individual process. Indeed, in the multipro-
`cess system described earlier, much interprocess com-
`munication and data sharing may be achieved by actu-
`ally having shared physical address spaces. However,
`such a system usually also implies a certain homogene-
`ity or physical proximity of the processors and memory.
`Consider now the task of integrating the knowledge of
`many different research groups in various widespread
`geographical locations, each with its own computing fa-
`cilities and each with its own areas of specialization. In
`an attempt to avoid unnecessary duplications of effort,
`one would desire a scheme whereby each group could
`develop pieces of a total recognition system (which piec-
`es might represent new sources of knowledge, such as a
`new and improved vowel classification algorithm) using
`local computing resources (i.e., using an arbitrary ma-
`chine configuration and program structure). Those piec-
`es of the system would then be incorporated into a dis-
`tributed "total recognition system" by appropriate
`(hopefully minimal) linkage and protocol conventions
`and their contributions to the entire system evaluated.
`The geographical constraints suggest the use of a com-
`puter network facility as a means by which one might
`assemble this total recognition system. We are currently
`undertaking the task of designing and implementing
`such a system for use on the Advanced Research
`Projects Agency (ARPA) network of computing facili-
`ties [19]. The usefulness of such a network organization
`for a speech understanding system lies in its potential
`ability to combine and evaluate the various algorithms
`and sources of knowledge of a wide variety of research
`groups. In particular, the objective of the network orga-
`nization is to create a research tool rather than to pro-
`duce a highly efficient recognition system.
`As an example, suppose a group wishes to add a new
`source of knowledge (a new vowel classification algo-
`rithm, for instance) to the network system. This knowl-
`edge source is provided in the form of a process (or a set
`of processes) running on a local computer connected to
`the ARPA network. System integration is then achieved
`by adding linking instructions to the process (perhaps
`interactively) for notifying a centralized controlling pro-
`cess of the set of preconditions (e.g., conditions relating
`to the incoming speech wave or the current state of the
`recognition) that must be met in order to activate this
process [1], as well as the required inputs and created
outputs (and their formats). The central controller is
`then responsible for activating the new knowledge
`source at appropriate times, supplying the requested in-
`puts, and updating a global data base to reflect the re-
`sults of the activated process. Knowledge source pro-
`cesses may communicate with one another via a mes-
`sage service facility provided by the central controller.
`The marked increase of indirection with respect to com-
`munication and data sharing as compared with a closely
`coupled multiprocessor approach is a result of the goal
`to serve a wide geographic region of users and to allow
`cooperation between essentially autonomous knowledge
`sources.
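
The registration and activation scheme described above can be sketched as follows. Every name here (`CentralController`, `register`, `step`, the vowel classifier and its threshold) is hypothetical, and the knowledge source runs as a local function where a network system would send a message to a remote host. The sketch shows a source declaring its precondition and inputs, and the controller activating it and recording its outputs in the global data base.

```python
from typing import Callable, Dict, List

class CentralController:
    """Hypothetical controller that activates sources whose preconditions hold."""

    def __init__(self):
        self.state: Dict[str, object] = {}     # stands in for the global data base
        self._sources: List[tuple] = []

    def register(self, name: str, precondition: Callable[[dict], bool],
                 inputs: List[str], run: Callable[..., dict]) -> None:
        self._sources.append((name, precondition, inputs, run))

    def step(self) -> List[str]:
        """Activate every source whose precondition is met; return their names."""
        activated = []
        for name, precondition, inputs, run in self._sources:
            if precondition(self.state):
                outputs = run(*(self.state[key] for key in inputs))
                self.state.update(outputs)     # make results visible to others
                activated.append(name)
        return activated

controller = CentralController()
controller.register(
    "vowel_classifier",
    precondition=lambda s: "segment" in s and "vowel" not in s,
    inputs=["segment"],
    run=lambda segment: {"vowel": "a" if segment > 0.5 else "i"},  # toy rule
)
idle = controller.step()               # nothing to do: no segment yet
controller.state["segment"] = 0.9
active = controller.step()             # precondition now holds
```
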
The problems that occur in this network concept are
different in nature from those occurring in the
multiprocessor structure described previously. The
`many sources of knowledge are no longer necessarily
`closely coupled. In fact, we might term such a network
`organization to be "loosely coupled" in the sense that
`process communication and data base sharing must be
`achieved by some form of message switching scheme
`since the system is now operating on an indefinite num-
`ber of (n