Nilsson 1980].
It is particularly helpful to have a denotation link to keep perceptual structures
separate from model structures. Then if mistakes are made by the vision automaton,
a correction mechanism can either sever the denotation link completely or
create a new denotation link between the correct model and image structures.
When dealing with many spatial relations, it is economical to recognize that
many relations are "inverses" of each other. That is, LEFTOF(x,y) is the "inverse"
of RIGHTOF(x,y):
LEFTOF(x,y) <=> RIGHTOF(y,x)
`
`and also
`
ADJACENT(x,y) <=> ADJACENT(y,x)
Rather than double the number of these kinds of links, one can normalize
them. That is, only one half of the inverse pair is used, and the interpreter infers
the inverse relation when necessary.
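This normalization can be sketched as follows (a minimal illustration, not from the text; only the relation names come from the examples above):

```python
# Sketch of normalized storage of inverse spatial relations.
# Only RIGHTOF is stored; LEFTOF queries are answered by swapping arguments.

INVERSE = {"LEFTOF": "RIGHTOF"}   # each unstored relation maps to its stored inverse
SYMMETRIC = {"ADJACENT"}          # symmetric relations are their own inverse

facts = set()                     # stored (relation, x, y) triples

def assert_rel(rel, x, y):
    """Store a relation in normalized form."""
    if rel in INVERSE:            # e.g. LEFTOF(x, y) is stored as RIGHTOF(y, x)
        facts.add((INVERSE[rel], y, x))
    else:
        facts.add((rel, x, y))

def holds(rel, x, y):
    """Query a relation; the interpreter infers the inverse when necessary."""
    if (rel, x, y) in facts:
        return True
    if rel in INVERSE and (INVERSE[rel], y, x) in facts:
        return True
    if rel in SYMMETRIC and (rel, y, x) in facts:
        return True
    return False

assert_rel("LEFTOF", "table", "chair")    # stored once, as RIGHTOF(chair, table)
assert_rel("ADJACENT", "table", "chair")
```

Only half of each inverse pair occupies storage, yet both directions can be queried.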
Properties have a different semantics depending on the type of object that has
the property. An "abstract" node can have a property that gives one aspect or
refinement of the represented concept. A property of a "concrete" node presumably
means an established and quantified property of the individual.
`Partitions
`Partitions are a powerful notion in networks. "Partition" is not used in the
`sense of a mathematical partition, but in the sense of a barrier. Since the network is
`a graph, it contains no intrinsic method of delimiting subgraphs of nodes and arcs.
`Such subgraphs are useful for two reasons:
`1. Syntactic. It is useful to delimit that part of the network which represents the
`results of specific inferences.
2. Semantic. It is useful to delimit that part of the network which represents
knowledge about specific objects. Partitions may then be used to impose a
hierarchy upon an otherwise "flat" structure of nodes.
`The simple way of representing partitions in a net is to create an additional node to
`represent the partition and introduce additional arcs from that node to every node
or arc in the partition. Partitions allow the nodes and relations in them to be manipulated
as a unit.
`Notationally, it is cleaner to draw a labeled boundary enclosing the relevant
`nodes (or arcs). An example is shown by Fig. 10.12 where we consider two objects
`each made up of several parts with one object entirely left of the other. Rather than
`use a separate LEFTOF relation for each of the parts, a single relation can be used
`between the two partitions. Any pair of parts (one from each object) should inherit
the LEFTOF relation. Partitions may be used to implement quantification in semantic
net representations of predicate calculus [Hendrix 1975, 1979]. They may
`be used to implement frames (Section 10.3.1).
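The inheritance of a relation between partitions by pairs of their parts can be sketched as follows (a hypothetical illustration; the part names are invented):

```python
# Sketch: a partition groups nodes so that a single relation between two
# partitions is inherited by any pair of member parts (cf. Fig. 10.12).

object_a = {"seat", "back", "legs"}          # parts of one object (a partition)
object_b = {"top", "drawer", "pedestal"}     # parts of the other object

# one LEFTOF relation between the two partitions, not one per part pair
partition_rels = {("LEFTOF", frozenset(object_a), frozenset(object_b))}

def leftof(part1, part2):
    """Infer LEFTOF between parts from the relation between partitions."""
    for rel, p, q in partition_rels:
        if rel == "LEFTOF" and part1 in p and part2 in q:
            return True
    return False
```

A single stored relation thus answers queries about any of the nine part pairs.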
`
Sec. 10.2 Semantic Nets
`
`331
`
`IPR2021-00921
`Apple EX1015 Page 342
`
`
`
Fig. 10.12 The use of partitions. (a) Construction of a partition. (b) Two objects
described by partitions.
`
`Conversions
It is important to be able to transform from geometric (and logical) representations
to propositional abstract representations and vice versa. For example, in
Fig. 10.13 the problem is to find the exact location of a telephone on a previously
located desk. In this case, propositional knowledge that telephones are usually on
desktops, together with the desk top location and knowledge about the size of
phones, defines a search area in the image.
`Converting image data about a particular group of objects into relational form
involves the inverse problem. The problem is to perform a level of abstraction to
remove the specificity of the geometric knowledge and derive a relation that is appropriate
in a larger context. For example, the following program fragment creates
the relation ABOVE(A,B), where A and B are world objects.
`
Comment: assume a world coordinate system where Z is the positive vertical.
Find ZAmin for Z in A and ZBmax for Z in B.
If ZAmin > ZBmax, then make ABOVE(A,B) true.
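The fragment might be realized as follows (a sketch, assuming point sets are given as iterables of (x, y, z) world coordinates):

```python
# Sketch of the ABOVE(A, B) test: A is above B when the lowest point of A
# is higher than the highest point of B, in a world frame with +z vertical.

def above(a_points, b_points):
    """a_points, b_points: iterables of (x, y, z) world coordinates."""
    za_min = min(z for (x, y, z) in a_points)   # lowest extent of A
    zb_max = max(z for (x, y, z) in b_points)   # highest extent of B
    return za_min > zb_max
```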
`Many other definitions of ABOVE, one of which compares centers of gravity, are
possible. In most cases, the conversion from continuous geometric relations to
discrete propositional relations involves more or less arbitrary conventions. To appreciate
this further, consult Fig. 10.14 and try to determine in which of the cases
`
`332
`
`Ch. 10 Knowledge Representation and Use
`
`IPR2021-00921
`Apple EX1015 Page 343
`
`
`
`Fig. 10.13 Search area defined by relational bindings.
`
block A is LEFTOF block B. Figure 10.14d shows a case where different answers
are obtained depending on whether a two-dimensional or three-dimensional
interpretation is used. Also, when relations are used to encode what is usually true of
the world, it is often easy to construct a counterexample. Winston [Winston 1975]
used

SUPPORTS(B,A) => ABOVE(A,B)
`
`<3>
`
`&
`
`(d)
`
`(c)
`
`(a)
`
`(b)
`
`Fig. 10.14 Examples to demonstrate difficulties in encoding spatial relation
`LEFTOF (see text).
`
`
`
`
`which is contradicted by Fig. 10.15, given the previous definition of ABOVE.
One common way around these problems is to associate quantitative, "continuous"
information with relations (Section 10.3.2 and later examples).
`
`10.3 SEMANTIC NET EXAMPLES
`
Examples of semantic nets abound throughout Part IV. Two more examples illustrate
the power of the notions. The first example is described very generally, the
`second in detail.
`
`10.3.1 Frame Implementations
`
Frame system theory [Minsky 1975] is a way of explaining our quick access to important
aspects of a (perhaps perceptual) situation. It is a provocative and controversial
idea, and the reader should consult the References for a full treatment.
Implementationally, a frame may be realized by a partition; a frame is a "chunk"
of related structure.
`Associating related "chunks" of knowledge into manipulable units is a
`powerful and widespread idea in artificial intelligence [Hayes 1980; Hendrix 1979]
`as well as psychology. These chunks go by several names: units, frames, parti
`tions, schemata, depictions, scripts, and so forth [Schank and Abelson 1977;
`Moore and Newell 1973; Roberts and Goldstein 1977; Hayes* 1977; Bobrow and
`Winograd 1977, 1979; Stefik 1979; Lehnert and Wilks 1979; Rumelhart et al.
`1972].
Frame systems incorporate a theory of associative recall in which one selects
frames from memory that are relevant to the situation in which one finds oneself.
These frames include several kinds of information. Most important, frames have
slots which contain details of the viewing situation. Frame theory dictates a strictly
specific and prototypical structure for frames. That is, the number and type of slots
for a particular type of frame are immutable and specified in advance. Further,
frames represent specific prototype situations; many slots have default values; this
is where expectations and prior knowledge come from. These default values may
be disconfirmed by perceptual evidence; if they are, the frame can contain information
about what actions to take to fill the slot. Some slots are to be filled in by
investigation. Thus a frame is a set of expectations to be confirmed or disconfirmed
`
`334
`
`Fig. 10.15 A counterexample lo
`SUPPORTS(B, A) => ABOVEU B).
`
`Ch. 10 Knowledge Representation and Use
`
`IPR2021-00921
`Apple EX1015 Page 345
`
`
`
and actions to pursue in various contingencies. One common action is to "bring in
another frame."
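A minimal sketch of a frame as described above (the `Frame` class, slot names, and default values are invented for illustration):

```python
# Hypothetical sketch of a frame: a fixed set of slots, each holding either
# a default value (the expectation) or None, meaning it is still to be
# filled in by investigation.

class Frame:
    def __init__(self, name, slots):
        self.name = name
        self.slots = dict(slots)      # slot name -> default value (or None)

    def fill(self, slot, value):
        """Perceptual evidence confirms or disconfirms a slot's value."""
        if slot not in self.slots:    # the slot structure is immutable
            raise KeyError(f"frame {self.name} has no slot {slot!r}")
        self.slots[slot] = value

    def unfilled(self):
        """Slots still to be filled in by investigation."""
        return [s for s, v in self.slots.items() if v is None]

office = Frame("office", {"desk": "present", "telephone": None, "chair": "present"})
```

Matching the frame against reality then amounts to verifying the defaults and running fill actions for the unfilled slots.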
The theory is that, based on a partial match of a frame's defining slots, a frame
can be "brought to mind." The retrieval is much like jumping to a conclusion
`based on partial evidence. Once the frame is proposed, its slots must be matched
`up with reality; thus we have the initial major hypothesis that the frame represents,
`which itself consists of a number of minor subhypotheses to be verified. A frame
may have other frames in its slots, and so frames may be linked into "frame systems"
that are themselves associatively related. (Consider, for example, the
`linked perceptual frames for being just outside a theater and for being just inside.)
`Transformations between frames correspond to the effects of relevant actions.
Thus the hypotheses can suggest one another. "Thinking always begins with suggestive
but imperfect plans and images; these are progressively replaced by
`better—but usually still imperfect—ideas" [Minsky 1975].
`Frame theory is controversial and has its share of technical problems [Hinton
`1977]. The most important of these are the following.
1. Multiple instances of concepts seem to call for copying frames (since the instances
may have different slot-fillers). Hence, one loses the economy of a
preexisting structure.
2. Often, objects have variable numbers of components (wheels on a truck, runways
in an airport). The natural representation seems to be a rule for constructing
examples, not some specific example.
3. Default values seem inadequate to express legal ranges of slot-filling values or
`dependencies between their properties.
4. Property inheritance is an important capability that semantic nets can implement
with "is a" or "element-of" hierarchies. However, such hierarchies
raise the question of which frame to copy when a particular individual is being
perceived. Should one copy the generic Mammal frame or the more specific
Camel frame, for instance? Surely, it is redundant for the Camel frame to duplicate
all the slots in the Mammal frame. Yet our perceptual task may call for
a particular slot to be filled, and it is painful not to be able to tell where any particular
slot resides.
`
Nevertheless, where these disadvantages can be circumvented or are irrelevant,
frames are seeing increasing use. They are a natural organizing tool for
`complex data.
`
`10.3.2 Location Networks
`
This section describes a system for associating geometric analogical data with a semantic
net structure which is sometimes like a frame with special "evaluation"
rules. The system is a geometrical inference mechanism that computes (or infers)
two-dimensional search areas in an image [Russell 1979]. Such networks have
found use in both aerial image applications [Brooks and Binford 1980; Nevatia and
Price 1978] and medical image applications [Ballard et al. 1979].
`
`
`
`
`The Network
`
A location network is a network representation of geometric point sets related
by set-theoretic and geometric operations such as set intersection and union, distance
calculation, and so forth. The operations correspond to restrictions on the location
of objects in the world. These restrictions, or rules, are dictated by cultural
`or physical facts.
`Each internal node of the location network contains a geometric operation, a
`list of arguments for the operation, and a result of the operation. For instance, a
`node might represent the settheoretic union of two argument point sets, and the
result would be a point set. Inference is performed by evaluating the net, that is,
evaluating all its operations to derive a point set for the top (root) operation.
`The network thus has a hierarchy of ancestors and descendents imposed on it
`through the argument links. At the bottom of this hierarchy are data nodes which
contain no operation or arguments, only geometric data. Each node is in one of
three states: A node is up-to-date if the data attached to it are currently considered
to be accurate. It is out-of-date if the data in it are known to be incomplete, inaccurate,
or missing. It is hypothesized if its contents have been created by net evaluation
but not verified in the image.
`In a common application, the expected relative locations of features in a
`scene are encoded in a network, which thus models the expected structure of the
`image. The primitive set of geometric relations between objects is made up of four
`different types of operations.
`
1. Directional operations (left, reflect, north, up, down, and so on) specify a point
set with the obvious locations and orientations relative to another.
2. Area operations (close-to, in-quadrilateral, in-circle, and so on) create a point
set with a nondirectional relation to another.
`3. Set operations (union, difference and intersection) perform the obvious set
`operations.
`4. Predicates on areas allow point sets to be filtered out of consideration by
`measuring some characteristic of the data. For example, a predicate testing
`width, length, or area against some value restricts the size of sets to be those
`within a permissible range.
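One of the area operations and the point-set representation can be sketched as follows (an illustrative assumption: point sets are Python sets of pixel coordinates, and "close-to" is taken here as Chebyshev dilation by a distance):

```python
# Sketch: point sets as Python sets of (x, y) pixel coordinates, with an
# area operation (close-to) realized as dilation by a Chebyshev distance.

def close_to(point_set, distance):
    """Area operation: all pixels within `distance` (Chebyshev) of the set."""
    return {(x + dx, y + dy)
            for (x, y) in point_set
            for dx in range(-distance, distance + 1)
            for dy in range(-distance, distance + 1)}

# The set operations (union, difference, intersection) are simply the
# built-in |, -, and & operators on Python sets.
```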
`
`The location of the aeration tank in a sewage treatment plant provides a
specific example. The aeration tank is often a rectangular tank surrounded on either
end by circular sludge and sedimentation tanks (Fig. 10.16). As a general rule,
`sewage flows from the sedimentation tanks to aeration tanks and finally through to
`the sludge tanks. This design permits the use of the following types of restrictions
`on the location of the aeration tanks.
`
`Rule 1: "Aeration tanks are located somewhere close to both the sludge tanks
`and the sedimentation tanks."
`
`336
`
`Ch. 10 Knowledge Representation and Use
`
`IPR2021-00921
`Apple EX1015 Page 347
`
`
`
Fig. 10.16 Aerial image of a sewage plant.
`
`The various tanks cannot occupy the same space, so:
`
Rule 2: "Aeration tanks must not be too close to either the sludge or sedimentation
tanks."
`
Rule 1 is translated to the following network relation:
CLOSETO(Union(LocSludgeTanks, LocSedTanks), Distance X)
Rule 2 is translated to
NOTIN(Union(LocSludgeTanks, LocSedTanks), Distance Y)
`The network describing the probable location of the aeration tanks embodies
`both of these rules. Rule 1 determines an area that is close to both groupings of
`tanks and Rule 2 eliminates a portion of that area. Thinking of the image as a point
`set, a set difference operation can remove the area given by Rule 2 from that
`specified by Rule 1. Figure 10.17 shows the final network that incorporates both
`rules.
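The combination of the two rules as a set expression can be sketched as follows (a hypothetical rendering; `within` stands in for the dilation underlying CLOSETO and NOTIN, and the distances are arbitrary):

```python
# Sketch of the aeration-tank network of Fig. 10.17 as a set expression:
# (close to the union of the tank locations) minus (too close to that union).

def within(point_set, distance):
    """Hypothetical helper: dilate a point set by `distance` (Chebyshev)."""
    return {(x + dx, y + dy)
            for (x, y) in point_set
            for dx in range(-distance, distance + 1)
            for dy in range(-distance, distance + 1)}

def aeration_search_area(sludge, sed, dist_x, dist_y):
    tanks = sludge | sed               # Union(LocSludgeTanks, LocSedTanks)
    rule1 = within(tanks, dist_x)      # CLOSETO(..., Distance X)
    rule2 = within(tanks, dist_y)      # NOTIN(..., Distance Y), dist_y < dist_x
    return rule1 - rule2               # set difference removes Rule 2's area
```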
`
`Of course, there could be places where the aeration tanks might be located
`very far away or perhaps violate some other rule. It is important to note that, like
`the frames of Section 10.3.1, location networks give prototypical, likely locations
for an object. They can work very well for stereotyped scenes, and might fail to perform
in novel situations.
Fig. 10.17 Constraint network for aeration tank.

The Evaluation Mechanism
The network is interpreted (evaluated) by a program that works top-down in
a recursive fashion, storing the partial results of each rule at the topmost node
associated with that rule (with a few exceptions). Evaluation starts with the root
`node. In most networks, this node is an operation node. An operation node is
`evaluated by first evaluating all its arguments, and then applying its operation to
`those results. Its own result is then available to the node of the network that called
`for its evaluation.
Data nodes may already contain results, which might come from a map or
from the previous application of vision operators. At some point in the course of
the evaluation, the evaluator may reach a node that has already been evaluated and
is marked up-to-date or hypothesized (such a node contains the results of evaluation
below that point). The results of this node are returned and used exactly as if it
were a data node. Out-of-date nodes cause the evaluation mechanism to execute a
low-level procedure to establish the location of the feature. If the procedure is unable
to establish the status of the object firmly within its resource limits, the status
will remain out-of-date. At any time, out-of-date nodes may be processed without
having to recompute any up-to-date nodes. A node marked hypothesized has a
value, usually supplied by an inference process, and not verified by low-level image
analysis. Hypothesized data may be used in inferences: the results of all inferences
based on hypothesized data are marked hypothesized as well.
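The evaluator described above can be sketched as follows (an illustrative reconstruction, not the original implementation):

```python
# Sketch of the net evaluator: recursive evaluation with results cached at
# each node and a status mark ("up-to-date", "out-of-date", "hypothesized").
# Results computed from hypothesized data are marked hypothesized as well.

class Node:
    def __init__(self, op=None, args=(), data=None, hypothesized=False):
        self.op, self.args = op, list(args)   # operation and argument nodes
        self.result = data                    # data nodes carry results directly
        if op is None:                        # data node
            self.status = "hypothesized" if hypothesized else "up-to-date"
        else:                                 # operation node, not yet evaluated
            self.status = "out-of-date"

def evaluate(node):
    """Evaluate a node, reusing cached up-to-date or hypothesized results."""
    if node.op is None or node.status in ("up-to-date", "hypothesized"):
        return node.result                    # data node or cached result
    arg_results = [evaluate(a) for a in node.args]
    node.result = node.op(*arg_results)
    node.status = ("hypothesized"
                   if any(a.status == "hypothesized" for a in node.args)
                   else "up-to-date")
    return node.result
```

A second call to `evaluate` on the same node returns the cached result without recomputing its subtree.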
`
`338
`
`Ch. 10 Knowledge Representation and Use
`
`IPR2021-00921
`Apple EX1015 Page 349
`
`
`
`2.
`
`3.
`
If a data node ever has its value changed (say, by an independent process that
adds new information), all its ancestors are marked out-of-date. Thus the root
node will indicate an out-of-date status, but only those nodes on the out-of-date
path must be reevaluated to bring the network up to date. Figure 10.18 shows the
operation of the aeration tank network of Fig. 10.17 on the input of Fig. 10.16. In
this case the initial feature data were a single sludge tank and a single sedimentation
tank. Suppose additional work is done to find the location of the remaining
sludge and sediment tanks in the image. This causes a reevaluation of the network,
and the new result more accurately reflects the actual location of the aeration
tanks.
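The invalidation step can be sketched as follows (a hypothetical rendering using node identifiers and a parent map):

```python
# Sketch: marking ancestors out-of-date when a data node's value changes.
# `parents` maps each node id to the ids of the nodes that use it as an
# argument; `status` holds each node's current state.

def invalidate(node_id, parents, status):
    """Mark all ancestors of node_id out-of-date (transitively)."""
    for p in parents.get(node_id, []):
        if status[p] != "out-of-date":    # stop early on already-dirty paths
            status[p] = "out-of-date"
            invalidate(p, parents, status)

parents = {"sludge": ["union"], "sed": ["union"], "union": ["root"]}
status = {"sludge": "up-to-date", "sed": "up-to-date",
          "union": "up-to-date", "root": "up-to-date"}
invalidate("sludge", parents, status)     # new sludge-tank data arrives
```

Only the path above the changed node is dirtied; unrelated up-to-date results are untouched.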
`Properties of Location Networks
`The location network provides a very general example of use of semantic nets
`in computer vision.
`1.
`It serves as a data base of point sets and geometric information. The truth
`status of items in the network is explicitly maintained and depends on incom
`ing information and operations performed on the net.
`It is an expansion of
` a geometric expression into a tree, which makes the order
`of evaluation explicit and in which the partial results are kept for each
`geometric calculation. Thus it provides efficient updating when some but not
`all the partial results change in a reevaluation.
`It provides a way to make geometrical inferences without losing track of the
`hypothetical nature of assumptions. The tree structure records dependencies
`among hypotheses and geometrical results, and so upon invalidation of a
`geometric hypothesis the consequences (here, what other nodes have their
`values affected) are explicit. The record of dependencies solves a major prob
`lem in automated inference systems.
`It reflects implicit universal quantification. The network claims to represent
`true relations whose explicit arguments must be filled in as the network is "in
`stantiated" with real data.
`It has a "flat" semantics. There are no elementof hierarchies or partitions.
`5.
`6. The concept of "individual" is flexible. A point set can contain multiple
`disconnected components corresponding to different world objects. In set
`operations, such an assemblage acts like an explicit set union of the com
`ponents. An "individual" in the network may thus correspond to multiple in
`dividual point (sub)sets in the world.
`7. The network allows use of partial knowledge. A settheoretic semantics of ex
`istence and location allows modeling of an unknown location by the set
`theoretic universe (the possible location is totally unconstrained). If some
`thing is known not to exist in a particular image, its "location" is the null set.
`Generally, a location is a point set.
`8. The settheoretic semantics allows useful punning on set union and the OR
`operation, and set intersection and the AND operation. If a dock is on the
`
`4.
`
`
`
`
`shoreline AND near a town, the search for docks need only be carried out in
`the intersection of the locations.
`
`10.4 CONTROL ISSUES IN COMPLEX VISION SYSTEMS
`
Computer vision involves the control of large, complex information-processing
tasks. Intelligent biological systems solve this control problem. They seem to have
complicated control strategies, allowing dynamic allocation of computational
resources, parallelism, interrupt-driven shifts of attention, and incremental
`behavior modification. This section explores different strategies for controlling the
`complex information processing involved in vision. Appendix 2 contains specific
`
`340
`
`Ch. 10 Knowledge Representation and Use
`
`IPR2021-00921
`Apple EX1015 Page 351
`
`
`
techniques and programming language constructs that have proven to be useful
tools in implementing control strategies for artificial intelligence and computer vision.
`
`10.4.1 Parallel and Serial Computation
`
In parallel computation, several computations are done at the same time. For example,
different parts of an image may be processed simultaneously. One issue in
`parallel processing is synchronization: Is the computation such that the different
`parts can be done at different rates, or must they be kept in step with each other?
`Usually, the answer is that synchronization is important. Another issue in parallel
`processing is its implementation. Animal vision systems have the architecture to
`do parallel processing, whereas most computer systems are serial (although
`developing computer technologies may allow the practical realization of some
`parallel processing). On a serial computer parallelism must be simulated—this is
`not always straightforward.
`In serial computation, operations are performed sequentially in time whether
`or not they depend on one another. The implied sequential control mechanism is
`more closely matched to a (traditional) serial computer than is a parallel mechan
`ism. Sequential algorithms must be stingy with their resources. This fact has had
many effects in computer vision. It has led to mechanisms for efficient data access,
such as multiple-resolution representations. It has also led some to emphasize cognitive
alternatives for low-level visual processing, in the hope that the massive
parallel computations performed in biological vision systems could be circumvented.
However, this trend is reversing; cheaper computation and more pervasive
parallel hardware should increase the commitment of resources to low-level computations.
Parallel and serial control mechanisms have both appeared in algorithms
in earlier chapters. It seems clear that many low-level operations (correlation,
intrinsic image computations) can be implemented with parallel algorithms.
High-level operations, such as "planning" (Chapter 13), have inherently serial
components. In general, in the low levels of visual processing control is predominantly
parallel, whereas at the more abstract levels some useful computations are
necessarily serial in nature.
`
`10.4.2 Hierarchical and Heterarchical Control
`
`Visual control strategies dictate the flow of information and activity through the
representational layers. What triggers processing: a low-level input like a color
patch on the retina, or a high-level expectation (say, expecting to see a red car)?
Different emphasis on these extremes is a basic control issue. The two extremes
may be characterized as follows.
`
`1. Image data driven. Here the control proceeds from the construction of the
`generalized image to segmented structures and finally to descriptions. This is
`also called bottomup control.
`
`
`
`
`2.
`
`Internal model driven. Here highlevel models in the knowledge base generate
`expectations or predictions of geometric, segment, or generalized image struc
`ture in the input. Image understanding is the verification of these predictions.
`This is also called topdown control.
`Topdown and bottomup control are distinguished not by what they do but
`rather by the order in which they do it and how much of it they do. Both ap
`proaches can utilize all the basic representations—intrinsic images, features,
`geometric structures, and propositional representations—but
`the processing
`within these representations is done in different orders.
The division of control strategies into top-down and bottom-up is a rather
simplistic one. There is evidence that attentional mechanisms may be some of the
most complicated brain functions that human beings have [Geschwind 1980]. The
different representational subsystems in a complex vision system influence each
other in sophisticated and intricate ways; whether control flows "up" or "down" is
only a broad characterization of local influence in the (loosely ordered) layers of
`the system.
The term "bottom-up" was originally applied to parsing algorithms for formal
languages that worked their way up the parse tree, assembling the input into
structures as they did so. "Top-down" parsers, on the other hand, notionally
started at the top of the parse tree and worked downward, effectively generating
expectations or predictions about the input based on the possibilities allowed by
the grammar; the verification of these predictions confirmed a particular parsing.
These two paradigms are still basic in artificial intelligence, and provide
powerful analogies and methods for reasoning about and performing many
information-processing tasks. The bottom-up paradigm is comparable in spirit
with "forward chaining," which derives further consequences from established
results. The top-down paradigm is reflected in "backward chaining," which breaks
problems up into subproblems to be solved.
These control organizations can be used not only "tactically" to accomplish
specific tasks, but they can dictate the whole "strategy" of the vision campaign.
We shall discover that in their pure forms the extreme strategies (top-down and
bottom-up) appear inadequate to explain or implement vision. More flexible organizations
which incorporate both top-down and bottom-up components seem
more suited to a broad spectrum of ambitious vision tasks.
Bottom-Up Control
The general outline for bottom-up vision processing is:
1. PREPROCESS. Convert raw data into more usable intrinsic forms, to be interpreted
by the next level. This processing is automatic and domain-independent.
2. SEGMENT. Find visually meaningful image objects, perhaps corresponding to
world objects or their parts. This process is often but not always broken up into
(a) the extraction of meaningful visual primitives, such as lines or regions of
homogeneous composition (based on their local characteristics); and (b) the
agglomeration of local image features into larger segments.
`
`342
`
`Ch. W Knowledge Representation and Use
`
`IPR2021-00921
`Apple EX1015 Page 353
`
`
`
3. UNDERSTAND. Relate the image objects to the domain from which the image
arose. For instance, identify or classify the objects. As a step in this process, or
indeed as the final step in the computer vision program, the image objects and
the relations between them may be described.
In pure bottom-up organization each stage yields data for the next. The progression
from raw data to interpreted scene may actually proceed in many steps;
the different representations at each step allow us to separate the process into the
main steps mentioned above.
Bottom-up control is practical if potentially useful "domain-independent"
processing is cheap. It is also practical if the input data are accurate and yield reliable
and unambiguous information for the higher-level visual processes. For example,
the binary images that result from careful illumination engineering and input
thresholding can often be processed quite reliably and quickly in a bottom-up
mode. If the data are less reliable, bottom-up styles may still work if they make
only tolerably few errors on each pass.
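Pure bottom-up control can be sketched as a one-way pipeline (all three stage functions here are invented placeholders for the real operations):

```python
# Sketch of pure bottom-up control: each stage consumes only the previous
# stage's output, with no feedback from higher levels.

def preprocess(raw_image):
    """Convert raw data into intrinsic form (placeholder: threshold)."""
    return [[1 if pixel > 128 else 0 for pixel in row] for row in raw_image]

def segment(intrinsic):
    """Find image objects (placeholder: foreground count per row)."""
    return [sum(row) for row in intrinsic]

def understand(segments):
    """Relate image objects to the domain (placeholder: describe)."""
    return f"{sum(1 for s in segments if s)} row(s) contain foreground"

def bottom_up(raw_image):
    # PREPROCESS -> SEGMENT -> UNDERSTAND, strictly in that order
    return understand(segment(preprocess(raw_image)))
```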
Top-Down Control
A bottom-up, hierarchical model of perception is at first glance appealing on
neurological and computational grounds, and has influenced much classical philosophical
thought and psychological theory. The "classical" explanation of perception
has relatively recently been augmented by a more cognition-based one involving
(for instance) interaction of knowledge and expectations with the perceptual
process in a more top-down manner [Neisser 1967; Bartlett 1932]. A similar evolution
of the control of computer vision processing has accounted for the augmentation
of the pure "pattern recognition" paradigm with more "cognitive" paradigms.
The evidence seems overwhelming that there are vision processes which do
not "run bottom-up," and it is one of the major themes of this book that internal
models, goals, and cognitive processes must play major roles in computer vision
[Gregory 1970; Buckhout 1974; Gombrich 1972]. Of course, there must be a substantial
component of biological vision systems which can perform in a noncognitive
mode.
There are probably no versions of top-down organization for computer vision
that are as pure as the bottom-up ones. The model to keep in mind in top-down
perception is that of goal-directed processing. A high-level goal spawns subgoals
which are attacked, again perhaps yielding subsubgoals, and so on, until the goals
are simple enough to solve directly. A common top-down technique is
"hypothesize-and-verify"; here an internal modeling process makes predictions
about the way objects will act and appear. Perception becomes the verifying of
predictions or hypotheses that flow from the model, and the updating of the model
based on such probes into the perceptual environment [Bolles 1977]. Of course,
our goal-driven processes may be interrupted and resources diverted to respond to
the interrupt (as when movement in the visual periphery causes us to look toward
the moving object). Normally, however, the hypothesis verification paradigm requires
relatively little information from the lower levels and in principle it can control
the low-level computations.
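Hypothesize-and-verify can be sketched as follows (a toy illustration; the hypotheses, their predictions, and the `verify` predicate standing in for low-level image probes are all invented):

```python
# Sketch of hypothesize-and-verify: a model proposes predictions, and
# perception only checks them against the image, returning the first
# hypothesis whose predictions all hold.

def hypothesize_and_verify(hypotheses, verify):
    """hypotheses: list of (label, predictions); verify: prediction -> bool."""
    for label, predictions in hypotheses:
        if all(verify(p) for p in predictions):
            return label          # the major hypothesis is confirmed
    return None                   # no hypothesis survives verification

# Toy "image evidence" and model-generated predictions.
scene = {"red region", "wheel-shaped region"}
verify = lambda prediction: prediction in scene
hypotheses = [("fire truck", ["red region", "ladder-shaped region"]),
              ("red car", ["red region", "wheel-shaped region"])]
```

Only the predicted features are probed; nothing outside the model's predictions is examined.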
`
`
`
`
The desire to circumvent unnecessary low-level processing in computer vision
is understandable. Our low-level vision system performs prodigious amounts
of information processing in several cascaded parallel layers. With serial computation
technology, it is very expensive to duplicate the power of our low-level visual
system. Current technological developments are pointing toward making parallel,
low-level processing feasible and thus lowering this price. In the past, however, the
price has been so heavy that much research has been devoted to avoiding it, often
by using domain knowledge to drive a more or less top-down perception paradigm.
Thus there are two reasons to use a top-down control mechanism. First, it seems to
be something that human beings do and to be of interest in its own right. Second, it
seems to offer a chance to accomplish visual tasks without impractical expenditure
of resources.
Mixed Top-Down and Bottom-Up Control
In actual computer vision practice, a judicious mixture of data-driven analysis
and model-driven prediction often seems to perform better than either style in isolation.
This meld of control styles can s