`"inconsistent," or "optimal" with respect to a given relational structure of objects (an
`Fig. 12.7 Three scenes (A, B, C) and their labelings. Labelings are only "consistent,"
`
`(c)
`
`(b)
`
`f] ft
`
`+
`
`ft
`
`+
`
`ft
`
`+
`
`\
`
`w (
`\\
`\\
`
`/
`
`\
`
`o
`^
`
`i
`
`\\!ha
`
`\
`\
`
`\
`\
`\
`
`Trees
`
`,. \ Grass
`
`labeling
`Optimal
`
`Consistent
`
`labels
`
`Scene
`
`Road \v
`
` v^
`^A AAVA
`
`^v
`
`/
`
`■
`
`
The simplest way to find a consistent labeling of a relational structure (we shall often say "labeling of a scene") is to apply a depth-first tree search of the labeling possibilities, as in the backtracking algorithm (11.1).

Label an object in accordance with unary constraints.

Iterate until a globally consistent labeling is found:

Given the current labeling, label another object
consistently—in accordance with all constraints.

If the object cannot be labeled consistently, backtrack
and pick a new label for a previously labeled object.
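As an illustrative sketch (not from the text), this backtracking search might be coded as follows; the `objects`, `labels`, and `consistent` names are our own placeholders, and the constraint check is left abstract.

```python
def backtrack_label(objects, labels, consistent, assignment=None):
    """Depth-first search for a consistent labeling.

    consistent(obj, label, assignment) must check the unary and
    higher-order constraints of the relational structure against
    the partial labeling built so far."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(objects):
        return assignment                      # every object is labeled
    obj = objects[len(assignment)]             # next unlabeled object
    for label in labels:
        if consistent(obj, label, assignment):
            assignment[obj] = label
            result = backtrack_label(objects, labels, consistent, assignment)
            if result is not None:
                return result                  # consistent labeling found
            del assignment[obj]                # backtrack: undo and retry
    return None                                # no label works here
```

For example, with three mutually related objects and the constraint that related objects take different labels, two labels fail but three suffice.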
This labeling algorithm can be computationally inefficient. First, it does not prune the search tree very effectively. Second, if it is used to generate all consistent labelings, it does not recognize important independences in the labels. That is, it does not notice that conclusions reached (labels assigned) in part of the tree search are usable in other parts without recomputation.

In a serial relaxation, the labels are changed one object at a time. After each such change, the new labeling is used to determine which object to process next. This technique has proved useful in some applications [Feldman and Yakimovsky 1974].
`
`Assign all possible labels to each object in accordance with
`unary constraints.
`
`Iterate until a globally consistent labeling is found:
`
`Somehow pick an object to be processed.
`
`Modify its labels to be consistent with the current
`labeling.
A parallel iterative algorithm adjusts all object labels at once; we have seen this approach in several places, notably in the "Waltz filtering algorithm" of Section 9.5.
`Assign all possible labels to each object in accordance with
`unary constraints.
`
`Iterate until a globally consistent labeling is found:
`
`In parallel, eliminate from each object's label set
`those labels that are inconsistent with the current
`labels of the rest of the relational structure.
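A minimal sketch of this parallel discrete relaxation (in the spirit of Waltz filtering; the function and argument names are ours, not the text's): on each pass, every object simultaneously discards the labels that no label of some related object supports.

```python
def discrete_relaxation(label_sets, compatible, neighbors):
    """Iteratively delete labels that lack support.

    label_sets: dict object -> set of candidate labels
    compatible(i, li, j, lj): binary constraint between objects i and j
    neighbors: dict object -> list of related objects"""
    changed = True
    while changed:
        changed = False
        new_sets = {}
        # "In parallel": all deletions are computed from the current state
        for i, labels in label_sets.items():
            kept = {
                li for li in labels
                if all(any(compatible(i, li, j, lj) for lj in label_sets[j])
                       for j in neighbors[i])
            }
            new_sets[i] = kept
            if kept != labels:
                changed = True
        label_sets = new_sets
    return label_sets
```

On a three-object "different labels" problem this prunes each label set down to the single consistent choice.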
412   Ch. 12 Inference

A less structured version of relaxation occurs when the iteration is replaced with an asynchronous interaction of labeled objects. Such interaction may be implemented with multiple cooperating processes or in a data base with "demons" (Appendix 2). This method of relaxation was used in MSYS [Barrow and Tenenbaum 1976]. Here imagine that each object is an active process that knows its own label set and also knows about the constraints, so that it knows about its relations with other objects. The program of each object might look like this.
`
If I have just been activated, and my label set is not
consistent with the labels of other objects in the
relational structure, then I change my label set to be
consistent, else I suspend myself.

Whenever I change my label set, I activate other objects
whose label set may be affected, then I suspend myself.

To use such a set of active objects, one can give each one all possible labels consistent with the unary constraints, establish the constraints so that the objects know where and when to pass on activity, and activate all objects.
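One way to sketch these active objects sequentially is with a work queue standing in for process activation (our construction, not MSYS itself): an object popped from the queue prunes its own label set, and if the set changes it "activates" its neighbors by enqueueing them.

```python
from collections import deque

def demon_relaxation(label_sets, compatible, neighbors):
    """Asynchronous-style relaxation: each activated object prunes its
    own label set; whenever it changes, it activates the objects its
    change may affect, then 'suspends' (returns to the queue loop)."""
    queue = deque(label_sets)              # initially activate everyone
    active = set(queue)
    while queue:
        i = queue.popleft()
        active.discard(i)
        kept = {
            li for li in label_sets[i]
            if all(any(compatible(i, li, j, lj) for lj in label_sets[j])
                   for j in neighbors[i])
        }
        if kept != label_sets[i]:          # my label set changed:
            label_sets[i] = kept
            for j in neighbors[i]:         # activate affected objects
                if j not in active:
                    active.add(j)
                    queue.append(j)
    return label_sets
```

The fixed point reached is the same as that of the parallel iteration; only the order of activity differs.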
Constraints involving arbitrarily many objects (i.e., constraints of arbitrarily high order) can efficiently be relaxed by recording acceptable labelings in a graph structure [Freuder 1978]. Each object to be labeled initially corresponds to a node in the graph, which contains all legal labels according to unary constraints. Higher-order constraints involving more and more nodes are incorporated successively as new nodes in the graph. At each step the new node constraint is propagated; that is, the graph is checked to see if it is consistent with the new constraint. With the introduction of more constraints, node pairings that were previously consistent may be found to be inconsistent. As an example consider the following graph coloring problem: color the graph in Fig. 12.8 so that neighboring nodes have different colors. It is solved by building constraints of increasingly higher order and propagating them. The node constraints are given explicitly as shown in Fig. 12.8a, but the higher-order constraints are given in functional implicit form; prospective colorings must be tested to see if they satisfy the constraints. After the node constraints are given, order-two constraints are synthesized as follows: (1) make a node for each node pairing; (2) add all labelings that satisfy the constraint. The result is shown in Fig. 12.8b. The single constraint of order three is synthesized in the same way, but now the graph is inconsistent: the match "Y,Z: Red,Green" is ruled out by the third-order legal label set (RGY, GRY). To restore consistency the constraint is propagated through node (Y,Z) by deleting the inconsistent labelings. This means that the node constraint for node Z is now inconsistent. To remedy this, the constraint is propagated again by deleting the inconsistency, in this case the labeling (Z:G). The change is propagated to node (X,Z) by deleting (X,Z: Red,Green) and finally the network is consistent.

In this example constraint propagation did not occur until constraints of order three were considered. Normally, some constraint propagation occurs after every order greater than one. Of course it may be impossible to find a consistent graph. This is the case when the labels for node Z in our example are changed from (G,Y) to (G,R). Inconsistency is then discovered at order three.
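The order-two synthesis step can be sketched for the coloring example (a reconstruction under our own naming; the node label sets below are read off the example, and the "different colors" constraint is the implicit functional form):

```python
from itertools import product

def synthesize_pairs(node_labels, satisfies):
    """Order-two constraint synthesis [after Freuder 1978]:
    (1) make a node for each node pairing;
    (2) admit exactly the labelings that satisfy the constraint."""
    pair_nodes = {}
    nodes = sorted(node_labels)
    for idx, a in enumerate(nodes):
        for b in nodes[idx + 1:]:
            pair_nodes[(a, b)] = {
                (la, lb)
                for la, lb in product(node_labels[a], node_labels[b])
                if satisfies(a, la, b, lb)
            }
    return pair_nodes
```

With node constraints X: {R, G}, Y: {R, G}, Z: {G, Y} and the neighboring-colors-differ constraint, the pair node (Y, Z) initially admits (R, G), which the order-three constraint later deletes, as in the text.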
Fig. 12.8 Coloring a graph by building constraints of increasingly higher order.

It is quite possible that a discrete labeling algorithm will not yield a unique label for each object. In this case, a consistent labeling exists using each label for the object. However, which of an object's multiple labels goes with which of another object's multiple labels is not determined. The final enumeration of consistent labelings usually proceeds by tree search over the reduced set of possibilities remaining after the relaxation.
Convergence properties of relaxation algorithms are important; convergence means that in some finite time the labeling will "settle down" to a final value. In discrete labeling, constraints may often be written so that the label adjustment phase always reduces the number of labels for an object (inconsistent ones are eliminated). In this case the algorithm clearly must converge in finite time to a consistent labeling, since for each object the label set must either shrink or stay stable. In schemes where labels are added, or where labels have complex structure (such as real number "weights" or "probabilities"), convergence is often not guaranteed mathematically, though such schemes may still be quite useful. Some probabilistic labeling schemes (Section 12.4.3) have provably good convergence properties.
`
`414
`
`Ch. 12 Inference
`
`
`
It is possible to use relaxation schemes without really considering their mathematical convergence properties, their semantics (what is the semantics of weights attached to labels—are they probabilities?), or a clear definition of what exactly the relaxation is to achieve (what is a good set of labels?). The fact that some schemes can be shown to have unpleasant properties (such as assigning nonzero weights to each of two inconsistent hypotheses, or not always converging to a solution) does not mean that they cannot be used. It only means that their behavior is not formally characterizable or possibly even predictable. As relaxation computations become more common, the less formalizable, less predictable, and less conceptually elegant forms of relaxation computations will be replaced by better behaved, more thoroughly understood schemes.
`
12.4.3 A Linear Relaxation Operator and a Line Labeling Example

The Formulation

We now move away from discrete labeling and into the realm of continuous weights or supposition values on labels. In Sections 12.4.3 and 12.4.4 we follow closely the development of [Rosenfeld et al. 1976]. Let us require that the label weights for each object sum to unity. Then the weights are reminiscent of probabilities, reflecting the "probability that the label is correct." When the labeling algorithm converges, a label emerges with a high weight if it occurs in a probable labeling of the scene. Weights, or supposition values, are in fact hard to interpret consistently as probabilities, but they are suggestive of likelihoods and often can be manipulated like them.
In what follows p refers to probability-like weights (supposition values) rather than to the value of a probability density function. Let a relational structure with n objects be given by a_i, i = 1, ..., n, each with m discrete labels λ_1, ..., λ_m. The shorthand p_i(λ) denotes the weight, or (with the above caveats) the "probability," that the label λ (actually λ_k for some k) is correct for the object a_i. Then the probability axioms lead to the following constraints:

0 ≤ p_i(λ) ≤ 1    (12.14)

Σ_λ p_i(λ) = 1    (12.15)
The labeling process starts with an initial assignment of weights to all labels for all objects [consistent with Eqs. (12.14) and (12.15)]. The algorithm is parallel iterative: it transforms all weights at once into a new set conforming to Eqs. (12.14) and (12.15), and repeats this transformation until the weights converge to stable values.

Consider the transformation as the application of an operator to a vector of label weights. This operator is based on the compatibilities of labels, which serve as constraints in this labeling algorithm. A compatibility p_ij looks like a conditional probability:

Σ_λ p_ij(λ|λ') = 1    for all i, j, λ'    (12.16)
`
`
`
`(12.17)
` X = X', else 0.
`Pa (X.|X') = 1
`iff
` a, has la
`The /7,y (X |X') may be interpreted as the conditional probability that object
`bel X given that another object aj has label X'. These compatibilities may be gath
`ered from statistics over a domain, or may reflect a priori belief or information.
`The operator iteratively adjusts label weights in accordance with other
` />, (X) is computed from old weights
`weights and the compatibilities. A new weight
`and compatibilities as follows.
p_i(λ) := Σ_j c_ij [Σ_λ' p_ij(λ|λ') p_j(λ')]    (12.18)

The c_ij are coefficients such that

Σ_j c_ij = 1    (12.19)

In Eq. (12.18), the inner sum is the expectation that object a_i has label λ, given the evidence provided by object a_j. The new p_i(λ) is thus a weighted sum of these expectations, and the c_ij are the weights for the sum.
To run the algorithm, simply pick the p_ij and c_ij, and apply Eq. (12.18) repeatedly to the p_i until they stop changing. Equation (12.18) is in the form of a matrix multiplication on the vector of weights, as shown below; the matrix elements are weighted compatibilities, the c_ij p_ij. The relaxation operator is thus a matrix; if it is partitioned into several component matrices, one for each set of noninteracting weights, linear algebra yields proofs of convergence properties [Rosenfeld et al. 1976]. The iteration for the reduced matrix for each component does converge, and converges to the weight vector that is the eigenvector of the matrix with eigenvalue unity. This final weight vector is independent of the initial assignments of label weights; we shall say more about this later.
An Example

Let us consider the input line drawing scene of Fig. 12.9a used in [Rosenfeld et al. 1976]. The line labels given in Section 9.5 allow several consistent labelings as shown in Fig. 12.9b–e, each with a different physical interpretation.

In the discrete labeling "filtering" algorithm presented in Section 9.5 and outlined in the preceding section, the relational structure is imposed by the neighbor relation between vertices induced by their sharing a line. Unary constraints are imposed through a catalog of legal combinations of line labels at vertices, and the binary constraint is that a line must not change its label between vertices. The algorithm eliminates inconsistent labels.

Let us try to label the sides of the triangle a_1, a_2, and a_3 in Fig. 12.9 with the solid object edge labels {>, <, +, −}. To do this requires some "conditional probabilities" for compatibilities p_ij(λ|λ'), so let us use those that arise if all eight interpretations of Fig. 12.9 are equally likely. Remembering that

p(X|Y) = p(X,Y) / p(Y)    (12.20)
`
`416
`
`Ch. 12 Inference
`
`
`
`d 3
`
`(a)
`
`(b)
`
`(c)
`
`(d)
`
`(e)
`
`Fig. 12.9 A triangle and its possible
`labels, (a) Edge names, (b) Floating.
`(c) Flap folded up. (d) Triangular hole.
`(e) Flap folded down.
`
and taking p(X,Y) to mean the probability that labels X and Y occur consecutively in clockwise order around the triangle, one can derive Table 12.2. Of course, we could choose other compatibilities based on any considerations whatever as long as Eqs. (12.16) and (12.17) are preserved.

Table 12.2 shows that there are two noninteracting components, {−, >} and {+, <}. Consider the first component, which consists of the weight vector

[p_1(>), p_1(−), p_2(>), p_2(−), p_3(>), p_3(−)]    (12.21)

The second is treated similarly. This vector describes weights for the subpopulation of labelings given by Fig. 12.9b and c. The matrix M of compatibilities has columns of weighted p_ij.
`
M is the 6 × 6 matrix whose entry in the row for p_i(λ) and the column for p_j(λ') is c_ij p_ij(λ|λ'). Its first row, for p_1(>), is

[c_11 p_11(>|>), c_11 p_11(>|−), c_12 p_12(>|>), c_12 p_12(>|−), c_13 p_13(>|>), c_13 p_13(>|−)]    (12.22)

and the remaining rows follow the same pattern.
`
`
`
`
Table 12.2

λ1   λ2   p(λ1, λ2)   p(λ1|λ2)
>    >    1/4         2/3
>    −    1/8         1
−    >    1/8         1/3
−    −    0           0
<    <    1/4         2/3
<    +    1/8         1
+    <    1/8         1/3
+    +    0           0

All pairs that mix the two components, such as (>, <), (>, +), (−, <), and (−, +), have p(λ1, λ2) = 0 and p(λ1|λ2) = 0.
`
If we let c_ij = 1/3 for all i, j, then

M =
[ 1/3   0    2/9  1/3   2/9  1/3 ]
[  0   1/3   1/9   0    1/9   0  ]
[ 2/9  1/3   1/3   0    2/9  1/3 ]
[ 1/9   0     0   1/3   1/9   0  ]
[ 2/9  1/3   2/9  1/3   1/3   0  ]
[ 1/9   0    1/9   0     0   1/3 ]
An analytic eigenvector calculation (Appendix 1) shows that the M of Eq. (12.23) yields (for any initial weight vector) the final weight vector

[3/4, 1/4, 3/4, 1/4, 3/4, 1/4]    (12.24)

Thus each line of the population in the component we chose (Fig. 12.9b and c) has label > with "probability" 3/4, and − with "probability" 1/4. In other words, from an initial assumption that all labelings in Fig. 12.9b and c were equally likely, the system of constraints has "relaxed" to the state where the "most likely" labeling is that of Fig. 12.9b, the floating triangle.
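Under our reconstruction of the example's numbers (four equally likely labelings in the chosen component, c_ij = 1/3, and self-compatibility given by Eq. 12.17), the whole computation fits in a short sketch:

```python
from itertools import product

labels = ['>', '-']
# The four labelings (a1, a2, a3) of the {>, -} component assumed here:
# the floating triangle plus the three positions of the folded flap.
population = [('>', '>', '>'), ('-', '>', '>'), ('>', '-', '>'), ('>', '>', '-')]

# Joint p(l1, l2) over consecutive clockwise pairs, then conditionals
# as in Eq. (12.20).
joint = {pair: 0.0 for pair in product(labels, repeat=2)}
for lab in population:
    for k in range(3):
        joint[(lab[k], lab[(k + 1) % 3])] += 1.0 / (3 * len(population))
marginal = {l2: sum(joint[(l1, l2)] for l1 in labels) for l2 in labels}
cond = {(l1, l2): joint[(l1, l2)] / marginal[l2] for (l1, l2) in joint}

def step(p):
    """One application of Eq. (12.18) with c_ij = 1/3 for all i, j."""
    new = {}
    for i in range(3):
        for l in labels:
            total = 0.0
            for j in range(3):
                if j == i:      # Eq. (12.17): self-compatibility is identity
                    inner = p[(i, l)]
                else:
                    inner = sum(cond[(l, lp)] * p[(j, lp)] for lp in labels)
                total += inner / 3.0
            new[(i, l)] = total
    return new

p = {(i, l): 0.5 for i in range(3) for l in labels}   # any initial weights
for _ in range(200):
    p = step(p)
# The weights settle at 3/4 for > and 1/4 for -, matching Eq. (12.24).
```

The computed conditionals come out to p(>|>) = 2/3, p(−|>) = 1/3, p(>|−) = 1, p(−|−) = 0, agreeing with Table 12.2.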
This relaxation method is a crisp mathematical technique, but it has some drawbacks. It has good convergence properties, but it converges to a solution entirely determined by the compatibilities, leaving no room for preferences or local scene evidence to be incorporated and affect the final weights. Further, the algorithm perhaps does not exactly mirror the following intuitions about how relaxation should work.
`
`
`
`
1. Increase p_i(λ) if high-probability labels for other objects are compatible with the assignment of λ to a_i.

2. Decrease p_i(λ) if high-probability labels are incompatible with the assignment of λ to a_i.

3. Labels with low probability, compatible or incompatible, should have little influence on p_i(λ).

However, the operator of this section decreases p_i(λ) the most when other labels have both low compatibility and low probability. Thus it accords with (1) above, but not with (2) or (3). Some of these difficulties are addressed in the next section.
`
12.4.4 A Nonlinear Operator

The Formulation

If compatibilities are allowed to take on both positive and negative values, then we can express strong incompatibility better and obtain behavior more like (1), (2), and (3) just above. Denote the compatibility of the event "label λ on a_i" with the event "label λ' on a_j" by r_ij(λ, λ'). If the two events occur together often, r_ij should be positive. If they occur together rarely, r_ij should be negative. If they are independent, r_ij should be 0. The correlation coefficient behaves like this, and the compatibilities of this section are based on correlations (hence the notation r_ij for compatibilities). The correlation is defined using the covariance:

cov(X, Y) = p(X, Y) − p(X)p(Y)    (12.25)

Now define a quantity σ which is like the standard deviation,

σ(X) = [p(X) − (p(X))²]^(1/2)

then the correlation is the normalized covariance:

cor(X, Y) = cov(X, Y) / [σ(X)σ(Y)]    (12.26)
`
This allows the formulation of an expression precisely analogous to Eq. (12.18), only with r_ij instead of p_ij, used to obtain a positive or negative change in the weights:

q_i^(k)(λ) = Σ_j c_ij [Σ_λ' r_ij(λ, λ') p_j^(k)(λ')]    (12.27)

In Eqs. (12.27)–(12.29) the superscripts indicate iteration numbers. The weight change (Eq. 12.27) could be applied as follows,

p_i^(k+1)(λ) = p_i^(k)(λ) + q_i^(k)(λ)    (12.28)

but then the resultant label weights might not remain nonnegative. Fixing this in a straightforward way yields the iteration equation

p_i^(k+1)(λ) = p_i^(k)(λ)[1 + q_i^(k)(λ)] / Σ_λ p_i^(k)(λ)[1 + q_i^(k)(λ)]    (12.29)
`
`
`
`
The convergence properties of this operator seem to be unknown, and like the linear operator it can assign nonzero weights to maximally incompatible labelings. However, its behavior can accord with intuition, as the following example shows.

An Example

Computing the covariances and correlations for the set of labels of Fig. 12.9b–e yields Table 12.3.

Figure 12.10 shows the nonlinear operator of Eq. (12.29) operating on the example of Fig. 12.9. Figure 12.10 shows several cases.

1. Equal initial weights: convergence to the a priori probabilities (3/8, 3/8, 1/8, 1/8).

2. Equal weights in the component {>, −}: convergence to the "most probable" floating triangle labeling.

3. Slight bias toward a flap labeling is not enough to overcome convergence to the "most probable" labeling, as in (2).

4. Like (3), but greater bias elicits the "improbable" labeling.

5. Contradictory biases toward "improbable" labelings: convergence to the "most probable" labeling instead.

6. Like (5), but stronger bias toward one "improbable" labeling elicits it.

7. Bias toward one of the components {>, −}, {<, +} converges to the most probable labeling in that component.

8. Like (7), only biased to the less probable labeling in a component.
`
12.4.5 Relaxation as Linear Programming

The Idea

Linear programming (LP) provides some useful metaphors for thinking about relaxation computations, as well as actual algorithms and a rigorous basis [Hummel and Zucker 1980]. In this section we follow the development of [Hinton 1979].
`
Table 12.3

λ1   λ2   cov(λ1, λ2)   cor(λ1, λ2)
>    >    7/64          7/15
>    <    −9/64         −3/5
>    −    5/64          5/√105
−    >    5/64          5/√105
`
`
`
`0
`
`0
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`0
`
`0
`
`0
`0
`
`0
`
`
`
`0 0
`
`0
`
`
`
`0 0
`
`0
`0
`
`0
`
`0
`
`
`0 0
`
`
`
`0 0
`
`0
`
`1
`0
`0
`
`0
`
`
`
`0 0
`
`1
`
`
`0 0
`
`0
`
`0
`
`0
`
`1
`0 0
`
`
`
`Limit
`
`0
`0
`
`0
`
`0
`
`
`
`0 0
`
`0
`0
`0
`
`
`
`
`
`1 1
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`0
`
`0
`
`0
`
`0
`
`
`
`1 1 1
`
`
`
`0
`
`
`
`1 1
`
`1 1
`1
`
`0
`0
`0
`
`0
`
`0
`0
`
`0.8
`0
`
`0
`
`0.02
`0.02
`0.02
`
`0
`0
`
`0.94
`
`0
`
`0
`
`0
`
`0
`0.05
`0.05
`
`1
`1
`1
`
`1
`1
`
`1
`
`1
`1
`0
`
`0
`0
`0
`
`0.37 0.37 0.13 0.13
`0.37 0.37 0.13 0.13
`0.37 0.37 0.13 0.13
`
`0.93
`
`0
`
`0
`
`0.03
`
`0
`
`0
`
`0.2
`0.2
`0.2
`
`
`
`0 0
`
`0
`
`0
`0
`
`0
`
`0
`
`
`
`0 0 0
`
`
`
`
`
`0 0 0
`
`
`
`P3i+)
`
`•
`
`•
`
`(b)
`
`*
`
`•
`
`'
`
`•
`
`+
`
`> <
`
`how the label weights are displayed, and (c) shows a number of cases (see text).
`Fig. 12.10 The nonlinear operator produces labelings for the triangle in (a), (b) shows
`
`0.2
`0.2
`0.25 0.25 0.25 0.25
`0.2
`0.3
`
`0.4
`
`0.2
`
`0.3
`
`0.2
`
`0.2
`0.2
`0.2
`
`0.3
`0.3
`0.3
`
`0.2
`0.2
`0.2
`
`0.3
`0.3
`0.3
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0.5
`0.7
`0.8
`
`0.5
`0.7
`0.7
`
`0.5
`0.7
`0.5
`
`0.5
`0.6
`0.5
`
`0.5
`0.5
`0.5
`
`0
`
`0
`
`0
`
`0
`
`0
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0.5
`0.3
`0.2
`
`0.5
`0.3
`0.3
`
`0.5
`0.3
`0.5
`
`0.5
`0.4
`0.5
`
`0.5
`0.5
`0.5
`
`(8)
`
`(7)
`
`(6)
`
`(5)
`
`(4)
`
`(3)
`
`(2)
`
`0.25 0.25 0.25 0.25
`0.25 0.25 0.25 0.25
`0.25 0.25 0.25 0.25
`
`initial weights
`
`Case
`
`0.2
`
`
`
`1 1
`
`0
`
`0
`
`0
`
`0
`
`0.98
`0.98
`0.98
`
`
`
`1 1
`
`0.06
`
`10
`0.95
`0.95
`
`0.97
`
`1
`
`1 1
`
`0.07
`
`1
`
`0
`
`0
`
`0
`
`0.98
`0.98
`0.98
`
`0.33 0.33 0.17 0.17
`0.33 0.33 0.17 0.17
`0.33 0.33 0.17 0.17
`
`0.2
`0.2
`0.2
`
`iterations
`After 20 to 30
`
`iterations
`After 2 to 3
`
`33
`
`a2
`
`«1 P, (»
`
`0.23 0.16 0.45 0.16
`0.35 0.20 0.25 0.20
`0.38 0.17 0.29 0.16
`
`0.41 0.13 0.32 0.14
`0.41 0.13 0.32 0.14
`0.41 0.13 0.32 0.14
`
`0
`
`0
`0
`
`0.17
`0.49
`0.7
`
`0
`0
`
`0
`
`0.16
`0.5
`0.5
`
`0.36
`0.64
`0.36
`
`0.37
`0.51
`0.37
`
`0.2
`0.2
`0.2
`
`0.2
`0.2
`0.2
`
`0 0 0 0 0 0 0 0 0 0 0 0
`
`0.3
`0.3
`0.3
`
`0.83
`0.51
`0.3
`
`0.84
`0.5
`0.5
`
`0.64
`0.36
`0.64
`
`0.62
`0.49
`0.62
`
`0.8
`0.8
`0.8
`
`0.3
`0.3
`0.3
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`
`
`
To put relaxation in terms of linear programming, we use the following translations.

• LABEL WEIGHT VECTORS ⇒ POINTS IN EUCLIDEAN N-SPACE. Each possible assignment of a label to an object is a hypothesis, to which a weight (supposition value) is to be attached. With N hypotheses, an N-vector of weights describes a labeling. We shall call this vector a (hypothesis or label) weight vector. For m labels and n objects, we need at most Euclidean mn-space.

• CONSTRAINTS ⇒ INEQUALITIES. Constraints are mapped into linear inequalities in hypothesis weights, by way of various identities like those of "fuzzy logic" [Zadeh 1965]. Each inequality determines an infinite half-space. The weight vectors within this half-space satisfy the constraint; those outside do not. The convex solid that is the set intersection of all the half-spaces includes those weight vectors that satisfy all the constraints: each represents a "consistent" labeling. In linear programming terms, each such weight vector is a feasible solution. We thus have the usual geometric interpretation of the linear programming problem, which is to find the best (optimal) consistent (feasible) labeling (solution, or weight vector). Solutions should have integer-valued (1- or 0-valued) weights indicating convergence to actual labelings, not probabilistic ones such as those of Section 12.4.3, or the one shown in Fig. 12.10c, case 1.

• HYPOTHESIS PREFERENCES ⇒ PREFERENCE VECTOR. Often some hypotheses (label assignments) are preferred to others, on the basis of a priori knowledge, image evidence, and so on. To express this preference, make an N-dimensional preference vector, which expresses the relative importance (preference) of the hypotheses. Then

  • The preference of a labeling is the dot product of the preference vector and the weight vector (it is the sum over all hypotheses of the weight of each hypothesis times its preference).

  • The preference vector defines a preference direction in N-space. The optimal feasible solution is the one "farthest" in the preference direction. Let x and y be feasible solutions; they are N-dimensional weight vectors satisfying all constraints. If z = x − y has a component in the positive preference direction, then x is a better solution than y, by the definition of the preference of a labeling.

It is helpful for our intuition to let the preference direction define a "downward" direction in N-space as gravity does in our three-space. Then we wish to pick the lowest (most preferred) feasible solution vector.

• LABELING ⇒ OPTIMAL SOLUTION. The relaxation algorithm must solve the linear programming problem—find the best consistent labeling. Under the conditions we have outlined, the best solution vector generally occurs at a vertex of the N-space solid. This is so because usually a vertex will be the "lowest" part of the convex solid in the preference direction. It is a rare coincidence that the solid "rests on a face or edge," but when it does a whole edge or face of the solid contains equally preferred solutions (the preference direction is normal to the edge or face). For integer solutions, the solid should be the convex hull of integer solutions and not have any vertices at noninteger supposition values.
`The "simplex algorithm" is the best known solution method in linear pro
`gramming. It proceeds from vertex to vertex, seeking the one that gives the op
`timal solution. The simplex algorithm is not suited to parallel computation, how
`ever, so here we describe another approach with the flavor of hillclimbing optimi
`zation. Basically, any such algorithm moves the weight vector around in //space,
`iteratively adjusting weights. If they are adjusted one at a time, serial relaxation is
`taking place; if they are all adjusted at once, the relaxation is parallel iterative. The
`feasible solution solid and the preference vector define a "cost function" over all
`TVspace, which acts like a potential function in physics. The algorithm tries to
`reach an optimum (minimum) value for this cost function. As with many optimi
`zation algorithms, we can think of the algorithm as trying to simulate (in Nspace)
`a ball bearing (the weight vector) rolling along some path down to a point of
`minimum gravitational (cost) potential. Physics helps the ball bearing find the
`minimum; computer optimization techniques are sometimes less reliable.
Translating Constraints to Inequalities

The supposition values, or hypothesis weights, may be encoded into the interval [0,1], with 0 meaning "false," 1 meaning "true." The extension of weights to the whole interval is reminiscent of "fuzzy logic," in which truth values may be continuous over some range [Zadeh 1965]. As in Section 12.4.3, we denote supposition values by p(·); H, A, B, and C are label-assignment events, which may be considered as hypotheses that the labels are correctly assigned. ~, ∨, ∧, ⇒, and ⇔ are the usual logical connectives relating hypotheses. The connectives allow the expression of complex constraints. For instance, a constraint might be "Label x as 'l' if and only if z is labeled 'w' or q is labeled 'v'." This constraint relates three hypotheses: h1: (x is "l"), h2: (z is "w"), h3: (q is "v"). The constraint is then h1 ⇔ (h2 ∨ h3).

Inequalities may be derived from constraints this way.

1. Negation. p(H) = 1 − p(~H).

2. Disjunction. The sum of the weights of the disjuncts is greater than or equal to one: p(A ∨ B ∨ ... ∨ C) gives the inequality p(A) + p(B) + ... + p(C) ≥ 1.

3. Conjunction. These are simply separate inequalities, one per conjunct. In particular, a conjunction of disjunctions may be dealt with conjunct by conjunct, producing one disjunctive inequality per conjunct.

4. Arbitrary expressions. These must be put into conjunctive normal form (Chapter 10) by rewriting all connectives as ∧'s and ∨'s. Then (3) applies.

As an example, consider the simple case of two hypotheses A and B, with the single constraint that A ⇒ B. Applying rules 1 through 4 results in the following five inequalities in p(A) and p(B); the first four assure weights in [0,1]. The fifth arises from the logical constraint, since A ⇒ B is the same as B ∨ ~A.
`
0 ≤ p(A)
p(A) ≤ 1
0 ≤ p(B)
p(B) ≤ 1
p(B) + (1 − p(A)) ≥ 1,  or  p(B) ≥ p(A)

These inequalities are shown in Fig. 12.11. As expected from the ⇒ constraint, optimal feasible solutions exist at: (1,1) or (A, B); (0,1) or (~A, B); (0,0) or (~A, ~B). Which of these is preferred depends on the preference vector. If both its components are positive, (A, B) is preferred. If both are negative, (~A, ~B) is preferred, and so on.
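For this two-hypothesis example the choice can be checked by brute force over the feasible vertices (a sketch; the vertex list is read off Fig. 12.11):

```python
def best_vertex(vertices, preference):
    """Pick the feasible vertex farthest in the preference direction,
    i.e., the one maximizing the dot product with the preference vector."""
    def pref(v):
        return sum(vi * pi for vi, pi in zip(v, preference))
    return max(vertices, key=pref)

# Feasible vertices of the A => B region, as points (p(A), p(B)):
vertices = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```

A preference vector of (1, 1) selects (A, B); a vector of (−1, −1) selects (~A, ~B), matching the discussion above.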
A Solution Method

Here we describe (in prose) a search algorithm that can find the optimal feasible solution to the linear programming problem as described above. The description makes use of the mechanical analogy of an N-dimensional solid of feasible solutions, oriented in N-space so that the preference vector induces a "downward" direction in space. The algorithm attempts to move the vector of hypothesis weights to the point in space representing the feasible solution of maximum preference. It should be clear that this is a point on the surface of the solid, and unless the preference vector is normal to a face or edge of the solid, the point is a unique "lowest" vertex.

To establish a potential that leads to feasible solutions, one needs a measure of the infeasibility of a weight vector for each constraint. Define the amount a vector violates a constraint to be zero if it is on the feasible side of the constraint hyperplane. Otherwise the violation is the normal distance of the vector to the hyperplane. If h_i is the coefficient vector of the ith hyperplane (Appendix 1) and w the weight vector, this distance is

d_i = w · h_i    (12.30)
`
Fig. 12.11 The feasible region for two hypotheses A and B and the constraint A ⇒ B. Optimal solutions may occur at the three vertices. The preferred vertex will be the one farthest in the direction of the preference vector, or lowest if the preference vector defines "down."
`
`
`
If we then define the infeasibility as

I = (1/2) Σ_i d_i²    (12.31)

then ∂I/∂d_i = d_i is the rate at which the infeasibility changes with changes in the violation. The force exerted by each constraint is proportional to the normal distance from the weight vector to the feasible region defined by that constraint, and tends to pull the weight vector onto the surface of the solid.

Now add a weak "gravity-like" force in the preference direction to make the weight vector drift to the optimal vertex. At this point an optimization program might perform as shown in Fig. 12.12.

Figure 12.12 illustrates a problem: the forces of preference and constraints will usually dictate a minimum potential outside the solid (in the preference direction). Fixes must be applied to force the weight vector back to the closest (presumably the optimum) vertex. One might round high weights to 1 and low ones to 0, or add another local force to draw vectors toward vertices.
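A toy version of this ball-bearing scheme for the A ⇒ B example (the step size, force balance, and constraint encoding are our choices, not the text's):

```python
def relax_lp(w, constraints, preference, rate=0.1, eps=0.02, iters=800):
    """Roll the weight vector 'downhill': each violated constraint
    h . w >= b pulls the vector back toward its hyperplane by an
    amount proportional to the violation distance, while a weak force
    of size eps drifts it in the preference direction."""
    for _ in range(iters):
        grad = [eps * pk for pk in preference]        # gravity-like force
        for h, b in constraints:
            d = b - sum(hk * wk for hk, wk in zip(h, w))
            if d > 0:                                 # violated by distance d
                for k, hk in enumerate(h):
                    grad[k] += d * hk                 # pull toward feasibility
        w = [wk + rate * gk for wk, gk in zip(w, grad)]
    return w

# The A => B example: w = (p(A), p(B)), constraints in h . w >= b form.
constraints = [((1, 0), 0),    # p(A) >= 0
               ((-1, 0), -1),  # p(A) <= 1
               ((0, 1), 0),    # p(B) >= 0
               ((0, -1), -1),  # p(B) <= 1
               ((-1, 1), 0)]   # p(B) >= p(A)
```

Started at an infeasible point with preference (1, 1), the vector drifts to the neighborhood of the vertex (1, 1), coming to rest slightly outside the unit square where gravity and the constraint forces balance; this illustrates exactly the problem, and the need for rounding, just described.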
Examples

An algorithm based on the principles outlined in the preceding section was successfully used to label scenes of "puppets" such as Fig. 12.13 with body parts [Hinton 1979].

The discrete, consistency-oriented version of line labeling may be extended to incorporate the notion of optimal labelings. Such a system can cope with the explosive increase in consistent