IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 11, NO. 3, MARCH 1989

Algorithmic Techniques for Computer Vision on a Fine-Grained Parallel Machine

Abstract-We describe algorithms for several problems from computer vision, and illustrate how they are implemented using a set of primitive parallel operations. The primitives we use include general permutations, grid permutations, and the scan operation, a restricted form of the prefix computation. We cover well-known problems, allowing us to concentrate on the implementations rather than the problems. First, we describe some simple routines such as border following, computing histograms, and filtering. We then discuss several modules built on these routines, including edge detection, Hough transforms, and connected component labeling. Finally, we describe how these modules are composed into higher level vision modules.

By defining the routines using a set of primitive operations, we abstract away from a particular architecture. In particular, we do not have to worry about features of machines such as the number of processors or whether a tightly connected architecture has a hypercube network or a four-dimensional grid network. We do, however, still need to worry about the relative performance of the primitives on particular machines. We discuss the tradeoffs among primitives and try to identify which primitives are most important for particular problems. All the primitives discussed are supported by the Connection Machine (CM), and we outline how they are implemented. We have implemented most of the algorithms we describe on the Connection Machine.

Index Terms-Algorithms, computational geometry, computational vision, early vision, edge detection, parallel computation, parallel vector model.

I. INTRODUCTION

THE purpose of this paper is to outline the implementation of a set of vision modules on a parallel model of computation and to show the tradeoffs among various parallel primitives. The modules we discuss solve well-known problems, allowing us to concentrate on their implementation. We cover the following problems:

Gaussian convolution
edge detection
border following
Hough transform
connected components
convex hull
Voronoi diagram
generalized Hough transform for recognition
The model we use to describe the algorithms is a parallel vector model of computation [6]. In a parallel vector model (hereafter, vector model), all the primitive operations are defined to work on a vector of atomic values. For example, we might have a primitive that elementwise adds two vectors, or a primitive that sorts a vector. In this paper, we identify a set of useful and efficient primitive operations and illustrate some general problems in which they are useful.

Since the parallelism of vector models comes from operating in parallel over a collection (vector) of atomic values, vector models are fine-grained models. Fine-grained models are well suited to problems where data are numerous and computations on the data elements are similar. Early vision problems have these properties. We show how to express early vision algorithms in terms of routines using primitives of the vector model, and modules composed of these routines. Middle and high level vision problems are not so clearly suited to fine-grained solutions. However, we show how, using the routines and modules of the vector model, these problems can also be formulated in a simple, natural manner for implementation on a fine-grained architecture.
The rest of this paper is organized as follows.

We describe the vector model and the primitive operations we use in this paper. All algorithms described in this paper are built on these primitives.

We outline how these primitive operations are implemented on the Connection Machine (CM). This should give the reader a more tangible idea of how the algorithms map onto a real parallel machine.

We describe a set of elementary routines based on the primitives, including pointer jumping, ordering, region summation, outer product, and computing histograms.

We describe a set of vision modules based on the primitives and the elementary routines, including Gaussian convolution, edge detection, border following, Hough transform, connected components, convex hull, Voronoi diagram, and generalized Hough transform for recognition.

Manuscript received March 1, 1988; revised September 26, 1988. This work was supported in part by the Advanced Research Projects Agency of the Department of Defense under Army Contract DACA76-85-C-0010, in part under Office of Naval Research Contract N00014-85-K-0124, and in part by a grant from Hughes Aircraft Company.
J. J. Little was with the Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. He is now with the Department of Computer Science, University of British Columbia, Vancouver, BC, Canada V6T 1W5.
G. E. Blelloch was with the Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. He is now with the Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
T. A. Cass is with the Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
IEEE Log Number 8825717.
0162-8828/89/0300-0244$01.00 © 1989 IEEE

A. Vector Model

A vector model is defined in terms of a set of primitive operations that require vectors as input and return vectors
as output. A vector is an ordered set of atomic values, such as integers, floating-point values, or boolean values. Each element of a vector has an index. The set of vector primitives supplied by a vector model defines a virtual vector machine. In this section we define the set of vector primitives we use in this paper. These primitives were selected both because they map efficiently onto machine instructions of many existing architectures, and because they prove very powerful in algorithm design.

Many known algorithms solve the problems we consider, but are specialized to a particular architecture, such as a pyramid or mesh-connected computer ([33], [31], [32]). These algorithms typically include a description of primitives similar to those we describe, specialized for the particular architecture. By constructing our algorithms from primitives, we can describe the algorithms simply and uniformly. Of course, this generality does not permit the algorithms to exploit specific properties of a particular architecture, such as its interconnection topology or machine structure. For example, certain tree structures can be mapped to a hypercube so that all directly connected elements in the tree are neighbors in the hypercube. Because our algorithms follow the abstraction of the more general vector model primitives, in some cases they may be suboptimal for a given architecture. These algorithms trade off speed for simplicity and generality. Many of our primitives take O(log n) time on particular networks, such as the hypercube. When mapping onto a particular network, all complexities given in terms of the number of primitive operations can be converted into complexities in word operations simply by multiplying by the log n factor [6].
We believe that highly interconnected multicomputers, which can be used for fine-grained computation, will be the rule rather than the exception. Further, algorithms for these machines should be constructed independent of the particular underlying interconnection topology, since the details of the interconnection scheme may vary due to many technology dependent considerations, such as cost or the scale of the network [4].

We choose a vector model instead of a parallel random access machine (P-RAM) model [15] for three reasons. First, with a vector model it is easier to consider communication primitives which are either less powerful or more powerful than a reference into a shared memory. For example, we consider a primitive that permutes on a grid (this is less powerful than a shared memory reference), and we consider a scan primitive, discussed later, which would require O(log n) time on a P-RAM model. Second, vector models are strictly data parallel models [20], allowing them to be implemented on a broader set of architectures: they can be implemented on single instruction multiple data (SIMD) architectures, on vector architectures, and on multiple instruction multiple data (MIMD) architectures. Third, they are more concrete than the P-RAM models: in the P-RAM models many questions are not well specified, such as how the processors are synchronized and how tasks are allocated to processors.

Since it may be unclear how the vector model maps onto real hardware, later in this section we outline how the vector primitives we describe are implemented on the Connection Machine; Appendix A includes a chart displaying their running times. Most of the algorithms in this paper have been implemented on the Connection Machine.
1) Elementwise Arithmetic and Logical Primitives: Each elementwise primitive operates on equal-length vectors, producing a result vector of equal length. Element i of the result is an elementary arithmetic or logical primitive, such as +, -, *, OR, and NOT, applied to element i of each of the input vectors. For example,

A     = [5  1  3  4  3  9  2  6]
B     = [2  5  3  8  1  3  6  2]
A + B = [7  6  6 12  4 12  8  8]
A x B = [10 5  9 32  3 27 12 12].
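As a concrete illustration, an elementwise primitive can be sketched sequentially; the function name `elementwise` is ours, and a real vector machine would apply the operator across all indices simultaneously rather than in a loop.

```python
def elementwise(op, *vectors):
    """Apply op index by index across equal-length vectors."""
    n = len(vectors[0])
    assert all(len(v) == n for v in vectors), "vectors must have equal length"
    return [op(*args) for args in zip(*vectors)]

A = [5, 1, 3, 4, 3, 9, 2, 6]
B = [2, 5, 3, 8, 1, 3, 6, 2]
add = elementwise(lambda x, y: x + y, A, B)   # the A + B example above
mul = elementwise(lambda x, y: x * y, A, B)   # the A x B example above
```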
2) Permutation Primitives: The permutation primitive takes two vector arguments: a data vector and an index vector. The permutation primitive permutes each element in the data vector to the location specified in the index vector. It is an error for more than one element to have the same index: the permutation must be one-to-one. This restriction is similar to the restriction made in the exclusive read exclusive write (EREW) P-RAM model, in which it is an error to write more than one value to a particular memory location at a time. An example of a permutation is

Index            = [0 1 2 3 4 5 6 7 8 9]
Data             = [g r o t i s h m a l]
I                = [2 4 3 6 5 9 7 8 0 1]
permute(Data, I) = [a l g o r i t h m s].
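The permutation primitive can be sketched sequentially as follows; the one-to-one check mirrors the EREW-style restriction described above.

```python
def permute(data, index):
    """Send data[i] to position index[i]; index must be one-to-one."""
    assert sorted(index) == list(range(len(data))), "index must be a permutation"
    result = [None] * len(data)
    for i, dest in enumerate(index):
        result[dest] = data[i]
    return result

data = list("grotishmal")
idx = [2, 4, 3, 6, 5, 9, 7, 8, 0, 1]
word = "".join(permute(data, idx))   # reproduces the example in the text
```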
The combining primitives also take a data vector and an index vector, but they allow many-to-one mappings: the indexes need not be unique. Values with the same index are combined using a binary associative operator. In this paper, we only use the binary operators +, first, maximum, minimum, and or as combiners; the first operator returns the first of its two arguments. We will henceforth refer to these combine operations as first-combine, +-combine, max-combine, min-combine, and or-combine. Some examples of these primitives follow:

Index = [0 1 2 3 4 5 6 7]
Data  = [5 1 3 4 3 9 2 6]
I     = [2 5 4 3 1 6 3 5]

+-combine(Data, I)   = [0 3 5 6 3 7 9 0]
max-combine(Data, I) = [0 3 5 4 3 6 9 0].
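A sequential sketch of the combining primitives: values routed to the same location are merged with the given associative operator. The choice of 0 for locations that receive nothing matches the examples above, though the text does not state a general rule for empty locations.

```python
def combine(op, data, index, default=0):
    """Route data[i] to index[i], merging collisions with op."""
    result = [None] * len(data)
    for val, dest in zip(data, index):
        result[dest] = val if result[dest] is None else op(result[dest], val)
    return [default if r is None else r for r in result]

data = [5, 1, 3, 4, 3, 9, 2, 6]
idx  = [2, 5, 4, 3, 1, 6, 3, 5]
plus_comb = combine(lambda a, b: a + b, data, idx)
max_comb  = combine(max, data, idx)
```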
The first-combine primitive is analogous to the concurrent write (CW) primitive of the P-RAM model. Since some machines support a permutation instruction but do not support concurrent writes, we use the combine primitives sparingly and suggest alternative schemes in those cases. Even on architectures without specific hardware to support concurrent writes, the combine primitives are relatively efficient when the number of data elements combined at each destination is small.

To allow communication between vectors of different sizes, we include a version of the permute primitive that returns a vector of different length than the source vectors. This version takes two extra arguments: one that specifies the length of the destination, and a Boolean vector that specifies which elements appear in the result.

3) Grid Permutation Primitives: On many architectures, permutations to nearest neighbors on a grid can be implemented significantly more efficiently than a general vector permutation. This is especially true for machines with processors embedded on a grid. We therefore consider grid permutations separately. A grid permutation maps a vector onto a grid and permutes elements to the closest neighbor in some direction on the grid. One way to map a vector onto the grid is in row major order. For example, a 2 x 4 grid might be mapped onto a vector as follows:

[5 1 3 4
 3 9 2 6]

XIndex = [0 1 2 3 0 1 2 3]
YIndex = [0 0 0 0 1 1 1 1]
C      = [5 1 3 4 3 9 2 6].

In this paper we only consider two-dimensional grids, and we use the primitives x-up, x-down, y-up, and y-down to permute in the four possible directions. For simplicity, we assume that the values wrap around in each dimension, but this feature is not important in the algorithms we describe.

4) Scan Primitives: The scan primitives execute a scan operation, sometimes called a prefix computation [23], [24], on a vector. The scan operation takes a binary associative operator ⊕ and a vector [a_0, a_1, ..., a_{n-1}] of n elements, and returns the vector [a_0, (a_0 ⊕ a_1), ..., (a_0 ⊕ a_1 ⊕ ... ⊕ a_{n-1})]. In this paper, we will only use +, maximum, minimum, or, and, and first as operators for the scan primitives (the operator first returns the first of its two arguments). We will henceforth call these scan operations +-scan, max-scan, min-scan, or-scan, and-scan, and first-scan. Some examples are as follows:

A             = [5 1 3 4  3  9  2  6]
+-scan(A)     = [5 6 9 13 16 25 27 33]
max-scan(A)   = [5 5 5 5  5  9  9  9]
first-scan(A) = [5 5 5 5  5  5  5  5]
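A sequential sketch of the inclusive scan just defined; the function name `scan` is ours, and the O(log n) parallel tree implementation is described later in the Connection Machine section.

```python
def scan(op, a):
    """Inclusive prefix computation: out[i] = a[0] op a[1] op ... op a[i]."""
    out, acc = [], None
    for x in a:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out

A = [5, 1, 3, 4, 3, 9, 2, 6]
plus_scan  = scan(lambda x, y: x + y, A)
max_scan   = scan(max, A)
first_scan = scan(lambda x, y: x, A)   # "first" keeps its first argument
```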
We will also use a version of each scan that runs in the reverse order, starting at the end of the vector. We call these versions backscans. In some algorithms, we will use enumerate to specify a +-scan in which each element has the value 1 or 0. The enumerate operation has the effect of giving sequential numbers to the elements with a 1.

The grid scan primitives are analogous to the vector scans but operate on a grid ordering instead of the vector ordering: each column or row can be viewed as a vector. For example,

C = [5 1 3 4
     3 9 2 6]

x-+-scan(C) = [5  6  9 13
               3 12 14 20]
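The row-wise grid scan above can be sketched by running an independent inclusive +-scan along each row; the function name `x_plus_scan` is ours.

```python
def x_plus_scan(grid):
    """Independent inclusive +-scan along each row of a 2-D grid."""
    out = []
    for row in grid:
        acc, scanned = 0, []
        for v in row:
            acc += v
            scanned.append(acc)
        out.append(scanned)
    return out

C = [[5, 1, 3, 4],
     [3, 9, 2, 6]]
result = x_plus_scan(C)
```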
5) Global Primitives: The global primitives reduce the values in a vector using a binary associative operator. As with the scan and combine primitives, we only use the operators +, maximum, minimum, or, and, and first. Some examples are as follows:

A             = [5 1 3 4 3 9 2 6]
+-reduce(A)   = 33
max-reduce(A) = 9.

The global primitives can be implemented with either a scan or a combine operation, but since global primitives might be more efficient if implemented directly, we consider them separately.
6) Segmented Primitives: A vector can be broken into contiguous segments in which the beginning of each segment is marked with a flag. For example,

A             = [5 1] [3 4 3 9] [2 6]
Segment-Flags = [T F  T F F F  T F].

We can define segmented versions of both the permutation primitives and the scan primitives which work independently within each segment. For example,
Segment-Flags = [T F  T F F F   T F]
A             = [5 1] [3 4 3 9] [2 6]
B             = [1 0] [2 0 3 1] [0 1]
+-scan(A)     = [5 6] [3 7 10 19] [2 8]
max-scan(A)   = [5 5] [3 4 4 9] [2 6]
permute(A, B) = [1 5] [4 9 3 3] [2 6].

The segmented primitives are very useful when working on many sets of data. A more complete description of the use of segments can be found in [6].
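A sequential sketch of a segmented scan: the running accumulator simply restarts wherever a segment flag is set. The function name `segmented_scan` is ours.

```python
def segmented_scan(op, a, flags):
    """Inclusive scan that restarts at each True flag (segment start)."""
    out, acc = [], None
    for x, f in zip(a, flags):
        acc = x if f or acc is None else op(acc, x)
        out.append(acc)
    return out

A     = [5, 1, 3, 4, 3, 9, 2, 6]
flags = [True, False, True, False, False, False, True, False]
seg_plus = segmented_scan(lambda x, y: x + y, A, flags)
seg_max  = segmented_scan(max, A, flags)
```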

B. Primitives on the Connection Machine

In this section, we outline how each primitive operation described in the previous section is implemented on the Connection Machine. Although this description can be found elsewhere [19], [5], we believe it is important to give the reader a feeling for how the problems we discuss actually map onto real hardware. Appendix A includes actual running times for each primitive on the Connection Machine.

The Connection Machine [19] is a fine-grained single instruction multiple data (SIMD) parallel computer with between 8K and 64K processors (see Fig. 1). Each processor is a 1 bit serial processor, with 64K bits of local memory, an 8 MHz clock, and optional floating point hardware. All processors are controlled by a microcontroller which receives macro instructions from a front end computer. Physically, the processors are organized into a hypercube network with 16 processors at each node (corner) of a 12-dimensional hypercube. Logically, each processor has a unique 16 bit address; in the interconnection network, two processors are directly connected when their addresses differ in exactly one bit. The wires of the hypercube network are shared by the grid permutation primitives, the scan primitives, and the permutation and combine primitives.

The Connection Machine supports a virtual vector machine by distributing the elements of a vector across the processors. When using a vector of length m on an n processor CM, ⌈m/n⌉ vector elements are placed in the memory of each processor. Each processor is responsible for the vector elements in its memory. The mapping of vectors onto processors is supported in microcode and is transparent to the user.
1) Elementwise Primitives: The elementwise vector primitives are implemented on the CM by loading an element of each argument vector from the memory of each processor into the processor, operating on these elements, and storing the result back to memory. The software guarantees that elements with the same index from different equal-length vectors are located in the same processor memory. This guarantees that the elementwise primitives never require communication among processors. When there are many elements per processor, each processor loops over all of its elements.
[Fig. 1. Block diagram of the Connection Machine (CM-2): front end, microcontroller, memory/processors, communications network.]

2) Permutations: The vector permutation primitives
and the vector combining primitives are supported by the routing hardware of the Connection Machine. The router uses a packet-switched message routing scheme that directs messages along the hypercube wires to their destinations. The combine primitives are supported with combining router switches similar to those suggested for the NYU Ultracomputer [17].

Vector permutations, which take O(1) time in the vector model, are simulated on the Connection Machine in O(log n) time with high probability. As with the elementwise primitives, when there are more vector elements than processors, the CM loops over the elements in each processor.

3) Grid Permutations: The grid permutations are supported on the CM using the same hypercube wires as the router. Each dimension of a grid is placed in Gray code order with respect to the hypercube ordering. A Gray code assigns an index to each element in the grid such that the indices of neighboring grid elements differ in only one bit; then, neighboring grid elements are separated by a single wire. Because of this ordering, the grid permutation primitives are significantly faster than the general permutation primitive.
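The standard binary-reflected Gray code, i XOR (i >> 1), has exactly this property; the sketch below (our own naming; the text does not specify which Gray code the CM microcode uses) checks that consecutive grid indices map to hypercube addresses one bit apart.

```python
def gray(i):
    """Binary-reflected Gray code of index i."""
    return i ^ (i >> 1)

codes = [gray(i) for i in range(8)]
# neighboring grid positions land on hypercube addresses differing in one bit,
# so each grid step crosses a single hypercube wire
for a, b in zip(codes, codes[1:]):
    assert bin(a ^ b).count("1") == 1
```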
4) Scan Operations: The scan vector primitives are implemented on the CM using a binary tree algorithm. This algorithm uses the same wires as the router but does not use the routing hardware. Fig. 2 shows an example of the tree algorithm used to implement the scan operation. The algorithm works in two sweeps. The up sweep sums elements from below and places the value of the left child in the memory of each node. The down sweep sends down the left branch the value from above, and sends down the right branch the sum of the value from above with the contents of the memory.

The scan primitives take O(log n) time and in practice are usually faster than the general permutation primitive. When there are more vector elements than processors, the CM first scans the elements in each processor, then does a single scan across the processors, and finally distributes the result back across the elements. This relies on contiguous elements of a vector being laid out in a single processor.
[Fig. 2. Implementing the +-scan operation on a tree: values after the up sweep and after the down sweep. The result of the scan is the result of the down sweep plus the original values (see text for explanation).]

Grid scans are implemented in a similar way: a tree is used for each row or each column.

II. ROUTINES

The primitives described in the previous section simplify implementing more complex manipulations of data structures. Routines are functions composed of the various primitives; the routines form important building blocks for the vision algorithms given later. These routines include:

pointer jumping
ordering
region summation
outer product
histograms

A. Pointer Jumping

In vision algorithms, as with algorithms in general, it is often necessary to operate on a set of linked cycles or linked lists of elements. By a set of linked cycles we mean a vector in which each element has a pointer (an index) to another element. All the pointers must be unique, therefore forming cycles of pointers.¹ For example, the vector

Index         = [0 1 2 3 4 5 6 7]
Linked Cycles = [4 0 5 7 6 3 1 2]

defines the two cycles

0 → 4 → 6 → 1 → 0
7 → 2 → 5 → 3 → 7.

In a set of linked lists we allow certain elements to contain a null pointer, so, for example, the vector

Index        = [0 1 2 3 4 5 6 7]
Linked Lists = [4 n 5 n 6 3 1 2]

defines the two linked lists

0 → 4 → 6 → 1
7 → 2 → 5 → 3.

In this section, we consider the problem of executing a prefix operation on a set of linked lists so that each element gets the result of applying some binary associative operator to all previous elements in the list. The scan primitives introduced earlier can be considered the special case of such an operation in which all pointers point to the next element in the vector: the scans have an implicit ordering of the elements while the prefix on a list has an explicit (data dependent) ordering.² By designating an element on each linked cycle as the head of the cycle, we can also compute a prefix computation on a set of linked cycles: the prefix is defined to start at the head and go around the cycle back to the head.

The prefix computation on a set of linked cycles can be calculated using pointer jumping [40], also known as recursive doubling. For a vector of length n, pointer jumping requires at most O(log n) steps. Each step requires two calls to the permute primitive and a few elementwise operations. In the remainder of this section, we show how pointer jumping can be used to calculate a prefix operation at the same time as determining a head element for each cycle. Given a unique key for each element (the element index will do), we define the head of each ring to be the element of the ring with the largest key. The prefix computation we use in the example is the rank computation, where each element determines its position in the cycle: this is the special case of a prefix in which the operator is + and the values are all 1.

Fig. 3 shows an example of pointer jumping on a cycle of eight values. At each step of the algorithm, we keep three vectors: next, head, and position. Initially, the value in next for each element points to the next element in the cycle, head contains the key of the element, and position is zero. At step i of the algorithm, next points to the element 2^i positions ahead in the cycle, head contains the largest key of all the elements preceding the given element by up to 2^i positions, and position contains the position relative to the current head. After O(log n) steps, head contains the largest key in the ring and position contains the position of each element relative to the head of the ring.

At each step, head, position, and the element index (a return pointer) are permuted to the element index specified by next. This moves the data in the ring, and gives the recipient element a pointer back to the sending element. Each element now generates a new head value by taking the maximum of its head value and the head value it received from the other element. Each element also generates a new position in the following way:

if (other-head > head)
then position ← other-position + 2^i
else position ← position.

Each element now passes its next back to the element specified by the return pointer; this doubles the distance of each pointer. After O(log n) steps, this procedure will be complete. In the next section we show how this routine can be used to reorder a set of data.

¹It is interesting to note that the vector that defines a set of linked cycles also defines a permutation: the two types of vectors must have the same properties.

²In [23], the authors call the two types of prefix computations data dependent prefix computations and data independent prefix computations.
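A sequential sketch of pointer jumping for the rank computation described above, on the two linked cycles from the earlier example. Each round stands in for the two permute calls per step; the function name `rank_in_cycles` is ours, and keys are element indices, so the head of each cycle is its largest index.

```python
def rank_in_cycles(nxt):
    """Pointer jumping: returns (head, position) vectors for linked cycles."""
    n = len(nxt)
    head = list(range(n))   # largest key seen among recent predecessors
    pos = [0] * n           # distance from the element holding that key
    step = 1
    while step < n:
        new_head, new_pos = head[:], pos[:]
        for i in range(n):
            j = nxt[i]      # the element 2^k ahead receives (head, position)
            if head[i] > head[j]:
                new_head[j] = head[i]
                new_pos[j] = pos[i] + step
        head, pos = new_head, new_pos
        nxt = [nxt[nxt[i]] for i in range(n)]   # double the pointer distance
        step *= 2
    return head, pos

# cycles 0 -> 4 -> 6 -> 1 -> 0 and 7 -> 2 -> 5 -> 3 -> 7
head, pos = rank_in_cycles([4, 0, 5, 7, 6, 3, 1, 2])
```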
[Fig. 3. Pointer jumping: (a) shows the initial ring, with clockwise pointers. The outer numbers are the head values; the inner, the position. (b), (c), and (d) show the connections and how the values change in steps 1, 2, and 3.]

B. Ordering

Ordering transforms a set of elements having explicit order, represented as linked lists or cycles, into contiguous segments with implicit order (see Fig. 4). For example, the set of edge pixels in an image can be locally formed into a linked list by local grid permutations. It may be necessary, for example during output, to reorder the points on the edges into a serial stream. Ordering is also generally useful as a preliminary step for reconfiguring elements so that scan operations can be applied.

The ordering procedure consists of the following steps.
1) Number elements in the list; compute list length.
2) Allocate a segment for each list, thereby generating offsets.
3) Distribute the list offset to list elements.
4) Add the indexes within each list to the offset of each list.

The input to the ordering procedure is two vectors: A, a value vector, and I, a vector of pointers linking the elements. First, pointer jumping, as in the example in the preceding section, constructs H, a vector whose elements are the indexes of the head of each connected group under I, and N, a vector representing the position of each element in its list [Fig. 4(a)]. Next, the value of the maximum position in each list moves to its head, either by pointer jumping or by a max-combine operation on N using H as index vector. This constructs a vector L containing the maximum position of the list at the head element, and 0 otherwise; the length of the list is obtained by adding one to the maximum position [Fig. 4(b)]. The following operations compute the offset for consecutive elements in the new order. A +-scan operation on L constructs the vector L'; the offset E for each segment is obtained by subtracting L elementwise from L' [see Fig. 4(c)]. Pointer jumping on E, using I as index vector and first as the operator, distributes the offset to all elements in the list; this produces a new vector E'. Finally, each element finds its index O in the new order by adding E' elementwise to N [Fig. 4(d)]. A permute operation using O as index vector reorders the data elements in the vector A serially. Ordering requires an initial pointer jumping step, a second pointer jumping step (or a max-combine), one +-scan, and a final pointer jumping step to generate the order vector.

[Fig. 4. Ordering curves: (a) N, positions of elements in curves. (b) L, lengths of curves at head. (c) E, offsets of curves. (d) O = E' + N, indexes in new order.]
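The steps above can be sketched sequentially. We take the head and position vectors produced by pointer jumping as input; the function name `ordering` and the loop-based bookkeeping are ours, standing in for the max-combine, +-scan, and pointer jumping steps described in the text.

```python
def ordering(heads, pos):
    """Compute each element's index in a serialized output.

    heads[i]: index of the head of i's list; pos[i]: i's position in its list.
    """
    n = len(heads)
    length = [0] * n                 # stand-in for max-combine of positions at heads
    for i in range(n):
        length[heads[i]] = max(length[heads[i]], pos[i] + 1)
    offset, acc = {}, 0              # stand-in for the +-scan over list lengths
    for h in range(n):
        if length[h]:
            offset[h] = acc
            acc += length[h]
    # each element adds its list's offset to its position within the list
    return [offset[heads[i]] + pos[i] for i in range(n)]

# head/position vectors from the pointer-jumping example above
heads = [6, 6, 7, 7, 6, 7, 6, 7]
pos   = [2, 1, 1, 3, 3, 2, 0, 0]
order = ordering(heads, pos)
```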
C. Region Summation

Grid scans are useful for many modules in early vision. A useful procedure in early vision requires each pixel in a grid to sum a (2m + 1) x (2m + 1) square region around it. This procedure can be implemented using a constant number of grid scans and permutations (see Fig. 5). First, for each element, find the sum of the 2m pixels around it along a row by the following steps: perform a grid scan using + in x, then permute elements at ±m offset in x in the scan result to the central element, and take their difference. Repeat this in the y direction on the result of the first step. This procedure based on scan operations takes a constant amount of time, independent of m. For small regions a simple grid shift and add method might be faster.³ Boxcar convolution, described in Section IV-A, is efficiently implemented using this technique.
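The scan-and-difference trick can be sketched as follows; the function name `region_sum` is ours. For simplicity this sketch clamps the window at the borders, whereas the text assumes wraparound; the cost is independent of m either way.

```python
def region_sum(grid, m):
    """Sum the (2m+1) x (2m+1) neighborhood of every pixel via prefix sums."""
    def windowed_rows(g):
        # +-scan along each row, then a difference of prefix sums 2m+1 apart
        out = []
        for row in g:
            pre = [0]
            for v in row:
                pre.append(pre[-1] + v)
            n = len(row)
            out.append([pre[min(x + m + 1, n)] - pre[max(x - m, 0)]
                        for x in range(n)])
        return out
    horiz = windowed_rows(grid)                            # sums along x
    vert = windowed_rows([list(c) for c in zip(*horiz)])   # then along y
    return [list(c) for c in zip(*vert)]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
sums = region_sum(img, 1)
```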
³On the Connection Machine we found the tradeoff at approximately a 10 x 10 grid.

D. Outer Product

The outer product is the Cartesian product of two sets: {<a_i, b_j>} = {a_i} x {b_j}. The outer product has many applications, for example, in an implementation of the Hough transform described in Section V. Given two vectors A and B, where A has m elements and B has n
[Fig. 5. Region summation: the sum of square regions. A +-scan in x is followed by two permute operations, then a subtraction. Then a +-scan on this result in y is followed again by two permute operations, and a subtraction.]
elements, the outer product of A and B is the vector R, of length mn, whose elements are all pairs of elements of A and B. The outer product is implemented with a constant number of scan and permutation operations. In the outer product, each element a_i of A must be copied n times, and each element b_j of B must be copied m times (see Fig. 6). The first step creates two expanded vectors. Each element a_i of A is permuted to index ni. This creates a new vector A1, in which each element a_i of A is followed by n - 1 empty elements. Each nonempty element begins a new segment. A segmented first-scan copies these elements, so that, in A2, each a_i is represented n times in a contiguous block. A similar operation replicates each element of B m times in a new vector B2. A permute operation moves the ith element of B2 to a new index j, where j = n(i mod m) + ⌊i/m⌋, producing a new vector B3. Finally, the pair of vectors A2 and B3 is the desired outer product.
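A sequential sketch of this construction, using the j = n(i mod m) + ⌊i/m⌋ permutation from the text; the function name `outer_product` is ours, and the replications stand in for the segmented first-scans.

```python
def outer_product(A, B):
    """Form all pairs <a_i, b_j> via segmented copies and one permute."""
    m, n = len(A), len(B)
    A2 = [a for a in A for _ in range(n)]   # each a_i copied n times (A2)
    B2 = [b for b in B for _ in range(m)]   # each b_j copied m times (B2)
    B3 = [None] * (m * n)
    for i, b in enumerate(B2):              # permute: j = n*(i mod m) + i//m
        B3[n * (i % m) + i // m] = b
    return list(zip(A2, B3))

pairs = outer_product([1, 2], ["x", "y", "z"])
```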

E. Histograms

A histogram of a finite set is a function which gives the frequency of occurrence of each value in the set. The histogram of the gray-level values in a k bit image, for example, is a table H of length 2^k in which each element H_i is a count of the number of pixels having value i. Viewed in the vector model, a histogram maps a vector K to a new vector H where the element of H at index i represents the number of elements of K with value i.

There are several ways to compute the histogram in the vector model, including by sorting, by the combine primitives, by counting, and by probing. Each displays different strengths.
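The combine-primitive approach can be sketched directly: a +-combine of all-ones data, using the values themselves as the index vector, counts occurrences at each value's location. The function name `histogram` is ours.

```python
def histogram(K, k_bits):
    """Histogram via a +-combine: each element adds 1 at index = its value."""
    H = [0] * (1 << k_bits)     # table of length 2^k
    for v in K:                 # +-combine of 1s, indexed by value
        H[v] += 1
    return H

K = [3, 1, 3, 0, 2, 3, 1, 1]    # 2-bit values
H = histogram(K, 2)
```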
1) Histogram by Sorting: To histogram a vector K by sorting, we first generate the vector S = sort(K) to configure the elements as in Fig. 7. We create a Boo