IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 11, NO. 3, MARCH 1989

Algorithmic Techniques for Computer Vision on a Fine-Grained Parallel Machine

Abstract-We describe algorithms for several problems from computer vision, and illustrate how they are implemented using a set of primitive parallel operations. The primitives we use include general permutations, grid permutations, and the scan operation, a restricted form of the prefix computation. We cover well-known problems, allowing us to concentrate on the implementations rather than the problems. First, we describe some simple routines such as border following, computing histograms, and filtering. We then discuss several modules built on these routines, including edge detection, Hough transforms, and connected component labeling. Finally, we describe how these modules are composed into higher level vision modules.

By defining the routines using a set of primitive operations, we abstract away from a particular architecture. In particular, we do not have to worry about features of machines such as the number of processors or whether a tightly connected architecture has a hypercube network or a four-dimensional grid network. We do, however, still need to worry about the relative performance of the primitives on particular machines. We discuss the tradeoffs among primitives and try to identify which primitives are most important for particular problems. All the primitives discussed are supported by the Connection Machine (CM), and we outline how they are implemented. We have implemented most of the algorithms we describe on the Connection Machine.
Index Terms-algorithms, computational geometry, computational vision, early vision, edge detection, parallel computation, parallel vector model.

I. INTRODUCTION

THE purpose of this paper is to outline the implementation of a set of vision modules on a parallel model of computation and to show the tradeoffs among various parallel primitives. The modules we discuss solve well-known problems, allowing us to concentrate on their implementation.
We cover the following problems:

    Gaussian convolution
    edge detection
    border following
    Hough transform
    connected components
    convex hull
    Voronoi diagram
    generalized Hough transform for recognition
The model we use to describe the algorithms is a parallel vector model of computation [6]. In a parallel vector model (hereafter, vector model), all the primitive operations are defined to work on a vector of atomic values. For example, we might have a primitive that elementwise adds two vectors, or a primitive that sorts a vector. In this paper, we identify a set of useful and efficient primitive operations and illustrate some general problems in which they are useful.

Since the parallelism of vector models comes from operating in parallel over a collection (vector) of atomic values, vector models are fine-grained models. Fine-grained models are well suited to problems where data are numerous and computations on the data elements are similar. Early vision problems have these properties. We show how to express early vision algorithms in terms of routines using primitives of the vector model, and modules composed of these routines. Middle and high level vision problems are not so clearly suited to fine-grained solutions. However, we next show how, using the routines and modules in the vector model, these problems can also be formulated in a simple, natural manner for implementation on a fine-grained architecture.
The rest of this paper is organized as follows. First, we describe the vector model and the primitive operations we use in this paper; all algorithms described in this paper are built on these primitives. Next, we outline how these primitive operations are implemented on the Connection Machine (CM); this should give the reader a more tangible idea of how the algorithms map onto a real parallel machine. We then describe a set of elementary routines based on the primitives, including pointer jumping, ordering, region summation, outer product, and computing histograms. Finally, we describe a set of vision modules based on the primitives and the elementary routines, including Gaussian convolution, edge detection, border following, Hough transform, connected components, convex hull, Voronoi diagram, and generalized Hough transform for recognition.
Manuscript received March 1, 1988; revised September 26, 1988. This work was supported in part by the Advanced Research Projects Agency of the Department of Defense under Army Contract DACA76-85-C-0010, in part under Office of Naval Research Contract N00014-85-K-0124, and in part by a grant from Hughes Aircraft Company.

J. J. Little was with the Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. He is now with the Department of Computer Science, University of British Columbia, Vancouver, BC, Canada V6T 1W5.

G. E. Blelloch was with the Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. He is now with the Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

T. A. Cass is with the Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

IEEE Log Number 8825717.

0162-8828/89/0300-0244$01.00 © 1989 IEEE

A. Vector Model

A vector model is defined in terms of a set of primitive operations that require vectors as input and return vectors as output. A vector is an ordered set of atomic values, such as integers, floating point, or boolean values. Each element of a vector has an index. The set of vector primitives supplied by a vector model defines a virtual vector machine. In this section we define a set of vector primitives which we use in this paper. The primitives we use were selected both because they map efficiently onto machine instructions of many existing architectures, and because they prove very powerful in algorithm design.
Many known algorithms solve the problems we consider, but are specialized to a particular architecture, such as a pyramid or mesh-connected computer ([33], [31], [32]). These algorithms typically include a description of primitives similar to those we describe, specialized for the particular architecture. By constructing our algorithms from primitives, we can describe the algorithms simply and uniformly. Of course, this generality does not permit the algorithms to exploit specific properties of a particular architecture, its interconnection topology, or the machine structure. For example, certain tree structures can be mapped to a hypercube so that all directly connected elements in the tree are neighbors in the hypercube. Because our algorithms follow the abstraction of the more general vector model primitives, in some cases they may be suboptimal for a given architecture. These algorithms trade off speed for simplicity and generality. Many of our primitives take O(log n) time on particular networks, such as the hypercube. When mapping onto a particular network, all complexities given in terms of the number of primitive operations can be converted into complexities in word operations simply by multiplying by the log n factor [6].
We believe that highly interconnected multicomputers that can be used for fine-grained computation will be the rule rather than the exception. Further, algorithms for these machines should be constructed independent of the particular underlying interconnection topology, since the details of the interconnection scheme may vary due to many technology dependent considerations, such as cost or the scale of the network [4].
We choose a vector model instead of a parallel random access machine (P-RAM) model [15] for three reasons. First, with a vector model it is easier to consider communication primitives which are either less powerful or more powerful than a reference into a shared memory. For example, we consider a primitive that permutes on a grid, which is less powerful than a shared memory reference, and we consider a scan primitive, discussed later, which would require O(log n) time on a P-RAM model. Second, vector models are strictly data parallel models [20], allowing them to be implemented on a broader set of architectures: they can be implemented on single instruction multiple data (SIMD) architectures, on vector architectures, and on multiple instruction multiple data (MIMD) architectures. Third, they are more concrete than the P-RAM models: in the P-RAM models many questions are not well specified, such as how the processors are synchronized and how tasks are allocated to processors.
`
Since it may be unclear how the vector model maps onto real hardware, later in this section we outline how the vector primitives we describe are implemented on the Connection Machine; Appendix A includes a chart displaying their running times. Most of the algorithms in this paper have been implemented on the Connection Machine.
1) Elementwise Arithmetic and Logical Primitives: Each elementwise primitive operates on equal length vectors, producing a result vector of equal length. Element i of the result is an elementary arithmetic or logical primitive, such as +, -, *, OR, and NOT, applied to element i of each of the input vectors. For example,

    A     = [5  1  3  4  3  9  2  6]
    B     = [2  5  3  8  1  3  6  2]
    A + B = [7  6  6 12  4 12  8  8]
    A × B = [10 5  9 32  3 27 12 12].
2) Permutation Primitives: The permutation primitive takes two vector arguments: a data vector and an index vector. The permutation primitive permutes each element in the data vector to the location specified in the index vector. It is an error for more than one element to have the same index: the permutation must be one-to-one. This restriction is similar to the restriction made in the exclusive read exclusive write (EREW) P-RAM model, in which it is an error to write more than one value to a particular memory location at a time. An example of a permutation is

    Index = [0 1 2 3 4 5 6 7 8 9]
    Data  = [g r o t i s h m a l]
    I     = [2 4 3 6 5 9 7 8 0 1]
    permute(Data, I) = [a l g o r i t h m s].
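As a concrete sequential sketch (ours, not the paper's; the paper gives no code), the permutation primitive can be written in a few lines of Python. The function name `permute` and the one-to-one check mirror the description above.

```python
def permute(data, index):
    """Send data[i] to position index[i]; the mapping must be one-to-one."""
    assert sorted(index) == list(range(len(data))), "permutation must be one-to-one"
    result = [None] * len(data)
    for i, j in enumerate(index):
        result[j] = data[i]
    return result
```

Applied to the example above, `permute(list("grotishmal"), [2, 4, 3, 6, 5, 9, 7, 8, 0, 1])` yields the letters of "algorithms" in order.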
The combining primitives also take a data vector and an index vector, but they allow many-to-one mappings: the indexes need not be unique. Values with the same index are combined using a binary associative operator. In this paper, we only use the binary operators +, first, maximum, minimum, and or as combiners; the first operator returns the first of its two arguments. We will henceforth refer to these combine operations as first-combine, +-combine, max-combine, min-combine, and or-combine. Some examples of these primitives follow:

    Index = [0 1 2 3 4 5 6 7]
    Data  = [5 1 3 4 3 9 2 6]
    I     = [2 5 4 3 1 6 3 5]
    +-combine(Data, I)   = [0 3 5 6 3 7 9 0]
    max-combine(Data, I) = [0 3 5 4 3 6 9 0].
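A sequential Python sketch of the combining primitives follows (our code and naming, not the paper's). Destinations that receive no value are filled with an identity element; the examples above show 0 in empty slots, so we use 0 as the identity here, which is only an assumption for operators other than +.

```python
def combine(data, index, op, identity=0):
    """Route data[i] to position index[i]; values landing on the same
    position are folded together with the binary associative operator op."""
    result = [identity] * len(data)
    for i, j in enumerate(index):
        result[j] = op(result[j], data[i])
    return result
```

With the data above, `combine` with addition reproduces +-combine, and with `max` reproduces max-combine (the identity 0 is harmless here because all data values are positive).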
`
The first-combine primitive is analogous to the concurrent write (CW) primitive of the P-RAM model. Since some machines support a permutation instruction but do not support concurrent writes, we use the combine primitives sparingly and suggest alternative schemes even in those cases. Even in architectures without specific hardware to support concurrent writes, the combine primitives are relatively efficient when the number of data elements combined at the destination is small.

To allow communication between vectors of different sizes, we include a version of the permute primitive that returns a vector of different length than the source vectors. This version takes two extra arguments: one that specifies the length of the destination, and a Boolean vector that specifies which elements appear in the result.

3) Grid Permutation Primitives: On many architectures, permutations to nearest neighbors on a grid can be implemented significantly more efficiently than a general vector permutation. This is especially true for machines with processors embedded on a grid. We therefore consider grid permutations separately. A grid permutation maps a vector onto a grid and permutes elements to the closest neighbor in some direction on the grid. One way to map a vector onto the grid is in row major order. For example, a 2 × 4 grid might be mapped onto a vector as follows:

    [5 1 3 4]
    [3 9 2 6]
    XIndex = [0 1 2 3 0 1 2 3]
    YIndex = [0 0 0 0 1 1 1 1]
    C      = [5 1 3 4 3 9 2 6].

In this paper we only consider two-dimensional grids, and we use the primitives x-up, x-down, y-up, and y-down to permute in the four possible directions. For simplicity, we assume that the values wrap around in each dimension, but this feature is not important in the algorithms we describe.

4) Scan Primitives: The scan primitives execute a scan operation, sometimes called a prefix computation [24], [23], on a vector. The scan operation takes a binary associative operator ⊕ and a vector [a₀, a₁, ..., aₙ₋₁] of n elements, and returns the vector [a₀, (a₀ ⊕ a₁), ..., (a₀ ⊕ a₁ ⊕ ... ⊕ aₙ₋₁)]. In this paper, we will only use +, maximum, minimum, or, and, and first as operators for the scan primitives (the operator first returns the first of its two arguments). We will henceforth call these scan operations +-scan, max-scan, min-scan, or-scan, and-scan, and first-scan. Some examples are as follows:

    A             = [5 1 3  4  3  9  2  6]
    +-scan(A)     = [5 6 9 13 16 25 27 33]
    max-scan(A)   = [5 5 5  5  5  9  9  9]
    first-scan(A) = [5 5 5  5  5  5  5  5].

We will also use a version of each scan that runs in the reverse order, starting at the end of the vector. We call these versions backscans. In some algorithms, we will use enumerate to specify a +-scan in which each element has the value 1 or 0. The enumerate operation has the effect of giving sequential numbers to the elements with a 1.

The grid scan primitives are analogous to the vector scans but operate on a grid ordering instead of the vector ordering: each column or row can be viewed as a vector. For example,

    C = [5 1 3 4]    x-+-scan(C) = [5  6  9 13]
        [3 9 2 6]                  [3 12 14 20].
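The scan primitives above are inclusive (the first output element is a₀ itself). A minimal sequential sketch in Python (our code, not the paper's) is:

```python
def scan(a, op):
    """Inclusive scan: out[i] = a[0] op a[1] op ... op a[i]."""
    out, acc = [], None
    for x in a:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out
```

A grid scan such as x-+-scan is then just this scan applied independently to each row of the grid; first-scan corresponds to the operator that keeps its first argument.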
5) Global Primitives: The global primitives reduce the values in a vector using a binary associative operator. As with the scan and combine primitives, we only use the operators +, maximum, minimum, or, and, and first. Some examples are as follows:

    A = [5 1 3 4 3 9 2 6]
    +-reduce(A)   = 33
    max-reduce(A) = 9.

The global primitives can be implemented with either a scan or a combine operation, but since global primitives might be more efficient if implemented directly, we consider them separately.
6) Segmented Primitives: A vector can be broken into contiguous segments in which the beginning of each segment is marked with a flag. For example,

    A             = [5 1] [3 4 3 9] [2 6]
    Segment-Flags = [T F   T F F F   T F].

We can define segmented versions of both the permutation primitives and the scan primitives which work independently within each segment. For example,
`
    Segment-Flags = [T F   T F F F    T F]
    A             = [5 1] [3 4  3  9] [2 6]
    B             = [1 0] [2 0  3  1] [0 1]
    +-scan(A)     = [5 6] [3 7 10 19] [2 8]
    max-scan(A)   = [5 5] [3 4  4  9] [2 6]
    permute(A, B) = [1 5] [4 9  3  3] [2 6].

The segmented primitives are very useful when working on many sets of data. A more complete description of the use of segments can be found in [6].
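A segmented scan can be sketched sequentially by restarting the accumulator at every segment flag. This is our illustration, not the paper's code; flags are booleans with True marking the start of a segment.

```python
def segmented_scan(a, flags, op):
    """Inclusive scan that restarts wherever flags[i] is True."""
    out, acc = [], None
    for x, f in zip(a, flags):
        acc = x if (f or acc is None) else op(acc, x)
        out.append(acc)
    return out
```

On the example above, a segmented +-scan of A with the given flags produces [5 6] [3 7 10 19] [2 8] as a flat vector.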
`
B. Primitives on the Connection Machine

In this section, we outline how each primitive operation described in the previous section is implemented on the Connection Machine. Although this description can be found elsewhere [19], [5], we believe it is important to give the reader a feeling for how the problems we discuss actually map onto real hardware. Appendix A includes actual running times for each primitive on the Connection Machine.

The Connection Machine [19] is a fine-grained single instruction multiple data (SIMD) parallel computer with between 8K and 64K processors (see Fig. 1). Each processor is a 1 bit serial processor, with 64K bits of local memory, an 8 MHz clock, and optional floating point hardware. All processors are controlled by a microcontroller which receives macro instructions from a front end computer. Physically, the processors are organized into a hypercube network with 16 processors at each node (corner) of a 12-dimensional hypercube. Logically, each processor has a unique 16 bit address; in the interconnection network, two processors are directly connected when their addresses differ in exactly one bit. The wires of the hypercube network are shared by the grid permutation primitives, the scan primitives, and the permutation and combine primitives.

The Connection Machine supports a virtual vector machine by distributing the elements of a vector across the processors. When using a vector of length m on an n processor CM, ⌈m/n⌉ vector elements are placed in the memory of each processor. Each processor is responsible for the vector elements in its memory. The mapping of vectors onto processors is supported in microcode and is transparent to the user.
1) Elementwise Primitives: The elementwise vector primitives are implemented on the CM by loading an element of each argument vector from the memory of each processor into the processor, operating on these elements, and storing the result back to memory. The software guarantees that elements with the same index from different equal length vectors are located in the same processor memory. This guarantees that the elementwise primitives never require communication among processors. When there are many elements per processor, each processor loops over all of its elements.
Fig. 1. Block diagram of the Connection Machine (CM-2). [Panels: front end, microcontroller, memory/processors, communications network.]

2) Permutations: The vector permutation primitives and the vector combining primitives are supported by the routing hardware of the Connection Machine. The router uses a packet-switched message routing scheme that directs messages along the hypercube wires to their destinations. The combine primitives are supported with combining router switches similar to those suggested for the NYU Ultracomputer [17].

Vector permutations, which take O(1) time in the vector model, are simulated on the Connection Machine in O(log n) time with high probability. As with the elementwise primitives, when there are more vector elements than processors, the CM loops over the elements in each processor.
3) Grid Permutations: The grid permutations are supported on the CM using the same hypercube wires as the router. Each dimension of a grid is placed in Gray code order with respect to the hypercube ordering. A Gray code assigns an index to each element in the grid such that the indices of neighboring grid elements differ only in one bit; then, neighboring grid elements are separated by a single wire. Because of this ordering, the grid permutation primitives are significantly faster than the general permutation primitive.
4) Scan Operations: The scan vector primitives are implemented on the CM using a binary tree algorithm. This algorithm uses the same wires as the router but does not use the routing hardware. Fig. 2 shows an example of the tree algorithm used to implement the scan operation. The algorithm works in two sweeps. The up-sweep sums elements from below and places the value of the left child in the memory of each node. The down-sweep sends down the left branch the value from above, and sends down the right branch the sum of the value from above with the contents of the memory.

The scan primitives take O(log n) time and in practice are usually faster than the general permutation primitive. When there are more vector elements than processors, the CM first scans the elements in each processor, then does a single scan across the processors, and finally distributes the result back across the elements. This relies on contiguous elements of a vector being laid out in a single processor.
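The two-sweep tree algorithm can be sketched sequentially on an array viewed as an implicit binary tree (our code, not the paper's; the vector length is assumed to be a power of two). The down-sweep produces the exclusive scan; as the Fig. 2 caption notes, adding the original values back gives the inclusive scans used in this paper.

```python
def tree_scan(a, op, identity):
    """Two-sweep (up/down) scan on an implicit binary tree.
    Returns the exclusive scan; len(a) must be a power of two."""
    x = list(a)
    n = len(x)
    d = 1
    while d < n:                       # up-sweep: partial sums climb the tree
        for i in range(0, n, 2 * d):
            x[i + 2 * d - 1] = op(x[i + d - 1], x[i + 2 * d - 1])
        d *= 2
    x[n - 1] = identity                # down-sweep: prefixes flow back down
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = x[i + d - 1]
            x[i + d - 1] = x[i + 2 * d - 1]
            x[i + 2 * d - 1] = op(t, x[i + 2 * d - 1])
        d //= 2
    return x
```

Each `while` iteration corresponds to one level of the tree, so the sketch performs O(log n) parallel steps when the inner loop runs concurrently.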
`
Fig. 2. Implementing the +-scan operation on a tree. The result of the scan is the result of the down sweep plus the original values (see text for explanation).

Grid scans are implemented in a similar way: a tree is used for each row or each column.

II. ROUTINES

The primitives described in the previous section simplify implementing more complex manipulations of data structures. Routines are functions composed of the various primitives; the routines form important building blocks for the vision algorithms given later. These routines include:

    pointer jumping
    ordering
    region summation
    outer product
    histograms.

A. Pointer Jumping

In vision algorithms, as with algorithms in general, it is often necessary to operate on a set of linked cycles or linked lists of elements. By a set of linked cycles we mean a vector in which each element has a pointer (an index) to another element. All the pointers must be unique, therefore forming cycles of pointers.¹ For example, the vector

    Index         = [0 1 2 3 4 5 6 7]
    Linked Cycles = [4 0 5 7 6 3 1 2]

defines the two cycles

    0 → 4 → 6 → 1 → 0
    7 → 2 → 5 → 3 → 7.

In a set of linked lists we allow certain elements to contain a null pointer, so, for example, the vector

    Index        = [0 1 2 3 4 5 6 7]
    Linked Lists = [4 n 5 n 6 3 1 2]

defines the two linked lists

    0 → 4 → 6 → 1
    7 → 2 → 5 → 3.

In this section, we consider the problem of executing a prefix operation on a set of linked lists so that each element gets the result of applying some binary associative operator to all previous elements in the list. The scan primitives introduced earlier can be considered the special case of such an operation in which all pointers point to the next element in the vector: the scans have an implicit ordering of the elements while the prefix on a list has an explicit (data dependent) ordering.² By designating an element on each linked cycle as the head of the cycle, we can also compute a prefix computation on a set of linked cycles: the prefix is defined to start at the head and go around the cycle back to the head.

The prefix computation on a set of linked cycles can be calculated using pointer jumping [40], also known as recursive doubling. For a vector of length n, pointer jumping requires at most O(log n) steps. Each step requires two calls to the permute primitive and a few elementwise operations. In the remainder of this section, we show how pointer jumping can be used to calculate a prefix operation at the same time as determining a head element for each cycle. Given a unique key for each element (the element index will do), we define the head of each ring to be the element of the ring with the largest key. The prefix computation we use in the example is the rank computation, where each element determines its position in the cycle: this is the special case of a prefix in which the operator is + and the values are all 1.

Fig. 3 shows an example of pointer jumping on a cycle of eight values. At each step of the algorithm, we keep three vectors: next, head, and position. Initially, the value in next for each element points to the next element in the cycle, head contains the key of the element, and position is zero. At step i of the algorithm, next points to the element 2^i positions ahead in the cycle, head contains the largest key of all the elements preceding the given element by up to 2^i positions, and position contains the position relative to the current head. After O(log n) steps, head contains the largest key in the ring and position contains the position of each element relative to the head of the ring.

At each step, head, position, and the element index (a return pointer) are permuted to the element index specified by next. This moves the data in the ring, and gives the recipient element a pointer back to the sending element. Each element now generates a new head value by taking the maximum of its head value and the head value it received from element other. Each element also generates a new position in the following way:

    if (other-head > head)
      then position ← other-position + 2^i
      else position ← position.

Each element now passes its next back to the element specified by the return pointer; this doubles the distance of each pointer. After O(log n) steps, this procedure will be complete. In the next section we show how this routine can be used to reorder a set of data.

¹It is interesting to note that the vector that defines a set of linked cycles also defines a permutation: the two types of vectors must have the same properties.

²In [23], the authors call the two types of prefix computations data dependent prefix computations and data independent prefix computations.
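The pointer jumping procedure above can be simulated sequentially as follows (our sketch, not the paper's code). Each loop iteration stands for one parallel step: the receiver lookup plays the role of the two permutes, and the pointer composition doubles the jump distance.

```python
import math

def pointer_jump(nxt):
    """For every element on a set of linked cycles, find the largest key on
    its cycle (keys are the element indices) and its position after that
    head. Sequential simulation of the parallel pointer jumping rounds."""
    n = len(nxt)
    head = list(range(n))      # largest key seen so far
    pos = [0] * n              # position relative to that head
    nxt = list(nxt)
    for s in range(max(1, math.ceil(math.log2(n)))):
        sender = {nxt[i]: i for i in range(n)}   # pointers are one-to-one
        new_head, new_pos = head[:], pos[:]
        for j in range(n):
            i = sender[j]      # j receives (head, pos) from 2**s behind it
            if head[i] > head[j]:
                new_head[j] = head[i]
                new_pos[j] = pos[i] + 2 ** s
        head, pos = new_head, new_pos
        nxt = [nxt[nxt[i]] for i in range(n)]    # double the jump distance
    return head, pos
```

On the two cycles from the example, 0 → 4 → 6 → 1 → 0 and 7 → 2 → 5 → 3 → 7, every element ends up with head 6 or 7 respectively, and position 0 at the head, 1 one step after it, and so on around the ring.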
`
`

`

`LITTLE et a l . : COMPUTER VISION ON A FINE-GRAINED PARALLEL MACHINE
`
`249
`
`19
`
`12
`
`11
`
`19
`
`6
`
`5
`
`
`19
`
`19
`
`(d)
`(C)
`Fig. 3 . Pointer jumping: (a) shows the initial ring, with clockwise point-
`ers. The outer numbers are the head values, the inner, the posifion. (b),
`(c), and (d) show the connections and how the values change in steps 1,
`2. and 3 .
`
`B. Ordering
`Ordering transforms a set of elements having explicit
`order, represented as linked lists or cycles, into contig-
`uous segments with implicit order (see Fig. 4). For ex-
`ample, the set of edge pixels in an image can be locally
`formed into a linked list by local grid permutations. It
`may be necessary, for example, during output, to reorder
`the points on the edges into a serial stream. Ordering is
`also generally useful as a preliminary step for reconfigur-
`ing elements so that scan operations can be applied.
`The ordering procedure consists of the following steps.
`1) Number elements in the list; compute list length.
`2) Allocate a segment for each list, thereby generating
`offsets.
`3) Distribute the list offset to list elements.
`4) Add the indexes within each list to the offset of each
`list.
`The input to the ordering procedure is two vectors: A ,
`a value vector, and I , a vector of pointers linking the ele-
`ments. First, pointer jumping, as in the example in the
`preceding section, constructs H , a vector whose elements
`are the indexes of the head of each connected group under
`I , and N , a vector representing the position of each ele-
`ment in its list [Fig. 4 (a)]. Next, the value of the maxi-
`mum position in each list moves to its head either by
`pointer jumping or by a max-combine operation on N ,
`using H as index vector. This constructs a vector L con-
`taining the maximum position of the list at the head ele-
`ment, and 0 otherwise; the length of the list is obtained
`by adding one to the maximum position [Fig. 4(b)]. The
`following operations compute the offset for consecutive
`elements in the new order. A +-scan operation on L con-
`structs the vector L'; the offset E for each segment is ob-
`tained by subtracting elementwise L from L' [see Fig.
`4(c)]. Pointer jumping E, using I as index vector, using
`first, distributes the offset to all elements in the list; this
`produces a new vector E'. Finally, each element finds its
`index 0 in the new order by adding E' elementwise to N
`
`*
`
`#B
`L
`
`* O
`
`g
`
`
`
`Fig. 4. Ordering curves: (a) N , positions of elements in curves. (b) L,
`lengths of curves at head. (c) E , offsets of curves. (d) 0 = E' + N ,
`indexes in new order.
`
`[Fig. 4(d)]. A permute operation using 0 as index vector
`reorders the data elements in the vector A serially. Order-
`ing requires an initial pointer jumping step, a second
`pointer jumping step (or a max-combine), one +-scan,
`and a final pointer jumping step to generate the order vec-
`tor.
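A sequential sketch of the four ordering steps follows (ours, not the paper's code). It takes the position vector N and the head keys from pointer jumping; here L holds list lengths (maximum position plus one, which the text derives from L), so that the +-scan minus L yields each segment's offset directly, and the distribution step is modeled by indexing through the head key.

```python
def ordering(pos, head):
    """Index O of each element in the packed serial order, given its
    position in its list (N) and the key/index of its list head (H)."""
    n = len(pos)
    # 1) list length, recorded at the head element (a max-combine on the CM)
    L = [0] * n
    for i in range(n):
        L[head[i]] = max(L[head[i]], pos[i] + 1)
    # 2) inclusive +-scan of L; subtracting L gives the offset E per head
    acc, E = 0, [0] * n
    for i in range(n):
        acc += L[i]
        E[i] = acc - L[i]
    # 3) each element reads its head's offset (pointer jumping with first
    #    in the parallel version), 4) and adds its own position
    return [E[head[i]] + pos[i] for i in range(n)]
```

For the two four-element cycles of the pointer jumping example, the resulting order vector sends the list headed by element 6 to positions 0 through 3 and the list headed by element 7 to positions 4 through 7.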
C. Region Summation

Grid scans are useful for many modules in early vision. A useful procedure in early vision requires each pixel in a grid to sum a (2m + 1) × (2m + 1) square region around it. This procedure can be implemented using a constant number of grid scans and permutations (see Fig. 5). First, for each element, find the sum of the 2m pixels around it along a row by the following steps: perform a grid scan using plus in x, then permute the elements at ±m offset in x in the scan result to the central element, and take their difference. Repeat this in the y direction on the result of the first step. This procedure based on scan operations takes a constant amount of time, independent of m. For small regions a simple grid shift and add method might be faster.³ Boxcar convolution, described in Section IV-A, is efficiently implemented using this technique.
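The scan-and-subtract idea can be sketched for one row as follows (our code; for simplicity it clamps at the borders instead of using the wraparound the text assumes). The full region summation applies the same routine along rows and then along columns of the row result.

```python
def row_box_sum(row, m):
    """Sum, for each pixel, the window of 2m+1 pixels centred on it along
    the row, using one inclusive +-scan and two offset reads (out-of-range
    offsets are clamped rather than wrapped)."""
    acc, prefix = 0, []
    for v in row:               # the +-scan in x
        acc += v
        prefix.append(acc)
    n = len(row)
    out = []
    for i in range(n):          # the two permutes at +-m offset, then subtract
        hi = prefix[min(i + m, n - 1)]
        lo = prefix[i - m - 1] if i - m - 1 >= 0 else 0
        out.append(hi - lo)
    return out
```

Composing it in both directions on a 3 × 3 grid of ones gives 9 at the centre pixel, the full 3 × 3 box sum, and the cost is independent of m, as the text claims.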
³On the Connection Machine we found the tradeoff at approximately a 10 × 10 grid.

D. Outer Product

The outer product is the Cartesian product of two sets: {⟨aᵢ, bⱼ⟩} = {aᵢ} × {bⱼ}. The outer product has many applications, for example, in an implementation of the Hough transform described in Section V. Given two vectors A and B, where A has m elements and B has n

Fig. 5. Region summation: the sum of square regions. A +-scan in x is followed by two permute operations, then a subtraction. Then a +-scan on this result in y is followed again by two permute operations, and a subtraction.

elements, the outer product of A and B is the vector R, of length mn, whose elements are all pairs of elements of A and B. The outer product is implemented with a constant number of scan and permutation operations. In the outer product, each element aᵢ of A must be copied n times, and each element bⱼ of B must be copied m times (see Fig. 6). The first step creates two expanded vectors. Each element aᵢ of A is permuted to index ni. This creates a new vector A1, in which each element aᵢ of A is followed by n - 1 empty elements. Each nonempty element begins a new segment. A segmented first-scan copies these elements, so that, in A2, each aᵢ is represented n times in a contiguous block. A similar operation replicates each element of B m times in a new vector B2. A permute operation moves the ith element in B2 to a new index j, where j = n(i mod m) + ⌊i/m⌋, producing a new vector B3. Finally, the pair of vectors A2 and B3 is the desired outer product.
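The construction can be checked with a short sequential sketch (our code; the spread and segmented first-scan are collapsed into direct index arithmetic, and the permutation uses the formula j = n(i mod m) + ⌊i/m⌋ from the text).

```python
def outer_product(A, B):
    """All pairs (a_i, b_j) as two flat vectors of length m*n: A2 holds each
    a_i repeated n times; B3 is B replicated m times and permuted so the
    blocks interleave against A2."""
    m, n = len(A), len(B)
    A2 = [A[i // n] for i in range(m * n)]   # spread + segmented first-scan
    B2 = [B[i // m] for i in range(m * n)]   # each b_j repeated m times
    B3 = [None] * (m * n)
    for i in range(m * n):                   # the permute from the text
        B3[n * (i % m) + i // m] = B2[i]
    return A2, B3
```

Zipping A2 against B3 enumerates every pair exactly once, which confirms that the permutation formula rearranges the replicated B into m back-to-back copies of B.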
`
E. Histograms

A histogram of a finite set is a function which gives the frequency of occurrence of a value in the set. The histogram of the gray-level values in a k bit image, for example, is a table H of length 2^k in which each element Hᵢ is a count of the number of pixels having value i. Viewed in the vector model, a histogram maps a vector K to a new vector H where the element of H at index i represents the number of elements of K with value i.

There are several ways to compute the histogram in the vector model; these include using sorting, using the combine primitives, using counting, and using probing. Each displays different strengths.

1) Histogram by Sorting: To histogram a vector K by sorting, we first generate the vector S = sort(K) to configure the elements as in Fig. 7. We create a Boo
