`WORLD INTELLECfUAL PROPERTY ORGANIZATION
`International Bureau
`INTERNATIONAL APPLICATION PUBLISHED UNDER 1HE PATENT COOPERATION TREATY (PCT)
`WO 97/07451
`
`(11) International Publication Number:
`
`(51) International Patent Classification 6 :
`G06F 7/00
`
`A2
`
`(43) International Publication Date:
`
`27 February 1997 (27.02.97)
`
`(81) Designated States: AL, AM, AT, AU, AZ, BB, BG, BR, BY,
`CA,CH,CN,CU,CZ,DE,DK,EE,ES,FI,GB,GE,HU,
`IL, IS, JP, KE, KG, KP, KR, KZ, LK, LR, LS, LT, LU, LV,
`MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU,
`SD, SE, SG, SI, SK, TJ, TM, 1R, TT, UA, UG, US, UZ,
`VN, ARIPO patent (KE, LS, MW, SD, SZ, UG), Eurasian
`patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
`patent (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT,
`LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI,
`CM, GA, GN, ML, MR, NE, SN, TD, TG).
`
`Published
`Without international search report and to be republished
`upon receipt of that report.
`
`(21) International Application Number:
`
`PCT/US96/13195
`
`(22) International Filing Date:
`
`14 August 1996 (14.08.96)
`
`(30) Priority Data:
`08/516,398
`
`16 August 1995 (16.08.95)
`
`us
`
`(71) Applicant (for all designated States except US): MI-
`CROUNITY SYSTEMS ENGINEERING, INC. [US/US];
`255 Caspian Drive, Sunnyvale, CA 94089 (US).
`
`(72) Inventors; and
`(75) Inventors/Applicants (for US only): KARZES, Thomas, J.
`[US/US]; Apartment 1-204, 1063 Morse Avenue, Sunny(cid:173)
`vale, CA 94089 (US). HANSEN, Craig, C. [US/US]; 350
`Yerba Santa Avenue, Los Altos, CA 94022 (US). MAS(cid:173)
`SALIN, Henry [US/US]; Apartment M302, 655 So. Fair
`Oaks, Sunnyvale, CA 94086 (US).
`
`(74) Agent: PETERSON, James, W.; Burns, Doane, Swecker &
`Mathis, L.L.P., P.O. Box 1404, Alexandria, VA 22313-1404
`(US).
`
`(54) Title: METHOD AND SYSTEM FOR IMPLEMENTING DATA MANIPULATION OPERATIONS
`
`DECODED INSTRUCTTON AND CONTROL OPERANDS
`
`•••
`
`C(1)
`
`CS(t)
`
`C(2)
`
`CS(2)
`
`C(n)
`
`CS(n)
`
`CSU(t)
`
`SEL(1)
`
`XC(t)
`
`XCS{1)
`
`DATA/IN
`
`02
`
`CDM(1}
`STAG£ 1
`
`CDM(2)
`STAGE 2
`
`D3
`
`CDM(J)
`STAG£ J
`
`DATA/OUT
`
`(57) Abstract
`
`A method and system for performing arbitrary permutations of sequences of elements. In the general case, the method of the present
`invention processes the elements to be permuted as a multi-dimensional array, where each element in the array corresponds to one of the
`elments to be permuted. The permutation is achieved by performing a sequence of sets of permutations, where each set of permutations in
`the sequence independently permutes the elements within each one-dimensional slice through the array, along some particular dimension of
`the array. The total number of sets of permutations, or stages, is one less than twice the number of dimensions in the array. An extension
`to the general method allows some extensions of permutations which involve the copying of individual elements. A system based on the
`extended general method implements a large class of operations which involve copying and/or permuting elements, where the sequence of
`elements is a word of data and the elements are bits of data. An efficient control structure for the system permits control signals to be
`shared across slices of the array. A version of the system based on a two-dimensional array includes three multiplex stages, where the first
`stage multiplexes along the rows, the second stage multiplexes along the columns, and the third stage multiplexes across the rows once
`again. Several classes of computer instructions which generally involve the copying and/or permuting of data are also described.
`
`Oracle-1006 p. 1
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`.,,
`
`..
`
`"'
`
`FOR THE PURPOSES OF INFORMATION ONLY
`
`Codes used to identify States party to the PCT on the front pages of pamphlets publishing international
`applications under the PCT.
`
`AM
`AT
`AU
`BB
`BE
`BF
`BG
`BJ
`BR
`BY
`CA
`CF
`CG
`CH
`CI
`CM
`CN
`cs
`CZ
`DE
`DK
`EE
`ES
`FI
`FR
`GA
`
`Annenia
`Austria
`Australia
`Barbados
`Belgium
`Burkina Faso
`Bulgaria
`Benin
`Brazil
`Belarus
`Canada
`Central African Republic
`Congo
`Switzerland
`COie d'Ivoire
`Cameroon
`China
`Czechoslovakia
`Czech Republic
`Gennany
`Denmark
`Estonia
`Spain
`Finland
`France
`Gabon
`
`GB
`GE
`GN
`GR
`HU
`IE
`IT
`JP
`KE
`KG
`KP
`
`KR
`KZ
`LI
`LK
`LR
`LT
`LU
`LV
`MC
`MD
`MG
`ML
`MN
`MR
`
`United Kingdom
`Georgia
`Guinea
`Greece
`Hungary
`Ireland
`Italy
`Japan
`Kenya
`Kyrgystan
`Democratic People's Republic
`of Korea
`Republic of Korea
`Kazakhstan
`Liechtenstein
`Sri Lanka
`Liberia
`Lithuania
`Luxembourg
`Latvia
`Monaco
`Republic of Moldova
`Madagascar
`Mali
`Mongolia
`Mauritania
`
`MW
`MX
`NE
`NL
`NO
`NZ
`PL
`PT
`RO
`RU
`SD
`SE
`SG
`SI
`SK
`SN
`sz
`TD
`TG
`TJ
`TT
`UA
`UG
`us
`uz
`VN
`
`Malawi
`Mexico
`Niger
`Netherlands
`Norway
`New Zealand
`Poland
`Ponugal
`Romania
`Russian Federation
`Sudan
`Sweden
`Singapore
`Slovenia
`Slovakia
`Senegal
`Swaziland
`Chad
`Togo
`Tajikistan
`Trinidad and Tobago
`Ukraine
`Uganda
`United States of America
`Uzbekistan
`Viet Nam
`
`Oracle-1006 p. 2
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT/US96/13195
`
`- 1 -
`
`METHOD AND SYSTEM FOR IMPLEMENTING DATA
`MANIPULATION OPERATIONS
`
`FIELD OF THE INVENTION
`
`5
`
`The present invention relates to the field of bit and byte pe1mutations. and
`particularly to bit and byte pe1mutations pe1fo1med in order to ca11·y out operations
`in a digital data processing system. and particularly in a digital computer system.
`
`TERMINOLOGY
`
`This section defines several te1ms which are used in the rest of this
`document.
`
`10
`
`15
`
`The term ''crossbar" refers to an operation which, in general, takes as input
`some number of values. a, and produces as output some number of values, b.
`where each of the output values may take its value from any one of the input values.
`Each output value is completely independent of the other output values. A crossbar
`therefore functions as a general switching mechanism. It is very common for the
`number of input and output values to be the same, i.e., a= b. In this case, it is
`sometimes refen·ed to as an a-way, or a -wide, crossbar. The term crossbar is also
`used in a physical sense. in which case it refers to a switch which is capable of
`pe1fo1ming a crossbar operation.
`
`20
`
`The term "multiplexer" is very similar to the term "crossbar". In its most
`basic sense, a multiplexer is like a crossbar which produces a single output
`rather than several output values. In that sense, a crossbar may be constructed from
`several multiplexers, each of which takes the same input. It some cases, the tetm
`multiplex may implicitly refer to a set of multiplex operations which are
`independently applied to produce several output values. In this use it is synonymous
`25 with the te1m crossbar. It is usually clear from context whether the te1m
`multiplexer, or the term multiplex operation. is being used in the sense of a single
`output value vs. multiple output values. The term multiplex may be used in either a
`physical or an operational sense.
`
`30
`
`The term "perfect shuffle", or just ·'shuffle''. refers to the operation of
`effectively splitting a sequence of input values into several piles, then shuffling the
`piles together by periectly alternating between the piles in a cyclic manner. In
`
`,.
`
`Oracle-1006 p. 3
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT/US96/13195
`
`- 2 -
`
`general, an a -way perfect shuffle indicates that a piles are involved. If this
`number is unspecified in a reference to a shuffle operation, it may refer to a two-way
`shuffle, or it may refer to a multi-way shuffle with an unspecified number of piles,
`depending on the context. It is generally assumed that the total number of elements
`is a multiple of the number of piles involved in the shuffle, so that each pile has the
`same size. Perfect shuffles are discussed in more detail later. The term perfect
`shuffle may be used in either a physical or an operational sense.
`
`•
`
`The tenn "pe1fect deal", or just ''deal". refers to the operation of effectively
`dealing out a sequence of input values into several piles in a cyclic manner, then
`stacking the piles together. The dealing is done in a way which preserves the
`relative order of elements within each pile (as opposed to reversing them), so in a
`sense it is like dealing from the bottom of the "deck". In general, an a -way perfect
`deal indicates that a piles are involved. If this number is unspecified in a reference
`to a deal operation, it may refer to a two-way deal, or it may refer to a multi-way
`deal with an unspecified number of piles, depending on the context. It is generally
`assumed that the total number of elements is a multiple of the number of piles
`involved in the deal, so that each pile has the same size. Pe1fect deals are discussed
`in more detail later. The term perfect deal may be used in either a physical or an
`operational sense.
`
`Perfect shuffles and perfect deals are closely related. Perfect shuffles and
`perfect deals which use the same number of "piles" are inverses of each other.
`Furthermore, a multi-way pe1fect shuffle is always equivalent to some multi-way
`perfect deal. In particular. if the number of elements is ab, then an a -way pe1fect
`shuffle is equivalent to a b -way pe1fect deal.
`
`5
`
`10
`
`15
`
`20
`
`25 BACKGROUND OF THE INVENTION
`
`Digital data processing systems, and particularly digital computer systems,
`are generally able to perform some set of operations on words of data. In some
`systems the set of possible operations may be quite small, whereas in other systems
`the set may be fairly large.
`
`30
`
`It is convenient to group data processing operations into classes which are
`related in function and/or use. For example, floating point arithmetic (addition,
`subtraction, multiplication, etc.) often forms such a class in systems which support
`such operations. Similarly, integer ruithmetic may form a class, logical operations
`
`,;
`
`Oracle-1006 p. 4
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT /US96/13195
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`may form a class, and memory operations (loads and stores) may form a class.
`Many systems also support a class of shift/rotate operations.
`
`The shift/rotate class of operations may include operations which shift data
`left and right (possibly sign extending or zero filling). operations which rotate data
`left or right, or shift/merge operations which first shift a data field, then merge the
`result with another operand. This class of operations differs from the arithmetic
`classes in that it is primarily permuting and copying data rather than generating
`fundamentally new values from old values. Some systems also include operations
`for reversing the bits in a word.
`
`Other permutation and copying operations, which can't be easily expressed
`as a simple sequence of shift, rotate. or shift/merge operations, are typically
`petformed by utilizing look-up tables which are stored in memory. For example, to
`petform any fixed operation in which all data bits in the result are derived from
`specific bits in the source operand, one can first break the source operand into
`several smaller fields (which serves to reduce the number of table entries required).
`For each such field there is a corresponding table which is indexed by the value of
`that field. In general, the width of each table enu-y is the size of the final combined
`result. Each entry in the table contains zeros for all result bits not derived from
`values in the field used to index the table, and the corresponding values of the index
`for all result bits which are derived from values in that field. The final result of the
`operation is formed by logically OR-ing the partial results from each of the tables.
`
`Although such a method is clearly very general, it has several disadvantages.
`One disadvantage is that the tables themselves may occupy a significant amount of
`memory. Another disadvantage is that this method is usually fairly slow. In order
`to use it, each field in the source operand must first be extracted from the operand,
`then used as an index for a load from the corresponding table, and finally the partial
`results must be combined to form the final result. As the number of fields grows,
`the number of operations increases linearly. On the other hand, using larger fields
`results in exponential growth of the number of table entries, and therefore in the
`amount of memory required .
`
`SUMMARY OF THE INVENTION
`
`The present invention is a general method for arbitrarily permuting a
`sequence of elements, an extension to the general method which allows some
`
`..
`
`•
`
`•
`
`Oracle-1006 p. 5
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT /US96/13195
`
`- 4 -
`
`extensions of pe1mutations which involve the copying of individual elements. and a
`system based on the extended general method which implements a class of
`operations which can pe1fo1m the primitive steps of the extended general method, as
`well as a much larger class of operations which generally involve the copying and/or
`permuting of data. In addition. the present invention includes several classes of
`instructions for pe1forming operations which generally involve the copying and/or
`permuting of elements.
`
`'i-
`
`The general method can perfmm an arbitrary pe1mutation of w elements. by
`breaking it down into an n -dimensional rectangle whose sides con-espond to any set
`of factors of w, i.e., w = fih .. · f n. In one embodiment, the elements to be
`permuted are bits. The method consists of a sequence of 2n -1 sets of
`permutations across the various dimensions. This method is not restricted to values
`of w which are powers of two.
`
`The extended general method is obtained by replacing each of the
`permutation steps of the general method with multiplexing steps. For example.
`when permuting across dimension i , each pe1mutation of fi elements is replaced
`with a full frto- fi crossbar, or equivalently, Ji independent frto-1 multiplexers.
`These crossbar, or multiplex, operations can perform permutations as a special case.
`Therefore, the extended general method supports all of the permutations performed
`by the general method, and in addition allows certain types of copying to be
`pe1formed.
`
`For a given embodiment of either the general or extended general method,
`the con-espondence between the elements and the n -dimensional rectangle may take
`one of many forms. For example, the elements may in fact already be arranged in
`the shape of the n -dimensional rectangle, or alternatively in the shape of a lower(cid:173)
`dimensional rectangle which results from some of the original dimensions being
`expanded to reduce the total number of dimensions. In another embodiment, the
`elements may exist purely as a one-dimensional sequence. with some specified
`con-espondence to the rectangle (the most obvious choices are row-major and
`column-major. and simple mixtures of the two). In some embodiments, it may be
`just as easy to permute/multiplex across one dimension as across another.
`However, in other embodiments, it may be more difficult, or even impossible, to
`permute/multiplex across some dimensions.
`
`"
`
`5
`
`IO
`
`15
`
`20
`
`25
`
`30
`
`Oracle-1006 p. 6
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT /US96/13 l 95
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`One way to avoid this problem is to resuict the set of dimensions over which
`the permute/multiplex operations need to occur. This can be achieved by reshaping
`(i.e. transposing) the data between successive permutation/multiplex steps. Using
`this technique. it is possible to restrict the set of dimensions over which the
`permutation/multiplex operations need to occur down to a single dimension ..
`Furthermore, in the case where the elements exist as a one-dimensional sequence.
`the permute/multiplex operations can be consu·ained so as to always operate on
`groups of consecutive elements. In a row-major or column-major representation. or
`a simple mixture of the two, a sufficient subset of the set of possible transposes can
`be achieved by performing multi-way pe1fect shuffle/deal operations. Assuming a
`row-majpr representation (i.e., the last dimension varies most rapidly), an
`Ji x h X · · · f n row-major n -dimensional rectangle can be transposed into an
`. f;+I X· · · fn x Ji X· · · f; rectangle by pe1fonning an Uih · · · J; )-way perfect
`shuffle, which in this context is equivalent to an U';+i/;+2 · · · f n )-way petfect deal.
`
`In cases where the elements exist as a one-dimensional sequence, the
`addition of shuffle/deal operations can constrain the permute/multiplex operations to
`operate on groups of consecutive elements. Furthermore. the number of elements
`which must be accessible within a given group will never exceed the largest
`dimension in the n -dimensional rectangle, i.e., the maximum of the f values.
`Thus, in one embodiment of the present invention each of the 2n -1 steps in the
`general method or extended general method can be implemented with a single
`operation by combining the shuffle/deal operation with the permute/multiplex
`operation, in either order (i.e., either shuffle before or after the pennutation).
`
`The system of the present invention for implementing the general and
`extended general methods of the present invention consists of 2n - 1 sequential
`stages. which may easily be pipelined. Each stage pe1forms its con-esponding
`permute/multiplex steps as descdbed above. Each stage is connected to the previous
`and next stages. A variation of this system implements a smaller number of stages.
`possibly only one stage, and cycles the elements through it multiple times
`(transposing between iterations) to achieve the equivalent of 2n -1 stages. Of
`course, doing so may inhibit pipelining .
`
`In one embodiment, the system is a two-dimensional implementation of the
`extended general method of the present invention. The data elements are bits, and
`the width of a data word is w bits. In this embodiment, the data is ainnged as a
`
`..
`
`•
`
`Oracle-1006 p. 7
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT/US96/13l 95
`
`- 6 -
`
`two-dimensional ax b rectangle ( u rows and b columns), where w =ab. There
`are three stages in the data path: Stage one consists of a groups of b, b -to-I data
`multiplexers which operate within a given row, stage 2 consists of b groups of a.
`a -to-I data multiplexers which operate within a given column. and stage 3 consists
`of a groups of b. b-to-1 data multiplexers which operate within a given row.
`
`The data multiplexers within each stage are physically a.tTanged as an ax b
`rectangle. This allows the data buses to be easily shared within each stage. Each
`data multiplexer is controlled by an encoded (log2 b )-bit value (for stages 1 and 3)
`or (log2 a)-bit value (for stage 2), with the log 1 values being rounded up if the
`result is non-integral (which will happen if the operand is not a power of 2). Each
`bit of the control value for a given data bit in a given stage is independently chosen
`from several values by a control multiplexer.
`
`Although each bit of a given control value for a given data bit in a given
`stage is independently controlled, there is still some sharing of data, which greatly
`reduces and simplifies the wiring. Each of the select signals for a control
`multiplexer for a given control bit in a given stage is shared across the entire row
`(for stages 1 and 3) or column (for stage 2). Furthermore, most of the inputs for a
`control multiplexer for a given conn·ol bit in a given stage are shared across the
`entire column (for stages 1 and 3) and row (for stage 2).
`
`This system also allows a "fill" value to override some of the result bits (i.e.
`the bits of the data word after all of the stages). This is implemented by providing a
`bus containing a set of fill values which are selected on a bit-by-bit basis. The
`selection is controlled by the output of another set of control multiplexers in stage 3.
`As with the other control multiplexers in stage 3. these multiplexers are controlled
`by select signals which are shared across rows, and the inputs to these multiplexers
`come from signals which are shared across columns.
`
`This system implements various shift and rotate operations, a class of
`shuffle/multiplex operations which can perform the pd.mitive steps of the extended
`general method. and several other classes such as copy/swap. select, expand,
`compress, and bit field deposit/withdraw. Many of these operations are supported
`in a "group" form, which allows a single data word to be viewed as several smaller
`data words packed together. Such a group operation then acts independently on
`each of the smaller data words.
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`..
`
`•
`
`Oracle-1006 p. 8
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT/US96/13195
`
`- 7 -
`
`In one embodiment of the system, w = 128, a= 16, and b = 8 (so
`log2 a = 4 and log2 b = 4 ). In this embodiment, the data is atTanged as a 16 x 8
`rectangle (16 rows and 8 columns).
`
`•
`
`5
`
`The classes of operations of the present invention generally involve the
`copying and/or pennuting of elements. In general, these operations can apply to any
`sequence of elements which may be pennuted and, in some cases, copied. In one
`embodiment, the operations are instructions in a digital computer, and the elements
`are bits. The following classes of operations are included in the present invention:
`
`1. A general class of perfect shuffle/deal operations.
`
`10
`
`2. A general class of data multiplexing operations.
`
`3. A general class of combined perfect shuffle/deal operations and data
`multiplexing operations.
`
`4. An extension to the general class of peitect shuffle/deal operations which
`pennits an arbitrai·y reordering of dimensions (i.e., an ai·bitrary transpose).
`
`15
`
`5. An extension to the general class of combined perfect shuffle/deal operations and
`data multiplexing operations which pennits an ai·bitrary reordering of
`dimensions (i.e., an arbitrary transpose) in place of the perfect shuffle/deal
`component of the operation.
`
`6. A general class of data selection operations.
`
`20
`
`7. A general class of copy/swap operations which support certain patterns of data
`copying and/or data reversal.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`,,
`
`25
`
`Figure 1 illustrates a first embodiment of the general system of the present
`invention including a single control generation unit and a permute/multiplex unit.
`
`Figure 2 illustrates a second embodiment of the general system of the
`present invention including multiple control generation units, a control select unit
`and a permute/multiplex unit.
`
`Oracle-1006 p. 9
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO97/07451
`
`PCT/US96/13195
`
`- 8 -
`
`Figure 3 illustrates a third embodiment of the general system of the present
`invention having low-level simplification of control signals and including a
`single control generation unit and a final control selection and permute/multiplex
`unit.
`
`Figure 4 illustrates a fourth embodiment of the general system of the
`present invention having low-level simplification of conu·ol signals including
`multiple control generation units, a control select unit, and a final control
`selection and pe11nute/multiplex unit.
`
`Figure 5 illustrates one embodiment of a two-dimensional system of the
`present invention.
`
`Figure 6 illustrates the layout of a stage 1 cell of a two-dimensional
`embodiment of the system of the present invention designed to operate on 128
`bits.
`
`Figure 7 illustrates the layout of a stage 2 cell in a two-dimensional
`embodiment of the system of the present invention designed to operate on 128
`bits.
`
`Figure 8 illustrates the layout of a stage 3 cell in a two-dimensional
`embodiment of the system of the present invention designed to operate on 128
`bits.
`
`5
`
`10
`
`15
`
`20
`
`Figures 9A-9D illustrate alternative physical layout atTangements of stages
`1-3 of the system of the present invention.
`
`Figure 10 illustrates a first embodiment of a stage 1 cell in a two(cid:173)
`dimensional embodiment of the system of the present invention as shown in
`Figure 6.
`
`25
`
`30
`
`Figures 11A and 11B illustrate a first embodiment of a stage 2 cell in a
`two-dimensional embodiment of the system of the present invention as shown in
`Figure 7.
`
`Figures 12A and 12B illusu·ate a first embodiment of a stage 3 cell in a
`two-dimensional embodiment of the system of the present invention as shown in
`Figure 8.
`
`"
`
`.,
`
`Oracle-1006 p. 10
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT/US96/13195
`
`5
`
`10
`
`15
`
`Figures 13A and 13B illustrate a second embodiment of a stage 1 cell in
`a two-dimensional embodiment of the system of the present invention, including
`an additional input data bus.
`
`Figures 14A - 14D illustrate a second embodiment of a stage 3 cell in a
`two-dimensional embodiment of the system of the present invention, including
`fill operation circuitry.
`
`Figures 15A - 15F illustrate a third embodiment of a stage 3 cell in a two(cid:173)
`dimensional embodiment of the system of the present invention, including
`circuitry for providing additional multiplexer control to the stage.
`
`Figures 16A • 16F illustrate a fourth embodiment of a stage 3 cell in a
`two-dimensional embodiment of the system of the present invention, including
`fill operation circuitry and circuitry for providing additional multiplexer control
`to the stage.
`
`Figure 17 illustrates an embodiment of a cell that employs bus overloading
`by using the same bus to provide fill data and additional multiplexer control data.
`
`Figure 18 illustrates an embodiment of two adjacent cells that employ bus
`overloading by using the same bus to provide fill data and additional multiplexer
`control data to each of the adjacent cells and also employ bus sharing by using
`the same additional multiplexer control data for both adjacent cells.
`
`20
`
`Figure 19 illustrates a 16-bit data word having bit index numbers ranging
`from O - 15.
`
`Figure 20 illustrates an example of a simple rotate operation being
`performed on a 16-bit data word.
`
`Figure 21 illustrates an example of a bit reversal operation being pe1formed
`on a 16-bit data word.
`
`25
`
`Figure 22 illustrates an example of a two-way shuffle operation being
`performed on a 16-bit data word .
`
`Figure 23 illustrates an example of a two-way deal operation being
`pe1formed on a 16-bit data word.
`
`J
`
`•
`
`•
`
`Oracle-1006 p. 11
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO97/07451
`
`PCT/US96/13195
`
`- 10 -
`
`Figure 24 illustrates the equivalency between perfonning a transpose
`operation on a 3 x 5 rectangle and pe1fo1ming a three-way shuffle operation on
`the 15 elements within the rectangle.
`
`Figure 25 illustrates the equivalency shown in Figure 24 where the rows
`and columns of the rectangles have been renumbered.
`
`5
`
`"
`
`Figure 26 illustrates the bit index of an element in a 4 x 8 rectangle before
`and after a transpose operation.
`
`Figure 27 illusu·ates an example of an outer group shuffle operation being
`performed on a 32-bit dataword where the outer group size for the operation is 8.
`
`10
`
`Figure 28 illustrates an example of an inner group shuffle operation being
`pe1formed on a 32-bit dataword, where the inner group size for the operation is 4.
`
`Figure 29 illustrates an example of an outer/inner group shuffle operation
`being perfmmed on a 128-bit data word, where the inner group size for the operation
`is 8 and the outer group size is 32.
`
`15 DETAILED DESCRIPTION
`
`A general method for arbitrarily permuting a sequence of elements, and an
`extension to the general method which allows some extensions to permutations
`which involve the copying of individual elements, is described in detail hereinafter.
`A system based on the extended general method which implements a class of
`operations which can pe1form the pdmitive steps of the extended general method, as
`well as a much larger class of operations which generally involve the copying and/or
`permuting of data. is also described. Finally, several classes of instructions for
`performing operations which generally involve the copying and/or permuting of
`elements are described.
`
`In the following description. numerous specific details are set faith, such as
`data path width within a microprocessor and specific microprocessor operations in
`order to provide a thorough understanding of the present invention. It will be
`obvious, however, to one skilled in the ait that these specific details need not be
`employed to practice the present invention. In other instances, well-known logic
`gates. as well as some simple combinatorial circuits which may be built from such
`
`20
`
`25
`
`30
`
`Oracle-1006 p. 12
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT/US96/13195
`
`- 11 -
`
`gates. have not been desctibed in detail in order to avoid unnecessaiily obscming the
`present invention.
`
`General Method
`
`The general method of the present invention is a method for arbitrruily
`permuting a sequence of elements. The elements could be physical objects (for
`example. glass bottles). or they could be data values (for example. data bits). The
`general method for pe1muting a sequence of elements can be used as long as the
`ptimitive steps of the method can be performed on the elements.
`
`The general method can pe1form an arbitrary pennutation of w elements. by
`breaking it down into an n -dimensional rectangle whose sides c01,-espond to any set
`of factors of w, i.e., w = f 1h · · · fn. In one embodiment, the elements to be
`pennuted are bits. The method consists of a sequence of 2n -1 sets of
`pennutations across the vaiious dimensions. The specific order of the sets of
`pennutations is shown below. This method is not restricted to values of w which
`ru·e powers of two.
`
`5
`
`10
`
`15
`
`Pennutation 1 :
`
`Pennutation 2 :
`
`Pe1fonn ~ independent pennutations of f n elements along
`fn
`dimension n .
`
`Pe1fonn _;.__ independent pe1mutations of fn-I elements
`fn-1
`along dimension n - 1 .
`
`Pennutation n -1 :
`
`Petfo1m ~ independent pennutations of h elements along
`
`dimension 2 .
`
`Pe1mutation 11 :
`
`Pe1form ;~ independent permutations of Ji elements along
`
`dimension 1 .
`
`•
`
`Pennutation n + 1 :
`
`~ independent permutations of h elements along dimension
`
`2.
`
`Oracle-1006 p. 13
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT/US96/13195
`
`- 12 -
`
`Pe1mutation 2n - 2 :
`
`Pe1f01m ~ independent permutations of f n- l elements
`fn-l
`along dimension n -1 .
`
`Pe1mutation 2n - 1 :
`
`Pe1form ~ independent pe1mutations of f 11 element~ along
`fn
`dimension n .
`
`For a given dimension i. to pe1form ;; independent pe1mutations of Ji
`
`elements along dimension i, means that each one-dimensional slice of the rectangle
`obtained by holding constant the coordinates of each dimension. except dimension
`i, is independently permuted. More formally. for each slice through dimension i, a
`permutation function p can be defined such that element (x1, · · ·, x11 ) in the new,
`pe11nuted rectangle comes from elen:ient ( x1, · · · , xi- I• p( xi ) , xi+ 1, · · · , Xn) in the
`old, unpermuted rectangle. Note that there are ~ independent p functions
`Ii
`involved in permuting across dimension i, one for each one-dimensional slice
`through dimension i obtained by holding the coordinates of the other dimensions
`constant. In other words, there is a separate p function for each set of
`(x1, · · ·, xi-1 • xi+ 1, · · ·, xn) values. Fu1thermore, since each p function specifies a
`permutation, no two values of a given p function may be the same, i.e.,
`x-:;.: y ⇒ p(x)-:;.: p(y). This ensures that each element of the old, unpermuted
`rectangle appears exactly once in the new, pe1muted rectangle.
`
`The choice of which dimension to call dimension 1, which dimension to call
`dimension 2, etc. is completely flexible, and may be made in whatever way best
`suits the needs of a particular embodiment. The only requirement is that the pattern
`shown above is followed, i.e., pe1mutation steps I and 2n - 1 involve the same
`dimension, permutation steps 2 and 2n 2 involve the same dimension. etc., with
`each dimension but one being selected twice. and the remaining dimension being
`selected once in pe1m utation step n .
`
`5
`
`IO
`
`15
`
`20
`
`•
`
`Oracle-1006 p. 14
`Oracle v. Teleputers
`IPR2021-00078
`
`
`
`WO 97/07451
`
`PCT /US96/13 l 95
`
`,,.
`
`Determining the Permutation Steps of the General Method
`
`The following procedures show how the individual permutation steps of the
`general method may be dete1mined for any pe1mutation which the general method is
`to perform. At the same time. it should become clear that this method can be used
`to perform any arbitrary permutat