`(12) Reissued Patent
`Morikawa et al.
`
(10) Patent Number: US RE43,145 E
(45) Date of Reissued Patent: *Jan. 24, 2012
`
`(54) PROCESSOR WHICH CAN FAVORABLY
`EXECUTE A ROUNDING PROCESS
`COMPOSED OF POSITIVE CONVERSION
`AND SATURATED CALCULATION
`PROCESSING
`
(75) Inventors: Toru Morikawa, Kadoma (JP); Nobuo Higaki, Kadoma (JP);
Akira Miyoshi, Kadoma (JP); Keizo Sumida, Kadoma (JP)
`
`(73) Assignee: Panasonic Corporation, Osaka (JP)
`
( * ) Notice: This patent is subject to a terminal disclaimer.
`
`(21) Appl.No.: 11/016,920
`
(22) Filed: Dec. 21, 2004

Related U.S. Patent Documents

Reissue of:
(64) Patent No.: 6,237,084
     Issued: May 22, 2001
     Appl. No.: 09/399,577
     Filed: Sep. 20, 1999

U.S. Applications:
(62) Division of application No. 10/366,502, filed on Feb.
13, 2003, now Pat. No. Re. 39,121, which is a division
of application No. 08/980,676, filed on Dec. 1, 1997,
now Pat. No. 5,974,540.
`
(30) Foreign Application Priority Data

Nov. 29, 1996 (JP) ........ 8-320423
`
(51) Int. Cl.
     G06F 9/302 (2006.01)
(52) U.S. Cl. ........ 712/221; 708/551; 708/552
(58) Field of Classification Search ........ 708/550,
     708/551, 552, 203, 204, 208
     See application file for complete search history.
`
`
`Primary Examiner — Richard Ellis
`(74) Attorney, Agent, or Firm — McDermott Will & Emery
`LLP
`
(57) ABSTRACT
`A processor which executes positive conversion processing,
`which converts coded data into uncoded data, and saturation
`calculation processing, which rounds a value to an appropri-
`ate number of bits, at high speed. When a positive conversion
`saturation calculation instruction “MCSST D1” is decoded,
`the sum-product result register 6 outputs its held value to the
`path P1. The comparator 22 compares the magnitude of the
held value of the sum-product result register 6 with the coded
`32-bit integer “0x0000_00FF”. The polarity judging unit 23
`judges whether the eighth bit of the value held by the sum-
`product result register 6 is “ON”. The multiplexer 24 outputs
`one of the maximum value “0x0000_00FF” generated by the
`constant generator 21, the zero value “0x0000_0000” gener-
`ated by the zero generator 25, and the held value of the
`sum-product result register 6 to the data bus 18.
`
`44 Claims, 17 Drawing Sheets
`
`
`
`
`PETITIONER EXHIBIT 1025-0001
`
`
`
`
`
`
FIG. 1 (PRIOR ART) (Sheet 1 of 17): ARITHMETIC LOGIC UNIT; SUM-PRODUCT RESULT REGISTER
`
`
`
`
`
`
FIG. 6 (Sheet 6 of 17): MACCB INSTRUCTION

MULTIPLIER READ ADDRESS INDICATION
  11: MCR
  00: REGISTER D0
  01: REGISTER D1
  10: REGISTER D2

MULTIPLICAND READ ADDRESS INDICATION
  11: MCR
  00: REGISTER D0
  01: REGISTER D1
  10: REGISTER D2

INDICATION OF CONTENT OF ELEMENTAL OPERATION
  1: MULTIPLICATION
  0: NONE

INDICATION OF CALCULATED CONTENT OF ALGEBRAIC SUM
  1: ADDITION
  0: NONE

INDICATION OF STORAGE ADDRESS FOR SUM-PRODUCT RESULT
  1: MCR
  0: NONE
`
`
`
FIG. 7 (Sheet 7 of 17): MCSST INSTRUCTION

POSITIVE CONVERSION SATURATION CALCULATION WIDTH INDICATION
  00: 24bit POSITIVE CONVERSION
  01: 16bit POSITIVE CONVERSION
  11: 8bit POSITIVE CONVERSION

STORAGE ADDRESS INDICATED
  00: REGISTER D0
  01: REGISTER D1
  10: REGISTER D2
  11: REGISTER D3
`
`
`
FIG. 8A (Sheet 8 of 17): SUM-PRODUCT RESULT (code bit; bit positions 8, 9, 16, 24, 32); MATRIX MULTIPLICATION RESULT Hij
FIG. 8B (Sheet 8 of 17): SUM-PRODUCT RESULT; 32767 (7FFF); -32767
`
`
`
FIG. 9 (Sheet 9 of 17): truth value table
Columns: LOGIC VALUE X, LOGIC VALUE Y, SELECTED INPUT VALUE
Selected input values, in row order: 0x0000_00FF; 0x0000_0000; 0x0000_0000; STORED VALUE OF SUM-PRODUCT RESULT REGISTER
`
`
`
FIG. 10 (Sheet 10 of 17): EXAMPLE OPERATION: D0 × D1 (0x7f × 0x70)
  REGISTER STORED VALUE: D0
  OUTPUT OF LOWER-ORDER 32 BITS
  POSITIVE CONVERSION SATURATION CALCULATION CIRCUIT: MSB: 0; 0x00003790 > 0x000000ff → 0x000000ff
  REGISTER STORED VALUE: D1, 0x000000ff
  MEMORY STORED VALUE: 0xff
`
`
`
FIG. 11 (Sheet 11 of 17): EXAMPLE OPERATION: D0 × D1 (0x7f × 0x80)
  CODE EXTENSION CIRCUIT: 32-bit values 0x0000007f and 0xffffff80
  64-bit product: 0xffffffffffffc080
  OUTPUT OF LOWER-ORDER 32 BITS: 0xffffc080
  POSITIVE CONVERSION SATURATION CALCULATION CIRCUIT: MSB: 1 → 0x00000000
  REGISTER STORED VALUE: D1, 0x00000000
  MEMORY STORED VALUE: 0x00
`
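The two example operations of FIG. 10 and FIG. 11 can be retraced with the following minimal Python sketch. The function names are hypothetical and chosen for illustration only; the sketch assumes that the 8-bit operands are sign-extended to 32 bits, that the multiplication is signed, and that the most significant bit of the 32-bit product is the bit inspected by the polarity judgment. It is not taken from the patent.

```python
MASK32 = 0xFFFF_FFFF

def sign_extend_8_to_32(byte):
    """Sign-extend an 8-bit value to a 32-bit two's-complement pattern."""
    byte &= 0xFF
    return (byte | 0xFFFF_FF00) if byte & 0x80 else byte

def to_signed32(pattern):
    """Interpret a 32-bit pattern as a signed integer."""
    pattern &= MASK32
    return pattern - (1 << 32) if pattern & 0x8000_0000 else pattern

def multiply_low32(a_pattern, b_pattern):
    """Signed multiply; keep only the lower-order 32 bits of the product."""
    return (to_signed32(a_pattern) * to_signed32(b_pattern)) & MASK32

def positive_convert_saturate(pattern, max_value=0x0000_00FF):
    """MSB set -> zero; above max_value -> max_value; otherwise unchanged."""
    if pattern & 0x8000_0000:
        return 0
    return max_value if pattern > max_value else pattern

def example(d0_byte, d1_byte):
    """Return (lower-order 32 bits of D0 x D1, value after rounding)."""
    product = multiply_low32(sign_extend_8_to_32(d0_byte),
                             sign_extend_8_to_32(d1_byte))
    return product, positive_convert_saturate(product)

for d0, d1 in ((0x7F, 0x70), (0x7F, 0x80)):
    product, rounded = example(d0, d1)
    print(f"{d0:#04x} x {d1:#04x}: product {product:#010x}, rounded {rounded:#010x}")
```

Under these assumptions the first call gives the product 0x00003790 rounded to 0x000000ff, and the second gives 0xffffc080 rounded to 0x00000000, matching the values legible in the two figures.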
`
`
FIG. 12A (Sheet 12 of 17): INSTRUCTION FETCH STAGE; INSTRUCTION DECODING STAGE; EXECUTION STAGE; REGISTER WRITE STAGE
`
`
`
FIG. 12B (Sheet 13 of 17)
`
`
`
Sheet 14 of 17: instruction format

POSITIVE CONVERSION SATURATION CALCULATION WIDTH INDICATION
  00: 24bit POSITIVE CONVERSION
  01: 16bit POSITIVE CONVERSION
  11: 8bit POSITIVE CONVERSION

STORAGE ADDRESS INDICATED
  11: MCR
  00: REGISTER D0
  01: REGISTER D1
  10: REGISTER D2

READ ADDRESS INDICATION
  11: MCR
  00: REGISTER D0
  01: REGISTER D1
  10: REGISTER D2
`
`
`
`
`
`
`
`
FIG. 16 (Sheet 17 of 17): MULBSST INSTRUCTION

MULTIPLIER READ ADDRESS INDICATION
  11: MCR
  00: REGISTER D0
  01: REGISTER D1
  10: REGISTER D2

MULTIPLICAND READ ADDRESS INDICATION
  11: MCR
  00: REGISTER D0
  01: REGISTER D1
  10: REGISTER D2

POSITIVE CONVERSION SATURATION CALCULATION WIDTH INDICATION
  01: 24bit POSITIVE VALUE
  10: 16bit POSITIVE VALUE
  11: 8bit POSITIVE VALUE

CALCULATION CONTENT INDICATION
  1: MULTIPLICATION
  0: NONE
`
`
`
`PROCESSOR WHICH CAN FAVORABLY
`EXECUTE A ROUNDING PROCESS
`COMPOSED OF POSITIVE CONVERSION
`AND SATURATED CALCULATION
`PROCESSING
`
`Matter enclosed in heavy brackets [ ] appears in the
`original patent but forms no part of this reissue specifica-
`tion; matter printed in italics indicates the additions
`made by reissue.
`
More than one reissue application has been filed for the
reissue of U.S. Pat. No. 6,237,084. The reissue applications
are application Ser. Nos. 10/366,502 (reissued as RE39,121
on Jun. 6, 2006) and 11/016,920 (this application), all of
which are divisional reissues of U.S. Pat. No. 6,237,084. This
application is a divisional reissue of application Ser. No.
10/366,502 filed Feb. 13, 2003, which is a reissue of Ser. No.
09/399,577 filed on Sep. 20, 1999, now U.S. Pat. No.
6,237,084, which is a divisional of application Ser. No.
08/980,676 filed Dec. 1, 1997, now U.S. Pat. No. 5,974,540.
This is a divisional application of U.S. Ser. No. 08/980,676
now U.S. Pat. No. 5,974,540 filed Dec. 1, 1997.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`The present invention relates to a processor that performs
`processing according to instruction sequences that are stored
`in a ROM or the like.
`
`2. Background of the Invention
`In recent years, there has been a visible increase in the use
`of application software that can interactively reproduce vari-
`ous kinds of data, such as video data, still image data, and
`audio data, that have been compressed according to tech-
`niques such as frame encoding, field encoding, or motion
`compensation. As such software has been developed, there
`has been increasing demand for multimedia-oriented proces-
`sors that can efficiently execute the software. These multime-
`dia-oriented processors are processors designed with a spe-
`cial architecture to facilitate programming, such as the
compression and decompression of video and audio data. The
`high-speed processing required for handling video data is the
`matrix multiplication of compressed data that has N*N
matrix elements with coefficient data that also has N*N
matrix elements. Representative examples of compressed
data that has N*N matrix elements are the luminescence
block composed of 16*16 luminescence elements, the blue
`color difference block (Cb block) composed of 8*8 color
`difference elements, and the red color difference block (Cr
`block) composed of 8*8 color difference elements used in
`MPEG (Moving Pictures Experts Group) techniques. The
`matrix multiplication for compressed data referred to here is
`performed very frequently when executing the approxima-
`tion calculations for an inverse DCT (Discrete Cosine Trans-
`form) in image compression methods such as MPEG and
`JPEG (Joint Photographic Experts Group).
`The following is a description of conventional multimedia-
`oriented processors that can perform high-speed matrix mul-
`tiplication. The basic architecture of conventional multime-
`dia-oriented processors is provided with a sum-product result
`register (hereinafter simply referred to as an MCR register) as
`hardware, and is provided with an instruction set that includes
`a “MOV MCR, **” transfer instruction for transferring a sum-
`product value.
`
`
An example of the hardware construction of a conventional
`multimedia-oriented processor is shown in FIG. 1. As shown
`in FIG. 1, the arithmetic logic unit (hereinafter, “ALU”) 61
`performs the multiplication of an element Fij that forms part
`of the compressed data and an element Gji that forms part of
`the coefficient matrix in accordance with a multiplication
`instruction. The ALU 61 also reads the sum-product value
`stored in the sum-product result register 62, adds the multi-
`plication result of Gji*Fij to the read sum-product value, and
`has the result of this addition stored in the sum-product result
`register 62. By repeating the above calculation, a sum-prod-
`uct value is accumulated in the sum-product result register 62.
`Once the multiplication has been performed a predetermined
`number of times, the programmer issues a sum-product value
`transfer instruction. By issuing a transfer instruction, the
`accumulated value in the sum-product result register 62 is
`transferred to the general registers, and is used as the matrix
`multiplication result for one row and one column. By per-
`forming N*N iterations of the above processing, the matrix
`multiplication of N*N compressed data and an N*N coeffi-
`cient matrix can be completed.
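The accumulate-then-transfer sequence described above can be sketched in Python as follows. This is an illustrative model only: the names `sum_product` and `matrix_multiply` are hypothetical, the local variable `mcr` stands in for the sum-product result register 62, and no register width is modeled.

```python
def sum_product(coefficients, data):
    """Accumulate coefficient*data products for one result element."""
    mcr = 0                          # sum-product result register, cleared first
    for g, f in zip(coefficients, data):
        mcr += g * f                 # multiply, add to the stored sum, store back
    return mcr                       # transfer instruction: read the accumulated value

def matrix_multiply(G, F):
    """N*N matrix product H = G x F built from N*N sum-product passes."""
    n = len(G)
    return [[sum_product(G[i], [F[k][j] for k in range(n)])
             for j in range(n)]
            for i in range(n)]

G = [[1, 2], [3, 4]]
F = [[5, 6], [7, 8]]
print(matrix_multiply(G, F))
```

Each result element costs one pass over a row of coefficients and a column of data followed by a single transfer, which is why the transfer (and any correction applied after it) is repeated N*N times.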
`When a conventional multimedia-oriented processor is
used, however, positive correction saturation operations for
amending the sum-product value pose many difficulties for
programmers.
`Positive conversion processing refers to the conversion of a
`sum-product value that is a negative value into either zero or
`a positive value. Normally, compressed data is expressed as a
`coded relative value that reflects the relation of the present
`value to the preceding and succeeding values. As a result,
`there are many cases when the sum of products for each
`element in the compressed data and the corresponding coef-
`ficients is a negative value. Most reproduction-related hard-
`ware, such as displays and speakers, however is only able to
`process uncoded data, so that when the sum-product values
`are to be reproduced, it is first necessary to perform positive
`conversion processing.
`Saturation calculation processing refers to processing that
`sets all values that exceed a given range (or, in other words,
`which are “saturated”) at a predetermined value. This is to
`say, when an element that includes an erroneous bit generated
`during transfer is used in a sum-product calculation as part of
`the sum-product processing for compressed data, there is an
`increase in the probability of the sum-product value exceed-
`ing a value that can be expressed by the stated number of bits.
`Since most reproduction-related hardware is only physically
capable of reproducing uncoded data with a fixed valid number
of bits, such as eight bits, saturation processing is required
`to convert the sum-product value into a value that can be
`expressed using the valid number of bits.
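The two corrections just defined can be written as a pair of small Python functions. This is a minimal sketch under stated assumptions: negative values are converted to zero (the text also allows conversion to a positive value), the valid number of bits defaults to eight, and all function names are hypothetical.

```python
def positive_conversion(value):
    """Convert a negative sum-product value to zero (assumed choice)."""
    return max(value, 0)

def saturation(value, valid_bits=8):
    """Set any value above the largest valid_bits-wide value to that maximum."""
    return min(value, (1 << valid_bits) - 1)

def round_for_output(value, valid_bits=8):
    """Positive conversion followed by saturation."""
    return saturation(positive_conversion(value), valid_bits)

for v in (-300, 100, 1000):
    print(v, "->", round_for_output(v))
```

With eight valid bits, a negative input becomes 0, an in-range input is left unchanged, and an input above 255 becomes 255.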
`It has been conventional practice to perform this kind of
positive value conversion processing and saturation calculation
processing by converting the sum-product value using a
`subroutine that corrects the sum-product value. An example
`of a subroutine that corrects the sum-product value is
`explained below. In this example, the register width and the
`calculation width of the calculation unit are 32 bits, with the
`width of the MCR being 32 bits, and the sum-product value
`being expressed as a coded 16-bit integer. The data that can be
`handled by the reproduction-related hardware needs to be
`expressed using uncoded 8-bit integers. This subroutine is set
`as using the data register D0 for storing the calculation result.
`Each instruction is expressed using two operands, with the
`left and right operands being respectively called the first and
`the second operands. The second operand is used both to
indicate the transfer address of a transfer instruction and the
storage address of an arithmetical instruction.
`
`
`
`
Instruction 1: MOV 	MCR,D0
Instruction 2: CMP 	0xFFFF_8000,D0
Instruction 3: BCC 	CARRY
Instruction 4: MOV 	0x0000_0000,D0
Instruction 5: BRA 	END
CARRY:
Instruction 6: CMP 	0x0000_00FF,D0
Instruction 7: BCS 	END
Instruction 8: MOV 	0x0000_00FF,D0
END: (end of positive conversion saturation calculation
processing)
`Describing the above instructions in order, Instruction 1,
`“MOV MCR,D0”, transfers the stored value of the MCR
`register into the data register D0.
Instruction 2, “CMP 0xFFFF_8000,D0”, compares the value in
the data register
`with the immediate “0xFFFF_8000”, where “0x” shows that
`the value is given in hexadecimal. This comparison is per-
`formed by subtracting the immediate “0xFFFF_8000” given
`in the first operand from the stored value of the data register
`D0 given in the second operand.
`The sixteenth bit of the immediate “0xFFFF_8000” in
`Instruction 2 is the code bit used for a 16-bit coded integer, so
that when the stored value of the data register D0 is greater
than the immediate “0xFFFF_8000”, this shows that the
`value stored in the MCR is a negative number.
`On the other hand, when the stored value of the D0 register
is less than “0xFFFF_8000”, this shows that the value stored
`by the MCR is a positive number. If this number is a positive
`number, a carry is performed and the carry flag in the flag
`register is set.
The letter “B” in the “BCC” in Instruction 3 stands for
“Branch”, while the letters “CC” stand for “Carry Clear”.
When the comparison in Instruction 2 finds that the stored
value of the register D0 is less than the immediate
“0xFFFF_8000”, a branch is performed to Instruction 6 which
has the label “CARRY”. Conversely, when the comparison in
Instruction 2 finds that the stored value of the register D0 is
greater than the immediate “0xFFFF_8000”, Instruction 4,
“MOV 0x0000_0000,D0” transfers the value zero into the
`register D0, amending the sum-product value to zero. After
`this amendment, the unconditional branch “BRA END” in
`Instruction 5 is performed to transfer the processing to the
`“END” label, thereby completing the positive conversion
`processing.
`The processing described above is performed when the
`stored value of the register D0 is negative. The following is a
`description of the processing performed when the stored
value of the register D0 is less than the immediate
`“0xFFFF_8000”.
In such a case, Instruction 6, “CMP 0x0000_00FF,D0”,
compares the stored value of the register
`D0 with the immediate “0x0000_00FF”. This comparison is
`performed by subtracting the immediate “0x0000_00FF”
`given in the first operand from the stored value of the data
`register D0 given in the second operand. When the stored
`value of the D0 register is smaller than the immediate
`“0x0000_00FF”, a carry is performed and the carry flag in
`the flag register is set.
`The letters “CS” in Instruction 7, “BCS END”, stand for
`“Carry Set”, so that when the carry flag is set, a branch is
`performed to the label “END” from Instruction 7.
`When the carry flag is not set, no branch is performed in
`Instruction 7 and processing advances to Instruction 8, “MOV
`0x0000_00FF,D0”, where the immediate “0x0000_00FF”
`is transferred into the register D0 to amend the calculation
`result to “0x0000_00FF”, thereby completing the saturation
`calculation processing.
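The control flow of the eight-instruction subroutine can be traced with the following Python sketch, which records which instructions are executed on each path. It is an illustrative model only: the function name is hypothetical, the comparisons are modeled as unsigned 32-bit comparisons, a value equal to the immediate “0xFFFF_8000” is assumed to be treated as negative, and the MCR is assumed to hold a sign-extended 16-bit sum-product value.

```python
def amend_sum_product(mcr):
    """Return (amended D0 value, list of executed instruction numbers)."""
    executed = [1]
    d0 = mcr & 0xFFFF_FFFF             # Instruction 1: MOV MCR,D0

    executed.append(2)                 # Instruction 2: CMP 0xFFFF_8000,D0
    below_negative_range = d0 < 0xFFFF_8000

    executed.append(3)                 # Instruction 3: BCC CARRY
    if not below_negative_range:       # negative value: fall through
        executed.append(4)             # Instruction 4: MOV 0x0000_0000,D0
        d0 = 0
        executed.append(5)             # Instruction 5: BRA END
        return d0, executed

    executed.append(6)                 # Instruction 6: CMP 0x0000_00FF,D0
    below_maximum = d0 < 0x0000_00FF

    executed.append(7)                 # Instruction 7: BCS END
    if below_maximum:                  # in range: value kept as is
        return d0, executed

    executed.append(8)                 # Instruction 8: MOV 0x0000_00FF,D0
    return 0x0000_00FF, executed

for value in (0xFFFF_FF00, 0x0000_0040, 0x0000_3790):
    result, path = amend_sum_product(value)
    print(f"{value:#010x} -> {result:#010x} via instructions {path}")
```

In this model every path passes through at least two branch instructions, and five or six instructions are executed for each amended value.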
`
`
The problem with the sum-product value amendment process
described above lies in the considerable increase in code
size caused by the insertion of the above eight instructions for
one amendment of a sum-product value. When the program is
written into a ROM to embed the software into the information
processing apparatus, the required amount of installed
ROM will need to be increased by an amount equal to
this increase in code size, leading to an increase in manufacturing
cost. A large number of manufacturers of domestic
`appliances such as digital video players, electronic note-
`books, and word processors seek to improve on their rivals’
`products by using their own decompression processing pro-
`grams, although the installation of such decompression pro-
`cessing programs presently has the drawback of increasing
`costs by increasing the required amount of ROM, making
`such installation problematic.
`There is also the problem that since eight instructions need
`to be executed to correct one sum-product value, there is a
`large increase in processing time. When, as shown in FIG. 2,
an approximation calculation for an inverse DCT is performed
by multiplying compressed data Fij (where i,j=1,2,3,4,5 . . . 8)
composed of 8*8 elements with a coefficient matrix
Gji (where i,j=1,2,3,4,5 . . . 8) also composed of 8*8 elements
to produce the multiplication result matrix Hij (where
i,j=1,2,3,4,5 . . . 8), the calculation of the matrix multiplication
result element H11 requires the sum-product processing of
the multiplication results of one column of compressed data
elements F11, F21, F31, F41, F51, F61, F71, F81 by one row
of coefficient data elements G11, G12, G13, G14, G15, G16,
G17, G18. The result is then subjected to positive conversion
saturation calculation processing. Following this, the calculation
of the matrix multiplication result element H12
requires the sum-product processing of the multiplication
results of the column of compressed data elements F12, F22,
F32, F42, F52, F62, F72, F82 by one row of coefficient data
elements G11, G12, G13, G14, G15, G16, G17, G18, with the
sum-product result then being subjected to positive conversion
saturation calculation processing.
`The same sum-product processing and positive conversion
`saturation calculation processing is required to obtain the
other matrix multiplication result elements H21, H31, H41,
H51, H61, H71, H81, . . . , and since there are 64 elements in
the coefficient matrix Gij (where i,j=1,2,3,4,5 . . . 8), the
sum-product value amending subroutine for positive conversion
saturation calculation processing needs to be performed
64 times. This sum-product value amending subroutine
includes branch instructions (as Instructions 3, 5, and 7), so
that when this sum-product value amending subroutine is
executed, branches will occur regardless of whether negative
values or saturation occur, so that the 64 iterations of the
subroutine will not be performed smoothly. When attempts
are made to improve the processing speed of the sum-product
operation by introducing pipeline processing to the processor,
the execution of the stated three branch instructions will result
`in a noticeable drop in processing efficiency.
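The amount of repeated work described above can be made concrete with a short Python sketch that multiplies two 8*8 matrices and counts how often the amendment is applied. The function name and the counter names are hypothetical, and the amendment is modeled simply as clamping to the range 0 to 0xFF; the sketch illustrates the counts only and is not the patent's subroutine.

```python
def matrix_product_with_amendment(G, F, max_value=0xFF):
    """Compute H = G x F, amending every element; also return operation counts."""
    n = len(G)
    stats = {"multiply_accumulates": 0, "amendments": 0}
    H = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += G[i][k] * F[k][j]          # one sum-product step
                stats["multiply_accumulates"] += 1
            H[i][j] = min(max(acc, 0), max_value)  # one amendment per element
            stats["amendments"] += 1
    return H, stats

n = 8
G = [[1 if i == k else 0 for k in range(n)] for i in range(n)]       # identity
F = [[(k * n + j) * 10 - 100 for j in range(n)] for k in range(n)]  # sample data
H, stats = matrix_product_with_amendment(G, F)
print(stats)
```

For 8*8 operands the amendment runs 64 times, once per result element, so any branch penalty inside it is paid 64 times per block.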
`In order to increase the speed of the matrix multiplication,
`it is possible to install a specialized circuit for performing
`matrix multiplication. However, if all of the matrix multipli-
`cations are performed by a specialized circuit, there would be
`a vast increase in hardware, and the processor characteristic
`known as versatility, whereby the processor executes a variety
`of processes in accordance with the program written by the
`programmer, is lost. If the versatility of the processor is lost,
there is the risk that the processor will not be able to respond
to programmers’ wishes, and so will not, for example, be able
to execute an original decompression processing program.

SUMMARY OF THE INVENTION

It is a primary object of the present invention to provide a
processor that can perform a rounding process made up of a
positive conversion process and a saturation calculation process
at high speed, while minimizing the increase in code size
caused by the rounding process.
The stated object can be achieved by a processor that
successively decodes and executes instructions in an instruction
sequence, the instruction sequence including instructions that
indicate a storage address of a value used in an operation, the
processor including: a detecting unit for detecting whether a
next instruction to be decoded includes an operation content
indication showing that the next instruction is a correction
instruction and, if present, reading the operation content
indication; and a rounding unit for rounding, when the detecting
unit has detected an operation content indication showing that
the next instruction is a correction instruction, a coded m-bit
integer stored at a storage address indicated by the instruction
to a value expressed as an uncoded s-bit integer (where s<m).
With the stated construction, the processing for rounding
values is performed once each time a correction instruction is
detected out of the instruction sequence, so that the rounding
process can be executed by the programmer writing only one
instruction.
As the rounding process is performed according to one
correction instruction, the execution time for one execution of
the rounding process is extremely short. When the rounding
of calculated values is required very often, such as when
decompressing data, there will not be a significant increase in
the time taken by the decompression processing.
Since the rounding process can be performed by simply
executing a correction instruction, when the processor
attempts to perform a sum-products operation at high speed
through pipeline processing, there will be no confusion in the
pipeline. Accordingly, the code size of the instruction
sequence can be reduced and the execution of the instruction
sequence made faster by adding a small amount of hardware
to the processor.
The stated object can also be achieved by a processor that
successively decodes and executes instructions in an instruction
sequence, the instruction sequence including instructions
that indicate a storage address of a value to be used in an
operation, the processor including: a first detecting unit for
detecting whether a next instruction to be decoded includes an
indication showing that the instruction has a calculation
performed; a second detecting unit for detecting whether the next
instruction to be decoded includes an indication showing that
calculation is to be performed and that rounding is to be
performed on a calculation result; a calculating unit for
performing, when the first detecting unit detects that the next
instruction includes an indication showing that the instruction
has a calculation performed, a calculation using an m-bit
integer in accordance with the indication; and a rounding unit
for rounding, when the second detecting unit has detected that
the next instruction to be decoded includes an indication
showing that rounding is to be performed, a calculation result
of a calculation that uses an m-bit integer to a value expressed
as an uncoded s-bit integer (where s<m).
With the stated construction, correction instructions for
performing a rounding process of a coded calculation result
are provided, so that the two processes composed of a calculation
process and a rounding process can be performed in a
single step. As a result, positive conversion saturation calculation
processing is performed in the same step as the calculation
processing, so that the effective number of steps taken by
the positive conversion saturation calculation processing is
zero.
`
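The rounding of a coded m-bit integer to an uncoded s-bit integer (where s<m), both on its own and combined with a calculation in a single step, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: "coded" is taken to mean two's-complement signed, negative values are rounded to zero, and the function names and default widths (m=32, s=8) are hypothetical choices for illustration.

```python
def to_signed(pattern, m):
    """Interpret an m-bit pattern as a two's-complement signed integer."""
    pattern &= (1 << m) - 1
    return pattern - (1 << m) if pattern >> (m - 1) else pattern

def round_coded_to_uncoded(pattern, m=32, s=8):
    """Round a coded m-bit integer to an uncoded s-bit integer, s < m."""
    if not 0 < s < m:
        raise ValueError("s must satisfy 0 < s < m")
    value = to_signed(pattern, m)
    if value < 0:
        return 0                       # positive conversion
    return min(value, (1 << s) - 1)    # saturation to s bits

def multiply_and_round(a, b, m=32, s=8):
    """Calculation and rounding as one operation on m-bit coded operands."""
    product = (to_signed(a, m) * to_signed(b, m)) & ((1 << m) - 1)
    return round_coded_to_uncoded(product, m, s)

print(hex(round_coded_to_uncoded(0x0000_3790)))           # 8-bit result
print(hex(round_coded_to_uncoded(0x0000_3790, s=16)))     # 16-bit result
print(hex(multiply_and_round(0x0000_007F, 0xFFFF_FF80)))  # negative product
```

Parameterizing the width s mirrors the 8-bit, 16-bit and 24-bit width indications that appear in the instruction format drawings; the second function corresponds to the construction in which the rounding adds no separate step after the calculation.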
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`These and other objects, advantages and features of the
`invention will become apparent from the following descrip-
`tion thereof taken in conjunction with the accompanying
drawings which illustrate a specific embodiment of the invention.
In the drawings:
`FIG. 1 shows a conventional construction composed of an
`ALU 61 and a sum-product result register 62;
`FIG. 2 gives a representation of multiplication of matrices
`composed of N*N elements;
`FIG. 3 shows the construction of the processor of the first
`embodiment of the present invention;
`FIG. 4 shows the construction of the operation execution
`apparatus 14 in the present embodiment;
`FIG. 5 shows an instruction sequence composing the
`matrix multiplication subroutine in the present embodiment;
FIG. 6 shows the instruction format of a sum-product function
multiplication instruction “MACCB D1,D1” in the
present embodiment;
FIG. 7 shows the instruction format of a positive conversion
saturation calculation instruction “MCSST” in the
present embodiment;
`FIG. 8A shows the 32-bit expressions that are the multi-
`plier, the multiplicand, the sum-product value, and the matrix
`multiplication result element;
FIG. 8B shows how the sum-product value is converted by
`the positive conversion saturation calculation circuit 3;
`FIG. 9 is a truth value table showing the relation of the
combination of the output values of the constant generator 21
`and the zero generator 25 with the output of the multiplexer
`24;
`FIG. 10 shows the flow of data when performing an 8*8 bit
`multiplication using a 32*32 bit multiplication/sum-product
`unit;
`FIG. 11 shows the flow of data when performing an 8*8 bit
`multiplication using a 32*32 bit multiplication/sum-product
`unit;
`FIG. 12A shows an example of the pipeline processing
`performed by the processor shown in FIG. 3;
`FIG. 12B shows the execution according to pipeline pro-
`cessing of a matrix multiplication subroutine inside the pro-
`cessor shown in FIG. 3;
FIG. 13 shows the instruction format of a positive conversion
saturation calculation instruction “MCSST” in the
applied example in the first embodiment;
`FIG. 14 shows the internal construction of the operation
`execution apparatus 14 in the first embodiment;
`FIG. 15 shows the internal construction of the operation
`execution apparatus 14 in the second embodiment; and
`FIG. 16 shows the instruction format of a positive conver-
`sion saturation calculation multiplication instruction “MulB-
`SST Dm,Dn”.
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
`
`First Embodiment
`
`The following is an explanation of the first embodiment of
`the present invention with reference to the drawings. FIG. 3
`shows the internal construction of the processor in the first
embodiment of the present invention, which can be seen to be
composed of a ROM 11, an instruction fetch circuit 12, a
`decoder 13, an operation execution apparatus 14, an address
`bus 17, and a data bus 18, with the address bus 17 and the data
`bus 18 being connected to the RAM 10.
The RAM 10 stores the compressed data Fij (i,j=1,2,3,4, . . . 8