Kizuka

US005796937A

[11] Patent Number: 5,796,937
[45] Date of Patent: Aug. 18, 1998
`[54] METHOD OF AND APPARATUS FOR
`DEALING WITH PROCESSOR
`ABNORMALITY IN MULTIPROCESSOR
`SYSTEM
`
[75] Inventor: Yoshitaka Kizuka, Kawasaki, Japan

[73] Assignee: Fujitsu Limited, Kawasaki, Japan
`
`[21] Appl. No.: 53(1,739
`
[22] Filed: Sep. 29, 1995

[30] Foreign Application Priority Data

Sep. 29, 1994 [JP] Japan .................... 6-235422

[51] Int. Cl.6 .................... G06F 11/00
[52] U.S. Cl. .................... 395/182.11; 395/182.09; 395/182.05; 364/268; 364/268.3
[58] Field of Search .................... 395/182.11, 182.09, 182.01, 181, 182.05
`
[56] References Cited

U.S. PATENT DOCUMENTS

3,787,816    1/1974   Hauck ........ 395/182.01
3,812,468    5/1974   Wollum ....... 395/182.09 X
3,937,936    2/1976   Saporito ..... 395/182.09
4,415,973   11/1983   Evans ........ 395/182.11 X
4,503,534    3/1985   Budde ........ 395/182.11 X
4,654,846    3/1987   Goodwin ...... 395/182.11
4,807,228    2/1989   Dahbura ...... 395/182.11
4,866,712    9/1989   Chao ......... 395/181 X
4,933,838    6/1990   Elrod ........ 395/182.01
5,003,464    3/1991   Ely .......... 395/182.09 X
5,214,778    5/1993   Glider ....... 395/181
Primary Examiner-Robert W. Beausoliel, Jr.
Assistant Examiner-Dieu-Minh Le
Attorney, Agent, or Firm-Staas & Halsey

[57] ABSTRACT
A multiprocessor system has processors for processing distributed works, a monitoring facility for detecting an abnormality in any one of the processors, an administration facility for providing information about the abnormal processor and information about a redundant processor, and a work allocation facility for seeking the distributed works of the abnormal processor from a work table according to these pieces of information and allocating the sought works to given ones of the processors. The system includes an abnormality measures table that selectively describes measures to be taken for each of the distributed works against an abnormality. The work allocation facility determines, for each of the distributed works of the abnormal processor, a measure to be taken according to the abnormality measures table and allocates the distributed works of the abnormal processor to given ones of the processors. If the abnormality is recursive, allocating any work for which a specific measure such as rerun or continuation is to be taken is suspended. If the redundant processor is being initialized, allocating works to the redundant processor is delayed.
`
`4 Claims, 18 Drawing Sheets
`
[Front-page drawing (excerpt of Fig. 1A): current processor(s) including P2; monitoring facility, to detect an abnormal processor of P1 to P3; correspondence table, to describe types of processors; abnormality classification table, to describe states of processors and possibility of recursive abnormality; administration facility, to identify abnormal and redundant processors, see if the redundant processor is being initialized, and see if the abnormality is recursive; bus 8; signals labeled "abnormality" and "information about abnormal and redundant processors".]
`
`AHM, Exh. 1006, p. 1
`
`
`
[Fig. 1A: current processors P1, P2, and P3 on bus 8; monitoring facility 1, to detect an abnormal processor of P1 to P3; administration facility 2, to identify abnormal and redundant processors, see if the redundant processor is being initialized, and see if the abnormality is recursive; correspondence table 4, to describe types of processors; abnormality classification table 5, to describe states of processors and possibility of recursive abnormality; signals labeled "abnormality" and "information about abnormal and redundant processors".]
`
`
`
[Fig. 1B: redundant processor P4; nonvolatile shared memory 9; work allocation facility 3, to allocate works of the abnormal processor to substitute processors after initialization of the redundant processor, and suspend allocation of works to be rerun or continued if the abnormality is recursive; work table 6, to describe works of processors; abnormality measures table 7, to describe measures to be taken for each work when an abnormality occurs; signal labeled "allocation state".]
`
`
`
[Fig. 2A: works distributed among current processors P1, P2, and P3 and redundant processor P4. Work labels shown: distributed OS; communication control a, b, and c; communication application B and C. Legend: P1 to P3: current processor; P4: initialized redundant processor.]
`
`
`
[Fig. 2B: reallocation of works among P1, P3, and P4 when current processor P2 (marked X) becomes abnormal, followed by restoration of P2. Work labels shown: distributed OS; communication control a, b, and c; communication application B and C; printing server. Legend: (i) continuation; (ii) rerun; hatched boxes: drawback and halt.]
`
`
`
[Fig. 2C: state of the works after a crush of P4 while P2 (marked X) is abnormal. Work labels shown: distributed OS; communication control a and c; communication application B and C; printing server. Legend: (i) continuation; (ii) rerun; hatched boxes: drawback and halt.]
`
`
`
[Fig. 2D: reallocation of works with P2 and P4 marked X, followed by restoration of P2 and of P4. Work labels shown: distributed OS; communication control a, b, and c; communication application B and C; printing server. Legend: (i) continuation; (ii) rerun; hatched boxes: drawback and halt.]
`
`
`
[Fig. 2E: resumption and recovery of works on current processors P1 to P3 and the redundant processor P4. Work labels shown: distributed OS; communication control a, b, and c; communication application A, B, and C; printing server. Legend: (i)' continuation (recovery); (iii) resumption.]
`
`
`
[Fig. 3A: current processor module(s), including PM #002 with its monitoring facility, on high-speed bus 24; monitoring facility 11; administration facility 12, containing definition unit 14 (to administer the correspondence table of Fig. 4), abnormality classification decision unit 15 (to administer the abnormality classification table of Fig. 5), and PM administration unit 16 (to administer units 14 and 15 and communicate with unit 13); reference numerals 25 and 26; signals labeled "crush", "installation", and "allocation state".]
`
`
`
[Fig. 3B: PM #003 (CPU and memory) with its monitoring facility; nonvolatile shared memory 28; work allocation facility 13, containing work allocation control unit 17 (to communicate with unit 12, administer the work table of Fig. 6, and provide instructions to allocate works of the abnormal PM to other PMs), halt unit 18, withdrawal unit 19, drawback-resumption unit 20, rerun unit 21, continuation unit 22, and measures determining unit 23 (to administer the abnormality measures table of Fig. 7); signals labeled "installation" and "information about abnormal and redundant processors".]
`
`
`
U.S. Patent    Aug. 18, 1998    Sheet 10 of 18    5,796,937
`
Fig. 4 (correspondence table 31)

NAME OF PM    MOUNTING NUMBER OF PM
pmOa          #001
pmOb          #002
pmOc          #003
              #004

In this figure:
#001 to #003: current PM
#004: redundant PM
`
Fig. 5 (abnormality classification table 32)

Columns: NAME OF PM; STATE OF PM; POSSIBILITY OF RECURSIVE ABNORMALITY
Names of PM listed: pmOa, pmOb, pmOc

STATE OF PM                               POSSIBILITY OF RECURSIVE ABNORMALITY
RESTORING                                 YES
OPERATING                                 NO
CHANGING                                  YES
INITIALIZING (INTO HOT STANDBY STATE)     NO
`
`
`
[Fig. 6: work table 33. Column headings shown: WORK; DESTINATION PM. Works listed: distributed OS; communication control a; communication control b; communication control c; communication application A; communication application B; communication application C; printing server; interactive service. Cell entries are the PM names pmOa, pmOb, and pmOc and the word REDUNDANT.]
`
`
`
Fig. 7 (abnormality measures table 34)

WORK ID   WORK                               MEASURES AGAINST PM ABNORMALITY
1         BASE OF OS                         SYSTEM HALT
2         DISTRIBUTED OS (SYSTEM SERVICE)    CONTINUATION
3         COMMUNICATION CONTROL              DRAWBACK AND RESUMPTION, CONTINUATION AND RECOVERY
4         HOST LINKAGE SERVICE (X)           RERUN
5         HOST LINKAGE SERVICE (Y)           RERUN AND RECOVERY
6         INTERACTIVE SERVICE                HALT
7         COMMUNICATION APPLICATION          DRAWBACK AND RESUMPTION, CONTINUATION
8         PRINTING SERVER                    RERUN
`
`
`
[Fig. 8A: flowchart. Inputs: crush (mounting number) and installation (mounting number). Decisions shown: "Is system process continuable?" (yes/no); "Recursive abnormality?" (yes/no). Step label shown: S22.]
`
`
`
[Fig. 8B: flowchart, continued. Boxes shown: delete the mounting number of the abnormal PM from correspondence table 31 through administration facility 12; halt or draw back the works of the abnormal PM and withdraw takeover information through work allocation facility 13; change the mounting number of the abnormal PM to that of the redundant PM in correspondence table 31 through administration facility 12; continue, rerun, or draw back and withdraw the works of the abnormal PM through work allocation facility 13; END. Step labels shown: S27, S28.]
`
`
`
[Fig. 9A: flowchart. Decision S31: "Is the PM to be installed in drawback state or the object of work to be recovered?" (yes/no). Boxes shown: describe the mounting number of the installed PM in correspondence table 31 through administration facility 12 (shown twice); describe the mounting number and new name of the installed PM in correspondence table 31 through administration facility 12. Other step labels shown: S33, S34, S37. Connectors: *4, *5, *6.]
`
`
`
[Fig. 9B: flowchart, continued from connectors *4, *5, and *6. Box S32: release blocking of the installed PM and resume works drawn back, or recover works, through work allocation facility 13. Decision S35: "Does work table 33 describe works of the installed PM?" (yes/no). Box S36: let the installed PM execute works. END.]
`
`
`
[Fig. 10A: halting system. P1 to P3: current processor.]

[Fig. 10B: drawing back and resuming abnormal processor P1 (no redundant processor). Stages labeled: drawing back; resuming works.]
`
`
`
[Fig. 10C: rerunning or continuing works by substitute processors P2 and P3 (no redundant processor). P2 and P3 rerun or continue the works of processor P1 while P1 is restoring.]

[Fig. 10D: rerunning or continuing works by redundant processor P4. P4 reruns or continues the works of processor P1.]
`
`
`
`METHOD OF AND APPARATUS FOR
`DEALING WITH PROCESSOR
`ABNORMALITY IN MULTIPROCESSOR
`SYSTEM
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
The present invention relates to a method of and an apparatus for dealing with a processor abnormality in a multiprocessor system, and particularly, to a multiprocessor system having processors for processing distributed works. When a monitoring facility detects an abnormality in any one of the processors, an administration facility provides information about the detected abnormal processor, as well as information about a redundant processor, to a work allocation facility, which seeks the distributed works of the abnormal processor from a work table according to these pieces of information and allocates the sought works to given ones of the processors.
The present invention allocates the distributed works of the abnormal processor to the other processors in a way that improves the fault tolerance of the multiprocessor system and secures 24-hour operation of the system.
`2. Description of the Related Art
A multiprocessor system according to a prior art loosely couples processors, each having a CPU and a memory, through a high-speed bus and distributes works including an operating system (OS), applications, and communication control to the processors.
To improve the fault tolerance of the system, it is important to provide improved measures to deal with a processor abnormality. The prior art is incapable of optionally setting measures to deal with an abnormality depending on the processing conditions of the system and the requirements of a user. For example, the prior art is incapable of localizing the influence of an abnormality in one processor so as to protect the other processors.
If the cause of a processor abnormality is a software failure, such as an error in take-over information about a work processed by the abnormal processor, the abnormality will necessarily occur in a substitute processor that reruns or continues the work of the abnormal processor. This will involve another substitute process, which will again be abnormal, thereby expanding the processor abnormality. Consequently, the fault tolerance of the system will deteriorate.
Works shared by an abnormal processor must be allocated to a redundant processor after the redundant processor is initialized, or the works will be incorrectly taken over by the redundant processor and the redundant processor will serve ineffectively as a substitute processor.
`
`SUMMARY OF THE INVENTION
An object of the present invention is to deal with a processor abnormality in various ways, suppress an expansion of the processor abnormality, and effectively use a redundant processor.
In order to attain the above object, the present invention provides a multiprocessor system having processors for processing distributed works, a monitoring facility for detecting an abnormality in any one of the processors (P1 to P4), an administration facility for providing information about the detected abnormal processor as well as information about a redundant processor, and a work allocation facility for seeking the distributed works of the abnormal processor from a work table according to these pieces of information and allocating the sought works to given ones of the processors. The system includes an abnormality measures table that selectively describes measures to be taken for each of the distributed works against an abnormality. The work allocation facility determines, for each of the distributed works of the abnormal processor, a measure to be taken according to the abnormality measures table and allocates the distributed works of the abnormal processor to given ones of the processors. If the abnormality is recursive, allocation of any work for which a specific measure such as rerun or continuation is to be taken is suspended. If the redundant processor is being initialized, allocating works to the redundant processor is delayed.
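
The decision rule summarized above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Measure`, `MEASURES_TABLE`, and `plan_allocation`, the table entries, and the simplification that every rerun or continued work goes to the redundant processor are all assumptions made for illustration.

```python
from enum import Enum


class Measure(Enum):
    HALT = "halt"
    DRAWBACK = "drawback"
    RERUN = "rerun"
    CONTINUATION = "continuation"


# Hypothetical abnormality measures table: one measure per distributed work.
MEASURES_TABLE = {
    "distributed OS": Measure.CONTINUATION,
    "communication application B": Measure.DRAWBACK,
    "printing server": Measure.RERUN,
}


def plan_allocation(works, measures, recursive, redundant_initializing):
    """Return a {work: decision} mapping for the works of the abnormal processor.

    Assumption: every work that is rerun or continued is taken over by the
    redundant processor, so the initialization guard applies to all of them.
    """
    plan = {}
    for work in works:
        measure = measures[work]
        takeover = measure in (Measure.RERUN, Measure.CONTINUATION)
        if takeover and recursive:
            # Recursive abnormality: do not hand the work to another processor.
            plan[work] = "suspended"
        elif takeover and redundant_initializing:
            # Redundant processor not ready yet: postpone the allocation.
            plan[work] = "delayed"
        elif takeover:
            plan[work] = f"allocate ({measure.value})"
        else:
            # Halt and drawback do not move the work to a substitute processor.
            plan[work] = measure.value
    return plan
```

Calling `plan_allocation(list(MEASURES_TABLE), MEASURES_TABLE, recursive=True, redundant_initializing=False)` in this sketch suspends the works marked for rerun or continuation and leaves the drawn-back work unchanged.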
`
`BRIEF DESCRIPTION OF THE DRAWINGS
The above object and features of the present invention will be more apparent from the following description of the preferred embodiments with reference to the accompanying drawings, wherein:
FIGS. 1A and 1B show a basic structure of a multiprocessor system according to the present invention;
FIGS. 2A, 2B, 2C, 2D, and 2E explain measures to deal with an abnormality occurring in one of the processors of the multiprocessor system;
FIGS. 3A and 3B explain the operation of the multiprocessor system according to the present invention;
FIG. 4 shows a processor module correspondence table according to the present invention;
FIG. 5 shows an abnormality classification table according to the present invention;
FIG. 6 shows a work table (corresponding to FIGS. 2A to 2D) according to the present invention;
FIG. 7 shows an abnormality measures table according to the present invention;
FIGS. 8A and 8B explain procedures (part 1) to deal with a crash or an installation of a processor module, according to the present invention;
FIGS. 9A and 9B explain procedures (part 2) to deal with a crash or an installation of a processor module, according to the present invention; and
FIGS. 10A to 10D show examples of conventional measures to deal with a processor abnormality.
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
Before describing the embodiments of the present invention, the related art and the disadvantages therein will be described with reference to the related figures.
FIGS. 10A to 10D show measures to deal with a hardware or software abnormality in any one of processors P1 to P4. Among these processors, the processors P1 to P3 are current processors, and the processor P4 is a redundant processor. It is supposed that an abnormality occurs in the current processor P1.
Measures to deal with the abnormality include:
halting all of the current processors;
drawing back works processed by the abnormal processor P1 and resuming the works from the beginning after the abnormal processor P1 is restored to a normal state;
letting substitute processors rerun the works of the abnormal processor P1 from the beginning; and
letting the substitute processors continue the works of the abnormal processor P1 from the time when the abnormality occurred.
`
In this specification, the "substitute processor" may be any one of the redundant and current normal processors.
The multiprocessor system selects one or a plurality of these measures. The selection is carried out solely by an operating system (OS), and therefore, there is no room for a user or an application to freely select one or a plurality of them.
In the above circumstances, these measures are taken without actively determining whether the abnormality is a hardware abnormality or a software abnormality.
The software abnormality is caused by an error in a work program per se or in take-over information that is produced and used during the execution of a work program in each processor.
To solve these problems, the present invention adopts an abnormality measures table that selectively describes measures to be taken for each of the works shared by processors of a multiprocessor system against an abnormality. If an abnormality repeatedly occurs (recursive error) during restoration of the processor that has caused the abnormality or during a change-over to a substitute processor, the system determines that the abnormality is due to a software error and suspends allocation of the related work by rerun or continuation to the substitute processor. If a redundant processor is being initialized, the system delays the allocation of works to the redundant processor. In this way, the system provides various measures to deal with a processor abnormality, suppresses an expansion of the processor abnormality, and effectively uses a redundant processor.
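
The recursive-abnormality test described above can be sketched in a few lines of Python. This is an illustrative reading only: the names `ProcessorState`, `is_recursive`, and `diagnose` are hypothetical, and the sketch assumes the system records the state each processor was in when the abnormality was reported.

```python
from enum import Enum


class ProcessorState(Enum):
    OPERATING = "operating"
    RESTORING = "restoring"        # the abnormal processor is being restored
    CHANGING = "changing"          # change-over to a substitute processor
    INITIALIZING = "initializing"  # redundant processor entering hot standby


def is_recursive(state_at_abnormality):
    """An abnormality that recurs during restoration or change-over is recursive."""
    return state_at_abnormality in (ProcessorState.RESTORING, ProcessorState.CHANGING)


def diagnose(state_at_abnormality):
    """Map the state at the time of the abnormality to a coarse decision."""
    if is_recursive(state_at_abnormality):
        # Treated as a software error: rerun/continuation must not be reallocated.
        return "software error suspected: suspend rerun and continuation"
    return "apply the measures from the abnormality measures table"
```

The design choice here is to derive recursiveness from the recorded state rather than from a counter of failures, which matches the wording "during restoration ... or during a change-over"; a counter-based variant would be an equally plausible implementation.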
FIGS. 1A and 1B explain the principle of a multiprocessor system according to the present invention.
A processor monitoring facility 1 monitors the operating states of current processors P1 to P3 and detects an abnormality in any one of them.
An administration facility 2 notifies a work allocation facility 3 of information about an abnormal processor and a redundant processor and administers information about a possibility of recursive abnormality as well as information about whether or not a redundant processor is being initialized. The recursive abnormality usually occurs when one processor takes over a work from another.
The work allocation facility 3 allocates works of the abnormal processor sought from a work table 6 to given processors according to measures described in an abnormality measures table 7. If the administration facility 2 notifies the work allocation facility 3 that there is a possibility of recursive abnormality, the facility 3 suppresses allocating works to be rerun or continued to the other processors. If the facility 2 notifies the facility 3 that the redundant processor is being initialized, the facility 3 delays allocating works to the redundant processor.
A correspondence table 4 (refer to FIG. 4) describes the classification of the current and redundant processors, i.e., the names and mounting numbers thereof.
An abnormality classification table 5 (refer to FIG. 5) describes the state, such as a restoring, operating, or initializing state, of each processor as well as a possibility of recursive abnormality of each processor.
The work table 6 (refer to FIG. 6) describes works and processors that share the works.
The abnormality measures table 7 (refer to FIG. 7) describes, for each work, measures such as halt, drawback, rerun, and continuation to deal with an abnormality.
The multiprocessor system also has a high-speed bus 8 and a nonvolatile shared memory 9 for storing take-over information to be transferred from an abnormal processor to substitute processors. The processors of the system are the current processors P1 to P3 and the redundant processor P4.
`
For the sake of explanation, the tables 4 to 7 are separated from each other. The arrangements and storage of information contained in these tables are optional.
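
Because the arrangement of the four tables is left open, one possible in-memory layout is sketched below in Python. The class name `SystemTables`, the field names, and all sample entries are assumptions chosen for illustration, not a layout given in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SystemTables:
    # Table 4: processor name -> mounting number.
    correspondence: dict = field(default_factory=dict)
    # Table 5: processor name -> (state, possibility of recursive abnormality).
    classification: dict = field(default_factory=dict)
    # Table 6: work -> set of processors that share the work.
    work: dict = field(default_factory=dict)
    # Table 7: work -> measure against an abnormality.
    measures: dict = field(default_factory=dict)

    def works_of(self, processor):
        """Seek the works shared by one processor from the work table."""
        return sorted(w for w, procs in self.work.items() if processor in procs)


# Hypothetical contents for a system with P1 to P3 current and P4 redundant.
tables = SystemTables(
    correspondence={"P1": 1, "P2": 2, "P3": 3, "P4": 4},
    classification={"P2": ("operating", False), "P4": ("initializing", False)},
    work={
        "distributed OS": {"P1", "P2", "P3"},
        "communication control b": {"P2"},
        "printing server": {"P2"},
    },
    measures={
        "distributed OS": "continuation",
        "communication control b": "continuation",
        "printing server": "rerun",
    },
)
```

With this layout, `tables.works_of("P2")` lists the works that the work allocation facility would have to handle if P2 became abnormal.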
The monitoring facility 1 notifies the administration facility 2 of an abnormality occurring in any one of the current processors P1 to P3.
The administration facility 2 refers to the correspondence table 4 and abnormality classification table 5 to identify the abnormal processor and a redundant processor and to determine whether or not the abnormality is recursive and whether or not the redundant processor is being initialized. These pieces of information are sent to the work allocation facility 3.
The work allocation facility 3 refers to the work table 6 to identify works shared by the abnormal processor. According to measures sought from the abnormality measures table 7, the facility 3 allocates the works of the abnormal processor to the redundant processor P4, etc. Thereafter, the facility 3 notifies the administration facility 2 of the allocation states of the works.
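
The hand-off between the three facilities can be traced with the following Python sketch. The function names, the dictionary keys, and the sample data are hypothetical; the guards for recursive abnormality and for an initializing redundant processor are carried in the message but deliberately not acted on, so that only the flow of information is shown.

```python
def detect_abnormal(health):
    """Monitoring facility: return the names of processors reported unhealthy."""
    return [name for name, healthy in health.items() if not healthy]


def describe_abnormality(abnormal, redundant, classification):
    """Administration facility: bundle what the allocation facility needs."""
    return {
        "abnormal": abnormal,
        "redundant": redundant,
        "recursive": classification[abnormal]["recursive"],
        "redundant_initializing": classification[redundant]["state"] == "initializing",
    }


def allocate(info, work_table, measures_table):
    """Work allocation facility: report (work, measure, target) allocation states."""
    report = []
    for work, processors in work_table.items():
        if info["abnormal"] in processors:
            report.append((work, measures_table[work], info["redundant"]))
    return report


# Hypothetical run: P2 fails and P4 is the redundant processor in hot standby.
health = {"P1": True, "P2": False, "P3": True}
classification = {
    "P2": {"state": "operating", "recursive": False},
    "P4": {"state": "hot standby", "recursive": False},
}
work_table = {"communication control a": ["P1"], "communication control b": ["P2"]}
measures_table = {
    "communication control a": "continuation",
    "communication control b": "continuation",
}

abnormal = detect_abnormal(health)[0]
info = describe_abnormality(abnormal, "P4", classification)
report = allocate(info, work_table, measures_table)
```

In this run `report` holds a single entry for communication control b, which is the allocation state that the work allocation facility would send back to the administration facility.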
Not only the redundant processor P4 but also any one of the current normal processors may serve as a substitute processor. Accordingly, the current normal processors execute newly allocated works in addition to works originally shared thereto.
The administration facility 2 updates the correspondence table 4 according to information about the abnormal and redundant processors. The facility 2 also updates the abnormality classification table 5 according to information from the work allocation facility 3.
The contents of the work table 6 and abnormality measures table 7 may be set or updated by a user or application according to the capacity and utilization mode of the multiprocessor system.
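
As one way of letting a user or application set the abnormality measures table, the table could be supplied as text and validated before use. The Python sketch below assumes a JSON object mapping each work to a single measure; that file format, the `ALLOWED_MEASURES` set, and the function name `load_measures_table` are illustrative assumptions only.

```python
import json

# Measures named in the description; combined measures are left out of this sketch.
ALLOWED_MEASURES = {"halt", "drawback", "rerun", "continuation"}


def load_measures_table(text):
    """Parse a user-supplied measures table and reject unknown measures."""
    table = json.loads(text)
    for work, measure in table.items():
        if measure not in ALLOWED_MEASURES:
            raise ValueError(f"unknown measure {measure!r} for work {work!r}")
    return table
```

Validating at load time keeps a mistyped measure from being discovered only when a processor abnormality actually occurs.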
As mentioned above, the method of dealing with an abnormal processor according to the present invention is applied to a multiprocessor system having processors for processing distributed works, the monitoring facility 1 for detecting an abnormality in any one of the processors, the administration facility 2 for providing information about the detected abnormal processor and information about a redundant processor, and the work allocation facility 3 for seeking the distributed works of the abnormal processor from the work table 6 according to these pieces of information and allocating the sought works to given ones of the processors. The method of the present invention uses the abnormality measures table 7 that selectively describes measures to be taken for each of the distributed works against an abnormality thereof. The method lets the work allocation facility 3 determine, for each of the distributed works of the abnormal processor, a measure to be taken according to the abnormality measures table 7 and allocate the works to given ones of the processors accordingly.
The apparatus according to the present invention for dealing with a processor abnormality in a multiprocessor system having processors to process distributed works has the monitoring facility 1 for detecting an abnormality in any one of the processors, the administration facility 2 for providing information about the abnormal processor and information about a redundant processor, the abnormality measures table 7 that selectively describes measures to be taken for each of the distributed works against an abnormality, and the work allocation facility 3 for seeking the distributed works of the abnormal processor from the work table 6 according to these pieces of information and allocating the sought works to given ones of the processors according to the measures described in the abnormality measures table 7.
`
With the abnormality measures table 7 that describes, for each work, measures such as halt, drawback, rerun, and continuation to deal with an abnormality and with the abnormality classification table 5 that describes a possibility of recursive abnormality for each processor and whether or not a redundant processor is being initialized, the system is capable of selecting the destination of each of the works of the abnormal processor, suppressing an expansion of the abnormality, and efficiently using the redundant processor.
FIGS. 2A, 2B, 2C, 2D, and 2E explain measures to deal with an abnormality that has occurred in a processor for processing distributed works in a multiprocessor system. The multiprocessor system includes current processors P1 to P3 and a redundant processor P4 that has been initialized to a hot standby state. It is supposed that an abnormality has occurred in the current processor P2.
Works of the current processor P2, now an abnormal processor, are allocated to the other processors as follows:
a distributed operating system (OS) is continued by the current processors P1 and P3 and redundant processor P4;
communication control b is continued by the redundant processor P4;
a communication application A is continued by the current processor P3;
a communication application B is drawn back;
a printing server is rerun by the current processor P1; and

At this time, the work of the abnormal processor P2 is blocked against processing requests from the other processors.
A multiprocessor system according to an embodiment of the present invention will be explained with reference to FIGS. 3A to 8B. For the sake of explanation, this system has processor modules (PMs) each including a CPU and a memory.
FIGS. 3A and 3B are general views showing the multiprocessor system. The system has a monitoring facility 11, an administration facility 12, a work allocation facility 13, a definition unit 14, an abnormality classification decision unit 15, an administration unit 16, a work allocation control unit 17, a halt unit 18, a withdrawal unit 19, a drawback-resumption unit 20, a rerun unit 21, a continuation unit 22, a measures determining unit 23, a high-speed bus 24, the processor modules (PMs) 25 to 27, and a nonvolatile shared memory 28.
FIGS. 4 to 7 show a variety of tables, that is, a correspondence table 31, an abnormality classification table 32, a work table 33, and an abnormality measures table 34, respectively. Note that each processor module (PM) is identified by a name used by software but not by a mounting number. The correspondence between the name and the mounting number is described in the table 31.
The monitoring facility 11, administration facility 12, and work allocation facility 13 are realized by one or a plurality of PMs, which receive a signal indicating an installation or a crush from any one of