United States Patent [19]
Brey et al.

[11] Patent Number: 5,274,646
[45] Date of Patent: Dec. 28, 1993

[54] EXCESSIVE ERROR CORRECTION CONTROL
[75] Inventors: Thomas M. Brey, Hyde Park; Matthew A. Krygowski, Hopewell Junction; Bruce L. McGilvray, Pleasant Valley; Trinh H. Nguyen, Wappingers Falls; William W. Shen, Poughkeepsie; Arthur J. Sutton, Cold Spring, all of N.Y.
[73] Assignee: International Business Machines Corporation, Armonk, N.Y.
[21] Appl. No.: 686,721
[22] Filed: Apr. 17, 1991
[51] Int. Cl.: G06F 11/10
[52] U.S. Cl.: 371/40.1; 371/13; 371/21.6
[58] Field of Search: 371/13, 40.1, 40.2, 371/21.6; 395/575

[56] References Cited
U.S. PATENT DOCUMENTS
4,175,692   /1979    Watanabe                 371/40.2
4,604,751   8/1986   Aichelmann, Jr. et al.   371/26
4,661,955   4/1987   Arlington et al.         371/3
4,888,773   12/1989  Arlington et al.         371/40.2
4,964,129   10/1990  Bowden, III et al.       371/40.2
5,058,115   10/1991  Blake et al.             371/40.1

Primary Examiner: Charles E. Atkinson
Attorney, Agent, or Firm: Bernard M. Goldman

[57] ABSTRACT
A method of automatically invoking a recoverable and fault tolerant implementation of the complemented/recomplemented (C/R) error correction method without the assistance of a service processor when an excessive error is detected in main storage (MS) by ECC logic circuits. An excessive error is not correctable by the ECC. These novel changes to the C/R method increase its effectiveness and protect the C/R hardware against random failure. Further, if an excessive error is corrected in a page in MS, an excessive error reporting process is provided for controlling the reporting using a storage map to determine if a previous correction in that page has been reported. If it has been reported, then no further reporting of soft excessive errors is made for that page. A service processor is signaled in parallel to update its persistent copy of the storage map so that on a next initialization of MS the memory map can be restored in the memory. The memory map is used to assist the repair of failing parts of MS, and is reset after MS is repaired.

24 Claims, 7 Drawing Sheets
[Representative drawing on the front page: block diagram showing main storage (MS), inverter, ECC logic, data buffer, status buffers 1 and 2 with parity checks, outpointer comparator, and interrupt output. See FIG. 1.]
`IPR2018-01474
`Apple Inc. EX1010 Page 1

U.S. Patent    Dec. 28, 1993    Sheet 1 of 7    5,274,646

[FIG. 1: block diagram. Legible labels: MAIN STORAGE (MS); DU+ECC; INVERTER 12; ECC LOGIC & S, P GENERATOR; status outputs NO ERROR, 1 BIT CORR ERR, UNCORR, SPECIAL UE, H-H, H-S, S-S; 2ND FETCH; STATUS REGS (SEE FIGS. 6 & 7); TO REQUESTOR; DATA BUFFER 25 with INPTR and OUTPTR; STATUS BIT INVERT CONTROL; second ECC; PARITY CHECK; STATUS BUFFER OUTPTR COMPARATOR; COMPARATOR & PARITY CHK (OUT OF STEP); STATUS BUFFER 1 and STATUS BUFFER 2, each with PARITY CHECK; INTERRUPT. Other reference numerals are not reliably legible.]

U.S. Patent    Dec. 28, 1993    Sheet 2 of 7    5,274,646

[FIG. 2: data buffer with INPOINTER and OUTPOINTER.]

[FIG. 3: corresponding entries of the three buffers. Legible labels: STATUS BUFFER 1 ENTRY (STATUS, PARITY); STATUS BUFFER 2 ENTRY (STATUS, PARITY); DATA BUFFER ENTRY.]

U.S. Patent    Dec. 28, 1993    Sheet 3 of 7    5,274,646

[FIG. 4: memory request register. Legible field labels: RID, MEMORY ADDRESS, FETCH, RE-FETCH (partly legible).]

[FIG. 5: C/R REQUEST input to the C/R SEQUENCER, whose four states are 1ST FETCH, 1ST STORE, 2ND FETCH, 2ND STORE.]

[FIG. 6: register 73, FROM MS CONTROLLER TO REQUESTOR (PROCESSOR OR I/O). Fields: DATA (bits 0-63); NO ERROR; 1 BIT CORR. ERROR; EXCESS. ERROR; SPECIAL UE; H-H; H-S; S-S; RID.]

[FIG. 7: register, FROM MS CONTROLLER TO SP. Fields: DATA (bits 0-63); ECC BITS; SYNDROME; NO ERROR; 1 BIT CORR. ERROR; 2 BIT UNCORR. ERROR; SPECIAL UE; H-H; H-S; S-S; RID.]

U.S. Patent    Dec. 28, 1993    Sheet 4 of 7    5,274,646

[FIG. 8: flow diagram of the original fetch. Step text:
1. Requestor requests data line.
2. Fetch all DU+ECC groups in requested data line from MS.
3. Generate status information for each DU+ECC group (indicate any excessive error for any group).
4. Send each fetched DU+ECC group with its status information to its requestor.
END]

U.S. Patent    Dec. 28, 1993    Sheet 5 of 7    5,274,646

[FIG. 9: flow diagram of the C/R process, first fetch and first store. Legible step text:
10. Re-fetch request made by requestor receiving error status signal, to invoke C/R process to attempt to correct the erroneous DU (data unit).
11. Fetch 1 DU+ECC from main storage (first fetch; no inversion here).
12. Generate S = excessive error state, and its parity P.
Simultaneously:
14. Store S, P bits in status buffers 1 and 2 at inpointer 1 and 2 location.
15. Store DU+ECC group in data buffer at inpointer 3 location.
16. Increment inpointers 1, 2 & 3 together.
18. Fetch transfer completed without control failure? (YES) / (NO)
19. Increment C/R sequencer to initiate 1st store.
Simultaneously (step numbers not legible):
  Output & compare content of status buffers 1 & 2.
  Parity check content of status buffers 1 & 2.
  Compare content of outpointers 1, 2 & 3.
  Output content of data buffer.
  Increment outpointers 1, 2 & 3 together.
  Invert DU+ECC if status bit equals [value not legible].
  Disable second ECC logic and store DU and ECC to main storage.
Store completed without control failure? (YES) / (NO)
29. Activate sequence dependent restoration (see TABLE).
31. Increment C/R sequencer to initiate second fetch.
Go to FIG. 10.]
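
The duplicated status buffers in FIG. 9 (write S and its parity P to both buffers, then compare, parity check, and compare outpointers on readout) can be illustrated with a minimal Python sketch. All names here are hypothetical. Odd parity is an assumption, since the figure does not state the parity sense, and only the two status-buffer pointers are modeled; the data buffer and its third pointer are left out.

```python
class ControlFailure(Exception):
    """Raised when the redundant status controls disagree (hypothetical name)."""


def parity_bit(s):
    # Assumption: odd parity over the single status bit S.
    return s ^ 1


class RedundantStatusBuffers:
    """Two status buffers written together and cross-checked on readout."""

    def __init__(self, size):
        self.buf1 = [None] * size
        self.buf2 = [None] * size
        self.inptr = [0, 0]    # inpointers 1 and 2
        self.outptr = [0, 0]   # outpointers 1 and 2

    def write(self, s):
        # Store the same (S, P) entry in both buffers, then step both inpointers.
        entry = (s, parity_bit(s))
        self.buf1[self.inptr[0]] = entry
        self.buf2[self.inptr[1]] = entry
        self.inptr = [p + 1 for p in self.inptr]

    def read(self):
        # Outpointers must stay in step.
        if self.outptr[0] != self.outptr[1]:
            raise ControlFailure("outpointers out of step")
        e1 = self.buf1[self.outptr[0]]
        e2 = self.buf2[self.outptr[1]]
        # Each entry must pass its own parity check.
        for s, p in (e1, e2):
            if p != parity_bit(s):
                raise ControlFailure("status parity check failed")
        # The two copies must agree.
        if e1 != e2:
            raise ControlFailure("status buffers disagree")
        self.outptr = [p + 1 for p in self.outptr]
        return e1[0]
```

In this sketch a `ControlFailure` stands in for the "control failure" branch of the flow diagram, which leads to the sequence dependent restoration.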

U.S. Patent    Dec. 28, 1993    Sheet 6 of 7    5,274,646

[FIG. 10: flow diagram of the C/R process, second fetch and second store (entered from FIG. 9). Legible step text; some step numbers are not legible and are omitted:
51. Fetch 1 DU+ECC from main storage (second fetch).
Simultaneously:
54. Output & compare content of status buffers.
55. Parity check content of status buffers 1 & 2.
56. Compare content of outpointers 1 & 2.
57. Invert DU+ECC if status bit equals [value not legible].
58. Increment outpointers 1 & 2 for next DU+ECC from MS.
The fetched group goes both to the data buffer and to the requestor.
ECC detects for error? (S-S) (YES)
Signal to requestor & to SP: S-S, 2-bit uncorrected error; or H-H, no error; or H-S, 1-bit error corrected.
Interrupt SP.
Pass DU+ECC to requestor. END
71. Store DU+ECC into data buffer at inpointer 3 location.
72. Increment inpointer 3.
73. Second fetch completed without fault? (YES) / (NO)
74. Increment C/R sequencer to initiate 2nd store.
(2nd store)
81. At outpointer 3 location, output data buffer thru 2nd inverter (without inversion) and thru 2nd ECC to correct any 1-bit error in the DU.
82. Increment outpointer 3.
84. Store DU+ECC at same memory address.
Second store completed without fault? (YES) / (NO)
88. Activate sequence dependent restoration (see TABLE).
END]
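
The signalling step of FIG. 10 pairs each ECC outcome on the second (re-complemented) fetch with one excessive-error case. A minimal Python sketch of that mapping follows. The enum and function names are hypothetical, and `memory_is_failing` simply restates the specification's remark that the H-H and H-S cases present a failing condition for memory while the S-S case does not.

```python
from enum import Enum


class EccResult(Enum):
    """ECC outcome on the second fetch (names are illustrative)."""
    NO_ERROR = "no error"
    ONE_BIT_CORRECTED = "1-bit error corrected"
    TWO_BIT_UNCORRECTED = "2-bit uncorrected error"


# Pairing taken from the FIG. 10 signalling step.
CASE_AFTER_CR = {
    EccResult.NO_ERROR: "H-H",
    EccResult.ONE_BIT_CORRECTED: "H-S",
    EccResult.TWO_BIT_UNCORRECTED: "S-S",
}


def classify_excessive_error(result):
    """Return the double-error case implied by the ECC result after C/R."""
    return CASE_AFTER_CR[result]


def memory_is_failing(case):
    """Any hard error in the case means the memory hardware itself is failing."""
    return "H" in case
```

For example, `classify_excessive_error(EccResult.ONE_BIT_CORRECTED)` gives `"H-S"`, which `memory_is_failing` reports as a failing condition.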

U.S. Patent    Dec. 28, 1993    Sheet 7 of 7    5,274,646

[FIG. 11 (figure number not legible on the sheet): flow diagram of error reporting. Legible box text; the connections between boxes are not recoverable:
Entry conditions: S-S DETECTED; H-H OR H-S DETECTED; SPECIAL UE DETECTED.
TEST LTR BIT IN HSA (two boxes); LTR BIT SET? (YES) / (NO); SET LTR BIT (two boxes).
MC INTERRUPT & SET BIT 16 (partly legible), RESET BIT 32 IN MCIC.
MC INTERRUPT, SET BIT 17 IN MCIC.
MC INTERRUPT & SET BITS 16 (partly legible) & 32 IN MCIC.
GO TO NEXT INSTRUCTION IN TASK (NO INTERRUPTION).
LEGEND:
BIT 16 = UNCORRECTABLE ERROR SENT FROM ERROR-FREE DU HARDWARE AND NOT A SPECIAL UE
BIT 17 = DU HARDWARE ERROR CORRECTED DURING TRANSMISSION
BIT 32 = SPECIAL UE CHARACTER TRANSMITTED FROM ERROR-FREE DU HARDWARE WITHOUT ERROR]

EXCESSIVE ERROR CORRECTION CONTROL

INTRODUCTION
The invention pertains to excessive error correction, its control, and its management in an efficient manner. An excessive error is an erroneous bit that cannot be corrected by the ECC (error correction code) provided with the data unit stored in a memory, such as the random access memory in a computer system.
BACKGROUND OF THE INVENTION
The complement/recomplement (C/R) type of error correction was disclosed in U.S. Pat. No. 3,949,208 entitled "Apparatus for Detecting and Correcting Errors in an Encoded Memory Word" to M. C. Carter and assigned to the same assignee as the subject application. The C/R technique has been used to augment the error correction capability of Hamming type ECCs (error correction codes) for data units stored in the memory of a computer system. The C/R technique has been used to correct one or more hard (H) errors in a data unit, leaving the ECC to correct any soft (S) error in the data unit.

A hard error is an error caused by a permanent fault in a circuit, such as a broken wire, and causes a bit position in memory to be stuck permanently in a given state, either a 1 or 0 state. A soft error is usually caused by an alpha particle changing the 0 or 1 state of a circuit, wherein a soft error condition will not exist the next time other data is stored in that circuit. Thus a hard error remains permanently in the hardware while a soft error exists only in the single recording of a data unit. The C/R method corrects only the permanently stuck state of hard errors. The C/R method may be used with computer storage built with semiconductor dynamic random access memory (DRAM) chips.

The C/R method is initiated only after the ECC in a data unit finds an excessive error. Then the C/R process reads and complements (inverts) each read bit value in the data unit. Then the C/R process stores the inverted data unit back into the same bit locations in memory. When stored in their original locations, only the erroneous bits in hard error locations revert to their prior stuck states. All non-erroneous bits, and any erroneous bits with soft errors, will be inverted in relation to the stuck bits with hard errors, which will not be inverted because of their stuck condition. A second fetch of the stored inverted data unit again inverts the read bits to correct all hard errors; and the ECC is then used only to correct any soft errors up to the maximum capability of the ECC. After this second inversion and at the end of the C/R process, the data unit is again stored in memory in its original location in its original erroneous form.
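
The fetch, complement, store, fetch, re-complement sequence just described can be traced with a small Python model. This is only a sketch under stated assumptions: `StuckWord` and `complement_recomplement` are hypothetical names, the word is 8 bits wide for readability, no ECC is modeled, and the demonstration uses only stuck cells that disagree with the stored data (that is, actual hard errors).

```python
class StuckWord:
    """One memory word whose cells may be permanently stuck (hard faults)."""

    def __init__(self, width, stuck):
        self.mask = (1 << width) - 1
        self.stuck = dict(stuck)   # bit index -> stuck value (0 or 1)
        self.cells = 0

    def store(self, value):
        # Stuck cells ignore the written value and keep their stuck state.
        value &= self.mask
        for bit, forced in self.stuck.items():
            value = (value & ~(1 << bit)) | (forced << bit)
        self.cells = value & self.mask

    def soft_flip(self, bit):
        # A soft error flips a healthy cell until it is next written.
        if bit not in self.stuck:
            self.cells ^= 1 << bit

    def fetch(self):
        return self.cells


def complement_recomplement(word):
    """Return the re-complemented readout; memory ends in its erroneous form."""
    first = word.fetch()                 # first fetch: erroneous data
    word.store(~first & word.mask)       # first store: complemented data
    second = word.fetch()                # second fetch: stuck bits did not invert
    recovered = ~second & word.mask      # re-complement: hard errors now correct
    word.store(recovered)                # second store: stuck cells reassert themselves
    return recovered


DATA = 0b10110010

hard_only = StuckWord(8, {1: 0, 6: 1})   # both stuck values disagree with DATA
hard_only.store(DATA)
erroneous = hard_only.fetch()
fixed = complement_recomplement(hard_only)

with_soft = StuckWord(8, {1: 0})
with_soft.store(DATA)
with_soft.soft_flip(4)                   # one soft error in addition to the hard one
partly_fixed = complement_recomplement(with_soft)
```

In this model `fixed` equals `DATA` while the memory word itself is back in its erroneous form, and `partly_fixed` still differs from `DATA` in bit 4, the soft error that would be left for the ECC to correct.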

ECCs (error correcting codes) have been commonly used in DRAM (dynamic random access memory) storage by large computer systems, i.e. main storage (MS) and extended storage (ES). The most commonly used ECC has been the SEC/DED (single error correction/double error detection) type, which can detect, but cannot correct, double-bit errors in any data unit (DU) when the DU is stored or transferred. If a second bit error (an excessive error) is detected in a DU when using such SEC/DED type of ECC, the second erroneous bit cannot be corrected by the ECC. However, the second erroneous bit (the excessive error in a system using SEC/DED) often can be corrected by the C/R method for the transmission of the data, in which the C/R method can correct any number of hard errors (H) but can only correct a single soft error (S) per DU. Accordingly, the combination of the C/R method and ECC can correct during transmission any number of hard and soft errors in a data unit up to the error detection capability of the ECC.
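
SEC/DED behavior (correct one erroneous bit, detect but not correct two) can be seen in a toy code. The sketch below uses an extended Hamming code over 4 data bits, chosen only for brevity; it is not the code used by the system described here, which protects 64 data bits with 8 ECC bits. The function names and status strings are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8
    w[3], w[5], w[6], w[7] = d          # data bits at positions 3, 5, 6, 7
    w[1] = w[3] ^ w[5] ^ w[7]           # check bit over positions 1, 3, 5, 7
    w[2] = w[3] ^ w[6] ^ w[7]           # check bit over positions 2, 3, 6, 7
    w[4] = w[5] ^ w[6] ^ w[7]           # check bit over positions 4, 5, 6, 7
    w[0] = sum(w[1:]) % 2               # overall parity bit
    return w


def decode(word):
    """Return (status, nibble); nibble is None when the error is excessive."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos             # XOR of the positions holding a 1
    overall = sum(w) % 2
    if syndrome == 0 and overall == 0:
        status = "no_error"
    elif overall == 1:
        w[syndrome] ^= 1                # single error; syndrome 0 means bit 0 itself
        status = "corrected"
    else:
        return "excessive", None        # double error: detected, not correctable
    nibble = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return status, nibble
```

Flipping any one bit of `encode(n)` decodes as `("corrected", n)`, while flipping any two bits decodes as `("excessive", None)`, which is the condition that would trigger the C/R method.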

It is the transient characteristic of soft errors that prevents the C/R method from correcting any soft errors in a data unit. It is the ECC which corrects the soft errors. Hence, the combined C/R and ECC (SEC/DED) methods are limited to correcting one soft error in the transmission of a data unit, and the occurrence of two soft errors (the S-S case) is uncorrectable.

Both the C/R and ECC methods are limited to correcting stored errors only during the transmission of a data unit. The hard or soft errors existing in the data unit in memory remain in the memory from which the transmission occurred. The C/R method can only correct the complemented (inverted) readout of a memory data unit which is stored with hard errors.

After the successful completion of the C/R process, the stored data unit remains with the same erroneous bits in memory, but the requestor receives a corrected data unit if the number of soft errors does not exceed the ECC capability. The C/R method provides complete error correction if there are no soft error bits. And the C/R method enables complete error correction after it corrects all hard errors if the soft error bits can be corrected by the ECC. No error correction is obtained if the number of soft errors exceeds the capability of the ECC. For example, two soft errors in a data unit (the S-S error case) are not correctable by the C/R method if the maximum ECC capability is one erroneous bit per data unit.

The C/R method is much slower than the ECC correction process alone, because the C/R method requires two additional fetches and two additional stores in memory. Accordingly, the C/R method is not invoked unless an excessive error is detected. For example, when using an SEC/DED type of ECC, only two error bits per data unit can be detected. If only one error is detected (no excessive error exists), it is corrected by the ECC without initiating the C/R method.

Although the C/R method can correct any number of permanent (hard) errors in a data unit, it may be limited by the maximum error detection capability of the ECC, since ECC error detection is used to control the initiation of the C/R method.

The C/R error correction technique has been effectively used in commercial computer systems having SEC/DED (single error correction/double error detection) ECC stored in memory to correct two errors in a data unit; the ECC alone has a maximum ability to correct a single error in a data unit. If the data unit has a hard error bit and a soft error bit (herein called the H-S case), the ECC correction is applied to the single soft error bit after the C/R operation has corrected the hard error bit.

Currently used large computer systems desiring the best type of maintenance store a record of the occurrence of all excessive errors, whether corrected or not. This is because excessive errors are not corrected in memory, even though excessive errors may be corrected for a requestor by using the C/R method to keep a task executing that would otherwise have to quit. Hence excessive error correction by the C/R method is considered outside the normal error correction ability of the system. A C/R corrected data unit is vulnerable to crashing the system if another soft error occurs in it.

Other pertinent art was published in 1980 by Bossen and Hsiao in the IBM Research Journal, May 1980, page 390, entitled "A System Solution to Memory Soft Errors".

Stringent error reporting and accounting have been used to ensure closely coordinated system maintenance in prior large computer systems. They report excessive errors to a service processor (SP) in the system which maintains records of all significant error conditions occurring in the system to determine, for example, when to switch a CPU offline to perform maintenance.

Previously, an interruption was required to both the requesting processor and to memory operation before the C/R method could be invoked. C/R invocation occurred in response to the occurrence of error detection, which caused both the processor clock and the memory access to be stopped until the processor had recovered, and an interruption signal to be sent to the system service processor (SP). Then the SP interrupted its current program and performed a recovery action on the stopped processor, which was usually to retry the instruction which stopped executing when the processor's clock was stopped. After the recovery action was completed, the SP restarted the processor and the memory resumed access for normal operation. Then the processor issued a re-fetch request that invoked the C/R method for the data unit having the excessive error. If the excessive error existed after operation of the C/R method, another processor stoppage occurred, etc. The next time the processor was restarted, it could record instruction processor damage to the task.

This prior operation of the C/R method was very slow because of the clock-stopping interruption to the requestor, the stopping of memory access, and the SP intervention before the C/R method could be invoked, which greatly reduced the efficiency of the system, involving milliseconds instead of the normal CPU microsecond speed for each operation of the C/R method. System performance was further severely degraded due to a machine check interruption causing a loss of all data in the CPU's cache, and a loss of all translations in the CPU's TLB (translation lookaside buffer), adding to the reduction in CPU performance due to the need to refetch all data lost in the cache and to retranslate all addresses lost in the TLB. The program task was ABENDed (abnormally ended) by a machine check interruption if the data was not corrected.

C/R error correction does not correct any hard error or any soft error in the memory itself, even though the C/R method may correct the hard errors and the ECC method may correct a soft error in the group only during its transmission to the requestor. However, an ECC corrected soft error could be corrected in MS by storing the corrected DU+ECC group in its original location, which is sometimes called "scrubbing" the data.

SUMMARY OF THE INVENTION
This invention makes the use of the C/R method more reliable and efficient, and less disrupting of system operations compared to prior use of the C/R method. This invention eliminates processor interruption upon the invocation and correct operation of the C/R method to significantly improve the efficiency of system performance when the memory technology has a significant number of excessive error conditions. And this invention augments the C/R method by facilitating its ability to retry its operation and by providing for the monitoring and reporting of the correction of excessive errors to reduce the need for system interruptions. This invention can reduce the amount of communications to a single excessive-error communication per memory page frame (or any other unit of memory reporting) even though excessive errors occur many times in a memory unit. The monitoring by this invention greatly reduces the involvement of both the operating system software and of a system service processor in C/R error corrections.

An object of this invention is to provide means in which the C/R process is done dynamically by a requesting processor and a memory controller so that intervention by a service processor is reduced or eliminated.

Another object of this invention is to allow C/R correction of a data unit without suspension of data processing by the requesting processor.

A further object of the invention is to make the hardware implementing the C/R method tolerant of some of the failures that can enable the method to correct an error when it otherwise may not have been corrected.

The invention significantly increases the value of the C/R method by providing it with retryability, which requires that no matter where failure occurs during execution of the C/R process, the original erroneous DU+ECC value existing prior to the start of the C/R method be restored in its original location, so that the C/R method can be retried. Failure-free recovery is essential to obtaining reliability in the use of the C/R method.

The invention automatically invokes the C/R method when the ECC method detects an excessive error (e.g. two erroneous bits in a data unit when the C/R method is used with SEC/DED type ECC). Then this invention uses the ECC error detection in combination with the C/R method to detect the different cases of double error combinations, which are the H-H, H-S and S-S cases. The detection of the H-H case reveals that two hard errors exist in a data unit in memory, which presents a failing condition for memory. The detection of the H-S case reveals that one hard error exists in a data unit in memory, which also presents a failing condition for memory. But the detection of the S-S case reveals that two soft errors exist in a data unit in memory, which does not present a failing condition for memory but presents an uncorrectable error condition for a data unit stored in memory.

The invention can use the C/R method with other ECC methods instead of SEC/DED, such as with double error correction/triple error detection (DEC/TED), or with triple error correction/quadruple error detection (TEC/QED), etc. The replacement of SEC/DED with any of these other known ECC types correspondingly increases the number of soft correctable errors in a data unit. For example, replacement of the SEC/DED ECC with the DEC/TED type of ECC allows up to two soft errors to be corrected per data unit, which would handle the H-H-H, H-H-S, H-S-S and S-S-S cases. Or replacement of the SEC/DED ECC with the TEC/QED type of ECC allows up to three soft errors to be corrected per data unit, which would handle the H-H-H-H, H-H-H-S, H-H-S-S, H-S-S-S and S-S-S-S cases, and so on for higher order ECC types. The location of the error bits in a data unit is not important in this invention.
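
The case lists given for SEC/DED, DEC/TED and TEC/QED follow one pattern: for an ECC that corrects up to t bits, the excessive-error cases are all combinations of t+1 hard (H) or soft (S) errors, where order does not matter. A short Python sketch that enumerates them (the function name is hypothetical):

```python
from itertools import combinations_with_replacement


def excessive_error_cases(correctable_bits):
    """List the H/S combinations for the smallest error count the ECC cannot correct."""
    error_count = correctable_bits + 1
    return ["-".join(combo)
            for combo in combinations_with_replacement("HS", error_count)]


sec_ded_cases = excessive_error_cases(1)   # SEC/DED
dec_ted_cases = excessive_error_cases(2)   # DEC/TED
tec_qed_cases = excessive_error_cases(3)   # TEC/QED
```

Here `sec_ded_cases` is `['H-H', 'H-S', 'S-S']`, and the other two lists reproduce the three-error and four-error cases named in the text.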

This invention enables different types of remedial actions to be used for these different detectable cases by enabling records to be maintained in the system by signalling the occurrence of the particular error case in an automatic manner that can be recorded by the system for later maintenance use.

The invention may prevent an emergency maintenance situation that may shut down a system by enabling the system reporting of errors to distinguish between excessive hard and soft errors. A unique reporting process enables later non-emergency maintenance to handle conditions causing the detected errors.

This invention increases the fault tolerance of a system in the manner of use of the combined C/R and ECC methods by having redundant status control registers and using comparisons and parity checking for them.

An example of the report signalling by this invention is where the CPU maintains a memory map in main memory of all 4KB page frames in system main storage (MS). The memory map is called a Logical Track Record (LTR). The requesting processor reports to the LTR the H-H, H-S and S-S excessive error cases for the respectively addressed memory unit (4KB page frame) if the excessive error type has not previously been reported for the respectively addressed memory unit, significantly reducing the number of processor interruptions for reporting that delay its processing and reduce system efficiency.

The LTR also is reported to a system service processor (SP) to enable the SP to maintain its own page frame map called Physical Track Record (PTR) in a persistent disk file. The PTR is retained after the LTR is lost by a reset of the volatile CPU memory. Upon the next reinitialization of the CPU, the SP uses the PTR to rebuild the LTR in memory for the CPU software so the CPU need not waste time rebuilding the LTR by repeating error interruptions for bad pages that had previously been reported as error conditions. The PTR has a threshold of an amount of error conditions after which the appropriate part of the system can be fenced off and shut down for convenient maintenance.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates the interacting collection of hardware used by the preferred embodiment to improve the fault tolerance of the C/R method.
FIG. 2 illustrates the structure of the data buffer with its inpointer and outpointer.
FIG. 3 illustrates respective contents of the three buffers accessed at corresponding locations of their inpointers or outpointers.
FIG. 4 represents a memory request register used for receiving requests from CPUs, I/Os and other requestors requesting a complement/recomplement (C/R) error correction operation.
FIG. 5 represents a C/R sequencer used to control the steps in the C/R operations used by the hardware shown in FIG. 1.
FIG. 6 shows a status request register in an MS controller used to indicate to the operating system software the status of a current fetch request.
FIG. 7 shows a status request register in the MS controller used to indicate to the system service processor (SP) the status of a current fetch request.
FIG. 8 is a flow diagram showing the fetch error detection process.
FIG. 9 and FIG. 10 are flow diagrams showing the C/R process invoked by a refetch request from a requestor responding to a double-bit error status signal sent to the requestor by the MS controller.
FIG. 11 is a flow diagram of a process of signalling to a service processor from the described C/R process, and the system use of the various signals provided by the C/R method.

DESCRIPTION OF THE PREFERRED EMBODIMENT
FIG. 1 provides hardware that obtains novel control of the C/R method for correcting erroneous data requested from main storage (MS) 11 by a requestor, which may be a CPU (central processing unit), an I/O processor that is controlling one or more I/O devices, or a service processor (SP). Each request for data sends an address to MS for accessing a line of data in MS DRAM arrays comprising a memory in a computer system. The invention may be applied to any memory, but the preferred embodiment uses the main storage (MS) in a computer system. Each request provides the MS address of a requested data line in MS and the requestor's identifier (RID), which is provided to the memory controller.

Each data line in MS 11 contains one or more data units (DUs), of which each data unit is a group of data bits over which an error correcting code (ECC) is generated to provide a total group of bits indicated by the notation DU+ECC group. The ECC bits in a group are commingled among the DU bits, and the ECC bits enable SEC/DED for its group during its readout and transmission.

In the preferred embodiment each data line is comprised of sixteen DU+ECC groups. The bits in a DU+ECC group are read out and transmitted in parallel on a memory bus in the preferred embodiment, but they may be serially transmitted on a bus for the current data line. Also, the preferred embodiment presumes each DU is a double word of 64 data bits having 8 ECC bits providing SEC/DED. Hence, the 64+8=72 bits are transferred on the bus in parallel. If a serial bus were used, each DU+ECC group would be assembled/disassembled into its parallel bit form to and from memory.

Original Fetch Request (FIG. 8 Process)
In the hardware of FIG. 1, each of the 16 DU+ECC groups in a requested data line is transferred from MS 11 through inverter circuits 12 to an ECC logic circuit 13. During an original fetch, each group passes through inverter 12 in its true (non-inverted) form to ECC logic circuit 13. Circuit 13 error checks each DU+ECC group and passes it to its requestor, whether the ECC logic found it error-free or not. If a DU+ECC group has only one erroneous bit, it is error corrected before being sent to the requestor. Status information is sent with the data to the requestor to inform the requestor of the group's error-free or particular error condition.

If the requestor receives error-free data (whether or not ECC corrected), the C/R function is bypassed (not done for the request). A particular status signal identifies any detected error condition to the requestor for the sent DU+ECC group, so that the requestor can determine what it wants done with the request, including whether the requestor will request the C/R method to be performed on the erroneous DU+ECC group.

The flow diagram of FIG. 8 shows the steps in an original fetch operation for a data line request, in which step 1 represents the current request. Step 2 represents when the fetch request obtains priority from the memory controller, which involves putting the request into memory request register 71 in FIG. 4, from which addresses are generated for each DU+ECC group in MS. Step 3 is the generation of status information which is put into registers 73 and 74 in FIGS. 6 and 7, which are located in the memory controller for MS 11.

Register 73 in FIG. 6 represents the status information sent to the requestor and has the following fields set by ECC circuits 13: a requestor identifier (RID) of the processor which made the request, the 72 bits of the fetched DU+ECC group, and the following single bit indicator fields: no error, 1 bit corrected error, excessive error, and a special UE (uncorrected error) indicator. The special UE is a unique character stored in a memory location to represent that the location had bad unrecoverable data due to an error.

Register 74 in FIG. 7 represents the status information sent to a service processor (SP), which is also set by ECC circuits 13 with the same information as is put into register 73, except that register 74 also has a field set with the ECC syndrome bits for the group. By providing the syndrome bits to the SP, the SP has the option of using the syndrome bits and other status information to verify the ECC operation, if required. For an original fetch, the fields H-H, H-S and S-S in registers 73 and 74 are not generated (remain off in reset state) in the status registers.

In step 4 of FIG. 8, the status information in register 73 is communicated to the requestor indicated in the RID field, which is the fetch request currently represented in register 71 in FIG. 4.

Re-fetch Request (See Process in FIG. 9)
The requestor may be a CPU, an I/O processor, or a service processor. Each requestor has hardware for receiving and sensing the received status information and determining if further action is needed for continuing the storage request with the C/R method if an excessive error is reported in the status register information. In the preferred embodiment, if the requestor receives an excessive error indication in the status register, the requestor automatically makes a re-fetch request to MS, which is a request for invoking the C/R method to attempt a correction of the excessive error in that DU+ECC group.

Status register 74 in FIG. 7 has its contents transmitted to a service processor (SP) such as exists in the processor control element (PCE) of an IBM 3090 system. The SP controls the error recovery operations for the system when required for an MS error condition, such as if an unrecoverable failure occurs in the hardware handling the request. For example if an error condition occurs in handling

No interruption is caused to the normal operation of MS 11 or to the requesting processor during the process of issuing a re-fetch request invoking the C/R method.

The successful use of the C/R process without an interruption to the requesting processor provides significant novelty for the subject invention over the prior art. Previously, an interruption to the requesting processor always occurred on sensing an excessive error condition (2 bit error with SEC/DED ECC) during which the SP was invoked to initiate and control the execution of the C/R method. A significant amount of processor time was lost in that prior process compared to the process of the subject invention.

The following TABLE represents the four states of the C/R sequencer shown in FIG. 5 and some of their consequences including the restoration actions needed for restoring an erroneous DU+ECC group to its original state in MS if a failure should occur in the C/R hardware before its completion of the C/R process.

TABLE
DU+ECC Form in Data Buf      DU+ECC Form in MS
At End of State              At End of State
True                         True
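
The once-per-page reporting described for the Logical Track Record (LTR), and its rebuild from the service processor's persistent Physical Track Record (PTR), can be sketched in Python as follows. This is an illustration only: the class, its methods, and the use of a set keyed by (page frame, error case) are assumptions, not the layout used by the described system.

```python
PAGE_SIZE = 4096  # 4KB page frames, as in the described memory map


class LogicalTrackRecord:
    """Tracks which excessive-error cases were already reported per page frame."""

    def __init__(self, reported=()):
        self.reported = set(reported)   # entries are (page_frame, case) pairs

    def report(self, address, case):
        """Return True only the first time this case is seen for the page frame."""
        key = (address // PAGE_SIZE, case)
        if key in self.reported:
            return False                # already reported: no further interruption
        self.reported.add(key)
        return True                     # new: report it (and forward to the SP)

    def snapshot(self):
        """Copy that a service processor could persist as its PTR (assumption)."""
        return frozenset(self.reported)

    @classmethod
    def rebuild(cls, ptr):
        """Recreate the LTR from a persisted PTR after reinitialization."""
        return cls(ptr)


ltr = LogicalTrackRecord()
first = ltr.report(0x1000, "H-S")        # first H-S in page frame 1
repeat = ltr.report(0x1FF8, "H-S")       # same page frame, same case
other_case = ltr.report(0x1000, "S-S")   # same page frame, different case
restored = LogicalTrackRecord.rebuild(ltr.snapshot())
after_reset = restored.report(0x1008, "H-S")
```

Only `first` and `other_case` are true here; `repeat` and `after_reset` are false, which corresponds to suppressing repeated reports for a page frame, including after the map is restored.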
