Buskens et al.

(10) Patent No.: US 6,298,039 B1
(45) Date of Patent: Oct. 2, 2001
`
`(54) HIGH AVAILABILITY DISTRIBUTED CALL
`PROCESSING METHOD AND APPARATUS
`
(75) Inventors: Richard Wayne Buskens, Middletown,
NJ (US); Thomas F. La Porta,
Thornwood, NY (US); Yow-Jian Lin,
Edison, NJ (US); Kazutaka
Murakami, Freehold, NJ (US);
Ramachandran Ramjee, Matawan, NJ
(US)
`(73) Assignee: Lucent Technologies Inc., Murray Hill,
`NJ (US)
`
( * ) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/017,105
(22) Filed: Jan. 31, 1998
`
(51) Int. Cl.7: H04L 12/50
(52) U.S. Cl.: 370/216; 714/4; 714/16
(58) Field of Search: 370/216, 242, 244, 310, 328, 260;
709/100, 226, 303; 714/1, 2, 5, 4, 15
`
(56) References Cited

U.S. PATENT DOCUMENTS

4,905,181 * 2/1990 Gregory 709/100
5,105,420 * 4/1992 Ardon et al. 370/216
5,883,939 * 3/1999 Friedman et al. 379/9
6,085,086 * 7/2000 La Porta et al. 455/432

OTHER PUBLICATIONS

“Structuring Call Control Software Using Distributed
Objects”; H. Blair, S. J. Caughey, H. Green and S. K.
Shrivastava; International Workshop on Trends in
Distributed Computing, Aachen, Germany, 1996; pp. 95-107.
“Distributed Call Processing For Personal Communications
Services”; T. F. LaPorta, M. Veeraraghavan, P. A. Treventi
and R. Ramjee; IEEE Communications Magazine, vol. 33,
No. 6, Jun. 1995; pp. 66-75.
“The Role Of New Technologies In Wireless Access Network
Evolution”; T. F. LaPorta, A. Sawkar, W. Strom;
Proceedings of International Switching Symposium (ISS
1997), IS-03.18, 1997; pp. 533-539.
“Signaling System No. 7: A Tutorial”; A. R. Modarressi and
R. A. Skoog; IEEE Communications Magazine, vol. 28, No.
7, Jul. 1990; pp. 19-35.
“A Survey of Rollback-Recovery Protocols in Message-Passing
Systems”; E. N. Elnozahy, D. B. Johnson and Y. M.
Wang; Tech. Report CMU-CS-96-181, School of Computer
Science, Carnegie Mellon University, Oct. 1996; pp. 1-46.
“Optimistic Recovery in Distributed Systems”; R. E. Strom
and S. Yemini; ACM Transactions on Computer Systems,
vol. 3, No. 3, Aug. 1985; pp. 204-226.
“Why Optimistic Message Logging Has Not Been Used In
Telecommunications Systems”; Y. Huang and Y-M. Wang;
Proceedings of the 25th International Symposium on
Fault-Tolerant Computing, 1995; pp. 459-463.
“Software Fault Tolerance in the Application Layer”; Y.
Huang and C. Kintala; Software Fault Tolerance, John Wiley
& Sons Ltd.; pp. 231-248.
“Implementation of On-Line Distributed System-Level
Diagnosis Theory”; R. P. Bianchini, Jr. and R. W. Buskens;
IEEE Transactions On Computers, vol. 41, No. 5, May 1992;
pp. 616-626.

* cited by examiner

Primary Examiner: Hassan Kizou
Assistant Examiner: Inder Mehra
(74) Attorney, Agent, or Firm: Jeffery J. Brosemer

(57) ABSTRACT
`
A method of delivering highly-reliable, fault-tolerant
communications services in a telecommunications network of
distributed call processing systems. The method
advantageously identifies a set of objects within the
telecommunications network requiring checkpointing;
checkpoints the objects; and subsequently restores the
checkpointed objects in the event of a failure. Various
aspects of the method are disclosed, including restoration
strategies.
`
`9 Claims, 3 Drawing Sheets
`
`
`Bright House Networks - Ex. 1045, Page 1
`
`
`
U.S. Patent    Oct. 2, 2001    Sheet 1 of 3    US 6,298,039 B1
`
`FIG. 1
`FUNCTIONAL OBJECTS FOR DISTRIBUTED CALL PROCESSING
`
`UA obj: USER AGENT OBJECT
`CALL obj: CALL OBJECT
`CONN obj: CONNECTION OBJECT
`CHAN obj: CHANNEL OBJECT
`
FIG. 2
TYPICAL STATE MACHINE IN CALL PROCESSING SYSTEMS

[State diagram: null state 210 and active state 220, linked by a
RESOURCE ALLOCATION PHASE and a RESOURCE RELEASE PHASE, with
TRANSITIONS FOR ABORT RECOVERY between the two phases.]
`
`
U.S. Patent    Oct. 2, 2001    Sheet 2 of 3    US 6,298,039 B1
`
FIG. 3
MSC CALL PROCESSING SOFTWARE STRUCTURE

[Block diagram: an MSC MANAGEMENT SOFTWARE SYSTEM containing
EM #1, PMon #1 and CM #1 instances, and the MSC CALL PROCESSING
software containing IM #1 to IM #n, USS #1, ChanSrv #1 to
ChanSrv #n and ConnSrv #1 instances. The server instances hold
functional objects such as UA objects and CHAN-1 to CHAN-n
objects. Reference numerals 310 and 318 are visible.]

USS: USER SIGNAL SERVER
ConnSrv: CONNECTION SERVER
CM: CONFIGURATION MANAGER
PMon: PROCESS MONITOR
ChanSrv: CHANNEL SERVER
IM: INTERWORKING MANAGER
EM: EVENT MANAGER
`
`
U.S. Patent    Oct. 2, 2001    Sheet 3 of 3    US 6,298,039 B1
`
FIG. 4

[Two plots: (a) average call setup latency (in ms) vs. calls/hour
(in 1000s) at call origination; (b) average call setup latency
(in ms) vs. calls/hour (in 1000s) at call termination.]
`
`HIGH AVAILABILITY DISTRIBUTED CALL
`PROCESSING METHOD AND APPARATUS
`
`TECHNICAL FIELD
`
This invention relates generally to the field of
telecommunications and in particular to a method for
imparting high availability and fault tolerance to
distributed call processing systems.
`
`BACKGROUND OF THE INVENTION
The development of telecommunications call processing or
switching systems constructed from a distributed set of
general purpose computing systems is emerging as an area of
particular interest in the art. See, for example, H. Blair,
S. J. Caughey, H. Green and S. K. Shrivastava, “Structuring
Call Control Software Using Distributed Objects,”
International Workshop on Trends in Distributed Computing,
Aachen, Germany, 1996; T. F. LaPorta, M. Veeraraghavan,
P. A. Treventi and R. Ramjee, “Distributed Call Processing
for Personal Communications Services,” IEEE Communications
Magazine, Vol. 33, no. 6, pp. 66-75, June 1995; and TINA-C,
Service Architecture Version 2.0, March 1995.

As noted in a paper published by T. F. LaPorta, A. Sawkar
and W. Strom, entitled “The Role of New Technologies in
Wireless Access Network Evolution,” that appeared in
Proceedings of International Switching Symposium (ISS ’97),
IS-03.18, 1997, systems employing distributed call
processing architectures exhibit increased system
scalability, performance, and flexibility. Additionally,
advances in open distributed processing, such as the Common
Object Request Broker Architecture (CORBA), described in
“The Common Object Request Broker: Architecture and
Specification,” by the Object Management Group (OMG) Rev.
2.0, July 1995, facilitate portable and interoperable
implementations of distributed software architectures in a
heterogeneous computing environment. As is known, systems
employing such technologies advantageously leverage a
rapidly improving price/performance ratio of
“off-the-shelf” computing components.

The stringent performance and availability requirements of
public telecommunications systems pose particular
challenges to developing highly available distributed call
processing systems which incorporate these off-the-shelf
computing components. Specifically, and as noted by A. R.
Modarressi and R. A. Skoog, in an article entitled
“Signaling System No. 7: A Tutorial”, which appeared in
IEEE Communications Magazine, Vol. 28, No. 7, pp. 19-35, in
July 1990, call processing software must process each call
request within a few hundred milliseconds, and a switching
system may not be out of service for more than a few
minutes per year. As such, present day switching systems
employ custom-designed fault-tolerant processors and
special-purpose operating systems to meet these stringent
requirements. In order for next generation switching
systems to be built using general purpose computing
platforms, software-based fault-tolerant methods and
systems are required to achieve the same or similar
performance and availability goals.

Two software methods for enhancing the level of fault
tolerance in a distributed computing environment that have
been described in the literature are checkpointing and
message logging. See, for example, E. N. Elnozahy, D. B.
Johnson and Y. M. Wang, “A Survey of Rollback-Recovery
Protocols in Message-Passing Systems,” Tech. Report
CMU-CS-96-181, School of Computer Science, Carnegie Mellon
University, October 1996, and R. E. Strom and S. Yemini,
“Optimistic Recovery in Distributed Systems,” ACM
Transactions on Computer Systems, Vol. 3, no. 3, pp.
204-226, August 1985. Briefly stated, checkpointing
involves periodically taking a “snapshot” and saving the
entire state of a software process, while messages sent or
received by the software process are logged (message
logging) between successive checkpoints. Assuming a
piecewise deterministic execution model, and as described
by Y. Huang and Y. M. Wang, in an article entitled “Why
Optimistic Message Logging has not been used in
Telecommunications Systems,” that appeared in the
Proceedings of the 25th International Symposium on
Fault-Tolerant Computing, pp. 459-463, 1995, the state of
the process can be later reconstructed during a recovery
process by replaying logged messages in their original
order. As observed by Y. Huang and C. Kintala, in “Software
Fault Tolerance in the Application Layer,” which appeared
in Software Fault Tolerance (M. R. Lyu, Ed.), John Wiley &
Sons, Chichester, England, pp. 231-248, 1995,
checkpointing, message logging, and “rollback” recovery
techniques can be embedded into the operating system while
remaining virtually transparent to application software.
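
The prior-art combination of process-level checkpointing and message logging described above can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the class `LoggedProcess`, its message names, and the single counter used as "state" are hypothetical and do not come from the cited literature. The handler is deterministic, which is what allows replay to rebuild the state.

```python
import copy


class LoggedProcess:
    """Toy process with full-state snapshots and a message log (hypothetical)."""

    def __init__(self):
        self.state = {"calls": 0}
        self.snapshot = copy.deepcopy(self.state)  # last full checkpoint
        self.log = []                              # messages since that checkpoint

    @staticmethod
    def _apply(state, msg):
        # Deterministic handler: the same messages in the same order
        # always produce the same state.
        state["calls"] += 1 if msg == "setup" else -1

    def handle(self, msg):
        self.log.append(msg)          # message logging
        self._apply(self.state, msg)

    def checkpoint(self):
        # Snapshot of the entire process state; the log restarts from here.
        self.snapshot = copy.deepcopy(self.state)
        self.log.clear()

    def recover(self):
        # Rebuild the state from the snapshot by replaying the log in order.
        state = copy.deepcopy(self.snapshot)
        for msg in self.log:
            self._apply(state, msg)
        return state


p = LoggedProcess()
p.handle("setup")
p.checkpoint()
for msg in ("setup", "release", "setup"):
    p.handle(msg)
recovered = p.recover()
```

The longer the interval between calls to `checkpoint()`, the longer the log that `recover()` has to replay.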

Unfortunately, however, there are numerous disadvantages to
these approaches when applied to distributed call
processing systems. First, taking a snapshot of the entire
process state may create a long period of time during which
the process is unable to service requests from its clients,
thereby increasing end-to-end call setup latency. Second, a
single call request may involve a significant number of
message exchanges between functionally distributed servers.
Consequently, logging every message becomes too
time-consuming to meet the stringent call setup latency
requirements of only a few hundred milliseconds associated
with call processing. Additionally, if checkpoint intervals
are made sufficiently long in an attempt to minimize
checkpoint overhead, a prohibitively large number of
messages may need to be replayed after a failure, thereby
making recovery time unacceptably long. Consequently, a
continuing need exists in the art for software-based
fault-tolerant computing systems suitable for demanding
telecommunications applications.
`
`SUMMARY OF THE INVENTION
An advance is made over the prior art in accordance with
the principles of the present invention directed to a
method of delivering highly-reliable, fault-tolerant
communications services in a telecommunications network of
distributed call processing systems. The method
advantageously identifies a set of objects within the
telecommunications network requiring checkpointing;
checkpoints the objects; and subsequently restores the
checkpointed objects in the event of a failure.
Additionally, the method accommodates the selective
determination of particular states requiring restoration,
and reduces, where desired, duplicate restorations within
the system.

Further features and advantages of the present invention,
as well as the structure and operation of various
embodiments of the present invention, are described in
detail below with reference to the accompanying drawing.
`
`BRIEF DESCRIPTION OF THE DRAWING
The teachings of the present invention can be readily
understood by considering the following detailed
description in conjunction with the accompanying drawings,
in which:

FIG. 1 is a bubble diagram depicting functional objects
associated with distributed call processing;

FIG. 2 is a simplified state diagram showing a typical
state machine in call processing systems;

FIG. 3 is a block diagram of call processing software for a
mobile switching center according to the teachings of the
present invention;

FIG. 4(a) shows in graphical form the average call setup
latency (ms) vs. calls/hour (1000s) at call origination for
the mobile switching center of FIG. 3 constructed according
to the teachings of the present invention; and

FIG. 4(b) shows in graphical form the average call setup
latency (ms) vs. calls/hour (1000s) at call termination for
the mobile switching center of FIG. 3 constructed according
to the teachings of the present invention.
`
`DETAILED DESCRIPTION
A preferred embodiment of the invention will now be
described while referring to the figures, several of which
may be simultaneously referred to during the course of the
following description. As can be appreciated by those
skilled in the art, a telecommunications network
architecture comprises many functional entities (FEs), each
of which performs one or more distinct tasks in the
network. For example, the Wireless Intelligent Network
(WIN) Distributed Functional Plane defines a distributed
functional model for wireless intelligent networks. This
WIN model includes FEs which provide call control
functions, access control functions, service control
functions, and location registration functions.

Call processing scenarios refer to various groupings of
tasks coordinated through sequences of signaling messages.
A distributed call processing system is a mapping of tasks
to a collection of co-operating software modules. In
general, a software module could support tasks of multiple
FEs, but only one software module is responsible for all
tasks of a single FE.

By way of background, we now define four distributed call
processing terms that are based on object-oriented
concepts. In particular, we define two object classes,
namely, a functional object class and a server class, and
two object instances, a functional object and a server
instance.

A functional object class corresponds to an FE. It defines
the unique call processing functions supported by the
class, the types of physical and logical resources managed
by the class, and any interfaces exported to other
functional object classes. A functional object is an
instance of a functional object class. Each functional
object manages its own assigned resources and associated
data corresponding to a single call activity, and multiple
functional object classes may be needed to service a single
call processing request. Each call processing request
results in the creation of one functional object for each
of these functional object classes. Collectively, the
functional objects so created maintain the overall state
information related to the request. Accordingly, the
functional objects persist until the requested activity
(e.g., a call) ends.

A server class corresponds to a software module. It is a
unit of computation in a functional object class in a
distributed call processing architecture. Server classes
support one or more closely related functional object
classes. A server instance is an embodiment of a server
class, and typically corresponds to a process in a real
implementation. A call processing system may have multiple
instances of the same server class to allow the system to
be scalable in the capacity dimension.

By way of example, and with reference now to FIG. 1, there
are shown four classes of functional objects identified in
our example Mobile Switching Center (MSC). Specifically,
and as shown in the Figure, these are: User Agent object
(UA) 110, Connection object (CONN) 130, Channel object
(CHAN) 140, and Call object (CALL) 120.

CONN object 130 performs tasks necessary for establishing a
single end-to-end connection and maintains detailed state
information about the connection. CHAN object 140 controls
resource allocation activities for a specific transport
channel resource, such as the channel of a switching device
in the MSC.

CALL object 120 records call activities of a specific user,
while UA object 110 maintains non-call-related state
information about the user (such as a user’s service
profile). Note that the UA object 110 and CALL object 120
are user specific, CONN object 130 is unique for each
connection, and CHAN object 140 is for a particular
resource. As a result, UA 110 and CALL 120 object classes
are likely candidates for grouping together within one
server class.
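
The relationship between functional object classes, functional objects, server classes and server instances can be pictured with a small Python sketch. This is an illustration only: the type names `FunctionalObject` and `ServerInstance`, the `activity_id` key and the `create` method are assumptions introduced here, not names from the specification. The grouping of UA and CALL objects in one user signaling server follows the text.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalObject:
    """One instance of a functional object class, tied to one call activity."""
    class_name: str        # e.g. "UA", "CALL", "CONN", "CHAN"
    activity_id: str       # hypothetical key for the call activity or user
    data: dict = field(default_factory=dict)


@dataclass
class ServerInstance:
    """A process hosting objects of one or more closely related classes."""
    server_class: str
    supported_classes: frozenset
    objects: dict = field(default_factory=dict)

    def create(self, class_name, activity_id):
        if class_name not in self.supported_classes:
            raise ValueError(
                f"{self.server_class} does not host {class_name} objects")
        obj = FunctionalObject(class_name, activity_id)
        self.objects[(class_name, activity_id)] = obj
        return obj


# A user signaling server instance groups the user-specific UA and CALL classes.
uss = ServerInstance("USS", frozenset({"UA", "CALL"}))
uss.create("UA", "user-42")
uss.create("CALL", "user-42")
```

Running several `ServerInstance` objects of the same `server_class` would correspond to scaling the system in the capacity dimension.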

Those skilled in the art will appreciate that public
telecommunications call processing systems are designed to
meet extremely stringent availability requirements due, in
large part, to a considerable societal dependence on
services provided therefrom. Consequently, only a few
minutes of downtime per year are tolerated for these
systems. Since it is generally accepted that failures
cannot be completely prevented, recovery times from the
inevitable failures must be as short as possible to
minimize service downtime. In addition, the following
general requirements must be met by highly available
distributed call processing systems:

High performance: Low end-to-end call setup times (less
than a few hundred milliseconds).

Active call preservation: Active calls must be preserved
across failures. Calls in a transient state, on the other
hand, need not be conserved, but may be retried or
cleared. Clearing transient calls is a common practice in
telecommunications systems.

Resource leak avoidance: Reserved server resources and
network channel resources must be released even if a call
request is abnormally aborted due to a failure.

Our selective, event-driven checkpointing method, which is
the subject of the present invention, checkpoints per
functional object instead of per process. As such, we call
our novel approach object-level checkpointing.
Advantageously, the following general properties of call
processing support our approach. Specifically:

Property 1: Functional objects are independent and small in
size.

A call activity involves only one functional object per
functional object class and there is no mutual dependency
among functional objects of the same class. Thus,
checkpointing can be scheduled per object without
coordinating with other objects in the class. Since call
processing systems in public telecommunications networks
can handle a large amount of call signaling traffic, a
process may contain thousands of functional objects. Each
checkpoint thus contains only a tiny fraction of an entire
process state.
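
To make the size argument of Property 1 concrete, the following Python sketch checkpoints one functional object out of a thousand held by a process and compares the serialized sizes. The `CheckpointStore` class, the object layout and the use of JSON as the serialization format are all assumptions made for illustration.

```python
import json


class CheckpointStore:
    """Hypothetical stable storage keyed by functional object identifier."""

    def __init__(self):
        self.saved = {}

    def save(self, object_id, state):
        # Only the named object is serialized, not the whole process.
        self.saved[object_id] = json.dumps(state)


# A process holding many independent functional objects (toy data).
process_objects = {
    f"call-{i}": {"state": "active", "user": i} for i in range(1000)
}

store = CheckpointStore()
store.save("call-7", process_objects["call-7"])   # object-level checkpoint

object_bytes = len(store.saved["call-7"])
process_bytes = len(json.dumps(process_objects))  # size of a full snapshot
```

Because the objects do not depend on one another, saving `call-7` needs no coordination with the other 999 objects.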

Unfortunately, even if checkpoints are taken on a
per-object basis, message logging is generally still
required so that the system can recover from lost messages.
Nevertheless, call processing systems exhibit another
favorable property that allows us to completely eliminate
message logging.

Property 2: Call processing systems are surrounded by
robust standard signaling interfaces.

A call processing system interacts with external network
elements, such as other switching and/or database systems.
Typically, standard signaling protocols are employed at
external element interfaces so that different switching
systems and devices may inter-work. Signaling protocols
used in public telecommunications networks have been
designed with high reliability in mind so that lost request
or response messages are detected and appropriate recovery
actions are invoked. A timeout mechanism is commonly used
for this purpose. In particular, upon a timer expiration, a
lost request is either retried or aborted, depending on the
situation. Consequently, neither message logging nor
message replay is necessary for such systems, resulting in
lower failure-free overhead and reduced recovery time.
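
The timeout behavior relied on in Property 2 can be sketched as follows. This Python fragment is a simplified assumption, not a model of any particular signaling protocol: `send` is a hypothetical callable that returns a response, or `None` when the timer expires, and the retry limit is arbitrary.

```python
def request_with_timeout(send, max_attempts=3):
    """Retry a lost request; give up (so the caller can abort) after the limit."""
    for _ in range(max_attempts):
        response = send()          # returns None when the timer expires
        if response is not None:
            return response
    return None                    # caller aborts the procedure


# Two lost messages followed by a successful exchange.
replies = iter([None, None, "ACK"])
result = request_with_timeout(lambda: next(replies))

# A peer that never answers: the request is abandoned after two attempts.
lost = request_with_timeout(lambda: None, max_attempts=2)
```

Since the external peer detects and repairs the loss in this way, a recovering server does not need a log of the messages it exchanged before the failure.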

As can be appreciated, an important design consideration
for our inventive object-level checkpointing method is to
determine when to checkpoint a functional object. A first
approach is to checkpoint an object whenever its state
changes (due to a message receipt). Unfortunately, however,
since many message exchanges are involved in a single call
setup request, this method significantly deteriorates
failure-free performance. Therefore, it is essential to
reduce the number of checkpoints produced to minimize
run-time overhead.

Before describing our checkpointing method further,
however, it is useful to first review the structure of
typical call processing software in an attempt to identify
locations within the software at which to perform the
checkpointing. A known characteristic of call processing
systems is the asynchronous nature of events. Since
multiple parties are involved in a call, two independent,
and sometimes conflicting, events may affect a single
functional object at the same time. For example, a caller
might hang up while connections are being set up for the
call. Upon arrival of such an asynchronous event, it may be
necessary to abort ongoing procedures for the original
request.

To cope with asynchronous event arrivals, a state machine
model has been employed for telecommunications systems.
FIG. 2 shows a typical state machine for call processing
systems. As is shown in this Figure, two stable states,
namely a null state 210 and an active state 220, exist
along with many other transient states 230, 240 in between.
For the CONN object described previously, for example, the
active state represents a state where an entire connection
is established between end users, while the null state
means that there is no connection. The transient states for
the CONN object are those states in which a connection is
being set up or torn down. Advantageously, the following
observation of transient states supports our checkpointing
and recovery method.

Property 3: Only a small number of calls are in a transient
state.

As should be apparent to those skilled in the art, call
establishment and call release procedures take only a few
hundred milliseconds. In sharp contrast, average call
durations are on the order of minutes. Therefore, most call
activities are in the stable, active state 220.
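
The state machine of FIG. 2 can be approximated by a small transition table. The Python sketch below is an assumption-laden simplification: it collapses the many transient states into one per phase (`ALLOCATING`, `RELEASING`), and the event names are hypothetical. It shows a stable null state, a stable active state, and an abort transition from the allocation phase into the release phase.

```python
from enum import Enum


class State(Enum):
    NULL = "null"              # stable: no connection
    ALLOCATING = "allocating"  # transient: resource allocation phase
    ACTIVE = "active"          # stable: connection established
    RELEASING = "releasing"    # transient: resource release phase


STABLE = {State.NULL, State.ACTIVE}

TRANSITIONS = {
    (State.NULL, "setup"): State.ALLOCATING,
    (State.ALLOCATING, "allocated"): State.ACTIVE,
    (State.ALLOCATING, "abort"): State.RELEASING,  # e.g. caller hangs up
    (State.ACTIVE, "release"): State.RELEASING,
    (State.RELEASING, "released"): State.NULL,
}


def step(state, event):
    """Return the next state; events with no entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# A caller hangs up while the connection is still being set up.
aborted = step(step(State.NULL, "setup"), "abort")

# A normal call: setup completes and the object rests in a stable state.
established = step(step(State.NULL, "setup"), "allocated")
```

An object spends almost all of its lifetime in a member of `STABLE`, which is the observation Property 3 relies on.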

With the above properties of distributed call processing
systems defined, our inventive checkpointing method may be
described. Advantageously, our method minimizes the number
of checkpoints, while preserving the performance
requirements discussed previously. In particular, our
method places great importance on active call preservation
and resource leak avoidance.

Accordingly, our inventive method performs checkpointing
when:

1. committing to a stable state, and
2. obtaining new state information required to undo
resource allocation or to redo resource clearing.

Advantageously, with our method, all objects in a transient
state within a failed server instance are cleared. Since
most calls are in a stable state, only a small number of
calls are affected by the above checkpointing policies.
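
The two checkpointing rules above can be sketched as an event handler that saves state only at those two points. This Python fragment is a minimal illustration under assumptions: the `ConnObject` class, the message names and the `checkpoint` callback are hypothetical, and a real connection object would track far more state.

```python
class ConnObject:
    """Toy connection object that checkpoints selectively (hypothetical)."""

    def __init__(self, conn_id, checkpoint):
        self.conn_id = conn_id
        self.state = "null"
        self.resources = []
        self._checkpoint = checkpoint   # callable(conn_id, snapshot)

    def _save(self):
        self._checkpoint(
            self.conn_id,
            {"state": self.state, "resources": list(self.resources)},
        )

    def on_message(self, kind, payload=None):
        if kind == "setup":
            self.state = "allocating"        # transient: no checkpoint
        elif kind == "resource_granted":
            self.resources.append(payload)   # needed to undo the allocation
            self._save()                     # rule 2
        elif kind == "setup_complete":
            self.state = "active"            # commit to a stable state
            self._save()                     # rule 1
        # Any other message changes nothing that must survive a failure.


saved = []
conn = ConnObject("conn-1", lambda cid, snap: saved.append((cid, snap)))
for kind, payload in [
    ("setup", None),
    ("progress", None),
    ("resource_granted", "chan-7"),
    ("progress", None),
    ("setup_complete", None),
]:
    conn.on_message(kind, payload)
```

Five messages arrive but only two checkpoints are written, in contrast to the first approach of checkpointing on every state change.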

Finally, one last property of distributed call processing
systems, which permits us to further reduce the number of
checkpoints, is used in our method. Specifically,

Property 4: Partial state information is replicated among
multiple objects of different functional object classes.

When functional objects are contained within different
servers, replicated state information oftentimes exists
among the different servers so that a functional object in
one server can identify an appropriate functional object in
another server. We avoid redundant checkpointing of the
same data by designating one of the servers to be
responsible for checkpointing any redundant state
information shared by the different servers. After a
failure, a recovering server that does not checkpoint the
redundant state reloads its state information from the
server(s) that do checkpoint that state. We descriptively
refer to this as state reloading. As should be apparent to
those skilled in the art, our inventive method of state
reloading reduces the number of checkpoints in the system,
leading to lower overall failure-free overhead.

For our purposes, we can identify and distinguish between
two types of state reloading, namely, pessimistic state
reloading and optimistic state reloading. In pessimistic
state reloading, any new call setup requests that arrive at
a recovering server before the completion of state
reloading are discarded. Conversely, in optimistic state
reloading, new call setup requests that arrive at a
recovering server are processed before state reloading is
completed, based on the assumption that call setup requests
do not arrive for users that are already in a call. Thus,
optimistic state reloading decreases the time that a
recovering server is unavailable to process call requests.
In the event that a conflict is found as state is reloaded,
the conflicting new call setup request is rejected, and any
call setup procedure in progress is aborted.
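
The difference between pessimistic and optimistic state reloading can be sketched as follows. The Python class below is an illustration under assumptions: `RecoveringServer`, its per-user bookkeeping and the string results are hypothetical, and the reloaded state is reduced to a list of users who already have a call.

```python
class RecoveringServer:
    """Toy server that reloads replicated state after a failure (hypothetical)."""

    def __init__(self, optimistic):
        self.optimistic = optimistic
        self.reloaded = False
        self.calls = {}          # user -> "new" or "reloaded"

    def on_setup_request(self, user):
        if not self.reloaded and not self.optimistic:
            return "discarded"   # pessimistic: refuse until reload completes
        if user in self.calls:
            return "rejected"    # user is already in a call
        self.calls[user] = "new"
        return "accepted"

    def reload(self, users_in_call):
        """Reload state held by the peer server that checkpoints it."""
        conflicts = []
        for user in users_in_call:
            if self.calls.get(user) == "new":
                conflicts.append(user)   # new setup is rejected and aborted
            self.calls[user] = "reloaded"
        self.reloaded = True
        return conflicts


pessimistic = RecoveringServer(optimistic=False)
early = pessimistic.on_setup_request("alice")

optimistic = RecoveringServer(optimistic=True)
accepted = optimistic.on_setup_request("bob")
conflicts = optimistic.reload(["bob", "carol"])
```

In the optimistic case the server serves requests immediately and pays only for the rare conflict, here the request for `bob`, who turns out to have a call already.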

During recovery from a failure, a recovering server
instance must either undo or redo unsuccessful call setup
and release attempts, detect state inconsistencies, and
resynchronize the states of related objects among
distributed servers. Since our selective event-driven
checkpointing scheme is performed at the application level,
these recovery mechanisms must also be realized at the
application level.

Recall from FIG. 2 that there are two main phases in call
processing state machines. The first phase is the resource
allocation phase and it reserves network resources in
stages during the transition from a null state to an active
state. The second phase is the resource release phase and
it returns the call processing state machine to a null
state from an active state by freeing reserved resources.
Additional state transitions exist between the transient
states in the resource allocation phase and the resource
release phase. These transitions usually result from an
abort action triggered by an interruptive event like a
timeout or hang-up by a caller. Since such events may occur
asynchronously with respect to the current state, call
processing systems are required to provide abort recovery
procedures for each functional object from any state.
Importantly, an interruptive event at one server may cause
inconsistencies among the states of related functional
objects in different servers. Thus, distributed call
processing software must provide a global resynchronization
procedure to resynchronize the states of the related
objects across servers. Abort messages that initiate abort
recovery procedures for a functional object may be used for
this purpose. Due to the asynchronous arrival of such
events, the precise state of an interrupted resource
reservation request, for example, is unclear, and it is
uncertain whether the request has been granted or not.
Therefore, abort recovery operations must be idempotent. In
other words, when they are carried out several times, the
same effect is produced as carrying them out only once.

Distributed call processing systems furnish idempotent
operations, abort recovery procedures, and global
resynchronization procedures. Given this characteristic,
only minimal effort is required to support recovery from
failures. Specifically, to avoid resource leaks, a
recovering server instance must initiate abort recovery
procedures for the functional objects it maintains that are
in transient states, invoking system-wide resynchronization
procedures as necessary. The idempotent resource release
operations permit fewer checkpoints to be taken during call
setup and call release, with no adverse effects of
unnecessarily reissuing release requests during recovery.
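
Idempotent release and abort recovery for transient objects can be sketched in a few lines of Python. The sketch is hypothetical throughout: `ChannelPool`, the dictionary layout of the restored objects and the `recover` function are assumptions, and system-wide resynchronization between servers is left out.

```python
class ChannelPool:
    """Toy channel resource manager with an idempotent release (hypothetical)."""

    def __init__(self, channels):
        self.free = set(channels)
        self.busy = set()

    def allocate(self):
        channel = self.free.pop()
        self.busy.add(channel)
        return channel

    def release(self, channel):
        # Idempotent: releasing an already free channel changes nothing.
        if channel in self.busy:
            self.busy.remove(channel)
            self.free.add(channel)


def recover(objects, pool):
    """Abort every object restored in a transient state and free its resources."""
    for obj_id, obj in list(objects.items()):
        if obj["state"] not in ("null", "active"):
            for channel in obj["resources"]:
                pool.release(channel)
            del objects[obj_id]


pool = ChannelPool({"c1", "c2", "c3"})
a = pool.allocate()
b = pool.allocate()
objects = {
    "conn-1": {"state": "allocating", "resources": [a]},  # transient
    "conn-2": {"state": "active", "resources": [b]},      # stable, preserved
}
recover(objects, pool)
recover(objects, pool)   # running recovery again has no further effect
pool.release(a)          # a reissued release request is harmless
```

Because `release` can be repeated safely, the recovering server does not need to know whether an earlier release had already been carried out before the failure.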

To further shorten recovery time after a failure, a paired,
primary-backup approach may be used for each server
instance. In particular, the primary and its backup run on
different hosts in order to survive a single processor
failure. The primary server instance processes all incoming
requests and checkpoints its state information to its
backup, as necessary. Since a backup server is already
executing when a primary failure occurs, server
unavailability is reduced due to shorter failover times.
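
A primary-backup pair of this kind might look like the following Python sketch. It is a simplification under assumptions: the `Primary` and `Backup` classes and their method names are hypothetical, both run in one process here rather than on different hosts, and failure detection is not modeled.

```python
class Backup:
    """Already running standby that receives checkpoints (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def apply_checkpoint(self, obj_id, snapshot):
        self.objects[obj_id] = dict(snapshot)


class Primary:
    """Processes all requests and checkpoints to its backup as necessary."""

    def __init__(self, backup):
        self.objects = {}
        self.backup = backup

    def commit(self, obj_id, snapshot):
        # State worth preserving: store it and forward it to the backup.
        self.objects[obj_id] = dict(snapshot)
        self.backup.apply_checkpoint(obj_id, snapshot)

    def update_transient(self, obj_id, snapshot):
        # Transient progress is kept locally and not checkpointed.
        self.objects[obj_id] = dict(snapshot)


backup = Backup()
primary = Primary(backup)
primary.commit("call-1", {"state": "active"})
primary.update_transient("call-2", {"state": "allocating"})

# If the primary's host fails now, the backup takes over with what it holds.
surviving = backup.objects
```

The active call survives failover, while the call that was still being set up is absent from the backup and would be cleared or retried.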
`
Exemplary System: Mobile Switching Center

We now apply our inventive method and the principles
described previously to a call processing system for
wireless networks, commonly called a Mobile Switching
Center (MSC). Those skilled in the art will recognize that
an MSC is a local switching facility in a wireless network.
Each MSC controls mobile traffic in a service area that is
further divided into multiple geographical regions called
cells. A Base Station (BS) within each cell manages radio
resources between the BS and all Mobile Stations (MS)
roaming within the cell.

All base stations within the service area of an MSC are
connected via known, wire-line connections to the MSC,
which in turn is interconnected to other MSCs and further
to the Public Switched Telephone Network (PSTN). A Home
Location Register (HLR) is connected to the PSTN and keeps
a global database identifying which MSC is responsible for
setting up calls to a particular MS. The process by which
an MS is located within an MSC’s service area during call
setup is generally known as paging.

An MSC performs at least two important functions, namely,
call processing and mobility management. Call processing
includes setting up and tearing down a connection between
calling and called parties as well as paging mobile
stations (MSs). Mobility management includes power-up and
power-down registration of MSs, resulting in updates to the
MS’s location information in the corresponding HLR.

With reference now to FIG. 3, there is shown an MSC 300
illustrating the inventive principles of the present
invention. As shown in this Figure, there are four types of
call processing server classes: interworking managers (IMs)
310, 312, user signaling servers (USSs) 314, channel
servers (ChanSrvs) 316, and connection servers (ConnSrvs)
318. The figure also depicts three types of management
servers, namely, configuration managers (CMs) 320, event
managers (EMs) 322, and process monitors (PMons) 324. As
those skilled in the art will quickly recognize, and as
depicted in this Figure, multiple instances of each server
may exist in a system.

Interworking managers (IMs) 310, 312 act as protocol
gateways to internal MSC servers, isolating them from
external signaling protocols and thereby allowing the MSC
to evolve independently of these protocols. Accordingly, an
IM may terminate one or more signaling protocols and
multiple types of IMs may exist within a single MSC.
Functional objects within an IM record mapping information
between identifiers, such as call id, used both internal
and external to the MSC to correlate call processing
activities.

User signaling server (USS) 314 maintains information about
the registration status of mobile stations currently
roaming within the service area of the MSC in UA objects.
A USS also houses CALL objects, each recording call
activities involving a particular mobile station.

Channel servers (ChanSrvs) 316, 326 maintain CHAN objects
to manage resources of switching devices allocated during
call setup and deallocated during call release. Examples of
resources managed include a switching fabric used to set up
physical connection segments and voice encoders/decoders
that take packet data from a wireless link (air interface)
and convert it to constant b