`
`Exhibit 9
`
`
`
`Case 6:20-cv-00272-ADA Document 65-11 Filed 03/14/22 Page 2 of 15
`
`A Reliable CORBA-Based Network
`Management System
`Tong Luo, Member IEEE
`Tony Confrey
`GTE Laboratories, Waltham
`K. S. Trivedi, Fellow IEEE
`Duke University, Durham
`
0-7803-5284-X/99/$10.00 © 1999 IEEE.

Authorized licensed use limited to: Georgia Institute of Technology. Downloaded on March 03,2022 at 22:16:46 UTC from IEEE Xplore. Restrictions apply.

ABSTRACT

Network Management provides the central nervous system for
the networks of telecommunications providers. A Telco's Network
Management System (NMS) needs to support uninterrupted
management functionality of complex networks. The
reliability of such systems has a direct impact on the quality
of service (QoS) provided to the consumers. Even a short
down time of the NMS may cause customer dissatisfaction and
revenue losses, and may even jeopardize life. In order to expedite
the process of transforming technological capabilities into
services and to shorten the development cycle of its NMS, the
telecommunication industry is adopting CORBA as an underlying
architecture. However, neither the CORBA specifications
nor the available services currently provide direct support
for fault-tolerant objects. Consequently, NMS developers
using CORBA must provide their own fault-tolerance mechanism
for mission-critical objects. This paper reviews available
fault-tolerance approaches in the research literature, presents
the architecture of GTE's next generation NMS, discusses the
reliability issues involved in such systems, and provides our
approaches to solve them. Specifically, we present in detail our
fault-tolerance approaches for the naming server, event channels,
and other in-house built critical business objects. A brief
comparison of our approaches with others is also given.

I. INTRODUCTION

The distributed nature of telecommunication management
systems requires a distributed architecture. There are several
distributed object communication standards using OOD
technology. These include OMA [1], COM [2], and DSOM [3] [4].
CORBA/OMA, however, has become increasingly ubiquitous
in the development of large, cross-platform, distributed
systems. CORBA is an open standard and is supported by major
software vendors. CORBA simplifies the development of
distributed applications by supporting a platform and language
independent distributed object execution environment,
and by making the communication between distributed objects
transparent to the application developers. The componentized
architecture supported by CORBA is very attractive to the
telecommunication industry because it facilitates the integration
of new systems with the legacy systems already in
use. The transparent communication between the
distributed objects supported by CORBA lets application
developers concentrate more on business
logic than on system level communication primitives.
Thus the development life cycle is shortened, and the
development risks are reduced.

In today's increasingly competitive, deregulated
and data-centric telecommunication industry, the
Telco which survives will be the one that can rapidly
transform capabilities available within its own and
competitors' networks into bundles of managed service
offerings. This requires a higher level of integration
and cooperation amongst a Telco's Operational
Support Systems (OSS). The network management
system (NMS), serving as a bridge between customer
facing systems and the network, is a key component
in creating and assuring these services. GTE's current
NMS, TONICS [5] [6] [7], provides a unified service
level view of large numbers of heterogeneous network
devices, and contains a model of the network, its
components and their relationships. While an excellent
stand-alone management system, TONICS does
not provide the levels of integration with other enterprise
OSS's requested for the future. Therefore,
GTE is proceeding to develop a next generation management
system. Our next generation NMS will be
characterized by its ability to provide the following:
• A componentized architecture, leveraging COTS
products and supporting independent design,
development, testing, and deployment of components.
• Open, extensible, secure access to network services
by OSS's and by customers, competitors
and partners.
• Web enabled platform independent client components.
GTE's next generation system is
based on Telecommunications Management Network
(TMN) [8] [9] layered element and network management
components. The Element Management Layer
(EML) isolates the upper layer systems from the
transport and protocol used to manage the devices
and adapts any specific network element model into
a common internal model. The EML system makes
its services available on a secured CORBA bus. The
Network Management Layer (NML) is composed of
a set of cooperating CORBA components which perform
functions such as service assurance, service provisioning,
inventory management, testing and fault
isolation, ticketing, etc. The use of an open component
based framework supports the use of Commercial
Off The Shelf (COTS) products where they
are available. A set of federated Java¹ applets provides
the user interface for the system. Given the
semantics contained within CORBA's Internet Inter-ORB
Protocol (IIOP) we have developed the ability
to provide encrypted, authenticated, authorized and
audited access to our CORBA services. This enables
us to project both core functionality and user interface
displays outside the corporate firewall.

¹ Java is a trademark of Sun Microsystems, Inc.

The NMS needs to support GTE's business 7 days
a week, 24 hours a day. Reliability and availability of
such a system are of critical concern. Though CORBA
provides a suitable open architecture for distributed
applications, neither the CORBA 2.0 standard [1] nor
the existing CORBA services [10] provide support
for fault-tolerant objects [11] [12] [13]. This is because
neither of them specifies the protocols for object
replication and recovery, or addresses complex
problems such as group communication [14], partial
failures [15], and causal ordering of events [16].
This requires application developers who adopt
CORBA as an underlying architecture to build the
fault-tolerance mechanism by themselves if high reliability
and availability is a system requirement.
Recently, several different approaches have been
proposed to build reliable distributed systems with
CORBA. A "warm standby" idea is proposed by
Sheu et al. [11]. Since it only handles two replications
of an object, this approach is not scalable to a
larger number of replications. In addition, the protocol
for handling failed objects is not transparent
to the client object. An "integrated" approach is
adopted in Orbix+Isis [17] and Electra [18], which extend
and modify the standard Object Request Broker
(ORB) with group communication mechanisms. This
approach keeps the replication of objects transparent
to clients. A drawback of this approach is that it is
ORB dependent (implemented with IONA's Orbix).
Also it does not comply with CORBA's philosophy
that the architecture should be generic and simple,
with special requirements being added on as separate
services. Yet another approach is the "service"
approach [19], which provides the group communication
mechanism on top of a standard ORB. This
approach keeps the replication of objects transparent
to the client. It is ORB independent and follows
CORBA's modularity philosophy. The drawback of
using this approach is that there is no COTS software
product supporting the service available at this
time. GTE is reluctant to invest in building its own
group communication service, which has a potentially
long development cycle, under the pressure of budget
constraints and short project time frames.
After studying the problem of providing high reliability
and availability in CORBA-based systems, we
identify three key issues:

• How to make the fault-tolerance protocol of
server objects transparent to client objects.
• How to make the replications of the server objects
consistent with each other.
• How to make the fault-tolerance protocol scalable
to multiple object replications.
`
The objects in GTE's next generation NMS may
come from different sources. Some are from vendors,
while others are implemented by ourselves. Also a
client may have different ways to interact with these
objects. The methods of achieving fault-tolerance are
usually different for various kinds of objects. We focus
our discussion on the fault-tolerance mechanisms
for the naming server, event channels, and critical
business objects.

The remainder of this paper is organized as follows.
In Section 2 we give a brief review of the requirements
of GTE's next generation NMS, present
the system architecture, and identify the reliability-related
issues. In Section 3, we review different solutions
currently available in the research literature.
In Section 4, we present our solutions and discuss
the implementation issues. Finally, in Section 5, we
briefly compare our approaches with others and summarize
the discussion.
`
`II. ARCHITECTURE OF GTE's NEXT
`GENERATION NMS
`
This section describes the architecture of GTE's
next generation NMS. The logical architecture,
shown in Figure 1, meets many of the needs outlined
in Section 1. It provides:
• A set of components each of which can be developed
and maintained independently.
• A CORBA based programmatic interface to each
of the components, enabling EML and NML services
to be accessed by upper layer applications.
• A mechanism whereby most applications can be
independent of changes within the network, or
at lower layers of the TMN.
• Centrally administered platform independent
client components.
• Secure partitioned access to component services
via Intranet, Internet, or Extranet.

The architecture is composed of layered components
that communicate on a CORBA bus. The lower
layers abstract out complexity and provide service to
the upper layers. At the lowest layer is the EML system,
which provides mediation, data collection and
long term storage, connection management, and activation
services used by components at the NML.
`
[Figure 1; legible labels: "UI components", "CORBA Bus"]

Fig. 1. Logical architecture of GTE's next generation NMS

A. Element Management Layer (EML)
`
The purpose of the EML in our architecture is to
"be the network" to higher layer systems, while abstracting
out the complexities of interaction with diverse
network components. Through a set of CORBA
interfaces, it provides a single point of contact which
exposes the manageable features of each device in a
vendor, transport and protocol independent manner.
For the EML, we are developing the Integrated Element
Management System (IEMS) which exposes a
network element as a set of CORBA services. These
services provide access to the fault, performance, and
configuration information and capabilities of the device.
The interface is the same whether it is managed
using TL1, SNMP, or CMIP; whether management
transport is over X.25, TCP/IP, or an OSI stack; and
whether the vendor of the device is Vendor A, B, or
C. Moreover, since CORBA allows for interface inheritance,
insofar as the fault behavior of a SONET
device is the same as that of an ATM device, the
CORBA interface is the same also.
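
The idea of one fault interface shared by devices of different technologies can be illustrated with a minimal Python sketch. It is not IEMS code: the names `FaultManaged`, `SonetDevice`, `AtmDevice`, and `collect_alarms` are hypothetical, and an abstract base class merely stands in for an inherited CORBA IDL interface.

```python
from abc import ABC, abstractmethod
from typing import List


class FaultManaged(ABC):
    """Hypothetical shared fault interface (stand-in for a CORBA IDL interface)."""

    @abstractmethod
    def active_alarms(self) -> List[str]:
        """Return the alarms currently raised on the element."""


class SonetDevice(FaultManaged):
    """Hypothetical element managed over TL1; the protocol stays hidden."""

    def __init__(self, raw_tl1_alarms: List[str]):
        self._raw = raw_tl1_alarms

    def active_alarms(self) -> List[str]:
        return [f"SONET:{alarm}" for alarm in self._raw]


class AtmDevice(FaultManaged):
    """Hypothetical element managed over SNMP; same interface, other protocol."""

    def __init__(self, raw_snmp_traps: List[str]):
        self._raw = raw_snmp_traps

    def active_alarms(self) -> List[str]:
        return [f"ATM:{trap}" for trap in self._raw]


def collect_alarms(elements: List[FaultManaged]) -> List[str]:
    """Upper-layer code depends only on the shared fault interface."""
    alarms: List[str] = []
    for element in elements:
        alarms.extend(element.active_alarms())
    return alarms
```

Because `collect_alarms` sees only the shared interface, supporting a further technology in this sketch means adding one subclass and leaving the caller unchanged.
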
The IEMS system is composed of four conceptual
layers. The lowest one is the network interface (NI) layer,
which performs mediation and connection management
of the physical equipment. This is the module
that must encode the details of the management protocol
and transport mechanism used to interact with
a given element. For some protocols and technologies
we use COTS mediation packages, for others we use
the vendor provided element manager, and for yet others
we need to provide hand coded scripts and network
interface processes. The NI layer provides service to
the adaptation layer, which maps or adapts the potentially
minimal model presented by the element into
our logical internal model. The model layer provides
TMN based logical network element and logical event
model hierarchies. All upstream components within
the EML and above operate on these logical entities.
In this fashion, they are shielded from changes
within the network and also have a common model
for communicating and reasoning about the network.
The topmost layer of IEMS is the service layer, which
provides fault, performance, configuration and other
services. These service components operate on logical
elements and events, perform long-term information
storage and querying, and provide the CORBA
services interface to the system.
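
The four layers can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: the `id|code|text` wire format, the severity codes, and the names `network_interface_layer`, `adaptation_layer`, `LogicalEvent`, and `FaultService` are all invented here and do not come from IEMS.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogicalEvent:
    """Model layer: the logical event that all upstream components operate on."""
    element_id: str
    severity: str
    text: str


def network_interface_layer(raw_line: str) -> Dict[str, str]:
    """NI layer: decode a made-up 'id|code|text' wire format."""
    element_id, code, text = raw_line.split("|", 2)
    return {"id": element_id, "code": code, "text": text}


# Assumed mapping from element-specific codes to the logical model.
SEVERITY_BY_CODE = {"CR": "critical", "MJ": "major", "MN": "minor"}


def adaptation_layer(decoded: Dict[str, str]) -> LogicalEvent:
    """Adaptation layer: map the element's own model into the logical model."""
    severity = SEVERITY_BY_CODE.get(decoded["code"], "unknown")
    return LogicalEvent(decoded["id"], severity, decoded["text"])


class FaultService:
    """Service layer: stores logical events and answers queries on them."""

    def __init__(self) -> None:
        self._log: List[LogicalEvent] = []

    def report(self, raw_line: str) -> LogicalEvent:
        event = adaptation_layer(network_interface_layer(raw_line))
        self._log.append(event)
        return event

    def query(self, severity: str) -> List[LogicalEvent]:
        return [event for event in self._log if event.severity == severity]
```

Only `network_interface_layer` and the code table know anything about the element's format, so a change in the network touches the bottom two layers and nothing above them.
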
`
`B. Network Management Layer (NML)
`
As a user of the EML, the NML forms its requests
in a general way so as to minimize its sensitivity
to changes within the network or within the EML
system itself. The developers of an NML fault management
system who need to display device alarms
on a graphical display would design the system such
that it operates on a generic network element object
within the EML interface. This generic interface can
provide all the basic alarm information on a given device
of any technology type. Thus when new network
technologies are added to the network, and supported
at the EML, zero changes are required within the upstream
system to display alarms from the new technology,
since the IEMS interface on which it depends
does not change.
`NML components include:
`
• Service Assurance, i.e. fault and performance
management. This component provides the
function of monitoring the network health and
proactively detecting potential faults.
• Testing and Fault Isolation. This component, in
the event of a network error, helps isolate and
diagnose the problem.
• Configuration. This includes inventory management,
provisioning and order fulfillment. This
component allows us to discover, record and
change the state of the network.
• User interface. This is implemented as a set of
Java classes, which are clients of the EML or
NML CORBA interfaces and provide graphical
views which interact with the service assurance,
testing and configuration aspects of the network.
• Secure access gateway. This enables secure access
into, or between, components. Based on
IIOP, this module provides encrypted transport,
user authentication, authorization services for
access control, and auditing of all gated operations.
`
For the NML, we are currently developing the
NeMoW (Network Management On the Web) system.
Given the breadth and depth of functionality
required for the NML within a Telco, it is hard to
see how a single system developed by a single group
could be produced to meet these needs. Within GTE,
the NeMoW system is composed of a set of heterogeneous
components, each performing a different function,
written by a different development group, possibly
on differing platforms and languages. What ties
these systems together is agreement on a few basic
concepts. These systems are component based;
each performs its function as independently as possible
from the others. They use the services of the
IEMS to operate on, and communicate with, the network.
They share the same set of logical network
models, with extensions necessary to the function
they perform. Finally, they share a common identification
scheme, either directly or through the use
of a naming service, so that network components can
be commonly identified.
`
The component architecture of GTE's next generation
NMS is shown in Figure 2. At the EML, a set
of IEMS objects manage the connection of network
elements of different technology, different vendor and
different protocol, and convert their proprietary information
model into the generic information model of
the system. At the NML is a set of NeMoW objects.
Each NeMoW object is responsible for a partitioned
subnet, or a set of subnets, and may talk to
several IEMS objects in order to provide the NML
services for the network elements managed by these
IEMS. A NeMoW object finds out which IEMS object
it should talk to through the IEMS_Locator object.
The CORBA naming server provides the naming service
for the whole system, and CORBA event channels
provide the event service for upstream alarm
dispatching.

Fig. 2. Component architecture of GTE's next generation
NMS

We can see that if any one of the IEMS objects fails,
a subset of the network elements will become invisible
to upper layers. If any one of the NeMoW objects
fails, the related NML management functionality will
become unavailable to internal or external customers
and upper layer objects. If the naming server object
fails, none of the other CORBA objects of the system
is able to resolve and bind the object references
with object names, which in turn will cause the whole
system to fail. If an event channel object fails, the
system will not be able to receive any alarms generated
from the switches that are using this event channel
to deliver alarms. So part of the fault management
service provided by the system will be unavailable.
If the IEMS_Locator fails, a NeMoW object will not
be able to find out which IEMS object it should talk
to for a particular network element, so the system
fails.

III. DIFFERENT FAULT-TOLERANCE APPROACHES
IN CORBA BASED SOFTWARE

Reliability and fault-tolerance are usually achieved
through replications of objects. Depending upon
how these replications synchronize with each other,
and how they interact with the client objects, there are
three major ways of object replication, namely "Cold
Standby", "Warm Standby" and "Hot Standby" [11].

A. Cold Standby

In this approach, a client object only knows where
the primary and the secondary objects are. The secondary
object may or may not be invoked when the
primary object is invoked. There is no communication
between the primary and the secondary objects,
or between the client and the secondary objects. If
the primary object fails, the client object will invoke
the secondary object if necessary, rebind to the secondary
object, and use it as the new primary object.
To save the state of the primary object, some
stable storage mechanism must be used. The major
drawbacks of this approach are that the stable storage
becomes the single point of failure, and that the time
required to invoke the secondary object and bring it
to the current state of the primary object is usually
long.

B. Warm Standby

As proposed in [11], the primary object and the
secondary object are both invoked initially. The primary
object periodically logs the incoming requests
and the internal state into the secondary object.
During the execution, the primary object updates
the state of the secondary object whenever the primary
object's state is changed. If the primary object
crashes, the client can rebind to the secondary object,
use it as the new primary object, and ask the replication
manager to invoke a new secondary object. The
major drawback of this approach is that the protocol
of the failover and the invocation of a new object
is very complex even in the simplest case where
only one primary object and one backup object are
involved, and the failure of the replication manager
is actually not considered [11]. Another drawback
is that the replication protocol is not transparent to
the client object, so this approach is not scalable to
more object replications. The replication protocol
will become more complex if we try to use the same
approach for the replication manager.
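
A toy simulation can make the warm standby failover concrete, including the point that the client itself carries the failover logic. This is a sketch under simplifying assumptions, not the protocol of [11]: the classes `Replica`, `ReplicationManager`, and `Client` are hypothetical, state is a plain dictionary, and a crash is simulated with an `alive` flag.

```python
class Replica:
    """Hypothetical server object; a crash is simulated with the alive flag."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})
        self.alive = True
        self.backup = None  # set only on the current primary

    def apply(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.state[key] = value
        if self.backup is not None:
            # The primary pushes every state change to its secondary.
            self.backup.state[key] = value


class ReplicationManager:
    """Hypothetical replication manager; note that it is itself unprotected."""

    def __init__(self):
        self.spawned = 0

    def new_secondary(self, primary):
        self.spawned += 1
        return Replica(f"secondary-{self.spawned}", primary.state)


class Client:
    """The client knows both replicas, so failover is not transparent to it."""

    def __init__(self, primary, secondary, manager):
        primary.backup = secondary
        self.primary = primary
        self.secondary = secondary
        self.manager = manager

    def update(self, key, value):
        try:
            self.primary.apply(key, value)
        except ConnectionError:
            # Rebind to the secondary, then ask for a fresh secondary.
            self.primary = self.secondary
            self.secondary = self.manager.new_secondary(self.primary)
            self.primary.backup = self.secondary
            self.primary.apply(key, value)
```

Even in this two-replica toy, the rebind logic lives in `Client.update`, which mirrors the transparency drawback described above; extending it to more replicas or to a replicated manager would complicate every client.
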
`
`C. Hot Standby
`
In this approach, multiple replications of an object
are invoked, and they accept and respond to every
client request simultaneously. Currently several architectures
exist using this approach.

As a naive architecture using this approach, a
client keeps a list of references of server object replications
and sends point-to-point messages to each of
them for every request. A client can choose the first
reply, or wait for all returned values for voting. If one
of these objects has crashed, other objects can react
to incoming requests and no recovery is needed. The
client can invoke a new object to add to the object
group [11].
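
The naive hot standby scheme, with its two reply policies, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Replica` class, its `add` operation, and the `offset` field used to simulate a faulty replica are hypothetical, and the calls are made sequentially rather than concurrently.

```python
from collections import Counter


class Replica:
    """Hypothetical replicated server; offset != 0 simulates a faulty replica."""

    def __init__(self, offset=0):
        self.alive = True
        self.offset = offset

    def add(self, a, b):
        if not self.alive:
            raise ConnectionError("replica is down")
        return a + b + self.offset


def invoke_all(replicas, a, b):
    """Send the same request point-to-point to every replica in the list."""
    replies = []
    for replica in replicas:
        try:
            replies.append(replica.add(a, b))
        except ConnectionError:
            continue  # a crashed replica is skipped; no recovery step is needed
    if not replies:
        raise RuntimeError("no replica answered")
    return replies


def first_reply(replicas, a, b):
    """Policy 1: accept the first reply that arrives."""
    return invoke_all(replicas, a, b)[0]


def voted_reply(replicas, a, b):
    """Policy 2: wait for all replies and return the most common value."""
    value, _count = Counter(invoke_all(replicas, a, b)).most_common(1)[0]
    return value
```

With three replicas of which one returns a wrong value, `voted_reply` masks the faulty answer, whereas `first_reply` only masks crashes. The cost is that every request is sent once per replica.
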
More sophisticated architectures try to make
the object group transparent to the client. Orbix+ISIS
[17] and Electra [18] use an integrated architecture
in which the standard Object Request
Broker (ORB) is modified and extended to have a
group communication mechanism built in. The major
drawback of the integrated architecture is that
it is ORB dependent (implemented on Orbix) and
it requires special software and hardware. Another
drawback of this approach is that the performance
degrades linearly with an increasing number of replications
due to the increasing overhead involved in synchronizing
the replications [20].

The object group service, proposed by Felber [19],
advocates the building of a group communication
mechanism as a separate service, in the same way as
the persistence service, the transaction service and
other services, which can be added on to the standard
CORBA architecture. The advantage of this architecture
is that it is ORB independent, flexible and it
makes the use of groups explicit. However, besides
the same performance degradation problem as in the
integrated architecture, there is no commercial software
currently available that provides this service.
GTE is reluctant to build its own group communication
service due to the potentially long development
cycle and the risk of reinventing the wheel when some
COTS products become available.
`
`IV. FAULT-TOLERANCE IN GTE'S NEXT
`GENERATION NMS
`
GTE's current NMS, TONICS [5] [6] [7], uses an
approach similar to the "warm standby". Under normal
conditions, a mated pair of TONICS servers is
running. The management load is split between them
such that each server is the primary for approximately
half of the managed objects. On startup, a
client logs in to each server and queries it for the
set of objects it manages. Requests are then directed
to the primary server for any given object.
If either server goes down, both the client and the
mated backup server detect the failure. The backup
server then takes over the management load of the
failed server. The client redirects all requests to the
backup server. Each server has its own database and
a db.sync process. The db.sync process manages the
synchronous update to the TONICS server database
pair, and reconciles the two TONICS databases during
startup. For our next generation NMS, we need to
mitigate the effects of a failure in our Naming Service,
Event Channels or critical application/business
objects.
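
The mated-pair behaviour described for TONICS can be sketched as a routing table held by the client. This is a loose illustration rather than TONICS code: `Server`, `MatedPairClient`, and `handle_failure` are hypothetical names, failure detection is reduced to an explicit call, and database synchronization is left out.

```python
class Server:
    """Hypothetical TONICS-like server that is primary for some managed objects."""

    def __init__(self, name, primary_objects):
        self.name = name
        self.alive = True
        self.managed = set(primary_objects)

    def objects(self):
        return set(self.managed)


class MatedPairClient:
    """Client that routes each request to the primary of the target object."""

    def __init__(self, server_a, server_b):
        self.servers = (server_a, server_b)
        # On startup the client asks each server which objects it manages.
        self.route = {obj: srv for srv in self.servers for obj in srv.objects()}

    def handle_failure(self, failed):
        """Simulate both sides detecting the failure of one server."""
        backup = self.servers[1] if failed is self.servers[0] else self.servers[0]
        failed.alive = False
        backup.managed |= failed.managed  # backup takes over the failed load
        for obj in failed.managed:
            self.route[obj] = backup  # client redirects its requests

    def server_for(self, obj):
        return self.route[obj].name
```

In normal operation the load is split across both servers; after `handle_failure` the surviving server carries the whole load, so the pair tolerates one failure at a time.
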
For various business and technical reasons, we have
adopted the VisiBroker ORB from Inprise as our primary
ORB for development. However, the VisiBroker
ORB does not support multi-cast or mirroring [21],
so we cannot use "hot standby" approaches similar
to Orbix+ISIS [17] and Electra [18]. Also, the tight
project delivery date does not allow us to develop
our own group communication service as proposed
by [19]. On the other hand, the VisiBroker ORB provides
some facilities that can be used to implement
fault-tolerant objects. We summarize these features
as follows. In our application, we may only use a
subset of them.
• Smart agent (osagent): A dynamic, distributed
service that can be used by both client programs
and object implementations. When a client invokes
the bind method on an object, the osagent
locates the object implementation and establishes
a connection between the client and
the object. Object implementations need to register
with the osagent so the client can locate
them. Ordinarily, at least one instance of osagent
should be running on a local network. To
achieve fault-tolerance on the osagent, multiple
osagents are run on a local network. If one of
the osagents becomes unavailable, all object implementations
registered with that osagent will
be automatically re-registered with another osagent.
Meanwhile, client applications using an
osagent that becomes unavailable will be automatically
switched to another osagent. The failure
of an osagent is completely transparent to
the client applications.
• Auto-rebind: To achieve fault-tolerance on
server objects, multiple instances of an object
implementation can be started on different
hosts. If one instance becomes unavailable,
due to process failure, machine crash, or network
failure, the ORB will detect the loss of
connection between the client object and that
server instance, and will automatically contact
the smart agent to establish a connection with
another instance of the object implementation.
The client can continue invoking methods on the
object without being concerned that a new instance
of the object is being used. To enable
auto-rebind, the client must set the enable_rebind
option to true when it first tries to bind to a
server object.
• Persistent object through Interceptors: When an
object implementation maintains state, just using
auto-rebind is not enough to achieve fault-tolerance
because additional steps must be taken
to ensure that the rebound object has the same
state as the old one. In these cases, the client
can register an interceptor for the ORB object.
When the connection to an object implementation
fails and the ORB re-connects the client to a
replica object implementation, the bind interceptor's
rebind_succeeded() method will be invoked
by the ORB. The client must provide
an implementation of this method to bring the
replica to the current state.
• Persistent object reference: The VisiBroker ORB
supports persistent object references, which remain
valid beyond the lifetime of the process that creates
the object. Persistent object references, together
with the auto-rebind mechanism, can be
used to make the failure of a server object transparent
to the client applications. Persistent object
references are registered with the osagent when
the boa.obj_is_ready() method is invoked.
• Object auto-restart through Object Activation
Daemon (OAD): If an object is registered with
an OAD, when the object crashes the OAD will
restart it automatically. This facility can also be
used to protect against a host becoming unavailable
by starting the OAD on multiple hosts and registering
the replications of an object with each of the
OADs.
• Object migration: An object can be terminated
on one host and restarted on another host. The
object migration facility can be used to provide
load balancing and keep objects available when a
host fails or has to be shut down for maintenance.
• Cloning object: The IDL compiler will generate
a _clone() method for each object interface.
The _clone() method will create an exact copy
of the object's entire state and establish a new,
separate connection to the replica object implementation.
The object reference returned and
the original object reference represent two distinct
connections to the object implementations.
The _clone() method provides an alternate way
to start a replica object.
• Interceptor: An interceptor object sits between
the client and server, and can be used to view
or make use of under-the-cover communications
between client and servers. For example, the
BindInterceptor.rebind() method will be invoked
by the ORB when the object the client originally
bound to fails. This method can be implemented
to invoke another object and/or notify
the client about the object failure if necessary;
the BindInterceptor.rebind_succeeded()
method will be invoked by the ORB if a rebind
succeeds. This method can be implemented to
bring the rebound object to the latest state of
the original object if necessary; the
BindInterceptor.rebind_failed() function is invoked by the
ORB when a rebind fails. This method can
be implemented to do a different rebind, and/or
notify the client about the rebind failure.
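
How auto-rebind and the bind interceptor callbacks fit together can be shown with a small simulation. This is not the VisiBroker API: the Python classes `Instance`, `StateRestoringInterceptor`, and `RebindingProxy` are hypothetical, and only the callback names `rebind_succeeded` and `rebind_failed` are borrowed from the text. The proxy plays the combined role of ORB and smart agent, and the object state is a single counter.

```python
class Instance:
    """Hypothetical stateful object implementation running on one host."""

    def __init__(self, host):
        self.host = host
        self.alive = True
        self.counter = 0

    def increment(self):
        if not self.alive:
            raise ConnectionError(self.host)
        self.counter += 1
        return self.counter


class StateRestoringInterceptor:
    """Client-side callbacks, named after the interceptor methods in the text."""

    def __init__(self):
        self.last_known = 0
        self.failures = 0

    def rebind_succeeded(self, replica):
        # Bring the replica to the last state the client observed.
        replica.counter = self.last_known

    def rebind_failed(self):
        self.failures += 1


class RebindingProxy:
    """Toy stand-in for the ORB plus smart agent: retries on a live instance."""

    def __init__(self, instances, interceptor):
        self.instances = list(instances)
        self.interceptor = interceptor
        self.current = self.instances[0]

    def increment(self):
        try:
            result = self.current.increment()
        except ConnectionError:
            self._rebind()
            result = self.current.increment()
        self.interceptor.last_known = result
        return result

    def _rebind(self):
        for candidate in self.instances:
            if candidate.alive:
                self.current = candidate
                self.interceptor.rebind_succeeded(candidate)
                return
        self.interceptor.rebind_failed()
        raise ConnectionError("no live instance")
```

The caller keeps invoking `increment()` on the proxy and sees a continuous count across the failure. The state restoration is only as current as the last reply the client received, which is why a stateful object needs more than auto-rebind alone.
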
`
`A. Fault-tolerant Naming Service
`
In this section we describe our implementation of
a fault-tolerant naming service. The naming service
usually runs as a separate process. Using
VisiBroker, a logfile name can be specified when
the naming service is started at the command line.
This logfile maintains a persistent record of the state
of the naming service, including the naming contexts
that it contains. If the naming service needs
to be restarted, the logfile is used to restore the
naming contexts that the service processed before
being shut down. The logfile provides support for
planned system maintenance. It is not enough to
support fault-tolerance for hardware failure or operating
system software failure because in either of
these cases the naming service will not be able to
restart on the same host. In order to support fault-tolerance
for hardware failure, operating system software
failure, and naming service software failure, we
start the naming service on two different hosts, each
with its own server interceptor [21] and database,
as shown in Figure 3. The
NamingServerInterceptor.receive_request() method is invoked by the ORB
when the server object receives a request from a
client. We implement the
NamingServerInterceptor.receive_request() so that it parses the incoming
`messages and arguments. If the incoming message
`is one that changes the naming server's state, such
`as bind