`
`DEJAN S. MILO ´JI ˇCI ´C
`HP Labs
`
`FRED DOUGLIS
`
`AT&T Labs–Research
`
`YVES PAINDAVEINE
`
`TOG Research Institute
`
`RICHARD WHEELER
`
`EMC
`
`AND
`
`SONGNIAN ZHOU
`
`University of Toronto and Platform Computing
`
`Process migration is the act of transferring a process between two machines. It enables
`dynamic load distribution, fault resilience, eased system administration, and data
`access locality. Despite these goals and ongoing research efforts, migration has not
`achieved widespread use. With the increasing deployment of distributed systems in
`general, and distributed operating systems in particular, process migration is again
`receiving more attention in both research and product development. As
`high-performance facilities shift from supercomputers to networks of workstations, and
`with the ever-increasing role of the World Wide Web, we expect migration to play a
`more important role and eventually to be widely adopted.
`This survey reviews the field of process migration by summarizing the key concepts
`and giving an overview of the most important implementations. Design and
`implementation issues of process migration are analyzed in general, and then revisited
`for each of the case studies described: MOSIX, Sprite, Mach, and Load Sharing Facility.
`The benefits and drawbacks of process migration depend on the details of
`implementation and, therefore, this paper focuses on practical matters. This survey will
`help in understanding the potentials of process migration and why it has not caught on.
`
`Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]:
`Distributed Systems—network operating systems; D.4.7 [Operating Systems]:
`Organization and Design—distributed systems; D.4.8 [Operating Systems]:
`Performance—measurements; D.4.2 [Operating Systems]: Storage Management—
`distributed memories
`General Terms: Design, Experimentation
`Additional Key Words and Phrases: Process migration, distributed systems, distributed
`operating systems, load distribution
`
`Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted
`without fee provided that the copies are not made or distributed for profit or commercial advantage, the
`copyright notice, the title of the publication, and its date appear, and notice is given that copying is by
`permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists,
`requires prior specific permission and/or a fee.
`c(cid:176)2001 ACM 0360-0300/01/0900-0241 $5.00
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000, pp. 241–299.
`
`Ex. 1017 - Page 1 of 59
`
`
`
`242
`
`D. S. Milojiˇci´c et al.
`
`1. INTRODUCTION
`Organization of the Paper
`2. BACKGROUND
`2.1. Terminology
`2.2. Target Architectures
`2.3. Goals
`2.4. Application Taxonomy
`2.5. Migration Algorithm
`2.6. System Requirements for Migration
`2.7. Load Information Management
`2.8. Distributed Scheduling
`2.9. Alternatives to Process Migration
`3. CHARACTERISTICS
`3.1. Complexity and Operating
`System Support
`3.2. Performance
`3.3. Transparency
`3.4. Fault Resilience
`3.5. Scalability
`3.6. Heterogeneity
`3.7. Summary
`4. EXAMPLES
`4.1. Early Work
`4.2. Transparent Migration in
`UNIX-like Systems
`4.3. OS with Message-Passing Interface
`4.4. Microkernels
`4.5. User-space Migrations
`4.6. Application-specific Migration
`4.7. Mobile Objects
`4.8. Mobile Agents
`5. CASE STUDIES
`5.1. MOSIX
`5.2. Sprite
`5.3. Mach
`5.4. LSF
`6. COMPARISON
`7. WHY PROCESS MIGRATION
`HAS NOT CAUGHT ON
`7.1. Case Analysis
`7.2. Misconceptions
`7.3. True Barriers to Migration Adoption
`7.4. How these Barriers Might be Overcome
`8. SUMMARY AND FURTHER RESEARCH
`ACKNOWLEDGMENTS
`REFERENCES
`
`1. INTRODUCTION
`A process is an operating system ab-
`straction representing an instance of a
`running computer program. Process mi-
`gration is the act of transferring a pro-
`
`cess between two machines during its
`execution. Several implementations have
`been built for different operating systems,
`including MOSIX [Barak and Litman,
`1985], V [Cheriton, 1988], Accent [Rashid
`and Robertson, 1981], Sprite [Ousterhout
`et al., 1988], Mach [Accetta et al., 1986],
`and OSF/1 AD TNC [Zajcew et al., 1993].
`In addition, some systems provide mech-
`anisms that checkpoint active processes
`and resume their execution in essentially
`the same state on another machine, in-
`cluding Condor [Litzkow et al., 1988] and
`Load Sharing Facility (LSF) [Zhou et al.,
`1994]. Process migration enables:
`
`grating processes from overloaded nodes
`to less loaded ones,
`
`cesses from nodes that may have expe-
`rienced a partial failure,
`
`r dynamic load distribution, by mi-
`r fault resilience, by migrating pro-
`r improved system administration, by
`r data access locality, by migrating pro-
`
`migrating processes from the nodes that
`are about to be shut down or otherwise
`made unavailable, and
`
`cesses closer to the source of some data.
`Despite these goals and ongoing re-
`search efforts, migration has not achieved
`widespread use. One reason for this is the
`complexity of adding transparent migra-
`tion to systems originally designed to run
`stand-alone, since designing new systems
`with migration in mind from the begin-
`ning is not a realistic option anymore. An-
`other reason is that there has not been a
`compelling commercial argument for op-
`erating system vendors to support process
`migration. Checkpoint-restart approaches
`offer a compromise here, since they can
`run on more loosely-coupled systems by
`restricting the types of processes that can
`migrate.
`In spite of these barriers, process mi-
`gration continues to attract research. We
`believe that the main reason is the po-
`tentials offered by mobility as well as
`the attraction to hard problems, so in-
`herent to the research community. There
`have been many different goals and
`approaches to process migration because
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000.
`
`Ex. 1017 - Page 2 of 59
`
`
`
`Process Migration
`
`of the potentials migration can offer to
`different applications (see Section 2.3 on
`goals, Section 4 on approaches, and Sec-
`tion 2.4 on applications).
`With the increasing deployment of dis-
`tributed systems in general, and dis-
`tributed operating systems in particular,
`the interest in process migration is again
`on the rise both in research and in prod-
`uct development. As high-performance fa-
`cilities shift from supercomputers to Net-
`works of Workstations (NOW) [Anderson
`et al., 1995] and large-scale distributed
`systems, we expect migration to play a
`more important role and eventually gain
`wider acceptance.
`Operating systems developers in in-
`dustry have considered supporting pro-
`cess migration, for example Solaris MC
`[Khalidi et al., 1996], but thus far the
`availability of process migration in com-
`mercial systems is non-existent as we
`describe below. Checkpoint-restart sys-
`tems are becoming increasingly deployed
`for long-running jobs. Finally, techniques
`originally developed for process migration
`have been employed in developing mobile
`agents on the World Wide Web. Recent in-
`terpreted programming languages, such
`as Java [Gosling et al., 1996], Telescript
`[White, 1996] and Tcl/Tk [Ousterhout,
`1994] provide additional support for agent
`mobility.
`There exist a few books that discuss
`process migration [Goscinski, 1991; Barak
`et al., 1993; Singhal and Shivaratri, 1994;
`Milojiˇci´c et al., 1999]; a number of sur-
`veys [Smith, 1988; Eskicioglu, 1990; Nut-
`tal, 1994], though none as detailed as
`this survey; and Ph.D. theses that deal
`directly with migration [Theimer et al.,
`1985; Zayas, 1987a; Lu, 1988; Douglis,
`1990; Philippe, 1993; Milojiˇci´c, 1993c; Zhu,
`1992; Roush, 1995], or that are related
`to migration [Dannenberg, 1982; Nichols,
`1990; Tracey, 1991; Chapin, 1993; Knabe,
`1995; Jacqmot, 1996].
`This survey reviews the field of pro-
`cess migration by summarizing the key
`concepts and describing the most impor-
`tant implementations. Design and im-
`plementation issues of process migration
`are analyzed in general and then re-
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000.
`
`243
`
`visited for each of the case studies de-
`scribed: MOSIX, Sprite, Mach, and LSF.
`The benefits and drawbacks of process mi-
`gration depend on the details of implemen-
`tation and therefore this paper focuses
`on practical matters. In this paper we
`address mainly process migration mech-
`anisms. Process migration policies, such
`as load information management and dis-
`tributed scheduling, are mentioned to the
`extent that they affect the systems be-
`ing discussed. More detailed descriptions
`of policies have been reported elsewhere
`(e.g., Chapin’s survey [1996]).
`This survey will help in understand-
`ing the potential of process migration. It
`attempts to demonstrate how and why
`migration may be widely deployed. We
`assume that the reader has a general
`knowledge of operating systems.
`
`Organization of the Paper
`The paper is organized as follows. Sec-
`tion 2 provides background on process mi-
`gration. Section 3 describes the process
`migration by surveying its main charac-
`teristics: complexity, performance, trans-
`parency, fault resilience, scalability and
`heterogeneity. Section 4 classifies vari-
`ous implementations of process migration
`mechanisms and then describes a couple
`of representatives for each class. Section 5
`describes four case studies of process mi-
`gration in more detail. In Section 6 we
`compare the process migration implemen-
`tations presented earlier. In Section 7 we
`discuss why we believe that process migra-
`tion has not caught on so far. In the last
`section we summarize the paper and de-
`scribe opportunities for further research.
`
`2. BACKGROUND
`This section gives some background on
`process migration by providing an over-
`view of process migration terminology,
`target architectures, goals, application
`taxonomy, migration algorithms, system
`requirements, load information manage-
`ment, distributed scheduling, and alterna-
`tives to migration.
`
`Ex. 1017 - Page 3 of 59
`
`
`
`244
`
`D. S. Milojiˇci´c et al.
`
`Fig. 1. High Level View of Process Migration.
`Process migration consists of extracting the state of
`the process on the source node, transferring it to the
`destination node where a new instance of the process
`is created, and updating the connections with other
`processes on communicating nodes.
`
`2.1. Terminology
`A process is a key concept in operating
`systems [Tanenbaum, 1992]. It consists of
`data, a stack, register contents, and the
`state specific to the underlying Operating
`System (OS), such as parameters related
`to process, memory, and file management.
`A process can have one or more threads
`of control. Threads, also called lightweight
`processes, consist of their own stack and
`register contents, but share a process’s ad-
`dress space and some of the operating-
`system-specific state, such as signals. The
`task concept was introduced as a gener-
`alization of the process concept, whereby
`a process is decoupled into a task and a
`number of threads. A traditional process
`is represented by a task with one thread
`of control.
`Process migration is the act of trans-
`ferring a process between two machines
`(the source and the destination node) dur-
`ing its execution. Some architectures also
`define a host or home node, which is the
`node where the process logically runs. A
`high-level view of process migration is
`shown in Figure 1. The transferred state
`includes the process’s address space, exe-
`cution point (register contents), communi-
`cation state (e.g., open files and message
`channels) and other operating system de-
`pendent state. Task migration represents
`transferring a task between two machines
`during execution of its threads.
`During migration, two instances of the
`migrating process exist: the source in-
`stance is the original process, and the
`
`Fig. 2. Taxonomy of Mobility.
`
`destination instance is the new process
`created on the destination node. After mi-
`gration, the destination instance becomes
`a migrated process. In systems with a
`home node, a process that is running on
`other machines may be called a remote
`process (from the perspective of the home
`node) or a foreign process (from the per-
`spective of the hosting node).
`Remote invocation is the creation of a
`process on a remote node. Remote invo-
`cation is usually a less “expensive” opera-
`tion than process migration. Although the
`operation can involve the transfer of some
`state, such as code or open files, the con-
`tents of the address space need not be
`transferred.
`Generally speaking, mobility can be
`classified into hardware and software mo-
`bility, as described in Figure 2. Hardware
`mobility deals with mobile computing,
`such as with limitations on the connectiv-
`ity of mobile computers and mobile IP (see
`[Milojiˇci´c et al., 1999] for more details). A
`few techniques in mobile computing have
`an analogy in software mobility, such as
`security, locating, naming, and communi-
`cation forwarding. Software mobility can
`be classified into the mobility of passive
`data and active data. Passive data rep-
`resents traditional means of transferring
`data between computers; it has been em-
`ployed ever since the first two comput-
`ers were connected. Active data can be
`further classified into mobile code, pro-
`cess migration and mobile agents. These
`three classes represent incremental evo-
`lution of state transfer. Mobile code, such
`as Java applets, transfers only code be-
`tween nodes. Process migration, which is
`the main theme of this paper, deals pri-
`marily with code and data transfer. It also
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000.
`
`Ex. 1017 - Page 4 of 59
`
`
`
`Process Migration
`
`deals with the transfer of authority, for
`instance access to a shared file system,
`but in a limited way: authority is under
`the control of a single administrative do-
`main. Finally, mobile agents transfer code,
`data, and especially authority to act on
`the owner’s behalf on a wide scale, such
`as within the entire Internet.
`
`2.2. Target Architectures
`Process migration research started with
`the appearance of distributed processing
`among multiple processors. Process mi-
`gration introduces opportunities for shar-
`ing processing power and other resources,
`such as memory and communication chan-
`nels. It is addressed in early multipro-
`cessor systems [Stone, 1978; Bokhari,
`1979]. Current multiprocessor systems,
`especially symmetric multiprocessors, are
`scheduled using traditional scheduling
`methods. They are not used as an envi-
`ronment for process migration research.
`Process migration in NUMA (Non-
`Uniform Memory Access) multiprocessor
`architectures is still an active area of re-
`search [Gait, 1990; Squillante and Nelson,
`1991; Vaswani and Zahorjan, 1991; Nelson
`and Squillante, 1995]. The NUMA archi-
`tectures have a different access time to the
`memory of the local processor, compared
`to the memory of a remote processor, or to
`a global memory. The access time to the
`memory of a remote processor can be vari-
`able, depending on the type of intercon-
`nect and the distance to the remote pro-
`cessor. Migration in NUMA architectures
`is heavily dependent on the memory foot-
`print that processes have, both in memory
`and in caches. Recent research on virtual
`machines on scalable shared memory mul-
`tiprocessors [Bugnion, et al., 1997] rep-
`resents another potential for migration.
`Migration of whole virtual machines be-
`tween processors of a multiprocessor ab-
`stracts away most of the complexities of
`operating systems, reducing the migrate-
`able state only to memory and to state
`contained in a virtual monitor [Teodosiu,
`2000]. Therefore, migration is easier to im-
`plement if there is a notion of a virtual
`machine.
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000.
`
`245
`
`Massively Parallel Processors (MPP)
`are another type of architecture used
`for migration research [Tritscher and
`Bemmerl, 1992; Zajcew et al., 1993]. MPP
`machines have a large number of pro-
`cessors that are usually shared between
`multiple users by providing each of them
`with a subset, or partition, of the pro-
`cessors. After a user relinquishes a par-
`tition, it can be reused by another user.
`MPP computers are typically of a NORMA
`(NO Remote Memory Access) type, i.e.,
`there is no remote memory access. In
`that respect they are similar to net-
`work clusters, except they have a much
`faster interconnect. Migration represents
`a convenient tool to achieve repartition-
`ing. Since MPP machines have a large
`number of processors, the probability of
`failure is also larger. Migrating a running
`process from a partially failed node, for ex-
`ample after a bank of memory unrelated to
`the process fails, allows the process to con-
`tinue running safely. MPP machines also
`use migration for load distribution, such
`as the psched daemon on Cray T3E, or
`Loadleveler on IBM SP2 machines.
`Since its inception, a Local Area Net-
`work (LAN) of computers has been the
`most frequently used architecture for pro-
`cess migration. The bulk of the systems de-
`scribed in this paper, including all of the
`case studies, are implemented on LANs.
`Systems such as NOW [Anderson et al.,
`1995] or Solaris [Khalidi et al., 1996] have
`recently investigated process migration
`using clusters of workstations on LANs.
`It was observed that at any point in time
`many autonomous workstations on a LAN
`are unused, offering potential for other
`users based on process migration [Mutka
`and Livny, 1987]. There is, however, a so-
`ciological aspect to the autonomous work-
`station model. Users are not willing to
`share their computers with others if this
`means affecting their own performance
`[Douglis and Ousterhout, 1991]. The pri-
`ority of the incoming processes (process-
`ing, VM, IPC priorities) may be reduced
`in order to allow for minimal
`impact
`on the workstation’s owner [Douglis and
`Ousterhout, 1991; Krueger and Chawla,
`1991].
`
`Ex. 1017 - Page 5 of 59
`
`
`
`246
`
`Most recently, wide-area networks have
`presented a huge potential for migration.
`The evolution of the Web has significantly
`improved the relevance and the opportu-
`nities for using a wide-area network for
`distributed computing. This has resulted
`in the appearance of mobile agents, en-
`tities that freely roam the network and
`represent the user in conducting his tasks.
`Mobile agents can either appear on the In-
`ternet [Johansen et al., 1995] or in closed
`networks, as in the original version of
`Telescript [White, 1996].
`
`2.3. Goals
`The goals of process migration are closely
`tied with the type of applications that use
`migration, as described in next section.
`The goals of process migration include:
`Accessing more processing power
`is a goal of migration when it is used
`for load distribution. Migration is partic-
`ularly important in the receiver-initiated
`distributed scheduling algorithms, where
`a lightly loaded node announces its avail-
`ability and initiates process migration
`from an overloaded node. This was the
`goal of many systems described in this sur-
`vey, such as Locus [Walker et al., 1983],
`MOSIX [Barak and Shiloh, 1985], and
`Mach [Milojiˇci´c et al., 1993a]. Load distri-
`bution also depends on load information
`management and distributed scheduling
`(see Sections 2.7 and 2.8). A variation
`of this goal is harnessing the computing
`power of temporarily free workstations in
`large clusters. In this case, process mi-
`gration is used to evict processes upon
`the owner’s return, such as in the case of
`Sprite (see Section 5.2).
`Exploitation of resource locality is
`a goal of migration in cases when it is
`more efficient to access resources locally
`than remotely. Moving a process to an-
`other end of a communication channel
`transforms remote communication to lo-
`cal and thereby significantly improves per-
`formance. It is also possible that the re-
`source is not remotely accessible, as in the
`case when there are different semantics
`for local and remote accesses. Examples
`include work by Jul [1989], Milojiˇci´c et al.
`[1993], and Miller and Presotto [1981].
`
`D. S. Milojiˇci´c et al.
`
`Resource sharing is enabled by mi-
`gration to a specific node with a special
`hardware device, large amounts of free
`memory, or some other unique resource.
`Examples include NOW [Anderson et al.,
`1995]
`for utilizing memory of remote
`nodes, and the use of parallel make in
`Sprite [Douglis and Ousterhout, 1991] and
`work by Skordos [1995] for utilizing un-
`used workstations.
`Fault resilience is improved by migra-
`tion from a partially failed node, or in the
`case of long-running applications when
`failures of different kinds (network, de-
`vices) are probable [Chu et al., 1980]. In
`this context, migration can be used in com-
`bination with checkpointing, such as in
`Condor [Litzkow and Solomon, 1992] or
`Utopia [Zhou et al., 1994]. Large-scale sys-
`tems where there is a likelihood that some
`of the systems can fail can also benefit
`from migration, such as in Hive [Chapin
`et al., 1995] and OSF/1 AD TNC [Zajcew
`et al., 1993].
`System administration is simplified
`if long-running computations can be tem-
`porarily transferred to other machines.
`For example, an application could mi-
`grate from a node that will be shutdown,
`and then migrate back after the node is
`brought back up. Another example is the
`repartitioning of large machines, such as
`in the OSF/1 AD TNC Paragon configura-
`tion [Zajcew et al., 1993].
`Mobile computing also increases the
`demand for migration. Users may want to
`migrate running applications from a host
`to their mobile computer as they connect
`to a network at their current location or
`back again when they disconnect [Bharat
`and Cardelli, 1995].
`
`2.4. Application Taxonomy
`The type of applications that can benefit
`from process migration include:
`Parallelizable applications can be
`started on certain nodes, and then mi-
`grated at the application level or by a
`system-wide migration facility in response
`to things like load balancing consider-
`ations. Parallel Virtual Machine (PVM)
`[Beguelin et al., 1993]
`is an example
`of application-level support for parallel
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000.
`
`Ex. 1017 - Page 6 of 59
`
`
`
`Process Migration
`
`invocation and interprocess communi-
`cation, while Migratory PVM (MPVM)
`[Casas et al., 1995] extends PVM to al-
`low instances of a parallel application
`to migrate among nodes. Some other
`applications are inherently paralleliz-
`able, such as the make tool [Baalbergen,
`1988]. For example, Sprite provides a
`migration-aware parallel make utility
`that distributes a compilation across
`several nodes [Douglis and Ousterhout,
`1991]. Certain processor-bound applica-
`tions, such as scientific computations, can
`be parallelized and executed on multi-
`ple nodes. An example includes work by
`Skordos [1995], where an acoustic appli-
`cation is parallelized and executed on a
`cluster of workstations. Applications that
`perform I/O and other nonidempotent op-
`erations are better suited to a system-wide
`remote execution facility that provides lo-
`cation transparency and, if possible, pre-
`emptive migration.
`Long-running applications, which
`can run for days or even weeks, can
`suffer various interruptions, for example
`partial node failures or administrative
`shutdowns. Process migration can relo-
`cate these applications transparently to
`prevent interruption. Examples of such
`systems include work by Freedman [1991]
`and MPVM [Casas et al., 1995]. Migra-
`tion can also be supported at the appli-
`cation level [Zhou et al., 1994] by pro-
`viding a checkpoint/restart mechanism
`which the application can invoke periodi-
`cally or upon notification of an impending
`interruption.
`Generic multiuser workloads,
`for
`example the random job mix that an
`undergraduate computer laboratory pro-
`duces, can benefit greatly from process mi-
`gration. As users come and go, the load on
`individual nodes varies widely. Dynamic
`process migration [Barak and Wheeler,
`1989, Douglis and Ousterhout, 1991] can
`automatically spread processes across all
`nodes, including those applications that
`are not enhanced to exploit the migration
`mechanism.
`An individual generic application,
`which is preemptable, can be used with
`various goals in mind (see Section 2.3).
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000.
`
`247
`
`Such an application can either migrate it-
`self, or it can be migrated by another au-
`thority. This type of application is most
`common in various systems described in
`Section 4 and in the case studies described
`in Section 5. Note that it is difficult to
`select such applications without detailed
`knowledge of past behavior, since many
`applications are short-lived and do not ex-
`ecute long enough to justify the overhead
`of migration (see Section 2.7).
`Migration-aware applications are
`applications that have been coded to
`explicitly take advantage of process
`migration. Dynamic process migration can
`automatically redistribute these related
`processes if the load becomes uneven on
`different nodes, e.g. if processes are dy-
`namically created, or there are many more
`processes than nodes. Work by Skordos
`[1995], Freedman [1991] and Cardelli
`[1995] represent this class of application.
`They are described in more detail in Sec-
`tion 4.6.
`Network applications are the most
`recent example of the potential use of mi-
`gration: for instance, mobile agents and
`mobile objects (see Sections 4.7 and 4.8).
`These applications are designed with mo-
`bility in mind. Although this mobility dif-
`fers significantly from the kinds of “pro-
`cess migration” considered elsewhere in
`this paper, it uses some of the same tech-
`niques: location policies, checkpointing,
`transparency, and locating and communi-
`cating with a mobile entity.
`
`2.5. Migration Algorithm
`Although there are many different migra-
`tion implementations and designs, most of
`them can be summarized in the following
`steps (see also Figure 3):
`1. A migration request is issued to a
`remote node. After negotiation, mi-
`gration has been accepted.
`2. A process is detached from its
`source node by suspending its execu-
`tion, declaring it to be in a migrating
`state, and temporarily redirecting com-
`munication as described in the follow-
`ing step.
`
`Ex. 1017 - Page 7 of 59
`
`
`
`248
`
`D. S. Milojiˇci´c et al.
`
`Fig. 3. Migration Algorithm. Many details have been simplified, such as user v. kernel migration, when
`is process actually suspended, when is the state transferred, how are message transferred, etc. These details
`vary subject to particular implementation.
`
`temporarily
`3. Communication is
`redirected by queuing up arriving
`messages directed to the migrated
`process, and by delivering them to
`the process after migration. This step
`continues in parallel with steps 4,
`5, and 6, as long as there are addi-
`tional
`incoming messages. Once the
`communication channels are enabled
`after migration (as a result of step 7),
`the migrated process is known to the
`external world.
`4. The process state is extracted,
`including memory contents; proces-
`sor state (register contents); commu-
`nication state (e.g., opened files and
`
`message channels); and relevant ker-
`nel context. The communication state
`and kernel context are OS-dependent.
`Some of the local OS internal state is
`not transferable. The process state is
`typically retained on the source node
`until the end of migration, and in some
`systems it remains there even after mi-
`gration completes. Processor dependen-
`cies, such as register and stack con-
`tents, have to be eliminated in the case
`of heterogeneous migration.
`5. A destination process instance is
`created into which the transferred
`state will be imported. A destination in-
`stance is not activated until a sufficient
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000.
`
`Ex. 1017 - Page 8 of 59
`
`
`
`Process Migration
`
`amount of state has been transferred
`from the source process instance. Af-
`ter that, the destination instance will
`be promoted into a regular process.
`6. State is transferred and imported
`into a new instance on the remote
`node. Not all of the state needs to be
`transferred; some of the state could be
`lazily brought over after migration is
`completed (see lazy evaluation in Sec-
`tion 3.2).
`7. Some means of forwarding refer-
`ences to the migrated process must be
`maintained. This is required in order
`to communicate with the process or to
`control it. It can be achieved by regis-
`tering the current location at the home
`node (e.g. in Sprite), by searching for the
`migrated process (e.g. in the V Kernel,
`at the communication protocol level),
`or by forwarding messages across all
`visited nodes (e.g. in Charlotte). This
`step also enables migrated communica-
`tion channels at the destination and it
`ends step 3 as communication is perma-
`nently redirected.
`8. The new instance is resumed when
`sufficient state has been transferred
`and imported. With this step, process
`migration completes. Once all of the
`state has been transferred from the
`original instance, it may be deleted on
`the source node.
`
`2.6. System Requirements for Migration
`To support migration effectively, a system
`should provide the following types of func-
`tionality:
`
`r Exporting/importing the process
`
`state. The system must provide some
`type of export/import interfaces that
`allow the process migration mechanism
`to extract a process’s state from the
`source node and import this state on the
`destination node. These interfaces may
`be provided by the underlying operating
`system,
`the programming language,
`or other elements of the programming
`environment that the process has access
`to. State includes processor registers,
`process address space and communi-
`
`ACM Computing Surveys, Vol. 32, No. 3, September 2000.
`
`249
`
`cation state, such as open message
`channels in the case of message-based
`systems, or open files and signal masks
`in the case of UNIX-like systems.
`
`r Naming/accessing the process and
`
`its resources. After migration, the mi-
`grated process should be accessible by
`the same name and mechanisms as
`if migration never occurred. The same
`applies to process’s resources, such as
`threads, communication channels, files
`and devices. During migration, access to
`a process and/or some of its resources
`can be temporarily suspended. Varying
`degrees of transparency are achieved in
`naming and accessing resources during
`and after migration (see Section 3.3).
`
`r Cleaning up the process’s non-
`
`migratable state. Frequently, the mi-
`grated process has associated system
`state that is not migratable (examples
`include a local process identifier, pid, and
`the local time; a local pid is relevant only
`to the local OS, and every host may have
`a slightly different value for the local
`time—something that may or may not
`matter to a migrating process). Migra-
`tion must wait until the process finishes
`or aborts any pending system operation.
`If the operation can be arbitrarily long, it
`is typically aborted and restarted on the
`destination node. For example, migra-
`tion can wait for the completion of local
`file operations or local device requests
`that are guaranteed to return in a lim-
`ited time frame. Waiting for a message or
`accessing a remote device are examples
`of operations that need to be aborted and
`restarted on the remote node. Processes
`that cannot have their non-migrateable
`state cleaned cannot be considered for
`migration.
`
`2.7. Load Information Management
`The local processes and the resources of
`local and remote nodes have to be char-
`acterized, in order to select a process for
`migration and a destination node, as well
`as to justify migration. This task is com-
`monly known as load information manage-
`ment. Load information is collected and
`passed to a distributed scheduling policy
`
`Ex. 1017 - Page 9 of 59
`
`
`
`250
`
`Fig. 4. Load Information Management Mod-
`ule collects load information on the local node
`and disseminates it among the nodes. Distributed
`Scheduling instructs the migration mechanism
`when, where, and which process to migrate.
`
`(see Figure 4). Load information manage-
`ment is concerned with the following three
`questions:
`What is load information and how
`is it represented? The node load is typi-
`cally represented by one or more of the fol-
`lowing load indices: utilization of the CPU,
`the length of the queue of processes wait-
`ing to be executed, the stretch factor (ra-
`tio between turnaround- and execution-
`time-submission to completion v. start to
`completion) [Ferrari and Zhou 1986], the
`number of running processes, the num-
`ber of background processes, paging, com-
`munication [Milojiˇci´c, 1993c], disk uti-
`lization, and the interrupt rate [Hwang
`et al., 1982]. A process load is typically
`characterized by process lifetime, CPU us-
`age, memory consumption (virtual and
`physical), file usage [Hac, 1989a], commu-
`nication [Lo, 1989], and paging [Milojiˇci´c,
`1993c]. Kuntz uses a combination of work-
`load descriptions for distributed schedul-
`ing [Kunz, 1991]. The application type is
`considered in Cedar [Hagmann, 1986].
`When are load information col-
`lection and dissemination activated?
`These can be periodic or event-based. A
`