`Proceedings of t11W1c---
`ACM SIGPLAN 2002 Conference
`on
`Programming Language
`' Design and Implementation®
`(PLDl'02)
`
`Berlin, Germany
`June 17-19, 2002
`
`Sponsored by the
`Association for Computing Machinery
`Special Interest Group on Programming Languages
`(ACM SIGPLAN)
`
`GOOGLE EXHIBIT 1039
`GOOGLE v. NEONODE
`IPR2021-01041
`
`Page 1 of 13
`
`
`
`The Association for Computing Machinery
`1515 Broadway
`New York, New York 10036
`
`Copyright © 2002 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital
`or hard copies of portions of this work for personal or classroom use is granted without fee provided that
`copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
`the full citation on the first page. Copyright for components of this work owned by others than ACM must
`be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to
`redistribute to lists, requires prior specific permission and/or a fee. Request permission to republish from:
`Publications Dept., ACM, Inc. Fax+ 1 (212) 869-0481 or <permissions@acm.org>.
`
`For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted
`provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center,
`222 Rosewood Drive, Danvers, MA 01923.
`
`Notice to Past Authors of ACM-Published Articles
`
`ACM intends to create a complete electronic archive of all articles and/or other material previously
`published by ACM. If you have written a work that has been previously published by ACM in any journal
`or conference proceedings prior to 1978, or any SIG Newsletter at any time, and you do NOT want this
`work to appear in the ACM Digital Library, please inform permissions@acm.org, stating the title of the
`work, the author(s), and where and when published.
`
`ACM ISBN: 1-58113-463-0
`
`Additional copies may be ordered prepaid from:
`
`ACM Order Department
`PO Box 11405
`New York, NY 10286-1405
`
`Phone: 1-800-342-6626
`(US and Canada)
`+ 1-212-626-0500
`(all other countries)
`Fax: +1-212-944-1318
`E-mail: acmhelp@acm.org
`
`ACM Order Number 548020
`Printed in the USA
`
`ii
`
`GOOGLE EXHIBIT 1039
`GOOGLE v. NEONODE
`IPR2021-01041
`
`Page 2 of 13
`
`
`
`This material may be protected by Copyright law (Title 17 U.S. Code)
`
`Profile-Directed Optimization of Event-Based Programs
`
`Mohan Rajagopalan Saumya K. Debray
`Department of Computer Science
`University of Arizona
`Tucson, AZ 85721, USA
`{rnohan, debray}@cs.arizona.edu
`
`Matti A. Hiltunen Richard D. Schlichting
`AT&T Labs-Research
`180 Park Avenue
`Florham Park, NJ 07932, USA
`{hiltunen, rick}@research.att.com
`
`ABSTRACT
`Events are used as a fundamental abstraction in programs ranging
`from graphical user interfaces (GUis) to systems for building cus(cid:173)
`tomized network protocols. While providing a flexible structuring
`and execution paradigm, events have the potentially serious draw(cid:173)
`back of extra execution overhead due to the indirection between
`modules that raise events and those that handle them. This pa(cid:173)
`per describes an approach to addressing this issue using static opti(cid:173)
`mization techniques. This approach, which exploits the underlying
`predictability often exhibited by event-based programs, is based on
`first profiling the program to identify commonly occurring event
`sequences. A variety of techniques that use the resulting profile in(cid:173)
`formation are then applied to the program to reduce the overheads
`associated with such mechanisms as indirect function calls and ar(cid:173)
`gument marshaling. In addition to describing the overall approach,
`experimental results are given that demonstrate the effectiveness
`of the techniques. These results are from event-based programs
`written for X Wmdows, a system for building GUis, and Cactus,
`a system for constructing highly configurable distributed services
`and network protocols.
`
`Categories and Subject Descriptors
`D.3.4 [Programming Languages]: Processors-compilers; opti(cid:173)
`mization
`
`General Terms
`Profiling, Events, Handlers, Performance
`
`INTRODUCTION
`1.
`Events are increasingly being used as a fundamental abstraction
`for writing programs in a variety of contexts. They are used to
`
`Permission to make digital or hard copies of all or part of this work for .
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`PLD/'02 June 17-19, 2002, Berlin, Germany.
`Copyright 2002 ACM l-58113-463-0/02/0006 ... $5.00.
`
`structure user interaction code in GUI systems [8, 18], form the
`basis for configurability in systems to build customized distributed
`services and network protocols [4, 9, 16], are the paradigm used for
`asynchronous notification in distributed object systems [19], and
`are advocated as an alternative to threads in web servers and other
`types of ~ystem code [20, 23]. Even operating system kernels can
`be viewed as event-based systems, with the occurrence of interrupts
`and system calls being events that drive execution.
`The rationale behind using events is multifaceted. Events are
`asynchronous, which is a natural match for the reactive execution
`behavior of GUis and operating systems. Events also allow the
`modules raising events to be decoupled from those fielding the
`In short, event-based
`events, thereby improving configurability.
`programming is generally more flexible and can often be used to
`realize richer execution semantics than traditional procedural or
`thread-oriented styles.
`Despite these advantages, events have the potentially serious dis(cid:173)
`advantage of extra execution overhead due to the indirection be(cid:173)
`tween modules that raise and handle events [5, 14]. Typically, there
`is a registry that maps an event to a collection of handlers to be exe(cid:173)
`cuted when the event occurs. Because these handlers are not known
`statically-and may in fact change dynamically-they are invoked
`indirectly. Depending on the system, the number and type of the
`arguments passed to the handler may also not be known, requiring
`argument marshaling. Finally, there may be repeated work, e.g.,
`initialization or checking of shared data structures, across multiple
`handlers for a given event. All these extra costs can be surprisingly
`high-our experiments indicate that they can account for up to 20%
`of the total execution time in some scenarios.
`This paper describes a collection of static optimizations designed
`to reduce the overhead of event-based programs. Our approach ex(cid:173)
`ploits the underlying predictability of many event-based programs
`to generate an event profile that is conceptually akin to a path profile
`through the call graph of the program. These profiles are then used
`to identify commonly encountered events, as well as the collec(cid:173)
`tion of handlers associated with each event and the order in which
`they are executed. This information is then used to optimize event
`execution by, for example, merging handlers and chaining events.
`The techniques are specific to event-based programs, ~ince stan(cid:173)
`dard optiprization techniques are largely ineffective in this con(cid:173)
`text. For/example, conventional static analysis techniques cannot
`generally discover the connections between events and handlers
`let alone optimize away the associated overheads. Dynamic opti~
`
`106
`
`GOOGLE EXHIBIT 1039
`GOOGLE v. NEONODE
`IPR2021-01041
`
`Page 3 of 13
`
`
`
`mization systems such as Dynamo [2] can be used in principle, but
`they focus primarily on lightweight optimizations such as improv(cid:173)
`ing locality and instruction-cache usage in an effort to keep runtime
`overheads low. In contrast, the optimizations we consider are sub(cid:173)
`stantially more heavyweight, and-in the context of event-based
`programs-offer correspondingly greater benefits. Our techniques
`are specifically designed to improve execution on small mobile de(cid:173)
`vices, where resource constraints make any reduction in 'overhead
`valuable.
`/
`The remainder of the paper is organized as follows. Section 2
`describes a general model for event-based programs. This is fol(cid:173)
`lowed in section 3 by a description of our approach to optimizing
`such programs, including our profiling scheme and the collection
`of optimization techniques based on these profiles. Section 4 gives
`experimental results that demonstrate the potential improvements
`for three different examples. The first two, a video application and
`a configurable secure communication service, are built using Cac(cid:173)
`tus, a system for cons~cting highly configurable distributed ser(cid:173)
`vices and network protocols, that supports event-based execution
`[ 10, 12). The third is a cl ient side tool that uses X Windows, a
`popular system for building GUis [18). This is followed by discus(cid:173)
`sions of possible extensions in section 5 and related work in section
`6. Finally, section 7 offers conclusions.
`
`2. EVENT-BASED PROGRAMS
`While event-based programs differ considerably depending on
`the specifics of the underlying programming model and notation,
`their architectures have a number of broad underlying similarities.
`Because of this, the optimizations described in this paper are gen(cid:173)
`erally applicable to most such systems. This section presents a
`general model for event-based system~ in order to provide a com(cid:173)
`mon framework for discussion. As examples, we describe how both
`Cactus and the X Windows system map into the model.
`2.1 Components
`Our general model consists of three main components: events,
`handlers that specify the reaction to an event, and bindings that
`specify which handlers are to be executed when a specific event
`occurs.
`
`Events. Events abstract the asynchronous occurrence of stimuli
`that must be dealt with by a program. Mouse motion, button click,
`and key press are examples of such events in a user interface con(cid:173)
`text, while receiving a packet from the network and message pass(cid:173)
`ing are examples in a systems context. In addition to such external
`events, an event-based program may use internal events that are
`generated and processed within the program. The set of events used
`in the event system may be fixed or the system may allow programs
`to define new events. Basic events may be composed into complex
`events. For example, two basic button click events within a short
`time period can be defined to sonstitute a double-click event.
`Handlers. Handlers direct the response of the program to event(cid:173)
`based stimuli. Specifically, a handler is a section of code that
`specifies the actions to be performed when a given event occurs.
`Typically, handlers have at least one parameter, the event that was
`raised; other parameters may be passed through variable argument
`lists or through shared data structures. The decoupling provided
`by the event mechanism allows handlers to be developed indepen(cid:173)
`dently from other handlers in the program.
`
`Bindings. Bindings determine which handlers are executed when a
`specific event occurs. The binding between an event and a hander is
`often provided using some type of runtime bind operation, although
`the binding may also be predefined and fixed. Most systems allow
`multiple handlers to be bound to a single event and a handler to be
`
`bound to more than one event. An event is ignored if no handlers
`are bound to the event. The execution order of multiple handlers
`bound to the same event may be important. Bindings may be static,
`i.e., remain the same throughout the execution of the program, or
`dynamic, i.e., may change at runtime. Figure 1 illustrates bindings.
`
`Even! A
`
`Evenl B
`
`Event C
`
`Even! D
`
`Figure 1: Event bindings
`
`Bindings are maintained in a registry that maps each event to
`a list of handlers. The registry may be implemented as a shared
`data structure like the table shown in the figure, or each list may
`be maintained as a part of an event data structure. For distributed
`systems where handlers may be on distinct physical machines, the
`registry may be implemented using either a centralized or decen(cid:173)
`tralized approach.
`2.2 Execution
`The handlers bound to an event are executed when the event oc(cid:173)
`curs. An event may occur because the program receives some exter(cid:173)
`nal stimulus (external event) or because some program component
`raises the event (internal event). An execution environment or run(cid:173)
`time system is typically responsible for detecting or receiving exter(cid:173)
`nal stimuli and activating the corresponding events. As a result, we
`say these events are raised implicitly, whereas events directly acti(cid:173)
`vated by a program component are raised explicitly. Timed events
`are events that are activated at a specified time or after a specified
`delay.
`We identify two major types of event activation: synchronous
`activation and asynchronous activation. With synchronous activa(cid:173)
`tion, the specified handlers are executed to completion before the
`activator continues execution. With asynchronous activation, the
`activator continues execution without any guarantees as to when
`the handlers are executed. The different types of event activation
`have specific uses in event-based systems. Synchronous activation
`can be used for internal events when the event activator needs to
`know when the processing of a message has completed before con(cid:173)
`tinuing its own processing. Synchronous activation can be used
`for external events when the runtime system needs to ensure that
`such events are executed sequentially without interleaving. Asyn(cid:173)
`chronous activation can be used when none of these requirements
`apply.
`The overall picture of the event-based program to be optimized
`then consists of a program that reacts to stimuli from its environ(cid:173)
`ment, such as user actions or messages. These stimuli are con(cid:173)
`verted into events. Each event may have multiple handlers bound
`to it and handlers may activate other events synchronously or asyn(cid:173)
`chronously. Thus, the occurrence of an event may lead to the ac(cid:173)
`tivation of a chain of handlers and other events and, in turn, their
`handlers. Events can also be generated by the passage of time (e.g.,
`timeouts). The type of event activation has implications on our
`optimization techniques. For example, since the handlers for a syn(cid:173)
`chronous activation are executed when the event is raised, an opti(cid:173)
`mization that replaces the activation call with calls to the handlers
`
`107
`
`GOOGLE EXHIBIT 1039
`GOOGLE v. NEONODE
`IPR2021-01041
`
`Page 4 of 13
`
`
`
`Top AP!
`
`Micro-protocols
`
`Events
`
`DESPrivacy
`
`KeyeclMD5Integrity
`
`RSAAuthenticity
`
`ClientKeyDistribution
`
`Bottom AP!
`
`Figure 2: Cactus composite protocol
`
`bound to the event at that time results in a correct transformation.
`Similarly, it is easy to see that sequences of nested synchronous ac(cid:173)
`tivations can be readily optimized. The specific optimization tech(cid:173)
`niques and their limitations are discussed below in section 3.
`2.3 Example Systems
`Cactus. Cactus is a system and a framework for constructing
`configurable protocols and services, where each service property
`or functional component is implemented as a separate module [ 10].
`As illustrated in figure 2, a service in Cactus is implemented as a
`composite protocol, with each service property or other functional
`component implemented as a micro-protocol. A customized in(cid:173)
`stance of the composite protocol is constructed simply by choosing
`the appropriate set of micro-protocols. A micro-protocol is struc(cid:173)
`tured as a collection of event handlers that correspond to the han(cid:173)
`dlers in our general event-based model. A typical micro-protocol
`consists of two or more event handlers. Events in Cactus are user(cid:173)
`defined. A typical composite protocol uses 10-20 different events
`consisting of a few external events caused by interactions with soft(cid:173)
`ware outside the composite protocol and numerous internal events
`used to structure the internal processing of a message or service
`request. Each event typically has multiple event handlers. As a re(cid:173)
`sult, Cactus composite protocols often have long chains of events
`and event handlers activated by one event. Section 4 gives concrete
`examples of events used in a Cactus composite protocol.
`The Cactus runtime system provides a variety of operations for
`managing events and event handlers. In particular, operations are
`provided for binding an event handler to a specified event (bind)
`and for activating an event (raise). Event handler binding is com(cid:173)
`pletely dynamic. Events can be raised either synchronously or
`asynchronously, and an event can also be raised with a specified
`delay to implement time-driven execution. The order of event han(cid:173)
`dler execution can be specified if desired. Arguments can be passed
`to handlers in both the bind and raise operations. Other operations
`are available for unbinding handlers, creating and deleting events,
`halting event execution, and canceling a delayed event. Handler
`execution is atomic with respect to concurrency, i.e., a handler i~
`executed to completion before any other handler is started unless it
`voluntarily yields the CPU. Cactus does not directly support com(cid:173)
`plex events, but such events can be implemented by defining a new
`event and having a micro-protocol raise this event when the condi(cid:173)
`tions for the complex event are satisfied.
`The X Window system. X is a popular GUI framework for Unix
`systems. The standard architecture of an X based system is shown
`
`108
`
`in figure 3. The X server is a program that runs on each system
`supporting a graphics display and is responsible for managing de(cid:173)
`vice drivers. Application programs, also called X clients, may be
`local or remote to the display system. X servers and X clients use
`the X-protocol for communication. X clients are typically built on
`the Xlib libraries using toolkits such as Xt, GTK, or Qt. X clients
`are implemented as a collection of widgets, which are the basic
`building blocks of X applications.
`An X event is defined as "a packet of data sent by the server to
`the client in response to user behavior or to window system changes
`resulting from interactions between windows" [18]. Examples of
`X events include mouse motion, focus change, and button press.
`These events are recognized through device drivers and relayed to
`the X server, which in turn conveys them to X clients. The Xlib
`framework specifies 33 basic events. X clients may choose to re(cid:173)
`spond to any of these based on event masks that are specified at
`bind time. Events are also used for communication between wid(cid:173)
`gets. Events can arrive in any order and are queued by the X client.
`Event activation in X is similar t9 synchronous activation in the
`general model.
`The X architecture has three mechanisms for handling events:
`event handlers, callback functions, and action procedures. All these
`map to handlers in the general model and are used to specify differ(cid:173)
`ent granularities of control. Event handlers, the most primitive, are
`simply procedures bound to event names. Callback functions and
`action procedures are more commonly used high-level abstractions.
`One difference between the three mechanisms relates to scope(cid:173)
`actions have global scope in an X client, while the scope of event
`handlers and callbacks is restricted to the widget in which they are
`defined. Another difference is their execution semantics. An event
`handler can be bound to multiple events in such a way that it is ex(cid:173)
`ecuted when any of the associated events occur. A callback func(cid:173)
`tion, on the other hand, is bound to a specific callback name, and
`all functions bound to a name are executed when the correspond(cid:173)
`ing callback is issued. Actions provide an additional level of in(cid:173)
`direction, where a mapping is created first between an event and
`the action name, and then between the action name and the action
`procedure.
`In addition to these three, X has a number of other mechanisms
`that can be broadly classified as event handling, namely timeouts,
`signal handlers, and input handlers. Each of these mechanisms
`allows the program to specify a procedure to be called when a given
`condition occurs. For all these handler types, X provides operations
`for registering the handlers and activating them.
`
`3. OPTIMIZATION APPROACH
`Compiler optimizations are based on being able to statically pre(cid:173)
`dict aspects of a program's runtime behavior using either invariants
`that always hold at runtime (i.e., based on dataflow analysis) or as(cid:173)
`sertions that are likely to hold (i.e., based on execution profiles).
`Event-based systems, in contrast, are lai:iely unpredictable in their
`runtime behavior due to the uncertainties associated with the be-
`
`Devices
`Device
`rivers
`X-Server
`
`X - Client Application
`
`XI
`
`Toolkit
`
`Qt
`
`XLib
`
`X-Protocol
`
`Figure 3: Architecture of X Window systems
`
`GOOGLE EXHIBIT 1039
`GOOGLE v. NEONODE
`IPR2021-01041
`
`Page 5 of 13
`
`
`
`EventGraph = 0;
`prev_event = eventTrace-tfirstEvent;
`while not (end of eventTrace) {
`event = eventTrace➔ nextEvent;
`if (prev_event,event) not in EventGraph {
`EventGraph += (prev_event,event);
`EventGraph(prev_event,event) ➔ weight = l ;
`} else
`eventGraph(prev_event,event) ➔ weight++ ;
`prev _event = event;
`
`Figure 4: GraphBuilder algorithm.
`
`havior of their external environment, e.g., the user's actions. We
`have found, however, that in practice, there is a significant amount
`of predictabifay in their internal behavior that can be exploited for
`optimization purpose!» This predictability occurs at two levels. At
`the event level, certain s~quences of events occur in all (or most)
`system executions. At the handler level, there is often more than
`one handler bound to a specific event, and all these handlers are ex(cid:173)
`ecuted in sequence each time the event occurs. Handlers are gener(cid:173)
`ally developed to work as independently as possible, and the over(cid:173)
`all execution flow is determined by bindings performed at runtime.
`The use of runtime binding also means that the program's behavior
`can be changed dynamically by changing the event/handler config(cid:173)
`uration from within the program.
`Event and handler profiling are used to identify predictable as(cid:173)
`pects of an event-based program's behavior. This section describes
`these techniques and the optimizations performed based on the re(cid:173)
`sults.
`3.1 Event Profiling
`We identify static optimization opportunities in an event-based
`program using event and handler execution profiles. Profiling is
`used instead of static approaches such as code and registry data
`structure analysis since, as noted above, binding information is gen(cid:173)
`erally available only at runtime and may in fact change during ex(cid:173)
`ecution. We first identify commonly occurring event sequences by
`instrumenting the event system to log an entry each time an event
`occurs, indicating the event being raised and whether it is being
`raised synchronously or asynchronously. We then use the resulting
`event profiles to identify frequently invoked event hand_lers, add in(cid:173)
`strumentation code to each such handler, and log entries each time
`the handler is invoked, thereby obtaining handler profiles. Profil(cid:173)
`ing is done for one program-and for configurable programs, one
`program configuration-at a time. At present, the event framework
`is instrumented by hand, but this can easily be automated using
`well-understood techniques [3]. The analysis and optimizations are
`currently performed manually off-line after the program to be op(cid:173)
`timized has been executed enough times to develop an adequate
`profile. On-line analysis and optimization, as well as automation,
`are potential extensions to this work and are discussed in section 5.
`The profiling algorithm takes the event trace generated by the
`instrumented event framework and constructs an event graph that
`summarizes the event sequences in the trace. There is an edge from
`node A to node B in the graph if event A is ever followed immedi(cid:173)
`ately by event Bin the trace. Each edge (A , B) has an associated
`weight indicating how many times the sequence (A, B) appeared
`in the trace. The algorithm used to generate the event graph is pre(cid:173)
`sented in figure 4. Note that in the event trace, if an event A is fol(cid:173)
`lowed immediately by an event B that was raised synchronously,
`then we can infer that execution of B follows A sequentially. How(cid:173)
`ever, if B was raised asynchronously, then the fact that it follows
`
`r?,::=:========12~ -s;~ple - - ; -4-2-
`
`~ " J :.:-- - - - r c------:-:,-------::::--4-7-:--.9
`
`_
`
`_
`
`, MsgFrrnUserH ,
`--
`- i: -- 1
`392
`
`160
`
`42
`
`391
`
`'---~=::::::::::'.,:-~r;:=~- -------<~; ~e]'~im~~;
`
`I
`
`Key:
`
`Synchronously Activated Events
`
`Asynchronously Activated Events
`
`Figure 5: Event graph generated from video player
`
`A in the event trace may not indicate causality, i.e., we cannot con(cid:173)
`clude that A had any role in raising B. For example, B may be the
`result of a timeout from an earlier event completely unrelated to A.
`The event graph is used as the starting point for the analysis that
`identifies predictable event and handler sequences. The first step
`is to use the edge weights to identify commonly occurring event
`sequences; while this mapping is not exact and more sophisticated
`techniques could be applied, we have found this approach to be
`sufficient in practice. Given an event graph G and a threshold 'f/,
`an event path of weight 'f/ is defined to be a path in G in which no
`edge has weight less than T/· To simplify the algorithm, we first
`discard from the event graph edges whose weights are below the
`threshold rJ; this produces a reduced event graph from which event
`paths are extracted. Each event path indicates a frequent sequence
`of events and hence represents a candidate for optimization. The
`remainder of this discussion focuses on event paths unless other(cid:173)
`wise mentioned. Note that the event paths so constructed are not
`quite the same as hot path profiles. Path profiling at the level of
`events is not used since path profiles tend to be large and expen(cid:173)
`sive to compute [13, 25], and since experimental results using the
`approach described above suggest that it is adequate for the opti(cid:173)
`mizations implemented.
`The next step is to perform handler level profiling to identify pre(cid:173)
`dictable sequences of handler activations. This profiling is needed
`for two reasons. First, an event may have multiple handlers that are
`executed in sequence each time the event occurs. Second, because
`of the decoupling between events and their handlers, knowing the
`events that occur does not by itself tell us about the handlers that
`are activated. The event paths in the event graph are used to identify
`the most promising events for handler level profiling. The handlers
`for the nodes in each event path are instrumented and an entry is
`logged each time the handler is invoked. Based on this trace, an(cid:173)
`other graph called the handler graph is constructed to use as the
`basis for optimization. The profiling and graph construction for
`handlers is carried out in the same way as before.
`
`109
`
`GOOGLE EXHIBIT 1039
`GOOGLE v. NEONODE
`IPR2021-01041
`
`Page 6 of 13
`
`
`
`Events
`r Event A ks- = --- ---
`
`Handlers
`
`------~~
`
`-fl',ts-gFr~~U~rH 1
`'--- ___ ,
`
`2-
`
`39
`MsgFromUserL 1+--
`-
`
`Threshold • 300
`
`I
`
`,
`Handler!
`I
`
`HI_ code
`
`Handler2
`I
`
`H2_code
`
`I
`
`~
`
`Handler3
`I
`
`H3_code
`
`I
`
`391
`
`,~c~ntrolle!ClkH ' 392
`"--------"
`
`ControllerClkL
`
`Figure 6: Reduced event graph
`
`Event A
`
`Events
`
`Handlerl 23
`I
`
`Hl_code
`H2_code
`I H3_code
`
`Figure 7: Handfer merging
`
`Figure 5 shows the event graph for a video player application im(cid:173)
`plemented on top of a configurable transport protocol called CTP
`built using Cactus [24]; the bold edges are discussed below in sec(cid:173)
`tion 3.2.1, while details of the application are given in section 4.2.
`Figure 6 shows the corresponding reduced event graph for r, =
`300.
`3.2 Optimization Techniques
`Once the most frequent event and handler sequences have been
`identified, the optimizations are performed based on the handler
`graph. The goal is to eliminate:
`
`• Marshalling overheads for event activations.
`
`• Indirect function call and variable argument passing costs.
`
`• State maintenance (synchronization and locking) costs for
`global variables.
`
`• Redundant initializations and code fragments for events with
`multiple handlers.
`
`For synchronous events, we also expect to observe event se(cid:173)
`quences that can be chained together. In addition, elimination of
`indirect function calls increases the potential for value-based op(cid:173)
`timizations such as constant propagation. Another option that has
`been explored is inlining code for raising popular events. All these
`optimizations can be classified broadly as either graph or compiler
`optimizations. This section describes each type in turn.
`
`3.2.1 Graph Optimizations
`Graph optimizations try to reduce the costs associated with in(cid:173)
`teractions between events and handlers in the program by reducing
`the number of handler activations along common event paths . This
`is done by reducing the number of nodes in an event graph and
`generating simpler collapsed graphs, as well as by merging handler
`nodes to create super-handlers for events and chains. Of course,
`correctness of these transformations is an important requirement;
`this issue is discussed in the next section.
`
`Handler Merging. In the case of events with multiple handlers, the
`handler graph shows a sequence of contiguous nodes. The event
`system is responsible for issuing calls to all handlers bound to the
`event. References to handlers are stored as function pointers in a
`list associated with the event, and each raise operation for the event
`
`translates into a sequence of indirect function calls. There are two
`sources of overhead here: the cost of an indirect call, and-since
`in general the identities and the number of arguments taken by the
`handlers for an event are not statically known-a cost associated
`with argument marshaling and unmarshaling. However, we can use
`the handler graph obtained from our handler profiling to identify
`the sequence of handlers activated when an event is raised. Given
`this information, a simple approach for dealing with this overhead
`is to merge all the handlers associated with an event into a single
`large handler. In the handler graph, this corresponds to collapsing
`all handler nodes for a given event into a single super-handler node.
`The immediate savings from this transformation is the reduction in
`the number of indirect function calls. Figure 7 shows the effect of
`this optimization. Further savings then result from the application
`of standard compiler optimizations, such as common subexpression
`elimination and dead-code elimination on the super-handler code.
`An important point to note in.tpis context is that some event sys(cid:173)
`tems such as Cactus allow event bindings to change dynamically.
`We need to account for such changes and ensure that correctness is
`preserved even if they do occur. This is done by checking whether
`any changes have been made to the list of handlers bound to an
`event when it is raised, and then dropping back into the original
`unoptimized code if a change is detected. See section 3.3 for more
`details.
`Raise operations are typically generic in the sense that they can
`raise an arbitrary event. Because of this, they incur overheads
`due to argument marshaling, indirect invocation of handlers, and
`state maintenance. The number of activations-and hence their to(cid:173)
`tal cost-can be reduced using super-han_,_dlers, as discussed above.
`However, super-handlers are still invoked'4ndirectly and so incur
`some residual cost. These costs can be reduced by replacing the
`raise operation with a direct call to the super-handler based on pro(cid:173)
`file information. This in t