`
Efficient Java RMI for Parallel Programming

JASON MAASSEN, ROB VAN NIEUWPOORT, RONALD VELDEMA,
HENRI BAL, THILO KIELMANN, CERIEL JACOBS, and RUTGER HOFMAN
Vrije Universiteit, Amsterdam
`
`Java offers interesting opportunities for parallel computing. In particular, Java Remote Method
`Invocation (RMI) provides a flexible kind of remote procedure call (RPC) that supports polymor-
`phism. Sun’s RMI implementation achieves this kind of flexibility at the cost of a major runtime
`overhead. The goal of this article is to show that RMI can be implemented efficiently, while still
`supporting polymorphism and allowing interoperability with Java Virtual Machines (JVMs). We
`study a new approach for implementing RMI, using a compiler-based Java system called Manta.
`Manta uses a native (static) compiler instead of a just-in-time compiler. To implement RMI effi-
`ciently, Manta exploits compile-time type information for generating specialized serializers. Also,
`it uses an efficient RMI protocol and fast low-level communication protocols.
`A difficult problem with this approach is how to support polymorphism and interoperability.
`One of the consequences of polymorphism is that an RMI implementation must be able to download
`remote classes into an application during runtime. Manta solves this problem by using a dynamic
`bytecode compiler, which is capable of compiling and linking bytecode into a running application. To
`allow interoperability with JVMs, Manta also implements the Sun RMI protocol (i.e., the standard
`RMI protocol), in addition to its own protocol.
`We evaluate the performance of Manta using benchmarks and applications that run on a
32-node Myrinet cluster. The null-RMI latency (the time for an RMI without parameters
or a return value) of Manta is 35 times lower than that of the Sun JDK 1.2, and only
slightly higher than that of a C-based RPC protocol. This high performance is
accomplished by pushing almost all of the runtime overhead of
`RMI to compile time. We study the performance differences between the Manta and the Sun RMI
`protocols in detail. The poor performance of the Sun RMI protocol is in part due to an inefficient
`implementation of the protocol. To allow a fair comparison, we compiled the applications and the
`Sun RMI protocol with the native Manta compiler. The results show that Manta’s null-RMI latency
`is still eight times lower than for the compiled Sun RMI protocol and that Manta’s efficient RMI
`protocol results in 1.8 to 3.4 times higher speedups for four out of six applications.
Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming—distributed programming, parallel programming; D.3.2 [Programming Languages]: Language Classifications—concurrent, distributed, and parallel languages; object-oriented languages; D.3.4 [Programming Languages]: Processors—compilers; run-time environments
`General Terms: Languages, Performance
`Additional Key Words and Phrases: Communication, performance, remote method invocation
`
`Authors’ address: Division of Mathematics and Computer Science, Vrije Universiteit, De Boelelaan
`1081A, 1081 HV Amsterdam, The Netherlands.
`Permission to make digital/hard copy of all or part of this material without fee for personal or
`classroom use provided that the copies are not made or distributed for profit or commercial advan-
`tage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice
`is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on
`servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 2001 ACM 0164-0925/01/1100–0747 $5.00
`
`ACM Transactions on Programming Languages and Systems, Vol. 23, No. 6, November 2001, Pages 747–775.
`
`Ingenico v. IOENGINE
`IPR2019-00416 (US 8,539,047)
`Exhibit 2109
`
`
`
`748
`
`†
`
`J. Maassen et al.
`
`1. INTRODUCTION
`There is a growing interest in using Java for high-performance parallel ap-
`plications. Java’s clean and type-safe object-oriented programming model and
`its support for concurrency make it an attractive environment for writing re-
`liable, large-scale parallel programs. For shared memory machines, Java of-
`fers a familiar multithreading paradigm. For distributed memory machines,
`such as clusters of workstations, Java provides Remote Method Invocation
`(RMI), which is an object-oriented version of Remote Procedure Call (RPC). The
`RMI model offers many advantages for distributed programming, including a
`seamless integration with Java’s object model, heterogeneity, and flexibility
`[Waldo 1998].
`Unfortunately, many existing Java implementations have inferior perfor-
`mance of both sequential code and communication primitives, which is a serious
`disadvantage for high-performance computing. Much effort is being invested
`in improving sequential code performance by replacing the original bytecode
`interpretation scheme with just-in-time compilers, native compilers, and spe-
`cialized hardware [Burke et al. 1999; Krall and Grafl 1997; Muller et al. 1997;
`Proebsting et al. 1997]. The communication overhead of RMI implementations,
`however, remains a major weakness. RMI is designed for client/server pro-
`gramming in distributed (Web based) systems, where network latencies on
`the order of several milliseconds are typical. On more tightly coupled paral-
`lel machines, such latencies are unacceptable. On our Pentium Pro/Myrinet
cluster, for example, Sun's JDK 1.2 implementation of RMI obtains a null-RMI
latency (i.e., the roundtrip time of an RMI without parameters or a return
value) of 1,316 µs, compared to 31 µs for a user-level Remote Procedure Call
protocol in C.
`Part of this large overhead is caused by inefficiencies in the JDK implemen-
`tation of RMI, which is built on a hierarchy of stream classes that copy data
`and call virtual methods. Serialization of method arguments (i.e., converting
`them to arrays of bytes) is implemented by recursively inspecting object types
`until primitive types are reached, and then invoking the primitive serializers.
`All of this is performed at runtime for each remote invocation.
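The reflective scheme can be sketched in a few lines of Java. This is a simplified illustration of the idea, not the JDK's actual ObjectOutputStream implementation; all class and method names here are invented:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.lang.reflect.Field;

// Simplified sketch of reflection-based serialization: at runtime, walk
// the object's fields, recursing into subobjects until primitive types
// are reached. Repeating this per-field inspection on every remote
// invocation is a large part of the JDK's serialization cost.
class ReflectiveSerializer {
    // Hypothetical example class with only primitive fields.
    static class Point { int x = 3; int y = 4; }

    static byte[] serialize(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeObject(obj, out);
            out.flush();
            return bytes.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void writeObject(Object obj, DataOutputStream out) throws Exception {
        for (Field f : obj.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            Class<?> t = f.getType();
            if (t == int.class) {                       // primitive reached
                out.writeInt(f.getInt(obj));
            } else if (t == double.class) {
                out.writeDouble(f.getDouble(obj));
            } else if (!t.isPrimitive() && f.get(obj) != null) {
                writeObject(f.get(obj), out);           // recurse into subobject
            }
        }
    }
}
```

A compile-time approach instead generates an equivalent, type-specialized writer per class, so none of this inspection happens at call time.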
`Besides inefficiencies in the JDK implementation of RMI, a second reason for
`the slowness of RMI is the difference between the RPC and RMI models. Java’s
`RMI model is designed for flexibility and interoperability. Unlike RPC, it allows
`classes unknown at compile time to be exchanged between a client and a server
`and to be downloaded into a running program. In Java, an actual parameter ob-
`ject in an RMI can be of a subclass of the class of the method’s formal parameter.
In (polymorphic) object-oriented languages, the dynamic type of the parameter
object (the subclass) should be used by the method, not the static type of the
formal parameter. When the subclass is not yet known to the receiver, it has
`to be fetched from a file or HTTP server and be downloaded into the receiver.
`This high level of flexibility is the key distinction between RMI and RPC [Waldo
`1998]. RPC systems simply use the static type of the formal parameter (thereby
`type-converting the actual parameter), and thus lack support for polymorphism
`and break the object-oriented model.
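The role of the dynamic type can be illustrated with plain Java (local calls only; the class names are invented, and in real RMI the receiver would additionally download the subclass's bytecode if it is unknown):

```java
// Sketch of why RMI must honor the dynamic type of an argument. The
// method's formal parameter is declared as Shape, but the caller may pass
// a Circle; dynamic dispatch means Circle.area() must run, even if Circle
// was unknown to the server at compile time.
class Shape {
    double area() { return 0.0; }
}

class Circle extends Shape {
    double radius;
    Circle(double r) { radius = r; }
    @Override double area() { return Math.PI * radius * radius; }
}

class PolymorphismDemo {
    // Stand-in for a remote method whose formal parameter type is Shape.
    static double describe(Shape s) {
        return s.area();   // dynamic dispatch: the subclass method runs
    }
}
```

An RPC system that converted the argument to its static type Shape would call Shape.area() and return 0.0, silently discarding the subclass behavior.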
`
`Efficient Java RMI
`
`†
`
`749
`
`The key problem is to obtain the efficiency of RPC and the flexibility of
`Java’s RMI. This article discusses a compiler-based Java system, called Manta,1
`which was designed from scratch to efficiently implement RMI. Manta replaces
`Sun’s runtime protocol processing as much as possible by compile-time analysis.
`Manta uses a native compiler to generate efficient sequential code and special-
`ized serialization routines for serializable argument classes. Also, Manta sends
`type descriptors for argument classes only once per destination machine, in-
`stead of once for every RMI. In this way, almost all of the protocol overhead
`has been pushed to compile time, off the critical path. The problems with this
`approach are, however, how to interface with Java Virtual Machines (JVMs)
`and how to address dynamic class loading. Both are required to support inter-
`operability and polymorphism. To interoperate with JVMs, Manta supports the
`Sun RMI and serialization protocol, in addition to its own protocol. Dynamic
`class loading is supported by compiling methods and generating serializers
`at runtime.
`The general strategy of Manta is to make the frequent case fast. Since
`Manta is designed for parallel processing, we assume that the frequent case
`is communication between Manta processes, running, for example, on different
`nodes within a cluster. Manta supports the infrequent case (communication
`with JVMs) using a slower approach. Hence the Manta RMI system logically
`consists of two parts:
`
`— A fast communication protocol that is used only between Manta processes. We
`call this protocol Manta RMI, to emphasize that it delivers the standard RMI
`programming model to the user; but it can only be used for communication
`between Manta processes.
`— Additional software that makes the Manta RMI system as a whole compatible
`with standard RMI, so Manta processes can communicate with JVMs.
`
`We refer to the combination of these two parts as the Manta RMI system.
`We use the term Sun RMI to refer to the standard RMI protocol as defined
`in the RMI specification [Sun Microsystems 1997]. Note that both Manta RMI
`and Sun RMI provide the same programming model, but their wire formats
`are incompatible.
`The Manta RMI system thus combines high performance with the flexibil-
`ity and interoperability of RMI. In a grid computing application [Foster and
`Kesselman 1998], for example, some clusters can run our Manta software
`and communicate internally using the Manta RMI protocol. Other machines
`may run JVMs, containing, for example, a graphical user interface program.
`Manta communicates with such machines using the Sun RMI protocol, allow-
`ing method invocations between Manta and JVMs. Manta implements almost
`all other functionality required by the RMI specification, including heterogene-
`ity, multithreading, synchronized methods, and distributed garbage collection.
`Manta currently does not implement Java’s security model, as the system is
`primarily intended for parallel cluster computing.
`
1 A fast, flexible, black-and-white tropical fish that can be found in the Indonesian archipelago.
`
`The main contributions of this article are as follows.
`— We show that RMI can be implemented efficiently and can obtain a perfor-
mance close to that of RPC systems. The null-RMI latency of Manta RMI
over Myrinet is 37 µs, only 6 µs slower than a C-based RPC protocol.
`— We show that this high performance can be achieved while still supporting
`polymorphism and interoperability with JVMs by using dynamic bytecode
`compilation and multiple RMI protocols.
`— We give a detailed performance comparison between the Manta and Sun
`RMI protocols, using benchmarks as well as a collection of six parallel ap-
`plications. To allow a fair comparison, we compiled the applications and the
`Sun RMI protocol with the native Manta compiler. The results show that
`the Manta protocol results in 1.8 to 3.4 times higher speedups for four out of
`six applications.
`The remainder of the article is structured as follows. Design and imple-
`mentation of the Manta system are discussed in Section 2. In Section 3, we
`give a detailed analysis of the communication performance of our system. In
`Section 4, we discuss the performance of several parallel applications. In
`Section 5, we look at related work. Section 6 presents conclusions.
`
`2. DESIGN AND IMPLEMENTATION OF MANTA
`This section will discuss the design and implementation of the Manta RMI
`system, which includes the Manta RMI protocol and the software extensions
`that make Manta compatible with Sun RMI.
`
`2.1 Manta Structure
`Since Manta is designed for high-performance parallel computing, it uses a
native compiler rather than a JIT. The most important advantage of a native
compiler is that it can perform more time-consuming optimizations, and there-
fore (potentially) generate better code.
`The Manta system is illustrated in Figure 1. The box in the middle de-
`scribes the structure of a Manta process, which contains the executable code
`for the application and (de)serialization routines, both of which are generated
`by Manta’s native compiler. Manta processes can communicate with each other
`through the Manta RMI protocol, which has its own wire format. A Manta pro-
`cess can communicate with any JVM (the box on the right) through the Sun
`RMI protocol, using the standard RMI format (i.e., the format defined in Sun’s
`RMI specification).
`A Manta-to-Manta RMI is performed with the Manta protocol, which is
`described in detail in the next section. Manta-to-Manta communication is
`the common case for high-performance parallel programming, for which our
`system is optimized. Manta’s serialization and deserialization protocols sup-
`port heterogeneity (RMIs between machines with different byte-orderings or
`alignment properties).
`A Manta-to-JVM RMI is performed with a slower protocol that is compatible
`with the RMI specification and the standard RMI wire format. Manta uses
`
`Fig. 1. Manta/JVM interoperability.
`
`generic routines to (de)serialize the objects to or from the standard format.
`These routines use reflection, similar to Sun’s implementation. The routines are
`written in C, as is all of Manta’s runtime system, and execute more efficiently
`than Sun’s implementation, which is partly written in Java.
`To support polymorphism for RMIs between Manta and JVMs, a Manta ap-
`plication must be able to handle bytecode from other processes. When a Manta
`application requests bytecode from a remote process, Manta will invoke its
`bytecode compiler to generate the metaclasses, the (de)serialization routines,
`and the object code for the methods as if they were generated by the Manta
`source code compiler. Dynamic bytecode compilation is described in more detail
`in Section 2.4. The dynamically generated object code is linked into the applica-
`tion with the operating system’s dynamic linking interface. If a remote process
`requests bytecode from a Manta application, the JVM bytecode loader retrieves
`the bytecode for the requested class in the usual way through a shared filesys-
`tem or through an HTTP daemon. Sun’s javac compiler is used to generate the
`bytecode at compile time.
`The structure of the Manta system is more complicated than that of a JVM.
`Much of the complexity of implementing Manta efficiently is due to the need to
`interface a system based on a native-code compiler with a bytecode-based sys-
`tem. The fast communication path in our system, however, is straightforward:
`the Manta protocol just calls the compiler-generated serialization routines and
`uses a simple scheme to communicate with other Manta processes. This fast
`communication path is described below.
`
`2.2 Serialization and Communication
`RMI systems can be split into three major components: low-level communica-
`tion, the RMI protocol (stream management and method dispatch), and serial-
`ization. Below, we discuss how the Manta protocol implements each component.
`Low-level communication. RMI implementations are typically built on top
`of TCP/IP, which was not designed for parallel processing. Manta uses the Panda
`communication library [Bal et al. 1998], which has efficient implementations
`on a variety of networks. Panda uses a scatter/gather interface to minimize the
`number of memory copies, resulting in high throughput.
`
`Fig. 2. Structure of Sun and Manta RMI protocols; shaded layers run compiled code.
`
`On Myrinet, Panda uses the LFC communication system [Bhoedjang et al.
`2000], which provides reliable communication. LFC is a network interface
`protocol for Myrinet that is both efficient and provides the right functionality
`for parallel programming systems. LFC itself is implemented partly by embed-
`ded software that runs on the Myrinet Network Interface processor and partly
`by a library that runs on the host. To avoid the overhead of operating system
`calls, the Myrinet Network Interface is mapped into user space, so LFC and
`Panda run entirely in user space. The current LFC implementation does not
`offer protection, so the Myrinet network can be used by a single process only.
`On Fast Ethernet, Panda is implemented on top of UDP, using a 2-way slid-
`ing window protocol to obtain reliable communication. The Ethernet network
`interface is managed by the kernel (in a protected way), but the Panda RPC
`protocol runs in user space.
`The Panda RPC interface is based on an upcall model: conceptually, a new
`thread of control is created when a message arrives which will execute a han-
`dler for the message. The interface was designed to avoid thread switches in
`simple cases. Unlike active message handlers [von Eicken et al. 1992], upcall
`handlers in Panda are allowed to block to enter a critical section, but a handler
`is not allowed to wait for another message to arrive. This restriction allows the
`implementation to handle all messages using a single thread, so handlers that
`execute without blocking do not need any context switches.
`The RMI protocol. The runtime system for the Manta RMI protocol is
`written in C. It was designed to minimize serialization and dispatch over-
`head such as copying, buffer management, fragmentation, thread switching,
`and indirect method calls. Figure 2 gives an overview of the layers in the
`Manta RMI protocol and compares it with the layering of the Sun RMI system.
`
`The shaded layers denote statically compiled code, while the white layers are
`mainly JIT-compiled Java (although they contain some native calls). Manta
`avoids the stream layers of Sun RMI. Instead, RMI parameters are serialized
`directly into an LFC buffer. Moreover, in the JDK, these stream layers are
`written in Java, and therefore their overhead depends on the quality of the
`Java implementation. In Manta, all layers are either implemented as compiled
`C code or compiler-generated native code. Also, the native code generated by
`the Manta compiler calls RMI serializers directly, instead of using the slow
Java Native Interface. Heterogeneity between little-endian and big-endian
machines is handled by sending data in the native byte order of the sender, and
having the receiver do the conversion, if necessary.
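This "receiver makes right" convention can be sketched as follows (a Java illustration with an invented one-byte header format; Manta's actual runtime is written in C):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of receiver-side byte-order conversion: the sender writes in its
// native order and tags the message; the receiver swaps only when the
// orders differ, so homogeneous clusters pay nothing.
class ByteOrderDemo {
    // Sender: one header byte (1 = little-endian) followed by the payload.
    static byte[] send(int value, ByteOrder senderOrder) {
        ByteBuffer buf = ByteBuffer.allocate(5).order(senderOrder);
        buf.put((byte) (senderOrder == ByteOrder.LITTLE_ENDIAN ? 1 : 0));
        buf.putInt(value);
        return buf.array();
    }

    // Receiver: read in whatever order the header announces.
    static int receive(byte[] msg) {
        ByteBuffer buf = ByteBuffer.wrap(msg);
        ByteOrder order = buf.get() == 1 ? ByteOrder.LITTLE_ENDIAN
                                         : ByteOrder.BIG_ENDIAN;
        return buf.order(order).getInt();
    }
}
```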
`Another optimization in the Manta RMI protocol is avoiding thread switching
`overhead at the receiving node. In the general case, an invocation is serviced
`at the receiving node by a newly allocated thread, which runs concurrently
`with the application threads. With this approach, however, the allocation of
`the new thread and the context switch to this thread will be on the critical
`path of the RMI. To reduce the allocation overhead, the Manta runtime system
`maintains a pool of preallocated threads, so the thread can be taken from this
`pool instead of being allocated. In addition, Manta avoids the context-switching
`overhead for simple cases. The Manta compiler determines whether a remote
`method may block. If the compiler can guarantee that a given method will
`never block, the receiver executes the method without doing a context switch
`to a separate thread. In this case, the current application thread will service
`the request and then continue. The compiler currently makes a conservative
`estimation, and only guarantees the nonblocking property for methods that do
`not call other methods and do not create objects (since that might invoke the
`garbage collector, which may cause the method to block). This analysis has to
`be conservative, since a deadlock situation might occur if an application thread
`services a method that blocks.
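The receive-side dispatch decision can be sketched as follows (a Java illustration with invented names; Manta's actual runtime system is written in C and the may-block flag is computed by the compiler):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the dispatch decision at the receiving node: invocations the
// compiler has proven nonblocking run directly on the communication
// thread; all others go to a preallocated thread from a pool, keeping
// thread creation off the critical path.
class DispatchSketch {
    // Stands in for Manta's pool of preallocated (daemon) threads.
    static final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    static void handleIncomingCall(Runnable invocation, boolean mayBlock) {
        if (mayBlock) {
            pool.submit(invocation);   // context switch, but deadlock-safe
        } else {
            invocation.run();          // fast path: no thread switch
        }
    }
}
```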
`The Manta RMI protocol cooperates with the garbage collector to keep track
`of references across machine boundaries. Manta uses a local garbage collector
`based on a mark-and-sweep algorithm. Each machine runs this local collector,
`using a dedicated thread that is activated by the runtime system or the user.
`The distributed garbage collector is implemented on top of the local collectors,
`using a reference-counting mechanism for remote objects (distributed cycles
`remain undetected). If a Manta process communicates with a JVM, it uses the
`distributed garbage collection algorithm of the Sun RMI implementation, which
`is based on leasing.
`The serialization protocol. The serialization of method arguments is an
`important source of overhead in existing RMI implementations. Serialization
`takes a Java object and converts (serializes) it into an array of bytes, making
`a deep copy that includes the referenced subobjects. The Sun serialization pro-
`tocol is written in Java and uses reflection to determine the type of each object
`during runtime. The Sun RMI implementation uses the serialization protocol
`for converting data that are sent over the network. The process of serializing
`all arguments of a method is called marshalling.
`
`With the Manta protocol, all serialization code is generated by the compiler,
`avoiding most of the overhead of reflection. Serialization code for most classes
`is generated at compile time. Only serialization code for classes which are not
`locally available is generated at runtime, by the bytecode compiler. The over-
`head of this runtime code generation is incurred only once—the first time the
`new class is used as an argument to some method invocation. For subsequent
`uses, the efficient serializer code is then available for reuse.
`The Manta compiler also generates the marshalling code for methods. The
compiler generates method-specific marshall and unmarshall functions, which
(among other things) call the generated routines to serialize or deserialize all
arguments of the method. For every method in the method table, two pointers
`are maintained to dispatch to the right marshaller or unmarshaller, depend-
`ing on the dynamic type of the given object. A similar optimization is used for
`serialization: every object has two pointers in its method table to the serial-
`izer and deserializer for that object. When a particular object is to be serial-
`ized, the method pointer is extracted from the method table of the object’s dy-
`namic type and the serializer is invoked. On deserialization, the same procedure
`is applied.
`Manta’s serialization protocol performs optimizations for simple objects. An
`array whose elements are of a primitive type is serialized by doing a direct
`memory copy into the LFC buffer, so the array need not be traversed, as is done
`by the JDK. In order to detect duplicate objects, the marshalling code uses a
`table containing objects that have already been serialized. If the method does
`not contain any parameters that are objects, however, the table is not built up,
`which again makes simple methods faster.
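The duplicate-detection table can be sketched in Java using identity-based lookup (an illustration with invented names; in Manta the table lives in the generated C marshalling code):

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Sketch of duplicate detection during serialization: every object already
// written is recorded in an identity table, and later occurrences are
// replaced by a back-reference, so shared subobjects (and cycles) are
// transmitted only once.
class DuplicateTableSketch {
    // Returns a trace of write actions instead of real wire bytes.
    static List<String> serialize(Object[] args) {
        Map<Object, Integer> table = new IdentityHashMap<>();
        List<String> trace = new ArrayList<>();
        for (Object o : args) {
            Integer id = table.get(o);
            if (id != null) {
                trace.add("ref:" + id);        // duplicate: back-reference
            } else {
                table.put(o, table.size());
                trace.add("obj:" + o);         // first occurrence: full copy
            }
        }
        return trace;
    }
}
```

When a method has no object parameters, the table is never allocated, which is exactly the fast case described above.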
`Another optimization concerns the type descriptors for the parameters of an
`RMI call. When a serialized object is sent over the network, a descriptor of its
`type must also be sent. The Sun RMI protocol sends a complete type descriptor
`for every class used in the remote method, including the name and package of
`the class, a version number, and a description of the fields in this class. All this
`information is sent for every RMI call; information about a class is only reused
`within a single RMI call. With the Manta RMI protocol, each machine sends
`the type descriptor only once to any other machine. The first time a type is
`sent to a certain machine, a type descriptor is sent and the type is given a new
`type-id that is specific to the receiver. When more objects of this type are sent
`to the same destination machine, the type-id is reused. When the destination
`machine receives a type descriptor, it checks if it already knows this type. If not,
`it loads it from the local disk or an HTTP server. Next, it inserts the type-id and
`a pointer to the metaclass in a table, for future references. This scheme thus
`ensures that type information is sent only once to each remote node.
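The per-destination descriptor cache can be sketched as follows (a Java illustration with invented names and a string stand-in for the wire encoding):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of Manta's per-destination type-descriptor cache: the full
// descriptor travels only the first time a class is sent to a given
// machine; afterwards a small type-id suffices. One such cache exists
// per destination machine.
class TypeDescriptorCache {
    private final Map<Class<?>, Integer> sent = new HashMap<>();
    int descriptorsSent = 0;   // how many full descriptors went out

    // Returns what precedes the object on the wire for this destination.
    String encodeTypeInfo(Class<?> c) {
        Integer id = sent.get(c);
        if (id == null) {
            id = sent.size();
            sent.put(c, id);
            descriptorsSent++;
            return "descriptor(" + c.getName() + ")->id " + id;  // first time
        }
        return "id " + id;                                       // reused
    }
}
```

Under the Sun RMI protocol, by contrast, the counter above would grow with every call, since descriptors are reused only within a single RMI.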
`
`2.3 Generated Marshalling Code
`Figures 3, 4, and 5 illustrate the generated marshalling code. Consider the
`RemoteExample class in Figure 3. The square() method can be called from an-
`other machine, so the compiler generates marshalling and unmarshalling code
`for it.
`
`import java.rmi.*;
`import java.rmi.server.UnicastRemoteObject;
`
public class RemoteExample extends UnicastRemoteObject
        implements RemoteExampleInterface {
    int value;
    String name;

    synchronized int square(int i, String s1, String s2) throws RemoteException {
        value = i;
        name = s1 + s2;
        System.out.println("i = " + i);
        return i*i;
    }
}
`
`Fig. 3. A simple remote class.
`
marshall__square(class__RemoteExample *this, int i, class__String *s1, class__String *s2) {
    MarshallStruct *m = allocMarshallStruct();
    ObjectTable = createObjectTable();

    writeHeader(m->outBuffer, this, OPCODE_CALL, CREATE_THREAD);
    writeInt(m->outBuffer, i);
    writeObject(m->outBuffer, s1, ObjectTable);
    writeObject(m->outBuffer, s2, ObjectTable);

    // Request message is created, now write it to the network.
    flushMessage(m->outBuffer);

    fillMessage(m->inBuffer); // Receive reply.
    opcode = readInt(m->inBuffer);
    if (opcode == OPCODE_EXCEPTION) {
        class__Exception *exception = readObject(m->inBuffer, ObjectTable);
        freeMarshallStruct(m);
        THROW_EXCEPTION(exception);
    } else {
        result = readInt(m->inBuffer);
        freeMarshallStruct(m);
        RETURN(result);
    }
}
`
`Fig. 4. The generated marshaller (pseudocode) for the square method.
`
unmarshall__square(class__RemoteExample *this, MarshallStruct *m) {
    ObjectTable = createObjectTable();

    int i = readInt(m->inBuffer);
    class__String *s1 = readObject(m->inBuffer, ObjectTable);
    class__String *s2 = readObject(m->inBuffer, ObjectTable);

    result = CALL_JAVA_FUNCTION(square, this, i, s1, s2, &exception);
    if (exception) {
        writeInt(m->outBuffer, OPCODE_EXCEPTION);
        writeObject(m->outBuffer, exception, ObjectTable);
    } else {
        writeInt(m->outBuffer, OPCODE_RESULT_CALL);
        writeInt(m->outBuffer, result);
    }

    // Reply message is created, now write it to the network.
    flushMessage(m->outBuffer);
}
`
`Fig. 5. The generated unmarshaller (pseudocode) for the square method.
`
`The generated marshaller for the square() method is shown in Figure 4
`in pseudocode. Because square() has Strings as parameters (which are ob-
`jects in Java), a table is built to detect duplicates. A special create thread
`flag is set in the header data structure because square potentially blocks: it
`contains a method call that may block (e.g., in a wait()) and it creates objects,
`which may trigger garbage collection and thus may also block. The writeObject
`calls serialize the string objects to the buffer. flushMessage does the actual
`writing out to the network buffer. The function fillMessage initiates reading
`the reply.
`Pseudocode for the generated unmarshaller is shown in Figure 5. The
`header is already unpacked when this unmarshaller is called. Because the
`create thread flag in the header was set, this unmarshaller will run in a sep-
`arate thread obtained from a thread pool. The marshaller itself does not know
`about this. Note that the this parameter is already unpacked and is a valid
`reference for the machine on which the unmarshaller will run.
`
`2.4 Dynamic Bytecode Compilation
`To support polymorphism, a Manta program must be able to handle classes
`that are exported by a JVM, but that have not been statically compiled into
`the Manta program. To accomplish this, the Manta RMI system contains a
`bytecode compiler to translate classes to object code at runtime. We describe
`this bytecode compiler below. Manta uses the standard dynamic linker to link
`the object code into the running application.
`As with the JDK, the compiler reads the bytecode from a file or an HTTP
`server. Next, it generates a Manta metaclass with dummy function entries in
`its method table. Since the new class may reference or even subclass other
`unknown classes, the bytecode compiler is invoked recursively for all refer-
`enced unknown classes. Subsequently, the instruction stream for each byte-
`code method is compiled into a C function. For each method, the used stack
`space on the Virtual Machine stack is determined at compile time, and a lo-
`cal stack area is declared in the C function. Operations on local variables are
`compiled in a straightforward way. Virtual function calls and field references
`can be resolved from the running application, including the newly generated
`metaclasses. Jumps and exception blocks are implemented with labels, gotos,
`and nonlocal gotos (setjmp/longjmp). The resulting C file is compiled with the
`system C compiler, and linked into the running application with the system
`dynamic linker (called dlopen() in many Unix implementations). The dummy
`entries in the created metaclass method tables are resolved into function point-
`ers in the dynamically loaded library.
`One of the optimizations we implemented had a large impact on the speed
`of the generated code: keeping the method stack in registers. The trivial im-
`plementation of the method stack would be to maintain an array of N 32-bit
`words, where N is the size of the used stack area of the current method. Since
`bytecode verification requires that all stack offsets can be computed statically,
`it is, however, possible to replace the array with a series of N register variables,
`so the calls to increment or decrement the stack pointer are avoided and the
`
`Fig. 6. Example of Manta’s interoperability.
`
`C compiler can keep stack references in registers. A problem is that in the JVM,
`64-bit variables are spread over two contiguous stack locations. We solve this by
`maintaining two parallel stacks, one for 32-bit and one for 64-bit words. Almost
`all bytecode instructions are typed, so they need to operate only on the relevant
`stack. Some infrequently used instructions (the dup2 family) copy either two
`32-bit words or one 64-bit word, and therefore operate on both stacks. The
`memory waste of a duplicate stack is moderate, since the C compiler will re-
`move any unreferenced local variables. With this optimization, the application
`speed of compiled bytecode is generally within 30% of compiled Manta code.
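The two parallel typed stacks can be illustrated in Java (in the generated C code the slots are plain local variables that the C compiler keeps in registers; this sketch uses arrays, and all names are invented):

```java
// Sketch of the two parallel stacks used when compiling bytecode: typed
// instructions touch only the relevant stack, so a 64-bit value occupies
// one 64-bit slot instead of being spread over two 32-bit words as on the
// JVM stack. Only the rare dup2-family instructions touch both stacks.
class TwoStackSketch {
    final int[] words32 = new int[16];    // int/float-sized values
    final long[] words64 = new long[16];  // long/double-sized values
    int sp32 = 0, sp64 = 0;

    void pushInt(int v)   { words32[sp32++] = v; }
    int  popInt()         { return words32[--sp32]; }
    void pushLong(long v) { words64[sp64++] = v; }
    long popLong()        { return words64[--sp64]; }

    // Typed bytecodes like iadd and ladd each use only their own stack.
    void iadd() { pushInt(popInt() + popInt()); }
    void ladd() { pushLong(popLong() + popLong()); }
}
```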
`
`2.5 Example Application
`Manta’s RMI interoperability and dynamic class loading are useful to interop-
`erate with software that runs on a JVM and uses the Sun RMI protocol. For
`example, consider a parallel program that generates output that must be visu-
`alized. The parallel program is compiled with Manta and uses the Manta RMI
`protocol. The software for the visualization system to be used, however, may
`run on the Sun JDK and use the Sun RMI protocol. To illustrate this type of
`interoperability, we implemented a simple example, using a graphical version
`of one of our parallel applications (successive overrelaxation; see Section 4).
`The computation is performed by a parallel program that is compiled with
`Manta and runs on a cluster computer (see Figure 6). The output is visualized on
`the display of a workstation, using a graphical user interface (GUI) application
`written in Java. The parallel application repeatedly performs one iteration of
`the SOR algorithm and collects its data (a 2-dimensional array) at one node of
`the cluster, called the coordinator. The coordinator passes the array via the Sun
`RMI protocol to a remote viewer object, which is part of the GUI application
`on the workstation.