`A LANGUAGE FOR PROTOCOL COMPILATION
`by
`EDDIE KOHLER
`
`Submitted to the
`DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
`in partial fulfillment of the requirements for the degree of
`Master of Science
`
`at the
`MASSACHUSETTS INSTITUTE OF TECHNOLOGY
`September, 1997 Fr~l
`Rg
`@ 1997 Massachusetts Institute of Technology
`All rights reserved
`
`Signature of Author ...
`.....
`Department of Electrical Engineering and Computer Science
`7
`September, 1997
`7-?
`
`7/
`
`Certified by .....
`
`M. Frans Kaashoek
`Associate Professor of Electrical Engineering and Computer Science
`S
`Thesis Supervisor
`)
`
`Accepted by .......
`
`Arthur C. Smith
`Chair, Department Committee on Graduate Students
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 001
`
`
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 002
`
`
`
`PROLAC: A LANGUAGE FOR PROTOCOL COMPILATION
`by
`EDDIE KOHLER
`
`Submitted to the Department of Electrical Engineering and
`Computer Science on August 29, 1997 in partial fulfillment of the
`requirements for the degree of Master of Science
`
`ABSTRACT
`
`Prolac is a new statically-typed object-oriented programming language designed for
`implementing network protocols. Prolac is designed to make protocol specifications
`readable to human beings, and thus more likely to be correct; easily extensible to
`accommodate protocol enhancements; and efficient when compiled.
`
`We present an overview of the Prolac language and a discussion of issues and prin-
`ciples in its design, as well as a preliminary language reference manual. The prolacc
`optimizing protocol compiler is also described. A prototype TCP specification is pre-
`sented that is both readable and extensible; experience with the specification suggests
`that, even untuned, Prolac overhead is negligible on normal networks.
`
`Thesis Supervisor: M. Frans Kaashoek
`Title: Associate Professor of Computer Science
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 003
`
`
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 004
`
`
`
`CONTENTS
`
`1 Introduction 7
`2 Language overview 13
`3 Language design 19
`4 The prolacc compiler 33
`5 Results 47
`6 Summary 57
`A Prolac language reference manual 59
`Bibliography 81
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 005
`
`
`
`ACKNOWLEDGEMENTS
`
`This thesis presents joint work with Frans Kaashoek. In particular, the TCP imple-
`mentation presented in Chapter 5 was designed and implemented by Frans Kaashoek
`with design advice from Eddie Kohler. This research was partially supported by an
`NSF Graduate Research Fellowship.
`
`for Frans and for Rebecca;
`because of my family
`and the state of Massachusetts;
`near David Mazieres;
`beside Maria and Nummi
`and the ever-unique Al Pua;
`and despite Count Dorkula
`and her lovely sidekick,
`Squirrelgirl
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 006
`
`
`
`1
`INTRODUCTION
`
`Designing and implementing network protocols with conventional languages and tools
`is difficult. The protocols themselves are hard to design; implementing a protocol
`correctly is another challenge [WS95]. Furthermore, protocol efficiency has become
`vital with the growing importance of networking, and the occasional need for protocol
`extensions [Ste97, BOP94] only complicates the issue. Unfortunately, these tensions
`work against one another. Many optimizations which make protocol code more efficient
`also tend to make it much harder to understand [MPBO96], and therefore harder to
`get right. Extensions affect deeply buried snippets of protocol code rarely identifiable
`a priori. Finally, the clearest organization of protocol code is often among the slowest.
`Specialized language tools are a natural area to investigate for a solution to this
`software engineering problem. Most previous work, however, has focused on only
`one of the three issues: correctness. Research has been particularly active in formal
`specification languages amenable to machine verification [BB89, DB89]; while it is
`possible to use one of these specification languages to generate an implementation
`semi-automatically, very high performance is often precluded by the languages them-
`selves. Some work has focused on high-performance implementation [AP93, CDO96],
`but these languages may not be suitable for existing protocols, either by design or due
`to limitations in the underlying language model.
`This thesis presents a language, Prolac, and its compiler, prolacc, which address
`all three issues in protocol implementation: correctness, efficiency, and extensions.
`Our protocol compiler project, of which this thesis is the first concrete result, has
`three specific goals: to implement protocol-specific optimizations, thus creating high-
`performance protocol implementations; to facilitate protocol extensions; and to make
`protocol implementations more tractable to human readers, and thus easier for people
`to reason about.
`Prolac is a statically-typed, object-oriented language. Unlike many such languages,
`it focuses on small functions rather than large ones: its syntax encourages the program-
`mer to divide computation into small pieces, and Prolac features, such as namespaces,
`help a programmer name such small pieces appropriately. A protocol specification
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 007
`
`
`
`divided into small rules is both easier to read and easier to change; a small extension
`is more likely to affect just a few small functions in Prolac, which can be overridden
`by a subtype without complicating the base protocol code. Several novel features-
`specifically, module operators and implicit rules-help make protocol specifications
`easier to understand.
`The prolacc compiler compiles a Prolac specification to C code. It applies sev-
`eral protocol-specific optimizations, such as extensive inlining and outlining; these
`optimizations are specified in the Prolac language as annotations to modules or to
`individual expressions. Both the language and the compiler have been designed to
`produce efficient generated code-our goal was to equal or exceed the performance
`of protocol implementations hand-written in C. In particular, prolacc can analyze away
`almost every dynamic dispatch and structure assignment.
`We also describe a prototype implementation of the TCP protocol [Pos81] in the
`Prolac language, written largely as a proof of concept. We focused on TCP because it
`is a large, complex, important, and well-documented protocol. TCP is widely recog-
`nized as being difficult to implement well; in fact, books have been written about its
`implementation [WS95].
`Our Prolac TCP specification is divided into small, sensibly interlocked pieces.
`Logical extensions to the base TCP protocol, such as delayed acknowledgement and
`slow start, are implemented in the specification as extensions to a set of base mod-
`ules; very simple definitions determine which extensions, if any, are compiled, and
`even their relative order of execution. The TCP specification is highly extensible
`while staying highly readable-much of the specification is very similar to language
`in the standard reference to TCP [Pos81]. Prolac TCP can communicate with other
`TCPs, and experiments show that Prolac is not a bottleneck under normal networking
`conditions.
`
`1.1
`
`Related work
`
`The International Organization for Standards (ISO) has defined two formal descrip-
`tion techniques originally intended for developing the ISO OSI protocol suite. These
`techniques are LOTOS and Estelle. LOTOS [BB89], based on Milner's Calculus of
`Communicating Systems [Mil80], is an algebraic technique with functional proper-
`ties. Like many functional languages, it is very effective for describing its fundamental
`abstraction, processes. Unfortunately, also like many functional languages, these ab-
`stractions are rigid and may not fit existing protocols without some pain-exactly the
`pain that protocol languages are supposed to avoid. LOTOS users report some prob-
`lems with using the language for specification [LMD90], and LOTOS presents very
`serious challenges to the compiler writer, including valid programs which potentially
`generate an infinite number of processes [WvB89]. It seems doubtful that LOTOS
`specifications can be made to run efficiently.
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 008
`
`
`
`Estelle [DB89] is one of many protocol languages based on a finite state ma-
`chine model much like that often used in parsers. Estelle, like LOTOS, includes
`asynchronous parallelism; it is based on Pascal, and includes a module system (bound
`up with the parallelism structure) where processes (modules) communicate through
`broadcast signals. Estelle can be used to create semi-automatic implementations of
`reasonable performance [SCB90]. However, experience with large protocols is not
`reported, and high performance is not discussed. Furthermore, the state machine
`model and its cousin the Petri machine model have intrinsic problems for modeling
`protocols: the division into states often does not correspond to anything real in the
`protocol, and relationships between states can become very complicated and difficult
`to change, even in carefully layered protocols [vB86].
`Esterel [BdS91] is a version of Estelle without asynchronous parallelism: an Esterel
`specification has a defined sequential execution. High performance Esterel compilers
`are being developed [CDO96]; however, full implementation of a large protocol is not
`reported. In terms of language, Esterel suffers from many of the same problems as
`Estelle due to their common extended finite state machine model.
`RTAG [And88] is based on a context-free attribute grammar. RTAG provides a
`relatively natural syntax with equivalent modeling power to extended finite state ma-
`chines. Efficient compilers are reported in the literature [HA89], although "efficient"
`turns out to mean "arguably efficient enough for research use", i.e., simple protocols
`suffer a factor of 2 slowdown. This slowdown comes partially from parallelism in the
`language. RTAG is not always suitable for existing protocols, just as yacc is not always
`suitable for describing computer languages.
`The x-kernel [HP91] provides an infrastructure and various tools for creating pro-
`tocol stacks. Prolac is complementary with the x-kernel, which focuses on organizing
`protocol stacks; Prolac is primarily designed for implementing a single protocol.
`Morpheus [AP93], an object-oriented protocol language, enforces a large number
`of constraints on the protocol programmer. These constraints are restrictive enough
`that "existing protocol specification[s] may not be implementable in Morpheus." While
`some of the constraints are meant to increase the knowledge available to the compiler
`to enable domain-specific optimizations, others seem to exist solely to prevent the
`programmer from making bad design choices. Impressive results are reported for
`UDP speedup, but we regard Morpheus's inflexibility and inability to implement real
`protocols as definitive.
`
`1.2 Motivation
`
`This section describes some requirements we formulated in response to the triple
`goal of tractability, extensibility, and efficiency, and how those requirements guided
`Prolac's design.
`
`* The language must be easy to write. Programs in the language should be gen-
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 009
`
`
`
`erally understandable to most implementors without requiring major grounding
`in new techniques.
`This requirement steered us away from existing models, which are often obscure
`from the point of view of traditional programming languages, and toward a simple
`expression model with generally C-like syntax.
`
`* The language must handle most of the protocol's implementation, in-
`cluding some lower levels. Semi-automatic protocol implementation gener-
`ally means that the implementor provides vital pieces of the implementation in
`an auxiliary file [SB90]. While some code rightfully belongs outside a protocol
`language, most languages make the exclusion boundary too high. Separation can
`cause problems when either specification or implementation changes, and acts
`as a deterrent against reflecting vital implementation concerns in the specifica-
`tion. These problems make high-performance semi-automatic implementation
`quite difficult.
`Prolac's object-oriented features and namespaces address the requirement for a
`readable high-level specification. The requirement for low-level implementation
`led us to treat Prolac as bilingual; like yacc [Joh75], Prolac allows programmers
`to escape to C to express complex implementation details. Both high and low
`levels can easily coexist in a Prolac program through subtyping, which can be
`used to provide implementation details for a high-level specification.
`
`* The compiler must generate efficient code for well-written specifica-
`tions.
`Prolac's simplicity is a result of this requirement: a simple language is easier
`to compile into efficient code. Also, Prolac allows the programmer to specify
`various hints to improve the generated code.
`
`* The language should support integrated layer processing. Integrated
`layer processing, or ILP, where processing for the various protocol layers is
`performed all at once rather than sequentially, has proven very important for
`high network performance [CT90, Cla82].
`The inheritance mechanism should apply naturally to ILP, even when layers are
`specified independently.
`
`A study of the other protocol languages described in § 1.1 led to other requirements:
`
`* Asynchronous parallelism is a mistake. Every protocol language including
`asynchronous parallelism has proved difficult or impossible to compile into high-
`performance code, often demonstrably due to that parallelism.
`Prolac has no asynchronous parallelism-or any parallelism at all, for that matter:
`a Prolac specification is wholly sequential.
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 010
`
`
`
`* Inflexible abstractions cause problems. While whole protocols (i.e., TCP,
`IP, Ethernet) can fit well into a fixed abstract framework [HP91], the internals
`of these protocols are more complex and often don't match up to the idealized
`state or process abstractions in protocol languages. This forces some uncomfort-
`able twisting to make the protocol fit the language-making the specification
`harder to understand--or may make it impossible to implement the protocol in
`the language [AP93]. Thus, any enforced abstraction is also an enforced limita-
`tion, restricting the language so that only particular protocols can be naturally
`implemented.
`This consideration caused us to leave all protocol-specific abstractions out of
`our protocol language. Prolac is a domain-specific language, in that many of
`its features, both small and large, were formulated in direct response to how
`protocols work; however, it is completely without domain-specific abstractions.
`
`Other requirements were inspired by properties common to many protocols.
`
`* Protocols are often described in informal specifications as decision trees
`augmented with actions. See, for example, the TCP specification [Pos81].
`Prolac was designed so that this style of specification easily translates to Prolac.
`
`* Protocols are not organized around data abstractions. Consider the large
`and complex TCP protocol. This protocol is built around two central objects:
`the transmission control block (TCB) and a segment, or incoming packet. Even
`considering auxiliary objects like buffers and timers, these objects simply do not
`determine a reasonable organization of TCP's voluminous code.
`The implicit rule mechanism (§3.2.2) was inspired by this property. It allows a
`programmer to write maximally concise and understandable specifications, even
`when code is not organized around data objects.
`
`* Protocols evolve through accretion of small extensions. As mentioned
`above, these extensions involve relatively small changes to deeply buried, seem-
`ingly arbitrary pieces of protocol code. The first protocol designer cannot limit
`such extensions to a few specific places: the right places are impossible to iden-
`tify a priori. We expect the need for extensions to grow as user-level protocol
`implementations become more important.
`This inspired our decision to prohibit access control in Prolac: a module can
`override any function from any of its supertypes. Module operators (§3.2.1)
`allow module designers to suggest an interface to a module; however, that
`module's users can still access hidden rules, albeit with a more inconvenient
`syntax.
`
`1. Well, almost: see §3.3.1 regarding the seqint type.
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 011
`
`
`
`Protocols behave predictably at run time. Protocols are generally specified
`in the form of a state-transition graph; the next step in any protocol processing
`is always very well-defined.
`This means, in Prolac, that the dynamic dispatch the language provides will
`often go unused; at run time, most call sites will always dispatch to a single
`function, not one of a set. We further note that, in specifications we designed,
`this unique function would often be the most overridden function available.
`Prolacc, therefore, optimizes this case to a static dispatch when it is legal to do
`SO.
`
`1.3 Contributions
`
`This thesis makes the following contributions: The design of the Prolac language,
`including the novel concepts of module operators and implicit rules; prolacc, a working
`Prolac compiler generating reasonably efficient and high-level C code; and a prototype
`TCP specification in Prolac which is readable, extensible, and can communicate with
`other TCPs.
`
`1.4 Thesis organization
`
`Chapters 2 and 3 describe the Prolac language; Chapter 2 gives a general overview,
`while Chapter 3 provides a more detailed discussion. Chapter 4 describes the inter-
`nal structure of the prolacc compiler and describes some of the algorithms it uses.
`Chapter 5 gives a brief description of our prototype TCP implementation; Chapter 6
`provides a brief summary and directions for future work. Finally, Appendix A is a
`preliminary version of the Prolac language reference manual.
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 012
`
`
`
`2
`LANGUAGE OVERVIEW
`
`This chapter introduces the Prolac language, providing a quick flavor of its syntax and
`how it is used by developing a trivial example. Chapter 3 will discuss Prolac's design
`and some of its important features.
`Some familiarity with conventional statically-typed object-oriented languages (e.g.,
`C++ or Java) is assumed. This chapter will not describe any features in detail; the
`curious reader is referred to Appendix A, which contains a preliminary version of
`the Prolac language reference manual. The text contains references to the relevant
`sections of the manual.
`
`2.1
`
`Basics
`
`Prolac is an object-oriented language, by which we mean that it has data abstraction
`(i.e., the user can define new data types and operations acting on them), subtyping
`(i.e., a user-defined type can extend the definition of other user-defined types), and
`dynamic dispatch (i.e., for some function calls, one of a set of actual function bodies
`may be executed at run time, depending on the run-time type of a special argument).
`Prolac is also statically typed: like C++ and Java, but unlike Smalltalk, Lisp and
`Self, the compiler knows the type of every object in a Prolac program. The type the
`compiler knows for an object, called its static type, may be different from the run-time
`type of the object, called its dynamic type; specifically, an object's dynamic type may
`be a subtype of its static type.
`Like ML and Lisp, Prolac is an expression language: everything returns a value,
`and there is no concept of a statement which is not an expression. This contributes to
`Prolac's readable but very concise syntax. Prolac has even fewer complicated control
`constructs than ML and Lisp-it has no looping constructs, for example. Looping
`must either be expressed through recursion or in C. This is not generally a problem
`when specifying protocols.
`Prolac is designed for relatively small protocol specifications (i.e., less than 5,000
`lines of code). The Prolac compiler can therefore read the source for an entire pro-
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 013
`
`
`
`gram at once, enabling some important global optimizations. For example, the Prolac
`compiler can discover that a rule' is never overridden, in which case any call of that
`rule can use slightly faster static dispatch or--even better-be inlined.
`A Prolac compiler generates C code from a Prolac specification. Prolac itself is
`bilingual; a Prolac program can specify arbitrary C code to be executed inside a rule.
`This allows tight integration with C without either complicating Prolac or forcing
`implementation concerns into a separate file.
`Prolac is call-by-value rather than call-by-object. C-style explicit pointer types must
`be used to get true dynamic dispatch.
`
`2.2 Modules and rules
`
`The basic unit of program organization in Prolac is the module; a module in Prolac is
`like a class in many other object-oriented languages (§A.3). Modules have associated
`data, calledfields (§A.6), and code, called rules (§A.5). Here is a prototype for a module
`implementing rational numbers:
`
`module Rational {
`field num :> int;
`field den:> int;
`constructor(n:> int, d:> int)
`num = n, den = d;
`negative ::= //Is it negative?
`(num < 0)!= (den < 0);
`
`The fields num and den have separate values for each object of type Rational.
`A field is often called a slot or class member in other languages. The ':>' operator
`declares a type; it should be read "is a".
`The constructor, named constructor, is used to create objects of type Ratio-
`nal (§A.5.3); here, it initializes the num and den fields to values the user must pass as
`constructor parameters.
`Finally, negative is a rule taking no parameters (§A.5). The expression following
`'is the body of the rule, which is executed whenever the rule is called. The negative
`rule actually returns type bool (the Boolean type, with values true and false; §A.8.4);
`because bool return types are so common in Prolac protocol specifications, we allow
`them to be elided. negative is a dynamic rule (often called a method), so it has access
`to a "current object" of Rational type. Prolac also supports static rules which can be
`called without reference to any object (§A.5.1).
`
`1. Prolac's blanket term for "function" and "method"; see §3.1.1 for discussion.
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 014
`
`
`
`2.3
`
`Imports and supertypes
`
`Here is how a module M would use the Rational module we've created:
`
`module M has Rational {
`field ir:> Rational;
`}
`Rational is listed in M's module header (§A.3.1), in the has clause. A module must
`explicitly declare every module it uses by importing it-that is, by listing it in the has
`clause (§A.3.3). If we had left Rational out of the has clause, the field definition would
`have caused an undefined name error.
`To demonstrate supertypes, we extend Rational to enforce an additional invariant
`(that the denominator must be positive):
`
`module Pos-Denom-Rational:> Rational {
`constructor(n:> int, d:> int)::=
`Rational(n, d), //first, call parent's constructor
`normalize-sign;
`normalize-sign::=
`den < 0 ==> (num = -num, den = -den);
`negative ::= num < 0;
`
`Supertypes-more specifically, parents, which are direct supertypes-are also de-
`clared in the module header, after a type declaration operator ':>'
`(§A.3.2). While
`eventually Prolac will have multiple inheritance (i.e., allow multiple parents for a sin-
`gle module), the language and compiler described in this thesis only support single
`inheritance.
`The arrow operator '==>' from normalize-sign is the usual way to express condi-
`tional execution in Prolac; 'X ==> Y' essentially means 'if X, then Y' (§A.9.4.3).
`The call of normalize-sign demonstrates that if a rule has no parameters, paren-
`theses can be omitted when it is called (§A.9.2).
`Pos-Denom-Rational adds one new rule, normalize-sign, and overrides an old one
`from Rational's collection, negative (§A.5.2). Consider:
`
`module Test has Rational, Pos-Denom-Rational { ...
`field pos-denom:> Pos-Denom-Rational;
`test ::=
`let p :> *Rational in
`p = &pos-denom,
`p->negative // actually calls Pos-Denom-Rational's negative-
`// even though p's type points to a Rational object--because the
`// run-time type of *p is Pos-Denom-Rational.
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 015
`
`
`
`// This is dynamic dispatch.
`end;
`
`I
`In Test, also notice the let expression, which resembles let expressions in many
`functional languages (§A.9.5); the different syntax for pointer types (§A.8.6); and the
`syntax for calling a specific object's version of a rule, which is the same as C++'s.
`
`2.4 Namespaces
`
`Prolac allows the user to create explicit namespaces, both at file scope to group modules
`and within modules to group rules (§A.4). To illustrate, we extend Rational again:
`
`module Reduced-Rational:> Pos-Den-Rational {
`constructor(n:> int, d:> int) ::=
`Pos-Den-Rational(n, d), reduce;
`reduce { // reduce is a namespace
`reduce
`(den == 0 ==> { assert(O && "Bad denominator!"); })
`I I I recurse(den);
`recurse(try :> int)::=
`(try <= 1 ==> true)
`I (num% try == 0 && den% try == 0 ==>
`(num /= try, den /= try, recurse(den - 1)))
`III recurse(try - 1);
`
`}
`
`Here, we avoided polluting the module's top-level namespace with recurse by placing
`it in a nested namespace, reduce. (This also meant that we could give it a short
`name, rather than reduce-recurse or some such.) Note that, in the constructor, we
`treated the namespace name 'reduce' as a rule call; this is legal, and abbreviates calling
`reduce.reduce' (§A.9.2.1).
`Other things to notice: recursive rule calls to implement a looping construct;
`the assert expression in braces { ... }, which is a C block specifying C code to be
`executed (§A.9.6); and the case bars 'I I ', which, together with the arrow operator
`==>', express a case expression analogous to Lisp's cond (§A.9.4.7).
`
`2.5 Advanced features
`In our final example, we use some relatively advanced Prolac features. First, let's make
`it harder to change the values of the num and den fields without going through accessor
`methods:
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 016
`
`
`
`module Rational-Interface:> Reduced-Rational
`// accessor methods:
`numerator :> int ::= num;
`denominator:> int ::= den;
`} hide (num, den);
`
`Here, hide is a module operator, or an operator which acts on modules instead of
`values (§A.3.4). Placed in this position, it is an after-module operator which affects
`what users of Rational-Interface will see by default (§A.3.4.2); in particular, it removes
`the names num and den from Rational-Interface's namespace. (However, in keeping
`with Prolac's anti-access-control philosophy, num and den can still be accessed, ei-
`ther following a show operator or through Rational-Interface's complete namespace,
`Rational-Interface.all.) Module operators are Prolac's main linguistic contribution;
`they are discussed in greater detail in the next chapter.
`Unfortunately, implicit rules, which are most useful in larger programs, are very
`difficult to justify with a microexample. A larger justifying example is therefore pro-
`vided along with the discussion of implicit rules; see §3.2.2.
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 017
`
`
`
`18
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 018
`
`
`
`3
`LANGUAGE DESIGN
`
`This chapter discusses the Prolac language: its design; its linguistic contributions, es-
`pecially module operators; and how it is used to write protocols, including a discussion
`of optimizations it supports.
`
`3.1
`Design
`3.1.1 History
`
`The predecessor to the Prolac language, designed by Frans Kaashoek with input from
`Eddie Kohler, was inspired by the yacc parser generator [Joh75]. The PC language had
`named rules and an expression-based syntax. Like yacc, it was also bilingual, allowing
`escapes to C code to express some parts of a computation. Prolac has inherited all
`of these features. However, PC rules could not take parameters or return results, and
`there were no modules or namespaces; a source file was a flat collection of rules, and
`all communication between rules was through global variables.
`The goals of the protocol compiler project are to implement protocol-specific
`optimizations, thus creating high-performance protocol implementations; to facilitate
`protocol extensions; and to make protocol implementations more tractable to human
`readers. Experience with PC convinced us that while, with some work, we could
`create a reasonably high-performance protocol implementation using the language,
`the second and third goals would be essentially impossible to achieve.
`Prolac is a completely new design that addresses those problems. Simplistic re-
`strictions from the earlier language were removed: thus, rules can take parameters and
`return results. Prolac's new object-oriented module system and flexible namespaces
`address the problems of extensions and tractability; the Prolac programmer splits
`computation into many small, sensibly-named rules which may be overridden later.
`Our experience with Prolac has been positive in all three areas: even before extensive
`tuning, our new TCP specification is about as fast as the old, but also more extensible
`(the TCP specification we present consists largely of extensions to a small base) and
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 019
`
`
`
`much more readable.
`A word on rule terminology: The term "rule"-borrowed from yacc-was perfectly
`appropriate for PC; in Prolac, rules are much more like functions or methods in
`other programming languages. We find, however, that the unorthodox terminology
`highlights salient differences between Prolac rules and most other languages' methods.
`In particular, Prolac rules tend to be smaller than methods; rule bodies are expressions,
`not statements; and Prolac has no complicated flow-control statements like for or while,
`so rules tend to be simpler and appear more "functional" than most methods.
`
`3.1.2 Goals and principles
`
`This section describes some goals and principles which guided the design of Prolac.
`These goals and principles grew out of our experience with Prolac itself and are more
`language-specific than those described in § 1.2.
`Encourage short rules. We want Prolac programs to consist of many sensibly-
`named rules with small bodies; such code naturally lends itself to extension and to
`quick top-down comprehension. However, such a style can rapidly become unreadable
`unless rules are given appropriate names (neither too long nor too short) and organized
`into useful groups. The namespace system was developed to facilitate this.
`Eliminate syntax. Appropriate rule naming is necessary, but not sufficient, to
`make a program with many small rules comprehensible. If there are many small rules,
`a user will often forget what one does; the language must allow the user to quickly
`find the rule in question and take in its purpose at a glance. Prolac tries to facilitate
`this by eliminating nonessential syntax, so that a user doesn't have to wade through
`syntactic commonplaces to find what a rule really does. This (perhaps religious) point
`is elaborated further in §3.2.3.
`Abbreviate routine code. The language should allow a user to abbreviate or
`eliminate routine code by implicitly generating it. Namespace call, implicit construc-
`tors, and especially implicit rules are examples of how this principle was put into
`practice. Implicit code generation is limited, and hopefully made less surprising, by
`a simple restriction: implicit code generation only uses mechanisms also available to
`the user.
`Abbreviations degrade safely. Any implicit code the compiler generates should
`degrade safely: when the user makes a change that would significantly change the
`meaning of implicitly generated code, an error or warning should be produced rather
`than a silent change in meaning. This goal has been only partially achieved; adding a
`rule to a supertype may silently change the meaning of a (previously) implicit rule in
`a subtype, for example.
`
`CAVIUM-1072
`Cavium, Inc. v. Alacritech, Inc.
`Page 020
`
`
`
`3.2 Contributions
`
`This section describes novel aspects of the Prolac language, especially module opera-
`tors and implicit rules. A list of smaller contributions, including the rationale behind
`Prolac's minimal syntax, closes the section.
`
`3.2.1 Module operators
`Module operators are Prolac's simple and powerful mechanism for manipulating mod-
`ules. A module operator is simply an operator which takes a module as a value and
`returns a module as a result. A module operator expression is acceptable wherever a
`module can be used in Prolac-as a supertype or import, for example, or as the type of
`a field. The module operators we have implemented in Prolac affect only a module's
`extra-type information (a module's namesp