throbber
PROTOCOLS
`FOR HIGH-SPEED
`NETWORKS, III
`
`Proceedings of the IFIP WG6.1/WG6.4 Third International Workshop on
`Protocols for High-Speed Networks,
`Stockholm, Sweden, 13-15 May, 1992
`
`Edited by
`
`B. PEHRSON
`
`P.
`
`NINGBERG
`
`.P|NK
`
`Swedish Institute of Computer Science
`Box 1263
`
`8- 164 28 Kista, Sweden
`
`1 993
`
`NORTH-
`
`AMSTERDAM ° LOND
`
`AN
`
`EW
`
`K 0 TOKYO
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`ELSEVIER SCIENCE PUBLISHERS B.V.
`
`’
`Sara Burgerhartstraat 25
`P.O. Box 211, 1000 AE Amsterdam, The Netherlands
`
`Keywords are chosen from the ACM Computing Reviews Classification System, ©1991, with permission.
`Details of the full classification system are available from
`ACM, 11 West 42nd St., New York, NY 10036, USA.
`
`ISBN: 0 444 89925 1
`
`ISSN 0926-549X
`
`© 1993 IFIP. All rights reserved.
`No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any
`means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of
`the publisher, Elsevier Science Publishers B.V., Copyright & Permissions Department, P.0. Box 521, 1000 AM
`Amsterdam, The Netherlands.
`
`Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance
`Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under
`which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including
`photocopying outside of the U.S.A., should be referred to the publisher, Elsevier Science Publishers B.V., unless
`otherwise specified.
`“
`
`No responsibility is assumed by the publisher or by IFIP for any injury and/or damage to persons or property
`as a matter of products liability, negligence or otherwise, ortrom any use or operation of any methods, products,
`instructions or ideas contained in the material herein.
`
`This book is printed on acid—tree paper.
`
`Printed in The Netherlands
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`Protocols for High—Speed Networks, III (C-9)
`B. Pehrson, P. Gunningberg and S. Pink (Editors)
`Elsevier Science Publishers B.V.
`(North-Holland)
`© 1993 IFIP. All rights reserved.
`
`219
`
`A Parallel Approach to OSI Connection-Oriented
`Protocolsl
`
`Murray VV. Goldberg, Gerald W. Neufeld and Mabo R. Ito
`
`University of British Columbia, Vancouver, British Columbia, Canada
`
`Abstract
`
`This paper describes our work in the parallelization of ISO connection—oriented proto-
`cols. Our attention focuses on (though is not restricted to) the upper layers. Our scheme
`performs parallelization on a per—packet basis using widely available, general purpose,
`shared memory multiprocessors. Our parallel implementation of the ISO stack includes
`the transport, session, presentation, and application (ACSE and ROSE) layers. The im-
`plementation has been tested on a four—processor risc machine and shows that parallelism
`can provide significant performance advantages in protocol processing.
`
`1
`
`Introduction
`
`The performance of communication protocol processing is not keeping pace with the
`present rate of improvement in network transmission bandwidth. Several options are
`available to increase protocol throughput. These include new low—overhead or paral-
`lelizable protocols, parallelizing the execution of existing protocols, specialized hardware
`architectures, or offloading protocol processing to auxiliary subsystems. Some of these
`options overlap or can be employed together.
`
`The solution we explore here is the parallel execution of existing protocols on a general
`purpose multiprocessor. This is attractive for several reasons. First,
`it is applied to
`existing protocols. Existing oflicial a11d de facto protocol standards are well entrenched
`and not likely to change quickly, even for the sake of performance. Second, this solution is
`applicable to all protocol layers, including the presentation layer where the overhead and
`potential for improvement are the greatest. Third, existing protocols are well studied and
`understood. Much work has already gone into making efficient implementations. Finally,
`the protocol processing hardware is general purpose and therefore widely available.
`
`This paper describes our work on a parallel implementation of the ISO stack2 of
`connection—oriented protocols.
`
`‘This work was made possible by grants from the Natural Sciences and Engineering Research
`Council of Canada.
`'
`
`2Actually, our stack is not completely ISO—based as it normally runs on top of TCP.
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`220
`
`Parallelism issues, the protocol framework, implementation details and performance
`measurements are presented. Section 2 covers approaches to protocol parallelism. Sec-
`tion 3 discusses the approach we take to ISO protocol parallelism. Section 4 details the
`parallel protocol implementation, including the parallel thread environment designed and
`implemented for this work. Finally, Sections 5 and 6 provide performance measurements
`and detail future work.
`
`2 Parallel Protocol Processing
`
`There are several ways to introduce parallelism into protocol processing. Most of these
`are discussed in [DGW89], [NJB90] and [CMW89] and are briefly presented here.
`
`2.1 Processor Per Connection
`
`The granularity of the first option is at the level of a single connection. In this case there is
`no parallelism within a connection; packets received for the same connection are processed
`one at a time. Parallelism occurs where multiple connections are active concurrently. The
`separate connections are processed by separate processors and competition between the
`connections for computing power is reduced. This schemeis suitable where communication
`resources are shared by multiple applications or where a single application uses concurrent
`connections.
`In either case multiple concurrent connections will exist and parallelism is
`achieved.
`
`The advantage of this scheme is its simplicity. Concurrency control between parallel
`streams is limited to interactions on resources shared by all connections. This set of
`resources and data structures is fairly small.
`It is a reasonably simple task to take a
`serial protocol implementation and introduce concurrency at this level. The drawback of
`this scheme is the lack of benefit for applications with a small number of heavily used
`connections.
`In this case, even with a large volume of data being sent and received,
`processors will still be idle and little or no parallelism is possible.
`
`2.2 Processor Per Function
`
`At the other end of the granularity spectrum is parallelism within a protocol layer. Gran-
`ularity is at the level of individual protocol functions. This may be of benefit depending
`on the number and size of parallelizable tasks. ,.However, even for protocols designed for
`this sort of parallelism, the benefit is questionable due to the extra synchronization and
`communication overhead among the many small tasks.
`
`2.3 Processor Per Layer
`
`The granularity of the third option is at the level of a single protocol layer. Here, one
`or more processors are dedicated to a single layer and the protocol stack is made of a
`pipeline of processors. At each stage in the pipeline a packet moves to the next processor
`for processing at that layer. The vacated stage is free to begin processing the next packet.
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`221
`
`The number of packets being processed in parallel could be as high as the number of
`stages in the pipeline.
`
`The advantage of this scheme is simple implementation, especially if pipeline stages
`are drawn along protocol layer boundaries.
`It is natural to implement each layer as a
`separate process. The pipeline will run at the speed of the slowest segment.
`It may be
`possible to divide slow stages to increase overall pipeline speed. However, if such divisions
`do not fall on natural boundaries, implementation becomes complex. The drawback is
`the context switching, communication and synchronization overhead at layer boundaries.
`Between each segment of the pipeline the packet will be handed off or queued for the next
`stage. If there are more stages than there are processors, several context switches may be
`necessary for each packet processed. Without a solution to these problems such a scheme
`is unsuitable for a high speed network.
`
`2.4 Processor Per Packet
`
`The fourth option (and the one selected) is parallelization at the level of a single packet.
`Here, each packet is assigned its own processor to carry it through the stack. There
`are several advantages to this scheme. First, there is no extra communication overhead
`between the layers. Second, performance improvements are made whenever trafiic is
`heavy regardless of how the traffic is distributed over connections. Third, while there
`is synchronization required between the parallel tasks, it is kept to a reasonable level.
`Finally, in the general case, no extra context switching overhead is incurred.
`
`3 Parallelization of the ISO Stack
`
`While much of the protocol processing task is parallelizable, there are areas where syn-
`chronization must occur and others where no parallelism is possible at all.
`
`3.1 Where is Parallelism Possible?
`
`Protocol processing for connection—oriented protocols (and some connection—less protocols)
`can be roughly divided into two unequal parts: packet processing and connection con-
`trol block (CCB) manipulation. Packet processing involves external data representation
`conversion functions, checksurn calculations, encryption and decryption, and encoding
`and decoding of packet headers. CCB manipulation includes state machine transitions,
`enqueuing received segments, etc. In some implementations these parts are intertwined,
`though our experience shows they are not difficult to isolate. The relative processing over-
`head of these two parts varies from layer to layer. For most layers, CCB manipulation for
`a single packet involves only a few lines of code to enqueue a segment, update sequence
`numbers or initiate an acknowledgement (often not even these are required). Packet pro-
`cessing is normally more complex. Even at layers with very simple protocol data unit
`(PDU) formats, such as the network and transport layers, packet manipulation is a sig-
`nificant proportion of the protocol code. At higher layers, such as the presentation and
`application, packet processing overhead completely dominates all other processing. Clark
`and Tennenhouse [CT90] indicate that up to 97% of total protocol stack overhead can
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`222
`
`be attributed to the presentation conversion function (a packet processing task). While
`this number varies depending on the complexity and size of the application data, informal
`experimentation using our own protocol stack has confirmed the dominant overhead of
`presentation layer data conversion.
`
`Whether CCB manipulation and packet processing are parallelizable depends on how
`parallelism is introduced. For example, if parallelism is at the level of a single connection
`(i.e. one process per connection) then there is no parallelism other than that available
`across connection boundaries.
`If parallelism is at the level of the protocol function (one
`process per protocol function) then CCB manipulation and packet processing are both
`parallelizable. This parallelism, however, comes at the expense of significant synchroniza-
`tion and communication complexity unless the protocol is specifically designed for this
`purpose.
`
`If parallelism is at the packet level (one process per packet), significant parallelism is
`available with reasonable levels of synchronization and communication complexity. Packet
`processing does not require access to shared data structures. Packets are generally context
`free in the sense that state information and synchronization are not required for encoding
`and decoding; only the packet is required for decoding and a reference to the packet
`contents for encoding.
`lf only one process is responsible for the encoding or decoding of a
`single packet, then there is no competition for these data structures. This makes packet
`processing an excellent candidate for parallelization. CCB manipulation is more difficult
`using per—packet parallelism.
`111 this case more than one process may be contending
`for access to shared data in the connection control block. A simple example is that of
`enqueuing a segment of a multi—segment PDU. If multiple segments are being enqueued in
`parallel, access to the list structure must be controlled to maintain list integrity. Likewise,
`state examination and transitions must be made in a consistent manner. It is therefore
`
`difficult to parallelize CCB manipulation.
`
`3.2 How is This Parallelism Achieved?
`
`As indicated, our implementation achieves parallelism on a per-packet basis. This is
`implemented by creating some number of processes designed to run in parallel, each
`capable of taking a packet through the protocol stack. However, the difficult work in
`parallelizing a set of protocols is not introducing parallelism, but is rather introducing
`synchronization and mutual exclusion where parallelism is not desirable.
`
`3.2.1 Mutual Exclusion
`
`Making CCB manipulation a critcial section is possible using mutual exclusion constructs
`such as semaphores, critical regions, or monitors. This mutual exclusion is at the level of
`a single CCB. Since a CCB pertains to only one connection at a single layer, competition
`for access to a CCB is restricted to CCB manipulation occurring at the same layer for the
`same connection. No competition (other than that for shared hardware resources) occurs
`across layer or connection boundaries.
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`223
`
`3.2.2 Event Ordering
`
`The main synchronization problem is that of event ordering3. That is, when a layer
`receives an event, how does it determine the its relative order in the event stream?
`Correct event ordering must be determined for in—order queueing of segments of a PDU,
`in—order delivery of data to the application, and proper processing of control events.
`Normally, event ordering is determined either through external clues such as sequence
`numbering (as at the data link layer), or through internal clues such as event arrival order
`(as at all layers above the network or transport layer). When unsynchronized parallelism
`is introduced, external (PDU based) sequence numbers still supply sufficient ordering
`information for data events. However, layers which normally rely on arrival order (such
`as the session, presentation and application, as well as the lower layers for control events)
`can no longer do so.
`In a parallel implementation it is possible for a later event to be
`processed more quickly than an earlier event and therefore arrive at some layer out of
`order.
`
`Our solution for determining proper event ordering is a11 internal sequence number
`scheme. This scheme is applied to all events at all layers.
`In this case, every generated
`event is accompanied by an internal sequence number. The sequence number is calcu-
`lated by the layer which generates the event. Even though events can be received out
`of order, their correct ordering can now be determined by the accompanying internal se-
`quence number. This solution does not impose any restrictions on parallelism, but leaves
`some questions unanswered. For example, how are out—of—order data and control events
`handled?
`
`3.2.3 Out of Order Events
`
`There are two possibilities. The first is to process the event in order of arrival based on
`the correct order as indicated by the sequence number. The second is to postpone event
`processing until all intervening events have arrived. The first option varies in difficulty
`depending on the event and protocol being processed. Certain events such as a data
`indication or a request for retransmission of a set of packets could perhaps be processed
`out of order without adverse effects. Most events which have no significant effect on the
`protocol state fall into this category. In general, however, ISO connection based protocols
`are all state—oriented. The correct action taken on receipt of an event depends on the
`current state of the protocol state machine. Likewise, receipt of a control (and to a
`lesser extent, data) event generally alters the state of the protocol. Therefore, the action
`taken on receipt of an out—of—order event depends on the nature and order of intervening
`events not yet received. Given these difficulties, we adopt the second option and postpone
`out—of—order events until they can be processed in—order.
`
`To summarize, we have adopted the following approach. Parallelism is at the level
`of one processor—per—packet.
`In general, at each layer, packet processing and CCB ma-
`
`3An event is any protocol information passed to a protocol layer. Examples include data
`arrival and timer expiry. Events are divided into two categories: control and data. A control
`event is one which passes control information to a layer. Examples are connection requests
`and indications. Control events generally change the state of the protocol state machine. Data
`events are those that pass packets from one layer to another. Examples are data requests and
`indications.
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`224
`
`nipulation are disjoint stages. All packet processing is done in parallel. There is no syn-
`chronization or competition for resources (other then general hardware resources) during
`encoding and decoding. CCB manipulation is difficult to parallelize and is of relatively
`short duration. Therefore the CCB manipulation stage for each CCB is made mutually
`exclusive. Competition for CCB access occurs only when processes arrive at the CCB
`manipulation stage at the same layer, on the same connection, and at the same time.
`CCB manipulation for out—of—order events is difficult to perform on event arrival. These
`events are therefore delayed until the intervening events have arrived. This approach
`allows parallelism during all packet processing activities (which are extensive, especially
`at the presentation layer). It suffers syncl1ro11izatio11 overhead only when processes arrive
`together at a CCB manipulation stage (same layer, same connection) or when an event
`passes another in the stack.
`
`4
`
`Implementation
`
`This section describes details of the parallel threads environment, the protocol stack
`code, the structure of the threads executing the code, and the mechanisms used for event
`sequencing and CCB mutual exclusion.
`
`4. 1 Pt hreads
`
`A parallel execution environment is required to support the parallel communication stack.
`Pthreads supplies such an environment.
`
`Pthreads is a user—leVel kernel for use on multiprocessors. It provides an environment
`suitable for running a group of parallel cooperating threads. This first implementation of
`Pthreads runs on a shared memory Motorola 88100 based multiprocessor running UNIX
`System V. It can be ported to other multiprocessors (or uniprocessors) and other host
`operating systems so long as shared memory is supported by the target.
`
`Pthreads allows efficient process creation and destruction, interprocess communica—
`tion and memory allocation in a preemptive parallel environment. Pthreads provides
`semaphores (as well as IPC) for application thread synchronization.
`It also provides a
`sleep facility and 0(1) memory allocation routines allowing fast memory allocation. The
`Pthreads ready queue is shared by all processors. The highest priority ready thread is
`run by the next available processor and may therefore migrate between processors over its
`lifetime. Pthreads shared kernel data structures are protected by a system lock. Access
`to a critical section requires continued examination of the lock location until it becomes
`free. Once it is found to be free, locking is attempted using the XMEM (88100 test and
`set) instruction. This process repeats until the XMEM return value verifies locking. Re-
`peated lock examination, rather than repeated XMEM execution, is done to avoid CPU
`instruction synchronization overhead and take advantage of the data cache.
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`225
`
`4.2 Protocol Implementation
`
`4.2.1 The Parallel Stack
`
`Our work includes a parallel implementation of the ISO seven layer connection—oriented
`stack. These protocols provide the remote operations service defined by the CCITT
`standard X219 (ISO standard 9072-1) [CI87a] using the protocol defined by the CCITT
`standard X229 (ISO standard 9072-2) [CI87b]. To support this protocol, the OSI trans-
`port, session, presentation and association control layers are also provided. The remote
`operations layer supports all association classes (1, 2 and 3), and supports operation
`classes I and 2. The association control service conforms to CCITT recommendation
`
`X227 [CCl87a]. The presentation service provides the kernel functional unit of CCITT
`recommendation X226 [CCI87b]. The session service provides the full-duplex basic com-
`bined subset of functional units described by CCITT recommendation X225 [CCI89a].
`Class 0 of the transport layer of CCITT recommendation X224 [CCI89b]is also provided.
`The transport layer normally makes use of TCP4 as its network service using the interface
`described by [RC86].
`
`4.2.2 Process Structure
`
`Many threads cooperate to provide the protocol service. At the top level is the application
`thread. Only the client side of the application uses a thread which exists over the life
`of the application. This thread runs the client application initiating requests of the
`server application by way of the protocol service. The server application uses a separate
`thread to initialize its services and request initialization of the communication service.
`Following this initialization the server thread exits. Events for the server are delivered
`asynchronously by way of upcalls. The upcalls are executed by threads belonging to the
`communication service.
`
`The communication service maintains a set of threads called packet shepherds. The
`number of packet shepherds is configurable and varies with the number of network con-
`nections. Each packet shepherd executes a loop. This loop blocks awaiting an event from
`the network (a packet), and then shepherds the packet through the protocol stack. When
`the packet shepherd completes processing its packet (including the possible application
`upcall and resulting transmission), it returns to the top of the loop, awaiting its next
`packet. Any number of packet shepherds may be active concurrently in the stack.
`
`There is also a set of worker threads, each responsible for a different function. For
`example, there is an acceptor thread responsible for fielding requests for new network
`connections. There is also a timer thread responsible for generating and executing timer
`events.
`
`4.2.3
`
`Internal Sequence Numbers
`
`Parallel processing of events (eg. received packets) allows the possibility that events may
`pass each other in the stack as they are being processed. This can cause a logically
`earlier event to arrive at some layer after a logically later event. In order for correct event
`
`‘*For testing purposes, TCP network access is replaced by a fast bufler scheme.
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`226
`
`processing the protocol machine must have knowledge of the logical order of received
`events. This is accomplished using internal sequence numbers. These sequence numbers
`are internal in the sense that they are private within a protocol implementation. They
`are an implementation feature, invisible from a functional point of view.
`
`Internal sequence numbers are generated by all layers which generate events and are
`associated one-to—o11e with these events. A sequence number stream is a monotonically
`increasing-by—one series. There is a separate sequence number stream for each event
`destination of each connection. For example, a session connection generates two indepe11—
`dent sequence number streams. One for events destined for the associated presentation
`connection, and one for events destined for the associated transport connection.
`If one
`connection splits events onto several service providers, separate sequence number streams
`are generated for each. The sequence number associated with each event determines its
`logical position in the event sequence regardless of arrival order.
`
`There are also events which have no place (or perhaps more accurately, have any
`place) in the event stream. These are called unordered events. Unordered events are
`generally asynchronous events such as timer expiry. These events are assigneda special
`null sequence number indicating their unordered status.
`
`4.2.4 The Event Gate
`
`The event gate (depicted in figure l) is a software mechanism that controls the entry of
`threads into a connection’s CCB manipulation stage. There is one event gate for each
`connection active within each layer. Again, our parallelism structure requires no synchro-
`nization between separate connections, even at the same layer. Synchronization is only
`required between threads entering the CCB manipulation stage for the same connection
`at the same layer. There are two aspects to the synchronization performed by the event
`gate: mutual exclusion and sequencing.
`
`Manipulations involving a CCB must be mutually exclusive. These manipulations can
`involve adding segments to a list, counting list elements, or examining and updating state
`variables.
`If two or more processes make these manipulations concurrently, the obvious
`data consistency problems ensue. Therefore the event gate allows only one process at
`a time into the CCB manipulation stage of a connection. An event arriving while the
`CCB manipulation stage is occupied is suspended and enqueued until the stage is clear.
`The resultant context switching is expensive but the relatively short duration of the
`CCB manipulation stage (when compared to the packet processing stages) minimizes
`contention.
`
`Event gates also sequence events entering CCB manipulation stages. Events are al-
`lowed access sequentailly in order of internal sequence number. An event arriving out
`of order is suspended until all intervening in-order events have passed through the CCB
`manipulation stage. These events may arrive from more than one source. For example,
`a connection receives events from at least its service user and service provider. Therefore
`each event gate must keep track of the separate sequence number streams. If more than
`one event is eligible to enter the CCB manipulation stage of a connection (these would
`be events from separate streams) the choice of which is allowed in first is arbitrary. The
`event gate also receives unordered events. These events are allowed in ahead of all other
`waiting events as soon as the CCB manipulation stage is free.
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`227
`
`Protocol Stack
`
`Event Gate (Controls
`access to CCB
`
`Manipulation Stage)
`
`
`
`Figure 1: The Division of Each Layer Into CCB Manipulation and Packet Processing
`Tasks
`
`5 Performance
`
`This section presents the results of performance measurements of the parallel stack. For
`testing purposes, the TCP layer has been replaced with a buffer scheme. The buffer
`scheme makes data continuously available to the stack for processing. Therefore the stack
`will process data at its maximum sustainable rate. This is intended to simulate a high
`bandwidth network.
`
`5 . 1 Data Transfers
`
`The application data is intended to mimic the format and complexity of X.500 requests
`and responses. The data is encoded using basic encoding rules and is represented here
`using ASN.1. The requests represent a set of keys to be searched for in a database.
`Performance results do not include any application processing other than that done by
`ROSE. The total size of the SearchKey set varies from O to 1920 bytes over the various
`tests. The request format is as follows:
`
`SearchKeys ::= SET OF SearchKey
`
`SearchKey
`goesBy
`empNo
`
`= [APPLICATION 0] SET
`Name,
`EmployeeNumber
`
`{
`
`(cid:36)(cid:80)(cid:68)(cid:93)(cid:82)(cid:81)(cid:17)(cid:70)(cid:82)(cid:80)(cid:15)(cid:3)(cid:44)(cid:81)(cid:70)(cid:17)(cid:3)(cid:72)(cid:87)(cid:3)(cid:68)(cid:79)(cid:17)(cid:3)(cid:3)(cid:3)(cid:40)(cid:91)(cid:75)(cid:76)(cid:69)(cid:76)(cid:87)(cid:3)(cid:20)(cid:19)(cid:19)(cid:26)(cid:3)
`Amazon.com, Inc. et al. Exhibit
`
`

`
`228
`
`}
`
`Name ::= [APPLICATION 2]
`
`IMPLICIT SEQUENCE {
`
`IA5String,
`givenName
`IA5String,
`initial
`fami1yName IA5String
`
`}
`
`EmployeeNumber
`
`::= [APPLICATION 3]
`
`IMPLICIT INTEGER
`
`The responses consist of a set of personnel records. The size of the PersonnelRecord
`set varies from 0 to 16768 bytes over the various tests. The response format is as follows:
`
`Records
`
`::= SET OF Personne1Record
`
`Personne1Record ::= [APPLICATION 1]
`
`IMPLICIT SET {
`
`goesBy
`title
`empNo
`dateOfHire
`
`[0]
`
`Name,
`IA5String,
`Emp1oyeeNumber ,
`[1] Date,
`
`nameOfSpouse [2] Name,
`children
`[3]
`IMPLICIT SEQUENCE OF Childlnformation DEFAULT {}
`
`}
`
`Childlnformation ::= SET {
`
`Name,
`goesBy
`dateOfBirth [0] Date
`
`}
`
`Name
`
`::= [APPLICATION 2]
`
`IMPLICIT SEQUENCE {
`
`IA5String,
`givenName
`IA5String,
`initial
`familyName IA5String
`
`} D
`
`ate ::= [APPLICATION 4]
`
`IMPLICIT IA5String -- YYYYMMDD
`
`5.2 Overall Performance
`
`Tests were conducted which measure the time required to process one request /result pair
`using 1, 2, 3, and 4 processors over 1 or 3 connections. The data size for all tests is
`represented as A/B where A is the request size in bytes, and B is the result size in
`bytes. Figure 2 gives a comparative View of test results for all data sizes and numbers of
`processors over a single connection. As expected, the time required to process each re-
`quest / response pair diminishes with increasing number of processors and decreasing data
`size. Figures 3 and 4 display with greater detail the time required to process each re-
`quest / response pair. Notice that the paralle

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket