IFIP Transactions C:
Communication Systems

International Federation for Information Processing

Technical Committee 6

Communication Systems

IFIP Transactions Editorial Policy Board

The IFIP Transactions Editorial Policy Board is responsible for the overall scientific
quality of the IFIP Transactions through a stringent review and selection process.

Chairman
G.J. Morris, UK
Members
D. Khakhar, Sweden
Lee Poh Aun, Malaysia
M. Tienari, Finland
P.C. Poole (TC2)
P. Bollerslev (TC3)
M. Tomljanovich (TC5)
O. Spaniol (TC6)
P. Thoft-Christensen (TC7)
G.B. Davis (TC8)
K. Brunnstein (TC9)
G.L. Reijns (TC10)
W.J. Caelli (TC11)
R. Meersman (TC12)
B. Shackel (TC13)
J. Gruska (SG14)

IFIP Transactions Abstracted/Indexed in:
INSPEC Information Services
`
C-14

HIGH PERFORMANCE NETWORKING, IV

Proceedings of the IFIP TC6/WG6.4 Fourth International Conference on
High Performance Networking
Liège, Belgium, 14-18 December 1992

Edited by

A. DANTHINE
Institut d'Electricité B28
Université de Liège
Liège, Belgium

O. SPANIOL
RWTH Aachen
Informatik IV
Aachen, Germany

1993

NORTH-HOLLAND
AMSTERDAM • LONDON • NEW YORK • TOKYO
`
ELSEVIER SCIENCE PUBLISHERS B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands

Keywords are chosen from the ACM Computing Reviews Classification System, ©1991, with permission.
Details of the full classification system are available from
ACM, 11 West 42nd St., New York, NY 10036, USA.

ISBN: 0 444 81481 7
ISSN: 0926-549X

© 1993 IFIP. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of
the publisher, Elsevier Science Publishers B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM
Amsterdam, The Netherlands.

Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance
Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under
which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including
photocopying outside of the U.S.A., should be referred to the publisher, Elsevier Science Publishers B.V., unless
otherwise specified.

No responsibility is assumed by the publisher or by IFIP for any injury and/or damage to persons or property
as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products,
instructions or ideas contained in the material herein.

pp. 119-134, 199-218, 267-281, 367-381: Copyright not transferred

This book is printed on acid-free paper.

Printed in The Netherlands
`
Table of Contents

Preface    v
Program Committee    xi
List of Reviewers    xii

Session A: MAC Layer Enhancements    1
Chair: Harmen van As, IBM Research, Switzerland

DQDB for Time Constrained Services    3
Guven Mercankosk, Z.L. Budrikis, QPSX Communications Ltd, Australia,
A. Cantoni, Australian Telecommunications Research Institute, Australia

A New Reservation Scheme for CRMA High-Speed Networks    15
Nen-Fu Huang, Chung-Ching Chiou, Chiung-Shien Wu,
National Tsing Hua University, Republic of China

A Host Interface Architecture for High-Speed Networks    31
Peter A. Steenkiste, Brian D. Zill, H.T. Kung, Steven J. Schlick, Carnegie
Mellon University, USA
Jim Hughes, Bob Kowalski, John Mullaney,
Network Systems Corporation, USA

Session B: Flow and Rate Control    47
Chair: Marjory Johnson, RIACS, USA

Dynamic Bandwidth Allocation and Access Control of Virtual Paths
in ATM Broadband Networks    49
Ibrahim Wahby Habib, Tarek N. Saadawi, City University of New York, USA

Congestion Control - Effective Bandwidth Allocation in ATM Networks    65
E.D. Sykas, K.M. Vlakos, K.P. Tsoukatos, E.N. Protonotarios,
National Technical University of Athens, Greece
`
A High Speed Data Link Control Protocol    81
Ahmed N. Tantawy, IBM Res. Div., T.J. Watson Research Center, USA,
Hanafy Meleis, DEC, Reading, UK

Session C: Parallel Implementation and Transport Protocols    101
Chair: Guy Pujolle, Université P. et M. Curie, France

Parallel TCP/IP for Multiprocessor Workstations    103
Kurt Maly, S. Khanna, A. Mukkamala, C.M. Overstreet, A. Yerraballi,
E.C. Foudriat, B. Madan, Old Dominion University, USA

TCP/IP on the Parallel Protocol Engine    119
Erich Rütsche, Matthias Kaiserswerth,
IBM Research Division, Zurich Research Laboratory, Switzerland

A High-Speed Protocol Parallel Implementation: Design and Analysis    135
Thomas F. La Porta, AT&T Bell Laboratories, USA,
Mischa Schwartz, Columbia University, New York, USA

Session D: Multimedia Communication Systems    151
Chair: Radu Popescu-Zeletin, GMD FOKUS, Germany

Orchestration Services for Distributed Multimedia Synchronisation    153
Andrew Campbell, Geoff Coulson, Francisco Garcia, David Hutchison,
Lancaster University, UK

Towards an Integrated Quality of Service Architecture (QoS-A) for
Distributed Multimedia Communications    169
Helmut Leopold, Alcatel ELIN Research, Austria
Andrew Campbell, David Hutchison, Lancaster University, UK,
Niklaus Singer, Alcatel ELIN Research, Austria

JVTOS - A Reference Model for a New Multimedia Service    183
Gabriel Dermler, University of Stuttgart, Germany
Konrad Froitzheim, University of Ulm, Germany

Experiences with the Heidelberg Multimedia Communication System:
Multicast, Rate Enforcement and Performance    199
Andreas Cramer, Manny Farber, Brian McKellar, Ralf Steinmetz,
IBM European Networking Center, Germany
`
Session E: QoS Semantics and Management    219
Chair: Martina Zitterbart, IBM Res. Div., Watson Research Center, USA

Client-Network Interactions in Quality of Service Communication
Environments    221
Domenico Ferrari, Jean Ramaekers, Giorgio Ventre, International
Computer Science Institute, USA

The OSI 95 Connection-mode Transport Service: The Enhanced QoS    235
Andre Danthine, Yves Baguette, Guy Leduc, Luc Leonard,
University of Liege, Belgium

QoS: From Definition to Management    253
Noemie Simoni, Simon Znaty, TELECOM Paris, France

Session F: Evaluation of High Speed Communication Systems    265
Chair: Otto Spaniol, Technical University Aachen, Germany

ISO OSI FTAM and High Speed File Transfer: No Contradiction    267
Martin Bever, Ulrich Schaffer, Claus Schottmüller,
IBM European Networking Center, Germany

Analysis of a Delay Based Congestion Avoidance Algorithm    283
Walid Dabbous, INRIA, France

Performance Issues in Designing Network Interfaces: A Case Study    299
K.K. Ramakrishnan, Digital Equipment Corporation, USA

Session G: High Performance Protocol Mechanisms    315
Chair: Craig Partridge, BBN, USA

Multicast Provision for High Speed Networks    317
A.G. Waters, University of Essex, UK

Transport Layer Multicast: An Enhancement for XTP Bucket
Error Control    333
Harry Santoso, MASI, Université P. et M. Curie, France,
Serge Fdida, MASI, Université René Descartes, France

A Performance Study of the XTP Error Control    351
Arne A. Nilsson, Meejeong Lee,
North Carolina State University, USA
`
Session H: Protocol Implementation    365
Chair: Samir Tohme, E.N.S.T., France

ADAPTIVE: An Object-Oriented Framework for Flexible and Adaptive
Communication Protocols    367
Donald F. Box, Douglas C. Schmidt, Tatsuya Suda,
University of California, Irvine, USA

HIPOD: An Architecture for High-Speed Protocol Implementations    383
A.S. Krishnakumar, J.G. Kneuer, A.J. Shaw, AT&T Bell Laboratories, USA

Parallel Transport System Design    397
Torsten Braun, University of Karlsruhe, Germany,
Martina Zitterbart, IBM Res. Div., T.J. Watson Research Center, USA

Session I: Network Interconnection    413
Chair: Augusto Casaca, INESC, Portugal

A Rate-based Congestion Avoidance Scheme for Interconnected
DQDB Metropolitan Area Networks    415
Nen-Fu Huang, Chiung-Shien Wu, Chung-Ching Chiou,
National Tsing Hua University, Rep. of China

Interconnection of LANs/802.6 Customer Premises Equipments (CPEs)
via SMDS on Top of ATM: a case description    431
W. Rozenblad, B. Li, R. Peschi,
Alcatel Bell Telephone, Research Centre, Belgium

Architectures for Interworking between B-ISDN and Frame Relay    443
J. Vozmediano, J. Berrocal, J. Vinyes,
ETSI Telecomunicacion, Spain

Author Index    455
`
`

`

High Performance Networking, IV (C-14)
A. Danthine and O. Spaniol (Editors)
Elsevier Science Publishers B.V. (North Holland)
© 1993 IFIP.

TCP/IP on the Parallel Protocol Engine

Erich Rütsche and Matthias Kaiserswerth

IBM Research Division, Zurich Research Laboratory
Säumerstrasse 4, 8803 Rüschlikon, Switzerland

Abstract

In this paper, a parallel implementation of the TCP/IP protocol suite on the Parallel Protocol
Engine (PPE), a multiprocessor-based communication subsystem, is described. The execution
times of the various protocol functions are used to analyze the system's performance in two
scenarios. In the first scenario we execute the test application on the PPE; in the second we evaluate
the potential performance of our TCP/IP implementation when it is driven by an application on
the workstation. For the second scenario, the end-to-end performance of our implementation on a
four-processor PPE system is more than 3300 TCP segments per second.

Keyword Codes: C.1.2; C.2.2; D.1.3
Keywords: Multiple Data Stream Architectures (Multiprocessors); Network Protocols;
Concurrent Programming
`
1. INTRODUCTION

Progress in high-speed networking technologies such as fiber optics has shifted the bottleneck
in communications from the limited bandwidth of the transmission media to protocol processing
and the operating system overhead in the workstation. So-called lightweight protocols and protocol
offload to programmable adapters are two approaches proposed to cope with this problem.
Protocols such as the Xpress Transfer Protocol (XTP)¹ [PEI 92] and VMTP [Cheriton 88] try to
simplify the control mechanisms and packet structures such that the protocol implementation
becomes less complex and can possibly be done in hardware. We took the second approach in
building the Parallel Protocol Engine (PPE) [Kaiserswerth 92], a multiprocessor-based
communication adapter, upon which protocol processing can be offloaded from a host system.
The NectarCAB [Arnould 89] and the VMP Network Adapter Board [Kanakia 88] are other
programmable adapters, each based on a single protocol processor. The XTP chipset [Chesson 87] is
a very specialized set of RISC processors designed to execute the XTP protocol. Our objective
was to investigate and exploit parallelism in many different protocols. Therefore we decided to
develop a general-purpose communication subsystem capable of supporting standard protocols
efficiently in software.

¹ Xpress Transfer Protocol and XTP are registered trademarks of XTP Forum.
`
In this paper our goal is to demonstrate that a careful implementation of a standard transport
protocol stack on a general-purpose multiprocessor architecture allows efficient use of the
bandwidth available in today's high-speed networks. As an example, we chose to implement the
TCP/IP protocol suite on our 4-processor prototype of the PPE.

We implemented the socket interface and a test application directly on the PPE to facilitate our
performance measurements. In this test scenario we analyze the performance of TCP/IP and the
socket layer. We also examined a second scenario to understand how our implementation would
perform when integrated into a workstation, where protocol processing up to the transport layer is
performed on the PPE and applications can access the transport service via the socket interface on
the workstation.

In Section 2 our hardware platform, the PPE, is presented. Section 3 introduces TCP/IP. In the
following section we explain our approach to parallel protocol implementation. Section 5
presents the results and discusses the impact of the hardware and software architecture on
performance. The last section gives the conclusion and an outlook on our future work.
`
2. THE PARALLEL PROTOCOL ENGINE

The PPE is presented only briefly here. It is described in greater detail in [Wicki 90] and
[Kaiserswerth 91, 92]. We will first concentrate on the hardware and then present the
programming environment.

The PPE is a hybrid shared-memory/message-passing multiprocessor. Message passing is used
for synchronization, whereas shared memory is used to store service primitives and protocol
frames. Figure 1 shows the architecture of the PPE and its use as a communication subsystem.

The PPE uses two separate memories, one for transmitting, one for receiving data. Both of these
memories are mapped into the address space of the workstation. In our implementation, four
T425 transputers [INMOS 89] are used as protocol processors. On each side of the adapter, two
T425s have access to the shared memory. Each processor uses private memory to store its
program and local data. We decided against using a single shared memory for storing both inbound
and outbound protocol data, although this would make the adapter more flexible and facilitate
programming, for the following reason. High-speed network interfaces work in a synchronous
fashion, with data being clocked in and out of memory, possibly at the same time, at the
transmission speed of the physical network. Splitting the adapter into separate receive and transmit parts
accommodates simultaneous transmission and reception and only requires memory with half the
speed of that required for a single-memory solution. This architecture results in significant cost
savings, especially when transmission speeds exceed 100 Mb/s.
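The factor of two in the memory-speed argument can be checked with a small calculation. The sketch below is illustrative only: it assumes that the network interface is the sole user of the buffer memory (processor accesses are ignored), and the function name and the per-direction rate of 120 Mb/s are chosen for the example rather than taken from the PPE design.

```python
def required_memory_bandwidth_mbps(line_rate_mbps: float, split: bool) -> float:
    """Peak bandwidth one buffer memory must sustain for network traffic alone.

    Assumption: data may be clocked in and out at the same time, each
    direction at the full line rate.
    """
    # A single shared memory carries both directions at once; with separate
    # transmit and receive memories each one carries a single direction.
    directions_per_memory = 1 if split else 2
    return line_rate_mbps * directions_per_memory


single_memory = required_memory_bandwidth_mbps(120.0, split=False)
split_memory = required_memory_bandwidth_mbps(120.0, split=True)
print(f"single memory: {single_memory} Mb/s, split memories: {split_memory} Mb/s each")
```

Under these assumptions each of the two memories needs half the bandwidth of a single shared memory, which is the saving the text describes.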
`
The network interface has read access to the transmit side and write access to the receive side of
the adapter. We emulate a physical network by means of an 8-bit-wide parallel interface, which
allows a point-to-point connection between two PPE systems operating with a bidirectional
transmission rate of up to 120 Mb/s. The transputer links are used exclusively for signalling and
control message transfer within the PPE and to and from the host system.

The programming language which best describes the transputer's programming model is OCCAM
[Pountain 88]. It is based on the theory of Communicating Sequential Processes (CSP)
developed by Hoare [Hoare 78]. The structuring elements are processes that communicate and
synchronize via messages. Message transfer is unbuffered: communicating processes must reach
a rendezvous before the message is copied directly from the sender's to the receiver's address
space. This behavior maps directly to the transputer's register model and microcode, which
support efficient context switches and transparent message passing via four external links and any
number of internal soft channels. However, because OCCAM discourages the use of pointers and
shared memory between different processes and offers very little support of user-defined
structured data types, we chose to do our implementation in the C programming language [LS-C 89].
Access to the transputer-specific facilities, such as synchronous message passing and process
control, is provided through library functions, which can, in part, also be generated as more
efficient inline code by the compiler.

[Figure 1. Architecture of the PPE. Diagram not reproduced; recoverable labels: Application,
Protocol Layers, Physical Layer, Micro Channel Interface, Shared TRANSMIT Memory,
Shared RECEIVE Memory.]
`
3. THE TCP/IP PROTOCOL STACK

We implemented the full TCP/IP protocol stack on the PPE. It consists of the Internet Protocol
(IP), the Internet Control Message Protocol (ICMP), and the Transmission Control Protocol
(TCP). Applications interface to the protocol implementation via sockets, similar to the BSD
version of Unix².

² Unix is a registered trademark of AT&T in the United States and other countries.
`
IP is a datagram protocol that implements functions similar to those of the OSI Connectionless
Network Protocol (CLNP). ICMP, which is an integral part of IP, is used to exchange control
messages between internet clients, e.g., it generates a destination unreachable message when the
addressing information in a received datagram does not allow forwarding or local delivery. TCP,
which roughly implements the ISO Transport Layer functions, provides an error- and
flow-controlled end-to-end transport connection between applications. TCP thus builds reliable data
transmission services on top of the unreliable IP datagram service. A TCP connection is specified
through the pair of Internet addresses and the TCP port identifiers of the two communicating
partners. The socket is the local end point of a TCP/IP connection. The application program accesses
sockets through local identifiers, similar to file descriptors in Unix.
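The way a connection is named and then reached through a small local identifier can be illustrated as follows. This is a minimal sketch and not the data structure used on the PPE: the names ConnectionId and SocketTable, and the example addresses, are hypothetical.

```python
from typing import Dict, NamedTuple


class ConnectionId(NamedTuple):
    """The pair of Internet addresses and the pair of TCP ports of one connection."""
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


class SocketTable:
    """Maps small integer identifiers to connections, much like file descriptors."""

    def __init__(self) -> None:
        self._sockets: Dict[int, ConnectionId] = {}
        self._next_id = 0

    def open(self, conn: ConnectionId) -> int:
        # The four values together must be unique; any single one may repeat.
        if conn in self._sockets.values():
            raise ValueError("connection already exists")
        sock_id = self._next_id
        self._next_id += 1
        self._sockets[sock_id] = conn
        return sock_id

    def lookup(self, sock_id: int) -> ConnectionId:
        return self._sockets[sock_id]
```

Two connections between the same hosts and the same remote port are distinct as long as the local ports differ, which is why all four values are needed as the key.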
`
As we did not want to implement TCP/IP from scratch, we based our work on a version of TCP/IP
for MS/DOS from the University of Maryland [UM 90].

4. PARALLEL IMPLEMENTATION OF TCP/IP

To develop a parallel solution one needs to partition the problem into a set of subproblems that can
be executed in parallel. The algorithms solving these subproblems are typically encapsulated in
cooperating processes which are mapped to the parallel-processor hardware. Depending on the
underlying hardware and the implementation model chosen, these processes communicate and
synchronize via shared memory or message passing.

[Figure 2. MS/DOS IP Process Structure. Diagram not reproduced; recoverable labels: Application,
Buffer, user_task, tcp_task, Transmission Control Protocol, Internet Protocol, Buffer, ip_send,
driver send, Network Adapter; legend: Procedure, Process.]
`
The source code, which served as a basis for this implementation, was already structured into
multiple processes that run on top of a simple, non-preemptive multitasking kernel. Figure 2
shows the original split into three processes and one interrupt service routine. Having such a
process structure allowed us to stay fairly close to the original source.

As we wanted to execute the IP layer on different processors than the TCP layer, we first isolated
the IP-relevant functions from both tcp_task and ip_task into separate processes. Because of the
functional division of the PPE into a transmit and receive side, we then split the remainder, i.e. the
core of the TCP protocol, of tcp_task and ip_task vertically into three processes (rtask and
tcp_recv running on the receive side, and xtask running on the transmit side). We will describe
the functions of the various protocol processes that implement IP, TCP and the socket layer in
turn. Figure 3 shows the high-level process structure we derived for our implementation.

[Figure 3. High-Level Process Structure. Diagram not reproduced; recoverable labels: Transmit
Side, Receive Side, Application, Socket Layer, Transmission Control Protocol, Internet Protocol,
Media Access Control, ip_demux, ip_intersvc, Network Adapter; legend: Procedure, Process.]

In the following we present our parallel solution in a top-down approach, first showing the
high-level process graph of the main processes in our implementation. These processes have access to
data shared between the transmit and receive side and can interact with one another via high-level
primitives such as remote procedure calls (RPC) and queues. In a second step, we will then show
how these services, in particular shared data between the receive and transmit side as well as
RPCs from the receive to the transmit side, have been realized on the PPE.
`
4.1 IP and ICMP
Because IP is a datagram protocol, the normal flow of data through IP in an end-system requires
no interaction between the receiving and transmitting part. Routing information and exception
handling, however, require a data exchange. The handling of exception and control messages is
the function of ICMP. We therefore partitioned IP into two independent processes, icmp_demux
and ip_demux. To guarantee the timely handling of incoming packets, we dedicated a separate
process on the receive side of the PPE to the handling of the physical network interface.

The routing table is shared between both processes on the transmit and receive side of the PPE. An
RPC is used if icmp_demux needs to send out an ICMP message.
`
4.2 TCP
Splitting the PPE hardware into a separate send and receive side had more impact on how we had
to deal with TCP, the socket layer, and application layer, than it had on IP.

We decided to split the finite state machine (FSM) responsible for implementing a TCP
connection into two separate FSMs once the connection is in the data phase. The actions of these FSMs
are implemented on the receive side through two processes, rtask and tcp_recv. On the transmit
side one process, xtask, implements the FSM. Owing to the duplex nature of TCP and the
piggybacking of control information in data packets, these processes need to share the protocol's send
and receive state variables maintained in the transmission control block (TCB).

tcp_recv demultiplexes incoming TCP segments, locates the appropriate TCB and executes the
required action for the FSM state. Header prediction is used to speed up packet handling for
packets arriving consecutively on the same connection. Correctly received segments are appended to
the receive queue and the application process waiting on this connection is then woken up to move
the data to its own buffers. When the received data exceeds the acknowledgement threshold,
which is specified as a percentage of the advertised receive window, tcp_recv makes an RPC to
the transmit side to generate an acknowledgement. The acknowledgement is sent as a separate
packet, unless this information can be piggybacked onto an outgoing data segment.
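The acknowledgement threshold rule can be sketched as follows. This is an illustration under stated assumptions, not the PPE code: the class ReceiveState, its field names, and the choice to reset the counter when an acknowledgement is requested are all hypothetical, and the RPC to the transmit side is reduced to a boolean return value.

```python
from dataclasses import dataclass


@dataclass
class ReceiveState:
    advertised_window: int   # bytes currently advertised to the peer
    ack_threshold_pct: int   # threshold as a percentage of that window
    unacked_bytes: int = 0   # bytes received since the last acknowledgement

    def on_segment(self, length: int) -> bool:
        """Record a correctly received segment.

        Returns True when the caller should ask the transmit side to
        generate an acknowledgement.
        """
        self.unacked_bytes += length
        threshold = self.advertised_window * self.ack_threshold_pct // 100
        if self.unacked_bytes > threshold:
            self.unacked_bytes = 0
            return True
        return False
```

With an 8192-byte window and a 50 percent threshold, a single 4096-byte segment does not yet exceed the threshold, while any further data does.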
`
rtask is driven by two timers, one responsible for delayed acknowledgements, the other for
keep-alive messages. In steady-state data transmission, rtask should never generate an
acknowledgement, as tcp_recv already generates acknowledgements while data are received. Only when the
timer runs out and new unacknowledged data have been received since the last acknowledgement
will rtask generate an acknowledgement. Similarly, keep-alive messages are also sent only when
no activity has taken place on a connection for some time. Again, both acknowledgements and
keep-alive messages are generated via RPCs to the transmit side.

On the transmit side the process xtask manages the transmit queue and the retransmission timers.
To send data, xtask creates the TCP header and fills in the necessary information from the TCB,
such as addresses and sequence numbers for the data and acknowledgements. The header and a
pointer to the data are then passed to the IP process (procedure ip_send), which embeds this
information into an IP datagram.
`
4.3 Socket Layer and Application
To facilitate our experiments with TCP/IP, we decided as a first step to implement the entire
socket layer as well as the test application on the PPE. A detailed description of the interactions
between an application on the host system and a protocol on the PPE can be found in
[Kaiserswerth 92].

In our implementation, the socket layer, although tightly coupled with TCP, is part of the
application process. It is accessed via a procedural interface, used to create a socket, bind an address to it
and establish a TCP connection with a remote socket. As the FSM logic to establish connections is
also part of tcp_recv, we decided to place the socket and the application code on the receive side.
Because we wanted to avoid moving data to be sent from the receive side to the transmit side via a
transputer link³, we also allow the application to use buffers on the transmit side of the PPE.
When data is to be transmitted, the send procedure simply makes an RPC with the buffer address
on the transmit side, thus causing the write process to copy the data from this application buffer to
the TCP send queue. When the application wants to receive data, the receive procedure checks
the receive queue for this connection and blocks the application process if the queue is empty.
`
4.4 Low-Level Primitives
Before giving an example of how TCP data segments are sent and received, we describe how we
maintain shared TCBs and routing tables on the PPE, which does not have shared memory
between its transmit and receive side, and how we realize RPCs from the receive to the transmit side.
Figure 4 shows the process graph of the additional processes required to implement these
functions.

[Figure 4. Low-Level Primitives. Diagram not reproduced; recoverable labels: Transmit Side,
Receive Side, rpc_process, write, peek_poke, rpc_demux, Any Process; legend: dedicated
Transputer Link, internal Channel.]

We implement distributed shared memory between the transmit and receive side by placing the
data structures that are to be shared at identical physical addresses in the local memories of the two
processors which access the data structure. Whenever a value is written onto the local copy of the
data structure, the address of the variable and its value are sent via a dedicated transputer link to a
server process, peek_poke, on the remote side. This process then updates the memory area
identified through the address with the accompanying value. The peek_poke processes run at high
priority to ensure that the exchange of a message with a remote process takes place immediately
and is not delayed by scheduling overhead, which would then also delay the remote process
because of the transputer's synchronous message passing. Serializing write accesses to the shared
data structures is not necessary in our case. Each replicated data structure falls into two parts, one
written only from the receive side (e.g., the updated transmit window), and the other written only
from the transmit side (e.g., the last send sequence number).

³ We measured an effective throughput rate of approximately 14 Mb/s across a transputer link, clearly much
lower than via our high-speed parallel interface.
`
Since we do not have a locking protocol for accessing shared data structures, it is possible that for
a brief period after the local update and before the remote update has been propagated, the same
field in the shared data structure contains two different values. Because of the properties of TCP
and the way we have split the protocol onto the transmit and receive side of the PPE, this
inconsistency will only be of importance if it is the reason for the protocol state to change. As an example
consider the following: assume the retransmission timer (it is also maintained in the TCB) in
xtask expires and, because the acknowledgement field in the TCB does not indicate reception of
an acknowledgement, xtask decides to retransmit the unacknowledged TCP segments. On the
receive side, however, an acknowledgement has been received in the meantime which makes this
retransmission unnecessary⁴. To avoid this problem, before actually going to a retransmit state,
xtask will reread the acknowledgement field, now however with the value on the receive side, to
make sure that a retransmission is warranted. Reading a remote field is similar to writing; a
message with the address and size of the variable is sent to the remote peek_poke process, which then
returns the value of that field.
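The replicated data structure and the reread before a retransmission can be modelled in a few lines. The sketch below is a simplified model, not the PPE implementation: field names stand in for physical addresses, a queue stands in for the transputer link so that the brief window of inconsistency becomes visible, sequence-number wraparound is ignored, and the names Side, connect, retransmission_warranted and acked_up_to are hypothetical.

```python
from collections import deque


class Side:
    """One side of the adapter, holding its own copy of the replicated structure."""

    def __init__(self, name, initial):
        self.name = name
        self.memory = dict(initial)   # local copy, keyed by field name
        self.inbox = deque()          # updates still in flight on the link
        self.remote = None            # set by connect()

    def write(self, field, value):
        """Update the local copy and send the update toward the remote side."""
        self.memory[field] = value
        self.remote.inbox.append((field, value))

    def run_peek_poke(self):
        """Server process: apply every update that has arrived (poke)."""
        while self.inbox:
            field, value = self.inbox.popleft()
            self.memory[field] = value

    def read_remote(self, field):
        """Fetch the value held on the side that writes this field (peek)."""
        return self.remote.memory[field]


def connect(a, b):
    a.remote, b.remote = b, a


def retransmission_warranted(tx, first_unacked_seq):
    """Decide on a retransmission, rereading the remote field before committing."""
    if tx.memory["acked_up_to"] > first_unacked_seq:
        return False                  # the local copy already shows the ACK
    return tx.read_remote("acked_up_to") <= first_unacked_seq
```

In this model the receive side is the only writer of acked_up_to, so no locking is needed; the transmit side merely risks seeing a stale value, and the reread closes that gap at the one point where a stale value would change the protocol state.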
`
RPCs from the receive to the transmit side have been implemented as follows: any process on the
receive side can format an RPC message, which is then sent via a dedicated transputer link to the
rpc_process. This process will then execute the remote procedure, or in the case of transmission
requests, pass the request via a local (internal) channel to the appropriate write process, one of
which exists for each TCP connection. Return values are sent, again via a dedicated transputer
link, back to the receive side to rpc_demux, which forwards these values over a local channel to
the process that had initiated the RPC. Upon receiving the return value, the caller becomes ready
again and can continue its execution.
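The RPC path can be sketched with threads and queues standing in for transputer processes and links. This is a rough model under stated assumptions: Python queues are buffered, unlike the synchronous transputer channels; the per-connection write processes are omitted; and the procedure name send_ack, the caller identifier, and the helper functions call and demo are hypothetical.

```python
import queue
import threading


def rpc_process(requests, replies, procedures):
    """Transmit side: execute each requested procedure and send back the result."""
    while True:
        msg = requests.get()
        if msg is None:               # shutdown marker, passed on to rpc_demux
            replies.put(None)
            break
        caller_id, name, args = msg
        replies.put((caller_id, procedures[name](*args)))


def rpc_demux(replies, callers):
    """Receive side: forward each return value to the process that made the call."""
    while True:
        msg = replies.get()
        if msg is None:
            break
        caller_id, result = msg
        callers[caller_id].put(result)


def call(requests, callers, caller_id, name, *args):
    """Format an RPC message, send it, and block until the return value arrives."""
    requests.put((caller_id, name, args))
    return callers[caller_id].get()


def demo():
    requests, replies = queue.Queue(), queue.Queue()   # the two dedicated links
    callers = {"tcp_recv": queue.Queue()}              # one local channel per caller
    procedures = {
        "send_ack": lambda conn, ack_no: f"ACK {ack_no} queued on connection {conn}",
    }
    workers = [
        threading.Thread(target=rpc_process, args=(requests, replies, procedures)),
        threading.Thread(target=rpc_demux, args=(replies, callers)),
    ]
    for worker in workers:
        worker.start()
    result = call(requests, callers, "tcp_recv", "send_ack", 1, 4096)
    requests.put(None)
    for worker in workers:
        worker.join()
    return result
```

The caller blocks in call until rpc_demux delivers the return value, which mirrors the description that the caller becomes ready again only upon receiving it.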
`
4.5 Example
Sending a TCP data segment: The normal data flow is shown in Figure 3. The send data are in a
remotely allocated buffer on the transmit side. The application creates a socket and establishes a
TCP connection. The socket send call causes an RPC to the remote write process which in turn
copies the data into the TCP send buffer. xtask then controls the transmission and eventual
retransmissions of the data. The send procedure builds the TCP segment and forwards the pointer to
the segment and the associated control block to ip_send. Here the IP header is placed in front of
the TCP segment and then the packet is sent to the network. The data is copied twice: first from the
application buffer to the send queue in shared memory and from there to the network.

Receiving a TCP data segment: Upon receipt the data is also copied twice: first from the network
to the receive queue and from there to the application buffer. The interrupt handler process serves
the physical interface and forwards pointers to received datagrams to ip_demux, which checks
the header and forwards the packet depending on its type to tcp_recv or icmp_demux.

tcp_recv analyzes the TCP header and calls the appropriate handler function for a given protocol
state. To send an acknowledgement or a control packet, tcp_recv uses RPCs to the transmit side.
Correctly received segments are appended to the receive queue. rtask wakes up the application
process which is blocked in the socket receive procedure. This procedure then fills the user buffer
with data from the receive queue.
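The dispatch step performed by ip_demux on the receive path can be sketched as below. This is a simplified illustration, not the PPE code: it parses only the IPv4 version, header length and protocol fields, skips the checksum and address checks that a real header check would include, and copies the payload by slicing where the PPE forwards pointers. The handler table is a hypothetical stand-in for the tcp_recv and icmp_demux processes.

```python
IPPROTO_ICMP = 1   # protocol numbers as assigned for IPv4
IPPROTO_TCP = 6


def ip_demux(datagram: bytes, handlers):
    """Pass the payload to the handler selected by the IPv4 protocol field.

    Returns the handler's result, or None if the datagram is dropped.
    """
    if len(datagram) < 20:                       # shorter than a minimal header
        return None
    version = datagram[0] >> 4
    header_len = (datagram[0] & 0x0F) * 4        # IHL is counted in 32-bit words
    if version != 4 or header_len < 20 or len(datagram) < header_len:
        return None
    protocol = datagram[9]                       # protocol field at byte offset 9
    handler = handlers.get(protocol)
    if handler is None:
        return None
    return handler(datagram[header_len:])
```

A caller would register one handler per protocol, for example {IPPROTO_TCP: tcp_handler, IPPROTO_ICMP: icmp_handler}, where both handler names are placeholders.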
`
⁴ Note: the logic of the protocol would allow for a retransmission in any case.
`
4.6 Configuration
On each side of the PPE only one of the two processors is physically able to control the interface to
the network. Thus we placed the device driver and the IP layer processes on those two processors.
TCP, the socket layer, and the application execute on the second processor on each side of the
PPE.

4.7 Memory Management
The buffer memory in the protocols and the socket is managed in an mbuf-like linked list. There is
only one buffer size to simplify these functions. The buffer size determines the maximum TCP
segment size. Provided there is sufficient physical memory (up to 4 MB on each side of the PPE),
large fixed-size buffers help avoid costly memory management functions. Buffer queues and the
free buffer list are protected by semaphores to serialize access to the data structures from different
processes on the same processor.

The data and control flow in the PPE is organized such that only one processor requests buffers
and the other only releases buffers. We ensure that one buffer element always remains in the
queue, thus one processor can always append to the end of the buffer queue and the other
processor can consume the first element without requiring any additional queue access protocol between
the two processors.
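A queue in which one element always remains can be sketched as a linked list with a placeholder node. The sketch below shows one common way to realize this idea and is not necessarily how the PPE queues are built: the class names are hypothetical, and Python is used only to show which end each party touches, without modelling memory ordering between two processors.

```python
class Buffer:
    """One fixed-size buffer element in the linked list."""

    def __init__(self, data=None):
        self.data = data
        self.next = None


class BufferQueue:
    """Single-producer, single-consumer queue that is never structurally empty."""

    def __init__(self):
        # The placeholder guarantees that head and tail always point at a node.
        self.head = self.tail = Buffer()

    def append(self, data):
        """Producer only: touches tail and the next field of the old tail."""
        node = Buffer(data)
        self.tail.next = node
        self.tail = node

    def consume(self):
        """Consumer only: touches head. Returns None when no data is queued."""
        nxt = self.head.next
        if nxt is None:
            return None
        data = nxt.data
        nxt.data = None        # the consumed node becomes the new placeholder
        self.head = nxt
        return data
```

Because the producer writes only the tail end and the consumer only the head end, the two never update the same pointer, which is what removes the need for an access protocol between them.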
`
5. PERFORMANCE

5.1 Test Setup
To measure the performance of the TCP/IP implementation we used a simple test driver running
on the PPE: a source process on one system that sends data over a socket and TCP/IP to a sink
process on the other system which receives the data. The setup is shown in Figure 5.

[Figure 5. Test Environment. Diagram not reproduced; recoverable labels: Source, Sink, and on
each system a Transmit Side (Socket/TCP) and a Receive Side (Application, Socket/TCP).]

As the final goal of this work is to offload protocol processing from the workstation to the adapter,
we examined the following two scenarios:

Scenario 1. The complete socket layer is implemented on the subsystem. Upon receip
