Evaluation of Bypass Architecture for High-Speed Bulk Data Transfer through Deeply Layered Protocol Stacks

by

Yong Hua Thia, BSc, MSc

A Thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Ottawa-Carleton Institute for Electrical Engineering,
Faculty of Engineering,
Department of Systems and Computer Engineering,
Carleton University,
Ottawa, Ontario, Canada, K1S 5B6
March 5, 1994

© Yong Hua Thia 1993
`
`Ex.1069.001
`
`DELL
`

National Library of Canada
Acquisitions and Bibliographic Services Branch
395 Wellington Street
Ottawa, Ontario
K1A 0N4
`
`
The author has granted an irrevocable non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.
`
`
The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without his/her permission.
`
`
ISBN 0-315-92944-8

This is an authorized facsimile, made from the microfilm master copy of the original dissertation or master thesis published by UMI.

The bibliographic information for this thesis is contained in UMI's Dissertation Abstracts database, the only central source for accessing almost every doctoral dissertation accepted in North America since 1861.

UMI® Dissertation Services
From: ProQuest COMPANY
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, Michigan 48106-1346 USA
734.761.4700
800.521.0600
web www.il.proquest.com

Printed in 2006 by digital xerographic process on acid-free paper
`
`
NOTICE

The quality of this microform is heavily dependent upon the quality of the original thesis submitted for microfilming. Every effort has been made to ensure the highest quality of reproduction possible.
`
`
If pages are missing, contact the university which granted the degree.
`
`
Some pages may have indistinct print, especially if the original pages were typed with a poor typewriter ribbon or if the university sent us an inferior photocopy.
`
`
Reproduction in full or in part of this microform is governed by the Canadian Copyright Act, R.S.C. 1970, c. C-30, and subsequent amendments.
`
ABSTRACT

Techniques for handling high data throughput in end-systems attached to data networks are the subject of this thesis. The advent of Fibre Optic technology, which offers high bandwidth and low bit error rates, has shifted the performance bottleneck in data communications from the channel to the processing in the end-points of the system. To alleviate the end-system bottleneck one may consider new protocols, improved software implementation of existing protocols, parallel processing techniques, special protocol structures, and hardware assist by offloading all or part of the protocol functions to an attached processor or adapter.

In this thesis, speedup in protocol processing is accomplished by using a combination of approaches based on general software performance engineering principles. A novel concept for protocol implementation, which is a generalization of Jacobson's "Header Prediction" algorithm for TCP/IP, is introduced. Bypassing identifies a fast path for bulk data transfer, by which frequent simple operations are found and optimized. We propose and evaluate three different methods for bypassing. In the first approach, throughput performance is increased by pure software improvements due to bypass. The second approach uses dedicated VLSI hardware for offloading protocol functions in the bypass path. A feasibility study was conducted using the industry standard VHDL language to model the bypass chip, which we call ROPE (Reduced Operation Protocol Engine). Results show that it is feasible to implement the bypass path for the Session and Transport layers of OSI protocols in VLSI with current gate array technology, and that ROPE is capable of supporting a very high data rate: 313 Mbps with TP4 checksum. Finally, in the third approach, a scalable packet-level architecture, POBA (Parallel Optimistic Bypass Architecture), is proposed and evaluated. It uses digital signal processing components which are readily available. A simulation using realistic instruction execution times was performed, and its results were validated with an analytical model. With POBA, a system executing the three layers (transport, session and presentation) can scale up to speeds close to the host memory bandwidth.
`

ACKNOWLEDGMENTS

This research was supported in large part by the Telecommunications Research Institute of Ontario (TRIO).

The debt of gratitude that a PhD student owes his advisor at the end of a project of this size is enormous. I am greatly indebted to Prof. Murray Woodside for his excellent guidance, continued encouragement, and generous support, as well as his extraordinary patience in improving my writing and communication skills.

Discussions with Craig Scratchley, Erik Dravnieks, Govindachari Raghunath, Reza Etemadi, Srinivas Vunnava, Weimin Ma, Yao Li and many others have been both invigorating and informative. Greg Franks helped me get started on the testbed implementation, and Curtis Hrischuk was always there to help with questions pertaining to PARASOL.

I would also like to thank Bell-Northern Research, which provided the VHDL tools and facility used in this study. I am particularly indebted to Dr. Simon Curry, Hemi Thakar, Dr. Parviz Yousefpour, Bernard Dnray, Mike Majid and Mustapha …ahla, who helped with the use of their tools and made my stay at BNR an enjoyable and memorable one.

I am grateful to all my friends and fellow students in the Systems and Computer Engineering department. In particular, I enjoyed playing squash and going on ski trips with Norm Lo, Francois Vien, Robert Dussault and Quyen Vi Bo. Special thanks are due to the office and support staff, Naren Mehta, John Pope, Dave Sword, Danny Lemay, Darlene Hebert, Elena Keen, Vivienne Gilchrist, and others who assisted in one way or another.

Last but not least, I am deeply grateful to my parents for their support and encouragement. My wife, Irene, and son, Brandon, endured years of sacrifice and hardship to see me through this program. I owe my family so much that I can find no words to express my deepest gratitude.
`

Contents

ABSTRACT
ACKNOWLEDGMENTS
List of Figures
List of Tables
1 Introduction
1.1 The problem statement and motivation
1.2 The proposed solution
1.3 Summary of contributions
1.4 Thesis Outline
2 Background
2.1 Introduction
2.2 OSI Architecture and layering
2.2.1 Why Layering?
2.2.2 Layering Principles
2.3 Overview of the Seven Layers of the OSI Architecture
2.4 Why are protocol implementations slow?
2.4.1 Processing and data path of a typical application PDU
2.4.2 Host Architecture
2.4.3 Adapter
2.4.4 Host/Adapter interface
2.4.5 Operating system support
2.4.6 Buffer Management
2.4.7 Timer Management
2.4.8 Software Implementation
2.4.9 Protocol structure and mechanism
2.5 Software Performance Engineering Principles
3 Review of the State of the Art in Performance Improvement of Upper Layer Protocols
3.1 Introduction
3.2 Lightweight Transport Protocols and their mechanisms to support high-speed networks
3.2.1 Summary of Transport Service Functions that affect performance of high-speed networks
3.2.1.1 Connection Management
3.2.1.2 Flow Control and Acknowledgment
3.2.1.3 Error Handling
3.2.2 Lightweight Transport Protocols
3.3 Special Implementation Techniques
3.3.1 Header Prediction
`
3.4 Hardware Implementation
3.5 Parallel Hardware Implementation
3.6 Special Architecture or Special Protocol Structures
3.6.1 Integrated Layer Processing
3.6.2 AXON
3.6.3 HOPS
3.6.4 Flexible High-Performance Communication Subsystem
3.7 High-Speed Presentation Layer
4 Problem Statement and Proposed Solution
4.1 Characterization of Gbps processing requirement
4.2 Goals of this thesis
4.3 State-of-the-art techniques from the SPE point of view
4.4 Characteristics of the Proposed Solution
4.5 Proposed Optimistic Principle
5 The Bypass Concept and Software bypass
5.1 Bypass Concept
5.2 Bypass Architecture and its Implementation
5.2.1 Send Bypass Test
5.2.2 Receive Bypass Test
5.2.3 Bypass Stack
5.2.4 Shared Data
5.2.5 Isolating Data Transfer phase from Connection Establishment and Release phase
5.3 Multiple-layer bypass
5.3.1 Difficulties of multiple-layer bypass
5.4 Bypass without window flow control
5.5 Bypass Algorithm with window flow control
5.6 Experimental Setup
5.6.1 Software structure
5.7 Results of Bypass Implementation
5.8 Summary
6 A Reduced Operation Protocol Engine (ROPE): Design and Implementation of Bypass chip
6.1 Introduction to VHDL concepts
6.1.1 Design
6.1.2 Architectural Description
6.1.3 Experimental Setup
6.1.4 Behavioral description
6.1.5 Extensions to include major procedures for Transport Class 4 (Implemented)
6.1.5.1 OSI Checksum
6.1.5.2 Timers
6.1.5.3 Retransmission and Resequencing
`
6.2 Results
6.3 Extensions to support Session BAS and BSS, and the Presentation layer (Not Implemented)
6.3.1 Session BSS and BAS
6.3.2 Presentation Layer processing
6.4 Summary
7 POBA - Parallel Optimistic Bypass Architecture
7.1 Description of POBA
7.1.1 Hardware Architecture
7.1.2 Sequence of Operations
7.1.3 Protocol processing at each PPN Node
7.1.4 Advantages of POBA
7.2 Characterization of processing requirements and throughput analysis of POBA
7.2.1 Workload Analysis for POBA
7.2.2 Performance bounds
7.2.3 Scalability
7.3 An implementation using standard off-the-shelf DSP microprocessors
7.3.1 Proposed Hardware for the adapter
7.3.2 Task Architecture
7.3.3 Simulation technique
7.4 Simulation results
7.5 Simulation versus analytical results
7.6 Bypass Issues
7.7 Predicted performance
7.8 Summary
8 Conclusions and Future Work
8.1 Conclusions
8.2 Advantages and What it solves
8.3 Future Work
8.3.1 New Applications for Bypass
8.3.2 Applying performance principles for fast connection setup in connection-oriented protocols
List of Abbreviations
Bibliography
Appendix A  Pseudo Code for bypass
Appendix B  TP4 Checksum
Appendix C  ASN.1 Basic Encoding Rule for Integers
Appendix D  Schematic Diagram Generated by SYNOPSYS synthesis tool (See 6.1.3)
Appendix E  Timeline Snapshot (See 7.3.3)
Appendix F  Simulation Results for POBA (See 7.4)
`

List of Figures

Figure 1  Open System Interconnection Reference Model
Figure 2  Layering Interface
Figure 3  Layer Interface
Figure 4  Data Movement in End-Systems
Figure 5  State of the Art Review
Figure 6  Zitterbart: Structural and Functional Parallelism
Figure 7  Jain, Schwartz and Bashkow: Packet level Parallel processing of TP4
Figure 8  Sterbenz and Parulkar: AXON Block Diagram
Figure 9  Haas: HOPS Architecture
Figure 10  Effect on throughput by varying X and B
Figure 11  Processor requirement for 1 Gbps protocol processing
Figure 12  Time Sequence Diagram
Figure 13  Finite State Machine of TP0/TP2
Figure 14  Bypass Architecture
Figure 15  Critical path taken by bypassed packets
Figure 16  Sender and Receiver Header Templates
Figure 17  ACK TPDU
Figure 18  Header Templates
Figure 19  Experimental testbed
Figure 20  Procedural Flow Chart of Protocol Testbed
Figure 21  Software and Hardware processing path of a non-bypass versus a bypass system
Figure 22  Block Diagram of VLSI bypass system
Figure 23  Design flow diagram
Figure 24  Organization of internal bypass chip memory
Figure 25  Activity management, Dialogue units and Synchronization points
Figure 26  Send Bypass Stack Parallel Architecture
Figure 27  Bus Utilization
Figure 28  Performance Bounds
Figure 29  Effects of varying X and B on throughput Bounds
Figure 30  Scalability of POBA with packet size (P) and PPN workload (X_P)
Figure 31  Software task architecture
Figure 32  3D plot of Throughput versus X and N (1 Kbyte)
Figure 33  Throughput versus N (1 Kbyte)
Figure 34  Throughput versus X (1 Kbyte)
`
Figure 35  Average PPN utilization versus N (1 Kbyte)
Figure 36  Average PPN utilization versus X (1 Kbyte)
Figure 37  Control Processor Utilization versus N (1 Kbyte)
Figure 38  Control Processor Utilization versus X (1 Kbyte)
Figure 39  POBA with 2 Processors
Figure 40  POBA with 4 Processors
Figure 41  POBA with 8 Processors
Figure 42  POBA with 10 Processors
Figure 43  Throughput versus X - Simulation results compared with analysis
Figure 44  Throughput versus N - Simulation results compared with analysis
Figure 45  Throughput versus message length - Simulation results compared with analysis
Figure 46  Effect of switch in processing paths, causing the bypass path to be flushed
Figure 47  Extrapolation of throughput bounds for improved technology
Figure 48  Optimistic Connection Setup
Figure 49  POBA under various Workloads
Figure 50  POBA with multiple active Connections
Figure 51  Effect of switch in processing path for POBA with N = 8
Figure 52  3D plot of Throughput versus X and N (4 Kbytes)
Figure 53  Throughput versus X (4 Kbytes)
Figure 54  Throughput versus N (4 Kbytes)
Figure 55  3D plot of Control Processor Utilization versus X and N (4 Kbytes)
Figure 56  Control Processor Utilization versus X (4 Kbytes)
Figure 57  Control Processor Utilization versus N (4 Kbytes)
Figure 58  3D plot of PPN Utilization versus X and N (4 Kbytes)
Figure 59  PPN Utilization versus X (4 Kbytes)
Figure 60  PPN Utilization versus N (4 Kbytes)
`

List of Tables

Table 1  Comparison of Lightweight Protocols
Table 2  Common procedures for Presentation, Session and Transport layers during data transfer phase
Table 3  Bypass of Session/TP0
Table 4  Bypass of Session/TP2
Table 5  Throughput Performance and gate count of bypass VLSI chip (ROPE)
Table 6  Partitioning of protocol functions to host, PPNs and the Control processor
`

Chapter 1

Introduction
`
Gigabit networks are currently the subject of much research attention [123]. Applications for future Gigabit per second (Gbps) networks will probably include multimedia communications, distributed high-speed real-time graphics, and support for the bulk transfer of scientific data, such as that of the High Energy Physics community [76].
`
1.1 The problem statement and motivation
`
The advent of Fibre Optic technology, which offers high bandwidth and low bit error rates, has shifted the performance bottleneck in data communications from the channel to the processing in the end-points of the system. The heavy processing load is due to a combination of operating system overhead, protocol complexity, and per-octet processing on the data stream. The operating system overhead includes non-protocol related processing such as context switching, interrupt handling, movement of data across layer boundaries, and the process scheduling associated with handling several concurrent streams of events. The protocol complexity is partly historical [105, 183] and partly the result of providing very flexible standard capabilities. The per-octet processing is found in checksums and presentation processing.
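
To see why end-system processing becomes the bottleneck, it helps to compute how little time a host has per packet at gigabit rates. The sketch below is a back-of-envelope calculation, not a result from this thesis; the 1 Gbps line rate, the packet sizes and the 25 MIPS processor rating are illustrative assumptions.

```python
# Illustrative per-packet time budget at a given line rate.
# The line rate, packet sizes and 25 MIPS rating are assumptions for illustration.


def per_packet_budget_us(line_rate_bps: float, packet_bytes: int) -> float:
    """Microseconds available per packet if the end system is to keep up with the link."""
    return packet_bytes * 8 / line_rate_bps * 1e6


def instructions_per_packet(mips: float, line_rate_bps: float, packet_bytes: int) -> float:
    """Instructions a processor rated at `mips` can execute within that budget.

    One MIPS is one instruction per microsecond, so MIPS x microseconds = instructions.
    """
    return mips * per_packet_budget_us(line_rate_bps, packet_bytes)


if __name__ == "__main__":
    for size in (128, 1024, 4096):
        print(f"{size:5d} bytes at 1 Gbps: {per_packet_budget_us(1e9, size):7.3f} us, "
              f"{instructions_per_packet(25, 1e9, size):7.1f} instructions at 25 MIPS")
```

For a 1 Kbyte packet at 1 Gbps the budget is about 8.2 µs, roughly 205 instructions on the hypothetical 25 MIPS processor, and that budget must cover protocol processing, interrupts, context switches and data copies together.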
`

The problem is even worse for the Session [35, 91, 98] and Presentation [99, 100] layers, which have a larger set of control Protocol Data Units (PDUs), each with varying data formats and optional control fields. For bulk data transfer, per-octet operations like presentation encoding/decoding and checksum processing will significantly affect throughput performance.
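
The per-octet cost of checksumming can be made concrete with the checksum used by ISO 8073 Class 4 (TP4), a Fletcher-style arithmetic checksum computed modulo 255. The Python sketch below follows that published algorithm; the function names are ours, and the code is meant to show the two additions performed for every octet rather than to be an efficient or certified implementation.

```python
from typing import Tuple


def fletcher_sums(data: bytes) -> Tuple[int, int]:
    """Running sums (C0, C1) modulo 255 over every octet of the TPDU."""
    c0 = c1 = 0
    for octet in data:            # two additions per octet: the per-octet cost
        c0 = (c0 + octet) % 255
        c1 = (c1 + c0) % 255
    return c0, c1


def insert_checksum(tpdu: bytearray, n: int) -> None:
    """Fill the two checksum octets that start at 0-based index n, in place."""
    tpdu[n] = tpdu[n + 1] = 0             # the field is zero while summing
    c0, c1 = fletcher_sums(tpdu)
    length, pos = len(tpdu), n + 1        # the algorithm counts positions from 1
    x = ((length - pos) * c0 - c1) % 255
    y = (c1 - (length - pos + 1) * c0) % 255
    tpdu[n] = x or 255                    # 255 is congruent to 0 mod 255
    tpdu[n + 1] = y or 255


def verify_checksum(tpdu: bytes) -> bool:
    """Accept the TPDU only if both sums are zero modulo 255."""
    c0, c1 = fletcher_sums(tpdu)
    return c0 == 0 and c1 == 0
```

For example, reserving two octets at index 6 of `bytearray(b"header\x00\x00user data")` and calling `insert_checksum(pdu, 6)` yields a buffer for which `verify_checksum` returns True, while flipping any single bit makes it return False.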
`
The performance of upper layer protocols also depends on the hardware they execute on. This includes the host processor, the memory, the bus and the network adapter. Current high performance workstation architectures have been optimized for data processing rather than data movement [51]. The RISC-based processors common in these workstations typically rely on cache memory to prevent "bubbles" in the processor pipeline. As host processing speed continues to outpace improvements in memory bandwidth, the penalty for access to main memory due to cache misses increases [71, 87]. Also, as the network bandwidth approaches the memory bandwidth, it is important to keep data movement on the workstations to a minimum, especially for bulk data transfer. The ideal situation would be to transfer the application data units directly from the user space to the adapter buffer, with minimal host intervention. This suggests that some protocol functions must be offloaded.
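
The effect of data movement can be estimated with a deliberately simple model in which host memory bandwidth is shared by every pass over the data. The function below is our illustration under stated assumptions: a CPU copy reads and writes each byte once, a separate read-only pass (for example a checksum loop) reads it once, a DMA transfer touches it once, and caching is ignored. The 800 Mbps memory figure in the note is hypothetical.

```python
def memory_bound_throughput(mem_bandwidth_mbps: float, copies: int,
                            read_passes: int = 0, dma_transfers: int = 1) -> float:
    """Upper bound on delivered throughput (Mbps) imposed by host memory bandwidth.

    Assumed cost model (ours): each CPU copy = 2 memory accesses per byte
    (read + write), each read-only pass = 1, each DMA transfer = 1; no caching.
    """
    accesses_per_byte = 2 * copies + read_passes + dma_transfers
    return mem_bandwidth_mbps / accesses_per_byte
```

With 800 Mbps of memory bandwidth, a path with three copies, one checksum pass and one DMA transfer is limited to 800/8 = 100 Mbps, whereas moving data directly from user space to the adapter by DMA leaves the full 800 Mbps available.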
`

The key problems associated with off-board processing of protocols include:

• It is often difficult to partition the functionality of protocols in a clear-cut way for off-board processing. Poor partitioning of functions leads to a complex protocol for the host software to communicate with the adapter, which may offset the potential gain in performance due to offloading [174].

• It is not sufficient to offload the protocol functions and hope to achieve a dramatic improvement in performance. Non-protocol related processing is a large part of the total load, as shown in [201]. Examples include interrupt handling, context switching and data copying at layer boundaries in deeply layered protocol stacks.

• It is important to consider multiple layers together [43]. However, doing so increases the complexity of the adapter, especially if the full protocol logic of all the layers is offloaded [21, 106].
`

• The hardware in the adapter might range from several microprocessors to single-chip VLSI implementations. Probably because of the complexity of existing protocols, VLSI [118] implementation above the data link layer has been disappointing so far. The choice of hardware depends largely upon the complexity of the functions supported by the adapter. In [13], where the entire transport protocol layer is offloaded, and in [106], where the full protocol stack can be offloaded, general purpose microprocessors are used. In [59], dedicated VLSI chips are used to support TCP checksums. Also, some newer lightweight transport protocols are designed for VLSI implementation [11, 41].
`
1.2 The proposed solution

This thesis uses a combination of approaches based on general software performance engineering principles to solve the problem described in the previous section. It employs the protocol bypass concept [210], which is a generalization of Jacobson's "Header Prediction" algorithm [107] for TCP/IP. Bypassing identifies a fast path for bulk data transfer, by which frequent simple operations are found and optimized. It addresses the problems associated with off-board processing by selecting for offloading a particular set of functions which are complete in themselves. The primary goal of this thesis is to verify and extend the bypass concept so that it can deliver very high throughput rates (up to the host bus/memory bandwidth) to the application program while still using standard heavyweight protocols. By this, we hope to achieve, first, a usable engineering approach and, second, a deeper understanding of performance issues, which would guide future designs of protocols and host system architectures.
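
A receive-side bypass test in the spirit of header prediction can be sketched as follows. It assumes a per-connection record holding the predicted values for the next data TPDU: when every comparison succeeds, the payload goes straight to the application; anything else is handed to the unchanged protocol stack. All names (`TpduHeader`, `BypassState`, `receive`) are hypothetical, the header is reduced to four fields, and 7-bit sequence numbers are assumed; this is an illustration, not the thesis's own bypass algorithm.

```python
from dataclasses import dataclass
from typing import Callable

SEQ_MODULUS = 128  # assumption: 7-bit TPDU sequence numbers (normal format)


@dataclass
class TpduHeader:
    tpdu_type: str     # "DT" for a data TPDU
    dst_ref: int       # destination reference identifying the connection
    seq: int           # TPDU sequence number
    has_options: bool  # optional parameters present


@dataclass
class BypassState:
    dst_ref: int
    expected_seq: int


def receive(hdr: TpduHeader, payload: bytes, state: BypassState,
            deliver: Callable[[bytes], None],
            full_stack: Callable[[TpduHeader, bytes], None]) -> str:
    """Receive bypass test: a few comparisons select the fast path."""
    if (hdr.tpdu_type == "DT"
            and hdr.dst_ref == state.dst_ref
            and hdr.seq == state.expected_seq
            and not hdr.has_options):
        # Predicted case: in-sequence data on this connection, nothing unusual.
        state.expected_seq = (state.expected_seq + 1) % SEQ_MODULUS
        deliver(payload)
        return "bypass"
    # Acks, out-of-sequence data, options and control PDUs use the full stack.
    full_stack(hdr, payload)
    return "full-stack"
```

For the sketch to stay correct, the full stack must also update `expected_seq` whenever it consumes a data TPDU itself; keeping such shared state consistent is part of the synchronization question raised below.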
`

This thesis also addresses the following questions associated with bypassing:

o How can performance in the bypass path be improved?
o How can bypass test overhead and synchronization between the bypass path and the standard protocol stack be minimized?
o What are the problems associated with multiple-layer bypass?
o Partitioning of functionality for bypass and its impact on performance.
`

Three approaches are proposed, each of which addresses one or more of the above questions. In the first approach, throughput performance is increased by changes to software alone. This study used an experimental protocol testbed, an algorithm for multiple-layer bypass with window flow control, and efficient logic for the bypass test [196]. It verified the bypass logic and concept.
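
The send side needs an equivalent test, and with window flow control the test must also check the credit granted by the peer. The class below is a minimal sketch under our own assumptions: 7-bit sequence numbers, a credit-based window of the kind carried in ISO transport AK TPDUs, a flag that the full stack is assumed to clear whenever the connection leaves the plain data-transfer state, and a list standing in for the network layer. It is not the multiple-layer algorithm evaluated in the testbed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SEQ_MODULUS = 128  # assumption: 7-bit TPDU sequence numbers


@dataclass
class SendBypass:
    credit: int = 4            # window granted by the peer's last AK TPDU
    next_seq: int = 0          # sequence number for the next DT TPDU
    window_lower: int = 0      # next sequence number the peer expects
    data_phase: bool = True    # cleared by the full stack on any state change
    outbox: List[Tuple[int, bytes]] = field(default_factory=list)  # stands in for the network layer

    def in_window(self) -> bool:
        # Offset from the lower window edge, taken modulo the sequence space.
        return (self.next_seq - self.window_lower) % SEQ_MODULUS < self.credit

    def send(self, data: bytes) -> bool:
        """Send bypass test; False means the caller must use the full stack."""
        if not (self.data_phase and self.in_window()):
            return False
        self.outbox.append((self.next_seq, data))
        self.next_seq = (self.next_seq + 1) % SEQ_MODULUS
        return True

    def on_ack(self, next_expected: int, credit: int) -> None:
        """Apply an AK TPDU: slide the lower edge and take the new credit."""
        self.window_lower = next_expected
        self.credit = credit
```

When `send` returns False, the caller falls back to the standard stack, which holds the data until an AK TPDU reopens the window.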
`

The second approach proposes the use of dedicated VLSI hardware for offloading protocol functions in the bypass path. Since the bypass system identifies a simple subset of protocol processing, it is more amenable to VLSI implementation than an entire protocol. This results in an architecture that offloads critical functions that are inefficiently handled by current host systems to dedicated VLSI hardware, leaving the host only a little header processing and a DMA transfer to the adapter, and requiring only a simple command protocol. A feasibility study was conducted using the industry standard VHDL language to model the bypass chip, which is called ROPE (Reduced Operation Protocol Engine) [195]. A behavioral model of the complete bypass system and a structural model of ROPE were constructed. Results show that it is feasible to implement the bypass path for the Session and Transport layers of OSI protocols in VLSI with current gate array technology, and that ROPE is capable of supporting a very high data rate.
`

Finally, in the third approach, a scalable packet-level Parallel Optimistic Bypass Architecture (POBA) [194] was designed and evaluated. Current off-the-shelf DSP microprocessors are used to evaluate the architecture experimentally with very realistic simulation, and the simulation results are compared with an analytical model. With POBA, a system using a small number of available DSP components can execute the three layers (transport, session, presentation) at speeds close to the host memory bandwidth [194]. The POBA architecture is less complex, with reduced processor contention for shared connection information. Hence its efficiency for parallel implementation is significantly higher compared to architectures that implement the full protocol logic [21, 106].
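
Why a packet-level parallel design can approach, but not exceed, the memory bandwidth can be seen from a generic bottleneck bound. The function below is our simplification, not the performance model developed for POBA in Chapter 7: it assumes N protocol processing nodes (PPNs) each spending a fixed time per packet, a single control processor with its own fixed per-packet cost, and a host memory limit that does not grow with N. The per-packet costs in the note are hypothetical.

```python
def throughput_bound_mbps(n_ppn: int, packet_bytes: int, t_ppn_us: float,
                          t_ctrl_us: float, mem_bound_mbps: float) -> float:
    """Smallest of three capacities; one bit per microsecond equals one Mbps."""
    bits = packet_bytes * 8
    ppn_capacity = n_ppn * bits / t_ppn_us   # packets spread over n_ppn nodes
    ctrl_capacity = bits / t_ctrl_us         # a single, unreplicated control processor
    return min(ppn_capacity, ctrl_capacity, mem_bound_mbps)
```

With 1 Kbyte packets, 50 µs per packet on each PPN, 5 µs on the control processor and an 800 Mbps memory limit, the bound grows linearly (655.36 Mbps with 4 nodes) until the memory limit is reached between 4 and 5 nodes, after which additional nodes add nothing.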
`

The bypass concept does not address the requirement for fast connection setup. We believe that the speed of processing connection packets is not the main issue in high-speed networks; rather, it is the number of message exchanges required between two application entities before a connection can be established.
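
The argument about message exchanges can be quantified with a one-line model: setup costs a number of round trips, after which the data moves at line rate. The function and the numbers in the note (a 10 ms round-trip time, a 1 Mbyte message, 1 Gbps, three setup round trips) are illustrative assumptions.

```python
def transfer_time_s(setup_round_trips: int, rtt_s: float,
                    message_bytes: int, rate_bps: float) -> float:
    """Time to establish a connection and then send one message at the full line rate.

    Simplified model (ours): propagation of the data itself and all
    processing times are ignored.
    """
    return setup_round_trips * rtt_s + message_bytes * 8 / rate_bps
```

Here each setup round trip (10 ms) already exceeds the 8 ms needed to send the whole message, so three round trips account for about 79% of the 38 ms total; faster processing of the connection PDUs would not change this.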
`

Recent projects on high-speed protocol processing have explored various approaches, each of which has a connection to this research:
`

• New "lightweight" protocols - Our approach can make standard protocols nearly as fast as these lightweight protocols, at least for bulk data transfer. In bypass, lightweight data transfer can be associated with a complete or heavyweight protocol for connections.

• Special software implementation techniques - Bypass reduces inter-layer overhead when it is extended to a multiple-layer protocol stack. Inter-layer processing overhead includes queue and buffer management, context switching and movement of data across layers, all of which are significant overheads in communications processing. The advantage of this work is the ability to integrate the critical processing functions of a deeply layered protocol stack without a major redesign of current standards.

• Hardware implementation - The proposed architecture moves all the per-octet processing to the adapter, leaving the host only a little header processing and a DMA transfer to the adapter, and requiring only a simple command protocol.

• Parallel hardware implementation - The bypass approach is a good way to organize the processing within the parallel hardware architectures described by Jain et al. [111] and Goldberg et al. [82], and will significantly enhance their proposals. Bypass reduces the complexity of protocol processing, resulting in a significant reduction in processor contention for shared connection information.

• Special architecture and protocol structures - Multiple-layer bypass merges adjoining layers to form a "super-efficient stack" during the data transfer phase. The functionality of these layers can then be organized for optimal performance.
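
Merging layers during the data transfer phase allows the headers of all bypassed layers to be prepared once per connection and reused, with only the per-packet fields patched. The class below sketches such a combined header template. The layout is a simplified, hypothetical one: an opaque per-connection session prefix followed by a fixed part loosely modelled on the ISO 8073 normal-format DT TPDU (length indicator, DT code 0xF0, two-octet destination reference, and one octet carrying the EOT bit and a 7-bit sequence number), with no checksum parameter.

```python
import struct


class HeaderTemplate:
    """Combined session + transport header built once per connection (hypothetical layout)."""

    def __init__(self, session_prefix: bytes, dst_ref: int):
        # Fixed DT fixed part: LI = 4, code 0xF0, DST-REF (network byte order), NR/EOT octet.
        self.template = bytearray(session_prefix + bytes([4, 0xF0])
                                  + struct.pack("!H", dst_ref) + b"\x00")
        self.nr_offset = len(self.template) - 1   # the only octet that changes per packet

    def build(self, seq: int, eot: bool) -> bytes:
        hdr = bytearray(self.template)            # copy the precomputed octets
        hdr[self.nr_offset] = (0x80 if eot else 0) | (seq & 0x7F)
        return bytes(hdr)
```

Building a header then costs one short copy and one store, instead of running the encoding logic of each layer for every packet.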
`

The main difference between this work and other ideas in the literature is that it addresses the processing requirements of a multiple-layer protocol stack by efficiently offloading per-octet protocol related operations and minimizing the key non-protocol related per-packet operations. It succeeds by isolating functions that are typically handled inefficiently by
