`Kay Harlow

PROVISIONAL APPLICATION COVER SHEET

CERTIFICATE OF EXPRESS MAILING
I hereby certify that this paper and the documents and/or fees referred to as attached therein are being deposited
with the United States Postal Service on November 3, 1999 in an envelope as "Express Mail Post Office to
Addressee" service under 37 CFR § 1.10, Mailing Label Number EL412812104US, addressed to the Assistant
Commissioner for Patents, Washington, DC 20231.

`Attorney Docket No.: ADAPP085A+
`First Named Inventor: WILSON, Andrew
`
`Assistant Commissioner for Patents
`Box Patent Application
`Washington, DC 20231
`
`D Duplicate for
`fee processing
`
`Sir:
`
This is a request for filing a PROVISIONAL APPLICATION under 37 CFR 1.53(c).

INVENTOR(S)/APPLICANT(S)

LAST NAME: WILSON
FIRST NAME: Andrew
MIDDLE INITIAL:
RESIDENCE (CITY AND EITHER STATE OR FOREIGN COUNTRY): San Jose, CA

TITLE OF INVENTION (280 characters max)

SCSI Over Ethernet
`
`CORRESPONDENCE ADDRESS
`Albert S. Penilla, Esq.
MARTINE PENILLA & KIM, LLP
`710 Lakeway Drive, Suite 170
`Sunnyvale, California 94086
`(408) 749-6900

ENCLOSED APPLICATION PARTS (check all that apply)

_x_ Specification    Number of Pages 74 (including drawings)
___ Drawing(s)    Number of Sheets _____
___ Small Entity Statement
___ Other (specify) ________

_x_ A check or money order is enclosed to cover the Provisional filing fees.
Provisional Filing Fee Amount ($) 150.00
_x_ The commissioner is hereby authorized to charge any additional fees which may be required or
credit any overpayment to Deposit Account No. 50-0805 (Order No. ADAPP085A+).

The invention was made by an agency of the United States Government or under a contract with an agency of the United States Government.
_x_ No
___ Yes, the name of the U.S. Government agency and the contract number are:
`
Respectfully Submitted,
SIGNATURE [signed]
TYPED NAME Albert S. Penilla
DATE November 3, 1999
REGISTRATION NO. 39,487
`
`PROVISIONAL APPLICATION FILING ONLY
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1041, p. i
`
`
`
`eSCSI
`
`SCSI over Ethernet
`eSCSI
`
`Inventor
`Andrew Wilson
`
`1 Introduction
`
`This document will show how adding an appropriate light weight protocol to Gigabit Ethernet can turn it into a high
`quality disk subsystem interconnect.
`
`1.1 Scope and Requirements
Because Ethernet has price and performance points which cover everything from homes and small offices to the
largest computer rooms, an Ethernet based storage subsystem could find applicability in the same wide range of
settings. However, there are some differences in the requirements, which might preclude an eSCSI designed for one
end of the scale from working at the other. For the desktop and small server market, the basic requirement is to
connect local storage class devices to computer systems, especially when the devices are housed outside of the main
computer case. For the large server system market, the requirements of storage subsystem sharing and interprocess
communication over campus area networks are added. The specific requirements are detailed in the next two
subsections.
`
1.1.1 Desktop and small system requirements

1. Low latency, on the order of a couple hundred microseconds or less

2. Maximum distances of a few hundred meters

3. For disk storage, needs to support bandwidths up to those of small RAID boxes

4. For desktop connectivity, needs to support bandwidths of tapes, printers, scanners, etc.

5. CPU utilization on the order of current storage systems

6. Data must be delivered completely and without corruption.

7. Use off-the-shelf Ethernet components

8. Needs to configure automatically, i.e. can function satisfactorily with little or no system administrator
intervention

9. Low cost, especially for desktop connectivity

10. The ability to jointly operate with other protocols is highly desirable.
`
1.1.2 Large System Requirements
`
`1. Low latency, on the order of a couple hundred microseconds or less
`
`2. Maximum distances of a few kilometers
`
`3. Needs to support bandwidth of leading edge RAID boxes for at least the next 5-10 years
`
`4. Needs to support interprocess communication paradigms
`
`5. CPU utilization on the order of current storage systems
`
`6. Data must be delivered completely and without corruption.
`
`7. Able to use off-the-shelf Ethernet switches and IP routers
`
`8. Needs easy management by system administrator
`
`9. Needs to be cost competitive with other storage network technologies
`
`10. The ability to jointly operate with other protocols is highly desirable.
`
1.1.3 Requirements discussion

As you can see, there are both similarities and differences between the two sets of requirements. Fortunately,
Ethernet offers a range of compatible technologies that can cover the requirements of both small and large systems.

Requirement 1 of each list is easily met using today's Ethernet switches and switching routers. Typical values for
latency are 3-20 microseconds for commercially available Gigabit Ethernet switches. Even with several switches
and a couple hundred meters of cable, the round trip delay will be less than 100 microseconds in the absence of
congestion.

Requirement 2 for small systems can be met with just CSMA/CD Ethernet, which can span 500 meters with fiber
optics, or 200 meters diameter with copper. With full duplex, multi-mode optical fiber cables can handle several
kilometers between switches, so cross country distances are theoretically possible. Thus switch (or router) based
networks can handle the requirements of both small and large systems.
`
`The bandwidth requirements imposed by future disk drives (#3) are difficult to predict, since they depend on
`advances in recording density and mechanical speed of the disk drives, as well as the access patterns of future
`applications. Figure 1 shows projected access times and sequential access rates for disk drives, assuming historical
`annual improvement rates continue. For applications which access data sequentially, or in very large random reads,
`the projections show that a single drive will be able to nearly saturate a gigabit interconnect. On the other hand,
`transaction processing applications will still be limited by disk access time to about 350 IOPS per drive, or one to
`five megabytes a second depending on request size. Even at five megabytes a second, over 20 drives could be
`supported by a single gigabit link. Thus, gigabit links should be adequate for small server installations, while large
`systems will need a mixture of 1 and 10 gigabit links.
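
The drive-count estimate above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: the 350 IOPS figure and the five megabytes a second case come from the text, while the request sizes (4 KB to 16 KB) and the use of the raw gigabit line rate with no protocol overhead are assumptions made for illustration.

```python
# Back-of-envelope check of the transaction-processing estimate.
# Assumption: raw gigabit line rate, no framing or protocol overhead.
GIGABIT_LINK_BYTES_PER_SEC = 1_000_000_000 / 8


def per_drive_throughput(iops: float, request_bytes: int) -> float:
    """Bytes per second one drive moves at a given request size."""
    return iops * request_bytes


def drives_per_link(per_drive_bytes_per_sec: float,
                    link_bytes_per_sec: float = GIGABIT_LINK_BYTES_PER_SEC) -> int:
    """Whole number of drives whose combined traffic fits on one link."""
    return int(link_bytes_per_sec // per_drive_bytes_per_sec)


if __name__ == "__main__":
    # Request sizes are illustrative assumptions, not values from the text.
    for request_kb in (4, 8, 16):
        rate = per_drive_throughput(350, request_kb * 1024)
        print(f"{request_kb} KB requests: {rate / 1e6:.1f} MB/s per drive")
    print("drives per gigabit link at 5 MB/s:", drives_per_link(5_000_000))
```

Under these assumptions the per-drive rate stays within the one to five megabytes a second range for small requests, and a raw gigabit link has room for a little more than 20 drives at the high end of that range.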
`
For the types of traffic discussed in requirement 4 of the small system section, 100BaseT will often be adequate,
and should be supported along with Gigabit. Large systems will have to support interprocess communication traffic,
which consists of fairly short, time sensitive packets. Thus, Gigabit speeds are fine, but low latency and low overhead
protocols are required to support the IPC.
`
`For large systems with external RAID boxes, the new 10 Gigabit Ethernet will be required. Interprocess
`communication doesn't require large bandwidths, but does require very low latencies (the lower the better) and
`efficient transport of short packets. This argues for a light weight transport protocol, but also argues for QOS and
`priority features such as found in IP version 6.
`
`In order to equal the low CPU utilization of current host adapters (requirement #5 on both lists), the transport
protocol will have to run onboard the eSCSI host adapter, ideally with special purpose hardware. To keep the
`hardware from becoming too complicated and expensive, a protocol with much less complexity than TCP/IP must
`be used. Such a protocol will be proposed later in this document.
`
Advocates of other storage interconnect proposals often cite the requirement for error free and complete data
transfer (#6) as a reason not to use Ethernet. It is true that the basic Ethernet protocol is that of an unreliable
datagram, but addition of any of a number of transport layer protocols can produce a reliable data conduit quite
suitable for storage applications. Actually, all other storage interconnects rely on similar mechanisms to mask the
occasional data corruption that occurs with any physical interconnect medium, so the differences between Gigabit
Ethernet (especially full-duplex Gigabit Ethernet) and other storage interconnects are not as large as they might at
first appear.
`
`This document will describe a transport protocol which provides reliable, in-order delivery between systems. Using
`this protocol, SCSI commands and data can be transferred reliably between initiators and targets.
`
[Plot not reproduced: average access time (ms) and sustained sequential rate, by year from 1998 to 2005.]

Figure 1: Projected advances in hard disk drives
`
In a contention and/or level 2 switched environment, requirement seven simply means that we use the basic Ethernet
(original or 802.3 version) to encapsulate our eSCSI traffic. If routers are present then IP version 4 or 6 must also be
used.
`
`The requirement for small systems of automatic configuration (#8) can be readily satisfied by using switches over
`plain (no IP) Ethernet with the addition of an automatic discovery protocol (discussed later in this document). Larger
`systems will have system administrators who will want to manage certain aspects of the storage network, which can
be done through use of IP and existing IP based tools. We may need proxy interfaces to eSCSI's native protocols,
`but this can be easily done.
`
For desktop systems, the low cost requirement (#9) can be easily satisfied with 100BaseT and 1000BaseT over
Cat 5 UTP wiring. Larger systems will want to use fiber optics for longer distance Gigabit runs, and for ten Gigabit
links. The use of off-the-shelf Ethernet switches and wiring will produce substantial cost savings due to the large
volumes associated with the use of such equipment for networks.
`
`Basic shared media Ethernet (either hub or cable) is completely oblivious to the data it is transporting, and
`propagates the data between stations (e.g. NICs) using addressing information contained in its own frame header.
Thus, any number of protocols, including any we might invent, can be used to package the user's data. Routers, on
`the other hand, usually need to understand portions of those protocols in order to determine the appropriate path for
`the packets. Thus, routers only work with a few standard protocols, such as IPX and IP.
`
`Between shared media and routers lie switches, which provide much of the bandwidth multiplication of routers but
`can use the addressing information contained in the Ethernet frame header to pass packets on to their correct
`destinations. Thus, Ethernet switches can work with private protocols just as well as plain Ethernet and hence meet
`requirement number ten.
`
When mixing several protocols on the same wire, however, requirement ten is a little more complicated to achieve.
The original Ethernet specification dedicated two bytes in the header to a type field, allowing over 65,000 different
protocols to co-exist on a single Ethernet. At this time there is still plenty of room for additional protocols. However,
IEEE 802.3 changed that field to a length field, requiring other means to distinguish between protocols. Their
solution is the addition of eight more bytes of information (3 bytes of 802.2 LLC plus 5 bytes of SNAP) to specify
which protocol is encapsulated within a given frame. If we want to be fully compatible with 802.3, we will have to
use these eight bytes as well.
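
To make the eight-byte difference concrete, the two framing options can be sketched as follows. This is an illustration, not a definition of the eSCSI frame format: the protocol number `ESCSI_ETHERTYPE` is a placeholder (an IEEE local experimental value), and the helper names are invented for this example. The LLC values (0xAA, 0xAA, 0x03) and the zero OUI are the standard SNAP convention for carrying an Ethernet type value.

```python
import struct

# Placeholder protocol number: an IEEE "local experimental" EtherType,
# NOT an assigned eSCSI value.
ESCSI_ETHERTYPE = 0x88B5
SNAP_OUI = b"\x00\x00\x00"  # zero OUI: the SNAP protocol ID is an EtherType


def ethernet_ii_header(dst: bytes, src: bytes, ethertype: int) -> bytes:
    """Original Ethernet: 6-byte dst, 6-byte src, 2-byte type (14 bytes)."""
    return dst + src + struct.pack("!H", ethertype)


def ieee_802_3_snap_header(dst: bytes, src: bytes, ethertype: int,
                           payload_len: int) -> bytes:
    """802.3: the 2-byte field is a length; LLC (3) + SNAP (5) follow."""
    llc = bytes([0xAA, 0xAA, 0x03])  # DSAP, SSAP = SNAP; unnumbered info
    snap = SNAP_OUI + struct.pack("!H", ethertype)
    length = len(llc) + len(snap) + payload_len  # counts LLC/SNAP + payload
    return dst + src + struct.pack("!H", length) + llc + snap


if __name__ == "__main__":
    dst = bytes.fromhex("020000000001")  # made-up locally administered MACs
    src = bytes.fromhex("020000000002")
    plain = ethernet_ii_header(dst, src, ESCSI_ETHERTYPE)
    snap = ieee_802_3_snap_header(dst, src, ESCSI_ETHERTYPE, payload_len=100)
    print(len(plain), len(snap), len(snap) - len(plain))
```

The only difference a receiver sees is where the protocol number lives: directly in the two-byte field for original Ethernet, or at the end of the LLC/SNAP bytes for 802.3.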
`
`In summary, as long as the scope of our eSCSI proposal is limited to desktop, computer room or campus networks,
the requirements can be easily met with a combination of 100BaseT, Gigabit and 10 Gigabit Ethernet and an
`appropriate transport protocol. We will now proceed to develop such a protocol.
`
`
`1.2 Topologies
Implementations of eSCSI can be developed for both the desktop connectivity market and server storage market.
The desktop products will use 100BaseT and some Gigabit to connect to peripherals and expansion drives through
point-to-point links or switches. The server products will use switches and routers interconnected with one and ten
gigabit Ethernet to connect multiple hosts to disk and tape farms, and provide host to host communication.
`
`1.2.1 Desktop Connectivity
`
The desktop topologies start with a point to point link connecting an eSCSI PIC to an eSCSI target, as depicted by the
solid lines in Figure 2. Expansion to more peripherals can follow either of two paths. First of all, any eSCSI port can
use Ethernet hubs (or switches) to allow connection to additional targets. Thus, even a single port eSCSI PIC can
connect to a nearly unlimited number of targets, provided that bandwidth limits are not exceeded. Secondly, Adaptec
can produce a multiport eSCSI PIC allowing several (e.g. 4) targets to be connected without an external hub. The
multiport eSCSI PIC could simply have a small hub built into the card, or it could actually have independent ports,
providing additional bandwidth. If the independent port approach is used, then port aggregation can be used to
provide higher bandwidth to high performance eSCSI targets. Finally, Figure 2 also illustrates how multiple hosts
could share peripherals when a hub is employed.
`
[Diagram not reproduced. Recoverable labels: Host, SCSiNet PIC, Hub, SCSiNet Target, Bandwidth.]

Figure 2: Possible desktop topologies for eSCSI

One problem with point-to-point links that do not involve a hub or a switch is that an Ethernet crossover cable is
required. This is because traditional Ethernet NICs have their receive and transmit wires on the opposite ends of the
connector from those of the hubs. Since networks seldom consist of just two NICs, there usually is a hub or switch
in between, allowing straight through cables to work just fine.

However, an eSCSI PIC with multiple ports would normally be connected to just one target per port. Since target ports
would have to be wired the same as PIC ports (to allow connection to hubs and switches), a crossover cable is
required. The other option is to make eSCSI PIC ports configurable (hopefully automatically) as "hub" type ports or
"NIC" type ports. How feasible this is has not yet been determined.
`
`1.2.2 Combined with Network
`
While it would be simplest to have eSCSI separate from communications networks, there is a significant appeal to
having a single, combined function Ethernet connector on the desktop computer. Even if several eSCSI connectors
are provided, being able to use them for either external network or eSCSI, without regard to which connector is used
for which purpose, is a strong selling point.
`
For example, a single computer with a PIC/NIC with two or more independent ports might dedicate one to an external
`network connection, and the other(s) to eSCSI devices. If several computers were involved, and they shared both a
`network connection and several desktop peripherals, then connecting them through a single small hub would be
useful. Both of these configurations are illustrated in Figure 3. Also shown in Figure 3 is an optional bridge to
partially isolate the internal network from the external. Such a bridge might be combined with the hub, and provide
level 2 bridging for traditional network traffic, and firewall services to prevent eSCSI traffic from crossing between
the external network and the internal (in both directions, to prevent a rogue external eSCSI packet from interfering
with normal storage activities).
`
[Diagram not reproduced. Recoverable labels: Host, SCSiNet PIC/NIC, External Network, SCSiNet Bridge/Firewall, Hub, SCSiNet Target.]

Figure 3: Combined Storage and Network Functions
`
1.2.3 Big System

[still to come]
`
`2 Major Decisions
`2.1 Standard or Custom Protocol
Probably the best-known transport protocol at this time is the Internet's Transmission Control Protocol (TCP). Using
TCP would allow our protocol to function across the Internet, and ultimately let hosts send SCSI commands to
targets halfway around the world. It would also make some IT managers happier if the "Standard Network" protocol
were used. However, these minor advantages are offset by some serious efficiency and performance issues associated
with TCP.
`
Transport protocols designed for the Internet (i.e. TCP) have to tolerate a large variety of errors, including
corrupted, delayed, out-of-order, duplicate and dropped packets. The mechanisms used require larger headers than a
protocol operating in a friendlier environment. In addition, TCP requires the use of IP underneath it, which doubles
the number of bytes of header overhead. On top of that, TCP has inadequate sequence space for gigabit speed
transfers, which requires additional extensions to correct. The current plan is to add 12 more bytes to solve the
sequence space problem. This gives a total header overhead of 52 bytes per TCP packet. Finally, there is often
another 8 bytes allocated for link level control, resulting in 60 bytes of overhead.
`
`Even 60 bytes of overhead is not that bad, given that storage traffic in modern systems tends to be 2K or greater
`transfers, and hence will have a large percentage of completely full packets. Figure 4 shows the raw efficiency (not
`including the effect of SCSI related overhead) possible as a function of the transport protocol header for various
`average packet sizes.
`
`
[Plot not reproduced: efficiency (0% to 100%) against average packet size (250 to 1500 bytes, including header), with one curve per header size in bytes.]

Figure 4: Effect of transport header size on Ethernet efficiency
`
Storage traffic with 4KB requests will have an average packet size between 1000 and 1500 bytes, giving a raw
efficiency of 91% to 93% for the 60 byte TCP/IP with extensions, versus 95% to 96% for a hypothetical 20 byte
header. Since adding the SCSI layer will further reduce efficiency (defined as storage data bytes / total network
byte-times used), we are actually talking about giving up approximately 10% of bandwidth to protocol for TCP/IP
versus only 5% for a leaner protocol. Note that even with no transport header, the efficiency is only about 97%, so a
95% efficient protocol is close to the best that can be done.
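
The raw efficiency comparison can be approximated with a simple model. The sketch below assumes a fixed per-frame Ethernet cost of 38 byte-times (preamble, MAC header, frame check sequence and interframe gap) and treats the whole 60 byte figure as transport header; the text does not state which overheads Figure 4 includes, so this model is an assumption and lands within about one percentage point of the quoted values rather than matching them exactly.

```python
# Assumed per-frame Ethernet cost in byte-times:
# preamble/SFD 8 + MAC header 14 + frame check sequence 4 + interframe gap 12.
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12

# Header budget from the text: TCP 20 + IP 20 + extensions 12 + link control 8.
TCP_IP_EXTENDED = 20 + 20 + 12 + 8


def raw_efficiency(avg_packet_bytes: int, transport_header_bytes: int) -> float:
    """Payload bytes divided by total byte-times used on the wire."""
    payload = avg_packet_bytes - transport_header_bytes
    return payload / (avg_packet_bytes + ETHERNET_OVERHEAD)


if __name__ == "__main__":
    for header in (0, 20, TCP_IP_EXTENDED):
        low = raw_efficiency(1000, header)
        high = raw_efficiency(1500, header)
        print(f"{header:2d} byte header: {low:.1%} to {high:.1%}")
```

The shape of the result is what matters here: with large packets the gap between a 20 byte and a 60 byte header is a few percent, and even a zero byte header cannot reach 100% because of the fixed framing cost.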
`
`There are also performance issues due to the complexity of TCP. TCP was designed for the fully routed, globe
`spanning Internet, where packets can be received significantly out of order, where round trip delays vary widely, and
`where each packet may traverse a widely varying collection of links. While TCP succeeds at its design goal, many
`of its features represent overkill in a pure LAN environment.
`
`For example, TCP relies on timeouts to detect lost or damaged packets. For efficient operation, the timeouts must be
`a small multiple of round trip delay. On the Internet, such delays can vary over several orders of magnitude, even
`between the same pair of hosts. Thus, TCP needs to dynamically adapt to these varying delays. On a LAN, you get
`less variability of round trip delay, plus in-order packet delivery. This allows you to avoid the continuous round trip
`time measurement and dynamically adjustable timeouts of TCP.
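
For contrast, here is roughly what the two timeout strategies look like. The adaptive estimator follows the well-known smoothed round trip time scheme used by TCP (in the simplified form later written down in RFC 6298, omitting clock granularity and the minimum timeout floor). The fixed LAN timeout is only an illustration of the idea in the paragraph above: the multiplier of 3 and the 100 microsecond bound are assumptions, not values specified for the protocol.

```python
from typing import Optional


class AdaptiveTimeout:
    """TCP-style estimator: smoothed RTT plus four times its variation."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation

    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def update(self, sample: float) -> float:
        """Feed one RTT measurement (seconds); return the new timeout."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt + 4 * self.rttvar


def fixed_lan_timeout(worst_case_rtt: float, multiple: float = 3.0) -> float:
    """Static timeout: a small multiple of a known LAN round trip bound."""
    return worst_case_rtt * multiple


if __name__ == "__main__":
    estimator = AdaptiveTimeout()
    for rtt in (0.020, 0.180, 0.035, 0.400):  # widely varying Internet RTTs
        print(f"sample {rtt:.3f}s -> timeout {estimator.update(rtt):.3f}s")
    print(f"LAN timeout: {fixed_lan_timeout(100e-6) * 1e6:.0f} microseconds")
```

The adaptive version needs a measurement and several multiply-accumulate steps per acknowledged packet, which is exactly the per-packet work a hardware implementation on a LAN can skip.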
`
Is a reliable transport protocol with only 20 byte headers possible? In a LAN environment, yes, because the kind of
badly delayed, out-of-order packets that TCP must protect against cannot happen. In addition, a LAN protocol can
take advantage of the addressing mechanism built into Ethernet, to avoid the need for the IP layer. Because the
above efficiency analysis indicates low overhead is important, we will develop a low overhead, storage centric
protocol for use with STP. But if the "industry" demands use of TCP, it can still be done, but with potentially higher
cost or lower performance.
`2.2 Use of Internet Protocol (IP) Headers
Although it is possible to use STP without IP, there are a number of reasons to run STP on top of IP anyway. One
`reason is that corporate network folks are fond of using IP to segment their networks. Also, many IP based network
`management tools and extended capabilities have been developed, which can be used with eSCSI, so long as it rides
`on top of IP. So, it is important to consider how STP can be used on top of IP and what complications this causes.
`
`2.2.1 Creates opportunities
`
1. A major industry initiative, dubbed Universal Plug and Play (UPNP), requires IP as its foundation. So using IP
`packets for all eSCSI traffic enables direct implementation of UPNP for eSCSI.
`
`2. There are many IP based management tools already out there that we could leverage if we used IP. These could
`allow statistics gathering, network monitoring, and bandwidth reservation for example.
`
3. IP can be used to segment networks, possibly providing some security for local eSCSI devices, such as a
server's directly attached storage.

4. IP provides the flexibility to travel beyond the immediate campus, and may even be required for travel within a
campus.
`
5. Many system administrators mandate use of IP in anything potentially capable of attaching to their corporate
network, and would like to see IP on all network-like interconnects (even a totally separate storage network).
`
`2.2.2 Challenges to overcome when using IP with eSCSI
`
1. STP is optimized for level 2 switched LAN environments, where packets always arrive in order (unless
dropped). Thus, we assume a gap in the packet sequence number indicates a dropped packet and triggers a NAK
and subsequent retry. Putting STP on IP enables STP to travel through routers as well as switches. If multiple
paths exist in a routed environment, packets can get out-of-order in the normal course of things. This could
cause a lot of unnecessary retries if it were a frequent occurrence.

2. The routing algorithms used in some IP routers (especially in WANs) can result in some packets taking
considerably longer to reach the destination node than others, and becoming "Ghost Packets" (greatly delayed
duplicate packets). To detect Ghost Packets, a larger sequence number space is required in routed networks than
is required to simply maintain Gigabit performance in a simple switched network.

3. Routers also have to regenerate 802.3 frame headers, and hence CRCs. Thus the data is potentially unprotected
while traversing the router. We may need additional CRC protection when running on IP, which adds four bytes
of overhead and may require a lot of host computation in non-integrated solutions.

4. IP addresses must be assigned and managed. In theory, this could mean every device, from CD-ROMs and disk
drives to RAID boxes and hosts, needs mechanisms for assigning and discovering IP addresses, such as ARP,
DHCP and DNS. This is a lot of extra software to force on the lower end devices, such as native eSCSI drives.

5. IP version 4 requires an extra 20 bytes with each packet, and version 6 requires an extra 40. While this is still
not a huge overhead, when combined with the MAC header and STP header, it does result in less bit efficiency
than Fibre Channel.
`
`2.2.3 Discussion of Strengths and Weaknesses
`
Challenge number one, out-of-order packets, is really only a problem when WANs are involved. Since TCP works
more efficiently if it does NOT have to re-order packets at the receiver, most LAN routers keep the packets in order
most of the time. Our protocol will not break if an out-of-order packet arrives; it will simply request all packets from
the first skipped one with a fast NAK, creating some extra network traffic. So, if packet re-ordering in the network is
infrequent, the effect will only be a very minor performance reduction.
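
The receiver behavior described above (no reordering, NAK from the first skipped packet) can be sketched as a small state machine. The class and field names are invented for this illustration, sequence number wraparound is ignored, and sending only one NAK per gap is an assumption; none of this is the STP wire protocol itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InOrderReceiver:
    """Accepts only the next expected packet; NAKs on the first gap seen."""

    expected: int = 0
    delivered: List[bytes] = field(default_factory=list)
    naks_sent: List[int] = field(default_factory=list)
    nak_outstanding: bool = False

    def on_packet(self, seq: int, payload: bytes) -> None:
        if seq == self.expected:
            # In-order packet: deliver it and advance.
            self.delivered.append(payload)
            self.expected += 1
            self.nak_outstanding = False
        elif seq > self.expected and not self.nak_outstanding:
            # Gap: drop this packet and ask the sender to resend
            # everything starting at the first skipped sequence number.
            self.naks_sent.append(self.expected)
            self.nak_outstanding = True
        # Otherwise (duplicate, or gap already NAKed): discard silently.


if __name__ == "__main__":
    receiver = InOrderReceiver()
    # Packet 3 overtakes packet 2 in the network; the sender then resends 3.
    for seq in (0, 1, 3, 2, 3):
        receiver.on_packet(seq, bytes([seq]))
    print("delivered:", receiver.delivered)
    print("NAKs sent for:", receiver.naks_sent)
```

In the reordering case shown, the data is still delivered completely and in order; the cost is one NAK and one retransmitted packet that would not have been needed on a network that preserved ordering.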
`
Difficulty number two, Ghost Packets, is not as big a problem with STP as it is with TCP, because STP does not do
reordering at the receiver, so a Ghost Packet would have to arrive at the exact moment when the sequence numbers
had wrapped around to its sequence number for it to cause a problem. Since IP routers in LANs generally try to
route packets from a given stream along the same path, the probability of a long lived Ghost Packet is low to begin
with, and is then multiplied by the probability of the expected sequence number exactly matching the Ghost
Packet's. However, by extending the sequence space to 32 bits and limiting the time-to-live count at ten Gigabit
speeds, a completely Ghost Packet safe protocol can be obtained.
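
A rough calculation shows why the sequence space size matters. The sketch assumes one sequence number per packet and a link saturated with minimum-size Ethernet frames (the worst case for wrap time); the text does not say whether STP counts packets or bytes, so treat both assumptions as illustrative.

```python
# Assumed worst case: minimum 64-byte frame plus 8 bytes of preamble
# and a 12 byte-time interframe gap.
MIN_FRAME_WIRE_BYTES = 64 + 8 + 12


def wrap_time_seconds(seq_bits: int, link_bits_per_sec: float,
                      wire_bytes_per_packet: int = MIN_FRAME_WIRE_BYTES) -> float:
    """Time for a per-packet sequence counter to cycle on a saturated link."""
    packets_per_sec = link_bits_per_sec / (wire_bytes_per_packet * 8)
    return (2 ** seq_bits) / packets_per_sec


if __name__ == "__main__":
    for bits in (16, 32):
        for rate in (1e9, 10e9):
            t = wrap_time_seconds(bits, rate)
            print(f"{bits}-bit sequence at {rate / 1e9:.0f} Gb/s: {t:.4f} s")
```

Under these assumptions a 16-bit counter wraps in milliseconds at ten Gigabit speeds, while a 32-bit counter takes minutes, so a Ghost Packet would have to survive in the network for that long to collide with a reused sequence number. Bounding packet lifetime below the wrap time closes the remaining window.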
`
Challenge number three may be a bit of a red herring, because routers often do have ECC or parity protected
memory, and there are other points in computer storage systems where data is unprotected anyway (data paths
within host adapters and target chips, for instance). Thus, the real issue is whether the routers have substantially
higher risk of undetected error than other points in the data path. Nevertheless, including an STP CRC is not
difficult for an integrated solution, as it just requires a second CRC generator, which is a fairly small amount of
logic. This feature should be supported but its use within a given session made optional.
`
`Challenge number four, managing IP addresses, can be handled through a set of protocols already defined to
`automatically assign IP addresses, or assist network managers in assigning them. The part of the protocols which
`must be implemented by peripherals is fairly simple, and we can handle the more complicated parts in software on
`the servers, so the impact on eSCSI silicon should be modest. However, it still adds complexity that is not needed
`for desktops and small server systems.
`
UPNP and other service discovery strategies both require IP and handle much of the IP management,
potentially relieving eSCSI devices from a lot of the management tasks. And, as points two through four of the
opportunities subsection indicate, use of IP allows tapping into existing applications and protocols for handling
traffic balancing and security.
`
`The amount of byte overhead added by IP is not really that serious, since most packets are fully filled with storage
`type traffic, resulting in only a small percentage overhead. Other types of traffic may see a larger efficiency drop,
`especially with IP version 6.
`
`2.2.4 When should we use IP, or even TCP/IP?
`
For desktop systems or small single server (cluster) systems, IP is not necessary, and you are paying a management
complexity and performance penalty for no real gain. In fact, there is some built-in security in not being able to
route outside your local Ethernet segment. However, for campus size systems IP is required, though even here
locally attached devices may not need it. I have a proposal, discussed more fully below, that combines raw 802.3 for
local communication with IP for campus communication.
`
As you get beyond the campus and into a WAN situation, multiple routing paths will increase the likelihood of
out-of-order packet reception. At this point TCP would probably be a better choice of transport protocol than STP. One
option would be to use gateways at the periphery of the campus network which would copy the data payload from
STP onto TCP. Since both are connected protocols, this should be quite straightforward to do, and SEP can ride on
top of TCP just as easily as it rides on top of STP. This may even be a simple matter of adding STP to the protocol
stack of the firewall computers. In fact, you might even want to turn things around by putting traditional network
traffic on STP, thus improving the efficiency of the local network and reducing host CPU utilization for network
traffic.
`
`At distances where you might want to put eSCSI on TCP, it is questionable whether you would really want to use
`the SCSI protocol anyway. Due to the speed of light limitations, the latencies will be more than you would want for
`routine storage accesses. The typical reason given for wanting remote disk and tape drives is disaster recovery,
`which mostly requires streaming copies of local data out to the remote site. There are many traditional network
`protocols which do a fine job of that, so there really is no advantage to using any form of SCSI.
`
`2.2.5 Proposed Solution
`
`To support both STP by itself and on top of IP, the following is proposed:
`
• The eSCSI discovery protocol will be confined to a single Ethernet segment.

• The master initiator handling discovery will also act as a proxy for local devices when determining IP address
assignments and when ARP requests are received.

• The discovery packets will be extended with sufficient space to hold both IPv4 and IPv6 addresses. Flags will be
used to indicate whether the IP fields are valid, and for which version.

• The eSCSI open packet will include space for IPv4 and IPv6 addresses. Flags will be used to indicate whether
the IP fields are valid, and hence whether the resulting STP session should ride on top of IP.

• All eSCSI implementations will support the addition and removal of IP headers.

• An optional STP trailing CRC will be implemented. A flag in the STP header will indicate whether the CRC is
attached. The STP length field will not include the 4 bytes occupied by the CRC.

• IP based resource discovery protocols, such as UPNP, will be used to manage device discovery across sub
networks, for those devices that want to be discovered.
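
The optional trailing CRC item can be illustrated with a toy packet format. Everything about the layout here is hypothetical: the three byte header (one flags byte, two length bytes), the flag bit value, and the choice of the CRC-32 polynomial are assumptions made only to show the two rules stated in the list, namely that a header flag signals the CRC and that the length field excludes its 4 bytes.

```python
import struct
import zlib

FLAG_CRC_PRESENT = 0x01        # hypothetical flag bit
HEADER = struct.Struct("!BH")  # hypothetical header: flags (1), length (2)


def build_packet(payload: bytes, with_crc: bool) -> bytes:
    """Build a toy STP-like packet, optionally appending a 4-byte CRC."""
    flags = FLAG_CRC_PRESENT if with_crc else 0
    # The length field covers the payload only, never the CRC trailer.
    packet = HEADER.pack(flags, len(payload)) + payload
    if with_crc:
        packet += struct.pack("!I", zlib.crc32(packet))
    return packet


def parse_packet(packet: bytes) -> bytes:
    """Return the payload, verifying the trailing CRC when the flag is set."""
    flags, length = HEADER.unpack_from(packet)
    body_end = HEADER.size + length
    if flags & FLAG_CRC_PRESENT:
        (received,) = struct.unpack_from("!I", packet, body_end)
        if zlib.crc32(packet[:body_end]) != received:
            raise ValueError("STP CRC mismatch")
    return packet[HEADER.size:body_end]


if __name__ == "__main__":
    protected = build_packet(b"SCSI data", with_crc=True)
    plain = build_packet(b"SCSI data", with_crc=False)
    print(len(protected) - len(plain), parse_packet(protected))
```

Because the length field is the same with or without the trailer, a receiver that does not check the CRC can parse both forms identically, which keeps the feature optional per session.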
`
With these changes it will be possible to support non IP based eSCSI for small systems and IP based for large. It will
even be possible to use both in the same system, for example non IP for directly attached peripherals, such as local
eSCSI disks, and IP for shared peripherals, such as RAID boxes. An example system diagram is shown below, where
dashed lines indicate IP sub-networks. In such a system, the disks that are part of the cluster would normally run
without IP and be invisible to external systems, but could be assigned IP addresses for communication with tape
drives during backup.
`
`
[System diagram not reproduced. Recoverable labels: Cluster Host, L2 Switch, L3 Switch, Additional Cluster, RAID subsystem, L2 Switch, Backup Host, Tape, Tape.]
`
`2.3 Connection Endpoints
`As discussed in Section 1.1, data mu