`
(12) United States Patent — Jewett et al.
(10) Patent No.: US 8,271,606 B2
(45) Date of Patent: *Sep. 18, 2012
`
(54) NETWORK-BASED STORAGE SYSTEM CAPABLE OF ALLOCATING STORAGE PARTITIONS TO HOSTS

(75) Inventors: Douglas E. Jewett, Round Rock, TX (US); Adam J. Radford, Mission Viejo, CA (US); Bradley D. Strand, Los Gatos, CA (US); Jeffrey D. Chung, Cupertino, CA (US); Joel D. Jacobson, Mountain View, CA (US); Robert B. Haigler, Newark, CA (US); Rod S. Thompson, Sunnyvale, CA (US); Thomas L. Couch, Woodside, CA (US)

(73) Assignee: Summit Data Systems LLC, Frisco, TX (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 401 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 12/195,211

(22) Filed: Aug. 20, 2008

(65) Prior Publication Data: US 2008/0313301 A1, Dec. 18, 2008

Related U.S. Application Data

(62) Division of application No. 11/683,944, filed on Mar. 8, 2007, now Pat. No. 7,428,581, which is a continuation of application No. 09/927,894, filed on Aug. 10, 2001, now Pat. No. 7,392,291.

(60) Provisional application No. 60/224,664, filed on Aug. 11, 2000.

(51) Int. Cl.
G06F 15/167 (2006.01)
G06F 3/00 (2006.01)

(52) U.S. Cl. 709/214; 709/215; 711/114
(58) Field of Classification Search: 709/223-231, 709/212, 217, 239; 711/114
See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,127,097 A 6/1992 Mizuta
5,129,077 A * 7/1992 Hillis 712/13
5,485,627 A * 1/1996 Hillis 712/13
5,548,770 A * 8/1996 Bridges 1/1
5,628,005 A * 5/1997 Hurvig 1/1
5,712,975 A * 1/1998 Ooe 709/219
5,721,779 A * 2/1998 Funk 713/155
5,754,830 A 5/1998 Butts et al.
5,787,463 A 7/1998 Gajjar
(Continued)
OTHER PUBLICATIONS

Carlson, M. S., Merhar, M., Monia, C., Rajagopal, M., "A Framework for IP Based Storage," Internet Draft document printed from ietf.org web site, Nov. 17, 2000, pp. 1-36 (of-record in parent and/or grandparent application).

(Continued)

Primary Examiner — Ario Etienne
Assistant Examiner — Sargon Nano
(74) Attorney, Agent, or Firm — Greenblum & Bernstein, P.L.C.
`
(57) ABSTRACT

A network-based storage system comprises one or more block-level storage servers that connect to, and provide disk storage for, one or more host computers. In one embodiment, the system is capable of subdividing the storage space of an array of disk drives into multiple storage partitions, and allocating the partitions to host computers on a network. A storage partition allocated to a particular host computer may appear as local disk drive storage to user-level processes running on the host computer.

16 Claims, 9 Drawing Sheets
`
[Representative drawing: host computers (e.g., HOST #1) connected over a network by socket connections 400 to block servers (e.g., BLOCK SERVER #2).]
`Adobe - Exhibit 1009, page 1
`
`
`
`
U.S. PATENT DOCUMENTS

5,809,279 A * 9/1998 Oeda et al. 711/153
5,867,491 A 2/1999 Derango et al.
5,867,723 A * 2/1999 Chin et al. 712/11
5,907,621 A 5/1999 Bachman
5,931,918 A 8/1999 Row et al.
5,931,947 A 8/1999 Burns et al.
5,974,532 A 10/1999 McLain et al.
5,996,014 A 11/1999 Uchihori et al.
6,003,045 A * 12/1999 Freitas et al. 1/1
6,076,142 A 6/2000 Corrington et al.
6,098,114 A 8/2000 McDonald et al.
6,137,796 A 10/2000 Derango et al.
6,151,297 A * 11/2000 Congdon et al. 370/216
6,161,165 A 12/2000 Solomon et al.
6,185,316 B1 2/2001 Buffam
6,339,785 B1 1/2002 Feigenbaum
6,378,036 B2 * 4/2002 Lerman et al. 711/112
6,393,026 B1 5/2002 Irwin
6,421,711 B1 7/2002 Blumenau et al.
6,421,753 B1 * 7/2002 Hoese et al. 710/315
6,487,581 B1 11/2002 Spence et al.
6,502,205 B1 * 12/2002 Yanai et al. 714/7
6,529,994 B1 * 3/2003 Bleidt et al. 711/114
6,539,482 B1 3/2003 Blanco et al.
6,564,252 B1 5/2003 Hickman et al.
6,618,757 B1 9/2003 Babbitt et al.
6,618,798 B1 9/2003 Burton et al.
6,633,909 B1 * 10/2003 Barrett et al. 709/224
6,654,752 B2 * 11/2003 Ofek 1/1
6,662,268 B1 * 12/2003 McBrearty et al. 711/114
6,671,776 B1 12/2003 DeKoning
6,714,968 B1 3/2004 Prust
6,725,456 B1 * 4/2004 Bruno et al. 718/102
6,772,347 B1 8/2004 Xie et al.
6,807,559 B1 * 10/2004 Budhiraja 709/203
6,834,326 B1 * 12/2004 Wang et al. 711/114
6,912,668 B1 * 6/2005 Brown et al. 714/6
6,955,956 B2 10/2005 Tanaka et al.
6,970,869 B1 11/2005 Slaughter et al.
6,983,330 B1 * 1/2006 Oliveira et al. 709/239
7,392,291 B2 6/2008 Jewett et al.
7,428,581 B2 9/2008 Jewett et al.
7,543,184 B2 * 6/2009 Dean et al. 714/32
2001/0044863 A1 * 11/2001 Oeda et al. 710/104
2002/0059539 A1 * 5/2002 Anderson 714/6
2002/0077177 A1 * 6/2002 Elliott 463/40
2002/0104041 A1 * 8/2002 Jibbe 714/37
2003/0097504 A1 * 5/2003 Oeda et al. 710/104
2003/0188059 A1 * 10/2003 Zack 710/74
2004/0230698 A1 * 11/2004 Oeda et al. 709/245
2008/0313187 A1 12/2008 Jewett et al.
`
`
`
OTHER PUBLICATIONS

Gibson, G., Nagle, D., Amini, K., Chang, F., Feinberg, E., Gobioff, H., Lee, C., Ozceri, B., Riedel, E., and Rochberg, D., "A Case for Network-Attached Secure Disks," Technical Report CMU-CS-96-142, Department of Electrical and Computer Engineering, Carnegie Mellon University, Sep. 26, 1996, pp. 1-19 (of-record in parent and/or grandparent application).
Hotz, S., Van Meter, R., and Finn, G., "Internet Protocols for Network-Attached Peripherals," Proc. Sixth NASA Goddard Conference on Mass Storage Systems and Technologies in cooperation with Fifteenth IEEE Symposium on Mass Storage Systems, Mar. 1998, pp. 1-15 (of-record in parent and/or grandparent application).
Lee, E., and Thekkath, C., "Petal: Distributed Virtual Disks," ACM Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, ACM Digital Library, Oct. 1996, pp. 84-92 (of-record in parent and/or grandparent application).
Perkins, C., and Harjono, H., "Resource Discovery Protocol for Mobile Computing," Mobile Networks and Applications, vol. 1, issue 4, 1996, pp. 447-455 (of-record in parent and/or grandparent application).
Tierney, B., Johnston, W., Herzog, H., Hoo, G., Jin, G., Lee, J., Chen, L., and Rotem, D., "Distributed Parallel Data Storage Systems: A Scalable Approach to High Speed Image Servers," Proceedings of the Second ACM International Conference on Multimedia, Oct. 1994, pp. 399-405 (of-record in parent and/or grandparent application).
Van Meter, R., Finn, G., and Hotz, S., "VISA: Netstation's Virtual Internet SCSI Adapter," Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, Oct. 1998, pp. 71-80 (of-record in parent and/or grandparent application).
Complaint filed in Summit Data Systems, L.L.C. v. Adaptec, Inc. et al., filed Sep. 1, 2010 in the United States District Court for the District of Delaware, Case No. 1:10-cv-00749-UNA.
International Search Report in International Appl. No. PCT/US01/25256, mailing date of Dec. 14, 2001.
Office Action in U.S. Appl. No. 12/195,244, mailing date of Dec. 27, 2010.
Office Action in U.S. Appl. No. 12/195,244, mailing date of Oct. 11, 2011.
"Final Joint Claim Chart," dated Oct. 3, 2011, filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010).
"Defendants' Opening Claim Construction Brief," including Tabs A-B, dated Oct. 26, 2011, filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010).
"Declaration of Todd M. Simpson in Support of Defendants' Opening Claim Construction Brief," dated Oct. 26, 2011, filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010).
Microsoft Computer Dictionary, Microsoft Press, Fourth Edition, pp. 115, 285, 290, and 359, copyright 1999, cited as Exhibit 1 in "Declaration of Todd M. Simpson in Support of Defendants' Opening Claim Construction Brief," filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010), dated Oct. 26, 2011.
The Authoritative Dictionary of IEEE Standards Terms, Standards Information Network/IEEE Press, Seventh Edition, pp. 52 and 107, copyright 2000, cited as Exhibit 2 in "Declaration of Todd M. Simpson in Support of Defendants' Opening Claim Construction Brief," filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010), dated Oct. 26, 2011.
Pfaffenberger, Bryan, Webster's New World Dictionary of Computer Terms, Macmillan Publishing, Seventh Edition, pp. 160 and 169, copyright 2000, cited as Exhibit 3 in "Declaration of Todd M. Simpson in Support of Defendants' Opening Claim Construction Brief," filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010), dated Oct. 26, 2011.
Merriam-Webster's Collegiate Dictionary, Merriam-Webster, Inc., Tenth Edition, pp. 239, 840, and 894, copyright 2000, cited as Exhibit 4 in "Declaration of Todd M. Simpson in Support of Defendants' Opening Claim Construction Brief," filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010), dated Oct. 26, 2011.
Margolis, Philip E., Random House Webster's Computer & Internet Dictionary, Random House, Third Edition, p. 516, copyright 1999, cited as Exhibit 5 in "Declaration of Todd M. Simpson in Support of Defendants' Opening Claim Construction Brief," filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010), dated Oct. 26, 2011.
"Plaintiff Summit Data Systems LLC's Memorandum of Law in Support of its Proposed Claim Construction," including Exhibits 1 and 2, dated Oct. 26, 2011, filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010).
"Declaration of Myron Zimmerman in Support of Plaintiff Summit Data Solutions LLC's Proposed Claim Constructions," dated Oct. 26, 2011, cited as Exhibit 3 in "Plaintiff Summit Data Systems LLC's Memorandum of Law in Support of its Proposed Claim Construction," filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010), dated Oct. 26, 2011.
`
`
`
`
"Defendants' Answering Claim Construction Brief," including Tab A, dated Nov. 21, 2011, filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010).
"Plaintiff Summit Data Systems LLC's Answering Memorandum of Law in Support of its Proposed Claim Constructions," dated Nov. 21, 2011, filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010).
"Joint Appendix of Intrinsic Evidence," dated Nov. 21, 2011, filed in Summit Data Systems, LLC v. EMC Corp. et al., Civil Action No. 1:10-cv-00749-GMS (D. Del. 2010).
Office Action dated Jun. 7, 2012 issued with respect to U.S. Appl. No. 12/195,244.
* cited by examiner
`
`
`
`
[Sheet 1 of 9 — FIG. 1: a network 100 connecting host computers 102 to a block server comprising a CPU board & processor, a disk array controller, and an array of disk drives 112.]
`
`
`
[Sheet 2 of 9 — FIG. 2: software architecture of the system of FIG. 1, including host-side and server-side device drivers and reader/writer components.]
`
`
`
[Sheet 3 of 9 — FIG. 3: network options 100 interconnecting a host computer and a block server via network interfaces 104, 106.]
`
`
`
[Sheet 4 of 9 — FIG. 4: socket connections 400 established between host computers and block servers #1 and #2.]
`
`
`
[Sheet 5 of 9 — FIG. 5: read-operation timing diagram with vertical lines for OS, HD, HRW, N, SRW, and SD; time increases downward, with labeled host-side events (1H, 3H, 5H) and server-side events (1S, 5S, 6S).]
`
`
`
[Sheet 6 of 9 — FIG. 6: write-operation timing diagram with vertical lines for OS, HD, HRW, N, SRW, and SD, including an optional cache layer.]
`
`
`
[Sheet 7 of 9 — FIG. 7: user-space I/O requests assigned to socket connections 1-4.]
`
`
`
[Sheet 8 of 9 — FIG. 8: a web browser 810 and a config/management program 820 managing a block server 104; partition tables 1-N record access privileges, assigned host(s), and disks/sectors for Partition 0 (unassigned), Partition 1 (Host 1), Partition 2 (Host 2), Partition 3 (Host 3), and Partition 4 (Host 1).]
`
`
`
[Sheet 9 of 9 — FIG. 9: authentication and discovery flow]
910: Host attempts to connect to block server.
915: Block server accepts the connection request.
920: Host receives software versions from the block server.
925: Host replies to block server with a selected version.
930: Block server checks response from host and authenticates.
935: Block server sends acknowledgment of the authentication to the host.
940: Host sends a request to determine available capacity on the block server.
945: Block server sends capacity and the number of IP connections authorized.
950: Block server establishes "listen" sockets for data channels from host.
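The authentication and discovery exchange of FIG. 9 can be sketched from the host's side as follows. This is an illustration only: the message names, helper signatures, and error handling are assumptions, not the protocol's actual encoding.

```python
def authenticate_and_discover(send, recv, supported_versions):
    """Host (client) side of the FIG. 9 exchange, carried out over the
    dedicated configuration socket.  `send` and `recv` exchange one whole
    message at a time; their framing is deliberately left abstract."""
    # Steps 910/915 (TCP connect/accept) are assumed already complete.
    offered = recv("versions")                 # 920: server offers versions
    chosen = max(set(offered) & set(supported_versions))
    send(("version", chosen))                  # 925: host selects a version
    if recv("auth_ack") != "ok":               # 930/935: server authenticates
        raise PermissionError("authentication failed")
    send(("capacity?",))                       # 940: host asks for capacity
    capacity, max_connections = recv("capacity")   # 945: capacity + IP limit
    # 950: the server then establishes "listen" sockets for data channels.
    return chosen, capacity, max_connections
```

Only after this exchange succeeds does the host open its data-channel sockets, up to the connection count the server authorized in step 945.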
`
`
`
`
NETWORK-BASED STORAGE SYSTEM CAPABLE OF ALLOCATING STORAGE PARTITIONS TO HOSTS

PRIORITY CLAIM

This application is a division of U.S. application Ser. No. 11/683,944, filed Mar. 8, 2007, now U.S. Pat. No. 7,428,581, which is a continuation of U.S. application Ser. No. 09/927,894, filed Aug. 10, 2001, now U.S. Pat. No. 7,392,291, which claims the benefit of U.S. Provisional Appl. No. 60/224,664, filed Aug. 11, 2000. The disclosures of the aforesaid applications are hereby incorporated by reference.
`
`APPENDICES
`
`This specification includes appendices A-D which contain
`details of a commercial implementation. The appendices are
`provided for illustrative purposes, and not to define or limit
`the scope of the invention.
`
BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates to storage systems for computer networks, and more specifically, relates to software architectures for providing block-level access to storage resources on a network.

2. Description of the Related Art
Various types of architectures exist for allowing host computers to share hard disk drives and other storage resources on a computer network. One common type of architecture involves the use of a central file manager. One problem with this architecture is that the failure of the central file manager can render the entire system inoperable. Another problem is that many software applications are not designed to use a central file manager.

Some storage architectures overcome these deficiencies by allowing the host computers to access the storage resources directly over the network, without the use of a central file manager. Typically, these architectures allow the host to access the storage resources over a network connection at the block level (as opposed to the file level).
`
`SUMMARY OF THE INVENTION
`
The present invention comprises a system architecture for providing block-level access to storage resources, such as disk arrays, over a computer network without the need for a central file manager. The architecture embodies various inventive features that may be implemented individually or in combination.

One feature of the architecture is a mechanism for dividing the physical storage space or units of a block-level storage server into multiple partitions, and for allocating these partitions to hosts independently of one another. In a preferred embodiment, a partition can be allocated uniquely to a particular host, or can be allocated to a selected group of hosts (in which case different hosts may have different access privileges to the partition). The partition or partitions assigned to a particular host may appear as, and can be managed as, one or more local disk drives.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
These and other features will now be described with reference to the drawings of certain embodiments of the invention, which are intended to illustrate, and not limit, the scope of the invention.
FIG. 1 illustrates the primary hardware components of an example system in which the invention may be embodied, including a host computer and a block server.
FIG. 2 illustrates the software architecture of the system of FIG. 1, including host-side and server-side device drivers and reader/writer (RW) components that operate according to the invention.
FIG. 3 illustrates examples of the types of networks and network components that can be used to interconnect the hosts and block servers.
FIG. 4 shows, in example form, how the concurrent socket connections are established between pairs of reader/writer components.
FIG. 5 illustrates the flow of information between components when a host computer performs a read from a block server.
FIG. 6 illustrates the flow of information between components when a host computer performs a write to a block server.
FIG. 7 illustrates how I/O requests are assigned to socket connections transparently to user-level applications, and illustrates how an I/O request may be subdivided for processing over multiple TCP/IP connections.
FIG. 8 illustrates how the physical storage of a block server may be divided into multiple partitions, each of which may be independently allocated to one or more host computers.
FIG. 9 illustrates an authentication and discovery protocol through which a host computer is authenticated by a block server, and then obtains information for accessing the block server.
`
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The system architecture described in this section, and in the attached appendices, embodies various inventive features that may be used individually or in combination. Some of these features may be implemented without others, and/or may be implemented differently than set forth herein, without departing from the scope of the invention as defined by the appended claims.
I. Overview
The present invention comprises a system architecture for providing block-level storage access over one or more computer networks. The architecture is designed to incorporate any number of host computers and block-level storage servers communicating across a network or a combination of networks. In one embodiment, the architecture exports virtualized storage blocks over TCP/IP connections. Because TCP/IP is used for communications between the host computers and block-level storage servers in a preferred embodiment, a variety of network topologies can be used to interconnect the host computers and the block servers of a given system. For example, for relatively small systems, the host computers and storage servers can be interconnected by a hub, while for larger systems, the hub may be replaced with a switch.
Depicted in FIG. 1 are the hardware components of a typical system that embodies the invention. The system includes a host computer 102 ("host") and a block-level IP storage server 104 ("block server") interconnected by a network 100 via respective network interface cards 106, such as 10/100/1000 Base-T or 1000 Base-SX Gigabit Ethernet cards. The host computer 102 may be a standard PC or workstation configured to operate as a server or as a user computer. The block server 104 may be a network-attached IP storage box or device which provides block-level data storage services for host computers 102 on the network 100.
`
`
`
`
In the illustrated embodiment, the block server 104 includes a disk array controller 110 that controls an array of disk drives 112. A disk array controller 110 of the type described in U.S. Pat. No. 6,098,114 may be used for this purpose, in which case the disk drives 112 may be ATA/IDE drives. The disk array controller may support a variety of disk array configurations, such as RAID 0, RAID 5, RAID 10, and JBOD, and is preferably capable of processing multiple I/O requests in parallel. The block server 104 also includes a CPU board and processor 108 for executing device drivers and related software. The block server may also include volatile RAM (not shown) for caching I/O data, and may include flash or other non-volatile solid state memory for storing configuration information (see FIG. 8).
In one embodiment, the network 100 may be any type or combination of networks that support TCP/IP sockets, including but not limited to Local Area Networks (LANs), wireless LANs (e.g., 802.11 WLANs), Wide Area Networks (WANs), the Internet, and direct connections. One common configuration is to locally interconnect the hosts 102 and block servers 104 by an Ethernet network to create an Ethernet-based SAN (Storage Area Network). As depicted by dashed lines in FIG. 1, the host and the block server 102, 104 may be interconnected by a second network 100', using a second set of network cards 106', to provide increased fault tolerance (as described below). The two networks 100, 100' may be disparate networks that use different mediums and provide different transfer speeds. Some of the various network options are described in more detail below with reference to FIG. 3.
The software components of the architecture are shown in FIG. 2. The host side 102 of the software architecture includes an operating system (O/S) 202 such as Unix, Windows NT, or Linux; a host-side device driver 204 ("host driver") which communicates with the operating system 202; and a reader/writer (RW) component 200a (also referred to as an "agent") which communicates with the host driver 204. The storage side 104 of the software architecture includes a reader/writer (RW) component 200b and a storage-side device driver 206 ("server driver") that are executed by the CPU board's processor 108 (FIG. 1). The server driver 206 initiates disk operations in response to I/O requests received from the server-side RW component 200b.
The RW components 200a, 200b are preferably executed as separate processes that are established in pairs (one host-side RW process and one server-side RW process), with each pair dedicated to a respective TCP/IP socket over a network 100. The host RW 200a operates generally by "reading" I/O requests from the host driver 204, and "writing" these requests onto the network 100. Similarly, the storage RW 200b operates generally by reading I/O requests from the network 100 and writing these requests to the server driver 206. This process can occur simultaneously with transfers by other RW pairs, and can occur in any direction across the network 100. The RW components 200 also preferably perform error checking of transferred I/O data.

Each RW process (and its corresponding socket) preferably remains persistent on its respective machine 102, 104, and processes I/O requests one at a time on a first-in-first-out basis until the connection fails or is terminated. A host computer 102 establishes a socket by sending a service request over a dedicated configuration socket to the relevant block server 104. Once a socket connection is established between a RW pair 200a, 200b, the socket handles bi-directional traffic between the host computer 102 and block server 104.
In the illustrated embodiment, the RW components 200 run as processes that are separate from the host and server drivers 204, 206, respectively. The host-side 200a and storage-side 200b RW could alternatively be implemented, for example, as one or more of the following: (a) part of the host and server drivers 204, 206 (respectively); (b) separate device drivers 204, 206 (respectively); (c) separate kernel threads; (d) multiple threads within a single process; (e) multiple threads within multiple processes; or (f) multiple processes within a single thread.
A host computer 102 may establish multiple logical connections (sockets) to a given block server 104, and/or establish sockets to multiple different block servers 104 (as discussed below). An important benefit of this feature is that it allows multiple I/O requests from the same host to be processed concurrently (each over a separate socket) in a non-blocking manner: if one socket fails, the I/O requests being performed over other sockets are not affected. Each socket is managed by a respective RW pair.
An important function of the host driver 204 is that of virtualizing the storage provided by the block servers 104, so that all higher-level software processes on the host, such as the operating system and other user-level processes, view the block server storage as one or more local, physical disk drives. To accomplish this task, the host driver dynamically assigns I/O requests to TCP/IP socket connections without revealing the existence of such connections, or any other network details, to user-level processes. The block server 104 preferably appears to the host's user-level processes as a SCSI device, allowing conventional volume managers to be used.

As described below in sub-section III, one embodiment of the architecture permits the physical storage of a block server 104 to be divided into multiple, variable-size partitions. Each such partition may be independently allocated to one or more hosts, and may be configured such that it is viewed and managed as a separate physical disk drive. In other embodiments, block-level access may be provided to the hosts without partitioning.
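The partition model described here (and detailed with FIG. 8) can be sketched as follows; the class and field names and the privilege strings are illustrative assumptions, not the commercial implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One variable-size slice of a block server's storage (compare FIG. 8)."""
    first_sector: int
    num_sectors: int
    # host name -> access privilege; an empty dict means unassigned
    assigned_hosts: dict = field(default_factory=dict)


class BlockServerConfig:
    """Tracks how a server's physical sectors are divided and allocated."""

    def __init__(self, total_sectors):
        self.total_sectors = total_sectors
        self.partitions = []

    def create_partition(self, num_sectors):
        # Partitions are laid out back to back in this sketch.
        next_free = sum(p.num_sectors for p in self.partitions)
        if next_free + num_sectors > self.total_sectors:
            raise ValueError("not enough unpartitioned space")
        part = Partition(first_sector=next_free, num_sectors=num_sectors)
        self.partitions.append(part)
        return part

    @staticmethod
    def allocate(part, host, privilege="read-write"):
        """Allocate a partition to a host; a partition shared by a group of
        hosts may give different hosts different access privileges."""
        part.assigned_hosts[host] = privilege
```

Each partition allocated to a host would then be exported to that host as what appears to be a separate local disk drive.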
FIG. 3 shows some of the various networks 100 and network components that may be used to interconnect the hosts 102 and block servers 104 of a given system. These include a hub 302 (commonly used to connect LAN segments), the Internet 304, a router 306 (a computer that forwards packets according to header information), a switch 308 (a device that filters and forwards packets between LAN segments), and a gateway 310 (a computer that interconnects two different types of networks). The system architecture allows any combination of these network options to be used to interconnect a given host computer 102 and block server 104.

An important feature of the architecture is that when the network 100 becomes inundated with traffic, a network administrator can either add network capabilities on the fly or change the network hardware without causing any loss of data. The host-side 102 and storage-side 104 software components are configured, using conventional methods, to detect and use new network connections as they become available, and to retry operations until a connection is established. For example, a network administrator could initially connect thirty host computers 102 to a small number of block servers 104 using a network hub 302. When the number of computers reaches a level at which the network hub 302 is no longer suitable, a 1000-port switch could be added to the network 100 and the hub 302 removed without taking the network 100 off-line. The architecture functions this way because the host RW 200a automatically creates new socket connections to the storage RW 200b as new physical connections become available.
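The retry behavior described above might look like the following sketch; the timeout, delay, and function name are assumptions rather than the patent's method.

```python
import socket
import time


def connect_with_retry(server_addr, delay=1.0, max_attempts=None):
    """Retry until a socket connection is established, so that network
    hardware (e.g., a hub swapped for a switch) can change underneath the
    hosts without data loss: pending operations simply wait for a path."""
    attempts = 0
    while True:
        try:
            return socket.create_connection(server_addr, timeout=5)
        except OSError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise
            time.sleep(delay)   # back off briefly before trying again
```

In this scheme an I/O request is never abandoned because of a transient path failure; it is simply retried once a connection can be re-established.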
The architecture and associated storage control protocol present the storage resources to the host computers 102 as a logically contiguous array of bytes which are accessible in blocks (e.g., of 512 bytes). The logical data structures of the implementation support byte-level access, but disk drives typically export blocks which are of a predetermined size, in bytes. Thus, to access a given block, a block address (sector number) and a count of the number of blocks (sectors) is provided. In one embodiment, the protocol exports a 64-bit logical block address (LBA) and a 64-bit sector count. On write operations, the I/O write data request is packaged into a block structure on the host side 102. The block request and data are sent to the block server 104 over one or more of the socket connections managed by the host RW processes 200a. The architecture also allows data to be stored non-sequentially, and allows the storage medium to efficiently partition space and reclaim unused segments.
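A write request under this addressing scheme might be packaged as in the sketch below. The 64-bit LBA and 64-bit sector count come from the text above; the opcode value, field order, and 512-byte sector size are assumptions, since the actual wire layout is implementation-specific.

```python
import struct

SECTOR_SIZE = 512   # block size exported by the server (an assumption)
OP_WRITE = 2        # hypothetical opcode value


def pack_write_request(lba, data):
    """Package an I/O write into a block structure: an opcode, a 64-bit
    logical block address and a 64-bit sector count ('!Q' = big-endian
    64-bit), followed by the sector-aligned payload."""
    if len(data) % SECTOR_SIZE:
        raise ValueError("payload must be a whole number of sectors")
    header = struct.pack("!BQQ", OP_WRITE, lba, len(data) // SECTOR_SIZE)
    return header + data


def unpack_request(packet):
    """Inverse of pack_write_request: header fields plus payload."""
    header_size = struct.calcsize("!BQQ")
    op, lba, count = struct.unpack_from("!BQQ", packet)
    return op, lba, count, packet[header_size:]
```

The 64-bit fields give the protocol headroom far beyond the 32-bit block addresses common when the application was filed.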
Depicted in FIG. 4 are sample socket connections 400 made by RW pairs 200 connecting over a network 100 to link host computers 102 to block servers 104. As mentioned above, the network 100 may actually consist of multiple networks 100, including fully redundant networks 100. Each host computer 102 can open one or more socket connections 400 (using corresponding RW pairs) to any one or more block servers 104 as needed to process I/O requests. New socket connections 400 can be opened, for example, in response to long network response times, failed socket connections 400, the availability of new physical connections, and increases in I/O requests. For example, a host computer 102 can initially open two sockets 400 to a first block server 104, and subsequently open two more sockets 400 to another block server 104 as additional storage resources are needed. Another host computer 102 may have open socket connections 400 to the same set of block servers 104, as shown. As described above, each socket 400 acts as an independent pipeline for handling I/O requests, and remains open until either an error occurs or the host 102 terminates the socket connection 400.
II. Processing of Input/Output Requests
FIGS. 5 and 6 illustrate a network storage protocol that may be used for I/O read operations and write operations (respectively) between a host computer 102 and a block server 104 over a socket connection 400. Located at the tops of the vertical lines in FIGS. 5 and 6 are abbreviations that denote components as follows:

OS = Operating System
HD = Host Driver 204
HRW = Host Computer's Reader/Writer 200a
N = Network
SRW = Server Reader/Writer 200b (of block server)
SD = Server Driver 206 (of block server)

In these diagrams, time increases (not to scale) from top to bottom. Arrows from one vertical line to another generally represent the flow of messages or data between components. An arrow that begins and ends at the same component (vertical line) represents an action performed by that component. The small circles in the figures represent rendezvous events.

In one embodiment, as shown in FIG. 5, the host reader/writer (HRW) initially sends a request 1H to the host driver (HD) for an I/O command packet, indicating that the socket is available for use. This step can be viewed as the message "if you have work to do, give it to me." The host driver eventually responds to this request by returning a command packet that specifies an I/O request, as shown. As represented by the arrow labeled 2H, the host reader/writer (HRW) translates the command packet into a network-generalized order. This step allows different, cross-platform computer languages to function on a common network 100. The local computational transformation of a host command packet, or host language, to a network command packet, or network language, is architecture specific.
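The "network-generalized order" referred to here is conventionally network (big-endian) byte order, so that a packet built on a little-endian host and one built on a big-endian host are byte-for-byte identical on the wire. This small sketch shows the idea using standard library calls; the 32-bit field is illustrative only.

```python
import socket
import struct

# A 32-bit field in host (native) order vs. network (big-endian) order.
value = 0x12345678
assert socket.ntohl(socket.htonl(value)) == value  # round-trips on any host

# struct's '!' prefix produces network order explicitly, so the packed
# bytes are the same regardless of the local machine's endianness.
wire = struct.pack("!I", value)
assert wire == bytes([0x12, 0x34, 0x56, 0x78])
```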
`At this point, the host reader/writer (HRW) generates two
`networks