US006601187B1

(12) United States Patent
Sicola et al.

(10) Patent No.: US 6,601,187 B1
(45) Date of Patent: Jul. 29, 2003

(54) SYSTEM FOR DATA REPLICATION USING REDUNDANT PAIRS OF STORAGE CONTROLLERS, FIBRE CHANNEL FABRICS AND LINKS THEREBETWEEN

(75) Inventors: Stephen J. Sicola, Monument, CO (US); Susan G. Elkington, Colorado Springs, CO (US); Michael D. Walker, Colorado Springs, CO (US); Paul Guttormsen, Colorado Springs, CO (US); Richard F. Lary, Colorado Springs, CO (US)

(73) Assignee: Hewlett-Packard Development Company, L.P., Houston, TX (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/539,745

(22) Filed: Mar. 31, 2000

(51) Int. Cl.7 .............................. G06F 11/00
(52) U.S. Cl. .............................. 714/6; 711/162
(58) Field of Search ..................... 714/6, 9; 711/162, 161

(56) References Cited

U.S. PATENT DOCUMENTS

5,274,645 A     12/1993  Idleman et al. .......... 371/10.1
5,544,347 A  *   8/1996  Yanai et al. ............ 711/162
5,768,623 A      6/1998  Judd et al. ............. 395/857
5,790,775 A      8/1998  Marks et al. ............ 395/182.07
6,178,521 B1 *   1/2001  Filgate
[two additional cited patents, including one to DeKoning et al. (9/2002), are illegible in the source scan]

OTHER PUBLICATIONS

Sicola, Stephen J. et al., U.S. Patent Application, Fault Tolerant Storage Controller Utilizing Tightly Coupled Dual Controller Modules, Ser. No. 08/071,310, filed Jun. 4, 1993, pp. 1-90.

* cited by examiner

Primary Examiner—Robert Beausoliel
Assistant Examiner—Christopher McCarthy

(57) ABSTRACT

A data replication system having a redundant configuration including dual Fibre Channel fabric links interconnecting each of the components of two data storage sites, wherein each site comprises a host computer and associated data storage array, with redundant array controllers and adapters. Each array controller in the system is capable of performing all of the data replication functions, and each host 'sees' remote data as if it were local. Each array controller has a dedicated link via a fabric to a partner on the remote side of the long-distance link between fabric elements. Each dedicated link does not appear to any host as an available link to them for data access; however, it is visible to the partner array controllers involved in data replication operations. These links are managed by each partner array controller as if being 'clustered' with a reliable data link between them.

16 Claims, 11 Drawing Sheets

[Front-page drawing: reduced view of FIG. 1 (system 100, switched fabric, element 110').]
[Drawing Sheets 1-11 of US 6,601,187 B1 (Jul. 29, 2003). FIG. 1: long distance mirroring (system 100, switched fabric, element 110'). FIG. 2: switched dual-fabric, disaster-tolerant storage system 100. FIG. 3: block diagram of the system of FIG. 2. FIG. 4: high-level diagram of a remote copy set operation (401). FIG. 5: exemplary controller software architecture (host port target code 505, host port initiator code 510, PPRC manager 515, VA & cache manager 520, device services 525, connections to external devices). FIG. 6A: inter-site controller heartbeat timer operation (A1 sends echo and commands to B1 over Link 1 and starts link and command timers; if a timer times out, A1 transfers control to A2, which communicates with B2 via Link 2, and B2 accesses LUN X'). FIG. 6B: intra-site controller heartbeat timer operation (controllers C and C' set heartbeat timers, ping each other, and reset the timers; if C fails, C''s heartbeat timer times out and C' performs controller failover, sending data over its own link). FIG. 7: synchronous system operation (steps 701-745: host issues a write command; the controller receives it and passes it to the VA layer; data is written to write-back cache with VA retaining the cache lock; the write data is sent to the remote target, which writes it to its write-back cache and returns completion status to the initiator; the PPRC manager notifies VA of completion, VA completes the write, and status is sent to the host). FIG. 8A: asynchronous system operation (steps 801-860: as in FIG. 7, but the LBN extent is micro-logged and host completion status is sent before the remote copy completes). FIG. 8B: 'micro-merge' operation (step 865 onward: inhibit host writes; 're-play' the remote copy preserving order; clear recorded write entries or perform other recovery and error handling; allow host writes to resume once all entries are clear). FIG. 9: example of a link failover operation (elements 203, 213).]
SYSTEM FOR DATA REPLICATION USING REDUNDANT PAIRS OF STORAGE CONTROLLERS, FIBRE CHANNEL FABRICS AND LINKS THEREBETWEEN

FIELD OF THE INVENTION

The present invention relates generally to error recovery in data storage systems, and more specifically, to a system for providing controller-based remote data replication using a redundantly configured Fibre Channel Storage Area Network to support data recovery after an error event which causes loss of data access at the local site due to a disaster at the local site or a catastrophic storage failure.

BACKGROUND OF THE INVENTION AND PROBLEM

It is desirable to provide the ability for rapid recovery of user data from a disaster or significant error event at a data processing facility. This type of capability is often termed 'disaster tolerance'. In a data storage environment, disaster tolerance requirements include providing for replicated data and redundant storage to support recovery after the event. In order to provide a safe physical distance between the original data and the data to be backed up, the data must be migrated from one storage subsystem or physical site to another subsystem or site. It is also desirable for user applications to continue to run while data replication proceeds in the background. Data warehousing, 'continuous computing', and Enterprise Applications all require remote copy capabilities.

Storage controllers are commonly utilized in computer systems to off-load from the host computer certain lower level processing functions relating to I/O operations, and to serve as interface between the host computer and the physical storage media. Given the critical role played by the storage controller with respect to computer system I/O performance, it is desirable to minimize the potential for interrupted I/O service due to storage controller malfunction. Thus, prior workers in the art have developed various system design approaches in an attempt to achieve some degree of fault tolerance in the storage control function.

One prior method of providing storage system fault tolerance accomplishes failover through the use of two controllers coupled in an active/passive configuration. During failover, the passive controller takes over for the active (failing) controller. A drawback to this type of dual configuration is that it cannot support load balancing, as only one controller is active and thus utilized at any given time, to increase overall system performance. Furthermore, the passive controller presents an inefficient use of system resources.

Another approach to storage controller fault tolerance is based on a process called 'failover'. Failover is known in the art as a process by which a first storage controller, coupled to a second controller, assumes the responsibilities of the second controller when the second controller fails. 'Failback' is the reverse operation, wherein the second controller, having been either repaired or replaced, recovers control over its originally-attached storage devices. Since each controller is capable of accessing the storage devices attached to the other controller as a result of the failover, there is no need to store and maintain a duplicate copy of the data, i.e., one set stored on the first controller's attached devices and a second (redundant) copy on the second controller's devices.
`
U.S. Pat. No. 5,274,645 (Dec. 28, 1993), to Idleman et al. discloses a dual-active configuration of storage controllers capable of performing failover without the direct involvement of the host. However, the direction taken by Idleman requires a multi-level storage controller implementation. Each controller in the dual-redundant pair includes a two-level hierarchy of controllers. When the first level or host-interface controller of the first controller detects the failure of the second level or device interface controller of the second controller, it re-configures the data path such that the data is directed to the functioning second level controller of the second controller. In conjunction, a switching circuit re-configures the controller-device interconnections, thereby permitting the host to access the storage devices originally connected to the failed second level controller through the operating second level controller of the second controller. Thus, the presence of the first level controllers serves to isolate the host computer from the failover operation, but this isolation is obtained at added controller cost and complexity.

Other known failover techniques are based on proprietary buses. These techniques utilize existing host interconnect "hand-shaking" protocols, whereby the host and controller act in cooperative effort to effect a failover operation. Unfortunately, the "hooks" for this and other types of host-assisted failover mechanisms are not compatible with more recently developed, industry-standard interconnection protocols, such as SCSI, which were not developed with failover capability in mind. Consequently, support for dual-active failover in these proprietary bus techniques must be built into the host firmware via the host device drivers. Because SCSI, for example, is a popular industry standard interconnect, and there is a commercial need to support platforms not using proprietary buses, compatibility with industry standards such as SCSI is essential. Therefore, a vendor-unique device driver in the host is not a desirable option.

U.S. patent application Ser. No. 08/071,310, to Sicola et al., describes a dual-active, redundant storage controller configuration in which each storage controller communicates directly with the host and its own attached devices, the access of which is shared with the other controller. Thus, a failover operation may be executed by one of the storage controllers without the assistance of an intermediary controller and without the physical reconfiguration of the data path at the device interface. However, the technology disclosed in Sicola is directed toward a localized configuration, and does not provide for data replication across long distances.

U.S. Pat. No. 5,790,775 (Aug. 4, 1998) to Marks et al. discloses a system comprising a host CPU and a pair of storage controllers in a dual-active, redundant configuration. The pair of storage controllers reside on a common host side SCSI bus, which serves to couple each controller to the host CPU. Each controller is configured by a system user to service zero or more preferred host side SCSI IDs, each host side ID associating the controller with one or more units located thereon and used by the host CPU to identify the controller when accessing one of the associated units. If one of the storage controllers in the dual-active, redundant configuration fails, the surviving one of the storage controllers automatically assumes control of all of the host side SCSI IDs and subsequently responds to any host requests directed to the preferred, host side SCSI IDs and associated units of the failed controller. When the surviving controller senses the return of the other controller, it releases to the returning other controller control of the preferred SCSI IDs of the failed controller.
In another aspect of the Marks invention, the failover is made to appear to the host CPU as simply a re-initialization of the failed controller. Consequently, all transfers outstanding are retried by the host CPU after time-outs have occurred. Marks discloses 'transparent failover', which is an automatic technique that allows for continued operation by a partner controller on the same storage bus as the failed controller. This technique is useful in situations where the host operating system trying to access storage does not have the capability to adequately handle multiple paths to the same storage volumes. Transparent failover makes the failover event look like a 'power-on reset' of the storage device. However, transparent failover has a significant flaw: it is not fault tolerant to the storage bus. If the storage bus fails, all access to the storage device is lost.

U.S. Pat. No. 5,768,623 (Jun. 16, 1998) to Judd et al. describes a system for storing data for a plurality of host computers on a plurality of storage arrays so that data on each storage array can be accessed by any host computer. There is an adapter communication interface (interconnect) between all of the adapters in the system to provide peer-to-peer communications. Each host has an adapter which provides controller functions for a separate array designated as a primary array (i.e., each adapter functions as an array controller). There are also a plurality of adapters that have secondary control of each storage array. A secondary adapter controls a designated storage array when an adapter primarily controlling the designated storage array is unavailable. The adapter communication interface interconnects all adapters, including secondary adapters. Interconnectivity of the adapters is provided by a Serial Storage Architecture (SSA) which includes SCSI as a compatible subset. Judd indicates that the SSA network could be implemented with various topologies, including a switched configuration.

However, the Judd system elements are interconnected in a configuration that comprises three separate loops, one of which requires four separate links. Therefore, this configuration is complex from a connectivity standpoint, and has disadvantages in areas including performance, physical cabling, and the host involvement required to implement the technique. The performance of the Judd invention for data replication and failover is hindered by the 'bucket brigade' of latency to replicate control information about commands in progress and data movement in general. The physical nature of the invention requires many cables and interconnects to ensure fault tolerance and total interconnectivity, resulting in a system which is complex and error prone. The tie-in with host adapters is host operating system (OS) dependent on an OS platform-by-platform basis, such that the idiosyncrasies of each platform must be taken into account for each different OS to be used with the Judd system.

Therefore, there is a clearly felt need in the art for a disaster tolerant data storage system capable of performing data backup and automatic failover and failback without the direct involvement of the host computer; and wherein all system components are visible to each other via a redundant network which allows for extended clustering where local and remote sites share data across the network.
`
SOLUTION TO THE PROBLEM

Accordingly, the above problems are solved, and an advance in the field is accomplished by the system of the present invention, which provides a completely redundant configuration including dual Fibre Channel fabric links interconnecting each of the components of two data storage sites, wherein each site comprises a host computer and associated data storage array, with redundant array controllers and adapters. The present system is unique in that each array controller is capable of performing all of the data replication functions, and each host 'sees' remote data as if it were local.
`
The 'mirroring' of data for backup purposes is the basis for RAID ('Redundant Array of Independent [or Inexpensive] Disks') level 1 systems, wherein all data is replicated on N separate disks, with N usually having a value of 2. Although the concept of storing copies of data at a long distance from each other (i.e., long distance mirroring) is known, the use of a switched, dual-fabric, Fibre Channel configuration as described herein is a novel approach to disaster tolerant storage systems. Mirroring requires that the data be consistent across all volumes. In prior art systems which use host-based mirroring (where each host computer sees multiple units), the host maintains consistency across the units. For those systems which employ controller-based mirroring (where the host computer sees only a single unit), the host is not signaled completion of a command until the controller has updated all pertinent volumes. The present invention is, in one aspect, distinguished over the previous two types of systems in that the host computer sees multiple volumes, but the data replication function is performed by the controller. Therefore, a mechanism is required to communicate the association between volumes to the controller. To maintain this consistency between volumes, the system of the present invention provides a mechanism of associating a set of volumes to synchronize the logging to the set of volumes, so that the log is consistent when it is 'played back' to the remote site.
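The patent does not give code for this association mechanism; the following is a minimal sketch, under assumed and purely hypothetical names, of how logged writes for an associated set of volumes could share a single ordered log so that playback to the remote site stays mutually consistent:

    from collections import deque
    from dataclasses import dataclass, field

    @dataclass
    class LoggedWrite:
        volume: str          # local volume (LUN) the host wrote to
        lbn: int             # starting logical block number
        data: bytes          # payload to replicate
        seq: int = 0         # order stamp assigned by the association

    @dataclass
    class VolumeAssociation:
        """Set of volumes whose replication log must stay mutually consistent."""
        volumes: set
        _log: deque = field(default_factory=deque)
        _next_seq: int = 0

        def record(self, write: LoggedWrite) -> None:
            if write.volume not in self.volumes:
                raise ValueError("volume is not part of this association")
            write.seq = self._next_seq      # one sequence space for the whole set
            self._next_seq += 1
            self._log.append(write)

        def play_back(self, send_to_remote) -> None:
            # Replaying strictly in sequence order keeps the remote copies of
            # all associated volumes consistent with one another.
            while self._log:
                send_to_remote(self._log.popleft())

    # Example use (illustrative values only)
    assoc = VolumeAssociation(volumes={"LUN1", "LUN2"})
    assoc.record(LoggedWrite("LUN1", lbn=100, data=b"abc"))
    assoc.record(LoggedWrite("LUN2", lbn=4096, data=b"def"))
    assoc.play_back(lambda w: print(f"replay seq={w.seq} {w.volume}@{w.lbn}"))

The single sequence counter is the key point: because every write to any volume in the set passes through one log, playback at the remote site cannot reorder writes across the associated volumes.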
`
Each array controller in the present system has a dedicated link via a fabric to a partner on the remote side of the long-distance link between fabric elements. Each dedicated link does not appear to any host as an available link to them for data access; however, it is visible to the partner array controllers involved in data replication operations. These links are managed by each partner array controller as if being 'clustered' with a reliable data link between them.
The fabrics comprise two components, a local element and a remote element. An important aspect of the present invention is the fact that the fabrics are 'extended' by standard e-ports (extension ports). The use of e-ports allows for standard Fibre Channel cable to be run between the fabric elements, or for the use of a conversion box to convert the data to a form such as telco ATM or IP. The extended fabric allows the entire system to be viewable by both the hosts and storage.
The dual fabrics, as well as the dual array controllers, dual adapters in hosts, and dual links between fabrics, provide high availability and present no single point of failure. A distinction here over the prior art is that previous systems typically use other kinds of links to provide the data replication, resulting in the storage not being readily exposed to hosts on both sides of a link. The present configuration allows for extended clustering, where local and remote site hosts are actually sharing data across the link from one or more storage subsystems with dual array controllers within each subsystem.
The present system is further distinguished over the prior art by other additional features, including independent discovery of initiator to target system and automatic rediscovery after link failure. In addition, device failures, such as controller and link failures, are detected by 'heartbeat' monitoring by each array controller. Furthermore, no special host software is required to implement the above features, because all replication functionality is totally self-contained within each array controller and automatically done without user intervention.
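By way of illustration only (this is not the patent's firmware; the names and interval values are assumptions), a heartbeat monitor of the kind described might look like this in outline: each controller periodically hears from its partner and, when the heartbeat timer expires without a response, invokes its failover handling:

    import time

    HEARTBEAT_TIMEOUT = 3.0    # seconds of silence treated as a failure (assumed value)

    class HeartbeatMonitor:
        """Tracks a partner controller (or link) and reports when it goes quiet."""

        def __init__(self, on_failure):
            self.on_failure = on_failure          # e.g. start controller or link failover
            self.last_seen = time.monotonic()
            self.failed = False

        def ping_received(self) -> None:
            # Partner answered; reset the heartbeat timer.
            self.last_seen = time.monotonic()
            self.failed = False

        def tick(self) -> None:
            # Called periodically by the controller's main loop.
            if not self.failed and time.monotonic() - self.last_seen > HEARTBEAT_TIMEOUT:
                self.failed = True
                self.on_failure()

    # Example: a controller that begins failover when its partner stops answering.
    monitor = HeartbeatMonitor(on_failure=lambda: print("partner silent: begin failover"))
    monitor.ping_received()   # normally driven by pings arriving over the link
    monitor.tick()            # normally driven by a periodic timer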
`
An additional aspect of the present system is the ability to function over two links with data replication traffic. If failure of a link occurs, as detected by the 'initiator' array controller, that array controller will automatically 'failover', or move the base of data replication operations to its partner controller. At this time, all transfers in flight are discarded, and therefore discarded to the host. The host simply sees a controller failover at the host OS (operating system) level, causing the OS to retry the operations to the partner controller.
`
The array controller partner continues all 'initiator' operations from that point forward. The array controller whose link failed will continuously watch the status of its link to the same controller on the other 'far' side of the link. That status changes to a 'good' link when the array controllers have established reliable communications between each other. When this occurs, the array controller 'initiator' partner will 'failback' the link, moving operations back to the newly reliable link. This procedure re-establishes load balance for data replication operations automatically, without requiring additional features in the array controller or host beyond what is minimally required to allow controller failover.
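A minimal sketch of the failover/failback behavior just described, with hypothetical names (the patent supplies no code), showing replication duty moving to the partner on link failure and back again once the link is reliable:

    class ReplicationLinkManager:
        """Moves data-replication duty between an initiator controller and its
        partner as the initiator's dedicated inter-site link fails and recovers."""

        def __init__(self):
            self.active_controller = "initiator"   # which controller drives replication
            self.link_good = True

        def link_status_changed(self, good: bool) -> None:
            if not good and self.link_good:
                # Link failure detected by the initiator: transfers in flight are
                # discarded and replication operations move to the partner.
                self.active_controller = "partner"
            elif good and not self.link_good:
                # Link is reliable again: 'failback' restores the original split
                # of replication work, re-establishing load balance.
                self.active_controller = "initiator"
            self.link_good = good

    mgr = ReplicationLinkManager()
    mgr.link_status_changed(good=False)   # host sees a controller failover and retries
    print(mgr.active_controller)          # -> partner
    mgr.link_status_changed(good=True)    # automatic failback
    print(mgr.active_controller)          # -> initiator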
`
BRIEF DESCRIPTION OF THE DRAWINGS

The above objects, features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a diagram showing long distance mirroring;
FIG. 2 illustrates a switched dual fabric, disaster-tolerant storage system;
FIG. 3 is a block diagram of the system shown in FIG. 2;
FIG. 4 is a high-level diagram of a remote copy set operation;
FIG. 5 is a block diagram showing exemplary controller software architecture;
FIG. 6A is a flow diagram showing inter-site controller heartbeat timer operation;
FIG. 6B is a flow diagram showing intra-site controller heartbeat timer operation;
FIG. 7 is a flowchart showing synchronous system operation;
FIG. 8A is a flowchart showing asynchronous system operation;
FIG. 8B is a flowchart showing a 'micro-merge' operation; and
FIG. 9 is a diagram showing an example of a link failover operation.
`
DETAILED DESCRIPTION

The system of the present invention comprises a data backup and remote copy system which provides disaster tolerance. In particular, the present system provides a redundant peer-to-peer remote copy function which is implemented as a controller-based replication of one or more LUNs (Logical Unit Numbers) between two separate pairs of array controllers.

FIG. 1 is a diagram showing long distance mirroring, which is an underlying concept of the present invention. The present system 100 employs a switched, dual-fabric, Fibre Channel configuration to provide a disaster tolerant storage system. Fibre Channel is the general name of an integrated set of standards developed by the American National Standards Institute (ANSI) which defines protocols for information transfer. Fibre Channel supports multiple physical interface types, multiple protocols over a common physical interface, and a means for interconnecting various interface types. A 'Fibre Channel' may include transmission media such as copper coax or twisted pair copper wires in addition to (or in lieu of) optical fiber.
As shown in FIG. 1, when host computer 101 writes data to its local storage array, an initiating node, or 'initiator' 111, sends a backup copy of the data to remote 'target' node 112 via a Fibre Channel switched fabric 103. A 'fabric' is a topology (explained in more detail below) which supports dynamic interconnections between nodes through ports connected to the fabric. In FIG. 1, nodes 111 and 112 are connected to respective links 105 and 106 via ports 109. A node is simply a device which has at least one port to provide access external to the device. In the context of the present system 100, a node typically includes an array controller pair and associated storage array. Each port in a node is generically termed an N (or NL) port. Ports 109 (array controller ports) are thus N ports. Each port in a fabric is generically termed an F (or FL) port. In FIG. 1, links 105 and 106 are connected to switched fabric 103 via F ports 107. More specifically, these F ports may be standard E ports (extension ports) or E port/FC-BB port pairs, as explained below.
In general, it is possible for any node connected to a fabric to communicate with any other node connected to other F ports of the fabric, using services provided by the fabric. In a fabric topology, all routing of data frames is performed by the fabric, rather than by the ports. This any-to-any connection service ('peer-to-peer' service) provided by a fabric is integral to a Fibre Channel system. It should be noted that in the context of the present system, although a second host computer 102 is shown (at the target site) in FIG. 1, this computer is not necessary for operation of the system 100 as described herein.
`
An underlying operational concept employed by the present system 100 is the pairing of volumes (or LUNs) on a local array with those on a remote array. The combination of volumes is called a Remote Copy Set. A Remote Copy Set thus consists of two volumes, one on the local array and one on the remote array. For example, as shown in FIG. 1, a Remote Copy Set might consist of LUN 1 (110) on a storage array at site 101 and LUN 1' (110') on a storage array at site 102. The array designated as the 'local' array is called the initiator, while the remote array is called the target. Various methods for synchronizing the data between the local and remote array are possible in the context of the present system. These synchronization methods range from fully synchronous to fully asynchronous data transmission, as explained below. The system user's ability to choose these methods provides the user with the capability to vary system reliability with respect to potential disasters and the recovery after such a disaster. The present system allows choices to be made by the user based on factors which include likelihood of disasters and the critical nature of the user's data.
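To make the pairing concrete, here is a small data-structure sketch (illustrative only; the names are not from the patent) of a remote copy set binding a local LUN to its remote counterpart, together with the user-selectable synchronization mode discussed above:

    from dataclasses import dataclass
    from enum import Enum

    class SyncMode(Enum):
        SYNCHRONOUS = "synchronous"      # host completion waits on the remote copy
        ASYNCHRONOUS = "asynchronous"    # host completion returns before the remote copy

    @dataclass(frozen=True)
    class RemoteCopySet:
        """Pairs one volume on the local (initiator) array with one on the remote (target) array."""
        local_lun: str       # e.g. "LUN1"  on the initiator array
        remote_lun: str      # e.g. "LUN1'" on the target array
        mode: SyncMode = SyncMode.SYNCHRONOUS

    # Example corresponding to FIG. 1: LUN 1 at site 101 paired with LUN 1' at site 102.
    rcs = RemoteCopySet(local_lun="LUN1", remote_lun="LUN1'", mode=SyncMode.ASYNCHRONOUS)
    print(f"replicate {rcs.local_lun} -> {rcs.remote_lun} ({rcs.mode.value})")

The mode field is where the reliability/performance trade-off described in the paragraph above would be expressed per remote copy set.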
`
System Architecture

FIG. 2 illustrates an exemplary configuration of the present invention, which comprises a switched dual fabric, disaster-tolerant storage system 100. The basic topology of the present system 100 is that of a switch-based Storage Area Network (SAN). As shown in FIG. 2, data storage sites 218 and 219 each respectively comprise two hosts 101/101A and 102/102A, and two storage array controllers 201/202 and 211/212 connected to storage arrays 203 and 213, respectively. Alternatively, only a single host 101/102, or more than two hosts, may be connected to system 100 at each site 218/219. Storage arrays 203 and 213 typically comprise a plurality of magnetic disk storage devices, but could also include or consist of other types of mass storage devices such as semiconductor memory.
In the configuration of FIG. 2, each host at a particular site is connected to both fabric elements (i.e., switches) located at that particular site. More specifically, at site 218, host 101 is connected to switches 204 and 214 via respective paths 231A and 231B; host 101A is connected to the switches via paths 241A and 241B. Also located at site 218 are array controllers A1 (ref. no. 201) and A2 (ref. no. 202). Array controller A1 is connected to switch 204 via paths 221H and 221D; array controller A2 is connected to switch 214 via paths 222H and 222D. The path suffixes 'H' and 'D' refer to 'Host' and 'Disaster-tolerant' paths, respectively, as explained below. Site 219 has counterpart array controllers B1 (ref. no. 211) and B2 (ref. no. 212), each of which is connected to switches 205 and 215. Note that array controllers B1 and B2 are connected to switches 205 and 215 via paths 251D and 252D, which are, in effect, continuations of paths 221D and 222D, respectively.
In the present system shown in FIG. 2, all storage subsystems (203/204/205 and 213/214/215) and all hosts (101, 101A, 102, and 102A) are visible to each other over the SAN 103A/103B. This configuration provides for high availability with a dual fabric, dual host, and dual storage topology, where a single fabric, host, or storage can fail and the system can still continue to access other system components via the SAN. As shown in FIG. 2, each fabric 103A/103B employed by the present system 100 includes two switches interconnected by a high-speed link. More specifically, fabric 103A comprises switches 204 and 205 connected by link 223A, while fabric 103B comprises switches 214 and 215 connected by link 223B.
Basic Fibre Channel technology allows the length of links 223A/223B (i.e., the distance between data storage sites) to be as great as 10 km as per the FC-PH3 specification (see Fibre Channel Standard: Fibre Channel Physical and Signaling Interface, ANSI X3T11). However, distances of 20 km and greater are possible given improved technology and FC-PH margins with basic Fibre Channel. FC-BB (Fibre Channel Backbone) technology provides the opportunity to extend Fibre Channel over leased Telco lines (also called WAN tunneling). In the case wherein FC-BB is used for links 223A and 223B, FC-BB ports are attached to the E ports to terminate the ends of links 223A and 223B.
It is also possible to interconnect each switch pair 204/205 and 214/215 via an Internet link (223A/223B). If the redundant links 223A and 223B between the data storage sites 218/219 are connected to different ISPs (Internet Service Providers) at the same site, for example, there is a high probability of having at least one link operational at any given time. This is particularly true because of the many redundant paths which are available over the Internet between ISPs. For example, switches 204 and 214 could be connected to separate ISPs, and switches 205 and 215 could also be connected to separate ISPs.
FIG. 3 is an exemplary block diagram illustrating additional details of the system shown in FIG. 2. The configuration of the present system 100, as shown in FIG. 3, depicts only one host per site for the sake of simplicity. Each host 101/102 has two adapters 308 which support the dual fabric topology. The hosts typically run multi-pathing software that dynamically allows failover between storage paths as well as static load balancing of storage volumes (LUNs) between the paths to the controller-based storage arrays 201/202 and 211/212. The configuration of system 100 allows for applications using either of the storage arrays 203/213 to continue running given any failure of either fabric 103A/103B or either of the storage arrays.
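The host-side multi-pathing behavior just described can be pictured with a short sketch (a hypothetical driver model, not the actual multi-pathing software): LUNs are statically assigned to one of the two fabric paths for load balancing, and I/O falls over to the surviving path if a fabric or path fails:

    class MultiPathDriver:
        """Host-side view of two fabric paths to controller-based storage."""

        def __init__(self, paths=("fabric_103A", "fabric_103B")):
            self.paths = list(paths)
            self.failed = set()
            self.assignment = {}          # LUN -> preferred path (static load balancing)

        def assign(self, lun: str) -> None:
            # Alternate LUNs between the two paths to spread the load.
            self.assignment[lun] = self.paths[len(self.assignment) % len(self.paths)]

        def path_for(self, lun: str) -> str:
            preferred = self.assignment[lun]
            if preferred not in self.failed:
                return preferred
            # Failover: use any surviving path when the preferred one is down.
            for p in self.paths:
                if p not in self.failed:
                    return p
            raise RuntimeError("no usable path to storage")

    drv = MultiPathDriver()
    for lun in ("LUN1", "LUN2"):
        drv.assign(lun)
    drv.failed.add("fabric_103A")         # e.g. fabric 103A fails
    print(drv.path_for("LUN1"))           # I/O continues over fabric_103B
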
The array controllers 201/202 and 211/212 employed by the present system 100 have two host ports 109 per array controller, for a total of four connections (ports) per pair in the dual redundant configuration of FIG. 3. Each host port 109 preferably has an optical attachment to the switched fabric, for example, a Gigabit Link Module ('GLM') interface at the controller, which connects to a Gigabit Converter ('GBIC') module comprising the switch interface port 107. Switch interconnection ports 306 also preferably comprise GBIC modules. Each pair of array controllers 201/202 and 211/212 (and associated storage array) is also called a storage node (e.g., 301 and 302), and has a unique Fibre Channel Node Identifier. As shown in FIG. 3, array controller pair A1/A2 comprises storage node 301, and array controller pair B1/B2 comprises storage node 302. Furthermore, each storage node and each port on the array controller has a unique Fibre Channel Port Identifier, such as a World-Wide ID (WWID). In addition, each unit connected