`
IPR2014-00901 Owner Ex. 2101
`ETRI, Patent Owner
`VMware, Petitioner
`DHPN-1003 I Page 1 of 181
`
`
`
`Clusters for
`High Availability
`
`
Hewlett-Packard Professional Books

Blinn                        Portable Shell Programming: An Extensive Collection of Bourne Shell Examples
Blommers                     Practical Planning for Network Growth
Costa                        Planning and Designing High Speed Networks Using 100VG-AnyLAN, Second Edition
Crane                        A Simplified Approach to Image Processing: Classical and Modern Techniques
Fernandez                    Configuring the Common Desktop Environment
Fristrup                     USENET: Netnews for Everyone
Fristrup                     The Essential Web Surfer Survival Guide
Grady                        Practical Software Metrics for Project Management and Process Improvement
Grosvenor, Ichiro, O'Brien   Mainframe Downsizing to Upsize Your Business: IT-Preneuring
Gunn                         A Guide to NetWare® for UNIX®
Helsel                       Graphical Programming: A Tutorial for HP VEE
Helsel                       Visual Programming with HP-VEE
Kane                         PA-RISC 2.0 Architecture
Knouse                       Practical DCE Programming
Lewis                        The Art & Science of Smalltalk
Madell, Parsons, Abegg       Developing and Localizing International Software
Malan, Letsinger, Coleman    Object-Oriented Development at Work: Fusion In the Real World
McFarland                    X Windows on the World: Developing Internationalized Software with X, Motif®, and CDE
McMinds/Whitty               Writing Your Own OSF/Motif Widgets
Phaal                        LAN Traffic Management
Poniatowski                  The HP-UX System Administrator's "How To" Book
Poniatowski                  HP-UX 10.x System Administration "How To" Book
Thomas                       Cable Television Proof-of-Performance: A Practical Guide to Cable TV Compliance Measurements Using a Spectrum Analyzer
Weygant                      Clusters for High Availability: A Primer of HP-UX Solutions
Witte                        Electronic Test Instruments
`
Clusters for High
Availability
`
`A Primer of HP-UX Solutions
`
`Peter Weygant
`
`Hewlett-Packard Company
`
`Prentice Hall PTR
`Upper Saddle River, New Jersey 07458
`
Editorial/Production Supervision: Joanne Anzalone
`Acquisitions Editor: Karen Gettman
`Manufacturing Manager: Alexis R. Heydt
`Cover Design: Design Source
`Manager, Hewlett-Packard Press: Pat Pekary
`
`© 1996 by Hewlett-Packard Company
`
Published by Prentice Hall PTR
`
`Prentice-Hall, Inc.
`A Simon & Schuster Company
`Upper Saddle River, NJ 07458
`
`All rights reserved. No part of this book may be
`reproduced, in any form or by any means, without
`permission in writing from the publisher.
`
`MC/ServiceGuard and MC/LockManager are registered trademarks of Hewlett-Packard
`Company. Oracle is a trademark of Oracle Corporation. Symmetrix and EMC are
`trademarks of EMC Corporation. NFS is a trademark of Sun Microsystems, Inc. UNIX
`is a registered trademark in the United States and in other countries, licensed exclusively
through X/Open Company, Ltd.
`The publisher offers discounts on this book when ordered in bulk quantities.
`For more information, contact the Corporate Sales Department, PTR Prentice Hall, One
`Lake Street, Upper Saddle River, NJ 07458. Phone: 800-382-3419. FAX: 201-236-7141. e-mail:
`corpsales@prenhall.com
`
`Printed in the United States of America
`
`10 9 8 7
`
`ISBN 0-13-494758-4
`
`HP Part Number 83936-90007
`
`Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall of Canada, Inc., Toronto
Prentice-Hall Hispanoamericana S.A., Mexico
`Prentice-Hall of India Private Limited, New Delhi
`Prentice-Hall of Japan, Inc., Tokyo
`Simon & Schuster Asia Pte. Ltd., Singapore
`Editora Prentice-Hall do Brasil, Ltd., Rio de Janeiro
`
Contents

Foreword
Preface
Acknowledgments
About the Author

1  BASIC HIGH AVAILABILITY CONCEPTS

What is High Availability?
  Available
  Highly Available
  Highly Available Computing
  Service Levels
  Continuous Availability
  Fault Tolerance
  Matching Availability to User Needs
  Choosing a Solution
High Availability as a Business Requirement
  High Availability as Insurance
  High Availability as Opportunity
  Cost of High Availability
What Are the Measures of High Availability?
  Calculating Availability
  Expected Period of Operation
  Calculating Mean Time Between Failures
Understanding the Obstacles to High Availability
  Duration of Outages
  Time Lines for Outages
  Causes of Planned Downtime
  Causes of Unplanned Downtime
  Severity of Unplanned Outages
  Designing for Reaction to Failure
  Identifying Points of Failure
Preparing Your Organization for High Availability
  Stating Availability Goals
  Building the Appropriate Physical Environment
  Creating Automated Processes
  Using a Development and Test Environment
  Maintaining a Stock of Spare Parts
  Defining an Escalation Process
  Planning for Disasters
  Training System Administration Staff
  Using Dry Runs
  Documenting Every Detail
The Starting Point for a Highly Available System
  Basic Hardware Reliability
  Software Quality
  Intelligent Diagnostics
  Comprehensive System Management Tools
  Maintenance and Support Services
Moving to High Availability
Summary

2  CREATING A HIGH AVAILABILITY CLUSTER

Identifying Single Points of Failure in a Stand-alone System
Eliminating Power Sources as Single Points of Failure
  Individual UPS Units
  Power Passthrough UPS Units
Eliminating Disks as Single Points of Failure
  Data Protection with Disk Arrays
  Data Protection with Software Mirroring
Eliminating the SPU as a Single Point of Failure
Eliminating Single Points of Failure in Networks
  Points of Failure in Client Connectivity
  Examples of Points of Failure
  Points of Failure in Inter-Node Communication
  Eliminating the Failure Points
  Providing Redundant LAN Connections
  Configuring Local Switching of LAN Interfaces
  Providing Redundant FDDI Connections
  Using Dual Attached FDDI
  Redundancy for Dialup Lines, Hardwired Serial Connections and X.25
Eliminating Software as a Single Point of Failure
  Tailoring Applications for Cluster Use
Implementing the High Availability Cluster
Complete High Availability Solution

3  HP'S HIGH AVAILABILITY CLUSTER COMPONENTS

Choosing HA Architectures and Cluster Components
  Active/Standby Configurations Using MC/ServiceGuard
  Active/Active Configurations Using MC/ServiceGuard
  How MC/ServiceGuard Works
  Parallel Database Configuration Using MC/LockManager
  Oracle Parallel Server
  How MC/LockManager Works with OPS
Selecting Other HA Subsystems
  MirrorDisk/UX
  High Availability Disk Storage Enclosure
  High Availability Disk Arrays
  EMC Disk Arrays
  Journaled File System
  OnLineJFS
  Transaction Processing Monitors
  Uninterruptible Power Supplies
  System and Network Management Tools
Using Mission Critical Consulting and Support Services
  Availability Management Service
  Business Continuity Support
  Business Recovery Services

4  SAMPLE HIGH AVAILABILITY SOLUTIONS

Highly Available NFS System for Publishing
  High Availability Software and Packages
  Hardware Configuration
  Responses to Failures
Stock Quotation Service
  High Availability Software and Packages
  Hardware Configuration
  Responses to Failures
Order Entry and Catalog Application
  High Availability Software and Packages
  Hardware Configuration
  Responses to Failures
Insurance Company Database
  Two-Node OPS Configuration

5  GLOSSARY OF HIGH AVAILABILITY TERMINOLOGY

AdminCenter
Adoptive Node
ADT
AFR
Alternate Node
Annualized Failure Rate
Architecture for HA
Availability
Average Downtime
Cluster
Cluster View
Continuous Availability
Custody
Downtime
Failure
Failover
Fault Tolerance
Grouped Net
Hardware Mirroring
Highly Available
Hot Plug Capability
Hot Swap Capability
LAN
LAN interface
Logical Volume Manager
MC/LockManager
MC/ServiceGuard
Mean Time Between Failures
Mean Time to Repair
MirrorDisk/UX
Mirroring
MTBF
MTTR
Network Node Manager
Node
OpenView
OperationsCenter
Planned Downtime
Primary Node
Package
Process Resource Manager
RAID
Redundancy
Reliability
Relocatable IP Address
Service
Service Level Agreement
Shared Logical Volume Manager
Single Point of Failure
SLVM
Software Mirroring
SPOF
SPU
Subnet
SwitchOver/UX
System Processor Unit
Transfer of Packages
Unplanned Downtime
Volume Group

Index
`
`Foreword
`
Over the last ten years, UNIX systems have moved from the
specialized role of providing desktop computing power for engineers
into the broader arena of commercial computing. This evolution is
the result of continual dramatic improvements in functionality,
reliability, performance, and supportability. We are now well into
the next phase of the UNIX evolution: providing solutions for
mission critical computing.

To best meet the requirements of the data center for availability,
scalability, and flexibility, Hewlett-Packard has developed a
robust cluster architecture for HP-UX that combines multiple
systems into a high availability cluster. Individual computers, known
as nodes, are connected in a loosely-coupled manner, each
maintaining its own separate processors, memory, operating system,
and storage devices. Special system processes bind these nodes
together and allow them to cooperate to provide outstanding levels
of availability and flexibility for supporting mission critical
applications. The nodes in a cluster can be configured either to share
data on a set of disks or to obtain exclusive access to data.

To maintain Hewlett-Packard's commitment to the principles of
open systems, our high availability clusters use standards-based
hardware components such as SCSI disks and Ethernet LANs.
There are no proprietary APIs that force vendor lock-in, and most
applications will run on a high availability cluster without
modification.
`
`As the world's leading vendor of open systems, Hewlett-Packard
`is especially proud to publish this primer on cluster solutions for
`high availability. Peter Weygant has done a fine job of presenting
`the basic concepts, architectures, and terminology used in HP's
`cluster solutions. This is the place to begin your exploration of the
`world of high availability clusters.
`
`Xuan Bui
`Hewlett-Packard General Systems Division
`Research and Development Laboratory Manager
`
`Preface
`
`This guide is about high availability (HA) computing through enterprise
`clusters. It presents basic concepts and terms, then describes the use of
`cluster technology to provide highly available open systems solutions
`for the commercial enterprise. Here are the topics:
`
• Chapter 1, "Basic High Availability Concepts," presents the
language used to describe highly available systems and components and
introduces ways of measuring availability.

• Chapter 2, "Creating a High Availability Cluster," describes in more
detail the principles of HA configuration, with examples.

• Chapter 3, "HP's High Availability Cluster Components," is an
overview of HP's current roster of high availability software and
hardware offerings.

• Chapter 4, "Sample HA Solutions," discusses a few concrete
examples of highly available cluster solutions.

• Chapter 5, "Glossary," gives definitions of important words and
phrases used to describe high availability.

Additional information is available in the HP publications Managing
MC/ServiceGuard and Configuring OPS Clusters with MC/LockManager.
The HP 9000 Servers Configuration Guide contains detailed
information about supported high availability configurations. This and
other more specialized documents on enterprise clusters are available
from your HP representative.
`
`Acknowledgments
`
This book has benefited from the careful review of many individuals
inside and outside of Hewlett-Packard. The author gratefully
acknowledges the contributions of these colleagues, many of whom are listed
here: Joe Algieri, Sally Anderson, Joe Bac, Bob Baird, Trent Bass, Dan
Beringer, Claude Brazell, Thomas Buenermann, Xuan Bui, Karl-Heinz
Busse, Bruce Campbell, Larry Cargnoni, Gina Cassinelli, Marian
Cochran, Annie Cooperman, Ron Czinski, Dan Dickerman, Pam
Dickerman, Larry Dino, Janie Felix, John Foxcroft, Shivaji Ganesh, Janet
Gee, Mike Gutter, Terry Hand, Michael Hayward, Frank Ho, Margaret
Hunter, Lisa Iarkowski, Art Ipri, Michael Kahn, Marty King, Clark
Macaulay, Gary Marcos, Debby McIsaac, Doug McKenzie, Tim
Metcalf, Parissa Mohamadi, Alex Morgan, Markus Ostrowicki, Bob Ramer,
Bob Sauers, Wesley Sawyer, David Scott, Dan Shive, Christine Smith,
Eric Soderberg, Steve Stichler, Tim Stockwell, Brad Stone, Liz Tam,
Bob Togasaki, Emil Velez, Tad Walsh, and Bev Woods. A special thank
you goes to those groups of Hewlett-Packard customers who read and
commented on early versions of the manuscript. Errors and omissions
are the author's sole responsibility.
`
`About the Author
`
Peter S. Weygant is a Learning Products Engineer in the General
Systems Solutions laboratory at Hewlett-Packard. Formerly a professor of
English, he has been a technical writer and consultant in the computer
industry for the last 15 years. He has developed documentation and
managed publication projects in the areas of digital imaging, relational
database technology, and high availability systems. He has a BA degree
in English Literature from Colby College as well as MA and PhD
degrees in English from the University of Pennsylvania.
`
`CHAPTER 1
`Basic High Availability
`Concepts
`
This book takes an elementary look at high availability
(HA) computing and how it is implemented through
enterprise-level cluster solutions. We start in this chapter
with some of the basic concepts of HA. Here's what we'll
cover:
`
`• What is High Availability?
`• High Availability as a Business Requirement
`• What Are the Measures of High Availability?
`• Understanding the Obstacles to High Availability
`• Preparing Your Organization for High Availability
`• The Starting Point for a High Availability System
`• From High Reliability to High Availability
`• Designing a Highly Available System
`
`Later chapters explore the implementation of high
`availability in clusters, then describe HP's high availability
`products in more detail. A separate chapter is devoted to
`concrete examples of business solutions that use HA.
`
`What is High Availability?
`
Before exploring the implications of high availability
in computer systems, we need to define some terms. What
do we mean by phrases like "availability," "high
availability," and "high availability computing?"
`
`Available
`
`The term available describes a system that provides a
`specific level of service as needed. This idea of availability
`is part of everyday thinking. In computing, availability is
`generally understood as the period of time when services
`are available (for instance, 16 hours a day, six days a week)
`or as the time required for the system to respond to users
`(for example, under 1 second response time). Any loss of
`service, whether planned or unplanned, is known as an
`outage. Downtime is the duration of an outage measured
`in units of time (e.g., minutes or hours).
`
`Highly Available
`
Figure 1.1 Highly Available Services: Electricity
`
Highly available characterizes a system that is
designed to avoid the loss of service by reducing or
managing failures as well as minimizing planned downtime for
the system. We expect a service to be highly available when
life, health, and well-being, including the economic
well-being of a company, depend on it.
`
For example, we expect electrical service to be highly
available. All but the smallest, shortest outages are
unacceptable, since we have geared our lives to depend on
electricity for refrigeration, heating, and lighting, in addition to
less important daily needs.
`
`Even the most highly available services occasionally
`go out, as anyone who has experienced a blackout or
`brownout in a large city can attest. But in these cases, we
`expect to see an effort to restore service at once. When a
`failure occurs, we expect the electric company to be on the
`road fixing the problem as soon as possible.
`
`Highly Available Computing
`
`In many businesses, the availability of computers has
`become just as important as the availability of electric
`power itself. Highly available computing uses computer
`systems which are designed and managed to operate with only
`a small amount of planned and unplanned downtime.
`
Note that highly available is not an absolute. The needs
of different businesses for high availability are quite
diverse. International businesses or companies running
multiple shifts may require user access to databases around
the clock. Financial institutions must be able to transfer
funds at any time of night or day, seven days a week. On
the other hand, some retail businesses may require the
computer to be available only 18 hours a day, but during
these 18 hours they may require sub-second response time
for transaction processing.

Figure 1.2 Service Outage
`
`Service Levels
`
The service level of a system is the degree of service
the system will provide to its users. Often, the service level
is spelled out in a document known as a service level
agreement (SLA). The service levels your business requires
determine the kind of applications you develop, and high
availability systems provide the hardware and software
framework in which these applications can work
effectively to provide the needed level of service. High
availability implies a service level in which both planned and
unplanned computer outages do not exceed a small stated
value.
`
`Continuous Availability
Continuous availability means non-stop service, that
is, there are no planned or unplanned outages at all. This is
a much more ambitious goal than high availability, since
there can be no lapse in service. In effect, continuous
availability is an ideal state rather than a characteristic of any
real world system.

The term is sometimes used to indicate a very high
level of availability in which only a very small known
quantity of downtime is acceptable. Note that high
availability does not imply continuous availability.
`
`Fault Tolerance
Fault tolerance is not a degree of availability so much
as a method for achieving very high levels of availability. A
fault tolerant system is characterized by redundancy in
most of the hardware components, including CPU,
memory, I/O subsystems, and other elements. A fault tolerant
system is one that has the ability to continue service in spite
of a hardware or software failure. However, even fault
tolerant systems are subject to outages from human error.
Note that high availability does not imply fault tolerance.
`
`Matching Availability to User Needs
`
A failure affects availability when it results in an
unplanned loss of service that lasts long enough to create a
problem for users of the system. User sensitivity will
depend on the specific application. For example, a failure
that is corrected within one second may not result in any
perceptible loss of service in an environment that does
online transaction processing (OLTP); but for a scientific
application that runs in a real-time environment, one
second may be an unacceptable interval.
`
`Since any component can fail, the challenge is to
`design systems in which problems can be predicted and
`isolated before a failure occurs and in which failures are
`quickly detected and corrected when they happen.
`
`Choosing a Solution
`
Your exact requirements for availability determine the
kind of solution you need. For example, if the loss of a
system for a few hours of planned downtime is acceptable to
you, then you may not need to purchase storage products
with hot pluggable disks. On the other hand, if you cannot
afford a planned period of maintenance during which a
disk replacement could be done on a mirrored disk system,
then you may wish to consider an HA disk array that
supports hot plugging or hot swapping of components.
(Descriptions of these HA products appear in later
sections.)
`
`High Availability as a Business
`Requirement
`
In the current business climate, high availability
computing is often seen as a requirement, not a luxury. From
one perspective, high availability is a form of insurance
against the loss of business due to computer downtime.
From another point of view, high availability provides new
opportunities by allowing your company to provide better
and more competitive customer service.
`
`High Availability as Insurance
`
`High availability computing is often seen as insurance
`against the following kinds of damage:
`
`• Loss of income
`• Customer dissatisfaction
`• Missed opportunities
`
For commercial computing, a highly available
solution is needed when loss of the system results in loss of
revenue. In such cases, the application is said to be
mission-critical. For all mission-critical applications (that is, where
income may be lost through downtime), high availability
is a requirement. In banking, for example, the ability to
obtain certain account balances 24 hours a day may be
mission-critical. In parts of the securities business, the need for
high availability may only be for that portion of the day
when the stock market is active; at other times, systems
may be safely brought down.
`
`High Availability as Opportunity
`
Highly available computing provides a business
opportunity, since there is an increasing demand for
"around the clock" computerized services in areas as
diverse as banking and financial market operations,
communications, order entry and catalog services, resource
management, and others. It is not possible to give a simple
definition of when an application is mission-critical or of
when high availability of the application creates new
opportunities; this depends on the nature of the business.
However, in any business that depends on computers, the
following principles are always true:
`
`• The degree of availability required is determined by
`business needs. There is no absolute amount of
`availability that is right for all businesses.
`
`• There are many ways to achieve high availability.
`
`• The means of achieving high availability affects all
`aspects of the system.
`
`• The likelihood of failure can be reduced by creating
`an infrastructure that stresses clear procedures and
`preventive maintenance.
`
`• Recovery from failures must be planned.
`
Some or all of the following are expectations for the
software applications that run in mission-critical
environments:
`
`• There should be a low rate of application failures,
`that is, a maximum time between failures.
`
`• Applications should be able to recover after failure.
`
`• There should be minimal scheduled downtime.
`
• The system should be configurable without
shutdown.
`
`• System management tools must be available.
`
`Cost of High Availability
`
As with other kinds of insurance, the cost depends on
the degree of availability you choose. Thus the value of
high availability to the enterprise is directly related to the
costs of outages. The higher the cost of outage, the easier it
becomes to justify the expense of high availability
solutions. As the degree of availability approaches the ideal of
100% availability, the cost of the solution increases more
rapidly. Thus, the cost of 99.95% availability is significantly
greater than the cost of 99.5% availability, and the cost of
99.5% availability is significantly greater than 99%
availability, and so on.
`
`What Are the Measures of High
`Availability?
`
Availability and reliability can be described in
terms of numbers, though doing so can be very
misleading. In fact, there is no standard method for
modeling or calculating the degree of availability in a
computer system. The important thing is to create clear
definitions of what the numbers mean and then use
them consistently. Remember that availability is not a
measurable attribute of a system like CPU clock speed.
Availability can only be measured historically, based on
the behavior of the actual system. Moreover, in
measuring availability, it is important to ask not simply, "Is the
application available?" but "Is the entire system
providing service at the proper level?"
`
`Availability is related to reliability, but they are not the
`same thing. Availability is the percentage of total system
`time the computer system is accessible for normal usage.
`Reliability is the amount of time before a system is
`expected to fail. Availability includes reliability.
`
`Calculating Availability
`
The formula in Figure 1.3 defines availability as the
percentage of elapsed time that a unit can be used. Elapsed
time is continuous time (operating time + downtime).
`
$$
\%\ \text{Availability} = \frac{\text{Total Elapsed Time} - \text{Sum of Inoperative Times}}{\text{Total Elapsed Time}}
$$

Figure 1.3 Availability
`
Availability is actually the probability that a unit is
available (that is, operating normally). Availability is
usually expressed as a percentage of hours per week, month, or
year during which the system and its services can be used
for normal business.
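
The availability formula can be tried out with a few lines of Python. This is only a minimal sketch: the function name `availability_percent` and the outage figures are hypothetical, and the multiplication by 100 is an assumption made to express the ratio as a percentage.

```python
def availability_percent(total_elapsed_hours, inoperative_hours):
    """Percentage of elapsed time during which the unit could be used.

    total_elapsed_hours: continuous time (operating time + downtime).
    inoperative_hours:   a list with the length of each outage, in hours.
    """
    if total_elapsed_hours <= 0:
        raise ValueError("total elapsed time must be positive")
    downtime = sum(inoperative_hours)
    if downtime > total_elapsed_hours:
        raise ValueError("downtime cannot exceed total elapsed time")
    return 100.0 * (total_elapsed_hours - downtime) / total_elapsed_hours


# Hypothetical example: a 30-day month (720 hours) with two outages.
hours_in_month = 30 * 24
print(round(availability_percent(hours_in_month, [1.5, 2.1]), 2))  # 99.5
```

Because the result depends entirely on which outages are counted, the same definitions should be applied every time the figure is computed.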
`
`Expected Period of Operation
`
Measures of availability must be seen against the
background of the organization's expected period of
operation of the system. The following tables show the actual
hours of uptime and downtime associated with different
percentages of availability for two common periods of
operation. Table 1.1 shows 24x7x365, which stands for a
system that is expected to be in use 24 hours a day, seven
days a week, 365 days a year.
`
Table 1.1 Uptime and Downtime for a 24x7x365 System

Availability   Minimum Expected   Maximum Allowable   Remaining
               Uptime (hours)     Downtime (hours)    Time (hours)

99%            8672               88                  0
99.5%          8716               44                  0
99.95%         8755               5                   0
100%           8760               0                   0
`
This table shows that there is no remaining time on the
system at all. All the available time in the year (8760 hours)
is accounted for. This means that all maintenance must be
carried out either when the system is up or during the
allowable downtime hours. In addition, the higher the
percentage of availability, the less time is allowable for failure.
`
`Table 1.2 shows a 12x5x52 system, which is expected
`to be up for 12 hours a day, five days a week, 52 weeks a
`year.
`
Table 1.2 Uptime and Downtime for a 12x5x52 System

Availability   Minimum Expected   Maximum Allowable   Remaining
               Uptime (hours)     Downtime (hours)    Time (hours)

99%            3088               32                  5642
99.5%          3104               16                  5642
99.95%         3118               2                   5642
100%           3118               0                   5642
`
`This table shows that for the 12x5x52 system, there are
`5642 hours of remaining time, which can be used for
`planned maintenance operations requiring the system to be
`down.
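
The figures in tables like these can be approximated with a short calculation. The sketch below is an illustration only: the function name `uptime_budget` is hypothetical, and the rounding rule (allowable downtime rounded up to a whole hour, uptime taken as the remainder) is an assumption chosen because it reproduces the 24x7x365 figures. For a 12x5x52 system, plain arithmetic gives 3120 expected hours and 5640 remaining hours, which differs slightly from the printed 3118 and 5642.

```python
from fractions import Fraction
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year


def uptime_budget(expected_hours, availability_pct):
    """Return (minimum uptime, maximum downtime, remaining time) in hours.

    expected_hours:   hours per year the system is expected to be in use.
    availability_pct: target availability, for example 99.5.
    """
    # Exact arithmetic avoids floating-point surprises before rounding up.
    unavailable = (100 - Fraction(str(availability_pct))) / 100
    max_downtime = math.ceil(expected_hours * unavailable)
    min_uptime = expected_hours - max_downtime
    remaining = HOURS_PER_YEAR - expected_hours
    return min_uptime, max_downtime, remaining


for label, hours in (("24x7x365", 24 * 365), ("12x5x52", 12 * 5 * 52)):
    for pct in (99, 99.5, 99.95, 100):
        print(label, pct, uptime_budget(hours, pct))
```

The remaining time is the part of the year outside the expected period of operation, so it does not change with the availability target.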
`
`Calculating Mean Time Between Failures
`
Availability is related to failure rates of system
components. A common measure of equipment reliability is the
mean time between failures (MTBF). This measure is
usually provided for individual system components, such as
disks. Measures like these are useful, but they are only one
dimension of the complete picture of high availability. For
example, they do not take into account the differences in
recovery times after failure.
`
`MTBF is given by the formula shown in Figure 1.4.
`
$$
\text{MTBF} = \frac{\text{Total Operating Time}}{\text{Total No. of Failures}}
$$

Figure 1.4 Mean Time Between Failures
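
As a rough illustration of the formula in Figure 1.4, the sketch below divides the combined operating time of a group of units by their combined failure count. The function name `mtbf` and the disk figures are hypothetical; units that never failed still contribute their operating hours.

```python
def mtbf(operating_hours_per_unit, failures_per_unit):
    """Mean time between failures, in hours, across a group of units."""
    total_failures = sum(failures_per_unit)
    if total_failures == 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return sum(operating_hours_per_unit) / total_failures


# Hypothetical example: four disks, 10,000 operating hours each,
# two of which failed once.
print(mtbf([10_000, 10_000, 10_000, 10_000], [1, 0, 0, 1]))  # 20000.0
```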
`
The MTBF is calculated by summing the actual
operating times of all units, including units that do not fail, and
dividing that sum by the sum of all failures of the units.
`Operating time is the sum of the hours when the system is
`in use (that is, not