`Anastasiadis et al.
`
`I lllll llllllll Ill lllll lllll lllll lllll lllll 111111111111111111111111111111111
`US007103595B2
`
`(IO) Patent No.:
`(45) Date of Patent:
`
`US 7,103,595 B2
`Sep.5,2006
`
`(54) STREAMING SERVER
`
`OTHER PUBLICATIONS
`
`(76)
`
`Inventors: Stergios V. Anastasiadis, 3611
`University Dr. Apt. 3Y, Durham, NC
`(US) 27707; Kenneth C. Sevcik, 99
`Harbour Square, Suite 4001, Toronto,
`Ontario (CA), M5J 2H2; Michael
`Stumm, 3 Belvale, Toronto (CA), H5X
`2A6
`
`( * ) Notice:
`
`Subject to any disclaimer, the term ofthis
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 1049 days.
`
`(21) Appl. No.: 10/054,699
`
`(22) Filed:
`
`Jan. 22, 2002
`
`(65)
`
`Prior Publication Data
`
`US 2003/0074486 Al Apr. 17, 2003
`
`(30)
`
`Foreign Application Priority Data
`
`Jan. 19, 2001
`Feb. 9, 2001
`Feb. 9, 2001
`
`(CA) ............................................. 2331474
`(CA) ............................................. 2335521
`(CA) ............................................. 2335540
`
`(51)
`
`Int. Cl.
`G06F 17130
`
`(2006.01)
`
`Ken Rudin, Scalable Systems Architecture: Scalable I/O,
`Part I: Disk Striping, published in DM Review in May 1998,
`pp. 1-3.*
`Shenoy and Vin, Failure Recovery Algorithms for Multime(cid:173)
`dia Servers, University of Texas at Austin, pp. 1-34
`(undated).
`Haskin and Schmuck, The Tiger Shark File System, IBM
`Almaden Research Center, IEEE, 1996, pp. 226-231.
`Bolosky, et al., Distributed Schedule Management in the
`Tiger Video Fileserver, Microsoft Research, SOSP 97
`(undated).
`Anastasiadis, et al., Modular and Efficient Resource Man(cid:173)
`agement in the Exedra Media Server, University of Toronto,
`USfNIX Symp. On Internet Tech., San Francisco, CA Mar.
`2001.
`Shenoy and Vin, Efficient Striping Techniques for Multime(cid:173)
`dia File Servers, University of Texas at Austin, NOSSDAV
`97, pp. 25-36 (undated).
`Reddy and Wijayaratne, Techniques for improving the
`throughput ofVBR streams, Texas A & M University, NCN
`99 (undated).
`Gafsi and Biersack, Data Striping and Reliability Aspects in
`Distributed Video Servers, Institut EURECOM, In Cluster
`Computing, Balzer Pub. (1998), pp. 1-27.
`Ozden, et al., Disk Striping in Video Server Environments,
`AT&T Bell Laboratories, IEEE, 1996, pp. 580-589.
`
`(52) U.S. Cl. ................................................ 70717; 707/1
`
`(Continued)
`
`( 58) Field of Classification Search ... ... ... ... .. ... ... . 707 /1,
`707/2, 7, 10, 104.1; 709/203, 219, 221, 226,
`709/231; 711/114; 714/4, 5, 11; 719/321
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`5,974,503 A * 10/1999 Venkatesh et al. .......... 711/114
`6,230,200 Bl * 512001 Forecast et al.
`............ 709/226
`6,625,750 Bl * 9/2003 Duso et al. ................... 714/11
`
`Primary Examiner-Apu Mofiz
`(74) Attorney, Agent, or Firm-Connolly Bove Lodge &
`Hutz LLP
`
`(57)
`
`ABSTRACT
`
`A media server having at least one of a stride-based storage
`device space allocation scheme, stride-based method of
`striping data across multiple storage devices for continuous
`media streaming, server-based smoothing of variable bit(cid:173)
`rate streams, distributed architecture, and fault tolerance.
`
`18 Claims, 20 Drawing Sheets
`
`130
`adn:rission
`co:nroJ nodes ~
`
`\ diem
`j
`\
`r1
`Fl
`0 ~ Qlooa\-area \
`i network
`
`schedule
`database
`
`140
`
`',
`
`requem
`
`160
`
`124
`
`tr;msfer
`nodes
`120
`
`~/ 'I
`', high-'P.eod
`\.,.___,..\_) nt!twork
`!!~ 170
`
`100
`
`clients 150
`
`CSCO-1038
`Page 1 of 34
`
`
`
`US 7,103,595 B2
`Page 2
`
`OTHER PUBLICATIONS
`
`Carbrera and Long, Swift: Using Distributed Disk Striping
`to Provide High I/O Data Rates, Computing Systems 4(4)
`(Fall 1991), pp. 405-436.
`Anastasiadis, et al., Server-Based Smoothing of Variable
`Bit-Rate Streams, ACM Int'! Symp. On Multimedia, Oct.
`2001 (Ottawa, ON Canada).
`Anastasiadis, et al., Maximizing Throughput in Replicated
`Disk Striping ofVariable Bit-Rate Streams, USfNIXAnnual
`Tech. Conf., Monterey, CA (Jun. 2002).
`Anastasiadis, et al., Disk Striping Scalability in the Exedra
`Media Server, University of Toronto, SPIE/ACM Multime(cid:173)
`dia Computing and Networking Conf., San Jose, CA (Jan.
`2001).
`McManus and Ross, A Dynamic Programming Methodol(cid:173)
`ogy for Managing Prerecorded VBR Sources in Pack(cid:173)
`et-Switched Networks, University of Pennsylvania, Jan.
`1997, pp. 1-28.
`Zhao and Tripathi, Banwidth-Efficient Continuous Media
`Streaming Through Optimal Multiplexing (undated).
`Sen, et al., Proxy Prefix Caching for Multimedia Streams,
`IEEE, 1999, pp. 1310-1319.
`Sahu, et al., On the Efficient Retrieval of VBR Video in a
`Multimedia Server, IEEE, 1997, pp. 46-53.
`Biersack and Hamdi, Cost-optimal Data Retrieval for Video
`Servers with Variable Bit Rate Video Streams, NOSSDAV
`98 (Cambridge, UK) (undated).
`Salehi, et al., Supporting Stored Video: Reducing Rate
`Variability and End-to-End Resource Requirements
`through Optimal Smoothing, University of Massachusetts,
`SIGMETRICS 96, 1996, pp. 222-231.
`Shenoy, et al., Symphony: An Integrated Multimedia File
`System, University of Texas, pp. 1-17, Tech. Report TR
`97--09 (Mar. 1997).
`Sen, et al., Online Smoothing of Variable-Bit-Rate Stream(cid:173)
`ing Video, IEEE Transactions on Multimedia, v. 2, No. 1
`(Mar. 2000).
`Makaroff, et al., An Evaluation of VBR Disk Admission
`Algorithms for Continuous Media File Servers, ACM Mul(cid:173)
`timedia (1997) pp. 145-154.
`
`Santos and Muntz, Performance Analysis of the RIO Mul(cid:173)
`timedia Storage System with Heterogeneous Disk Configu(cid:173)
`rations, ACM Multimedia, 1998, pp. 1-6 and 227-238
`(1998).
`Gringeri, et al., Traffic Shaping, Bandwidth Allocation, and
`Quality Assessment for MPEG Video Distribution over
`Broadband Networks, IEEE Network, (Dec. 1998).
`Lakshman, et al., VBR Video: Tradeoffs and Potentials,
`Proceedings of the IEEE, vol. 86, No. 5, May 1998, pp.
`952-973.
`Martin, et al., The Fellini Multimedia Storage System,
`Information Sciences Research Center, Journal of Digital
`Libraries, pp. 1-22 (1997).
`Flynn and Tetzlaff, Disk Striping and Block Replication
`Algorithms for Video File Servers, Proceedings of Multi(cid:173)
`media, IEEE, 1996, pp. 590-597.
`Tewari, et al., High Availability in Clustered Multimedia
`Servers, Int'! Conf. on Data Engineering (Feb. 1996) pp.
`336-342.
`Ozden, et al., Fault-tolerant Architectures for Continuous
`Media Servers, ACM SIGMOD (Jun. 1996).
`Tobagi, et al., Streaming RAID-A Disk Array Management
`System For Video Files, ACM Multimedia, 1993, pp.
`393-400.
`Mourad, Doubly-Striped Disk Mirroring: Reliable Storage
`for Video Servers, Multimedia Tools and Applications 2,
`1996, pp. 273-297.
`Berson, et al., Fault Tolerant Design of Multimedia Servers,
`ACM, 1995, pp. 364-375.
`Gray and Shenoy, Rules of Thumb in Data Engineering,
`IEEE International Conference on Data Engineering, 2000.
`Bolosky, et al., The Tiger Video Fileserver, NOSSDAV,
`(Apr. 1996).
`Patterson, et al., A Case for Redundant Arrays of Inexpen(cid:173)
`sive Disks (RAID), University of California (undated).
`McVoy and Kleiman, Extent-like Performance from a
`UNIX File System, USENIX, Dallas, TX (Winter 1991), pp.
`1-12.
`* cited by examiner
`
`Page 2 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 1 of 20
`
`US 7,103,595 B2
`
`130
`admission
`comrol nodes b
`
`\
`
`' '
`
`... ,
`'
`
`cliem
`requests
`160
`
`\
`
`\
`
`'\
`
`' '
`
`' ~
`
`schedule
`dat3base
`
`ld~ ......
`
`140
`
`serve
`buffe
`
`network
`interfaces
`
`\
`)
`
`' l
`
`I
`
`I
`
`,
`
`transfer
`nodes
`120
`
`/
`
`//)
`
`"
`~~ high-sr.eed
`~ nenvork
`t~ 17C
`
`100
`
`~§
`
`114
`
`Figure 1
`
`clients 150
`
`Page 3 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 2 of 20
`
`US 7,103,595 B2
`
`stream stride index
`
`disk strides
`
`i
`
`i+l
`
`i+2
`
`i+3
`
`i+4
`
`i+5
`
`• Requestj
`
`• Requestj+l
`
`• Requestj+2 • Requestj+3
`
`Figure 2
`
`Page 4 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 3 of 20
`
`US 7,103,595 B2
`
`Original Clip
`
`1.0
`
`0.8
`
`..-...
`'-=
`~
`0.6
`~
`~
`~ 0.4
`~
`
`0~2
`
`0
`
`0 2 4 6 8 10 12 14 16 18
`Round
`
`Figure 3
`
`Page 5 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 4 of 20
`
`US 7,103,595 B2
`
`1.0
`G' ~ 0.8
`T"""1
`~ 0.6
`~ 0.4
`~ ~ 0.2
`0
`
`1.0
`
`,...-4
`
`cc ~ 0.8
`~ 0.6
`~ 0.4
`.....
`~ 0.2
`0
`
`Fixed-Grain (Disk 0)
`
`0 2 4 6 8 10 12 14 16 18
`Round
`
`Fixed-Grain ( Disk 1 )
`
`0 2 4 6 8 10 12 14 16 18
`Round
`
`Figure 4(a)
`
`Page 6 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 5 of 20
`
`US 7,103,595 B2
`
`Variable-Grain (Disk 0)
`
`1.0
`,-..
`~ 0.8
`,.-I
`~ 0.6
`~ 0.4
`......
`~ 0.2
`0
`
`LO
`~ \C 0.8
`~
`...-4
`~~~ 0.6
`~ 0.4
`......
`~ 0.2
`~
`0
`
`0 2 4 6 8 10 12 14 16 18
`Round
`
`Variable-Grain (Disk 1)
`
`0 2 4 6 8 10 12 14 16 18
`Round
`
`Figure 4(b)
`
`Page 7 of 34
`
`
`
`U.S. Patent
`
`Sep. 5, 2006
`
`Sheet 6 of 20
`
`US 7,103,595 B2
`
`Group-Grain ( Disk 0 )
`
`0 2 4 6 8 10 12 14 16 18
`Round
`
`Group-Grain ( Disk 1 )
`
`.-. 2.0
`\Cl
`~ L5
`""""4
`~ 1.0
`fj
`~ 0.5
`~
`
`0
`
`.-. 2.0
`\Cl
`~ 1.5
`~ 1.0
`
`0 2 4 6 8 10 12 14 16 18
`Round
`
`Figure 4(c)
`
`Page 8 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 7 of 20
`
`US 7,103,595 B2
`
`Fixed-Grain Striping
`
`150
`
`100
`
`~
`~ ......
`.....
`
`rl':)
`
`0
`~
`CJ
`~
`
`= 50
`::r z
`
`--·-- Fl= 30
`Fl= 10
`u
`- --- . Fl= 1
`
`0.0
`
`0.2
`
`0.6
`0~4
`Load
`
`0.8
`
`1.0
`
`Figure 5
`
`Page 9 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 8 of 20
`
`US 7,103,595 B2
`
`Fixed-Grain Striping
`
`1.2
`
`1.0
`
`0.8
`
`CJ
`
`"'Cf
`~
`~ y
`~ 0.6
`~
`,...
`CJ
`·~
`~
`
`- .. - ·A= 1
`A== 10
`M
`··•·· Fl =30
`
`I
`
`•
`
`,
`
`• • ~
`
`I
`
`I
`,
`
`0.4
`
`0.2
`
`0-+-----111--.... --11...-mc--w~-r---r-~.----.-----.,
`0.6
`0.8
`0.0
`0.2
`0.4
`1.0
`Load
`
`Figure 6
`
`Page 10 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 9 of 20
`
`US 7,103,595 B2
`
`Fixed-Grain Striping
`
`,.., .... ,,, ,.,
`.. , ,\ ..
`'"'
`,
`.. ~ . ...
`""
`.,
`. ....... ..
`.
`,,.
`,
`'
`..
`I ··--•·--·· ·-··· •
`
`"
`
`,
`
`\
`
`150
`
`l;"IJ s =
`~ 100
`
`~
`~
`QJ
`..c
`§ 50
`z
`
`(
`I
`
`- - - · Load = 80%
`---··· Load= 40%
`
`o ......... __ __, __ ...-_________________ __
`
`0
`
`800 1000
`600
`400
`200
`Block Bf ( Bytes x 1000 )
`
`Figure 7
`
`Page 11 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 10 of 20
`
`US 7,103,595 B2
`
`Fixed-Grain Striping
`II Max Rsrv II Avg Rsrv D Avg Diff
`1.0
`
`..-..
`u
`aJ
`~
`'-'
`
`~ 6
`-~
`~
`~
`l'IJ
`·~ Q
`'"O
`§
`0
`~
`
`0.8
`
`0.6
`
`0.4
`
`0.2
`
`Block Bf ( Bytes )
`
`Figure 8
`
`Page 12 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 11 of 20
`
`US 7,103,595 B2
`
`Fixed-Grain Striping
`II Max Rsrv II Avg Rsrv D Avg Diff
`LO
`
`.-..
`i:J 0.8
`~
`._,
`fl.)
`8 0.6
`..,...
`~
`~
`.~ 0.4
`Q
`'"C
`s:= 5 0.2
`
`~
`
`8
`
`16
`Number of Disks
`
`32
`
`Figure 9
`
`Page 13 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 12 of 20
`
`US 7,103,595 B2
`
`Fixed-Grain vs Variable-Grain Striping
`
`1000
`
`800
`
`- .. - ·Variable-Grain (80%)
`.. - -~ • • Fixed-Grain (8(11))
`•
`V ariab1e-Grain ( 40%)
`- .... • · Fixed-Grain ( 40%)
`
`• " -..
`.>< -..,•
`• ••
`• • •
`, ...
`, .
`
`200
`
`o--~-.-~---.,.~~...--~-.-~--.,..-~----..~~~
`. 20
`0
`60
`Number of Disks
`
`Figure 10
`
`Page 14 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 13 of 20
`
`US 7,103,595 B2
`
`250
`
`Variable-Grain Striping
`
`.......
`00
`
`s 200
`= ~ 150
`~ =
`t 100
`..c e
`= z 50
`
`........ Fl=30
`)( Fl= 10
`- 4- · Fl == 1
`
`0-+---------------------------
`0.2
`0.0
`0.6
`0 .. 4
`Load
`
`0.8
`
`1.0
`
`Figure 11
`
`Page 15 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 14 of 20
`
`US 7,103,595 B2
`
`Variable-Grain Striping
`
`1.0
`
`_ .. _.fl= 1
`" Fl= 10
`
`...... Fl= 30
`
`•
`
`CJ
`r.J
`
`QJ
`
`'"C
`B
`~ 0.6
`~
`~ 0.4
`·~ ~
`
`0.8
`
`0.2
`
`0.0
`
`0 .. 2
`
`0.6
`0.4
`Load
`
`0.8
`
`1.0
`
`Figure 12
`
`Page 16 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 15 of 20
`
`US 7,103,595 B2
`
`V aria hie-Grain Striping
`11 Max Rsrv II Avg Rsrv D Avg Diff
`LO
`
`,,-....
`CJ 0.8
`~
`fl)
`..._..,
`
`J 0.6
`
`~
`~
`17.1
`....-i 0.4
`Q
`
`"'O = = 0
`
`~
`
`0.2
`
`Figure 13
`
`8
`
`16
`Number of Disks
`
`32
`
`Page 17 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 16 of 20
`
`US 7,103,595 B2
`
`Fixed-Grain vs Variable .. Grain Striping
`61 Variable-Grain
`
`• Fi~d-Grn.in
`
`250
`
`200
`
`50
`
`0
`
`Figure 14
`
`16 Disks
`
`Page 18 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 17 of 20
`
`US 7,103,595 B2
`
`Simulated Disk Mode
`II AvgRsrv
`II Max Rsrv
`flt Avg Meas
`II Max Meas
`
`1.0
`
`..-..
`u 0.8
`~
`i:n
`._..
`~ s 0.6
`......
`E-f
`~ rn
`.......
`Q
`"'O
`
`0.4
`
`~ = 0
`0.2 =
`
`Fixed-Grain
`2 Disks
`
`Variable-Grain
`
`Figure 15
`
`Page 19 of 34
`
`
`
`1660
`clients
`
`dectXl.er
`
`interfa:e
`nenvor~
`
`buffe
`client
`
`1690
`
`1680
`
`1670
`
`1650
`
`network
`high·i~al
`
`1640
`
`1600
`
`Figure 16
`
`1610
`server
`
`1620
`
`1630
`
`1615
`
`e •
`
`•
`00
`
`Page 20 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 19 of 20
`
`US 7,103,595 B2
`
`stream
`requests
`
`Admission
`Control
`
`1710
`
`1715
`
`Buffer
`Manager
`1720
`
`Dispatcher
`
`1730
`
`sche
`dules
`
`Meradara
`Managers
`
`1725
`
`Disk
`Managers
`
`1735
`
`,------,
`1 System 1
`I
`I
`1 :Memory1
`,___ ___ -.... - -r--
`I
`I
`
`, -
`- - - -,
`-
`1 Network1
`I
`I
`1 In rerface 1
`---t---
`I
`I
`
`Figure 17
`
`Page 21 of 34
`
`
`
`U.S. Patent
`
`Sep.5,2006
`
`Sheet 20 of 20
`
`US 7,103,595 B2
`
`round
`offset
`
`1
`
`i+l
`
`J
`
`•••
`
`dispatch
`queues
`
`Figure 18
`
`J
`
`(
`
`2013
`
`./
`
`access t\me
`teservat1on
`tor one round
`
`.
`p11n1ary
`-+-__.__-L---...___..___ ~ backup
`•
`
`2012
`
`•
`
`•
`
`f
`
`j~
`
`I
`
`'
`I - - - - - - - - -
`
`611111• ~
`11!11 !lilUmll!lll!!l1111Hl~lll
`backup
`I I I
`. pnn1ary
`
`I
`
`2010
`
`2000
`
`Figure 19
`
`Page 22 of 34
`
`
`
`US 7, 103,595 B2
`
`1
`STREAMING SERVER
`
`CROSS-REFERENCE TO RELATED
`APPLICATION(S)
`
`This application is related to and claims pnonty to
`Canadian patent application entitled STRIDE-BASED
`DISK SPACE ALLOCATION SCHEME having serial num(cid:173)
`ber 2,331,474, by Stergios V. Anastasiadis, filed Jan. 19,
`2001; to Canadian patent application entitled DISTRIB(cid:173)
`UTED MEDIA SERVER ARCHITECTURE having serial
`number 2,335,521, by Stergios V. Anastasiadis, filed Feb. 9,
`2001; and to Canadian patent application entitled SERVER
`BASED SMOOTHING OF VARIABLE BIT RATE
`STREAMS having serial number 2,335,540, by Stergios V. 15
`Anastasiadis, filed Feb. 9, 2001.
`The entire teachings of the above Canadian patent appli(cid:173)
`cations are further incorporated herein by reference.
`
`2
`System design complications coupled with excessive
`expectations from technological progress previously dis(cid:173)
`couraged the development of media servers efficiently sup(cid:173)
`porting video streams with variable bit rates. Several media
`server designs either i) support only constant bit-rate
`streams, ii) make resource reservations assuming a fixed bit
`rate for each stream, or iii) have only been demonstrated to
`work with constant bit rate streams.
`It is therefore an aspect of an object of the present
`10 invention for providing a distributed continuous-media
`server architecture, called Exedra, that efficiently supports
`variable bit-rate streams and reduces the requirements for
`storage device space, storage device bandwidth, buffer
`space, and network bandwidth with respect to servers that
`support only constant bit-rate streams.
`Vast storage and bandwidth capacity requirements of even
`compressed video streams make it necessary to stripe video
`files across multiple storage devices. Assuming that a media
`storage server serves requests for several different stream
`20 files, appropriate striping makes it possible to scale the
`number of supported streams to the limit of the server
`resources, independent of the particular stream file being
`requested by clients. This is possible by retrieving different
`parts of each stream file from different storage devices, thus
`25 restricting the degree of imbalance in utilization among the
`storage devices.
`However, it has been previously shown that both load
`imbalance across disks and disk overhead is causing disk
`striping of variable bit-rate streams to be efficient only on
`30 disk arrays of limited size. Therefore, the scalability of
`network servers that stripe variable bit-rate streams across
`multiple storage devices is a fundamental problem.
`It is an aspect of an object of the present invention for
`35 providing a new storage device space allocation technique
`and striping policies for variable bit-rate streams that
`increase system throughput and improve scalability.
`The Internet and online services introduce increasing
`requirements for quality-of-service guarantees in order to
`40 ensure that a failure does not result in denial of service for
`clients. Hard drives or storage devices continue to be a major
`source of failure. This is not surprising since disks are
`essentially the only moving mechanical parts of computers.
`It is therefore an aspect of an object of the present
`45 invention for providing fault tolerance in storage device
`arrays and clusters of computer nodes that support variable
`bit-rate streams.
`Variable bit-rate encoding of video streams can achieve
`quality equivalent to constant bit-rate encoding while requir-
`50 ing average bit rate that is lower by 40%. However, variable
`bit-rate streams have high variability in their resource
`requirements, which can lead to low utilization of storage
`device and network bandwidth in the common case. This
`occurs because the aggregate bandwidth requirements of
`concurrently served streams can be significantly higher than
`on average at particular time instances, and the admission
`control process bases its decisions on peak aggregate
`demand when considering new stream requests.
`In order to improve resource utilization and the through(cid:173)
`put of the system, a number of smoothing techniques have
`been proposed that can remove peaks in the required transfer
`bandwidth of individual streams by appropriately prefetch(cid:173)
`ing stream data at times of lower bandwidth demand. To date
`smoothing schemes always prefetched data into the client
`65 buffers. Although such an approach can improve the utili(cid:173)
`zation of both storage device and network bandwidth, it is
`dependent on the amount of buffer space available at the
`
`OTHER REFERENCES
`
`Stergios V. Anastasiadis, Supporting Variable Bit-Rate
`Streams in a Scalable Continuous Media Server, PhD
`Dissertation, University of Toronto, public on January 2002.
`Stergios V. Anastasiadis, Kenneth C. Sevcik, Michael
`Stumm, Disk Striping Scalability in the Exedra Media
`Server, SPIE/ACM Multimedia Computing and Network
`Conference, San Jose, Calif., January 2001 at 175.
`Stergios V. Anastasiadis, Kenneth C. Sevcik, Michael
`Stumm, Modular and Efficient Resource Management in the
`Exedra Media Server, Usenix Symposium on Internet Tech(cid:173)
`nologies and Systems, San Francisco, Calif., March 2001 at
`25.
`Stergios V. Anastasiadis, Kenneth C. Sevcik, Michael
`Stumm, Server-Based Smoothing of Variable Bit-Rate
`Streams, ACM Multimedia Conference, Ottawa, ON, Octo(cid:173)
`ber 2001 at 147.
`The entire teachings of the above references are further
`incorporated herein by reference.
`
`FIELD OF INVENTION
`
`This invention relates to network servers and, in
`particular, a streaming server that supports variable bit-rate
`streams and has at least one of a stride-based storage device
`space allocation scheme, stride-based method of striping
`data across multiple storage devices, distributed
`architecture, fault tolerant operation, and server-based
`smoothing.
`
`BACKGROUND OF THE INVENTION
`
`Spatial and temporal compression have made practical the
`storage and transfer of digital video streams with acceptable
`quality. Standardization (for example, through the MPEG
`specifications) has facilitated widespread distribution and 55
`use of compressed video content in a range of applications
`from studio post-production editing to home entertainment
`(e.g. Digital Versatile Disks). Although media streams can
`optionally be encoded at a constant bit rate, it has been
`shown that equally acceptable quality can be achieved using 60
`variable bit-rate encoding with average bit rates reduced by
`40%.
`As the installed network bandwidth increases, scalable
`network servers are becoming the dominating bottleneck in
`the wide deployment of broadband services. Therefore, the
`potential scalability of network servers that support variable
`bit-rate media streams is becoming a fundamental problem.
`
`Page 23 of 34
`
`
`
`4
`FIG. 6 is a graph of a ratio of total number of rejected
`streams over total number of accepted streams versus load
`under Fixed-Grain Striping;
`FIG. 7 is a graph of number of streams versus block size
`Bf under Fixed-Grain Striping;
`FIG. 8 is a graph of round disk time versus block size Bf
`under Fixed-Grain Striping;
`FIG. 9 is a graph of disk busy time per round versus
`number of disks under Fixed-Grain Striping;
`FIG. 10 is a graph of number of streams versus number of
`disks under Fixed-Grain Striping;
`FIG. 11 is a graph of number of streams versus load under
`Variable Grain Striping;
`FIG. 12 is a graph of a ratio of total number of rejected
`streams over total number of accepted streams versus load
`under Variable Grain Striping;
`FIG. 13 is a graph of round disk access time versus
`number of disks under Variable Grain Striping;
`FIG. 14 is a graph of number of streams versus individual
`stream types in both Fixed-Grain versus Variable-Grain
`Striping;
`FIG. 15 is a graph of round disk time in Simulated Disk
`Mode;
`FIG. 16 is a server-based system for smoothing of vari(cid:173)
`able bit-rate streams;
`FIG. 17 is a functional block diagram of the distributed
`media server of FIG. 1;
`FIG. 18 is a block diagram of a circular vector of dispatch
`queues that keep track of admitted streams yet to be acti(cid:173)
`vated; and
`FIG. 19 is shown a block diagram of a fault-tolerant disk
`35 array for a media server.
`DETAILED DESCRIPTION
`
`15
`
`20
`
`25
`
`US 7, 103,595 B2
`
`3
`client. However, emergence of client devices with widely
`different hardware configurations make it necessary to
`reconsider such assumptions.
`It is therefore an aspect of an object of the present
`invention a smoothing method that uses buffer space avail(cid:173)
`able in the server and provides efficient striping of variable
`bit-rate streams across either homogeneous or heteroge(cid:173)
`neous storage devices.
`
`SUMMARY OF THE INVENTION
`
`10
`
`According to an aspect of the invention, there is provided
`a method and system for accessing variable bit-rate streams
`from one or a multitude of secondary storage devices. The
`system provides that data for each stream are retrieved
`according to a prespecified rate that may vary over time. The
`space of each storage device is managed as a collection of
`fixed-size chunks with length larger than a given minimum.
`The data transfers occur in periods (or rounds) of fixed
`duration. The data of each stream is distributed across a
`multitude of storage devices in a way that only one storage
`device is accessed for a stream during one or a multitude of
`rounds. Detailed accounting is done for the access time of
`the storage devices, the transmission time of the network
`transfer devices, and the available memory space in the
`system for each round. The space of each storage device is
`managed independently from that of the others. The avail(cid:173)
`able memory is allocated contiguously in the virtual space
`for each access of a storage device, and can be deallocated
`in units smaller than the length of the original allocation. 30
`When storage devices fail, data redundancy and extra
`reserved device channel bandwidth guarantee uninterrupted
`system operation.
`According to a further aspect of the invention, there is
`provided a server-based smoothing method that uses only
`buffer space available at the server for smoothing storage
`device data transfers. The smoothing method is also
`extended to support striping of variable bit-rate streams
`across heterogeneous storage devices. The present invention
`maximizes the average number of users supported concur- 40
`rently in continuous-media server systems by applying
`smoothing techniques and combining them appropriately
`with storage device striping and admission control policies.
`In order to prevent excessive smoothing from exhausting the
`available buffer space, data prefetching is done as long as the 45
`proportion of server buffer required by each stream does not
`exceed the corresponding proportion of the required storage
`device bandwidth. Thus, the smoothing process is adjusted
`automatically, according to the total memory and storage
`device bandwidth available in the server.
`
`Referring to the drawings and initially to FIG. 1, there is
`illustrated a block diagram of a distributed streaming server
`100 in accordance with an embodiment of the present
`invention. The distributed streaming server 100 comprises
`storage devices 110 for storing media stream data 114,
`transfer nodes 120, admission control nodes 130, and a
`schedule database 140 containing scheduling information.
`The media stream data 114 are compressed, such as, accord(cid:173)
`ing to the MPEG-2 specification, with constant quality
`quantization parameters and variable bit rates. Clients 150
`with appropriate stream decoding capability send client/
`playback requests 160 and receive stream data 114 via a
`50 high-speed network 170. Alternate compression schemes as
`is known in the art may also be used.
`In the streaming server 100, the stream data 114 are
`retrieved from the storage devices 110 and sent to the clients
`150 through the Transfer Nodes 120. Both the admission
`55 control nodes 130 and the transfer nodes 120 use of stream
`scheduling information maintained in the Schedule Data(cid:173)
`base 140.
`The streaming server 100 is operated using the server(cid:173)
`push model, but other models are possible. When a playback
`60 session starts for a client 150, the server 100 periodically
`sends data to the client 150 until either the end of the stream
`is reached, or the client 150 explicitly requests suspension of
`the playback. The server-push model reduces the control
`traffic from the client to the server and facilitates resource
`65 reservation at the server side, when compared to a client-pull
`model. The data transfers occur in rounds of fixed duration
`T round: in each round, an appropriate amount of data is
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The present invention will be described in detail with
`reference to the accompanying drawings, in which like
`numerals denote like parts, and in which
`FIG. 1 is a block diagram of a distributed streaming server
`in accordance with an embodiment of the present invention;
`FIG. 2 is a logical representation of a stride-based allo(cid:173)
`cation of disk space on one disk of FIG. 1;
`FIG. 3 is a graph of data requirements of twenty con(cid:173)
`secutive rounds one second each, in an MPEG-2 clip;
`FIGS. 4(a) to (c) are graphs of data accesses for the clip
`of FIG. 3 using alternative striping techniques over two
`disks;
`FIG. 5 is a graph of number of streams versus load under
`Fixed-Grain Striping;
`
`Page 24 of 34
`
`
`
`US 7, 103,595 B2
`
`15
`
`5
`6
`retrieved from the storage devices 110 into a set of server
`on one storage device of FIG. 1. In stride-based allocation,
`buffers 122 reserved for each active client 150.
`storage device space is allocated in large, fixed-sized chunks
`Concurrently, data are sent from the server buffers 122 to the
`(i, i+l, i+2, i+3, i+4, i+5) called strides 200, which are
`client 150 through network interfaces 124. Round-based
`chosen larger than the maximum stream request size per
`operation is used in media servers in order to keep the
`storage device during a round. The stored streams are
`reservation of the resources and the scheduling-related
`accessed sequentially according to a predefined (albeit
`bookkeeping of the data transfers manageable.
`variable) rate; therefore, the maximum amount of data
`Due to the large amount of network bandwidth required
`accessed from a storage device during a round for a stream
`for this kind of service, preferably the server 100 is con(cid:173)
`is known a priori. Stride-based allocation eliminates external
`nected to a high-speed network 170 through a multitude of
`10 fragmentation, while internal fragmentation remains negli(cid:173)
`network interfaces 124. The amount of stream data periodi(cid:173)
`gible because of the large size of the streams, and because
`cally sent to the clients 150 are determined by the decoding
`a stride may contain data of more than one round as shown
`frame rate of the stream and the resource management
`in stride i+3 of FIG. 2. The contents of each stride are
`policy of the server 100. A policy is to send to the client
`tracked in a stream stride index 210.
`during each round the amount of data that will be needed for
`When a stream is retrieved, only the requested amount of
`the decoding process of the next round; any other policy that
`data (one of Requests j, j+l, j+2, or j+3), and not the entire
`does not violate the timing requirements and buffering
`stride 200, necessary for each round is fetched to memory at
`constraints of the decoding client is also acceptable.
`a time. Since the size of a stream request per round never
`The stream data are stored across multiple storage devices
`exceeds the stride size, at most two partial stride accesses
`110, such as hard drives or disks, as shown in FIG. 1. Every
`(two seek and rotation delays) are required to serve the
`storage device 110 is connected to a particular Transfer
`request of a round on each storage device. Thus, the stride(cid:173)
`Node 120, through a Storage Interconnect 112, which is
`based allocation scheme sets an upper-bound on the esti(cid:173)
`either i) a standard I/O channel (for example, Small Com(cid:173)
`mated storage device access overhead during retrieval. This
`puter System Interface), ii) standard network storage equip(cid:173)
`avoids an arbitrary number of actuator movements required
`ment (for example, Fibre-Channel), or iii) a general purpose
`by prior allocation methods. This also keeps storage device
`network (as with Network-Attached Secure Disks).
`access head movements to a minimum. For example, while
`Alternately, part of the server functionality can be offloaded
`i+3 stride 200 contains parts ofRequestj+l and Requestj+2,
`to network-attached storage devices.
`and Request j+3, there is no Request that is stored over more
`The Transfer Nodes 120 are computers responsible for
`than two strides 200.
`scheduling and initiating all data accesses from the attached 30
`While storing the data of each storage device request
`storage devices 110. Data arriving from the storage devices
`contiguously, on a disk for example, would reduce the
`110 are temporarily staged in the Server Buffer memory 122
`storage device overhead to a single seek and rotation delay
`of the Transfer Node 120 before being sent to the client 150
`(instead of two at most), the overhead for storage manage-
`through the high-speed network 170. The bandwidth of the
`ment (bookkeeping) of large highly utilized storage devices
`system bus (such as the Peripheral Component Interconnect) 35
`could become significant. An advantage of the present
`is the critical resource within each Transfer Node 120 that
`invention is the reduction of overhead for storage manage(cid:173)
`essentially defines the number and the capacity of the
`ment.
`attached network or I/O channel interfaces.
`In the sequence definitions that follow, a zero value is
`Playback requests 160 arriving from the clients 150 are
`assumed outside the specified range.
`initially directed to an Admission Control Node 130, where 40
`In the server 100 with D functionally equivalent storage
`it is determined whether enough resources exist to activate
`devices, the stream Network Sequence, Sn, of length Ln
`the requested playback session either immediately or within
`defines the amount of data, Sn[i], l=<i=<Ln, that the server
`a few rounds. If a new playback request is accepted, com(cid:173)
`100 sends to a particular client 150 during round i after its
`mands are sent to the Transfer Nodes 120 to begin the
`playback starts. Similarly, the Buffer Sequence Sb of length
`appropriate data accesses and transfers. The computational 45
`Lb=Ln+l defines the server buffer 122 space, Sb(i), =<i=
`complexity of the general stream scheduling problem is
`<Lb, occupied by the stream data during round i. The
`combinatorial in the number of streams considered for
`Storage Device Sequence Sd of length Ld=Ln defines the
`activation and the number of reserved resources. As the
`total amount of data Sd(i), =<i=<Ld-1, retrieved from all the
`acceptable initiation latency is limited, a simple scheduling
`approach with complexity linear with the number of rounds 50 storage devices 110 in round