`Mattis et ai.
`
`111111111111111111111111111111111111111111111111111111111111111111111111111
`US006292880Bl
`US 6,292,880 BI
`Sep.18,2001
`
`(10) Patent No.:
`(45) Date of Patent:
`
`(54) ALIAS-FREE CONTENT-INDEXED OBJECT
`CACHE
`
`(75)
`
`Inventors: Peter Mattis, Belmont; John Plevyak,
`San Francisco, both of CA (US);
`Matthew Haines, Laramie, WY (US);
`Adam Beguelin, San Mateo, CA (US);
`Brian Totty, Foster City, CA (US);
`David Gourley, Palo Alto, CA (US)
`
`(73) Assignee: Inktomi Corporation, Foster City, CA
`(US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.c. 154(b) by 0 days.
`
`(21) Appl. No.: 09/060,886
`
`(22) Filed:
`
`Apr. 15, 1998
`
`Int. CI? ...................................................... G06F 12/00
`(51)
`(52) U.S. CI. ................................ 711/216; 7111118; 707/7
`(58) Field of Search ................................ 7111118, 3, 123,
`7111206,216,221; 707/3, 10,203,7
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`4,918,646 *
`5,542,087 *
`5,611,049
`5,748,954
`5,850,443
`5,870,763
`
`Hirose ...................................... 707/3
`Neimat et al. ......................... 707/10
`Pitts ... ... ... ... .... ... ... ... ... ... .... .... 707/10
`Mauldin ................................... 707/8
`Oorschot et al. ...................... 380/21
`Lomet ... ... ... .... ... ... ... ... ... ...... 707/202
`
`4/1990
`7/1996
`3/1997
`5/1998
`* 12/1998
`2/1999
`
`C~agTable102
`
`1Q4a
`
`1Q4b
`
`110
`
`................... 709/101
`2/1999 Copeland et al.
`5,872,969
`5,978,791 * 11/1999 Farber et al.
`............................ 707/2
`* cited by examiner
`Primary Examiner-Matthew Kim
`Assistant Examiner-Matt Anderson
`(74) Attorney, Agent, or Firm-Hickman Palermo Truong
`& Becker LLP; Christopher 1. Palermo; Marcel K. Bingham
`
`(57)
`
`ABSTRACT
`
`A method for caching information objects is provided.
`Information objects are stored in portions of a non-volatile
`storage device called arenas, which are contiguous regions
`from which space is allocated in parallel. Objects are con(cid:173)
`tiguously allocated within an arena and are mapped to
`directory tables that provide an efficient search mechanism.
`Each object is identified by a name key and a content key.
`The name key is constructed by applying a hash function to
`the composition of the name or URL of the object along with
`implicit or explicit context about the request. The content
`key is constructed by applying a hash function to the entire
`contents of the object data. Buckets and blocks in the
`directory tables store tags and subkeys derived from the
`keys. Since duplicate objects that have different names will
`hash to the same content key, the cache can detect duplicate
`objects even though they have different names, and store
`only one copy of the object. As a result, cache storage usage
`is dramatically reduced, and tracking object aliases is not
`required. The disclosure also encompasses a computer
`apparatus, computer program product, and computer data
`signal embodied in a carrier wave that are configured
`similarly.
`
`26 Claims, 26 Drawing Sheets
`
`132a-n
`
`106
`
`104n
`
`Size 12.0.
`
`Vector of Alternates
`
`Vector of Alternates
`
`m.a
`Request Info
`Response Info
`Object Key
`
`Page 1 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 1 of 26
`
`US 6,292,880 BI
`
`FIGURE 1
`
`3.Q
`Proxy
`
`4Q
`Server
`
`1O.a
`Client
`
`1Qb
`Client
`
`Page 2 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 2 of 26
`
`US 6,292,880 BI
`
`FIGURE 2
`
`au Cache
`
`-to-
`
`I Ta~s I
`
`Proxy
`
`3D
`
`IQ
`Protocol
`Engine
`
`~
`
`l'
`
`I/O Core
`
`6.Q
`
`2.Q
`Intemet
`
`~
`~
`
`-....
`
`9.O.a
`
`,
`,
`
`-....
`
`9.Ob
`
`I
`
`90n
`
`-"
`
`~
`
`--'
`
`'-
`
`(.
`
`.......
`
`C
`
`.....
`
`5.Q
`Request for object
`
`/
`
`/
`~ J
`
`.1Qa
`Client
`
`Page 3 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 3 of 26
`
`US 6,292,880 BI
`
`FIGURE 3A
`
`52
`Object
`
`5.4
`Hash function
`
`"
`
`(Object Key 56
`
`sa Set Subkey
`
`5..9. Tag
`Subkey
`
`5..[ Remainder Subkey
`
`FIGURE 38
`
`5.3.
`Object name
`
`"' SA
`
`Hash function
`
`\... •
`
`6.4
`fifi
`Subkey 1 Subkey 2
`
`62.
`Name Key
`
`Page 4 of 48
`
`
`
`I-"
`~
`Q
`00
`00
`N
`\0
`N
`0'1
`rJ'l
`
`e
`
`0'1
`N
`o ....,
`~
`~ .....
`'JJ. =(cid:173)~
`
`'"""
`C
`C
`N
`~CIO
`'"""
`~ '?
`'JJ.
`
`~ = .....
`~ .....
`~
`rJl .
`d .
`
`L121
`
`132n
`
`~~ ~~132a-n
`
`132a ,
`
`-I ·-·1·131~i1 _±,_
`
`I -,.--' _---,
`
`-
`
`c=::Open Directory 130
`
`• ~ ~
`Vector of Alternates 122n I
`Vector of Alternates .122..b I
`Vector of Alternates 122a I
`
`Object Key
`Response Info
`Request Info
`123a
`
`Stored Object
`
`• L 124
`
`•
`
`112b
`
`Size 12.Q
`
`~
`
`-+----
`
`Disk location 1.18
`
`Subkey .116 _
`
`-~
`112n
`
`11Qa j' "~3;a1lC:~;"S~.~~_.=lt_ ~1~;]1~1 n
`11Qh:n
`
`11Qn
`
`J
`
`'-106
`
`104n
`
`(
`
`11 O~I-I .'------1
`
`1.1Qc
`
`112a
`
`---....
`
`J
`
`1Mb
`
`liMa
`
`DirectoryTable 110
`
`~2
`
`FIGURE4A
`
`Page 5 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 5 of 26
`
`US 6,292,880 BI
`
`FIGURE 48
`
`Storage device illl.a
`
`.122a
`12.3.g
`"English"J "English"J
`Object Key
`
`12.3..b.
`"IE4"J "IE4"J
`Object Key
`
`123c
`"Navigator/' .. -
`Object Key
`
`Stored Object 124a
`
`
`
`11 ...... __ 12-::-2_b _----'H Stored Object 12A.Il H __ '---1,-22-n-----'
`
`/
`
`Vectors of Alternates122a-n
`
`Page 6 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 6 of 26
`
`US 6,292,880 BI
`
`DIRECTORY TABLE 110
`
`/
`
`110a
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`SEGMENT 111 a
`
`SEGMENT 111 b
`
`'"'C!":!M~NTA 11
`~I...\..:l c.
`I. C
`
`-
`
`-
`
`•
`•
`.!. -
`•
`•
`•
`•
`•
`- .. - - -
`•
`•
`•
`•
`•
`•
`•
`•
`•
`liOn
`
`-
`
`-
`
`FIGURE 4C
`
`Page 7 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 7 of 26
`
`US 6,292,880 BI
`
`FIGURE 5
`
`Directory Table 110
`r - - - - - - - - - - - - - - - - - - - - - l
`
`Tail
`141e
`
`Head
`141a
`
`141d
`
`141b
`
`141c
`
`Subkey 132 I~
`\
`
`____ -~~ Fragment 208c
`
`I
`I
`
`Fragment 208a
`
`130b--
`
`-+---I~
`
`Fragment 208b
`
`- -.... 128c
`
`Page 8 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 8 of 26
`
`US 6,292,880 BI
`
`FIGURE 6
`
`Pool 2QQa
`
`I
`I
`
`Pool 2O.Oh
`
`I
`I
`I
`I
`
`1
`
`I
`
`- - - -
`
`~ ~
`
`Arena
`Arena
`.1
`\
`~~---~~-~~~~~204n
`204b
`
`Storage Device lliJB
`
`I
`
`I
`
`200
`
`( -
`
`;:
`
`202
`~---:
`/
`
`Header
`
`I
`
`I
`
`I
`
`I
`
`I
`
`Arena
`\
`'---204a
`
`--~~-
`
`204a
`
`\
`
`-,::,.
`
`2.Q2 Pool Header
`Magic
`Version No.
`No. of Arenas
`
`2O.6..a Arena Header
`Empty
`Corrupt
`
`I
`
`2ll6n Arena Header
`Empty
`Corrupt
`
`Fragment
`
`-,
`
`2.Q8.d Fragment
`Header
`
`2.O.B..e Fragment Data
`-
`
`"
`
`I
`
`I
`
`12Qfu; Magic 1120ful Key 112lliili Length I
`
`Page 9 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 9 of 26
`
`US 6,292,880 BI
`
`FIGURE 7
`
`210
`204a
`~~ (
`
`2O.8d Fragment
`Header
`
`2.O.ae Fragment
`Data
`
`Page 10 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 10 of 26
`
`US 6,292,880 BI
`
`FIGURE 8A
`
`to)
`
`"Low water mark"
`"High water mark"
`
`Last arena accessed
`Number of accesses
`
`Garbage
`Collection
`
`8.02
`Select pool
`
`.8.0.4
`Select arena
`
`8.0.6.
`Select fragment
`
`Selection factors
`
`BQ8.
`Fragment to be
`deleted?
`
`B.12
`Mark fragment as
`deleted
`
`81Q
`Write fragment to new
`arena
`
`8.14
`Update Directory Table
`
`( 81fi Don~
`
`Page 11 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 11 of 26
`
`US 6,292,880 BI
`
`Write fragment
`to new arena
`
`FIGURE 88
`
`90
`
`Update
`Open
`Index
`
`Yes
`
`Step 806
`
`Reset Top
`Pointer
`
`(Wfi Don~
`
`Obtain lock on
`selected Arena
`
`r---~:"--_ 582
`Write object header
`in available Arena
`
`586
`
`Write object data 1----1
`in selected Arena
`
`Page 12 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 12 of 26
`
`US 6,292,880 BI
`
`820
`
`822
`
`824
`
`Synchronization
`
`Write object to
`cache
`
`Create metadata
`in Open Directory
`
`Write and synch 1----826
`object data to disk
`
`Get next metadata
`from Open Directory
`
`1----828
`
`FIGURE Be
`
`B21
`Garbage collection
`
`m
`For each fragment in Arena, delete
`directory metadata that points to the
`fragment and write it to disk
`
`B.3.1
`Modify pool header in
`memory to show Arena as
`empty
`
`~
`Write pool header and
`sync to disk
`
`B2.3.
`Copy metadata from Open
`Directory to Directory Table
`
`825
`Sync changes
`
`Page 13 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 13 of 26
`
`US 6,292,880 BI
`
`FIGURE 80
`
`Open Directory:
`"checkout_read"
`
`Key
`
`NO
`
`B32
`Get lock for key
`YES
`
`B..3.4.
`Compute bucket for
`key
`
`NOT FOUND
`
`8.3fi
`Scan blocks in
`bucket's list for key
`
`FOUND
`
`8.3..8
`Is block being created
`or destroyed?
`
`.8A.Q
`Return "not available"
`error
`
`B..42
`Increment block
`reader count
`
`8..4A
`Return copy of open
`directory block
`
`Ma
`Retum "does not exist"
`error
`
`NOT FOUND
`
`B..4fi
`Cache Directory
`lookup
`FOUND
`
`B.5.Q
`Create Open Directory
`block; initialize block
`from Directory Index;
`add block to bucket's
`block list
`
`Page 14 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 14 of 26
`
`US 6,292,880 BI
`
`FIGURE 8E
`
`Open Directory:
`"checkout_write"
`
`Key
`
`NO
`
`8M
`Get lock for key
`YES
`
`B..5..6
`Compute bucket for
`key
`
`B5.8
`Scan blocks in
`bucket's list for key
`
`.8.6A.
`Is block being created
`or destroyed?
`
`BA.Q
`Return "not available"
`error
`
`a4.2
`Increment block
`reader count
`
`.8..4..4
`Return copy of open
`directory block
`
`B.62
`Return "does not exist"
`error
`
`NOT FOUND
`
`.8.6D
`Cache Directory
`lookup
`FOUND
`
`85D.
`Create Open Directory
`block; initialize block
`from Directory Index;
`add block to bucket's
`block list
`
`Page 15 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 15 of 26
`
`US 6,292,880 BI
`
`FIGURE 8F
`
`Open Directory:
`"checkout_create"
`
`Key
`
`NO
`
`Get lock for key
`YES
`BIB.
`Compute bucket for
`key
`
`B..8..Q
`Scan blocks in
`bucket's list for key
`
`NOT FOUND
`
`aB.6
`Return "already exists"
`error
`
`FOUND
`
`.aM
`Cache Directory
`lookup
`
`NOT FOUND
`
`.aM.
`NO
`>----~ Return "not available"
`error
`
`BB.B.
`Create Open Directory
`block; initialize block
`from Directory Index;
`add block to bucket's
`block list
`
`B.9.2
`Return copy of open directory
`block and mark block as
`being modified
`
`BBfi
`Increment block
`reader count
`
`8..8.B.
`Initialize block from
`key
`
`aao
`Increment block writer count
`and mark copy as created
`
`Page 16 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 16 of 26
`
`US 6,292,880 BI
`
`FIGURE 9A
`
`91B.
`Look up key value in
`Directory Table
`
`YES
`
`9.1ll
`Obtain object and
`deliver to client
`
`.922
`Update Open
`Directory if needed
`
`924.
`Cache Miss
`
`92fi
`Obtain object from
`source server
`
`92B
`Done
`
`9Q2
`LOOKUP
`
`9.DA
`Convert object name
`in request to key value
`
`9ili3.
`Look up request key
`value in Open Directory
`
`9.12
`Look up request key
`value in Tag Table
`
`am
`Store Tag Table miss
`
`Page 17 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 17 of 26
`
`US 6,292,880 BI
`
`FIGURE 98
`
`. /
`
`--
`
`r Block
`I
`
`-
`
`NO
`
`Open Directory
`block checkin
`
`.-..... -
`'/
`93[l
`Get lock for key
`
`YES
`
`\/
`
`.9..32
`Is block being
`modified?
`NO
`\/
`9.4.6.
`Decrement reader
`count
`
`YES .....
`.,..
`
`.9.3A
`Decrement writer count
`
`\/
`
`a3..6
`Is checkin successful?
`
`YES .....
`,
`
`NO
`
`9..42
`Copy block info back to
`original block
`I
`
`NO
`
`-}-
`
`'if
`
`'I
`9..3.8
`Is delete checkin flag
`set?
`YES
`\I
`
`9.4A
`Mark block as deleted
`
`....
`..... ,1/
`
`9A.O. Done
`
`Page 18 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 18 of 26
`
`US 6,292,880 BI
`
`FIGURE 9C
`
`-
`( each e 100 k u p },,--,~::::.---.....,
`
`Key
`
`95D.
`I----.--;; ...... ;:w
`-~ Return not found
`
`\/
`B.4.8.
`Check out directory
`blocks
`
`\1
`95A
`Check in directory
`blocks
`
`95fi
`Return found
`
`~
`
`\1
`
`.9..52 Done
`
`"'
`
`)
`
`Page 19 of 48
`
`
`
`u.s. Patent
`
`Sep. 18, 2001
`
`Sheet 19 of 26
`
`US 6,292,880 BI
`
`FIGURE 90
`
`Key
`
`FAILURE
`
`9.5.8
`Check out directory
`blocks
`
`SUCCESS
`
`9.5.4.
`Check in directory blocks
`with deletion flag set
`
`Page 20 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 20 of 26
`
`US 6,292,880 BI
`
`FIGURE 9E
`
`Object name
`Request info
`
`I Cache open read
`
`\.,.. , .......
`
`FAILURE
`
`9M
`Check out vector for
`reading
`SUCCESS
`\1
`
`912
`FAILURE ......
`.;' Return no document
`error
`
`9.6.6.
`Use request info to select
`alternate from vector
`SUCCESS
`,V
`.9.6.8
`Check in vector
`
`9lQ
`Read object pointed to
`by alternate
`
`Page 21 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 21 of 26
`
`US 6,292,880 BI
`
`FIGURE 9F
`
`Object name
`Request info
`Response info
`
`Cache open write
`
`'I
`
`Check out vector for
`writing
`SUCCESS
`
`FAILURE
`
`9.8A
`Create vector
`
`9.B2
`FAILURE
`r--I...---~::; Return no document
`error
`
`\1
`9lfi
`Use request info to define
`alternate for vector; add
`alternate to vector
`
`SUCCESS
`\V
`9lB.
`Check in vector
`
`\1
`9.8.0.
`Write object pointed to
`by alternate
`
`Page 22 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 22 of 26
`
`US 6,292,880 BI
`
`FIGURE 9G
`
`""'"
`
`Object name
`Old identifier
`Request info
`Response info
`
`~
`
`- Return not available
`
`9Ba
`
`error
`
`/
`
`\..
`
`FAILURE
`
`Cache update
`
`,.
`......
`
`\It
`9.8.6
`Check out vector for
`writing
`
`SUCCESS
`'V
`aaa
`Construct new clone
`vector
`
`\It
`
`9ili1
`Write new clone vector
`
`t
`
`9.92.
`Check in clone vector
`
`\1
`
`9.9A Done
`
`Page 23 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 23 of 26
`
`US 6,292,880 BI
`
`FIGURE 10A
`
`~--I~ Request Arena with
`available space
`
`fi5fi
`Store contents of 'Mite
`aggregation buffer in Arena
`
`660
`
`Update index
`
`640
`
`642
`
`Allocation
`and Write Object
`
`Look up and retrieve
`object from origin
`server
`
`Get object length
`from HITP request, if
`
`present ,
`
`Store object in write
`aggregation buffer
`
`648
`
`fi5.D.
`Write
`aggregation
`buffer full?
`
`Page 24 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 24 of 26
`
`US 6,292,880 BI
`
`FIGURE 108
`
`620
`
`622
`
`624
`
`Scaled Counter
`Updating
`
`all counters
`
`Sum all
`counter values
`
`626
`
`Compute Maximum
`Value that can be
`represented by a counter
`
`Decrement all
`counter values by 1
`
`632
`
`Page 25 of 48
`
`
`
`I--"
`~
`Q
`00
`00
`N
`\0
`N
`0'1
`rJ'l
`
`e
`
`0'1
`N
`o ....,
`Ul
`N
`~ .....
`'JJ. =(cid:173)~
`
`'""'"
`C
`C
`N
`'""'"
`~CIO
`~ '?
`'JJ.
`
`~ = .....
`~ .....
`~
`rJl .
`d .
`
`1126
`
`1128
`
`ISP
`
`INTERNET
`
`1131)
`
`SERVER f-
`
`.1124
`
`HOST
`
`L
`
`1122
`
`)
`
`NETWORK
`
`LOCAL
`
`.1100 I 1120
`: L/N~
`:NETWOR~(
`I
`I
`
`.1118
`
`INTERFACE
`COMMUN ICA TION
`
`'S,.7
`
`.11.0.4
`
`PROCESSOR
`
`'<-)Y
`
`CONTRO~ }r--
`CURSOR 0----J
`
`R ~~~-------~~--~
`
`BUS
`
`v
`
`v
`
`1102
`
`2
`
`INPUT DEV~ k:===:
`
`.(>.
`
`111.Q
`
`DEVICE
`STORAGE
`
`.(">.
`
`11ill1
`
`ROM
`
`.(';.
`
`.11Ofi
`
`MEMORY
`
`MAIN
`
`DISPLA ~ K..------,
`
`FIG. 11
`
`Page 26 of 48
`
`
`
`u.s. Patent
`
`Sep.18,2001
`
`Sheet 26 of 26
`
`US 6,292,880 BI
`
`FIGURE 12
`
`1212
`(information object has
`not been loaded recently)
`
`12.1A
`Tell process "no'
`
`121fi
`Update expiration date value
`of information object to the
`current date or time
`
`12il2
`External process asks cache
`whether an object has been
`loaded by a client recently
`
`12M.
`Cache locates the information
`object in the cache
`
`12illi
`Read a Read Counter value
`associated in directory tables
`with the information object
`
`NO
`
`121Q
`Tell process 'yes"
`
`Page 27 of 48
`
`
`
`US 6,292,880 Bl
`
`1
`ALIAS-FREE CONTENT-INDEXED OBJECT
`CACHE
`
`FIELD OF THE INVENTION
`
`The present invention relates to information delivery, and
`relates more specifically to a cache for information objects
`that are to be delivered efficiently and at high speed over a
`network to a client.
`
`BACKGROUND OF THE INVENTION
`
`Several important computer technologies rely, to a great
`extent, upon rapid delivery of information from a central
`storage location to remote devices. For example, in the
`client/server model of computing, one or more servers are 15
`used to store information. Client computers or processes are
`separated from the servers and are connected to the servers
`using a network. The clients request information from one of
`the servers by providing a network address of the informa(cid:173)
`tion. The server locates the information based on the pro- 20
`vided network address and transmits it over the network to
`the client, completing the transaction.
`The World Wide Web is a popular application of the
`client/server computing model. FIG. 1 is a simplified block
`diagram of the relationship between elements used in a Web 25
`system. One or more web clients lOa, lOb, each of which is
`a computer or a software process such as a browser program,
`are connected to a global information network 20 called the
`Internet, either directly or through an intermediary such as
`an Internet Service Provider, or an online information ser- 30
`vice.
`A web server 40 is likewise connected to the Internet 20
`by a network link 42. The web server 40 has one or more
`internet network addresses and textual host names, associ(cid:173)
`ated in an agreed-upon format that is indexed at a central
`Domain Name Server (DNS). The server contains multime(cid:173)
`dia information resources, such as documents and images, to
`be provided to clients upon demand. The server 40 may
`additionally or alternatively contain software for dynami(cid:173)
`cally generating such resources in response to requests.
`The clients lOa, lOb and server 40 communicate using
`one or more agreed-upon protocols that specify the format of
`the information that is communicated. A client lOa looks up
`network address of a particular server using DNS and
`establishes a connection to the server using a communica(cid:173)
`tion protocol called the Hypertext Transfer Protocol
`(HTTP). A Uniform Resource Locator (URL) uniquely
`identifies each information object stored on or dynamically
`generated by the server 40. A URL is a form of network
`address that identifies the location of information stored in
`a network.
`A key factor that limits the performance of the World
`Wide Web is the speed with which the server 40 can supply
`information to a client via the Internet 20. Performance is
`limited by the speed, reliability, and congestion level of the
`network route through the Internet, by geographical distance
`delays, and by server load level. Accordingly, client trans(cid:173)
`action time can be reduced by storing replicas of popular
`information objects in repositories geographically dispersed
`from the server. Each local repository for object replicas is
`generally referred to as a cache. A client may be able to
`access replicas from a topologically proximate cache faster
`than possible from the original web server, while at the same
`time reducing Internet server traffic.
`In one arrangement, as shown in FIG. 1, the cache is
`located in a proxy server 30 that is logically interposed
`
`2
`between the clients lOa, lOb and the server 40. The proxy
`server provides a "middleman" gateway service, acting as a
`server to the client, and a client to the server. A proxy server
`equipped with a cache is called a caching proxy server, or
`commonly, a "proxy cache".
`The proxy cache 30 intercepts requests for resources that
`are directed from the clients lOa, lOb to the server 40. When
`the cache in the proxy 30 has a replica of the requested
`resource that meets certain freshness constraints, the proxy
`10 responds to the clients lOa, lOb and serves the resource
`directly. In this arrangement, the number and volume of data
`transfers along the link 42 are greatly reduced. As a result,
`network resources or objects are provided more rapidly to
`the clients lOa, lOb.
`A key problem in such caching is the efficient storage,
`location, and retrieval of objects in the cache. This document
`concerns technology related to the storage, location, and
`retrieval of multimedia objects within a cache. The object
`storage facility within a cache is called a "cache object
`store" or "object store".
`To effectively handle heavy traffic environments, such as
`the World Wide Web, a cache object store needs to be able
`to handle tens or hundreds of millions of different objects,
`while storing, deleting, and fetching the objects simulta(cid:173)
`neously. Accordingly, cache performance must not degrade
`significantly with object count. Performance is the driving
`goal of cache object stores.
`Finding an object in the cache is the most common
`operation and therefore the cache must be extremely fast in
`carrying out searches. The key factor that limits cache
`performance is lookup time. It is desirable to have a cache
`that can determine whether an object is in the cache (a "hit")
`or not (a "miss") as fast as possible. In past approaches,
`35 caches capable of storing millions of objects have been
`stored in traditional file system storage structures. Tradi(cid:173)
`tional file systems are poorly suited for multimedia object
`caches because they are tuned for particular object sizes and
`require multiple disk head movements to examine file sys-
`40 tem metadata. Object stores can obtain higher lookup per(cid:173)
`formance by dedicating DRAM memory to the task of object
`lookup, but because there are tens or hundreds of millions of
`objects, the memory lookup tables must be very compact.
`Once an object is located, it must be transferred to the
`45 client efficiently. Modern disk drives offer high performance
`when reading and writing sequential data, but suffer signifi(cid:173)
`cant performance delays when incurring disk head move(cid:173)
`ments to other parts of the disk. These disk head movements
`are called "seeks". Disk performance is typically con-
`50 strained by the drive's rated seeks per second. To optimize
`performance of a cache, it is desirable to minimize disk
`seeks, by reading and writing contiguous blocks of data.
`Eventually, the object store will become full, and particu(cid:173)
`lar objects must be expunged to make room for new content.
`55 This process is called "garbage collection". Garbage collec(cid:173)
`tion must be efficient enough that it can run continually
`without providing a significant decrease in system
`performance, while removing objects that have the least
`impact on future cache performance.
`60 PAST APPROACHES
`In the past, four approaches have been used to structure
`cache object stores: using the native file system, using a
`memory-blocked "page" cache, using a database, and using
`a "cyclone" circular storage structure. Each of these prior
`65 approaches has significant disadvantages.
`The native file system approach uses the file system of an
`operating system running on the server to create and manage
`
`Page 28 of 48
`
`
`
`US 6,292,880 Bl
`
`3
`a cache. File systems are designed for a particular applica(cid:173)
`tion in mind: storing and retrieving user and system data
`files. File systems are designed and optimized for file
`management applications. They are optimized for typical
`data file sizes and for a relatively small number of files (both
`total and within one folder/directory). Traditional file sys(cid:173)
`tems are not optimized to minimize the number of seeks to
`open, read/write, and close files. Many file systems incur
`significant performance penalties to locate and open files
`when there are large numbers of files present. Typical file 10
`systems suffer fragmentation, with small disk blocks scat(cid:173)
`tered around the drive surface, increasing the number of disk
`seeks required to access data, and wasting storage space.
`Also, file systems, being designed for user data file
`management, include facilities irrelevant to cache object
`stores, and indeed counter-productive to this application. 15
`Examples include: support for random access and selective
`modification, file permissions, support for moving files,
`support for renaming files, and support for appending to files
`over time. File systems are also invest significant energy to
`minimize any data loss, at the expense of performance, both 20
`at write time, and to reconstruct the file system after failure.
`The result is that file systems are relatively poorly for
`handling the millions of files that can be present in a cache
`of Web objects. File systems don't efficiently support the
`large variation in Internet multimedia object size-in par- 25
`ticular they typically do not support very small objects or
`very large objects efficiently. File systems require a large
`number of disk seeks for metadata traversal and block
`chaining, poorly support garbage collection, and take time to
`ensure data integrity and to repair file systems on restart.
`The page cache extends file systems with a set of fixed
`sized memory buffers. Data is staged in and out of these
`buffers before transmission across the network. This
`approach wastes significant memory for large objects being
`sent across slow connections.
`The database system approach uses a database system as
`a cache. Generally, databases are structured to achieve goals
`that make them inappropriate for use as an object cache. For
`example, they are structured to optimize transaction pro(cid:173)
`cessing. To preserve the integrity of each transaction, they 40
`use extensive locking. As a result, as a design goal they favor
`data integrity over performance factors such as speed. In
`contrast, it is acceptable for an object cache to lose data
`occasionally, provided that the cache does not corrupt
`objects, because the data always can be retrieved from the
`server that is original source of the data. Databases are often
`optimized for fast write performance, since write speed
`limits transaction processing speed. However, in an object
`cache, read speed is equally important. Further, databases
`are not naturally good at storing a vast variety of object sizes 50
`while supporting streaming, pipe lined I/O in a virtual
`memory efficient manner. Databases commonly optimized
`for fixed record size sizes. Where databases support variable
`record sizes, they contain support for maintaining object
`relationships that are redundant, and typically employ slow, 55
`virtual memory paging techniques to support streaming,
`pipelined I/O.
`In a cyclonic file system, data is allocated around a
`circular storage structure. When space becomes full, the
`oldest data is simply removed. This approach allows for fast 60
`allocation of data, but makes it difficult to support large
`objects without first staging them in memory, suffers prob(cid:173)
`lems with fragmentation of data, and typically entails naIve
`garbage collection that throws out the oldest object, regard(cid:173)
`less of its popularity. For a modest, active cache with a 65
`diverse working set, such first-in-first-out garbage collection
`can throw objects out before they get to be reused.
`
`4
`The fundamental problem with the above approaches for
`the design of cache object stores is that the solution isn't
`optimized for the constraints of the problem. These
`approaches all represent reapplication of existing technolo(cid:173)
`gies to a new application. None of the applications above are
`ideally suited for the unique constraints of multimedia,
`streaming, object caches. Not only do the above solutions
`inherently encumber object caches with inefficiencies due to
`their imperfect reapplication, but they also are unable to
`effectively support the more unique requirements of multi(cid:173)
`media object caches. These unique requirements include the
`ability to disambiguate and share redundant content that is
`identical, but has different names, and the opposite ability to
`store multiple variants of content with the same name,
`targeted for particular clients, languages, data types, etc.
`Based on the foregoing, there is a clear need to provide an
`object cache that overcomes the disadvantages of these prior
`approaches, and is more ideally suited for the unique
`requirements of multimedia object caches. In particular:
`1. there is a need for an object store that can store
`hundreds of millions of objects of disparate sizes, and
`a terabyte of content size in a memory efficient manner;
`2. there is a need for an object store that can determine if
`a document is a "hit" or a "miss" quickly, without
`time-consuming file directory lookups;
`3. there is a need for a cache that minimizes the number
`of disk seeks to read and write objects;
`4. there is a need for an object store that permits efficient
`streaming of data to and from the cache;
`5. there is a need for an object store that supports multiple
`different versions of targeted alternates for the same
`name;
`6. there is a need for an object store that efficiently stores
`large numbers of objects without content duplication;
`7. there is a need for an object store that can be rapidly and
`efficiently garbage collected in real-time, insightfully
`selecting the documents to be replaced to improve user
`response speed, and traffic reduction;
`8. there is a need for an object store that that can restart
`to full operational capacity within seconds after soft(cid:173)
`ware or hardware failure without data corruption and
`with minimal data loss.
`This document concerns technology directed to accom-
`45 plishing the foregoing goals. In particular, this document
`describes methods and structures related to the time-efficient
`and space-efficient storage, retrieval, and maintenance of
`objects in a large object store. The technology described
`herein provides for a cache object store for a high(cid:173)
`performance, high-load application having the following
`general characteristics:
`1. High performance, measured in low latency and high
`throughput for object store operations, and large num(cid:173)
`bers of concurrent operations;
`2. Large cache support, supporting terabyte caches and
`billions of objects, to handle the Internet's exponential
`content growth rate;
`3. Memory storage space efficiency, so expensive semi(cid:173)
`conductor memory is used sparingly and effectively;
`4. Disk storage space efficiency, so large numbers of
`Internet object replicas can be stored within the finite
`disk capacity of the object store;
`5. Alias free, so that multiple objects or object variants,
`with different names, but with the same content iden(cid:173)
`tical object content, will have the object content cached
`only once, shared among the different names;
`
`30
`
`35
`
`Page 29 of 48
`
`
`
`US 6,292,880 Bl
`
`6
`(E) storing the content key referencing the content object in
`each alternate vector element; and (F) establishing in the
`cache table, mappings from content keys to object content
`data.
`The invention also encompasses an apparatus, computer
`system, computer program product, and a computer data
`signal embodied in a carrier wave configured according to
`the foregoing aspects, and other aspects.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`5
`6. Support for multimedia heterogeneity, efficiently sup(cid:173)
`porting diverse multimedia objects of a multitude of
`types with size ranging over six orders of magnitude
`from a few hundred bytes to hundreds of megabytes;
`7. Fast, usage-aware garbage collection, so less useful
`objects can be efficiently removed from the object store
`to make room for new objects;
`8. Data consistency, so programmatic errors and hardware
`failures do not lead to corrupted data;
`9. Fast restartability, so an object cache can begin servic-
`ing requests within seconds of restart, without requiring
`a time-consuming database or file system check opera(cid:173)
`tion;
`10. Streaming, so large objects can be efficiently pipelined
`from the object store to slow clients, without staging
`the entire object into memory;
`11. Support for content negotiation, so proxy caches can
`efficiently and flexibly store variants of objects for the
`same URL, targeted on client browser, language, or 20
`other attribute of the client request; and
`12. General-purpose applicability, so that the object store
`interface is sufficiently flexible to meet the needs of
`future media types and protocols.
`
`10
`
`The present invention is illustrated by way of example,
`and not by way of limitation, in the figures of the accom(cid:173)
`panying drawings and in which like reference numerals refer
`15 to similar elements and in which:
`FIG. 1 is a block diagram of a client/server relationship;
`FIG. 2 is a block diagram of a traffic server;
`FIG. 3Ais a block diagram of transformation of an object
`into a key;
`FIG. 3B is a block diagram of transformation of an object
`name into a key;
`FIG. 4A is a block diagram of a cache;
`FIG. 4B is a block diagram of a storage mechanism for
`25 Vectors of Alternates;
`FIG. 4C is a block diagram of multi-segment directory
`table;
`FIG. 5 is a block diagram of pointers relating to data
`fragments;
`FIG. 6 is a block diagram of a storage device and its
`contents;
`FIG. 7 is a block diagram showing the structure of a pool;
`FIG. 8A is a flow diagram of a process of garbage
`35 collection;
`FIG. 8B is a flow diagram of a process of writing
`information in a storage device;
`FIG. 8C is a flow diagram of a process of synchronization;
`FIG. 8D is a flow diagram of a "checkout_read" process;
`FIG. 8E is a flow diagram of a "checkout_writ