throbber
United States Patent
`Mattis et al.
`
`[19]
`
`111111111111111111111111111111111111111111111111111111111111111111111111111
`US006128623A
`[11] Patent Number:
`[45] Date of Patent:
`
`6,128,623
`Oct. 3, 2000
`
`[54] HIGH PERFORMANCE
`
`OBJECT CACHE
`
`o 834 819 A2
`
`4/1998 European Pat. Off.
`
`[75]
`
`Inventors: Peter Mattis, Belmont; John Plevyak,
`San Francisco, both of Calif.; Matthew
`Haines, Laramie, Wyo.; Adam
`Beguetin, San Mateo, Calif.; Brian
`Totty, Foster City, Calif.; David
`Gourley, Palo Alto, Calif.
`
`[73]
`
`Assignee:
`
`Inktomi Corporation,
`Calif.
`
`San Mateo,
`
`[21]
`[22]
`[51]
`[52]
`[58]
`
`App!. No.: 09/060,866
`
`Apr. 15, 1998
`
`Filed:
`Int. C1.7
`U.S. CI.
`Field of Search
`
`.................... G06F 17/30
`707/103; 707/100; 711/118
`.....................................707/100, 103;
`711/118, 119, 120, 121, 122
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`7/1996 Neimat et al. ..
`5,542,087
`5,611,049
`3/1997 Pitts
`12/1997 Wong et al.
`5,701,432
`3/1998 Kikinis
`5,727,159
`5,732,267
`3/1998 Smith
`5/1998 Mauldin
`5,748,954
`5,826,242
`10/1998 Montulli
`12/1998 Bhide et al.
`5,852,717
`5,864,852
`1/1999 Luotonen
`5,870,763
`2/1999 Lornet
`2/1999 Copeland, et al.
`5,872,969
`3/1999 Singh et al.
`5,881,229
`6/1999 Wong et al.
`5,909,695
`5,924,116
`7/1999 Aggarwal et aJ.
`8/1999 Logue et al.
`5,935,207
`9/1999 Aviani,Jr
`5,950,205
`10/1999 Krishnaswarnyet al.
`5,974,421
`11/1999 Humphrey
`5,987,233
`FOREIGN PATENT DOCUMENTS
`o 811 939 A2 12/1997 European Pat. Off.
`
`....... 707/103
`395/200.09
`395/457
`395/200.76
`713/1
`395/610
`705/27
`395/200.33
`707/10
`707/202
`395/671
`395/200.33
`711/133
`711/122
`709/219
`707/103
`707/103
`395/200.47
`
`..
`
`Primary Examiner-Thomas G. Black
`Assistant Examiner-Frantz Coby
`Attorney, Agent, or Firm-McDermott, Will & Emery
`
`[57]
`
`ABSTRACT
`
`is
`is disclosed. The cacbe
`cache
`A higb-performance
`designed for time- and space-efficiency for a diverse range
`of information objects.
`Information objects are stored in
`portions of a non-volatile
`storage device called arenas,
`which are contiguous regions from wbicb space is allocated
`in parallel, Objects are substantially contiguously allocated
`within an arena and are mapped by name keys and content-
`based object keys to a tag table, an open directory, and a
`directory table. Tbe tag table is indexed by tbe uame keys,
`and stores references to sets in tbe directory table. The tag
`table is compact and therefore can be stored in fast main
`memory,
`facilitating rapid lookups. The directory table is
`organized so tbat at least a frequeutly-accessed
`portion of it
`also usually resides
`in fast main memory, wbicb further
`speeds lookups. The tag and directory tables are organized
`to quickly determine non-presence of objects. Large objects
`may be chunked into fragments, which are chained using a
`forward functional-iteration mechanism,
`to prevent the need
`for mutating existing on-disk data structures. Garbage col-
`lection periodically moves objects within an arena or to
`other arenas so that
`inactive objects are deleted and free
`space becomes contiguous. Because tbe objects are substan-
`tially contiguously allocated, reading and writing an typical
`object requires only one or two disk head actuator move-
`ments;
`tbus,
`the cacbe can efficiently and smoothly stream
`data off of the storage device, providing optimal delivery of
`multimedia
`objects. Tbe disclosure
`also encompasses
`a
`computer apparatus, computer program product, and com-
`puter data signal embodied in a carrier wave tbat are
`similarly configured.
`
`22 Claims, 26 Drawing Sheets
`
`~ B
`
`I
`
`Disklocation 118.
`
`Size 120
`
`\[~-
`
`
`
`I~l---l~~
`
`MICROSOFT
`
`EXHIBIT 1015
`
`Page 1 of 49
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`Sheet 1 of 26
`
`6,128,623
`
`FIGURE 1
`
`PRIOR ART
`
`Network 20
`
`3Q
`Proxy
`
`4.0.
`Server
`
`iDa
`Client
`
`1Qh
`Client
`
`Page 2 of 49
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`Sheet 2 of 26
`
`6,128,623
`
`:::r~
`LC
`
`9.O.b
`
`I
`
`9O.n
`
`•..•.
`
`C
`
`.•...
`
`L.....-.+
`
`...••
`.:::;
`
`...••
`
`Page 3 of 49
`
`....
`
`..::::
`
`.9Qa
`
`III
`
`(
`
`Proxy
`
`3.Q
`
`ZQ
`Protocol
`Engine
`
`FIGURE 2
`
`L
`
`BQ Cache
`
`I r!es
`
`I
`
`...
`110Core
`
`6Q
`
`i 2
`
`U
`Internet
`
`/ Reques~r object /
`
`J
`
`1D.a
`Client
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`Sheet 3 of 26
`
`6,128,623
`
`FIGURE 3A
`
`52
`Object
`
`SA
`Hash function
`
`fiB. Set Subkey
`
`sa Tag
`Subkey
`
`5fi: Remainder Subkey
`
`(Object Key 56
`
`FIGURE 38
`
`sa
`Object name
`
`t S
`
`A
`Hash function
`
`+
`
`1SUb~Y11/ Su:y
`
`21
`
`62
`Name Key
`
`Page 4 of 49
`
`

`

`ee
`='" ..•~N
`
`~N~
`
`••• o..., N
`tt~
`J).g
`
`0\
`
`~~
`
`of
`
`l
`
`Nooo r
`
`StoredObject
`
`124
`
`ObjectKey
`ResponseInfo
`RequestInfo
`123.a
`
`IVectorofAlternates122hI
`IVectorofAlternates122aI
`131n
`
`="
`
`""
`
`""~
`
`~~ ~~ "
`
`132a
`
`FIGURE4A
`
`Page 5 of 49
`
`Disklocation118.
`
`\:0 Subkey116
`
`,\
`
`,
`,
`,
`,
`,
`'
`
`'
`
`,
`
`,
`,
`
`\\\JSize120
`
`112n
`
`DirectoryTable110
`
`TagTable102
`
`'
`,
`'
`,
`,
`,
`
`'
`
`'
`
`'
`
`,\
`
`104n
`
`106
`
`,\
`
`"
`"
`"
`
`vv
`
`"
`
`'-
`
`I,
`
`1
`
`1.1.Qc
`
`1.1O.h
`
`11Qa
`
`110d
`
`112a
`
`1Mb
`
`10Aa
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`Sheet 5 of 26
`
`6,128,623
`
`FIGURE 48
`
`Storage device 90a
`
`Vector of Alternates 122a
`123b
`123a
`"IE4", 'IE4',
`'Enqllsh",
`'English',
`Object Key
`Object Key
`
`123c
`'Navigator,'
`Object Key
`
`.. - Stored Object 124a
`
`r-----------------------------------------------------
`
`l_
`
`- - ~1...__
`
`1_2_2_b__
`
`H Stored Object 124b HI...
`
`,
`J
`
`12_2_n __
`
`Page 6 of 49
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`/
`
`Sheet 6 of 26
`
`6,128,623
`
`DIRECTORY TABLE 110
`
`Page 7 of 49
`
`110a
`
`•• ••••
`
`-- ----------~----------
`•
`
`-- ----------~----------
`
`•••••••••
`
`110n
`
`SEGMENT 111a
`
`SEGMENT 111b
`
`SEGMENT 111c
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`Sheet 7 of 26
`
`6,128,623
`
`FIGURE 5
`
`Directory Table 110
`,---------------------l
`
`I I I
`
`Tail
`141e
`
`I Subkey 132 I
`
`Head
`-:-----..t 141a ,------,::=~
`
`I I I I I I I I I I I I I I II I I I I I I I I~
`
`141d
`
`141b
`
`---------.
`
`-.-..-..,~
`
`Fragment 208c
`
`Fragment 208a
`
`13Gb ------+--I •.•Fragment 2G8b
`
`---128c
`
`----------------------
`
`Page 8 of 49
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`Sheet 8 of 26
`
`6,128,623
`
`- - -
`
`,,
`
`, ,
`
`,
`
`, ,
`
`-,
`
`, ,
`
`, ,
`210
`
`Storage Device 9.Qa
`
`FIGURE 6
`
`Pool 2QQa
`
`Pool 2llilb.
`
`Fragment
`
`Fragment
`
`- --
`
`Fragment
`
`~208a
`
`,
`
`r
`
`\.208n
`
`~,
`
`\.208b
`
`r
`
`,
`,
`,
`,,,
`
`,,,
`
`2
`~
`
`I
`I
`
`J
`
`II I
`
`I \
`
`\
`
`\
`
`\
`
`,..•.
`
`Page 9 of 49
`
`208d Fragment
`Header
`
`2O.8.eFragment Data
`
`"
`
`" 1
`
`2illk Magic 11206a
`
`Key 112O£.b.Length I
`
`202 Pool Header
`Magic
`Version No.
`No. of Arenas
`20fia Arena Header
`Empty
`Corrupt
`
`,
`2.06.0. Arena Header
`Empty
`Corrupt
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`Sheet 9 of 26
`
`6,128,623
`
`FIGURE 7
`
`210
`
`I~
`
`204a
`(
`
`2QB..d Fragment
`Header
`
`2O.8..e Fragment
`Data
`
`Page 10 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 10 of 26
`
`6,128,623
`
`FIGURE 8A
`
`803
`
`"Low water mark"
`"High water mark"
`
`Last arena accessed
`Number of accesses
`
`Garbage
`Collection
`
`8.D2
`Select pool
`
`8M.
`Select arena
`
`8Qfi
`Select fragment
`
`Selection factors
`
`807
`
`YES
`
`8J2
`Mark fragment as
`deleted
`
`8.10.
`Write fragment to new
`arena
`
`ill
`Update Directory Table
`
`Page 11 of 49
`
`

`

`u.s. Patent
`
`Oct. 3, 2000
`
`Sheet 11 of 26
`
`6,128,623
`
`Write fragment
`to new arena
`...---...!--.....•578
`
`FIGURE 88
`
`90
`
`Update
`Open
`Index
`
`Step 806
`
`Write object data 1-----1
`in selected Arena
`
`Page 12 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 12 of 26
`
`6,128,623
`
`820
`
`822
`
`824
`
`Synchronization
`
`Write object to
`cache
`
`Create metadata
`in Open Directory
`
`1-----826
`
`Write and synch
`object data to disk
`
`1---028
`
`Get next metadata
`from Open Directory
`
`FIGURE8C
`
`B2l
`Garbage collection
`
`82.9.
`For each fragment in Arena, delete
`directory metadata that points to the
`fragment and write it to disk
`
`83.1
`Modify pool header in
`memory to show Arena as
`empty
`
`8.2.D:
`Write pool header and
`sync to disk
`
`aaa
`Copy metadata from Open
`Directory to Directory Table
`
`825.
`Sync changes
`
`Page 13 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 13 of 26
`
`6,128,623
`
`FIGURE 80
`
`Open Directory:
`'checkout_read"
`
`8..32
`Get lock for key
`YES
`
`NO
`
`8..3.4.
`Compute bucket for
`key
`
`ass
`Scan blocks in
`bucket's list for key
`FOUND
`
`NOT FOUND
`
`B3B.
`Is block being created
`or destroyed?
`
`B4.O.
`Return 'not available'
`error
`
`842
`Increment block
`reader count
`
`8M
`Return copy of open
`directory block
`
`Ma
`Return 'does not exist'
`error
`
`NOT FOUND
`
`BA.6.
`Cache Directory
`lookup
`FOUND
`
`B..5.Q
`Create Open Directory
`block; initialize block
`from Directory Index;
`add block to bucket's
`block list
`
`Page 14 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 14 of 26
`
`6,128,623
`
`FIGURE BE
`
`Key
`
`Open Directory:
`'checkout_write"
`
`B.5.4.
`Get lock for key
`YES
`
`NO
`
`B.5.6.
`Compute bucket
`key
`
`for
`
`8.5.8.
`Scan blocks in
`bucket's list for key
`
`864
`Is block being created
`or destroyed?
`
`84.Q
`Return 'not available"
`error
`
`M2.
`Increment block
`reader count
`
`844.
`Return copy of open
`directory block
`
`aez
`Return 'does not exist"
`error
`
`NOT FOUND
`
`aso
`Cache Directory
`lookup
`FOUND
`
`8.5.Q
`Create Open Directory
`block; initialize block
`from Directory Index;
`add block to bucket's
`block list
`
`Page 15 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 15 of 26
`
`6,128,623
`
`FIGURE 8F
`
`l
`
`r
`
`Key
`
`1'"E----1,
`
`NO
`
`J
`
`Open Directory:
`'checkout_create"
`
`r
`ill
`Get lock for key
`~YES
`8lB.
`Compute bucket for
`key
`
`NOT FOUND
`
`8.8..0.
`Scan blocks in
`bucket's list for key
`
`FOUND
`
`B..9..6.
`Return "already exists"
`error
`
`FOUND
`
`B.94
`Cache Directory
`lookup
`
`NOT FOUND
`
`BB2
`Is block deleted and has no
`writers or readers?
`
`8M
`NO
`.>----~ Return "not available"
`error
`
`\YES
`8.8..6.
`Increment block
`reader count
`
`8.8B.
`Initialize block from
`key
`
`\
`a9..8
`Create Open Directory
`block; initialize block
`from Directory Index;
`add block to bucket's
`block list
`
`asa
`Increment block writer count
`and mark copy as created
`I
`L-
`
`a92
`Return copy of open directory
`block and mark block as
`being modified
`
`----:~
`
`r;..{
`--.-J
`"-
`
`Done
`
`Page 16 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 16 of 26
`
`6,128,623
`
`9.02
`LOOKUP
`
`9lli1
`Convert object name
`in request
`to key value
`
`sos
`Look up request key
`value in Open Directory
`
`912
`Look up request key
`value in Tag Table
`
`aia
`Store Tag Table miss
`
`FIGURE9A
`
`91B
`Look up key value in
`Directory Table
`
`YES
`~~
`
`910
`Obtain object and
`deliver to client
`
`922
`Update Open
`Directory if needed
`
`m
`Cache Miss
`
`szs
`Obtain object from
`source server
`
`92B
`Done
`
`Page 17 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 17 of 26
`
`6,128,623
`
`Open Directory
`block checkin
`
`....--
`
`\,
`9.3.Q
`Get lock for key
`YES
`
`\1
`9J2
`Is block being
`modified?
`NO
`\'
`94.6.
`Decrement
`count
`
`reader
`
`FIGURE 98
`
`r
`I
`
`Block
`
`--
`--
`
`NO
`
`.,
`
`YES ••.
`
`9.3A
`Decrement writer count
`
`\1
`ass
`Is checkin successful?
`
`NO
`
`M2
`YES •..
`". Copy block info back to
`original block
`I
`
`NO
`
`'1/
`
`~
`
`\'
`93.a
`Is delete checkin flag
`set?
`YES
`
`8M
`Mark block as deleted
`....1
`"'"'\f
`
`9A!l Done
`
`Page 18 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 18 of 26
`
`6,128,623
`
`FIGURE 9C
`
`-
`
`Key
`
`-
`
`95D.
`..•....
`." Return not found
`
`r-
`Cache lookup \.."
`
`,I
`
`BAa
`Check out directory
`
`blocks,
`
`I
`9M
`Check in directory
`
`blocks,v
`~---,v
`
`ass
`Return found
`
`9.52 Done
`
`Page 19 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 19 of 26
`
`6,128,623
`
`FIGURE 90
`
`Key
`
`FAILURE
`
`9..5.8
`Check out directory
`blocks
`SUCCESS
`
`954
`Check in directory blocks
`with deletion flag set
`
`Page 20 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 20 of 26
`
`6,128,623
`
`Cache open read
`
`•.••<E----~
`
`FIGURE 9E
`
`-
`
`Object name
`Request info
`
`FAILURE
`964
`Check out vector for 1---------...,
`reading
`SUCCESS
`\1
`
`\1
`91.2
`Return no document
`error
`
`FAILURE
`
`~
`
`9fifi
`Use request inf0 to select
`alternate from vector
`SUCCESS
`~I
`96B.
`Check in vector
`
`9lQ
`Read object pointed to
`by alternate
`
`Page 21 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 21 of 26
`
`6,128,623
`
`Cache open write -
`
`FIGURE 9F
`
`Object name
`Request info
`Response info
`
`FAILURE
`
`,II
`98A
`Create vector
`
`9.B2
`FAILURE
`I-L..----~:::: Return no document
`error
`
`ill
`Check out vector for
`writing
`SUCCESS
`
`--
`v
`91.6.
`Use request info to define
`alternate for vector; add
`alternate to vector
`SUCCESS
`,II
`9I8.
`Check in vector
`
`9B.O.
`Write object pointed to
`by alternate
`
`Page 22 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 22 of 26
`
`6,128,623
`
`FIGURE 9G
`
`/
`
`'-
`
`Object name
`Old identifier
`Request
`info
`Response info
`
`"""\
`
`./
`
`FAILURE
`
`....
`
`--
`
`sse
`Return not available
`error
`
`--
`
`~
`
`Cache update
`
`W
`9.8.6.
`Check out vector for
`writing
`SUCCESS
`'V
`9.B.B.
`Construct new clone
`vector
`\V
`.9.9.Q
`Write new clone vector
`
`~
`.9.92
`Check in clone vector
`
`"
`
`9..9.4.Done
`
`Page 23 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 23 of 26
`
`6,128,623
`
`FIGURE 10A
`
`65
`
`,----l~ Request Arena with
`available space
`
`sse
`Store contents of write
`aggregation buffer in Arena
`
`660
`
`Update index
`
`640
`
`642
`
`Allocation
`and Write Object
`
`Look up and retrieve
`object from origin
`server
`
`Get object length
`from HTTP request,
`present
`
`if
`
`648
`
`Store object in write
`aggregation buffer
`
`fi.5.Q
`Write
`aggregation
`buffer full?
`
`Page 24 of 49
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 24 of 26
`
`6,128,623
`
`FIGURE 108
`
`620
`
`622
`
`624
`
`626
`
`Scaled Counter
`Updating
`
`Sum all
`counter values
`
`Compute Maximum
`Value that can be
`represented by a counter
`
`Decrement all
`counter values by 1
`
`632
`
`Page 25 of 49
`
`

`

`ee
`..•~N
`='"
`
`~N~
`
`tv0\
`(J1e..,
`=- tttt-tv
`
`'JJ
`
`112.4
`
`HOST
`
`1122
`
`NETWORK
`
`LOCAL
`
`~~
`":••
`
`o I
`
`tv===
`
`="
`
`""
`
`""~
`
`~~ ~~"
`
`1126
`
`1128
`
`ISP
`
`INTERNET
`
`113.Q
`
`SERVERh
`
`1120
`
`I
`
`1100I
`
`111.B.
`
`----------
`
`ILINK
`INETWORK
`
`INTERFACE
`COMMUNICATION
`
`I I I I I I II I I II I II I I I I I
`
`.111Q
`
`11.Qft
`
`DEVICE
`
`STORAGE
`
`BUS
`
`<'7
`
`OM
`
`oR
`
`:r
`
`~
`
`~
`
`llil2
`
`<'7
`
`L;;.
`
`---------
`
`llQ4
`PROCESSOR
`
`'7
`
`~
`
`~
`
`'«./
`
`11DB.
`
`~',>-
`
`MEMORY
`
`MAIN
`
`I I I II I I I II
`
`I I I II I II I I I I
`
`---------------------------
`
`FIGURE11
`
`Page 26 of 49
`
`1116.1"1
`
`A
`
`CONTROL
`CURSOR
`
`~
`
`INPUTDEV~h
`
`DISPLA:mI¢=ll
`
`

`

`u.s. Patent
`
`Oct. 3,2000
`
`Sheet 26 of 26
`FIGURE 12
`
`6,128,623
`
`1212
`(Information object has
`not been loaded recently)
`
`1lli
`Tell process "no"
`
`.12.1fi
`Update expiration date value
`of information object to the
`current date or time
`
`Re- Validation
`
`12.Q2
`External process asks cache
`whether an object has been
`loaded by a client recently
`
`12M
`Cache locates the information
`object in the cache
`
`12.0.6.
`Read a Read Counter value
`associated in directory tables
`with the information object
`
`NO
`
`1210
`Tell process "yes"
`
`Page 27 of 49
`
`

`

`acting as a
`service,
`gateway
`a "middleman"
`server provides
`server to the client, and a client to tbe server. A proxy server
`equipped witb a cache is called a caching proxy server, or
`commonly, a "proxy cache".
`The proxy cache 30 intercepts requests for resources that
`are directed from the clients lOa, lOb to the server 40. When
`tbe cacbe in tbe proxy 30 has a replica of the requested
`resource that meets certain freshness constraints,
`the proxy
`responds
`to the clients lOa,
`lOb and serves the resource
`10 directly. In this arrangement,
`the number and volume of data
`transfers along the link 42 are greatly reduced. As a result,
`network resources or objects are provided more rapidly to
`tbe clients lOa,
`lOb.
`A key problem in sucb caching is tbe efficient storage,
`15 location, and retrieval of objects in tbe cache. This document
`concerns
`technology related to the storage,
`location, and
`retrieval of multimedia objects within a cache. The object
`storage facility witbin a cache is called a "cache object
`store" or "object store".
`such as
`To effectively handle beavy traffic environments,
`tbe World Wide Web, a cache object store needs to be able
`to handle tens or bundreds of millions of different objects,
`while storing, deleting, and fetcbing tbe objects simulta-
`25 neously. Accordingly, cache performance must not degrade
`significantly witb object count. Performance
`is tbe driving
`goal of cacbe object stores.
`Finding an object
`in the cache is the most common
`operation and therefore the cache must be extremely fast in
`that
`limits cache
`30 carrying out searches. The key factor
`performance is 1001-'Uptime. It is desirable to have a cache
`tbat can determine whether an object is in the cache (a "hit")
`or not (a "miss") as fast as possible.
`In past approaches,
`caches capable of storing millions of objects have been
`stored in traditional
`file system storage structures. Tradi-
`tional file systems are poorly suited for multimedia object
`caches because they are tuned for particular object sizes and
`require multiple disk head movements to examine file sys-
`tem metadata. Object stores can obtain higher lookup per-
`formance by dedicating DRAM memory to the task of object
`1001-'Up,but because there are tens or hundreds of millions of
`objects,
`the memory lookup tables must be very compact.
`Once an object
`is located,
`it must be transferred to the
`client efficiently. Modem disk drives offer high performance
`45 when reading and writing sequential data, but suffer signifi-
`cant performance delays when incurring disk head move-
`ments to other parts of the disk. These disk bead movements
`are called "seeks". Disk performance
`is typically con-
`strained by the drive's rated seeks per second. To optimize
`it is desirable to minimize disk
`50 performance of a cache,
`seeks, by reading and writing contiguous blocks of data.
`Eventually,
`the object store will become full, and particu-
`lar objects must be expunged to make room for new content.
`This process is called "garbage collection". Garbage collec-
`tion must be efficient enough that
`it can run continually
`without
`providing
`a significant
`decrease
`in system
`performance, while removing objects tbat have the least
`impact on future cache performance.
`Past Approaches
`In the past, four approaches have been used to structure
`cache object stores: using the native file system, using a
`memory-blocked "page" cache, using a database, and using
`a "cyclone"
`circular
`storage
`structure.
`Each of these prior
`approaches has significant disadvantages.
`The native file system approach uses the file system of an
`operating system running on the server to create and manage
`a cache. File systems are designed for a particular applica-
`
`20
`
`35
`
`40
`
`BACKGROUND OF THE INVENTION
`Several
`important computer
`technologies
`rely, to a great
`extent, upon rapid delivery of information from a central
`storage location to remote devices. For example,
`in the
`client/server model of computing, one or more servers are
`used to store information. Client computers or processes are
`separated from the servers and are connected to the servers
`using a network. The clients request information from one of
`tbe servers by providing a network address of the informa-
`tion. Tbe server locates tbe information based on the pro-
`vided network address and transmits it over tbe network to
`tbe client, completing the transaction.
`tbe
`The World Wide Web is a popular application of
`client/server computing model. FIG. 1 is a simplified block
`diagram of the relationship between elements used in a Web
`system. One or more web clients lOa, lOb, each of which is
`a computer or a software process such as a browser program,
`are connected to a global information network 20 called tbe
`Internet, eitber directly or tbrough an intermediary such as
`an Internet Service Provider, or an online information ser-
`vice.
`A web server 40 is likewise connected to the Internet 20
`by a network link 42. The web server 40 bas one or more
`internet
`network
`addresses
`and textual
`bost names,
`associ-
`ated in an agreed-upon format
`that
`is indexed at a central
`Domain Name Server (DNS). The server contains multime-
`dia information resources, sucb as documents and images, to
`be provided to clients upon demand. The server 40 may
`additionally or alternatively contain software for dynami-
`cally generating such resources in response to requests.
`The clients lOa,
`lOb and server 40 communicate using
`one or more agreed-upon protocols that specify the format of
`tbe information tbat is communicated. A client lOa looks up
`network address of a particular
`server using DNS and
`establisbes a connection to the server using a communica-
`tion protocol
`called
`the Hypertext
`Transfer
`Protocol
`(HTTP). A Uniform Resource Locator
`(URL) uniquely
`identifies eacb information object stored on or dynamically
`generated by the server 40. A URL is a form of network
`address that identifies the location of information stored in
`a network.
`limits tbe performance of tbe World
`that
`A key factor
`Wide Web is tbe speed with which the server 40 can supply
`information to a client via the Internet 20. Performance is
`limited by the speed, reliability, and congestion level of the
`network route through the Internet, by geograpbical distance
`delays, and by server load level. Accordingly, client
`trans-
`action time can be reduced by storing replicas of popular
`information objects in repositories geographically dispersed
`from the server. Each local repository for object replicas is 60
`generally referred to as a cache. A client may be able to
`access replicas from a topologically proximate cache faster
`tban possible from the original web server, wbile at the same
`time reducing Internet server traffic.
`In one arrangement,
`as sbown in FIG. 1, the cacbe is 65
`located in a proxy server 30 tbat
`is logically interposed
`between the clients lOa,
`lOb and the server 40. The proxy
`
`6,128,623
`
`2
`
`1
`HIGH PERFORMANCE OBJECT CACHE
`
`FIELD OF THE INVENTION
`invention relates to information delivery, and
`The present
`relates more specifically to a cache for information objects
`tbat are to be delivered efficiently and at high speed over a
`network to a client
`
`55
`
`Page 28 of 49
`
`

`

`6,128,623
`
`10
`
`15
`
`25
`
`30
`
`50
`
`55
`
`60
`
`3
`tion in mind: storing and retrieving user and system data
`files. File systems
`are designed
`and optimized
`for
`file
`management
`applications. They are optimized for typical
`data file sizes and for a relatively small number of files (both
`total and within one folder/directory). Traditional
`file sys-
`tems are not optimized to minimize the number of seeks to
`open, read/write, and close files. Many file systems incur
`significant performance penalties
`to locate and open files
`when tbere are large numbers of files present. Typical file
`systems suffer fragmentation, with small disk blocks scat-
`tered around the drive surface, increasing the number of disk
`seeks required to access data, and wasting storage space.
`Also,
`file systems,
`being designed
`for user data
`file
`management,
`include facilities
`irrelevant
`to cache object
`stores, and indeed counter-productive
`to this application.
`Examples include: support for random access and selective
`modification,
`file permissions,
`support
`for moving files,
`support for renaming files, and support for appending to files
`over time. File systems are also invest significant energy to
`minimize any data loss, at the e"l'ense of performance, both 20
`at write time, and to reconstruct
`the file system after failure.
`The result
`is that
`file systems
`are relatively poorly for
`handling the millions of files that can be present
`in a cache
`of Web objects. File systems don't efficiently support
`the
`large variation in Internet multimedia object size-in
`par-
`ticular
`they typically do not support very small objects or
`very large objects efficiently. File systems require a large
`number of disk seeks
`for mctadata
`traversal
`and block
`chaining, poorly support garbage collection, and take time to
`ensure data integrity and to repair file systems on restart.
`The page cache extends file systems with a set of fixed
`sized memory buffers. Data is staged in and out of these
`buffers before
`transmission
`across
`the network. This
`for large objects being
`approach wastes
`significant memory
`sent across
`slow connections.
`The database system approach uses a database system as
`a cache. Generally, databases are structured to achieve goals
`that make tbem inappropriate for use as an object cache. For
`example,
`they are structured to optimize transaction pro-
`cessing. To preserve the integrity of each transaction,
`they 40
`use extensive locking. As a result, as a design goal they favor
`data integrity over performance
`factors such as speed.
`In
`contrast,
`it is acceptable for an object cache to lose data
`occasionally,
`provided
`that
`the cache does not corrupt
`objects, because the data always can be retrieved from the 45
`server that is original source of the data. Databases are often
`optimized for
`fast write performance,
`since write speed
`limits
`transaction
`processing
`speed. However,
`in an object
`cache, read speed is equally important. Further, databases
`are not naturally good at storing a vast variety of object sizes
`while
`supporting
`streaming,
`pipelined
`I/O in a virtual
`memory efficient manner. Databases commonly optimized
`for fixed record size sizes. Where databases support variable
`record sizes,
`they contain support
`for maintaining object
`relationships
`that are redundant, and typically employ slow,
`virtual memory paging techniques
`to support
`streaming,
`pipelined I/O.
`file system, data is allocated around a
`In a cyclonic
`circular
`storage structure. When space becomes
`full,
`the
`oldest data is simply removed. This approach allows for fast
`allocation of data, but makes it difficult
`to support
`large
`objects without first staging them in memory, suffers prob-
`lems with fragmentation of data, and typically entails naive
`garbage collection that throws out the oldest object, regard-
`less of its popularity, For a modest, active cache with a 65
`diverse working set, sucb first-in-first-out garbage collection
`can throw objects out before they get to be reused.
`
`35
`
`4
`The fundamental problem with the above approaches for
`the design of cache object stores is that
`the solution isn't
`optimized
`for
`the constraints
`of
`the problem. These
`approaches all represent reapplication of existing tecbnolo-
`gies to a new application. None of the applications above are
`ideally suited for
`the unique constraints
`of multimedia,
`streaming, object caches. Not only do the above solutions
`inherently encumber object caches with inefficiencies due to
`tbeir
`imperfect
`reapplication, but
`they also are unable to
`effectively support
`the more unique requirements of multi-
`media object caches. These unique requirements include the
`ability to disambiguate and share redundant content
`that is
`identical, but has different names, and the opposite ability to
`store multiple variants of content witb the same name,
`targeted for particular clients,
`languages, data types, etc.
`Based on the foregoing,
`tbere is a clear need to provide an
`object cache that overcomes the disadvantages of these prior
`approaches,
`and is more ideally suited for
`the unique
`requirements of multimedia object caches. In particular:
`1.
`there is a need for an object
`store that can store
`hundreds of millions of objects of disparate sizes, and
`a terabyte of content size in a memory efficient manner;
`2. there is a need for an object store that can determine if
`a document
`is a "hit" or a "miss" quickly, without
`lime-consuming file directory lookups;
`3. there is a need for a cache that minimizes the number
`of disk seeks to read and write objects;
`4. there is a need for an object store tbat permits efficient
`streaming of data to and from the cache;
`5. there is a need for an object store that supports multiple
`different versions of targeted alternates
`for the same
`name;
`6. there is a need for an object store that efficiently stores
`large numbers of objects witbout content duplication;
`7. there is a need for an object store that can be rapidly and
`efficiently garbage collected in real-time,
`insightfully
`selecting the documents to be replaced to improve user
`response speed, and traffic reduction;
`8. there is a need for an object store that that can restart
`to full operational capacity within seconds after soft-
`ware or hardware failure without data corruption and
`with minimal data loss.
`This document concerns tecbnology directed to accom-
`In particular,
`plishing the foregoing goals.
`this document
`describes met bods and structures related to the time-efficient
`and space-efficient
`storage,
`retrieval, and maintenance of
`objects in a large object store. The technology described
`herein provides
`for a cache object
`store
`for a high-
`performance,
`high-load application having the following
`general characteristics:
`1. High performance, measured in low latency and high
`throughput
`for object store operations, and large num-
`bers of concurrent operations;
`2. Large cache support, supporting terabyte caches and
`billions of objects,
`to handle the Internet's exponential
`content growth rate;
`3. Memory storage space efficiency, so expensive semi-
`conductor memory is used sparingly and effectively;
`4. Disk storage space efficiency, so large numbers of
`Internet object replicas can be stored within the finite
`disk capacit y of the object store;
`5. Alias free, so that multiple objects or object variants,
`with different names, but with the same content
`iden-
`tical object content, will have tbe object content cached
`only once, shared among the different names;
`
`Page 29 of 49
`
`

`

`6,128,623
`
`6
`of the requested information object in association with such
`block, wherein the size value indicates a storage size of the
`list and the plurality of versions of the information object;
`and wherein step (D) comprises the step of reading the list
`and tbe plurality of versions concurrently. Yet another fea-
`ture is consolidating streaming data transfers of different
`speeds into a write aggregation buffer.
`According to still another feature,
`the step of storing the
`information objects comprises the step of writing the infor-
`10 marion objects in contiguous available storage space of the
`mass storage device, while concurrently performing steps
`(A) through (D) with respect
`to another information object.
`The invention
`also encompasses
`an apparatus,
`computer
`system, computer program product, and a computer data
`15 signal embodied in a carrier wave configured according to
`tbe foregoing aspects, and other aspects.
`
`5
`6. Support for multimedia heterogeneity, efficiently sup-
`porting diverse multimedia objects of a multitude of
`types with size ranging over six orders of magnitude
`from a few bundred bytes to bundreds of megabytes;
`7. Fast, usage-aware garbage collection,
`so less useful
`objects can be efficiently removed from tbe object store
`to make room for new objects;
`8. Data consistency, so programmatic errors and hardware
`failures do not lead to corrupted data;
`9. Fast restartability, so an object cache can begin servic-
`ing requests within seconds of restart, witbout requiring
`a time-consuming database or file system check opera-
`tion;
`10. Streaming, so large objects can be efficiently pipelined
`from the object store to slow clients, without staging
`the entire object
`into memory;
`11. Support for content negotiation, so proxy caches can
`efficiently and flexibly store variants of objects for the
`same URL,
`targeted on client browser,
`language, or
`other attribute of the client request; and
`12. General-purpose
`applicability, so that the object store
`interface is sufficiently flexible to meet
`the needs of
`future media types and protocols.
`
`relating to data
`
`BRIEF DESCRIPTION OF 'II-IE DRAWINGS
`The present
`invention is illustrated by way of example,
`and not by way of limitation,
`in the figures of the accom-
`panying drawings and in which like reference numerals refer
`to similar elements and in which:
`relationship;
`FIG. 1 is a block diagram of a client/server
`FIG. 2 is a block diagram of a traffic server;
`FIG. 3A is a block diagram of transformation of an object
`into a key;
`FIG. 3B is a block diagram of transformation of an object
`name into a key;
`FIG. 4A is a block diagram of a cache;
`FIG. 4B is a block diagram of a storage mechanism for
`Vectors of Alternates;
`FIG. 4C is a block diagram of multi-segment directory
`table;
`FIG. 5 is a block diagram of pointers
`fragments;
`FIG. 6 is a block diagram of a storage device and its
`contents;
`FIG. 7 is a block diagram showing the structure of a pool;
`FIG. 8A is a flow diagram of a process of garbage
`collection;
`FIG. 8B is a flow diagram of a process of writing
`information
`in a storage device;
`FIG. 8C is a flow diagram of a process of synchronization;
`FIG. 8D is a flow diagram of a "checkout_read"
`process;
`FIG. 8E is a flow diagram of a "checkout_write"
`process;
`FIG. 8F is a flow diagram of a "checkout_create"
`pro-
`cess;
`FIG. 9A is a flow diagram of a cache lookup process;
`FIG. 9B is a flow diagram of a "checkin" process;
`FIG. 9C is a flow diagram of a cacbe lookup process;
`FIG. 9D is a flow diagram of a cache remove process;
`FIG. 9E is a flow diagram of a cache read process;
`FIG. 9F is a flow diagram of a cache write process;
`Fl G. 9G is a flow diagram of a cache update process;
`FIG. lOA is a flow diagram of a process of allocating and
`writing
`objects
`in a storage device;
`FIG. lOB is a flow diagram of a process of scaled counter
`updating;
`FIG. 11 is a block diagram of a computer system that can
`be used to implement
`the present
`invention;
`FIG. 12 is a flow diagram of a process
`re-validation.
`
`of object
`
`Page 30 of 49
`
`20
`
`25
`
`30
`
`35
`
`SUMMARY OF THE INVENTION
`The foregoing needs and other needs are addressed by the
`present
`invention, which provides,
`in one aspect,
`in a cache
`for information objects
`that are identified by key values
`based on names of the informati

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket