Hölzle et al.

US 6,510,498 B1
(10) Patent No.: US 6,510,498 B1
(45) Date of Patent: *Jan. 21, 2003

(54) METHOD AND APPARATUS FOR MEMORY ALLOCATION IN A MULTI-THREADED VIRTUAL MACHINE

(75) Inventors: Urs Hölzle, Goleta, CA (US); Steffen Grarup, Palo Alto, CA (US)

(73) Assignee: Sun Microsystems, Inc., Palo Alto, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 09/724,737
(22) Filed: Nov. 28, 2000

Related U.S. Application Data

(63) Continuation of application No. 09/108,047, filed on Jun. 30, 1998, now Pat. No. 6,209,066.

(51) Int. Cl. ................ G06F 12/00
(52) U.S. Cl. ................ 711/153; 711/170; 711/173
(58) Field of Search ........ 711/153, 170, 173; 709/104, 105, 107, 108, 312; 712/206, 215, 245, 246

(56) References Cited

U.S. PATENT DOCUMENTS

5,247,634 A    9/1993  Cline et al.
5,535,361 A    7/1996  Hirata et al.
5,600,596 A    2/1997  Shirakihara
5,727,178 A *  3/1998  Pletcher et al. ...... 711/202
5,893,159 A *  4/1999  Schneider ............ 711/150

OTHER PUBLICATIONS

Robert H. Halstead, Jr., "Multilisp: A Language for Concurrent Symbolic Computation," Oct. 1985, ACM Transactions on Programming Languages and Systems, Vol. 7, No. 4.

* cited by examiner

Primary Examiner: Matthew Kim
Assistant Examiner: Pierre M. Vital
(74) Attorney, Agent, or Firm: Beyer Weaver & Thomas

(57) ABSTRACT

Methods and apparatus for the efficient allocation of shared memory in a multi-threaded computer system are disclosed. In accordance with one embodiment of the present invention, a computer-implemented method for allocating memory shared by multiple threads in a multi-threaded computing system includes partitioning the shared memory into a plurality of blocks, and grouping the multiple threads into at least a first group and a second group. A selected block is allocated to a selected thread which may attempt to allocate an object in the selected block. The allocation of the selected block to the selected thread is based at least partially upon whether the selected thread is a part of the first group or the second group. In one embodiment, grouping the multiple threads into the first group and the second group includes identifying a particular thread and determining whether the particular thread is a fast allocating thread. In such an embodiment, when the particular thread is fast allocating, the particular thread is grouped into the first group.

19 Claims, 14 Drawing Sheets
[Representative drawing: the flowchart of FIG. 5a.]
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1017, p. 1
`
`
`
[Sheet 1 of 14: FIG. 1a (Prior Art), a single thread and its dedicated memory allocation area; FIG. 1b, two threads and a memory allocation area 112 (with unused portion 115) shared by the two threads; FIG. 2a (Prior Art), two threads with their own associated memory allocation areas.]
`
`
`
[Sheet 2 of 14: FIG. 2b (Prior Art), a multi-threaded system 230 in which threads share an Eden 232 divided into chunks 233a-233d; FIG. 3, a multi-threaded system 300 with multiple threads and a shared memory allocation area divided into blocks 304a-304f, in accordance with a first embodiment of the present invention.]
`
`
`
[Sheet 3 of 14: FIG. 4, a process flow diagram of a first memory allocation process of the first embodiment. Legible steps include: construct allocation area by allocating memory blocks (402); take first block from allocation area and assign shared block for all threads; run overall system; overflowed?; is a new block available?; assign new block to the thread that overflowed the block (416); another new block available? (414); replace full shared block with new block (420).]
`
`
`
[Sheet 4 of 14: FIG. 5a, a process flow diagram of a second memory allocation process of the first embodiment. Legible steps include: construct allocation area by allocating memory blocks (432); take first block from allocation area and assign shared block for all threads (434); run overall system (436); overflowed? (447); is a new block available?; assign new block to the thread that overflowed the block; determine which threads to consider fast allocating; assign one new block to each fast allocating thread; another new block available? (444); replace full shared block with new block (450).]
`
`
`
[Sheet 5 of 14: FIG. 5b, a process flow diagram detailing step 458 of FIG. 5a, the determination of which threads to consider fast allocating. Legible steps include: i ≥ total number of threads?; does thread use shared pool?; memory allocated by thread in last garbage collection interval ≥ threshold?; set allocation routine of thread to locking; set allocation routine of thread to non-locking.]
`
`
`
[Sheet 6 of 14: FIG. 6, a process flow diagram of a third memory allocation process of the first embodiment. Legible steps include: construct allocation area by allocating memory blocks (602); take first block from allocation area and assign shared block for all threads (604); run overall system (606); obtain next available block from allocation area (610); is a new block available?; increment overflow counter of overflowing thread (618); counter ≥ threshold? (620); assign new block to the thread that overflowed the block; replace full shared block with new shared block (622); another new block available?; perform garbage collection (616); reset overflow counters of all threads (617).]
`
`
`
[Sheet 7 of 14: FIG. 7, a process flow diagram of a fourth memory allocation process of the first embodiment. Legible steps include: construct allocation area by allocating memory blocks (702); take first block from allocation area and assign shared block for all threads (704); increment overflow counter of overflowing thread; counter ≥ threshold?; assign new block to the thread that overflowed the shared block; determine which threads to consider fast allocating; assign one new block to each fast allocating thread; another new block available?]
`
`
`
[Sheet 8 of 14: FIG. 8, multiple threads and a memory allocation area shared by the multiple threads in accordance with a second embodiment of the present invention.]
`
`
`
[Sheet 9 of 14: FIG. 9, a process flow diagram of a first memory allocation process of the second embodiment. Legible steps include: construct allocation area by allocating small and large memory blocks (802); allocate a small memory block to every thread; run overall system; is a new large block available?; assign new large block to the thread that overflowed its block (814); perform garbage collection (816).]
`
`
`
[Sheet 10 of 14: FIG. 10a, a process flow diagram of a second memory allocation process of the second embodiment. Legible steps include: construct allocation area by allocating small and large memory blocks (902); allocate a small block to every thread (904); run overall system (906); obtain next available large block from allocation area (910); is a new large block available? (912); assign new large private block to the thread that overflowed its block (914); perform garbage collection (916); determine which threads to consider fast allocating (918); assign one new large block to each fast allocating thread (920); assign one small block to each of the other threads (922).]
`
`
`
[Sheet 11 of 14: FIG. 10b, a process flow diagram detailing step 918 of FIG. 10a. Legible steps include: i ≥ total number of threads?; did thread have a small block?; memory allocated by thread in last garbage collection interval ≥ threshold?; mark thread as fast allocating; mark thread as slow allocating (942).]
`
`
`
[Sheet 12 of 14: FIG. 11, a process flow diagram of a third memory allocation process of the second embodiment. Legible steps include: construct allocation area by allocating small and large memory blocks (952); allocate a small block to each thread (954); run overall system (956); increment overflow counter of overflowing thread; is a new large block available? (974); counter ≥ threshold?; obtain next available large block from allocation area (972); assign new small block to thread that overflowed its block (966); assign new large block to the thread that overflowed its block (976); perform garbage collection.]
`
`
`
[Sheet 13 of 14: FIG. 12, a typical general-purpose computer system 1030 suitable for implementing the present invention, including processors 1032, primary storage with RAM, secondary storage, and a network connection (reference numerals 1030-1040).]
`
`
`
[Sheet 14 of 14: FIG. 13, a diagrammatic representation of a virtual machine supported by the computer system 1030 of FIG. 12, showing a compile-time environment with source code and a compiler, and a runtime environment.]
`
`
`
This is a continuation application of prior application Ser. No. 09/108,047 filed on Jun. 30, 1998, now U.S. Pat. No. 6,209,066, the disclosure of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of Invention

The present invention relates generally to memory allocation in computer systems. More particularly, the present invention relates to efficient, low-overhead memory allocation in multi-threaded, object-based computer systems.

2. Description of the Related Art

As the use of virtual machines in computer technology increases, improving the overall efficiency of a virtual machine is becoming more important. The amount of memory associated with a computer system that includes a virtual machine is typically limited. As such, memory must generally be conserved and recycled. Many computer programming languages enable software developers to dynamically allocate memory within a computer system, while other programming languages require explicit manual deallocation of previously allocated memory, which may be complicated and prone to error. Languages that require explicit manual memory management include the C and C++ programming languages. Other programming languages utilize automatic storage-reclamation to reclaim memory that is no longer necessary to ensure the proper operation of computer programs that allocate memory from the reclamation system. Such automatic storage-reclamation systems reclaim memory without explicit instructions or calls from the computer programs which were previously utilizing the memory.

In object-oriented or object-based systems, the typical unit of memory allocation is commonly referred to as an object or a memory object, as will be appreciated by those skilled in the art. Objects that are in use are generally referred to as "live" objects, whereas objects that are no longer needed to correctly execute computer programs are typically referred to as "garbage" objects. The act of reclaiming garbage objects is commonly referred to as garbage collection, and an automatic storage-reclamation system is often referred to as a garbage collector. Computer programs written in languages such as the Java programming language (developed by Sun Microsystems, Inc.) and the Smalltalk programming language use garbage collection to automatically manage memory.

The use of a compacting garbage collector generally allows objects to be allocated relatively quickly. That is, one advantage of using a compacting garbage collector is fast allocation of objects. Objects may be allocated in a contiguous memory area, e.g., an allocation area, such that the allocation of the objects may be performed by incrementing an allocation pointer by the desired amount of storage. When the end of the allocation area has been reached, a garbage collection may be performed.

One garbage collection method is a generational garbage collection method. A generational garbage collection method is a method in which objects are separated based upon their lifetimes as measured from the time the objects were created. Generational garbage collection is described in more detail in Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Richard Jones and Rafael Lins (John Wiley & Sons Ltd., 1996), which is incorporated herein by reference in its entirety. "Younger" objects have been observed as being more likely to become garbage than "older" objects. As such, generational garbage collection may be used to increase the overall efficiency of memory reclamation.

In a system that uses generational garbage collection, a special memory area is designated for the allocation of new objects. Such a memory area is generally considered to be a "nursery," as new objects are allocated within the memory area. As will be appreciated by those skilled in the art, the memory area is often referred to as "Eden."

FIG. 1a is a diagrammatic representation of a single thread and a memory allocation area that is dedicated to the single thread. Such a memory allocation area is suitable for implementation within a single-threaded system that uses generational garbage collection. As shown, a memory allocation area 102, which may be known as Eden, is indexed by an allocation pointer 104. In general, Eden 102 is a block of memory in which new objects may be created. When a thread 106, which is associated with Eden 102, attempts to allocate a new object, allocation pointer 104 is typically incremented by the size of the new object, and a check is made to determine if allocation pointer 104 has reached the end of Eden 102. When it is determined that the end of Eden 102 has been reached, a generational garbage collection may be performed to effectively empty Eden 102, thereby allowing new objects to be created by thread 106 within Eden 102.

While the allocation of memory and, hence, new objects, as described with respect to FIG. 1a is effective in a single-threaded system, such an allocation of memory and objects generally may not be used in a multi-threaded system with multiple central processing units (CPUs). By way of example, when two threads concurrently attempt to request space in a single Eden, concurrency problems may arise. As such, in a multi-threaded system, when Eden is a shared resource, access to Eden must generally be synchronized in order to prevent more than one thread from allocating in Eden at any given time. Synchronizing access to Eden may involve associating an allocation lock with Eden that is obtained by a thread when the thread wishes to create a new object, and released by the thread after the new object has been created.

FIG. 1b is a diagrammatic representation of two threads and a memory allocation area shared by the two threads within an overall multi-threaded system. An Eden 112 has an associated allocation pointer 114 which is arranged to indicate the beginning of an unused portion 115 of Eden 112. When threads 116 and 118, which share Eden 112, wish to allocate a new object in Eden 112, they must generally obtain the allocation lock (not shown) associated with Eden 112. Specifically, if thread 116 wishes to access unused portion 115, thread 116 must obtain the allocation lock on Eden 112. Once thread 116 obtains the allocation lock, and it is determined that the end of Eden 112 has not been reached, allocation pointer 114 may be incremented, and a new object may be allocated by thread 116. If the end of Eden 112 has been reached, i.e., when unused portion 115 is null, a garbage collection may be performed to effectively empty Eden 112, thereby allowing new objects to be created by threads 116 and 118.

When access to Eden is synchronized, the allocation of new objects within Eden is typically slowed considerably due to the overhead associated with the acquisition and release of the allocation lock associated with Eden. Each time a thread wishes to create a new object in Eden, the
thread must acquire exclusive rights to Eden, as for example by acquiring an allocation lock. In general, even so-called "fast" locking primitives which are directly implemented by hardware, e.g., a compare-and-swap primitive, may be relatively slow when compared to the base costs associated with allocation. For instance, on a multiprocessor system, a locking primitive may incur a remote cache miss, as will be appreciated by those skilled in the art. In such a system, adding synchronization features often significantly increases the cost of allocation, e.g., by a factor of two or three. Hence, adding synchronization during allocation greatly affects the performance of the overall system.
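The contrast described above, pointer-bump allocation versus the same allocation guarded by an allocation lock, can be sketched as follows. This is an illustrative sketch only, not code from the disclosure; the class and method names are hypothetical.

```java
// Illustrative sketch (hypothetical names): bump-pointer allocation in an
// "Eden", with and without the allocation lock discussed above.
final class Eden {
    private final int size;     // total bytes in this Eden
    private int allocationPtr;  // index of the start of the unused portion
    private final Object allocationLock = new Object();

    Eden(int size) { this.size = size; }

    // Single-threaded case (FIG. 1a): simply bump the allocation pointer.
    // Returns the start offset of the new object, or -1 when Eden is full
    // (the point at which a garbage collection would be performed).
    int tryAllocate(int bytes) {
        if (allocationPtr + bytes > size) return -1;
        int objectStart = allocationPtr;
        allocationPtr += bytes;
        return objectStart;
    }

    // Multi-threaded shared-Eden case (FIG. 1b): every allocation pays the
    // overhead of acquiring and releasing the allocation lock.
    int tryAllocateLocked(int bytes) {
        synchronized (allocationLock) {
            return tryAllocate(bytes);
        }
    }
}
```

The fast path is a bounds check and an addition; wrapping it in a lock is what multiplies the cost on a multiprocessor, as noted above.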
In order to improve performance associated with accessing Eden in a multi-threaded system by avoiding synchronization, each thread in the multi-threaded system may be assigned its own Eden. That is, when each thread has its own Eden, concurrency problems that may arise when more than one thread attempts to access a shared Eden may be avoided. FIG. 2a is a diagrammatic representation of two threads with their own associated Edens, or memory allocation areas. Within a multi-threaded system 200, a first Eden 202, which is referenced by an allocation pointer 204, is associated with a first thread 206. Multi-threaded system 200 also includes a second Eden 212 that is referenced by an allocation pointer 214, and is associated with a second thread 216.

When first thread 206 wishes to allocate a new object, first thread 206 accesses first Eden 202. Similarly, when second thread 216 wishes to allocate a new object, second thread 216 accesses second Eden 212. As each thread 206, 216 has its own exclusive Eden, namely Edens 202 and 212, respectively, no allocation locks are needed to safeguard against two threads attempting to access a single Eden in order to create a new object at any given time.
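The per-thread Eden arrangement of FIG. 2a can be sketched as follows, using a thread-local allocation pointer so that no lock is required. This is an illustrative sketch with hypothetical names and an example Eden size.

```java
// Illustrative sketch (hypothetical names): one private Eden per thread,
// so bump allocation never races and no allocation lock is needed.
final class PerThreadEden {
    private static final int EDEN_SIZE = 1 << 20;  // 1 MB per thread (example)

    // Each thread sees its own allocation pointer; a one-element array is
    // used as a mutable cell.
    private static final ThreadLocal<int[]> allocationPtr =
            ThreadLocal.withInitial(() -> new int[] {0});

    // Unsynchronized bump allocation; -1 signals that this thread's Eden is
    // full and a garbage collection would be needed.
    static int tryAllocate(int bytes) {
        int[] ptr = allocationPtr.get();
        if (ptr[0] + bytes > EDEN_SIZE) return -1;
        int objectStart = ptr[0];
        ptr[0] += bytes;
        return objectStart;
    }
}
```

The cost of this design is the one the passage goes on to describe: with hundreds of threads, the per-thread Edens consume a substantial amount of memory even when many of them stay nearly empty.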
Although allocating a separate Eden to each thread in a multi-threaded system may eliminate the need for allocation locks, allocating separate Edens often requires a substantial amount of memory. For example, some applications may contain hundreds or even thousands of threads. In addition, some threads may allocate objects at a faster speed than others and, hence, will generally require more memory. The requirement for more memory may lead to frequent garbage collections, performed over all memory, e.g., global garbage collections performed on all Edens, which would require some form of synchronization. As such, overall overhead associated with performing garbage collections on multiple Edens may increase and adversely affect the performance of the overall system, since some Edens may still be relatively empty while others are filled to capacity.

The use of a substantial amount of memory, as well as the increase in the overall overhead associated with garbage collection, that is associated with allocating a separate Eden to each thread in a multi-threaded system may be inefficient and expensive. Reducing the amount of memory used, as well as the frequency of garbage collection, increases the efficiency and generally decreases the costs associated with a multi-threaded system. Dividing an Eden into chunks, or blocks, typically allows an Eden to be shared without requiring allocation locks. The general division of Eden into chunks is described in "Multilisp: A Language for Concurrent Symbolic Computation" by R. Halstead, Jr. (ACM Transactions on Programming Languages and Systems, 7(4):501-538, October 1985), which is incorporated herein by reference in its entirety. FIG. 2b is a diagrammatic representation of two threads and a memory allocation area shared by the two threads in which the memory allocation area is divided into chunks. A multi-threaded system 230 includes an Eden 232 that is divided into chunks 233 which are of a consistent size. In other words, all chunks 233 are approximately the same size. Each thread 236, 238 which shares Eden 232 is allocated an initial chunk. By way of example, thread 236 is initially allocated chunk 233a, while thread 238 is initially allocated chunk 233b.

When a thread, e.g., thread 236, fills its chunk 233a, thread 236 is allocated another chunk 233c. Threads continue to be allocated chunks 233 until no chunks 233 are available, at which time a garbage collection may be performed. It should be appreciated that although the requests for chunks 233 are synchronized, the synchronization generally does not occur as frequently as the allocation synchronization that was previously mentioned.
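The chunked Eden described above can be sketched as follows: only the hand-out of fresh chunks is synchronized, while allocation within a thread's current chunk proceeds without a lock. This is an illustrative sketch with hypothetical names, not the Multilisp implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch (hypothetical names): a shared Eden carved into equal
// chunks. A thread bump-allocates inside its current chunk with no lock, and
// only the rarer request for a fresh chunk is synchronized.
final class ChunkedEden {
    static final int CHUNK_SIZE = 4096;  // example chunk size, bytes
    static final int NUM_CHUNKS = 64;

    // Synchronized hand-out of chunks (an atomic counter stands in for the
    // locked request); returns -1 when no chunks remain, i.e., GC time.
    private static final AtomicInteger nextChunk = new AtomicInteger(0);

    static int grabChunk() {
        int chunk = nextChunk.getAndIncrement();
        return chunk < NUM_CHUNKS ? chunk * CHUNK_SIZE : -1;
    }

    // Per-thread state: start offset and fill level of the current chunk.
    static final class ThreadState {
        int chunkStart = grabChunk();
        int used = 0;

        // Lock-free fast path; falls back to grabChunk() when the object
        // does not fit. The space left in the old chunk is simply wasted,
        // which is the fragmentation discussed below.
        int allocate(int bytes) {
            if (used + bytes > CHUNK_SIZE) {
                int fresh = grabChunk();
                if (fresh < 0) return -1;  // no chunks left: collect garbage
                chunkStart = fresh;
                used = 0;
            }
            int objectStart = chunkStart + used;
            used += bytes;
            return objectStart;
        }
    }
}
```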
Allocating chunks 233 to threads 236, 238 often results in substantial fragmentation, as each chunk 233 must generally be sized to hold a large object. Hence, when a chunk is partially full, and a large object created by a thread does not fit in the partially full chunk, a new chunk will be allocated to the thread to accommodate the large object. The space left in the partially full chunk is then effectively wasted. In addition, the allocation of space in the chunks may be inefficient when threads which are slow allocating are in possession of virtually empty chunks, thereby reserving memory space which may never be needed.

Therefore, what is desired is a method and an apparatus for efficiently allocating memory in a multi-threaded system such as a multi-threaded virtual machine. Specifically, what is needed is a method and an apparatus for allowing threads to create new objects in a memory allocation area, e.g., an Eden, while minimizing memory space, minimizing allocation costs, and improving the efficiency of garbage collection.
`
SUMMARY OF THE INVENTION

The present invention relates to the efficient allocation of shared memory in a multi-threaded computer system. In accordance with one embodiment of the present invention, a computer-implemented method for allocating memory shared by multiple threads in a multi-threaded computing system includes partitioning the shared memory into a plurality of blocks, and grouping the multiple threads into at least a first group and a second group. A selected block is allocated to a selected thread which may attempt to allocate an object in the selected block. The allocation of the selected block to the selected thread is based at least partially upon whether the selected thread is a part of the first group or the second group. In one embodiment, grouping the multiple threads into the first group and the second group includes identifying a particular thread and determining whether the particular thread is a fast allocating thread. In such an embodiment, when the particular thread is fast allocating, the particular thread is grouped into the first group.
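The grouping step may be sketched as follows, using the per-interval allocation threshold test shown in FIG. 5b to decide which threads are fast allocating. The names and the threshold value here are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical names): split threads into a "fast
// allocating" first group and a second group, by comparing the bytes each
// thread allocated in the last garbage collection interval to a threshold.
final class ThreadGrouping {
    static final long FAST_THRESHOLD = 64 * 1024;  // example threshold, bytes

    record ThreadInfo(String name, long bytesAllocatedLastGcInterval) {}

    // Returns the first group: the fast allocating threads, which would each
    // later be given a block of their own.
    static List<ThreadInfo> fastGroup(List<ThreadInfo> threads) {
        List<ThreadInfo> fast = new ArrayList<>();
        for (ThreadInfo t : threads) {
            if (t.bytesAllocatedLastGcInterval() >= FAST_THRESHOLD) {
                fast.add(t);
            }
        }
        return fast;
    }
}
```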
According to another aspect of the present invention, a computer-implemented method for allocating shared memory in a multi-threaded computing system which includes at least a first thread and a second thread involves partitioning the shared memory into a plurality of blocks, and assigning a first block that is accessible to both the first thread and the second thread for the creation of new objects. After the system is allowed to run, a determination is effectively made as to whether the first block has overflowed. If it is determined that the first block has overflowed, the method includes determining whether an attempt by the first thread to allocate a first object in the first block caused the first block to overflow. If such is the case, a second block
`
is assigned to the first thread. Assignment of the second block to the first thread is arranged to cause the first thread to effectively relinquish the ability to allocate objects in the first block. In one embodiment, the second thread does not have the ability to allocate objects in the second block.

In another embodiment, the method also includes determining when one of the first block and the second block has overflowed, as well as assigning a third block to the first thread when it is determined that the second block overflowed, or assigning the third block to the second thread when it is determined that the first block overflowed. In such an embodiment, when it is determined that the first block overflowed, a fourth block may replace the first block.
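The overflow handling described in this aspect can be sketched as follows; strings stand in for memory blocks, and all names are hypothetical. On an overflow of the shared block, the overflowing thread receives a block of its own and the full shared block is replaced by a fresh one.

```java
import java.util.ArrayDeque;

// Illustrative sketch (hypothetical names): handling overflow of the block
// shared by all threads. Strings stand in for memory blocks; a deque stands
// in for the free blocks of the allocation area.
final class SharedBlockManager {
    private final ArrayDeque<String> freeBlocks = new ArrayDeque<>();
    String sharedBlock;             // block currently shared by the threads
    String overflowerPrivateBlock;  // private block of the overflowing thread

    SharedBlockManager(String... blocks) {
        for (String b : blocks) freeBlocks.add(b);
        sharedBlock = freeBlocks.poll();  // first block becomes the shared block
    }

    // Give the overflowing thread its own block and replace the full shared
    // block with a fresh one. Returns false when too few blocks remain,
    // i.e., when a garbage collection would be needed instead.
    boolean onSharedBlockOverflow() {
        String privateBlock = freeBlocks.poll();
        String replacement = freeBlocks.poll();
        if (privateBlock == null || replacement == null) return false;
        overflowerPrivateBlock = privateBlock;
        sharedBlock = replacement;
        return true;
    }
}
```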
According to still another aspect of the present invention, a computer-implemented method for allocating memory in a multi-threaded computing system includes partitioning the memory into a plurality of blocks which includes a first block and a second block that is substantially larger than the first block. The first block is assigned to be accessible to a first thread which is arranged to attempt to allocate a first object in the first block, and the second block is assigned to be accessible to a second thread in order for the second thread to attempt to allocate a second object in the second block. In one embodiment, the first block has a size in the range of approximately 1 kilobyte to approximately 4 kilobytes, and the second block has a size in the range of approximately 16 kilobytes to approximately 32 kilobytes.
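The two block sizes of this aspect can be sketched as follows; the particular sizes and counts are examples chosen from the stated ranges, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical names): an allocation area partitioned
// into small blocks (about 1-4 kB) and substantially larger blocks (about
// 16-32 kB), recording the byte offset at which each block starts.
final class TwoSizeAllocationArea {
    static final int SMALL_BLOCK = 2 * 1024;   // within the 1-4 kB range
    static final int LARGE_BLOCK = 16 * 1024;  // within the 16-32 kB range

    final List<Integer> blockOffsets = new ArrayList<>();

    // Lay out smallCount small blocks followed by largeCount large ones.
    TwoSizeAllocationArea(int smallCount, int largeCount) {
        int offset = 0;
        for (int i = 0; i < smallCount; i++) {
            blockOffsets.add(offset);
            offset += SMALL_BLOCK;
        }
        for (int i = 0; i < largeCount; i++) {
            blockOffsets.add(offset);
            offset += LARGE_BLOCK;
        }
    }
}
```

Small blocks limit the memory reserved by slow allocating threads, while the larger blocks reduce how often fast allocating threads must request a new block.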
The present invention will be more readily understood upon reading the following detailed descriptions and studying the various figures of the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS

The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings in which:

FIG. 1a is a diagrammatic representation of a thread and a memory allocation area.

FIG. 1b is a diagrammatic representation of two threads and a memory allocation area shared by the two threads.

FIG. 2a is a diagrammatic representation of two threads with their associated memory allocation areas.

FIG. 2b is a diagrammatic representation of two threads and a memory allocation area shared by the two threads in which the memory allocation area is divided into chunks.

FIG. 3 is a diagrammatic representation of multiple threads and a memory allocation area shared by the multiple threads in accordance with a first embodiment of the present invention.

FIG. 4 is a process flow diagram which illustrates the steps associated with a first process of allocating memory in accordance with the first embodiment of the present invention.

FIG. 5a is a process flow diagram which illustrates the steps associated with a second process of allocating memory in accordance with the first embodiment of the present invention.

FIG. 5b is a process flow diagram which illustrates the steps associated with a determination of which threads are considered to be fast allocating threads, i.e., step 458 of FIG. 5a, in accordance with the first embodiment of the present invention.

FIG. 6 is a process flow diagram which illustrates the steps associated with a third process of allocating memory in accordance with the first embodiment of the present invention.
`
FIG. 7 is a process flow diagram which illustrates the steps associated with a fourth process of allocating memory in accordance with the first embodiment of the present invention.

FIG. 8 is a diagrammatic representation of multiple threads and a memory allocation area shared by the multiple threads in accordance with a second embodiment of the present invention.

FIG. 9 is a process flow diagram which illustrates the steps associated with a first process of allocating memory in accordance with the second embodiment of the present invention.

FIG. 10a is a process flow diagram which illustrates the steps associated with a second process of allocating memory in accordance with the second embodiment of the present invention.

FIG. 10b is a process flow diagram which illustrates the steps associated with a determination of which threads are considered to be fast allocating threads, i.e., step 918 of FIG. 10a, in accordance with the second embodiment of the present invention.

FIG. 11 is a process flow diagram which illustrates the steps associated with a third process of allocating memory in accordance with the second embodiment of the present invention.

FIG. 12 illustrates a typical, general-purpose computer system suitable for implementing the present invention.

FIG. 13 is a diagrammatic representation of a virtual machine which is supported by computer system 1030 of FIG. 12, and is suitable for implementing the present invention.
`
DETAILED DESCRIPTION OF THE EMBODIMENTS

The overhead associated with allocating shared memory, e.g., an "Eden," in a multi-threaded system is often significant. Allocating a separate Eden to each thread in a multi-threaded system tends to eliminate the need for allocation locks associated with synchronization. However, allocating separate Edens often requires a substantial amount of memory, and may lead to more frequent garbage collections, thereby potentially adversely affecting the performance of the overall system.

An Eden that is shared by multiple threads may be divided into equal chunks, or blocks, such that each thread has its own block. By allowing each thread to have its own block, an Eden may be shared without requiring allocation locks. However, dividing Eden into equal chunks and allowing each thread to have its own block often results in substantial fragmentation. For example, when a chunk is partially full and a large object created by a thread does not fit in the partially full chunk, a new chunk will be allocated to the thread to accommodate the large object. The space left in the partially full chunk is then effectively wasted. In addition, the allocation of space in the chunks may be inefficient when threads that rarely allocate objects are in possession of virtually empty chunks, thereby reserving memory space which may never be needed. When threads reserve memory space that may not be needed, the space is effectively taken away from threads which may need the memory space. Further, more frequent garbage collections, which involve substantial overhead, are likely to occur in order to free memory for use by threads which need additional memory space.

By allowing multiple threads which rarely allocate objects to share chunks or blocks of a shared memory allocation
`
area, while providing threads which frequently allocate objects with "private," or unshared, memory blocks, more memory space is effectively provided to substantially only the threads that need more memory. Hence, more memory space is likely to be filled before garbage collections are performed. In addition, the frequency of garbage collection may also be reduced. Although synchronization is used when slow allocating threads, e.g., thr