`US007506098B2
`
(12) United States Patent
`Arcedera et al.
`
(10) Patent No.:     US 7,506,098 B2
(45) Date of Patent: Mar. 17, 2009
`
`(54) OPTIMIZED PLACEMENT POLICY FOR
`SOLID STATE STORAGE DEVICES
`
`(75)
`
`Inventors: MarkArcedera, Paranaque (PH);
`Reyjan C. Lanuza, Taguig (PH);
`Ritchie Babaylan, Cavite (PH)
`
`(73) Assignee: BiTMICRO Networks, Inc., Fremont,
`CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 228 days.
`
`(21) Appl. No.: 11/450,005
`
(22) Filed: Jun. 8, 2006
`
`(65)
`
`Prior Publication Data
`
`US 2007 /0288686 Al
`
`Dec. 13, 2007
`
`(51)
`
`Int. Cl.
`G06F 12100
`(2006.01)
`(52) U.S. Cl. ......................................... 711/103; 710/22
`(58) Field of Classification Search ................. 711/103;
`710/22
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`5,822,251 A * 10/1998 Bruce et al. ............ 365/185.33
`2002/0141244 Al* 10/2002 Bruce et al. ............ 365/185.33
`2003/0217202 Al * 11/2003 Zilberman et al.
`............ 710/22
`4/2004 Goff et al ...................... 710/22
`2004/0073721 Al*
`
`* cited by examiner
`
`Primary Examiner-Kevin Ellis
`Assistant Examiner-Hashem Farrokh
`(74) Attorney, Agent, or Firm-Dennis Fernandez; Stephen
`R. Uriarte
`
`(57)
`
`ABSTRACT
`
A data storage system is provided comprising several flash arrays on a board, with these boards stacked to attain a high-capacity solid state hard drive. A remap table is used to map all logical addresses from a host system to the actual physical addresses where data are stored. These physical locations are assigned in such a way that the load of the system is evenly distributed across its available resources. This ensures that the storage system runs at its utmost efficiency, utilizing its resources properly. To achieve this, the system makes sure that the physical location of data is evenly distributed according to the current load of the system.
`7 Claims, 13 Drawing Sheets
`
[Front-page representative figure: LBA-PBA Map Table with chip interleaving, showing Section entries with sub-sections A0 through D14 and PBA fields for the Flash Device Location (Group, Engine, Bus) and the Flash Address.]
`
`
`
`
[Sheet 1 of 13, FIG. 1: Flash Based Drive architecture with Flash DMA Engines (FDE) 101, flash buses 102 (Flash Bus 0 106 through Flash Bus 7), engine group 105, Flash Buffer Controllers 104, flash array boards 103, flash array bank interleave 107, and flash devices 108 and 109.]
`
`
`
[Sheet 2 of 13, FIG. 2: sample physical layout of data sections. Flash devices Dev0 through Dev31 are arranged in Group 0 and Group 1; reference numerals 201 (sections 0-2), 202 (FDEs 0, 2, 4), and 203 (buses 0, 1, 2) mark a sequential access example.]
`
`
`
[Sheet 3 of 13, FIG. 3: LBA-PBA Map Table for the FIG. 2 layout. Each row maps a Section number to a PBA consisting of a Flash Device Location (Group, Engine, Bus) and a Flash Address; entries 301, 302, and 303 correspond to Sections 0, 1, and 61.]
`
`
`
[Sheet 4 of 13, FIG. 4: physical layout with erased sections available on devices Dev0 through Dev31.]
`
`
`
[Sheet 5 of 13, FIG. 5: how a data section is placed in a new location when a write request is issued for that section.]
`
`
`
[Sheet 6 of 13, FIG. 6: flow chart for writing data to the flash array. Start; data needs to be written to Flash (601); determine which FDE has the least write requests (602); is a free section available? (603); if no, force erasure of invalid blocks (604) and determine the next target FDE (605); if yes, get a free section for that FDE (606); update the LBA-PBA Map Table to reflect the new location (607); queue up the write request (608); return.]
`
`
`
[Sheet 7 of 13, FIG. 7: per-engine lists of pre-erased sections (FDE0 through FDE15), each entry giving a section number and its PBA; reference numerals 701, 702.]
`
`
`
[Sheet 8 of 13, FIG. 8: snapshot of the queues of pending operations (read, write, erase) for FDE 0 through FDE 15 across Bus 0 through Bus 7.]
`
`
`
[Sheet 9 of 13, FIG. 9: how new write operations are added to the Flash DMA Engine queues using the per-engine lists of free sections (shown for FDE 7 on Bus 3 and FDE 9 on Bus 4).]
`
`
`
[Sheet 10 of 13, FIG. 10: updated snapshot of the queues of pending operations for FDE 0 through FDE 15 after the new write requests are queued.]
`
`
`
[Sheet 11 of 13, FIG. 11: flow chart for Bad Block Management. Start; a physical block is detected to be defective (1100); determine where to get the new block by prioritizing its current location (1110); is a free section available? (1120); if no, force erasure of invalid blocks (1130) and determine the next target location (1140); if yes, get a free block to replace the bad block (1150); update the LBA-PBA Map Table to reflect the new location (1160); if data is recovered, write it to the new block (1170); return.]
`
`
`
[Sheet 12 of 13, FIG. 12: sample physical layout with flash device level striping. Sections A0 through D15 on devices Dev0 through Dev31 are divided into Stripes 0 through 15; reference numerals 1201, 1202, 1203.]
`
`
`
[Sheet 13 of 13, FIG. 13: LBA-PBA Map Table with striping for the FIG. 12 layout; each Section entry lists chip-interleaved sub-sections (A0, B0, C0, D0, ...) with their Flash Device Location (Group, Engine, Bus) and Flash Address.]
`
`
`
`OPTIMIZED PLACEMENT POLICY FOR
`SOLID STATE STORAGE DEVICES
`
`BACKGROUND
`
`1. Field
`The present invention relates to computer data storage
`systems. More particularly, the present invention relates to a
`system and method of mapping all logical addresses from a
`host system to physical addresses of data storage devices for
`improving host computer data access performance.
`2. Description of Related Art
As flash devices are getting cheaper, solid state based hard drives are becoming more popular as replacements for traditional mechanical hard drives. Mechanical hard drives suffer in areas unseen in flash memory based drives because of their many moving parts (electrical motor, spindle shaft, read/write head, and a magnetic rotating disk). This leads to reliability problems, especially when exposed to vibration and shock, and it also causes slow access times when fetching data from different areas of the mechanical drive.
Since flash memory based drives typically have no moving parts, they can easily withstand harsh environmental conditions and physical mishandling that would lead to failures in regular mechanical drives. Also, access to a flash device does not suffer from the same problem as rotating drives, wherein access time increases when accessing data that are physically far from each other (since that requires head movements).
However, there are also several problems associated with using flash based drives instead of rotating drives. A flash device cannot be written to when it is not in the erased state. After it has been written, the only way to bring it back to its erased state is to erase a larger block of flash, called an erase block or simply a flash block, which is the minimum amount of data that can be erased. Typical flash technology (specifically NAND flash) does not allow toggling individual bytes from a programmed state back to the erased state. That means that when a host requests to change an existing sector via logical block address, or LBA, the flash physical block location (addressed via physical block address, or PBA) that contains this data must be erased first before it can be written with the new data. Considering that erase operations typically take much longer than write or read operations, this greatly impacts the performance of the system. To avoid this performance degradation, applications usually do not place the new data in its old physical location but instead find one that is already erased and relocate the logical sector to that new physical location, thus skipping the erase operation. The old block is then erased in the background. Since hosts are designed with typical rotating drives in mind, they assume that sectors are "written in place" and not relocated to a different location, so a different layer needs to handle the dynamic changes that occur within a flash-based drive. Some implementations do this in the file system, where a new layer called the "Flash Translation Layer" handles the mapping, while others do it in the actual flash controller itself so that hosts never see the difference.
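For illustration only, the remap-on-write idea described above can be sketched as follows. This is a generic flash-translation sketch under assumed names and data structures (lba_to_pba, free_pbas, dirty_pbas); it is not the placement policy of this patent:

    # Illustrative sketch of remap-on-write: a logical block is redirected to an
    # already-erased physical block so the slow erase of the old block can be
    # deferred to the background.
    class SimpleFlashTranslation:
        def __init__(self, num_physical_blocks):
            self.lba_to_pba = {}                    # logical -> physical mapping
            self.free_pbas = list(range(num_physical_blocks))   # pre-erased blocks
            self.dirty_pbas = []                    # written blocks awaiting erase

        def write(self, lba, data, flash_write):
            new_pba = self.free_pbas.pop(0)         # take an erased block
            flash_write(new_pba, data)              # program it with the new data
            old_pba = self.lba_to_pba.get(lba)
            self.lba_to_pba[lba] = new_pba          # remap the logical block
            if old_pba is not None:
                self.dirty_pbas.append(old_pba)     # erase the old copy later

        def background_erase(self, flash_erase):
            while self.dirty_pbas:
                pba = self.dirty_pbas.pop()
                flash_erase(pba)                    # slow erase off the write path
                self.free_pbas.append(pba)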
Another unique characteristic of flash memory devices is their tendency to wear out when subjected to a certain number of erase cycles (typically 100,000 cycles). This wearing out leads to bad blocks and thus requires some sort of bad block management. To prevent certain memory blocks from degrading much faster than the other blocks, a wear-leveling mechanism is required to assure that each and every block wears out evenly.
Current flash based systems have addressed these issues either at the file system level or embedded in the actual flash controller; however, most of them are targeted at a single flash device or just a small array of flash devices. In order for flash-based hard drives to take over the rotating drive market share, they should be able to match the capacities of those drives. To achieve this, there is a need to create a system of several flash arrays on a board and stack these boards to attain a high-capacity solid state hard drive. To increase performance, systems can allow parallel access to these flash arrays and also take advantage of new flash device features like multi-bank (sometimes called multi-plane) and copy-back.
Existing approaches to selecting flash blocks for a new physical location, for replacement of bad blocks, or for wear-leveling do not pay much attention to where these blocks come from; they simply do this in a round robin manner to spread out the access. With flash based systems allowing significant performance gains by correctly selecting the target blocks, it is important to have a good mapping scheme to take advantage of these features.
`
`BRIEF SUMMARY OF THE INVENTION
`
In one embodiment of the invention, a remap table, sometimes referred to as an LBA-PBA Map Table, is used to map all logical addresses from a host system to the actual physical addresses where data are stored. These physical locations are assigned in such a way that the load of the system is evenly distributed across its available resources. This ensures that the storage system runs at its utmost efficiency, utilizing its resources properly. To achieve this, the system makes sure that the physical location of data is evenly distributed according to the current load of the system.
`
`BRIEF DESCRIPTION OF THE SEVERAL
`VIEWS OF THE DRAWINGS
`
So that the manner in which the above recited features,
advantages and objects of the present invention are attained
and can be understood in detail, a more particular description
of the invention, briefly summarized above, may be had by
`reference to the embodiments thereof which are illustrated in
`the appended drawings.
`It is to be noted, however, that the appended drawings
`illustrate only typical embodiments of this invention and are
`therefore not to be considered limiting of its scope, for the
`present invention may admit to other equally effective
`embodiments.
`FIG. 1 is a sample Flash Based Drive architecture with
multiple flash chips, also referred to herein as flash devices,
`accessed by multiple Flash DMA Engines according to an
`embodiment of the present invention.
FIG. 2 is a sample physical layout of data sections accord-
`ing to an embodiment of the present invention.
`FIG. 3 is the LBA-PBA Map Table for the layout shown in
`FIG. 2 according to an embodiment of the present invention.
`FIG. 4 is a physical layout with erased sections according
`to an embodiment of the present invention.
`FIG. 5 is a block diagram illustrating how sections are
placed in their new locations when a write request is issued for
`that data section according to an embodiment of the present
`invention.
`FIG. 6 is a flow chart illustrating the process of writing data
`to the Flash array according to an embodiment of the present
invention.
`FIG. 7 is a block diagram illustrating a list of pre-erased
`sections according to an embodiment of the present invention.
`
`60
`
`
`
`US 7,506,098 B2
`
`3
`FIG. 8 is a block diagram illustrating the queue of pending
`operations for the Flash DMA Engines according to an
`embodiment of the present invention.
FIG. 9 is a block diagram illustrating how new write opera-
tions are to be added to the queue of the Flash DMA Engines
`according to an embodiment of the present invention.
`FIG. 10 is a block diagram illustrating an updated snapshot
of the queue of pending operations according to an embodi-
`ment of the present invention.
FIG. 11 is a flowchart illustrating the process for Bad Block
`Management according to an embodiment of the present
`invention.
`FIG. 12 is a sample physical layout with flash device level
`striping according to an embodiment of the present invention.
`FIG. 13 is the LBA-PBA Map Table with striping for the
`layout shown in FIG. 12 according to an embodiment of the
`present invention.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`4
`DMA Engines. Lastly, group interleaving is the transfer of
`data by a certain Flash DMA Engines to/from different flash
`devices it controls.
`The main advantage of implementing the bus interleaving
`method is that the flash access is done in parallel utilizing the
`different flash buses, i.e. Flash Bus 0, Flash Bus 1, Flash Bus
`2, and so on. Each Flash DMA Engine uses a different flash
`bus in order to achieve parallel operations. Flash array bank
`interleaving has parallel operations during flash access by
`utilizing the busy signal status of the active flash bus. As an
`example, one engine (FDE AO, where FDE stands for Flash
`DMA Engine, the term FDE and DMA Engine is used inter(cid:173)
`changeably in this document) is writing data to a flash device
`(flash device AO) while FDE AO is waiting for the command
`15 completion, other FDE of different bank interleave, e.g., FDE
`Al, can access Flash Bus O and send out a command to a
`different target flash device such as flash device Al. Accord(cid:173)
`ingly, group interleaving performs parallel operations by hav(cid:173)
`ing a specific FDE send multiple commands to different flash
`20 devices it controls. As an example, one engine (FDE AO)
`sends a command to a flash device AO ofGroup0. While FDE
`AO is waiting for the command to be completed and the flash
`bus is temporarily idle, FDEA0 can send another command to
`a flash device in another group, e.g., flash device AO of Group
`25 1, in order to achieve optimum data transfer.
`From this, it can be seen that data transfers are most effi(cid:173)
`cient if flash devices are accessed using different flash bus
`(bus interleaving), then using different Flash DMA Engine
`(flash array bank interleaving) and lastly different group
`30 (group interleaving). Another feature of new flash devices is
`its multi-bank capability. A single flash device is sub-divided
`into four ( 4) banks wherein parallel operation can occur. In a
`Multi-Bank operation, an FDE can target up to four (4) dif(cid:173)
`ferent blocks in a target flash device and up to four ( 4) blocks
`35 can be erased and/or programmed using a single request.
`To take advantage of this parallel operation, a mapping
`scheme that considers all these capabilities must be created.
`To lessen the size of the LBA-PBA Map Table, a section size
`is defined to be the minimum relocatable area. Assuming an
`LBA size is 512 bytes and the section size is 4 KB, only one
`entry is needed for every eight (8) LBAs. The section size is
`primarily limited by the native page size of a flash device. A
`page is smaller than the minimum erase size used for erasing
`data in a flash block. In the example shown, the minimum
`erase size used is equal to the flash block size. A flash block is
`made up of multiple pages. It is always a multiple of this page
`size since a NAND flash is usually programmed on a per page
`basis. Since the section size is the minimum relocatable
`region, when only five (5) LBAs are updated, the other three
`(3) LBAs must be relocated together with the new data.
`Smaller section would therefore lead to more flexibility but
`larger overhead to maintain the LBA-PBA mapping.
`Although large section might suffer because of the need to
`relocate the urnnodified data, a typical OS or Operating Sys(cid:173)
`tem, usually accesses the media in larger blocks like 4 KB.
`The choice of the section size depends largely on how the host
`accesses the media. The larger the host access is, the more
`acceptable it is to use large section size to minimize the
`LBA-PBA mapping without suffering from the need to relo(cid:173)
`cate unmodified data. Taking the concept wherein applica(cid:173)
`tions for rotating drives tend to optimize sequential access,
`this system as illustrated in FIG. 1 should take advantage of
`this and optimize for sequential access. Therefore, an exem(cid:173)
`plary ideal layout is illustrated in FIG. 2.
`FIG. 2 is a sample physical layout of data sections accord(cid:173)
`ing to an embodiment of the present invention. For illustrative
`purposes, the system shown in FIG. 2 has sixteen (16) DMA
`
Lastly, group interleaving is the transfer of data by a certain Flash DMA Engine to/from the different flash devices it controls.

The main advantage of implementing the bus interleaving method is that flash access is done in parallel utilizing the different flash buses, i.e., Flash Bus 0, Flash Bus 1, Flash Bus 2, and so on. Each Flash DMA Engine uses a different flash bus in order to achieve parallel operations. Flash array bank interleaving achieves parallel operations during flash access by utilizing the busy signal status of the active flash bus. As an example, while one engine (FDE A0, where FDE stands for Flash DMA Engine; the terms FDE and DMA Engine are used interchangeably in this document) is writing data to a flash device (flash device A0) and waiting for command completion, another FDE of a different bank interleave, e.g., FDE A1, can access Flash Bus 0 and send out a command to a different target flash device such as flash device A1. Accordingly, group interleaving performs parallel operations by having a specific FDE send multiple commands to the different flash devices it controls. As an example, one engine (FDE A0) sends a command to a flash device A0 of Group 0. While FDE A0 is waiting for the command to be completed and the flash bus is temporarily idle, FDE A0 can send another command to a flash device in another group, e.g., flash device A0 of Group 1, in order to achieve optimum data transfer.

From this, it can be seen that data transfers are most efficient if flash devices are accessed using a different flash bus (bus interleaving), then using a different Flash DMA Engine (flash array bank interleaving), and lastly a different group (group interleaving). Another feature of new flash devices is the multi-bank capability. A single flash device is sub-divided into four (4) banks wherein parallel operation can occur. In a Multi-Bank operation, an FDE can target up to four (4) different blocks in a target flash device, and up to four (4) blocks can be erased and/or programmed using a single request.
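A minimal sketch of how this priority order (an idle bus first, then an idle engine on a busy bus, then merely a different group) could be applied when choosing where to place data is shown below. The scoring function and data shapes are assumptions made for illustration; the patent describes the principle rather than this particular code:

    # Rank candidate free locations against the operations already queued:
    # a location on an idle flash bus beats one on an idle engine of a busy bus,
    # which beats one that only differs in group.
    def placement_score(candidate, busy):
        """candidate and busy entries are (bus, engine, group) tuples."""
        busy_buses = {b for b, _, _ in busy}
        busy_engines = {(b, e) for b, e, _ in busy}
        if candidate[0] not in busy_buses:
            return 0                      # best: a completely idle flash bus
        if (candidate[0], candidate[1]) not in busy_engines:
            return 1                      # next: an idle engine on a busy bus
        return 2                          # last resort: same engine, other group

    def pick_free_section(free_sections, busy):
        # free_sections maps (bus, engine, group) -> list of erased section addresses
        best = min(free_sections, key=lambda loc: placement_score(loc, busy))
        return best, free_sections[best][0]

    # Example: with bus 0 fully busy, a free section on bus 1 is chosen.
    busy_ops = [(0, 0, 0), (0, 1, 0)]
    free = {(0, 0, 1): [0x8], (1, 2, 0): [0x0]}
    print(pick_free_section(free, busy_ops))      # ((1, 2, 0), 0)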
To take advantage of this parallel operation, a mapping scheme that considers all these capabilities must be created. To lessen the size of the LBA-PBA Map Table, a section size is defined to be the minimum relocatable area. Assuming an LBA size of 512 bytes and a section size of 4 KB, only one entry is needed for every eight (8) LBAs. The section size is primarily limited by the native page size of a flash device. A page is smaller than the minimum erase size used for erasing data in a flash block. In the example shown, the minimum erase size used is equal to the flash block size. A flash block is made up of multiple pages. The section size is always a multiple of this page size since a NAND flash is usually programmed on a per page basis. Since the section is the minimum relocatable region, when only five (5) of its LBAs are updated, the other three (3) LBAs must be relocated together with the new data. A smaller section would therefore lead to more flexibility but larger overhead to maintain the LBA-PBA mapping. Although a large section might suffer because of the need to relocate the unmodified data, a typical OS, or Operating System, usually accesses the media in larger blocks like 4 KB. The choice of the section size depends largely on how the host accesses the media. The larger the host access is, the more acceptable it is to use a large section size to minimize the LBA-PBA mapping without suffering from the need to relocate unmodified data. Taking the concept wherein applications for rotating drives tend to optimize for sequential access, the system illustrated in FIG. 1 should take advantage of this and optimize for sequential access. Therefore, an exemplary ideal layout is illustrated in FIG. 2.
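The arithmetic of this example (512-byte LBAs, a 4 KB section, one map-table entry per eight LBAs) can be summarized in a short sketch; the helper names are illustrative assumptions:

    # Section/LBA arithmetic for the example above.
    LBA_SIZE = 512                                # bytes per logical block
    SECTION_SIZE = 4 * 1024                       # bytes per section (minimum relocatable area)
    LBAS_PER_SECTION = SECTION_SIZE // LBA_SIZE   # = 8

    def section_index(lba: int) -> int:
        """Map-table row that holds the given logical block address."""
        return lba // LBAS_PER_SECTION

    def lbas_to_relocate(updated_lbas_in_section: int) -> int:
        """Unmodified LBAs that must move along with the new data."""
        return LBAS_PER_SECTION - updated_lbas_in_section

    assert LBAS_PER_SECTION == 8
    assert section_index(23) == 2      # LBAs 0-23 span sections 0-2 (see FIG. 2)
    assert lbas_to_relocate(5) == 3    # updating 5 LBAs drags the other 3 along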
FIG. 2 is a sample physical layout of data sections according to an embodiment of the present invention. For illustrative purposes, the system shown in FIG. 2 has sixteen (16) DMA Engines, with two (2) engines sharing a bus.
Each engine also controls two (2) flash devices, for a total of thirty-two (32) flash devices. In this example, a section consists of eight (8) LBAs. As can be seen from FIG. 2, consecutive sections are distributed throughout the entire flash array, taking advantage of bus interleaves, then engine interleaves, then group interleaves. In this way, when the host requests LBAs 0-23, equivalent to sections 0-2 (shown as element 201), the requests will go through FDEs 0, 2, 4 (shown as element 202) utilizing buses 0, 1, 2 (shown as element 203). Twenty-four (24) LBAs are equivalent to three (3) sections. This layout is ideal for sequential access because the mapping is able to take advantage of the bus interleaving (then bank interleaving, then group interleaving) that the system provides. So whenever the host accesses data sequentially, the mapping assures that the system will fetch the data in the most efficient, parallel way. But as noted before, due to the inherent characteristic of flash devices requiring erase cycles before writing new data, write operations will trigger the data to be relocated to new locations that have been previously erased (to save on erase cycles).
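One possible formula for this bus-first, then engine, then group ordering of consecutive sections, consistent with the FIG. 2 example geometry (8 buses, 2 engines per bus, 2 groups), is sketched below. The patent does not prescribe this particular formula; it is only an illustration that reproduces the section assignments referenced in FIGS. 2 and 3:

    # Ideal sequential layout of FIG. 2: consecutive sections walk across buses
    # first, then across the two engines sharing each bus, then across groups,
    # before wrapping to the next section address within each device.
    BUSES, ENGINES_PER_BUS, GROUPS = 8, 2, 2
    SECTIONS_PER_PASS = BUSES * ENGINES_PER_BUS * GROUPS   # 32 devices in total

    def ideal_location(section: int):
        """Return (bus, engine, group, section_within_device) for a section number."""
        slot = section % SECTIONS_PER_PASS
        section_within_device = section // SECTIONS_PER_PASS
        bus = slot % BUSES                                   # spread across buses first
        engine_on_bus = (slot // BUSES) % ENGINES_PER_BUS    # then the other engine
        group = slot // (BUSES * ENGINES_PER_BUS)            # then the other group
        return bus, bus * ENGINES_PER_BUS + engine_on_bus, group, section_within_device

    # Consistent with FIG. 3: Section 0 -> bus 0, engine 0, group 0, first section;
    # Section 1 -> bus 1, engine 2; Section 61 -> bus 5, engine 11, group 1, second
    # section of that device (flash address 0x08 at 512-byte granularity).
    assert ideal_location(0) == (0, 0, 0, 0)
    assert ideal_location(1) == (1, 2, 0, 0)
    assert ideal_location(61) == (5, 11, 1, 1)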
FIG. 3 is the LBA-PBA Map Table for the layout shown in FIG. 2 according to an embodiment of the present invention. FIG. 3 shows how the LBA-PBA Map Table looks based on the FIG. 2 layout. A section consists of a group of LBAs; in this example, a section (corresponding to a row in the table) consists of eight (8) LBAs. The PBA stored here contains the information for both the location of the flash device, uniquely identified by its engine number (the bus number was added to illustrate bus interleaving, but each engine is associated with only one bus) and group number, and the address within the flash device. From FIG. 2, the first physical section of Dev0 is labeled as Section 0, the first physical section of Dev2 is labeled as Section 1, . . . , the second physical section of Dev0 is labeled as Section 32, and so on. Correspondingly, in FIG. 3, Section 0 301 is located at Dev 0, which has the unique address Bus 0, Engine 0, Group 0 and is at address 0 within that flash device. Section 1 302 is located at Dev 2, which has the unique address Bus 1, Engine 2, Group 0 and is at address 0 within that flash device. Section 61 303 is located at Dev 27, which has the unique address Bus 5, Engine 11, Group 1 and is at address 0x08 within that flash device. Assuming the flash is addressable every 512 bytes and a section is 4 KB in size, address 0x08 represents the second physical section (or the second 4 KB unit) within a flash device, address 0x10 the third physical section (or the third 4 KB unit), and so on. The 512 byte addressable unit means that every flash address represents 512 bytes, so address 0x00 is the first 512 bytes, address 0x01 the second 512 bytes, and so on. That leads to addresses 0x00-0x07 representing the first 4 KB of a flash device and 0x08-0x0F the next 4 KB. The 512 byte addressable unit is just an arbitrary value for this system; it could be byte addressable, leading to addresses 0x0000-0x0FFF representing the first 4 KB and 0x1000-0x1FFF the next 4 KB.
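The 512-byte-addressable arithmetic of this example can be captured in a couple of helper functions; the names are illustrative assumptions, and, as noted above, the 512-byte unit is itself an arbitrary system choice:

    # Flash-address arithmetic for the example: each flash address names a
    # 512-byte unit, so a 4 KB section spans eight consecutive addresses
    # (0x00-0x07 is the first section, 0x08-0x0F the second, and so on).
    ADDRESS_UNIT = 512                     # bytes per flash address
    SECTION_SIZE = 4 * 1024                # bytes per section
    ADDRESSES_PER_SECTION = SECTION_SIZE // ADDRESS_UNIT   # = 8

    def section_start_address(nth_section_in_device: int) -> int:
        """Flash address of the nth 4 KB section within a device (0-based)."""
        return nth_section_in_device * ADDRESSES_PER_SECTION

    def byte_offset(flash_address: int) -> int:
        """Byte offset within the device for a given flash address."""
        return flash_address * ADDRESS_UNIT

    assert section_start_address(1) == 0x08   # second section, as for Section 61
    assert section_start_address(2) == 0x10   # third section
    assert byte_offset(0x08) == 4096          # i.e. the second 4 KB of the device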
Mapping also plays a major role in looking up target physical locations for bad block management, for wear-leveling, and most importantly for write operations. In write operations, instead of writing the new data in its old physical location, an erased physical location is obtained and the logical block is remapped there to save an erase cycle. Determining the new physical location is dependent on the current load of the system. As mentioned, the illustrated storage system works in the most optimum way when it takes advantage of the parallel operations it can execute at a given time, with the bus interleaving being the most beneficial (then engine interleaving, then group interleaving). That means that w