(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date
29 March 2018 (29.03.2018)
`
`UNDER THE PATENT COOPERATION TREATY (PCT)
`
`(10) International Publication Number
`WO 2018/057526 A2
`
`(51) International Patent Classification:
G06F 13/16 (2006.01)
`C07H 21/04 (2006.01)
`
`(21) International Application Number:
`
`PCT/US2017/052305
`
`(22) International Filing Date:
`19 September 2017 (19.09.2017)
`
(25) Filing Language: English

(26) Publication Language: English
`
(30) Priority Data:
62/397,855    21 September 2016 (21.09.2016)    US
62/446,178    13 January 2017 (13.01.2017)    US
62/517,671    09 June 2017 (09.06.2017)    US
`
`(74) Agent: DUSABAN GONZALES, Stephanie; Wilson
`Sonsini Goodrich & Rosati, 650 Page Mill Road, Palo Alto,
`California 94304 (US).
`
`(81) Designated States (unless otherwise indicated, for every
`kind of national protection available): AE, AG, AL, AM,
AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ,
CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO,
DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN,
HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP,
KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME,
MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ,
OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA,
SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN,
`TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
`
`(71) Applicant: TWIST BIOSCIENCE CORPORATION
`[US/US]; 455 Mission Bay Boulevard South, Suite 545, San
`Francisco, California 94158 (US).
`
(72) Inventor: PECK, Bill James; 3086 Carleton Place, Santa
`Clara, California 95051 (US).
`
`(84) Designated States (unless otherwise indicated, for every
`kind of regional protection available): ARIPO (BW, GH,
GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ,
UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ,
TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK,
EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV,
`
`(54) Title: NUCLEIC ACID BASED DATA STORAGE
`
[FIG. 1: workflow flowchart. Recoverable box labels: Receipt of Digital Sequence Encoding for Information (101); Encryption (103); Nucleic Acid Sequence for Synthesis (105); Selection: Surface Material, Loci Design & Reagents; Surface Preparation; De Novo Polynucleotide Synthesis; Storage; Decryption.]
`
`(57) Abstract: Provided herein are compositions, devices, systems and methods for the generation and use of biomolecule-based
`information for storage. Additionally, devices described herein for de novo synthesis of nucleic acids encoding information related to
`the original source information may be rigid or flexible material. Further described herein are highly efficient methods for long term
`data storage with 100% accuracy in the retention of information. Also provided herein are methods and systems for efficient transfer
`of preselected polynucleotides from a storage structure for reading stored information.
`
`
MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM,
TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW,
`KM, ML, MR, NE, SN, TD, TG).
`
`Declarations under Rule 4.17:
`
— as to applicant's entitlement to apply for and be granted a
`patent (Rule 4.17(ii))
`— as to the applicant's entitlement to claim the priority of the
`earlier application (Rule 4.17(iii))
`Published:
`
`— without international search report and to be republished
`upon receipt of that report (Rule 48.2(g))
— with sequence listing part of description (Rule 5.2(a))
`
`
`
`WO 2018/057526
`
`PCT/US2017/052305
`
`NUCLEIC ACID BASED DATA STORAGE
`
`CROSS-REFERENCE
`
`[0001]
`
`This application claims the benefit of U.S. Provisional Application No. 62/517,671 filed
`
June 9, 2017; U.S. Provisional Application No. 62/446,178 filed January 13, 2017; and U.S.
`
`Provisional Application No. 62/397,855 filed September 21, 2016, each of which is incorporated
`
`herein by reference in its entirety.
`
SEQUENCE LISTING
`
`[0002]
`
`The instant application contains a Sequence Listing which has been submitted
`
electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII

copy, created on September 18, 2017, is named 44854728601 _SL.txt and is 1,612 bytes in size.
`
`BACKGROUND
`
`[0003] Biomolecule based information storage systems, e.g., DNA-based, have a large storage
`
capacity and stability over time. However, there is a need for scalable, automated, highly accurate
`
`and highly efficient systems for generating biomolecules for information storage.
`
`BRIEF SUMMARY
`
`[0004]
`
`Provided herein are methods for storing and accessing information, the method
`
`comprising: (a) converting at least one item of information in a form of at least one digital sequence
`
`to at least one nucleic acid sequence; (b) providing a structure comprising a surface; (c)
`
`synthesizing a plurality of polynucleotides having predetermined sequences collectively encoding
`
`for the at least one nucleic acid sequence, wherein each polynucleotide extends from the surface;
`
`(d) storing the plurality of polynucleotides; and (e) selectively transferring the plurality of
`
polynucleotides to a receiving unit, wherein selectively transferring comprises application of a
`
`force, wherein the force is laminar pressure, capillary pressure, slip flow pressure, magnetic force,
`
`electrostatic force, peristaltic force, sound waves, vibrational force, centripetal force, centrifugal
`
`force, or any combination thereof, and wherein the plurality of polynucleotides collectively encodes
`
`for a single nucleic acid sequence of the at least one nucleic acid sequence. Further provided herein
`
`are methods, wherein the application of force comprises a conducting member, and an applied
`
`voltage potential between the structure and the conducting member. Further provided herein are
`
`methods, wherein the application of force comprises contacting the surface of the structure with a
`
`rigid or flexible slip. Further provided herein are methods, wherein the application of force
`
`comprises a pressure release or pressure nozzle. Further provided herein are methods further
`
`
`
`comprising using the pressure nozzle during step (c). Further provided herein are methods further
`
`comprising flooding the polynucleotides through the pressure nozzle. Further provided herein are
`
methods further comprising depositing nucleotides through the pressure nozzle. Further provided

herein are methods further comprising: sequencing the plurality of polynucleotides; and assembling

the at least one digital sequence. Further provided herein are methods, wherein the at least one

digital sequence assembled is 100% accurate compared to an initial at least one digital sequence.
`
`[0005]
`
Provided herein are methods for storing information, the method comprising: (a)

converting at least one item of information in a form of at least one digital sequence to at least one

nucleic acid sequence; (b) synthesizing a plurality of polynucleotides having predetermined

sequences collectively encoding for the at least one nucleic acid sequence, wherein each
`
`polynucleotide comprises: (i) a plurality of coding regions, wherein each coding region is identical;
`
`and (ii) at least one non-coding region, wherein the at least one non-coding region comprises a
`
`cleavage region; and (c) storing the plurality of polynucleotides. Further provided herein are
`
`methods, wherein the cleavage region comprises a restriction enzyme recognition site. Further
`
`provided herein are methods, wherein the cleavage region comprises a light sensitive nucleobase.
`
`Further provided herein are methods further comprising application of a restriction enzyme,
`
`electromagnetic radiation, or a gaseous reagent to cleave at the cleavage region, thereby removing
`
`at least one of the plurality of coding regions. Further provided herein are methods, wherein each
`
`coding region comprises 25 to 500 bases in length. Further provided herein are methods, wherein
`
each coding region comprises 100 to 2000 bases in length. Further provided herein are methods,

wherein each non-coding region comprises 1 to 100 bases in length. Further provided herein are
`
`methods, wherein each non-coding region comprises at most 200 bases. Further provided herein are
`
`methods, wherein the plurality of polynucleotides comprises at least 100,000 polynucleotides.
`
`Further provided herein are methods, wherein the plurality of polynucleotides comprises at least 10
`
`billion polynucleotides. Further provided herein are methods, wherein greater than 90% of the
`
`polynucleotides encode for a sequence that does not differ from the predetermined sequence.
`
`Further provided herein are methods, wherein the at least one item of information is text
`
`information, audio information or visual information. Further provided herein are methods, wherein
`
`a first non-coding region within each polynucleotide has a different sequence than a second non-
`
`coding region within each polynucleotide. Further provided herein are methods, wherein each non-
`
`coding region within each polynucleotide has a different sequence. Further provided herein are
`
`methods, wherein a first cleavage region within each polynucleotide has a different sequence than a
`
`second cleavage region within each polynucleotide. Further provided herein are methods, wherein
`
`each cleavage region within each polynucleotide has a different sequence. Further provided herein
`
are methods, wherein a number of cleavage regions within each polynucleotide is at least 1, 2, 3, 4,

or 5. Further provided herein are methods, wherein a sequence for the number of cleavage regions
`
`is different. Further provided herein are methods, wherein each polynucleotide comprises a tether
`
`region.
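
The polynucleotide layout recited above, identical coding regions separated by non-coding cleavage regions of differing sequence, can be illustrated at the string level. The following is a minimal sketch under stated assumptions, not the method of this disclosure: the function names are hypothetical, the EcoRI and BamHI recognition sequences are chosen only as familiar examples of restriction enzyme recognition sites, and cleavage is modeled as removal of the whole site rather than as the enzyme's actual cut position.

```python
# Illustrative sketch only; names and the choice of sites are assumptions.
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def build_polynucleotide(coding_region, cleavage_regions):
    """Join identical coding regions, placing one cleavage region between each pair."""
    if not set(coding_region) <= set("ACGT"):
        raise ValueError("coding region must contain only A, C, G, T")
    for site in cleavage_regions:
        if site in coding_region:
            raise ValueError(f"coding region must not contain cleavage region {site}")
    parts = [coding_region]
    for site in cleavage_regions:
        parts.append(site)
        parts.append(coding_region)
    return "".join(parts)


def split_at_sites(polynucleotide, cleavage_regions):
    """Recover the coding-region copies by cutting out each cleavage region in order."""
    copies = []
    rest = polynucleotide
    for site in cleavage_regions:
        head, found, rest = rest.partition(site)
        if not found:
            raise ValueError(f"cleavage region {site} not found")
        copies.append(head)
    copies.append(rest)
    return copies


coding = "ATGCCGTA" * 4  # a 32-base coding region, within the recited 25 to 500 bases
polynucleotide = build_polynucleotide(coding, [ECORI_SITE, BAMHI_SITE])
copies = split_at_sites(polynucleotide, [ECORI_SITE, BAMHI_SITE])
```

Because each cleavage region in this sketch has a different sequence, a single copy of the coding region could in principle be released by targeting one site while the remaining copies stay joined.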
`
`[0006]
`
`Provided herein are methods for encrypting information, the method comprising: (a)
`
converting at least one item of information in a form of at least one digital sequence to at least one

nucleic acid sequence; (b) associating each of the at least one nucleic acid sequence with one of a

plurality of non-identical markings; (c) providing a structure having a surface, wherein the surface

comprises the plurality of non-identical markings; (d) synthesizing a plurality of polynucleotides

having predetermined sequences collectively encoding for the at least one nucleic acid sequence,

wherein the plurality of polynucleotides comprises at least 100,000 polynucleotides, and wherein

each polynucleotide extends from the surface in a discrete region demarcated by one of the non-

identical markings; and (e) storing the plurality of polynucleotides. Further provided herein are
`
`methods, wherein the plurality of polynucleotides comprises at least 1,000,000 polynucleotides.
`
`Further provided herein are methods, wherein greater than 90% of the polynucleotides encode for a
`
sequence that does not differ from the predetermined sequence. Further provided herein are
`
`methods, wherein the at least one item of information is text information, audio information or
`
`visual information. Further provided herein are methods, wherein a subset of the polynucleotides
`
`discretely demarcated by one of the non-identical markings comprise a same sequence. Further
`
`provided herein are methods further comprising selecting a subset of polynucleotides discretely
`
`demarcated by one of the non-identical markings, releasing the subset of polynucleotides,
`
`sequencing the plurality of polynucleotides, decrypting the plurality of polynucleotides, and
`
`assembling the at least one digital sequence. Further provided herein are methods further
`
`comprising selecting a subset of polynucleotides discretely demarcated by one of the non-identical
`
`markings, amplifying the subset of polynucleotides, sequencing the subset of polynucleotides,
`
`decrypting the plurality of polynucleotides, and assembling the at least one digital sequence.
`
`Further provided herein are methods, wherein the at least one digital sequence assembled is 100%
`
accurate compared to an initial at least one digital sequence. Further provided herein are methods,

wherein the at least one digital sequence comprises an amount of digital information of at least 1

gigabyte. Further provided herein are methods, wherein the at least one digital sequence comprises

an amount of digital information of at least 1 terabyte. Further provided herein are methods,

wherein the at least one digital sequence comprises an amount of digital information of at least 1
`
`petabyte.
`
`
`
`[0007]
`
Provided herein are methods for collection of information, the method comprising: (a)
`
`providing a structure comprising a surface, wherein the structure comprises: a first plurality of
`
`polynucleotides having predetermined sequences collectively encoding for at least one nucleic acid
`
sequence; and a second plurality of polynucleotides having predetermined sequences collectively

encoding for the at least one nucleic acid sequence, wherein the first plurality of polynucleotides

and the second plurality of polynucleotides both extend from the surface and both encode for the

same at least one nucleic acid sequence; (b) selectively separating a region of the structure

comprising the first plurality of polynucleotides and removing the first plurality of polynucleotides
`
`from the surface; and (c) sequencing and decrypting the at least one nucleic acid sequence to form
`
`at least one digital sequence encoding for an item of information. Further provided herein are
`
methods, wherein a region of the structure comprising the first plurality of polynucleotides
`
`comprises a cluster of channels or wells. Further provided herein are methods, wherein the structure
`
`is a rigid structure. Further provided herein are methods, wherein the structure is a flexible
`
`structure. Further provided herein are methods, wherein a region of the structure comprising only a
`
remaining portion of the structure lacking the first plurality of polynucleotides is spliced back
`
`together. Further provided herein are methods, wherein selectively removing comprises application
`
`of force to a region of the structure comprising the first plurality of polynucleotides. Further
`
`provided herein are methods, wherein the application of force is laminar pressure, capillary
`
`pressure, slip flow pressure, magnetic force, electrostatic force, peristaltic force, sound waves,
`
`vibrational force, centripetal force, centrifugal force, or any combination thereof. Further provided
`
`herein are methods, wherein the application of force comprises a conducting member, and an
`
`applied voltage potential between the structure and the conducting member. Further provided
`
`herein are methods, wherein the application of force comprises contacting the surface of the
`
`structure with a rigid or flexible slip. Further provided herein are methods, wherein the application
`
`of force comprises a pressure release or pressure nozzle. Further provided herein are methods,
`
wherein each polynucleotide of the first plurality of nucleotides comprises at most 500 bases in
`
`length. Further provided herein are methods, wherein each polynucleotide of the first plurality of
`
`nucleotides comprises at most 200 bases in length. Further provided herein are methods, wherein
`
`each polynucleotide of the second plurality of nucleotides comprises at most 500 bases in length.
`
Further provided herein are methods, wherein each polynucleotide of the second plurality of
`
`nucleotides comprises at most 200 bases in length. Further provided herein are methods, wherein an
`
amount of the item of information is at least one gigabyte. Further provided herein are methods,
`
`wherein an amount of the item of information is at least one terabyte. Further provided herein are
`
`methods, wherein an amount of the item of information is at least one petabyte.
`
`[0008]
`
`Provided herein are nucleic acid libraries, comprising a plurality of polynucleotides,
`
`wherein each of the polynucleotides comprises: (i) a plurality of coding regions, wherein each
`
coding region is identical; and (ii) at least one non-coding region, wherein the at least one non-

coding region comprises a cleavage region; and wherein when the plurality of polynucleotides are
`
`sequenced, decrypted, and assembled to form a digital sequence, the digital sequence has greater
`
`than 90% accuracy compared to a preselected digital sequence. Further provided herein are nucleic
`
`acid libraries, wherein the cleavage region comprises a restriction enzyme recognition site. Further
`
`provided herein are nucleic acid libraries, wherein the cleavage region comprises a light sensitive
`
`nucleobase. Further provided herein are nucleic acid libraries, further comprising application of a
`
restriction enzyme, electromagnetic radiation, or a gaseous reagent to cleave at the cleavage region,

thereby removing at least one of the plurality of coding regions. Further provided herein are nucleic

acid libraries, wherein each coding region comprises 25 to 500 bases in length. Further provided

herein are nucleic acid libraries, wherein each coding region comprises 100 to 2000 bases in length.

Further provided herein are nucleic acid libraries, wherein each non-coding region comprises 1 to
`
`100 bases in length. Further provided herein are nucleic acid libraries, wherein each non-coding
`
`region comprises at most 200 bases. Further provided herein are nucleic acid libraries, wherein the
`
`plurality of polynucleotides comprises at least 100,000 polynucleotides. Further provided herein are
`
`nucleic acid libraries, wherein the plurality of polynucleotides comprises at least 10 billion
`
`polynucleotides. Further provided herein are nucleic acid libraries, wherein greater than 90% of the
`
`polynucleotides encode for a sequence that does not differ from a predetermined sequence. Further
`
`provided herein are nucleic acid libraries, wherein a first non-coding region within each
`
`polynucleotide has a different sequence than a second non-coding region within each
`
`polynucleotide. Further provided herein are nucleic acid libraries, wherein each non-coding region
`
`within each polynucleotide has a different sequence. Further provided herein are nucleic acid
`
`libraries, wherein a first cleavage region within each polynucleotide has a different sequence than a
`
`second cleavage region within each polynucleotide. Further provided herein are nucleic acid
`
`libraries, wherein each cleavage region within each polynucleotide has a different sequence.
`
Further provided herein are nucleic acid libraries, wherein a number of cleavage regions within

each polynucleotide is at least 1, 2, 3, 4, or 5. Further provided herein are nucleic acid libraries,

wherein a sequence for the number of cleavage regions is different.
`
`[0009]
`
`Provided herein are devices for storing information, the device comprising: (a) a structure
`
`having a surface; and (b) a plurality of discrete regions on the surface for synthesizing a plurality of
`
`polynucleotides having predetermined sequences collectively encoding for at least one nucleic acid
`
`sequence, wherein each polynucleotide comprises: (i) a plurality of coding regions, wherein each
`
coding region is identical; and (ii) at least one non-coding region, wherein the at least one non-

coding region comprises a cleavage region; and wherein the at least one nucleic acid sequence
`
`encodes for at least one item of information.
`
`[0010]
`
`Provided herein are devices for encrypting information, the device comprising: (a) a
`
structure having a surface, wherein the surface comprises a plurality of non-identical markings; and

(b) a plurality of discrete regions on the surface for synthesizing a plurality of polynucleotides

having predetermined sequences collectively encoding for at least one nucleic acid sequence,
`
`wherein the plurality of polynucleotides comprises at least 100,000 polynucleotides, and wherein
`
`each polynucleotide extends from the surface in a discrete region demarcated by one of the non-
`
`identical markings; and wherein the at least one nucleic acid sequence encodes for at least one item
`
`of information.
`
`[0011]
`
`Provided herein are methods for storing information, the method comprising: (a)
`
converting at least one item of information in the form of at least one digital sequence to at least

one nucleic acid sequence; (b) synthesizing a plurality of polynucleotides having predetermined

sequences collectively encoding for the at least one nucleic acid sequence, wherein each
`
`polynucleotide comprises: (i) at least one coding sequence up to about 500 bases in length; and (ii)
`
`at least one bar code sequence, wherein the bar code sequence comprises sequence associated with
`
`the identity of the coding sequence; and (c) storing the plurality of polynucleotides. Further
`
provided herein are methods, wherein each polynucleotide comprises at least one coding sequence

up to about 300 bases in length. Further provided herein are methods, wherein the plurality of

polynucleotides comprises at least about 100,000 polynucleotides. Further provided herein are

methods, wherein the plurality of polynucleotides comprises at least about 10 billion
`
`polynucleotides. Further provided herein are methods, wherein greater than 90% of the
`
`polynucleotides encode for a sequence that does not differ from the predetermined sequence.
`
`Further provided herein are methods, wherein the at least one item of information is text
`
`information, audio information or visual information.
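
The pairing of a coding sequence with a bar code sequence associated with its identity can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the scheme of this disclosure: the bar code is taken to be the position of the coding sequence written in base 4 over the letters A, C, G and T, the bar code length of 8 bases is arbitrary, and all function names are hypothetical.

```python
# Illustrative sketch only; the base-4 index bar code is an assumed scheme.
BASES = "ACGT"


def index_to_barcode(index, length):
    """Write a non-negative integer as a fixed-length base-4 string over A, C, G, T."""
    if not 0 <= index < 4 ** length:
        raise ValueError("index does not fit in the bar code length")
    digits = []
    for _ in range(length):
        index, remainder = divmod(index, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


def barcode_to_index(barcode):
    """Invert index_to_barcode."""
    index = 0
    for base in barcode:
        index = index * 4 + BASES.index(base)
    return index


def split_with_barcodes(sequence, coding_length=500, barcode_length=8):
    """Cut a nucleic acid sequence into coding sequences and prefix each with its bar code."""
    chunks = [sequence[i:i + coding_length] for i in range(0, len(sequence), coding_length)]
    return [index_to_barcode(i, barcode_length) + chunk for i, chunk in enumerate(chunks)]


def reassemble(polynucleotides, barcode_length=8):
    """Order polynucleotides by bar code and concatenate their coding sequences."""
    ordered = sorted(polynucleotides, key=lambda p: barcode_to_index(p[:barcode_length]))
    return "".join(p[barcode_length:] for p in ordered)


sequence = "ACGT" * 300                       # 1,200 bases to store
pool = split_with_barcodes(sequence)          # three polynucleotides: 500, 500, 200 coding bases
recovered = reassemble(list(reversed(pool)))  # the order of the pool does not matter
```

With a bar code on every polynucleotide, the pool can be stored and sequenced in any order and the coding sequences still reassemble into the original nucleic acid sequence.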
`
`INCORPORATION BY REFERENCE
`
`[0012] All publications, patents, and patent applications mentioned in this specification are herein
`
`incorporated by reference to the same extent as if each individual publication, patent, or patent
`
`application was specifically and individually indicated to be incorporated by reference.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0013]
`
`The novel features of the invention are set forth with particularity in the appended claims.
`
`A better understanding of the features and advantages of the present invention will be obtained by
`
`reference to the following detailed description that sets forth illustrative embodiments, in which the
`
`principles of the invention are utilized, and the accompanying drawings of which:
`
`[0014]
`
`Figure 1 illustrates an exemplary workflow for nucleic acid-based data storage.
`
`[0015]
`
`Figures 2A-2C depict various polynucleotide sequence design schemes.
`
`[0016]
`
`Figures 3A-3D depict various polynucleotide sequence design schemes.
`
`[0017]
`
`Figures 4A-4B depict a barcode design scheme.
`
`[0018]
`
`Figure 5 illustrates a plate configured for polynucleotide synthesis comprising 24 regions,
`
`or sub-fields, each having an array of 256 clusters.
`
`[0019]
`
Figure 6 illustrates a closer view of the sub-field in FIG. 5 having a 16 x 16 array of clusters,
`
`each cluster having 121 individual loci.
`
`[0020]
`
`Figure 7 illustrates a detailed view of the cluster in FIG. 5, where the cluster has 121
`
loci.
`
`[0021]
`
Figure 8A illustrates a front view of a plate with a plurality of channels.

[0022]

Figure 8B illustrates a sectional view of a plate with a plurality of channels.
`
`[0023]
`
`Figures 9A-9B depict a continuous loop and reel-to-reel arrangements for flexible
`
`structures.
`
`[0024]
`
Figures 9C-9D depict schemas for release and extraction of synthesized polynucleotides.

[0025]

Figures 10A-10C depict a zoom in of a flexible structure, having spots, channels, or
`
`wells, respectively.
`
`[0026]
`
`Figure 11A illustrates a zoom in of loci on a structure described herein.
`
`[0027]
`
`Figures 11B-11C illustrate markings on structures described herein.
`
`[0028]
`
`Figure 12 illustrates a polynucleotide synthesis material deposition device.
`
`[0029]
`
`Figure 13 illustrates a polynucleotide synthesis workflow.
`
`[0030]
`
Figures 14A-14B illustrate a method for electrostatic deposition of a polynucleotide into
`
`a plurality of channels.
`
`[0031]
`
`Figures 15A-15B illustrate an exemplary method for electrostatic transfer of a
`
`polynucleotide from a plurality of channels.
`
`[0032]
`
`Figures 16A-16B illustrate a method for transfer of a polynucleotide from a plurality of
`
`channels, through a slip mechanism.
`
`[0033]
`
`Figures 17A-17B illustrate a method for transfer of a polynucleotide from a plurality of
`
`channels, through a pressure release mechanism.
`
`[0034]
`
`Figure 18 illustrates a method for transfer of a polynucleotide from a plurality of channels
`
`in a flexible structure, through a nozzle mechanism.
`
`
`
`[0035]
`
`Figures 19A-19B illustrate a method for capture of a polynucleotide from a plurality of
`
`channels, through a pin.
`
`[0036]
`
`Figures 20A-20B illustrate a method for electrostatic capture of a polynucleotide from a
`
`plurality of channels.
`
`[0037]
`
`Figure 21 illustrates a method for electrostatic containment of a polynucleotide from a
`
`plurality of channels into a receiving unit.
`
`[0038]
`
`Figure 22 illustrates a method for electrostatic containment of a polynucleotide from a
`
`plurality of channels into a receiving unit.
`
`[0039]
`
`Figure 23 illustrates an example of a computer system.
`
`[0040]
`
`Figure 24 is a block diagram illustrating architecture of a computer system.
`
`[0041]
`
`Figure 25 is a diagram demonstrating a network configured to incorporate a plurality of
`
`computer systems, a plurality of cell phones and personal data assistants, and Network Attached
`
`Storage (NAS).
`
`[0042]
`
`Figure 26 is a block diagram of a multiprocessor computer system using a shared virtual
`
`address memory space.
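
The plate layout described for Figures 5 through 7 (24 sub-fields, each with 256 clusters, each cluster with 121 loci) fixes the number of loci per plate by simple multiplication. The following sketch works through that arithmetic. The capacity estimate is an assumption added for illustration only: it supposes one distinct polynucleotide per locus and two bits per base, neither of which is stated in the figure descriptions, and the function names are hypothetical.

```python
# Counts taken from the descriptions of Figures 5 through 7.
SUBFIELDS_PER_PLATE = 24         # regions, or sub-fields, per plate
CLUSTERS_PER_SUBFIELD = 16 * 16  # 256 clusters per sub-field
LOCI_PER_CLUSTER = 121           # loci per cluster


def loci_per_plate():
    """Total number of loci on one plate."""
    return SUBFIELDS_PER_PLATE * CLUSTERS_PER_SUBFIELD * LOCI_PER_CLUSTER


def plate_capacity_bytes(bases_per_polynucleotide, bits_per_base=2):
    """Rough capacity estimate, ASSUMING one distinct polynucleotide per locus.

    bits_per_base=2 is an illustrative assumption, not a value from the source.
    """
    total_bits = loci_per_plate() * bases_per_polynucleotide * bits_per_base
    return total_bits // 8


total_loci = loci_per_plate()               # 24 * 256 * 121
estimate = plate_capacity_bytes(200)        # hypothetical 200-base polynucleotides
```

Under these assumptions a plate carries 743,424 loci, which is why higher locus density or longer polynucleotides are needed to approach the gigabyte-scale amounts recited elsewhere in this disclosure.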
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`[0043]
`
`There is a need for larger capacity storage systems as the amount of information generated
`
`and stored is increasing exponentially. Traditional storage media have a limited capacity and
`
`require specialized technology that changes with time, requiring constant transfer of data to new
`
`media, often at a great expense. A biomolecule such as a DNA molecule provides a suitable host
`
`for information storage in-part due to its stability over time and capacity for four bit information
`
`coding, as opposed to traditional binary information coding. Thus, large amounts of data are
`
encoded in the DNA in a relatively smaller amount of physical space than used by commercially
`
`available information storage devices. Provided herein are methods to increase DNA synthesis
`
`throughput through increased sequence density and decreased turn-around time.
`
`[0044] Definitions
`
`[0045] Unless defined otherwise, all technical and scientific terms used herein have the same
`
`meaning as is commonly understood by one of ordinary skill in the art to which these inventions
`
`belong.
`
`[0046] Throughout this disclosure, numerical features are presented in a range format. It should
`
`be understood that the description in range format is merely for convenience and brevity and should
`
`not be construed as an inflexible limitation on the scope of any embodiments. Accordingly, the
`
`description of a range should be considered to have specifically disclosed all the possible subranges
`
`
`
`as well as individual numerical values within that range to the tenth of the unit of the lower limit
`
unless the context clearly dictates otherwise. For example, description of a range such as from 1 to

6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4,
`
`from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual values within that range,
`
`for example, 1.1, 2, 2.3, 5, and 5.9. This applies regardless of the breadth of the range. The upper
`
and lower limits of these intervening ranges may independently be included in the smaller ranges,

and are also encompassed within the invention, subject to any specifically excluded limit in the

stated range. Where the stated range includes one or both of the limits, ranges excluding either or
`
`both of those included limits are also included in the invention, unless the context clearly dictates
`
`otherwise.
`
`[0047]
`
The terminology used herein is for the purpose of describing particular embodiments only

and is not intended to be limiting of any embodiment. As used herein, the singular forms “a,” “an”
`
`and “the” are intended to include the plural forms as well, unless the context clearly indicates
`
`otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used
`
`in this specification, specify the presence of stated features, integers, steps, operations, elements,
`
`and/or components, but do not preclude the presence or addition of one or more other features,
`
`integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term
`
`“and/or” includes any and all combinations of one or more of the associated listed items.
`
`[0048] Unless specifically stated or obvious from context, as used herein, the term “about” in
`
reference to a number or range of numbers is understood to mean the stated number and numbers

+/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the

values listed for a range.
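
The definition of "about" given above reduces to a short calculation. The sketch below is offered only as a worked illustration of that definition; the function names are hypothetical.

```python
def about(value, tolerance=0.10):
    """Interval covered by 'about value': the stated number plus or minus 10% thereof."""
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def about_range(lower, upper, tolerance=0.10):
    """Interval covered by 'about lower to upper': 10% below the lower limit, 10% above the upper."""
    return (lower - abs(lower) * tolerance, upper + abs(upper) * tolerance)


about_500 = about(500)                   # "about 500 bases"
about_25_to_500 = about_range(25, 500)   # "about 25 to 500 bases"
```

Read this way, "up to about 500 bases" extends to 550 bases, and a range of "about 25 to 500 bases" spans 22.5 to 550 bases.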
`
`[0049] As used herein, the terms “preselected sequence”, “predefined sequence” or
`
“predetermined sequence” are used interchangeably. The terms mean that the sequence of the

polymer is known and chosen before synthesis or assembly of the polymer. In particular, various

aspects of the invention are described herein primarily with regard to the preparation of nucleic

acid molecules, the sequence of the oligonucleotide or polynucleotide being known and chosen
`
`before the synthesis or assembly of the nucleic acid molecules.
`
`[0050]
`
`Provided herein are methods and compositions for production of synthetic (i.e. de novo
`
`synthesized or chemically synthesized) polynucleotides. Polynucleotides may also be referred to as
`
oligonucleotides or oligos. Polynucleotide sequences described herein may, unless stated

otherwise, comprise DNA or RNA.
`
`[0051] Nucleic Acid Based Information Storage
`
`
`
`[0052]
`
`Provided herein are devices, compositions, systems and methods for nucleic acid-based
`
information (data) storage. An exemplary workflow is provided in FIG. 1. In a first step, a digital

sequence encoding an item of information (i.e., digital information in a binary code for processing

by a computer) is received 101. An encryption 103 scheme is applied to convert the digital

sequence from a binary code to a nucleic acid sequence 105. A surface material for nucleic acid

extension, a design for loci for nucleic acid extension (aka, arrangement spots), and reagents for
`
`nucleic acid synthesis are selected 107. The surface of a structure is prepared for nucleic acid
`
`synthesis 108. De novo polynucleotide synthesis is performed 109. The synthesized
`
`polynucleotides are stored 111 and available for subsequent release 113, in whole or in part. Once
`
`released, the polynucleotides, in whole or in part, are sequenced 115, subject to decryption 117 to
`
convert nucleic sequence back to digital sequence. The digital sequence is then assembled 119 to
`
`obtain an alignment encoding for the original item of information.
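
The conversion steps of the workflow, from a binary digital sequence to a nucleic acid sequence and back again, can be illustrated with a simple fixed mapping. The sketch below is an assumption made for illustration and is not the encryption scheme of this disclosure: it maps each pair of bits to one base (00 to A, 01 to C, 10 to G, 11 to T), and the function names are hypothetical. A practical scheme would typically add further constraints, such as avoiding long runs of one base, which are omitted here.

```python
# Illustrative two-bits-per-base mapping; an assumed scheme, not the source's.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert a digital sequence (bytes) to a nucleic acid sequence, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Convert a nucleic acid sequence back to the digital sequence."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


item = b"Hi"
nucleic_acid_sequence = encode(item)        # "CAGACGGC"
round_trip = decode(nucleic_acid_sequence)  # b"Hi"
```

A digital sequence recovered in this way can be compared byte for byte against the original item of information to confirm that nothing was lost between encoding, synthesis, storage and sequencing.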
`
`[0053]
`
`Items of Information
`
`[0054] Optionally, an early step of a DNA data storage process disclosed herein includes obtaining
`
`or receiving one or more items of information in the form of an initial code. Items of information
`
`include, without limitation, text, audio and visual information. Exemplary sources for items of
`
`information include, without limitation, books, periodicals, electronic databases, medical records,
`
`letters, forms, voice recordings, animal recordings, biological profiles, broadcasts, films, short
`
videos, emails, bookkeeping, phone logs, internet activity logs, drawings, paintings, prints,
`
`photographs, pixelated graphics, and software code. Exemplary biological profile sources for items
`
`of information include, without limitation, gene libraries, genomes, gene expression data, and
`
`protein activity data. Exemplary formats for items of information include, without limitation, .txt,
`
.PDF, .doc, .docx, .ppt, .pptx, .xls, .xlsx, .rtf, .jpg, .gif, .psd, .bmp, .tiff, .png, and .mpeg. The
`
`amount of individual file sizes encoding for an item of information, or a plurality of files encoding
`
`for items of information, in digital format include, without limitation, up to 1024 bytes (equal to 1
`
KB), 1024 KB (equal to 1 MB), 1024 MB (equal to 1 GB), 1024 GB (equal to 1 TB), 1024 TB

(equal to 1 PB), 1 exabyte, 1 zettabyte, 1 yottabyte, 1 xenottabyte or more. In some instances, an

amount of digital information is at least 1 gigabyte (GB). In some instances,