`WO 2018/057526
`This application claims the benefit of U.S. Provisional Application No. 62/517,671 filed
`June 9, 2017; U.S. Provisional Application No. 62/446,178 filed January 13, 2017; and U.S.
`Provisional Application No. 62/397,855 filed September 21, 2016, each of which is incorporated
`herein by reference in its entirety.
`The instant application contains a Sequence Listing which has been submitted
`electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII
`copy, created on September 18, 2017, is named 44854728601 _SL.txt and is 1,612 bytes in size.
`[0003] Biomolecule based information storage systems, e.g., DNA-based, have a large storage
`capacity and stability over time. However, there is a need for scalable, automated, highly accurate
`and highly efficient systems for generating biomolecules for information storage.
`Provided herein are methods for storing and accessing information, the method
`comprising: (a) converting at least one item of information in a form of at least one digital sequence
`to at least one nucleic acid sequence; (b) providing a structure comprising a surface; (c)
`synthesizing a plurality of polynucleotides having predetermined sequences collectively encoding
`for the at least one nucleic acid sequence, wherein each polynucleotide extends from the surface;
`(d) storing the plurality of polynucleotides; and (e) selectively transferring the plurality of
`polynucleotides to a receiving unit, wherein selectively transferring comprises application of a
`force, wherein the force is laminar pressure, capillary pressure, slip flow pressure, magnetic force,
`electrostatic force, peristaltic force, sound waves, vibrational force, centripetal force, centrifugal
`force, or any combination thereof, and wherein the plurality of polynucleotides collectively encodes
`for a single nucleic acid sequence of the at least one nucleic acid sequence. Further provided herein
`are methods, wherein the application of force comprises a conducting member, and an applied
`voltage potential between the structure and the conducting member. Further provided herein are
`methods, wherein the application of force comprises contacting the surface of the structure with a
`rigid or flexible slip. Further provided herein are methods, wherein the application of force
`comprises a pressure release or pressure nozzle. Further provided herein are methods further
`comprising using the pressure nozzle during step (c). Further provided herein are methods further
`comprising flooding the polynucleotides through the pressure nozzle. Further provided herein are
`methods further comprising depositing nucleotides through the pressure nozzle. Further provided
`herein are methods further comprising: sequencing the plurality of polynucleotides; and assembling
`the at least one digital sequence. Further provided herein are methods, wherein the at least one
`digital sequence assembled is 100% accurate compared to an initial at least one digital sequence.
`Provided herein are methods for storing information, the method comprising: (a)
`converting at least one item of information in a form of at least one digital sequence to at least one
`nucleic acid sequence; (b) synthesizing a plurality of polynucleotides having predetermined
`sequences collectively encoding for the at least one nucleic acid sequence, wherein each
`polynucleotide comprises: (i) a plurality of coding regions, wherein each coding region is identical;
`and (ii) at least one non-coding region, wherein the at least one non-coding region comprises a
`cleavage region; and (c) storing the plurality of polynucleotides. Further provided herein are
`methods, wherein the cleavage region comprises a restriction enzyme recognition site. Further
`provided herein are methods, wherein the cleavage region comprises a light sensitive nucleobase.
`Further provided herein are methods further comprising application of a restriction enzyme,
`electromagnetic radiation, or a gaseous reagent to cleave at the cleavage region, thereby removing
`at least one of the plurality of coding regions. Further provided herein are methods, wherein each
`coding region comprises 25 to 500 bases in length. Further provided herein are methods, wherein
`each coding region comprises 100 to 2000 bases in length. Further provided herein are methods,
`wherein each non-coding region comprises 1 to 100 bases in length. Further provided herein are
`methods, wherein each non-coding region comprises at most 200 bases. Further provided herein are
`methods, wherein the plurality of polynucleotides comprises at least 100,000 polynucleotides.
`Further provided herein are methods, wherein the plurality of polynucleotides comprises at least 10
`billion polynucleotides. Further provided herein are methods, wherein greater than 90% of the
`polynucleotides encode for a sequence that does not differ from the predetermined sequence.
`Further provided herein are methods, wherein the at least one item of information is text
`information, audio information or visual information. Further provided herein are methods, wherein
`a first non-coding region within each polynucleotide has a different sequence than a second non-
`coding region within each polynucleotide. Further provided herein are methods, wherein each non-
`coding region within each polynucleotide has a different sequence. Further provided herein are
`methods, wherein a first cleavage region within each polynucleotide has a different sequence than a
`second cleavage region within each polynucleotide. Further provided herein are methods, wherein
`each cleavage region within each polynucleotide has a different sequence. Further provided herein
`are methods, wherein a number of cleavage regions within each polynucleotide is at least 1, 2, 3, 4,
`or 5. Further provided herein are methods, wherein a sequence for the number of cleavage regions
`is different. Further provided herein are methods, wherein each polynucleotide comprises a tether
`Provided herein are methods for encrypting information, the method comprising: (a)
`converting at least one item of information in a form of at least one digital sequence to at least one
`nucleic acid sequence; (b) associating each of the at least one nucleic acid sequence with one of a
`plurality of non-identical markings; (c) providing a structure having a surface, wherein the surface
`comprises the plurality of non-identical markings; (d) synthesizing a plurality of polynucleotides
`having predetermined sequences collectively encoding for the at least one nucleic acid sequence,
`wherein the plurality of polynucleotides comprises at least 100,000 polynucleotides, and wherein
`each polynucleotide extends from the surface in a discrete region demarcated by one of the non-
`identical markings; and (e) storing the plurality of polynucleotides. Further provided herein are
`methods, wherein the plurality of polynucleotides comprises at least 1,000,000 polynucleotides.
`Further provided herein are methods, wherein greater than 90% of the polynucleotides encode for a
`sequence that does not differ from the predetermined sequence. Further provided herein are
`methods, wherein the at least one item of information is text information, audio information or
`visual information. Further provided herein are methods, wherein a subset of the polynucleotides
`discretely demarcated by one of the non-identical markings comprise a same sequence. Further
`provided herein are methods further comprising selecting a subset of polynucleotides discretely
`demarcated by one of the non-identical markings, releasing the subset of polynucleotides,
`sequencing the plurality of polynucleotides, decrypting the plurality of polynucleotides, and
`assembling the at least one digital sequence. Further provided herein are methods further
`comprising selecting a subset of polynucleotides discretely demarcated by one of the non-identical
`markings, amplifying the subset of polynucleotides, sequencing the subset of polynucleotides,
`decrypting the plurality of polynucleotides, and assembling the at least one digital sequence.
`Further provided herein are methods, wherein the at least one digital sequence assembled is 100%
`accurate compared to an initial at least one digital sequence. Further provided herein are methods,
`wherein the at least one digital sequence comprises an amount of digital information of at least 1
`gigabyte. Further provided herein are methods, wherein the at least one digital sequence comprises
`an amount of digital information of at least 1 terabyte. Further provided herein are methods,
`wherein the at least one digital sequence comprises an amount of digital information of at least 1
`Provided herein are methods for collection of information, the method comprising: (a)
`providing a structure comprising a surface, wherein the structure comprises: a first plurality of
`polynucleotides having predetermined sequences collectively encoding for at least one nucleic acid
`sequence; and a second plurality of polynucleotides having predetermined sequences collectively
`encoding for the at least one nucleic acid sequence, wherein the first plurality of polynucleotides
`and the second plurality of polynucleotides both extend from the surface and both encode for the
`same at least one nucleic acid sequence; (b) selectively separating a region of the structure
`comprising the first plurality of polynucleotides and removing the first plurality of polynucleotides
`from the surface; and (c) sequencing and decrypting the at least one nucleic acid sequence to form
`at least one digital sequence encoding for an item of information. Further provided herein are
`methods, wherein a region of the structure comprising the first plurality of polynucleotides
`comprises a cluster of channels or wells. Further provided herein are methods, wherein the structure
`is a rigid structure. Further provided herein are methods, wherein the structure is a flexible
`structure. Further provided herein are methods, wherein a region of the structure comprising only a
`remaining portion of the structure lacking the first plurality of polynucleotides is spliced back
`together. Further provided herein are methods, wherein selectively removing comprises application
`of force to a region of the structure comprising the first plurality of polynucleotides. Further
`provided herein are methods, wherein the application of force is laminar pressure, capillary
`pressure, slip flow pressure, magnetic force, electrostatic force, peristaltic force, sound waves,
`vibrational force, centripetal force, centrifugal force, or any combination thereof. Further provided
`herein are methods, wherein the application of force comprises a conducting member, and an
`applied voltage potential between the structure and the conducting member. Further provided
`herein are methods, wherein the application of force comprises contacting the surface of the
`structure with a rigid or flexible slip. Further provided herein are methods, wherein the application
`of force comprises a pressure release or pressure nozzle. Further provided herein are methods,
`wherein each polynucleotide of the first plurality of nucleotides comprises at most 500 bases in
`length. Further provided herein are methods, wherein each polynucleotide of the first plurality of
`nucleotides comprises at most 200 bases in length. Further provided herein are methods, wherein
`each polynucleotide of the second plurality of nucleotides comprises at most 500 bases in length.
`Further provided herein are methods, wherein each polynucleotide of the second plurality of
`nucleotides comprises at most 200 bases in length. Further provided herein are methods, wherein an
`amount of the item of information is at least one gigabyte. Further provided herein are methods,
`wherein an amount of the item of information is at least one terabyte. Further provided herein are
`methods, wherein an amount of the item of information is at least one petabyte.
`Provided herein are nucleic acid libraries, comprising a plurality of polynucleotides,
`wherein each of the polynucleotides comprises: (i) a plurality of coding regions, wherein each
`coding region is identical; and (ii) at least one non-coding region, wherein the at least one non-
`coding region comprises a cleavage region; and wherein when the plurality of polynucleotides are
`sequenced, decrypted, and assembled to form a digital sequence, the digital sequence has greater
`than 90% accuracy compared to a preselected digital sequence. Further provided herein are nucleic
`acid libraries, wherein the cleavage region comprises a restriction enzyme recognition site. Further
`provided herein are nucleic acid libraries, wherein the cleavage region comprises a light sensitive
`nucleobase. Further provided herein are nucleic acid libraries, further comprising application of a
`restriction enzyme, electromagnetic radiation, or a gaseous reagent to cleave at the cleavage region,
`thereby removing at least one of the plurality of coding regions. Further provided herein are nucleic
`acid libraries, wherein each coding region comprises 25 to 500 bases in length. Further provided
`herein are nucleic acid libraries, wherein each coding region comprises 100 to 2000 bases in length.
`Further provided herein are nucleic acid libraries, wherein each non-coding region comprises 1 to
`100 bases in length. Further provided herein are nucleic acid libraries, wherein each non-coding
`region comprises at most 200 bases. Further provided herein are nucleic acid libraries, wherein the
`plurality of polynucleotides comprises at least 100,000 polynucleotides. Further provided herein are
`nucleic acid libraries, wherein the plurality of polynucleotides comprises at least 10 billion
`polynucleotides. Further provided herein are nucleic acid libraries, wherein greater than 90% of the
`polynucleotides encode for a sequence that does not differ from a predetermined sequence. Further
`provided herein are nucleic acid libraries, wherein a first non-coding region within each
`polynucleotide has a different sequence than a second non-coding region within each
`polynucleotide. Further provided herein are nucleic acid libraries, wherein each non-coding region
`within each polynucleotide has a different sequence. Further provided herein are nucleic acid
`libraries, wherein a first cleavage region within each polynucleotide has a different sequence than a
`second cleavage region within each polynucleotide. Further provided herein are nucleic acid
`libraries, wherein each cleavage region within each polynucleotide has a different sequence.
`Further provided herein are nucleic acid libraries, wherein a number of cleavage regions within
`each polynucleotide is at least 1, 2, 3, 4, or 5. Further provided herein are nucleic acid libraries,
`wherein a sequence for the number of cleavage regions is different.
`Provided herein are devices for storing information, the device comprising: (a) a structure
`having a surface; and (b) a plurality of discrete regions on the surface for synthesizing a plurality of
`polynucleotides having predetermined sequences collectively encoding for at least one nucleic acid
`sequence, wherein each polynucleotide comprises: (i) a plurality of coding regions, wherein each
`coding region is identical; and (ii) at least one non-coding region, wherein the at least one non-
`coding region comprises a cleavage region; and wherein the at least one nucleic acid sequence
`encodes for at least one item of information.
`Provided herein are devices for encrypting information, the device comprising: (a) a
`structure having a surface, wherein the surface comprises a plurality of non-identical markings; and
`(b) a plurality of discrete regions on the surface for synthesizing a plurality of polynucleotides
`having predetermined sequences collectively encoding for at least one nucleic acid sequence,
`wherein the plurality of polynucleotides comprises at least 100,000 polynucleotides, and wherein
`each polynucleotide extends from the surface in a discrete region demarcated by one of the non-
`identical markings; and wherein the at least one nucleic acid sequence encodes for at least one item
`of information.
`Provided herein are methods for storing information, the method comprising: (a)
`converting at least one item of information in the form of at least one digital sequence to at least
`one nucleic acid sequence; (b) synthesizing a plurality of polynucleotides having predetermined
`sequences collectively encoding for the at least one nucleic acid sequence, wherein each
`polynucleotide comprises: (i) at least one coding sequence up to about 500 bases in length; and (ii)
`at least one bar code sequence, wherein the bar code sequence comprises sequence associated with
`the identity of the coding sequence; and (c) storing the plurality of polynucleotides. Further
`provided herein are methods, wherein each polynucleotide comprises at least one coding sequence
`up to about 300 bases in length. Further provided herein are methods, wherein the plurality of
`polynucleotides comprises at least about 100,000 polynucleotides. Further provided herein are
`methods, wherein the plurality of polynucleotides comprises at least about 10 billion
`polynucleotides. Further provided herein are methods, wherein greater than 90% of the
`polynucleotides encode for a sequence that does not differ from the predetermined sequence.
`Further provided herein are methods, wherein the at least one item of information is text
`information, audio information or visual information.
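The bar code arrangement described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each bar code is a fixed-length base-4 index placed in front of its coding sequence; the source does not specify a bar code layout, and the names `index_to_barcode`, `split_with_barcodes`, `reassemble` and the 8-base bar code length are hypothetical.

```python
from typing import List

BASES = "ACGT"  # assumed digit alphabet for the illustrative bar code


def index_to_barcode(index: int, length: int) -> str:
    """Write a non-negative integer as a fixed-length base-4 string over ACGT."""
    if index < 0 or index >= 4 ** length:
        raise ValueError("index does not fit in the bar code length")
    digits = []
    for _ in range(length):
        digits.append(BASES[index % 4])
        index //= 4
    return "".join(reversed(digits))


def barcode_to_index(barcode: str) -> int:
    """Inverse of index_to_barcode."""
    value = 0
    for base in barcode:
        value = value * 4 + BASES.index(base)
    return value


def split_with_barcodes(sequence: str, max_coding: int = 500,
                        barcode_len: int = 8) -> List[str]:
    """Split a nucleic acid sequence into coding sequences of at most
    max_coding bases, each prefixed by a bar code identifying its position."""
    chunks = [sequence[i:i + max_coding]
              for i in range(0, len(sequence), max_coding)]
    return [index_to_barcode(i, barcode_len) + chunk
            for i, chunk in enumerate(chunks)]


def reassemble(polynucleotides: List[str], barcode_len: int = 8) -> str:
    """Order polynucleotides by bar code and join their coding sequences."""
    indexed = sorted((barcode_to_index(p[:barcode_len]), p[barcode_len:])
                     for p in polynucleotides)
    return "".join(chunk for _, chunk in indexed)
```

Because each polynucleotide carries its own position, a pool can be read back in any order; for example, `reassemble(list(reversed(split_with_barcodes(seq))))` returns `seq` for any sequence over A, C, G and T.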
`[0012] All publications, patents, and patent applications mentioned in this specification are herein
`incorporated by reference to the same extent as if each individual publication, patent, or patent
`application was specifically and individually indicated to be incorporated by reference.
`The novel features of the invention are set forth with particularity in the appended claims.
`A better understanding of the features and advantages of the present invention will be obtained by
`reference to the following detailed description that sets forth illustrative embodiments, in which the
`principles of the invention are utilized, and the accompanying drawings of which:
`Figure 1 illustrates an exemplary workflow for nucleic acid-based data storage.
`Figures 2A-2C depict various polynucleotide sequence design schemes.
`Figures 3A-3D depict various polynucleotide sequence design schemes.
`Figures 4A-4B depict a barcode design scheme.
`Figure 5 illustrates a plate configured for polynucleotide synthesis comprising 24 regions,
`or sub-fields, each having an array of 256 clusters.
`Figure 6 illustrates a closer view of the sub-field in FIG. 5 having a 16 x 16 array of clusters,
`each cluster having 121 individual loci.
`Figure 7 illustrates a detailed view of the cluster in FIG. 5, where the cluster has 121
`Figure 8A illustrates a front view of a plate with a plurality of channels.
`Figure 8B illustrates a sectional view of a plate with a plurality of channels.
`Figures 9A-9B depict a continuous loop and reel-to-reel arrangements for flexible
`Figures 9C-9D depict schemas for release and extraction of synthesized polynucleotides.
`Figures 10A-10C depict a zoom in of a flexible structure, having spots, channels, or
`wells, respectively.
`Figure 11A illustrates a zoom in of loci on a structure described herein.
`Figures 11B-11C illustrate markings on structures described herein.
`Figure 12 illustrates a polynucleotide synthesis material deposition device.
`Figure 13 illustrates a polynucleotide synthesis workflow.
`Figures 14A-14B illustrate a method for electrostatic deposition of a polynucleotide into
`a plurality of channels.
`Figures 15A-15B illustrate an exemplary method for electrostatic transfer of a
`polynucleotide from a plurality of channels.
`Figures 16A-16B illustrate a method for transfer of a polynucleotide from a plurality of
`channels, through a slip mechanism.
`Figures 17A-17B illustrate a method for transfer of a polynucleotide from a plurality of
`channels, through a pressure release mechanism.
`Figure 18 illustrates a method for transfer of a polynucleotide from a plurality of channels
`in a flexible structure, through a nozzle mechanism.
`Figures 19A-19B illustrate a method for capture of a polynucleotide from a plurality of
`channels, through a pin.
`Figures 20A-20B illustrate a method for electrostatic capture of a polynucleotide from a
`plurality of channels.
`Figure 21 illustrates a method for electrostatic containment of a polynucleotide from a
`plurality of channels into a receiving unit.
`Figure 22 illustrates a method for electrostatic containment of a polynucleotide from a
`plurality of channels into a receiving unit.
`Figure 23 illustrates an example of a computer system.
`Figure 24 is a block diagram illustrating architecture of a computer system.
`Figure 25 is a diagram demonstrating a network configured to incorporate a plurality of
`computer systems, a plurality of cell phones and personal data assistants, and Network Attached
`Storage (NAS).
`Figure 26 is a block diagram of a multiprocessor computer system using a shared virtual
`address memory space.
`There is a need for larger capacity storage systems as the amount of information generated
`and stored is increasing exponentially. Traditional storage media have a limited capacity and
`require specialized technology that changes with time, requiring constant transfer of data to new
`media, often at a great expense. A biomolecule such as a DNA molecule provides a suitable host
`for information storage in part due to its stability over time and capacity for four bit information
`coding, as opposed to traditional binary information coding. Thus, large amounts of data are
`encoded in the DNA in a relatively smaller amount of physical space than used by commercially
`available information storage devices. Provided herein are methods to increase DNA synthesis
`throughput through increased sequence density and decreased turn-around time.
`[0044] Definitions
`[0045] Unless defined otherwise, all technical and scientific terms used herein have the same
`meaning as is commonly understood by one of ordinary skill in the art to which these inventions
`[0046] Throughout this disclosure, numerical features are presented in a range format. It should
`be understood that the description in range format is merely for convenience and brevity and should
`not be construed as an inflexible limitation on the scope of any embodiments. Accordingly, the
`description of a range should be considered to have specifically disclosed all the possible subranges
`as well as individual numerical values within that range to the tenth of the unit of the lower limit
`unless the context clearly dictates otherwise. For example, description of a range such as from 1 to
`6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4,
`from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual values within that range,
`for example, 1.1, 2, 2.3, 5, and 5.9. This applies regardless of the breadth of the range. The upper
`and lower limits of these intervening ranges may independently be included in the smaller ranges,
`and are also encompassed within the invention, subject to any specifically excluded limit in the
`stated range. Where the stated range includes one or both of the limits, ranges excluding either or
`both of those included limits are also included in the invention, unless the context clearly dictates
`The terminology used herein is for the purpose of describing particular embodiments only
`and is not intended to be limiting of any embodiment. As used herein, the singular forms “a,” “an”
`and “the” are intended to include the plural forms as well, unless the context clearly indicates
`otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used
`in this specification, specify the presence of stated features, integers, steps, operations, elements,
`and/or components, but do not preclude the presence or addition of one or more other features,
`integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term
`“and/or” includes any and all combinations of one or more of the associated listed items.
`[0048] Unless specifically stated or obvious from context, as used herein, the term “about” in
`reference to a number or range of numbers is understood to mean the stated number and numbers
`+/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the
`values listed for a range.
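As a worked illustration of this definition, the following minimal sketch computes the interval covered by "about" for a single number and for a range. The function names `about` and `about_range` are hypothetical and the 10% tolerance is taken from the definition above; results are floating-point values and should be compared with a tolerance.

```python
from typing import Tuple


def about(value: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Interval for "about value": the stated number +/- 10% thereof."""
    return (value * (1 - tolerance), value * (1 + tolerance))


def about_range(lower: float, upper: float,
                tolerance: float = 0.10) -> Tuple[float, float]:
    """Interval for "about lower to upper": 10% below the lower listed
    limit and 10% above the higher listed limit."""
    return (lower * (1 - tolerance), upper * (1 + tolerance))
```

Under this reading, "about 500 bases" spans roughly 450 to 550 bases, and "about 25 to 500 bases" spans roughly 22.5 to 550 bases.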
`[0049] As used herein, the terms “preselected sequence”, “predefined sequence” or
`“predetermined sequence” are used interchangeably. The terms mean that the sequence of the
`polymer is known and chosen before synthesis or assembly of the polymer. In particular, various
`aspects of the invention are described herein primarily with regard to the preparation of nucleic
`acid molecules, the sequence of the oligonucleotide or polynucleotide being known and chosen
`before the synthesis or assembly of the nucleic acid molecules.
`Provided herein are methods and compositions for production of synthetic (i.e. de novo
`synthesized or chemically synthesized) polynucleotides. Polynucleotides may also be referred to as
`oligonucleotides or oligos. Polynucleotide sequences described herein may, unless stated
`otherwise, comprise DNA or RNA.
`[0051] Nucleic Acid Based Information Storage
`Provided herein are devices, compositions, systems and methods for nucleic acid-based
`information (data) storage. An exemplary workflow is provided in FIG. 1. In a first step, a digital
`sequence encoding an item of information (i.e., digital information in a binary code for processing
`by a computer) is received 101. An encryption 103 scheme is applied to convert the digital
`sequence from a binary code to a nucleic acid sequence 105. A surface material for nucleic acid
`extension, a design for loci for nucleic acid extension (aka, arrangement spots), and reagents for
`nucleic acid synthesis are selected 107. The surface of a structure is prepared for nucleic acid
`synthesis 108. De novo polynucleotide synthesis is performed 109. The synthesized
`polynucleotides are stored 111 and available for subsequent release 113, in whole or in part. Once
`released, the polynucleotides, in whole or in part, are sequenced 115 and subjected to decryption 117 to
`convert the nucleic acid sequence back to a digital sequence. The digital sequence is then assembled 119 to
`obtain an alignment encoding for the original item of information.
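The conversion steps 103/105 and 117 of this workflow can be sketched as follows. The sketch assumes a simple two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T); this mapping and the function names `encode` and `decode` are illustrative assumptions only, as the source does not specify the scheme used.

```python
# Assumed, illustrative mapping of bit pairs to nucleobases.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert a digital sequence (bytes) to a nucleic acid sequence,
    two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Convert a nucleic acid sequence back to the digital sequence."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

With this mapping each byte becomes four bases, so `decode(encode(data))` returns the original `data`. A practical scheme would typically add constraints (for example, avoiding long homopolymer runs) and error correction, which this sketch omits.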
`Items of Information
`[0054] Optionally, an early step of a DNA data storage process disclosed herein includes obtaining
`or receiving one or more items of information in the form of an initial code. Items of information
`include, without limitation, text, audio and visual information. Exemplary sources for items of
`information include, without limitation, books, periodicals, electronic databases, medical records,
`letters, forms, voice recordings, animal recordings, biological profiles, broadcasts, films, short
`videos, emails, bookkeeping phone logs, internet activity logs, drawings, paintings, prints,
`photographs, pixelated graphics, and software code. Exemplary biological profile sources for items
`of information include, without limitation, gene libraries, genomes, gene expression data, and
`protein activity data. Exemplary formats for items of information include, without limitation, .txt,
`.PDF, .doc, .docx, .ppt, .pptx, .xls, .xlsx, .rtf, .jpg, .gif, .psd, .bmp, .tiff, .png, and .mpeg. The
`amount of individual file sizes encoding for an item of information, or a plurality of files encoding
`for items of information, in digital format include, without limitation, up to 1024 bytes (equal to 1
`KB), 1024 KB (equal to 1 MB), 1024 MB (equal to 1 GB), 1024 GB (equal to 1 TB), 1024 TB
`(equal to 1 PB), 1 exabyte, 1 zettabyte, 1 yottabyte, 1 xenottabyte or more. In some instances, an
`amount of digital information is at least 1 gigabyte (GB). In some instances,

