LETTER
`
`https://doi.org/10.1038/s41586-019-0879-y
`
`Commonality despite exceptional diversity in the
`baseline human antibody repertoire
`
`Bryan Briney1,2,3,4,5*, Anne Inderbitzin1,6, Collin Joyce1,2,3,4 & Dennis R. Burton1,2,4,5,7*
`
`In principle, humans can produce an antibody response to any non-
`self-antigen molecule in the appropriate context. This flexibility is
`achieved by the presence of a large repertoire of naive antibodies, the
`diversity of which is expanded by somatic hypermutation following
`antigen exposure [1]. The diversity of the naive antibody repertoire in
`humans is estimated to be at least 10^12 unique antibodies [2]. Because
`the number of peripheral blood B cells in a healthy adult human is
`on the order of 5 × 10^9, the circulating B cell population samples
`only a small fraction of this diversity. Full-scale analyses of human
`antibody repertoires have been prohibitively difficult, primarily
`owing to their massive size. The amount of information encoded
`by all of the rearranged antibody and T cell receptor genes in one
`person—the ‘genome’ of the adaptive immune system—exceeds the
`size of the human genome by more than four orders of magnitude.
`Furthermore, because much of the B lymphocyte population is
`localized in organs or tissues that cannot be comprehensively
`sampled from living subjects, human repertoire studies have
`focused on circulating B cells [3]. Here we examine the circulating B
`cell populations of ten human subjects and present what is, to our
`knowledge, the largest single collection of adaptive immune receptor
`sequences described to date, comprising almost 3 billion antibody
`heavy-chain sequences. This dataset enables genetic study of the
`baseline human antibody repertoire at an unprecedented depth
`and granularity, which reveals largely unique repertoires for each
`individual studied, a subpopulation of universally shared antibody
`clonotypes, and an exceptional overall diversity of the antibody
`repertoire.
`Eighteen sequencing libraries were generated for each of ten subjects
`(Extended Data Fig. 1). These libraries yielded 2.90 × 10^9 raw reads.
`After annotation [4], which included duplicate removal using unique
`molecular identifiers [5], we obtained 3.64 × 10^8 productive antibody
`sequences (Extended Data Table 1).
`Amplification was reproducible, with similar gene usage between
`replicates (Fig. 1a, Extended Data Fig. 2). The frequencies of IgM-
`encoding (0.62–0.94) and IgG-encoding (0.06–0.38) sequences were
`consistent with the expected frequency of circulating B cells that
`express these isotypes [6] (Fig. 1b). Although V-gene, J-gene and CDRH3
`length distributions were similar between subjects (Fig. 1c, e, f), differ-
`ences were large enough that individual repertoires could conceivably
`be distinguished using only these features. We reduced sequence sub-
`samples to the frequency distributions of V-gene, J-gene and CDRH3
`length, and quantified similarity using the Morisita–Horn similarity
`index [7,8]. Subject repertoires were clearly distinguishable using as few
`as 10^4 sequences (Fig. 1d, Extended Data Fig. 4) and did not cluster
`by age, gender or ethnicity (Fig. 1g). The IgG+ repertoires were least
`similar, suggesting that the unique immunological histories of subjects
`are a substantial contributor to repertoire individuality (Fig. 1h). A
`one-versus-rest support-vector-machine classifier trained on V-gene,
`J-gene and CDRH3 length data from 5 of the 6 biological replicates
`from each subject accurately assigned the remaining replicate using
`test or training datasets of as few as 500 sequences from each replicate
`(Fig. 1i).
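The Morisita–Horn comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. The record field names (`v_gene`, `j_gene`, `cdr3_length`) are hypothetical, and treating the three features as one joint category is an assumption, because the text does not say whether the distributions were combined or compared separately.

```python
from collections import Counter


def feature_counts(records):
    """Reduce sequence records to counts of (V gene, J gene, CDRH3 length).

    The dictionary keys used here are hypothetical field names.
    """
    return Counter(
        (r["v_gene"], r["j_gene"], r["cdr3_length"]) for r in records
    )


def morisita_horn(counts_a, counts_b):
    """Morisita-Horn similarity between two count tables (0 = disjoint, 1 = identical proportions)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must contain at least one sequence")
    # Sum of products of counts over categories present in both samples.
    cross = sum(n * counts_b.get(key, 0) for key, n in counts_a.items())
    # Simpson-type dominance terms for each sample.
    d_a = sum(n * n for n in counts_a.values()) / total_a ** 2
    d_b = sum(n * n for n in counts_b.values()) / total_b ** 2
    return 2 * cross / ((d_a + d_b) * total_a * total_b)
```

Because the index depends on proportions rather than raw counts, two subsamples of different depth drawn from the same feature distribution score close to 1, which is why it suits comparisons across increasing subsample sizes.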
`To estimate repertoire diversity and minimize the effects of sequenc-
`ing and amplification error, we first considered clonotype diversity. An
`antibody clonotype is a collection of sequences using the same V and J
`genes, and encoding an identical CDRH3 amino acid sequence [9]. For
`each subject, all sequences from each biological replicate were collapsed
`into a set of unique clonotypes. Any clonotypes that were repeatedly
`observed after pooling de-duplicated biological replicates must be
`derived from different cells, which provides a straightforward means
`of quantifying multiple occurrence. For clarity, clonotypes or sequences
`present in multiple biological replicates from a single subject will be
`referred to as ‘repeatedly observed’, whereas clonotypes or sequences
`found in multiple subjects will be referred to as ‘shared’.
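The clonotype definition and the distinction between 'repeatedly observed' and 'shared' can be made concrete with a short Python sketch. The record layout (dictionaries with `v_gene`, `j_gene` and `cdr3_aa` keys) is a hypothetical stand-in for whatever annotation format is actually used, and the function names are illustrative only.

```python
from collections import Counter


def clonotype(record):
    """A clonotype key: same V gene, same J gene, identical CDRH3 amino acid sequence."""
    return (record["v_gene"], record["j_gene"], record["cdr3_aa"])


def unique_clonotypes(replicate):
    """Collapse one biological replicate into its set of unique clonotypes."""
    return {clonotype(r) for r in replicate}


def repeatedly_observed(replicates):
    """Clonotypes present in two or more biological replicates of ONE subject."""
    seen = Counter()
    for replicate in replicates:
        # Each replicate contributes a clonotype at most once.
        seen.update(unique_clonotypes(replicate))
    return {c for c, n in seen.items() if n >= 2}


def shared(subjects):
    """Clonotypes found in two or more subjects.

    `subjects` maps a subject ID to that subject's list of replicates.
    """
    seen = Counter()
    for replicates in subjects.values():
        pooled = set().union(*(unique_clonotypes(rep) for rep in replicates))
        seen.update(pooled)
    return {c for c, n in seen.items() if n >= 2}
```

Collapsing each replicate to a set before counting is what makes a repeat observation evidence of a second cell: duplicates within one replicate are never counted twice.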
`Rarefaction curves indicated a low frequency of repeatedly observed
`clonotypes, which is supported by capture–recapture sampling
`(3.9–11.7% recapture; Fig. 2a, Extended Data Fig. 6). To estimate
`repertoire diversity, we selected two estimators: Chao 2 and Recon.
`Chao 2 is a non-parametric estimator that uses repeat occurrence
`data from multiple samples to estimate species richness [10]. Recon uses
`maximum likelihood to estimate species richness, assuming only that
`the overall size of the repertoire is large (relative to sampling depth)
`and well-mixed [11]. These estimates represent the total diversity that the
`humoral immune system is capable of generating. Accordingly, these
`estimates may greatly exceed the actual number of B cells present in
`a single individual at any one time. The estimators produced similar
`estimates of clonotype diversity for each subject, with identical rank
`order (Fig. 2b). Recon consistently estimated about twofold greater
`repertoire diversity (2 × 10^7–1 × 10^9) than Chao 2 (1 × 10^7–5 × 10^8),
`consistent with reports that Chao 2 underestimates richness for samples
`with a non-negligible frequency of rare species [12,13]. Pooling unique
`clonotypes from multiple subjects enabled us to estimate cohort-wide
`diversity (Fig. 2c). Chao 2 (5 × 10^9) and Recon (5 × 10^9) produced
`nearly identical estimates for the complete ten-subject pool. Estimates
`of cohort-wide clonotype diversity exceed individual subject estimates
`by less than two orders of magnitude, which suggests a relatively high
`frequency of shared clonotypes. We next sought to estimate the
`sequence diversity for each individual, again using both the Chao 2
`and Recon estimators (Fig. 2d). As expected, the estimates for
`sequences were substantially higher than for clonotypes, with Chao 2
`(2 × 10^8–2 × 10^9) and Recon (1 × 10^8–2 × 10^9) producing comparable
`estimates for each subject. Unlike the cohort-wide clonotype esti-
`mates, Recon estimated much lower cohort-wide sequence diversity
`(1 × 10^10) than Chao 2 (1 × 10^11; Fig. 2e). The light-chain repertoire
`is estimated to be approximately four orders of magnitude less diverse
`than the heavy-chain repertoire (Extended Data Fig. 7) and pairing
`of heavy and light chains is approximately random [14], which produces
`a total paired-sequence diversity estimate of 10^16 to 10^18. The most
`commonly cited estimate of antibody repertoire diversity (10^12 unique
`sequences [2]) considers only the unmutated naive repertoire. As such,
`
`1Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA. 2Center for HIV/AIDS Vaccine Immunology and Immunogen Discovery, The Scripps Research
`Institute, La Jolla, CA, USA. 3Center for Viral Systems Biology, The Scripps Research Institute, La Jolla, CA, USA. 4IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA, USA.
`5Human Vaccines Project, New York, NY, USA. 6Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland. 7Ragon Institute of
`MGH, MIT and Harvard, Cambridge, MA, USA. *e-mail: briney@scripps.edu; burton@scripps.edu
`
`© 2019 Springer Nature Limited. All rights reserved.
`
`N A T U R E | www.nature.com/nature
`
`Lassen - Exhibit 1016, p. 1
`
`

`

`[Fig. 1 graphic not reproduced in this text version; panels a–i are described in the caption that follows. Recoverable annotations: panel a (subject 326650, IgM, biological replicates), r^2 = 0.9978; panel b, mean isotype frequency 0.84 (IgM) and 0.16 (IgG). Axes include VJ frequency, Frequency, CDRH3 length (AA), Sequence count, Morisita–Horn similarity, Comparison type (intra/inter) and Mean ROC AUC.]
`
`Fig. 1 | Uniqueness of the repertoires of individual subjects.
`a, Frequency comparison of V and J combinations in biological replicates
`from subject 326650. V and J combinations are coloured according to
`the V gene used. b, Sequence frequency by antibody isotype. Subjects
`are coloured as in c. Each point represents a single biological replicate.
`Mean of all samples is indicated for each isotype. c, CDRH3 length
`distribution for each subject. CDRH3 lengths were determined using
`the Immunogenetics (IMGT) numbering scheme. AA, amino acids.
`d, Morisita–Horn similarity of pairwise comparisons between subject
`316188 and each of the other subjects. Lines indicate mean similarity
`of 20 bootstrap samplings, and shaded areas indicate 95% confidence
`intervals. Data from subject 316188 are representative; plots for all other
`subjects can be found in Extended Data Fig. 4. e, f, V gene (e) and J gene
`(f) use by subject. Increased colour intensity indicates higher frequency.
`Subjects are coloured as in c. g, Clustered distance matrix of subjects,
`using pairwise Morisita–Horn similarity of V-gene, J-gene and CDRH3
`length as the distance measure. Distance matrix was computed using
`single-linkage clustering (Euclidean distance metric). Subject colours
`are as in c. A dendrogram representation of the distance matrix is also
`shown on the left side of the distance matrix. h, Comparison of intra- and
`inter-subject similarity in V-gene, J-gene and CDRH3 length, using all
`sequences, IgM sequences with fewer than two nucleotide mutations, IgM
`sequences with two or more mutations, or IgG sequences. Points represent
`individual intra- or inter-subject comparisons. Box plots show the median
`line and span the 25th–75th percentile, with whiskers indicating the 95%
`confidence interval. i, Mean receiver operating characteristic (ROC) area
`under the curve (AUC) for a one-versus-rest support-vector-machine
`classifier. The ROC AUC does not drop below 1.0 for any subject when the
`test or training datasets include ≥ 500 sequences each; this 500-sequence
`threshold is indicated with a dashed vertical line.
`
`our sequence diversity estimates, which include both the naive and
`memory sequences, are not directly comparable to this previous esti-
`mate. Clonotype diversity estimates—which incorporate only V- and
`J-gene assignments, and the CDRH3 amino acid sequence—minimize
`the influence of somatic hypermutation, and are more suitable for
`comparison with previous estimates of naive repertoire diversity. The
`cohort-wide paired clonotype diversity using either estimator, under
`the same assumptions regarding light-chain diversity and random
`pairing, is estimated at 3 × 10^15, over three orders of magnitude
`greater than previously estimated for the naive repertoire.
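As a rough illustration of how repeat-occurrence data feed a richness estimate, the following Python sketch implements the standard incidence-based Chao 2 formula, with the bias-corrected form used as a fallback when no item occurs in exactly two samples. It is a textbook version written for clarity. Whether the authors used this exact variant is not stated here, and Recon is not reproduced.

```python
from collections import Counter


def chao2(samples):
    """Incidence-based Chao 2 richness estimate.

    `samples` is a list of collections (for example, the unique clonotypes
    of each biological replicate). Only presence or absence per sample is used.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("Chao 2 needs at least two samples")
    incidence = Counter()
    for sample in samples:
        incidence.update(set(sample))  # count each item once per sample
    s_obs = len(incidence)
    q1 = sum(1 for n in incidence.values() if n == 1)  # seen in exactly one sample
    q2 = sum(1 for n in incidence.values() if n == 2)  # seen in exactly two samples
    correction = (m - 1) / m
    if q2 > 0:
        return s_obs + correction * q1 * q1 / (2 * q2)
    # Bias-corrected form, which stays finite when q2 == 0.
    return s_obs + correction * q1 * (q1 - 1) / 2
```

The estimate grows with the ratio of singletons to doubletons, so a repertoire in which almost every clonotype is seen in only one replicate yields an estimate far above the observed count.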
`Although it is known that convergent antibodies may arise from
`different individuals in response to immunological exposure, and
`a low frequency of CDRH3 sharing has previously been observed
`in healthy adult repertoires [9,15], the overall prevalence of repertoire
`sharing is unknown. For each combination of two or more subjects,
`we computed the frequency of shared clonotypes (Fig. 3a). Pairs of
`
`
`[Fig. 2 graphic not reproduced in this text version; panels a–e are described in the caption that follows. Axes include Observed clonotypes (fraction), Unique clonotypes (fraction), Subsample fraction, Number of subjects, Diversity estimate (clonotypes) and Diversity estimate (sequences); estimators are Chao 2 (C) and Recon (R).]
`
`Fig. 2 | Clonotype and sequence diversity amongst the 10 subjects.
`a, Clonotype rarefaction curves for each subject. Lines represent the mean
`of 10 independent samplings, with the exception of the 1.0 fraction (which
`was sampled once). The dashed line represents a perfectly diverse sample.
`Inset is a close-up of the ends of the rarefaction curves. b, Estimates of
`total repertoire diversity per clonotype were computed for increasingly
`large fractions of the clonotype repertoire of each subject. Each line
`represents the mean of 10 random subsamplings without replacement
`(except for the 1.0 fraction). Chao 2 (C) estimates are shown in solid lines,
`Recon (R) estimates are shown in dashed lines. Subject colours are as in
`a. Maximum diversity (1.0 fraction of each subject) for each estimator is
`shown in the right panel. c, Overall cross-subject clonotype diversity of
`each possible combination of one or more subjects. The Chao 2 estimate
`is a solid line and the Recon estimate is a dashed line. Shaded regions
`indicate 95% confidence intervals. The confidence intervals in c are
`for different groupings of subjects, not for the estimators themselves.
`d, Estimates of total sequence repertoire diversity were computed for
`increasingly large fractions of the sequence repertoire of each subject.
`Each line represents the mean of 10 random subsamplings without
`replacement (except for the 1.0 fraction, for which only a single calculation
`was made). Chao 2 estimates are shown in solid lines, Recon estimates
`are shown in dashed lines. Subject colours are as in a. Maximum diversity
`(1.0 fraction of each subject repertoire) for each estimator is shown in the
`right panel. e, Overall cross-subject nucleotide sequence diversity of each
`possible combination of one or more subjects. The Chao 2 estimate is a
`solid line and the Recon estimate is a dashed line. Shaded regions indicate
`95% confidence intervals. Confidence intervals are as in c.
`
`subjects shared—on average—0.95% of their respective clonotypes, and
`0.022% of clonotypes were shared by all ten subjects. We next used two
`approaches to quantify the expected frequency of clonotype sharing by
`chance. Hypergeometric distributions, based on cohort-wide clonotype
`diversity (Chao 2) and the number of unique clonotypes for each subject,
`indicated a low likelihood that the observed sharing was due to chance
`(probability 8.8 × 10^-6; the Bonferroni-corrected threshold for P = 0.05
`is 1.1 × 10^-3). We also generated synthetic antibody sequences using
`IGoR [16] to determine
`

`[Fig. 3 graphic not reproduced in this text version; panels a–k are described in the caption that follows. Recoverable annotations: 0.022% of clonotypes shared by all ten subjects (panel a); 'Mean: 15.4' and 'Mean: 14.0' on two CDRH3-length panels; panel k counts of unique shared nucleotide sequences, as extracted in decreasing order: 215,799; 9,390; 2,202; 765; 275; 114; 58; 22; 3. Axes include Shared clonotype frequency, Probability of observed clonotype sharing frequency, CDRH3 length (IMGT), Cumulative frequency, Shannon entropy, CDRH3 position, Relative abundance, Mutations, Nucleotide mutations, Number of observations and Number of subjects.]
`
`Fig. 3 | Shared clonotypes and sequences amongst the 10 subjects.
`a, Venn diagram of shared clonotype frequency. b, Shared clonotype
`frequency between subject groups. Points represent different group
`combinations. Observed sequences (black), synthetic sequences generated
`with IGoR’s default model (red) and sequences generated with subject-
`specific models (blue) are shown. c, Distribution of CDRH3 lengths for
`clonotypes found in one biological replicate (top) or all six biological
`replicates (bottom). CDRH3 length is defined using IMGT numbering.
`The colour key legend is split to maintain legibility; data for all subjects
`are present in both plots. d, Distribution of CDRH3 length for unshared
`clonotypes (top) or clonotypes shared by the majority of subjects (bottom).
`Observed sequences (black), default model (red) and subject-specific
`model (blue) synthetic sequences are shown. e, Per position Shannon
`entropy of the CDRH3 head regions of unshared (solid) or majority-
`shared (dashed) clonotypes. Points indicate the mean, whiskers indicate
`the 95% confidence interval, and lines represent the linear best fit.
`f, g, Sequence logos of the CDRH3s encoded by observed unshared
`clonotypes, observed majority-shared clonotypes and synthetic majority-
`shared clonotypes of length 8 (f) or 13 (g). Head-region amino acid
`colouring: polar amino acids (GSTYCQN) are green; basic amino acids
`(KRH) are blue; acidic amino acids (DE) are red; and hydrophobic amino
`acids (AVLIPWFM) are black. All torso residues are grey. h, Relative
`abundance of amino acid properties in the CDRH3s of majority-shared
`clonotypes. Abundances are normalized to the frequency in unshared
`clonotypes. i, Nucleotide mutations for singly observed or repeatedly
`observed clonotypes. Coloured lines indicate the mean for each subject;
`dashed black line indicates the mean of all subjects. j, Nucleotide
`mutations for shared or unshared clonotypes. Coloured lines indicate the
`mean for each subject; dashed black line indicates the mean of all subjects.
`k, Mutation frequency of nucleotide sequences shared by two or more
`subjects. Points indicate mean mutation frequency. The number of unique
`nucleotide sequences in each shared group is shown.
`
`the expected frequency of clonotype sharing due to coincident V(D)J
`recombination. Synthetic sequence sets were generated using three
`different recombination models: (1) IGoR’s default model, inferred
`from unproductive antibody rearrangements and thus focused only
`on parameters related to V(D)J recombination; (2) subject-specific
`recombination models inferred from unmutated sequences from each
`subject; and (3) a combined-subject recombination model inferred
`from a pool of unmutated sequences drawn from all subjects. For each
`model, 10 batches of 10^8 sequences were generated, for a total of 3 billion
`synthetic sequences. In the sequence sets generated with IGoR’s
`default model, clonotype sharing was sevenfold lower than in human
`repertoires (0.0032%; Fig. 3b), which indicates that coincident V(D)J
`recombination alone is not sufficient to explain the observed sharing.
`The subject-derived synthetic sequence sets showed much more shar-
`ing (0.1% and 0.16%, respectively; Fig. 3b, Extended Data Fig. 8). In
`addition to containing information about V(D)J recombination, the
`subject-derived models also implicitly encode information about the
`selection processes involved in B cell development. The increased
`frequency of clonotype sharing in subject-derived synthetic datasets
`indicates that the sieving effect of B cell development produces naive
`repertoires that are more similar than recombination alone would
`be expected to produce. Combined with our observation that naive-
`enriched repertoires are more similar to each other than are class-
`switched repertoires (Fig. 1h), a model emerges in which individual
`repertoires are very dissimilar after V(D)J recombination, are homo-
`genized during B cell development and become increasingly individ-
`ualized following differential responses to immunological exposure.
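The hypergeometric reasoning used earlier to ask whether sharing could arise by chance can be sketched as follows. This is a simplified two-subject model under stated assumptions, not a reconstruction of the authors' calculation: the cohort-wide diversity is treated as a fixed pool, one subject's unique clonotypes are the 'marked' items, and a second subject's clonotypes are a random draw from the pool. All function and parameter names are hypothetical.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k marked items among `draws` items drawn without replacement)."""
    lowest = max(0, draws - (population - successes))
    highest = min(successes, draws)
    if k < lowest or k > highest:
        return 0.0
    return math.exp(
        log_comb(successes, k)
        + log_comb(population - successes, draws - k)
        - log_comb(population, draws)
    )


def hypergeom_sf(k, population, successes, draws):
    """P(at least k marked items), the chance of sharing this extreme or more."""
    highest = min(successes, draws)
    return sum(
        hypergeom_pmf(i, population, successes, draws)
        for i in range(max(k, 0), highest + 1)
    )


def expected_shared(population, successes, draws):
    """Mean number of shared items expected by chance alone."""
    return draws * successes / population
```

Working in log space keeps the binomial coefficients finite for repertoire-scale inputs, although summing a very long tail term by term would be slow; a statistics library would be the practical choice at full scale.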
`The length distributions of CDRH3s in unique and repeatedly
`observed clonotypes were similar, whereas short CDRH3s were much
`more common in shared clonotypes (Fig. 3c, d). The skew towards
`short CDRH3s in the shared population is probably due to the
`increased probability of similar recombination events among shorter
`CDRH3s. By contrast, repeatedly observed clonotypes are more often
`the result of clonal expansion, as evidenced by their increased mutation
`
`
`frequency (Fig. 3i). Shared nucleotide sequences showed a strong
`inverse relationship between mutation frequency and the number of
`shared subjects (Fig. 3k); almost all sequences shared by four or more
`subjects were unmutated. Thus, although coincident recombination
`infrequently produces identical antibody sequences, the likelihood of
`coincident recombination being linked to an identical set of somatic
`mutations is exceptionally low.
`Antibody CDRH3s can be divided into two primary regions: the
`framework-proximal ‘torso’ and the more-variable ‘head’ [17,18]. When
`comparing size-matched samples of shared and unshared clonotypes,
`we noted less diversity in the head regions of shared clonotypes.
`Furthermore, head-region diversity in shared clonotypes was inversely
`related to length of CDRH3, which is a relationship that is not seen in
`unshared clonotypes or synthetic repertoires (Fig. 3e). This inverse
`relationship—along with the skewed distribution of CDRH3 lengths
`in shared clonotypes (Fig. 3d)—indicates that two distinct processes
`shape the shared clonotype population. The shortest shared CDRH3s
`encode head-region diversity, similar to unshared CDRH3s and syn-
`thetic CDRH3s of the same length (Fig. 3f). Thus, short CDRH3s
`are probably shared primarily owing to their lower CDRH3 diversity
`and concomitantly higher likelihood of independent generation by
`coincident recombination. By contrast, longer shared CDRH3s are
`less diverse than unshared or shared synthetic populations (Fig. 3g),
`and more commonly encode head regions that are enriched in polar,
`uncharged residues and lack hydrophobic residues (Fig. 3h). This
`implies the existence of a mechanism by which these shared clono-
`types are selected or enriched after recombination, on the basis of the
`biochemical properties of their CDRH3 regions.
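Per-position diversity of the kind compared above is commonly measured with Shannon entropy. The sketch below computes it in bits for a set of equal-length amino acid strings. It assumes the head-region sequences have already been extracted and aligned by position; how the head is delimited from the torso, and whether the published values were normalized, are not specified in this section.

```python
import math
from collections import Counter


def positional_entropy(sequences):
    """Shannon entropy (bits) at each position of equal-length sequences."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    entropies = []
    for column in zip(*sequences):  # one tuple of residues per position
        total = len(column)
        counts = Counter(column)
        # H = sum over residues of p * log2(1 / p)
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies
```

A position occupied by a single residue scores 0, and a position split evenly between two residues scores 1 bit. Comparing size-matched samples matters because entropy estimates from small samples are biased downward.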
`In summary, sequencing the circulating B cell population of ten
`individuals at unprecedented depth has revealed repertoires that are
`highly individualized and extremely diverse. We estimate cohort-wide
`repertoire diversity of approximately 5 × 10^9 unique heavy-chain
`clonotypes, and as many as 1 × 10^11 unique heavy-chain sequences.
`This indicates that the paired antibody diversity available to the
`circulating repertoire is very large, perhaps in the region of 10^16–10^18
`unique antibody sequences. Despite this enormous diversity, clono-
`types are shared more frequently than would be expected from coin-
`cident V(D)J recombination. Furthermore, we found that clonotype
`sharing is probably driven primarily by selection processes related
`to early B cell development rather than by convergent responses
`to common antigens. The possible clinical and diagnostic applica-
`tions of sequencing the adaptive-immune repertoire are myriad—
`however, much work remains to be done before these applications can
`be implemented. The results described here are confined to circulating
`B cells, which represent a minority of the total B cell population. The
`repertoires of circulating and tissue-resident B cells are known to differ [19],
`and these differences may influence overall repertoire diversity
`and sharing. Furthermore, we have studied only ten individuals from
`a limited age range (18–30 years) and geographical region at a single
`time point. Much larger cohorts—representing diverse ethnicities,
`geographies and ages—will be required to capture the true population-
`wide repertoire diversity. Nevertheless, large-scale sequencing of
`the human adaptive-immune repertoire holds immense potential.
`Our use of high-level antibody-feature frequencies to differentiate
`repertoires raises the possibility of identifying and classifying
`discrete repertoire perturbations associated with autoimmune disease
`and chronic infection. Furthermore, because the repertoire of adaptive-
`immune receptors encodes a comprehensive record of an individ-
`ual’s immunological encounters, leveraging large-scale sequencing
`of adaptive-immune receptors represents an appealing strategy for
`diagnosing infection or deconvoluting infection histories. Finally, the
`individuality of the baseline repertoire of each subject suggests that
`the personalization of vaccine delivery and therapeutic intervention
`may produce substantial benefits in the treatment and prevention of
`infectious diseases.
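The paired-diversity figures quoted in the text follow from simple arithmetic under its two stated assumptions: a light-chain repertoire roughly four orders of magnitude less diverse than the heavy-chain repertoire, and random heavy-light pairing. The Python sketch below reproduces that arithmetic. Reading 'four orders of magnitude less diverse' as division by 10^4 is an interpretation, and the function name is hypothetical.

```python
def paired_diversity(heavy_diversity, light_orders_lower=4):
    """Paired heavy-light diversity under random pairing.

    Assumes light-chain diversity = heavy-chain diversity / 10**light_orders_lower.
    """
    light_diversity = heavy_diversity / 10 ** light_orders_lower
    return heavy_diversity * light_diversity


# Cohort-wide clonotypes (5 x 10^9 heavy chains) give about 2.5 x 10^15,
# in line with the quoted figure of roughly 3 x 10^15.
clonotype_pairs = paired_diversity(5e9)

# Cohort-wide sequences (1 x 10^10 to 1 x 10^11 heavy chains) span
# 10^16 to 10^18, matching the quoted paired-sequence range.
sequence_pairs_low = paired_diversity(1e10)
sequence_pairs_high = paired_diversity(1e11)
```

Because the light-chain term scales with the heavy-chain estimate, a tenfold change in heavy-chain diversity shifts the paired estimate a hundredfold, which is why the Recon and Chao 2 sequence estimates bracket two orders of magnitude.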
`
`Online content
`Any methods, additional references, Nature Research reporting summaries, source
`data, statements of data availability and associated accession codes are available at
`https://doi.org/10.1038/s41586-019-0879-y.
`
`Received: 19 September 2017; Accepted: 22 November 2018;
`Published online xx xx xxxx.
`
` 1. Rajewsky, K. Clonal selection and learning in the antibody system. Nature 381,
`751–758 (1996).
` 2. Alberts, B. et al. The Generation of Antibody Diversity (Garland Science, New York,
`2002).
` 3. Boyd, S. D. & Crowe, J. E. Jr. Deep sequencing and human antibody repertoire
`analysis. Curr. Opin. Immunol. 40, 103–109 (2016).
` 4. Briney, B. & Burton, D. Massively scalable genetic analysis of antibody
`repertoires. Preprint at https://www.biorxiv.org/content/
`early/2018/10/19/447813 (2018).
` 5. Briney, B., Le, K., Zhu, J. & Burton, D. R. Clonify: unseeded antibody lineage
`assignment from next-generation sequencing data. Sci. Rep. 6, 23901 (2016).
` 6. Morbach, H., Eichhorn, E. M., Liese, J. G. & Girschick, H. J. Reference values for B
`cell subpopulations from infancy to adulthood. Clin. Exp. Immunol. 162,
`271–279 (2010).
` 7. Morisita, M. Measuring of the dispersion of individuals and analysis of the
`distributional patterns. Mem. Fac. Sci. Kyushu Univ. Ser. E 2, 5–235 (1959).
` 8. Horn, H. S. Measurement of ‘overlap’ in comparative ecological studies. Am.
`Nat. 100, 419–424 (1966).
` 9. Setliff, I. et al. Multi-donor longitudinal antibody repertoire sequencing reveals
`the existence of public antibody clonotypes in HIV-1 infection. Cell Host Microbe
`23, 845–854 (2018).
` 10. Chao, A. Estimating the population size for capture–recapture data with
`unequal catchability. Biometrics 43, 783–791 (1987).
` 11. Kaplinsky, J. & Arnaout, R. Robust estimates of overall immune-repertoire
`diversity from high-throughput measurements on samples. Nat. Commun. 7,
`11881 (2016).
` 12. Chao, A. & Chiu, C.-H. Nonparametric Estimation and Comparison of Species
`Richness https://doi.org/10.1002/9780470015902.a0026329 (John Wiley &
`Sons, 2016).
` 13. Eren, M. I., Chao, A., Hwang, W.-H. & Colwell, R. K. Estimating the richness of a
`population when the maximum number of classes is fixed: a nonparametric
`solution to an archaeological problem. PLoS ONE 7, e34179 (2012).
` 14. DeKosky, B. J. et al. In-depth determination and analysis of the human paired
`heavy- and light-chain antibody repertoire. Nat. Med. 21, 86–91 (2015).
` 15. Arnaout, R. et al. High-resolution description of antibody heavy-chain
`repertoires in humans. PLoS ONE 6, e22365 (2011).
` 16. Marcou, Q., Mora, T. & Walczak, A. M. High-throughput immune repertoire
`analysis with IGoR. Nat. Commun. 9, 561 (2018).
` 17. Morea, V., Tramontano, A., Rustici, M., Chothia, C. & Lesk, A. M. Conformations of
`the third hypervariable region in the VH domain of immunoglobulins. J. Mol.
`Biol. 275, 269–294 (1998).
` 18. Finn, J. A. et al. Improving loop modeling of the antibody complementarity-
`determining region 3 using knowledge-based restraints. PLoS ONE 11,
`e0154811 (2016).
` 19. Briney, B. S., Willis, J. R., Finn, J. A., McKinney, B. A. & Crowe, J. E. Jr. Tissue-
`specific expressed antibody variable gene repertoires. PLoS ONE 9, e100839
`(2014).
`
`Acknowledgements The authors thank all of the study subjects for their
`participation and the Genomic Services Laboratory at the HudsonAlpha
`Institute for Biotechnology for their sequencing expertise. This work was
`supported by the National Institute of Allergy and Infectious Diseases (Center
`for HIV/AIDS Vaccine Immunology and Immunogen Discovery, UM1AI100663
`(D.R.B.); Center for Viral Systems Biology, U19AI135995 (B.B.)), the
`International AIDS Vaccine Initiative (IAVI) through the Neutralizing Antibody
`Consortium SFP1849 (D.R.B.), and the Ragon Institute of MGH, MIT and
`Harvard (D.R.B.).
`
`Author contributions B.B. and D.R.B. planned and designed the experiments.
`B.B., A.I. and C.J. performed experiments. B.B. analysed data. B.B. and D.R.B.
`wrote the manuscript. All authors contributed to manuscript revisions.
`
`Competing interests The authors declare no competing interests.
`
`Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41586-019-0879-y.
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-019-0879-y.
Reprints and permissions information is available at http://www.nature.com/reprints.
Correspondence and requests for materials should be addressed to B.B. or D.R.B.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
`
`© 2019 Springer Nature Limited. All rights reserved.
`
`METHODS
`No statistical methods were used to predetermine sample size. The experiments
`were not randomized and investigators were not blinded to allocation during
`experiments and outcome assessment.
`Leukapheresis samples. Full leukopaks (three blood volumes) were obtained from
`ten human subjects (Hemacare). Samples were collected at Hemacare’s Southern
`California donor centre. Sample collection was performed under a protocol
`approved by the Institutional Research Boards of Scripps Research and Hemacare.
`Informed consent was obtained from each subject. All subjects were healthy, HIV-
`negative adults betwe
