`https://doi.org/10.1186/s12864-018-4718-6
`
`RESEARCH ARTICLE
`
`Open Access
`
`A genome-wide survey of mutations in
`the Jurkat cell line
Louis Gioia1*, Azeem Siddique2, Steven R. Head2, Daniel R. Salomon1ˆ and Andrew I. Su1
`
`Abstract
`Background: The Jurkat cell line has an extensive history as a model of T cell signaling. But at the turn of the 21st
`century, some expression irregularities were observed, raising doubts about how closely the cell line paralleled
`normal human T cells. While numerous expression deficiencies have been described in Jurkat, genetic explanations
`have only been provided for a handful of defects.
Results: Here, we report a comprehensive catalog of genomic variation in the Jurkat cell line based on
`whole-genome sequencing. With this list of all detectable, non-reference sequences, we prioritize potentially
`damaging mutations by mining public databases for functional effects. We confirm documented mutations in Jurkat
`and propose links from detrimental gene variants to observed expression abnormalities in the cell line.
`Conclusions: The Jurkat cell line harbors many mutations that are associated with cancer and contribute to Jurkat’s
`unique characteristics. Genes with damaging mutations in the Jurkat cell line are involved in T-cell receptor signaling
`(PTEN, INPP5D, CTLA4, and SYK), maintenance of genome stability (TP53, BAX, and MSH2), and O-linked glycosylation
`(C1GALT1C1). This work ties together decades of molecular experiments and serves as a resource that will streamline
`both the interpretation of past research and the design of future Jurkat studies.
`Keywords: Jurkat, Whole-genome sequencing, Cancer, T-cell, Genome stability, T-cell receptor, T-cell acute
`lymphoblastic leukemia
`
`Background
`The Jurkat cell line was isolated in 1977 from the blood
`of a fourteen-year-old boy with Acute Lymphoblastic
`Leukemia [1]. It was one of the first in vitro systems for
`studying T-cell biology and helped to produce an incredi-
`ble number of discoveries and publications (Fig. 1) [2].
`As the workhorse behind a diverse array of molecular
`investigations, the Jurkat cell line revealed the founda-
`tions for our modern understanding of multiple signaling
`pathways. Most notably, studies of Jurkat cells established
`the bulk of what is currently known about T-cell recep-
`tor (TCR) signaling [2]. However, at the turn of the 21st
`century, as the use of Jurkat as a model T-cell line was
`reaching its height, some abnormalities in the cell line
`began to come to light.
`
`*Correspondence: lhgioia@scripps.edu
`ˆDeceased
`1Department of Molecular Medicine, The Scripps Research Institute, La Jolla,
`California 92037, USA
`Full list of author information is available at the end of the article
`
`Problems were first noticed in the form of gene expres-
`sion defects. The most publicized of these defects was
`aberrant PI3K signaling due to the absence of PTEN and
`INPP5D (SHIP) in Jurkat cells [2]. The loss of these two
`central regulators of phosphatidylinositol signaling was
`proposed as the cause of the previously-documented, con-
`stitutive activation of PI3K signaling, a major mediator
`of downstream TCR signaling events [3]. This fundamen-
`tal TCR signaling defect in Jurkat led many researchers
`to question its validity as a model system for T-cell stud-
`ies [2]. Although the number of publications using Jurkat
`dropped off over the following decade, it is still widely
`used in biomedical research (Fig. 1).
`Defect detection up to now has been primarily based
`on top-down approaches, requiring knowledge of signal-
`ing or expression defects, which leads to interrogations of
`specific coding sequences. While multiple genetic defects
`have been described over the past few decades, these top-
`down approaches are limited in scope and have failed to
`provide a broader understanding of Jurkat biology.
`
`© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
`International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
`reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
`Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
`(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
`
`UPenn Ex. 2077
`Miltenyi v. UPenn
`IPR2022-00855
`Page 1
`
`
`
`Gioia et al. BMC Genomics (2018) 19:334
`
`Fig. 1 Jurkat publication trends. Yearly publication counts for PubMed queries. Representative queries are given in the legend. Note that these
`query descriptions are abbreviations of more detailed search terms, which are provided in the “Methods” section
`
`Modern sequencing technology allows for interrogation
`of the entire genome. In contrast to top-down techniques,
`whole-genome sequencing (WGS) allows us to investigate
`genetic defects from the bottom up, with the potential
`to extend our understanding of abnormalities in Jurkat.
`Thus, in this study, we used shotgun sequencing to per-
`form genome-scale characterization of genomic variants
`in this commonly-used cell line.
`
`Results
`Sequencing and variant callers
`Whole-genome sequencing of the Jurkat cell line pro-
`duced over 366 million 100bp paired-end reads and over
`531 million 150bp paired-end reads, totaling over 116 bil-
`lion sequenced bases. More than 98% of the reads were
`successfully aligned to the hg19 human reference genome
`with the Burrows-Wheeler Aligner [4], totaling over 110
`billion aligned bases. This gave an average coverage of
`∼ 36x across the hg19 reference sequence, with over 10x
`depth of coverage for 78.8% of the genome. The aligned
`reads were then used to detect both small and large
`genomic variants in the Jurkat genome.
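
The sequencing totals quoted above can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope check rather than the authors' pipeline: the hg19 length of roughly 3.1 billion bases is an approximation assumed here, and the read counts are the rounded lower bounds given in the text.

```python
# Back-of-the-envelope coverage check using the rounded figures quoted above.
# ASSUMPTION: hg19 is taken to be ~3.1 billion bases long; the exact length
# used for the reported coverage is not stated in the text.
HG19_LENGTH_APPROX = 3.1e9


def total_bases(read_sets):
    """Sum read_count * read_length over (count, length) pairs."""
    return sum(count * length for count, length in read_sets)


def mean_coverage(aligned_bases, genome_length):
    """Average depth: aligned bases divided by reference length."""
    return aligned_bases / genome_length


# 366 million 100 bp reads plus 531 million 150 bp reads.
sequenced = total_bases([(366_000_000, 100), (531_000_000, 150)])
# "Over 110 billion" aligned bases, taken at its lower bound.
coverage = mean_coverage(110e9, HG19_LENGTH_APPROX)

print(f"sequenced bases: {sequenced:,}")
print(f"approximate mean coverage: {coverage:.1f}x")
```

With these inputs the total is a little over 116 billion bases and the mean depth comes out near 35x, which is in line with the reported ∼ 36x once the "over" attached to each rounded figure is taken into account.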
`In order to utilize all of the information available in the
`WGS data, we employed a suite of variant calling tools
`for the identification of all major types of genomic vari-
`ants. Each tool uses a certain type of sequence information
`to identify specific categories of variants. Our variant
`caller suite consisted of four distinct tools and algorithms:
`The Genome Analysis Toolkit, Pindel, BreakDancer, and
`CNVnator [5–8].
`The Genome Analysis Toolkit (GATK) from the Broad
`Institute uses De Bruijn graph-based models to iden-
`tify single-nucleotide substitutions and small insertions
and deletions. Pindel’s split-read approach can also detect
small insertions and deletions, as well as inversions, tandem
duplications, and inter-chromosomal translocations.
`BreakDancer compares the distance between aligned read
`pairs to the insert size distribution from the sequencing
`library in order to find large structural variants. CNVnator
`uses read-depth information and a mean-shift algorithm
`to assign copy number levels across the genome and
`identify deletion and duplication events.
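
The read-pair principle described for BreakDancer can be illustrated with a toy filter. This is a minimal sketch of the general idea only, not BreakDancer's actual algorithm or interface; the function name, the three-standard-deviation cutoff, and the example insert sizes are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_discordant_pairs(insert_sizes, library_sizes, n_sd=3.0):
    """Return indices of read pairs whose mapped distance looks anomalous.

    library_sizes: insert sizes from pairs assumed to be normal, used to
        estimate the library's insert size distribution.
    n_sd: hypothetical cutoff in standard deviations (an assumption).
    """
    mu = mean(library_sizes)
    sd = stdev(library_sizes)
    lower, upper = mu - n_sd * sd, mu + n_sd * sd
    return [i for i, size in enumerate(insert_sizes)
            if size < lower or size > upper]


# Made-up library with inserts centered on 300 bp.
library = [290, 300, 310, 295, 305, 300, 298, 302]
# Made-up observations: two ordinary pairs, one far too long, one far too short.
observed = [301, 5200, 297, 40]
discordant = flag_discordant_pairs(observed, library)
print(discordant)
```

A pair that maps much farther apart than the library predicts is consistent with a deletion between the mates, while a pair that maps much closer together is consistent with an insertion.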
`In order for GATK to call small variants, it must be
`told how many alleles to expect at each position. As such,
`an accurate estimate of Jurkat ploidy is required before
`GATK can be used. While both the original 1977 publica-
`tion and the American Type Culture Collection (ATCC)
`report that Jurkat is diploid, other publications refute this
`description. The first karyotypes of the Jurkat cell line
`were published by Snow and Judd in 1987, who found
`that Jurkat was hypotetraploid, possessing fewer than four
`times the haploid number of chromosomes [9]. A few
`years later, tetraploidy was corroborated by an investiga-
`tion of p53 mutations, which found that the Jurkat cell line
`contained 4 separate p53 alleles [10]. More recent reports
`confirm Jurkat tetraploidy. The German Collection of
`Microorganisms and Cell Cultures (DSMZ) describes the
`Jurkat karyotype as a "human flat-moded hypotetraploid
`karyotype with 7.8% polyploidy." In addition, a multicolor-
`Fluorescence In Situ Hybridization study from 2013 found
`within-culture mosaicism on a tetraploid background
`[11].
`
`Variant calls
`Given the previous reports of tetraploidy, we ran GATK
`with a ploidy count of 4. GATK identified nearly 5 mil-
`lion variants, comprising ∼ 3.5 million single-nucleotide
substitutions, ∼ 1.0 million small deletions, and ∼ 357
thousand small insertions, across over 4.6 million variant
`loci. Basic metrics for the GATK variant calls are consis-
`tent with normal human samples. The ratio of homozy-
`gous to heterozygous variant loci is 0.635, which is in
`the range of previously reported ratios, and the ratio of
`transitions to transversions is 2.10, which is the expected
`value for human genomes [12]. The number of single-
`nucleotide substitutions is similar to previously reported
`values. However, the total number of indels is higher
`than published values from human WGS studies, which
`generally detect fewer than 700 thousand indels by shot-
`gun sequencing [12]. To date, the highest number of
`indels identified in a single human genome was ∼ 850
`thousand—determined via Sanger sequencing of J.C. Ven-
`ter’s genome [13]. This enrichment for indels, especially
`deletions, in Jurkat is likely to be at least partially due to
`the redundancy of the tetraploid genome.
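
The transition-to-transversion ratio mentioned above is straightforward to compute from a list of substitutions. The following is a small illustrative sketch; the function names and the six example substitutions are made up, and it assumes each record is a single-base change with distinct reference and alternate bases.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(substitutions):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in substitutions if is_transition(ref, alt))
    transversions = len(substitutions) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Made-up example: four transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(example)
print(ratio)
```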
`The Pindel variant caller detected 1.4 million deletions,
`740 thousand insertions, 18 thousand duplications, 150
`thousand inversions, and 4 inter-chromosomal translo-
`cations. The split-read approach is markedly similar to
`GATK’s method for the detection of small insertions and
`deletions. GATK also uses split-reads, but its detection of
`variants relies on an assembly-based method that is lim-
`ited to small sequence differences between the reads and
`the reference genome. Accordingly, the small indels called
`by both methods should be similar. As expected, in the
`Jurkat call set, over 85% of the deletions and over 65% of
`the insertions that were identified by GATK have direct
`matches in the Pindel calls.
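
The kind of direct-match comparison described above can be sketched as a set lookup. This is an illustrative simplification, not the authors' comparison code: it assumes both call sets have already been normalized to the same representation, and the coordinates and alleles shown are invented.

```python
def fraction_matched(calls_a, calls_b):
    """Fraction of calls in calls_a that have an identical call in calls_b.

    Each call is a (chrom, pos, ref, alt) tuple. ASSUMPTION: both call sets
    use the same normalized indel representation, so equal events compare equal.
    """
    if not calls_a:
        return 0.0
    lookup = set(calls_b)
    matched = sum(1 for call in calls_a if call in lookup)
    return matched / len(calls_a)


# Hypothetical deletion calls from two tools.
gatk_deletions = [
    ("chr1", 100, "AT", "A"),
    ("chr1", 250, "GCC", "G"),
    ("chr2", 75, "TA", "T"),
    ("chr3", 10, "CAG", "C"),
]
pindel_deletions = [
    ("chr1", 100, "AT", "A"),
    ("chr1", 250, "GCC", "G"),
    ("chr2", 75, "TA", "T"),
    ("chr5", 900, "GTT", "G"),
]
matched = fraction_matched(gatk_deletions, pindel_deletions)
print(f"{matched:.0%} of the first call set has a direct match")
```

Exact matching is sensitive to how each tool positions an indel within a repeat, so in practice some normalization step is usually needed before a comparison like this is meaningful.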
`BreakDancer identified 6128 deletions, 18 insertions,
`183 inversions, 1981 intra-chromosomal translocations,
`and 113 inter-chromosomal translocations.
`CNV calls from CNVnator are presented in Fig. 2
by percentage of the genome. A plot of the raw read
depth density is provided in Additional file 1: Figure S1.
`CNVnator reported a modal copy number of 4 in Jurkat,
`representing over 65% of the genome and corroborat-
`ing reports of tetraploidy. From the CNVnator results,
`we identified 2499 deletion sites (CN ≤ 1), of which
`218 were homozygous (CN = 0), and 1863 duplication
`sites (CN ≥ 5).
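
The copy-number thresholds stated above translate directly into a small classifier. The sketch below assumes copy numbers have already been rounded to integers; the label strings and the example segment values are hypothetical, and the handling of copy numbers between 2 and 4 as uncalled follows only from the thresholds given in the text.

```python
from collections import Counter


def classify_copy_number(cn):
    """Label a segment using the thresholds given in the text.

    CN = 0 is reported separately as a homozygous deletion, which is a
    subset of the deletion calls (CN <= 1). CN >= 5 is a duplication.
    ASSUMPTION: cn is an integer-rounded copy number estimate.
    """
    if cn == 0:
        return "homozygous deletion"
    if cn <= 1:
        return "deletion"
    if cn >= 5:
        return "duplication"
    return "neither"


# Made-up copy numbers for nine segments on a mostly tetraploid background.
segments = [4, 4, 0, 1, 5, 7, 3, 2, 4]
tally = Counter(classify_copy_number(cn) for cn in segments)
print(dict(tally))
```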
`The structural variant calls from each tool were com-
`pared and merged with specific considerations made for
`each category of variant and each detection tool (see
`“Methods” section). Short and long insertions and dele-
`tions were defined using a cutoff of 50 bp, in accordance
`with the structural variant databases from NCBI [14].
`The numbers of variants called by each tool, along with
`the proportion of overlapping loci and total number of
`merged calls, are provided in Table 1.
`Most types of variants were called by multiple tools.
`However, the number of variants called by each tool and
`the number of variant calls that were unique to each tool
`varied greatly between variant classes and individual vari-
`ant callers (Table 1). Furthermore, each tool differed in the
`sizes of variants that it called (Additional file 1: Figures
`S2-S8).
`The relative contributions of each variant caller to the
`total set of merged calls are displayed in Fig. 3. Pindel calls
`dominated the merged variant sets, with the exception
`of translocations. This unmatched number of Pindel calls
`can be attributed to the power of the split-read approach.
`On the other hand, Pindel calls are limited in their utility
`due to the tool’s inability to determine allele frequencies.
`In contrast to Pindel’s detection power and lack of allele
`annotations, GATK and CNVnator are both limited in
`the range of variant sizes that they can detect but are
`able to consider all alleles. Therefore, while Pindel calls
make up the majority of detected variants, GATK and
CNVnator calls were prioritized in our investigations of
variant consequence.

Fig. 2 Histogram of DNA copy number in Jurkat. Binned copy number alterations as fractions of the genome

Table 1 Variant loci counts from each tool
Variant type           GATK            Pindel               BreakDancer    CNVnator       Merged
Substitutions          3,520,988       -                    -              -              3,520,988
Short Hom. Deletion    170,397         -                    -              -              170,397
Short Deletion         841,001 (70%)   1,239,299 (47%)      47 (0%)        -              1,460,321
Short Insertion        326,446 (65%)   616,298 (35%)        -              -              729,727
Long Hom. Deletion     326 (0%)        -                    -              108 (0%)       434
Long Deletion          1904 (61%)      118,610 (1.4%)       6081 (10%)     2499 (1.2%)    125,397
Long Insertion         1039 (31%)      125,918 (0.25%)      18 (0%)        -              126,657
Duplication            -               17,762 (22%)         -              1863 (24%)     15,288
Inversion              -               149,545 (0.0087%)    183 (7.1%)     -              149,715
Intra. Translocation   -               -                    1981           -              1981
Inter. Translocation   -               4 (0%)               113 (0%)       -              117
The percentage of sites that overlap the other tools is provided where applicable
`
Fig. 3 Comparison of variant loci counts from each tool. a Total number of merged variant loci called by all tools for different variant types.
b Fraction of merged variant loci called by each tool for different variant types

Comparisons to databases
After creating the merged variant sets, we compared
them to databases of previously identified variants in
order to assess the novelty of the genomic variants that
were detected in Jurkat. We used dbSNP and DGV as
resources for known short and long variants, respectively
[15, 16]. Both of these databases contain the variants that
were identified by the 1000Genomes project in addition
to variants cataloged by other sources. Comparisons of
short variants (single-nucleotide substitutions,
short deletions, and short insertions) to variants found
in the 1000Genomes project and dbSNP are given in
`Fig. 4. Single-nucleotide substitutions showed the great-
`est number of matches, while fewer than half of the short
`insertions and deletions were found in dbSNP. An even
`greater reduction in the number of database matches
`was seen in the long structural variant database compar-
`isons (Fig. 4). The differences in the number of database
`matches between single-nucleotide variants, short indels,
and long structural variants are likely due to several factors.
The difficulty of structural variant detection and the paucity
of studies investigating these larger variants are major
contributors to these differences, but
`the increased mutational sample space of larger variants
`may also play a role.
`We also compared our SNV and small indel calls to
`those found in Jurkat by the COSMIC Cell Line project
`[17]. Our WGS approach identified nearly 10x as many
`SNVs as were detected by COSMIC via microarray. How-
`ever, of the ∼ 408 thousand Jurkat SNVs in COSMIC,
`we uncovered over 383 thousand (94%) matching single-
`nucleotide variants. Within the matching SNV calls, geno-
`types between the two call sets agreed at over 97% of loci.
`The same level of agreement was observed for both the
`∼ 174 thousand homozygous COSMIC calls and the ∼
`210 thousand heterozygous COSMIC calls. Deletion and
`insertion calls showed less overlap, but we were able to
`find 67% of the 18 thousand COSMIC deletion calls and
`40% of the 2260 COSMIC insertion calls in our data.
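
The two quantities reported in this comparison, the share of reference calls that were recovered and the genotype agreement at shared loci, can be computed as sketched below. This is an illustration under assumptions: the function name, the dictionary layout, the "het"/"hom" labels, and the toy loci are all hypothetical and do not reflect the actual COSMIC file format.

```python
def compare_call_sets(reference_calls, query_calls):
    """Compare two genotype call sets keyed by locus.

    Both arguments map a locus, e.g. (chrom, pos), to a genotype label.
    Returns (recovered, concordance): the fraction of reference loci present
    in the query, and the fraction of shared loci with the same genotype.
    concordance is None when no loci are shared.
    """
    if not reference_calls:
        raise ValueError("reference call set is empty")
    shared = [locus for locus in reference_calls if locus in query_calls]
    recovered = len(shared) / len(reference_calls)
    if not shared:
        return recovered, None
    agree = sum(1 for locus in shared
                if reference_calls[locus] == query_calls[locus])
    return recovered, agree / len(shared)


# Toy call sets; loci and genotypes are invented.
array_calls = {("chr1", 100): "het", ("chr1", 200): "hom",
               ("chr2", 50): "het", ("chr3", 5): "hom"}
wgs_calls = {("chr1", 100): "het", ("chr1", 200): "het",
             ("chr2", 50): "het", ("chr9", 1): "hom"}
recovered, concordance = compare_call_sets(array_calls, wgs_calls)
print(f"recovered {recovered:.0%}, genotype agreement {concordance:.0%}")
```

Applied to the rounded figures in the text, 383 thousand matches out of ∼ 408 thousand reference SNVs gives a recovered fraction of about 94%.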
`Our final comparison to previously identified variants
`focused on rare, pathogenic variants from the ClinVar
`database. After removing records without assertion cri-
`teria, corresponding to a review status of zero stars, 10
`Jurkat variants were reported as pathogenic by ClinVar
`
`(Table 2). Interestingly, 6 of the 10 variants, involving 5
`separate genes, are thought to cause cancer. The other
`pathogenic ClinVar matches are associated with severe
`developmental defects. Long deletions and duplications
`from Jurkat were also found in ClinVar, but the annota-
`tions do not contain gene information and are generally
`less informative (Additional files 2 and 3).
Moving from established to predicted effects, we
used SnpEff to predict the functional consequences
of the GATK-called small variants. SnpEff identified
9997 synonymous and 10,984 nonsynonymous mutations.
Among the nonsynonymous mutations, 252 variants
are nonsense mutations and 10,732 variants are
missense mutations. ‘High Impact’ functional effects
were predicted for 1141 of the small variant loci, of
which 747 variants were determined to be rare (MAF
< 0.001) in the Exome Aggregation Consortium (ExAC)
dataset of over 60 thousand human samples [18]. These
rare, high-impact variants were predicted to affect
678 genes.
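
The filtering step described above, keeping high-impact variants that are rare in a population dataset, can be sketched as follows. This is a minimal illustration, not the authors' code: the record type, field names, and gene labels are hypothetical, and treating variants that are absent from the population dataset as rare is an assumption that the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedVariant:
    gene: str                    # placeholder gene label
    impact: str                  # predicted impact, e.g. "HIGH" or "MODERATE"
    pop_maf: Optional[float]     # population minor allele frequency, None if absent


def rare_high_impact(variants, maf_cutoff=0.001):
    """Keep HIGH-impact variants with MAF below the cutoff.

    ASSUMPTION: a variant with no population frequency record is kept as rare.
    """
    return [v for v in variants
            if v.impact == "HIGH"
            and (v.pop_maf is None or v.pop_maf < maf_cutoff)]


# Invented records for illustration only.
variants = [
    AnnotatedVariant("GENE_A", "HIGH", None),
    AnnotatedVariant("GENE_B", "HIGH", 0.2),
    AnnotatedVariant("GENE_C", "MODERATE", 0.0001),
    AnnotatedVariant("GENE_D", "HIGH", 0.0005),
]
kept = rare_high_impact(variants)
affected_genes = sorted({v.gene for v in kept})
print(affected_genes)
```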
`A second set of ‘High Impact’ variants was created
`from the homozygous deletion calls that intersected cod-
`ing exons. This high-impact, homozygous deletion set
`includes 120 variant loci across 129 genes.
`All sets of variants, including those of high impact,
`appear to be distributed across the genome (Fig. 5). How-
`ever, even if the mutations are randomly distributed, it
`is still possible that some biological processes are more
`affected than others. The two sets of highly impacted
`genes were combined, producing a set of 781 unique
`genes. This list of likely damaged genes was used to probe
`selected gene set databases from MSigDB [19]. The top 5
`enriched gene sets are displayed in Table 3.
Fig. 4 Jurkat variants with database matches. Jurkat variant loci that have matches in dbSNP (short variants) and DGV (long variants) as a percentage
of total Jurkat variant sites for each type of variant. Number of database matches over the number of Jurkat variant loci: 3.29M / 3.52M substitutions;
652K / 1.46M short deletions; 323K / 730K short insertions; 6.38K / 125K long deletions; 286 / 127K long insertions; 1.27K / 15.3K duplications

Table 2 Jurkat variants found in the ClinVar database
rsID           Jurkat AF   Gene          Phenotype                ClinVar accession
ClinVar substitutions
rs63750636     1.0         MSH2          Lynch syndrome           RCV000076405.3
rs397517342    0.75        CDH23         Usher syndrome type 1D   RCV000039224.2
rs397516435    0.25        TP53          Li-Fraumeni syndrome     RCV000205265.3
ClinVar short deletions
rs63750075     -           MSH6          Lynch syndrome           RCV000074711.2
rs398122841    -           BAX           Carcinoma of colon       RCV000010120.5
rs397508104    -           KCNQ1-(AS1)   Long QT syndrome         RCV000046039.3
rs750664956    -           ASPM          Not provided             RCV000217980.1
rs786204835    -           PURA          Not provided             RCV000169739.5
ClinVar short insertions
rs397507178    -           RAD50         Hereditary cancer        RCV000030958.3
rs398122840    -           BAX           Carcinoma of colon       RCV000010119.5

As might be expected from a cancer cell line, the damaged
genes in Jurkat are involved in genome, cell cycle, and
cytoskeleton maintenance, as well as sugar processing.
The enrichment of damaged genes that are involved in the
immune system is particularly interesting given the Jurkat
cell line’s role in establishing our current understanding of
T-cell immune responses.
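
Overlap between a gene list and a gene set, as in the enrichment analysis discussed above, is commonly scored with a hypergeometric tail probability. The sketch below shows that calculation using only the standard library. Whether the reported p-values were computed exactly this way is an assumption, as is the size of the background gene universe, which this section does not state; the numbers in the example are a toy case chosen so the result can be checked by hand.

```python
from math import comb


def hypergeom_tail(k, set_size, n_hits, universe):
    """P(X >= k) for the overlap X between a hit list and a gene set.

    universe: total number of background genes (not given in the text).
    set_size: number of genes in the gene set.
    n_hits:   number of genes in the hit list (e.g. damaged genes).
    k:        observed overlap.
    """
    upper = min(set_size, n_hits)
    total = comb(universe, n_hits)
    favorable = sum(comb(set_size, i) * comb(universe - set_size, n_hits - i)
                    for i in range(k, upper + 1))
    return favorable / total


# Toy case: 20 background genes, a gene set of 5, a hit list of 4, overlap 2.
p_toy = hypergeom_tail(k=2, set_size=5, n_hits=4, universe=20)
print(f"P(overlap >= 2) = {p_toy:.4f}")
```

For the toy case the favorable count is 10·105 + 10·15 + 5·1 = 1205 out of C(20, 4) = 4845 possible hit lists, or about 0.249.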
`
While the gene set enrichment analysis aided in
categorizing the many genetic aberrations in the Jurkat
cell line, most of the top-enriched sets are broad, suggesting
gross defects across general biological processes.
These findings reinforce the growing body of literature
that has cataloged numerous irregularities in Jurkat biology,
but they also imply that the deviation from normal
T-cell biology may be more extensive than previous studies
had reported.

Fig. 5 Genomic variation distributions. Distributions of multiple types of variants across the Jurkat genome. Plotted data listed from outside-in: 1.
hg19 genome ideogram (gray); 2. Density of SnpEff “High Impact” SNVs with rare ExAC allele frequencies (gold); 3. Homozygous deletions that lie in
coding exons (red); 4. Deletions longer than 25 kb (blue); 5. Insertions longer than 50 bp that lie in coding exons (green); 6. Inversions longer than
25 kb (cyan); 7. Interchromosomal translocations (center)

Table 3 Gene sets enriched for highly impacted genes
GO: CHROMOSOME ORGANIZATION
Overlap: 51/1009
p-value = 0.00011
Genes: APBB1, ATXN3, ATXN7, BAZ2A, BCORL1, BRD8, CDC14A, CDCA5,
CDYL, CENPT, CLASP1, CREBBP, EHMT1, GATAD2B, GTF2H3, HDAC4,
KAT2A, KDM5B, KIF23, KNTC1, MLH3, MSH2, NCAPD2, NDC80, NOC2L,
PIBF1, PRIM2, PTGES3, RAD50, RBL2, RSF1, SETX, SLX4, SMARCC2,
SMC3, TCF7L2, TEP1, TET1, TEX14, TOP1MT, TP53, TTN, USP15, VPRBP,
YEATS2, ZNF304, ZNF462

GO: CELL CYCLE
Overlap: 62/1316
p-value = 0.00015
Genes: ADCY3, ANAPC5, APBB1, ARHGEF2, BAX, CDC14A, CDC27, CDCA5,
CDK14, CECR2, CENPT, CEP164, CKAP2, CLASP1, DYNC1H1, DYNC1I2,
FANCI, HINFP, HSP90AA1, INTS3, IQGAP3, KIAA0430, KIAA1377, KIF23,
KNTC1, KRT18, MACF1, MAP3K8, MAP9, MCM8, MKI67, MLH3, MNS1, MSH2,
NCAPD2, NDC80, NUP214, NUP98, OFD1, ORC1, PHLDA1, PIBF1, PRIM2,
PSMD3, PTEN, PYHIN1, RAD50, RBL2, RUVBL1, SMC3, SON, TCF7L2, TEX14,
THAP1, TP53, TP53BP1, TPR, TRIOBP, TSC1, TTK, TTN, ZFHX3

GO: CARBOHYDRATE DERIVATIVE BIOSYNTHETIC PROCESS
Overlap: 34/595
p-value = 0.00017
Genes: ADCY2, ADCY3, ADCY9, ALG1L2, ALG9, B3GALT1, B3GNT6, BCAN,
BMPR2, C1GALT1C1, CANT1, CHST15, CHSY1, GAL3ST4, GPC6, GUCY2C,
GXYLT1, HAS3, KIAA2018, MUC16, MUC19, MUC3A, MUC6, NDST4, OMD,
PHLDA1, PIGS, PRKCSH, SLC25A13, ST3GAL3, ST3GAL5, TET1, UGCG, UGP2

GO: NEGATIVE REGULATION OF ORGANELLE ORGANIZATION
Overlap: 25/387
p-value = 0.00020
Genes: ARHGEF2, CDC14A, CKAP2, CLASP1, DYNC1H1, KIAA1377, KIF23,
LIMA1, MAP9, MSH2, NDC80, NOC2L, OFD1, OTUB1, PIBF1, RAD50, SMC3,
SPTA1, SPTAN1, TET1, TEX14, TPR, TRIOBP, TTK, UBQLN4

GO: IMMUNE SYSTEM PROCESS
Overlap: 85/1984
p-value = 0.00023
Genes: ABCB5, ADAM17, AGBL5, AIM2, AP3B1, APOB, ARHGEF2, BAX,
C1orf177, C7, CCL13, CD14, CD177, CEACAM8, CLNK, CREBBP, CTLA4,
CYFIP2, DEFB126, DHX58, DYNC1H1, DYNC1I2, DYNC2H1, ENDOU, ENPP3,
F2, FN1, HDAC4, HLA-DRB5, HNRNPK, HSH2D, HSP90AA1, IGJ, IL10RB,
IL27RA, IL2RG, ILF2, INPP5D, IPO7, ITGA6, KIF23, KIF3C, KIR2DS4,
KLC2, LILRA3, MAP3K1, MAP3K8, MSH2, NCAM1, NLRC3, NLRC5, OAS1,
OTUB1, PAPD4, PIBF1, PODXL, PRKACG, PSMD3, PTEN, RHOH, SAMHD1,
SARM1, SEC31A, SECTM1, SHC1, SLC3A2, SLFN11, SPEF2, SPTA1, STAT5B,
SYK, SYNCRIP, TAB2, TAPBP, TEK, TMIGD2, TNFSF4, TNK2, TRIL, TRIM10,
TSC1, ULBP1, VPRBP, WDR7, WIPF1
`
`Defective pathways
`By leveraging the deep history of the Jurkat cell line, in
`combination with our pathogenic and high-impact variant
`lists, we have distinguished three core pathways that are
`defective due to genomic aberrations in Jurkat—namely
`TCR signaling, genome stability, and O-linked glycosyla-
`tion. This analysis is not exhaustive. Rather, we focused
`on pathways that are well-supported by both the literature
`and our genomic analysis.
`
`TCR signaling
`The damaged genes affecting T-cell receptor signaling are
`PTEN, INPP5D, CTLA4, and SYK. TCR signaling in Jurkat
`was first called into question due to the lack of PTEN and
`INPP5D expression [3, 20]. Both PTEN and INPP5D are
`lipid phosphatases that regulate PI3K signaling by degrad-
`ing PtdIns(3,4,5)P3. PTEN mutations in Jurkat were first
`described by Sakai et al. in 1998. They found two sep-
`arate alterations in exon 7 "without normal conformers
`present," both of which introduced stop codons [21].
`We detected the same two heterozygous variants. SnpEff
`annotated one of these mutations as a frameshift vari-
`ant and the other as a stop-gained variant, predicting that
`both of these variants would result in loss of function.
INPP5D (SHIP1) has long been known to not be
expressed in the Jurkat cell line [3]. We have identified
a single-nucleotide substitution that changes codon
`317 from glutamine to a stop codon, as well as a
`47 bp heterozygous deletion from hg19.chr2:234068130–
`234068177. These same mutations were detected in 2009
`via targeted sequencing [22]. Admittedly, the lack of allele
`resolution in our data precludes us from making defini-
`tive claims about these mutations, as we cannot distin-
`guish which alleles were affected. Fortunately, the targeted
`sequencing study found the stop codon on one allele and
`the 47 bp deletion on the others, both of which should
`block the production of a full length INPP5D transcript.
`CTLA4 is a CD28 homolog that transmits an inhibitory
`signal to T cells. In 1993, Lindsten et al. noticed that
`“CTLA4 mRNA is not expressed nor induced in the Jurkat
`T cell line” [23]. However, the reason for this lack of
`CTLA4 induction has not been proposed. More recent
`investigations have detected both the protein and the tran-
`script, although the transcript was less abundant in Jurkat
`than in peripheral blood mononuclear cells [24]. This
`finding seems to support the hypothesis that the CTLA4
`protein is accumulated in the cytosol [24]. Our analyses
`revealed a heterozygous, stop-gained, single-nucleotide
`substitution that converts codon 20 to a stop codon.
`
`This mutation was found in around half of the mapped
`reads and might be responsible for the decreased CTLA4
`expression that has been observed in Jurkat cells, although
`other mechanisms may be at play.
`SYK is a member of the Syk family of non-receptor
`tyrosine kinases. It functions similarly to ZAP70 in trans-
`mitting signals from the T-cell receptor. In 1995, Fargnoli
`et al. reported that SYK is not expressed in the Jurkat
`cell line and contains a guanine insertion that causes a
`frameshift at codon 34. We identified the same heterozy-
`gous insertion in our sample, which is predicted to result
`in loss of function of the transcript, yet the mechanism
`behind the lack of expression of the other allele remains
`an open question.
`Interestingly, Fargnoli et al. proposed that the lack of
`SYK expression in Jurkat “may have facilitated the ini-
`tial identification and characterization of ZAP70 as the
`major ζ-associated protein” [25]. On the other hand, while
`the lack of SYK expression in Jurkat was subsequently
`confirmed, reconstitution studies suggest that SYK and
`ZAP70 occupy distinct roles in TCR signaling, with SYK
`displaying 100-fold greater kinase activity than ZAP70
`[26, 27].
`
`Genome stability
`TP53, BAX, and MSH2 encode tumor suppressors
`involved in maintaining genomic stability that are severely
mutated in Jurkat. The product of the TP53 gene, p53,
is known to be deficient in the Jurkat cell line
[20]. In 1990, Cheng and Haas detected a heterozygous,
`stop-gained single-nucleotide substitution in codon 196
`(R196*) in Jurkat cells. They proposed that this muta-
`tion “may play a role in the genesis or in the tumorigenic
`progression of leukemic T cells” [10]. We detected the
`same heterozygous mutation (rs397516435) in exon 6 of
`the TP53 gene and found that this mutation is associated
`with Li-Fraumeni syndrome [28], which is an autosomal
`dominant hereditary disorder that causes the early onset
`of tumors. This mutation is likely responsible for the
`consistent reports of p53 deficiencies in Jurkat cells.
`While loss of p53’s protective effects is normally thought
`of as the mechanism behind tumorigenesis, in some cases,
`truncated p53 can gain oncogenic functions [29]. Recent
`studies have revealed that stop-gained mutations in exon
6 of TP53 produce a truncated p53 isoform that seems
`to partially escape nonsense-mediated decay. These iso-
`forms, termed p53ψ, lack canonical p53 transcriptional
`activity. Instead, they localize to the mitochondria, where
`they activate a pro-tumorigenic cellular program by regu-
`lating mitochondrial transition pore permeability through
`interaction with cyclophilin D [30]. The Jurkat cell line’s
`expression of a p53ψ isoform may contribute to the
`previously-reported, exaggerated Ca2+ release upon TCR
`activation [31].
`
BAX is a member of the Bcl-2 gene family and
helps induce apoptosis. In the Jurkat cell line, BAX is
not expressed due to the presence of two heterozygous
frameshift mutations in codon 41 [32]. All alleles
are affected. We identified the same two variants,
rs398122841 and rs398122840, each of which was found
in approximately half of the mapped reads.
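
On a tetraploid background, the fraction of reads supporting a variant gives a rough estimate of how many chromosome copies carry it. The sketch below illustrates that reasoning; the function name is hypothetical, and it assumes a local copy number of 4 and unbiased read sampling, neither of which is guaranteed at any particular locus.

```python
def estimated_allele_copies(read_fraction, copy_number=4):
    """Nearest whole number of chromosome copies carrying a variant.

    ASSUMPTIONS: the locus has the stated copy number (4 by default, the
    modal value reported for Jurkat) and reads sample all copies evenly.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    return round(read_fraction * copy_number)


# Two different variants, each seen in about half of the reads at one locus.
copies_first = estimated_allele_copies(0.5)
copies_second = estimated_allele_copies(0.5)
print(copies_first + copies_second)
```

Under these assumptions, two distinct variants that each appear in about half of the reads would together account for all four copies, which is consistent with every allele being affected. The same arithmetic maps allele fractions of 0.25, 0.75, and 1.0 to one, three, and four copies.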
`Investigations into microsatellite instability revealed
`that MSH2 is not expressed in Jurkat due to a stop-gained
`point mutation in exon 13 [33]. We identified the same
`variant as a homozygous single-nucleotide substitution
(rs63750636). MSH2 is involved in DNA mismatch repair,
`and this stop-gained variant is associated with hereditary
`nonpolyposis colorectal cancer [34].
`
`O-linked glycosylation
The Jurkat cell line’s inability to properly synthesize
O-glycans, due to deficient core 1 synthase,
glycoprotein-N-acetylgalactosamine 3-beta-galactosyltransferase 1
(C1GALT1) activity, was first noticed in 1990 [35]. This
`deficiency causes Jurkat to express the Tn antigen, which
`is associated with cancer and other pathologies. In 2002,
`Ju and Cummings reported that a single-nucleotide
`deletion in COSMC (C1GALT1C1), a chaperone for
`C1GALT1, was responsible for Jurkat’s truncated O-
`glycans. The deletion causes a frameshift and introduces
`a stop codon in the only exon of the COSMC gene. Ju and
`Cummings assumed that the Jurkat cell line had retained
`its diploid, male genome and possessed only one copy of
`the X chromosome. We now know that the Jurkat cell line
`has two copies of the X chromosome, but consistent with
`the original report, we have determined through deep
`sequencing that the mutation is, indeed, homozygous
`across Jurkat’s two X chromosomes [36].
`
`Discussion
`We performed a bottom-up search for abnormalities in
`the Jurkat genome using short-read sequencing. We detect
`numerous examples of each examined variant type and
`use various strategies to tie these variants to functional
`effects. Our analysis identifies multiple dysfunctional
`pathways in the Jurkat cell line.
`While some of the variants were previously detected
`using top-down methods, we were able to add hundreds of
`potentially damaging variants to the list of Jurkat’s genetic
`defects. Gene set enrichment analysis revealed that many
`of the affected genes lie in pathways that are commonly
defective in cancer. The large number of potentially damaged
genes, combined with the large-scale genomic rearrangements
in Jurkat, makes it difficult to pinpoint the
`cause of Jurkat’s biological abnormalities. However, some
`of the better-studied mutations, such as those reported as
`pathogenic by ClinVar, are likely to have significant effects
`on important signaling pathways.
`
`In addition to these putatively damaging variants, we
`identified millions of mutations across all categories of
`genomic variants. The effects of these variants are less cer-
`tain, but our comprehensive variant catalog will facilitate
`further investigations of the Jurkat genome and allow for
`re-analysis of Jurkat variants as more information about
`their effects becomes available.
`Using our list of variants, we were also able to exten-
`sively search the literature for previously identified defects
`in Jurkat. We found a number of reports describing
`the same variants that we had independently identified,
`confirming the presence of these mutations in extramu-
`ral Jurkat samples. Uncovering these past publications
`required precise knowledge of damaged genes in Jurkat.
`They were difficult, if not impossible, to find using general
`queries and were published over a decade ago in a range
`of journals. Furthermore, with the exception of the PTEN
`and INPP5D defects, these reports had never been consol-
`idated into a single resource, making our documentation
`of previous reports the first review of damaged genes in
`the Jurkat cell line.
`The defects in these genes have the potential to con-
`found prior findings in Jurkat, but the loss is unlikely
`to put a dent in the vast amount of knowledge that we
`have gained from this cell line. In fact, Jurkat’s expression
`deficiencies open the door for reconstitution experiments
`that, in other systems, would first require suppression of
`the gene products. Many studies have already put this idea
`into action. Transgenic expression of INPP5D and SYK
`constructs has already generated breakthroughs in our
`understanding of their biological activities [26, 27, 37, 38].
`Likewise