OPEN ACCESS Freely available online
`
`The Database for Aggregate Analysis of ClinicalTrials.gov
`(AACT) and Subsequent Regrouping by Clinical Specialty
`
`Asba Tasneem1*, Laura Aberle1, Hari Ananth1, Swati Chakraborty1, Karen Chiswell1, Brian J. McCourt1,
`Ricardo Pietrobon1,2
`
`1 Duke Clinical Research Institute, Durham, North Carolina, United States of America, 2 Department of Surgery, Duke University School of Medicine, Durham, North
`Carolina, United States of America
`
`Abstract
`
`Background: The ClinicalTrials.gov registry provides information regarding characteristics of past, current, and planned
`clinical studies to patients, clinicians, and researchers; in addition, registry data are available for bulk download. However,
`issues related to data structure, nomenclature, and changes in data collection over time present challenges to the
`aggregate analysis and interpretation of these data in general and to the analysis of trials according to clinical specialty in
`particular. Improving usability of these data could enhance the utility of ClinicalTrials.gov as a research resource.
`
`Methods/Principal Results: The purpose of our project was twofold. First, we sought to extend the usability of
`ClinicalTrials.gov for research purposes by developing a database for aggregate analysis of ClinicalTrials.gov (AACT) that
`contains data from the 96,346 clinical trials registered as of September 27, 2010. Second, we developed and validated a
`methodology for annotating studies by clinical specialty, using a custom taxonomy employing Medical Subject Heading
`(MeSH) terms applied by an NLM algorithm, as well as MeSH terms and other disease condition terms provided by study
`sponsors. Clinical specialists reviewed and annotated MeSH and non-MeSH disease condition terms, and an algorithm was
`created to classify studies into clinical specialties based on both MeSH and non-MeSH annotations. False positives and false
`negatives were evaluated by comparing algorithmic classification with manual classification for three specialties.
`
`Conclusions/Significance: The resulting AACT database features study design attributes parsed into discrete fields,
`integrated metadata, and an integrated MeSH thesaurus, and is available for download as Oracle extracts (.dmp file and text
`format). This publicly-accessible dataset will facilitate analysis of studies and permit detailed characterization and analysis of
`the U.S. clinical trials enterprise as a whole. In addition, the methodology we present for creating specialty datasets may
`facilitate other efforts to analyze studies by specialty groups.
`
`Citation: Tasneem A, Aberle L, Ananth H, Chakraborty S, Chiswell K, et al. (2012) The Database for Aggregate Analysis of ClinicalTrials.gov (AACT) and Subsequent
`Regrouping by Clinical Specialty. PLoS ONE 7(3): e33677. doi:10.1371/journal.pone.0033677
`
`Editor: Joel Joseph Gagnier, University of Michigan, United States of America
`
`Received October 14, 2011; Accepted February 14, 2012; Published March 16, 2012
Copyright: © 2012 Tasneem et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
`unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
`
`Funding: Financial support for this work was provided by cooperative agreement U19 FD003800 awarded by the U.S. Food and Drug Administration to Duke
`University in support of the Clinical Trials Transformation Initiative. The funders had no role in study design, data collection and analysis, decision to publish, or
`preparation of the manuscript.
`
`Competing Interests: The authors have declared that no competing interests exist.
`
`* E-mail: asba.tasneem@duke.edu
`
`Introduction
`
ClinicalTrials.gov (www.ClinicalTrials.gov) is a registry of human clinical research studies.
It is hosted by the National Library of Medicine (NLM) at the National Institutes of Health
(NIH) in collaboration with the U.S. Food and Drug Administration (FDA). As mandated by
federal law [1], ClinicalTrials.gov provides a central resource for information about clinical
trials; in addition, it increases the public visibility of such research. The registry currently
contains over 100,000 research studies conducted in more than 170 countries and is widely used
both by medical professionals and the public. New research studies are being submitted to the
registry by their respective sponsors (or sponsors’ designees) at a rate of approximately 350
per week [2]. Due to legislative [1] and institutional [3] requirements enacted in the latter
half of the previous decade, compliance with registry obligations is assumed to be high for
U.S. drug and device trials, and the consistency, quality, and maintenance of registry data
have improved with increased use [4]. However, the registry has not been optimized for the
analysis of aggregate data, and a systematic effort to create and maintain a database for this
purpose has not previously been undertaken.
`In November 2007, the FDA and Duke University announced
`the formation of a public-private partnership to improve the
`quality and efficiency of clinical trials. This collaboration of more
`than 60 organizations and government agencies was convened by
`Duke University under a memorandum of understanding with
`FDA, and is now known as the Clinical Trials Transformation
Initiative (CTTI) [5]. CTTI leaders recognized that ClinicalTrials.gov
represented a promising source for benchmarking the
`state of the clinical trials enterprise, as the registry contains studies
`from the full range of sponsoring organizations. Increasing the
`usability of ClinicalTrials.gov data may therefore facilitate
`systematic evaluation of clinical studies aimed at building the
`
`
`PLoS ONE | www.plosone.org
`
`1
`
`March 2012 | Volume 7 |
`
`Issue 3 | e33677
`
`MPI EXHIBIT 1065 PAGE 1
`
`Apotex v. Novo - IPR2024-00631
`Petitioner Apotex Exhibit 1065-0001
`
`

`

`Database for Aggregate Analysis of CT.gov
`
knowledge base needed to inform medical practice and prevention.
As data have accumulated in ClinicalTrials.gov, users have increasingly sought capabilities
that would allow aggregated descriptive characterization of the national research portfolio;
however, access and data usability issues, including data format and design, present
obstacles. A number of related initiatives, including the Ontology of Clinical Research
(OCRe) [6], Human Studies Database (HSDB) [7], CDISC Protocol Representation Model [8], and
LinkedCT [9] projects, are addressing ontological annotations, large-scale data mining, data
representation format, and external association of these data, respectively. The results of
this project are complementary to these initiatives and are expected to collectively advance
this area of study as a whole.
`In this article, we report on CTTI’s efforts to prepare and
`maintain a publicly accessible analysis dataset derived from
`ClinicalTrials.gov content—the database for aggregate analysis
`of ClinicalTrials.gov (AACT). We also discuss efforts to extend the
`
`utility of the analysis dataset by means of an associated clinical
`specialty taxonomy designed to support research policy analyses.
`
`Methods
`
`1. Creation of the AACT
Key design features of AACT include 1) the capacity to extend the dataset by parsing existing
data; 2) linking to additional data resources, such as the Medical Subject Headings (MeSH)
thesaurus; and 3) integrated metadata. A framework for extensions allows entire studies or
individual fields to be associated with new data resources while preserving provenance. In
addition, the integrated data dictionary developed for this project facilitates browsing and
analysis of ClinicalTrials.gov and AACT metadata. Finally, the database incorporates a
flexible design that can accommodate future developments, such as coding biospecimen type,
sponsors, and OCRe annotations. Figure 1 shows key enhancements achieved by building the AACT.
`
[Figure 1 schematic. AACT components: metadata tables (CURRENT_VARIABLES, ENUMERATIONS,
VARIABLE_HISTORY_DATES); DESIGNS table with parsed study design (Primary Purpose, Masking,
Intervention Model, Allocation, Endpoint Classification, Control, Observational Model, Time
Perspective); MeSH disease conditions annotated by clinicians; MESH_REPORTING table for MeSH
annotation validation; MESH_SPECIALTY table with annotated MeSH conditions for each specialty
(e.g., Cardiology, Oncology, Mental Health); NON_MESH_SPECIALTY table with annotated free-text
disease conditions for each specialty. Uses: aggregate analysis; customized queries;
comparative data analysis; direct import into Oracle, SAS, etc. (excluding specialty
datasets). Initial specialty datasets pass through manual review to give final specialty
datasets.]
`Figure 1. A schematic representation of the database for Aggregate Analysis of ClinicalTrials.Gov (AACT) with its key
`enhancements.
`doi:10.1371/journal.pone.0033677.g001
`
`
`1.1. Data Sources. A dataset comprising 96,346 clinical
`studies was downloaded from ClinicalTrials.gov in XML format
`on September 27, 2010. We chose ClinicalTrials.gov for our study
`because it is the largest database of its kind and because it covers
`the full range of clinical conditions, includes a broad group of trial
`sponsors [10], and has a regulatory mandate [1]. The date of
`download was chosen to coincide with the anniversary of the
`enactment of the FDA Amendments Act (FDAAA) 3 years earlier,
`which mandated the registration of certain trials of FDA-regulated
`drugs, biologics, and devices [1].
We downloaded the 2010 MeSH thesaurus
(http://www.nlm.nih.gov/mesh/2010/download/termscon.html) and merged it with the AACT
database, where it was used as a lookup table to locate corresponding tree numbers, referred
to as MeSH IDs, for all MeSH terms associated with each clinical trial in ClinicalTrials.gov.
Persons or organizations who submit studies to the registry are requested to provide the
condition and keyword data elements as MeSH terms.
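The lookup described above can be illustrated with a minimal Python sketch. It is not the AACT implementation (which was built in Oracle); the function names and the in-memory row format are assumptions made for illustration, and the sample tree numbers are those shown for Acromegaly and Gigantism in Figure 5.

```python
from collections import defaultdict


def build_mesh_lookup(rows):
    """Map each lower-cased MeSH term to its sorted tree numbers (MeSH IDs)."""
    lookup = defaultdict(set)
    for term, tree_number in rows:
        lookup[term.strip().lower()].add(tree_number)
    return {term: sorted(ids) for term, ids in lookup.items()}


def mesh_ids_for_study(condition_terms, lookup):
    """Return {term: [tree numbers]} for the study terms found in the lookup."""
    found = {}
    for term in condition_terms:
        key = term.strip().lower()
        if key in lookup:
            found[term] = lookup[key]
    return found


# Hypothetical extract of the thesaurus as (term, tree number) rows.
THESAURUS_ROWS = [
    ("Acromegaly", "C05.116.132.082"),
    ("Acromegaly", "C10.228.140.617.738.250.100"),
    ("Acromegaly", "C19.700.355.179"),
    ("Gigantism", "C05.116.132.479"),
]
LOOKUP = build_mesh_lookup(THESAURUS_ROWS)
```

A term that is not a MeSH term simply yields no entry, which mirrors the distinction drawn later between MeSH and non-MeSH condition terms.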
`1.2. Data Model. ClinicalTrials.gov data element definitions,
`xsd specifications for registry data submission, and downloaded
`
`study XML files were used to represent data specifications for the
`downloaded data. A physical data model was designed using
`Enterprise Architect (Sparx Systems Pty Ltd, Creswick, Victoria,
`Australia); this model depicted data tables and their data columns,
`as well as relationships between and among tables. An optimal
`structure was achieved through normalization, which was used to
`organize data efficiently, eliminate redundancy, and ensure logical
`data dependencies by storing only related data within a given table
`[11]. The database (Figure 2) was normalized to the Second
`Normal Form (2NF), a set of criteria designed to prevent logical
`inconsistencies while reducing data redundancy [12].
`We assigned data type and length of data elements based on
`patterns observed for each data element in the downloaded XML
`files. Whenever possible, we followed guidelines provided in
`ClinicalTrials.gov’s draft Protocol Data Element Definitions [13]
when assigning lengths to given data elements. Data were housed
in Oracle RDBMS, version 11.1g (Oracle Corporation, Redwood
Shores, California, USA). Enterprise Architect 7.1 was used for
database design, and additional transformation rules were
documented as extract-transform-load (ETL) specifications. PL/
[Figure 2 diagram. Entities shown: CLINICAL_STUDY, SPONSORS, MESH THESAURUS, MESH TREES,
LINKS, INTERVENTION BROWSE, CONDITION BROWSE, OUTCOMES, DESIGNS, LOCATIONS, PERSONS, LOCATION
CONTACT, FACILITIES, ADDRESSES. Metadata tables: CURRENT VARIABLES, ENUMERATIONS.]
`
`Figure 2. High-level Entity-Relationship Diagram (ERD) for AACT.
`doi:10.1371/journal.pone.0033677.g002
`
`Table 1. Escape characters and replacements.
`
`Escape character
`
`Replacement
`
`’
`
`"
`
`&
`
`"
`
`,
`
`’
`
`"
`
`&
`
`.
`
`,
`
`doi:10.1371/journal.pone.0033677.t001
`
SQL packages were developed that used Oracle’s built-in DBMS_LOB package to read the input
XML files and load the data into the designed tables. Quality control and operational support
processes were developed using standard SQL queries through Toad for Data Analysts (Quest
Software, Aliso Viejo, CA, USA) and Cognos ReportNet (CRN) (IBM Corporation, Armonk, NY, USA).
We extended the core data model to accommodate both data management and data curation
purposes. Error log tables and indexes were created for testing, debugging, and performance
enhancement. Manual user acceptance testing was performed by randomly selecting five studies
per data element (from a total of 109 data elements) from the AACT database. The values
associated with each data element were tested for correctness and completeness by comparing
them with the original source data from downloaded XML files. We also
`
`
`created integrated data dictionary tables as reference tables
`holding explicit data element definitions and system metadata
`(Tables S1 and S2).
During the course of database development, the NLM made several new data elements available
for public download, some of which included information about the FDA (e.g., Section 801
clinical trials, studies with FDA-regulated interventions, and expanded-access studies). In
addition to these, MeSH condition and intervention terms generated by the NLM algorithm were
also made available for public download.
In XML files downloaded from ClinicalTrials.gov, the single data element Study Design
contains a string of concatenated values for the different components of a study design, such
as primary purpose, interventional model, observational model, allocation, endpoint
classification, time perspective, and masking. While this format is well-suited for
supporting information retrieval, it does not readily accommodate aggregate data analysis of
the components within the Study Design data element. For this reason, data from Study Design
were parsed into their components and stored in a separate table called DESIGNS. Additional
data elements (Design Name and Design Value) were created to store all components of study
design and their respective enumerated values. Values related to masking/blinding (e.g.,
Single; Double-Blind) were further parsed into their components, along with the list of
corresponding masking subjects (Participant, Investigator, Outcome Assessor, and Caregiver).
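The parsing step described above can be sketched in Python. This is an illustration only, not the PL/SQL used for AACT. It assumes that the concatenated Study Design string consists of comma-separated `Name: Value` pairs and that masking subjects appear in parentheses after the masking value; the function name and the "Masking Subject" row label are hypothetical.

```python
import re


def parse_study_design(design):
    """Split a concatenated Study Design string into (design name, design value) rows."""
    rows = []
    # Split on commas that are not inside parentheses, so the list of
    # masking subjects stays attached to the Masking component.
    for part in re.split(r",\s*(?![^()]*\))", design):
        name, sep, value = part.partition(":")
        if not sep:
            continue  # skip fragments that are not "Name: Value" pairs
        name, value = name.strip(), value.strip()
        subjects = []
        if name == "Masking":
            match = re.match(r"^(.*?)\s*\((.*)\)$", value)
            if match:
                value = match.group(1)
                subjects = [s.strip() for s in match.group(2).split(",")]
        rows.append((name, value))
        rows.extend(("Masking Subject", subject) for subject in subjects)
    return rows
```

Each returned pair corresponds to one row with a Design Name and a Design Value, which is the shape that makes counting studies by, for example, allocation or masking straightforward.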
Several challenges were encountered while loading the
database, including foreign characters embedded in XML files
[Figure 3 plot. X-axis: Year Study Registered with ClinicalTrials.gov. Y-axis scale: 0 to 100.
Event markers: ICMJE Trials Registration Policy; FDAAA law. Series: Data Monitoring
Committee?; Study classification; Number of arms; Intervention model; Allocation; Masking;
Enrollment; Gender; Lead sponsor. Legend notes: may be required by FDAAA; at least one of
these elements is required by FDAAA; required by FDAAA; required by FDAAA and
ClinicalTrials.gov; data element introduced in 2007-04.]
`
`Figure 3. Percentage of interventional studies with complete data by registration year for selected data elements.
`doi:10.1371/journal.pone.0033677.g003
`
`
`with most of the data elements; these had to be replaced with
`character references (see Table 1 for examples).
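As a rough illustration of this kind of replacement, the Python sketch below converts XML special characters to entity references and non-ASCII characters to numeric character references. It uses only the standard library; the function name is hypothetical, and the exact set of characters handled in the AACT load process is the one listed in Table 1, not necessarily the set shown here.

```python
from xml.sax.saxutils import escape


def to_character_references(text):
    """Replace XML special characters and non-ASCII characters with references."""
    # escape() handles &, < and > by default; quotes are added explicitly.
    escaped = escape(text, {"'": "&apos;", '"': "&quot;"})
    # Any remaining non-ASCII character becomes a numeric reference such as &#246;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```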
`Other circumstances that prompted several database design
`iterations included the facts that the maximum length for each
`data element noted by ClinicalTrials.gov’s May 2010 Protocol
`Data Element Definitions document was not always consistent
`with the complete dataset, and one-to-one or one-to-many
`
`relationships between or among data elements were not obvious
`in the XML data type definition from ClinicalTrials.gov.
`1.3. Quality Assessment. Of the 96,346 studies downloaded
`from ClinicalTrials.gov in September 2010, a total of 79,413
`(82.4%) were interventional (i.e., a study in which an investigator
`following a protocol assigns research participants
`to receive
`specific interventions, as opposed to an observational study),
`
[Figure 4 diagram. Panels: (a) Stored disease conditions provided by submitters and MeSH terms
generated by NLM algorithm; (b) Annotating disease conditions; (c) Creation of specialty
datasets. Flow elements: User registers study in ClinicalTrials.gov; Other (e.g., protocol,
criteria, ...); Annotation of disease condition terms (MeSH as well as non-MeSH) by clinical
specialists; Confirm annotation; Annotations confirmed.]
`
Figure 4. An overview of methodology and process of developing clinical specialty datasets. The INTERVENTIONS, CONDITIONS, and
KEYWORDS tables consist of disease condition terms provided by data submitters that include both MeSH and non-MeSH terms. The
INTERVENTION_BROWSE and CONDITION_BROWSE tables are populated by MeSH terms generated by the NLM algorithm. (a) Process illustrating how
MeSH terms are created in ClinicalTrials.gov. Tables and data shown here do not represent the entire ClinicalTrials.gov database. (b) Process illustrating
the annotation and validation of disease conditions. (c) Process illustrating the creation of specialty datasets.
`doi:10.1371/journal.pone.0033677.g004
`
`Table 2. MeSH Subject Headings, 2010—Diseases.
`
`Bacterial Infections and Mycoses [C01]
`
`Virus Diseases [C02]
`
`Parasitic Diseases [C03]
`
`Neoplasms [C04]
`
`Musculoskeletal Diseases [C05]
`
`Digestive System Diseases [C06]
`
`Stomatognathic Diseases [C07]
`
`Respiratory Tract Diseases [C08]
`
`Otorhinolaryngologic Diseases [C09]
`
`Nervous System Diseases [C10]
`
`Eye Diseases [C11]
`
`Male Urogenital Diseases [C12]
`
`Female Urogenital Diseases and Pregnancy Complications [C13]
`
`Cardiovascular Diseases [C14]
`
`Hemic and Lymphatic Diseases [C15]
`
`Congenital, Hereditary, and Neonatal Diseases and Abnormalities [C16]
`
`Skin and Connective Tissue Diseases [C17]
`
`Nutritional and Metabolic Diseases [C18]
`
`Endocrine System Diseases [C19]
`
`Immune System Diseases [C20]
`
`Disorders of Environmental Origin [C21]
`
`Animal Diseases [C22]
`
`Pathological Conditions, Signs and Symptoms [C23]
`
`Available at: http://www.nlm.nih.gov/mesh/trees.html
`
`doi:10.1371/journal.pone.0033677.t002
`
`16,506 (17.1%) were observational, 107 (0.1%) were expanded-
access, and 320 had no information about the study type. We
analyzed selected data elements in interventional studies for
completeness of data (i.e., the absence of a null value in the data
element) and observed a trend toward increasing completeness of data
over time. This trend appears to have been notably affected by two
milestones in the history of ClinicalTrials.gov. In September 2004,
the International Committee of Medical Journal Editors (ICMJE)
published a policy requiring registration of interventional trials as
a condition of publication [3]. The ICMJE requirements took
`effect in September 2005, which may account for the increase in
`completeness for some data elements in 2005 (Figure 3).
`In September 2007, the FDAAA [1] made the registration of
`interventional studies mandatory. This requirement took effect in
`December 2007 and may further account for increases in the
`
Table 3. Frequency of intermediate terms and top node
terms that did not match annotations of lower-level terms.

Specialty        n/N (%)
Cardiology       172/5264 (3.3%)
Oncology         284/5264 (5.4%)
Mental health    93/5264 (1.8%)
`
`
completeness of data elements in the ClinicalTrials.gov dataset. In
Figure 3, the data elements ‘‘data monitoring committee’’ and
‘‘number of arms’’ were not available at the time that earlier
studies were registered. It is important to note that the presence of
these data elements for studies pre-dating December 2007 reflects
later updates performed by data providers.
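The completeness measure plotted in Figure 3 can be sketched as follows. This is a minimal Python illustration, assuming each study is represented as a dictionary with a hypothetical `registration_year` key and that a missing, null, or empty value counts as incomplete; it is not the query used to produce the figure.

```python
from collections import defaultdict


def completeness_by_year(studies, element):
    """Percentage of studies per registration year with a non-null value for element."""
    totals = defaultdict(int)
    complete = defaultdict(int)
    for study in studies:
        year = study["registration_year"]
        totals[year] += 1
        if study.get(element) not in (None, ""):
            complete[year] += 1
    return {year: 100.0 * complete[year] / totals[year] for year in sorted(totals)}
```

Because later updates by data providers can fill in elements for older studies, a snapshot computed this way reflects the state of the registry at download time rather than at registration time.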
1.4. Changes in ClinicalTrials.gov’s Protocol Data Element Definitions. The
ClinicalTrials.gov Protocol Data Element Definitions (PDED) have evolved since the database
was first launched. Although references containing individual protocol data element
definitions are provided for submitters with each release of the definitions document, there
is no document that tracks changes for all data elements for review as data specifications.
These changes include changing enumerated values for a data element, revising a data element
definition, making a particular data element publicly available, introducing a new data
element, and entirely deleting a data element. However, more rigorous submission rules
imposed by mandating organizations (e.g., NLM, FDA), such as those required by the FDAAA and
ClinicalTrials.gov, appear to have had the greatest impact on the completeness of data.
`Changes to a data element play a significant role in the analysis
`of study data. As we examined each data element’s history, we
`noted that between September 2004 and July 2005 (a period
`spanning 3 releases of the PDED), and again in December 2007,
`the data element requirements were not documented in the
`definitions document. Other inconsistencies were also noted and
`later confirmed (Personal communication, Dr. Deborah Zarin and
`Mr. Nicholas Ide, February 18, 2011).
1.5. A Public Resource. The AACT can be downloaded as
Oracle extracts (.dmp file and text format output; available at
https://www.trialstransformation.org/projects/improving-the-public-interface-for-use-of-aggregate-data-in-clinicaltrials.gov/aact-database-for-aggregate-analysis-of-clinicaltrials.gov).
Additional documents are
`available to assist users in interpreting the data. The high-level data
`dictionary and a comprehensive data dictionary noted previously are
`included in the dataset file. The comprehensive data dictionary
`contains seven sections: 1) current variables, 2) enumerations, 3)
`constraints, 4) record counts, 5) database schema, 6) comprehensive
`change history, and 7) variable history dates. This document provides
`definitions, derivation of terms, data model structure and references,
`NLM and FDAAA requirements, and historical information for each
`data element in ClinicalTrials.gov to facilitate understanding of when
`variables were added, modified, or discontinued. The high-level data
`dictionary provides a summary view of the variables contained in the
`AACT database.
`
`2. A Methodology to Regroup Studies in
`ClinicalTrials.Gov by Specialty
ClinicalTrials.gov contains studies from multiple clinical
domains. While the AACT database facilitates the aggregate
`analysis of the entire dataset, it does not in itself support analysis
`within specific specialty domains. We therefore developed a
`methodology to re-group studies from ClinicalTrials.gov by
`clinical specialties as designated by the Department of Health
`and Human Services [14]. In doing so, we relied on MeSH
`condition terms and free-text disease condition terms associated
`with each study in the ClinicalTrials.gov database—a method
`that can be used to develop other specialized datasets for
`analysis.
`2.1. Use of MeSH Terminology in the ClinicalTrials.gov
`Database. Data submitters (study sponsors or their designees)
`are requested to provide Condition and Keywords data as MeSH
`terms when registering a study. Additionally, an NLM algorithm
`
`n = number of intermediate- and top-node MeSH terms for a given specialty
`that do not match the annotations of their lower-level terms. N = total number
`of intermediate- and top-node MeSH terms.
`doi:10.1371/journal.pone.0033677.t003
`
`
Musculoskeletal Diseases [C05]
  Bone Diseases [C05.116]
    Bone Diseases, Endocrine [C05.116.132]
      ► Acromegaly [C05.116.132.082]
      Congenital Hypothyroidism [C05.116.132.256]
      Dwarfism, Pituitary [C05.116.132.358]
      Gigantism [C05.116.132.479]
      Osteitis Fibrosa Cystica [C05.116.132.684]

Nervous System Diseases [C10]
  Central Nervous System Diseases [C10.228]
    Brain Diseases [C10.228.140]
      Hypothalamic Diseases [C10.228.140.617]
        Pituitary Diseases [C10.228.140.617.738]
          Hyperpituitarism [C10.228.140.617.738.250]
            ► Acromegaly [C10.228.140.617.738.250.100]
            Hyperprolactinemia [C10.228.140.617.738.250.450]
            Pituitary ACTH Hypersecretion [C10.228.140.617.738.250.725]

Endocrine System Diseases [C19]
  Pituitary Diseases [C19.700]
    Hyperpituitarism [C19.700.355]
      ► Acromegaly [C19.700.355.179]
      Gigantism [C19.700.355.528]
      Hyperprolactinemia [C19.700.355.600]
      Pituitary ACTH Hypersecretion [C19.700.355.800]
`Figure 5. MeSH trees for acromegaly. Source: 2010 online MeSH thesaurus (available: http://www.nlm.nih.gov/cgi/mesh/2010/MB_cgi).
`doi:10.1371/journal.pone.0033677.g005
`
also evaluates studies and applies MeSH terms according to the following steps: 1) study
records are checked for the presence of a MeSH term, including synonyms and lexical
variations; 2) weighted scores are computed for all matches, with exact matches, lexical
variations, and synonyms receiving descending proportional weight; 3) very common terms are
excluded to avoid confounding; 4) location by data element is considered and weighted in the
term scoring process; and 5) terms with scores exceeding the cutoff value are applied to the
respective studies. (Note that the output from steps 1 and 2 is used for both condition and
intervention annotations; the field weights are different for each and divert terms into the
target annotation type.) This method does not consider the natural-language context for
matched terms or ontologically related concepts that would add specificity. Neither the data
submitters nor the NLM algorithm attempts to associate a term with a particular MeSH
hierarchy. The resulting annotated MeSH terms are visible on the ClinicalTrials.gov website
and populated in the condition_browse and intervention_browse fields in the downloaded XML
file for each study. Figure 4 illustrates how MeSH terms are created in the
ClinicalTrials.gov database.
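The five steps above can be sketched in Python to make the scoring logic concrete. This is not the NLM algorithm: the weights, the list of common terms, the field names, and the cutoff are all hypothetical values chosen for illustration, and the matching step itself (finding exact, lexical, and synonym matches in a study record) is assumed to have been done already.

```python
# All constants below are hypothetical; the actual NLM values are not given in the text.
MATCH_WEIGHTS = {"exact": 1.0, "lexical": 0.5, "synonym": 0.25}        # step 2
COMMON_TERMS = {"Disease", "Syndrome"}                                # step 3
FIELD_WEIGHTS = {"condition": 1.0, "title": 0.5, "description": 0.25}  # step 4
CUTOFF = 1.0                                                           # step 5


def score_terms(matches):
    """Sum weighted scores per term from (term, match_type, field) tuples."""
    scores = {}
    for term, match_type, field in matches:
        if term in COMMON_TERMS:
            continue  # very common terms are excluded
        weight = MATCH_WEIGHTS[match_type] * FIELD_WEIGHTS[field]
        scores[term] = scores.get(term, 0.0) + weight
    return scores


def apply_terms(matches, cutoff=CUTOFF):
    """Return the terms whose total score exceeds the cutoff."""
    return sorted(term for term, score in score_terms(matches).items() if score > cutoff)
```

Using a different `FIELD_WEIGHTS` table for intervention annotations would reuse the same matches while diverting terms to the other annotation type, as the parenthetical note in the text describes.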
`
`2.2. MeSH Disease Conditions Annotation. Condition
`and intervention terms in the MeSH thesaurus are arrayed in
`hierarchical branching structures, called trees; each branching
`point is referred to as a node. Nodes range from 1 (highest level) to
`12 (lowest level) in the 2010 version of the MeSH thesaurus. For
`example, one high-level category that we used to classify studies by
`clinical specialty was Diseases. In the 2010 MeSH thesaurus, this
`category contains 23 subcategories (Table 2).
`In order to create specialty datasets from the larger AACT
`dataset, we selected four high-level MeSH nodes
`from the
`thesaurus to serve as an initial basis for identifying studies by
`clinical specialty. Reviewers with relevant subject matter expertise
`annotated MeSH terms from the following high-level nodes: 1)
`Diseases; 2) Analytical, Diagnostic and Therapeutic Techniques and
`Equipment; 3) Psychiatry and Psychology; and 4) Phenomena and Processes.
A total of 18,491 MeSH IDs associated with 9,031 MeSH terms
`were reviewed and annotated by clinical specialists belonging to
`one of the 13 clinical specialties and five sub-specialties, which
`were selected on the basis of availability of faculty representation
`and volunteers at Duke, as well as intention to analyze subsets of
`data by clinical specialty. Participating specialty annotations
`
`
[Figure 6 flowchart, for a given study (NCT00zzzzzz) and Specialty X:
1. Do any condition_browse or condition terms have a Y tag for Specialty X? If yes: assign Y
   for Specialty X (GROUP 1).
2. Do any condition_browse or condition terms have ambiguous tags for Specialty X? If yes:
   possibly assign to Specialty X (GROUP 2).
3. Do all condition_browse terms and all condition terms have an N tag for Specialty X? If
   yes: assign N for Specialty X (GROUP 3).
4. Do any condition_browse terms or any condition terms have an N tag for Specialty X? If
   yes: possibly assign N for Specialty X (GROUP 4).
5. Otherwise: unclassified for Specialty X (GROUP 5).
Study details may be reviewed, if desired, to include related studies and exclude unrelated
studies, giving the final subset for the Specialty X manuscript.]
`
`Figure 6. Rules for deciding whether a given study belongs to a given specialty.
`doi:10.1371/journal.pone.0033677.g006
`
included cardiology, dermatology, endocrinology, gastroenterology, immunology/rheumatology,
infectious diseases, mental health, nephrology, neurology, oncology, otolaryngology,
pulmonary medicine, and reproductive medicine, while subspecialty annotations included
peripheral vascular disease, peripheral arterial disease, diabetes, thyroid disease, and bone
disease. The association of terms with clinical specialties was performed in the context of
the anticipated analysis of the data subset for respective specialties. The results of this
extension to the AACT database, including specialty tags, will be shared in future
publications.
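The decision rules summarized in Figure 6 can be expressed as a short Python sketch. The ordering of the checks follows one reading of the flowchart and should be treated as an assumption; the tag codes ("Y", "N", "A" for ambiguous, None for an unannotated term) and the function name are hypothetical.

```python
def classify_study(tags):
    """Return the Figure 6 group (1-5) for one study and one specialty.

    tags holds one tag per condition or condition_browse term of the study:
    "Y", "N", "A" (ambiguous), or None when the term carries no annotation.
    """
    if any(tag == "Y" for tag in tags):
        return 1  # assign Y for the specialty
    if any(tag == "A" for tag in tags):
        return 2  # possibly assign to the specialty
    if tags and all(tag == "N" for tag in tags):
        return 3  # assign N for the specialty
    if any(tag == "N" for tag in tags):
        return 4  # possibly assign N for the specialty
    return 5      # unclassified for the specialty
```

Groups 2, 4, and 5 are the ones where manual review of study details is most likely to change the outcome, since the tags alone do not settle the assignment.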
`2.3. Validation of Inconsistently Annotated MeSH Terms
`and Limitations of Using the MeSH Hierarchy. A term
`occurring at a particular node ‘‘node x’’ (parent) may have several
`branches (children) at node x+1 that provide a finer classification
`of the node-x term. Clinical specialists were advised to review the
`hierarchy of an individual MeSH term during the annotation
`process. Annotated MeSH descriptors were programmatically
`reviewed for hierarchical inconsistencies in order to maintain the
`logical relationship between parent and child MeSH descriptors.
`
Table 4. Number of studies reviewed by each set of clinician
reviewers.

Reviewer A ID    Reviewer B ID    Studies reviewed (n)
Clinician 1      Clinician 2      200
Clinician 1      Clinician 3      400*
Clinician 4      Clinician 5      200
Clinician 6      Clinician 7      200
`
`*The combination of Clinician 1 (‘‘A’’) and Clinician 3 (‘‘B’’) together reviewed 2
`batches of studies.
`doi:10.1371/journal.pone.0033677.t004
`
Tag validity was evaluated by a process based on annotation rules. In general, selection or
negation of a parent MeSH term should match with all subsequent child MeSH terms below that
node. Hierarchical inconsistencies in MeSH annotations were flagged and accepted after
further review and confirmation by clinical specialists. The anticipated inconsistency of the
MeSH hierarchical structure with clinical specialty groupings was confirmed in the validation
process. Table 3 shows the frequency of parent terms that did not match with annotations for
their children terms.
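A programmatic check of this kind can be sketched in Python using the structure of MeSH tree numbers, where the parent of a node is obtained by dropping the last dot-separated segment. This is an illustrative sketch rather than the review program used for AACT; the function name and the example tags are hypothetical.

```python
def find_hierarchy_mismatches(annotations):
    """Return sorted (parent, child) tree-number pairs whose specialty tags differ.

    annotations maps a MeSH tree number to its tag for one specialty.
    """
    mismatches = []
    for child, child_tag in annotations.items():
        # "C05.116.132" -> "C05.116"; a top-level node such as "C05" has no parent.
        parent = child.rpartition(".")[0]
        if parent in annotations and annotations[parent] != child_tag:
            mismatches.append((parent, child))
    return sorted(mismatches)
```

Flagged pairs are candidates for review rather than errors, since a clinically sensible specialty grouping need not follow the MeSH hierarchy.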
Further, a term might appear within more than one tree. For
example, the MeSH term Acromegaly appears as part of multiple
trees within the topmost MeSH hierarchical category of Diseases
(Figure 5).
Depending on its hierarchical location, its context could fall
under Musculoskeletal Diseases, Nervous System Diseases
