Methods of Information in Medicine
`
`
K. Pommerening, M. Miller,
I. Schmidtmann, J. Michaelis

Institut für Medizinische Statistik und Dokumentation
der Johannes-Gutenberg-Universität, Mainz, Germany
`
`Pseudonyms for Cancer Registries
`
Abstract: In order to conform to the rigid German legislation on data privacy and security we developed a new concept of data flow and data storage for population-based cancer registries. A special trusted office generates a pseudonym for each case by a cryptographic procedure. This office also handles the notification of cases and communicates with the reporting physicians. It passes pseudonymous records to the registration office for permanent storage. The registration office links the records according to the pseudonyms. Starting from a requirements analysis we show how to construct the pseudonyms; we then show that they meet the requirements. We discuss how the pseudonyms have to be protected by cryptographic and organizational means. A pilot study showed that the proposed procedure gives acceptable synonym and homonym error rates. The methods described are not restricted to cancer registration and may serve as a model for comparable applications in medical informatics.
`
`Keywords: Cancer Registry, Data Protection, Data Encryption, Pseudonyms,
`Record Linkage.
`
`1. Introduction
`
Until recently, the rigid German legislation on data privacy and data security has hindered comprehensive cancer registration in major parts of Germany. The new European directive on data protection [1] may pose further difficulties. The basic premise states that permanent storage of an individual's medical data together with his/her identification data is allowed on the basis of informed consent only. However, many cancer patients nowadays [...] the nature [...] and, therefore, [...] registry. Hence, it is desirable that physicians should have the right to notify incident cases without obtaining informed consent in order to assure the necessary completeness of cancer registration. Notification without informed consent is regarded as a violation of an individual's constitutional right to data privacy, unless it is compensated by anonymity.
A cancer registry, however, needs identification data for record linkage, to identify multiple notifications of the same individual, and to record follow-up information on individuals. On the other hand, scientific analysis of the registry data is generally performed anonymously and does not include any reference to individual identification data.

To minimize the violation of data privacy we developed a new organizational and technical concept for cancer registries which has been approved by data-protection officials and incorporated into the corresponding German federal legislation [2]. In our concept the registry is separated into two offices with complementary functions. The concept makes extensive use of data encryption and provides data privacy by pseudonymous data storage. This mode of data storage allows record linkage by matching of pseudonyms and does not interfere with the scientific requirements of a cancer registry. In certain cases a controlled re-identification of records might be necessary to obtain follow-up information about cases. The concept includes provisions for achieving this.

A pilot study was initiated in 1992 to explore the possibilities for running a population-based cancer registry in Rheinland-Pfalz (Rhineland-Palatinate) on the basis of this concept [3-5]. The results show that the proposed compromise between research interests and privacy issues is practicable and sound. Further overviews have been given in [6-8]. The concept has also been adopted for the pilot phase of the cancer registry of Niedersachsen (Lower Saxony) [9].

The cryptographic concept of pseudonymity can be adapted to other situations where a fundamental conflict between the goals of privacy and public interest needs to be solved, e.g., to control the efficiency of health care [10, 11].
`
`
2. Pseudonyms

Pseudonyms are distinct, unlinkable identities that an individual assumes in order to hide his or her true identity. In information technology pseudonyms control the matching of data while preserving privacy. A pseudonym belongs to one person only (henceforth called 'the owner') but does not reveal the identity of that person. If only the owner can uncover the pseudonym, it is called 'untraceable'. This concept was introduced into cryptology by Chaum [12]; it is useful to protect privacy in electronic banking, electronic elections, and other electronic transactions. Possible (but not yet realized) applications in the medical domain are anonymous electronic prescriptions [10] or the settlement of accounts between physicians and insurance companies [11].

Cancer registries need a distinct kind of pseudonym which must satisfy the following requirements:
1. The registry must be able to recognize multiple notifications of the same case (record linkage).
2. The record linkage procedure should minimize synonym and homonym errors (see section 6) to yield sufficient data quality.
3. Collaborating registries should be able to match their records.
4. In certain controlled circumstances the uncovering of a pseudonym should be possible for obtaining additional information, e.g. within the scope of case-control studies.
5. The owner should not be able to uncover his own pseudonym.

This last point derives from the right to notify a case without informing the patient about his disease. It implies that the owner should not generate his pseudonym; instead, we need a trusted institution that generates the pseudonyms.

To satisfy the first requirement the pseudonym should be generated by an algorithmic procedure that can be reproduced. The preferred method is hashing [13, par. 6.4]. Since the hash values should not reveal any information about the original data, we use a cryptographic hash function [14, chap. 14]. Since no one except the trusted institution should be able to generate a pseudonym for the cancer registry, the procedure should depend on a secret key which is kept by the trusted institution. Such a pseudonym can by no means be uncovered; the key-dependent procedure even prevents unauthorized trial encryption, at least from outside.

This kind of pseudonym does not meet requirement 2; the reason is lack of fault tolerance: the encryption process cannot compensate for slight variations in the identification data, e.g., mistakes in spelling the name. This is not a problem when machine-readable identification data on patient cards can be used; but this is not always the case. Certain notifying institutions, such as pathologists, may not have access to the patient card. Old data (from the time before the introduction of patient cards) should also be linked. In any case, requirement 2 conflicts with complete anonymity; the model has to provide a balance between these two conflicting goals. What we need is a concept of error detection and error correction for encrypted data. Finding an optimal solution is an interesting problem for further research. As a first solution we divide the 'one-way' part of the pseudonym into a set of 'linkage data' that satisfy requirements 1, 2 and 5.

In order to meet requirement 4 we add a second part to the pseudonym. This part derives from the identification data of the patient by encryption; the key is known only to the trusted institution. For reasons to be discussed later we use asymmetric encryption with two keys (see section 5.1).

The reason for requirement 3 is that the German Federal States will have separate registries. To enable anonymous data matching between these registries they could use a common cryptographic key, but this is not advisable: A secret loses its value if shared among too many parties. Therefore, for inter-registry linking we propose a re-encryption of the first part of the pseudonym with a temporary (one-time) key (for details, see section 5.3).

Our concept of pseudonymity in cancer registry needs an organizational framework that is described in the next section.

3. Organizational structure of registry

The cancer registry consists of two separate offices at separate locations. The first office (trusted office, "Vertrauensstelle") basically serves for the notification and generates the pseudonyms. The second office (registration office, "Registerstelle") links the records and stores data permanently.

3.1. Identity Data and Epidemiological Data

In the following we distinguish between identity data and epidemiological data. Identity data are:
- surname, former surname(s), given name(s),
- address,
- date of birth, date of death,
- date of diagnosis,
- notifying physician or health-care institution.

Epidemiological data are those data that are needed in every meaningful statistical evaluation of the registry data:
- gender,
- census code of place of residence,
- professional group,
- year of birth, year of death,
- year of diagnosis,
- date of notification,
- tumor classification,
- further medical data.

3.2. The Trusted Office

The trusted office accepts incoming reports from physicians or hospital-based cancer registries. These reports are checked for completeness and plausibility. If necessary, this office obtains additional information from the reporting physicians. It codes the reported diseases according to classification schemes such as ICD-9 and ICD-10. Thereafter, it assigns a pseudonym to the record, and sends the pseudonymous record to the registration office. After a short period of time, when any discrepancies are cleared, the trusted office deletes the records in its database. Death certificates are also sent to the trusted office and handled in the same way as notification forms.
`
`Meth. Inform. Med, Vol. 35, No.2, 1996
`
The trusted office is directed by a physician and, therefore, is subject to professional discretion in addition to data-protection laws. It is trusted by all other parties, hence the German name "Vertrauensstelle". Nevertheless, the decryption key (the 'private' key of the asymmetric encryption procedure, henceforth called 're-identification key') is held in a second trusted institution outside the cancer registry. There are several sensible choices for this institution; in the following we call it the 'supervising office'. The separate handling of the re-identification key emphasizes the 'separation of informational powers' and makes clear that decryption (= re-identification) is an exceptional process. Moreover, it gives additional security in case of a compromised encryption key.

3.3. The Registration Office

The registration office receives pseudonymous data only. With these data it performs record linkage and detects duplicate notifications; then it stores the pseudonyms and the epidemiological data permanently. If the record linkage reveals any inconsistencies, these are reported back to the trusted office which, in turn, may sort out any discrepancies by contacting the reporting physicians. In the same way the office links a death certificate to an existing patient record. Figure 1 illustrates the data flow. Only the registration office stores records permanently.

Fig. 1 Organizational structure and information flow. [Figure not reproduced. Physicians, hospital-based registries, health care institutions, and public health departments (death certificates) report to the trusted office, which encrypts the identification data, forwards the reports, and follows up implausible data; the registration office stores the pseudonyms and the epidemiological data.]

3.4. Epidemiological Studies

The pseudonymous records serve for routine analyses of the cancer registry as well as for epidemiological studies. Figure 2 illustrates the procedure for a cohort study: if a well-defined cohort (e.g., occupationally exposed employees of a company) is to be analyzed for the occurrence of cancer, a sequence number is assigned to each individual member of the cohort and possibly also to non-exposed controls. These sequence numbers serve as simple temporary pseudonyms for the study. A research institute (which could also be the registry) obtains a record for each individual containing the sequence number and the exposure data. A record containing the sequence number and personal identification data is sent to the trusted office in parallel. This office generates the pseudonym and sends it to the registration office, together with the sequence number. The registration office performs the record linkage and generates a record which contains the sequence number and the epidemiological data stored in the registry. Thereafter, epidemiological data and exposure data may be linked for further analysis by using the sequence number. This procedure ensures that for the purpose of the study nobody sees which cohort members were diseased.

Fig. 2 Record linkage for cohort studies. [Figure not reproduced. The source of the cohort sends sequence number and identification data to the trusted office, and sequence number and exposure data to the research institute; the trusted office sends sequence number and pseudonym to the registration office; the registration office sends sequence number and epidemiological data to the research institute.]

A corresponding procedure applies to case-control studies if only the epidemiological data which are kept in the registry are needed for such a study. If it is necessary to obtain additional information from the diseased patients, the identification data may be decrypted using the re-identification key which is kept in the supervising office (see section 3.2). Re-identification has to be approved by an ethics committee and is done in the supervising office; technically this could also be realized with a portable PC operated by an employee of the supervising office. The decrypted identification data are then given to the trusted office. In some cases the necessary data can be retrieved from the notifying institution. If it is necessary to contact the patient for an additional inquiry, the trusted office has to obtain informed consent from the patient via the notifying or treating physician whose identity is stored as part of the (encrypted) identification data of the patient (see section 3.1).

4. A Registry Model

Since a strict formalization of the procedures of the previous section in the sense of [15] would be too technical for this paper, we only give a systematic verbal (semi-formal) description and the access matrix of the registry model; some of the less relevant details are given in a slightly simplified form.

Every assumption of the model should be critically examined as to whether it is sound. For instance, can a party do things it is not supposed to do? What can two or more parties achieve through collaboration? The model will not give absolute security but will show where additional (organizational) means should be provided. The organizational framework has to guarantee the model assumptions and fill the security gaps that the cryptographic procedures leave open.

In discussing the security of the model we assume that the cryptographic algorithms are secure and that they are implemented in a secure way. The first assumption is justified by using state-of-the-art cryptographic techniques. The second assumption is more problematic and needs careful organizational measures.

4.1. Data and Parties

In the semi-formal description of the model we speak of the patient, the cooperating registry, the sequence number etc., although in reality there are several instances of each of these classes.

The knowledge (or data) in our model consists of the following parts:
- The identity data (see 3.1).
- The pseudonym:
  - the encrypted identity (see 5.1),
  - the linkage data (see 5.3); they occur in 'pure hash' format, in 'linkage' format, in 'storage' format, and in 'exchange' format (see Fig. 5).
- The epidemiological data (see 3.1).
- The sequence number, a temporary pseudonym for a research project as in 3.4.
- The encryption key for asymmetric encryption of identification data.
- The re-identification key for re-identification of identity data.
- The linkage data key for generating the linkage data (see 5.3).
- The storage key for permanent storage of the linkage data (see 5.3).
- The exchange key for inter-registry record linkage (see 5.4).

Moreover, we have the identification data of the notifying institution for clearing discrepancies, for obtaining follow-up information, for reporting follow-up information in the case where the notifying institution is a clinical cancer registry, and for compensating the reporting physician for his notification. The trusted office also stores other administrative data.

The relevant parties for our model are the following; for each of these parties we have to define what knowledge it has or transfers and which other parties it trusts:
- The patient has access to his own data, but only via his treating physician.
- The notifying institution knows the data of its own patients:
  - The treating physician notifies the registry of his patients and can be asked by the trusted office about them.
  - Other health-care institutions which also send notifications are clinical cancer registries, after-care institutions, and Public Health offices.
- The trusted office sees all the data except the re-identification key and the storage key. It permanently stores only the encryption key and the linkage data key.
- The supervising office keeps the re-identification key and sees the identity data of re-identified cases.
- The registration office sees the pseudonym, the epidemiological data, the sequence number, the storage key, and also stores these data permanently (except the sequence number).
- The cooperating registry:
  - The trusted office sees the exchange key and the pseudonyms, even in pure hash format.
  - The registration office sees the linkage data in its own linkage format. In case of a match it gets the full registry data, which is the aim of the linking procedure.
- The research institute gets the sequence number and the epidemiological data as well as the exposure data which are outside the scope of the registry model (see 3.4).
- The outsider (any person or institution other than those listed above) has access only to communication paths and perhaps to storage media, if these leave the registration office, say, in case of a hardware defect.

The bank where the notifying physician has his account is ignored. Only a very small amount of information can be gained by observing the financial transfers, e.g., that a certain physician has a cancer patient at a certain time.

In the following we discuss only the parts of the model that are relevant for the pseudonymity aspect. For example, data on storage and communication media should be useless for the outsider; this is achieved by encryption of all communication paths and all storage media. In particular, the notifying institutions should communicate with the trusted office in a secure manner, i.e., using encrypted data transfer. Henceforth, we assume that the outsider can gain data access only through collaboration with some other institution, and leave the security of communication and storage outside the scope of this paper.

4.2. The Access Matrix

Figure 3 gives the access matrix of the registry model. We have to show that no party can get additional information by inferencing, in other words, that the access matrix as shown in Fig. 3 is complete. Since the model involves cryptographic keys, i.e., data that imply access to other data, the question is what subsets of the set of data in the access matrix are 'closed' with respect to inferencing. This gives only a 'naive' proof of security; there are indirect ways for getting additional information (see section 4.3).

We have a single inference that needs no key:
$id \to ld_h$,
where the symbols are taken from Fig. 3 and the arrow denotes the inference. In other words: whoever has the identification data can derive the linkage data in pure hash format, because the hash algorithm is publicly known and needs no key. The complete list of key-dependent inferences is as follows:
`
$k_e$: $id \to ps$,
$k_{re}$: $ps \to id$,
$k_{ld}$: $ld_h \leftrightarrow ld_l$,
$k_{st}$: $ld_l \leftrightarrow ld_s$,
$k_x$: $ld_h \leftrightarrow ld_x$.

Therefore, the access matrix is complete. The only way to infer the identification data $id$ is by knowledge of $ps$ and $k_{re}$, the encrypted identification data and the re-identification key. Hence this can only be done by the supervising office.

Fig. 3 Access matrix of the registry model. [Matrix not recoverable from the source. Rows: the parties of section 4.1. Columns: the data items and keys of section 4.1 with their symbols $id$, $ps$, $ld_h$ (pure hash format), $ld_l$ (linkage format), $ld_s$ (storage format), $ld_x$ (exchange format), $ep$, $sq$, $k_{re}$, $k_e$, $k_{ld}$, $k_{st}$, $k_x$. Entries: s = sees (and temporarily stores), k = keeps (= permanently stores), d = can derive.] Footnotes: 1 only own patients; 2 only re-identified cases; 3 in its own linkage format.

4.3. Indirect Ways for Re-identification

The goal of the registry model is to make unauthorized re-identification as difficult as possible. However, what is possible if the access matrix is guaranteed by the implementation of the model? The multitude and nature of indirect ways for making inferences about the data cannot be completely delineated. This is the main difficulty in proving the validity of any security model formally. Some relevant methods that should be considered are:
- trial encryption (guessed plain-text attack),
- data matching with outside sources [16],
- statistical attacks [16],
- covert channels [17],
- social engineering (voluntary or forced collaboration).

The outsider sees none of the data. He could gain access only by collaboration with another party.

The research institute sees the epidemiological data and could try an unauthorized matching with an external data source. This danger is inherent in the granularity of the epidemiological data and cannot be made smaller by any model whatsoever. Therefore, the release of subsets of epidemiological data is restricted according to a specific project.

The cooperating registration office only sees the linkage data in its own linkage format. It could try a statistical attack to find out some frequent names or use distribution anomalies of birth data. But this will hardly suffice to identify even a single case other than those that this registry has among its own records.

The cooperating trusted office sees the linkage data even in pure hash format and could perform a trial encryption. However, it is trusted by definition.

The registration office could try illegal data matching with the epidemiological data and a statistical attack on the linkage data in linkage format.

The supervising office sees the identity data of re-identified cases. However, it is also trusted, and it gets only few data.

The trusted office sees the identification data and the epidemiological data, but it is trusted by definition.

The notifying institution and the patient get no knowledge of data they should not know. They know their own data only.

The question what a party can do that has unauthorized knowledge of an additional piece of data, say, by collaborating with another party, can be answered by the analysis in section 4.2. Covert channels could be exploited, for instance, by faking notifications; we come back to this in section 7.1.

Unauthorized matching with epidemiological data is only possible for an employee of the registration office or of the research institute; the trusted office that also sees the epidemiological data sees the identity anyway.

5. Encryption Procedures

Encryption of identifying data is performed by using different techniques which are suited for different purposes. A detailed technical description of the basic algorithms is given in [14]. As a basis to assess the performance of the procedures one has to take an expected number of 50,000 notifications each year for Rheinland-Pfalz. The efficiency of the procedures also suffices for larger registries.

5.1. Asymmetric Encryption of Identification Data

Asymmetric encryption techniques use two different keys for encryption and decryption, often called 'public key' and 'private key'. This notation, however, does not fit in the present context. Therefore we speak of 'encryption key' and 're-identification key'. Knowledge of one of the keys does not help in any way to derive the other.

The identity data of each incoming record are encrypted in the trusted office using the encryption key, see Fig. 4. If, under special circumstances (as in 3.4), the decryption of some identification data becomes necessary, the registration office sends the encrypted identity data back to the trusted office that initiates the re-identification, see section 3.4.
`
The most suitable asymmetric encryption method, according to the state of the art, is the RSA algorithm [14, 18, 19]. It uses the mathematical operation of modular exponentiation, $x \mapsto x^e \bmod n$; character strings are treated as numbers according to their bit patterns and decomposed into blocks such that each block represents a number smaller than $n$. The modulus $n$ is a very large number. The exponent $e$ is the encryption key. The re-identification key $d$ has a size similar to $n$ and the property that $x^{ed} \equiv x \pmod{n}$. Thus, modular exponentiation with $d$ is the inverse operation of modular exponentiation with $e$. Deriving one key from $n$ and the other key requires decomposition of $n$ into its prime factors, a task that is computationally infeasible if $n$ is large enough. Experts recommend a key length of >700 bits [20]. Since in a cancer registry data are stored for a long time, one should rather choose a key length of >1,000 bits to be prepared for possible technological progress. For performance reasons, instead of RSA one could use a hybrid encryption method [19, section VI.7] such as RSA + DES or PGP (RSA + IDEA) [14, section 17.9]. This makes sense as soon as the data to be encrypted are longer than a single RSA block. DES and IDEA are symmetric encryption procedures, meaning that encryption and decryption use the same key. The exact description is too complicated to be given here; we refer to [14, 17]. They are several orders of magnitude faster than all known asymmetric procedures but do not fit directly into our model, which relies on asymmetric encryption. Therefore, a hybrid combination with RSA has to be used.
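
The modular exponentiation used by RSA can be tried out with a toy example. The following sketch is illustrative only: the primes, the exponent, and all function names are assumptions chosen for readability, the modulus is far too small to be secure, and neither padding nor a hybrid scheme is shown.

```python
# Toy RSA roundtrip; every number here is an illustrative assumption.
p, q = 61, 53                  # small primes (a real modulus has >1,000 bits)
n = p * q                      # modulus n
phi = (p - 1) * (q - 1)
e = 17                         # encryption key (exponent), coprime to phi
d = pow(e, -1, phi)            # re-identification key: e*d = 1 (mod phi)


def encrypt(x: int) -> int:
    """x -> x^e mod n, for a block x with 0 <= x < n."""
    return pow(x, e, n)


def decrypt(c: int) -> int:
    """c -> c^d mod n, the inverse operation."""
    return pow(c, d, n)


def to_blocks(text: str) -> list[int]:
    """Treat each byte as one block; every block is < n because n > 255."""
    return list(text.encode("utf-8"))


def from_blocks(blocks: list[int]) -> str:
    """Reassemble the byte blocks into a character string."""
    return bytes(blocks).decode("utf-8")


cipher = [encrypt(b) for b in to_blocks("Mainz")]
plain = from_blocks([decrypt(c) for c in cipher])
```

Encrypting byte by byte keeps the example short; a real implementation would pack many bytes into each block or, as noted above, switch to a hybrid method.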
`
If an employee of the registration office gains knowledge of the encryption key, or if an outsider gains knowledge of the encryption key and access to the registered data, he could perform a trial encryption ('chosen plain-text attack') with the corresponding identity data. In order to prevent this possible misuse, each record is complemented by a random number before encryption. As shown in Fig. 4, this random number is kept in the encrypted part of the record.
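
Why the random number defeats trial encryption can be seen in a small experiment. This is a sketch under stated assumptions: textbook RSA with a toy modulus built from two Mersenne primes, a 32-bit random complement, and hypothetical function names; the source does not specify how the random number is attached to the record.

```python
import secrets

# Toy parameters (assumptions for illustration; far too small for real use).
p, q = 2**31 - 1, 2**61 - 1          # two known Mersenne primes
n, e = p * q, 65537                  # modulus and encryption key
d = pow(e, -1, (p - 1) * (q - 1))    # re-identification key

RANDOM_BITS = 32                     # size of the random complement (assumed)


def encrypt_plain(identity: bytes) -> int:
    """Deterministic encryption: equal inputs give equal ciphertexts."""
    return pow(int.from_bytes(identity, "big"), e, n)


def encrypt_randomized(identity: bytes) -> int:
    """Append a fresh random number to the record before encrypting."""
    m = int.from_bytes(identity, "big")
    block = (m << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)
    if block >= n:
        raise ValueError("record too long for one toy block")
    return pow(block, e, n)


def decrypt_randomized(c: int, length: int) -> bytes:
    """Decrypt and drop the random complement again."""
    return (pow(c, d, n) >> RANDOM_BITS).to_bytes(length, "big")


record = b"Mueller"
stored = encrypt_randomized(record)

# Without randomization a guessed name can be confirmed by re-encrypting it.
guess_confirmed_plain = encrypt_plain(b"Mueller") == encrypt_plain(record)
# With randomization repeated encryptions of the same name differ.
distinct_ciphertexts = {encrypt_randomized(record) for _ in range(20)}
recovered = decrypt_randomized(stored, len(record))
```

The holder of the re-identification key still recovers the record, because the random complement is simply discarded after decryption.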
`
`5.2. Key Management
`
The keys have to be generated in a secure manner under special organizational precautions, e.g., in the supervising office. The encryption key is kept in the trusted office. It need not necessarily be kept secret because the encryption is randomized (see section 5.1). Therefore, there is no need for a cryptographic token, like a smart card, to hold this key. But a smart card is desirable as an access-control token. It could then also hold the key. On the other hand, the 'need to know' principle says that it is better to keep the key secret.
There are two cases where a change of the encryption and re-identification keys becomes necessary:
- The actual keys are compromised; at least there is suspicion that an unauthorized person has got the keys.
- The progress of cryptanalysis or the performance of hardware has advanced to such an extent that the chosen key length can no longer be assumed to be sufficient.

In these cases a new, more secure pair of encryption and re-identification keys has to be generated and used. This could be done by decrypting and then re-encrypting all the stored records in the trusted office. However, the German BSI ('Bundesamt für Sicherheit in der Informationstechnik', Federal Office for Security in Information Technology) proposed a more efficient method: define the new encryption method to be the composition of the old one and the "over-encryption" with the new key, thereby avoiding even a temporary exposure of the plain-text data; the future decryption key is the composition of the old and the new keys. Over-encryption of the old records can be done in the registration office under special security precautions. An analogous procedure also applies in case the chosen encryption method is invalidated by new research results.
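
The over-encryption idea can be illustrated with two toy key pairs. The sketch assumes textbook RSA for both the old and the new method and picks the new modulus larger than the old one so that every old ciphertext fits into a single new block; these choices, the numbers, and the function names are illustrative assumptions, not the procedure specified by the BSI.

```python
def make_key(p: int, q: int, e: int) -> tuple[int, int, int]:
    """Return (n, e, d) for textbook RSA with primes p and q."""
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


# Toy key pairs (assumed values; real moduli would have >1,000 bits).
n_old, e_old, d_old = make_key(61, 53, 17)
n_new, e_new, d_new = make_key(2**13 - 1, 2**17 - 1, 65537)
assert n_new > n_old  # every old ciphertext is a valid new plain-text block


def encrypt_old(m: int) -> int:
    """Old encryption method."""
    return pow(m, e_old, n_old)


def over_encrypt(c_old: int) -> int:
    """Encrypt an existing ciphertext with the new key; d_old is not needed."""
    return pow(c_old, e_new, n_new)


def encrypt_composed(m: int) -> int:
    """New encryption method: old encryption followed by over-encryption."""
    return over_encrypt(encrypt_old(m))


def decrypt_composed(c: int) -> int:
    """Future decryption: undo the new layer, then the old one."""
    return pow(pow(c, d_new, n_new), d_old, n_old)


m = 1234                              # a plain-text block, m < n_old
c_stored = encrypt_old(m)             # record already held in the registry
c_upgraded = over_encrypt(c_stored)   # upgraded without exposing m
```

A stored record upgraded in this way is indistinguishable from a record encrypted directly with the composed method, which is why old and new records can be treated alike afterwards.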
An alternative method to handle key changes without temporarily generating plain text was proposed by Miller [21]. It eliminates the need of superimposing the old and new encryption procedures and keeping the old key. On the other hand, it works only with a slightly restricted version of the RSA algorithm.
`
`5.3. Linkage Data and Anonymous
`Data Matching
`
`[Fig. 4, panel labels. Trusted Office: identification data (example: Müller-Lüdenscheid, Marie-Luise, Beispielstraße 123, 45678 Musterstadt, 21.7.1966, 28.2.1995) and epidemiological data. Registration office: cryptogram (example: 3j&kl98abx?b) and epidemiological data.]
`
`To generate the linkage data we
`extract the following components from
`the identity data: name(s), surname(s),
`phonetic codes, the name code of the
`former GDR, and day and month of birth.
`These components are then encrypted
`separately, in a first step by using a
`one-way hash function [14], and in a
`subsequent step by using a symmetric
`encryption algorithm [14] with the
`‘linkage-data key’; then they are in ‘linkage
`
`Fig. 4. Asymmetric encryption of identification data.
`
`Meth. Inform. Med, Vol.35, No.2, 1996
`
`
`

`

`
`
`From left to right the security increases:
`the clear-text format shows the full
`information; the pure hash format allows
`trial encryption and record linkage; the
`linkage format allows record linkage
`only; and the storage format gives
`complete anonymity.
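The two-step construction of the linkage format (one-way hash of each component, then symmetric encryption with the linkage-data key) might be sketched as follows. This is a minimal sketch under stated assumptions: SHA-256 stands in for the one-way hash, a small Feistel network built from HMAC-SHA256 stands in for the symmetric cipher (a real system would use a vetted block cipher), and the normalisation rule, field names, and key are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 4  # assumption: sufficient for a demonstration only

def pure_hash(component: str) -> bytes:
    """Step 1: one-way hash of one identity component ('pure hash' format).
    The strip/upper normalisation is an assumption of this sketch."""
    return hashlib.sha256(component.strip().upper().encode("utf-8")).digest()

def _round(key: bytes, index: int, half: bytes) -> bytes:
    """Keyed round function of the toy Feistel cipher (16 bytes of HMAC)."""
    return hmac.new(key, bytes([index]) + half, hashlib.sha256).digest()[:16]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_hash(key: bytes, digest: bytes) -> bytes:
    """Step 2: deterministic symmetric encryption of a 32-byte hash value."""
    left, right = digest[:16], digest[16:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def decrypt_hash(key: bytes, token: bytes) -> bytes:
    """Inverse of encrypt_hash: recovers the pure hash value, not the name."""
    left, right = token[:16], token[16:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def linkage_record(key: bytes, identity: dict) -> dict:
    """Hash and encrypt every component separately."""
    return {field: encrypt_hash(key, pure_hash(value))
            for field, value in identity.items()}

linkage_key = b"hypothetical linkage-data key"
case_a = {"surname": "Mustermann", "first_name": "Erika", "birth_day_month": "21.7."}
case_b = {"surname": "mustermann ", "first_name": "Erika", "birth_day_month": "21.7."}
record_a = linkage_record(linkage_key, case_a)
record_b = linkage_record(linkage_key, case_b)  # same person, same tokens
```

Because both steps are deterministic, equal components always yield equal tokens, so records can be linked without the key. Testing a guessed name against a token (trial encryption) requires the linkage-data key, whereas a pure hash value can be tested by anyone who knows the hash function.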
`For record linkage the registration
`office compares the linkage data and
`other unencrypted identifying data of a
`new case with all the stored records. If
`there are small differences but
`reasonable evidence of a match, the case
`is reported back to the trusted office,
`which tries to clarify it. In very few
`exceptional cases this procedure could
`necessitate a re-identification as in
`section 3.4.
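The comparison step can be sketched as a simple agreement count over the encrypted components. The decision rule below (full agreement means a match, agreement on at least two components means the case is referred for clarification) is an assumption chosen for illustration; the text does not state the actual rule or thresholds. Tokens are shown as short opaque strings.

```python
def agreement(new_case: dict, stored: dict) -> int:
    """Count the linkage components on which both records carry equal tokens."""
    return sum(1 for field, token in new_case.items()
               if stored.get(field) == token)

def classify(new_case: dict, stored_records: dict, min_possible: int = 2):
    """Return ("match", id), ("clarify", [ids]) or ("new", None).
    min_possible is a hypothetical threshold for 'small differences'."""
    possible = []
    for record_id, stored in stored_records.items():
        score = agreement(new_case, stored)
        if score == len(new_case):
            return ("match", record_id)
        if score >= min_possible:
            possible.append(record_id)
    if possible:
        return ("clarify", possible)  # to be reported back to the trusted office
    return ("new", None)

stored_records = {
    "R1": {"surname": "a91f", "first_name": "07c2", "birth": "5e10"},
    "R2": {"surname": "bb42", "first_name": "07c2", "birth": "9d33"},
}
exact = classify({"surname": "a91f", "first_name": "07c2", "birth": "5e10"},
                 stored_records)
close = classify({"surname": "a91f", "first_name": "ffff", "birth": "5e10"},
                 stored_records)
other = classify({"surname": "0000", "first_name": "07c2", "birth": "1111"},
                 stored_records)
```

Since a spelling variant changes the token of the affected component completely, components such as phonetic codes are what allow near matches to be detected at all.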
`
`5.4. Inter-registry Matching
`
`From time to time, e.g., once per
`year, the collaborating registries are
`allowed to link their records in order
`to detect common notifications, e.g.,
`those caused by a change of residence,
`or notifications by a treating physician
`and a hospital in the hinterland of the
`other registry.
`For this purpose two registries A
`and B agree upon a temporary one-time
`‘exchange’ key. Registration office A
`transfers a file with the linkage data to
`its trusted office, which removes the
`encryption, getting the ‘pure’ hash
`values, and encrypts these with the
`exchange key. Then it sends them to the
`trusted office of registry B, which
`removes the exchange encryption and does
`the usual linkage-data encryption for i
