`
`INVENTORS
`
`ISHAAN NERURKAR
`
`CHRISTOPHER HOCKENBROCHT
`
`MICHAEL SHAUGHNESSY
`
`EITAN CHATAV
`
FIELD OF DISCLOSURE

[0001]

The present invention generally relates to a database system, and more specifically to responding to a database query by executing a differentially private version of the query on the database.

BACKGROUND

DESCRIPTION OF THE RELATED ART
`
`[0002]
`
`Personally identifiable information, such as health data, financial records, telecom
`
`data, and confidential business intelligence, such as proprietary data or data restricted by
`
`contractual obligations, is valuable for analysis and collaboration. Yet, only a fraction of such
`
`sensitive information is used by organizations or analysts for statistical or predictive analysis.
`
`Privacy regulations, security concerns, and technological challenges suppress the full value of
`
`data, especially personally identifiable information and confidential and proprietary records.
`
`[0003]
`
`Methods that attempt to solve this problem, such as access controls, data masking,
`
`hashing, anonymization, aggregation, and tokenization, are invasive and resource intensive,
`
`compromise analytical utility, or do not ensure privacy of the records. For example, data
`
`masking may remove or distort data, compromising the statistical properties of the data. As
`
`
`32552/32620/DOCS/3854318.11
`
`
`
another example, many of the above-mentioned methods are not effective when information is
`
`stored in disparate data sources. Technology which enables organizations or analysts to execute
`
`advanced statistical and predictive analysis on sensitive information across disparate data sources
`
`without revealing record-level information is needed.
`
`SUMMARY
`
`[0004]
`
`A hardware database privacy device is communicatively coupled to a private
`
`database system. The hardware database privacy device receives a request from a client device
`
`to perform a query of the private database system and identifies a level of differential privacy
`
`corresponding to the request. The identified level of differential privacy includes a privacy
`
parameter ε indicating the degree of information released about data in the private database
`
`system.
`
`[0005]
`
`The differentially private hardware database privacy device identifies a set of data
`
`stored in the private database system and a set of operations to be performed on the set of data
`
`that corresponds to the requested query. After the set of data is accessed, the set of operations is
`
`modified based on the identified level of differential privacy such that a performance of the
`
`modified set of operations produces a result set that is differentially private. The modified set of
`
`operations is performed on the accessed set of data to produce the differentially private result set.
`
`The differentially private result set is provided to the client device for display on a hardware
`
`display of the client device.
`
`
`
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`[0006]
`
`FIG. 1 illustrates a system for receiving a query for a private database, and for
`
`responding to the query by executing a differentially private version of the query on the private
`
`database.
`
`[0007]
`
`FIG. 2 illustrates an example database structure, according to one embodiment.
`
`[0008]
`
`FIG. 3 is a block diagram illustrating the privacy system of the system in FIG. 1,
`
`according to one embodiment.
`
`[0009]
`
`FIG. 4 illustrates displaying results of a differentially private count query, according
`
`to one embodiment.
`
`[0010]
`
`FIG. 5 illustrates an example binary decision tree for use in a differentially private
`
`random forest query, according to one embodiment.
`
`[0011]
`
`FIG. 6 illustrates perturbing the counts for a differentially private histogram query,
`
`according to one embodiment.
`
`[0012]
`
`FIG. 7A illustrates a recursive process for identifying threshold points of the
`
`classification output vector for a differentially private model testing query, according to one
`
`embodiment.
`
`[0013]
`
`FIG. 7B illustrates an example confusion matrix generated during a differentially
`
`private model testing query.
`
`[0014]
`
`FIG. 8 illustrates a system-level modification to the system of FIG. 1 that allows the
`
client to access a differentially private synthetic database, according to one embodiment.
`
`[0015]
`
`FIG. 9 illustrates the application of a clustering query to entries of a differentially
`
`private synthetic database, according to one embodiment.
`
`
`
`
`[0016]
`
`FIG. 10 illustrates a process for responding to a database query by executing a
`
`differentially private version of the query on the database, according to one embodiment.
`
`[0017]
`
`FIG. 11 is a block diagram illustrating components of an example machine able to
`
`read instructions from a machine-readable medium and execute them in a processor (or
`
`controller).
`
`DETAILED DESCRIPTION
`
`[0018]
`
`The Figures (FIGS.) and the following description describe certain embodiments by
`
`way of illustration only. One skilled in the art will readily recognize from the following
`
`description that alternative embodiments of the structures and methods illustrated herein may be
`
`employed without departing from the principles described herein. Reference will now be made
`
`in detail to several embodiments, examples of which are illustrated in the accompanying figures.
`
`It is noted that wherever practicable similar or like reference numbers may be used in the figures
`
`and may indicate similar or like functionality.
`
`SYSTEM OVERVIEW
`
`[0019]
`
`FIG. 1 is a system 100 for receiving a query 108 for a private database 106, and
`
`responding to the query 108 by executing a differentially private (DP) version of the query 114
`
`on the private database 106. The system 100 includes a differentially private security system 102
`
`that receives the analytical query 108 from a client 104 and applies a DP version of the query
`
`114 on the database 106. Subsequently, the differentially private security system 102 returns the
`
`response of the DP query 114 to the client 104 as the DP response 112.
`
`[0020]
`
`The database 106 is one or more private databases managed by one or more entities
`
`that can only be accessed by authorized or trusted users. For example, the database 106 may
`
`
`
`
`contain health data of patients, financial records, telecom data, and confidential business
`
`intelligence, such as proprietary data or data restricted by contractual obligations. The
`
`information stored in the database 106 is of interest to one or more clients 104, but clients 104
`
may not have the necessary authorization to access information contained in the databases
`
`106.
`
`[0021]
`
`FIG. 2 illustrates an example database structure, according to one embodiment. For
`
`the remainder of the application, a database, including one or more of the private databases 106,
`
`may be referred to as a matrix with a number of rows and columns. Each row is an entry of the
`
`database and each column is a feature of the database. Thus, each row contains a data entry
`
`characterized by a series of feature values for the data entry. For example, as shown in FIG. 2,
`
`the example database 200 contains 8 entries and 11 features, and illustrates a list of patient
`
`profiles. Each patient is characterized by a series of feature values that contain information on
`
`the patient’s height (Feature 1), country of residence (Feature 2), age (Feature 10), and whether
`
`the patient has contracted a disease (Feature 11).
`
`[0022]
`
`The feature values may be numerical in nature, e.g., Features 1 and 10, or
`
`categorical in nature, e.g., Features 2 and 11. In the case of categorical feature values, each
`
`category may be denoted as an integer. For example, in Feature 11 of FIG. 2, “0” indicates that
`
`the patient has not contracted a disease, and “1” indicates that the patient has contracted a
`
`disease.
`
`[0023]
`
`Returning to FIG. 1, the client 104 may be a human analyst or an organization that
`
`does not have direct access to the database 106, but is interested in applying an analytical query
`
`108 to the database 106. For example, the client 104 may be a data analyst, data scientist, or a
`
`health analyst that is interested in the profiles of the patients but does not have direct access to
`
`
`
`
`the database 106. Each client 104 of the system 100 is associated with a privacy budget and
`
`specifies a set of privacy parameters each time the client 104 submits a query 108. The privacy
`
`budget is a numerical value representative of a number and/or type of remaining queries 108
`
`available to the client 104 in terms of the privacy parameters specified for each query 108.
`
`[0024]
`
The query 108 submitted by the client 104 may be a simple query, such as a count

query that requests the number of entries in the databases 106 that satisfy a condition specified

by the client 104, or a complicated query, such as a predictive analytics query that requests a data
`
`analytics model trained on the databases 106. Upon submitting a query 108 to the differentially
`
`private security system 102, the client 104 receives a DP response 112 to a differentially private
`
`version of the submitted query 114.
`
`[0025]
`
`The client 104 specifies a set of privacy parameters each time the client 104 submits
`
a query 108. The privacy parameters indicate an amount of decrease in the privacy budget of the
`
`client 104 in return for a response to the query 108. As described below in more detail with
`
`reference to the privacy system 160 in FIG. 3, the privacy parameters specified by the client 104
`
`also indicate the amount of information released about the database 106 to the client 104.
`
`[0026]
`
`The differentially private security system 102 receives an analytical query 108 from
`
`the client 104 and applies a differentially private version of the query 114 on the database 106,
`
`such that it releases a degree of information about the database 106 indicated by the privacy
`
`parameters specified by the client 104, but also protects a degree of privacy of the databases 106
`
`specified by the entities managing the database 106. For example, the entities managing the
`
database 106 may also set a maximum threshold on the degree of information released about the
`
`database 106 for a given query 108 that the client 104 may not exceed. Thus, the differentially
`
private security system 102 balances privacy protection of the database 106 while releasing useful
`
`
`
`
`information on the database 106 to the client 104. The differentially private security system 102
`
`may have complete or partial access to the databases 106.
`
`[0027]
`
`Upon receiving a query 108, the differentially private security system 102 applies
`
`DP query 114 to the database 106 and returns a DP response 112 to the client 104. The DP
`
`query 114 is a differentially private version of the query 108 that satisfies a definition of
`
`differential privacy described in more detail with reference to the privacy system 160 in FIG. 3.
`
`The DP query 114 may include perturbing the response or output of the query 108 with noise, or
`
`perturbing the process for generating the output of the query 108 with noise. The resulting
`
`output of the DP query 114 is returned to the client 104 as DP response 112. Ideally, the DP
`
`response 112 correlates to the original output of the query 108 on the databases 106 but
`
`maintains the degree of privacy specified by the entities managing the database 106.
`
`DIFFERENTIALLY PRIVATE SECURITY SYSTEM
`
`[0028]
`
`The differentially private security system 102 includes a user interface 150, a
`
`library 152, an account management system 154, a query handling engine 156, a data integration
`
`module 158, and a privacy system 160. Some embodiments of the differentially private security
`
`system 102 have different or additional modules than the ones described here. Similarly, the
`
`functions can be distributed among the modules in a different manner than is described here.
`
`Certain modules and functions can be incorporated into other modules of the differentially
`
`private security system 102.
`
`[0029]
`
`The user interface 150 can generate a graphical user interface on a dedicated
`
`hardware device of the differentially private security system 102 or the client 104 in which the
`
`client 104 can submit an analytical query 108 and the desired privacy parameters, and view DP
`
`
`
`
`response 112 in the form of numerical values or images. The client 104 may also inspect
`
`database 106 schemata, view an associated privacy budget, or cache the DP response 112 to view
`
`the response later. The user interface 150 submits properly formatted query commands to other
`
`modules of the differentially private security system 102.
`
`[0030]
`
`The library 152 contains software components that can be included in external
`
`programs that allow the client 104 to submit the analytical query 108, receive the DP response
`
112, and perform other functions within a script or program. For example, the client 104 may use the
`
`software components of the library 152 to construct custom data analytic programs. Each of the
`
`software components in the library 152 submits properly formatted query commands to other
`
`modules of the differentially private security system 102.
`
`[0031]
`
`The account management system 154 receives properly formatted query commands
`
(herein “query commands” or “QC”), parses the received query commands, and updates the
`
`account of the client 104 according to the received query command. For example, the account
`
`management system 154 may check the query commands for syntactic correctness, or check
`
`whether a client 104 has access to a requested resource. As another example, the account
`
`management system 154 may check whether the privacy parameters specified by the client 104
`
`for a given analytical query 108 can be accommodated, and if so, decrement the privacy budget
`
`of the client 104 by the amount specified in the query 108. Query commands verified by the
`
`account management system 154 are provided to the query handling engine 156. Examples of
`
`query commands accommodated by the differentially private security system 102 are listed
`
`below.
`
QC1. Count

SELECT COUNT (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC2. Median

SELECT MEDIAN (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC3. Mean

SELECT MEAN (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC4. Variance

SELECT VARIANCE (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC5. Inter-Quartile Range

SELECT IQR (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC6. Batch Gradient Descent

SELECT <GLM> (<columns_x>,<column_y>,<params>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC7. Stochastic Gradient Descent

SELECT SGD <GLM> (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC8. Random Forest

SELECT RANDOMFOREST (<columns_x>,<columns_y>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC9. Histogram

SELECT HISTOGRAM (<column>) FROM <database.table> WHERE <where_clause_i> BUDGET <eps> <delta>.
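As an illustrative aside, the BUDGET clause shared by QC1-QC9 can be parsed mechanically. Below is a minimal, hypothetical parser assuming the exact `BUDGET <eps> <delta>` grammar shown above; the function name, table, and column are illustrative and not part of the claimed system.

```python
import re

def parse_budget(qc: str):
    """Extract the (eps, delta) privacy parameters from a query command
    ending in 'BUDGET <eps> <delta>.', per the QC1-QC9 grammar above."""
    m = re.search(r"BUDGET\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)", qc)
    if m is None:
        raise ValueError("query command has no BUDGET clause")
    return float(m.group(1)), float(m.group(2))

# Hypothetical example command (table and column names are illustrative).
eps, delta = parse_budget(
    "SELECT COUNT (age) FROM medical.patients WHERE age > 30 BUDGET 0.1 0.001.")
```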
`
`[0032]
`
`The query handling engine 156 transforms the received query commands into
`
`appropriate function calls and database access commands by parsing the query command string.
`
`The function calls are specific to the query 108 requested by the client 104, and the access
`
`commands allow access to the required database 106. Different databases 106 require different
`
`access commands. The access commands are provided to the database integrator 158.
`
`[0033]
`
`The database integrator 158 receives the access commands to one or more databases
`
106 and collects the required databases 106 and merges them into a single data object. The data
`
`object has a structure similar to that of a database structure described in reference to FIG. 2. The
`
`data object is provided to the privacy system 160.
`
`[0034]
`
`The privacy system 160 receives the data object from the database integrator 158,
`
`appropriate function calls from the query handling engine 156 indicating the type of query 108
`
`submitted by the client 104, privacy parameters specified for the query 108, and produces a DP
`
`response 112 to a differentially private version of the query 108 with respect to the databases
`
`106. The privacy system 160 will be described in further detail in reference to FIG. 3 below.
`
`PRIVACY SYSTEM
`
`[0035]
`
`FIG. 3 is a block diagram illustrating the privacy system 160 of the system 100
`
`shown in FIG. 1, according to one embodiment. The privacy system 160 includes a count engine
`
`
`
`
302, a median engine 304, a mean engine 306, a variance engine 308, an IQR engine 310, a
`
`batch gradient engine 312, a stochastic gradient engine 314, a random forest engine 316, a
`
`histogram engine 318, a model testing engine 320, and a synthetic database engine 322. Some
`
`embodiments of the privacy system 160 have different or additional modules than the ones
`
`described here. Similarly, the functions can be distributed among the modules in a different
`
`manner than is described here. Certain modules and functions can be incorporated into other
`
`modules of the privacy system 160.
`
`Definition of Differential Privacy
`
`[0036]
`
For a given query 108, the privacy system 160 receives a data object X, function calls indicating the type of query 108, and privacy parameters specified by the client 104, and outputs a DP response 112 to a differentially private version of the query 108 with respect to X. Each data object X is a collection of row vectors x_{i=1,2,…,m}, in which each row vector x_i has a series of p elements x_i^{j=1,2,…,p}.
`
`[0037]
`
A query M satisfies the definition of ε-differential privacy if, for all:

$$\forall X, X' \in \mathbb{D},\ \forall S \subseteq \mathrm{Range}(M):\ \frac{\Pr[M(X) \in S]}{\Pr[M(X') \in S]} \leq e^{\varepsilon},$$

where 𝔻 is the space of all possible data objects, X, X′ are neighboring data objects, S is an output space of query M, and neighboring databases are defined as two data objects X, X′ that have at most one different entry from one another. That is, given two neighboring data objects X, X′, in which one has an individual's data entry and the other does not, there is no output of query M that an adversary can use to distinguish between X and X′; an output of such a differentially private query M reveals no information about the data object X. The privacy parameter ε controls the amount of information that the query M reveals about any individual data entry in X,
`
`
`
`
`and represents the degree of information released about the entries in X. For example, in the
`
definition given above, a small value of ε indicates that the probability an output of query M will

disclose information on a specific data entry is small, while a large value of ε indicates the
`
`opposite.
`
`[0038]
`
As another definition of differential privacy, a query M is (ε,δ)-differentially private if for neighboring data objects X, X′:

$$\forall X, X' \in \mathbb{D},\ \forall S \subseteq \mathrm{Range}(M):\ \frac{\Pr[M(X) \in S]}{\Pr[M(X') \in S]} \leq e^{\varepsilon} + \delta.$$

The privacy parameter δ measures the improbability of the output of query M satisfying ε-differential privacy. As discussed in reference to FIG. 1, the client 104 may specify the desired values for the privacy parameters (ε,δ) for a query 108.
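The density-ratio bound in these definitions can be illustrated numerically. The sketch below (not part of the claimed system) checks that the output densities of a Laplacian-noised count on two neighboring data objects never differ by more than a factor of e^ε:

```python
import math

def laplace_pdf(x, b):
    """Density at x of a zero-centered Laplacian with scale b."""
    return math.exp(-abs(x) / b) / (2.0 * b)

eps = 0.5
b = 1.0 / eps        # Laplacian scale for a count query (sensitivity 1)
q, q_prime = 42, 43  # counts on two neighboring data objects

# At every output y, the density ratio stays within e^eps, as required.
ratios = []
for y in [40.0, 42.0, 45.0, 100.0]:
    ratios.append(laplace_pdf(y - q, b) / laplace_pdf(y - q_prime, b))
bounded = all(r <= math.exp(eps) + 1e-12 for r in ratios)
```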
`
`[0039]
`
There are three important definitions for discussing the privacy system 160: global sensitivity, local sensitivity, and smooth sensitivity. The global sensitivity of a query M is defined as

$$GS_M(X) = \max_{X, X':\, d(X, X') = 1} \left\| M(X) - M(X') \right\|,$$

where X, X′ are any neighboring data objects such that d(X, X′) = 1. This states that the global sensitivity is the most the output of query M could change by computing M on X and X′.
`
`[0040]
`
The local sensitivity of a query M on the data object X is given by:

$$LS_M(X) = \max_{X':\, d(X, X') = 1} \left\| M(X) - M(X') \right\|,$$

where the set {X′ : d(X, X′) = 1} denotes all data objects that have at most one entry that is different from X. That is, the local sensitivity LS_M(X) is the sensitivity of the output of the query M on data objects X′ that have at most one different entry from X, measured by a norm function.
`
`
`
`
`[0041]
`
Related to the local sensitivity LS_M(X), the smooth sensitivity given a parameter β is given by:

$$S_M(X;\, \beta) = \max_{X' \in \mathbb{D}} LS_M(X') \cdot e^{-\beta \cdot d(X, X')},$$

where d(X, X′) denotes the number of entries that differ between X and X′.
`
`Notation for Random Variables
`
`[0042]
`
`The notation in this section is used for the remainder of the application to denote the
`
`following random variables.
`
1) G(σ²) denotes a zero-centered Gaussian random variable with the probability density function

$$f(x \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}.$$

2) L(b) denotes a zero-centered Laplacian random variable with the probability density function

$$f(x \mid b) = \frac{1}{2b}\, e^{-\frac{|x|}{b}}.$$

3) C(γ) denotes a zero-centered Cauchy random variable with the probability density function

$$f(x \mid \gamma) = \frac{1}{\pi\gamma \left(1 + \left(\frac{x}{\gamma}\right)^2\right)}.$$
`
`[0043]
`
`Further, a vector populated with random variables R as its elements is denoted by
`
`v(R). A matrix populated with random variables R as its elements is denoted by M(R).
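A sketch of how these random variables might be sampled in practice, assuming NumPy's generator API (the helper names G, L, and C simply mirror the notation above):

```python
import numpy as np

rng = np.random.default_rng(7)

def G(sigma2, size=None):
    """Zero-centered Gaussian random variable(s) with variance sigma^2."""
    return rng.normal(0.0, np.sqrt(sigma2), size)

def L(b, size=None):
    """Zero-centered Laplacian random variable(s) with scale b."""
    return rng.laplace(0.0, b, size)

def C(gamma, size=None):
    """Zero-centered Cauchy random variable(s) with scale gamma."""
    return gamma * rng.standard_cauchy(size)

v = L(1.0, size=5)        # v(R): a vector of Laplacian random variables
M = G(2.0, size=(3, 3))   # M(R): a matrix of Gaussian random variables
```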
`
`Count Engine 302
`
`
`
`
`[0044]
`
`The count engine 302 produces a DP response 112 responsive to the differentially
`
`private security system 102 receiving a query 108 for counting the number of entries in a column
`
`of the data object X that satisfy a condition specified by the client 104, given privacy parameters
`
(ε,δ). An example query command for accessing the count engine 302 is given in QC1 above.

For the example data object X shown in FIG. 2, the client 104 may submit a query 108 to return a
`
`DP response 112 for the number of patients that are above the age of 30.
`
`[0045]
`
The count engine 302 retrieves the count q from X. If privacy parameter δ is equal to zero, the count engine 302 returns

$$y \approx q + L\left(\frac{c_1}{\varepsilon}\right)$$

as the DP response 112 for display on the user interface 150, where c_1 is a constant. An example value for c_1 may be 1. If the privacy parameter δ is non-zero, the count engine 302 returns

$$y \approx q + G\left(c_1 \cdot \frac{2 \cdot \log\frac{2}{\delta}}{\varepsilon^2}\right)$$

as the DP response 112 for display on the user interface 150, where c_1 is a constant. An example value for c_1 may be 1.
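A minimal sketch of the count engine's two noise regimes, assuming the formulas above with c_1 = 1; the function, variable names, and example data are illustrative, not part of the claimed system:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(values, condition, eps, delta, c1=1.0):
    """Sketch of the count engine 302: count entries satisfying `condition`,
    then add Laplacian noise when delta == 0, else Gaussian noise, following
    the formulas above (c1 is the constant factor, 1 by default)."""
    q = sum(1 for v in values if condition(v))
    if delta == 0:
        return q + rng.laplace(0.0, c1 / eps)
    sigma2 = c1 * 2.0 * np.log(2.0 / delta) / eps**2
    return q + rng.normal(0.0, np.sqrt(sigma2))

# Illustrative data: ages of eight hypothetical patients.
ages = [12, 45, 33, 61, 28, 39, 57, 24]
noisy = dp_count(ages, lambda a: a > 30, eps=0.5, delta=0.0)
```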
`
`[0046]
`
The client 104 may request visualization of entries in the data object X for analysis
`
`of trends or patterns that depend on the features of the entries. In one embodiment, the privacy
`
`system 160 generates a differentially private visualization of the requested data entries from X.
`
`FIG. 4 illustrates displaying results of a differentially private count query to the user interface of
`
`the client, according to one embodiment.
`
`[0047]
`
`The privacy system 160 first maps the requested entries from X for the selected
`
`features specified by the client 104. For example, as shown in the visualization 410 of FIG. 4, a
`
`series of requested entries are plotted depending on their values for Feature 1 and Feature 2. The
`
`
`
`
`privacy system 160 then generates disjoint regions on the plot and retrieves the counts of entries
`
`in each of the disjoint regions. In visualization 410, the privacy system 160 divides the plot into
`
`disjoint squares and retrieves the count of entries in each square.
`
`[0048]
`
`For each disjoint region, the privacy system 160 submits a differentially private
`
`count query to the count engine 302, and randomly plots a number of entries determined by the
`
`DP response 112 of the count engine 302 for that region. The resulting DP visualization plot is
`
`returned to the client 104 for display to a user by the user interface 150. For example, square
`
`440 in visualization 410 contains 3 entries, while the same square in DP visualization 420
`
`contains 4 randomly plotted entries determined by the DP response 112 of the count engine 302.
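The per-region counting step can be sketched as follows, assuming a square grid and Laplacian noise on each disjoint cell; this is a simplified stand-in for calls to the count engine 302, not the claimed implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_region_counts(points, grid=4, eps=0.5):
    """Sketch of the visualization step: bin 2-D points into disjoint grid
    cells, then perturb each cell count with Laplacian noise (rounded and
    clipped at zero) before entries are randomly re-plotted per cell."""
    pts = np.asarray(points, dtype=float)
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=grid)
    noisy = counts + rng.laplace(0.0, 1.0 / eps, size=counts.shape)
    return np.clip(np.rint(noisy), 0, None).astype(int)

# Illustrative points standing in for (Feature 1, Feature 2) pairs.
pts = rng.uniform(0.0, 1.0, size=(50, 2))
noisy_counts = dp_region_counts(pts, grid=4, eps=0.5)
```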
`
`Median Engine 304
`
`[0049]
`
`The median engine 304 produces a DP response 112 responsive to the differentially
`
`private security system 102 receiving a query 108 for generating the median of entries in a
`
`column of the data object X that satisfy a condition specified by the client 104, given privacy
`
parameters (ε,δ). An example query command for accessing the median engine 304 is given in

QC2 above. For the example data object X shown in FIG. 2, the client 104 may submit a query
`
`108 to return a DP response 112 for the median age of all patients in X.
`
`[0050]
`
The median engine 304 aggregates the values of entries satisfying the condition specified by the client 104 into a list U, and retrieves the median q from U. If privacy parameter δ is equal to zero, the median engine 304 returns

$$y \approx q + c_1 \cdot S_M(U;\, c_2 \cdot \varepsilon) \cdot \frac{C(1)}{\varepsilon}$$

as the DP response 112 for display on the user interface 150, in which c_1, c_2 are constant factors. Example values for c_1, c_2 may be 6 and 1/6, respectively. If δ is non-zero, the median engine 304 returns

$$y \approx q + c_1 \cdot S_M\!\left(U;\, c_2 \cdot \frac{\varepsilon}{2 \cdot \log\frac{2}{\delta}}\right) \cdot \frac{L(1)}{\varepsilon}$$

as the DP response 112 for display on the user interface 150. Example values for c_1, c_2 may be 2 and 1, respectively.
`
`Mean Engine 306
`
`[0051]
`
The mean engine 306 produces a DP response 112 responsive to the differentially
`
`private security system 102 receiving a query 108 for generating the mean of entries in a column
`
`of the data object X that satisfy a condition specified by the client 104, given privacy parameters
`
(ε,δ). An example query command for accessing the mean engine 306 is given in QC3 above.

For the example data object X shown in FIG. 2, the client 104 may submit a query 108 to return a
`
`DP response 112 for generating the mean age of patients that are above the age of 30.
`
`[0052]
`
`The mean engine 306 aggregates the values of entries satisfying the condition
`
`specified by the client 104 into a list U. Assuming there are n values in U, the mean engine 306
`
further divides U into m sub-lists V_{j=1,2,…,m}, each with n/m values. The mean engine 306

aggregates each mean r_j of sub-list V_j into a list R. The mean engine 306 requests a differentially
`
`private median query of the values in R to the median engine 304. The resulting output from the
`
`median engine 304 is returned as the DP response 112 for display on the user interface 150.
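A rough sketch of the mean engine's subsample-and-aggregate flow. The median engine's smooth-sensitivity term is replaced here by a crude range-based stand-in, so this illustrates only the data flow, not a faithful privacy mechanism:

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_mean(U, m, eps, c1=6.0):
    """Sketch of the mean engine 306: split U into m sub-lists, take each
    sub-list mean, then return a noisy median of those means. The smooth
    sensitivity S_M(R; c2*eps) is replaced by the range of R, a crude
    hypothetical stand-in for the median engine's actual computation."""
    U = np.asarray(U, dtype=float)
    sublists = np.array_split(rng.permutation(U), m)
    R = np.array([s.mean() for s in sublists])
    q = np.median(R)
    sensitivity_stand_in = R.max() - R.min()
    return q + c1 * sensitivity_stand_in * rng.standard_cauchy() / eps

noisy_mean = dp_mean(list(range(100)), m=10, eps=1.0)
```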
`
`Variance Engine 308
`
`
`
`
`[0053]
`
`The variance engine 308 produces a DP response 112 responsive to the
`
`differentially private security system 102 receiving a query 108 for generating the variance of
`
`entries in a column of the data object X that satisfy a condition specified by the client 104, given
`
privacy parameters (ε,δ). An example query command for accessing the variance engine 308 is
`
`given in QC4 above. For the example data object X shown in FIG. 2, the client 104 may submit
`
`a query 108 to return a DP response 112 for generating the variance of the age of all patients in
`
`X.
`
`[0054]
`
`The variance engine 308 aggregates the values of entries satisfying the condition
`
`specified by the client 104 into a list U. Assuming there are n values in U, the variance engine
`
308 further divides U into m sub-lists V_{j=1,2,…,m}, each with n/m values. The variance engine 308

aggregates each variance r_j of sub-list V_j into a list R. The variance engine 308 requests a
`
`differentially private median query of the values in R to the median engine 304. The resulting
`
`output from the median engine 304 is returned as the DP response 112 for display on the user
`
`interface 150.
`
IQR Engine 310
`
`[0055]
`
`The IQR engine 310 produces a DP response 112 responsive to the differentially
`
`private security system 102 receiving a query 108 for generating the interquartile range (IQR) of
`
entries in a column of the data object X that satisfy a condition specified by the client 104, given
`
privacy parameters (ε,δ). An example query command for accessing the IQR engine 310 is given

in QC5 above. For the example data object X shown in FIG. 2, the client 104 may submit a
`
`query 108 to return a DP response 112 for generating the IQR of the age of all patients in X.
`
`
`
`
`[0056]
`
In one embodiment, the IQR engine 310 aggregates the values of entries satisfying the condition specified by the client 104 into a list U. Assuming there are n values in U, the sample IQR of U is denoted as IQR(U), and a log transform of IQR(U) is denoted as:

$$H_n(U) = \log_{1 + \frac{1}{\log n}} IQR(U).$$

The IQR engine 310 further maps the quantity H_n(U) to an integer k_0 such that H_n(U) ∈ [k_0, k_0 + 1). The IQR engine 310 extracts a value A_0(U) indicating the number of entries in U required to change in order for the new list U to satisfy H_n(U) ∉ [k_0, k_0 + 1).
`
`[0057]
`
The IQR engine 310 then generates a value R_0(U) given by:

$$R_0(U) \approx A_0(U) + L\left(\frac{c_1}{\varepsilon}\right),$$

in which c_1 is a constant factor. If R_0(U) is greater than a predetermined threshold, the IQR engine 310 returns

$$y = IQR(U) \cdot \left(1 + \frac{1}{\log n}\right)^{L\left(\frac{c_1}{\varepsilon}\right)}$$

as the DP response 112 for display on the user interface 150. If R_0(U) is equal to or less than the predetermined threshold, the IQR engine 310 returns “No Answer” as the DP response 112 for display on the user interface 150.
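A loose sketch of this embodiment's release-or-refuse flow. The quantity A_0(U) is replaced by a hypothetical constant stand-in, so the sketch illustrates only the control flow, not the patented computation:

```python
import numpy as np

rng = np.random.default_rng(3)

def dp_iqr(U, eps, c1=1.0, threshold=2.0):
    """Sketch of the first IQR embodiment: compute IQR(U); form a noisy test
    statistic R0 from A0(U) plus Laplacian noise; if R0 clears the threshold,
    release IQR(U) scaled by (1 + 1/log n) raised to a Laplacian exponent,
    otherwise answer 'No Answer'."""
    n = len(U)
    q1, q3 = np.percentile(U, [25, 75])
    iqr = q3 - q1
    A0 = 1.0  # hypothetical stand-in for the entry-change count A0(U)
    R0 = A0 + rng.laplace(0.0, c1 / eps)
    if R0 <= threshold:
        return "No Answer"
    return iqr * (1.0 + 1.0 / np.log(n)) ** rng.laplace(0.0, c1 / eps)

ages = [24, 28, 33, 39, 45, 57, 61, 70]
answer = dp_iqr(ages, eps=1.0)
```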
`
`[0058]
`
In another embodiment, the IQR engine 310 aggregates the values of entries satisfying the condition specified by the client 104 into an ordered list U. The IQR engine 310 retrieves the first quartile and the third quartile from U, given by q and q′, respectively. If δ is zero, the IQR engine 310 returns:

$$y \approx \left(q' + c_1 \cdot S_M(U;\, c_2 \cdot \varepsilon) \cdot \frac{C(1)}{\varepsilon}\right) - \left(q + c_1 \cdot S_M(U;\, c_2 \cdot \varepsilon) \cdot \frac{C(1)}{\varepsilon}\right)$$

as the DP response 112 for display on the user interface 150, in which c_1, c_2 are constant factors.
`
`[0059]
`
If δ is non-zero, the IQR engine 310 returns:

$$y \approx \left(q' + c_1 \cdot S_M\!\left(U;\, c_2 \cdot \frac{\varepsilon}{2 \cdot \log\frac{2}{\delta}}\right) \cdot \frac{L(1)}{\varepsilon/2}\right) - \left(q + c_1 \cdot S_M\!\left(U;\, c_2 \cdot \frac{\varepsilon}{2 \cdot \log\frac{2}{\delta}}\right) \cdot \frac{L(1)}{\varepsilon/2}\right)$$

as the DP response 112 for display on the user interface 150, in which c_1, c_2 are constant factors.
`
`Batch Gradient Engine 312
`
`[0060]
`
`The batch gradient engine 312 produces a DP response 112 responsive to the
`
`differentially private security system 102 receiving a valid query 108 for generating a set of
`
parameters θ for a general linear model that captures the correlation between a series of

observable features and a dependent feature, given privacy parameters (ε,δ). The general linear
`
`model is trained on the selected columns of X. An example query command for accessing the
`
`batch gradient engine 312 is given in QC6 above.
`
`[0061]
`
Given a row vector x that contains a series of observable features and a label feature
`
`y, the correlation between the observable features and the label feature in a general linear model
`
`may be given as:
`
$$y = x\,\theta^{T},$$

where θ is a row vector containing parameters of the model. That is, the label feature is modeled as a weighted sum of the observable features, where each value in θ is the weight given to a
`
`corresponding observable feature.
`
`[0062]
`
`For the example data object X shown in FIG. 2, the client 104 may submit a query
`
108 to return a DP response 112 for generating a set of parameters θ for a general linear model
`
`that captures the correlation between the height of the patients (observable feature) and the age
`
`
`
`
`of the patients (label feature). As another example, the features may be categorical in nature, and
`
the requested parameters θ may be for a general linear model that captures the correlation
`
`between the height, age, residence of the patients (observable features) and whether the patient
`
`will or has contracted a disease (label feature).
`
`[0063]
`
Examples of general linear models supported by the batch gradient engine 312 include,

but are not limited to, linear regression, logistic regression, and support vector machine (SVM)

classifiers.
`
`[0064]
`
The optimal values for the set of parameters θ are found by training the general linear model on training data (X_train, y_train) consisting of selected columns of data object X. Specifically, X_train is a matrix database in which each column corresponds to a selected observable feature, and y_train is a column vector of the selected label feature values. Each entry in X_train has a one-to-one correspondence with an entry in y_train. The optimal θ is generally found by minimizing a loss function on (X_train, y_train) over possible values of θ. Mathematically, the minimization is given by:

$$\theta = \operatorname*{argmin}_{\theta}\ \ell(X_{train}, y_{train};\, \theta).$$
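For concreteness, a sketch of objective perturbation for the least-squares case: the loss is augmented with a noisy linear term θᵀb and minimized in closed form. The noise-variance expression and the bounds K and R are assumptions for illustration; this is not the claimed implementation:

```python
import numpy as np

rng = np.random.default_rng(4)

def dp_linear_regression(X, y, eps, delta, K=1.0, R=1.0):
    """Objective-perturbation sketch for least squares: minimize
    ||X w - y||^2 + w^T b, where b is Gaussian noise with variance
    4*K^2*R^2*(1 + 2*log(2/delta)) / eps^2 (K, R are assumed bounds).
    Setting the gradient to zero gives w = (X^T X)^{-1} (X^T y - b/2)."""
    sigma2 = 4.0 * K**2 * R**2 * (1.0 + 2.0 * np.log(2.0 / delta)) / eps**2
    b = rng.normal(0.0, np.sqrt(sigma2), size=X.shape[1])
    return np.linalg.solve(X.T @ X, X.T @ y - b / 2.0)

# With a huge budget the noise is negligible and w recovers the true weights.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, 3.0])
w = dp_linear_regression(X, y, eps=1e6, delta=1e-3)
```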
`
`[0065]
`
The batch gradient engine 312 returns a DP response 112 θ_DP of a differentially private batch gradient query by perturbing the loss function to be minimized. Specifically, the perturbed minimization is given by:

$$\theta_{DP} = \operatorname*{argmin}_{\theta}\ \ell(X_{train}, y_{train};\, \theta) + \theta^{T} v\!\left(G\!\left(\frac{4 \cdot K^2 \cdot R^2 \cdot \left(1 + 2 \cdot \log\frac{2}{\delta}\right)}{\varepsilon^2}\right)\right),$$
`
`in which K is the Lipschitz co