`
`INVENTORS
`
`ISHAAN NERURKAR
`
`CHRISTOPHER HOCKENBROCHT
`
`MICHAEL SHAUGHNESSY
`
`EITAN CHATAV
`
`FIELD OF DISCLOSURE
`
`BACKGROUND
`
`[0001]
`
`The present invention generally relates to a database system, and more specifically
`
`to responding to a database query by executing a differentially private version of the query on the
`
`database.
`
`DESCRIPTION OF THE RELATED ART
`
`[0002]
`
`Personally identifiable information, such as health data, financial records, telecom
`
`data, and confidential business intelligence, such as proprietary data or data restricted by
`
`contractual obligations, is valuable for analysis and collaboration. Yet, only a fraction of such
`
`sensitive information is used by organizations or analysts for statistical or predictive analysis.
`
`Privacy regulations, security concerns, and technological challenges suppress the full value of
`
`data, especially personally identifiable information and confidential and proprietary records.
`
`[0003]
`
`Methods that attempt to solve this problem, such as access controls, data masking,
`
`hashing, anonymization, aggregation, and tokenization, are invasive and resource intensive,
`
`compromise analytical utility, or do not ensure privacy of the records. For example, data
`
`masking may remove or distort data, compromising the statistical properties of the data. As
`
`1
`
`32552/32620/DOCS/3854318.11
`
`
`
another example, many of the above-mentioned methods are not effective when information is
`
`stored in disparate data sources. Technology which enables organizations or analysts to execute
`
`advanced statistical and predictive analysis on sensitive information across disparate data sources
`
`without revealing record-level information is needed.
`
`SUMMARY
`
`[0004]
`
`A hardware database privacy device is communicatively coupled to a private
`
`database system. The hardware database privacy device receives a request from a client device
`
`to perform a query of the private database system and identifies a level of differential privacy
`
`corresponding to the request. The identified level of differential privacy includes a privacy
`
parameter ε indicating the degree of information released about data in the private database
`
`system.
`
`[0005]
`
`The differentially private hardware database privacy device identifies a set of data
`
`stored in the private database system and a set of operations to be performed on the set of data
`
`that corresponds to the requested query. After the set of data is accessed, the set of operations is
`
`modified based on the identified level of differential privacy such that a performance of the
`
`modified set of operations produces a result set that is differentially private. The modified set of
`
`operations is performed on the accessed set of data to produce the differentially private result set.
`
`The differentially private result set is provided to the client device for display on a hardware
`
`display of the client device.
`
`32552/32620/DOCS/3854318.11
`
`
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`[0006]
`
`FIG. 1 illustrates a system for receiving a query for a private database, and for
`
`responding to the query by executing a differentially private version of the query on the private
`
`database.
`
`[0007]
`
`FIG. 2 illustrates an example database structure, according to one embodiment.
`
`[0008]
`
`FIG. 3 is a block diagram illustrating the privacy system of the system in FIG. 1,
`
`according to one embodiment.
`
`[0009]
`
`FIG. 4 illustrates displaying results of a differentially private count query, according
`
`to one embodiment.
`
`[0010]
`
`FIG. 5 illustrates an example binary decision tree for use in a differentially private
`
`random forest query, according to one embodiment.
`
`[0011]
`
`FIG. 6 illustrates perturbing the counts for a differentially private histogram query,
`
`according to one embodiment.
`
`[0012]
`
`FIG. 7A illustrates a recursive process for identifying threshold points of the
`
`classification output vector for a differentially private model testing query, according to one
`
`embodiment.
`
`[0013]
`
`FIG. 7B illustrates an example confusion matrix generated during a differentially
`
`private model testing query.
`
`[0014]
`
`FIG. 8 illustrates a system-level modification to the system of FIG. 1 that allows the
`
client to access a differentially private synthetic database, according to one embodiment.
`
`[0015]
`
`FIG. 9 illustrates the application of a clustering query to entries of a differentially
`
`private synthetic database, according to one embodiment.
`
`32552/32620/DOCS/3854318.11
`
`
`
`[0016]
`
`FIG. 10 illustrates a process for responding to a database query by executing a
`
`differentially private version of the query on the database, according to one embodiment.
`
`[0017]
`
`FIG. 11 is a block diagram illustrating components of an example machine able to
`
`read instructions from a machine-readable medium and execute them in a processor (or
`
`controller).
`
`DETAILED DESCRIPTION
`
`[0018]
`
`The Figures (FIGS.) and the following description describe certain embodiments by
`
`way of illustration only. One skilled in the art will readily recognize from the following
`
`description that alternative embodiments of the structures and methods illustrated herein may be
`
`employed without departing from the principles described herein. Reference will now be made
`
`in detail to several embodiments, examples of which are illustrated in the accompanying figures.
`
`It is noted that wherever practicable similar or like reference numbers may be used in the figures
`
`and may indicate similar or like functionality.
`
`SYSTEM OVERVIEW
`
`[0019]
`
`FIG. 1 is a system 100 for receiving a query 108 for a private database 106, and
`
`responding to the query 108 by executing a differentially private (DP) version of the query 114
`
`on the private database 106. The system 100 includes a differentially private security system 102
`
`that receives the analytical query 108 from a client 104 and applies a DP version of the query
`
`114 on the database 106. Subsequently, the differentially private security system 102 returns the
`
`response of the DP query 114 to the client 104 as the DP response 112.
`
`[0020]
`
`The database 106 is one or more private databases managed by one or more entities
`
`that can only be accessed by authorized or trusted users. For example, the database 106 may
`
`32552/32620/DOCS/3854318.11
`
`
`
`contain health data of patients, financial records, telecom data, and confidential business
`
`intelligence, such as proprietary data or data restricted by contractual obligations. The
`
`information stored in the database 106 is of interest to one or more clients 104, but clients 104
`
may not have the necessary authorization to access information contained in the databases
`
`106.
`
`[0021]
`
`FIG. 2 illustrates an example database structure, according to one embodiment. For
`
`the remainder of the application, a database, including one or more of the private databases 106,
`
`may be referred to as a matrix with a number of rows and columns. Each row is an entry of the
`
`database and each column is a feature of the database. Thus, each row contains a data entry
`
`characterized by a series of feature values for the data entry. For example, as shown in FIG. 2,
`
`the example database 200 contains 8 entries and 11 features, and illustrates a list of patient
`
`profiles. Each patient is characterized by a series of feature values that contain information on
`
`the patient’s height (Feature 1), country of residence (Feature 2), age (Feature 10), and whether
`
`the patient has contracted a disease (Feature 11).
`
`[0022]
`
`The feature values may be numerical in nature, e.g., Features 1 and 10, or
`
`categorical in nature, e.g., Features 2 and 11. In the case of categorical feature values, each
`
`category may be denoted as an integer. For example, in Feature 11 of FIG. 2, “0” indicates that
`
`the patient has not contracted a disease, and “1” indicates that the patient has contracted a
`
`disease.
`
`[0023]
`
`Returning to FIG. 1, the client 104 may be a human analyst or an organization that
`
`does not have direct access to the database 106, but is interested in applying an analytical query
`
`108 to the database 106. For example, the client 104 may be a data analyst, data scientist, or a
`
`health analyst that is interested in the profiles of the patients but does not have direct access to
`
`32552/32620/DOCS/3854318.11
`
`
`
`the database 106. Each client 104 of the system 100 is associated with a privacy budget and
`
`specifies a set of privacy parameters each time the client 104 submits a query 108. The privacy
`
`budget is a numerical value representative of a number and/or type of remaining queries 108
`
`available to the client 104 in terms of the privacy parameters specified for each query 108.
`
`[0024]
`
Queries 108 submitted by the client 104 may be simple queries, such as count
`
`queries that request the number of entries in the databases 106 that satisfy a condition specified
`
`by the client 104, or complicated queries, such as predictive analytics queries that request a data
`
`analytics model trained on the databases 106. Upon submitting a query 108 to the differentially
`
`private security system 102, the client 104 receives a DP response 112 to a differentially private
`
`version of the submitted query 114.
`
`[0025]
`
`The client 104 specifies a set of privacy parameters each time the client 104 submits
`
a query 108. The privacy parameters indicate an amount of decrease in the privacy budget of the
`
`client 104 in return for a response to the query 108. As described below in more detail with
`
`reference to the privacy system 160 in FIG. 3, the privacy parameters specified by the client 104
`
`also indicate the amount of information released about the database 106 to the client 104.
`
`[0026]
`
`The differentially private security system 102 receives an analytical query 108 from
`
`the client 104 and applies a differentially private version of the query 114 on the database 106,
`
`such that it releases a degree of information about the database 106 indicated by the privacy
`
`parameters specified by the client 104, but also protects a degree of privacy of the databases 106
`
`specified by the entities managing the database 106. For example, the entities managing the
`
database 106 may also set a maximum threshold on the degree of information released about the
`
`database 106 for a given query 108 that the client 104 may not exceed. Thus, the differentially
`
`private security system balances privacy protection of the database 106 while releasing useful
`
`32552/32620/DOCS/3854318.11
`
`
`
`information on the database 106 to the client 104. The differentially private security system 102
`
`may have complete or partial access to the databases 106.
`
`[0027]
`
`Upon receiving a query 108, the differentially private security system 102 applies
`
`DP query 114 to the database 106 and returns a DP response 112 to the client 104. The DP
`
`query 114 is a differentially private version of the query 108 that satisfies a definition of
`
`differential privacy described in more detail with reference to the privacy system 160 in FIG. 3.
`
`The DP query 114 may include perturbing the response or output of the query 108 with noise, or
`
`perturbing the process for generating the output of the query 108 with noise. The resulting
`
`output of the DP query 114 is returned to the client 104 as DP response 112. Ideally, the DP
`
`response 112 correlates to the original output of the query 108 on the databases 106 but
`
`maintains the degree of privacy specified by the entities managing the database 106.
`
`DIFFERENTIALLY PRIVATE SECURITY SYSTEM
`
`[0028]
`
`The differentially private security system 102 includes a user interface 150, a
`
`library 152, an account management system 154, a query handling engine 156, a data integration
`
`module 158, and a privacy system 160. Some embodiments of the differentially private security
`
`system 102 have different or additional modules than the ones described here. Similarly, the
`
`functions can be distributed among the modules in a different manner than is described here.
`
`Certain modules and functions can be incorporated into other modules of the differentially
`
`private security system 102.
`
`[0029]
`
`The user interface 150 can generate a graphical user interface on a dedicated
`
`hardware device of the differentially private security system 102 or the client 104 in which the
`
`client 104 can submit an analytical query 108 and the desired privacy parameters, and view DP
`
`32552/32620/DOCS/3854318.11
`
`
`
`response 112 in the form of numerical values or images. The client 104 may also inspect
`
`database 106 schemata, view an associated privacy budget, or cache the DP response 112 to view
`
`the response later. The user interface 150 submits properly formatted query commands to other
`
`modules of the differentially private security system 102.
`
`[0030]
`
`The library 152 contains software components that can be included in external
`
`programs that allow the client 104 to submit the analytical query 108, receive the DP response
`
`112, and other functions within a script or program. For example, the client 104 may use the
`
`software components of the library 152 to construct custom data analytic programs. Each of the
`
`software components in the library 152 submits properly formatted query commands to other
`
`modules of the differentially private security system 102.
`
`[0031]
`
`The account management system 154 receives properly formatted query commands
`
(herein “query commands” or “QC”), parses the received query commands, and updates the
`
`account of the client 104 according to the received query command. For example, the account
`
`management system 154 may check the query commands for syntactic correctness, or check
`
`whether a client 104 has access to a requested resource. As another example, the account
`
`management system 154 may check whether the privacy parameters specified by the client 104
`
`for a given analytical query 108 can be accommodated, and if so, decrement the privacy budget
`
`of the client 104 by the amount specified in the query 108. Query commands verified by the
`
`account management system 154 are provided to the query handling engine 156. Examples of
`
`query commands accommodated by the differentially private security system 102 are listed
`
`below.
`
QC1. Count

SELECT COUNT (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC2. Median

SELECT MEDIAN (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC3. Mean

SELECT MEAN (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC4. Variance

SELECT VARIANCE (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC5. Inter-Quartile Range

SELECT IQR (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC6. Batch Gradient Descent

SELECT <GLM> (<columns_x>,<column_y>,<params>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC7. Stochastic Gradient Descent

SELECT SGD <GLM> (<column>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC8. Random Forest

SELECT RANDOMFOREST (<columns_x>,<columns_y>) FROM <database.table> WHERE <where_clause> BUDGET <eps> <delta>.

QC9. Histogram

SELECT HISTOGRAM (<column>) FROM <database.table> WHERE <where_clause_i> BUDGET <eps> <delta>.
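As an illustration of the syntax above, a count of patients over age 30 from the FIG. 2 example, with privacy parameters ε = 0.1 and δ = 10⁻⁵, might be written as follows (the table name medical.patients and the column name age are hypothetical):

```sql
SELECT COUNT (age) FROM medical.patients WHERE age > 30 BUDGET 0.1 0.00001
```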
`
`[0032]
`
`The query handling engine 156 transforms the received query commands into
`
`appropriate function calls and database access commands by parsing the query command string.
`
`The function calls are specific to the query 108 requested by the client 104, and the access
`
`commands allow access to the required database 106. Different databases 106 require different
`
`access commands. The access commands are provided to the database integrator 158.
`
`[0033]
`
`The database integrator 158 receives the access commands to one or more databases
`
106 and collects the required databases 106 and merges them into a single data object. The data
`
`object has a structure similar to that of a database structure described in reference to FIG. 2. The
`
`data object is provided to the privacy system 160.
`
`[0034]
`
`The privacy system 160 receives the data object from the database integrator 158,
`
`appropriate function calls from the query handling engine 156 indicating the type of query 108
`
`submitted by the client 104, privacy parameters specified for the query 108, and produces a DP
`
`response 112 to a differentially private version of the query 108 with respect to the databases
`
`106. The privacy system 160 will be described in further detail in reference to FIG. 3 below.
`
`PRIVACY SYSTEM
`
`[0035]
`
`FIG. 3 is a block diagram illustrating the privacy system 160 of the system 100
`
`shown in FIG. 1, according to one embodiment. The privacy system 160 includes a count engine
`
`10
`
`32552/32620/DOCS/3854318.11
`
`
`
302, a median engine 304, a mean engine 306, a variance engine 308, an IQR engine 310, a
`
`batch gradient engine 312, a stochastic gradient engine 314, a random forest engine 316, a
`
`histogram engine 318, a model testing engine 320, and a synthetic database engine 322. Some
`
`embodiments of the privacy system 160 have different or additional modules than the ones
`
`described here. Similarly, the functions can be distributed among the modules in a different
`
`manner than is described here. Certain modules and functions can be incorporated into other
`
`modules of the privacy system 160.
`
`Definition of Differential Privacy
`
`[0036]
`
For a given query 108, the privacy system 160 receives a data object X, function
`
`calls indicating the type of query 108, privacy parameters specified by the client 104, and outputs
`
`a DP response 112 to a differentially private version of the query 108 with respect to X. Each
`
data object X is a collection of row vectors $x_{i=1,2,\dots,m}$, in which each row vector $x_i$ has a series of $p$ elements $x_i^{j=1,2,\dots,p}$.
`
`[0037]
`
A query M satisfies the definition of ε-differential privacy if:

$$\forall X, X' \in \mathbb{D},\ \forall S \subseteq \mathrm{Range}(M):\ \frac{\Pr[M(X) \in S]}{\Pr[M(X') \in S]} \le e^{\varepsilon}$$

where $\mathbb{D}$ is the space of all possible data objects, X and X′ are neighboring data objects, S is an output
`
`space of query M, and neighboring databases are defined as two data objects X, X’ that have at
`
`most one different entry from one another. That is, given two neighboring data objects X, X’ in
`
`which one has an individual’s data entry, and the other does not, there is no output of query M
`
`that an adversary can use to distinguish between X, X’. That is, an output of such a query M that
`
is differentially private reveals no information about the data object X. The privacy parameter ε
`
`controls the amount of information that the query M reveals about any individual data entry in X,
`
`11
`
`32552/32620/DOCS/3854318.11
`
`
`
`and represents the degree of information released about the entries in X. For example, in the
`
definition given above, a small value of ε indicates that the probability an output of query M will disclose information on a specific data entry is small, while a large value of ε indicates the
`
`opposite.
`
`[0038]
`
As another definition of differential privacy, a query M is (ε,δ)-differentially private if, for neighboring data objects X, X′:

$$\forall X, X' \in \mathbb{D},\ \forall S \subseteq \mathrm{Range}(M):\ \frac{\Pr[M(X) \in S]}{\Pr[M(X') \in S]} \le e^{\varepsilon} + \delta.$$

The privacy parameter δ measures the improbability of the output of query M satisfying ε-differential privacy. As discussed in reference to FIG. 1, the client 104 may specify the desired
`
values for the privacy parameters (ε,δ) for a query 108.
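The ratio bound in the definitions above can be checked empirically. The following sketch (illustrative only, not part of the described system) releases a count through Laplace noise of scale 1/ε and estimates the probability ratio on two neighboring data objects; the data values and the output set S are arbitrary choices:

```python
import math
import random

def laplace(scale: float) -> float:
    """Zero-centered Laplace sample via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def count_over_30(ages):
    return sum(1 for a in ages if a > 30)

def mechanism(ages, epsilon):
    # A count has global sensitivity 1, so Laplace(1/epsilon) noise
    # satisfies epsilon-differential privacy.
    return count_over_30(ages) + laplace(1.0 / epsilon)

random.seed(0)
epsilon = 0.5
X = [34, 28, 41, 55, 19, 62, 30, 47]   # data object X
X2 = X[:-1]                             # neighboring object X': one entry removed

trials = 100_000
S = lambda out: out >= 5.0              # an arbitrary output set S
p_x = sum(S(mechanism(X, epsilon)) for _ in range(trials)) / trials
p_x2 = sum(S(mechanism(X2, epsilon)) for _ in range(trials)) / trials

# The definition guarantees p_x / p_x2 <= exp(epsilon), up to sampling error.
ratio = p_x / p_x2
```

Here the estimated ratio sits at roughly $e^{\varepsilon}$, the worst case, because the output set boundary falls exactly between the two counts.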
`
`[0039]
`
`There are three important definitions for discussing the privacy system 160: global
`
sensitivity, local sensitivity, and smooth sensitivity. The global sensitivity of a query M is defined as

$$GS_M(X) = \max_{X, X' : d(X, X') = 1} \lVert M(X) - M(X') \rVert$$

where X, X′ are any neighboring data objects such that d(X, X′) = 1. This states that the global sensitivity is the most the output of query M could change by computing M on X and X′.
`
`[0040]
`
The local sensitivity of a query M on the data object X is given by:

$$LS_M(X) = \max_{X' : d(X, X') = 1} \lVert M(X) - M(X') \rVert$$

where the set $\{X' : d(X, X') = 1\}$ denotes all data objects that have at most one entry that is different from X. That is, the local sensitivity $LS_M(X)$ is the sensitivity of the output of the query M on data objects X′ that have at most one different entry from X, measured by a norm function.
`
`12
`
`32552/32620/DOCS/3854318.11
`
`
`
`[0041]
`
Related to the local sensitivity $LS_M(X)$, the smooth sensitivity given a parameter β is given by:

$$S_M(X; \beta) = \max_{X' \in \mathbb{D}} LS_M(X') \cdot e^{-\beta \cdot d(X, X')}$$

where d(X, X′) denotes the number of entries that differ between X and X′.
`
`Notation for Random Variables
`
`[0042]
`
`The notation in this section is used for the remainder of the application to denote the
`
`following random variables.
`
1) $G(\sigma^2)$ denotes a zero-centered Gaussian random variable with the probability density function

$$f(x \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}.$$

2) $L(b)$ denotes a zero-centered Laplacian random variable with the probability density function

$$f(x \mid b) = \frac{1}{2b}\, e^{-\frac{|x|}{b}}.$$

3) $C(\gamma)$ denotes a zero-centered Cauchy random variable with the probability density function

$$f(x \mid \gamma) = \frac{1}{\pi \gamma \left(1 + \left(\frac{x}{\gamma}\right)^2\right)}.$$
`
`[0043]
`
`Further, a vector populated with random variables R as its elements is denoted by
`
`v(R). A matrix populated with random variables R as its elements is denoted by M(R).
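For concreteness, the three random variables and the vector notation v(R) can be sampled as follows using standard inverse-transform methods (an illustrative sketch; the function names simply mirror the notation above):

```python
import math
import random

def G(sigma2: float) -> float:
    """Zero-centered Gaussian sample with variance sigma2."""
    return random.gauss(0.0, math.sqrt(sigma2))

def L(b: float) -> float:
    """Zero-centered Laplace sample with scale b (inverse transform)."""
    u = random.random() - 0.5
    return -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def C(gamma: float) -> float:
    """Zero-centered Cauchy sample with scale gamma (inverse transform)."""
    return gamma * math.tan(math.pi * (random.random() - 0.5))

def v(sample, p):
    """Vector of p i.i.d. draws, mirroring the v(R) notation."""
    return [sample() for _ in range(p)]

random.seed(1)
noise_vec = v(lambda: L(2.0), p=4)
```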
`
`Count Engine 302
`
`13
`
`32552/32620/DOCS/3854318.11
`
`
`
`[0044]
`
`The count engine 302 produces a DP response 112 responsive to the differentially
`
`private security system 102 receiving a query 108 for counting the number of entries in a column
`
`of the data object X that satisfy a condition specified by the client 104, given privacy parameters
`
(ε,δ). An example query command for accessing the count engine 302 is given in QC1 above. For the example data object X shown in FIG. 2, the client 104 may submit a query 108 to return a
`
`DP response 112 for the number of patients that are above the age of 30.
`
`[0045]
`
The count engine 302 retrieves the count q from X. If privacy parameter δ is equal to zero, the count engine 302 returns

$$y \approx q + L\left(c_1 \cdot \frac{1}{\varepsilon}\right)$$

as the DP response 112 for display on the user interface 150, where $c_1$ is a constant. An example value for $c_1$ may be 1. If the privacy parameter δ is non-zero, the count engine 302 returns

$$y \approx q + G\left(c_1 \cdot 2 \cdot \log\frac{2}{\delta} \cdot \frac{1}{\varepsilon^2}\right)$$

as the DP response 112 for display on the user interface 150, where $c_1$ is a constant. An example value for $c_1$ may be 1.
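The two release rules above can be sketched as follows (illustrative only; the Gaussian variance mirrors the expression above with c₁ = 1, which is an assumption rather than the system's exact calibration):

```python
import math
import random

def dp_count(data, predicate, epsilon, delta=0.0, c1=1.0):
    """Sketch of the count engine's release rule.

    delta == 0: add Laplace noise of scale c1 / epsilon.
    delta  > 0: add Gaussian noise of variance c1 * 2 * log(2/delta) / epsilon**2.
    """
    q = sum(1 for row in data if predicate(row))
    if delta == 0.0:
        u = random.random() - 0.5
        noise = -(c1 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    else:
        sigma2 = c1 * 2.0 * math.log(2.0 / delta) / epsilon**2
        noise = random.gauss(0.0, math.sqrt(sigma2))
    return q + noise

# Patients above age 30, echoing the FIG. 2 example (ages are hypothetical).
random.seed(0)
ages = [34, 28, 41, 55, 19, 62, 30, 47]
y_pure = dp_count(ages, lambda a: a > 30, epsilon=1.0)            # delta = 0
y_approx = dp_count(ages, lambda a: a > 30, epsilon=1.0, delta=1e-5)
```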
`
`[0046]
`
The client 104 may request visualization of entries in the data object X for analysis
`
`of trends or patterns that depend on the features of the entries. In one embodiment, the privacy
`
`system 160 generates a differentially private visualization of the requested data entries from X.
`
`FIG. 4 illustrates displaying results of a differentially private count query to the user interface of
`
`the client, according to one embodiment.
`
`[0047]
`
`The privacy system 160 first maps the requested entries from X for the selected
`
`features specified by the client 104. For example, as shown in the visualization 410 of FIG. 4, a
`
`series of requested entries are plotted depending on their values for Feature 1 and Feature 2. The
`
`14
`
`32552/32620/DOCS/3854318.11
`
`
`
`privacy system 160 then generates disjoint regions on the plot and retrieves the counts of entries
`
`in each of the disjoint regions. In visualization 410, the privacy system 160 divides the plot into
`
`disjoint squares and retrieves the count of entries in each square.
`
`[0048]
`
`For each disjoint region, the privacy system 160 submits a differentially private
`
`count query to the count engine 302, and randomly plots a number of entries determined by the
`
`DP response 112 of the count engine 302 for that region. The resulting DP visualization plot is
`
`returned to the client 104 for display to a user by the user interface 150. For example, square
`
`440 in visualization 410 contains 3 entries, while the same square in DP visualization 420
`
`contains 4 randomly plotted entries determined by the DP response 112 of the count engine 302.
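The visualization procedure of the preceding two paragraphs can be sketched as follows (illustrative; the grid granularity, data bounds, and the decision to noise only occupied cells are simplifying assumptions — a full implementation would also release noisy counts for empty cells):

```python
import math
import random

def laplace(scale):
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_scatter(points, cell, epsilon):
    """Replace raw points with random points drawn per noisy cell count.

    points: (x, y) feature pairs; cell: side length of the disjoint squares.
    Each cell's count is released with Laplace noise, and that many
    synthetic points are placed uniformly inside the cell.
    """
    counts = {}
    for x, y in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        counts[key] = counts.get(key, 0) + 1
    synthetic = []
    for (i, j), c in counts.items():
        noisy = max(0, round(c + laplace(1.0 / epsilon)))
        for _ in range(noisy):
            synthetic.append((
                (i + random.random()) * cell,
                (j + random.random()) * cell,
            ))
    return synthetic

random.seed(2)
pts = [(random.random() * 10, random.random() * 10) for _ in range(100)]
dp_pts = dp_scatter(pts, cell=2.5, epsilon=1.0)
```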
`
`Median Engine 304
`
`[0049]
`
`The median engine 304 produces a DP response 112 responsive to the differentially
`
`private security system 102 receiving a query 108 for generating the median of entries in a
`
`column of the data object X that satisfy a condition specified by the client 104, given privacy
`
parameters (ε,δ). An example query command for accessing the median engine 304 is given in QC2 above. For the example data object X shown in FIG. 2, the client 104 may submit a query
`
`108 to return a DP response 112 for the median age of all patients in X.
`
`[0050]
`
The median engine 304 aggregates the values of entries satisfying the condition specified by the client 104 into a list U, and retrieves the median q from U. If privacy parameter δ is equal to zero, the median engine 304 returns

$$y \approx q + c_1 \cdot S_M(U;\, c_2 \cdot \varepsilon) \cdot \frac{C(1)}{\varepsilon}$$

as the DP response 112 for display on the user interface 150, in which $c_1$, $c_2$ are constant factors. Example values for $c_1$, $c_2$ may be 6 and 1/6, respectively. If δ is non-zero, the median engine 304 returns

$$y \approx q + c_1 \cdot S_M\!\left(U;\, \frac{c_2 \cdot \varepsilon}{2 \cdot \log\frac{2}{\delta}}\right) \cdot \frac{L(1)}{\varepsilon/2}$$

as the DP response 112 for display on the user interface 150. Example values for $c_1$, $c_2$ may be 2 and 1, respectively.
`
`Mean Engine 306
`
`[0051]
`
The mean engine 306 produces a DP response 112 responsive to the differentially
`
`private security system 102 receiving a query 108 for generating the mean of entries in a column
`
`of the data object X that satisfy a condition specified by the client 104, given privacy parameters
`
(ε,δ). An example query command for accessing the mean engine 306 is given in QC3 above.
`
For the example data object X shown in FIG. 2, the client 104 may submit a query 108 to return a
`
`DP response 112 for generating the mean age of patients that are above the age of 30.
`
`[0052]
`
`The mean engine 306 aggregates the values of entries satisfying the condition
`
`specified by the client 104 into a list U. Assuming there are n values in U, the mean engine 306
`
further divides U into m sub-lists $V_{j=1,2,\dots,m}$, each with n/m values. The mean engine 306 aggregates each mean $r_j$ of sub-list $V_j$ into a list R. The mean engine 306 requests a differentially
`
`private median query of the values in R to the median engine 304. The resulting output from the
`
`median engine 304 is returned as the DP response 112 for display on the user interface 150.
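The subsample-and-aggregate pattern above can be sketched as follows (illustrative; the final step stands in for the median engine 304 with simple Laplace noise at an assumed sensitivity, rather than the smooth-sensitivity mechanism described earlier):

```python
import math
import random

def laplace(scale):
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_mean(values, m, epsilon, sensitivity):
    """Subsample-and-aggregate mean: split U into m sub-lists, take each
    sub-list's mean, then release a noisy median of those means.

    `sensitivity` is an assumed stand-in for the noise calibration the
    median engine 304 would perform.
    """
    n = len(values)
    size = n // m
    sublists = [values[i * size:(i + 1) * size] for i in range(m)]
    means = sorted(sum(v) / len(v) for v in sublists)
    med = means[m // 2] if m % 2 else (means[m // 2 - 1] + means[m // 2]) / 2
    return med + laplace(sensitivity / epsilon)

random.seed(3)
ages = [random.randint(18, 90) for _ in range(400)]
y = dp_mean(ages, m=20, epsilon=1.0, sensitivity=2.0)
```

Splitting into sub-lists limits how much any single entry can move the released value: one entry affects only one sub-list mean, and the median of the m means is robust to that change.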
`
`Variance Engine 308
`
`16
`
`32552/32620/DOCS/3854318.11
`
`
`
`[0053]
`
`The variance engine 308 produces a DP response 112 responsive to the
`
`differentially private security system 102 receiving a query 108 for generating the variance of
`
`entries in a column of the data object X that satisfy a condition specified by the client 104, given
`
privacy parameters (ε,δ). An example query command for accessing the variance engine 308 is
`
`given in QC4 above. For the example data object X shown in FIG. 2, the client 104 may submit
`
`a query 108 to return a DP response 112 for generating the variance of the age of all patients in
`
`X.
`
`[0054]
`
`The variance engine 308 aggregates the values of entries satisfying the condition
`
`specified by the client 104 into a list U. Assuming there are n values in U, the variance engine
`
308 further divides U into m sub-lists $V_{j=1,2,\dots,m}$, each with n/m values. The variance engine 308 aggregates each variance $r_j$ of sub-list $V_j$ into a list R. The variance engine 308 requests a
`
`differentially private median query of the values in R to the median engine 304. The resulting
`
`output from the median engine 304 is returned as the DP response 112 for display on the user
`
`interface 150.
`
IQR Engine 310
`
`[0055]
`
`The IQR engine 310 produces a DP response 112 responsive to the differentially
`
`private security system 102 receiving a query 108 for generating the interquartile range (IQR) of
`
entries in a column of the data object X that satisfy a condition specified by the client 104, given privacy parameters (ε,δ). An example query command for accessing the IQR engine 310 is given in QC5 above. For the example data object X shown in FIG. 2, the client 104 may submit a
`
`query 108 to return a DP response 112 for generating the IQR of the age of all patients in X.
`
`17
`
`32552/32620/DOCS/3854318.11
`
`
`
`[0056]
`
`In one embodiment, the IQR engine 310 aggregates the values of entries satisfying
`
`the condition specified by the client 104 into a list U. Assuming there are n values in U, the
`
sample IQR of U is denoted as IQR(U), and a log transform of IQR(U) is denoted as:

$$H_n(U) = \log_{1 + \frac{1}{\log n}} IQR(U).$$

The IQR engine 310 further maps the quantity $H_n(U)$ to an integer $k_0$ such that $H_n(U) \in [k_0, k_0 + 1)$. The IQR engine 310 extracts a value $A_0(U)$ indicating the number of entries in U required to change in order for the new list U to satisfy $H_n(U) \notin [k_0, k_0 + 1)$.
`
`[0057]
`
The IQR engine 310 then generates a value $R_0(U)$ given by:

$$R_0(U) \approx A_0(U) + L\left(\frac{1}{c_1 \cdot \varepsilon}\right)$$

in which $c_1$ is a constant factor. If $R_0(U)$ is greater than a predetermined threshold, the IQR engine 310 returns

$$y = IQR(U) \cdot \left(1 + \frac{1}{\log n}\right)^{L\left(\frac{1}{c_1 \cdot \varepsilon}\right)}$$

as the DP response 112 for display on the user interface 150. If $R_0(U)$ is equal to or less than the predetermined threshold, the IQR engine 310 returns “No Answer” as the DP response 112 for display on the user interface 150.
`
`[0058]
`
`In another embodiment, the IQR engine 310 aggregates the values of entries
`
satisfying the condition specified by the client 104 into an ordered list U. The IQR engine 310 retrieves the first quartile and the third quartile from U, given by q and q′, respectively. If δ is zero, the IQR engine 310 returns:

$$y \approx \left(q' + c_1 \cdot S_M(U;\, c_2 \cdot \varepsilon) \cdot \frac{C(1)}{\varepsilon}\right) - \left(q + c_1 \cdot S_M(U;\, c_2 \cdot \varepsilon) \cdot \frac{C(1)}{\varepsilon}\right)$$

as the DP response 112 for display on the user interface 150, in which $c_1$, $c_2$ are constant factors.
`
`[0059]
`
If δ is non-zero, the IQR engine 310 returns:

$$y \approx \left(q' + c_1 \cdot S_M\!\left(U;\, \frac{c_2 \cdot \varepsilon}{2 \cdot \log\frac{2}{\delta}}\right) \cdot \frac{L(1)}{\varepsilon/2}\right) - \left(q + c_1 \cdot S_M\!\left(U;\, \frac{c_2 \cdot \varepsilon}{2 \cdot \log\frac{2}{\delta}}\right) \cdot \frac{L(1)}{\varepsilon/2}\right)$$

as the DP response 112 for display on the user interface 150, in which $c_1$, $c_2$ are constant factors.
`
`Batch Gradient Engine 312
`
`[0060]
`
`The batch gradient engine 312 produces a DP response 112 responsive to the
`
`differentially private security system 102 receiving a valid query 108 for generating a set of
`
parameters θ for a general linear model that captures the correlation between a series of
`
observable features and a dependent feature, given privacy parameters (ε,δ). The general linear
`
`model is trained on the selected columns of X. An example query command for accessing the
`
`batch gradient engine 312 is given in QC6 above.
`
`[0061]
`
Given a row vector x that contains a series of observable features and a label feature
`
`y, the correlation between the observable features and the label feature in a general linear model
`
`may be given as:
`
$$y = x\,\theta^T,$$

where θ is a row vector containing parameters of the model. That is, the label feature is modeled as a weighted sum of the observable features, where each value in θ is the weight given to a
`
`corresponding observable feature.
`
`[0062]
`
`For the example data object X shown in FIG. 2, the client 104 may submit a query
`
108 to return a DP response 112 for generating a set of parameters θ for a general linear model
`
`that captures the correlation between the height of the patients (observable feature) and the age
`
`19
`
`32552/32620/DOCS/3854318.11
`
`
`
`of the patients (label feature). As another example, the features may be categorical in nature, and
`
the requested parameters θ may be for a general linear model that captures the correlation
`
`between the height, age, residence of the patients (observable features) and whether the patient
`
will contract or has contracted a disease (label feature).
`
`[0063]
`
`Examples of general linear models supported by the batch gradient engine 312 are,
`
`but not limited to, linear regression, logistic regression, and support vector machine (SVM)
`
classifiers.
`
`[0064]
`
The optimal values for the set of parameters θ are found by training the general linear model on training data $(X_{train}, y_{train})$ consisting of selected columns of data object X. Specifically, $X_{train}$ is a matrix database in which each column corresponds to a selected observable feature, and $y_{train}$ is a column vector of the selected label feature values. Each entry in $X_{train}$ has a one-to-one correspondence with an entry in $y_{train}$. The optimal θ is generally found by minimizing a loss function on $(X_{train}, y_{train})$ over possible values of θ. Mathematically, the minimization is given by:

$$\theta = \operatorname*{argmin}_{\theta}\ \ell(X_{train}, y_{train}; \theta).$$
`
`[0065]
`
The batch gradient engine 312 returns a DP response 112 $\theta_{DP}$ of a differentially private batch gradient query by perturbing the loss function to be minimized. Specifically, the perturbed minimization is given by:

$$\theta_{DP} = \operatorname*{argmin}_{\theta}\ \ell(X_{train}, y_{train}; \theta) + \theta^T v\!\left(G\left(4 \cdot K^2 \cdot R_2^2 \cdot \frac{2 \cdot \log\frac{2}{\delta} + \varepsilon}{\varepsilon^2}\right)\right),$$

in which K is the Lipschitz constant.
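The objective-perturbation step can be sketched as follows (illustrative; the noise variance mirrors the reconstructed expression above, K and R₂ are assumed bounds on the loss's Lipschitz constant and the row norms, and plain gradient descent stands in for whatever solver the engine uses):

```python
import math
import random

def objective_perturbed_fit(X, y, epsilon, delta, K=1.0, R2=1.0,
                            lr=0.01, steps=2000):
    """Fit y ~ x . theta by gradient descent on a perturbed squared loss.

    The extra linear term theta . b, with Gaussian vector b, implements
    the objective-perturbation idea sketched in the text.
    """
    p = len(X[0])
    sigma2 = 4.0 * K**2 * R2**2 * (2.0 * math.log(2.0 / delta) + epsilon) / epsilon**2
    b = [random.gauss(0.0, math.sqrt(sigma2)) for _ in range(p)]
    theta = [0.0] * p
    n = len(X)
    for _ in range(steps):
        grad = list(b)  # gradient of the perturbation term theta . b
        for xi, yi in zip(X, y):
            pred = sum(t * v for t, v in zip(theta, xi))
            for j in range(p):
                grad[j] += 2.0 * (pred - yi) * xi[j] / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

random.seed(4)
X = [[random.uniform(-1, 1)] for _ in range(200)]
y = [2.0 * xi[0] + random.gauss(0, 0.1) for xi in X]
theta_dp = objective_perturbed_fit(X, y, epsilon=5.0, delta=1e-3)
```

Because the perturbation enters the objective rather than the final parameters, the minimizer itself is randomized, which is the distinction the text draws between perturbing a query's output and perturbing the process that generates it.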
