(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date: 15 October 2015 (15.10.2015)

(10) International Publication Number: WO 2015/157020 A1
(51) International Patent Classification:
G06F 21/62 (2013.01); G06Q 10/04 (2012.01)

(21) International Application Number: PCT/US2015/023336

(22) International Filing Date: 30 March 2015 (30.03.2015)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 61/978,260, 11 April 2014 (11.04.2014), US

(71) Applicant: THOMSON LICENSING [FR/FR]; 1-5 rue Jeanne d'Arc, F-92130 Issy-les-Moulineaux (FR).

(72) Inventors: KVETON, Branislav; 663 Helweh Ct., San Jose, California 95126 (US). SALAMATIAN, Salman; 42 avenue de Vessy, F-01210 Ferney-Voltaire (FR). FAWAZ, Nadia; 1531 Bellomy Street, Santa Clara, California 95050 (US). TAFT, Nina; 4265 24th Street, San Francisco, California 94114 (US).

(74) Agents: SHEDD, Robert, D. et al.; 4 Research Way, 3rd Floor, Princeton, New Jersey 08540 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).

(54) Title: METHOD AND APPARATUS FOR SPARSE PRIVACY PRESERVING MAPPING

Published: with international search report (Art. 21(3))
(57) Abstract: A user may wish to release some public data, which is correlated with his private data, to an analyst in the hope of getting some utility. The public data can be distorted before its release according to a probabilistic privacy preserving mapping mechanism, which limits information leakage under utility constraints. The present principles provide a solution to speed up the computation of privacy preserving mappings. In particular, we recognize that privacy preserving mappings are sparse, that is, public data of a user may only be mapped to a limited selection of data points with non-zero probabilities. Subsequently, we generate sparse privacy preserving mappings by recasting the optimization problem as a sequence of linear programs and solving each of these incrementally using an adaptation of Dantzig-Wolfe decomposition.
[Front-page figure, Fig. 7: flow diagram 700 (steps 705-740): Initialization; Collect statistical information about public and private data; Determine a sparse privacy preserving mapping based on the statistical information and a utility constraint; Distort public data to be released to preserve privacy; Release data.]
Method and Apparatus for Sparse Privacy Preserving Mapping

CROSS-REFERENCE TO RELATED APPLICATIONS
[1] This application is related to (1) U.S. Provisional Patent Application Serial No. 61/691,090 filed on August 20, 2012, and titled "A Framework for Privacy against Statistical Inference" (hereinafter "Fawaz"); (2) U.S. Provisional Patent Application Serial No. 61/867,544 filed on August 19, 2013, and titled "Method and Apparatus for Utility-Aware Privacy Preserving Mapping against Inference Attacks" (hereinafter "Fawaz2"); and (3) U.S. Provisional Patent Application Serial No. 61/867,546 filed on August 19, 2013, and titled "Method and Apparatus for Utility-Aware Privacy Preserving Mapping through Additive Noise" (hereinafter "Fawaz3"). The provisional applications are expressly incorporated by reference herein in their entirety.
TECHNICAL FIELD

[2] This invention relates to a method and an apparatus for preserving privacy, and more particularly, to a method and an apparatus for quickly generating a privacy preserving mapping.
BACKGROUND

[3] Finding the right balance between privacy risks and big data rewards is a big challenge facing society today. Big data creates tremendous opportunity, especially for the many services that offer advice on everything including movies, TV shows, restaurants, music, sleep, exercise, vacation, entertainment, shopping, and even friends. On one hand, people are willing to part with some of their personal data (e.g., movie watching history) for the sake of these services.
The service, or other benefit, that the user derives from allowing access to the user's data may be referred to as utility. On the other hand, many users have some data about themselves they would prefer to keep private (e.g., their political affiliation, salary, pregnancy status, religion). Most individuals have both public and private data and hence they need to maintain a boundary between these different elements of their personal information. This is an enormous challenge because inference analysis on publicly released data can often uncover private data.
SUMMARY
[4] The present principles provide a method for processing user data for a user, comprising: accessing the user data, which includes private data and public data; determining a set of values that the public data of the user can map to, wherein the size of the set of values is small; determining a privacy preserving mapping that maps the public data to released data, wherein the public data of the user only maps to values within the determined set of values; modifying the public data of the user based on the privacy preserving mapping; and releasing the modified data as the released data to at least one of a service provider and a data collecting agency. The present principles also provide an apparatus for performing these steps.
[5] The present principles also provide a method for processing user data for a first user and a second user, comprising: accessing the user data, which includes private data and public data; determining a first set of values that the public data of the first user can map to, wherein the size of the first set of values is on the order of ten; determining a second set of values that the public data of the second user can map to, wherein the size of the determined second set of values is on the order of ten, and the determined first set of values is different from the determined second set of values; determining a privacy preserving mapping that maps the public data to released data, wherein the public data of the first user only maps to values within the determined first set of values and the public data of the second user only maps to values within the determined second set of values; modifying the public data of the first user and the second user based on the privacy preserving mapping; and releasing the modified data as the released data to at least one of a service provider and a data collecting agency. The present principles also provide an apparatus for performing these steps.
[6] The present principles also provide a computer readable storage medium having stored thereon instructions for processing user data according to the methods described above.
BRIEF DESCRIPTION OF THE DRAWINGS

[7] FIG. 1A and FIG. 1B are pictorial examples illustrating exemplary privacy preserving mappings under small and large distortion constraints, respectively.

[8] FIGs. 2A, 2B and 2C are pictorial examples illustrating the effect of parameters on the privacy-distortion tradeoff using synthetic data, census data and movie data, respectively.

[9] FIGs. 3A, 3B and 3C are pictorial examples illustrating ROC curves for a Naive Bayes classifier with distortion constraint $\Delta = 0.02$, $0.14$ and $0.44$, respectively, using census data.

[10] FIGs. 4A, 4B and 4C are pictorial examples illustrating ROC curves for logistic regression with distortion constraint $\Delta = 0.04$, $0.13$ and $0.22$, respectively, using movie data.

[11] FIG. 5 is a pictorial example illustrating the behavior of Sparse Privacy Preserving Mappings (SPPM) and the exponential mechanism (ExpMec) using synthetic data.

[12] FIG. 6 is a pictorial example illustrating time complexity and parameter sensitivity using synthetic data.
[13] FIG. 7 is a flow diagram depicting an exemplary method for preserving privacy, in accordance with an embodiment of the present principles.

[14] FIG. 8 is a block diagram depicting an exemplary privacy agent, in accordance with an embodiment of the present principles.

[15] FIG. 9 is a block diagram depicting an exemplary system that has multiple privacy agents, in accordance with an embodiment of the present principles.
DETAILED DESCRIPTION

[16] A number of research efforts have explored the idea of distorting the public data before they are released to preserve the user's privacy. In some of these prior efforts, distortion aims at creating some confusion around user data, by making its value hard to distinguish from other possible values; in other efforts, distortion is designed to counter a particular inference threat (i.e., a specific classifier or analysis). Recently Fawaz proposed a new framework for data distortion, based on information theory, that captures privacy leakage in terms of mutual information, where mutual information is a measure of mutual dependence between two random variables. Minimizing the mutual information between a user's private data and released data is attractive because it reduces the correlation between the private data and the publicly released data, and thus any inference analysis that tries to learn the private data from the publicly released data is rendered weak, if not useless. In other words, this approach is agnostic to the type of inference analysis used in any given threat.
[17] In the present application, we refer to the data a user would like to remain private as "private data," the data the user is willing to release as "public data," and the data the user actually releases as "released data." For example, a user may want to keep his political opinion private, and is willing to release his TV ratings with modification (for example, the user's actual rating of a program is 4, but he releases the rating as 3). In this case, the user's political opinion is considered to be private data for this user, the TV ratings are considered to be public data, and the released modified TV ratings are considered to be the released data. Note that another user may be willing to release both political opinion and TV ratings without modifications, and thus, for this other user, there is no distinction between private data, public data and released data when only political opinion and TV ratings are considered. If many people release political opinions and TV ratings, an analyst may be able to derive the correlation between political opinions and TV ratings, and thus, may be able to infer the political opinion of the user who wants to keep it private.
[18] The term analyst, which for example may be a part of a service provider's system, as used in the present application, refers to a receiver of the released data, who ostensibly uses the data in order to provide utility to the user. Often the analyst is a legitimate receiver of the released data. However, an analyst could also illegitimately exploit the released data and infer some information about private data of the user. This creates a tension between privacy and utility requirements. To reduce the inference threat while maintaining utility, the user may release a "distorted version" of data, generated according to a conditional probabilistic mapping, called "privacy preserving mapping" or "privacy mapping," designed under a utility constraint.
[19] Private data refers to data that the user not only indicates should not be publicly released, but also does not want to be inferred from other data that he would release. Public data is data that the user would allow the privacy agent to release, possibly in a distorted way to prevent the inference of the private data.
[20] In one embodiment, public data is the data that the service provider requests from the user in order to provide him with the service. The user, however, will distort (i.e., modify) it before releasing it to the service provider. In another embodiment, public data is the data that the user indicates as being "public" in the sense that he would not mind releasing it as long as the release takes a form that protects against inference of the private data.
[21] As discussed above, whether a specific category of data is considered private data or public data depends on the point of view of a specific user. For ease of notation, we label a specific category of data as private or public from the perspective of the current user. For example, when trying to design a privacy preserving mapping for a current user who wants to keep his political opinion private, we call the political opinion private data for both the current user and for another user who is willing to release his political opinion.
[22] In the present principles, we use the distortion between the released data and public data as a measure of utility. When the distortion is larger, the released data is more different from the public data, and more privacy is preserved, but the utility derived from the distorted data may be lower for the user. On the other hand, when the distortion is smaller, the released data is a more accurate representation of the public data and the user may receive more utility, for example, receive more accurate content recommendations.
[23] Distorting data or modifying data, in the context of recommendation systems, means altering a user's profile. The framework of Fawaz casts the privacy problem as a convex optimization problem with linear constraints, where the number of variables grows quadratically with the size of the underlying alphabet that describes user profiles. Because the alphabet size can be huge, the enormous number of options for distorting user profiles presents a scalability challenge.
[24] There have been existing works on protecting privacy against statistical inference. In a work by Salman Salamatian et al., titled "How to hide the elephant - or the donkey - in the room: Practical privacy against statistical inference for large data," IEEE GlobalSIP 2013, a method based on quantization was proposed to reduce the number of optimization variables. It was shown that the reduction in complexity does not affect the privacy levels that can be achieved, but comes at the expense of additional distortion. In Fawaz3, privacy mappings in the class of additive noise were considered. The parametric additive noise model makes it possible to reduce the number of optimization variables to the number of noise parameters. However, this suboptimal solution is not suitable for perfect privacy, as it requires a high distortion.
[25] The use of the information theoretic framework of Fawaz relies on a local privacy setting, where users do not trust the analyst collecting data; thus each user holds his data locally, and passes it through a privacy preserving mechanism before releasing it to the untrusted analyst. Local privacy dates back to randomized response in surveys, and has been considered in privacy for data mining and statistics. Information theoretic privacy metrics have also been considered. Finally, differential privacy is currently the prevalent notion of privacy in privacy research. In the present application, we use the exponential mechanism (ExpMec) as a baseline against which to compare our privacy mapping.
[26] The general problem of minimizing a convex function under convex constraints has been studied extensively, and is of central importance in many machine learning tasks. The idea of sparse approximate solutions to those problems has also been studied in the literature, and is often called Sparse Greedy Approximation. This type of algorithm has been used with success in many applications such as neural networks, matrix factorization, SVMs, boosting, and others. Perhaps the most basic form of Sparse Greedy Approximation arises when using the Frank-Wolfe algorithm for a convex problem over the simplex.
[27] In the present application, we consider the setting described in Fawaz, where a user has two kinds of data: a vector of personal data $A \in \mathcal{A}$ that he would like to remain private, e.g., his income level, his political views, and a vector of data $B \in \mathcal{B}$ that he is willing to release publicly and from which he will derive some utility, e.g., the release of his media preferences to a service provider would allow the user to receive content recommendations. $\mathcal{A}$ and $\mathcal{B}$ are the sets from which $A$ and $B$ can assume values. We assume that the user's private data $A$ are linked to his public data $B$ by the joint probability distribution $p_{A,B}$. Thus, an analyst who would observe $B$ could infer some information about $A$. To reduce this inference threat, instead of releasing $B$, the user releases a distorted version of $B$, denoted as $\hat{B} \in \hat{\mathcal{B}}$. $\hat{\mathcal{B}}$ is a set from which $\hat{B}$ can assume values, and $\hat{B}$ is generated according to a conditional probabilistic mapping $p_{\hat{B}|B}$, called the privacy preserving mapping. Note that the set $\hat{\mathcal{B}}$ may differ from $\mathcal{B}$. This setting is reminiscent of the local privacy setting (e.g., randomized response, input perturbation), where users do not trust the analyst collecting data; thus each user holds his data locally, and passes it through a privacy preserving mechanism before releasing it.
[28] The privacy mapping $p_{\hat{B}|B}$ is designed to render any statistical inference of $A$ based on the observation of $\hat{B}$ harder, while preserving some utility to the released data $\hat{B}$, by limiting the distortion caused by the mapping. Following the framework for privacy-utility against statistical inference in Fawaz, the inference threat is modeled by the mutual information $I(A;\hat{B})$ between the private data $A$ and the publicly released data $\hat{B}$, while the utility requirement is modeled by a constraint on the average distortion $\mathbb{E}_{B,\hat{B}}[d(B,\hat{B})] \le \Delta$, for some distortion metric $d: \mathcal{B} \times \hat{\mathcal{B}} \to \mathbb{R}^{+}$, and $\Delta \ge 0$. Note that this general framework does not assume a particular inference algorithm. In the case of perfect privacy ($I(A;\hat{B}) = 0$), the privacy mapping $p_{\hat{B}|B}$ renders the released data $\hat{B}$ statistically independent from the private data $A$.
[29] Both the mutual information $I(A;\hat{B})$ and the average distortion $\mathbb{E}_{B,\hat{B}}[d(B,\hat{B})]$ depend on both the prior distribution $p_{A,B}$ and the privacy mapping $p_{\hat{B}|B}$, since $A \to B \to \hat{B}$ form a Markov chain. To stress that $I(A;\hat{B})$ depends on both $p_{A,B}$ and $p_{\hat{B}|B}$, we will write $I(A;\hat{B})$ as $J(p_{A,B}, p_{\hat{B}|B})$. Consequently, given a prior $p_{A,B}$ linking the private data $A$ and the public data $B$, the privacy mapping $p_{\hat{B}|B}$ minimizing the inference threat subject to a distortion constraint is obtained as the solution to the following convex optimization problem:

$$\min_{p_{\hat{B}|B} \in \text{Simplex}} \; J(p_{A,B}, p_{\hat{B}|B}) \quad \text{s.t.} \quad \mathbb{E}_{B,\hat{B}}[d(B,\hat{B})] \le \Delta, \qquad (1)$$

where Simplex denotes the probability simplex ($\sum_{x} p(x) = 1$, $p(x) \ge 0 \; \forall x$).
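As a concrete illustration of problem (1), the following Python sketch (our own illustration, not part of the present principles; the array names and shapes are assumptions) evaluates the objective $J(p_{A,B}, p_{\hat{B}|B}) = I(A;\hat{B})$ for finite alphabets:

```python
import numpy as np

def mutual_information_objective(p_ab, p_map):
    """Evaluate J(p_AB, p_Bhat|B) = I(A; Bhat) in bits.

    p_ab:  |A| x |B| array holding the joint prior p_{A,B}.
    p_map: |Bhat| x |B| array holding the mapping p_{Bhat|B} (columns sum to 1).
    """
    # Markov chain A -> B -> Bhat: p(a, bhat) = sum_b p(a, b) * p(bhat | b).
    p_a_bhat = p_ab @ p_map.T
    p_a = p_a_bhat.sum(axis=1, keepdims=True)      # marginal of A
    p_bhat = p_a_bhat.sum(axis=0, keepdims=True)   # marginal of Bhat
    mask = p_a_bhat > 0
    return float((p_a_bhat[mask]
                  * np.log2(p_a_bhat[mask] / (p_a * p_bhat)[mask])).sum())
```

With the identity mapping the function returns $I(A;B)$ itself, while a mapping whose columns are all identical makes $\hat{B}$ independent of $A$ and drives the objective to zero, the perfect-privacy case.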
[30] The present principles propose methods to reduce computation complexity when designing privacy preserving mappings. In particular, we propose to exploit the structure of the optimization problem to achieve a computational speedup that allows scalability. To be more precise, we solve the above problem efficiently when the sets $\mathcal{B}$ and $\hat{\mathcal{B}}$ are large. In one embodiment, by studying smaller scale problems, both analytically and empirically, we identify that mappings to distort profiles are in fact naturally sparse. We leverage this observation to develop sparse privacy preserving mappings (SPPM) that handle scalability. Although the underlying optimization problem has linear constraints, its objective function is non-linear. We use the Frank-Wolfe algorithm, which approximates the objective via a sequence of linear approximations; this allows solving the problem as a sequence of linear programs that can be solved quickly. In addition, we limit the number of alternate user profiles to a small number; this can be practical as long as the alternates are selected smartly. To do this, we adapt the Dantzig-Wolfe decomposition to the structure of our problem. Overall we reduce the number of variables from quadratic to linear in the number of user profiles. To the best of our knowledge, this work is the first to apply large scale linear programming optimization techniques to privacy problems.
[31] We also provide a detailed evaluation on three datasets, in which we compare our solution to an optimal one (when feasible) and to a state-of-the-art solution based on differential privacy (called the exponential mechanism). We find that our solutions are close to optimal, and consistently outperform the exponential mechanism (ExpMec) approach in that we achieve more privacy with less distortion. We show that our methods scale well with respect to the number of user profiles and their underlying alphabet.

[32] In the following, the sparsity property of the privacy preserving mappings, the proposed sparse privacy preserving mappings and evaluation results will be discussed in great detail.

[33] Sparsity of the Privacy Preserving Mappings
[34] When applying the aforementioned privacy-utility framework to large data, we encounter a challenge of scalability. Designing the privacy mapping requires characterizing the value of $p_{\hat{B}|B}(\hat{b}|b)$ for all possible pairs $(b,\hat{b}) \in \mathcal{B} \times \hat{\mathcal{B}}$, i.e., solving the convex optimization problem over $|\mathcal{B}||\hat{\mathcal{B}}|$ variables. When $\hat{\mathcal{B}} = \mathcal{B}$, and the size of the alphabet $|\mathcal{B}|$ is large, solving the optimization over $|\mathcal{B}|^2$ variables may become intractable.
[35] A natural question is whether $\hat{\mathcal{B}} = \mathcal{B}$ is a necessary assumption to achieve the optimal privacy-utility tradeoff. In other words, does the optimization problem (1) need to be solved over $|\mathcal{B}|^2$ variables to achieve optimality? We use the following theoretical example to motivate a sparser approach to the design of the privacy mapping.
[36] Example 1: Let $A \in \{0,1\}$, and $B \in \{1, 2, \ldots, 2^m\}$, and define the joint distribution $p_{A,B}$ such that $p(A=0) = p(A=1) = \frac{1}{2}$, and for $i \in \{1, 2, \ldots, 2^m\}$, let $p(B=i \,|\, A=0) = \frac{1}{2^{m-1}}$ if $i \le 2^{m-1}$, 0 otherwise; and let $p(B=i \,|\, A=1) = \frac{1}{2^{m-1}}$ if $i > 2^{m-1}$, 0 otherwise. For this example, the privacy threat is the worst it could be, as observing $B$ deterministically determines the value of the private random variable $A$ (equivalently, $I(A;B) = H(A) = 1$). In FIG. 1, we consider an $\ell_2$ distortion measure $d(B,\hat{B}) = (B - \hat{B})^2$ and illustrate the optimal mappings solving Problem (1) for different distortion values. Darker colors mean that the probability of mapping the corresponding point to the other one is larger. For small distortions, the blackest diagonal in FIG. 1A shows that most points $B = b$ are only mapped to themselves, $\hat{B} = b$ (i.e., with a mapping probability of 100%), and only a small number of points in $\mathcal{B}$ (around $B = 65$) may be mapped to different points in $\hat{\mathcal{B}}$. That is, for a given data point in $\mathcal{B}$, the privacy preserving mapping only chooses from a small set of values from $\hat{\mathcal{B}}$, rather than from the entire set $\hat{\mathcal{B}}$. As we increase the distortion level, more points in $\mathcal{B}$ get mapped to a larger number of points in $\hat{\mathcal{B}}$. In both FIG. 1A and FIG. 1B, we notice that the optimal mappings (i.e., the points shown on the curves) only occupy a very small portion of the 2-D space $\mathcal{B} \times \hat{\mathcal{B}}$. Thus, we consider the optimal privacy preserving mapping as sparse.
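Example 1 is easy to reproduce numerically. A minimal sketch (our own, with the arbitrary choice m = 4) confirms that observing $B$ reveals $A$ exactly:

```python
import numpy as np

m = 4
n = 2 ** m                            # |B| = 2^m
p_ab = np.zeros((2, n))               # joint p_{A,B}, rows indexed by A
p_ab[0, : n // 2] = 0.5 / (n // 2)    # p(A=0)=1/2, B uniform on the lower half
p_ab[1, n // 2 :] = 0.5 / (n // 2)    # p(A=1)=1/2, B uniform on the upper half

p_a = p_ab.sum(axis=1, keepdims=True)
p_b = p_ab.sum(axis=0, keepdims=True)
mask = p_ab > 0
i_ab = (p_ab[mask] * np.log2(p_ab[mask] / (p_a * p_b)[mask])).sum()
print(i_ab)   # -> 1.0 bit = H(A): B determines A, the worst-case threat
```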
[37] This theoretical example, as well as experiments on other datasets such as the census data, have shown that the optimal privacy preserving mapping may be sparse, in the sense that the support (i.e., the set of points $\hat{B} = \hat{b}$ to which $B = b$ can be mapped with a non-zero probability) of $p_{\hat{B}|B}(\hat{b}|b)$ may be of much smaller size than $\hat{\mathcal{B}}$, and may differ for different values of $B = b$. We propose to exploit sparsity properties of the privacy mapping to speed up the computation, by choosing the support of $p_{\hat{B}|B}(\hat{b}|b)$ in an iterative, greedy way.
[38] Using an example, we explain some of the notations used in the present application. We assume a group of ten people who consider age and income as private data, and consider gender and education as public data. We call age, income, gender, or education an attribute of a user, wherein the group of people may have values for the attributes as:

- age: {20-30, 30-40, 40-50};
- income: {≤50K, >50K};
- gender: {male, female};
- education: {high school, bachelor, postgraduate}.

In this example, private data $A$ is a vector of random variables {age, income}, and public data $B$ is a vector of random variables {gender, education}. For a particular person in this group, who is a 29-year-old woman with a bachelor's degree and makes more than 50K, her private data a = (age: 20-30, income: >50K), and public data b = (gender: female, education: bachelor). Her user profile can be set to (gender: female, education: bachelor). The set $\hat{\mathcal{B}}$ may comprise {(male, high school), (male, bachelor), (male, postgraduate), (female, high school), (female, bachelor), (female, postgraduate)}. The set $\hat{\mathcal{B}}$ may also be smaller; for example, it may comprise {(male, high school), (male, bachelor), (male, postgraduate), (female, bachelor), (female, postgraduate)} if every woman in this group of people has a bachelor or postgraduate degree. Each element (for example, (male, bachelor)) of $\hat{\mathcal{B}}$ is also referred to as a data point or possible value of $\hat{B}$. The set $\hat{\mathcal{B}}$ may be identical to $\mathcal{B}$. In order to protect his or her age/income information, a person may modify the gender and/or education information, such that he or she appears as another person.
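For illustration, the alphabets of this example can be enumerated directly in code (a sketch with our own variable names):

```python
from itertools import product

genders = ["male", "female"]
educations = ["high school", "bachelor", "postgraduate"]

# Full public-data alphabet B: every (gender, education) pair, 6 data points.
B_full = list(product(genders, educations))

# A smaller released alphabet Bhat, e.g., if every woman in the group holds
# at least a bachelor's degree:
B_hat = [b for b in B_full if b != ("female", "high school")]

print(len(B_full), len(B_hat))   # -> 6 5
```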
[39] Sparse Privacy Preserving Mapping (SPPM)
[40] Before we describe our algorithm, we rewrite the optimization problem (1). Let $X$ be an $n \times n$ matrix of variables to be optimized, whose entries are defined as $x_{i,j} = p_{\hat{B}|B}(\hat{b}_i | b_j)$, and let $x_j$ be the $j$-th column of $X$, where $n$ is the cardinality of $\mathcal{B}$ (i.e., $n = |\mathcal{B}|$, also referred to as the alphabet size). Then the objective function $J(p_{A,B}, p_{\hat{B}|B})$ can be written as $f(X)$, some function of $X$, and the distortion constraint can be written as $\sum_{j=1}^{n} d_j^T x_j \le \Delta$, where each

$$d_j = p_B(b_j)\,\big(d(\hat{b}_1, b_j),\, d(\hat{b}_2, b_j),\, \ldots,\, d(\hat{b}_n, b_j)\big)^T$$

is a vector of length $n$ that represents the distortion metric scaled by the probability of the corresponding symbol $b_j$. The marginal of $B$ is computed as $p_B(b_j) = \sum_{a} p_{A,B}(a, b_j)$. Finally, the simplex constraint can be written as $\mathbf{1}_n^T x_j = 1$ for all $j$, where $\mathbf{1}_n$ is an all-ones vector of length $n$. Given the new notation, optimization problem (1) can be written as:

$$\begin{aligned} \min_{X} \quad & f(X) \\ \text{subject to} \quad & \sum_{j=1}^{n} d_j^T x_j \le \Delta \\ & \mathbf{1}_n^T x_j = 1 \quad \forall j = 1, \ldots, n \\ & X \ge 0 \end{aligned} \qquad (2)$$

where $X \ge 0$ is an entry-wise inequality.
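The notation of (2) maps directly onto array operations. A minimal sketch (our own variable names and shapes, not code from the present principles) builds the scaled distortion vectors $d_j$ and evaluates the distortion constraint for a candidate mapping $X$:

```python
import numpy as np

def distortion_matrix(p_ab, dist):
    """Column j is d_j = p_B(b_j) * (d(bhat_1,b_j), ..., d(bhat_n,b_j))^T.

    p_ab: |A| x n joint prior; dist: n x n matrix with dist[i, j] = d(bhat_i, b_j).
    """
    p_b = p_ab.sum(axis=0)             # p_B(b_j) = sum_a p_{A,B}(a, b_j)
    return dist * p_b[np.newaxis, :]   # scale column j by p_B(b_j)

def expected_distortion(X, D):
    """sum_j d_j^T x_j, the left-hand side of the distortion constraint in (2)."""
    return float((D * X).sum())
```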
[41] The optimization problem (1) has linear constraints, but its objective function is non-linear because of the way mutual information is computed. In one embodiment, we solve the problem as a sequence of linear programs, also known as the Frank-Wolfe method. Each iteration $\ell$ of the method consists of three major steps. First, we compute the gradient $\nabla_X f(X_{\ell-1})$ at the solution from the previous step $X_{\ell-1}$. The gradient is an $n \times n$ matrix $C$, where $c_{i,j} = \frac{\partial}{\partial x_{i,j}} f(X_{\ell-1})$ is a partial derivative of the objective function with respect to the variable $x_{i,j}$. Second, we find a feasible solution $X'$ in the direction of the gradient. This problem is solved as a linear program with the same constraints as the optimization problem (1):

$$\begin{aligned} \min_{X} \quad & \sum_{j=1}^{n} c_j^T x_j \\ \text{subject to} \quad & \sum_{j=1}^{n} d_j^T x_j \le \Delta \\ & \mathbf{1}_n^T x_j = 1 \quad \forall j = 1, \ldots, n \\ & X \ge 0 \end{aligned} \qquad (3)$$

where $c_j$ is the $j$-th column of $C$. Finally, we find the minimum of $f$ between $X_{\ell-1}$ and $X'$, denote it $X_\ell$, and make it the current solution. Since $f$ is convex, this minimum can be found efficiently by ternary search. The minimum is also feasible because the feasible region is convex, and both $X'$ and $X_{\ell-1}$ are feasible.
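Since $f$ is convex along the segment between $X_{\ell-1}$ and $X'$, the third step reduces to a one-dimensional search. A sketch of the line search (our own illustration of step (5) below; the iteration count is an arbitrary choice):

```python
def frank_wolfe_line_search(f, X_prev, X_lp, iters=60):
    """Minimize g(v) = f((1-v) X_prev + v X_lp) over v in [0, 1] by ternary
    search; g is convex because f is convex and the segment is affine."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f((1 - m1) * X_prev + m1 * X_lp) <= f((1 - m2) * X_prev + m2 * X_lp):
            hi = m2      # the minimum cannot lie in (m2, hi]
        else:
            lo = m1      # the minimum cannot lie in [lo, m1)
    v = 0.5 * (lo + hi)
    return (1 - v) * X_prev + v * X_lp
```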
Algorithm 1 SPPM: Sparse privacy preserving mappings
Input: Initial feasible point $X_0$; number of linearization steps $L$
for all $\ell = 1, 2, \ldots, L$ do
    $C \leftarrow \nabla_X f(X_{\ell-1})$
    $V \leftarrow$ DWD (Algorithm 2)
    Find a feasible solution $X'$ in the direction of the gradient $C$:
    $$\begin{aligned} \min_{X} \quad & \sum_{j=1}^{n} c_j^T x_j \\ \text{subject to} \quad & \sum_{j=1}^{n} d_j^T x_j \le \Delta \\ & \mathbf{1}_n^T x_j = 1 \quad \forall j \\ & X \ge 0 \\ & x_{i,j} = 0 \quad \forall (i,j) \notin V \end{aligned} \qquad (4)$$
    Find the minimum of $f$ between $X_{\ell-1}$ and $X'$:
    $$v^* \leftarrow \arg\min_{v \in [0,1]} f\big((1 - v) X_{\ell-1} + v X'\big) \qquad (5)$$
    $X_\ell \leftarrow (1 - v^*) X_{\ell-1} + v^* X'$
end for
Output: Feasible solution $X_L$
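For reference, the linearized step (4) can also be handed to an off-the-shelf LP solver. The sketch below (our own formulation using scipy.optimize.linprog, not the decomposition the present principles use at scale) pins the inactive variables to zero through their bounds:

```python
import numpy as np
from scipy.optimize import linprog

def linearized_step(C, D, delta, active=None):
    """Solve min <C, X> s.t. sum_j d_j^T x_j <= delta, 1^T x_j = 1, X >= 0,
    and x_ij = 0 for (i, j) outside the active set, as in LP (4).

    C: n x n gradient matrix; D: n x n scaled distortion matrix (column j = d_j);
    active: optional set of (i, j) pairs; None means all variables are active.
    """
    n = C.shape[0]
    c = C.flatten(order="F")                    # stack the columns x_1, ..., x_n
    A_ub = D.flatten(order="F")[np.newaxis, :]  # one distortion row: <D, X> <= delta
    A_eq = np.zeros((n, n * n))                 # one simplex row per column of X
    for j in range(n):
        A_eq[j, j * n : (j + 1) * n] = 1.0
    if active is None:
        bounds = [(0, None)] * (n * n)
    else:                                       # inactive variables pinned to 0
        bounds = [(0, None) if (i, j) in active else (0, 0)
                  for j in range(n) for i in range(n)]
    res = linprog(c, A_ub=A_ub, b_ub=np.array([delta]),
                  A_eq=A_eq, b_eq=np.ones(n), bounds=bounds, method="highs")
    return res.x.reshape((n, n), order="F")
```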
Algorithm 2 DWD: Dantzig-Wolfe decomposition
Initialize the set of active variables: $V \leftarrow \{(1,1), (2,2), \ldots, (n,n)\}$
while the set $V$ grows do
    Solve the master problem for $\lambda^*$ and $\mu^*$:
    $$\begin{aligned} \max_{\lambda, \mu} \quad & \lambda \Delta + \sum_{j=1}^{n} \mu_j \\ \text{subject to} \quad & \lambda \le 0 \\ & \lambda d_{i,j} + \mu_j \le c_{i,j} \quad \forall (i,j) \in V \end{aligned} \qquad (6)$$
    for all $j = 1, 2, \ldots, n$ do
        Find the most violated constraint in the master problem for fixed $j$:
        $$i^* = \arg\min_i \big(c_{i,j} - \lambda d_{i,j} - \mu_j\big) \qquad (7)$$
        if $c_{i^*,j} - \lambda d_{i^*,j} - \mu_j < 0$ then $V \leftarrow V \cup \{(i^*, j)\}$ end if
    end for
end while
Output: Active variables $V$
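The pricing step (7) vectorizes naturally over columns. A sketch (our own array-based illustration, not code from the present principles):

```python
import numpy as np

def most_violated_constraints(C, D, lam, mu):
    """For each column j, find i* minimizing the reduced cost
    c_ij - lam*d_ij - mu_j, and return (i*, j) when that cost is negative.
    lam (scalar, <= 0) and mu (length-n array) come from the master (6)."""
    reduced = C - lam * D - mu[np.newaxis, :]
    i_star = reduced.argmin(axis=0)
    cols = np.arange(C.shape[1])
    negative = reduced[i_star, cols] < 0
    return [(int(i_star[j]), int(j)) for j in cols[negative]]
```

Each returned pair corresponds to a primal variable with negative reduced cost, i.e., a variable worth activating; Algorithm 2 adds these pairs to $V$ and re-solves the master problem until no violated constraint remains.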
[42] The linear program (3) has $n^2$ variables and therefore is hard to solve when $n$ is large. In one embodiment, we propose an incremental solution to this problem, which is defined only on a subset of active variables $V \subseteq \{1, 2, \ldots, n\} \times \{1, 2, \ldots, n\}$. This is why we refer to our approach as sparse privacy mappings. The active variables are indices of the non-zero variables in the solution to the problem (3). Each active variable is a pair of indices $(i, j)$, wherein the $j$-th data point in $\mathcal{B}$ is mapped to the $i$-th data point in $\hat{\mathcal{B}}$ with a non-zero probability. Therefore, solving (3) on active variables $V$ is equivalent to restricting all inactive variables to zero:
$$\begin{aligned} \min_{X} \quad & \sum_{j=1}^{n} c_j^T x_j \\ \text{subject to} \quad & \sum_{j=1}^{n} d_j^T x_j \le \Delta \\ & \mathbf{1}_n^T x_j = 1 \quad \forall j = 1, \ldots, n \\ & X \ge 0 \\ & x_{i,j} = 0 \quad \forall (i,j) \notin V \end{aligned} \qquad (8)$$
The above program has only $|V|$ variables. Now the challenge is in finding a good set of active variables $V$. This set should be small, and the solutions of (3) and (8) should be close.
[43] We grow the set $V$ greedily using the dual linear program of (8). In particular, we incrementally solve the dual by adding the most violated constraints, which corresponds to adding the most beneficial variables in the primal. The dual of (8) is:

$$\begin{aligned} \max_{\lambda, \mu} \quad & \lambda \Delta + \sum_{j=1}^{n} \mu_j \\ \text{subject to} \quad & \lambda \le 0 \\ & \lambda d_{i,j} + \mu_j \le c_{i,j} \quad \forall (i,j) \in V \end{aligned} \qquad (9)$$
where $\lambda \in \mathbb{R}$ is a variable associated with the distortion constraint, and $\mu_j \in \mathbb{R}$ is a variable associated with the simplex constraint on column $j$.
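The master problem (6)/(9) is itself a small linear program over $(\lambda, \mu)$. A sketch of solving it on the active set (our own formulation with scipy, not code from the present principles):

```python
import numpy as np
from scipy.optimize import linprog

def solve_master(C, D, delta, V):
    """Solve (6)/(9): max lam*delta + sum_j mu_j  subject to  lam <= 0 and
    lam*d_ij + mu_j <= c_ij for (i, j) in V. Variables: (lam, mu_1..mu_n)."""
    n = C.shape[1]
    # linprog minimizes, so negate the objective of the maximization.
    c = -np.concatenate(([delta], np.ones(n)))
    rows, rhs = [], []
    for (i, j) in V:
        row = np.zeros(n + 1)
        row[0] = D[i, j]          # coefficient of lam
        row[1 + j] = 1.0          # coefficient of mu_j
        rows.append(row)
        rhs.append(C[i, j])
    bounds = [(None, 0)] + [(None, None)] * n   # lam <= 0, mu free
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds, method="highs")
    return res.x[0], res.x[1:]    # lam*, mu*
```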
