Order Preserving Encryption for Numeric Data

Rakesh Agrawal
Jerry Kiernan
Ramakrishnan Srikant
Yirong Xu

IBM Almaden Research Center
650 Harry Road, San Jose, CA 95120

ABSTRACT
Encryption is a well-established technology for protecting sensitive data. However, once encrypted, data can no longer be easily queried aside from exact matches. We present an order-preserving encryption scheme for numeric data that allows any comparison operation to be directly applied on encrypted data. Query results produced are sound (no false hits) and complete (no false drops). Our scheme handles updates gracefully, and new values can be added without requiring changes in the encryption of other values. It allows standard database indexes to be built over encrypted tables and can easily be integrated with existing database systems. The proposed scheme has been designed to be deployed in application environments in which the intruder can get access to the encrypted database but does not have prior domain information such as the distribution of values and cannot encrypt or decrypt arbitrary values of his choice. The encryption is robust against estimation of the true value in such environments.

1. INTRODUCTION
Database systems typically offer access control as the means to restrict access to sensitive data. This mechanism protects the privacy of sensitive information provided data is accessed using the intended database system interfaces. However, access control, while important and necessary, is often insufficient. Attacks upon computer systems have shown that information can be compromised if an unauthorized user simply gains access to the raw database files, bypassing the database access control mechanism altogether. For instance, a recent article published in the Toronto Star [14] describes an incident where a disk containing the records of several hundred bank customers was being auctioned on eBay. The bank had inadvertently sold the disk to the eBay re-seller as used equipment without deleting its contents. Drawing upon privacy legislation and guidelines worldwide, Hippocratic databases also identify the protection of personal data from unauthorized acquisition as a vital requirement [1].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMOD 2004, June 13-18, 2004, Paris, France.
Copyright 2004 ACM 1-58113-859-8/04/06 ... $5.00.

Encryption is a well-established technology for protecting sensitive data [7] [22] [24]. Unfortunately, the integration of existing encryption techniques with database systems causes undesirable performance degradation. For example, if a column of a table containing sensitive information is encrypted and is used in a query predicate with a comparison operator, an entire table scan would be needed to evaluate the query. The reason is that current encryption techniques do not preserve order, and therefore database indexes such as B-trees can no longer be used. Thus query execution over encrypted databases can become unacceptably slow.
We present an encryption technique called OPES (Order-Preserving Encryption Scheme) that allows comparison operations to be directly applied on encrypted data, without decrypting the operands. Thus, equality and range queries as well as MAX, MIN, and COUNT queries can be directly processed over encrypted data. Similarly, GROUP BY and ORDER BY operations can also be applied. Only when applying SUM or AVG to a group do the values need to be decrypted. OPES is also endowed with the following properties:
• The results of query processing over data encrypted using OPES are exact. They neither contain any false positives nor miss any answer tuple. This feature of OPES sharply differentiates it from schemes such as [13] that produce a superset of the answer, necessitating filtering of extraneous tuples in a rather expensive and complex post-processing step.
• OPES handles updates gracefully. A value in a column can be modified or a new value can be inserted into a column without requiring changes in the encryption of other values.
• OPES can easily be integrated with existing database systems, as it has been designed to work with existing indexing structures such as B-trees. The fact that the database is encrypted can be made transparent to the applications.
Measurements from an implementation of OPES in DB2 show that the time and space overhead of OPES are reasonable for it to be deployed in real systems.

1.1 Estimation Exposure
The security of an encryption scheme is conventionally assessed by analyzing whether an adversary can find the key used for encryption. See [22] [24] for a categorization of different levels of attacks against a cryptosystem.
When dealing with sensitive numeric data, an adversary does not have to determine the exact data value $p$ corresponding to an encrypted value $c$; a breach may occur if the adversary succeeds in obtaining a tight estimate of $p$.
For a numeric domain $P$, if an adversary can estimate with $c\%$ confidence that a data value $p$ lies within the interval $[p_1, p_2]$, then the interval width $(p_2 - p_1)/\text{domain-width}(P)$ defines the amount of estimation exposure at the $c\%$ confidence level.
Clearly, any order-preserving encryption scheme is vulnerable to tight estimation exposure if the adversary can choose any number of unencrypted (encrypted) values of his liking and encrypt (decrypt) them into their corresponding encrypted (unencrypted) values. Similarly, any order-preserving encryption is not secure against tight estimation exposure if the adversary can guess the domain and knows the distribution of values in that domain.
We consider an application environment where the goal is safety from an adversary who has access to all (but only) encrypted values (the so-called ciphertext-only attack [22] [24]), and does not have any special information about the domain. We will particularly focus on robustness against estimation exposure.

1.2 Threat Model
We assume (see Figure 1):
• The storage system used by the database software is vulnerable to compromise. While current database systems typically perform their own storage management, the storage system remains part of the operating system. Attacks against storage could be performed by accessing database files following a path other than through the database software, or in the extreme, by physical removal of the storage media.
• The database software is trusted. We trust the database software to transform query constants into their encrypted values and decrypt the query results. Similarly, we assume that an adversary does not have access to the values in the memory of the database software.
• All disk-resident data is encrypted. In addition to the data values, the database software also encrypts schema information such as table and column names, metadata such as column statistics, and values written to recovery logs. Otherwise, an adversary may be able to use this information to guess data distributions.

[Figure 1: Transparent encryption in the "trusted database software with vulnerable storage" setting.]

1.3 Pedagogical Assumptions and Notations
The focus of this paper is on developing order-preserving encryption techniques for numeric values; we assume conventional encryption [22] [24] for other data types as well as for encrypting information such as schema names and metadata. We will sometimes refer to unencrypted data values as plaintext. Similarly, encrypted values will also be referred to as ciphertext.
We will assume that the database consists of a single table, which in turn consists of a single column. The domain of the column is initially assumed to be a subset of integer values $[p_{min}, p_{max}]$. The extension to real values is given later in the paper.
Assume the database $\tilde{P}$ consists of a total of $|\tilde{P}|$ plaintext values. Out of these, $|P|$ values are unique, which will be represented as $P = \{p_1, p_2, \ldots, p_{|P|}\}$, $p_i < p_{i+1}$. The corresponding encrypted values will be represented as $C = \{c_1, c_2, \ldots, c_{|P|}\}$, $c_i < c_{i+1}$.
Duplicates can sometimes be used to guess the distribution of a domain, particularly if the distribution is highly skewed. A closely related problem is that if the number of distinct values is small (e.g., day of the month), it is easy to guess the domain. We will initially assume that the domain to be encrypted either does not contain many duplicates or has a distribution that can withstand a duplicate attack, and discuss the handling of duplicates later in the paper.

1.4 Paper Layout
The rest of the paper is organized as follows. We first discuss related work in Section 2. We give an overview of OPES in Section 3. The next three sections give details of the three main phases of OPES. We describe extensions to handle real values and duplicates in Section 7. In Section 8, we study the quality of the encryption produced by OPES and present performance measurements from a DB2 implementation. We conclude with a summary and directions for future work in Section 9.

2. RELATED WORK

Summation of Random Numbers. A simple scheme has been proposed in [3] that computes the encrypted value $c$ of integer $p$ as
$$c = \sum_{j=1}^{p} R_j,$$
where $R_j$ is the $j$th value generated by a secure pseudo-random number generator $R$. Unfortunately, the cost of making $p$ calls to $R$ for encrypting or decrypting $c$ can be prohibitive for large values of $p$. (A minimal sketch follows.)
A more serious problem is the vulnerability to estimation exposure. Since the expected gap between two encrypted values is proportional to the gap between the corresponding plaintext values, the nature of the plaintext distribution can be inferred from the encrypted values. Figure 2 shows the distributions of encrypted values obtained using this scheme for data values sampled from two different distributions: Uniform and Gaussian. In each case, once both the input and encrypted distributions are scaled to be between 0 and 1, the number of points in each bucket is almost identical for the plaintext and encrypted distributions. Thus the percentile of a point in the encrypted distribution is also identical to its percentile in the plaintext distribution.
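To make the scheme concrete, here is a minimal sketch; the hash-based generator standing in for the secure pseudo-random number generator R is an assumption of this illustration, not part of [3]:

```python
import hashlib

def R(key: bytes, j: int) -> int:
    # j-th output of a hash-based stand-in for a secure PRG (assumed here);
    # each gap is drawn from [1, 100].
    d = hashlib.sha256(key + j.to_bytes(8, "big")).digest()
    return int.from_bytes(d[:4], "big") % 100 + 1

def encrypt(key: bytes, p: int) -> int:
    # c = sum_{j=1}^{p} R_j: cost is p calls to R, prohibitive for large p
    return sum(R(key, j) for j in range(1, p + 1))

def decrypt(key: bytes, c: int) -> int:
    # invert by regenerating the running sum until it reaches c
    total, p = 0, 0
    while total < c:
        p += 1
        total += R(key, p)
    return p
```

Because each gap has the same expected value, encrypted gaps are proportional to plaintext gaps in expectation, which is precisely the estimation exposure described above.
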
[Figure 2: Summation of random numbers: Distribution of encrypted values tracks the input distribution. Panels: (a) Input: Uniform; (b) Input: Gaussian. x-axis: Scaled Domain; y-axis: Number of points; series: Original, Encrypted.]

[Figure 3: Polynomial functions: Encryptions of different input distributions look different. Panels: (a) Input: Uniform; (b) Input: Gaussian. x-axis: Scaled Domain; y-axis: Number of points; series: Original, Encrypted.]

Polynomial Functions. In [12], a sequence of strictly increasing polynomial functions is used for encrypting integer values while preserving their order. These polynomial functions can simply be of the first or second order, with coefficients generated from the encryption key. An integer value is encrypted by applying the functions in such a way that the output of one function becomes the input of the next function. Correspondingly, an encrypted value is decrypted by solving these functions in reverse order. However, this encryption method does not take the input distribution into account. Therefore the shape of the distribution of encrypted values depends on the shape of the input distribution, as shown in Figure 3 for the encryption function given in Example 10 in [12]. This illustration suggests that this scheme may reveal information about the input distribution, which can be exploited. A first-order sketch of the idea follows.
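A minimal sketch using only first-order (affine) functions, a special case of the composed polynomials of [12]; the specific coefficients are hypothetical toy values:

```python
# Each (a, b) defines f(x) = a*x + b with a >= 1, so every stage is strictly
# increasing and the composition preserves order. In [12] the coefficients
# would be derived from the encryption key.
KEY = [(3, 7), (2, 11), (5, 2)]

def encrypt(p: int) -> int:
    for a, b in KEY:
        p = a * p + b              # output of one function feeds the next
    return p

def decrypt(c: int) -> int:
    for a, b in reversed(KEY):
        c = (c - b) // a           # exact inverse over the integers
    return c

assert decrypt(encrypt(42)) == 42
assert encrypt(41) < encrypt(42)   # order preserved, but gaps scale by 3*2*5,
                                   # leaking the shape of the input distribution
```
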
Bucketing. In [13], tuples are encrypted using conventional encryption, but an additional bucket id is created for each attribute value. This bucket id, which represents the partition to which the unencrypted value belongs, can be indexed. The constants appearing in a query are replaced by their corresponding bucket ids. Clearly, the result of a query will contain false hits that must be removed in a post-processing step after decrypting the tuples returned by the query. This filtering can be quite complex since the bucket ids may have been used in joins, subqueries, etc. The number of false hits depends on the width of the partitions involved. It is shown in [13] that the post-processing overhead can become excessive if a coarse partitioning is used for bucketization. On the other hand, a fine partitioning makes the scheme vulnerable to estimation exposure, particularly if an equi-width partitioning is used. The sketch below illustrates the query rewriting and the source of false hits.
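A small sketch of the idea; the equi-width boundaries and the salary predicate are hypothetical:

```python
import bisect

boundaries = [0, 250, 500, 750, 1000]      # hypothetical equi-width partitions

def bucket_id(v: int) -> int:
    # partition (bucket) to which an unencrypted value belongs
    return bisect.bisect_right(boundaries, v) - 1

# A predicate "salary <= 600" is rewritten to "bucket <= bucket_id(600)".
# Bucket 2 covers [500, 750), so tuples with salary in (600, 750) are false
# hits that must be decrypted and filtered out in post-processing.
matching_buckets = list(range(bucket_id(600) + 1))   # [0, 1, 2]
```
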
It has been pointed out in [6] that the indexes proposed in [13] can open the door to inference and linking attacks. Instead, they build a B-tree over plaintext values, but then encrypt every tuple and the B-tree at the node level using conventional encryption. The advantage of this approach is that the content of the B-tree is not visible to an untrusted database server. The disadvantage is that the B-tree traversal can now be performed by the front-end only by executing a sequence of queries that retrieve tree nodes at progressively deeper levels.

Other Relevant Work. Rivest et al. [21] suggest that the limit on manipulating encrypted data arises from the choice of encryption functions used, and that there exist encryption functions that permit encrypted data to be operated on directly for many sets of interesting operations. They call these functions "privacy homomorphisms". The focus of [21] and the subsequent follow-up work [2] [8] [9] has been on designing privacy homomorphisms to enable arithmetic on encrypted data, but comparison operations were not investigated in this line of research.
In [10], a simple but effective scheme has been proposed to encrypt a look-up directory consisting of (key, value) pairs. The goal is to allow the corresponding value to be retrieved if and only if a valid key is provided. The essential idea is to encrypt complete tuples, but associate with every tuple the one-way hash value of its key. Thus, no tuple will be retrieved if an invalid key is presented. Answering range queries was not a goal of this system.
In [23], interesting schemes are proposed to support keyword searches over an encrypted text repository. The driving application for this work is the efficient retrieval of encrypted email messages. Naturally, they do not discuss relational queries, and it is not clear how their techniques can be adapted for relational databases.
In [4], a smart card with encryption and query processing capabilities is used to ensure the authorized and secure retrieval of encrypted data stored on untrusted servers. Encryption keys are
maintained on the smart card. The smart card can translate exact-match queries into equivalent queries over encrypted data. However, range queries require creating a disjunction for every possible value in the range, which is infeasible for real data values. The smart card implementation could benefit from our encryption scheme in that range queries could be translated into equivalent queries over encrypted data.
In [25], the security and tamper resistance of a database stored on a smart card is explored. They consider snooping attacks for secrecy, and spoofing, splicing, and replay attacks for tamper resistance. Retrieval performance is not the focus of their work, and it is not clear how much of their techniques apply to general-purpose databases not stored in specialized devices.
Amongst commercial database products, Oracle 8i allows values in any of the columns of a table to be encrypted [18]. However, the encrypted column can no longer participate in indexing as the encryption is not order-preserving.
Related work also includes research on order-preserving hashing [5] [11]. However, protecting the hash values from cryptanalysis is not the concern of this body of work. Similarly, the construction of original values from the hash values is not required.

3. PROPOSED ORDER-PRESERVING ENCRYPTION SCHEME
The basic idea of OPES is to take as input a user-provided target distribution and transform the plaintext values in such a way that the transformation preserves the order while the transformed values follow the target distribution. Figure 4 shows the result of running OPES with different input distributions and the same target distribution. Notice that the distribution of encrypted values looks identical in both 4(a) and 4(b), even though the input distributions were very different.

[Figure 4: Illustrating OPES. Panels: (a) Input: Uniform, Target: Zipf; (b) Input: Gaussian, Target: Zipf. x-axis: Scaled Domain; y-axis: Number of points; series: Original, Target, Encrypted.]

3.1 Intuition
To understand the intuition behind the OPES algorithm, consider the following encryption scheme:
Generate $|P|$ unique values from a user-specified target distribution and sort them into a table $T$. The encrypted value $c_i$ of $p_i$ is then given by $c_i = T[i]$. That is, the $i$th plaintext value in the sorted list of $|P|$ plaintext values is encrypted into the $i$th value in the sorted list of $|P|$ values obtained from the target distribution. The decryption of $c_i$ requires a lookup into a reverse map. Here $T$ is the encryption key that must be kept secret.
Clearly, this scheme does not reveal any information about the original values apart from their order, since the encrypted values were generated solely from the user-specified target distribution, without using any information from the original distribution. Even if an adversary has all of the encrypted values, he cannot infer $T$ from those values. By appropriately choosing the target distribution, the adversary can be forced to make large estimation errors. A minimal sketch of this scheme appears at the end of this subsection.
This simple scheme, while instructive, has the following shortcomings when used for encrypting large databases:
• The size of the encryption key is twice as large as the number of unique values in the database.
• Updates are problematic. When adding a new value $p$, where $p_i < p < p_{i+1}$, we will need to re-encrypt all $p_j$, $j > i$.¹
OPES has been designed such that the result of encryption is statistically indistinguishable from the one obtained using the above scheme, thereby providing the same level of security, while removing its shortcomings.
¹ It is possible to avoid immediate re-encryption by choosing an encrypted value for $p$ in the interval $(c_i, c_{i+1})$, but $T$ would still need updating. Moreover, there might be cases where $c_{i+1} = c_i + 1$, and therefore inserting a new value will require re-encryption of existing values. Note that the encryption scheme $c_i = T[p_i]$ circumvents the update problem, but now the size of the key becomes the size of the domain. It is also vulnerable to percentile exposure, as discussed earlier in Section 2.
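A minimal sketch of this bootstrap scheme, with a toy plaintext set and a random sample standing in for the user-specified target distribution:

```python
import bisect
import random

plain = sorted({23, 57, 110, 498, 2000})             # the |P| unique plaintext values
random.seed(7)
T = sorted(random.sample(range(10**6), len(plain)))  # table T: the secret key

def encrypt(p: int) -> int:
    # i-th smallest plaintext value -> i-th smallest target value
    return T[bisect.bisect_left(plain, p)]

def decrypt(c: int) -> int:
    # reverse map: position of c in T gives the position of p in the plaintext list
    return plain[bisect.bisect_left(T, c)]

assert all(decrypt(encrypt(p)) == p for p in plain)
assert [encrypt(p) for p in plain] == T              # order is preserved
```
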
3.2 Overview of OPES
When encrypting a given database $\tilde{P}$, OPES makes use of all the plaintext values currently present in $\tilde{P}$,² and also uses a database of sampled values from the target distribution. Only the encrypted database $C$ is stored on disk. At the same time, OPES also creates some auxiliary information $K$, which the database system uses to decrypt encoded values or encrypt new values. Thus $K$ serves the function of the encryption key. This auxiliary information is kept encrypted using conventional encryption techniques.
² If an installation is creating a new database, the database administrator can provide a sample of expected values.
OPES works in three stages:
1. Model: The input and target distributions are modeled as piecewise linear splines.
2. Flatten: The plaintext database $P$ is transformed into a "flat" database $F$ such that the values in $F$ are uniformly distributed.
3. Transform: The flat database $F$ is transformed into the cipher database $C$ such that the values in $C$ are distributed according to the target distribution.
Note that
$$p_i < p_j \;\Leftrightarrow\; f_i < f_j \;\Leftrightarrow\; c_i < c_j.$$
We give details of the three stages in Sections 4, 5, and 6 respectively.

4. MODELING THE DISTRIBUTIONS
Techniques for modeling data distributions have been studied extensively in the database literature in the context of estimating the costs of different query execution plans. As stated in [16], there are two broad categories of techniques: histogram-based techniques, which capture statistical information about a distribution by means of counters for a specified number of buckets, and parametric techniques, which approximate a distribution by fitting the parameters of a given type of function. We experimented with several histogram-based techniques [15], including equi-depth, equi-width, and wavelet-based methods, but found that the flattened values obtained were not uniformly distributed unless the number of buckets was selected to be unreasonably large. The main source of the problem was the assumption that the distribution is uniform within each bucket. Parametric techniques are suitable for closed-form distributions but lead to poor estimations for irregular distributions [16], which we expect to be the norm in our application.
We therefore resorted to a combination of histogram-based and parametric techniques. As in [16], we first partition the data values into buckets and then model the distribution within each bucket as a linear spline. The spline for a bucket $[p_l, p_h)$ is simply the line connecting the densities at the two end-points of the bucket.³
We also allow the width of value ranges to vary across buckets. However, unlike [16], we do not have a given fixed number of buckets. Rather, we use the minimum description length (MDL) principle [20] to determine the number of buckets.

4.1 Bucket Boundaries
The bucket boundaries are determined in two phases:⁴
1. Growth phase. The space is recursively split into finer partitions. Each partitioning of a bucket reduces the maximum deviation from the density function within the newly formed buckets when compared to their parent bucket.
2. Prune phase. Some buckets are pruned (merged into bigger buckets). The idea is to minimize the number of buckets and yet have the values within buckets be uniformly distributed after mapping. We use the MDL principle to obtain this balance.

The details of these two phases are discussed next.
³ In [16], the splines are not continuous across buckets; they use linear regression over the data values present in a bucket for determining the spline. However, such discontinuities may cause undesirable breaks in uniformity when we flatten plaintext values.
⁴ This procedure is reminiscent of the procedure for building decision tree classifiers, and in particular SLIQ [17], but the details are quite different.

4.2 Growth Phase
We are given a bucket $[p_l, p_h)$ with $h - l - 1$ (sorted) points: $\{p_{l+1}, p_{l+2}, \ldots, p_{h-1}\}$. We first find the linear spline for this bucket. Next, for each point $p_s$ in the bucket, we compute its expected value if the points were distributed according to the density distribution modeled by the linear spline (i.e., the expected value of the $(s - l)$th smallest value in a set of $h - l - 1$ random values drawn from the distribution). We then split the bucket at the point that has the largest deviation from its expected value (breaking ties arbitrarily). We stop splitting when the number of points in a bucket falls below some threshold, say, 10. A sketch of one splitting step follows.
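A sketch of one split, under assumptions of this illustration: the bucket's endpoint densities are estimated from the local spacing of the extreme gaps, and the expected order statistics are approximated by inverse-CDF values at rank quantiles:

```python
import numpy as np

def expected_positions(n, q, r, w):
    # Approximate expected values of n sorted samples from a density
    # proportional to q*p + r on [0, w] via the inverse CDF at i/(n+1).
    total = q * w * w / 2 + r * w
    u = np.arange(1, n + 1) / (n + 1)
    if abs(q) < 1e-12:
        return u * total / r
    return (-r + np.sqrt(r * r + 2 * q * u * total)) / q

def split_point(points):
    pts = np.sort(np.asarray(points, dtype=float))
    w = pts[-1] - pts[0]
    d_lo = 1.0 / (pts[1] - pts[0] + 1e-9)    # density estimate at the left end
    d_hi = 1.0 / (pts[-1] - pts[-2] + 1e-9)  # density estimate at the right end
    q, r = (d_hi - d_lo) / w, d_lo           # linear spline through the two densities
    exp = pts[0] + expected_positions(len(pts) - 2, q, r, w)
    interior = pts[1:-1]
    # split at the interior point with the largest deviation from its expectation
    return interior[int(np.argmax(np.abs(interior - exp)))]
```
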
4.3 Prune Phase
The MDL principle [20] states that the best model for encoding data is the one that minimizes the sum of the cost of describing the model and the cost of describing the data in terms of that model. For a given bucket $[p_l, p_h)$, the local benefit $LB$ of splitting this bucket at a point $p_s$ is given by
$$LB(p_l, p_h) = DataCost(p_l, p_h) - DataCost(p_l, p_s) - DataCost(p_s, p_h) - IncrModelCost,$$
where $DataCost(p_1, p_2)$ gives the cost of describing the data in the interval $[p_1, p_2)$ and $IncrModelCost$ is the increase in modeling cost due to the partitioning of a bucket into two buckets.
The global benefit $GB$ of splitting this bucket at $p_s$ takes into account the benefit of further recursive splits:
$$GB(p_l, p_h) = LB(p_l, p_h) + GB(p_l, p_s) + GB(p_s, p_h).$$
If $GB > 0$, the split is retained; otherwise, the split at $p_s$ and all recursive splits within $[p_l, p_h)$ are pruned. Note that we do this computation bottom-up, and therefore the cost is linear in the number of splits.⁵ A sketch of this bottom-up pruning appears below.
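A sketch of the pruning over a tree of candidate splits produced by the growth phase; the tuple layout of a node is an assumption of this sketch, and data_cost is the function defined in Section 4.3.1 below:

```python
def prune(node, data_cost, incr_model_cost=96):
    """node = (lo, hi, split, left, right); split is None for a leaf.
    Returns (GB, pruned_node): a split survives only if GB > 0."""
    lo, hi, s, left, right = node
    if s is None:
        return 0.0, node
    gb_left, left = prune(left, data_cost, incr_model_cost)
    gb_right, right = prune(right, data_cost, incr_model_cost)
    lb = (data_cost(lo, hi) - data_cost(lo, s)
          - data_cost(s, hi) - incr_model_cost)   # local benefit LB
    gb = lb + gb_left + gb_right                  # global benefit GB
    if gb > 0:
        return gb, (lo, hi, s, left, right)
    return 0.0, (lo, hi, None, None, None)        # prune this split and below
```
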
We now provide the functions for the computation of $DataCost$ and $IncrModelCost$. Assume momentarily the existence of a mapping $M$ that transforms values sampled from a linear density function into a set of uniformly distributed values. We specify $M$ in the next section. As we shall see, $M$ will have two parameters: a quadratic coefficient and a scale factor.

4.3.1 DataCost
We want to flatten a given data distribution into a uniform distribution. So, given a bucket, we first flatten the values present in the bucket using the mapping $M$, and then compute the cost of encoding the deviations from uniformity for the mapped values.⁶
Let the set of data values $\{p_l, p_{l+1}, \ldots, p_{h-1}\}$ be mapped into $\{f_l, f_{l+1}, \ldots, f_{h-1}\}$ using $M$.
⁵ One might wonder why we did not combine pruning with the growth phase and stop splitting a bucket as soon as the local benefit became zero or negative. The reason is that the benefit of partitioning may start showing only at a finer granularity, and it will often be the case that the local benefit is less than zero even though the global benefit is much greater than zero.
⁶ Note that our implementation encodes only the statistically significant deviations to avoid overfitting, i.e., rather than a single expected value, we consider the range of values that would occur with a uniform distribution, and only encode values that are outside this range. We omit this detail for brevity.
The encoding of a value $p_i$ in the interval $[p_l, p_h)$ would cost
$$Cost(p_i) = \log_2 |f_i - E_i|,$$
where $E_i$, the expected value of the $i$th number assuming uniformity, is given by
$$E_i = f_l + \frac{i - l}{h - l}\,(f_h - f_l).$$
The cost of encoding all the values in the interval $[p_l, p_h)$ is then given by
$$DataCost(p_l, p_h) = \sum_{i=l+1}^{h-1} Cost(p_i).$$
(A sketch of this computation follows Section 4.3.2.)

4.3.2 IncrModelCost
If we have $m$ buckets, we need to store $m + 1$ boundaries, $m$ quadratic coefficients, and $m$ scale factors. Thus the model cost will be $32(3m + 1)$ bits, assuming 32 bits for each of these values. More importantly, the cost of an additional bucket is
$$IncrModelCost = 3 \times 32 = 96.$$
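A sketch of the per-bucket data cost, given the flattened values $f_l, \ldots, f_h$ of one bucket; clamping the cost of sub-unit deviations to zero is a simplification of this sketch:

```python
import math

def data_cost(f):
    # f: flattened values f_l, ..., f_h of one bucket (sorted list)
    l, h = 0, len(f) - 1
    cost = 0.0
    for i in range(l + 1, h):
        e_i = f[l] + (i - l) / (h - l) * (f[h] - f[l])  # E_i under uniformity
        dev = abs(f[i] - e_i)
        cost += math.log2(dev) if dev >= 1.0 else 0.0   # log2 |f_i - E_i|
    return cost
```
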
5. FLATTEN
The overall idea of the flatten stage is to map a plaintext bucket $B$ into a bucket $B^f$ in the flattened space in such a way that the length of $B^f$ is proportional to the number of values present in $B$. Thus, dense plaintext buckets will be stretched and sparse buckets will be compressed. The values within a bucket are mapped in such a way that the density will be uniform in the flattened bucket. Since the densities are uniform both inter-bucket and intra-bucket, the values in the flattened database will be uniformly distributed. We specify next a mapping function that accomplishes these goals.

5.1 Mapping Function
OBSERVATION 1. If a distribution over $[0, p_h]$ has the density function $qp + r$, where $p \in [0, p_h]$, then for any constant $z > 0$, the mapping function
$$M(p) = z\left(\frac{q}{2r}\,p^2 + p\right)$$
will yield a uniformly distributed set of values.
This follows from the fact that the slope of the mapping function at any point $p$ is proportional to the density at $p$:⁷
$$\frac{dM}{dp} = \frac{z}{r}\,(qp + r) \;\propto\; qp + r.$$
We will refer to $s := q/2r$ as the quadratic coefficient. Thus
$$M(p) = z(sp^2 + p).$$
A different scale factor $z$ is used for different buckets, to make the inter-bucket density uniform as well. We describe next how the scale factors are computed. A numerical check of Observation 1 is sketched below.
⁷ An equivalent way to think about this is that the space around $p$, say from $p - \delta$ to $p + \delta$, is mapped to a length of $M(p + \delta) - M(p - \delta) = \frac{2\delta z}{r}(qp + r) \propto qp + r$.
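A numerical check of Observation 1 with toy parameters (density $2p + 1$ on $[0, 1]$ is an assumption of this sketch); inverse-CDF sampling is used to draw from the linear density:

```python
import numpy as np

def M(p, s, z):
    return z * (s * p * p + p)       # the flattening map M(p) = z(s p^2 + p)

q, r, w = 2.0, 1.0, 1.0              # density q*p + r = 2p + 1 on [0, 1]
s = q / (2 * r)
u = np.random.default_rng(0).uniform(size=100_000)
total = q * w * w / 2 + r * w
p = (-r + np.sqrt(r * r + 2 * q * u * total)) / q   # inverse-CDF samples
f = M(p, s, z=1.0)
counts, _ = np.histogram(f / M(w, s, 1.0), bins=10)
print(counts)  # roughly equal counts per bin: the flattened values are uniform
```
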
5.2 Scale Factor
We need to find the scale factor $z$, one for each bucket $B$, such that:
1. Two distinct values in the plaintext will always map to two distinct values in the flattened space, thereby ensuring incremental updatability.
2. Each bucket is mapped to a space proportional to the number of points $n$ in that bucket, i.e., if $w$ is the width of the bucket and $w^f = M(w)$ is the width after flattening, then $w^f \propto n$.
The first constraint can be written as:
$$\forall p \in [0, w):\; M(p + 1) - M(p) \geq 2.$$
The 2 on the RHS (instead of 1) ensures that two adjacent plaintext values will be at least 2 apart in the flattened space. As we will explain in Section 5.5, this extra separation makes encryption tolerant to rounding errors in floating-point calculations. Expanding $M$, we get
$$\forall p \in [0, w):\; z \geq \frac{2}{s(2p + 1) + 1}.$$
The largest value of $2/(s(2p + 1) + 1)$ will be at $p = 0$ if $s > 0$, and at $p = w - 1$ otherwise. Therefore we get
$$\hat{z} = \frac{2}{s + 1},\; s > 0; \qquad \hat{z} = \frac{2}{1 + s(2w - 1)},\; s \leq 0,$$
where $\hat{z}$ denotes the minimum value of $z$ that will satisfy the first constraint.
To satisfy the second constraint, we want
$$w^f = Kn$$
for all the buckets. Define
$$\hat{w}^f = \hat{z}(sw^2 + w)$$
as the minimum width for each bucket, and define
$$K = \max_i \left[\hat{w}^f_i / n_i\right],\quad i = 1, \ldots, m.$$
Then the scale factors
$$z = \frac{Kn}{sw^2 + w}$$
will satisfy both the desired constraints, since $z \geq \hat{z}$, and $w^f = z(sw^2 + w) = Kn$. (The sketch after Section 5.3 assembles these formulas.)

5.3 Encryption Key
Let us briefly review what we have at this stage. The modeling phase has yielded a set of buckets $\{B_1, \ldots, B_m\}$. For each bucket, we also have a mapping function $M$, characterized by two parameters: the quadratic coefficient $s$ and the scale factor $z$. We save the $m + 1$ bucket boundaries, the $m$ quadratic coefficients, and the $m$ scale factors in a data structure $K^f$. The database system uses $K^f$ to flatten (encrypt) new plaintext values, and also to unflatten (decrypt) a flattened value. Thus $K^f$ serves the function of the encryption key.
Note that $K^f$ is computed once, at the time of initial encryption of the database. As the database obtains new values, $K^f$ is used to encrypt them, but it is not updated, which endows OPES with the incremental updatability property.⁸
⁸ In order to encrypt stray values outside of the current …
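A sketch assembling the formulas of Section 5.2 into the per-bucket scale factors stored in $K^f$; the Bucket field layout is an assumption of this sketch:

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    w: float   # bucket width
    n: int     # number of points in the bucket
    s: float   # quadratic coefficient q / 2r

def z_hat(b: Bucket) -> float:
    # minimum z guaranteeing M(p+1) - M(p) >= 2 for all p in [0, w)
    return 2 / (b.s + 1) if b.s > 0 else 2 / (1 + b.s * (2 * b.w - 1))

def scale_factors(buckets):
    # choose K so that z >= z_hat everywhere; then w^f = z(s w^2 + w) = K n
    K = max(z_hat(b) * (b.s * b.w**2 + b.w) / b.n for b in buckets)
    return [K * b.n / (b.s * b.w**2 + b.w) for b in buckets]
```
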
5.4 Mapping a Plaintext Value into a Flat Value
Represent the domains of the input database $P$ and the flat database $F$ as $[p_{min}, p_{max}]$ and $[f_{min}, f_{max}]$ respectively. Note that
$$f_{max} = f_{min} + \sum_{i=1}^{m} w^f_i,$$
where $w^f_i = M_i(w_i)$. Recall that $w_i$ is the length of plaintext bucket $B_i$, and $w^f_i$ the length of the corresponding flat bucket.
To flatten a plaintext value $p$, we first determine the bucket $B_i$ into which $p$ falls, using the information about the bucket boundaries saved in $K^f$. Now $p$ is mapped into the flat value $f$ using the equation
$$f = f_{min} + \sum_{j=1}^{i-1} w^f_j + M_i\Big(p - p_{min} - \sum_{j=1}^{i-1} w_j\Big).$$
A sketch of this lookup follows Figure 5.

[Figure 5: Scaling the target distribution. Step 1: Compute the buckets $B_0, \ldots, B_{m-1}$ used to transform the plaintext distribution into the flattened distribution $B^f_0, \ldots, B^f_{m-1}$. Step 2: From the target distribution $B^t_0, \ldots, B^t_{k-1}$, compute the buckets for the flattened target distribution $\hat{B}^f_0, \ldots, \hat{B}^f_{k-1}$. Step 3: Scale the buckets of the flattened target distribution to equal the range of the flattened distribution computed in Step 1, and scale the target distribution proportionately.]
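A sketch of the flattening lookup; the dictionary layout of $K^f$ (per-bucket left boundaries, coefficients, scale factors, and cumulative flat widths) is an assumption of this sketch:

```python
import bisect

def flatten(p: float, Kf: dict) -> float:
    i = bisect.bisect_right(Kf["lo"], p) - 1      # bucket B_i containing p
    offset = p - Kf["lo"][i]                      # p's offset within B_i
    s, z = Kf["s"][i], Kf["z"][i]
    # f = f_min + sum_{j<i} w^f_j + M_i(offset)
    return Kf["f_min"] + Kf["cum_wf"][i] + z * (s * offset**2 + offset)
```
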
6. TRANSFORM
The transform stage is almost a mirror image of the flatten stage. Given a uniformly distributed set of flattened values, we want to map them into the target distribution. An equivalent way of thinking about the problem is that we want to flatten the target distribution into a uniform distribution, while ensuring that the distribution so obtained "lines up" with the uniform distribution yielded by flattening the plaintext distribution.
Figure 5 portrays this process. We already have on hand the buckets for the plaintext distribution (Step 1). We bucketize the target distribution, independent of the bucketization of the plaintext distribution (Step 2). We then scale the target distribution (and the flattened target distribution) in such a way that the width of the flattened target distribution equals the range of the flattened distribution computed in Step 1 (Step 3). A sketch of the resulting inverse mapping follows.
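The transform stage can then be sketched as the inverse operation over the scaled target buckets; the layout of the target-side key $K^t$ mirrors the flattening sketch above and is likewise an assumption of this illustration:

```python
import bisect
import math

def transform(f: float, Kt: dict) -> float:
    i = bisect.bisect_right(Kt["flat_lo"], f) - 1   # flat target bucket holding f
    y = (f - Kt["flat_lo"][i]) / Kt["z"][i]         # y = s p^2 + p
    s = Kt["s"][i]
    # invert M within the bucket: p = (-1 + sqrt(1 + 4 s y)) / (2 s)
    p = y if abs(s) < 1e-12 else (-1 + math.sqrt(1 + 4 * s * y)) / (2 * s)
    return Kt["lo"][i] + p                          # cipher value in the target domain
```
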
