`
`Practical Techniques for Searches on Encrypted Data
`
`fdawnsong, daw, perrigg@cs.berkeley.edu
`
`Dawn Xiaodong Song David Wagner Adrian Perrig
`
`University of California, Berkeley
`
`Abstract
`
`1 Introduction
`
`It is desirable to store data on data storage servers such
`as mail servers and file servers in encrypted form to reduce
`security and privacy risks. But this usually implies that one
`has to sacrifice functionality for security. For example, if a
`client wishes to retrieve only documents containing certain
`words, it was not previously known how to let the data stor-
`age server perform the search and answer the query without
`loss of data confidentiality.
`In this paper, we describe our cryptographic schemes
`for the problem of searching on encrypted data and pro-
`vide proofs of security for the resulting crypto systems. Our
`techniques have a number of crucial advantages. They are
`provably secure: they provide provable secrecy for encryp-
`tion, in the sense that the untrusted server cannot learn
`anything about the plaintext when only given the cipher-
`text; they provide query isolation for searches, meaning
`that the untrusted server cannot learn anything more about
`the plaintext than the search result; they provide controlled
`searching, so that the untrusted server cannot search for an
`arbitrary word without the user’s authorization; they also
`support hidden queries, so that the user may ask the un-
`trusted server to search for a secret word without revealing
`the word to the server. The algorithms we present are sim-
`
`cipher operations), and introduce almost no space and com-
`munication overhead, and hence are practical to use today.
`
`ple, fast (for a document of length, the encryption and
`search algorithms only need stream cipher and block
`
`
`We gratefully acknowledge support for this research from several US
`government agencies. This research was suported in part by the Defense
`Advanced Research Projects Agency under DARPA contract N6601-99-
`28913 (under supervision of the Space and Naval Warfare Systems Center
`San Diego), by the National Science foundation under grant FD99-79852,
`and by the United States Postal Service under grant USPS 1025 90-98-C-
`3513. Views and conclusions contained in this document are those of the
`authors and do not necessarily represent the official opinion or policies,
`either expressed or implied of the US government or any of its agencies,
`DARPA, NSF, USPS.
`
`Today’s mail servers such as IMAP servers [11], file
`servers and other data storage servers typically must be fully
`trusted—they have access to the data, and hence must be
`trusted not to reveal it without authorization—which intro-
`duces undesirable security and privacy risks in applications.
`Previous work shows how to build encrypted file systems
`and secure mail servers, but typically one must sacrifice
`functionality to ensure security. The fundamental problem
`is that moving the computation to the data storage seems
`very difficult when the data is encrypted, and many com-
`putation problems over encrypted data previously had no
`practical solutions.
`In this paper, we show how to support searching func-
`tionality without any loss of data confidentiality. An exam-
`ple is where a mobile user with limited bandwidth wants
`to retrieve all email containing the word “Urgent” from an
`untrusted mail-storage server in the infrastructure. This is
`trivial to do when the server knows the content of the data,
`but how can we support search queries if we do not wish to
`reveal all our email to the server?
`Our answer is to present cryptographic schemes that en-
`able searching on encrypted data without leaking any infor-
`mation to the untrusted server.
`
`(cid:15) Our techniques are provably secure. The techniques
`(cid:15) Our schemes are efficient and practical. The algo-
`
`provide provable secrecy for encryption, in the sense
`that the untrusted server cannot learn anything about
`the plaintext given only the ciphertext. The tech-
`niques provide controlled searching, so that the un-
`trusted server cannot search for a word without the
`user’s authorization. The techniques support hidden
`queries, so that the user may ask the untrusted server
`to search for a secret word without revealing the word
`to the server. The techniques also support query isola-
`tion, meaning that the untrusted server learns nothing
`more than the search result about the plaintext.
`
`rithms we present are simple and fast. More specifi-
`
`Page 1 of 12
`
`Netskope Exhibit 1018
`
`
`
`cipher and block cipher operations. Our schemes in-
`troduce essentially no space and communication over-
`head. They are also flexible and can be easily extended
`to support more advanced searches.
`
`Our schemes all take the form of probabilistic searching:
`
`cally, for a document of length, the encryption and
`search algorithms only need number of stream
`a search for the wordW returns all the positions whereW
`by adjusting a parameter in the encryption algorithm;
`1=2
`, so for a`-word document, we expect to see about
``=2
`
`occurs in the plaintext, as well as possibly some other er-
`roneous positions. We may control the number of errors
`
`each wrong position will be returned with probability about
`
`false matches. The user will be able to eliminate all
`the false matches (by decrypting), so in remote searching
`applications, false matches should not be a problem so long
`as they are not so common that they overwhelm the com-
`munication channel between the user and the server.
`This paper is structured as follows. We first introduce
`the problem of searching on encrypted data in Section 2 and
`briefly review some important background in Section 3. We
`then describe our solution for the case of searching with
`sequential scan in Section 4. We discuss further issues such
`as advanced search and search with index in Section 5. We
`discuss related work in Section 6 and finally we conclude in
`Section 7. Appendix A presents the proofs for all of proofs
`of security for these schemes.
`
`2 Searching on Encrypted Data
`
`We first define the problem of searching on encrypted
`data.
`Assume Alice has a set of documents and stores them
`on an untrusted server Bob. For example, Alice could be a
`mobile user who stores her email messages on an untrusted
`mail server. Because Bob is untrusted, Alice wishes to en-
`crypt her documents and only store the ciphertext on Bob.
`Each document can be divided up into ‘words’. Each ‘word’
`may be any token; it may be a 64-bit block, an English
`word, a sentence, or some other atomic quantity, according
`to the application domain of interest. For simplicity, we typ-
`ically assume these ‘words’ have the same length (otherwise
`we can either pad the shorter ‘words’ or split longer ‘words’
`to make all the ‘words’ to have equal length, or use some
`simple extensions for variable length ‘words’; see also Sec-
`tion 5.3). Because Alice may have only a low-bandwidth
`network connection to the server Bob, she wishes to only
`
`retrieve the documents which contain the wordW . In or-
`ument contains the wordW without learning anything else.
`
`der to achieve this goal, we need to design a scheme so that
`after performing certain computations over the ciphertext,
`Bob can determine with some probability whether each doc-
`
`ity is to build up an index that, for each wordW of interest,
`lists the documents that containW . An alternative is to per-
`
`There seem to be two types of approaches. One possibil-
`
`form a sequential scan without an index. The advantage of
`using an index is that it may be faster than the sequential
`scan when the documents are large. The disadvantage of
`using an index is that storing and updating the index can be
`of substantial overhead. So the approach of using an index
`is more suitable for mostly-read-only data.
`We first describe our scheme for searching on encrypted
`data without an index. Since the index-based schemes seem
`to require less sophisticated constructions, we will defer
`discussion of searching with an index until the end of the
`paper (see Section 5.4).
`
`3 Background and Definitions
`
`Our scheme requires several fundamental primitives
`from classical symmetric-key cryptography. Because we
`will prove our scheme secure, we use only primitives with
`a well-defined notion of security. We will list here the re-
`quired primitives, as well as reviewing the standard defini-
`tions of security for them. The definitions may be skipped
`on first reading for those uninterested in our theoretical
`proofs of security.
`We adopt the standard definitions of security from the
`provable security literature [2], and we measure the strength
`of the cryptographic primitives in terms of the resources
`
`a cryptographic primitive if the attack algorithm succeeds
`
`needed to break them. We will say that an attackR-breaks
`in breaking the primitive with resources specified byR, and
`we say that a crypto primitive isR-secure if there is no al-
`gorithm that canR-break it. LetA:f0;1g!f0;1g
`be an arbitrary algorithm and letX andY be random vari-
`ables distributed onf0;1g
`ofA—sometimes called the advantage ofA—forX andY
`AdvA=j[AX=1℄[AY=1℄j:
`1. A pseudorandom generatorG, i.e., a stream cipher.
`We say thatG: G!S is a;e-secure pseu-
`dorandom generator if every algorithmA with run-
`ning time at most has advantage AdvA<e. The
`advantage of an adversaryA is defined as AdvA=
`j[AGU G=1℄[AUS=1℄j, where
`U G;US are random variables distributed uniformly
`on G;S.
`2. A pseudorandom functionF . We say thatF: F
`X!Y is a;;e-secure pseudorandom function
`if every oracle algorithmA making at most oracle
`
`. The distinguishing probability
`
`is
`
`With this background, our list of required primitives is
`as follows:
`
`Page 2 of 12
`
`Netskope Exhibit 1018
`
`
`
`queries and with running time at most has advantage
`AdvA<e. The advantage is defined as AdvA=
`j[AFk=1℄[AR=1℄j whereR represents
`all maps fromX toY, and where the probabilities are
`taken over the choice ofk andR.
`3. A pseudorandom permutationE, i.e., a block cipher.
`We say thatE: EZ!Z is a;;e-secure pseu-
`dorandom function if every oracle algorithmA making
`at most oracle queries and with running time at most
` has advantage AdvA<e. The advantage is defined
`as AdvA=j[AEk;E1k=1℄[A(cid:25);(cid:25)1=1℄j
`where(cid:25) represents a random permutation selected uni-
`formly from the set of all bijections onZ, and where
`the probabilities are taken over the choice ofk and(cid:25).
`In general, the intuition is that;;e-security represents
`resistance to attacks that use at most offline work and at
`most adaptive chosen-text queries.
`We rely on the following notation. Iff: X!Y
`writefkx for the result of applyingf to inputx with key
`k2 . We writehx;yi for the concatenation ofx andy,
`andxy for the bitwise XOR ofx andy. For the remain-
`der of the paper, we letG: G!X`
`generator for some`,F: FX!Y be a pseudo-
`random function, andE: EZ!Z be a pseudoran-
`dom permutation. Typically we will haveX=f0;1g
`Y=f0;1g
`, andZ=XY=f0;1g
`sequence of wordsW1;:::;W`.
`
`In this section, we introduce our solution for searching
`with sequential scan. We first start with a basic scheme
`and show that its encryption algorithm provides provable
`secrecy. We then show how we can extend the first scheme
`to handle controlled searching and hidden searches. We de-
`scribe our final scheme which satisfies all the properties we
`mentioned earlier including query isolation at the end.
`
`works by computing the bitwise exclusive or (XOR) of the
`clear-text with a sequence of pseudorandom bits which have
`a special structure. This structure will allow to search on the
`data without revealing anything else about the clear text.
`More specifically, the basic scheme is as follows. Alice
`
`using some stream cipher (namely, the pseudorandom gen-
`
`can decrypt. Of course, encryption can be done on-line, so
`that we encrypt each word as it becomes available.
`
`sition in the document. Another alternative is to choose a
`
`More generally, at each position, Alice can either (a) choose
`
`how this flexibility allows us to support a variety of inter-
`esting features.
`The basic scheme provides provable secrecy if the pseu-
`
`are secure. By this, we mean that, at each position where
`
`truly random bits for any computationally-bounded adver-
`sary. We formalize the theorem as below.
`
`tor, and if the key material is chosen as described above,
`then the algorithm described above for generating the se-
`
`In other words, we expect the basic scheme to be good
`
`the pseudorandom function and pseudorandom generator
`are adequately secure. See Appendix A for a slightly more
`precise statement of the theorem and for a full proof.
`The basic scheme supports searches over the ciphertext
`
`generates a sequence of pseudorandom valuesS1;:::;S`
`eratorG), where eachSi is bits long. To encrypt
`a-bit wordWi that appears in positioni, Alice takes the
`pseudorandom bitsSi, setsTi:=hSi;FkiSii, and outputs
`the ciphertextCi:=WiTi. Note that only Alice can gen-
`erate the pseudorandom streamT1;:::;T` so no one else
`There is some flexibility in how the keyski may be cho-
`sen. One possibility is to use the same keyk at every po-
`new keyki for each position independent of all other keys.
`ki to be the same as some previouskj (j<i), or (b) choose
`ki independently of all the previous keys. We shall see later
`dorandom functionF and the pseudorandom generatorG
`ki is unknown, the valuesTi are indistinguishable from
`Theorem 4.1. IfF is a;`;eF-secure pseudorandom
`function andG is a;eG-secure pseudorandom genera-
`quencehT1;:::;T`i is a(cid:15);e-secure pseudorandom
`generator, wheree=` eFeG``1=2jXj and the
`constant(cid:15) is negligible compared to.
`for encrypting up to about`max=2 =2 words, if
`in the following way: if Alice wants to search the wordW ,
`she can tell BobW and theki corresponding to each lo-
`cationi where a wordW may occur. Bob can then search
`forW in the ciphertext by checking whetherCiWi is
`of the formh;Fkii for some. Such a search can be
`not knowki, Bob learns nothing about the plaintext. Thus,
`phertext, Alice should reveal only theki corresponding to
`
`performed in linear time. At the positions where Bob does
`
`the scheme allows a limited form of control: if Alice only
`wants Bob to be able to search over the first half of the ci-
`
`a random function selected uniformly from the set of
`
`Notice that the adversary is given an oracle for encryp-
`tion as well as for decryption; this corresponds to the
`adaptive chosen-plaintext/ciphertext attack model.
`
`There is of course no fundamental need for three sepa-
`rate primitives, since in practice all three may be built out
`of just one off-the-shelf primitive. For instance, given any
`block cipher, we may build a pseudorandom generator us-
`ing the counter mode [3] or a pseudorandom function using
`the CBC-MAC [4].
`
`represents a pseudorandom function or permutation, we
`
`be a pseudorandom
`
`,
`
`.
`
`4 Our Solution with Sequential Scan
`
`4.1 Scheme I: The Basic Scheme
`
`Alice wants to encrypt a document which contains the
`Intuitively, the scheme
`
`Page 3 of 12
`
`Netskope Exhibit 1018
`
`
`
`Plaintext
`
`Stream Cipher
`
`Ciphertext
`
`Figure 1. The Basic Scheme
`
`the ciphertext.
`As described so far, the basic scheme is not terribly sat-
`
`ice to control which chapters Bob may search in as well as
`controlling which words Bob may search for.
`We can take this idea even further by using a hierarchi-
`
`ing the entire document), or Alice must know in advance
`
`Then she can reveal either
`
`the purpose of remote searching). However, we shall see
`next how to take care of this difficulty.
`
`4.2 Scheme II: Controlled Searching
`
`This scheme still does not support hidden search queries:
`in order to let Bob search for the location where the word
`
`that this problem can be easily fixed.
`
`4.3 Scheme III: Support for Hidden Searches
`
`Alice and never be revealed. Then, if Alice wish to allow
`
`Suppose Alice would now like to ask Bob to search for
`
`might occur, but reveals absolutely nothing on the locations
`
`trolled searching. We show the correctness of this approach
`in the following theorem.
`
`propose a simple extension to the above scheme to support
`this goal.
`
`clear text separately using a deterministic encryption algo-
`
`
`Wi
`(cid:18)R
`
`
`FkiSi
`Si
`(cid:30) (cid:29)6
`Fki
`:=fk0hC;Wi. This allows Al-
`those locations and none of theki used in the second half of
`W in chapterC aski
`isfying: if Alice wants to help Bob search for a wordW ,
`cal key management scheme. Alice setski:=fh0;Ci
`either Alice must reveal all theki (thus potentially reveal-
`:=fk0h1;Wii.
`and
`(1)ffk0h1;Wih0;Ci for each chapter of interest or (2)
`which locationsW may appear at (which seems to defeat
`fk0h1;Wi itself if she wishes to succinctly authorize Bob
`to search forW in all the chapters.
`W appears, Alice has to revealW to Bob. We shall see next
`Letf: Ff0;1g ! F be an additional pseudo-
`random function, which will be keyed independently ofF .
`The main idea is to choose our keys aski:=fk0Wi.
`We require thatk0
`be chosen uniformly randomly in by
`a wordW but she is not willing to revealW to Bob. We
`Bob to search for the wordW , she revealsfk0W andW to
`him. This allows Bob to identify all the locations whereW
`Alice should merely pre-encrypt each wordW of the
`i whereWi6=W . This attains our desired goal of con-
`rithmEk00 . Note thatE is not allowed to use any random-
`ness, and the computation ofEk00x may depend only on
`x and must not depend on the positioni in the document
`Theorem 4.2. SupposeF is a;`;eF-secure pseudoran-
`wherex is found. So we may think of this pre-encryption
`dom function,f is a;`;ef-secure pseudorandom func-
`tion, andG is a;eG-secure pseudorandom generator.
`long, internally the mapEk00 may be implemented by CBC-
`encryptingWi with a constant IV, or some such, but the
`hT1;:::;T`i will be a(cid:15);e-secure pseudorandom
`generator, wheree=` eFefeG``1=2jXj.
`of the document.) We letXi:=Ek00Wi.
`E-encrypted wordsX1;:::;X`. Now she post-encrypts
`scribed above to obtainCi
`:=XiTi, whereXi=
`Ek00Wi andTi=hSi;FkiXii:
`To search for a wordW , Alice computesX:=Ek00W
`andk:=fk0X, and sendshX;ki to Bob. Note that this
`alternative approach is to generate the keyki for the word
`
`This shows that our scheme for controlled searching is
`about as good as the basic scheme, if the underlying prim-
`itives are secure. See Appendix A for a proof as well as a
`more precise formulation.
`Various extensions of this idea are possible. If the doc-
`ument to be encrypted consists of a series of chapters, an
`
`then
`If the key material is chosen as described above,
`the algorithm described above for generating the sequence
`
`step as ECB encryption of the words of the document us-
`ing some block cipher.
`(Of course, if the word is very
`
`point is that this process must be the same at every position
`
`After the pre-encryption phase, Alice has a sequence of
`
`that sequence using the stream cipher construction de-
`
`Page 4 of 12
`
`Netskope Exhibit 1018
`
`
`
`??
`Wi
`PlaintextE
`EWi
`(cid:18)R
`
`
`
`FkiSi
`Si
`(cid:30) (cid:29)6
`Fki
`distinct. (Assuming that the pre-encryptionE is a pseudo-
`allows Bob to search forW without revealingW itself. It
`property as long as the pre-encryptionE is secure.
`crypting` words is at most``1=2 1: )
`tion, meaning that even when a single keyki is revealed, no
`positions where the corresponding wordWi occurs.
`keyski aski:=fk0Ek00Wi then Alice can no longer
`Theorem 4.3. SupposeE is a;`;eE-secure pseudoran-
`would need to knowEk00Wi (more precisely, the last
`dom permutation,F is a;`;eF-secure pseudorandom
`bits ofEk00Wi) before she can decrypt. This defeats the
`function,f is a;`;ef-secure pseudorandom function,G
`is a;eG-secure pseudorandom generator, and we choose
`scribed above for generating the sequencehT1;:::;T`i will
`be a a(cid:15);e-secure pseudorandom generator, where
`e=` eFefeG``1=2jXj.
`Moreover, if we disclose oneki and consider the reduced
`sequenceT0
`scheme, we split the pre-encrypted wordXi=Ek00Wi
`obtained by discarding all theTj values at po-
`sitions whereWj=Wi, then we obtain a(cid:15);e0-secure
`into two parts,Xi=hi;Rii, wherei (respectivelyRi)
`pseudorandom generator, wheree0=eeE`=jXj.
`denotes the first bits (resp. last bits) ofXi. Instead
`of generatingki:=fk0EWi, Alice should generateki
`aski:=fk0i. To decrypt, Alice can generateSi using
`tually requireE to be a pseudorandom permutation: if
`and withSi she can recoveri by XORingSi against the
`denotes the (keyed) map sendingW to the first bits
`first bits ofCi. Finally, knowledge ofi allows Alice
`ofEk00W, then we can make do with the much weaker
`to computeki and thus finish the decryption.
`assumption that collisions in should be rare. As a special
`This fix is not secure if theWi’s are not encrypted since it
`case, if the firstb bits of (b(cid:20) ) can be shown to be
`a pseudorandom function, thenE will necessarily have the
`the same first bits. Pre-encryption will eliminate
`this problem, since with high probability all thei’s are
`
`purpose of an encryption scheme, because even legitimate
`principals with access to the decryption keys will be unable
`to decrypt. (Scheme II also has a similar inadequacy, but
`as we will show below, the best way to fix it is to introduce
`pre-encryption as in Scheme III.)
`We now show a simple fix for this problem. In the fixed
`
`the pseudorandom generator (since Alice knows the seed),
`
`Strictly speaking, the proof of the theorem does not ac-
`
`might be very likely in some cases that different words have
`
`required property, and we will be able to prove a result anal-
`ogous to Theorem 3. This suggests that for pre-encryption
`
`Stream Cipher
`
`Ciphertext
`
`Figure 2. The Scheme for Hidden Search
`
`is easy to see that this scheme satisfies the hidden search
`
`random permutation, then due to the birthday paradox [15],
`the probability that at least one collision happens after en-
`
`4.4 Scheme IV: The Final Scheme
`
`Careful readers may have noticed that Scheme III ac-
`tually suffers from a small inadequacy: if Alice generates
`
`recover the plaintext from just the ciphertext because she
`
`With this fix, the resulting scheme is provably secure,
`and in fact we can also show that it provides query isola-
`
`extra information is leaked beyond the ability to identify the
`
`the key material as described above. Then the algorithm de-
`
`Page 5 of 12
`
`Netskope Exhibit 1018
`
`
`
`??
`PlaintextWiEEWi
`
`(cid:27)
`i
`Ri
`(cid:18)R
`
`
`
`FkiSi
`Si
`(cid:30) (cid:29)6
`Fki
`of long blocks it may be convenient to takeEk00W to be
`the bit-reversal of the CBC encryption ofW under keyk00
`queries which use boolean operators (e.g.,W andW0
`proximity queries (e.g.,W nearW0
`(e.g.,W immediately precedesW0
`For example, if Alice wishes to search for the wordab[a–z℄,
`faba;abb;:::;abzg. However, the number of queries re-
`
`amples to illustrate that it is relatively easy to implement
`more advanced searching functionality using our scheme as
`a fundamental building block.
`It is clear that we can easily support advanced search
`),
`), and phrase searches
`).
`We can also support searches if the query is given as a
`regular expression using, e.g., wildcards in a limited form.
`
`Stream Cipher
`
`Ciphertext
`
`Figure 3. The Final Scheme
`
`(using a constant IV).
`After encrypting word by word, Alice can also re-order
`the ciphertext according to some pseudorandom permuta-
`tion (known only to Alice). In this case, when Bob performs
`the search of a word, he will not be able to tell the position
`where the word occurs in the real plaintext.
`
`5 Discussion
`
`5.1 Other Practical Considerations
`
`We can see that updates in this scheme are easy. For ex-
`ample, if Alice wants to add a new document into Bob’s
`data storage, she can simply encrypt it in the appropriate
`way and instruct Bob to append it to the already-stored ci-
`phertext. Moreover, since the keys can be generated hierar-
`chically from a master key, the key storage and management
`is also very convenient: Alice only needs to remember one
`password, the master key.
`The underlying technique of embedding information in
`pseudorandom bit streams may also be of independent in-
`terest: we speculate that this simple trick might prove useful
`for other applications, too.
`
`5.2 Supporting More Advanced Search Queries
`
`The schemes we presented earlier only address the prob-
`lem of searching for a single word. We show several ex-
`
`then she can actually generate 26 search queries of the form
`
`quired (and the information leaked to the server) clearly
`grows dramatically as the search word becomes more gen-
`eral.
`For many applications the purpose of the search is to find
`documents which contain a specific word, where the posi-
`tion or the number of occurrences are not relevant. For ex-
`ample, searching email is such an application, where the
`query takes on the form “find all email from Joe”. For this
`application, the previous search schemes leaked informa-
`tion, since the server would know the positions of the word
`in the document, or at least the frequency of a word in a doc-
`ument in case the word order is scrambled. Since we only
`need to know whether a given document contains a word,
`we can use the following trick. We add a count to each
`word, which counts how many times that word occurs pre-
`viously in that document. For example, the first occurrence
`
`of the word “urgent” is stored ash0; urgenti, the second oc-
`currence ash1; urgenti, etc. This allows Alice to search for
`
`Page 6 of 12
`
`Netskope Exhibit 1018
`
`
`
`lows Alice to search for documents, that contain or more
`occurrences of the wordW by searching forh1;Wi.
`
`the first occurrence only, if she wishes just to identify the
`documents where the word appears; and Bob does not gain
`any information about other positions of the search term in
`the document. As an additional feature, this encoding al-
`
`5.3 Dealing with Variable-Length Words
`
`In our scheme, the minimal unit we can search for is an
`individual word. So far we have assumed that the clear text
`can be easily broken into a sequence of words of a fixed
`length. But this might not be true in a normal text document.
`For example, if the minimal unit of search interest is one
`English word, then we have to deal with the fact that English
`words differ in length.
`One possibility is to pick a fixed-size block that is long
`enough to contain most words. Words that are too short
`or too long may be padded to a multiple of the block size
`with some pre-determined padding format. (Note that the
`padding cannot be random since Alice needs to know the
`padding in order to perform the search.) However, such a
`padding scheme would introduce space inefficiency. Also,
`for security reasons we cannot decrease the word length be-
`low a certain limit.
`Another solution is to use variable length words. In this
`case, to support random-access decryption, the length of
`each word also needs to be stored with the word. One natu-
`ral approach is to store the length field before each word in
`the file, and to glue the length field and word together as one
`word to perform encryption and search using our standard
`schemes.
`When words lengths may vary, it is important to hide the
`length information from the server, because revealing the
`length of each word might allow for statistical attacks. For-
`tunately, in this case the server does not need to know the
`lengths to perform a search: he may just scan through the
`file and check for a match at each possible bit boundary.
`In this case, the cost of each scan is increased, because the
`number of operations is determined by the bit-length of the
`document rather than by the number of blocks in the docu-
`ment. However, such an approach may provide better space
`efficiency than is available with a block-oriented scheme.
`
`5.4 Searching with an Encrypted Index
`
`Sequential scan may not be efficient enough when the
`data size is large.
`For some applications,
`i.e.
`large
`databases, a common technique to speed up the searching
`is to use a pre-computed index. Here we show how we can
`answer search queries with the aid of an encrypted index
`without sacrificing security.
`
`An index contains a list of key words; with each key
`word is a list of pointers to documents where the key word
`appears. The key words are words of interest that Alice may
`want to search for later. Alice can certainly build the index
`of her clear text documents and then encrypt the clear text
`and the index and store the ciphertext on Bob. The interest-
`ing question is how to encrypt the index.
`A naive way would be to just encrypt the key words in
`the index and leave the lists of positions in clear. This makes
`it easy for Bob to perform search queries on Alice’s behalf,
`but also leaks a lot of information to Bob and hence may
`allow him to apply various statistical attacks. Therefore, we
`reject this naive approach.
`A simple way is to also encrypt the document pointers in
`each list in the index. Consequently, when Bob searches for
`
`EW and finds a match, he returns Alice the encrypted list
`pointers in the index using a keykW relatedEW, i.e.
`kW=fk000EW. Hence, when Alice wants to search
`for the wordW , she will revealhEW;kWi to Bob. In
`
`of matching positions from the index. Alice may decrypt
`the encrypted entries and send Bob another request to re-
`trieve the relevant documents. One possible advantage for
`this scheme is that the request could be embedded in other
`retrievals so that Bob might have uncertainty about the cor-
`relation of the search request and the retrieval request for
`ciphertext. The disadvantage is that Alice has to spend an
`extra round-trip time to retrieve the documents.
`If Alice does not want to wait for an extra round-trip
`time, or if Alice would like Bob to merge the results of sev-
`eral search queries for her, other techniques are also avail-
`able. For example, she may encrypt the list of document
`
`order to prevent Bob from doing statistical analysis on the
`index, it is better to keep the lists of pointers in a fixed-size
`list. For words that appear infrequently, Alice can pad the
`list to the fixed size. For more common words, Alice can
`split the long list into several lists with the fixed size; then,
`to search for such a word, Alice will need to ask Bob to
`perform and merge several search queries in parallel. Note
`that by keeping the lists of pointers in a fixed-size list, we
`are mainly preventing Bob from learning statistical infor-
`mation on the key words that he has not searched. For the
`key words that Bob has searched, he might still be able to
`learn some statistical information from Alice’s access pat-
`tern. This is acceptable from our point of view since Alice
`only wants to retrieve relevant documents in the first place.
`Note that a general disadvantage for index search is that
`whenever Alice changes her documents, she must update
`the index. There is a trade-off between how much index Al-
`ice updates and how much information Bob might be able to
`learn. For example, if Alice does not change the list of doc-
`ument pointers for one key word entry when she adds a new
`document into Bob’s data storage, Bob would be able to
`tell that the key word does not appear in the new document.
`
`Page 7 of 12
`
`Netskope Exhibit 1018
`
`
`
`Hence, Alice needs to update a substantial part of the index
`to hide the real updates which can be quite expensive. It is
`an interesting research question to develop schemes which
`support more efficient updates.
`
`5.5 More Security Issues
`
`W we effectively disclose to him a list of potential locations
`whereW might occur. If we allow Bob to search for too
`One possible defense is to decrease (so that false matches
`
`In all our schemes, by allowing Bob to search for a word
`
`many words, he may be able to use statistical techniques to
`start learning important information about the documents.
`
`are more prevalent and thus Bob’s information about the
`plaintext is ‘noisy’), but we have not analyzed the cost-
`effectiveness of this tradeoff in any detail.
`A better defense is for Alice to periodically change the
`key, re-encrypt all the documents under the new key, and re-
`order the ciphertext according to some pseudorandom per-
`mutation (known to Alice but not to Bob). This will help
`prevent Bob from learning correlations or other statistical
`information over time. This technique may also be help-
`ful if Alice wants to hide from Bob the places where the
`searched word occurs in the documents of interest.
`In all the schemes we have discussed so far, we must
`trust Bob to return all the search results. If Bob holds out on
`us and returns only some (but not all) of the search results,
`Alice will have no way to detect this. In our scope of inter-
`est, we assume Bob does not misbehave in this way. Even
`when this type of attack is present, it is possible to combine
`our scheme with hash tree techniques [17] to ensure

Accessing this document will incur an additional charge of $.
After purchase, you can access this document again without charge.
Accept $ ChargeStill Working On It
This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.
Give it another minute or two to complete, and then try the refresh button.
A few More Minutes ... Still Working
It can take up to 5 minutes for us to download a document if the court servers are running slowly.
Thank you for your continued patience.

This document could not be displayed.
We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.
You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.
Set your membership
status to view this document.
With a Docket Alarm membership, you'll
get a whole lot more, including:
- Up-to-date information for this case.
- Email alerts whenever there is an update.
- Full text search for other cases.
- Get email alerts whenever a new case matches your search.

One Moment Please
The filing “” is large (MB) and is being downloaded.
Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!
If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document
We are unable to display this document, it may be under a court ordered seal.
If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.
Access Government Site