WebSeer: An Image Search Engine for the World Wide Web1
`
`Charles Frankel, Michael J. Swain, and Vassilis Athitsos
`
`The University of Chicago
`Computer Science Department
`1100 East 58th Street
`Chicago, Illinois 60637
`
`Technical Report 96-14
`
`August 1, 1996
`
`Abstract
`
`Because of the size of the World Wide Web and its inherent lack of structure, finding what one is
looking for can be a challenge. PC-Meter's March 1996 survey found that three of the five most
visited Web sites were search engines. However, while Web pages typically contain both text
`and images, all the currently available search engines only index text. This paper describes
`WebSeer, a system for locating images on the Web. WebSeer uses image content in addition to
`associated text to index images, presenting the user with a selection that potentially fits her
`needs.
`
`
`
1This work was supported in part by ONR contract N00014-93-1-0332 and NSF Grant No. IRI-9210763-A01.
`
`Petitioner Apple Inc. - Exhibit 1015, p. 1
`
`
`
`Introduction
`
`The explosive growth of the World Wide Web has proven to be a double-edged sword. While an
`immense amount of material is now easily accessible on the Web, locating specific information
`remains a difficult task. An inexperienced user may find it next to impossible to find the
`information she wants; even an experienced user may miss relevant Web pages. Users
`searching the World Wide Web have a number of options presently available to them. Lycos,
`Excite, Alta Vista, and Yahoo! are but a few examples of useful search engines. All of these
`systems have been designed primarily to find text-based information on the Web. WebSeer is
`designed to find the other major source of information currently available on the Web: images.
`
`In order to find specific images, a method of discovering and indexing their contents must first
`be developed. The pixels alone are insufficient; some additional information is needed to make
`it possible to search for specific images. Discovering image content has proven to be an
`extremely difficult problem. One approach has been to supplement the image content with other
`types of information associated with the image. Ogle and Stonebraker’s Cypress system [1][2]
`uses information contained in hand-keyed database fields to supplement image content
`information. Srihari’s Piction system [3] uses the captions of newspaper photographs containing
`human faces to help locate the faces. IBM’s QBIC system [4] relies on the user specifying
`specific visual cues or providing an example image (e.g. a sketch) to begin a query for an image.
`In Picard and Minka’s Foureyes system [5][6] close interaction with a human user supplements
`information derived from the image content.
`
`WebSeer uses the textual information surrounding an image and the image header to supplement
`the information derived from analyzing the image content. This additional information is used to
`create a context in which image analysis algorithms can effectively operate. Image analysis
`algorithms are then used to classify the image within a taxonomy of types (drawing, photograph,
`portrait, etc.) and to extract useful semantic information.
`
`Like text search engines, WebSeer does not have to access the original data to respond to a
`query; all analysis of the image and surrounding text is done off-line during the creation of the
`database. In this way, WebSeer will be able to give fast query responses to a possibly huge
`number of users.
`
`The first part of this report describes the types of information that an image search engine such
`as WebSeer could extract from the Web and how they could be stored in a database. The second
`part illustrates a number of example queries and how images could be found that fit the queries.
`The final part details the prototype WebSeer program and the algorithms that have been
`implemented to date.
`
`How to Find Images on the Web
`
`
`
`
`Information for finding images on the World Wide Web can come from two sources: the
`relevant text and the image itself. Using information from both sources, a program
`should be able to successfully retrieve requested images.
`
`Cues from the Text and HTML Source Code
`
An HTML document is a structured document, and understanding the structure of a particular
document can reveal valuable information about the images contained on that page. There are
several places where relevant information about image content may be located within the
document. These are listed below, ordered by the likelihood of the text being useful in an image
search.
`
`1. Image File Names: File names contain important clues as to the content of the file. In many
`cases the full URL of an image (e.g. “http://www2.int-evry.fr/~hug/images/marilyn/mm-
`trans.gif”) contains more information than a relative URL (e.g. “mm-trans.gif”), which may
be included in the HTML source. We have found that the file name and the name of the top-level
directory containing the file are the most useful for indexing the content of the image. A caveat
is that file names may be difficult to decipher because of abbreviations used for brevity or out
of necessity (e.g. in the FAT file system used in DOS/Windows 3.1).
`
2. Image Captions: Often, images have captions that describe the picture. Although HTML
does not have an explicit caption tag, captions are often signaled as text that is located within
the same center tag as the image (http://www.lib.uchicago.edu/~mozart/Marilyn/monthly.html),
within the same cell of a table as an image (http://studentweb.tulane.edu/~llovejo/marilyn/), or
within the same paragraph (see Appendix A).
`
3. ALT= Text: Image fields in HTML documents may contain a special ALT= section for
alternate text, which is displayed in the event that the image cannot be loaded. This
alternate text may briefly describe the content of the image. For instance, the alternate text
for a photograph of a person may contain simply the name of the person in the
photograph. The alternate text may also give hints as to the intended “goal” of the image.
For instance, an image containing an ad for Coca-Cola may have the alternate text “Buy
Coca-Cola.”
`
4. HTML Titles: The HTML title of the document (which is displayed above the viewed page
by the browser, is used for history lists, and so on) often provides information about the
content of the images contained within the document. For example, the page
http://tomahawk.cf.ac.uk/~scotw/marilyn/page2/page2.html is entitled “Marilyn Monroe
Images - The ventilator scene”, which is highly descriptive of the contents, and more
descriptive than any other cues that can be found on the page (see Appendix A).
`
`5. HyperLinks: The text of a hyperlink usually gives clues about what the link refers to. In
`many cases, images are referenced by hyperlinks rather than being imbedded within the text
`of the page. An example of such a link is “Marilyn JPG #21 (Jumping for joy)”, at
http://www.netins.net/showcase/agency/marilyn.html. The same photo is referenced as
“Marilyn jumping for joy” at
http://www.rtmol.stt.it/rtmol/onsite/stars/marilyn/m_imag~1.html, where only the italicized
text is the text of the hyperlink. In many cases, text hyperlinks assume the reader
understands that the pictures are of Monroe and do not give any information about who is
in the picture. These links say “Simply gorgeous ....”, “Leaning on a banister”, and so on.
Consequently, other text in the same sentence or phrase with the hyperlink can also be
relevant to deciphering the image content.
`
`6. Other text: In addition to those types mentioned above, other text can give cues about the
`images contained on the page. This text could be used to index the images on the page, with
`a lower likelihood of reference than the text categories listed above.
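As an illustrative sketch of the file-name cue (item 1 above), an image URL can be tokenized into indexable words. This is Python rather than the system's C++, and the helper name and regular expression are our own, not WebSeer's; for simplicity the sketch keeps every path component, not only the top-level directory.

```python
import re
from urllib.parse import urlparse

def filename_keywords(url):
    """Split an image URL into indexable words, keeping the directories
    on the path and the file name (minus its extension)."""
    path = urlparse(url).path
    parts = [p for p in path.split("/") if p]
    name = re.sub(r"\.(gif|jpe?g|png)$", "", parts[-1], flags=re.IGNORECASE)
    words = []
    for piece in parts[:-1] + [name]:
        # split on anything that is not a letter, lower-case everything
        words.extend(w.lower() for w in re.split(r"[^A-Za-z]+", piece) if w)
    return words

print(filename_keywords("http://www2.int-evry.fr/~hug/images/marilyn/mm-trans.gif"))
# ['hug', 'images', 'marilyn', 'mm', 'trans']
```

The word "marilyn" from the directory name is exactly the cue a relative URL such as "mm-trans.gif" would have lost.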
`
`Cues from the Image Content
`
`Although it is clear that the context provided by the surrounding text can produce an effective
`image search engine, examining the images themselves can substantially enrich the search. In
`this section we will summarize some of the relevant information that can be derived directly
`from the image content.
`
The following attributes can be easily obtained from the image header.

• Grayscale vs. color: Some may wish to search for only color images. While color images
can be converted to grayscale, those in search of black-and-white photography will want the
real thing.

• Image Size: This is usually useful for those who wish to screen out images smaller than a
certain size.

• File Type (JPEG, GIF, GIF89a, etc.): Some searchers may wish to avoid the compromises in
quality introduced by constraints of these formats, especially the limited color palette in the
GIF format. GIF89a files can be animations, and so will often be of separate interest from the
other, static image formats.

• File Size: For those with slow connections or limited RAM or hard drive space, overly large
files may be of less interest.

• File Date (when present): If someone is looking for newer images, the date can be used to
screen out images that are certainly older than a certain date.
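All of these attributes can be read without decoding the full image. As a hedged sketch (not the WebSeer code), the version, dimensions, and palette size of a GIF can be pulled straight from its header bytes:

```python
import struct

def gif_header_info(data):
    """Parse the first bytes of a GIF file: version, dimensions,
    and the size of the global color palette."""
    signature, version = data[:3], data[3:6]
    if signature != b"GIF":
        raise ValueError("not a GIF file")
    # logical screen width and height are little-endian 16-bit integers
    width, height = struct.unpack("<HH", data[6:10])
    packed = data[10]
    # high bit: global color table present; low 3 bits: its size
    palette_size = 2 ** ((packed & 0x07) + 1) if packed & 0x80 else 0
    return {
        "version": version.decode("ascii"),   # "87a" or "89a"
        "width": width,
        "height": height,
        "palette_colors": palette_size,
    }

# a minimal fabricated header: GIF89a, 204 x 107, 256-color palette
header = b"GIF89a" + struct.pack("<HH", 204, 107) + bytes([0xF7, 0, 0])
print(gif_header_info(header))
# {'version': '89a', 'width': 204, 'height': 107, 'palette_colors': 256}
```

Grayscale vs. color and the file date come from the palette contents and the HTTP headers respectively, which this sketch does not cover.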
`
`Obtaining information beyond that listed above generally requires some image analysis. The
`most useful sorts of image analysis can be grouped into three overlapping categories: 1.
`classifying the image into one of several types (e.g. photograph, hand-drawing, computer-
`generated drawing), 2. recognizing image structures (e.g. faces, horizons), and 3. distinguishing
`specific regions primarily by color/texture attributes (e.g. skin, sky, foliage).
`
`
`
`
`Classifying Images
`Creating a taxonomy of image types can facilitate separating images the user is interested in
`from those that she is not. However, the most important aspect of classification is creating the
`categories. The authors of an image search engine must either design a system that uses helpful
`categories, or include a mechanism to allow the user to guide her own search using relevant
`schema. Ideally, a system would use aspects of both types of categorization.
`
One important classification that can be made is between photographs on the one hand and
hand-drawn or computer-created drawings on the other. In many cases users may wish to find only
photographs of faces, and ignore the many other images that match the search string. Some of
`these non-photographs may be weeded out because they are too small (e.g. buttons, dividers,
`arrows). Others, such as image-maps containing text, advertisements, or drawings, must be
`eliminated on other criteria. Of course, more realistic drawings will be likely to be mistakenly
`categorized as photographs. Likewise photographs can be thresholded and manipulated in other
`ways to make them look more like drawings. Nonetheless, we have found that many images can
`be successfully classified as either drawings or photographs. It may also be possible to separate
`computer-generated drawings (i.e. those that are artificial-looking, with precisely straight lines
`and precise color gradients) from hand drawings.
`
`There are many other potentially useful categories. Images can be classified as portraits, which
`can be further classified by the number of people in the image and the amount of the body shown
`along with the face (close-up, head-and-shoulders, upper-body, etc.). Images can also be
`classified as indoor or out, and outdoor images can be classified as landscapes.
`
`Some of these categorizations may be best determined by analyzing global statistics, or the
`statistics of relatively large portions of the image (e.g. photo vs. drawing). Others require other
`sorts of analysis, which are described below.
`
`Recognizing Image Structures
`There are many types of objects to identify in images, which would aid content-based indexing.
`We have identified two “objects” that are both important to the typical WebSeer user and within
`the capabilities of existing computer vision technology. They are:
`
1. Faces: Face-finding algorithms have recently become fairly successful at finding faces in
grayscale photographs [7][8]. Faces in photographs are often seen upright, with the subject
looking at the camera, which is the easiest situation for face-finding algorithms. The high quality of
`many photographs found on the Web, in many cases produced by professionals with careful
`control of the lighting and high quality film and optics, improves the reliability of face
`detectors. Combining face detection with color information from skin detection should
`make a reliable face detector. A primary role of a face detector is to distinguish images of
`people from other types of images that may show up on the same page. However, face
`detection can do more. Users may wish to find a portrait of a single person, a group photo,
`or pictures of a “crowd” containing many faces. Portraits can be close-ups, head-and-
`shoulders shots, upper-body, or full-body shots; the size and location of the face can
`differentiate among these choices.
`
`
`
`
`2. Horizon: A number of systems search images for the presence of a horizon [9] (e.g. the
`Cypress System). Detecting a horizon in a photograph tells us that the image has been taken
`outdoors and allows us to roughly classify it as a landscape photograph.
`
`Face recognition is another possible task one might think of assigning the system. For instance,
`WebSeer could have a database of people that users are likely to search for, and identify their
`faces when creating the index. Restricting possible matches to those people mentioned in the
`text would greatly reduce the number of false positives the face recognizer would return. Further
`research will likely define other categories of objects that could be searched for and indexed in
`the same way as faces and horizons.
`
`Recognizing Regions by Color/Texture Analysis
`Some color and texture analysis may be useful for finding image regions containing skin, sky,
`foliage, etc. in photographs. Skin may be used for finding faces and determining if images are
`portraits; it is in itself of interest to some image searchers. Sky and foliage can also be used in
`image classification. Other less common color textures, including sand, wood, etc., may also be
`identified. If more classes such as these are created, it may prove useful to look for them only
`when the context suggests they may be present.
`
`The WebSeer Implementation
`WebSeer was implemented with three guiding principles in mind:
`
`First, WebSeer must have acceptable performance. We need to allow for extremely high
`speed searches, as we expect a large number of people to be using our system simultaneously.
`Since indexing occurs off-line, the performance of the indexing engine is less crucial than that of
`the search engine. Nonetheless, indexing the entire web in a reasonable amount of time requires
`the processing to proceed quite quickly. For example, assuming there are 10 million unique
`images on the web, indexing them in two weeks would require eight images to be processed
`every second. Crawler speed is also important. Preliminary results indicate that for every 100
`HTML pages on the Web, there are 40 (unique) GIF images and one (unique) JPEG image. In
`contrast to the file size of HTML pages, which averages around 6k, the average file size for GIF
`files is 11k, and the average file size for JPEGs is 35k.
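The throughput and bandwidth figures above are simple arithmetic; a quick check of the quoted numbers, under the stated assumptions of 10 million unique images and the measured per-100-page averages:

```python
# indexing rate: 10 million unique images in two weeks
images = 10_000_000
two_weeks = 14 * 24 * 60 * 60          # 1,209,600 seconds
print(round(images / two_weeks, 2))    # ≈ 8.27 images per second

# bytes fetched per 100 HTML pages, using the averages quoted above
html, gifs, jpegs = 100 * 6_000, 40 * 11_000, 1 * 35_000
print(html + gifs + jpegs)             # 1,075,000 bytes; images are ~44% of it
```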
`
`Second, we tried to incorporate standard commercial software and hardware whenever
`possible. Much work has been put into developing advanced database engines, and WebSeer’s
`ability to leverage technology adds significantly to its power. Microsoft’s Visual C++
`development environment was used for much of this project. Microsoft’s Foundation Classes in
`conjunction with the Standard Template Libraries (STL) provided many useful functions and
`abstractions during the development of this project. On the hardware front, the use of relatively
`inexpensive PCs allowed us to ramp up our storage and processing capacities quickly and within
`reasonable cost.
`
`Third, the project should provide a basis for experimentation. We foresee WebSeer
`evolving in the following ways:
`
`
`
`
• better image understanding algorithms

• advanced text indexing capabilities

• improved interactions between the image understanding and text indexing algorithms

• more complex transformations from form query submissions to search actions taken
`
`To facilitate this type of research, the project is divided in such a way that each component can
`be worked on independently. Additionally, we wish to facilitate the incorporation of relevant
new technologies as they become available. The WebSeer project is composed of five major
executables. With the exception of the URL Server, which is written in Java, all executables are
written in C++ and run on the Microsoft Windows NT 3.51 platform.
`
`1) The WebSeer Crawler crawls the web downloading both HTML pages and images.
`The crawler is multi-threaded so that the delay downloading pages is spread over
`multiple threads. Each thread is connected to a database of previously visited (and
`waiting to be visited) URLs using the ODBC 2.0 database protocol.
`
2) The URL Server is a multi-threaded Java application which receives requests to
`download URLs from the WebSeer Crawler. Separating the URL server from the
`Crawler application allows us to download pages from multiple machines (with
`different operating systems) simultaneously.
`
`3) The WebSeer Indexer creates the index which is searched by the users. The indexer
`parses the HTML code and executes the appropriate image understanding
`applications.
`
`4) The WebSeer CGI script is called when the user submits (POSTs) a query from the
`WebSeer form. This script opens a TCP/IP connection to the WebSeer Search
`Server, and formats the results for display to the user.
`
`5) The WebSeer Search Server accepts requests from the WebSeer CGI script and
`performs the appropriate searches based on the form fields which the user has filled
`in.
`
The WebSeer Crawler largely obeys the Robot Exclusion Protocol [a]. The Protocol is dependent
on system administrators including a robots.txt file on their web site, which lists areas of
the web site which robots should not visit. Most robots are designed to download only HTML
files, and so visiting a directory which includes images would be inappropriate. For this reason,
some robots.txt files exclude directories which contain only image files (see Appendix B).
Since the WebSeer Crawler is designed to download images, we obey the restrictions specified
by the robots.txt file when deciding whether to download an HTML file, but not when
downloading an image file.
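This asymmetric policy can be sketched with Python's standard urllib.robotparser; the sketch illustrates only the policy (the crawler itself is C++), and the "WebSeerCrawler" agent name and helper below are our own inventions:

```python
from urllib import robotparser

IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg")

def may_download(url, robots):
    """Apply robots.txt restrictions to HTML files only; image files
    are always fetched, following the policy described above."""
    if url.lower().endswith(IMAGE_EXTENSIONS):
        return True
    return robots.can_fetch("WebSeerCrawler", url)

rules = robotparser.RobotFileParser()
# normally rules.set_url(...) / rules.read() would fetch the live file;
# here we parse a small robots.txt inline
rules.parse("User-agent: *\nDisallow: /images/\n".splitlines())

print(may_download("http://example.com/images/pic.gif", rules))     # True
print(may_download("http://example.com/images/index.html", rules))  # False
```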
`
`
`
[a] The Protocol is specified at http://info.webcrawler.com/mak/projects/robots/norobots.html
`
`
`
`
The WebSeer Indexer is perhaps the most interesting component of this system. The indexer
processes a single HTML file at a time. As each image is encountered in the HTML file (either
through an “<img src=…>” tag or an “<a href=“this_image.gif”>” tag), the appropriate text is
`extracted in the form of whole words. These words may be present in any of the ways described
`above. Some of the words are more likely to contain information about the content of the image
`they refer to. For this reason, we weight the words according to the likelihood that they contain
`useful information. Words contained in the title of the HTML page, for example, have a lower
`weight than those in the ALT tag of an image. When a user performs a search the summed
`weights of the matching words are used as one criterion for sorting the resulting images. These
`words and their weights are stored in a single database table. Part of an example WebSeer text
table is shown below.
`
Image ID    Text Fragment    Fragment Weight
31308       kathy            1
31308       ireland          1
31308       vivek            1
31309       tn               3
31309       ki               3
31309       vivek            1
31309       kathy            1
31309       ireland          1
31311       marilyn          3
31311       gallery          2
31311       lovejoy          1
31311       marilyn          1
31311       monroe           1
31311       collection       1
Image understanding algorithms are run whenever an image is encountered. The table below
indicates the fields which WebSeer currently saves for each image.
`
`
`
`
Field Name                          Sample Data
File Name                           http://www.cdt.org/images/cdtlgo.gif
Source URL                          http://www.cdt.org/index.html
Color Depth                         8 bit color
File Size                           3,129 bytes
File Type                           gif
Image Width                         204
Image Height                        107
Is Image a Photograph?              No
Is Image an Image Map?              No
Is Image included as a Reference?   No
Number of Faces                     0
Largest Face Size                   0%
Three fields contain information about the image's classification and how it was included in the HTML
document. The “Is Image a Photograph” field contains the result of the photograph/drawing
algorithm described below. “Is Image an Image Map” indicates whether the image appeared in
the HTML code with an ISMAP tag; these images appear to the user as clickable images.
`
`“Is Image included as a Reference?” indicates whether the image appears as part of an HTML
`document, or whether only a reference is included. If an image is part of an HTML document,
`users viewing that document will immediately be able to see that image. If a reference is
`included, the user may have to click on some part of the document in order to view the image.
`References to images are common when the images are large and/or slow to download.
`
`The “Number of Faces” field indicates the number of faces which were detected in the given
`image. Each detected face has four attributes associated with it: horizontal position, vertical
`position, height, and width. Since an image may contain a number of different faces, the
`attributes (fields) associated with each particular face are saved in a separate “face table”.
`
`The “Largest Face Size” field saves the height of the largest face detected in the given image, as
`a percentage of image height. Although this information is also contained in the face table,
`saving the information in the image table speeds some searches by eliminating the need to
`perform a JOIN of the image table with the face table. An example is a search for a “close-up”
`image. We define close-up images as images where the largest face has a height greater than
`50% the size of the image.
`
`An Example Search
A user is interested in finding small, close-up images of Rebecca De Mornay. She types
“Rebecca De Mornay” as the search text, and makes selections as displayed in Figure 1.
`
`
`
`
The results page interface, shown in Figure 2, works as follows. Thumbnails of the resulting
images are displayed above a bar which indicates the size of the original image. Clicking on the
image will take the user's browser to the URL of the original image. Clicking on the page icon to the
right of the image will take the browser to the URL of the page which contains the image.
`
`Figure 1: An image search query
`
The results were obtained as follows. WebSeer searches for images which contain any of the
words in the query text, after eliminating words that are especially common (e.g. connecting
words such as “and” and “the”). The results are sorted by the summed weight of the words
associated with each image. For instance, if one image has associated words “Rebecca”, with a
`weight of 3, and “Mornay” with a weight of 1, that image would have a summed weight of 4.
`Only images with a height or width < 150 pixels, with file size < 10K, which were determined to
`be photographs, and which contain exactly one face whose height is at least 50% of the size of
`the image are included in the results. Lastly, the results are sorted, first by the weighted sum of
`associated words, and then, since “close up” was selected, by the detected face size (with the
`largest faces appearing first).
`
`Figure 2: Results of the query shown in Figure 1.
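The filter-then-sort logic of this example can be sketched as follows; the record field names are our own stand-ins for the database columns described earlier, and the size test follows the text of the example literally:

```python
def close_up_search(images, weights):
    """Filter to small photographs with exactly one large face, then
    sort by summed word weight and, within ties, by face size."""
    hits = [
        img for img in images
        if (img["width"] < 150 or img["height"] < 150)
        and img["file_size"] < 10_000
        and img["is_photograph"]
        and img["num_faces"] == 1
        and img["largest_face_pct"] >= 50
    ]
    # largest summed weight first; larger faces break ties ("close up")
    hits.sort(key=lambda i: (weights.get(i["id"], 0), i["largest_face_pct"]),
              reverse=True)
    return [img["id"] for img in hits]

images = [
    {"id": 1, "width": 120, "height": 140, "file_size": 8_000,
     "is_photograph": True, "num_faces": 1, "largest_face_pct": 60},
    {"id": 2, "width": 120, "height": 140, "file_size": 8_000,
     "is_photograph": True, "num_faces": 1, "largest_face_pct": 80},
    {"id": 3, "width": 300, "height": 300, "file_size": 8_000,
     "is_photograph": True, "num_faces": 1, "largest_face_pct": 90},
]
weights = {1: 4, 2: 4, 3: 9}
print(close_up_search(images, weights))  # [2, 1]: image 3 is too large
```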
`
`Image Content Analysis: Classifying Photographs and Drawings
`
`The algorithm classifies images into photographs and artificial images, where the terms
`"artificial images" and “drawings” are used to denote all images that are not photographs. There
are some borderline cases. The most common is images that contain both a photograph and an
artificial part, which could be, for example, a computer-generated background or superimposed
text. Currently, the algorithm is not designed to handle such images consistently.
`
The algorithm can be split into two independent modules. One module consists of tests
that help separate photographs from drawings. After the image has been submitted to all the tests,
the decision-making module decides how the image should be classified.
`
`
`
`
`Tests
`
`In every test an image gets a positive real number as a score. The images are represented
`by three matrices, corresponding to their red (R), green (G), and blue (B) color bands, whose
`entries are integers from 0 to 255. Photographs and drawings tend to score in different ranges.
`These are the tests we currently use in the algorithm:
`
• The band difference test. We pick a threshold T between 0 and 255 and initialize the
counter C to 0. For every pixel in the image, let r, g, b be its respective values in the R, G, B
matrices. We look at |r - g|, |g - b| and |r - b|, and we increase the counter C by one for each of
those absolute values that is greater than T. The score of the image is C/S, where S is the number
of pixels in the image. We expect drawings to score higher than photographs, because the colors
used in drawings are usually highly saturated.
`
• The farthest neighbor test. We pick a threshold T between 0 and 255 and initialize
counters C and S to 0. For each inner pixel P of the image, we consider its top, bottom, left, and
right neighbors. For each neighbor, we calculate the absolute difference of its R value from the R
value of P, and we look at the maximum M of those four absolute differences. If M > 0, we
increase S. If, in addition, M > T, we increase C. The score of the image is C/S.
The rationale behind this test is to see how abrupt the color changes are in an image. In
photographs they tend to be smoother than in drawings. Therefore, drawings tend to score
higher than photographs in this test.
`
`• The color test. We count the number of different colors in the image. Drawings tend
`to have fewer distinct colors than photographs.
`
• The most common color test. We find the color that occurs most frequently in the
image. The score is C/S, where C is the number of pixels in the image that have that color, and S
is the total number of pixels in the image. Again, artificial images usually score higher than
photographs, because they tend to have a one-color background.
`
• The narrowness test. Let R be the number of rows and C be the number of columns in
the image. If R > C, then N = R/C, otherwise N = C/R, where the score of the image is N.
Photographs tend to score between 1 and 2, whereas drawings often score above 2.
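The tests above can be transcribed almost directly. The sketch below is Python rather than the system's C++, with flat pixel lists and a 2-D list standing in for the R, G, B matrices; the function names and demo thresholds are our own:

```python
from collections import Counter

def band_difference_score(pixels, t):
    """pixels: list of (r, g, b) tuples. C/S, counting channel-pair
    differences greater than t (up to 3 per pixel)."""
    c = sum(1 for r, g, b in pixels
            for d in (abs(r - g), abs(g - b), abs(r - b)) if d > t)
    return c / len(pixels)

def farthest_neighbor_score(red, t):
    """red: 2-D list of R values. C/S over inner pixels whose largest
    neighbor difference M is nonzero; C counts those with M > t."""
    c = s = 0
    for y in range(1, len(red) - 1):
        for x in range(1, len(red[0]) - 1):
            m = max(abs(red[y][x] - red[y + dy][x + dx])
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
            if m > 0:
                s += 1
                if m > t:
                    c += 1
    return c / s if s else 0.0

def color_count(pixels):
    """Number of distinct colors in the image."""
    return len(set(pixels))

def most_common_color_score(pixels):
    """C/S: pixels of the most frequent color over all pixels."""
    return Counter(pixels).most_common(1)[0][1] / len(pixels)

def narrowness_score(rows, cols):
    """Ratio of the longer image side to the shorter one."""
    return rows / cols if rows > cols else cols / rows

# a saturated 2 x 2 patch behaves like a drawing on most tests
patch = [(255, 0, 128)] * 4
print(band_difference_score(patch, t=100))    # 3.0
print(most_common_color_score(patch))         # 1.0
print(narrowness_score(400, 100))             # 4.0
```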
`
`For images saved in the JPEG format, the color test doesn't work because of the way
`JPEG compression works. Neighboring pixels that have the same color in the original image
`usually have similar but not identical colors after the compression. In addition, abrupt color
`changes in the original images usually become smoother after compression. This makes the
`farthest neighbor test less powerful.
`
`
`
`
`Decision making
`
`Since GIF and JPEG images tend to follow different scoring patterns in the tests we use,
`the decision maker needs to know how images tend to score based both on their category
`(photograph or drawing) and their compression format. So, for each of the four cases (GIF
`photographs, GIF drawings, JPEG photographs, JPEG drawings), we created a training set of at
`least 100 images, selecting representative images from the Web. For each test, we calculated
`and stored the scores of all images in the training set. The minimum score is set so that 1% of
`the images score below it. The maximum is defined in an analogous way. The set of scores
`together with the minimum and the maximum is called the distribution of scores for that test. In
`particular, if the training set contains photographs, the distribution is called natural, otherwise it
`is called artificial.
`
`Suppose we have a GIF image and calculate its score S on a particular test T. Consider
`the natural distribution of scores for that test. If the score S is between the minimum and the
`maximum score in that distribution, L is the percentage of scores in the distribution that are less
`than or equal to S, and M is the percentage of scores that are greater than or equal to S. We
`define the natural grade N of the image to be the minimum of L and M.
`
If the score S is less than the minimum, then R = S/minimum and the natural grade is
N = R/1000. If S = 0, however, we set N to 0.001. If S is greater than the maximum, then
R = maximum/S and, again, N = R/1000. We calculate the artificial grade A of the image in a
similar way.
`
Based on these grades, we assign an overall grade between 0 and 1 to the image. The
higher the grade is, the more likely the image is to be a photograph. We fix a threshold T, and
we classify all images whose grade is over T as photographs and all other images as drawings.
This is how we calculate the grade:

• We initialize P and D to 1.

• For each test, we calculate the score of the image, and from the score we calculate the
natural grade N and the artificial grade A for that test. We set P to PN and D to DA.

• After we are done with all tests, we set the grade of the image to P/(P + D).
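The grading procedure can be transcribed as a short sketch. The helper names are our own; for brevity the sketch uses the raw minimum and maximum of the training scores, whereas the system trims them to the 1% tails as described above:

```python
def grade_from_distribution(score, scores):
    """Grade a score against a training distribution, following the
    rules described above (raw min/max instead of the 1% tails)."""
    lo, hi = min(scores), max(scores)
    if lo <= score <= hi:
        less = sum(1 for s in scores if s <= score) / len(scores)
        greater = sum(1 for s in scores if s >= score) / len(scores)
        return min(less, greater)
    if score < lo:
        return 0.001 if score == 0 else (score / lo) / 1000
    return (hi / score) / 1000

def photograph_grade(test_scores, natural_dists, artificial_dists):
    """Combine per-test natural and artificial grades into P/(P + D)."""
    p = d = 1.0
    for test, score in test_scores.items():
        p *= grade_from_distribution(score, natural_dists[test])
        d *= grade_from_distribution(score, artificial_dists[test])
    return p / (p + d)

nat = {"band": [0.1, 0.2, 0.3, 0.4]}   # fabricated training photograph scores
art = {"band": [1.0, 1.5, 2.0, 2.5]}   # fabricated training drawing scores
print(photograph_grade({"band": 0.2}, nat, art))  # close to 1: likely a photograph
```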
`
The reader may have recognized that if, for each test, A were the conditional probability
that an image got the score S given that it was a drawing, and N were the conditional probability
that an image got the score S given that it was a photograph, then the grade of the image would
be the probability that it was a photograph given all its test scores, assuming that the different
tests have independent score distributions. In that case, we should classify all images scoring
over 0.5 as photographs and the rest as drawings. However, A and N are not defined to be the
conditional probabilities, so the optimal classification threshold is not necessarily 0.5.
`
`
`
`
`Nevertheless, it remains true that the grade is more reliable when the tests we use are pairwise
`independent or approximately independent.
`
`Results
`
`The algorithm has been tested both on the training sets that we used, and on images that
`we didn’t include in those sets. With the group of JPEG artificial images the algorithm was only
`tested on a training set. We have found that purely artificial images in the JPEG format occur
`less frequently than images in other formats.
`
`The following tables give some experimental results:
`
`• Table 1 gives results on GIF images. The first row gives the number of images for each
`group. After that, each entry gives the percentage of correct classifications of images in the
`given group when grades are calculated based on that test only. The group "GIF art1" is the
`training set of GIF drawings. “GIF nat1” is the training set of GIF photographs. “GIF art2” and
`“GIF nat2” are, respectively, sets of drawings and photographs that were not used in training.
`
`• Table 2 gives results on JPEG images, in the same format as Table 1.
`
`• Table 3 gives the values of all thresholds used.
`
Table 1: Results on GIF Images

                          GIF art1   GIF art2   GIF nat1   GIF nat2
number of images               395        339        129        112
band difference test          50.9       45.4       94.6       90.2
farthest neighbor test        86.6       89.1       89.1       83.1
color test                    60.8       79.4       83.7       85.7
most common color test        81.0       88.5       92.2       85.7
ratio test                    71.4       74.6       77.5       77.7
All tests combined            96.2       94.1       88.4       81.2
`
`
`
`
Table 2: Results on JPEG Images

                          JPEG art1   JPEG nat1   JPEG nat2
number of images                150         228         118
band test                      13.3        98.7        97.5
farthest neighbor test         55.3        95.2        97.5
most common color test         66.7        98.2        91.5
ratio test                     66.7        96.5        99.2
All tests combined             92.0        93.9        95.8
`
Table 3: Thresholds

                          GIF    JPEG
band difference test      233     185
farthest neighbor test    100     150
grade                     0.3     0.1
`
`Current Work
`We are currently working on improving the algorithm in three directions:
`
`• Finding more tests and perfecting the ones we already have.
`
`