Bongwon Suh*, Haibin Ling, Benjamin B. Bederson*, David W. Jacobs
Department of Computer Science
*Human-Computer Interaction Laboratory
University of Maryland
College Park, MD 20742 USA
+1 301-405-2764
{sbw, hbling, bederson, djacobs}@cs.umd.edu

ABSTRACT
Thumbnail images provide users of image retrieval and browsing systems with a method for quickly scanning large numbers of images. Recognizing the objects in an image is important in many retrieval tasks, but thumbnails generated by shrinking the original image often render objects illegible. We study the ability of computer vision systems to detect key components of images so that automated cropping, prior to shrinking, can render objects more recognizable. We evaluate automatic cropping techniques 1) based on a general method that detects salient portions of images, and 2) based on automatic face detection. Our user study shows that these methods result in small thumbnails that are substantially more recognizable and easier to find in the context of visual search.

Keywords
Saliency map, thumbnail, image cropping, face detection, usability study, visual search, zoomable user interfaces
`
INTRODUCTION
Thumbnail images are now widely used for visualizing large numbers of images given limited screen real estate. The QBIC system developed by Flickner et al. [10] is a notable image database example. A zoomable image browser, PhotoMesa [3], lays out thumbnails in a zoomable space and lets users move through the space of images with a simple set of navigation functions. PhotoFinder applied thumbnails as a visualization method for personal photo collections [14]. Popular commercial products such as Adobe Photoshop Album [2] and ACDSee [1] also use thumbnails to represent image files in their interfaces.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
UIST '03, Vancouver, BC, Canada
© 2003 ACM 1-58113-636-6/03/0010 $5.00

Current systems generate thumbnails by shrinking the original image. This method is simple. However, thumbnails generated this way can be difficult to recognize,
`
especially when the thumbnails are very small. This phenomenon is not unexpected, since shrinking an image causes detailed information to be lost. An intuitive solution is to keep the more informative parts of the image and cut away less informative regions before shrinking. Some commercial products allow users to manually crop and shrink images [20]. Burton et al. [4] proposed and compared several image simplification methods to enhance the full-size images before subsampling. They chose edge-detecting smoothing, lossy image compression, and self-organizing feature maps as the three techniques in their work.

In quite a different context, DeCarlo and Santella [8] tracked a user's eye movements to determine interesting portions of images, and generated non-photorealistic, painterly images that enhanced the most salient parts of the image. Chen et al. [5] use a visual attention model as a cue to conduct image adaptation for small displays.

In this paper, we study the effectiveness of saliency based cropping methods for preserving the recognizability of important objects in thumbnails. Our first method is a general cropping method based on the saliency map of Itti and Koch that models human visual attention [12][13]. A saliency map of a given image describes the importance of each position in the image. In our method, we use the saliency map directly as an indication of how much information each position in the image contains. The merit of this method is that the saliency map is built up from low-level features only, so it can be applied to general images. We then select the most informative portion of the image.

Although this saliency based method is useful, it does not consider semantic information in images. We show that semantic information can be used to further improve thumbnail cropping, using automatic face detection. We choose this domain because a great many pictures of interest show human faces, and also because face detection methods have begun to achieve high accuracy and efficiency [22].
In this paper we describe saliency based cropping and face detection based cropping after first discussing related work from the field of visual attention.
`
Volume 5, Issue 2
95
UNIFIED PATENTS EXHIBIT 1015
Page 1 of 10
`
`
`
In the next subsection, we propose a general cropping method, which is based on the saliency map and can be applied to general images. Next, a face detection based cropping method is introduced for images with faces.

A General Cropping Method Based on the Saliency Map
In this method, we use the saliency value to evaluate the degree of informativeness of different positions in the image I. The cropping rectangle RC should satisfy two conditions: having a small size and containing most of the salient parts of the image. These two conditions generally conflict with each other. Our goal is to find the optimal rectangle that balances these two conditions.
An example saliency map is given in Figure 1.

Figure 1: left: original image; right: saliency map of the image shown left
`
Find Cropping Rectangle with Fixed Threshold using Brute Force Algorithm
We use Itti and Koch's saliency algorithm because their method is based on low-level features and hence independent of semantic information in images. We choose Itti and Koch's model also because it is one of the most practical algorithms on real images.
Once the saliency map SI is ready, our goal is to find the crop rectangle RC that is expected to contain the most informative part of the image. Since the saliency map is used as the criterion of importance, the sum of saliency within RC should contain most of the saliency value in SI.
Based on this idea, we can find RC as the smallest rectangle containing a fixed fraction of the saliency. To state this formally, we define the candidate set $\Re(\lambda)$ for RC and the fraction threshold $\lambda$ as

$$\Re(\lambda) = \Big\{\, r : \sum_{(x,y) \in r} S_I(x,y) > \lambda \sum_{(x,y) \in I} S_I(x,y) \,\Big\}$$

Then RC is given by

$$R_C = \arg\min_{r \in \Re(\lambda)} \mathrm{area}(r)$$

RC denotes the minimum rectangle that satisfies the threshold defined above. A brute force algorithm was developed to compute RC.
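To make the definition concrete, the brute-force search can be sketched in Python. This is our own minimal sketch, not the authors' code: it assumes the saliency map is a 2D list of non-negative values and uses a summed-area table so that each rectangle sum costs O(1), leaving an O(w²h²) scan over all sub-rectangles.

```python
def integral_image(s):
    # Summed-area table: ii[y][x] = sum of s over rows < y and cols < x.
    h, w = len(s), len(s[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += s[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    # Saliency sum over the rectangle [x0, x1) x [y0, y1).
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def brute_force_crop(s, lam):
    # Smallest rectangle whose saliency sum exceeds lam * total saliency.
    h, w = len(s), len(s[0])
    ii = integral_image(s)
    total = rect_sum(ii, 0, 0, w, h)
    best, best_area = None, w * h + 1
    for y0 in range(h):
        for y1 in range(y0 + 1, h + 1):
            for x0 in range(w):
                for x1 in range(x0 + 1, w + 1):
                    area = (x1 - x0) * (y1 - y0)
                    if area < best_area and rect_sum(ii, x0, y0, x1, y1) > lam * total:
                        best, best_area = (x0, y0, x1, y1), area
    return best
```

The quartic scan is exactly why the paper develops the greedy alternative in the next subsection.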
`
We then explain the design of a user study that evaluates the thumbnail methods. This paper concludes with a discussion of our findings and future work.
`
RELATED WORK
Visual attention is the ability of biological visual systems to detect interesting parts of the visual input [12][13][16][17][21]. The saliency map of an image is a matrix, corresponding to the input image, that describes the degree of saliency of each position in the image.
Itti and Koch [12][13] provided an approach to compute a saliency map for images. Their method first uses pyramid technology to compute three feature maps for three low-level features: color, intensity, and orientation. For each feature, saliency is detected when a portion of an image differs in that feature from neighboring regions. These feature maps are then combined to form a single saliency map. After this, in a series of iterations, salient pixels suppress the saliency of their neighbors, to concentrate saliency in a few key points.
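The full Itti-Koch model is multi-scale and multi-feature; the following toy sketch is ours and captures only the center-surround idea on a single intensity channel: a pixel is salient to the extent that it differs from the mean of its local neighborhood.

```python
def center_surround_saliency(img, radius=2):
    # img: 2D list of intensities. A crude, single-scale stand-in for the
    # multi-scale Itti-Koch center-surround operator: saliency at (x, y) is
    # the absolute difference between the pixel and its neighborhood mean.
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            sal[y][x] = abs(img[y][x] - sum(vals) / len(vals))
    return sal
```

A uniform image yields zero saliency everywhere, while an isolated bright spot yields a peak at the spot, which is the qualitative behavior the cropping methods below rely on.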
Chen et al. [5] proposed using semantic models together with the saliency model of Itti and Koch to identify important portions of an image, prior to cropping. Their method is based on an attention model that uses attention objects as the basic elements. The overall attention value of each attention object is calculated by combining attention values from different models. For semantic attention models they use a face detection technique [15] and a text detection technique [6] to compute two different attention values. The method provides a way to combine semantic information with low-level features. However, when combining the different values, their method uses heuristic weights that differ across five predefined image types, and images must be manually categorized into these five types before the method can be applied. Furthermore, it relies heavily on semantic extraction techniques: when the corresponding technique is not available, or when it fails to produce a good result (e.g., no face is found in the image), a good final result cannot be expected. Our algorithm, by contrast, is fully automatic and works well without manual intervention or any assumptions about the image types.
`
THUMBNAIL CROPPING
Problem Definition
We define the thumbnail cropping problem as follows: given an image I, the goal of thumbnail cropping is to find a rectangle RC containing a subimage IC such that the main objects in the image are visible in the subimage. We then shrink IC to a thumbnail. In the rest of this paper, we use the word "cropping" to indicate thumbnail cropping.
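The problem definition amounts to two array operations: slice out the subimage under RC, then subsample it to thumbnail size. A minimal pure-Python sketch (ours; grayscale image as a 2D list, nearest-neighbor subsampling):

```python
def crop_and_shrink(img, rect, tw, th):
    # Crop img to rect = (x0, y0, x1, y1), exclusive upper bounds,
    # then shrink the subimage to tw x th by nearest-neighbor sampling.
    x0, y0, x1, y1 = rect
    sub = [row[x0:x1] for row in img[y0:y1]]
    sh, sw = len(sub), len(sub[0])
    return [[sub[y * sh // th][x * sw // tw] for x in range(tw)]
            for y in range(th)]
```

Real systems would use an anti-aliased resampling filter rather than nearest-neighbor, but the structure — crop first, shrink second — is the whole point of the paper.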
`
`
Find Cropping Rectangle with Fixed Threshold using Greedy Algorithm
The brute force method works; however, it is not time-efficient. Two main factors slow down the computation. First, the algorithm to compute the saliency map involves several series of iterations. Some of the iterations involve convolutions using very large filter templates (on the order of the size of the saliency map). These convolutions make the computation very time consuming. Second, the brute force algorithm searches all sub-rectangles exhaustively. While techniques exist to speed up this exhaustive search, it still takes a lot of time.
We found that we can achieve essentially the same results much more efficiently by: 1) using fewer iterations and smaller filter templates during the saliency map calculation; 2) squaring the saliency values to enhance contrast; 3) using a greedy search instead of the brute force method, by only considering rectangles that include the peaks of the saliency.
Rectangle GREEDY_CROPPING(S, λ)
    thresholdSum ← λ × (total saliency value in S)
    RC ← the center of S
    currentSaliencySum ← saliency value of RC
    WHILE currentSaliencySum < thresholdSum DO
        P ← maximum saliency point outside RC
        R' ← small rectangle centered at P
        RC ← UNION(RC, R')
        UPDATE currentSaliencySum with new region RC
    ENDWHILE
    RETURN RC

Figure 2: Algorithm to find the cropping rectangle with a fixed saliency threshold. S is the input saliency map and λ is the threshold.
Figure 2 shows the algorithm GREEDY_CROPPING, which finds the cropping rectangle with a fixed saliency threshold λ. The greedy algorithm calculates RC by incrementally including the next most salient peak point P. When including a salient point P in RC, we union RC with a small rectangle centered at P, because if P is within the foreground object, a small region surrounding P is also expected to contain the object.
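The pseudocode in Figure 2 translates almost line for line into Python. This sketch is ours, not the authors' implementation: rectangles are (x0, y0, x1, y1) tuples with exclusive upper bounds, the peak outside RC is found by a naive scan, and `box` is the half-width of the small rectangle grown around each peak.

```python
def greedy_crop(s, lam, box=1):
    # Greedy version of the fixed-threshold crop (Figure 2).
    h, w = len(s), len(s[0])
    threshold = lam * sum(map(sum, s))
    cx, cy = w // 2, h // 2
    x0, y0, x1, y1 = cx, cy, cx + 1, cy + 1      # RC starts at the center of S

    def inside_sum():
        return sum(s[y][x] for y in range(y0, y1) for x in range(x0, x1))

    while inside_sum() < threshold:
        # P <- maximum saliency point outside RC
        peak, pv = None, -1.0
        for y in range(h):
            for x in range(w):
                if not (x0 <= x < x1 and y0 <= y < y1) and s[y][x] > pv:
                    peak, pv = (x, y), s[y][x]
        if peak is None:                          # RC already covers the map
            break
        px, py = peak
        # RC <- UNION(RC, small rectangle centered at P)
        x0, y0 = min(x0, max(0, px - box)), min(y0, max(0, py - box))
        x1, y1 = max(x1, min(w, px + box + 1)), max(y1, min(h, py + box + 1))
    return x0, y0, x1, y1
```

Recomputing `inside_sum` from scratch each iteration keeps the sketch short; an efficient version would maintain the running sum incrementally, as the UPDATE step in Figure 2 suggests.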
This algorithm can be modified to satisfy further requirements. For example, the UNION function in Figure 2 can be altered when the cropped rectangle should have the same aspect ratio as the original image. Rather than just merging two rectangles, UNION needs to calculate the minimum surrounding bounds that have the same aspect ratio as the original image. As another example, the initial value of RC can be set to the center of the image S, to the most salient point, or to any other point. Since the initial point always falls in the result thumbnail, it can be regarded as a point with extremely large saliency. When the most salient point is selected as the initial point, the result can be optimized to have the minimum size. But we found that beginning the algorithm at the center of the image gives more robust and faster results, even though it might increase the size of the resulting thumbnail, especially when all salient points are skewed to one side of the image.
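The aspect-ratio-preserving variant of UNION described above can be sketched as follows. This is our illustration: it takes the plain union of the two rectangles and then expands it symmetrically along the deficient axis until it matches the target width/height ratio; for brevity it omits clamping the result to the image bounds.

```python
def union_keep_aspect(r1, r2, aspect):
    # r1, r2: rectangles (x0, y0, x1, y1); aspect: target width / height.
    x0, y0 = min(r1[0], r2[0]), min(r1[1], r2[1])
    x1, y1 = max(r1[2], r2[2]), max(r1[3], r2[3])
    w, h = x1 - x0, y1 - y0
    if w / h < aspect:                # too tall: widen symmetrically
        extra = aspect * h - w
        x0, x1 = x0 - extra / 2, x1 + extra / 2
    else:                             # too wide: heighten symmetrically
        extra = w / aspect - h
        y0, y1 = y0 - extra / 2, y1 + extra / 2
    return x0, y0, x1, y1
```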
Find Cropping Rectangle with Dynamic Threshold
Experience shows that the most effective threshold varies from image to image. We therefore developed a method for adaptively determining the threshold λ. Intuitively, we want to choose a threshold at a point of diminishing returns, where adding small amounts of additional saliency requires a large increase in the rectangle. We use an area-threshold graph to visualize this. The X axis indicates the threshold (fraction of saliency) while the Y axis shows the normalized area of the cropping rectangle produced by the greedy algorithm mentioned above. Here the normalized area has a value between 0 and 1. The solid curve in Figure 3 gives an example of an area-threshold graph.
A natural solution is to use the threshold with the maximum gradient in the area-threshold graph. We approximate this using a binary search method to find the threshold in three steps: First, we calculate the area-threshold graph for the given image. Second, we use a binary search method to find the threshold where the graph goes up quickly. Third, the threshold is tuned back to the position where a local maximum gradient exists. The dotted lines in Figure 3 demonstrate the process of finding the threshold for the image given in Figure 1.
`
Figure 3: The solid line represents the area-threshold graph. The dotted lines show the process of searching for the best threshold. The numbers indicate the sequence of searching.
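The paper refines the search with the binary-search procedure above; the sketch below (ours) instead scans a fixed grid of thresholds, which is enough to show the core idea: build the area-threshold curve and return the threshold just before its largest jump in area. Here `area_of` is a stand-in for running the greedy crop at a given λ and normalizing the resulting rectangle area.

```python
def pick_threshold(area_of, n=20):
    # area_of(lam): normalized crop area in [0, 1] produced at threshold lam.
    lams = [i / n for i in range(1, n)]
    areas = [area_of(lam) for lam in lams]
    # The largest jump in area marks the steepest point of the
    # area-threshold graph: past it, a little extra saliency costs
    # a lot of extra rectangle.
    jumps = [areas[i + 1] - areas[i] for i in range(len(areas) - 1)]
    return lams[max(range(len(jumps)), key=jumps.__getitem__)]
```

A grid scan costs one greedy crop per sampled threshold; the binary search in the paper reaches the same knee with far fewer crops.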
`
`
Examples of Saliency Map Based Cropping
After getting RC, we can directly crop the input image I. Thumbnails of the image given in Figure 1 are shown in Figure 4. It is clear from Figure 4 that the cropped thumbnail can be more easily recognized than the thumbnail without cropping.
`
Based on this claim, we designed a thumbnail cropping approach based on face detection. First, we identify faces by applying CMU's on-line face detection [9][19] to the given images. Then, the cropping rectangle RC is computed as the rectangle containing all the detected faces. After that, the thumbnail is generated from the image cropped from the original image by RC.
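Given the detector's output, computing RC is just a bounding-box union over all detected faces. A minimal sketch (ours; faces as (x0, y0, x1, y1) tuples, with an optional fractional margin, and no clamping to the image bounds):

```python
def face_crop_rect(faces, margin=0.0):
    # Bounding rectangle of all face detections, optionally padded by
    # a fraction of its own width/height so some context survives the crop.
    x0 = min(f[0] for f in faces)
    y0 = min(f[1] for f in faces)
    x1 = max(f[2] for f in faces)
    y1 = max(f[3] for f in faces)
    pw, ph = margin * (x1 - x0), margin * (y1 - y0)
    return x0 - pw, y0 - ph, x1 + pw, y1 + ph
```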
`
Figure 4 (left): the image cropped based on the saliency map; (middle): the cropping rectangle, which contains most of the salient parts; (right top): a thumbnail subsampled from the original image; (right bottom): a thumbnail subsampled from the cropped image (left part of this figure).

Figure 5 shows the result for an image whose salient parts are more scattered. Photos focusing primarily on the subject and without much background information often have this property. A merit of our algorithm is that it is not sensitive to this.

Figure 5 (left top): the original image (courtesy of Corbis [7]); (right top): the saliency map; (left bottom): the cropped image; (right bottom): the cropped saliency map, which contains most of the salient parts.
`
Face Detection Based Cropping
In the above section, we proposed a general method for thumbnail cropping. The method relies only on low-level features. However, if our goal is to make the objects of interest in an image more recognizable, we can clearly do this more effectively when we are able to automatically detect the position of these objects.
Images of people are essential in many research and application areas. At the same time, face processing is a rapidly expanding area that has attracted substantial research effort in recent years. Face detection is one of the most important problems in the area; [22] surveys the numerous methods proposed for face detection.
For human image thumbnails, we claim that recognizability will increase if we crop the image to contain only the face region.
`
Figure 6 (left): the original image; (middle): the face detection result from CMU's online face detection [9]; (right): the cropped image based on the face detection result.

Figure 6 shows an example image, its face detection result, and the cropped image. Figure 7 shows the three thumbnails generated via the three different methods. In this example, we can see that the face detection based cropping method is a very effective way to create thumbnails, while saliency based cropping produces little improvement because the original image has few non-salient regions to cut.
`
Figure 7: Thumbnails generated by the three different methods. (left): without cropping; (middle): saliency based cropping; (right): face detection based cropping.
`
USER STUDY
We ran a controlled empirical study to examine the effect of different thumbnail generation methods on the ability of users to recognize objects in images. The experiment was divided into two parts. First, we measured how recognition rates change depending on thumbnail size and thumbnail generation technique; participants were asked to recognize objects in small thumbnails (Recognition Task). Second, we measured how the thumbnail generation technique affects search performance; participants were asked to find images that match given descriptions (Visual Search Task).

Design of Study
The recognition tasks were designed to measure the successful recognition rate of thumbnail images as three conditions varied: image set, thumbnail technique, and thumbnail size. We measured correctness as a dependent variable.
`
`
The visual search task conditions were designed to measure the effectiveness of image search with thumbnails generated with different techniques. The experiment employed a 3x3 within-subjects factorial design, with image set and thumbnail technique as independent variables. We measured search time as a dependent variable. But since face detection based cropping is not applicable to the Animal Set and the Corbis Set, we omitted the visual search tasks for those conditions, as shown in Figure 8. The total duration of the experiment for each participant was about 45 minutes.
`
Thumbnail Technique            | Animal Set | Corbis Set | Face Set
Plain shrunken thumbnail       | √          | √          | √
Saliency based cropping        | √          | √          | √
Face detection based cropping  | X          | X          | √

Figure 8: Visual search task design. Checkmarks (√) show which image sets were tested with which image cropping techniques.
`
Participants
There were 20 participants in this study. Participants were college or graduate students at the University of Maryland at College Park, recruited on campus. All participants were familiar with computers. Before the tasks began, all participants were asked to pick ten familiar persons out of fifteen candidates. Two participants had difficulty choosing them. Since participants had to recognize the people whose images were used for identification, the results from those two participants were excluded from the analysis.
`
Image Sets
We used three image sets for the experiment. We also used filler images as distracters to minimize the duplicate exposure of images in the visual search tasks. There were 500 filler images, and images were randomly chosen from this set as needed. These images were carefully chosen so that none of them were similar to images in the three test image sets.
Animal Set (AS)
The "Animal Set" includes images of ten different animals, with five images per animal. All images were gathered from various sources on the Web. We chose animals as target images to test recognition and visual search performance with familiar objects. The basic criteria for choosing animals were 1) that the animals should be very familiar, so that participants could recognize them without prior learning; and 2) that they should be easily distinguishable from each other. For example, donkeys and horses are too similar to each other; to prevent confusion, we only used horses.
`
Corbis Set (CS)
Corbis is a well known source for digital images and provides various types of tailored digital photos [7]. Its images are professionally taken and manually cropped. The goal of this set is to represent images already in the best possible shape. We randomly selected 100 images out of 10,000 images. We used only 10 images as search targets for the visual search tasks, to reduce experimental error. During the experiment, however, we found that one task was problematic because there were very similar images among the fillers, and participants sometimes picked unintended images as an answer. We therefore discarded the results from that task; a total of five observations were discarded due to this condition.
Face Set (FS)
This set includes images of fifteen well known people who are either politicians or entertainers. Five images per person were used for this experiment. All images were gathered from the Web. We used this set to test the effectiveness of the face detection based cropping technique and to see how the participants' recognition rate varies with different types of images.
Some images in this set contained more than one face. In this case, we cropped the image so that the resulting image contained all the faces in the original image. Out of 75 images, multiple faces were detected in 25 images; we found that 13 of them contained erroneous detections. All erroneously detected faces were included in the cropped thumbnail sets, since we intended to test our cropping method with available face detection techniques, which are not perfect.
`
Thumbnail Techniques
Plain shrinking without cropping
The images were scaled down to smaller dimensions. We prepared ten levels of thumbnails from 32 to 68 pixels in the larger dimension; the thumbnail size was increased by four pixels per level. But for the Face Set images, we increased the number of levels to twelve, because we found that some faces are not identifiable even in a 68 pixel thumbnail.
Cropping Technique and Image Set          | Ratio  | Variance
Saliency based cropping, Corbis Set       | 61.3%  | 0.110
Saliency based cropping, Animal Set       | 53.9%  | 0.127
Saliency based cropping, Face Set         | 54.3%  | 0.128
Saliency based cropping, All              | 57.6%  | 0.124
Face detection based cropping (Face Set)  | 16.1%  | 0.120

Figure 9: Ratio of cropped to original image size.
`
`
Saliency based cropping
Using the saliency based cropping algorithms described above, we cropped out the background of the images. We then shrank the cropped images to ten sizes of thumbnails. Figure 9 shows how much area was cropped away by each technique.
Face detection based cropping
Faces were detected by CMU's algorithm as described above. If multiple faces were detected, we chose the bounding region that contains all detected faces. Twelve levels of thumbnails, from 36 to 80 pixels, were then prepared for the experiment.
`
Recognition Task
We used the "Animal Set" and the "Face Set" images to measure how accurately participants could recognize objects in small thumbnails. First, users were asked to identify animals in thumbnails. The thumbnails in this task were chosen randomly from all levels of the Animal Set images. This task was repeated 50 times.
When the user clicked the "Next" button, a thumbnail was shown, as in Figure 10, for two seconds. Since we intended to measure the pure recognizability of thumbnails, we limited the time thumbnails were shown: according to our pilot user study, users tended to guess answers when they saw a thumbnail for a long time, even though they could not clearly identify the objects in it. To discourage participants from guessing, the interface was designed to make thumbnails disappear after a short period of time, two seconds. For the same reason, we included more animals in the answer list. Although we used only ten animals in this experiment, we listed 30 animals as possible answers, as seen in Figure 10, to limit the subject's ability to guess identity based on crude cues. In this way, participants were prevented from choosing similarly shaped animals by guessing. For example, when participants think that they saw a bird-like animal, they would select swan if it were the only avian animal in the list. By having multiple birds in the candidate list, we could prevent this undesired behavior.

After the Animal Set recognition task, users were asked to identify a person in the same way. This Face Set recognition task was repeated 75 times. In this session, the candidates were shown as portraits in addition to names, as seen in Figure 10.
`
Visual Search Task
For each testing condition in Figure 8, participants were given two tasks. Thus, for each visual search session, fourteen search tasks were assigned per participant. The order of tasks was randomized to reduce learning effects.
As shown in Figure 11, participants were asked to find one image among 100 images. For the visual search task, it was important to provide equal search conditions for each task and participant. To ensure fairness, we designed the search conditions carefully: we suppressed duplicate occurrences of images and controlled the locations of the target images.
For the Animal Set search tasks, we randomly chose one target image out of the 50 Animal Set images. Then we carefully selected 25 non-similar looking animal images. After that, we mixed them with 49 more images randomly chosen from the filler set as distracters. For the Face Set and Corbis Set tasks, we prepared the task image sets in the same way.
The tasks were given as verbal descriptions for the Animal Set and Corbis Set tasks. For the Face Set tasks, a portrait of the target person was given, as well as the person's name. The given portraits were chosen from a separate, independent collection, so that they did not duplicate images used for the tasks.

Figure 10: Recognition task interfaces. Participants were asked to click what they saw or the "I'm not sure" button. Left: Face Set recognition interface; Right: Animal Set recognition interface.
`
Figure 11: Visual search task interface. Participants were asked to find an image that matches a given task description. Users can zoom in, zoom out, and pan freely until they find the right image.
`
We used a custom-made image browser based on PhotoMesa [3] as our visual search interface. PhotoMesa provides a zooming environment for image navigation with a simple set of control functions. Users click the left mouse button to zoom into a group of images (as indicated by a red rectangle) to see the images in detail, and click the right mouse button to zoom out for an overview of more images. Panning is supported either by mouse dragging or by the arrow keys. The animation between zooms helps users remember where things fit together based on spatial relationships. PhotoMesa can display a large number of thumbnails in groups on the screen at the same time. Since this user study was intended to test pure visual search, all images were presented in a single cluster, as in Figure 11.
Participants were allowed to zoom in, zoom out, and pan freely for navigation. When participants identified the target image, they were asked to zoom in to the full scale of the image and click the "Found it" button located in the upper left corner of the interface to finish the task. Before the visual search session, they were given as much time as they wanted until they felt comfortable using the zoomable interface. Most participants found it very easy to navigate and reported no problems with navigation during the session.
`
RECOGNITION TASK RESULTS
Figure 12 shows the results from the recognition tasks. The horizontal axis represents the size of the thumbnails and the vertical axis denotes recognition accuracy. Each data point in the graph denotes the successful recognition rate of the thumbnails at that level. As shown, the bigger the thumbnails are, the more accurately participants recognize objects in them, which fits well with our intuition. The interesting point here is that the automatic cropping techniques perform significantly better than the original thumbnails.
`
There were clear correlations in the results. Participants recognized objects in bigger thumbnails more accurately regardless of the thumbnail technique. Therefore, we used a paired t-test (two tailed) to analyze the results. The results are shown in Figure 13.
The first graph shows the results from the "Animal Set" with two different thumbnail techniques, no cropping and saliency based cropping. As clearly shown, users were able to recognize objects more accurately with saliency based cropped thumbnails than with plain uncropped thumbnails. A major reason for the difference is that the effective portion of the image is drawn relatively larger in saliency based cropped images. But if the main object region were cropped out, this would not hold: users would see more non-core parts of the images, and the recognition rate of the cropped thumbnails would fall below that of the plain thumbnails. The goal of this test was therefore to measure whether saliency based cropping cuts out the right part of images. Since the results show that participants recognized objects better with saliency based thumbnails than with plain thumbnails, we can say that saliency based cropping did cut out the right part of images.
Condition                                                              | t-Value | P value
No cropping vs. saliency based cropping on Animal Set                  | 4.33    | 0.002
No cropping vs. saliency based cropping on Face Set                    | 4.16    | 0.002
No cropping vs. face detection based cropping on Face Set              | 9.56    | < 0.001
Saliency based cropping vs. face detection based cropping on Face Set  | 7.34    | < 0.001
Animal Set vs. Face Set with no cropping                               | 5.00    | 0.001
Animal Set vs. Face Set with saliency based cropping                   | 3.08    | 0.005

Figure 13: Analysis results of the Recognition Task (paired t-test). Every curve in Figure 12 is significantly different from each other.
During the experiment, participants mentioned that the background sometimes helped with recognition. For example, when they saw a blue background, they immediately suspected that the images would show sea animals. Similarly, the camel was well identified with every thumbnail technique, even at very small thumbnail sizes, because the images have distinctive desert backgrounds (4 out of 5 images).
Since saliency based cropping cuts out a large portion of the background (42.4%), we suspected that this might harm recognition. But the results show that this is not the case: users performed better with cropped images.

Figure 12: Recognition Task Results. Dashed lines are interpolated from jagged data points.

Even when
`
`
background was cut out, users could still see some of it, and this information provided sufficient help. This implies that saliency based cropping is well balanced: the cropped image shows the main objects larger while still giving enough background information.
The second graph shows results similar to the first. It represents the results from the "Face Set" with three different thumbnail techniques: no cropping, saliency based cropping, and face detection based cropping. As seen in the graph, participants performed much better with face detection based thumbnails. It is not surprising that users can identify a person more easily in images with bigger faces.
Compared to the Animal Set results, the Face Set images were identified less accurately. This is because humans share similar visual characteristics, while animals have more distinguishing features. In other words, animals can be identified by overall shape and color, but humans cannot easily be distinguished by those features; the main feature that distinguishes humans is the face. The experimental results clearly show that partic