`Collections of Geo-Referenced Photographs
`
`Alexandar Jaffe,
`Mor Naaman
`Yahoo! Research Berkeley
`Berkeley, CA, USA
`ajaffe@cs.washington.edu,
`mor@yahoo-inc.com
`
`Tamir Tassa
`Department of Mathematics
`and Computer Science
`The Open University of Israel
`Ra’anana, Israel
`tamirta@openu.ac.il
`
`Marc Davis
`Yahoo! Inc.
`Sunnyvale, CA, USA
`marcd@yahoo-inc.com
`
`ABSTRACT
`We describe a framework for automatically selecting a sum-
`mary set of photos from a large collection of geo-referenced
`photographs. Such large collections are inherently difficult
`to browse, and become excessively so as they grow in size,
`making summaries an important tool in rendering these col-
`lections accessible. Our summary algorithm is based on spa-
`tial patterns in photo sets, as well as textual-topical patterns
`and user (photographer) identity cues. The algorithm can
`be expanded to support social, temporal, and other factors.
`The summary can thus be biased by the content of the query,
`the user making the query, and the context in which the
`query is made.
`A modified version of our summarization algorithm serves
`as a basis for a new map-based visualization of large collec-
`tions of geo-referenced photos, called Tag Maps. Tag Maps
`visualize the data by placing highly representative textual
`tags on relevant map locations in the viewed region, effec-
`tively providing a sense of the important concepts embodied
`in the collection.
`An initial evaluation of our implementation on a set of
`geo-referenced photos shows that our algorithm and visual-
`ization perform well, producing summaries and views that
`are highly rated by users.
`
`Categories and Subject Descriptors
`H.3.3 [Information Systems]: Information Storage and
`Retrieval—Information Search and Retrieval
`
`General Terms
`Algorithms, Human Factors
`
`Keywords
`Photo Collections, Geo-Referenced Photos, Summarization,
`Clustering, Image Search, Collection Visualization
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`MIR’06, October 26–27, 2006, Santa Barbara, California, USA.
`Copyright 2006 ACM 1-59593-495-2/06/0010 ...$5.00.
`
`INTRODUCTION
`1.
`With the popularization of digital photography, people
`are now capturing and storing far more photographs than
`ever before. Indeed, we are moving towards Susan Sontag’s
`1977 vision of a world where “everything exists to end up
`in a photograph” [18]. As a result, billions of images, many
`of which are on the Web, constitute a growing record of
`our culture and shared experience. Viewing and interact-
`ing with such collections has a broad social and practical
`importance. However, these collections are inherently diffi-
`cult to navigate, due to their size and the inability of com-
`puters to understand the content of the photographs. The
`prospects of ‘making sense’ of these photo collections has
`become largely infeasible.
`Some steps forward have been made through geo-referencing
`of digital photographs, whereby photos are connected to
`metadata describing the geographic location in which they
`were taken [12, 19]. Capture devices such as camera-phones
`and GPS-enabled cameras can automatically associate geo-
`graphic data with images1 and will significantly increase the
`number of geo-referenced photos available online. Already,
`an increasing number of photographs on the Web are associ-
`ated with GPS coordinates. Such geo-referenced photos can
`be categorized geographically or displayed on a digital map,
`providing a rich spatial context in which to view subsets of a
`collection. Yet even here, we run into the problem of being
`able to filter, sort and summarize the data. The viewable
`space inevitably becomes cluttered after the data set has sur-
`passed a certain size, with overlapping photographs making
`viewing and finding specific photographs ever more difficult
`as the collection grows. Figure 1(a) exemplifies the problem
`by showing an unfiltered view of San Francisco photos.
`Our goal is thus to facilitate a system which can automat-
`ically select representative and relevant photographs from a
`particular spatial region. A result of our algorithm is illus-
`trated in Figure 1(b), where a limited set of eleven photos
`that were selected by our system are marked on the San
`Francisco map. Such collection summaries will enable users
`to find items more easily and browse more efficiently through
`large scale geo-referenced photo collections, in a manner that
`improves rather than degrades with the addition of more
`photos.
`Selecting the most representative photos from a given re-
`gion is a difficult task for several reasons. For instance:
`
`1See,
`ZoneTag
`the
`example,
`for
`http://zonetag.research.yahoo.com.
`
`application
`
`at
`
`MemoryWeb Ex. 2003
`Apple v. MemoryWeb – IPR2022-00033
`1 of 10
`
`
`
`• Image analysis alone is poor at understanding the se-
`mantic content of an image, making visual relevance in-
`sufficient for summarization.
`• In multi-user sets, the biases of one user’s data may skew
`the selection towards generally insignificant subjects.
`• It is difficult for an automated system to learn and assess
`the relevance of photos without appropriate models of
`human interest, as the notion of relevance is not well
`defined, and often subjective.
`
`(a) All San Francisco photos
`
`(b) An automatic summary of San
`Francisco photos
`
`Figure 1: All San Francisco photos from our dataset
`of 2200 geo-referenced photos, versus an automatic
`summary of photos, as generated by our system.
`One summary photo is enlarged for illustration.
`
`We have designed and implemented a simple algorithm
`that attempts to address the challenges stated above. Our
`algorithm utilizes metadata-based heuristics that capital-
`ize on patterns in users’ photographic behavior. Foremost
`among these heuristics is the premise that photographs taken
`at a location ‘vote’ for the presence of something interesting
`at that location.
`Our algorithm considers a multitude of spatial, social and
`temporal metadata (such as where the photo was taken, by
`whom, at what time), as well as textual-topical patterns in
`the data, such as textual tags associated with the photo.
`Furthermore, the algorithm can be tuned to bias the set
`of results using various factors such as the social network
`distance of the photographers to the user making the query.
`The summarization algorithm can be used in a number
`of applications. For example, the algorithm could be used
`for geographic image search, returning a summary of pho-
`tographs from a region in response to a search query (that
`can be specified as a text term or a map region). In addi-
`
`tion, the summarization can be used to assist in map-based
`browsing of images, for example, by selecting a subset of rep-
`resentative photos to show according to the map’s coverage
`and zoom level. With or without a map, summarization can
`help in browsing one’s photos or a group of individuals’ pho-
`tos to get an overview of a location or discover personally
`interesting areas for further exploration; automatic travel
`guide is a scenario that comes to mind.
`Key insights from our algorithm helped us generate a new
`way of visualizing large collections of geo-referenced pho-
`tographs. We use the techniques we developed to gener-
`ate map-based tag clouds, which are described in Section 6.
`“Tag Maps”, as we call them, can be used to visualize the
`contents of the collection, giving a quick overview of the
`textual-topical concepts that appear in the data as well as
`their location, importance and recency. The photos them-
`selves are not necessarily part of the visualization. Tag Maps
`concepts can be applied to many other multimedia (or other)
`applications that exhibit patterns in text and locations.
`To summarize, the contributions of this paper are:
`• A new approach for generating summaries of photo col-
`lections based on geographic as well as other contextual
`data associated with the photographic media (Section 3).
`• An outline of the requirements and the useful features
`for these context-based summaries (Section 3).
`• An implementation of an algorithm that generates such
`summaries using a public set of “geo-tagged” photographs
`(Section 4).
`• A new map-based visualization technique for photo col-
`lections that helps indicate both the important regions
`on the map and the textual concepts represented in those
`regions (Section 6).
`• A proposed evaluation for geo-referenced collection sum-
`maries; we use this evaluation to compare our algorithm
`to several baseline methods (Section 7).
`
`In addition, Section 5 briefly touches on potential applica-
`tions. We begin by discussing the related work.
`
`2. RELATED WORK
`Since 2003, a number of different research efforts have
`considered geographic location information associated with
`photographs. In [19], the authors describe WWMX, a map-
`based system for browsing a global collection of geo-referenced
`photos. Several similar map-based photo browsing systems
`appeared on the Web in the last few years2, most of them
`using “geo-tagged” images from Flickr [5] for content. All
`of those systems face the problem of clutter in the map in-
`terface: as the number of photos available in each location
`grow, the full set of images cannot possibly be shown on
`the map at once. While some systems default to showing
`the most recent photos, the WWMX system tries to handle
`clutter by consolidating multiple photograph markers into a
`single marker according to the zoom level. In our system, we
`avoid clutter by utilizing the additional metadata to select
`the best set of photographs from a region, providing poten-
`tially a better selection than the “most recent” strategy, and
`a more meaningful one than the “consolidation” approach.
`Several projects [12, 15] use geographic data to organize
`photo collections in novel ways, for example, by detecting
`
`2like http://geobloggers.com and http://mappr.com
`
`MemoryWeb Ex. 2003
`Apple v. MemoryWeb – IPR2022-00033
`2 of 10
`
`
`
`significant events and locations in a photo collection. Such
`structures could indeed be the basis for collection summa-
`rization. However, these projects considered personal photo
`collections only, and did not consider public shared pools of
`photos.
`Looking at shared collections, some research [3, 4, 11,
`14, 16] tries to use context (mostly location) information
`and sometimes visual features to identify landmarks in pho-
`tographs. Visual analysis could be integrated in our system—
`once our algorithm recognizes significant locations, it can
`attempt to select a photo of a prominent landmark there.
`Work in both [3, 11] considers, in a similar fashion to this
`work, patterns and distributions of textual terms that are
`associated with geo-referenced digital photos, and uses them
`to generate tag suggestions for new photographs. However,
`those projects are not designed to support collection sum-
`marization.
`In the absence of location metadata, temporal metadata
`was also considered in the past for the purpose of photo col-
`lection summarization.
`In [8], Graham et al. describe an
`algorithm to heuristically select representative photos for a
`given time period in a personal collection, utilizing patterns
`in human photo-taking habits (later studied in [6]). Ad-
`ditional time-based work aims to detect events in personal
`collections (e.g., [2]), which could be the basis for collection
`summarization. However, again, all these projects consid-
`ered single-photographer collections only. In public collec-
`tions of timestamped photos, only when additional meta-
`data is available (for example, the fact that all shared pho-
`tos were taken in the same event), there exists the potential
`for time-based summaries [13].
`Another possible approach for summarizing photo collec-
`tions is using textual tags that are associated with the im-
`age.
`In Flickr [5], popular tags have pre-computed clus-
`ters of related tags. For example, the “San Francisco” tag
`on Flickr has several associated tag clusters3: “california,
`bridge, goldengate”; “baseball, giants, sbcpark”, “deyoung,
`museum”, “sfo, airport” and “halloween, castro”. These
`clusters can potentially be used to generate a summary of
`San Francisco photos. This approach is not location-based,
`and the clusters often do not represent concepts that are dis-
`tinct (e.g., one of Boston’s clusters is “massachusetts, city,
`cambridge, building, architecture”). The tag clusters could
`possibly be used in conjunction with our method. In fact,
`we are using some tag-based computation to select summary
`photos. More directly related is a tag subsumption model
`[17] that can use the tag corpus to derive tags that are sub-
`sumed, for example, by the tag “San Francisco”. Again, this
`approach can be integrated with our location-based sum-
`maries.
`These projects, and others, consider various ways to al-
`leviate the difficulties of browsing large collections of pho-
`tographs, but do not provide effective ways to summarize
`multi-user photo collections or visualize them using maps.
`We believe that the potential of a geographic-based summa-
`rization method is significant, especially in conjunction with
`the current state of the art.
`
`3. THE SUMMARIZATION APPROACH
`In this section, we define the problem of summarizing a
`photo collection, then describe the guidelines and insights
`
`3http://flickr.com/photos/tags/sanfrancisco/clusters/
`
`that have informed the implementation of our summariza-
`tion algorithm. In Section 4 we provide the details of the
`algorithm.
`We formalize the summarization problem as that of pro-
`ducing a ranking on the collection in question.
`In other
`words, we summarize a set of photos by ordering the set
`and selecting the top ranked photos. More formally, we
`are looking at the following problem: Given an album of
`n photos, A = {P1, . . . , Pn}, we wish to find an ordering
`ω of A such that any k-length prefix of ω(A) is the best
`possible k-element summary of A. A summary is loosely
`defined as a subset that captures representativeness, rele-
`vance, and breadth in the original collection. These notions
`are captured through some of the following metadata at-
`tributes that are associated with the photos:
`• Location. Photo Pi was taken at location (xi, yi).4
`• Time. Photo Pi was taken at time ti.
`• Photographer. Photo Pi was taken by user ui.
`• Tags. Photo Pi was manually assigned the list of tags
`(i.e., textual labels) wi.
`• Quality. Photo Pi is associated with an externally de-
`rived parameter qi that represents its quality.
`• Relevance. Photo Pi is associated with a relevance fac-
`tor ri. Relevance can include arbitrary bias based on
`parameters such as recency, the time of day, the day of
`the week, the social network of the user, user attributes,
`and so forth.
`
`Note that The relevance attribute can introduce subjectiv-
`ity, allowing us, for example, to tune the results to the user
`who is making the query and the context of the query.
`While there is no accurate formal model for what con-
`stitutes a “good” summary of a collection of geo-referenced
`photographs, we follow a few simple heuristics that try to
`model and capture human attention, as reflected in the set
`of photos taken in a region. Among these heuristics are the
`notions that:
`• Photographs are taken at locations that provide views of
`some important object or landmark.
`• A location is more relevant if the photos around it were
`taken by a large number of distinct photographers.
`• If available, location-based patterns of textual tags can
`reflect the presence of an important landmarks in a lo-
`cation.
`
`In addition to the heuristics listed above, a desired sum-
`mary would also (a) represent a broad range of subjects,
`instead of thoroughly displaying a few, and (b) allow per-
`sonal or query bias to modify the algorithm’s results.
`In the next section we describe the summarization algo-
`rithm that we developed based on these guidelines.
`
`4. ALGORITHM FOR SUMMARIZATION
`As described in Section 3, our summarization algorithm
`produces a ranking of the photos in the collection; each pre-
`fix of this ranking can serve as a collection summary of the
`corresponding size. Producing this ranking is a two-step
`process, a clustering step followed by a ranking step on the
`resulting clustering hierarchy. In particular:
`
`4Notice that this ‘photo origin location’ is different than the
`‘target location’, the location of the photographed object.
`
`MemoryWeb Ex. 2003
`Apple v. MemoryWeb – IPR2022-00033
`3 of 10
`
`
`
`1. We apply a modified version of the Hungarian clus-
`tering algorithm [7] to our collection of photographs.
`This algorithm receives the photograph locations as an
`input, and organizes them into a hierarchical clustered
`structure.
`
`2. We compute a score for each cluster in the hierarchy.
`
`3. Finally, we generate a flat ordering of all photos in the
`dataset by recursively ranking the sub-clusters at each
`level, starting from the leaf clusters, and ending at the
`root.
`
`Note that while the clustering is a fixed one-time compu-
`tation, the ranking step can be re-evaluated, allowing users
`to specify a personal bias or preference towards any of the
`metadata features. Alternatively, the ranking can also be
`modified to utilize implicit bias in the query context (e.g.,
`the identity of the user making the query).
`To illustrate the process and the scoring mechanism we
`use a hypothetical example, presented in Figure 2. In this
`figure, a leaf node represents a single photograph, annotated
`with the identity of the photographer and a single textual
`tag (in practice, of course, more tags can be associated with
`each photo). The tree represents the hierarchy created by
`the clustering algorithm.
`Next, we describe the algorithm in detail. First, we dis-
`cuss the clustering algorithm that produces the clustering
`hierarchy. Then, we describe how to produce a ranking of
`all photos in a single node of the above mentioned clustering
`hierarchy, assuming that all nodes in the hierarchy are as-
`sociated with scores. Finally, we show how we can generate
`such scores for the nodes in the hierarchy.
`4.1 Clustering
`Our method requires a hierarchical clustering algorithm;
`as noted above, we use the Hungarian clustering algorithm
`[7]. This algorithm identifies a hierarchy of clusters within
`a given dataset of n points, based only on the distances
`between those points.
`In our system, the input to the clustering algorithm is a
`set of points in the plane, representing the locations of the
`photographs,5
`A = {(xi, yi) ∈ R2, 1 ≤ i ≤ n} .
`(1)
`The output is a clustering of these photo locations, C(A),
`where C(A) is a tree. Each node in the tree represents a
`subset of A, the root of the tree represents the entire set,
`the children of each node are a partition (or clustering) of
`the subset that is associated with that parent node, and the
`leaves of the tree are the points in A.
`The classical Hungarian method is an efficient algorithm
`for solving the problem of minimal-weight cycle cover. In
`that problem, one is given a weighted graph and needs to
`find a cover of that graph by disjoint cycles with minimal to-
`tal weight. This algorithm serves as the basic building block
`for a clustering method that is dubbed The Hungarian clus-
`tering method. Viewing A as a complete weighted graph,
`where the weight of each edge is the Euclidean (geographic,
`in this case) distance between the two nodes that it connects,
`the Hungarian clustering method subjects that graph to the
`classical Hungarian method. The disjoint cycles, produced
`5For convenience, we use the same notation, A, to denote
`the photo set as well as the set of photo locations.
`
`by the Hungarian method, are viewed as a partition of the
`data-set. The clustering algorithm then proceeds by hier-
`archical merging of the disjoint cycles, until the produced
`clusters are perceived as complete clusters (through some
`”completeness” criteria) and then the hierarchical merging
`stops. We use the Hungarian Clustering algorithm because
`of two features that it boasts: It is an hierarchical clustering
`algorithm, and it does not depend on the number of clusters
`as an input.
`The clustering hierarchy C(A) is used to create a ranking
`of all photos. In order to describe the ranking algorithm,
`let us first assume that the nodes in the hierarchy have been
`assigned a score that embodies the importance of the cluster
`of photos that corresponds to that node.
`4.2 Ranking Framework
`Given a hierarchical clustering C(A) on the locations of
`all photographs, and a score for every node (cluster) in that
`hierarchy, our goal is then to produce a ranking of all items
`in the collection. We describe a recursive interleaving algo-
`rithm that uses the clustered structure and the correspond-
`ing scores in order to produce a natural flat ordering. In the
`next section we outline a way to generate the scores.
`Going bottom up, the ranking algorithm considers each
`node B in the hierarchy C(A) and outputs an ordering ω(B)
`that represents a ranking of photos in B. Finally, when ex-
`ecuting on the root node that corresponds to the entire set
`A, we get the ordered sequence, S := ω(A), that describes
`a ranking of all photos in A. Applying this algorithm to the
`example in Figure 2, a possible output could be the ranking
`S = (6, 8, 4, 5, 7), where the numbers in the sequence corre-
`spond to the numerals of the leaves in the tree in Figure 2.
`For simplicity of notations, we describe the action of the
`algorithm on the root node, A. Actions on other nodes
`are performed in the same manner. We assume that we
`i=1 Ai; namely, node
`A has m direct descendents. In addition, assume that the
`photos in each sub-cluster of A have been ranked recursively
`according to this algorithm, and that each of the nodes Ai
`is associated with some score s(Ai) such that (without loss
`of generality)
`
`identified m sub-clusters in A, A =Sm
`
`s(A1) ≥ s(A2) ≥ · · · ≥ s(Am) .
`
`(2)
`
`Figure 2: A sample hierarchy; the leaves are photos,
`each associated with a user and a single tag.
`
`Our goal is to produce a ranking that would balance the
`contradicting properties of depth and breadth of coverage.
`In the field of Information Retrieval, some measures are used
`to balance results in terms of relevance (depth) and broad-
`ness (breadth) [1, 10, 20]; for various reasons, these measures
`are not applicable here. For our problem, depth requires
`
`1
`
`3
`
`6
`
`7
`
`8
`
`2
`
`5
`
`4
`
`(U1,Bridge) (U1,Car) (U2,Bridge) (U3,Car)(U3,Museum)
`
`MemoryWeb Ex. 2003
`Apple v. MemoryWeb – IPR2022-00033
`4 of 10
`
`
`
`that the photos in a cluster are selected from sub-clusters
`roughly according to the ratio of their scores. For example,
`consider the second level of the hierarchy in Figure 2, which
`consists of two clusters, denoted by C2 and C3, and assume
`that s(C2) : s(C3) = 5 : 3. We would like to interleave the
`photos from these two clusters so that in any section of the
`sequence S, the frequencies of photos from the two clusters
`relate to each other as closely as possible to their score ratio
`in the whole dataset, i.e., 5 : 3. On the other hand, breadth
`requires that each sub-cluster should be represented to some
`extent early in the ranking of its parent cluster.
`In order to attain some amount of depth, breadth, and
`consistency, we interleave photos from sub-clusters in the
`following manner. The ordered sequence of photos for A
`will have two parts: a short header H followed by a trailer
`T , where S(A) = H||T .
`The header H will include a photo from all prominent
`sub-clusters. To that end, we define a threshold 0 < w < 1,
`and then a cluster Ai is deemed prominent if
`Pm
`s(Ai)
`≥ w .
`j=1 s(Aj)
`Assume that there are m0 prominent sub-clusters among the
`m sub-clusters, with 0 ≤ m0 ≤ m. Then in view of assump-
`tion (2), the header is
`H = (A1,1, A2,1 · · · , Am0,1) ,
`where Ai,1 is the most relevant photo from cluster Ai. This
`header is then followed by a trailer, T .
`In order to gen-
`erate the trailer, we first remove from each sub-cluster the
`photo that was selected for the header, recalculate the sub-
`cluster scores, and then assign each sub-cluster a probabil-
`ity that equals its score divided by the sum of scores of all
`sub-clusters. Those probabilities are then used to randomly
`select a sub-cluster. If sub-cluster Ai was selected, we re-
`move its top-ranked photo, append it to T and repeat, until
`all photos have been selected.
`By now we have described how to generate the cluster
`hierarchy and produce a ranking on the photos in that hi-
`erarchy, under the assumption that all nodes are associated
`with scores. We therefore proceed to describe a key aspect
`of the algorithm: the computation of the scores for a given
`cluster (node).
`4.3 Scoring Clusters
`The score of a cluster Ai depends on several factors, in-
`cluding the following:
`
`1. The sum of relevance factors (see Section 3) of all pho-
`tos in the cluster,
`
`X
`
`ρi =
`
`rj .
`
`Pj∈Ai
`
`2. The tag-distinguishability of the cluster, τi (explained
`below).
`
`3. The photographer-distinguishability of the cluster, φi
`(explained below).
`
`We define the cluster density as
`
`δi = 1/(1 + σi) .
`
`(3)
`
`5. The sum of image qualities (see Section 3) of all photos
`in the cluster,
`
`X
`
`κi =
`
`qj .
`
`Pj∈Ai
`
`While most of the above factors are derived only from
`data that is contained in the photo collection, the relevance
`factor can introduce bias by subjective requirements. The
`relevance factor ri of a photo Pi can incorporate parameters
`such as recency, the time of day, the time of the week, the
`identity of the photographer, etc. These parameters can be
`specified by a user making the query, or set by the system
`according to the application or the query context. Each
`photo is assigned a score θ(Pi) in the range [0, 1] for each
`such parameter. The final relevance score, ri, may be a
`weighted average of all those parameter scores.
`The two interesting factors in the score computation are
`the tag- and photographer-distinguishability scores of clus-
`ters. These values are intended to represent how strongly a
`particular cluster is associated with specific tags or photog-
`raphers.
`4.3.1 Tag-distinguishability of clusters
`Tag-distinguishability aims at detecting distinct or unique
`concepts that are represented in a given cluster, as those
`may indicate the presence of some interesting landmarks
`or objects in that cluster. For example, in Figure 2, the
`tag “bridge” appears in two photos from Cluster C2, and
`does not appear elsewhere. As a consequence, C2’s score im-
`proves. On the other hand, the tag “car” appears in photos
`from both C2 and C3 and therefore does not help to distin-
`guish either of them.
`Formally, each photo Pj, 1 ≤ j ≤ n, is tagged with tags
`that are drawn from a finite dictionary, T . Hence, tagging
`may be viewed as a mapping Pj 7→ T (Pj) ⊂ T . For all t ∈ T
`and 1 ≤ i ≤ m, let
`
`|{Pj ∈ Ai : t ∈ T (Pj)}|
`|Ai|
`(4)
`tft,i =
`denote the relative frequency of the tag t in Ai, (or term
`frequency as it is referred to in Information Retrieval). We
`often found that this measure biases towards tags that have
`been used frequently by one user in the same cluster. An
`alternative frequency calculation can be based on the frac-
`tion of photographers in this cluster that have used the tag
`t:
`
`tft,i =
`
`(5)
`
`|{u ∈ Ui : t ∈ T (Pj), Pj ∈ Ai, Pj ∈ Bu}|
`|Ui|
`where Ui is the set of users that have taken photos in cluster
`Ai, and Bu is a set of photos taken by user u.
`We also use the inverse document frequency, which is a
`measure of the overall frequency of the tag t in the entire
`photo collection,
`
`4. The density of the cluster. More specifically, let σx,i
`and σy,i denote the standard deviation of the x and y
`coordinates, respectively, of all points in Ai, and let
`
`σi =`(σx,i)2 + (σy,i)2´1/2
`
`.
`
`idft =
`
`n
`|{Pj ∈ A : t ∈ T (Pj)}| .
`There are several ways to combine these two scores to mea-
`sure how the term t distinguishes the cluster Ai from other
`
`(6)
`
`MemoryWeb Ex. 2003
`Apple v. MemoryWeb – IPR2022-00033
`5 of 10
`
`
`
`!1/2
`
`
`
`X u
`
`φi =
`
`φ2
`u,i
`
`.
`
`(14)
`
`(15)
`
`According to the guidelines in Section 3, while large tag-
`distinguishabilities should contribute towards an increase in
`a cluster’s score, the photographer-distinguishability should
`have an opposite effect. The more a given cluster is associ-
`ated with a single photographer (or few photographers), the
`less we are interested in that cluster.
`Next, we describe how to merge all these factors into a
`single score for each cluster.
`4.3.3 Overall Cluster Score
`The score s(Ai) of the cluster Ai should depend in a
`monotonically increasing manner on the relevance factor,
`ρi, and the image quality factor, κi. The score should
`also depend in a monotonically increasing manner on the
`density measure of the cluster, δi (3), and on τi, the tag-
`distinguishability measure of the cluster. Finally, the score
`must depend in a monotonically decreasing manner on φi,
`the photographer-distinguishability measure of the cluster,
`as discussed above. Therefore, the overall score is:
`) · ρi
`s(Ai) = h(κi, δi, τi, φ
`−1
`i
`where h could be, typically, a geometric mean or a weighted
`average of its variables, and the weights may be chosen and
`fine-tuned by experimentation.
`4.4 Final Considerations
`We have described in this section the full framework of
`our algorithm: how the clustering is done, how scores are
`computed for each node in the cluster hierarchy, and how an
`ordering is produced on all photos given the scores and the
`cluster hierarchy. In practice, we found that it was necessary
`to prevent clusters from subdividing past some minimum
`size. Computing a ranking for a small cluster is meaningless.
`For example, there is not enough information (photos in
`the clusters) to compute a relevant tag- or photographer-
`distinguishability score. In order to solve this problem, we
`simply enforced a minimum size on all non-leaf clusters, by
`merging nodes at the lowest levels of the hierarchy.
`The way ranking is performed for these flat “edge” clus-
`ters is simple, yet different than the generalized method de-
`scribed above. The system computes the top-scoring tags
`for each flat cluster using the tag-distinguishability method,
`and then picks a photo with tags that best match these
`top tags. For example, in Figure 2, photo 6 is likely to
`be picked first because of the tag ‘bridge’, which is asso-
`ciated with that photo, and also appears often in C2 and
`rarely elsewhere. Other approaches for ranking photos for
`the flat leaf clusters may be choosing photos that maximize
`the visual similarity to other photos in this cluster, or some
`combination of such tag- and image-based similarity, as well
`as quality and relevance factors associated with the photos.
`Joshi et at [9] propose, in a different context, a solution that
`can be applied here.
`
`5. SAMPLE APPLICATION
`The summarization algorithm has a number of possible
`uses. As mentioned above, the summaries could be used to
`
`Finally, the overall photographer-distinguishability is de-
`fined as
`
`clusters. Let us denote such measures by τt,i. The usual
`measure in Information Retrieval is the tf-idf weight (term
`frequency – inverse document frequency). That measure is
`defined as
`
`τt,i := tfidft,i = tft,i · idft .
`Another alternative to (7) which is used in Information Re-
`trieval is
`
`(7)
`
`(8)
`
`τt,i := tfidft,i = tft,i · log (idft) .
`In both cases, large values of τt,i indicate that the number
`of occurrences of t in Ai is large with respect to its number
`of occurrences elsewhere.
`We would like to note that in the usual idf weight, the
`inverse document-frequency involves the number of clusters
`in which the tag appears, as opposed to the total number
`of actual tag occurrences, as given in (6). However, the
`usual definition is not suitable for cases where the number
`of clusters (documents) is small.
`In such cases, a single
`random occurrence of a tag in a cluster has a significant
`effect on the usual measure, while in the alternate approach
`we opted for it would be hardly noticeable.
`Next, we need to define an overall tag-distinguishability
`measure for Ai, denoted τi, based on the tag-distinguishability
`measures of all tags in the cluster. We compute the overall
`score by using the Euclidean measure based on the ‘2-norm,
`
`!1/2
`
`
`
`X t
`
`τi =
`
`∈T
`
`τ 2
`t,i
`
`.
`
`(9)
`
`We directly evaluate the effectiveness of our approach to
`“tag scoring” in Section 7.
`4.3.2 Photographer-distinguishability of clusters
`The measure of photographer-distinguishability (or user-
`distinguishability) is, roughly,
`inversely correlated to the
`number of photographers associated with a given cluster.
`The fewer active photographers in a cluster, the lower the
`likelihood the cluster will be semantically meaningful. For
`example, in Figure 2, all the