Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

Constrained Clustering
Advances in Algorithms, Theory, and Applications

Edited by
Sugato Basu, Ian Davidson, Kiri L. Wagstaff

CRC Press
Taylor & Francis Group
Boca Raton London New York

CRC Press is an imprint of the
Taylor & Francis Group, an informa business

A CHAPMAN & HALL BOOK

Cover image shows the result of clustering a spectral image of Mars using soft constraints to impose spatial contiguity on cluster assignments. The data set was collected by the Space Telescope Imaging Spectrograph (STIS) on the Hubble Space Telescope. This image was reproduced with permission from Intelligent Clustering with Instance-Level Constraints by Kiri Wagstaff.

Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2009 by Taylor & Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-1-58488-996-0 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Constrained clustering : advances in algorithms, theory, and applications / editors, Sugato Basu, Ian Davidson, Kiri Wagstaff.
p. cm. -- (Chapman & Hall/CRC data mining and knowledge discovery series)
Includes bibliographical references and index.
ISBN 978-1-58488-996-0 (hardback : alk. paper)
1. Cluster analysis--Data processing. 2. Data mining. 3. Computer algorithms. I. Basu, Sugato. II. Davidson, Ian, 1971- III. Wagstaff, Kiri. IV. Title. V. Series.

QA278.C63 2008
519.5'3--dc22

2008014590

Visit the Taylor & Francis Website at
http://www.taylorandfrancis.com

and the CRC Press Website at
http://www.crcpress.com

Chapter 2

Semi-Supervised Clustering with User Feedback

David Cohn
Google, Inc., cohn@google.com

Rich Caruana
Cornell University, caruana@cs.cornell.edu

Andrew Kachites McCallum
University of Massachusetts, Amherst, mccallum@cs.umass.edu

Abstract  We present an approach to clustering based on the observation that "it is easier to criticize than to construct." Our approach of semi-supervised clustering allows a user to iteratively provide feedback to a clustering algorithm. The feedback is incorporated in the form of constraints, which the clustering algorithm attempts to satisfy on future iterations. These constraints allow the user to guide the clusterer toward clusterings of the data that the user finds more useful. We demonstrate semi-supervised clustering with a system that learns to cluster news stories from a Reuters data set.

2.1 Introduction

Consider the following problem: you are given 100,000 text documents (e.g., papers, newsgroup articles, or web pages) and asked to group them into classes or into a hierarchy such that related documents are grouped together. You are not told what classes or hierarchy to use or what documents are related. Your job is simply to create this taxonomy so that the documents can be browsed and accessed efficiently, either by yourself or by other people. While
you may have some criterion in mind, you would probably be hard-pressed to express it algorithmically.

¹This work was originally circulated as an unpublished manuscript [4] when all the authors were at Justsystem Pittsburgh Research Center.

This problem is ubiquitous. The web has created a number of new examples of it, but it can be found in many fields that don't involve the web, as well as with many different types of "documents." Librarians, astronomers, biologists: practically everyone tasked with creating a taxonomy from data faces this problem in one form or another.

We propose the following iterative solution to this problem (sketched in code after the list):

1. Give the 100,000 documents to an unsupervised clustering algorithm and have it cluster them.

2. Browse the resulting clusters and tell the system which clusters you like, and which clusters you don't like. Don't do this for all the clusters, just for some of the ones you browsed. Provide feedback to the system by saying "This document doesn't belong in here," "Move this document to that cluster," or "These two documents shouldn't be (or should be) in the same cluster."

   Don't do this for all, or even many, of the documents; only for the few that look most out of place.

3. After your critique, re-cluster the documents, allowing the clustering algorithm to modify the distance metric parameters to try to find a new clustering that satisfies the constraints you provided in the critique.

4. Repeat this until you are happy with the clustering.
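
A minimal sketch of this loop in Python follows. The helper names (`cluster`, `solicit_feedback`, `warp_metric`) and the constraint representation are our own stand-ins for the components described above, not part of any published implementation:

```python
from typing import Callable, List, Tuple

Constraint = Tuple[int, int]  # a pair of documents the user wants separated

def semi_supervised_clustering(
    documents: List[str],
    n_clusters: int,
    cluster: Callable,           # steps 1 and 3: cluster under current metric
    solicit_feedback: Callable,  # step 2: user critique -> new constraints
    warp_metric: Callable,       # adjust term weights to satisfy constraints
    max_rounds: int = 20,
):
    """Iterative loop of Section 2.1: cluster, critique, re-cluster."""
    constraints: List[Constraint] = []
    weights = None  # per-term weights (the gammas of Section 2.3.1)
    clusters = cluster(documents, n_clusters, weights)   # step 1
    for _ in range(max_rounds):
        new = solicit_feedback(clusters)                 # step 2
        if not new:                                      # step 4: satisfied
            break
        constraints.extend(new)
        weights = warp_metric(weights, constraints)
        clusters = cluster(documents, n_clusters, weights)  # step 3
    return clusters
```
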
This solution is distinct from both traditional supervised and unsupervised learning. Unsupervised clustering takes an unlabeled collection of data and, without intervention or additional knowledge, partitions it into sets of examples such that examples within clusters are more "similar" than examples between clusters. Much work in unsupervised clustering is dedicated to the problem of manually engineering similarity criteria that yield good partitioning of data for a given domain.

Supervised learning, on the other hand, assumes that the class structure or hierarchy already is known. It takes a set of examples with class labels, and returns a function that maps examples to class labels. The goal of supervised learning is to learn mappings that are accurate enough to be useful when classifying new examples, and perhaps to learn mappings that allow users to understand the relationships between the data and the labels, such as which features are important.

Semi-supervised clustering falls between the extremes of totally unsupervised clustering and totally supervised learning. The main goal of our approach to semi-supervised clustering is to allow a human to "steer" the clustering process so that examples can be partitioned into a useful set of clusters with minimum time and human effort. A secondary goal of semi-supervised
clustering is to give the user a way to interact and play with the data so that they can understand it better.²

Our approach to semi-supervised clustering assumes that the human user has in their mind criteria that enable them to evaluate the quality of a clustering. It does not assume that the user is conscious of what they think defines a good clustering but that, as with art, they will "know it when they see it." Most importantly, semi-supervised clustering never expects a user to write a function that defines the clustering criterion. Instead, the user interacts with the clustering system, which attempts to learn a criterion that yields clusters the user is satisfied with. As such, one of the primary challenges of semi-supervised clustering is finding ways to elicit and make use of user feedback during clustering.

The remainder of this chapter describes one simple, illustrative way in which this may be accomplished. Other challenges that need to be addressed by future research on semi-supervised clustering are briefly described in the discussion section.

2.1.1 Relation to Active Learning

Semi-supervised clustering with user feedback is closely related to active learning [5]. In the most common form of active learning, a learning system attempts to identify which data points, if labeled by a human, would be most informative. In semi-supervised clustering, the human selects the data points, and puts on them a wide array of possible constraints instead of labels. These two key differences point toward some situations in which the semi-supervised approach is preferable.

1. In some clustering problems the desired similarity metric may be so different from the default that traditional active learning would make many inefficient queries. This problem also arises when there are many different plausible clusterings. Although less automated, a human browsing the data would do less work by selecting the feedback data points themselves.

2. The intuitive array of possible constraints are easier to apply than labels, especially when the final clusters are not known in advance.

3. The very act of human browsing can lead to the discovery of what clusters are desired. Semi-supervised learning can thus be seen as a method of data exploration and pattern discovery, efficiently aided by cluster-based summarization.

²Demiriz et al. [7] independently introduced a semi-supervised clustering model similar to the one we describe here. The main distinction between our work and theirs is our use of iterative feedback to acquire labelings; Demiriz et al. assume that all available labels are given a priori.
However, the distinction with active learning is subjective. As we will see in Section 2.5.1, our system could easily be viewed as a practical application of learning by counterexamples [1], one of the earliest and most powerful forms of active learning studied in the theory community.

Hybrid active/semi-supervised systems are also plausible. In situations with a large number of data points and data types that are difficult to browse, one could imagine a system that combines some of the automated selection of active learning with the human browsing of semi-supervised clustering. The active learner could make many disparate hypotheses about the underlying labels and present the examples that would be most indicative of each.

2.2 Clustering

Formally, clustering is the process of partitioning a data set into subsets such that all members of a given subset are "similar" according to some distance measure $D$. We will denote the distance between two examples $x_1$ and $x_2$ as $D(x_1, x_2)$. We can generalize this to refer to $D(y_1, y_2)$, the distance between two cluster centers, or $D(y, x_1)$, the distance between a cluster center and an example.

The two most popular approaches to clustering are agglomerative clustering and prototype-based clustering. In agglomerative clustering, each datum is initially placed in its own cluster. The clusters that are most similar (according to $D$) are iteratively merged, until the desired number of clusters is reached, or some limit on data likelihood or distortion is exceeded (see Hofmann and Buhmann [14] for an in-depth treatment of agglomerative clustering).

In prototype-based clustering, the final number of clusters is usually set a priori, and the corresponding prototypes are found using some form of Expectation Maximization (EM) [8]. Each prototype is initialized to some position (in our case, a randomly weighted sample of the training points). Examples are assigned to prototypes according to their similarity to each prototype (the assignment may be 0-1 or fractional, depending on the algorithm). Prototypes are then adjusted to maximize the data likelihood, or, equivalently, minimize the data distortion. The assignment/adjustment process is repeated until no significant changes result (see Meilă and Heckerman [17] for a concise review of prototype-based clustering).

In the present chapter, we adopt a statistical prototype-based approach, resulting from the naive Bayes model of document generation [16].³ Given a

³We reiterate that the approach described in this chapter is only for the point of exploration and illustration; the approach is, in theory, applicable to almost any clustering algorithm.

vocabulary $V$, a document is assumed to be a "bag of words" generated from a multinomial distribution $\theta$. In this model, the probability of document $x$ is

$$P(x) = \prod_{t_j \in V} P(t_j \mid \theta)^{N(t_j, x)},$$

where $P(t_j \mid \theta)$ is the parameterized probability of term $t_j$ being generated, and $N(t_j, x)$ is the number of times $t_j$ appears in the document. Each document $x$ forms an estimate of a multinomial distribution $\theta_x$; likewise, each cluster of documents $\pi$ forms an estimate $\theta_\pi$, composed from the $\theta_x$ of its constituent documents.⁴

For clustering we assume that, instead of being produced by a single multinomial distribution, each of the observed documents was drawn from one of the distributions $\theta_{\pi_1}, \theta_{\pi_2}, \ldots, \theta_{\pi_k}$, corresponding to the unknown distribution of clusters $\pi_1, \pi_2, \ldots, \pi_k$:

$$P(x) = \sum_i P(\pi_i) P(x \mid \pi_i) = \sum_i P(\pi_i) \prod_{t_j \in V} P(t_j \mid \theta_{\pi_i})^{N(t_j, x)}.$$

Our task is to estimate values for $P(\pi_i)$ and $\theta_{\pi_i}$, which will in turn allow us to estimate cluster memberships $P(\pi_i \mid x)$ by Bayes rule:

$$P(\pi_i \mid x) = P(x \mid \pi_i)\, P(\pi_i) / P(x). \qquad (2.1)$$

We find estimates for $P(\pi_i)$ and $\theta_{\pi_i}$ via the standard procedure for EM, beginning with randomized estimates of $\theta_{\pi_i}$ drawn as a weighted sample from the observations. Then, for each cluster $\pi_i$ and document $x$, we compute $P(x \mid \theta_{\pi_i})$ and apply Equation 2.1 to compute $P(\pi_i \mid x)$. Each cluster is given partial ownership of a document proportional to $P(\pi_i \mid x)$. The parameters $\theta_{\pi_i}$ are recomputed as the weighted sum of their component documents, and the process is repeated. The algorithm is guaranteed to converge to a locally optimal clustering (see, e.g., MacKay [15] or Meilă and Heckerman [17] for details).

⁴The estimates for term probabilities are derived from the relative term frequencies in the documents. Following McCallum and Nigam [16], we smooth with a Laplacean prior to avoid zero term probabilities.

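For concreteness, here is a minimal sketch of this EM procedure in Python (NumPy), assuming documents have already been reduced to a term-count matrix; the smoothing constant and iteration count are our own choices, not values from the chapter:

```python
import numpy as np

def em_cluster(counts, k, n_iters=50, rng=None):
    """EM for a mixture of multinomials (naive Bayes clustering).

    counts: (n_docs, n_terms) array of term counts N(t_j, x)
    k:      number of clusters
    Returns (prior, theta, resp), where resp[i, d] = P(pi_i | x_d).
    """
    rng = np.random.default_rng(rng)
    n_docs, n_terms = counts.shape
    # Initialize each prototype theta_i from a randomly weighted sample
    # of the training points, with Laplace smoothing (see footnote 4).
    w = rng.random((k, n_docs))
    theta = (w @ counts) + 1.0
    theta /= theta.sum(axis=1, keepdims=True)     # P(t_j | theta_i)
    prior = np.full(k, 1.0 / k)                   # P(pi_i)

    for _ in range(n_iters):
        # E-step: log P(x_d | pi_i) = sum_j N(t_j, x_d) log P(t_j | theta_i),
        # then Bayes rule (Equation 2.1) to get cluster memberships.
        log_post = counts @ np.log(theta).T + np.log(prior)
        log_post -= log_post.max(axis=1, keepdims=True)  # for stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)   # P(pi_i | x_d)
        # M-step: clusters own documents in proportion to resp.
        prior = resp.mean(axis=0)
        theta = (resp.T @ counts) + 1.0
        theta /= theta.sum(axis=1, keepdims=True)
    return prior, theta, resp.T
```
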
2.3 Semi-Supervised Clustering

FIGURE 2.1: Illustration of semi-supervised clustering. Given an initial clustering, the user specifies two points that should not have been placed in the same cluster. The system warps its metric, allowing it to find a clustering that respects the constraint.

The goodness of any clustering depends on how well the metric $D$ matches the user's (perhaps unknown) internal model of the target domain. We propose allowing the user to impose their model on the metric via the clustering algorithm, by having the user provide the algorithm with feedback, and allowing it to alter the metric so as to accommodate that feedback. Not only is it easier to critique than to construct, but the user's criticism can take many forms: specifying that a particular example does/doesn't belong in a particular cluster, that two examples do/don't belong in the same cluster, or that a particular cluster is good (and should be preserved) or bad (and should be split up).

Feedback may be incorporated into the metric as constraints to be respected by the clustering algorithm. Consider two examples, $x_1$ and $x_2$, that are constrained by the user feedback to be in separate clusters. When the clustering algorithm attempts a partitioning which places $x_1$ and $x_2$ in the same cluster, the metric may be altered to increase the distance between $x_1$ and $x_2$ until one or the other of them falls in a different cluster (Figure 2.1). Other constraints may be implemented similarly, shrinking the distance between some example and a cluster prototype, or increasing the distance between a cluster prototype and all the examples assigned to it.

2.3.1 Implementing Pairwise Document Constraints

In this probabilistic setting, the natural measure of dissimilarity between two documents, $x_1$ and $x_2$, is the probability that they were generated by the same multinomial. From Pereira et al. [19], this is proportional to the KL divergence to the mean of their multinomial distributions:

$$D_{KM}(x_1, x_2) = |x_1|\, D_{KL}(\theta_{x_1}, \theta_{x_1,x_2}) + |x_2|\, D_{KL}(\theta_{x_2}, \theta_{x_1,x_2}),$$

where $|x|$ is the length of document $x$, $D_{KL}(\theta_1, \theta_2)$ is the standard Kullback-Leibler divergence of $\theta_1$ to $\theta_2$, and $\theta_{x_1,x_2}$ is a distribution such that

$$P(t_j \mid \theta_{x_1,x_2}) = \left( P(t_j \mid \theta_{x_1}) + P(t_j \mid \theta_{x_2}) \right) / 2.$$

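As a worked sketch, this divergence can be computed directly from two smoothed term distributions; the function name and argument conventions below are ours:

```python
import numpy as np

def d_km(theta1, theta2, len1, len2):
    """KL divergence to the mean of two document multinomials.

    theta1, theta2: smoothed term distributions (sum to 1, all > 0)
    len1, len2:     document lengths |x1|, |x2|
    """
    mean = (theta1 + theta2) / 2.0          # P(t_j | theta_{x1,x2})
    kl = lambda p, q: np.sum(p * np.log(p / q))
    return len1 * kl(theta1, mean) + len2 * kl(theta2, mean)
```
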
The advantage of this measure is that it is symmetric, unlike standard KL divergence.

To implement our constraints, we augment the standard KL divergence $D_{KL}(\theta_{x_1}, \theta_{x_2})$ with a weighting function:

$$D_{KL\gamma}(\theta_{x_1}, \theta_{x_2}) = \sum_{t_j \in V} \gamma_j\, P(t_j \mid \theta_{x_1}) \log \left( \frac{P(t_j \mid \theta_{x_1})}{P(t_j \mid \theta_{x_2})} \right),$$

where $\gamma_j$ may be interpreted as indicating the importance of $t_j$ for distinguishing $x_1$ and $x_2$. Then, given a constraint that $x_1$ and $x_2$ must be in separate clusters, we can warp the metric by computing

$$\frac{\partial D_{KM\gamma}(x_1, x_2)}{\partial \gamma_j} = |x_1|\, P(t_j \mid \theta_{x_1}) \log \left( \frac{P(t_j \mid \theta_{x_1})}{P(t_j \mid \theta_{x_1,x_2})} \right) + |x_2|\, P(t_j \mid \theta_{x_2}) \log \left( \frac{P(t_j \mid \theta_{x_2})}{P(t_j \mid \theta_{x_1,x_2})} \right)$$

and hillclimbing over $\gamma_j$ to increase the effective distance between the two. This gradient tells us the direction to move the $\gamma$'s in order to increase (or decrease) the separation between two documents. (In the current experiments we constrain the $\gamma$'s to be positive, but it might be interesting to relax this and allow some $\gamma$'s to become negative.)

These $\gamma$'s are incorporated back into the E-step of the clustering algorithm as weights attached to the individual term frequencies:

$$P(x \mid \pi_i) = \prod_{t_j \in V} P(t_j \mid \theta_{\pi_i})^{\gamma_j N(t_j, x)}.$$

Intuitively, a small $\gamma_j$ reduces the effect of $t_j$'s presence or absence on document likelihood, effectively scaling its effect on the document's divergence from its cluster center. As such, we are able to inject a learned distance metric directly into the clustering algorithm.

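A sketch of the metric-warping step under the gradient above; the step size, step count, and positivity floor are our assumptions (the chapter says only that a fixed number of hillclimbing steps is used and that the $\gamma$'s are kept positive):

```python
import numpy as np

def warp_gammas(gamma, theta1, theta2, len1, len2, step=0.1, n_steps=10):
    """Hillclimb the term weights gamma to increase D_KM(x1, x2)."""
    mean = (theta1 + theta2) / 2.0
    # dD/dgamma_j from the text; note it does not depend on gamma itself,
    # so each hillclimbing step adds the same increment.
    grad = (len1 * theta1 * np.log(theta1 / mean)
            + len2 * theta2 * np.log(theta2 / mean))
    for _ in range(n_steps):
        gamma = np.maximum(gamma + step * grad, 1e-6)  # keep gammas positive
    return gamma
```

In the E-step these weights then enter as exponents on the term probabilities, i.e., each count $N(t_j, x)$ is effectively scaled to $\gamma_j N(t_j, x)$.
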
2.3.2 Other Constraints

Other constraints described in the previous section may be similarly implemented by hillclimbing over the example-to-cluster and cluster-to-cluster distances. Note that the linear warping we describe will not guarantee that all constraints can be satisfied; some clusterings desired by the user may be non-convex and unrealizable in the space of models supported by naive Bayes. In this case, the hillclimbing will converge to a weighting that provides a local minimum of constraint violations. Local or nonlinear warpings of the distance metric, such as the ones described by Friedman [11] and Yianilos [21], may be of use in these situations.

2.4 Experiments

In this section, we illustrate the semi-supervised approach on a small document clustering problem. We use a set of 25 documents each from five Reuters topic areas: business, health, politics, sports, and tech. Starting from five randomly initialized prototypes, the EM-based clustering algorithm described in the previous sections finds clusters that maximize data likelihood.

Each time clustering converges, we add a constraint. We simulate a human user by identifying two documents from the same cluster whose sources are different Reuters topics, and constrain them to be in different clusters.⁵ For each unsatisfied constraint, we reweight the divergence by a fixed number of hillclimbing steps, re-initialize the cluster prototypes, and repeat the EM training.

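The simulated user is easy to express in code; a sketch assuming `assignments` gives each document's cluster and `topics` its true Reuters topic (both names are ours):

```python
def simulate_user(assignments, topics):
    """Find one pair of documents in the same cluster but from different
    Reuters topics, and return them as a must-be-separated constraint."""
    by_cluster = {}
    for doc, c in enumerate(assignments):
        by_cluster.setdefault(c, []).append(doc)
    for docs in by_cluster.values():
        for i, a in enumerate(docs):
            for b in docs[i + 1:]:
                if topics[a] != topics[b]:
                    return (a, b)   # constrain these two apart
    return None                     # no impure cluster; no critique
```
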
2.4.1 Clustering Performance

Figure 2.2 compares the performance of supervised, unsupervised, and semi-supervised learning. For the unsupervised and semi-supervised learners, we plot cluster purity: the fraction of examples that would be classified correctly if all examples were assigned the majority label in each cluster. For the supervised learner, we plot both cluster purity and classification accuracy (generalization).
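
Cluster purity as defined here can be computed in a few lines; a sketch, with names of our choosing:

```python
from collections import Counter

def cluster_purity(assignments, labels):
    """Fraction of examples classified correctly if every example were
    assigned the majority label of its cluster."""
    by_cluster = {}
    for c, y in zip(assignments, labels):
        by_cluster.setdefault(c, []).append(y)
    correct = sum(max(Counter(ys).values()) for ys in by_cluster.values())
    return correct / len(labels)
```
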
After only a few constraints have been added, cluster purity increases sharply over that of unsupervised clustering. It is not clear, however, how to fairly compare the performance of semi-supervised clustering with that of fully supervised clustering: constraints do not exactly correspond to labeled examples, and it is uncertain what constitutes a proper test set. In supervised learning, documents used for training are traditionally excluded from the test set, since their labels are already known. But the semi-supervised model clusters (and is tested on) the entire corpus, so it is also reasonable to gauge it against a supervised learner tested the same way. In the figure we show the cluster purity of supervised learning on the training set as well as its generalization to an independent test set.

The semi-supervised learner reaches its asymptotic performance after about 10 constraints have been added; the supervised learners require between 3 and 6 times more labeled examples to reach that level of performance.⁶

⁵A fully operational semi-supervised clustering system would benefit from a graphical user interface that permits efficient browsing of the current clusters and supports easy specification of user constraints. See the discussion of Scatter/Gather later in this chapter.

⁶To assure ourselves that metric warping alone wasn't responsible for the performance disparity, we also incorporated metric warping into the supervised clusterer, shrinking the divergence between a document and its assigned cluster. The addition resulted in no significant performance improvement.

[Figure: y-axis, cluster purity (or fraction correct); x-axis, number of constraints (or labeled examples), 0-60.]

FIGURE 2.2: Learning curves for supervised, unsupervised, and semi-supervised clustering. For supervised clustering, cluster purity (measured on the train set) and generalization (measured on an independent test set) are plotted against the number of labeled examples; for semi-supervised clustering, purity is plotted against the number of constraints. Averages over 10 runs each, with the upper and lower lines indicating error bars at one standard deviation. See text for details.

It is interesting to note that the performance of the semi-supervised learner actually begins to decrease after roughly 20 constraints have been added. The Reuters data set contains many documents that appear under more than one topic (an identical article on Microsoft, for example, appears under both business and tech). We hypothesize that, in an attempt to separate these inseparable documents, the learner is pushing its term weightings to unhealthy extremes.

Experiments on a larger data set consisting of 20,000 USENET articles suggest that semi-supervised clustering is just as effective with large data sets. More importantly, these experiments show that semi-supervised clustering is able to cluster the same data according to different orthogonal criteria. This data set contains articles on four subjects: aviation simulators, real aviation, auto simulators, and real autos. Semi-supervised clustering can cluster the simulator and real groups together (e.g., aviation simulators and real aviation) or the auto and aviation groups together (e.g., aviation simulators and auto simulators) depending on the feedback provided by the user. In both cases it does so at about 80% accuracy with 10 constraints. When the distance metric is not adjusted, the same constraints give an average of only 64% accuracy. (Purely unsupervised clustering achieves only about 50% accuracy.)

[Figure: y-axis, fraction overlap with max cross-entropy words; x-axis, number of constraints, 0-20; curves for the top 1200 words (20%), top 750 words (11%), and top 150 words (2.2%) of the vocabulary.]

FIGURE 2.3: Fraction overlap of the top n weighted terms with top n terms ranked by information gain on fully-supervised data. As the number of constraints increases, there is increasing correlation with terms that strongly affect class conditional probabilities. Note that this overlap is achieved with far fewer constraints than the number of labels in the fully-supervised data.

2.4.2 Learning Term Weightings

Adjusting $\gamma_j$ warps the metric by adjusting the resolving power of term $t_j$, essentially identifying which terms are most useful for distinguishing documents. If $\gamma_j$ is large, small disparities in the frequency of $t_j$ become important and will tend to separate documents; if $\gamma_j$ is small, large disparities in frequency will be ignored.

Empirically, this behavior is borne out in the Reuters experiments. Terms that subjectively appear highly relevant for distinguishing topics, such as Iraq, economy, weapons, and council, are given large weightings. We computed the information gain of $t_j$ using all document labels [18], and compared it with $\gamma_j$. Figure 2.3 shows the overlap between the top-weighted $n$% of terms in the vocabulary with the same terms ranked by information gain. After about a dozen constraints, semi-supervised clustering learns term weightings with moderate overlap to the term weightings learned by supervised learning from all 125 document labels.

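The overlap statistic of Figure 2.3 is a straightforward set comparison; a sketch, assuming a per-term information gain has already been computed from the fully labeled data (function and argument names are ours):

```python
import numpy as np

def top_n_overlap(gamma, info_gain, n):
    """Fraction of the top-n terms by learned weight gamma that also
    appear in the top-n terms ranked by supervised information gain."""
    top_gamma = set(np.argsort(gamma)[-n:])
    top_ig = set(np.argsort(info_gain)[-n:])
    return len(top_gamma & top_ig) / n
```
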
2.5 Discussion

This chapter only scratches the surface of semi-supervised clustering with user feedback. There are still many issues to be addressed; we touch on a few of these next.

2.5.1 Constraints vs. Labels

When applying supervised learning to classification problems, it is assumed that the users know the target classes and have labeled examples from each target class. In many interesting problems, this is an unrealistic assumption. A semi-supervised system allows users to give label-like information to the learner without having to know labels. Although user feedback in semi-supervised clustering serves a similar role as class labels serve in supervised learning, comparing supervised learning with semi-supervised clustering is an apples-to-oranges comparison. Semi-supervised clustering usually will be applied to problems where labels are not readily available. However, evaluating clustering systems is difficult and usually subjective. We compare the performance of semi-supervised clustering to supervised learning using a labeled data set principally to avoid this subjectivity.

The performance disparity between supervised and semi-supervised clustering is surprising. While we have argued that it is easier to provide constraints than labels, constraints also provide less information than labels. Constraints don't require the user to know the correct label (or even what labels exist!), only the relationship among pairs or sets of labels. There are only 125 possible labels in the small Reuters data set, but thousands of possible separation constraints. Yet empirically, even with very few constraints, the semi-supervised learner is able to perform surprisingly well.

One explanation is in the connection to active learning. As a means of user feedback, the addition of a constraint indicates a problem and effectively acts as a counterexample for the present clustering. Counterexamples are a powerful tool for doing active learning, which, in some situations, is much more efficient than learning from randomly labeled examples [1]. As such, the user, by iteratively directing the clusterer's attention toward points that are incorrectly clustered, gives a semi-supervised clustering system the many advantages of an active learning system.

2.5.2 Types of User Feedback

As we have discussed, there are many different types of feedback that users might provide to a semi-supervised clustering system. One type of feedback is the constraints on individual data points and clusters we used earlier. But many other forms of feedback might prove useful as well. For example, a user might tell the system that the current clustering is too coarse or too fine. Or the user might point to a cluster and indicate that the cluster is bad without saying how it is bad. Similarly, a user might indicate that a cluster is good, suggesting that future re-clusterings of the data should attempt to maintain this cluster. Users might also give feedback that is not cluster specific, such as telling the system that the entire clustering looks bad and that the next clustering should be very different.

Some types of user feedback may require adaptive clustering that cannot be easily handled by the $\gamma$ weighting scheme we used above. For example, we considered an approach to finding good, but qualitatively different, clusterings of the same data by exploiting EM's weakness for getting trapped in local minima. Different local minima may capture qualitatively different ways of clustering the data, one of which may better match the user's internal preference function than the deepest minimum the system can find. In the long run we hope to develop a general framework for representing user feedback about clusters.

2.5.3 Other Applications

We believe there are many applications of feedback-driven semi-supervised clustering. Imagine a Yahoo! hierarchy for web pages that allows the user to tailor the hierarchy to better match their own interests by providing feedback while browsing. Similarly, consider an automatic e-mail system in which a user allows the system to cluster e-mail into related mailboxes instead of manually specifying the mailboxes. Semi-supervised feedback would allow the user to tailor mailbox clusters to fit their (possibly changing) needs. As a different example, consider a user clustering proteins into homology groups (groups of proteins with similar structures). Large proteins have complex structures and could be clustered many different ways. A feedback-driven semi-supervised clustering system would allow the user to explore many different ways the proteins might be clustered and to find clusterings most suitable to their purposes.

2.5.4 Related Work

The core operation of semi-supervised clustering involves learning a distance metric, on which a great deal of work has been done for classification problems (see Hastie and Tibshirani [13] for an overview); more recently, researchers have begun applying these techniques to clustering and other forms of machine learning (see, e.g., Xing et al. [20]).

As indicated earlier, our model is most similar to the work of Demiriz et al. They report how a fixed set of labeled examples may be used to bias a clustering algorithm; we investigate how a user, interacting with the system, may efficiently guide the learner to a desired clustering.

In the time since this work was first presented, there has been a great deal
of research in improving clusterings by the (semi-supervised) learning of a distance measure. Instead of attempting a complete list of references here, we refer the reader to the references in Chapter 1 and to the other, more recent contributions in this volume.

Our technique of incorporating user feedback is a cousin to relevance feedback, a technique for information retrieval [2]. Given a query and initial set of retrieved documents, relevance feedback asks the user to tag documents as being more or less relevant to the query being pursued. As the process is iterated, the retrieval system builds an increasingly accurate model of what the user is searching for.

The question of how a user (or teacher) may best select examples to help a learner identify a target concept is the focus of much work in computational learning theory. See Goldman and Kearns [12] for a detailed treatment of the problem.

The Scatter/Gather algorithm [6] is an interactive clustering algorithm designed f
