`
Lazy Snapping

Yin Li∗†   Jian Sun‡   Chi-Keung Tang†   Heung-Yeung Shum‡

†Hong Kong University of Science and Technology   ‡Microsoft Research Asia

∗This research was done when Yin Li was with Microsoft Research Asia as an intern.
(a) Input image. (b) Object marking. (c) Boundary editing. (d) Output composition.
Figure 1: Lazy Snapping is an interactive image cutout system, consisting of two steps: a quick object marking step and a simple boundary editing step. In (b), only two (yellow) lines are drawn to indicate the foreground, and another (blue) line to indicate the background; all these lines are far from the true object boundary. In (c), an accurate boundary is obtained by simply clicking and dragging a few polygon vertices in the zoomed-in view. In (d), the cutout is composited onto another Van Gogh painting.
`
`Abstract
`
In this paper, we present Lazy Snapping, an interactive image cutout tool. Lazy Snapping separates coarse and fine scale processing, making object specification and detailed adjustment easy. Moreover, Lazy Snapping provides instant visual feedback, snapping the cutout contour to the true object boundary efficiently despite the presence of ambiguous or low contrast edges. Instant feedback is made possible by a novel image segmentation algorithm which combines graph cut with pre-computed over-segmentation. A set of intuitive user interface (UI) tools is designed and implemented to provide flexible control and editing for users. Usability studies indicate that Lazy Snapping provides a better user experience and produces better segmentation results than the state-of-the-art interactive image cutout tool, Magnetic Lasso in Adobe Photoshop.
`
Keywords: User Interface, Image Cutout, Interactive Image Segmentation, Graph Cut
`
`1 Introduction
`
“Image cutout” is the technique of removing an object in a picture or photograph from its background. The cutout result is typically composited onto a different background to create a new scene. Image cutout has been around for many years and is popular in film, television, publication, and photography. The task is simple enough to explain that even young children make cutouts from magazines or picture
`∗
`This research was done when Yin Li was with Microsoft Research Asia
`as an intern.
`Permission to make digital or hard copies of part or all of this work for personal or
`classroom use is granted without fee provided that copies are not made or distributed for
`profit or direct commercial advantage and that copies show this notice on the first page or
`initial screen of a display along with the full citation. Copyrights for components of this
`work owned by others than ACM must be honored. Abstracting with credit is permitted. To
`copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any
`component of this work in other works requires prior specific permission and/or a fee.
`Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New
`York, NY 10036 USA, fax +1 (212) 869-0481, or permissions@acm.org.
`© 2004 ACM 0730-0301/04/0800-0303 $5.00
`
books. With the advent of digital imaging, it has become possible to specify the foreground and background at the level of individual pixels, providing more accurate results than any scissors could, but no less tediously.

The task in image cutout is specifying which parts of the image are “foreground” (the part you want to cut out) and which belong to the background. While a human finds it quite easy to specify foreground and background to another human by saying something like “cut out the tree from the field of flowers”, the computer is still a long way from the sort of cognitive image understanding required to do this work unassisted. The user is forced to specify each region of foreground individually, with pixel accuracy. The tediousness of this pixel-accurate work, done in support of a cognitively simple task, makes image cutout particularly frustrating for users.

The challenge, therefore, is to come up with a way to specify the foreground that is less tedious than marking every pixel individually, without sacrificing pixel-accurate quality.
`Related Work
For general image cutout, there are two main classes of methods that improve on standard pixel-level selection tools: boundary-based and region-based. Each takes features of the image that the computer can detect (such as color consistency) and uses them to help automate or guide the foreground specification process.

Boundary-based methods cut out the foreground by allowing the user to surround its boundary with an evolving curve. The user traces along the object boundary and the system optimizes the curve in a piecewise manner. Examples include intelligent scissors [Mortensen and Barrett 1995; Mortensen and Barrett 1999], image snapping [Gleicher 1995], and JetStream [Perez and Blake 2001].

While easier than selecting pixels manually with a traditional selection tool, these techniques still demand a large amount of attention from the user. There is never a perfect match between the features used by the algorithms and the foreground image. As a result, the user must control the curve carefully. If a mistake is made, the user has to “back up” the curve and try again. The user is also required to enclose the entire boundary, which can take some time for a complex, high-resolution object. The close control required interferes with the user’s ability to get an overview of their progress: it is difficult to zoom in and out of the image while dragging the pixel-accurate boundary line. Finally, once the boundary is specified, the tool is no longer helpful; any errors must be cleaned up at the end using traditional selection tools (e.g., the Lasso tool with Boolean operations in Photoshop).

Recently, researchers have improved image cutout with region-based methods, e.g., the magic wand in Photoshop, intelligent paint [Reese and Barrett 2002; Barrett and Cheney 2002], marker drawing [Falcao et al. 2000], sketch-based interaction [Tan and Ahuja 2001], interactive graph cut image segmentation [Boykov and Jolly 2001], GrabCut [Rother et al. 2004], and interactive image Photomontage [Agarwala et al. 2004]. Region-based methods work by allowing the user to give loose hints as to which parts of the image are foreground or background, without enclosing regions or being pixel accurate. These hints usually take the form of clicking or dragging on foreground or background elements, and are thus quick and easy to give. An underlying optimization algorithm extracts the actual object boundary from the user’s hints.

Region-based methods allow the user to operate at whatever scale they want, and they show partial results: after each hint, the foreground/background specification becomes more accurate. The problem with region-based techniques is that the features used by the region detection algorithms often do not match up with the desired foreground or background. Areas in shadow, low-contrast edges, and other ambiguous areas can be extremely tedious to hint; sometimes they cannot be hinted at all and must be specified explicitly by hand.

Clearly, there is still a need for a user interface that combines the quick hinting of region-based approaches with a simple affordance for pixel-accurate boundary editing.

Our approach

We propose Lazy Snapping, a novel coarse-to-fine UI design for image cutout. As shown in Figure 1, Lazy Snapping consists of two steps: a quick object marking step (b) and a simple boundary editing step (c). The first step, object marking, works at a coarse scale, specifying the object of interest by a few marking lines (Section 2). The second step, boundary editing, works at a finer scale on the zoomed-in image, allowing the user to edit the object boundary by simply clicking and dragging polygon vertices (Section 3).

Our system inherits the advantages of region-based and boundary-based methods across its two steps: the first step is intuitive and quick for object context specification, while the second step is easy and efficient for accurate boundary control.

Inspired by [Boykov and Jolly 2001], we formulate image cutout as a graph cut problem in both steps. At the object marking step, we propose an efficient graph cut algorithm that employs pre-computed over-segmentation so that the marking UI can provide instant visual feedback. At the boundary editing step, we introduce a simple polygon editing UI and use the polygon locations as soft constraints to improve snapping results around ambiguous or low contrast edges.

We have conducted usability studies (Section 4) to compare Lazy Snapping with the state-of-the-art interactive cutout tool, Magnetic Lasso in Photoshop, which is perhaps the best implementation of intelligent scissors. They show that Lazy Snapping outperforms it in ease of use, efficiency, and quality of results. We have also experimented with our system on many natural images.
`2 Object Marking
`
`In the object marking step, the major task is to allow the user to
`conceptually group the foreground object against its background.
`Instead of tracing the object boundary, our system allows users to
`use lines and curves to specify the extent of the object of interest.
`
`2.1 UI Design
`
To specify an object, the user marks a few lines on the image by dragging the mouse cursor while holding a button (the left button indicating the foreground, and the right button the background). A yellow or blue line is displayed for a foreground or background marker respectively. This high level, painting-type UI does not require very precise user input; as shown in Figure 1(b), most marking lines are in fact far from the object boundary. Similar marking UIs for separating an object from its background are also presented in [Falcao et al. 2000; Boykov and Jolly 2001; Fails and Olsen 2003] for image segmentation or gesture tracking in camera-based interaction.

The segmentation process is triggered once the user releases the mouse button after each marking line is drawn. The user then inspects the segmentation result on screen and decides whether more lines need to be marked. It is therefore critical that our system generates the cutout boundary with very little delay. Our system adopts a novel interactive graph cut algorithm to optimize the object boundary, maximizing both the color similarity inside the object and the gradient along the boundary.
`
`2.2 Graph Cut Image Segmentation
`
An image cutout problem can be posed as a binary labelling problem. Suppose the image is a graph G = ⟨V, E⟩, where V is the set of all nodes and E is the set of all arcs connecting adjacent nodes. Usually, the nodes are pixels of the image and the arcs encode adjacency, with four or eight connections between neighboring pixels. The labelling problem is to assign a unique label xi to each node i ∈ V, i.e. xi ∈ {foreground (= 1), background (= 0)}. The solution X = {xi} can be obtained by minimizing a Gibbs energy E(X) [Geman and Geman 1984]:

E(X) = Σ_{i∈V} E1(xi) + λ · Σ_{(i,j)∈E} E2(xi, xj)    (1)

where E1(xi) is the likelihood energy, encoding the cost when the label of node i is xi, and E2(xi, xj) is the prior energy, denoting the cost when the labels of adjacent nodes i and j are xi and xj respectively.
In this paper, we concentrate on how to define the energy terms E1 and E2 according to user input. We refer readers to [Boykov and Jolly 2001] for a detailed formulation of energy minimization as a graph cut problem and how to solve it. The graph cut algorithm has also been used in the computer graphics community, for example in Graph Cut Textures [Kwatra et al. 2003], GrabCut [Rother et al. 2004], and Photomontage [Agarwala et al. 2004].
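To make the mapping from Equation (1) to a min-cut concrete, here is a minimal sketch assuming the PyMaxflow library; the callables e1 and e2 are hypothetical stand-ins for the energy terms defined next, not the paper's implementation.

```python
# Sketch: minimizing E(X) = sum_i E1(x_i) + lambda * sum_(i,j) E2(x_i, x_j)
# with a single s-t min-cut, assuming the PyMaxflow library (pip install PyMaxflow).
import maxflow

def minimize_energy(num_nodes, arcs, e1, e2, lam=1.0):
    g = maxflow.Graph[float]()
    nodes = g.add_nodes(num_nodes)
    for i in range(num_nodes):
        # t-links: a node pays e1(i, 0) if it ends up background (sink side)
        # and e1(i, 1) if it ends up foreground (source side)
        g.add_tedge(nodes[i], e1(i, 0), e1(i, 1))
    for i, j in arcs:
        w = lam * e2(i, j)            # n-link, paid only when x_i != x_j
        g.add_edge(nodes[i], nodes[j], w, w)
    g.maxflow()
    # get_segment returns 0 for the source side, which we treat as foreground
    return [1 - g.get_segment(nodes[i]) for i in range(num_nodes)]
```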
`Once the user marks the image, two sets of pixels intersecting with
`the foreground and background markers are defined as foreground
`seeds F and background seeds B respectively, as shown in Figure 2.
Likelihood energy. In Equation (1), E1 encodes the color similarity of a node, indicating whether it belongs to the foreground or the background. To compute E1, the colors in the seeds F and B are first clustered by the K-means method [Duda et al. 2000]; in our experiments, K-means is initialized with 64 clusters. The mean colors of the foreground and background clusters are denoted as {K_F^n} and {K_B^m} respectively. Then, for each node i, we compute the minimum distance from its color C(i) to the foreground clusters as
`
`
d_F^i = min_n ‖C(i) − K_F^n‖, and similarly the minimum distance to the background clusters as d_B^i = min_m ‖C(i) − K_B^m‖. Therefore, E1(xi) is defined as follows:

E1(xi = 1) = 0,  E1(xi = 0) = ∞,  ∀i ∈ F
E1(xi = 1) = ∞,  E1(xi = 0) = 0,  ∀i ∈ B
E1(xi = 1) = d_F^i / (d_F^i + d_B^i),  E1(xi = 0) = d_B^i / (d_F^i + d_B^i),  ∀i ∈ U    (2)

Here, U = V \ {F ∪ B} is the uncertain region (Figure 2). The first two equations guarantee that the nodes in F or B always have the label consistent with the user's input. The third equation encourages each remaining node to take the label whose seed colors are most similar to its own.

Figure 2: Graph cut formulation for object marking. (a) Foreground seeds F; (b) background seeds B; (c) uncertain regions U; (d) foreground marker; (e) background marker; (f) graph cut result. The graph cut algorithm is defined on F, B, and U. All these nodes participate in the optimization process and are assigned a unique label, either foreground or background.

Prior energy. We use E2 to represent the energy due to the gradient along the object boundary, defined as a function of the color gradient between two nodes i and j:

E2(xi, xj) = |xi − xj| · g(Cij)    (3)

where g(ξ) = 1/(ξ + 1), and Cij = ‖C(i) − C(j)‖² is the L2-norm of the RGB color difference of the two pixels i and j. Note that |xi − xj| allows us to capture the gradient information only along the segmentation boundary; in other words, E2 is a penalty term paid only when adjacent nodes are assigned different labels. The more similar the colors of the two nodes are, the larger E2 is, and thus the less likely the edge is to lie on the object boundary.

To minimize the energy E(X) in Equation (1), we use the max-flow algorithm of [Boykov and Kolmogorov 2001], which is designed specifically for vision problems. Unfortunately, as shown in the last column of Table 1, a pixel-level graph fails to provide interactive visual feedback for real-life image cutouts.
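For concreteness, the following sketch computes the likelihood term of Equation (2) and the prior of Equation (3), assuming NumPy and scikit-learn's KMeans; the array and function names are illustrative, not from the original implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def likelihood_energy(colors, fg_idx, bg_idx, k=64):
    """colors: (N, 3) node colors; fg_idx/bg_idx: indices of seeds F and B."""
    kf = KMeans(n_clusters=k, n_init=4).fit(colors[fg_idx]).cluster_centers_
    kb = KMeans(n_clusters=k, n_init=4).fit(colors[bg_idx]).cluster_centers_
    # d_F^i, d_B^i: minimum distance from C(i) to the cluster mean colors
    d_f = np.linalg.norm(colors[:, None] - kf[None], axis=2).min(axis=1)
    d_b = np.linalg.norm(colors[:, None] - kb[None], axis=2).min(axis=1)
    e1 = np.empty((len(colors), 2))
    e1[:, 1] = d_f / (d_f + d_b + 1e-12)    # cost of labelling i foreground
    e1[:, 0] = d_b / (d_f + d_b + 1e-12)    # cost of labelling i background
    e1[fg_idx] = [1e9, 0.0]                 # seeds keep their user-given label
    e1[bg_idx] = [0.0, 1e9]                 # (1e9 stands in for infinity)
    return e1

def prior_energy(ci, cj):
    cij = float(np.sum((ci - cj) ** 2))     # RGB color difference C_ij
    return 1.0 / (cij + 1.0)                # g(xi) = 1/(xi + 1), Equation (3)
```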
`
(a) A small region produced by the pre-segmentation. (b) The nodes and edges for the graph cut algorithm with pre-segmentation. (c) The boundary output by the graph cut segmentation.
Figure 3: Our new graph cut algorithm works on a graph whose nodes are small regions from the watershed segmentation.

2.3 Graph Cut with Pre-segmentation

To improve efficiency, we introduce a novel graph cut formulation that is built on a pre-computed image over-segmentation instead of on image pixels. We choose the watershed algorithm [Vincent and Soille 1991], which locates boundaries well and preserves small differences inside each small region.

We again formulate object cutout as a graph cut problem, where the nodes are now the segmented regions from the watershed segmentation. As shown in Figure 3, we use the same notation G = ⟨V, E⟩ for the new graph, while the nodes V are the set of all small regions from the pre-segmentation, and the edges E are the set of all arcs connecting adjacent regions.

The foreground seeds F, the background seeds B, and the uncertain region U are defined as in Section 2.2, except that these nodes are now small regions instead of pixels. The likelihood energy E1 is also similar to Equation (2), while the color C(i) is computed as the mean color of the small region i.

For the prior energy E2 in Equation (3), we compared two definitions of Cij: 1) Cij is the mean color difference between the two regions i and j; 2) the same Cij, but weighted by the length of the shared boundary between regions i and j. In our experiments, the two definitions produced similar results.

Since the watershed segmentation provides a good superset of the object boundaries, this approximation produces reasonable results and improves speed significantly. As shown in Table 1, the number of nodes and edges for the graph cut algorithm is reduced by more than a factor of 10 compared with the pixel-based method in our experiments on real-life images. Most importantly, our new algorithm feeds back the cutout result almost instantly.
`
Image     Dimension      Nodes Ratio   Edges Ratio   Lag with Pre-segmentation   Lag without Pre-segmentation
Boy       (408, 600)     10.7          16.8          0.12s                       0.57s
Ballet    (440, 800)     11.4          18.3          0.21s                       1.39s
Twins     (1024, 768)    20.7          32.5          0.25s                       1.82s
Girl      (768, 1147)    23.8          37.6          0.22s                       2.49s
Grandpa   (1147, 768)    19.3          30.5          0.22s                       3.56s

• The nodes (edges) ratio is the number of pixels (connections between pixels) divided by the number of nodes (edges) after the pre-segmentation.
• The feedback lag is the delay from when the user releases the mouse to when the object boundary is displayed.
• All lags are timed on a laptop PC with a Centrino 1.5GHz CPU and 512MB of memory.

Table 1: Performance comparison of the graph cut segmentation algorithms with and without pre-segmentation on the images shown in Figure 9.

3 Boundary Editing

Although the object marking step preserves the object boundary as accurately as possible, some errors remain, especially around ambiguous and low contrast edges. Therefore, we designed a simple polygon editing UI for the user to refine the object boundary.

3.1 UI Design

The object boundary produced by the previous step is first converted into editable polygons. Each polygon is constructed iteratively: the initial polygon has only one vertex, the point of highest curvature on the boundary. At each step, we compute the distance of each boundary point to the polygon from the previous step, and the farthest point is inserted to generate a new polygon. The iteration stops when the largest distance is less than a pre-defined threshold (typically 3.2 pixels), as sketched below.
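A sketch of this farthest-point insertion, assuming the boundary is an ordered (N, 2) array of points; the function names and the seeding index are illustrative.

```python
import numpy as np

def point_segment_dist(p, a, b):
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def build_polygon(boundary, start, tol=3.2):
    """boundary: (N, 2) points in order; start: index of the
    highest-curvature point, used to seed the polygon."""
    verts = [start]
    while True:
        edges = list(zip(verts, verts[1:] + verts[:1]))   # closed polygon
        d = [min(point_segment_dist(p, boundary[a], boundary[b])
                 for a, b in edges) for p in boundary]
        worst = int(np.argmax(d))
        if d[worst] < tol:          # threshold: typically 3.2 pixels
            return verts
        verts.append(worst)         # insert the farthest boundary point
        verts.sort()                # keep vertices in boundary order
```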
`
`
`
`
Two UI tools are provided for polygon editing:

Direct vertex editing allows the user to drag a vertex to adjust the shape of the polygon. Vertices can be added or deleted as well, and multiple vertices can be grouped and manipulated together.

Overriding brush enables the user to draw a single stroke to replace a segment of the polygon, which is more efficient than dragging many vertices individually. The overriding brush is inspired by the Paintbrush tool in Adobe Illustrator. The user brushes a stroke starting and stopping at two points A and B on the original polygon, which splits the original polygon into two parts; the part with the smaller angle difference to the user stroke is replaced by the stroke to generate a new polygon. The angle between the user stroke and each part of the polygon is measured by their tangent directions at point A and by the direction from A to B.

Once the user releases the mouse button after each polygon editing operation, the system optimizes the object boundary again using the graph cut segmentation algorithm. The optimized boundary automatically snaps to the object boundary even though the polygon vertices may not lie on it. Compared with a simple polygon boundary, where the user would need to modify a great many vertices, our UI uses far fewer polygon vertices to describe the object shape.
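One possible reading of the overriding-brush rule as code; snapping the stroke endpoints to the nearest polygon vertices and the exact tangent comparison below are our illustrative assumptions, not the paper's published procedure.

```python
import numpy as np

def direction(p, q):
    v = np.asarray(q, float) - np.asarray(p, float)
    return v / (np.linalg.norm(v) + 1e-12)

def override_brush(polygon, stroke):
    """polygon: (N, 2) vertices; stroke: (M, 2) points from A to B."""
    # snap the stroke's endpoints onto the nearest polygon vertices A and B
    a = int(np.argmin(np.linalg.norm(polygon - stroke[0], axis=1)))
    b = int(np.argmin(np.linalg.norm(polygon - stroke[-1], axis=1)))
    lo, hi = min(a, b), max(a, b)
    # the polygon splits at A and B into two chains
    chains = [np.arange(lo, hi + 1),
              np.concatenate([np.arange(hi, len(polygon)), np.arange(lo + 1)])]
    def mismatch(idx):              # tangent angle difference at A
        pts = polygon[idx] if idx[0] == a else polygon[idx[::-1]]
        cos = np.dot(direction(stroke[0], stroke[1]), direction(pts[0], pts[1]))
        return float(np.arccos(np.clip(cos, -1.0, 1.0)))
    drop, keep = sorted(chains, key=mismatch)   # replace the closer chain
    pts = polygon[keep] if keep[0] == b else polygon[keep[::-1]]
    return np.vstack([stroke, pts[1:-1]])       # stroke takes the chain's place
```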
3.2 Boundary Editing using Graph Cut

Again, we formulate boundary editing as a pixel-based graph cut problem, now in a small band around the polygon. The band is 7 pixels wide by default. Figure 4 shows the foreground seeds F, background seeds B, and uncertain region U. Given the editable polygon, U is a band computed by dilating the polygon, whereas F and B are defined as the inner and outer boundaries of U respectively.
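A sketch of how such a band might be derived from the polygon, assuming SciPy and scikit-image; a half-width of 3 pixels yields roughly the 7-pixel band mentioned above, and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion
from skimage.draw import polygon2mask

def band_seeds(shape, polygon, half_width=3):
    """shape: (H, W); polygon: (N, 2) vertices in (row, col) order."""
    inside = polygon2mask(shape, polygon)       # True inside the polygon
    grown = binary_dilation(inside, iterations=half_width)
    shrunk = binary_erosion(inside, iterations=half_width)
    u = grown & ~shrunk                         # the uncertain band U
    f = shrunk & ~binary_erosion(shrunk)        # inner boundary of U -> seeds F
    b = binary_dilation(grown) & ~grown         # outer boundary of U -> seeds B
    return f, b, u
```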
`
Figure 4: Graph cut formulation for boundary editing. (a) Foreground seeds F; (b) background seeds B; (c) uncertain regions U; (d) pixels ignored by graph cut; (e) polygon vertices and lines. Only pixels in F, B, or U are considered in the optimization. The polygon location is encoded as an energy term that guides the optimization to snap to user inputs.

The likelihood energy E1 is defined as in Equation (2) of the object marking step, but the prior energy E2 is defined differently. In addition to the gradient term, E2 uses the polygon location as a soft constraint, in order to deal with ambiguous and low contrast boundaries:

E2(xi, xj) = |xi − xj| · g((1 − β) · Cij + β · η · g(D²ij))    (4)

where g(·) is the same as in Equation (3), Dij is the distance from the center of arc (i, j) to the polygon, and η is a scale that unifies the units of the two terms (a typical value is 10).

In Equation (4), β ∈ [0, 1] controls the influence of Dij. A typical value of β is 0.5, which works well in most of our experiments, although we allow expert users to adjust this parameter for better performance. Note that β = 1 makes the graph cut segmentation output a result snapped onto the polygon, regardless of the image gradient. When the color gradient Cij is small, g(D²ij) dominates E2, which encourages the result to snap close to the polygon location. This is shown in Figure 5, where low contrast edges are very difficult to snap without the polygon soft constraint; as Figure 5(a) shows, it is also a difficult example for region-based methods (e.g., in the object marking step). If there are two edges of comparable strength, the polygon location can also help the user select the desired one, as shown in Figure 6(b); otherwise, the segmentation result may not be fully controlled by the polygon, as shown in Figure 6(c).

Hard vertex constraint: The user may prefer to manually specify a polygon vertex as a “hard” constraint, so that the system ensures the graph cut segmentation result passes through that vertex. For a hard-constrained vertex, the uncertain region U is automatically split into two parts along the vertex's bisector, and the two “split” lines are added to the foreground seeds F and background seeds B respectively. The graph cut segmentation must then output a result passing through the vertex, because it is the only connection between foreground and background at that location.
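A sketch of Equation (4)'s soft-constraint prior, reusing the point_segment_dist helper from the polygon sketch above; the defaults follow the typical β and η values in the text, and the names are illustrative.

```python
import numpy as np

def g(xi):                                       # same g as in Equation (3)
    return 1.0 / (xi + 1.0)

def soft_prior(ci, cj, pi, pj, polygon, beta=0.5, eta=10.0):
    """ci, cj: pixel colors; pi, pj: pixel coordinates; polygon: (N, 2)."""
    cij = float(np.sum((np.asarray(ci, float) - np.asarray(cj, float)) ** 2))
    mid = (np.asarray(pi, float) + np.asarray(pj, float)) / 2.0  # arc center
    d = min(point_segment_dist(mid, polygon[k], polygon[(k + 1) % len(polygon)])
            for k in range(len(polygon)))        # distance D_ij to the polygon
    return g((1.0 - beta) * cij + beta * eta * g(d * d))
```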
`
Figure 5: The polygon soft constraint can override edge locations in low contrast regions. (a) Original image: the object marking step produces a bad boundary, and the polygon overriding brush (thick orange line) replaces a segment of the polygon. (b) With the polygon as a soft constraint, the result (dotted line) is very close to the polygon (solid line). (c) Without it, the optimization is vulnerable to noise due to weak edges.

Figure 6: (a) The original image; (b) and (c) show the color gradient image of the region marked by the white square in (a). (b) With polygon soft constraints, users can select which strong edge to snap to. (c) Without polygon soft constraints, the same input polygon may produce erroneous edges because of the inherent edge ambiguity.

4 Usability Study

We believe that Lazy Snapping is superior to existing cutout methods in being easier to learn and able to produce results of equal or better quality in less time. To test this, we conducted a usability study that compared the performance of our Lazy Snapping prototype system to Magnetic Lasso, Adobe Photoshop's image cutout tool.

Methodology

Fourteen subjects were selected. Ten were novices with little or no experience with Photoshop or its image cutout tools, while four were Photoshop experts. Each subject was given a five-minute instruction session on the Lazy Snapping software.
`
`306
`
`Page 4 of 6
`
`
`
The novices were also given five minutes of instruction on Photoshop's Magnetic Lasso tool. All users were allowed to experiment with both software packages until they were comfortable that they understood their functions.

The study consisted of two tasks. In the first task, the subjects had to cut out 4 images (A, B, C, and D in Figure 7) as accurately as possible. The subjects were asked to work as quickly as they could without sacrificing accuracy, and for each image they had access to a printed version of the desired cutout. After completing the task with one software package, they repeated it with the same 4 images using the other. The order was alternated between subjects in case there was an ordering effect, with half of the subjects using Photoshop first and the other half Lazy Snapping. For the second task, the subjects were given another 4 images (E, F, G, and H in Figure 7) to cut out, but this time only 60 seconds per image; they were instructed to produce as accurate a cutout as possible in the allotted time. Again, the order was alternated between subjects.

When using Photoshop, users were advised to use Magnetic Lasso as the main tool, but were also allowed to use other Photoshop tools, such as the free lasso and work path editing. Subjects were videotaped and their cutout results were saved for detailed logging and quality analysis.

We evaluate the quality of a cutout by measuring its number of error pixels. To avoid bias, we compute the quality by averaging the number of error pixels against four “ground truth” cutouts, produced by two experts (not selected as subjects) using Lazy Snapping and Photoshop respectively. We also exclude pixels in hairy and furry regions, to avoid the influence of subjective recognition.
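As a sketch, this error-pixel measure might look as follows, assuming boolean NumPy masks; the averaging over four ground truths and the excluded hairy/furry mask follow the description above, while the names are illustrative.

```python
import numpy as np

def error_pixels(result, ground_truths, ignore):
    """result, each ground truth, ignore: (H, W) boolean foreground masks."""
    return float(np.mean([np.sum((result ^ gt) & ~ignore)
                          for gt in ground_truths]))
```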
`
Ease of Use

We tested ease of use by counting the number of errors made by the novice users. Using the videotapes, we counted how often a user chose the incorrect tool or had to invoke the UNDO command: for example, drawing with the background brush in Lazy Snapping when the foreground was intended, or clicking the zoom button on the Navigator tool window while using the magnetic lasso, which produces an unexpected result. We found the error rate of Lazy Snapping to be less than 20% of the rate of Photoshop on the same image. Users also subjectively reported Lazy Snapping to be far easier to use than Photoshop.

Better Quality in Less Time

We tested time improvements by measuring how long it took users to complete the first task. We found that subjects using Lazy Snapping overall took less than 60% of the time they did when using Photoshop (Figure 8(a)(b)). The exact benefit varied widely depending on the subject and the image (the standard deviation is 30%). We also compared the quality of the Photoshop results with that of the Lazy Snapping results. For the second task, Lazy Snapping produced less than 60% of the number of error pixels that Photoshop did (averaged over all 14 subjects and 4 images). In this time-restricted task, Lazy Snapping was a clear winner. Most subjects were able to complete the entire task in the 60 seconds allotted (86% less time than with Photoshop), and for those who were not, Lazy Snapping produced satisfactory intermediate results (less than 53% of Photoshop's error pixels). Photoshop's Magnetic Lasso does not produce intermediate results, so users who ran out of time were left with large errors (see Figure 8(d)).

Figure 7: Images used in the usability study. The four images in the first row (A, B, C, and D) are for the first task, and the other four (E, F, G, and H) are for the second task.

Figure 8: Comparison with Photoshop (bar charts and scatter plots; the axes are time, normalized to Photoshop = 100%, and the number of error pixels). (a) and (b) illustrate the average time of the cutout process across the fourteen subjects and the eight images respectively; we normalize the time by that of Photoshop for each column, so that all data can be compared together. The number of error pixels versus time is shown in (c) for the first task and in (d) for the second; we normalize time and quality by the mean of all samples of each image over all subjects, so that data from different images align around 100% for comparison. Lazy Snapping is clustered at the lower left corner, indicating better quality in less time.

Subject Feedback

Overall, subjects preferred Lazy Snapping to the tools in Photoshop, reporting it to be “much easier” and “almost magic”. One expert user expressed a concern that the ease of working with Lazy Snapping might encourage him to be lazy himself and perform less accurately than with the more tedious traditional tools. Other users made suggestions for combining the Lazy Snapping tools with existing tools like the lasso and magic wand. Several users expressed some dissatisfaction with the two-step design and wondered whether we could make it easier to go back and forth between the steps. We are considering these and other suggestions for improving the user experience.
`
5 Experiments and Summary

Figure 9 shows more examples produced by Lazy Snapping. The number of object markers and the number of polygon editing operations are listed for each image.
`
`
`
(a) Girl (4/2/12)   (b) Ballet (4/7/14)   (c) Boy (6/2/13)   (d) Grandpa (4/2/11)   (e) Twins (4/4/12)
Figure 9: More experiments. The numbers in brackets denote the number of foreground markers, the number of background markers, and the number of polygon vertex adjustments, respectively. Each pair of images shows the marking lines for the first step and the final result. The polygons of the boundary editing step are not shown here; please refer to the accompanying video to view the polygon editing process.

We use coherent matting [Shum et al. 2004], an extension of Bayesian matting [Chuang et al. 2001] with an alpha prior, to compute the opacity around the object boundary before compositing the cutout object onto a new background. The uncertain region for matting is computed by dilating the object boundary, usually by 4 pixels on each side in most of our experiments.
In this paper, we have developed an interactive image cutout system that is easy to learn and produces better quality cutouts in less time than existing image cutout tools. Our system explicitly separates two tasks, object context specification and boundary refinement, and we have designed a user interface for each: a marking UI that quickly specifies the object, and a polygon editing UI that allows simple and fast boundary adjustment. Our usability study shows that our system is easy to learn and produces high quality cutouts. The UI can also be easily and naturally extended to pen-computing devices.

Our current system does not handle thin, branching structures well; we are working on this. We are also trying to combine the object marking and boundary editing steps seamlessly, without explicit switching between them. Moreover, we plan to extend our work to video segmentation.
Acknowledgements:
We would like to thank the anonymous reviewers for their constructive critiques. Many thanks to Dave Vronay for his help with the usability study, and to Steve Lin for his professional help with video production and proofreading. Chi-Keung Tang's research is supported in part by the Research Grants Council of the Hong Kong Special Administrative Region, China: HKUST6193/02E and AOE/E-01/99.
`References
`
AGARWALA, A., DONTCHEVA, M., AGRAWALA, M., DRUCKER, S., COLBURN, A., CURLESS, B., SALESIN, D., AND COHEN, M. 2004. Interactive digital photomontage. In Proceedings of ACM SIGGRAPH 2004.
BARRETT, W. A., AND CHENEY, A. S. 2002. Object-based image editing. In Proceedings of ACM SIGGRAPH 2002.
BOYKOV, Y., AND JOLLY, M. P. 2001. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In Proceedings of ICCV 2001.
BOYKOV, Y., AND KOLMOGOROV, V. 2001. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. In Energy Minimization Methods in Computer Vision and Pattern Recognition, 2001.
CHUANG, Y.-Y., CURLESS, B., SALESIN, D. H., AND SZELISKI, R. 2001. A Bayesian approach to digital matting. In Proceedings of CVPR 2001.