`
Focal Stack Compositing for Depth of Field Control

David E. Jacobs        Jongmin Baek        Marc Levoy
Stanford University∗
`
[Figure 1 panels: (a) single focal stack slice; (b) reduced depth of field composite; (c) extended depth of field composite; (d) scene depth map (dark means close); (e) defocus maps for (b) and (c) (orange means blurry)]
`
`Figure 1: Manipulating depth of field using a focal stack. (a) A single slice from a focal stack of 32 photographs, captured with a Canon 7D
`and a 28mm lens at f/4.5. The slice shown is focused 64cm away. (b) A simulated f/2.0 composite, focused at the same depth. To simulate
`the additional blur, objects closer to the camera are rendered from a slice focused afar, and objects far from the camera are rendered from
`a slice focused near. (c) An extended depth of field composite that blurs the foreground flower and is sharp for all depths beyond it. (d) A
depth map for the scene, representing depth as image intensity (dark means close). (e) A pair of defocus maps that encapsulate the requested
amount of per-pixel defocus blur used to generate the composites above; the magnitude of the blur is encoded with saturation (orange means blurry).
`
`Abstract
`
`Many cameras provide insufficient control over depth of field.
`Some have a fixed aperture; others have a variable aperture that
`is either too small or too large to produce the desired amount of
`blur. To overcome this limitation, one can capture a focal stack,
`which is a collection of images each focused at a different depth,
`then combine these slices to form a single composite that exhibits
`the desired depth of field. In this paper, we present a theory of focal
`stack compositing, and algorithms for computing images with ex-
`tended depth of field, shallower depth of field than the lens aperture
`naturally provides, or even freeform (non-physical) depth of field.
`We show that while these composites are subject to halo artifacts,
`there is a principled methodology for avoiding these artifacts—by
`feathering a slice selection map according to certain rules before
`computing the composite image.
`
CR Categories: I.4.3 [Image Processing and Computer Vision]: Enhancement—Geometric correction; I.3.3 [Computer Graphics]: Picture/Image Generation—Display algorithms
`
`Keywords: Focal stack, compositing, depth of field, halo correc-
`tion, geometric optics
`
`∗e-mail: {dejacobs, jbaek, levoy}@cs.stanford.edu
`
1 Introduction
`Depth of field is one of the principal artistic tools available to
`photographers. Decisions about which scene elements are imaged
`sharply and which are out of focus direct a viewer’s attention and
`affect the mood of the photograph. For traditional cameras, such
`decisions are made by controlling the lens’ aperture and focus dis-
`tance. Unfortunately, many consumer cameras—including mobile
`phone cameras and compact point-and-shoot cameras—have lim-
`ited or no control over the aperture because of constraints imposed
`by portability and expense. However, nearly all cameras have focus
`controls and are capable of capturing a stack of images focused at
`different distances. This set of images is called a focal stack. As we
will demonstrate in this paper, these images can be combined to sim-
`ulate depth of field effects beyond the range normally allowable by
`the camera’s optics, including depth of field reduction, extension,
`and even freeform non-physical effects. Figure 1 shows examples
`of two of these manipulations.
`
`In focal stack compositing, each pixel of the output is a weighted
`sum of corresponding pixels in the input images—often referred
`to as focal stack “slices.” The choice of pixel weights determines
`the depth of field of the composite. Given a focal stack and user-
`specified novel camera parameters, appropriate blending weights
`can be computed via a compositing pipeline—ours is illustrated in
`Figure 2. The first step in compositing is to generate or otherwise
`
`1
`
`APPL-1007 / Page 1 of 10
`APPLE INC. v. COREPHOTONICS LTD.
`
`
`
`Stanford Computer Graphics Laboratory Technical Report 2012-1
`
`Figure 2: Our compositing pipeline. Given the scene pictured in the upper left, we capture a stack of images {I1, I2, I3} focused at depths
`{Z1, Z2, Z3} with f-number N. This set of images is called a focal stack. We feed these images into a depth extraction algorithm (ours is
`described in Section 4) to generate an estimate for the distance ˆZ between the lens and the object imaged by each pixel. For scene depth and
`focus distance maps (all images labeled Z) in the diagram above, we use intensity to represent depth; white means background, and black
`means foreground. Given the scene depth map ˆZ, we calculate per-pixel the signed defocus blur Cj corresponding to each slice Ij, using
`Equation (1). Above, we visualize the degree of defocus blur (in images labeled |C|) using saturation, where orange means blurry and white
`means sharp. Equation (1) also allows us to compute a requested defocus map C ⋆ given a user-specified focus distance Z ⋆ and f-number
`N ⋆. Some photographic examples of C ⋆ are shown in Figure 1. In the example above, the user is requesting a reduced depth of field
`composite focused at Z ⋆ = Z2 with f-number N ⋆ = N/2. We then compute a preliminary focus distance map ˜Z0 that specifies the depth
`at which each pixel should be focused in order to achieve the requested defocus C ⋆. For example, in order to maximally defocus the distant
`red object at Z1 visible in the top third of the composite, the preliminary focus distance calls for those pixels to be drawn from I3, which is
`focused close to the camera. Indexing into our focal stack as described creates a preliminary composite ˜I0 that is inexpensive to compute,
`but contains halo artifacts visible near depth edges. To prevent such artifacts, we apply geometric constraints (discussed in Section 3.4) on
`˜Z0 to create a smoother focus distance map ˜Z. The resulting composite ˜I is locally artifact-free, but its corresponding defocus map ˜C does
`not match C ⋆ perfectly. Finally, in the bottom right we show a ground truth image for a camera with the requested parameters Z ⋆, N ⋆.
`
`acquire a proxy for the scene geometry—in our case, we use a depth
`map. Some knowledge of the scene geometry is necessary in order
`to estimate the per-pixel defocus blur present in each slice of the fo-
`cal stack. Additionally, scene geometry is required to calculate the
`per-pixel defocus appropriate for the synthetic image taken with a
`user’s requested hypothetical camera. A basic focal stack compos-
`ite, then, is given by selecting or interpolating between the slices
`that match the requested defocus as closely as possible at each pixel.
`
`Certainly, one may produce similar effects without a focal stack—
`using only a single photograph. First, one can reduce the depth
`of field by segmenting the image into layers and convolving each
`layer with a blur kernel of the appropriate size. In practice, how-
`ever, synthetic blur fails to capture subtle details that are naturally
`present in photographic (physically produced) blur. In particular,
`saturated image regions cannot be blurred synthetically because
`their true brightness is unknown. Similarly, scene inter-reflections
`and translucencies can cause a single pixel to have multiple depths;
`therefore, no single convolutional kernel will be correct. Photo-
`graphic blur, by contrast, guarantees a consistent defocus blur re-
`gardless of depth map accuracy. In addition, physical optical effects
`like contrast inversion [Goodman 1996] cannot be correctly mod-
`eled by synthetic blur, but are present in photographic blur. Second,
`one can extend depth of field without a focal stack via deconvolu-
`tion, but this process is ill-posed without significantly modifying
`camera optics or assuming strong priors about the scene.
`
`Finally, the requirement to capture a focal stack is not as onerous
`as it would seem. Cameras that employ contrast-based autofocus-
`ing [Bell 1992] already capture most, if not all, of the required
`imagery, as they sweep the lens through a range. Contrast-based
`autofocusing is employed by nearly all cameras with electronic
`
`viewfinders. The only additional cost is the bandwidth required
`to save the autofocus ramp frames to disk. Additionally, the depth
`map required for our compositing algorithm is easily computed as
`a byproduct of capturing a focal stack.
`
`We present a theory, framework and pipeline for focal stack com-
`positing that produce composites matching a requested depth of
`field. This pipeline is shown in Figure 2, and is described through-
`out Section 3. We will analyze the geometry of such composites,
`discuss how halo artifacts (especially at occlusion edges) can arise,
`and show how the halo artifacts can be mathematically avoided
`by minimal alteration of the desired depth of field. We will then
`demonstrate the versatility of this framework in applications for re-
`ducing depth of field, extending depth of field, and creating free-
`form non-physical composites that are halo-free.
`
`2 Prior Work
`
`Depth of field is a useful visual cue for conveying the scene ge-
`ometry and directing the viewer’s attention. As such, it has been
`well-studied in the rendering literature. When raytracing a synthetic
`scene, one can obtain the desired depth of field by simulating the
`appropriate lens optics and aperture [Cook et al. 1984; Kolb et al.
`1995] or by employing other image-space postprocessing [Barsky
`and Pasztor 2004; Kosloff and Barsky 2007] that nevertheless relies
`on access to the scene model. In traditional photography, however,
`the photographer determines the depth of field via his choice of the
`relevant camera parameters. While modifying the camera can par-
`tially increase the range of possible depth of field [Mohan et al.
`2009] or the bokeh shape [Lanman et al. 2008], the depth of field is
`essentially fixed at the capture time, barring post-processing.
`
`2
`
`APPL-1007 / Page 2 of 10
`APPLE INC. v. COREPHOTONICS LTD.
`
`
`
`Stanford Computer Graphics Laboratory Technical Report 2012-1
`
`Overcoming this limitation requires correctly estimating the
`amount of blur present at each pixel, and then simulating the de-
`sired blur (if different), which may be greater or smaller than the
`pre-existing blur at the pixel location. For instance, defocus mag-
`nification [Bae and Durand 2007] and variable-aperture photogra-
`phy [Hasinoff and Kutulakos 2007] increase the per-pixel blur us-
`ing image-space convolution, thereby simulating a narrower depth
`of field. Reducing the per-pixel blur, on the other hand, requires
`deblurring the image, and can be ill-posed for traditional lens
`bokehs [Levin et al. 2007].
`
`There exists a large body of work in computational optics that com-
`bats the numerical instability of deblurring a traditional photograph
`by capturing a coded 2D image. Many of them employ spatiotem-
`poral coding of the aperture, in order to increase the invertibility of
`the defocus blur [Levin et al. 2007; Zhou and Nayar 2009], and a
`large subset thereof is concerned with equalizing the defocus blur
`across depth, thereby avoiding errors introduced from inaccurate
`depth estimation [Dowski and Cathey 1995; Nagahara et al. 2008;
`Levin et al. 2009]. While the field of deconvolution has advanced
`significantly, deconvolved images tend to have a characteristically
`flat texture and ringing artifacts.
`
`One alternative to capturing a coded 2D image is acquiring a
`redundant representation of the scene composed of many pho-
`tographs. Light fields [Levoy and Hanrahan 1996; Ng 2005] and
`focal stacks [Streibl 1985] are composed of multiple images that
`are either seen through different portions of the aperture, or focused
`at varying depths, respectively. Light fields can be rendered into an
`image by synthetic aperture focusing [Isaksen et al. 2000; Vaish
`et al. 2005]. Prior works in focal stack compositing [Agarwala
`et al. 2004; Hasinoff et al. 2008] simulate a hypothetical camera’s
`depth of field by extracting from each slice the regions matching
`the proper level of blur for a given aperture size and focus dis-
`tance. However, while focal stack compositing is a rather well-
known technique demonstrated to be light efficient, it has yet to be
`analyzed with respect to the geometric implications of a particular
`composite. Specifically, the proper spatial relationships between
`composited pixels necessary for artifact prevention are not yet well-
`studied. As a result, composites produced by these techniques fre-
`quently suffer from visual artifacts.
`
`3 Theory
`
`We now build a theory of focal stack compositing as a tool for ma-
`nipulating depth of field, following the pipeline depicted in Fig-
`ure 2. Our theory differs from prior work in two key ways: 1) it
`is fully general and allows non-physical, artistically driven depth-
`of-field effects, and 2) it explicitly models the interplay between the
`lens optics and scene geometry in order to remediate visual artifacts
`at depth edges.
`
`We begin by assuming a thin-lens model and the paraxial approx-
`imation. For now, let us also assume the existence of a depth map
`ˆZ(~p), where ~p = (px, py) is a pixel’s location on the sensor relative
`to the optical axis. ˆZ(~p) is defined as the axial distance between the
`lens and the object hit by the chief ray passing through ~p when the
`sensor is focused at infinity. We discuss depth map extraction in
`Section 4.
`
`Although the pipeline shown in Figure 2 is written in terms of
`object-space depths Z, it is algebraically simpler to express our
`theory in terms of the conjugate sensor-space distances S. This
`simplification is a consequence of the 3D perspective transform ap-
`plied by the lens as the scene is imaged. Figure 3 illustrates this
`transform. Many geometric relationships in object space become
`arithmetic in sensor space, and thus are less unwieldy to discuss.
`
`Figure 3: A lens performs a perspective transformation. A linear
`change in sensor position (on the right) corresponds to a non-linear
`change in object distance (on the left.) Coupled with this change in
`depth is lateral magnification (up and down in the figure.) Although
`it is conceptually easier to reason about geometric relationships
`in object space, the non-linearity it introduces makes the algebra
`unwieldy. Therefore, we primarily work in sensor space for the
`remainder of the paper.
`
`Figure 4: Defocus blur. The rays emitted by the object at depth
`ˆZ converge at a distance ˆS behind the lens. A sensor placed at a
`distance S, instead of the correct distance ˆS, will not sharply image
`the object. The rays do not converge on the sensor but rather create
`a blur spot of radius C. The aperture radius A determines the rate
`at which C grows as the sensor moves away from ˆS.
`
`The Gaussian lens formula tells us the relationship between a scene
`point and its conjugate. If we apply this to the scene’s depth map,
`ˆZ(~p), we define a map ˆS(~p) = (1/f − 1/ ˆZ(~p))−1, where f is the
`focal length of the lens. ˆS(~p) is constructed such that the pixel at ~p
`will be in sharp focus if the sensor is placed at a distance ˆS(~p).
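To make this concrete, the conjugate map can be computed per-pixel with a few lines of numpy. This is a minimal sketch under our own naming (`conjugate_sensor_distance` is not notation from the paper), with all lengths assumed to be in meters:

```python
import numpy as np

def conjugate_sensor_distance(Z_hat, f):
    """Gaussian lens formula: 1/f = 1/Z + 1/S, so the conjugate
    sensor distance for a depth map Z_hat is (1/f - 1/Z_hat)^-1."""
    Z_hat = np.asarray(Z_hat, dtype=float)
    return 1.0 / (1.0 / f - 1.0 / Z_hat)

# A hypothetical 28mm lens imaging objects 0.64m and 2.0m away:
# the nearer object's conjugate lies farther behind the lens.
S_hat = conjugate_sensor_distance([0.64, 2.0], f=0.028)
```

Note that the conjugate distance approaches f as the scene depth goes to infinity, which is why a sensor at the focal plane images distant objects sharply.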
`
`Working in this conjugate sensor space, we define our framework
`as follows: given a set of images {Ij }j taken at sensor positions
`{Sj}j with f-number N, we want to calculate a composite ˜I that
`approximates a hypothetical camera with a sensor placed at S⋆ and
`f-number N ⋆. Later we will relax this constraint and allow a hy-
`pothetical camera that is non-physical.
`
`3.1 Defocus Blur
`
`Defocus blur is a consequence of geometric optics. As shown in
`Figure 4, if the sensor is placed at a distance S from the lens, a
`blur spot of radius C forms on the sensor. This blur spot is known
`as the circle of confusion, and its shape is referred to as the lens’
bokeh. Using similar triangles, one can show that C = A(1 − S/ ˆS).
Rewriting the aperture radius A in terms of the focal length f and
the f-number N, we obtain

C = (f / 2N)(1 − S/ ˆS).    (1)
`
Note that C, as defined in Equation (1), is a signed quantity. If
C > 0, then the camera is focused behind the object. If C < 0,
`then the camera is focused in front of the object, and the bokeh
`will be inverted. For most lenses, the bokeh is approximately sym-
`metric, so it would be difficult for a human to distinguish between
`defocus blurs of C and −C. Despite this perceptual equivalence,
`
`3
`
`APPL-1007 / Page 3 of 10
`APPLE INC. v. COREPHOTONICS LTD.
`
`
`
`Stanford Computer Graphics Laboratory Technical Report 2012-1
`
`we choose the above definition of C, rather than its absolute value,
`because it maintains a monotonic relationship between scene depth
`and defocus blur when compositing.
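In code, Equation (1) is a one-liner; this sketch (our own naming, with all lengths in meters) preserves the sign convention just described:

```python
def signed_defocus(S, S_hat, f, N):
    """Signed circle-of-confusion radius, Equation (1):
    C = (f / 2N) * (1 - S / S_hat).
    C > 0: the camera is focused behind the object;
    C < 0: the camera is focused in front of the object."""
    return (f / (2.0 * N)) * (1.0 - S / S_hat)
```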
`
`Defocus Maps
`
`Depth of field is typically defined as the range of depths within
`which objects are imaged sharply. Implicit in this definition is, first,
`that sharpness is dependent solely upon the depth, and second, the
`range is a single contiguous interval, outside of which objects will
`be blurry. However, often the desired blurriness is dependent also
`on objects’ locations in the frame (e.g.
`tilt-shift photography, or
`other spatially varying depth-of-field effects). Such non-standard
`criteria can be captured by a more general representation, namely a
`map of desired circle-of-confusion radii across the sensor. We call
`this function C(~p) a defocus map.
`
`A defocus map encapsulates the goal of focal stack compositing.
`For example, if we wish to simulate a camera with the sensor placed
`at a distance S⋆ with f-number N ⋆, the desired defocus map can
`be calculated from ˆS(~p) as
`
C⋆(~p) = (f / 2N⋆)(1 − S⋆/ ˆS(~p)).    (2)

An all-focused image is trivially specified by C⋆(~p) = 0. One can specify arbitrary spatially-varying focus effects by manually painting C⋆, using our stroke-based interface presented in Section 5.3.
`
`3.2 Sensor Distance Maps
`
`Given C ⋆ as a goal, our task now is to find a composite that cor-
`responds to a defocus map as close to C ⋆ as possible. We will
`represent our solution as a function that defines the desired sensor
`position for each pixel. We call such a function a sensor distance
`map. This is a convenient choice as a proxy for focal stack indices
`because it has physical meaning and is independent of the depth
`resolution of our focal stack. It also lends itself well to an alter-
`native interpretation of focal stack compositing as the construction
`of a sensor surface that is conjugate to a (potentially non-planar)
`surface of focus in object space.
`
`If our only concern is matching C ⋆ as closely as possible, finding
`the optimal sensor distance map is straightforward. For any given
`(unsigned) defocus blur radius, two sensor positions will achieve
`the desired blur—one focused in front of the scene object and one
`focused behind. Because we defined the defocus blur radius C ⋆
`to be a signed quantity, however, there is no ambiguity. Accord-
`ingly, we can find the sensor distance map ˜S0(~p) for a preliminary
`composite by inverting the relationship given by Equation (1):
`
˜S0(~p) = ˆS(~p)(1 − 2N C⋆(~p)/f).    (3)

An all-focus image is trivially specified by ˜S0(~p) = ˆS(~p). We call ˜S0(~p) a preliminary sensor distance map because, as we will show in Section 3.4, it may not produce a visually pleasing composite.
`
`3.3 Compositing
`
`In order to build a preliminary composite ˜I0, we must determine
`which pixels from the focal stack best approximate the desired sen-
`sor distance ˜S0. A simple choice would be to quantize ˜S0 to the
`nearest sensor positions available in the focal stack and assign pix-
`els the colors from those slices. The resulting defocus blur ˜C0 for
`such a composite will approximate C ⋆. Figure 5 shows a compari-
`son between ˜C0 and C ⋆ for two depth-of-field manipulations.
`
`4
`
`(a) All-focus composite
`
`(b) Composite simulating a wide aperture
`
`Figure 5: Depth-defocus relationships for focal stack composites.
`(a) An all-focus composite is characterized by |C ⋆| = 0 (shown in
`purple.) If our stack has slices at sensor distances corresponding to
`the red circles, then the composite will assign each pixel the color
`of the nearest stack slice in sensor distance (as segmented by the
`dotted vertical lines), thus creating the depth-defocus relationship
`given by | ˜C0| (shown in orange.) | ˜C0| is farthest from |C ⋆| midway
`between stack slices, so as one might suspect, adding more stack
`slices will improve the all-focus composite. (b) A simulated wide
`aperture composite is characterized by a |C ⋆| that grows quickly
`as it moves away from its conjugate plane of focus (dashed pur-
`ple line.) This can be approximated by “flipping” sensor positions
`about the conjugate plane of focus, such that an object nearby is
`assigned the color of a slice focused far away and vice versa.
`
`However, quantizing ˜S0 as described above can create discontinu-
`ities in the defocus map ˜C0 as seen in Figure 5(b). These discon-
`tinuities manifest themselves as false edges in a composite when
`transitioning between stack slices. We can smooth these transi-
`tions by linearly interpolating between the two closest stack slices
`as an approximation for ˜S0 instead of quantizing. This provides
`a good approximation in most cases. Interpolation should not be
`used when C ⋆ calls for a pixel to be sharper than both of its nearest
`stack slices (i.e. the scene object’s focused sensor distance ˆS(~p) is
`between the two nearest stack slice positions.) In this circumstance,
`blending the two slices will only increase the defocus blur at ~p, so
`it is best to just choose the closer single slice—this case is shown
`in Figure 5(a).
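A per-pixel sketch of this selection rule might look as follows (hypothetical names; `slice_positions` is assumed sorted ascending):

```python
import numpy as np

def composite_pixel(S0, S_hat, slice_positions, slice_colors):
    """Blend the two stack slices nearest the requested sensor
    distance S0, except when the pixel's in-focus distance S_hat
    falls between them; blending would then only add blur, so the
    single nearer slice is used instead."""
    j = np.searchsorted(slice_positions, S0)
    if j == 0:                      # below the stack: clamp
        return slice_colors[0]
    if j == len(slice_positions):   # above the stack: clamp
        return slice_colors[-1]
    lo, hi = slice_positions[j - 1], slice_positions[j]
    if lo < S_hat < hi:             # sharper than both neighbors
        return slice_colors[j - 1] if S0 - lo <= hi - S0 else slice_colors[j]
    t = (S0 - lo) / (hi - lo)       # linear interpolation weight
    return (1 - t) * slice_colors[j - 1] + t * slice_colors[j]
```

Linear interpolation smooths the false edges that hard quantization would create; the in-focus exception corresponds to the case shown in Figure 5(a).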
`
`3.4 Eliminating Color Halos
`
`Assigning each pixel a sensor distance independently of its neigh-
`bors will create a composite whose per-pixel blur ˜C0 matches C ⋆
`as closely as possible. However, our goal is not just to obtain blur
`that matches C ⋆, but to do so without producing visual artifacts.
`Halo artifacts, which we define as color bleeding across depth dis-
`continuities, are common in preliminary composites and are visu-
`ally objectionable, as demonstrated in Figure 6. Therefore, we will
`compute a final sensor distance map ˜S(~p) that generates a halo-free
`composite whose per-pixel blur is close to C ⋆.
`
`Halos, we claim, are the manifestation of the “double-counting” of
`rays, i.e. more than one pixel in the composite integrating a given
`ray. Consider a ray from an object sharply imaged in one pixel.
`If captured again by another pixel, it will necessarily appear as a
`defocused contribution of the same object. Figure 7 illustrates this
`geometry. The result is the characteristic color bleeding of halos.
`
`
`(a) All-focus scene layout
`
`(b) Sharply focused at Z1
`
`(c) Sharply focused at Z2
`
`(d) Adjusting focus to prevent halo
`
`(e) A halo-free surface of focus
`
`(f) Alternative halo-free surfaces of focus
`
`Figure 7: Halo geometry. (a) An example scene with a green and red object at depths Z1 and Z2, respectively. For an all-focus composite,
`the requested surface of focus coincides with Z1, then jumps to Z2 at the occlusion boundary. This surface of focus corresponds to the sensor
`surface shown in purple on the right side of the lens. (b) All light emitted from a point on the foreground object at Z1 (green shaded area)
`converges properly at the sensor. (c) Near the occlusion boundary, only a fraction of the light emitted from the background object at Z2
`(red shaded area) reaches the sensor. The light that is blocked is replaced with a defocused contribution from the foreground object (green
`shaded area.) This contribution appears visually as a green haze over the red background object—similar in appearance to that seen in ˜I0 in
`Figure 2. This haze, next to the otherwise sharp silhouette of the foreground object, is a halo artifact. (d) The closest halo-free alternative is
`to focus on the line passing through the edge of the foreground object and the corner of the lens aperture. Any bundle of rays leaving a point
`on this line will not be occluded by the foreground object. The blue portion of this line gives a halo-free transition in the surface of focus
`between Z1 and Z2. The corresponding sensor surface transition is drawn in blue to the right of the lens. (e) A halo-free surface of focus and
`its corresponding sensor surface. (f) Alternative halo-free surfaces and their conjugates can be found by choosing to defocus the foreground
`instead of the background (the orange transition connecting the two focus distances) or some combination of both (the grey shaded regions).
`The best transition choice is application-specific.
`
should have its gradient bounded as follows:

‖∇ ˜S(~p)‖ ≤ ˜S(~p)/A,    (4)
`
`where A is the aperture radius. Under the paraxial approximation,
`Equation (4) applies to all other pixels ~p. Therefore, acceptable
`sensor surfaces are those that satisfy Equation (4).
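As a quick numerical check of this condition, the following sketch (our own naming; `pixel_pitch` converts pixel indices to sensor-plane distance, all lengths assumed in meters) tests whether a candidate sensor distance map satisfies Equation (4):

```python
import numpy as np

def satisfies_halo_constraint(S, A, pixel_pitch):
    """True iff the gradient magnitude of the sensor distance map S
    never exceeds S/A (Equation (4), paraxial approximation)."""
    gy, gx = np.gradient(S, pixel_pitch)
    return bool(np.all(np.hypot(gx, gy) <= S / A))
```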
`
`Now that we know how to mathematically characterize halo-
`inducing sensor configurations, we may construct a corrected sen-
`sor distance map ˜S that avoids them, by algorithmically enforc-
ing the constraints. Note that naïvely checking the slope of ˜S be-
`tween every pair of pixels will yield an algorithm whose runtime
`is quadratic in the number of pixels. Instead, we observe that for
`each ~p, it is sufficient to check the slope between ~p and its clos-
`est neighbor ~q whose sensor distance is s, for each possible value
`of s. This holds because the constraints arising from checking all
`other pixels are necessarily weaker than those we check. This opti-
`mization reduces the time complexity of the algorithm to be linear
`in the number of pixels, at the cost of introducing a linear depen-
`dence on the depth resolution. However, the set of values occurring
`in the sensor distance map is typically small—on the order of the
`number of slices in the focal stack. Algorithm 1 summarizes the
`implementation of this optimization.
`
`Algorithm 1 iterates over the set of sensor distance values. Each
`iteration, corresponding to a particular sensor distance s, enforces
`all pairwise constraints that involve any pixel whose sensor distance
`is s (the set of such pixels is denoted by Q0.) More precisely, we
`identify pixels that interact with Q0, ordered by increasing distance
`from Q0, by iteratively dilating Q0. We then adjust their sensor
`distances if their interaction with Q0 violates Equation (4).
`
`(a) Haloed composite
`
`(b) Ground truth photograph
`
`Figure 6: Halo artifacts. (a) A preliminary composite with a halo
`artifact (inset). (b) A long-exposure ground truth f/22 image.
`
`Intuitively, observing a single ray twice in a composite should be
`avoided, because it is physically impossible. Real photographs of
`opaque objects never contain halos: once a ray has been captured by
`a pixel, it cannot be detected by another, even with an exotic, non-
`planar sensor surface (which we simulate with our composites.) Fo-
`cal stack composites are not constrained in this manner, because
`pixels at different sensor distances are not necessarily captured si-
`multaneously, leaving open the possibility of double-counting rays.
`
`Geometrically, the double-counting of rays by two distinct pixels
`is equivalent to the two pixels being collinear with a point on the
`aperture. In order to detect this condition, we need to examine each
`pair of pixels, extend a line through them, and test whether this
`line intersects the aperture. If it does, then that pair of pixels will
`constitute a halo. Algebraically, this test is equivalent to asking
`whether the gradient of ˜S is bounded by some maximum rate of
`change. For example, for a pixel ~p located on the optical axis, ˜S(~p)
`
`5
`
`APPL-1007 / Page 5 of 10
`APPLE INC. v. COREPHOTONICS LTD.
`
`
`
`Stanford Computer Graphics Laboratory Technical Report 2012-1
`
Algorithm 1 Constructing ˜S.
  ˜S ← ˜S0
  DilationLimit ← length of image diagonal
  for all sensor distances s do
    Let Q0 be the set of all pixels ~q such that ˜S(~q) = s.
    for r = 1 to DilationLimit do
      Let Qr = dilate(Qr−1, 1 pixel).
      Let ∂Q be the set of newly included pixels in Qr.
      Let (Smin, Smax) = (s − (s/A) r, s + (s/A) r).
      for all pixels ~p in ∂Q do
        Clamp ˜S(~p) to be in [Smin, Smax].
      end for
    end for
  end for
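For intuition, here is a 1-D scanline sketch of this procedure in numpy, a simplification under our own assumptions: Chebyshev dilation in 2-D reduces to absolute pixel distance in 1-D, names and units (meters) are ours, and the sensor distances are processed in a foreground-favored order (largest first):

```python
import numpy as np

def remove_halos_1d(S0, A, pixel_pitch):
    """1-D sketch of Algorithm 1 with foreground-favored ordering.
    Pixels at each sensor distance s, processed largest-first, clamp
    every not-yet-finalized pixel r steps away into the interval
    [s - (s/A)*r*pixel_pitch, s + (s/A)*r*pixel_pitch]."""
    S = np.asarray(S0, dtype=float).copy()
    idx = np.arange(len(S))
    done = np.zeros(len(S), dtype=bool)
    for s in np.unique(S)[::-1]:          # decreasing sensor distance
        mask = (S == s) & ~done
        if not mask.any():
            continue
        # pixel distance to the nearest pixel being finalized at s
        r = np.abs(idx[:, None] - idx[mask][None, :]).min(axis=1) * pixel_pitch
        lo, hi = s - (s / A) * r, s + (s / A) * r
        adj = ~done & ~mask
        S[adj] = np.clip(S[adj], lo[adj], hi[adj])
        done |= mask
    return S

# A step edge: five foreground pixels (larger S) beside five background
# pixels. The background side is feathered into a bounded ramp.
ramp = remove_halos_1d([0.030] * 5 + [0.020] * 5, A=0.005, pixel_pitch=1e-4)
```

Because the foreground pixels are finalized first, the correction defocuses the background near the edge, matching the blue transition of Figure 7(f).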
`
`The algorithm presented above generates a family of halo-free sen-
`sor distance maps. Figure 8 shows composites from one such fam-
`ily. The specific sensor distance map produced depends on the order
`in which the sensor distances are considered in the outer loop of the
`algorithm. Because each iteration resolves any conflict involving
`pixels at a specific sensor distance s, those pixels at distance s will
`be unaffected by future iterations. As such, more important sensor
`distances should be prioritized if possible. It is difficult to deter-
`mine the best order a priori. In fact, it will vary based on the user’s
`intentions. Our implementation uses a foreground-favored ordering
`by default, and would produce the composite shown in Figure 8(a).
`
`The theory presented above relies heavily on knowing where the
`edge of the aperture is, and hence on the thin-lens model and the
`paraxial approximation. Real photographic lenses, on the other
`hand, are complex systems of multiple elements, and as such, may
`deviate from our assumptions. If we had knowledge of the exact
`optical parameters for a lens, we could perform a similar analysis
`to more accurately model the spatial extent of halos. In practice,
`without such knowledge, we conservatively over-estimate the size
`of halo effects to be twice the amount the theory would imply.
`
`3.5 Reducing Blur with a Variable Aperture
`
`The halo elimination algorithm just described removes color bleed-
`ing by sacrificing some accuracy in matching C ⋆. For extended
`depth of field composites, this manifests as blurriness near depth
`edges. If the camera is equipped with a controllable aperture, we
`can further improve on our composite by operating on a focus-
`aperture block, rather than a focal stack. A focus-aperture block
`is a 2D family of photographs with varying focus as well as varying
`aperture radius.
`
`Note that being able to capture narrow-aperture photographs does
`not necessarily obviate the work needed to generate an all-focus
`image, for two reasons: 1) the narrowest aperture may not be small
`enough, and 2) images taken with a small aperture are noisy, as-
`suming a constant exposure duration. Wherever the depth map is
`flat, a properly focused wide-aperture photograph should be just as
`sharp as its narrow-aperture counterpart, and less noisy. However,
`near depth discontinuities, we can trade off noise against blurriness
`by selecting the appropriate f-number for each pixel.
`
`(a) Foreground-favored composite (b) Background-favored composite
`
`Figure 8: Alternative halo-free composites. Recall from Figure 7(f)
`that there exists a family of halo-free surfaces of focus that well ap-
`proximate a given preliminary surface of focus. The specific halo-
`free surface of focus generated is determined by the order in which
`sensor distances are processed in the outer loop of Algorithm 1. (a)
`A composite that prioritizes foreground objects, produced by pro-
`cessing sensor distances in decreasing order. This corresponds to
`the blue depth transition shown in Figure 7(f). (b) A composite that
`prioritizes background objects, produced by processing sensor dis-
`tances in increasing order. This corresponds to the orange depth
`transition shown in Figure 7(f).
`
`Figure 9: Adjusting aperture to prevent halo. If we use the full
`aperture of the lens, the sensor will integrate all the light shown
`in the shaded regions, including a halo-causing contribution from
`the foreground object. While this can be addressed by adjusting
`focus as in Figure 7(d), stopping down the aperture as shown above
`reduces the light reaching the sensor to just the red shaded region,
`effectively blocking the contribution from the foreground object. In
`general, we can eliminate halos without introducing any blur, by
`reducing aperture near occlusion boundaries.
`
`To handle focus-aperture blocks, we must slightly modify Algo-
`rithm 1. Specifically, we are to find not only a halo-free sensor
`distance map ˜S(~p), but also a spatially varying aperture radius map
`˜A(~p) that accompanies it. We solve this problem using a two-pass
`approach. In the first pass, we initialize ˜A(~p) to be the largest aper-
`ture radius available, and execute Algorithm 1. However, instead of
`clamping the sensor distance whenever a halo is encountered, we
`narrow the aperture at the affected pixel location by the appropri-
`ate amount to satisfy Equation (4). It may be that the narrowest
`available aperture is still too large, in which case we settle for this
`value. In the second pass, we execute the algorithm in its original
`form, clamping the sensor distance according to the local constraint
`based on the s