Texture and Depth Enhancement for Motion Pictures and Television

By Christopher A. Mayhew

Various methods of texture and depth enhancement, collectively called Vision III™, have been under development for the past four years. By simultaneously providing both eyes with alternating images that differ in parallax and are time-displaced to coincide with the persistence of the visio-psychological memory rate, the processes relay depth information to the brain in a form that can be translated into a stable three-dimensional image. The processes are suitable for both standard broadcast video and motion-picture productions without equipment modification (other than the camera) and without requiring the viewer to wear special glasses.

The term three-dimensional (3-D) has been expanded over the past several years by the computer-imaging industry to include images produced using depth cues that take advantage of perspective, shading, reflections, and motion. Although these images can be rendered with incredible results, they are nevertheless two-dimensional (2-D). A true 3-D image contains binocular parallax information in addition to other two-dimensional depth cues.

To the general public, 3-D means unreal depth, produced by hyper- or superstereoscopic methods that, by increasing the disparity between cameras to more than the human interocular distance, produce a pseudo 3-D effect beyond what the eyes could naturally perceive. Viewers tend to be disappointed if image depth is not exaggerated. The methods described in this article produce a realistic and natural depth and for that reason are referred to as texture and depth enhancement rather than 3-D.

Human Depth Perception

Nature has given humans binocular vision, endowing them with two eyes that look in the same direction and whose visual fields overlap.1 Each eye views a scene from a slightly different angle. The scene viewed is focused by the eye's lens onto the retina. The retina is a concave surface at the back of the eye, lined with nerve cells, or neurons (Fig. 1). Since the surfaces of the retinae are concave, the two images focused by the eyes are two-dimensional. Both two-dimensional images are transmitted along the optic nerve to the brain's visual cortex (Fig. 2). It is in the visual cortex that the two images are combined, through stereopsis, to form a cyclopean view of the scene being observed by both eyes. The brain is thought of as having a one-eyed or cyclopean view of the world, that is, the ability to determine direction and see in three dimensions, something each eye cannot accomplish alone.

Figure 2. The human brain's visual pathway (bottom view).

There is some question whether this cyclopean view is created by the simultaneous processing of left and right-eye views or is a replacement phenomenon in which a short-term memory buffer is used by the brain to compare (fuse) left and right-eye views.2 The Vision III processes differ from other 3-D systems in that they rely on the short-term memory believed to be used by the human brain in depth perception.2

Not all people with binocular vision can perceive three-dimensional images.3 From 2 to 10% of the population may fail to experience stereopsis in their day-to-day lives. Another 10% may achieve only limited stereopsis. It has been shown that stereoscopic vision differs greatly among individuals and that some practice is involved.

In 1967, Ogle observed that left and right-eye information can be presented alternately to the left and right eyes, resulting in depth perception as long as the time interval does not exceed 100 msec.4 McLaurin et al. state that if visual information is in fact compared in a temporary memory and does not have to be received simultaneously, there is no reason why stereoscopic information that is appropriately sequenced at the proper rate cannot be observed by a single eye.5

In 1984, Jones et al. reported that the human brain can accept and process parallax information without regard to the direction of the parallax.6 Jones further stated that vertical parallax, when sequentially displayed at a rate of 4 to 30 changes/sec, produces a sense of depth that is superior to that produced by horizontal parallax presented in the same manner.
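
The rates and the time limit quoted in this section can be related with simple arithmetic. The Python sketch below converts a change rate into the time each view is displayed. Reading Ogle's 100 msec limit as the time between successive changes is an assumption made only for illustration, and the function names are hypothetical.

```python
def interval_ms(changes_per_sec: float) -> float:
    """Time each view stays on screen, in milliseconds, at a given change rate."""
    if changes_per_sec <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / changes_per_sec


def min_rate_for_interval(max_interval_ms: float) -> float:
    """Slowest change rate (changes/sec) that keeps the interval within a limit."""
    if max_interval_ms <= 0:
        raise ValueError("interval must be positive")
    return 1000.0 / max_interval_ms


if __name__ == "__main__":
    for rate in (4, 10, 30):  # 4 and 30 are the ends of the range reported by Jones
        print(f"{rate:>2} changes/sec -> {interval_ms(rate):.1f} msec per view")
    # Assumption: the 100 msec limit is read as the interval between changes.
    print(f"100 msec limit -> at least {min_rate_for_interval(100):.0f} changes/sec")
```

Under that reading, 4 changes/sec corresponds to 250 msec per view and 30 changes/sec to roughly 33 msec per view.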

3-D Image Display Systems

Although nature saw fit to develop humans anatomically and psychologically capable of viewing the world around them in three dimensions, it did not give them the natural ability for depth perception on 2-D screens, such as those used in motion pictures or television. The depth perceived on a 2-D screen is read into the image by viewers through a learned process based on their cultural and sociological backgrounds.7

In 1980, when Okoshi classified 3-D displays, he created two main categories, binocular (stereoscopic) and autostereoscopic.8 Stereoscopic techniques require the observer to wear a viewing apparatus. This group includes stereoscopes, polarization, anaglyphic, chromo-stereoscopic, Pulfrich, and shuttering technologies. Autostereoscopic displays do not require viewing devices. They include holography, lenticular screens, parallax barriers, and alternating pairs.9

A great deal of effort has been devoted to developing 3-D display hardware. In television and motion-picture displays, the approach has been mostly mechanical, using an apparatus to force each eye to see a different perspective of the image. Examples are polarization, anaglyphic, Pulfrich, and shuttering methods. While the depth effect from these methods can be spectacular, the necessity of glasses makes them awkward to use and can lead to eyestrain or headaches.

There are a few autostereoscopic methods for television and motion pictures, though to date none have been used or demonstrated commercially. The major problem is compatibility with existing television or motion-picture display systems or image

Vision III Background

In 1984, I became interested in autostereoscopic broadcast television and motion-picture production and undertook an intensive research and development effort. A study of prior 3-D systems and their failures in the marketplace led to the conclusion that to gain acceptance in the entertainment industry, any new imaging system has to meet the following criteria:

• It must be compatible with existing standard broadcast television and motion-picture systems.
• It must not require viewing aids (i.e., glasses or special screens).
• It must not produce eye strain or headaches in viewers.
• It must produce a high-quality image that is superior to the best 2-D imaging system.
• It must be cost-effective and no more difficult to use than conventional production techniques.

McLaurin et al. demonstrated that stereoscopic information can be perceived by one eye if presented in the proper manner and that it is possible to present alternating stereoscopic information to both eyes using a standard television screen.5

In 1982, Jones et al. demonstrated a method for creating a 3-D effect by alternately displaying images from a pair of vertically aligned cameras at a rate of 4 to 30 changes/sec.10 The resulting images did have depth when displayed on an ordinary television set, but were unstable and introduced a distracting rocking motion to the picture (Fig. 3).

Jones attempted to control the rocking motion by adding a video mixing device, which intermittently superimposed the second camera's image onto the first camera's image at a rate of 4 to 30 times/sec, rather than alternating images as before.11 This did little to control the rocking motion and resulted in intermittent image softening (Fig. 4).
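
The difference between the two approaches attributed to Jones can be sketched as a per-frame schedule. The Python below is an illustration only, not the patented circuitry: the function names, the 50% mix weight, and the choice to count a superimposition rate the same way as a change rate are all assumptions.

```python
def segment_index(frame: int, frame_rate: int, rate: int) -> int:
    """Index of the switching interval a frame falls in (integer arithmetic)."""
    if frame_rate <= 0 or rate <= 0:
        raise ValueError("frame_rate and rate must be positive")
    return (frame * rate) // frame_rate


def alternating_schedule(n_frames: int, frame_rate: int, changes_per_sec: int):
    """Camera shown on each frame (0 or 1) when the two views simply alternate."""
    return [segment_index(i, frame_rate, changes_per_sec) % 2
            for i in range(n_frames)]


def mixed_schedule(n_frames: int, frame_rate: int, changes_per_sec: int,
                   mix: float = 0.5):
    """Per-frame (first-camera weight, second-camera weight) when the second
    view is intermittently superimposed on the first instead of replacing it.
    The mix weight is an assumed value, not one given in the article."""
    weights = []
    for i in range(n_frames):
        overlay_on = segment_index(i, frame_rate, changes_per_sec) % 2 == 1
        weights.append((1.0 - mix, mix) if overlay_on else (1.0, 0.0))
    return weights


if __name__ == "__main__":
    # 30 frames/sec display, 10 changes/sec: the source changes every 3 frames.
    print(alternating_schedule(12, 30, 10))
    print(mixed_schedule(6, 30, 10))
```

In the alternating schedule every frame comes from exactly one camera. In the mixed schedule the first camera is always present and the second is blended in during alternate intervals, which is consistent with the intermittent softening described above.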

In 1985, I began experimenting with Jones's and other autostereoscopic methods using stop-motion film animation techniques. Animation was chosen as a development technique because it allowed total control of the image frame by frame, it was cost-effective, and I am skilled in the art.

A custom film animation stand for 3-D research and development was designed and manufactured. The stand, named Moby™, allowed the camera to be positioned and manipulated as desired. At the January 1987 SPIE conference in Los Angeles, I presented a paper on the results of some early experiments with depth-enhanced cel animation.12 A live-action video system was designed and manufactured in 1987. This system, called Teeter-Totter™, is capable of producing video images that are depth enhanced.

Over a two-year period, hundreds of film and video test methods were developed that resulted in stable texture and depth-enhanced images.13

Vision III Methods

Vision III relies on discrete motion between the various depth planes in a given scene. This motion is accomplished by two cameras (only one is needed for stop-motion) critically aligned and manipulated throughout the entire filming or taping of a scene. The idea is to keep the discrete motion constantly in balance, so that it is perceived as depth and not as a rocking motion.

Camera placement is achieved through simple geometry in the following manner, using a spot in the exact center of the film or imaging plane of each camera as its point of origin.

Figure 3. (a) First vertical point of origin from a pair of vertically aligned cameras; (b) second vertical point of origin from a pair of vertically aligned cameras.

In a typical scene having foreground and background portions as viewed by a hypothetical observer with binocular vision, the observer's eyes define the first two left and right points of origin (points 24 and 26 in Fig. 5) and can be thought of as being 65 mm apart. The eyes of the observer are directed along the optical axes (28 and 30) and converge at point 20, the subject in the scene being viewed. Line 52 connects the first two points of origin, and is bisected by an optical axis (12) that is perpendicular to line 52 and intersects the optical axes (28 and 30) at their convergence point (20), which is the subject in the scene being viewed.

SMPTE Journal, October 1990

In Fig. 6, a second, vertical line (54) intersects the first line (52) at the point where optical axis 12 bisects the horizontal line (52). The vertical line (54) and optical axis 12 are perpendicular to each other and to the horizontal line (52).

Horizontal line (52) connects the hypothetical observer's left and right points of origin (24 and 26). The vertical line (54) connects the first and second effective points of origin (48A and 48B). Points 48A and 48B represent the center points of both vertically displaced lenses and imaging planes. The horizontal line (52) and the vertical line (54) form an object plane (50) in which points 24, 26, 48A, and 48B are located. Optical axis 12 is perpendicular to the plane (50) and bisects the horizontal line (52) and the vertical line (54).
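
The layout of plane 50 can be written down as coordinates. The Python sketch below is a minimal model under stated assumptions: the origin is placed where optical axis 12 crosses plane 50, x runs along line 52, y along line 54, and z along axis 12. Which of 48A and 48B is the upper point, the 12 mm vertical separation, and all function names are illustrative choices, not details given in the article.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float  # mm along horizontal line 52
    y: float  # mm along vertical line 54
    z: float  # mm along optical axis 12, measured from plane 50


def points_of_origin(interocular_mm: float = 65.0, disparity_mm: float = 12.0):
    """Points 24, 26, 48A, and 48B, all lying in plane 50 (z = 0)."""
    half_h = interocular_mm / 2
    half_v = disparity_mm / 2
    return {
        "24": Point(-half_h, 0.0, 0.0),   # observer's left eye
        "26": Point(half_h, 0.0, 0.0),    # observer's right eye
        "48A": Point(0.0, half_v, 0.0),   # assumed to be the upper camera point
        "48B": Point(0.0, -half_v, 0.0),  # assumed to be the lower camera point
    }


def convergence_angle_deg(separation_mm: float, distance_mm: float) -> float:
    """Angle between two axes that start separation_mm apart in plane 50,
    symmetric about axis 12, and meet on axis 12 at distance_mm."""
    if separation_mm <= 0 or distance_mm <= 0:
        raise ValueError("separation and distance must be positive")
    return math.degrees(2 * math.atan2(separation_mm / 2, distance_mm))


if __name__ == "__main__":
    pts = points_of_origin()
    print(pts)
    # Observer's axes 28 and 30 converging on point 20, taken here to be 2 m away.
    print(f"{convergence_angle_deg(65.0, 2000.0):.3f} degrees")
```

Because both pairs of points are symmetric about the origin, axis 12 bisects line 52 and line 54 by construction, which matches the description above.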

Figure 4. (a) First point of origin after adding a video mixing device; (b) second point of origin mixed with first point of origin.

Figure 5. Observer's optical axes converged on subject. Observer's eyes are located at points 24 and 26.

Figure 6. Observing plane 50.

Figure 7. Observing plane 50, converging plane 56, and subject plane 55.

The purpose of the observing steps is to define plane 50 with regard to point 20, the subject in the scene being viewed. As plane 50 and its perpendicular optical axis 12 move through or around the scene being viewed, or an object in the scene moves with regard to plane 50, points 48A and 48B move equidistantly on either side of the horizontal line (52), along the vertical line (54) toward or away from each other. Figure 7 shows how the distance that points 48A and 48B move equidistantly toward or away from each other depends on the distance between planes 50 and 56. Plane 56 contains point 58, where optical axes 40A and 40B, which originate at points 48A and 48B, converge and intersect optical axis 12 out in front of plane 50. Planes 50, 56, and 55 are parallel to each other and perpendicular to optical axis 12. Plane 56 contains point 58, which is defined by the closest object to plane 50 in the scene being viewed. Plane 55 is always behind plane 56 and contains point 20, the main object in the scene being viewed and generally the point of

Disparity is the distance between the center points on the cameras' imaging planes. The disparity can be anywhere from 1 to 25 mm or greater, determined by the distance from the cameras' imaging planes to the convergence point. Convergence is always set on the closest object to the camera and changes if necessary to maintain that relationship. Time-displacement rate is the number of changes per second between cameras. This rate can be from 1 to 60 in video or 1 to 24 in film. A typical outdoor scene may use the following settings:

Convergence = 11 m
Disparity = 12 mm
Time-displacement rate = 8 changes/sec
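
To give a feel for the magnitudes involved, the Python sketch below works through the typical settings, read here as 11 m, 12 mm, and 8 changes/sec. It assumes that disparity is the separation between the two points of origin, that the convergence distance is measured along the central optical axis, and that the upper rate limits of 60 and 24 correspond to video fields and film frames per second. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass

# Upper rate limits quoted in the text; treating them as pictures per second
# (video fields, film frames) is an assumption.
MAX_RATE = {"video": 60, "film": 24}


@dataclass(frozen=True)
class SceneSettings:
    convergence_m: float
    disparity_mm: float
    rate_changes_per_sec: float

    def validate(self, medium: str = "video") -> None:
        """Raise ValueError if a setting falls outside the ranges given in the text."""
        if self.convergence_m <= 0 or self.disparity_mm <= 0:
            raise ValueError("convergence and disparity must be positive")
        if not 1 <= self.rate_changes_per_sec <= MAX_RATE[medium]:
            raise ValueError(f"rate must be 1 to {MAX_RATE[medium]} for {medium}")

    def axis_angle_arcmin(self) -> float:
        """Angle between the two camera axes at the convergence point, in arcminutes."""
        half_disparity_m = (self.disparity_mm / 1000.0) / 2
        angle_rad = 2 * math.atan2(half_disparity_m, self.convergence_m)
        return math.degrees(angle_rad) * 60

    def pictures_per_change(self, pictures_per_sec: float) -> float:
        """How many fields or frames are shown from one camera before switching."""
        return pictures_per_sec / self.rate_changes_per_sec


typical = SceneSettings(convergence_m=11.0, disparity_mm=12.0,
                        rate_changes_per_sec=8.0)

if __name__ == "__main__":
    typical.validate("video")
    print(f"axis angle: {typical.axis_angle_arcmin():.2f} arcmin")
    print(f"video fields per change: {typical.pictures_per_change(60)}")
    print(f"film frames per change: {typical.pictures_per_change(24)}")
```

Under these assumptions the two camera axes differ by only a few arcminutes, far less than the convergence of a pair of eyes 65 mm apart looking at the same point.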

Once the geometry was understood, it was necessary to go about the art of practicing it. This proved to be difficult in live-action imaging. It became necessary to design and build a custom camera-mounting device capable of performing the precise camera manipulation and of meeting other practical considerations.

Teeter-Totter Camera System

The Teeter-Totter camera system consists of a mounting device with two trays, one for each of the cameras. The trays are interlocked via a teeter-totter type of mechanism, which ensures that the manipulation of one camera has an equal and opposite effect on the other camera (Fig. 8). The cameras are horizontally opposed, but travel in equal and opposite directions along a north/south arc, centered on the convergence point. To bring the cameras' optical paths in line in the Vision III fashion, a folded optical path was employed. The folded optical path is composed of a first-surface mirror and a 50/50 cube beam splitter. Two screws are provided for changing disparity and convergence while shooting.

The cameras currently used are BTS LDK-90 broadcast video cameras (Fig. 9). Video cameras were selected over motion-picture cameras for the initial Teeter-Totter prototype because video cameras are easier to align. The BTS LDK-90 was chosen because it uses full-frame-transfer charge-coupled-device (CCD) chips and has a built-in 1/4-wave IR plate, which assists in controlling the s- and p-polarization effect of the folded optical path. Canon EJ-35 prime prism-corrected lenses were adapted for use on the LDK-90 1/2-in. format. The lenses match each other in focal length within 1%.

Figure 10. Teeter-Totter system in use on location.

A custom component and composite switcher was employed to switch the LDK-90 camera signals at a rate of 1 to 60 changes/sec. The output of the switcher is depth enhanced and can be recorded on a standard VTR or VCR.

The current Teeter-Totter system requires careful camera alignment to eliminate horizontal movement in all depth planes, precision matching of chrominance and luminance between cameras, and a great deal of operator skill to juggle disparity, convergence, and time-displacement rates to maintain a stable image (Fig. 10). The Teeter-Totter is being automated to control the disparity, convergence, time-displacement rate, exposure, and focus functions. This will help with image stability and make the system easier to operate. A single-camera system is currently in development.

Vision III has come a long way in the past four years. The Teeter-Totter camera system produces extremely sharp texture and depth-enhanced broadcast-quality video images. The texture and depth-enhancement illusion is best when displayed on a large screen. There is still much to be developed and improved, but the system is close to meeting all of the criteria identified in this article as necessary for production industry acceptance. The Vision III methods have tremendous intrinsic merit and potential for future spin-off technology in the computer imaging, video games, medical imaging, and defense industries. With an industry nervous about new exhibition formats like HDTV, this is a new look without the expense of retooling.

I would like to thank Peggy Mayhew, Al Rosson, Toxey Califf, Mattias Rucht, Eric Pritchard, Dick Meltzer, John Keys, Ed Gerwin, Jim Stewart, Duncan Brown, Ann Covalt, Mark Millsap, John Gaines, and John Nash for all their help over the past few years.

1. J. Pettigrew, "The Neurophysiology of Binocular Vision," Scientific American, 227:84-95, Aug.
2. D. Marr, Vision, W. H. Freeman and Co.: San Francisco, 1982.
3. P. Tolin, "Maintaining the Three-Dimensional Illusion," Information Display, Dec. 1987, p. 11.
4. K. N. Ogle, "Some Aspects of Stereoscopic Depth Perception," J. Opt. Soc. of America, 57:1073-1081, Sept. 1967.
5. A. P. McLaurin, "Visual Image Depth Enhancement Process: An Approach to Three-Dimensional Imaging," Displays, 0141.1 I2, July 1986.
6. E. Jones et al., "VISIDEP™: Visual Image Depth Enhancement by Parallax Induction," Proc. of the SPIE, Los Angeles, Calif., Jan. 1984.
7. J. Deregowski, "Pictorial Perception and Culture," Scientific American, 227:82-88, Nov.
8. T. Okoshi, "Three-Dimensional Displays," Proc. of the IEEE, 68:569, May 1980.
9. L. Hodges et al., "True Three-Dimensional CRT-Based Displays," Information Display, 3:18-22, May 1987.
10. E. Jones et al., Three-Dimensional Display Methods Using Vertically Aligned Points of Origin, U.S. Patent No. 4,429,328, Jan. 31, 1984.
11. E. Jones, Three-Dimensional Video Apparatus and Methods Using Composite and Mixed Images, U.S. Patent No. 4,528,587, July 9, 1985.
12. C. Mayhew, "True Three-Dimensional Animation in Motion Pictures," Proc. of the SPIE, Los Angeles, Calif., Jan. 1987.
13. C. Mayhew et al., Method for Obtaining Images for Use in Displaying a Three-Dimensional Illusion and Related Recording Medium, U.S. Patent No. 4,815,819, March 28, 1989.
