`© Copyright 1992
`by Benjamin B. Bederson
`All Rights Reserved
`
`IPR2022-00090 - LGE
`
`Ex. 1015 - Page 2
`
`
`Abstract
`
We have developed a prototype miniaturized active vision system whose sensor architecture is based on
`a logarithmically structured space-variant pixel geometry. A space-variant image’s resolution changes
`across the image. Typically, the central part of the image has a very high resolution, and the resolution
`falls off gradually in the periphery. Our system integrates a miniature CCD-based camera, pan-tilt
`actuator, controller, general purpose processors and display. Due to the ability of space-variant sensors
`to cover large work-spaces, yet provide high acuity with an extremely small number of pixels, space-
`variant active vision system architectures provide the potential for radical reductions in system size and
cost. We have realized this by creating an entire system that takes up less than a third of a cubic foot.
`In this thesis, we describe a prototype space-variant active vision system (Cortex-I) which performs
`such tasks as tracking moving objects and license plate reading, and functions as a video telephone.
We report on the design and construction of the camera (which is 8 x 8 x 8 mm), its readout, and
`a fast mapping algorithm to convert the uniform image to a space-variant image. We introduce a new
miniature pan-tilt actuator, the Spherical Pointing Motor (SPM), which is 4 x 5 x 6 cm. The basic idea
behind the SPM is to orient a permanent magnet to the magnetic field induced by three orthogonal coils
`by applying the appropriate ratio of currents to the coils. Finally, we present results of integrating the
`system with several applications. Potential application domains for systems of this type include vision
`systems for mobile robots and robot manipulators, traffic monitoring systems, security and surveillance,
`telerobotics, and consumer video communications.
`The long-range goal of this project is to demonstrate that major new applications of robotics will
`become feasible when small low-cost machine vision systems can be mass-produced. This notion of
`“commodity robotics” is expected to parallel the impact of the personal computer, in the sense of opening
`up new application niches for what has until now been expensive and therefore limited technology.
`
`Acknowledgements
`
`This work is the result of a collaborative effort. It was a pleasure to have had the opportunity to work
`with such fine minds as I found in Eric Schwartz (my thesis advisor), Richard Wallace (my colleague
`and constant teacher), and Ping-Wen Ong (my fellow student and supporter).
`Eric has been doing research leading up to this project for at least ten years. He found the funding
for it, and had many of the critical ideas that got it moving. The Spherical Pointing Motor was his
original idea, and only through (literally) years of bouncing it off each other did we get it right.
`Richard is largely responsible for the digital signal processing in the system. If it weren’t for him,
`we wouldn’t have a video display. He had infinite patience in helping me work through problems. He
`was always available to assist me, and helped to keep the project well directed.
Ping-Wen came through at the last minute to integrate his license plate tracking software
with Cortex-I. In addition, he was a motivating force in my studying Chinese. While not directly
critical to this project, studying Chinese helped me keep my sanity during various traumas, such as
when the first prototype went flying through the air as I got a great electric shock, a few hours before
we departed for Chicago, where Cortex-I was first publicly demonstrated.
`Everyone at the NYU Robotics Lab was always optimistic and supportive as Cortex-I slowly took
`shape. Bud Mishra, especially, had good ideas, and was kind enough to read this thesis twice.
And of course, I thank my family, friends, and cat (Tria), who have had to put up with me as I’ve
been counting down the months to finish this and start the next phase of my life.
`
CHAPTER 1. INTRODUCTION
`
resolution. It does so with the use of the retina, which is a space-variant sensor. Its resolution is
high in the center (called the fovea) and low in the periphery, and it changes smoothly between the two.
`Any system that uses a space-variant sensor must aim the sensor properly. Since space-variant sensors
`only have high resolution in the fovea, the current region of interest must be continuously tracked or
`foveated. Such a sensor must be mounted in a device that can aim the sensor with precision. A system
`with this capability is an example of an active or attentive vision system.
A logmap sensor is a useful type of space-variant sensor that is modelled on the human visual system
(see Section 1.3). The space complexity aspect of logmap sensors is particularly attractive and has
been analyzed in detail [7]. In that work, a spatial quality measure, $Q_s$, for sensors is defined to be
the ratio of a sensor's work-space to its maximal resolution. Logmap sensors can achieve a $Q_s$ comparable to that of
conventional uniform sensors that are one to four orders of magnitude larger. Image simulation of the
human space-variant architecture suggests that this ratio may be as high as 10,000:1 [7]. In current
implementations of space-variant machine vision systems, logmap sensors with between 1000 and 2000
pixels have a $Q_s$ comparable to that of conventional sensors in the range of 256 x 256 to 512 x 512, a compression
of between 60:1 and 250:1. Moreover, as shown in [?], $Q_s$ grows exponentially with the number of
pixels in the logmap sensor, thus providing a highly favorable route to upgrading sensor quality. The
high compression ratios cited above for the human visual system derive from the fact that human acuity,
which is roughly one arc-minute in the fovea, when extrapolated over a 120 degree visual field, yields
a constant resolution sensor with roughly 0.2 gigapixels. The logmap sensor with comparable $Q_s$ has
roughly $10^4$ to $10^5$ pixels.
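The pixel counts quoted above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the choice of two samples per arc-minute (Nyquist sampling of one arc-minute acuity) is an assumption made here because it reproduces the figure of roughly 0.2 gigapixels.

```python
def uniform_pixel_count(field_deg, samples_per_arcmin):
    """Pixels in a square constant-resolution sensor spanning field_deg degrees."""
    samples_per_side = field_deg * 60 * samples_per_arcmin
    return samples_per_side ** 2


def compression_ratio(uniform_pixels, logmap_pixels):
    """Ratio of uniform-sensor pixels to logmap pixels at comparable quality."""
    return uniform_pixels / logmap_pixels


# Assumption: 2 samples per arc-minute over a 120 degree visual field.
human_equivalent = uniform_pixel_count(120, 2)
print(human_equivalent)  # 207360000, i.e. roughly 0.2 gigapixels

# Uniform sensors from the text compared against a 1000-pixel logmap sensor.
print(compression_ratio(256 * 256, 1000))
print(compression_ratio(512 * 512, 1000))
```

With 1000 logmap pixels, the two ratios come out near 66:1 and 262:1, which is the same order as the 60:1 to 250:1 range quoted for current implementations.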
In this discussion, compression refers to the ratio of the number of pixels in a conventional uniform
sensor to that of a space-variant sensor with the same $Q_s$. Image compression in the
conventional (e.g. JPEG) sense is an independent issue not addressed in this thesis, but we point out
that a form of progressive video coding, in which a video image is represented by a sequence of logmap
images, is one application that has benefitted from the high compression ratio of the logmap transform. If
each logmap image is centered on a different point in the video image and clipped so that we preserve the
highest frequency information available at each point in the video picture, we can progressively construct
a video image of increasing detail. Rojer [?] reported some experiments with progressive logmap video
coding, and discussed the problem of selecting the best sequence of gaze points.
A space-variant sensor requires pointing the sensor at the region of interest. If the sensor is not
pointed (or foveated) properly, the desired object will fall somewhere on the periphery of the sensor,
resulting in a lower resolution image. It is therefore important to develop efficient eye movement mechanisms.
`A sensor movement system must find interesting objects, track them as they move, and account for
movement of the device in which the sensor is mounted. A short description of how the human visual system
solves the latter two parts of this problem follows. It should be noted that the human visual system
`is able to track objects and foveate on them to within the accuracy of the retina’s highest resolution.
`There are four principal types of eye movements as reviewed by Robinson [?].
`
1. Saccades: These are discrete movements that move the eye quickly (300° to 400°/second) from one
fixation point to another. They also correct errors of the pursuit system. There is a large latency
(150 ms) between fixations. There are typically fewer than four saccades per second.
`
2. Pursuit Eye Movements: The pursuit system tracks moving targets and keeps the current
target foveated. It has a latency of 50 ms.

3. Vergence: This controls the depth that the two eyes fixate on together. It is the slowest system
and has a latency of over 200 ms.
`
4. Vestibular Systems: The Vestibular Ocular Reflex and Optokinetic Reflex systems maintain
the gaze of the eye, compensating for head movements. That is, if the head moves to the right, the
eye moves to the left. This is one of the fastest types of eye movement and has a latency of only
14 ms. They get sensory input from the semicircular canals, which effectively constitute an angular
inertial accelerometer with a frequency range from 0.017 to 17 Hz.
`
`In addition to these four systems, there is a fifth type of movement known as physiological nystagmus.
This is a small, higher-frequency jitter at 30 to 80 Hz. These movements are very important to the low-
level visual process, but are not an issue in the larger movements of object tracking with which we are
`concerned here.
In our work, we focus on saccadic motions, which are the simplest to implement and are used for both
attention and tracking.
The mapping of the retina to the visual cortex is another very important characteristic of the human
`visual system. Pioneering work by Hubel and Wiesel [?, ?] showed how some low-level feature detection
`performed on the retinal image was mapped to the V1 area in the visual cortex. Then Schwartz showed
`in [?, ?] that the mapping from the retina to the visual cortex takes a very specialized form. Specifically,
`the retina can be looked at as a polar coordinate system where each point represents one photo-sensitive
`cell. It gets mapped to what is essentially a rectangular coordinate system in the visual cortex. This
`can be represented mathematically by a complex logarithm and is depicted graphically in Figure 1.1.
This mapping is called the logmap.
The point in retinal space, $P_1$, can be denoted $re^{i\theta}$ and the point in cortical space, $P_2$, can be denoted
by the complex number $x + iy$. If we take the log of $P_1$, we get:

$$\log(re^{i\theta}) = \log r + i\theta$$

Substituting

$$x = \log r \quad \text{and} \quad y = \theta,$$

we get

$$\log(re^{i\theta}) = x + iy$$

or,

$$\log(P_1) = P_2.$$
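The mapping derived above can be tried numerically. The following is a minimal sketch using Python's standard `cmath` module; the function name `logmap` and the error handling at r = 0 are assumptions made for illustration, and the sketch covers only the pure complex logarithm as written in the equations, not any particular sensor implementation.

```python
import cmath
import math


def logmap(r, theta):
    """Map the retinal point r*exp(i*theta) to cortical coordinates (x, y)."""
    if r <= 0:
        # The pure complex log is undefined at the origin (r = 0).
        raise ValueError("r must be positive")
    # cmath.log uses the principal branch, so theta should lie in (-pi, pi].
    w = cmath.log(cmath.rect(r, theta))
    return w.real, w.imag


x, y = logmap(2.0, math.pi / 4)
print(x, y)  # x equals log(2) and y equals pi/4, up to rounding
```

Doubling r adds a constant, log 2, to x, and rotating the retinal point by an angle adds that angle to y, so scaling and rotation about the center of the retina both become shifts in cortical space.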