throbber
EXHIBIT 1041
`
`K. CHEN, A. ASTROM and P.E. DANIELSSON, “PASIC: A Smart Sensor for
`
`Computer Vision,” Pattern Recognition, 1990. Proceedings,”
`
`10th International Conference on pp. 286-291, 16-21 June 1990
`
`TRW Automotive U.S. LLC: EXHIBIT 1041
`PETITION FOR INTER PARTES REVIEW
`OF U.S. PATENT NUMBER 8,599,001
`IPR2015-00436
`
`

`
`PASIC. A Smart Sensor for Computer Vision
`
`Keping Chen
`Anders Astrom and Per-Erik Danielsson
`LSI Design Center
`Department of Electrical Engineering
`Linkoping University, S-581 83 Linkoping, Sweden
`
`Abstract
`
`The PASIC prototype chip contains 256 x 256 photo sensors,
`a linear array of 256 AD-converters, two 256 8-bit shift
`registers, 256 bit-serial ALU, and a 256 x 128 bit-dynamic
`RAM. It is claimed to be a viable architecture for low-level
`vision processing. The processors operate in SIMD-mode
`at 20 MHz. When executing an edge-detection algorithm
`it is shown that the 256 processors are capable of an output
`rate of 3 Mpixel/second.
`
`1. Introduction
`
`The advancement of VLSI technology will inevitably give
`birth to integration of sensors and processors in systems for
`computer vision. The two overriding issues for such
`architectures are:
`
`i) Efficiency when executing a core subset of low
`level algorithms.
`
`the architecture
`ii) Adaptation of
`constraints of VLSI technology.
`
`to various
`
`An early approach in this direction initiated at our university
`is LAPP [l]. This is now a commercially available chip
`which features a linear array of 128 photo sensors,
`thresholding units and simple processors for binary logic.
`
`(Processor, AD-converter, Sensor Integrated
`PASIC
`Circuit) is a prototype for a more general purpuse smart
`sensor. The general floor plan is shown by Fig. 1. Most of
`the area in this prototype is occupied by the photo sensors.
`Their data are read out, one row at a time, over 256 vertical
`lines which connect to a linear array of comparators.
`Together with some digital circuits these comparators form
`a linear array of Analog-to-Digital converters.
`
`The remaining part of the chip is purely digital and contains
`two 8 bit wide 256 stage shift registers, 256 bit-serial
`processors and a 256 x 128 bit dynamic RAM. These parts
`will be described in the following sections. Throughout the
`
`CH2898-5/90/0000/0286$01 .OO 0 1990 IEEE
`
`286
`
`design both analog and digital circuitry has been
`implemented in double metal 1.6 km CMOS technology.
`
`Two 256 8bit bi-directional shift registers
`kt222
`
`256 bit-serial ALUs
`
`22222]
`
`Figure I PASIC floor plan
`
`2. The parallel A/D-converter
`
`A conventional semiconductor video camera has to provide
`for high speed (10 MHz) transportation of tiny charge
`packages from the sensor array to the analog video output.
`In PASIC this transportation problem is greatly alleviated
`since it takes place at much lower rate, one row at a time
`in parallel. See Figure 2. For each new row of analog data
`the 256 comparators are comparing the individual signal
`levels to a common ramp signal generated in synchronism
`with stepping a counter. The eight bits of the counter are
`continuously copied to each of 256 8-bit registers up to the
`
`Authorized licensed use limited to: Lester Kozlowski. Downloaded on June 29,2010 at 17:33:31 UTC from IEEE Xplore. Restrictions apply.
`
`1041-001
`
`

`
`point where the ramp value exceeds the individual signal
`level. The 256 AD-conversions are completed after a full
`ramp generation. With 20 MHz clock rate such an event
`takes place in 256 . 0.05 = 12.8 p.
`
`the system. Putting 256
`organization for the rest of
`processors on a chip immediately requires the processors
`to be bit-serial devices and that they should work in
`SIMD-mode.
`
`The accuracy in the present design is limited to 8 bit/pixel.
`It should be noted, however, that this precision can be
`traded for speed. The A/D-conversion can be speeded up
`by changing the slope of the ramp signal. For instance the
`conversion time for 256 x 256 pixels is reduced from 3276.8
`JLS for 8 bits to 1638.4, 819.2, 409.6, 204.8 and 102.4 KS
`respectively for 7, 6, 5, 4 or 3 bit accuracy. In many vision
`tasks such a programmability for speed-precision trade-off
`is very valuable.
`
`The possibility to increase the signal-to-noise
`lowering the frame rate is self-evident.
`
`ratio by
`
`processor architectures have been
`Many bit-serial
`presented in literature, for instance CLIP[2], DAP[3],
`MPP[4], GAPP[5] and the Connection Machine [6]. The
`details of the GAF'P
`is well known since it has been
`commercially available as a chip. The GAPP processor is
`very simple, having four one-bit registers per chip and logic
`consisting of a full adderlsubtractor. Flexibility and
`generality of the design stems from a rather extensive
`multiplexor system that connects the ALU outputs, the
`register outputs, the memory output and the four NEWS
`neighbors to the memory and the four register inputs. In
`this network several simultaneous bit transfers can take
`place. For instance, one bit can be received from East,
`another transferred Westbound, the sum output stored in
`a memory location, carry saved in the C-register, input bit
`shifted into the communication register, all in the same
`cycle.
`
`This seems very efficient but a closer examination reveals
`that processing speed is nearly always limited by one single
`factor, the memory bandwidth. In a typical two-input
`one-output operation like addition the execution time is
`therefore 3b + 1 cycles where b is the wordlength. We found
`it necessary to design a processor even simpler than the
`GAF'P. In particular, we found it impossible to squeeze a
`GAPP type processor into the narrow 40 pm vertical slot,
`given by the pitch of the sensor columns.
`
`In our VLSI single chip smart sensor, all units, the registers
`in the A/D converter, the bit-serial ALU array, the shift
`registers and the memory cells are hooked to the single-bit
`bus as shown in Figure 2 . All these units communicate with
`the bus bitwise, that is, at one instance only one bit in each
`unit is input/output to/from the bus. The bit-serial ALU and
`the shift registers can be viewed as logic embedded in a
`memory and they appear as memory ports to the bus. The
`architecture where all computation units are memory-like
`and are centered around buses are referred to as a
`memory-bus organized architecture as shown in Figure 4.
`The control and addressing to each unit can be combined
`in a single address decoder broadcasting to all column of
`units. The instruction word (including both address and
`control signals) is 26 bit wide, which can be divided into
`five fields.
`
`287
`
`I
`
`The registers of Figure 2 are actually designed as cells in
`a 8 x 256 bit memory. See Figure 3. The read out to the
`memory bus is controlled by a 3-to-8 decoder.
`From sensor array
`I
`
`Ramp
`gener.
`
`1
`
`I
`
`8
`Figure 2 The parallel AID-converter
`
`From
`comparator
`
`From
`decoder
`
`From
`counter
`
`Memory
`bus
`Figure 3 A bit position in the AID converter registers
`
`3. The memory bus architecture
`
`We believe that convincing arguments for organizing the
`A/D-conversion of an image sensor a linear array were
`given in
`the previous section. By pure geometrical
`constraints
`it
`is then hard to avoid a
`linear array
`
`Authorized licensed use limited to: Lester Kozlowski. Downloaded on June 29,2010 at 17:33:31 UTC from IEEE Xplore. Restrictions apply.
`
`1041-002
`
`

`
`S
`
`AD:
`
`SRA:
`SRB:
`
`&U:
`
`CMA:
`
`4bit A/D converter addredcontrol
`AD=O-7
`0-7th bit to Bus
`AD=8
`data latch
`AD=9
`count
`Sbit shift register 1 address/control
`Sbit shift register 2 addresskontrol
`SR=O
`shift right
`SR=l
`shift left
`SR=2
`data latch
`SR=3
`circle right
`SR=4
`circle left
`SR=5-12
`Bus to 0-7th bit register
`SR=13-20
`0-7th bit register to Bus
`4bit ALU input/output addredcontrol
`BtC to Bus
`ALu=o
`ALu= 1
`BC to Bus
`carry to Bus
`ALu=2
`Bus to C
`ALu=3
`ALu=4
`ACtBC to Bus
`ALu=s
`ACtBC to Bus
`ALU=6
`Bus to B
`B to Bus
`ALu=7
`ALU=8
`Bus to B
`ALu=9
`B@C to Bus
`ALU=lO
`BUS to A
`ALU=11
`BUS to A
`ALU=12
`Bus to A & carry
`feedback
`sum to Bus
`sum to Bus & carry
`feedback
`no operation
`ALU=lS
`8bit memory addresskontrol
`0 to Bus
`MA=O
`MA=l-127
`1-127 bit to Bus
`no operation
`MA=128
`MA=129-256 Bus to 1-127 bit
`
`ALU=13
`ALU=14
`
`-
`
`0
`
`A
`D
`C
`3 -
`4
`
`S
`R
`A
`
`8
`9
`
`S
`R
`B
`-
`13
`14
`
`A
`L
`U
`-
`17
`18
`
`C
`M
`
`A -
`
`25
`
`Figure 4 The architecture and instruction format of PASIC
`
`The key element is the ALU. It has three one-bit registers:
`A, B and C that feed the logic. The logic is basically a full
`adder with SUM and CARRY output, activated by the
`addresses PA = 13, 14 and 2 respectively. In addition,
`however, the ALU-logic is tapped in six other places to
`make six more Boolean functions available for each given
`input A, B, C. With little extra hardware cost these functions
`greatly enhances the overall performance of the processor
`and in effect replaces bit-shuffling over multiplexors in
`GAPP. The number of Boolean functions is further
`extended by the ability to load the A- and B-registers with
`both inverted and non-inverted data from the bus. For
`instance, the CARRY output becomes a Borrow = KB + XC
`+ BC by inserting A := x.
`
`Not shown in Figure 4 is the Global OR function in PASIC.
`It is implemented as a ivired OR combination of the
`C-register output from all 256 processors.
`
`4. Algorithm implementation
`
`Most low-level algorithms for image processing invol
`neighborhood
`operations. The necessary horizont
`neighbor communication is provided by the bi-directior
`shift registers. For grayscale (8-bit) images the obvio
`sequence of events is to load the shift registers over the b
`from the AD-converter or the memory in eight cyck
`After one shift cycle, these 256 x 8 bits are available to t
`neighbors at left or right. Summing two 8-bit pixels frc
`neighboring columns in this manner takes 8 t 1 + 3.8 t
`from
`= 34 cycles where 9 extra cycles stem
`t
`communication. Note, however, that if the result is to
`used in a subsequent neighborhood operation we shol
`store the result into the shift register rather than 1
`memory. The extra delay for neighborhood communicati
`is then virtually eliminated.
`
`The memory in PASIC (present version) is too small
`allow for more sophisticated algorithms. In fact, since
`cannot harbor a full 256 x 256 gray level image we can o
`
`288
`
`Authorized licensed use limited to: Lester Kozlowski. Downloaded on June 29,2010 at 17:33:31 UTC from IEEE Xplore. Restrictions apply.
`
`1041-003
`
`

`
`row-parallel
`bit-serial,
`in
`algorithms
`implement
`pipe-lined versions. If an algorithm is specified using
`intermediate image buffers we have to reformulate the
`algorithm and execute all the stages of the algorithm row
`by row. See Fig. 5. We can hold a limited number of inputs
`and intermediate rows in the memory (128 bits per column).
`However, we have to produce one complete output row
`before we can accept a new row input from the image
`sensor.
`
`Median, 3 x 1 horizontal neighborhood. The result of the
`previous operation was loaded into the shift registers as well
`as to the memory. The computation continues as for the
`vertical neighborhood except for 1 extra shift cycle to bring
`in the left neighbor and, later on, 2 extra cycles to bring
`in the right neighbor.
`Total: 131 + 3 = 134 cycles.
`8bit data input
`
`Sensor array
`
`3x1 median
`
`I I
`
`Processor army
`
`I
`
`I
`
`
`
`thinning 0
`
`+
`output
`
`Figure 6 The edge detection algorithm
`
`Figure 5 Row parallel pipe-lined processing
`
`5. Application example: Edge detection
`
`A rather traditional edge detection procedure is described
`by Figure 6. A cycle count for the various stages of this
`algorithm implemented in PASIC for a 256 x 256 8-bit
`image reveals the following. (See also Figure 4).
`Input of one new row to memory takes 8 cycles.
`
`Median, 1 x 3 "vertical" neighborhood: Three 8 bit
`comparisons are necessary. Each comparison consists of
`one subtraction, 16 cycles, followed by one multiplexing
`operation using the AC + BC logic of the ALU to save the
`larger (and/or the smaller) of two numbers. The first
`multiplexing is a two-way one and takes 33 cycles, the
`following ones are one-way and takes 25 cycles each. The
`whole operation totals 16 + 33 + 2(16 + 25) = 131 cycles.
`
`Smooth is performed by adding two horizontal neighbors
`followed by adding two vertical results. Input data for the
`smoothing stage is already in shift registers. Total execution
`time is 25 cycles for the first 8 bit add operation, 28 cycles
`for the following one which totals 53 cycles.
`
`Derivative magnitude in the x-direction begins with
`subtraction on 10 bit data that takes 31 cycles + 3 extra
`cycles to bring in the two most significant bits, which do
`not fit into the shift register. Then follows an addition with
`vertical neighbor to yield a 12 bit Sobel result f, which takes
`34 cycles. Finally, the magnitude is obtained by inverting
`the 11 LSB bits if the sign is negative. This can be
`implemented in 24 cycles by loading the sign bit in the
`A-register, a zero in the B-register and then streaming in
`the 11 LSB-bits while saving the SUM output from the
`ALU.
`
`To compute the magnitude of the Sobel response in the
`y-direction
`is identical in complexity. The total Sobel
`
`289
`
`I
`
`Authorized licensed use limited to: Lester Kozlowski. Downloaded on June 29,2010 at 17:33:31 UTC from IEEE Xplore. Restrictions apply.
`
`1041-004
`
`

`
`6. Conclusions
`
`to prove (or
`The main objective of the PASIC design is
`smart image
`disprove) the viability of a general purpose
`to put
`the
`sensor. One obvious integration step
`is
`given this, it
`AID-converter on the sensor chip. However,
`is quite natural to also include ALU and memory. To avoid
`high speed transfer of analog data we have employed an
`AID-converter in the form of a linear array of comparators.
`The architecture of the processing part conforms with the
`row parallel output from the AD-converters. We have
`designed an extremely simple but rather efficient processor
`which is excellently suited to the special VLSI constraints
`of the sensor. The pitch in the present version of PASIC
`is 40 pm and it has been possible to fit the AD-converter
`circuitry, the shift register, the processor and the memory
`into this narrow slot. One key factor is the unified structure
`achieved by extending the memory data bus to all the other
`units within the same column.
`
`The actual circuit design has hardly been touched upon in
`this article. However, as is evident by other reports [lo],
`PASIC employs 1.6 pm double metal CMOS technology for
`sensors as well as processors and memories. Furthermore,
`all the four units, sensor, AD-converter, shift register and
`memory, can utilize the same basic cell design shown by
`Figure 3. At the time of this writing most of these parts have
`been tested.
`
`PASIC should be considered as a prototype for a more
`full-fledged
`smart image sensor with higher resolution,
`more memory and control circuitry on-chip. We believe that
`it should be possible to extend the sensor area to 512 x 512
`and the number of processors to 512 by employing
`sub-micron technology in the next version of PASIC. The
`memory size is critical for the scope of the architecture and
`the algorithms. In the present version PASIC is not utilizing
`the highly specialized layout and techniques used in high
`density DRAM’S. Full exploitation of such designs should
`make it possible to store several gray level images on-chip.
`
`The pixel processing rate grows linearly with the size of the
`array. However, since the sensor area grows quadratically
`the frame rate is subject to a linear decrease for upscaled
`versions of PASIC. This problem indicates that future large
`scale smart sensor systems must be programmable at the
`sensor level. The sensor should be able to deliver an image
`of the scene at a variable resolution and/or deliver just a
`smaller window of the full scene (foveation).
`
`the
`clarified by
`further
`is
`computation
`response
`decomposition scheme of Figure 7 (from [ 6 ] ) . Total
`execution time:
`2[34+34+24] = 184 cycles.
`
`pJ
`
`m
`
`- 1 -2 -1
`
`-1 -2 -1
`
`Figure 7 Decomposition of the Sobel operator pair
`
`The gradient magnitude is approximated as the sum lf,l
`+ lfyl. Yielding a 12 bit result from 11 bit input data takes
`37 cycles.
`
`Thresholding with a fix
`to
`is equivalent
`threshold
`subtraction while saving the sign bit of the result. Execution
`time: 25 cycles.
`
`Thinning is applied on the thresholded image. Ideally we
`would like this execution to be data dependent and continue
`iteratively until no further changes. Here we assume that
`all edges after thresholding have a max width of 5 pixels
`and hence two thinning steps are sufficient. Various
`thinning algorithms exist in the literature [SI, [9]. We have
`adopted a four-phase version which executes in 22 cycles
`per phase. Two thinning steps require 2 . 4 . 22 = 176 cycles.
`
`Output of the thinned binary image may take place every
`8th line. Execution time per row is then 118 (8 t 256) = 33
`cycles.
`
`Summary of execution times per line:
`Input from sensor
`Median 1 x 3
`Median 3 x 1
`Smooth
`Derivatives Ifx],
`lfxl t I f Y l
`Thresholding
`Thinning
`output
`
`IfY[
`
`8
`131
`134
`53
`184
`37
`25
`176
`33
`781 cycles
`
`Since there are 256 rows, 50 ns cycle time implies a
`maximum frame rate of
`
`20 . lo6 / 765 . 256 = 102 frameslsecond
`which is equivalent to a processing rate of
`
`20 . l o 6 . 256 / 765 = 3.7 M output pixel/second
`
`290
`
`Authorized licensed use limited to: Lester Kozlowski. Downloaded on June 29,2010 at 17:33:31 UTC from IEEE Xplore. Restrictions apply.
`
`1041-005
`
`

`
`Acknowledgement
`
`This project has been supported by the grant 712-87-02841
`from Swedish Board of Technical Development and by
`CENIIT (CENter for Industrial Information Technology),
`Linkoping University.
`
`References
`[l] R. Forchheimer, A. Odmark, Single chip linear
`array processor, in Applications of digital image
`processing, SPIE, Vol. 397, (1983).
`
`[2] M.J.B. Duff, Architectures of SIMD cellular logic
`image processing arrays in Nato AS1 series, Vol.
`for
`F18, Computer
`architectures
`spatially
`distributed data, ed. by H. Freeman and G.G.
`Picroni, Springer (1985)
`[3] S.F. Reddaway, DAP - a Distributed Processor
`Array, First annual symposium on computer
`architecture, Florida, pp 61-65 (1973).
`[4] K.E. Batcher, Design of a Massively Parallel
`Processor, IEEE Transactions on Comp., Vol.
`C29, pp 836-840 (1980).
`[5] Geometric Arithmetic Parallel Processor, Data sheet
`NCR45CG72, NCR corporation, (1984).
`
`[6] L.W. Tucker and G.G. Robertson, Architecture
`and applications of the Connection Machine, IEEE
`computer, pp 26-38, August 1988.
`
`[7] P-E Danielsson, 0. Seger, Generalized and
`separable Sobel operators, in Machine vision -
`aquiring and interpreting the 3 0 scene, ed. by
`H. Freeman, Academic press (1989).
`
`[8] E.S. Deutsch, Thinning algorithms on rectangular,
`hexagonal and triangular arrays, Com. ACM, Vol.
`15, pp 827-837 (1972).
`
`[9] Q-Z Ye, P-E Danielsson, Inspection of printed
`circuit boards by connectivity preserving shrinking,
`IEEE Trans. PAMI, Vol. 10, pp 737-742 (1988).
`
`[lo] K. Chen, M. Afghani, P-E Danielsson, C.
`Svensson, PASIC: A processor AID-converter-
`sensor integrated circuit, to appear in proceedings
`of ISCAS90.
`
`Authorized licensed use limited to: Lester Kozlowski. Downloaded on June 29,2010 at 17:33:31 UTC from IEEE Xplore. Restrictions apply.
`
`1041-006

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket