throbber
Accelerating An IR Automatic Target Recognition
`Application with FPGAs
`
`Jack Jean, Xuejun Liang, Brian Drozd, and Karen Tomko
` Department of Computer Scicence and Engineering
`Wright State University, Dayton, OH 45435, USA
`{jjean, xliang, bdrozd, ktomko}@cs.wright.edu
`
`An infrared automatic target recognition (IR
`ATR) application is accelerated with an FPGA
`co-processor board. The board features and the
`application are first stated. The FPGA design is
`then described. The achieved performance is
`reported and analyzed at the end.
`
`The FPGA co-processor board used is a Giga
`Operations’s G900 board which is PCI-bus
`based and hosted on a 180 MHz Pentium PC.
`On the board there are 8 XMODs, each
`containing 2 XC4020 FPGA chips, 256 KBytes
`of SRAM, and 2 MBytes of DRAM. It is
`possible to broadcast or multicast an FPGA
`configuration file to those XC4020s in 0.37
`seconds.
`
`The application of the IR ATR program is to
`locate and identify ground vehicles based on a
`single IR image frame. The G900 board is used
`to accelerate the bottleneck of the program
`which is to locate the region of interests (ROIs)
`by applying a set of templates through the whole
`image. Because the computation involved is
`different from a previous work on mapping ATR
`to FPGA [1], we were not able to adopt
`techniques used in that work.
`
`To locate the ROIs six template pairs are applied
`through the whole image except the image
`boundary. Each template pair contains one
`target template and one for background
`template. Each template contains a pattern of
`exactly 30 pixels in a relatively large area, say,
`of size 20 by 50. (Different templates have
`different sizes.) Each image is partitioned into
`five strips of different heights and each template
`needs to be rescaled across strip boundary.
`
`The surrounding area of an image pixel is tested
`against a template pair to see if it is an ROI.
`The computation involved is as follows.
`
`1. Compute the mean of those 30 test points
`associated with the background template.
`2. Test points whose values are larger than the
`mean are considered “hot” (otherwise
`“cold”). Compute four different sums, one
`for target hot values, one for target cold,
`one for background hot, and one for
`background cold.
`3. Based on the four sums compute the
`correlation.
`If the correlation is greater than or equal to
`a constant threshold, the pixel is a ROI.
`
`4.
`
`The FPGA design has the following features.
`
`1. All the floating point computations have
`been replaced with full-precision integer
`operations.
`2. The original code uses one byte per image
`pixel. The FPGA used 5 bits per pixel. The
`recognition results were considered
`comparable judging from the image output
`files and the no. of ROIs.
`3. Six XC4020 chips are used, one per
`template pair.
`4. Each template pair is evaluated on two
`neighboring pixels in parallel.
`5. The original C code uses several 2-D and 4-
`D arrays which require many array index
`computations. Those computations are
`removed with the FPGA design.
`
` Only the locating of ROIs is implemented in
`the FPGA design. A host program running on
`the PC does everything else as required in the
`original source code. To overlap the FPGA
`computation and the host machine computation,
`the host program uses two threads. When the
`child thread is waiting for the FPGA signal of
`computation completion for one strip, the parent
`
`1
`
`Petitioner Microsoft Corporation - Ex. 1052, p. 1
`
`

`

`thread can go ahead and process ROIs identified
`for the previous strips.
`
`The FPGA design in an XC4020 contains
`several components:
`
`1
`0
`2
`1
`3
`2
`… …
`100
`101
`101
`102
`… …
`Memory data
`arrangement:
`Each memory
`address is for
`two pixels.
`
`1. An SRAM controller: it has four functions.
`(a) It receives contiguous image data from the
`host program and stores them in the SRAM on
`the XMOD. Note that data are stored
`redundantly as shown at the right
`so that two neighboring pixels
`can be evaluated in parallel. (b)
`For each single pixel location, it
`allows the (almost random)
`access of 60 test points. (c) Store
`the pixel location into SRAM if
`the location is an ROI. (d) Allow
`the host program to read back all
`the ROIs.
`2. A buffer: it stores the locations
`of those 60 test points associated
`with a template pair.
`3. A computation unit: it
`contains three parts:
`(a) two TEMPERATURE units, each containing
`an accumulator to compute the sum of 30
`background image values, a buffer to store those
`30 values, comparators and adders to compute
`HOT and COLD values.) (b) a multiplexer and
`a small unit called CONVERT, and (c) an
`ASSERT unit to evaluate if the correlation is
`greater than or equal to a threshold.
`
`The ASSERT unit is time-multiplexed to
`process the outputs of the two TEMPERATURE
`units. This arrangement saves one CONVERT
`unit and one ASSERT unit at the expense of
`requiring the multiplexer. In addition, a 33
`MHz clock is used for the ASSERT unit while
`the rest of the chip runs with a 16 MHz clock.
`The whole Round 0 design takes 704 CLBs per
`chip which is 89.8% of the total CLBs on a
`XC4020 chip.
`
`The execution times for the computation of
`ROIs are summarized in Table 1. Note that the
`execution time with FPGA does not include the
`time to initialize the board (2.31 seconds) and
`the time to load the bit file (0.37 seconds).
`On G900, the execution time contains the
`following three parts that take 0.655 (= 0.075 +
`0.018 + 0.562) seconds.
`
`1.
`
`Image data transfer from PC to XMODs:
`about 0.075 seconds. The data transfer
`bandwidth is about 6MB/sec for write and
`4MB/sec for read from G900.
`2. Other data transfer: about 0.018 seconds.
`(1) The sending of templates to XMODs:
`less than 10 Kbyes (2) The reading of
`results from XMODs: Assume that the
`number of pixel locations that are ROIs is
`less than 10%. Since each location takes 2
`bytes. There are about 60 (= 300 x 0.1 x 2)
`KBytes to read.
`3. Real computation: The computation time is
`mostly overlapped with the reading of
`image data from SRAM to FPGA chips.
`For SRAM, it takes one clock cycle to read
`2 bytes. For an image frame of 300 K
`pixels, each pixel requires the reading of 60
`bytes. So it takes 9 M (= 60 x 300 K / 2)
`clock cycles, which takes 0.562 seconds
`with a 16 MHz clock.
`
`Therefore the timing results obtained with the
`FPGA board are consistent with the above
`timing analysis.
`
`Acknowledgments
`
`This research is supported by DARPA under Air
`Force contract number F33615-97-1-1148, an
`Ohio State investment fund, and an Ohio State
`research challenge grant. Xilinx Inc. donated
`an FPGA design tool and FPGA chips on
`XMODs. The authors would like to thank Mr.
`Jim Hilger with the US Army Night Vision &
`Electronics Sensors Directorate for his
`introduction to the IR ATR application and his
`help and comments on the design.
`
`Reference
`
`[1] J. Villasenor et al., “Configurable
`Computing Solutions for Automatic Target
`Recognition,” in IEEE Symposium on
`FCCM, pp. 70--79, 1996.
`
`Speedup
`
`Time With
`Time
`Image
`FPGA
`On PC
`No.
`0.602
`13.106
`1
`0.614
`12.980
`2
`0.622
`13.187
`3
`0.617
`13.016
`4
`Table 1 : Execution times in second
`
`21.2
`21.1
`21.2
`21.1
`
`2
`
`Petitioner Microsoft Corporation - Ex. 1052, p. 2
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket