`
`Acquisition of High-Resolution Digital Images in Video
`Microscopy: Automated Image Mosaicking on a
`Desktop Microcomputer
`SWIDBERT R. OTT
`Department of Zoology, University of Graz, Graz, Austria
`
KEY WORDS
cross-correlation; Fourier transformation; image processing; matched filtering; coordinate transformation
`
ABSTRACT
For the digital processing of microscopical images, mosaicking is a prerequisite if
`the specimen is larger than the camera field at the necessary magnification. This study investigates
`the possibilities and limitations of fully automated mosaicking on a desktop computer. Cross-
`correlation-based frame registration was performed with high reliability if the video frames were
`edge-enhanced before matched filtering, and also in a reasonable time since the search for matches
`was restricted to pairs of consecutive frames. An environment for routine mosaicking was developed
`and implemented in a widely used desktop image-processing program. The software developed
`during this study has been released to the public domain. Microsc. Res. Tech. 38:335–339, 1997.
© 1997 Wiley-Liss, Inc.
`
`INTRODUCTION
`The need for image mosaicking occurs in all those
`imaging applications where the object size is very large
`compared to the finest object detail that needs to be
`preserved in the image, and it is even more pronounced
`if the imaging device has a low spatial resolution
`(Tekalp et al., 1992; for a general introduction, see
`Mann and Picard, 1995). With the rapid development of
`desktop computers, digital image processing has be-
`come increasingly important in microscopy research.
`Acquisition is often done live, via a CCD video camera
`and a microcomputer with a frame grabber card. Once
`the image is digitized, virtually every image-processing
`technique can be employed to extract information.
`However, the poor spatial resolution and limited field of
`the video camera is a serious constraint. It can only be
`bypassed by mounting several detail frames into a
`single image (e.g., Chockkan and Hawkes, 1994; Predel
`et al., 1994). To automate the reassembling of the
`compound image, the microscope can be equipped with
`a motorized stage so that the translations of the frames
`are known in advance. Besides needing additional
`hardware, this approach is inaccurate for higher magni-
`fications. Alternatively, a manual stage can be used and
`the translation can be recovered by a matching algo-
`rithm. Such algorithms are computationally expensive
`and have traditionally been run on workstations.
`We have developed a technique to perform fully
`automated mosaicking on a low-cost computer in reason-
`able time and with high reliability, and have made this
`function available within the widely used public do-
`main software ‘‘NIH Image.’’ The result is ‘‘AutoMatch,’’
`a graphical user interface implementation tailored to
`the needs of routine microscopical imaging. In this
`paper, we discuss the possibilities and limitations of
`automated desktop mosaicking. The algorithm and its
`performance under real-life conditions are described
`and examples of applications given.
`
`MATERIALS AND METHODS
`General Considerations
`Hundreds of papers have been published on the
`problem of frame alignment (for review, see Barron et
al., 1994). What is needed is a coordinate transformation that maps the image coordinates x = [x, y] to a new
set of coordinates, x′ = [x′, y′]. A general model had to
`account for all the degrees of freedom the camera has in
`3D space—translation, zoom, rotation, pan, and tilt, as
`well as for radial lens distortion. In video microscopy,
`however, this ‘‘barrel’’ distortion can be neglected be-
`cause the camera field covers only the center-most part
`of the microscopical image. Therefore, for a compound
`microscope with an XY stage and a given magnification,
`the problem reduces to recovering the XY translation so
`that the transformation can be described as
`
x′ = x + b   (x, x′, b ∈ ℝ²)
`
`and the translation b can be found by cross-correlation.
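As an illustration of this principle, the following sketch (in Python with NumPy, not the NIH Image macro language used in this study) recovers the translation b between two overlapping frames from the global maximum of their cross-correlation function, computed via the Fourier transform:

```python
import numpy as np

def recover_translation(prev, curr):
    # Correlation theorem: CCF = IFFT( FFT(curr) * conj(FFT(prev)) );
    # the global maximum of the CCF sits at the circular shift b.
    ccf = np.real(np.fft.ifft2(np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))))
    dy, dx = np.unravel_index(np.argmax(ccf), ccf.shape)
    h, w = prev.shape
    if dy > h // 2: dy -= h   # map wrap-around shifts to signed offsets
    if dx > w // 2: dx -= w
    return int(dx), int(dy)

rng = np.random.default_rng(0)
prev = rng.random((64, 64))
curr = np.roll(prev, shift=(5, -3), axis=(0, 1))  # translate by dy = 5, dx = -3
print(recover_translation(prev, curr))  # -> (-3, 5)
```

With an exact circular shift the recovery is error-free; real video frames additionally need the padding and prefiltering described under Materials and Methods.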
`Among the existing desktop image-processing soft-
`ware, NIH Image (Macintosh 68k and PowerPC plat-
`forms; Rasband and Bright, 1995) is unique in provid-
`ing advanced image analysis while being in the public
`domain. AutoMatch was developed entirely in the NIH
`Image macro programming language so that prospec-
`tive users can run it without recompiling the NIH
`Image source code. This will also make it easy to
`upgrade to future versions of either NIH Image or
AutoMatch. An interpreted macro language might seem like a substantial trade-off in speed, but we estimated that more than 95% of the processing time is consumed by transforming images between the frequency domain and the space domain. These calculations run in compiled code because the Fast Hartley Transform (FHT) routine is provided by NIH Image.

Contract grant sponsor: Austrian National Bank; Contract grant numbers: 3378 and 5519; Contract grant sponsor: Austrian Science Foundation.
Correspondence to: Swidbert R. Ott, Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK.
Accepted in revised form 19 February 1997.
`Hardware and Software
A Sony XC-75 CE CCD video camera module (Kontron Image Analysis Division) was attached to a Leitz Orthoplan or an Olympus Provis compound microscope.
`The S-Video signal was fed into the AV digitizer board of
`a Macintosh 7100/66 AV PowerPC (56 MB RAM) run-
`ning MacOS 7.5.1 and NIH Image 1.60. The public
`domain NIH Image program has been developed at the
U.S. National Institutes of Health and is available via FTP from zippy.nimh.nih.gov/pub/nih-image/. Reassembled images were processed further in NIH Image
`and Adobe Photoshop, and printed on a Sony Mavi-
`graph Digital Color Printer UP-D8800.
`Image Acquisition
`Video input was displayed live in NIH Image in
an 8-bit greyscale image of 640 × 480, 480 × 480, or 320 × 240 pixels. Frames were captured in a scanner-like fashion by manual movements of the stage. Each
`frame had to overlap with the previous one since
`cross-correlation was then computed only between suc-
`cessive pairs to minimize computation time. Video
`noise was reduced by averaging up to 16 temporal
`frames. To eliminate tiling artifacts, background correc-
`tion was performed by subtracting a blank frame with
`an offset corresponding either to the mean grey value of
`the blank frame or to a predefined constant.
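The background-correction step might be sketched as follows (Python/NumPy; the offset argument mirrors the two options described, the blank frame's mean or a predefined constant — a sketch, not the original macro code):

```python
import numpy as np

def background_correct(frame, blank, offset=None):
    # Subtract the blank (empty-field) frame, then add back an offset so
    # the result stays in the 8-bit range; default to the blank's mean grey.
    if offset is None:
        offset = blank.mean()
    out = frame.astype(np.int16) - blank.astype(np.int16) + int(round(offset))
    return np.clip(out, 0, 255).astype(np.uint8)

# A frame uniformly 20 grey values brighter than the blank comes out
# flat at offset + 20:
blank = np.full((4, 4), 100, dtype=np.uint8)
frame = np.full((4, 4), 120, dtype=np.uint8)
print(background_correct(frame, blank))  # -> all pixels 120
```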
`Mosaicking Algorithm
`The background-corrected frames were transformed
`into the frequency domain by a 2D Fast Hartley Trans-
form (FHT; Reeves, 1990). Like all fast Fourier algorithms, the FHT requires square, power-of-two-sized images.
`Two alternative strategies were pursued to fulfill this
requirement. Either the live image was clipped to square proportions during acquisition and then scaled to 128 × 128, 256 × 256, or 512 × 512 (in this case, the compound image was reassembled from the clipped and scaled frames), or full rectangular frames were acquired and transformed into power-of-two squares by non-proportional scaling. These scaled square images were
`then used only to calculate the translations, and the
`compound image was built from the original frames. In
`either case, the square images were doubled in height
`and width prior to FHT, with the actual image placed in
`the fourth quadrant and the remainder padded with its
mean grey value. The cross-correlation function (CCF) of two successive frames E_(i−1), E_i was obtained by calculating the inverse transform of the conjugate product of their Fourier transforms. It was then supposed that

b_(i−1,i) = m_(i−1,i),

where b_(i−1,i) is the translation between E_(i−1) and E_i, and m_(i−1,i) is the XY position of the global maximum of their CCF relative to the image centre.
In those cases where frames had been brought to power-of-two square dimensions by non-proportional scaling, the true translation vector had to be recovered by multiplying m by the reciprocal scaling factors.
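Under the stated assumptions, the whole procedure — doubling the frame size, placing the image in the fourth quadrant, padding with the mean grey value, taking the conjugate product, and correcting non-proportional scaling with the reciprocal factors — might be sketched as follows (NumPy's FFT standing in for NIH Image's FHT routine):

```python
import numpy as np

def pad_double(img):
    # Double height and width; actual image in the fourth quadrant,
    # remainder padded with the frame's mean grey value.
    h, w = img.shape
    out = np.full((2 * h, 2 * w), img.mean())
    out[h:, w:] = img
    return out

def frame_translation(prev, curr, sx=1.0, sy=1.0):
    a, b = pad_double(prev), pad_double(curr)
    # CCF = inverse transform of the conjugate product of the transforms.
    ccf = np.real(np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))))
    ccf = np.fft.fftshift(ccf)                       # zero shift at the centre
    py, px = np.unravel_index(np.argmax(ccf), ccf.shape)
    cy, cx = ccf.shape[0] // 2, ccf.shape[1] // 2
    mx, my = px - cx, py - cy                        # peak m relative to centre
    return float(mx / sx), float(my / sy)            # reciprocal scaling factors

# A frame with one distinct feature, shifted by dx = 5, dy = 3:
prev = np.zeros((32, 32)); prev[10:14, 8:12] = 1.0
curr = np.roll(prev, shift=(3, 5), axis=(0, 1))
print(frame_translation(prev, curr))  # -> (5.0, 3.0)
```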
`
`OTT
`
`Additional Processing Prior to Cross-correlation
`The following filters were tested with respect to
`whether they improved the reliability of the mosaicking
if applied to the frames prior to matched filtering: noise reduction, Sobel edge detection, 5 × 5 and 9 × 9 Laplacian kernels, and various Difference-of-Boxes (DoB) or Difference-of-Gaussians (DoG) filters. The
`minimum overlap necessary for correct alignment was
`used as the criterion for reliability.
`
`RESULTS AND DISCUSSION
`Reliability of the Algorithm
`Choosing the frame overlap, to state the obvious,
`must be a compromise between reliability and effi-
`ciency. An overlap of more than 30% was regarded as
`inefficient. Reliable mosaicking of real-world frames, on
`the other hand, was rarely possible if the overlap was
`less than 10% even under quasi-ideal conditions. There-
`fore, reliable alignment with an overlap of 15–20%
`seemed to be a realistic goal.
`Mosaicking was found to be highly unreliable with
`frames that had not been filtered before cross-correla-
`tion. The algorithm searches only for the global maxi-
`mum of the CCF, assuming that its coordinates m
`correspond to the translation b; however, low-frequency
`components may result in additional maxima in the
`CCF, with values higher than that at b. Therefore,
`mosaicking worked well if there was distinct detail on
`an even background, but it failed if the detail was of low
`contrast, or in the presence of prominent low-frequency
`components such as those corresponding to the global
`shape of the specimen. Under such conditions, the
`value of the CCF at b was just a local maximum unless
`the overlap was substantially larger than 20%.
`The primary aim was, therefore, to make the neces-
`sary overlap as independent as possible from the fre-
`quency composition of the image. The solution was to
`emphasize local features so that the corresponding
`maximum in the CCF becomes the global one. After
experimentation with different edge-enhancement/high-pass filters, a DoB filter with a 7 × 7 box kernel and a ((1, 1, 1), (1, 4, 1), (1, 1, 1)) kernel proved to be highly effective. This set of kernels worked well for
`images with very different kinds of contrast and feature
`distribution, giving sufficient boost to local features to
`allow reliable matching with an overlap of 10–20%.
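One plausible reading of this filter pair (the text does not spell out exactly how the two kernels are combined) is to subtract a 7 × 7 box mean from a version lightly smoothed with the normalised ((1, 1, 1), (1, 4, 1), (1, 1, 1)) kernel, which yields the intended high-pass behaviour. A pure-NumPy sketch under that assumption:

```python
import numpy as np

def conv2(img, k):
    # Same-size 2D correlation with edge padding (pure NumPy).
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img, ((ph, ph), (pw, pw)), mode='edge')
    win = np.lib.stride_tricks.sliding_window_view(p, k.shape)
    return (win * k).sum(axis=(2, 3))

def dob_prefilter(img):
    img = img.astype(float)
    k = np.array([[1, 1, 1], [1, 4, 1], [1, 1, 1]], float)
    fine = conv2(img, k / k.sum())                   # light smoothing, mean preserved
    coarse = conv2(img, np.full((7, 7), 1 / 49.0))   # 7 x 7 box mean
    return fine - coarse                             # high-pass: local features only

# A flat image carries no local features, so the response is ~0:
flat = np.full((16, 16), 200.0)
print(np.abs(dob_prefilter(flat)).max())  # -> ~0 (within float rounding)
```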
`
`Features of the Implementation
`The implementation developed during this study has
`a graphical user interface (GUI) and is fully mouse-
`controlled. Since the actual matching algorithm is
`computationally intensive, it seemed reasonable not to
`process the frames immediately but to split the entire
`mosaicking process into three separate steps. First, an
`arbitrary number of mosaics are acquired, each as a
`series of overlapping frames. During frame capturing,
`the previous frame is displayed on-screen to ease
`orientation when moving the stage. Next, the series are
`fed into the matching algorithm. The XY translation
`parameters are stored in internal files, so that finally
`the images can be quickly merged into standard NIH
`Image windows for further processing. The latest ver-
`sion of AutoMatch can be obtained via FTP from
`zippy.nimh.nih.gov/pub/nih-image/contrib/.
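The final merge step is conceptually simple once the pairwise translations are known: accumulate them into absolute frame positions and paint each frame onto a common canvas. A minimal sketch (a hypothetical helper, not AutoMatch's actual macro code):

```python
import numpy as np

def merge_frames(frames, pair_shifts):
    # pair_shifts[i] = (dx, dy) of frame i+1 relative to frame i.
    pos = [(0, 0)]
    for dx, dy in pair_shifts:                       # accumulate absolute positions
        pos.append((pos[-1][0] + dx, pos[-1][1] + dy))
    h, w = frames[0].shape
    x0 = min(p[0] for p in pos); y0 = min(p[1] for p in pos)
    H = max(p[1] for p in pos) - y0 + h
    W = max(p[0] for p in pos) - x0 + w
    canvas = np.zeros((H, W), dtype=frames[0].dtype)
    for (x, y), f in zip(pos, frames):
        canvas[y - y0:y - y0 + h, x - x0:x - x0 + w] = f  # later frames overwrite overlap
    return canvas

# Two 100 x 100 frames, the second shifted 80 px right and 10 px down:
a = np.ones((100, 100), np.uint8); b = np.full((100, 100), 2, np.uint8)
print(merge_frames([a, b], [(80, 10)]).shape)  # -> (110, 180)
```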
`
`
`
`AUTOMATED IMAGE MOSAICKING
`
`Performance and Example Applications
`To compensate for the limited speed on a desktop
`system, CCFs were calculated only between successive
`pairs of frames (differential parameter estimation).
`Further speed could be gained by calculating the CCF
`for scaled, small versions of the frames only. Two
`alternatives were investigated:
`1) Trading image size for speed, by assembling the
`mosaics from scaled frames. Because of the blurring
`and noise in the video signal, not every single pixel
`carries information about the object. Therefore, almost
`no detail was lost if the original frames were scaled
down from 480 × 480 to 256 × 256 pixels, or if acquisition was done with half the digitizer resolution (320 × 240; compare Figs. 1a and b).
2) Trading accuracy for speed, by assembling full-sized frames and using scaled versions only for calculating the translation b. Assuming nearest-neighbour scaling, the largest possible error that can occur in recovering b from scaled versions of the frames was calculated as
`
e_max = [e_max^x, e_max^y],   e_max^x, e_max^y ∈ ℕ₀,

where e_max^x and e_max^y are the unique integers satisfying

(1 − s_x)/(2 s_x) ≤ e_max^x < (1 + s_x)/(2 s_x),
(1 − s_y)/(2 s_y) ≤ e_max^y < (1 + s_y)/(2 s_y),

with the scaling factors

s_x = d_s^x / d_o^x,   s_y = d_s^y / d_o^y   (d_s^x, d_s^y, d_o^x, d_o^y ∈ ℝ⁺ \ {0}),

where d_o^x, d_o^y and d_s^x, d_s^y are the XY dimensions of the original and scaled frame, respectively, and e_max is the maximum error, in pixels.
The probability P(e) of an error e occurring was estimated as

P(e) = [P(e_x), P(e_y)],

with the component probabilities P(e_x), P(e_y) given below.
`
`337
For example, if an error of one pixel was tolerable, b was calculated from 256 × 256 versions of 640 × 480 frames:

e_max = [1, 1],   P(e_max) = [0.60, 0.47].

For maximum accuracy, b was calculated from 512 × 512 or 256 × 256 versions of 640 × 480 or 320 × 240 frames, respectively:

e_max = [1, 0],   P(e_max) = [0.20, 1].
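These figures follow directly from the error model; a short per-axis check (assuming the formulas above, with s = d_s/d_o on each axis):

```python
import math

def error_stats(d_orig, d_scaled):
    s = d_scaled / d_orig                      # per-axis scaling factor
    # e_max is the unique integer in [(1 - s)/(2s), (1 + s)/(2s)).
    e_max = math.ceil((1 - s) / (2 * s))
    p = min(1.0, s) if e_max == 0 else 1 - (2 * e_max - 1) * s
    return e_max, p

print(error_stats(640, 256))  # x-axis, 640x480 scaled to 256x256: e_max = 1, P = 0.60
print(error_stats(480, 256))  # y-axis: e_max = 1, P = 0.47
print(error_stats(480, 512))  # y-axis, 640x480 scaled to 512x512: e_max = 0, P = 1
```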
`
`
`
Notice that even in this case there is a low probability for an error of one pixel to occur along the x-axis. For e_max = [0, 0], it would be necessary to keep the scaling factor above 1 along both the x- and the y-axis. For 640 × 480 frames, however, this would mean scaling to 1024 × 1024 (in 8-bit integer values) and then calculating the FHTs for 2048 × 2048 (in 40-bit real values). Since
`the RAM must hold three such real-number images
`during the conjugate multiplication, this operation
`would require approximately 62 MB RAM and would
`take unreasonably long on currently available desktop
`systems.
On the 7100/66 AV PowerPC, matching a single pair of frames took approximately 25 s and 95 s based upon 256 × 256 and 512 × 512 pixel resolution, respectively.
`All timings were measured for a memory allocation of
`50 MB to NIH Image and virtual memory turned off.
`These numbers are not representative, however, since
`it takes three FHT transformations to correlate the first
`two frames, whereas every further frame takes only
`two.
Figure 1a shows a 20 µm coronal cryostat section of the tegmental region of the rat brain, stained for NADPH diaphorase activity and viewed with a 10× objective. The image is 2626 × 1116 pixels in size and was mosaicked from 14 video frames, 640 × 480 pixels each. It took 4 or 15 minutes to compute if matched filtering was based on 256 × 256 or 512 × 512 pixel resolution, respectively. Figure 1b shows the same specimen as Figure 1a, but viewed with a 20× objective. Acquisition was done at half digitizer resolution (320 × 240). The mosaic consists of 42 frames, resulting in a 2626 × 840 pixel final image, and mosaicking took 12 minutes based on 256 × 256 versions of the frames.
`frames. Notice that, although the pixel density of the
`final image is the same in Figures 1a and b, the
`combination of twice the microscope magnification and
`half the digitizer resolution results in much more
`detail, and that the time necessary to mosaic the
`images is approximately the same in both cases.
`Figure 2 shows another application for mosaics that
consist of a large number of camera fields. This 1169 × 1340 pixel image was used for semiautomatically counting the sensory cells in the hearing organ of an insect, Bullacris membracioides (1 µm Toluidine blue-stained Historesin section; the circular profiles in the central region are the attachment cells, which indicate the number of sensory cells in the unit; preparation generously provided by Michael Rieser, University of Graz). The mosaic consists of 34 frames, 320 × 240 pixels each.
`
The component probabilities P(e_x) and P(e_y) in the error model above were given by

P(e_x) =
  min(1, s_x)             if e_x = 0,
  2 s_x                   if 0 < e_x < (1 − s_x)/(2 s_x),
  1 − (2 e_x − 1) s_x     if (1 − s_x)/(2 s_x) ≤ e_x < (1 + s_x)/(2 s_x),
  0                       if e_x ≥ (1 + s_x)/(2 s_x),

and analogously

P(e_y) =
  min(1, s_y)             if e_y = 0,
  2 s_y                   if 0 < e_y < (1 − s_y)/(2 s_y),
  1 − (2 e_y − 1) s_y     if (1 − s_y)/(2 s_y) ≤ e_y < (1 + s_y)/(2 s_y),
  0                       if e_y ≥ (1 + s_y)/(2 s_y).
`
`
`338
`
`1a,
`
`OTT
`
`..
`
`1b
`, ',\ ",-,
`
`•
`,
`, ,
`
`" I
`
`, ,
`
`,
`
`/
`
`~
`
`; . "
`~~
`
`(
`"
`
`~ , O.Omm
`i. q,1
`~ oi
`• g- 0.3
`~ . , i- 0.4
`
`=-0.5
`
Fig. 1. Image mosaics of a rat brain cryostat section showing NADPH diaphorase activity in the tegmental region, printed at 300 pixels per inch (ppi). Black frames in the top right corners indicate the size of one single video frame. The insets bottom right show a 300% digital zoom to demonstrate the image quality. a: Assembled from 14 frames using 10× objective magnification; frames were digitized at 640 × 480 pixel resolution. b: Assembled from 42 frames using 20× objective magnification; frames were digitized at 320 × 240 pixel resolution.
`
`
`
`
Fig. 2. Image mosaic of a cross-section of the hearing organ of an insect, Bullacris membracioides, showing the large number of attachment cells (dark circular profiles; Toluidine blue). This mosaic was assembled from 34 frames using 20× objective magnification and printed at 300 ppi; frames were digitized at 320 × 240 pixel resolution.
`
Mosaicking was based on 256 × 256 versions and took
`14 minutes.
`This study showed that a low-cost desktop-based
`system can perform fully automated mosaicking of
`microscopical images under real-life conditions and in
`reasonable time. The latter was achieved only if matched
`filtering was restricted to successive pairs of frames.
`This implies that some care has to be taken during the
`acquisition of the frames. It also makes it difficult to
`bridge holes or otherwise unstructured regions within
`the specimen if these regions extend over more than
one video frame. We think, however, that the setup and software described in this report can be regarded as a low-cost alternative to high-end workstations that has stood the test of practical operation.
`ACKNOWLEDGEMENTS
`The author would like to thank Dr. Gerhard Skofitsch
`(University of Graz) for providing hardware facilities
`and Klaus Steiner (University of Graz) for intense and
`fruitful discussions on the error estimation. This work
`was supported by the Austrian National Bank grants
`#3378 and #5519 to Dr. Gerhard Skofitsch and by a
`
`grant of the Austrian Science Foundation (FWF) to
Prof. Heiner Römer.
`
`REFERENCES
`Barron, J.L., Fleet, D.J., and Beauchemin, S.S. (1994) Performance of
`optical flow techniques. International Journal of Computer Vision,
`12:43–77.
Chockkan, V., and Hawkes, R. (1994) Functional and antigenic maps in the rat cerebellum: Zebrin compartmentation and vibrissal receptive fields in lobule IXa. J. Comp. Neurol., 345:33–45.
`Mann, S., and Picard, R.W. (1995) Video orbits of the projective group:
`A new perspective on image mosaicing. M.I.T. Media Lab Perceptual
`Computing Section, Tech. Rep. 338.
`Predel, R., Agricola, H., Linde, D., Wollweber, L., Veenstra, J.A., and
`Penzlin, H. (1994) The insect peptide corazonin: Physiological and
`immunocytochemical studies in Blattariae. Zoology, 98:35–50.
`Rasband, W.S., and Bright, D.S. (1995) NIH Image: A public domain
`image processing program for the Macintosh. Microbeam Anal. Soc.
`J., 4:137–149.
`Reeves, A.A. (1990) Optimized Fast Hartley Transform for the MC68000
`with applications in image processing. MSc Thesis. Thayer School of
`Engineering, Hanover, New Hampshire.
`Tekalp, A., Ozkan, M., and Sezan, M. (1992) High-resolution image
`reconstruction from lower-resolution image sequences and space-
`varying image restoration. IEEE Proc. of the Int. Conf. on Acoust.,
`Speech and Sig. Proc., pp. 111–169.