`
`
`
`
`
`
`
`Microsoft TerraServer: A Spatial Data Warehouse
`
`
`
`
`
`
`
`
`
`
`
`
`Tom Barclay
`Jim Gray
`Don Slutz
`
`June 1999
`
`Technical Report
`MS-TR-99-29
`
`
`
`
`Microsoft Research
`Advanced Technology Division
`Microsoft Corporation
`One Microsoft Way
`Redmond, WA 98052
`
`1
`
`Microsoft Corp. Exhibit 1032
`
`
`
`
`
`2
`
`Microsoft Corp. Exhibit 1032
`
`Microsoft Corp. Exhibit 1032
`
`
`
`
`
`Microsoft TerraServer: A Spatial Data Warehouse
`Tom Barclay. Jim Gray, Don Slutz
`{TBarclay, Gray, Dslutz}@microsoft.com
`Microsoft Research, 301 Howard St., Suite 830,San Francisco, CA 94105
`http://research.microsoft.com/barc
`
`Our research group explores scaleable servers. We wanted first-
`hand experience building and operating a large Internet server
`with a large database and heavy web traffic. To generate the
`traffic we needed to build an application that would be
`interesting to millions of web users. To have a huge database,
`we needed a huge data source: trillions of bytes that are
`relatively inexpensive to acquire and process.
`Based on our exposure to the EOS/DIS project [Davis94], we
`settled on building a web site that serves aerial, satellite, and
`topographic imagery. We picked this application for three
`reasons:
`1. The web is inherently a graphical environment, and these
`images of neighborhoods are recognizable and interesting
`throughout the world. We believed this application would
`generate the billions of web hits needed to test our
`scaleabilty ideas.
`2. The data was available. The USGS was cooperative, and
`since the cold war had ended, other agencies were more
`able to share satellite image data. The thaw relaxed
`regulations that had previously limited the access to high-
`resolution imagery on a global basis.
`3. The solution as we defined it – a wide-area, client/server
`imagery database application stored in a commercially
`available SQL database system – had not been attempted
`before. Indeed, many people felt it was impossible without
`using an object-oriented or object-relational system.
`This paper describes the application design, database design,
`hardware architecture, and operational experience of
`the
`TerraServer. The TerraServer has been operating for a year
`now. We are just deploying the third redesign of the database,
`user interface, and online image loading system.
`
`
`Abstract
`Microsoft® TerraServer stores aerial, satellite, and topographic
`images of the earth in a SQL database available via the Internet.
`It is the world’s largest online atlas, combining five terabytes of
`image data from the United States Geological Survey (USGS)
`and SPIN-2. Internet browsers provide intuitive spatial and text
`interfaces to the data. Users need no special hardware, software,
`or knowledge to locate and browse imagery. This paper
`describes how terabytes of “Internet unfriendly” geo-spatial
`images were scrubbed and edited into hundreds of millions of
`“Internet friendly” image tiles and loaded into a SQL data
`warehouse. Microsoft TerraServer demonstrates that general-
`purpose relational database technology can manage large scale
`image repositories, and shows that web browsers can be a good
`geospatial image presentation system.
`1. Overview
`The TerraServer is the world's largest public repository of high-
`resolution aerial, satellite, and topographic data. It is designed to
`be accessed by thousands of simultaneous users using Internet
`protocols via standard web browsers.
`The TerraServer is a multi-media data warehouse. It differs
`from a traditional data warehouse in several ways: (1) it is
`accessed by millions of users, (2) the users extract relatively few
`records (thousands) in a particular session and, (3) the records
`are relatively large (10 kilobytes). By contrast, classic data
`warehouses are (1) accessed by a few hundred users via
`proprietary interfaces, (2) queries examine millions of records,
`to discover trends or anomalies, (3) the records themselves are
`generally less than a kilobyte. In addition, classic data
`warehouse queries may run for days before delivering results.
`Initial results typically cause users to modify and re-run queries
`to further refine results.
`One thing the TerraServer has in common with classic data
`warehouses is that both manage huge databases: several
`terabytes of data. Terraserver’s topographic maps cover all of
`the United States at 2 meter resolution 10 million square
`kilometers), the aerial photos cover 30% of the United States
`today (3 million square kilometers at one-meter resolution, and
`1% of the urban areas outside the United States (1 million
`square kilometers) at 1.56 meter resolution.
`This report describes the design of the TerraServer and its
`operation over the last year. It also summarizes what we have
`learned from building and operating the TerraServer.
`
`
`
`3
`
`Microsoft Corp. Exhibit 1032
`
`
`
`2. Application Design
`Microsoft TerraServer is accessed via the Internet through any
`graphical web browser. Users can zoom and pan across a mosaic
`of tiles within a TerraServer scene. The user interface is
`designed to function adequately over low-speed (28.8kbps)
`connections. Any modern PC, MAC, or UNIX workstation can
`access the TerraServer in this way. If you have never used it,
`look at the TerraServer web site at
`
`http://terraserver.microsoft.com/.
`Imagery is categorized into “themes” by data source, projection
`system, and image type. Currently, there are three data themes:
`
`USGS Digital Ortho-
`Quadrangles (DOQ) are
`gray-scale,
`1-meter
`resolution
`aerial photos.
`Cars can be seen, but 1-
`meter
`resolution
`is
`too
`coarse
`to
`show people.
`Imagery is orthorectified to
`1-meter
`square
`pixels.
`Approximately 50% of the
`U.S. has been digitized.
`The entire conterminous
`U.S.
`is expected
`to be
`completed by the end of
`2001.
`
`Raster
`Digital
`USGS
`Graphics (DRG) are 13-
`color digitized
`topographic
`maps, with scales varying
`from 2.4 meter resolution to
`25.6 meter resolution. DRGs
`are the digitized versions of
`the
`popular
`USGS
`topographic maps.
`The
`complete
`set
`of USGS
`topographic maps have been
`scanned
`including Alaska,
`Hawaii, and several territories
`such as Guam and Puerto
`Rico.
`
`Figure 3. a SPIN-2: 1.56-
`meter image of Atlanta's
`Fulton County Stadium.
`
`Figure 1. A USGS DOQ
`Image of 3Com Park near
`San Francisco
`
`Figure 2. A USGS DRG 2-
`meter resolution image
`
`Aerial Images SPIN-2™ are
`grayscale
`1.56
`meter
`resolution Russian
`satellite
`images. The images are re-
`sampled to 1-meter resolution.
`Microsoft
`TerraServer
`contains SPIN-2 images of
`Western Europe, the United
`States, and
`the Far East.
`Unfortunately, there is little
`coverage of Canada, South
`America,
`Africa,
`and
`Southeast Asia.
`
`
`
`4
`
`2.1 Projection Systems and Scenes
`The earth is a bumpy ellipsoid. Maps and computer monitors
`are flat. It is impossible to accurately present a spherical object
`on a flat surface.
`issue by developing
`this
`Cartographers have addressed
`projections of the geoid onto flat surfaces [Robinson95]. There
`are many projection systems, each suited to present certain
`regions or properties. Multiple images in a projection system can
`often be joined together to form a seamless mosaic within
`certain boundary conditions. These mosaics either have extreme
`distortion as they scale out, or they introduce seams.
`DOQ and DRG data are projected by the USGS into Universal
`Transverse Mercator (UTM) projection using
`the North
`American Datum (NAD) ellipsoid created in 1983 [Snyder89].
`UTM is a projection system that divides the earth into 60 wedge
`shaped zones numbered 1 thru 60 beginning at the International
`Date Line. Each zone is 6 degrees wide and goes from the
`equator to the poles. UTM grid coordinates are specified as zone
`number, then meters from the equator and from the zone
`meridian1.
`The conterminous United States is divided into 10 zones (see
`Figure 4). Each of these UTM zones is a scene. The
`TerraServer mosaics each scene, but two adjacent scenes are not
`mosaiced together. Users can pan and zoom within a scene, and
`can jump from one scene to another.
`
`
`Figure 4: The ten UTM zones in the continental United
`States.
`
`The Russian SPIN-2 imagery is digitized from Russian satellite
`photographs. The Russian satellite captures 160km wide by
`40km high areas in a single image. The satellite takes one image
`and then begins the adjacent image, overlapping the last image.
`The overlap is variable, and when digitized does not line up on a
`pixel boundary.
`To create a seamless mosaic of SPIN-2 imagery, all SPIN-2
`imagery would have to be orthorectified. This requires precise
`geo-location of each image, which was not possible due to
`security concerns. Without rectification, if tiles extracted from
`separate SPIN-2 satellite images are mosaiced, the tile edges are
`misaligned. Roads, rivers, and other geographic features do not
`line up. While GIS experts may tolerate this, it is disorienting
`and unacceptable to novice users.
`
`
`1 Actually, UTM grid units can be in inches, feet, meters, or kilometers.
`The USGS chose meters for most of their assets in the UTM projection.
`UTM is not used above 80N or 70S [Robinson95].
`
`Microsoft Corp. Exhibit 1032
`
`
`
`All the image tiles and their metadata are stored in an SQL
`database. A separate table is maintained for each (theme,
`resolution) pair so that tiles are clustered together for better
`locality. USGS DOQs have resolutions from 1-meter through
`64-meter. USGS DRG data supports 2-meter through 128-meter
`resolution. SPIN supports resolutions from 1-meter to 64-meter.
`Each theme table has the same five-part primary key:
`
`SceneID –individual scene identifier
`
`X – tile’s relative position on the X-axis
`
`Y – tile’s relative position on the Y-axis
` DisplayStatus – Controls display of an image tile
` OrigMetaTag – image the tile was extracted from
`There are 28 other fields
`that describe
`the geo-spatial
`coordinates for the image and other properties. One field is a
`large “blob” type that contains the compressed image.
`These tile blobs are chosen to be about ten kilobytes so that they
`can be quickly downloaded via a standard modem (within three
`seconds via a 28.8 modem).
`
`2.4. Gazetteer Database Schema
`
`
`
`200x200 m tile200x200 m tile
`
`
`
`.4 x. 4 km.4 x. 4 km
`
`
`
`.8 x .8 km.8 x .8 km
`
`
`
`1.6x 1.6 km1.6x 1.6 km
`
`
`Figure 5: The image database is organized as a pyramid of
`200x200 pixel tiles. Coarser images are sub-sampled from
`fine-resolution images.
`The Gazetteer lets users find images by name. It contains the
`names for about 1.5 million places, with many alternate
`spellings. It is a simplified version of the Gazetteer found in the
`Encarta Virtual Globe™ and Microsoft Streets™ products.
`The Gazetteer Schema has a snowflake structure. Place is the
`center table. It contains the formal name for a unique place on
`earth and maps the uniquely named location to the TerraServer
`Grid System. The AltPlace table contains all the synonyms of a
`unique place. The State and Country parent tables identify a
`place’s state/province and country. The AltState and AltCountry
`tables contain the state/province and country synonyms.
`
`Consequently, the TerraServer treats each 160km x 40km SPIN2
`image as a separate scene. These scenes are not mosaiced
`together. Users can pan and zoom within a scene, and can jump
`from one scene to another.
`
`2.2. TerraServer Grid System
`Users can zoom and pan across a mosaic of tiles within a
`TerraServer scene. The tiles are organized in the database by
`theme, resolution, scene, and location within a scene.
`TerraServer is designed to support a fixed number of resolutions
`in powers of 2 from 1/1024 meters per pixel (scale 0) through
`4096 meter (scale 22). The scale is related to resolution in
`meters per pixel by
`
`Scale = log2(resolution) + 10
`The highest resolution images currently in the database are one
`meter per pixel, which is scale 10. Coarser resolutions are
`derived by sub-sampling fine-resolution images.
`For UTM projection data-sets, the SceneID is the UTM zone
`assigned to the original image a tile’s pixels were extracted
`from. For SPIN2 data-sets, a unique SceneID is assigned by
`TerraServer for as each scene is loaded.
`Each TerraServer scene is planar. A tile can be identified by its
`position in the scene. The tile loading program assigns a relative
`X and Y tile identifier to each tile as it is loaded.
`For UTM projected data, the X and Y tile address is the UTM
`coordinate of the top-left pixel in the tile divided by the tile
`image size in UTM units in meters. The following are the
`formulas:
`X = TopLleftUTM_X / (TilePixWidth • Resolution)
`Y = TopLeftUTM_Y / (TilePixHeight • Resolution)
`For SPIN2 scenes, the X and Y tile addresses are relative to the
`upper left corner of the scene.
`The six fields – Resolution, Theme, SceneID, X, and, Y - form
`the unique key by which any TerraServer image tile can be
`directly addressed. Each TerraServer web page contains image
`tiles from a single Theme, Scale, and SceneID combination. For
`example, our building in USGS DOQ theme (T=1), has scene
`UTM zone 10 (S=10), at scale 1 meter (Z=10) with X=2766 and
`Y=20913. The URL is:
`
`http://terraserverv.microsoft.com/tile.asp?S=10&T=1&Z=10
`&X=2766&Y=20913.
`The TerraServer search system performs the conversion from
`geographic coordinate systems to the TerraServer coordinate
`system. The TerraServer image display system uses TerraServer
`grid system coordinates to pan and zoom between tiles and
`resolutions of the same theme and scene.
`
`2.3. Imagery Database Schema
`Each theme has an OriginalMeta table. This table has a row for
`each image that is tiled and loaded into the TerraServer
`database. The OrigMetaTag field is the primary key. The meta-
`fields vary widely from theme to theme. Some of the meta
`fields are displayed by the Image Info Active Server Page (for
`example,
`see
`http://terraserver.microsoft.com/GetOrigMeta.asp?OrigMetaId=
`104578&SrcId=1&Width=225&Height=150&ImgSize=0&DSiz
`e=0)
`
`
`
`5
`
`Microsoft Corp. Exhibit 1032
`
`
`
`Lookup by place name is surprisingly common (40%). So the
`user interface was modified to make it even easier. The
`TerraServer home page has a simple name lookup field, and a
`button that takes the user to an “advanced” name lookup web
`page. The find a place input field allows the user to enter a
`subset of place name, state name, and country name. The
`supporting database stored procedure builds a SQL cursor that
`searches for the name by performing joins on the appropriate
`tables, depending on which fields the user specified. Name
`searches are performed on the “Alt” tables which have
`synonyms and abbreviations for places (USA for example).
`Formal names matching the search criteria are returned from the
`Place, State, and Country tables.
`The ImageSearch table forms the association between a named
`place and visible images. The ImageSearch table identifies the
`Theme, SceneID, Scale, X, Y, and ImageDate of visible image
`tiles that cover the kilometer-square cell at the center of the
`named place.
` The load program inserts rows into the
`ImageSearch table when the sub-sampling program completes
`filling in the image pyramid for a certain area. The ImageSearch
`table serves as a one-level quad-tree index of the image data
`[Samet90].
`The image display Active Server Page scripts use an additional
`table, the Pyramid table, to display the city closest to the center
`tile on an image display web page. This two-level quad-tree is
`used to find population-weighted nearest neighbors of a given
`latitude and longitude. The SQL stored procedure scans a
`rectangle of the quad-tree to determine the closest city. The
`quad tree is implemented atop a B-tree by giving each
`quadrangle a name which is a prefix of the key for records in
`that quadrangle.
`In total, the Gazetteer contains about 4 million rows and
`consumes 3.3 GB of space. Our first design used a fine-
`granularity (quarter kilometer) quad-tree and so used a hundred
`times more space 400GB). That design needed no computation
`during lookup. The current design must examine 50 records on
`average and evaluate some spherical trigonometry on the
`coordinates for each record. The new design uses more
`computation, but it can examine a record in 30 microseconds of
`processor time, so it is a good tradeoff.
`
`Alt
`Country Name
`
`Alt
`State Name
`
`Imagery
`
`Country Name
`
`State Name
`
`Source Meta
`
`Load Job
`
`Alt
`Place Name
`
`Place Name
`
`Image Search
`
`Image
`
`Scale Job
`
`Load
`Management
`
`Place Type
`
`Pyramid
`
`Image Type
`
`External Link
`
`Gazetteer
`External Group
`Figure 6. The gazetteer forms a star schmea used to locate places
`by name. The Pyramid is a quad-tree used to located the nearest
`city. Most of the data is stored in the Image database which is
`indexed by theme, resoulution, sceneID, scale, X, and Y.
`Source Meta represents the OriginalMeta table. The job tables
`track the data loading process and the data sub-sampling to build
`the image pyramid.
`
`
`
`
`6
`
`3. TerraServer User Interface
`The TerraServer user interface is designed to be useable by a
`sixth grade Geography student. We wanted users to naturally
`understand how to find and view images with a minimum of
`instruction. As with video games – practically everything in the
`image viewing area is click-able.
`The image location methods all display the result as a web page
`formatted with a table of tiles at the middle of the image
`pyramid. The user can zoom-in to move down the image
`pyramid or pan in any direction – Northwest, North, Northeast,
`West, East, Southwest, South, and Southeast. Users can also
`zoom out from lower levels in the image pyramid.
`The image display web page is a simple HTML document
`containing a table of image source tags identifying the specific
`image tiles to form the picture. Anchor tags allow the user to
`pan and zoom through the tiles of a single theme at a time. Any
`web browser that supports HTML tables and can display Jpeg
`and Gif images can host the TerraServer user interface. Full
`resolution SPIN-2 imagery requires the web browser to support
`Java applets.
`Automated processes can also access the imagery via the HTTP
`protocol. This enables TerraServer imagery to be integrated into
`third party applications. The USGS and SPIN-2 data providers
`use this feature to sell and distribute their imagery on-line.
`TerraServer routes image “download” requests to data provider
`web sites. The data provider commerce applications fetch
`Microsoft TerraServer image tiles programmatically during the
`image purchase and image delivery process.
`
`3.1. Navigation
`Users have several methods to navigate to an initial spot on the
`globe. From there they can pan and zoom within a scene, or
`jump to a new scene.
`
`
`H o w Im a g e P a g e s a re F o u n dH o w Im a g e P a g e s a re F o u n d
`
`E xp e d ia E xp e d ia
`
`M a pM a p
`
`2 2 %2 2 %
`
`F a m o u s F a m o u s
`
`P la ce sP la ce s
`
`1 8 %1 8 %
`
`
`N a m e N a m e
`
`S e a rchS e a rch
`
`4 0 %4 0 %
`
`
`G e o G e o
`
`C o o rd in a teC o o rd in a te
`
`1 %1 %
`
`
`C o v e ra g e C o v e ra g e
`
`M a pM a p
`
`1 9 %1 9 %
`
`Figure 7: Name search is the most popular way of finding a
`place.
`
`Name Lookup: Clients knowing the place name (or part of the
`name) can navigate textually by presenting a name to the
`Gazetteer. The Gazetteer database has the names and locations
`of 1.5 million places in the world. For example, Moscow finds
`28 cities, while North Pole finds 5 cities, a mining district, a
`lake, and a point-of-interest. There are 378 entries for San
`Francisco in the Gazetteer. All entries with matching images are
`shown in a list along with the data provider name and the image
`date. The user selects an image from the list.
`
`Microsoft Corp. Exhibit 1032
`
`
`
`
`
`
`
`
`
`
`
`The View Map button switches the display to a Microsoft
`Expedia street map centered over the image area.
`The Download button links to a data provider web site
`where the user can purchase a digital or print copy of the
`image.
`The Image Info button links to a page describing the
`attributes of the source image the tiles on the web page
`were extracted from.
`The scale of the image, e.g. an English Mile/Yard scale and
`a Metric Kilometer/Meter scale.
`
`
`The user interface has evolved over the last 2 years, and
`continues to evolve. It is likely that it will have changed by the
`time you read this.
`
`Relative location from a City
`Photographed on DD MMM YYYY
`Other Image Dates Available: Spin-2 DD MMM YYYY Spin-2 DD MMM YYYY
`
`Zoom Out
`
`64m
`
`32m
`
`16m
`
`8m
`
`4m
`
`2m
`
`1m
`
`Zoom In
`
`Image copyright and credit line
`Microsoft Copyright Notice
`
`
`Figure 8 The TerraServer image frame. The image display
`area contains one or more tiles (default is 2x3).
`
`Expedia maps: To help orient users, a political and street map
`can be displayed in place of the image. Users can navigate by
`clicking on the map to center and zoom-in or pan in any
`direction. Once the user centers the map over their point of
`interest, he can switch back to the imagery.
`Famous Place List: The Famous Place list is a set of direct
`links to familiar places (like the Pyramids). Users suggest entries
`for the Famous Places List. TerraServer administrators review
`the suggested places and add entries to the Famous Place table
`in the TerraServer database. Thumbnail images of some of them
`are added to the TerraServer home page for easy access. Clients
`interested in viewing well known imagery of major cities,
`natural wonders, and famous sites can browse a pre-selected list
`of links to them.
`Coverage map: The USGS provided us with a set of base maps
`in three resolutions – 1 pixel per degree, 8 pixels per degree, and
`48 pixels per degree. The TerraServer database load programs
`color in areas of these three coverage maps as data is loaded into
`the system. The result is a shaded map depicting where there is
`image coverage. Three separate coverage maps are produced: all
`data providers, SPIN-2 only, and USGS only. The coverage map
`system is useful to those seeking an image “near New York
`City”, assuming users have an idea where New York City is
`without needing to name it.
`Navigation via latitude and longitude: Some users have a GPS
`or other source of precise geographic reference information.
`The TerraServer supports direct entry of latitude and longitude
`values in this case.
`As Figure 7 shows, most lookups are by place name. Map-
`based navigation is the second-most popular method. The
`famous-place list and coverage map are tied for third place.
`
`3.2 TerraServer Image Page
`TerraServer search operations identify the center tile of the
`image display page. The image display system builds a
`complete web page of image source and anchor tags around this
`tile. In addition to the mosaic of tiles, the page includes
`information aids that explain what the user is viewing clickable
`buttons for navigation (see Figure 8):
`
`The relative distance from nearest city, e.g. 20 Km SW of
`Raleigh, North Carolina, United States.
`
`The image date.
` A list of other images overlapping the current image.
`
`The logo of the data source, e.g. USGS or SPIN-2.
`
`The resolution, e.g. 64m, 32m, 16m, 8m, 4m, 2m, 1m etc.
`
`Zoom In/Out buttons to move in or out one resolution level
`from the current level. Clicking directly on an image zooms
`in one resolution level and pans the clicked image to the
`center of the image display area.
`Pan buttons (bright green triangles) around the border of
`the image window. The button is grayed out if imagery is
`not available in a direction.
`Three buttons control the size of the image display area –
`Small, Medium, and Large. Small is designed to fit an 800
`x 600 resolution monitor. Medium is designed to fit a 1024
`x 768 monitor. And Large is designed to fix 1280 x 1024
`and larger resolution monitors.
`
`
`
`
`
`
`
`7
`
`Microsoft Corp. Exhibit 1032
`
`
`
`4. System Architecture
`
`4.1. Three-Tier Software Architecture
`The Microsoft TerraServer has a 3-tier architecture. As depicted
`in Figure 8, the darker shaded boxes identify standard “off-the-
`shelf” software. The lighter shaded boxes identify application
`software that implements the TerraServer application logic.
`Tier 1: The Client is a graphical web browser or other
`hardware/software system
`that supports HTTP 1.1
`protocols and HTML 3.2 document structure. Microsoft
`TerraServer is built and tested with Netscape Navigator
`V3.0 & V4.0 and Internet Explorer V3.0, V4.0, & V5.0 on
`Windows, MacOS, and UNIX.
`Tier 2: The Application Logic is a web server application that
`responds to HTTP requests submitted by clients by
`interacting with the Tier 3 database system and applying
`logic to the results returned.
`
`
`
`
`
`Tier 1Tier 1Tier 1
`
`
`
`
`
`Web BrowserWeb BrowserWeb Browser
`
`
`
`Windows orWindows orWindows or
`
`
`Macintosh orMacintosh orMacintosh or
`
`
`Unix or...Unix or...Unix or...
`
`HTML/HTTPHTML/HTTPHTML/HTTP
`
`
`
`
`InternetInternetInternet
`
`
`
`
`
`ADO/ODBCADO/ODBCADO/ODBC
`
`
`
`Tier 2Tier 2Tier 2
`
`
`TerraServerTerraServerTerraServer
`
`
`Active ServerActive ServerActive Server
`
`
`PagesPagesPages
`
`
`Internet InformationInternet InformationInternet Information
`
`
`Server 4. Server 4. Server 4.
`
`
`Web ServerWeb ServerWeb Server
`
`
`Windows NT Windows NT Windows NT
`
`
`Server 4.0Server 4.0Server 4.0
`
`
`
`Tier 3Tier 3Tier 3
`
`
`TerraServerTerraServerTerraServer
`
`
`SQL Stored SQL Stored SQL Stored
`
`
`ProceduresProceduresProcedures
`
`
`SQL Server 7.0SQL Server 7.0SQL Server 7.0
`
`
`RelationalRelationalRelational
`
`
`Database SystemDatabase SystemDatabase System
`
`
`Windows NT Windows NT Windows NT
`
`
`Server 4.0Server 4.0Server 4.0
`
`
`
`Figure 9: The TerraServer has a 3-tier architecture.
`
`Tier 3: The Database System is a SQL Server 7.0 Relational
`DBMS containing all image and meta-data required by the
`Application Logic tier.
`All end user access to TerraServer image and metadata occurs
`through Tier 2 stored procedures (Active Server Pages2) written
`in Visual Basic Script (VBScript). Thirty two Active Server
`Page (ASP) scripts implement the entire TerraServer web
`application. These ASP scripts dynamically construct HTML
`documents from information returned by SQL Server stored
`procedures. The scripts invoke SQL Server stored procedures
`via an Active Data Object (ADO) interface that is layered on top
`of the Open Data Base Connection (ODBC) protocol.
`ASP scripts return complete HTML documents to web browser
`clients. Each web browser decodes the HTML and formats the
`user’s browser window. The web browser opens the image tile
`URL that causes the web server to invoke the image fetch ASP
`script.
`All meta and image data is stored in a single Microsoft SQL
`Server 7.0 database. Multiple database servers can be configured
`into a single Microsoft TerraServer web site. The only rule is
`that all data for a theme must be contained entirely in one
`database. A theme’s data can be replicated to one or more
`backup servers, but with our current stored procedures, a
`theme’s inventory cannot be split across two database servers.
`
`
`2 Active Server Page is technology similar to CGI scripts in other web
`server
`implementations. Active Server Page scripts have better
`performance characteristics than CGI.
`
`
`8
`
`the
`The TerraServer SQL Stored Procedures perform
`application’s data access logic so that each application function
`has only one round-trip from the web page to the database
`server. There are twelve stored procedures to support online
`access and eight stored procedures to support the database
`loading process.
`
`4.2. Database Architecture
`The database architecture was chosen to demonstrate the
`scalability and usability of SQL Server—everything was done in
`the most straightforward way, with no special tricks A single
`SQL server database was created on one file group consisting of
`many NTFS files. Each file resided on one of the four logical
`volumes and was 20GB which is a convenient backup/recovery
`unit. Initially, 53 files were created to achieve the 1TB database
`goal. Additional files are added as new imagery is loaded. Plans
`are to grow the database to over 2.2TB. The initial files were
`placed on two 595GB NT stripe-set volumes and the files added
`later were placed on two other similar volumes. SQL Server
`makes all allocation and placement decisions within and among
`the files.
`The TerraServer database was created using default settings with
`two exceptions. A bulk copy option was set to improve load
`times by reducing logging. Also, a truncate log on checkpoint
`option was set. These options preclude media recovery using the
`log. Instead, Terraserver would restore from an online database
`backup and reload any data that had been added since that
`backup.
`The physical SQL table design closely follows the logical
`schema of Figure 6. There is a separate set of Image tables for
`each theme. The set of tables is almost identical for each theme
`and consists of an OriginalMeta table with attributes of the
`theme for each original image and a set of Image tables, one for
`each scale. An Image table contains a row for each image tile of
`that scale and the primary key is the Scene identifier, X-axis
`coordinate, Y-axis coordinate, display status, and OriginalMeta
`primary key value. The tile image, about 10KB average size, is
`placed directly in a column of type image, there is no image data
`outside the database. The tile image size was set solely by the
`application needs and just happens to be a little larger than the
`SQL Server page size of 8KB. SQL Server automatically
`manages the mapping of image column data onto pages. The
`image column comprises almost all the bytes in the row and the
`Image tables comprise almost the entire database contents.
`All tables are clustered on their primary key and a few
`secondary indexes, mostly in the Gazetteer, were added to
`support searching for different name combinations and for on-
`line loading. Retrieving one image tile requires the simplest of
`SQL statements:
`
` Select * from Image where PrimaryKey=’value’
`One set of Gazetteer tables and one ImageSearch table serve to
`locate images by name in all themes. The Loadjob and Scalejob
`tables are used to manage the online loading of images. They
`hold the state of load jobs and are used for monitoring and
`restart.
`
`
`Microsoft Corp. Exhibit 1032
`
`
`
`4.3. Commerce Server
`The data source providers – USGS and SPIN-2, own the
`imagery stored within the Microsoft TerraServer database. The
`USGS and SPIN-2 organizations, not Microsoft, offer a private
`use license of their imagery for a nominal fee. Users purchasing
`USGS data receive one or more “Digital Ortho Quarter
`Quadrangle” data files on a single CDROM through the US
`Mail. Users purchasing SPIN-2 data receive a digital file
`through the Internet and can optionally purchase a photographic
`print from Kodak.3 The USGS and SPIN-2 each host their own
`electronic commerce web sites.
`The USGS commerce web site is located the USGS EROS Data
`Center in Sioux Falls, South Dakota. The web site runs
`Microsoft Site Server Enterprise Edition 3.0 on Intel based
`equipment. Custom components connect the MS Commerce
`application with internal USGS accounting and CD fulfillment
`systems. USGS EROS Data Center personnel developed the
`USGS commerce application and
`the components
`that
`interconnect the commerce server with internal applications.
`The SPIN-2 commerce web site is located at Aerial Images’
`headquarters in Raleigh, North Carolina. The web site runs
`Microsoft Site Server V3.0 on a two-node Compaq Alpha-
`Cluster. Microsoft developed the commerce application and the
`interconnection to Kodak for the rights to display SPIN-2 data
`on the Internet.
`The TerraServer web site hosts a store-front to the USGS
`commerce web site. The first web page explains the difference
`between an “Internet ready image file” like a Jpeg and the full-
`resolution USGS DOQ file. The user is offered a free, Internet-
`ready, Jpeg file in case the user does not have the software or
`interest in purchasing the more technically challenging U