`US 7,296,088 B1
`(10) Patent No.:
`(12)
`Padmanabhanetal.
`(45) Date of Patent:
`Nov. 13, 2007
`
`
`US007296088B1
`
`(54) SYSTEM AND METHOD FOR
`DETERMINING THE GEOGRAPHIC
`LOCATION OF INTERNET HOSTS
`
`(75)
`
`Inventors: Venkata N. Padmanabhan, Bellevue,
`WA(US); Lakshminarayanan
`Subramanian, EF] Cerrito, CA (US)
`
`(73) Assignee: Microsoft Corporation, Redmond, WA
`(US)
`
`(*) Notice:
`
`Subject to any disclaimer, the term ofthis
`patent is extended or adjusted under 35
`US.C. 154(b) by 1127 days.
`(21) Appl. No.: 09/849,662
`.
`Filed:
`
`(22)
`
`May4, 2001
`
`Related U.S. Application Data
`
`(60) Provisional application No. 60/249,487,filed on Nov.
`17, 2000.
`
`(51)
`
`Int. Cl.
`(2006.01)
`GO6F 15/173
`(2006.01)
`GO6F 15/16
`(52) US. Ch. veceecceceeeenes
`709/238; 709/239; 709/241
`(58) Field of Classification Search ......0.00.0..0.... None
`See application file for complete search history.
`
`(56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`1/1990 Gray et al. we 701/219
`4,891,761 A *
`5/1996 Maine etal... 342/457
`5,515,062 A *
`1/2000 Link et al.
`..........
`709/233
`6,012,096 A *
`3/2002 Intriligator et al.
`.
`702/3
`6,356,842 B1*
`1/2004 Andersonet al.
`...
`. 709/225
`6,684,250 B2*
`
`8/2004 Augart oe 370/351
`6,778,524 B1*
`6,804,624 B2* 10/2004 Silverman ............0.... 702/159
`6,922,417 B2*
`7/2005 Vanlint ........... ee 370/503
`
`
`
`
`6,981,055 B1* 12/2005 Ahuja et al. we. 709/238
`2002/0073231 Al*
`6/2002 Quarterman etal. ........ 709/238
`2002/0078233 Al*
`6/2002 Biliris et al. 0... 709/238
`2003/0095069 Al*
`5/2003 Stilp oo. eee 342/465
`
`OLHER PUBLICATIONS
`U.S. Appl. No. 60/241,776.*
`US. Appl. No. 60/194,761.*
`Imielinski,T., et al.; “DataSpace: Querying and Monitoring Deeply
`Networked Collections in Physical Space”, IEEE Personal Com-
`munications, vol. 7, No. 5, Oct. 2000, p. 4-9.
`David Moore,et al.; “Where in the World is netgeo.caida.org?”’,
`http://www.caida.org/outreach/papers/inet_netgeo/, Viewed Jan. 2,
`2001, p. 1-14.
`Quova: Our Service, http:www.quova.com/service.html, Viewed
`Jan. 2, 2001, p. 1-3.
`VisualRoute, Datametrics
`System Corporation,
`http://www.
`visualroute.com, Viewed Jan. 2, 2001, p. 1-2.
`VisualRoute, Technical Notes-Version 5.0b, http://www.visualroute.
`com/technotes.html, Viewed Jan. 2, 2001, p. 1-5.
`
`(Continued)
`
`Primary Examiner—Nathan J. Flynn
`Assistant Examiner—AshokPatel
`(74) Attorney, Agent, or Firm—Amin, Turocy & Calvin,
`LLP
`
`(57)
`
`ABSTRACT
`.
`.
`.
`A system and methodologies are disclosed for determining
`the geographic location of an Internet host. A first method
`infers host location based on the DNS namesofthe host of
`interest or other nearby network nodes. A second method
`employs network delay measurements from geographically
`distributed locations to triangulate the coordinates of the
`host. A third method couples partial host-to-location map-
`ping information obtained from one or more sources with
`BGPorother routing information in order to infer location
`of the host of interest.
`
`41 Claims, 13 Drawing Sheets
`
`
`
`
`
`80 120\140100
`0
`20
`40
`
`IP address sequence number (1000s)
`
`Google v. Mullen
`
`Google Exhibit 1064
`Google Exhibit 1064
`Google v. Mullen
`
`a 230
`
`
`
`— Error Distance —— Dispersion
`
`
`
`
`
`
`
`
`
`(kilometers)
` Distance
`
`
`
`
`
`
`
`US 7,296,088 B1
`Page 2
`
`OTHER PUBLICATIONS
`
`AllWhois, http://www.allwhois.com/cgi-bin/allwhois4.cgi, Viewed.
`Nov. 14, 2000, p. 1-2.
`Akamai, Inc.; http://www.akamai.com, Viewed Nov. 14, 2000,p. 1.
`Akamai Advantage,
`Inc.; http://www.akamai.com/html/aa/index.
`html, Viewed Nov. 14, 2000, p. 1.
`Akamai Network, Inc.; http://www.akamia.com/html/aa/akne html,
`Viewed Nov. 14, 2000, p. 1-2.
`The CAIDA Web Site; httl://www.caida.org, Viewed Nov. 14, 2000,
`p. Ll.
`Topology-CAIDA: Analysis: topology; http://www.caida.org/analy-
`sis/topology, Viewed Nov. 14, 2000, p. 1-2.
`Tools-CAIDA: TOOLS; http://www.caida.org/tools, Viewed Nov.
`14, 2000, p. 1.
`Education and Outreach-CAIDA:OUTREACH, hittp://www.caida.
`org/outreach, Viewed. Nov. 14, 2000, p. 1-2.
`NetBoy Suite: Complete Netboy Package of Appliations; http://
`www.ndgsoftware.com/~ndgabout/page3.html, Viewed Nov.
`14,
`2000, p. 1-3.
`NeoWorx Inc., http://www.neoworx.com, Viewed Nov. 14, 2000, p.
`1-3.
`
`NeoTrace: Graphical Trace Route Program, http://www.neoworx.
`com/products/neotrace, Viewed Nov. 14, 2000, p. 1-2.
`“e-Business Without Limits”;Digital
`Island,
`Inc.; http://www.
`digitalisland.net/services, Viewed. Nov. 14, 2000, p. 1-2.
`VisualRoute: Mapping the Internet,
`littp://www.visualroute.com,
`Viewed Nov. 14, 2000, p. 1-2.
`U.S. Census Bureau: United States Department of Commerce;
`http://www.census.gov, Viewed Nov. 14, 2000, p. 1-2.
`Sutton, J.C., et al., “Dynamic Location: An Iconic Model to Syn-
`chronize Temporal and Spatial Transportation Data”, Transporta-
`tion Research Part C (Emerging Technologies), vol. 8C, No. 1-6,
`Feb.-Dec. 2000, p. 37-52.
`“A Geographic XML-based Format for the Mobile Environment”;
`Artem Garmash; Proceedings of the 34th Hawaii International
`Conference on System Sciences—2001; pp. 1-9.
`V.N. Padmanabhan,et al., Determining the Geographic Location of
`Internet Hosts, Nov. 2000, pp. 1-15, MBR—TR—2000-110.
`
`* cited by examiner
`
`
`
`U.S. Patent
`
`US 7,296,088 B1
`
`Nov. 13, 2007
`
`Sheet 1 of 13
`
`
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 2 of 13
`
`US 7,296,088 B1
`
`START
`ya
`
`52
`
`OBTAIN ROUTE INFORMATION
`RELATING TO ONE OR MORE
`NETWORK PATH BETWEEN A
`HOST IP ADDRESS AND ONE
`OR MORE COMPUTER
`SYSTEMS
`
`OBTAIN ROUTER
`LABEL ASSOCIATED
`WITH HOST IP
`ADDRESS
`
`
`
`
`
`
`
`
`OBTAIN ROUTER LABEL
`ASSOCIATED WITH
`NEXT CLOSEST
`NETWORK NODE
`
`60
`
`ESTIMATE
`LOCATION OF
`INTERNET HOST
`ACCORDING TO
`THE LOCATION
`
`64
`
`66
`
`DONE
`
`FIG. 2
`
`
`
`INFORMATION
`
`
`
`LOCATION
`
`CODE IN ROUTER
`
`LABEL?
`
`58
`
`YES
`
`OBTAIN LOCATION
`INFORMATION FROM
`A DATA STORE
`CORRESPONDING
`TO THE LOCATION
`CODE
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 3 of 13
`
`US 7,296,088 B1
`
`4000 km
`
`Ga) Seattle, WA
`(2) Berkeley, CA
`G) Stanford, CA
`(4) San Diego, CA
`
`(3) Madison, WI
`() Urbana, IL
`(7) St. Louis, MO
`(8) Dallas, TX
`
`(3) Durham, NC
`Chapel Hill, NC
`
`(9) Austin, TX
`Boston, MA
`@) New Brunswick, NJ
`(12) Baltimore, MD
`
`FIG. 3
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 4 of 13
`
`US 7,296,088 B1
`
`aan
`
`Cumuleiive Oeelribadion of Emor Cire ict
`
`
`
`82
`
`88
`
`CumulativeProbebiay md
`
`met
`
`OA
`
`+,(* Co oa
`
`fe +
`
`+
`+
`
`oak,
`+
`
`a2
`
`0.1
`
`a
`
`506
`
`84
`
`tx) a: at
`
`2500
`
`Distarcm(iong)
`
`
`
`a0
`
`x00
`
`4000
`
`4500
`
`FIG. 4
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 5 of 13
`
`US 7,296,088 B1
`
`Probability
`
`Q
`
`500
`92
`
` Camulative
`
` seeEE CHS LLCHEESHeMeCNMERHOSEEDEROLatottge
`7000-15002
`S00
`UG
`Error Distance (km)
`
`beNERiDARENChek x
`(tS eB
`
`FIG. 5
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 6 of 13
`
`US 7,296,088 B1
`
`Oe - i.
`
`7
`
`a Voc
`
`FIG. 6
`
`2500
`2000
`15K:
`Error Distance (km)
`
`3000 \ 4009
`92
`
`1
`
`.
`
`4309
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 7 of 13
`
`US 7,296,088 B1
`
`START
`
`102
`
`Be 100
`
`104
`
`106
`
`108
`
`110
`
`112
`
`MEASUREA FIRST DELAY TIME RELATING TO A FIRST
`
`NETWORK PATH BETWEENAN INTERNET HOST AND A FIRST COMPUTER SYSTEM
`HOST AND A SECOND COMPUTER SYSTEM
`A THIRD COMPUTER SYSTEM
`
`MEASURE A SECOND DELAY TIME RELATING TOA
`SECOND NETWORK PATH BETWEEN THE INTERNET
`
`MEASUREA THIRD DELAY TIME RELATING TO A THIRD
`NETWORK PATH BETWEEN THE INTERNET HOST AND
`
`CORRELATE THE FIRST, SECOND,
`AND THIRD DELAY TIMES
`
`THE DELAY TIMES
`
`ESTIMATE LOCATION OF INTERNET HOST
`ACCORDING TO THE CORRELATION OF
`
`114
`
`DONE
`
`FIG. 7
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 8 of 13
`
`US 7,296,088 B1
`
`we
`
`aa
`rateagerness te
`
`owecl
`
`eswee nt
`
`Po
`i
`\
`‘
`\
`2500
`\
`i
`\
`i
`132 ~ \
`2000 ©
`SUT 138
`ef
`.
`SS
`&
`se
`3
`2 1800
`os
`a
`be
`a
`
`*%
`
`*
`
`*,
`
`&,
`
`*
`
`136
`&.
`
`ys
`cy,
`hea,
`
`to
`
`pete]
`
`_-
`
`|
`;
`:
`
`{GOP
`
`&
`
`se
`
`~ om
`
`‘ 137
`
`500
`
`~ &
`
`8
`Numberof
`Probe Points
`
`FIG. 8
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 9 of 13
`
`US 7,296,088 B1
`
`START
`
`202
`
`Be 200
`
`204
`
`206
`
`208
`
`210
`
`212
`
`OBTAIN PARTIAL IP ADDRESS-
`TO-LOCATION MAPPING
`INFORMATION FROM
`A DATA SOURCE
`
`OBTAIN NETWORK ROUTING
`INFORMATION
`
`
`
`CLUSTER INFORMATION
`
`CLUSTER TOGETHER
`IP ADDRESSES
`FOR HOSTSIN THE SAME
`LOCATION
`ACCORDING TO THE
`NETWORK ROUTING
`INFORMATION
`
`CORRELATE THE PARTIALIP
`ADDRESS-TO-LOCATION
`INFORMATION WITH THE
`
`
`
`ESTIMATE LOCATION OF
`INTERNET HOST ACCORDING
`TO CORRELATION OF THE
`
`PARTIAL IP ADDRESS-TO-
`LOCATION INFORMATION AND
`THE CLUSTER INFORMATION
`
`
`
`
`
`
`
`FIG.9
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 10 of 13
`
`US 7,296,088 B1
`
`
`
`
`0
`
`1000
`
`2000
`
`3000
`
`4000
`
`Error distance (kilometers)
`
`FIG. 10
`
`
`
`— Geolrack ——- GeoPing —— GeoCluster
`
`
`
`| ||
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 11 of 13
`
`US 7,296,088 B1
`
`a 230
`
`
`
`
`
`— Error Distance —— Dispersion
`
`
`
`
`
`
`
`
`
`
`
`4500
`
`4000
`
`3500
`
`3000
`
`2500
`
`2000
`
`1500
`
`1000
`
`500
`
`0 120\14020 40 60 80 100
`
`
`
`
`
`
`
`
`
`
`
` Distance(kilometers)
`
`
`
`
`
`
`
`IP address sequence number (1000s)
`
`FIG. 11
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 12 of 13
`
`US 7,296,088 B1
`
`a 240
`
`
`—--—-BGPonly (20,0.7)
`— — BGP+subclustering (20,0.7)
`— BGP+subclustering, (5,0.6) ------- {24 clusters, (20,0.7)
`
`
`
`
`
`
`
`
`SanneNy —
`0.5
`
`CDF
`
`ot |e nnn
`
`0.3
`4
`242
`
`
`
`
`
`0.9
`
`0.8
`
`0.7
`
`0.6
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`1000
`1500
`2000
`2500
`
`0
`
`500
`244
`
`Error distance (kilometers)
`
`3000
`
`3500
`
`4000
`
`FIG. 12
`
`
`
`U.S. Patent
`
`Nov. 13, 2007
`
`Sheet 13 of 13
`
`US 7,296,088 B1
`
`PROCESSING
`
`UNIT
`
`COMPUTER
`
`321
`
`322
`325
`
`24
`
`Eaa
`
`ROM
`
`i
`
`20 |
`
`\
`
`| Operating System \a POs \335
`
`petro ser
`
`! 1__ Applications_ \336
`| rrocccccclcccc
`=Qa. oe @n
`a rm) oo mJ
`
`an|toy0,!5,| o,||1ty\
`he Data N.338
`
`a H
`
`ard Drive
`
`328
`329
`
`
`
`(|
`
`r
`
`333
`
`334
`
`348
`
`346
`
`MODEM
`
`Adaptor
`
`Serial
`
`Interface
`
`354
` Network
`Adaptor
`
`3
`
`5
`
`353
`
`351
`
`ps0
`
`331
`
`MONITOR
`
`347
`
`340
`
`342 Y MOUSE
`
`REMOTE
`COMPUTER
`
`349
`
`LAN
`
`350
`
`FIG. 13
`
`323
`
`
`
`
`
`US 7,296,088 Bl
`
`1
`SYSTEM AND METHOD FOR
`DETERMINING THE GEOGRAPHIC
`LOCATION OF INTERNET HOSTS
`
`CROSS REFERENCE TO RELATED
`APPLICATION
`
`This application claims the benefit of U.S. Provisional
`Patent Application Ser. No. 60/249,487, which wasfiled
`Nov. 17, 2000, entitled SYSTEM AND METHOD FOR
`DETERMINING THE GEOGRAPHIC LOCATION OF
`INTERNET HOSTS.
`
`
`
`TECHNICAL FIELD
`
`The present invention relates generally to computer sys-
`tems, and more particularly to methods and systems for
`determining the geographic location of Internet hosts.
`
`BACKGROUND
`
`Location-aware computing provides a user with a com-
`puting experience tailored to the user’s geographical loca-
`tion. Location-aware computing enables users to interact
`effectively with their environment, by making computing a
`function of the user’s location as well as other factors. Both
`
`the behavior andthe user interface of software applications
`may be modified according to the user’s location via the
`employment of location-aware computing techniques. For
`example, a printing service may route a user print job based
`on whichprinter is located nearest the user’s current loca-
`tion. In another example, a restaurant location service or
`application may preferentially locate or select restaurants
`that are close to the user’s location.
`
`Location-aware computing is also relevant for the more
`traditional Internet hosts, such as user desktop machines,
`whichare typically stationary and are commonly connected
`via a fixed wireline network. Consider, for example, a user
`browsing information on a news Website. There are many
`ways in which the information delivered to such a user can
`be customized accordingto his or her physical location. For
`instance, the user may be sent information on local events,
`weather, and the like. In addition, advertisements may be
`targeted based on the geographical location of the Internet
`host. The Web site can also monitor usage and/or control
`access to content based on client location (this in analogous
`to viewership ratings and broadcast rights in the context of
`traditional TV).
`Knowingor estimating the physical location ofthe useris
`a prerequisite for location-aware computing. The granularity
`of location information needed may vary depending on the
`application. Thusfar, much work has gone into determining
`userlocation in the context of wireless networks and mobile
`hosts, for example, a cellular phone user driving around a
`city. A variety of approaches have been used for determining
`user/host location in a wireless setting. For instance, location
`inferences have been obtained based on wireless signal
`timing and/or signal strength, based on a particular mobile
`host’s point of attachment in a cellular network, or by using
`a Global Positioning System (GPS). However, the signal
`strength measurement techniques employed in wireless sys-
`tems are not applicable to the Internet.
`While various techniques have thusfar been developed for
`wireless or mobile clients, such as cell phones and thelike,
`conventionaltools and techniquesfor locating Internet hosts
`have not similarly progressed. Thus, while some suchtools
`are available, these remain generally inadequate to provide
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`the geographic resolution required to facilitate improved
`location-aware computing applications and services. For
`instance, it is possible for a Web site to determine a user’s
`Internet host location by requiring the user to register with
`the site and then “log in” each timeheorshevisits thesite.
`While such a mechanism may be appropriate for services
`with high security requirements (such as banking and
`email), it is impractical to expect users of the vast majority
`of Websites (such as newssites that users browse casually)
`to register and log in.
`An alternative to requiring users to “log in”or register, is
`to store location information in a client-based cookie at the
`
`time of registration, and to then include the cookie in future
`requests. Such an approach does not require the user to log
`in on each visit, but it still imposes the burden ofregistra-
`tion. Moreover, the cookie information may be unavailable
`when the user connects from a host other than the one from
`
`which registration was performed. In either of these tech-
`niques, the location information manually input by an indi-
`vidual user may be inaccurate or erroneous. Thus, the value
`of such information is questionable, with respect to provid-
`ing a computing experience customized according to loca-
`tion.
`
`There has recently been an interest in location-aware
`computing and services in wireless environments. As a
`result there has been much work on the problemof locating
`hosts in such environments. The most well-known among
`these is the Global Positioning System (GPS). However,
`GPSis ineffective indoors. There have been several systems
`targeted specifically at indoor environments. However, in
`general these techniques are specific to wireless networks
`and are thus not applicable to the Internet.
`Some attempts have been made to provide services for
`mapping IP addresses to geographic locations. Thusfar,
`however, no satisfactory solution has been found. Conven-
`tional proposals for solving the Internet host identification
`problem can be broadly classified into three categories;
`domain nameservice approaches, whois based approaches,
`and traceroute approaches.
`The first approach includes incorporating latitude and
`longitude information in the domain name service (DNS).
`This may include defining the format of a new resource
`record (RR) for the domain name system, and reserving a
`corresponding DNS type mnemonic (e.g., LOC) and
`numerical code (e.g., 29). However, existing DNS based
`approaches
`suffer
`from several problems. First,
`this
`approach involves modification of the record structure of
`DNSrecords. Also, the DNS approach requires different
`administrations to enter the LOC records into the DNS
`
`record database, which may be a burdensometask. Further-
`more, there is no easy way to verify whether the location
`entered by a user or administrator is correct and trustworthy.
`Another approach involves using the whois database to
`determine the location of the organization to which an IP
`address wasallocated. The whois utility is used to query a
`host and determine if a certain user is registered on that
`system. Some conventional tools query whois servers to
`attempt
`to ascertain the geographic location of a host.
`However,
`several problems
`exist with whois based
`approaches. For example,
`the whois database is highly
`unreliable. The organizations that maintain the domain name
`data do not insist on keeping the database accurate and
`current. Thus, records corresponding to an IP address block
`may bepresent in multiple registries, but these records may
`not be consistent.
`In addition, a large block of IP addresses may be allocated
`to a single entity. Thus, for any IP address in that block, the
`
`
`
`US 7,296,088 Bl
`
`3
`whois server will return only the headquarters or the address
`registered by the organization. For example, the 8.0.0.0/8 IP
`address block is allocated to BBN Planet and a query to a
`whois database may only return “Cambridge, Mass.” for any
`IP address within this range. A further problem is that due to
`web-hosting and domain nametransfers, the location regis-
`tered in the whois database may be very different from the
`actual location ofhost server. For example, a whois query on
`www.desktop.com may return the location as Colorado,
`even though the servers are actually based in San Francisco.
`A third approach involves performing a traceroute func-
`tion to an IP address and mapping the router label to the
`geographic location using airport codes, city codes and
`country codes. Traceroute is a utility that traces the route
`from a client machine to a remote host being contacted, and
`reports IP addresses of routers in between. The basic idea in
`any traceroute-based tool is to perform a traceroute from a
`source to a given IP address and look at the router labels
`(e.g., the DNS names associated with a router’s network
`interfaces) along the path. The router labels may have the
`geographic location information hidden in terms of city
`codes, airport codes and country codes. However, tracer-
`oute-based approaches suffer from several shortcomings.
`First, router label
`information may not be available for
`several reasons: the router may not respond to the packets
`sent by traceroute or the IP address of the router interface
`may not resolve to a DNS name. Second,
`the location
`information contained in the router label may be ambiguous.
`EachISP has its own naming schemefor cities, which makes
`it difficult to decipher location. For example, the codes used
`for San Francisco, Calif. include sfo, sffca, sanfrancisco,
`sanfranciscosfd,
`snfr, and snfrca. City names may be
`ambiguous. For example,
`there are well over a dozen
`different locations called Bloomington in the United States,
`so the presence of the code bloomington in a router label
`does not indicate the actual location. Even airport codes may
`cause ambiguity. For example, mit is the airport code for
`Shafter, Calif., but it also appears in router labels associated
`with MIT in Cambridge, Mass.
`A fundamental problem with using IP address to estimate
`location, in general, is that manyclients are behindfirewalls
`or proxies, so the “client” IP address seen by the server may
`actually correspond to the firewall or the proxy. Thus,
`geographic location is traceable only to the proxy location,
`which may be quite far from the location of the client.
`Existing techniques based on DNS, whois, or traceroutes are
`unable to tell when a “client” IP address actually corre-
`sponds to a proxy. So they would use the proxy’s IP address
`to estimate location not realizing the error. As a result of the
`incorrect
`location estimate, a location-aware computing
`system may provide the user with inappropriate information
`or content.
`
`In summary, the limitations of existing techniques points
`to the need for improved systems and methodologies by
`which the geographic location of Internet hosts may be
`determined.
`
`SUMMARY
`
`The following presents a simplified summary of the
`invention in order to provide a basic understanding of some
`aspects of the invention. This summary is not an extensive
`overview of the invention. It is intended to neither identify
`key or critical elements of the invention nor delineate the
`scope of the invention. Its sole purpose is to present some
`concepts of the invention in a simplified form as a prelude
`to the more detailed description that is presented later.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`The present invention provides methodologies and soft-
`ware tools for determining the geographic location ofInter-
`net hosts, which achieve improved location accuracy over
`that of conventional techniques. In particular, the invention
`comprises software tools and methodologies, referred to
`hereinafter as GeoTrack, GeoPing, and GeoCluster,
`for
`determining the geographic location of Internet hosts. The
`GeoTrack tool infers location based on the DNS names of
`the host of interest or other nearby network nodes. The
`GeoPing tool correlates network delay measurements from
`geographically distributed locations to triangulate the coor-
`dinates of the host. The GeoCluster tool couples partial
`host-to-location mapping information obtained from one or
`more sources with routing information in order to infer
`location ofthe host ofinterest.
`
`The invention finds utility in many situations, such as
`where Internet servers try to deduce the location of clients
`without depending on explicit information from the human
`user or the client ISP. Thus, the tools and methodologies of
`the present invention determine the geographic location of
`the user knowing only the IP address of the Internet host
`from which the user is connecting. The novel techniques of
`the invention approach this problem from different angles by
`employing one or more different properties of the Internet,
`such as hierarchical addressing and correlation between
`delay and distance.
`The first methodology, GeoTrack, operates to infer Inter-
`net host locations based on the DNS names of the host of
`
`interest or other nearby network nodes. This method
`employs traceroute and PING measurements from multiple
`sources to the specified IP address of the host, and converts
`the router labels into geographical
`locations using city
`codes, country codes, airport codes, and the like. The DNS
`nameof an Internet host sometimes contains clues about the
`location of the host. Such a clue, when present, may indicate
`the location at different levels of granularity such as city
`(e.g., corerouter] SanFrancisco.cw.net may indicate the city
`of San Francisco, Calif.), state/region (e.g., www.state.ca.us
`may indicate the state of California,), or country (e.g.,
`www.un.cm may indicate the country of Cameroon).
`Even whenpresent, however, the clue could be mislead-
`ing (e.g., a host with a DNS name www.newyork.com may
`actually be located in the city of New Orleans). Thus, the
`GeoTrack method selectively employs such clues to over-
`come or minimize the problems associated with prior DNS
`based approaches. For instance, the GeoTrack method may
`employa subsetof airport codes, wherein misleading airport
`codes have been removed.In addition, other specializedlists
`of city and country codes may also be employed, wherein
`the city codes may bepartitioned based on country and/or
`continent. This allows the proper partition to be employed
`according to other network information.
`If the traceroute initiated from one location fails because
`it encounters routers whose labels do not contain meaning
`location information, then the traceroutes from other geo-
`graphically dispersed locations may still succeed because
`these are likely to encounter a different set of routers. Thus,
`GeoTrack employs a plurality of traceroutes initiated from
`geographically dispersed locations, whereby more refined
`location estimates are obtained andthe likelihood of obtain-
`ing a location estimate is improved.
`According to an aspect of the present invention, there is
`provided a method of determining the location of an Internet
`host using a first computer system, which comprises obtain-
`ing route information relating to a first network path
`between a host IP address associated with the Internet host
`
`and the first computer system. The first network path com-
`
`
`
`US 7,296,088 Bl
`
`6
`5
`a variety of sources, including web-based email sites, busi-
`prises the first computer system, the Internet host, and one
`or more intermediate network nodes. In addition, the route
`ness web hosting sites, TV listing sites, and the like. The
`information may comprise a plurality of router labels asso-
`host-to-location mapping information thus obtained is par-
`ciated with the host IP address and the intermediate network
`tial in the sense that it may includearelatively small number
`nodes. The method further comprises extracting a first
`of IP addresses. Moreover, the mapping information may not
`location code from the route information corresponding to a
`be entirely accurate (for example, location information in
`router label associated with the Internet host or an interme-
`users registration records at a Web-based email site may be
`stale or incorrect). BGP or other routing information is then
`employed to expand the coverageofthis data by identifying
`clusters of IP addresses that are likely to be located in the
`same geographic area.
`Accordingto still another aspect of the invention,there is
`provided a method of determining the location of an Internet
`host using a first computer system, which comprises obtain-
`ing partial [P-to-location mapping information from a data
`source and network routing information, clustering together
`IP addresses correspondingto hosts likely to be in the same
`geographic location according to the network routing infor-
`mation to obtain cluster information, correlating the partial
`IP-to-location information with the cluster information, and
`providing a location estimate of the location of the Internet
`host according to the correlation of the partial IP-to-location
`information and the cluster information.
`
`diate network node proximate the Internet host, and con-
`sulting a data store comprising at least one data set having
`location codes and corresponding location information. The
`method further comprises obtaining location information
`from the data store correspondingto thefirst location code
`associated with the Internet host or the proximate interme-
`diate network node, and providing a first location estimate of
`the location of the Internet host according to the location
`information from the data store corresponding to the first
`location code.
`
`The second method, GeoPing, employs network delay
`measurements made from geographically distributed loca-
`tions to triangulate the coordinates of the host. This method
`thus employs the functional relationship between the delay
`experienced by packets traveling between a pair of hosts in
`the network and the geographic separation between the
`hosts. The delay measurements are correlated with a data-
`base or other data store having delay measurements between
`knownsources and locations, in order to provide anestimate
`of the Internet host location of interest. For instance, a set of
`delay measurements made from geographically distributed
`locations may be used to form a delay vector. The measured
`delay vector may then be compared with existing delay
`vectors corresponding to known locations(e.g., as recorded
`in a delay map). The location corresponding to the closest
`delay vector may then be used as the location estimate.
`Alternatively or in combination,the locations corresponding
`with two or more knowndelay vectors close to the measured
`delay vector may be triangulated to derive a location esti-
`mate for the host ofinterest.
`According to another aspect of the invention, there is
`provided a method of determining the location of an Internet
`host using a first computer system, which comprises mea-
`suring a first delay time relating to a first network path
`between a target host and the first computer system, mea-
`suring a second delay timerelating to a second network path
`between the target host and a second computer system, and
`measuring a third delay timerelating to a third network path
`between the target host and a third computer system. The
`method may include measuring any plurality of such delay
`times. In addition, the method provides for correlating the
`first, second, and third delay times, and providing a location
`estimate of the location of the Internet host according to the
`correlation of the first, second, and third delay times. For
`example,
`the correlation may comprise creating a delay
`vector using the measured delay times, and comparing the
`resulting measured delay vector with known delay vectors.
`The location estimate may be provided, for instance, by
`selecting the location of a known delay vector closest to the
`measured delay vector, and/or by triangulating the locations
`of a plurality of known delay vectors close to the measured
`delay vector.
`The third method, GeoCluster, couples partial host-to-
`location mapping information obtained from one or more
`sources with BGP or other routing information in order to
`infer the location of the host of interest. Network prefix
`information extracted from the routing data, in this regard,
`may indicate clusters of IP addresses that are likely to
`correspond to hosts that are collocated. For example, the
`host-to-location mapping information may be obtained from
`
`30
`
`40
`
`45
`
`The above methods may further comprise some measure
`of self-calibration, which may include the provision of
`confidence metrics. Thus, a location-aware system or appli-
`cation mayselectively provide location specific services or
`content according to the estimated Internet host location if
`the confidence is above a threshold, and not provide such
`content or services otherwise. In this manner, the method-
`ologies of the invention further prevent or minimize the
`provision of incorrect location-specific content or services.
`The invention also provides for combinations of two or more
`of the GeoTrack, GeoPing, and GeoCluster methodologies.
`The invention further comprises software tools and com-
`puter-readable media with computer-executable instructions
`for performing the various methodologies illustrated and
`described herein. In addition, the invention comprises sys-
`tems, such as computer systems, adapted to perform Internet
`host
`location estimation. The invention further provides
`geographicallocation estimate data associated with an Inter-
`net host resulting from the above mentioned methodologies
`and processes.
`To the accomplishmentof the foregoing andrelated ends,
`certain illustrative aspects of the invention are described
`herein in connection with the following description and the
`annexed drawings. These aspects are indicative, however, of
`but a few of the various ways in which the principles of the
`invention may be employed and the present invention is
`intended to include all such aspects and their equivalents.
`Other advantages and novel features of the invention may
`becomeapparent from the following detailed description of
`the invention when considered in conjunction with the
`drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a schematic diagram illustrating an exemplary
`network path between an Internet host IP address and a
`computer system;
`FIG. 2 is a flow diagram illustrating an exemplary
`GeoTrack method of determining the location of an Internet
`host in accordance with an aspect of the present invention;
`FIG. 3 is a schematic diagram illustrating an exemplary
`set of geographically dispersed probe locations;
`
`
`
`US 7,296,088 Bl
`
`7
`FIG. 4 is an illustration of exemplary cumulative distri-
`bution of crror distance results obtained via an exemplary
`GeoTrack method according to an aspect of the invention;
`FIG. 5 is an illustration of an exemplary comparison of
`cumulative distribution functions of error distance results
`
`obtained via an exemplary GeoTrack method according to
`an aspect of the invention;
`FIG. 6 is an illustration of exemplary cumulative distri-
`bution functions of error distance results obtained via
`
`another exemplary GeoTrack method employing multiple
`locations according to another aspect of the invention;
`FIG.7 is a flow diagram illustrating an exemplary GeoP-
`ing methodof determining the location of an Internet host in
`accordance with another aspect of the invention;
`FIG. 8 is an illustration of exemplary meanerror distance
`vs. number of probe points data results obtained according
`to an aspect of the invention;
`FIG. 9 is a flow diagram illustrating an exemplary Geo-
`Cluster method of determining the location of an Internet
`host in accordance with another aspect of the invention;
`FIG. 10 is an illustration of exemplary CDFerror distance
`results obtained according to an aspect of the invention;
`FIG. 11 is an illustration of exemplary error distance and
`dispersion results obtained according to an aspect of the
`invention;
`FIG.12 is anillustration of another set of exempla