`
CHAPTER 29
How Internet-Based Software Works

SOME people believe that in the not-too-distant future, much of the software you run won't physically be
found on your computer. Instead, it will be located, at least in part, on an Internet server somewhere. You'll
turn on your computer, it will have instant, high-speed access to the Internet, you'll choose software to run
from a server, and it will always be up-to-date because the company in charge will continually update it.

To a certain extent, this type of Internet-based computing has a Back to the Future kind of feel to it. In
the days before the advent of the personal computer, all software was run on a mainframe or large
minicomputer, and people accessed it via a dumb terminal that had no processing power itself. In some ways,
Internet-based software goes back to that model. The difference is that instead of dumb terminals, people
have powerful computers that can still do a lot on their own. And not many people expect that all software
will be run this way.

No matter what the future holds, today some software is already being run that way. The software is
being provided by application service providers (ASPs), which have a variety of ways of letting you run
software over the Internet. The most common way is one in which you visit a Web site, and the software runs
inside your browser. But there are variations as well, such as services that run Web servers, while you or your
business runs client software that requires the use of those servers.
`
You might already be using an ASP without realizing it. For example, Web-based e-mail services, such as
HotMail, are in fact forms of ASPs: in essence, they allow you to run an e-mail program from within your
browser.

Corporations turn to ASPs because they enable them to do business more cheaply. Rather than having
to build a large infrastructure of internal servers or have a large support staff for installing and
troubleshooting software and hardware, they contract with an ASP and let the ASP bear all those costs.

It's not only corporations, though, that eventually might turn more to ASPs. Microsoft has announced
its vision for the future of software in its .Net plan, and that plan includes ASP-like features. At this point,
the details of .Net are quite hazy. But what's come out so far indicates that a good part of it will include a far
tighter relationship between the software on your computer and servers on the Internet, and it might even
include software running similarly to ASPs.
`
`How Application Service
`Providers Work
`
There is no single, standard way that ASPs enable people and
companies to run software over the Internet; many variations
exist. In one common way, however, a corporation contracts
with an ASP to provide servers for the entire corporation for
client/server software that the corporation uses, for example,
the Lotus Notes workgroup and e-mail program.
`
1 In this setup, the client software runs on each person's
computer in the corporation and isn't handled by the
ASP. However, the client software needs to access
servers to work, for example, for sending and receiving
e-mail or creating and accessing corporate databases.
The ASP is in charge of the servers and the
software and data on the servers; the corporation is in
charge of the client software on people's desktops.

[Figure labels: Corporation; Data; Lotus Notes Server]

`
2 Another type of ASP is completely Web based and can be used by individuals as well
as corporations. It's a way for people to run software directly in their Web browsers.

3 [...] someone visits a Web site that allows her to run an
application, for example, a personal finance program.

`
4 The Web site delivers the
software to the person's
Web browser. The software
can be created using many
kinds of tools, such as
ActiveX or Java.

[Figure labels: ActiveX; Personal Finance ASP Server]

5 The software runs inside the person's
browser. Depending on the software, it
might allow the person to save data to
her own PC, or she might be required to
save it to the Web site. When the person
leaves the Web site, the software stops
running, and it leaves no traces behind
on the person's computer.
`
6 Another type of ASP requires someone to download
a small piece of software to her computer. It's a kind
of "helper" software: it's not the application itself,
but instead helps the application run.

7 After the person downloads the helper, she
chooses the actual application she wants to
run (a graphics program, for example, or a
piece of personal finance software). The
helper application goes out to a Web site
and downloads a core part of the application
to the person's hard disk.

8 The person now runs the application on her own computer.
If she needs elements of the application other than
the core that was downloaded to her computer, the helper
application goes out and gets them. When the person stops
running the application, the helper program deletes it from
the hard disk, leaving behind no trace.

[Figure label: Application downloads and then launches.]

USING COMMON INTERNET TOOLS

Chapter 30: How Telnet Works
Chapter 31: How FTP Downloading Works
Chapter 32: How Internet Searching Works
Chapter 33: How Agents Work
Chapter 34: How Java, ActiveX, and JavaScript Work
Chapter 35: How CGI Scripting Works

`
AN enormous amount of information and entertainment is available on the Internet, but
how do you access it? Although using the Internet gets easier every day, it's still not quite as
simple as turning on your television or reading your daily newspaper.

The solution is to use a variety of Internet tools. These tools enable you to tap into the
colossal resources of the Internet. Some of these resources, such as the World Wide Web, are quite
well known. Others, such as FTP (File Transfer Protocol), are used quite often, and sometimes
people use them without even knowing it. Still others, such as Telnet, are not nearly as
popular, although they are still useful. Many of these Internet tools predate the Web, but they are
still useful today.

For many people, the term "Internet" really means the World Wide Web, but as this
section of the book shows, a world exists well beyond the Web. (Turn to Part 4, "Using World
Wide Web," for information about the Web, the fastest growing and most visible part of the
Internet.) And this section of the book also shows you some of the advanced underlying
technologies that make the Internet and the World Wide Web a richer, more interactive, more
entertaining, and more productive medium. Many of these technologies have changed the
very nature of the Internet and have turned it into a truly interactive medium, one that
people can navigate efficiently. The technologies also enable Web publishers and Internet
developers to more effectively present information to people.

This section looks at how the most common and useful Internet tools work.

Chapter 30, "How Telnet Works," covers one of the older Internet technologies, and one
that is still in widespread use: Telnet. Telnet enables you to take over the resources of a
distant computer while sitting at your own computer. What you type on your keyboard is sent
across the Internet to the distant computer, the commands are carried out by the distant
computer, and the results of your commands are sent to your own computer screen. It appears as
if you're sitting at the distant computer's keyboard. Telnet is used in many ways, notably by
libraries making their catalogs available over the Internet. When you log in to a distant
computer using Telnet, you often use a menuing system.

Chapter 31, "How FTP Downloading Works," covers one of the most popular uses of the
Internet: downloading files. Generally, files are downloaded from the Internet using FTP, the
Internet's File Transfer Protocol. Not only will you look at how FTP works, but you'll also look
at how files are compressed and decompressed on the Internet. A compressed file takes less time
to be sent over the Internet to your computer. You might not know it, but many times when you're
on a Web site and download a file, you're actually using FTP.

Chapter 32, "How Internet Searching Works," examines Internet search engines. The
Internet contains such a vast amount of information that it's often impossible to find exactly
what you want. Search engines look through the entire Internet (not only Web pages, but
other sites such as newsgroups) and find information you're looking for, based on keywords
you type.

Chapter 33, "How Agents Work," looks at agents on the Internet. Agents are programs
that do your bidding across the Internet automatically, without you doing anything. They can
find the latest news and download it to your computer; they can find you the best deal on the
CD you want to buy; they can perform important Web maintenance tasks; and more. They
are becoming so complex that systems are being developed to enable agents to interact with
one another so they can perform jobs cooperatively.

Chapter 34, "How Java, ActiveX, and JavaScript Work," examines three other types of
technologies that are transforming the Internet: Java, JavaScript, and ActiveX. These three
technologies might do more to transform the Internet than almost any other technologies
currently available. These technologies add multimedia and interactivity, but more
importantly, they begin to treat the Internet as if it were an extension of your computer. In essence,
they enable your computer and the Internet to interact as if they were one large computer
system. This enables things such as news tickers, interactive games you can play with others,
multimedia presentations combining animations, sounds, music, graphics, and much more.

Java, a computer language developed by Sun Microsystems, enables applications to be run
from the Internet. The programs run inside your Web browser. One benefit of Java
applications is that they can be run on any computer, such as a PC, a Macintosh, or a Unix
workstation.

ActiveX, a competing technology from Microsoft, can also essentially turn the Internet
into an extension of your computer. Similar to Java applets, ActiveX controls are downloaded
to your computer and run there. They can do anything a normal application can do and can
also interact with the Web, the Internet, and other computers connected to the Internet.
To run them, a browser that supports ActiveX, such as Internet Explorer, is necessary.

JavaScript, which despite its name is not really related to Java, is simpler than Java and
ActiveX and can be written by people who don't have substantial programming experience.
JavaScript is commonly used to create interactive forms, site navigation, and similar features.

Finally, Chapter 35, "How CGI Scripting Works," examines CGI (Common Gateway
Interface) scripting. This might appear as one of the more mundane Internet technologies,
but without it, very little Web interactivity would take place. CGI is a standard way in which
the Web interacts with outside resources, most commonly databases. You've probably run
CGI scripts many times without knowing it. If you've filled out a form on a Web page to
register to use a site and then later received an e-mail notification with a password for you
to use, you've probably run a CGI script. CGI enables programmers to write code that can
access information servers (such as Web servers) on the Internet and then send the
information to users.
`
CHAPTER 30
How Telnet Works

ONE of the more remarkable features of the Internet is the way it lets you use the resources of a distant
computer somewhere else in the world. From your own home or office, you can log onto another computer,
issue commands just as if you were at that computer's keyboard, and then gain access to all the computer's
resources. You do this with an Internet resource called Telnet. Telnet follows a client/server model, which
means that you run a piece of software on your own PC (the client) to use the resources of a distant server
computer. This distant computer is called the host.

The host allows many clients to access its resources at the same time; it isn't devoted to a single user.
To use Telnet and the host's resources, you must know the address of the Internet host whose resources you
want to access.
`
When you use Telnet, before you can take over the resources of a host computer you typically have to
log onto the host. Often, you can use the name "guest" to log on. Some systems require that you also give
information about yourself, such as your name and address. And some might require that you choose a
username and a password that you will use the next time you log in.

You can access many hosts on the Internet by using Telnet. They are all different computers, so many of
them don't work or look alike. For example, some might be Unix-based systems, some might be NT-based
computers, and some might be Macintoshes, as well as a variety of other computers, and they all work and
look different from one another. As a way to make things easier, many hosts use a menuing system that gives
you access to their resources.

Telnet gives you a way to use those menuing systems by using something called terminal emulation. It lets
you use your computer to emulate the type of keyboard and computer that each of the different computer
systems expects. Different computers often require different kinds of terminal emulation, but one common
kind is called VT-100 emulation, so if you use Telnet software and tell it to use VT-100 emulation, that's a
safe emulation to use.
`
Telnet clients are available for all the major operating systems, including Linux, Unix, Macintosh, and
all versions of Windows. If you use an Internet shell account instead of a SLIP/PPP connection, you'll
typically use a Telnet client by simply typing the word Telnet followed by the Internet address of the
computer you want to access. For example, if you wanted to gain access to a computer run by the federal
government called FedWorld that lets you access a great deal of government information, you'd type
Telnet fedworld.gov.

A Windows- or Macintosh-based Telnet client is easier to use than a DOS- or Unix-based Telnet client
because the former remembers hostnames for you. With these clients, you can often keep an address book of
hostnames so you can easily revisit them.
`
Understanding Telnet

1 To use Telnet, you need to know the Internet
address of the host whose resources you
want to use; your Telnet client contacts the
host, using its Internet address.

2 The host and your computer negotiate how they will
communicate with each other. They decide which terminal
emulation will be used. Terminal emulation determines how
your keyboard transmits information to the distant computer and how
information will be displayed on your screen. It determines,
for example, how the backspace key will work. VT-100 is a
common type of terminal emulation.

[Figure labels: PC Running Telnet Software; Network Virtual Terminal (at each end of the connection)]

3 When a client and a server communicate, they use the Telnet protocol. The Telnet protocol assumes that
each end of the connection, the client and the server, is a network virtual terminal (NVT). Each NVT has a
virtual "printer" and a virtual "keyboard." The keyboard sends data from one NVT to the other. When you
type text on your keyboard, you're using the NVT keyboard. The printer is not really a printer at
all: it receives and displays the data on the computer screen.

[Figure labels: Printer and Keyboard (at each NVT)]

4 Typed text in a Telnet session accumulates in a buffer on your computer. When a complete line of data
is ready for transmission, or when you give a command to transmit data (such as pressing the Enter
key), the data is sent across the Internet from your NVT keyboard. Along with the data is the host's IP
address, which ensures the packet is sent to the proper location.

[Figure label: From: 137.42.9.68 To: fedworld.gov]

5 Your IP address is also sent, so that information can be routed back to you. Additionally, specific Telnet
commands are sent that the other NVT uses to decide what to do with the data or how to respond to
the data. For example, when data is sent from one NVT to another, and certain information must be
sent back to the originating NVT for a process to proceed, the Telnet Go Ahead (GA) command is sent.

6 The Telnet host receives the data you've sent. It processes the data and returns to your screen (your
NVT printer) the results of using the data or running the command on the distant computer. So, for
example, if you type the letters dir and press Enter, the distant computer carries out the
dir command. That computer also echoes the dir command back to your screen and sends the results of
running that command.

7 Because packets must go through many Internet routers in each direction between your computer and
the host, a delay might occur between the time you send a command and the time you see the results
on your own computer screen.
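
The way Telnet commands such as Go Ahead travel alongside ordinary data can be sketched in a few lines of Python. This is a minimal illustration, not a full client: the function name `split_telnet_stream` is made up for this example, the byte values are the ones defined by the Telnet protocol (255 for the "Interpret As Command" escape, 249 for Go Ahead, 251 to 254 for option negotiation), and subnegotiation blocks are deliberately not handled.

```python
IAC = 255  # "Interpret As Command": marks the next byte as a Telnet command
GA = 249   # Go Ahead
NEGOTIATION = {251: "WILL", 252: "WONT", 253: "DO", 254: "DONT"}


def split_telnet_stream(raw: bytes):
    """Separate ordinary data from Telnet commands in received bytes.

    Hypothetical helper for illustration; subnegotiation is not handled.
    """
    data = bytearray()
    commands = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte != IAC:
            data.append(byte)          # plain data for the NVT "printer"
            i += 1
            continue
        if i + 1 >= len(raw):
            break                      # incomplete command; wait for more bytes
        command = raw[i + 1]
        if command == IAC:
            data.append(IAC)           # doubled IAC is a literal 255 data byte
            i += 2
        elif command in NEGOTIATION:
            if i + 2 >= len(raw):
                break                  # option byte has not arrived yet
            commands.append((NEGOTIATION[command], raw[i + 2]))
            i += 3
        elif command == GA:
            commands.append(("GA", None))
            i += 2
        else:
            commands.append((str(command), None))
            i += 2
    return bytes(data), commands


data, commands = split_telnet_stream(b"dir\r\n" + bytes([IAC, GA]))
print(data, commands)
```

Here the typed line and the Go Ahead command arrive in the same stream, and the receiving NVT tells them apart by the escape byte.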
`
CHAPTER 31
How FTP Downloading Works

ONE of the most popular uses of the Internet is to download files, that is, transfer files from a computer
on the Internet to your computer. These files can be of many types: programs that you can run on your own
computer; graphics you can view; sounds and music you can listen to; or text files that you can read. Many
tens of thousands of files are downloaded every day over the Internet, frequently using the Internet's File
Transfer Protocol, commonly referred to as FTP. You can also use FTP to upload files from your computer to
another computer on the Internet.

FTP, like many Internet resources, works on a client/server model. You run FTP client software on your
computer to connect to an FTP server on the Internet. On the FTP server, a program called an FTP daemon
(pronounced "demon") allows you to download and upload files.

To log on to an FTP site and download files, you must type in an account number (or username) and
a password before the daemon will allow you to enter. Some sites allow anyone to enter and download
files, but an account number (or username) and password must still be entered. Often, to get in, you use
anonymous as your username and your e-mail address as your password. Because of this, these sites are often
referred to as anonymous FTP sites. Some FTP sites are private and allow only certain people with the proper
account number and password to enter.

FTP is fairly simple to use. When you log on to an FTP site, you can browse through the available files
by changing directories and seeing a listing of all the files available in each directory. When you see a file
you want to download, use your client software to instruct the FTP server to send you the file.

As the World Wide Web gained popularity, downloading software became even easier. You can use your
Web browser and click links to files. Behind the scenes, FTP is often still downloading the files. FTP
remains the most popular way to download files from the Web and the Internet. The HTTP protocol of the
Web can be used for downloading files from the Web, but it's not as efficient as FTP, so it isn't used as
frequently.

One problem with downloading files over the Internet is that some files are so large that it can take a
tremendous amount of time to download them, especially if the connection is made via modem. Even at
56Kbps, downloading files can be slow. As a way to speed up file transfers and save space on the FTP server,
files are commonly compressed, or shrunk in size, using special compression software. Many different
methods are used to compress files. Depending on the file type, files are usually compressed by 10% to 50%. After
downloading the files, you'll need to run the compression software on your own computer to decompress the
files so you can use them.
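
To make the effect of compression on a modem download concrete, here is a small back-of-the-envelope calculation in Python. The file size is an invented example, the function names are hypothetical, and the arithmetic assumes the line really delivers its nominal 56,000 bits per second with no protocol overhead, which a real connection will not.

```python
def download_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Idealized transfer time: 8 bits per byte, no overhead assumed."""
    return size_bytes * 8 / link_bits_per_second


def compressed_size(size_bytes: int, percent_saved: int) -> int:
    """Size after compression that removes the given percentage of the file."""
    return size_bytes * (100 - percent_saved) // 100


original = 1_400_000                 # hypothetical 1.4 MB file
modem = 56_000                       # nominal 56Kbps modem

print(download_seconds(original, modem))                       # uncompressed
print(download_seconds(compressed_size(original, 50), modem))  # compressed by 50%
```

Under these assumptions the uncompressed file takes 200 seconds and the file compressed by 50% takes 100 seconds.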
`
How an FTP Session Works

1 FTP, like many other Internet services, runs on a client/server model. To use it you'll
need client software on your computer. To begin an FTP session, run the FTP client
software and contact the FTP server from which you want to download files. You can get
FTP client software in hundreds of places on the Internet, such as ZDNet Downloads at
www.zdnet.com/downloads. A command-line FTP program is also included on Windows-based
computers, but is much harder to use than these FTP clients.
`
[Figure: FTP client screen. Labels: FTP Sites; Public software archives; Large mirror sites;
Connect; Please Log In; User Account: Anonymous]

`
2 The FTP daemon runs on the FTP server. This
daemon handles all FTP transactions. When an
FTP client contacts a server, the daemon will ask
for an account number (or username) and
password. Many FTP sites let anyone log on to them
to download files and software. This is called
anonymous FTP. With anonymous FTP, you
often use anonymous for your account number
and your e-mail address for your password.
Note that some FTP clients will automatically
log on to the FTP server for you
when you connect so you won't be
asked to log on.
`
3 When you log on to the FTP server, a connection
called a command link is
opened up between your computer
and the server. Your computer
uses this link for sending commands to
the server, and the server uses this link
for sending messages and information
back to your computer.

[Figure label: Password: Swordfish@sf.com]

`
4 When you want to change directories
on the FTP server, your client software
sends an instruction to the FTP
daemon using the command link.
The daemon changes directories and
then sends back a listing of files in
that directory.
On your client software, you'll see a
listing of files in the new server
directory. When you want to download
a file, the request is issued over
the command link.

[Figure label: Command Link: "Send me Alice.zip."]

5 When you issue a command to download a file, you open a
second connection called the data connection (also called
the data link). This connection can be opened up in one of
two modes: the ASCII mode or the binary mode. The ASCII
mode is used for sending text files and alters things such as
line feed and carriage return; the binary mode is used for
sending binary files and lets files through untouched.

[Figure label: Data Link: Sending File Alice.zip]

6 The file is downloaded from the
server to your computer via the
data connection. After the file is
downloaded, the data link
closes automatically.

7 When the file is downloaded and the data link
closes, the command link stays open. You can
then change directories or download more
files. After you're done, log off and the
command link closes. You're no longer connected
to the FTP server.

[Figure label: Command Link: "Log me off."]
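
The conversation that travels over the command link can be sketched as a list of protocol commands. The following Python function is a hypothetical helper written for this illustration; it only builds the text of the commands (USER, PASS, CWD, TYPE, RETR, and QUIT are standard FTP commands) and does not open any network connection. The e-mail address and directory name are invented.

```python
def anonymous_download_commands(email, directory, filename, binary=True):
    """Return the commands a client might send to fetch one file anonymously.

    Illustrative only: a real client also reads the server's reply after
    each command and opens the data link for the transfer itself.
    """
    transfer_type = "TYPE I" if binary else "TYPE A"  # I = binary, A = ASCII
    return [
        "USER anonymous",        # anonymous FTP log-on
        f"PASS {email}",         # e-mail address used as the password
        f"CWD {directory}",      # change directories on the server
        transfer_type,           # choose binary or ASCII mode
        f"RETR {filename}",      # ask the daemon to send the file
        "QUIT",                  # log off; the command link closes
    ]


for line in anonymous_download_commands("guest@example.com", "/pub", "Alice.zip"):
    print(line)
```

In practice a library such as Python's standard `ftplib` module handles this exchange and the data link for you; the list above is only meant to show what flows over the command link.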
`
How File Compression Works

1 Compression programs use algorithms (complex mathematical formulas) to shrink files.
In the first step in the process, the algorithm examines the file to be compressed and looks
for repeating patterns of data.

[Figure label: Uncompressed File]

2 When the algorithm finds patterns
of data that repeat, it replaces the
patterns with smaller tokens. In a
file that has many repeating
patterns, many tokens are used to
replace data, so the compressed file
is much smaller than the original
file.
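
As a toy illustration of replacing repeating patterns with tokens, the Python sketch below treats whole words as the "patterns" and swaps any word that repeats for a short token. This is a deliberately simplified stand-in, not the algorithm used by PKZIP or any other real compressor; the function name and the `~` token format are invented, and the sketch assumes the input text contains no `~` characters.

```python
from collections import Counter


def compress(text: str):
    """Replace repeated words with short tokens; return (body, token table)."""
    words = text.split(" ")
    counts = Counter(words)
    table = {}
    for word, occurrences in counts.items():
        # Only words that repeat and are longer than a token save any space.
        if occurrences > 1 and len(word) > 2:
            table[word] = f"~{len(table)}"
    body = " ".join(table.get(word, word) for word in words)
    return body, table


text = "the cat and the dog and the bird"
body, table = compress(text)
print(body)
print(table)
print(len(text), "->", len(body))
```

The more often a pattern repeats, the more characters each token saves, which is why files with many repeating patterns shrink the most.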
`
[Figure labels: File Analysis; Compression Software; Compressed File]

3 A header can also be added to the file as
it is compressed. This header contains
information about the file, such as the
filename, the file size, and the
compression method used. This information is
used to help reconstruct the file when it
is uncompressed.

4 File extensions, the letters
that appear after the period
at the end of a filename, tell
you whether and how a file
is compressed.

[Figure labels: File.pak; File.arj; File.zoo; File.lzh; File.zip (usually MS-DOS)]

`
`
`
5 When you want to use a compressed
file you find on the Internet, transfer
it over the Internet to your computer.

6 To use the file, you'll need decompression
software on your computer. The
decompression software looks into the file's
header and examines the tokens in the
file. The decompression software uses a
decompression algorithm to reconstruct
the original file, which you can then use on
your computer.

7 Some compression
software, such as
PKZIP for the PC, can also
archive files by combining
several compressed files.
The Unix command Tar can
also combine many files into a
single archive.
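
Decompression can be sketched as the reverse lookup: read the header, then swap each token back for the pattern it stands for. The Python below is a minimal illustration under invented assumptions; the header layout (a dictionary with `filename`, `original_size`, `method`, and `tokens` fields) and the `~` token format are hypothetical and do not correspond to any real archive format.

```python
def decompress(header: dict, body: str) -> str:
    """Rebuild the original text from a token table stored in the header."""
    lookup = header["tokens"]                      # token -> original pattern
    restored = " ".join(lookup.get(word, word) for word in body.split(" "))
    # Use the size recorded in the header as a simple integrity check.
    if len(restored) != header["original_size"]:
        raise ValueError("restored file does not match the size in the header")
    return restored


header = {
    "filename": "pets.txt",          # hypothetical example values
    "original_size": 32,
    "method": "word-token",
    "tokens": {"~0": "the", "~1": "and"},
}
print(decompress(header, "~0 cat ~1 ~0 dog ~1 ~0 bird"))
```

The size check shows one reason a header is useful: it lets the decompression software confirm that the reconstructed file matches what was originally compressed.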
`
`
`
CHAPTER 32
How Internet Searching Works

SO much information is available on the Internet, but there is so little organization to the Internet that it
can seem impossible to find the information or documents you want. A number of solutions have sprung up
to solve the problem. The two most popular ones are indexes and search engines.

Indexes present a highly structured way to find information. They enable you to browse through
information by categories, such as arts, computers, entertainment, sports, and so on. In a Web browser, you click a
category, and you are then presented with a series of subcategories. Under sports, for example, you'll find
baseball, basketball, football, hockey, and soccer. Depending on the size of the index, several layers of
subcategories might be available. When you get to the subcategory you're interested in, you are presented with
a list of relevant documents. To get to those documents, you click the links to them. Yahoo!
(http://www.yahoo.com/) is the largest and most popular index on the Internet. Yahoo! and other indexes
also enable you to search by typing words that describe the information you're looking for. You then get a set
of search results: links to documents that match your search. To get the information, you click a link.

Another popular way of finding information is to use search engines, also called search tools and sometimes
called Web crawlers or spiders. Search engines operate differently from indexes. They are essentially massive
databases that cover wide swaths of the Internet. Search engines don't present information in a hierarchical
fashion. Instead, you search through them as you would a database, by typing keywords that describe the
information you want.

There are many popular Internet search engines, including Google, Lycos, Excite, and AltaVista.
Although the specifics of how they operate differ somewhat, generally they are all composed of three parts:
at least one spider, which crawls across the Internet gathering information; a database, which contains all
the information the spiders gather; and a search tool, which people use to search through the database.
Search engines are constantly updated to present the most up-to-date information, and they hold enormous
amounts of information. Search engines extract and index information differently. Some index every word
they find in a document, for example, and others index only the key 100 words in each document. Some
index the size of the document; some index the title, headings, subheadings, and so on.

Additionally, each search engine returns results in a different way. Some weigh the results to show the
relevance of the documents; some show the first several sentences of the document; and some show the title
of the document as well as the URL.

Many search engines and indexes are on the Internet, each with its own strengths and weaknesses. To
cast the widest possible net when looking for information, you should search as many of them as you can.
The problem is that doing so is too time-consuming. So a type of software called meta-search software has
been developed. With this software, such as Copernic, you type a search on your own computer. The
software then submits the search to many Internet search engines and indexes simultaneously, compiles the
results for you, and then delivers the results to your computer. To visit any resulting site, just click the link,
the same as if you were on an index or a search engine site.
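
The compiling step of meta-search software can be sketched as merging several result lists and dropping duplicates. In the Python below, the two "engines" are stand-in functions with made-up names and made-up results, not real search services, and the sketch queries them one after another rather than simultaneously to keep it short.

```python
def meta_search(query, engines):
    """Send one query to every engine and merge the results without repeats.

    `engines` maps an engine name to a function that returns a list of URLs.
    """
    merged = []
    seen = set()
    for name, search in engines.items():
        for url in search(query):
            if url not in seen:          # keep only the first copy of each link
                seen.add(url)
                merged.append((url, name))
    return merged


# Hypothetical engines returning canned results for illustration.
engines = {
    "engine_a": lambda q: ["http://example.com/telnet", "http://example.com/ftp"],
    "engine_b": lambda q: ["http://example.com/ftp", "http://example.org/agents"],
}

for url, source in meta_search("internet tools", engines):
    print(url, "found by", source)
```

Real meta-search programs also have to translate the query into each engine's own syntax and parse each engine's results page, which this sketch leaves out.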
`
How Internet Search Engines Work

1 Each search engine uses a
crawler or spider with its own set
of rules guiding how documents
are gathered. Some follow every
link on every home page they
find and then, in turn, examine
every link on each of those new
home pages, and so on. Some
spiders ignore links that lead to
graphics files, sound files, and
animation files. Some ignore
links to certain Internet
resources, such as newsgroups,
and some are instructed to look
primarily for the most popular
pages. It can take a spider
several seconds to many [...]

2 The spiders gather URLs and documents
and send the information to indexing software.

[Figure labels: Gathered: 183 documents, 427 hyperlinks; Gathered: 487 documents, 938 [...]; Database]
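
A spider's link-following rules can be sketched as a breadth-first walk over pages. The Python below is a toy model: instead of fetching real pages it reads from an in-memory dictionary that maps each made-up URL to the links found on that page, and its only rule is to skip links that point to graphics, sound, or animation files.

```python
from collections import deque

MEDIA_EXTENSIONS = (".gif", ".jpg", ".wav", ".avi")  # links the spider ignores


def crawl(start_url, web, max_pages=100):
    """Visit pages breadth-first, following every non-media link once."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in web.get(url, []):
            if link.lower().endswith(MEDIA_EXTENSIONS):
                continue                 # rule: ignore media files
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical miniature "Web" for illustration.
web = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b",
                            "http://example.com/logo.gif"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}
print(crawl("http://example.com/", web))
```

The `seen` set is what keeps the spider from looping forever when pages link back to one another.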
`
3 The indexing software receives the documents and URLs from
the agent. The software extracts information from the
documents and indexes it by putting the information into a database.
Each search engine extracts and indexes different types of
information. Some index every word in each document, for
example, but others index only the key 100 words in each; some
index the size of the document and the number of words in it;
some index the title, headings and subheadings, and so on. The
kind of index built determines which type of searching can be
done with the search engine and how the information will be
displayed.
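
One common way to index every word in each document is an inverted index, which maps each word to the set of documents that contain it. The Python sketch below assumes that approach purely for illustration; the source does not say which data structure any particular search engine uses, and the URLs and text are invented.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


documents = {
    "http://example.com/1": "Telnet and FTP",
    "http://example.com/2": "FTP downloads",
}
index = build_index(documents)
print(sorted(index["ftp"]))
print(sorted(index["telnet"]))
```

Looking up a keyword is then a single dictionary access, which is why the choice of what to index determines what kinds of searches are possible.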
`
4 When you visit a search engine and want to
search the Internet for information, you type
words on a Web page that describe the
information you want to find. Depending on the
search engine, more than just keywords can
be used. For example, you can search by date
and other criteria with some search engines.

5 The database is searched, based on
the criteria you've set. Results are
returned in HTML pages. Each search
engine returns results in a different
way. Some weigh the results to show
how relevant the document is to your
search; some show the URL, as well
as the first several sentences of the
document; and some show the title
of the document and the URL.

6 When you click a link to one of the documents that
interest you, you're sent straight to that document. The
document itself is not in the database or on the search
engine site.
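
Weighing results by relevance, as described in the search step above, can be sketched with a very simple score: count how many times the query keywords appear in each document and list the highest counts first. This scoring rule is an assumption chosen for brevity, not how any named search engine ranks pages, and the pages below are invented.

```python
def search(query, pages):
    """Return (score, title, url) tuples, most relevant first.

    `pages` maps a URL to a (title, text) pair.
    """
    keywords = query.lower().split()
    results = []
    for url, (title, text) in pages.items():
        words = text.lower().split()
        score = sum(words.count(keyword) for keyword in keywords)
        if score:                              # leave out pages with no match
            results.append((score, title, url))
    # Highest score first; ties are ordered by URL so the output is stable.
    results.sort(key=lambda result: (-result[0], result[2]))
    return results


pages = {
    "http://example.com/ftp": ("FTP basics", "ftp moves files ftp servers"),
    "http://example.com/telnet": ("Telnet basics", "telnet logs in to hosts"),
    "http://example.com/both": ("Tools", "ftp and telnet"),
}
for score, title, url in search("ftp telnet", pages):
    print(score, title, url)
```

Note that the results carry only the title and URL; as the text says, the document itself stays on its own site.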
`
`
`
[Figure labels: Server; Weather; Politics]

1 A simple Internet agent is one that gathers news from a variety of sources while you're not using your
computer or while you are using your computer for another task. News agents can work in several ways.
In the simplest example, you fill out a form saying which type of news you're interested in and on what
schedule you want your news delivered. Based on that information, at preset intervals, the news agent
connects to news sites around the Internet and downloads news stories to your computer, where you can
read them as HTML pages.

[Figure label: "Get best price"]

2 Shopping agents let you search through all of the Internet for the best bargains. On the Web, you fill out
a form detailing the product you want to buy. When you submit the form, the shopping agent launches
programs that search through a variety of shopping sites and databases on the Internet. The agent
looks into the databases of those sites and finds the best prices. It then sends back to you the links to
the sites so you can visit the sites with the best prices and order from there.
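
The comparison a shopping agent performs can be sketched as collecting one price per site and sorting. In the Python below, the shop URLs, catalogs, and prices are all invented, and each site's database is modeled as a plain dictionary rather than something fetched over the Internet.

```python
def best_prices(product, shops, how_many=3):
    """Return up to `how_many` (shop URL, price) pairs, cheapest first.

    `shops` maps a shop URL to a catalog dictionary of product -> price.
    """
    offers = []
    for shop_url, catalog in shops.items():
        price = catalog.get(product)
        if price is not None:              # skip shops that don't carry it
            offers.append((price, shop_url))
    offers.sort()
    return [(shop_url, price) for price, shop_url in offers[:how_many]]


# Hypothetical shopping sites for illustration.
shops = {
    "http://shop-a.example.com": {"jazz cd": 14.99, "rock cd": 12.50},
    "http://shop-b.example.com": {"jazz cd": 11.75},
    "http://shop-c.example.com": {"rock cd": 9.99},
}
print(best_prices("jazz cd", shops))
```

The agent hands back links rather than placing an order, matching the description above: you visit the cheapest site and buy there yourself.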
`
[Figure labels: Unprotected Web Server; Protected Web Server]

4 When robots and spiders do their work on an Internet site remote from where they were launched, they
can put an extra load on the site's system resources, for example, by swamping the server with too
many requests in too short a time. Because of this, some system administrators are interested in ways
of excluding robots in certain circumstances, such as not allowing robots into certain Web directories.
A variety of ways have been devised to limit robot access, including creating a file called robots.txt that
describes the areas off limits to robots, which the robots would read, adhere to, and not visit. But there is
nothing that guarantees the robots must adhere to this rule. It's up to the good faith of the person writing
the robot to adhere to it.
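
A well-behaved robot checks the exclusion file before visiting a page. The Python sketch below uses the standard library's `urllib.robotparser` module; the robots.txt contents, the robot name `ExampleBot`, and the URLs are made up for the example, and the file is parsed from a string rather than downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url, user_agent="ExampleBot"):
    """Report whether the rules above permit this robot to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/data.html"))
```

Nothing in this check is enforced by the server: the robot's author has to call it and honor the answer, which is the good-faith arrangement the text describes.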
`
[Figure: Web maintenance agent results. Labels: RESULTS; Bad Links: 42 out of 714;
Missing graphics: 12 out of 1142; Problems identified, corrective measu[...];
Create log file of all results?]