`
(12) United States Patent
     Gribble et al.

(10) Patent No.: US 8,196,205 B2
(45) Date of Patent: Jun. 5, 2012

(54) DETECTION OF SPYWARE THREATS WITHIN VIRTUAL MACHINE

(75) Inventors: Steven Gribble, Seattle, WA (US); Henry Levy, Seattle, WA (US); Alexander Moshchuk, Seattle, WA (US); Tanya Bragin, Seattle, WA (US)

(73) Assignee: University of Washington through its Center for Commercialization, Seattle, WA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1322 days.

(21) Appl. No.: 11/426,370

(22) Filed: Jun. 26, 2006

(65) Prior Publication Data
     US 2007/0174915 A1    Jul. 26, 2007

Related U.S. Application Data
(60) Provisional application No. 60/761,143, filed on Jan. 23, 2006, provisional application No. 60/787,804, filed on Mar. 31, 2006.

(51) Int. Cl.
     G06F 11/00    (2006.01)
     H04L 9/32     (2006.01)
(52) U.S. Cl.: 726/24; 713/168
(58) Field of Classification Search: 726/11, 726/22, 26; 713/168
     See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

2003/0195950 A1    10/2003  Huang et al.        709/219
2003/0212902 A1*   11/2003  van der Made        713/200
2003/0229900 A1    12/2003  Reisman             725/87
2004/0255165 A1*   12/2004  Szor                713/201
2005/0138427 A1*    6/2005  Cromer et al.       713/201
2005/0182940 A1*    8/2005  Sutton et al.       713/179
2005/0273856 A1*   12/2005  Huddleston          726/22
2006/0021029 A1     1/2006  Brickell et al.     726/22
2006/0021054 A1     1/2006  Costa et al.        726/25
2006/0031673 A1*    2/2006  Beck et al.         713/164
2006/0112342 A1*    5/2006  Bantz et al.        715/736
2006/0112416 A1*    5/2006  Ohta et al.         726/1
2006/0161982 A1*    7/2006  Chari et al.        726/23
2006/0236127 A1*   10/2006  Kurien et al.       713/193
2007/0136579 A1*    6/2007  Levy et al.         713/168
2007/0186212 A1     8/2007  Mazzaferri et al.   718/1
2007/0256073 A1    11/2007  Troung et al.       718/1
2009/0271867 A1*   10/2009  Zhang               726/24

OTHER PUBLICATIONS

Wang, Yi-Min, et al. "Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities." Microsoft Research, Technical Report, First Version: Jun. 4, 2005, Last Updated: Jul. 27, 2005.

* cited by examiner

Primary Examiner: Techane Gergiso
(74) Attorney, Agent, or Firm: University of Washington Center for Commercialization
`
(57) ABSTRACT

A system analyzes content accessed at a network site to determine whether it is malicious. The system employs a tool able to identify spyware that is piggy-backed on executable files (such as software downloads) and is able to detect "drive-by download" attacks that install software on the victim's computer when a page is rendered by a browser program. The tool uses a virtual machine (VM) to sandbox and analyze potentially malicious content. By installing and running executable files within a clean VM environment, commercial anti-spyware tools can be employed to determine whether a specific executable contains piggy-backed spyware. By visiting a Web page with an unmodified browser inside a clean VM environment, predefined "triggers," such as the installation of a new library, or the creation of a new process, can be used to determine whether the page mounts a drive-by download attack.

44 Claims, 5 Drawing Sheets
`
[Representative drawing: CHECKING URLS FOR DRIVE-BY ATTACKS AND SPYWARE INSTALLATION. Components: SPYPROXY; VIRTUAL MACHINE. Steps shown: REQUEST WEB URL; CREATE VM, LAUNCH BROWSER, DIRECT TO URL; BROWSER FETCHES URL & EMBEDDED OBJECTS; ANALYZE VM FOR INFECTIONS OR EVIDENCE OF DRIVE-BY ATTACK; RETURN RESULT OF ANALYSIS AND RETRIEVE CONTENT; IF SAFE, RETURN URL CONTENT TO CLIENT; RENDER URL CONTENT WITH USER BROWSER.]
`
`Juniper Ex. 1010-p.1
`Juniper v Huawei
`
`
`
U.S. Patent    Jun. 5, 2012    Sheet 1 of 5    US 8,196,205 B2

CRAWL THE WEB TO FIND & DOWNLOAD EXECUTABLE FILES TO A LOCAL STORAGE SYSTEM

FOR EACH EXECUTABLE FOUND, PERFORM THE FOLLOWING STEPS TO ANALYZE IT:

CLONE A NEW VIRTUAL MACHINE, CONTAINING A "CLEAN" INSTALLATION OF THE OPERATING SYSTEM

INSTALL THE EXECUTABLE, USE HEURISTICS TO NAVIGATE THROUGH INSTALLATION MENUS OR INSTALLER TOOLS

ANALYZE THE RESULTING VIRTUAL MACHINE, FOR EXAMPLE, BY RUNNING AN ANTI-SPYWARE TOOL SCAN TO DETECT INSTALLED SPYWARE

COLLECT RESULTS FROM ANALYSIS, AND BUILD A REPORT ON THE SAFETY OR RISK OF THE EXECUTABLE FILE

FIG. 1
`
`
`
`
U.S. Patent    Jun. 5, 2012    Sheet 2 of 5    US 8,196,205 B2

CRAWL THE WEB TO FIND WEB PAGES TO TEST

FOR EACH WEB PAGE FOUND TO TEST, PERFORM THE FOLLOWING STEPS:

CLONE A NEW VIRTUAL MACHINE, CONTAINING A "CLEAN" INSTALLATION OF THE OPERATING SYSTEM AND BROWSER USED FOR TESTING (E.G., MICROSOFT'S WINDOWS™ & INTERNET EXPLORER™)

FORCE THE BROWSER TO LOAD THE WEB PAGE THAT IS TO BE TESTED

LOOK FOR A "TRIGGER" TO FIRE, SUCH AS A NEW PROCESS BEING LAUNCHED, A FILE BEING CREATED/MODIFIED, OR A REGISTRY ENTRY BEING MODIFIED

IF A TRIGGER FIRES, DECLARE THE WEB PAGE TO BE SUSPICIOUS; OPTIONALLY, PERFORM AN ANTI-SPYWARE SCAN IN THE VIRTUAL MACHINE TO TEST FOR THE DEFINITIVE PRESENCE OF SPYWARE

FIG. 2
`
`
`
`
U.S. Patent    Jun. 5, 2012    Sheet 3 of 5    US 8,196,205 B2

[FIG. 3 (block diagram): PROCESSOR; MEMORY (RAM & ROM); NETWORK INTERFACE; I/O INTERFACE (PORTS); DISPLAY INTERFACE; INPUT DEVICES; DISPLAY DEVICE. Reference numerals shown: 100, 112, 116, 120, 122, 126, 128.]

[FIG. 4 (block diagram): WEBSITE (SPYWARE); COMPUTING DEVICE (WEB CRAWLER) with VIRTUAL MACHINE - BROWSER. Reference numeral shown: 140.]
`
`
`
`
U.S. Patent    Sheet 4 of 5

[Drawing text not recoverable.]
`
`
`
`
U.S. Patent    Jun. 5, 2012    Sheet 5 of 5    US 8,196,205 B2

[Drawing text not recoverable.]
`
`
`
`
`
`
`
`
DETECTION OF SPYWARE THREATS WITHIN VIRTUAL MACHINE

RELATED APPLICATIONS

This application is based on prior copending provisional applications, Ser. No. 60/761,143, filed on Jan. 23, 2006, and Ser. No. 60/787,804, filed Mar. 31, 2006, the benefit of the filing dates of which is hereby claimed under 35 U.S.C. § 119(e).

GOVERNMENT RIGHTS

This invention was made with U.S. Government support under grant No. CNS 0430477 awarded by the National Science Foundation (NSF). The U.S. Government has certain rights in the invention.
`
`BACKGROUND
`
In the span of just a few years, spyware has become the Internet's most "popular" download. A recent scan performed by America Online/National Cyber Security Alliance (AOL/NCSA) of 329 customers' computers found that 80% were infected with spyware programs. More shocking, each infected computer contained an average of 93 spyware components. As used herein and in the claims that follow, a definition of the term "spyware" provided by the online encyclopedia Wikipedia™ is applied. Wikipedia™ defines spyware as "a broad category of malicious software designed to intercept or take partial control of a computer's operation without the informed consent of that machine's owner or legitimate user." Wikipedia™ further notes that "while the term spyware taken literally suggests software that surreptitiously monitors the user, it has come to refer more broadly to software that subverts the computer's operation for the benefit of a third party." Adware, which displays advertising for a service or product, may be a form of spyware, if it is installed without a user's consent. Most users are willing to accept the display of sponsored popup advertising as a necessary result of being enabled to visit a Web page that provides a desired benefit at no other cost to the user. However, if the adware installs any software component on a user's computer without the user's knowledge or agreement, or continues displaying advertising when the user is accessing other sites, the adware is properly viewed as spyware.

While specific spyware may be designed to simply gather information that would generally be viewed as innocuous, such as logging the Web pages that a user visits for purposes of more effectively targeting advertising to customers, other forms of spyware can deliver unsolicited pop-up advertising when the user visits unrelated Web pages that don't benefit from sponsored advertising that is displayed, or the spyware can surreptitiously gather personal information about a user, including a user's Social Security number or credit card numbers, or can change a user's home page, or redirect Web page requests entered by a user to a different Web site, e.g., one that solicits the user to access pornography.

The consequences of spyware infections can be severe, and can include inundating the spyware victim with pop-up ads that open faster than a user can close them, or enabling the victim's financial information to be used by a third party to purchase merchandise or withdraw funds from a user's bank account, or for stealing passwords. Another form of spyware that is sometimes referred to as "malware" may even render the victim's computer useless. At the very least, the spyware installed on a computer diverts system and processor resources away from the tasks desired by a user and can dramatically slow computer response time in carrying out those tasks or in loading the desktop. In many cases, the user will not even be aware of what is causing these problems, since the installation of the spyware is done without the user's consent and knowledge.
Spyware typically installs itself surreptitiously through one of two methods. First, a user might choose to download software to which piggy-backed spyware code has been attached. For example, a user may initiate download of a desired utility file, and the piggy-backed spyware will be included with the download and automatically installed when the utility program is installed. Piggy-backed spyware is particularly common with file-sharing software. The file-sharing Kazaa™ system alone has been the source of hundreds of millions of spyware installations. Second, a user might visit a Web page that invisibly performs a "drive-by download" attack (sometimes also referred to herein as a "drive-by installation"), exploiting a vulnerability in the user's browser to install software without the user's consent. In each case, it is unlikely that the user will have any indication that spyware has been installed. It is only when the adverse effect of the spyware is experienced that a user may become aware that the spyware installed on the computer is preventing the user's computer from working as it did before becoming infected.

In previous work related to spyware, passive network monitoring was used to measure the extent to which four specific adware programs had spread through computers on the University of Washington campus. In a report of this work, the spyware problem was studied from a different perspective. Specifically, the study measured the extent to which: (1) executable Web content contains spyware; and, (2) Web pages contain embedded drive-by download attacks. Both studies confirmed the existence of a significant spyware problem.

The AOL/NCSA online safety study mentioned above conducted a poll of 329 households and also examined their computers for the presence of spyware. Over half of the respondents believed their machines were spyware-free. In reality, 80% of computers scanned were infected with spyware programs. The AOL/NCSA study did not attempt to identify how these computers became infected.

A recent edition of the "Communications of the ACM" contained over a dozen articles on the spyware problem. These articles discuss issues such as the public perception of spyware, security threats caused by spyware, and frameworks for assessing and categorizing spyware programs.

Many projects have examined the detection, measurement, and prevention of malware, such as worms and viruses. Some of their techniques may ultimately be applicable to the detection and prevention of spyware. None of the current approaches for identifying Web pages that install spyware are able to detect such a Web page on-the-fly, in real time, as a user is about to open the Web page in a browser or download an executable file.

Although a number of different commercially available programs can be employed to scan a computer system to detect known spyware, by the time that the spyware is thus detected and removed, the user may have experienced significant problems and the efficient operation of the user's computer may have been adversely impacted. An active Internet user can unknowingly be exposed to multiple sources of spyware each day, so that even if a spyware scanning program is used each evening while the computer is not otherwise in use, the spyware installed that day may already have adversely impacted the user before it can be detected and removed.
`
Accordingly, in addition to identifying Web pages that carry out drive-by installation of spyware and executable files that include piggy-backed spyware based on Web crawling by a dedicated entity, it would be desirable to seamlessly detect spyware in real time and on-the-fly, before it is installed on a user's computer system. It would also be desirable to provide this detection without the interaction of the user and to preclude the user from downloading Web pages and executable files that install spyware. In some cases, it may be desirable to detect the spyware in real time using a centralized computing device to which a user's computing device is connected. Alternatively, it may instead be desirable, for example for home use, to enable the user's computing device to detect spyware threats from Web pages and/or executable files before they are accessed by the user, or to employ some combination of these approaches.
`
SUMMARY

A system has been developed that looks for and identifies spyware-infected executables and Web pages on the Internet. In regard to executable files, the system implements an automated solution that addresses three problems. These problems are: (1) determining whether a Web object contains executable software; (2) downloading, installing, and executing that software within a virtual machine without direct user interaction; and, (3) analyzing whether the installation and execution of the software caused a spyware infection. In one exemplary prototype embodiment discussed below, a high-performance infrastructure was employed to solve these problems so that a large number of executables from a variety of sources could be analyzed in a reasonable amount of time. FIG. 1 shows a flowchart that illustrates the steps that the system takes in order to perform an analysis on an executable file. The following discussion describes each of these steps in detail.

There are several advantages provided by exemplary embodiments of the present concept, compared to the current approach taken by others. First, the present technique can examine executable file content for piggy-backed spyware programs in addition to examining Web pages for drive-by download attacks. Second, the study using this technique provides a rich analysis of the spyware that was encountered, including the areas of the Web that are most infected, and the fraction of spyware that contains malicious functions, such as modem dialing or Trojan downloading. Third, this study examined how spyware on the Web has changed over time. Fourth, the susceptibility of the Firefox™ browser to drive-by downloads was evaluated in the study, in addition to that of the Microsoft Internet Explorer™ (IE) browser.

A further aspect of this approach is directed to a system to combat both methods of spyware infection noted above and to prevent a user's computer from becoming infected with spyware, while the user is accessing a site on a network. An embodiment of the system transparently performs an on-the-fly analysis of: (1) executables that the user is attempting to download; and, (2) Web pages that the user is visiting, in order to detect piggy-backed spyware and drive-by download attacks before they can affect the user's computer. In at least some embodiments, the analysis is performed in real-time, generally as discussed above in connection with the approach used to carry out the study. It is also contemplated that instead of using a VM running on a centralized computing device (e.g., a server that provides the real-time analysis for one or more computers in a network), other embodiments may create and provide a clean operating system within a VM running on the user's computer to perform the analysis in real-time, on-the-fly.

If the system detects spyware or a drive-by download attack at a Web site, it blocks the associated malicious content from reaching the user's computer (i.e., from being installed on or adversely affecting the user's computing system), preventing the spyware or other type of related malware from causing harm. However, if an executable file is found not to include any piggy-backed spyware or if a Web page is not found to be attempting a drive-by installation of spyware or other undesired malware, the system permits the content to be accessed by the user's computer. Further details of the system and of the approach used therein are discussed below.

To block the malicious content, a plug-in or other type of software module may be installed to work with a browser program being operated by the user to visit sites on the Internet or other network and will be configured to control the browser program to inhibit the completion of a Web page download and rendering or the download of an executable file, until the analysis of the Web page or the executable file can be carried out on-the-fly, and it is determined that no spyware or other adverse software installation will result if the browser program is allowed to complete the download and rendering of the Web page or the download of the executable file. An object being downloaded by a user can also be downloaded to a "sandbox," so that the user's computer is protected from the object until the VM environment determines that it is safe to move the object from the sandbox to be rendered in the user's Web browser (if a Web page), or installed on the user's computer (if an executable file).

Since the on-the-fly analysis of a Web page or executable file may slightly delay the rendering of safe Web pages or the download of safe executables, in at least some exemplary embodiments, it may be desirable to download the Web page/executable file into the VM environment and into the user environment in parallel. In this case, the download and rendering of the Web page or the download of the executable file into the user environment would not be enabled to complete until the analysis of the Web page or executable is completed in the VM environment.

This Summary has been provided to introduce a few concepts in a simplified form that are further described in detail below in the Description. However, this Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

DRAWINGS

Various aspects and attendant advantages of one or more exemplary embodiments and modifications thereto will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a flowchart illustrating the logical steps taken by an exemplary embodiment of an executable file analysis tool and the accompanying text describes the steps taken in this embodiment of the present approach to analyze executable files from the Web, in order to determine whether they contain piggy-backed spyware;

FIG. 2 is a flowchart illustrating the logical steps taken by an exemplary embodiment of a drive-by download attack detection tool and indicating the steps taken in an embodiment of this novel approach to find and analyze Web pages to determine whether they perform a drive-by attack, and/or whether they install spyware;

FIG. 3 is a schematic block diagram of a generally conventional computing device that is suitable for use in carrying out the novel approach disclosed herein;

FIG. 4 is a schematic block diagram illustrating use of a browser executed within a virtual machine (VM) environment of one example, and employed to crawl the Web to detect spyware threats;

FIG. 5A is a schematic block diagram illustrating the components and the steps employed in an embodiment of the present approach to check executable files requested for download by a user, to determine if the executable files are conveying or attempting to install piggy-backed spyware; and

FIG. 5B is a schematic block diagram illustrating the components and the steps employed in the present approach to check uniform resource locators (URLs) requested by a user, to determine if the URLs will cause drive-by attacks and/or install spyware.
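
As a rough illustration of the blocking behavior outlined in the Summary, in which content is withheld from the user environment until the VM-based analysis has returned a verdict, the following minimal Python sketch may be helpful. The names `gate_request`, `fetch`, and `analyze_in_vm` are hypothetical and do not come from the disclosure; the sketch only assumes that the download and the analysis are started in parallel and that nothing is released before the verdict is known.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def gate_request(url: str,
                 fetch: Callable[[str], bytes],
                 analyze_in_vm: Callable[[str], bool]) -> Optional[bytes]:
    """Return the content for `url` only if the VM analysis judged it safe.

    `fetch` and `analyze_in_vm` are hypothetical callables supplied by the
    caller: the first downloads content for the user environment, the second
    returns True when no spyware or drive-by attack was observed in the VM.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start the user-side download and the VM analysis in parallel.
        content_future = pool.submit(fetch, url)
        verdict_future = pool.submit(analyze_in_vm, url)
        is_safe = verdict_future.result()   # wait for the verdict
        content = content_future.result()   # wait for the download
    # Release the content only after a "safe" verdict; otherwise drop it.
    return content if is_safe else None
```

Running both tasks concurrently means that a safe page is delayed only by the slower of the two operations, which is the motivation the Summary gives for the parallel download.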
`
DESCRIPTION

Figures and Disclosed Embodiments are not Limiting

Exemplary embodiments are illustrated in referenced Figures of the drawings. It is intended that the embodiments and Figures disclosed herein are to be considered illustrative rather than restrictive.

Spyware-Infected Executables on the Web

FIG. 1 illustrates the steps that are carried out in an exemplary embodiment of a system that was configured to crawl the Web (or any other designated network) to identify executable files that attempt to carry out installation of spyware or other undesired software instructions. A step 10 indicates that a crawling program was used to search the Web to find executable files at various Web sites and download the executable files to a local storage system, e.g., to a hard drive.

In a study performed using an exemplary prototype of the technology discussed below, it was assumed that a Web object was an executable if either: (1) the Content-type hypertext transfer protocol (HTTP) header provided by a Web server when downloading the object was associated with an executable (e.g., application/octet-stream); or, (2) its URL contained an extension known to be associated with executables and installers (e.g., .exe, .cab, or .msi). Once a Web object was downloaded, well-known signatures at the beginning of the file were examined to help identify its type. If a file's type could not be identified, it was assumed that it was not an executable and need not be analyzed. While such an assumption may miss some executables, it rarely produces false positives. Accordingly, applying this assumption may underestimate the number of executable files on the Web, but is unlikely to overestimate the number.

Some executable files on the Web are not immediately obvious to a Web crawler. Two instances of this are executables embedded in archives (such as compressed ZIP files), and executables whose URLs are hidden in JavaScript. To handle the first case, archive files were downloaded and extracted, while looking for filenames with extensions associated with executables. To handle the second case, the Web crawler scanned JavaScript content looking for URLs and added them to the list of pages to crawl. Note that JavaScript programs can dynamically construct URLs when interpreted. Since the Web crawler does not execute JavaScript code, it missed any executables whose URLs might have been dynamically constructed using JavaScript.

Running Executables within a VM

As indicated in a step 12 in FIG. 1, for each executable found, a number of steps were carried out to analyze the executable file. Each executable file that was downloaded was installed and run in a clean VM, as indicated in a step 14. This approach was challenging; while it is simple to run a "naked" executable file, software is often distributed using an installer framework, such as Windows Installer™. Unfortunately, installers typically interact with users, requiring them to perform manual tasks such as agreeing to an End User License Agreement (EULA), filling in demographic information, pressing buttons to begin the installation process, or indicating agreement to proceed with default options.

To automate the execution of installer frameworks, a software tool was developed that uses heuristics to simulate common user interactions, as indicated in a step 16. For example, some of these heuristics identify and click on permission-granting buttons such as "next," "OK," "run," "install," or "I agree." Other heuristics identify and select appropriate radio buttons or check-boxes by looking for labels commonly associated with EULA agreements. The tool also looks for type-in boxes that prompt a user for information, such as name or email address, and fills in the boxes with dummy information. While this tool cannot handle all installation scenarios perfectly, it was verified that the tool successfully navigates all popular installer frameworks and it was rarely seen to fail in completing an executable installation.

Since this exemplary approach and study focused on Windows™ executables, for each executable that was analyzed in the study, a VM was first created that contained a clean Windows XP™ guest operating system (OS) image. To provide the clean guest virtual machine environment and OS, the "snapshot take" and "snapshot revert" functions provided in VMware Workstation 5.0™ running on a Linux™ host OS were used. For each node within a cluster, a pool of VMs was maintained on a plurality of computers. When it was desired to analyze an executable, a VM from this pool was allocated, the VM was rolled back to a clean checkpoint to ensure that no residual changes due to the installation of a previous executable remained, the executable or installer image was injected into the VM, and the tool was employed to automatically install and execute the program using the heuristic capability, so that user intervention was not required.

Analyzing the Installed Executable and its Effect(s) on the VM Environment

Once an executable was installed and run in a VM, the final challenge was to determine whether that executable had infected the VM with spyware, as indicated in a step 18. To make this determination in this exemplary embodiment, the Lavasoft AdAware™ anti-spyware tool was automatically run in the VM, using scripts to launch the tool and collect the infection analysis from the logs that were produced. The log information that was collected in a step 20 of FIG. 1 was sufficiently rich to identify specific spyware programs that were installed. Using online databases of previously identified spyware programs, the functions that those spyware programs contained, such as keystroke logging, adware, Trojan backdoors, or browser hijacking, were also manually classified. Of course, AdAware can detect only those spyware programs that have signatures included within its detection database. Accordingly, this analysis missed spyware programs that AdAware did not find. Also note that information was collected only about spyware software that was installed. Although many anti-spyware tools such as AdAware also identify malicious cookies or registry entries as spyware threats, these were excluded, so as to focus only on spyware software. To speed up the AdAware sweep, the Windows XP image installed in the VM was pruned to eliminate nonessential and unnecessary functionality and features, so that it contained as few files and ran as few components as possible. The host firewall and automatic updates were also disabled, so as not to interfere with the analysis or with installation of spyware in the VM environment.
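
The per-executable sequence described above (revert a pooled VM to a clean checkpoint, inject the installer, install and run it without user input, then scan and collect the results) can be pictured with the following minimal Python sketch. Everything named here is an assumption made for illustration: `VirtualMachine` is a hypothetical interface rather than any actual VMware API, and `FakeVM` is an in-memory stand-in that exists only so the control flow can be exercised.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class VirtualMachine(Protocol):
    """Hypothetical VM handle; not an actual VMware interface."""
    def revert_to_clean_snapshot(self) -> None: ...
    def inject_file(self, name: str, data: bytes) -> None: ...
    def install_and_run(self, name: str) -> None: ...
    def run_antispyware_scan(self) -> List[str]: ...


@dataclass
class Report:
    executable: str
    detections: List[str] = field(default_factory=list)

    @property
    def infected(self) -> bool:
        return bool(self.detections)


def analyze_executable(vm: VirtualMachine, name: str, data: bytes) -> Report:
    vm.revert_to_clean_snapshot()            # drop residue of earlier runs
    vm.inject_file(name, data)               # copy the installer into the guest
    vm.install_and_run(name)                 # heuristic-driven, no user input
    detections = vm.run_antispyware_scan()   # names reported by the scanner
    return Report(executable=name, detections=list(detections))


class FakeVM:
    """In-memory stand-in used only to exercise the control flow."""
    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.installed: List[str] = []

    def revert_to_clean_snapshot(self) -> None:
        self.files.clear()
        self.installed.clear()

    def inject_file(self, name: str, data: bytes) -> None:
        self.files[name] = data

    def install_and_run(self, name: str) -> None:
        self.installed.append(name)

    def run_antispyware_scan(self) -> List[str]:
        # Pretend that any installed payload containing b"SPY" is detected.
        return [n for n in self.installed if b"SPY" in self.files[n]]
```

Reverting before every run, rather than after, means a VM left in an unknown state by a failed analysis is still cleaned before it is reused.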
Performance

The executable analysis infrastructure was hosted on a ten-node cluster consisting of dual-processor, 2.8 GHz Intel Corp. Pentium 4™ machines, each with 4 GB of RAM and single 80 GB, 7200 RPM hard drives. On average, it took 92 seconds to create a clean VM, install an executable, run it, and perform an AdAware™ sweep. Of this time, around 1-2 seconds was spent creating the VM, 55 seconds was required for installing and running the executable in the VM, and 35 seconds performing the AdAware™ sweep of the VM environment after the executable was installed and run. By parallelizing the analysis to run one VM per processor in the cluster, it was possible to analyze 18,782 executables per day, in this exemplary test configuration. In practice, it was found that the bottleneck of the system, i.e., the slowest part of the process, was crawling the Web to find and download executables, rather than analyzing the executables that were thus found.
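
The daily throughput figure is consistent with the stated per-executable time and degree of parallelism. The short Python calculation below reproduces it, assuming that all 20 processors (ten nodes, two processors each, one VM per processor) were kept busy around the clock; the constant names are chosen here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
NODES = 10                        # ten-node cluster (from the text)
PROCESSORS_PER_NODE = 2           # dual-processor machines (from the text)
SECONDS_PER_EXECUTABLE = 92       # average end-to-end time (from the text)

# One VM per processor, as described above.
parallel_vms = NODES * PROCESSORS_PER_NODE

# Whole executables completed per day across all VMs.
per_day = SECONDS_PER_DAY * parallel_vms // SECONDS_PER_EXECUTABLE
print(per_day)  # prints 18782
```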
Exemplary Results from Using the Tool

The Heritrix™ public domain Web crawler was used to gather a crawl of over 2,500 Internet Web sites in this study. To understand how spyware had penetrated different regions of the Web, sites from eight different categories were crawled, including: adult entertainment sites, celebrity-oriented sites, game-oriented sites, kids' sites, music sites, online news sites, pirate/warez sites, and screensaver or "wallpaper" sites. In addition, C|net's™ download.com shareware site, which provides a large number of downloadable executables, was crawled.
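
Each object encountered during such a crawl has to be classified as an executable or not, using the two-stage rule described earlier: a pre-download test on the Content-type header or the URL extension, followed by a check of the leading bytes of the downloaded file. The Python sketch below is one possible reading of that rule. The function names are hypothetical, and the particular signatures listed ("MZ" for DOS/PE executables, "MSCF" for Cabinet archives, and the OLE compound-file header used by .msi packages) are well-known values chosen here for illustration, not values taken from the disclosure.

```python
from typing import Optional
from urllib.parse import urlparse

EXECUTABLE_CONTENT_TYPES = {"application/octet-stream"}
EXECUTABLE_EXTENSIONS = (".exe", ".cab", ".msi")

# Well-known leading signatures (an illustrative, incomplete list).
SIGNATURES = {
    b"MZ": "exe",                                # DOS/PE executable
    b"MSCF": "cab",                              # Microsoft Cabinet archive
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "msi",  # OLE compound file (.msi, among others)
}


def is_candidate(url: str, content_type: str) -> bool:
    """Pre-download test: the header OR the URL extension suggests an executable."""
    mime = content_type.split(";")[0].strip().lower()
    path = urlparse(url).path.lower()
    return mime in EXECUTABLE_CONTENT_TYPES or path.endswith(EXECUTABLE_EXTENSIONS)


def identify_type(head: bytes) -> Optional[str]:
    """Post-download test: return a type name, or None if unrecognized."""
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None  # unidentified files are treated as non-executable
```

Returning `None` for unrecognized files mirrors the conservative assumption in the text: some executables may be missed, but false positives are rare.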
`
occurs when a victim visits a Web page that contains malicious content, i.e., tries to install spyware or attempts to modify the computing environment in a manner that the user considers undesirable. An example is JavaScript embedded in hypertext markup language (HTML) for a Web page that is designed to exploit a vulnerability in the victim's Web browser program. A successful drive-by download lets the attacking Web page install and run arbitrary software on the victim's computer. The primary challenge in detecting drive-by attacks is performing an automated analysis of content on a Web page to determine whether it contains attack code. Fortunately, a simple solution was found: it was assumed that a drive-by download attack would attempt to break out of the security sandbox implemented by the Web browser program, e.g., by modifying system files or modifying/adding (OS) registry entries. To recognize such an attack, the Web page was rendered using an unmodified browser program running in the VM, and an attempt was made to detect when the sandbox provided by the normal constraints of the Web browser program had been violated.
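
One way to picture the detection of a sandbox violation is to compare the state of the VM before and after the page is rendered and to report any change as a fired trigger. The Python sketch below assumes a hypothetical `Snapshot` record holding the running processes, file hashes, and registry values of the guest; how such a snapshot would actually be captured is outside the scope of the sketch and is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class Snapshot:
    """Hypothetical record of guest state at one point in time."""
    processes: FrozenSet[str]
    files: Dict[str, str]       # path -> content hash
    registry: Dict[str, str]    # key -> value


def fired_triggers(before: Snapshot, after: Snapshot) -> List[str]:
    """List the changes between two snapshots that count as triggers."""
    triggers = []
    # A process that was not running before the page was loaded.
    for name in sorted(after.processes - before.processes):
        triggers.append(f"process-created:{name}")
    # Files that appeared or whose contents changed.
    for path in sorted(after.files):
        if path not in before.files:
            triggers.append(f"file-created:{path}")
        elif after.files[path] != before.files[path]:
            triggers.append(f"file-modified:{path}")
    # Registry entries that were added or modified.
    for key in sorted(after.registry):
        if before.registry.get(key) != after.registry[key]:
            triggers.append(f"registry-modified:{key}")
    return triggers
```

An empty result means no trigger fired for that page; any non-empty result would mark the page as suspicious under the approach described above.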
The flowchart of FIG. 2 illustrates the steps that an exemplary embodiment of the present approach employed in order to perform an analysis of a Web page to determine if it was attempting a drive-by download of spyware or attempting to perform other malicious and undesired acts. This approach also used a crawler program to crawl the Web or other designated network, searching for pages in a markup language (e.g., HTML) to test, in a step 30. For each Web page (or other page in markup language format) that was found, a step 32 performed a plurality of steps to determine if the page was at least attempting to carry out a drive-by installation of spyware or other undesired and malicious program code. In a step 34, for each Web page found in the exemplary embodiment, a new VM was cloned containing a clean installation of the operating system and of a browser program selected for testing. For
`
`TABLE 1.
`
`Executable file results. The number of pages crawled, domains crawled, executables analyzed, and
`infected exe