Expert Reviews, Usability Testing, Surveys, and Continuing Assessments

The test of what is real is that it is hard and rough. . . . What is pleasant belongs in dreams.

Simone Weil, Gravity and Grace, 1947

Apple Inc.
Exhibit 1018
Page 142

Chapter 4

4.1 Introduction
4.2 Expert Reviews
4.3 Usability Testing and Laboratories
4.4 Surveys
4.5 Acceptance Tests
4.6 Evaluation During Active Use
4.7 Controlled Psychologically Oriented Experiments
4.8 Practitioner's Summary
4.9 Researcher's Agenda

4.1 Introduction

Designers can become so entranced with their creations that they may fail to evaluate those objects adequately. Experienced designers have attained the wisdom and humility to know that extensive testing is a necessity. If feedback is the "breakfast of champions," then testing is the "dinner of the gods." However, careful choices must be made from the large menu of evaluation possibilities to create a balanced meal.
The determinants of the evaluation plan include (Nielsen, 1993; Hix and Hartson, 1993; Preece et al., 1994; Newman and Lamming, 1995):

• Stage of design (early, middle, late)
• Novelty of project (well defined versus exploratory)
• Number of expected users
• Criticality of the interface (for example, a life-critical medical system versus museum-exhibit support)
• Costs of product and finances allocated for testing
• Time available
• Experience of the design and evaluation team

The range of evaluation plans might run from an ambitious two-year test with multiple phases for a new national air-traffic-control system to a three-day test with six users for a small internal accounting system. The range of costs might be from 10 percent of a project down to 1 percent.
A few years ago, it was just a good idea to get ahead of the competition by focusing on usability and doing testing, but now the rapid growth of interest in usability means that failure to test is risky indeed. The dangers are not only that the competition has strengthened, but also that customary engineering practice now requires adequate testing. Failure to perform and document testing could lead to failed contract proposals or malpractice lawsuits from users when errors arise. At this point, it is irresponsible to bypass some form of usability testing.
One troubling aspect of testing is the uncertainty that remains even after exhaustive testing by multiple methods. Perfection is not possible in complex human endeavors, so planning must include continuing methods to assess and repair problems during the lifecycle of an interface. Second, even though problems may continue to be found, at some point a decision has to be made about completing prototype testing and delivering the product. Third, most testing methods will account appropriately for normal usage, but performance with high levels of input, such as in nuclear-reactor-control or air-traffic-control emergencies, is extremely difficult to test. Development of testing methods to deal with stressful situations and even with partial equipment failures will have to be undertaken as user interfaces are developed for an increasing number of life-critical applications.
The Usability Professionals Association was founded in 1991 to exchange information among workers in this arena. The annual conference focuses attention on forms of usability evaluations and provides a forum for exchanges of ideas among the more than 4000 members.

4.2 Expert Reviews

While informal demos to colleagues or customers can provide some useful feedback, more formal expert reviews have proved to be effective (Nielsen and Mack, 1994). These methods depend on having experts available on staff or as consultants, whose expertise may be in the application or user-interface domains. Expert reviews can be conducted on short notice and rapidly.

Expert reviews can occur early or late in the design phase, and the outcomes can be a formal report with problems identified or recommendations for changes. Alternatively, the expert review could result in a discussion with or presentation to designers or managers. Expert reviewers should be sensitive to the design team's ego involvement and professional skill, so suggestions should be made cautiously: it is difficult for someone freshly inspecting a system to understand all the design rationale and development history. The reviewer notes possible problems for discussion with the designers, but solutions generally should be left for the designers to produce. Expert reviews usually entail one-half day to one week, although a lengthy training period may be required to explain the task domain or operational procedures. It may be useful to have the same as well as fresh expert reviewers as the project progresses. There are a variety of expert-review methods from which to choose:

• Heuristic evaluation  The expert reviewers critique an interface to determine conformance with a short list of design heuristics, such as the eight golden rules (Chapter 2). It makes an enormous difference if the experts are familiar with the rules and are able to interpret and apply them.
• Guidelines review  The interface is checked for conformance with the organizational or other guidelines document. Because guidelines documents may contain a thousand items, it may take the expert reviewers some time to master the guidelines, and days or weeks to review a large system.
• Consistency inspection  The experts verify consistency across a family of interfaces, checking for consistency of terminology, color, layout, input and output formats, and so on within the interface as well as in the training materials and online help.
• Cognitive walkthrough  The experts simulate users walking through the interface to carry out typical tasks. High-frequency tasks are a starting point, but rare critical tasks, such as error recovery, also should be walked through. Some form of simulating a day in the life of the user should be part of the expert-review process. Cognitive walkthroughs were developed for interfaces that can be learned by exploratory browsing (Wharton et al., 1994), but they are useful even for interfaces that require substantial training. An expert might try the walkthrough privately and explore, but then there also would be a group meeting with designers, users, or managers to conduct the walkthrough and to provoke a discussion. This public walkthrough is based on the successful code walkthroughs promoted in software engineering (Yourdon, 1989).
• Formal usability inspection  The experts hold a courtroom-style meeting, with a moderator or judge, to present the interface and to discuss its merits and weaknesses. Design-team members may rebut the evidence about problems in an adversarial format. Formal usability inspections
can be educational experiences for novice designers and managers, but they may take longer to prepare and more personnel to carry out than do other types of review.

Expert reviews can be scheduled at several points in the development process, when experts are available and when the design team is ready for feedback. The number of expert reviews will depend on the magnitude of the project and on the amount of resources allocated.
Comparative evaluation of expert-review methods and usability-testing methods is difficult because of the many uncontrollable variables; however, the studies that have been conducted provide evidence for the benefits of expert reviews (Jeffries et al., 1991; Karat et al., 1992). Different experts tend to find different problems in an interface, so three to five expert reviewers can be highly productive, as can complementary usability testing.
Expert reviewers should be placed in the situation most similar to the one that intended users will experience. The expert reviewers should take training courses, read manuals, take tutorials, and try the system in as close as possible to a realistic work environment, complete with noise and distractions. In addition, expert reviewers may also retreat to a quieter environment for detailed review of each screen.
Getting a bird's-eye view of an interface by studying a full set of printed screens laid out on the floor or pinned to walls has proved to be enormously fruitful in detecting inconsistencies and spotting unusual patterns.
The dangers with expert reviews are that the experts may not have an adequate understanding of the task domain or user communities. Experts come in many flavors, and conflicting advice can further confuse the situation (cynics say, "For every PhD, there is an equal and opposite PhD"). To strengthen the possibility of a successful expert review, it helps to choose knowledgeable experts who are familiar with the project situation and who have a long-term relationship with the organization. These people can be called back to see the results of their intervention, and they can be held accountable. Moreover, even experienced expert reviewers have great difficulty knowing how typical users, especially first-time users, will behave.

4.3 Usability Testing and Laboratories

The emergence of usability testing and laboratories since the early 1980s is an indicator of the profound shift in attention to user needs. Traditional managers and developers resisted at first, saying that usability testing seemed like a nice idea, but that time pressures or limited resources prevented them from trying it. As experience grew and successful projects gave credit to the testing process, demand swelled and design teams began to compete for the scarce
resource of the usability-laboratory staff. Managers came to realize that having a usability test on the schedule was a powerful incentive to complete a design phase. The usability-test report provided supportive confirmation of progress and specific recommendations for changes. Designers sought the bright light of evaluative feedback to guide their work, and managers saw fewer disasters as projects approached delivery dates. The remarkable surprise was that usability testing not only sped up many projects, but also produced dramatic cost savings (Gould, 1988; Gould et al., 1991; Karat, 1994).
Usability-laboratory advocates split from their academic roots as these practitioners developed innovative approaches that were influenced by advertising and market research. While academics were developing controlled experiments to test hypotheses and support theories, practitioners developed usability-testing methods to refine user interfaces rapidly. Controlled experiments have at least two treatments and seek to show statistically significant differences; usability tests are designed to find flaws in user interfaces. Both strategies use a carefully prepared set of tasks, but usability tests have fewer subjects (maybe as few as three), and the outcome is a report with recommended changes, as opposed to validation or rejection of hypotheses. Of course, there is a useful spectrum of possibilities between rigid controls and informal testing, and sometimes a combination of approaches is appropriate.
The movement toward usability testing stimulated the construction of usability laboratories (Dumas and Redish, 1993; Nielsen, 1993). Many organizations spent modest sums to build a single usability laboratory, while IBM built an elaborate facility in Boca Raton, Florida, with 16 laboratories in a circular arrangement with a centralized database for logging usage and recording performance. Having a physical laboratory makes an organization's commitment to usability clear to employees, customers, and users (Nielsen, 1994) (Fig. 4.1). A typical modest usability laboratory would have two 10- by 10-foot areas, one for the participants to do their work and another, divided by a half-silvered mirror, for the testers and observers (designers, managers, and customers) (Fig. 4.2). IBM was an early leader in developing usability laboratories; Microsoft started later, but embraced the idea forcefully, and hundreds of software-development companies have followed suit. A consulting community that will do usability testing for hire also has emerged.
The usability laboratory is typically staffed by one or more people with expertise in testing and user-interface design, who may serve 10 to 15 projects per year throughout the organization. The laboratory staff meet with the user-interface architect or manager at the start of the project to make a test plan with scheduled dates and budget allocations. Usability-laboratory staff participate in early task analysis or design reviews, provide information on software tools or literature references, and help to develop the set of tasks for the usability test. Two to six weeks before the usability test, the detailed test plan is developed, comprising the list of tasks, plus subjective satisfaction and debriefing questions. The number, types, and source of participants are identified; sources, for example, might be customer sites, temporary personnel agencies, or advertisements placed in newspapers. A pilot test of the procedures, tasks, and questionnaires, with one to three subjects, is conducted one week ahead of time, while there is still time for changes. This stereotypic preparation process can be modified in many ways to suit each project's unique needs.

Figure 4.1

Usability lab test, with subject and observer seated at a workstation. Video recorders capture the user's actions and the contents of the screens, while microphones capture thinking-aloud comments. (Used with permission of Sun Microsystems, Mountain View, CA.)

After changes are approved, participants are chosen to represent the intended user communities, with attention to background in computing, experience with the task, motivation, education, and ability with the natural language used in the interface. Usability-laboratory staff also must control for physical concerns (such as eyesight, left- versus right-handedness, age, and gender), and for other experimental conditions (such as time of day, day of week, physical surroundings, noise, room temperature, and level of distractions).

Figure 4.2

Usability lab control room, with test controllers and observers watching the subject through a half-silvered window. Video controls allow zooming and panning to focus on user actions. (Used with permission of Sun Microsystems, Mountain View, CA.)

Participants should always be treated with respect and should be informed that it is not they who are being tested; rather, it is the software and user interface that are under study. They should be told about what they will be doing (for example, typing text into a computer, creating a drawing using a mouse, or getting information from a touchscreen kiosk) and how long they will be expected to stay. Participation should always be voluntary, and informed consent should be obtained. Professional practice is to ask all subjects to read and sign a statement like this one:

• I have freely volunteered to participate in this experiment.
• I have been informed in advance what my task(s) will be and what procedures will be followed.
• I have been given the opportunity to ask questions and have had my questions answered to my satisfaction.
• I am aware that I have the right to withdraw consent and to discontinue participation at any time, without prejudice to my future treatment.
• My signature below may be taken as affirmation of all the above statements; it was given prior to my participation in this study.

An effective technique during usability testing is to invite users to think aloud about what they are doing. The designer or tester should be supportive of the participants, not taking over or giving instructions, but prompting and listening for clues about how they are dealing with the interface. After a suitable
time period for accomplishing the task list (usually one to three hours), the participants can be invited to make general comments or suggestions, or to respond to specific questions. The informal atmosphere of a thinking-aloud session is pleasant, and often leads to many spontaneous suggestions for improvements. In their efforts to encourage thinking aloud, some usability laboratories found that having two participants working together produces more talking, as one participant explains procedures and decisions to the other.
Videotaping participants performing tasks is often valuable for later review and for showing designers or managers the problems that users encounter (Lund, 1985). Reviewing videotapes is a tedious job, so careful logging and annotation during the test is vital to reduce the time spent finding critical incidents (Harrison, 1991). Participants may be anxious about the video cameras at the start of the test, but within minutes they usually focus on the tasks and ignore the videotaping. The reaction of designers to seeing videotapes of users failing with their system is sometimes powerful and may be highly motivating. When designers see subjects repeatedly picking the wrong menu item, they realize that the label or placement needs to be changed. Most usability laboratories have acquired or developed software to facilitate logging of user activities (typing, mousing, reading screens, reading manuals, and so on) by observers with automatic time stamping.
At each design stage, the interface can be refined iteratively, and the improved version can be tested. It is important to fix quickly even small flaws, such as spelling errors or inconsistent layout, since they influence user expectations.
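The observer logging described above can be sketched as a small tool. This is a hypothetical illustration, not any particular laboratory's software; the activity codes, class name, and log format are all invented for the example.

```python
import csv
import time

# Minimal observer-logging tool (hypothetical): each code entered by the
# observer records a user activity with an automatic time stamp.
ACTIVITIES = {
    "t": "typing",
    "m": "mousing",
    "s": "reading screen",
    "d": "reading manual",
    "e": "critical incident",
}

class SessionLog:
    def __init__(self, participant_id):
        self.participant_id = participant_id
        self.start = time.monotonic()
        self.events = []  # (seconds since start, activity, note)

    def log(self, code, note=""):
        elapsed = time.monotonic() - self.start
        self.events.append((round(elapsed, 2), ACTIVITIES[code], note))

    def save(self, path):
        # Write the annotated log so reviewers can jump straight to
        # critical incidents instead of scanning hours of videotape.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["participant", "seconds", "activity", "note"])
            for secs, activity, note in self.events:
                writer.writerow([self.participant_id, secs, activity, note])

log = SessionLog("P03")
log.log("t", "entered search terms")
log.log("e", "picked wrong menu item: Format instead of File")
critical = [e for e in log.events if e[1] == "critical incident"]
print(len(critical))  # one critical incident logged
```

Because every event carries a time stamp relative to the session start, the log can be synchronized with the videotape to locate each incident quickly.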
Many variant forms of usability testing have been tried. Nielsen's (1993) discount usability engineering, which advocates quick and dirty approaches to task analysis, prototype development, and testing, has been widely influential because it lowered the barriers to newcomers.
Field tests attempt to put new interfaces to work in realistic environments for a fixed trial period. Field tests can be made more fruitful if logging software is used to capture error, command, and help frequencies, plus productivity measures. Portable usability laboratories with videotaping and logging facilities have been developed to support more thorough field testing. A different kind of field testing supplies users with test versions of new software. The largest field test of all time was probably the beta-testing of Microsoft's Windows 95, in which reportedly 400,000 users internationally received early versions and were asked to comment.
Early usability studies can be conducted using paper mockups of screen displays to assess user reactions to wording, layout, and sequencing. A test administrator plays the role of the computer by flipping the pages while asking a participant user to carry out typical tasks. This informal testing is inexpensive and rapid, and usually is productive.
Game designers pioneered the can-you-break-this approach to usability testing by providing energetic teenagers with the challenge of trying to beat
new games. This destructive-testing approach, in which the users try to find fatal flaws in the system or otherwise to destroy it, has been used in other projects and should be considered seriously. Software purchasers have little patience with flawed products, and the cost of sending out tens of thousands of replacement disks is one that few companies can bear.
Competitive usability testing can be used to compare a new interface to previous versions or to similar products from competitors. This approach is close to a controlled experimental study, and staff must be careful to construct parallel sets of tasks and to counterbalance the order of presentation of the interfaces. Within-subjects designs seem more powerful because participants can make comparisons between the competing interfaces, so fewer participants are needed, although each will be needed for a longer time period.
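For two interfaces, counterbalancing means that half the participants use one interface first and half use the other first, so practice and fatigue effects cancel out. A minimal sketch, with invented participant and interface labels:

```python
from itertools import permutations

# Hypothetical competitive test of two interfaces, A (new) and B (previous
# version). Each participant performs parallel task sets on both, and the
# order of presentation is counterbalanced across participants.
interfaces = ["A", "B"]
orders = list(permutations(interfaces))  # [("A", "B"), ("B", "A")]

participants = ["P1", "P2", "P3", "P4", "P5", "P6"]
schedule = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

for p, order in schedule.items():
    print(p, "->", " then ".join(order))

# Every order appears equally often, so learning and fatigue are
# balanced between the two interfaces.
first_a = sum(1 for o in schedule.values() if o[0] == "A")
print(first_a, "of", len(participants), "participants start with A")
```

With more than two interfaces, a Latin-square assignment serves the same purpose while keeping the number of participants manageable.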
For all its success, usability testing does have at least two serious limitations: it emphasizes first-time usage and has limited coverage of the interface features. Since usability tests usually run two to four hours, it is difficult to ascertain how performance will be after a week or a month of regular usage. Within the typical two to four hours of a usability test, the participants may get to use only a small fraction of the features, menus, dialog boxes, or help screens. These and other concerns have led design teams to supplement usability testing with the varied forms of expert reviews.

4.4 Surveys

Written user surveys are a familiar, inexpensive, and generally acceptable companion for usability tests and expert reviews. Managers and users grasp the notion of surveys, and the typically large numbers of respondents (hundreds to thousands of users) offer a sense of authority compared to the potentially biased and highly variable results from small numbers of usability-test participants or expert reviewers. The keys to successful surveys are clear goals in advance and then development of focused items that help to attain those goals. Experienced surveyors know that care is also needed during administration and data analysis (Oppenheim, 1992).
A survey form should be prepared, reviewed among colleagues, and tested with a small sample of users before a large-scale survey is conducted. Similarly, statistical analyses (beyond means and standard deviations) and presentations (histograms, scatterplots, and so on) should also be developed before the final survey is distributed. In short, directed activities are more successful than unplanned statistics-gathering expeditions (no wild goose chases, please). My experience is that directed activities also seem to provide the most fertile frameworks for unanticipated discoveries.

Survey goals can be tied to the components of the OAI model of interface design (see Section 2.3). Users could be asked for their subjective impressions about specific aspects of the interface, such as the representation of

• Task-domain objects and actions
• Interface-domain metaphors and action handles
• Syntax of inputs and design of displays

Other goals would be to ascertain the user's

• Background (age, gender, origins, education, income)
• Experience with computers (specific applications or software packages, length of time, depth of knowledge)
• Job responsibilities (decision-making influence, managerial roles, motivation)
• Personality style (introvert versus extravert, risk taking versus risk averse, early versus late adopter, systematic versus opportunistic)
• Reasons for not using an interface (inadequate services, too complex, too slow)
• Familiarity with features (printing, macros, shortcuts, tutorials)
• Feelings after using an interface (confused versus clear, frustrated versus in control, bored versus excited)

Online surveys avoid the cost and effort of printing, distributing, and collecting paper forms. Many people prefer to answer a brief survey displayed on a screen, instead of filling in and returning a printed form, although there is a potential bias in the self-selected sample. One survey of World Wide Web utilization generated more than 13,000 respondents. So that costs are kept low, surveys might be administered to only a fraction of the user community.
In one survey, users were asked to respond to eight statements according to the following commonly used scale:

1. Strongly agree
2. Agree
3. Neutral
4. Disagree
5. Strongly disagree

The items in the survey were these:

1. I find the system commands easy to use.
2. I feel competent with and knowledgeable about the system commands.
3. When writing a set of system commands for a new application, I am confident that they will be correct on the first run.
4. When I get an error message, I find that it is helpful in identifying the problem.
5. I think that there are too many options and special cases.
6. I believe that the commands could be substantially simplified.
7. I have trouble remembering the commands and options, and must consult the manual frequently.
8. When a problem arises, I ask for assistance from someone who really knows the system.

This list of questions can help designers to identify problems users are having, and to demonstrate improvement to the interface as changes are made in training, online assistance, command structures, and so on; progress is demonstrated by improved scores on subsequent surveys.
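Tracking such progress amounts to comparing per-item mean scores across survey rounds. The sketch below uses invented response data and assumes the negatively worded items (5 through 8) are reverse-scored, so that a lower mean is always better:

```python
from statistics import mean

# Hypothetical responses to the eight Likert items above
# (1 = strongly agree ... 5 = strongly disagree), one list per respondent.
before = [
    [2, 3, 4, 3, 2, 2, 2, 3],
    [3, 3, 5, 4, 1, 2, 3, 2],
    [2, 4, 4, 3, 2, 1, 2, 2],
]
after = [
    [1, 2, 3, 2, 3, 4, 4, 4],
    [2, 2, 3, 2, 4, 3, 4, 3],
    [1, 2, 2, 3, 4, 4, 5, 4],
]

NEGATIVE_ITEMS = {5, 6, 7, 8}  # agreement with these signals a problem

def item_means(responses):
    # Reverse-score negative items (1 <-> 5) so a LOWER mean is always better.
    cols = zip(*responses)  # one column of scores per survey item
    return [
        round(mean((6 - r) if (i + 1) in NEGATIVE_ITEMS else r
                   for r in col), 2)
        for i, col in enumerate(cols)
    ]

for item, (b, a) in enumerate(zip(item_means(before), item_means(after)), 1):
    trend = "improved" if a < b else "needs attention"
    print(f"item {item}: {b} -> {a} ({trend})")
```

The same per-item breakdown points designers at the specific commands or messages to rework, rather than reporting only an overall satisfaction number.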
In a study of error messages in text-editor usage, users had to rate the messages on 1-to-7 scales:

Hostile       1 2 3 4 5 6 7  Friendly
Vague         1 2 3 4 5 6 7  Specific
Misleading    1 2 3 4 5 6 7  Beneficial
Discouraging  1 2 3 4 5 6 7  Encouraging

If precise questions, as opposed to general ones, are used in surveys, then there is a greater chance that the results will provide useful guidance for taking action.
Coleman and Williges (1985) developed a set of bipolar semantically anchored items (pleasing versus irritating, simple versus complicated, concise versus redundant) that asked users to describe their reactions to using a word processor. Another approach is to ask users to evaluate aspects of the interface design, such as the readability of characters, the meaningfulness of command names, or the helpfulness of error messages. If users rate as poor one aspect of the interactive system, the designers have a clear indication of what needs to be redone.
The Questionnaire for User Interaction Satisfaction (QUIS) was developed by Shneiderman and was refined by Norman and Chin (Chin et al., 1988) (http://www.lap.umd.edu/QUISFolder/quisHome.html). It was based on the early versions of the OAI model and therefore covered interface details, such as readability of characters and layout of displays; interface objects, such as meaningfulness of icons; interface actions, such as shortcuts for frequent users; and task issues, such as appropriate terminology or screen sequencing. It has proved useful in demonstrating the benefits of improvements to a videodisc-retrieval program, in comparing two Pascal programming environments, in assessing word processors, and in setting requirements for redesign of an online public-access library catalog. We have
since applied QUIS in many projects with thousands of users, and have created new versions that include items relating to website design and videoconferencing. The University of Maryland Office of Technology Liaison (College Park, Maryland 20742; (301) 405-4209) licenses QUIS in electronic and paper forms to over a hundred organizations internationally, in addition to granting free licenses to student researchers. The licensees have applied QUIS in varied ways, sometimes using only parts of QUIS or adding domain-specific items.
Table 4.1 contains the long form, which was designed to have two levels of questions: general and detailed. If participants are willing to respond to every item, then the long-form questionnaire can be used. If participants are not likely to be patient, then only the general questions in the short form need to be asked.
Other scales include the Post-Study System Usability Questionnaire, developed by IBM, which has 48 items that focus on system usefulness, information quality, and interface quality (Lewis, 1995). The Software Usability Measurement Inventory contains 50 items designed to measure users' perceptions of their affect, efficiency, and control (Kirakowski and Corbett, 1993).

4.5 Acceptance Tests

For large implementation projects, the customer or manager usually sets objective and measurable goals for hardware and software performance. Many authors of requirements documents are even so bold as to specify mean time between failures, as well as the mean time to repair for hardware and, in some cases, for software. More typically, a set of test cases is specified for the software, with possible response-time requirements for the hardware-software combination. If the completed product fails to meet these acceptance criteria, the system must be reworked until success is demonstrated.
These notions can be neatly extended to the human interface. Explicit acceptance criteria should be established when the requirements document is written or when a contract is offered.
Rather than the vague and misleading criterion of "user friendly," measurable criteria for the user interface can be established for the following:

• Time for users to learn specific functions
• Speed of task performance
• Rate of errors by users
• User retention of commands over time
• Subjective user satisfaction
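Such criteria can be written down as explicit numeric thresholds that the acceptance test either meets or fails. The metric names and numbers below are invented for illustration:

```python
# Hypothetical user-interface acceptance criteria, phrased as measurable
# thresholds, and the results observed with a group of test participants.
criteria = {
    "minutes_to_learn_core_functions": 30.0,   # at most
    "mean_task_time_seconds": 90.0,            # at most
    "error_rate_per_task": 0.05,               # at most
    "satisfaction_score_1_to_9": 6.5,          # at least
}

observed = {
    "minutes_to_learn_core_functions": 24.5,
    "mean_task_time_seconds": 84.0,
    "error_rate_per_task": 0.08,
    "satisfaction_score_1_to_9": 7.1,
}

AT_LEAST = {"satisfaction_score_1_to_9"}  # higher is better for these

def evaluate(criteria, observed):
    # Return the list of criteria that the observed results fail to meet;
    # an empty list means the interface passes acceptance.
    failures = []
    for metric, threshold in criteria.items():
        value = observed[metric]
        ok = value >= threshold if metric in AT_LEAST else value <= threshold
        if not ok:
            failures.append(metric)
    return failures

failed = evaluate(criteria, observed)
if failed:
    print("Rework needed:", ", ".join(failed))
else:
    print("All acceptance criteria met")
```

Phrasing the contract this way leaves no ambiguity: here the error rate misses its threshold, so the system must be reworked and retested before delivery.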
Table 4.1

Questionnaire for User Interaction Satisfaction (© University of Maryland, 1997)

Identification number: ____  System: ____  Age: ____  Gender: __ male  __ female

PART 1: System Experience
1.1 How long have you worked on this system?
__ less than 1 hour             __ 6 months to less than 1 year
__ 1 hour to less than 1 day    __ 1 year to less than 2 years
__ 1 day to less than 1 week    __ 2 years to less than 3 years
__ 1 week to less than 1 month  __ 3 years or more
__ 1 month to less than 6 months
1.2 On the average, how much time do you spend per week on this system?
__ less than one hour           __ 4 to less than 10 hours
__ one to less than 4 hours     __ over 10 hours

PART 2: Past Experience
2.1 How many operating systems have you worked with?
__ none  __ 1  __ 2  __ 3-4  __ 5-6  __ more than 6
2.2 Of the following devices, software, and systems, check those that you have personally used and are familiar with:
__ computer terminal          __ personal computer        __ lap top computer
__ color monitor              __ touch screen             __ floppy drive
__ CD-ROM drive               __ keyboard                 __ mouse
__ track ball                 __ joystick                 __ pen based computing
__ graphics tablet            __ head mounted display     __ modems
__ scanners                   __ word processor           __ graphics software
__ spreadsheet software       __ database software        __ computer games
__ voice recognition          __ video editing systems    __ internet
__ CAD computer aided design  __ rapid prototyping systems

PART 3: Overall User Reactions
Please circle the numbers which most appropriately reflect your impressions about using this computer system. Not Applicable = NA.
3.1 Overall reactions to the system:
    terrible          1 2 3 4 5 6 7 8 9  wonderful       NA
3.2 frustrating       1 2 3 4 5 6 7 8 9  satisfying      NA
3.3 dull              1 2 3 4 5 6 7 8 9  stimulating     NA
3.4 difficult         1 2 3 4 5 6 7 8 9  easy            NA
3.5 inadequate power  1 2 3 4 5 6 7 8 9  adequate power  NA
3.6 rigid             1 2 3 4 5 6 7 8 9  flexible        NA

Table 4.1

(continued)

PART 4: Screen
4.1 Characters on the computer screen
    4.1.1 Image of characters
    4.1.2 Character shapes (fonts)