`
`Software Testing Best Practices
`
Ram Chillarege
`Center for Software Engineering
`IBM Research
`
`Abstract:
`
This report lists 28 best practices that contribute to improved software testing. They are not
necessarily related to software test tools. Some may have associated tools, but they are
fundamentally practices. The collection represents practices that several experienced software
organizations have gained from and recognize as key.
`
`1. Introduction
`
Every time we conclude a study or task force on the subject of software development
process, one recommendation comes out loud and clear: "We need to adopt the
best practices in the industry." While it appears an obvious conclusion, the most glaring
absence of such practices continues to astound the study team. So clear is their effect that it
distinguishes the winners from the also-rans like no other factor.
`
The search for best practices is constant. Some are known and well recognized, others
debated, and several hidden. Sometimes a practice that is obvious to the observer is
transparent to the practitioner, who chants "that's just the way we do things." At other times
what's known in one community is never heard of in another.
`
The list in this article is focused on Software Testing. While every attempt is made to
focus it on testing, we know that testing does not stand alone. It is intimately dependent on the
development activity and therefore draws heavily on the development practices. But finally,
testing is a separate process activity - the final arbiter of validity before the user assesses its
merit.
`
The collection of practices has come from many sources - at this point indelibly
blended with its long history. Some of them were identified merely through a recognition of what
is in the literature; others through focus groups where practitioners identified what they valued.
The list has been sifted and shared with an increasing number of practitioners to gain their insight.
Finally, they were culled down to a reasonable number.
`
A long list is hard to conceptualize, much less translate to implementation. To be actionable, we
need to think in terms of steps - a few at a time - and avenues to tailor the choice to our own
independent needs. I like to think of them as Basic, Foundational, and Incremental.
`
Copyright IBM Research - Technical Report RC 21457 Log 96856 4/26/99
`
The Basics are exactly that. They are the training wheels you need to get started, and
when you take them off, it is evident that you know how to ride. But remember, that you take
them off does not mean you forget how to ride. This is an important difference which all too often
is forgotten in software. "Yeah, we used to write functional specifications but we don't do that
anymore" means you forgot how to ride, not that you no longer needed that step.

The Basic practices have been around for a long time. Their value contribution is widely recognized
and documented in our software engineering literature. Their applicability is broad, regardless of
product or process.
`
The Foundational practices are the rock in the soil that protects your efforts against the
harshness of nature, be it a redesign of your architecture or enhancements to sustain unforeseen
growth. They need to be put down thoughtfully and will make the difference in the long haul,
whether you build a ranch or a skyscraper. Their value add is significant and established by a few
leaders in the industry. Unlike the Basics, they are probably not as well known and therefore need
implementation help. While there may be no textbooks on them yet, there is plenty of
documentation to dig up.
`
The Incremental practices provide specific advantages in special conditions. While they
may not provide broad gains across the board of testing, they are more specialized. These are the
right-angle drills - when you need one, there's nothing else that can get between narrow studs and
drill a hole perfectly square. At the same time, if there were just one drill you were going to buy, it
may not be your first choice. Not all these practices are widely known or greatly documented. But they
all possess strengths that are powerful when judiciously applied.
`
The next sections describe each of the practices, grouped under Basics,
Foundational, and Incremental.
`
`2. The Basic Practices
`
• Functional Specifications
• Reviews and Inspections
• Formal entry and exit criteria
• Functional test - variations
• Multi-platform testing
• Internal Betas
• Automated test execution
• Beta programs
• 'Nightly' Builds
`
Functional Specifications
`
Functional specifications are a key part of many development processes and came
into vogue with the development of the waterfall process. While it is a development
`
`2
`
`
`
`2
`
`
`
`Copyright IBM Research - Technical Report RC 21457 Log 96856 4/26199
`
process aspect, it is critically necessary for software functional test. A functional
specification often describes the external view of an object or a procedure, indicating the
options by which a service can be invoked. Testers use this to write test cases
from a black box testing perspective.
`
The advantage of having a functional specification is that the test generation
activity can happen in parallel with the development of the code. This is ideal from
several dimensions. Firstly, it gains parallelism in execution, removing a serious
serialization bottleneck in the development process. By the time the software code is
ready, the test cases are also ready to be run against the code. Secondly, it forces a degree
of clarity from the perspective of a designer and an architect, so essential for the overall
efficiency of development. Thirdly, the functional specifications become documentation
that can be shared with customers to gain an additional perspective on what is being
developed.
`
Reviews and Inspections
`
Software inspection, which was invented by Mike Fagan in the mid 70's at IBM,
has grown to be recognized as one of the most efficient methods of debugging code.
Today, 20 years later, there are several books written on software inspection, tools have
been made available, and consulting organizations teach the practice of software
inspection. It is argued that software inspection can easily provide a ten times gain in the
process of debugging software. Not much needs to be said about this, since it is a fairly
well-known and understood practice.
`
Formal Entry and Exit Criteria
`
The notion of formal entry and exit criteria goes back to the evolution of the
waterfall development processes and a model called ETVX, again an IBM invention. The
idea is that every process step, be it inspection, functional test, or software design, has
precise entry and exit criteria. These are defined by the development process and
are watched by management to gate the movement from one stage to another. It is
arguable how precise any one of the criteria can be, and with the decreased
emphasis on development process, entry and exit criteria went out of currency. However,
this practice allows much more careful management of the software development process.
`
Functional Test - Variations
`
Most functional tests are written as black box tests working off a functional
specification. The test cases that are generated are usually variations on the
input space coupled with visiting the output conditions. A variation refers to a specific
combination of input conditions to yield a specific output condition. Writing
functional tests involves writing different variations to cover as much of the state space as
one deems necessary for a program. The best practice involves understanding how to
write variations and gain coverage adequate enough to thoroughly test the
`
`3
`
`
`
`3
`
`
`
`Copyright IBM Research - Technical Report RC 21457 Log 96856 4116199
`
function. Given that there is no measure of coverage for functional tests, the practice of
writing variations does involve an element of art. The practice has been in use in many
locations within IBM, and we need to consolidate our knowledge to teach new function
testers the art and practice.
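As an illustration, one way to enumerate variations is to take the cross product of representative values (equivalence classes) for each input parameter. This is a minimal sketch; the service and its parameters below are hypothetical.

```python
from itertools import product

# Hypothetical input space for a file-open service: each parameter is
# reduced to a few representative values (equivalence classes).
input_space = {
    "mode": ["read", "write", "append"],
    "exists": [True, False],
    "permission": ["granted", "denied"],
}

def variations(space):
    """Enumerate every combination of input conditions - one
    'variation' per combination."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))

all_cases = list(variations(input_space))
# 3 * 2 * 2 = 12 variations; in practice the tester prunes or
# prioritizes these rather than running the full cross product.
```

The art lies in choosing the equivalence classes and deciding which of the combinations are worth executing.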
`
`Multi-platform Testing
`
Many products today are designed to run on different platforms, which creates an
additional burden in both designing and testing the product. When code is ported from one
platform to another, modifications are sometimes made for performance purposes. The net
result is that testing on multiple platforms has become a necessity for most products.
Therefore, techniques to do this better, both in development and testing, are essential. This
best practice should address all aspects of multi-platform development and testing.
`
`Internal Betas
`
The idea of a Beta is to release a product to a limited number of customers and get
feedback to fix problems before a larger shipment. For larger companies, such as IBM,
Microsoft, and Oracle, many of their products are used internally, thus forming a good beta
audience. Techniques to best conduct such an internal Beta test are essential to
obtain good coverage and efficiently use internal resources. This best practice has
everything to do with Beta programs, though on a smaller scale, to best leverage internal
use and reduce the cost and expense of an external Beta.
`
Automated Test Execution
`
The goal of automated test execution is to minimize the amount of manual
work involved in test execution and gain higher coverage with a larger number of test
cases. Automated test execution has a significant impact on both the tool sets for test
execution and the way tests are designed. Integral to automated test environments is
the test oracle that verifies current operation and logs failures with diagnostic information.
This best practice is fairly well understood in some segments of software testing and not
in others. The best practice, therefore, needs to leverage what is known and then develop
methods for areas where automation is not yet fully exploited.
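A minimal sketch of the idea, with a hypothetical oracle: each test runs unattended, and the oracle either accepts the result or logs a failure along with diagnostic information for later analysis.

```python
import traceback

def run_suite(tests, oracle, log):
    """Execute each (name, test) pair; the oracle judges the result,
    and failures are logged with diagnostics for later analysis."""
    failures = 0
    for name, test in tests:
        try:
            result = test()
            if not oracle(name, result):
                failures += 1
                log.append({"test": name, "result": result,
                            "reason": "oracle rejected result"})
        except Exception:
            failures += 1
            log.append({"test": name, "reason": "exception",
                        "trace": traceback.format_exc()})
    return failures

# Toy usage: this oracle expects every test to return 4.
log = []
tests = [("pass_case", lambda: 2 + 2), ("fail_case", lambda: 5)]
failed = run_suite(tests, lambda name, result: result == 4, log)
```

In a real environment the oracle is the hard part: it must know the expected behavior well enough to judge results without a human in the loop.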
`
`Beta Programs
`
`(see internal betas)
`
`'Nightly' Builds
`
`4
`
`
`
`4
`
`
`
`Copyright IBM Research - Technical Report RC 21457 Log 96&56 4126199
`
The concept of a nightly build has been in vogue for a long time. While a build
is not necessarily done every day, the concept captures frequent builds from changes that
are being promoted into the change control system. The advantage is, firstly, that if a
major regression occurs because of recently introduced errors, it is caught quickly.
Secondly, regression tests can be run in the background. Thirdly, newer releases of the
software are available to developers and testers sooner.
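A nightly build amounts to a small driver that runs the same steps unattended every night. The sketch below uses placeholder commands; a real setup would substitute the project's source sync, build system, and regression runner.

```python
import datetime
import subprocess

# Placeholder commands - substitute the project's real source sync,
# build system, and regression bucket.
STEPS = [
    ("sync", ["echo", "pull promoted changes"]),
    ("build", ["echo", "compile the product"]),
    ("regress", ["echo", "run regression bucket"]),
]

def nightly_build():
    """Run each step in order; a non-zero exit flags a regression
    introduced since the last build."""
    stamp = datetime.date.today().isoformat()
    for name, cmd in STEPS:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            return f"{stamp}: FAILED at {name}"
    return f"{stamp}: OK"
```

The value is in the regularity: a failed step points at the small set of changes promoted since the previous successful build.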
`
`3. Foundational
`
• User Scenarios
• Usability Testing
• In-process ODC feedback loops
• Multi-release ODC/Butterfly profiles
• Requirements for test planning
• Automated test generation
`
User Scenarios
`
As we integrate multiple software products and create end user applications that
invoke one or a multiplicity of products, the task of testing the end user features gets
complicated. One of the viable methods of testing is to develop user scenarios that
exercise the functionality of the applications. We broadly call these User Scenarios. The
advantage of the user scenario is that it tests the product in the ways that most likely
reflect customer usage, imitating what Software Reliability Engineering has long
advocated under the concept of the Operational Profile. A further advantage of using user
scenarios is that one reduces the complexity of writing test cases by testing
scenarios rather than features of an application. However, the methodology of developing user
scenarios and using enough of them to get adequate coverage at a functional level
continues to be a difficult task. This best practice should capture methods of recording
user scenarios and developing test cases based on them. In addition, it could discuss
potential diagnosis methods for when specific failure scenarios occur.
`
`Usability Testing
`
For a large number of products, it is believed that usability becomes the final
arbiter of quality. This is true for a large number of desktop applications that gained
market share by providing a good user experience. Usability testing needs not only to
assess how usable a product is but also to provide feedback on methods to improve the user
experience and thereby gain a positive quality image. The best practice for usability
testing should also incorporate knowledge about advances in the area of Human Computer
Interfaces.
`
`5
`
`
`
`5
`
`
`
`Copyright IBM Research - Technical Report RC21457 Log 96&56 4/26199
`
In-Process ODC Feedback Loops
`
Orthogonal Defect Classification (ODC) is a measurement method that uses the defect
stream to provide precise measurability into the product and the process. Given the
measurement, a variety of analysis techniques have been developed to assist management
and decision making on a range of software engineering activities. One of the uses of ODC
has been the ability to close feedback loops in a software development process, which has
traditionally been a difficult task. While ODC can be used for a variety of other software
management methods, closing of feedback loops has been found over the past few years to
be a much needed process improvement and cost control mechanism.
`
`Multi-Release ODC/Butterfly
`
A key feature of the ODC measurement is the ability to look at multiple releases of
a product and develop a profile of customer usage and its impact on warranty costs and
overall development efficiencies. The technology of multi-release ODC/Butterfly analysis
allows a product manager to make strategic development decisions so as to optimize
development costs, time to market, and quality by recognizing customer trends,
usage patterns, and product performance.
`
"Requirements" for Test Planning
`
One of the roles of software testing is to ensure that the product meets the
requirements of the clientele. Capturing the requirements therefore becomes essential,
not only to help development but to create test plans that can be used to gauge whether the
developed product is likely to meet customer needs. Oftentimes in smaller development
organizations, the task of requirements management falls prey to conjectures of what
ought to be developed as opposed to what is needed in the market. Therefore,
requirements management and its translation into test plans is an important step.
This practice needs to be understood and executed with a holistic view to be successful.
`
Automated Test Generation
`
Almost 30% of the testing task can be the writing of test cases. To a first order of
approximation, this is a completely manual exercise and a prime candidate for savings
through automation. However, the technology for automation has not been advancing as
rapidly as one would have hoped. While there are automated test generation tools, they
often produce too large a test set, defeating the gains from automation. On the other hand,
there do exist a few techniques and tools that have been recognized as good methods for
automatically generating test cases. The practice needs to identify which of these
methods are successful and in what environments they are viable. There is a reasonable
`6
`
`
`
`6
`
`
`
`Copyright IBM Research - Technical Report RC21457 Log96&56 4/26199
`
amount of learning in the use of these tools and methodologies, but they do pay off past the
initial ramp-up.
`
`4. Incremental
`
• Teaming testers with developers
• Code coverage (SWS)
• Automated environment generator (Drake)
• Testing to help ship on demand
• State task diagram (Tucson)
• Memory resource failure simulation
• Statistical testing (Tucson)
• Semi-formal methods (e.g. SDL)
• Check-in tests for code
• Minimizing regression test cases
• Instrumented versions for MTTF
• Benchmark trends
• Bug bounties
`
Teaming Testers with Developers
`
It has been recognized for a long time that the close coupling of testers with
developers improves both the test cases and the code that is developed. An extreme case
of this practice is Microsoft, where every developer is shadowed by a tester. Needless to
say, one does not have to resort to such an extreme to gain the benefits of this teaming.
This practice should, therefore, identify the kinds of teaming that are beneficial and the
environments in which they may be employed. The value of a best practice such as
teaming should therefore be more than just concept. Instead, it should include guidance on
forming the right team while reporting the pitfalls and successes experienced.
`
Code Coverage
`
The concept of code coverage is based on a structural notion of the code. Code
coverage implies a numerical metric that measures the elements of the code that have been
exercised as a consequence of testing. There is a host of metrics - statements, branches,
and data - implied by the term code coverage. Today, there exist several tools that
assist in this measurement and additionally provide guidance on covering elements not yet
exercised. This is also an area that has had considerable academic play and has been an
issue of debate for a couple of decades. The practice of code coverage should therefore
carry information about the tools and the methods of employing code coverage and
tracking results, drawing on the positive benefits experienced.
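The bookkeeping behind statement coverage can be sketched with Python's tracing hook; production tools do the same thing far more efficiently. The function under test here is a toy example.

```python
import sys

def run_with_coverage(func, *args):
    """Record which lines execute while func runs - a toy statement
    coverage measurement built on Python's tracing hook."""
    executed = set()
    def tracer(frame, event, arg):
        if event == "line":
            executed.add((frame.f_code.co_name, frame.f_lineno))
        return tracer
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return executed

def classify(n):              # toy function under test
    if n < 0:
        return "negative"     # this branch is not taken below
    return "non-negative"

covered = run_with_coverage(classify, 5)
# The 'negative' return never appears in 'covered', pointing the
# tester at an unexercised statement.
```

The uncovered line is exactly the guidance such tools provide: it tells the tester which test case (here, a negative input) is still missing.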
`
`7
`
`
`
`7
`
`
`
`Copyright IBM Research - Technical Report RC 21457 Log 96856 4116199
`
Automated Environment Generator
`
A fairly time-consuming task is the setting up of test environments to execute test
cases. These tasks can take greater amounts of time as we have more operating systems,
more versions, and code that runs on multiple platforms. The sheer task of bringing up an
environment and taking it down for a different set of test cases can dominate the calendar
in system test. Tools that can automatically set up environments, run the test cases, record
the results, and then automatically reconfigure to a new environment have high value. The
IBM Hursley Lab has developed a tool called DRAKE that does precisely this. This best
practice should therefore capture the issues, tools, and techniques that are associated with
environment set up, break down, and automatic running of test cases.
`
`Testing to Help Ship on Demand
`
This is an idea from Microsoft, where they look at the testing process as one that
enables late changes and accommodates market pressures. It changes the role of testing to
one of providing excellent regression ability and working in late changes that still do not
break the product or the ship schedule. This really amounts to a philosophical view of
testing, placing it in a different role and yielding new ramifications for the entire development
process. We cite this as a best practice to recognize that there may be areas where such a
conceptual framework necessitates a very reactive testing practice. The practice ought to
identify how to work this concept into organizations and products in specific markets. It
may have applicability in the E-Commerce world, where there is far greater customer
interaction and competitive pressure.
`
`State Task Diagram
`
This practice captures the functional operations of an application or a module in
the form of a state transition diagram. The advantage of doing so is that one can create
test cases automatically or create coverage metrics that are closer to the functional
decomposition of the application. There is a fair number of tools that allow for capturing
Markov models, which may be useful for this practice. The difficulty has usually been in
extracting the functional view of a product, which may not exist in any computable or
documented form, and producing the state transition diagram. One of the automated test
generation tools, Test Master from Teradyne, actually uses state task diagrams for the
generation of functional tests. This practice has possibly more than one application, and
the keepers of the practice need to capture the tools, the methods, and their uses.
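Once a state transition diagram exists, test generation reduces to walking the graph. A minimal sketch, with a hypothetical model: every event sequence of a given length is a candidate test case, and the transitions it visits give a coverage metric over the model.

```python
# Hypothetical state-transition model of a small device driver.
transitions = {
    ("idle", "load"): "loaded",
    ("loaded", "start"): "running",
    ("running", "stop"): "loaded",
    ("loaded", "eject"): "idle",
}

def generate_paths(start, depth):
    """Enumerate all event sequences of the given length; each
    sequence is a functional test case, and the transitions it visits
    give a coverage metric over the model."""
    frontier = [([], start)]
    for _ in range(depth):
        nxt = []
        for events, state in frontier:
            for (src, event), dst in transitions.items():
                if src == state:
                    nxt.append((events + [event], dst))
        frontier = nxt
    return [events for events, _ in frontier]

tests = generate_paths("idle", 3)
# e.g. ['load', 'start', 'stop'] drives idle -> loaded -> running -> loaded
```

Real tools add probabilities (Markov models) so that the most likely customer paths are generated first.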
`
Memory Resource Failure Simulation
`
This practice addresses a particular class of software bug, namely the loss of memory
because of poor heap management or the lack of garbage collection. It is a fairly serious
problem for many C programs in Unix applications. It also exists on other platforms and
`8
`
`
`
`8
`
`
`
`Copyright IBM Research - Technical Report RC 21457 Log 96856 4/26199
`
languages. There are commercial tools available to help simulate memory failure and
check for memory leaks. The practice should be generic and develop methods and
techniques for use on different platforms and language environments.
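The core technique such tools use is fault injection: make allocations fail on demand so that the error-handling paths, which ordinary testing rarely reaches, get exercised. A minimal sketch with a hypothetical allocator:

```python
class FailingAllocator:
    """Fault-injection sketch: succeed for the first fail_after
    allocations, then raise, so recovery paths get exercised."""
    def __init__(self, fail_after):
        self.fail_after = fail_after
        self.count = 0

    def allocate(self, size):
        self.count += 1
        if self.count > self.fail_after:
            raise MemoryError(f"simulated failure on allocation {self.count}")
        return bytearray(size)

def build_buffers(alloc, n):
    """Code under test: must release everything if any allocation
    fails part way through."""
    buffers = []
    try:
        for _ in range(n):
            buffers.append(alloc.allocate(1024))
    except MemoryError:
        buffers.clear()   # the recovery path the simulation exposes
    return buffers
```

Sweeping fail_after from 0 upward forces a failure at every allocation site in turn, which is how leak-on-error bugs are flushed out.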
`
Statistical Testing
`
The concept of statistical testing was invented by the late Harlan Mills (the IBM
Fellow who invented Clean Room software engineering). The central idea is to use
software testing as a means to assess the reliability of software as opposed to a debugging
mechanism. This is quite contrary to the popular use of software testing as a debugging
method. Therefore one needs to recognize that the goals and motivations of statistical
testing are fundamentally different. There are many arguments as to why this might indeed
be a very valid approach. The theory is buried in the concepts of Clean Room
software engineering and is worthy of a separate discussion. Statistical testing needs to
exercise the software along an operational profile and then measure interfailure times that
are then used to estimate its reliability. A good development process should yield an
increasing mean time between failures every time a bug is fixed. This then becomes the
release criterion and the condition to stop software testing.
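The arithmetic behind the release criterion can be sketched simply. Assuming cumulative failure times recorded while running under the operational profile, the interfailure gaps should grow as bugs are fixed; the target MTBF here is a hypothetical management parameter.

```python
def interfailure_times(failure_times):
    """Gaps between successive cumulative failure times (e.g. hours
    of operation under the operational profile)."""
    return [b - a for a, b in zip([0] + failure_times[:-1], failure_times)]

def release_ready(failure_times, target_mtbf):
    """A crude release criterion: interfailure times keep growing and
    the latest one exceeds the target."""
    gaps = interfailure_times(failure_times)
    return gaps == sorted(gaps) and gaps[-1] >= target_mtbf

# Failures observed at hours 10, 25, 45, 80 -> gaps of 10, 15, 20, 35:
# reliability is growing, and a 30-hour MTBF target is now met.
```

Real reliability-growth models (e.g. those used in Software Reliability Engineering) fit a curve to these gaps rather than eyeballing the trend, but the input data is the same.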
`
Semi-Formal Methods
`
The origin of formal methods in software engineering dates back a couple of decades.
Over the years the field has made considerable progress in some specific areas such as protocol
implementation. The key concept of a formal method is that it allows for
verification of the program as opposed to testing and debugging. The verification methods
are varied: some are theorem provers, while others are simulations against
which assertions can be validated. The vision of formal methods has always been that if
the specification of software is succinctly captured, it could lead to automatic generation of
code, requiring minimal testing.
`
In practice there has been much debate on the viability of semi-formal methods,
and to a large extent the industry ignores them. However, one must recognize a very key
contribution from IBM's Hursley Lab, where the kernel of CICS was implemented
after a formal specification written in Z. A semi-formal method is one where the
specifications captured may be in state transition diagrams or tables that can then be used
even for test generation. IBM's Zurich Research Lab has done some work in this area and
very successfully used it for protocol implementations. The best practice in semi-formal
methods ought to capture our experience and also identify places where such applications
may be viable.
`
`Check-in Tests for Code
`
The idea of a check-in test is to couple an automatic test program (usually a
regression test) with the change control system. Microsoft has been known to employ
`
`9
`
`
`
`9
`
`
`
`Copyright IBM Research - Technical Report RC 21457 Log 96&56 4126199
`
such a system very well. This allows for an automatic test run on recently changed code so
that the chances of the code breaking the build are minimized. In fact, Microsoft's change
control system and build are supposedly set up such that unless the code passes the test, it
does not get promoted into the next build.
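The gating logic is straightforward to sketch. Assuming a hypothetical change control hook, the check-in tests run as a subprocess and only a clean exit lets the change into the build:

```python
import subprocess
import sys

def checkin_gate(test_cmd):
    """Run the check-in (regression) tests for a change; only a clean
    exit lets the change proceed."""
    return subprocess.run(test_cmd, capture_output=True).returncode == 0

def promote(change_id, test_cmd):
    """Hypothetical promotion hook coupling tests to change control."""
    if not checkin_gate(test_cmd):
        return f"{change_id}: rejected, check-in tests failed"
    return f"{change_id}: promoted into next build"

# Toy usage: a passing and a failing 'test suite'.
ok = promote("chg-101", [sys.executable, "-c", "pass"])
bad = promote("chg-102", [sys.executable, "-c", "raise SystemExit(1)"])
```

Because the gate runs before promotion rather than after the build, a broken change never reaches the other developers.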
`
`Minimizing Regression Test Cases
`
In organizations that have a legacy of development and products that have
matured over many releases, it is not uncommon to find regression test buckets that are
huge. The negative consequence of such large test buckets is that they take long to
execute. At the same time, it is often unclear which of these test cases are duplicative,
providing little additional value. There are several methods to minimize the regression
tests. One of the methods looks at the code coverage produced and distills the test cases to a
minimal set. One must note that this method, though attractive, does confuse a structural
metric with a functional test. Nevertheless, it is a way to implement the minimization.
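The coverage-based distillation described above is essentially a set-cover problem, commonly approximated greedily. A minimal sketch over hypothetical per-test coverage data:

```python
def minimize_suite(coverage):
    """Greedy set cover: repeatedly keep the test that covers the most
    still-uncovered code elements; tests adding nothing are dropped."""
    remaining = set().union(*coverage.values())
    kept = []
    while remaining:
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break
        kept.append(best)
        remaining -= gained
    return kept

# Hypothetical per-test coverage data (e.g. statements hit).
suite = {
    "t1": {"a", "b", "c"},
    "t2": {"b", "c"},   # structurally subsumed by t1, so dropped
    "t3": {"d"},
}
minimal = minimize_suite(suite)
```

The sketch also illustrates the caveat in the text: t2 is dropped because it adds no structural coverage, even though it might still exercise a functionally distinct behavior.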
`
Instrumented Versions for MTTF
`
An opportunity that a beta program provides is that one gets a large sample of users
to test the product. If the product is instrumented so that failures are recorded and
returned to the vendor, they yield an excellent source to measure the mean time
between failures of the software. There are several uses for this metric. Firstly, it can be
used as a gauge to enhance the product's quality in a manner that would be meaningful to
a user. Secondly, it allows us to measure the mean time between failures of the same
product under different customer profiles or user sets. Thirdly, it can be enhanced to
additionally capture first failure data that could benefit diagnosis and problem
determination. Microsoft has claimed that they are able to do at least the first two through
instrumented versions that they ship in their betas.
`
`Benchmark Trends
`
Benchmarking is a broad concept that applies to many disciplines in different areas.
In the world of software testing, we could interpret it to mean comparing the techniques and the
performance of testing methods with those experienced by other software developers. Today there
is not an avenue to regularly exchange such information and compare benchmarks. This
best practice could be initiated by benchmarking across IBM Labs and then advancing the
practice to include a larger pool with competitors and customers.
`
`Bug Bounties
`
We have heard that bug bounties were used at Microsoft, and we know that they
have been used at IBM during the 10X days. Bug bounties refer to initiatives that
charge the organization with a focus on detecting software bugs, at times providing
`
`10
`
`
`
`10
`
`
`
`Copyright IBM Research - Technical Report RC 21457 Log 96&56 4126199
`
rewards too. Experience shows that such efforts tend to identify a larger than usual
number of bugs. Clearly, additional resources are necessary to fix the bugs. But the net result
is a higher quality product.
`
`11
`
`
`
`11
`
`