IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 41, NO. 5, MAY 2015

The Oracle Problem in Software Testing: A Survey

Earl T. Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz, and Shin Yoo
Abstract—Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behaviour is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain-specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.

Index Terms—Test oracle, automatic testing, testing formalism
1 INTRODUCTION

Much work on software testing seeks to automate as much of the test process as practical and desirable, to make testing faster, cheaper, and more reliable. To this end, we need a test oracle, a procedure that distinguishes between the correct and incorrect behaviours of the System Under Test (SUT).

However, compared to many aspects of test automation, the problem of automating the test oracle has received significantly less attention, and remains comparatively less well solved. This open problem represents a significant bottleneck that inhibits greater test automation and wider uptake of automated testing methods and tools. For instance, the problem of automatically generating test inputs has been the subject of research interest for nearly four decades [46], [107]. It involves finding inputs that cause execution to reveal faults, if they are present, and to give confidence in their absence, if none are found. Automated test input generation has been the subject of many significant advances in both Search-Based Testing [3], [5], [83], [126], [128] and Dynamic Symbolic Execution [75], [108], [161]; yet none of these advances address the issue of checking generated inputs with respect to expected behaviours—that is, providing an automated solution to the test oracle problem.
Of course, one might hope that the SUT has been developed under excellent design-for-test principles, so that there might be a detailed, and possibly formal, specification of intended behaviour. One might also hope that the code itself contains pre- and post-conditions that implement well-understood contract-driven development approaches [135]. In these situations, the test oracle cost problem is ameliorated by the presence of an automatable test oracle to which a testing tool can refer to check outputs, free from the need for costly human intervention.

E.T. Barr, M. Harman and S. Yoo are with the Department of Computer Science, University College London, Gower Street, London WC2R 2LS, United Kingdom. E-mail: {e.barr, m.harman, shin.yoo}@ucl.ac.uk.
P. McMinn and M. Shahbaz are with the Department of Computer Science, University of Sheffield, Sheffield S1 4DP, South Yorkshire, United Kingdom. E-mail: p.mcminn@sheffield.ac.uk, muzammil.shahbaz@gmail.com.

Manuscript received 24 July 2013; revised 5 Sept. 2014; accepted 28 Oct. 2014. Date of publication 19 Nov. 2014; date of current version 15 May 2015. Recommended for acceptance by L. Baresi.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TSE.2014.2372785
Where no full specification of the properties of the SUT exists, one may hope to construct a partial test oracle that can answer questions for some inputs. Such partial test oracles can be constructed using metamorphic testing (built from known relationships between desired behaviour) or by deriving oracular information from execution or documentation.
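To make the idea of a partial test oracle concrete, the following sketch (ours, not an example drawn from the surveyed literature) shows a metamorphic relation acting as a partial oracle for a hypothetical sine implementation, sut_sine. The relation sin(x) = sin(π − x) can be checked for any generated input without knowing the expected output itself; a violation signals a fault, while conformance does not prove correctness.

    import math
    import random

    def sut_sine(x):
        # Hypothetical system under test; math.sin stands in for an
        # implementation whose exact outputs we cannot easily predict.
        return math.sin(x)

    def metamorphic_oracle(x, tolerance=1e-9):
        # Partial test oracle built from a known relationship between
        # desired behaviours: sin(x) should equal sin(pi - x).
        return abs(sut_sine(x) - sut_sine(math.pi - x)) <= tolerance

    # Follow-up tests derived from randomly generated source inputs.
    for _ in range(100):
        x = random.uniform(-10.0, 10.0)
        assert metamorphic_oracle(x), f"metamorphic relation violated for x={x}"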
For many systems and most testing as currently practiced in industry, however, the tester does not have the luxury of formal specifications or assertions, or automated partial test oracles [91], [92]. The tester therefore faces the daunting task of manually checking the system’s behaviour for all test cases. In such cases, automated software testing approaches must address the human oracle cost problem [1], [82], [130].

To achieve greater test automation and wider uptake of automated testing, we therefore need a concerted effort to find ways to address the test oracle problem and to integrate automated and partially automated test oracle solutions into testing techniques. This paper seeks to help address this challenge by providing a comprehensive review and analysis of the existing literature on the test oracle problem.
Four partial surveys of topics relating to test oracles precede this one. However, none has provided a comprehensive survey of trends and results. In 2001, Baresi and Young [17] presented a partial survey that covered four topics prevalent at the time the paper was published: assertions, specifications, state-based conformance testing, and log file analysis. While these topics remain important, they capture only a part of the overall landscape of research in test oracles, which the present paper covers.
Another early work was the initial motivation for considering the test oracle problem contained in Binder’s textbook on software testing [23], published in 2000. More recently, in 2009, Shahamiri et al. [164] compared six techniques from the specific category of derived test oracles. In 2011, Staats et al. [174] proposed a theoretical analysis that included test oracles in a revisitation of the fundamentals of testing. Most recently, in 2014, Pezzè and Zhang focused on automated test oracles for functional properties [150].

Despite this work, research into the test oracle problem remains an activity undertaken in a fragmented community of researchers and practitioners. The role of the present paper is to overcome this fragmentation in this important area of software testing by providing the first comprehensive analysis and review of work on the test oracle problem.
The rest of the paper is organised as follows: Section 2 sets out the definitions relating to test oracles that we use to compare and contrast the techniques in the literature. Section 3 relates a historical analysis of developments in the area. Here we identify key milestones and track the volume of past publications. Based on this data, we plot growth trends for four broad categories of solution to the test oracle problem, which we survey in Sections 4, 5, 6, and 7. These four categories comprise approaches to the oracle problem where:

  • test oracles can be specified (Section 4);
  • test oracles can be derived (Section 5);
  • test oracles can be built from implicit information (Section 6); and
  • no automatable oracle is available, yet it is still possible to reduce human effort (Section 7).

Finally, Section 8 concludes with closing remarks.
2 DEFINITIONS

This section presents definitions to establish a lingua franca in which to examine the literature on oracles. These definitions are formalised to avoid ambiguity, but the reader should find that it is also possible to read the paper using only the informal descriptions that accompany these formal definitions. We use the theory to clarify the relationship between algebraic specification, pseudo oracles, and metamorphic relations in Section 5.

To begin, we define a test activity as a stimulus or response, then test activity sequences that incorporate constraints over stimuli and responses. Test oracles accept or reject test activity sequences, first deterministically then probabilistically. We then define notions of soundness and completeness of test oracles.
2.1 Test Activities

Fig. 1. Stimulus and observations: S is anything that can change the observable behaviour of the SUT f; R is anything that can be observed about the system’s behaviour; I includes f’s explicit inputs; O is its explicit outputs; everything not in S ∪ R neither affects nor is affected by f.

To test is to stimulate a system and observe its response. A stimulus and a response both have values, which may coincide, as when the stimulus value and the response are both reals. A system has a set of components C. A stimulus and its response target a subset of components. For instance, a common pattern for constructing test oracles is to compare the output of distinct components on the same stimulus value. Thus, stimuli and responses are values that target components. Collectively, stimuli and responses are test activities:
Definition 2.1 (Test Activities). For the SUT p, S is the set of stimuli that trigger or constrain p’s computation and R is the set of observable responses to a stimulus of p. S and R are disjoint. Test activities form the set A = S ⊎ R.

The use of disjoint union implicitly labels the elements of A, which we can flatten to the tuple L × C × V, where L = {stimulus, response} is the set of activity labels, C is the set of components, and V is an arbitrary set of values. To model those aspects of the world that are independent of any component, like a clock, we set an activity’s target to the empty set.
We use the terms “stimulus” and “observation” in the broadest sense possible to cater to various testing scenarios, functional and nonfunctional. As shown in Fig. 1, a stimulus can be either an explicit test input from the tester, I ⊆ S, or an environmental factor that can affect the testing, S \ I. Similarly, an observation ranges from an output of the SUT, O ⊆ R, to a nonfunctional execution profile, like execution time, in R \ O.

For example, stimuli include the configuration and platform settings, database table contents, device states, resource constraints, preconditions, typed values at an input device, inputs on a channel from another system, sensor inputs and so on. Notably, resetting a SUT to an initial state is a stimulus and stimulating the SUT with an input runs it. Observations include anything that can be discerned and ascribed a meaning significant to the purpose of testing—including values that appear on an output device, database state, temporal properties of the execution, heat dissipated during execution, power consumed, or any other measurable attributes of its execution. Stimuli and observations are members of different sets, but we combine them into the single set of test activities.
2.2 Test Activity Sequence

Testing is a sequence of stimuli and response observations. The relationship between stimuli and responses can often be captured formally; consider a simple SUT that squares its input. To compactly represent infinite relations between stimulus and response values such as (i, o = i²), we introduce a compact notation for set comprehensions:

x : [f] = {x | f},

where x is a dummy variable over an arbitrary set.
Definition 2.2 (Test Activity Sequence). A test activity sequence is an element of TA = {w | T → w} over the grammar

T ::= A ‘:[’ f ‘]’ T | A T | ε

where A is the test activity alphabet.

Fig. 2. Cumulative number of publications from 1978 to 2012 and research trend analysis for each type of test oracle. The x-axis represents years and the y-axis the cumulative number of publications. We use a power regression model to perform the trend analysis. The regression equation and the coefficient of determination (R²) indicate an upward future trend, a sign of a healthy research community.
Under Definition 2.2, the test activity sequence i o : [o = i²] denotes the stimulus of invoking f on i, then observing the response output. It further specifies valid responses obeying o = i². Thus, it compactly represents the infinite set of test activity sequences i₁o₁, i₂o₂, . . . where oₖ = iₖ².
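As an illustration of how such a constrained activity sequence can be checked mechanically, the following sketch is ours and assumes the squaring example above: test activities are flattened to (label, component, value) tuples, as in Definition 2.1, and a deterministic test oracle accepts exactly those sequences satisfying o = i².

    from typing import NamedTuple, Sequence, Any

    class Activity(NamedTuple):
        # A test activity flattened to the tuple L x C x V of Definition 2.1.
        label: str       # "stimulus" or "response"
        component: str   # target component of the SUT
        value: Any       # stimulus or response value

    def square_oracle(trace: Sequence[Activity]) -> bool:
        # Accept iff every stimulus to the hypothetical "square" component is
        # immediately followed by a response whose value is the stimulus squared.
        for act, nxt in zip(trace, trace[1:]):
            if act.label == "stimulus" and act.component == "square":
                if not (nxt.label == "response" and nxt.value == act.value ** 2):
                    return False
        return True

    # The practical test sequence (stimulus 3)(response 9) is accepted.
    assert square_oracle([Activity("stimulus", "square", 3),
                          Activity("response", "square", 9)])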
For practical purposes, a test activity sequence will almost always have to satisfy constraints in order to be useful. Under our formalism, these constraints differentiate the approaches to test oracles we survey. As an initial illustration, we constrain a test activity sequence to obtain a practical test sequence:

Definition 2.3 (Practical Test Sequence). A practical test sequence is any test activity sequence w that satisfies

w = T s T r T, for s ∈ S, r ∈ R.

Thus, the test activity sequence w is practical iff w contains at least one stimulus followed by at least one observation.

This notion of a test sequence is nothing more than a very general notion of what it means to test; we must do something to the system (the stimulus) and subsequently observe some behaviour of the system (the observation) so that we have something to check (the observation) and something upon which this observed behaviour depends (the stimulus).
A reliable reset (p, r) ∈ S is a special stimulus that returns the SUT’s component p to its start state. The test activity sequence (stimulus, p, r)(stimulus, p, i) is therefore equivalent to the conventional application notation p(i). To extract the value of an activity, we write v(a); to extract its target component, we write c(a). To specify two invocations of a single component on different values, we must write r₁i₁r₂i₂ : [r₁, i₁, r₂, i₂ ∈ S; c(r₁) = c(i₁) = c(r₂) = c(i₂) ∧ v(i₁) ≠ v(i₂)]. In the sequel, we often compare different executions of a single SUT or compare the output of independently implemented components of the SUT on the same input value. For clarity, we introduce syntactic sugar to express constraints on stimulus values and components. We let f(x) denote r i : [c(i) = f ∧ v(i) = x], for f ∈ C.
A test oracle is a predicate that determines whether a given test activity sequence is an acceptable behaviour of the SUT or not. We first define a “test oracle”, and then relax this definition to “probabilistic test oracle”.

Definition 2.4 (Test Oracle). A test oracle D : TA ↦ B is a partial¹ function from a test activity sequence to true or false.

When a test oracle is defined for a test activity, it either accepts the test activity or not. Concatenation in a test activity sequence denotes sequential activities; the test oracle D permits parallel activities when it accepts different permutations of the same stimuli and response observations. We use D to distinguish a deterministic test oracle from probabilistic ones. Test oracles are typically computationally expensive, so probabilistic approaches to the provision of oracle information may be desirable even where a deterministic test oracle is possible [124].
Definition 2.5 (Probabilistic Test Oracle). A probabilistic test oracle D̃ : TA ↦ [0, 1] maps a test activity sequence into the interval [0, 1] ⊂ ℝ.

A probabilistic test oracle returns a real number in the closed interval [0, 1]. As with test oracles, we do not require a probabilistic test oracle to be a total function. A probabilistic test oracle can model the case where the test oracle is only able to efficiently offer a probability that the test case is acceptable, or for other situations where some degree of imprecision can be tolerated in the test oracle’s response.

1. Recall that a function is implicitly total: it maps every element of its domain to a single element of its range. The partial function f : X ↦ Y is the total function f′ : X′ → Y, where X′ ⊆ X.
Our formalism combines a language-theoretic view of stimulus and response activities with constraints over those activities; these constraints explicitly capture specifications. The high-level language view imposes a temporal order on the activities. Thus, our formalism is inherently temporal. The formalism of Staats et al. captures any temporal exercising of the SUT’s behaviour in tests, which are atomic black boxes for them [174]. Indeed, practitioners write test plans and activities; they do not often write specifications at all, let alone formal ones. This fact and the expressivity of our formalism, as evident in our capture of existing test oracle approaches, are evidence that our formalism is a good fit with practice.
2.3 Soundness and Completeness

We conclude this section by defining soundness and completeness of test oracles.

In order to define soundness and completeness of a test oracle, we need to define a concept of the “ground truth”, G. The ground truth is another form of oracle, a conceptual oracle, that always gives the “right answer”. Of course, it cannot be known in all but the most trivial cases, but it is a useful definition that bounds test oracle behaviour.

Definition 2.6 (Ground Truth). The ground truth oracle, G, is a total test oracle that always gives the “right answer”.

We can now define soundness and completeness of a test oracle with respect to G.

Definition 2.7 (Soundness). The test oracle D is sound iff

D(a) ⇒ G(a).

Definition 2.8 (Completeness). The test oracle D is complete iff

G(a) ⇒ D(a).
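The distinction can be made concrete with a small sketch of our own, reusing the earlier squaring example: an oracle that accepts only behaviours it has hand-verified is sound but incomplete, whereas an oracle that accepts everything is complete but unsound.

    def ground_truth(i, o):
        # Conceptual ground-truth oracle G for the squaring SUT.
        return o == i * i

    # Sound but incomplete: acceptance implies correctness, but many
    # correct behaviours are rejected because they were never verified.
    VERIFIED = {(2, 4), (3, 9)}
    def sound_oracle(i, o):
        return (i, o) in VERIFIED

    # Complete but unsound: every correct behaviour is accepted,
    # but so is every incorrect one.
    def complete_oracle(i, o):
        return True

    for i in range(-5, 6):
        o = i * i
        assert (not sound_oracle(i, o)) or ground_truth(i, o)      # D(a) implies G(a)
        assert (not ground_truth(i, o)) or complete_oracle(i, o)   # G(a) implies D(a)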
While test oracles cannot, in general, be both sound and complete, we can, nevertheless, define and use partially correct test oracles. Further, one could argue, from a purely philosophical point of view, that human oracles can be sound and complete, or correct. In this view, correctness becomes a subjective human assessment. The foregoing definitions allow for this case.

We relax our definition of soundness to cater for probabilistic test oracles:
Definition 2.9 (Probabilistic Soundness and Completeness). A probabilistic test oracle D̃ is probabilistically sound iff

P(D̃(w) = 1) > 1/2 + ε ⇒ G(w)

and D̃ is probabilistically complete iff

G(w) ⇒ P(D̃(w) = 1) > 1/2 + ε,

where ε is non-negligible.

The non-negligible advantage ε requires D̃ to do sufficiently better than flipping a fair coin, which for a binary classifier maximizes entropy, that we can achieve arbitrary confidence in whether the test sequence w is valid by repeatedly sampling D̃ on w.
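The amplification argument can be illustrated with a short sketch of our own, under the assumption that repeated queries to the probabilistic oracle are independent: a majority vote over many samples of an oracle whose single-query advantage over a fair coin is ε yields a verdict that is correct with high probability.

    import random

    EPSILON = 0.2  # assumed non-negligible advantage of the toy oracle

    def probabilistic_oracle(activity_sequence, is_valid=True):
        # Stand-in probabilistic test oracle: returns 1 with probability
        # 1/2 + EPSILON when the sequence is valid, 1/2 - EPSILON otherwise.
        p_accept = 0.5 + EPSILON if is_valid else 0.5 - EPSILON
        return 1.0 if random.random() < p_accept else 0.0

    def amplified_verdict(activity_sequence, samples=301):
        # Majority vote over repeated, independent samples; confidence in the
        # verdict grows with the number of samples.
        accepts = sum(probabilistic_oracle(activity_sequence) for _ in range(samples))
        return accepts > samples / 2

    print(amplified_verdict("i o : [o == i**2]"))  # True with high probability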
3 TEST ORACLE RESEARCH TRENDS

The term “test oracle” first appeared in William Howden’s seminal work in 1978 [99]. In this section, we analyze the research on test oracles and related areas conducted since 1978. We begin with a synopsis of the volume of publications, classified into specified, derived, and implicit test oracles, and the handling of the lack of an automated test oracle. We then discuss when key concepts in test oracles were first introduced.
3.1 Volume of Publications

We constructed a repository of 694 publications on test oracles and related areas from 1978 to 2012 by conducting web searches for research articles on Google Scholar and Microsoft Academic Search using the queries “software + test + oracle” and “software + test oracle”,² for each year. Although some of the queries generated in this fashion may be similar, different responses are obtained, with particular differences among the lower-ranked results.

We classify work on test oracles into four categories: specified test oracles (317), derived test oracles (245), implicit test oracles (76), and no test oracle (56), which handles the lack of a test oracle.
Specified test oracles, discussed in detail in Section 4, judge all behavioural aspects of a system with respect to a given formal specification. For specified test oracles we searched for related articles using the queries “formal + specification”, “state-based specification”, “model-based languages”, “transition-based languages”, “assertion-based languages”, “algebraic specification” and “formal + conformance testing”. For all queries, we appended the keywords with “test oracle” to filter the results for test oracles.

Derived test oracles (see Section 5) involve artefacts from which a test oracle may be derived—for instance, a previous version of the system. For derived test oracles, we searched for additional articles using the queries “specification inference”, “specification mining”, “API mining”, “metamorphic testing”, “regression testing” and “program documentation”.

An implicit oracle (see Section 6) refers to the detection of “obvious” faults such as a program crash. For implicit test oracles we applied the queries “implicit oracle”, “null pointer + detection”, “null reference + detection”, “deadlock + livelock + race + detection”, “memory leaks + detection”, “crash + detection”, “performance + load testing”, “non-functional + error detection”, “fuzzing + test oracle” and “anomaly detection”.

There have also been papers researching strategies for handling the lack of an automated test oracle (see Section 7). Here, we applied the queries “human oracle”, “test minimization”, “test suite reduction” and “test data + generation + realistic + valid”.

2. We use + to separate the keywords in a query; a phrase not internally separated by +, like “test oracle”, is a compound keyword, quoted when given to the search engine.
Each of the above queries was appended with the keywords “software testing”. The results were filtered, removing articles that were found to have no relation to software testing and test oracles. Fig. 2 shows the cumulative number of publications on each type of test oracle from 1978 onwards. We analyzed the research trend in this data by applying different regression models. The trend line, shown in Fig. 2, is fitted using a power model. The high values for the four coefficients of determination (R²), one for each of the four types of test oracle, confirm that our models are good fits to the trend data. The trends observed suggest healthy growth in research volumes in these topics related to the test oracle problem in the future.
3.2 The Advent of Test Oracle Techniques

We classified the collected publications by the techniques or concepts they proposed to (partially) solve a test oracle problem; for example, Model Checking [35] and Metamorphic Testing [36] fall into the derived test oracle category, and DAISTS [69] is an algebraic specification system that addresses the specified test oracle problem.

For each type of test oracle and the advent of a technique or a concept, we plotted a timeline in chronological order of publications to study research trends. Fig. 3 shows the timeline starting from 1978, when the term “test oracle” was first coined. Each vertical bar represents a technique or concept used to solve the problem, labelled with the year of its first publication.

Fig. 3. Chronological introduction of test oracle techniques and concepts.

The timeline shows only the work that is explicit on the issue of test oracles. For example, the work on test generation using finite state machines (FSMs) can be traced back to as early as the 1950s. But the explicit use of finite state machines to generate test oracles can be traced back to Jard and Bochmann [102] and Howden in 1986 [98]. We record, in the timeline, the earliest available publication for a given technique or concept. We consider only published work in journals, the proceedings of conferences and workshops, or magazines. We excluded all other types of documentation, such as technical reports and manuals.
Fig. 3 shows a few techniques and concepts that pre-date 1978. Although not explicitly on test oracles, they identify and address issues for which test oracles were later developed. For example, work on detecting concurrency issues (deadlock, livelock, and races) can be traced back to the 1960s. Since these issues require no specification, implicit test oracles can be, and have been, built that detect them on arbitrary systems. Similarly, Regression Testing detects problems in the functionality a new version of a system shares with its predecessors and is a precursor of derived test oracles.

The trend analysis suggests that proposals for new techniques and concepts for the formal specification of test oracles peaked in the 1990s and have gradually diminished in the last decade. However, it remains an area of much research activity, as can be judged from the number of publications for each year in Fig. 2. For derived test oracles, many solutions have been proposed throughout this period. Initially, these solutions were primarily theoretical, such as Partial/Pseudo-Oracles [196] and Specification Inference [194]; empirical studies, however, followed in the late 1990s.

For implicit test oracles, research into the solutions established before 1978 has continued, but at a slower pace than the other types of test oracles. For handling the lack of an automated test oracle, Partition Testing is a well-known technique that helps a human test oracle select tests. The trend line suggests that only recently have new techniques and concepts for tackling this problem started to emerge, with an explicit focus on the human oracle cost problem.
4 SPECIFIED TEST ORACLES

Specification is fundamental to computer science, so it is not surprising that a vast body of research has explored its use as a source of test oracle information. This topic could merit an entire survey in its own right. In this section, we provide an overview of this work. We also include here partial specifications of system behaviour, such as assertions and models.

A specification defines, if possible using mathematical logic, the test oracle for a particular domain. Thus, a specification language is a notation for defining a specified test oracle D, which judges whether the behaviour of a system conforms to a formal specification. Our formalism, defined in Section 2, is, itself, a specification language for specifying test oracles.
Over the last 30 years, many methods and formalisms for testing based on formal specification have been developed. They fall into four broad categories: model-based specification languages, state transition systems, assertions and contracts, and algebraic specifications. Model-based languages define models and a syntax that defines desired behaviour in terms of its effect on the model. State transition systems focus on modelling the reaction of a system to stimuli, referred to as “transitions” in this particular formalism. Assertions and contracts are fragments of a specification language that are interleaved with statements of the implementation language and checked at runtime. Algebraic specifications define equations over a program’s operations that hold when the program is correct.
4.1 Specification Languages

Specification languages define a mathematical model of a system’s behaviour, and are equipped with a formal semantics that defines the meaning of each language construct in terms of the model. When used for testing, models do not usually fully specify the system, but seek to capture salient properties of a system so that test cases can be generated from or checked against them.
4.1.1 Model-Based Specification Languages

Model-based specification languages model a system as a collection of states and operations to alter these states, and are therefore also referred to as “state-based specifications” in the literature [101], [109], [182], [183]. Preconditions and postconditions constrain the system’s operations. An operation’s precondition imposes a necessary condition over the input states that must hold in a correct application of the operation; a postcondition defines the (usually strongest) effect the operation has on program state [109].
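As a simplified illustration of preconditions and postconditions acting as a specified test oracle (a sketch of ours, not a construct of any particular model-based language), consider an integer square-root operation: the precondition restricts the input state, and the postcondition, checked at run time, judges the observed result.

    import math

    def isqrt_spec_oracle(x, result):
        # Precondition: the operation is only specified for x >= 0.
        if x < 0:
            return True  # outside the specified domain; nothing to judge
        # Postcondition: result is the largest integer whose square is <= x.
        return result >= 0 and result * result <= x < (result + 1) ** 2

    # Judging an implementation (Python's math.isqrt stands in for the SUT).
    for x in range(1000):
        assert isqrt_spec_oracle(x, math.isqrt(x))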
A variety of model-based specification languages exist, including Z [172], B [110], UML/OCL [31], VDM/VDM-SL [62], Alloy [101], and the LARCH family [71], which includes an algebraic specification sub-language. Broadly, these languages have evolved toward being more concrete, closer to the implementation languages programmers use to solve problems. Two reasons explain this phenomenon: the first is the effort to increase their adoption in industry by making them more familiar to practitioners, and the second is to establish synergies between specification and implementation that facilitate development as iterative refinement. For instance, Z models disparate entities, like predicates, sets, state properties, and operations, through a single structuring mechanism, its schema construct; the B method, Z’s successor, provides a richer array of less abstract language constructs.

Börger discusses how to use the abstract state machine formalism, a very general set-theoretic specification language geared toward the definition of functions, to define high-level test oracles [29]. The models underlying specification languages can be very abstract, quite far from concrete execution output. For instance, it may be difficult to compute whether a model’s postcondition for a function permits an observed concrete output. If this impedance mismatch can be overcome, by abstracting a system’s concrete output or by concretizing a specification model’s output, and if a specification’s postconditions can be evaluated in finite time, they can serve as a test oracle [4].
Model-based specification languages, such as VDM, Z, and B, can express invariants, which can drive testing. Any test case that causes a program to violate an invariant has discovered an incorrect behaviour; therefore, these invariants are partial test oracles.
In search of a model-based specification language accessible to domain experts, Parnas et al. proposed TOG (Test Oracle Generator), which generates test oracles from program documentation [142], [145], [148]. In their method, the documentation is written in fully formal tabular expressions in which the method signature, the external variables, and the relation between its start and end states are specified [104]. Thus, test oracles can be automatically generated to check the outputs against the specified states of a program. The work by Parnas et al. has been developed over a considerable period of more than two decades [48], [59], [60], [144], [149], [190], [191].
4.1.2 State Transition Systems

State transition systems often present a graphical syntax, and focus on transitions between different states of the system. Here, states typically abstract sets of concrete states of the modelled system. State transition systems have been referred to as visual languages in the literature [197]. A wide variety of state transition systems exist, including finite state machines [111], Mealy/Moore machines [111], I/O Automata [117], Labelled Transition Systems [180], SDL [54], Harel Statecharts [81], UML state machines [28], X-Machines [95], [96], Simulink/Stateflow [179] and PROMELA [97]. Mouchawrab et al. conducted a rigorous empirical evaluation of test oracle construction techniques using state transition systems [70], [137].
An important class of state transition systems has a finite set of states and is therefore particularly well suited for automated reasoning about systems whose behaviour can be abstracted into states defined by a finite set of values [93]. State transition systems capture the behaviour of a system under test as a set of states,³ with transitions representing stimuli that cause the system to change state. State transition systems model the output of a system they abstract either as a property of the states (the final state in the case of Moore machines) or of the transitions traversed (as with Mealy machines).
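A minimal sketch of our own, with an invented two-state protocol, shows how a Mealy machine can serve as a test oracle: the model's transition table predicts, for each stimulus, both the expected output and the next state, and the oracle rejects any observed output that diverges from the model.

    # Mealy machine model: (state, stimulus) -> (expected output, next state).
    # The two-state "door" protocol below is invented purely for illustration.
    MODEL = {
        ("closed", "open"):  ("opening", "open"),
        ("closed", "close"): ("ignored", "closed"),
        ("open",   "close"): ("closing", "closed"),
        ("open",   "open"):  ("ignored", "open"),
    }

    def mealy_oracle(trace, start_state="closed"):
        # Accept a trace of (stimulus, observed output) pairs iff every observed
        # output matches the output the model predicts for that transition.
        state = start_state
        for stimulus, observed in trace:
            expected, next_state = MODEL[(state, stimulus)]
            if observed != expected:
                return False
            state = next_state
        return True

    assert mealy_oracle([("open", "opening"), ("close", "closing")])
    assert not mealy_oracle([("open", "opening"), ("open", "opening")])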
Models approximate a SUT, so behavioural differences between the two are inevitable. Some divergences, however, are spurious and falsely report testing failure. State-transition models are especially susceptible to this problem when modelling embedded systems, for which time of occurrence is critical. Recent work tolerates spurious differences in time by “steering” the model’s evaluation: when the SUT and its model differ, the model is backtracked, and a steering action, like modifying a timer value or changing inputs, is applied to reduce the distance, under a similarity measure [74].

Protocol conformance testing [72] and, later, model
