IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 41, NO. 5, MAY 2015

The Oracle Problem in Software Testing: A Survey

Earl T. Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz, and Shin Yoo
Abstract—Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behaviour is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain-specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.

Index Terms—Test oracle, automatic testing, testing formalism
1 INTRODUCTION

Much work on software testing seeks to automate as much of the test process as practical and desirable, to make testing faster, cheaper, and more reliable. To this end, we need a test oracle, a procedure that distinguishes between the correct and incorrect behaviours of the System Under Test (SUT).

However, compared to many aspects of test automation, the problem of automating the test oracle has received significantly less attention, and remains comparatively less well solved. This open problem represents a significant bottleneck that inhibits greater test automation and wider uptake of automated testing methods and tools. For instance, the problem of automatically generating test inputs has been the subject of research interest for nearly four decades [46], [107]. It involves finding inputs that cause execution to reveal faults, if they are present, and to give confidence in their absence, if none are found. Automated test input generation has been the subject of many significant advances in both Search-Based Testing [3], [5], [83], [126], [128] and Dynamic Symbolic Execution [75], [108], [161]; yet none of these advances address the issue of checking generated inputs with respect to expected behaviours—that is, providing an automated solution to the test oracle problem.
Of course, one might hope that the SUT has been developed under excellent design-for-test principles, so that there might be a detailed, and possibly formal, specification of intended behaviour. One might also hope that the code itself contains pre- and post-conditions that implement well-understood contract-driven development approaches [135]. In these situations, the test oracle cost problem is ameliorated by the presence of an automatable test oracle to which a testing tool can refer to check outputs, free from the need for costly human intervention.

E.T. Barr, M. Harman and S. Yoo are with the Department of Computer Science, University College London, Gower Street, London WC2R 2LS, United Kingdom. E-mail: {e.barr, m.harman, shin.yoo}@ucl.ac.uk.
P. McMinn and M. Shahbaz are with the Department of Computer Science, University of Sheffield, Sheffield S1 4DP, South Yorkshire, United Kingdom. E-mail: p.mcminn@sheffield.ac.uk, muzammil.shahbaz@gmail.com.

Manuscript received 24 July 2013; revised 5 Sept. 2014; accepted 28 Oct. 2014. Date of publication 19 Nov. 2014; date of current version 15 May 2015. Recommended for acceptance by L. Baresi.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TSE.2014.2372785
Where no full specification of the properties of the SUT exists, one may hope to construct a partial test oracle that can answer questions for some inputs. Such partial test oracles can be constructed using metamorphic testing (built from known relationships between desired behaviour) or by deriving oracular information from execution or documentation.
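To make the idea of a partial test oracle concrete, the following sketch (ours, not an example drawn from the surveyed literature) shows a metamorphic relation acting as a partial oracle for a hypothetical sine implementation, sut_sine. The relation sin(x) = sin(π − x) can be checked for any generated input without knowing the expected output itself; a violation signals a fault, while conformance does not prove correctness.

    import math
    import random

    def sut_sine(x):
        # Hypothetical system under test; math.sin stands in for an
        # implementation whose exact outputs we cannot easily predict.
        return math.sin(x)

    def metamorphic_oracle(x, tolerance=1e-9):
        # Partial test oracle built from a known relationship between
        # desired behaviours: sin(x) should equal sin(pi - x).
        return abs(sut_sine(x) - sut_sine(math.pi - x)) <= tolerance

    # Follow-up tests derived from randomly generated source inputs.
    for _ in range(100):
        x = random.uniform(-10.0, 10.0)
        assert metamorphic_oracle(x), f"metamorphic relation violated for x={x}"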
For many systems and most testing as currently practiced in industry, however, the tester does not have the luxury of formal specifications or assertions, or automated partial test oracles [91], [92]. The tester therefore faces the daunting task of manually checking the system’s behaviour for all test cases. In such cases, automated software testing approaches must address the human oracle cost problem [1], [82], [130].

To achieve greater test automation and wider uptake of automated testing, we therefore need a concerted effort to find ways to address the test oracle problem and to integrate automated and partially automated test oracle solutions into testing techniques. This paper seeks to help address this challenge by providing a comprehensive review and analysis of the existing literature on the test oracle problem.
Four partial surveys of topics relating to test oracles precede this one. However, none has provided a comprehensive survey of trends and results. In 2001, Baresi and Young [17] presented a partial survey that covered four topics prevalent at the time the paper was published: assertions, specifications, state-based conformance testing, and log file analysis. While these topics remain important, they capture only a part of the overall landscape of research in test oracles, which the present paper covers.
Another early work was the initial motivation for considering the test oracle problem contained in Binder’s textbook on software testing [23], published in 2000. More recently, in 2009, Shahamiri et al. [164] compared six techniques from the specific category of derived test oracles. In 2011, Staats et al. [174] proposed a theoretical analysis that included test oracles in a revisitation of the fundamentals of testing. Most recently, in 2014, Pezzè and Zhang focused on automated test oracles for functional properties [150].

Despite this work, research into the test oracle problem remains an activity undertaken in a fragmented community of researchers and practitioners. The role of the present paper is to overcome this fragmentation in this important area of software testing by providing the first comprehensive analysis and review of work on the test oracle problem.
The rest of the paper is organised as follows: Section 2 sets out the definitions relating to test oracles that we use to compare and contrast the techniques in the literature. Section 3 relates a historical analysis of developments in the area. Here we identify key milestones and track the volume of past publications. Based on this data, we plot growth trends for four broad categories of solution to the test oracle problem, which we survey in Sections 4, 5, 6, and 7. These four categories comprise approaches to the oracle problem where:

  • test oracles can be specified (Section 4);
  • test oracles can be derived (Section 5);
  • test oracles can be built from implicit information (Section 6); and
  • no automatable oracle is available, yet it is still possible to reduce human effort (Section 7).

Finally, Section 8 concludes with closing remarks.
2 DEFINITIONS

This section presents definitions to establish a lingua franca in which to examine the literature on oracles. These definitions are formalised to avoid ambiguity, but the reader should find that it is also possible to read the paper using only the informal descriptions that accompany these formal definitions. We use the theory to clarify the relationship between algebraic specification, pseudo oracles, and metamorphic relations in Section 5.

To begin, we define a test activity as a stimulus or response, then test activity sequences that incorporate constraints over stimuli and responses. Test oracles accept or reject test activity sequences, first deterministically then probabilistically. We then define notions of soundness and completeness of test oracles.
2.1 Test Activities

Fig. 1. Stimulus and observations: S is anything that can change the observable behaviour of the SUT f; R is anything that can be observed about the system’s behaviour; I includes f’s explicit inputs; O is its explicit outputs; everything not in S ∪ R neither affects nor is affected by f.

To test is to stimulate a system and observe its response. A stimulus and a response both have values, which may coincide, as when the stimulus value and the response are both reals. A system has a set of components C. A stimulus and its response target a subset of components. For instance, a common pattern for constructing test oracles is to compare the output of distinct components on the same stimulus value. Thus, stimuli and responses are values that target components. Collectively, stimuli and responses are test activities:
Definition 2.1 (Test Activities). For the SUT p, S is the set of stimuli that trigger or constrain p’s computation and R is the set of observable responses to a stimulus of p. S and R are disjoint. Test activities form the set A = S ⊎ R.

The use of disjoint union implicitly labels the elements of A, which we can flatten to the tuple L × C × V, where L = {stimulus, response} is the set of activity labels, C is the set of components, and V is an arbitrary set of values. To model those aspects of the world that are independent of any component, like a clock, we set an activity’s target to the empty set.
We use the terms “stimulus” and “observation” in the broadest sense possible to cater to various testing scenarios, functional and nonfunctional. As shown in Fig. 1, a stimulus can be either an explicit test input from the tester, I ⊆ S, or an environmental factor that can affect the testing, S \ I. Similarly, an observation ranges from an output of the SUT, O ⊆ R, to a nonfunctional execution profile, like execution time, in R \ O.

For example, stimuli include the configuration and platform settings, database table contents, device states, resource constraints, preconditions, typed values at an input device, inputs on a channel from another system, sensor inputs and so on. Notably, resetting a SUT to an initial state is a stimulus and stimulating the SUT with an input runs it. Observations include anything that can be discerned and ascribed a meaning significant to the purpose of testing—including values that appear on an output device, database state, temporal properties of the execution, heat dissipated during execution, power consumed, or any other measurable attributes of its execution. Stimuli and observations are members of different sets, but we combine them into the single set of test activities.
2.2 Test Activity Sequence

Testing is a sequence of stimuli and response observations. The relationship between stimuli and responses can often be captured formally; consider a simple SUT that squares its input. To compactly represent infinite relations between stimulus and response values such as (i, o = i²), we introduce a compact notation for set comprehensions:

x : [f] = {x | f},

where x is a dummy variable over an arbitrary set.
Definition 2.2 (Test Activity Sequence). A test activity sequence is an element of TA = {w | T → w} over the grammar

T ::= A ‘:[’ f ‘]’ T | A T | ε

where A is the test activity alphabet.

Fig. 2. Cumulative number of publications from 1978 to 2012 and research trend analysis for each type of test oracle. The x-axis represents years and the y-axis the cumulative number of publications. We use a power regression model to perform the trend analysis. The regression equation and the coefficient of determination (R²) indicate an upward future trend, a sign of a healthy research community.
Under Definition 2.2, the test activity sequence i o : [o = i²] denotes the stimulus of invoking f on i, then observing the response output. It further specifies valid responses obeying o = i². Thus, it compactly represents the infinite set of test activity sequences i₁o₁, i₂o₂, . . . where oₖ = iₖ².
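As an illustration of how such a constrained activity sequence can be checked mechanically, the following sketch is ours and assumes the squaring example above: test activities are flattened to (label, component, value) tuples, as in Definition 2.1, and a deterministic test oracle accepts exactly those sequences satisfying o = i².

    from typing import NamedTuple, Sequence, Any

    class Activity(NamedTuple):
        # A test activity flattened to the tuple L x C x V of Definition 2.1.
        label: str       # "stimulus" or "response"
        component: str   # target component of the SUT
        value: Any       # stimulus or response value

    def square_oracle(trace: Sequence[Activity]) -> bool:
        # Accept iff every stimulus to the hypothetical "square" component is
        # immediately followed by a response whose value is the stimulus squared.
        for act, nxt in zip(trace, trace[1:]):
            if act.label == "stimulus" and act.component == "square":
                if not (nxt.label == "response" and nxt.value == act.value ** 2):
                    return False
        return True

    # The practical test sequence (stimulus 3)(response 9) is accepted.
    assert square_oracle([Activity("stimulus", "square", 3),
                          Activity("response", "square", 9)])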
For practical purposes, a test activity sequence will almost always have to satisfy constraints in order to be useful. Under our formalism, these constraints differentiate the approaches to test oracles we survey. As an initial illustration, we constrain a test activity sequence to obtain a practical test sequence:

Definition 2.3 (Practical Test Sequence). A practical test sequence is any test activity sequence w that satisfies

w = T s T r T, for s ∈ S, r ∈ R.

Thus, the test activity sequence w is practical iff w contains at least one stimulus followed by at least one observation.

This notion of a test sequence is nothing more than a very general notion of what it means to test; we must do something to the system (the stimulus) and subsequently observe some behaviour of the system (the observation) so that we have something to check (the observation) and something upon which this observed behaviour depends (the stimulus).
A reliable reset (p, r) ∈ S is a special stimulus that returns the SUT’s component p to its start state. The test activity sequence (stimulus, p, r)(stimulus, p, i) is therefore equivalent to the conventional application notation p(i). To extract the value of an activity, we write v(a); to extract its target component, we write c(a). To specify two invocations of a single component on different values, we must write r₁i₁r₂i₂ : [r₁, i₁, r₂, i₂ ∈ S; c(r₁) = c(i₁) = c(r₂) = c(i₂) ∧ v(i₁) ≠ v(i₂)]. In the sequel, we often compare different executions of a single SUT or compare the output of independently implemented components of the SUT on the same input value. For clarity, we introduce syntactic sugar to express constraints on stimulus values and components. We let f(x) denote r i : [c(i) = f ∧ v(i) = x], for f ∈ C.
A test oracle is a predicate that determines whether a given test activity sequence is an acceptable behaviour of the SUT or not. We first define a “test oracle”, and then relax this definition to “probabilistic test oracle”.

Definition 2.4 (Test Oracle). A test oracle D : TA ↦ B is a partial¹ function from a test activity sequence to true or false.

When a test oracle is defined for a test activity, it either accepts the test activity or not. Concatenation in a test activity sequence denotes sequential activities; the test oracle D permits parallel activities when it accepts different permutations of the same stimuli and response observations. We use D to distinguish a deterministic test oracle from probabilistic ones. Test oracles are typically computationally expensive, so probabilistic approaches to the provision of oracle information may be desirable even where a deterministic test oracle is possible [124].
Definition 2.5 (Probabilistic Test Oracle). A probabilistic test oracle D̃ : TA ↦ [0, 1] maps a test activity sequence into the interval [0, 1] ⊂ ℝ.

A probabilistic test oracle returns a real number in the closed interval [0, 1]. As with test oracles, we do not require a probabilistic test oracle to be a total function. A probabilistic test oracle can model the case where the test oracle is only able to efficiently offer a probability that the test case is acceptable, or for other situations where some degree of imprecision can be tolerated in the test oracle’s response.

1. Recall that a function is implicitly total: it maps every element of its domain to a single element of its range. The partial function f : X ↦ Y is the total function f′ : X′ → Y, where X′ ⊆ X.
Our formalism combines a language-theoretic view of stimulus and response activities with constraints over those activities; these constraints explicitly capture specifications. The high-level language view imposes a temporal order on the activities. Thus, our formalism is inherently temporal. The formalism of Staats et al. captures any temporal exercising of the SUT’s behaviour in tests, which are atomic black boxes for them [174]. Indeed, practitioners write test plans and activities; they do not often write specifications at all, let alone formal ones. This fact and the expressivity of our formalism, as evident in our capture of existing test oracle approaches, are evidence that our formalism is a good fit with practice.
2.3 Soundness and Completeness

We conclude this section by defining soundness and completeness of test oracles.

In order to define soundness and completeness of a test oracle, we need to define a concept of the “ground truth”, G. The ground truth is another form of oracle, a conceptual oracle, that always gives the “right answer”. Of course, it cannot be known in all but the most trivial cases, but it is a useful definition that bounds test oracle behaviour.

Definition 2.6 (Ground Truth). The ground truth oracle, G, is a total test oracle that always gives the “right answer”.

We can now define soundness and completeness of a test oracle with respect to G.

Definition 2.7 (Soundness). The test oracle D is sound iff

D(a) ⇒ G(a).

Definition 2.8 (Completeness). The test oracle D is complete iff

G(a) ⇒ D(a).
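The distinction can be made concrete with a small sketch of our own, reusing the earlier squaring example: an oracle that accepts only behaviours it has hand-verified is sound but incomplete, whereas an oracle that accepts everything is complete but unsound.

    def ground_truth(i, o):
        # Conceptual ground-truth oracle G for the squaring SUT.
        return o == i * i

    # Sound but incomplete: acceptance implies correctness, but many
    # correct behaviours are rejected because they were never verified.
    VERIFIED = {(2, 4), (3, 9)}
    def sound_oracle(i, o):
        return (i, o) in VERIFIED

    # Complete but unsound: every correct behaviour is accepted,
    # but so is every incorrect one.
    def complete_oracle(i, o):
        return True

    for i in range(-5, 6):
        o = i * i
        assert (not sound_oracle(i, o)) or ground_truth(i, o)      # D(a) implies G(a)
        assert (not ground_truth(i, o)) or complete_oracle(i, o)   # G(a) implies D(a)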
While test oracles cannot, in general, be both sound and complete, we can, nevertheless, define and use partially correct test oracles. Further, one could argue, from a purely philosophical point of view, that human oracles can be sound and complete, or correct. In this view, correctness becomes a subjective human assessment. The foregoing definitions allow for this case.

We relax our definition of soundness to cater for probabilistic test oracles:
Definition 2.9 (Probabilistic Soundness and Completeness). A probabilistic test oracle D̃ is probabilistically sound iff

P(D̃(w) = 1) > 1/2 + ε ⇒ G(w)

and D̃ is probabilistically complete iff

G(w) ⇒ P(D̃(w) = 1) > 1/2 + ε,

where ε is non-negligible.

The non-negligible advantage ε requires D̃ to do sufficiently better than flipping a fair coin, which for a binary classifier maximizes entropy, that we can achieve arbitrary confidence in whether the test sequence w is valid by repeatedly sampling D̃ on w.
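The amplification argument can be illustrated with a short sketch of our own, under the assumption that repeated queries to the probabilistic oracle are independent: a majority vote over many samples of an oracle whose single-query advantage over a fair coin is ε yields a verdict that is correct with high probability.

    import random

    EPSILON = 0.2  # assumed non-negligible advantage of the toy oracle

    def probabilistic_oracle(activity_sequence, is_valid=True):
        # Stand-in probabilistic test oracle: returns 1 with probability
        # 1/2 + EPSILON when the sequence is valid, 1/2 - EPSILON otherwise.
        p_accept = 0.5 + EPSILON if is_valid else 0.5 - EPSILON
        return 1.0 if random.random() < p_accept else 0.0

    def amplified_verdict(activity_sequence, samples=301):
        # Majority vote over repeated, independent samples; confidence in the
        # verdict grows with the number of samples.
        accepts = sum(probabilistic_oracle(activity_sequence) for _ in range(samples))
        return accepts > samples / 2

    print(amplified_verdict("i o : [o == i**2]"))  # True with high probability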
3 TEST ORACLE RESEARCH TRENDS

The term “test oracle” first appeared in William Howden’s seminal work in 1978 [99]. In this section, we analyze the research on test oracles and related areas conducted since 1978. We begin with a synopsis of the volume of publications, classified into specified, derived, and implicit test oracles, and the handling of the lack of an automated test oracle. We then discuss when key concepts in test oracles were first introduced.
3.1 Volume of Publications

We constructed a repository of 694 publications on test oracles and related areas from 1978 to 2012 by conducting web searches for research articles on Google Scholar and Microsoft Academic Search using the queries “software + test + oracle” and “software + test oracle”,² for each year. Although some of the queries generated in this fashion may be similar, different responses are obtained, with particular differences among the lower-ranked results.

We classify work on test oracles into four categories: specified test oracles (317), derived test oracles (245), implicit test oracles (76), and no test oracle (56), which handles the lack of a test oracle.
Specified test oracles, discussed in detail in Section 4, judge all behavioural aspects of a system with respect to a given formal specification. For specified test oracles we searched for related articles using the queries “formal + specification”, “state-based specification”, “model-based languages”, “transition-based languages”, “assertion-based languages”, “algebraic specification” and “formal + conformance testing”. For all queries, we appended the keywords with “test oracle” to filter the results for test oracles.

Derived test oracles (see Section 5) involve artefacts from which a test oracle may be derived—for instance, a previous version of the system. For derived test oracles, we searched for additional articles using the queries “specification inference”, “specification mining”, “API mining”, “metamorphic testing”, “regression testing” and “program documentation”.

An implicit oracle (see Section 6) refers to the detection of “obvious” faults such as a program crash. For implicit test oracles we applied the queries “implicit oracle”, “null pointer + detection”, “null reference + detection”, “deadlock + livelock + race + detection”, “memory leaks + detection”, “crash + detection”, “performance + load testing”, “non-functional + error detection”, “fuzzing + test oracle” and “anomaly detection”.

There have also been papers researching strategies for handling the lack of an automated test oracle (see Section 7). Here, we applied the queries “human oracle”, “test minimization”, “test suite reduction” and “test data + generation + realistic + valid”.

2. We use + to separate the keywords in a query; a phrase not internally separated by +, like “test oracle”, is a compound keyword, quoted when given to the search engine.
Each of the above queries was appended with the keywords “software testing”. The results were filtered, removing articles that were found to have no relation to software testing and test oracles. Fig. 2 shows the cumulative number of publications on each type of test oracle from 1978 onwards. We analyzed the research trend in this data by applying different regression models. The trend line, shown in Fig. 2, is fitted using a power model. The high values for the four coefficients of determination (R²), one for each of the four types of test oracle, confirm that our models are good fits to the trend data. The trends observed suggest healthy growth in research volumes in these topics related to the test oracle problem in the future.
3.2 The Advent of Test Oracle Techniques

We classified the collected publications by the techniques or concepts they proposed to (partially) solve a test oracle problem; for example, Model Checking [35] and Metamorphic Testing [36] fall into the derived test oracle category, and DAISTS [69] is an algebraic specification system that addresses the specified test oracle problem.

For each type of test oracle and the advent of a technique or a concept, we plotted a timeline in chronological order of publications to study research trends. Fig. 3 shows the timeline starting from 1978, when the term “test oracle” was first coined. Each vertical bar represents a technique or concept used to solve the problem, labelled with the year of its first publication.

Fig. 3. Chronological introduction of test oracle techniques and concepts.

The timeline shows only the work that is explicit on the issue of test oracles. For example, the work on test generation using finite state machines (FSMs) can be traced back to as early as the 1950s. But the explicit use of finite state machines to generate test oracles can be traced back to Jard and Bochmann [102] and Howden in 1986 [98]. We record, in the timeline, the earliest available publication for a given technique or concept. We consider only published work in journals, the proceedings of conferences and workshops, or magazines. We excluded all other types of documentation, such as technical reports and manuals.
Fig. 3 shows a few techniques and concepts that pre-date 1978. Although not explicitly on test oracles, they identify and address issues for which test oracles were later developed. For example, work on detecting concurrency issues (deadlock, livelock, and races) can be traced back to the 1960s. Since these issues require no specification, implicit test oracles can be, and have been, built that detect them on arbitrary systems. Similarly, Regression Testing detects problems in the functionality a new version of a system shares with its predecessors and is a precursor of derived test oracles.

The trend analysis suggests that proposals for new techniques and concepts for the formal specification of test oracles peaked in the 1990s and have gradually diminished in the last decade. However, it remains an area of much research activity, as can be judged from the number of publications for each year in Fig. 2. For derived test oracles, many solutions have been proposed throughout this period. Initially, these solutions were primarily theoretical, such as Partial/Pseudo-Oracles [196] and Specification Inference [194]; empirical studies, however, followed in the late 1990s.

For implicit test oracles, research into the solutions established before 1978 has continued, but at a slower pace than the other types of test oracles. For handling the lack of an automated test oracle, Partition Testing is a well-known technique that helps a human test oracle select tests. The trend line suggests that only recently have new techniques and concepts for tackling this problem started to emerge, with an explicit focus on the human oracle cost problem.
4 SPECIFIED TEST ORACLES

Specification is fundamental to computer science, so it is not surprising that a vast body of research has explored its use as a source of test oracle information. This topic could merit an entire survey in its own right. In this section, we provide an overview of this work. We also include here partial specifications of system behaviour, such as assertions and models.

A specification defines, if possible using mathematical logic, the test oracle for a particular domain. Thus, a specification language is a notation for defining a specified test oracle D, which judges whether the behaviour of a system conforms to a formal specification. Our formalism, defined in Section 2, is, itself, a specification language for specifying test oracles.
Over the last 30 years, many methods and formalisms for testing based on formal specification have been developed. They fall into four broad categories: model-based specification languages, state transition systems, assertions and contracts, and algebraic specifications. Model-based languages define models and a syntax that defines desired behaviour in terms of its effect on the model. State transition systems focus on modelling the reaction of a system to stimuli, referred to as “transitions” in this particular formalism. Assertions and contracts are fragments of a specification language that are interleaved with statements of the implementation language and checked at runtime. Algebraic specifications define equations over a program’s operations that hold when the program is correct.
4.1 Specification Languages

Specification languages define a mathematical model of a system’s behaviour, and are equipped with a formal semantics that defines the meaning of each language construct in terms of the model. When used for testing, models do not usually fully specify the system, but seek to capture salient properties of a system so that test cases can be generated from or checked against them.
4.1.1 Model-Based Specification Languages

Model-based specification languages model a system as a collection of states and operations to alter these states, and are therefore also referred to as “state-based specifications” in the literature [101], [109], [182], [183]. Preconditions and postconditions constrain the system’s operations. An operation’s precondition imposes a necessary condition over the input states that must hold in a correct application of the operation; a postcondition defines the (usually strongest) effect the operation has on program state [109].
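As a simplified illustration of preconditions and postconditions acting as a specified test oracle (a sketch of ours, not a construct of any particular model-based language), consider an integer square-root operation: the precondition restricts the input state, and the postcondition, checked at run time, judges the observed result.

    import math

    def isqrt_spec_oracle(x, result):
        # Precondition: the operation is only specified for x >= 0.
        if x < 0:
            return True  # outside the specified domain; nothing to judge
        # Postcondition: result is the largest integer whose square is <= x.
        return result >= 0 and result * result <= x < (result + 1) ** 2

    # Judging an implementation (Python's math.isqrt stands in for the SUT).
    for x in range(1000):
        assert isqrt_spec_oracle(x, math.isqrt(x))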
A variety of model-based specification languages exist, including Z [172], B [110], UML/OCL [31], VDM/VDM-SL [62], Alloy [101], and the LARCH family [71], which includes an algebraic specification sub-language. Broadly, these languages have evolved toward being more concrete, closer to the implementation languages programmers use to solve problems. Two reasons explain this phenomenon: the first is the effort to increase their adoption in industry by making them more familiar to practitioners, and the second is to establish synergies between specification and implementation that facilitate development as iterative refinement. For instance, Z models disparate entities, like predicates, sets, state properties, and operations, through a single structuring mechanism, its schema construct; the B method, Z’s successor, provides a richer array of less abstract language constructs.

Börger discusses how to use the abstract state machine formalism, a very general set-theoretic specification language geared toward the definition of functions, to define high-level test oracles [29]. The models underlying specification languages can be very abstract, quite far from concrete execution output. For instance, it may be difficult to compute whether a model’s postcondition for a function permits an observed concrete output. If this impedance mismatch can be overcome, by abstracting a system’s concrete output or by concretizing a specification model’s output, and if a specification’s postconditions can be evaluated in finite time, they can serve as a test oracle [4].
Model-based specification languages, such as VDM, Z, and B, can express invariants, which can drive testing. Any test case that causes a program to violate an invariant has discovered an incorrect behaviour; therefore, these invariants are partial test oracles.
In search of a model-based specification language accessible to domain experts, Parnas et al. proposed TOG (Test Oracle Generator), which generates test oracles from program documentation [142], [145], [148]. In their method, the documentation is written in fully formal tabular expressions in which the method signature, the external variables, and the relation between its start and end states are specified [104]. Thus, test oracles can be automatically generated to check the outputs against the specified states of a program. The work by Parnas et al. has been developed over a considerable period of more than two decades [48], [59], [60], [144], [149], [190], [191].
4.1.2 State Transition Systems

State transition systems often present a graphical syntax, and focus on transitions between different states of the system. Here, states typically abstract sets of concrete states of the modelled system. State transition systems have been referred to as visual languages in the literature [197]. A wide variety of state transition systems exist, including finite state machines [111], Mealy/Moore machines [111], I/O Automata [117], Labelled Transition Systems [180], SDL [54], Harel Statecharts [81], UML state machines [28], X-Machines [95], [96], Simulink/Stateflow [179] and PROMELA [97]. Mouchawrab et al. conducted a rigorous empirical evaluation of test oracle construction techniques using state transition systems [70], [137].
An important class of state transition systems has a finite set of states and is therefore particularly well suited for automated reasoning about systems whose behaviour can be abstracted into states defined by a finite set of values [93]. State transition systems capture the behaviour of a system under test as a set of states,³ with transitions representing stimuli that cause the system to change state. State transition systems model the output of a system they abstract either as a property of the states (the final state in the case of Moore machines) or of the transitions traversed (as with Mealy machines).
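A minimal sketch of our own, with an invented two-state protocol, shows how a Mealy machine can serve as a test oracle: the model's transition table predicts, for each stimulus, both the expected output and the next state, and the oracle rejects any observed output that diverges from the model.

    # Mealy machine model: (state, stimulus) -> (expected output, next state).
    # The two-state "door" protocol below is invented purely for illustration.
    MODEL = {
        ("closed", "open"):  ("opening", "open"),
        ("closed", "close"): ("ignored", "closed"),
        ("open",   "close"): ("closing", "closed"),
        ("open",   "open"):  ("ignored", "open"),
    }

    def mealy_oracle(trace, start_state="closed"):
        # Accept a trace of (stimulus, observed output) pairs iff every observed
        # output matches the output the model predicts for that transition.
        state = start_state
        for stimulus, observed in trace:
            expected, next_state = MODEL[(state, stimulus)]
            if observed != expected:
                return False
            state = next_state
        return True

    assert mealy_oracle([("open", "opening"), ("close", "closing")])
    assert not mealy_oracle([("open", "opening"), ("open", "opening")])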
Models approximate a SUT, so behavioural differences between the two are inevitable. Some divergences, however, are spurious and falsely report testing failure. State-transition models are especially susceptible to this problem when modelling embedded systems, for which time of occurrence is critical. Recent work tolerates spurious differences in time by “steering” the model’s evaluation: when the SUT and its model differ, the model is backtracked, and a steering action, like modifying a timer value or changing inputs, is applied to reduce the distance, under a similarity measure [74].

Protocol conformance testing [72] and, later, model
