A new IPO-G based algorithm that supports constraint handling
Background
A couple of weeks ago, when I was writing a jcunit demo that tests geophile
[0]
library,
I noticed that its old constraint handling doesn't work well.
Since the behavior looked hard to fix without hurting compatibility and it would take
a bit time, I decided to implement another algorithm which I came up with.
Introduction
I'm calling this new algorithm IPO-gc, named after well-known one IPO-G. Before getting into my own, I would shortly touch up on its ancestor.
IPO-G[7], which is a basis of my new algorithm IPO-gc, is one of best known combinatorial test suite generation algorithms where it grows up an initial covering array, generated by calculating Cartesian products of first t(= desired strength) factors, horizontally and vertically until all the desired tuples are covered.
Following is a pseudo code to illustrate the algorithm.[7]
Algorithm: IPOG-Test (int t , ParameterSet ps ) { 1. initialize test set ts to be an empty set 2. denote the parameters in ps , in an arbitrary order, as P1 , P2, ..., and Pn 3. add into test set ts a test for each combination of values of the first t parameters 4. for (int i = t + 1 ; i ≤ n ; i ++ ){ 5. let π be the set of t -way combinations of values involving parameter Pi and t -1 parameters among the first i – 1 parameters 6. // horizontal extension for parameter Pi 7. for (each test τ = (v 1 , v 2 , ..., v i-1 ) in test set ts ) { 8. choose a value vi of Pi and replace τ with τ’ = (v 1 , v 2 , ..., vi-1 , vi ) so that τ’ covers the most number of combinations of values in π 9. remove from π the combinations of values covered by τ’ 10. } 11. // vertical extension for parameter P i 12. for (each combination σ in set π ) { 13. if (there exists a test that already covers σ ) { 14. remove σ from π 15. } else { 16. change an existing test, if possible, or otherwise add a new test to cover σ and remove it from π 17. } 18. } 19. } 20. return ts; }
This algorithm consists of 4 most core elements, which are
1. *ts* = Create a initial covering array from first *t* factors. 2. *π* = Compute partial tuples to be covered by an iteration foreach *factor* not in first *t* factors: 3. hg: Grow *ts* horizontally by adding a *factor* to the right of given one so that partial tuples in *π* as many as possible. covered tuples by this step will be removed from *π*. while *π* is not empty 4. vg: Grow *ts* vertically by adding a new full tuple that covers partial tuples in π as many as possible. If there is already a tuple in *ts* that can cover another partial tuple in *π*, it will be modified to cover the partial one instead of adding a new one to *ts*.
where "a full tuple" means a tuple which has all the first i attributes and "a partial tuple" means a tuple in π, only has t attributes.
IPO-gc algorithm
- Enhance definitions of hg and vg operation so that it can create a new covering array from 2 partial covering arrays, not only from 1 covering array and 1 factor.
- Consider a simple factor a special form of covering arrays.
- Group given constraints into smaller chunks, each of which reachable from one constraint to another connected by shared factors.
- For each group of constraints, create covering array from factors referenced by the constraints in it. In the current implementation it is a brute force algorithm, but it can be any algorithm such as PICT, IPOC, or other AETG based algorithms.
Assumptions behind the last item is following,
- Generally speaking, constraints defined over a test space can be grouped into some smaller groups each of which contains only small amount of factors.
In my limited experience, this is true for automatic test suite generation for FSM's by JCUnit[1]. And actually it would be considered that creating a set of constraints that relate to many factors directly and indirectly isn't a good idea[8]
Definitions
Before discussing further details, I want to define following several terms.
- FS Factors.
- CS Constraint sets.
- t Desired strength of a generated test suite.
- ts Generated test suite. To which the algorithm adds generated tests by growing up horizontally and vertically.
- τ A (current) element in ts.
- π Partial tuples that should be covered in a current iteration. A tuple in this set has values from FS (simple factors) only. Not from GFS because this set is used to evaluate how good a next choice is and it should be done based on how many possible partial tuples generated from FS not from GFS.
- GFS Grouped factors. A grouped factors consists of one or more factors. And has zero or more constraints each of which uses one or more factors in it, but none of the constraints uses any factors outside the grouped factor. A grouped factor is one form of a "covering array" whose characteristics will be discussed later.
- GFS.length Total number of grouped factors in GFS.
- GFS[ i] ith element of GFS.
- GFS[ i].subfactors Simple factors that belong to GFS[ i].
- GFS[ i].constraints Constraints that belong to GFS[ i].
- GFS[ i].levels Levels of GSF[ i], which are tuples that consist of values from GFS[ i].subfactors.
A grouped factor: If a set of factors FS, strength t, and constraints CS over them are given, we can define a grouped factor GF(FS, CS, t) as follows.
- It is a set of full-tuples all of FS.
- All the possible tuples of strength t over FS are covered by it.
- None of CA violates any of CS.
- For simplicity, if an element is removed from it and the conditions above are still met, it will be removed.
One thing to be noted here is once a covering array is computed for a given factors and their constraints, we are always able to compute all the possible partial tuples of strength t by just scanning elements in it. You do not need to worry about what constraints are given.
Ideas
Figures to illustrate the ideas of IPO-gc are as follows
In IPO-G, horizontal growth hg is defined as an operation between, ts, Pi (a conventional factor), t(desired strength), and π (tuples to be covered in this iteration).
IPO-G ts ts' P0 P1...Pi-1 Pi P0 P1...Pi-1 Pi +---------+ +---+ +---------------+ | | |v0 | | | | | |v1 | | | | | |. | | | hg( |τ | , |. | , t, π) => | | | | |. | | | | | | | | | | | | | | | +---------+ +---+ +---------------+
π is calculated from a set of all possible tuples of strength t in ts and all the levels of Pi just by Cartesian product.
This will look like following diagram in IPO-gc.
IPO-gc ts ts' GF[0] GF[1]...GF[i-1] GF[i] GF[0] GF[1]...GF[i] +---------------------+ +---+ +---------------+ | | |v0 | | | | | |v1 | | | | | |. | | | hg( |τ | , |. | , t, π) => |τ' | | | |. | | | | | | | | | | | | | | +---------------------+ +---+ +---------------+
GF[ i] is a grouped factor to be added to ts. π is calculated by following procedure.
- Pick up all the possible partial tuples of strength x (smaller than t)from ts. Call it Left.
- Pick up all the possible partial tuples of strength y (x + y = t)from GF[i]. Call it Right.
- Calculate cartesian product between Left and Right. => π
When expanding ts to ts', values for GF[i] of τ' will be chosen to cover as many tuples in π as possible.
In vg (vertical growth) operation, there's a not a big difference than in hg. Add a new test or modify existing test to cover a new tuple. But when we add or modify a tuple, attributes that belong to the same grouped factor need to be updated at once. Not one by one. Otherwise, we may create a test case that violates some of given constraints.
To build a grouped factor, we group constraints first by original factors involved by them.
Suppose that factors Pi ( 0 <= i <= 7) and constraints C1, C2, C3, and C4 are given. And factors involved in each constraint are described as in following diagram.
P1 P2 P3 P4 P5 P6 P7 C1 |-----| | C2 | | C3 | |-----| C4 |
In this case, IPO-gc will build 3 grouped factors
- GF[ 0] = ({P1, P2, P4}, {C1, C2})
- GF[ 1] = ({P3, P5, P6}, {C3})
- GF[ 2] = ({P7}, {C4})
Note: A simple factor can be considered a grouped factor.
Algorithm
In short, it is an algorithm that transforms given factors and constraints into grouped factors, which are actually a set of covering arrays under related constraints, of desired strength t. And then do enhanced IPO algorithm, which can handle horizontal growth between current ts (test suite) and a grouped factor.
IPO-gc algorithm (summarized version) Input: - FS: Factor set - CS: Constraint set - t: strength Output: - ts <- ca(FS, CS, t) 1. GFS[] = transform(FS, CS, t) 2. ts = initial_covering_array(GFS[0], ..., GFS[t-1]) 3. foreach GF in [GFS[t], GFS[t + 1], ..., GFS[GFS.length - 1]] { 3.0 i = indexOf(GF in GFS) 3.1 π = compute_all_possible_partial_tuples([GFS[0], ..., GFS[i-1]], GF) // horizontal extention 3.2 foreach τ in ts[] { 3.2.1 v = choose_best_level(τ, GF, π) 3.2.2 update(τ, GF, v) 3.2.3 remove_subsuples(π, τ) } 3.2.4 // vertical extension for parameter P i 3.2.5 for (each combination σ in set π ) { 3.2.6 if (there exists a test that already covers σ ) { 3.2.7 remove σ from π } else { 3.2.8 change an existing test, if possible, or otherwise add a new test to cover σ and remove it from π } }
In the step 1, I just stated "transform simple factors into grouped factors". But each grouped factor is a covering array for given factors and constraints and it's very preferable to make it small. In other words, we still need to solve the same problem, "how to create a covering array from given factors under given constraints".
IPO-gc algorithm is not providing a good way to solve it. It is rather a framework to create a bigger covering array from given ones. In current implementation in JCUnit, what I created is just a very naive one. Just check all the possible combinations and exclude ones that violate any of related constraints. And then remove redundant ones (that do not cover new combinations). This approach can become easily impractical when many factors are involved in constraints. But,
- Maybe you should revise the test design so that we can make the model simple.
- It is easy to use another better algorithm to handle constraints and generate a covering array.
Another thing that needs to be achieved in this step is to group given constraints and factors into some cohorts. If a constraint C1 uses factors A and B, and another constraint C2 uses B and C, those will belong to the same group. And factors A, B, and C will belong to the same grouped factor. Plus C1 and C2 will belong to it, too. Because B is shared by C1 and C2, and this means we need to take all these 3 factors into account at once to check if a given tuple is violating any of them or not.
But C3, that only involves factor D and E, it will belong to another grouped factor, because there is no overlap between factors referenced by C1 and C2.
In step 3.2.1, original version of IPO-G just chooses one level from a simple factor that covers tuples in π the most. But in IPO-gc it is required to choose a set of levels in a manner the set of chosen levels do not violate any given constraints. This can be achieved by choosing a level of a grouped factor rather than choosing levels from individual (sub-)factors.
In step 3.2.8, the intention of the step is also same as the original one, but what we need to do is a bit different. If σ is involved in a constraint, other attributes might be required to be a specific value or in a specific range. In such case, we need to change not only the values directly mentioned in σ but also other (sub-)factors that belong to the same grouped factor.
NOTE: In step 3.1, We are calculating cartesian product of GFS[ 0 ], ..., GFS[i-1], and GF. But we should be able to make the step faster by scanning ts instead of the expensive operation.
3.1 π = compute_all_possible_partial_tuples(ts, GF)
Because ts should cover all the desired partial tuples at the beginning of an iteration. But we are not getting into further detail of it for now since it is an implementation detail.
Advantages and limitations of this algorithm
- Easier to make it parallel: Calculating covering arrays is an expensive operation and we can do it concurrently.
- Easy to implement: I could write this mechanism with in 2 weekds from I came up with it.
Future works
- Survey: I invented this algorithm but did not conducted any survey to claim it is a novel one. Maybe I'm just re-invented it since it's sort of naive. Also the assumption mentioned in IPO-gc algorithm section, where constraints should not involve so many parameters and therefore brute force algorithm is sufficient, should be surveyed to what extent it is applicable.
- Use other algorithms to generate a covering array for a grouped factor: It is easy to enhance the implementation to make the algorithm pluggable but right now it is hard-coded because there is only one available algorithm.
- Evaluation: Let's compare this algorithm with other ones. Sizes of test suites, performance, etc.
- Synthesize a new level from existing levels instead of just choosing one from a "grouped factor"(covering array under constraints): This will make generated test suite's size smaller.