# Background

A couple of weeks ago, when I was writing a jcunit demo that tests `geophile`

[0]
library,
I noticed that its old constraint handling doesn't work well.
Since the behavior looked hard to fix without hurting compatibility and it would take
a bit time, I decided to implement another algorithm which I came up with.

# Introduction

I'm calling this new algorithm **IPO-gc**, named after well-known one **IPO-G**.
Before getting into my own, I would shortly touch up on its ancestor.

**IPO-G**[7], which is a basis of my new algorithm **IPO-gc**, is one of best known
combinatorial test suite generation algorithms where it grows up an initial covering
array, generated by calculating Cartesian products of first *t*(= desired strength) factors,
horizontally and vertically until all the desired tuples are covered.

Following is a pseudo code to illustrate the algorithm.[7]

Algorithm: IPOG-Test (int t , ParameterSet ps ) { 1. initialize test set ts to be an empty set 2. denote the parameters in ps , in an arbitrary order, as P1 , P2, ..., and Pn 3. add into test set ts a test for each combination of values of the first t parameters 4. for (int i = t + 1 ; i ≤ n ; i ++ ){ 5. let π be the set of t -way combinations of values involving parameter Pi and t -1 parameters among the first i – 1 parameters 6. // horizontal extension for parameter Pi 7. for (each test τ = (v 1 , v 2 , ..., v i-1 ) in test set ts ) { 8. choose a value vi of Pi and replace τ with τ’ = (v 1 , v 2 , ..., vi-1 , vi ) so that τ’ covers the most number of combinations of values in π 9. remove from π the combinations of values covered by τ’ 10. } 11. // vertical extension for parameter P i 12. for (each combination σ in set π ) { 13. if (there exists a test that already covers σ ) { 14. remove σ from π 15. } else { 16. change an existing test, if possible, or otherwise add a new test to cover σ and remove it from π 17. } 18. } 19. } 20. return ts; }

This algorithm consists of 4 most core elements, which are

1. *ts* = Create a initial covering array from first *t* factors. 2. *π* = Compute partial tuples to be covered by an iteration foreach *factor* not in first *t* factors: 3. hg: Grow *ts* horizontally by adding a *factor* to the right of given one so that partial tuples in *π* as many as possible. covered tuples by this step will be removed from *π*. while *π* is not empty 4. vg: Grow *ts* vertically by adding a new full tuple that covers partial tuples in π as many as possible. If there is already a tuple in *ts* that can cover another partial tuple in *π*, it will be modified to cover the partial one instead of adding a new one to *ts*.

where "a full tuple" means a tuple which has all the first *i* attributes and
"a partial tuple" means a tuple in π, only has *t* attributes.

# IPO-gc algorithm

- Enhance definitions of
*hg*and*vg*operation so that it can create a new covering array from 2 partial covering arrays, not only from 1 covering array and 1 factor. - Consider a simple factor a special form of covering arrays.
- Group given constraints into smaller chunks, each of which reachable from one constraint to another connected by shared factors.
- For each group of constraints, create covering array from factors referenced by the constraints in it. In the current implementation it is a brute force algorithm, but it can be any algorithm such as PICT, IPOC, or other AETG based algorithms.

Assumptions behind the last item is following,

- Generally speaking, constraints defined over a test space can be grouped into some smaller groups each of which contains only small amount of factors.

In my limited experience, this is true for automatic test suite generation for FSM's by JCUnit[1]. And actually it would be considered that creating a set of constraints that relate to many factors directly and indirectly isn't a good idea[8]

## Definitions

Before discussing further details, I want to define following several terms.

*FS*Factors.*CS*Constraint sets.*t*Desired strength of a generated test suite.*ts*Generated test suite. To which the algorithm adds generated tests by growing up horizontally and vertically.*τ*A (current) element in*ts*.*π*Partial tuples that should be covered in a current iteration. A tuple in this set has values from*FS*(simple factors) only. Not from*GFS*because this set is used to evaluate how good a next choice is and it should be done based on how many possible partial tuples generated from*FS*not from*GFS*.*GFS*Grouped factors. A grouped factors consists of one or more factors. And has zero or more constraints each of which uses one or more factors in it, but none of the constraints uses any factors outside the grouped factor. A grouped factor is one form of a "covering array" whose characteristics will be discussed later.*GFS.length*Total number of grouped factors in*GFS*.*GFS[ i]**i*th element of*GFS*.*GFS[ i].subfactors*Simple factors that belong to*GFS[ i]*.*GFS[ i].constraints*Constraints that belong to*GFS[ i]*.*GFS[ i].levels*Levels of*GSF[ i]*, which are tuples that consist of values from*GFS[ i].subfactors*.*A grouped factor*: If a set of factors*FS*, strength*t*, and constraints*CS*over them are given, we can define a grouped factor*GF(FS, CS, t)*as follows.- It is a set of full-tuples all of
*FS*. - All the possible tuples of strength
*t*over*FS*are covered by it. - None of
*CA*violates any of*CS*. - For simplicity, if an element is removed from it and the conditions above are still met, it will be removed.

- It is a set of full-tuples all of

One thing to be noted here is once a covering array is computed for a given
factors and their constraints, we are always able to compute all the possible
partial tuples of strength *t* by just scanning elements in it. You do not
need to worry about what constraints are given.

## Ideas

Figures to illustrate the ideas of **IPO-gc** are as follows

In IPO-G, horizontal growth *hg* is defined as an operation between, *ts*, *Pi*
(a conventional factor), *t*(desired strength), and *π* (tuples to be covered
in this iteration).

IPO-G ts ts' P0 P1...Pi-1 Pi P0 P1...Pi-1 Pi +---------+ +---+ +---------------+ | | |v0 | | | | | |v1 | | | | | |. | | | hg( |τ | , |. | , t, π) => | | | | |. | | | | | | | | | | | | | | | +---------+ +---+ +---------------+

*π* is calculated from a set of all possible tuples of strength *t* in *ts* and
all the levels of *Pi* just by Cartesian product.

This will look like following diagram in IPO-gc.

IPO-gc ts ts' GF[0] GF[1]...GF[i-1] GF[i] GF[0] GF[1]...GF[i] +---------------------+ +---+ +---------------+ | | |v0 | | | | | |v1 | | | | | |. | | | hg( |τ | , |. | , t, π) => |τ' | | | |. | | | | | | | | | | | | | | +---------------------+ +---+ +---------------+

*GF[ i]* is a grouped factor to be added to ts. *π* is calculated by following procedure.

- Pick up all the possible partial tuples of strength
*x*(smaller than*t*)from*ts*. Call it*Left*. - Pick up all the possible partial tuples of strength
*y*(*x*+*y*=*t*)from*GF[i]*. Call it*Right*. - Calculate cartesian product between
*Left*and*Right*. =>*π*

When expanding *ts* to *ts'*, values for *GF[i]* of *τ'* will be chosen to cover
as many tuples in *π* as possible.

In **vg** (vertical growth) operation, there's a not a big difference than in **hg**.
Add a new test or modify existing test to cover a new tuple. But when we add or modify
a tuple, attributes that belong to the same grouped factor need to be updated at
once. Not one by one. Otherwise, we may create a test case that violates some of
given constraints.

To build a grouped factor, we group constraints first by original factors involved by them.

Suppose that factors *Pi* ( 0 <= i <= 7) and constraints *C1*, *C2*, *C3*, and *C4*
are given.
And factors involved in each constraint are described as in following diagram.

P1 P2 P3 P4 P5 P6 P7 C1 |-----| | C2 | | C3 | |-----| C4 |

In this case, **IPO-gc** will build 3 grouped factors

*GF[ 0]*= ({*P1*,*P2*,*P4*}, {*C1*,*C2*})*GF[ 1]*= ({*P3*,*P5*,*P6*}, {*C3*})*GF[ 2]*= ({*P7*}, {*C4*})

**Note:** A simple factor can be considered a grouped factor.

## Algorithm

In short, it is an algorithm that transforms given factors and constraints into
grouped factors, which are actually a set of covering arrays under related constraints,
of desired strength *t*.
And then do enhanced IPO algorithm, which can handle horizontal growth between
current *ts* (test suite) and a grouped factor.

IPO-gc algorithm (summarized version) Input: - FS: Factor set - CS: Constraint set - t: strength Output: - ts <- ca(FS, CS, t) 1. GFS[] = transform(FS, CS, t) 2. ts = initial_covering_array(GFS[0], ..., GFS[t-1]) 3. foreach GF in [GFS[t], GFS[t + 1], ..., GFS[GFS.length - 1]] { 3.0 i = indexOf(GF in GFS) 3.1 π = compute_all_possible_partial_tuples([GFS[0], ..., GFS[i-1]], GF) // horizontal extention 3.2 foreach τ in ts[] { 3.2.1 v = choose_best_level(τ, GF, π) 3.2.2 update(τ, GF, v) 3.2.3 remove_subsuples(π, τ) } 3.2.4 // vertical extension for parameter P i 3.2.5 for (each combination σ in set π ) { 3.2.6 if (there exists a test that already covers σ ) { 3.2.7 remove σ from π } else { 3.2.8 change an existing test, if possible, or otherwise add a new test to cover σ and remove it from π } }

In the step 1, I just stated "transform simple factors into grouped factors". But each grouped factor is a covering array for given factors and constraints and it's very preferable to make it small. In other words, we still need to solve the same problem, "how to create a covering array from given factors under given constraints".

**IPO-gc** algorithm is not providing a good way to solve it. It is rather a framework
to create a bigger covering array from given ones.
In current implementation in **JCUnit**, what I created is just a very naive one.
Just check all the possible combinations and exclude ones that violate any of related
constraints. And then remove redundant ones (that do not cover new combinations).
This approach can become easily impractical when many factors are involved in constraints.
But,

- Maybe you should revise the test design so that we can make the model simple.
- It is easy to use another better algorithm to handle constraints and generate a covering array.

Another thing that needs to be achieved in this step is to group given constraints
and factors into some cohorts.
If a constraint *C1* uses factors *A* and *B*, and another constraint *C2* uses *B*
and *C*, those will belong to the same group. And factors *A*, *B*, and *C* will
belong to the same grouped factor. Plus *C1* and *C2* will belong to it, too.
Because *B* is shared by *C1* and *C2*, and this means we need to take all these
3 factors into account at once to check if a given tuple is violating any of them
or not.

But *C3*, that only involves factor *D* and *E*, it will belong to another grouped
factor, because there is no overlap between factors referenced by *C1* and *C2*.

In step 3.2.1, original version of IPO-G just chooses one level from a simple factor
that covers tuples in π the most. But in **IPO-gc** it is required to choose a set
of levels in a manner the set of chosen levels do not violate any given constraints.
This can be achieved by choosing a level of a grouped factor rather than choosing
levels from individual (sub-)factors.

In step 3.2.8, the intention of the step is also same as the original one, but what we need to do is a bit different. If σ is involved in a constraint, other attributes might be required to be a specific value or in a specific range. In such case, we need to change not only the values directly mentioned in σ but also other (sub-)factors that belong to the same grouped factor.

**NOTE:** In step 3.1, We are calculating cartesian product of *GFS[ 0 ], ..., GFS[i-1]*,
and *GF*. But we should be able to make the step faster by scanning *ts* instead
of the expensive operation.

3.1 π = compute_all_possible_partial_tuples(ts, GF)

Because *ts* should cover all the desired partial tuples at the beginning of an
iteration.
But we are not getting into further detail of it for now since it is an implementation
detail.

# Advantages and limitations of this algorithm

*Easier to make it parallel:*Calculating covering arrays is an expensive operation and we can do it concurrently.*Easy to implement:*I could write this mechanism with in 2 weekds from I came up with it.

# Future works

**Survey:**I invented this algorithm but did not conducted any survey to claim it is a novel one. Maybe I'm just re-invented it since it's sort of naive. Also the assumption mentioned in*IPO-gc algorithm*section, where constraints should not involve so many parameters and therefore brute force algorithm is sufficient, should be surveyed to what extent it is applicable.**Use other algorithms to generate a covering array for a grouped factor:**It is easy to enhance the implementation to make the algorithm pluggable but right now it is hard-coded because there is only one available algorithm.**Evaluation:**Let's compare this algorithm with other ones. Sizes of test suites, performance, etc.**Synthesize a new level from existing levels instead of just choosing one from a "grouped factor"(covering array under constraints):**This will make generated test suite's size smaller.