In this section, we present experiments which investigate the behaviour of our evaluation method. The principal idea is that of producing perturbed datasets, with progressive degrees of degradation. Ideally, our specificity and generalisation measures should vary monotonically with the degree of degradation.
The sensitivity of our method, that is, the degree of change that it can reliably detect, is an important issue, which is further explored in Section .