This section describes the design of experiments that investigate the behaviour of different methods of evaluating NRR. The central idea is that progressively misregistering initially registered datasets should produce monotonically increasing values of specificity and generalisation, which corresponds to monotonically decreasing performance. We also derive a measure of sensitivity to misregistration, which can be used to compare methods of NRR evaluation.
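
As a minimal illustration of the monotonicity expectation only, the check below tests whether a sequence of measure values, ordered by increasing misregistration, never decreases. The function name `is_monotonically_increasing` and the numbers are hypothetical placeholders rather than results from these experiments, and the sketch does not attempt the sensitivity measure, which is defined later in the section.

```python
from typing import Sequence


def is_monotonically_increasing(values: Sequence[float], strict: bool = False) -> bool:
    """Return True if each value is >= (or > when strict) the one before it.

    `values` is assumed to be ordered by increasing misregistration level.
    """
    pairs = list(zip(values, values[1:]))
    if strict:
        return all(later > earlier for earlier, later in pairs)
    return all(later >= earlier for earlier, later in pairs)


# Hypothetical placeholder values, NOT experimental results:
# one measure value per increasing level of misregistration.
example_well_behaved = [0.10, 0.14, 0.21, 0.35]
example_badly_behaved = [0.10, 0.14, 0.12, 0.35]

print(is_monotonically_increasing(example_well_behaved))
print(is_monotonically_increasing(example_badly_behaved))
```

An evaluation method whose values pass this check across the whole range of misregistration behaves as the experimental design expects; one that fails it at some level does not reliably track registration quality there.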