To study the quality of the model (from which the quality of the NRR can be inferred directly), 1000 synthetic volumes are generated from that model. Using MATLAB (or potentially C++/VXL), each synthetic volume is compared against the original (training) dataset, that is, the set that was registered and used to build the model. The shuffle distances between volumes in the training and synthetic sets can then be aggregated or averaged to give a single figure of merit, which indicates how good or bad the registration was.
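One way the aggregation step could look is sketched below in Python (used here only for brevity; the text itself mentions MATLAB or C++/VXL). The function names are hypothetical, and the choice of taking, for each synthetic volume, the distance to its nearest training volume before averaging is an assumption for illustration; a plain average over all pairs would be another option. The `distance` argument stands in for the shuffle distance, and `mean_abs_diff` is only a placeholder.

```python
from statistics import mean


def figure_of_merit(synthetic, training, distance):
    """Mean, over synthetic volumes, of the distance to the nearest training volume.

    Assumption: nearest-neighbour aggregation. `distance` is any callable
    taking two volumes and returning a non-negative number.
    """
    nearest = [min(distance(s, t) for t in training) for s in synthetic]
    return mean(nearest)


def mean_abs_diff(a, b):
    """Placeholder distance on flat lists of intensities (not the shuffle distance)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

A lower value means the synthetic volumes lie closer to the training set under the chosen distance.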
In shuffle distance estimates, it is preferable to define the neighbourhood by a radius rather than by a cube (or box) of voxels. A radius of 2.5 voxels is said to be preferable (it corresponds roughly to a box of 5x5x5 voxels). Another issue emerges: the slice thickness differs from the in-plane pixel resolution of any given slice. A better solution should therefore involve an elliptical neighbourhood (i.e. 2 radii), with one radius within the slice and another across slices. Another possibility is a plane-wise comparison, in which each slice is compared against the corresponding slice of the other volume. This can be done in a similar fashion to the 2-D experiments, treating a volume as a set of planes.
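The two-radius neighbourhood can be illustrated with a minimal Python sketch. It assumes the shuffle distance is the mean, over voxels of the first volume, of the smallest absolute intensity difference found within the neighbourhood of the corresponding voxel in the second volume. The names `ellipsoid_offsets`, `shuffle_distance`, `r_xy` and `r_z` are hypothetical, and volumes are taken to be nested lists indexed `[z][y][x]`. Setting the through-plane radius below 1 reduces the search to the same slice, which corresponds to the plane-wise comparison.

```python
def ellipsoid_offsets(r_xy, r_z):
    """Integer offsets (dz, dy, dx) inside an ellipsoid.

    r_xy: in-plane radius, r_z: through-plane radius, both in voxels and > 0.
    """
    offsets = []
    nz, nxy = int(r_z), int(r_xy)
    for dz in range(-nz, nz + 1):
        for dy in range(-nxy, nxy + 1):
            for dx in range(-nxy, nxy + 1):
                if (dx / r_xy) ** 2 + (dy / r_xy) ** 2 + (dz / r_z) ** 2 <= 1.0:
                    offsets.append((dz, dy, dx))
    return offsets


def shuffle_distance(a, b, offsets):
    """Mean over voxels of `a` of the minimum |a - b| within the neighbourhood in `b`.

    Note: this measure is not symmetric in `a` and `b`.
    Offsets that fall outside the volume are skipped.
    """
    depth, height, width = len(a), len(a[0]), len(a[0][0])
    total = 0.0
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                best = None
                for dz, dy, dx in offsets:
                    zz, yy, xx = z + dz, y + dy, x + dx
                    if 0 <= zz < depth and 0 <= yy < height and 0 <= xx < width:
                        d = abs(a[z][y][x] - b[zz][yy][xx])
                        if best is None or d < best:
                            best = d
                total += best  # offset (0, 0, 0) is always present, so best is set
    return total / (depth * height * width)
```

For example, `shuffle_distance(vol_a, vol_b, ellipsoid_offsets(2.5, 1.0))` would search 2.5 voxels within each slice but only one slice up or down, which suits slices that are thicker than the in-plane pixel spacing; the specific radii here are illustrative only.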