The notion of ``information gained'' and related uncertainty measures were first proposed by Hartley and later extended by Shannon, long before they became applicable to registration [15]. Shannon quantified the amount of uncertainty in a message and named it entropy, a fundamental quantity in information theory.
This information-theoretic principle is relevant to our research because the redundancy encapsulated in a message is measurable. Compressibility implies simplicity: the less uncertainty present in the data, the fewer bits are required to encode its redundancy. Description length can therefore be viewed as a measure of the simplicity of the data, enabling the prediction and detection of patterns. The Minimum Description Length (MDL) principle, defined by Rissanen [16] and closely related to Minimum Message Length, provides a practical mechanism for measuring the complexity of data; following Occam's Razor, the most concise model should also fit best.
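The link between entropy and compressibility can be illustrated with a small sketch: a low-entropy (redundant) byte string compresses far below its raw size, while a near-maximum-entropy string does not. The use of `zlib` as the compressor and the particular test strings are illustrative choices, not part of the cited theory.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy_bits_per_symbol(data: bytes) -> float:
    """Empirical Shannon entropy H = -sum p_i * log2(p_i) over byte frequencies."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A highly redundant (low-entropy) message: only two symbols, so H = 1 bit/byte.
redundant = b"abab" * 1024

# A high-entropy message: uniformly random bytes (seeded for reproducibility),
# so H approaches the 8 bits/byte maximum and zlib cannot shrink it.
random.seed(0)
varied = bytes(random.getrandbits(8) for _ in range(4096))

for label, msg in [("redundant", redundant), ("varied", varied)]:
    h = shannon_entropy_bits_per_symbol(msg)
    compressed = len(zlib.compress(msg))
    print(f"{label}: entropy={h:.2f} bits/byte, "
          f"raw={len(msg)} B, compressed={compressed} B")
```

The redundant string collapses to a handful of bytes under compression, while the random string stays essentially incompressible, mirroring the claim that low uncertainty means little information is needed to capture the data.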
We are then left with deciding how to construct a message that describes model quality. In practice, such a message must describe not only the model itself, but also the training data encoded relative to that model.
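A minimal sketch of such a two-part message is polynomial model selection, where the total description length is L(model) + L(data | model). The fixed 32-bit cost per parameter and the Gaussian residual code (n/2 times log2 of 2*pi*e*sigma^2 bits) are simplifying assumptions for illustration, not Rissanen's formal coding scheme.

```python
import math
import numpy as np

def description_length(degree, xs, ys, bits_per_param=32):
    """Two-part code length in bits: L(model) + L(data | model).
    L(model): each fitted coefficient stored at a fixed precision (an assumption).
    L(data | model): ideal code length of residuals under a Gaussian model."""
    coeffs = np.polyfit(xs, ys, degree)
    residuals = ys - np.polyval(coeffs, xs)
    sigma2 = max(float(np.mean(residuals ** 2)), 1e-12)
    model_bits = (degree + 1) * bits_per_param
    data_bits = 0.5 * len(xs) * math.log2(2 * math.pi * math.e * sigma2)
    return model_bits + data_bits

# Synthetic data from a degree-2 polynomial plus noise (seeded for reproducibility).
rng = np.random.default_rng(0)
xs = np.linspace(-1, 1, 200)
ys = 3 * xs ** 2 - xs + rng.normal(0, 0.1, size=xs.shape)

lengths = {d: description_length(d, xs, ys) for d in range(1, 8)}
best = min(lengths, key=lengths.get)
print("degree chosen by shortest message:", best)
```

Underfitting (degree 1) inflates the data part of the message, while overfitting (degree 3 and above) pays for extra parameters without reducing the residual code by as much, so the shortest total message recovers the true degree, which is the Occam's Razor behaviour described above.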