Minimum description length [] provides a measure
of the minimal amount of information necessary to encode some
of the minimal amount of information necessary to encode some
data. Any data can be transformed in a particular way so that it becomes
a sequence of symbols (numbers, or even signals, to be less general). The transition between one symbol and another can be encoded by a
transition table which holds the probabilities of all possible transitions.
For n symbols, up to n² transitions need to be defined. Markov
chains are one such model type and a convenient way of explaining
the nature of MDL. Markov chains of a high order can accommodate
data with more awkward and unpredictable variation. MDL infrequently
defaults to higher-order models (see examples at [WWW-14]), which
are more expressive, although they require a far greater number
of transitions to be specified.
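A first-order transition table of the kind described above can be estimated directly from a symbol sequence by counting successor frequencies. The following is a minimal sketch (the function name and example sequence are illustrative, not taken from the text):

```python
from collections import defaultdict

def transition_table(sequence):
    """Estimate first-order transition probabilities from a symbol sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    # Count how often each symbol is followed by each other symbol.
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    # Normalise the counts into probabilities per current symbol.
    table = {}
    for sym, nexts in counts.items():
        total = sum(nexts.values())
        table[sym] = {n: c / total for n, c in nexts.items()}
    return table

print(transition_table([1, 2, 1, 2, 1, 2]))
# {1: {2: 1.0}, 2: {1: 1.0}} — each symbol is always followed by the other
```

For n distinct symbols this table has at most n² entries, which is exactly the storage cost the paragraph above refers to.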
With proper use of models, as in the case above, data can and should be represented solely by its transitions, and can then essentially be replicated from those transitions. Unless the data is peculiar and shows no patterns, such a description is compact for data of sufficient bulk. MDL attempts to describe the extent to which some data can diminish in bulk (whether with or without loss is a separate issue), or rather the minimal amount of information that needs to be available to describe and reconstruct that data. In most cases, the more uncertainty is present (i.e. the higher the entropy), the greater the minimal description length.
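The link between uncertainty and description length can be made concrete with Shannon entropy, which bounds the average number of bits per symbol needed to encode a sequence. A small sketch (the sequences are illustrative):

```python
import math
from collections import Counter

def entropy(sequence):
    """Shannon entropy, in bits per symbol, of the empirical distribution."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A highly patterned sequence needs fewer bits per symbol than a varied one.
print(entropy("aaaaaaab"))   # low entropy: one symbol dominates
print(entropy("abcdefgh"))   # maximal entropy for 8 symbols: 3 bits
```

Higher entropy means no short code exists, so the minimal description length grows accordingly.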
As an example, here is a vector representation of some arbitrary data:
There is an alternative way of representing
this data. A first-order transition table (e.g.
1,
…) reveals the likelihood, or probability, of transition
from every element to its successor. No meaningful patterns
are observable in this data example. Encoding the transitions
would also be an inefficient approach because of the small vector
size and the low sequential correlation. By sharp contrast, vectors such as
… or
… bear a very small measure of uncertainty. In the second of these, only one transition exists, so it can be represented by a tiny model,
and its entropy is 0.
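That a perfectly regular vector has entropy 0 can be checked by computing the average entropy of its transitions. The sketch below (the function name and example vectors are illustrative) measures how uncertain the next symbol is given the current one:

```python
import math
from collections import defaultdict

def transition_entropy(sequence):
    """Average entropy, in bits, of the next symbol given the current one."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[cur][nxt] += 1
    total_transitions = len(sequence) - 1
    h = 0.0
    for nexts in counts.values():
        n = sum(nexts.values())
        for c in nexts.values():
            p = c / n
            # Weight each symbol's conditional entropy by how often it occurs.
            h -= (n / total_transitions) * p * math.log2(p)
    return h

print(transition_entropy([1, 2, 1, 2, 1, 2, 1, 2]))  # 0.0 — fully deterministic
```

Every symbol in the alternating vector has exactly one possible successor, so the model is tiny and the transition entropy is exactly zero.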
To summarise, MDL is a measure of the minimal amount of information that expresses a sequence. By inspecting transitions it is possible to gain insight into the complexity of some model. A heartbeat pattern, for instance, is rather predictable and repetitive in comparison with the positions of a person's fingers over time. This means that the description length of the heart state should be shorter than that of the hand. Less information is required to capture the behaviour of the heart in motion (heartbeats are not sporadic).
Issues of transitions and the understanding of data patterns will later be explained in a different context. Rather than vectors, images will need to be reconstructed from the least number of bits, and these bits are permitted to permute quantised natural numbers, as the arguments above suggest.