PDF version of this entire document


Minimum Description Length

Broadly speaking, information theory is a science associated with representations and manipulations of information. It is valuable in this context because image analysis often involves the passage and handling of large sets of data. Extraction of meaningful parts of this data is necessary and compression becomes ever more crucial or at least valuable when complexity of data is learned. Quantification of data is deeply linked to the compression and transmission of data. In turn, compression of data is linked to building of models. These models can embody the same data using an alternative representation which is more compact.

The term entropy (also commonly referred to as Shannon's entropy 4.1) is used to denote a general measure of uncertainty. Entropy tells how much information there is in some given data. It is a measure of order and disorder in such data, where this measure indicates just how much is needed to describe it.

In practical terms, Shannon computed [] the optimum codeword length for transmission of a discrete event $\{i\}$ that occurs with a probability $\{p_{i}\}$. In simpler terms, rare events can be assigned longer codewords and events that recur many times assigned common short codewords. In general, the optimum code length required to transmit the occurrence of event $i$ is $-\log p_{i}$. This gives a minimum total message length for transmitting the occurrence of events $\{i_{1},i_{2}...i_{\eta}\}$, where $\eta$ is the number of events considered. In this way, a message whose total length is minimal can be constructed to encode such events. ``Events'' and ``data'' are somewhat interchangeable here. Entropy relates to the observation that the amount of information in some data can be thought of as the number of bits required to transmit this data.

Usually, uncommon values are those which need to be represented, so they will increase the entropy. For some given homogeneous data, the total number of bits required in order for the data to be transferred will be referred to as entropy from now on.

For more complex heterogeneous data, the idea can be extended to that of description length. Minimum description length (MDL) can be used to measure the minimum cost of transmitting data []. MDL is applied to such data 4.2 and there are different parameterisations to choose from, where one needs to find the best description length. MDL uses the Shannon result to identify data which can be compressed and it then shortens the message required to transmit this data.

It is worth noting that the Shannon transmission omits one important element - in effect, the codebook needed to decode the message. Adding in that codebook - or in this case the details of the model - is what warps the problem from the Shannon result to minimum description length. For example, in shape models, the model is what enables shifting from the raw numbers (the shape parameters) to the actual physical shapes, which makes it clearer that there is a process of decoding. It is not possible to know what a set of shape parameters (which is just a set of numbers) means in terms of a physical shape, not without the model anyway.

In the example which is presented here, what gets used is a set of images represented by a model which encodes these images. A model of this data is obtained along with model parameters, but model parameters can change. In this case, the parameter change will affect correspondence. The description of the model used in this case needs to be sent along with the parameters for that model, the data encoded using that model, as well as residuals. The description length is just the message length of these 4 elements combined but measured separately in bits. By quantifying everything like this, MDL can solve the model selection problem as a matter of optimisation.

An important issue to consider is the tradeoff in selection. Simpler models have fewer parameters, so these may be cheaper to transmit, but then, the pdf of the model will not fit the data properly and hence it will cost more.

With all different factors taken into account and quantified, it is reasonable to expect MDL to do the right thing.

Roy Schestowitz 2010-04-05