Model Training


A descriptive statistical model is now available for use and for various analyses. The model is a flexible, deformable entity that can describe any instance of an object or image¹¹ within the range of the training set¹². If the training set were infinite in size, or comprised every instance that the model might be presented with, the model could be considered powerful, fully compatible and flawless.

It is nonetheless not trivial to determine how the model should be deformed to produce a valid appearance instance. The user of the model now faces the inverse problem: the model was generated from existing instances, so how can it be made to generate new ones? In some sense an inverse operation is needed, so that the model can be used in the opposite direction to that in which it was created. In practice, the alteration of the model parameter values needs to be guided by some minimisation (the next section elaborates on this) that obtains the required match. Unfortunately, in a space of such high dimensionality, the minimisation is prohibitively slow unless extra knowledge about the problem is provided in advance.

This problem can be circumvented by learning how the parameters $\mathbf{c}_{i}$ affect the model¹³ with respect to a typical target. Each parameter in $\mathbf{c}_{i}$ has a distinct effect on different aspects of the model, e.g. its size, intensities and so on. By changing the value of each such parameter and recording the change that is perceived in an image (using a pixel-based comparison of some kind), a type of deformation index can be maintained. This index indicates which parameters should be changed and, if so, in what way and to what degree, in order to approach a good overlap between the model and some target image.
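The perturbation experiment described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the toy linear `render` function, the `MEAN` and `MODES` values and the name `build_deformation_index` are all assumptions invented for the example, and an "image" is simply a list of four pixel intensities.

```python
# Toy stand-in for an appearance model (hypothetical values, not from the text):
# an "image" is a flat list of pixel intensities generated linearly from parameters c.
MEAN = [10.0, 10.0, 10.0, 10.0]
MODES = [
    [1.0, 1.0, 0.0, 0.0],   # parameter 0 brightens the first two pixels
    [0.0, 0.0, 2.0, -2.0],  # parameter 1 moves the last two pixels in opposite directions
]


def render(c):
    """Generate pixel intensities for the parameter vector c."""
    image = list(MEAN)
    for value, mode in zip(c, MODES):
        for p, weight in enumerate(mode):
            image[p] += value * weight
    return image


def build_deformation_index(c, target, step=0.5):
    """Perturb one parameter at a time and record the pixel-wise difference
    (model minus target) that each perturbation produces."""
    index = {}
    for k in range(len(c)):
        for delta in (-step, +step):
            perturbed = list(c)
            perturbed[k] += delta
            model_image = render(perturbed)
            index[(k, delta)] = [m - t for m, t in zip(model_image, target)]
    return index


mean_params = [0.0, 0.0]
target = render(mean_params)  # the model in its mean form serves as the target
index = build_deformation_index(mean_params, target)
for key, diff in sorted(index.items()):
    print(key, diff)
```

Each entry maps a (parameter, offset) pair to the difference vector it causes, which is the kind of record from which the direction and size of a corrective change can later be read off.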

More formally, the procedure works as follows:

For the model parameters $\mathbf{c}_{i}$ where , a parameter change $\delta\mathbf{c}$ (in which one or more parameter values are adjusted) is applied to generate a new shape and texture. $\delta\mathbf{c}$ is a vector of the offsets applied to each of the original parameters $\mathbf{c}_{i}$. The pixel-wise difference in intensity¹⁴ is then calculated over the whole image in accordance with:

$\displaystyle \delta\mathbf{I}=\mathbf{I}_{model}-\mathbf{I}_{image}$

(2.6)

to produce a new vector of intensities (the differences). This vector can also be visualised so that the difference is visible to the human eye. A simple scalar measure of the difference is used here, although other measures are possible. The sum of squares of the pixel differences is chosen because squaring gives larger differences a greater effect on the final measure and ensures that only non-negative values are summed, so that differences of opposite sign cannot cancel. For example:

$\displaystyle \delta\mathbf{I}=sumofsquares(\{-1,3,5,2,6,-10,-1\})$

(2.7)

then becomes

$\displaystyle \delta\mathbf{I}=sum(\{1,9,25,4,36,100,1\})=176$

(2.8)

as opposed to

$\displaystyle \delta\mathbf{I}=sum(\{-1,3,5,2,6,-10,-1\})=4.$

(2.9)
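The arithmetic in (2.7) to (2.9) can be checked in a few lines; the variable names are illustrative only.

```python
differences = [-1, 3, 5, 2, 6, -10, -1]

# Squaring removes the sign and emphasises large differences.
sum_of_squares = sum(d * d for d in differences)

# A plain sum lets positive and negative differences cancel.
plain_sum = sum(differences)

print(sum_of_squares)  # 176
print(plain_sum)       # 4
```

The plain sum suggests an almost perfect match even though one pixel differs by 10, which is why it is not used as the measure.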

With this measure of intensity difference recorded, a correlation can be expressed between the parameter change and the difference as it appears in image space, where the model is superimposed on some target. A target image, namely the model in its mean form, is needed here as a basis for comparison. The scalar measure obtained will, however, indicate only the ``goodness'' of the parameter change and not the overall effect that it has on the image. It will therefore not necessarily be obvious which parts of the two entities (model and target) remained similar and which did not¹⁵. An ordered structure such as a vector is hence more useful, as it retains the location of each computed difference value. Unsurprisingly, this consumes far more space. Under the premise that space is more expendable than time, a vector of differences is calculated and the correlation can be formulated as follows:

$\displaystyle \mathbf{c}_{i}\rightarrow\mathbf{c}_{i}+\delta\mathbf{c}\rightarrow\delta\mathbf{I}$

(2.10)
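The reason for keeping the whole difference vector rather than a single scalar can be seen in a small example. The two difference vectors below are hypothetical and the helper names are assumptions made for the illustration.

```python
def sum_of_squares(diff):
    """Scalar measure of mismatch: location information is lost."""
    return sum(d * d for d in diff)


def worst_pixel(diff):
    """Position of the largest absolute difference, available only from the vector."""
    return max(range(len(diff)), key=lambda p: abs(diff[p]))


# Two hypothetical difference vectors over the same four pixels.
diff_left = [3, 0, 0, 0]    # mismatch concentrated at the first pixel
diff_right = [0, 0, 0, -3]  # mismatch concentrated at the last pixel

print(sum_of_squares(diff_left), sum_of_squares(diff_right))  # 9 9
print(worst_pixel(diff_left), worst_pixel(diff_right))        # 0 3
```

Both vectors give the same scalar value, yet they call for different corrections; only the vector form distinguishes them.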

The offset $\delta\mathbf{c}$ applied to the parameters $\mathbf{c}_{i}$ is accompanied by a change in intensity values across the whole image frame. This correlation can now be stored and later accessed through an index; its size is proportional to the image size. The storage is dictated by the following (somewhat artificial) relation:

$\displaystyle \delta\mathbf{c}=\mathbf{A}\delta\mathbf{I}$

(2.11)

where $\mathbf{A}$ is a matrix recording the relation between changes in intensities and the parameter change $\delta\mathbf{c}$. It is analogous to an n-dimensional vector and expresses the change that was discovered off-line. It defines (in a possibly high-dimensional space) a linear relation between the change to the parameters and the change to the intensities, or more precisely the difference image. It can be looked up directly later on, when performing a search, thereby avoiding re-computation of a recurring and almost identical problem.
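The text does not state how $\mathbf{A}$ is estimated. One possibility, assumed here purely for illustration, is to measure by finite differences how the difference image responds to each parameter (a Jacobian $\mathbf{J}$) and to take its least-squares pseudo-inverse, $\mathbf{A}=(\mathbf{J}^{T}\mathbf{J})^{-1}\mathbf{J}^{T}$. The toy two-parameter, four-pixel model and all function names below are hypothetical.

```python
# Hypothetical toy model: four pixels generated linearly from two parameters.
MEAN = [10.0, 10.0, 10.0, 10.0]
MODES = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, -2.0]]


def render(c):
    """Generate pixel intensities for the parameter vector c."""
    return [m + sum(v * mode[p] for v, mode in zip(c, MODES))
            for p, m in enumerate(MEAN)]


def difference(c, target):
    """delta_I = I_model - I_image, as in (2.6)."""
    return [m - t for m, t in zip(render(c), target)]


def estimate_jacobian(c0, target, step=0.5):
    """Central differences; column k approximates d(delta_I)/d(c_k)."""
    columns = []
    for k in range(len(c0)):
        plus, minus = list(c0), list(c0)
        plus[k] += step
        minus[k] -= step
        d_plus, d_minus = difference(plus, target), difference(minus, target)
        columns.append([(a - b) / (2 * step) for a, b in zip(d_plus, d_minus)])
    return columns


def pseudo_inverse_2(columns):
    """A = (J^T J)^-1 J^T for a Jacobian with exactly two columns."""
    j0, j1 = columns
    a = sum(x * x for x in j0)
    b = sum(x * y for x, y in zip(j0, j1))
    d = sum(y * y for y in j1)
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    return [[inv[r][0] * x + inv[r][1] * y for x, y in zip(j0, j1)]
            for r in range(2)]


def apply(A, delta_I):
    """Matrix-vector product A * delta_I."""
    return [sum(w * v for w, v in zip(row, delta_I)) for row in A]


c0 = [0.0, 0.0]
target = render(c0)  # the model in its mean form
A = pseudo_inverse_2(estimate_jacobian(c0, target))

delta_c = [0.3, -0.2]  # a known offset, to be recovered from the image alone
delta_I = difference([c0[0] + delta_c[0], c0[1] + delta_c[1]], target)
print(apply(A, delta_I))  # close to [0.3, -0.2], up to rounding error
```

Because the toy model is exactly linear, the recovered offset matches the applied one; with a real appearance model the relation holds only approximately and over a limited range of offsets.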

The most basic, and perhaps most compact, procedure carries out the steps above for each of the modes of variation, as well as for the basic linear geometric transformations. This can be a laborious process, although the effort depends on the robustness required. As the next stage illustrates, models that are not rich enough will fail to converge in difficult scenarios, a classic example of which is inappropriate initialisation.

The matrix $\mathbf{A}$ holds real-valued numbers (preferably of limited precision, to reduce space requirements and access time). The values in this matrix form a useful map that guides the exploration for good parameter changes; this will be of great use when fitting the model to a target. In practice, such matrices are visualised by showing negative values as dark shades and positive ones as increasingly brighter shades.
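The visualisation convention just described can be sketched as a mapping from matrix values to 8-bit grey levels. The symmetric linear scaling and the name `to_grey_levels` are assumptions made for this example; other mappings would serve equally well.

```python
def to_grey_levels(matrix):
    """Map values linearly so that the most negative possible value (-peak)
    becomes black (0) and the most positive (+peak) becomes white (255)."""
    peak = max(abs(v) for row in matrix for v in row) or 1.0  # avoid division by zero
    return [[round(255 * (v + peak) / (2 * peak)) for v in row] for row in matrix]


# Hypothetical matrix values, chosen only to show the mapping.
A = [[-0.5, 0.0, 0.5],
     [0.25, -0.25, 0.0]]

for row in to_grey_levels(A):
    print(row)
```

Zero entries come out as mid-grey, so regions of the image that do not influence a given parameter are easy to tell apart from those that push it in either direction.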


2004-07-19