Active appearance models are an extension of active shape models, so a brief introduction to shape models is worthwhile. Given a collection of images depicting an object with some innate properties, it is possible to express the visual appearance or shape of that object in a way that discards subtle changes in viewpoint, object position and object size, and is robust to some degree of object deformation. The object appearing in the group of images need not even be the exact same one; it can be any object belonging to one common class. Some variation can be handled reliably by simple transformations, but their capability is inevitably very limited. There are statistical means which allow the encoding of the variability as it is learned throughout a so-called training process. That training process requires little more than an exhaustive inspection of the set of images. However, in order to interpret several images, some simplification steps are required, as the images are expected to be relatively large in practice - certainly large enough to result in an exponential blow-up.

A method is therefore sought which reduces the amount of information required to describe the object of interest and the different forms it can take. In most cases, edge detection is sufficient to capture the regions or points of greatest significance in the image. Such points are often chosen to become what are called landmarks. Landmarks are positions in the image which effectively distinguish one object from another in the set of images. They also have some interesting spatial traits, and landmarks in close proximity can form near-optimal curves (or contours) which together make up shapes. The concatenation of the coordinates of these landmarks can then describe an image (or rather the object in focus) in a concise and useful representation. In 2-D, for $n$ landmarks, a vector of size $2n$ can roughly infer the shape of the object present in an image.
For the naïve pixel-wise representation, not only would space proportional to the full image resolution need to be allocated, but any manipulation of this data would also slow down considerably.
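The size difference between the two representations can be sketched as follows. This is an illustrative example only: the landmark coordinates and the 640x480 image size are hypothetical, not taken from the text.

```python
import numpy as np

# Hypothetical example: 5 landmark points on a 640x480 image,
# chosen purely for illustration.
landmarks = np.array([(120.0, 80.0), (310.0, 75.0), (215.0, 190.0),
                      (150.0, 300.0), (290.0, 310.0)])

# Concatenate the coordinates into a single shape vector of size 2n
# (here in (x1, ..., xn, y1, ..., yn) order; interleaving also works).
shape_vector = np.concatenate([landmarks[:, 0], landmarks[:, 1]])

n = len(landmarks)
pixel_count = 640 * 480  # size of the naive pixel-wise representation

print(f"landmark vector: {2 * n} values; pixel grid: {pixel_count} values")
```

The shape vector carries 10 numbers where the raw pixel grid carries 307,200, which is what makes the statistical analysis that follows tractable.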
With the concise landmark-based representation above set as the convention, and a collection of fair-sized vectors rather than a collection of images, it should be possible to express (in a feasible way) the legal range of each one of the vector components. This in essence establishes the model. It is an entity that can be manipulated to reconstruct all the images (or objects) it originated from, and far beyond that. This model encapsulates the variation which was learned from the data, and it usually improves its performance as more legal examples are viewed and ``fed'' to support some further training. Alteration of the parameters of the model can generate new (unseen) examples, as long as the altered values remain loyal to the legal range learned from the training examples. The vector representation mentioned beforehand can also be viewed as a description of some fixed location in a space of $2n$ dimensions (see illustrative scatter below). This turns out to be a useful analogy, as will be seen later when dimensionality reduction is applied.
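The idea of learning a legal range of parameters and generating new examples within it can be sketched with PCA, the dimensionality-reduction technique the text alludes to. The training data below is synthetic and the +-3 standard deviation bound is the conventional choice in the shape-model literature, assumed here for illustration.

```python
import numpy as np

# Toy training set: 50 hypothetical aligned shape vectors with 2n = 6
# components (three 2-D landmarks each), standing in for real annotations.
rng = np.random.default_rng(0)
mean_true = np.array([10.0, 40.0, 70.0, 20.0, 60.0, 25.0])
X = mean_true + rng.normal(scale=2.0, size=(50, 6))

# PCA via eigendecomposition of the covariance matrix: eigenvectors are
# the modes of variation, eigenvalues are the variances along them.
x_bar = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort modes by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the first t modes, then synthesise a new but still "legal" shape
# by holding each parameter b_i within +-3 standard deviations of the mean.
t = 2
P = eigvecs[:, :t]
b = 3.0 * np.sqrt(eigvals[:t])             # extreme yet admissible parameters
new_shape = x_bar + P @ b
```

Any shape in the training set, and many unseen but plausible ones, can be approximated as the mean shape plus a parameter-weighted sum of the retained modes.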
Shape models are ``statistical information containers'' which can be built from the images once the overlaid landmark points have been identified and recorded. A common grid, whose purpose is to ease collective alignment, is a crucial means of achieving consistency amongst the coordinates of all landmarks. Some further issues concerning the normalisation steps are described more precisely later in this document. A human expert usually performs the annotation, or landmarking, of the images with the aid of computer-assisted tools. In recent years, automatic alternatives have shown great promise, and these extend to 3-D too.
Active appearance models were later developed by Edwards et al. [9,10], and their great advantage was the ability to sample grey-level data from images rather than just points (Stegmann et al. have by now incorporated full colour). They therefore hold information about what an image looks like, rather than just its form as visualised by contours (or surfaces in 3-D). Just as points in the image were earlier chosen, grey-level values (also referred to as intensity or texture) could be systematically extracted from a normalised image and placed in an intensity vector. This normalisation process and the representation of the intensity vector are described later in this section.
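The extraction of an intensity vector can be sketched as below. The image, the object mask and the zero-mean unit-variance photometric normalisation are all illustrative assumptions; the document describes its actual normalisation procedure later in the section.

```python
import numpy as np

# Stand-in grey-level image and a boolean mask marking the region the
# (shape-normalised) object covers; both are hypothetical.
image = np.arange(100, dtype=float).reshape(10, 10)
mask = np.zeros((10, 10), dtype=bool)
mask[2:8, 3:7] = True

# Raw intensity (texture) vector: grey levels sampled inside the region.
g = image[mask]

# A common photometric normalisation, assumed here: zero mean and unit
# variance, so global lighting changes do not masquerade as texture variation.
g_norm = (g - g.mean()) / g.std()
```

After normalisation, two samples of the same object under different global lighting map to nearly the same intensity vector, which is what makes statistical comparison of textures meaningful.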
What enables appearance models to exhibit quite an astonishing graphical resemblance to reality is that, at the later stages, a combined vector is made available. It incorporates both shape and intensity and is aware of how a change in one affects the other. Hence it has a notion of the correlation between the two - a notion that is dependent on the training data and Principal Component Analysis. Although appearance models are not as quick and accurate as shape models, they contain all the information held in shape models and in that sense are a superset of them. Moreover, some techniques have been developed and employed to speed up active appearance models. Tasks such as the matching of an appearance model to some target image are described later in this section and illustrated in .
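The construction of that combined vector can be sketched as follows. The per-example shape and texture parameters are synthetic, and the scalar weight relating the two (written W_s here) is an assumption modelled on the standard appearance-model formulation; it compensates for shape and intensity living in different units.

```python
import numpy as np

# Hypothetical per-example parameters from the two separate PCA models:
# 50 training examples, 4 shape parameters and 8 texture parameters each.
rng = np.random.default_rng(1)
b_s = rng.normal(size=(50, 4))   # shape parameters
b_g = rng.normal(size=(50, 8))   # texture (grey-level) parameters

# W_s: a scalar weight placing shape and intensity on comparable scales
# (assumed here as the ratio of overall standard deviations).
W_s = np.sqrt(b_g.var() / b_s.var())

# Combined vector per example: weighted shape parameters next to texture ones.
b = np.hstack([W_s * b_s, b_g])

# A further PCA (via SVD) on the combined vectors captures the correlation
# between shape and grey levels - the "awareness" the text refers to.
b_bar = b.mean(axis=0)
_, _, Vt = np.linalg.svd(b - b_bar, full_matrices=False)
c = (b - b_bar) @ Vt.T           # combined appearance parameters
```

Varying a single component of c now moves shape and texture together, in the correlated way the training data exhibited, rather than independently.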