
Learning Deformable Templates

The success of structured deformable template matching depends on an accurate description of the shape class: the expected shape instances and their variations. This information, like the prior distribution in a Bayesian framework, can be subjective. Some recent work on shape modeling has focused on actively learning shape models from training samples, influenced by the goals of ``active vision''. To describe a shape class, one has to learn both the ``representative'' shape and the ``variability'' within the class [12,10,11,22,23,31].

Principal component analysis has proved very useful in some computer vision applications because it reduces dimensionality and extracts the most important dimensions, ranked by the amount of variation they explain. As a result, it plays an important role in learning object representations (e.g., eigenfaces). Cootes et al. [12,10,11] adopted this method to learn deformable template models. They proposed ``active shape models'' for templates represented as line drawings. By an ``active shape model'', they mean that instead of handcrafting the parametric form for the shape class, the prototype shape and its deformations are learned from a collection of correctly annotated example shapes. Shapes are represented as polygons. After manually aligning the training set, i.e., establishing the correspondences between the ``landmark points'' (nodes) of training samples of the same class, they calculated the mean position and the variation of each node over the training shapes. The mean shape is used as the generic template of the shape class. A number of modes of variation, i.e., the eigenvectors of the covariance matrix, are then determined to describe the main ways in which the exemplar shapes tend to deform from the generic shape. A small set of linearly independent parameters is used to describe the deformation. In this way, the shape model allows for considerable meaningful variability but remains specific to the class of structures it represents. The major contribution of their work is that the active shape model learns the characteristic pattern of a shape class and deforms in a way that reflects the variations in the training set. The limitations of the approach are its sensitivity to partial occlusion and its inability to handle large changes in scale and orientation.
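The learning step described above (mean shape, covariance matrix, leading eigenvectors, a few deformation parameters) can be illustrated with a minimal NumPy sketch. It is not the implementation of Cootes et al.: the function names `learn_shape_model`, `synthesize` and `demo_shapes`, the array layout, and the toy training set of rectangles of varying width are all assumptions made for illustration, and the shapes are assumed to be already aligned with landmarks in corresponding order.

```python
import numpy as np


def learn_shape_model(shapes, num_modes):
    """Learn a mean shape and its main modes of variation.

    shapes: array of shape (num_examples, num_landmarks, 2), assumed to be
            aligned already, with landmarks listed in corresponding order.
    Returns the mean shape vector, the modes (one per column) and the
    variance explained by each mode, largest first.
    """
    n = shapes.shape[0]
    x = shapes.reshape(n, -1)               # each shape as one 2k-vector
    mean = x.mean(axis=0)                   # the generic template
    cov = np.cov(x, rowvar=False)           # (2k, 2k) covariance of node positions
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:num_modes]
    return mean, eigvecs[:, order], eigvals[order]


def synthesize(mean, modes, b):
    """Deform the generic template with parameter vector b: mean + modes @ b."""
    return (mean + modes @ b).reshape(-1, 2)


def demo_shapes():
    """Hypothetical training set: rectangles of height 2 and varying width."""
    half_widths = np.linspace(0.5, 1.5, 11)
    return np.array([[[-w, -1.0], [w, -1.0], [w, 1.0], [-w, 1.0]]
                     for w in half_widths])


shapes = demo_shapes()
mean, modes, eigvals = learn_shape_model(shapes, num_modes=2)
# Moving along the first mode changes only the width of the rectangle.
# The sign of an eigenvector is arbitrary, so b[0] > 0 may widen or narrow it.
deformed = synthesize(mean, modes, np.array([0.5, 0.0]))
```

In this toy set only the width varies, so a single mode carries essentially all of the variance. This is the sense in which a small number of parameters can describe the deformation of a whole shape class.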

Kervrann and Heitz [22] proposed a deformable model that is very similar to that of Cootes et al. They presented an unsupervised approach to learning the structure and deformation modes of 2D polygonal objects from long image sequences. They combined global and local deformation modes to model a deformable shape. The global mode is the same as in Cootes et al.'s work, i.e., it is modeled by a generic shape plus a global deformation that is a linear combination of the variation modes obtained from principal component analysis. The local deformation, which is considered to carry the additional information from the new image frame, is modeled by a Markov random process over consecutive nodes, which takes into account the interactions between neighboring points. In the training stage, each time a new image frame is processed, the computed local deformations are used to update the global average template and the global deformation modes. They applied this approach to object tracking. However, a good initial template is still required, and the convergence of the sequence is not guaranteed.
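In symbols, the two-level decomposition described above can be sketched as follows. The notation is an assumption made here for illustration and is not taken from [22]: $x$ is the vector of node coordinates, $\bar{x}$ the generic shape, $\phi_i$ the variation modes obtained from principal component analysis, $b_i$ their weights, $m$ the number of retained modes, and $\delta$ the local deformation.

```latex
\[
  x \;=\; \underbrace{\bar{x} + \sum_{i=1}^{m} b_i \,\phi_i}_{\text{global deformation}}
  \;+\; \underbrace{\delta}_{\text{local deformation}}
\]
```

Under this reading, the training stage uses the $\delta$ computed for each new frame to re-estimate $\bar{x}$ and the $\phi_i$.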

Given a set of representative shape examples, Pentland [30] and Pentland and Sclaroff [31,34] proposed a novel shape modeling method using finite element models (FEM). They used 3-D finite element models, which act like lumps of elastic clay, to model 3-D shapes. They derived the modes of vibration of a suitable base shape, such as an ellipsoid, and built up shapes using different modes of variation. The first few modes are the large-scale variations of the shape; the higher-order modes are more localized. A total of 30 modes were used to model human heads. They fitted models to range data by an interactive process, and compared modeled objects using the fitted parameter values. The advantages of these models are that they are easy to construct and allow for a compact parametric representation of a family of shapes. Additionally, a closed-form solution can be obtained for the complex 3-D shape modeling problem. However, this approach does not always lead to a compact description of the variability within a particular class of objects.
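The idea of vibration modes ordered from large-scale to localized can be illustrated on a toy problem. The sketch below is not the 3-D finite element model of Pentland and Sclaroff. It assumes a closed ring of unit masses joined by identical springs, so that the modes are simply the eigenvectors of the stiffness matrix. The function names `ring_stiffness`, `vibration_modes` and `build_displacement` are hypothetical.

```python
import numpy as np


def ring_stiffness(n, k=1.0):
    """Stiffness matrix of n unit masses joined in a closed ring by springs
    of stiffness k (requires n >= 3)."""
    K = 2.0 * k * np.eye(n)
    for i in range(n):
        K[i, (i + 1) % n] -= k   # coupling to the next node
        K[i, (i - 1) % n] -= k   # coupling to the previous node
    return K


def vibration_modes(K):
    """Solve K phi = w^2 phi (unit masses assumed).

    Returns squared frequencies in ascending order and the mode shapes as
    columns, so the first columns are the smoothest, largest-scale modes.
    """
    return np.linalg.eigh(K)


def build_displacement(modes, amplitudes):
    """Displace the base shape using only the first len(amplitudes) modes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return modes[:, :len(amplitudes)] @ amplitudes


K = ring_stiffness(16)
w2, modes = vibration_modes(K)
# Mode 0 has zero frequency: a rigid motion of the whole ring.
# A shape is described compactly by a few amplitudes of the low-order modes.
displacement = build_displacement(modes, [0.0, 0.3, -0.1])
```

For this ring the mode shapes are sampled sinusoids whose spatial frequency grows with the vibration frequency, which mirrors the ordering noted above: low-order modes change the overall shape, while high-order modes add fine detail.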


Bob Fisher
Wed May 5 18:16:24 BST 1999