
Segmentation Based on Analytic Image Transformations

Segmentation using image transformations fits the flow field with analytic functions governed by some number of parameters; the number of parameters used varies widely across the research field. The flow may be assumed constant, linearly varying, and so on, and the image is divided into regions, each conforming internally to its own fit to the model. In top-down segmentation, a fit to the whole image is found first and then recomputed once outliers to this fit have been discarded; each region of outliers is then given its own fit, and so on recursively. In bottom-up segmentation, regions are grown outwards for as long as the model allows. The fit used may be based on a three dimensional world model, usually giving a large number of parameters in the projected two dimensional fit.
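As an illustration of such a parametric fit, the sketch below (hypothetical code, not taken from any of the systems cited here) fits a six-parameter affine model to a sampled flow field by linear least squares; the per-point residuals could then drive a top-down outlier split or bottom-up region growing.

```python
import numpy as np

def fit_affine_flow(points, flows):
    """Least-squares fit of a 6-parameter affine model to sampled flow.

    Each measured flow vector (u, v) at image position (x, y) is modelled as
        u = a0 + a1*x + a2*y
        v = a3 + a4*x + a5*y
    Constant flow is the special case a1 = a2 = a4 = a5 = 0.
    """
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])   # design matrix, shared by u and v
    au, *_ = np.linalg.lstsq(A, flows[:, 0], rcond=None)
    av, *_ = np.linalg.lstsq(A, flows[:, 1], rcond=None)
    return np.concatenate([au, av])                # parameters (a0, ..., a5)

def flow_residuals(points, flows, params):
    """Per-point distance between the measured flow and the fitted model."""
    a0, a1, a2, a3, a4, a5 = params
    x, y = points[:, 0], points[:, 1]
    pred = np.column_stack([a0 + a1 * x + a2 * y,
                            a3 + a4 * x + a5 * y])
    return np.linalg.norm(flows - pred, axis=1)
```

Points with large residuals are the candidates for splitting off into separately fitted regions.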

In [1] Adiv uses flow fields found either from spatiotemporal derivatives or by Lawton's method, described in [56]. The flow field is segmented into regions corresponding to planar surfaces in the environment, and these surfaces are then merged into rigid objects. Object outlines are found, but not very accurately.

Blostein and Huang, in [10] and [11], look at the detection of very small objects (typically one pixel in size) moving with constant image velocity. They track these objects over many frames using a tree-search algorithm. Given this restrictive set of assumptions about the environment, their algorithms successfully found the tracks of points moving in noisy images in which humans could not see them. In [28] Debrunner and Ahuja use this method of tracking feature points to segment image sequences into rigid objects with constant rotation per frame. This is done by taking point trajectories and splitting regions into separately moving sub-regions when the motion estimate (which finds the translation and the fixed rotation) fits the measured motion poorly, i.e. when a region clearly contains two independently moving sub-regions. The algorithm also allows regions to merge, as well as split, if their motion parameters are similar. A problem with this world model (visible in the results) is that static points may be linked with rotating regions if they lie on the image projection of the rotation axis.
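The constant-velocity tracking idea can be sketched as follows. This is an illustrative toy, not Blostein and Huang's algorithm: their method is a tree search with statistical pruning, whereas this sketch simply seeds a velocity hypothesis from every detection pair in the first two frames and extends each hypothesis greedily.

```python
import itertools

def grow_tracks(detections, max_residual=1.0):
    """Grow constant-image-velocity track hypotheses through a frame list.

    detections[t] is a list of (x, y) point detections in frame t.  Each
    detection pair in the first two frames fixes a candidate velocity; the
    hypothesis is then extended frame by frame with whichever detection lies
    closest to the constant-velocity prediction, and abandoned when no
    detection falls within max_residual of that prediction.
    """
    tracks = []
    for p0, p1 in itertools.product(detections[0], detections[1]):
        vel = (p1[0] - p0[0], p1[1] - p0[1])
        track = [p0, p1]
        alive = True
        for frame in detections[2:]:
            pred = (track[-1][0] + vel[0], track[-1][1] + vel[1])
            best = min(frame, default=None,
                       key=lambda p: (p[0] - pred[0])**2 + (p[1] - pred[1])**2)
            if best is None or \
               (best[0] - pred[0])**2 + (best[1] - pred[1])**2 > max_residual**2:
                alive = False
                break
            track.append(best)
        if alive:
            tracks.append(track)
    return tracks
```

Only hypotheses consistent with a fixed image velocity over the whole sequence survive, which is what makes sub-pixel-scale targets recoverable from noise when enough frames are used.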

In [32] Diehl models two dimensional transformations of parabolic world surfaces. (There is no merging of adjacent surfaces with similar three dimensional motion into single objects; for example, two faces of a cube will be treated as two separate objects by the model of parabolic object surfaces.) He finds image motion by warping and differencing: the image transformation of the largest part of the image is estimated and used to warp the image, and those parts of the image which do not fit this transformation are then found. The algorithm is then applied recursively to these smaller parts to find their image motion. This type of approach has problems whenever two or more adjacent image sections at the same hierarchical level have similar area, as the transformation estimation is then poorly conditioned. This could easily occur in outdoor scenes.
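The warp-and-difference step can be sketched as below. This is a simplified illustration that assumes the dominant warp is already known and uses nearest-neighbour sampling; Diehl's method estimates the transformation itself, which is omitted here.

```python
import numpy as np

def outlier_mask(frame0, frame1, warp, threshold=10.0):
    """Flag pixels whose motion does not fit an estimated transformation.

    warp maps a pixel coordinate (x, y) in frame1 back to frame0 under the
    dominant-motion estimate.  Pixels whose warped-and-differenced
    brightness error exceeds threshold are marked as outliers; in a
    hierarchical scheme these outlier regions would then be processed
    recursively with their own transformation estimates.
    """
    h, w = frame1.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            x0, y0 = warp(x, y)
            xi, yi = int(round(x0)), int(round(y0))   # nearest-neighbour sample
            if 0 <= xi < w and 0 <= yi < h:
                mask[y, x] = abs(float(frame1[y, x]) - float(frame0[yi, xi])) > threshold
    return mask
```

When the dominant region is much larger than the rest, the initial estimate is well conditioned and the outlier mask cleanly isolates the remaining motions; the conditioning problem noted above arises precisely when no single region dominates.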

The work of Burt, Bergen et al. uses a similar segmentation approach. In [16] and [15] the flow field is hierarchically segmented into background and moving objects using coarse-to-fine flow estimation and coarse-to-fine segmentation. Again, this works best when moving objects take up a small fraction of the image. The flow is found using correlation, spatiotemporal derivatives or Fourier domain phase shifts (similar in practice to spatiotemporal derivatives). Roughly planar surfaces are assumed, so an affine flow model is used. In [6] two motions in an image are found ``simultaneously''; the method works even when the motions are superimposed across the image (for example semi-transparent motion). In practice the two motions are computed in alternating iterations of the algorithm estimating the affine transformations. The ``second'' motion at any stage is found by taking the current estimate of the first motion, using it to perform image registration (warping one image to match the other according to the motion transformation), and differencing. In [54] Irani et al. use these segmentation methods with temporal integration of the segmentation results to improve performance in cases where different regions at the same hierarchical level have very different sizes. This ``temporal integration'' does not involve any shape tracking or modelling, so information about scene events is not readily available.
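The alternating two-motion idea can be illustrated with a toy version restricted to pure translations (the affine estimation of [6] is replaced here by an exhaustive shift search; the function names, thresholds and iteration count are all illustrative assumptions):

```python
import numpy as np

def two_translations(frame0, frame1, shifts, iterations=5):
    """Alternating estimation of two superimposed translations (a sketch).

    The dominant shift is found over the whole image; the image is then
    registered with it and differenced, and the second shift is estimated
    on the pixels the first motion explains poorly.  The two estimates are
    refined in turn.  np.roll wraps at the borders, which is acceptable
    only for this toy illustration.
    """
    def best_shift(mask):
        errs = {}
        for dx, dy in shifts:
            diff = np.abs(np.roll(frame0, (dy, dx), axis=(0, 1)) - frame1)
            errs[(dx, dy)] = diff[mask].mean()
        return min(errs, key=errs.get)

    s1 = best_shift(np.ones(frame0.shape, dtype=bool))
    s2 = s1
    for _ in range(iterations):
        resid1 = np.abs(np.roll(frame0, (s1[1], s1[0]), axis=(0, 1)) - frame1)
        s2 = best_shift(resid1 > np.median(resid1))   # pixels motion 1 fits badly
        resid2 = np.abs(np.roll(frame0, (s2[1], s2[0]), axis=(0, 1)) - frame1)
        s1 = best_shift(resid2 > np.median(resid2))   # pixels motion 2 fits badly
    return s1, s2
```

Each motion is estimated on exactly the pixels the other fails to explain, which is why the scheme copes with superimposed (semi-transparent) motion where a single segmentation mask would not exist.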

In [113] Torr and Murray assume that the background motion corresponds to an affine image transformation, and segment out parts of the image which have flow inconsistent with this transformation. The transformation is applied directly to the spatiotemporal image brightness derivatives. This method relies on linear variations in the background flow, and therefore becomes better conditioned as the world becomes flatter. Good segmentation results are presented. It is not clear how well the method would work in the problem cases mentioned above.
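The direct use of brightness derivatives can be sketched as follows. This is an illustrative least-squares formulation, not Torr and Murray's exact estimator: substituting the affine flow model into the brightness-constancy constraint gives one linear equation per pixel in the six affine parameters, with no explicit flow field computed first.

```python
import numpy as np

def affine_from_derivatives(Ix, Iy, It, xs, ys):
    """Least-squares affine motion from spatiotemporal brightness derivatives.

    Each pixel contributes one brightness-constancy constraint
        Ix*(a0 + a1*x + a2*y) + Iy*(a3 + a4*x + a5*y) + It = 0,
    so the six affine parameters follow from a single linear least-squares
    solve applied directly to the derivative images.
    """
    A = np.column_stack([Ix, Ix * xs, Ix * ys, Iy, Iy * xs, Iy * ys])
    params, *_ = np.linalg.lstsq(A, -It, rcond=None)
    return params
```

Pixels whose constraint residual is large under the fitted background parameters are the ones segmented out as moving inconsistently with the background.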






© 1997 Stephen M Smith. LaTeX2HTML conversion by Steve Smith (steve@fmrib.ox.ac.uk)