Soumya's Assignment 13

Models for learning spatial interactions in natural images for context based classification. – Sanjiv Kumar (CMU)
The author proposes a non-causal graphical model for capturing the contextual information needed in various image analysis tasks. The proposed model is called the Discriminative Random Field (DRF) and is a 2-dimensional version of the Conditional Random Field, which is in turn an extension of the popular Markov Random Field (MRF). MRFs model spatial interactions under the Bayesian framework by treating label interactions as a global prior and the likelihood as the data-dependent term. Such a formulation is known to perform well in various image labeling and de-noising tasks.
Despite their theoretical elegance, MRFs suffer from a number of drawbacks. Firstly, there is no easy way to model the likelihood. For computational tractability it is often factored as a product of per-site likelihoods, under the assumption that the observation at a site is independent of the other observations given its label. This assumption rarely holds in natural images: observations often follow some underlying structure that spans multiple sites, so the observations at neighboring sites are highly correlated. Secondly, the prior models the label interactions while ignoring how these interactions are affected by the observations in the neighborhood. Finally, labeled data is often hard to come by, which makes it difficult to model the likelihood and the prior accurately. Directly modeling the posterior probability avoids these difficulties and makes the most of the limited training data.
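For reference, the MRF formulation described above can be written out roughly as follows. The symbols S (the set of sites) and y_i (the observation at site i) are notation assumed here for illustration and are not taken from the thesis.

```latex
P(x \mid y) \;\propto\; P(y \mid x)\, P(x),
\qquad
P(y \mid x) \;\approx\; \prod_{i \in S} P(y_i \mid x_i)
```

The second expression is the per-site factorization of the likelihood, and it is this approximation that breaks down when neighboring observations are correlated.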
In light of these drawbacks, the author proposes the Discriminative Random Field, which directly models the posterior probability (the probability of a label configuration given the image) as a random field:
$P(x \mid y) = \frac{1}{Z} \exp\Big( \sum_{i} A(x_i, y) + \sum_{i} \sum_{j \in N_i} I(x_i, x_j, y) \Big)$
where x_i is the label at site i, x_j is the label at site j, N_i is the set of neighbors of site i, y is the set of observations over the whole image, and Z is the normalizing constant. The first term, known as the association potential, and the second term, known as the interaction potential, can both be learnt discriminatively. Note that both potentials depend on the full observation y, not only on the observation at a single site. Furthermore, an arbitrary discriminative classifier may be used to learn these potentials, which allows domain-dependent classifier selection. Though the author suggests logistic regression here, later work has extensively used SVMs.
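To make the formulation concrete, here is a minimal sketch of a two-label DRF on a tiny 2x2 grid, with the posterior computed exactly by enumerating all labelings. It follows the logistic form mentioned above, but the specifics are assumptions made for illustration and are not the thesis's actual features or parameters: the site feature [1, y_i], the pairwise feature [1, -|y_i - y_j|], the weights w and v, and all function names are hypothetical.

```python
import itertools
import numpy as np

def log_sigmoid(t):
    # Numerically stable log(1 / (1 + exp(-t))).
    return -np.logaddexp(0.0, -t)

def association(x_i, y_i, w):
    # A(x_i, y): log-probability of label x_i in {-1, +1} under a logistic
    # model on a hypothetical site feature h_i = [1, y_i].
    h_i = np.array([1.0, y_i])
    return log_sigmoid(x_i * (w @ h_i))

def interaction(x_i, x_j, y_i, y_j, v):
    # I(x_i, x_j, y): data-dependent pairwise term x_i * x_j * v.mu_ij, with a
    # hypothetical pairwise feature mu_ij = [1, -|y_i - y_j|].
    mu_ij = np.array([1.0, -abs(y_i - y_j)])
    return x_i * x_j * (v @ mu_ij)

def grid_neighbors(r, c, rows, cols):
    # 4-connected neighborhood N_i on the image grid.
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc

def drf_log_score(labels, image, w, v):
    # Unnormalized log-posterior: sum_i A + sum_i sum_{j in N_i} I.
    # Each neighboring pair is visited twice (i->j and j->i), as in the double sum.
    rows, cols = image.shape
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += association(labels[r, c], image[r, c], w)
            for nr, nc in grid_neighbors(r, c, rows, cols):
                total += interaction(labels[r, c], labels[nr, nc],
                                     image[r, c], image[nr, nc], v)
    return total

def exact_posterior(image, w, v):
    # Brute-force normalization over all 2^(rows*cols) labelings.
    # Only feasible for toy grids; real images need approximate inference.
    rows, cols = image.shape
    configs = [np.array(bits).reshape(rows, cols)
               for bits in itertools.product((-1, 1), repeat=rows * cols)]
    scores = np.array([drf_log_score(x, image, w, v) for x in configs])
    log_z = np.logaddexp.reduce(scores)  # log of the partition function Z
    return configs, np.exp(scores - log_z)

# Toy observations: bright top row, dark bottom row (made-up values).
image = np.array([[2.0, 1.5], [-1.0, -2.0]])
w = np.array([0.0, 1.0])   # hypothetical association weights
v = np.array([0.5, 0.2])   # hypothetical interaction weights

configs, probs = exact_posterior(image, w, v)
best = configs[int(np.argmax(probs))]
print(best)
print(probs.sum())
```

In this sketch the interaction weight between two sites shrinks, and can change sign, as their observations diverge. That is one simple way to illustrate how a data-dependent interaction differs from a fixed smoothing prior in an MRF.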
Building on this fundamental idea, Kumar further develops DRF theory to handle multiclass classification problems and provides a framework for hierarchical fields that incorporate spatial context at multiple scales. The thesis also goes into detail about the nitty-gritty of the learning and inference procedures.

Kumar compares his work with MRFs and logistic regression on a number of real-world problems, notably the detection of man-made structures in natural images. The DRF formulation is shown to outperform the other alternatives.

Overall, I think this is a very well-written thesis representing a substantial piece of dissertation work. Not only is the proposed model novel, but the development of the surrounding theory to support it is very refreshing. Kumar has refrained from using 'hacks' that merely work, in favor of solid theoretical formulations.

Last modified 27 November 2007 at 10:46 am by Soumya