Soumya's Assignment 14

Image understanding is a high-level vision task concerned with interpreting images and assigning meaning to them. This includes, but is not limited to, determining which real-world objects occur in the image (object recognition), where these objects occur (localization), and the spatio-temporal relationships between them. Semantic image understanding has been an active area of research in the vision community, resulting in a number of paradigms:

Bottom-up strategies: These work by assigning meaning to the results of low-level processing such as region segmentation.
Top-down strategies: These are model-based approaches, in which one knows a priori what kinds of objects are present in the image and has a well-defined model to describe them. The primitive Hough-transform-based detection techniques belong to this category of algorithms.
Hybrid approaches[1]: These try to combine the best of both worlds. They generate multiple data-dependent hypotheses using bottom-up strategies and then prune away the hypotheses that do not fit the predefined models.
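
The hybrid generate-and-prune idea can be illustrated with a minimal sketch. It is not the method of [1] or of this thesis: the binary toy image, the connected-region grouping used as the bottom-up step, and the "filled square" object model (with its hypothetical parameters min_side and fill_ratio) are all assumptions chosen only to keep the example small.

```python
from collections import deque


def connected_regions(grid):
    """Bottom-up step: group foreground pixels (1s) into 4-connected regions."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a region, inclusive."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def fits_square_model(pixels, min_side=2, fill_ratio=0.9):
    """Top-down step: hypothetical model of a filled, square object."""
    top, left, bottom, right = bounding_box(pixels)
    height, width = bottom - top + 1, right - left + 1
    if height != width or height < min_side:
        return False
    return len(pixels) / (height * width) >= fill_ratio


def hybrid_detect(grid):
    """Generate region hypotheses bottom-up, then prune those that violate the model."""
    hypotheses = connected_regions(grid)
    return [bounding_box(h) for h in hypotheses if fits_square_model(h)]


# Toy image (assumed data): two 2x2 squares, one 1x3 bar, one isolated pixel.
image = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
]
detections = hybrid_detect(image)
```

In this toy case the bottom-up step proposes four regions, and the model check keeps only the two squares. The bar and the isolated pixel are pruned because they do not fit the assumed model.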

Though significant progress has been made in the field, a unified framework is still lacking. Numerous papers are published each year developing new algorithms for domain-dependent image understanding. Algorithms that work well in one domain often perform poorly in others. For instance, medical image understanding algorithms seldom perform well on aerial image analysis tasks. Adapting algorithms across domains usually involves a nontrivial amount of work, much of it ad hoc. My thesis addresses this lack of generalization: I investigate whether generalization is possible at all and, if so, under what circumstances. Initial experiments have evaluated generalization capabilities across similar domains by adapting concepts from transfer learning[2]. Another aspect of this research has been to understand which kinds of image features are the most domain independent. The work in this thesis provides a first step towards a unified image understanding framework.

1) Z.W. Tu, X.R. Chen, A.L. Yuille, and S.C. Zhu, "Image parsing: unifying segmentation, detection and recognition", Int'l J. of Computer Vision, 63(2), 113-140, 2005.
2) M.T. Rosenstein, Z. Marx, L.P. Kaelbling, and T.G. Dietterich, "To transfer or not to transfer", NIPS 2005 Workshop on Transfer Learning, Whistler, BC, 2005.