Lee Becker's Assignment 13

Maximum Entropy Models for Natural Language Ambiguity Resolution

by Adwait Ratnaparkhi, University of Pennsylvania

Many problems in natural language processing (NLP) can be thought of as classification problems. A classifier can be viewed as a conditional probability p(a|b), the probability of observing class a given context or evidence b. In natural language processing, the context tends to be related to words in some way, while the set of classes depends on the task.
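To make the p(a|b) view concrete, here is a minimal sketch of a classifier that turns feature weights into a conditional distribution over classes and picks the most probable one. It uses the log-linear (exponential) form that maximum entropy models take. The feature names, the weights, and the sentence-boundary setup are hypothetical values chosen for illustration; they are not taken from the thesis.

```python
import math

def conditional_probs(active_features, classes, weights):
    """Return p(a|b) for every class a, where the context b is summarized
    by its active feature names. Each class is scored by exp(sum of weights)
    and the scores are normalized so that they sum to 1."""
    scores = {
        a: math.exp(sum(weights.get((f, a), 0.0) for f in active_features))
        for a in classes
    }
    z = sum(scores.values())  # normalization constant for this context
    return {a: s / z for a, s in scores.items()}

def classify(active_features, classes, weights):
    """Pick the class with the highest conditional probability."""
    probs = conditional_probs(active_features, classes, weights)
    return max(probs, key=probs.get)

# Hypothetical weights for deciding whether a period ends a sentence.
CLASSES = ["boundary", "not_boundary"]
WEIGHTS = {
    ("prev_is_abbreviation", "not_boundary"): 2.0,
    ("next_is_capitalized", "boundary"): 1.5,
}

probs = conditional_probs(["next_is_capitalized"], CLASSES, WEIGHTS)
label = classify(["prev_is_abbreviation"], CLASSES, WEIGHTS)
```

In a real system the weights would be estimated from training data rather than written by hand.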

This thesis presents an approach to resolving natural language ambiguities using the Principle of Maximum Entropy, which states that the best probability model for the data is the one that maximizes entropy over the set of probability distributions consistent with the evidence. In statistical mechanics, entropy is a measure of the number of distinct “microscopic” states available to a system. In NLP systems, the corresponding quantities are typically probability distributions of various linguistic features (syntactic, semantic, lexical) over a large corpus of text.
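The principle can be illustrated with a toy example. The sketch below assumes a distribution over three outcomes where the evidence fixes only the probability of the first outcome; it then searches over the distributions consistent with that constraint and keeps the one with the highest Shannon entropy. The function names and the grid search are illustrative assumptions, not the estimation procedure used in the thesis.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def max_entropy_with_fixed_first(p0, steps=100):
    """Among distributions (p0, q, 1 - p0 - q) on a grid of q values,
    return the one with the highest entropy."""
    rest = 1.0 - p0
    candidates = [
        (p0, rest * i / steps, rest * (steps - i) / steps)
        for i in range(steps + 1)
    ]
    return max(candidates, key=entropy)

# The evidence says only that the first outcome has probability 0.5.
best = max_entropy_with_fixed_first(0.5)
```

The winning distribution spreads the remaining probability mass evenly over the two unconstrained outcomes, which is the sense in which the maximum entropy model assumes nothing beyond the evidence.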

Ratnaparkhi’s thesis gives a thorough background on machine learning techniques as they are applied to NLP and on the maximum entropy framework. However, the bulk of his thesis is devoted to comparing the maximum entropy approach to other approaches to ambiguity resolution. In particular, he examines five natural language processing problems: 1) Sentence Boundary Detection, 2) Part-of-Speech Tag Assignment, 3) Parsing, 4) Prepositional Phrase Attachment and 5) Text Categorization.

With sentence boundary detection, the maximum entropy framework yielded a system that performed comparably to other state-of-the-art systems that use far more resources. The Part-of-Speech Tagger implemented for this thesis achieved very high accuracy (96.9%) on unseen Wall Street Journal data. Similarly, the models for prepositional phrase attachment and text categorization outperformed other models when tested under identical or similar conditions.

In short, these experiments show that the maximum entropy framework can achieve high accuracy with little effort. However, the main concept to take away from this reading is that the maximum entropy model’s compact nature makes it applicable to many problem domains and portable across many problem sub-domains. Moreover, this approach to statistical modeling is well suited to knowledge-poor feature sets (which are very typical in NLP).

Reading this thesis gave me a better understanding of the breadth and depth required for writing a Ph.D. dissertation. Although it is over eight years old, it still gives a nice introduction to machine learning and NLP, and I now have a better idea of the degree of thoroughness necessary in experimental design and analysis. The biggest testament to the quality of this thesis is the number of NLP algorithms that now make use of the maximum entropy framework.