Thesis title: "Very Sparse Kernel Models: Predicting with few Examples and few Features" by Thomas Strohmann

Machine learning, and building machine learning models in particular, is my favorite research area, so I selected this thesis for this assignment. The thesis focuses on supervised machine learning: the study of automatically building a model from a set of labeled training data. Once built, the model is used on unseen data to predict labels from a set of features.

This thesis proposes a supervised machine learning model that combines "predicting with few examples" and "predicting with few features" into a single framework. With the sparse kernel models the thesis proposes, predictions can be made using both a small number of examples and a small number of features. Such sparse models can have impact in areas like robotics, signal processing, and bioinformatics, and they offer a number of benefits:
1. They require less memory to store and are faster to evaluate.
2. Eliminating noisy data from the model (i.e. spurious features and/or training examples which are outliers) can improve the prediction accuracy.
3. Sparse models are easier to interpret: the user can focus his/her analysis of a problem on the selected features and/or training examples.

The thesis begins by reviewing the two main areas of research related to this topic: sparse basis function models and feature selection. The idea of a sparse model is that, given a dictionary of basis functions f_i, one finds a model that uses only a few basis functions from the dictionary. (When the basis functions are constructed directly from training examples, using few basis functions and using few examples amount to the same thing.) The goal of a sparse kernel method is then to find a model that uses only a few of the f_i while at the same time having a low prediction error.
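To make this concrete, here is a minimal sketch (my own illustration, not the thesis's actual algorithm) of how a kernel model built from training examples makes a prediction. In a sparse model most weights alpha_i are exactly zero, so only the few examples with nonzero weight need to be stored and evaluated:

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian basis function centered at training example z.
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def predict(x, support_examples, alphas, b=0.0, gamma=1.0):
    # f(x) = b + sum_i alpha_i * k(x, x_i); in a sparse model,
    # support_examples holds only the examples with nonzero alpha_i.
    return b + sum(a * rbf_kernel(x, z, gamma)
                   for a, z in zip(alphas, support_examples))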
The main idea of feature selection is to choose a small subset of the available features for predicting the labels of unseen examples; a small illustrative sketch follows the list below. Using only a small subset of features has several benefits:
1. The cost for predicting new examples is reduced.
2. For data sets which contain many spurious features (i.e. features that are unrelated to the label) the prediction accuracy can be improved by removing these features.
3. Feature selection algorithms can help the user to better understand a particular problem domain by indicating which features are most relevant.
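A simple filter method scores each feature independently and keeps the top-scoring ones. The sketch below ranks features by absolute correlation with the label; the names and scoring rule are my own choices for illustration, not the specific filter developed in the thesis:

import numpy as np

def filter_select(X, y, k):
    # Score each feature by |Pearson correlation| with the label y,
    # then return the indices of the k highest-scoring features.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    norms = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    scores = np.abs(Xc.T @ yc) / norms
    return np.argsort(scores)[::-1][:k]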

Sparse models aim to make good predictions with a small number of examples, while feature selection methods aim to make good predictions with a small number of features. The thesis then introduces its main idea: combining these two ideas into a single framework. It starts from a sparse kernel model, the sparse minimax probability machine, which uses few examples but all features. It then extends the idea of sparse models to use few examples as well as few features for making predictions. After that, the thesis introduces kernel functions that can model data locally using only a few features, together with a filter approach for local feature selection. Finally, it presents experimental results for the algorithms developed in the thesis.
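A kernel that models data using only a few features can be pictured as a kernel with per-feature weights, where zero weights switch features off. This is one simple way to realize the idea, assumed here for illustration rather than taken from the thesis's construction:

import numpy as np

def feature_weighted_rbf(x, z, weights, gamma=1.0):
    # Zero entries in weights remove the corresponding features from
    # the comparison, so the kernel "sees" only a few features.
    d = np.asarray(weights) * (np.asarray(x) - np.asarray(z))
    return np.exp(-gamma * np.dot(d, d))

# Example: only features 0 and 3 influence the kernel value.
w = np.array([1.0, 0.0, 0.0, 1.0])
print(feature_weighted_rbf([0.2, 5.0, -3.0, 1.1], [0.3, -9.0, 7.0, 1.0], w))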
The thesis presents its ideas with a very clear structure, and they are adequately supported by the experimental results.
