Article

Active Learning and Basis Selection for Kernel-Based Linear Models: A Bayesian Perspective

Journal

IEEE Transactions on Signal Processing
Volume 58, Issue 5, Pages 2686-2700

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TSP.2010.2042491

Keywords

Active learning; Bayesian models; kernel methods; linear regression and classification; optimal experiments

Abstract

We develop an active learning algorithm for kernel-based linear regression and classification. The proposed greedy algorithm employs a minimum-entropy criterion derived using a Bayesian interpretation of ridge regression. We assume access to a matrix $\Phi \in \mathbb{R}^{N \times N}$, whose $(i,j)$th element is defined by the kernel function $K(\gamma_i, \gamma_j) \in \mathbb{R}$, with the observed data $\gamma_i \in \mathbb{R}^d$. We seek a model $\mathcal{M} : \gamma_i \to y_i$, where $y_i$ is a real-valued response or integer-valued label to which we do not have access a priori. To achieve this goal, a submatrix $\Phi_{I_l, I_b} \in \mathbb{R}^{n \times m}$ is sought that corresponds to the intersection of $n$ rows and $m$ columns of $\Phi$, indexed by the sets $I_l$ and $I_b$, respectively. Typically $m \ll N$ and $n \ll N$. We have two objectives: (i) determine the $m$ columns of $\Phi$, indexed by the set $I_b$, that are the most informative for building a linear model $\mathcal{M} : \Phi_{i, I_b}^T \to y_i$, without any knowledge of $\{y_i\}_{i=1}^N$; and (ii) using active learning, sequentially determine which subset of $n$ elements of $\{y_i\}_{i=1}^N$ should be acquired. Both stopping values, $|I_b| = m$ and $|I_l| = n$, are also inferred from the data. These steps are taken with the goal of minimizing the uncertainty in the model parameters $x$, as measured by the differential entropy of their posterior distribution. The parameter vector $x \in \mathbb{R}^m$, as well as the model bias $\eta \in \mathbb{R}$, is then learned from the resulting problem $y_{I_l} = \Phi_{I_l, I_b} x + \eta \mathbf{1} + \epsilon$. The remaining $N - n$ responses/labels not included in $y_{I_l}$ can be inferred by applying $x$ to the remaining $N - n$ rows of $\Phi_{:, I_b}$. We show experimental results for several regression and classification problems, and compare to other active learning methods.
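Under the Gaussian interpretation of ridge regression invoked here, the entropy criterion has a convenient closed form: with prior $x \sim \mathcal{N}(0, \alpha^{-1} I)$ and noise variance $\sigma^2$, the posterior covariance of $x$ given the labeled rows is $\Sigma = (\alpha I + \sigma^{-2} \Phi_{I_l, I_b}^T \Phi_{I_l, I_b})^{-1}$, and acquiring the label for a row $\phi_i$ changes $\log \det \Sigma$ by $-\log(1 + \sigma^{-2} \phi_i^T \Sigma \phi_i)$. The sketch below illustrates only the greedy label-acquisition step (objective ii) under these assumptions, with a fixed basis set, fixed hyperparameters, and the bias $\eta$ and data-inferred stopping rules omitted; it is not the authors' full algorithm, and the function name and parameter defaults are illustrative.

```python
import numpy as np

def greedy_active_selection(Phi, n, alpha=1.0, sigma2=1.0):
    """Greedily pick n rows of Phi (samples to label) by minimum posterior entropy.

    Minimal sketch: Bayesian ridge regression with prior x ~ N(0, alpha^-1 I)
    and noise variance sigma2. Selecting the row phi_i that maximizes
    phi_i^T Sigma phi_i yields the largest decrease in the posterior
    differential entropy, since each new observation changes log det Sigma
    by -log(1 + phi_i^T Sigma phi_i / sigma2).
    """
    N, m = Phi.shape
    Sigma = np.eye(m) / alpha          # prior covariance of x
    unlabeled = list(range(N))
    selected = []
    for _ in range(n):
        # Predictive variances phi_i^T Sigma phi_i for all remaining candidates.
        cand = Phi[unlabeled]
        gains = np.einsum('ij,jk,ik->i', cand, Sigma, cand)
        i = unlabeled[int(np.argmax(gains))]
        selected.append(i)
        unlabeled.remove(i)
        # Rank-one posterior covariance update (Sherman-Morrison):
        # Sigma <- Sigma - (Sigma phi)(Sigma phi)^T / (sigma2 + phi^T Sigma phi)
        phi = Phi[i]
        Sv = Sigma @ phi
        Sigma -= np.outer(Sv, Sv) / (sigma2 + phi @ Sv)
    return selected
```

For example, given a kernel matrix Phi and a previously chosen basis index set I_b, a call such as greedy_active_selection(Phi[:, I_b], n=20) would return the indices of the 20 samples whose labels most reduce the posterior differential entropy under this simplified model.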

