Active Learning Book

Synthesis Lectures on Artificial Intelligence and Machine Learning

Morgan & Claypool Publishers, June 2012, 114 pages

Burr Settles

Carnegie Mellon University

Online supplementary materials coming soon.

Abstract

The key idea behind active learning is that a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns. An active learner may pose "queries," usually in the form of unlabeled data instances to be labeled by an "oracle" (e.g., a human annotator) that already understands the nature of the problem. This sort of approach is well-motivated in many modern machine learning and data mining applications, where unlabeled data may be abundant or easy to come by, but training labels are difficult, time-consuming, or expensive to obtain.

This book is a general introduction to active learning. It outlines several scenarios in which queries might be formulated, and details many query selection algorithms, which are organized into four broad categories, or "query selection frameworks." We also touch on some of the theoretical foundations of active learning, and conclude with an overview of the strengths and weaknesses of these approaches in practice, including a summary of ongoing work to address the open challenges and opportunities.
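
As a rough illustration of the query loop described in the abstract, the sketch below runs pool-based uncertainty sampling on a toy one-dimensional problem. It is not taken from the book: the function names (`oracle`, `fit_threshold`, `active_learn`), the hidden threshold 0.62, the pool size, and the query budget are all assumptions chosen for illustration. The learner repeatedly queries the unlabeled instance closest to its current decision boundary, asks the oracle for the label, and refits.

```python
import random

TRUE_THRESHOLD = 0.62  # hypothetical ground truth, hidden from the learner


def oracle(x):
    """Stand-in for a human annotator: returns the true label of x."""
    return int(x >= TRUE_THRESHOLD)


def fit_threshold(labeled):
    """Fit a 1-D threshold classifier from (x, y) pairs.

    The boundary is the midpoint between the largest labeled negative
    and the smallest labeled positive (0.0 and 1.0 if none seen yet).
    """
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2.0


def active_learn(pool, budget):
    """Pool-based uncertainty sampling with a fixed query budget."""
    pool = list(pool)          # unlabeled instances
    labeled = []               # (instance, label) pairs obtained so far
    threshold = fit_threshold(labeled)
    for _ in range(budget):
        if not pool:
            break
        # The least certain instance is the one nearest the boundary.
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))   # pose the query
        threshold = fit_threshold(labeled)       # retrain on new label
    return threshold, labeled


rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
threshold, labeled = active_learn(pool, budget=15)
print(f"labels used: {len(labeled)}, learned threshold: {threshold:.3f}")
```

In this toy setting the learner behaves much like a binary search over the pool, so a handful of labels is enough to localize the boundary, whereas labeling instances at random would typically need many more. Real problems are rarely this clean; the example is only meant to make the "choose what to label" idea concrete.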

Table of Contents: Automating Inquiry / Uncertainty Sampling / Searching Through the Hypothesis Space / Minimizing Expected Error and Variance / Exploiting Structure in Data / Theory / Practical Considerations

Active Learning Literature Survey. This book is partially based on a popular unpublished literature survey, the contents of which are subsumed by and expanded on in the book. For historical interest, here are archival versions of that survey: 26-jan-2010, 09-jan-2009