Some Mathematical Tools for Machine Learning

Video Lectures (by Chris Burges from Microsoft Research)



Lectures contain:

1. Lagrange multipliers: an indirect approach can be easier; Multiple Equality Constraints; Multiple Inequality Constraints; Two points on a d-sphere; The Largest Parallelogram; Resource allocation; A convex combination of numbers is maximized by choosing the largest; The Isoperimetric problem; For fixed mean and variance, which univariate distribution has maximum entropy?; An exact solution for an SVM living on a simplex

2. Notes on some Basic Statistics: Probabilities can be Counter-Intuitive (Simpson's paradox; the Monty Hall puzzle); IID-ness: Measurement Error decreases as 1/sqrt(n); Correlation versus Independence; The Ubiquitous Gaussian: Product of Gaussians is Gaussian; Convolution of two Gaussians is a Gaussian; Projection of a Gaussian is a Gaussian; Sum of Gaussian random variables is a Gaussian random variable; Uncorrelated Gaussian variables are also independent; Maximum Likelihood Estimates for mean and covariance (prove required matrix identities); Aside: For 1-dim Laplacian, max. likelihood gives the median; Using cumulative distributions to derive densities

3. Principal Component Analysis and Generalizations: Ordering by Variance; Does Grouping Change Things? PCA Decorrelates the Samples; PCA gives Reconstruction with Minimal Mean Squared Error; PCA preserves Mutual Information on Gaussian data; PCA directions lie in the span of the data; PCA: second order moments only; The Generalized Rayleigh Quotient; Non-orthogonal principal directions; OPCA; Fisher Linear Discriminant; Multiple Discriminant Analysis

4. Elements of Functional Analysis: High Dimensional Spaces; Is Winning Transitive?; Most of the Volume is Near the Surface: Cubes; Spheres in n-dimensions; Banach Spaces, Hilbert Spaces, Compactness; Norms; Useful Inequalities (Minkowski and Holder); Vector Norms; Matrix Norms; The Hamming Norm; L1, L2, L_infty norms - is L0 a norm?

Example: Using a Norm as a Constraint in Kernel Algorithms



These are lectures on some fundamental mathematics underlying many approaches and algorithms in machine learning. They are not about particular learning algorithms; they are about the basic concepts and tools upon which such algorithms are built. Often students feel intimidated by such material: there is a vast amount of "classical mathematics", and it can be hard to find the wood for the trees. The main topics of these lectures are Lagrange multipliers, functional analysis, some notes on matrix analysis, and convex optimization. I've concentrated on things that are often not dwelt on in typical CS coursework. Lots of examples are given; if it's green, it's a puzzle for the student to think about. These lectures are far from complete: perhaps the most significant omissions are probability theory, statistics for learning, information theory, and graph theory. I hope eventually to turn all this into a series of short tutorials.
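One of the topics in the outline above, "Most of the Volume is Near the Surface", can be checked with a few lines of arithmetic. The sketch below is an illustration only, not taken from the lectures: it assumes a ball whose volume scales as radius to the power of the dimension, and the function name `shell_fraction` is hypothetical.

```python
def shell_fraction(dim: int, eps: float) -> float:
    """Fraction of a dim-dimensional ball's volume that lies within a
    relative distance eps of the surface (eps is a fraction of the radius)."""
    if dim < 1 or not 0.0 <= eps <= 1.0:
        raise ValueError("need dim >= 1 and 0 <= eps <= 1")
    # Volume scales as radius**dim, so the inner ball of radius (1 - eps)
    # holds a fraction (1 - eps)**dim of the total volume.
    return 1.0 - (1.0 - eps) ** dim


if __name__ == "__main__":
    # The outer 1% shell holds almost nothing in low dimensions
    # and almost everything in high dimensions.
    for dim in (1, 2, 10, 100, 1000):
        print(dim, shell_fraction(dim, 0.01))
```

The same formula applies to a cube if eps is measured as a fraction of the half-side length.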

Introduction to Learning Theory

Video Lectures (by Olivier Bousquet from Max Planck Institute for Biological Cybernetics)

Description:

The goal of this course is to introduce the key concepts of learning theory. It will not be restricted to Statistical Learning Theory but will mainly focus on statistical aspects. Instead of giving detailed proofs and precise statements, this course will aim to provide conceptual tools and ideas that are useful for practitioners as well as for theoretically driven people.

An Introduction to Pattern Classification

Video Lectures (by Elad Yom Tov from Technion)

Lectures contain:

Pattern classification algorithms, classification procedures, supervised learning, unsupervised learning, classifier and preprocessing algorithms, errors, classifier and computational complexity, dimensionality reduction, approaches for dimensionality reduction: feature reduction, feature selection; genetic programming, whitening transform, nearest neighbor editing algorithm, Voronoi diagram, clusters, clustering techniques: agglomerative, partitional, minimum spanning tree, aghc, Kohonen maps, k-means, fuzzy k-means, competitive learning; Bayes rule, heuristic algorithms, tree based algorithms, optimization algorithms, neural networks, training methods, perceptrons, radial-basis function networks, support-vector machines, error estimation methods.

Statistical Learning Theory

Video Lectures (by Olivier Bousquet from Max Planck Institute for Biological Cybernetics)





Description:

This course will give a detailed introduction to learning theory with a focus on the classification problem. It will be shown how to obtain (probabilistic) bounds on the generalization error for certain types of algorithms. The main themes will be:



probabilistic inequalities and concentration inequalities

union bounds and chaining

measuring the size of a function class

Vapnik-Chervonenkis dimension

shattering dimension and Rademacher averages

classification with real-valued functions

Some knowledge of probability theory would be helpful but not required since the main tools will be introduced.

Stochastic Learning

Video Lectures (by Léon Bottou from NEC Research)



These lectures contain:

Early learning systems, recursive adaptive algorithms, risks, batch gradient descent, stochastic gradient descent, non-differentiable loss functions, Rosenblatt's perceptrons, k-means, vector quantization, stochastic noise, multilayer networks
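As a rough illustration of the stochastic gradient descent topic listed above, here is a minimal sketch that fits a line by updating the parameters after every single example instead of after a full pass over the data. It is not code from the lectures; the function names, the squared-error loss, the learning rate, and the synthetic data are all assumptions chosen for the example.

```python
import random


def make_line_data(n=200, slope=2.0, intercept=1.0, seed=1):
    """Hypothetical noise-free data on the line y = slope * x + intercept."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + intercept) for x in xs]


def sgd_linear_fit(data, lr=0.1, epochs=50, seed=0):
    """Fit y ~ w * x + b with per-example (stochastic) gradient steps
    on the squared error 0.5 * (w * x + b - y) ** 2."""
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)          # visit the examples in random order
        for x, y in samples:
            err = (w * x + b) - y     # derivative of the loss w.r.t. the prediction
            w -= lr * err * x         # gradient step for the slope
            b -= lr * err             # gradient step for the intercept
    return w, b


if __name__ == "__main__":
    print(sgd_linear_fit(make_line_data()))
```

Batch gradient descent would instead average the gradient over all examples before each update; the per-example version trades noisier steps for far cheaper ones.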

Bayesian Learning

Video Lectures (by Zoubin Ghahramani from University College London)

Description of video course:

Bayes Rule provides a simple and powerful framework for machine learning. This tutorial will be organised as follows:

1. The lecturer will give motivation for the Bayesian framework from the point of view of rational coherent inference, and highlight the important role of the marginal likelihood in Bayesian Occam's Razor.

2. He will discuss the question of how one should choose a sensible prior. When Bayesian methods fail it is often because no thought has gone into choosing a reasonable prior.

3. Bayesian inference usually involves solving high dimensional integrals and sums. He will give an overview of numerical approximation techniques (e.g. Laplace, BIC, variational bounds, MCMC, EP...).

4. He will talk about more recent work in non-parametric Bayesian inference such as Gaussian processes (i.e. Bayesian kernel "machines"), Dirichlet process mixtures, etc.

Learning on Structured Data

Video Lectures (by Yasemin Altun from TTI)

Lectures description:

The discriminative learning framework is one of the most successful fields of machine learning. The methods of this paradigm, such as Boosting and Support Vector Machines, have significantly advanced the state of the art for classification by improving the accuracy and by increasing the applicability of machine learning methods. One of the key benefits of these methods is their ability to learn efficiently in high dimensional feature spaces, either by the use of implicit data representations via kernels or by explicit feature induction. However, traditionally these methods do not exploit dependencies between class labels where more than one label is predicted. Many real-world classification problems involve sequential, temporal or structural dependencies between multiple labels. We will investigate recent research on generalizing discriminative methods to learning in structured domains. These techniques combine the efficiency of dynamic programming methods with the advantages of state-of-the-art learning methods.

Information Retrieval and Text Mining

Video Lectures (by Thomas Hofmann from Brown University)

Description:

This four-hour course will provide an overview of applications of machine learning and statistics to problems in information retrieval and text mining. More specifically, it will cover tasks like document categorization, concept-based information retrieval, question-answering, topic detection and document clustering, information extraction, and recommender systems. The emphasis is on showing how machine learning techniques can help to automatically organize content and to provide efficient access to information in textual form.

Foundations of Learning

Video Lectures (by Steve Smale from University of California)

An introduction to grammars and parsing

Video Lecture (by Mark Johnson from Brown Laboratory for Linguistic Information Processing)

Video Lecture contains:

Computational linguistics, syntactic and semantic structure, context-free grammars (CFGs) and their derivations, probabilistic CFGs (PCFGs), dynamic programming, expectation maximization, the EM algorithm for PCFGs, top-down parsing, bottom-up parsing, left-corner parsing.

Information Geometry

Video Lectures (by Sanjoy Dasgupta from University of California)

Description:

This tutorial will focus on entropy, exponential families, and information projection. We'll start by seeing the sense in which entropy is the only reasonable definition of randomness. We will then use entropy to motivate exponential families of distributions — which include the ubiquitous Gaussian, Poisson, and Binomial distributions, but also very general graphical models. The task of fitting such a distribution to data is a convex optimization problem with a geometric interpretation as an "information projection": the projection of a prior distribution onto a linear subspace (defined by the data) so as to minimize a particular information-theoretic distance measure. This projection operation, which is more familiar in other guises, is a core optimization task in machine learning and statistics. We'll study the geometry of this problem and discuss two popular iterative algorithms for it.
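For readers who want the objects named in this description in symbols, the following is a standard textbook formulation, not a transcription of the tutorial. The notation is assumed for illustration: \pi is the prior, T is a feature map, \theta is the natural parameter, and x_1, ..., x_n are the observed data.

```latex
% Shannon entropy of a discrete distribution p
H(p) = -\sum_{x} p(x) \log p(x)

% Exponential family built on a base (prior) distribution \pi
p_\theta(x) = \pi(x) \exp\bigl(\theta^\top T(x) - A(\theta)\bigr),
\qquad
A(\theta) = \log \sum_{x} \pi(x) \exp\bigl(\theta^\top T(x)\bigr)

% Information projection of \pi onto the linear family defined by the data
p^\ast = \arg\min_{p} \mathrm{KL}(p \,\|\, \pi)
\quad \text{subject to} \quad
\mathbb{E}_{p}[T(x)] = \frac{1}{n} \sum_{i=1}^{n} T(x_i)
```

When the constraints are feasible, the minimizer p^\ast has the exponential-family form above, which is the link between maximum-likelihood fitting and the "information projection" picture.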

Tutorial on Machine Learning Reductions

Video Lectures (by John Langford from Yahoo Research)



Tutorial description:

There are several different learning problems commonly encountered in real world applications, such as 'importance weighted classification', 'cost sensitive classification', 'reinforcement learning', 'regression' and others. Many of these problems can be related to each other by simple machines (reductions) that transform problems of one type into problems of another type. Finding a reduction from your problem to a more common problem allows the reuse of simple learning algorithms to solve relatively complex problems. It also induces an organization on learning problems: problems that can be easily reduced to each other are 'nearby', and problems that cannot be so reduced are not close.
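To make the idea of a reduction concrete, here is a minimal sketch of one known reduction from importance weighted classification to ordinary classification: keep each example with probability proportional to its weight, then hand the unweighted sample to any standard classifier. Whether the tutorial presents this particular reduction is an assumption, and the function name and data layout are hypothetical.

```python
import random


def rejection_sample(weighted_examples, seed=0):
    """Turn (features, label, weight) triples into unweighted (features, label)
    pairs by keeping each example with probability weight / max_weight."""
    rng = random.Random(seed)
    max_w = max((w for _, _, w in weighted_examples), default=0.0)
    if max_w <= 0.0:
        return []  # nothing carries positive importance
    return [
        (x, y)
        for x, y, w in weighted_examples
        if rng.random() < w / max_w  # accept with probability w / max_w
    ]


if __name__ == "__main__":
    data = [((0.1,), 1, 5.0), ((0.7,), 0, 1.0), ((0.9,), 1, 2.5)]
    # The result can be passed to any ordinary (unweighted) classifier.
    print(rejection_sample(data))
```

The appeal of this style of reduction is that the base learner needs no modification; repeating the sampling with different seeds and averaging the resulting classifiers is a common way to reduce the variance that the sampling introduces.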

Online Learning and Game Theory

Video Lectures (by Adam Kalai from Toyota Technological Institute)

Description:

We consider online learning and its relationship to game theory. In an online decision-making problem, as in Singer's lecture, one typically makes a sequence of decisions and receives feedback immediately after making each decision. As far back as the 1950s, game theorists gave algorithms for these problems with strong regret guarantees. Without making statistical assumptions, these algorithms were guaranteed to perform nearly as well as the best single decision, where the best is chosen with the benefit of hindsight. We discuss applications of these algorithms to complex learning problems where one receives very little feedback. Examples include online routing, online portfolio selection, online advertising, and online data structures. We also discuss applications to learning Nash equilibria in zero-sum games and learning correlated equilibria in general two-player games.
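The regret guarantee described above can be illustrated with the exponential weights ("Hedge") algorithm, a standard method of this kind. This is a sketch under assumptions, not the lecturer's code: it uses the full-feedback setting in which the loss of every decision is revealed each round, losses are assumed to lie in [0, 1], and the function name `hedge` is hypothetical.

```python
import math
import random


def hedge(loss_rounds, eta):
    """Run exponential weights over a sequence of loss vectors.

    loss_rounds: list of rounds, each a list with one loss in [0, 1] per decision.
    Returns (expected cumulative loss of the algorithm,
             cumulative loss of the best single decision in hindsight).
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    cumulative = [0.0] * n
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]  # play decision i with probability probs[i]
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
            weights[i] *= math.exp(-eta * l)  # down-weight decisions that did badly
    return expected_loss, min(cumulative)


if __name__ == "__main__":
    rng = random.Random(0)
    rounds, decisions = 500, 4
    losses = [[rng.random() for _ in range(decisions)] for _ in range(rounds)]
    eta = math.sqrt(8 * math.log(decisions) / rounds)
    alg, best = hedge(losses, eta)
    print("regret:", alg - best)
    print("bound :", math.sqrt(rounds * math.log(decisions) / 2))
```

With this choice of `eta`, the standard analysis bounds the regret by sqrt(T ln N / 2) for T rounds and N decisions, for any loss sequence and with no statistical assumptions. The very-little-feedback problems mentioned in the description need further ideas beyond this sketch.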

On the Borders of Statistics and Computer Science

Video Lectures (by Peter Bickel from UC Berkeley)



Description:

Machine learning in computer science and prediction and classification in statistics are essentially equivalent fields. I will try to illustrate the relation between theory and practice in this huge area by a few examples and results. In particular I will try to address an apparent puzzle: Worst case analyses, using empirical process theory, seem to suggest that even for moderate data dimension and reasonable sample sizes good prediction (supervised learning) should be very difficult. On the other hand, practice seems to indicate that even when the number of dimensions is very much higher than the number of observations, we can often do very well. We also discuss a new method of dimension estimation and some features of cross validation.

Decision Maps

Video Lecture (by Boaz Nadler from Toyota Technological Institute)

Measures of Statistical Dependence

Video Lectures (by Arthur Gretton from Max Planck Institute for Biological Cybernetics)

Description:

A number of important problems in signal processing depend on measures of statistical dependence. For instance, this dependence is minimised in the context of instantaneous ICA, in which linearly mixed signals are separated using their (assumed) pairwise independence from each other. A number of methods have been proposed to measure this dependence; however, they generally assume a particular parametric model for the densities generating the observations. Recent work suggests that kernel methods may be used to find estimates that adapt according to the signals they compare. These methods are currently being refined, both to yield greater accuracy and to permit the use of the signal properties over time in improving signal separability. In addition, these methods can be applied in cases where the statistical dependence between observations must be maximised, which is true for certain classes of clustering algorithms.

Anti-Learning

Video Lectures (by Adam Kowalczyk from National ICT)

Description:

The biological domain poses new challenges for statistical learning. In the talk we shall analyze and theoretically explain some counter-intuitive experimental and theoretical findings that systematic reversal of classifier decisions can occur when switching from training to independent test data (the phenomenon of anti-learning). We demonstrate this on both natural and synthetic data and show that it is distinct from overfitting. The natural datasets discussed will include: prediction of response to chemo-radio-therapy for esophageal cancer from gene expression (measured by cDNA-microarrays); prediction of genes affecting the aryl hydrocarbon receptor pathway in yeast. The main synthetic classification problem will be the approximation of samples drawn from high dimensional distributions, for which a theoretical explanation will be outlined.

Brain Computer Interfaces

Video Lectures (by Klaus-Robert Müller from Fraunhofer FIRST)

Description:

Brain Computer Interfacing (BCI) aims at making use of brain signals for e.g. the control of objects, spelling, gaming and so on. This tutorial will first provide a brief overview of current BCI research activities and provide details of recent developments in both invasive and non-invasive BCI systems. In a second part -- taking a physiologist's point of view -- the necessary neurological/neurophysical background is provided and medical applications are discussed. The third part -- now from a machine learning and signal processing perspective -- shows the wealth, the complexity and the difficulties of the data available, a truly enormous challenge. In real time, a multivariate, heavily noise-contaminated data stream has to be processed and classified. The main emphasis of this part of the tutorial is placed on feature extraction/selection and preprocessing, which includes, among other techniques, CSP and ICA methods. Finally, I report in more detail on the Berlin Brain Computer Interface (BBCI), which is based on EEG signals, and take the audience all the way from the measured signal, through the preprocessing, filtering and classification, to the respective application. BCI communication is discussed in a clinical setting and for gaming.

Introduction to Kernel Methods

Video Lecture (by Mikhail Belkin from University of Chicago)

Lecture contains:

Kernel-based algorithms, regression / classification, regularization, RKHS, representer theorem, RLS algorithm, SVMs, feature map.


I found some great Machine Learning and Artificial Intelligence (AI) video lecture courses recently and I will share them with you this month. Here are lectures from Machine Learning Summer School 2003, 2005 and 2006. Machine Learning is a foundational discipline of the Information Sciences. It combines deep theory from areas as diverse as Statistics, Mathematics, Engineering, and Information Technology with many practical and relevant real life applications.
