
Last Updated on September 5, 2016

In November 2014, Bernhard Scholkopf was awarded the Milner Award by the Royal Society for his contributions to machine learning.

In accepting the award, he gave a layman’s presentation of his work on statistical and causal machine learning methods titled “Statistical and causal approaches to machine learning“.

It’s an excellent one hour talk and I highly recommend that you watch it.

Statistical Learning

On the statistical side, Scholkopf talks about empirical inference and generalisation.

An interesting point he makes early on concerns hard inference problems, which motivate his work on kernel machines.

Specifically, he references the problem of classifying DNA sequences from locations as mentioned in Sonnenburg, et al. 2008 titled “Large Scale Multiple Kernel Learning“. In the paper, the authors show that algorithm performance increases as a function of the amount of data available.

He calls this a paradigm changing fact and characterizes these hard inference problems as having:

High dimensionality

Complex regularities

Little prior knowledge

Requiring “big data” sets

He finishes this part of the talk on statistical learning by describing three key contributions of kernel methods:

Formalizes the notion of similarity

Induces a linear representation of the data in a vector space, no matter where the original data comes from

Encodes the function class used for learning; solutions of kernel algorithms can be expressed as kernel expansions
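These three points can be made concrete with a short sketch (my own illustration, not code from the talk, assuming a Gaussian RBF kernel): the kernel formalizes similarity between points, the Gram matrix gives a linear vector-space representation of the data, and a learned function takes the form of a kernel expansion f(x) = Σᵢ αᵢ k(xᵢ, x).

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2): formalizes similarity between points
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=-1))

# Toy data: three 1-D points. The Gram matrix K gives a linear
# representation of the data, regardless of where the data comes from.
X = np.array([[0.0], [1.0], [2.0]])
K = rbf_kernel(X, X)

# A kernel expansion: f(x) = sum_i alpha_i * k(x_i, x),
# the form taken by solutions of kernel algorithms.
alpha = np.array([1.0, -0.5, 0.25])

def f(x):
    return rbf_kernel(X, x).T @ alpha
```

The coefficients `alpha` here are arbitrary placeholders; in practice they would be fit by an algorithm such as an SVM or kernel ridge regression.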

Causal Learning

The second part of the talk covers Scholkopf’s work on causal modeling.

He describes causality, graphical models of causality and how one may infer a causal model from data.

Specifically, he touches on two new approaches to the problem of inferring a causal model:

Separating out the cause from the mechanism (independence of noise and functions)

Restricting the functional model
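A well-known instance of restricting the functional model is the additive noise model, where the effect is a function of the cause plus independent noise. A minimal numpy sketch (my own illustration, not code from the talk) shows the asymmetry this creates: regressing effect on cause leaves residuals that look independent of the input, while regressing in the wrong direction leaves structured residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-2, 2, n)
noise = rng.uniform(-0.2, 0.2, n)
y = x + 0.5 * x ** 3 + noise  # additive noise model: cause x, effect y

def dependence(regressor, resid, bins=20):
    # Crude independence check: how much the residual mean
    # drifts across quantile bins of the regressor.
    edges = np.quantile(regressor, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.searchsorted(edges, regressor) - 1, 0, bins - 1)
    means = np.array([resid[idx == b].mean() for b in range(bins)])
    return means.std()

# Forward direction: regress effect on cause with a cubic polynomial.
resid_fwd = y - np.polyval(np.polyfit(x, y, 3), x)
dep_fwd = dependence(x, resid_fwd)

# Backward direction: regress cause on effect the same way.
resid_bwd = x - np.polyval(np.polyfit(y, x, 3), y)
dep_bwd = dependence(y, resid_bwd)
```

Under this setup the backward residuals show markedly more structure (`dep_bwd > dep_fwd`), which is the signal such methods exploit to prefer the forward causal direction. A real method would use a proper independence test (e.g. HSIC) rather than this binned proxy.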

The most interesting part of this discussion for me was when he touches on his work on viewing semi-supervised learning through the lens of a causal model. This was drawn from his work in “On causal and anticausal learning“, 2012.

He describes two examples:

Example 1: Predicting proteins from mRNA sequences. Here X (mRNA) causes Y (protein), so it is a causal problem.

Example 2: Predicting class membership from a handwritten digit. Here X (class membership) causes Y (handwritten digit), so it is an anticausal problem.

The key finding is that modeling P(X) with extra unlabeled data does not help in the first problem, because P(X) is assumed independent of the mechanism P(Y|X). But in the second case, modeling P(Y) with extra unlabeled data can help, because P(Y) and P(X|Y) are dependent.
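A toy numpy sketch (my own, not from the talk) illustrates why unlabeled data helps in the anticausal case: when a hidden class (the cause) generates the observations (the effect), the unlabeled observations alone reveal the mixture structure of the input distribution, and that structure recovers the decision boundary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
c = rng.integers(0, 2, n)                          # hidden class: the cause
v = rng.normal(np.where(c == 1, 2.0, -2.0), 1.0)   # observed input: the effect

# Unlabeled inputs alone expose the two class clusters in P(input).
# A tiny 1-D 2-means recovers the cluster centers without any labels.
centers = np.array([v.min(), v.max()])
for _ in range(20):
    assign = np.abs(v[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([v[assign == k].mean() for k in (0, 1)])

# The midpoint of the centers lands near the Bayes boundary at 0;
# a handful of labeled points would then suffice to name each cluster.
threshold = centers.mean()
```

In the causal direction the input marginal carries no such information about the labeling function, which is the asymmetry the 2012 paper formalizes.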

Problems like those in example 2 (predicting the cause X from the effect Y) will benefit from semi-supervised learning techniques. I’m surprised that this finding is not talked about more often; perhaps it’s obvious to those deeper in the field.

Summary

It’s a great video and I’m sure it will get you motivated with regard to two important areas of machine learning.

Again, you can watch the video here: “Statistical and causal approaches to machine learning“.