Consider the following problem statement:

Input: a) A set of labelled examples where every is the input vector, and is the corresponding output label. b) A set of unlabelled instances .

Output: The set of expected labels for all instances in .

There are two ways (or rather, two philosophies) you could use, to solve this problem. Lets look at the first one most people in ML are acquainted with:

The Inductive way

Induction, in the context of learning, is the attempted discovery of rules/generalizations based on analysis of collected data. ‘Attempted discovery’ is the key term here – the generalizations are not facts, but approximations based on evidence you have gathered.

The main characteristic of inductive learning is the building of a model – those rules/properties you induce from the data to answer your questions, together make up the model. The learning can happen in a supervised or semi-supervised (or even unsupervised) fashion. What you are basically trying to do, is make generalizations that can help you understand/label unseen instances in the future. Statistical inference (which deals with building parametric models) is one of the techniques used in inductive learning.

Concretely speaking, heres how inductive learning will work for the problem defined above:

Input: for supervised learning (and too if we go for semi-supervised learning).

Output: The set of expected labels for all instances in .

Approach: Compute a function , using information in , such that is as close to as possible over all instances in . Using this function, compute each as .

The other way to go about this would be…

The Transductive way

Read the problem statement at the beginning of this post, once again. The output which the problem definition asks for, is the labels of instances in only. What we tried to do in inductive learning, is build a model to label any unseen instances in the future. If you think about it, we basically solved a problem thats more general than the one we needed to solve. Instead of building a universal model – which we don’t need anyway – could we leverage the information contained in the instances of (with respect to those in ) to make better predictions for specifically? That is precisely what Transductive Learning tries to do.

Think about it this way. Suppose you were shown a pack of dogs, where each dog’s breed can be either A or B. Some of the dogs have been tagged with their correct breed, but many are not. You are asked to label the unlabelled dogs with their respective breeds. What would you do? Apart from the characteristics of dogs from A and B, wouldn’t it make sense to observe the unlabelled dogs and their interactions and similarities to those from A and B, to make good guesses for their breeds? That is the philosophy of Transduction.

Transduction, in the context of learning, refers to reasoning from specific observed (training) instances, to specific observed (unlabelled) instances. It was introduced by Vladmir Vapnik, with the core thought behind it being this:

“When solving a problem of interest, do not solve a more general problem as an intermediate step. Try to get the answer that you really need but not a more general one.”

To avoid reinventing the wheel, please refer to the Wikipedia example problem to get an an intuitive understanding of how many Transductive learning algorithms work. For those of you who are acquainted with Support Vector Machines (SVMs), theres a transductive version of them around too. If you don’t want to go into the mathematical details, heres a summary (Basically, you incorporate the unlabelled samples into determining what the ‘maximum margin’ actually is):

It is important to understand that not all Semi-Supervised Learning methods are Transductive in nature. The inclusion of unlabelled samples in the training is not the primary characteristic of transduction – the avoidance of building a general ‘model’ is. We are using the information implicit in the instances whose output is required, to understand them better.

The major disadvantage of Transductive learning is pretty obvious: The information you ‘learn’ cannot be used to label new instances (which you did not have during training) – since you are not building a model. Essentially, every time you want to classify a new set of instances, you have to re-do the whole training again. Therefore, it makes most sense only when your goals (in terms of the instances you want to understand) are specific.

Another thing to note, is that the effectiveness of Transduction can suffer if you have some really noisy samples in your data. Think about the dogs example I gave. If some dogs from the unlabelled ones were from some random breed C (or cross-breeds of A and B), your internal understanding of the entire pack would suffer. You would see some dogs behaving randomly with others, some showing characteristics of both – it might make you question the labelled dogs as well! Since the distribution(s) of unlabelled data are used along with the training data, the integrity of the whole dataset (or at least most of it) is a prerequisite in Transductive learning.