1 Introduction

The Parzen-window technique is a widely used non-parametric approach for estimating a probability density function $p(\pmb x)$ at a specific point $\pmb x$ from a sample $\pmb x_n$. It doesn't require any knowledge of, or assumptions about, the underlying distribution.

Putting it in context - Where would this method be useful?

A popular application of the Parzen-window technique is to estimate the class-conditional densities (often also called 'likelihoods') $p(\pmb x \; | \; \omega_i)$ in a supervised pattern classification problem from the training dataset (where $\pmb x$ is a multi-dimensional sample that belongs to a particular class $\omega_i$).

Imagine that we are about to design a Bayes classifier for solving a statistical pattern classification task using Bayes's rule:

$P(\omega_i\; | \;\pmb x) = \frac{p(\pmb x \; | \; \omega_i) \cdot P(\omega_i)}{p(\pmb x)} \\ \Rightarrow \text{posterior probability} = \frac{\text{likelihood} \cdot \text{prior probability}}{\text{evidence}}$
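To make the rule concrete, here is a minimal sketch that applies it to a single point $\pmb x$ and two classes. The function name `posterior_probabilities` and all numeric values are made up for illustration; the evidence $p(\pmb x)$ is obtained by summing likelihood times prior over all classes.

```python
def posterior_probabilities(likelihoods, priors):
    """Apply Bayes' rule to per-class likelihoods p(x|w_i) and priors P(w_i)."""
    # Numerators of Bayes' rule: likelihood * prior for each class
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    # Evidence p(x): sum of the numerators over all classes
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical values for one point x and two equally likely classes
likelihoods = [0.3, 0.1]   # p(x | w_1), p(x | w_2)
priors = [0.5, 0.5]        # P(w_1), P(w_2)

posteriors = posterior_probabilities(likelihoods, priors)

# Decide for the class with the largest posterior probability (0-based index)
predicted_class = posteriors.index(max(posteriors))
```

With equal priors, the decision is driven entirely by the likelihoods, which is why a good estimate of $p(\pmb x \; | \; \omega_i)$ matters so much for the classifier.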

If the parameters of the class-conditional densities (also called likelihoods) are known, designing the classifier is straightforward. I have solved some simple examples in IPython notebooks under the section Parametric Approaches.

However, it becomes much more challenging if we don't have prior knowledge about the underlying parameters that define the model of our data.

Imagine we are about to design a classifier for a pattern classification task where the parameters of the underlying sample distribution are not known. In that case, we don't need to know the whole distribution; to make the decision, it is sufficient to know the probability density at the particular point that we want to classify. Here, we are going to see how we can estimate this value from the training sample.

However, the problem with this approach is that the training dataset seldom contains samples that coincide exactly with the point of interest, as a histogram of the frequencies for an arbitrary training dataset would show. Therefore, we define a certain region (i.e., the Parzen-window) around the particular value and use the samples that fall inside it to make the estimate.
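The idea of a region around the point can be sketched in a few lines. The following is a minimal illustration, not a definitive implementation: it assumes a hypercube window with edge length `h` centered at the point and estimates the density as $k / (n \cdot h^d)$, where $k$ is the number of the $n$ training samples that fall inside the window and $d$ is the number of dimensions. The function name `hypercube_density_estimate`, the window width, and the standard-normal test data are assumptions chosen for illustration.

```python
import numpy as np


def hypercube_density_estimate(x, samples, h):
    """Estimate p(x) as k / (n * h**d) using a hypercube of edge length h.

    k is the number of samples inside the hypercube centered at x,
    n is the number of samples, and d is the number of dimensions.
    """
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float)
    n, d = samples.shape
    # A sample is inside the window if every coordinate lies within h/2 of x
    inside = np.all(np.abs(samples - x) <= h / 2, axis=1)
    k = np.count_nonzero(inside)
    return k / (n * h**d)


# Hypothetical training sample: 10,000 points from a 2D standard normal
rng = np.random.default_rng(0)
samples = rng.standard_normal((10000, 2))

# Estimate the density at the origin; the true value is 1 / (2 * pi)
estimate = hypercube_density_estimate([0.0, 0.0], samples, h=0.5)
```

The window width `h` is a trade-off: a very small window contains few or no samples and gives a noisy estimate, while a very large one smooths over the local structure of the density.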

And where does this name Parzen-window come from?

As was quite common in earlier days, this technique was named after its inventor, Emanuel Parzen, who published his detailed mathematical analysis in 1962 in The Annals of Mathematical Statistics: On Estimation of a Probability Density Function and Mode.