I’m excited to announce a paper that Rajesh Ranganath, Dave Blei, and I released today on arXiv, titled Deep and Hierarchical Implicit Models.

Implicit probabilistic models are all about sampling as a primitive: they define a process to simulate data and do not require tractable densities (Diggle & Gratton (1984), Hartig, Calabrese, Reineking, Wiegand, & Huth (2011)) . We leverage this fundamental idea to develop new classes of models: they encompass simulators in the scientific communities, generative adversarial networks (Goodfellow et al., 2014), and deep generative models such as sigmoid belief nets (Neal, 1990) and deep latent Gaussian models (Rezende, Mohamed, & Wierstra (2014), Kingma & Welling (2014)). These modeling developments could not really be done without inference, and we develop a variational inference algorithm that underpins them all.

Biased as I am, I think this is quite a dense paper—chock full of simple ideas that are rife with deep implications. There are many nuggets of wisdom that I could ramble on about, and I just might in separate blog posts.

As a practical example, we show how you can take any standard neural network and turn it into a deep implicit model: simply inject noise into the hidden layers. The hidden units in these layers are now interpreted as latent variables. Further, the induced latent variables are astonishingly flexible, going beyond Gaussians (or exponential families (Ranganath, Tang, Charlin, & Blei, 2015)) to arbitrary probability distributions. Deep generative modeling could not be any simpler!

Here’s a 2-layer deep implicit model in Edward. It defines the generative process,

This generates layers of latent variables , and data via functions of noise .

import tensorflow as tf from edward.models import Normal from keras.layers import Dense N = 55000 # number of data points d = 100 # noise dimensionality # random noise is Normal(0, 1) eps2 = Normal ( tf . zeros ([ N , d ]), tf . ones ([ N , d ])) eps1 = Normal ( tf . zeros ([ N , d ]), tf . ones ([ N , d ])) eps0 = Normal ( tf . zeros ([ N , d ]), tf . ones ([ N , d ])) # alternate latent layers z with hidden layers h z2 = Dense ( 128 , activation = 'relu' )( eps2 ) h2 = Dense ( 128 , activation = 'relu' )( z2 ) z1 = Dense ( 128 , activation = 'relu' )( tf . concat ([ eps1 , h2 ], 1 )) h1 = Dense ( 128 , activation = 'relu' )( z1 ) x = Dense ( 10 , activation = None )( tf . concat ([ eps0 , h1 ], 1 ))

The model uses Keras, where Dense(256)(x) denotes a fully connected layer with hidden units applied to input x . To define a stochastic layer, we concatenate noise with the previous layer. The model alternates between stochastic and deterministic layers to generate data points .

Check out the paper for how you can work with, or even interpret, such a model.

EDIT (2017/03/02): The algorithm is now merged into Edward.

References