In this article, we will address the following problem: given an \(iid\) sequence of discrete random variables \(\{X_i\}_{i \geq 1}\) and a random variable \(N\) following some distribution, how do we estimate the distribution (probability mass function) of \(\sum_{i = 1}^N X_i\)?

The first approach one might think of for this problem is convolution. If a convenient analytic form is available, then we are done. Sometimes, however, we need to compute the distribution of the sum numerically. Take the example of \(X_1\) and \(X_2\), two independent random variables with distributions \(\mathbb{P}_{X_1}(X_1 = x_1)\) and \(\mathbb{P}_{X_2}(X_2 = x_2)\), and let \(Y = X_1 + X_2\). There are various combinations of \(x_1\) and \(x_2\) whose sum yields \(y\), and each of those combinations has probability \(\mathbb{P}_{X_1}(X_1 = x_1)\mathbb{P}_{X_2}(X_2 = x_2)\). This lets us deduce the probability mass function of \(Y\): \(\mathbb{P}(Y = y) = \sum_{x_1 + x_2 = y} \mathbb{P}_{X_1}(X_1 = x_1)\mathbb{P}_{X_2}(X_2 = x_2)\). Even in this case of just two random variables, computing this numerically may involve quite a few combinations of \(x_1\) and \(x_2\)! In the more general case of \(X_1\) through \(X_n\), we have \(\mathbb{P}(Y = y) = \sum_{k=0}^{y} \mathbb{P}(X_1 + \cdots + X_{n-1} = k)\mathbb{P}(X_n = y-k)\), and we extend our convolution recursively. I do not know about you, but this is getting far too convoluted for my taste (ha ha ha…)!
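To make the brute-force convolution concrete, here is a minimal R sketch of the two-variable case; the pmfs p1 and p2 are arbitrary made-up examples, not anything from the rest of the post.

# pmf of X1 on {0, 1, 2} and pmf of X2 on {0, 1, 2, 3} (arbitrary examples)
p1 <- c(0.2, 0.5, 0.3)
p2 <- c(0.1, 0.4, 0.4, 0.1)

# brute force: P(Y = y) = sum over all x1 + x2 = y of P(X1 = x1) * P(X2 = x2)
p_y <- rep(0, length(p1) + length(p2) - 1)
for (x1 in seq_along(p1) - 1) {
  for (x2 in seq_along(p2) - 1) {
    p_y[x1 + x2 + 1] <- p_y[x1 + x2 + 1] + p1[x1 + 1] * p2[x2 + 1]
  }
}
print(p_y)       # pmf of Y = X1 + X2 on {0, ..., 5}
print(sum(p_y))  # sanity check: adds to 1

Already with two small supports we loop over every pair of values, and the nesting grows with every extra variable.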

When we think of estimating (or analytically deriving) the distribution of a discrete random variable, one tool which may come to mind is the probability generating function. One reason for this is that it uniquely determines the distribution of a random variable. Let’s show this:

Take \(X\), a discrete random variable with \(\mathbb{P}(X \in A) = \mathbb{P}(\{ \omega \in \Omega : X(\omega) \in A \})\), where \(X: \Omega \rightarrow \mathbb{R}\) is a measurable function and \(\Omega\) is countable, whether finite or infinite. Let’s remind ourselves that a finite set admits a bijection with \(\{1, \cdots, n\}\) for some \(n \in \mathbb{N}\), while a countable set admits an injection into \(\mathbb{N}\). Our probability generating function, from now on referred to as PGF, is defined as \(G_X(s) = \mathbb{E}[s^X] = \sum_{i \in \mathbb{N}} p(i) s^{i}\), where \(p(i) = \mathbb{P}(X = i)\). Note that the series has infinitely many terms; if our random variable takes only finitely many values, we simply assign probability 0 to the values it does not take. It is also worth noting that the series converges for any \(|s| \leq 1\). Clearly, from the definition, if two discrete random variables have the same probability mass function, then they have the same PGF. Now the other way around: a PGF is a power series expansion of a function about the origin, hence its coefficients are unique. Since those coefficients are \(\mathbb{P}(X = i)\), we are done.
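As a quick illustration of the uniqueness argument, take \(X \sim Bernoulli(p)\) (a simple example of my own choosing):

\begin{aligned}

G_X(s) = \mathbb{E}[s^X] = (1-p)s^0 + p s^1 = (1-p) + ps

\end{aligned}

Reading off the coefficients of \(s^0\) and \(s^1\) recovers \(\mathbb{P}(X = 0) = 1-p\) and \(\mathbb{P}(X = 1) = p\): the coefficients of the power series are exactly the probability masses.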

Let’s make our problem more precise: say we have an \(iid\) sequence of discrete random variables \(\{X_i\}_{i \geq 1}\) with PGF \(\phi_X(s)\), and \(N\) is a discrete random variable, independent of the \(X_i\), with PGF \(\xi_N(s)\); let’s find the probability mass function of \(\sum_{i = 1}^N X_i\), which we will call \(Y\). The PGF of \(Y\) is \(G_Y(s) = \mathbb{E}[s^Y] = \mathbb{E}\left[\mathbb{E}[s^{\sum_{i=1}^N X_i} \mid N]\right] = \mathbb{E}[\phi_X(s)^N] = \xi_N(\phi_X(s))\), where the inner expectation uses the fact that, conditionally on \(N = n\), the sum is a sum of \(n\) \(iid\) terms, so its PGF is \(\phi_X(s)^n\). Now that is great! The issue is that we still need to retrieve the coefficients, which are our probability masses. To make the exercise more tractable, let’s make some random variables explicit. If \(N \sim Poi(\lambda)\) then \(\xi_N(s) = \sum_{i \in \mathbb{N}} \frac{e^{-\lambda}\lambda^i}{i!}s^i = e^{-\lambda}\sum_{i \in \mathbb{N}} \frac{(\lambda s)^i}{i!} = e^{-\lambda}e^{\lambda s} = e^{-\lambda (1-s)}\), from which we obtain \(G_Y(s) = e^{-\lambda (1-\phi_X(s))}\). We therefore need a method to extract our masses from the PGF. Here is where Fourier transforms come in!
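As a quick sanity check on the identity \(G_Y(s) = e^{-\lambda (1-\phi_X(s))}\), here is a small R sketch comparing it with a Monte Carlo estimate of \(\mathbb{E}[s^Y]\); the pmf of \(X\) is the one used in the code at the end of the post, while \(s = 0.9\) and the sample size are arbitrary choices.

set.seed(1)
lambda <- 7
x      <- 0:6
p_x    <- (4 - abs(3 - x)) / 16   # P(X = x) = (4 - |3 - x|) / 16
s      <- 0.9                     # any |s| <= 1 will do

# analytic value: G_Y(s) = exp(-lambda * (1 - phi_X(s)))
phi_X      <- sum(s^x * p_x)
g_analytic <- exp(-lambda * (1 - phi_X))

# Monte Carlo estimate of E[s^Y] with Y = X_1 + ... + X_N and N ~ Poi(lambda)
n_sim <- 1e5
y_sim <- replicate(n_sim, {
  n <- rpois(1, lambda)
  if (n == 0) 0 else sum(sample(x, n, replace = TRUE, prob = p_x))
})
g_mc <- mean(s^y_sim)

print(c(analytic = g_analytic, monte_carlo = g_mc))  # the two should be close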

Say we have observations \(x_1, \cdots, x_n\); the discrete Fourier transform of the \(k\)’th element is \(\hat{x}_k = \sum_{j = 1}^n x_j \exp\left(\frac{-2\pi \textbf{i} jk}{n}\right)\). Those familiar with \(\mathbb{C}\), the complex numbers, will recognise the notation \(re^{\textbf{i}\theta}\), which, using Euler’s formula, equates to \(r(\cos(\theta) + \textbf{i}\sin(\theta))\), with \(\textbf{i}^2 = -1\). We can immediately write down the inverse transformation: \(x_k = \frac{1}{n}\sum_{j = 1}^n \hat{x}_j \exp\left(\frac{2\pi \textbf{i} jk}{n}\right)\). One thing which I find quite interesting to notice is that the Fourier transform looks a lot like the moment generating function (MGF). Recall that, for some random variable \(X\), its MGF is \(\mathbb{E}[e^{sX}]\) (its close cousin, the characteristic function \(\mathbb{E}[e^{\textbf{i}tX}]\), is even closer in form). Moreover, the MGF of a sum of independent random variables is the product of their MGFs. The same holds for the Fourier transform: the DFT of the convolution of two sequences is the element-wise product of their DFTs, a quite convenient property!
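To see these properties in action, here is a small R sketch using R's built-in fft function; the two pmfs are the same arbitrary examples as in the convolution sketch above, zero-padded so that the (circular) DFT convolution matches the ordinary one.

# two pmfs (arbitrary examples)
p1 <- c(0.2, 0.5, 0.3)            # pmf of X1 on {0, 1, 2}
p2 <- c(0.1, 0.4, 0.4, 0.1)       # pmf of X2 on {0, 1, 2, 3}

# zero-pad both to the support size of X1 + X2
n  <- length(p1) + length(p2) - 1
q1 <- c(p1, rep(0, n - length(p1)))
q2 <- c(p2, rep(0, n - length(p2)))

# the inverse DFT recovers the original sequence (up to the 1/n normalisation)
print(max(abs(Re(fft(fft(q1), inverse = TRUE)) / n - q1)))

# product-of-transforms property: the DFT of the pmf of X1 + X2 is the
# element-wise product of the two DFTs
p_sum <- Re(fft(fft(q1) * fft(q2), inverse = TRUE)) / n
print(p_sum)       # pmf of X1 + X2 on {0, ..., 5}
print(sum(p_sum))  # adds to 1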

Say, for the time being, that \(N\) is constant, so \(Y\) takes values in \(\{0, \cdots, Nl\}\) for \(X\) defined on \(\{0, \cdots, l\}\). Recall \(\phi_X(s)\) is our probability generating function for \(X\); therefore the PGF for \(Y\) is simply \(\phi_X(s)^N\), which must also equal \(\sum_{j = 0}^{Nl}s^j \mathbb{P}(Y = j)\), and we want to extract \(\mathbb{P}(Y = i)\) for \(i \in \{0, \cdots, Nl\}\). The idea is to evaluate \(\phi_X(s)\) at \(s_0, \cdots, s_{Nl}\) with \(s_k = \exp\left(\frac{-2\pi \textbf{i}k}{Nl+1}\right)\). Therefore, \(\phi_X(s_k) = \sum_{j = 0}^{Nl}\exp\left(\frac{-2 \pi \textbf{i} k}{Nl+1}j\right)\mathbb{P}_X(X = j)\), with \(\mathbb{P}_X(X = j) = 0\) for all \(j > l\). In other words, \((\phi_X(s_0), \cdots, \phi_X(s_{Nl}))\) is the discrete Fourier transform of \((\mathbb{P}_X(X = 0), \cdots, \mathbb{P}_X(X = Nl))\). Now, note that \(\phi_X(s_k)^N = G_Y(s_k)\), so \((G_Y(s_0), \cdots, G_Y(s_{Nl}))\) is the DFT of \((\mathbb{P}(Y = 0), \cdots, \mathbb{P}(Y = Nl))\). Following this observation, we use the inverse DFT to obtain the \(k\)’th probability: \(\mathbb{P}(Y = k) = \frac{1}{Nl+1}\sum_{j=0}^{Nl}G_Y(s_j)\exp\left(\frac{2 \pi \textbf{i} jk}{Nl+1}\right)\). As you might expect, a probability is not a complex number, and one can prove (perhaps in some other blog post) that the imaginary part of this expression is null.
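Here is a minimal R sketch of this constant-\(N\) case, using R's built-in fft function instead of hand-rolled loops; \(N = 5\) is an arbitrary choice, and the pmf of \(X\) is the one used in the full example at the end of the post.

N   <- 5                          # constant number of summands (arbitrary choice)
x   <- 0:6
p_x <- (4 - abs(3 - x)) / 16      # P(X = x) = (4 - |3 - x|) / 16
l   <- max(x)

# zero-pad the pmf of X to the support size of Y = X_1 + ... + X_N, i.e. {0, ..., N*l}
m   <- N * l + 1
q_x <- c(p_x, rep(0, m - length(p_x)))

# DFT of the padded pmf gives (phi_X(s_0), ..., phi_X(s_Nl)); raising it to the N'th
# power gives the DFT of the pmf of Y, and the inverse DFT recovers P(Y = k)
p_y <- Re(fft(fft(q_x)^N, inverse = TRUE)) / m

print(sum(p_y))   # sanity check: adds to 1
plot(0:(m - 1), p_y, type = "h", xlab = "y", ylab = "P(Y = y)")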

What do we do if our \(N\) is not constant, but a random variable? We find some \(M\) which \(Y\) will exceed only with negligible probability, and truncate there. We will use some inequalities, so if you need a review, see: Simple Bounds on Probabilities

\begin{aligned}

\mathbb{P}(Y \geq M) \leq \frac{\mathbb{E}[s^Y]}{s^M} = \frac{e^{-\lambda (1 - \phi(s))}}{s^M} \leq \epsilon \quad \text{for } s > 1 \\

\Rightarrow \frac{e^{-\lambda (1 - \phi(s))}}{\epsilon} \leq s^M \\

\Rightarrow \frac{-\lambda (1 - \phi(s)) - \ln(\epsilon)}{\ln(s)} \leq M

\end{aligned}

Therefore we take:

\begin{aligned}

M = \inf_{s > 1} \frac{-\lambda (1 - \phi(s)) - \ln(\epsilon)}{\ln(s)}

\end{aligned}

We then proceed with the methodology reviewed above.

Below you can find a short piece of R code which implements this with \(N \sim Poi(7)\) and \(\mathbb{P}(X = x) = \frac{4 - |3-x|}{16}\) for \(x \in \{0, \cdots, 6\}\):

lambda  <- 7
epsilon <- 10^(-5)

# pmf of X on {0, ..., 6}: P(X = x) = (4 - |3 - x|) / 16
x   <- 0:6
p_x <- (4 - abs(3 - x)) / 16

# evaluate the PGF of X on a grid of s > 1 (s = 1 is excluded, since log(1) = 0 below)
s   <- 1 + (1:1000) / 1000
phi <- sapply(s, function(si) sum(si^x * p_x))

# M = inf_{s > 1} (-lambda * (1 - phi(s)) - ln(epsilon)) / ln(s)
Ms <- (-log(epsilon) - lambda * (1 - phi)) / log(s)
M  <- ceiling(min(Ms))
print(M)

########
# dft
########

# zero-pad the pmf of X up to length M
p_x <- c(p_x, rep(0, M - length(p_x)))

# forward DFT of the padded pmf, i.e. phi_X evaluated at the roots of unity;
# exp(-2 * pi * j * k * i / M) == complex(real = cos(2*pi*j*k/M), imaginary = -sin(2*pi*j*k/M))
p_hat <- c()
for (j in 0:(M - 1)) {
  w     <- complex(real = cos(2 * pi * j * (0:(M - 1)) / M),
                   imaginary = -sin(2 * pi * j * (0:(M - 1)) / M))
  p_hat <- c(p_hat, sum(w * p_x))
}

# PGF of Y at the same points: G_Y(s_j) = exp(-lambda * (1 - phi_X(s_j)))
g_p_hat <- exp(-lambda * (1 - p_hat))

# inverse DFT to recover P(Y = y)
PY <- c()
for (y in 0:(M - 1)) {
  w  <- complex(real = cos(2 * pi * y * (0:(M - 1)) / M),
                imaginary = sin(2 * pi * y * (0:(M - 1)) / M))
  PY <- c(PY, sum(g_p_hat * w) / M)
}

# checking the probability mass function makes sense: it should add to 1
print(sum(Re(PY)))  # adds to 1!
plot(Re(PY))
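As an extra sanity check (not part of the original script), the mean of the recovered pmf should be close to \(\mathbb{E}[N]\mathbb{E}[X] = 7 \times 3 = 21\) by Wald's identity:

# mean of the recovered pmf; should be close to E[N] * E[X] = 7 * 3 = 21
print(sum((0:(M - 1)) * Re(PY)))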

This seems reasonable!