Let's carefully discuss what we mean by sparse in this context. A signal $\mathbf{f}$ is sparse if it can be expressed with very few nonzero components ($\mathbf{s}$) with respect to a given basis ($ \mathbf{\Psi} $). In other words, in matrix-vector language:

$ \mathbf{f} = \mathbf{\Psi} \mathbf{s} $

where $ || \mathbf{s} ||_0 \ll N $, with $N$ the length of the vector and $|| \cdot||_0$ counting the number of nonzero elements in $\mathbf{s}$. Furthermore, we don't actually collect $N$ samples point-wise as we did in the Shannon sampling case. Rather, we measure $\mathbf{f}$ indirectly as $\mathbf{y}$ with another matrix as in:

$\mathbf{y} = \mathbf{\Phi f} = \mathbf{\Phi} \mathbf{\Psi} \mathbf{s} = \mathbf{\Theta s} $

where $\mathbf{\Theta}$ is an $M \times N$ matrix and $ M < N $ is the number of measurements. This setup leaves us two problems to solve: first, how to design a stable measurement matrix $\mathbf{\Phi}$; and second, how to reconstruct $ \mathbf{f} $ from $ \mathbf{y} $.
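To make the notation concrete, here is a minimal sketch of this measurement setup in Python/NumPy. The DFT basis for $\mathbf{\Psi}$ and the Gaussian random $\mathbf{\Phi}$ anticipate the examples later in this section; the dimensions and variable names are illustrative choices, not fixed by the discussion above.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, K = 256, 64, 8    # signal length, number of measurements, sparsity

# K-sparse coefficient vector s: K nonzero entries at random positions
s = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
s[support] = rng.standard_normal(K)

# Sparsity basis Psi: the (unitary) DFT basis used later in the section
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
Psi = np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)

f = Psi @ s             # the signal, sparse with respect to Psi

# Measurement matrix Phi: M x N with zero-mean, variance-1/N Gaussian entries
Phi = rng.normal(scale=1 / np.sqrt(N), size=(M, N))

y = Phi @ f             # M < N indirect measurements of f
Theta = Phi @ Psi       # so that y = Theta @ s
```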

This may look like a standard linear algebra problem, but since $ \mathbf{\Theta} $ has fewer rows than columns, the system is underdetermined and the problem is ill-posed. This is where we inject the sparsity concept! Suppose that $\mathbf{f}$ is $K$-sparse ( $||\mathbf{s}||_0=K$ ). If we somehow knew which $K$ columns of $ \mathbf{\Theta} $ matched the $K$ non-zero entries in $\mathbf{s}$, we could restrict the system to an $ M \times K $ submatrix; with $M > K$, that subproblem is overdetermined and has a stable inverse.
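Continuing the hypothetical sketch above: if an oracle told us the support of $\mathbf{s}$, ordinary least squares on the restricted $M \times K$ submatrix would recover the nonzero entries exactly, which illustrates why the overdetermined subproblem is stable.

```python
# Oracle-assisted recovery: restrict Theta to the K known columns.
# The M x K system is overdetermined (M > K), so least squares is stable.
Theta_K = Theta[:, support]
s_K, *_ = np.linalg.lstsq(Theta_K, y, rcond=None)

s_hat = np.zeros(N, dtype=complex)
s_hat[support] = s_K
print(np.allclose(s_hat, s))    # True: exact recovery given the support
```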

This bit of reasoning is encapsulated in the following statement: for any vector $\mathbf{v}$ sharing the same $K$ non-zero entries as $\mathbf{s}$ and some $\epsilon > 0$, we have

$$1-\epsilon \leq \frac{|| \mathbf{\Theta v} ||_2}{|| \mathbf{v} ||_2} \leq 1+\epsilon $$

which is another way of saying that $\mathbf{\Theta}$ approximately preserves the lengths of $K$-sparse vectors. Of course we don't know ahead of time which $K$ components to use, but it turns out that this condition is sufficient for a stable inverse of $\mathbf{\Theta}$ if it holds for arbitrary $3K$-sparse vectors $\mathbf{v}$. This is the Restricted Isometry Property (RIP). Unfortunately, in order to verify this sufficient condition, we would have to propose a $\mathbf{\Theta}$ and then check the inequality over all $\binom{N}{3K}$ possible supports of the $N$-length vector $\mathbf{v}$. As you may guess, this is computationally prohibitive.
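While exhaustive verification is out of reach, we can at least sample the ratio in the RIP inequality over random sparse supports. The sketch below continues the earlier hypothetical setup; for the variance-$1/N$ Gaussian $\mathbf{\Phi}$ there, $\mathbb{E}\,||\mathbf{\Theta v}||_2^2 = (M/N)\,||\mathbf{v}||_2^2$, so we rescale $\mathbf{\Theta}$ by $\sqrt{N/M}$ to center the ratios at 1. This is only a Monte Carlo spot-check, not a proof of RIP.

```python
def sample_rip_ratios(Theta, sparsity, trials=2000, rng=rng):
    """Sample ||Theta v||_2 / ||v||_2 over random `sparsity`-sparse v.

    Checking RIP exactly requires all C(N, sparsity) supports, which is
    intractable; this only gives an empirical feel for the spread.
    """
    N = Theta.shape[1]
    ratios = np.empty(trials)
    for i in range(trials):
        v = np.zeros(N)
        idx = rng.choice(N, size=sparsity, replace=False)
        v[idx] = rng.standard_normal(sparsity)
        ratios[i] = np.linalg.norm(Theta @ v) / np.linalg.norm(v)
    return ratios

# Rescale so E||Theta v||^2 = ||v||^2 for the variance-1/N entries above
r = sample_rip_ratios(np.sqrt(N / M) * Theta, 3 * K)
print(r.min(), r.max())    # both near 1: lengths of 3K-sparse v are preserved
```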

Alternatively, we can approach stability by requiring incoherence between the measurement matrix $\mathbf{\Phi}$ and the sparse basis $\mathbf{\Psi}$, meaning that the columns of one cannot be expressed as a combination of a small subset of the columns of the other. For example, suppose we take delta-spikes for $\mathbf{\Phi}$, the row-truncated identity matrix

$$\mathbf{\Phi} = \mathbf{I}_{M \times N} $$

and the discrete Fourier transform matrix for $\mathbf{\Psi}$ as

$$\mathbf{\Psi} = \begin{bmatrix} e^{-j 2\pi k n/N} \end{bmatrix}_{N \times N}$$

Then we could not write any of the columns of $\mathbf{\Phi}$ using just a few of the columns of $\mathbf{\Psi}$, because a single delta-spike spreads across every Fourier component.
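A common way to quantify this (not defined in the text above, so take the formula as an assumption) is the mutual coherence $\mu(\mathbf{\Phi}, \mathbf{\Psi}) = \sqrt{N} \max_{i,j} |\langle \phi_i, \psi_j \rangle|$, which ranges from 1 (maximally incoherent) to $\sqrt{N}$ (maximally coherent). A quick sketch for the spike/Fourier pair:

```python
import numpy as np

N = 256
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
Psi = np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)   # unitary DFT basis
spikes = np.eye(N)                                   # delta-spike rows of Phi

# Every spike/Fourier inner product has magnitude 1/sqrt(N), so the
# coherence attains its minimum possible value of 1.
mu = np.sqrt(N) * np.max(np.abs(spikes @ Psi))
print(mu)    # ~1.0: spikes and sinusoids are maximally incoherent
```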

It turns out that if we pick the $M \times N$ measurement matrix $\mathbf{\Phi}$ randomly, with zero-mean, variance-$1/N$ Gaussian entries, and use the identity matrix as $\mathbf{\Psi}$, then the resulting $\mathbf{\Theta}$ matrix can be shown to satisfy RIP with high probability. This means that we can recover $N$-length, $K$-sparse signals with high probability from just $M \ge c K \log (N/K)$ measurements, where $c$ is a small constant. Furthermore, it also turns out that we can use any orthonormal basis for $\mathbf{\Psi}$, not just the identity matrix, and these relations will still hold.
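Reconstruction itself is the second problem noted earlier, but as a hedged preview, the sketch below recovers a $K$-sparse signal from $M \approx c K \log(N/K)$ Gaussian measurements via $\ell_1$ minimization (basis pursuit) posed as a linear program; the constant $c = 4$ and the use of SciPy's `linprog` are my own illustrative choices, not prescribed by the discussion above.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

N, K = 128, 4
M = int(4 * K * np.log(N / K))    # M >= c K log(N/K), with c = 4 assumed

# K-sparse signal with Psi = I, so Theta = Phi
s = np.zeros(N)
s[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
Phi = rng.normal(scale=1 / np.sqrt(N), size=(M, N))
y = Phi @ s

# Basis pursuit: min ||s||_1 subject to Phi s = y. Split s = u - w with
# u, w >= 0 to obtain a standard-form linear program.
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
s_hat = res.x[:N] - res.x[N:]
print(np.max(np.abs(s_hat - s)))    # ~0: exact recovery with high probability
```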