The scenario

You have 20 datapoints, each with 1,000,000 attributes. Each datapoint also has an associated $y$ value, and you want to know whether a linear combination of a few attributes can be used to predict $y$. That is, you are looking for a model

$$ y_i \sim \sum_j w_j x_{ij} $$

where most of the 1 million $w_j$ values are 0.
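To make the setup concrete, here is a minimal sketch of synthetic data with this shape. It is only an illustration under stated assumptions: the attribute count is scaled down to 100,000 to keep memory modest, and the number of relevant attributes (`n_relevant`), the Gaussian attributes, and the noise level are all hypothetical choices, not part of the scenario.

```python
import numpy as np

rng = np.random.default_rng(0)

n_points = 20        # from the scenario
n_attrs = 100_000    # assumption: scaled down from 1,000,000 to save memory
n_relevant = 3       # assumption: hypothetical number of non-zero weights

# Attribute matrix: one row per datapoint, one column per attribute.
X = rng.standard_normal((n_points, n_attrs))

# Sparse weight vector: zero everywhere except a few randomly chosen attributes.
w = np.zeros(n_attrs)
relevant = rng.choice(n_attrs, size=n_relevant, replace=False)
w[relevant] = rng.standard_normal(n_relevant)

# y_i is the weighted sum over attributes, plus a little noise (assumed level).
y = X @ w + 0.1 * rng.standard_normal(n_points)

print(X.shape, y.shape, np.count_nonzero(w))
```

The task is then to recover which columns in `relevant` matter, given only `X` and `y`.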

The problem

Since there are so many more attributes than datapoints, it is quite likely that a few attributes correlate with $y$ by pure coincidence.
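A quick way to see this is to generate attributes and a $y$ that are independent by construction, then look for the attribute that correlates best with $y$. The sketch below assumes Gaussian noise and uses 100,000 attributes instead of 1,000,000 to keep it light; with the full million the effect would only be stronger.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_attrs = 20, 100_000   # assumption: attribute count scaled down

X = rng.standard_normal((n_points, n_attrs))
y = rng.standard_normal(n_points)  # independent of X by construction

# Pearson correlation of every attribute (column) with y.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

best = int(np.argmax(np.abs(corr)))
print(f"attribute {best} has |correlation| {abs(corr[best]):.2f} with y")
```

Even though no attribute has any real relationship to `y`, the best of them should show a large correlation, simply because so many candidates were tried against only 20 datapoints.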

You kind of remember that cross-validation helps you detect over-fitting, but you're fuzzy on the details.