Desirable Properties of Feature Distributions

Before moving on, I want to cover some desirable properties of feature distributions. (I personally think this topic is very important.)

Sparse features per example (Population Sparsity) → Each example should be represented by only a few active (non-zero) features.

Sparse features across examples (Lifetime Sparsity) → Each feature should be active for only a few examples, so that it helps us distinguish one example from another. (In other words, features should be discriminative enough for us to tell one example from the next.)

Uniform activity distribution (High Dispersal) → Each feature should have a level of activity similar to every other feature; no single feature should be significantly more active than the rest.
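To make these three properties a bit more concrete, here is a small sketch that measures each of them on a toy feature matrix. Everything in it is my own assumption for illustration: the matrix `F`, the layout (rows are examples, columns are features), the helper names, and the threshold `eps` for deciding when a feature counts as active. For high dispersal I assume the mean squared activation of each feature is a reasonable measure of its activity.

```python
# Toy feature matrix (hypothetical values): rows are examples, columns are features.
F = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.5, 0.0],
]


def active_per_example(features, eps=1e-8):
    """Population sparsity: number of active features in each example (row)."""
    return [sum(1 for v in row if abs(v) > eps) for row in features]


def active_per_feature(features, eps=1e-8):
    """Lifetime sparsity: number of examples in which each feature (column) is active."""
    return [sum(1 for v in col if abs(v) > eps) for col in zip(*features)]


def mean_squared_activation(features):
    """High dispersal: mean squared activation of each feature (column)."""
    return [sum(v * v for v in col) / len(col) for col in zip(*features)]


print(active_per_example(F))       # [1, 1, 1, 1]
print(active_per_feature(F))       # [1, 1, 1, 1]
print(mean_squared_activation(F))  # [0.25, 0.0625, 0.0625, 0.25]
```

In this toy matrix every example uses a single feature and every feature fires for a single example, so both kinds of sparsity hold. The last line shows that the features are not equally active, though: two of them have four times the mean squared activation of the other two, so dispersal is not perfectly uniform here.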