Introduction and Overview

An unsupervised hierarchical feature learning system extracts, recognizes, and predicts patterns online. It builds a hierarchy of patterns, each a composition of lower-level features, learned through observation of streaming data.

A pattern represents a state, and the lowest-level state at any given time is discrete, as it cannot be broken down into smaller features.

At higher levels, states are more continuous because the patterns are defined by sequences of lower-level states. If level n+1 has sequences of size 3, the state at level n can transition 2 times before the higher-level state changes.
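The size-3 example above can be sketched in code. This is an illustrative sketch, not the system's actual implementation; the function name and the fixed chunk size are assumptions.

```python
def chunk_into_patterns(states, size=3):
    """Group a stream of low-level states into fixed-size higher-level patterns."""
    return [tuple(states[i:i + size]) for i in range(0, len(states) - size + 1, size)]

# Six low-level states form two high-level states; within each high-level
# state, the low level transitions size - 1 = 2 times before the
# higher-level state changes.
stream = ["a", "b", "c", "d", "e", "f"]
high = chunk_into_patterns(stream)
print(high)  # -> [('a', 'b', 'c'), ('d', 'e', 'f')]
```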

It is important for high-level patterns to interact with low-level patterns in this way because it allows state representations to transcend the moment-to-moment chaos of temporally evolving systems.

Learning and Bottom-Up Control

Each level of the hierarchy self-organizes to manage storage space, discarding infrequent or incorrect patterns and retaining those that are useful and/or occur often. Learning occurs through a clustering process and an update process: the former establishes new patterns, and the latter adjusts or removes existing ones.
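One way to picture a single level's self-organization is frequency counting with pruning. The class, its methods, and the `min_count` threshold below are hypothetical stand-ins for the clustering and update processes described above.

```python
from collections import Counter

class Level:
    """Sketch of one hierarchy level that tracks pattern frequencies."""

    def __init__(self, min_count=2):
        self.counts = Counter()      # pattern -> observation frequency
        self.min_count = min_count   # assumed pruning threshold

    def observe(self, pattern):
        # Clustering step: an unseen pattern is established with count 1;
        # update step: a known pattern has its frequency reinforced.
        self.counts[pattern] += 1

    def prune(self):
        # Discard patterns that occur too rarely to be useful.
        self.counts = Counter({p: c for p, c in self.counts.items()
                               if c >= self.min_count})

lvl = Level()
for p in [("a", "b"), ("a", "b"), ("c", "d")]:
    lvl.observe(p)
lvl.prune()
# ("a", "b") survives with count 2; ("c", "d") is pruned.
```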

The hierarchy as a whole attempts to minimize the number of levels as well as the number of patterns per level stored at any given time, while also maximizing its quality as a model of an environment.

Once the top level of the hierarchy grows beyond some threshold, a new level begins generating higher-level patterns on top of it. This continues until the top level generates fewer patterns than the threshold allows.

If at some point the top level starts growing again, the process continues; it never truly stops, but rather pauses until it is needed. In practice, however, a limit on the number of layers is desirable to keep the hierarchy from growing to an unmanageable size.

Prediction and Top-Down Control

While low-level states are changing, higher levels predict future state transitions: they use a finite list of recent low-level states to determine which high-level state they are likely in, then determine which low-level state is likely to follow the current observation.
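The two-step prediction above can be sketched by matching a recent window against the sequences that define high-level patterns. The function name and the exhaustive matching strategy are illustrative assumptions.

```python
def predict_next(high_level_patterns, recent):
    """Match the recent low-level window against each high-level sequence,
    then return the low-level state that follows it, or None if no match."""
    k = len(recent)
    for seq in high_level_patterns:
        for start in range(len(seq) - k):
            if list(seq[start:start + k]) == list(recent):
                return seq[start + k]   # state that follows the window
    return None

patterns = [("a", "b", "c"), ("x", "y", "z")]
print(predict_next(patterns, ["a", "b"]))  # -> "c"
```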

If a prediction is wrong, then the high-level state the system thought it was in is incorrect, because high-level states are defined by specific sequences of low-level states. The result of a prediction therefore works as a cost function for adjusting patterns at higher levels.
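A minimal sketch of using prediction outcomes as a cost signal, under assumed scoring: a failed prediction lowers the score of the high-level pattern that produced it, marking it as a candidate for adjustment or removal.

```python
def score_update(scores, pattern, correct, reward=1.0, penalty=1.0):
    """Reinforce a pattern after a correct prediction; penalize it after a miss."""
    scores[pattern] = scores.get(pattern, 0.0) + (reward if correct else -penalty)
    return scores

s = {}
s = score_update(s, ("a", "b", "c"), correct=False)
# ("a", "b", "c") now carries a negative score, flagging it for
# adjustment or pruning by the level's update process.
```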

Predictions must be accurate before a new layer is built on top of the current top layer; otherwise, the set of patterns used to generate the new layer would be unreliable. The top layer at any given time must therefore perform above a certain threshold, ensuring a strong foundation for the new layer.
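The reliability gate can be sketched as a check on recent prediction accuracy. The 0.9 threshold and the hit/total bookkeeping are assumptions for illustration.

```python
def ready_for_new_layer(hits, total, accuracy_threshold=0.9):
    """True when the top level's recent prediction accuracy clears the
    threshold, making it a reliable foundation for a new layer."""
    return total > 0 and hits / total >= accuracy_threshold

print(ready_for_new_layer(95, 100))  # -> True
print(ready_for_new_layer(50, 100))  # -> False
```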