If you haven't yet seen the first post for this series, please take a minute to read that first... Are you back? Great. Let's enter the wormhole...

This post is going to delve into the mechanics of feature engineering for the sorts of time series data that you may use as part of a stock price prediction modeling system.

I'll cover the basic concept, then offer some useful Python code recipes for transforming your raw source data into features which can be fed directly into an ML algorithm or ML pipeline.

Anyone who has dabbled with any systems-based trading or charting already has experience with simple forms of feature engineering, whether or not they realized it. For instance:

Converting a series of asset prices into percent change values is a simple form of feature engineering

Charting prices vs. a moving average is an implicit form of feature engineering

Any technical indicator (RSI, MACD, etc.) is also a form of feature engineering

The process takes in one or more columns of "raw" input data (e.g., OHLC price data, 10-Q financials, social media sentiment, etc.) and converts them into many columns of engineered features.
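To make the "raw columns in, feature columns out" idea concrete, here is a minimal sketch using pandas. The function name `engineer_features`, the lowercase column names (`open`, `high`, `low`, `close`), the 5-day window, and the synthetic random-walk prices are all assumptions made up for illustration, not part of any particular data source or library.

```python
import numpy as np
import pandas as pd


def engineer_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Turn raw OHLC columns into a few illustrative feature columns.

    Assumes `prices` has 'high', 'low' and 'close' columns (hypothetical names).
    """
    close = prices["close"]
    features = pd.DataFrame(index=prices.index)

    # Percent change of the close from one row to the next
    features["pct_change_1d"] = close.pct_change()

    # Close relative to its 5-period simple moving average
    # (the first 4 rows are NaN because the window is not yet full)
    features["close_vs_ma5"] = close / close.rolling(5).mean() - 1

    # High-low range expressed as a fraction of the close
    features["range_pct"] = (prices["high"] - prices["low"]) / close

    return features


# Synthetic random-walk prices, purely for demonstration
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=10, freq="D")
close = pd.Series(100 + rng.normal(0, 1, 10).cumsum(), index=dates)
prices = pd.DataFrame(
    {
        "open": close.shift(1).fillna(close.iloc[0]),
        "high": close + 1,
        "low": close - 1,
        "close": close,
    }
)

features = engineer_features(prices)
print(features.tail())
```

Four raw columns go in and three engineered columns come out, each one restating the same price history in a form a model may find easier to use. Note the leading NaN values: most rolling or differenced features lose their first few rows, which you will need to handle before fitting anything.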

I believe (and I don't think I'm alone here!) that feature engineering is the most under-appreciated part of the art of machine learning. It's certainly the most time-consuming and tedious, but it's creative and "fun" (for those who like getting their hands dirty with data, anyway...).

Feature engineering is also one of the key areas where those with domain expertise can shine. Those whose expertise in investing is greater than their skill in machine learning will find that it gives them a direct way to put that expertise to work.

Feature engineering is a term of art in data science and machine learning which refers to pre-processing and transforming raw data into a form which is more easily used by machine learning algorithms. Much like industrial processing can extract pure gold from the trace amounts found in raw ore, feature engineering can extract valuable "alpha" from very noisy raw data.

You have to dig through a lot of dirt to find gold.