Tali Soroker is a Financial Analyst at I Know First.

Stock Market Forecast

Summary

Chaotic systems revisited – see original article here

Understanding the stock market as a chaotic process

What is a model and what is a multivariate time series?

How machine learning is used to create an efficient and effective model for stock market forecasting

Considering higher dimension data sets and time complexity

Stock Market Forecast: Brief Review of Chaotic Processes

Ask any person on the street whether something is random or chaotic and they may assume the two are synonymous. There is, however, an important distinction between the two types of processes: chaos can be predicted, while true randomness cannot. Chaotic systems have the key property that each future state depends on past and current states. Using this property, we can build models of chaotic systems in order to make predictions about future events. Models can be created using statistical analysis or, to avoid the difficulty involved in these problems, we can create a model for stock market forecasting using machine learning and artificial intelligence.
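The distinction is easiest to see in a toy example. The logistic map below is a standard illustration of chaos (not part of any trading model): it is fully deterministic, since each value is computed from the previous one, yet nearby starting points diverge rapidly, which is the hallmark of a chaotic process.

```python
def logistic_map(x0, r=3.9, steps=10):
    """Iterate x_{n+1} = r * x_n * (1 - x_n), a textbook chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Each trajectory is fully determined by its starting point,
# yet two nearly identical starting points quickly diverge
# (sensitivity to initial conditions).
a = logistic_map(0.500000)
b = logistic_map(0.500001)
```

Because the rule is deterministic, re-running with the same starting point reproduces the trajectory exactly; that dependence of the future on the present is what makes modeling chaos possible at all.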

Stock Market as a Chaotic Process

The stock market is a complex chaotic system. While markets exhibit many of the properties of chaotic systems, there are other factors to consider as well. Market behavior displays both systematic and random components, which is what allows us to make realistic stock market forecasts with a decent degree of precision. In the stock market, the chaos we see is a consequence of the psychology of trading, which we know is oftentimes irrational. The irrationality of human psychology is not entirely predictable; however, underlying economic principles and assumptions tell us how people are likely to react to market changes. Patterns form in the data over longer periods of time, and these patterns are crucial to the functionality of our chaos model.

Stock Market Forecast: Creating a Model

We construct and use models in an attempt to explain some unknown phenomenon in terms of what we do know. The most useful models are those that we can represent as mathematical equations. Possibly the most famous example of such a model is Albert Einstein’s model of mass-energy equivalence, better known as the equation E = mc². Mathematicians create models all the time in trying to understand new concepts and hypotheses.

For this process to be effective in eventually reaching an understanding, these models must be testable. Just as in the scientific method, mathematicians must be able to put forward a potential model, test it, and have the ability to reach a negative answer. Due to the chaotic nature of the systems these models attempt to explain, an exact fit with the data being analyzed is not necessary; however, there should be at least some connection with the phenomena being exhibited. The best models have not only decent precision but predictive capability as well.

When creating a model of a complex phenomenon such as the stock market, one should begin by collecting all available information that is in any way relevant to the outcome. Once the data is gathered, it needs to be organized, and one or more equations can be constructed that attempt to model the system. A simple example of this process can be seen below: on the left is a scatter plot of our “relevant data,” and on the right we’ve drawn a line of best fit that can represent this data. We can then determine the equation of the line and use it to model our data. This basic approach is impossible for humans to apply to systems as complex as the stock market; there are simply too many factors involved. Using artificial intelligence, we are able to build models for immense data sets with a similar, but more sophisticated, method.

Multivariate Time Series

The simple definition of a time series is a series of values measuring some system, obtained at successive times, often with a set interval between them. A time series can be univariate or multivariate. These terms do not refer to the number of independent variables in the time series, as one might think; rather, they refer to the way the data sets relate to each other. Imagine somebody charting the heights and weights of people over a certain period of time. In a univariate model of this information, there is no interaction between a person’s height and weight, while in a multivariate model the two sets of data are charted together and the correlation between them can be examined. This example is useful for seeing the benefits of a multivariate analysis, as we know that a person’s weight generally increases as their height increases. We know there is a correlation between the two, and it is helpful to be able to analyze it.
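A small sketch of the idea, using made-up height and weight readings (the numbers are purely illustrative): treating the two series jointly lets us compute the correlation between them, which a univariate view would never expose.

```python
# Hypothetical paired observations of the same people.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]   # cm
weights = [55.0, 62.0, 70.0, 77.0, 85.0]        # kg

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

r = pearson(heights, weights)   # close to 1: strong positive correlation
```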

The stock market operates as a multivariate system. The Dow Jones Industrial Average (DJI), one of the oldest market indices with a daily recorded value, and gold prices are both examples of time series. The two are related, but it is unclear in what way they affect each other. Modeling them with a univariate system would be a mistake, since their relationship could not be examined or understood. By aligning them on one chart, one can see their interrelation and attempt to construct a model. The complexity of the stock market means there are many more examples of systems working together in ways we cannot see without the help of modeling.

Machine Learning

Once the model is built, it must be tested against real data. Multiple models are tested against each other, with a “fitness function” used to compare them and check for accuracy. This function evaluates how “good” each potential solution is relative to the others. After evaluating the data, it returns a positive number called a “fitness value” that shows how accurate the model is.

Here, machine learning comes into play in a major way. Once fitness values are found for the potential solutions, they go through a process of natural selection that determines which solutions survive and are passed on to the next generation and which do not. This process is not as simple as selecting the top ‘x’ solutions with the highest fitness values; the solutions are chosen statistically, with a higher fitness value increasing the statistical weight of a given solution. So the higher the fitness value of a solution, the higher the chance that it will survive, but survival is not certain.
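A minimal sketch of such fitness-proportionate (“roulette wheel”) selection, with hypothetical model names and fitness values chosen only for illustration:

```python
import random

def roulette_select(population, fitnesses, k, rng):
    """Pick k survivors; the chance of each pick is proportional to fitness."""
    total = sum(fitnesses)
    survivors = []
    for _ in range(k):
        pick = rng.uniform(0, total)   # spin the wheel
        running = 0.0
        for individual, fit in zip(population, fitnesses):
            running += fit             # each slice of the wheel is one solution
            if running >= pick:
                survivors.append(individual)
                break
    return survivors

rng = random.Random(42)                     # seeded for reproducibility
models = ["model_a", "model_b", "model_c"]  # hypothetical candidate solutions
fitness = [1.0, 5.0, 2.0]                   # model_b is fittest, but not guaranteed
survivors = roulette_select(models, fitness, k=1000, rng=rng)
```

Over many draws the fittest model dominates, yet weaker models still survive occasionally, which preserves diversity in the next generation.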

As the natural selection process continues, just as in the natural world, the machine is learning and parameters of the models are being optimized to improve performance. The mechanism of machine learning should allow for testing and screening of different models to occur simultaneously with the proposal of new improved models.

The algorithm’s key principle is based on the fact that a stock’s price is a function of many factors that interact with each other non-linearly. Real-life examples generally will not fit a simple linear model because they often exhibit the reversal of a trend after a particular saturation point. Artificial neural networks and genetic algorithms work much better for modeling these systems because they allow for this reversal phenomenon.

Dimensionality Reduction

Dimensionality reduction is a crucial part of building a credible and efficient model for the given system. Reducing the number of random variables that the algorithm is considering and processing speeds up the machine’s execution and improves performance. The goal here is to leave as few variables as possible while losing a negligible amount of accuracy.

Principal Component Analysis (PCA) is just one method of dimensionality reduction. PCA works by taking the n-dimensional data set and orthogonally transforming it into a new set of n coordinates called “principal components”. The first principal component is determined by accounting for the most variation in the data set. Each succeeding component has the next highest variance, and all components are orthogonal to one another. The goal is to reduce the dimensionality of the data set by keeping the principal components that represent the most variance and discarding the components that represent little.
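A sketch of PCA from first principles, using NumPy on synthetic two-dimensional data (everything here is illustrative; production systems would typically use a library implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: the second coordinate is almost a multiple of the first,
# so nearly all of the variance lies along a single direction.
t = rng.normal(size=(200, 1))
X = np.hstack([t, 3.0 * t + 0.1 * rng.normal(size=(200, 1))])

Xc = X - X.mean(axis=0)               # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)       # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]     # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()   # share of variance per component
X_reduced = Xc @ eigvecs[:, :1]       # keep only the first principal component
```

Here the first component captures nearly all of the variance, so projecting onto it halves the dimensionality while losing a negligible amount of information, exactly the trade-off described above.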

Other notable methods of dimensionality reduction include applying a low variance filter or a high correlation filter, pruning the network, and adding and replacing inputs. Clustering can also be used for dimensionality reduction, or it can be a goal on its own. Clustering involves separating given sets of examples into sets of “similar” examples. This process is a good way to create structure among the observed data which is helpful for summarizing large and unmanageable data sets.
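As a sketch of the clustering idea, here is a bare-bones k-means (Lloyd’s algorithm) on two synthetic, well-separated groups; real market data is far messier, and library implementations handle edge cases (such as empty clusters) that this sketch ignores:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic clusters of "similar" examples, far apart on purpose.
data = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                  rng.normal(5.0, 0.5, size=(50, 2))])

centers = data[[0, -1]].copy()        # naive init: one point from each group
for _ in range(20):                   # Lloyd iterations
    # Assign each point to its nearest center.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Move each center to the mean of its assigned points.
    centers = np.array([data[labels == k].mean(axis=0) for k in range(2)])
```

The resulting labels summarize one hundred observations with two cluster assignments, which is the sense in which clustering creates structure in otherwise unmanageable data.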

Time Constraint

When creating a model for systems with substantial data sets, the amount of time required for the algorithm to make all of the necessary calculations can be extensive. If the model is too complicated, it may not be able to solve the problem in its entirety, making it completely impractical. Models must therefore be created under a certain time constraint. As Einstein is reputed to have said, the model “should be simple, but not too simple.” We want to restrict the parameters to create a model that is simple enough to operate quickly without losing crucial information.

As we restrict the parameters, compromises must be made. Speed, precision, and generality all compete with one another, and one must prioritize in order to optimize the model. The time complexity of an algorithm, commonly expressed in big O notation, quantifies the amount of time needed as a function of the size of the input. Oftentimes the time complexity is estimated using the worst-case time complexity, because the algorithm’s performance can vary across different input types.
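A small sketch of what big O captures, counting basic operations rather than measuring wall-clock time (both functions are illustrative only):

```python
def count_pairwise_ops(n):
    """An O(n^2) pattern: examine every pair of items."""
    ops = 0
    for i in range(n):
        for j in range(n):
            ops += 1
    return ops

def count_single_pass_ops(n):
    """An O(n) pattern: one pass over the input."""
    ops = 0
    for _ in range(n):
        ops += 1
    return ops

# Doubling the input size quadruples the quadratic cost
# but only doubles the linear one.
quad_100, quad_200 = count_pairwise_ops(100), count_pairwise_ops(200)
lin_100, lin_200 = count_single_pass_ops(100), count_single_pass_ops(200)
```

This growth rate, not the constant factors, is what forces the speed-versus-precision compromises described above once the data set becomes immense.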

Conclusion

As a chaotic system, the stock market contains patterns that can be analyzed with models, and algorithms can be designed to make predictions about it. Machine learning is incorporated into the model and used for testing and optimizing the solution. As the model is created, many factors must be considered, such as the dimensionality of the data and the algorithm’s time complexity. To arrive at a workable model, compromises must sometimes be made to balance speed, precision, and generality. I Know First has successfully created an algorithm to model the stock market and make predictions on market trends over six different time horizons.

Part 1 Click Here