Step 1: Get the Data

The first step is usually all about data mining and filtering. Data sources are often large and unstructured, so the goal here is to extract structured data from them. On the topic of sources, be sure to select relevant and trusted ones. If I were trying to predict election results I would probably avoid using The Onion, although given political outcomes this year I may be wrong.
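As a rough illustration of extracting structured data from an unstructured source, here is a minimal Python sketch. The poll text, the pattern, and the `extract_polls` helper are all invented for this example; real sources are messier and usually need a proper parser or scraper.

```python
import re

# Hypothetical unstructured input: free-text poll snippets (invented data).
RAW_TEXT = """
Poll on 2016-10-01: Candidate A 45%, Candidate B 41%
Some unrelated commentary that should be ignored.
Poll on 2016-10-08: Candidate A 44%, Candidate B 43%
"""

# The pattern only matches lines in the exact shape shown above.
POLL_PATTERN = re.compile(
    r"Poll on (?P<date>\d{4}-\d{2}-\d{2}): "
    r"Candidate A (?P<a>\d+)%, Candidate B (?P<b>\d+)%"
)


def extract_polls(text):
    """Return one dict per poll line found in the text; skip everything else."""
    records = []
    for match in POLL_PATTERN.finditer(text):
        records.append({
            "date": match.group("date"),
            "candidate_a": int(match.group("a")),
            "candidate_b": int(match.group("b")),
        })
    return records


polls = extract_polls(RAW_TEXT)
```

Everything that does not match the pattern is silently dropped, which is convenient for a sketch but worth logging in practice so you know how much of the source you are ignoring.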

Step 2: Analyse the Data

Here we need to start focusing on the contents of the data. This alone can prove to be quite a challenge. For example, if I am trying to make predictions about my own health, what information should I take into account? Do I smoke? What is my favourite colour? Where do I work? Deciding what is relevant and what is not is often the hard part. Proper pre-processing and filtering techniques are a must when cleaning up your data.
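One simple way to get a first feel for relevance is to check how strongly each candidate feature moves with the thing you want to predict. The sketch below computes a Pearson correlation by hand; the feature names and all the numbers are made up for illustration, and correlation is only a crude first filter, not proof that a feature matters.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: one entry per person.
features = {
    "smokes":     [1, 1, 1, 0, 0, 0, 1, 0],
    "likes_blue": [1, 0, 1, 0, 1, 0, 0, 1],
}
health_score = [55, 60, 50, 80, 85, 78, 58, 90]

# Correlation of each feature with the outcome we care about.
scores = {name: pearson(values, health_score) for name, values in features.items()}
```

With this toy data, smoking correlates strongly (and negatively) with the health score, while favourite colour barely correlates at all.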

You should also ensure your data is of good quality. A reliable source alone does not ensure quality. What if you scraped your data from Wikipedia on the day someone thought it would be fun to vandalise the articles you were mining? Running your data through existing analysis pipelines can be quite informative and is a simple method of spotting questionable data. More formally, you can use confirmatory factor analysis to ensure your extracted data will at least fit your model. It is also recommended that you apply other statistical techniques to ensure your data can account for variance, false positives, and other issues which often crop up in real-world data.
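As one example of a cheap statistical sanity check, the sketch below flags values that sit unusually far from the median, measured in median absolute deviations (MAD). This is not confirmatory factor analysis, just a much simpler outlier check; the `flag_outliers` helper, the threshold of 5, and the readings are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=5.0):
    """Return the values lying more than `threshold` MADs from the median."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        # No spread at all: anything different from the median is suspicious.
        return [v for v in values if v != centre]
    return [v for v in values if abs(v - centre) / mad > threshold]


# Invented readings; one of them looks like a vandalised or mistyped entry.
readings = [10, 11, 9, 10, 250, 12]
suspicious = flag_outliers(readings)
```

A flagged value is only a prompt to go and look at the source, not a reason to delete it automatically.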

Step 3: Model the Data

This step is fundamental as it allows you to structure your data in such a way that you can start recognising patterns that potentially allow you to extract future trends. Models also allow you to formally describe your data. This is helpful in understanding the results you get from your data analysis but is also a good starting point when it comes time to visualise your results.
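To make "formally describing your data" a little more concrete, here is a minimal sketch that uses Python dataclasses as a tiny model. The `Person`, `Company`, and `Employment` types and the `parse_person` helper are hypothetical and invented for this example; they are not how any particular modelling tool works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    smokes: bool


@dataclass(frozen=True)
class Company:
    name: str


@dataclass(frozen=True)
class Employment:
    """A relationship between two entities in the model."""
    employee: Person
    employer: Company


def parse_person(raw):
    """Reject records that do not fit the model rather than guessing."""
    if not isinstance(raw.get("name"), str) or not raw["name"]:
        raise ValueError("name must be a non-empty string")
    if not isinstance(raw.get("smokes"), bool):
        raise ValueError("smokes must be True or False")
    return Person(name=raw["name"], smokes=raw["smokes"])


alice = parse_person({"name": "Alice", "smokes": False})
job = Employment(employee=alice, employer=Company(name="Example Corp"))
```

Even a model this small forces you to decide what a valid record looks like, which is exactly the kind of question a domain expert can help answer.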

Like your extracted data, your models should be put under scrutiny. You should ensure that your models are valid representations of the issue you are trying to predict. Consulting with domain experts is often a good idea. Trying to predict inflation for the next few years? Well, you should probably speak to an economist as a first step when defining the model. When modelling ontologies at Grakn Labs, I cannot count the times an expert on hand would have saved us from hours of deliberation.

Step 4: Predicting the Future

[Image caption: "I would have stuck with the charts."]

Your data is extracted, cleaned, quality checked, and fits your model. Time to start peering into your crystal ball and predicting the future... Oh wait, there are multiple crystal balls to use and you are not sure which one will work.

This is where the massive field of machine learning can come into play. There are a multitude of ways to start recognising patterns in your data and exploiting those patterns. Neural Networks, Linear Regression, Bayesian Networks, Deep Learning: all of these and many more can help you to start making predictions. Personally, I recommend Graph Based Analytics, but I may be a bit biased here.
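To show what the simplest of these crystal balls looks like, here is a minimal sketch of linear regression by ordinary least squares, written with plain Python so nothing is hidden. The data points and the `fit_line` and `predict` helpers are invented for illustration; for real work you would more likely reach for an established library.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Extrapolate the fitted line to a new x value."""
    return intercept + slope * x


# Invented measurements for five consecutive periods.
periods = [1, 2, 3, 4, 5]
observed = [2.0, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_line(periods, observed)
forecast = predict(6, slope, intercept)  # prediction for the next period
```

A straight line is only a sensible model if the underlying trend is roughly linear, which is one more reason to validate the model before trusting the forecast.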

Luckily, data analytics is becoming so desirable these days that many of these tools are available as simple applications. This means that it is now much easier to start analysing your data without the need to understand how each crystal ball works.