Recession Prediction Model Using Machine Learning

kyleelamb1324
Jan 31, 2022

Updated: Feb 14, 2022

Recession Prediction Model Using Machine Learning - Kyle E. Lamb

Problem Statement:

Can we predict future US recessions using economic data obtained from the US financial markets?

Recessions are among the biggest problems in stock market cycles. During these periods, businesses and individual investors commonly lose a significant share of their portfolio value to the rapid sell-off that accompanies a bear market. It is important for investors to be able to estimate, with some confidence, the probability of a market crash in the near future. With that knowledge, we can act in advance and hedge our portfolios with assets that protect our wealth against a crash.

In a Nutshell:

This project applies machine learning techniques and data analysis to build a model that predicts US recessions over a twelve-month horizon. The model uses multiple economic factors, such as the US yield curves, the Federal Funds rate, and the Shiller PE ratio.

Assumptions:

There are a few assumptions that have to be made in order for this model to have any value:

Past US recessions behave similarly to one another.
A selection of economic data correlates with the likelihood of a recession occurring within some period of time.

Data Visualization:

To build a sound recession prediction model, we need data that covers different characteristics of the US financial market. The table below shows the final dataset, which contains one valuation feature (Shiller PE) and three monetary policy features (the Federal Funds rate and the yield curves).

Below is our data visualized in graph form.

Model Structure:

I designed the model so that its output is the probability that a recession will occur within the next twelve months. A value greater than 50% means the model predicts a recession within that window.
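To make the target concrete, here is a minimal sketch of how a "recession within the next N periods" label and the 50% decision rule could be written. The function names, the toy indicator, and the three-period horizon are all hypothetical choices for illustration; the article does not show how the labels were actually built.

```python
from typing import List, Optional


def forward_recession_label(in_recession: List[int], horizon: int) -> List[Optional[int]]:
    """Label period t with 1 if any of the next `horizon` periods is a recession.

    The final `horizon` periods get None because their future window
    is not fully observed yet.
    """
    n = len(in_recession)
    labels: List[Optional[int]] = []
    for t in range(n):
        if t + horizon >= n:
            labels.append(None)  # future window runs past the end of the data
            continue
        window = in_recession[t + 1 : t + 1 + horizon]
        labels.append(1 if any(window) else 0)
    return labels


def predicts_recession(probability: float, threshold: float = 0.5) -> bool:
    """Decision rule from the text: a probability above 50% is a recession call."""
    return probability > threshold


# Hypothetical toy indicator: a recession in periods 5 and 6.
indicator = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
labels = forward_recession_label(indicator, horizon=3)
print(labels)  # [0, 0, 1, 1, 1, 1, 0, None, None, None]
```

Note that the label turns on before the recession itself starts, which is what makes the model forward-looking rather than a detector of recessions already under way.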

Models Considered:

I considered the following eight algorithms for the model: K-Nearest Neighbors, Random Forest Classifier, AdaBoost, Logistic Regression, Neural Networks, Gaussian Naive Bayes, Quadratic Discriminant Analysis, and Decision Tree.

After training each model and validating its statistics on in-sample data, I found that an ensemble of the top three algorithms gave the best results. The final prediction model consists of Gaussian Naive Bayes, Quadratic Discriminant Analysis, and the Random Forest Classifier. The results of the model over the entire dataset are shown below.
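As a rough illustration, an ensemble of these three algorithms could be assembled with scikit-learn as sketched below. This assumes the ensemble averages the predicted probabilities of its members (soft voting); the article does not say how the three models were combined, and the data here is synthetic placeholder data, not the project's dataset.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB


def build_ensemble(random_state: int = 0) -> VotingClassifier:
    """Combine the three classifiers by averaging their class probabilities.

    Soft voting is an assumption made for this sketch.
    """
    return VotingClassifier(
        estimators=[
            ("gnb", GaussianNB()),
            ("qda", QuadraticDiscriminantAnalysis()),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ],
        voting="soft",
    )


# Synthetic stand-in for four economic features (not real market data).
rng = np.random.default_rng(0)
X_calm = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
X_stress = rng.normal(loc=2.0, scale=1.0, size=(60, 4))
X = np.vstack([X_calm, X_stress])
y = np.array([0] * 200 + [1] * 60)  # 1 = recession within the horizon

model = build_ensemble().fit(X, y)
# Column 1 holds the probability of the positive (recession) class.
recession_probability = model.predict_proba(X)[:, 1]
```

Soft voting fits the design described earlier because it yields a single probability that can be compared against the 50% threshold.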

Statistical Report:

Testing and Validation:

To test the model, I trained it on a portion of the dataset and attempted to predict the Covid-19 recession using only economic data. I believe this is a valid test of the model's performance, as the Covid-19 recession was not fully economic but was due to an emergency pandemic response. The results of this test are shown below, along with the statistical report of the model.
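A hold-out test of this kind amounts to splitting the data by date rather than at random. The sketch below shows one way to do that; the row layout, the feature values, and the 2019 cutoff are hypothetical, since the article does not state where the training portion ends.

```python
from datetime import date
from typing import List, Tuple

Row = Tuple[date, List[float], int]  # (observation date, features, label)


def split_by_date(rows: List[Row], cutoff: date) -> Tuple[List[Row], List[Row]]:
    """Train on rows strictly before `cutoff`, test on rows on or after it."""
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


# Hypothetical rows, deliberately out of order to show the sort.
rows = [
    (date(2017, 6, 1), [0.1], 0),
    (date(2019, 6, 1), [0.3], 1),
    (date(2018, 6, 1), [0.2], 0),
    (date(2020, 3, 1), [0.4], 1),
]
train, test = split_by_date(rows, cutoff=date(2019, 1, 1))
print(len(train), len(test))  # 2 2
```

Every training row precedes every test row, so the model never sees data from the period it is asked to predict.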

Statistical Report:

Conclusion:

From the statistical reports, we can see that although the model trained on the entire dataset performed extremely well, it likely fell victim to overfitting. This may be why the model did not perform as well on our test sample, the Covid-19 recession. Even so, the test results were accurate: the model correctly classified whether a recession would occur on 92% of all days, and its precision on recession predictions was 86%. In other words, when the model predicted a recession, it was correct 86% of the time.
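For readers less familiar with these metrics, here is a small sketch of how accuracy and precision are computed from true and predicted labels. The ten toy labels are hypothetical and are not the model's actual predictions.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all days classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    """Share of recession calls that were correct (0.0 if no calls were made)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy labels: 1 = recession within the horizon.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"accuracy={accuracy(tp, fp, fn, tn):.2f}")  # accuracy=0.80
print(f"precision={precision(tp, fp):.2f}")        # precision=0.67
```

Precision only looks at the days on which the model called a recession, which is why it can differ from accuracy when recessions are rare.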

Model Downfalls:

Although the results were acceptable, some limitations are worth mentioning. Because recessions are rare, the model cannot be validated substantially through testing. And because of the nature of time series data, we cannot test the model on the 2000 dot-com bubble or the 2008 financial crisis: time series data is temporally dependent, meaning that data points close together in time often depend heavily on one another. It does not make sense to predict the past with data from the future.
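The ordering constraint can be expressed directly in code. The sketch below generates expanding-window splits in which each test block is preceded only by earlier observations; it is a hand-rolled illustration with hypothetical names, not the validation scheme used in this project. scikit-learn's TimeSeriesSplit offers a comparable ready-made splitter.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n_samples: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where training always precedes testing."""
    fold_size = n_samples // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


for train_idx, test_idx in walk_forward_splits(n_samples=10, n_folds=3):
    print(train_idx, test_idx)
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7, 8, 9]
```

With only a handful of recessions in the historical record, each fold would still contain very few positive events, so the sample-size limitation described above remains.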

Bottom Line:

This model is just another tool that investors can use to reduce risk in the markets. Strategies can be built around it to hedge against the market and protect the value of our portfolios when a recession is likely.
