Building a Model to Forecast Inflation and Disinflation

Drew Bordelon
6 min read · Sep 24, 2020

The Federal Reserve attempts to forecast inflation every month, and it states that its margin of error can be in the ballpark of plus or minus 240 bps. In other words, if inflation is 0.7%, the Fed’s forecast could miss the mark by 2.4%. Yes, more than triple the actual inflation rate. One of the major issues with forecasting inflation is that the metrics for measuring inflation have changed multiple times in the last decade. For more on this and the forecasting done by the Fed, here’s a link to their approach. Additionally, there is asset inflation, consumer inflation, and monetary inflation. Asset inflation is fairly obvious when the stock market is hitting all-time highs. Consumer inflation is harder to recognize when prices seem higher at the grocery store but gas prices are nearly half of what they were two years ago.

In this project, the Consumer Price Index (CPI) was my target. I created multiple classification models: logistic regression, random forest, gradient boosting, and an XGBoost classifier. I set out to predict whether economic inflation was ‘accelerating’ or ‘decelerating’.
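That model lineup can be sketched with scikit-learn. The data below is a synthetic stand-in, not the project's actual monthly macro features, and the hyperparameters are defaults rather than the settings used in the project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the monthly features and the
# accelerating/decelerating CPI label
X, y = make_classification(n_samples=320, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# An XGBClassifier from the xgboost package slots into the same loop
models = {
    'logistic_regression': LogisticRegression(max_iter=1000),
    'random_forest': RandomForestClassifier(random_state=0),
    'gradient_boosting': GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_val, y_val)
    print(f'{name}: {acc:.3f}')
```

Comparing every candidate on the same train/validation split keeps the accuracy numbers directly comparable.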

The ISM Manufacturing Index (ISM) is a forward-looking questionnaire given to 300 manufacturing companies that addresses the ebb and flow of business conditions. Who better to answer this questionnaire than the purchasing managers of those businesses? For more information on the ISM, click here. In short, an ISM above 50 is seen as a positive business outlook and one under 50 as a negative for investors. Looking at the ISM, you can see many correlations between the movement of commodities and the Core CPI. Below are a couple of charts showing similar movements.

Data comes from FRED (Federal Reserve Economic Data)

There are more correlating charts, but here is one [below] that tracks the ISM closely, especially starting in 2008 during the global financial crisis.

Data comes from FRED (Federal Reserve Economic Data)

My target for the model was determining whether inflation is ‘accelerating’ or ‘decelerating’. The baseline for classification was 51.9% (on training data), favoring deceleration. I calculated the baseline by calling value_counts() on y_train[target] with normalization, which converts the counts of each class (acceleration and deceleration) into percentages; the majority class’s share is the baseline. The validation target favored acceleration slightly more, so the shift in frequency from deceleration to acceleration should be noted. This baseline [for the validation set] was 54%, favoring acceleration, but I was unaware of this until after the model was completed; I didn’t want any personal biases to flow into the model creation process. Later in the project I modeled an XGBoost classifier with train (0.942), validation (0.507), and test (0.62) accuracy, results that were less accurate than the models I settled on. I believe some of this had to do with there not being much historical data for the features I used. Most of the features only go back to 1994, and I used monthly data points.
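For illustration, here is how that majority-class baseline falls out of value_counts(). The class counts below are made up to mirror the roughly 52/48 training split, not the actual series:

```python
import pandas as pd

# Hypothetical training labels mirroring the split described above
y_train = pd.Series(['decelerating'] * 52 + ['accelerating'] * 48)

# Normalizing value_counts() turns class counts into shares of the sample;
# the majority share is the baseline any model has to beat
baseline = y_train.value_counts(normalize=True).max()
print(baseline)  # 0.52
```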

Multiple models were used, including a logistic regression (59.4% accuracy). This model did provide results; however, logistic regression lacks the robustness needed to model this data. The data set did not show much linearity, and the outcomes were contingent on many dynamics in the data. For example, when one feature began to rise in rate-of-change terms, that feature would gain importance when previously it had very little effect. Below is a PDP interaction grid plot showing the 10-year U.S. bond yield (y-axis) and the WTI price rate of change. While the rate of change of WTI remained low (below 23.5), the feature had little effect on the model compared to the 10-year bond yield. Once the WTI RoC broke the 23.5 threshold, it had more importance in the model than the 10-year bond. This plot shows the dynamic nature of the data and why the logistic regression model would be less accurate than other models.

I want to mention two models with different modeling approaches: the random forest classifier and the gradient boosting classifier. The GB model produced 72.9% accuracy while the RF model had 70.2% accuracy. Despite the lower accuracy score, I don’t think the RF model should be considered inferior to the GB model. As mentioned earlier, the training set favored deceleration at 51.9% and the validation set favored acceleration at 54% (regarding the frequency of the outcome). The RF model confusion matrix (see below) shows that it picked up on the acceleration targets better than the GB model, and it correctly identified acceleration and deceleration in a more balanced way. Therefore, both models have their strengths.

(Above) Gradient Boosting Classification Confusion Matrix
(Above) Random Forest Classification Confusion Matrix

The outcome of the gradient boosting model:

True Positives = 13

True Negatives = 14

False Positives = 7

False Negatives = 3
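Plugging those counts into the standard formulas reproduces the GB model's headline numbers:

```python
# Gradient boosting confusion-matrix counts from above
TP, TN, FP, FN = 13, 14, 7, 3

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3))  # 0.73  (the 72.9% accuracy reported above)
print(round(f1, 3))        # 0.722 (an f1-score of .72 for this class)
```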

There is one reason I think the GB model is more robust than the RF model with regard to this time series: the modeling happens in series rather than in parallel. Gradient boosting fits each new tree to the errors left by the previous trees, while a random forest fits its trees independently. Since econometrics is a dynamic landscape, with new information constantly changing it, you need a model that changes as new data comes in, but those changes should take the previous data into account and make adjustments accordingly.
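That series-versus-parallel distinction is visible in scikit-learn, where staged_predict exposes the gradient boosting ensemble after each round. A small sketch on synthetic data (not the project's data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, random_state=0)
gb = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each boosting round adds a tree fit to the errors left by the previous
# trees, so the ensemble's training fit improves stage by stage
stage_acc = [np.mean(pred == y) for pred in gb.staged_predict(X)]
print(stage_acc[0], '->', stage_acc[-1])
```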

Above are a few features that were important in creating this gradient boosting classification model. We can see that the rate of change of particular currencies and commodities, as well as the ratios and correlations, had a strong effect. It should be noted that the actual prices had very little effect on inflation. The strongest features had more to do with the direction in which the correlations, ratios, and rates of change were moving rather than whether price was rising or falling. There was a limitation to this model: it only has two targets, accelerating and decelerating. In my opinion, there should be a third possible outcome, neutral. Because of my own limitations, I was unable to apply a metric that could determine whether inflation was neutral. What metric could I use to determine neutrality?
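The kinds of direction-based features described (rates of change, ratios, and rolling correlations rather than raw prices) can be built in pandas. The two monthly series below are random stand-ins for WTI and the 10-year yield, and the column names are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range('1994-01-01', periods=120, freq='MS')
# Random walks standing in for monthly WTI and 10-year yield levels
prices = pd.DataFrame({
    'wti': 50 + rng.normal(0, 1, 120).cumsum(),
    'us10y': 5 + rng.normal(0, 0.05, 120).cumsum(),
}, index=idx)

# Direction-based features: year-over-year rate of change, a cross-asset
# ratio, and a 12-month rolling correlation
features = pd.DataFrame({
    'wti_roc': prices['wti'].pct_change(12) * 100,
    'wti_us10y_ratio': prices['wti'] / prices['us10y'],
    'wti_us10y_corr': prices['wti'].rolling(12).corr(prices['us10y']),
}, index=idx)
print(features.dropna().shape)  # the first year is lost to the 12-month windows
```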

Here we can see fairly balanced precision and recall, which equates to f1-scores of .72 and .74 for the two classes. The model has a decent distribution of false negatives and false positives. Since this classification model started with a baseline of .519, the class distribution is nearly even (close to .50), so accuracy may be a reasonable metric here. In this model, the accuracy (.73) sits right in line with the f1-scores. One issue I ran into was that there was not enough history to build a substantial data set; economic data has a short history, and most of it is only collected quarterly or monthly.

Forecasting and modeling could be considered more of an art than a science. As with each brush stroke the canvas changes, and something similar can be said about the dynamic nature of modeling financial markets. Full disclosure: I am a student of the markets, and it is this curiosity that seeks to understand the function of different market variables and their effect on inflation and market behavior. Diving into the interactions of these asset classes and inflation, I now have more questions than before. Most economists define inflation as rising prices of consumer goods. What if there is another variable that defines inflation, such as the velocity of money? Below is a chart that depicts this correlation. For a later, more in-depth study, I’ll potentially look into the Core CPI and M2 Velocity.
