Conclusions
In this data exploration, I examined the relationship between various indicators of economic well-being in America. Specifically, I was interested in the relationship between formal measures (like unemployment rates from the Bureau of Labor Statistics) and potential informal signals, namely Google Trends data on economic topics. How closely did these measures track one another? Were some of these variables useful leading indicators of others?
Getting to Know Our Data
All time series analysis begins with visualization to understand the data. Visualizing my time series quickly revealed some key patterns that would shape how I needed to treat the data if I were to use all of it in every analysis.
- First, I could see that unemployment and consumer sentiment seemed to track well with major historical economic events.
- Google searches had a weaker relationship with economic events, although searches for 'food stamps' and 'unemployment' both rose during the Great Recession and spiked sharply during COVID-19.
- Gas prices and Google searches for 'cheap gas' seemed related to each other, but not to the other series I examined.
- The food stamps series was very smooth, with almost no volatility, which suggested that it might not be very useful for modeling the more frequent ups and downs of the economy.
- Visualization also made clear just how large the impact of COVID-19 was in my data sets. For this reason, in some cases I chose to analyze data that excluded 2020, because those points were such extreme outliers.
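Two of the observations above, that one series is much smoother than another and that 2020 can be set aside as an outlier year, can be made concrete with a short sketch. This is not the code used for the project; it assumes each series is a plain list of monthly values (or `(date, value)` pairs), measures "volatility" as the standard deviation of month-over-month changes, and uses made-up toy numbers.

```python
from datetime import date
from statistics import pstdev


def month_over_month_volatility(values):
    """Population standard deviation of month-over-month changes.

    A smooth series (small, steady changes) scores near zero;
    a jumpy series scores high.
    """
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    return pstdev(changes)


def drop_year(observations, year):
    """Return only the (date, value) pairs that do not fall in `year`."""
    return [(d, v) for d, v in observations if d.year != year]


# Toy numbers for illustration only; they are not the project's data.
smooth = [10.0, 10.1, 10.2, 10.3, 10.4]
jumpy = [10.0, 12.0, 9.0, 13.0, 10.0]

observations = [
    (date(2019, 11, 1), 1.0),
    (date(2019, 12, 1), 1.1),
    (date(2020, 1, 1), 1.0),
    (date(2020, 4, 1), 5.0),
    (date(2021, 1, 1), 2.0),
]

print(month_over_month_volatility(smooth), month_over_month_volatility(jumpy))
print([d.isoformat() for d, _ in drop_year(observations, 2020)])
```

Differencing first means a steadily trending but smooth series still scores low, which matches the intuition of "almost no volatility" better than the standard deviation of the raw levels would.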
Relationships Among The Variables
Because I was interested in the relationships among the variables, I added a section to my website on multivariate relationships. Please look there for the full analysis, which I summarize here.
I was able to fit a VAR(1) model, which showed that unemployment was significantly related to the lag-1 values of unemployment, Google searches for 'unemployment', consumer sentiment, and Google searches for 'food stamps'. This is important because it suggests that Google Trends could give a one-month warning of the unemployment statistics.
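For reference, the unemployment equation of a VAR(1) can be written out explicitly. This sketch assumes, for illustration, that the system contains just the four series named above: $U_t$ (unemployment rate), $G^{U}_t$ (searches for 'unemployment'), $S_t$ (consumer sentiment), and $G^{F}_t$ (searches for 'food stamps'). The model on the Multivariate page may include other series, and the coefficient symbols are notation, not estimated values.

```latex
\begin{aligned}
\mathbf{y}_t &= \mathbf{c} + A_1 \mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t,
  \qquad \mathbf{y}_t = \left(U_t,\ G^{U}_t,\ S_t,\ G^{F}_t\right)^{\top} \\
U_t &= c_1 + a_{11} U_{t-1} + a_{12} G^{U}_{t-1} + a_{13} S_{t-1} + a_{14} G^{F}_{t-1} + \varepsilon_{1,t}
\end{aligned}
```

Every term on the right-hand side of the second line is dated $t-1$, so once last month's values are observed the equation produces a forecast of $U_t$. That is the sense in which the search series could act as a one-month warning.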
Areas for Future Analysis
This time series analysis suggests several areas for future research.
- Google Trends: Google Trends data for any search term are available from 2004 to the present. Thus, a researcher could test a wide variety of economic terms to see which ones have the strongest relationship to formal economic measures like unemployment. For example, one could test searches for 'job', 'welfare', 'lost work', 'hiring', 'cost of living', and 'recession', which may reflect negative economic realities. Conversely, there may be Google searches that correlate with lower rates of unemployment, such as searches related to major purchases, vacations, or leisure activities.
- Leading Indicators: It would also be intriguing to see which Google search terms have the longest lag relationship to formal economic measures. For example, in my 'Multivariate' section I tested a VAR(1) model. Are there any Google search terms with statistically significant coefficients on unemployment at a lag of 6? That is, is unemployment in month T related to Google searches from month T-6?
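A cheap first pass at both ideas above is to screen many search terms by how strongly each one, shifted back by 0 to 6 months, correlates with unemployment. The sketch below uses lagged Pearson correlation as a simpler stand-in for the VAR significance test; it is not the method used on the Multivariate page. The term names and numbers are toy placeholders, and the target is built so that it copies the "leading" term with a 2-month delay.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(search, target, lag):
    """Correlate search[t - lag] with target[t] over every t where both exist."""
    if lag == 0:
        return pearson(search, target)
    return pearson(search[:-lag], target[lag:])


def rank_terms(searches, target, max_lag=6):
    """For each term, keep the lag (0..max_lag) with the largest |correlation|.

    Returns (term, best_lag, correlation) tuples, strongest first.
    """
    results = []
    for term, series in searches.items():
        best_lag, best_r = max(
            ((lag, lagged_correlation(series, target, lag)) for lag in range(max_lag + 1)),
            key=lambda pair: abs(pair[1]),
        )
        results.append((term, best_lag, best_r))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Toy monthly series (illustrative only, not Google Trends data).
leading = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
target = [0.0, 0.0] + leading[:-2]  # target[t] == leading[t - 2] by construction
noise = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0, 4.0, 5.0]

ranked = rank_terms({"toy leading term": leading, "toy noise term": noise}, target)
for term, lag, r in ranked:
    print(term, lag, round(r, 3))
```

A high lagged correlation only flags a candidate; a term that survives this screen would still need to be tested properly, for example with a higher-order VAR.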
Forecasting Does Poorly in the COVID-19 Era
Here I compare the chosen SARIMA(4,1,1)(2,0,2)[12] model of US unemployment with two benchmark methods (the series mean and the naive forecast), and with the actual unemployment rate from January 2020 onward. This comparison uses only data from January 2004 onward.
As we can see, all of the forecasts performed poorly for 2020, because none of the methods could have anticipated the COVID-19 crisis. The mean method (which simply predicts that every future value will equal the mean of all past observations) did the best, but it was still far from accurate.
Please see the ARIMA page for a discussion of the error rates of these methods.
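The two benchmarks are simple enough to write out directly. This sketch implements the mean and naive methods as described above, plus two common error measures (RMSE and MAE). The SARIMA model itself is not reproduced here, and the numbers are toy values chosen to mimic a calm history followed by a sudden shock, not the BLS series.

```python
from math import sqrt


def mean_forecast(history, horizon):
    """Mean method: every future value equals the average of all past observations."""
    m = sum(history) / len(history)
    return [m] * horizon


def naive_forecast(history, horizon):
    """Naive method: every future value equals the last observed value."""
    return [history[-1]] * horizon


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy numbers for illustration only: a gently falling history, then a shock.
history = [5.0, 4.8, 4.6, 4.4, 4.2, 4.0]
actual = [4.0, 4.0, 12.0, 10.0]

for name, method in [("mean", mean_forecast), ("naive", naive_forecast)]:
    pred = method(history, len(actual))
    print(name, round(rmse(actual, pred), 3), round(mae(actual, pred), 3))
```

With these toy numbers both methods have the same MAE (3.5), but the mean method has the lower RMSE (about 4.66 versus 5.0) because its forecast of 4.5 sits slightly closer to the shocked values than the naive forecast of 4.0. Neither method can see the shock coming.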
In this way, COVID-19 is a reality check on the limits of time series forecasting, because these methods assume that the patterns and relationships observed in past periods will continue to hold in the future.
Going forward, the huge spikes in economic indicators in 2020 may require us to build more uncertainty into future models, for example in anticipation of future pandemics. Having seen the large impact of a public health emergency on the US economy, forecasters could perhaps incorporate information about the current risk of health threats, or simply project a wider confidence interval to capture this possibility.
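One way to picture "projecting a wider confidence interval" is with the naive method, whose h-step forecast standard deviation is commonly taken as sigma times sqrt(h), where sigma is the standard deviation of the one-step residuals (this assumes uncorrelated residuals). The `inflation` multiplier below is a hypothetical knob for extra, unmodeled risk such as a pandemic; it is an illustration only, not an established technique and not part of this project's models. The history values are toy numbers.

```python
from math import sqrt
from statistics import stdev


def naive_intervals(history, horizon, z=1.96, inflation=1.0):
    """Naive-forecast prediction intervals, optionally widened.

    sigma is estimated from the one-step naive residuals (month-over-month
    changes); the h-step half-width is z * sigma * sqrt(h). `inflation` is a
    hypothetical multiplier for unmodeled risk; 1.0 leaves intervals unchanged.
    """
    residuals = [b - a for a, b in zip(history, history[1:])]
    sigma = stdev(residuals)
    last = history[-1]
    intervals = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * sqrt(h) * inflation
        intervals.append((last - half_width, last + half_width))
    return intervals


# Toy history for illustration only.
history = [4.0, 4.2, 4.1, 4.5, 4.3]
base = naive_intervals(history, 3)
wider = naive_intervals(history, 3, inflation=2.0)
for h, (b, w) in enumerate(zip(base, wider), start=1):
    print(h, [round(x, 2) for x in b], [round(x, 2) for x in w])
```

The intervals already widen with the horizon h; the multiplier widens them further at every horizon, which is the crude effect the paragraph above describes.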
Image source: New York Public Library at https://unsplash.com/photos/Q0_u7YwXqh0